Maximum likelihood parameter estimation

At the very beginning of the recognition labs, we assumed the conditional measurement probabilities p(x|k) and the a priori probabilities P(k) to be known, and we used them to find the optimal Bayesian strategy. Later, we abandoned the assumption of known a priori probabilities and constructed the optimal minimax strategy. Today, we face the problem of an unknown probability density function p(x|k), and we will be interested in estimating it.

Problem formulation

We will keep solving the letter classification task. However, in contrast to the previous labs, both the a priori probabilities P(k) and the probability density functions p(x|k) are unknown. We have only training examples (the training set), from which the probabilities can be estimated, and test examples (the test set) to verify our estimates. To estimate p(x|k) we will use the maximum likelihood estimate. Having estimated the probability density functions, we can solve the problem with a Bayesian classifier. This approach, where density estimates are used in place of the true densities, is called "naive Bayes" in the literature.

The following two simple measurements will be used in this task:

x = (sum of pixel intensities in the left half of the image) - (sum of pixel intensities in the right half of the image)
y = (sum of pixel intensities in the top half of the image) - (sum of pixel intensities in the bottom half of the image)
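A minimal sketch of one way to compute these measurements in Matlab, assuming each image is stored as an h-by-w matrix of pixel intensities (the helper name compute_measurements is hypothetical; the actual data layout in the datafile may differ):

    function [x, y] = compute_measurements(img)
    % Compute the two measurements for a single image given as an
    % h-by-w matrix of pixel intensities (assumed input format).
        [h, w] = size(img);
        x = sum(sum(img(:, 1:floor(w/2)))) - sum(sum(img(:, floor(w/2)+1:end)));
        y = sum(sum(img(1:floor(h/2), :))) - sum(sum(img(floor(h/2)+1:end, :)));
    end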

We will assume that the class probability density functions are normal: p(x|k) ~ N(μk, σk). To estimate the densities, only their parameters μk and σk have to be estimated. We will find their maximum likelihood estimates.
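For a univariate normal distribution the maximum likelihood estimates have closed forms: μk is estimated by the sample mean and σk by the square root of the 1/N-normalised sample variance (see [2]). A minimal sketch, assuming x_train holds the training measurements of one class (the variable name is an assumption):

    % Closed-form ML estimates for a 1D normal distribution.
    % x_train: vector of one class's training measurements (assumed name).
    N = numel(x_train);
    mu_hat = sum(x_train) / N;                          % sample mean
    sigma_hat = sqrt(sum((x_train - mu_hat).^2) / N);   % note: 1/N, not 1/(N-1)
    % Equivalent built-ins: mean(x_train) and std(x_train, 1).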

The estimated a priori probabilities and probability density functions will be used to build a Bayesian classifier which, in turn, will be applied to the test set. With the 0/1 loss, this amounts to assigning a measurement x to class A if and only if p(x|A)P(A) ≥ p(x|C)P(C).

The task

  1. Load the datafile data_33rpz_cv04.mat into Matlab.

  2. For all the training sets, do the following:
    1. Compute the a priori probabilities P(A) and P(C).
    2. Compute the maximum likelihood estimates of the parameters μk and σk of the distributions p(x|A) and p(x|C).
      The estimation is explained in [2].
    3. For the μk computed in step 2.2, plot the log-likelihood function L from equation (2) in the support text [2] (or [1], eq. 5, page 87) as a function of σk. It is enough to do this for one class only, e.g. class A.

  3. Plot the estimates of p(x|A) and p(x|C) into one graph, together with a normalised histogram of the training set.

  4. Use the estimates to build a Bayesian classifier. Apply the classifier to the test set and compute the classification error.
    Hint: reuse the Bayesian classifier implementation from the previous labs. A sketch covering steps 2-4 follows this list.
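The following Matlab skeleton sketches steps 2-4 under stated assumptions: the training measurements arrive as vectors x_A and x_C, the test measurements as x_test, and the test labels as a character vector labels_test with values 'A'/'C' (all of these names are hypothetical; adapt them to the actual contents of data_33rpz_cv04.mat). The quantity plotted in step 2.3 is the log-likelihood L(σ) = sum_i ln p(xi|μk, σ) with μk fixed at its ML estimate.

    % Univariate normal pdf (avoids a Statistics Toolbox dependency).
    gauss = @(x, mu, s) exp(-(x - mu).^2 ./ (2 * s.^2)) ./ (s * sqrt(2 * pi));

    % --- Step 2.1: a priori probabilities from the class counts.
    nA = numel(x_A);  nC = numel(x_C);      % x_A, x_C: assumed training vectors
    P_A = nA / (nA + nC);
    P_C = nC / (nA + nC);

    % --- Step 2.2: ML estimates of the normal parameters (1/N normalisation).
    mu_A = mean(x_A);  sigma_A = std(x_A, 1);
    mu_C = mean(x_C);  sigma_C = std(x_C, 1);

    % --- Step 2.3: log-likelihood of the class A data as a function of sigma.
    sigmas = linspace(0.2 * sigma_A, 3 * sigma_A, 200);
    L = zeros(size(sigmas));
    for i = 1:numel(sigmas)
        L(i) = sum(log(gauss(x_A, mu_A, sigmas(i))));
    end
    figure; plot(sigmas, L);
    xlabel('\sigma_A'); ylabel('log-likelihood L(\sigma_A)');

    % --- Step 3: density estimates over a normalised histogram of the training set.
    x_all = [x_A(:); x_C(:)];
    [counts, centers] = hist(x_all, 20);
    binw = centers(2) - centers(1);
    figure; hold on;
    bar(centers, counts / (sum(counts) * binw));   % histogram scaled to unit area
    xs = linspace(min(x_all), max(x_all), 400);
    plot(xs, gauss(xs, mu_A, sigma_A), 'r');
    plot(xs, gauss(xs, mu_C, sigma_C), 'g');
    legend('training data', 'p(x|A)', 'p(x|C)');

    % --- Step 4: plug-in Bayes classifier and test error (0/1 loss).
    is_A = gauss(x_test, mu_A, sigma_A) * P_A >= gauss(x_test, mu_C, sigma_C) * P_C;
    test_error = mean(is_A(:) ~= (labels_test(:) == 'A'));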

Bonus task

  1. Repeat steps 2.1, 2.2 and 4 for the two-dimensional measurements X = (x, y)^T.

  2. As in step 3, display the distribution estimates (use the function pgauss) and the test set (use the function ppatterns); see the sketch after this list.
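A minimal sketch of the two-dimensional ML estimates for one class, assuming X_A is an N-by-2 matrix whose rows are the measurements (x, y) for class A (the name is an assumption; the multi-dimensional formulas are eqs. (13, 14) in [3]):

    % ML estimates of a 2D normal distribution for class A.
    N = size(X_A, 1);
    mu_A = mean(X_A, 1)';                 % 2-by-1 mean vector
    Xc = X_A - repmat(mu_A', N, 1);       % centred data (repmat for older Matlab)
    Sigma_A = (Xc' * Xc) / N;             % 2-by-2 ML covariance (1/N normalised)
    % Equivalent built-in: cov(X_A, 1).
    % For visualisation, use the STPR toolbox functions pgauss (Gaussian
    % contours) and ppatterns (data points) mentioned in the assignment.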

Recommended literature

[1] Richard O. Duda, Peter E. Hart, David G. Stork. Pattern Classification.
[2] Maximum Likelihood Parameter Estimation (short support text for labs)
[3] Maximálně věrohodný odhad (Maximum likelihood estimation; longer text in Czech, includes the multi-dimensional normal distribution estimates (13, 14) needed for the bonus task)
Created by Jan Šochman. Last modification 18.7.2011