The EM Algorithm for Nick Carter

Famous investigator Nick Carter got into troubles while solving a case of mysterious disappearance of lady Thun's dog. The only clue he was able to obtain was a set of photographs of the likely villain. Unfortunatelly, the photographs are of so low quality, that they are almost unreadable. 

How is Nick now missing his 33RPZ seminars, which he was light-headedly skipping during his studies. Well, it's now too late to feel sorry about it. On his own, he is unable to exploit the photographs he have and successfully solve the case. Fortunatelly, his old friend, inspector Kidney, got a bright idea, and told him about you, our talented students. 

So there we have Nick, in a dire need of your help. Can you help him to reveal the true identity of the villain?

Nick Carter
Figure 1. Nick Carter pondering the EM algorithm

The clues

So the unfortunate Nick has m corrupted images of Himg x Wimg pixels (Figure 2). He knows, that each image depicts the face of the villain. He suspects, that the faces do not occupy whole images - the image size of a face is only Himg x w pixels. I.e. a face spans whole height of the image, but is narrower. Thus, only the horizontal position of the face varies in the images (i.e. we can assume that horizontal position k is the only unknown). We also know that backround of the images (i.e. the image area outside the face itself) is constant - has constant intensity b. See figure 3. The images are very noisy (assume gaussian noise distribution here).

 

Nick Carter
Figure 2. One of the Nick's images



Nick Carter
Figure 3. Image composition. The face is at an unknown horizontal position k.

The stochastic model

We have a set of m images {X1, X2, ... , Xm}, of Himg × Wimg pixels. Xis are the images, each has n = Himg * Wimg pixels. In each image there is the villain's face F at position k. The face dimension is Himg x w pixels, pixels outside the face have constant intensity b. Each pixel Xi(r,c), i.e. pixel in i-th image at position (r, c), is observed with a gaussian noise N (0). Therefore, the probability of observing value Xi(r,c), assuming face at position ki, is:

 

  1. for background pixels (for c: 1 <= c < ki OR ki+w <= c <=Wimg ):

    P(xi(r,c) | ki) = N(b, σ)
  2. for face pixels (i.e. for c: ki <= c < ki+w ):

    P(xi(r,c) | ki) = N(F(r,c-ki+1), σ) .
The probability of observing image Xi, assuming face at position ki, can be written as a product over all pixels:

P( Xi | ki ) = Πr Πc P( xi(r,c) | ki) .

The probability of observing the set of all m images P( {X1,...,Xm} ) is

P( {X1,...,Xm} ) = Πi P( Xi) = Πi Σk P( Xi , k) = Πi Σk P(k) P( Xi | k) .

Taking the logarithm, we obtain the likelihood of the image set: L = log P( {X1,...,Xm} ) .

As Nick Carter correctly guessed, the most likely estimate of the villain's face F can be obtained using the EM algorithm. Hidden parameters are here the face positions ki , i=1,..,m in the images.

EM Formulation

The EM algorithm will be used for finding the Maximal Likelihood (ML)-estimate of  parameters F, b, σ

(F, b, σ) = argmax P({X1,...,Xm}|F, b, σ).

The hidden parameters are the face positions ki.

The EM algorithm iteratively executes in two steps, E-step and M-step:
  1. The E-step:
    In the E-step, coefficients α(k,i) are computed for current estimates of F, b, σ:

    α(k,i) = P(k) * P(Xi | k) / Σk [ P(k) * P(Xi | k) ] ,


    where k = 1,...,Wimg-w+1, i = 1,...,m
    The coefficients α(k,i) represent estimates of probability P(ki|Xi). In other words, here we estimate the probability, that i-th image contains the face F at position ki. The probability is evaluated for all images and for all possible positions.
  2. The M-step:
    In the M-step, parameters P(k), F, b, and σ are estimated (by maximising the lower bound of the likelihood) for current estimates of α(k,i). The probability P(k) is computed as the average of α(k,i) over all images i.

    P(k) = Σi α(k,i) / m ,

    where k = 1, ..., Wimg - w + 1, and the sum goes over i = 1, ..., m. The remaining parameters F, b, σ are estimated by maximising

    (F, b, σ) = argmax Σi Σk α(k,i) * log P(Xi | k, F, b, σ) ,         (1)

    k = 1, ..., Wimg - w + 1, and the sum again goes over  i = 1, ..., m. The equations for F, b, σ are obtained by setting respective partial derivations of eq. (1) to zero. We get

    F = [ Σi Σk α(k,i) * Xi (:, k:k+w-1) ] / m

    b = [ Σi Σk α(k,i) * S(k,i) ] / [ m * (n-Himg*w) ] ,

    σ2 = [ Σi Σk α(k,i) * ( A(k,i) + B(k,i) ) ] / (m*n) , 

    where
    S( k,i) = Σr Σc Xi( r,c) , r=1,..., Himg , c=[1:k-1, k+w:Wimg]
    A(k,i) = Σr Σc [ Xi ( r,k+c-1) - F(r,c) ]2 , r=1,...,Himg , c=1,...,w
    B( k,i) = Σr Σc [ Xi( r,c) - b ]2 , r=1,..., Himg , c=[1:k-1, k+w:Wimg]

    ( Try to derive the equation for b. Set the partial derivation of eq. (1) to zero, and express b. )

The tasks

  1. Go through the EM algorithm.
  2. Try to derive the equation for b (the background intensity) in the M-step.
  3. Use the EM-algorithm to obtain the maximally likely estimate of the villain's face F. In each iteration, display the actual estimate of F along with the likelihood of the estimate. The input images can be downloaded here from image_data.mat. The datafile contains 500 images of 45x60 pixels. The face width w is expected to be 36 pixels. Compute the estimate for 
    increasing m = 10,100,500.
  4. In the first 10 images, mark the face position ki which you have found, ki = argmax ki P( ki | Xi).
  5. pozice tváøe
    Figure 4. The mostlikely face position k1in the first image.

Hint

In the E-step, it is reccommended first to sum up all the exponents of the gaussian distributions, which occur in the products. Also, to avoid rounding errors in the exp function, it is reccommended, before the division takes place, to multiply both the numerator and the denumerator of the fraction by a large constant (i.e. by adding a large number to the exponents).

References

[1] Expectation-maximization algorithm (Wikipedia).
[2] Michail I. Schlesinger, Vaclav Hlavac. Ten Lectures on Statistical and Structural Pattern Recognition. Kluwer Academic Publishers, 2002. 

Created by Martin Urban, last update 18.7.2011