AdaBoost

This week's lab is about implementing the AdaBoost algorithm. You will train a classifier that classifies images of digits. There are two classes: the first ('positive') class consists of images of a single digit of your choice, and the second class contains all the remaining digits. The problem is therefore to distinguish the chosen digit from the remaining ones.
The trained classifier is a weighted combination of weak classifiers. Each weak classifier takes the value of a single pixel, thresholds it, and classifies the image as positive or negative depending on whether the pixel value is below or above the threshold.

Task formulation

The AdaBoost learning algorithm is described in [2], page 4. The term Z_t is a normalisation factor chosen so that D_{t+1} is a distribution (page 8). The double square brackets operator [[·]] yields 1 if the condition inside is true and 0 otherwise.
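
For orientation, one round of the algorithm can be written out as below. This is the common textbook form of discrete AdaBoost, given here as a sketch; the notation in [2] may differ in details, and the symbols m (number of training samples) and T (number of rounds) are assumptions of this summary.

```latex
\begin{align}
\epsilon_t &= \sum_{i=1}^{m} D_t(i)\,[\![\, h_t(x_i) \neq y_i \,]\!] \\
\alpha_t &= \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t} \\
D_{t+1}(i) &= \frac{D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr)}{Z_t},
\qquad Z_t = \sum_{i=1}^{m} D_t(i)\,\exp\bigl(-\alpha_t\, y_i\, h_t(x_i)\bigr) \\
H(x) &= \operatorname{sign}\Bigl(\sum_{t=1}^{T} \alpha_t\, h_t(x)\Bigr)
\end{align}
```

Dividing by Z_t is what makes the weights D_{t+1}(i) sum to one, and misclassified samples (y_i h_t(x_i) = -1) gain weight whenever α_t > 0.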

The training input is a set of 13x13 grayscale images of digits. The images are labelled, so the digit each one represents is known. Choose one of the digits as the positive class (y_i = +1); the remaining digits serve as counterexamples (negative class, y_i = -1).

The set H of weak classifiers consists of 13 x 13 = 169 weak classifiers, each of which processes a single pixel value. Each classifier is parametrised by two values: a threshold θ and a parity p ∈ {+1, -1}. The parity indicates whether the positive class lies below or above the threshold θ. A weak classifier therefore takes the pixel value at position (x,y), compares it with the threshold θ, and decides which class the image belongs to:

h_{x,y}(I) = sign[p * (I(x,y) - θ)].

The threshold θ and parity p are unknown and are determined during training.
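
As an illustration, a single weak classifier can be evaluated on a whole batch of vectorised images in one expression. The sketch below is not part of the provided lab code: the name weakClassify, the toy matrix X and the chosen threshold are made up for the example, and the images are assumed to be stored one per column, as in the reshape call used in the task.

```matlab
% Evaluate one single-pixel weak classifier on a batch of images.
% X     : 169 x N matrix, one vectorised 13x13 image per column (assumed layout)
% idx   : linear index (1..169) of the pixel the classifier looks at
% theta : threshold, p : parity (+1 or -1)
weakClassify = @(X, idx, theta, p) sign(p * (X(idx, :) - theta));

% Toy data (made up for illustration): 3 images, all pixels zero except one.
X = zeros(169, 3);
idx = sub2ind([13 13], 7, 7);     % centre pixel, linear index 85
X(idx, :) = [10 200 128];

h = weakClassify(X, idx, 128, +1);
disp(h)                           % prints -1 1 0
```

Note that sign returns 0 when the pixel value equals θ exactly, so an implementation needs some convention for that case (for example, assigning such samples to the positive class).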

The task

  1. Download the training (trn_data) and test (tst_data) data from data_rpz33_cv07.mat.
  2. Choose one digit as the positive class. The remaining digits form the negative class. The measurements are the intensities of all 13×13 pixels:
    		X = reshape(trn_data.images, 13*13, []);
    		
    and the class label is -1 for every sample except those of the chosen digit, where it is +1:
    		y = -ones(1,size(X,2)); y(trn_data.labels == digit) = 1;
    		
  3. Implement the AdaBoost training. To find an appropriate threshold and parity, use the function findThetaPar.
    Hint: The update of the weights D_{t+1} is easily implemented using vector notation.
  4. Display the evolution of the training and test errors in one figure.
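
The hint about vector notation in step 3 can be illustrated with a minimal sketch of the boosting loop on toy data. Everything here is an assumption made for the example: the 2 x 8 matrix X and labels y are invented stand-ins for the digit data, the brute-force threshold search only stands in for findThetaPar (whose interface is not shown here), and the tie-break for values equal to the threshold is an arbitrary choice.

```matlab
% Toy stand-in data (NOT the course data): 2 features x 8 samples.
X = [1 2 3 4 5 6 7 8;
     8 3 6 1 7 2 5 4];
y = [1 1 1 -1 -1 -1 1 1];              % labels in {-1, +1}

[nFeat, N] = size(X);
T = 5;                                 % number of boosting rounds
D = ones(1, N) / N;                    % D_1: uniform weights
F = zeros(1, N);                       % running sum of alpha_t * h_t(x_i)
trnErr = zeros(1, T);

for t = 1:T
    % Brute-force search for the best stump under the current weights.
    % In the lab, the provided findThetaPar plays this role.
    bestErr = Inf;
    bestH = zeros(1, N);
    for k = 1:nFeat
        for theta = unique(X(k, :))
            for p = [1 -1]
                h = sign(p * (X(k, :) - theta));
                h(h == 0) = 1;         % assumed tie-break when value == theta
                e = sum(D(h ~= y));    % weighted error of this stump
                if e < bestErr
                    bestErr = e;
                    bestH = h;
                end
            end
        end
    end

    e_t   = max(bestErr, 1e-10);       % guard against division by zero
    alpha = 0.5 * log((1 - e_t) / e_t);

    D = D .* exp(-alpha * y .* bestH); % vectorised weight update
    D = D / sum(D);                    % normalise: division by Z_t

    F = F + alpha * bestH;             % update the strong classifier response
    trnErr(t) = mean(sign(F) ~= y);    % training error after t rounds
end

plot(1:T, trnErr, 'b-o');
xlabel('boosting round t'); ylabel('error'); legend('training');
% A test curve computed the same way on held-out data can be added
% to the same figure with: hold on; plot(1:T, tstErr, 'r-o');
```

The two lines updating D are the whole weight update: the element-wise product y .* bestH is +1 for correctly classified samples and -1 for misclassified ones, so no loop over samples is needed.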

References

[1] short support text on AdaBoost
[2] AdaBoost lecture

Created by Jan Šochman, last update 18.7.2011