AdaBoost
This week's lab is about implementing the AdaBoost algorithm. You will train a
classifier for images of digits with two classes: the first ('positive') class
consists of images of a single digit (of your choice), and the second class
contains all the remaining digits. The problem is therefore to distinguish the
chosen digit from the remaining ones.
The trained classifier is a weighted combination of weak classifiers. Each weak
classifier takes the value of a single pixel, thresholds it, and classifies the
image as positive or negative depending on whether the pixel value is below or
above the threshold.
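In the standard AdaBoost formulation, the weak classifiers are combined as a weighted vote, H(x) = sign(Σ_t α_t h_t(x)). Below is a minimal Python sketch of that vote (the lab itself uses MATLAB). Storing the model as a list of (alpha, pixel, theta, parity) tuples is a hypothetical choice made only for illustration.

```python
def strong_classify(model, x):
    """Weighted vote H(x) = sign(sum_t alpha_t * h_t(x)).

    model: list of (alpha, pixel, theta, parity) tuples, one per weak classifier
    x: flattened image (sequence of pixel intensities)
    """
    score = 0.0
    for alpha, pixel, theta, parity in model:
        # Weak vote: +1 or -1 depending on which side of theta the pixel lies
        h = 1 if parity * (x[pixel] - theta) > 0 else -1
        score += alpha * h
    # A zero score is sent to the positive class (arbitrary tie-break)
    return 1 if score >= 0 else -1
```

For example, with `model = [(1.0, 0, 0.5, 1), (0.4, 1, 0.5, -1)]`, the image `[0.9, 0.9]` gets score 1.0 - 0.4 = 0.6 and is classified as +1.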
Task formulation
The AdaBoost learning is described in [2], page 4.
The term $Z_t$ is a normalisation factor chosen so that $D_{t+1}$ is a probability
distribution (page 8). The double square brackets operator [[·]] yields 1 if the
condition inside is met and 0 otherwise.
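For reference, round t of standard discrete AdaBoost computes the weighted error ε_t = Σ_i D_t(i) [[h_t(x_i) ≠ y_i]], the classifier weight α_t = ½ ln((1 − ε_t)/ε_t), and the update D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t. The notation in [2] may differ slightly, so treat this as an assumption. Here is a Python sketch of one such update. The function name is hypothetical.

```python
import math

def adaboost_reweight(D, y, h):
    """One standard AdaBoost weight update.

    D: current weights D_t (a list summing to 1)
    y: true labels in {+1, -1}
    h: weak-classifier outputs h_t(x_i) in {+1, -1}
    Returns (eps, alpha, D_next).
    """
    # Weighted error: sum of D_t(i) * [[y_i != h_t(x_i)]]
    eps = sum(d for d, yi, hi in zip(D, y, h) if yi != hi)
    # Weight of this weak classifier in the final vote (assumes 0 < eps < 1/2)
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Unnormalised update: misclassified samples grow, correct ones shrink
    w = [d * math.exp(-alpha * yi * hi) for d, yi, hi in zip(D, y, h)]
    Z = sum(w)  # normalisation factor Z_t
    return eps, alpha, [wi / Z for wi in w]
```

With four equally weighted samples and one mistake (`adaboost_reweight([0.25] * 4, [1, 1, -1, -1], [1, -1, -1, -1])`), ε = 0.25, and the misclassified sample's weight rises to 1/2 while the others drop to 1/6. After the update the misclassified samples always carry exactly half of the total weight.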
The training input is a set of 13×13 grayscale images of digits. Each image is
labelled with the digit it represents. Choose one of the digits as the positive
class ($y_i = +1$); the remaining digits serve as counterexamples (negative class,
$y_i = -1$).
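Here is a Python sketch of this data preparation (the lab itself uses MATLAB). It assumes each image is a 13×13 nested list indexed `img[row][col]`, which is a hypothetical layout. Pixels are read column by column to match MATLAB's column-major `reshape`.

```python
def flatten_column_major(img):
    """Flatten a square image column by column, like MATLAB's reshape."""
    n = len(img)
    return [img[r][c] for c in range(n) for r in range(n)]

def make_dataset(images, labels, digit):
    """Return (X, y): one flattened image per sample and labels in {+1, -1}.

    digit: the digit chosen as the positive class
    """
    X = [flatten_column_major(img) for img in images]
    y = [1 if lab == digit else -1 for lab in labels]
    return X, y
```

Each row of `X` then has 13 × 13 = 169 entries, one per candidate weak classifier.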
The set H of weak classifiers consists of 13 × 13 = 169 weak classifiers, one per
pixel position, each of which processes a single pixel value. Each classifier is
parametrised by two values: a threshold θ and a parity p ∈ {+1, -1}. The parity
determines whether the positive class lies below or above the threshold θ. A weak
classifier therefore takes the pixel value at position (x, y), compares it with
the threshold θ, and decides which class the image belongs to:
$$h_{x,y}(I) = \operatorname{sign}\big[p \, (I(x,y) - \theta)\big]$$
With p = +1, pixel values above θ are classified as positive; with p = -1, values
below θ are. The threshold θ and parity p are unknown and are determined during
training.
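The weak classifier maps onto a one-line function. In this Python sketch (names hypothetical), `x` is a flattened image and `pixel` is an index into it. Since sign(0) = 0, the tie I(x,y) = θ must be resolved somehow. Sending it to the positive class, as done here, is an arbitrary choice.

```python
def weak_classify(x, pixel, theta, parity):
    """h(I) = sign[p * (I(pixel) - theta)] with outputs in {+1, -1}.

    x: flattened image (sequence of intensities)
    pixel: index of the pixel this classifier looks at
    theta: threshold
    parity: +1 means values above theta are positive; -1 means values below are
    """
    v = parity * (x[pixel] - theta)
    # sign() would give 0 when x[pixel] == theta; ties go to +1 (arbitrary choice)
    return 1 if v >= 0 else -1
```

Flipping the parity flips every decision, so a classifier with weighted error ε becomes one with error 1 − ε. This is why searching both parities always yields an error of at most 1/2.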
The task
- Download the training (trn_data) and test (tst_data) data from data_rpz33_cv07.mat.
- Choose one digit as the positive class; the remaining digits form the negative
class. The measurements are the intensities of all 13×13 pixels:
X = reshape(trn_data.images, 13*13, []);
and the label is -1 for every sample except those of the chosen digit (stored in
digit), where it is +1:
y = -ones(1,size(X,2)); y(trn_data.labels == digit) = 1;
- Implement the AdaBoost training. To find the appropriate threshold and parity,
use the function findThetaPar.
Hint: the update of the weights $D_{t+1}$
is easily implemented in vector notation.
- Plot the evolution of the training and test errors (as a function of the number
of weak classifiers) in one figure.
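The steps above can be sketched end to end in Python (the lab itself uses MATLAB). The interface of findThetaPar is not described here, so `find_theta_par_sketch` is a hypothetical stand-in. It exhaustively searches pixels, thresholds (midpoints between distinct pixel values) and parities for the lowest weighted error. The weight update is written per sample; in MATLAB it becomes a single vectorised expression.

```python
import math

def find_theta_par_sketch(X, y, D):
    """Hypothetical stand-in for findThetaPar.

    Returns (weighted_error, pixel, theta, parity) of the best weak classifier.
    """
    best = (float("inf"), 0, 0.0, 1)
    for k in range(len(X[0])):
        vals = sorted(set(x[k] for x in X))
        # Candidate thresholds: one below all values, then midpoints
        cands = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])]
        for theta in cands:
            for p in (1, -1):
                err = sum(d for x, yi, d in zip(X, y, D)
                          if (1 if p * (x[k] - theta) > 0 else -1) != yi)
                if err < best[0]:
                    best = (err, k, theta, p)
    return best

def error_rate(scores, y):
    """Fraction of samples where sign(score) disagrees with the label."""
    return sum((1 if s >= 0 else -1) != yi for s, yi in zip(scores, y)) / len(y)

def adaboost(X, y, X_tst, y_tst, T):
    """Train up to T rounds; return the model and per-round train/test errors."""
    N = len(X)
    D = [1.0 / N] * N             # D_1: uniform weights
    model = []                    # (alpha, pixel, theta, parity) per round
    f_trn = [0.0] * N             # running sum of alpha_t * h_t(x) on train data
    f_tst = [0.0] * len(X_tst)    # the same on test data
    trn_err, tst_err = [], []
    for _ in range(T):
        eps, k, theta, p = find_theta_par_sketch(X, y, D)
        if eps >= 0.5:
            break                 # no weak classifier beats chance: stop
        eps = max(eps, 1e-10)     # avoid division by zero when eps == 0
        alpha = 0.5 * math.log((1 - eps) / eps)
        h = [1 if p * (x[k] - theta) > 0 else -1 for x in X]
        # D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h_t(x_i)) / Z_t
        w = [d * math.exp(-alpha * yi * hi) for d, yi, hi in zip(D, y, h)]
        Z = sum(w)
        D = [wi / Z for wi in w]
        model.append((alpha, k, theta, p))
        # Update the strong classifier's scores and record both errors
        f_trn = [f + alpha * hi for f, hi in zip(f_trn, h)]
        f_tst = [f + alpha * (1 if p * (x[k] - theta) > 0 else -1)
                 for f, x in zip(f_tst, X_tst)]
        trn_err.append(error_rate(f_trn, y))
        tst_err.append(error_rate(f_tst, y_tst))
    return model, trn_err, tst_err
```

Plotting `trn_err` and `tst_err` against `range(1, len(trn_err) + 1)` (for example with matplotlib) gives the requested figure. The exhaustive search is slow in pure Python on the full data set. The usual speed-up is to sort each pixel's values once and scan the thresholds with cumulative weight sums.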
References
[1] Short support text on AdaBoost.
[2] AdaBoost lecture.
Created by Jan Šochman, last update 18.7.2011