AdaBoost

Introduction

The AdaBoost algorithm combines a set of weak rules into a more precise resulting classifier. Let ${\cal T}_{XY} = \{(x_1,y_1),\ldots,(x_l,y_l)\}$ be a training set of observable vectors $x_i\in{\cal X}\subseteq{\mathbb{R}}^n$ and corresponding binary hidden states $y_i\in\{-1,1\}$. The AdaBoost algorithm sequentially calls a weak learner whose input is the set ${\cal T}_{XY}$ and a discrete distribution $D_t$ over the training examples. The task of the weak learner is to train a rule $q_t\colon{\cal X}\rightarrow\{-1,1\}$ which minimizes the weighted error

\begin{displaymath}
\varepsilon_t = \sum_{i=1}^l D_t(i) W(q_t(x_i),y_i) \:.
\end{displaymath} (3)
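
In code, the weighted error (3) is a single weighted sum. The sketch below fixes the conventions used by all the examples in this section (the variable names are illustrative, not the STPRtool interface): predictions and hidden states are $1\times l$ vectors over $\{-1,1\}$, and the distribution is a nonnegative $1\times l$ vector summing to one.

\begin{verbatim}
% Weighted 0/1 error (3) of a weak rule on the training set.
% pred ... [1 x l] predictions q_t(x_i), values in {-1,1}
% y    ... [1 x l] hidden states y_i, values in {-1,1}
% D    ... [1 x l] distribution D_t over the examples, sum(D) == 1
epsilon_t = sum( D .* (pred ~= y) );
\end{verbatim}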

$W(q_t(x_i),y_i)$ is the $0/1$-loss function, which attains $0$ for $q_t(x_i)=y_i$ and $1$ for $q_t(x_i)\neq y_i$. The weak learner is required to produce a rule $q_t$ with weighted error (3) at least slightly better than random guessing, i.e., $\varepsilon_t < 0.5$. AdaBoost combines the weak rules into a more precise final classifier

\begin{displaymath}
q(x) = \left \{ \begin{array}{rcl}
\displaystyle 1 & \mbox{for} & \displaystyle\sum_{t=1}^T \alpha_t q_t(x) \geq 0 \:, \\
\displaystyle -1 & \mbox{for} & \displaystyle\sum_{t=1}^T \alpha_t q_t(x) < 0 \:, \\
\end{array} \right .
\end{displaymath}

where $\alpha_t$, $t=1,\ldots,T$, are the weights assigned by AdaBoost to the individual rules. AdaBoost is proven to decrease in each step an upper bound on the empirical error

\begin{displaymath}
\varepsilon_{EMP} = \frac{1}{l}\sum_{i=1}^l W(q(x_i),y_i) \:.
\end{displaymath}

Therefore AdaBoost can produce a classifier with arbitrarily low empirical error $\varepsilon_{EMP}$, which, however, does not necessarily correspond to a low generalization error. This implies that the number $T$ of weak rules must be chosen appropriately. Cross-validation is a possible way to resolve this problem.
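
The following sketch shows how $T$ can be selected on one training/validation split; in full cross-validation the validation errors are averaged over the folds produced, e.g., by the function crossval. The weight formula $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ and the multiplicative update of $D_t$ used below are the standard AdaBoost choices, and train_weak_rule is a hypothetical learner of the rule (4) introduced below.

\begin{verbatim}
% AdaBoost with the validation error recorded for every T <= Tmax, so
% that the T minimizing the validation error can be picked afterwards.
% Xtrn [n x l], ytrn [1 x l] ... training fold; Xval, yval ... validation
% fold; train_weak_rule returns the parameters (j,b) of a weak rule of
% the type (4) minimizing the weighted error (3).
Tmax = 100;                               % maximal number of rounds considered
l = length(ytrn);
D = ones(1,l)/l;                          % uniform initial distribution D_1
Fval = zeros(1, length(yval));            % running sum of alpha_t*q_t(x) on Xval
val_err = zeros(1, Tmax);
for t = 1:Tmax
  [j, b] = train_weak_rule(Xtrn, ytrn, D);
  pred = 2*(Xtrn(j,:) >= b) - 1;          % predictions q_t(x_i) in {-1,1}
  e = sum(D .* (pred ~= ytrn));           % weighted error (3)
  alpha = 0.5*log((1-e)/e);               % standard rule weight, assumes 0 < e < 0.5
  D = D .* exp(-alpha * ytrn .* pred);    % standard multiplicative update of D_t
  D = D / sum(D);                         % renormalize to a distribution
  Fval = Fval + alpha * (2*(Xval(j,:) >= b) - 1);
  val_err(t) = mean(sign(Fval) ~= yval);  % error of the classifier with T = t rules
end
[dummy, bestT] = min(val_err);            % the selected number of weak rules
\end{verbatim}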

The selection of proper weak rules is an application-dependent problem. A useful example is the weak rule defined as

\begin{displaymath}
q_t(x) = \left \{ \begin{array}{rcl}
1 & \mbox{for} & [x]_j \geq b \:,\\
-1 & \mbox{for} & [x]_j < b \:,\\
\end{array} \right .
\end{displaymath} (4)

where $j\in\{1,\ldots,n\}$ is a feature index and $b\in{\mathbb{R}}$ is a bias. The parameters $j$ and $b$ are trained to minimize the weighted error (3). The weak rule (4) uses just a single feature $[x]_j$ for classification, thus the AdaBoost algorithm combining rules of this type can also serve as a feature selection method.
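
The weak learner for the rule (4) can be implemented as an exhaustive search over all features and all thresholds placed between consecutive sorted feature values. The sketch below is one possible implementation of the hypothetical train_weak_rule used above, not the STPRtool one.

\begin{verbatim}
% Exhaustive minimization of the weighted error (3) over the parameters
% j and b of the weak rule (4).
% X [n x l] ... training vectors, y [1 x l] ... hidden states in {-1,1},
% D [1 x l] ... distribution over the training examples.
function [best_j, best_b] = train_weak_rule(X, y, D)
  [n, l] = size(X);
  best_err = inf;
  for j = 1:n
    v = sort(X(j,:));
    % candidate biases: below all values, between neighbors, above all
    cand = [v(1)-1, (v(1:l-1)+v(2:l))/2, v(l)+1];
    for b = cand
      pred = 2*(X(j,:) >= b) - 1;         % rule (4)
      e = sum(D .* (pred ~= y));          % weighted error (3)
      if e < best_err
        best_err = e; best_j = j; best_b = b;
      end
    end
  end
\end{verbatim}

Sorting each feature once allows the weighted errors of all its thresholds to be accumulated in a single pass, which reduces the cost per boosting round from $O(nl^2)$ to $O(nl\log l)$; the brute-force version above is kept for clarity.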

Task assignment

  1. Implement the learning algorithm of the weak rule (4) and incorporate it into the AdaBoost algorithm implemented in the function adaboost of the STPRtool.
  2. Apply the AdaBoost algorithm to train a binary classifier for the Brodatz textures $1$ and $2$ [brodatz1_trn.mat]. Use the cross-validation procedure to select the proper number $T$ of weak rules.
  3. Validate the resulting AdaBoost classifier on the testing data [brodatz1_tst.mat]; an illustrative workflow is sketched after this list.
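
The overall experiment might then look as follows. The exact interfaces of adaboost and adaclass and the contents of brodatz1_trn.mat should be checked in the STPRtool help; the field and option names below are assumptions, not the documented API.

\begin{verbatim}
% Illustrative workflow only; consult 'help adaboost' and 'help adaclass'
% for the actual STPRtool interfaces -- field and option names are assumed.
trn = load('brodatz1_trn.mat');       % training data (assumed fields X, y)
tst = load('brodatz1_tst.mat');       % testing data
options.learner = 'weak_learner';     % assumed option: learner of the rule (4)
options.max_rules = bestT;            % assumed option: T chosen by cross-validation
model = adaboost(trn, options);       % train the final classifier
ypred = adaclass(tst.X, model);       % classify the testing data
tst_err = mean(ypred ~= tst.y)        % testing error estimate
\end{verbatim}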

Useful functions

adaboost_exp1 Example on AdaBoost.
weak_learner2d Example of weak learner for AdaBoost.
crossval Partitions data for cross-validation.
adaboost AdaBoost training algorithm.
adaclass AdaBoost classifier.



Vojtech Franc
2004-08-31