AdaBoost

Introduction

The AdaBoost algorithm combines a set of weak rules into a more precise resulting classifier. Let ${\cal T}_{XY} = \{(x_1,y_1),\ldots,(x_l,y_l)\}$ be a training set of observable vectors $x_i\in{\cal X}\subseteq{\mathbb{R}}^n$ and corresponding binary hidden states $y_i\in\{-1,1\}$. The AdaBoost algorithm sequentially calls a weak learner whose input is the set ${\cal T}_{XY}$ and a discrete distribution $D_t$ over the training examples. The task of the weak learner is to train a rule $q_t\colon{\cal X}\rightarrow\{-1,1\}$ which minimizes the weighted error

\begin{displaymath}
\varepsilon_t = \sum_{i=1}^l D_t(i) W(q_t(x_i),y_i) \:.
\end{displaymath} (3)
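
In code, the weighted error (3) is a single weighted sum. The sketch below fixes the conventions used by all the examples in this section (the variable names are illustrative, not the STPRtool interface): predictions and hidden states are $1\times l$ vectors over $\{-1,1\}$, and the distribution is a nonnegative $1\times l$ vector summing to one.

\begin{verbatim}
% Weighted 0/1 error (3) of a weak rule on the training set.
% pred ... [1 x l] predictions q_t(x_i), values in {-1,1}
% y    ... [1 x l] hidden states y_i, values in {-1,1}
% D    ... [1 x l] distribution D_t over the examples, sum(D) == 1
epsilon_t = sum( D .* (pred ~= y) );
\end{verbatim}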

$W(q_t(x_i),y_i)$ is the $0/1$-loss function, which attains $0$ for $q_t(x_i)=y_i$ and $1$ for $q_t(x_i)\neq y_i$. The weak learner is required to produce a rule $q_t$ with weighted error (3) at least slightly better than random guessing, i.e., $\varepsilon_t < 0.5$. AdaBoost combines the weak rules into a more precise final classifier

\begin{displaymath}
q(x) = \left \{ \begin{array}{rcl}
\displaystyle 1 & \mbox{for} & \displaystyle\sum_{t=1}^T \alpha_t q_t(x) \geq 0 \:, \\
\displaystyle -1 & \mbox{for} & \displaystyle\sum_{t=1}^T \alpha_t q_t(x) < 0 \:, \\
\end{array} \right .
\end{displaymath}

where $\alpha_t$, $t=1,\ldots,T$, are the weights assigned by AdaBoost to the individual rules. AdaBoost is proven to decrease in each step an upper bound on the empirical error

\begin{displaymath}
\varepsilon_{EMP} = \frac{1}{l}\sum_{i=1}^l W(q(x_i),y_i) \:.
\end{displaymath}

Therefore AdaBoost can produce a classifier with arbitrarily low empirical error $\varepsilon_{EMP}$, which, however, does not necessarily correspond to a low generalization error. This implies that the number $T$ of weak rules must be chosen appropriately. Cross-validation is a possible way to resolve this problem.
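
The following sketch shows how $T$ can be selected on one training/validation split; in full cross-validation the validation errors are averaged over the folds produced, e.g., by the function crossval. The weight formula $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ and the multiplicative update of $D_t$ used below are the standard AdaBoost choices, and train_weak_rule is a hypothetical learner of the rule (4) introduced below.

\begin{verbatim}
% AdaBoost with the validation error recorded for every T <= Tmax, so
% that the T minimizing the validation error can be picked afterwards.
% Xtrn [n x l], ytrn [1 x l] ... training fold; Xval, yval ... validation
% fold; train_weak_rule returns the parameters (j,b) of a weak rule of
% the type (4) minimizing the weighted error (3).
Tmax = 100;                               % maximal number of rounds considered
l = length(ytrn);
D = ones(1,l)/l;                          % uniform initial distribution D_1
Fval = zeros(1, length(yval));            % running sum of alpha_t*q_t(x) on Xval
val_err = zeros(1, Tmax);
for t = 1:Tmax
  [j, b] = train_weak_rule(Xtrn, ytrn, D);
  pred = 2*(Xtrn(j,:) >= b) - 1;          % predictions q_t(x_i) in {-1,1}
  e = sum(D .* (pred ~= ytrn));           % weighted error (3)
  alpha = 0.5*log((1-e)/e);               % standard rule weight, assumes 0 < e < 0.5
  D = D .* exp(-alpha * ytrn .* pred);    % standard multiplicative update of D_t
  D = D / sum(D);                         % renormalize to a distribution
  Fval = Fval + alpha * (2*(Xval(j,:) >= b) - 1);
  val_err(t) = mean(sign(Fval) ~= yval);  % error of the classifier with T = t rules
end
[dummy, bestT] = min(val_err);            % the selected number of weak rules
\end{verbatim}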

The selection of proper weak rules is an application-dependent problem. A useful example is the weak rule defined as

\begin{displaymath}
q_t(x) = \left \{ \begin{array}{rcl}
1 & \mbox{for} & [x]_j \geq b \:,\\
-1 & \mbox{for} & [x]_j < b \:,\\
\end{array} \right .
\end{displaymath} (4)

where $j\in\{1,\ldots,n\}$ is a feature index and $b\in{\mathbb{R}}$ is a bias. The parameters $j$ and $b$ are trained to minimize the weighted error (3). The weak rule (4) uses just a single feature $[x]_j$ for classification, thus the AdaBoost algorithm combining rules of this type can also serve as a feature selection method.
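
The weak learner for the rule (4) can be implemented as an exhaustive search over all features and all thresholds placed between consecutive sorted feature values. The sketch below is one possible implementation of the hypothetical train_weak_rule used above, not the STPRtool one.

\begin{verbatim}
% Exhaustive minimization of the weighted error (3) over the parameters
% j and b of the weak rule (4).
% X [n x l] ... training vectors, y [1 x l] ... hidden states in {-1,1},
% D [1 x l] ... distribution over the training examples.
function [best_j, best_b] = train_weak_rule(X, y, D)
  [n, l] = size(X);
  best_err = inf;
  for j = 1:n
    v = sort(X(j,:));
    % candidate biases: below all values, between neighbors, above all
    cand = [v(1)-1, (v(1:l-1)+v(2:l))/2, v(l)+1];
    for b = cand
      pred = 2*(X(j,:) >= b) - 1;         % rule (4)
      e = sum(D .* (pred ~= y));          % weighted error (3)
      if e < best_err
        best_err = e; best_j = j; best_b = b;
      end
    end
  end
\end{verbatim}

Sorting each feature once allows the weighted errors of all its thresholds to be accumulated in a single pass, which reduces the cost per boosting round from $O(nl^2)$ to $O(nl\log l)$; the brute-force version above is kept for clarity.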

Task assignment

  1. Implement the learning algorithm of the weak rule (4) and incorporate it into the AdaBoost algorithm implemented in the function adaboost of the STPRtool.
  2. Apply the AdaBoost algorithm to train a binary classifier for the Brodatz textures $1$ and $2$ [brodatz1_trn.mat]. Use the cross-validation procedure to select the proper number $T$ of weak rules.
  3. Validate the resulting AdaBoost classifier on the testing data [brodatz1_tst.mat]; an illustrative workflow is sketched after this list.
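
The overall experiment might then look as follows. The exact interfaces of adaboost and adaclass and the contents of brodatz1_trn.mat should be checked in the STPRtool help; the field and option names below are assumptions, not the documented API.

\begin{verbatim}
% Illustrative workflow only; consult 'help adaboost' and 'help adaclass'
% for the actual STPRtool interfaces -- field and option names are assumed.
trn = load('brodatz1_trn.mat');       % training data (assumed fields X, y)
tst = load('brodatz1_tst.mat');       % testing data
options.learner = 'weak_learner';     % assumed option: learner of the rule (4)
options.max_rules = bestT;            % assumed option: T chosen by cross-validation
model = adaboost(trn, options);       % train the final classifier
ypred = adaclass(tst.X, model);       % classify the testing data
tst_err = mean(ypred ~= tst.y)        % testing error estimate
\end{verbatim}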

Useful functions

adaboost_exp1 Example on AdaBoost.
weak_learner2d Example of weak learner for AdaBoost.
crossval Partitions data for cross-validation.
adaboost AdaBoost training algorithm.
adaclass AdaBoost classifier.



Vojtech Franc
2004-08-31