Support Vector Machines (SVM)

Today we will generalise the SVM classifier implemented last time to non-linear decision functions.

Task formulation

The generalisation of the linear SVM can be implemented in several ways, but all of them exploit the fact that, in the dual formulation, the data enter all computations only through dot products. The generalisation is then simple: the dot products are replaced by more general 'kernel functions'. The approach is explained in Chapter 4 of [4].
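
For reference, with a kernel K the soft-margin dual problem takes the standard form (the notation here is the usual one and may differ slightly from [4]):

    maximise    W(α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
    subject to  0 ≤ α_i ≤ C for all i,   Σ_i α_i y_i = 0

and the resulting decision function is

    f(x) = sign( Σ_i α_i y_i K(x_i, x) + b ).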

In our case we will use the RBF kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2)). This kernel implicitly maps the original data into a space of infinite dimension, so the mapped data themselves cannot be represented explicitly - it is not equivalent to expressing the data in a higher-dimensional space, as was done with the perceptron. Instead, we have to use the kernel function consistently everywhere the dot product appeared in the linear case. A sketch of the kernel computation follows.
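
A minimal MATLAB sketch of the kernel evaluation (the function name rbf_kernel and the columns-as-samples convention are our assumptions, not prescribed by the assignment):

    function K = rbf_kernel(X1, X2, sigma)
    % Compute the RBF kernel matrix between two sets of samples.
    %   X1 ... [d x m] data matrix (columns are samples)
    %   X2 ... [d x n] data matrix (columns are samples)
    %   K  ... [m x n] matrix, K(i,j) = exp(-||x1_i - x2_j||^2 / (2*sigma^2))
    sq1 = sum(X1.^2, 1);                              % 1 x m squared norms
    sq2 = sum(X2.^2, 1);                              % 1 x n squared norms
    D = bsxfun(@plus, sq1', sq2) - 2 * (X1' * X2);    % pairwise squared distances
    K = exp(-D / (2 * sigma^2));
    end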

The task

  1. (At home) Study the SVM generalisation to non-linear decision functions in Chapter 4 of [4].

  2. Generalise the SVM which you implemented last time. Use the quadratic mapping from equations (62-64). Demonstrate its functionality on a simple example (generated by createdata).

    Hint: For visualisation of the decision boundary, use the function pboundary in the same way as it was used for the perceptron. Implement the SVM classification function, i.e. replace your classif_quadrat_perc with classif_rbf_svm, which has the same arguments (a sketch is given after this list).
    The inputs to classif_rbf_svm are the test data, supplied automatically by the pboundary function, and a struct model, which contains whatever data you need to specify the classifier. The function outputs a vector of the same length as the test data, containing +1 or -1 depending on the class to which each test sample was classified.

  3. Experiment with several training data sets.

  4. Use your non-linear SVM classifier to classify characters from the file data_33rpz_cv08.mat, using two features (a sketch of their computation is given after this list):
    x = (sum of pixel intensities in the left half of the image) - (sum of pixel intensities in the right half of the image)
    y = (sum of pixel intensities in the top half of the image) - (sum of pixel intensities in the bottom half of the image)

  5. Experiment with different values of the parameters C and σ.
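
As referenced in the hint above, a minimal sketch of the kernelised SVM in MATLAB. It assumes the rbf_kernel helper from above, MATLAB's quadprog for solving the dual problem, and an illustrative layout of the model struct; all field and function names here are assumptions, not part of the assignment:

    function model = svm_train_rbf(X, y, C, sigma)
    % Train a soft-margin SVM with an RBF kernel by solving the dual QP.
    %   X ... [d x n] training data (columns are samples)
    %   y ... [1 x n] labels from {+1, -1}
    n = size(X, 2);
    K = rbf_kernel(X, X, sigma);
    H = (y' * y) .* K;                   % Hessian of the dual objective
    % min 0.5*a'*H*a - sum(a)   s.t.   y*a = 0,   0 <= a <= C
    alpha = quadprog(H, -ones(n, 1), [], [], y, 0, zeros(n, 1), C * ones(n, 1));
    on = alpha > 1e-6;                   % support vectors
    model.sv    = X(:, on);
    model.y     = y(on);
    model.alpha = alpha(on)';
    model.sigma = sigma;
    % bias from an unbounded support vector (0 < alpha < C)
    i = find(on & alpha < C - 1e-6, 1);
    model.b = y(i) - (model.alpha .* model.y) * rbf_kernel(model.sv, X(:, i), sigma);
    end

    function classes = classif_rbf_svm(X, model)
    % Classify test data (columns of X) with a trained RBF-kernel SVM.
    K = rbf_kernel(model.sv, X, model.sigma);                % [m x n]
    classes = sign((model.alpha .* model.y) * K + model.b);  % [1 x n] of +1/-1
    end

pboundary can then visualise the decision boundary by evaluating classif_rbf_svm on a grid of test points, exactly as with the perceptron.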
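
For the character features in task 4, a possible computation, assuming the characters arrive as an H x W x N image stack (the variable name images and this layout are assumptions; the actual contents of data_33rpz_cv08.mat may differ):

    imgs = double(images);                   % H x W x N stack of character images
    [H, W, N] = size(imgs);
    left   = squeeze(sum(sum(imgs(:, 1:floor(W/2), :), 1), 2));      % N x 1 sums
    right  = squeeze(sum(sum(imgs(:, floor(W/2)+1:end, :), 1), 2));
    top    = squeeze(sum(sum(imgs(1:floor(H/2), :, :), 1), 2));
    bottom = squeeze(sum(sum(imgs(floor(H/2)+1:end, :, :), 1), 2));
    X = [(left - right)'; (top - bottom)'];  % 2 x N matrix: feature x in row 1, y in row 2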

Bonus task

In the character classification task, find the optimal value of the parameter σ of the RBF kernel using cross-validation (see the Parzen windows lab). A possible outline is sketched below.
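
A possible outline, assuming the svm_train_rbf and classif_rbf_svm sketches from above; the fold count and the candidate grid are purely illustrative:

    sigmas = logspace(-1, 1, 10);        % candidate kernel widths (illustrative grid)
    k = 10;                              % number of folds
    N = size(X, 2);
    folds = mod(randperm(N), k) + 1;     % random fold assignment, values 1..k
    err = zeros(size(sigmas));
    for s = 1:numel(sigmas)
        for f = 1:k
            trn = folds ~= f;            % training mask for this fold
            model = svm_train_rbf(X(:, trn), y(trn), C, sigmas(s));
            pred  = classif_rbf_svm(X(:, ~trn), model);
            err(s) = err(s) + sum(pred ~= y(~trn));  % count validation errors
        end
    end
    [~, best] = min(err);
    sigma_opt = sigmas(best);            % width with the lowest validation error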

References

[1] Text of the exercise from the previous course.
[2] Lecture slides.
[3] Quadratic programming.
[4] Christopher J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
   
Created by Jan Šochman, last update 18.7.2011