Algorithm for Large-Scale Learning of SVM Classifiers

Vojtech Franc (Fraunhofer Institut FIRST IDA, Berlin, Germany)

Support Vector Machines (SVMs) have become one of the most prominent methods for learning classifiers from example data. Training an SVM leads to a convex Quadratic Programming (QP) problem whose size scales with the number of training examples. Despite considerable progress in the development of SVM solvers, learning from large-scale data remains a challenging problem that is intractable for conventional methods. Examples of large-scale problems can be found in bioinformatics, where the amount of available data (tens of millions of examples) exceeds the capabilities of existing solvers. We propose a new solver for linear SVMs which by far outperforms current state-of-the-art SVM solvers such as SVM-Light, SVM-Perf, and BMRM, achieving speedups of over 1000 on some datasets. Moreover, the core parts of the proposed solver can be parallelized effectively, allowing modern multi-processor architectures to be exploited for additional speedups. The proposed SVM solver was able to train on a dataset of 15 million examples in just 11 minutes, whereas a competing string-kernel SVM required over 27 hours to train on 10 million examples sub-sampled from the same dataset.
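To make the underlying optimization problem concrete: training a linear SVM amounts to minimizing the regularized hinge loss, 0.5*||w||^2 + C*sum_i max(0, 1 - y_i*(w.x_i + b)). The sketch below solves this primal objective by plain subgradient descent on a toy dataset. It is only a minimal baseline for illustration, not the cutting-plane solver proposed in the talk; all names and hyperparameters here are illustrative choices.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, epochs=200, lr=0.01):
    """Minimize 0.5*||w||^2 + C*sum(hinge loss) by subgradient descent.

    A baseline sketch only -- not the proposed large-scale solver.
    X: (n, d) feature matrix; y: labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # examples currently violating the margin
        # Subgradient of the objective w.r.t. w and b.
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data: two Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

The per-epoch cost is dominated by the matrix-vector products, which is why linear-SVM training time grows with the number of examples and why specialized solvers (cutting-plane methods, parallelization) matter at the scale of millions of examples.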