High-Dimensional Non-Linear Variable Selection through Hierarchical
Kernel Learning

Invited talk by Francis Bach

Abstract:
We consider the problem of high-dimensional non-linear variable selection for
supervised learning. Our approach is based on performing linear selection among
exponentially many appropriately defined positive definite kernels that
characterize non-linear interactions between the original variables. To select
efficiently from these many kernels, we use the natural hierarchical structure
of the problem to extend the multiple kernel learning framework to kernels that
can be embedded in a directed acyclic graph; we show that it is then possible to
perform kernel selection through a graph-adapted sparsity-inducing norm, in
polynomial time in the number of selected kernels. Moreover, we study the
consistency of variable selection in high-dimensional settings, showing that
under certain assumptions, our regularization framework allows a number of
irrelevant variables which is exponential in the number of observations. Our
simulations on synthetic datasets and datasets from the UCI repository show
state-of-the-art predictive performance for non-linear regression problems.