In this work we present a definition of "relevancy" based on spectral properties of the Laplacian of the features' measurement matrix. The feature-selection process is then based on a continuous ranking of the features defined by a least-squares optimization process. A remarkable property of the feature relevance function is that sparse solutions for the ranking values emerge naturally as a result of a "biased non-negativity" of a key matrix in the process. As a result, a simple least-squares optimization converges onto a sparse solution, i.e., a subset of features forming a local maximum of the relevance function. The feature-selection algorithm can be embedded in both unsupervised and supervised inference problems, and empirical evidence shows that the selected features typically achieve high accuracy even when only a small fraction of the features are relevant.
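As a rough illustration of the kind of scheme described above, the following toy sketch alternates between a spectral step (computing leading eigenvectors of the weighted affinity built from the feature measurement vectors) and a least-squares step that re-ranks the features. The function name, the normalization choices, and the alternating eigenvector formulation are my own illustrative assumptions, not the paper's exact algorithm; note in particular that non-negativity of the weights is not enforced here, it is only expected to emerge for structured data.

```python
import numpy as np

def rank_features(M, k=2, n_iter=20):
    """Toy sketch of a spectral feature-ranking scheme (illustrative only).

    M : (n_features, n_samples) array whose rows m_i are the
        measurement vectors of the individual features.
    Returns a unit-norm weight (ranking) vector alpha over features.
    """
    n, q = M.shape
    # Center and normalize each feature's measurement vector (assumption).
    M = M - M.mean(axis=1, keepdims=True)
    M = M / (np.linalg.norm(M, axis=1, keepdims=True) + 1e-12)
    alpha = np.full(n, 1.0 / np.sqrt(n))  # start from uniform weights
    for _ in range(n_iter):
        # Weighted affinity A = sum_i alpha_i m_i m_i^T  (q x q).
        A = (M * alpha[:, None]).T @ M
        # Spectral step: span of the k leading eigenvectors of A.
        _, V = np.linalg.eigh(A)
        Q = V[:, -k:]
        # Least-squares step: alpha becomes the leading eigenvector of
        # G_rs = (m_r^T m_s) * (m_r^T Q Q^T m_s).
        P = M @ Q
        G = (M @ M.T) * (P @ P.T)
        _, gV = np.linalg.eigh(G)
        alpha = gV[:, -1]
        if alpha.sum() < 0:  # fix the arbitrary eigenvector sign
            alpha = -alpha
    return alpha
```

On synthetic data where a few features share cluster structure and the rest are noise, the weight vector concentrates on the structured features, illustrating (in a toy setting) the sparsity phenomenon described above.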
Preliminary results were presented at ICCV'03. Since then we have extended the basic results to include "side information", topographical maps of all relevant feature subsets, a kernelization of the basic principle, and further applications, such as feature selection for gene-expression data.
This work was done jointly with Lior Wolf.