We also provide the post-processed T-Embedding features used in our experiments. These can be downloaded here, and were extracted using the software available here. These features are post-processed by removing the first 128 components and rotating by the PCA matrix. Power-law normalization is not applied by default, but can be applied to the provided features if necessary.
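Since power-law normalization is left to the user, the following is a minimal sketch of how it could be applied to one feature vector. It assumes the common signed form, sign(x) * |x|^alpha, followed by L2 re-normalization. The function name, the exponent value alpha = 0.5, and the re-normalization step are assumptions for illustration, not settings taken from our experiments.

```python
import math


def power_law_normalize(vec, alpha=0.5):
    """Apply signed power-law normalization, then L2-normalize.

    `alpha` is a hypothetical default; choose the exponent that suits
    your own setup.
    """
    # Raise each magnitude to the power alpha while keeping the sign.
    powered = [math.copysign(abs(x) ** alpha, x) for x in vec]
    # Re-normalize to unit L2 norm (assumed step, skipped for zero vectors).
    norm = math.sqrt(sum(x * x for x in powered))
    if norm == 0.0:
        return powered
    return [x / norm for x in powered]


# Usage: normalize a single feature vector loaded from one of the files.
normalized = power_law_normalize([4.0, -9.0], alpha=0.5)
```

The same function can be mapped over every row of a loaded feature matrix.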
All filenames follow the format below, where each #...# field is a placeholder:
#datasetname#_data_k_#vocabularysize#_dim#dimensionality#.mat
for data files
#datasetname#_query_k_#vocabularysize#_dim#dimensionality#.mat
for query files
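The naming format above can be filled in programmatically. This is a small sketch only: the helper name and the example values (dataset name, vocabulary size, dimensionality) are hypothetical and do not refer to specific files in the download.

```python
def feature_filename(dataset, kind, vocab_size, dim):
    """Build a .mat filename following the documented naming format.

    `kind` must be "data" or "query".
    """
    if kind not in ("data", "query"):
        raise ValueError("kind must be 'data' or 'query'")
    return "{}_{}_k_{}_dim{}.mat".format(dataset, kind, vocab_size, dim)


# Usage with placeholder values (not actual files from the download).
data_file = feature_filename("mydataset", "data", 16, 1920)
query_file = feature_filename("mydataset", "query", 16, 1920)
```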
We provide the features in both .mat and .fvecs formats. The Yael library is needed to load the .fvecs files.
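If Yael is not at hand, the .fvecs files can also be read directly. The sketch below assumes the standard .fvecs layout: each vector is stored as a 4-byte little-endian integer giving its dimension, followed by that many 4-byte floats. The function name is hypothetical, and the code is not part of Yael.

```python
import struct


def read_fvecs(path):
    """Read all vectors from an .fvecs file into a list of float lists."""
    vectors = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if not header:
                break  # clean end of file
            if len(header) < 4:
                raise ValueError("truncated dimension header")
            # Each record starts with its dimension as a little-endian int32.
            (dim,) = struct.unpack("<i", header)
            if dim <= 0:
                raise ValueError("invalid dimension: %d" % dim)
            payload = f.read(4 * dim)
            if len(payload) != 4 * dim:
                raise ValueError("truncated vector data")
            # The dimension is followed by `dim` little-endian float32 values.
            vectors.append(list(struct.unpack("<%df" % dim, payload)))
    return vectors
```

For large files, reading the whole file into a NumPy array and reshaping it is faster; the loop above favours clarity and has no dependencies beyond the standard library.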
Please contact ahmet.iscen [at] inria.fr with any questions.