What is the Met dataset?
The Met dataset is a large-scale dataset for Instance-Level Recognition (ILR) in the artwork domain.
- We rely on the open access collection of the Metropolitan Museum of Art (The Met) in New York to form the training set, which consists of about 400k images from more than 224k classes, with artworks offering worldwide geographic coverage and spanning chronological periods dating back to the Paleolithic era. Each museum exhibit corresponds to a unique artwork and defines its own class. The training set exhibits a long-tail distribution, with more than half of the classes represented by a single image, making it a special case of few-shot learning.
- We have established ground-truth for more than 1,100 images taken by museum visitors, which form the Met queries. The goal is to recognize the Met exhibits depicted in the Met queries. There is a distribution shift between these queries and the training images, which were captured in studio-like conditions. We additionally include a large set of images unrelated to The Met, which form an Out-Of-Distribution (OOD) query set, the distractor queries. No Met exhibit is depicted in the distractor queries. The full query set is the union of these two sets.
Evaluation
Recognition performance on the test set (queries) of the Met dataset is measured with average classification accuracy (ACC) on the Met queries, and with Global Average Precision (GAP) on all queries.
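The two metrics can be sketched as follows. This is a minimal illustration, not the official evaluation code: it assumes one predicted class and one confidence score per query, with distractor queries carrying a `None` label. ACC is averaged over Met queries only, while GAP ranks the predictions of all queries together by confidence, so confident wrong answers on distractor queries pull the score down.

```python
import numpy as np

def accuracy(preds, labels):
    """Mean classification accuracy over labeled (Met) queries only.
    labels[i] is None for distractor queries, which are ignored here."""
    met = [(p, y) for p, y in zip(preds, labels) if y is not None]
    return sum(p == y for p, y in met) / len(met)

def gap(preds, confidences, labels):
    """Global Average Precision (micro-AP) over ALL queries.
    Predictions from all queries are ranked together by confidence;
    a prediction at rank r is relevant iff its query is a Met query
    and the predicted class is correct. Distractor queries can only
    contribute false positives that lower precision at later ranks."""
    order = np.argsort(-np.asarray(confidences))   # highest confidence first
    n_pos = sum(y is not None for y in labels)     # number of Met queries
    correct, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] is not None and preds[i] == labels[i]:
            correct += 1
            total += correct / rank                # precision at this rank
    return total / n_pos
```

For example, a correct Met prediction ranked below a confident prediction on a distractor query earns less than full credit, which is exactly the OOD-rejection behavior GAP is meant to reward.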
Downloads
Dataset
The images of the dataset and the ground-truth files can be downloaded from the links below. All images have been resized so that their largest side is 500 pixels.
Embedding models
Models for descriptor extraction can be downloaded from the links below. They can be used directly to extract descriptors with the provided code for descriptor extraction.
- ResNet18-IN-SRC: trained on Met with contrastive loss (Syn+Real-Closest). Initialization: ImageNet pre-training
- ResNet18-SWSL-SRC: trained on Met with contrastive loss (Syn+Real-Closest). Initialization: SWSL
Descriptors
Descriptors extracted using the models above can be downloaded from the links below. They can be used directly with the provided code for kNN classification.
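A non-parametric kNN classifier over these descriptors can be sketched as below. This is a simplified stand-in for the provided kNN classification code: it assumes L2-normalized descriptors (so the dot product equals cosine similarity) and uses, per query, the best per-class similarity among the k nearest training images as the class confidence.

```python
import numpy as np

def knn_classify(query_desc, train_desc, train_labels, k=1):
    """Classify each query by cosine similarity to training descriptors.
    query_desc: (n_queries, d), train_desc: (n_train, d), both L2-normalized.
    Returns a list of predicted class ids and a list of confidences."""
    sims = query_desc @ train_desc.T                 # cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]          # k nearest per query
    preds, confs = [], []
    for row, idx in zip(sims, topk):
        # keep the best similarity achieved by each class among the top-k
        votes = {}
        for j in idx:
            c = train_labels[j]
            votes[c] = max(votes.get(c, -1.0), row[j])
        best = max(votes, key=votes.get)
        preds.append(best)
        confs.append(votes[best])
    return preds, confs
```

The returned confidences are what a GAP-style evaluation ranks across all queries; many classes having a single training image is why a simple nearest-neighbor rule is a natural baseline here.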
Code
Code is provided on GitHub to offer support for:
- using the dataset
- performing the evaluation
- reproducing experiments in the NeurIPS 2021 paper
Related publication
The Met Dataset: Instance-level Recognition for Artworks [ pdf | arXiv version | bib | poster | video ]
N.A. Ypsilantis, N. Garcia, G. Han, S. Ibrahimi, N. van Noord, G. Tolias
Accepted at NeurIPS 2021 Track on Datasets and Benchmarks.
(arXiv version includes the supplementary material.)