Top Benchmark Datasets

Roboflow hosts the most popular computer and machine vision benchmarking and transfer learning datasets. Datasets in this category include Microsoft COCO, Pascal VOC, MNIST, and more.

THE MNIST DATABASE of handwritten digits

Authors:

  • Yann LeCun, Courant Institute, NYU
  • Corinna Cortes, Google Labs, New York
  • Christopher J.C. Burges, Microsoft Research, Redmond

Dataset Obtained From: http://yann.lecun.com/exdb/mnist/

All images in the original dataset are 28x28 pixels.

The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.

It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

Version 1 (original-images_trainSetSplitBy80_20):

  • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
  • Trained from the Roboflow Classification Model's ImageNet training checkpoint
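
As a rough illustration of the 80/20 split described above, the sketch below shuffles the indices of the 60,000 training images and cuts them into training and validation subsets. The function name `split_train_valid`, the fixed seed, and the use of a random shuffle are assumptions made for this example; the source does not say how Roboflow selects images for each subset.

```python
import random


def split_train_valid(n_images, train_fraction=0.8, seed=0):
    """Shuffle image indices and split them into train and validation lists.

    Hypothetical helper: the shuffle and the seed are assumptions,
    not Roboflow's documented procedure.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    cut = round(n_images * train_fraction)
    return indices[:cut], indices[cut:]


# The original MNIST train set holds 60,000 images.
train_idx, valid_idx = split_train_valid(60_000)
print(len(train_idx), len(valid_idx))
```

Under these assumptions the split yields 48,000 training images and 12,000 validation images, with no image appearing in both.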

Version 2 (original-images_ModifiedClasses_trainSetSplitBy80_20):

  • Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
  • Modify Classes, a Roboflow preprocessing feature, was employed to change class names from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to zero, one, two, three, four, five, six, seven, eight, nine
  • Trained from the Roboflow Classification Model's ImageNet training checkpoint
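
The class renaming in this version amounts to a lookup from digit labels to word labels. The sketch below shows that mapping in plain Python. It is not the Roboflow Modify Classes feature itself; `CLASS_MAP` and `rename_labels` are hypothetical names, and the name `zero` for digit 0 is assumed here.

```python
# Word names indexed by digit; "zero" for digit 0 is an assumption.
DIGIT_NAMES = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]

# Map the original string labels "0".."9" to their word names.
CLASS_MAP = {str(digit): name for digit, name in enumerate(DIGIT_NAMES)}


def rename_labels(labels):
    """Return a new list with each digit label replaced by its word name."""
    return [CLASS_MAP[label] for label in labels]


print(rename_labels(["7", "0", "3"]))
```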

Version 3 (original-images_Original-MNIST-Splits):

  • Original images, with the original MNIST splits only: a train set (60,000 images, 86% of the total) and a test set (10,000 images, 14% of the total)
  • This version was not trained
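
The percentages quoted for the original splits follow from the image counts. A minimal check, using a hypothetical helper `split_percentages`:

```python
def split_percentages(counts):
    """Return each split's share of the total image count, in percent."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


# Image counts for the original MNIST splits, as stated above.
shares = split_percentages({"train": 60_000, "test": 10_000})
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

The train share is about 85.7% and the test share about 14.3%, which round to the 86% and 14% figures given for this version.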

Citation:

@article{lecun2010mnist,
  title={MNIST handwritten digit database},
  author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
  journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
  volume={2},
  year={2010}
}

Pascal VOC 2012 is a common benchmark for object detection. It contains common objects that one might find in images on the web.

Note: the test set is withheld, as is common with benchmark datasets.

You can think of it sort of like a baby COCO.

This is the full 2017 COCO object detection dataset (train and valid), which is a subset of the most recent 2020 COCO object detection dataset.

COCO is a large-scale object detection, segmentation, and captioning dataset of many object types easily recognizable by a 4-year-old. The data was initially collected and published by Microsoft. The original source of the data is here and the paper introducing the COCO dataset is here.