Top Object Detection Benchmark Datasets

Roboflow hosts the most popular computer and machine vision benchmarking and transfer learning datasets. Datasets in this category include Microsoft COCO, Pascal VOC (object detection), and more.

Example Annotations

About this Dataset

The Oxford Pets dataset (also known as the "dogs vs cats" dataset) is a collection of images and annotations labeling various breeds of dogs and cats. There are approximately 100 examples of each of the 37 breeds. This dataset contains the object detection portion of the original dataset with bounding boxes around the animals' heads.

Origin

This dataset was collected by the Visual Geometry Group (VGG) at the University of Oxford.

This dataset is a copy of a subset of the full Stanford Cars dataset.

The original dataset contained 16,185 images of 196 classes of cars.

The classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe in the original dataset, and in this subset of the full dataset (v3, TestData and v4, original_raw-images).

v4 (original_raw-images) contains a generated version of the original, raw images, without any modified classes.

v8 (classes-Modified_raw-images) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:

  1. bike, moped --remapped to--> motorbike
  2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
  3. rickshaw, boat, bicycle --> omitted

v9 (FAST-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:

  1. bike, moped --remapped to--> motorbike
  2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
  3. rickshaw, boat, bicycle --> omitted

v10 (ACCURATE-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:

  1. bike, moped --remapped to--> motorbike
  2. cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
  3. rickshaw, boat, bicycle --> omitted

Citation:

Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei. 3D Object Representations for Fine-Grained Categorization. 4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.
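
The remap-and-omit rules listed for v8 through v10 can be sketched in a few lines of Python. This is a minimal illustration, not Roboflow's Modify Classes implementation: the function name `modify_classes`, the `REMAP` and `OMIT` tables, and the `car_makes` argument are hypothetical, and the set of make-specific class names must be supplied by the caller because the text above does not enumerate it.

```python
from typing import Iterable, List, Set

# Hypothetical tables mirroring the rules listed above.
REMAP = {
    "bike": "motorbike",
    "moped": "motorbike",
    "cng": "vehicle",
    "leguna": "vehicle",
    "easybike": "vehicle",
    "smart fortwo Convertible 2012": "vehicle",
}
OMIT = {"rickshaw", "boat", "bicycle"}


def modify_classes(labels: Iterable[str], car_makes: Set[str]) -> List[str]:
    """Apply the omit and remap rules to a sequence of class names.

    car_makes is an assumed, caller-supplied set of make-specific class
    names (for example "Acura TL Type-S 2008") that collapse to "vehicle".
    """
    result = []
    for label in labels:
        if label in OMIT:
            continue  # omitted classes are dropped entirely
        if label in REMAP:
            result.append(REMAP[label])
        elif label in car_makes:
            result.append("vehicle")
        else:
            result.append(label)  # any other class is left unchanged
    return result
```

Calling `modify_classes(["bike", "boat", "cng"], set())` drops `boat` and returns the two remapped names.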

This is the full 2017 COCO object detection dataset (train and valid), which is a subset of the most recent 2020 COCO object detection dataset.

COCO is a large-scale object detection, segmentation, and captioning dataset of many object types easily recognizable by a 4-year-old. The data was initially collected and published by Microsoft. The original source of the data is here and the paper introducing the COCO dataset is here.

Overview

This dataset contains 581 images of various shellfish classes for object detection. These images are derived from the Open Images open source computer vision datasets.

This dataset only scratches the surface of the Open Images dataset for shellfish!

Use Cases

  • Train an object detector to differentiate between a lobster, shrimp, and crab.
  • Train an object detector to differentiate between shellfish
  • Object detection dataset across different sub-species
  • Object detection among related species
  • Test an object detector on highly related objects
  • Train a shellfish detector
  • Explore the quality and range of the Open Images dataset

Tools Used to Derive Dataset

These images were gathered via the OIDv4 Toolkit. This toolkit allows you to pick an object class and retrieve a set number of images from that class with bounding box labels.

We provide this dataset as an example of the ability to query the OID for a given subdomain. This dataset can easily be scaled up - please reach out to us if that interests you.

Pistols

Overview

This dataset contains 2986 images and 3448 labels across a single annotation class: pistols. Images are wide-ranging: pistols in-hand, cartoons, and staged studio quality images of guns.

The dataset was originally released by the University of Granada. A Roboflow user then removed duplicates and rehosted it.

Use Cases

One can create a gun object detection model to monitor security camera footage for the presence of guns, perhaps in places where they should not be. Alaa Senjab built on Roboflow to achieve this goal. He has also open-sourced much of his work in this tutorial.

Realtime gun detection

Overview

The PlantDoc dataset was originally published by researchers at the Indian Institute of Technology, and described in depth in their paper. One of the paper’s authors, Pratik Kayal, shared the object detection dataset available on GitHub.

PlantDoc is a dataset of 2,569 images across 13 plant species and 30 classes (diseased and healthy) for image classification and object detection. There are 8,851 labels. Read more about how the version available on Roboflow improves on the original version here.

And here's an example image:

Tomato Blight

Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 416x416 export.

Use Cases

As the researchers from IIT stated in their paper, “plant diseases alone cost the global economy around US$220 billion annually.” Training models to recognize plant diseases earlier dramatically increases yield potential.

The dataset also serves as a useful open dataset for benchmarks. The researchers trained both object detection models like MobileNet and Faster-RCNN and image classification models like VGG16, InceptionV3, and InceptionResnet V2.

The dataset is useful for advancing general agriculture computer vision tasks, whether that be healthy crop classification, plant disease classification, or plant disease object detection.

Using this Dataset

This dataset is released under a Creative Commons 4.0 license. You may use it commercially; the license grants no trademark or patent rights and comes with no liability or warranty.

Provide the following citation for the original authors:

@misc{singh2019plantdoc,
    title={PlantDoc: A Dataset for Visual Plant Disease Detection},
    author={Davinder Singh and Naman Jain and Pranjali Jain and Pratik Kayal and Sudhakar Kumawat and Nipun Batra},
    year={2019},
    eprint={1911.10317},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.

Overview

The original Udacity Self Driving Car Dataset is missing labels for thousands of pedestrians, bikers, cars, and traffic lights. This will result in poor model performance. When used in the context of self driving cars, this could even lead to human fatalities.

We re-labeled the dataset to correct errors and omissions. We have provided convenient downloads in many formats, including VOC XML, COCO JSON, TensorFlow Object Detection TFRecords, and more.
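
The main practical difference between two of these formats is how a bounding box is stored: VOC XML records corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while COCO JSON records the top-left corner plus width and height. The sketch below shows only that conversion. The function names are illustrative, and it ignores file parsing and any pixel-indexing offset between tools.

```python
from typing import List


def voc_to_coco(xmin: float, ymin: float, xmax: float, ymax: float) -> List[float]:
    """Corner-style box -> [x, y, width, height] (COCO-style)."""
    return [xmin, ymin, xmax - xmin, ymax - ymin]


def coco_to_voc(x: float, y: float, width: float, height: float) -> List[float]:
    """[x, y, width, height] box -> [xmin, ymin, xmax, ymax] (VOC-style)."""
    return [x, y, x + width, y + height]
```

Applying one conversion after the other returns the original box, which is a convenient sanity check when moving between exports.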

Stats

The dataset contains 97,942 labels across 11 classes and 15,000 images. There are 1,720 null examples (images with no labels).

All images are 1920x1200 (download size ~3.1 GB). We have also provided a version downsampled to 512x512 (download size ~580 MB) that is suitable for most common machine learning models (including YOLO v3, Mask R-CNN, SSD, and MobileNet).
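
If you resize the full-resolution images yourself, the box coordinates have to be rescaled with them. The sketch below assumes a plain stretch from 1920x1200 to 512x512 with no padding or cropping. That is an assumption, since the text does not say how the downsampled version was produced, and `scale_box` is an illustrative name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def scale_box(box: Box,
              src_size: Tuple[int, int] = (1920, 1200),
              dst_size: Tuple[int, int] = (512, 512)) -> Box:
    """Rescale a box for an image stretched from src_size to dst_size.

    Sizes are (width, height). Assumes a stretch resize, so x and y are
    scaled independently and the aspect ratio is not preserved.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    xmin, ymin, xmax, ymax = box
    return (xmin * dst_w / src_w,
            ymin * dst_h / src_h,
            xmax * dst_w / src_w,
            ymax * dst_h / src_h)
```

A resize that preserves aspect ratio by padding would need an extra offset term, which this sketch leaves out.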

Annotations have been hand-checked for accuracy by Roboflow.

Class Balance

Annotation Distribution: Annotation Heatmap

Use Cases

Udacity is building an open source self driving car! You might also try using this dataset to do person-detection and tracking.

Using this Dataset

Our updates to the dataset are released under the MIT License (the same license as the original annotations and images).

Note: the dataset contains many duplicated bounding boxes for the same subject, which we have not corrected. You will probably want to filter them by computing the IoU between boxes of the same class and dropping those that overlap 100%, or they could affect your model performance (especially in stoplight detection, which seems to suffer from an especially severe case of duplicated bounding boxes).
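
One way to do this filtering is sketched below. The annotation layout (a list of `(class_name, box)` pairs per image), the function names, and the default threshold are all assumptions for illustration. A threshold of 1.0 removes only exact duplicates, while a slightly lower value also catches near-duplicates.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Annotation = Tuple[str, Box]             # (class_name, box), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-style boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def drop_duplicates(annotations: List[Annotation],
                    threshold: float = 0.95) -> List[Annotation]:
    """Keep the first box of each group of same-class boxes whose IoU
    with an already kept box meets or exceeds the threshold."""
    kept: List[Annotation] = []
    for cls, box in annotations:
        is_duplicate = any(
            cls == kept_cls and iou(box, kept_box) >= threshold
            for kept_cls, kept_box in kept
        )
        if not is_duplicate:
            kept.append((cls, box))
    return kept
```

Run this per image rather than across the whole dataset. The comparison is quadratic in the number of boxes, which is fine at the scale of a single frame.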

Pascal VOC 2012 is a common benchmark for object detection. It contains common objects that one might find in images on the web.

Note: the test set is withheld, as is common with benchmark datasets.

You can think of it sort of like a baby COCO.

This is the labeled dataset for the 5 person, 5 ball video provided by the LPCV challenge.

  • Training set: 131 unique images
  • Validation set: 37 unique images
  • Test set: 19 unique images

There are 10 classes as follows:

Person 1 (Brown shirt), Person 2 (Dark blue shirt), Person 3 (Light blue shirt), Person 4 (Pink shirt), Person 5 (White shirt)

Ball 1 (Blue), Ball 2 (Orange), Ball 3 (Purple), Ball 4 (Red), Ball 5 (Yellow)

  • Data_V1: First instance of the dataset, not completely labeled
  • Data_V2: Completely labeled, with labels such as ball 1, person 1, etc.
  • Data_V3: Completely labeled, with labels such as ball_1, person_1, etc.
  • Data_V4: Completely labeled, with only ball and person
  • Data_V5: Same as Data_V4 but with less augmentation
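
The label styles differ only in naming, so one style can be derived from another with simple string handling. The helpers below are a hypothetical sketch, assuming labels of the form `ball 1` or `person_3`. They are not part of the dataset tooling.

```python
import re


def to_underscore_style(label: str) -> str:
    """'Ball 1' or 'ball 1' -> 'ball_1' (the Data_V3 naming)."""
    return re.sub(r"\s+", "_", label.strip().lower())


def to_merged_style(label: str) -> str:
    """'ball 1' or 'ball_1' -> 'ball' (the Data_V4 naming)."""
    return re.sub(r"[\s_]*\d+$", "", label.strip().lower())
```

Merging to two classes gives up per-identity tracking in exchange for many more examples per class.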