Top Featured Datasets
Open source featured computer vision datasets, pre-trained models, and APIs.
251 images of playing cricket, football & baseball.
Original Dataset from Kaggle - Bikram Saha
This dataset contains 251 images of playing cricket, football, and baseball.
(1) cricket - 95 images
(2) football - 77 images
(3) baseball - 79 images
This is a dataset for image classification in sports. This model will help to identify if the sport or activity occurring in the image or video feed is, or most closely resembles, cricket, football, or baseball.
The raw image versions (v1 or v5) of the dataset can be downloaded, or the entire dataset can be cloned, to your own project for image classification, or to label the figures in the images for object detection, instance or semantic segmentation, etc.
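As a sketch of that download step with the roboflow pip package (the API key, workspace, and project identifiers below are placeholders, not the real ones for this project):

# Pull a specific dataset version for local training. The key and
# IDs are placeholders -- copy the real ones from the dataset page.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project-id")

# v1 and v5 are the raw-image versions mentioned above; "folder"
# exports a classification-style directory tree.
dataset = project.version(1).download("folder")
print(dataset.location)  # local path to the exported dataset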
Nike, Adidas and Converse Shoes Dataset for Classification
This dataset was obtained from Kaggle: https://www.kaggle.com/datasets/die9origephit/nike-adidas-and-converse-imaged/
Dataset Collection Methodology:
"The dataset was obtained downloading images from Google images
. The images with a .webp
format were transformed into .jpg
images. The obtained images were randomly shuffled and resized so that all the images had a resolution of 240x240 pixels
. Then, they were split into train
and test
datasets and saved."
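A rough reconstruction of that pipeline for one class folder (paths and the split fraction are assumptions inferred from the per-class counts below; Pillow handles the .webp-to-.jpg conversion and the 240x240 resize):

# Sketch of the quoted collection pipeline, for one class folder.
import random
from pathlib import Path
from PIL import Image

SRC, OUT, CLASS = Path("downloads/nike"), Path("dataset"), "nike"
TEST_FRACTION = 38 / 275  # assumed from the 237 train / 38 test split

images = sorted(p for p in SRC.glob("*") if p.is_file())
random.shuffle(images)
n_train = round(len(images) * (1 - TEST_FRACTION))

for i, path in enumerate(images):
    img = Image.open(path).convert("RGB")  # also decodes .webp files
    img = img.resize((240, 240))           # uniform 240x240 resolution
    subset = "train" if i < n_train else "test"
    dest = OUT / subset / CLASS
    dest.mkdir(parents=True, exist_ok=True)
    img.save(dest / (path.stem + ".jpg"), "JPEG")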
Versions:
- v1: original_raw-images: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. These images are in the original train/test split from Kaggle: 237 images in each train set and 38 images in each test set.
- v2: original_trainTestSplit-augmented3x: the original train/test split, augmented with 3x image generation. This version was not trained with Roboflow Train.
- v3: original_trainTestSplit-augmented5x: the original train/test split, augmented with 5x image generation. This version was not trained with Roboflow Train.
- v4: rawImages_70-20-10split: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. Dataset splits were modified to a 70% train / 20% valid / 10% test split. NOTE: 70%/20%/10% split: 576 images in the train set, 166 images in the valid set, 83 images in the test set.
- v5: 70-20-10split-augmented3x: modified to a 70% train / 20% valid / 10% test split, augmented with 3x image generation. This version was trained with Roboflow Train.
- v6: 70-20-10split-augmented5x: modified to a 70% train / 20% valid / 10% test split, augmented with 5x image generation. This version was trained with Roboflow Train.
Project Overview:
The original goal was to use this model to monitor my rowing workouts and learn more about computer vision. To monitor the workouts, I needed the ability to identify the individual digits on the rowing machine. With the help of Roboflow's computer vision tools, such as assisted labeling, I was able to more quickly prepare, test, deploy and improve my YOLOv5 model.
Roboflow's Upload API, which is suitable for uploading images, video, and annotations, worked great with a custom app I developed to modify the predictions from the deployed model, and export them in a format that could be uploaded to my workspace on Roboflow.
What took me weeks to develop can now be done with a single click, utilizing Roboflow Train and the Upload API for Active Learning (dataset and model improvement).
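A hedged sketch of that active-learning loop with the roboflow package (identifiers and the confidence threshold are illustrative, not the ones used in this project):

# Run the deployed model on a frame and send low-confidence frames
# back to the workspace for relabeling via the Upload API.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("your-project-id")
model = project.version(1).model

result = model.predict("rowing_display.jpg", confidence=40).json()

# The 0.6 threshold is illustrative; tune it to your error analysis.
if any(p["confidence"] < 0.6 for p in result["predictions"]):
    project.upload("rowing_display.jpg", batch_name="active-learning")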
Dataset Classes:
1, 2, 3, 4, 5, 6, 7, 8, 9, 90 (class "90" is a stand-in for the digit zero)
This dataset consists of 841 images. There are images from a different rowing machine and also from this repo. Some scenes are illuminated with sunlight. Others have been cropped to include only the LCD. Digits like 7, 8, and 9 are underrepresented.
For more information:
- A full tutorial for creating and deploying this model with YOLOv5: https://vision.philbrockman.com/
People and Ladders Dataset
batch images cloned from:
This is an Instance Segmentation project for visualizing detected cracks on concrete. This dataset is usable for those doing transportation and public safety studies, creating self-driving car models, or testing out computer vision models for fun.
Check out this example guide from Augmented Startups to see the model in action and learn how the dataset came together: https://medium.com/augmented-startups/yolov7-segmentation-on-crack-using-roboflow-dataset-f13ae81b9958
Real-time Web Page Code Generation using YOLOR Framework
This model by Augmented Startups allows you to generate a web page instantly by detecting visual elements on a whiteboard.
Classes include:
short_paragraph: 2,105 annotations
button: 1,968 annotations
header: 1,948 annotations
banner: 1,859 annotations
long_paragraph: 1,307 annotations
form: 1,202 annotations
logo: 931 annotations
YOLOR Object Detection Course
YouTube video displaying the model.
Augmented Startups YouTube.
Fall Detection
dataset images and annotations were cloned from the following projects:
License Plate Detection
dataset images and annotations were cloned from the following projects:
- https://universe.roboflow.com/carplates2/licenseplate_v2
- https://universe.roboflow.com/a-stx8a/licenseplateimage_thresholdfiltered---roboflow/
- https://universe.roboflow.com/augmented-startups/vehicle-registration-plates-trudk
- https://universe.roboflow.com/xavier-jimenez/placas-stcpz/
- https://universe.roboflow.com/yolo-training-jaqog/yolo-training-wis72/
- https://universe.roboflow.com/abdullahi-ayantayo/car-license-plate-eosye/
Indoor Scene Recognition
From the official dataset page:
Indoor scene recognition is a challenging open problem in high level vision. Most scene recognition models that work well for outdoor scenes perform poorly in the indoor domain. The main difficulty is that while some indoor scenes (e.g. corridors) can be well characterized by global spatial properties, others (e.g., bookstores) are better characterized by the objects they contain. More generally, to address the indoor scenes recognition problem we need a model that can exploit local and global discriminative information.
Database
The database contains 67 Indoor categories... The number of images varies across categories, but there are at least 100 images per category. All images are in jpg format. The images provided here are for research purposes only.
Paper
A. Quattoni, and A.Torralba. Recognizing Indoor Scenes. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
Acknowledgments
Thanks to Aude Oliva for helping to create the database of indoor scenes.
Funding for this research was provided by NSF Career award (IIS 0747120)
THE MNIST DATABASE of handwritten digits
Authors:
- Yann LeCun, Courant Institute, NYU
- Corinna Cortes, Google Labs, New York
- Christopher J.C. Burges, Microsoft Research, Redmond
Dataset Obtained From: http://yann.lecun.com/exdb/mnist/
All images were sized 28x28 in the original dataset
The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.
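For quick experiments with these splits, the data can be pulled directly with torchvision (a minimal sketch; a Roboflow folder export would load the same way via ImageFolder):

# Load the original MNIST splits (60,000 train / 10,000 test).
from torchvision import datasets, transforms

train = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
test = datasets.MNIST(root="data", train=False, download=True,
                      transform=transforms.ToTensor())
print(len(train), len(test))  # 60000 10000, as described above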
Version 1 (original-images_trainSetSplitBy80_20):
- Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
- Trained from Roboflow Classification Model's ImageNet training checkpoint
Version 2 (original-images_ModifiedClasses_trainSetSplitBy80_20):
- Original, raw images, with the train set split to provide 80% of its images to the training set and 20% of its images to the validation set
- Modify Classes, a Roboflow preprocessing feature, was employed to change class names from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to one, two, three, four, five, six, seven, eight, nine
- Trained from the Roboflow Classification Model's ImageNet training checkpoint
Version 3 (original-images_Original-MNIST-Splits):
- Original images, with the original splits for MNIST: train (86% of images - 60,000 images) and test (14% of images - 10,000 images) sets only
- This version was not trained
Citation:
@article{lecun2010mnist,
title={MNIST handwritten digit database},
author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
volume={2},
year={2010}
}
Overview
The Weeds dataset is a collection of garden weeds that can easily confuse object detection models due to the similarity of the weeds to their surroundings. This dataset was used with YOLOR for object detection to detect weeds in complex backgrounds.
Example Footage!
Training and Deployment
The weeds model has been trained in Roboflow, available for inference on the Dataset tab.
One could also build a Weeds Detector using YOLOR. This is achieved using the Roboflow Platform, with which you can deploy the model for robust, real-time detections. You can learn more here: https://augmentedstartups.info/YOLOR-Get-Started
About Augmented Startups
We are at the forefront of Artificial Intelligence in computer vision. With over 92k subscribers on YouTube, we embark on fun and innovative projects in this field and create videos and courses so that everyone can be an expert. Our vision is to create a world full of inventors who can turn their dreams into reality.
This dataset was prepared with images cloned from: https://universe.roboflow.com/1122022/val-isxy7
The dataset was re-configured to have a 70/20/10 train-valid-test split.
Overview
This is a dataset of blood cells photos, originally open sourced by cosmicad and akshaylambda.
There are 364 images across three classes: WBC (white blood cells), RBC (red blood cells), and Platelets. There are 4,888 labels across 3 classes (and 0 null examples).
Here's a class count from Roboflow's Dataset Health Check:
And here's an example image:
Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 500x500 export.
Use Cases
This is a small scale object detection dataset, commonly used to assess model performance. It's a first example of medical imaging capabilities.
Using this Dataset
We're releasing the data as public domain. Feel free to use it for any purpose.
It's not required to provide attribution, but it'd be nice! :)
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
CSGO AIMBOT
Go Win
Trained on 5.9k Images
This project was created by downloading the GTSDB German Traffic Sign Detection Benchmark dataset from Kaggle and importing the annotated training set files (images and annotation files) to Roboflow.
https://www.kaggle.com/datasets/safabouguezzi/german-traffic-sign-detection-benchmark-gtsdb
- Original home of the dataset: https://benchmark.ini.rub.de/?section=gtsdb&subsection=dataset - Institut Für Neuroinformatik
The annotation files were adjusted to conform to the YOLO Keras TXT format prior to upload, as the original format did not include a label map file.
v1 contains the original imported images, without augmentations. This is the version to download and import to your own project if you'd like to add your own augmentations.
v2 contains an augmented version of the dataset, with annotations. This version of the project was trained with Roboflow's "FAST" model.
v3 contains an augmented version of the dataset, with annotations. This version of the project was trained with Roboflow's "ACCURATE" model.
This dataset was created by Harry Field and contains the labelled images for capturing the game state of a draughts/checkers 8x8 board.
This was a fun project to develop a mobile draughts application enabling users to interact with draughts-based software via their mobile device's camera.
The data captured consists of:
- White Pieces
- White Kings
- Black Pieces
- Black Kings
- Bottom left corner square
- Top left corner square
- Top right corner square
- Bottom right corner square
Corner squares are captured so the board locations of the detected pieces can be estimated.
From this data, the locations of other squares can be estimated and game state can be captured. The image below shows the data of a different board configuration being captured. Blue circles refer to squares, numbers refer to square index and the coloured circles refer to pieces.
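One way to implement that estimate (a sketch, not the author's code): treat the four detected corner-square centers as the corners of the board and bilinearly interpolate the remaining square centers.

# Estimate all 64 square centers from the centers of the four
# detected corner squares by bilinear interpolation.
import numpy as np

def square_centers(bl, tl, tr, br):
    """bl/tl/tr/br: (x, y) centers of the detected corner squares."""
    bl, tl, tr, br = map(np.asarray, (bl, tl, tr, br))
    centers = np.empty((8, 8, 2))
    for row in range(8):
        for col in range(8):
            u, v = col / 7.0, row / 7.0   # fractional board position
            top = tl + u * (tr - tl)      # interpolate along top edge
            bottom = bl + u * (br - bl)   # ... and along bottom edge
            centers[row, col] = top + v * (bottom - top)
    return centers

Matching each detected piece to its nearest interpolated center then yields its board position.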
Once game state is captured, integration with other software becomes possible. In this example, I created a simple move suggestion mobile application, seen working here.
The developed application is a proof of concept and is not available to the public. Further development is required in training the model across multiple draughts boards and implementing features to add value to the physical draughts game.
The dataset consists of 759 images and was trained using YOLOv5 with a 70/20/10 split.
The output of YOLOv5 was parsed and filtered to correct for duplicated/overlapping detections before game state could be determined.
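One plausible filtering pass (an assumption, not necessarily what was done here): greedily keep the highest-confidence box and drop any box that overlaps a kept box beyond an IoU threshold.

# Greedy IoU-based de-duplication of detections.
def iou(a, b):
    ax1, ay1, ax2, ay2 = a; bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2-ax1)*(ay2-ay1) + (bx2-bx1)*(by2-by1) - inter
    return inter / union if union else 0.0

def dedupe(dets, thresh=0.5):
    """dets: list of (box, confidence); box = (x1, y1, x2, y2)."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in dets:
        if all(iou(box, k[0]) < thresh for k in kept):
            kept.append((box, conf))
    return kept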
I hope you find this dataset useful and if you have any questions feel free to drop me a message on LinkedIn as per the link above.
This project aims to create an efficient computer vision model to detect different kinds of construction equipment on construction sites, starting with three classes: excavators, trucks, and wheel loaders.
The dataset is provided by Mohamed Sabek, a Spring 2022 Master of Science graduate from Arizona State University in Construction Management and Technology.
The raw images (v1) contain:
- 1,532 annotated examples of "excavators"
- 1,269 annotated examples of "dump truck"
- 1,080 annotated examples of "wheel loader"
Note: versions 2 and 3 (v2 and v3) contain the raw images resized at 416 by 416 (stretch to) and 640 by 640 (stretch to) without any augmentations.
BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos
Authors:
- Elizabeth Bondi, Harvard University
- Raghav Jain, University of Southern California
- Palash Aggrawal, Indraprastha Institute of Information Technology
- Saket Anand, Indraprastha Institute of Information Technology
- Robert Hannaford, Duke University
- Ashish Kapoor, University of Delhi
- Jim Piavis, The Citadel
- Shital Shah, University of Mumbai
- Lucas Joppa, Chief Environmental Officer, Microsoft
- Bistra Dilkina, University of Southern California
- Milind Tambe, Harvard University
Published: 2020
Description: The Benchmarking IR Dataset for Surveillance with Aerial Intelligence (BIRDSAI, pronounced bird's-eye) is a long-wave thermal infrared dataset containing nighttime images of animals and humans in Southern Africa. The dataset allows for benchmarking of algorithms for automatic detection and tracking of humans and animals with both real and synthetic videos.
Use Cases: Wildlife Poaching Prevention, Night-time Intruder Detection, Wildlife Monitoring, Animal Behavior Research, Long Distance IR Detection
Download: The data can be downloaded from the Labeled Information Library of Alexandria
Training Dataset Download: https://lilablobssc.blob.core.windows.net/conservationdrones/v01/conservation_drones_train_real.zip
Annotation Format:
We follow the MOT annotation format, which is a CSV with the following columns:
<frame_number>, <object_id>, <x>, <y>, <w>, <h>, <class>, <species>, <occlusion>, <noise>
class: 0 if animals, 1 if humans
species: between -1 and 8 representing species below; 3 and 4 occur only in real data; 5, 6, 7, 8 occur only in synthetic data (note: most very small objects have unknown species)
-1: unknown, 0: human, 1: elephant, 2: lion, 3: giraffe, 4: dog, 5: crocodile, 6: hippo, 7: zebra, 8: rhino
occlusion: 0 if there is no occlusion, 1 if there is an occlusion (i.e., either occluding or occluded) (note: intersection over union threshold of 0.3 used to assign occlusion; more details in paper)
noise: 0 if there is no noise, 1 if there is noise (note: noise labels were interpolated from object locations in previous and next frames; for more than 4 consecutive frames without labels, no noise labels were included; more details in paper)
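A small reader for this format (a sketch; it assumes one CSV per video, as in the training download above):

# Parse BIRDSAI MOT-style annotation rows into dicts.
import csv

SPECIES = {-1: "unknown", 0: "human", 1: "elephant", 2: "lion",
           3: "giraffe", 4: "dog", 5: "crocodile", 6: "hippo",
           7: "zebra", 8: "rhino"}

def read_annotations(path):
    rows = []
    with open(path, newline="") as f:
        for frame, obj_id, x, y, w, h, cls, species, occl, noise in csv.reader(f):
            rows.append({
                "frame": int(frame), "object_id": int(obj_id),
                "box": (int(x), int(y), int(w), int(h)),
                "class": "human" if int(cls) == 1 else "animal",
                "species": SPECIES[int(species)],
                "occluded": occl == "1", "noisy": noise == "1",
            })
    return rows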
Acknowledgements: BIRDSAI was supported by Microsoft AI for Earth, NSF CCF-1522054 and IIS-1850477, MURI W911NF-17-1-0370, and the Infosys Center for Artificial Intelligence, IIIT-Delhi. Thanks to the labeling team and the Labeled Information Library of Alexandria for hosting the data.
Citation:
@inproceedings{bondi2020birdsai,
title={BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos},
author={Bondi, Elizabeth and Jain, Raghav and Aggrawal, Palash and Anand, Saket and Hannaford, Robert and Kapoor, Ashish and Piavis, Jim and Shah, Shital and Joppa, Lucas and Dilkina, Bistra and Tambe, Milind},
booktitle={WACV},
year={2020}
}
This project was created by Arfiani Nur Sayidah and is for sorting "apples" from "damaged apples."
The classes are "apple" and "damaged_apples"
Original Class Balance:
- apple: 2,152
- damaged_apple: 708
Golf Ball Object Detection
Usage
This model will perform best on images + videos that are taken on a golf course (similar to photo in thumbnail and dataset).
It's a great model for sports broadcasting and other apps to have automated ball tracking, scoring, lost ball finding and more!
This is a U.S. license plate dataset + model using object detection. The images for this dataset were collected from Google images and around Central Florida parks. If you see your license plate in this dataset and you wish to remove it, please contact friends@roboflow.com
Try it out on this example web app or deploy to Luxonis OAK.
TACO: Trash Annotations in Context Dataset
From: Pedro F. Proença; Pedro Simões
- For more information, go to: http://tacodataset.org
- https://github.com/pedropro/TACO
TACO is a growing image dataset of trash in the wild. It contains segmented images of litter taken in diverse environments: woods, roads, and beaches. These images are manually labeled according to a hierarchical taxonomy to train and evaluate object detection algorithms. Annotations are provided in a format similar to the COCO dataset.
The model in action:
Example images from the dataset:
For more details and to cite the authors:
- Paper: https://arxiv.org/abs/2003.06975
- Paper Citation:
@article{taco2020,
title={TACO: Trash Annotations in Context for Litter Detection},
author={Pedro F Proença and Pedro Simões},
journal={arXiv preprint arXiv:2003.06975},
year={2020}
}
Soybeans kernels counter
Personal Protective Equipment Dataset and Model
This dataset is a collection of images that contains annotations for the classes below:
- goggles
- helmet
- mask
- no-suit
- no_goggles
- no_helmet
- no_mask
- no_shoes
- shoes
- suit
- no_glove
- glove
Usage
Most of these classes are underrepresented and would need to be balanced for better detection. An improved model could be used to detect the classes above in order to minimize exposure to hazards that cause serious workplace injuries.
The dataset comes from Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure - creators of CascadeTabNet.
Depending on the dataset version downloaded, the images will include annotations for 'borderless' tables, 'bordered' tables, and 'cells'. Borderless tables are those in which the cells have no borders. Bordered tables are those in which every cell has a border and the table itself is bordered. Cells are the individual data points within the table.
A subset of the full dataset, the ICDAR Table Cells Dataset, was extracted and imported to Roboflow to create this hosted version of the Cascade TabNet project. All the additional dataset components used in the full project are available here: All Files.
Versions:
- Version 1, raw-images: 342 raw images of tables. No augmentations; the only preprocessing step added was Auto-Orient.
- Version 2, tableBordersOnly-rawImages: 342 raw images of tables. This dataset version contains the same images as version 1, but with Modify Classes applied to omit the 'cell' class from all images (rendering these images apt for creating a model to detect 'borderless' and 'bordered' tables).
For the versions below, a preprocessing step of Resize (416x416, Fit within - white edges) was added along with more augmentations to increase the size of the training set and make the images more uniform. Preprocessing applies to all images, whereas augmentations only apply to training set images.
- Version 3, augmented-FAST-model: 818 images of tables. Trained from scratch (no transfer learning) with the "Fast" model from Roboflow Train. 3x augmentation (generated images).
- Version 4, augmented-ACCURATE-model: 818 images of tables. Trained from scratch with the "Accurate" model from Roboflow Train. 3x augmentation.
- Version 5, tableBordersOnly-augmented-FAST-model: 818 images of tables. 'Cell' class omitted with Modify Classes. Trained from scratch with the "Fast" model from Roboflow Train. 3x augmentation.
- Version 6, tableBordersOnly-augmented-ACCURATE-model: 818 images of tables. 'Cell' class omitted with Modify Classes. Trained from scratch with the "Accurate" model from Roboflow Train. 3x augmentation.
Example Image from the Dataset
Cascade TabNet in Action
CascadeTabNet is an automatic table recognition method for interpretation of tabular data in document images. We present an improved deep learning-based end to end approach for solving both problems of table detection and structure recognition using a single Convolution Neural Network (CNN) model. CascadeTabNet is a Cascade mask Region-based CNN High-Resolution Network (Cascade mask R-CNN HRNet) based model that detects the regions of tables and recognizes the structural body cells from the detected tables at the same time. We evaluate our results on ICDAR 2013, ICDAR 2019 and TableBank public datasets. We achieved 3rd rank in ICDAR 2019 post-competition results for table detection while attaining the best accuracy results for the ICDAR 2013 and TableBank dataset. We also attain the highest accuracy results on the ICDAR 2019 table structure recognition dataset.
From the Original Authors:
If you find this work useful for your research, please cite our paper:
@misc{ cascadetabnet2020,
title={CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents},
author={Devashish Prasad and Ayan Gadpal and Kshitij Kapadni and Manish Visave and Kavita Sultanpure},
year={2020},
eprint={2004.12629},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
SkyBot
This is the dataset powering http://skybot.cam, an app that captures planes flying over my house.
Upon the project gaining popularity on Hacker News from the above tweet, I thought I'd share the dataset and an example model to make it easier for others to build a plane spotting app, too.
About this Project
I built a system to take photos of all of the airplanes that fly over my house. Most of these planes are passing by at more than 30,000 feet! It uses ADS-B to track where the aircraft are relative to the camera, points the camera in the right direction, and snaps a photo. I then run a few serverless functions to detect where the aircraft is in the image and make a thumbnail. Most of the services are hosted on Azure. There are more details on the overall project here: http://skybot.cam/about. The project is open source as a part of my work at IQT as well.
About the Dataset
The dataset is of aircraft that were captured as they flew overhead. It includes a mix of large and small passenger jets and an assortment of business jets. There are also images with buildings and contrails where no aircraft is present.
Use Cases
This dataset should allow a plane detection model to be built, for use cases like plane spotting.
About Me
I'm Luke Berndt, I work on Azure products at Microsoft. You can learn more about me here: http://lukeberndt.com/
Overview
The Roboflow Mask Wearing iOS dataset is an object detection dataset of individuals wearing various types of masks and those without masks. A subset of the images were originally collected by Cheng Hsun Teng from Eden Social Welfare Foundation, Taiwan, and relabeled by the Roboflow team.
Example images (with masks, and without):
Use Cases
One could use this dataset to build a system for detecting if an individual is wearing a mask in a given photo. PPE detection in high-risk work settings, or general health safety settings are other good use cases.
The dataset has a few batches of images collected only from iPhones, so as to help improve the performance of model predictions on iPhones with the Roboflow Mobile iOS SDK.
Using this Dataset
Use the Download this Dataset button to download and import this dataset to your own Roboflow account and export it with new preprocessing settings, perhaps resized for your model's desired format or converted to grayscale, or with additional augmentations to make your model generalize better.
You can also import this dataset to your own Roboflow account and export it, or continue working on it on Roboflow to test, improve, and deploy your model.
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
This project was created after collecting images from retail coolers at Walgreens stores in Chicago, Illinois and Iowa City, Iowa.
All products are marked/labeled as product, and all empty spaces are labeled as empty.
This is a good starter dataset for anyone interested in doing "void space" calculations (percent of space that is empty), product inventory counts, or making more custom classes for individual product inventory counts.
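As an illustration of the void-space idea (the detection fields follow Roboflow's center-format response shape with 'class', 'width', and 'height'; the helper itself is hypothetical):

# Share of detected shelf area labeled `empty` versus `product`.
def void_space_pct(detections):
    """detections: list of dicts, e.g. parsed from a Roboflow
    inference response's "predictions" list."""
    area = {"empty": 0.0, "product": 0.0}
    for d in detections:
        if d["class"] in area:
            area[d["class"]] += d["width"] * d["height"]
    total = area["empty"] + area["product"]
    return 100.0 * area["empty"] / total if total else 0.0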
Versions 2 and 3 were trained from the Microsoft COCO model checkpoint.
Versions 4 and 5 were trained from the SKU-110k model's training checkpoint, from the Roboflow Universe Retail datasets page.
For more, check out the blog posts below:
Background Information
This dataset was curated and annotated by Syed Salman Reza. It is a custom dataset composed of two classes (With Helmet, Without Helmet). The main objective is to identify whether a biker is wearing a helmet or not.
The original custom dataset (v1) is composed of 1,371 images of people with and without bike helmets.
The dataset is available under the Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 1,371 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: Augmentations applied prior to import - Bounding Box Blur (up to 10px)
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- mAP = 74.4%, precision = 54.0%, recall = 77.0%
Version 2 (v2) - 3,735 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: Augmentations applied prior to import - Bounding Box Blur.
- New augmentations:
Outputs per training example: 3
Rotation: Between -30° and +30°
Shear: ±15° Horizontal, ±15° Vertical
Blur: Up to 1.5px
Mosaic: Applied
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- mAP = 91.5%, precision = 65.1%, recall = 92.8%
Syed Salman Reza - Github
Use this to make a scoring app of rock paper scissors (Cheaters beware)!
This dataset contains images of the rock, paper, scissors hand gestures that are detected in the model and can be used in gaming. Repo with counter (Human vs Computer) here
This dataset needs additional data on:
- diverse representation
- null images
This dataset is a copy of a subset of the full Stanford Cars dataset
The original dataset contained 16,185 images of 196 classes of cars.
The classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe in the original dataset, and in this subset of the full dataset (v3, TestData, and v4, original_raw-images).
v4 (original_raw-images) contains a generated version of the original, raw images, without any modified classes.
v8 (classes-Modified_raw-images) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
- bike, moped --remapped to--> motorbike
- cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
- rickshaw, boat, bicycle --> omitted
v9 (FAST-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
- bike, moped --remapped to--> motorbike
- cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
- rickshaw, boat, bicycle --> omitted
v10 (ACCURATE-model_mergedAllClasses-augmented_by3x) contains a generated version of the raw images, with the Modify Classes preprocessing feature used to remap or omit the following classes:
- bike, moped --remapped to--> motorbike
- cng, leguna, easybike, smart fortwo Convertible 2012, and all other specific car makes with named classes (such as Acura TL Type-S 2008) --remapped to--> vehicle
- rickshaw, boat, bicycle --> omitted
Citation:
3D Object Representations for Fine-Grained Categorization
Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei
4th IEEE Workshop on 3D Representation and Recognition, at ICCV 2013 (3dRR-13). Sydney, Australia. Dec. 8, 2013.
This dataset contains images of 3D printer failures across a variety of print jobs spanning several types of printer, material, models, and colors. It can be used as the starting point for creating a computer vision model that monitors a print job and aborts it as soon as it becomes evident that there is a problem (and alerts you before wasting a ton of time and money on materials). It can also be used to do automated quality assurance on finished models to make sure they do not exhibit common defects.
A dataset of 276 images is provided, along with a pre-trained model you can try in your browser and deploy to several different edge devices.
Introducing RICK: Saving the Internet from Rickroll
We get it. Rickrolling happens.
But what if, using the latest state of the art machine learning, you could build ways to detect and prevent you and your loved ones from ever being Rickrolled again?
We're thrilled to introduce RICK: Real-time Intrusion Checker Kernel, a state of the art advancement and foundation model in internet safety. RICK is capable of detecting the presence of Rick Astley in images and video, so that applications can be built to shield you from Rick Astley content (or amplify said content, should you choose 🙃).
Conceived on April 1, 2022 at 9:41 AM ET and hacked together in about 30min during Roboflow's Friday team lunch, RICK has undergone thorough development to address a pressing need in online safety.
RICK is trained on 1200 images of Rick Astley and non-Rick Astley content.
Read more about how and why we built RICK here.
Building with RICK
As an example of RICK's utility, the Roboflow team has built Rickblocker, an open source application that automatically mutes your computer and places a black box on Rick Astley's face whenever he is present in a YouTube video you may accidentally click.
Try RICK Yourself
RICK is fully hosted and free to use as an API or in-browser model. You can even confirm you are not-rick by trying it with your webcam.
Background Information
This dataset was curated and annotated by Mohamed Traore from the Roboflow Team. A custom dataset composed of one class (chicken). The main objective is to identify chicken(s) and perform object-tracking on chicken(s) using Roboflow's "zero shot object tracking."
The original video is from Wendy Thomas (Description: "Definitive proof that the chicken crossed the road to get to the other side.")
The original custom dataset (v1) is composed of 106 images of chickens and their surrounding environment.
The dataset is available under the Public License.
Zero Shot Object Tracking
- Using the video from Wendy Thomas (which was included in this dataset through the use of Roboflow's Video Ingestion tool)
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 106 images
- Preprocessing: Auto-Orient
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 106 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 3 (v3), "v1-augmented-COCO-transferLearning" - 254 images
- Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- 3x image generation
Version 11 (v11), "v1-augmented-trainFromScratch" - 463 images
- Trained from the Version 3 training checkpoint
- Modify Classes was applied to remap the "chickens" class to "rooster" (meaning "rooster" will show up for the bounding boxes when running inference)
- 3x image generation
Version 12 (v12) - 185 images
- Preprocessing: Auto-Orient, Modify Classes (remap the "chickens" class to "rooster")
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Mohamed Traore - LinkedIn
Soccer (Fùtbol) Match Analysis
FIFA World Cup - Qatar 2022
United States of America (USA) vs. Netherlands (NED) on Dec. 3, 2022
This dataset includes classes for:
- USA (USA players)
- NED (NED players)
- Ball (soccer/fùtbol)
- Ref (referee)
- Goalie
- GOAL (soccer/fùtbol goal)
(video/GIF slowed to help with visualization)
How to Use this Dataset/Project:
This is a great starter-dataset for anyone wanting to make computer vision models for player-tracking, ball-tracking, or other game/match analytics.
- If the player-classes are left as USA and NED, then you will be tracking players with labels of USA and NED.
- If the player-classes are updated to Team1 and Team2 with Modify Classes, then you will be tracking players with labels of Team1 and Team2.
- If the player-classes are all updated to Player with Modify Classes, then you will be tracking all players on the pitch with labels of Player.
Next Steps/Dataset and Model Improvement:
Try Cloning this project and adding more of your own images or video: images from another project on Roboflow Universe, or a video from YouTube.
- Roboflow Universe: All Categories | Sports
- Try Active Learning to root out false and low-confidence detections to make a more highly-performant model!
UPDATES (12/15/2022):
- More image data (video frames) from YouTube was added from the France vs. Morocco 2022 World Cup Semi-Final Match to help make the model more generalizable to other soccer/fùtbol matches.
This project combines the Dollar Bill Detection project from Alex Hyams (v13 of that project was exported in COCO JSON format for import to this project) and the Final Counter, or Coin Counter, project from Dawson Mcgee (v6 of that project was exported in COCO JSON format for import to this project).
v1 contains the original imported images, without augmentations. This is the version to download and import to your own project if you'd like to add your own augmentations.
This dataset can be used to create computer vision applications in the banking and finance industry for use cases like detecting and counting US currency.
Background Information
This dataset was curated and annotated by Mohamed Traore and Justin Brady after forking the raw images from the Roboflow Universe Mask Wearing dataset and remapping the mask and no-mask classes to face.
The main objective is to identify human faces in images or video. However, this model could be used for privacy purposes by changing the output of the bounding boxes to blur the detected face or fill it with a black box.
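A minimal sketch of that privacy use case with OpenCV, assuming `preds` is the center-format box list returned by a Roboflow detection endpoint (e.g. model.predict(...).json()["predictions"]):

# Blur (or blackout) each predicted face box.
import cv2

def redact_faces(image, preds, blur=True):
    h, w = image.shape[:2]
    for p in preds:
        x1 = max(0, int(p["x"] - p["width"] / 2))
        y1 = max(0, int(p["y"] - p["height"] / 2))
        x2 = min(w, int(p["x"] + p["width"] / 2))
        y2 = min(h, int(p["y"] + p["height"] / 2))
        roi = image[y1:y2, x1:x2]
        if blur:
            image[y1:y2, x1:x2] = cv2.GaussianBlur(roi, (51, 51), 0)
        else:
            image[y1:y2, x1:x2] = 0  # solid black box
    return image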
The original custom dataset (v1) is composed of 867 unaugmented (raw) images of people in various environments. 55 of the images are marked as Null to help with feature extraction and reducing false detections.
Version 2 (v2) includes the augmented and trained version of the model. This version is trained from the COCO model checkpoint to take advantage of transfer learning and improve initial model training results.
Model Updates:
After a few training runs, and running tests with Roboflow's webcam model and Roboflow's video inference repo, it was clear that edge cases, like hands sometimes being recognized as faces, were an issue. I grabbed some images from Alex Wong's Hand Signs dataset (96 images from the dataset) and added them to the project. I uploaded the images without the annotation files, labeled all the faces, and retrained the model (version 5).
The dataset is available under the CC BY 4.0 license.
Includes images from:
@misc{ person-hgivm_dataset,
title = { person Dataset },
type = { Open Source Dataset },
author = { Abner },
howpublished = { \url{ https://universe.roboflow.com/abner/person-hgivm } },
url = { https://universe.roboflow.com/abner/person-hgivm },
journal = { Roboflow Universe },
publisher = { Roboflow },
year = { 2021 },
month = { aug },
note = { visited on 2022-10-14 },
}
The Apple Vision annotated data set contains over 350 images of naturally growing apples on an apple tree. Unlike other existing sets, this set attempted to capture apples growing on trees with different exposures of natural light during the daytime.
The training data comprised 77 photos taken of Peter Bloch’s home apple tree. These images were shot between July and September of 2021 on an iPhone 11 camera. After the photos were taken, they were sliced into multiple smaller images with a resolution of 360 × 640 pixels per image. This number was selected as the lowest natural resolution for a CV camera later used in this project.
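A sketch of that slicing step (edge handling is an assumption; the authors may have padded or overlapped tiles instead):

# Cut a photo into non-overlapping 360x640 tiles; any remainder at
# the right/bottom edges is dropped in this sketch.
from PIL import Image

def slice_image(path, tile_w=360, tile_h=640):
    img = Image.open(path)
    tiles = []
    for top in range(0, img.height - tile_h + 1, tile_h):
        for left in range(0, img.width - tile_w + 1, tile_w):
            tiles.append(img.crop((left, top, left + tile_w, top + tile_h)))
    return tiles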
This set was originally created for the ECE 31 Capstone project at Oregon State University.
This project labels solar panels collected via a DJI Mavic Air 2 flying over Rancho Santa Fe, California in August 2022. Both rooftop and backyard solar panels are labeled. It was used as the basis for the Using Computer Vision with Drones for Georeferencing blog post and the open source DJI aerial georeferencing project.
53 images labeled with 267 polygons were used to train a computer vision model to detect solar panels from above. It's a demonstration of collecting and annotating data from a drone video and using that to train a machine learning model.
Boxpunch Detector
Onboarding project for Roboflow
This project captures punch types thrown during boxing training
v12 contains the original, raw images, with annotations. It includes the following classes:
one-front, one-back, five-front, five-back, ten-front, ten-back, twenty-front, twenty-back, fifty-front, fifty-back
v13 contains the original, raw images, with annotations and Modified Classes. It includes the following classes:
one, five, ten, twenty, fifty
Halo Infinite: Spartan Dataset
Classifications
There are four classifications:
- Enemy
- Enemy Head
- Friendly
- Friendly Head
Image Settings
Images are 320 by 320 pixels, centered on the targeting reticle
Game Settings
Images were gathered on low settings. Enemies are color:pineapple and allies are default blue.
Attribution and License
This dataset was created and annotated by Graham Doerksen and is available under CC BY 4.0 license
Overview
This is an attempt to make a computer vision model that can identify chocolates in a box of chocolates.
I trained this model on images of a See’s 1lb Classic Red Heart - Assorted Chocolates box. There are 22 different classes, one for each type of chocolate in this box.
This dataset contains 87 original images, and 1,697 annotations.
Learn More
This project uses an Object Detection model to detect cracks in infrastructure like tall buildings, skyscrapers, bridges, etc., using footage from a drone.
Classes and class balance (as of December 22, 2022):
cardboard - 1,549
rigid_plastic - 622
metal - 559
soft_plastic - 554
When learning to play Dreidel, I would sometimes forget what the names of each character are and what action they correspond to in the game. I thought it’d be fun to create a computer vision model that could understand what each symbol on a Dreidel is, making it easier to learn to play the game.
This model tracks the dreidel as it spins and detects the letters on the four-sided dreidel.
How to Play Dreidel
Rules:
1. The players are dealt gelt (chocolate wrapped in gold paper made to look like a coin)
2. Each player takes a turn at spinning the Dreidel
3. The Dreidel has four sides that each prompt an action to take by the spinner
If נ (nun) is facing up, the player does nothing.
If ג (gimel) is facing up, the player gets everything in the pot.
If ה (hay) is facing up, the player gets half of the pieces in the pool.
If ש (shin) is facing up, the player adds one of their gelt to the pot.
4. The winner, of course, gets to eat all the gelt
Hopefully, with this model, one can create an application that teaches someone how to play dreidel.
This dataset contains annotated pictures of animals (like wild pigs and deer) from trail cameras in East Texas.
You can use this dataset and the detection API to create computer vision applications for hunting, monitoring animal population health, counting deer sightings, and more!
Automatically filter through hours of trail cam footage to find the times/frames when wild game is caught on camera.
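One way to implement that filtering (a sketch; `detect` stands in for any model trained on this dataset, returning True when an animal is found in a frame):

# Sample roughly one frame per second and log timestamps with game.
import cv2

def game_timestamps(video_path, detect, stride_s=1.0):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * stride_s))
    hits, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0 and detect(frame):  # detect: frame -> bool
            hits.append(i / fps)             # seconds into the video
        i += 1
    cap.release()
    return hits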
videos from:
images cloned from:
- https://universe.roboflow.com/roboflow-universe-projects/personal-protective-equipment-combined-model/
- https://universe.roboflow.com/roboflow-universe-projects/people-and-ladders/
- https://universe.roboflow.com/roboflow-universe-projects/safety-vests/
- https://universe.roboflow.com/mohamed-sabek-6zmr6/excavators-cwlh0
- https://universe.roboflow.com/popular-benchmarks/mit-indoor-scene-recognition/ - null images
- https://universe.roboflow.com/mohamed-traore-2ekkp/people-detection-general
- https://universe.roboflow.com/labeler-projects/construction-madness/
This dataset was found on Kaggle's AFO - Aerial Dataset of floating objects by Jan Gąsienica-Józkowy, and was used to build an object detection model for the "How to Train Computer Vision Models on Aerial Imagery" technical blog.
Future iterations may include the detection of people, as well as the use of Project_2 and Project_3 of the original dataset.
Self-Driving Thermal Object-Detection
Overview
This model detects potentially moving objects (cars, bicycles, people, and dogs), to aid in self-driving and autonomous vehicles.
Dataset
The dataset comprises over twelve thousand thermal images, largely annotating cars.
Real-life Valorant Gameplay Experience
This Dataset was used to create an AI for Valorant that would Smoke and Flash me In Real Life.
Full Video: https://youtu.be/aopXw22iL1M
Rocket Detect is the dataset used to train the neural network behind Autotrack, NASASpaceflight's automated rocket tracking system. Rocket Detect uses three classes:
- Engine Flames - The fire produced by the rocket
- Rocket Body - The body of the launch vehicle
- Space - A tiny speck in the sky that is the rocket after it has ascended into space
Using YOLOv5 and StrongSORT, Rocket Detect in its current form has proven sufficient to reliably track a Falcon 9 first stage continuously from launch to landing. It also works on a wide variety of other launch vehicles. The dataset will grow with time to further improve reliability.
Overview
The Hard Hat dataset is an object detection dataset of workers in workplace settings that require a hard hat. Annotations also include examples of just "person" and "head," for when an individual may be present without a hard hat.
Example Image:
Use Cases
One could use this dataset to, for example, build a classifier of workers that are abiding by safety code within a workplace versus those that may not be. It is also a good general dataset for practice.
Using this Dataset
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
This dataset was prepared for First Robotics by the 2914 Robotics Team of Wilson High School.
The dataset contains labeled blue and red balls. The original dataset contains 987 annotated examples of blue balls and 731 annotated examples of red balls.
The raw images are contained in version 10 (raw-images)
Background Information
This dataset was curated and annotated by Ahmed Elmogtaba Abdelaziz.
The original dataset (v6) is composed of 204 images of honeybees present in a wide variety of scenes.
The dataset is available under a Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 5 - 490 images
- Preprocessing: Resize, 416 by 416
- Augmentations:
90° Rotate: Clockwise, Counter-Clockwise
Rotation: Between -15° and +15°
Saturation: Between -10% and +10%
Brightness: Between -10% and +10%
Blur: Up to 0.25px
Mosaic: Applied
- Output: 3x image generation
Overview
This dataset contains 74 aerial maritime photographs taken via a Mavic Air 2 drone, with 1,151 bounding boxes consisting of docks, boats, lifts, jetskis, and cars. This is a multi-class problem. This is an aerial object detection dataset. This is a maritime object detection dataset.
The drone was flown at 400 ft. No drones were harmed in the making of this dataset.
This dataset was collected and annotated by the Roboflow team, released with MIT license.
Use Cases
- Identify number of boats on the water over a lake via quadcopter.
- Boat object detection dataset
- Aerial Object Detection proof of concept
- Identify if boat lifts have been taken out via a drone
- Identify cars with a UAV drone
- Find which lakes are inhabited and to which degree.
- Identify if visitors are visiting the lake house via quad copter.
- Proof of concept for UAV imagery project
- Proof of concept for maritime project
- Etc.
This dataset is a great starter dataset for building an aerial object detection model with your drone.
Getting Started
Fork or download this dataset and follow our tutorial, How to train state of the art object detector YOLOv4, for more. Stay tuned for tutorials on how to teach your UAV drone how to see, and for comparable airplane imagery and airplane footage.
Annotation Guide
See here for how to use the CVAT annotation tool that was used to create this dataset.
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.
This Dataset contains images of popular North American mushrooms, Chicken of the Woods and Chanterelle, differentiating between the two species.
This dataset is an example of an object detection task that is possible via custom training with Roboflow.
Two versions are listed. "416x416" is a 416 resolution version that contains the base images in the dataset. "416x416augmented" contains the same images with various image augmentations applied to build a more robust model.
About Scubotics
Scubotics created https://www.namethatfish.com/. We are a startup dedicated to helping people better understand the ocean, one fish at a time.
About this dataset
The Ocean dataset contains ocean imagery depicting a few different species of fish.
Example Footage
Models trained on images like this dataset empower fish identification like the following:
Background Information
This dataset was curated and annotated by Find This Base. A custom dataset composed of 16 classes from the popular mobile game, Clash of Clans.
- Classes: Canon, WizzTower, Xbow, AD, Mortar, Inferno, Scattershot, AirSweeper, BombTower, ClanCastle, Eagle, KingPad, QueenPad, RcPad, TH13 and WardenPad.
The original custom dataset (v1) is composed of 125 annotated images.
The dataset is available under the CC BY 4.0 license.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 125 images
- Preprocessing - Auto-Orient and Resize: Fit (black edges) to 640x640
- Augmentations - No augmentations applied
- Training Metrics - Trained from Scratch (no checkpoint used) on Roboflow
- mAP = 83.1%, precision = 43.0%, recall = 99.1%
Version 4 (v4) - 301 images
- Preprocessing - Auto-Orient and Resize: Fit (black edges) to 640x640
- Augmentations - Mosaic
- Generated Images - Outputs per training example: 3
- Training Metrics - Trained from Scratch (no checkpoint used) on Roboflow
- mAP = %, precision = %, recall = %
Find This Base: Official Website | How to Use Find This Base | Discord | Patreon
Background Information
This dataset was curated and annotated by Ilyes Talbi, Head of La revue IA, a French publication focused on stories of machine learning applications.
The main objective is to identify soccer (futbol) players, the referee, and the soccer ball (futbol).
The original custom dataset (v1) is composed of 163 images.
- Class 0 = players
- Class 1 = referree
- Class 2 = soccer ball (or futbol)
The dataset is available under the Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 7 (v7) - 163 images (raw images)
- Preprocessing: Auto-Orient, Modify Classes: 3 remapped, 0 dropped
- Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 163 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 3 (v3) - 391 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
- Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
- Augmentations:
- Outputs per training example: 3
- Rotation: Between -25° and +25°
- Shear: ±15° Horizontal, ±15° Vertical
- Brightness: Between -25% and +25%
- Blur: Up to 0.75px
- Noise: Up to 1% of pixels
- Bounding Box: Blur: Up to 0.5px
- Training Metrics: 86.4% mAP, 51.8% precision, 90.4% recall
Version 4 (v4) - 391 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
- Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
- Augmentations:
- Outputs per training example: 3
- Rotation: Between -25° and +25°
- Shear: ±15° Horizontal, ±15° Vertical
- Brightness: Between -25% and +25%
- Blur: Up to 0.75px
- Noise: Up to 1% of pixels
- Bounding Box: Blur: Up to 0.5px
- Training Metrics: 84.6% mAP, 52.3% precision, 85.3% recall
Version 5 (v5) - 391 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
- Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
- Only Class 0, which was remapped to players, was included in this version
- Augmentations:
- Outputs per training example: 3
- Rotation: Between -25° and +25°
- Shear: ±15° Horizontal, ±15° Vertical
- Brightness: Between -25% and +25%
- Blur: Up to 0.75px
- Noise: Up to 1% of pixels
- Bounding Box: Blur: Up to 0.5px
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- 98.8% mAP, 76.3% precision, 99.2% recall
Version 6 (v6) - 391 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
- Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
- Only Class 0, which was remapped to players, was included in this version
- Augmentations:
- Outputs per training example: 3
- Rotation: Between -25° and +25°
- Shear: ±15° Horizontal, ±15° Vertical
- Brightness: Between -25% and +25%
- Blur: Up to 0.75px
- Noise: Up to 1% of pixels
- Bounding Box: Blur: Up to 0.5px
- Training Metrics: Trained from Scratch (no transfer learning employed)
- 95.5% mAP, 67.8% precision, 95.5% recall
Ilyes Talbi - LinkedIn | La revue IA
About this Dataset
This dataset was created by exporting images from images.cv and labeling them as an object detection dataset. The dataset contains 421 raw images (v1 - raw-images) and labeled classes include:
- forklift
- person
Background Information
This dataset was curated and annotated by Mohamed Attia.
The original dataset (v1) is composed of 451 images of various pills that are present on a large variety of surfaces and objects.
The dataset is available under the Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 451 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 1,083 images
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416), all classes remapped (Modify Classes) to "pill"
- Augmentations:
90° Rotate: Clockwise, Counter-Clockwise, Upside Down
Crop: 0% Minimum Zoom, 77% Maximum Zoom
Rotation: Between -45° and +45°
Shear: ±15° Horizontal, ±15° Vertical
Hue: Between -22° and +22°
Saturation: Between -27% and +27%
Brightness: Between -33% and +33%
Exposure: Between -25% and +25%
Blur: Up to 3px
Noise: Up to 5% of pixels
Cutout: 3 boxes with 10% size each
Mosaic: Applied
Bounding Box: Brightness: Between -25% and +25%
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- mAP = 91.4%, precision = 61.1%, recall = 93.9%
Version 3 (v3) - 1,083 images
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416), all classes remapped (Modify Classes) to "pill"
- Augmentations:
90° Rotate: Clockwise, Counter-Clockwise, Upside Down
Crop: 0% Minimum Zoom, 77% Maximum Zoom
Rotation: Between -45° and +45°
Shear: ±15° Horizontal, ±15° Vertical
Hue: Between -22° and +22°
Saturation: Between -27% and +27%
Brightness: Between -33% and +33%
Exposure: Between -25% and +25%
Blur: Up to 3px
Noise: Up to 5% of pixels
Cutout: 3 boxes with 10% size each
Mosaic: Applied
Bounding Box: Brightness: Between -25% and +25%
- Training Metrics: Trained from "scratch" (no transfer learning employed) on Roboflow
- mAP = 84.3%, precision = 53.2%, recall = 86.7%
Version 4 (v4) - 451 images
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416), all classes remapped (Modify Classes) to "pill"
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 5 (v5) - 496 images
- Preprocessing: Auto-Orient, all classes remapped (Modify Classes) to "pill", Isolate Objects
- The Isolate Objects preprocessing step was added to convert this object detection project into a suitable format for export in OpenAI's CLIP annotation format so that it could be used as a classification model (classification dataset available here: https://universe.roboflow.com/mohamed-attia-e2mor/pill-classification)
Mohamed Attia - LinkedIn
Overview
This project started over 3 years ago, when I wanted to make something that would draw out football plays automatically. Last year I hit a breakthrough in my Python development where I could track players individually. Roboflow has allowed me to track players by position groups.
Classes
Some classes are straightforward, like Center, QB (quarterback), db (defensive back), and lb (linebacker); the rest are identified as "skill". That means an offensive player like a Running Back, Fullback, Tight End, H-back, or Wide Receiver.
The project in action
I haven't made a video of myself using Roboflow yet, but I will shortly. You can see the project on my LinkedIn, and how it's grown and will continue to grow.
My LinkedIn
Background Information
This dataset was curated and annotated by Mohamed Attia.
The original dataset (v1) is composed of 451 images of various pills that are present on a large variety of surfaces and objects.
The dataset is available under the Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 496 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 1,190 images
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416), all classes remapped (Modify Classes) to "pill"
- Augmentations:
Outputs per training example: 3
90° Rotate: Clockwise, Counter-Clockwise, Upside Down
Shear: ±5° Horizontal, ±5° Vertical
Hue: Between -25° and +25°
Saturation: Between -10% and +10%
Brightness: Between -10% and +10%
Exposure: Between -10% and +10%
Noise: Up to 2% of pixels
Cutout: 5 boxes with 5% size each
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
NOTE:
The Isolate Objects preprocessing step was added to convert the original object detection project into a format suitable for export in OpenAI's CLIP annotation format, so that it could be used as a classification model in this project.
Mohamed Attia - LinkedIn
Background Information
This dataset was curated and annotated by Karel Cornelis.
The original dataset (v1) is composed of 516 images of various ingredients inside a fridge. The project was created as part of a group project for a postgraduate applied AI program at Erasmus Brussels; we made an object detection model to identify ingredients in a fridge.
From the recipe dataset we used (a subset of the Recipe1M dataset), we distilled the top 50 ingredients and used 30 of those to randomly fill our fridge.
Read this blog post to learn more about the model production process: How I Used Computer Vision to Make Sense of My Fridge
Watch this video to see the model in action: AICook
The dataset is available under the MIT License.
Getting Started
You can download this dataset for use within your own project, fork it into a workspace on Roboflow to create your own model, or test one of the trained versions within the app.
Dataset Versions
Version 1 (v1) - 516 images (original-images)
- Preprocessing: Auto-Orient
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 3,050 images (aicook-augmented-trainFromCOCO)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
Rotation: Between -3° and +3°
Exposure: Between -20% and +20%
Blur: Up to 3px
Noise: Up to 5% of pixels
Cutout: 12 boxes with 10% size each
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- mAP = 97.6%, precision = 86.9%, recall = 98.5%
Version 3 (v3) - 3,050 images (aicook-augmented-trainFromScratch)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
Rotation: Between -3° and +3°
Exposure: Between -20% and +20%
Blur: Up to 3px
Noise: Up to 5% of pixels
Cutout: 12 boxes with 10% size each
- Training Metrics: Trained from "scratch" (no transfer learning employed) on Roboflow
- mAP = 97.9%, precision = 79.6%, recall = 98.6%
Version 4 (v4) - 3,050 images (aicook-augmented)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
Rotation: Between -3° and +3°
Exposure: Between -20% and +20%
Blur: Up to 3px
Noise: Up to 5% of pixels
Cutout: 12 boxes with 10% size each
- Training Metrics: This version of the dataset was not trained
Karel Cornelis - LinkedIn
Overview
The Drowsiness dataset is a collection of images of a person in a vehicle (Ritesh Kanjee, of Augmented Startups) simulating "drowsy" and "awake" facial postures. This dataset can easily be used as a benchmark for a "driver alertness" or "driver safety" computer vision model.
Example Footage!
Training and Deployment
The Drowsiness model has been trained with Roboflow Train and is available for inference on the Dataset tab. We have also trained a YOLOR model for robust detection and tracking of a fatigued driver. You can learn more here: https://augmentedstartups.info/YOLOR-Get-Started
About Augmented Startups
We are at the forefront of Artificial Intelligence in computer vision. With over 94k subscribers on YouTube, we embark on fun and innovative projects in this field and create videos and courses so that everyone can be an expert in this field. Our vision is to create a world full of inventors that can turn their dreams into reality.
Overview
This dataset contains images taken of QR Codes in variable lighting conditions at different angles.
Trained Model with Roboflow Train
High Performance
- 99.5% mAP
- 100.0% precision
- 99.2% recall
Testing The Model
You can test this trained model by dropping an image on this page, or via the curl command:
base64 YOUR_IMAGE.jpg | curl -d @- "https://detect.roboflow.com/qr-code-oerhe/1?api_key=YOUR_API_KEY"
Kaylee from Team Roboflow demos how to train a rock paper scissors object detector with this dataset.
Try it out on your webcam! And remember, if it doesn't work so well, you can make it better by uploading and annotating new images of yourself doing rock paper scissors to further educate the model.
Ultimate Fighting Champion
This data set and object detection model detects recent UFC champions.
About this Dataset
This dataset was created by exporting the Oxford Pets dataset from Roboflow Universe, generating a version with Modify Classes to drop all of the classes for the labeled dog breeds and consolidating all cat breeds under the label, "cat." The bounding boxes were also modified to include the entirety of the cats within the images, rather than only their faces/heads.
Oxford Pets
- The Oxford Pets dataset (also known as the "dogs vs cats" dataset) is a collection of images and annotations labeling various breeds of dogs and cats. There are approximately 100 examples of each of the 37 breeds. This dataset contains the object detection portion of the original dataset with bounding boxes around the animals' heads.
- Origin: This dataset was collected by the Visual Geometry Group (VGG) at the University of Oxford.
This dataset is derived from the following publication:
Kaspars Sudars, Janis Jasko, Ivars Namatevs, Liva Ozola, Niks Badaukis, "Dataset of annotated food crops and weed images for robotic computer vision control," Data in Brief, Volume 31, 2020, 105833, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2020.105833 (https://www.sciencedirect.com/science/article/pii/S2352340920307277)
Abstract: Weed management technologies that can identify weeds and distinguish them from crops are in need of artificial intelligence solutions based on a computer vision approach, to enable the development of precisely targeted and autonomous robotic weed management systems. A prerequisite of such systems is to create robust and reliable object detection that can unambiguously distinguish weed from food crops. One of the essential steps towards precision agriculture is using annotated images to train convolutional neural networks to distinguish weed from food crops, which can be later followed using mechanical weed removal or selected spraying of herbicides. In this data paper, we propose an open-access dataset with manually annotated images for weed detection. The dataset is composed of 1118 images in which 6 food crops and 8 weed species are identified, altogether 7853 annotations were made in total. Three RGB digital cameras were used for image capturing: Intel RealSense D435, Canon EOS 800D, and Sony W800. The images were taken on food crops and weeds grown in controlled environment and field conditions at different growth stages.
Keywords: Computer vision; Object detection; Image annotation; Precision agriculture; Crop growth and development
Overview
The Aerial Sheep dataset contains images taken from a bird's-eye view with instances of sheep in them. Images do not differentiate between gender or breed of sheep, instead grouping them into a single class named "sheep".
Example Footage
See RIIS's sheep counter application for additional use case examples.
Link - https://riis.com/blog/counting-sheep-using-drones-and-ai/
QR Code and Bar Code Dataset
Dataset (downloadable)
- 2.5k Images
- ~50/50 distribution of QR Code and Bar Code Classes
Training Object Detection Model
In progress using Roboflow Train (click to train using YOLOv5)
Background Information
This dataset was created by Michael Shamash and contains the images used to train the OnePetri plaque detection model (plaque detection model v1.0).
In microbiology, a plaque is defined as a “clear area on an otherwise opaque field of bacteria that indicates the inhibition or dissolution of the bacterial cells by some agent, either a virus or an antibiotic. Plaques are a sensitive laboratory indicator of the presence of some anti-bacterial factor.”
When working with bacteriophages (phages), viruses which can only infect and kill bacteria, scientists often need to perform the time-intensive monotonous task of counting plaques on Petri dishes. To help solve this problem I developed OnePetri, a set of machine learning models and a mobile phone application (currently iOS-only) that accelerates common microbiological Petri dish assays using AI.
A task that once took microbiologists several minutes to do per Petri dish (adds up quickly considering there are often tens of Petri dishes to analyze at a time!) could now be mostly automated thanks to computer vision, and completed in a matter of seconds.
App in Action
Example Image
A total of 43 source images were used in this dataset with the following split: 29 training, 9 validation, 5 testing (2505 images after preprocessing and augmentations are applied).
OnePetri is a mobile phone application (currently iOS-only) which accelerates common microbiological Petri dish assays using AI. OnePetri's YOLOv5s plaque detection model was trained on a diverse set of images from the HHMI's SEA-PHAGES program, many of which are included in this dataset. This project wouldn't be possible without their support!
The following pre-processing options were applied:
- Auto-orient
- Tile image into 5 rows x 5 columns
- Resize tiles to 416px x 416px
The following augmentation options were applied:
- Grayscale (35% of images)
- Hue shift (-45deg to +45deg)
- Blur up to 2px
- Mosaic
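As a rough illustration of the tiling preprocessing step listed above (5 rows x 5 columns, with tiles resized to 416x416), here is a minimal Python sketch using PIL. Roboflow performs the tiling, plus the corresponding annotation remapping, automatically; the function below handles images only and is an assumption, not OnePetri's actual code:

```python
# Sketch of the 5x5 tiling + resize preprocessing (image side only;
# remapping the plaque annotations to each tile is omitted here).
from PIL import Image

def tile_image(path, rows=5, cols=5, size=(416, 416)):
    image = Image.open(path)
    w, h = image.size
    tile_w, tile_h = w // cols, h // rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            box = (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
            tiles.append(image.crop(box).resize(size))
    return tiles  # 25 tiles, each 416x416

# usage (placeholder path): tiles = tile_image("petri_dish.jpg")
```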
For more information and to download OnePetri please visit: https://onepetri.ai/.
This project was created for research work by Dainius Varna and Vytautas Abromavičius of Vilnius Gediminas Technical University in Lithuania.
The dataset consists of images of SMD-type electronic components, which are moving on a conveyor belt. There are four types of components in the collected dataset:
- Capacitors
- Resistors
- Diodes
- Transistors
This is the initial dataset that was augmented to create the final model. Download the raw-images dataset version (v2) of this project to start your own custom project.
- A research paper on the project, titled A System for a Real-Time Electronic Component Detection and Classification on a Conveyor Belt was published in the MDPI Applied Sciences journal on May 31, 2022.
Department of Electronic Systems, Vilnius Gediminas Technical University (VILNIUS TECH), 03227 Vilnius, Lithuania; vgtu@vgtu.lt
- Correspondence: vytautas.abromavicius@vilniustech.lt
Dataset Collection
The dataset was collected using Nvidia Data Capture Control.
Figure 3. Four types of electronic components used for the dataset. (a) capacitor, (b) resistor, (c) diode, (d) transistor.
Abstract:
The presented research addresses the real-time object detection problem with small and moving objects, specifically the surface-mount component on a conveyor.
Detecting and counting small moving objects on the assembly line is a challenge. In order to meet the requirements of real-time applications, state-of-the-art electronic component detection and classification algorithms are implemented into powerful hardware systems.
This work proposes a low-cost system with an embedded microcomputer to detect surface-mount components on a conveyor belt in real time. The system detects moving, packed, and unpacked surface-mount components.
The system's performance was experimentally investigated by implementing several object-detection algorithms, and the different implementations were compared using mean average precision and inference time. Results for the four surface-mount component types showed average precision scores of 97.3% for capacitor detection and 97.7% for resistor detection.
The findings suggest that the system with the implemented YOLOv4-tiny algorithm on the Jetson Nano 4 GB microcomputer achieves a mean average precision score of 88.03% with an inference time of 56.4 ms and 87.98% mean average precision with 11.2 ms inference time on the Tesla P100 16 GB platform.
Overview
The Surfline Surfer Spotting dataset contains images of surfers floating along the coast. Each image contains one class, "surfer", but may include multiple surfers.
Example Footage
Using this Dataset
There are several deployment options available, including inferring via API, webcam, and curl command.
Here is a code snippet you can use to hit the hosted inference API; code snippets for more languages are also available.
const axios = require("axios");
const fs = require("fs");
const image = fs.readFileSync("YOUR_IMAGE.jpg", {
encoding: "base64"
});
axios({
method: "POST",
url: "https://detect.roboflow.com/surfer-spotting/2",
params: {
api_key: "YOUR_KEY"
},
data: image,
headers: {
"Content-Type": "application/x-www-form-urlencoded"
}
})
.then(function(response) {
console.log(response.data);
})
.catch(function(error) {
console.log(error.message);
});
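For instance, a rough Python equivalent of the snippet above, posting the same base64 payload to the same endpoint with the requests library (replace the image path and key with your own):

```python
# Python equivalent of the Node.js snippet above: POST a base64-encoded
# image to the hosted inference endpoint and print the predictions.
import base64
import requests

with open("YOUR_IMAGE.jpg", "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://detect.roboflow.com/surfer-spotting/2",
    params={"api_key": "YOUR_KEY"},
    data=image,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
print(response.json())
```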
Download Dataset
On the versions tab you can select the version you like, and choose to download in 26 annotation formats.
This classification dataset is from Kaggle and was uploaded to Kaggle by Paul Mooney.
It contains over 5,000 images of chest x-rays in two categories: "PNEUMONIA" and "NORMAL."
- Version 1 contains the raw images, and only has the pre-processing feature of "Auto-Orient" applied to strip out EXIF data, and ensure all images are "right side up."
- Version 2 contains the raw images with pre-processing features of "Auto-Orient" and Resize of 640 by 640 applied.
- Version 3 was trained with Roboflow's model architecture for classification datasets and contains the raw images with pre-processing features of "Auto-Orient" and Resize of 640 by 640 applied + augmentations:
- Outputs per training example: 3
- Shear: ±3° Horizontal, ±2° Vertical
- Saturation: Between -5% and +5%
- Brightness: Between -5% and +5%
- Exposure: Between -5% and +5%
Below you will find the description provided on Kaggle:
Context
http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
Figure S6. Illustrative Examples of Chest X-Rays in Patients with Pneumonia, Related to Figure 6
The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse ‘‘interstitial’’ pattern in both lungs.
http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
Content
The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.
For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.
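Since the dataset uses one folder per split and one subfolder per class, it can be read directly with a standard directory-based loader. A minimal sketch using TensorFlow, assuming the Kaggle folder layout described above under a local path of `chest_xray/` (path and exact folder names are assumptions):

```python
# Sketch: load the folder-per-class layout (train/test/val, each with
# pneumonia and normal subfolders) with a standard directory loader.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "chest_xray/train",       # assumed local path to the Kaggle export
    image_size=(640, 640),    # matches the Resize used in v2/v3
    batch_size=32,
)
print(train_ds.class_names)   # e.g. ['NORMAL', 'PNEUMONIA']
```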
Acknowledgements
Data: https://data.mendeley.com/datasets/rscbjbr9sj/2
License: CC BY 4.0
Citation: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5
Inspiration
Automated methods to detect and classify human diseases from medical images.
Art Classification Dataset
This classification dataset contains images of artistic movements, ranging from Abstract Expressionism to Pop Art.
Classes
- Abstract_Expressionism
- Action_painting
- Analytical_Cubism
- Art_Nouveau_Modern
- Baroque
- Color_Field_Painting
- Contemporary_Realism
- Cubism
- Early_Renaissance
- Expressionism
- Fauvism
- High_Renaissance
- Impressionism
- Mannerism_Late_Renaissance
- Minimalism
- Naive_Art_Primitivism
- New_Realism
- Northern_Renaissance
- Pointillism
- Pop_Art
- Post_Impressionism
- Realism
- Rococo
- Romanticism
- Symbolism
- Synthetic_Cubism
- Ukiyo_e
Using this computer vision model
The computer vision model that's been trained for this dataset can be used to help identify art from the artistic movement classes above.
Apex Enemy Detection
Use this PRE-TRAINED MODEL to create an aimbot and identify enemies
Use your webcam to infer, or use the hosted inference API
More deployment options are available
Banana Ripening Process Dataset and Model
This dataset contains images of the classes below:
- freshripe
- freshunripe
- overripe
- ripe
- rotten
- unripe
Usage
This is an object detection model that can potentially be used to identify where fruit at stores is in the ripening process, and when to take it off the shelves and put it in composting.
Overview
The Numberplate Dataset is a collection of licence plate images that can easily be used for Automatic Number Plate Detection.
Example Footage!
Training and Deployment
The Number Plate model has been trained in Roboflow and is available for inference on the Dataset tab.
One could also build an Automatic Number Plate Recognition (ANPR) app using YOLOR and EasyOCR. This is achieved using the Roboflow platform, from which you can deploy the model for robust, real-time ANPR.
About Augmented Startups
We are at the forefront of Artificial Intelligence in computer vision. With over 92k subscribers on YouTube, we embark on fun and innovative projects in this field and create videos and courses so that everyone can be an expert in this field. Our vision is to create a world full of inventors that can turn their dreams into reality.
Recyclable Items Dataset and Model
This is an object detection dataset that contains the classes below:
- Plastic
- Glass
- Metal
Usage
This model can potentially be used to detect the objects above, in an effort to sort them in a recycling center or in an automated river cleaning system that uses computer vision.
Garbage Object-Detection to Identify Disposal Class
This dataset detects various kinds of waste, labeling each with a class that identifies how it should be disposed of.
Overview
This is the largest gastrointestinal dataset, generously provided by Simula Research Laboratory in Norway.
You can read their research paper here in Nature
In total, the dataset contains 10,662 labeled images stored using the JPEG format. The images can be found in the images folder. The class each image belongs to corresponds to the folder it is stored in (e.g., the 'polyp' folder contains all polyp images, the 'barretts' folder contains all images of Barrett's esophagus, etc.). Each class folder is located in a subfolder describing the type of finding, which in turn is located in a folder describing whether it is a lower GI or upper GI finding. The number of images per class is not balanced, which is a general challenge in the medical field, because some findings occur more often than others. This adds an additional challenge for researchers, since methods applied to the data should also be able to learn from a small amount of training data. The labeled images represent 23 different classes of findings.
The data was collected during real gastro- and colonoscopy examinations at a hospital in Norway and partly labeled by experienced gastrointestinal endoscopists.
Use Cases
"Artificial intelligence is currently a hot topic in medicine. The fact that medical data is often sparse and hard to obtain due to legal restrictions and lack of medical personnel to perform the cumbersome and tedious labeling of the data, leads to technical limitations. In this respect, we share the Hyper-Kvasir dataset, which is the largest image and video dataset from the gastrointestinal tract available today."
"We have used the labeled data to research the classification and segmentation of GI findings using both computer vision and ML approaches to potentially be used in live and post-analysis of patient examinations. Areas of potential utilization are analysis, classification, segmentation, and retrieval of images and videos with particular findings or particular properties from the computer science area. The labeled data can also be used for teaching and training in medical education. Having expert gastroenterologists providing the ground truths over various findings, HyperKvasir provides a unique and diverse learning set for future clinicians. Moreover, the unlabeled data is well suited for semi-supervised and unsupervised methods, and, if even more ground truth data is needed, the users of the data can use their own local medical experts to provide the needed labels. Finally, the videos can in addition be used to simulate live endoscopies feeding the video into the system like it is captured directly from the endoscopes enable developers to do image classification."
Borgli, H., Thambawita, V., Smedsrud, P.H. et al. HyperKvasir, a comprehensive multi-class image and video dataset for gastrointestinal endoscopy. Sci Data 7, 283 (2020). https://doi.org/10.1038/s41597-020-00622-y
Using this Dataset
Hyper-Kvasir is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source. This means that in all documents and papers that use or refer to the Hyper-Kvasir dataset or report experimental results based on the dataset, a reference to the related article needs to be added: PREPRINT: https://osf.io/mkzcq/. Additionally, one should provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
Smoke Detection Dataset
This computer vision smoke detection dataset contains images of synthesized smoke in both indoor and outdoor settings. Check out the source link below for more information on this dataset.
source:
Smoke100k dataset
https://bigmms.github.io/cheng_gcce19_smoke100k/
Use Cases
- Identifying smoke indoors
- Identifying smoke outdoors (but not with aerial imagery)
- Identifying smoke-like objects (e.g., mist/steam from humidifiers)
Testing
You can test this model by using the Roboflow Inference Widget found above. The widget hits the model inference API, which in turn produces color-coded bounding boxes on the objects the model was trained to detect, along with labels and confidence scores for each prediction. The feature also produces the JSON output provided by the API.
Overview
The Drone Gesture Control Dataset is an object detection dataset that mimics DJI's air gesture capability. It consists of hand and body gesture commands that you can use to command your drone to 'take-off', 'land', or 'follow'.
Example Footage
Model Training and Inference
The model for this dataset has been trained on Roboflow and is available on the Dataset tab, with exports to the OpenCV AI Kit, which is running on the drone in this example.
One could also build a model using MobileNet SSD on the Roboflow platform and deploy it to the OpenCV AI Kit. Watch the full tutorial here: https://augmentedstartups.info/AI-Drone-Tutorial
Using this Dataset
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings, or additional augmentations to make your model generalize better.
About Augmented Startups
We are at the forefront of Artificial Intelligence in computer vision. We embark on fun and innovative projects in this field and create videos and courses so that everyone can be an expert in this field. Our vision is to create a world full of inventors that can turn their dreams into reality.
Overview
Via https://rpc-dataset.github.io:
- This dataset enjoys the following characteristics: (1) It is by far the largest dataset in terms of both product image quantity and product categories. (2) It includes single-product images taken in a controlled environment and multi-product images taken by the checkout system. (3) It provides different levels of annotations for the checkout images. Comparing with the existing datasets, ours is closer to the realistic setting and can derive a variety of research problems.
Use Cases
This dataset could be used to create an automatic item counter or checkout system using computer vision with Roboflow's API, Python Package, or other deployment options, such as Web Browser, iOS device, or to an Edge Device: https://docs.roboflow.com/inference/hosted-api.
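As a sketch of the automatic item-counter idea, one could run a hosted model through the Roboflow Python package and tally the returned predictions; the project name, version number, and API key below are placeholders, and this is an illustration rather than a packaged solution:

```python
# Hypothetical item counter: run a hosted Roboflow model on a checkout
# image and count the predictions per class. Names and keys are placeholders.
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")
model = rf.workspace().project("YOUR_PROJECT").version(1).model

result = model.predict("checkout.jpg", confidence=40, overlap=30).json()
counts = {}
for pred in result["predictions"]:
    counts[pred["class"]] = counts.get(pred["class"], 0) + 1
print(counts)  # e.g. {'instant_noodles': 2, 'soda_can': 3}
```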
Using this Dataset
This dataset has been licensed on a CC BY 4.0 license. You can copy, redistribute, and modify the images as long as there is appropriate credit to the authors of the dataset.
About Roboflow
Roboflow creates tools that make computer vision easy to use for any developer, even if you're not a machine learning expert. You can use it to organize, label, inspect, convert, and export your image datasets. And even to train and deploy computer vision models with no code required.
This dataset was originally created by Raya Al. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/raya-al/french-paintings-dataset-d2vbe.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Anonymous.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
CIFAR-10
CIFAR-10 and CIFAR-100 are labeled subsets of the 80 Million Tiny Images dataset. They were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.
- More info on CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html
- TensorFlow listing of the dataset: https://www.tensorflow.org/datasets/catalog/cifar10
- GitHub repo for converting CIFAR-10 tarball files to `png` format: https://github.com/knjcode/cifar2png
All images were sized 32x32 in the original dataset.
The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images [in the original dataset].
The dataset is divided into five training batches and one test batch, each with 10,000 images. The test batch contains exactly 1,000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class.
Here are the classes in the dataset, as well as 10 random images from each:
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Version 1 (original-images_Original-CIFAR10-Splits):
- Original images, with the original splits for CIFAR-10: `train` (83.33% of images - 50,000 images) and `test` (16.67% of images - 10,000 images) sets only.
- This version was not trained
Version 3 (original-images_trainSetSplitBy80_20):
- Original, raw images, with the `train` set split to provide 80% of its images to the training set (approximately 40,000 images) and 20% of its images to the validation set (approximately 10,000 images)
- https://blog.roboflow.com/train-test-split/
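A minimal sketch of such an 80/20 re-split, assuming the training images are available as a simple list of file paths (the paths below are placeholders; Roboflow performs this split when generating the version):

```python
# Sketch of splitting the original 50,000-image train set 80/20 into
# train/validation, as in Version 3.
import random

random.seed(0)                                          # reproducible shuffle
paths = [f"train/img_{i}.png" for i in range(50_000)]   # placeholder paths
random.shuffle(paths)

cut = int(0.8 * len(paths))
train_paths, valid_paths = paths[:cut], paths[cut:]
print(len(train_paths), len(valid_paths))  # 40000 10000
```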
Citation:
@TECHREPORT{Krizhevsky09learningmultiple,
author = {Alex Krizhevsky},
title = {Learning multiple layers of features from tiny images},
institution = {},
year = {2009}
}
Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
Authors:
- Han Xiao, Kashif Rasul and Roland Vollgraf
- https://arxiv.org/abs/1708.07747
Dataset Obtained From: https://github.com/zalandoresearch/fashion-mnist
All images were sized 28x28 in the original dataset.
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. We intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
Here's an example of how the data looks (each class takes three rows):
Version 1 (original-images_Original-FashionMNIST-Splits):
- Original images, with the original splits for Fashion-MNIST: `train` (86% of images - 60,000 images) and `test` (14% of images - 10,000 images) sets only.
- This version was not trained
Version 3 (original-images_trainSetSplitBy80_20):
- Original, raw images, with the `train` set split to provide 80% of its images to the training set and 20% of its images to the validation set
- https://blog.roboflow.com/train-test-split/
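The original 60,000/10,000 splits can also be loaded directly through common libraries; for example, a quick sketch using Keras's built-in loader:

```python
# Sketch: load the original Fashion-MNIST splits via Keras's built-in loader.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
print(x_train.shape, x_test.shape)   # (60000, 28, 28) (10000, 28, 28)
print(y_train.min(), y_train.max())  # labels run from 0 to 9
```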
Citation:
@online{xiao2017/online,
author = {Han Xiao and Kashif Rasul and Roland Vollgraf},
title = {Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms},
date = {2017-08-28},
year = {2017},
eprintclass = {cs.LG},
eprinttype = {arXiv},
eprint = {cs.LG/1708.07747},
}
COCO 128 is a subset of 128 images of the larger COCO dataset. It reuses the training set for both validation and testing, with the purpose of proving that your training pipeline is working properly and can overfit this small dataset.
COCO 128 is a great dataset to use the first time you are testing out a new model.
Object tracking for cars in my garage for use in home automation.
A simple dataset for benchmarking CreateML object detection models. The images are sampled from COCO dataset with eyes and nose bounding boxes added. It’s not meant to be serious or useful in a real application. The purpose is to look at how long it takes to train CreateML models with varying dataset and batch sizes.
Training performance is affected by model configuration, dataset size, and batch configuration. Larger models and batches require more memory. I used a CreateML object detection project to compare the performance.
Hardware
M1 Macbook Air
- 8 GPU
- 4/4 CPU
- 16G memory
- 512G SSD
M1 Max Macbook Pro
- 24 GPU
- 2/8 CPU
- 32G memory
- 2T SSD
Small Dataset
Train: 144
Valid: 16
Test: 8
Results
batch | M1 ET | M1 Max ET | peak mem (GB) |
---|---|---|---|
16 | 16 | 11 | 1.5 |
32 | 29 | 17 | 2.8 |
64 | 56 | 30 | 5.4 |
128 | 170 | 57 | 12 |
Larger Dataset
Train: 301
Valid: 29
Test: 18
Results
batch | M1 ET | M1 Max ET | peak mem (GB) |
---|---|---|---|
16 | 21 | 10 | 1.5 |
32 | 42 | 17 | 3.5 |
64 | 85 | 30 | 8.4 |
128 | 281 | 54 | 16.5 |
CreateML Settings
For all tests, training was set to Full Network. I closed CreateML between each run to make sure memory issues didn't cause a slowdown. There is a bug with Monterey as of 11/2021 that leads to a memory leak. I kept an eye on the memory usage; if it looked like there was a memory leak, I restarted macOS.
Observations
In general, the extra GPU cores and memory on the MacBook Pro reduce training time. Having more memory lets you train with larger datasets. On the M1 MacBook Air, the practical limit is 12 GB before memory pressure impacts performance. On the M1 Max MacBook Pro, the practical limit is 26 GB before memory pressure impacts performance. To work around memory pressure, use smaller batch sizes.
On the larger dataset with batch size 128, the M1 Max is 5x faster than the MacBook Air. Keep in mind that a real dataset should have thousands of samples, like COCO or Pascal. Ideally, you want a dataset with 100K images for experimentation and millions for the real training. The new M1 Max MacBook Pro is a cost-effective alternative to building a Windows/Linux workstation with an RTX 3090 24G. For most of 2021, the price of an RTX 3090 with 24G was around $3,000.00, so an equivalent Windows workstation would cost about the same as the M1 Max MacBook Pro I used to run the benchmarks.
Full Network vs Transfer Learning
As of CreateML 3, training with Full Network doesn't fully utilize the GPU; I don't know why it works that way. You have to select transfer learning to fully use the GPU. Below are the results of transfer learning with the larger dataset. In general, the training time is faster and the loss is better.
batch | ET min | Train Acc | Val Acc | Test Acc | Top IU Train | Top IU Valid | Top IU Test | Peak mem (GB) | loss |
---|---|---|---|---|---|---|---|---|---|
16 | 4 | 75 | 19 | 12 | 78 | 23 | 13 | 1.5 | 0.41 |
32 | 8 | 75 | 21 | 10 | 78 | 26 | 11 | 2.76 | 0.02 |
64 | 13 | 75 | 23 | 8 | 78 | 24 | 9 | 5.3 | 0.017 |
128 | 25 | 75 | 22 | 13 | 78 | 25 | 14 | 8.4 | 0.012 |
Github Project
The source code and full results are up on GitHub: https://github.com/woolfel/createmlbench
The dataset includes actual pictures of transmission and sub-transmission electrical structures, captured in 2018, 2019, and 2020 during the official inspection of assets being amortized by the following 17 Electric Cooperatives of the Philippines to the National Transmission Corporation (TransCo):
- Bukidnon Sub-transmission Corporation (BSTC)
- Northern Negros Electric Cooperative, Inc. (NONECO)
- South Cotabato 1 Electric Cooperative, Inc. (SOCOTECO 1)
- Cebu 2 Electric Cooperative, Inc. (CEBECO 2)
- Peninsula Electric Cooperative, Inc. (PENELCO)
- Misamis Oriental 2 Electric Cooperative, Inc. (MORESCO 2)
- Davao del Sur Electric Cooperative, Inc. (DASURECO)
- Camiguin Electric Cooperative, Inc. (CAMELCO)
- Iloilo 2 Electric Cooperative, Inc. (ILECO 2)
- Misamis Oriental 1 Electric Cooperative, Inc. (MORESCO 1)
- Davao Oriental Electric Cooperative, Inc. (DORECO)
- Isabela 1 Electric Cooperative, Inc. (ISELCO 1)
- Aklan Electric Cooperative, Inc. (AKELCO)
- Sultan Kudarat Electric Cooperative, Inc. (SUKELCO)
- Zamboanga City Electric Cooperative, Inc. (ZAMCELCO)
- South Cotabato 2 Electric Cooperative, Inc. (SOCOTECO 2)
- Camarines Norte Electric Cooperative, Inc. (CANORECO)
Rotifers, Microbeads and Algae
By Jord Liu and The Exploratorium
Background
This is the Machine Learning half of a larger project at the Exploratorium's Biology Lab called Seeing Scientifically, which is a research project that investigates how to use machine learning and other exhibit technology to best teach visitors in an informal learning context like the Exploratorium.
In this iteration of the project, we train an ML model to detect microscopic animals called rotifers, parts of their body (e.g. head, gut, jaw), and microbeads and algae in real time. This model is then integrated into a museum exhibit kiosk prototype that is deployed live on the Exploratorium's museum floor, and visitor research is collected on the efficacy of the exhibit.
Data and Model
The images used here are captured directly from a microscope feed and then labelled by Exploratorium employees and volunteers. Some include up to hundreds of microbeads or algae, some are brightfield and some are darkfield. They show rotifers in multiple poses, including some where the tails are not readily visible. There is relatively little variance in the images here as the environment is highly controlled. We use tiled data of multiple sizes mixed in with the full images.
We use YOLOv4, though future work includes retraining with YOLO-R, YOLO-v7, and other SOTA models. We also experimented with KeypointRCNN for pose estimation but found that the performance did not exceed our baseline of using YOLOv4 and treating the keypoints as objects.
Current performance by class is:
class_id = 0, name = algae, ap = 64.29% (TP = 176, FP = 79)
class_id = 1, name = bead, ap = 77.01% (TP = 251, FP = 41)
class_id = 2, name = bigbead, ap = 82.46% (TP = 36, FP = 5)
class_id = 3, name = egg, ap = 95.51% (TP = 16, FP = 4)
class_id = 4, name = gut, ap = 82.55% (TP = 70, FP = 13)
class_id = 5, name = head, ap = 78.38% (TP = 59, FP = 3)
class_id = 6, name = mastics, ap = 86.82% (TP = 49, FP = 6)
class_id = 7, name = poop, ap = 56.27% (TP = 34, FP = 15)
class_id = 8, name = rotifer, ap = 72.60% (TP = 83, FP = 17)
class_id = 9, name = tail, ap = 46.14% (TP = 27, FP = 7)
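Averaging the ten per-class AP values above gives an overall mAP of roughly 74.2%; a quick check of the arithmetic:

```python
# Mean of the per-class AP values listed above.
aps = [64.29, 77.01, 82.46, 95.51, 82.55, 78.38, 86.82, 56.27, 72.60, 46.14]
print(sum(aps) / len(aps))  # ~74.2
```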
Examples
Screen captures from our exhibit as of July 2022.
Dataset collected from CARLA.
10 classes:
- Good for traffic light detection by color
- Good for traffic sign detection by speed
- Cars, trucks, etc. have been simplified to "vehicles"
- Bikes, motorbikes, and persons
More about me
You can find out more about me on my LinkedIn.
VOT2015 Dataset
The dataset comprises 60 short sequences showing various objects in challenging backgrounds. The sequences were chosen from a large pool of sequences including the ALOV dataset, OTB2 dataset, non-tracking datasets, Computer Vision Online, Professor Bob Fisher’s Image Database, Videezy, Center for Research in Computer Vision, University of Central Florida, USA, NYU Center for Genomics and Systems Biology, Data Wrangling, Open Access Directory and Learning and Recognition in Vision Group, INRIA, France. The VOT sequence selection protocol was applied to obtain a representative set of challenging sequences. The dataset is automatically downloaded by the evaluation kit when needed, there is no need to separately download the sequences for the challenge.
Annotations
The sequences were annotated by the VOT committee using rotated bounding boxes in order to provide highly accurate ground truth values for comparing results. The annotations are stored in a text file with the format:
frameN: X1, Y1, X2, Y2, X3, Y3, X4, Y4
where Xi and Yi are the coordinates of corner i of the bounding box in frame N, the N-th row in the text file.
The bounding box was placed on the target such that at most ~30% of pixels within the bounding box corresponded to background pixels, while containing most of the target. For example, in annotating a person with extended arms, the bounding box was placed such that the arms were not included. Note that in some sequences parts of objects rather than entire objects have been annotated. A rotated bounding box was used to address non-axis alignment of the target. The annotation guidelines have been applied at the judgement of the annotators.
Some targets were partially occluded or partially out of the image frame. In these cases the bounding box was "inferred" by the annotator to fully contain the object, including the occluded part. For example, if a person's legs were occluded, the bounding box should also include the non-visible legs.
The annotations have been conducted by three groups of annotators. Each annotator group annotated one third of the dataset, and these annotations have been cross-checked by the two other groups. The final annotations were checked by the coordinator of the annotation process. The final bounding box annotations have been automatically rectified by replacing a rotated bounding box with an axis-aligned one if the ratio of the shortest and longest bounding-box sides exceeded 0.95.
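A hedged Python sketch of reading this annotation format and applying the rectification rule above; it assumes each row holds the eight comma-separated corner coordinates, and the evaluation kit's actual implementation may differ:

```python
# Sketch: parse one row of VOT rotated-box annotations (X1,Y1,...,X4,Y4)
# and replace the box with its axis-aligned bounding box when the ratio
# of its shortest to longest side exceeds 0.95, per the rule above.
import math

def parse_row(line):
    x1, y1, x2, y2, x3, y3, x4, y4 = map(float, line.strip().split(","))
    return [(x1, y1), (x2, y2), (x3, y3), (x4, y4)]

def rectify(corners, threshold=0.95):
    (x1, y1), (x2, y2), (x3, y3), _ = corners
    side_a = math.hypot(x2 - x1, y2 - y1)   # length of one box side
    side_b = math.hypot(x3 - x2, y3 - y2)   # length of the adjacent side
    if min(side_a, side_b) / max(side_a, side_b) > threshold:
        xs = [x for x, _ in corners]
        ys = [y for _, y in corners]
        return [(min(xs), min(ys)), (max(xs), min(ys)),
                (max(xs), max(ys)), (min(xs), max(ys))]
    return corners

# usage (placeholder coordinates):
# corners = rectify(parse_row("142.3,125.1,232.9,130.4,229.8,190.7,139.2,185.4"))
```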
Annotators:
Gustavo Fernandez (coordinator)
Jingjing Xiao
Georg Nebehay
Roman Pflugfelder
Koray Aytac