Top Agriculture Datasets
Computer vision has many use cases in agriculture: weed detection, crop disease treatment, automated spraying via drones, autonomous tractors, color sorting, and livestock monitoring. These datasets and pre-trained models can be used to raise farmers' productivity, boost yield, decrease costs, and increase profits.
For more information: https://roboflow.com/industries/agriculture
Top 6 agriculture datasets: https://blog.roboflow.com/top-agriculture-datasets-computer-vision/
PlantDoc dataset overview: https://blog.roboflow.com/introducing-an-improved-plantdoc-dataset-for-plant-disease-object-detection/
Overview
The Weeds dataset is a collection of garden weeds that can easily confuse object detection models because the weeds closely resemble their surroundings. This dataset was used with YOLOR to detect weeds in complex backgrounds.
Example Footage!
Training and Deployment
The weeds model has been trained in Roboflow and is available for inference on the Dataset tab.
You can also build a weeds detector using YOLOR and deploy it through the Roboflow platform for robust, real-time detections. You can learn more here: https://augmentedstartups.info/YOLOR-Get-Started
About Augmented Startups
We are at the forefront of Artificial Intelligence in computer vision. With over 92k subscribers on YouTube, we embark on fun and innovative projects in this field and create videos and courses so that everyone can become an expert. Our vision is to create a world full of inventors who can turn their dreams into reality.
This project was created by Arfiani Nur Sayidah and is for sorting "apples" from "damaged apples."
The classes are "apple" and "damaged_apples"
Original Class Balance:
- apple: 2,152
- damaged_apple: 708
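The class balance above is uneven, which can matter when training a sorter. As a minimal sketch (the weighting scheme is one common choice, not something the dataset authors describe), the shares and inverse-frequency weights implied by those counts can be computed like this:

```python
# Class counts taken from the "Original Class Balance" list above.
counts = {"apple": 2152, "damaged_apple": 708}

total = sum(counts.values())

# Share of each class among all annotations.
shares = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights (an assumed scheme): a perfectly balanced
# dataset would give every class a weight of 1.0.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

# How many "apple" annotations exist per "damaged_apple" annotation.
imbalance_ratio = max(counts.values()) / min(counts.values())

for name in counts:
    print(f"{name}: share={shares[name]:.1%}, weight={weights[name]:.2f}")
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```

Roughly three out of four annotations are undamaged apples, so a model that favors the majority class can look accurate while missing damaged fruit.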
Soybeans kernels counter
Background Information
This dataset was curated and annotated by Mohamed Traore from the Roboflow Team. It is a custom dataset composed of one class (chicken). The main objective is to identify chickens and perform object tracking on them using Roboflow's "zero shot object tracking."
The original video is from Wendy Thomas (Description: "Definitive proof that the chicken crossed the road to get to the other side.")
The original custom dataset (v1) is composed of 106 images of chickens and their surrounding environment.
The dataset is available under the Public License.
Zero Shot Object Tracking
- Using the video from Wendy Thomas (which was included in this dataset through the use of Roboflow's Video Ingestion tool)
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 1 (v1) - 106 images
- Preprocessing: Auto-Orient
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 106 images
- Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 3 (v3), "v1-augmented-COCO-transferLearning" - 254 images
Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- 3x image generation
Version 11 (v11), "v1-augmented-trainFromScratch" - 463 images
Trained from the Version 3 training checkpoint.
- Modify Classes was applied to remap the "chickens" class to "rooster" (meaning "rooster" will show up for the bounding boxes when running inference).
- 3x image generation
Version 12 (v12) - 185 images
- Preprocessing: Auto-Orient, Modify Classes (remap the "chickens" class to "rooster")
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
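The image counts in the version list can be sanity-checked with a little arithmetic. The sketch below assumes that "3x image generation" replaces each training image with three outputs while validation and test images are kept once; the source does not state the split, so the training-set size is inferred rather than quoted.

```python
def infer_train_count(original: int, generated: int, multiplier: int) -> int:
    """Return the training-split size implied by a version's image count.

    Assumed scheme: generated = original + (multiplier - 1) * train,
    i.e. only training images are multiplied by the augmentation step.
    """
    extra = generated - original
    if multiplier < 2 or extra % (multiplier - 1) != 0:
        raise ValueError("counts are not consistent with the assumed scheme")
    return extra // (multiplier - 1)


# Version 1 has 106 images; Version 3 has 254 images after 3x generation.
train = infer_train_count(original=106, generated=254, multiplier=3)
train_share = train / 106
print(train, f"{train_share:.0%}")
```

Under that assumption the counts are consistent with a training split of about 70% of the original images.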
Mohamed Traore - LinkedIn
The Apple Vision annotated data set contains over 350 images of naturally growing apples on an apple tree. Unlike other existing sets, this set attempted to capture apples growing on trees with different exposures of natural light during the daytime.
The training data comprised 77 photos of Peter Bloch’s home apple tree. These images were shot between July and September of 2021 on an iPhone 11 camera. After the photos were taken, they were sliced into multiple smaller images with a resolution of 360 × 640 pixels per image. This resolution was selected as the lowest natural resolution for a CV camera later used in this project.
This set was originally created for the ECE 31 Capstone project at Oregon State University.
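The slicing step described above can be sketched as a simple grid computation. This is an illustration only: the source does not say how edge regions were handled or what the photo dimensions were, so the 3024 × 4032 portrait size and the choice to drop partial tiles are assumptions.

```python
def tile_boxes(image_w, image_h, tile_w=360, tile_h=640):
    """Yield (left, top, right, bottom) boxes for full tiles on a regular grid.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    for top in range(0, image_h - tile_h + 1, tile_h):
        for left in range(0, image_w - tile_w + 1, tile_w):
            yield (left, top, left + tile_w, top + tile_h)


# Hypothetical portrait photo size, used only for illustration.
boxes = list(tile_boxes(3024, 4032))
print(len(boxes), boxes[0], boxes[-1])
```

Each box can then be passed to an image library's crop function to produce the 360 × 640 training images.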
Beans, Strawberry and Tomato diseases - v1 2022-09-01
This dataset was exported via roboflow.com on Sep 2, 2022
Roboflow is an end-to-end computer vision platform that helps you
- collaborate with your team on computer vision projects
- collect & organize images
- understand unstructured image data
- annotate, and create datasets
- export, train, and deploy computer vision models
- use active learning to improve your dataset over time
The dataset includes 5,494 images.
Diseases are annotated in YOLO v7 PyTorch format.
The following pre-processing was applied to each image:
- Auto-orientation of pixel data (with EXIF-orientation stripping)
- Resize to 416x416 (Stretch)
No image augmentation techniques were applied.
Classes:
Strawberry
- 'Angular Leafspot'
- 'Anthracnose Fruit Rot'
- 'Blossom Blight'
- 'Gray Mold'
- 'Leaf Spot'
- 'Powdery Mildew Fruit'
- 'Powdery Mildew Leaf'
Tomato
- 'disease'
- 'leaf mold'
- 'spider mites'
Bean
- 'ALS'
- 'Bean Rust'
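YOLO-style label files store one object per line as a class index followed by a normalized center, width, and height. As a minimal sketch of reading such a line for the 416x416 export described above (the sample line and its class index are made up for illustration):

```python
def yolo_to_pixels(line: str, image_w: int = 416, image_h: int = 416):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Normalized values are fractions of the image width and height.
    xc, w = float(xc) * image_w, float(w) * image_w
    yc, h = float(yc) * image_h, float(h) * image_h
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Hypothetical label line: class index 3, box centered in the image.
box = yolo_to_pixels("3 0.5 0.5 0.25 0.5")
print(box)
```

Because the coordinates are normalized, a stretch resize changes the pixel values but leaves the label lines themselves unchanged.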
Purpose of the Project
This project started as a way to add real-time counts of bees with/without pollen entering my backyard beehive to append some additional information to a livestream of the hive, and to correlate behavior at the hive entrance to weather, temperature, etc. Since then, I've added additional training data not specific to my hive which accounts for classification of drones and queens in addition to bees/pollen bees. Currently the model generalizes reasonably well, but more training data is required.
Assessed Classes & Labeling Guidelines
- bees (either workers or foragers)
- bees carrying pollen
- drones
- queens
Labeling should cover the entire body of the bee, excluding the wings as per the following example:
For the class of bees carrying pollen, it is acceptable to extend the box to include the visible pollen packs to distinguish this from the bee class:
Sample Results
Sample video of backyard hive entrance with low to moderate level of activity:
https://www.youtube.com/watch?v=qZW5eYd0Yw8&t=2266s
Generic sample video of single bee:
https://www.youtube.com/watch?v=A1x6VA8TWCg
Latest YOLOv5 Weights Files
The latest weights files for use by others are posted on my GitHub here: https://github.com/mattnudi/bee-detection
These files will be updated as more images are added to the dataset.
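The per-class counts mentioned in the project purpose could be derived from a frame's detections along these lines. This is a sketch under assumptions: the detection dictionaries, the class label strings, and the 0.5 confidence threshold are hypothetical and not taken from the project's code.

```python
from collections import Counter


def count_by_class(detections, min_confidence=0.5):
    """Count detections per class label, ignoring low-confidence ones."""
    return Counter(
        d["class"] for d in detections if d["confidence"] >= min_confidence
    )


# Hypothetical detections for a single video frame.
frame = [
    {"class": "bee", "confidence": 0.91},
    {"class": "pollen_bee", "confidence": 0.84},
    {"class": "bee", "confidence": 0.42},  # below threshold, not counted
    {"class": "drone", "confidence": 0.77},
]
counts = count_by_class(frame)
print(dict(counts))
```

Counts like these, logged per frame or per minute, are what could be overlaid on a livestream or correlated with weather data.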
Background Information
This dataset was curated and annotated by Ahmed Elmogtaba Abdelaziz.
The original dataset (v6) is composed of 204 images of honeybees present in a wide variety of scenes.
The dataset is available under a Public License.
Getting Started
You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
Dataset Versions
Version 5 - 490 images
- Preprocessing: Resize, 416 by 416
- Augmentations:
- 90° Rotate: Clockwise, Counter-Clockwise
- Rotation: Between -15° and +15°
- Saturation: Between -10% and +10%
- Brightness: Between -10% and +10%
- Blur: Up to 0.25px
- Mosaic: Applied
- Output: 3x image generation
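2022-10-19: Found a bug where images exported from Roboflow are rotated but the annotations are not. Auto-orient does not help either.
Tremella (snow fungus) defect dataset
Defects can be any combination of the following conditions, which makes labeling tricky:
- m: mold
- l: rot
- b: black
- q: cut damage
- g: root
- n: adhesion
Objects with w and h below 0.04% count as small targets.
e1 (must be detected functionally; hard to label, so only a few were labeled)
- Defect regions that are very easy to distinguish
e-m (AP .95)
- White patches (leaves turning into white patches, single or multiple leaves, accompanied by collapse)
- Fuzzy growth (white fuzz, green fuzz)
- Mold: rotted and moldy, rotten green
- Mold: mushroom head turned moldy, yellow-green
e-b
- Pure "beard": a small tuft or a patch, usually distributed in a ring (acceptable if the area is small)
e-b-l
- "Beard" plus abnormal color around it: a slightly rotten color (acceptable if the area is small)
e-l (needs further subdivision, AP .27)
- Small-area mild surface rot, oily-looking and tending toward brown
- Large-area mild surface rot, with a large oily-looking area
e-l-q
- Parts cut away because of surface rot
e-l-n
- Mild surface rot plus adhering dirt
e-l-bb
- Severe surface rot (brown, dark brown)
e-l-g
- e-l around the root
e-n (too little data)
- Dust stuck on after washing
- Marks from touching the oven wall
err
- Unsightly: the color is off
- Items from other categories are sometimes also assigned to this class
e-s: string (very few samples)
e-f (too little data)
- Flattened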
This Dataset contains images of popular North American mushrooms, Chicken of the Woods and Chanterelle, differentiating between the two species.
This dataset is an example of an object detection task that is possible via custom training with Roboflow.
Two versions are listed. "416x416" is a 416 resolution version that contains the base images in the dataset. "416x416augmented" contains the same images with various image augmentations applied to build a more robust model.
Background Information
This dataset was curated and annotated by Karel Cornelis.
The original dataset (v1) is composed of 516 images of various ingredients inside a fridge. The project was created as part of group work for a postgraduate program in applied AI at Erasmus Brussels: we made an object detection model to identify ingredients in a fridge.
From the recipe dataset we used (a subset of the Recipe1M dataset), we distilled the top 50 ingredients and used 30 of those to randomly fill our fridge.
Read this blog post to learn more about the model production process: How I Used Computer Vision to Make Sense of My Fridge
Watch this video to see the model in action: AICook
The dataset is available under the MIT License.
Getting Started
You can download this dataset for use within your own project, fork it into a workspace on Roboflow to create your own model, or test one of the trained versions within the app.
Dataset Versions
Version 1 (v1) - 516 images (original-images)
- Preprocessing: Auto-Orient
- Augmentations: No augmentations applied
- Training Metrics: This version of the dataset was not trained
Version 2 (v2) - 3,050 images (aicook-augmented-trainFromCOCO)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
- Rotation: Between -3° and +3°
- Exposure: Between -20% and +20%
- Blur: Up to 3px
- Noise: Up to 5% of pixels
- Cutout: 12 boxes with 10% size each
- Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
- mAP = 97.6%, precision = 86.9%, recall = 98.5%
Version 3 (v3) - 3,050 images (aicook-augmented-trainFromScratch)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
- Rotation: Between -3° and +3°
- Exposure: Between -20% and +20%
- Blur: Up to 3px
- Noise: Up to 5% of pixels
- Cutout: 12 boxes with 10% size each
- Training Metrics: Trained from "scratch" (no transfer learning employed) on Roboflow
- mAP = 97.9%, precision = 79.6%, recall = 98.6%
Version 4 (v4) - 3,050 images (aicook-augmented)
- Preprocessing: Auto-Orient, Resize (Stretch to 416x416)
- Augmentations:
- Outputs per training example: 8
- Rotation: Between -3° and +3°
- Exposure: Between -20% and +20%
- Blur: Up to 3px
- Noise: Up to 5% of pixels
- Cutout: 12 boxes with 10% size each
- Training Metrics: This version of the dataset was not trained
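The precision and recall figures reported for Versions 2 and 3 can be folded into a single F1 score for a quick side-by-side comparison. This is a worked calculation on the numbers listed above, not a metric reported by the dataset author.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_v2 = f1_score(0.869, 0.985)  # v2: transfer learning from the COCO checkpoint
f1_v3 = f1_score(0.796, 0.986)  # v3: trained from scratch
print(f"v2 F1 = {f1_v2:.3f}, v3 F1 = {f1_v3:.3f}")
```

The two versions have nearly identical mAP and recall, so the gap in F1 (about 0.92 versus 0.88) comes almost entirely from the difference in precision.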
Karel Cornelis - LinkedIn
Overview
This dataset contains 581 images of various shellfish classes for object detection. These images are derived from the Open Images open source computer vision datasets.
This dataset only scratches the surface of the Open Images dataset for shellfish!
Use Cases
- Train object detector to differentiate between a lobster, shrimp, and crab.
- Train object detector to differentiate between shellfish
- Object detection dataset across different sub-species
- Object detection among related species
- Test object detector on highly related objects
- Train shellfish detector
- Explore the quality and range of Open Image dataset
Tools Used to Derive Dataset
These images were gathered via the OIDv4 Toolkit. This toolkit allows you to pick an object class and retrieve a set number of images from that class with bounding box labels.
We provide this dataset as an example of the ability to query the OID for a given subdomain. This dataset can easily be scaled up - please reach out to us if that interests you.
This dataset is derived from the following publication:
Kaspars Sudars, Janis Jasko, Ivars Namatevs, Liva Ozola, Niks Badaukis,
Dataset of annotated food crops and weed images for robotic computer vision control,
Data in Brief,
Volume 31,
2020,
105833,
ISSN 2352-3409,
https://doi.org/10.1016/j.dib.2020.105833.
(https://www.sciencedirect.com/science/article/pii/S2352340920307277)
Abstract: Weed management technologies that can identify weeds and distinguish them from crops are in need of artificial intelligence solutions based on a computer vision approach, to enable the development of precisely targeted and autonomous robotic weed management systems. A prerequisite of such systems is to create robust and reliable object detection that can unambiguously distinguish weed from food crops. One of the essential steps towards precision agriculture is using annotated images to train convolutional neural networks to distinguish weed from food crops, which can be later followed using mechanical weed removal or selected spraying of herbicides. In this data paper, we propose an open-access dataset with manually annotated images for weed detection. The dataset is composed of 1118 images in which 6 food crops and 8 weed species are identified, altogether 7853 annotations were made in total. Three RGB digital cameras were used for image capturing: Intel RealSense D435, Canon EOS 800D, and Sony W800. The images were taken on food crops and weeds grown in controlled environment and field conditions at different growth stages
Keywords: Computer vision; Object detection; Image annotation; Precision agriculture; Crop growth and development
Overview
The Aerial Sheep dataset contains images taken from a bird's-eye view with instances of sheep in them. The annotations do not differentiate between gender or breed of sheep, instead grouping them into a single class named "sheep".
Example Footage
See RIIS's sheep counter application for additional use case examples.
Link - https://riis.com/blog/counting-sheep-using-drones-and-ai/
About RIIS
Banana Ripening Process Dataset and Model
This dataset contains images of the classes below:
- freshripe
- freshunripe
- overripe
- ripe
- rotten
- unripe
Usage
This object detection model may help identify where fruit in stores is in the ripening process, and when to take it off the shelves and send it to composting.
This dataset was originally created by Melanie S. Capalungan, "B-Jay" Daguio, Isaac Balbuena, Reanne Joy Rafael. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/molds-onbk3/peanuts-mckge/.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Roopa Shree, Shriya J. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/cotton-nqp2x/bt-cotton.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Arfiani Nur Sayidah. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/arfiani-nur-sayidah-9lizr/apple-sorting-2bfhk.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Nirmani. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/nirmani/yolo-custome-925.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Quandong Qian. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/quandong-qian/desease-cotton-plant.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Rinat Landman. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/lettucedetector/complete_dataset_0910.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Jan Douwe. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/jan-douwe/testbl.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by 윤태원 (yuntaewon), 황혜윤 (hwanghyeyun), 김민서 (gimminseo), 김노현 (gimnohyeon) , 신다홍 (sindahong), 김성수 (gimseongsu). To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/puri/puri4-ygapu.
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
This dataset was originally created by Jordan Bird, Leah Bird, Carrie Ijichi, Aurelie Jolivald, Salisu Wada, Kay Owa, Chloe Barnes of Nottingham Trent University (United Kingdom).
This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.
Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark
Overview
The PlantDoc dataset was originally published by researchers at the Indian Institute of Technology, and described in depth in their paper. One of the paper’s authors, Pratik Kayal, shared the object detection dataset available on GitHub.
PlantDoc is a dataset of 2,569 images across 13 plant species and 30 classes (diseased and healthy) for image classification and object detection. There are 8,851 labels. Read more about how the version available on Roboflow improves on the original version here.
And here's an example image:
Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 416x416 export.
Use Cases
As the researchers from IIT stated in their paper, “plant diseases alone cost the global economy around US$220 billion annually.” Training models to recognize plant diseases earlier dramatically increases yield potential.
The dataset also serves as a useful open dataset for benchmarks. The researchers trained both object detection models like MobileNet and Faster-RCNN and image classification models like VGG16, InceptionV3, and InceptionResnet V2.
The dataset is useful for advancing general agriculture computer vision tasks, whether that be healthy crop classification, plant disease classification, or plant disease object detection.
Using this Dataset
This dataset follows Creative Commons 4.0 protocol. You may use it commercially without Liability, Trademark use, Patent use, or Warranty.
Provide the following citation for the original authors:
@misc{singh2019plantdoc,
title={PlantDoc: A Dataset for Visual Plant Disease Detection},
author={Davinder Singh and Naman Jain and Pranjali Jain and Pratik Kayal and Sudhakar Kumawat and Nipun Batra},
year={2019},
eprint={1911.10317},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
About Roboflow
Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.
Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
We will try to retain the wide, short aspect ratio of these images. This is how the OAK-D camera will send video to the NN model, so that the weed spraying wand is directed left or right only, since it doesn't have fore-and-aft movement yet. The machine moves forward, of course, and we plan for a speed that allows good timing when spraying the weeds. For this reason, only the strongest detection will be sent to the Raspberry Pi and servo to move the wand; there won't be time to spray two spots in an image. We will train YOLOv5 with Ultralytics, save to OpenVINO format, then convert that into the "blob" that the OAK requires. The Raspberry Pi will also run the main Python script and control the relay and pump for herbicide application.
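The "strongest detection only" rule described above could look roughly like the sketch below. Everything here is hypothetical: the detection dictionary keys, the dead-zone width, and the command strings are illustrative choices, not the project's actual main script or any camera API.

```python
def wand_command(detections, frame_width, dead_zone=0.1):
    """Return 'left', 'right', 'center', or None for the strongest detection.

    detections: list of dicts with 'confidence' and 'x_center' (pixels).
    dead_zone: fraction of the frame width around the center treated as 'center'.
    """
    if not detections:
        return None
    # Only the single highest-confidence detection drives the wand.
    best = max(detections, key=lambda d: d["confidence"])
    # Horizontal offset of the box center from the frame center, in [-0.5, 0.5].
    offset = best["x_center"] / frame_width - 0.5
    if abs(offset) <= dead_zone:
        return "center"
    return "left" if offset < 0 else "right"


# Hypothetical frame with two weeds; the stronger one is on the left side.
detections = [
    {"confidence": 0.60, "x_center": 500},
    {"confidence": 0.90, "x_center": 100},
]
command = wand_command(detections, frame_width=640)
print(command)
```

Ignoring the vertical position matches the wand's left-right-only movement; the forward speed of the machine would decide when the pump relay fires.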
deepNIR: Dataset for generating synthetic NIR images and improved fruit detection system using deep learning techniques
This page introduces our Blueberry dataset (bounding box), imported from the deepFruits dataset and used for producing results. If this dataset is helpful for your research or application, please consider citing our paper as follows:
@Article{s22134721,
AUTHOR = {Sa, Inkyu and Lim, Jong Yoon and Ahn, Ho Seok and MacDonald, Bruce},
TITLE = {deepNIR: Datasets for Generating Synthetic NIR Images and Improved Fruit Detection System Using Deep Learning Techniques},
JOURNAL = {Sensors},
VOLUME = {22},
YEAR = {2022},
NUMBER = {13},
ARTICLE-NUMBER = {4721},
URL = {https://www.mdpi.com/1424-8220/22/13/4721},
PubMedID = {35808218},
ISSN = {1424-8220},
DOI = {10.3390/s22134721}
}
You can download our published paper from:
https://www.mdpi.com/1424-8220/22/13/4721
Please contact us at enddl22@gmail.com if you have questions, suggestions, or concerns.
deepNIR: Dataset for generating synthetic NIR images and improved fruit detection system using deep learning techniques
This page introduces our Avocado dataset (bounding box), imported from the deepFruits dataset and used for producing results. If this dataset is helpful for your research or application, please consider citing our paper as follows:
@Article{s22134721,
AUTHOR = {Sa, Inkyu and Lim, Jong Yoon and Ahn, Ho Seok and MacDonald, Bruce},
TITLE = {deepNIR: Datasets for Generating Synthetic NIR Images and Improved Fruit Detection System Using Deep Learning Techniques},
JOURNAL = {Sensors},
VOLUME = {22},
YEAR = {2022},
NUMBER = {13},
ARTICLE-NUMBER = {4721},
URL = {https://www.mdpi.com/1424-8220/22/13/4721},
PubMedID = {35808218},
ISSN = {1424-8220},
DOI = {10.3390/s22134721}
}
You can download our published paper from:
https://www.mdpi.com/1424-8220/22/13/4721
Please contact us at enddl22@gmail.com if you have questions, suggestions, or concerns.
Allergen30
About Allergen30
Allergen30 is created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with a goal of building a robust detection model that can assist people in avoiding possible allergic reactions.
It contains more than 6,000 images of 30 commonly used food items which can cause an adverse reaction within a human body. This dataset is one of the first research attempts in training a deep learning based computer vision model to detect the presence of such food items from images. It also serves as a benchmark for evaluating the efficacy of object detection methods in learning the otherwise difficult visual cues related to food items.
Description of class labels
Multiple food items are associated with specific food intolerances and can trigger an adverse reaction. These intolerances primarily include Lactose, Histamine, Gluten, Salicylate, Caffeine and Ovomucoid intolerance.
The following table contains the description relating to the 30 class labels in our dataset.
S. No. | Allergen | Food label | Description |
---|---|---|---|
1 | Ovomucoid | egg | Images of egg with yolk (e.g. sunny side up eggs) |
2 | Ovomucoid | whole_egg_boiled | Images of soft and hard boiled eggs |
3 | Lactose/Histamine | milk | Images of milk in a glass |
4 | Lactose | icecream | Images of icecream scoops |
5 | Lactose | cheese | Images of swiss cheese |
6 | Lactose/Caffeine | milk_based_beverage | Images of tea/coffee with milk in a cup/glass |
7 | Lactose/Caffeine | chocolate | Images of chocolate bars |
8 | Caffeine | non_milk_based_beverage | Images of soft drinks and tea/coffee without milk in a cup/glass |
9 | Histamine | cooked_meat | Images of cooked meat |
10 | Histamine | raw_meat | Images of raw meat |
11 | Histamine | alcohol | Images of alcohol bottles |
12 | Histamine | alcohol_glass | Images of wine glasses with alcohol |
13 | Histamine | spinach | Images of spinach bundle |
14 | Histamine | avocado | Images of avocado sliced in half |
15 | Histamine | eggplant | Images of eggplant |
16 | Salicylate | blueberry | Images of blueberry |
17 | Salicylate | blackberry | Images of blackberry |
18 | Salicylate | strawberry | Images of strawberry |
19 | Salicylate | pineapple | Images of pineapple |
20 | Salicylate | capsicum | Images of bell pepper |
21 | Salicylate | mushroom | Images of mushrooms |
22 | Salicylate | dates | Images of dates |
23 | Salicylate | almonds | Images of almonds |
24 | Salicylate | pistachios | Images of pistachios |
25 | Salicylate | tomato | Images of tomato and tomato slices |
26 | Gluten | roti | Images of roti |
27 | Gluten | pasta | Images of one serving of penne pasta |
28 | Gluten | bread | Images of bread slices |
29 | Gluten | bread_loaf | Images of bread loaf |
30 | Gluten | pizza | Images of pizza and pizza slices |
Data collection
We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.
Fair use
This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).
Citation
If you find our dataset useful, please cite us as:
@article{mishra2022allergen30,
title={Allergen30: Detecting Food Items with Possible Allergens Using Deep Learning-Based Computer Vision},
author={Mishra, Mayank and Sarkar, Tanmay and Choudhury, Tanupriya and Bansal, Nikunj and Smaoui, Slim and Rebezov, Maksim and Shariati, Mohammad Ali and Lorenzo, Jose Manuel},
journal={Food Analytical Methods},
pages={1--34},
year={2022},
publisher={Springer}
}
Overview
The Fruits dataset is an image classification dataset of various fruits against white backgrounds from various angles, originally open sourced by GitHub user horea. This is a subset of that full dataset.
Example Image:
Use Cases
Build a fruit classifier! This could be a just-for-fun project, or the basis of a color sorter for agricultural use cases before fruits make their way to market.
Using this Dataset
Use the fork button to copy this dataset to your own Roboflow account and export it with new preprocessing settings (perhaps resized for your model's desired format or converted to grayscale), or additional augmentations to make your model generalize better. This particular dataset would be very well suited for Roboflow's new advanced Bounding Box Only Augmentations.