
Top Food Datasets

Open source food computer vision datasets, pre-trained models, and APIs.

Welcome!

This is a project on training a machine to read wine labels and pick up their information. Specifically, I look at several class labels on each wine label. Within each class, specific attributes can be assigned to provide more detailed information (for example, under wine type: Cabernet Sauvignon, Riesling or Merlot):

(1) Maker/name of the vineyard or producer
(2) Vintage/year the wine was produced
(3) Whether the wine is sustainable or sustainably farmed
(4) Whether the wine is organic
(5) Alcohol level
(6) Appellation quality, in terms of common AVA ratings
(7) Year the vineyard was established
(8) Whether there is any appellation name (AOC, DOC, AVA)
(9) Whether the country of origin can be identified
(10) Whether the type of wine can be identified
(11) Whether there is a distinct picture or brand logo
(12) Whether there is an indication of sweetness level
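One way to store these class/attribute pairs as annotation records is sketched below. The enum names, the `LabelRegion` type and the "CLASS/attribute" naming scheme are hypothetical illustrations, not the project's actual label names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class WineLabelClass(Enum):
    """The 12 classes listed above; these identifiers are hypothetical."""
    MAKER = 1
    VINTAGE = 2
    SUSTAINABLE = 3
    ORGANIC = 4
    ALCOHOL_LEVEL = 5
    APPELLATION_QUALITY = 6
    ESTABLISHED_YEAR = 7
    APPELLATION_NAME = 8
    COUNTRY_OF_ORIGIN = 9
    WINE_TYPE = 10
    LOGO = 11
    SWEETNESS = 12


@dataclass(frozen=True)
class LabelRegion:
    """One annotated region on a wine label: a class plus an optional attribute."""
    cls: WineLabelClass
    attribute: Optional[str] = None  # e.g. "Riesling" for WINE_TYPE, "2018" for VINTAGE

    def label_name(self) -> str:
        # Hypothetical naming scheme: "CLASS" or "CLASS/attribute"
        if self.attribute:
            return f"{self.cls.name}/{self.attribute}"
        return self.cls.name


def classes_present(regions: List[LabelRegion]) -> Set[WineLabelClass]:
    """Return which of the 12 classes appear on one wine label."""
    return {region.cls for region in regions}
```

For example, `LabelRegion(WineLabelClass.WINE_TYPE, "Riesling").label_name()` returns `"WINE_TYPE/Riesling"`. Keeping the attribute separate from the class lets a detector be trained on the 12 coarse classes while the finer attribute is read afterwards.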

I hope we can all help train the machine to read wine labels better and to make smarter, higher-quality inferences, rather than just reading and picking up the information as is, which would be no different from OCR.

-Yilong Eric Zheng

Bananas

-> I have not yet labeled images that contain bunches, so that the model first gets a general understanding of a single banana.
-> For my purposes, a bunch is defined as ">3 connected bananas".

Naming convention

-> Images that look like bananas but are not bananas are named "banana_uncertain...". We need to decide on these together at some point (whether they can be used as negatives, should be labeled after all, or should be removed entirely).
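Because look-alike images are marked only by their file name, collecting them for the joint decision is a simple prefix check. A minimal sketch, assuming the file names start with the literal prefix `banana_uncertain`; the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

UNCERTAIN_PREFIX = "banana_uncertain"  # naming convention for look-alike images


def split_uncertain(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (look-alikes to review together, regular images)."""
    uncertain: List[str] = []
    regular: List[str] = []
    for name in filenames:
        if name.startswith(UNCERTAIN_PREFIX):
            uncertain.append(name)
        else:
            regular.append(name)
    return uncertain, regular
```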

General

-> Always label all objects!
-> Missing labels are very bad, but still better than wrong labels.

Occlusion

(-> If an object is covered by something else, it must be labeled as if it were completely visible, so that the model can learn the object's size.) Bananas are an exception.
-> If an object is at the edge of the image, decide whether it adds value for the model and whether the object's size can be labeled correctly.

Completeness

-> Make the boxes as tight around the object as possible.
-> Never cut off any part of the object!
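"As tight as possible without cutting anything off" means the box edges sit exactly on the object's outermost pixels, i.e. the minimum and maximum of its pixel coordinates. A minimal sketch, assuming a hypothetical binary mask of the object is available (the labeling described here is done by hand):

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel indices


def tight_box(mask: List[List[bool]]) -> Optional[Box]:
    """Smallest box containing every True pixel of a binary mask, or None if the mask is empty."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, is_object in enumerate(row) if is_object]
    return min(cols), rows[0], max(cols), rows[-1]
```

The coordinates are inclusive; if an annotation format expects exclusive right/bottom edges, add 1 to `x_max` and `y_max`.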

Dataset generation

-> Image size: 640x640 with "Fit Within", to preserve aspect ratios.
-> No further augmentations! YOLO does that on its own.
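Preserving the aspect ratio means scaling width and height by one shared factor. A sketch of that arithmetic, assuming "Fit Within" scales each image by min(640/w, 640/h) without adding padding (how Roboflow handles padding or upscaling internally is not stated here); the function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def fit_within(width: int, height: int, target: int = 640) -> Tuple[int, int, float]:
    """Uniformly scale (width, height) so the image fits inside target x target."""
    scale = min(target / width, target / height)
    return round(width * scale), round(height * scale), scale


def scale_box(box: Box, scale: float) -> Box:
    """Apply the same uniform scale to a bounding box so it stays aligned with the resized image."""
    x_min, y_min, x_max, y_max = box
    return x_min * scale, y_min * scale, x_max * scale, y_max * scale
```

Under this assumption a 1920x1080 photo becomes 640x360, and every box shrinks by the same factor of 1/3, so the boxes' proportions are unchanged.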

IndianFood-7


About IndianFood-7

IndianFood-7 was created by Ritu Agarwal, Nikunj Bansal, Tanmay Sarkar, Tanupriya Choudhury and Neelu Jyothi Ahuja with the goal of building an Indian food detection model. It contains more than 800 images of 7 popular Indian food items.

Data collection

We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.
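The two automatic filters above can be expressed as a keep/drop rule per image. A minimal sketch, assuming the channel count and the text boxes (for example, the output of the EAST detector, which is not shown) are already computed; the 10% text-area threshold and all names are hypothetical, since the text does not say how "a lot of text" was quantified.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def text_area_fraction(text_boxes: Sequence[Box], width: int, height: int) -> float:
    """Share of the image covered by text boxes (overlaps are double-counted, result capped at 1.0)."""
    covered = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in text_boxes)
    return min(covered / (width * height), 1.0)


def keep_image(num_channels: int, text_boxes: Sequence[Box], width: int, height: int,
               max_text_fraction: float = 0.1) -> bool:
    """Drop images with fewer than 3 color channels or with too much detected text."""
    if num_channels < 3:  # e.g. grayscale, i.e. incomplete RGB
        return False
    return text_area_fraction(text_boxes, width, height) <= max_text_fraction
```

Images that pass both checks would still go through the manual inspection described above.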

Fair use

This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

Overview:

Food waste detection dataset.

Labelling instructions:

  1. Label Every Object of Interest in Every Image
  2. Label the Entirety of an Object
  3. Label Occluded Objects, as if they were fully visible (It is a common misconception that boxes cannot overlap.)
  4. Create Tight Bounding Boxes. The edges of bounding boxes should touch the outermost pixels of the object that is being labeled.
  5. Create Specific Label Names (e.g. white pawn, black pawn, green apple, red apple). We can regroup them into the same class later, in the Roboflow preprocessing step.
  6. Maintain Clear Labeling Instructions

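Rule 5 works because merging is cheap while splitting is not: specific labels can always be collapsed into a broader class, but a broad label cannot be split again without relabeling. A sketch of the merge step outside Roboflow, using the example names from rule 5; the mapping and function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical mapping from specific label names to merged classes
MERGE_MAP: Dict[str, str] = {
    "white pawn": "pawn",
    "black pawn": "pawn",
    "green apple": "apple",
    "red apple": "apple",
}


def regroup(labels: List[str], mapping: Dict[str, str]) -> List[str]:
    """Replace each specific label by its merged class; unmapped labels are kept unchanged."""
    return [mapping.get(label, label) for label in labels]
```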
Allergen30


About Allergen30

Allergen30 was created by Mayank Mishra, Nikunj Bansal, Tanmay Sarkar and Tanupriya Choudhury with the goal of building a robust detection model that can help people avoid possible allergic reactions.

It contains more than 6,000 images of 30 commonly used food items that can cause an adverse reaction in the human body. This dataset is one of the first research attempts at training a deep learning based computer vision model to detect the presence of such food items in images. It also serves as a benchmark for evaluating how well object detection methods learn the otherwise difficult visual cues related to food items.

Description of class labels

Multiple food items are associated with specific food intolerances that can trigger an allergic reaction. These intolerances primarily include Lactose, Histamine, Gluten, Salicylate, Caffeine and Ovomucoid intolerance.
[Figure: Food intolerance]

The following table contains the description relating to the 30 class labels in our dataset.

S. No. | Allergen | Food label | Description
1 | Ovomucoid | egg | Images of egg with yolk (e.g. sunny side up eggs)
2 | Ovomucoid | whole_egg_boiled | Images of soft and hard boiled eggs
3 | Lactose/Histamine | milk | Images of milk in a glass
4 | Lactose | icecream | Images of ice cream scoops
5 | Lactose | cheese | Images of swiss cheese
6 | Lactose/Caffeine | milk_based_beverage | Images of tea/coffee with milk in a cup/glass
7 | Lactose/Caffeine | chocolate | Images of chocolate bars
8 | Caffeine | non_milk_based_beverage | Images of soft drinks and tea/coffee without milk in a cup/glass
9 | Histamine | cooked_meat | Images of cooked meat
10 | Histamine | raw_meat | Images of raw meat
11 | Histamine | alcohol | Images of alcohol bottles
12 | Histamine | alcohol_glass | Images of wine glasses with alcohol
13 | Histamine | spinach | Images of spinach bundles
14 | Histamine | avocado | Images of avocado sliced in half
15 | Histamine | eggplant | Images of eggplant
16 | Salicylate | blueberry | Images of blueberries
17 | Salicylate | blackberry | Images of blackberries
18 | Salicylate | strawberry | Images of strawberries
19 | Salicylate | pineapple | Images of pineapple
20 | Salicylate | capsicum | Images of bell pepper
21 | Salicylate | mushroom | Images of mushrooms
22 | Salicylate | dates | Images of dates
23 | Salicylate | almonds | Images of almonds
24 | Salicylate | pistachios | Images of pistachios
25 | Salicylate | tomato | Images of tomato and tomato slices
26 | Gluten | roti | Images of roti
27 | Gluten | pasta | Images of one serving of penne pasta
28 | Gluten | bread | Images of bread slices
29 | Gluten | bread_loaf | Images of bread loaf
30 | Gluten | pizza | Images of pizza and pizza slices
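To illustrate how the table could turn detections into an allergen warning, it can be inverted into a lookup from class label to intolerances. A minimal sketch: the mapping is transcribed from the table above, while the names and the idea of taking the union over all detected labels are hypothetical, not part of the dataset release.

```python
from typing import Dict, Iterable, Set

# Intolerance -> class labels, transcribed from the table above
_LABELS_BY_INTOLERANCE: Dict[str, Set[str]] = {
    "Ovomucoid": {"egg", "whole_egg_boiled"},
    "Lactose": {"milk", "icecream", "cheese", "milk_based_beverage", "chocolate"},
    "Histamine": {"milk", "cooked_meat", "raw_meat", "alcohol", "alcohol_glass",
                  "spinach", "avocado", "eggplant"},
    "Caffeine": {"milk_based_beverage", "chocolate", "non_milk_based_beverage"},
    "Salicylate": {"blueberry", "blackberry", "strawberry", "pineapple", "capsicum",
                   "mushroom", "dates", "almonds", "pistachios", "tomato"},
    "Gluten": {"roti", "pasta", "bread", "bread_loaf", "pizza"},
}

# Invert to class label -> intolerances (milk, milk_based_beverage and chocolate get two each)
LABEL_TO_INTOLERANCES: Dict[str, Set[str]] = {}
for intolerance, labels in _LABELS_BY_INTOLERANCE.items():
    for label in labels:
        LABEL_TO_INTOLERANCES.setdefault(label, set()).add(intolerance)


def intolerances_for(detected_labels: Iterable[str]) -> Set[str]:
    """Union of intolerances linked to the detected class labels; unknown labels are ignored."""
    found: Set[str] = set()
    for label in detected_labels:
        found |= LABEL_TO_INTOLERANCES.get(label, set())
    return found
```

For an image in which the model detects `pizza` and `milk`, this returns `{"Gluten", "Lactose", "Histamine"}`.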

Data collection

We used search engines (Google and Bing) to crawl and look for suitable images using JavaScript queries for each food item from the list created. The images with incomplete RGB channels were removed, and the images collected from different search engines were compiled. When downloading images from search engines, many images were irrelevant to the purpose, especially the ones with a lot of text in them. We deployed the EAST text detector to segregate such images. Finally, a comprehensive manual inspection was conducted to ensure the relevancy of images in the dataset.

Fair use

This dataset contains some copyrighted material whose use has not been specifically authorized by the copyright owners. In an effort to advance scientific research, we make this material available for academic research. If you wish to use copyrighted material in our dataset for purposes of your own that go beyond non-commercial research and academic purposes, you must obtain permission directly from the copyright owner. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for non-commercial research and educational purposes (adapted from Christopher Thomas).

Citation

If you find our dataset useful, please cite us as:

@article{mishra2022allergen30,
  title={Allergen30: Detecting Food Items with Possible Allergens Using Deep Learning-Based Computer Vision},
  author={Mishra, Mayank and Sarkar, Tanmay and Choudhury, Tanupriya and Bansal, Nikunj and Smaoui, Slim and Rebezov, Maksim and Shariati, Mohammad Ali and Lorenzo, Jose Manuel},
  journal={Food Analytical Methods},
  pages={1--34},
  year={2022},
  publisher={Springer}
}