Top Biology Datasets

Roboflow hosts the world's biggest set of open source biology datasets and pre-trained computer vision models, with images captured from microscopes, handheld devices, and other sources. These projects can help you find objects of interest in things like Petri dishes, agar plates, museum or aquarium displays, and more. This section also features a highlighted project from the Exploratorium (https://www.exploratorium.edu/), a science education and R&D museum in San Francisco, California.

Overview

This is a dataset of blood cell photos, originally open sourced by cosmicad and akshaylambda.

There are 364 images across three classes: WBC (white blood cells), RBC (red blood cells), and Platelets. There are 4888 labels across 3 classes (and 0 null examples).

Here's a class count from Roboflow's Dataset Health Check:

BCCD health

And here's an example image:

Blood Cell Example

Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 500x500 export.

Use Cases

This is a small-scale object detection dataset, commonly used to assess model performance. It's a first example of medical imaging capabilities.

Using this Dataset

We're releasing the data as public domain. Feel free to use it for any purpose.

It's not required to provide attribution, but it'd be nice! :)

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their boilerplate code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


Overview

The Roboflow Mask Wearing iOS dataset is an object detection dataset of individuals wearing various types of masks and those without masks. A subset of the images was originally collected by Cheng Hsun Teng from Eden Social Welfare Foundation, Taiwan and relabeled by the Roboflow team.

Example images (with masks, and without):
Example Image - With Mask

Example Image - Without Mask

Use Cases

One could use this dataset to build a system for detecting whether an individual is wearing a mask in a given photo. PPE detection in high-risk work settings and in general health and safety settings are other good use cases.

The dataset has a few batches of images collected only from iPhones, to help improve the performance of model predictions on iPhones with the Roboflow Mobile iOS SDK.

Using this Dataset

Use the Download this Dataset button to download this dataset, or import it into your own Roboflow account and export it with new preprocessing settings (for example, resized to your model's desired input size or converted to grayscale) or with additional augmentations to help your model generalize better.

You can also continue working on it in Roboflow to test, improve, and deploy your model.


Background Information

This dataset was created by Michael Shamash and contains the images used to train the OnePetri plaque detection model (plaque detection model v1.0).

In microbiology, a plaque is defined as a “clear area on an otherwise opaque field of bacteria that indicates the inhibition or dissolution of the bacterial cells by some agent, either a virus or an antibiotic. Plaques are a sensitive laboratory indicator of the presence of some anti-bacterial factor.”
When working with bacteriophages (phages), viruses which can only infect and kill bacteria, scientists often need to perform the time-intensive, monotonous task of counting plaques on Petri dishes. To help solve this problem, I developed OnePetri, a set of machine learning models and a mobile phone application (currently iOS-only) that accelerates common microbiological Petri dish assays using AI.

A task that once took microbiologists several minutes per Petri dish (which adds up quickly, since there are often tens of Petri dishes to analyze at a time) can now be mostly automated with computer vision and completed in a matter of seconds.

App in Action

Video Clip

Petri Dish

Example Image

Plaque Detection

A total of 43 source images were used in this dataset with the following split: 29 training, 9 validation, 5 testing (2505 images after preprocessing and augmentations are applied).

OnePetri is a mobile phone application (currently iOS-only) which accelerates common microbiological Petri dish assays using AI. OnePetri's YOLOv5s plaque detection model was trained on a diverse set of images from the HHMI's SEA-PHAGES program, many of which are included in this dataset. This project wouldn't be possible without their support!

The following pre-processing options were applied:

  1. Auto-orient
  2. Tile image into 5 rows x 5 columns
  3. Resize tiles to 416px x 416px
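
As a rough illustration of the tiling step above, the sketch below computes the pixel boxes for a 5 x 5 grid. The function name `tile_boxes` and the integer-division handling of edges are assumptions made for this example; it is not Roboflow's implementation, and the resize of each tile to 416px x 416px would be left to an image library.

```python
def tile_boxes(width, height, rows=5, cols=5):
    """Return (left, top, right, bottom) boxes for a rows x cols grid.

    Hypothetical helper: edges use integer division so the boxes cover the
    whole image even when its size is not divisible by rows or cols.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Example with an assumed 4032 x 3024 source photo.
boxes = tile_boxes(4032, 3024)
print(len(boxes))  # one box per tile, in row-major order
print(boxes[0], boxes[-1])
```

Each source image yields 25 tiles this way, which is why the image count after preprocessing is much larger than the 43 source images.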

The following augmentation options were applied:

  1. Grayscale (35% of images)
  2. Hue shift (-45deg to +45deg)
  3. Blur up to 2px
  4. Mosaic

OnePetri App In Action

For more information and to download OnePetri please visit: https://onepetri.ai/.

This classification dataset was uploaded to Kaggle by Paul Mooney.

It contains over 5,000 images of chest x-rays in two categories: "PNEUMONIA" and "NORMAL."

  • Version 1 contains the raw images, and only has the pre-processing feature of "Auto-Orient" applied to strip out EXIF data, and ensure all images are "right side up."
  • Version 2 contains the raw images with pre-processing features of "Auto-Orient" and Resize of 640 by 640 applied
  • Version 3 was trained with Roboflow's model architecture for classification datasets and contains the raw images with pre-processing features of "Auto-Orient" and Resize of 640 by 640 applied + augmentations:
    • Outputs per training example: 3
    • Shear: ±3° Horizontal, ±2° Vertical
    • Saturation: Between -5% and +5%
    • Brightness: Between -5% and +5%
    • Exposure: Between -5% and +5%

Below you will find the description provided on Kaggle:

Context

Figure S6. Illustrative Examples of Chest X-Rays in Patients with Pneumonia, Related to Figure 6
The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse “interstitial” pattern in both lungs.
http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5

Content

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).
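
For a layout like the one described above, a short sketch such as the following could tally images per split and category. It assumes a `<root>/<split>/<category>/<image>` directory structure and `.jpeg`/`.jpg` file extensions; the function name `count_images` is hypothetical, and the exact folder names in a given download may differ.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions for the JPEG images.
IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def count_images(root):
    """Count images keyed by (split, category) under root/<split>/<category>/."""
    counts = Counter()
    for path in Path(root).glob("*/*/*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            split = path.parent.parent.name
            category = path.parent.name
            counts[(split, category)] += 1
    return counts
```

Calling `count_images("chest_xray")` on an assumed local folder named `chest_xray` would return a `Counter` whose keys look like `("train", "NORMAL")`, which makes any class imbalance between splits easy to spot.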

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients of one to five years old from Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest x-ray images, all chest radiographs were initially screened for quality control by removing all low quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

Acknowledgements

Data: https://data.mendeley.com/datasets/rscbjbr9sj/2

License: CC BY 4.0

Citation: http://www.cell.com/cell/fulltext/S0092-8674(18)30154-5

Inspiration

Automated methods to detect and classify human diseases from medical images.

The following datasets are part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

Each entry lists the original creator and the current project, which may have been updated since this version:

  • Terada Shoma: https://universe.roboflow.com/terada-shoma/gram-positive-bacteria
  • Simeon Marlokov: https://universe.roboflow.com/publictestsite/xray-rheumatology-images-public
  • Victor Perez, Stephen Groff, Mason Hintermeister: https://universe.roboflow.com/sharks/shark-taxonomy
  • Anonymous (project link not given)
  • Adrian Rodriguez: https://universe.roboflow.com/tfg-2nmge/axial-dataset
  • Nirmani: https://universe.roboflow.com/nirmani/yolo-custome-925
  • Ahmad Rabiee: https://universe.roboflow.com/ahmad-rabiee/asbest91/
  • Danish (project link not given)
  • Thuan Phat Nguyen: https://universe.roboflow.com/objectdetection-9lu9z/detectron2-acl
  • Abhishek Dada: https://universe.roboflow.com/liver-t5yvf/liver-diseases
  • Yousef Ghanem: https://universe.roboflow.com/yousef-ghanem-jzj4y/brain-tumor-detection-fpf1f
  • Miguel Fernández Cruchaga: https://universe.roboflow.com/new-workspace-86q1t/t03-proyecto-celula-dataset-ampliado

Rotifers, Microbeads and Algae

By Jord Liu and The Exploratorium

Background

This is the machine learning half of Seeing Scientifically, a larger research project at the Exploratorium's Biology Lab that investigates how to use machine learning and other exhibit technology to best teach visitors in an informal learning context like the Exploratorium.

In this iteration of the project, we train an ML model to detect microscopic animals called rotifers, parts of their body (e.g. head, gut, jaw), and microbeads and algae in real time. This model is then integrated into a museum exhibit kiosk prototype that is deployed live on the Exploratorium's museum floor, and visitor research is collected on the efficacy of the exhibit.

Short gif demo of ML detection

Data and Model

The images used here are captured directly from a microscope feed and then labelled by Exploratorium employees and volunteers. Some include up to hundreds of microbeads or algae, some are brightfield and some are darkfield. They show rotifers in multiple poses, including some where the tails are not readily visible. There is relatively little variance in the images here as the environment is highly controlled. We use tiled data of multiple sizes mixed in with the full images.

We use YOLOv4, though future work includes retraining with YOLOR, YOLOv7, and other SOTA models. We also experimented with KeypointRCNN for pose estimation but found that its performance did not exceed our baseline of using YOLOv4 and treating the keypoints as objects.

Current performance by class is:
class_id = 0, name = algae, ap = 64.29% (TP = 176, FP = 79)
class_id = 1, name = bead, ap = 77.01% (TP = 251, FP = 41)
class_id = 2, name = bigbead, ap = 82.46% (TP = 36, FP = 5)
class_id = 3, name = egg, ap = 95.51% (TP = 16, FP = 4)
class_id = 4, name = gut, ap = 82.55% (TP = 70, FP = 13)
class_id = 5, name = head, ap = 78.38% (TP = 59, FP = 3)
class_id = 6, name = mastics, ap = 86.82% (TP = 49, FP = 6)
class_id = 7, name = poop, ap = 56.27% (TP = 34, FP = 15)
class_id = 8, name = rotifer, ap = 72.60% (TP = 83, FP = 17)
class_id = 9, name = tail, ap = 46.14% (TP = 27, FP = 7)
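
To make the per-class figures above easier to compare, the sketch below parses those lines and derives two summary numbers: the mean of the reported AP values and a per-class precision, TP / (TP + FP). The parsing pattern and the names `parse_report` and `REPORT` are assumptions for this example. The source does not state the IoU or confidence thresholds behind these counts, and recall cannot be derived because false negatives are not reported.

```python
import re

# Per-class results copied from the report above.
REPORT = """\
class_id = 0, name = algae, ap = 64.29% (TP = 176, FP = 79)
class_id = 1, name = bead, ap = 77.01% (TP = 251, FP = 41)
class_id = 2, name = bigbead, ap = 82.46% (TP = 36, FP = 5)
class_id = 3, name = egg, ap = 95.51% (TP = 16, FP = 4)
class_id = 4, name = gut, ap = 82.55% (TP = 70, FP = 13)
class_id = 5, name = head, ap = 78.38% (TP = 59, FP = 3)
class_id = 6, name = mastics, ap = 86.82% (TP = 49, FP = 6)
class_id = 7, name = poop, ap = 56.27% (TP = 34, FP = 15)
class_id = 8, name = rotifer, ap = 72.60% (TP = 83, FP = 17)
class_id = 9, name = tail, ap = 46.14% (TP = 27, FP = 7)
"""

LINE = re.compile(r"name = (\w+), ap = ([\d.]+)% \(TP = (\d+), FP = (\d+)\)")


def parse_report(text):
    """Return one dict per class with its AP (percent) and precision (0 to 1)."""
    rows = []
    for name, ap, tp, fp in LINE.findall(text):
        tp, fp = int(tp), int(fp)
        rows.append({"name": name, "ap": float(ap), "precision": tp / (tp + fp)})
    return rows


rows = parse_report(REPORT)
mean_ap = sum(row["ap"] for row in rows) / len(rows)
print(f"mean AP over {len(rows)} classes: {mean_ap:.2f}%")
for row in sorted(rows, key=lambda r: r["ap"]):
    print(f"{row['name']:>8}  ap={row['ap']:.2f}%  precision={row['precision']:.3f}")
```

Sorting by AP puts the weakest classes (tail, poop, algae) first, which is one simple way to decide where additional labelled examples might help most.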

Examples

Screen captures from our exhibit as of July 2022.
Rotifer body parts
Microbead detection
Algae detection
