Browse » Environmental

Top Environmental Datasets

Roboflow hosts the world's largest collection of open source environmental datasets and pre-trained computer vision models. Captured from satellites, drones, handheld devices, and more, these projects can help you find objects of interest in environmental settings such as oceans, forests, and trails.

BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos

Authors:

  • Elizabeth Bondi, Harvard University
  • Raghav Jain, University of Southern California
  • Palash Aggrawal, Indraprastha Institute of Information Technology
  • Saket Anand, Indraprastha Institute of Information Technology
  • Robert Hannaford, Duke University
  • Ashish Kapoor, University of Delhi
  • Jim Piavis, The Citadel
  • Shital Shah, University of Mumbai
  • Lucas Joppa, Chief Environmental Officer, Microsoft
  • Bistra Dilkina, University of Southern California
  • Milind Tambe, Harvard University

Published: 2020

Description: The Benchmarking IR Dataset for Surveillance with Aerial Intelligence (BIRDSAI, pronounced bird's-eye) is a long-wave thermal infrared dataset containing nighttime images of animals and humans in Southern Africa. The dataset allows for benchmarking of algorithms for automatic detection and tracking of humans and animals with both real and synthetic videos.

Use Cases: Wildlife Poaching Prevention, Night-time Intruder Detection, Wildlife Monitoring, Animal Behavior Research, Long Distance IR Detection

Download: The data can be downloaded from the Labeled Information Library of Alexandria.

Training Dataset Download: https://lilablobssc.blob.core.windows.net/conservationdrones/v01/conservation_drones_train_real.zip
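A minimal download-and-extract sketch in Python (the local archive name and output folder are illustrative; it uses requests plus the standard library's zipfile, and streams the large file to disk rather than loading it into memory):

import zipfile
import requests

URL = ("https://lilablobssc.blob.core.windows.net/conservationdrones/"
       "v01/conservation_drones_train_real.zip")
ARCHIVE = "conservation_drones_train_real.zip"

# Stream the large archive to disk in 1 MB chunks.
with requests.get(URL, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    with open(ARCHIVE, "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)

# Unpack into a local folder (the folder name is illustrative).
with zipfile.ZipFile(ARCHIVE) as zf:
    zf.extractall("birdsai_train_real")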

Annotation Format:
We follow the MOT annotation format, which is a CSV with the following columns:

<frame_number>, <object_id>, <x>, <y>, <w>, <h>, <class>, <species>, <occlusion>, <noise>

class: 0 if animals, 1 if humans

species: between -1 and 8 representing species below; 3 and 4 occur only in real data; 5, 6, 7, 8 occur only in synthetic data (note: most very small objects have unknown species)

-1: unknown, 0: human, 1: elephant, 2: lion, 3: giraffe, 4: dog, 5: crocodile, 6: hippo, 7: zebra, 8: rhino

occlusion: 0 if there is no occlusion, 1 if there is an occlusion (i.e., either occluding or occluded) (note: intersection over union threshold of 0.3 used to assign occlusion; more details in paper)

noise: 0 if there is no noise, 1 if there is noise (note: noise labels were interpolated from object locations in previous and next frames; for more than 4 consecutive frames without labels, no noise labels were included; more details in paper)
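As a minimal reading sketch in Python (assuming a header-less CSV laid out exactly as above; the example file path is hypothetical, and the class column is renamed cls because class is a reserved word in Python):

import csv
from collections import namedtuple

# Column order follows the MOT-style layout described above.
Annotation = namedtuple(
    "Annotation",
    ["frame", "object_id", "x", "y", "w", "h", "cls", "species", "occlusion", "noise"],
)
SPECIES = {-1: "unknown", 0: "human", 1: "elephant", 2: "lion", 3: "giraffe",
           4: "dog", 5: "crocodile", 6: "hippo", 7: "zebra", 8: "rhino"}

def load_annotations(path):
    """Parse one BIRDSAI annotation CSV into a list of Annotation tuples."""
    with open(path, newline="") as f:
        return [Annotation(*(int(float(v)) for v in row))
                for row in csv.reader(f) if row]

# Example usage: count human vs. animal boxes (the file name is hypothetical).
anns = load_annotations("annotations/some_sequence.csv")
print(sum(a.cls == 1 for a in anns), "human boxes,",
      sum(a.cls == 0 for a in anns), "animal boxes")
print("species present:", sorted({SPECIES[a.species] for a in anns}))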

Acknowledgements: BIRDSAI was supported by Microsoft AI for Earth, NSF CCF-1522054 and IIS-1850477, MURI W911NF-17-1-0370, and the Infosys Center for Artificial Intelligence, IIIT-Delhi. Thanks to the labeling team and the Labeled Information Library of Alexandria for hosting the data.

Citation:

@inproceedings{bondi2020birdsai,
title={BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos},
author={Bondi, Elizabeth and Jain, Raghav and Aggrawal, Palash and Anand, Saket and Hannaford, Robert and Kapoor, Ashish and Piavis, Jim and Shah, Shital and Joppa, Lucas and Dilkina, Bistra and Tambe, Milind},
booktitle={WACV},
year={2020}
}

TACO: Trash Annotations in Context Dataset

From: Pedro F. Proença; Pedro Simões

TACO is a growing image dataset of trash in the wild. It contains segmented images of litter taken in diverse environments: woods, roads, and beaches. These images are manually labeled according to a hierarchical taxonomy to train and evaluate object detection algorithms. Annotations are provided in a format similar to the COCO dataset.
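Because the annotations use a COCO-style JSON layout, a quick inspection sketch might look like the following (the file name annotations.json and the keys are assumed from the standard COCO schema rather than taken from the TACO release):

import json
from collections import Counter

# Load a COCO-style annotation file (file name is illustrative).
with open("annotations.json") as f:
    coco = json.load(f)

# Map category ids to names, then count annotations per category.
categories = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(categories[a["category_id"]] for a in coco["annotations"])

print(len(coco["images"]), "images,", len(coco["annotations"]), "annotations")
for name, n in counts.most_common(10):
    print(f"{name}: {n}")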

The model in action:

Gif of the model running inference

Example images from the dataset:

Example Image #2 from the Dataset
Example Image #5 from the Dataset

For more details and to cite the authors:

  • Paper: https://arxiv.org/abs/2003.06975
  • Paper Citation:
    @article{taco2020,
    title={TACO: Trash Annotations in Context for Litter Detection},
    author={Pedro F Proença and Pedro Simões},
    journal={arXiv preprint arXiv:2003.06975},
    year={2020}
    }

All images were captured in the Eidselva river, Stadt Nordfjordeid, Norway.

This model has been created due to the lack of public models for Atlantic Salmon and other fish in Norwegian rivers.


Model for the detection of Espeletia Oswaldiana

This model identifies an endemic and endangered plant species called Espeletia Oswaldiana. Its natural habitat is at around 2,800 m a.s.l. in the region of Vado Hondo, municipality of Aquitania, Boyacá, Colombia.

The dataset used compiles aerial images taken with UAV/drone.

Only one class is considered, namely Espeletia Oswaldiana.

Future classes will include other species found in the area as well as different species of Espeletia from other locations.

The project was supported by the Corpoboyacá corporation, Department of Boyacá, Colombia.

Special thanks to Planeta Dron Servicios Integrales for the drone operation.

This garbage detection dataset uses images of large groupings of garbage. A garbage image dataset can be used to recognize garbage pileups at various distances, depths, environments, and times of day. A garbage recognition API could be used to help governments monitor garbage by using stationary cameras or cameras used during asset management collection efforts.

As is, this dataset helps localize garbage within an image or video and draw a bounding box around the area to give an idea of the size of the garbage pile. This dataset could be used to label individual pieces of garbage as well, but the current annotations focus on large groupings of garbage.

A similar dataset which labels individual pieces of garbage can be found here: https://universe.roboflow.com/material-identification/garbage-classification-3

Datasets related to different types of garbage materials can be found here: https://universe.roboflow.com/material-identification

A use case of identifying garbage from images or video could be to understand the location of garbage within a city using Google Images and placing it over a map like this: https://glasgow-litter.garyblackwood.co.uk/

Github for the litter map can be found here: https://github.com/Garee/glasgow-litter

This is a dataset of bumble bee images curated by the Spiesman Lab at Kansas State University.

Image example

Overview

This dataset contains 581 images of various shellfish classes for object detection. These images are derived from the Open Images open source computer vision datasets.

This dataset only scratches the surface of the Open Images dataset for shellfish!

Image example

Use Cases

  • Train object detector to differentiate between a lobster, shrimp, and crab.
  • Train object detector to differentiate between shellfish
  • Object detection dataset across different sub-species
  • Object detection among related species
  • Test object detector on highly related objects
  • Train shellfish detector
  • Explore the quality and range of Open Image dataset

Tools Used to Derive Dataset

Image example

These images were gathered via the OIDv4 Toolkit. This toolkit allows you to pick an object class and retrieve a set number of images from that class with bounding box labels.

We provide this dataset as an example of the ability to query the OID for a given subdomain. This dataset can easily be scaled up - please reach out to us if that interests you.

Detecting Wildfire Smoke with Computer Vision

This dataset is released by AI for Mankind in collaboration with HPWREN under a Creative Commons by Attribution Non-Commercial Share Alike license. The original dataset (and additional images without bounding boxes) can be found in their GitHub repo.

We have mirrored the dataset here for ease of download in a variety of common computer vision formats.

To learn more about this dataset and its possible applications in fighting wildfires, see this case study of Abhishek Ghosh's wildfire detection model.

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

This dataset was originally created by Anonymous. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/new-workspace-rt1da/solarpaneldetectmodel.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Ammar Abdlmutalib. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/sedimentary-structures/sedimentary-features-rmadz.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Victor Perez, Stephen Groff, Mason Hintermeister. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/sharks/shark-taxonomy.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Roopa Shree, Shriya J. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/cotton-nqp2x/bt-cotton.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Dane Sprsiter. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/dane-sprsiter/barnyard.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Anonymous. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/nasca37/peixos3.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by 윤태원 (yuntaewon), 황혜윤 (hwanghyeyun), 김민서 (gimminseo), 김노현 (gimnohyeon) , 신다홍 (sindahong), 김성수 (gimseongsu). To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/puri/puri4-ygapu.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Yimin Chen. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/workspace-txxpz/underwater-detection.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Nazmuj Shakib Diip, Afraim, Shiam Prodhan. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/commolybroken/dataset-z2vab.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Amro. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/research-project/shallow-cloud.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Nikita Manolis. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/nikita-manolis-je2ii/coral-growth-form.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

Rotifers, Microbeads and Algae

By Jord Liu and The Exploratorium

Background

This is the Machine Learning half of a larger project at the Exploratorium's Biology Lab called Seeing Scientifically, which is a research project that investigates how to use machine learning and other exhibit technology to best teach visitors in an informal learning context like the Exploratorium.

In this iteration of the project, we train an ML model to detect microscopic animals called rotifers, parts of their body (e.g. head, gut, jaw), and microbeads and algae in real time. This model is then integrated into a museum exhibit kiosk prototype that is deployed live on the Exploratorium's museum floor, and visitor research is collected on the efficacy of the exhibit.

Short gif demo of ML detection

Data and Model

The images used here are captured directly from a microscope feed and then labelled by Exploratorium employees and volunteers. Some include up to hundreds of microbeads or algae, some are brightfield and some are darkfield. They show rotifers in multiple poses, including some where the tails are not readily visible. There is relatively little variance in the images here as the environment is highly controlled. We use tiled data of multiple sizes mixed in with the full images.

We use YOLOv4, though future work includes retraining with YOLO-R, YOLO-v7, and other SOTA models. We also experimented with KeypointRCNN for pose estimation but found that the performance did not exceed our baseline of using YOLOv4 and treating the keypoints as objects.

Current performance by class is:
class_id = 0, name = algae, ap = 64.29% (TP = 176, FP = 79)
class_id = 1, name = bead, ap = 77.01% (TP = 251, FP = 41)
class_id = 2, name = bigbead, ap = 82.46% (TP = 36, FP = 5)
class_id = 3, name = egg, ap = 95.51% (TP = 16, FP = 4)
class_id = 4, name = gut, ap = 82.55% (TP = 70, FP = 13)
class_id = 5, name = head, ap = 78.38% (TP = 59, FP = 3)
class_id = 6, name = mastics, ap = 86.82% (TP = 49, FP = 6)
class_id = 7, name = poop, ap = 56.27% (TP = 34, FP = 15)
class_id = 8, name = rotifer, ap = 72.60% (TP = 83, FP = 17)
class_id = 9, name = tail, ap = 46.14% (TP = 27, FP = 7)
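For readers who want to try a trained YOLOv4 model like this one outside of the Darknet framework, the sketch below runs inference with OpenCV's DNN module. Only the class list is taken from the per-class results above; the .cfg/.weights file names, the input image, the 608x608 input size, and the thresholds are assumptions rather than the exhibit's actual configuration.

import cv2
import numpy as np

# Placeholder paths for the trained Darknet config and weights.
net = cv2.dnn.readNetFromDarknet("rotifer-yolov4.cfg", "rotifer-yolov4.weights")
classes = ["algae", "bead", "bigbead", "egg", "gut",
           "head", "mastics", "poop", "rotifer", "tail"]

image = cv2.imread("microscope_frame.jpg")   # placeholder frame from the scope feed
h, w = image.shape[:2]
blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (608, 608), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, scores, class_ids = [], [], []
for output in outputs:                 # one array per YOLO output layer
    for det in output:                 # det = [cx, cy, bw, bh, objectness, class scores...]
        cls_scores = det[5:]
        cls_id = int(np.argmax(cls_scores))
        conf = float(cls_scores[cls_id])
        if conf < 0.4:                 # confidence threshold (assumed)
            continue
        cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
        boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
        scores.append(conf)
        class_ids.append(cls_id)

# Non-maximum suppression drops duplicate, overlapping boxes.
keep = cv2.dnn.NMSBoxes(boxes, scores, 0.4, 0.5)
for i in np.array(keep).flatten():
    print(classes[class_ids[i]], round(scores[i], 2), boxes[i])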

Examples

Screen captures from our exhibit as of July 2022.
Rotifer body parts
Microbead detection
Algae detection

Overview

The PlantDoc dataset was originally published by researchers at the Indian Institute of Technology, and described in depth in their paper. One of the paper’s authors, Pratik Kayal, shared the object detection dataset available on GitHub.

PlantDoc is a dataset of 2,569 images across 13 plant species and 30 classes (diseased and healthy) for image classification and object detection. There are 8,851 labels. Read more about how the version available on Roboflow improves on the original version here.

And here's an example image:

Tomato Blight

Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 416x416 export.
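If you would rather pull an export programmatically, a minimal sketch with the roboflow Python package is shown below; the API key, workspace and project slugs, version number, and export format are placeholders, and the project's Download tab on Roboflow Universe shows the exact values to use.

from roboflow import Roboflow

# All identifiers below are placeholders; copy the real ones from the Download tab.
rf = Roboflow(api_key="YOUR_API_KEY")
project = rf.workspace("your-workspace").project("plantdoc")
dataset = project.version(1).download("coco")   # e.g. "coco", "voc", "yolov5pytorch"
print("Dataset extracted to:", dataset.location)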

Use Cases

As the researchers from IIT stated in their paper, “plant diseases alone cost the global economy around US$220 billion annually.” Training models to recognize plant diseases earlier dramatically increases yield potential.

The dataset also serves as a useful open dataset for benchmarks. The researchers trained both object detection models like MobileNet and Faster-RCNN and image classification models like VGG16, InceptionV3, and InceptionResnet V2.

The dataset is useful for advancing general agriculture computer vision tasks, whether that be healthy crop classification, plant disease classification, or plant disease object detection.

Using this Dataset

This dataset follows the Creative Commons 4.0 protocol. You may use it commercially, but the license provides no Liability, Trademark use, Patent use, or Warranty coverage.

Provide the following citation for the original authors:

@misc{singh2019plantdoc,
title={PlantDoc: A Dataset for Visual Plant Disease Detection},
author={Davinder Singh and Naman Jain and Pranjali Jain and Pratik Kayal and Sudhakar Kumawat and Nipun Batra},
year={2019},
eprint={1911.10317},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce 50% of their code when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.
