
Top Environmental Datasets

Roboflow hosts the world's largest collection of open source environmental datasets and pre-trained computer vision models. Captured from satellites, drones, handheld devices, and more, these projects can help you find objects of interest in environmental settings such as oceans, forests, and trails.

cotton

This dataset was originally created by Roopa Shree, Shriya J. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/cotton-nqp2x/bt-cotton.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Dane Sprsiter. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/dane-sprsiter/barnyard.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

bees

This dataset was originally created by Jordan Bird, Leah Bird, Carrie Ijichi, Aurelie Jolivald, Salisu Wada, Kay Owa, Chloe Barnes of Nottingham Trent University (United Kingdom).

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Yimin Chen. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/workspace-txxpz/underwater-detection.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Ammar Abdlmutalib. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/sedimentary-structures/sedimentary-features-rmadz.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Amro. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/research-project/shallow-cloud.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

Here are a few use cases for this project:

  1. Recycling and Waste Management: Implementing the "Beverage Containers" computer vision model in waste sorting facilities to identify and separate different types of beverage containers for efficient recycling or waste management based on their material (glass, plastic, tin).

  2. Inventory Management in Stores and Supermarkets: Integrate the model into an inventory management system to track, manage and restock different types of beverage containers, assisting storekeepers in maintaining optimal inventory levels and organizing shelves.

  3. Smart Drink Dispenser Machines: Integrating the computer vision model into drink dispenser machines to automatically recognize and recommend the ideal refilling option for the type of beverage container presented, minimizing spillage and ensuring a seamless customer experience.

  4. Public Health and Safety: Using the "Beverage Containers" computer vision model in public gathering spaces, stadiums, or large events to monitor adherence to alcohol and other beverage regulations, identifying non-compliant or inappropriate containers and ensuring the safety of attendees.

  5. Market Research: Analyze the presence of various types of beverage containers in social media posts or public spaces to identify consumer preferences and trends, helping beverage companies and marketers to tailor their products and marketing strategies to meet the demands of different target groups.

Detecting Wildfire Smoke with Computer Vision

This dataset is released by AI for Mankind in collaboration with HPWREN under a Creative Commons Attribution-NonCommercial-ShareAlike license. The original dataset (and additional images without bounding boxes) can be found in their GitHub repo.

We have mirrored the dataset here for ease of download in a variety of common computer vision formats.

To learn more about this dataset and its possible applications in fighting wildfires, see this case study of Abhishek Ghosh's wildfire detection model.

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce their boilerplate code by 50% when using Roboflow's workflow, save training time, and increase model reproducibility.


This dataset was originally created by Nazmuj Shakib Diip, Afraim, Shiam Prodhan. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/commolybroken/dataset-z2vab.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Wojciech Przydział, Dorota Przydział, Magdalena Przydział-Mazur, Bartłomiej Mazur. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/underwaterpipes/underwater_pipes_orginal_pictures.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Victor Perez, Stephen Groff, Mason Hintermeister. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/sharks/shark-taxonomy.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by 윤태원 (yuntaewon), 황혜윤 (hwanghyeyun), 김민서 (gimminseo), 김노현 (gimnohyeon) , 신다홍 (sindahong), 김성수 (gimseongsu). To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/puri/puri4-ygapu.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

coral

This dataset was originally created by Nikita Manolis. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/nikita-manolis-je2ii/coral-growth-form.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Anonymous. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/nasca37/peixos3.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

This dataset was originally created by Anonymous. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/new-workspace-rt1da/solarpaneldetectmodel.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 GitHub repo: https://github.com/roboflow-ai/roboflow-100-benchmark

BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos

Authors:

  • Elizabeth Bondi, Harvard University
  • Raghav Jain, University of Southern California
  • Palash Aggrawal, Indraprastha Institute of Information Technology
  • Saket Anand, Indraprastha Institute of Information Technology
  • Robert Hannaford, Duke University
  • Ashish Kapoor, University of Delhi
  • Jim Piavis, The Citadel
  • Shital Shah, University of Mumbai
  • Lucas Joppa, Chief Environmental Officer, Microsoft
  • Bistra Dilkina, University of Southern California
  • Milind Tambe, Harvard University

Published: 2020

Description: The Benchmarking IR Dataset for Surveillance with Aerial Intelligence (BIRDSAI, pronounced bird's-eye) is a long-wave thermal infrared dataset containing nighttime images of animals and humans in Southern Africa. The dataset allows for benchmarking of algorithms for automatic detection and tracking of humans and animals with both real and synthetic videos.

Use Cases: Wildlife Poaching Prevention, Night-time Intruder Detection, Wildlife Monitoring, Animal Behavior Research, Long Distance IR Detection

Download: The data can be downloaded from the Labeled Information Library of Alexandria.

Training Dataset Download: https://lilablobssc.blob.core.windows.net/conservationdrones/v01/conservation_drones_train_real.zip

Annotation Format:
We follow the MOT annotation format, which is a CSV with the following columns:

<frame_number>, <object_id>, <x>, <y>, <w>, <h>, <class>, <species>, <occlusion>, <noise>

class: 0 if animals, 1 if humans

species: between -1 and 8 representing species below; 3 and 4 occur only in real data; 5, 6, 7, 8 occur only in synthetic data (note: most very small objects have unknown species)

-1: unknown, 0: human, 1: elephant, 2: lion, 3: giraffe, 4: dog, 5: crocodile, 6: hippo, 7: zebra, 8: rhino

occlusion: 0 if there is no occlusion, 1 if there is an occlusion (i.e., either occluding or occluded) (note: intersection over union threshold of 0.3 used to assign occlusion; more details in paper)

noise: 0 if there is no noise, 1 if there is noise (note: noise labels were interpolated from object locations in previous and next frames; for more than 4 consecutive frames without labels, no noise labels were included; more details in paper)
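A minimal Python sketch for reading these annotations (the function and file path are illustrative; the CSV layout and species table are taken directly from the field descriptions above):

```python
import csv

# Species codes, as listed in the BIRDSAI description above
SPECIES = {-1: "unknown", 0: "human", 1: "elephant", 2: "lion", 3: "giraffe",
           4: "dog", 5: "crocodile", 6: "hippo", 7: "zebra", 8: "rhino"}

def parse_annotations(path):
    """Yield one dict per annotated box from a MOT-style annotation CSV."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, obj_id, x, y, w, h, cls, species, occl, noise = (float(v) for v in row)
            yield {
                "frame": int(frame),
                "object_id": int(obj_id),
                "box": (x, y, w, h),           # top-left x, y, width, height
                "class": "human" if int(cls) == 1 else "animal",
                "species": SPECIES.get(int(species), "unknown"),
                "occluded": bool(int(occl)),
                "noisy": bool(int(noise)),
            }
```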

Acknowledgements: BIRDSAI was supported by Microsoft AI for Earth, NSF CCF-1522054 and IIS-1850477, MURI W911NF-17-1-0370, and the Infosys Center for Artificial Intelligence, IIIT-Delhi. Thanks to the labeling team and the Labeled Information Library of Alexandria for hosting the data.

Citation:

@inproceedings{bondi2020birdsai,
title={BIRDSAI: A Dataset for Detection and Tracking in Aerial Thermal Infrared Videos},
author={Bondi, Elizabeth and Jain, Raghav and Aggrawal, Palash and Anand, Saket and Hannaford, Robert and Kapoor, Ashish and Piavis, Jim and Shah, Shital and Joppa, Lucas and Dilkina, Bistra and Tambe, Milind},
booktitle={WACV},
year={2020}
}

Here are a few use cases for this project:

  1. Wildlife Conservation: The Elephant Detection model can be employed by wildlife organizations and researchers to monitor elephant populations in their natural habitats, track their movements, and analyze their behavior to support conservation efforts.

  2. Anti-poaching Initiatives: The model can help detect and track elephants in real-time, allowing park rangers and other authorities to identify potential poaching activities and intervene before any harm comes to the animals.

  3. Ecotourism Enhancement: Tour operators can use the model to locate elephants during guided safaris or nature walks in wildlife reserves, improving the overall experience for tourists who want to observe these magnificent creatures in the wild.

  4. Habitat Management: The model can assist researchers and conservationists in identifying important elephant habitats and analyzing their conditions, such as vegetation, water access, and potential threats. This information can then be used to develop and implement habitat management plans to ensure the long-term survival of elephant populations.

  5. Smart Wildlife Corridor Planning: The Elephant Detection model can be used to analyze elephant movement patterns, helping urban planners and conservationists develop wildlife corridors that balance the needs of both humans and wildlife, reduce human-elephant conflicts, and protect the overall ecosystem.

TACO: Trash Annotations in Context Dataset

From: Pedro F. Proença; Pedro Simões

TACO is a growing image dataset of trash in the wild. It contains segmented images of litter taken in diverse environments: woods, roads, and beaches. These images are manually labeled according to a hierarchical taxonomy to train and evaluate object detection algorithms. Annotations are provided in a format similar to that of the COCO dataset.
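Because the annotations follow the COCO convention, they can be inspected with the standard json module. A minimal sketch (the `categories`/`annotations` keys are standard COCO; the function name and file path are illustrative):

```python
import json
from collections import Counter

def summarize_coco(path):
    """Count annotations per category name in a COCO-style annotation file."""
    with open(path) as f:
        coco = json.load(f)
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])
```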

The model in action:

Gif of the model running inference

Examples images from the dataset:

Example Image #2 from the Dataset
Example Image #5 from the Dataset

For more details and to cite the authors:

  • Paper: https://arxiv.org/abs/2003.06975
  • Paper Citation:
    @article{taco2020,
    title={TACO: Trash Annotations in Context for Litter Detection},
    author={Pedro F Proença and Pedro Simões},
    journal={arXiv preprint arXiv:2003.06975},
    year={2020}
    }

All images were captured in the Eidselva river in Nordfjordeid, Stad, Norway.

This model was created due to the lack of public models for Atlantic salmon and other fish in Norwegian rivers.


Classes and class balance (as of December 22, 2022):

  • cardboard - 1,549
  • rigid_plastic - 622
  • metal - 559
  • soft_plastic - 554

Model for the detection of Espeletia oswaldiana

This model identifies Espeletia oswaldiana, an endemic and endangered plant species. Its natural habitat is at around 2,800 m above sea level in the region of Vado Hondo, municipality of Aquitania, Boyacá, Colombia.

The dataset used compiles aerial images taken with UAV/drone.

Only one class is considered, namely Espeletia oswaldiana.

Future classes will include other species found in the area as well as different species of Espeletia from other locations.

The project was supported by the Corpoboyacá corporation, Department of Boyacá, Colombia.

Special thanks to Planeta Dron Servicios Integrales for the drone operation.

This garbage detection dataset uses images of large groupings of garbage. A garbage image dataset can be used to recognize garbage pileups at various distances, depths, environments, and times of day. A garbage recognition API could be used to help governments monitor garbage by using stationary cameras or cameras used during asset management collection efforts.

As is, this dataset helps localize garbage within an image or video, drawing a bounding box around the area to give an idea of the size of the garbage pile. This dataset could be used to label individual pieces of garbage as well, but the current annotations focus on large groupings.

A similar dataset, which labels individual pieces of garbage, can be found here: https://universe.roboflow.com/material-identification/garbage-classification-3

Datasets related to different types of garbage materials can be found here: https://universe.roboflow.com/material-identification

A use case of identifying garbage from images or video could be to understand the location of garbage within a city using Google Images and placing it over a map like this: https://glasgow-litter.garyblackwood.co.uk/

Github for the litter map can be found here: https://github.com/Garee/glasgow-litter

Here are a few use cases for this project:

  1. Pet Store Assistance: This model could be used in pet stores to aid in the behavior analysis and health monitoring of specific fish species. The automatic identification of the fish species can provide a non-invasive way to monitor individual fish and track their activities without the need for physical handling.

  2. Aquatic Veterinary Diagnostics: In veterinary medicine, the model can be used to identify freshwater species that may have specific diseases or health conditions. It could assist aquatic veterinarians in providing targeted treatments for diseases that are specific to particular species.

  3. Home Aquarium Maintenance: The model could benefit aquarium enthusiasts, helping them to monitor and manage the health and well-being of their pet fishes. It facilitates the identification of species for better care, diet, and preventative treatment regimes.

  4. Fish Farming and Aquaculture: This model could play a key role in fish farming industries. It could be used to monitor the population and health of specific species, helping progressive farmers and aquaculture companies keep track of their stocks and mitigate the risks associated with illness or invasive species.

  5. Educational Tool: The model could serve as an educational tool for students studying marine biology and related fields. It could help students familiarize themselves with different freshwater fish species and observe their behaviors and interactions in various environments.

This project labels solar panels collected via a DJI Mavic Air 2 flying over Rancho Santa Fe, California in August 2022. Both rooftop and backyard solar panels are labeled. It was used as the basis for the Using Computer Vision with Drones for Georeferencing blog post and the open source DJI aerial georeferencing project.

53 images labeled with 267 polygons were used to train a computer vision model to detect solar panels from above. It's a demonstration of collecting and annotating data from a drone video and using that to train a machine learning model.

This is a dataset of bumble bee images curated by the Spiesman Lab at Kansas State University

Here are a few use cases for this project:

  1. Environmental Cleanup: Utilize Garbage_ydieh to identify and locate various types of garbage in public spaces like parks, beaches, and rivers to facilitate efficient cleanup efforts by volunteers or waste management teams.

  2. Automated Waste Sorting: Implement Garbage_ydieh in recycling and waste management plants to automatically sort different types of garbage, improving waste processing efficiency and promoting proper recycling practices.

  3. Marine Life Conservation: Employ Garbage_ydieh to monitor coastal areas, shorelines or fishing locations to detect and remove discarded fishing nets and plastic waste, ultimately aiding in the protection of marine ecosystems and wildlife.

  4. Smart Trash Bins: Integrate Garbage_ydieh into smart trash bin systems installed in cities or residential communities to identify and sort waste items into appropriate compartments, enhancing recycling efficiency and reducing improper disposal.

  5. Awareness and Education Campaigns: Use Garbage_ydieh to analyze large sets of images/videos from social media, online platforms, or urban spaces to determine the prevalence and distribution of garbage classes, informing targeted awareness campaigns, and driving data-driven policy-making.

Here are a few use cases for this project:

  1. Underwater Infrastructure Maintenance: The model can help identify and classify underwater pipes for maintenance and repair activities, allowing professionals to easily assess the condition and plan necessary repairs for underwater pipelines and infrastructure.

  2. Environmental Research and Monitoring: The model can be used for assessing the impact of underwater pipes on the surrounding ecosystems and water quality. This would help environmental researchers understand the potential risks of pipe leaks or spills, and develop contingency plans.

  3. Marine Construction Planning: The model can assist engineers in industry or urban development projects by providing them with a clear understanding of existing underwater pipe networks, allowing for better planning and design in areas such as ports, offshore facilities, or coastal development.

  4. Disaster Response and Recovery: In the event of natural disasters like hurricanes or tsunamis, the model can aid in identifying damaged or displaced pipes, helping emergency response teams to prioritize their efforts and make informed decisions for recovery and rebuilding.

  5. Leak Detection and Monitoring: By identifying and classifying underwater pipes, the model can facilitate the monitoring of pipe health to detect potential leaks, leading to timely interventions to minimize environmental and financial impacts.

Image example

Overview

This dataset contains 581 images of various shellfish classes for object detection. These images are derived from the open source Open Images computer vision dataset.

This dataset only scratches the surface of the Open Images dataset for shellfish!

Image example

Use Cases

  • Train object detector to differentiate between a lobster, shrimp, and crab.
  • Train object detector to differentiate between shellfish
  • Object detection dataset across different sub-species
  • Object detection among related species
  • Test object detector on highly related objects
  • Train shellfish detector
  • Explore the quality and range of the Open Images dataset

Tools Used to Derive Dataset

Image example

These images were gathered via the OIDv4 Toolkit. This toolkit allows you to pick an object class and retrieve a set number of images from that class with bounding box labels.

We provide this dataset as an example of the ability to query the OID for a given subdomain. This dataset can easily be scaled up - please reach out to us if that interests you.

Overview

The PlantDoc dataset was originally published by researchers at the Indian Institute of Technology, and described in depth in their paper. One of the paper’s authors, Pratik Kayal, shared the object detection dataset available on GitHub.

PlantDoc is a dataset of 2,569 images across 13 plant species and 30 classes (diseased and healthy) for image classification and object detection. There are 8,851 labels. Read more about how the version available on Roboflow improves on the original version here.

And here's an example image:

Tomato Blight

Fork this dataset (upper right hand corner) to receive the raw images, or (to save space) grab the 416x416 export.

Use Cases

As the researchers from IIT stated in their paper, “plant diseases alone cost the global economy around US$220 billion annually.” Training models to recognize plant diseases earlier dramatically increases yield potential.

The dataset also serves as a useful open dataset for benchmarks. The researchers trained both object detection models like MobileNet and Faster-RCNN and image classification models like VGG16, InceptionV3, and InceptionResnet V2.

The dataset is useful for advancing general agriculture computer vision tasks, whether that be healthy crop classification, plant disease classification, or plant disease object detection.

Using this Dataset

This dataset is released under a Creative Commons 4.0 license. You may use it commercially; the license provides no Liability, Trademark use, Patent use, or Warranty coverage.

Provide the following citation for the original authors:

@misc{singh2019plantdoc,
  title={PlantDoc: A Dataset for Visual Plant Disease Detection},
  author={Davinder Singh and Naman Jain and Pranjali Jain and Pratik Kayal and Sudhakar Kumawat and Nipun Batra},
  year={2019},
  eprint={1911.10317},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless.

Developers reduce their code by 50% when using Roboflow's workflow, automate annotation quality assurance, save training time, and increase model reproducibility.


Rotifers, Microbeads and Algae

By Jord Liu and The Exploratorium

Background

This is the Machine Learning half of a larger project at the Exploratorium's Biology Lab called Seeing Scientifically, which is a research project that investigates how to use machine learning and other exhibit technology to best teach visitors in an informal learning context like the Exploratorium.

In this iteration of the project, we train an ML model to detect microscopic animals called rotifers, parts of their body (e.g. head, gut, jaw), and microbeads and algae in real time. This model is then integrated into a museum exhibit kiosk prototype that is deployed live on the Exploratorium's museum floor, and visitor research is collected on the efficacy of the exhibit.

Short gif demo of ML detection

Data and Model

The images used here are captured directly from a microscope feed and then labelled by Exploratorium employees and volunteers. Some include up to hundreds of microbeads or algae, some are brightfield and some are darkfield. They show rotifers in multiple poses, including some where the tails are not readily visible. There is relatively little variance in the images here as the environment is highly controlled. We use tiled data of multiple sizes mixed in with the full images.

We use YOLOv4, though future work includes retraining with YOLOR, YOLOv7, and other SOTA models. We also experimented with KeypointRCNN for pose estimation but found that the performance did not exceed our baseline of using YOLOv4 and treating the keypoints as objects.

Current performance by class is:
class_id = 0, name = algae, ap = 64.29% (TP = 176, FP = 79)
class_id = 1, name = bead, ap = 77.01% (TP = 251, FP = 41)
class_id = 2, name = bigbead, ap = 82.46% (TP = 36, FP = 5)
class_id = 3, name = egg, ap = 95.51% (TP = 16, FP = 4)
class_id = 4, name = gut, ap = 82.55% (TP = 70, FP = 13)
class_id = 5, name = head, ap = 78.38% (TP = 59, FP = 3)
class_id = 6, name = mastics, ap = 86.82% (TP = 49, FP = 6)
class_id = 7, name = poop, ap = 56.27% (TP = 34, FP = 15)
class_id = 8, name = rotifer, ap = 72.60% (TP = 83, FP = 17)
class_id = 9, name = tail, ap = 46.14% (TP = 27, FP = 7)
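Averaging the per-class AP values above gives the model's overall mean average precision (a quick arithmetic check using only the numbers reported above):

```python
# Per-class average precision (%) as reported above
aps = {"algae": 64.29, "bead": 77.01, "bigbead": 82.46, "egg": 95.51,
       "gut": 82.55, "head": 78.38, "mastics": 86.82, "poop": 56.27,
       "rotifer": 72.60, "tail": 46.14}

map_percent = sum(aps.values()) / len(aps)
print(f"mAP: {map_percent:.2f}%")  # mAP: 74.20%
```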

Examples

Screen captures from our exhibit as of July 2022.
Rotifer body parts
Microbead detection
Algae detection