Browse » Sports

Top Sports Datasets

Sports datasets and computer vision models can be used for real-time analytics and automated post-game analysis. Sports analytics are time-intensive to collect manually and prone to human error; computer vision can be used to track detailed analytics across all sports.

Top 7 sports datasets: https://blog.roboflow.com/top-sports-datasets-computer-vision/

Case study: https://blog.roboflow.com/football-kaggle-computer-vision-safety-scaled-yolov4/

AI-assisted coaching: https://blog.roboflow.com/ai-football-coach-playbook/

Video example: https://www.youtube.com/watch?v=uQnRis9j11M

251 images of playing cricket, football & baseball.

Original Dataset from Kaggle - Bikram Saha

This dataset contains 251 images of playing cricket, football, and baseball.
(1) cricket - 95 images
(2) football - 77 images
(3) baseball - 79 images

This is an image classification dataset for sports. The model helps identify whether the sport or activity occurring in an image or video feed is, or most closely resembles, cricket, football, or baseball.

The raw image versions (v1 or v5) of the dataset can be downloaded, or the entire dataset can be cloned into your own project, either for image classification or to label the figures in the images for object detection, instance segmentation, semantic segmentation, etc.

Nike, Adidas and Converse Shoes Dataset for Classification

This dataset was obtained from Kaggle: https://www.kaggle.com/datasets/die9origephit/nike-adidas-and-converse-imaged/

Dataset Collection Methodology:

"The dataset was obtained downloading images from Google images. The images with a .webp format were transformed into .jpg images. The obtained images were randomly shuffled and resized so that all the images had a resolution of 240x240 pixels. Then, they were split into train and test datasets and saved."

Versions:

  • v1: original_raw-images: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. These images are in the original train/test split from Kaggle: 237 images in each train set and 38 images in each test set
  • v2: original_trainTestSplit-augmented3x: the original train/test split, augmented with 3x image generation. This version was not trained with Roboflow Train.
  • v3: original_trainTestSplit-augmented5x: the original train/test split, augmented with 5x image generation. This version was not trained with Roboflow Train.
  • v4: rawImages_70-20-10split: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. Dataset splits were modified to a 70% train, 20% valid, 10% test split
    • NOTE: 70%/20%/10% split: 576 images in train set, 166 images in valid set, 83 images in test set
  • v5: 70-20-10split-augmented3x: modified to a 70% train, 20% valid, 10% test train/valid/test split, augmented with 3x image generation. This version was trained with Roboflow Train.
  • v6: 70-20-10split-augmented5x: modified to a 70% train, 20% valid, 10% test train/valid/test split, augmented with 5x image generation. This version was trained with Roboflow Train.
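As a sanity check on the split counts listed above, the arithmetic can be sketched in JavaScript (matching this page's other code samples). The floor-based rounding below is an assumption; platform shuffling and rounding can shift the exact counts by an image or two from the 576/166/83 reported:

```javascript
// Sketch: compute train/valid/test counts for a 70/20/10 split.
// The 825-image total is implied by the counts above (576 + 166 + 83).
function splitCounts(total, ratios = [0.7, 0.2, 0.1]) {
  const counts = ratios.map(r => Math.floor(total * r));
  // Assign any rounding remainder to the train set so the counts sum to total.
  counts[0] += total - counts.reduce((a, b) => a + b, 0);
  return counts;
}

console.log(splitCounts(825)); // three counts summing to 825
```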

Project Overview:

The original goal was to use this model to monitor my rowing workouts and learn more about computer vision. To monitor the workouts, I needed the ability to identify the individual digits on the rowing machine. With the help of Roboflow's computer vision tools, such as assisted labeling, I was able to more quickly prepare, test, deploy and improve my YOLOv5 model.
Example Annotated Image from the Dataset

Inference on a Test Image using the rfWidget

Roboflow's Upload API, which is suitable for uploading images, video, and annotations, worked great with a custom app I developed to modify the predictions from the deployed model, and export them in a format that could be uploaded to my workspace on Roboflow.
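A minimal sketch of such an upload, assuming the shape of Roboflow's Upload API (a base64 image posted to `api.roboflow.com/dataset/<project>/upload`); the project id, filename, and split values below are placeholders. The returned config can be passed to a client such as axios, as in the inference snippet elsewhere on this page:

```javascript
// Sketch: build a request config for uploading an image via the Upload API
// (active learning). "demo-project", "frame.jpg", and "train" are
// illustrative placeholders, not values from this dataset.
function buildUploadRequest(projectId, apiKey, base64Image) {
  return {
    method: "POST",
    url: `https://api.roboflow.com/dataset/${projectId}/upload`,
    params: { api_key: apiKey, name: "frame.jpg", split: "train" },
    data: base64Image,
    headers: { "Content-Type": "application/x-www-form-urlencoded" }
  };
}

// Usage (with axios):
// const fs = require("fs");
// const image = fs.readFileSync("YOUR_IMAGE.jpg", { encoding: "base64" });
// axios(buildUploadRequest("YOUR_PROJECT", "YOUR_KEY", image))
//   .then(res => console.log(res.data));
```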

What took me weeks to develop can now be done with a single click using Roboflow Train, plus the Upload API for Active Learning (dataset and model improvement).
Training Results - Roboflow FAST Model

Dataset Classes:

  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 90 (class "90" is a stand-in for the digit zero)

This dataset consists of 841 images. There are images from a different rowing machine and also from this repo. Some scenes are illuminated with sunlight; others have been cropped to include only the LCD. Digits like 7, 8, and 9 are underrepresented.
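Since class "90" stands in for zero, predictions need a remap step before assembling a meter reading. A sketch under the assumption that detections arrive as objects with a `class` label and an `x` center coordinate (illustrative names, not a specific API):

```javascript
// Sketch: sort digit detections left-to-right and remap the stand-in
// class "90" back to "0" when assembling the displayed number.
function readDigits(predictions) {
  return predictions
    .slice()                              // avoid mutating the caller's array
    .sort((a, b) => a.x - b.x)            // left-to-right order on the LCD
    .map(p => (p.class === "90" ? "0" : p.class))
    .join("");
}
```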

For more information:

Badminton Shuttlecock Object Detection

Golf Ball Object Detection

Usage

This model will perform best on images and videos taken on a golf course (similar to the thumbnail photo and the dataset images).

It's a great model for sports broadcasting and other apps that need automated ball tracking, scoring, lost-ball finding, and more!

Peyton Manning couldn't help but try to call timeout for the Broncos on Monday, September 12's ESPN Monday Night Football.

This model is available to use any time you'd like to make a video counting how often Peyton Manning, or another TV sports analyst, attempts to call timeout, or criticizes a team for failing to call one.

Classes:

  • timeout
  • hands
    • Note: this class is omitted for the Deploy tab model - it only exists in versions: v1, v2, v3
    • Modify Classes was used to omit the hands class
    • The class was originally included in case the model did not recognize the timeout hand signal well, and post-processing, such as comparing the overlap/distance between the predicted hands bounding boxes, was necessary to confirm a timeout call
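The post-processing idea described above can be sketched as an intersection-over-union check between two predicted hands boxes; the `{x1, y1, x2, y2}` box shape and the 0.1 IoU threshold are illustrative assumptions:

```javascript
// Sketch: IoU between two axis-aligned boxes {x1, y1, x2, y2}.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  return inter / (areaA + areaB - inter);
}

// Two hands close together (forming the "T") suggest a timeout call.
function handsSuggestTimeout(handA, handB, threshold = 0.1) {
  return iou(handA, handB) >= threshold;
}
```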

Want to test it for yourself?

  • Click this link for the video, then right-click to copy the video address and paste it into the Deploy tab, or right-click to save the video file and upload it to the Deploy tab for testing.

Overview:

Football (soccer) player and football (soccer) ball detection dataset from Augmented Startups.

  • Project Type: Object Detection
  • Labeled/Annotated with: Bounding boxes

Classes:

  • football, player

How to Use:

This is a great starter-dataset for those wanting to test player and/or ball-tracking for football (soccer) games with the Deploy Tab, or the Deployment device and method of their choice.

Images can also be Cloned to another project to continue iterating on the project and model. World Cup, Premier League, La Liga, Major League Soccer (MLS) and/or Champions League computer vision projects, anyone?

Roboflow offers AutoML model training (Roboflow Train) and the ability to import and export up to 30 different annotation formats, leaving you the flexibility to deploy directly with a Roboflow Train model, or to use Roboflow to prepare and manage datasets and then train and deploy with the custom model architecture of your choice: https://github.com/roboflow-ai/notebooks.

Tips for Model and Dataset Improvement:

Background Information

This dataset was curated and annotated by Mohamed Traore and Justin Brady after forking the raw images from the Roboflow Universe Mask Wearing dataset and remapping the mask and no-mask classes to face.

Example Image from the Dataset

The main objective is to identify human faces in images or video. However, this model could also be used for privacy purposes by changing the output of the bounding boxes to blur the detected face or fill it with a black box.
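A minimal sketch of that privacy step: turning detections into rectangles to black out or blur. It assumes Roboflow-style predictions with a box center (`x`, `y`) plus `width`/`height`; the padding value is illustrative, and the actual drawing or blurring is left to whatever imaging library you use:

```javascript
// Sketch: convert center-based face detections into clamped pixel
// rectangles suitable for blacking out or blurring.
function redactionRects(predictions, frameW, frameH, pad = 4) {
  return predictions.map(p => {
    const x = Math.max(0, Math.round(p.x - p.width / 2) - pad);
    const y = Math.max(0, Math.round(p.y - p.height / 2) - pad);
    return {
      x,
      y,
      w: Math.min(frameW - x, Math.round(p.width) + 2 * pad),  // clamp to frame
      h: Math.min(frameH - y, Math.round(p.height) + 2 * pad)
    };
  });
}
```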

The original custom dataset (v1) is composed of 867 unaugmented (raw) images of people in various environments. 55 of the images are marked as Null to help with feature extraction and reducing false detections.

Version 2 (v2) includes the augmented and trained version of the model. This version is trained from the COCO model checkpoint to take advantage of transfer learning and improve initial model training results.

Model Updates:

After a few trainings, and running tests with Roboflow's webcam model and Roboflow's video inference repo, it was clear that edge cases like hands sometimes being recognized as faces were an issue. I grabbed 96 images from Alex Wong's Hand Signs dataset and added them to the project. I uploaded the images without the annotation files, labeled all the faces, and retrained the model (version 5).

The dataset is available under the CC BY 4.0 license.

Includes images from:

@misc{ person-hgivm_dataset,
    title = { person Dataset },
    type = { Open Source Dataset },
    author = { Abner },
    howpublished = { \url{ https://universe.roboflow.com/abner/person-hgivm } },
    url = { https://universe.roboflow.com/abner/person-hgivm },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2021 },
    month = { aug },
    note = { visited on 2022-10-14 },
}

A disc golf (frisbee) dataset, created for tracking flying discs.

Thanks to Patrick for his frisbees on the ground data: https://universe.roboflow.com/patrick-o-gorman/frisbee

Boxpunch Detector

Onboarding project for Roboflow

This project captures punch types thrown during boxing training

Classes

  • blocking - for players who have their arms above the net while the ball is crossing the net
  • passing - for players who make the first touch below the net
  • spiking - for players who make the touch (second or third) on their side of the court, after which the ball goes to the other side
  • setting - for players who make the touch after which another player from their team spikes
  • serving - the first touch in the game
  • digging - the action after an opponent's spike or block; saving the ball in unusual situations where a player must fall, fall and run, or otherwise play atypically

Cyclist Detection

Overview

This model helps detect people riding bicycles, and from which direction the cyclist can be seen (front, back, side).

Use Cases

Both self driving and sports broadcasting are great use cases for this model, as it gives great information about how the camera is positioned relative to the rider(s).

Overview

This project started over 3 years ago, when I wanted to make something that would draw out football plays automatically. Last year I hit a breakthrough in my Python development where I could track players individually. Roboflow has allowed me to track players by position groups.

Classes

Some of them are straightforward, like Center, QB (quarterback), db (defensive back), and lb (linebacker); the rest are identified as skill, meaning an offensive player such as a Runningback, Fullback, Tightend, H-back, or Wide Receiver.

The project in action

I haven't made a video of myself using Roboflow yet, but I will shortly. You can see the project on my LinkedIn, including how it's grown and will continue to grow.
My LinkedIn

Background Information

This dataset was curated and annotated by Ilyes Talbi, Head of La revue IA, a French publication focused on stories of machine learning applications.

The main objective is to identify soccer (futbol) players, the referee, and the soccer ball (futbol).

The original custom dataset (v1) is composed of 163 images.

  • Class 0 = players
  • Class 1 = referree
  • Class 2 = soccer ball (or futbol)

The dataset is available under the Public License.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.

Dataset Versions

Version 7 (v7) - 163 images (raw images)

  • Preprocessing: Auto-Orient, Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations: No augmentations applied
  • Training Metrics: This version of the dataset was not trained

Version 2 (v2) - 163 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
  • Augmentations: No augmentations applied
  • Training Metrics: This version of the dataset was not trained

Version 3 (v3) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: 86.4% mAP, 51.8% precision, 90.4% recall

Version 4 (v4) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: 84.6% mAP, 52.3% precision, 85.3% recall

Version 5 (v5) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
      • Only Class 0, which was remapped to players, was included in this version
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
    • 98.8% mAP, 76.3% precision, 99.2% recall

Version 6 (v6) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
      • Only Class 0, which was remapped to players, was included in this version
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: Trained from Scratch (no transfer learning employed)
    • 95.5% mAP, 67.8% precision, 95.5% recall

Ilyes Talbi - LinkedIn | La revue IA


Overview

The Surfline Surfer Spotting dataset contains images of surfers floating along the coast. Each image contains one class, "surfer", but may contain multiple surfers.

Example Footage

Surfers

Using this Dataset

There are several deployment options available, including inferring via API, webcam, and curl command.

Here is a code snippet you can use to hit the hosted inference API. Code snippets for more languages are also available.

const axios = require("axios");
const fs = require("fs");

const image = fs.readFileSync("YOUR_IMAGE.jpg", {
    encoding: "base64"
});

axios({
    method: "POST",
    url: "https://detect.roboflow.com/surfer-spotting/2",
    params: {
        api_key: "YOUR_KEY"
    },
    data: image,
    headers: {
        "Content-Type": "application/x-www-form-urlencoded"
    }
})
.then(function(response) {
    console.log(response.data);
})
.catch(function(error) {
    console.log(error.message);
});

Download Dataset

On the Versions tab, you can select the version you like and choose to download it in any of 26 annotation formats.

This dataset was originally created by Wilfred Shu and Adrian Stuart. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/a-s/uwh.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

VOT2015 Dataset

The dataset comprises 60 short sequences showing various objects in challenging backgrounds. The sequences were chosen from a large pool of sequences including the ALOV dataset, OTB2 dataset, non-tracking datasets, Computer Vision Online, Professor Bob Fisher’s Image Database, Videezy, Center for Research in Computer Vision, University of Central Florida, USA, NYU Center for Genomics and Systems Biology, Data Wrangling, Open Access Directory and Learning and Recognition in Vision Group, INRIA, France. The VOT sequence selection protocol was applied to obtain a representative set of challenging sequences. The dataset is automatically downloaded by the evaluation kit when needed, there is no need to separately download the sequences for the challenge.

Annotations
The sequences were annotated by the VOT committee using rotated bounding boxes in order to provide highly accurate ground truth values for comparing results. The annotations are stored in a text file with the format:

frameN: X1, Y1, X2, Y2, X3, Y3, X4, Y4
where Xi and Yi are the coordinates of corner i of the bounding box in frame N, the N-th row in the text file.

The bounding box was placed on the target such that at most ~30% of the pixels within the bounding box corresponded to background pixels, while containing most of the target. For example, when annotating a person with extended arms, the bounding box was placed such that the arms were not included. Note that in some sequences, parts of objects rather than entire objects have been annotated. A rotated bounding box was used to address non-axis alignment of the target. The annotation guidelines were applied at the judgement of the annotators.

Some targets were partially occluded or partially out of the image frame. In these cases the bounding box was "inferred" by the annotator to fully contain the object, including the occluded part. For example, if a person's legs were occluded, the bounding box would also include the non-visible legs.

The annotations were conducted by three groups of annotators. Each annotator group annotated one third of the dataset, and these annotations were cross-checked by the two other groups. The final annotations were checked by the coordinator of the annotation process. The final bounding box annotations were automatically rectified by replacing a rotated bounding box with an axis-aligned one if the ratio of the shortest and longest bounding-box sides exceeded 0.95.
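A sketch of parsing one annotation row and applying that rectification rule, assuming a plain comma-separated line of the eight corner coordinates described above; the side-ratio interpretation follows the text as written:

```javascript
// Sketch: parse one VOT rotated-box row (X1,Y1,...,X4,Y4) and rectify it
// to an axis-aligned box when the shortest/longest side ratio exceeds 0.95.
function parseVotLine(line) {
  const [x1, y1, x2, y2, x3, y3, x4, y4] = line.split(",").map(Number);
  const side = (ax, ay, bx, by) => Math.hypot(bx - ax, by - ay);
  const s1 = side(x1, y1, x2, y2);
  const s2 = side(x2, y2, x3, y3);
  if (Math.min(s1, s2) / Math.max(s1, s2) > 0.95) {
    // Rectify: replace the rotated box with its axis-aligned bounding box.
    const xs = [x1, x2, x3, x4], ys = [y1, y2, y3, y4];
    return {
      axisAligned: true,
      x: Math.min(...xs), y: Math.min(...ys),
      w: Math.max(...xs) - Math.min(...xs),
      h: Math.max(...ys) - Math.min(...ys)
    };
  }
  return { axisAligned: false, corners: [[x1, y1], [x2, y2], [x3, y3], [x4, y4]] };
}
```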

Annotators:

Gustavo Fernandez (coordinator)
Jingjing Xiao
Georg Nebehay
Roman Pflugfelder
Koray Aytac

https://www.votchallenge.net/vot2015/dataset.html