Browse » Sports

Top Sports Datasets

Sports datasets and computer vision models can be used for real-time analytics and automated post-game analysis. Collecting sports statistics manually is time intensive and error prone due to human mistakes; computer vision can be used to track detailed analytics for any sport.

Top 7 sports datasets: https://blog.roboflow.com/top-sports-datasets-computer-vision/

Case study: https://blog.roboflow.com/football-kaggle-computer-vision-safety-scaled-yolov4/

YOLOv5 for player tracking tutorial: https://www.youtube.com/watch?v=QCG8QMhga9k

AI-assisted coaching: https://blog.roboflow.com/ai-football-coach-playbook/

Video example: https://www.youtube.com/watch?v=uQnRis9j11M

251 images of playing cricket, football & baseball.

Original Dataset from Kaggle - Bikram Saha

This dataset contains 251 images of playing cricket, football, and baseball: (1) cricket - 95 images, (2) football - 77 images, (3) baseball - 79 images

This is a dataset for image classification in sports. A model trained on it can help identify whether the sport or activity occurring in an image or video feed is, or most closely resembles, cricket, football, or baseball.

The raw image versions (v1 or v5) of the dataset can be downloaded, or the entire dataset can be cloned to your own project, for image classification, or to label the figures in the images for object detection, instance or semantic segmentation, etc.
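For reference, here is a minimal sketch (Node.js + axios, in the same style as the snippet later on this page) of hitting a hosted classification endpoint with one of these images. The model ID "sports-classification/1" and the API key are placeholders, not this project's real endpoint:

const axios = require("axios");
const fs = require("fs");

// Encode the test image as base64 for the POST body.
const image = fs.readFileSync("YOUR_IMAGE.jpg", { encoding: "base64" });

axios({
    method: "POST",
    url: "https://classify.roboflow.com/sports-classification/1", // placeholder model ID
    params: { api_key: "YOUR_KEY" },
    data: image,
    headers: { "Content-Type": "application/x-www-form-urlencoded" }
})
    .then(function (response) {
        // The response contains per-class confidence scores.
        console.log(response.data);
    })
    .catch(function (error) {
        console.log(error.message);
    });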

Nike, Adidas and Converse Shoes Dataset for Classification

This dataset was obtained from Kaggle: https://www.kaggle.com/datasets/die9origephit/nike-adidas-and-converse-imaged/

Dataset Collection Methodology:

"The dataset was obtained downloading images from Google images. The images with a .webp format were transformed into .jpg images. The obtained images were randomly shuffled and resized so that all the images had a resolution of 240x240 pixels. Then, they were split into train and test datasets and saved."

Versions:

  • v1: original_raw-images: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. These images are in the original train/test split from Kaggle: 237 images in each train set and 38 images in each test set
  • v2: original_trainTestSplit-augmented3x: the original train/test split, augmented with 3x image generation. This version was not trained with Roboflow Train.
  • v3: original_trainTestSplit-augmented5x: the original train/test split, augmented with 5x image generation. This version was not trained with Roboflow Train.
  • v4: rawImages_70-20-10split: the original images without Preprocessing or Augmentation applied, other than Auto-Orient to remove EXIF data. Dataset splits were modified to a 70% train, 20% valid, 10% test train/valid/test split
    • NOTE: 70%/20%/10% split: 576 images in train set, 166 images in valid set, 83 images in test set
  • v5: 70-20-10split-augmented3x: modified to a 70% train, 20% valid, 10% test train/valid/test split, augmented with 3x image generation. This version was trained with Roboflow Train.
  • v6: 70-20-10split-augmented5x: modified to a 70% train, 20% valid, 10% test train/valid/test split, augmented with 5x image generation. This version was trained with Roboflow Train.

This dataset was originally created by Wilfred Shu and Adrian Stuart. To see the current project, which may have been updated since this version, please go here: https://universe.roboflow.com/a-s/uwh.

This dataset is part of RF100, an Intel-sponsored initiative to create a new object detection benchmark for model generalizability.

Access the RF100 Github repo: https://github.com/roboflow-ai/roboflow-100-benchmark

Digits

Project Overview:

The original goal was to use this model to monitor my rowing workouts and learn more about computer vision. To monitor the workouts, I needed the ability to identify the individual digits on the rowing machine. With the help of Roboflow's computer vision tools, such as assisted labeling, I was able to more quickly prepare, test, deploy and improve my YOLOv5 model.

Example Annotated Image from the Dataset

Inference on a Test Image using the rfWidget

Roboflow's Upload API, which is suitable for uploading images, video, and annotations, worked great with a custom app I developed to modify the predictions from the deployed model, and export them in a format that could be uploaded to my workspace on Roboflow.

What took me weeks to develop can now be done with a single click using Roboflow Train and the Upload API for Active Learning (dataset and model improvement).

Training Results - Roboflow FAST Model
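For anyone replicating the active-learning loop, here is a minimal sketch of an Upload API call with Node.js and axios, assuming Roboflow's documented dataset-upload endpoint shape; the project ID, filename, and key are placeholders:

const axios = require("axios");
const fs = require("fs");

const image = fs.readFileSync("frame_0001.jpg", { encoding: "base64" });

axios({
    method: "POST",
    url: "https://api.roboflow.com/dataset/YOUR_PROJECT/upload", // placeholder project ID
    params: { api_key: "YOUR_KEY", name: "frame_0001.jpg", split: "train" },
    data: image,
    headers: { "Content-Type": "application/x-www-form-urlencoded" }
})
    .then((response) => console.log(response.data))
    .catch((error) => console.log(error.message));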

Dataset Classes:

  • 1, 2, 3, 4, 5, 6, 7, 8, 9, 90 (class "90" is a stand-in for the digit zero)

This dataset consists of 841 images. There are images from a different rowing machine and also from this repo. Some scenes are illuminated with sunlight. Others have been cropped to include only the LCD. Digits like 7, 8, and 9 are underrepresented.
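Because class "90" stands in for zero, a small post-processing step is needed to turn raw detections into a monitor reading. A hypothetical sketch, assuming predictions in the hosted API's { class, confidence, x, y, width, height } format with x as the box center:

// Remap the stand-in class "90" to "0", then read digits left-to-right.
function readDigits(predictions, minConfidence = 0.5) {
    return predictions
        .filter((p) => p.confidence >= minConfidence)
        .sort((a, b) => a.x - b.x) // left-to-right order on the display
        .map((p) => (p.class === "90" ? "0" : p.class))
        .join("");
}

// e.g. detections for "2" at x=10 and "90" at x=30 read out as "20".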


Background Information

This dataset was curated and annotated by Ilyes Talbi, Head of La revue IA, a French publication focused on stories of machine learning applications.

The main objective is to identify soccer (futbol) players, the referee, and the soccer ball (futbol).

The original custom dataset (v1) is composed of 163 images.

  • Class 0 = players
  • Class 1 = referree
  • Class 2 = soccer ball (or futbol)

The dataset is available under the Public License.

Getting Started

You can download this dataset for use within your own projects, or fork it into a workspace on Roboflow to create your own model.
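As a starting point, here is a minimal sketch (Node.js + axios) that counts detections per class in one frame via the hosted inference API; the model ID "soccer-players/7" is a placeholder:

const axios = require("axios");
const fs = require("fs");

const image = fs.readFileSync("match_frame.jpg", { encoding: "base64" });

axios({
    method: "POST",
    url: "https://detect.roboflow.com/soccer-players/7", // placeholder model ID
    params: { api_key: "YOUR_KEY" },
    data: image,
    headers: { "Content-Type": "application/x-www-form-urlencoded" }
})
    .then(({ data }) => {
        const counts = {};
        for (const p of data.predictions) {
            counts[p.class] = (counts[p.class] || 0) + 1;
        }
        console.log(counts); // e.g. { players: 21, referree: 1, futbol: 1 }
    })
    .catch((error) => console.log(error.message));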

Dataset Versions

Version 7 (v7) - 163 images (raw images)

  • Preprocessing: Auto-Orient, Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations: No augmentations applied
  • Training Metrics: This version of the dataset was not trained

Version 2 (v2) - 163 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416)
  • Augmentations: No augmentations applied
  • Training Metrics: This version of the dataset was not trained

Version 3 (v3) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: 86.4% mAP, 51.8% precision, 90.4% recall

Version 4 (v4) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 0 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: 84.6% mAP, 52.3% precision, 85.3% recall

Version 5 (v5) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
      • Only Class 0, which was remapped to players, was included in this version
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: Trained from the COCO Checkpoint in Public Models ("transfer learning") on Roboflow
    • 98.8% mAP, 76.3% precision, 99.2% recall

Version 6 (v6) - 391 images

  • Preprocessing: Auto-Orient and Resize (Stretch to 416x416), Modify Classes: 3 remapped, 2 dropped
    • Modified Classes: Class 0 = players, Class 1 = referree, Class 2 = futbol
      • Only Class 0, which was remapped to players, was included in this version
  • Augmentations:
    • Outputs per training example: 3
    • Rotation: Between -25° and +25°
    • Shear: ±15° Horizontal, ±15° Vertical
    • Brightness: Between -25% and +25%
    • Blur: Up to 0.75px
    • Noise: Up to 1% of pixels
    • Bounding Box: Blur: Up to 0.5px
  • Training Metrics: Trained from Scratch (no transfer learning employed)
    • 95.5% mAP, 67.8% precision, 95.5% recall

Ilyes Talbi - LinkedIn | La revue IA

Badminton Shuttlecock Object Detection

VolleyVision


Project Overview:

The Volleyball Activity Dataset is a collection of annotated volleyball video sequences captured from professional matches in the Austrian Volley League (season 2011/12). The dataset contains 6 videos with HD resolution (1920x1080) at 25 frames per second, encoded using the DX50 codec. It includes 7 activity classes: 'Serve', 'Reception', 'Setting', 'Attack', 'Block', 'Stand', 'Defense/Move'

Class Descriptions:

  • Serve: The action of serving the ball to initiate a rally.
  • Reception: The action of receiving the served ball.
  • Setting: The action of accurately positioning the ball for an attacker.
  • Attack: The action of hitting or spiking the ball towards the opponent's court.
  • Block: The action of intercepting the opponent's attack at the net.
  • Stand: The class representing a player's static position or standing.
  • Defense/Move: The class representing defensive movements or general player movements.

Links to External Resources:

  • Official Dataset: Graz University of Technology Volleyball Activity Dataset
  • Paper 1: Improved Sport Activity Recognition using Spatio-temporal Context
  • Paper 2: Indoor Activity Detection and Recognition for Automated Sport Games Analysis

Roboflow Author

Shakhansho Sabzaliev - Github

Golf Ball Object Detection

Usage

This model will perform best on images and videos taken on a golf course (similar to the thumbnail photo and the dataset images).

It's a great model for sports broadcasting and other apps that need automated ball tracking, scoring, lost-ball finding, and more!
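As a sketch of what downstream ball tracking could look like, the function below keeps the highest-confidence ball detection in each frame and collects the centers into a trajectory. The class name "golf ball" and the prediction format ({ class, confidence, x, y }) are assumptions:

function ballTrajectory(framePredictions) {
    return framePredictions.map((preds) => {
        const balls = preds.filter((p) => p.class === "golf ball"); // assumed class name
        if (balls.length === 0) return null; // ball not visible in this frame
        const best = balls.reduce((a, b) => (a.confidence > b.confidence ? a : b));
        return { x: best.x, y: best.y };
    });
}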

Peyton Manning couldn't help but try to call timeout for the Broncos on Monday, September 12's ESPN Monday Night Football.

This model is available to use any time you'd like to make a video counting the times Peyton Manning, or another TV sports analyst, attempts to call timeout, or criticizes a team for failing to call one.

Classes:

  • timeout
  • hands
    • Note: this class is omitted for the Deploy tab model - it only exists in versions: v1, v2, v3
    • Modify Classes was used to omit the hands class
    • The class was originally included in case the model did not do well with recognizing the timeout hand signal, and post-processing such as comparing the overlap/distance between the predicted hands bounding boxes was necessary for confirming a timeout call
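A hypothetical sketch of that post-processing idea: compute the IoU of two predicted hands boxes and flag a likely timeout call when they overlap. Boxes are assumed to be in Roboflow's center-based { x, y, width, height } format:

function toCorners(p) {
    return {
        x1: p.x - p.width / 2, y1: p.y - p.height / 2,
        x2: p.x + p.width / 2, y2: p.y + p.height / 2
    };
}

function iou(a, b) {
    const A = toCorners(a), B = toCorners(b);
    const w = Math.max(0, Math.min(A.x2, B.x2) - Math.max(A.x1, B.x1));
    const h = Math.max(0, Math.min(A.y2, B.y2) - Math.max(A.y1, B.y1));
    const inter = w * h;
    const union = a.width * a.height + b.width * b.height - inter;
    return union > 0 ? inter / union : 0;
}

function looksLikeTimeout(predictions, threshold = 0.1) {
    const hands = predictions.filter((p) => p.class === "hands");
    return hands.length >= 2 && iou(hands[0], hands[1]) > threshold; // overlapping hands
}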

Want to test it for yourself?

  • Click this link for the video and right-click to copy/paste the Video Address for the video in the Deploy tab, or right-click to save the video file and upload it to the Deploy tab for testing.

VolleyVision


This dataset is part of my bachelor thesis project. It was created by combining multiple open-source datasets from Roboflow Universe as well as manual annotation.

Code

You can find the code, models, metrics, and everything else related to this dataset in practical terms at the GitHub Link

Versions

  1. original - original images with no augmentation techniques applied and no resizing
  2. stretched_dataset - resized images to 1024x1024
  3. resized_640 - resized images to 640x640

Datasets used:

Author

Shakhansho Sabzaliev - Github

Overview:

Football (soccer) player and football (soccer) ball detection dataset from Augmented Startups.

  • Project Type: Object Detection
  • Labeled/Annotated with: Bounding boxes

Classes:

  • football, player

How to Use:

This is a great starter-dataset for those wanting to test player and/or ball-tracking for football (soccer) games with the Deploy Tab, or the Deployment device and method of their choice.

Images can also be Cloned to another project to continue iterating on the project and model. World Cup, Premier League, La Liga, Major League Soccer (MLS) and/or Champions League computer vision projects, anyone?

Roboflow offers AutoML model training (Roboflow Train) and the ability to import and export up to 30 different annotation formats, leaving you the flexibility to deploy directly with a Roboflow Train model, or to use Roboflow to prepare and manage datasets and then train and deploy with the custom model architecture of your choice: https://github.com/roboflow-ai/notebooks.

Tips for Model and Dataset Improvement:

Example Video from Deploy Tab

Here are a few use cases for this project:

  1. Sports Analytics: The Soccer Players computer vision model can be used to analyze player performance during games by tracking player and ball positions, individual player actions, and goal-scoring events, allowing coaches and trainers to make data-driven decisions for improving performance and strategies.

  2. Automated Highlight Reels: The model can be used to automatically curate soccer match highlights by identifying crucial moments such as goals, outstanding player performances, and referee decisions. This can streamline the video editing process for broadcasting and streaming companies.

  3. Virtual Assistant for Soccer Enthusiasts: The Soccer Players model can be integrated into a mobile application, allowing users to take pictures or upload images from soccer matches and receive instant information about the teams (USA, NED), player roles (goalie, outfield player, referee), and other relevant classes such as ball and goal locations, enhancing their understanding and engagement with the sport.

  4. Real-Time Augmented Reality (AR) Applications: The model can be used to create AR experiences for soccer fans attending live matches, providing pop-up information about players (such as player stats, team affiliations, etc.) and game events (goals, referee decisions) when viewing the live match through an AR device or smartphone.

  5. Training and Scouting Tools: Soccer scouts and trainers can use the Soccer Players model to evaluate potential recruits or assess the performance of their own players during practice sessions. By rapidly identifying key actions (goals, saves, tackles) and providing context for each play, the model can help scouts and trainers make informed decisions faster.

A disc golf (frisbee) dataset, created for tracking flying discs.

Thanks to Patrick for his frisbees on the ground data: https://universe.roboflow.com/patrick-o-gorman/frisbee

Background Information

This dataset was curated and annotated by Mohamed Traore and Justin Brady after forking the raw images from the Roboflow Universe Mask Wearing dataset and remapping the mask and no-mask classes to face.

Example Image from the Dataset

The main objective is to identify human faces in images or video. However, this model could also be used for privacy purposes by changing the output of the bounding boxes to blur the detected face or fill it with a black box.
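Here is a minimal sketch of that privacy use, assuming the node-canvas package and predictions in the hosted API's center-based box format; swapping the fillRect for a blur would work similarly:

const { createCanvas, loadImage } = require("canvas");
const fs = require("fs");

async function redactFaces(imagePath, predictions, outPath) {
    const img = await loadImage(imagePath);
    const canvas = createCanvas(img.width, img.height);
    const ctx = canvas.getContext("2d");
    ctx.drawImage(img, 0, 0);
    ctx.fillStyle = "black";
    for (const p of predictions.filter((p) => p.class === "face")) {
        // Predictions use box centers; convert to the top-left corner.
        ctx.fillRect(p.x - p.width / 2, p.y - p.height / 2, p.width, p.height);
    }
    fs.writeFileSync(outPath, canvas.toBuffer("image/jpeg"));
}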

The original custom dataset (v1) is composed of 867 unaugmented (raw) images of people in various environments. 55 of the images are marked as Null to help with feature extraction and reducing false detections.

Version 2 (v2) includes the augmented and trained version of the model. This version is trained from the COCO model checkpoint to take advantage of transfer learning and improve initial model training results.

Model Updates:

After a few trainings, and after running tests with Roboflow's webcam model and Roboflow's video inference repo, it was clear that edge cases, such as hands sometimes being recognized as faces, were an issue. I grabbed 96 images from Alex Wong's Hand Signs dataset and added them to the project. I uploaded the images without the annotation files, labeled all the faces, and retrained the model (version 5).

The dataset is available under the CC BY 4.0 license.

Includes images from:

@misc{ person-hgivm_dataset,
    title = { person Dataset },
    type = { Open Source Dataset },
    author = { Abner },
    howpublished = { \url{ https://universe.roboflow.com/abner/person-hgivm } },
    url = { https://universe.roboflow.com/abner/person-hgivm },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2021 },
    month = { aug },
    note = { visited on 2022-10-14 },
}

Boxpunch Detector

Onboarding project for Roboflow

This project captures the punch types thrown during boxing training.

Here are a few use cases for this project:

  1. Sports Analytics: Use the "Basketball Players" model to automatically track players' movements, ball possession, and referee decisions during live games or post-game analysis. This data can be used by coaches, analysts, and teams to inform and improve strategies, tactics, and player performance.

  2. Real-time Game Commentary: Integrate the model into sports broadcasting platforms, providing real-time updates and statistics to commentators, allowing them to focus on in-depth analysis and storytelling while the model handles identification and stat-tracking.

  3. Automated Sports Highlights: Utilize the model to automatically create highlights from basketball games by identifying key moments, such as successful shots, blocks, and referee decisions. This can streamline the post-production process for sports media outlets and social media channels.

  4. Training and Skill Development: Leverage the "Basketball Players" model to create feedback tools for players, identifying areas of improvement in team dynamics and individual technique during practice sessions or games.

  5. Fan Experience: Employ the model in smartphone apps or AR devices, providing fans with real-time information on their favorite teams and players during live games, enhancing their overall experience and engagement.

VolleyVision


This dataset is part of my bachelor thesis project. It was created by combining multiple open-source datasets from Roboflow Universe as well as manual annotation.

Code

You can find the code, models, metrics, and everything else related to this dataset in practical terms at the GitHub Link

Versions

  1. 25k_original_version - original images with no augmentation techniques applied
  2. 25_version_640 - resized images to 640x640
  3. enlarged_images - resized images to 800x800 to address the issue of small object detection

Datasets used:

Author

Shakhansho Sabzaliev - Github

Classes

  • blocking - for players who have their arms under the net while the ball is crossing the net
  • passing - for players who make the first touch below the net
  • spiking - for players who make the touch (second or third) on their side of the court after which the ball goes to the other side
  • setting - for players who make the touch after which other player from their team spikes
  • serving - the first touch in the game
  • digging - the action of saving the ball after an opponent's spike or block, often in an unusual situation where the player must fall, or fall and run, and use a technique not typical of normal play

Cyclist Detection

Overview

This model helps detect people riding bicycles, and from which direction the cyclist can be seen (front, back, side).

Use Cases

Both self-driving and sports broadcasting are great use cases for this model, as it gives useful information about how the camera is positioned relative to the rider(s).


Here are a few use cases for this project:

  1. Sports Analytics: The "Juggling" model can be used for tracking player and ball movements during football matches. It can provide insights into player performance, ball control and technique.

  2. Player Training Improvement: Coaches can use the model to analyse and improve a player's juggling skills, by identifying any errors in foot-ball coordination and providing feedback on how to improve.

  3. Augmented Reality Sports Games: In AR games aiming to emulate real life ball games (like football), this model can be used to detect and track the person's movements as well as the ball, making the gaming experience more interactive and realistic.

  4. Physical Therapy and Rehabilitation: This model can be used to monitor patients’ progress during physical therapy - particularly in exercises involving coordination between the person and balls.

  5. Video Review and Referee Assistance: The model can help in making judgment calls such as fouls and handballs by determining if the ball contacted the player's foot or another part of the body. It can also be used for retrospective video analysis to validate or dispute calls made during the game.

Overview

This project started over 3 years ago, when I wanted to make something that would draw out football plays automatically. Last year I hit a breakthrough in my Python development where I could track players individually. Roboflow has allowed me to track players by position groups.

Classes

Some of the classes are straightforward, like Center, QB (quarterback), db (defensive back), and lb (linebacker), but the rest are identified as skill. That means an offensive player like a Runningback, Fullback, Tightend, H-back, or Wide Receiver.

The project in action

I haven't made a video of myself using Roboflow yet, but I will shortly. You can see the project on my LinkedIn and how it's grown and will continue to grow. My LinkedIn

Gridiron Intelligence

Overview

The Surfline Surfer Spotting dataset contains images with surfers floating off the coast. Each image contains one class called "surfer" but may contain multiple surfers.

Example Footage

Surfers

Using this Dataset

There are several deployment options available, including inferring via API, webcam, and curl command.

Here is a code snippet you can use to hit the hosted inference API. Here are code snippets for more languages.

const axios = require("axios");
const fs = require("fs");

// Read the image and encode it as base64 for the POST body.
const image = fs.readFileSync("YOUR_IMAGE.jpg", {
    encoding: "base64"
});

axios({
    method: "POST",
    url: "https://detect.roboflow.com/surfer-spotting/2",
    params: {
        api_key: "YOUR_KEY"
    },
    data: image,
    headers: {
        "Content-Type": "application/x-www-form-urlencoded"
    }
})
.then(function(response) {
    // Detections arrive as response.data.predictions.
    console.log(response.data);
})
.catch(function(error) {
    console.log(error.message);
});

Download Dataset

On the Versions tab you can select the version you'd like, and choose to download it in any of 26 annotation formats.

Here are a few use cases for this project:

  1. Sports Analytics: Use the "soccer data" model to automatically classify and track players' actions during a soccer match, helping teams and coaches analyze player performance, decision-making, and ball possession patterns.

  2. Soccer Training Applications: Incorporate the model into a soccer training app or system that provides real-time feedback to players, assisting them in improving their ball-handling skills, positioning, and decision-making on the field.

  3. Interactive Sports Broadcasting: Enhance the viewer experience during live broadcasts or replays of soccer matches by automatically identifying which player has the ball, enabling new interactive features such as instant player statistics or alerts for key events.

  4. Augmented Reality Sports Experiences: Implement the model into an AR app that allows users to watch live or recorded soccer games with an overlay that highlights player positions and their current ball possession status, making it easier for viewers to follow and understand the game's progression.

  5. Automated Soccer Highlights Generation: Utilize the "soccer data" model to automatically identify and extract key moments in soccer matches (such as goals, saves, or exciting plays) based on player and ball possession patterns, making it more efficient to create highlight reels or videos for fans to enjoy.

VOT2015 Dataset

The dataset comprises 60 short sequences showing various objects against challenging backgrounds. The sequences were chosen from a large pool of sequences including the ALOV dataset, OTB2 dataset, non-tracking datasets, Computer Vision Online, Professor Bob Fisher’s Image Database, Videezy, the Center for Research in Computer Vision at the University of Central Florida (USA), the NYU Center for Genomics and Systems Biology, Data Wrangling, the Open Access Directory, and the Learning and Recognition in Vision Group at INRIA (France). The VOT sequence selection protocol was applied to obtain a representative set of challenging sequences. The dataset is automatically downloaded by the evaluation kit when needed; there is no need to download the sequences separately for the challenge.

Annotations

The sequences were annotated by the VOT committee using rotated bounding boxes in order to provide highly accurate ground truth values for comparing results. The annotations are stored in a text file with the format:

frameN: X1, Y1, X2, Y2, X3, Y3, X4, Y4

where Xi and Yi are the coordinates of corner i of the bounding box in frame N, the N-th row in the text file.
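A small sketch of parsing one such line into four corner points, assuming comma-separated values as shown:

function parseVotLine(line) {
    const v = line.split(",").map(Number); // [X1, Y1, X2, Y2, X3, Y3, X4, Y4]
    const corners = [];
    for (let i = 0; i < 8; i += 2) {
        corners.push({ x: v[i], y: v[i + 1] });
    }
    return corners;
}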

The bounding box was placed on the target such that at most ~30% of the pixels within the bounding box corresponded to background, while containing most of the target. For example, in annotating a person with extended arms, the bounding box was placed such that the arms were not included. Note that in some sequences parts of objects, rather than entire objects, have been annotated. A rotated bounding box was used to address non-axis alignment of the target. The annotation guidelines have been applied at the judgement of the annotators.

Some targets were partially occluded or partially out of the image frame. In these cases the bounding box was “inferred” by the annotator to fully contain the object, including the occluded part. For example, if a person’s legs were occluded, the bounding box also includes the non-visible legs.

The annotations were produced by three groups of annotators. Each group annotated one third of the dataset, and these annotations were cross-checked by the two other groups. The final annotations were checked by the coordinator of the annotation process. The final bounding box annotations were then automatically rectified by replacing a rotated bounding box with an axis-aligned one if the ratio of the shortest and longest bounding-box sides exceeded 0.95.
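For illustration, here is that rectification rule as code, using the stated 0.95 side-ratio threshold:

function rectify(corners) {
    const side = (a, b) => Math.hypot(b.x - a.x, b.y - a.y);
    const s1 = side(corners[0], corners[1]);
    const s2 = side(corners[1], corners[2]);
    if (Math.min(s1, s2) / Math.max(s1, s2) <= 0.95) return corners; // keep the rotated box
    // Replace with the axis-aligned bounding box of the four corners.
    const xs = corners.map((c) => c.x);
    const ys = corners.map((c) => c.y);
    return [
        { x: Math.min(...xs), y: Math.min(...ys) },
        { x: Math.max(...xs), y: Math.min(...ys) },
        { x: Math.max(...xs), y: Math.max(...ys) },
        { x: Math.min(...xs), y: Math.max(...ys) }
    ];
}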

Annotators:

Gustavo Fernandez (coordinator), Jingjing Xiao, Georg Nebehay, Roman Pflugfelder, Koray Aytac

https://www.votchallenge.net/vot2015/dataset.html