Website Screenshots Computer Vision Project

Brad Dwyer

Updated 5 years ago

Description

About This Dataset

The Roboflow Website Screenshots dataset is a synthetically generated dataset composed of screenshots from over 1000 of the world's top websites. The screenshots have been automatically annotated with the following classes:

  • button - navigation links, tabs, etc.
  • heading - text that was enclosed in <h1> to <h6> tags.
  • link - inline, textual <a> tags.
  • label - text labeling form fields.
  • text - all other text.
  • image - <img>, <svg>, or <video> tags, and icons.
  • iframe - ads and 3rd party content.
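
The class list above reads like a mapping from HTML elements to labels. The following is a minimal sketch of such a mapping, not the generator Roboflow actually used: the names `TAG_TO_CLASS`, `ClassCounter`, and `count_classes` are hypothetical. It assumes the class can be read from the tag name alone, which is a simplification: in the dataset, `button` also covers navigation links and tabs, and `image` also covers icons, neither of which can be recovered from tag names. Text that sits outside any labelled element is counted as `text`.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag-to-class table derived from the class descriptions above.
TAG_TO_CLASS = {
    **{f"h{level}": "heading" for level in range(1, 7)},
    "a": "link",
    "label": "label",
    "button": "button",
    "img": "image",
    "svg": "image",
    "video": "image",
    "iframe": "iframe",
}

# Void elements never get a closing tag, so they are not tracked as open.
VOID_TAGS = {"img"}


class ClassCounter(HTMLParser):
    """Tallies one annotation class per labelled element or loose text run."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        self._open = []  # stack of labelled, non-void tags currently open

    def handle_starttag(self, tag, attrs):
        label = TAG_TO_CLASS.get(tag)
        if label is None:
            return
        self.counts[label] += 1
        if tag not in VOID_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Text inside a labelled element belongs to that element's class;
        # only text outside every labelled element counts as "text".
        if data.strip() and not self._open:
            self.counts["text"] += 1


def count_classes(html: str) -> Counter:
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    sample = '<h1>Title</h1><p>Some <a href="#">link</a> text</p><img src="a.png">'
    print(dict(count_classes(sample)))
```

A real annotator would also need each element's rendered bounding box, which requires a browser engine rather than an HTML parser; this sketch only illustrates the class assignment.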

Example

This is an example image and annotation from the dataset: Wikipedia Screenshot

Usage

Annotated screenshots are very useful in Robotic Process Automation, but they can be expensive to label. This dataset would cost over $4000 for humans to label on popular labeling services. We hope it provides a good starting point for your project. Try it with a model from our model library.

Collecting Custom Data

Roboflow is happy to provide a custom screenshots dataset to meet your particular needs. We can crawl public or internal web applications. Just reach out and we'll be happy to provide a quote!

About Roboflow

Roboflow makes managing, preprocessing, augmenting, and versioning datasets for computer vision seamless. Developers reduce 50% of their boilerplate code when using Roboflow's workflow, save training time, and increase model reproducibility.

Cite This Project

LICENSE
MIT

If you use this dataset in a research paper, please cite it using the following BibTeX:

@misc{website-screenshots-archived_dataset,
    title = { Website Screenshots Dataset },
    type = { Open Source Dataset },
    author = { Brad Dwyer },
    howpublished = { \url{ https://universe.roboflow.com/brad-dwyer/website-screenshots-archived } },
    url = { https://universe.roboflow.com/brad-dwyer/website-screenshots-archived },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2020 },
    month = { may },
    note = { visited on 2024-11-21 },
}