TFT-ID Computer Vision Project
Updated 2 months ago
8,606 STEM academic paper pages and 41,702 objects (tables, figures, and text sections), all annotated and verified by me. You can use this dataset to fully reproduce TFT-ID-1.0: https://huggingface.co/yifeihu/TFT-ID-1.0
Tables and figures: each one shares a single bounding box with its caption. Extracting the caption together with the object is easier than associating the two later.
Text: text sections that belong to the main content (excluding page headers and footers), annotated in left-to-right, top-to-bottom reading order. Each section can be further converted to a text format (title, paragraph, list, code, algorithm, math expression, etc.).
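The reading-order convention above can be illustrated with a small sorting sketch. Everything here is an assumption made for illustration, not part of the dataset tooling: the `Box` class, the pixel coordinate layout, and the equal-width column heuristic are all hypothetical, and a box that spans several columns is simply assigned to the column containing its center.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical annotation record; field names are assumptions."""
    label: str    # "table", "figure", or "text"
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def reading_order(boxes, page_width, columns=1):
    """Sort boxes column by column (left to right), then top to bottom.

    Assumes the page is split into `columns` equal-width columns and
    assigns each box to the column that contains its horizontal center.
    """
    column_width = page_width / columns

    def key(box):
        x_center = (box.x_min + box.x_max) / 2
        column = min(int(x_center // column_width), columns - 1)
        return (column, box.y_min, box.x_min)

    return sorted(boxes, key=key)


def text_sections(boxes, page_width, columns=1):
    """Keep only the text boxes, in reading order."""
    ordered = reading_order(boxes, page_width, columns)
    return [box for box in ordered if box.label == "text"]


# A made-up two-column page, 600 pixels wide.
page_boxes = [
    Box("text", 320, 50, 580, 200),
    Box("figure", 20, 300, 280, 500),
    Box("text", 20, 50, 280, 280),
    Box("table", 320, 220, 580, 400),
]
ordered = reading_order(page_boxes, page_width=600, columns=2)
print([box.label for box in ordered])
```

For a single-column page, `columns=1` reduces this to a plain top-to-bottom sort.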
Breaking text into smaller sections makes OCR easier. The OCR models won’t have to handle long inputs and outputs, which could pose serious challenges to both performance and hardware requirements.
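As a rough sketch of that idea, each annotated text box can be cropped out of the page and passed to OCR on its own. The example below is an illustration under stated assumptions: the page is a plain list of pixel rows, boxes are `(x_min, y_min, x_max, y_max)` tuples in pixel coordinates, and `crop_region` and `crops_for_ocr` are hypothetical helper names. A real pipeline would more likely use an image library for cropping.

```python
def crop_region(page, box):
    """Return the pixels inside box, clamped to the page bounds.

    page: list of rows, each row a list of pixel values (assumed layout).
    box:  (x_min, y_min, x_max, y_max) in pixel coordinates (assumed layout).
    """
    height = len(page)
    width = len(page[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    left = max(0, int(x_min))
    top = max(0, int(y_min))
    right = min(width, int(x_max))
    bottom = min(height, int(y_max))
    return [row[left:right] for row in page[top:bottom]]


def crops_for_ocr(page, boxes):
    """Produce one small crop per box instead of one full-page input."""
    return [crop_region(page, box) for box in boxes]


# A made-up 4x6 "page" whose pixel value encodes its position (10 * row + column).
page = [[10 * r + c for c in range(6)] for r in range(4)]
crops = crops_for_ocr(page, [(1, 1, 4, 3), (4, 2, 10, 10)])
print(crops[0])  # [[11, 12, 13], [21, 22, 23]]
print(crops[1])  # [[24, 25], [34, 35]]
```

Each crop is much smaller than the full page, which is what keeps the OCR input and output short.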
The annotations are released under the MIT license. For the paper images, please follow the original license or user agreement on arXiv (most of them are under the CC0 license).
Cite This Project
If you use this dataset in a research paper, please cite it using the following BibTeX:
@misc{tft-id-so1jb_dataset,
    title = { TFT-ID Dataset },
    type = { Open Source Dataset },
    author = { Alsayes },
    howpublished = { \url{ https://universe.roboflow.com/alsayes/tft-id-so1jb } },
    url = { https://universe.roboflow.com/alsayes/tft-id-so1jb },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2024 },
    month = { oct },
    note = { visited on 2024-12-23 },
}