Description

This dataset contains 8,606 STEM academic paper pages with 41,702 objects (tables, figures, and text), all annotated and verified by me. You can use it to fully reproduce TFT-ID-1.0: https://huggingface.co/yifeihu/TFT-ID-1.0

Tables and figures: each table or figure is annotated in the same bounding box as its caption, because extracting a caption together with its object is easier than associating the two afterward.

Text: text sections that belong to the main content (page headers and footers are excluded), annotated in left-to-right, top-to-bottom reading order. Each section can be further converted to a text format (title, paragraph, list, code, algorithm, math expression, etc.).
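To make the reading-order idea concrete, here is a minimal sketch that sorts a page's text boxes. It assumes boxes are given as pixel coordinates `(x_min, y_min, x_max, y_max)` with the labels `"table"`, `"figure"`, and `"text"`; the `Box` class, the `reading_order` function, and the two-column heuristic are all hypothetical illustrations, not the procedure used to annotate the dataset.

```python
from dataclasses import dataclass


@dataclass
class Box:
    label: str  # assumed labels: "table", "figure", or "text"
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def reading_order(boxes, page_width):
    """Return only the text boxes, sorted in an assumed two-column reading order.

    Hypothetical heuristic: a box whose left edge lies in the left half of the
    page belongs to column 0, otherwise to column 1. Columns are read left to
    right; within a column, boxes are read top to bottom.
    """
    text_boxes = [b for b in boxes if b.label == "text"]

    def key(b):
        column = 0 if b.x_min < page_width / 2 else 1
        return (column, b.y_min, b.x_min)

    return sorted(text_boxes, key=key)


# Example page (made-up coordinates on a 612-pixel-wide page).
page = [
    Box("text", 320, 100, 580, 200),   # right column, top
    Box("figure", 40, 300, 280, 500),  # figure plus caption, skipped by reading_order
    Box("text", 40, 100, 280, 200),    # left column, top
    Box("text", 40, 520, 280, 700),    # left column, bottom
]

for b in reading_order(page, page_width=612):
    print(b.x_min, b.y_min)  # prints 40 100, then 40 520, then 320 100
```

A column-then-row key like this suits two-column papers; a full-width title near the top still sorts first because its left edge falls in column 0, but more complex layouts would need a smarter heuristic.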

Breaking text into smaller sections makes OCR easier: the OCR model does not have to handle long inputs and outputs, which can seriously strain both its performance and the hardware it runs on.
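As a rough illustration of per-section OCR, the sketch below turns text boxes into padded crop rectangles and runs a recognizer on each crop separately, so no single call sees the whole page. The functions `text_crops` and `ocr_page`, the `pad` margin, and the `ocr` callable are hypothetical placeholders for whatever cropping and OCR model you actually use.

```python
def text_crops(boxes, page_width, page_height, pad=4):
    """Turn (x_min, y_min, x_max, y_max) text boxes into padded crop rectangles.

    `pad` is a hypothetical margin in pixels so glyphs on the box edge are not
    clipped; every crop is clamped to the page bounds. Input order (e.g. reading
    order) is preserved.
    """
    crops = []
    for (x0, y0, x1, y1) in boxes:
        crops.append((
            max(0, x0 - pad),
            max(0, y0 - pad),
            min(page_width, x1 + pad),
            min(page_height, y1 + pad),
        ))
    return crops


def ocr_page(crops, ocr):
    """Run a hypothetical `ocr` callable (crop -> str) on each section separately.

    Each call handles one short section instead of the full page, and the
    results are joined with blank lines in the order the crops were given.
    """
    return "\n\n".join(ocr(c) for c in crops)


crops = text_crops([(2, 10, 100, 50), (500, 700, 610, 790)], page_width=612, page_height=792)
print(crops)  # [(0, 6, 104, 54), (496, 696, 612, 792)]
```

With Pillow, each tuple can be passed directly to `Image.crop`, which takes a `(left, upper, right, lower)` box, before handing the cropped image to an OCR model.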

The annotations are released under the MIT license. For the paper images, please follow the original license or user agreement on arXiv (most of them are under a CC0 license).

@misc{ tft-id-bzod1_dataset,
    title = { TFT-ID Dataset },
    type = { Open Source Dataset },
    author = { buraktrial },
    howpublished = { \url{ https://universe.roboflow.com/buraktrial/tft-id-bzod1 } },
    url = { https://universe.roboflow.com/buraktrial/tft-id-bzod1 },
    journal = { Roboflow Universe },
    publisher = { Roboflow },
    year = { 2024 },
    month = { dec },
    note = { visited on 2024-12-28 },
}
