Dataset Versions

v2

2024-10-05 9:44pm

Generated on Oct 5, 2024

Popular Download Formats

Pascal VOC XML: Common XML annotation format for local data munging (defined for the PASCAL VOC challenge and also used by ImageNet).
PaliGemma: JSONL format used for fine-tuning PaliGemma, Google's open multimodal vision model.
CreateML JSON: JSON format used with Apple's Create ML and Turi Create tools.
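As a rough illustration of the Pascal VOC XML format listed above, the sketch below reads bounding boxes from one annotation using only the Python standard library. The sample annotation (file name, class name, coordinates) and the function name `parse_voc` are made up for the example and are not taken from this dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation: file name, class name, and coordinates are invented.
SAMPLE_VOC = """<annotation>
  <filename>example.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>widget</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, boxes) for one Pascal VOC annotation document."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Some exporters write coordinates as floats, so parse via float first.
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return root.findtext("filename"), boxes


filename, boxes = parse_voc(SAMPLE_VOC)
print(filename, boxes)
```

In practice each image in the export would have its own XML file, so the same function could be applied to the contents of each file in turn.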

Dataset Split

Train Set: 0% (0 images)
Valid Set: 100% (1112 images)
Test Set: 0% (0 images)

Preprocessing

Auto-Orient: Applied
Static Crop: 10-90% Horizontal Region, 5-100% Vertical Region
Auto-Adjust Contrast: Using Adaptive Equalization
Filter Null: Require at least 75% of images to contain annotations.
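To make the Static Crop and Filter Null settings concrete, here is a minimal sketch of the arithmetic they imply. It assumes the crop percentages are measured from the left and top edges of the image and rounded down to whole pixels, and that Filter Null caps the number of unannotated images so that annotated ones make up at least 75% of the result. Both readings, and the function names, are assumptions rather than the tool's documented behavior.

```python
def static_crop_box(width, height, x_range=(10, 90), y_range=(5, 100)):
    """Return (left, top, right, bottom) in pixels for percentage crop ranges.

    Assumes percentages are measured from the left/top edge and floored.
    """
    left = width * x_range[0] // 100
    right = width * x_range[1] // 100
    top = height * y_range[0] // 100
    bottom = height * y_range[1] // 100
    return left, top, right, bottom


def max_null_images(annotated, min_annotated_percent=75):
    """Largest number of unannotated images that keeps the annotated share
    at or above min_annotated_percent (assumed meaning of Filter Null)."""
    # annotated / (annotated + nulls) >= p/100  =>  nulls <= annotated * (100 - p) / p
    return annotated * (100 - min_annotated_percent) // min_annotated_percent


# Hypothetical 640x480 image: the crop keeps a 512x456 region.
print(static_crop_box(640, 480))   # (64, 24, 576, 480)
# Hypothetical count of 900 annotated images.
print(max_null_images(900))        # 300
```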

Augmentations

Outputs per training example: 3
Crop: 0% Minimum Zoom, 10% Maximum Zoom
Saturation: Between -10% and +10%
Brightness: Between -25% and +25%
Exposure: Between -10% and +10%
Blur: Up to 2.5px
Noise: Up to 5% of pixels
Bounding Box: Brightness: Between -15% and +15%
Bounding Box: Exposure: Between -10% and +10%
Bounding Box: Noise: Up to 3% of pixels
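The augmentation settings above can be read as parameter ranges from which each of the 3 outputs per example draws its own values. The sketch below samples one parameter set per output. Uniform sampling, the dictionary keys, and the function names are assumptions made for illustration; the generating tool's actual sampling distribution is not stated here.

```python
import random

# Ranges copied from the augmentation list; key names are illustrative only.
IMAGE_RANGES = {
    "crop_zoom": (0.0, 0.10),        # Crop: 0% to 10% zoom
    "saturation": (-0.10, 0.10),
    "brightness": (-0.25, 0.25),
    "exposure": (-0.10, 0.10),
    "blur_px": (0.0, 2.5),           # "Up to 2.5px", assumed to start at 0
    "noise_fraction": (0.0, 0.05),   # "Up to 5% of pixels"
}
BBOX_RANGES = {
    "brightness": (-0.15, 0.15),
    "exposure": (-0.10, 0.10),
    "noise_fraction": (0.0, 0.03),
}
OUTPUTS_PER_EXAMPLE = 3


def sample_params(rng, ranges):
    """Draw one value per parameter, uniformly within its range (an assumption)."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def sample_outputs(seed=0):
    """Return one image-level and one box-level parameter set per output."""
    rng = random.Random(seed)
    return [
        {
            "image": sample_params(rng, IMAGE_RANGES),
            "bbox": sample_params(rng, BBOX_RANGES),
        }
        for _ in range(OUTPUTS_PER_EXAMPLE)
    ]


for params in sample_outputs(seed=0):
    print(params)
```

Seeding the generator keeps the sampled parameters reproducible, which helps when comparing runs against the same source images.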