VOT2015 Image Dataset
The dataset comprises 60 short sequences showing various objects against challenging backgrounds. The sequences were chosen from a large pool of sequences including the ALOV dataset, OTB2 dataset, non-tracking datasets, Computer Vision Online, Professor Bob Fisher’s Image Database, Videezy, Center for Research in Computer Vision, University of Central Florida, USA, NYU Center for Genomics and Systems Biology, Data Wrangling, Open Access Directory and Learning and Recognition in Vision Group, INRIA, France. The VOT sequence selection protocol was applied to obtain a representative set of challenging sequences. The dataset is automatically downloaded by the evaluation kit when needed, so there is no need to download the sequences separately for the challenge.
The sequences were annotated by the VOT committee using rotated bounding boxes in order to provide highly accurate ground truth values for comparing results. The annotations are stored in a text file with the format:
frameN: X1, Y1, X2, Y2, X3, Y3, X4, Y4
where Xi and Yi are the coordinates of corner i of the bounding box in frame N, which is stored in the N-th row of the text file.
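As an illustration of the format above, the following is a minimal sketch of a reader for such a file. The function names are hypothetical and not part of the evaluation kit. The sketch assumes each row holds eight comma-separated numbers and tolerates an optional "frameN:" label in front of them, since the description does not say whether that label is literally present in the file.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_annotation_line(line: str) -> List[Point]:
    """Parse one annotation row into four (x, y) corner points."""
    # Assumption: a row may start with an optional "frameN:" label.
    # rpartition keeps the whole line when no colon is present.
    _, _, values = line.rpartition(":")
    coords = [float(v) for v in values.split(",")]
    if len(coords) != 8:
        raise ValueError(f"expected 8 coordinates, got {len(coords)}")
    # Pair up X1, Y1, ..., X4, Y4 into corner points.
    return [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]


def parse_annotations(text: str) -> List[List[Point]]:
    """Parse a whole annotation file; entry N-1 holds the box of frame N."""
    return [parse_annotation_line(row) for row in text.splitlines() if row.strip()]
```

Blank rows are skipped here for convenience; if a file uses empty rows to mark missing frames, that choice would shift the frame numbering and should be revisited.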
The bounding box was placed on the target such that at most ~30% of the pixels within the bounding box corresponded to background pixels, while still containing most of the target. For example, in annotating a person with extended arms, the bounding box was placed such that the arms were not included. Note that in some sequences parts of objects rather than entire objects have been annotated. A rotated bounding box was used to address non-axis alignment of the target. The annotation guidelines have been applied at the judgement of the annotators.
Some targets were partially occluded or were partially out of the image frame. In these cases the bounding box was “inferred” by the annotator to fully contain the object, including the occluded part. For example, if a person’s legs were occluded, the bounding box also included the non-visible legs.
The annotations were made by three groups of annotators. Each group annotated one third of the dataset, and these annotations were cross-checked by the two other groups. The final annotations were checked by the coordinator of the annotation process. The final bounding box annotations were automatically rectified by replacing a rotated bounding box with an axis-aligned one if the ratio of the shortest to the longest bounding-box side exceeded 0.95.
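The rectification rule can be sketched as follows. This is only an illustration under stated assumptions, not the committee's actual procedure: the function names are hypothetical, the corners are assumed to be listed in order around the box, and the axis-aligned replacement is assumed to be the smallest axis-aligned rectangle enclosing the four corners, which the description does not specify.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def side_ratio(corners: Sequence[Point]) -> float:
    """Ratio of the shortest to the longest side of a four-corner box."""
    sides = [math.dist(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    longest = max(sides)
    # A degenerate box (all corners equal) is reported as ratio 0.
    return min(sides) / longest if longest > 0 else 0.0


def rectify(corners: Sequence[Point], threshold: float = 0.95) -> List[Point]:
    """Replace a nearly square rotated box with an axis-aligned one."""
    if side_ratio(corners) <= threshold:
        return list(corners)  # elongated box: keep the rotated annotation
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    # Assumption: use the enclosing axis-aligned rectangle as replacement.
    return [(min(xs), min(ys)), (max(xs), min(ys)),
            (max(xs), max(ys)), (min(xs), max(ys))]
```

With this sketch, a square rotated by 45 degrees is replaced by its enclosing axis-aligned square, while a box five times longer than it is wide is returned unchanged.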
Gustavo Fernandez (coordinator)
bag, ball1, ball2, basketball, birds1, birds2, blanket, bmx, bolt1, bolt2, book, butterfly, car1, car2, crossing, dinosaur, fernando, fish1, fish2, fish3, fish4, girl, glove, godfather, graduate, gymnastics1, gymnastics2, gymnastics3, gymnastics4, hand, handball1, handball2, helicopter, iceskater1, iceskater2, leaves, marching, matrix, motocross1, motocross2, nature, octopus, pedestrian1, pedestrian2, rabbit, racing, road, shaking, sheep, singer1, singer2, singer3, soccer1, soccer2, soldier, sphere, tiger, traffic, tunnel, wiper