Description

This project contains the images Roboflow uses to evaluate the capabilities of Large Multimodal Models (LMMs) such as GPT-4 with Vision and Gemini.

This project contains seven images, which we pass through LMMs with the prompts listed below. The task type each prompt investigates is in parenthesis.

coins.jpeg: How many coins do I have? (Visual Question Answering (VQA))
kevin.png: Which movie is this scene from? (VQA)
ocr.png: Read text from the picture. (Document OCR)
pizza.jpg: What is the price of Pastrami Pizza? (Document VQA)
receipg.jpeg: How much tax did I pay? (Document VQA)
simple-doge.jpeg: Find the dog. (Zero-shot object detection)
tire.jpg: Read the serial number. (VQA)

LMM Evaluations

Blog posts in which this dataset is featured:

GPT-4 Vision Alternatives

Model Comparison

Table of LMMs compared

@misc{ lmm-evaluation_dataset, title = { lmm-evaluation Dataset }, type = { Open Source Dataset }, author = { Roboflow }, howpublished = { \url{ https://universe.roboflow.com/roboflow-gw7yv/lmm-evaluation } }, url = { https://universe.roboflow.com/roboflow-gw7yv/lmm-evaluation }, journal = { Roboflow Universe }, publisher = { Roboflow }, year = { 2023 }, month = { dec }, note = { visited on 2024-12-22 }, }

lmm-evaluation

lmm-evaluation Computer Vision Project

LMM Evaluations

Model Comparison

Build Computer Vision Applications Faster with Supervision

Cite This Project

Similar Projects