face-features-test Computer Vision Project
A simple dataset for benchmarking CreateML object detection models. The images are sampled from the COCO dataset, with eye and nose bounding boxes added. It's not meant to be serious or useful in a real application; the purpose is to measure how long CreateML models take to train with varying dataset and batch sizes.
Training performance is affected by model configuration, dataset size, and batch size. Larger models and larger batches require more memory. I used a CreateML object detection project to compare performance.
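For context, CreateML object detection reads annotations from a JSON file with one entry per image, each box given as a label plus center coordinates, width, and height in pixels. A minimal sketch of what one entry for this dataset's eye and nose boxes might look like (the file name and coordinates here are made up for illustration):

```python
import json

# One entry in a CreateML object-detection annotations JSON file.
# The image name and box coordinates are hypothetical; CreateML boxes
# use the box *center* (x, y) plus width and height, in pixels.
annotations = [
    {
        "image": "coco_sample_0001.jpg",  # hypothetical file name
        "annotations": [
            {"label": "eye",  "coordinates": {"x": 110, "y": 95,  "width": 28, "height": 14}},
            {"label": "eye",  "coordinates": {"x": 168, "y": 94,  "width": 27, "height": 14}},
            {"label": "nose", "coordinates": {"x": 139, "y": 128, "width": 30, "height": 32}},
        ],
    }
]

# Serialize the way it would be written to disk for CreateML.
print(json.dumps(annotations, indent=2))
```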
Hardware
M1 MacBook Air
- 8-core GPU
- 8-core CPU (4 performance / 4 efficiency)
- 16 GB memory
- 512 GB SSD
M1 Max MacBook Pro
- 24-core GPU
- 10-core CPU (8 performance / 2 efficiency)
- 32 GB memory
- 2 TB SSD
Small Dataset (Train: 144, Valid: 16, Test: 8)
Results
Batch size | M1 ET (min) | M1 Max ET (min) | Peak memory (GB) |
---|---|---|---|
16 | 16 | 11 | 1.5 |
32 | 29 | 17 | 2.8 |
64 | 56 | 30 | 5.4 |
128 | 170 | 57 | 12 |
Larger Dataset (Train: 301, Valid: 29, Test: 18)
Results
Batch size | M1 ET (min) | M1 Max ET (min) | Peak memory (GB) |
---|---|---|---|
16 | 21 | 10 | 1.5 |
32 | 42 | 17 | 3.5 |
64 | 85 | 30 | 8.4 |
128 | 281 | 54 | 16.5 |
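Reading the larger-dataset table, the M1 Max's advantage grows with batch size. A quick sketch computing the speedup from the elapsed times reported above:

```python
# Elapsed times (ET, minutes) from the larger-dataset table, keyed by batch size.
m1    = {16: 21, 32: 42, 64: 85, 128: 281}
m1max = {16: 10, 32: 17, 64: 30, 128: 54}

for batch in sorted(m1):
    speedup = m1[batch] / m1max[batch]
    print(f"batch {batch:>3}: {speedup:.1f}x faster on M1 Max")
```

At batch 16 the gap is about 2x, but at batch 128 it opens up to roughly 5x, consistent with the observation below.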
CreateML Settings
For all tests, training was set to Full Network. I closed CreateML between runs to make sure memory issues didn't cause a slowdown. As of November 2021, Monterey has a bug that leads to a memory leak, so I kept an eye on memory usage; if it looked like there was a leak, I restarted macOS.
Observations
In general, the MacBook Pro's additional GPU cores and memory reduce training time. Having more memory also lets you train with larger datasets. On the M1 MacBook Air, the practical limit is 12 GB of peak memory before memory pressure impacts performance; on the M1 Max MacBook Pro, it is 26 GB. To work around memory pressure, use smaller batch sizes.
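That workaround amounts to picking the largest batch size whose measured peak memory stays under the machine's practical limit. A sketch using the peaks from the larger-dataset table (this is just an illustration of the selection logic, not a CreateML API):

```python
# Peak memory (GB) per batch size, from the larger-dataset table above.
peak_mem_gb = {16: 1.5, 32: 3.5, 64: 8.4, 128: 16.5}

def largest_safe_batch(limit_gb: float) -> int:
    """Largest measured batch size whose peak memory fits under the limit."""
    fits = [b for b, mem in peak_mem_gb.items() if mem <= limit_gb]
    if not fits:
        raise ValueError("no measured batch size fits in the memory budget")
    return max(fits)

print(largest_safe_batch(12.0))  # M1 Air practical limit (12 GB)  -> 64
print(largest_safe_batch(26.0))  # M1 Max practical limit (26 GB) -> 128
```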
On the larger dataset with batch size 128, the M1 Max is roughly 5x faster than the MacBook Air. Keep in mind that a real dataset should have thousands of samples, like COCO or Pascal VOC; ideally you want a dataset with 100K images for experimentation and millions for the real training. The M1 Max MacBook Pro is a cost-effective alternative to building a Windows/Linux workstation with an RTX 3090 24 GB. For most of 2021 the RTX 3090 sold for around $3,000, which means an equivalent Windows workstation would cost about the same as the M1 Max MacBook Pro I used to run these benchmarks.
Full Network vs Transfer Learning
As of CreateML 3, training with Full Network doesn't fully utilize the GPU; I don't know why it works that way. You have to select Transfer Learning to fully use the GPU. The table below shows the results of transfer learning on the larger dataset. In general, training time is faster and loss is lower.
Batch size | ET (min) | Train Acc (%) | Val Acc (%) | Test Acc (%) | Top I/U Train | Top I/U Valid | Top I/U Test | Peak memory (GB) | Loss |
---|---|---|---|---|---|---|---|---|---|
16 | 4 | 75 | 19 | 12 | 78 | 23 | 13 | 1.5 | 0.41 |
32 | 8 | 75 | 21 | 10 | 78 | 26 | 11 | 2.76 | 0.02 |
64 | 13 | 75 | 23 | 8 | 78 | 24 | 9 | 5.3 | 0.017 |
128 | 25 | 75 | 22 | 13 | 78 | 25 | 14 | 8.4 | 0.012 |
Github Project
The source code and full results are on GitHub: https://github.com/woolfel/createmlbench
Cite This Project
If you use this dataset in a research paper, please cite it using the following BibTeX:
@misc{face-features-test_dataset,
  title = {face-features-test Dataset},
  type = {Open Source Dataset},
  author = {Peter Lin},
  howpublished = {\url{https://universe.roboflow.com/peter-lin/face-features-test}},
  url = {https://universe.roboflow.com/peter-lin/face-features-test},
  journal = {Roboflow Universe},
  publisher = {Roboflow},
  year = {2021},
  month = {dec},
  note = {visited on 2024-11-14},
}