face-features-test Image Dataset

A simple dataset for benchmarking CreateML object detection models. The images are sampled from the COCO dataset, with eye and nose bounding boxes added. It is not meant to be serious or useful in a real application; the purpose is to measure how long CreateML models take to train with varying dataset and batch sizes.

Training performance is affected by model configuration, dataset size, and batch size. Larger models and larger batches require more memory. I used a CreateML object detection project to compare performance.
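For reference, CreateML object detection training data is a folder of images plus an annotations JSON file, with center-based bounding-box coordinates per label. The sketch below builds one entry in that format; the filename and pixel values are made up for illustration.

```python
import json

# One entry per image; each box uses CreateML's center-based coordinates
# (x, y = box center in pixels, plus width and height).
annotations = [
    {
        "image": "coco_000001.jpg",  # hypothetical filename
        "annotations": [
            {"label": "eye",
             "coordinates": {"x": 120, "y": 80, "width": 30, "height": 18}},
            {"label": "nose",
             "coordinates": {"x": 128, "y": 110, "width": 26, "height": 24}},
        ],
    }
]

# Write the file CreateML reads alongside the images.
with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```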

Hardware

M1 MacBook Air

  • 8-core GPU
  • 8-core CPU (4 performance / 4 efficiency)
  • 16 GB memory
  • 512 GB SSD

M1 Max MacBook Pro

  • 24-core GPU
  • 10-core CPU (8 performance / 2 efficiency)
  • 32 GB memory
  • 2 TB SSD

Small Dataset
Train: 144
Valid: 16
Test: 8

Results

batch size   M1 ET   M1 Max ET   peak memory (GB)
16           16      11          1.5
32           29      17          2.8
64           56      30          5.4
128          170     57          12
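The relative speedup of the M1 Max can be read straight off the table. This short calculation recomputes it from the elapsed times above; the dictionary values are copied from the table, not new measurements.

```python
# Small-dataset elapsed times from the table above: batch -> (M1, M1 Max).
small = {16: (16, 11), 32: (29, 17), 64: (56, 30), 128: (170, 57)}

# Speedup of the M1 Max over the M1 grows with batch size.
for batch, (m1, m1max) in sorted(small.items()):
    print(f"batch {batch:3d}: M1 Max is {m1 / m1max:.1f}x faster")
```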

Larger Dataset
Train: 301
Valid: 29
Test: 18

Results

batch size   M1 ET   M1 Max ET   peak memory (GB)
16           21      10          1.5
32           42      17          3.5
64           85      30          8.4
128          281     54          16.5
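One pattern worth noting in the table above: peak memory roughly doubles each time the batch size doubles. This recomputes the ratios from the table's own numbers.

```python
# Larger-dataset peak memory (GB) from the table above, keyed by batch size.
peak_gb = {16: 1.5, 32: 3.5, 64: 8.4, 128: 16.5}

# Ratio of peak memory between consecutive batch-size doublings.
batches = sorted(peak_gb)
ratios = {b: peak_gb[b] / peak_gb[a] for a, b in zip(batches, batches[1:])}
for b, r in ratios.items():
    print(f"doubling to batch {b}: memory x{r:.2f}")
```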

CreateML Settings

For all tests, training was set to Full Network. I closed CreateML between runs to make sure memory issues didn't cause a slowdown. As of November 2021, macOS Monterey has a bug that leads to a memory leak, so I kept an eye on memory usage and restarted macOS whenever a leak appeared.

Observations

In general, the MacBook Pro's additional GPU cores and memory reduce training time, and having more memory lets you train with larger datasets. On the M1 MacBook Air, the practical limit is about 12 GB before memory pressure impacts performance; on the M1 Max MacBook Pro, it is about 26 GB. To work around memory pressure, use smaller batch sizes.
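The "use smaller batch sizes" workaround can be made concrete: given a memory budget, pick the largest measured batch size whose peak memory fits. This sketch uses the larger-dataset peak memory figures from the table above and the practical limits just mentioned; the helper function is my own, not part of CreateML.

```python
# Measured peak memory (GB) per batch size, larger dataset (table above).
peak_gb = {16: 1.5, 32: 3.5, 64: 8.4, 128: 16.5}

def largest_safe_batch(budget_gb, peaks):
    """Largest measured batch size whose peak memory fits the budget."""
    fitting = [b for b, gb in peaks.items() if gb <= budget_gb]
    return max(fitting) if fitting else None

print(largest_safe_batch(12, peak_gb))   # ~12 GB practical limit on the Air
print(largest_safe_batch(26, peak_gb))   # ~26 GB practical limit on the Max
```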

On the larger dataset with batch size 128, the M1 Max is roughly 5x faster than the MacBook Air. Keep in mind that a real dataset should have thousands of samples, like COCO or Pascal VOC; ideally, you want a dataset with 100K images for experimentation and millions for real training. The M1 Max MacBook Pro is a cost-effective alternative to building a Windows/Linux workstation with an RTX 3090 24 GB. For most of 2021, the RTX 3090 sold for around $3,000, so an equivalent Windows workstation would cost about the same as the M1 Max MacBook Pro I used to run these benchmarks.

Full Network vs Transfer Learning

As of CreateML 3, training with Full Network doesn't fully utilize the GPU; I don't know why it works that way. You have to select Transfer Learning to fully use the GPU. The table below shows transfer learning results on the larger dataset. In general, the training time is faster and the loss is better.

batch size   ET (min)   Train Acc   Val Acc   Test Acc   Top I/U Train   Top I/U Valid   Top I/U Test   peak memory (GB)   loss
16           4          75          19        12         78              23              13             1.5                0.41
32           8          75          21        10         78              26              11             2.76               0.02
64           13         75          23        8          78              24              9              5.3                0.017
128          25         75          22        13         78              25              14             8.4                0.012
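Looking at the transfer learning table, the final loss drops sharply between batch 16 and 32 and then improves only gradually, while validation accuracy stays roughly flat. This recomputes that trend from the table's values.

```python
# Transfer-learning results from the table above: batch -> (ET min, val acc, loss).
results = {
    16:  (4,  19, 0.41),
    32:  (8,  21, 0.02),
    64:  (13, 23, 0.017),
    128: (25, 22, 0.012),
}

# Loss decreases monotonically with batch size; val accuracy barely moves.
for batch, (et, val_acc, loss) in sorted(results.items()):
    print(f"batch {batch:3d}: {et:2d} min, val acc {val_acc}%, loss {loss}")
```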

Github Project

The source code and full results are on GitHub: https://github.com/woolfel/createmlbench

Maintainer

peter-lin

Last Updated

7 months ago

Project Type

Object Detection

Subject

face-features

Classes

eye, nose

License

MIT