Image classification models

Larry Igna

Does anyone here have experience with image classification?
I'm looking for a small model, something like ResNet50, MobileNet, etc. I have tried many models like these, especially from https://huggingface.co/timm

Timm models are great and small enough to run on a CPU. They output the top-5 classes with probabilities, which is enough for me. The only problem is that they are all trained on the ImageNet dataset, which has people removed, so if there are people in a photo, these models simply won't recognize them.

I'm looking for suggestions of models under ~300 MB trained on something else (maybe Flickr30k) that can recognize humans in photos. Something as simple as "man": 30% is enough for me.
 
I've had success using SmilingWolf's ViT models in the past to match people. The weights are around 350-400 MB. If you're dead set on <300 MB, look around for other models trained for tagging Stable Diffusion images.
 
If you're dead set on <300 MB...
Just to clarify, I'm not dead set on the 300 MB limit; it's only a rough approximation of the size of models that can easily run on almost any CPU (including mobile).

For anyone else who finds this thread: I've found models that meet my criteria in the timm collection. What I didn't notice the first time is that the models are trained on ImageNet-21k, the full set of classes, but many of them are then fine-tuned on ImageNet-1k, a subset of the big one.
It's very easy to tell them apart; the bright folks out there named them so that idiots like me can understand easily. If the model name ends with "ft_in1k", it's fine-tuned; if it ends with "in21k", it's the big one.
Now, the 21k class set contains people classes, while the 1k one doesn't. In my tests I had apparently selected only fine-tuned models (bad luck, I guess), but after figuring this out I got correct results.
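
A minimal sketch of that naming check, in case it helps someone filter model names in bulk. The function name `pretrain_kind` is made up for illustration and is not part of timm. It looks only at the tag after the first dot in the model name and ignores underscores, so both "in21k" and "in_21k" spellings match.

```python
def pretrain_kind(model_name: str) -> str:
    """Guess the label set from the pretrained tag in a timm-style model name."""
    # Keep only the last path component, e.g. "tf_efficientnetv2_b3.in21k".
    name = model_name.rstrip("/").split("/")[-1]
    # The pretrained tag is whatever follows the first dot (may be empty).
    _, _, tag = name.partition(".")
    # Drop underscores so "ft_in_1k" and "ft_in1k" are treated the same.
    tag = tag.replace("_", "")
    if tag.endswith("ftin1k"):
        return "fine-tuned on ImageNet-1k"
    if tag.endswith("in21k"):
        return "ImageNet-21k"
    return "unknown"


if __name__ == "__main__":
    for name in (
        "timm/tf_efficientnetv2_b3.in21k",
        "timm/vit_tiny_r_s16_p8_224.augreg_in21k/",
        "vit_tiny_r_s16_p8_224.augreg_in21k_ft_in1k",  # illustrative fine-tuned name
    ):
        print(name, "->", pretrain_kind(name))
```

Only the names ending in "in21k" are the ones worth trying if you need people classes.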

I ended up using this:
https://huggingface.co/timm/tf_efficientnetv2_b3.in21k (186MB)
But there are models even smaller:
https://huggingface.co/timm/vit_tiny_r_s16_p8_224.augreg_in21k/
This one is ~45 MB and works, but I opted for the slightly better model above.
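
For reference, a rough sketch of getting the top-5 classes out of one of these models on CPU, assuming `timm`, `torch` and `Pillow` are installed. The helper names `top_k` and `classify` are my own, and the sketch returns class indices rather than names, since mapping ImageNet-21k indices to readable labels is left out here.

```python
def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def classify(image_path, model_name="hf_hub:timm/tf_efficientnetv2_b3.in21k", k=5):
    """Run one image through a timm model and return its top-k class indices."""
    # Heavy imports stay local so top_k can be used without them installed.
    import timm
    import torch
    from PIL import Image

    model = timm.create_model(model_name, pretrained=True).eval()
    # Build the preprocessing that matches how the model was trained.
    config = timm.data.resolve_model_data_config(model)
    transform = timm.data.create_transform(**config, is_training=False)

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(transform(image).unsqueeze(0))  # add a batch dimension
    probs = logits.softmax(dim=1)[0].tolist()
    # Class indices stand in for labels; no name lookup is attempted here.
    return top_k(probs, list(range(len(probs))), k)
```

Calling `classify("photo.jpg")` (the file name is a placeholder) downloads the weights on first use and returns five `(class_index, probability)` pairs.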
 
To recognize people in photos, try models trained on the COCO dataset, such as YOLO (e.g. YOLOv3-tiny), SSD-MobileNet or EfficientDet-D0. These models are small, run on CPU, and can recognize people, so they should suit your task. If that doesn't work out, you can train a model on your own data.
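
As a rough illustration of the COCO route, here is a sketch using torchvision's SSDLite MobileNetV3 detector. That model is a stand-in I picked because it ships with torchvision; it is not necessarily the exact SSD-MobileNet variant meant above. The helper names and the 0.3 threshold are assumptions, and `torch`, `torchvision` and `Pillow` need to be installed.

```python
COCO_PERSON_ID = 1  # "person" in torchvision's COCO category list


def person_scores(labels, scores, threshold=0.3):
    """Keep the confidence scores of detections labeled as a person."""
    return [s for label, s in zip(labels, scores)
            if label == COCO_PERSON_ID and s >= threshold]


def detect_people(image_path, threshold=0.3):
    """Return confidence scores for every person detected in the image."""
    # Heavy imports stay local so person_scores works without them installed.
    import torch
    from PIL import Image
    from torchvision.models.detection import (
        SSDLite320_MobileNet_V3_Large_Weights,
        ssdlite320_mobilenet_v3_large,
    )

    weights = SSDLite320_MobileNet_V3_Large_Weights.DEFAULT
    model = ssdlite320_mobilenet_v3_large(weights=weights).eval()
    preprocess = weights.transforms()

    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        # Detection models take a list of image tensors, one dict per image out.
        output = model([preprocess(image)])[0]
    return person_scores(output["labels"].tolist(),
                         output["scores"].tolist(), threshold)
```

An empty list from `detect_people` means no person was found above the threshold. Unlike a classifier, a detector also returns bounding boxes, which this sketch ignores.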
 