Guide: Getting started with choosing a Machine Learning CLIP Model for Smart Search #11862

radh21301 · 2024-08-17T04:27:48Z

radh21301
Aug 17, 2024

I am an absolute newbie when it comes to ML, so like many others, I was lost on how to choose a CLIP model. For over a year, I just stuck with the default, mostly because I didn't know what else to choose and because it worked pretty well. But after the recent release of some new models and the post about those being better, I was curious, like many others, about how to proceed. I realized a lot of people were asking questions on the discord channel, and the dev (mertalev) gave some really helpful advice and information there. Since we don't have an official guide yet, I thought I will curate some of their responses here.

Note: All this information is mostly just a copy/paste or a rephrased version of what I read on Discord. I could be wrong. But hopefully, it helps someone.

Performance Metrics of Different Models 📈
I got these links from a PR (#11468). It has the performance metrics of many models. They should have most of the models supported by immich (https://huggingface.co/immich-app).
Monolingual models metrics: https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_retrieval_results.csv
Multilingual models metrics: https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_multilingual_retrieval_results.csv

Easiest way to choose a model
The easiest way to determine what model to choose is to look at the attached interactive plots. I believe these plots were generated using information from the links listed above. These plots are in the .zip that is attached. There are three .html files that Mertalev shared (GitHub would not let me attach a .html file. So I had to .zip). Download them and open them on any browser. You might need a computer to open it.
clip_models_efficiency_plots.zip

The following are the specs of some of the popular models in tabular form:

Bubble Size -> RAM 💾
The bigger the bubble, the more RAM it's going to consume. Note: The charts don't factor in concurrency (running multiple things at once). At the default setting of 2, you might see a tiny bump in RAM use (like 10-20 MB). Crank it up higher, and you'll notice more of a difference.
Pro tip: Hover over a bubble to get more information about that model, including the RAM.

MACs (x-axis) -> Model Speed 🛩️
If you know FLOPs, MACs is a similar metric. It’s basically how much computation the model needs to do, which relates to how much time the device needs to spend to do them. More MACs = more time your device spends thinking. On a powerful GPU, speed might not be a big deal.

Smaller MACs = Faster model
Larger MACs = Slower model

Quality of the search (y - axis)
Higher quality = Better search results. As simple as that. I don't know how quality is compared when the different models use different datasets. But it does give a very close idea of how the model performs.

Efficiency
"Efficiency" is related to both the x-axis (MACs) and y-axis (Quality). Models that require less number-crunching to get the same quality are more efficient.

For example: The model 'ViT-B-16-SigLIP-256__webli' has MACs = 29.45 Billion, and quality = 0.767. And the model 'ViT-H-14-378-quickgelu__dfn5b' has a staggering MACs = 542.15 Billion, and quality = 0.828. Now, certainly the bigger model will give better results. But is that increase of (~ 0.06 quality (7%)) worth the extra time to process things, you need to decide.

Remember, these charts just give you a general vibe. Don't stress too much about picking the "perfect" model based on them.

Some Questions You Might Have:
What Does "Slow" Mean? 🐌
When we're talking "slow," we're looking at two major things:

Initial processing time (running the smart search job)
Response time when you're actually searching for stuff
- Time to load model into memory:
  - By default, Immich loads your model into memory when you make your first search after starting up Immich. And this might take a few seconds if the model is large. And after a default duration of 300s, the model is unloaded to reduce memory usage. TIP: If you want to keep the model constantly in memory, set the environment variable MACHINE_LEARNING_MODEL_TTL=0 for the machine learning container (in your docker compose).
  - TIP: To mitigate this initial loading time, set the environment variable MACHINE_LEARNING_PRELOAD__CLIP=<model-name> for the machine learning container. This will make the service load the specified model at startup instead of in the first request. For preloaded models, once it is loaded in memory, it will stay in memory until Immich shuts down. This means it will constantly use that RAM. Preloading the model also ensures that Immich does not unload the model after a certain duration of inactivity. If you have enough RAM, I recommend you preload your models. It has a noticeable effect on the search experience.
- Time to search for things: This depends on the model you choose. Lower MACs = Faster.

Both 'Initial processing time' and 'Response time when a search is done' can be affected by the model choice. If you are like me, you probably hoped it would only affect the initial processing time. I don't care how long the initial time is, but it does become a bother when every search takes longer to show results. So, pick your poison. But hey, "slow" is relative, right? What is slow to you, might be fast for me!

Time Estimates? 🕰️
Here's the deal: Giving you exact times is impossible. It all depends on your hardware.

What is this 'webli', 'DFN-5b' etc?
Not all models are trained on the same data. It's like they've all read different books:

Models ending with "webli" = WebLI dataset
DFN5b models = DFN-5b dataset
And so on...

How to set the model in Immich
Say you looked at the plots and have a model you want to try. Go to the huggingface page for Immich. Ctrl+F and find that model. Click on it and copy the exact name (as in the figure). Then paste that into the settings page under CLIP models (https://immich.app/docs/features/smart-search)

TIP - Both of these conventions are acceptable:

immich-app/ViT-H-14-378-quickgelu__dfn5b
ViT-H-14-378-quickgelu__dfn5b

Some general tips:

Check your setup: Got enough RAM and a beefy machine? You're golden to try anything!
Start big: Go for the model with the best efficiency/average score -> Keeping in mind the capability of your hardware.
Test drive: Run the smart search job and then run some searches to see how it feels.
Adjust as needed: If it's too slow, step down to a smaller model.
FYI: This does mean, rerunning the smart search jobs, waiting for a while for all that is done. I suggest you take backups of your postgres database after every trial. So, once you are done with your trials, and know what you want, you can just switch to the database backup.

So, there you have it! Happy model hunting! 🎉

Some stats from my setup (Do not use this as a reference). Just to give you an idea:

You can do something similar with your trials to help you decide on one.

Hardware	Model	Macs	Quality	Assets	Duration (Initial Processing)	Duration Feel (Searching). Cant really measure this :)
Ryzen 2600, Nvidia P600, Free RAM=32 GB	ViT-B-32__laion2b_e16	7.47 B	0.71	80,000	80 mins	less than a second
	ViT-B-16-SigLIP-384__webli	63.64 B	0.79	80,000	270 mins	slightly close to a second. Little more than `ViT-B-32__laion2b_e16`. But the difference was in milli seconds. Not much of a noticeable one. So, I stuck with this

mertalev · 2024-08-17T06:14:52Z

mertalev
Aug 17, 2024
Maintainer

Thanks for the write-up! Most of this could be added straight to the docs.

Small correction: "efficiency" is related to both the x-axis (MACs) and y-axis (quality). Models that require less number-crunching to get the same quality are more efficient.

A general tip to mitigate the search time for larger models is to set the env MACHINE_LEARNING_PRELOAD__CLIP=<model-name> for the machine learning service. This will make the service load the specified model at startup instead of in the first request. But of course, this means it will be constantly using that RAM.

8 replies

mertalev Aug 17, 2024
Maintainer

Yup, it's unloaded after 300s by default to minimize RAM usage. You can disable the unloading mechanism with the env MACHINE_LEARNING_MODEL_TTL=0.

radh21301 Aug 17, 2024
Author

Awesome. Updated the post.

radh21301 Sep 13, 2024
Author

Hey. I tried running the biggest model (4 gig) on a gpu with 2 gig VRAM :), and it failed - and I saw a github discussion that explained the cause. For some reason I kept thinking it was getting loaded onto the regular RAM and not the VRAM. Is the following statement correct.

Bubble Size -> Memory

When running machine learning only on the CPU -> the 'Memory' is basically RAM
When using hardware accelerated machine learning -> the 'Memory' is the VRAM. So limit yourself to what your GPU is capable of.

mertalev Sep 13, 2024
Maintainer

Yes, if it's a discrete GPU. Integrated GPUs do use normal RAM (mainly applicable to OpenVINO).

MiguelNdeCarvalho Dec 13, 2024

I have an Intel ARC A380 that have 6GB of VRAM, but when I try to use the ViT-H-14-378-quickgelu__dfn5b model it appears to not fit my GPU and defaults to CPU. I took a look the model should consume ~4.8GB and I'm running 2 cuncurrent jobs. Do you guys know anything about this? Thanks!

raisinbear · 2024-08-18T10:35:56Z

raisinbear
Aug 18, 2024

Thanks for the writeup! Exactly what I was looking for. FWIW, I have Immich running on a low power RK3399 machine. So I didn't want to slow things down any furhter. I know, I could potentially use rockchip hardware acceleration, but it's by far not as straightforward as I'd like and I'm not at all in the mood for experimenting on this system right now, as it's running a bunch of other services, too. (Also, would HWA affect search speed or just inital processing?)

So I started the test drive, coming from ViT-B-32__openai and going with the obviously next better ViT-B-16-SigLIP__webli. This post includes a table showing the new model to increase MAC by over a factor of 3 vs. the old one. Indeed, initial processing took much longer. By how much, I cannot say, I'd estimate more than twice. Search, however, is not noticeably slower. It was slightly over a second with ViT-B-32__openai and if at all increased in the ms range with ViT-B-16-SigLIP__webli.
Just wanted to leave that here, in case that helps anyone.

1 reply

radh21301 Aug 18, 2024
Author

That is actually not too bad. 1 second is very reasonable. I also added some stats from my trials to the original post.
I went from 7.47 B MACs to 63.64 B. And my search time did not increase too much. Maybe a few milliseconds, which is totally fine.

sveken · 2024-08-18T23:45:36Z

sveken
Aug 18, 2024

Thanks for this.
I picked ViT-SO400M-14-SigLIP-384__webli and wondered if i was reading the graph right that this was one of the better ones (on pure quality). Though the results where mixed depending on the subject. Seems to be very good on objects but not very good at animals.

Also i did not realize the model was required for individual searches, Good to know.

Also does anyone have any insights in how much better ViT-H-14-378-quickgelu__dfn5b responds over the default in the real world. I know numbers wise its a small jump but do other people find it noticable?

2 replies

mertalev Aug 19, 2024
Maintainer

Just to check, but did you use any uppercase letters for the animal searches? If so, it's related to #11865.

The difference between ViT-B-32__openai and ViT-H-14-378-quickgelu__dfn5b will depend on the query. They're both great at common things like "cat". But for queries involving a less common subject, describing a scene, or searching for text in an image, the latter is much more likely to find exactly what you want. The ranking is also better in general, so the most relevant results are usually at the very top and less scattered.

I don't have experience with ViT-SO400M-14-SigLIP-384__webli, but it should be similar to ViT-H-14-378-quickgelu__dfn5b on paper, maybe with certain domains it's better or worse at.

sveken Aug 19, 2024

Thanks, that linked issue is what i was experiencing.

radh21301 · 2024-09-05T13:11:44Z

radh21301
Sep 5, 2024
Author

Hey Mertalev/internet stranger, if the content in this post looks good to you, would it be worthwhile to add it to the community guides?

https://immich.app/docs/community-guides

I would appreciate if someone would be kind enough to add it on my behalf. I know nothing of software and never used git..ever - so it is a bit daunting to try myself.

2 replies

mertalev Sep 5, 2024
Maintainer

I think it'd be better to add it to the Smart Search documentation. I can make a PR for that.

radh21301 Sep 5, 2024
Author

That sounds great. Thank you!

andyzukunft · 2024-09-24T17:24:05Z

andyzukunft
Sep 24, 2024

Hey all,

thanks for the data provided. I was looking for a good model for German and English usage with "medium" or less hardware requirements.

Based on the benchmarks my current choice is ViT-L-14-quickgelu which scores around 0.80 for both languages.

ViT-L-14-quickgelu stats
MAC: ~90
RAM: ~1900 MB
English score: 0.79
French score: 0.82
German score: 0.79

0 replies

Qhilm · 2024-10-03T20:12:01Z

Qhilm
Oct 3, 2024

I didn't see this mentioned here, hence adding the comment: use remote machine learning if you have a laptop which is more powerful than the machine running immich.

My MacBook Pro M1 is 35 times faster than my Synology DS918+ for example. I can run a new large model on my entire collection of 50k+ pictures in two days vs. more than two months for the Synology NAS. And that's without hardware acceleration / GPU.

I still wish we are a proper OCR job in Immich. CLIP models are just not conceived for recognising text and hence are really bad at it. Ideally, additionally to the OCR job, a way to exclude all text from CLIP, because it just confuses the model a lot I noticed. Pictures with text are often returned for unrelated searches.

8 replies

Qhilm Oct 21, 2024

I think it's not niche at all (lots of people run immich on rasberry pis and similar, this is why the remote ML feature was added) and I think it would be awesome if the remote ML feature could be extended to automatically switch between two destination URLs for ML, each with different job concurrency settings.

slamp Nov 5, 2024

And it's the reason I choose Immich: use a remote gaming PC for ML, I discovered this feature in this video: https://www.youtube.com/watch?v=QHWNu_in0Zc

mertalev Nov 5, 2024
Maintainer

I think it'd be easy to change the URL setting to accept a list and try the URLs in order. Having separate job concurrency would be a lot harder through.

tbelway Nov 6, 2024

I think it'd be easy to change the URL setting to accept a list and try the URLs in order. Having separate job concurrency would be a lot harder through.

Separate job concurrency could be managed through queuing. If redis/valkey is retained as the queuing management, or if it's switched to postgres, both support different mechanics to do so.

Untimed4598 Dec 6, 2024

Thank you for the procedure, it would be super great (if a bit niche) that immich would be able to do this automatically (if powerfull server is up > it uses it and it's fast, if it isn't > it does it locally).

Heads up, v1.122.0 has answered your prayers!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Guide: Getting started with choosing a Machine Learning CLIP Model for Smart Search #11862

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 6 comments 21 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Guide: Getting started with choosing a Machine Learning CLIP Model for Smart Search #11862

Replies: 6 comments · 21 replies

mertalev Aug 17, 2024 Maintainer

mertalev Aug 17, 2024 Maintainer

radh21301 Aug 17, 2024 Author

radh21301 Sep 13, 2024 Author

mertalev Sep 13, 2024 Maintainer

radh21301 Aug 18, 2024 Author

mertalev Aug 19, 2024 Maintainer

radh21301 Sep 5, 2024 Author

mertalev Sep 5, 2024 Maintainer

radh21301 Sep 5, 2024 Author

mertalev Nov 5, 2024 Maintainer

Replies: 6 comments 21 replies

mertalev
Aug 17, 2024
Maintainer

mertalev Aug 17, 2024
Maintainer

radh21301 Aug 17, 2024
Author

radh21301 Sep 13, 2024
Author

mertalev Sep 13, 2024
Maintainer

radh21301 Aug 18, 2024
Author

mertalev Aug 19, 2024
Maintainer

radh21301
Sep 5, 2024
Author

mertalev Sep 5, 2024
Maintainer

radh21301 Sep 5, 2024
Author

mertalev Nov 5, 2024
Maintainer