
Warm up runpod workers #90

Open
triplecookedchips opened this issue Jan 1, 2025 · 3 comments

Comments

@triplecookedchips

Problem: When a new worker is assigned, the first run always takes significantly longer because the models have to be loaded into GPU memory (around 45 seconds in my case). Subsequent runs are much faster because the models are already cached (around 15 seconds in my case). RunPod is pretty good at caching, but workers do come and go, so occasionally I have to wait a long time for my image when a new worker picks up the job.

It would be great if, every time a worker is assigned, it automatically ran a warm-up workflow that caches the models. That way, all API calls would be fast.

I just can't figure out how to trigger the workflow once the Docker image has been pulled.
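A minimal sketch of such a warm-up trigger, assuming the worker's start script launches ComfyUI and can run one more Python step before the handler starts taking jobs (a later comment in this thread questions whether anything runs that early on RunPod serverless). It waits for ComfyUI's HTTP server, then queues a small workflow through ComfyUI's `/prompt` endpoint. The port (ComfyUI's default 8188), the file name `warmup_workflow.json`, and the helper names `wait_for_server`, `build_prompt_request` and `run_warmup` are hypothetical. `/prompt` only queues the job; the models are loaded while ComfyUI executes it.

```python
import json
import time
import urllib.error
import urllib.request
import uuid

COMFY_URL = "http://127.0.0.1:8188"  # assumption: ComfyUI's default host/port


def wait_for_server(base_url: str, timeout_s: float = 120.0, poll_s: float = 1.0) -> bool:
    """Poll the ComfyUI HTTP server until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/", timeout=poll_s):
                return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout or HTTP error: not ready yet.
            time.sleep(poll_s)
    return False


def build_prompt_request(base_url: str, workflow: dict) -> urllib.request.Request:
    """Wrap an API-format workflow in the JSON body that /prompt expects."""
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_warmup(workflow_path: str, base_url: str = COMFY_URL) -> None:
    # Hypothetical warm-up file: a tiny API-format workflow that references the
    # same checkpoints as your real jobs (e.g. one step at a low resolution).
    with open(workflow_path, encoding="utf-8") as f:
        workflow = json.load(f)
    if not wait_for_server(base_url):
        raise RuntimeError("ComfyUI did not come up in time")
    with urllib.request.urlopen(build_prompt_request(base_url, workflow)) as resp:
        print("warm-up queued, prompt_id:", json.load(resp).get("prompt_id"))


# Usage, e.g. from the worker's start script after launching ComfyUI:
# run_warmup("warmup_workflow.json")
```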

@franckdsf

I'm not an expert, but one potential workaround could be to run a Python script that preloads the model into RAM before starting ComfyUI.
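A rough sketch of that workaround, assuming the models live under a directory such as `/comfyui/models` (a hypothetical path) and the machine has enough free RAM to hold them. Reading each file once pulls it into the operating system's page cache, similar to `vmtouch -t`, so ComfyUI's later load is served from RAM instead of disk. It does not put anything into GPU memory. The function name `preload_into_page_cache` and the list of extensions are illustrative assumptions.

```python
import os


def preload_into_page_cache(model_dir: str,
                            exts: tuple = (".safetensors", ".ckpt", ".pt", ".pth", ".bin"),
                            chunk_size: int = 64 * 1024 * 1024) -> int:
    """Read every model file under model_dir once and discard the data.

    The only side effect is that the OS keeps the file pages cached, so a later
    open() by ComfyUI avoids cold disk reads. Returns the number of bytes read.
    """
    total = 0
    for root, _dirs, files in os.walk(model_dir):
        for name in sorted(files):
            if not name.endswith(exts):
                continue  # skip configs, previews, etc.
            with open(os.path.join(root, name), "rb") as f:
                # Read in large chunks so huge checkpoints never sit in Python memory.
                while chunk := f.read(chunk_size):
                    total += len(chunk)
    return total
```

It would be called from the start script just before launching ComfyUI, e.g. `preload_into_page_cache("/comfyui/models")`.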

Note that if you're using RunPod serverless, what you call a "new worker" is essentially a machine with your Docker image pre-loaded on it. Nothing else is loaded onto the machine, and the "new worker" isn't online yet; it stays offline until a request arrives. As a result, you can't preload anything beyond the image itself. You might want to set up an "active worker" that is always running and keeps the model loaded.

@alan0xd7

alan0xd7 commented Jan 6, 2025

Hey, I'm interested in doing some warmup exercise as well!

I tried adding vmtouch to the startup script, but as franckdsf said above, nothing actually runs before a worker receives a request, so it doesn't help much.

Maybe some kind of external ping to the endpoint? But there's no guarantee it'll hit a "cold" worker 🤔
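A sketch of such a keep-warm pinger, using RunPod's asynchronous `/run` route under `https://api.runpod.ai/v2/<endpoint_id>`. The `Bearer` authorization header is an assumption to check against RunPod's current API docs, and the `{"warmup": True}` input is hypothetical: your own handler would have to recognize it and answer with a tiny job. As noted above, the scheduler decides which worker takes each ping, so this most likely keeps an already-warm worker busy rather than warming a cold one, and every ping is billed. `build_ping` and `keep_warm` are illustrative names.

```python
import json
import time
import urllib.request

RUNPOD_API = "https://api.runpod.ai/v2"


def build_ping(endpoint_id: str, api_key: str) -> urllib.request.Request:
    # Assumption: "warmup" is a flag your handler understands; RunPod itself
    # attaches no meaning to it.
    body = json.dumps({"input": {"warmup": True}}).encode("utf-8")
    return urllib.request.Request(
        f"{RUNPOD_API}/{endpoint_id}/run",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def keep_warm(endpoint_id: str, api_key: str, interval_s: float, rounds: int) -> None:
    """Queue one ping every interval_s seconds, `rounds` times; failures are logged, not fatal."""
    for _ in range(rounds):
        try:
            with urllib.request.urlopen(build_ping(endpoint_id, api_key), timeout=30) as resp:
                print("ping queued, HTTP", resp.status)
        except OSError as exc:  # URLError and HTTPError are OSError subclasses
            print("ping failed:", exc)
        time.sleep(interval_s)
```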

Also, I just wanted to ask: how are you guys getting the models into the container? I tried adding them in the Dockerfile, but the build takes extremely long (one to two hours or more!), so I ended up using a network volume, but that restricts my workers to a single data center (DC) and limits availability...

@triplecookedchips (Author)

I've resorted to using active workers, since I couldn't find a solution. Because I'm only using SD 1.5, I can get away with a low-VRAM GPU, and active workers get a 30% discount.
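A back-of-the-envelope sketch of that trade-off, under a deliberately simplified billing model that is an assumption, not RunPod's documented pricing: flex workers are billed per second only while running, each cold start adds the extra model-load time, and idle timeouts are ignored; an active worker bills for the whole period at the discount. The rate is a parameter, not a real price, and the function names are hypothetical.

```python
def flex_cost(rate_per_s: float, busy_s: float, cold_starts: int, cold_start_s: float) -> float:
    """Simplified flex billing: busy time plus the extra time every cold start adds."""
    return rate_per_s * (busy_s + cold_starts * cold_start_s)


def active_cost(rate_per_s: float, period_s: float, discount: float) -> float:
    """An always-on active worker bills for the whole period at a discounted rate."""
    return rate_per_s * (1.0 - discount) * period_s


def break_even_utilization(discount: float) -> float:
    """Busy fraction above which the active worker is cheaper, ignoring cold starts
    (counting them only tilts the comparison further towards the active worker)."""
    return 1.0 - discount
```

With a 30% discount the break-even point is 70% utilization. Normalizing the rate to 1 cost unit per GPU-second and using the 45 s vs 15 s timings from the first comment (30 s of cold-start overhead), one hour of an active worker costs 0.7 × 3600 = 2520 units, while a flex worker that is busy for 1800 s and hits ten cold starts costs 1800 + 10 × 30 = 2100 units.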

Yeah, I bake the models into the image via the Dockerfile. It takes a while to build, but I believe it then runs faster than loading the models from a network volume.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants