Parallelism in Python has two semantics: "spawners" that create workers, and "computers" that perform the actual computation.

| Spawner | Computer |
| --- | --- |
| `concurrent.futures.ProcessPoolExecutor(max_workers=4)` | `A @ B` (matrix multiplication with BLAS) |
| `concurrent.futures.ThreadPoolExecutor(max_workers=4)` | `scipy.fft.fft(..., workers=8)` |
| `joblib.Parallel(n_jobs=4)` | `list(range(10))` (pure Python, single core) |

## Spawner Configuring the Computer

Scikit-learn uses both semantics and automatically configures parallelism to prevent over-subscription. With 24 CPU cores, `HalvingRandomSearchCV` spawns 4 workers with multiprocessing and then uses `threadpoolctl` to configure each worker to use 6 CPU cores for OpenMP.
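As a rough sketch of this pattern (not scikit-learn's actual implementation; `run_search_step` and its workload are invented for illustration), a spawner on a 24-core machine can cap each of its 4 workers at 6 BLAS threads:

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from threadpoolctl import threadpool_limits

def run_search_step(seed):
    # The spawner decides how many cores each worker's "computer" may use:
    # 4 process workers x 6 BLAS threads each = 24 cores, no over-subscription.
    with threadpool_limits(limits=6, user_api='blas'):
        rng = np.random.default_rng(seed)
        X = rng.standard_normal((2000, 2000))
        return float((X @ X.T).trace())  # BLAS matmul, capped at 6 threads

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(run_search_step, range(8))))
```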
## Failure mode
With 24 CPU cores, here is an example that over-subscribes and stalls:
```python
from scipy.integrate import quad_vec

def f(x):
    ...
    x @ A  # matmul, which uses all cores by default (24)
    ...

# Spawns 4 multiprocessing workers to run f; each worker's matmul
# tries to use all 24 cores, over-subscribing the machine
quad_vec(f, ..., workers=4)
```
The user is responsible for preventing over-subscription:
```python
from threadpoolctl import threadpool_limits

def f(x):
    # Cap BLAS at 6 threads: 4 workers x 6 threads = 24 cores
    with threadpool_limits(limits=6, user_api='blas'):
        x @ A
```
## Underlying Questions
As free-threaded Python becomes a reality, more users will run library code with multi-threading and ultimately run into this problem. There are two questions:
1. Should libraries with "spawners" be responsible for setting the number of cores for their workers?
2. If so, how should this configuration be communicated between libraries?
Currently, Python libraries generally have four ways to configure parallelism:

- Environment variable: `export OMP_NUM_THREADS=8`
- Global setting: `torch.set_num_threads(8)`
- Context manager: `with threadpool_limits(limits=8)`
- Function signature: `fft(..., workers=8)`
The solution is likely a thread-local configuration using `contextvars` that is (somehow) shared between libraries.
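A hypothetical sketch of what such a shared configuration could look like; none of these names (`cpu_limits`, `effective_num_threads`) exist in any library today, and getting libraries to agree on them is exactly the open question:

```python
import contextvars

# Shared, context-local limit; None means "no limit requested".
_cpu_limit = contextvars.ContextVar("cpu_limit", default=None)

class cpu_limits:
    """Context manager a spawner would use to cap its workers' core usage."""
    def __init__(self, limit):
        self.limit = limit

    def __enter__(self):
        self._token = _cpu_limit.set(self.limit)
        return self

    def __exit__(self, *exc):
        _cpu_limit.reset(self._token)

def effective_num_threads(requested=None):
    """What a computer library would call to decide its thread count."""
    limit = _cpu_limit.get()
    if limit is None:
        return requested
    return limit if requested is None else min(requested, limit)

with cpu_limits(6):
    print(effective_num_threads(requested=8))  # -> 6
print(effective_num_threads(requested=8))      # -> 8
```

One wrinkle: a newly started thread or process begins with a fresh context, so a spawner would still need to propagate the context to its workers explicitly (e.g. by capturing `contextvars.copy_context()` and running the worker function inside it), which is part of why this needs coordination between libraries rather than a fix in any one of them.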
## Session Notes