Severe performance degradation with bundled netcdf library in wheel packages (Linux) #1393
Comments
@ocefpaf any idea what might be going on here?
Hard to say without a deep inspection of how we built the libraries before and after the change. There are so many differences, from the netcdf-c and hdf5 versions and the flags used to build them to the whole build system :-/ @xavierabellan one thing that could help us debug this is to figure out whether the problem is in how we are building netcdf-c. If you are familiar with Docker, can you try our image from https://github.com/ocefpaf/netcdf-manylinux and see if you can find anything? Maybe try to create a test that uses the netcdf-c from it? To compare, we would need the same for the old wheels, but that is not so simple to track down. Another thing we could do is downgrade the libs back to the ones from netcdf4 1.6.5 and see if that changes anything.
OK, I had a bit of a play with your netcdf-manylinux container, and to rule out the Python part I wrote a C program equivalent to the Python test above, so we can assess the netcdf-c library directly. What I found is that the official 4.9.2 release gives the expected performance, but a clone of the main branch, or even the latest pre-release 4.9.3-rc2 tag, shows the degradation. So I believe it is not an issue with the build itself in the container, but with the latest, as yet unreleased netcdf-c library. Since the netcdf-manylinux container uses the main branch of netcdf-c, we could see the issue here in the Python wheels before experiencing it in other applications.
@xavierabellan thanks for narrowing it down. Would you mind reporting your results here so the C library developers are aware of it? As soon as it's fixed, we can issue a post release with new wheels.
@xavierabellan I'm a bit lazy here but can you share your tests in case we need to reproduce them in the next netcdf-c release/wheel release?
I created Unidata/netcdf-c#3067 with the hope the issue can be identified there
We have noticed a very significant degradation in read performance since version 1.7 of netcdf4-python, at least when reading certain files. The culprit seems to be the netcdf library bundled in the Python wheel.
As a reproducer, the International Best Track Archive for Climate Stewardship (IBTrACS) data can be used:
wget https://www.ncei.noaa.gov/data/international-best-track-archive-for-climate-stewardship-ibtracs/v04r01/access/netcdf/IBTrACS.ALL.v04r01.nc
Here are two minimal examples:
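A minimal sketch of this kind of read-timing test (not the reporter's original script) might look like the following. The helper names `time_read` and `read_variable` are hypothetical, and the variable to read is left as a parameter because the exact variables used in the original test are not stated here. The `netCDF4` import is done inside the reader so the timing helper can be used on its own.

```python
import time
from statistics import median


def time_read(read_fn, repeats=3):
    """Call read_fn() `repeats` times and return the median wall time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


def read_variable(path, var_name):
    """Open a netCDF file and read one whole variable into memory."""
    import netCDF4  # imported lazily; requires the netCDF4 package

    with netCDF4.Dataset(path) as ds:
        # Slicing with [:] forces the data to be read from disk.
        return ds.variables[var_name][:]
```

Running something like `time_read(lambda: read_variable("IBTrACS.ALL.v04r01.nc", "lat"))` in two environments (for example a 1.6.5 wheel and a 1.7.2 wheel) would give directly comparable numbers; `"lat"` is only an assumed variable name and should be replaced with one that exists in the file.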
I could reproduce the above with versions 1.7.1 and 1.7.2 (latest as of today) of the netcdf4 binary wheels on Linux for Python 3.11, 3.12 and 3.13. Linux wheels for version 1.6.5 or below seem to be unaffected, as do the wheels for Mac arm64. I did not test on additional platforms.
When installing from source against an existing netcdf library or using the conda packages, performance is also at the expected level with no degradation.
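When comparing a wheel install against a source or conda install, it can help to confirm which C libraries the Python module is actually linked to. A small sketch, assuming the `netCDF4` package is importable; the function name `library_versions` is hypothetical, while the version attributes are the ones the module exposes:

```python
def library_versions():
    """Return the netCDF4 module version and the linked C library versions.

    Returns None if the netCDF4 package is not installed.
    """
    try:
        import netCDF4
    except ImportError:
        return None
    return {
        "netCDF4": netCDF4.__version__,
        "netcdf-c": netCDF4.__netcdf4libversion__,
        "hdf5": netCDF4.__hdf5libversion__,
        # The install location hints at whether this is a wheel, conda or source build.
        "module_path": netCDF4.__file__,
    }


if __name__ == "__main__":
    print(library_versions())
```

Printing this in each environment alongside the timing results makes it clear which netcdf-c and hdf5 versions each measurement refers to.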