
Add tests for grib idx & reinflate #528

Draft: emfdavid wants to merge 7 commits into main from grib_idx_tests
Conversation

@emfdavid (Contributor) commented Nov 19, 2024

Add chunky test files and copy over unit test approach.

Builds on #523

TODO:

  • Get tests working by editing fixtures/code
  • Modify to use pytest instead of unittest
  • Squash commits to avoid extra git blobs

@martindurant (Member) commented

Well that's a lot .... :)

@emfdavid (Contributor, Author) commented

> Well that's a lot .... :)

All part of my plan to become top contributor...

@emfdavid changed the title from "Add tests for discussion" to "Add tests for grib idx & reinflate" on Nov 27, 2024
@emfdavid force-pushed the grib_idx_tests branch 2 times, most recently from 65ff274 to 261ee2f on Dec 16, 2024
@@ -284,6 +291,10 @@ def test_build_idx_grib_mapping(self):
         )
         expected = pd.read_parquet(kindex_test_path)

+        expected = expected.assign(
+            step=lambda x: x.step.astype("timedelta64[ns]")
@emfdavid (Contributor, Author) commented on this diff:

@martindurant These errors in the CI system are really strange... they typically only happen when using an old version of numpy/pandas that has a different default time type.
The last one, the sync error, seems like maybe I need to convert from unittest subTest to pytest?

@martindurant (Member) commented

It claims to have numpy 2.2.0, but there is this:

RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

which suggests maybe there is a conda-forge/defaults/pip crossover? We do have nodefaults in the environment spec.

@emfdavid (Contributor, Author) commented

> It claims to have numpy 2.2.0, but there is this:
>
> RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject
>
> which suggests maybe there is a conda-forge/defaults/pip crossover? We do have nodefaults in the environment spec.

Welp - I dumped as much version info as I could think of and I don't see any smoking gun:
https://github.com/fsspec/kerchunk/actions/runs/12385105383/job/34570789000?pr=528#step:5:532
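For reference, a minimal sketch of the kind of environment dump that can expose a conda/pip crossover (the module list here is just the obvious suspects, not the exact set dumped in CI):

```python
# Print version and install location for the likely suspects; a mix of
# conda-managed and pip-managed paths would point at the crossover.
import importlib

for name in ("numpy", "pandas", "fastparquet", "fsspec"):
    mod = importlib.import_module(name)
    print(name, getattr(mod, "__version__", "?"), mod.__file__)
```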

@emfdavid (Contributor, Author) commented

Okay - I can repro locally now after installing anaconda python.
I will debug tomorrow.

@emfdavid (Contributor, Author) commented

Okay - fixed the dtype on the step column.
It is an issue with reading the parquet files - I had to set engine='fastparquet' on the pd.read_parquet calls in my test.
Otherwise the timedelta64 step column does not decode properly.
This only happens in the Anaconda environment; I can't repro the issue when I use a Python 3.12 virtual env.
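A minimal sketch of the workaround (the fixture path below is a hypothetical stand-in for the real one in tests/test__grib_idx.py):

```python
import pandas as pd

# Hypothetical fixture path, for illustration only.
kindex_test_path = "tests/fixtures/kindex.parquet"

# Forcing the fastparquet engine avoids the mis-decoded timedelta64
# step column seen under the Anaconda environment.
expected = pd.read_parquet(kindex_test_path, engine="fastparquet")

# Normalize the dtype explicitly, as in the diff above.
expected = expected.assign(step=lambda x: x.step.astype("timedelta64[ns]"))
```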

I am not sure what the remaining sync error is. I can't reproduce that one locally yet. Could it be a version skew problem?

@emfdavid (Contributor, Author) commented

Looks like installing the head of fsspec was breaking this test.
Thoughts @martindurant ?

The pd.read_parquet requiring engine='fastparquet' when using Anaconda Python is also quite strange. Is there some global state being set that would break timedelta64 types?
I will try to make a repro test for the latter on the main branch.

@emfdavid mentioned this pull request on Dec 19, 2024
@emfdavid (Contributor, Author) commented

Okay - tests are green, but I don't see an easy way to convert the heavy use of unittest subTest to the pytest parametrize mark.

The current behavior, when run with subTest, gives really nice error messages when things go wrong for a particular set of subtests. I forced a failure for some subtests by adding:

diff --git a/tests/test__grib_idx.py b/tests/test__grib_idx.py
index 1e83d2f..4651710 100644
--- a/tests/test__grib_idx.py
+++ b/tests/test__grib_idx.py
@@ -630,6 +630,9 @@ class DataExtractorTests(unittest.TestCase):
                             ]
                         )

+                        if var.name == "dswrf":
+                            self.fail("oh no!")
+
                         # # To update test grib_idx_fixtures

Now, when I run python -m unittest tests/test__grib_idx.py, I get:

======================================================================
FAIL: test_reinflate_grib_store (tests.test__grib_idx.DataExtractorTests.test_reinflate_grib_store) (var_name='dswrf', node_path='/dswrf/avg/surface', dataset='hrrr.wrfsubhf', aggregation=<AggregationType.BEST_AVAILABLE: 'best_available'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/davidstuebe/projects/kerchunk/tests/test__grib_idx.py", line 634, in _reinflate_grib_store
    self.fail("oh no!")
AssertionError: oh no!

======================================================================
FAIL: test_reinflate_grib_store (tests.test__grib_idx.DataExtractorTests.test_reinflate_grib_store) (var_name='dswrf', node_path='/dswrf/instant/surface', dataset='hrrr.wrfsubhf', aggregation=<AggregationType.BEST_AVAILABLE: 'best_available'>)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/davidstuebe/projects/kerchunk/tests/test__grib_idx.py", line 634, in _reinflate_grib_store
    self.fail("oh no!")
AssertionError: oh no!

Pytest will run all these cases... but it doesn't give any of the subtest context on which part failed:

______________________________________________________________________________ DataExtractorTests.test_reinflate_grib_store _______________________________________________________________________________

self = <test__grib_idx.DataExtractorTests testMethod=test_reinflate_grib_store>

    def test_reinflate_grib_store(self):
        for dataset in self._reinflate_grib_store_dataset():
            for aggregation, axes in self._reinflate_grib_store_aggregation():
                with self.subTest(dataset=dataset, aggregation=aggregation):
>                   self._reinflate_grib_store(dataset, aggregation, axes)

tests/test__grib_idx.py:658:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/test__grib_idx.py:634: in _reinflate_grib_store
    self.fail("oh no!")
E   AssertionError: oh no!

I found a subtest package on PyPI, but I am not sure you want the extra dependency. Any clever ideas on how to restructure the tests without a total rewrite?
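For what it's worth, the pytest-subtests plugin would keep the per-case context with minimal churn. A sketch, with hypothetical helper names standing in for the real internals of tests/test__grib_idx.py:

```python
# Requires the pytest-subtests plugin (pip install pytest-subtests),
# which provides the `subtests` fixture. datasets(), aggregations()
# and check_reinflate() are hypothetical stand-ins.
def test_reinflate_grib_store(subtests):
    for dataset in datasets():
        for aggregation, axes in aggregations():
            # Each failing combination is reported separately with its
            # parameters, much like unittest's subTest output above.
            with subtests.test(dataset=dataset, aggregation=aggregation):
                check_reinflate(dataset, aggregation, axes)
```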

@martindurant (Member) commented

> Any clever ideas on how to restructure the tests without a total rewrite?

Sorry, I have never had to do any complex unittest->pytest refactoring.

When you use parametrize and -v for the command, you do see parameter names (often just numbered, depending on input type) in the PASS/FAIL list.

I think adding helper packages for the sake of saner test run output is totally fine. Test-time dependencies are easier to justify than runtime.
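A sketch of the parametrize route (the parameter values are illustrative; only hrrr.wrfsubhf and best_available appear in the failure output above):

```python
import pytest

# Stacked parametrize marks generate one test per combination, and
# `pytest -v` shows ids like test_reinflate[best_available-hrrr.wrfsubhf].
@pytest.mark.parametrize("dataset", ["hrrr.wrfsubhf"])  # illustrative values
@pytest.mark.parametrize("aggregation", ["best_available"])  # illustrative values
def test_reinflate(dataset, aggregation):
    ...  # body would call the existing _reinflate_grib_store helper
```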
