-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slurp tests & testing setup #319
base: main
Are you sure you want to change the base?
Conversation
73b1d6d
to
2738ac6
Compare
2738ac6
to
c190364
Compare
@@ -506,6 +513,9 @@ help: | |||
@echo "----------------------------------------" | |||
@echo " Command reference: mondo-ingest" | |||
@echo "----------------------------------------" | |||
# Tests | |||
@echo "tests" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests: Command to run Python-based automated tests.
sys.path.insert(0, str(PROJECT_ROOT)) | ||
from src.scripts.migrate import slurp | ||
|
||
TEST_INPUT_DIR = TEST_DIR / 'input' / 'test_migrate' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Static files selected for testing
Test/Slurp: For migrate.py
tests, I decided to make special static copies in test/input/
of the following files:
mondo_mappings_path=str(TEST_INPUT_DIR / 'mondo.sssom.tsv'),
mapping_status_path=str(TEST_INPUT_DIR / 'ordo_mapping_status.tsv'),
And not do so for the following files:
ontology_path=str(ONTOLOGY_DIR / 'components' / 'ordo.owl'),
mondo_terms_path=str(ONTOLOGY_DIR / 'tmp' / 'mirror_signature-mondo.tsv'),
onto_config_path=str(ONTOLOGY_DIR / 'metadata' / 'ordo.yml'),
My rationale is that the first 2 can be customized so that even when our mapping status with ORDO changes, these static files can be set up to help us test basically every possible edge case we can imagine.
The other ones haven't been changed because there's not really a need, and if anything goes wrong with those files either, the test should catch that by erroring.
Possible future changes to static test files
Although, there is some rationale to maybe creating a shortened, static version of ordo.owl
which will allow the script to run faster.
Caveats / more possible future changes
I didn't make any dependencies for the tests
goal, so if some of those files don't exist, it'll error out. Maybe this is a design flaw. I up the tests
goal to conveniently automatically find and run all Python unit tests in the project.
Perhaps a different way would be to split things up, like a test-slurp
that has goal dependencies, and then a tests
goal with test-slurp
as its dependency.
c0ae34b
to
690e4a0
Compare
################# | ||
.PHONY: tests | ||
tests: | ||
cd ../../; python -m unittest discover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than creating many such tests, a better general strategy could be to run the whole odk test pipeline with a subset of the configured ontologies. I have never tried this, maybe something like:
sh run.sh make build-mondo-ingest OTHER_SRC=components/ordo.owl components/doid.owl
No idea if that would work, and it will almost certainly require tweaking, but at least it would run the entire pipeline rather than cherry picking some parts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. I added this to:
I think doing testing on whole build a good idea but not a 100% substitute for what can be addressed by running smaller parts in isolation with some static inputs, which has benefits of:
- Can run quickly whenever needed
- Static inputs can be built specifically to detect bugs and handle edge, which otherwise might not come up when running the whole pipeline with dynamic inputs.
690e4a0
to
1ae4441
Compare
1ae4441
to
9d57219
Compare
233cc1a
to
def3532
Compare
- Add: test/ dir, and initialized with the necessary files to test slurp / migrate - Update: Temp: Updated slurp migrate script with some temporary logic to create a modified mapping status file that will be used as a static test input.
def3532
to
a6b1a34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn Now ready for review.
@hrshdhgd @souzadevinicius Finally added some unit testing to mondo-ingest
, starting with the migration pipeline. The tests are short and have some comments. If you glance through them you can probably grok what we're checking for when it comes to determining whether a term from an ontology is migratable or not into Mondo.
cls.migratable = cls.df['xref'].to_list()[1:] | ||
cls.mapping_status_df = pd.read_csv(mapping_status_path, sep='\t') | ||
|
||
def test_parents_field(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test parents field to make sure these are Mondo IDs and not native ontology IDs.
parents: List[CURIE] = example_parents_str.split('|') | ||
self.assertTrue(all([x.startswith('MONDO') for x in parents])) | ||
|
||
def test_all_expected_in_output(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests against known passing cases in input file. All of these terms meet all conditions for inclusion in the output file, e.g. they're not mapped, excluded, or obsolete, and they either have no parents, or all of their parents are either mapped, excluded, or obsolete.
self.mapping_status_df[self.mapping_status_df['expected_in_output'] == True]['subject_id'].to_list() | ||
self.assertTrue(all([x in self.migratable for x in expected])) | ||
|
||
def test_all_unexpected_not_in_output(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests against known failing cases in input file. None of these terms meet all conditions for inclusion in the output file, e.g. they're either mapped, excluded, or obsolete, or not all of their parents are mapped, excluded, or obsolete.
self.mapping_status_df[self.mapping_status_df['expected_in_output'] == False]['subject_id'].to_list() | ||
self.assertTrue(all([x not in self.migratable for x in unexpected])) | ||
|
||
def test_deprecated_excluded_mapped_not_in_output(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests subset cases of test_all_unexpected_not_in_output(), specifically one case of each where a term is either mapped, excluded, or obsolete/deprecated.
unexpected: CURIE = df['subject_id'].to_list()[0] | ||
self.assertNotIn(unexpected, self.migratable) | ||
|
||
def test_no_parents_in_output(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests subset case of test_all_expected_in_output(), specifically a case where the term is not mapped, excluded, or obsolete, and also that it has no parents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn Now ready for review.
@hrshdhgd @souzadevinicius Finally added some unit testing to mondo-ingest
, starting with the migration pipeline. The tests are short and have some comments. If you glance through them you can probably grok what we're checking for when it comes to determining whether a term from an ontology is migratable or not into Mondo.
ff43641
to
5e9af1f
Compare
@@ -20,8 +22,11 @@ | |||
from oaklib.implementations import ProntoImplementation | |||
from oaklib.types import CURIE, URI | |||
|
|||
from utils import CACHE_DIR, DOCS_DIR, PREFIX, PROJECT_DIR, Term, _get_all_owned_terms, _get_next_available_mondo_id, \ | |||
get_mondo_term_ids, _load_ontology, SLURP_DIR | |||
SCRIPTS_DIR = Path(os.path.abspath(os.path.dirname(__file__))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be rewritten to purely using the Path
lib by
SCRIPTS_DIR = Path(__file__).parent
I have experienced very rare situations where os
creates issues in Windows systems but I could be wrong on this. I have tried to be pure Path
as much as possible since it is os agnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I should have probably switched to Path
long ago. I will likely make this change.
import pandas as pd | ||
from oaklib.types import CURIE | ||
|
||
TEST_DIR = Path(os.path.abspath(os.path.dirname(__file__))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same refactor suggestion as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests in general look good to me. Nico's suggestion about adopting the odd test pipeline would be awesome and a learning experience for me through you @joeflack4 ! lol!
#319 - Delete: Temporary logic for generating static test input file for slurp.
5e9af1f
to
584518f
Compare
@souzadevinicius I don't know if you have any experience or opinions to share regarding automated testing in Python. Feel free to share if you have a review; just wanting to keep you in on the loop here. FWe have a sore lack of testing in Mondo, and some time soon (hopefully) Nico and I, maybe others, will settle on the architecture we want for running tests. IMO I don't think it has to be complicated and I'm not too opinionated about it. |
Nice.I would be happy to contribute 😊 |
@@ -0,0 +1 @@ | |||
"""Unit tests""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incorporate QC checks
The checks added in #385 should be incorporated here as tests as part of this PR ideally.
Updates
Unit testing
Related