Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slurp tests & testing setup #319

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft

Slurp tests & testing setup #319

wants to merge 2 commits into from

Conversation

joeflack4
Copy link
Contributor

@joeflack4 joeflack4 commented May 27, 2023

Updates

Unit testing

  • Add: test/ dir, and initialized with the necessary files to test slurp / migrate

Related

@joeflack4 joeflack4 changed the title Slurp / migrate Additional migrate updates May 27, 2023
@joeflack4 joeflack4 self-assigned this May 27, 2023
@joeflack4 joeflack4 marked this pull request as draft May 27, 2023 20:04
@joeflack4 joeflack4 added bug Something isn't working qc / test labels May 27, 2023
src/scripts/migrate.py Outdated Show resolved Hide resolved
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch 3 times, most recently from 73b1d6d to 2738ac6 Compare May 27, 2023 21:45
@joeflack4 joeflack4 changed the title Additional migrate updates Unit testing & Additional migrate updates May 27, 2023
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch from 2738ac6 to c190364 Compare May 27, 2023 21:50
@@ -506,6 +513,9 @@ help:
@echo "----------------------------------------"
@echo " Command reference: mondo-ingest"
@echo "----------------------------------------"
# Tests
@echo "tests"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests: Command to run Python-based automated tests.

sys.path.insert(0, str(PROJECT_ROOT))
from src.scripts.migrate import slurp

TEST_INPUT_DIR = TEST_DIR / 'input' / 'test_migrate'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static files selected for testing

Test/Slurp: For migrate.py tests, I decided to make special static copies in test/input/ of the following files:

mondo_mappings_path=str(TEST_INPUT_DIR / 'mondo.sssom.tsv'),
mapping_status_path=str(TEST_INPUT_DIR / 'ordo_mapping_status.tsv'),

And not do so for the following files:

ontology_path=str(ONTOLOGY_DIR / 'components' / 'ordo.owl'),
mondo_terms_path=str(ONTOLOGY_DIR / 'tmp' / 'mirror_signature-mondo.tsv'),
onto_config_path=str(ONTOLOGY_DIR / 'metadata' / 'ordo.yml'),

My rationale is that the first 2 can be customized so that even when our mapping status with ORDO changes, these static files can be set up to help us test basically every possible edge case we can imagine.

The other ones haven't been changed because there's not really a need, and if anything goes wrong with those files either, the test should catch that by erroring.

Possible future changes to static test files

Although, there is some rationale to maybe creating a shortened, static version of ordo.owl which will allow the script to run faster.

Caveats / more possible future changes

I didn't make any dependencies for the tests goal, so if some of those files don't exist, it'll error out. Maybe this is a design flaw. I up the tests goal to conveniently automatically find and run all Python unit tests in the project.

Perhaps a different way would be to split things up, like a test-slurp that has goal dependencies, and then a tests goal with test-slurp as its dependency.

@joeflack4 joeflack4 force-pushed the more-slurp-updates branch 4 times, most recently from c0ae34b to 690e4a0 Compare May 28, 2023 00:05
src/scripts/migrate.py Outdated Show resolved Hide resolved
#################
.PHONY: tests
tests:
cd ../../; python -m unittest discover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating many such tests, a better general strategy could be to run the whole odk test pipeline with a subset of the configured ontologies. I have never tried this, maybe something like:

sh run.sh make build-mondo-ingest OTHER_SRC=components/ordo.owl components/doid.owl

No idea if that would work, and it will almost certainly require tweaking, but at least it would run the entire pipeline rather than cherry picking some parts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea. I added this to:

I think doing testing on whole build a good idea but not a 100% substitute for what can be addressed by running smaller parts in isolation with some static inputs, which has benefits of:

  1. Can run quickly whenever needed
  2. Static inputs can be built specifically to detect bugs and handle edge, which otherwise might not come up when running the whole pipeline with dynamic inputs.

@joeflack4 joeflack4 force-pushed the more-slurp-updates branch from 690e4a0 to 1ae4441 Compare May 28, 2023 21:42
@joeflack4 joeflack4 changed the title Unit testing & Additional migrate updates Unit testing May 28, 2023
@joeflack4 joeflack4 mentioned this pull request May 28, 2023
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch from 1ae4441 to 9d57219 Compare May 28, 2023 21:50
@joeflack4 joeflack4 removed the bug Something isn't working label May 28, 2023
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch 4 times, most recently from 233cc1a to def3532 Compare May 29, 2023 01:13
- Add: test/ dir, and initialized with the necessary files to test slurp / migrate
- Update: Temp: Updated slurp migrate script with some temporary logic to create a modified mapping status file that will be used as a static test input.
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch from def3532 to a6b1a34 Compare May 29, 2023 01:15
Copy link
Contributor Author

@joeflack4 joeflack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn Now ready for review.
@hrshdhgd @souzadevinicius Finally added some unit testing to mondo-ingest, starting with the migration pipeline. The tests are short and have some comments. If you glance through them you can probably grok what we're checking for when it comes to determining whether a term from an ontology is migratable or not into Mondo.

cls.migratable = cls.df['xref'].to_list()[1:]
cls.mapping_status_df = pd.read_csv(mapping_status_path, sep='\t')

def test_parents_field(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test parents field to make sure these are Mondo IDs and not native ontology IDs.

parents: List[CURIE] = example_parents_str.split('|')
self.assertTrue(all([x.startswith('MONDO') for x in parents]))

def test_all_expected_in_output(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests against known passing cases in input file. All of these terms meet all conditions for inclusion in the output file, e.g. they're not mapped, excluded, or obsolete, and they either have no parents, or all of their parents are either mapped, excluded, or obsolete.

self.mapping_status_df[self.mapping_status_df['expected_in_output'] == True]['subject_id'].to_list()
self.assertTrue(all([x in self.migratable for x in expected]))

def test_all_unexpected_not_in_output(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests against known failing cases in input file. None of these terms meet all conditions for inclusion in the output file, e.g. they're either mapped, excluded, or obsolete, or not all of their parents are mapped, excluded, or obsolete.

self.mapping_status_df[self.mapping_status_df['expected_in_output'] == False]['subject_id'].to_list()
self.assertTrue(all([x not in self.migratable for x in unexpected]))

def test_deprecated_excluded_mapped_not_in_output(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests subset cases of test_all_unexpected_not_in_output(), specifically one case of each where a term is either mapped, excluded, or obsolete/deprecated.

unexpected: CURIE = df['subject_id'].to_list()[0]
self.assertNotIn(unexpected, self.migratable)

def test_no_parents_in_output(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests subset case of test_all_expected_in_output(), specifically a case where the term is not mapped, excluded, or obsolete, and also that it has no parents.

Copy link
Contributor Author

@joeflack4 joeflack4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matentzn Now ready for review.
@hrshdhgd @souzadevinicius Finally added some unit testing to mondo-ingest, starting with the migration pipeline. The tests are short and have some comments. If you glance through them you can probably grok what we're checking for when it comes to determining whether a term from an ontology is migratable or not into Mondo.

@joeflack4 joeflack4 marked this pull request as ready for review May 29, 2023 01:28
@joeflack4 joeflack4 requested a review from hrshdhgd May 29, 2023 01:28
@joeflack4 joeflack4 force-pushed the more-slurp-updates branch from ff43641 to 5e9af1f Compare May 29, 2023 01:29
@@ -20,8 +22,11 @@
from oaklib.implementations import ProntoImplementation
from oaklib.types import CURIE, URI

from utils import CACHE_DIR, DOCS_DIR, PREFIX, PROJECT_DIR, Term, _get_all_owned_terms, _get_next_available_mondo_id, \
get_mondo_term_ids, _load_ontology, SLURP_DIR
SCRIPTS_DIR = Path(os.path.abspath(os.path.dirname(__file__)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be rewritten to purely using the Path lib by

SCRIPTS_DIR = Path(__file__).parent

I have experienced very rare situations where os creates issues in Windows systems but I could be wrong on this. I have tried to be pure Path as much as possible since it is os agnostic.

More about this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I should have probably switched to Path long ago. I will likely make this change.

import pandas as pd
from oaklib.types import CURIE

TEST_DIR = Path(os.path.abspath(os.path.dirname(__file__)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same refactor suggestion as before.

Copy link
Contributor

@hrshdhgd hrshdhgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests in general look good to me. Nico's suggestion about adopting the odd test pipeline would be awesome and a learning experience for me through you @joeflack4 ! lol!

#319
- Delete: Temporary logic for generating static test input file for slurp.
@joeflack4
Copy link
Contributor Author

joeflack4 commented Dec 8, 2023

@souzadevinicius I don't know if you have any experience or opinions to share regarding automated testing in Python. Feel free to share if you have a review; just wanting to keep you in on the loop here. FWe have a sore lack of testing in Mondo, and some time soon (hopefully) Nico and I, maybe others, will settle on the architecture we want for running tests.

IMO I don't think it has to be complicated and I'm not too opinionated about it.

@souzadevinicius
Copy link
Member

Nice.I would be happy to contribute 😊

@joeflack4 joeflack4 changed the title Unit testing Testing setup Dec 8, 2023
@joeflack4 joeflack4 mentioned this pull request Dec 9, 2023
@@ -0,0 +1 @@
"""Unit tests"""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporate QC checks

The checks added in #385 should be incorporated here as tests as part of this PR ideally.

@joeflack4 joeflack4 marked this pull request as draft February 2, 2024 23:12
@joeflack4 joeflack4 changed the title Testing setup Slurp tests & testing setup Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants