Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grouping issue: Automated testing #318

Open
5 tasks
joeflack4 opened this issue May 24, 2023 · 9 comments
Open
5 tasks

Grouping issue: Automated testing #318

joeflack4 opened this issue May 24, 2023 · 9 comments

Comments

@joeflack4
Copy link
Contributor

joeflack4 commented May 24, 2023

Overview

I felt it would be a good idea to create a general issue around automated testing (unit tests, integration tests, etc). Right now we don't have any tests at all, I believe.

Modules & specific cases to test

Details

These are some of the workflows that I think could most use some tests.

For "whole pipeline itself", we would just download sources and run; dynamic inputs. For each of these other "modules", we probably should create some static input files.

Integration tests (checking that outputs match expectations) are higher priority than unit tests (e.g. at the function level).

  • 1. E2E: Whole pipeline itself
  • 2. Migration pipeline (slurp)
  • 3. "Unmapped" artefact pipelines
  • 4. Exclusions pipeline
  • 5. prefix maps
    E.g. the bimaps, i.e. mondo.sssom.config.yaml's curie_map (moot if fully replaced by the extended_prefix_map in there) and config/prefixes.csv (moot once it is dynamically generated from mondo.sssom.config.yaml). Basically we want to ensure it is bijective; i.e. no duplciate URI prefixes.

1. Whole pipeline

Details

Nico wrote:

Rather than creating many such tests, a better general strategy could be to run the whole odk test pipeline with a subset of the configured ontologies. I have never tried this, maybe something like:

sh run.sh make build-mondo-ingest OTHER_SRC=components/ordo.owl components/doid.owl

2. Migration pipeline (slurp)

Details

Possible tests:
I. Mock terms
Can create some static input files where we can create some mock terms that meet various conditions, and we can assert what we expect after running the workflow (e.g. whether or not the term appears in slurp/%.tsv).

  • excluded / not
  • deprecated /not
  • parents deprecated (0, 1, or all)
  • parents not meet narrow/exact match conditions?
    II. Output file should never have more than 1k rows

3. "Unmapped" artefact pipelines

Details

Possible tests:
I. Every ingest should have no more than 50% unmapped classes
The easiest way to check this is to use scripts/unmapped_docs.py. There is a function that creates the table in unmapped.md in the docs/ which has these %'s calculated.

Creating test input files

GitHub action

Details

Should run:

  • Integration & unit tests
  • ODK Makefile test goal
    Should not run:
  • The whole pipeline: Not enough memory, and we'd want a separate action for that anyway.

When:

  • schedule?: Doesn't make sense unless dynamic inputs
  • push: To main
  • pull_request: To main

Makefile set up

Details

We have an existing Makefile test goal, but this is for common cases for ODK projects. To incorporate integration tests and unit tests, we probably want something like:

test-all: test mondo-ingest-tests

mondo-ingest-tests:
    python -m unittest discover

Nico thought about integration-test but we might have unit tests as well. I don't know if it makes sense to create a 2nd goal for unit-test; I don't see that we'll really want to run them separately.

Discussion

Module tests in isolation with static inputs

Details

Possible future changes to static test files

Adding ontology files: Although, there is some rationale to maybe creating a shortened, static version of ontology files, e.g. ordo.owl which will allow SPARQL queries, etc, to run faster. Will also prevent failures in cases where the ontology changes.

Caveats / more possible future changes

Details

Makefile test setup: I didn't make any dependencies for the tests goal, so if some of those files don't exist, it'll error out. Maybe this is a design flaw. I up the tests goal to conveniently automatically find and run all Python unit tests in the project. Perhaps a different way would be to split things up, like a test-slurp that has goal dependencies, and then a tests goal with test-slurp as its dependency.

@joeflack4
Copy link
Contributor Author

joeflack4 commented Aug 31, 2023

@matentzn We discussed testing today at our meeting. Rather than making a new issue, I updated this existing one with our thoughts.

The reason it's called a "grouping issue" is because I don't expect it to be something where we specifically pick a time to tackle the whole issue at once.

@joeflack4
Copy link
Contributor Author

joeflack4 commented Sep 8, 2023

@matentzn Btw, I ran make test today in mondo-ingest and the issue here is that NCIT fails because of memory.

Log

make test
mondo-ingest.Makefile:39: warning: overriding commands for target `components/omim.owl'
Makefile:447: warning: ignoring old commands for target `components/omim.owl'
mondo-ingest.Makefile:54: warning: overriding commands for target `components/ordo.owl'
Makefile:458: warning: ignoring old commands for target `components/ordo.owl'
mondo-ingest.Makefile:73: warning: overriding commands for target `components/ncit.owl'
Makefile:436: warning: ignoring old commands for target `components/ncit.owl'
mondo-ingest.Makefile:84: warning: overriding commands for target `components/doid.owl'
Makefile:392: warning: ignoring old commands for target `components/doid.owl'
mondo-ingest.Makefile:106: warning: overriding commands for target `component-download-icd10cm.owl'
Makefile:410: warning: ignoring old commands for target `component-download-icd10cm.owl'
mondo-ingest.Makefile:112: warning: overriding commands for target `components/icd10cm.owl'
Makefile:414: warning: ignoring old commands for target `components/icd10cm.owl'
mondo-ingest.Makefile:127: warning: overriding commands for target `components/icd10who.owl'
Makefile:425: warning: ignoring old commands for target `components/icd10who.owl'
mondo-ingest.Makefile:142: warning: overriding commands for target `component-download-gard.owl'
Makefile:399: warning: ignoring old commands for target `component-download-gard.owl'
mondo-ingest.Makefile:146: warning: overriding commands for target `components/gard.owl'
Makefile:403: warning: ignoring old commands for target `components/gard.owl'
mondo-ingest.Makefile:151: warning: overriding commands for target `mondo-ingest-full.owl'
Makefile:588: warning: ignoring old commands for target `mondo-ingest-full.owl'
mondo-ingest.Makefile:545: warning: overriding commands for target `help'
Makefile:659: warning: ignoring old commands for target `help'
echo "ODK Makefile version: v1.5 (this is the version of the ODK with which this Makefile was generated, \
        not the version of the ODK you are running)" &&\
echo "ROBOT version (ODK): " && robot --catalog catalog-v001.xml --version
ODK Makefile version: v1.5 (this is the version of the ODK with which this Makefile was generated,         not the version of the ODK you are running)
ROBOT version (ODK):
ROBOT version 1.9.1
robot --catalog catalog-v001.xml reason --input tmp/mondo-ingest-preprocess.owl --reasoner ELK --equivalent-classes-allowed asserted-only \
--exclude-tautologies structural --output test.owl && rm test.owl
robot --catalog catalog-v001.xml verify -i tmp/mondo-ingest-preprocess.owl --queries ../sparql/properties-as-annotation-and-object-violation.sparql ../sparql/ordo-mapping-annotations-violation.sparql -O reports
bPASS Rule ../sparql/properties-as-annotation-and-object-violation.sparql: 0 violation(s)
PASS Rule ../sparql/ordo-mapping-annotations-violation.sparql: 0 violation(s)
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml merge -I http://purl.obolibrary.org/obo/doid.owl \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-doid.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-08-31/component-download-doid.owl --annotation owl:versionInfo 2023-08-31 -o tmp/component-download-doid.owl.owl; fi
^R
beep
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-doid.owl.owl" -q "../sparql/doid-relevant-signature.sparql" tmp/doid_relevant_signature.txt; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml remove -i tmp/component-download-doid.owl.owl --select imports \
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
remove -T tmp/doid_relevant_signature.txt --select complement --select "classes individuals" --trim false \
query \
--update ../sparql/[fix_omimps.ru](http://fix_omimps.ru/) \
--update ../sparql/[fix_hgnc_mappings.ru](http://fix_hgnc_mappings.ru/) \
--update ../sparql/[fix-labels-with-brackets.ru](http://fix-labels-with-brackets.ru/) \
--update ../sparql/[fix_complex_reification.ru](http://fix_complex_reification.ru/) \
--update ../sparql/[rm_xref_by_prefix.ru](http://rm_xref_by_prefix.ru/) \
--update ../sparql/[fix_make_omim_exact.ru](http://fix_make_omim_exact.ru/) \
remove -T config/properties.txt --select complement --select properties --trim true \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/doid.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-08-31/doid.owl -o components/doid.owl; fi
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml merge -I https://github.com/monarch-initiative/gard/releases/latest/download/gard.owl \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-gard.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-08-31/component-download-gard.owl --annotation owl:versionInfo 2023-08-31 -o tmp/component-download-gard.owl.owl; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-gard.owl.owl" -q "../sparql/gard-relevant-signature.sparql" tmp/gard_relevant_signature.txt; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml remove -i tmp/component-download-gard.owl.owl --select imports \
remove -T tmp/gard_relevant_signature.txt --select complement --select "classes individuals" --trim false \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/gard.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-08-31/gard.owl -o components/gard.owl; fi
if [ true  = true ]; then wget "https://data.bioontology.org/ontologies/ICD10CM/submissions/22/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb" -O tmp/icd10cm.tmp.owl && robot --catalog catalog-v001.xml remove -i tmp/icd10cm.tmp.owl --select imports \
remove -T config/remove_properties.txt \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-icd10cm.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-08-31/component-download-icd10cm.owl --annotation owl:versionInfo 2023-08-31 -o tmp/component-download-icd10cm.owl.owl; fi
--2023-08-31 19:10:09--  https://data.bioontology.org/ontologies/ICD10CM/submissions/22/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb
Resolving [data.bioontology.org](http://data.bioontology.org/) ([data.bioontology.org](http://data.bioontology.org/))... 171.64.13.9
Connecting to [data.bioontology.org](http://data.bioontology.org/) ([data.bioontology.org](http://data.bioontology.org/))|171.64.13.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 50870565 (49M) [application/octet-stream]
Saving to: ‘tmp/icd10cm.tmp.owl’
tmp/icd10cm.tmp 100%[=====>]  48.51M  1.34MB/s    in 37s
2023-08-31 19:10:51 (1.30 MB/s) - ‘tmp/icd10cm.tmp.owl’ saved [50870565/50870565]
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-icd10cm.owl.owl" -q "../sparql/icd10cm-relevant-signature.sparql" tmp/icd10cm_relevant_signature.txt; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml merge -i tmp/component-download-icd10cm.owl.owl \
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
remove -T tmp/icd10cm_relevant_signature.txt --select complement --select "classes individuals" --trim false \
remove -T tmp/icd10cm_relevant_signature.txt --select individuals \
query \
--update ../sparql/[fix_omimps.ru](http://fix_omimps.ru/) \
--update ../sparql/[fix-labels-with-brackets.ru](http://fix-labels-with-brackets.ru/) \
--update ../sparql/[fix_hgnc_mappings.ru](http://fix_hgnc_mappings.ru/) \
--update ../sparql/[fix_complex_reification.ru](http://fix_complex_reification.ru/) \
remove -T config/properties.txt --select complement --select properties --trim true \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/icd10cm.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-08-31/icd10cm.owl -o components/icd10cm.owl; fi
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml merge -I https://github.com/monarch-initiative/icd10who/releases/latest/download/icd10who.ttl \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-icd10who.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-08-31/component-download-icd10who.owl --annotation owl:versionInfo 2023-08-31 -o tmp/component-download-icd10who.owl.owl; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-icd10who.owl.owl" -q "../sparql/icd10who-relevant-signature.sparql" tmp/icd10who_relevant_signature.txt; fi
if [ true  = true ] ; then robot --catalog catalog-v001.xml remove -i tmp/component-download-icd10who.owl.owl --select imports \
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
remove -T tmp/icd10who_relevant_signature.txt --select complement --select "classes individuals" --trim false \
remove -T tmp/icd10who_relevant_signature.txt --select individuals \
query \
--update ../sparql/[fix_omimps.ru](http://fix_omimps.ru/) \
--update ../sparql/[fix-labels-with-brackets.ru](http://fix-labels-with-brackets.ru/) \
--update ../sparql/[fix_hgnc_mappings.ru](http://fix_hgnc_mappings.ru/) \
--update ../sparql/[fix_complex_reification.ru](http://fix_complex_reification.ru/) \
remove -T config/properties.txt --select complement --select properties --trim true \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/icd10who.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-08-31/icd10who.owl -o components/icd10who.owl; fi
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml merge -I http://purl.obolibrary.org/obo/ncit.owl \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-ncit.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-08-31/component-download-ncit.owl --annotation owl:versionInfo 2023-08-31 -o tmp/component-download-ncit.owl.owl; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-ncit.owl.owl" -q "../sparql/ncit-relevant-signature.sparql" tmp/ncit_relevant_signature.txt; fi
if [ false = false ] && [ true  = true ]; then robot --catalog catalog-v001.xml remove -i tmp/component-download-ncit.owl.owl --select imports \
rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
query --update ../sparql/[rm_xref_by_prefix.ru](http://rm_xref_by_prefix.ru/) \
remove -T tmp/ncit_relevant_signature.txt --select complement --select "classes individuals" --trim false \
remove -T config/properties.txt --select complement --select properties --trim true \
remove --term "http://purl.obolibrary.org/obo/NCIT_C179199" --axioms "equivalent" \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/ncit.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-08-31/ncit.owl -o components/ncit.owl; fi
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3537)
at java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:100)
at java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:130)
at java.base/sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:234)
at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:304)
at java.base/sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
at java.base/sun.nio.cs.StreamEncoder.write(StreamEncoder.java:132)
at java.base/java.io.OutputStreamWriter.write(OutputStreamWriter.java:205)
at org.apache.jena.atlas.io.BufferingWriter.flushBuffer(BufferingWriter.java:132)
at org.apache.jena.atlas.io.BufferingWriter.output(BufferingWriter.java:126)
at org.apache.jena.atlas.io.BufferingWriter.write(BufferingWriter.java:175)
at org.apache.jena.atlas.io.IndentedWriter.write$(IndentedWriter.java:187)
at org.apache.jena.atlas.io.IndentedWriter.printOneChar(IndentedWriter.java:182)
at org.apache.jena.atlas.io.IndentedWriter.print(IndentedWriter.java:123)
at org.apache.jena.riot.out.quoted.QuotedURI.writeURI(QuotedURI.java:46)
at org.apache.jena.riot.out.NodeFormatterNT.formatURI(NodeFormatterNT.java:46)
at org.apache.jena.riot.out.NodeFormatterTTL.formatURI(NodeFormatterTTL.java:88)
at org.apache.jena.riot.out.NodeFormatterBase.formatURI(NodeFormatterBase.java:70)
at org.apache.jena.riot.out.NodeFormatterBase.format(NodeFormatterBase.java:43)
at org.apache.jena.riot.writer.TurtleShell.writeNode(TurtleShell.java:982)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writePredicate(TurtleShell.java:750)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writePredicateObjectList(TurtleShell.java:715)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writePredicateObjectList(TurtleShell.java:693)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writeNestedObject(TurtleShell.java:828)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writeNestedObjectTopLevel(TurtleShell.java:796)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writeBySubject(TurtleShell.java:618)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.writeGraph(TurtleShell.java:498)
at org.apache.jena.riot.writer.TurtleShell$ShellGraph.access$100(TurtleShell.java:111)
at org.apache.jena.riot.writer.TurtleShell.writeGraphTTL(TurtleShell.java:97)
at org.apache.jena.riot.writer.TurtleWriter$TurtleWriter$.write(TurtleWriter.java:44)
at org.apache.jena.riot.writer.TurtleWriter$TurtleWriter$.access$000(TurtleWriter.java:34)
at org.apache.jena.riot.writer.TurtleWriter.output(TurtleWriter.java:31)
make: *** [components/ncit.owl] Error 1

I tried this flag even though I didn't expect it to work for make test. It doesn't. Still running out of memory. Maybe we need to implement a similar flag?
IMP_LARGE=true # Global parameter to bypass handling of large imports

@matentzn
Copy link
Member

You need I think ca 18 GB memory in Java and 21 GB memory in docker to run this locally I think.. Do you have that configured?

Else, cant if you don't need to update sources, you could use COMP=false to bypass all the component generations?

@joeflack4
Copy link
Contributor Author

Yeah I have 28GB allocated in Docker but I didn't pass -Xmx 18g or whatever Java flag. If I'm running make test, I'm not sure how to enable that; maybe an edit to run.sh? We can look into that later.

I will try COMP=false and report back, thanks.

@joeflack4
Copy link
Contributor Author

Darn, looks like the COMP=false didn't work. Again, not the biggest priority; we can worry about this when you have time.

@matentzn
Copy link
Member

Why not? If you look at for example:

https://github.com/monarch-initiative/mondo-ingest/blob/main/src/ontology/mondo-ingest.Makefile#L39

COMP=false should prevent components to be rebuild. Or are you saying a non-component goals still needs too much memory?

@joeflack4
Copy link
Contributor Author

Hmm, maybe I was running it the wrong way. I tried:
COMP=false sh run.sh make test

I was thinking that the order of the flag shouldn't matter, but I'll try the below one now.
sh run.sh make COMP=false test

Here's all that remains from the log from yesterday. You can see it's doing something with tmp/component-download-ncit.owl.owl.

Log

		--update ../sparql/fix_complex_reification.ru \
	remove -T config/properties.txt --select complement --select properties --trim true \
	annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/icd10who.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-09-11/icd10who.owl -o components/icd10who.owl; fi
if [ true  = true ] && [ true  = true ]; then robot --catalog catalog-v001.xml merge -I http://purl.obolibrary.org/obo/ncit.owl \
annotate --ontology-iri http://purl.obolibrary.org/obo/mondo-ingest/component-download-ncit.owl annotate -V http://purl.obolibrary.org/obo/mondo-ingest/releases/2023-09-11/component-download-ncit.owl --annotation owl:versionInfo 2023-09-11 -o tmp/component-download-ncit.owl.owl; fi
if [ true  = true ]; then robot --catalog catalog-v001.xml query -i "tmp/component-download-ncit.owl.owl" -q "../sparql/ncit-relevant-signature.sparql" tmp/ncit_relevant_signature.txt; fi
if [ false = false ] && [ true  = true ]; then robot --catalog catalog-v001.xml remove -i tmp/component-download-ncit.owl.owl --select imports \
	rename --mappings config/property-map-1.sssom.tsv --allow-missing-entities true \
	rename --mappings config/property-map-2.sssom.tsv --allow-missing-entities true \
	query --update ../sparql/rm_xref_by_prefix.ru \
	remove -T tmp/ncit_relevant_signature.txt --select complement --select "classes individuals" --trim false \
	remove -T config/properties.txt --select complement --select properties --trim true \
	remove --term "http://purl.obolibrary.org/obo/NCIT_C179199" --axioms "equivalent" \
	annotate --ontology-iri http://purl.obolibrary.org/obo/mondo/sources/ncit.owl --version-iri http://purl.obolibrary.org/obo/mondo/sources/2023-09-11/ncit.owl -o components/ncit.owl; fi
Killed
make: *** [mondo-ingest.Makefile:73: components/ncit.owl] Error 137
Command exited with non-zero status 2
### DEBUG STATS ###
Elapsed time: 39:55.68
Peak memory: 15397432 kb

I believe I also tried SKIP_HUGE=true earlier as well, but I'm not 100% certain.

@matentzn
Copy link
Member

COMP=false sh run.sh make test

If you do this, you are passing the COMP parameter to the shell script, not make. The second solution is the right one.

"Killed" means it ran out of memory, or rather, not often than not, that you gave more memory to the robot process (check run.sh) then you allocated to docker. Always make sure you are 15-20% below docker Max.

@joeflack4
Copy link
Contributor Author

Yeah I should have tried that earlier. It passed:

Finished running all tests successfully.
Elapsed time: 14:30.81
Peak memory: 8562068 kb

I know about Killed. I'll jot down a note about that robot memory mismatch.. good to remember if it ever comes up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants