-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DO NOT MERGE: Bug: Synonym Sync: Duplicate rows - build #723
base: bugfix-syn-sync-dupe-rows-684
Are you sure you want to change the base?
DO NOT MERGE: Bug: Synonym Sync: Duplicate rows - build #723
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure this has nothing to do with this PR, and I'm sure this has been observed elsewhere, but I'm wondering why:
mondo-omim-genes.robot.tsv
has only 20 lines changed, but mondo-omim-genes.robot.owl
has over 30k.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, what is the diff? The most typical reason for huge diffs are genid s, so generated identifiers for anonymous expression necessary for RDF XML
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn Ah, I see, you are correct. These are bnode identifiers.
<owl:Class rdf:about="http://purl.obolibrary.org/obo/MONDO_0013913">
- <rdfs:subClassOf rdf:nodeID="genid5482"/>
+ <rdfs:subClassOf rdf:nodeID="genid5478"/>
</owl:Class>
- <owl:Restriction rdf:nodeID="genid5482">
+ <owl:Restriction rdf:nodeID="genid5478">
<owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0004003"/>
<owl:someValuesFrom rdf:resource="http://identifiers.org/hgnc/11528"/>
</owl:Restriction>
<owl:Axiom>
<owl:annotatedSource rdf:resource="http://purl.obolibrary.org/obo/MONDO_0013913"/>
<owl:annotatedProperty rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
- <owl:annotatedTarget rdf:nodeID="genid5482"/>
+ <owl:annotatedTarget rdf:nodeID="genid5478"/>
<oboInOwl:source>OMIM:614840</oboInOwl:source>
</owl:Axiom>
Not the hugest problem, but would make reviews easier if these IDs were deterministic. I have some Python code to do that in the OMIM pipeline, but that wouldn't be helpful here. Not that we have to worry about it now, but do you think there's a way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, switch to functional syntax in ROBOT! convert -f ofn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. But not in RDF/XML.
I have begun to take a great liking for OFN now not because of its syntax, but because of its determinism in general, and the ease of diffing it, for that reason as well as fewer lines. This is now a regular part of some of my workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@twhetzel This file has nothing to do with this PR. But any idea why this has so many new rows?
Note that the other OMIM unmapped files do not have such a large diff:
docs/reports/unmapped_omim.md | 6 +
.../lexmatch/mondo-only/unmapped_omim_mondo.tsv | 7 +-
src/ontology/lexmatch/unmapped_omim_lex.tsv | 2 +
src/ontology/reports/omim_unmapped_terms.tsv | 6 +
rc/ontology/unmapped/omim-unmapped.owl | 45170 +++++-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General review
Other than some OMIM related oddities not related to this PR, LGTM!
Build for:
Best to check results in: the Google Sheet
Related: