ICD11foundation x-ref #7629
Added all the ICD11foundation x-ref that were reported in the exact lexical matching file.
FYI: @matentzn @joeflack4 @twhetzel |
Proxy merges... 👎 sigh.... Note: This means that we have even more issues with the orphanet-icd11 mappings. |
Interesting. I'm not sure how to account for illegal proxy merges coming in through lexmatch. Forgive my ignorance on this part of the pipeline, but aren't the lexmatch matches just suggestions? If a suggested match is an illegal proxy merge, shouldn't there be a step that reviews it and discards it as an invalid non-match?
I see you uploaded the ICD11foundation 240423 exact lex match, and have a tab for "proxymerge review".
Just adding some additional investigation into this. I took a look at the original MONDO->ORDO->ICD11 SSSOM that I created, and I did find some illegal proxy merges there as well... see MONDO->ORDO->ICD11 SSSOM illegal proxies google sheet. There are 3 MONDO IDs there which have duplicates / illegal proxy merges: MONDO:0005823, MONDO:0013626, and MONDO:0019171. Of these, I only see that MONDO:0019171 appears in Sabrina's lexmatch google sheet tab 'proxymerge review'. But when I look at the tab with the actual lexmatch results, I don't see that ID listed.
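For reference, the kind of check described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming that "illegal proxy merge" here means one Mondo ID carrying mappings to more than one distinct term from the same source, as the duplicates listed above suggest. The function name `find_proxy_merges` and the IDs in `rows` are made up for the example; it is not the QC check that Mondo actually runs.

```python
from collections import defaultdict


def find_proxy_merges(mappings):
    """Flag subjects mapped to more than one distinct object from the same source.

    `mappings` is an iterable of (subject_id, object_id) CURIE pairs, e.g. the
    subject_id and object_id columns of an SSSOM table (assumed input shape).
    Returns {(subject_id, object_prefix): sorted list of object_ids}.
    """
    seen = defaultdict(set)
    for subject_id, object_id in mappings:
        # The CURIE prefix identifies the source, e.g. "icd11.foundation".
        prefix = object_id.split(":", 1)[0]
        seen[(subject_id, prefix)].add(object_id)
    # Exact duplicate rows collapse in the set, so only true conflicts remain.
    return {key: sorted(objs) for key, objs in seen.items() if len(objs) > 1}


# Made-up rows for illustration only (not real mappings).
rows = [
    ("MONDO:9990001", "icd11.foundation:111"),
    ("MONDO:9990001", "icd11.foundation:222"),
    ("MONDO:9990002", "icd11.foundation:333"),
    ("MONDO:9990002", "icd11.foundation:333"),
]
flagged = find_proxy_merges(rows)
```

Running something like this on a lexmatch or SSSOM file before opening a PR would surface the conflicting IDs earlier than waiting for the QC checks.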
@joeflack4 I think it is part of the review. The lex matching is a suggestion, and we rarely realize that there is a proxy merge until the QC checks tell us. |
Use OAK to validate mappings rather than rely on waiting for a GH action https://github.com/INCATools/ontology-access-kit/blob/main/notebooks/Commands/ValidateMappings.ipynb |
FYI I added this to the tech meeting agenda tomorrow. @cmungall @matentzn @twhetzel If OAK has means to validate illegal proxy matches, shouldn't I guess we could validate lexmatch files in @cmungall I finagled the lexmatch file until it passed |
@sabrinatoro I am looking over the mondo-edit.obo file, one question is why does
And for
I gather these were added first by the mondo-orphanet-icd11 xref work, and then again by the mondo-icd11 lexical alignment work. If possible, I think these types of duplications should be removed. If you agree to remove these, let me know and I can find these if needed. |
This is correct, I think these lines are duplicated because they were added via the orphanet-icd11 mappings and via the exact lexical matching. |
I don't know if |
Definitely normalise this; it will do the trick |
In addition to needing to run NORM to collapse the cases where the same Mondo ID has xrefs to the same source and source ID, e.g. xref: icd11.foundation:1173858031 {source="MONDO:equivalentTo"} and xref: icd11.foundation:1173858031 {source="Orphanet:79237", source="MONDO:equivalentTo"}, there are 85 cases where the same Mondo ID is mapped to two different ICD11 terms. I was not expecting any term to be mapped to more than one ICD11 term, given that the workflow adds the ICD11 mappings gathered from Orphanet first and then does the lexical alignment.
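To make the first point concrete, here is a rough Python sketch of what collapsing those duplicated xref lines could look like. It is an illustration only, not what the NORM step actually does: the function name `collapse_xrefs` and the choice to union the `source` qualifiers in first-seen order are assumptions made for the example.

```python
import re

# Matches an OBO xref line with an optional {…} qualifier block.
XREF_RE = re.compile(r'^xref:\s+(\S+)(?:\s+\{(.*)\})?\s*$')
SOURCE_RE = re.compile(r'source="([^"]*)"')


def collapse_xrefs(lines):
    """Merge xref lines that point to the same ID, unioning their sources."""
    merged = {}  # xref ID -> list of sources, in first-seen order
    for line in lines:
        match = XREF_RE.match(line)
        if match is None:
            raise ValueError(f"not an xref line: {line!r}")
        xref_id, qualifiers = match.group(1), match.group(2) or ""
        sources = merged.setdefault(xref_id, [])
        for source in SOURCE_RE.findall(qualifiers):
            if source not in sources:
                sources.append(source)
    collapsed = []
    for xref_id, sources in merged.items():
        if sources:
            quals = ", ".join(f'source="{s}"' for s in sources)
            collapsed.append(f"xref: {xref_id} {{{quals}}}")
        else:
            collapsed.append(f"xref: {xref_id}")
    return collapsed


# The two lines quoted in the comment above.
stanza_xrefs = [
    'xref: icd11.foundation:1173858031 {source="MONDO:equivalentTo"}',
    'xref: icd11.foundation:1173858031 {source="Orphanet:79237", source="MONDO:equivalentTo"}',
]
result = collapse_xrefs(stanza_xrefs)
```

With the two example lines, `result` holds a single xref line for icd11.foundation:1173858031 that carries both sources. The 85 cases with two different ICD11 IDs are untouched by this kind of collapse and still need manual review.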
I reviewed the cases where the Mondo term was mapped to >1 ICD11 term and added comments here in the Sheet "Curated Dups".
The general cases are:
- one of the ICD11 xRefs is in the Extension Codes branch --> I'm not sure if both xRefs to ICD11 terms in the Extension Codes branch should be included. These are marked in the "Correct xRef" column as "Both match", and the "Notes" column indicates which xRef is in the Extension Codes branch
- there seem to be a few xRefs that are mis-mapped based on the ICD11 structure --> These are marked with what should be the correct mapping in the "Correct xRef" column
- a few that need further review due to how the Mondo term is labeled, defined, and/or the synonyms it has --> These are marked in the "Correct xRef" column as "Needs Review"
I used the WHO-FIC Maintenance Platform to look up these ICD11 codes. However, this site includes unreleased, work-in-progress development, and not all ICD11 codes mapped to Mondo in this PR can be found in the ICD11 browser. For example, 249843059, which is mapped to MONDO:0016691 (pilocytic astrocytoma) along with 424726772, cannot be found there. This raises a few questions: should mappings to ICD11 terms that are not found in the main released version be included? Will these ICD11 codes eventually be in the released version? Should we have used a different source file for ICD11?
PS - I think we are also missing Orcid or have some mixed source information as well
+1 on most definitely adding 1 or more orcids
This is a very complex issue. We treat structure merely as evidence for the correctness of a mapping. The goal of determining exactness is to determine "intent" (whether or not explicitly reflected in the structure of the ontology) and approach mapping from a "practical POV" (nothing is truly an exact match, but if it "appears the same" and pretending it is the same does not impact diagnostic tools adversely, we assert it). I would break out the 85 cases, make a new PR for them, and otherwise merge this one, but I understand from Slack that you prefer to be more careful. |
Since there is a newer, production version of ICD11 Foundation that we learned about this week, this PR will be closed and the lexical alignment will be re-run next week using this newer version. |
Added all the ICD11foundation x-ref that were reported in the exact lexical matching file.
File is here: https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/main/src/ontology/lexmatch/unmapped_icd11foundation_lex_exact.tsv