ICD11foundation x-ref #7629

Closed
wants to merge 3 commits into from

sabrinatoro
Collaborator

Added all the ICD11foundation xrefs that were reported in the exact lexical matching file.
File is here: https://raw.githubusercontent.com/monarch-initiative/mondo-ingest/main/src/ontology/lexmatch/unmapped_icd11foundation_lex_exact.tsv

@sabrinatoro sabrinatoro requested a review from twhetzel as a code owner April 23, 2024 23:11
@sabrinatoro
Collaborator Author

FYI: @matentzn @joeflack4 @twhetzel

@sabrinatoro
Collaborator Author

Proxy merges... 👎 sigh....

Note: This means that we have even more issues with the orphanet-icd11 mappings.

Collaborator

@joeflack4 joeflack4 left a comment

Interesting... not sure how to account for illegal proxy merges coming in through lexmatch. Forgive my ignorance on this part of the pipeline, but aren't the lexmatch matches just suggestions? If what they suggest is an illegal proxy merge, wouldn't there be a step to review them and discard those as invalid non-matches?

I see you uploaded the ICD11foundation 240423 exact lex match, and have a tab for "proxymerge review".

Just adding some additional investigation into this. I took a look at the original MONDO->ORDO->ICD11 SSSOM that I created, and I did find some illegal proxy merges there as well... see MONDO->ORDO->ICD11 SSSOM illegal proxies google sheet. There are 3 MONDO IDs there which have duplicates / illegal proxy merges: MONDO:0005823, MONDO:0013626, and MONDO:0019171. Of these, I only see that MONDO:0019171 appears in Sabrina's lexmatch google sheet tab 'proxymerge review'. But when I look at the tab with the actual lexmatch results, I don't see that ID listed.
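A check like the one described above can be sketched in a few lines. This is only an illustration, assuming that a "proxy merge" means two or more distinct Mondo IDs sharing the same equivalent xref target; the function name `find_proxy_merges` and the example rows are hypothetical and do not come from the SSSOM file or the Google sheet.

```python
from collections import defaultdict


def find_proxy_merges(mappings):
    """Return {object_id: sorted subject_ids} for targets claimed by >1 subject.

    `mappings` is an iterable of (subject_id, object_id) pairs, e.g. the
    subject_id / object_id columns of an SSSOM table.
    """
    subjects_by_object = defaultdict(set)
    for subject_id, object_id in mappings:
        subjects_by_object[object_id].add(subject_id)
    return {
        obj: sorted(subjects)
        for obj, subjects in subjects_by_object.items()
        if len(subjects) > 1
    }


# Hypothetical rows, for illustration only (not real mappings).
example = [
    ("MONDO:AAA", "icd11.foundation:111"),
    ("MONDO:BBB", "icd11.foundation:111"),
    ("MONDO:CCC", "icd11.foundation:222"),
]
print(find_proxy_merges(example))
```

Running something like this on the lexmatch suggestions before they are added would surface candidate proxy merges earlier than the QC checks do.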

@sabrinatoro
Collaborator Author

@joeflack4 I think it is part of the review. The lex matching is a suggestion, and we rarely realize that there is a proxy merge until the QC checks tell us.
As we move forward, this will happen less often because the ICD11 code will not be suggested anymore (since it is already in Mondo).
The very good thing (even though it is annoying to review, but it is my job :-) ) is that it unveils issues with the orphanet-ICD11 mappings. So it is all fine.

@cmungall
Member

Use OAK to validate mappings rather than rely on waiting for a GH action

https://github.com/INCATools/ontology-access-kit/blob/main/notebooks/Commands/ValidateMappings.ipynb

@joeflack4
Collaborator

joeflack4 commented Apr 25, 2024

FYI I added this to the tech meeting agenda tomorrow.


@cmungall @matentzn @twhetzel If OAK has a means to detect illegal proxy merges, shouldn't sssom-py also? I just checked and it did not detect them.

I guess we could validate lexmatch files in mondo-ingest, but we'd have to do some refactoring. They're like half robot template / half SSSOM.

@cmungall I finagled the lexmatch file until it passed sssom-py validation, but when I ran runoak validate-mappings on it, I got an error unrelated to validation:

@twhetzel
Collaborator

twhetzel commented Apr 26, 2024

@sabrinatoro I am looking over the mondo-edit.obo file. One question: why does MONDO:0009255 have both of these lines?

xref: icd11.foundation:1173858031 {source="MONDO:equivalentTo"}
xref: icd11.foundation:1173858031 {source="Orphanet:79237", source="MONDO:equivalentTo"}

And for MONDO:0000087 these are duplicated:

xref: icd11.foundation:2081858551 {source="MONDO:equivalentTo"}
xref: icd11.foundation:2081858551 {source="MONDO:equivalentTo", source="Orphanet:35981"}

I gather these were added first by the mondo-orphanet-icd11 xref work, and then again by the mondo-icd11 lexical alignment work. If possible, I think these types of duplications should be removed.

If you agree to remove these, let me know and I can find these if needed.

@sabrinatoro
Collaborator Author

This is correct, I think these lines are duplicated because they were added via the orphanet-icd11 mappings and via the exact lexical matching.
I agree, these lines should be merged. I wonder if a normalization step would do the trick...
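The merge being discussed can be sketched at the string level as below. This is not what the Mondo normalisation step (NORM) actually runs; it is a minimal sketch that assumes each line has the `xref: <target> {source="..."}` shape shown above and that only `source` qualifiers need to be unioned. The name `merge_xref_lines` is hypothetical.

```python
import re

# xref target, optionally followed by a {...} qualifier block.
XREF_RE = re.compile(r'^xref:\s+(\S+)(?:\s+\{(.*)\})?\s*$')
SOURCE_RE = re.compile(r'source="([^"]*)"')


def merge_xref_lines(lines):
    """Collapse xref lines that share a target, unioning their source qualifiers."""
    sources_by_target = {}  # dicts keep first-seen order of targets
    for line in lines:
        match = XREF_RE.match(line.strip())
        if match is None:
            raise ValueError(f"not an xref line: {line!r}")
        target, qualifiers = match.group(1), match.group(2) or ""
        found = SOURCE_RE.findall(qualifiers)
        sources_by_target.setdefault(target, set()).update(found)

    merged = []
    for target, sources in sources_by_target.items():
        if sources:
            quals = ", ".join(f'source="{s}"' for s in sorted(sources))
            merged.append(f"xref: {target} {{{quals}}}")
        else:
            merged.append(f"xref: {target}")
    return merged


# The two MONDO:0009255 lines quoted earlier in this thread.
lines = [
    'xref: icd11.foundation:1173858031 {source="MONDO:equivalentTo"}',
    'xref: icd11.foundation:1173858031 {source="Orphanet:79237", source="MONDO:equivalentTo"}',
]
print(merge_xref_lines(lines))
```

Sorting the sources makes the output deterministic, so the two orderings seen for MONDO:0009255 and MONDO:0000087 would collapse to the same form.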

@twhetzel
Collaborator

I don't know if NORM will do that, but it is worth a try. If not, my next try would be a SPARQL query to find these and remove them manually.

@matentzn
Member

Definitely normalise this; it will do the trick.

Collaborator

@twhetzel twhetzel left a comment

In addition to needing to run NORM to collapse the cases where the same Mondo ID has xrefs to the same source and source ID (e.g. xref: icd11.foundation:1173858031 {source="MONDO:equivalentTo"} and xref: icd11.foundation:1173858031 {source="Orphanet:79237", source="MONDO:equivalentTo"}), there are 85 cases where the same Mondo ID is mapped to two different ICD11 terms. I was not expecting terms to be mapped to >1 ICD11 term, given that the workflow was to add the ICD11 mappings gathered from Orphanet first and then do the lexical alignment.

I reviewed the cases where the Mondo term was mapped to >1 ICD11 term and added comments here in the Sheet "Curated Dups".

The general cases are:

  • one of the ICD11 xRefs is in the Extension Codes branch --> I'm not sure if both xRefs to ICD11 terms in the Extension Code branch should be included. These are marked in the column "Correct xRef" as Both match and the "Notes" column indicates which xRef is in the Extension Codes branch
  • there seem to be a few xRefs that are mis-mapped based on the ICD11 structure --> These are marked with what should be the correct mapping in the "Correct xRef" column
  • a few that need further review due to how the Mondo term is labeled, defined, and/or the synonyms it has --> These are marked in the "Correct xRef" column as Needs Review
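The check for Mondo IDs mapped to more than one ICD11 term can be sketched as follows. It is an illustration only: the function name `find_multi_mapped` is hypothetical, the input is assumed to be (Mondo ID, xref target) pairs already extracted from mondo-edit.obo, and the example rows reuse IDs mentioned in this thread.

```python
from collections import defaultdict


def find_multi_mapped(mappings, prefix="icd11.foundation:"):
    """Return {subject_id: sorted targets} for subjects with >1 distinct target.

    Only targets starting with `prefix` are considered, and repeated lines
    for the same target count once.
    """
    targets_by_subject = defaultdict(set)
    for subject_id, object_id in mappings:
        if object_id.startswith(prefix):
            targets_by_subject[subject_id].add(object_id)
    return {
        subject: sorted(targets)
        for subject, targets in targets_by_subject.items()
        if len(targets) > 1
    }


rows = [
    # Two different ICD11 targets for one Mondo ID: reported.
    ("MONDO:0016691", "icd11.foundation:249843059"),
    ("MONDO:0016691", "icd11.foundation:424726772"),
    # The same target twice (duplicate xref lines): not reported here.
    ("MONDO:0009255", "icd11.foundation:1173858031"),
    ("MONDO:0009255", "icd11.foundation:1173858031"),
]
print(find_multi_mapped(rows))
```

Keeping this separate from the duplicate-line case means that running NORM first would not change the result, since identical targets are already counted once.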

I used the WHO-FIC Maintenance Platform to look up these ICD11 codes. However, this site includes unreleased, work-in-progress development, and not all ICD11 codes mapped to Mondo in this PR can be found in the ICD11 browser. For example, 249843059, which is mapped to MONDO:0016691 (pilocytic astrocytoma) alongside 424726772, cannot be found there. This raises some questions: should mappings to ICD11 codes that are not in the main released version be included? Will these ICD11 codes eventually be in the released version? Should we have used a different source file for ICD11?

PS - I think we are also missing ORCIDs, or have some mixed source information as well.

@sabrinatoro sabrinatoro marked this pull request as draft April 30, 2024 00:22
@matentzn
Member

+1 on most definitely adding 1 or more ORCIDs

> there seem to be a few xRefs that are mis-mapped based on the ICD11 structure

This is a very complex issue. We treat structure merely as evidence for the correctness of a mapping. The goal of determining exactness is to determine "intent" (whether or not explicitly reflected in the structure of the ontology) and to approach mapping from a "practical POV" (nothing is truly an exact match, but if it "appears the same" and pretending it is the same does not impact diagnostic tools adversely, we assert it).

I would break out the 85, make a new PR for these and otherwise merge it, but I understand from slack that you prefer to be more careful.

@twhetzel
Collaborator

Since there is a newer, production version of ICD11 Foundation that we learned about this week, this PR will be closed and the lexical alignment will be re-run next week using this newer version.

@twhetzel twhetzel closed this May 24, 2024