Lexmatch revamp #397
Conversation
```diff
@@ -85,7 +85,8 @@ def main(verbose: int, quiet: bool):
 def run(input: str, config: str, rules: str, rejects: str, output: str):
     # Implemented `meta` param in `lexical_index_to_sssom`

-    meta = get_metadata_and_prefix_map(config)
+    #meta = get_metadata_and_prefix_map(config)
+    meta = None
```
`lexical_index_to_sssom(meta=None)`

Nico wrote:

> skips `meta` in `src/scripts/match-mondo-sources-all-lexical.py`, which is not urgently needed and was causing all the problems @joeflack4 is fixing
You are more familiar with all of the lexmatch outputs, and you say that running it locally was fine, so it seems there are no significant negative side effects from this.

You say it is not urgently needed, so I take it there is still a desire to eventually do what I did in #394 (pass a `Converter`) or similar, perhaps after addressing INCATools/ontology-access-kit#698. So perhaps I should make a medium-priority ticket for this, or keep my current PR open and rebase from / conform to what you have here.

Not important, but to clarify: this wasn't causing all the problems I am fixing. Rather, it is a solution for the last remaining (and conditional) unfixed problem in #394.
@joeflack4 have you also run the entire pipeline locally? Did you find that Nico's solution to "ditch meta" worked as intended?
I haven't run the whole pipeline locally. I'll give it a go, but I'm terrified of it; I've heard it takes over 24 hours, and sometimes this can interfere w/ other projects I'm working on. My computer wasn't actually capable of doing this until last month (without skipping some things like NCIT). When I've asked Nico if he wants me to do this before, he's said I could hold off. But I'll at least give it a go.
Regarding checking the results of `meta=None` myself, I'll give that a go. I was trusting Nico, and it sounds like he got the results he expected, but good to have second eyes, especially if you think so. Another difference I just noticed is that I've also been passing no `meta` (same as passing `None`), but I've been passing `prefix_map` instead. I'm not sure it matters in this case, but I'd better remove that as well and run in a way that more closely matches what Nico is running. I'd check out his PR as well, but I think it's better to run on my PR, because Nico made some other changes to `prefixes.csv`, and I want to more closely compare the outputs I got before/after with simply the `meta=None`/`prefix_map=None` change.
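One stdlib-only way to do that before/after comparison is to diff the two SSSOM TSV outputs while ignoring the `#`-prefixed metadata header, since the `meta` change is expected to affect the header rather than the mapping rows. This is a sketch under that assumption; the file paths and function names are illustrative, not from the repo:

```python
import csv

def sssom_rows(path: str) -> list[dict]:
    """Read an SSSOM TSV, skipping the '#'-prefixed YAML metadata header."""
    with open(path, encoding="utf-8") as f:
        lines = [ln for ln in f if not ln.startswith("#")]
    return list(csv.DictReader(lines, delimiter="\t"))

def diff_mappings(before_path: str, after_path: str) -> tuple[set, set]:
    """Return (rows only in 'before', rows only in 'after') as hashable row tuples."""
    before = {tuple(sorted(r.items())) for r in sssom_rows(before_path)}
    after = {tuple(sorted(r.items())) for r in sssom_rows(after_path)}
    return before - after, after - before
```

If both returned sets are empty, the mapping rows are identical and only the metadata differs.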
#394 redundancy

The main difference is that #397 passes `meta=None` to `lexical_index_to_sssom()`, whereas #394 passes a `Converter`. Supposedly the #397 approach is fine.

Everything else in that PR is now cosmetic, so I can merge just those changes afterwards, or ignore and close.

My instinct is to feel a little irked that I spent a lot of work and found a way to get the pipeline working without throwing the `oio` err (although #394 currently has `oio` removed and loses outputs), and that my work had, I think, the same overall effect but was supplanted by Nico and Charlie's PRs. But maybe I shouldn't feel irked; I laid the groundwork and this is just the nature of collaboration, and Nico/Charlie are more senior here and have a better idea of how exactly things should be.

Also, I see that this PR does add more than #394, such as missing prefixes.
```diff
@@ -227,12 +227,27 @@ ICD10CM,http://purl.bioontology.org/ontology/ICD10CM/
 ICD10CM2,https://icd.codes/icd10cm/
 ICD10WHO,https://icd.who.int/browse10/2019/en#/
 ICD10WHO2010,http://apps.who.int/classifications/icd10/browse/2010/en#/
-OMIMPS,https://www.omim.org/phenotypicSeries/PS
+OMIMPS,https://omim.org/phenotypicSeries/PS
```
`prefixes.csv` updates

That will be supplanted eventually by dynamic creation of `prefixes.csv`, but these changes seem necessary for now.
Harshad and I should have thought to check the status of this sooner.
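Until dynamic generation exists, the "conflicting namespaces" concern can be checked mechanically: load the two-column `prefix,uri` file and flag any prefix mapped to two different URIs, or any URI claimed by more than one prefix (as with the `ICD10CM`/`ICD10CM2` style variants). A stdlib-only sketch; the function names are illustrative:

```python
import csv
from collections import defaultdict

def load_prefix_map(path: str) -> dict[str, str]:
    """Load a two-column prefix,uri CSV; fail on a prefix with conflicting URIs."""
    prefix_map: dict[str, str] = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed lines
            prefix, uri = row[0], row[1]
            if prefix in prefix_map and prefix_map[prefix] != uri:
                raise ValueError(f"Conflicting URIs for prefix {prefix!r}")
            prefix_map[prefix] = uri
    return prefix_map

def shared_uris(prefix_map: dict[str, str]) -> dict[str, list[str]]:
    """Report base URIs claimed by more than one prefix."""
    by_uri: defaultdict[str, list[str]] = defaultdict(list)
    for prefix, uri in prefix_map.items():
        by_uri[uri].append(prefix)
    return {u: ps for u, ps in by_uri.items() if len(ps) > 1}
```

Running this against `src/ontology/config/prefixes.csv` in CI would catch regressions like the `OMIMPS` URI drift before a full pipeline run.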
I trust your review of the outputs and if you say it is working locally and that you're verifying by running the pipeline, I'm fine for you / us to merge this after that verification has been made!
> @joeflack4 have you also run the entire pipeline locally? Did you find that Nico's solution to "ditch `meta`" worked as intended?
Approved... Phew!

I looked through all the results; they look correct to me. A lot of issues fixed, way beyond the original problem with the prefixes.

I won't be looking at this anymore this week. Please merge this, write down any questions you have in a Google Doc so we can document things more permanently, and I will, once I see the merge, immediately make another run.
@joeflack4 if you want to retain the contents of your comments, please move them to issues.
This was a bit painful, sorry about that. The source of all problems was a major refactor of `sssom-py` and `curies`, which led to 3 completely independent major issues all around the pipeline. The debugging process has led to some good fixes in `curies` (thanks @joeflack4), and, I hope, some better understanding of what is happening. Let's keep one thing in mind for the future:

Never ever again do we merge a branch unless a full run of the pipeline was successful. I can be responsible for the runs until we have a runner on GCP.

@twhetzel I will post a comment on Slack now to advise Nicole and Sabrina to push forward the syncing.
Nico mentioned all looks good and it looks good to me at this point
```yaml
# SNOMEDCT_US_2021_09_01: http://identifiers.org/snomedct/
# SNOMEDCT_US_2022_07_31: http://identifiers.org/snomedct/
# SNOMEDCT_US_2022_10_31: http://identifiers.org/snomedct/
# SNOMEDCT_US_2023_02_28: http://identifiers.org/snomedct/
```
@matentzn why are these commented out rather than removed, or what are these comments a reminder of for the future?
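One plausible reason to retire the dated prefixes, not confirmed by this thread, is that they all expand to the same identifiers.org base URI, so a single `SNOMEDCT_US` prefix suffices and versioned CURIEs can be normalized to it. A hedged sketch of that normalization; the constant and function names are hypothetical, not from the repo:

```python
import re

# Illustrative assumption: every dated US-edition prefix expands to the same
# base URI (http://identifiers.org/snomedct/), so one canonical prefix suffices.
CANONICAL = "SNOMEDCT_US"
VERSIONED = re.compile(r"^SNOMEDCT_US_\d{4}_\d{2}_\d{2}$")

def normalize_curie(curie: str) -> str:
    """Rewrite e.g. SNOMEDCT_US_2023_02_28:123 to SNOMEDCT_US:123."""
    prefix, _, local_id = curie.partition(":")
    if VERSIONED.match(prefix):
        return f"{CANONICAL}:{local_id}"
    return curie
```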
This PR:

- Replaces `prefix_map` with a `converter` parameter in `src/scripts/lexmatch-sssom-compare.py`
- Skips `meta` in `src/scripts/match-mondo-sources-all-lexical.py`, which is not urgently needed and was causing all the problems @joeflack4 is fixing
- Updates `src/ontology/config/prefixes.csv` to cover all relevant prefixes for the matching
- Updates `src/ontology/metadata/*` configs to avoid conflicting namespaces

@joeflack4 I am not reading your answers etc. - please review this PR with @twhetzel and see if you have any concerns with these changes.
To test this PR if you like:

I ran it locally - now testing the whole pipeline (I circumvent the main problem you are trying to solve by simply ditching `meta` here, hoping it's not needed).