-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in match-mondo-sources-all-lexical.py
#394
Conversation
3b30be4
to
a3db7fd
Compare
a3db7fd
to
4e82092
Compare
match-mondo-sources-all-lexical.py
2e416fb
to
71fd56a
Compare
da2a7f4
to
2ed4612
Compare
@@ -3,7 +3,7 @@ rdf,http://www.w3.org/1999/02/22-rdf-syntax-ns# | |||
rdfs,http://www.w3.org/2000/01/rdf-schema# | |||
xsd,http://www.w3.org/2001/XMLSchema# | |||
owl,http://www.w3.org/2002/07/owl# | |||
oio,http://www.geneontology.org/formats/oboInOwl# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugfix~: oio
removed from prefixes.csv
@matentzn
@matentzn It looks like this has unintended consequences; may want to revert.
Details
Overview
The advantage of removing this is that lexmatch succeeds without needing the changes that Charlie and I introduced to clean_prefix_map()
, namely converter.get_prefixes(include_synonyms=True)
.
The negative is that now OAK is not checking for lexical matches on synonyms identified by the oio:hasExactSynonym
field.
You can see a sample of the differences in the outputs by looking at the "b4" and "after" spreadsheets in the outputs GSheet. Comparing the before/after for the first of the outputs, called "oak output prefiltered", the 'after' version has 8 less entries of a total 198 entries in the 'before' set, a reduction of 4%.
Choices
a. Leave oio
out of prefixes.csv
. Forfeit matching on oio:hasExactSynonym
.
b. Keep oio
in prefixes.csv
in tandem with [Charlie and Joe's bugfix](the changes that Charlie and I introduced) which fixes the perceived missing oio
prefix (which is now a prefix synonym of oboInOwl
).
c. Change OAK so that even if we remove oio
from prefixes.csv
, there is no lossiness (more below).
I think we should choose 'b', and possibly do 'c' separately, unless matching on oio:hasExactSynonym
is considered unimportant, in which case we can just go with 'a'.
Additional discussion
I want to expand on 'c', possible OAK changes. I think we want OAK to somehow treat oio
and oboInOwl
as the same, but this is tricky and I can only begin to think about what to be done here. I can only guess and can't speak to the full nature of why, when oio
is left out of prefixes.csv
, lexmatches on oio:hasExactSynonym
go away.
I wonder if it might have something to do with this, at the top of OAK's lexical_indexer.py
:
QUALIFIER_DICT = {
"exact": "oio:hasExactSynonym",
"broad": "oio:hasBroadSynonym",
"narrow": "oio:hasNarrowSynonym",
"related": "oio:hasRelatedSynonym",
}
If you guys think OAK or SemanticSQL needs any sort of update, maybe you should tag Chris here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The negative is that now OAK is not checking for lexical matches on synonyms identified by the oio:hasExactSynonym field.
This makes no sense. The oio:hasExactSynonym
should not exist, if the input prefix map does not contain it.
I will look into it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matentzn If I understand you correctly, that's what I'm saying. If prefixes.csv
contains oio
, then when I run the pipeline, it does matches on just on rdfs:label
but also oio:hasExactSynonym
, and you can see that reflected in my subsample outputs. And if I remove oio
from prefixes.csv
I no longer see those in the outputs.
So if we want to keep matching on oio:hasExactSynonym
, it seems that we should keep oio
in prefixes.csv
, even though that causes it to be non-bijective. It's only passed to SemanticSQL anyway. Alternative is to ask OAK to rename oio
to oboInOwl
.
Also, if you change your mind and want to keep oio
in prefixes.csv
, then I think we need to merge mapping-commons/sssom-py#485 first, otherwise error will be thrown. So maybe best to merge this PR now and then if we want to put oio
back, we can do it in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of today here's where we are:
Nico has looked into further and opened: INCATools/ontology-access-kit#698
Joe looked to see other places in which merged.db
is used for identifying possible side effects of changing oio
to oboInOwl
in prefixes.csv
when constructing merged.db
. These other usages were found, but do not know if unintended side effects could occur:
lexmatch-sssom-compare.py
mappings.ipynb
oak-lexmatch-sssom-compare.ipynb
oak-lexmatch-sssom-compare-checkpoint.ipynb
remove-mappings-test.ipynb
synchronizeMappingsWithMondo-checkpoint.ipynb
Nico said he would put in some time to solve this, so Joe is pausing activity on this.
@@ -145,6 +194,9 @@ def run(input: str, config: str, rules: str, rejects: str, output: str): | |||
kwargs = {"subject_id": ("MONDO:%",), "object_id": prefix_args} | |||
with open(str(Path(output.replace("lexical", "lexical-2"))), "w") as f: | |||
filter_file(input=str(Path(output)), output=f, **kwargs) | |||
t1 = datetime.now() # todo: temp | |||
print('match-mondo-sources-all-lexical complete in seconds:', (t1 - t0).seconds) # todo temp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outputs
@matentzn @twhetzel I will be uploading outputs here.
You'll notice there are some outputs labeled 'subsample'. This is an exercise I did to quickly compare the before/after ramifications of removing oio
from `prefixes.csv.
For the full output, look at these sheets:
after - mondo-sources-all-lexical-2.sssom.tsv
after - mondo-sources-all-lexical.sssom.tsv
If you're having trouble loading those tabs (I was, probably because they're 20MB and 50MB), I uploaded the raw files that you can download here.
2ed4612
to
80eb1d1
Compare
63adbb6
to
4d2f551
Compare
4d2f551
to
d4fd48e
Compare
d4fd48e
to
cebb558
Compare
- Bugfix: AttributeError: 'tuple' object has no attribute 'pop': wrong datatype for metadata was being passed to lexical_index_to_sssom() - Bugfix: Several other bugs in mondo-ingest, and upgrading OAK/sssom-py/curies to fix other bugs related to prefix maps. - Update: mondo.sssom.config.yml: Commented out duplicate prefix 'oio' - Update: prefixes.csv: Removed duplicate prefix oio - Update: Python requirements: Upgraded curies for bugfix involving get_prefixes(include_synonyms) - Update: Not bugfixes, but did some codestyle fixes: (i) converted regular comments to docstring, (ii) added missing docstring, (iii) renamed built-in 'input' param
cebb558
to
d2328dc
Compare
# Implemented `meta` param in `lexical_index_to_sssom` | ||
|
||
meta = get_metadata_and_prefix_map(config) | ||
"""Run the script""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttributeError: 'Converter' object has no attribute 'items'
(fixed)
You can mark this resolved. It's fixed.
Details
I thought this was ready to merge because it ran successfully through the debugger. However I'm got error in OAK now: AttributeError: 'Converter' object has no attribute 'items'
.
ls Traceback (most recent call last):
File "/work/src/ontology/../scripts/[match-mondo-sources-all-lexical.py](http://match-mondo-sources-all-lexical.py/)", line 176, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/click/[core.py](http://core.py/)", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/click/[core.py](http://core.py/)", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/dist-packages/click/[core.py](http://core.py/)", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/dist-packages/click/[core.py](http://core.py/)", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/dist-packages/click/[core.py](http://core.py/)", line 783, in invoke
return __callback(*args, **kwargs)
File "/work/src/ontology/../scripts/[match-mondo-sources-all-lexical.py](http://match-mondo-sources-all-lexical.py/)", line 122, in run
msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, prefix_map=converter)
File "/usr/local/lib/python3.10/dist-packages/oaklib/utilities/lexical/[lexical_indexer.py](http://lexical_indexer.py/)", line 317, in lexical_index_to_sssom
{k: v for k, v in prefix_map.items() if k not in meta.prefix_map.keys()}
AttributeError: 'Converter' object has no attribute 'items'
I upgraded OAK within ODK but that didn't work. However, upgrading my ODK image did.
@@ -118,9 +119,9 @@ def run(input: str, config: str, rules: str, rejects: str, output: str): | |||
save_lexical_index(lexical_index, OUT_INDEX_DB) | |||
|
|||
if rules: | |||
msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, meta=meta) | |||
msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, prefix_map=converter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to close this, as this is essentially supplanted by #397. There's just some minor stylistic stuff in here that we can re-open and merge if we really want. |
Updates
Bugfixes: match-mondo-sources-all-lexical.py
tring, (iii) renamed built-in 'input' param
Sub-tasks
sssom-py
PR: Bugfix:clean_prefix_map()
not detecting prefix synonyms mapping-commons/sssom-py#485sssom-py
if possiblesssom-py
andcuries
and use those recent changes instead of my PR changes, if needed (wasn't necessary for this PR but upgradedcuries
anyway. may need in near future)oio
andoboInOwl
both in Mondo), make PR inmondo
repo too (doesn't seems so. seems to be related tooio
inconfig/prefixes.csv
)AttributeError: 'tuple' object has no attribute 'pop'
mondo-ingest
oio
frommondo.sssom.config.yaml
curie_map
andprefixes.csv
Related
clean_prefix_map()
not detecting prefix synonyms mapping-commons/sssom-py#485clean_prefix_map()
(.get_prefix(include_synonyms=True)
)lexical_index_to_sssom()
EPM param INCATools/ontology-access-kit#697oio
tooboInOwl
in part of our pipeline.@souzadevinicius FYI. Harshad is out ATM so I'm helping out with a bug that appeared here, but it opened a can of worms, involving conflicting prefixes that are causing trouble when
mondo-ingest
interacts with SSSOM and OAK. And we've been decided how to fix this, involving using extended prefix maps instead of plain, flat, bijective prefix maps (key: val), etc.