Not crucial or urgent, and I'm not certain this is a problem, but here are my musings:
It seems likely that every time the validator is run (including at import), the TRAPI YAML and the Biolink YAML are fetched and parsed.
Parsing YAML in Python is REALLY slow, like stunningly slow; parsing the same model from JSON is about 100x faster. The Biolink YAML and the TRAPI YAML each take roughly 0.5 seconds to parse, versus roughly 0.005 seconds for the equivalent JSON.
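For anyone who wants to reproduce the numbers, here's a minimal timing sketch. It assumes PyYAML is installed and network access is available; the Biolink URL is illustrative, and `default=str` is just there so any YAML-native values (e.g. dates) don't break the JSON round-trip:

```python
import json
import time
import urllib.request

import yaml  # PyYAML

# Illustrative URL -- substitute whatever location the validator actually fetches.
BIOLINK_YAML_URL = (
    "https://raw.githubusercontent.com/biolink/biolink-model/master/biolink-model.yaml"
)

raw_yaml = urllib.request.urlopen(BIOLINK_YAML_URL).read().decode("utf-8")

t0 = time.perf_counter()
model = yaml.safe_load(raw_yaml)
yaml_secs = time.perf_counter() - t0

# One-time conversion; stringify anything JSON can't represent natively.
raw_json = json.dumps(model, default=str)

t0 = time.perf_counter()
json.loads(raw_json)
json_secs = time.perf_counter() - t0

print(f"YAML parse: {yaml_secs:.3f}s   JSON parse: {json_secs:.3f}s")
```

Worth noting that if PyYAML was built with libyaml, `yaml.load(raw_yaml, Loader=yaml.CSafeLoader)` is considerably faster than the pure-Python loader, though in my experience still slower than `json.loads`.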
Plus, are we also downloading these files each time from GitHub?
When someone clicks on a parent PK in the ARAX GUI, ARAX ends up validating a whole batch of documents in parallel, and each process probably pays the full cost of downloading and parsing the YAMLs, likely over a second apiece.
I wonder if some clever caching could shave at least a second off each validation.
Maybe not huge, but when someone is waiting on the result, not paying a one-second cost ten times over could be a real benefit.
Not trivial, though. Where do you store the caches? How do you make the JSON conversion and its storage safe across threads and concurrent processes?
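To make the storage and safety questions concrete, here's one minimal sketch of what such a cache could look like. Everything here is hypothetical, not part of the validator's current API: `load_model_cached` and both paths are made-up names, and the approach assumes a writable directory next to the YAML. The key trick is writing the JSON to a temp file and then `os.replace()`-ing it into place, which is atomic, so concurrent readers and writers never see a half-written cache:

```python
import json
import os
import tempfile

import yaml  # PyYAML


def load_model_cached(yaml_path: str, cache_path: str) -> dict:
    """Parse yaml_path once, caching the parsed model as JSON at cache_path.

    Hypothetical helper: names and paths are illustrative only.
    """
    # Reuse the cache if it is at least as new as the YAML source.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(yaml_path)):
        with open(cache_path) as fh:
            return json.load(fh)

    with open(yaml_path) as fh:
        model = yaml.safe_load(fh)

    # Atomic write: dump to a temp file in the same directory (same
    # filesystem), then os.replace() it over the cache path. Readers see
    # either the old complete file or the new complete file, never a
    # partial write, so this is safe across concurrent processes.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(model, fh, default=str)
        os.replace(tmp_path, cache_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return model
```

Worst case under this scheme, two threads or processes both regenerate the cache at the same time, which is wasted work but still correct; avoiding even that would take a lock file, which is probably more machinery than this is worth.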
Maybe a job for ARAX, not for the validator? I don't know.