json_to_obo.php script parsing #22

Open
Ferrisx4 opened this issue Apr 1, 2021 · 0 comments

Ferrisx4 commented Apr 1, 2021

Discussion:

Currently, the JSON file pulled down from KEGG packs several pieces of information into one long string under the 'name' key. For example:

"name":"106672041 glucose-6-phosphate isomerase\tK01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]"

When converted with the script, an entry of this form becomes the following (this output is for a different gene, 106670716):

[Term]
id: KEGG:106670716 glucose-6-phosphate 1-epimerase	K01792 E5.1.3.15; glucose-6-phosphate 1-epimerase [EC:5.1.3.15]
name: 106670716 glucose-6-phosphate 1-epimerase	K01792 E5.1.3.15; glucose-6-phosphate 1-epimerase [EC:5.1.3.15]
is_a: KEGG:00010 Glycolysis / Gluconeogenesis [PATH:clec00010] (clec00001)

In a recent discussion on the Tripal issue queue, @spficklin recommended that we parse this string into its constituent parts, particularly the ID field, which was the crux of that issue. As the example above shows, the ID and name fields in the generated OBO are nearly identical: the ID is the full name string with a KEGG: prefix.
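As a starting point for discussion, here is a minimal Python sketch of how the 'name' string might be split. It is not the logic in json_to_obo.php. The layout it assumes (gene ID, gene description, a tab, then KO ID, KO symbol, KO definition and an optional EC list) is inferred only from the two examples above, and `parse_kegg_name` and its field names are hypothetical.

```python
import re

# Assumed layout, inferred only from the two examples in this issue:
#   "<gene id> <gene description>\t<KO id> <KO symbol>; <KO definition> [EC:<numbers>]"
KO_PATTERN = re.compile(
    r"^(?P<ko_id>K\d{5})\s+(?P<ko_symbol>[^;]+);\s*(?P<ko_definition>.*?)"
    r"(?:\s*\[EC:(?P<ec>[^\]]+)\])?$"
)


def parse_kegg_name(name):
    """Split a KEGG 'name' string into its parts (hypothetical helper)."""
    # Everything before the tab describes the gene, everything after it the KO entry.
    gene_part, _, ko_part = name.partition("\t")
    gene_id, _, gene_description = gene_part.strip().partition(" ")
    fields = {"gene_id": gene_id, "gene_description": gene_description.strip()}

    match = KO_PATTERN.match(ko_part.strip())
    if match:
        fields["ko_id"] = match.group("ko_id")
        fields["ko_symbol"] = match.group("ko_symbol").strip()
        fields["ko_definition"] = match.group("ko_definition").strip()
        ec = match.group("ec")
        # Assumption: multiple EC numbers, if present, are space-separated.
        fields["ec_numbers"] = ec.split() if ec else []
    return fields


example = "106672041 glucose-6-phosphate isomerase\tK01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]"
print(parse_kegg_name(example))
```

Strings that do not follow the assumed layout fall back to just the gene ID and description, so the sketch would need checking against the full KEGG file before being relied on.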

Things to consider

ID should be a relatively short value
What terms the Tripal OBO importer supports
What term can be assigned to each portion of the long 'name' string
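To make the points above concrete, here is one possible mapping, sketched in Python rather than in the PHP script itself. It assumes the 'name' string has already been split into the fields shown, uses the gene ID as the short term ID, and puts the KO and EC identifiers in `xref` lines. `format_term`, the field names, and the `KO:` and `EC:` prefixes are all hypothetical, and whether the Tripal OBO importer handles `xref` this way still needs to be checked.

```python
def format_term(fields, parent_id):
    """Build one [Term] stanza from already-parsed fields (hypothetical mapping)."""
    lines = [
        "[Term]",
        # Short ID: only the gene ID, not the whole 'name' string.
        f"id: KEGG:{fields['gene_id']}",
        f"name: {fields['gene_description']}",
    ]
    # Assumption: KO and EC identifiers are carried as xref lines.
    if fields.get("ko_id"):
        lines.append(f"xref: KO:{fields['ko_id']}")
    for ec in fields.get("ec_numbers", []):
        lines.append(f"xref: EC:{ec}")
    lines.append(f"is_a: KEGG:{parent_id}")
    return "\n".join(lines)


# Field values taken from the converted example in this issue.
fields = {
    "gene_id": "106670716",
    "gene_description": "glucose-6-phosphate 1-epimerase",
    "ko_id": "K01792",
    "ec_numbers": ["5.1.3.15"],
}
print(format_term(fields, "00010"))
```

This keeps the ID short and stops the name from repeating it, but the choice of gene ID over KO ID as the term ID is only one option and is open for discussion.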
