json_to_obo.php script parsing #22

Ferrisx4 · 2021-04-01T18:53:04Z

Discussion:

Currently, the JSON file pulled down from KEGG packs several pieces of information into one long string in the 'name' key. For example:

"name":"106672041 glucose-6-phosphate isomerase\tK01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]"

When converted with the script, an entry like this becomes the following (the stanza shown is for a different gene, 106670716):

[Term]
id: KEGG:106670716 glucose-6-phosphate 1-epimerase	K01792 E5.1.3.15; glucose-6-phosphate 1-epimerase [EC:5.1.3.15]
name: 106670716 glucose-6-phosphate 1-epimerase	K01792 E5.1.3.15; glucose-6-phosphate 1-epimerase [EC:5.1.3.15]
is_a: KEGG:00010 Glycolysis / Gluconeogenesis [PATH:clec00010] (clec00001)

In a recent discussion on the Tripal git issue queue, @spficklin recommended that we parse this string into its constituent parts, particularly the ID field, which was the crux of that issue. As seen in the example above, the id and name fields in the generated OBO are identical apart from the KEGG: prefix, so the ID ends up carrying the entire name string.

Things to consider

ID should be a relatively short value
What terms the Tripal OBO importer supports
What term can be assigned to each portion of the long 'name' string
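
To make the considerations above concrete, here is a minimal sketch of how the 'name' string might be split and written back out as a shorter stanza. It is written in Python for illustration only and is not the logic of json_to_obo.php. The layout it assumes (gene ID, gene description, a tab, then KO ID, symbol, KO description and an optional EC list) is inferred from the two examples in this issue. The helper names `parse_kegg_name` and `to_obo_term`, the `KO:` and `EC:` xref prefixes, and the choice of the numeric gene ID as the term ID are all assumptions, and whether the Tripal OBO importer accepts `xref` lines would still need to be checked.

```python
import re

# Assumed layout, inferred from the examples in this issue:
# "<gene id> <gene description>\t<KO id> <symbol>; <KO description> [EC:<numbers>]"
NAME_RE = re.compile(
    r"^(?P<gene_id>\S+)\s+(?P<gene_name>[^\t]*)"
    r"(?:\t(?P<ko_id>K\d{5})\s+(?P<ko_symbol>[^;]+);\s*(?P<ko_name>.*?)"
    r"(?:\s*\[EC:(?P<ec>[^\]]+)\])?)?$"
)


def parse_kegg_name(raw):
    """Split a KEGG 'name' string into its parts; return None if it does not match."""
    match = NAME_RE.match(raw.strip())
    if match is None:
        return None
    parts = match.groupdict()
    parts["gene_name"] = parts["gene_name"].strip()
    # An entry may list several EC numbers separated by spaces.
    parts["ec"] = parts["ec"].split() if parts["ec"] else []
    return parts


def to_obo_term(parts, parent_id=None):
    """Build an OBO [Term] stanza with a short ID (hypothetical tag mapping)."""
    lines = [
        "[Term]",
        f"id: KEGG:{parts['gene_id']}",
        f"name: {parts['gene_name']}",
    ]
    if parts["ko_id"]:
        lines.append(f"xref: KO:{parts['ko_id']}")  # prefix is an assumption
    for ec in parts["ec"]:
        lines.append(f"xref: EC:{ec}")  # prefix is an assumption
    if parent_id:
        lines.append(f"is_a: {parent_id}")
    return "\n".join(lines)


raw = "106672041 glucose-6-phosphate isomerase\tK01810 GPI; glucose-6-phosphate isomerase [EC:5.3.1.9]"
parts = parse_kegg_name(raw)
print(to_obo_term(parts, parent_id="KEGG:00010"))
```

With this mapping the ID stays short (KEGG:106672041) and the KO and EC identifiers remain available as separate lines instead of being buried in the name.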
