bionlp

This project is a regular-expression-based information extraction system that extracts chemical reactions from biological publications. The code depends on several datasets and REST APIs that are not publicly available; examples of the structure and contents of these datasets are included in the report.
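As a rough illustration of the regular-expression approach, the sketch below extracts (substrate, product, enzyme) triples from one hypothetical sentence template, "X is converted to Y by Z". The `CHEM` token approximation, the `CONVERSION` pattern and the `extract_reactions` function are assumptions made for this example; the project's real patterns live in patterns.py and are not reproduced here.

```python
import re

# Hypothetical token for a chemical name: word characters plus the hyphens,
# commas, parentheses and primes common in names such as "2,3-bisphosphoglycerate".
# The last character may not be a comma or hyphen, so trailing punctuation is left out.
CHEM = r"[\w\-(),']*[\w)']"

# Hypothetical template: "<substrate> is converted/transformed (in)to <product> [by <enzyme>]".
CONVERSION = re.compile(
    rf"(?P<substrate>{CHEM})\s+is\s+(?:converted|transformed)\s+(?:in)?to\s+"
    rf"(?P<product>{CHEM})(?:\s+by\s+(?P<enzyme>[^.;,]+))?"
)


def extract_reactions(sentence):
    """Return (substrate, product, enzyme) tuples; enzyme is None when absent."""
    return [
        (m.group("substrate"), m.group("product"), m.group("enzyme"))
        for m in CONVERSION.finditer(sentence)
    ]


print(extract_reactions("L-tyrosine is converted to L-DOPA by tyrosine hydroxylase."))
# [('L-tyrosine', 'L-DOPA', 'tyrosine hydroxylase')]
```

A single template like this covers only one phrasing, so a pattern-based extractor typically combines many of them.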

See the report for a project overview and evaluation results: https://docs.google.com/document/d/110WzRsLv0m4Aof3l3V1BoJllMLr_t6i6ML1tuyv46qU/edit

Where to start

train.py - methods to generate the training set
patterns.py - the set of regular-expression patterns, along with basic test cases
evaluate.py - applies the patterns to the training set and computes statistics
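The README does not say which statistics evaluate.py reports. A common choice for pattern-based extraction is micro-averaged precision, recall and F1 against hand-labeled sentences. The sketch below computes those, assuming a hypothetical training-set format of (sentence, set of gold reaction tuples) pairs and an extractor callable that returns tuples of the same shape; `evaluate` and `toy_extract` are illustrative names, not the project's own functions.

```python
import re


def evaluate(examples, extract):
    """Micro-averaged precision, recall and F1 of `extract` over labeled examples.

    `examples` is an iterable of (sentence, gold) pairs, where `gold` is a set of
    reaction tuples (a hypothetical format); `extract(sentence)` must return an
    iterable of tuples in the same shape.
    """
    tp = fp = fn = 0
    for sentence, gold in examples:
        predicted = set(extract(sentence))
        gold = set(gold)
        tp += len(predicted & gold)  # extracted and correct
        fp += len(predicted - gold)  # extracted but wrong
        fn += len(gold - predicted)  # present in the gold labels but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "tp": tp, "fp": fp, "fn": fn}


if __name__ == "__main__":
    # Toy extractor built from a single hypothetical pattern.
    def toy_extract(sentence):
        m = re.match(r"(\w+) is converted to (\w+)", sentence)
        return [(m.group(1), m.group(2))] if m else []

    examples = [
        ("A is converted to B.", {("A", "B")}),
        ("C is converted to D.", {("C", "D")}),
        ("E yields F.", {("E", "F")}),  # a phrasing the pattern does not cover
    ]
    print(evaluate(examples, toy_extract))
```

On the toy data the uncovered third sentence counts as one false negative, which lowers recall while leaving precision at 1.0.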

Utilities

chem_canonicalizer.py - uses Indigo to convert between InChI and SMILES notation
smiles_map.py - a cache mapping chemical names to SMILES, generated via CIR (the NCI Chemical Identifier Resolver)
inchi_map.py - a cache mapping chemical names to InChI, generated via CIR
chemtagger.py - a simple client for the ChemicalTagger API (a custom API wrapper around the ChemicalTagger library, http://chemicaltagger.ch.cam.ac.uk/)
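The name caches above are described only as being generated via CIR. The sketch below shows one way such a cache could work: look a name up locally first and query CIR's public URL scheme (`https://cactus.nci.nih.gov/chemical/structure/<name>/<representation>`) only on a miss. The `NameCache` class, the `cir_resolve` helper and the JSON file format are hypothetical; the repository's own caches may be plain Python modules, and the bundled cirpy.py may be what actually talks to CIR.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

# Public CIR URL scheme: /chemical/structure/<identifier>/<representation>.
CIR_URL = "https://cactus.nci.nih.gov/chemical/structure/{name}/{representation}"


def cir_resolve(name, representation="smiles", timeout=10):
    """Ask CIR for one name; return the first result line, or None if unresolved."""
    url = CIR_URL.format(name=urllib.parse.quote(name, safe=""),
                         representation=representation)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            text = resp.read().decode("utf-8").strip()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: CIR answers 404 for names it cannot resolve
            return None
        raise
    # Some names resolve to several structures; this sketch keeps the first one.
    return text.splitlines()[0] if text else None


class NameCache:
    """Hypothetical name -> structure cache, optionally persisted as JSON."""

    def __init__(self, path=None, resolve=cir_resolve):
        self.path = path
        self.resolve = resolve
        self.data = {}
        if path and os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def get(self, name):
        # Query the resolver only on a cache miss; names that fail to resolve
        # are stored as None so they are not requested again.
        if name not in self.data:
            self.data[name] = self.resolve(name)
            if self.path:
                with open(self.path, "w", encoding="utf-8") as f:
                    json.dump(self.data, f, indent=1, sort_keys=True)
        return self.data[name]
```

Injecting `resolve` keeps the cache testable offline; an InChI cache could pass a lambda that calls `cir_resolve` with a different representation string (for example `stdinchi`, per the CIR documentation).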

Repository contents

indigo/
.gitignore
README.md
chem_canonicalizer.py
chemtagger.py
cirpy.py
evaluate.py
inchi_map.py
parse_utils.py
patterns.py
smiles_map.py
train.py
utils.py

Repository: jtsui/bionlp