TODO
hrybacki edited this page Mar 24, 2013 · 5 revisions
- Be ready for people to help.
- Write some working examples that show how our code works
- Build a large test bank for people to work with and build grammars for.
- Document class
- Create a pretty print function for displaying Document contents
- Parser classes
- Complete "abbreviated journal" to "unabbreviated journal" tool.
- NOTE: Look into what PubMed has already done toward this
- Fetcher classes
- Pubmed
- Need to determine resolutionToken lifespan
- Store meta-collection data, i.e. query used, source obtained from, and timestamp
- Need to know who, when, and from which batch, i.e. user.datetime.query.pointers to all documents/raw data collected.
- Think about saving all captured information to disk -- json?
- local storage?
- database?
- used in conjunction with merges/conflict resolutions
- CorpusController class
- Set up logging?
- Database stuff
- Think about an optimistic insert or random ID
- Improve document merging / conflict resolution
- Consider Bloom filter vs hash for DB queries
- Modify db class to accept which database to interface with
- Modify db.add_or_update() to return the objectID of the Document inserted into the DB?
- Task Queue
- Implement and test Celery/RabbitMQ with citation engine
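
One possible shape for the meta-collection data listed under Fetcher classes, saved to disk as JSON. This is only a sketch to get discussion going: `CollectionRecord`, `save_record`, `load_record`, and all field names are hypothetical and do not exist in the codebase yet. It assumes one record per fetch batch, holding who ran it, when, the query, the source, and pointers (IDs) to the collected documents.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from pathlib import Path
from typing import List


def _utc_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class CollectionRecord:
    """Hypothetical description of one fetch batch (not an existing class)."""
    user: str                      # who ran the fetch
    query: str                     # query used
    source: str                    # source obtained from, e.g. "pubmed"
    timestamp: str = field(default_factory=_utc_now)       # when
    document_ids: List[str] = field(default_factory=list)  # pointers to documents

    def key(self) -> str:
        """Build a user.datetime.query style identifier for the batch."""
        return f"{self.user}.{self.timestamp}.{self.query}"


def save_record(record: CollectionRecord, path) -> Path:
    """Write the record to `path` as JSON and return the path."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump(asdict(record), handle, indent=2)
    return path


def load_record(path) -> CollectionRecord:
    """Read a record previously written by save_record."""
    with Path(path).open("r", encoding="utf-8") as handle:
        return CollectionRecord(**json.load(handle))
```

Keeping the record as plain JSON would leave the local storage vs. database question open, since the same dictionary could later be inserted into a database without changing the fetchers.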