hrybacki edited this page Mar 24, 2013 · 5 revisions

CitationEngine

Short term todo list:

  • Be ready for people to help.
    • Write some working examples that show how our code is used
    • Build a large test bank for people to work with and build grammars for.

Long term todo list:

  • Document class
    • Create a pretty print function for displaying Document contents
  • Parser classes
    • Complete the tool that converts abbreviated journal names to unabbreviated journal names.
      • NOTE: Look into what PubMed has already done toward this
  • Fetcher classes
    • PubMed
      • Need to determine resolutionToken lifespan
    • Store meta-collection data, i.e. the query used, the source it was obtained from, and a timestamp
      • Need to know who, when, and from which batch, e.g. a record of the form user.datetime.query.pointers, where the pointers reference all documents/raw data collected.
        • Think about saving all captured information to disk -- json?
        • local storage?
        • database?
        • used in conjunction with merges/conflict resolutions
  • CorpusController class
    • Set up logging?
  • Database stuff
    • Think about an optimistic insert or random ID
    • Improve document merging / conflict resolution
    • Consider Bloom filter vs hash for DB queries
    • Modify db class to accept which database to interface with
    • Modify db.add_or_update() to return the objectID of the Document inserted into the DB?
  • Task Queue
    • Implement and test Celery/RabbitMQ with CitationEngine
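
One possible shape for the "Store meta-collection data" item under Fetcher classes, assuming the JSON-on-disk option is chosen. This is only a sketch: the `CollectionRecord` class, its field names, and the document ID strings are hypothetical and do not exist in the current code.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class CollectionRecord:
    """Hypothetical metadata for one fetch batch (who, when, which query)."""
    user: str      # who ran the fetch
    query: str     # query string sent to the source
    source: str    # where the data came from, e.g. "pubmed"
    # ISO 8601 UTC timestamp, filled in at creation time
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    # pointers (assumed here to be plain string IDs) to the collected documents
    document_ids: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the record to a JSON string suitable for writing to disk."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "CollectionRecord":
        """Rebuild a record from the JSON produced by to_json()."""
        return cls(**json.loads(text))


record = CollectionRecord(
    user="hrybacki",
    query="example query",
    source="pubmed",
    document_ids=["doc-1", "doc-2"],
)
restored = CollectionRecord.from_json(record.to_json())
```

Keeping the record as flat JSON would let the same structure be written to local storage now and moved into a database later, and the stored document pointers could be consulted during merges/conflict resolution.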