Collection, geolocalisation and topic modeling on crime-related news articles

parasmehta/newsarticles

newsarticles

Data collection

This software collects crime-related information (news articles) from the following websites:

London: Possible source - http://www.london24.com/news/crime

Rome:

Sofia:

The collected information is stored in a separate .csv file for each website.
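As a minimal sketch of the per-website storage step, the snippet below writes a list of collected articles to one .csv file per site. The field names (`url`, `title`, `published`, `text`) and the function name `save_articles` are assumptions for illustration; the README does not specify the actual schema.

```python
import csv
from pathlib import Path

# Hypothetical field names; the README does not specify the CSV schema.
FIELDS = ["url", "title", "published", "text"]


def save_articles(articles, site, out_dir="."):
    """Write the articles collected from one website to <site>.csv."""
    path = Path(out_dir) / f"{site}.csv"
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for article in articles:
            # Fields missing from an article are written as empty strings.
            writer.writerow({name: article.get(name, "") for name in FIELDS})
    return path
```

Calling `save_articles(articles, "london")` would then produce `london.csv` in the current directory.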


Data Processing

Some basic information is extracted from the collected articles in the .csv files:

  • type of crime / category
  • location of mentioned crime
  • time
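The extraction of category and time could be sketched roughly as follows. This is a simplified keyword and regular-expression stand-in, not the project's actual method (which relies on the tools listed in this README); the keyword table, the date pattern and the function name `extract_basic_info` are all assumptions. Location extraction is left out because it needs a geotagger.

```python
import re

# Hypothetical keyword table; the README does not list the crime categories.
CATEGORY_KEYWORDS = {
    "burglary": ("burglary", "break-in", "burgled"),
    "robbery": ("robbery", "robbed", "mugging"),
    "assault": ("assault", "stabbed", "attacked"),
}

# Matches dates such as "12 March 2015"; other formats are ignored.
DATE_RE = re.compile(
    r"\b(\d{1,2}) (January|February|March|April|May|June|July|August|"
    r"September|October|November|December) (\d{4})\b"
)


def extract_basic_info(text):
    """Return the first matching crime category and date found in text."""
    lowered = text.lower()
    category = None
    for name, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            category = name
            break
    match = DATE_RE.search(text)
    return {"category": category, "time": match.group(0) if match else None}
```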

Important: Database access for user "agdb_mem" on esel.imp.fu-berlin.de has to be set up!


Queries for roads and admin levels:

ROADS:

ADMIN_LVLS:

  • SELECT distinct a.id, a.name, a.admin_level, ST_astext(a.geometry) FROM import.osm_admin a, import.osm_admin b WHERE (ST_contains(a.geometry,b.geometry) OR ST_intersects(a.geometry,b.geometry)) AND a.name = 'Roma' AND b.name != ''
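To run the admin-level query for cities other than Rome, the city name can be passed as a parameter instead of being hard-coded. The sketch below only builds the query and its parameters; it assumes a DB-API driver that uses `%s` placeholders (psycopg2, for example), which the README does not name, and `admin_levels_query` is a hypothetical helper.

```python
# Same query as above, with the city name replaced by a placeholder.
ADMIN_LEVELS_SQL = """
SELECT DISTINCT a.id, a.name, a.admin_level, ST_AsText(a.geometry)
FROM import.osm_admin a, import.osm_admin b
WHERE (ST_Contains(a.geometry, b.geometry)
       OR ST_Intersects(a.geometry, b.geometry))
  AND a.name = %s
  AND b.name != ''
"""


def admin_levels_query(city):
    """Return the SQL text and parameter tuple for one city."""
    return ADMIN_LEVELS_SQL, (city,)
```

With an open cursor, the call would look like `cursor.execute(*admin_levels_query("Roma"))`, which lets the driver handle quoting of the city name.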

The following tools are used for processing:

  1. NLTK for text mining: http://www.nltk.org/book/ch07.html
  2. Scikit-learn for machine learning: http://scikit-learn.org/stable/
  3. Gensim package for topic modeling
  4. Clavin for geotagging
