This project is a collection of scrapers dedicated to browsing and collecting data for geospatial analysis
using different tools such as Python, R, Microsoft Excel, Tableau, and Power BI, some of which will be implemented later in separate repositories (to be linked later).
Geospatial data are becoming increasingly essential for people around the world, and data suitable for electronic visualization are becoming more accessible and useful every day. The collection of scrapers stores most of the data in GeoJSON
format, a widely accepted standard for geospatial data visualization
on many platforms.
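
For illustration only (the actual item schema of this project is not shown here), a scraped record serialized as a GeoJSON Feature might look like this:

```python
import json

# Hypothetical example of one scraped record written out as GeoJSON;
# the property names and coordinates are placeholders, not the project's real schema.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-73.9857, 40.7484]},
    "properties": {"name": "Example location", "scraped_at": "2024-01-01T00:00:00Z"},
}

feature_collection = {"type": "FeatureCollection", "features": [feature]}

with open("example.geojson", "w") as f:
    json.dump(feature_collection, f, indent=2)
```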
A report generation script is also integrated into the scraper; it produces real-time reports on the network performance of the machine running the scraper, which is useful for monitoring scraper health and performance. At the end of a run, the last frame of the report is saved so it can be analyzed together with the daily reports accumulated over time.
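
As a rough sketch of that idea (the function name, output path, and metric source below are assumptions, and `psutil` is not among the project's listed dependencies; it only stands in for whatever the real script samples), a live network report that saves its final frame might look like this:

```python
import datetime
import os
import time

import matplotlib.pyplot as plt
import psutil  # stand-in metric source for this sketch, not a documented project dependency

# Hypothetical report loop: sample network counters, redraw a live plot,
# and save the final frame for the daily archive.
def run_network_report(duration_s: int = 60, interval_s: float = 1.0) -> None:
    os.makedirs("reports", exist_ok=True)
    start = psutil.net_io_counters()
    times, sent, recv = [], [], []

    plt.ion()
    fig, ax = plt.subplots()
    for _ in range(int(duration_s / interval_s)):
        time.sleep(interval_s)
        now = psutil.net_io_counters()
        times.append(datetime.datetime.now())
        sent.append(now.bytes_sent - start.bytes_sent)
        recv.append(now.bytes_recv - start.bytes_recv)

        ax.clear()
        ax.plot(times, sent, label="bytes sent")
        ax.plot(times, recv, label="bytes received")
        ax.set_xlabel("time")
        ax.set_ylabel("bytes since start")
        ax.legend()
        plt.pause(0.01)

    # Keep the last frame so it can be analyzed alongside earlier daily reports.
    fig.savefig(f"reports/network-report-{datetime.date.today()}.png")
```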
Scraped data, logs, and reports are stored in a way that keeps track of how information accumulates day by day. This makes it possible to analyze the application and implications of the product over the long run and to customize it according to the results of that analysis.
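
As one possible illustration of such a layout (the directory names below are placeholders, not necessarily the ones the project uses), daily folders could be created like this:

```python
import datetime
import pathlib

# Hypothetical date-based layout for scraped data, logs, and reports.
def daily_path(kind: str, base: str = ".") -> pathlib.Path:
    today = datetime.date.today().isoformat()  # e.g. "2024-01-01"
    path = pathlib.Path(base) / kind / today
    path.mkdir(parents=True, exist_ok=True)
    return path

data_dir = daily_path("data")        # ./data/2024-01-01/
logs_dir = daily_path("logs")        # ./logs/2024-01-01/
reports_dir = daily_path("reports")  # ./reports/2024-01-01/
```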
One spider is dedicated to logging the user into the system; the remaining spiders then collect data from the target website or websites. The user only needs to provide the credentials, in the right format, as environment variables available to the scraper environment.
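
As a sketch of how that could look (the spider name, login URL, form field names, and environment variable names below are assumptions, not the project's actual values), a Scrapy login spider reading credentials from the environment might be written as:

```python
import os

import scrapy

# Hypothetical login spider: names, URL, and form fields are placeholders.
class LoginSpider(scrapy.Spider):
    name = "login"
    start_urls = ["https://example.com/login"]

    def parse(self, response):
        return scrapy.FormRequest.from_response(
            response,
            formdata={
                "username": os.environ["SCRAPER_USERNAME"],
                "password": os.environ["SCRAPER_PASSWORD"],
            },
            callback=self.after_login,
        )

    def after_login(self, response):
        if b"authentication failed" in response.body:
            self.logger.error("Login failed")
            return
        # From here, the authenticated session is available to subsequent requests.
```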
This collection of scrapers is built solely on Python and a few modules that are publicly available to everyone:
- Python
- Scrapy
- Pandas
- Matplotlib
The user will also need to install MongoDB on the machine where the scraper will run, or have a MongoDB cluster hosted on a remote machine. If storing collected and pre-processed data in MongoDB is not desired, it can be turned off from the geo/settings.py script.
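
The exact setting names in geo/settings.py are not documented here, but in a typical Scrapy project the MongoDB pipeline is enabled or disabled through the ITEM_PIPELINES setting; a hedged sketch:

```python
# geo/settings.py (sketch): the pipeline class path and setting names below are
# assumptions for illustration; check the actual settings file for the real names.
MONGO_URI = "mongodb://localhost:27017"
MONGO_DATABASE = "geo"

ITEM_PIPELINES = {
    # Comment out (or remove) this entry to stop writing items to MongoDB.
    "geo.pipelines.MongoPipeline": 300,
}
```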