EventPoints is one of the projects developed within the Open Source Weekends community.
EventPoints is a web calendar of technological events in Spain that uses scraping techniques to read the available events from various sources.
EventPoints has two main repositories:
- Backend: This repository contains both the scraper and the API code that serves the events.
- Frontend: This repository contains a React application that uses this project's API to query, filter and georeference the different events.
There is also a Spanish version of this Readme here.
The repository has two fundamental blocks structured in two different folders:
- api: REST API developed with Node.js, Pillarsjs and a GoblinDB database that serves the information obtained by the scrapers.
- scrapers: Source code of the project's different scrapers, which store the information once it has been obtained.
The API can be found in the `api` directory. Run `npm install` from within the `api` directory to install dependencies.
API documentation, created using Swagger, can be accessed at: http://localhost:3000/api/v1/spec
To start, run the API using `npm start`, then access it at http://localhost:3000/api/v1/events
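As a quick check that the API is up, the following minimal Python sketch (assuming the API is running locally on port 3000, responds with JSON, and using the third-party requests library) queries the events endpoint and prints the result:

```python
import requests

# Minimal sketch: query the local EventPoints API for the list of events.
# Assumes the API is running on http://localhost:3000 and responds with JSON.
response = requests.get("http://localhost:3000/api/v1/events")
response.raise_for_status()  # fail loudly if the API returned an error status
print(response.json())
```

The same endpoint can also be queried from a browser or with curl; the exact shape of the returned JSON is described in the Swagger spec linked above.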
There are several scrapers developed in Python using the Scrapy library.
- Recommended Python version: 3.6*
- Recommended version of Pip: 18.1
Create a virtual environment:
python3 -m venv ./venv
Install the dependencies:
pip3 install -r requirements.txt
If you're using Python 3.7, you'll get an error that you can solve by running
pip3 install git+https://github.com/twisted/twisted.git@trunk
To run the scraper use:
scrapy crawl {spider_name} -o {json_path}
Here `spider_name` is the name of the spider and `json_path` is the JSON file into which the scraped data will be dumped.
As an example, the following command will run the spider named `meetup` and dump any scraped data into `output/meetup.json`:
scrapy crawl meetup -o output/meetup.json
Scrapy appends data to the end of the JSON file at `json_path`. Therefore, you should delete the JSON file before each execution.
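For reference, the spiders follow the standard Scrapy structure. The sketch below is purely illustrative (the spider name, start URL and CSS selectors are hypothetical, not the actual EventPoints code); a real spider yields one item per event found on the source page, and those items are what end up in the JSON output:

```python
import scrapy


class ExampleEventsSpider(scrapy.Spider):
    """Hypothetical sketch of a Scrapy spider; name, URL and selectors are illustrative."""

    name = "example_events"
    start_urls = ["https://example.com/events"]  # placeholder event listing page

    def parse(self, response):
        # Turn each matched event node into one scraped item (a plain dict).
        for event in response.css("div.event"):
            yield {
                "title": event.css("h2::text").get(),
                "date": event.css("time::attr(datetime)").get(),
                "url": event.css("a::attr(href)").get(),
            }
```

Running `scrapy crawl example_events -o output/example_events.json` would then dump those items to JSON exactly as described above.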
If you want to run the scrapers using R (on Debian-based Linux), you should first install the R language.
- Install the prerequisite system dependencies.
apt install libcurl4-openssl-dev libssl-dev libxml2-dev
- Install R.
apt install r-base
- Run the R environment (using sudo).
sudo -i R
- Install scraper dependencies.
install.packages("tidyverse")
To run one of the R spiders from the console, execute the following command:
R CMD BATCH {spider_name}.R {json_path}
Where `spider_name` is the name of the spider and `json_path` is the JSON file into which the scraped data will be dumped.
- Facilitator:
- Daniel García (Slack:@DGJones / GitHub:@danielgj)
- Mentors:
- Daniel García (Slack:@DGJones / GitHub:@danielgj)
- Jorge Baumann (Slack:@jbaumann / GitHub:@baumannzone)
- Ricardo García-Duarte (Slack:@RicardoGDM / GitHub:@rgarciaduarte)
- Theba Gomez (Slack:@KoolTheba / GitHub:@KoolTheba)
- Ulises Gascon (Slack:@ulisesgascon / GitHub:@UlisesGascon)
- Slack channel: #pr_eventpoints_new