virtualenv -p python env
source env/bin/activate
pip install -r requirements.txt
python <scraper_name>.py --chrome-driver-path=/path/to/driver
Note: some scrapers use `webdriver.Chrome` to avoid being blocked by certain websites; that said, you are welcome to change this and use your preferred browser and its corresponding driver (a Firefox variant is sketched below).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def get_browser():
    # browser_driver_path is set from the --chrome-driver-path argument
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--headless')
    if browser_driver_path:
        # note: Selenium 4 replaces executable_path with a Service object
        return webdriver.Chrome(options=chrome_options,
                                executable_path=browser_driver_path)
    return webdriver.Chrome(options=chrome_options)
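`browser_driver_path` is a module-level variable, presumably populated from the `--chrome-driver-path` command-line argument shown above. A hypothetical sketch of that wiring with `argparse` (the scrapers' actual argument handling may differ):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--chrome-driver-path', default=None,
                    help='path to the chromedriver binary')
args = parser.parse_args()
# argparse converts dashes in flag names to underscores
browser_driver_path = args.chrome_driver_path
```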
If no driver path is provided, the driver binary is assumed to be discoverable via the `PATH` environment variable.
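If you prefer a different browser, the change is confined to `get_browser()`. Below is a minimal sketch assuming Firefox and geckodriver; this is an illustration, not part of the project, and `browser_driver_path` mirrors the module-level variable used above:

```python
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def get_browser():
    firefox_options = Options()
    firefox_options.add_argument('--headless')
    if browser_driver_path:
        # geckodriver path, analogous to --chrome-driver-path above
        return webdriver.Firefox(options=firefox_options,
                                 executable_path=browser_driver_path)
    return webdriver.Firefox(options=firefox_options)
```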
- `scrape-listings-lavoz.py` : scrapes one-bedroom ("1 dorm"), non-seasonal apartment listings from clasificados.lavoz.com.ar and produces a CSV file with information about each apartment, plus a histogram of the price distribution.
- `scrape-listings-mercadolibre.py` : scrapes one-bedroom, non-seasonal apartment listings from mercadolibre.com.ar and produces a CSV file with information about each apartment, plus a histogram of the price distribution (a sketch for inspecting the output follows this list).
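For a quick look at a scraper's output, something like the following works. This is a sketch only: the CSV filename and the `price` column name are assumptions, so check the header of the file your scraper actually produced:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('listings.csv')          # hypothetical output filename
df['price'].plot(kind='hist', bins=30)    # 'price' column is an assumption
plt.xlabel('price')
plt.savefig('price-histogram.png')
```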
Alternatively, you can run the scrapers in the provided ready-to-use Docker image.
docker-compose build
docker-compose up -d
Then, inside the container (`docker attach scraping_scraper_1`), you can run a scraper with:
python <scraper_name>.py
The project directory is mounted as a volume for ease of use, so any files created by a scraper are immediately visible on the host.