
Minimalistic example of crawling any kind of web page, including a dynamic one page app


gittyeric/one-page-app-web-crawler


one-page-app-web-crawler

A minimal example of crawling any kind of web page, including a dynamic one-page app.

Setup

  1. You'll need to have Python 3 installed on your system. Be sure to check "Add Python environment variables" if asked during installation.

  2. Now either download this project or clone with git:

    git clone https://github.com/gittyeric/one-page-app-web-crawler.git

  3. Download the Selenium web driver for Chrome, ChromeDriver (you can use Firefox too, but you'll have to change the code a bit). This lets you control your browser with Python code.

  4. Drop the downloaded web driver file into the root of this project.

  5. From the command line, install the Selenium library for Python:

    pip install selenium
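
To check the setup before running the crawler, a small script along these lines may help. It is only a sketch and not part of this project: `find_driver`, `open_test_page`, the `DRIVER_NAMES` list, and the `https://example.com` test URL are all made up for illustration, and the browser launch assumes Selenium 4 (older Selenium 3 installs passed the driver path as `executable_path=` directly to `webdriver.Chrome`).

```python
from pathlib import Path

# Assumed file names for the downloaded driver; adjust to match your download.
DRIVER_NAMES = ("chromedriver", "chromedriver.exe")


def find_driver(project_root):
    """Return the path to the driver file in project_root, or None if absent."""
    root = Path(project_root)
    for name in DRIVER_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


def open_test_page(driver_path, url="https://example.com"):
    """Launch Chrome through Selenium, load one page, and return its title."""
    # Imported here so find_driver() still works before selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Selenium 4 style: the driver path is wrapped in a Service object.
    browser = webdriver.Chrome(service=Service(executable_path=str(driver_path)))
    try:
        browser.get(url)
        return browser.title
    finally:
        # Always close the browser, even if loading the page failed.
        browser.quit()
```

Calling `find_driver(".")` from the project root should return the driver path once step 4 is done, and passing that path to `open_test_page` should briefly open a Chrome window and return the page title.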

Running

Now run the Python script from the command line:

    cd path/to/project
    python clark_crawl.py

Clark what?

This is a simple example written for the Clark County PD. It runs through an inmate database, pulls out all the records page by page, and prints them at the end. To save the results another way, change the print statement in clark_crawl.py; to see how the crawl works, follow the code in crawler.py.
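
The page-by-page idea can be sketched roughly as follows. This is not the code in crawler.py, just one way such a loop might look. The names `crawl_pages`, `save_csv`, and `selenium_callbacks` are hypothetical, as are the CSS selectors you would pass in, and the wait after clicking "next" assumes the app replaces the old row elements when a new page loads.

```python
import csv


def crawl_pages(read_records, go_to_next_page, max_pages=1000):
    """Collect records page by page until there is no next page.

    read_records()    -> list of records on the page currently shown
    go_to_next_page() -> True if it moved to another page, False on the last
    """
    records = []
    for _ in range(max_pages):  # hard cap so a broken "next" button cannot loop forever
        records.extend(read_records())
        if not go_to_next_page():
            break
    return records


def save_csv(records, path):
    """Write a list of dicts to a CSV file; the first record's keys form the header."""
    if not records:
        return
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)


def selenium_callbacks(browser, row_selector, next_selector, timeout=10):
    """Build the two crawl_pages callbacks from a Selenium browser.

    row_selector and next_selector are hypothetical CSS selectors for the
    site being crawled.
    """
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def read_records():
        rows = browser.find_elements(By.CSS_SELECTOR, row_selector)
        return [{"text": row.text} for row in rows]

    def go_to_next_page():
        try:
            button = browser.find_element(By.CSS_SELECTOR, next_selector)
        except NoSuchElementException:
            return False
        if not button.is_enabled():
            return False
        old_rows = browser.find_elements(By.CSS_SELECTOR, row_selector)
        button.click()
        if old_rows:
            # Assumption: the app re-renders rows, so the old ones go stale.
            WebDriverWait(browser, timeout).until(EC.staleness_of(old_rows[0]))
        return True

    return read_records, go_to_next_page
```

With a browser already open on the first results page, something like `save_csv(crawl_pages(*selenium_callbacks(browser, "tr.row", "a.next")), "records.csv")` would replace the print statement. Keeping the loop separate from Selenium also means it can be tried out with plain Python functions standing in for the browser.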
