Race and Social Justice in Literature
- Brandy Knust
- Tyler Nguyen
Our project identifies the genre of each book on the Amazon Top 50 Bestselling list from 2009 to 2020 and uses an Extract, Transform, Load (ETL) process to determine whether sales of books in the Race and Social Justice genre have increased in the last 2-3 years.
- How many books about Race or Social Justice were in the top 50 in the past decade?
- Why is this important?
- Which other genres are more popular?
Kaggle (https://www.kaggle.com)
Source of the publicly available Amazon Top 50 Bestselling dataset, downloaded from Kaggle and loaded in a Jupyter Notebook.
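A minimal sketch of loading the dataset with Pandas. The file name `bestsellers.csv` and the column names (`Name`, `Author`, `Year`, `Genre`) are assumptions about the Kaggle download and are not confirmed by this README. A small inline sample with placeholder rows stands in for the real file so the snippet runs on its own.

```python
import io
import pandas as pd

# Placeholder rows standing in for the downloaded file. With the real data
# this would be pd.read_csv("bestsellers.csv") (hypothetical file name).
SAMPLE_CSV = """Name,Author,Year,Genre
Book A,Author One,2009,Non Fiction
Book B,Author Two,2009,Fiction
Book A,Author One,2010,Non Fiction
Book C,Author Three,2020,Non Fiction
"""

def load_bestsellers(source):
    """Read the bestseller list and trim whitespace from titles."""
    df = pd.read_csv(source)
    df["Name"] = df["Name"].str.strip()
    return df

bestsellers = load_bestsellers(io.StringIO(SAMPLE_CSV))

# A title can appear in several years, so the set of unique books to
# look up is smaller than the full list.
unique_titles = bestsellers.drop_duplicates(subset=["Name", "Author"])
print(len(bestsellers), "rows,", len(unique_titles), "unique books")
```

Deduplicating before any lookup keeps the number of pages to scrape down, since a book that stays on the list for several years only needs its genre fetched once.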
Goodreads (https://www.goodreads.com)
Source of publicly available book information, scraped for the details the analysis needs (such as each book's genre).
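A rough sketch of the parsing step with BeautifulSoup. The HTML fragment, the `genre` class name, and the `extract_genres` helper are all hypothetical; real Goodreads markup differs and changes over time, so the lookup would need to be adapted to the live page. Fetching the page (for example with Selenium) is left out so the snippet runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment. The tag and class names are assumptions,
# not the actual Goodreads markup.
SAMPLE_HTML = """
<div class="book">
  <h1 class="title">Example Title</h1>
  <a class="genre" href="/genres/nonfiction">Nonfiction</a>
  <a class="genre" href="/genres/social-justice">Social Justice</a>
</div>
"""

def extract_genres(html):
    """Return the genre labels found in one book page, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.find_all("a", class_="genre")]

genres = extract_genres(SAMPLE_HTML)
print(genres)
```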
- ETLProject_v2.ipynb - Main notebook that extracts and analyzes the data
- Book_Scrape_v2.ipynb - Scrapes book data and exports it
- SQLETL.sql - Cleans up the data
- Data identification
- Data scraping and extraction (Selenium, BeautifulSoup, Pandas)
- Data cleanup (SQL, Pandas)
- Data aggregation (SQL, Pandas)
- Data analysis
- Data visualization (Pandas)
- Summary
- Documentation
- Presentation
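The aggregation step in the list above could look roughly like the following Pandas sketch. The column names, the `TARGET_GENRES` labels, and every row are illustrative assumptions, not the project's actual schema or results.

```python
import pandas as pd

# Illustrative rows only; these are not results from the project.
bestsellers = pd.DataFrame({
    "Name": ["Book A", "Book B", "Book A", "Book C", "Book D"],
    "Year": [2018, 2018, 2019, 2020, 2020],
})
scraped = pd.DataFrame({
    "Name": ["Book A", "Book B", "Book C", "Book D"],
    "ScrapedGenre": ["Memoir", "Fantasy", "Social Justice", "Race"],
})

TARGET_GENRES = {"Race", "Social Justice"}  # assumed genre labels

# A left join keeps every bestseller row even if scraping found no genre.
merged = bestsellers.merge(scraped, on="Name", how="left")
merged["InTargetGenre"] = merged["ScrapedGenre"].isin(TARGET_GENRES)

# Number of list entries per year that fall in the target genres.
per_year = merged.groupby("Year")["InTargetGenre"].sum()
print(per_year)
```

A series shaped like `per_year` can be passed straight to Pandas plotting (for example `per_year.plot(kind="bar")`, which requires matplotlib) for the visualization step.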