This project analyses the Android app market by comparing over 10,000 apps on Google Play across different categories.
We'll look for insights in the data to devise strategies to drive growth and retention when building a mobile app.
This project is based on an exercise in the "Data Scientist with Python" track on DataCamp, which I am currently taking (Jan 2021), with various additions and changes of my own.
The dataset comprises two files:
- apps.csv: details of the applications on Google Play. There are 13 features that describe any given app in the dataset.
- user_reviews.csv: contains 100 reviews for each app, most helpful first. The text in each review has been pre-processed and annotated with three new features: Sentiment (Positive, Negative or Neutral), Sentiment Polarity and Sentiment Subjectivity.
The project demonstrates various data analysis and visualisation techniques, including:
- importing data from csv files;
- merging tables;
- cleaning and reshaping data;
- exploratory and summary statistical analysis;
- data visualisation with Matplotlib and seaborn;
- how to pose commercial questions and define assumptions in order to derive meaningful business insights from a large dataset.
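The first few techniques in the list above (importing, cleaning and merging) can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`App`, `Category`, `Installs`, `Price`, `Sentiment_Polarity`), the sample rows and the helper `load_and_merge` are all assumptions made for the example, and small in-memory strings stand in for apps.csv and user_reviews.csv so that the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for apps.csv and user_reviews.csv.
# Column names and values are illustrative assumptions, not the real data.
APPS_CSV = """App,Category,Rating,Installs,Price
Alpha,GAME,4.5,"1,000+",$0.99
Beta,TOOLS,3.8,"10,000+",0
Beta,TOOLS,3.8,"10,000+",0
"""

REVIEWS_CSV = """App,Sentiment,Sentiment_Polarity
Alpha,Positive,0.6
Alpha,Negative,-0.2
Beta,Neutral,0.0
"""


def load_and_merge(apps_source, reviews_source):
    """Hypothetical helper: read both tables, clean the apps table, then join."""
    apps = pd.read_csv(apps_source)
    reviews = pd.read_csv(reviews_source)

    # Drop repeated listings of the same app.
    apps = apps.drop_duplicates(subset="App").copy()

    # Strip ',', '+' and '$' so the columns can be treated as numbers.
    for col in ("Installs", "Price"):
        cleaned = apps[col].astype(str).str.replace(r"[,+$]", "", regex=True)
        apps[col] = cleaned.astype(float)

    # Keep only apps that have at least one review.
    merged = apps.merge(reviews, on="App", how="inner")
    return apps, merged


apps, merged = load_and_merge(io.StringIO(APPS_CSV), io.StringIO(REVIEWS_CSV))

# Simple summary statistic: average review polarity per category.
mean_polarity = merged.groupby("Category")["Sentiment_Polarity"].mean()
print(mean_polarity)
```

With the real files, the `io.StringIO(...)` arguments would be replaced by the paths to the two CSVs, and the cleaning step would need adjusting to whatever formatting the actual columns use.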