This project applies analytics to predict cab fares from historical data collected during a pilot project.

Train Data:
- Variables: pickup_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count
- Target: fare_amount

Test Data:
- Test data requiring fare prediction

Original Predictions:
- Python predictions: Original_with_fare&Dist_python.csv
- R predictions: Original_with_fare_amount&Dist_R.csv

Project Reports:
- Detailed project report: Project Report_Cab_Rental

Code Files:
- Python code: Project_2.ipynb
- R code: Project_2_R

Data Preprocessing and Modeling:
- Identified and handled missing values in the dataset.
- Derived new variables (Month, Year, Time (Hrs), Day, Day/Night) from the pickup_datetime variable.
- Calculated Distance_Km from the pickup and dropoff coordinates.
- Filtered the data using reasonable thresholds for passenger_count, fare_amount, and Distance_Km.
- Used boxplots to detect outliers in selected variables.
- Converted passenger_count, Month, Year, Day, and Day/Night into factor variables.
- Removed pickup_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, and dropoff_latitude based on heatmap analysis.
- Created dummy variables for the factor variables Month, Year, Day, Day/Night, and passenger_count so they could be used as numeric model inputs.
- Applied a Linear Regression (LR) model in Python and a Random Forest model in R to predict fares.
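
The datetime-derived variables listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: it assumes pickup_datetime strings look like "2009-06-15 17:26:21 UTC", that Day means day of the week, and that Day/Night splits at hypothetical 06:00 and 18:00 cut-offs. In the notebook the same logic would more likely be applied column-wise with pandas.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "2009-06-15 17:26:21 UTC"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S UTC"

# Hypothetical cut-offs: hours 6..17 count as "Day", all other hours as "Night"
DAY_START_HOUR = 6
NIGHT_START_HOUR = 18


def derive_datetime_features(pickup_datetime: str) -> dict:
    """Split one pickup timestamp into Month, Year, Time (Hrs), Day and Day/Night."""
    ts = datetime.strptime(pickup_datetime, TIMESTAMP_FORMAT)
    hour = ts.hour
    return {
        "Month": ts.month,
        "Year": ts.year,
        "Time (Hrs)": hour,
        # Assumption: "Day" is the day of the week, Monday=0 ... Sunday=6
        "Day": ts.weekday(),
        "Day/Night": "Day" if DAY_START_HOUR <= hour < NIGHT_START_HOUR else "Night",
    }
```

For example, derive_datetime_features("2009-06-15 17:26:21 UTC") yields Month 6, Year 2009, Time (Hrs) 17, Day 0 (a Monday) and Day/Night "Day".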
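
This README does not specify which formula produced Distance_Km. A common choice for latitude/longitude pairs is the haversine great-circle distance, sketched below as an assumption; the function name haversine_km and the 6371 km mean Earth radius are illustrative choices, not taken from the project.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude):
    """Great-circle distance in kilometres between the pickup and dropoff points."""
    # Convert all coordinates from degrees to radians
    lon1, lat1, lon2, lat2 = map(
        math.radians,
        (pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude),
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

As a sanity check, two points one degree of latitude apart on the same meridian come out at roughly 111.2 km.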
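
The cleaning steps above (dropping missing values, filtering on thresholds, and checking outliers with boxplots) might look like the sketch below. The threshold constants are hypothetical placeholders, not the project's actual cut-offs, and rows are assumed to be dicts with passenger_count, fare_amount and Distance_Km keys. iqr_bounds applies the 1.5 * IQR rule that standard boxplot whiskers use by default.

```python
import statistics

# Hypothetical thresholds for illustration only; not the project's values
MIN_PASSENGERS, MAX_PASSENGERS = 1, 6
MIN_FARE, MAX_FARE = 0.01, 500.0
MIN_DIST_KM, MAX_DIST_KM = 0.01, 130.0


def is_valid(row: dict) -> bool:
    """Reject rows with a missing (None or absent) field or a value outside its range."""
    fields = ("passenger_count", "fare_amount", "Distance_Km")
    if any(row.get(f) is None for f in fields):
        return False
    return (
        MIN_PASSENGERS <= row["passenger_count"] <= MAX_PASSENGERS
        and MIN_FARE <= row["fare_amount"] <= MAX_FARE
        and MIN_DIST_KM <= row["Distance_Km"] <= MAX_DIST_KM
    )


def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are boxplot outliers."""
    # method="inclusive" matches NumPy's default linear-interpolation percentiles
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr
```

Values outside the bounds returned by iqr_bounds are the points a boxplot would draw beyond its whiskers.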
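
Creating dummy variables for the factor columns is one-hot encoding. The helper below is a hypothetical plain-Python sketch that follows the column naming and drop_first behaviour of pandas.get_dummies. Dropping one level per factor avoids perfectly collinear columns (the "dummy variable trap") when a linear regression also fits an intercept; whether the project dropped a level is not stated in this README.

```python
def make_dummies(rows, column, drop_first=True):
    """One-hot encode `column` in a list of dict rows (hypothetical helper)."""
    # Levels are sorted so the dropped reference level is deterministic
    levels = sorted({row[column] for row in rows})
    kept = levels[1:] if drop_first else levels
    encoded = []
    for row in rows:
        # Copy every other column unchanged, then add one 0/1 column per kept level
        new_row = {k: v for k, v in row.items() if k != column}
        for level in kept:
            new_row[f"{column}_{level}"] = int(row[column] == level)
        encoded.append(new_row)
    return encoded
```

Calling it once per factor (Month, Year, Day, Day/Night, passenger_count) produces the numeric design columns a regression model needs.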
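
This README states only that a Linear Regression model was used in Python; such models are commonly fitted with a library such as scikit-learn. To show what the fit computes, the sketch below solves ordinary least squares with an intercept through the normal equations, (X^T X) b = X^T y, using Gaussian elimination. fit_ols and predict are hypothetical names, and this is not the notebook's implementation.

```python
def fit_ols(X, y):
    """Ordinary least squares with intercept via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column of ones
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Forward elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular design matrix (e.g. dummy-variable trap)")
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    coef = [0.0] * p
    for i in range(p - 1, -1, -1):
        coef[i] = (A[i][p] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef  # [intercept, b1, b2, ...]


def predict(coef, x):
    """Predicted fare for one feature vector x."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))
```

Solving the normal equations directly is fine for a handful of dummy columns; library implementations typically use more numerically stable factorizations such as QR or SVD.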

Getting Started:
- Clone the repository:
  git clone <repository_url>
- Navigate to the project directory:
  cd Cab-Fare-Prediction--Data-Science-Capstone-project
- Explore the project files and reports for a detailed understanding.
Feel free to refer to Project Report_Cab_Rental for an in-depth explanation of the project.
Note: Adjust file paths and comments as needed for your project structure.