The objective of this project is to predict house prices based on various features such as average area income, house age, number of rooms, number of bedrooms, and area population. The goal is to build a machine learning model that can accurately estimate the price of a house given these features.
- 📥 Load and clean the dataset.
- 📊 Visualize the data to understand relationships between features.
- 🤖 Build and evaluate a linear regression model.
- 🔮 Predict house prices based on new input values.
- 📈 Provide visualizations to help understand the model's performance.
Technologies used:
- Python: Programming language used for data analysis and machine learning.
- Pandas: Library for data manipulation and analysis.
- NumPy: Library for numerical computations.
- Matplotlib: Library for creating static, animated, and interactive visualizations.
- Seaborn: Library for statistical data visualization.
- Scikit-learn: Library for machine learning.
Data Loading and Cleaning:
- Load the dataset using Pandas.
- Check for missing values and clean the data.
- Remove unnecessary columns (e.g., `Address`).
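A minimal sketch of the loading and cleaning step follows. It assumes the CSV uses the column names of the common "USA Housing" layout (`Avg. Area Income`, `Avg. Area House Age`, `Avg. Area Number of Rooms`, `Avg. Area Number of Bedrooms`, `Area Population`, `Price`, `Address`). Those names and the helper `load_and_clean` are hypothetical, and the toy rows are made up purely so the sketch runs without the real file.

```python
import io

import pandas as pd

# Hypothetical column names, assumed from the common "USA Housing" layout.
FEATURES = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
]
TARGET = "Price"


def load_and_clean(source) -> pd.DataFrame:
    """Load the housing CSV (a path or file-like object) and clean it."""
    df = pd.read_csv(source)

    # Report how many values are missing in each affected column.
    missing = df.isnull().sum()
    print("Missing values per column:")
    print(missing[missing > 0])

    # Drop the free-text Address column; it is not used as a numeric feature.
    df = df.drop(columns=["Address"], errors="ignore")

    # Drop rows that still contain missing values and keep only model columns.
    df = df.dropna().reset_index(drop=True)
    return df[FEATURES + [TARGET]]


# Toy rows (made up) standing in for the real CSV; the second row lacks a Price.
sample = io.StringIO(
    "Avg. Area Income,Avg. Area House Age,Avg. Area Number of Rooms,"
    "Avg. Area Number of Bedrooms,Area Population,Price,Address\n"
    "70000,6.0,7.0,4.0,35000,1200000,1 Example St\n"
    "65000,5.5,6.5,3.0,30000,,2 Example St\n"
    "80000,7.0,8.0,5.0,40000,1500000,3 Example St\n"
)
clean = load_and_clean(sample)
print(clean.shape)  # (2, 6)
```

Dropping `Address` before `dropna()` means a missing address alone never discards an otherwise usable row.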
Data Visualization:
- Visualize the distribution of house prices.
- Create a correlation heatmap to understand relationships between features.
- Generate pair plots, box plots, and regression plots to explore the data.
Model Building and Evaluation:
- Normalize and scale the features.
- Split the data into training and testing sets.
- Build and evaluate a linear regression model.
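A minimal sketch of this step, assuming scikit-learn's `StandardScaler` for the scaling and an 80/20 split with `random_state=42` (both assumptions). It deliberately swaps the order listed above: the scaler sits inside a `Pipeline`, so it is fitted on the training split only and the test split cannot leak into the scaling parameters. The helper `build_model` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model(X, y, test_size=0.2, random_state=42):
    """Split the data, then fit a scaler + linear regression on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    # The pipeline fits StandardScaler on the training split only and reuses
    # those means/standard deviations when transforming the test split.
    model = make_pipeline(StandardScaler(), LinearRegression())
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)  # R^2 on the held-out test split
    return model, (X_test, y_test), r2


# Synthetic data (made up): five features with a known linear relationship.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.5, 4.0]) + rng.normal(scale=0.5, size=500)

model, (X_test, y_test), r2 = build_model(X, y)
print(f"Test R^2: {r2:.3f}")
```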
Prediction:
- Create a function (`predict_house_price`) that predicts a house price from new input values.
- Test the model with example input values.
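The README names `predict_house_price` but not its signature, so the version below is a hypothetical sketch: it assumes the model is a fitted scikit-learn pipeline trained on a DataFrame with the "USA Housing"-style column names. The training data comes from a made-up linear formula, only so the example runs end to end.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical column names, assumed from the common "USA Housing" layout.
FEATURES = [
    "Avg. Area Income",
    "Avg. Area House Age",
    "Avg. Area Number of Rooms",
    "Avg. Area Number of Bedrooms",
    "Area Population",
]


def predict_house_price(model, income, house_age, rooms, bedrooms, population):
    """Predict the price of one house from its five feature values (hypothetical signature)."""
    # A one-row DataFrame with the training column names lets the fitted
    # pipeline apply the same scaling it learned during fit.
    row = pd.DataFrame([[income, house_age, rooms, bedrooms, population]], columns=FEATURES)
    return float(model.predict(row)[0])


# Toy training data from a made-up, noise-free linear formula.
rng = np.random.default_rng(1)
X = pd.DataFrame({
    "Avg. Area Income": rng.normal(68_000, 10_000, 300),
    "Avg. Area House Age": rng.normal(6, 1, 300),
    "Avg. Area Number of Rooms": rng.normal(7, 1, 300),
    "Avg. Area Number of Bedrooms": rng.normal(4, 1, 300),
    "Area Population": rng.normal(36_000, 9_000, 300),
})
y = (20 * X["Avg. Area Income"] + 100_000 * X["Avg. Area House Age"]
     + 50_000 * X["Avg. Area Number of Rooms"] + 15 * X["Area Population"]
     - 1_500_000)
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

price = predict_house_price(model, 70_000, 6, 7, 4, 35_000)
print(f"Predicted price: ${price:,.2f}")
```

Because the toy data has no noise, the prediction reproduces the made-up formula; with real data it will only approximate the true price.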
Results:
- Linear Regression:
  - Mean Squared Error (MSE): 10068422551.400879
  - Root Mean Squared Error (RMSE): 100341.52954485435
  - Mean Absolute Error (MAE): 81135.56609336878
  - Mean Absolute Percentage Error (MAPE): 0.07336544896281169 (≈ 7.34%)
  - R^2 Score: 0.9146818498754016
  - Explained Variance Score: 0.9147412103528018
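The metrics above can be computed with `sklearn.metrics` as sketched below; the helper `regression_report` and the three-point example are illustrative assumptions. RMSE is taken as the square root of MSE (the square root of the reported MSE is consistent with the reported RMSE), and scikit-learn's `mean_absolute_percentage_error` (available since version 0.24) returns a fraction, so the reported 0.0734 corresponds to about 7.3%.

```python
import numpy as np
from sklearn.metrics import (
    explained_variance_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)


def regression_report(y_true, y_pred) -> dict:
    """Compute the regression metrics listed in the results."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),  # RMSE is the square root of MSE
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),  # a fraction, not a percent
        "R2": r2_score(y_true, y_pred),
        "Explained variance": explained_variance_score(y_true, y_pred),
    }


# Tiny hand-checkable example: residuals are -10, 10 and 0.
report = regression_report([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

R^2 and explained variance coincide here because the residuals have zero mean; they differ slightly in the results above, which indicates a small nonzero mean residual on the evaluated data.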
Visualizations:
- Distribution of House Prices
- Correlation Heatmap
- Pairplot of Features
- Boxplot of House Prices by Number of Bedrooms
- Regression Plots:
  - House Prices vs. Average Area Income
  - House Prices vs. House Age
  - House Prices vs. Area Population
- Countplot of Number of Bedrooms
- Predictions vs. Actual Values
- Residuals Plot
- Distribution of Residuals
- Errors vs. Actual Values
- Percentage Errors vs. Actual Values
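The model-diagnostic plots in the list above can be sketched with plain Matplotlib. The helper `plot_diagnostics` and the four-point example are hypothetical; residuals are defined here as actual minus predicted, and the percentage errors assume no house has a price of zero.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_diagnostics(y_true, y_pred):
    """Plot predictions vs. actual values, residuals, and the residual distribution."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    pct_errors = 100 * residuals / y_true  # percentage errors (assumes no zero prices)

    fig, axes = plt.subplots(1, 3, figsize=(16, 4.5))

    # Points on the dashed diagonal would be perfect predictions.
    axes[0].scatter(y_true, y_pred, alpha=0.5)
    lims = [min(y_true.min(), y_pred.min()), max(y_true.max(), y_pred.max())]
    axes[0].plot(lims, lims, "r--", label="perfect prediction")
    axes[0].set(xlabel="Actual price", ylabel="Predicted price",
                title="Predictions vs. Actual Values")
    axes[0].legend()

    # Residuals should scatter evenly around zero if the linear fit is adequate.
    axes[1].scatter(y_pred, residuals, alpha=0.5)
    axes[1].axhline(0, color="r", linestyle="--")
    axes[1].set(xlabel="Predicted price", ylabel="Residual", title="Residuals Plot")

    axes[2].hist(residuals, bins=30)
    axes[2].set(xlabel="Residual", ylabel="Count", title="Distribution of Residuals")

    fig.tight_layout()
    return fig, residuals, pct_errors


fig, residuals, pct_errors = plot_diagnostics([100, 200, 300, 400], [110, 190, 300, 380])
```

Scattering `y_true` against `residuals` or `pct_errors` in the same way gives the "Errors vs. Actual Values" and "Percentage Errors vs. Actual Values" plots.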
To test the model manually, call the `predict_house_price` function with new input values to get a predicted house price.
The project is organized as follows:
- Data
- Notebook
Conclusion:
- The linear regression model performed well, with an R^2 score and explained variance score of about 0.915 and a mean absolute percentage error of about 7.3%.
- Visualizations helped in understanding the relationships between the features and the target variable (house price).
- The `predict_house_price` function makes it easy to estimate house prices for new input values.
- If you have any questions or would like to get in touch, please contact me at: jotaduranbon@gmail.com