In this project, we perform a comprehensive analysis of the Melbourne House Price dataset to predict house prices. The goal is to practice predictive modeling by applying regression analysis. The dataset contains various features of houses such as the number of rooms, location, type, and more. The task involves preprocessing, performing exploratory data analysis (EDA), fitting a linear regression model, and interpreting the results.
The objectives are to apply statistical modeling to predict house prices, clean and preprocess the data to handle missing values and outliers, perform exploratory data analysis to understand the relationships between variables, and fit a linear regression model and evaluate its performance.
Load and Clean Data:
- Import the Melbourne House Price dataset.
- Handle missing data by addressing critical issues.
- Visualize and understand key variables and relationships, especially between the target variable (Price) and predictors.
- Handle skewness in the target variable by applying transformations like the natural logarithm to the 'Price' column.
- Split the dataset into training and testing datasets.
- Deal with missing data and outliers in the features.
- Select relevant features for the model based on correlation and domain knowledge.
- Fit a linear regression model using ordinary least squares (OLS).
- Include at least one categorical variable and apply necessary transformations to numerical features.
- Evaluate the model's performance using key metrics like Mean Squared Error (MSE), R-squared, etc.
- Analyze the model's coefficients to understand the influence of each variable on the target variable.
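
The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual solution. It assumes pandas, NumPy, and scikit-learn are available. Because the real file name and schema are not given here, it uses a small synthetic frame from a hypothetical helper, `make_synthetic_housing`, with assumed column names `Rooms`, `Distance`, `Type`, and `Price`. To work with the real data, replace that helper with a `pd.read_csv` call on your copy of the dataset and adjust the column names. The quantile clipping and median imputation are just one reasonable choice for handling outliers and missing values.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_synthetic_housing(n=500, seed=0):
    """Hypothetical stand-in for the real dataset (assumed columns, fake values)."""
    rng = np.random.default_rng(seed)
    rooms = rng.integers(1, 6, size=n)
    distance = rng.uniform(1, 40, size=n)
    house_type = rng.choice(["h", "u", "t"], size=n)
    type_effect = pd.Series(house_type).map({"h": 0.3, "t": 0.1, "u": 0.0}).to_numpy()
    log_price = (12.5 + 0.2 * rooms - 0.02 * distance + type_effect
                 + rng.normal(0, 0.2, size=n))
    df = pd.DataFrame({"Rooms": rooms, "Distance": distance,
                       "Type": house_type, "Price": np.exp(log_price)})
    # Blank out a few cells to mimic missing data.
    df.loc[rng.choice(n, size=25, replace=False), "Distance"] = np.nan
    df.loc[rng.choice(n, size=10, replace=False), "Price"] = np.nan
    return df


# Load and clean: rows without a target cannot be used for fitting.
df = make_synthetic_housing()
df = df.dropna(subset=["Price"])

# EDA: correlation with the target, and skewness before and after the log.
print(df[["Rooms", "Distance", "Price"]].corr()["Price"])
print("skew(Price):", df["Price"].skew(), "skew(log Price):", np.log(df["Price"]).skew())

# Log-transform the skewed target, then split into train and test sets.
y = np.log(df["Price"])
X = df[["Rooms", "Distance", "Type"]]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Outliers: clip Distance to quantiles computed on the training set only.
lo, hi = X_train["Distance"].quantile([0.01, 0.99])
X_train = X_train.assign(Distance=X_train["Distance"].clip(lo, hi))
X_test = X_test.assign(Distance=X_test["Distance"].clip(lo, hi))

# Missing values: median imputation; categorical Type: one-hot encoding.
numeric_cols, categorical_cols = ["Rooms", "Distance"], ["Type"]
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    ("cat", OneHotEncoder(drop="first"), categorical_cols),
])

# Ordinary least squares on the preprocessed features.
model = Pipeline([("prep", preprocess), ("ols", LinearRegression())])
model.fit(X_train, y_train)

# Evaluate on the held-out set (metrics are on the log-price scale).
pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)
r2 = r2_score(y_test, pred)
print(f"MSE: {mse:.4f}  R-squared: {r2:.3f}")

# Coefficients: the transformer outputs numeric columns first, then dummies.
encoder = model.named_steps["prep"].named_transformers_["cat"]
names = numeric_cols + [f"Type_{c}" for c in encoder.categories_[0][1:]]
coefs = pd.Series(model.named_steps["ols"].coef_, index=names)
print(coefs)
```

Because the target is log-transformed, a coefficient `b` corresponds to a change of roughly `100 * (exp(b) - 1)` percent in price per unit increase in that feature, holding the others fixed. The dummy coefficients for `Type` are read relative to the dropped baseline category.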