This project analyzes real estate data to predict house prices. The dataset contains features such as house size, number of bedrooms, and other key attributes.
The goal is to explore the dataset, prepare it for analysis, visualize key trends, and build a predictive model that estimates house prices.
- Loaded the real estate data from a CSV file (`real_state_dataset.csv`).

```python
import pandas as pd

df = pd.read_csv('real_state_dataset.csv')
```
- Displayed basic dataset information including shape, column names, and data types.

```python
print(df.shape)
df.info()  # prints column names, dtypes, and non-null counts
```
- Inspected the first few rows using `head()`.

```python
print(df.head())
```
- Checked for missing values using `df.isnull().sum()`.

```python
print(df.isnull().sum())
```
- Dropped unnecessary columns: `brokered_by`, `zip_code`, and `prev_sold_date`.

```python
df.drop(columns=['brokered_by', 'zip_code', 'prev_sold_date'], inplace=True)
```
- Removed rows with missing values using `dropna()`.

```python
df.dropna(inplace=True)
```
- Checked for duplicate entries and removed them using `drop_duplicates()`.

```python
df.drop_duplicates(inplace=True)
```
- Calculated descriptive statistics (count, mean, min, max) for numerical columns using `describe()`.

```python
print(df.describe())
```
- Analyzed the distribution of key features.
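The exact distribution plots are not included here; as a minimal sketch, histograms of the numeric columns already used in this project (`price`, `bed`, `bath`, `house_size`) could be produced like this:

```python
# Illustrative sketch (not the original notebook code): histograms of key numeric features.
import matplotlib.pyplot as plt

df[['price', 'bed', 'bath', 'house_size']].hist(bins=50, figsize=(10, 6))
plt.tight_layout()
plt.show()
```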
- Visualized the top 10 states with the most houses using a bar plot.

```python
import matplotlib.pyplot as plt

df['state'].value_counts().sort_values(ascending=False).head(10).plot(kind='bar')
plt.title('Top 10 States with Most Houses')
plt.show()
```
- Calculated average house prices by state and city.

```python
avg_price_by_state = df.groupby('state')['price'].mean()
print(avg_price_by_state)
```
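The snippet above covers the state-level averages; the city-level aggregation is not shown. A minimal sketch, assuming the dataset keeps a `city` column:

```python
# Sketch only: average price per (state, city) pair, assuming a 'city' column exists.
avg_price_by_city = df.groupby(['state', 'city'])['price'].mean()
print(avg_price_by_city.sort_values(ascending=False).head(10))
```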
- Displayed the correlation between numerical features and the target variable (`price`).

```python
# numeric_only=True avoids errors from remaining non-numeric columns on recent pandas versions
print(df.corr(numeric_only=True)['price'])
```
- Selected relevant features (`bed`, `bath`, `house_size`) for model building.

```python
X = df[['bed', 'bath', 'house_size']]
y = df['price']
```
- No additional feature engineering was performed.
- Split the dataset into training and testing sets using `train_test_split`.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```
- Standardized the numerical features using `StandardScaler` to improve model performance.

```python
from sklearn.preprocessing import StandardScaler
import joblib

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
joblib.dump(scaler, 'scaler.pkl')
```
- Trained a Linear Regression model using the training data.

```python
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)
```
- Made predictions on the test data and evaluated the model using Mean Absolute Error (MAE).

```python
from sklearn.metrics import mean_absolute_error

lr_pred = lr.predict(X_test)
mae = mean_absolute_error(y_test, lr_pred)
print(f'Mean Absolute Error: {mae}')
```
- Saved the trained model and scaler using `joblib.dump()`.

```python
joblib.dump(lr, 'model.pkl')
```
- A Streamlit app was developed to allow users to input house features and get a predicted price.

```python
import streamlit as st
import joblib
import numpy as np

# Load the fitted scaler and trained model
scaler = joblib.load('scaler.pkl')
model = joblib.load('model.pkl')

st.title('House Price Prediction')
st.divider()

bed = st.number_input('Bedrooms', value=2, step=1)
bath = st.number_input('Bathrooms', value=1, step=1)
house_size = st.number_input('House Size', value=1000, step=50)
X = [bed, bath, house_size]

st.divider()
predict_btn = st.button('Predict')
st.divider()

if predict_btn:
    st.balloons()
    X1 = np.array(X)
    X_array = scaler.transform([X1])  # scale the single input row before predicting
    prediction = model.predict(X_array)[0]
    st.write(f'Predicted Price: {prediction:.2f}')
else:
    st.write('Click the button to predict the price')
```
- The model was evaluated using Mean Absolute Error (MAE), which measures how close predictions are to actual values.
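For reference, MAE is simply the average absolute difference between predicted and actual prices. A tiny illustration with made-up numbers (not from the project's data):

```python
# Illustrative only: MAE is the mean of |actual - predicted| over the test set.
import numpy as np

actual = np.array([250000, 300000, 150000])       # hypothetical actual prices
predicted = np.array([240000, 320000, 155000])    # hypothetical predictions
mae_example = np.mean(np.abs(actual - predicted))  # (10000 + 20000 + 5000) / 3 ≈ 11666.67
print(mae_example)
```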
- The Streamlit app provides an interactive interface for predicting house prices.
- Handle outliers (if present) and perform feature scaling.
- Compare the Linear Regression model with other machine learning models (e.g., Decision Trees, Random Forests); a rough sketch of such a comparison follows this list.
- Tune hyperparameters to improve model performance.
- Deploy the final model as a web application.
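As a starting point for the model comparison mentioned above, one possible sketch (not part of the current project code) reusing the scaled train/test split from earlier:

```python
# Sketch only: compare Linear Regression against a Random Forest on the same split.
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

models = {
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=100, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)  # X_train/X_test are the scaled arrays from earlier
    pred = model.predict(X_test)
    print(f'{name} MAE: {mean_absolute_error(y_test, pred):.2f}')
```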
Contributions are welcome! Please fork the repository and create a pull request for any enhancements or bug fixes.
This project is licensed under the MIT License.