This project is designed to estimate a car price range suitable for customers based on their financial information. By inputting details such as age, salary, and net worth, the app calculates and advises a car price range tailored to the customer.
The goal of this project is to give users a simple interface for entering their details and receiving an estimated car price range based on their financial data. The app applies a trained regression model behind a clean user interface.
- Loaded the car purchasing data from a CSV file (`car_purchasing.csv`).
  ```python
  import pandas as pd

  df = pd.read_csv("car_purchasing.csv", encoding="latin-1")
  ```
- Displayed basic dataset information including shape, column names, and data types using `info()` and `shape`.
- Inspected the first few rows using `head()`.
- Checked for and handled missing values (if any) using `isnull().sum()` and potentially `dropna()`.
- Checked for and removed duplicate entries (if any) using `drop_duplicates()`.
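The loading and cleaning steps above can be sketched end to end. To stay self-contained, this sketch builds a small made-up DataFrame instead of reading `car_purchasing.csv`; the column names and values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for car_purchasing.csv (all values are made up).
df = pd.DataFrame(
    {
        "age": [42, 35, np.nan, 35, 58],
        "annual_salary": [62000.0, 48000.0, 51000.0, 48000.0, 75000.0],
        "net_worth": [240000.0, 130000.0, 90000.0, 130000.0, 560000.0],
    }
)

print(df.shape)   # (rows, columns)
df.info()         # column names, dtypes, non-null counts
print(df.head())  # first few rows

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isnull().sum()

# Drop rows with missing values, then drop exact duplicate rows.
cleaned = df.dropna().drop_duplicates().reset_index(drop=True)
print(cleaned)
```

Dropping rows is the simplest option; if many rows had missing values, imputing them might be preferable to losing data.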
- Calculated descriptive statistics for numerical columns using `describe()`.
- Analyzed the distribution of key features, including `age`, `annual salary`, `credit card debt`, and `net worth`, using histograms and scatter plots.
- Visualized relationships between features and the target variable (`car purchase amount`) using scatter plots.
- Calculated correlations between numerical features and the target variable using `corr()`.
- Explored potential categorical features like gender using `groupby()` and visualizations.
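As a rough illustration of the exploratory steps above, the following sketch computes summary statistics, correlations with the target, and a per-gender mean. The sample rows and the snake_case column names are assumptions, not values from the real dataset.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumed for illustration.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
        "age": [25, 32, 41, 47, 55, 63],
        "annual_salary": [40000, 52000, 61000, 70000, 83000, 90000],
        "net_worth": [100000, 180000, 260000, 330000, 450000, 520000],
        "car_purchase_amount": [22000, 30000, 38000, 45000, 56000, 62000],
    }
)

# Descriptive statistics for the numerical columns.
summary = df.describe()

# Correlation of every numerical feature with the target, strongest first.
numeric = df.select_dtypes("number")
correlations = numeric.corr()["car_purchase_amount"].sort_values(ascending=False)

# Mean purchase amount per gender.
mean_by_gender = df.groupby("gender")["car_purchase_amount"].mean()

print(summary)
print(correlations)
print(mean_by_gender)
```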
- Scatter plots: Visualized the relationships between numerical features (age, salary, net worth, and credit card debt) and car purchase amount.
- Histograms: Showed the distribution of age in the dataset.
- Bar plots: Compared mean net worth and mean annual salary by gender, and visualized mean and maximum car purchase amount by age range.
- Created new features to potentially improve model performance:
  - Converted `gender` to a categorical variable (`n_gender`) using `apply()`.
    ```python
    df['n_gender'] = df['gender'].apply(lambda x: 1 if x == 'Male' else 0)
    ```
  - Grouped `age` into ranges (`age_range`) using a custom function and `apply()`.
    ```python
    def age_group(age):
        if age < 30:
            return 'Young'
        elif 30 <= age < 50:
            return 'Middle-aged'
        else:
            return 'Senior'

    df['age_range'] = df['age'].apply(age_group)
    ```
- Analyzed the impact of the new features on the target variable.
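One way to check the impact of the engineered features is to aggregate the target per group. The sketch below reuses the `age_group` function from above on a small made-up sample; the data and the `car_purchase_amount` column name are assumptions.

```python
import pandas as pd


def age_group(age):
    """Bucket an age into the three ranges used in this project."""
    if age < 30:
        return "Young"
    if age < 50:
        return "Middle-aged"
    return "Senior"


# Hypothetical sample rows for illustration only.
df = pd.DataFrame(
    {
        "gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
        "age": [25, 32, 41, 47, 55, 63],
        "car_purchase_amount": [22000, 30000, 38000, 45000, 56000, 62000],
    }
)

df["n_gender"] = df["gender"].apply(lambda x: 1 if x == "Male" else 0)
df["age_range"] = df["age"].apply(age_group)

# Mean and maximum purchase amount per age range.
by_age_range = df.groupby("age_range")["car_purchase_amount"].agg(["mean", "max"])

# Mean purchase amount per encoded gender value.
by_n_gender = df.groupby("n_gender")["car_purchase_amount"].mean()

print(by_age_range)
print(by_n_gender)
```

If the group means barely differ, the new feature is unlikely to add much to the model.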
- Split the dataset into training and testing sets using `train_test_split` from `sklearn.model_selection`.
  ```python
  from sklearn.model_selection import train_test_split

  X = df[['age', 'annual_salary', 'credit_card_debt', 'net_worth']]
  y = df['car_purchase_amount']
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  ```
- Scaled numerical features using `StandardScaler` from `sklearn.preprocessing`.
  ```python
  from sklearn.preprocessing import StandardScaler

  scaler = StandardScaler()
  X_train = scaler.fit_transform(X_train)
  X_test = scaler.transform(X_test)
  ```
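The scaler above is fitted on the training split only and then reused for the test split, so no test-set statistics leak into training. A minimal sketch on synthetic data (the two made-up features stand in for age and salary) shows that `transform` applies the training mean and standard deviation:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features: column 0 resembles age, column 1 resembles salary.
rng = np.random.default_rng(42)
X = rng.normal(loc=[40, 60000], scale=[10, 15000], size=(200, 2))
y = rng.normal(size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training mean/std

# Manual standardization of the test set with training statistics.
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(manual, X_test_scaled))
```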
- Trained three regression models:
  - Linear Regression
    ```python
    from sklearn.linear_model import LinearRegression

    lr = LinearRegression()
    lr.fit(X_train, y_train)
    ```
  - Support Vector Regression
    ```python
    from sklearn.svm import SVR

    svr = SVR()
    svr.fit(X_train, y_train)
    ```
  - Random Forest Regression
    ```python
    from sklearn.ensemble import RandomForestRegressor

    rf = RandomForestRegressor()
    rf.fit(X_train, y_train)
    ```
- Performed hyperparameter tuning for SVR and Random Forest Regression using `GridSearchCV` to optimize model performance.
- Evaluated model performance with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), computed with functions from `sklearn.metrics`.
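The tuning and evaluation steps above might look like the following sketch. The parameter grids, the 3-fold cross-validation, and the synthetic data are assumptions chosen to keep the example small; they are not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data with four features (stand-in for the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, 5.0, -1.0, 4.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Hypothetical grids; widen or narrow them for real data.
searches = {
    "SVR": GridSearchCV(
        SVR(),
        {"kernel": ["linear", "rbf"], "C": [1, 10, 100]},
        scoring="neg_mean_absolute_error",
        cv=3,
    ),
    "RandomForest": GridSearchCV(
        RandomForestRegressor(random_state=42),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
        scoring="neg_mean_absolute_error",
        cv=3,
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X_train, y_train)        # cross-validated search on training data
    pred = search.predict(X_test)       # predictions from the best estimator
    results[name] = {
        "best_params": search.best_params_,
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, pred)),
    }

for name, metrics in results.items():
    print(name, metrics)
```

RMSE penalizes large errors more heavily than MAE, so comparing both gives a hint about whether a model makes occasional large mistakes.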
- Selected the best-performing model based on evaluation results. (In this case, it was likely Linear Regression).
- Saved the chosen model using `joblib.dump` for future use.
  ```python
  import joblib

  joblib.dump(lr, "model.pkl")
  joblib.dump(scaler, "scaler.pkl")
  ```
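Saved artifacts can be restored with `joblib.load`. The sketch below round-trips a model and scaler through a temporary directory and checks that the restored pair gives the same prediction; the synthetic data and the `new_customer` values are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic training data: the target is simply the sum of four features.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X.sum(axis=1)

scaler = StandardScaler().fit(X)
lr = LinearRegression().fit(scaler.transform(X), y)

# Save and reload both artifacts; the scaler must travel with the model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    scaler_path = os.path.join(tmp, "scaler.pkl")
    joblib.dump(lr, model_path)
    joblib.dump(scaler, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Hypothetical new input with the same four features, in the same order.
new_customer = np.array([[0.1, -0.2, 0.3, 0.4]])
original = lr.predict(scaler.transform(new_customer))
restored = loaded_model.predict(loaded_scaler.transform(new_customer))
print(original, restored)
```

Files written by `joblib.dump` are pickles, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that created them.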
- A Streamlit app was developed to allow users to input customer details and get a recommended car price range.
  ```python
  import joblib
  import streamlit as st

  # Load the artifacts saved during training.
  model = joblib.load("model.pkl")
  scaler = joblib.load("scaler.pkl")

  age = st.number_input('Age', min_value=18, max_value=90, value=40, step=1)
  annual_salary = st.number_input('Annual Salary', min_value=500, max_value=9999999999, value=12000, step=5000)
  net_worth = st.number_input('Net Worth', min_value=0, max_value=999999999, value=100000, step=2000)

  # The scaler and model must have been fitted on exactly these three
  # features in this order. The training snippet above also includes
  # credit_card_debt; with that version, transform() would raise a shape error.
  X = [age, annual_salary, net_worth]

  calculate = st.button('Calculate')
  st.divider()

  if calculate:
      st.balloons()
      X_scaled = scaler.transform([X])
      prediction = model.predict(X_scaled)
      # A negative amount is not a meaningful price, so report it as such.
      if prediction[0] < 0:
          st.write("Prediction is: Negative")
      else:
          st.write(f"Prediction is: {prediction[0]:,.2f}")
          st.write("Advice: consider cars priced near this value")
  else:
      st.write("Enter values")
  ```
- The application returns an estimate as soon as the user clicks Calculate, based on the values entered.
- Screenshot of the app interface:
- Improve the model by incorporating more complex features.
- Compare performance with advanced models like Gradient Boosting or Neural Networks.
- Deploy the app on a cloud platform for wider accessibility.
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch for your feature or bugfix:
  ```shell
  git checkout -b feature-name
  ```
- Commit your changes:
  ```shell
  git commit -m "Description of your changes"
  ```
- Push to your branch:
  ```shell
  git push origin feature-name
  ```
- Create a pull request.
This project is licensed under the MIT License. See the LICENSE file for more details.