Machine Learning Projects

This repository brings together several machine learning projects in one place. Each project is implemented in Jupyter Notebooks and highlights different machine learning methodologies, use cases, and algorithms.

📂 Projects

1. Exploratory Data Analysis and Machine Learning for predicting loan defaults using Python.

Description: A project that performs exploratory data analysis (EDA) on credit loan data and applies machine learning in Python to predict loan defaults.

2. RSVP Movies SQL Case Study

Description: In this project, I analyzed the RSVP Movies dataset using MySQL to derive key insights into movie trends, genres, ratings, and industry success. Through advanced SQL queries, I explored director and actor performance, production house rankings, and genre-based analysis. This project highlights my expertise in SQL query design, data exploration, and delivering actionable insights for real-world datasets.
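As a rough illustration of the genre-based analysis described above, here is a minimal sketch of a grouped ranking query. It runs on Python's built-in sqlite3 module rather than MySQL so that it is self-contained, and the table names, column names, and rows are hypothetical placeholders, not the actual RSVP Movies schema or data.

```python
import sqlite3

# In-memory database with a tiny, made-up schema (assumption: the real
# RSVP Movies tables and columns may be named differently).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genre (movie_id INTEGER, genre TEXT);
    CREATE TABLE ratings (movie_id INTEGER, avg_rating REAL);
    INSERT INTO genre VALUES (1, 'Drama'), (2, 'Drama'), (3, 'Comedy');
    INSERT INTO ratings VALUES (1, 8.0), (2, 6.0), (3, 7.5);
    """
)

# Count movies and average their ratings per genre, best-rated genre first.
query = """
    SELECT g.genre,
           COUNT(*) AS movie_count,
           ROUND(AVG(r.avg_rating), 2) AS mean_rating
    FROM genre AS g
    JOIN ratings AS r ON r.movie_id = g.movie_id
    GROUP BY g.genre
    ORDER BY mean_rating DESC;
"""
rows = conn.execute(query).fetchall()
for genre_name, movie_count, mean_rating in rows:
    print(genre_name, movie_count, mean_rating)
```

The same JOIN, GROUP BY, and ORDER BY pattern should carry over to MySQL with little or no change.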

3. Bike Sharing Prediction using Linear Regression

Description: In this project, I analyzed bike-sharing system data to predict user demand using Linear Regression. Key steps included data exploration, feature engineering, feature selection (both manual and automated, such as RFE), model development to identify the relationship between environmental conditions and bike usage, and performance evaluation. The model provides actionable insights for optimizing bike availability and resource allocation. Libraries used: Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
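The automated feature selection step can be sketched roughly as follows, using Scikit-learn's RFE (recursive feature elimination) around a LinearRegression estimator. The data here is synthetic and the helper name select_features_with_rfe is made up for illustration; the notebook's actual features and number of selected columns may differ.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression


def select_features_with_rfe(X, y, n_features):
    """Return a boolean mask marking the features RFE keeps."""
    selector = RFE(estimator=LinearRegression(), n_features_to_select=n_features)
    selector.fit(X, y)
    return selector.support_


# Synthetic stand-in for the bike-sharing data: only the first two
# columns actually drive the target, the other three are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

mask = select_features_with_rfe(X, y, n_features=2)
print(mask)
```

RFE repeatedly fits the model and drops the feature with the smallest coefficient magnitude, so it works best when the features are on comparable scales.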

4. Car Evaluation Prediction Using Decision Tree

Description: A project that explores the car evaluation dataset using Decision Tree models. It covers feature engineering, hyperparameter optimization, and classification modeling.

5. Heart Disease Prediction Using Decision Tree

Heart Disease Prediction and Hyperparameter Tuning

Description: A project that utilizes Decision Tree models to predict heart disease outcomes. The project includes feature analysis, hyperparameter tuning, and model evaluation.
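A minimal sketch of what Decision Tree hyperparameter tuning can look like with Scikit-learn's GridSearchCV is shown below. The data is generated with make_classification as a stand-in for the heart disease dataset, and the parameter grid is an assumption chosen for illustration, not the grid used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (placeholder for the real dataset).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical search space: tree depth and minimum leaf size.
param_grid = {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_params = search.best_params_
# score() uses the best estimator, refit on the full training split.
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```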

6. Housing Price Prediction Using Decision Tree

Housing Price Prediction Using Ensemble - Stacking Regressor

Housing Price Prediction Using Ensemble - Random Forest

Description: This project addresses the business problem of predicting housing prices with high accuracy, a critical requirement for stakeholders in the real estate sector. It starts from individual regression models (linear regression, KNN regressor, and decision tree regressor) and improves on their predictive performance with the Stacking Regressor and Random Forest ensembles from sklearn.ensemble. Key libraries used include pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, and statsmodels, covering data preprocessing, visualization, machine learning, and statistical modeling.
The dataset, accessible here, forms the basis for this study. The models are evaluated using the R-squared metric, with statistical analysis performed through the OLS module from statsmodels. This project demonstrates the advantages of ensemble techniques and statistical rigor in deriving actionable insights for housing price prediction.
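The ensemble setup described above could look roughly like the sketch below: a StackingRegressor built on the three base regressors, compared against a RandomForestRegressor by R-squared. The data is synthetic, and the final estimator, tree depth, and other settings are assumptions made for illustration rather than the notebooks' actual configuration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the housing dataset.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners named in the description; their out-of-fold predictions
# become the inputs of the final estimator.
base_models = [
    ("linear", LinearRegression()),
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
]
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),  # assumption: simple linear blender
    cv=5,
)
stack.fit(X_train, y_train)
stack_r2 = r2_score(y_test, stack.predict(X_test))

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_r2 = r2_score(y_test, forest.predict(X_test))

print(f"stacking R^2: {stack_r2:.3f}, random forest R^2: {forest_r2:.3f}")
```

Which ensemble scores higher depends on the data, so the comparison is best read from held-out R-squared rather than assumed in advance.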

7. Home Loan Default Prediction Using Logistic Regression

Description: This project focuses on predicting home loan default risk using logistic regression. It addresses a critical business problem: identifying high-risk, medium-risk, and low-risk loan applicants. The project explores data preprocessing, feature engineering, and multi-class classification using One-vs-Rest and One-vs-One strategies. It also covers the mathematics behind logistic regression, including the sigmoid function and log-loss optimization. Key learnings include interpreting logistic regression coefficients for feature importance, applying libraries like Scikit-learn, Pandas, and Matplotlib, and evaluating models using metrics such as accuracy, precision, recall, and F1-score.
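To make the mathematical side concrete, here is a small standard-library sketch of the sigmoid function and the binary log-loss it is paired with. The function names and sample values are illustrative only; in practice Scikit-learn computes these internally.

```python
import math


def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Turn raw linear scores into probabilities, then measure the loss.
scores = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
probabilities = [sigmoid(s) for s in scores]
print(probabilities, log_loss(labels, probabilities))
```

Training logistic regression amounts to choosing coefficients that minimize this loss; the One-vs-Rest strategy repeats the same binary setup once per risk class.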

8. Country Clustering on Socio-Economic Factors

Description: This project focuses on grouping countries based on socio-economic and health-related indicators using clustering techniques. It addresses global development patterns by analyzing metrics like child mortality, exports, health expenditure, income, and GDP. The project covers data exploration, feature scaling, and implementing clustering algorithms such as K-Means and Hierarchical Clustering. It also provides insights into the evaluation of clustering performance and visualization of clusters.
Key learnings include understanding feature normalization for clustering, interpreting cluster centroids, and applying tools like Scikit-learn, Pandas, and Matplotlib to build and analyze clustering models. The project highlights the significance of clustering for policy-making, identifying development disparities, and exploring socio-economic similarities among countries.
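The scaling-then-clustering workflow can be sketched as below with StandardScaler and KMeans. The two columns and all values are synthetic placeholders loosely labeled after the indicators mentioned above; they are not taken from the project's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two made-up groups of "countries" with columns [child_mortality, income].
group_a = rng.normal(loc=[90.0, 1500.0], scale=[5.0, 200.0], size=(20, 2))
group_b = rng.normal(loc=[5.0, 40000.0], scale=[1.0, 3000.0], size=(20, 2))
X = np.vstack([group_a, group_b])

# Without scaling, income (in the thousands) would dominate the
# Euclidean distances that K-Means relies on.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Centroids are expressed in scaled units (standard deviations from the mean).
print(labels)
print(kmeans.cluster_centers_)
```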

🛠️ Tools and Libraries Used

Pandas: Data manipulation and preprocessing.
Scikit-learn: Machine learning algorithms and evaluation metrics.
Matplotlib & Seaborn: Data visualization and exploratory analysis.
category_encoders: Encoding techniques for categorical features, such as OneHot, Ordinal, Binary, and Target Encoding.
Jupyter Notebook: Interactive development environment for analysis and presentation.

📜 License

This repository is licensed under the MIT License, which permits free use, modification, and distribution, including commercial use, provided the copyright and license notice are retained.

🌐 Connect with Me

Feel free to connect, collaborate, or share feedback:

LinkedIn: Vijay Mahawar
GitHub: vmahawar
