This repository consolidates multiple machine learning projects in one place. Each project is implemented in Jupyter Notebooks and highlights different machine learning methodologies, use cases, and algorithms.
- Description: A project that utilizes Decision Tree models to predict heart disease outcomes. The project includes feature analysis, hyperparameter tuning, and model evaluation.
- Description: In this project, I analyzed the RSVP Movies dataset using MySQL to derive key insights into movie trends, genres, ratings, and industry success. Through advanced SQL queries, I explored director and actor performance, production house rankings, and genre-based analysis. This project highlights my expertise in SQL query design, data exploration, and delivering actionable insights for real-world datasets.
- Description: In this project, I analyzed bike-sharing system data to predict user demand using Linear Regression. Key steps included data exploration, feature engineering (both manual and automated selection, such as RFE), model development to identify the relationship between environmental conditions and bike usage, and performance evaluation. The model provides actionable insights for optimizing bike availability and resource allocation. Libraries used: Pandas, NumPy, Matplotlib, Seaborn, and Scikit-learn.
- Description: A project that explores the car evaluation dataset using Decision Tree models. It covers feature engineering, hyperparameter optimization, and regression modeling.
Heart Disease Prediction and Hyperparameter Tuning
- Description: A project that utilizes Decision Tree models to predict heart disease outcomes. The project includes feature analysis, hyperparameter tuning, and model evaluation.
Housing Price Prediction Using Ensemble - Stacking Regressor
Housing Price Prediction Using Ensemble - Random Forest
- Description: This project addresses the business problem of predicting housing prices with high accuracy, a critical requirement for stakeholders in the real estate sector. It trains a set of regression models (linear regression, a KNN regressor, and a decision tree regressor) and improves on their predictive performance with the Stacking Regressor and Random Forest from sklearn.ensemble. Key libraries include pandas, numpy, matplotlib, seaborn, Scikit-learn, and statsmodels, covering data preprocessing, visualization, machine learning, and statistical modeling.
- The dataset, accessible here, forms the basis for this study. The models are evaluated using the R-squared metric, with statistical analysis performed through the OLS module from statsmodels. This project demonstrates the advantages of ensemble techniques and statistical rigor in deriving actionable insights for housing price prediction.
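As a rough illustration of the R-squared metric mentioned above, here is a minimal standard-library sketch. The function name `r_squared` and the price values are hypothetical and are not taken from the notebooks, which would more likely rely on Scikit-learn or statsmodels for this.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical house prices (in thousands), for illustration only.
actual = [200.0, 250.0, 300.0, 350.0]
predicted = [210.0, 240.0, 310.0, 340.0]
score = r_squared(actual, predicted)
print(f"R-squared: {score:.3f}")
```

A score of 1.0 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0.0.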
- Description: This project focuses on predicting home loan default risk using logistic regression. It addresses a critical business problem: identifying high-risk, medium-risk, and low-risk loan applicants. The project explores data preprocessing, feature engineering, and multi-class classification using One-vs-Rest and One-vs-One strategies. It also covers the mathematical concepts behind logistic regression, including the sigmoid function and log-loss optimization. Key learnings include interpreting logistic regression coefficients as a measure of feature importance, applying libraries like Scikit-learn, Pandas, and Matplotlib, and evaluating models using metrics such as accuracy, precision, recall, and F1-score.
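The sigmoid function and log-loss mentioned above can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names and the `eps` clipping value are assumptions, not the notebook's implementation.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, probs):
        # Clip probabilities so log(0) never occurs (eps is an assumed value).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0]
confident = log_loss(labels, [0.9, 0.1])
hesitant = log_loss(labels, [0.6, 0.4])
print(sigmoid(0.0), confident, hesitant)
```

Log-loss rewards confident correct predictions, so `confident` comes out lower than `hesitant`. Training a logistic regression model amounts to choosing coefficients that minimize this quantity over the training data.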
- Description: This project focuses on grouping countries based on socio-economic and health-related indicators using clustering techniques. It addresses global development patterns by analyzing metrics like child mortality, exports, health expenditure, income, and GDP. The project covers data exploration, feature scaling, and implementing clustering algorithms such as K-Means and Hierarchical Clustering. It also provides insights into the evaluation of clustering performance and visualization of clusters.
- Key learnings include understanding feature normalization for clustering, interpreting cluster centroids, and applying tools like Scikit-learn, Pandas, and Matplotlib to build and analyze clustering models. The project highlights the significance of clustering for policy-making, identifying development disparities, and exploring socio-economic similarities among countries.
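To show why feature normalization matters for distance-based clustering, here is a small standard-library sketch of z-score scaling. The indicator values are made up for illustration, and `standardize` is a hypothetical helper; the notebook itself presumably uses a Scikit-learn scaler.

```python
import math
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical indicator values for four countries (not real data).
income = [1000.0, 5000.0, 40000.0, 80000.0]
child_mortality = [90.0, 60.0, 10.0, 4.0]

raw = list(zip(income, child_mortality))
scaled = list(zip(standardize(income), standardize(child_mortality)))

# On raw values the distance is driven almost entirely by income,
# because income is measured on a much larger numeric scale.
print(euclidean(raw[0], raw[1]))
print(euclidean(scaled[0], scaled[1]))
```

After scaling, both indicators contribute on a comparable footing, which is what K-Means and Hierarchical Clustering need in order to form meaningful groups.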
- Pandas: Data manipulation and preprocessing.
- Scikit-learn: Machine learning algorithms and evaluation metrics.
- Matplotlib & Seaborn: Data visualization and exploratory analysis.
- category_encoders: Encoding techniques for categorical features, such as OneHot, Ordinal, Binary, and Target Encoding, to improve machine learning model performance.
- Jupyter Notebook: Interactive development environment for analysis and presentation.
This repository is licensed under the MIT License, which permits free use, modification, and distribution, including for educational purposes.
Feel free to connect, collaborate, or share feedback:
- LinkedIn: Vijay Mahawar
- GitHub: vmahawar