This project uses real-world and synthetic datasets to predict stroke events from clinical features. The aim is to identify the most important risk factors for stroke by investigating parameters such as gender, age, hypertension, heart disease, and lifestyle choices.
To install requirements:
```
pip install -r requirements.txt
```
Dataset:

Stroke Prediction Dataset from Kaggle:

- Kaggle Dataset 1
- Kaggle Dataset 2
- id: Patient ID
- gender: "Male", "Female" or "Other"
- age: patient age
- hypertension: 0 if the patient does not have hypertension, 1 if the patient has hypertension
- heart_disease: 0 if the patient does not have heart disease, 1 if the patient has heart disease
- ever_married: "No" or "Yes"
- work_type: "children", "Govt_job", "Never_worked", "Private" or "Self-employed"
- Residence_type: "Rural" or "Urban"
- avg_glucose_level: average glucose level in blood
- bmi: body mass index
- smoking_status: "formerly smoked", "never smoked", "smokes" or "Unknown"
- stroke: 1 if the patient had a stroke, 0 if not (the prediction target)
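As a starting point, here is a minimal sketch of loading and inspecting the data with pandas. The file name below is an assumption based on the usual Kaggle download; adjust it to your local copy:

```python
import pandas as pd

# Assumed file name for the Kaggle stroke dataset; adjust to your download.
df = pd.read_csv("healthcare-dataset-stroke-data.csv")

print(df.shape)                      # rows x columns
print(df.isna().sum())               # missing values per column
print(df["stroke"].value_counts())   # class balance of the target
```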
In this project, we perform data cleaning to ensure the dataset is ready for analysis.
Missing values in the `bmi` column were filled with mean values calculated separately for patients with and without strokes.
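A minimal sketch of this group-wise imputation, assuming the dataframe `df` loaded above:

```python
# Fill missing BMI values with the mean BMI computed separately for
# stroke (stroke == 1) and non-stroke (stroke == 0) patients.
df["bmi"] = df.groupby("stroke")["bmi"].transform(lambda s: s.fillna(s.mean()))
```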
To make categorical variables usable in further analysis, they were encoded into numerical form with pandas' `factorize` function, which prepares the dataset for modelling.
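For example (a sketch; the column list mirrors the attributes above, and `df` is the dataframe from the loading step):

```python
import pandas as pd

# Encode each categorical column as integer codes with pandas.factorize.
categorical_cols = ["gender", "ever_married", "work_type",
                    "Residence_type", "smoking_status"]
for col in categorical_cols:
    df[col], _ = pd.factorize(df[col])
```

Note that `factorize` assigns arbitrary integer codes; this is harmless for tree-based models, but it imposes an artificial ordering that linear models such as LogisticRegression or SVC may misinterpret.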
We used six different machine learning models: five classifiers from the scikit-learn library, plus XGBoost from the `xgboost` package (which offers a scikit-learn-compatible interface). A minimal training sketch follows the list.

- GradientBoostingClassifier
- SVC
- LogisticRegression
- DecisionTreeClassifier
- XGBClassifier
- RandomForestClassifier
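The sketch below trains and evaluates all six models; the 80/20 split and the `random_state` value are illustration choices, not documented settings of the project:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# df is the cleaned, encoded dataframe from the previous steps.
X = df.drop(columns=["id", "stroke"])   # features: drop the ID and the target
y = df["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "GradientBoostingClassifier": GradientBoostingClassifier(),
    "SVC": SVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "XGBClassifier": XGBClassifier(),
    "RandomForestClassifier": RandomForestClassifier(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```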
We also tried regressor models just to see how they behaved. However, regressors are not a good choice for binary classification problems, because they are designed to predict continuous target variables rather than discrete classes.
Our models achieve the following performance:
| Classification algorithm | Accuracy | Accuracy with hyperparameter tuning |
|---|---|---|
| GradientBoostingClassifier | 83.73% | 83.73% |
| LogisticRegression | 79.20% | 79.47% |
| RandomForestClassifier | 99.32% | 99.32% |
| SVC | 79.58% | |
| DecisionTreeClassifier | 98.00% | 98.16% |
| XGBClassifier | 95.16% | |
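The tuned column comes from hyperparameter tuning; one common way to reproduce such results is scikit-learn's `GridSearchCV`. A minimal sketch for LogisticRegression (the parameter grid is illustrative, not the project's actual search space):

```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Illustrative grid only; the project's actual search space is not documented.
param_grid = {"C": [0.01, 0.1, 1, 10], "solver": ["lbfgs", "liblinear"]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Tuned test accuracy: {search.score(X_test, y_test):.2%}")
```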