This repository contains the code and related files for the "Titanic: Machine Learning from Disaster" project on Kaggle. The goal of the project is to create a machine learning model that predicts whether a passenger survived the Titanic shipwreck or not.
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during its maiden voyage, the Titanic sank after striking an iceberg, killing 1502 of the 2224 passengers and crew aboard. The tragedy shocked the international community and led to stricter safety regulations for ships.
The challenge is to analyze passenger characteristics and build a machine learning model that predicts which passengers survived the tragedy.
The repository provides two datasets: a training set (train.csv) and a test set (test.csv).
pclass: A proxy for socio-economic status (SES): 1st = Upper, 2nd = Middle, 3rd = Lower
age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5
sibsp: The dataset defines family relations in this way: Sibling = brother, sister, stepbrother, stepsister; Spouse = husband, wife (mistresses and fiancés were ignored)
parch: The dataset defines family relations in this way: Parent = mother, father; Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch=0 for them.
Embarked: Port of Embarkation C = Cherbourg, Q = Queenstown, S = Southampton
Survival: 0 = No, 1 = Yes
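As a rough illustration of the data dictionary above, the following sketch decodes a few coded fields using only the Python standard library. The sample rows are made up for illustration and are not taken from train.csv, the lowercase column names simply follow the dictionary (the header capitalization in the actual file may differ), and `decode_row` and the `relatives` field are hypothetical names. Treating only ages of 1 or more as candidates for "estimated" is also an assumption, since a fractional age below 1 could otherwise be mistaken for an estimate.

```python
import csv
import io

# Made-up rows for illustration only; they are not taken from train.csv.
SAMPLE = """pclass,age,sibsp,parch,embarked
1,38,1,0,C
3,0.75,2,1,S
2,28.5,0,0,Q
"""

PCLASS_LABELS = {"1": "Upper", "2": "Middle", "3": "Lower"}
PORT_NAMES = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def decode_row(row):
    """Translate the coded fields of one passenger row into readable values."""
    age = float(row["age"]) if row["age"] else None
    return {
        "ses": PCLASS_LABELS[row["pclass"]],
        "port": PORT_NAMES.get(row["embarked"]),
        "age": age,
        # Assumption: only ages >= 1 ending in .5 are flagged as estimated,
        # because ages below 1 are fractional for a different reason.
        "age_estimated": age is not None and age >= 1 and age % 1 == 0.5,
        # Relatives aboard: siblings/spouses plus parents/children.
        "relatives": int(row["sibsp"]) + int(row["parch"]),
    }


decoded = [decode_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for d in decoded:
    print(d)
```

In the notebook itself, the same mappings could presumably be applied with Pandas instead of the `csv` module.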
The evaluation metric for this competition is accuracy: the percentage of passengers whose survival is predicted correctly.
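Accuracy is the number of correct predictions divided by the total number of predictions. A minimal sketch of that calculation follows; the function name `accuracy` and the example labels are illustrative, not part of the competition tooling.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels: 4 of the 5 predictions match.
actual = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]
score = accuracy(actual, predicted)
print(f"{score:.0%}")  # prints "80%"
```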
You should submit a CSV file with exactly 418 entries plus a header row. Your submission will show an error if you have extra columns (beyond PassengerId and Survived) or rows.
The file should have exactly 2 columns:
- PassengerId (sorted in any order)
- Survived (contains your binary predictions: 1 for survived, 0 for deceased)
PassengerId,Survived
892,0
893,1
894,0
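The format rules above can be checked before uploading. The sketch below is one possible validator, not an official Kaggle tool: `check_submission` is a hypothetical helper that reports a wrong header, a wrong row count, extra columns, and non-binary predictions.

```python
import csv
import io

EXPECTED_HEADER = ["PassengerId", "Survived"]
EXPECTED_ROWS = 418


def check_submission(text, expected_rows=EXPECTED_ROWS):
    """Return a list of problems found in submission CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header, body = rows[0], rows[1:]
    if header != EXPECTED_HEADER:
        problems.append(f"unexpected header: {header}")
    if len(body) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(body)}")
    # Data rows start on line 2 of the file, after the header.
    for line_no, row in enumerate(body, start=2):
        if len(row) != 2:
            problems.append(f"line {line_no}: expected 2 columns, found {len(row)}")
        elif row[1] not in ("0", "1"):
            problems.append(f"line {line_no}: Survived must be 0 or 1, got {row[1]!r}")
    return problems


# The three-row sample from this README passes when the row count is relaxed.
sample = "PassengerId,Survived\n892,0\n893,1\n894,0\n"
print(check_submission(sample, expected_rows=3))
```

With the default `expected_rows=418`, the same three-row sample would be reported as having too few rows.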
Clone the repository to your local environment:
git clone https://github.com/g3rley/titanic.git
Install the project dependencies:
pip install -r requirements.txt
Run the titanic.ipynb Jupyter Notebook to explore the data and to build and evaluate the model.
Export the final model predictions to the submission.csv file.
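The export step could look roughly like the sketch below. It is an assumption about how the notebook might write its output, not the notebook's actual code: `write_submission` is a hypothetical helper, the predictions are placeholders, and the demo writes to a temporary directory so that it does not overwrite a real submission.csv.

```python
import csv
import os
import tempfile


def write_submission(passenger_ids, predictions, path):
    """Write PassengerId/Survived pairs to a CSV file with a header row."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("passenger_ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["PassengerId", "Survived"])
        for pid, pred in zip(passenger_ids, predictions):
            # int() turns booleans or numeric 0/1 values into plain 0 or 1.
            writer.writerow([pid, int(pred)])


# Demo with three placeholder predictions, written to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
write_submission([892, 893, 894], [0, 1, 0], out_path)
with open(out_path, newline="") as f:
    written = f.read()
print(written)
```

If the predictions live in a Pandas DataFrame, `DataFrame.to_csv(path, index=False)` would be the more idiomatic way to produce the same file.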
- Python - Programming language
- Jupyter Notebook - Web application for creating and sharing documents that contain live code, equations, visualizations and narrative text
- Pandas - Data analysis and manipulation tool
- NumPy - Library for working with arrays
- Matplotlib - Library for creating static, animated, and interactive visualizations
- Seaborn - Data visualization library based on matplotlib
- Scikit-learn - Machine learning library for the Python programming language
- Gerley Adriano - g3rley
This project is licensed under the MIT License. See the LICENSE file for more information.