This repository is a warm-up project that tackles the Titanic competition on Kaggle. I created it to get acquainted with the pandas library and with basic data cleaning and data analysis practices.
The repo consists of three Jupyter Notebooks.
- data_preparation.ipynb: all the data cleaning steps and some visualisations of feature relationships.
- sklearn_algorithms.ipynb: a very basic overview of different machine learning algorithms available in the scikit-learn Python library and their training performance.
- pytorch_nn.ipynb: a solution using a neural network written in PyTorch.
Directories:
- data/raw/: original dataset
- data/processed/: dataset after data cleaning with pandas
- results/: predicted output labels of the PyTorch model (initially empty)
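
As a rough illustration of how data moves between these directories, here is a minimal sketch of a raw-to-processed cleaning step with pandas. It is not the code from the notebooks: the `clean` function, the `train.csv` file name, the specific cleaning rules, and the three-row sample are all assumptions made for the example. A temporary directory stands in for the repo root so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: fill missing ages, encode sex as 0/1."""
    out = df.copy()
    # Replace missing ages with the median of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # Encode the categorical Sex column as integers.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


# Made-up sample rows using Kaggle Titanic column names; not real data.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3],
        "Sex": ["male", "female", "female"],
        "Age": [22.0, None, 30.0],
    }
)

with tempfile.TemporaryDirectory() as root:
    # Mirror the directory layout described above under a temporary root.
    raw_dir = Path(root) / "data" / "raw"
    processed_dir = Path(root) / "data" / "processed"
    raw_dir.mkdir(parents=True)
    processed_dir.mkdir(parents=True)

    # Assumed file name; the actual file names in the repo may differ.
    sample.to_csv(raw_dir / "train.csv", index=False)

    raw = pd.read_csv(raw_dir / "train.csv")
    processed = clean(raw)
    processed.to_csv(processed_dir / "train.csv", index=False)

print(processed)
```

Keeping the raw files untouched and writing cleaned copies to a separate directory means the cleaning can be rerun from scratch at any time.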