This is a group project with my friends in which we do EDA (exploratory data analysis) on a dataset from Kaggle.
In this project, I did the data preprocessing for the dataset, where I:
- handled missing values
- converted columns to appropriate data types
- handled duplicates
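The three preprocessing steps above could look roughly like this in pandas. This is a minimal sketch, not the exact code we used. It assumes the column names of the Kaggle CSV (`event_time`, `event_type`, `category_code`, `brand`, `user_session`, ...), and the helper `preprocess`, the `"unknown"` placeholder and the choice to drop rows without a `user_session` are illustrative assumptions. A tiny hand-made sample stands in for the real file.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: handle missing values, fix dtypes, drop duplicates."""
    df = df.copy()

    # 1. Missing values: keep rows with no category/brand but mark them explicitly,
    #    and drop rows without a session id (an assumed, illustrative choice).
    df["category_code"] = df["category_code"].fillna("unknown")
    df["brand"] = df["brand"].fillna("unknown")
    df = df.dropna(subset=["user_session"])

    # 2. Data types: timestamps look like "2019-10-01 00:00:00 UTC",
    #    so strip the suffix and parse them as timezone-aware UTC datetimes.
    df["event_time"] = pd.to_datetime(
        df["event_time"].str.replace(" UTC", "", regex=False), utc=True
    )
    # event_type only has a few distinct values, so store it as a category.
    df["event_type"] = df["event_type"].astype("category")

    # 3. Duplicates: remove rows that are identical in every column.
    return df.drop_duplicates().reset_index(drop=True)


# Made-up sample with one duplicate row and one row missing user_session.
sample = pd.DataFrame({
    "event_time": ["2019-10-01 00:00:00 UTC", "2019-10-01 00:00:00 UTC",
                   "2019-10-01 00:05:00 UTC", "2019-10-01 00:06:00 UTC"],
    "event_type": ["view", "view", "cart", "purchase"],
    "product_id": [1, 1, 2, 3],
    "category_id": [10, 10, 20, 30],
    "category_code": ["electronics.smartphone", "electronics.smartphone", None, None],
    "brand": ["samsung", "samsung", None, "apple"],
    "price": [100.0, 100.0, 50.0, 999.0],
    "user_id": [7, 7, 8, 9],
    "user_session": ["a", "a", "b", None],
})
clean = preprocess(sample)
```

Filling missing values before dropping duplicates means rows that differ only in *how* a value is missing end up identical and are deduplicated together.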
Key takeaways from this project:
- Understanding the dataset and its attributes is essential for knowing what you plan to explore in it.
- Keeping RAM usage as low as possible is necessary so that the EDA can run smoothly.
- Always be able to tell which attributes are categorical and which are numerical.
- Inspect the values you plan to drop to make sure removing them will not distort the data visualizations and lead to inaccurate output.
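The takeaways on memory, column types and dropping values can be illustrated with a small pandas sketch. The helper names (`shrink_memory`, `split_columns`, `drop_impact`), the `max_ratio` threshold of 0.5 and the `ID_LIKE_COLUMNS` set are hypothetical, and the sample data is made up. It shows three ideas: downcasting numbers and using the `category` dtype to cut RAM, treating numeric ID columns as categorical rather than numerical, and comparing a column's distribution before and after a drop.

```python
import numpy as np
import pandas as pd

# Hypothetical: numeric identifier columns that should be analysed as categories.
ID_LIKE_COLUMNS = {"product_id", "category_id", "user_id"}


def shrink_memory(df: pd.DataFrame, max_ratio: float = 0.5) -> pd.DataFrame:
    """Downcast numeric columns and store repetitive text as 'category'."""
    out = df.copy()
    for col in out.select_dtypes(include=np.integer).columns:
        # int64 -> smallest signed integer type that holds every value
        out[col] = pd.to_numeric(out[col], downcast="integer")
    for col in out.select_dtypes(include=np.floating).columns:
        # float64 -> float32 if pandas judges the values still close enough
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes(include="object").columns:
        # Only convert text columns whose values repeat a lot
        if len(out) and out[col].nunique(dropna=False) / len(out) < max_ratio:
            out[col] = out[col].astype("category")
    return out


def split_columns(df: pd.DataFrame, id_like=ID_LIKE_COLUMNS):
    """Return (categorical, numerical) column names; numeric IDs count as categorical."""
    text_like = set(df.select_dtypes(include=["object", "category"]).columns)
    categorical = [c for c in df.columns if c in text_like or c in id_like]
    numerical = [c for c in df.select_dtypes(include="number").columns
                 if c not in id_like]
    return categorical, numerical


def drop_impact(df: pd.DataFrame, mask: pd.Series, column: str) -> pd.DataFrame:
    """Compare the share of each value in `column` before and after dropping `mask` rows."""
    values = df[column].astype("object")  # plain labels, so indexes align simply
    before = values.value_counts(normalize=True)
    after = values[~mask].value_counts(normalize=True)
    return pd.DataFrame({"before": before, "after": after}).fillna(0.0)


# Made-up sample data.
raw = pd.DataFrame({
    "event_type": ["view", "view", "view", "view", "cart", "cart", "purchase", "view"],
    "product_id": [1, 2, 2, 3, 3, 4, 4, 5],
    "user_id": [7, 7, 8, 8, 9, 9, 10, 10],
    "brand": ["samsung", None, "apple", "apple", "apple", None, "samsung", "apple"],
    "price": [100.0, 50.0, 999.0, 999.0, 999.0, 20.0, 20.0, 35.5],
})
before_bytes = raw.memory_usage(deep=True).sum()
slim = shrink_memory(raw)
after_bytes = slim.memory_usage(deep=True).sum()

categorical, numerical = split_columns(slim)

# Would dropping rows with a missing brand skew the event_type shares?
impact = drop_impact(slim, slim["brand"].isna(), "event_type")
```

Downcasting `price` to `float32` trades some precision for half the memory, so it is worth checking whether that precision matters before applying it to the full dataset.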
Source dataset: https://www.kaggle.com/datasets/mkechinov/ecommerce-behavior-data-from-multi-category-store