sundaslatif / data-cleaning-using-numpy-and-pandas Goto Github PK


This is a tutorial-based project that shows various ways to clean your data before pushing it into a Data Science / Data Analysis black box.

Jupyter Notebook 100.00%


Data-Cleaning-using-Numpy-and-Pandas


Objective:

A commonly quoted estimate is that around 80-85% of a Data Scientist's time goes into cleaning raw, unstructured, unformatted, and unwanted data. To get clean data to work with, we need to apply various cleaning techniques.

For this we can use Python (a widely used language for data analysis) packages such as pandas and NumPy (https://pandas.pydata.org/pandas-docs/stable/dsintro.html).

Programming Environment: Python Jupyter Notebook

Parts:

1. Removing unwanted columns from the raw data (.csv file)

2. Data Cleaning Step: Setting up a key index for the DataFrame

It is important to have a primary key column in your dataset, which here is the 'identifier' field. Let's check whether 'identifier' holds only unique values before using it as the index.
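The two parts above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline CSV data and every column name other than 'identifier' are made up for the example, and the choice of which columns count as "unwanted" is an assumption.

```python
import io

import pandas as pd

# Hypothetical stand-in for the raw .csv file. Only the 'identifier'
# column name comes from the text; all other names and values are invented.
raw_csv = io.StringIO(
    "identifier,title,author,shelfmark,notes\n"
    "206,Book A,Author A,S1,x\n"
    "216,Book B,Author B,S2,y\n"
    "218,Book C,Author C,S3,z\n"
)
df = pd.read_csv(raw_csv)

# Part 1: drop columns that are not needed for the analysis.
unwanted_columns = ["shelfmark", "notes"]  # assumed names
df = df.drop(columns=unwanted_columns)

# Part 2: confirm 'identifier' is unique before using it as the index.
identifier_is_unique = df["identifier"].is_unique
if identifier_is_unique:
    df = df.set_index("identifier")

print(df.columns.tolist())
print(df.index.name)
```

With a unique index in place, rows can be looked up by key, for example `df.loc[216]`. If you would rather have pandas raise an error on duplicates than check by hand, `set_index` also accepts `verify_integrity=True`.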
