This is a tutorial-based project that shows various ways to clean your data before pushing it into the data science / data analysis black box.
By a commonly cited estimate, around 80-85% of a data scientist's time goes into cleaning raw, unstructured, unformatted, and unwanted data. To get clean data to work with, we need to apply several cleaning techniques.
For this we can use Python, a widely used language for data work, with packages such as pandas and NumPy (https://pandas.pydata.org/pandas-docs/stable/dsintro.html).
Programming Environment: Python Jupyter Notebook
It is important to have a primary key column in your dataset; here that is the 'identifier' field. Let's check whether 'identifier' contains only unique values.
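
A minimal sketch of that uniqueness check with pandas follows. The small DataFrame and its 'title' column are made up for illustration; in the notebook you would run the same checks on your own loaded dataset, assuming it has an 'identifier' column.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataset.
df = pd.DataFrame({
    "identifier": [101, 102, 103, 103],
    "title": ["A", "B", "C", "C again"],
})

# True only if every value in the column is distinct.
is_unique = df["identifier"].is_unique

# All rows whose identifier occurs more than once
# (keep=False flags every copy, not just the later ones).
duplicates = df[df["identifier"].duplicated(keep=False)]

print("identifier is unique:", is_unique)
print(duplicates)
```

If the check reports duplicates, inspecting those rows first is usually safer than dropping them right away, since they may be true repeats or distinct records that share a key by mistake.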