odd2023-datascience-ex01's Introduction

Ex-01_DS_Data_Cleansing

AIM

To read the given data and perform data cleaning and save the cleaned data to a file.

Explanation

Data cleaning is the process of preparing data for analysis by removing or modifying data that is incorrect, incomplete, irrelevant, duplicated, or improperly formatted. Data cleaning is not simply about erasing data, but rather about finding a way to maximize a dataset's accuracy without necessarily deleting information.
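As a minimal sketch of the difference between erasing data and repairing it, the snippet below compares dropping incomplete rows with filling the gaps using each column's mean. The table and its values are hypothetical and only reuse the column names `M1` and `TOTAL` from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical marks table; the values are made up for illustration.
marks = pd.DataFrame({
    "M1": [80.0, np.nan, 65.0, 90.0],
    "TOTAL": [320.0, 280.0, np.nan, 360.0],
})

# Option 1: delete every row that has at least one missing value.
dropped = marks.dropna(how="any")

# Option 2: keep all rows and fill each gap with its column mean.
filled = marks.fillna(marks.mean())

print(len(dropped))  # rows left after deletion
print(len(filled))   # rows left after imputation
```

Dropping loses every row that has a gap, while imputation keeps all rows at the cost of introducing estimated values. Which trade-off is acceptable depends on the analysis.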

ALGORITHM

STEP 1

Read the given data.

STEP 2

Get information about the data.

STEP 3

Remove the null values from the data.

STEP 4

Save the cleaned data to a file.
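The four steps can be sketched end to end as follows. This is an illustration under stated assumptions, not the exercise's actual run: the CSV content is a small hypothetical stand-in for the real input file, and the output path `cleaned_sample.csv` in the system temp directory is likewise an assumption.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical CSV content standing in for the real input file.
raw_csv = io.StringIO(
    "NAME,M1,M2,TOTAL\n"
    "A,80,70,150\n"
    "B,,60,\n"
    "C,90,85,175\n"
)

# STEP 1: read the data.
df = pd.read_csv(raw_csv)

# STEP 2: get information about the data.
df.info()
null_counts = df.isnull().sum()

# STEP 3: remove rows that contain null values.
clean = df.dropna(how="any")

# STEP 4: save the cleaned data to a file (the path is an assumption).
out_path = os.path.join(tempfile.gettempdir(), "cleaned_sample.csv")
clean.to_csv(out_path, index=False)
```

Passing `index=False` keeps the pandas row index out of the saved file, so reading it back yields the same columns as the input.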

CODE and OUTPUT

CODE

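import pandas as pd

# STEP 1: read the given data
df = pd.read_csv("SampleDS.csv")
df

# STEP 2: get information about the data
df.shape
df.describe()
df.info()
df.isnull().sum()

# STEP 3: handle the null values
# Dropping rows with any null value (returns a new DataFrame; df is unchanged)
df.dropna(how='any').shape
x = df.dropna(how='any')
x
# Dropping only the rows in which every value is null
df.dropna(how='all').shape
# Dropping rows that have a null in specific columns
tot = df.dropna(subset=['TOTAL'], how='any')
tot
tot = df.dropna(subset=['M1', 'M2', 'M3', 'M4'], how='any')
tot
# Alternatives to dropping; each call returns a new DataFrame
df.fillna(0)
df.ffill()  # replaces the deprecated fillna(method='ffill')
df.select_dtypes('number').interpolate()  # interpolate numeric columns only

# Fill missing TOTAL values with the column mean.
# fillna(..., inplace=True) returns None, so assign the result back instead.
mn = df['TOTAL'].mean()
mn
df['TOTAL'] = df['TOTAL'].fillna(mn)
df
l = df['M1'].interpolate()
l
df.isnull()
df.notnull()
# Keep only complete rows for the cleaned data
df = df.dropna()
df.head()
df.tail()
df.info()
df.describe()

# Remove duplicate rows
df.duplicated()
df.drop_duplicates(inplace=True)
df

# Parse the date of birth into a datetime column
df['cd'] = pd.to_datetime(df['DOB'])
df

# Drop rows whose average exceeds 100 (boolean mask instead of a row loop)
df = df[~(df['AVG'] > 100)]
df

# STEP 4: save the cleaned data (the output file name is an assumption)
df.to_csv("CleanedSampleDS.csv", index=False)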
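The last stages of the cleaning (duplicates, date parsing, and out-of-range averages) can be tried in isolation with the sketch below. The records are hypothetical; only the column names `DOB` and `AVG` come from the exercise, and `NAME` and `DOB_parsed` are names introduced for illustration. The sketch also assumes that an average above 100 is invalid, and it uses `errors="coerce"` so that an unparseable date becomes `NaT` rather than raising an error.

```python
import pandas as pd

# Hypothetical records; names, dates and averages are invented for illustration.
records = pd.DataFrame({
    "NAME": ["A", "B", "B", "C", "D"],
    "DOB": ["2001-05-14", "2000-11-02", "2000-11-02", "not a date", "2002-01-30"],
    "AVG": [82.5, 74.0, 74.0, 91.0, 140.0],
})

# Drop exact duplicate rows, keeping the first occurrence.
deduped = records.drop_duplicates()

# Parse DOB; errors="coerce" turns unparseable strings into NaT instead of raising.
deduped = deduped.assign(DOB_parsed=pd.to_datetime(deduped["DOB"], errors="coerce"))

# Keep rows whose AVG is not above 100; rows with a missing AVG are kept as well.
valid = deduped[~(deduped["AVG"] > 100)].reset_index(drop=True)
print(valid)
```

Filtering with a boolean mask avoids mutating a DataFrame while iterating over its index, and it makes the rule for what counts as an invalid row visible in one expression.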

OUTPUT

[Images: output screenshots of the code cells above]

odd2023-datascience-ex01's People

Contributors

karthi-govindharaju, bhargava-shankar
