data_cleaning

A repository of SQL data cleaning projects.

Introduction

This repo collects small projects for practicing data cleansing with SQL, Excel or any other method. It was inspired by a LinkedIn post by Sushanta Khara.

Project List:

Problem Statement

In data analysis, the analyst must ensure that the data is 'clean' before working with it. 'Dirty' data can lead to unreliable, inaccurate or misleading results. Garbage in = garbage out.

These are some of the steps that can be taken to properly prepare your dataset for analysis:

  • Check for duplicate entries and remove them.
  • Remove extra spaces and/or other invalid characters.
  • Separate or combine values as needed.
  • Ensure that certain values (age, dates...) are within a valid range.
  • Check for outliers.
  • Correct misspellings and incorrectly entered data.
  • Add new, relevant rows or columns to the new dataset.
  • Check for null or empty values.

Using the criteria above, create a new SQL table with the properly formatted data.
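
As a rough illustration of several of the steps above (trimming spaces, removing duplicates, range checks, handling empty values), here is a minimal sketch that builds a new, cleaned table from a raw one. It runs SQL through Python's built-in sqlite3 module so it can be tried without a database server. The table names, columns, sample rows and the 0 to 120 age range are all hypothetical and not taken from any dataset in this repository; other SQL dialects may need small syntax changes.

```python
import sqlite3

# In-memory SQLite database; everything below is made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_members (full_name TEXT, email TEXT, age INTEGER);
INSERT INTO raw_members VALUES
    ('  Ann Lee ', 'ANN@EXAMPLE.COM', 34),
    ('Ann Lee',    'ann@example.com', 34),   -- duplicate once trimmed and lower-cased
    ('Bob Ray',    '',                29),   -- empty email
    ('Cy Dunn',    'cy@example.com',  212),  -- age outside the assumed valid range
    ('Di Moss',    'di@example.com',  NULL); -- missing age

-- Build the cleaned table from the raw one.
CREATE TABLE clean_members AS
SELECT DISTINCT                                   -- drop rows that are identical after cleaning
    TRIM(full_name)                AS full_name,  -- remove leading/trailing spaces
    NULLIF(LOWER(TRIM(email)), '') AS email,      -- normalise case, turn '' into NULL
    CASE WHEN age BETWEEN 0 AND 120 THEN age END AS age  -- out-of-range ages become NULL
FROM raw_members;
""")

clean_rows = conn.execute(
    "SELECT full_name, email, age FROM clean_members ORDER BY full_name"
).fetchall()
for row in clean_rows:
    print(row)
```

Whether an out-of-range or empty value should become NULL, be corrected, or cause the row to be dropped depends on the dataset; the sketch only shows one possible choice.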

Datasets used

This repository contains different projects/datasets to give the user many opportunities to practice:

  • Basic select statements (select, where, group by, having)
  • Aggregate functions (count, sum, min, max, avg)
  • Joins (inner, outer, left, right)
  • CTEs, temp tables and views
  • String and date manipulation functions
  • Window functions (rank, lead, lag, row_number, ntile...)
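
To show how a few of these features can combine in a cleaning task, the sketch below uses a CTE with row_number to keep only the most recently loaded copy of each order, then aggregates with group by and having. The orders table, its columns and its rows are hypothetical. It again uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory SQLite database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL, loaded_at TEXT);
INSERT INTO orders VALUES
    (1, 'ann', 20.0, '2024-01-01'),
    (1, 'ann', 25.0, '2024-01-03'),  -- later reload of order 1
    (2, 'bob', 10.0, '2024-01-02'),
    (3, 'ann', 15.0, '2024-01-04');
""")

totals = conn.execute("""
WITH ranked AS (
    -- Number the copies of each order, newest load first.
    SELECT order_id, customer, amount,
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY loaded_at DESC) AS rn
    FROM orders
)
SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_amount
FROM ranked
WHERE rn = 1                 -- keep only the newest copy of each order
GROUP BY customer
HAVING COUNT(*) > 1          -- customers with more than one distinct order
ORDER BY customer
""").fetchall()
print(totals)
```

Filtering on rn = 1 is one common way to deduplicate when the copies differ in some column, which a plain select distinct would not catch.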

Contributors

iweld
