python_eda's Introduction

The material in this GitHub repository is part of the course High Performance Data Processing (SECP3133). It contains general information on Exploratory Data Analysis (EDA) as well as EDA case studies using Malaysian datasets. The case studies were created by Bachelor of Computer Science (Data Engineering) students at Universiti Teknologi Malaysia.

Exploratory Data Analysis

Exploratory data analysis (EDA) involves using graphics and visualizations to explore and analyze a data set. The goal is to explore, investigate and learn, as opposed to confirming statistical hypotheses.

When do I use it? Exploratory data analysis is a powerful way to get to know a data set. Even when your goal is to perform planned analyses, EDA can be used for data cleaning, for subgroup analyses, or simply for understanding your data better. An important first step in any data analysis is to plot the data.

📖 Lab

| No | Title | Colab | GitHub |
|----|-------|-------|--------|
| 1 | Introduction to Exploratory Data Analysis | Open in Colab | Open in GitHub |
| 2 | Exploratory data analysis in Python | Open in Colab | Open in GitHub |
| 3 | Housing Dataset | Open in Colab | Open in GitHub |
| 4 | Exploring data and missing values | Open in Colab | Open in GitHub |

🚀 Case Study: Evaluation Criteria

Your submission will be evaluated using the following criteria:

  • Dataset must contain at least 5 columns and 1500 rows of data
  • You must ask and answer at least 5 questions about the dataset
  • Your submission must include at least 5 visualizations (graphs)
  • Your submission must include explanations using markdown cells, apart from the code.
  • Your work must not be plagiarized, i.e. copied and pasted from elsewhere.
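
The size requirement above can be checked as soon as a dataset is loaded. The following is a minimal sketch, assuming the data is already in a Pandas data frame; the helper name `meets_size_criteria` and the stand-in data are hypothetical and only illustrate the check.

```python
import pandas as pd

MIN_ROWS = 1500    # minimum row count from the evaluation criteria
MIN_COLUMNS = 5    # minimum column count from the evaluation criteria


def meets_size_criteria(df: pd.DataFrame) -> bool:
    """Return True when the frame has at least MIN_ROWS rows and MIN_COLUMNS columns."""
    n_rows, n_cols = df.shape
    return n_rows >= MIN_ROWS and n_cols >= MIN_COLUMNS


# Hypothetical stand-in data; a real project would load its own file instead.
demo = pd.DataFrame({f"col_{i}": range(2000) for i in range(6)})
print(demo.shape, meets_size_criteria(demo))
```

Running the check before starting the analysis avoids discovering late that a dataset is too small to qualify.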

Follow this step-by-step guide to work on your project.

Step 1: Select a real-world dataset

Step 2: Perform data preparation & cleaning

  • Load the dataset into a data frame using Pandas
  • Explore the number of rows & columns, ranges of values etc.
  • Handle missing, incorrect and invalid data
  • Perform any additional steps (parsing dates, creating additional columns, merging multiple datasets etc.)
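
The preparation steps above might look roughly like the sketch below. It assumes a small inline CSV in place of a real file, and the column names (`location`, `price`, `rooms`, `listed_on`) are hypothetical. The rule that non-positive prices are invalid is also an assumption made for illustration.

```python
import io

import pandas as pd

# Hypothetical inline CSV standing in for a real file; column names are illustrative only.
raw_csv = io.StringIO(
    "location,price,rooms,listed_on\n"
    "Cheras,450000,3,2021-01-15\n"
    "Bangsar,,4,2021-02-01\n"
    "Kepong,-1,2,2021-02-20\n"
    "Cheras,520000,3,not a date\n"
)

# Load the dataset into a data frame.
df = pd.read_csv(raw_csv)

# Explore the number of rows and columns, the column types and the value ranges.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Count missing values per column.
print(df.isna().sum())

# Assumption: a non-positive price is invalid, so mark it as missing,
# then drop every row that has no usable price.
df.loc[df["price"] <= 0, "price"] = float("nan")
df = df.dropna(subset=["price"]).copy()

# Parse dates; entries that cannot be parsed become NaT instead of raising an error.
df["listed_on"] = pd.to_datetime(df["listed_on"], errors="coerce")

# Create an additional column from existing ones.
df["price_per_room"] = df["price"] / df["rooms"]
print(df)
```

Whether to drop, impute or flag bad rows depends on the dataset; dropping is used here only because it keeps the sketch short.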

Step 3: Perform exploratory analysis & visualization

  • Compute the mean, sum, range and other interesting statistics for numeric columns
  • Explore distributions of numeric columns using histograms etc.
  • Explore relationships between columns using scatter plots, bar charts etc.
  • Make a note of interesting insights from the exploratory analysis
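
As an illustration of the exploratory steps above, the sketch below computes summary statistics and draws a histogram and a scatter plot. The data is synthetic and the column names `area_sqft` and `price` are hypothetical; a real project would use its own cleaned dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic, hypothetical data; replace with the cleaned project dataset.
rng = np.random.default_rng(0)
area = rng.uniform(500, 2500, size=200)
price = area * 400 + rng.normal(0, 50_000, size=200)
df = pd.DataFrame({"area_sqft": area, "price": price})

# Mean, sum, minimum and maximum of each numeric column, plus the range.
stats = df.agg(["mean", "sum", "min", "max"])
stats.loc["range"] = stats.loc["max"] - stats.loc["min"]
print(stats)

# Pearson correlation as a numeric companion to the scatter plot.
corr = df["area_sqft"].corr(df["price"])
print(f"correlation between area and price: {corr:.2f}")

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: distribution of a single numeric column.
ax_hist.hist(df["price"], bins=20)
ax_hist.set_xlabel("price")
ax_hist.set_ylabel("count")
ax_hist.set_title("Distribution of price")

# Scatter plot: relationship between two numeric columns.
ax_scatter.scatter(df["area_sqft"], df["price"], s=10)
ax_scatter.set_xlabel("area_sqft")
ax_scatter.set_ylabel("price")
ax_scatter.set_title("Price against area")

fig.tight_layout()
```

In a Colab or Jupyter notebook the `matplotlib.use("Agg")` line can be dropped, since figures are rendered inline.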

Step 4: Ask & answer questions about the data

  • Ask at least 5 interesting questions about your dataset, as required by the evaluation criteria
  • Answer the questions either by computing the results using Numpy/Pandas or by plotting graphs using Matplotlib/Seaborn
  • Create new columns, merge multiple datasets and perform grouping/aggregation wherever necessary
  • Wherever you're using a library function from Pandas/Numpy/Matplotlib etc. explain briefly what it does
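
One way to answer a question by merging and grouping is sketched below. The two tables, their column names and all values are made up for illustration, and the question itself ("which region has the highest average price?") is only an example.

```python
import pandas as pd

# Hypothetical tables; names and values are made up for illustration.
listings = pd.DataFrame({
    "district": ["A", "A", "B", "B", "C"],
    "price": [400_000, 600_000, 300_000, 500_000, 900_000],
    "rooms": [3, 4, 2, 3, 5],
})
districts = pd.DataFrame({
    "district": ["A", "B", "C"],
    "region": ["North", "North", "South"],
})

# pd.merge joins two frames on a shared key column, similar to a SQL join.
merged = pd.merge(listings, districts, on="district", how="left")

# Create a new column from existing ones.
merged["price_per_room"] = merged["price"] / merged["rooms"]

# Question: which region has the highest average price?
# groupby splits the rows by region, mean aggregates each group,
# and sort_values orders the result from highest to lowest.
avg_price = merged.groupby("region")["price"].mean().sort_values(ascending=False)
print(avg_price)

# idxmax returns the index label (here, the region) of the largest value.
top_region = avg_price.idxmax()
print("Region with the highest average price:", top_region)
```

The comments next to each call show the kind of brief explanation of library functions that the last bullet asks for.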

Step 5: Summarize your inferences & write a conclusion

  • Write a summary of what you've learned from the analysis
  • Include interesting insights and graphs from previous sections
  • Share ideas for future work on the same topic using other relevant datasets
  • Share links to resources you found useful during your analysis

Step 6: Make a submission

  • Upload your notebook to e-learning.

Example Projects

Refer to these projects for inspiration:

🌟 Case Study: Exploratory Data Analysis

| Team | Title | Colab | GitHub |
|------|-------|-------|--------|
| 404 Error | Property in Kuala Lumpur | Open in Colab | Open in GitHub |
| Alrite | ABC | Open in Colab | Open in GitHub |
| BEFE | ABC | Open in Colab | Open in GitHub |
| Boboiboy | Property Listings in Kuala Lumpur | Open in Colab | Open in GitHub |
| COLBY | ABC | Open in Colab | Open in GitHub |
| FANTOM | ABC | Open in Colab | Open in GitHub |
| HAHA | Foreign Direct Investment In Malaysia | Open in Colab | Open in GitHub |
| HD | ABC | Open in Colab | Open in GitHub |
| KIA | Malaysia State Election 2018 | Open in Colab | Open in GitHub |
| LAB | ABC | Open in Colab | Open in GitHub |
| MAAM | ABC | Open in Colab | Open in GitHub |
| MEOW | ABC | Open in Colab | Open in GitHub |
| MM | Malaysia's 14th State Election Result | Open in Colab | Open in GitHub |
| PIXALATED | ABC | Open in Colab | Open in GitHub |
| POTATO | ABC | Open in Colab | Open in GitHub |
| QnX | ABC | Open in Colab | Open in GitHub |
| SAMVERSE | ABC | Open in Colab | Open in GitHub |
| SMOL | Population in Malaysia from 2010-2019 | Open in Colab | Open in GitHub |
| SQ | ABC | Open in Colab | Open in GitHub |
| TUK | ABC | Open in Colab | Open in GitHub |
| UWU | Property Listings in Kuala Lumpur | Open in Colab | Open in GitHub |
