Understanding Pandas Series and DataFrames - Lab

Introduction

In this lab, you'll get hands-on practice cleaning data with pandas.

Objectives

You will be able to:

  • Use the .map() and .apply() methods to apply a function to a pandas Series or DataFrame
  • Perform operations to change the structure of pandas DataFrames
  • Change the index of a pandas DataFrame
  • Change data types of columns in pandas DataFrames
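As a quick orientation for the first objective, here is a minimal sketch of how .map() and .apply() differ. The Series and DataFrame below are made-up toy data, not part of the lab's dataset.

```python
import pandas as pd

# Toy data, invented for illustration only
s = pd.Series([1, 2, 3])
frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# Series.map calls the function once per element
doubled = s.map(lambda x: x * 2)

# DataFrame.apply passes whole columns (axis=0, the default) ...
col_range = frame.apply(lambda col: col.max() - col.min())

# ... or whole rows (axis=1) to the function
row_total = frame.apply(lambda row: row["a"] + row["b"], axis=1)

print(doubled.tolist(), col_range.tolist(), row_total.tolist())
```

In short, .map() works element by element on a Series, while .apply() on a DataFrame hands the function one column or one row at a time.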

Let's get started!

Import the file 'turnstile_180901.txt'.

# Import the required libraries
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Import the file 'turnstile_180901.txt'
df = pd.read_csv('turnstile_180901.txt')

# Print the number of rows and columns in df
print(df.shape)

# Print the first five rows of df
df.head()

Rename all the columns to lower case:

# Rename all the columns to lower case
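One possible approach, sketched on a small stand-in DataFrame rather than the real file: the `demo` frame and its values are hypothetical. Both options below use standard pandas calls.

```python
import pandas as pd

# Hypothetical miniature frame standing in for the turnstile data
demo = pd.DataFrame({"C/A": ["A002"], "UNIT": ["R051"], "LINENAME": ["NQR456W"]})

# Option 1: vectorized string method on the column Index
lowered = demo.copy()
lowered.columns = lowered.columns.str.lower()

# Option 2: pass a function to rename(), which returns a new DataFrame
renamed = demo.rename(columns=str.lower)

print(list(lowered.columns))
```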

Change the index to 'linename':

# Change the index to 'linename'
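A minimal sketch of setting a column as the index, using a hypothetical two-row frame (the station names and counts are invented):

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real file
demo = pd.DataFrame({
    "linename": ["NQR456W", "ACE"],
    "station": ["STATION A", "STATION B"],
    "entries": [100, 200],
})

# set_index returns a new DataFrame; the chosen column moves into the index
indexed = demo.set_index("linename")

print(indexed.index.name)
print(list(indexed.columns))
```

Note that set_index does not modify `demo` in place, so the result has to be assigned to a name.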

Reset the index:

# Reset the index
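For reference, a short sketch of what reset_index does to a frame whose index is named 'linename'. The data here is hypothetical.

```python
import pandas as pd

# Hypothetical frame that already uses 'linename' as its index
demo = pd.DataFrame(
    {"entries": [100, 200]},
    index=pd.Index(["NQR456W", "ACE"], name="linename"),
)

# The old index becomes a regular column and a fresh 0..n-1 index is created
restored = demo.reset_index()

# drop=True discards the old index instead of keeping it as a column
dropped = demo.reset_index(drop=True)

print(list(restored.columns), list(dropped.columns))
```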

Create another column 'num_lines' that is a count of how many lines pass through a station. Then sort your DataFrame by this column in descending order.

Hint: According to the data dictionary, LINENAME represents all train lines that can be boarded at a given station. Normally lines are represented by one character. For example, LINENAME 456NQR represents trains 4, 5, 6, N, Q, and R.

# Add a new 'num_lines' column
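Following the hint, one way to count lines is to take the length of each LINENAME string. This sketch assumes every line really is a single character, which the hint says is only "normally" true; the sample values are illustrative.

```python
import pandas as pd

# Illustrative values; '456NQR' is the example from the hint
demo = pd.DataFrame({"linename": ["ACE", "456NQR", "L"]})

# Assumption: one character per line, so string length == number of lines
demo["num_lines"] = demo["linename"].map(len)

# Sort so the stations served by the most lines come first
demo = demo.sort_values("num_lines", ascending=False)

print(demo)
```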

Write a function to clean column names:

def clean(col_name):
    # Clean the column name in any way you want to. Hint: think back to str methods 
    cleaned = None
    return cleaned
# Use the above function to clean the column names
# Check to ensure the column names were cleaned
df.columns
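Here is a sketch of one possible cleaning function and how it could be applied to the column Index. The name `clean_example`, the choice of cleaning steps, and the messy column names are all assumptions made for illustration; your own clean() may do something different.

```python
import pandas as pd

def clean_example(col_name):
    """Strip surrounding whitespace and lower-case a column name."""
    return col_name.strip().lower()

# Hypothetical messy column names (padding and mixed case)
demo = pd.DataFrame(columns=["ENTRIES", "EXITS   ", " Station"])

# Index.map applies the function to each column name
demo.columns = demo.columns.map(clean_example)

print(list(demo.columns))
```

Stray whitespace in a column name is easy to miss when the frame is printed, which is why it is worth checking `df.columns` after cleaning.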
  • Change the data type of the 'date' column to a date
  • Add a new column 'day_of_week' that represents the day of the week

# Convert the data type of the 'date' column to a date


# Add a new column 'day_of_week' that represents the day of the week 
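A minimal sketch of both steps on made-up rows. It assumes the dates are strings in MM/DD/YYYY form; check the actual file and adjust the `format` argument if it differs.

```python
import pandas as pd

# Hypothetical rows; the MM/DD/YYYY format is an assumption about the file
demo = pd.DataFrame({"date": ["08/25/2018", "08/27/2018"]})

# Parse the strings into datetime64 values
demo["date"] = pd.to_datetime(demo["date"], format="%m/%d/%Y")

# .dt.dayofweek numbers the days Monday=0 through Sunday=6
demo["day_of_week"] = demo["date"].dt.dayofweek

print(demo.dtypes)
print(demo)
```

The Monday=0 numbering lines up with the keys of the weekend_map dictionary used later in the lab, where 5 and 6 are the weekend days.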
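# Group the data by day of week and plot the sum of the numeric columns
# (numeric_only=True skips non-numeric columns such as 'date', which cannot be summed)
grouped = df.groupby('day_of_week').sum(numeric_only=True)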
grouped.plot(kind='barh')
plt.show()
  • Reset the index of grouped
  • Print the first five rows of grouped

# Reset the index of grouped
grouped = None

# Print the first five rows of grouped
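The sketch below shows why resetting the index matters after a groupby: the group key lives in the index until it is reset, so it cannot be selected as a column. The numbers and the `demo_weekend_map` dictionary are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical data: two Monday rows (0) and one Saturday row (5)
demo = pd.DataFrame({"day_of_week": [0, 0, 5], "entries": [10, 20, 5]})

# After groupby().sum(), 'day_of_week' is the index, not a column
summed = demo.groupby("day_of_week").sum()

# reset_index turns the group key back into an ordinary column
flat = summed.reset_index()

# With the key available as a column, .map() can translate it via a dict
demo_weekend_map = {0: False, 1: False, 2: False, 3: False, 4: False, 5: True, 6: True}
flat["is_weekend"] = flat["day_of_week"].map(demo_weekend_map)

print(flat.head())
```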

Add a new column 'is_weekend' that maps the 'day_of_week' column using the dictionary weekend_map:

# Use this dictionary to create a new column 
weekend_map = {0:False, 1:False, 2:False, 3:False, 4:False, 5:True, 6:True}

# Add a new column 'is_weekend' that maps the 'day_of_week' column using weekend_map
grouped['is_weekend'] = grouped['day_of_week'].map(weekend_map)
# Group the data by weekend/weekday and plot the sum of the numeric columns
wkend = grouped.groupby('is_weekend').sum()
wkend[['entries', 'exits']].plot(kind='barh')
plt.show()

Remove the 'c/a' and 'scp' columns.

# Remove the 'c/a' and 'scp' columns
df = None
df.head(2)
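For reference, a short sketch of dropping columns by name on a hypothetical one-row frame (all values are invented):

```python
import pandas as pd

# Hypothetical row using the lab's cleaned column names
demo = pd.DataFrame({
    "c/a": ["A002"],
    "scp": ["02-00-00"],
    "station": ["STATION A"],
    "entries": [100],
})

# drop(columns=...) returns a new DataFrame without the listed columns
trimmed = demo.drop(columns=["c/a", "scp"])

print(list(trimmed.columns))
```

Because drop returns a new DataFrame by default, the result must be assigned back to a name for the change to stick.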

Analysis Question

What is misleading about the day of week and weekend/weekday charts you just plotted?

# Your answer here 

Summary

Great! You practiced your data cleanup skills using Pandas.
