This project forked from dlab-berkeley/python-data-wrangling-legacy

D-Lab's 3 hour introduction to data wrangling in Python. Learn how to import and manipulate dataframes using pandas in Python.

License: Other

D-Lab Introduction to Pandas workshop

This repository contains materials for the introductory pandas workshop at the UC Berkeley D-Lab.

1. Software for the workshop

The best learning experience happens when you can edit and run code, so please install the Anaconda Distribution for Python 3.7, along with pandas, matplotlib, and Jupyter, before the workshop starts. If you cannot install Anaconda, you can still access the workshop materials through this datahub link. Note that this only works if you have a berkeley.edu email address.

To use Anaconda, follow the steps below to set up your environment:

  1. Click here to download the Anaconda Distribution for Python 3.7 (3.6 is also fine if you already have it installed). Scroll down to the "Anaconda Installers" section and click the "Graphical Installer" option that corresponds to your operating system.

  2. If you are using Terminal (Mac) or GitBash (PC), you can pip install the necessary packages by typing:

$ pip install pandas matplotlib jupyter

Windows users only: if you want a Bash shell like the one in the Mac "Terminal" application, click here to download GitBash, a Unix-style command-line environment for Windows.

Alternatively, you can install these packages by adding a cell to the top of your Jupyter Notebook and typing:

!pip install pandas matplotlib jupyter
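To confirm the installation worked, a quick check like the one below can be run in a notebook cell or a Python session. This is a minimal sketch, not part of the workshop materials: the helper name `check_packages` is hypothetical, and `notebook` is assumed to be the importable module behind the `python3 -m notebook` command used later in this README.

```python
import importlib.util

# Packages the workshop relies on. "notebook" is the module that
# `python3 -m notebook` launches (an assumption based on that command).
REQUIRED = ["pandas", "matplotlib", "notebook"]


def check_packages(names):
    """Return a dict mapping each package name to True if it can be found."""
    # find_spec returns None for a top-level package that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed with the pip command above.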

2. Files for the workshop

Once the software is installed, download the files for the workshop, which are contained in this repository. Get them by doing the following:

  1. Click the green "Clone or Download" button
  2. Click "Download Zip"
  3. Extract this .zip file someplace familiar, such as your Desktop.

Or, if you are a Git user, you can simply clone this repository:

$ git clone git@github.com:dlab-berkeley/introduction-to-pandas.git

3. Open a Jupyter Notebook

  1. Open the "Anaconda Navigator" application and click "Launch" under Jupyter Notebook

or

Navigate to the repository using Terminal or GitBash and type

$ cd introduction-to-pandas

then

$ jupyter notebook

or

$ python3 -m notebook

This will open a blank notebook for you to use as a scratch space if you desire. Open the file "introduction-to-pandas.ipynb" to access the tutorial.

4. Outline

For this workshop, we'll go through an example using European unemployment data. We'll load, view, and modify the data as well as calculate some descriptive statistics. The idea is to get a sense of what it would be like to use pandas as part of your workflow.

We plan to cover:

  • pandas data structures
  • loading data
  • subsetting and filtering
  • calculating summary statistics
  • dealing with missing values
  • merging data sets
  • creating new variables
  • basic plotting
  • exporting data
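As a rough preview of how these topics fit together, here is a minimal sketch using a small made-up table. The data, column names, and country codes are hypothetical and are not the workshop's actual European unemployment dataset. Plotting is left out so the example runs without a display.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for a CSV file on disk.
csv_text_in = """country,year,unemployment_rate
at,2010,4.8
at,2011,
be,2010,8.3
be,2011,7.2
de,2010,7.0
"""

# Loading data: read_csv accepts a path or a file-like object.
unemployment = pd.read_csv(io.StringIO(csv_text_in))

# Subsetting and filtering: keep only rows for the year 2010.
rows_2010 = unemployment[unemployment["year"] == 2010]

# Summary statistics: mean() skips missing values by default.
mean_rate = unemployment["unemployment_rate"].mean()

# Missing values: count them, then drop the incomplete rows.
n_missing = int(unemployment["unemployment_rate"].isna().sum())
complete = unemployment.dropna(subset=["unemployment_rate"])

# Merging data sets: attach full country names from a second table.
countries = pd.DataFrame({
    "country": ["at", "be", "de"],
    "name": ["Austria", "Belgium", "Germany"],
})
merged = complete.merge(countries, on="country", how="left")

# Creating a new variable: flag rows above the overall mean rate.
merged["above_mean"] = merged["unemployment_rate"] > mean_rate

# Exporting data: with no path given, to_csv returns the CSV as a string.
csv_text_out = merged.to_csv(index=False)
print(csv_text_out)
```

Passing a filename to `to_csv` instead would write the result to disk.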

5. Resources

Getting started with pandas

10 minutes to pandas

Visualization with pandas

6. Launch binder

If you have trouble installing the software or otherwise cannot get the Jupyter Notebook to open, click this "launch binder" badge to start this session.
