np_tut_breastcancer's Introduction

Numpy and matplotlib tutorial for beginners using a breastcancer dataset.

Running this tutorial online:

You can launch this tutorial on Binder or Google Colab. The slide show works on Binder but not on Colab. Press Alt+r to toggle the slide show.

Running this tutorial on your own computer:

You need Python installed with the numpy, matplotlib and pandas libraries. The Anaconda Python distribution (https://www.anaconda.com/) is easy to install and bundles common libraries for scientific computing.

  1. Click the Clone or download button.
  2. Select Download ZIP.
  3. Unzip the downloaded zip file into a folder.
  4. Open a terminal (Command Prompt or Anaconda Prompt on Windows) and go to this folder by entering the command cd folder-path.
  5. Start Jupyter Notebook by entering jupyter notebook. This opens a browser window showing the contents of the folder.
  6. Click the Wisconsin_breast_cancer_data.ipynb file to open the notebook in a new browser window or tab.
  7. To run it as a slide show, you must first install the RISE module (https://rise.readthedocs.io). Press Alt+r to toggle the slide show.
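
Before starting the notebook, it can help to confirm that the required libraries are importable from the Python environment in use. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of the tutorial.

```python
import importlib.util

# Libraries the tutorial says it needs.
REQUIRED = ("numpy", "matplotlib", "pandas")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

If any library is reported missing, install it with your package manager of choice before starting the notebook.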

Summary

This tutorial explores basic usage of numpy and matplotlib using this publicly available dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin.

  • Introduction to jupyter including running it on the cloud.
  • Import python modules.
  • numpy basics
    • array creation, arithmetic, indexing, slicing, reshaping.
    • heterogeneous arrays (structured/record arrays with named fields).
  • load data from csv (text) file.
    • (advanced) use requests library to retrieve data from the Internet.
    • (advanced) use StringIO to read from string containing the csv data.
    • (basic) specify field names and data types using numpy dtype.
    • (basic) specify missing value handling when loading data
    • (basic) check columns for missing values.
  • Pandas
    • (basic) load data using Pandas.
    • (basic) accessing rows and columns in Pandas dataframe.
    • (basic) do simple boxplots of dataframe columns.
  • matplotlib
    • basic plotting
      • line plot
      • scatter plot
      • histogram
      • box plot
    • legend
    • subplots
  • list of other useful modules
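
As a taste of the data-loading topics listed above (reading CSV text through StringIO, naming fields with a numpy dtype, and handling missing values), here is a minimal sketch. It is not taken from the notebook: the rows are made up for illustration, the field names are hypothetical, and it assumes missing entries are marked with "?" and replaced by -1 on load.

```python
from io import StringIO

import numpy as np

# Made-up rows for illustration only; NOT taken from the real dataset.
# A "?" marks a missing value (an assumption for this sketch).
csv_text = """\
1001,5,1,2
1002,3,?,2
1003,8,10,4
"""

# Hypothetical field names and integer types for this toy table.
dtype = np.dtype(
    [("id", "i8"), ("thickness", "i4"), ("nuclei", "i4"), ("label", "i4")]
)

# Parse the text into a structured array, replacing "?" with -1.
data = np.genfromtxt(
    StringIO(csv_text),
    delimiter=",",
    dtype=dtype,
    missing_values="?",
    filling_values=-1,
)

# Check a column for missing values using the fill marker.
missing = data["nuclei"] == -1
print("rows with a missing 'nuclei' value:", int(missing.sum()))

# Fields are accessed by name; boolean masks select rows.
print("mean thickness of complete rows:", data["thickness"][~missing].mean())
```

Using a sentinel such as -1 only works when that value cannot occur in the real data; a float dtype with NaN as the fill value is a common alternative.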

History:

This tutorial was originally the first part of a two-part workshop on the basics of numpy, matplotlib and pandas for data science. The first version, used in the live workshop, can be downloaded as an archive here: https://github.com/subhacom/np_tut_breastcancer/releases/tag/v0.1. This notebook may be updated occasionally.

The second part of the workshop focuses on pandas for data science and is available here: https://github.com/bballew/pandas_tutorial.
