This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.
You can read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/
The book was written and tested with Python 3.5, though older Python versions (including Python 2.7) should work in nearly all cases.
The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, and related packages.
- 2.0 NumPy
- 2.1 Data Types
- 2.2 NumPy Arrays
- 2.3 Universal Functions
- 2.4 Aggregations
- 2.5 Broadcasting
- 2.6 Boolean Arrays and Masks
- 2.7 Indexing
- 2.8 Sorting
- 2.9 Structured Arrays
- 3.0 Pandas
- 3.1 Pandas Objects
- 3.2 Indexing and Selection
- 3.3 Operations
- 3.4 Missing Values
- 3.5 Hierarchical Indexing
- 3.6 Concat and Append
- 3.7 Merge and Join
- 3.8 Aggregation and Grouping
- 3.9 Pivot Tables
- 3.10 Strings
- 3.11 Time Series
- 3.12 Eval and Query
- 4.0 Matplotlib
- 4.1 Line Plots
- 4.2 Scatter Plots
- 4.3 Errorbars
- 4.4 Density and Contour Plots
- 4.5 Histograms and Binnings
- 4.6 Legends
- 4.7 Colorbars
- 4.8 Subplots
- 4.9 Text and Annotation
- 4.10 Ticks
- 4.11 Settings and Stylesheets
- 4.12 3D Plotting
- 4.13 Geographic Data with Basemap
- 4.14 Visualization with Seaborn
- 5.0 Machine Learning
- 5.1 What Is Machine Learning?
- 5.2 Scikit-Learn
- 5.3 Hyperparameters and Validation
- 5.4 Feature Engineering
- 5.5 Naive Bayes
- 5.6 Linear Regression
- 5.7 Support Vector Machines
- 5.8 Random Forests
- 5.9 Principal Component Analysis
- 5.10 Manifold Learning
- 5.11 K-means
- 5.12 Gaussian Mixtures
- 5.13 Kernel Density Estimation
- 5.14 Image Features
The code in the book was tested with Python 3.5, though most (but not all) of it will also work correctly with Python 2.7 and other older Python versions.
The packages I used to run the code in the book are listed in requirements.txt. Note that some of these exact version numbers may not be available on your platform; you may have to tweak them for your own use. To install the requirements using conda, run the following at the command line:
$ conda install --file requirements.txt
To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:
$ conda create -n PDSH python=3.5 --file requirements.txt
You can read more about using conda environments in the Managing Environments section of the conda documentation.
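Because the pinned versions may need tweaking on your platform, it can help to see which installed packages differ from the pins. The following is a minimal sketch, not part of the book or this repository: the helper names `parse_requirements`, `compare_with_installed`, and `check_file` are hypothetical, and the parser assumes each line of the file is a simple `name==version` or `name=version` pin. It relies on `importlib.metadata`, which requires Python 3.8 or newer, so it will not run on the Python 3.5 interpreter the book targets.

```python
import re
from importlib import metadata  # standard library, Python 3.8+


def parse_requirements(text):
    """Return {package_name: pinned_version} from requirements-style text.

    Assumption: lines look like "name==1.0" (pip style) or "name=1.0"
    (conda style). Lines without a version pin are skipped.
    """
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = re.match(r"^([A-Za-z0-9_.\-]+)\s*={1,2}\s*([^\s=]+)", line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def compare_with_installed(pins):
    """Return {name: (pinned, installed)} for every pin that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


def check_file(path="requirements.txt"):
    """Print one line per package whose installed version differs from its pin."""
    with open(path) as f:
        pins = parse_requirements(f.read())
    for name, (pinned, installed) in sorted(compare_with_installed(pins).items()):
        print(f"{name}: pinned {pinned}, installed {installed or 'not installed'}")
```

Calling `check_file()` from the repository root inside the activated environment prints the packages that differ from their pins, and prints nothing if everything matches. The comparison is an exact string match on version numbers, so it reports any difference and does not judge whether a newer version is compatible.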
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.
The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.