In this lab, you'll calculate skewness and kurtosis for a given dataset using SciPy in Python.
You will be able to:
- Calculate and interpret values of skewness and kurtosis
In the previous lesson, you saw the formulas for calculating the skewness and kurtosis of your data. SciPy provides functions for both quantities: `scipy.stats.kurtosis` and `scipy.stats.skew`. Check out the official SciPy documentation to dig deeper. Alternatively, once you have imported these functions from SciPy, you can pull up the documentation within the Jupyter notebook by pressing `shift+tab` inside the function call, or open the full documentation with `kurtosis?` or `skew?`.
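To connect these functions back to the moment-based formulas, the sketch below computes both quantities by hand and compares them with SciPy's results. It assumes SciPy's default settings (`bias=True`, and `fisher=True`, so kurtosis is reported as excess kurtosis and a normal distribution scores about 0). The array `data` is made up purely for illustration and is not part of the lab.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Illustrative sample (hypothetical values, not part of the lab data)
data = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 15.0])

# Central moments using the biased (divide-by-n) estimator
deviations = data - data.mean()
m2 = np.mean(deviations**2)
m3 = np.mean(deviations**3)
m4 = np.mean(deviations**4)

# Skewness is the standardized third moment;
# excess kurtosis is the standardized fourth moment minus 3
manual_skew = m3 / m2**1.5
manual_excess_kurtosis = m4 / m2**2 - 3.0

print("skew:    ", manual_skew, skew(data))
print("kurtosis:", manual_excess_kurtosis, kurtosis(data))
```

Passing `fisher=False` to `kurtosis` returns Pearson's kurtosis instead, for which a normal distribution scores about 3.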
You'll generate two datasets, then measure, visualize, and compare their skewness and kurtosis.
# Import required libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kurtosis, skew
- Generate a random normal variable `x_random` in NumPy with 10,000 values. Set the mean to 0 and the standard deviation to 2.
- Plot a histogram of the data with `bins` set to `'auto'`.
- Calculate the skewness and kurtosis for this data distribution using the SciPy functions.
- Record your observations about the calculated values and the shape of the data.
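One possible way to work through these steps is sketched below. It is not the only approach: the seeded generator `rng` and the seed value are assumptions added here for repeatability, so the printed values will differ from any recorded with a different seed or with `np.random.normal`.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kurtosis, skew

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(42)

# 10,000 draws from a normal distribution with mean 0 and standard deviation 2
x_random = rng.normal(loc=0, scale=2, size=10_000)

# Let the plotting library choose the bin edges automatically
plt.hist(x_random, bins="auto")
plt.title("x_random")
plt.show()

print("Skewness =", skew(x_random))
print("kurtosis =", kurtosis(x_random))
```

For a sample this large drawn from a normal distribution, both values should land close to 0, although the exact figures change from run to run unless the seed is fixed.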
x_random = None
# Skewness = -0.0025781248189666343
# kurtosis = 0.03976806960642154
Skewness = -0.01442829768952485
kurtosis = 0.016922288438713018
# Your observations here
#
#
#
Let's generate another distribution
x = np.linspace(-5, 5, 10000)
y = 1. / np.sqrt(2. * np.pi) * np.exp(-.5 * x**2)  # standard normal density evaluated at each x
- Plot a histogram of the data $y$ with `bins` set to `'auto'`.
- Calculate the skewness and kurtosis for this data distribution using the SciPy functions.
- Record your observations about the calculated values and the shape of the data.
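A minimal sketch of these steps follows, reusing the same `x` and `y` definitions so that it runs on its own. Keep in mind when reading the plot that `y` holds density heights evaluated on an evenly spaced grid, not samples drawn from a normal distribution, so the histogram shows how often each height occurs.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import kurtosis, skew

# Evenly spaced grid and the standard normal density evaluated on it
x = np.linspace(-5, 5, 10000)
y = 1.0 / np.sqrt(2.0 * np.pi) * np.exp(-0.5 * x**2)

# Histogram of the density heights themselves
plt.hist(y, bins="auto")
plt.title("y")
plt.show()

print("Skewness =", skew(y))
print("kurtosis =", kurtosis(y))
```

Because most grid points fall in the tails, where the density is close to 0, the heights pile up near 0 with a long tail toward the peak value. This is one way to account for the positive skewness.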
# Skewness = 1.109511549276228
# kurtosis = -0.31039027765889804
Skewness = 1.109511549276228
kurtosis = -0.31039027765889804
# Your observations here
#
#
#
In this lab, you calculated, visualized, and analyzed the skewness and kurtosis of two distributions. You worked with synthetic datasets at this stage to make the concepts clear. Later, you will apply these techniques to real datasets to judge whether they are fit for analysis.