exp-dist-analysis's Introduction

An Analysis of the Properties of the Exponential Distribution and Demonstration of the CLT

Connor M
March 15, 2015

Introduction

This report compares the following:

  1. The mean of a sample drawn from an exponential distribution and the mean of the theoretical exponential distribution the sample was drawn from.
  2. The variance of a sample drawn from an exponential distribution and the variance of the theoretical exponential distribution the sample was drawn from.

We will also show that the distribution of a large number of sample means drawn from an exponential distribution approximately follows a Gaussian (normal) distribution. This phenomenon is described by the Central Limit Theorem (CLT).

Sample Mean vs. Population Mean

Note: For the purposes of this report, all samples are drawn from an exponential distribution with $\lambda = 0.2$, i.e. $X \sim Exp(\lambda = 0.2)$.

The expected value (or mean) of an exponential distribution is $E(X) = \frac{1}{\lambda}$. Since $\lambda = 0.2$ for our theoretical distribution, the mean is 5. The standard deviation of an exponential distribution is also $\frac{1}{\lambda}$, so $\sigma = 5$.

$$\mu = E(X) = \frac{1}{\lambda} = \frac{1}{0.2} = 5$$

$$\sigma = \frac{1}{\lambda} = \frac{1}{0.2} = 5$$
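
For reference, both values follow from the exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$. The short derivation below is a standard textbook result added for illustration (each integral is evaluated by integration by parts); it is not part of the original analysis.

```latex
\begin{aligned}
E(X) &= \int_0^\infty x \,\lambda e^{-\lambda x}\,dx = \frac{1}{\lambda} \\
E(X^2) &= \int_0^\infty x^2 \,\lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2} \\
\mathrm{Var}(X) &= E(X^2) - [E(X)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \\
\sigma &= \sqrt{\mathrm{Var}(X)} = \frac{1}{\lambda}
\end{aligned}
```

With $\lambda = 0.2$ this gives a population variance of $\frac{1}{0.2^2} = 25$ and a standard deviation of 5.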

Our first task is to compare the theoretical mean of our exponential distribution to the mean of a sample drawn from our population distribution.

Let's start by generating a random sample drawn from our exponential distribution.

# Set the seed so the results are reproducible
set.seed(12345)

small_sample <- rexp(n = 40, rate = 0.2)

This sample is relatively small (sample size of 40). The sample mean may or may not be close to the population mean, since the sample size is so small.

mean(small_sample)
## [1] 5.553693

As you can see, the mean of this small sample is fairly close to the population mean, but not equal to it. We can, however, generate a sample whose mean will be much closer to the population mean. The Law of Large Numbers states that "the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed."^[Law of large numbers. (n.d.). In Wikipedia. Retrieved March 15, 2015, from http://en.wikipedia.org/wiki/Law_of_large_numbers] Let's generate a large sample now.

large_sample <- rexp(n = 100000, rate = 0.2)

Here's the mean of the sample:

mean(large_sample)
## [1] 4.990497

The sample mean and the mean of the population distribution are very close to one another.
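
The convergence described by the Law of Large Numbers can also be observed by tracking the running mean as draws accumulate. The following is a minimal sketch, assuming the same $\lambda = 0.2$; the names `draws`, `running_mean`, and `checkpoints` are illustrative and do not appear in the original analysis.

```r
# Illustrative sketch (not part of the original report)
set.seed(12345)
lambda <- 0.2
draws <- rexp(n = 100000, rate = lambda)

# Mean of the first k draws, for every k from 1 to 100000
running_mean <- cumsum(draws) / seq_along(draws)

# Compare the running mean with the population mean at a few sample sizes
checkpoints <- c(40, 1000, 100000)
convergence <- data.frame(
  n = checkpoints,
  running_mean = running_mean[checkpoints],
  abs_error = abs(running_mean[checkpoints] - 1 / lambda)
)
print(convergence)
```

The `abs_error` column should generally shrink as `n` grows, although the decrease is not guaranteed to be monotone for any single run.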

Sample Variance vs. Population Variance

Let's take a look at the spread of the sample distribution and the population distribution via the standard deviation. We use the standard deviation rather than the variance because it is expressed in the same units as the data, not in squared units.

sd(small_sample)
## [1] 6.475255
sd(large_sample)
## [1] 4.991409

The standard deviation of the small sample isn't very close to the population standard deviation of 5, which is not surprising given the small sample size. The standard deviation of the large sample is almost the same as the population's, which is consistent with the Law of Large Numbers.
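
Since the stated goal is to compare variances, the same comparison can be made on the variance scale, where the theoretical value is $\frac{1}{\lambda^2} = 25$. This is a minimal sketch that regenerates its own samples; `small_draws`, `large_draws`, and `variance_table` are illustrative names, not objects from the original analysis.

```r
# Illustrative sketch (not part of the original report)
set.seed(12345)
lambda <- 0.2
small_draws <- rexp(n = 40, rate = lambda)
large_draws <- rexp(n = 100000, rate = lambda)

# Population variance of an exponential distribution is 1 / lambda^2
theoretical_var <- 1 / lambda^2

variance_table <- data.frame(
  source = c("theoretical", "small sample (n = 40)", "large sample (n = 100000)"),
  variance = c(theoretical_var, var(small_draws), var(large_draws))
)
print(variance_table)
```

Note that `var()` and `sd()` both use the $n - 1$ denominator, so squaring the output of `sd()` gives the same number as `var()`.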

Distribution of Sample Means

We will now investigate the distribution of sample means. The Central Limit Theorem (CLT) states "the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined expected value and well-defined variance, will be approximately normally distributed, regardless of the underlying distribution.^[Central Limit Theorem. (n.d.). In Wikipedia. Retrieved March 15, 2015, from http://en.wikipedia.org/wiki/Central_limit_theorem]" Let's examine this claim empirically.

First, let's generate 1000 samples of size 40 from our population distribution. Each row of the following matrix corresponds to one sample, and each column corresponds to one observation within that sample. Next, let's compute the mean of each sample and store the results in a data frame.

# 1000 by 40 matrix
sample_matrix <- matrix(rexp(40000, 0.2), 1000)
average_means <- data.frame(avg_means = rowMeans(sample_matrix))
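
The CLT also makes a quantitative prediction: means of samples of size $n$ should be centred on $\mu$ with a standard deviation (the standard error) of $\frac{\sigma}{\sqrt{n}}$, here $\frac{5}{\sqrt{40}} \approx 0.79$. The following sketch checks this under the same setup; it regenerates its own samples, and the names `sample_means`, `theoretical_se`, and `comparison` are illustrative.

```r
# Illustrative sketch (not part of the original report)
set.seed(12345)
lambda <- 0.2
n <- 40            # observations per sample
n_samples <- 1000  # number of samples

# One row per sample, one column per observation
sample_means <- rowMeans(
  matrix(rexp(n * n_samples, rate = lambda), nrow = n_samples)
)

theoretical_mean <- 1 / lambda
theoretical_se <- (1 / lambda) / sqrt(n)

comparison <- data.frame(
  quantity = c("mean of sample means", "sd of sample means"),
  theoretical = c(theoretical_mean, theoretical_se),
  empirical = c(mean(sample_means), sd(sample_means))
)
print(comparison)
```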

Based on the CLT, this distribution of sample means should be approximately normal. Let's visualize the distribution via a histogram:

library(ggplot2)

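# Find the standard score of each sample mean.
# The standard deviation of a mean of n observations is sigma / sqrt(n).
n <- 40
mu <- 5
sigma <- 5
standardized_means <- (average_means - mu) / (sigma / sqrt(n))

# Refer to the column by name inside aes() rather than with `$`
ggplot(standardized_means, aes(x = avg_means)) +
    geom_histogram(bins = 30, colour = "black", fill = "white") +
    ggtitle("Histogram of the Distribution of Standardized Sample Means") +
    ylab('Frequency') +
    xlab('Standardized Sample Mean')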

Based on the histogram, the distribution of sample means appears to be approximately normal, despite the fact that the samples were drawn from an exponential distribution!
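
A histogram is a visual check only. One simple numerical complement is to compare how many standardized sample means fall within one and two standard errors of $\mu$ against the corresponding normal probabilities (about 68% and 95%). This is a minimal sketch that regenerates its own samples; `z_scores` and `coverage` are illustrative names.

```r
# Illustrative sketch (not part of the original report)
set.seed(12345)
lambda <- 0.2
n <- 40
sample_means <- rowMeans(matrix(rexp(n * 1000, rate = lambda), nrow = 1000))

# Standardize with the standard error of the mean, sigma / sqrt(n)
z_scores <- (sample_means - 1 / lambda) / ((1 / lambda) / sqrt(n))

coverage <- data.frame(
  interval = c("within 1 SE", "within 2 SE"),
  empirical = c(mean(abs(z_scores) <= 1), mean(abs(z_scores) <= 2)),
  normal = c(pnorm(1) - pnorm(-1), pnorm(2) - pnorm(-2))
)
print(coverage)
```

For a visual version of the same check, `qqnorm(z_scores)` followed by `qqline(z_scores)` draws a normal Q-Q plot; points lying close to the line suggest approximate normality.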
