This short lesson summarizes the topics we covered in section 20 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
Some of the key takeaways from this section include:
- It's important to have a sound approach to experimental design to be able to determine the significance of your findings
- Start by examining any existing research to see if it can shed light on the problem you're studying
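- Start with a clearly stated null hypothesis and alternative hypothesis for your experiment to test; a hypothesis test can reject or fail to reject the null hypothesis, but it cannot "prove" either one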
- It's important to have a thoughtfully selected control group from the same population for your trial to distinguish effect from variations based on population, time or other factors
- Sample size needs to be selected carefully so that your experiment has enough statistical power to detect an effect if one really exists
- Your results should be reproducible by other people using different samples from the same population
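- The p-value is the probability of observing an outcome at least as extreme as yours if the null hypothesis were true, so a small p-value means the outcome would be unlikely to arise by chance alone
- The alpha value is the threshold, chosen before running the experiment, below which a p-value leads us to reject the null hypothesis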
- An alpha of 0.05 is a common choice for many experiments
- Effect size measures only the size of the difference between two groups under observation, whereas statistical significance depends on both effect size and sample size
- A one-sample t-test is used to determine whether a sample comes from a population with a specific mean
- A two-sample t-test is used to determine whether the means of two populations are equal
- Type 1 errors (false positives) occur when we reject a null hypothesis that is actually true, accepting an alternative hypothesis that is false
- The alpha that we pick is the probability of making a type 1 error when the null hypothesis is true
- Type 2 errors (false negatives) occur when we fail to reject a null hypothesis that is actually false, missing an alternative hypothesis that is true
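
Several of the takeaways above (t-statistics, effect size versus significance, and alpha as the type 1 error rate) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the approach used in the section's labs. All function names here are hypothetical, the two-sample statistic assumes equal variances, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples. In practice, `scipy.stats.ttest_1samp` and `scipy.stats.ttest_ind` would give exact t-based p-values.

```python
import math
import random
from statistics import NormalDist, mean, stdev

ALPHA = 0.05  # significance threshold, fixed before looking at the data


def one_sample_t(sample, mu0):
    """t statistic for testing whether the population mean equals mu0."""
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(len(sample)))


def cohens_d(a, b):
    """Effect size: difference in means in units of the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic (assumes equal variances)."""
    return cohens_d(a, b) / math.sqrt(1 / len(a) + 1 / len(b))


def t_from_effect(d, n_per_group):
    """t statistic implied by effect size d with two equal-sized groups."""
    return d / math.sqrt(2 / n_per_group)


def approx_two_sided_p(t_stat):
    """Two-sided p-value from a normal approximation (assumption: large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))


def simulate_type1_rate(trials=2000, n=50, seed=0):
    """Fraction of trials that reject the null when both groups share one population."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if approx_two_sided_p(two_sample_t(a, b)) < ALPHA:
            false_positives += 1  # null is true here, so every rejection is a type 1 error
    return false_positives / trials


if __name__ == "__main__":
    # Same effect size, different sample sizes: only the larger study is significant.
    for n in (20, 1000):
        p = approx_two_sided_p(t_from_effect(0.2, n))
        print(f"d=0.2, n={n} per group: p={p:.4f}, significant={p < ALPHA}")
    # The observed false-positive rate should land close to ALPHA.
    print(f"simulated type 1 error rate: {simulate_type1_rate():.3f}")
```

The first loop holds the effect size fixed and changes only the sample size, which is why a small effect can be non-significant in a small study and significant in a large one. The simulation draws both groups from the same population, so the share of rejections is an estimate of the type 1 error rate and should sit near the chosen alpha.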