
charlesfrye / appliedstatisticsforneuroscience


Materials for UC Berkeley Neuroscience 299

Languages: Jupyter Notebook 98.99%, CSS 0.08%, Python 0.93%, Shell 0.01%
Topics: neuroscience, statistics, jupyter-notebook, applied-statistics, python, tutorial

appliedstatisticsforneuroscience's Introduction

Applied Statistics For Neuroscience

This repository contains materials for the UC Berkeley course Neuroscience 299, Applied Statistics for Neuroscientists.

The course is divided into three parts: setup and review, statistical testing, and statistical modeling. Within a part, materials are organized into folders that correspond to weeks of the semester. These folders contain Jupyter notebooks that serve as tutorial material and labs for the course. Tutorials should be completed before labs.

The course can be completed either entirely online or on your own machine. Working online means you don't have to install anything locally, but it makes saving your work harder.

Though technically this course does not assume you have any background in computing or Python, it's highly recommended that you become familiar with the basics before starting. I recommend Codecademy's Python course up through section 8.

Local Version

To run this class locally, i.e. on your own computer, start by downloading the materials. Click the green "Clone or Download" button and choose "Download ZIP". Unzip the resulting file into the location of your choosing.

Follow the instructions here. You'll need to install an appropriate computational environment, as described in the installation instructions. The notebooks in Part 00 - Setup and Review/00 - Setup will get you acquainted with the Jupyter notebook format, Python, and the statistical libraries used in this course.
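A quick way to confirm that a local environment is ready is to check that each library the course relies on can be imported. The sketch below uses a hypothetical helper, `check_environment`, which is not part of the course materials. Its package list is an assumption drawn from the libraries mentioned elsewhere on this page (NumPy, pandas, matplotlib, seaborn, statsmodels, scikit-learn), so adjust it to match the installation instructions.

```python
import importlib
import importlib.util

# Assumed package list: import names of the libraries mentioned on this page.
# Adjust to match the course's installation instructions.
COURSE_PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "statsmodels", "sklearn"]


def check_environment(packages=COURSE_PACKAGES):
    """Return a dict mapping each package name to its version string, or None if missing."""
    report = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not installed at all
            continue
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None  # installed but cannot be imported
            continue
        # Most scientific packages expose __version__; fall back to "unknown" otherwise.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_environment().items():
        print(f"{name:12s} {version if version is not None else 'MISSING'}")
```

Any package reported as MISSING should be installed before you work through the Part 00 notebooks.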

Online Version - Binder

Alternatively, you can run the notebooks in this course in the cloud via the Binder service by clicking the badge below.

Binder

This will create a Jupyter notebook server on a remote computer, then give you access to it via your web browser. This avoids you having to install anything on your machine. Any changes you make to the notebooks will not be saved, however, so this is better suited for quickly checking out just a single section or reading through the solutions.

appliedstatisticsforneuroscience's People

Contributors

charlesfrye


appliedstatisticsforneuroscience's Issues

Renaming files

Renaming files to reduce redundancy

  • Remove numbers from lab file names
  • Remove numbers from lab titles
  • Remove numbers from utils

Need to write two blog posts

  • Why Gaussians? blog post - link to it from lab on inferential stats and error bars
  • Linear Algebra for Neuroscientists blog post - link to it from lab on linear algebra

In ANOVA II, by hand,

(1). You have the same paragraph twice under "Extending ANOVA to multiple variables"

(2). In "Two Way ANOVA", the 1st equation cell box, the equation goes off the right side (too long) in my browser.

(3). In the "loading the data" section, just under where it says "Compare the above Python expression to the mathematical expression below and make sure you can match up terms", I feel like this equation, $\epsilon_{ijk} = Y_{ijk} - A_i - B_j - AB_{ij}$, is missing the grand mean? Also, you could perhaps use more headers or sections.
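For reference, the standard two-way ANOVA model writes each observation as $Y_{ijk} = \mu + A_i + B_j + AB_{ij} + \epsilon_{ijk}$, where $\mu$ is the grand mean and $A_i$, $B_j$, $AB_{ij}$ are deviations from it. Under that convention the residual must also subtract $\mu$. Whether the notebook's equation needs it depends on how $A_i$, $B_j$, and $AB_{ij}$ are defined there, for example if $Y$ was centered first. The NumPy sketch below assumes a balanced design stored as an array of shape (levels of A, levels of B, replicates) and uses made-up data. It checks that $\mu + A_i + B_j + AB_{ij}$ reproduces the cell means, so each residual is simply an observation minus its cell mean.

```python
import numpy as np


def two_way_effects(Y):
    """Decompose a balanced two-way design Y[i, j, k] into grand mean, main effects,
    interaction effects, and residuals (effects are deviations from the grand mean)."""
    grand_mean = Y.mean()
    cell_means = Y.mean(axis=2)                               # shape (a, b)
    A = Y.mean(axis=(1, 2)) - grand_mean                      # factor A effects, shape (a,)
    B = Y.mean(axis=(0, 2)) - grand_mean                      # factor B effects, shape (b,)
    AB = cell_means - grand_mean - A[:, None] - B[None, :]    # interaction, shape (a, b)
    # Residuals: subtract the grand mean along with every effect term.
    eps = Y - grand_mean - A[:, None, None] - B[None, :, None] - AB[:, :, None]
    return grand_mean, A, B, AB, eps


rng = np.random.default_rng(0)
Y = 10.0 + rng.normal(size=(2, 3, 4))  # illustrative data: 2 x 3 design, 4 replicates per cell

mu, A, B, AB, eps = two_way_effects(Y)
# Fitted values mu + A_i + B_j + AB_ij equal the cell means...
assert np.allclose(mu + A[:, None] + B[None, :] + AB, Y.mean(axis=2))
# ...so the residuals are deviations from the cell means.
assert np.allclose(eps, Y - Y.mean(axis=2, keepdims=True))
# Leaving out the grand mean gives "residuals" that average to mu instead of 0.
print((Y - A[:, None, None] - B[None, :, None] - AB[:, :, None]).mean(), mu)
```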

Add an "Outside Resources" document to each section

So that folks using the online version of the course will have access to some of the readings from the in-person version.

Some of these resources might be behind paywalls -- need to check this and give notice.

Use scikit-learn for the modeling section?

scikit-learn is easy to use and, as of 0.18, well-integrated with pandas.

The interactive sections aren't necessarily worth changing, but the sections on fitting and cross-validating probably are.
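As a rough illustration of what the fitting and cross-validation sections might look like with scikit-learn, the sketch below fits an ordinary least-squares model to columns of a pandas DataFrame and scores it with 5-fold cross-validation. The data and the column names (`x`, `y`) are made up. `cross_val_score`, `KFold`, and `LinearRegression` are standard scikit-learn APIs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative data: a noisy linear relationship (hypothetical column names).
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=100)})
df["y"] = 2.0 * df["x"] + 1.0 + rng.normal(scale=1.0, size=len(df))

# scikit-learn estimators accept DataFrame columns directly.
X = df[["x"]]  # 2-D: a one-column feature table
y = df["y"]    # 1-D target

# Shuffle before splitting so each fold is a random subset of rows.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
print(scores.round(3), scores.mean().round(3))

# Refit on all of the data to inspect the fitted slope and intercept.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```

The held-out R² scores are the part that would replace hand-written cross-validation code. The interactive fitting sections could stay as they are.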

Lab 11 Issues

Move Q17, on how to determine accuracy, to after the first plot is made.

IOPub issue from Jupyter >=5

In the move from version 4 to version 5, Jupyter added a "feature" that halts data transmission from the kernel to the client if the data rate exceeds a certain threshold. This was intended to prevent massive text dumps to stdout or stderr, but it sometimes triggers when rapidly passing visualizations from the kernel to the notebook -- e.g. when making a "poor man's animation" by repeatedly calling canvas.draw on pyplot figures.

Simply filtering warnings doesn't work, since this is coming from Jupyter itself. The fix is to increase the threshold, but this requires reconfiguring Jupyter, which I'd like to avoid for the students' sake. I've reduced the load of some animations in Lab 08 A to prevent this, but I worry this might also trigger for other plots. Will need to be vigilant when creating student versions to see if this crops up elsewhere. This was supposed to be fixed in 5.1, but that doesn't appear to be the case 😢.

In ANOVA I by Hand tutorial

In "Computing Mean Squares", under the mean_square compute cell, you say "how much different the groups"; I think "how different" or "how much the groups differ" sounds better.
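For context, in a one-way ANOVA the between-group mean square measures how much the groups differ (how far the group means sit from the grand mean). The within-group mean square measures the spread inside each group, and their ratio is the F statistic. The NumPy sketch below computes both. `mean_squares` is a hypothetical helper, not the tutorial's own code, and the groups are made up.

```python
import numpy as np


def mean_squares(groups):
    """Return (MS_between, MS_within, F) for a one-way ANOVA on a list of 1-D arrays."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size

    # Between-group sum of squares: how far each group mean is from the grand mean,
    # weighted by group size.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    ms_between = ss_between / (k - 1)  # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)    # n - k degrees of freedom
    return ms_between, ms_within, ms_between / ms_within


ms_b, ms_w, F = mean_squares([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(ms_b, ms_w, F)  # 27.0 1.0 27.0
```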

In Anova2 / Tutorial by hand

(1). In Lab A, running ANOVA section: typo in the 2nd ANOVA ("ANVOA").

(2). In the answer to Q2, paragraph 1: "Once could possibly also note" is a typo for "One could possibly also note".

Lab 05 Issues

Should include some indication of the different names that different authors give to the quantities under consideration

Lab 07 Issues

  • Def'n of scaling -- is this standard?
    • NO - include isotropic and anisotropic scaling
  • Typos
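To make the distinction concrete: isotropic scaling multiplies every coordinate by the same factor, while anisotropic scaling uses a different factor along each axis. Both can be written as diagonal matrices. A small NumPy illustration follows. The scaling factors are arbitrary choices.

```python
import numpy as np

# Columns are 2-D points: the corners of a unit square.
points = np.array([[0, 1, 1, 0],
                   [0, 0, 1, 1]], dtype=float)

# Isotropic scaling: the same factor along every axis, so shapes keep their proportions.
iso = 2.0 * np.eye(2)

# Anisotropic scaling: a different factor per axis, so shapes get stretched or squashed.
aniso = np.diag([2.0, 0.5])

print(iso @ points)    # a square of side 2
print(aniso @ points)  # a 2 x 0.5 rectangle
```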

Review Tech Tools Tutorials

In Week 01, we'll cover the major tools we'll be using for the course: Python, Jupyter, Pandas, and Seaborn. I've also included a quick reference on LaTeX and a placeholder for Numpy, in case we want to front-load that material.

I'd appreciate someone else's eyes going over the tutorials, though that someone else might just be future Charles.

In Hypothesis Testing Tutorial

Slight ambiguity on whether rejecting the null hypothesis and accepting the alternative hypothesis are the same thing. This is either a small point or a misunderstanding on my part, but they are not the same, right? We cannot prove the alternative hypothesis.

Lab 10 Issues

argmin_θ for linear models with Gaussian error should be argmin_w
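The minimization is over the weights $w$ for the following reason. Take a linear model with independent Gaussian errors, $y_i = w^\top x_i + \epsilon_i$ with $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$. Its negative log-likelihood is a scaled sum of squared errors plus a term that does not depend on $w$. This is the standard derivation. The notation ($x_i$, $y_i$, $\sigma$, $n$) is assumed for illustration and is not taken from Lab 10.

```latex
-\log p(y \mid X, w)
  = \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2
    + \frac{n}{2} \log\left(2\pi\sigma^2\right)

\hat{w} = \arg\min_{w} \left[-\log p(y \mid X, w)\right]
        = \arg\min_{w} \sum_{i=1}^{n} \left(y_i - w^\top x_i\right)^2
```

With $\sigma$ held fixed, the second term is constant in $w$. Maximum likelihood therefore reduces to least squares over $w$.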

In 00_setup / tutorial B

  1. In the table of contents, do you want to indent 1.2.1 & 1.2.2 further than 1.2?

  2. In "from __future__ import jetpack", there's a typo in "seaborn is visualization".

  3. In "more python!", the link at the bottom to the pandas notebook is broken.

Dependencies

Add sklearn and statsmodels as dependencies. Might not need both -- need to check on sklearn's support for ANOVA-type GLMs.

01 Lab A

  • In "## What are Probability Distributions?" you list 3 common-sense ideas that define distributions, but it's not clear whether these are THE assumptions of a distribution or just some of them.
  • The intuitive explanation for mass functions in "### Probability Mass Functions" could also apply to density functions, which confused me; it doesn't make clear that mass functions are for discrete variables.
  • In "### Probability Density Functions", the alternating references to "density functions" and "probability density functions" confused me.
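One concrete way to separate the two ideas is to compare what each function returns. A probability mass function assigns probabilities to discrete outcomes, so its values are probabilities and sum to 1. A probability density function describes a continuous variable. Its values are not probabilities, can exceed 1, and only integrate to 1. A short sketch using `scipy.stats` follows. The distributions and parameters are arbitrary choices.

```python
import numpy as np
from scipy import integrate, stats

# Discrete: Binomial(n=10, p=0.3). The PMF gives P(X = k) for each integer k.
ks = np.arange(0, 11)
pmf = stats.binom.pmf(ks, n=10, p=0.3)
print(pmf.sum())  # the probabilities of all outcomes sum to 1

# Continuous: a narrow Normal(0, 0.1). The PDF is a density, not a probability.
narrow = stats.norm(loc=0, scale=0.1)
print(narrow.pdf(0))  # about 3.99: greater than 1, which no probability can be

# Probabilities come from integrating the density over an interval.
# Integrating over +/- 10 standard deviations captures essentially all of the mass.
area, _ = integrate.quad(narrow.pdf, -1, 1)
print(area)  # about 1
prob = narrow.cdf(0.1) - narrow.cdf(-0.1)
print(prob)  # P(-0.1 < X < 0.1), about 0.68
```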

Update seaborn dependency

This will allow plotting standard deviation error bars and simplify the presentation in the notebook on error bars.
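For context, the two error bars usually compared are the standard deviation (SD, the spread of the data) and the standard error of the mean (SEM, the uncertainty in the estimated mean, SD / √n). In recent seaborn releases (0.12 and later), functions such as `barplot` and `pointplot` choose between them with `errorbar="sd"` or `errorbar="se"`. The NumPy sketch below shows what each bar spans, using made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=25)  # illustrative sample, n = 25

mean = data.mean()
sd = data.std(ddof=1)          # sample standard deviation: spread of individual values
sem = sd / np.sqrt(data.size)  # standard error: uncertainty in the mean itself

print(f"mean +/- SD : [{mean - sd:.2f}, {mean + sd:.2f}]")
print(f"mean +/- SEM: [{mean - sem:.2f}, {mean + sem:.2f}]")  # 5x narrower when n = 25
```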

Lab 02 Issues

We're plotting the sampling distribution -- ask a question that makes them name this.

Clarify the "experiment" -- we're running many experiments, drawing many samples, and seeing how we do on average.
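The "many experiments" framing can be made concrete with a simulation. Repeat an experiment many times, each time drawing a fresh sample and recording its mean. The collection of means then approximates the sampling distribution of the mean. The sketch below assumes a standard normal population and arbitrary sizes. It is not the lab's code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_experiments = 10_000  # how many times the experiment is repeated
n_samples = 16          # sample size within each experiment

# Each row is one experiment: n_samples draws from a standard normal population.
samples = rng.standard_normal(size=(n_experiments, n_samples))
sample_means = samples.mean(axis=1)  # one mean per experiment

# On average the sample mean lands near the population mean (0), and its spread
# across experiments is about sigma / sqrt(n) = 1 / 4.
print(sample_means.mean(), sample_means.std())
```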

making this repo open?

Hey - a colleague was asking about this class and when I went to share the github link I realized it was a private repo. Any particular reason for this?

In ANOVA II, Lab B.

(1). In the def run_experiments cell, you have "utils.generateData" instead of generate_data.

(2). In the def run_experiments cell, you use both false_positive_rates and f_p_rate.
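For comparison, here is a minimal sketch of a false-positive-rate simulation that uses one consistent name for the rate. Everything in it is hypothetical. `run_experiments`, its arguments, and the call to `scipy.stats.f_oneway` stand in for the lab's own code and its `utils` data generator, whose signature isn't shown here. When every group is drawn from the same distribution, the null hypothesis is true. The fraction of experiments with p < α should then come out near α.

```python
import numpy as np
from scipy import stats


def run_experiments(n_experiments=2000, n_groups=3, n_per_group=10, alpha=0.05, seed=0):
    """Hypothetical sketch: estimate the false positive rate of a one-way ANOVA
    when the null hypothesis is true (all groups share one distribution)."""
    rng = np.random.default_rng(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Null is true: every group comes from the same standard normal.
        groups = [rng.standard_normal(n_per_group) for _ in range(n_groups)]
        _, p_value = stats.f_oneway(*groups)
        if p_value < alpha:
            false_positives += 1
    false_positive_rate = false_positives / n_experiments
    return false_positive_rate


print(run_experiments())  # should be close to alpha = 0.05
```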

Clustering Tutorial Issues

The K-Means implementation is a hack -- it's just EM with small, isotropic covariance matrices. Need to do an honest-to-goodness K-Means implementation.
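For reference, plain K-means (Lloyd's algorithm) alternates two hard steps. First, assign each point to its nearest centroid. Second, move each centroid to the mean of its assigned points. Unlike EM, there are no covariance matrices and no soft responsibilities. The minimal NumPy sketch below makes two simplifying assumptions. It initializes the centroids from randomly chosen data points. If a cluster ends up empty, it simply keeps that cluster's old centroid.

```python
import numpy as np


def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm. X has shape (n_points, n_features). Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points chosen at random from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties out
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
               rng.normal(loc=(5, 5), scale=0.5, size=(50, 2))])
centroids, labels = kmeans(X, k=2)
print(np.round(centroids, 2))
```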

Add a warnings filter to final versions

Lots of deprecation and similar warnings that students don't need to see.

Can be fixed by adding

import warnings
warnings.simplefilter("ignore")

to the shared formatting code.
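A narrower variant is possible if silencing everything turns out to hide useful messages: filter only the deprecation-style categories. The choice of categories below is an assumption about which warnings are noisy in these notebooks. `warnings.filterwarnings` with `category=` is standard-library API.

```python
import warnings

# Hide only the noisy categories; other warnings (e.g. RuntimeWarning) still appear.
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# Quick check: record which warnings get through the filters above.
with warnings.catch_warnings(record=True) as caught:
    warnings.warn("old API", DeprecationWarning)
    warnings.warn("behavior will change", FutureWarning)
    warnings.warn("numerical problem", RuntimeWarning)
print([w.category.__name__ for w in caught])  # ['RuntimeWarning']
```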

Issue: ANOVA by Hand

  • MSEs are unmotivated, remove
  • standardize capitalization
  • rephrase "Below, we get more specific on in which cases this "should" be true." Perhaps "This is not true of all possible relationships between group and observation, but is true in many cases, as explained below." Perhaps footnote?

Collect Intro to Python materials.

Need material to cover relevant parts of core Python, numpy, matplotlib, pandas, and seaborn.

Edit: focus has changed to seaborn and pandas, rather than numpy and matplotlib.

Lab 06 Issues

Definition of omnibus test needs to also be in part b

Comment out the omnibus test code in the student version and indicate that it's for later.

Lab 09 Issues

Confirm compatibility with updated Model class

  • confirmed on 10/23/2017

In sampling and bootstrapping tutorial A

(1). Under "estimating sample distribution", I don't understand this. They are the same:

"Note that both variables are actually functions, so we write
sampler = np.random.standard_normal
rather than
sampler = np.random.standard_normal"

(2). In "visualizing bootstrapping w histograms", you use this plot_bootstrap function, but it gives an error. Is that something you meant to define or a library you meant to include?
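On point (1), the distinction the tutorial presumably intends is between passing the function object itself (no parentheses) and calling it (with parentheses), which draws numbers immediately. Below is a minimal bootstrap sketch that takes its data from such a sampler, using NumPy only. `bootstrap_means` is a hypothetical helper. It is not the tutorial's `plot_bootstrap`.

```python
import numpy as np

# A function object: nothing is drawn until it is called.
sampler = np.random.standard_normal
# Calling it returns numbers right away.
draws = sampler(5)


def bootstrap_means(data, n_bootstrap=5000, seed=0):
    """Resample `data` with replacement many times and return the mean of each resample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Each row indexes one bootstrap resample, the same size as the original data.
    idx = rng.integers(0, data.size, size=(n_bootstrap, data.size))
    return data[idx].mean(axis=1)


observed = sampler(30)  # one "experiment" of 30 observations
boot = bootstrap_means(observed)
# The spread of the bootstrap means estimates the standard error of the mean.
print(boot.std(), observed.std(ddof=1) / np.sqrt(observed.size))
```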
