
charlesfrye / appliedstatisticsforneuroscience


Materials for UC Berkeley Neuroscience 299

Jupyter Notebook 98.99% CSS 0.08% Python 0.93% Shell 0.01%

neuroscience statistics jupyter-notebook applied-statistics python tutorial

appliedstatisticsforneuroscience's Introduction

Applied Statistics For Neuroscience

This repository contains materials for the UC Berkeley course Neuroscience 299, Applied Statistics for Neuroscientists.

The course is divided into three parts: setup and review, statistical testing, and statistical modeling. Within a part, materials are organized into folders that correspond to weeks of the semester. These folders contain Jupyter notebooks that serve as tutorial material and labs for the course. Tutorials should be completed before labs.

The course can be completed either totally online or on your own machine. Completing the course online means you don't have to install anything locally, but it means you'll have a harder time saving your work.

Though technically this course does not assume you have any background in computing or Python, it's highly recommended that you get familiar with the basics before starting. I recommend Codecademy's Python course up through section 8.

Local Version

To run this class locally, i.e. on your own computer, start by downloading the materials. Click the green "Clone or Download" button and choose "Download ZIP". Unzip the resulting file into the location of your choosing.

Follow the instructions here. You'll need to install an appropriate computational environment, as described in the installation instructions. The notebooks in Part 00 - Setup and Review/00 - Setup will get you acquainted with the Jupyter notebook format, Python, and the statistical libraries used in this course.

Online Version - Binder

Alternatively, you can run the notebooks in this course in the cloud via the Binder service by clicking the badge below.

This will create a Jupyter notebook server on a remote computer, then give you access to it via your web browser. This avoids you having to install anything on your machine. Any changes you make to the notebooks will not be saved, however, so this is better suited for quickly checking out just a single section or reading through the solutions.


appliedstatisticsforneuroscience's Issues

Renaming files

Renaming files to reduce redundancy

Remove numbers from lab file names
Remove numbers from lab titles
Remove numbers from utils

Need to write two blogposts

Why Gaussians? blogpost - link to it from lab on inferential stats and error bars
Linear Algebra for Neuroscientists blogpost - link to it from lab on linear algebra

In ANOVA II, by hand,

(1). You have the same paragraph twice under "Extending ANOVA to multiple variables"

(2). In "Two Way ANOVA", the 1st equation cell box, the equation goes off the right side (too long) in my browser.

(3). In the "loading the data" section, just under where it says "Compare the above Python expression to the mathematical expression below and make sure you can match up terms", I feel like the equation $\epsilon_{ijk} = Y_{ijk} - A_i - B_j - AB_{ij}$ is missing the grand mean. Also, you could perhaps use more headers or sections.
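For reference, here is a sketch of the textbook two-way ANOVA decomposition behind point (3). It assumes $\mu$ denotes the grand mean and that $A_i$, $B_j$, and $AB_{ij}$ are effects measured as deviations from it. The notebook may define these terms differently (for example as group means), in which case the residual would need to be adjusted accordingly.

```latex
Y_{ijk} = \mu + A_i + B_j + AB_{ij} + \epsilon_{ijk}
\quad\Longrightarrow\quad
\epsilon_{ijk} = Y_{ijk} - \mu - A_i - B_j - AB_{ij}
```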

Produce environment file for Anaconda Cloud

Define an Anaconda Python virtual environment, upload that environment to Anaconda Cloud, and use the Share Environments feature to have students install all the requirements in one line.

Add an "Outside Resources" document to each section

So that folks using the online version of the course will have access to some of the readings from the in-person version.

Some of these resources might be behind paywalls -- need to check this and give notice.

Use scikit-learn for the modeling section?

scikit-learn is easy to use and, as of 0.18, well-integrated with pandas.

The interactive sections aren't necessarily worth changing, but the sections on fitting and cross-validating probably are.

Lab 11 Issues

Move Q17, on how to determine accuracy, to after the first plot is made.

IOPub issue from Jupyter >=5

In the conversion from 4 to 5, Jupyter added a "feature" that halts data transmission from the kernel to the client if it exceeds a certain threshold. This was intended to prevent massive text dumps to stdout or stderr, but it sometimes triggers when rapidly passing visualizations from the kernel to the notebook -- e.g. when making a "poor man's animation" by repeatedly calling canvas.draw on pyplot figures.

Simply filtering warnings doesn't work, since this is coming from Jupyter itself. The fix is to increase the threshold, but this requires reconfiguration of Jupyter, which I'd like to avoid, for the students' sake. I've reduced the load of some animations in Lab 08 A to prevent this, but I worry this might also trigger for other plots. Will need to be vigilant when creating student versions to see if this crops up elsewhere. Was supposed to be fixed in 5.1, but that doesn't appear to be the case 😢 .

Install instructions reference VS265

Change over to shared utils for `format_dataframes` and `format_plots`

Subsumes #32

MathJax doesn't run on a server; it runs client-side

In the Tech Tools Tutorial, I incorrectly described MathJax as a cloud service when it's actually JavaScript that runs on the user's machine. Oops!

In ANOVA I by Hand tutorial

In Computing Mean Squares, under the mean_square compute cell, you say "how much different the groups" and I think it sounds better "how different" or "how much the groups differ"

In Anova2 / Tutorial by hand

(1). In lab A, running ANOVA section. 2nd ANVOA typo.

(2). In answer for Q2 paragraph 1, "Once could possibly also note" ... typo.

Lab 05 Issues

Should include some indication of the different names that different authors give to the quantities under consideration

Lab 07 Issues

Def'n of scaling -- is this standard?
- NO - include iso and aniso scaling
Typos
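To make the isotropic/anisotropic distinction concrete, here is a minimal sketch. It assumes "scaling" means multiplying each coordinate of a point by a factor; the `scale` helper is hypothetical and is not part of the course utilities.

```python
def scale(point, factors):
    """Multiply each coordinate of `point` by the matching entry of `factors`."""
    return tuple(f * x for f, x in zip(factors, point))

point = (1.0, 2.0)

# Isotropic scaling: the same factor along every axis, so shapes are preserved.
iso = scale(point, (3.0, 3.0))

# Anisotropic scaling: a different factor per axis, so shapes are stretched.
aniso = scale(point, (3.0, 0.5))
```

Under this definition, isotropic scaling is the special case where all factors are equal.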

Review Tech Tools Tutorials

In Week 01, we'll cover the major tools we'll be using for the course: Python, Jupyter, Pandas, and Seaborn. I've also included a quick reference on LaTeX and a placeholder for Numpy, in case we want to front-load that material.

I'd appreciate someone else's eyes going over the tutorials, though that someone else might just be future Charles.

Revisit and revise install instructions

Add Exercises to Tech Tools Tutorials

These will be discussed in the second meeting of the class.

Confirm all notebooks run with new environment

In Hypothesis Testing Tutorial

Slight ambiguity on whether rejecting the null hypothesis and accepting the alternative hypothesis are the same thing. This is either a small point or a misunderstanding on my part, but they are not the same, right? We cannot prove the alt hypothesis.

Lab 10 Issues

~~argmin_theta for linear models w gaussian error should be argmin_w~~

in 00_setup / tutorial B

In table of contents, do you want to indent 1.2.1 & 1.2.2 further than 1.2?
In from future import jetpack, a typo seaborn is visualization
In more python!, link at bottom to pandas notebook is broken.

Dependencies

Add sklearn and statsmodels as dependencies. Might not need both -- need to check on sklearn's support for ANOVA-type GLMs.

Consider presentation style for labs

randomly-chosen, walk through notebook, replace small groups or replace combined discussion?

01 Lab A

In "## What are Probability Distributions?" you list 3 common sense ideas that define distributions, but it's not clear whether these are THE assumptions of a distribution or just some of them.
The intuitive explanation for mass functions in "### Probability Mass Functions" could also apply to density functions, which confused me; it doesn't make it clear that mass functions are discrete.
In "### Probability Density Functions" the alternating references to density fns and probability density fns confused me
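One way to sharpen the mass/density distinction raised above is a small numerical check. This is a sketch with hypothetical helper names (`binomial_pmf`, `normal_pdf`), not code from the lab: a mass function assigns probabilities that sum to 1 over discrete outcomes, while a density can exceed 1 at a point and only its integral is a probability.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials (a discrete mass function)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu=0.0, sd=1.0):
    """Density of a normal distribution at x (not itself a probability)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Mass function: the values are probabilities and sum to 1 over all outcomes.
total_mass = sum(binomial_pmf(k, n=10, p=0.3) for k in range(11))

# Density function: a narrow distribution has density above 1 at its peak.
peak_density = normal_pdf(0.0, sd=0.1)

# Only the area under the density is a probability (Riemann sum over [-1, 1]).
dx = 0.001
area = sum(normal_pdf(-1.0 + i * dx, sd=0.1) * dx for i in range(2000))
```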

Convert Tests for 2-Sample Data from R to Python

Update seaborn dependency

This will allow plotting standard deviation error bars and simplify the presentation in the notebook on error bars.

Recreate student versions

Once this round of updates is complete, go back and create student versions of each notebook

Add link to mixture models blog post

In clustering tutorial, should link to my blog post on quantal release and latent variable models. It's a cool example and would possibly be worth integrating into the course in the future.

Question Formatting in Labs 01 and 02

Early labs have a different format for questions and answers -- they need to be updated to match the later labs.

Lab 02 Issues

We're plotting the sampling distribution -- ask a question that makes them name this.

Clarify the "experiment" -- we're running many experiments, drawing many samples, and seeing how we do on average.
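The "many experiments" idea can be sketched in a few lines. This is an illustrative simulation using only the standard library, assuming a standard normal population; the function names are hypothetical and do not come from the lab.

```python
import random
import statistics

def run_experiment(rng, sample_size):
    """One 'experiment': draw a sample and report its mean."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    return statistics.fmean(sample)

def sampling_distribution(num_experiments, sample_size, seed=0):
    """Repeat the experiment many times; the collected means form the
    (simulated) sampling distribution of the mean."""
    rng = random.Random(seed)
    return [run_experiment(rng, sample_size) for _ in range(num_experiments)]

sample_means = sampling_distribution(num_experiments=2000, sample_size=25)

# The spread of the sampling distribution should be near 1 / sqrt(25) = 0.2.
spread = statistics.stdev(sample_means)
```

A histogram of `sample_means` is the plot the lab is describing: each entry is the outcome of one whole experiment, not one data point.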

Improve stylistic consistency

underscores_for_functions
CamelCaseForClasses

Use @ sign instead of np.dot in modeling labs?

For consistency's sake, this might require changing the interactive model code, which would be a pain.

Pull installation instructions from VS265 and adapt.

making this repo open?

Hey - a colleague was asking about this class and when I went to share the github link I realized it was a private repo. Any particular reason for this?

In ANOVA II, Lab B.

(1). In def run_experiments cell, you have "utils.generateData" instead of generate_data.

(2). In def run_experiments cell, you use false_positive_rates and f_p_rate.

Lab 08 Issues

Too long! Less coding next time.

Add discussion on statistical versus practical significance to hypothesis testing and/or tests for 2-sample data

This was a key point in discussions last year and should be included in the material -- ideally both tutorial and lab material.
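A small worked example may help anchor that discussion. The sketch below assumes a two-sample z-test with known, equal standard deviations and equal group sizes; `two_sample_z_pvalue` is a hypothetical helper, not a function from the course materials. The same tiny effect is far from significant with 100 observations per group and highly significant with 100,000, even though its practical size has not changed.

```python
import math

def two_sample_z_pvalue(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming known sd."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

mean_diff, sd = 0.02, 1.0
effect_size = mean_diff / sd  # standardized effect size: tiny in both cases

p_small_n = two_sample_z_pvalue(mean_diff, sd, n_per_group=100)
p_large_n = two_sample_z_pvalue(mean_diff, sd, n_per_group=100_000)
```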

Clustering Tutorial Issues

The K-Means implementation is a hack -- it's just EM with small, isotropic covariance matrices. Need to do an honest-to-goodness K-Means implementation.
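For comparison, a plain K-Means (Lloyd's algorithm) can be quite short. The following is a minimal pure-Python sketch, not the tutorial's code: the function names are hypothetical, the initial centroids are passed in by the caller, and an empty cluster simply keeps its previous centroid.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(old)  # empty cluster: leave centroid in place
    return new_centroids

def k_means(points, initial_centroids, max_iters=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        updated = update(points, assign(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = k_means(points, initial_centroids=[(0, 0), (10, 10)])
```

Unlike the EM-based version described above, this uses hard assignments and no covariance matrices at all.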

Add a warnings filter to final versions

Lots of deprecation and similar warnings that students don't need to see.

Can be fixed by adding

import warnings
warnings.simplefilter("ignore")

to the shared formatting code.
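A narrower alternative to the blanket filter above is to silence only one category, so that other warnings still reach students. The sketch below is illustrative rather than the course's formatting code; `noisy` and `count_warnings` are hypothetical names used only to show the effect.

```python
import warnings

def noisy():
    """Stand-in for library code that emits two kinds of warning."""
    warnings.warn("this API is old", DeprecationWarning)
    warnings.warn("check your data", UserWarning)

def count_warnings(ignore_deprecations):
    """Run noisy() and count how many warnings get through the filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if ignore_deprecations:
            # Added after "always", so this filter is checked first.
            warnings.simplefilter("ignore", category=DeprecationWarning)
        noisy()
    return len(caught)

shown_by_default = count_warnings(ignore_deprecations=False)
shown_when_filtered = count_warnings(ignore_deprecations=True)
```

Whether the targeted filter is enough depends on which warnings the notebooks actually produce; the blanket `"ignore"` remains the simplest option.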

Issue: ANOVA by Hand

MSEs are unmotivated, remove
standardize capitalization
rephrase "Below, we get more specific on in which cases this "should" be true." Perhaps "This is not true of all possible relationships between group and observation, but is true in many cases, as explained below." Perhaps footnote?

Collect Intro to Python materials.

Need material to cover relevant parts of core Python, ~~numpy, and matplotlib~~ pandas, and seaborn.

e: focus has changed to seaborn and pandas, rather than numpy and matplotlib

Lab 06 Issues

~~Definition of omnibus test needs to also be in part b~~

Comment omnibus test code out from student version and indicate that it's for later.

Lab 09 Issues

Confirm compatibility with updated Model class

confirmed on 10/23/2017

Create example folder with README

Need a folder that shows the desired format for Week XX folders, with guiding README

In sampling and bootstrapping tutorial A

(1). Under "estimating sample distribution", I don't understand this. They are the same:

"Note that both variables are actually functions, so we write
sampler = np.random.standard_normal
rather than
sampler = np.random.standard_normal"

(2). In "visualizing bootstrapping w histograms", you use this plot_bootstrap function, but it gives an error. Is that something you meant to define or a library you meant to include?
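On point (1), the two quoted lines are identical as written, so the intended contrast is unclear. One plausible reading, offered here as an assumption, is that the tutorial meant to contrast naming a function with calling it. The sketch below illustrates that distinction with the standard library's `random.random` standing in for `np.random.standard_normal`; `draw_sample` is a hypothetical helper.

```python
import random

# No parentheses: `sampler` is the function itself; nothing is drawn yet.
sampler = random.random

# With parentheses: the function is called and we get a single number.
one_draw = random.random()

def draw_sample(sampler, n):
    """Call the sampler n times; works for any zero-argument sampling function."""
    return [sampler() for _ in range(n)]

sample = draw_sample(sampler, 5)
```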

Should tutorial and lab material be more clearly demarcated in the latter half of the course?

Idea: labs are mixtures of simulations and applications; tutorials are reading material plus conceptual questions

Rewrite tech tutorials from VS265 - JuPyter, LaTeX.

in bootstrapping tutorial B

In cautionary tales, "you can explaint to yourself"

Change seaborn settings for all notebooks

Add sns.set_context("notebook", font_scale=2)
