opetchey / bio144
BIO144 -- Data Analysis for Biologists
From the second edition of Getting Started with R:
"We note that in the first edition of this book, we also showed tools to build bar charts ± error bars. However, we since decided we don’t like these, so we changed. Many other people don’t like bar charts... they can hide too much."
computer science, marketing, advertising, machine learning, financial forecasting, ecology, medicine
Owen, could you please again check question 2 in GA5? There still seems to be something wrong with it, thanks.
This might seem obvious to us, but it came up recently. The confusion / uncertainty can arise at least when one describes how binary data can be coded with a "count" of the number of successes in one column. So I think it's worth reinforcing with students:
Count data: theoretically no upper limit on the number of times an "event" is observed (e.g., number of birds observed in a forest plot), so it is not possible to express as a proportion.
Binomial data: upper limit on the number of times an "event" is observed (e.g., number of deaths cannot be greater than number of living individuals), so it is possible to express as a proportion.
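A minimal sketch of the distinction, in Python only so that it is self-contained. All numbers and names (`bird_counts`, `mortality`, `n_alive`, `n_dead`) are made up for illustration and do not come from the course datasets.

```python
# Hypothetical toy data; nothing here is taken from the BIO144 datasets.

# Count data: birds seen per forest plot. There is no fixed number of
# "trials", so there is no denominator and no meaningful proportion.
bird_counts = [0, 3, 7, 12, 1]

# Binomial data: deaths out of a known number of individuals at risk.
# Each row carries its own upper limit (n_alive), so a proportion exists.
mortality = [
    {"n_alive": 20, "n_dead": 4},
    {"n_alive": 15, "n_dead": 15},
    {"n_alive": 30, "n_dead": 0},
]


def proportions(rows):
    """Return n_dead / n_alive per row, enforcing the binomial upper limit."""
    out = []
    for row in rows:
        if not 0 <= row["n_dead"] <= row["n_alive"]:
            raise ValueError("successes cannot exceed the number of trials")
        out.append(row["n_dead"] / row["n_alive"])
    return out


death_proportions = proportions(mortality)
print(death_proportions)
```

The point for students: the "number of successes" column only becomes binomial data once it is paired with the number of trials; on its own it looks just like a count.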
Hi Owen, I opened an issue, really nice feature:-)
I just spotted that there is a question on "Significance" in the graded assessment of week 1. Would it be more appropriate to move it to week 9, where we discuss this topic?
In the option
"Statistical significance is often said to be when something is unexpected, given our expectation about what would happen by chance alone"
I think it is better to replace "by chance alone" by "under the null hypothesis".
Nice to put the quadratic lm in, because it shows that although the nonlinearity is well dealt with, the scale-location plot still shows an increase. This indicates that in the real data the variance increases with the mean (fitted value), while the linear model with identity link assumes the variance is constant and independent of the mean (fitted value). I think it's worth mentioning this on slide 12 as one of the problems. This would link well with the content on slide 13 about the variance of the Poisson increasing as the mean increases.
On slide 22, first bullet, I suggest pointing out that the big difference in the diagnostic plots is in the scale-location plot, which no longer shows an increasing relationship. This is the big effect of using a glm in this case.
Second bullet on slide 22... a glm is still linear regression :)
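To illustrate the mean-variance point raised for slides 12 and 13, here is a small simulation sketch. It is written in Python with the standard library only; the sampler `rpois` is a hand-written helper (Knuth's method), not a library function, and the means 1, 5 and 20 are arbitrary choices.

```python
import math
import random
from statistics import mean, variance


def rpois(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


rng = random.Random(144)  # fixed seed so the sketch is repeatable
summary = {}
for lam in (1, 5, 20):
    draws = [rpois(lam, rng) for _ in range(20000)]
    # For a Poisson variable the sample variance should sit close to the mean.
    summary[lam] = (mean(draws), variance(draws))
    print(lam, round(summary[lam][0], 2), round(summary[lam][1], 2))
```

In each row the sample variance should be close to the sample mean, which is the pattern the scale-location plot picks up when a constant-variance linear model is fitted to such data.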
(I is Steffi, you is Owen)
I just solved the algae ANOVA example as if I were a student. Of course, I'm supposed to be much faster than them. However, I noticed that I lose too much time looking up the right dplyr and ggplot commands (looks like I'm a seasoned R veteran, but I'm really happy;-)). I'd highly appreciate the solution scripts being more efficient. Also, it will help me not to show my different opinion on that to the students, haha.
Also, I think that our TAs should have the scripts, so all of us will definitely give the very same advice to the students. I know that by far not all of the TAs have so far embraced the Hadleyverse;)
E.g. fewer practical exercises
In practical 2, Question 1 of "reporting your results" section, there seems to be a problem with the right answers.
In week 01 things to do before class
Lectures numbered 1-12
It is for an old version of the dataset.
Also update the solution script, if required.
In GA 5 I have tried the questions and didn't get all correct:-)
Q1: the t-test is not actually a model. Maybe reformulating the question to "Which of these is based on a linear model?" would be more precise?
Q2: I can't do a generalised linear model with the lm function;) Generalized is all the stuff with link function etc, so in general this isn't correct, right?
Q4: I was confused by the degrees of freedom question. The design that is described there seems to be nested (hierarchical) instead of factorial. Unfortunately, I think we don't have time to cover nested designs, too. Did I misunderstand something here?
check from all relevant previous courses
MAT183
BIO134
then tell Steffi to stop using schedule pdf
In the first week, record each of the five trials, so that we can then do a mixed model in a later week.
Numerical demonstration of how adding more variables (= parameters) decreases the residual error, but can eventually lead to greater prediction error, due to increased parameter uncertainty.
Include r-squared and adjusted r-squared.
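One possible shape for such a demonstration, as a sketch only. It is in Python with the standard library so that it runs anywhere; the course version would presumably be in R. The simulated straight-line truth, the sample sizes, the polynomial degrees 1 to 6 and the helper names (`fit_poly`, `predict`, `rss`, `simulate`) are all assumptions made for illustration.

```python
import random


def fit_poly(x, y, degree):
    """Least-squares polynomial fit via the normal equations (toy-sized problems only)."""
    p = degree + 1
    X = [[xi ** j for j in range(p)] for xi in x]
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting, then back substitution.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (A[i][p] - sum(A[i][j] * beta[j] for j in range(i + 1, p))) / A[i][i]
    return beta


def predict(beta, x):
    return [sum(b * xi ** j for j, b in enumerate(beta)) for xi in x]


def rss(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat))


rng = random.Random(144)  # fixed seed so the sketch is repeatable


def simulate(n):
    x = [rng.uniform(-1, 1) for _ in range(n)]
    y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]  # the truth is a straight line
    return x, y


x_train, y_train = simulate(20)
x_test, y_test = simulate(2000)
n = len(x_train)
ybar = sum(y_train) / n
tss = sum((yi - ybar) ** 2 for yi in y_train)

results = []
for degree in range(1, 7):
    beta = fit_poly(x_train, y_train, degree)
    train_rss = rss(y_train, predict(beta, x_train))
    test_mse = rss(y_test, predict(beta, x_test)) / len(x_test)
    r2 = 1 - train_rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - degree - 1)  # degree = number of predictors
    results.append((degree, train_rss, test_mse, r2, adj_r2))
    print(degree, round(train_rss, 2), round(test_mse, 2), round(r2, 3), round(adj_r2, 3))
```

The training residual sum of squares cannot go up as terms are added, so r-squared only ever rises, while adjusted r-squared is penalised for each extra parameter. The prediction error on the fresh data will usually, but not in every simulated dataset, start to rise once the model is more complex than the truth.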
Correct the questions and any text about the relationship between mass and neocortex.
I just noticed that in the BC video 3 of week 5 there is a problem with the interpretation of the significance of the interaction term (earthworm example, around minute 13). Actually one would have to use the anova table again to test the interaction between Magenumf and Gattung and not the single p-values.
(Btw, for next year I'll plan to bring an example of an interaction term with at least 3 levels already in the lecture, because I have now only used a binary variable with interaction in lecture 4...)
New postdoc in Owen's group starting Jan 2017.
Here is what he has previously taught:
from simple to more complex, from lm to glm to lmm: what changes?
Give help with this.
Perhaps an overview
What makes 37?
Son walking down platform looking at clock, 17.20.
How many numbers am I allowed to use?
How much freedom will you give me?
How much freedom does the have to go wrong?
Lots of freedom gives lots of power.
Total
Mean
Slope intercept
Number of things estimated.
First, what do we use df for, in practice... checking design and model, looking up pvalue.
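A tiny sketch of the "What makes 37?" idea from the notes above, assuming the intended point is that each estimated quantity uses up one degree of freedom. The function names and the numbers other than 37 are hypothetical.

```python
def last_value(free_values, total):
    """With a fixed total, once n-1 numbers are chosen freely the n-th is forced."""
    return total - sum(free_values)


def residual_df(n_obs, n_estimated):
    """Residual degrees of freedom: observations minus number of things estimated."""
    if n_estimated > n_obs:
        raise ValueError("cannot estimate more quantities than there are observations")
    return n_obs - n_estimated


free = [10, 12, 8]              # three numbers picked freely
forced = last_value(free, 37)   # the fourth has no freedom left
df_mean = residual_df(10, 1)    # 10 observations, one mean estimated
df_line = residual_df(10, 2)    # 10 observations, slope and intercept estimated
print(forced, df_mean, df_line)
```

With four numbers that must total 37, only three can be chosen freely, which is the same bookkeeping as n - 1 for a mean and n - 2 for a slope plus intercept.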
to make cheating in final exam harder
week 1: No additional material except lecture notes.
week 2: Stahel script "Lineare regression", chapter 2.
alternatively: "Statistische Datenanalyse" book chapters 13.1-13.4.
week 3: Stahel script chapters 3.1, 3.2a-q, 4.1, 4.2f, 4.3a-e;
"Statistische Datenanalyse" book, Chapter 11.2
week 4: Stahel script chapters 3.2u-x, 3.3, 4.1-4.5
week 5: Stahel book ("Statistische Datenanalyse") chapter 12;
"The new statistics with R", chapter 2
GSWR chapters 5.6 and 6.2
week 6: GSWR chapter 6.3
"The new statistics with R" chapter 7 (ANCOVA) (possibly remove this one?)
Stahel Script chapters 3.4, 3.5 (pp 39-42) and 3.A (pp 43-45)
week 7: Stahel script chapters 5.1-5.4
"Choice and Interpretation of Models" (Clayton/Hills) chapters 27.1 + 27.2 (pdf provided as a scan)
week 8: Self-study week, see papers and articles provided.
week 9: No BC reading (because covered by self-study week).
week 10: GSWR chapter 7