datacarpentry / python-ecology-lesson

159.0 25.0 310.0 22.51 MB

Data Analysis and Visualization in Python for Ecologists

Home Page: https://datacarpentry.org/python-ecology-lesson

License: Other

Python 0.10% Jupyter Notebook 99.90%

carpentries data-carpentry lesson python data-wrangling data-visualisation data-visualization english ecology stable

python-ecology-lesson's Introduction

Data Carpentry Python Lessons with Ecological Data

This repository contains the Data Carpentry Python material based on ecological data. Please see our contribution guidelines for information on how to contribute updates, bug fixes, or other corrections.

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon. Look for the tag . This indicates that the maintainers will welcome a pull request fixing this issue.

Maintainer(s)

Current maintainers of this lesson are

Authors

A list of contributors to the lesson can be found in AUTHORS

Citation

To cite this lesson, please consult with CITATION

python-ecology-lesson's People

Contributors

Stargazers

Watchers

Forkers

lwasser acharbonneau mjmichaelson lukasdrude jbcoop joleng xuf12 clairetian cmacdonell ctb tracykteal chessiq chstaiger yifanwur menegon amueller midnightradio eloyvallinaes borsdorff stijnvanhoey mboisson iamc henrykironde pywaker thomasballinger billyc tpoisot jnandez valpasq martin-jung junhuili seanrgb gabayae kpolimis wrightaprilm snamburi3 aschoenr lexual michaelconnell barrachri hugobowne mikemac8888 katiemlyons bobtodd tomhohenstein djinnome jarmokivekas cor215 aykol kariljordan pletzer tmorrell imagingearth anhnguyendepocen tv3141 weiapplele calculquebec cmortiz dmcguckin thejacksonlaboratory grseb9s nickynicolson willingc computecanada mjones01 kirstypringle versae njisrawi godfoder stephlabou liznaluminsa johnpfay zzygyx9119 mikeschwendy sechilds r-santhir yeemey jthmiller ivastar bgbg mkuzak kmfolgar brownsarahm majusogi uthaipon likefokkens lucadistasio mesfind acdf agbeltran genemachines alexmorley grazulis colinmorris petersmyth12 priya-gittest joshsteele piovere guoxiaohu maxvillev

python-ecology-lesson's Issues

Integration with the new lesson template should be improved

Happy 2017 everybody!

New year, new lesson template, and I've started work on improving our lesson's integration with that template in the new-template-improvements branch, following the example lesson guidelines.

You can see the current state rendered here, which means so far:

Enable the "episodes" drop-down menu by moving lesson/episode content to the _episodes directory
Use the fig directory for figures (and update all in-text references to point to the new directory)
Fix broken link to the "setup" page
Move learning objectives to YAML front matter

Still to be added to the YAML front matter for each episode before merging:

Timing estimates (for teaching & for exercises)
- This will generate a lesson schedule, addressing #91
Motivating questions
Key points

If you'd like to help, pull requests against the new-template-improvements branch will be more than welcome. :)

add link to the data in the 'starting with data' lesson

There is not currently a link to the data used in the lesson, in the '01-starting-with-data' lesson. We should add a link to the data.

That link is currently in the repo in https://github.com/datacarpentry/python-ecology/tree/gh-pages/data

When we switch over to the FigShare database as in #23 we can update the link.

Lessons 00/01 getting a little overfull

This round of instructor training, we had a lot of interest in the first two lessons in the Python set. Which is great! But also puts us in a position where we're looking a little overfull. So the first thing I want to know is if that's true. I think @qjcg has taught through this lesson recently? How did you find 00 and 01 went? Should we have a further discussion about streamlining these, or are we doing OK?

Expand overview of functions in 01-short-introduction-to-Python

Currently built-in functions are used without introduction, then the lesson jumps to user-defined functions.

I'm aware that capacity within lessons is limited, but I wonder if it would be helpful to alter the final section (## Functions) so that it minimally explains i) the key concepts of a function, ii) that we have already been using functions (print, dir, help) that are built-ins, iii) and subsequently that it is possible for a user to write their own functions.

P.S. I've just noticed that this issue has already been raised (#77), but was it ever resolved?

Revise learning objectives using Bloom's Taxonomy

Following the discussion in PR #63, we will pilot rewriting our lesson objectives with reference to Bloom's Taxonomy.

To start, we will aim to follow the two-level grouping of objectives suggested by @ErinBecker.

PRs with revised objectives should be made against the new-learning-objectives branch of this repo.

Contributing MD page is dated

The contributing document reflects ALL DC lessons. Should we change it to reflect only python materials? And if so, was @qjcg and @fmichonneau 's workshop the first DC python workshop and our SEA one the second? And 2) who else has contributed to these Python materials?

I've got:
Ethan White, ??, Mariela Perignon, John Gossett, Francois Michonneau, Leah Wasser, ...

i will be submitting a new PR soon with edits to L08 - SQL and Python and a few other edits throughout.
Thank you!

Which learning objectives should we assess?

Greetings everyone,

I'd appreciate your feedback on my current project. We've successfully revised the learning objectives for all of the Ecology lessons to reflect what we are teaching, and now we are in the process of developing surveys to assess our learners (before and after the workshop) on their skills and self-efficacy for the tools they were taught.

Your feedback is extremely valuable. In this document I've added the learning objectives for all of the Ecology lessons. I would appreciate if you'd open this document and add a +1 to the learning objectives you think are most important to assess our learners.

Additionally, I'm scheduling a virtual meeting to review the objectives, discuss your +1's, and come to a consensus. Please provide your availability here.

Our goal isn't to change what we're teaching, but to better understand what we're teaching, and ensure our learning objectives reflect that, so that we can assess our learners.

Thank you for your feedback and your time.
Kari

P.S. Thank you so much to all the maintainers who helped revise the learning objectives!

@gvwilson @wrightaprilm @qjcg @fmichonneau @acabunoc @mkuzak @tracykteal @stijnvanhoey @thomasballinger @mperignon @ethanwhite @ctb @twitwi @menegon @chenghlee @pbanaszkiewicz @justbennet @barrachri @ostueker @aaronreba @klemensnoga @leszektarkowski @evanwill @jpallen @pipitone @djinnome @katieMlyons @tomhohenstein @hugobowne @MichaelConnell @snamburi3 @ceholden @sdtaylor @dcwalk @amueller @hlapp @cbahlai @kcranston @turlog

Should we be teaching python 3 instead of 2?

There's a thread on SWC discuss about teaching Python 2 or 3. Since these learners are new to Python and have no history with the language, should we start them out with Python 3 instead of 2.x?

Introducing the IPython notebook

For the R lessons we have an 'Introducing RStudio' module. It would be nice to have an 'Introducing the IPython Notebook' in the Python lesson.

IPython Notebook itself is a great thing to teach for data analysis, so it seems very worthwhile to teach and use the IPython notebook in these lessons.

installation instructions

I haven't found any installation instructions. Maybe I missed it.

starting-with-data: 2nd plotting example uses constructs that were not introduced

I have several issues with these two lines of code in the Quick & Easy Plotting ... section in the Starting with Data module:

total_count=surveys_df.record_id.groupby(surveys_df['plot_id']).nunique()
# let's plot that too
total_count.plot(kind='bar');

So far, the df.groupby() method has only been given a single column name or a list of them. Suddenly, and without explanation, a whole column (surveys_df['plot_id']) is passed as a parameter.
The .nunique() method has not been introduced.
So far, only the dict-like notation for accessing columns in DataFrames has been used; now it's the attribute notation (actually, it is also used but not explained in Calculating Statistics ...: pd.unique(surveys_df.species_id))
(style) no whitespace around =

I'm still new to pandas and am right now at my second round through this module (on my own) and this is one of the points that easily throws me off.

My suggestions:

.nunique() could be quickly mentioned right after .unique()
the dict-notation should be used consistently (until the attribute notation is introduced at a later point).

(I'll hand in a PR soon for those.)

I'm not sure how to deal with the .groupby() though. I'll have to read up myself on this.
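
One possible rewrite of the two lines in question, shown only as a sketch: it passes a column name to groupby(), keeps the bracket notation, and applies .nunique() to the selected column. The miniature surveys_df below is a made-up stand-in for the lesson data, and the plotting call is left out so the snippet needs only pandas.

```python
import pandas as pd

# Hypothetical miniature stand-in for surveys_df (values are made up).
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "plot_id": [2, 2, 3, 3, 3],
})

# Group by column name, select the column with bracket notation,
# then count the distinct record_id values within each plot.
total_count = surveys_df.groupby("plot_id")["record_id"].nunique()
print(total_count)
```

With this form, learners only see constructs that were already introduced (a column name passed to groupby() and bracket-style column selection), plus .nunique().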

loc vs iloc clarification please :)

hey @qjcg et al... i am working through some of the materials again. the loc vs iloc text is pretty confusing and i'm trying to figure out a simple way to explain it. it seems loc accepts labels and numeric integer ranges. iloc does not accept labels. But otherwise what is the difference? is it just labels vs no labels?

http://lwasser.github.io/python-ecology/02-index-slice-subset

Can you please have a look and help me clarify the difference so i can succinctly convey this at the workshop? thank you!

We can select specific ranges of our data in both the row and column directions using the loc and iloc arguments. The loc argument allows you to select data using labels AND numeric integer locations. Put another way, loc only accepts integer index values. iloc only allows you to select ranges using labels.

NOTE: Index values and labels must be found in the DataFrame or you will get a KeyError. Remember that the start bound and the stop bound are included. When using loc Integers can be used, but they refer to the index label and not the position. Thus when you use loc, and select 1:4, you will get a different result than using iloc to select rows 1:4.

To select a subset of rows AND columns from our DataFrame, we can use the iloc method. For example, we can select month, day and year (columns 2, 3 and 4 if we start counting at 1), like this:
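
For reference, a minimal sketch of how the two indexers behave in pandas: loc selects by label and includes the stop bound, while iloc selects by integer position and excludes the stop bound. The frame below is a hypothetical stand-in for surveys_df with made-up values; its index labels start at 10 so that labels and positions cannot be mistaken for each other.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df (values are made up). The index
# labels start at 10 so that labels and positions cannot be confused.
df = pd.DataFrame(
    {
        "record_id": [1, 2, 3, 4, 5],
        "month": [7, 7, 7, 8, 8],
        "day": [16, 16, 17, 1, 2],
        "year": [1977, 1977, 1977, 1977, 1977],
    },
    index=[10, 11, 12, 13, 14],
)

# loc selects by label, and the stop label is included.
by_label = df.loc[11:13]

# iloc selects by integer position, and the stop position is excluded.
by_position = df.iloc[1:3]

# Rows at positions 0-2, columns at positions 1-3 (month, day, year).
subset = df.iloc[0:3, 1:4]

print(list(by_label.index))     # [11, 12, 13]
print(list(by_position.index))  # [11, 12]
print(list(subset.columns))     # ['month', 'day', 'year']
```

When the index is the default 0, 1, 2, ... the labels and positions coincide, which is why loc with 1:4 and iloc with 1:4 look similar but return a different number of rows.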

Instructors guide

Add info on the jupyter NB
Check for currency with
- names (variables, vectors, etc)
- file i/o paths

Thank you for filing an issue!

These lessons are being prepared for publication. From March 15, 2017 to April 21, 2017 issues related to lesson release preparations are particularly appreciated.

Check variables

During Bug BBQ, we could really use for someone to walk through the materials and make sure variables are named consistently.

Level: Beginner.

Bug BBQ: Add key points to each episode

Broken Link: Setting Up Python

In the Setting Up Python section, where it says "Here you can find a python script check_env.py".
The hyperlink to check_env.py is broken (http://www.datacarpentry.org/python-ecology-lesson/scripts/check_env.py).

The file is now in _includes/scripts/ .

Why not use surveys_df['year'].unique() rather than range()

In the lesson about automation, why not pull the list of unique years from surveys_df['year'].unique() rather than creating confusion with the range() function ?

It would simplify the lesson quite a bit.
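
A short sketch of the suggestion, using a made-up stand-in for surveys_df: looping over the years that actually occur in the data avoids visiting a year with no records, which a range() over the first and last year would include.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df; note that 1978 has no records.
surveys_df = pd.DataFrame({
    "year": [1977, 1977, 1979, 1980],
    "weight": [40.0, 48.0, 29.0, 46.0],
})

rows_per_year = {}
for year in sorted(surveys_df["year"].unique()):
    # Select only the rows recorded in this year.
    yearly = surveys_df[surveys_df["year"] == year]
    rows_per_year[int(year)] = len(yearly)

print(rows_per_year)
```

Here the loop body runs three times; range(1977, 1981) would run it four times and produce an empty subset for 1978.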

Links at bottom of lesson pages incorrect

Links to FB, twitter, etc. just link back to http://www.datacarpentry.org/python-ecology/

Link to license is 404

Bug report has no address in mailto:

No schedule in the lesson

I was checking the lesson for the schedule, but I didn't find it.

Is there a plan to add it ?

Check splitting out of python lesson

@wrightaprilm & @tracykteal - could you take a look at this and see if I've extracted the files and history of the Python lessons successfully. Thanks.

Missing js and img assets

Note, this applies to all of the *-ecology-lessons. I cannot tell where the origin of the shared files (_layouts, etc.) is though, so I'm raising the issue here for now.

If you look at the browser JS console on loading datacarpentry.github.io/python-ecology-lesson, there are three resources not found:
http://www.datacarpentry.org/python-ecology-lesson/js/jquery-1.9.1.min.js
http://www.datacarpentry.org/python-ecology-lesson/js/bootstrap/bootstrap.min.js
http://www.datacarpentry.org/img/main_shadow.png

The first two requests come from _includes/javascript.html and the last from css/swc.css. They are, indeed, missing from the repository.

If a maintainer can describe the usual workflow from updating these generic components from upstream, I'd be happy to follow up with a pull request.

Check environment script

Installation instructions were added as a result of #36 .

Would it be helpful to have a check environment script that people can run post-installation to confirm that they have it all in working order?
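
As a starting point for discussion, such a script could look roughly like the sketch below. The package list is an assumption (the lesson's actual requirements may differ), and check_environment is a hypothetical helper name, not an existing script in this repository.

```python
import importlib
import sys

# Assumed package list; the lesson's real requirements may differ.
REQUIRED = ["pandas", "matplotlib"]


def check_environment(packages):
    """Map each package name to its version string, or None if it is missing."""
    results = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Not every module exposes __version__, so fall back to a marker.
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment(REQUIRED).items():
        print(f"{name}: {version if version is not None else 'MISSING'}")
```

Learners would run the file once after installation and report any line that says MISSING.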

Check input and output filenames and paths

Bug BBQ:

Check paths and names of files used as input and that learners are directed appropriately to output paths and files.

Level: Beginner.

Strings, integers and floats and booleans

Hi folks,

It is kind of a minor issue, but it might cause confusion in our students if we introduce 3 data types and then use a fourth in our short introduction to python. Therefore, we might as well introduce the Boolean data type at the same time as we introduce Strings, integers, and floats.
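
For illustration, the four types could be introduced side by side in a snippet like this one (the values are arbitrary examples, not taken from the lesson):

```python
# One example value of each basic type; the values themselves are arbitrary.
values = ["Plot 7", 42, 3.5, True]

for value in values:
    print(repr(value), type(value).__name__)

type_names = [type(value).__name__ for value in values]

# Comparisons produce Booleans, which is where learners usually meet them first.
is_heavy = 3.5 > 2
print(is_heavy, type(is_heavy).__name__)
```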

Add description on how to install python ggplot.

Update markdown processor to kramdown

The redcarpet markdown processor used (i.e. in _config.yml) to convert our .md files to HTML via GitHub Pages / Jekyll will no longer be supported as of May 1st 2016.

Though the above states that:

If you're currently using Markdown processors that support GitHub-flavored Markdown, such as Rdiscount or Redcarpet, then you don't need to change your Markdown files for them to render properly.

this does not appear to be completely accurate in practice.

Thus, the processor must be changed in _config.yml (to Kramdown, the markdown processor to be used going forward), and the rendered content should be reviewed for errors (i.e. does it look OK?) at the same time.

Lack of requirements info, miniconda and how to launch a Jupyter Notebook

I was checking the lesson as a part of the "Instructor Training Checkout".

Adding a requirements page

I see that all the requirements part is handled by installing Anaconda.

I think it could be important to add a requirements page where we explain which packages you need and some information about them, instead of having all this information spread around the lecture.

Using miniconda

With Anaconda you are installing 300 MB of libraries and we only use Python, Pandas and Matplotlib (Numpy is a requirement of Pandas).

Could it be more useful to use Miniconda and give them a brief introduction on how to install packages?

Launch a Jupyter Notebook

I didn't find any info about how to launch a Jupyter notebook inside the lecture.

Add instructor notes document for this lesson

I'm working on helping direct instructor attention towards fixing up/contributing to instructor notes. Currently don't have a link to provide for instructor notes for this lesson. Please add - even a blank document would be somewhere to point towards.

Jupyter setup instructions

Include basic instructions for setting up a new Jupyter notebook in setup.md or modify the beginning of lesson 00 to introduce the very basics of working in a Jupyter notebook. At the moment there is a bit of a gap between the end of setup and the beginning of lesson 00.

Perhaps get inspiration from https://github.com/swcarpentry/python-novice-gapminder/blob/gh-pages/_episodes/01-run-quit.md

May be related to issue #10

Lessons are too long

Long lessons should be split / refactored into shorter, more focused "episodes" similar to the python-novice-gapminder and new-sql lessons.

Add motivating questions to each episode

Help us develop R/Python MCQ's for our assessment

We had overwhelmingly productive feedback regarding our issue on which learning objectives to assess in our lessons. Because of you we were able to develop new and improved pre and post workshop surveys that we plan to pilot in our workshops this Spring.

In our new surveys we plan to ask 10 multiple choice questions (MCQ's) that are skills based for R and Python (learners will answer for R or Python-depending on which tool they learned). We have done our best to develop a few questions, but realized our biggest resource is YOU-our community.

We would appreciate you taking the time to look at the questions on this document. Do they make sense? Do you have suggestions for better question? The questions are for R, but we need questions developed for Python as well. Feel free to add questions directly to the document.

Please comment on the document by Wednesday, March 1st. Just as we did before we'll have a BlueJeans meeting to go over your comments/suggestions. Please place your availability (UTC) here.

Thank you for your willingness to contribute to this process.

Should the "Starting with Data" lesson mention Pandas Series explicitly?

When I was working through the "Starting with Data" lesson, I noticed the "Summary Plotting Challenge" starts with the dict

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

As far as I can see, the lesson nowhere mentions the pd.Series type. For the learner the above line of code might appear pretty opaque.

In addition, under "Manipulating Our Species Survey Data", the lesson early teaches the learner to check types:

type(surveys_df)
# output: pandas.core.frame.DataFrame

But this otherwise good practice could run into trouble later with the species_counts:

type(species_counts)
# output: pandas.core.series.Series

Perhaps this could be fixed by an early introduction to the Series terminology. For example, just before introducing the Pandas DataFrame, the lesson could say something like the following.

You can imagine a Pandas Series as a single column of a spreadsheet: a list of data, perhaps of different types, collected together and put in some order. A Pandas DataFrame is akin to the full spreadsheet: a collection of data in several columns -- i.e. in several Pandas Series -- stacked next to one another to make a 2-dimensional array.

The lesson could potentially continue as-is from there, maybe just highlighting at some point (perhaps in the species_counts example) that pulling out a single row gives a Series in Pandas terminology.
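
A small sketch of the distinction, using a made-up stand-in for surveys_df; the way species_counts is computed here is an assumption, since the lesson's own definition is not quoted above.

```python
import pandas as pd

# Hypothetical stand-in for surveys_df (values are made up).
surveys_df = pd.DataFrame({
    "species_id": ["NL", "DM", "DM"],
    "weight": [40.0, 48.0, 29.0],
})

weights = surveys_df["weight"]    # a single column is a Series
first_row = surveys_df.iloc[0]    # a single row is also a Series

# Assumed definition: number of weight records per species.
species_counts = surveys_df.groupby("species_id")["weight"].count()

print(type(surveys_df).__name__)      # DataFrame
print(type(weights).__name__)         # Series
print(type(species_counts).__name__)  # Series
```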

05-loops-and-functions.md contains confusing text explanations

"If a local variable has the same name as a variable somewhere else in the code, the local variable hides but doesn't overwrite the other."
-- "a variable somewhere else in the code" means global variables or local variables in other functions?
"The line defining the loop must start with for"
-- in Python, we can also define while loops; a better way: "Here, the line defining the loop starts with for"
"Functions are reusable, self-contained pieces of code that are called with a single command."
-- a better way: "Functions are reusable, self-contained pieces of code that can be called with a single command. "
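
For the first point, a minimal sketch of what "hides but doesn't overwrite" means when the other variable is a global one (the names count and tally are hypothetical, chosen only for this illustration):

```python
count = 10  # module-level (global) variable


def tally(items):
    # This assignment creates a local variable named count. It hides the
    # global one inside the function body but does not change it.
    count = len(items)
    return count


local_result = tally(["a", "b", "c"])
print(local_result)  # 3
print(count)         # 10, the global value is untouched
```

Local variables in other functions are never visible here at all, so the sentence in the lesson can only be about variables in an enclosing or global scope.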

Broken Link: Setup link

In the Schedule section, the setup link is broken and returns a 404 page. This appears to be a generic part of the Data Carpentry template. As the setup information is above the schedule, can this link be removed or retargeted to the appropriate name anchor? I am willing to try to fix this as part of DC trainer checkout.

Add timing estimates to each episode

Web based notebook for people with install issues

It would be nice to have a web-based solution for IPython notebooks as a backup if people have install issues. What are the available options for this? Once we know what they are, we can make a note to instructors in this lesson.

use the portal data from FigShare for this lesson

This lesson uses the portal data, but a copy that is in the repo, rather than the one on FigShare

https://figshare.com/articles/Portal_Project_Teaching_Database/1314459

It would be good to update the lesson to use this data. This will likely require some changes throughout the lesson though, since the FigShare version of surveys.csv has an added column 'hindfoot_length' and different column headers.

Lesson links are broken

Both the link in the GitHub description (http://datacarpentry.github.io/python-ecology) and the link at http://www.datacarpentry.org/lessons/#ecology-workshop (https://datacarpentry.github.io/python-ecology-lesson/) are broken for me.

Trainer checkout

What are responsibilities @qjcg and I have when users submit trainer checkout exercises? Do we just merge these as we see fit, or do we need to tag you in/do something else, @tracykteal @gvwilson?

matplotlib explains mostly config options, not different kinds of plots

I feel that the different kinds of plots are something important to know, like scatter plots, box plots, heatmaps, etc.
The options that are explained also don't seem that important to beginners; dpi, for example, seems a pretty advanced concept.

Output shown doesn't match actual output

When using the command

surveys_df['weight'].describe()

I get the output:

C:\Users\cgeroux\Anaconda3\lib\site-packages\numpy\lib\function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
count    32283.000000
mean        42.672428
std         36.631259
min          4.000000
25%               NaN
50%               NaN
75%               NaN
max        280.000000
Name: weight, dtype: float64

This is not what is shown in 01-starting-with-data:

count    32283.000000
mean        42.672428
std         36.631259
min          4.000000
25%         20.000000
50%         37.000000
75%         48.000000
max        280.000000
Name: weight, dtype: float64

These NaNs can be removed and the output from the current lesson can be produced with

surveys_df['weight'][surveys_df['weight'].notnull()].describe()

Not sure if this behavior depends on the version of pandas, python, or anaconda:

import sys
import pandas
print(sys.version)
print(pandas.__version__)

produces:

3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
0.18.1
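
An equivalent way to write the workaround, sketched on a made-up Series rather than the lesson data, is to drop the missing values explicitly before summarizing. Whether the NaN quartiles appear at all depends on the installed versions, so this is a defensive form, not a statement about any particular release.

```python
import numpy as np
import pandas as pd

# Hypothetical weights with one missing value (values are made up).
weights = pd.Series([4.0, 20.0, np.nan, 48.0, 280.0], name="weight")

# dropna() removes the missing entries, so every statistic is computed
# from the four observed values only.
summary = weights.dropna().describe()
print(summary)
```

The result is the same as indexing with notnull() as above, but it reads more directly as "ignore missing weights".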

Challenges are not always clear

I'm just working myself through this lesson, as I haven't worked with pandas so far and I found that for most of the challenges, it is not clear to me which result is expected.

Especially for challenges that ask to generate a plot, it would be really helpful if a PNG of the desired plot would be added to the challenge.

I wanted to help adding solutions to the challenges to the INSTRUCTORS.md but often it is not clear to me what the person who has designed the challenge had in mind.

Standardize on a single code style throughout the module

There are specific DC modules where coding style is discussed (e.g. http://www.datacarpentry.org/semester-biology/materials/style/), but it is largely ignored by most modules. I don't find this particularly concerning, as there is a lot to cover and style is typically not first on the list. However, I do think that one of the ways (perhaps the best way) of learning to code is to read other people's code. And one of the hallmarks of well-written code is consistent style. Thus, I think it's important to adopt a single coding style, at least within a module if not across all DC modules. As an example, in the "Introduction to Python" module, two different ways of naming variables appear in the same code block:

# tuples use paratheses
ATuple= (1,2,3)
anotherTuple = ('blue','green','red')
# notes lists uses square brackets
AList = [1,2,3]

This is unnecessarily confusing to novices. Since there are lots of religious wars regarding "proper" style, I think the default should be to adopt PEP8 unless someone has a very good reason to do otherwise. So the above code block should be changed to:

# tuples use parentheses
a_tuple = (1, 2, 3)
another_tuple = ('blue', 'green', 'red')
# note: lists use square brackets
a_list = [1, 2, 3]

My specific proposals are:

This DC module (and hopefully all others) should settle on a single coding style that should be specifically stated in the module's meta-information.
All modules should be reviewed to ensure that they conform to style. This could probably be automated using something like https://pypi.python.org/pypi/autopep8.

Define questions

Define questions for lessons 01-08

Contributing guidelines need fixes reflecting split-out lessons

Repo to submit pull requests to is no longer the datacarpentry repo.
Lessons are no longer in subdirectories:

Every lesson has a sub-directory of its own, while individual topics are files in that directory. For example, the lessons/shell directory holding our introduction to the shell contains the files 00-intro.md, 01-filedir.md and so on.

"Batteries included" is a lie for data science

The intro says

"Batteries Included" philosophy - libraries for common tasks available in standard installation

I would argue that the opposite is true. Coming from R or Matlab, people are very surprised that you need to install / import a library (numpy) to get matrices. And a different one for dataframes.
Installing numpy also is a major pain on most operating systems, unless you use anaconda.
Does "standard installation" mean anaconda? Then the instruction should say that.

A novice might think "standard installation" means the one from python.org. Imagine someone on Windows installing the python.org interpreter and expecting batteries included.

[for context: I'm a strong advocate for the use of Python in data science.]

Update the front page

The front page has a broken link to the deprecated matplotlib lesson. Update link to new ggplot lesson.

library vs package

In lesson '01-starting-with-data' we're talking about libraries. The term library does not have any specific contextual meaning in Python, unlike in C or C++. It might be better to talk about packages, since this is in line with Python terminology. On the other hand, the Pandas documentation describes it as the 'Python Data Analysis Library', which adds confusion.

No images of plots in 06-plotting-with-matplotlib

I noticed that lesson 06-plotting-with-matplotlib does not include any images of the plots that would be created by the various bits of example code. Is this a design decision, or have the images just not been created? I'd be willing to create a Python script that generates the example plots, and link those into the lesson's markdown.
