datacarpentry / python-ecology-lesson

Data Analysis and Visualization in Python for Ecologists

Home Page: https://datacarpentry.org/python-ecology-lesson

License: Other

Python 0.10% Jupyter Notebook 99.90%
carpentries data-carpentry lesson python data-wrangling data-visualisation data-visualization english ecology stable

python-ecology-lesson's Introduction

Data Carpentry Python Lessons with Ecological Data

This repository contains the Data Carpentry Python material based on ecological data. Please see our contribution guidelines for information on how to contribute updates, bug fixes, or other corrections.

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes.

Please see the current list of issues for ideas for contributing to this repository. For making your contribution, we use the GitHub flow, which is nicely explained in the chapter Contributing to a Project in Pro Git by Scott Chacon. Look for the 'good first issue' label. It indicates that the maintainers will welcome a pull request fixing that issue.

Maintainer(s)

Current maintainers of this lesson are

Authors

A list of contributors to the lesson can be found in AUTHORS

Citation

To cite this lesson, please consult CITATION

python-ecology-lesson's People

Contributors

andrewsanchez, bgbg, btovar, cbrafter, erinbecker, ethanwhite, fmichonneau, goi42, hlapp, hunter-powell, kariljordan, katrinleinweber, lilithelina, maneesha, maxim-belkin, mkuzak, npalopoli, orchid00, ppxasjsm, qjcg, quist00, serahkiburu, stijnvanhoey, thomasballinger, tmorrell, tobyhodges, tompc35, tracykteal, willingc, wrightaprilm

python-ecology-lesson's Issues

Integration with the new lesson template should be improved

Happy 2017 everybody!

New year, new lesson template, and I've started work on improving our lesson's integration with that template in the new-template-improvements branch, following the example lesson guidelines.

You can see the current state rendered here, which means so far:

  • Enable the "episodes" drop-down menu by moving lesson/episode content to the _episodes directory
  • Use the fig directory for figures (and update all in-text references to point to the new directory)
  • Fix broken link to the "setup" page
  • Move learning objectives to YAML front matter

Still to be added to the YAML front matter for each episode before merging:

  • Timing estimates (for teaching & for exercises)
    • This will generate a lesson schedule, addressing #91
  • Motivating questions
  • Key points

If you'd like to help, pull requests against the new-template-improvements branch will be more than welcome. :)

Lessons 00/01 getting a little overfull

This round of instructor training, we had a lot of interest in the first two lessons in the Python set. Which is great! But also puts us in a position where we're looking a little overfull. So the first thing I want to know is if that's true. I think @qjcg has taught through this lesson recently? How did you find 00 and 01 went? Should we have a further discussion about streamlining these, or are we doing OK?

Expand overview of functions in 01-short-introduction-to-Python

Currently built-in functions are used without introduction, then the lesson jumps to user-defined functions.

I'm aware that capacity within lessons is limited, but I wonder if it would be helpful to alter the final section (## Functions) so that it minimally explains i) the key concepts of a function, ii) that we have already been using functions (print, dir, help) that are built-ins, iii) and subsequently that it is possible for a user to write their own functions.

P.S. I've just noticed that this issue has already been raised (#77), but was it ever resolved?
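A minimal sketch of the progression suggested above, from built-in functions the learner has already used to a user-defined one. The function name `mean` and the sample values are made up for illustration and are not taken from the lesson.

```python
# Built-in functions ship with Python and need no import.
# Learners have already called some of them, e.g. print() and help().
values = [2, 4, 6]
print(len(values))   # len() is a built-in: number of items in the list
print(sum(values))   # sum() is a built-in: total of the items


# A user-defined function is written with `def` and then called
# exactly like a built-in.
def mean(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)


print(mean(values))
```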

Contributing MD page is dated

The contributing document reflects ALL DC lessons. Should we change it to reflect only the Python materials? If so: 1) was @qjcg and @fmichonneau's workshop the first DC Python workshop and our SEA one the second? And 2) who else has contributed to these Python materials?

I've got:
Ethan White, ??, Mariela Perignon, John Gossett, Francois Michonneau, Leah Wasser, ...

I will be submitting a new PR soon with edits to L08 - SQL and Python and a few other edits throughout.
Thank you!

Which learning objectives should we assess?

Greetings everyone,

I'd appreciate your feedback on my current project. We've successfully revised the learning objectives for all of the Ecology lessons to reflect what we are teaching, and now we are in the process of developing surveys to assess our learners (before and after the workshop) on their skills and self-efficacy for the tools they were taught.

Your feedback is extremely valuable. In this document I've added the learning objectives for all of the Ecology lessons. I would appreciate if you'd open this document and add a +1 to the learning objectives you think are most important to assess our learners.

Additionally, I'm scheduling a virtual meeting to review the objectives, discuss your +1's, and come to a consensus. Please provide your availability here.

Our goal isn't to change what we're teaching, but to better understand what we're teaching, and ensure our learning objectives reflect that, so that we can assess our learners.

Thank you for your feedback and your time.
Kari

P.S. Thank you so much to all the maintainers who helped revise the learning objectives!

@gvwilson @wrightaprilm @qjcg @fmichonneau @acabunoc @mkuzak @tracykteal @stijnvanhoey @thomasballinger @mperignon @ethanwhite @ctb @twitwi @menegon @chenghlee @pbanaszkiewicz @justbennet @barrachri @ostueker @aaronreba @klemensnoga @leszektarkowski @evanwill @jpallen @pipitone @djinnome @katieMlyons @tomhohenstein @hugobowne @MichaelConnell @snamburi3 @ceholden @sdtaylor @dcwalk @amueller @hlapp @cbahlai @kcranston @turlog

Should we be teaching python 3 instead of 2?

There's a thread on SWC discuss about teaching Python 2 or 3. As these are people new to Python, so without history with the language, should we start them out with Python 3 and be teaching Python 3 instead of 2.x?

Introducing the IPython notebook

For the R lessons we have an 'Introducing RStudio' module. It would be nice to have an 'Introducing the IPython Notebook' in the Python lesson.

IPython Notebook itself is a great thing to teach for data analysis, so it seems very worthwhile to teach and use the IPython notebook in these lessons.

starting-with-data: 2nd plotting example uses constructs that were not introduced

I have several issues with these two lines of code in the Quick & Easy Plotting ... section in the Starting with Data module:

total_count=surveys_df.record_id.groupby(surveys_df['plot_id']).nunique()
# let's plot that too
total_count.plot(kind='bar');
  1. So far the df.groupby() method has only been given a single column name or a list of them. Suddenly and without explanation a whole column (surveys_df['plot_id']) is passed as a parameter.
  2. The .nunique() method has not been introduced.
  3. So far only the dict like notation for accessing columns in DataFrames has been used, now it's the attribute notation (actually in Calculating Statistics ... it is also used but not explained: pd.unique(surveys_df.species_id))
  4. (style) no whitespace around =

I'm still new to pandas and am right now at my second round through this module (on my own) and this is one of the points that easily throws me off.

My suggestions:

  • .nunique() could be quickly mentioned right after .unique()
  • the dict-notation should be used consistently (until the attribute notation is introduced at a later point).

(I'll hand in a PR soon for those.)

I'm not sure how to deal with the .groupby() though. I'll have to read up myself on this.
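As a possible way to handle the .groupby() point, here is a small sketch comparing the form quoted above with one that only passes a column name to .groupby() and uses the dict-like notation throughout. The five-row surveys_df below is made-up stand-in data, not the lesson's survey file.

```python
import pandas as pd

# Made-up stand-in for the lesson's surveys_df.
surveys_df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5],
    "plot_id": [2, 2, 3, 3, 3],
})

# Form quoted in this issue: attribute notation, with a whole column
# (a Series) passed to groupby().
count_a = surveys_df.record_id.groupby(surveys_df["plot_id"]).nunique()

# Alternative: group the DataFrame by a column *name*, then select the
# column to count with dict-like notation.
count_b = surveys_df.groupby("plot_id")["record_id"].nunique()

print(count_a)
print(count_b)

# Attribute and dict-like notation return the same column.
print(surveys_df.record_id.equals(surveys_df["record_id"]))
```

On this sample data both forms give the same Series, so the second one could be used in the lesson without introducing a new way of calling .groupby().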

loc vs iloc clarification please :)

hey @qjcg et al... i am working through some of the materials again. the loc vs iloc text is pretty confusing and i'm trying to figure out a simple way to explain it. it seems loc accepts labels and numeric integer ranges. iloc does not accept labels. But otherwise what is the difference? is it just labels vs no labels?

http://lwasser.github.io/python-ecology/02-index-slice-subset

Can you please have a look and help me clarify the difference so i can succinctly convey this at the workshop? thank you!

We can select specific ranges of our data in both the row and column directions using the loc and iloc arguments. The loc argument allows you to select data using labels AND numeric integer locations. Put another way, loc only accepts integer index values. iloc only allows you to select ranges using labels.

NOTE: Index values and labels must be found in the DataFrame or you will get a KeyError. Remember that the start bound and the stop bound are included. When using loc Integers can be used, but they refer to the index label and not the position. Thus when you use loc, and select 1:4, you will get a different result than using iloc to select rows 1:4.

To select a subset of rows AND columns from our DataFrame, we can use the iloc method. For example, we can select month, day and year (columns 2, 3 and 4 if we start counting at 1), like this:
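A minimal sketch of the loc/iloc difference asked about above. It uses a small made-up DataFrame rather than the lesson's surveys_df, and the index labels deliberately start at 10 so that labels and positions cannot be confused.

```python
import pandas as pd

# Made-up data; index labels (10..14) differ from positions (0..4).
df = pd.DataFrame(
    {
        "month": [7, 7, 8, 8, 9],
        "day": [16, 17, 1, 2, 3],
        "year": [1977, 1977, 1977, 1977, 1977],
    },
    index=[10, 11, 12, 13, 14],
)

# loc selects by LABEL. Integers are treated as index labels, and both
# the start and the stop of a slice are included.
by_label = df.loc[11:13, ["month", "day"]]

# iloc selects by integer POSITION, like Python list slicing:
# the stop position is excluded.
by_position = df.iloc[1:3, 0:2]

print(by_label)     # rows labelled 11, 12 and 13
print(by_position)  # rows at positions 1 and 2 (labels 11 and 12)
```

With the default 0, 1, 2, ... index the labels happen to equal the positions, which is probably why the two look so similar in the lesson. The remaining visible difference there is that a loc slice includes its stop bound and an iloc slice does not.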


Instructors guide

  • Add info on the jupyter NB
  • Check for currency with
    • names (variables, vectors, etc)
    • file i/o paths

Thank you for filing an issue!

These lessons are being prepared for publication. From March 15, 2017 to April 21, 2017 issues related to lesson release preparations are particularly appreciated.

Check variables

During Bug BBQ, we could really use for someone to walk through the materials and make sure variables are named consistently.

Level: Beginner.

Missing js and img assets

Note, this applies to all of the *-ecology-lessons. I cannot tell where the origin of the shared files (_layouts, etc.) is though, so I'm raising the issue here for now.

If you look at the browser JS console on loading datacarpentry.github.io/python-ecology-lesson, there are three resources not found:
http://www.datacarpentry.org/python-ecology-lesson/js/jquery-1.9.1.min.js
http://www.datacarpentry.org/python-ecology-lesson/js/bootstrap/bootstrap.min.js
http://www.datacarpentry.org/img/main_shadow.png

The first two requests come from _includes/javascript.html and the last from css/swc.css. They are, indeed, missing from the repository.

If a maintainer can describe the usual workflow for updating these generic components from upstream, I'd be happy to follow up with a pull request.

Check environment script

Installation instructions were added as a result of #36 .

Would it be helpful to have a check environment script that people can run post-installation to confirm that they have it all in working order?
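A rough sketch of what such a script could look like. The function name `check_packages` and the package list (pandas, numpy, matplotlib) are assumptions made for illustration; the actual list would have to follow the installation instructions.

```python
import importlib


def check_packages(names):
    """Return {package name: version string, or None if the import fails}."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            # Not every package exposes __version__; fall back to "unknown".
            results[name] = getattr(module, "__version__", "unknown")
    return results


if __name__ == "__main__":
    # Assumed package list, to be adjusted to what the lesson really needs.
    required = ["pandas", "numpy", "matplotlib"]
    for name, version in check_packages(required).items():
        status = f"OK (version {version})" if version else "MISSING"
        print(f"{name:12s} {status}")
```

Learners would run the file once after installation and report any line marked MISSING to an instructor.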

Check input and output filenames and paths

Bug BBQ:

Check paths and names of files used as input and that learners are directed appropriately to output paths and files.

Level: Beginner.

Update markdown processor to kramdown

The redcarpet markdown processor used (i.e. in _config.yml) to convert our .md files to HTML via GitHub Pages / Jekyll will no longer be supported as of May 1st 2016.

Though the above states that:

If you're currently using Markdown processors that support GitHub-flavored Markdown, such as Rdiscount or Redcarpet, then you don't need to change your Markdown files for them to render properly.

this does not appear to be completely accurate in practice.

Thus, the processor must be changed in _config.yml (to Kramdown, the markdown processor to be used going forward), and the rendered content should be reviewed for errors (i.e. does it look OK?) at the same time.

Lack of requirements info, miniconda and how to launch a Jupyter Notebook

I was checking the lesson as a part of the "Instructor Training Checkout".

  • Adding a requirements page

I see that all the requirements part is handled by installing Anaconda.

I think it could be important to add a requirements page where we explain which packages you need and give some information about them, instead of having all this information spread around the lesson.

  • Using miniconda

With Anaconda you are installing 300 MB of libraries, and we only use Python, Pandas and Matplotlib (Numpy is a requirement of Pandas).

Would it be more useful to use Miniconda and give learners a brief introduction on how to install packages?

  • Launch a Jupyter Notebook

I didn't find any info about how to launch a Jupyter notebook inside the lesson.

Add instructor notes document for this lesson

I'm working on helping direct instructor attention towards fixing up/contributing to instructor notes. Currently don't have a link to provide for instructor notes for this lesson. Please add - even a blank document would be somewhere to point towards.

Help us develop R/Python MCQ's for our assessment

We had overwhelmingly productive feedback regarding our issue on which learning objectives to assess in our lessons. Because of you we were able to develop new and improved pre and post workshop surveys that we plan to pilot in our workshops this Spring.

In our new surveys we plan to ask 10 skills-based multiple choice questions (MCQ's) for R and Python (learners will answer for R or Python, depending on which tool they learned). We have done our best to develop a few questions, but realized our biggest resource is YOU, our community.

We would appreciate you taking the time to look at the questions on this document. Do they make sense? Do you have suggestions for better questions? The questions are for R, but we need questions developed for Python as well. Feel free to add questions directly to the document.

Please comment on the document by Wednesday, March 1st. Just as we did before we'll have a BlueJeans meeting to go over your comments/suggestions. Please place your availability (UTC) here.

Thank you for your willingness to contribute to this process.

@gvwilson @wrightaprilm @qjcg @fmichonneau @acabunoc @mkuzak @tracykteal @stijnvanhoey @thomasballinger @mperignon @ethanwhite @ctb @twitwi @menegon @chenghlee @pbanaszkiewicz @justbennet @barrachri @ostueker @aaronreba @klemensnoga @leszektarkowski @evanwill @jpallen @pipitone @djinnome @katieMlyons @tomhohenstein @hugobowne @MichaelConnell @snamburi3 @ceholden @sdtaylor @dcwalk @amueller @hlapp @cbahlai @kcranston @turlog @ErinBecker

Should the "Starting with Data" lesson mention Pandas Series explicitly?

When I was working through the "Starting with Data" lesson, I noticed the "Summary Plotting Challenge" starts with the dict

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}

As far as I can see, the lesson nowhere mentions the pd.Series type. For the learner the above line of code might appear pretty opaque.

In addition, under "Manipulating Our Species Survey Data", the lesson early teaches the learner to check types:

type(surveys_df)
# output: pandas.core.frame.DataFrame

But this otherwise good practice could run into trouble later with the species_counts:

type(species_counts)
# output: pandas.core.series.Series

Perhaps this could be fixed by an early introduction to the Series terminology. For example, just before introducing the Pandas DataFrame, the lesson could say something like the following.

You can imagine a Pandas Series as a single column of a spreadsheet: a list of data, perhaps of different types, collected together and put in some order. A Pandas DataFrame is akin to the full spreadsheet: a collection of data in several columns -- i.e. in several Pandas Series -- stacked next to one another to make a 2-dimensional array.

The lesson could potentially continue as-is from there, maybe just highlighting at some point (perhaps in the species_counts example) that pulling out a single row gives a Series in Pandas terminology.
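To illustrate the terminology, here is a short sketch that builds a DataFrame from the dict quoted above and checks the types involved. The variable names `column` and `row` are chosen here for illustration only.

```python
import pandas as pd

# The dict from the "Summary Plotting Challenge": two Series of
# different lengths, aligned on their index labels.
d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

df = pd.DataFrame(d)   # several Series side by side
column = df["one"]     # a single column
row = df.loc["a"]      # a single row

print(type(df))        # DataFrame
print(type(column))    # Series
print(type(row))       # Series
print(df)              # "one" has no value for label "d", so it shows NaN
```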

05-loops-and-functions.md contains confusing text explanations

  1. "If a local variable has the same name as a variable somewhere else in the code, the local variable hides but doesn't overwrite the other."
    -- "a variable somewhere else in the code" means global variables or local variables in other functions?

  2. "The line defining the loop must start with for"
    -- in Python, we can also define while loops. A better way: "Here, the line defining the loop starts with for"

  3. "Functions are reusable, self-contained pieces of code that are called with a single command."
    -- a better way: "Functions are reusable, self-contained pieces of code that can be called with a single command. "
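The first two points can be made concrete with a short sketch. All names below are hypothetical and are not taken from 05-loops-and-functions.md.

```python
# Point 1: a local variable hides, but does not overwrite, a global
# variable of the same name.
count = 10  # global variable


def local_count():
    count = 3  # local variable, visible only inside this function
    return count


print(local_count())  # the local value
print(count)          # the global value is unchanged


# Point 2: a loop line does not have to start with `for`;
# `while` defines a loop as well.
squares_for = []
for n in range(3):
    squares_for.append(n * n)

squares_while = []
n = 0
while n < 3:
    squares_while.append(n * n)
    n += 1

print(squares_for, squares_while)
```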

Broken Link: Setup link

In the Schedule section, the setup link is broken and returns a 404 page. This appears to be a generic part of the Data Carpentry template. As the setup information is above the schedule, can this link be removed or retargeted to the appropriate named anchor? I am willing to try to fix this as part of DC trainer checkout.

Web based notebook for people with install issues

It would be nice to have a web-based solution for IPython notebooks as a back up if people have install issues. What are the available options for this? Once we know what they are, we can make a note to instructors in this lesson.

Trainer checkout

What responsibilities do @qjcg and I have when users submit trainer checkout exercises? Do we just merge these as we see fit, or do we need to tag you in/do something else, @tracykteal @gvwilson?

Output shown doesn't match actual output

When using the command

surveys_df['weight'].describe()

I get the output:

C:\Users\cgeroux\Anaconda3\lib\site-packages\numpy\lib\function_base.py:3834: RuntimeWarning: Invalid value encountered in percentile
  RuntimeWarning)
count    32283.000000
mean        42.672428
std         36.631259
min          4.000000
25%               NaN
50%               NaN
75%               NaN
max        280.000000
Name: weight, dtype: float64

This is not what is shown in 01-starting-with-data:

count    32283.000000
mean        42.672428
std         36.631259
min          4.000000
25%         20.000000
50%         37.000000
75%         48.000000
max        280.000000
Name: weight, dtype: float64

These NaNs can be removed and the output from the current lesson can be produced with

surveys_df['weight'][surveys_df['weight'].notnull()].describe()

Not sure if this behavior depends on the version of pandas, python, or anaconda:

import sys
import pandas
print(sys.version)
print(pandas.__version__)

produces:

3.5.2 |Anaconda 4.1.1 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
0.18.1
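The NaN-filtering workaround above can also be written with .dropna(). The sketch below uses a made-up weight column rather than the survey data, and it makes no claim about which pandas or NumPy versions produce the NaN percentiles reported in this issue.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for surveys_df['weight'], including missing values.
weight = pd.Series([4.0, np.nan, 37.0, 48.0, np.nan, 280.0], name="weight")

# Workaround from this issue: keep only the non-null entries.
filtered = weight[weight.notnull()]

# Equivalent and shorter: drop the missing entries.
dropped = weight.dropna()

# Summary statistics are then computed on complete data only.
summary = dropped.describe()
print(summary)
```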

Challenges are not always clear

I'm just working myself through this lesson, as I haven't worked with pandas so far and I found that for most of the challenges, it is not clear to me which result is expected.

Especially for challenges that ask to generate a plot, it would be really helpful if a PNG of the desired plot would be added to the challenge.

I wanted to help adding solutions to the challenges to the INSTRUCTORS.md but often it is not clear to me what the person who has designed the challenge had in mind.

Standardize on a single code style throughout the module

There are specific DC modules where coding style is discussed (e.g. http://www.datacarpentry.org/semester-biology/materials/style/), but it is largely ignored by most modules. I don't find this particularly concerning as there is a lot to cover and style is typically not first on the list. However, I do think that one of the ways (perhaps the best way) of learning to code is to read other people's code. And one of the hallmarks of well-written code is consistent style. Thus, I think it's important to adopt a single coding style, at least within a module if not across all DC modules. As an example, in the "Introduction to Python" module, two different ways of naming variables appear in the same code block:

# tuples use paratheses
ATuple= (1,2,3)
anotherTuple = ('blue','green','red')
# notes lists uses square brackets
AList = [1,2,3]

This is unnecessarily confusing to novices. Since there are lots of religious wars regarding "proper" style, I think the default should be to adopt PEP8 unless someone has a very good reason to do otherwise. So the above code block should be changed to:

# tuples use parentheses
a_tuple = (1, 2, 3)
another_tuple = ('blue', 'green', 'red')
# note: lists use square brackets
a_list = [1, 2, 3]

My specific proposals are:

  1. This DC module (and hopefully all others) should settle on a single coding style that should be specifically stated in the module's meta-information.
  2. All modules should be reviewed to ensure that they conform to style. This could probably be automated using something like https://pypi.python.org/pypi/autopep8.
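As an illustration of how part of such a review could be automated, here is a rough sketch that uses only the standard library's ast module to list assigned names that are not snake_case. It is not autopep8 and does not use its API. The helper `non_snake_case_names` is hypothetical, and the check is crude: it would also flag deliberately upper-case constants.

```python
import ast
import re

# Lower-case letters, digits and underscores only, not starting with a digit.
SNAKE_CASE = re.compile(r"^[a-z_][a-z0-9_]*$")


def non_snake_case_names(source):
    """Return the sorted assigned names in `source` that are not snake_case."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        # ast.Store marks a name that is being assigned to.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if not SNAKE_CASE.match(node.id):
                found.add(node.id)
    return sorted(found)


# The code block quoted in this issue.
lesson_code = """
ATuple= (1,2,3)
anotherTuple = ('blue','green','red')
AList = [1,2,3]
"""

print(non_snake_case_names(lesson_code))
```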

Contributing guidelines need fixes reflecting split-out lessons

  • Repo to submit pull requests to is no longer the datacarpentry repo.

  • Lessons are no longer in subdirectories:

    Every lesson has a sub-directory of its own, while individual topics are files in that directory. For example, the lessons/shell directory holding our introduction to the shell contains the files 00-intro.md, 01-filedir.md and so on.

"Batteries included" is a lie for data science

The intro says

"Batteries Included" philosophy - libraries for common tasks available in standard installation

I would argue that the opposite is true. Coming from R or Matlab, people are very surprised that you need to install / import a library (numpy) to get matrices. And a different one for dataframes.
Installing numpy is also a major pain on most operating systems, unless you use Anaconda.
Does "standard installation" mean Anaconda? Then the instructions should say that.

A novice might think "standard installation" means the one from python.org. Imagine someone on Windows installing the python.org interpreter and expecting batteries included.

[for context: I'm a strong advocate for the use of Python in data science.]

library vs package

In lesson '01-starting-with-data' we're talking about libraries. The term library does not have any specific contextual meaning in Python, unlike in C or C++. It might be better to talk about packages, since this is in line with Python terminology. On the other hand, the Pandas documentation describes Pandas as the 'Python Data Analysis Library', which adds confusion.
