ds100-s20-content's People

Contributors

afranks86, blimmie, dreamboymeng, ykharitonova

ds100-s20-content's Issues

Lab 9

Reword the 2nd question to be more general, then answer it with a multiple linear regression.

Lab 7

  • Add link to documentation for np.corrcoef
  • a_hat and b_hat are flipped where it says "respectively"
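
For the np.corrcoef note, a minimal sketch of how the correlation coefficient relates to a least-squares line might help. The data values are hypothetical, and the names slope and intercept are used on purpose because these notes do not say which of a_hat and b_hat is which:

```python
import numpy as np

# Hypothetical data, for illustration only (not from the lab).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the correlation between x and y.
r = np.corrcoef(x, y)[0, 1]

# Least-squares line expressed through r and the standard deviations.
slope = r * y.std() / x.std()
intercept = y.mean() - slope * x.mean()

print("r =", r)
print("slope =", slope, "intercept =", intercept)
```

Comparing the result against np.polyfit(x, y, 1) is one way to check which estimate is the slope and which is the intercept.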

Lab 6

  • Add clarification to problem 1 to set the colors to "target_above_mean"
  • Add clarification to problem 4 that they should plot all PCs

Lab01 notes

Notes for future reference / fixes:

  • Add a shortcut for commenting/uncommenting a highlighted selection inside a code cell. (See a note in HW 4.)

  • List comprehension and zip() example (Q2.0.1): update the print statement to show the multiplication (and maybe remind students of the format() function):

for x, y in zip(a,b):
    print(x,'*', y, '=', x*y)
  • NumPy Question 3a (Q2.0.2.2) generated a lot of questions, especially since the test answers were rounded:
---------------------------------------------------------------------
q3a > Suite 1 > Case 1

>>> valid_values
[0.9507143064099162, 0.8661761457749352, 0.9699098521619943, 0.9488855372533332, 0.9656320330745594, 0.9093204020787821, 0.9695846277645586, 0.9394989415641891, 0.8948273504276488, 0.9218742350231168]

# Error: expected
#     array([0.95071431, 0.86617615, 0.96990985, 0.94888554, 0.96563203,
#            0.9093204 , 0.96958463, 0.93949894, 0.89482735, 0.92187424])
# but got
#     [0.9507143064099162, 0.8661761457749352, 0.9699098521619943, 0.9488855372533332, 0.9656320330745594, 0.9093204020787821, 0.9695846277645586, 0.9394989415641891, 0.8948273504276488, 0.9218742350231168]
  • NumPy Question 3a (Q2.0.2.2):

    • In addition to fixing the tests, perhaps break this question down into several parts.
    • The basic syntax of a list comprehension is: [ expression for item in list if conditional ]
    • Show that the if part is not required, e.g., create a list: nums = [i for i in range(5)]
    • Then show how to update the above statement to add only the even numbers to the list (now we need an if).
  • NumPy Question 3b (Q2.0.2.3) should include the note about searching the web for exact answers.

    • Current: "If you're stuck, try a search engine! Searching the web for examples of how to use modules is very common in data science."
    • Update to: "If you're stuck, try a search engine! Searching the web for examples of how to use modules is very common in data science. Reminder: Searching the web for the exact text of the problem that you are trying to solve is considered to be cheating and will hinder your learning in the long term."
  • NumPy Question 3b (Q2.0.2.3) generated some confusion, since students were seeing different results than what was described in the writeup.

    • Some students did not see the 50 times more efficient execution, possibly because of how they wrote the list_sum and array_sum functions (for loops slowed things down?).
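
The Lab01 items above could be sketched roughly as follows. The lists a and b are hypothetical stand-ins for the lab's data, and using np.allclose for the q3a check is only one possible fix for the rounding issue, not necessarily the one the autograder should adopt:

```python
import numpy as np

# Hypothetical inputs, for illustration only.
a = [1, 2, 3]
b = [4, 5, 6]

# zip() pairs up the items; format() shows the multiplication.
lines = ["{} * {} = {}".format(x, y, x * y) for x, y in zip(a, b)]
for line in lines:
    print(line)

# The "if" part of a list comprehension is optional...
nums = [i for i in range(5)]
# ...and is added only when items need to be filtered.
evens = [i for i in range(5) if i % 2 == 0]

# Values copied from the q3a test output quoted above.
valid_values = [0.9507143064099162, 0.8661761457749352, 0.9699098521619943,
                0.9488855372533332, 0.9656320330745594, 0.9093204020787821,
                0.9695846277645586, 0.9394989415641891, 0.8948273504276488,
                0.9218742350231168]
expected = np.array([0.95071431, 0.86617615, 0.96990985, 0.94888554, 0.96563203,
                     0.9093204, 0.96958463, 0.93949894, 0.89482735, 0.92187424])

# A tolerance-based comparison passes for both lists and arrays,
# whereas comparing printed representations does not.
matches = np.allclose(valid_values, expected)
print(matches)
```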

Final project notes

  • Have a peer-review assignment as a checkpoint submission: perhaps a week before the assignment is due, allow/assign teams to peer-review each other's work. To make this possible, teams need to submit their project topics in advance, so that we have a list of teams/projects available.
  • Give LaTeX examples in advance of the project, so that students can use LaTeX for the final report.

Final report structure / notes

  • Separate data / methods into their own sections; align them with the rubric.
  • References listed at the end of the report but never cited in the text are not citations; every reference should be cited in the text. Specify the citation style.
  • For the submitted notebook section: "It is most helpful if the notebook has a structure to it.
    If we have a question from reading the report, we should be able to load the notebook and re-run it to get the same chart / see the calculations used in the analysis, so it is in your interest to make it easy to find."

Homework 3 Question 2d

There is a mistake in the instructions for Question 2d of Homework 3. The line "This time, let's use the 'bike' DataFrame to plot hourly counts instead of daily counts" should be "Let's use the 'daily_counts' DataFrame to plot daily counts."

Lab 5 notes

  • The original lab doesn't pass the otter tests with the answer key for q3, q7, and q11 (see error messages on Slack)

  • Correct "../data/bball_data.csv" to be "data/bball_data.csv"
  • At "An Observation," the pct='1 / datum.total / 100' for the transform_joinaggregate() needs documentation explaining what this step does.

Lab02 notes

import numpy as np
import pandas as pd

x = np.arange(100)
source = pd.DataFrame({
  'x': x,
  'f(x)': np.sin(x / 5)
})
  • Question 1b: Plotting the Squared Loss. It would be nice to introduce an example Altair plot that shows how to create the chart object using the .Chart() method and how to use the mark_ functions.
    • Then give the full example, reusing the sample DataFrame code from above in an Altair visualization and explaining that y gets the name of a column in the DataFrame:
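import altair as alt
import numpy as np
import pandas as pd

# Sample DataFrame: one column per encoded channel
x = np.arange(100)
source = pd.DataFrame({
  'x': x,
  'f(x)': np.sin(x / 5)
})

# x and y take the names of columns in the DataFrame
alt.Chart(source).mark_line().encode(
    x='x',
    y='f(x)'
)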
  • Question 1b: Plotting the Squared Loss: "Let us now consider the case where y_obs equals 10."

    • It would be good to add an explanation for what y_obs represents and why it is not supposed to be an array.
    • It is unclear what the question is asking us to do. Adding a bit more of the context / goal would help.
    • From Piazza: "do we need to to plot loss function respect to obs_y or c_values? In other words, what should be the x_axis and y_axis for the graph?" Answer: c_values
    • hard to know what a plot is supposed to look like
  • Question 1b: Plotting the Squared Loss: Use the correct link for "Adjusting Axis Labels" (it needed a dash in altair-viz): https://altair-viz.github.io/user_guide/customization.html#adjusting-axis-labels

    • units in labeling axes?
  • Question 2: Mean Squared Error for the Tips Data. Add a docstring to the mean_squared_error function (remind them to use the squared_loss they defined earlier?):

def mean_squared_error(c, data):
    """
    Calculate the mean squared error between the observed data and a summary statistic.

    Parameters
    ------------
    c : some constant representing a summary statistic
    data : observed values

    Returns
    ------------
    The mean squared loss between the data and the summary statistic.
    """
  • Q2: "Find the value of c that minimizes the L2 loss above via observation of the plot you've generated. Round your answer to the nearest integer."
    • Seems like a good time to show them how to create an interactive plot?
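
To make the goal of Question 1b and Question 2 more concrete, the quantities involved could be sketched as below. This is only an illustration: the signature of squared_loss, the grid of c_values, and the data array are assumptions (the lab's own tips data is not used here), and the function bodies are one plausible reading of the docstring above rather than the official solution:

```python
import numpy as np

def squared_loss(y_obs, c):
    """Squared loss between an observed value and a candidate summary c."""
    return (y_obs - c) ** 2

def mean_squared_error(c, data):
    """Average squared loss of the constant c over all observed values."""
    return np.mean(squared_loss(np.asarray(data), c))

# Question 1b: y_obs is a single observed value (a scalar, not an array),
# and the loss is evaluated over a range of candidate values c.
y_obs = 10
c_values = np.linspace(0, 20, 201)           # assumed grid: x-axis of the plot
loss_values = squared_loss(y_obs, c_values)  # y-axis of the plot
best_c_single = c_values[np.argmin(loss_values)]

# Question 2: the same idea, averaged over a dataset.
data = np.array([2.0, 3.5, 3.0, 5.0, 4.0])   # hypothetical values, not the tips data
mse_values = np.array([mean_squared_error(c, data) for c in c_values])
best_c = c_values[np.argmin(mse_values)]

print(best_c_single, best_c, data.mean())
```

On this grid, the minimizing c for a single observation sits at y_obs, and the minimizing c for the dataset sits at the grid point closest to the mean of the data, which is what students are expected to read off the plot.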

Visualization resources

Altair
