
persp-analysis's Introduction

MACS 30000 - Perspectives on Computational Analysis

| | Dr. Benjamin Soltoff | Ryan C. Hughes (TA) | Joshua G. Mausolf (TA) |
| --- | --- | --- | --- |
| Email | [email protected] | [email protected] | [email protected] |
| Office | 249 Saieh Hall | 251 Saieh Hall | 251 Saieh Hall |
| Office Hours | Th 1-3pm | M 8:00-10:00am | F 9:30-11:30am |
| GitHub | bensoltoff | rchughes | jmausolf |
  • Meeting day/time: MW 11:30-1:20pm, 247 Saieh Hall for Economics
  • Lab session: W 4:30-5:20pm, 247 Saieh Hall for Economics
  • Office hours also available by appointment

Course description

Massive digital traces of human behavior and ubiquitous computation have both extended and altered classical social science inquiry. This course surveys successful social science applications of computational approaches to the representation of complex data, information visualization, and model construction and estimation. We will reexamine the scientific method in the social sciences in the context of both theory development and testing, exploring how computation and digital data enable new answers to classic investigations, the posing of novel questions, and new ethical challenges and opportunities. Students will review fundamental research designs such as observational studies and experiments, statistical summaries, visualization of data, and how computational opportunities can enhance them. The focus of the course is on exploring the wide range of contemporary approaches to computational social science, with practical programming assignments to train with these approaches.

Required textbooks

All textbooks are available in electronic editions either directly from the author or via the UChicago library (authentication required). Hardcopies can be purchased at your preferred retailer.

Evaluation

| Assignment | Quantity | Points | Total Points |
| --- | --- | --- | --- |
| Short assignments | 8 | 10 | 80 |
| Final exam | 1 | 20 | 20 |
  • Short assignments will vary depending on subject matter. They could include writing assignments analyzing computational research designs and/or problem sets implementing specific computational methods.
  • Final exam will be a timed take-home exam. Details to be furnished near the end of term.

Disability services

If you need any special accommodations, please provide me (Dr. Soltoff) with a copy of your Accommodation Determination Letter (provided to you by the Student Disability Services office) as soon as possible so that you may discuss with me how your accommodations may be implemented in this course.

Course schedule (lite)

| # | Date | Topic | Assignment due |
| --- | --- | --- | --- |
| 1 | Mon, Sep. 25 | Introduction to Computational Social Science | |
| 2 | Wed, Sep. 27 | Science in a computational era | |
| 3 | Mon, Oct. 2 | Observational data - counting things | |
| 4 | Wed, Oct. 4 | Observational data - measurement | |
| 5 | Mon, Oct. 9 | Observational data - forecasting | |
| 6 | Wed, Oct. 11 | Observational data - approximating experiments | |
| 7 | Mon, Oct. 16 | Asking questions - fundamentals | Proposing an observational study |
| 8 | Wed, Oct. 18 | Asking questions - digital enrichment | |
| 9 | Mon, Oct. 23 | Experiments | Proposing a survey study |
| 10 | Wed, Oct. 25 | Experiments | |
| 11 | Mon, Oct. 30 | Simulated data | Proposing an experiment |
| 12 | Wed, Nov. 1 | Simulated data | |
| 13 | Mon, Nov. 6 | Collaboration | Simulating your income |
| 14 | Wed, Nov. 8 | Collaboration | |
| 15 | Mon, Nov. 13 | Ethics | Collaboration |
| 16 | Wed, Nov. 15 | Ethics | |
| 17 | Mon, Nov. 20 | Exploratory data analysis - univariate visualizations | The ethics of the Montana election experiment |
| 18 | Wed, Nov. 22 | Exploratory data analysis - multivariate visualizations | |
| 19 | Mon, Nov. 27 | Exploratory data analysis - clustering | Exploring the General Social Survey |
| 20 | Wed, Nov. 29 | Exploratory data analysis - dimension reduction | |
| 21 | Mon, Dec. 4 | Unsupervised learning | |

The final exam will be distributed on Tuesday December 5 at 12pm and must be submitted by 11:59pm Wednesday December 6.

Course schedule (readings)

All readings are required unless otherwise noted. Adjustments can be made throughout the quarter; be sure to check this repository frequently to make sure you know all the assigned readings.

  1. Introduction to computational social science
  2. Social science in a computational era
  3. Observational data (counting things)
  4. Observational data (measurement)
  5. Observational data (forecasting)
  6. Observational data (approximating experiments)
  7. Asking questions (fundamentals)
  8. Asking questions (digitally-enriched)
  9. Experiments
  10. Experiments (more)
  11. Simulated data
    • "Indirect Inference," New Palgrave Dictionary of Economics
    • Benoit, Kenneth, "Simulation Methodologies for Political Scientists," The Political Methodologist, 10:1, pp. 12-16.
    • Recommended readings on simulation methods (not required for class)
      • Wolpin, Kenneth I., The Limits of Inference without Theory, MIT Press, 2013.
      • Davidson, Russell and James G. MacKinnon, "Section 9.6: The Method of Simulated Moments," Econometric Theory and Methods, Oxford University Press, 2004.
  12. Simulated data (cont.)
  13. Collaboration
  14. Collaboration (cont.)
  15. Ethics
  16. Ethics (cont.)
  17. Exploratory data analysis
  18. Exploratory data analysis (cont.)
  19. Exploratory data analysis - dimension reduction
  20. Exploratory data analysis - clustering

persp-analysis's People

Contributors

ariboyarsky, bensoltoff, chenanhua, dailing616, dgamarnik, fangfangwan, fuzhiyu, gmvelez, hyunkukwon, jfan3, jgdenby, jheng18, jmausolf, johnhenrypezzuto, khan1792, leosonh, lwang11, mcs2017, nicholskl, nnickels, otamio, philipcaochicago, rchughes, rickecon, ruixue-li, shuting05, siyiii, sumervaid, tamos, yyd007


persp-analysis's Issues

Questions re: Assignment 1

  • Should we be doing a literature review? If yes, to what extent?
  • Can you elaborate a bit on the following point? (Beyond what is in the instructions)
    • Are we justifying its use over other methods?

"A justification for how your proposed research design takes advantage of specific methods for observational study versus alternative observational methods"

  • Does this have to be a research project we are capable of doing? I.e., Can we suppose certain enabling factors (funding, access to data, time).

Additional experiment reading

Just a heads-up: I added another reading for class tomorrow on a Twitter bot experiment designed to reduce racially biased online harassment. See the readme for the link. Make sure to read it for class tomorrow.

Won't let me commit to student folder

For some reason when I try to commit something to my student folder (gamarnik_dan) it goes into a new "patch" instead (which I think is my forked directory of the class). It should be mapped to the master and not into the forked folder.

Updates to Problem Set 1

I have updated parts (a), (b), (c), and (d) in Problem Set 1.

  1. The first update is to have you do 10,000 simulations rather than 1,000 simulations. This will make your plots smoother and your answers more uniform.
  2. The second change is to make the histogram in part (b) have 50 bins instead of 30 bins.
  3. The last change is to make your histograms for parts (c) and (d) only have as many bins as there are years in which people pay off their debts. For most people's simulations, this should only be 3 or 4 bins, but it is possible to have 5. It just depends on how many extreme outliers you get in your simulations.

Is it possible to borrow an MTurk account from others?

For example, can we ask a friend who's not in this course to register an account and use it to complete the question? Or if we don't have an SSN and therefore cannot get our account verified, the only option is the alternative question? Thanks!

Citation style

Is there a standard citation style we should use for sources in the assignments?

Issue with creating histogram

In exercises (1c) and (1d), you are asked to create histograms of the frequency of the years in which each simulated individual pays off his debt of $95,000. Some of you are having trouble getting your histograms to look right.

For example, suppose you are able to create a unidimensional vector named payoff_year of length 10,000 that contains the year in which each individual pays off his debt.

payoff_year = np.array([1986, 1987, 1986, 1988,... 1987])

You could use the np.unique() function to get a list of the unique year values and a count of how many of each value there is.

payoff_yr_list, payoff_yr_cnt = np.unique(payoff_year,  return_counts=True)

The vector payoff_yr_list will be a numpy array with each unique year in your big payoff_year array. The vector payoff_yr_cnt will have the same number of elements as payoff_yr_list and will contain the counts of how many times each corresponding year occurs.
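As a small illustration of the np.unique() call described above, here is a minimal sketch. The six-element payoff_year array is made-up toy data standing in for the real 10,000-element vector.

```python
import numpy as np

# Toy stand-in for the real payoff_year vector (hypothetical values)
payoff_year = np.array([1986, 1987, 1986, 1988, 1987, 1987])

# Unique years (sorted) and how many times each one occurs
payoff_yr_list, payoff_yr_cnt = np.unique(payoff_year, return_counts=True)

for year, count in zip(payoff_yr_list, payoff_yr_cnt):
    print(year, count)
```

The two returned arrays line up element by element, so payoff_yr_cnt[i] is the number of simulated individuals who pay off their debt in payoff_yr_list[i].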

You should only have three or four different unique years in which simulated households pay off their debt. You can make a nice looking histogram by passing in a bins argument.

plt.hist(payoff_year, bins=np.arange(start_bin, stop_bin, binwidth), weights=hist_wgts)

Suppose in the example above that you only had three unique years that ever occurred (1986, 1987, and 1988). The bins object tells the plt.hist() function where each bin boundary should be. The start_bin element is the left edge of the leftmost bin. If we want each bin centered on the year of payoff, and we want each bin to be the size of one year, then binwidth = 1.0 and start_bin = 1986 - 0.5. The stop_bin value just needs to be greater than the right edge of the rightmost bin and no more than 1 unit above it, because np.arange() excludes the stop value itself. That right edge should be 1988 + 0.5, so stop_bin = 1989 works. The hist_wgts vector is the same set of equal weights used in the matplotlib issue, (1 / num_draws) * np.ones(num_draws). Try this and see if you get a great looking histogram.
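The bin arithmetic above can be checked without drawing anything. This is a minimal sketch on made-up toy data; it uses np.histogram(), which accepts the same bins and weights arguments as plt.hist(), so the edges and bar heights can be inspected directly.

```python
import numpy as np

# Toy stand-in for payoff_year (hypothetical values)
payoff_year = np.array([1986, 1987, 1986, 1988, 1987, 1987])
num_draws = len(payoff_year)

binwidth = 1.0
start_bin = payoff_year.min() - 0.5   # left edge of the leftmost bin
stop_bin = payoff_year.max() + 1      # just past the right edge of the rightmost bin

# np.arange excludes stop_bin, so the last edge is 1988.5
bin_edges = np.arange(start_bin, stop_bin, binwidth)

# Equal weights so that the bar heights are shares of all observations
hist_wgts = (1 / num_draws) * np.ones(num_draws)
heights, edges = np.histogram(payoff_year, bins=bin_edges, weights=hist_wgts)

print(edges)
print(heights)
```

With three unique years there are four edges and three bars, each centered on a year.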

Summary of mid-course feedback

So a brief summary of the most frequent comments in the feedback you provided (for those interested, the response rate was approximately 50%, though I still don't know if my analysis is plagued by non-response bias):

What has contributed to your learning in this class

Lots of people like Bit by Bit

I too think it's a good summary of the major research designs and prominent research published in computational social science, though it does not serve as a replacement for reading the original research article. Several of you noted it was beneficial when I assigned an article also discussed in Bit by Bit as the textbook helped to summarize and clarify the major points from the article.

The connection to your own future research projects

A few of you said it was helpful that we are relating these projects to research you may conceivably do in the next couple of years here at UChicago. That is one of the major objectives for the course, and my hope is that you do not forget all about survey and experimental designs when it comes to your own research. It is easy to find a canned package of observational data and apply some machine learning algorithms to it, but remember the Schrodt article - we don't learn a ton more by re-analysis of the same dataset. Don't forget this when you think about your research paper in the spring and your thesis next year.

What you would like to change

Make it an elective

No.

Not enough application

Some of you asked when you'd be doing applied work, actually building and testing statistical models. You will be doing that next term (see the syllabus from last year for examples of what methods we will teach you). This term focuses on research design and the process of designing a research project prior to collecting data and analyzing results.

A lot of reading/not enough reading

Some asked for more articles to read with the different methods, while others asked for fewer articles. I am trying to strike a balance between exposing you to substantive applications of these broad methodological approaches and overloading you with reading. I already cut back the reading load from last year. That said, the value of this course to you is what you make it. If you skip all the readings, then you will get very little out of class discussions and have nothing to contribute. If you read the articles, then you will be able to follow the discussion better and contribute, both in class and in small groups.

For those who want to read more articles, most of the Bit by Bit chapters include a table of additional articles employing the methods discussed in that chapter (I know at least this exists for the experiments chapter). I strongly encourage you to look at some of those readings if you want to dive further into a specific methodology.

Too many political science articles

I am a political scientist, so I admit I am a bit biased by my perspective and prior exposure. That said, I went back and tabulated the frequency of articles by major discipline:

| Discipline | Number of readings |
| --- | --- |
| Political science | 10 |
| Sociology | 7 |
| Statistics/other | 6 |
| Economics | 3 |
| Psychology | 1-ish (the conservatives/liberals are happier article) |

So we could use some more econ and psych articles, and perhaps fewer poli sci articles. I'll try to balance it out more in the second half of the term.

Class is too long/I'm hungry at noon

Why does Perspectives meet when it does? Simply put, we wanted a single time slot for the Perspectives course to meet in the fall, winter, and spring quarters. And there is a wide range of courses first-year Computation students may take, including required courses such as the CAPP programming sequence, linear algebra, statistics, etc. Computational psychology students also have a regimented sequence of courses that meet only once per term. Plus we draw a lot of certificate students from the MAPSS program which also includes some required courses. We tried to find a time that avoided conflicts with any of these requirements, which pretty much left MW at 11:30 as the only option.

We also found that an 80 minute class session was not sufficient on many days to cover the range of material we need to teach you at a sufficient depth, hence we extended it to a two-hour class. The alternative approach, which is common at UChicago, is to squeeze a semester of material (16 weeks) into 10 weeks by assigning tons of outside readings and assignments with no in-class instruction on the material, which many students dislike, for good reason. As you've seen occasionally this term, if we finish the material early I have no problem ending class early. This is more likely to occur when students come prepared to discuss the articles and I can spend less time summarizing them. But I can only do that when the majority of students come to class prepared. A classic question of causality.

As for your hunger, look at it from my point of view. I teach Perspectives MW 11:30-1:20. I also teach a computing class MW 1:30-2:50. And I have a lab for that computing class W 3:00-4:20. If you're hungry during Perspectives, do what I do: eat lunch at 11. It's a bit early, so pack an afternoon snack (once you have a child of your own you'll discover the benefits of afternoon snacks).

Random seed of simulation assignment

Do we need to set the random seed in the assignment? If yes, do we need to set the same random seed as in the example, which is 524, or any random seed is OK? Thank you!
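For context, here is a minimal sketch of what setting the seed does. It does not settle whether a particular seed is required for the assignment. The seed 524 is the one mentioned above; sigma, lf_years, and num_draws are hypothetical placeholder values.

```python
import numpy as np

# Hypothetical placeholder values, not the assignment's parameters
sigma = 0.1
lf_years = 40
num_draws = 5

# Same seed, same draws
np.random.seed(524)
first = np.random.normal(0, sigma, (lf_years, num_draws))
np.random.seed(524)
second = np.random.normal(0, sigma, (lf_years, num_draws))

# Different seed, different draws
np.random.seed(525)
third = np.random.normal(0, sigma, (lf_years, num_draws))

print(np.array_equal(first, second))
print(np.array_equal(first, third))
```

Fixing the seed makes the simulated errors, and therefore every plot and statistic built on them, identical from one run to the next.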

About rubric

Hi all!

I was wondering if the rubric for assignment 2 will be the same as for assignment 1? When will you post it on Canvas?

Thanks!
Best,
Fiona

Initial income?

Is there a particular value we should set our initial income to? Is $1 acceptable, as it is effectively zero?

To clarify, I'm referring to income at t = 2019-1.

Edit: please disregard, I found the answer.

Can I Make a Pie Chart For My Kaggle Plot?

I would like to use a pie chart for my Kaggle plot. I know there is some controversy about pie charts in the world of data visualization, and that pie charts do not have axes that I can label.

Can I still score full points on my data visualization by creating a pie chart? I think it would be the best visualization for data I have in mind.

Survey paper evaluations posted on Canvas

Sorry for the delay in returning your scores on the last assignment. Distribution of scores compared to the first assignment:

[Figure: distribution of scores on this assignment compared to the first assignment]

Median score is the same as the first assignment (8.5/10) with a little wider variance this time. If you have questions about your feedback, please feel free to reach out privately to me or one of the TAs.

Observational data assignment grades available

Check Canvas for your grades and comments on your observational data papers. Overall I am pleased with the results. We saw a substantial mix of topics and observational designs, demonstrating the wide range of computationally-enhanced approaches afforded by digital trace data. The distribution of grades was mixed:

[Figure: distribution of grades on the observational data assignment]

This is to be expected for the first assignment of the year. Ask any second-year student - it was an initial shock, but you will learn from this assignment and your writing will improve (as will your grades). If you have questions about your assignment feedback, please talk with me or one of the preceptors.

How to save README.md as a PDF

We want to be able to save the README.md file for the repository as a PDF. In this instance, the README.md is the syllabus for our Perspectives on Analysis class. My solution that worked pretty well is to do the following steps.

  1. Download the grip application. I used the Homebrew package manager to install grip.
  2. Navigate in terminal to the folder where your README.md file is. Type grip README.md.
  3. This will render the README.md file as HTML on a localhost URL that you can access via your browser. In your terminal, it should tell you where it has rendered the README.md file, for example: * Running on http://localhost:6419/.
  4. Paste the localhost URL into your browser.
  5. Use your browser's functionality to print the page.
  6. Select PDF as your printer.

This method worked pretty well (see syllabus.pdf) except for one strange overlap at the bottom of the first page and the top of the second page. Let me know if you find anything better.

Alternative to Amazon MTurk activity

If you tried to register a worker account on Amazon MTurk and were rejected, I just posted an alternative assignment you can complete for the collaboration homework. See the assignment instructions for more information.

Potential Legal Issues?

Hi, Dr. Soltoff,

While I am waiting to hear back from the Amazon review team, I looked at the tasks and saw that many of them provide some monetary reward for finishing the task. My intuition, along with the OIA office's guidelines, is telling me that it is illegal for international students on an F-1 visa to accept those tasks. By receiving any monetary compensation for their work from off-campus unauthorized sources, international students would be violating the specific government regulations that come attached to our F-1 status.

It is completely sensible to choose only tasks with zero compensation (since this is a homework assignment anyway), that's not the problem for me. I simply want to make sure that my understanding of the situation is correct. If so, perhaps you could let other international students know about this caveat. If it's not a problem at all, I would like to know the evidence supporting such a statement, namely, receiving financial compensation from employers at MTurk is legal for students with F-1 status.

Thanks!

Edit:
Here are some links to MTurk that seem to suggest Chinese citizens (from mainland China), at least, are not legally allowed to register to become a turker.
MTurk is now available for Requesters in 10 more countries
FAQs

Pull Requests pending

Dear Joshua,

It seems my pull request is pending.

May I know what is wrong?

Best,
Xinyu

Matplotlib plots and histograms

In your problem sets, and in general analysis, you are asked to plot results and data. I wanted to give you some code for doing this using Python's matplotlib plotting library. Here is an advanced piece of plotting code for plotting a line. The plot following the code is the figure produced by that code. I will explain its separate pieces below. Then I will give some discussion about histograms.

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator
import os
...
graph = True
...
if graph:
    '''
    --------------------------------------------------------------------
    cur_path    = string, path name of current directory
    output_fldr = string, folder in current path to save files
    output_dir  = string, total path of images folder
    output_path = string, path of file name of figure to be saved
    year_vec    = (lf_years,) vector, years from beg_year to
                  beg_year + lf_years
    individual  = integer in [0, numdraws-1], index of particular series
                  to plot
    --------------------------------------------------------------------
    '''
    # Create directory if images directory does not already exist
    cur_path = os.path.split(os.path.abspath(__file__))[0]
    output_fldr = 'images'
    output_dir = os.path.join(cur_path, output_fldr)
    if not os.access(output_dir, os.F_OK):
        os.makedirs(output_dir)

    # Plot one lifetime income series from set of simulations
    x_vals = year_vec
    individual = 500  # index of the simulated series to plot
    y_vals = inc_mat[:, individual]
    fig, ax = plt.subplots()
    plt.plot(x_vals, y_vals)
    # for the minor ticks, use no labels; default NullFormatter
    minorLocator = MultipleLocator(1)
    ax.xaxis.set_minor_locator(minorLocator)
    # Pass True positionally; newer matplotlib releases no longer accept b=
    plt.grid(True, which='major', color='0.65', linestyle='-')
    plt.title('One simulated lifetime income path', fontsize=20)
    plt.xlabel(r'Year $t$')
    plt.ylabel(r'Annual income (\$s)')
    # plt.xlim((xmin, xmax))
    # plt.ylim((ymin, ymax))
    # plt.legend(loc='upper left')
    output_path = os.path.join(output_dir, 'Fig_1a')
    plt.savefig(output_path)
    # plt.show()
    plt.close()

[Figure fig_1a: the line plot produced by the code above]

The first lines of this code import the Python packages we need to run this code (numpy, matplotlib, and os). The first thing I do when I write a script that creates a plot is I create a Boolean (True or False, 0 or 1) that says whether or not that section of code will create the plot. This is nice because it separates the analysis from the plotting. Further, you can use code folding in your text editor to minimize the plotting commands under the indented if statement.

Immediately following the if graph: statement, you'll see some code defining paths and using the os package. This code is some nice housekeeping for images. The variable cur_path is a string holding the path of the directory that contains the script you are running (which is not necessarily the directory you launched Python from). Those 5 lines of code find that directory, name an "images" folder to be placed in it, then check whether that folder already exists. If the folder does not already exist, the code creates it. This gives you a nice, intuitive place to save your images that does not clutter up the directory where your script resides.
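The same housekeeping can be written a little more compactly. This is a minimal sketch, not the code used in the problem set: ensure_images_dir is a hypothetical helper name, and a temporary directory stands in for the script's own folder because __file__ is not defined in a notebook or interactive session.

```python
import os
import tempfile


def ensure_images_dir(base_dir, folder_name="images"):
    """Return the path base_dir/folder_name, creating the folder if needed."""
    output_dir = os.path.join(base_dir, folder_name)
    # exist_ok=True makes this a no-op when the folder is already there
    os.makedirs(output_dir, exist_ok=True)
    return output_dir


# A temporary directory stands in for the directory containing the script
base = tempfile.mkdtemp()
first_call = ensure_images_dir(base)
second_call = ensure_images_dir(base)  # safe to call again

print(os.path.isdir(first_call))
```

In a script you would pass os.path.dirname(os.path.abspath(__file__)) as base_dir to get the same location as cur_path in the code above.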

The rest of the code is the plotting code. You could just write plt.plot(x_vals, y_vals), but you want your plot to be usable, labeled, and clean. The MultipleLocator class I imported places minor tick marks at every year along the x-axis, and plt.grid() draws gridlines at the major ticks. You also want to make sure that your plot has a title (telling you what it is) as well as clearly labeled axes. Not labeling your axes is one of the cardinal sins of rookie analysts. The philosophy is that a plot should be able to communicate its information independently.

The final four lines simply save the plot. Note the commented out plt.show() command. If you uncomment this, Python will produce the plot on your screen. However, a drawback to plt.show() is that it stops Python from running past that command until you close the plot window. Finally, you want to include the plt.close() command at the end of the plotting script, or else you might fill up your computer's memory with plots. The pyplot interface keeps every figure it creates in memory until that figure is explicitly closed. Many times, while working on a script, I have noticed my computer slow down or freeze for no apparent reason. Often, I have realized that the reason for the slowdown was that my script was creating plots that had not been closed.
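To see the save-then-close pattern in isolation, here is a minimal self-contained sketch. The income series is made-up placeholder data (not the problem set's income process), the years and file name are hypothetical, and the Agg backend is selected so the script runs without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to file only
import matplotlib.pyplot as plt
import numpy as np

# Made-up placeholder series, only so there is something to plot
rng = np.random.RandomState(524)
year_vec = np.arange(2019, 2059)
y_vals = 50000 + 1000 * np.arange(40) + rng.normal(0, 2000, 40)

output_dir = tempfile.mkdtemp()  # stands in for the images folder

fig, ax = plt.subplots()
ax.plot(year_vec, y_vals)
ax.set_title('One simulated lifetime income path', fontsize=20)
ax.set_xlabel(r'Year $t$')
ax.set_ylabel(r'Annual income (\$s)')

output_path = os.path.join(output_dir, 'example_income_path.png')
fig.savefig(output_path)
plt.close(fig)  # release the figure so it does not linger in memory

# pyplot tracks open figures; after closing, none should remain
print(plt.get_fignums())
```

Checking plt.get_fignums() at the end of a long script is a quick way to find figures that were created but never closed.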

Not only does Problem Set 1 ask you to make a line plot [part (a)], but it also asks you to make a histogram. Below is some code to make a histogram. Suppose that the data for which I want to create a histogram is stored in a numpy array of length num_draws called data. The following code will create that histogram.

fig, ax = plt.subplots()
hist_wgts = (1 / num_draws) * np.ones(num_draws)
num_bins = 50
plt.hist(data, num_bins, weights=hist_wgts)
plt.title('Histogram of first year ($t$=2018) income', fontsize=20)
plt.xlabel(r'Annual income (\$s)')
plt.ylabel(r'Percent of students')

[Figure fig_1b: the histogram produced by the code above]

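One thing to note is that I have to give the plt.hist() function some weights in order to get the height of each histogram bar to represent the share of the observations in that bin. With weights of 1 / num_draws the heights are fractions that sum to 1; multiply the weights by 100 if you want the axis in percent to match the "Percent of students" label.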
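The effect of the weights can be verified numerically. This is a minimal sketch using np.histogram(), which takes the same bins and weights arguments as plt.hist(); the data array is made-up placeholder data, not output from the problem set.

```python
import numpy as np

num_draws = 1000
rng = np.random.RandomState(0)
# Made-up placeholder data standing in for simulated first-year incomes
data = rng.normal(80000, 10000, num_draws)

num_bins = 50
hist_wgts = (1 / num_draws) * np.ones(num_draws)

# Heights as fractions of all observations
frac_heights, edges = np.histogram(data, bins=num_bins, weights=hist_wgts)

# Heights as percentages: scale the weights by 100
pct_heights, _ = np.histogram(data, bins=num_bins, weights=100 * hist_wgts)

print(frac_heights.sum())
print(pct_heights.sum())
```

Without the weights argument the heights would be raw counts summing to num_draws.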

Add markdown instructions

Write a brief tutorial for creating Markdown documents

  • How it works
  • Formatting guide
  • Recommended Markdown editors

weather data

@rickecon

Hi professor,

When I downloaded the weather data "Daily Summaries" from the website for a given city and time, it gave me data from multiple stations within this city rather than a single overall series for the city. In this case, should I just randomly pick one station? Or should we calculate the average temperature across different stations on the same day?

Thanks and happy thanksgiving!

yinxian

Asking Questions, Questions

For the Asking Questions assignment:

  • To what extent would you like the project developed (i.e., general topics for questions--> draft survey?)
  • Are the rules the same as the previous assignment re: reasonable assumptions of funding, etc.?
  • Would the specific tools (e.g., ODK, Google Forms) be worth including?

Experimental paper evaluations on Canvas

[Figure: distribution of grades on the experimental paper]

Median grade increases to a 9/10. Distribution is slightly more skewed, mainly by proposals which don't actually include a digital experiment. If you have questions about your evaluation, please contact one of the TAs or myself in private. I had several students reach out to me last week and I think I was able to help clarify the original evaluations.

Clarification on simulations assignment submission format

I just want to clarify how you should submit your simulations assignment. The main objective is to submit it in a reproducible format. This can be any of the following:

  • R Markdown document (.Rmd knitted with output: md_document or output: github_document in the front matter)
  • Jupyter Notebook (.ipynb)
  • Python (.py) or R (.R) script which saves the graphs to the local directory AND a Markdown document .md which embeds the graphs and your written responses to questions

Generally for problem sets such as this I recommend a notebook format such as R Markdown or Jupyter Notebooks, as it embeds the code, output, and written analysis in a single document. This makes it easy to immediately read through and see how the code generates each of the graphs and statistics. That said, I know many students may not have used a notebook format before, instead only writing scripts in Python or R. Use whatever format seems most comfortable to you, as long as your final submission includes easy access to your code, graphs, and written answers from within GitHub (i.e. we should be able to run your code locally to view your responses, but this should not be required - all the pertinent information should be viewable directly in the repo).

Adding a Number Column to the Course Schedule?

Would it be possible to add the class number to the course schedule? Right now the course schedule has the date, and the reading schedule has the class number, but it is a little bit tricky to see how they relate to each other.

I think adding a class number column would make it much easier to see which readings correspond to each class.

about the journal article

How recent does it have to be? I am looking at a 2004 article, would that count as recent?

Thanks!

What to submit for problem sets

As described in the syllabus for this class, you will submit 4 problem sets during the last half of this term. These problem sets will primarily involve writing code. I want you to submit your assignments in a particular way.

  1. Your assignment submission will involve two parts: (a) your Python code, and (b) a PDF document that you compile using LaTeX that has your answers.
  2. Your Python code should use Python 3.5 or higher. I recommend that you download the Anaconda distribution of Python from Continuum Analytics.
  3. In your code, you should label your sections with the particular part of the Problem Set that that section of code is solving. Below is an example.
'''
------------------------------------------------------------------------
Exercise 1a: Simulate the data
------------------------------------------------------------------------
plot_1a     = Boolean, =True if make a plot of one series of the
              simulated income data
norm_errors = (lf_years, num_draws) matrix, normally distributed errors
              with mean 0 and standard deviation sigma
------------------------------------------------------------------------
'''
plot_1a = True
norm_errors = np.random.normal(0, sigma, (lf_years, num_draws))
  4. You should use a print command for any answers that your code produces in response to things the Problem Set asks for. In this way, we can just run your script and see what answers it produces.
print('1b. Percent of students getting more than $100k in first period: ',
      inc0_gt100k_pct * 100, '%')

The code above produces the following output when I run this script.

1b. Percent of students getting more than $100k in first period: x.xx %
  5. With regard to plots that the Problem Set asks for, follow the template in Issue #47 about having your script save those plots to an images folder. Further, make sure that your plot is included in the PDF document that you submit.
  6. For your PDF document with your answers that you produce using LaTeX, I have included a LaTeX tutorial document as well as a LaTeX problem set template in this repository. Note that when using the LaTeX problem set template, you must have the image document (pencildrawing.png) in the same folder as the template in order to compile the PDF. If you are not using any images, you can just delete or comment out that section of the template.

In summary, we will grade your assignments based on your code that you submit, what output it gives when we execute it, and your accompanying PDF with your answers.

Debt Question on PSet

Hi,

Nora and I were having difficulty in part 3. We are using R and trying to create a for loop in order to have it deduct from the debt.

The best that we have been able to figure out is the following, but we have two issues:

  1. We are stuck in the last step of figuring out how to call the debt from the previous year (since this is a dataframe instead of a vector, we aren't sure how to do that).

  2. We keep getting the following error and are not sure what the problem with our for loop is: Error in for (. in year) seq_along(id) : 4 arguments passed to 'for' which requires 3

Code:

simulatedIncome %>%
  mutate(debt = 95000) %>%
  for(year in seq_along(id)){
    if(year == 2019) {
      debt <- (95000 - income*.1)
    } else{
      debt <- simulatedIncome$debt[year-1] - income*.1
    }
  }

Thanks!
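
One possible reading of the error: the pipe hands the data frame to `for` as an extra argument, which would explain "4 arguments passed to 'for' which requires 3", so the loop likely needs to sit outside the pipe. The "previous year's debt" step is sketched below in Python rather than R, purely as an illustration. The starting debt of 95000 and the 10% of income come from the post; the function name, the example incomes, and the floor at zero are assumptions.

```python
def remaining_debt(incomes, initial_debt=95000.0, repay_rate=0.10):
    """Return the debt left at the end of each year for one income path.

    incomes: yearly incomes in chronological order (first entry = first year).
    """
    balances = []
    debt = initial_debt
    for income in incomes:
        # Each year's balance depends on the previous year's balance,
        # so carry `debt` forward instead of indexing back into a table.
        # Assumption: the balance never goes below zero.
        debt = max(debt - repay_rate * income, 0.0)
        balances.append(debt)
    return balances


if __name__ == "__main__":
    example_incomes = [80000.0, 90000.0, 100000.0]  # hypothetical values
    for year, balance in zip(range(2019, 2022), remaining_debt(example_incomes)):
        print(year, round(balance, 2))
```

The same pattern carries over to R: keep a running `debt` value (or a vector indexed by position rather than by calendar year) and update it inside an ordinary `for` loop.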

International students completing the collaboration assignment

It was brought to my attention by @ruixue-li and a couple of other students that Amazon MTurk does not allow individuals of certain nationalities to register as workers. Additionally, some students expressed concerns that completing the assignment would be considered employment under US law and put their student visas at risk. I never intended for that to occur, and I understand if you do not wish to complete the MTurk portion of the homework assignment.

I am not familiar with other micro-task job market sites similar to Amazon MTurk, so I cannot assign you to work on a different site. If you are aware of one that you can legally participate on, feel free to use that site instead of MTurk - again, completing an hour's worth of micro-task assignments. But I am not asking you to go out and spend time hunting down such a site. As I stated in class, there is an alternative assignment available for students to complete if you cannot complete the MTurk assignment - the InfluenzaNet evaluation. Given all your experience gained this term in reading and assessing research articles, you should be able to complete the alternative assignment in a similar time frame (approximately one hour).

R Markdown as Submission?

I just wanted to check that it would be okay to submit my assignment as an HTML file rendered from an R Markdown document, with the R script and the photos of the MTurk pages included.

Thanks!

Article on polling techniques and sources of error

I'm originally from the state of Virginia, which is holding an election this year for governor. I just saw this article in my Facebook newsfeed (aka the algorithmically-defined echo chamber) examining how public polling results differ wildly depending on the method of identifying the frame population (random-digit dialing vs. contacting only registered/active voters). I thought it an interesting discussion given our unit on the total survey error framework a couple weeks ago.

Re-designing a study to be more computational?

Is it OK to re-design a lightly computational study to be heavily computational?

For example, a study that uses Google search trends as an element of its dataset (lightly computational), redesigned to use more sophisticated computational techniques.

Finding Error Term

Hi,

How can I take the log of the error term if the error term is sometimes negative? Should I be taking the absolute value of the error term? Or the L2 norm?
