
spring2019_rfinal's Introduction

spring2019_RFinal

spring2019_rfinal's People

Contributors

15jennlee15, kdenning, tamaraniella, karlenaochoa


spring2019_rfinal's Issues

Grade and feedback

Hi all, really great job on this! I'm really sorry you had so many git/GitHub
issues, but at least you were able to find a way forward. Looking at it more
closely, I think the problem was a file that was committed to the repo early on
and should never have been tracked. RStudio usually updates the .gitignore
automatically to account for this, but for some reason that did not happen
here. In fact, in the new repo you created, the entry is there as it should be.
Anyway, below is your grade for each of the components, followed by some
additional feedback.

Grading

  • No code is used repetitively (no more than twice) [10/10 points]

  • More than one variant of purrr::map is used [10/10 points]

  • At least one {purrr} function outside the basic map family (walk_*,
    reduce, modify_*, etc.) [10/10 points] (this one was given freely)

  • At least one instance of parallel iteration (e.g., map2_*, pmap_*) [10/10
    points]

  • At least one use case of tidyr::nest %>% mutate() [10/10 points]

  • At least two custom functions [20/20 points; 10 points each]

  • Code is fully reproducible and housed on GitHub [10/10 points]

  • No obvious errors in chosen output format [10/10 points]

  • Deployed on the web and shareable through a link [10/10 points]

Feedback

  • In your repo, you can actually put the link to the blog right at the top with
    your description, and I would encourage you to do that because most people who
    work with GitHub a lot will expect the link to appear there.

  • I really like your welcome page! That's a great way to get the audience set
    up for the full tutorial.

  • You might want to consider a name other than "My Blog" for your blog.

  • You also might want to consider updating the "About" section. In fact, that
    might be a great place to put your first blog post. This would work well
    partly because the post wouldn't get buried in the list of posts, where it
    can be hard to find; it would always be in the same place.

  • I'd recommend running the spell checker on all your posts. There were a few
    typos throughout.

  • I really like how you used the same dataset throughout and how the tutorials
    built on top of each other. With that said, I think the link between the
    tutorials could be a bit stronger. From my perspective, it sort of reads
    like you all chatted with each other about what you would cover, but didn't
    really take the time to read each other's posts in detail.

Tutorial 1

  • From my perspective (which may differ from yours, and that's okay), I think
    it's a little clearer to show a function doing some operations. So for
    mean_function, I probably would have written the code to calculate the mean
    "manually" (I wouldn't worry about missing data, though). Otherwise the
    reader may think "why not just use mean?" without realizing that mean is
    itself a function.

  • Remember that you can surround code with backticks, which I would recommend
    rather than quotes (e.g., <- rather than "<-"). Conventions like this would
    be good to explain in the welcome post too.

  • Note that most often the person you're making your code more readable for is
    yourself at a later date.

  • You might also want to include informal testing (e.g., comparing
    mean_function with mean).

  • The comments in your code chunk under the custom functions section run off
    the screen, and scrolling unfortunately doesn't work. As a general rule, I'd
    try to keep your code to a maximum width of 80 characters. You can set up
    rulers in RStudio to help you with this. That will also ensure that, if you
    need to render to PDF, the code won't run off the page (as it did here, even
    in HTML).

  • Is there a reason you chose to use base::aggregate rather than something
    like group_by(actor) %>% summarize(mean_score = mean(imdb_score))? Generally,
    I think it's best to keep everything within a single style (tidyverse or base).

  • Similarly, I would not store things in row.names, because specific
    operations (particularly in the tidyverse) will strip the row names and
    you'll lose data. I would just create a new column with the data instead.

  • Minor style note: when closing out your function, I would make sure your
    final brace is left-aligned.

  • Rather than using return to pass back the error message, I'd recommend
    using stop with the same message. stop raises an actual error, so it works
    with tools like traceback that can help users with debugging (if they're
    advanced users).

  • Really nice examples with if statements. The only thing I'd recommend is
    thinking about the function names a bit (i.e., Input_if is not terrifically
    descriptive).

  • Really great job on the descriptives function!

  • Ha! I love the wizard!
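To make the mean_function, stop, and informal-testing suggestions concrete, here is a minimal sketch. The function body is hypothetical (the tutorial's own version isn't reproduced here): it computes the mean "manually", signals bad input with stop() rather than return, and is checked informally against base mean().

```r
# Hypothetical reimplementation of the tutorial's mean_function:
# it computes the mean "manually" instead of calling mean().
mean_function <- function(x) {
  # stop() raises a real error, so traceback() can show where it happened
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  if (length(x) == 0) {
    stop("x must contain at least one value")
  }
  sum(x) / length(x)
}

# Informal testing: compare against base::mean() on a few inputs
test_inputs <- list(c(1, 2, 3, 4), c(-2.5, 10, 0.5), 1:100)
for (x in test_inputs) {
  stopifnot(isTRUE(all.equal(mean_function(x), mean(x))))
}

# Bad input now produces an error instead of a returned string;
# tryCatch() captures the error message so we can inspect it
res <- tryCatch(mean_function("a"), error = function(e) conditionMessage(e))
print(res)
```

Because the check loop uses stopifnot(), any mismatch with mean() halts the script immediately, which is usually enough for informal testing in a tutorial.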
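For the aggregate and row-name points, a short comparison might look like this. It uses the built-in mtcars data as a stand-in for the movie data (an assumption for illustration: cyl plays the role of actor and mpg the role of imdb_score). mtcars also happens to keep its car names in row names, which shows how easily they can be lost.

```r
library(dplyr)
library(tibble)

# Base R style: aggregate() with a formula
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Tidyverse style: the same summary with group_by() + summarize()
tidy_means <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Row names are easy to lose: as_tibble() drops them by default...
names_lost <- as_tibble(mtcars)

# ...so move them into an ordinary column before doing anything else
names_kept <- mtcars %>%
  rownames_to_column(var = "model") %>%
  as_tibble()
```

Both summaries contain the same means; the tidyverse version simply stays in one style with the rest of the pipeline.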

Tutorial 2

  • I really like the use of slice(head(df_genre), 1:3). To make it clearer
    what nest is doing, you might consider showing the data frame before
    nesting, and then also showing the first few elements of df_genre$data
    after nesting (e.g., with head(df_genre$data, 3), since slice expects a data
    frame and df_genre$data is a list).

  • Note that map can work on any generic vector, and does not necessarily have
    to be a list.

  • Rather than

save_map_c <- map(df_country$data, ~length(.x[[1]]))

I think it's a little clearer to use

save_map_c <- map(df_country$data, ~nrow(.x))

But note in this case it could even be a little more compact (which might be
less clear 🤷‍♂️) as map(df_country$data, nrow).

  • Because you're already using list columns here, it might have made sense to
    store the length or nrow result as a new column in the df.

  • Rather than cSplit(data_new, "genres", "|"), it might have been better to
    use data_new %>% separate("genres", into = ..., sep = "\\|"), where into
    names the new columns. Note that sep is treated as a regular expression, so
    the pipe needs to be escaped. I recognize this would create a fair number of
    columns with missing data, but you could then just select the first column.
    This basically relates to trying to keep everything in the same style and
    using one set of packages. Of course, I'm not familiar with cSplit, so maybe
    it's worth it here; it just seems like an easy area to introduce a source of
    confusion.

  • Love the meme!
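Here is a sketch of the before/after view of nest(), with each group's row count stored as a new column rather than in a separate object. mtcars nested by cyl stands in for df_genre and df_country (an assumption for illustration), and the last line shows map() iterating over an atomic vector rather than a list.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Before nesting: one row per car
head(mtcars, 3)

# After nesting: one row per cylinder count, with each group's rows
# stored as a data frame in the list column `data`
by_cyl <- mtcars %>%
  nest(data = -cyl)

# Peek inside the list column (head() rather than slice(), since
# by_cyl$data is a list, not a data frame)
head(by_cyl$data, 2)

# Store the row count of each nested data frame as a new column;
# map_int(data, nrow) is the compact form of map_int(data, ~ nrow(.x))
by_cyl <- by_cyl %>%
  mutate(n_cars = map_int(data, nrow))

# map() and its variants also work on atomic vectors, not just lists
squares <- map_dbl(1:3, ~ .x^2)
```

Keeping n_cars next to data means the counts travel with the groups they describe through any later filtering or sorting.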
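If the genres column holds pipe-delimited strings such as "Action|Adventure", a separate() call along these lines would keep only the first genre. The three rows below are made up purely for illustration, and extra = "drop" is one way to skip the intermediate columns of missing data entirely.

```r
library(dplyr)
library(tidyr)

# Made-up rows in a pipe-delimited format like the one in `genres`
data_new <- tibble(
  title  = c("Movie A", "Movie B", "Movie C"),
  genres = c("Action|Adventure|Fantasy", "Drama", "Comedy|Romance")
)

# sep is a regular expression, so the pipe must be escaped as "\\|".
# into = "main_genre" with extra = "drop" keeps only the first piece
# and silently discards the rest; remove = FALSE keeps `genres`.
main_genres <- data_new %>%
  separate(genres, into = "main_genre", sep = "\\|",
           extra = "drop", remove = FALSE)
```

Newer versions of tidyr also offer separate_wider_delim() for this task, but separate() still works and matches the style of the rest of the tutorial.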

Tutorial 3

  • Did we learn how to extract information from models in the previous tutorial?
    I saw fitting of the model but not extracting info.

  • Good review - maybe I missed the extraction part, but it's good to have it
    there anyway.

  • Great non-example, showing that you can't feed lists of models to anova

  • Great job pulling the "Action" genre out for comparison

  • When you create things like model_comparison_df, I'd recommend showing a
    little preview of what the resulting structure looks like. I see you do that
    later, but I would do it for the intermediary one too.

  • I think I'd recommend doing this a bit differently. We've already seen list
    columns in the previous tutorial, so I would build on that and add the
    models as list columns. I would do something like

by_main_genre %>%
  mutate(model1 = map(data, ~lm(profit ~ movie_facebook_likes, .x)),
         model2 = map(data, ~lm(profit ~ movie_facebook_likes + imdb_score,
                                .x)))

to fit the models, and then you could just add another column for the anova,
etc. This would allow you to avoid all the transformations needed to create
model_comparison_df.

  • I felt like the description of list("Pr(>F)", 2) could have been a little
    stronger. Basically, when you extract "Pr(>F)" it returns a vector of length
    two (the first element is NA, because the first model has nothing to be
    compared against), so you want to pull just the second element from that
    vector.

  • Where's the meme!? 🙃
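Here is a rough sketch of the list-column approach for Tutorial 3. As an assumption for illustration, mtcars nested by cyl stands in for by_main_genre, and mpg, wt, and hp stand in for profit, movie_facebook_likes, and imdb_score. The models, the anova comparison, and the extracted p-value all live as columns, so no separate model_comparison_df has to be assembled by hand.

```r
library(dplyr)
library(tidyr)
library(purrr)

model_comparison <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    # one fitted model per group, stored in list columns
    model1 = map(data, ~ lm(mpg ~ wt, data = .x)),
    model2 = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
    # anova() on the two nested models, one comparison per row
    comparison = map2(model1, model2, anova),
    # "Pr(>F)" is a length-two vector (NA for the first model), so
    # list("Pr(>F)", 2) plucks the p-value of the model comparison
    p_value = map_dbl(comparison, list("Pr(>F)", 2))
  )

model_comparison %>% select(cyl, p_value)
```

Because mutate() can refer to columns created earlier in the same call, the comparison and p_value columns can be built directly from model1 and model2.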

Tutorial 4

  • I would say "We can use facet_wrap" rather than "We can use facet-wrap".

  • You state that we're going to use pmap. I would take a sentence or two to
    describe (conceptually) what it is, and why it is needed here. (and now I see
    you did, but I think it would have helped to add it earlier)

  • I mentioned this earlier in a different tutorial, but make sure to wrap code
    in backticks to distinguish it (e.g., ..1).

  • Where did coefs_plot come from?

  • I don't see the coefficient for Facebook for Model 1 in the plots...
    shouldn't that be there?

  • In the example you've given, you could have used map2 rather than pmap
    because you're only looping through two things, the data and the genre.
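For the pmap versus map2 point, here is a small sketch. describe_group() is a hypothetical stand-in for the tutorial's plotting function, and mtcars nested by cyl stands in for the data and genre columns. With only two inputs, pmap() with ..1/..2 and map2() with .x/.y produce the same result.

```r
library(dplyr)
library(tidyr)
library(purrr)

by_cyl <- mtcars %>% nest(data = -cyl)

# Hypothetical helper standing in for the tutorial's plotting function:
# it takes one nested data frame and its group label
describe_group <- function(data, cyl) {
  paste0(cyl, " cylinders: ", nrow(data), " cars")
}

# pmap() iterates over a list of inputs in parallel; ..1 and ..2 are
# the first and second inputs for each iteration
with_pmap <- pmap_chr(list(by_cyl$data, by_cyl$cyl),
                      ~ describe_group(..1, ..2))

# With only two inputs, map2() does the same job using .x and .y
with_map2 <- map2_chr(by_cyl$data, by_cyl$cyl,
                      ~ describe_group(.x, .y))
```

pmap() earns its keep once there are three or more inputs (for example, data, genre, and a plot title); for two, map2() is a little easier to read.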
