Grade and feedback

Hi all, really great job on this! I'm really sorry you had so many git/GitHub
issues, but at least you were able to find a way forward. Looking at this more
closely, I think the issue was a file committed to the repo early that should
never have been tracked. Usually RStudio updates the .gitignore automatically
to account for this, but somehow that did not happen in this case. In fact, in
the new repo you created, it is there as it should be. Anyway... below is your
grade for each of the components, followed by some additional feedback.

Grading

No code is used repetitively (no more than twice) [10/10 points]
More than one variant of purrr::map is used [10/10 points]
At least one {purrr} function outside the basic map family (walk_*,
reduce, modify_*, etc.) [10/10 points] (this one was given freely)
At least one instance of parallel iteration (e.g., map2_*, pmap_*) [10/10
points]
At least one use case of tidyr::nest %>% mutate() [10/10 points]
At least two custom functions [20/20 points; 10 points each]
Code is fully reproducible and housed on GitHub [10/10 points]
No obvious errors in chosen output format [10/10 points]
Deployed on the web and shareable through a link [10/10 points]

Feedback

In your repo, you can actually put the link to the blog right at the top with
your description, and I would encourage you to do that because most people who
work with GitHub a lot will expect the link to appear there.
I really like your welcome page! That's a great way to get the audience set up
for the full tutorial.
You might want to consider a name other than "My Blog" for your blog.
You also might want to consider updating the "About" section. In fact, that
might be a great place to put your first blog post. Partly this would work well
because then it won't get indexed with the other posts and become hard to find.
It would always be in the same place.
I'd recommend running the spell checker on all your posts. There were a few
typos throughout.
I really like how you used the same dataset throughout and the tutorials
built on top of each other. With that said, I think the link between the
tutorials could be a bit stronger. From my perspective, it reads as though
you all chatted with each other about what you would cover, but didn't really
take the time to read each other's posts in detail.

Tutorial 1

From my perspective (which may differ from yours and that's okay), I think
it's a little more clear to show a function doing some operations. So for
mean_function I probably would have put the code to "manually" calculate the
mean (I wouldn't worry about missing data though). Otherwise the reader may
think, "why not just use mean?", without realizing that mean is itself a
function.
Remember that you can surround code with backticks, which I would recommend
rather than quotes (e.g., `<-` rather than "<-"). Your conventions like this
would be good to put in the welcome post too.
Note that most often the person you're making your code more readable for is
yourself at a later date.
You might also want to include informal testing (e.g., comparing
mean_function with mean).
The comments in your code chunk under the custom functions section run off
the screen and scrolling unfortunately doesn't work. As a general rule, I'd
try to keep your code within 80 characters as a maximum width. You can set up
rulers in RStudio to help you with this. That will also ensure that, if you
need to render to PDF, everything will be there and not run off the page (as
it did here even in HTML).
Is there a reason you chose to use base::aggregate rather than something
like group_by(actor) %>% summarize(mean_score = mean(imdb_score))? Generally,
I think it's best to keep everything within a single style (tidyverse or base).
Similarly, I would not store things in row.names because specific
operations (particularly in the tidyverse) will strip the row names and you'll
lose data. I would just create a new column with the data instead.
Minor style note, when closing out your function, I would make sure your
final bracket is left-aligned.
Rather than using return to throw the error, I'd recommend using stop
with the same message because it works with things like traceback that can
help users with debugging (if they're advanced users).
Really nice examples with if statements. The only thing I'd recommend is
thinking about the function names a bit (i.e., Input_if is not terrifically
descriptive).
Really great job on the descriptives function!
Ha! I love the wizard!
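
To make a few of the Tutorial 1 suggestions above concrete, here is a minimal sketch of how mean_function might compute the mean "manually", signal bad input with stop, and be checked informally against mean. The function body, the error message, and the scores vector are my own assumptions for illustration, not your code.

```r
# Illustrative version of mean_function; the body and message are assumptions.
mean_function <- function(x) {
  if (!is.numeric(x)) {
    # stop() signals a real error, so traceback() can show where it came from
    stop("`x` must be a numeric vector.")
  }
  sum(x) / length(x)  # the "manual" mean; missing data deliberately ignored
}

# Informal test: the custom function should agree with base::mean
scores <- c(7.9, 6.8, 8.5, 7.1)  # made-up values
matches_base <- isTRUE(all.equal(mean_function(scores), mean(scores)))
matches_base

# Non-numeric input now fails loudly instead of returning a message
error_text <- tryCatch(
  mean_function("not a number"),
  error = function(e) conditionMessage(e)
)
error_text
```

Seeing sum(x) / length(x) inside the function makes it clearer to a reader that mean is itself just a function doing this kind of work.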

Tutorial 2

I really like the use of slice(head(df_genre), 1:3). To make it more clear
what nest is doing, you might consider showing the data frame before nesting,
and then also showing the result of slice(head(df_genre$data), 1:3) after
nesting.
Note that map can work on any generic vector, and does not necessarily have
to be a list.
Rather than

save_map_c <- map(df_country$data, ~length(.x[[1]]))

I think it's a little more clear to use

save_map_c <- map(df_country$data, ~nrow(.x))

But note in this case it could even be a little more compact (which might be
less clear 🤷‍♂️) as map(df_country$data, nrow).
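
As a quick check that these variants really are interchangeable, here is a small sketch. The country_data list is a made-up stand-in for df_country$data, and map_int is an extra variant I've added to show a typed result.

```r
library(purrr)

# Toy stand-in for df_country$data: a list of data frames of different sizes
country_data <- list(
  data.frame(title = c("a", "b", "c"), imdb_score = c(7.1, 6.4, 8.0)),
  data.frame(title = c("d", "e"), imdb_score = c(5.9, 7.7))
)

via_length  <- map(country_data, ~length(.x[[1]]))  # length of the first column
via_formula <- map(country_data, ~nrow(.x))         # row count, formula style
via_name    <- map(country_data, nrow)              # row count, bare function

# All three return the same list of counts
identical(via_length, via_formula)
identical(via_formula, via_name)

# map_int returns an integer vector rather than a list
counts <- map_int(country_data, nrow)
counts
```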

Because you're already using list columns here, it might have made sense to
store the length or nrow result as a new column in the df.
Rather than cSplit(data_new, "genres", "|"), it might have been better to
use data_new %>% separate(genres, into = <new column names>, sep = "\\|")
(sep is a regular expression, so the pipe has to be escaped), although I
recognize this would create a fair number of columns with missing data, but
you could then just select the first column. This basically relates to trying
to keep everything in the same style and using one set of packages. Of course,
I'm not familiar with cSplit so maybe it's worth it here; it just seems like
an easy area to introduce a source of confusion.
Love the meme!
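
Two of the Tutorial 2 suggestions above can be sketched together: splitting genres with separate, and storing the row count as a new column next to the list column. The rows in data_new below are invented, and main_genre, by_genre, and n_movies are hypothetical names; only the genres column and the "|" delimiter come from your code.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up rows standing in for data_new
data_new <- tibble(
  title  = c("a", "b", "c", "d"),
  genres = c("Action|Adventure", "Comedy", "Action|Thriller|Crime",
             "Comedy|Romance")
)

# `sep` is a regular expression, so the literal pipe is escaped.
# `extra = "drop"` discards everything after the first piece without a warning.
first_genre <- data_new %>%
  separate(genres, into = "main_genre", sep = "\\|", extra = "drop")

# Nest by genre and keep the row count next to the data it describes
by_genre <- first_genre %>%
  group_by(main_genre) %>%
  nest() %>%
  mutate(n_movies = map_int(data, nrow))

by_genre
```

Asking separate for a single column with extra = "drop" is one way to avoid the many mostly empty columns mentioned above.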

Tutorial 3

Did we learn how to extract information from models in the previous tutorial?
I saw fitting of the model but not extracting info.
Good review - maybe I missed the extraction part, but it's good to have it
there anyway.
Great non-example, showing that you can't feed lists of models to anova.
Great job pulling the "Action" genre out for comparison.
When you create things like model_comparison_df, I'd recommend showing a
little preview of what the resulting structure looks like. I see you do that
later, but I would do it for the intermediary one too.
I think I'd recommend doing this a bit differently. We've already seen list
columns in the previous tutorial, so I would build on that and add the models
as list columns. I would do something like

by_main_genre %>%
  mutate(model1 = map(data, ~lm(profit ~ movie_facebook_likes, .x)),
         model2 = map(data, ~lm(profit ~ movie_facebook_likes + imdb_score,
                                .x)))

to fit the models, and then you could just add another column for the anova,
etc. This would allow you to avoid all the transformations needed to create
model_comparison_df.
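
Here is a rough, self-contained sketch of that idea, including the extra anova column. The movies data frame is simulated (random values with the column names used in the snippet above), and the comparison column name is my own choice, so treat this as one possible shape rather than the required solution.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Simulated stand-in for the movie data; the values carry no meaning
set.seed(123)
movies <- tibble(
  main_genre           = rep(c("Action", "Comedy"), each = 30),
  movie_facebook_likes = runif(60, 0, 1e5),
  imdb_score           = runif(60, 1, 10),
  profit               = rnorm(60)
)

by_main_genre <- movies %>%
  group_by(main_genre) %>%
  nest()

models <- by_main_genre %>%
  mutate(
    model1 = map(data, ~lm(profit ~ movie_facebook_likes, data = .x)),
    model2 = map(data, ~lm(profit ~ movie_facebook_likes + imdb_score,
                           data = .x)),
    # anova() compares one pair of models at a time, so iterate in parallel
    comparison = map2(model1, model2, anova)
  )

models
```

Everything stays in one data frame, with one row per genre, so each model and its comparison sit next to the data they came from.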

I felt like the description of list("Pr(>F)", 2) could have been a little
stronger. Basically when you extract "Pr(>F)" it returns a vector of length
two, so you want to pull just the second element from that vector.
Where's the meme!? 🙃
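
To illustrate the list("Pr(>F)", 2) point above, here is a small sketch of what that extractor does. It uses the built-in mtcars data as a stand-in rather than the movie data, and the object names are my own.

```r
library(purrr)

# mtcars is a stand-in dataset, not the movie data from the tutorial
small <- lm(mpg ~ wt, data = mtcars)
large <- lm(mpg ~ wt + hp, data = mtcars)
comparison <- anova(small, large)

# The column has one entry per model; the first is NA because there is
# nothing to compare the first model against
p_column <- comparison[["Pr(>F)"]]
length(p_column)

# list("Pr(>F)", 2) means: take the "Pr(>F)" column, then its second element
p_value <- map_dbl(list(comparison), list("Pr(>F)", 2))
p_value
```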

Tutorial 4

I would say "We can use facet_wrap" rather than "We can use facet-wrap".
You state that we're going to use pmap. I would take a sentence or two to
describe (conceptually) what it is, and why it is needed here. (and now I see
you did, but I think it would have helped to add it earlier)
I mentioned this earlier in a different tutorial, but make sure to wrap code
in backticks to distinguish it (e.g., `..1`).
Where did coefs_plot come from?
I don't see the coefficient for Facebook for Model 1 in the plots...
shouldn't that be there?
In the example you've given, you could have used map2 rather than pmap
because you're only looping through two things, the data and the genre.
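
As a small sketch of that last point, the two calls below do the same thing when there are only two inputs. The genre labels and data frames are invented for illustration and are not taken from your tutorial.

```r
library(purrr)

# Made-up inputs standing in for the genre labels and the nested data
genre <- c("Action", "Comedy")
genre_data <- list(
  data.frame(profit = c(1.2, 3.4, 2.2)),
  data.frame(profit = c(0.8, 1.9))
)

# pmap takes a list of inputs and refers to them by position (..1, ..2)
with_pmap <- pmap_chr(
  list(genre_data, genre),
  ~paste0(..2, ": ", nrow(..1), " movies")
)

# map2 takes exactly two inputs, available as .x and .y
with_map2 <- map2_chr(
  genre_data, genre,
  ~paste0(.y, ": ", nrow(.x), " movies")
)

identical(with_pmap, with_map2)
with_map2
```

pmap becomes necessary once a third input (say, a color or a file name per genre) has to be iterated over as well.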
