spring2019_RFinal
15jennlee15 / spring2019_rfinal
License: MIT License
Hi all, really great job on this! I'm really sorry you had so many git/GitHub
issues but at least you were able to find a way forward. Upon looking at this
closer I do think it's an issue of a file being committed to the repo early
that should have never really been tracked. Usually RStudio updates the
`.gitignore` automatically for you to account for this, but somehow this did not
happen in this case. In fact, in the new repo you created, it is there as it
should be. Anyway... below is your grade according to each of the components,
and then I've followed up with some additional feedback.
No code is used repetitively (no more than twice) [10/10 points]
More than one variant of `purrr::map` is used [10/10 points]
At least one {purrr} function outside the basic `map` family (`walk_*`,
`reduce`, `modify_*`, etc.) [10/10 points] (this one was given freely)
At least one instance of parallel iteration (e.g., `map2_*`, `pmap_*`) [10/10
points]
At least one use case of `tidyr::nest %>% mutate()` [10/10 points]
At least two custom functions [20/20 points; 10 points each]
Code is fully reproducible and housed on GitHub [10/10 points]
No obvious errors in chosen output format [10/10 points]
Deployed on the web and shareable through a link [10/10 points]
In your repo, you can actually put the link to the blog right at the top with
your description, and I would encourage you to do that because most people who
work with GitHub a lot will expect the link to appear there.
I really like your welcome page! That's a great way to get the audience set up
for the full tutorial.
You might want to consider a name other than "My Blog" for your blog.
You also might want to consider updating the "About" section. In fact, that
might be a great place to put your first blog post. Partly this would work well
because then it won't get indexed and be hard to find. It would always be in the
same place.
I'd recommend running the spell checker on all your posts. There were a few
typos throughout.
I really like how you used the same dataset throughout and the tutorials
built on top of each other. With that said, I think the link between the
tutorials could be a bit stronger. It sort of reads from my perspective like
you all chatted with each other about what you would cover, but didn't really
take the time to read each other's posts in detail.
From my perspective (which may differ from yours and that's okay), I think
it's a little more clear to show a function doing some operations. So for
`mean_function` I probably would have put the code to "manually" calculate the
mean (I wouldn't worry about missing data though). Otherwise the reader may
think, "why not just use `mean`?", without realizing that `mean` is itself a
function.
Remember that you can surround code with back ticks, which I would recommend
rather than quotes (e.g., `<-` rather than "<-"). Your conventions like this
would be good to put in the welcome post too.
Note that most often the person you're making your code more readable for is
yourself at a later date.
You might also want to include informal testing (e.g., comparing
`mean_function` with `mean`).
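
As a rough sketch of both points (computing the mean "manually" and then checking it informally), something like the following would do. The body of `mean_function` here is my guess at a hand-rolled version, not your actual code, and the scores are made up.

```r
# Hypothetical stand-in for the tutorial's mean_function: the mean is
# computed "by hand" so the reader can see what the function does.
mean_function <- function(x) {
  sum(x) / length(x)
}

# Made-up scores, for illustration only
scores <- c(7.1, 6.4, 8.0, 5.5)

# Informal test: the custom function should agree with base::mean
all.equal(mean_function(scores), mean(scores))
```

`all.equal` is used rather than `==` so that tiny floating point differences don't cause a spurious mismatch.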
The comments in your code chunk under the custom functions section run off
the screen and scrolling unfortunately doesn't work. As a general rule, I'd
try to keep your code within 80 characters as a maximum width. You can set up
rulers in RStudio to help you with this. That will also ensure if you need to
render to PDF that it will all be there and not run off the page (as it did
here even in HTML).
Is there a reason you chose to use `base::aggregate` rather than something
like `group_by(actor) %>% summarize(mean_score = mean(imdb_score))`? Generally,
I think it's best to keep everything within a single style (tidyverse or base).
Similarly, I would not store things in `row.names` because specific
operations (particularly in the tidyverse) will strip the rownames and you'll
lose data. I would just create a new column with the data instead.
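
A minimal sketch of the tidyverse version, assuming a data frame with `actor` and `imdb_score` columns. The values below are made up and `movies` is just a placeholder name. Note that the actor ends up in an ordinary column rather than in the row names.

```r
library(dplyr)

# Made-up data; only the column names follow the ones mentioned above
movies <- tibble(
  actor      = c("A", "A", "B", "B"),
  imdb_score = c(7.0, 8.0, 6.0, 6.5)
)

# One row per actor, with the actor stored as a regular column
actor_means <- movies %>%
  group_by(actor) %>%
  summarize(mean_score = mean(imdb_score))

actor_means
```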
Minor style note, when closing out your function, I would make sure your
final bracket is left-aligned.
Rather than using `return` to throw the error, I'd recommend using `stop`
with the same message because it works with things like `traceback` that can
help users with debugging (if they're advanced users).
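
To illustrate the difference, here is a small sketch. `safe_mean` and its error message are hypothetical and only stand in for your function.

```r
# Hypothetical function: the name and the check are illustrative only
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    # stop() signals a real error; return("...") would instead hand back
    # an ordinary character string that looks like a valid result
    stop("`x` must be numeric.")
  }
  sum(x) / length(x)
}

# Because it is a real error, callers can catch it with tryCatch()
result <- tryCatch(
  safe_mean("a"),
  error = function(e) conditionMessage(e)
)
result
```

With `return`, the calling code would silently carry on with the message as if it were data.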
Really nice examples with `if` statements. The only thing I'd recommend is
thinking about the function names a bit (i.e., `Input_if` is not terrifically
descriptive).
Really great job on the `descriptives` function!
Ha! I love the wizard!
I really like the use of `slice(head(df_genre), 1:3)`. To make it more clear
what `nest` is doing, you might consider showing the data frame before nesting,
and then also showing the result of `slice(head(df_genre$data), 1:3)` after
nesting.
Note that `map` can work on any generic vector, and does not necessarily have
to be a list.
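
For instance, a quick sketch with a plain integer vector (the input is arbitrary):

```r
library(purrr)

# The input is an atomic integer vector, not a list
squares <- map_dbl(1:3, ~ .x^2)
squares
```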
Rather than
    save_map_c <- map(df_country$data, ~length(.x[[1]]))
I think it's a little more clear to use
    save_map_c <- map(df_country$data, ~nrow(.x))
But note in this case it could even be a little more compact (which might be
less clear 🤷) as `map(df_country$data, nrow)`.
Because you're already using list columns here, it might have made sense to
store the `length` or `nrow` result as a new column in the df.
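
Something along these lines is what I have in mind. It's a sketch with made-up data, and `movies` and `n_movies` are placeholder names. `map_int` is used so the new column is an integer vector rather than a list.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up data standing in for the movie data
movies <- tibble(
  country = c("USA", "USA", "UK", "France", "USA"),
  title   = c("a", "b", "c", "d", "e")
)

# Nest by country, then store the row count next to each nested data frame
df_country <- movies %>%
  group_by(country) %>%
  nest() %>%
  ungroup() %>%
  mutate(n_movies = map_int(data, nrow))

df_country
```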
Rather than `cSplit(data_new, "genres", "|")`, it might have been better to
use `tidyr::separate()` on the `genres` column with `sep = "\\|"` (`sep` is a
regular expression, so the pipe has to be escaped). I recognize this would
create a fair number of columns with missing data, but you could then just
select the first column. This basically relates to trying to keep everything in
the same style and using one set of packages. Of course, I'm not familiar with
`cSplit` so maybe it's worth it here, it just seems like an easy area to
introduce a source of confusion.
Love the meme!
Did we learn how to extract information from models in the previous tutorial?
I saw fitting of the model but not extracting info.
Good review - maybe I missed the extraction part, but it's good to have it
there anyway.
Great non-example, showing that you can't feed lists of models to `anova`.
Great job pulling the "Action" genre out for comparison.
When you create things like `model_comparison_df`, I'd recommend showing a
little preview of what the resulting structure looks like. I see you do that
later, but I would do it for the intermediary one too.
I think I'd recommend doing this a bit differently. We've already seen list
columns in the previous tutorial, so I would use that and add the models as list
columns. I would do something like
    by_main_genre %>%
      mutate(model1 = map(data, ~lm(profit ~ movie_facebook_likes, .x)),
             model2 = map(data, ~lm(profit ~ movie_facebook_likes +
                                      imdb_score,
                                    .x)))
to fit the models, and then you could just add another column for the anova,
etc. This would allow you to avoid all the transformations needed to create
`model_comparison_df`.
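
Here is a rough, runnable sketch of that idea, including the extra column for the anova. I'm using the built-in `mtcars` data as a stand-in, so the grouping variable and the model formulas are assumptions and differ from the ones in your tutorial.

```r
library(dplyr)
library(tidyr)
library(purrr)

# mtcars is only a stand-in here; the variables differ from the tutorial's
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup()

# Fit both models per group, then compare them pairwise in a third column
models <- by_cyl %>%
  mutate(
    model1     = map(data, ~ lm(mpg ~ wt, data = .x)),
    model2     = map(data, ~ lm(mpg ~ wt + hp, data = .x)),
    comparison = map2(model1, model2, anova)
  )

models
```

Everything stays in one data frame, with one row per group, so no reshaping is needed to line the models up with their comparisons.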
I felt like the description of `list("Pr(>F)", 2)` could have been a little
stronger. Basically when you extract `"Pr(>F)"` it returns a vector of length
two, so you want to pull just the second element from that vector.
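
A small base R sketch of those two steps, again with `mtcars` as a stand-in for your data (the formulas are assumptions):

```r
# mtcars is a stand-in; the tutorial's models use different variables
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)

# Step 1: extract the "Pr(>F)" column, one entry per model
p_values <- anova(m1, m2)[["Pr(>F)"]]
length(p_values)

# The first entry is NA because the first model has nothing to be compared to
is.na(p_values[1])

# Step 2: keep only the second element, the p-value for the comparison
p_value <- p_values[2]
```

Passing `list("Pr(>F)", 2)` to `map` does the same thing in one go: extract by name first, then by position.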
Where's the meme!?
I would say "We can use `facet_wrap`" rather than "We can use facet-wrap".
You state that we're going to use `pmap`. I would take a sentence or two to
describe (conceptually) what it is, and why it is needed here. (And now I see
you did, but I think it would have helped to add it earlier.)
I mentioned this earlier in a different tutorial, but make sure to wrap code
in back ticks to distinguish it (e.g., `..1`).
Where did `coefs_plot` come from?
I don't see the coefficient for Facebook for Model 1 in the plots...
shouldn't that be there?
In the example you've given, you could have used `map2` rather than `pmap`
because you're only looping through two things, the data and the genre.
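
To show the equivalence, here is a sketch with made-up inputs. `describe` is a hypothetical helper, not a function from your tutorial.

```r
library(purrr)

# Made-up inputs standing in for the nested data and the genre names
genres <- c("Action", "Comedy")
data   <- list(data.frame(x = 1:3), data.frame(x = 1:5))

# Hypothetical helper: builds a label from one data frame and one genre
describe <- function(df, genre) {
  paste0(genre, ": ", nrow(df), " rows")
}

# pmap takes a list of inputs; map2 takes exactly two inputs directly
with_pmap <- pmap_chr(list(data, genres), describe)
with_map2 <- map2_chr(data, genres, describe)

identical(with_pmap, with_map2)
```

`pmap` earns its keep once there are three or more inputs to loop over in parallel.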