lucymcgowan / tidycode
License: Other
I think the parser has an issue with readr and data, but I'm not sure whether it's caused by using data, which is also the name of a function, as the name of an object.
library(tidycode)
code = 'library(readr)
# data from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
url = paste0("https://archive.ics.uci.edu/ml/machine-learning-databases/",
"breast-cancer-wisconsin/wdbc")
info = readLines(paste0(url, ".names"))
features = c("radius", "texture", "perimeter", "area", "smoothness",
"compactness", "concavity", "concave_points", "symmetry", "fractal_dimension")
measures = c("mean", "se", "worst")
hdr = c(outer(features, measures, paste, sep = "_"))
hdr = c("id", "dx", hdr)
data = readr::read_csv(paste0(url, ".data"), col_names = hdr,
na = c("", "NA", "?"))
'
res = matahari::dance_recital(code)
out = get_package_functions(res$expr)
#> Registered S3 method overwritten by 'pryr':
#> method from
#> print.bytes Rcpp
#> Error: Some of the packages in your call list have not been installed.
#> Please install the following package before proceeding:
#> * data = readr
Created on 2021-03-01 by the reprex package (v1.0.0)
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.2 (2020-06-22)
#> os macOS Catalina 10.15.7
#> system x86_64, darwin17.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2021-03-01
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib source
#> assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.0)
#> backports 1.2.1 2020-12-09 [1] CRAN (R 4.0.2)
#> cli 2.3.0 2021-01-31 [1] CRAN (R 4.0.2)
#> clipr 0.7.1 2020-10-08 [1] CRAN (R 4.0.2)
#> codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.2)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.2)
#> curl 4.3 2019-12-02 [2] CRAN (R 4.0.0)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.1 2020-05-15 [2] CRAN (R 4.0.0)
#> evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.0)
#> fs 1.5.0 2020-07-31 [2] CRAN (R 4.0.2)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> highr 0.8 2019-03-20 [2] CRAN (R 4.0.0)
#> hms 1.0.0 2021-01-13 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
#> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
#> knitr 1.31 2021-01-27 [1] CRAN (R 4.0.2)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> matahari 0.1.3 2020-02-06 [1] CRAN (R 4.0.2)
#> pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.2)
#> pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.0)
#> pryr 0.1.4 2018-02-18 [1] CRAN (R 4.0.2)
#> purrr 0.3.4 2020-04-17 [2] CRAN (R 4.0.0)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
#> readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.2)
#> reprex 1.0.0 2021-01-27 [1] CRAN (R 4.0.2)
#> rlang 0.4.10 2020-12-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.6 2020-12-14 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.0)
#> stringi 1.5.3 2020-09-09 [1] CRAN (R 4.0.2)
#> stringr 1.4.0 2019-02-10 [2] CRAN (R 4.0.0)
#> styler 1.3.2 2020-02-23 [2] CRAN (R 4.0.0)
#> tibble 3.0.6 2021-01-29 [1] CRAN (R 4.0.2)
#> tidycode * 0.1.1 2021-03-01 [1] Github (LucyMcGowan/tidycode@f65c3f9)
#> vctrs 0.3.6 2020-12-17 [1] CRAN (R 4.0.2)
#> withr 2.4.1 2021-01-26 [1] CRAN (R 4.0.2)
#> xfun 0.21 2021-02-10 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [2] CRAN (R 4.0.0)
#>
#> [1] /Users/johnmuschelli/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
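The reported package name "data = readr" suggests that the left-hand side of the assignment ends up in the package field. For comparison, base R's parse data gives the package half of pkg::fun its own SYMBOL_PACKAGE token, so referenced packages can be read off without splitting text on "::". The sketch below is a minimal illustration of that idea, not tidycode's implementation; packages_from_code() is a hypothetical helper name.

```r
# List the packages referenced with `pkg::fun` (or `pkg:::fun`) in a code string,
# using the parser's token types instead of splitting deparsed text on "::".
packages_from_code <- function(code) {
  # keep.source = TRUE is required for getParseData() to have anything to return
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  # SYMBOL_PACKAGE is the token R assigns to `pkg` in `pkg::fun`
  unique(pd$text[pd$token == "SYMBOL_PACKAGE"])
}

# The assignment target `data` is an ordinary SYMBOL, so it is never mistaken
# for part of the package name. Note that `readr` inside library() is also a
# plain SYMBOL and is not picked up by this check.
code <- 'library(readr)
data = readr::read_csv(paste0(url, ".data"), col_names = hdr)'
packages_from_code(code)  # "readr"
```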
Thanks for an amazing package. It is proving invaluable for a (very) nascent project in which my colleagues and I are trying to understand how beginning data scientists learn to visualize data.
One question: is there existing functionality (or would it be desirable to add functionality) for calculating the proportion of the function calls in an R file that fall into each classification?
Writing out an example, I wonder whether this is trivial enough that folks can just do it themselves, but I also wonder whether it would be a helpful addition.
library(tidyverse)
library(tidycode)
d <- read_rfiles(
tidycode_example("example_plot.R"),
tidycode_example("example_analysis.R")
)
u <- unnest_calls(d, expr)
p <- u %>%
dplyr::inner_join(
get_classifications("crowdsource", include_duplicates = FALSE)
) %>%
dplyr::anti_join(get_stopfuncs()) %>%
dplyr::select(file, func, classification)
#> Joining, by = "func"
#> Joining, by = "func"
f <- function(d) {
d %>%
count(file, classification) %>%
group_by(file) %>%
mutate(prop = n / sum(n))
}
f(p)
#> # A tibble: 7 x 4
#> # Groups: file [2]
#> file classification n prop
#> <chr> <chr> <int> <dbl>
#> 1 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning 2 0.286
#> 2 /Library/Frameworks/R.framework/Versions/3.6/… exploratory 1 0.143
#> 3 /Library/Frameworks/R.framework/Versions/3.6/… setup 3 0.429
#> 4 /Library/Frameworks/R.framework/Versions/3.6/… visualization 1 0.143
#> 5 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning 4 0.5
#> 6 /Library/Frameworks/R.framework/Versions/3.6/… setup 1 0.125
#> 7 /Library/Frameworks/R.framework/Versions/3.6/… visualization 3 0.375
Created on 2019-11-22 by the reprex package (v0.3.0)
Hi Dr. McGowan,
I'm using your tidycode package for my independent study. I ran it on one of the R scripts I wrote in tidyverse syntax and compared the result to my own (eyeballed) classification. There is one discrepancy: I would classify some functions as "Exploratory", but tidycode classified them as "Data Cleaning". I recreated those lines with the built-in mtcars dataset in place of my data and got the same result (the functions used, such as summarize() and mean(), are classified as "Data Cleaning" rather than "Exploratory"):
library(tidyverse)
data(mtcars)
mtcars %>% summarize(mean(hp, na.rm = TRUE))
mtcars %>% group_by(cyl) %>% summarize(mean(wt, na.rm = TRUE))
Does the package classify all dplyr functions as Data Cleaning? Is there any way to remedy this? Thank you.
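One user-side workaround, until the classifications themselves change, is to keep a small table of your own labels and let it take precedence over the package's lookup. The sketch below uses base R match(); default_cls is a hypothetical stand-in for the table returned by get_classifications() (its rows are illustrative, apart from summarize and mean being labelled data cleaning as reported above), and overrides and classify_funcs() are likewise hypothetical names.

```r
# Hypothetical stand-in for the package's lookup table (illustrative values;
# in practice this would come from get_classifications()).
default_cls <- data.frame(
  func = c("group_by", "summarize", "mean"),
  classification = c("data cleaning", "data cleaning", "data cleaning"),
  stringsAsFactors = FALSE
)

# Your own labels; these win wherever a function appears in both tables.
overrides <- data.frame(
  func = c("summarize", "mean"),
  classification = c("exploratory", "exploratory"),
  stringsAsFactors = FALSE
)

classify_funcs <- function(funcs, default_cls, overrides) {
  # Look up each function in both tables (NA when it is not found)
  from_default  <- default_cls$classification[match(funcs, default_cls$func)]
  from_override <- overrides$classification[match(funcs, overrides$func)]
  # Prefer the override, fall back to the default classification
  ifelse(is.na(from_override), from_default, from_override)
}

res <- classify_funcs(c("group_by", "summarize", "mean", "ggplot"), default_cls, overrides)
res  # "data cleaning" "exploratory" "exploratory" NA
```

With dplyr, the same idea is a left_join() against each table followed by coalesce(override, default).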
Hi,
I'm using tidycode to analyze students' code, and I wondered about something when looking at the following:
> "purrr::map_dbl(mtcars, mean)" %>%
dance_recital() %>%
unnest_calls(expr)
# A tibble: 1 x 7
value error output warnings messages func args
<list> <list> <list> <list> <list> <chr> <list>
1 <dbl [11]> <NULL> <chr [1]> <chr [0]> <chr [0]> map_dbl <list [2]>
I guess that the behavior above (i.e., the call to mean is not detected) is closely related to the fact that getParseData(parse(text = "map_dbl(mtcars, mean)")) detects mean as a SYMBOL.
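A minimal base R check of that observation: listing the terminal tokens for the call shows that only the function actually being called (map_dbl) is tagged SYMBOL_FUNCTION_CALL, while mean, passed as an argument, is an ordinary SYMBOL exactly like the data argument mtcars.

```r
# Parse the call with source references kept so getParseData() has data,
# then keep only the terminal tokens (the leaves of the parse tree).
pd <- utils::getParseData(parse(text = "map_dbl(mtcars, mean)", keep.source = TRUE))
tokens <- pd[pd$terminal, c("token", "text")]
tokens
# map_dbl -> SYMBOL_FUNCTION_CALL; mtcars and mean -> SYMBOL
```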
The annoying thing is that, by using functionals, students can "hide" function calls. For instance, if I tell them to create a my_factorial function that does not call R's factorial but rather computes the factorial recursively, they can "cheat" and simply do my_factorial <- function(x) purrr::map_dbl(x, factorial).
> body(my_factorial) %>% deparse() %>% dance_recital() %>% unnest_calls(expr)
# A tibble: 3 x 7
value error output warnings messages func args
<list> <list> <list> <list> <list> <chr> <list>
1 <NULL> <smplErrr> <NULL> <NULL> <NULL> :: <list [2]>
2 <NULL> <smplErrr> <NULL> <NULL> <NULL> purrr <list [2]>
3 <NULL> <smplErrr> <NULL> <NULL> <NULL> map_dbl <list [2]>
Right now, I prevent this by brute-forcing the code analysis (i.e., I use stringr::str_detect), but I find this solution somewhat unpleasant...
> body(my_factorial) %>% deparse() %>% stringr::str_detect("factorial")
[1] TRUE
Any idea?
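One less brittle alternative to text matching is to inspect the parsed expression itself. Base R's all.names() returns every symbol in an expression, including symbols that are only passed as arguments, so factorial is found even when it is handed to map_dbl(). A sketch, with uses_symbol() and my_factorial_ok() as hypothetical names:

```r
# TRUE if `name` appears as a symbol anywhere in the body of `fn`,
# whether it is called directly or passed to a functional.
uses_symbol <- function(fn, name) {
  name %in% all.names(body(fn))
}

# The "cheating" version: factorial is only passed as an argument.
# (purrr does not need to be installed just to define and inspect it.)
my_factorial <- function(x) purrr::map_dbl(x, factorial)

# A legitimate recursive version that never touches base R's factorial
my_factorial_ok <- function(n) if (n <= 1) 1 else n * my_factorial_ok(n - 1)

uses_symbol(my_factorial, "factorial")     # TRUE
uses_symbol(my_factorial_ok, "factorial")  # FALSE

# Plain text matching flags the legitimate version too,
# because "my_factorial_ok" contains the substring "factorial"
grepl("factorial", paste(deparse(body(my_factorial_ok)), collapse = " "))  # TRUE
```

This still misses calls made through strings, such as do.call("factorial", list(x)) or match.fun("factorial"), since those names are character constants rather than symbols.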
@jtleek: I guess that students could also hide p-hacking from you this way :)
PS: Until yesterday, I didn't know about tidycode and matahari; those tools are pretty cool!
Right now, I check is_model() based on the class of the value object. This means a value has to be obtained, which wouldn't happen inside a function 😢, so this needs to be fixed.
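One way to avoid needing a value, sketched here under the assumption that a hand-maintained list of model-fitting function names is acceptable, is to classify a call by the name of the function it invokes. This is not how is_model() currently works; call_name(), is_model_call() and the default model_funcs list are hypothetical.

```r
# Name of the function a call invokes, or NA if it cannot be read off directly.
# Handles both `lm(...)` and `stats::lm(...)`.
call_name <- function(expr) {
  if (!is.call(expr)) return(NA_character_)
  fn <- expr[[1]]
  # For pkg::fun(...), the head is itself a call to `::`; keep the `fun` part
  if (is.call(fn) && is.symbol(fn[[1]]) &&
      as.character(fn[[1]]) %in% c("::", ":::")) {
    fn <- fn[[3]]
  }
  if (is.symbol(fn)) as.character(fn) else NA_character_
}

# Hypothetical list of model-fitting functions; extend as needed
is_model_call <- function(expr, model_funcs = c("lm", "glm", "aov", "nls")) {
  call_name(expr) %in% model_funcs
}

# Works on unevaluated code, including the body of a function
fit_fn <- function(d) stats::lm(mpg ~ wt, data = d)
is_model_call(body(fit_fn))         # TRUE
is_model_call(quote(summary(fit)))  # FALSE
```

As written it only looks at the top-level call, so a braced function body would need its statements checked one by one.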
Hi Dr. McGowan,
I'm trying to understand the score column of the classification_tbl.csv file, but I couldn't find any documentation of what this variable means or what role it plays. I'd really appreciate it if you could explain it or point me to where I can find information on this column. Thank you!