
tidycode's Issues

Failure when using :: and `=`

I think the parser has some issues with readr/data, but I'm unsure whether the cause is that data, which is also the name of a function, is being used as an object name.

library(tidycode)
code = 'library(readr)

# data from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
url = paste0("https://archive.ics.uci.edu/ml/machine-learning-databases/", 
             "breast-cancer-wisconsin/wdbc")
info = readLines(paste0(url, ".names"))
features = c("radius", "texture", "perimeter", "area", "smoothness", 
             "compactness", "concavity", "concave_points", "symmetry", "fractal_dimension")
measures = c("mean", "se", "worst")
hdr = c(outer(features, measures, paste, sep = "_"))
hdr = c("id", "dx", hdr)
data = readr::read_csv(paste0(url, ".data"), col_names = hdr,
                       na = c("", "NA", "?"))
'
res = matahari::dance_recital(code)
out = get_package_functions(res$expr)
#> Registered S3 method overwritten by 'pryr':
#>   method      from
#>   print.bytes Rcpp
#> Error: Some of the packages in your call list have not been installed.
#> Please install the following package before proceeding:
#>  * data = readr

Created on 2021-03-01 by the reprex package (v1.0.0)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2021-03-01                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source                               
#>  assertthat    0.2.1   2019-03-21 [2] CRAN (R 4.0.0)                       
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.0.2)                       
#>  cli           2.3.0   2021-01-31 [1] CRAN (R 4.0.2)                       
#>  clipr         0.7.1   2020-10-08 [1] CRAN (R 4.0.2)                       
#>  codetools     0.2-18  2020-11-04 [1] CRAN (R 4.0.2)                       
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.0.2)                       
#>  curl          4.3     2019-12-02 [2] CRAN (R 4.0.0)                       
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)                       
#>  ellipsis      0.3.1   2020-05-15 [2] CRAN (R 4.0.0)                       
#>  evaluate      0.14    2019-05-28 [2] CRAN (R 4.0.0)                       
#>  fs            1.5.0   2020-07-31 [2] CRAN (R 4.0.2)                       
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)                       
#>  highr         0.8     2019-03-20 [2] CRAN (R 4.0.0)                       
#>  hms           1.0.0   2021-01-13 [1] CRAN (R 4.0.2)                       
#>  htmltools     0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)                       
#>  jsonlite      1.7.2   2020-12-09 [1] CRAN (R 4.0.2)                       
#>  knitr         1.31    2021-01-27 [1] CRAN (R 4.0.2)                       
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.0.2)                       
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.2)                       
#>  matahari      0.1.3   2020-02-06 [1] CRAN (R 4.0.2)                       
#>  pillar        1.4.7   2020-11-20 [1] CRAN (R 4.0.2)                       
#>  pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.0.0)                       
#>  pryr          0.1.4   2018-02-18 [1] CRAN (R 4.0.2)                       
#>  purrr         0.3.4   2020-04-17 [2] CRAN (R 4.0.0)                       
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)                       
#>  Rcpp          1.0.6   2021-01-15 [1] CRAN (R 4.0.2)                       
#>  readr       * 1.4.0   2020-10-05 [1] CRAN (R 4.0.2)                       
#>  reprex        1.0.0   2021-01-27 [1] CRAN (R 4.0.2)                       
#>  rlang         0.4.10  2020-12-30 [1] CRAN (R 4.0.2)                       
#>  rmarkdown     2.6     2020-12-14 [1] CRAN (R 4.0.2)                       
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.2)                       
#>  sessioninfo   1.1.1   2018-11-05 [2] CRAN (R 4.0.0)                       
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)                       
#>  stringr       1.4.0   2019-02-10 [2] CRAN (R 4.0.0)                       
#>  styler        1.3.2   2020-02-23 [2] CRAN (R 4.0.0)                       
#>  tibble        3.0.6   2021-01-29 [1] CRAN (R 4.0.2)                       
#>  tidycode    * 0.1.1   2021-03-01 [1] Github (LucyMcGowan/tidycode@f65c3f9)
#>  vctrs         0.3.6   2020-12-17 [1] CRAN (R 4.0.2)                       
#>  withr         2.4.1   2021-01-26 [1] CRAN (R 4.0.2)                       
#>  xfun          0.21    2021-02-10 [1] CRAN (R 4.0.2)                       
#>  yaml          2.2.1   2020-02-01 [2] CRAN (R 4.0.0)                       
#> 
#> [1] /Users/johnmuschelli/Library/R/4.0/library
#> [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
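For anyone trying to narrow this down, here is a minimal base-R sketch of how R itself parses the failing assignment from the reprex above, written once with `=` and once with `<-`. It does not identify the bug in tidycode; it only shows what the parser hands to downstream tools. The file name "wdbc.data" is a placeholder and nothing is downloaded or evaluated.

```r
# Parse (but do not evaluate) the assignment in both spellings.
# "wdbc.data" is a placeholder file name used only for illustration.
eq    <- parse(text = 'data = readr::read_csv("wdbc.data")')[[1]]
arrow <- parse(text = 'data <- readr::read_csv("wdbc.data")')[[1]]

# The top-level call differs only in its head: `=` versus `<-`.
head_eq    <- as.character(eq[[1]])
head_arrow <- as.character(arrow[[1]])

# In both cases the left-hand side is the symbol `data`, and the
# right-hand side is a call whose function is itself a `::` call.
lhs      <- as.character(eq[[2]])
rhs_head <- as.character(eq[[3]][[1]][[1]])
same_rhs <- identical(eq[[3]], arrow[[3]])

print(c(head_eq = head_eq, head_arrow = head_arrow, lhs = lhs, rhs_head = rhs_head))
print(same_rhs)
```

Since the right-hand sides are identical, any difference in behaviour between the two spellings would have to come from how the `=` head is handled, which may be a useful place to start looking.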

Feature request: proportion of an R file classified to different categories?

Thanks for an amazing package. It is proving invaluable for a (very) nascent project in which my colleagues and I are trying to understand how beginning data scientists learn to visualize data.

One question: is there existing functionality - or would it be desirable to add functionality - for calculating the proportion of a total R file classified to different categories?

As I now write out an example, I wonder if this is trivial and something folks can just do themselves, but I also wonder whether it would be helpful.

library(tidyverse)
library(tidycode)

d <- read_rfiles(
  tidycode_example("example_plot.R"),
  tidycode_example("example_analysis.R")
)

u <- unnest_calls(d, expr)

p <- u %>%
  dplyr::inner_join(
    get_classifications("crowdsource", include_duplicates = FALSE)
  ) %>%
  dplyr::anti_join(get_stopfuncs()) %>%
  dplyr::select(file, func, classification)
#> Joining, by = "func"
#> Joining, by = "func"

f <- function(d) {
  d %>% 
    count(file, classification) %>% 
    group_by(file) %>% 
    mutate(prop = n / sum(n))
}

f(p)
#> # A tibble: 7 x 4
#> # Groups:   file [2]
#>   file                                           classification     n  prop
#>   <chr>                                          <chr>          <int> <dbl>
#> 1 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning      2 0.286
#> 2 /Library/Frameworks/R.framework/Versions/3.6/… exploratory        1 0.143
#> 3 /Library/Frameworks/R.framework/Versions/3.6/… setup              3 0.429
#> 4 /Library/Frameworks/R.framework/Versions/3.6/… visualization      1 0.143
#> 5 /Library/Frameworks/R.framework/Versions/3.6/… data cleaning      4 0.5  
#> 6 /Library/Frameworks/R.framework/Versions/3.6/… setup              1 0.125
#> 7 /Library/Frameworks/R.framework/Versions/3.6/… visualization      3 0.375

Created on 2019-11-22 by the reprex package (v0.3.0)
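For comparison, the same per-file proportions can be sketched in base R with table() and prop.table(). The data frame below is toy data whose counts mirror the output above; the names file_1 and file_2 are placeholders, since the real paths are truncated in the printout.

```r
# Toy data: one row per classified function call.
# Counts mirror the tibble printed above; file names are placeholders.
p_toy <- data.frame(
  file = rep(c("file_1", "file_2"), times = c(7, 8)),
  classification = c(
    rep(c("data cleaning", "exploratory", "setup", "visualization"),
        times = c(2, 1, 3, 1)),
    rep(c("data cleaning", "setup", "visualization"),
        times = c(4, 1, 3))
  ),
  stringsAsFactors = FALSE
)

# Cross-tabulate calls by file and classification ...
counts <- table(p_toy$file, p_toy$classification)

# ... then divide each row by its total so every file sums to 1.
props <- prop.table(counts, margin = 1)
print(round(props, 3))
```

Unlike the count() version, the cross-tabulation keeps a zero cell for categories that never occur in a file (here, "exploratory" in file_2), which may or may not be what you want.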

Data Cleaning vs Exploratory

Hi Dr. McGowan,

I'm using your tidycode package for my independent study. I ran it on one of the R scripts I have written in tidyverse syntax and compared the result to my own (eyeballed) classification. There is one discrepancy: I would classify the functions as "Exploratory" rather than "Data Cleaning," which is what the tidycode package gave. I recreated those lines, replacing my dataset with the built-in mtcars dataset, and obtained the same result (functions such as summarize() and mean() are classified as Data Cleaning rather than Exploratory):

library(tidyverse)
data(mtcars)

mtcars %>% summarize(mean(hp, na.rm = TRUE))
mtcars %>% group_by(cyl) %>% summarize(mean(wt, na.rm = TRUE))

Does the package classify all dplyr functions as Data Cleaning? Is there any way we can remedy this? Thank you.

Functional programming: symbols vs function calls

Hi,

I'm using tidycode to analyze students' code, and I wondered about something when looking at what follows:

> "purrr::map_dbl(mtcars, mean)" %>%
  dance_recital() %>%
  unnest_calls(expr)
# A tibble: 1 x 7
  value      error  output    warnings  messages  func    args      
  <list>     <list> <list>    <list>    <list>    <chr>   <list>    
1 <dbl [11]> <NULL> <chr [1]> <chr [0]> <chr [0]> map_dbl <list [2]>

I guess that the behavior above (i.e., the call to mean is not detected) is closely related to the fact that getParseData(parse(text = "map_dbl(mtcars, mean)")) detects mean as a SYMBOL.
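The token types can be checked directly. This is a small sketch using only base R and utils; keep.source = TRUE is set explicitly because parse data is not kept by default in non-interactive sessions.

```r
# Parse without evaluating; keep.source = TRUE retains the parse data
# even when the code is run non-interactively (e.g. via Rscript).
exprs <- parse(text = "map_dbl(mtcars, mean)", keep.source = TRUE)
pd <- utils::getParseData(exprs)

# Keep only the rows that describe names, and show how each is tagged.
syms <- pd[pd$token %in% c("SYMBOL", "SYMBOL_FUNCTION_CALL"),
           c("token", "text")]
print(syms)
```

Here map_dbl is tagged SYMBOL_FUNCTION_CALL, while mtcars and mean are both plain SYMBOL tokens, so at the token level a function passed as an argument is indistinguishable from a variable.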

The annoying thing is that, by using functionals, students can "hide" function calls. For instance, if I tell them to create a my_factorial function that does not call R's factorial but rather computes the factorial recursively, they can "cheat" and simply do my_factorial <- function(x) purrr::map_dbl(x, factorial).

> body(my_factorial) %>% deparse() %>% dance_recital() %>% unnest_calls(expr)
# A tibble: 3 x 7
  value  error      output warnings messages func    args      
  <list> <list>     <list> <list>   <list>   <chr>   <list>    
1 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   ::      <list [2]>
2 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   purrr   <list [2]>
3 <NULL> <smplErrr> <NULL> <NULL>   <NULL>   map_dbl <list [2]>

Right now, I prevent this by brute-forcing the code analysis (i.e., I use stringr::str_detect), but I find this solution somewhat unpleasant...

> body(my_factorial) %>% deparse() %>% stringr::str_detect("factorial")
[1] TRUE
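A slightly less brute-force variant, sketched with base R only: all.names() walks the expression tree and returns every symbol it finds, whether the symbol is called directly or merely passed as an argument. Matching whole symbols also avoids flagging names that just contain the string "factorial". The function my_factorial_rec is a hypothetical "honest" solution added for contrast; neither function is ever called, so purrr does not need to be installed.

```r
# The "cheating" submission from above: factorial is passed, not called.
my_factorial <- function(x) purrr::map_dbl(x, factorial)

# Hypothetical honest submission, for contrast (recursive, no factorial()).
my_factorial_rec <- function(x) if (x <= 1) 1 else x * my_factorial_rec(x - 1)

# Does the function body reference a symbol with exactly this name?
uses_symbol <- function(fn, name) {
  name %in% all.names(body(fn))
}

cheat_flagged  <- uses_symbol(my_factorial, "factorial")
honest_flagged <- uses_symbol(my_factorial_rec, "factorial")

# Substring matching, by comparison, also flags the honest version,
# because "my_factorial_rec" contains "factorial".
honest_substring <- any(grepl("factorial", deparse(body(my_factorial_rec))))

print(c(cheat = cheat_flagged, honest = honest_flagged,
        honest_substring = honest_substring))
```

This still misses function names supplied as strings (for example through do.call("factorial", ...)), so it is a stricter filter rather than a complete one.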

Any idea?

@jtleek : I guess that students could also hide p-hacking from you this way :)

PS: Up to yesterday, I didn't know about tidycode and matahari; those tools are pretty cool!

Meaning of the "score" column

Hi Dr. McGowan,

I'm trying to understand the score column of the classification_tbl.csv file, but I couldn't find any documentation on what the variable means or what role it plays. I'd really appreciate it if you could explain this variable or point me to where I can find information on this column. Thank you.
