rsquaredacademy / rbin
Tools for binning data
Home Page: https://rbin.rsquaredacademy.com
License: Other
All print methods should display the information value below the table.
rbin_winsorized() should create bins using winsorized binning.
rb_bin_factor() should bin factor variables.
Users should be able to choose between the following intervals:
rbin_quantiles() should create bins using quantiles.
rb_bin_visualize() should create visualization for binning.
Import the shiny app from the xplorerr package.
Return plot objects instead of printing. Use the argument print_plot with the default value TRUE.
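A minimal sketch of how a print_plot argument could behave, assuming ggplot2 is installed. The helper plot_woe() and the toy data frame are hypothetical and are not taken from the rbin source; the point is only that the plot object is always returned, and printing is optional.

```r
library(ggplot2)

# Hypothetical plotting helper, not the rbin implementation.
plot_woe <- function(bins, print_plot = TRUE) {
  p <- ggplot(bins, aes(x = bin, y = woe)) +
    geom_col() +
    labs(x = "Bin", y = "Weight of evidence")

  # Print only when asked; the object is returned either way.
  if (print_plot) {
    print(p)
  }
  invisible(p)
}

# Toy data (made up for illustration).
toy <- data.frame(bin = factor(1:3), woe = c(-0.07, 0.13, 0.18))
p <- plot_woe(toy, print_plot = FALSE)
```

Returning the object lets callers add layers or save the plot themselves, which is presumably the motivation for the request.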
rbin_equal_length should create bins of equal length.
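As a rough illustration of equal length binning in base R: the range of the predictor is split into intervals of the same width, and observations are counted per interval. The helper equal_length_bins() and the sample ages are hypothetical; this is a sketch of the general technique, not the package's implementation.

```r
# Hypothetical helper, not part of rbin: split the range of x into
# `bins` intervals of equal width and count observations per interval.
equal_length_bins <- function(x, bins = 4) {
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  # include.lowest = TRUE keeps the minimum value in the first interval.
  table(cut(x, breaks = breaks, include.lowest = TRUE))
}

# Made-up ages for illustration.
ages <- c(18, 22, 25, 31, 38, 44, 52, 60, 71, 84)
counts <- equal_length_bins(ages, bins = 3)
counts
```

The intervals have equal width (22 years each here), so the counts per bin generally differ, in contrast to equal frequency binning.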
When I use rbin_quantiles(data, response, predictors[i], bins=3) on my dataset, I get the error "Error: Argument 2 must be length 3, not 2".
rbin_quantiles(data, response, predictors[i], bins=2) works just fine.
summary(Data$CONVERSION) #response
0 1
248996 24912
summary(Data$COUNT_VISITS_6M) #predictors[i]
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 2.00 10.64 11.00 707.00
I hope this helps!
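One possible cause, which the report does not confirm, is duplicated quantile cut points: the summary above shows that the minimum and the first quartile of the predictor are both 0. The base R sketch below uses synthetic data (not the reporter's) to show how a zero-heavy variable can yield fewer distinct intervals than the number of bins requested.

```r
# Synthetic, zero-heavy predictor (hypothetical data, not the reporter's).
x <- c(rep(0, 40), 1:60)

# Cut points for 3 quantile bins: the 0%, 33.3%, 66.7% and 100% quantiles.
breaks <- quantile(x, probs = seq(0, 1, length.out = 4), names = FALSE)
breaks

# With many zeros, the first two cut points coincide.
has_duplicates <- anyDuplicated(breaks) > 0

# Dropping duplicates leaves fewer intervals than requested.
n_intervals <- length(unique(breaks)) - 1
```

Here three bins were requested but only two distinct intervals remain. Whether this is what triggers the reported error inside rbin_quantiles() is an assumption.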
CRAN feedback: In rbinAddin() and rbinFactorAddin(), remove options to download data and plots.
Prepare for release:
devtools::check_win_devel()
rhub::check_for_cran()
Perform release:
devtools::check_win_devel() (again!)
devtools::submit_cran()
pkgdown::build_site()
Wait for CRAN...
Template from r-lib/usethis#338
There are two errors in the way equal frequency binning is computed:
library(rbin)
# equal frequency binning
bins <- rbin_equal_freq(mbank, y, age, 10)
bins
#> Binning Summary
#> ------------------------------------
#> Method Equal Frequency
#> Response y
#> Predictor age
#> Bins 10
#> Count 4521
#> Goods 517
#> Bads 4004
#> Entropy 0.51
#> Information Value 0.01
#>
#>
#> lower_cut upper_cut bin_count good bad good_rate woe iv
#> 1 18 29 452 55 397 0.12168142 -0.07040317 5.091649e-04
#> 2 29 31 452 57 395 0.12610619 -0.11117177 1.289604e-03
#> 3 31 34 452 46 406 0.10176991 0.13070550 1.623852e-03
#> 4 34 36 452 44 408 0.09734513 0.18007127 3.023706e-03
#> 5 36 39 452 58 394 0.12831858 -0.13109837 1.807071e-03
#> 6 39 42 452 51 401 0.11283186 0.01512953 2.275202e-05
#> 7 42 46 452 45 407 0.09955752 0.15514443 2.266308e-03
#> 8 46 51 452 60 392 0.13274336 -0.17008899 3.087466e-03
#> 9 51 56 452 53 399 0.11725664 -0.02833676 8.116094e-05
#> 10 56 84 453 48 405 0.10596026 0.08567979 7.116156e-04
#> entropy
#> 1 0.5341748
#> 2 0.5466619
#> 3 0.4745811
#> 4 0.4605229
#> 5 0.5528088
#> 6 0.5083990
#> 7 0.4675914
#> 8 0.5649142
#> 9 0.5214234
#> 10 0.4876093
# plot
plot(bins)
# bins
bins$bins
#> lower_cut upper_cut bin bin_count good bad bin_cum_count good_cum_count
#> 1 18 29 2 452 55 397 452 55
#> 2 29 31 7 452 57 395 904 112
#> 3 31 34 4 452 46 406 1356 158
#> 4 34 36 6 452 44 408 1808 202
#> 5 36 39 9 452 58 394 2260 260
#> 6 39 42 1 452 51 401 2712 311
#> 7 42 46 5 452 45 407 3164 356
#> 8 46 51 3 452 60 392 3616 416
#> 9 51 56 8 452 53 399 4068 469
#> 10 56 84 453 453 48 405 4521 517
#> bad_cum_count bin_prop good_rate bad_rate good_dist bad_dist
#> 1 397 0.09997788 0.12168142 0.8783186 0.10638298 0.09915085
#> 2 792 0.09997788 0.12610619 0.8738938 0.11025145 0.09865135
#> 3 1198 0.09997788 0.10176991 0.8982301 0.08897485 0.10139860
#> 4 1606 0.09997788 0.09734513 0.9026549 0.08510638 0.10189810
#> 5 2000 0.09997788 0.12831858 0.8716814 0.11218569 0.09840160
#> 6 2401 0.09997788 0.11283186 0.8871681 0.09864603 0.10014985
#> 7 2808 0.09997788 0.09955752 0.9004425 0.08704062 0.10164835
#> 8 3200 0.09997788 0.13274336 0.8672566 0.11605416 0.09790210
#> 9 3599 0.09997788 0.11725664 0.8827434 0.10251451 0.09965035
#> 10 4004 0.10019907 0.10596026 0.8940397 0.09284333 0.10114885
#> woe dist_diff iv entropy prop_entropy
#> 1 -0.07040317 -0.007232130 5.091649e-04 0.5341748 0.05340567
#> 2 -0.11117177 -0.011600102 1.289604e-03 0.5466619 0.05465410
#> 3 0.13070550 0.012423746 1.623852e-03 0.4745811 0.04744761
#> 4 0.18007127 0.016791719 3.023706e-03 0.4605229 0.04604211
#> 5 -0.13109837 -0.013784088 1.807071e-03 0.5528088 0.05526865
#> 6 0.01512953 0.001503815 2.275202e-05 0.5083990 0.05082866
#> 7 0.15514443 0.014607733 2.266308e-03 0.4675914 0.04674880
#> 8 -0.17008899 -0.018152061 3.087466e-03 0.5649142 0.05647892
#> 9 -0.02833676 -0.002864157 8.116094e-05 0.5214234 0.05213081
#> 10 0.08567979 0.008305524 7.116156e-04 0.4876093 0.04885800
Created on 2023-06-02 by the reprex package (v0.3.0)
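For readers checking the numbers, the woe and iv columns in the output above can be reproduced from the good and bad counts with base R. The formulas below are inferred from the printed values (they match the first rows of the table), not taken from the rbin source, so treat them as an assumption about how the package computes these columns.

```r
# Good/bad counts per bin, copied from the output above.
good <- c(55, 57, 46, 44, 58, 51, 45, 60, 53, 48)
bad  <- c(397, 395, 406, 408, 394, 401, 407, 392, 399, 405)

good_dist <- good / sum(good)   # share of all goods falling in each bin
bad_dist  <- bad / sum(bad)     # share of all bads falling in each bin

# Weight of evidence and per-bin information value (inferred formulas).
woe <- log(bad_dist / good_dist)
iv  <- (bad_dist - good_dist) * woe

# Compare with the table: woe and iv per bin, and the rounded total IV.
data.frame(woe = woe, iv = iv)
round(sum(iv), 2)
```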
Use rlang equivalents for errors, warnings and messages.
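A small sketch of what this could look like, assuming the rlang package is installed. abort(), warn() and inform() are the rlang counterparts of stop(), warning() and message(); the validator check_bins() and its messages are hypothetical and not taken from the rbin source.

```r
library(rlang)

# Hypothetical validator, not taken from the rbin source.
check_bins <- function(bins) {
  if (!is.numeric(bins)) {
    abort("`bins` must be numeric.")                 # instead of stop()
  }
  if (bins > 20) {
    warn("More than 20 bins may be hard to read.")   # instead of warning()
  }
  inform(paste("Creating", bins, "bins."))           # instead of message()
  invisible(bins)
}

check_bins(10)
```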
CRAN feedback: Use \donttest instead of \dontrun for examples with a run time > 5s.
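In roxygen2 comments this might look like the fragment below. The documented function slow_binning() is a hypothetical placeholder, and which rbin examples actually exceed 5s is not stated here; the rbin_equal_freq() call is the one from the reprex on this page.

```r
#' Hypothetical slow function (placeholder, not part of rbin)
#'
#' @examples
#' # Fast example, run as usual.
#' bins <- rbin_equal_freq(mbank, y, age, 10)
#'
#' \donttest{
#' # Slow example (run time > 5s) goes here.
#' slow_binning()
#' }
#'
slow_binning <- function() {
  invisible(NULL)
}
```

Unlike code wrapped in \dontrun, code wrapped in \donttest remains runnable by users, for instance via example(..., run.donttest = TRUE).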
rbin_trend_decreasing() will force the variable to follow a monotonically decreasing trend.
Bin continuous variables based on weight of evidence and information value. Users should be able to bin the variables in the following ways:
Explore the features of SPSS visual binning and incorporate them in the RStudio Addin or shiny app.
User should be able to save the final binned data both as CSV and RDS.
The shiny app for rbin should do the following:
rb_create_bins() should create binned variables in a data set.
rbin_trend_increasing() will force the variable to follow a monotonically increasing trend.
All binning functions should return the bin-wise and total entropy.
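As a sketch of what bin-wise and total entropy mean here, the base R code below reproduces the entropy values printed in the equal frequency output elsewhere on this page. The formulas are inferred from those printed numbers (binary entropy in bits per bin, weighted by bin share for the total), not taken from the rbin source.

```r
# Good/bad counts per bin, copied from the equal frequency output.
good <- c(55, 57, 46, 44, 58, 51, 45, 60, 53, 48)
bad  <- c(397, 395, 406, 408, 394, 401, 407, 392, 399, 405)

n <- good + bad          # observations per bin
p <- good / n            # good rate per bin

# Binary entropy (base 2) of the good/bad split within each bin.
bin_entropy <- -(p * log2(p) + (1 - p) * log2(1 - p))

# Total entropy: bin entropies weighted by each bin's share of observations.
total_entropy <- sum(n / sum(n) * bin_entropy)

round(bin_entropy, 7)
round(total_entropy, 2)
```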
We are contacting you because you are the maintainer of rbin, which imports ggplot2 and uses vdiffr to manage visual test cases. The upcoming release of ggplot2 includes several improvements to plot rendering, including the ability to specify lineend and linejoin in geom_rect() and geom_tile(), and improved rendering of text. These improvements will result in subtle changes to your vdiffr doppelgangers when the new version is released.
Because vdiffr test cases do not run on CRAN by default, your CRAN checks will still pass. However, we suggest updating your visual test cases with the new version of ggplot2 as soon as possible to avoid confusion. You can install the development version of ggplot2 using remotes::install_github("tidyverse/ggplot2").
If you have any questions, let me know!
From CRAN:
Dear maintainer,
You have file 'rbin/man/rbin.Rd' with \docType{package}, likely
intended as a package overview help file, but without the appropriate
PKGNAME-package \alias as per "Documenting packages" in R-exts.
This seems to be the consequence of the breaking change
Using @docType package no longer automatically adds a -package alias.
Instead document _PACKAGE to get all the defaults for package
documentation.
in roxygen2 7.0.0 (2019-11-12) having gone unnoticed, see
<https://github.com/r-lib/roxygen2/issues/1491>.
As explained in the issue, to get the desired PKGNAME-package \alias
back, you should either change to the new approach and document the new
special sentinel
"_PACKAGE"
or manually add
@aliases rbin-package
if remaining with the old approach.
Please fix in your master sources as appropriate, and submit a fixed
version of your package within the next few months.
Best,
-k
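A minimal sketch of the first option described in the email, documenting the "_PACKAGE" sentinel with roxygen2. The title line and the file it would live in (commonly something like R/rbin-package.R) are assumptions; only the sentinel and the alternative @aliases tag come from the email.

```r
#' rbin: Tools for binning data
#'
#' @keywords internal
"_PACKAGE"

# Alternative from the email, if staying with the old @docType approach:
# add the following tag to the existing documentation block.
#' @aliases rbin-package
```

Only one of the two approaches is needed; after either change, re-running roxygen2 should regenerate man/rbin.Rd with the rbin-package alias.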
rb_bin_manual() should allow users to specify manual binning.
rbin_equal_freq should create bins with equal frequency.
Users should be able to bin multiple variables without having to launch rbinAddin() or rbinFactorAddin() multiple times.
User should be able to select data from RStudio instead of uploading:
rbinAddin()
rbinFactorAddin()