
dmwr2's Introduction

DMwR2

An R package with functions and data supporting the second edition of the book Data Mining with R, by Luis Torgo, published by CRC Press.

To Install the Latest Official Stable Release do the following in R:

library(devtools)  # You need to install this package!
install_github("ltorgo/DMwR2",ref="master")

To Install the Latest Development Release do the following in R:

library(devtools)  # You need to install this package!
install_github("ltorgo/DMwR2",ref="develop")

After installing with either of the above procedures, the package can be used like any other R package:

 library(DMwR2)

dmwr2's People

Contributors

bmd123, ltorgo

dmwr2's Issues

Error with sampleCSV because of space in directory name

I downloaded “flights14.csv” from the following site.
https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data

I am working on macOS High Sierra Version 10.13.3. I am using R version 3.5.0 (2018-04-23) -- "Joy in Playing" with DMwR2_0.0.2. I created “test 1” as a subdirectory in “Users” as an example.
/Users/test 1

The following line does not execute with the “space” between “test” and “1” in the subdirectory name:
flights1000_1 <- sampleCSV(file = "/Users/test 1/flights14.csv", percORn = 1000, header = T)

The following message is displayed in the console.
wc: /Users/test: open: No such file or directory
wc: 1/flights14.csv: open: No such file or directory
sh: /Users/test: Permission denied
Error: '/Users/test 1/flights14.csv.tmp.csv' does not exist.
In addition: Warning message:
In system(paste("wc -l", f), intern = TRUE) :
running command 'wc -l /Users/test 1/flights14.csv' had status 1
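
The warning shows the cause: the command is assembled with paste("wc -l", f), so the shell splits the unquoted path at the space. A minimal sketch of the usual remedy, quoting the path with base R's shQuote before it reaches the shell, follows. The helper name count_lines_cmd is hypothetical and is not part of DMwR2; whether the package should adopt exactly this is the maintainer's call.

```r
# Hypothetical helper (not part of DMwR2): build the `wc -l` command with
# the file path quoted so the shell treats it as a single argument.
count_lines_cmd <- function(f) {
  paste("wc -l", shQuote(f, type = "sh"))
}

path <- "/Users/test 1/flights14.csv"

# Unquoted, as in the warning above: the shell sees two separate arguments.
unquoted_cmd <- paste("wc -l", path)

# Quoted: the whole path is wrapped in single quotes.
quoted_cmd <- count_lines_cmd(path)

print(unquoted_cmd)
print(quoted_cmd)
```

Until the package quotes the path itself, placing the file in a directory whose name has no spaces should sidestep the error.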

sampleCSV not reading header

I downloaded “flights14.csv” from the following site.
https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data

I am working on macOS High Sierra Version 10.13.3. I am using R version 3.5.0 (2018-04-23) -- "Joy in Playing" with DMwR2_0.0.2. I created “test2” as a subdirectory in “Users” as an example.
/Users/test2

The column names are not displayed in the tibble when I run the following line. Instead, it seems to use one of the rows of flight data as the column names.
flights1000_2 <- sampleCSV(file = "/Users/test2/flights14.csv", percORn = 1000, header = T)

The following message is displayed in the console.
Parsed with column specification:
cols(
2014 = col_integer(),
1 = col_integer(),
1_1 = col_integer(),
847 = col_integer(),
-3 = col_integer(),
1036 = col_integer(),
1_2 = col_integer(),
0 = col_integer(),
AA = col_character(),
N553AA = col_character(),
313 = col_integer(),
LGA = col_character(),
ORD = col_character(),
139 = col_integer(),
733 = col_integer(),
8 = col_integer(),
47 = col_integer()
)
Warning message:
Duplicated column names deduplicated: '1' => '1_1' [3], '1' => '1_2' [7]
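
The column specification suggests that a sampled data row was taken as the header. As a possible workaround, the sketch below samples rows in base R while always keeping the file's first line as the header. The function sample_csv_rows and the demo file are made up for illustration; this is not how sampleCSV is implemented, and it reads the whole file into memory, so it only suits files that fit in RAM.

```r
# Hypothetical helper, not DMwR2's sampleCSV: sample n data rows from a CSV
# file while always keeping the first line as the header.
sample_csv_rows <- function(file, n) {
  lines <- readLines(file)
  header <- lines[1]
  body <- lines[-1]
  # Pick row positions at random, then restore the original file order.
  keep <- sort(sample.int(length(body), min(n, length(body))))
  read.csv(text = c(header, body[keep]), header = TRUE,
           stringsAsFactors = FALSE)
}

# Small made-up file so the sketch runs without flights14.csv.
demo_file <- tempfile(fileext = ".csv")
demo <- data.frame(year = 2014L, month = rep(1:2, 5),
                   carrier = rep(c("AA", "UA"), 5))
write.csv(demo, demo_file, row.names = FALSE)

set.seed(1)
sampled <- sample_csv_rows(demo_file, 4)
print(names(sampled))
```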

SMOTE creating floats from integers

Thank you for the time and effort you put into DMwR; it is incredibly useful, and SMOTE has been especially so.

Question
Do you have a suggestion for how to stop SMOTE from creating floats in columns whose distinct values are all integers? That is, I pass integers into SMOTE and it returns a mix of integers and floats.

Problem
For example, running the following creates floats from columns containing only integers:
dataset.bal <- SMOTE(target ~ ., dataset, perc.over=675, perc.under=100);

Input

# distinct values in column 124 of dataset are all integers
levels(as.factor(dataset[,124]));

[1] "0" "4" "8" "12" "16" "20" "24" "28" "32" "36" "40" "44" "48" "52" "56" "60" "64" "68" "72" "76" "80" "84"

Output

# float values have been added in column 124 of dataset.bal
levels(as.factor(dataset.bal[,124]));

[1] "0" "4" "8" "12" "16" "17.4798878207803"
[7] "17.8020473793149" "20" "20.5099524259567" "20.6726439939812" "20.7490051034838" "21.7515262812376"
[13] "23.4109582398087" "23.5898735374212" "24" "24.0396314188838" "24.0953625235707" "24.148680685088"
[19] "24.2740816418082" "24.3701336197555" "24.5072071170434" "24.5087074693292" "24.5124940760434" "24.7718890225515" etc..
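
SMOTE builds each synthetic case by interpolating between a minority case and one of its nearest neighbours, so fractional values in numeric columns are expected. One possible post-processing step, sketched below, rounds the columns that were integer-valued in the input. The function restore_integer_columns and the toy data are hypothetical and not part of DMwR or DMwR2.

```r
# Hypothetical helper (not part of DMwR or DMwR2): round every numeric
# column of `balanced` whose counterpart in `original` held only whole numbers.
restore_integer_columns <- function(original, balanced) {
  for (col in intersect(names(original), names(balanced))) {
    x <- original[[col]]
    whole <- is.numeric(x) && all(x == round(x), na.rm = TRUE)
    if (whole && is.numeric(balanced[[col]])) {
      balanced[[col]] <- round(balanced[[col]])
    }
  }
  balanced
}

# Toy data standing in for `dataset` and `dataset.bal` from the report.
original <- data.frame(a = c(0, 4, 8), b = c(0.5, 1.5, 2.5))
balanced <- data.frame(a = c(0, 4, 17.4798, 20.51), b = c(0.5, 1.2, 2.2, 2.4))

fixed <- restore_integer_columns(original, balanced)
print(fixed)
```

Rounding restores whole numbers but can still produce values that never occurred in the input, such as 17 in a column that only held multiples of 4. Snapping to the nearest observed value would need a different step.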

Chapter 3: Monte Carlo workflow issue

Thanks for this second edition of the DMwR book.

One of the more exciting parts (I think) is the Monte Carlo evaluations in chapter 3. My first step is to replicate the runs from the book. However, I get stuck when executing the Monte Carlo workflow, which fails with the error below. It feels like a simple error, but I've tried all sorts of ways to make it work and I can't see where the missing values could be. Am I the only one in this forum who has got stuck on this part?

PERFORMANCE ESTIMATION USING MONTE CARLO

** PREDICTIVE TASK :: SP500

++ MODEL/WORKFLOW :: earthRegr.v1
Error in if (!missing(cluster) && !is.null(cluster) && getOption("parallelMap.status") == :
missing value where TRUE/FALSE needed
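
One plausible reading of this message, offered as an assumption rather than a confirmed diagnosis: the option parallelMap.status is not set in the session, so getOption() returns NULL, the == comparison yields a zero-length logical, and the if() condition ends up without a TRUE/FALSE value. The sketch below reproduces only that base R behaviour. The option name demo.unset.status and the value "stopped" are made up as stand-ins. It also shows a comparison that always returns a single TRUE or FALSE.

```r
# "demo.unset.status" is a made-up option name standing in for an option
# that has not been set in the current session.
status <- getOption("demo.unset.status")

# Comparing NULL with a string gives a zero-length logical, not TRUE/FALSE.
fragile <- status == "stopped"

# identical() always returns exactly one TRUE or FALSE, even for NULL.
robust <- identical(status, "stopped")

print(is.null(status))
print(length(fragile))
print(robust)
```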
