Hi,
I have a problem reading CSV files with the plyrmr package and the input() function. It works if I pass the R object directly as the input (see the example code below), but it fails when I read the same data from a CSV file. I get the following error in the stderr output. Any ideas for a solution?
Error: !is.null(template) is not TRUE
No traceback available
Error during wrapup:
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:365)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:579)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
My R code:
library(plyrmr)
Sys.setenv(HADOOP_CMD="/opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/opt/mapr/hadoop/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-dev-streaming.jar")
mtcars
transform(mtcars, carb.per.cyl = carb/cyl)
# working: R object passed directly to input()
as.data.frame( transform(input(mtcars), carb.per.cyl = carb/cyl) )
write.table(mtcars, file = "/mapr/my.cluster.com/tmp/test.csv", sep = ",", row.names=TRUE, col.names=FALSE)
air_format = make.input.format("csv", sep = ",")
# not working: same data read from the CSV file
as.data.frame( transform(input("/mapr/my.cluster.com/tmp/test.csv", input.format=air_format), carb.per.cyl = carb/cyl) )
packageJobJar: [/tmp/Rtmpg1L4rx/rmr-local-env2ec8d6a90cb, /tmp/Rtmpg1L4rx/rmr-global-env2ec810159853, /tmp/Rtmpg1L4rx/rmr-streaming-map2ec831a0084b, /tmp/hadoop-schmidbm/hadoop-unjar1606066649349702739/] [] /tmp/streamjob4216188614721465077.jar tmpDir=null
14/03/24 12:39:34 INFO fs.JobTrackerWatcher: Current running JobTracker is: ex4s-dev01.devproof.org/144.76.60.132:9001
14/03/24 12:39:36 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/24 12:39:36 INFO mapred.JobClient: Creating job's output directory at maprfs:/tmp/file2ec88e61485
14/03/24 12:39:36 INFO mapred.JobClient: Creating job's user history location directory at maprfs:/tmp/file2ec88e61485/_logs
14/03/24 12:39:37 INFO streaming.StreamJob: getLocalDirs(): [/tmp/mapr-hadoop/mapred/local]
14/03/24 12:39:37 INFO streaming.StreamJob: Running job: job_201403201120_0338
14/03/24 12:39:37 INFO streaming.StreamJob: To kill this job, run:
14/03/24 12:39:37 INFO streaming.StreamJob: /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201403201120_0338
14/03/24 12:39:37 INFO streaming.StreamJob: Tracking URL: http://ex4s-dev01.devproof.org:50030/jobdetails.jsp?jobid=job_201403201120_0338
14/03/24 12:39:39 INFO streaming.StreamJob: map 0% reduce 0%
14/03/24 12:40:14 INFO streaming.StreamJob: map 50% reduce 0%
14/03/24 12:40:21 INFO streaming.StreamJob: map 0% reduce 0%
14/03/24 12:40:26 INFO streaming.StreamJob: map 50% reduce 0%
14/03/24 12:40:36 INFO streaming.StreamJob: map 0% reduce 0%
14/03/24 12:41:02 INFO streaming.StreamJob: map 50% reduce 0%
14/03/24 12:41:08 INFO streaming.StreamJob: map 100% reduce 100%
14/03/24 12:41:08 INFO streaming.StreamJob: To kill this job, run:
14/03/24 12:41:08 INFO streaming.StreamJob: /opt/mapr/hadoop/hadoop-0.20.2/bin/../bin/hadoop job -Dmapred.job.tracker=maprfs:/// -kill job_201403201120_0338
14/03/24 12:41:08 INFO streaming.StreamJob: Tracking URL: http://ex4s-dev01.devproof.org:50030/jobdetails.jsp?jobid=job_201403201120_0338
14/03/24 12:41:08 ERROR streaming.StreamJob: Job not successful. Error: NA
14/03/24 12:41:08 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
Deleted maprfs:/tmp/file2ec86959dfd6
Deleted maprfs:/tmp/file2ec85d6e85fe
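One thing that may be worth checking in the code above: the file is written with col.names=FALSE, so it has no header line, and the first field of every row is the row name. The sketch below uses base R only (no Hadoop) to show what such a file looks like when it is read back and how explicit column names make carb and cyl available again. Whether this is what triggers the template error is an assumption, not something the logs confirm. The temporary path and the column name "model" are made up for illustration.

```r
# Base R only: inspect what the write.table() call above produces.
# csv_path is a hypothetical local path standing in for the /mapr/... file.
csv_path <- tempfile(fileext = ".csv")
write.table(mtcars, file = csv_path, sep = ",",
            row.names = TRUE, col.names = FALSE)

# Without a header line, read.table() falls back to default names V1, V2, ...
no_names <- read.table(csv_path, sep = ",")
print(names(no_names))

# Supply the names explicitly; "model" is a made-up name for the row-name column.
col_names <- c("model", names(mtcars))
with_names <- read.table(csv_path, sep = ",", col.names = col_names)

# With named columns, the same transform() expression can be evaluated.
result <- transform(with_names, carb.per.cyl = carb / cyl)
print(head(result[, c("model", "carb", "cyl", "carb.per.cyl")], 3))

# Untested assumption: rmr2 forwards extra csv-format arguments to read.table(),
# so the same names might be passed on the cluster like this:
# air_format <- make.input.format("csv", sep = ",", col.names = col_names)
```

If the columns do turn out to be unnamed on the cluster as well, the expression carb/cyl would have nothing to refer to, which would be consistent with the job failing in the map phase.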
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C
[6] LC_MESSAGES=C LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=C LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyrmr_0.1.0 hydroPSO_0.3-3 R.methodsS3_1.6.1 pryr_0.1 rmr2_3.1.0 caTools_1.16
[7] plyr_1.8.1 stringr_0.6.2 reshape2_1.2.2 functional_0.4 digest_0.6.4 bitops_1.0-6
[13] RJSONIO_1.0-3 Rcpp_0.11.1
loaded via a namespace (and not attached):
[1] Formula_1.1-1 Hmisc_3.14-3 RColorBrewer_1.0-5 cluster_1.15.1 codetools_0.2-8 grid_3.0.1
[7] lattice_0.20-27 latticeExtra_0.6-26 sp_1.0-14 splines_3.0.1 survival_2.37-7 tools_3.0.1
[13] zoo_1.7-11
Complete Hadoop logs:
Loading objects:
.Random.seed
air_format
Loading objects:
backend.parameters
combine
combine.file
combine.line
debug
default.input.format
default.output.format
in.folder
in.memory.combine
input.format
libs
map
map.file
map.line
out.folder
output.format
pkg.opts
postamble
preamble
profile.nodes
reduce
reduce.file
reduce.line
rmr.global.env
rmr.local.env
save.env
vectorized.reduce
verbose
work.dir
Loading required package: plyrmr
Loading required package: Rcpp
Loading required package: rmr2
Loading required package: RJSONIO
Loading required package: methods
Loading required package: bitops
Loading required package: digest
Loading required package: reshape2
Loading required package: stringr
Loading required package: plyr
Loading required package: caTools
Loading required package: pryr
Loading required package: R.methodsS3
R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
Loading required package: hydroPSO
(C) 2011-2013 M. Zambrano-Bigiarini and R. Rojas (GPL >=2 license)
Type 'citation('hydroPSO')' to see how to cite this package
Attaching package: ‘plyrmr’
The following object is masked from ‘package:pryr’:
The following object is masked from ‘package:rmr2’:
The following objects are masked from ‘package:plyr’:
The following object is masked from ‘package:reshape2’:
The following objects are masked from ‘package:base’:
intersect, rbind, sample, union
Error: !is.null(template) is not TRUE
No traceback available
Error during wrapup:
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:365)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:579)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
syslog logs
2014-03-24 12:41:03,837 INFO org.apache.hadoop.mapred.Child: JVM: jvm_201403201120_0338_m_-442050359 pid: 12813
2014-03-24 12:41:04,175 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/jars/rmr-global-env2ec810159853 <- /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/attempt_201403201120_0338_m_000001_3/work/rmr-global-env2ec810159853
2014-03-24 12:41:04,176 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/jars/rmr-local-env2ec8d6a90cb <- /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/attempt_201403201120_0338_m_000001_3/work/rmr-local-env2ec8d6a90cb
2014-03-24 12:41:04,176 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/jars/job.jar <- /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/attempt_201403201120_0338_m_000001_3/work/job.jar
2014-03-24 12:41:04,177 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/jars/.job.jar.crc <- /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/attempt_201403201120_0338_m_000001_3/work/.job.jar.crc
2014-03-24 12:41:04,177 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/jars/rmr-streaming-map2ec831a0084b <- /tmp/mapr-hadoop/mapred/local/taskTracker/schmidbm/jobcache/job_201403201120_0338/attempt_201403201120_0338_m_000001_3/work/rmr-streaming-map2ec831a0084b
2014-03-24 12:41:04,198 INFO org.apache.hadoop.mapred.Child: Starting task attempt_201403201120_0338_m_000001_3
2014-03-24 12:41:04,199 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2014-03-24 12:41:04,308 INFO org.apache.hadoop.mapreduce.util.ProcessTree: setsid exited with exit code 0
2014-03-24 12:41:04,311 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: /proc//status does not have information about swap space used(VmSwap). Can not track swap usage of a task.
2014-03-24 12:41:04,312 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@57271b36
2014-03-24 12:41:04,449 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2014-03-24 12:41:04,502 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/usr/bin/Rscript, --vanilla, ./rmr-streaming-map2ec831a0084b]
2014-03-24 12:41:04,559 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:NA [rec/s] out:NA [rec/s]
2014-03-24 12:41:04,563 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=10/0/0 in:NA [rec/s] out:NA [rec/s]
2014-03-24 12:41:06,941 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2014-03-24 12:41:06,941 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!