rcppcnpy's Introduction

RcppCNPy: Rcpp bindings for NumPy files

About

This package uses the cnpy library written by Carl Rogers to provide read and write facilities for files created with (or for) the NumPy extension for Python. Vectors and matrices of numeric types can be read from and written to plain files as well as compressed files. Support for integer files is available if the package has been built with -std=c++11, which is the default starting with release 0.2.3 (following the release of R 3.1.0) and is available on all platforms following the release of R 3.3.0 with the updated 'Rtools'.

Example

The following Python code

>>> import numpy as np
>>> fm = np.arange(12).reshape(3,4) * 1.1
>>> fm
array([[  0. ,   1.1,   2.2,   3.3],
       [  4.4,   5.5,   6.6,   7.7],
       [  8.8,   9.9,  11. ,  12.1]])
>>> np.save("fmat.npy", fm)
>>> 
>>> im = np.arange(12).reshape(3,4)
>>> im
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> np.save("imat.npy", im)
>>> 

saves two matrices in floating-point and integer representation.

The following R code reads the files and assigns them to variables:

R> library(RcppCNPy)
R> fmat <- npyLoad("fmat.npy")
R> fmat
     [,1] [,2] [,3] [,4]
[1,]  0.0  1.1  2.2  3.3
[2,]  4.4  5.5  6.6  7.7
[3,]  8.8  9.9 11.0 12.1
R> 
R> imat <- npyLoad("imat.npy", "integer")
R> imat
     [,1] [,2] [,3] [,4]
[1,]    0    1    2    3
[2,]    4    5    6    7
[3,]    8    9   10   11
R> 

Going the opposite way by saving in R and reading in Python works equally well. An extension not present in CNPy allows reading and writing of gzip-compressed files.

The package has been tested and used on several architectures, and copes correctly with switches between little-endian and big-endian byte order.
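To illustrate what an endian switch means for the bytes on disk, here is a small sketch using Python's standard struct module. It only illustrates byte order and does not describe how RcppCNPy handles it internally; in an NPY header, little-endian doubles are marked '<f8' and big-endian doubles '>f8'.

```python
import struct

value = 1.1

little = struct.pack("<d", value)  # byte layout described as '<f8'
big = struct.pack(">d", value)     # byte layout described as '>f8'

# The two encodings hold the same eight bytes in opposite order.
print(little == big[::-1])         # True

# Decoding with the matching byte order recovers the value;
# decoding with the wrong one yields an unrelated number.
right = struct.unpack("<d", little)[0]
wrong = struct.unpack(">d", little)[0]
print(right == value, wrong == value)  # True False
```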

More details are available in the package vignette.

Installation

The package is on CRAN and can be installed with:

R> install.packages("RcppCNPy")

Status

On CRAN, stable and mostly feature-complete.

Alternative: reticulate

The reticulate package can also provide easy and comprehensive access to NumPy data; see the additional vignette in RcppCNPy for examples and more details.

Feedback

Contributions are welcome. Please use the GitHub issue tracker for bug reports, feature requests or general discussions before sending pull requests.

Author

Dirk Eddelbuettel and Wush Wu

License

GPL (>= 2)

rcppcnpy's People

Contributors

ck37, eddelbuettel, garrettmooney, romainfrancois, wush978

rcppcnpy's Issues

npyLoad() not working with the last version of R

Hello Dirk,

I think the npyLoad function is not working properly with the latest version of R (R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"):

rm(list = ls(all = TRUE))
library(tidyverse)
library(RcppCNPy)

genome_selection <- npyLoad("trial_inv_pcangsd_sca4_e2.selection.npy")

I can't attach the input file, as the format is not supported by GitHub.

New feature: Loading .npz

The ability to load .npz files in R has been a long-desired addition to the rcppcnpy repository. An adept C++ programmer should tackle this issue and submit a pull request.
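Until native support exists, one possible workaround relies on the fact that an .npz file is an ordinary ZIP archive whose members are individual .npy files. The sketch below uses only the Python standard library to unpack each member so that it can be loaded on its own with npyLoad(). The helper name unpack_npz and the file names in the usage note are hypothetical and are not part of RcppCNPy or NumPy.

```python
import zipfile
from pathlib import Path


def unpack_npz(npz_path, out_dir):
    """Extract every .npy member of an .npz archive into out_dir.

    Returns the list of extracted file paths.
    (Hypothetical helper, not part of RcppCNPy or NumPy.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(npz_path) as archive:
        for name in archive.namelist():
            if name.endswith(".npy"):
                # extract() returns the path of the file it wrote;
                # compressed members are decompressed on the way out
                extracted.append(archive.extract(name, path=out_dir))
    return extracted
```

For example, unpack_npz("data.npz", "unpacked") would write one .npy file per stored array into the unpacked directory. Each file is then subject to the same dtype and dimension limits as any other .npy file read by npyLoad().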

Addition to vignette for 3D examples

Thanks for the help over on SO. Here's an example that you could add to the vignette for 3D examples:

Define (and look at) a 3D array in Python:

import numpy as np
arr = np.array(list(range(24))).reshape(4,3,2)
np.save("arr", arr)
print(arr[:,:,0])
print(arr[:,:,1])

This prints:

[[ 0  2  4]
 [ 6  8 10]
 [12 14 16]
 [18 20 22]]
[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]
 [19 21 23]]

Open it in R:

> library(reticulate)
> np <- import("numpy")
> # data reading
> arr <- np$load("arr.npy")
> arr
, , 1

     [,1] [,2] [,3]
[1,]    0    2    4
[2,]    6    8   10
[3,]   12   14   16
[4,]   18   20   22

, , 2

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    7    9   11
[3,]   13   15   17
[4,]   19   21   23

Data is improperly read from binary int32 array.

Here is my NumPy array (also available as an .npy file):

array([[0],
       [0],
       [0],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1]], dtype=int32)

Written to disk with:

numpy.save("array_int32.npy", arrayInt)

Loading in R gives me very different values:

> library(RcppCNPy)
> numpyArray <- npyLoad("array_int32.npy")
> numpyArray
               [,1]
 [1,]  0.000000e+00
 [2,] 2.121996e-314
 [3,] 2.121996e-314
 [4,] 2.121996e-314
 [5,] 2.121996e-314
 [6,] 2.929809e-321
 [7,] 1.865035e-314
 [8,] 4.688588e-310
 [9,] 4.688589e-310
[10,] 4.688589e-310
> numpyArrayInt <- npyLoad("array_int32.npy","integer")
> numpyArrayInt
            [,1]
 [1,]          0
 [2,]          0
 [3,]          1
 [4,]          0
 [5,]          0
 [6,]        593
 [7,] -520093683
 [8,]  781958720
 [9,]  807221072
[10,]  807238832

I'm using the CRAN package of RcppCNPy installed today with R 3.6.3 and numpy 1.13.3 on Ubuntu 18.04.4 LTS.
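A later report in this tracker notes that arrays not stored as float64 may load incorrectly, so it can help to check which dtype a file actually holds before calling npyLoad(). The following is a minimal sketch using only the Python standard library. It follows the published NPY layout (magic string, two version bytes, a little-endian header length, then a Python dict literal); the helper name read_npy_header is hypothetical.

```python
import ast
import struct


def read_npy_header(path):
    """Return the header dict ('descr', 'fortran_order', 'shape') of an .npy file.

    Hypothetical helper for inspection only; it does not read the data.
    """
    with open(path, "rb") as fh:
        if fh.read(6) != b"\x93NUMPY":
            raise ValueError("not an .npy file")
        major, _minor = fh.read(2)
        if major == 1:
            # version 1.0: header length is an unsigned 16-bit little-endian int
            (header_len,) = struct.unpack("<H", fh.read(2))
        else:
            # versions 2.0 and 3.0: unsigned 32-bit little-endian int
            (header_len,) = struct.unpack("<I", fh.read(4))
        encoding = "utf-8" if major >= 3 else "latin1"
        header = fh.read(header_len).decode(encoding)
    # The header is a Python dict literal padded with spaces and a newline
    return ast.literal_eval(header.strip())
```

A little-endian int32 file reports a 'descr' of '<i4', while a float64 file reports '<f8'. If the value is anything other than '<f8', converting the array to float64 in Python before saving is one option worth trying.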

Segmentation fault at the installation of the package

Hello,

I'm a complete R newbie. Sorry if it's something "obvious".
I'm trying to install the RcppCNPy package on my local computer and I get a segfault:

> install.packages('RcppCNPy', repos='http://cran.us.r-project.org')
trying URL 'http://cran.us.r-project.org/src/contrib/RcppCNPy_0.2.10.tar.gz'
Content type 'application/x-gzip' length 374200 bytes (365 KB)
==================================================
downloaded 365 KB

* installing *source* package ‘RcppCNPy’ ...
** package ‘RcppCNPy’ successfully unpacked and MD5 sums checked
** libs
g++ -std=gnu++11 -I/home/pierre/anaconda3/envs/Rstuff/lib/R/include -DNDEBUG  -I"/home/pierre/anaconda3/envs/Rstuff/lib/R/library/Rcpp/include" -I/home/pierre/anaconda3/envs/Rstuff/include   -fpic  -I/home/pierre/anaconda3/envs/Rstuff/include -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I/home/pierre/anaconda3/envs/Rstuff/lib/R/include -DNDEBUG  -I"/home/pierre/anaconda3/envs/Rstuff/lib/R/library/Rcpp/include" -I/home/pierre/anaconda3/envs/Rstuff/include   -fpic  -I/home/pierre/anaconda3/envs/Rstuff/include -c cnpy.cpp -o cnpy.o
g++ -std=gnu++11 -I/home/pierre/anaconda3/envs/Rstuff/lib/R/include -DNDEBUG  -I"/home/pierre/anaconda3/envs/Rstuff/lib/R/library/Rcpp/include" -I/home/pierre/anaconda3/envs/Rstuff/include   -fpic  -I/home/pierre/anaconda3/envs/Rstuff/include -c cnpyMod.cpp -o cnpyMod.o
g++ -std=gnu++11 -shared -L/home/pierre/anaconda3/envs/Rstuff/lib/R/lib -L/home/pierre/anaconda3/envs/Rstuff/lib -lgfortran -o RcppCNPy.so RcppExports.o cnpy.o cnpyMod.o -lz -L/home/pierre/anaconda3/envs/Rstuff/lib/R/lib -lR
installing to /home/pierre/anaconda3/envs/Rstuff/lib/R/library/RcppCNPy/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded

 *** caught segfault ***
address 0x20, cause 'memory not mapped'

Traceback:
 1: .Call(Module__functions_names, xp)
 2: Module(module, mustStart = TRUE, where = env)
 3: doTryCatch(return(expr), name, parentenv, handler)
 4: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 5: tryCatchList(expr, classes, parentenv, handlers)
 6: tryCatch(Module(module, mustStart = TRUE, where = env), error = function(e) e)
 7: loadModule(module = "cnpy", what = TRUE, env = ns, loadNow = TRUE)
 8: (function (ns) loadModule(module = "cnpy", what = TRUE, env = ns, loadNow = TRUE))(<environment>)
 9: doTryCatch(return(expr), name, parentenv, handler)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch((function (ns) loadModule(module = "cnpy", what = TRUE, env = ns, loadNow = TRUE))(<environment>),     error = function(e) e)
13: eval(substitute(tryCatch(FUN(WHERE), error = function(e) e),     list(FUN = f, WHERE = where)), where)
14: eval(substitute(tryCatch(FUN(WHERE), error = function(e) e),     list(FUN = f, WHERE = where)), where)
15: .doLoadActions(where, attach)
16: methods::cacheMetaData(ns, TRUE, ns)
17: loadNamespace(package, lib.loc)
18: doTryCatch(return(expr), name, parentenv, handler)
19: tryCatchOne(expr, names, parentenv, handlers[[1L]])
20: tryCatchList(expr, classes, parentenv, handlers)
21: tryCatch({    attr(package, "LibPath") <- which.lib.loc    ns <- loadNamespace(package, lib.loc)    env <- attachNamespace(ns, pos = pos, deps)}, error = function(e) {    P <- if (!is.null(cc <- conditionCall(e)))         paste(" in", deparse(cc)[1L])    else ""    msg <- gettextf("package or namespace load failed for %s%s:\n %s",         sQuote(package), P, conditionMessage(e))    if (logical.return)         message(paste("Error:", msg), domain = NA)    else stop(msg, call. = FALSE, domain = NA)})
22: library(pkg_name, lib.loc = lib, character.only = TRUE, logical.return = TRUE)
23: withCallingHandlers(expr, packageStartupMessage = function(c) invokeRestart("muffleMessage"))
24: suppressPackageStartupMessages(library(pkg_name, lib.loc = lib,     character.only = TRUE, logical.return = TRUE))
25: doTryCatch(return(expr), name, parentenv, handler)
26: tryCatchOne(expr, names, parentenv, handlers[[1L]])
27: tryCatchList(expr, classes, parentenv, handlers)
28: tryCatch(expr, error = function(e) {    call <- conditionCall(e)    if (!is.null(call)) {        if (identical(call[[1L]], quote(doTryCatch)))             call <- sys.call(-4L)        dcall <- deparse(call)[1L]        prefix <- paste("Error in", dcall, ": ")        LONG <- 75L        msg <- conditionMessage(e)        sm <- strsplit(msg, "\n")[[1L]]        w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")        if (is.na(w))             w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],                 type = "b")        if (w > LONG)             prefix <- paste0(prefix, "\n  ")    }    else prefix <- "Error : "    msg <- paste0(prefix, conditionMessage(e), "\n")    .Internal(seterrmessage(msg[1L]))    if (!silent && identical(getOption("show.error.messages"),         TRUE)) {        cat(msg, file = outFile)        .Internal(printDeferredWarnings())    }    invisible(structure(msg, class = "try-error", condition = e))})
29: try(suppressPackageStartupMessages(library(pkg_name, lib.loc = lib,     character.only = TRUE, logical.return = TRUE)))
30: tools:::.test_load_package("RcppCNPy", "/home/pierre/anaconda3/envs/Rstuff/lib/R/library")
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)
ERROR: loading failed
* removing ‘/home/pierre/anaconda3/envs/Rstuff/lib/R/library/RcppCNPy’

The downloaded source packages are in
	‘/tmp/Rtmpvu9zdk/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("RcppCNPy", repos = "http://cran.us.r-project.org") :
  installation of package ‘RcppCNPy’ had non-zero exit status

Any idea?
Thanks!

npySave(): 3D arrays saved as 1D npy objects

> A <- array(0:23, dim=c(2,3, 4)) * 1.1
> dim(A)
[1] 2 3 4
> npySave("farr.npy", A)
Saving Numeric Vector
> a <- npyLoad("farr.npy", type="numeric", dotranspose=TRUE)
> dim(a)
NULL

"Feature" of transposed ndarrays

Dear Dirk,

Firstly, thanks so much for this package!

So, I think I found an undocumented feature for npyLoad(), in RcppCNPy version 0.2.10 (latest version?). If I npyLoad a transposed numpy.ndarray, then the default dotranspose=TRUE results in an improper R array and dotranspose=FALSE is required. Alternatively, one can do a .copy() before numpy.save()-ing the array. I couldn't find anything in your documentation that addresses this.

Best,
Jay
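One plausible explanation (an assumption, not confirmed in this thread) is memory layout. When np.save is given a transposed, Fortran-contiguous array, it records 'fortran_order': True in the header and stores the values column by column, whereas .copy() produces a row-major array first. The sketch below uses plain Python lists to show how the same flat buffer yields different matrices under the two interpretations; the function names are illustrative only.

```python
def unflatten_c(flat, rows, cols):
    """Interpret a flat buffer as row-major (C order)."""
    return [[flat[i * cols + j] for j in range(cols)] for i in range(rows)]


def unflatten_fortran(flat, rows, cols):
    """Interpret a flat buffer as column-major (Fortran order)."""
    return [[flat[j * rows + i] for j in range(cols)] for i in range(rows)]


# Values of the 2 x 3 matrix [[0, 1, 2], [3, 4, 5]] written column by column,
# as happens when the header says fortran_order is True.
flat = [0, 3, 1, 4, 2, 5]

print(unflatten_fortran(flat, 2, 3))  # [[0, 1, 2], [3, 4, 5]]
print(unflatten_c(flat, 2, 3))        # [[0, 3, 1], [4, 2, 5]]
```

A reader that assumes one order while the file uses the other returns every value, but in the wrong positions, which would be consistent with the "improper R array" described above.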

Different Result via numpy Read

I noticed that reading a NumPy file produced a result that differs from the corresponding source file in rds format:

[1] "R head"
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
[1] "numpy head"
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0

The files, together with the R code, are available at https://github.com/andreasmeid/RcppCNPy

[Edited by @eddelbuettel, who continues to point out that this is not really reproducible, as we do not know what created y.npy and y.rds.]

Segmentation fault when loading large npy files (30GB input)

Hi there!

I tried loading the npy file from the Skymap dataset (Hannah Carter Lab, in the efs/rnaseq_merged/ folder), but it fails to load. The file is 30GB in size and is not gzip-compressed. Its dimensions are 34677 rows × 225203 columns, and it was saved using this command: np.save(filename+".npy",myDF.values)

I'm not sure why it is crashing. The output is below. Please give it a go.

Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(RcppCNPy)
> mmat = npyLoad("Mus_musculus.gene_symbol.tpm.npy")

 *** caught segfault ***
address 0x7f1d955e7000, cause 'memory not mapped'

Traceback:
 1: .External(list(name = "InternalFunction_invoke", address = <pointer: 0x22f3160>,     dll = list(name = "Rcpp", path = "/hpc/packages/minerva-common/rpackages/3.4.3/site-library/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x2218080>,         info = <pointer: 0xc19e00>), numParameters = -1L), <pointer: 0x221acf0>,     filename, type, dotranspose)
 2: npyLoad("Mus_musculus.gene_symbol.tpm.npy")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 3
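A quick back-of-the-envelope check may help narrow this down. The sketch below computes the element count for the reported dimensions and compares it with 2^31 - 1, the largest signed 32-bit integer. Whether such a limit, the memory footprint, or the element type is the actual cause of the crash is not established in this report.

```python
rows, cols = 34677, 225203          # dimensions given in the report
n_elements = rows * cols

INT32_MAX = 2**31 - 1               # largest signed 32-bit integer

print(n_elements)                   # 7809364431
print(n_elements > INT32_MAX)       # True

# Approximate size of the raw data for two common element widths
for name, width in (("float32", 4), ("float64", 8)):
    gib = n_elements * width / 2**30
    print(f"{name}: {gib:.1f} GiB")
```

The float32 figure comes out at roughly 29 GiB, close to the reported 30GB file, which suggests (but does not prove) that the array is stored as float32. R numeric matrices use 8-byte doubles, so holding the same data in R would need about twice that amount of memory.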

npyLoad yields wrong values

Dear Dirk Eddelbuettel,

While using the npyLoad function in R, I encountered the following issue: when I try to load an npy file (saved from Python), the matrix values are completely wrong.

To illustrate, in Python I have the following numpy array:

Shape: (10, 4)
Dtype: float32
[[ 1. 24. 59. 465.]
[ 1. 26. 166. 466.]
[ 1. 25. 35. 458.]
[ 1. 28. 36. 465.]]

I then export via "np.save('output.npy', )".

Upon loading the file in R via "npyLoad(filename = 'output.npy')" I however get the following:

              [,1]          [,2]           [,3]          [,4]
[1,]  5.368710e+08  1.412329e+19   1.073742e+09  1.441152e+19
[2,]  8.053065e+08  1.210568e+19   2.147484e+09  1.412329e+19
[3,] 5.928788e-323 5.928788e-323  4.940656e-324 8.889260e+247
[4,] 8.573468e-315 1.946940e-308 -2.681562e+154 1.375846e-315

In contrast, exporting the same array via "np.savetxt" and importing it in R via "read.csv" yields the expected (i.e., correct) data.

It would be great if you could help me out with this issue.

Best,

Florian Müller.

dimension mixup

Encountered a problem with this file: global_signals.npy.zip

It was written in numpy as a [415x518] array.

In Python:

import numpy as np

gs = np.load("global_signals.npy")
print(gs.shape)
# prints (415, 518)

However, when I read it in R, it is a [518x415] matrix and, most importantly, the values are filled in the wrong order of dimensions.
(You can clearly see it, because most of the values at index 415 of the second dimension should be 0.)

I had to apply the following workaround in R to obtain the same matrix:

library(RcppCNPy)
gs <- npyLoad('data/global_signals.npy')
gsv <- as.vector(t(gs))
gs <- array(gsv,dim=dim(gs))

npyLoad crashes or shows wrong values if array is not Python type float64

Hi again,

I might as well mention this too. If the numpy array is not set to float64 (e.g., float32 or int64), npyLoad() either crashes or shows incorrect values. I haven't looked further into this, but if you want me to suggest changes to the documentation for now, I can do a pull request for that.

Jay

three dimensional array request

What would it take to expand this package to read three-dimensional .npy arrays into R? I am working with some global climate data from netCDF files that I processed into a stack of 100 yearly global means with dimensions of 720 x 360 x 100. I would love to read these 3D npy files directly into R without having to convert them back into netCDF first, that is, something that would take 100 global maps of 720 x 360 as a [720, 360, 100] array. Is this possible?
