Giter Site home page Giter Site logo

monetdb / monetdblite-r Goto Github PK

View Code? Open in Web Editor NEW
64.0 64.0 13.0 3.06 MB

MonetDB reconfigured as an R package - See below for an introduction. Edit

R 1.90% C 95.32% Shell 0.01% CMake 0.20% Makefile 0.17% Python 0.03% Yacc 2.36%
column-store database dbi monetdb rstats sql-database

monetdblite-r's Introduction

The MonetDB Database System

The MonetDB database was originally developed by the CWI database research group (see https://www.monetdb.org/). Development has now shifted to the spin-off company MonetDB Solutions.

Via the MonetDB project we have brought the MonetDB system in open source, where it is accessible at https://www.monetdb.org/easy-setup/. Even though development happens mostly in a company, the MonetDB database system will remain open source. It is available under the Mozilla Public License 2.0.

The MonetDB database system is a high-performance database kernel for query-intensive applications. The MonetDB source can be found at our Mercurial server. There is also a github mirror that is updated once an hour.

Building

MonetDB is built using the cmake program. It is recommended to build in a directory that is not inside the source tree. In order to build, use the following commands when inside your build directory:

cmake [options] /path/to/monetdb/source
cmake --build .
cmake --build . --target install

In order to install into a different directory than the default /usr/local, add the option -DCMAKE_INSTALL_PREFIX:

cmake -DCMAKE_INSTALL_PREFIX=/path/to/install/monetdb /path/to/monetdb/source
cmake --build .
cmake --build . --target install

Build Options

There are many options that can be used to select how MonetDB is to be built. Options can be turned ON or OFF using a -D flag on the first of the cmake command lines. Except when specified otherwise, options are ON when the relevant libraries can be found. Available options are:

Option Explanation
ASSERT Enable asserts (default=ON for development sources, OFF for tarball installation)
CINTEGRATION Enable support for C UDFs (default=ON except on Windows)
CMAKE_SUMMARY Show a summary of the cmake configuration (for debug purposes, default=OFF)
CMAKE_UNITTESTS Build and run the unittest for the build system (default=OFF)
FITS Enable support for FITS
GEOM Enable support for geom module (using libgeos library)
INT128 Enable support for 128-bit integers (if compiler supports them)
NETCDF Enable support for netcdf
ODBC Compile the MonetDB ODBC driver
PY3INTEGRATION Enable support for Python 3 integration into MonetDB
RINTEGRATION Enable support for R integration into MonetDB
SANITIZER Enable support for the GCC address sanitizer (default=OFF)
SHP Enable support for ESRI Shapefiles
STRICT Enable strict compiler flags (default=ON for development sources, OFF for tarball installation)
TESTING Enable support for testing
WITH_BZ2 Include bz2 support
WITH_CMOCKA Include cmocka support (default=OFF)
WITH_CURL Include curl support
WITH_LZ4 Include lz4 support
WITH_LZMA Include lzma support
WITH_OPENSSL Include TLS support (secure client/server connection)
WITH_PCRE Include pcre support
WITH_PROJ Include proj support
WITH_READLINE Include readline support
WITH_VALGRIND Include valgrind support
WITH_XML2 Include xml2 support
WITH_ZLIB Include zlib support

Required and Optional Packages

On Fedora, the following packages are required: bison, cmake, gcc, pkgconf, python3.

The following packages are optional but recommended: bzip2-devel, lz4-devel, openssl-devel, pcre-devel, readline-devel, xz-devel, zlib-devel.

The following packages are optional: cfitsio-devel, gdal-devel, geos-devel, libasan, libcmocka-devel, libcurl-devel, libxml2-devel, netcdf-devel, proj-devel, python3-devel, python3-numpy, R-core-devel, unixODBC-devel, valgrind-devel.

On Ubuntu and Debian the following packages are required: bison, cmake, gcc, pkg-config, python3.

The following packages are optional but recommended: libbz2-dev, liblz4-dev, libpcre3-dev, libreadline-dev, liblzma-dev, libssl-dev, zlib1g-dev.

The following packages are optional: libasan5, libcfitsio-dev, libcmocka-dev, libcurl4-gnutls-dev, libgdal-dev, libgeos-dev, libnetcdf-dev, libproj-dev, libxml2-dev, python3-dev, python3-numpy, r-base-dev, unixodbc-dev, valgrind.

cmake must be at least version 3.12, python must be at least version 3.5.

Bugs

We of course hope there aren't any, but if you do find one, you can report bugs in our github repository.

Please note that we do not accept github Pull Requests. See the developers page for instructions.

Copyright Notice

SPDX-License-Identifier: MPL-2.0

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

Copyright 2024 MonetDB Foundation; Copyright August 2008 - 2023 MonetDB B.V.; Copyright 1997 - July 2008 CWI.

monetdblite-r's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

monetdblite-r's Issues

[request] support for integer64 from bit64 package

Would be great to have support for mapping of R's bit64::integer64 to monetdb BIGINT. (I believe this is quite simple since bit64::integer64 is just long long and I'm pretty sure same as BIGINT)

I can't find a solution how to keep integer64. Is there any workaround?

import txt format data

I noticed that MonetDBLite can only import csv file data, could you add support for txt (fixed width and tab delimited) format data?

Data set is getting corrupted when put in MonetDB

I'm trying to put a really large dataset of tweets into MonetDB using MonetDBLite. I have already preprocessed the data and I have over 500M rows and 123 columns, which are a mixture of string, double, logical and date columns. The data are stored in chunks in .rds format which row_bind() just fine, indicating colnames and col_types are uniform. However, when I put the data in MonetDB, I realised it gets corrupted. For instance, id_str are tweet IDs in string format and they should have same nchar() across all tweets. When I count the nchars() before putting the data in MonetDB, it works just fine

> temp_data %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   count(id_nchar)
# A tibble: 1 x 2
  id_nchar     n
     <int> <int>
1       18 34403

However, when I try the same after putting the data in MonetDB, data gets corrupted.


dbdir <- "~/data/monetdb_identity_dataset" #directory of the Monet database, should be empty
con <- DBI::dbConnect(MonetDBLite::MonetDBLite(), dbdir)
dbWriteTable(con, "identity_dataset",temp_data, append=T)

> dplyr::tbl(con, "identity_dataset") %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   count(id_nchar)
# Source:   lazy query [?? x 2]
# Database: MonetDBEmbeddedConnection
   id_nchar     n
      <int> <dbl>
 1       18 32245
 2        0  2088
 3       NA    62
 4        9     1
 5       15     1
 6        2     1
 7        8     1
 8       14     1
 9        1     1
10        7     1

Seeing this, I tried decided to check the id_str to see if everything is OK but unfortunately, I get this, which is super weird:


> dplyr::tbl(con, "identity_dataset") %>%
+   select(id_str) %>%
+   mutate(id_nchar=nchar(id_str)) %>%
+   arrange(id_nchar)
# Source:     lazy query [?? x 2]
# Database:   MonetDBEmbeddedConnection
# Ordered by: id_nchar
   id_str     id_nchar
   <chr>         <int>
 1 "\xe0\x8e"       NA
 2 "\xe0\x8e"       NA
 3 "\xe0\x8e"       NA
 4 "\xe0\x8e"       NA
 5 "\xe0\x8e"       NA
 6 "\xe0\x8e"       NA
 7 "\xe0\x8e"       NA
 8 "\xe0\x8e"       NA
 9 "\xe0\x8e"       NA
10 "\xe0\x8e"       NA
# ... with more rows

This is very interesting as I checked the source .rds files and these weird characters in id_str column certainly do not appear in the source data. They only appear after I put the data in MonetDB and query.

Is there anything I can do to debug this problem? There are certain columns (like tweet_text, user_description) which can contain newlines or commas, emojis etc and I am not sure if MonetDB's handling these special characters might be causing the issue or not (drawing on my prior experience with multiple csv parsers). Or could this be an encoding issue. I really like to use MonetDB for this project and hope there is a simple solution I might be missing.

Adding sessioninfo::session_info()


> sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 3.5.1 (2018-07-02)
 os       Red Hat Enterprise Linux Server 7.4 (Maipo)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_GB.UTF-8
 ctype    en_GB.UTF-8
 tz       Europe/London
 date     2018-12-29

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source
 assertthat    0.2.0   2017-04-11 [2] CRAN (R 3.5.1)
 bindr         0.1.1   2018-03-13 [2] CRAN (R 3.5.1)
 bindrcpp    * 0.2.2   2018-03-29 [2] CRAN (R 3.5.1)
 cli           1.0.1   2018-09-25 [1] CRAN (R 3.5.1)
 codetools     0.2-16  2018-12-24 [1] CRAN (R 3.5.1)
 crayon        1.3.4   2017-09-16 [2] CRAN (R 3.5.1)
 DBI         * 1.0.0   2018-05-02 [2] CRAN (R 3.5.1)
 dbplyr        1.2.2   2018-07-25 [1] CRAN (R 3.5.1)
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.1)
 dplyr       * 0.7.8   2018-11-10 [1] CRAN (R 3.5.1)
 fansi         0.4.0   2018-10-05 [1] CRAN (R 3.5.1)
 furrr       * 0.1.0   2018-05-16 [1] CRAN (R 3.5.1)
 future      * 1.10.0  2018-10-17 [1] CRAN (R 3.5.1)
 globals       0.12.4  2018-10-11 [1] CRAN (R 3.5.1)
 glue          1.3.0   2018-07-17 [2] CRAN (R 3.5.1)
 hms           0.4.2   2018-03-10 [1] CRAN (R 3.5.1)
 janitor       1.1.1   2018-07-31 [1] CRAN (R 3.5.1)
 listenv       0.7.0   2018-01-21 [1] CRAN (R 3.5.1)
 lubridate   * 1.7.4   2018-04-11 [1] CRAN (R 3.5.1)
 magrittr      1.5     2014-11-22 [2] CRAN (R 3.5.1)
 MonetDBLite * 0.6.0   2018-07-27 [1] CRAN (R 3.5.1)
 pillar        1.3.1   2018-12-15 [1] CRAN (R 3.5.1)
 pkgconfig     2.0.2   2018-08-16 [1] CRAN (R 3.5.1)
 purrr         0.2.5   2018-05-29 [2] CRAN (R 3.5.1)
 R6            2.3.0   2018-10-04 [1] CRAN (R 3.5.1)
 Rcpp          1.0.0   2018-11-07 [1] CRAN (R 3.5.1)
 readr       * 1.3.1   2018-12-21 [1] CRAN (R 3.5.1)
 rlang         0.3.0.1 2018-10-25 [1] CRAN (R 3.5.1)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.1)
 snakecase     0.9.2   2018-08-14 [1] CRAN (R 3.5.1)
 stringi       1.2.4   2018-07-20 [1] CRAN (R 3.5.1)
 stringr     * 1.3.1   2018-05-10 [2] CRAN (R 3.5.1)
 tibble        1.4.2   2018-01-22 [2] CRAN (R 3.5.1)
 tidyselect    0.2.5   2018-10-11 [1] CRAN (R 3.5.1)
 utf8          1.1.4   2018-05-24 [2] CRAN (R 3.5.1)
 withr         2.1.2   2018-03-15 [2] CRAN (R 3.5.1)

dplyr::union with mismatch variables

Hi,

I found a minor issue, when taking the union of two tables where variables are not present in one of the two tables using dplyr::union, an error occurs when rendering the query:

suppressPackageStartupMessages({
  require(DBI)
  require(MonetDBLite)
  require(dplyr)
})
conn <- dbConnect(MonetDBLite(), ":memory:")
dbWriteTable(conn = conn, name = "x", value = data.frame(x = 1))
dbWriteTable(conn = conn, name = "y", value = data.frame(y = 1))
union(tbl(conn, "x"), tbl(conn, "y")) %>% show_query

#> Error: Cannot pass NA to dbQuoteIdentifier()

I would expect a similar result as when using RSQLite, namely,

suppressPackageStartupMessages({
  require(DBI)
  require(RSQLite)
  require(dplyr)
})
conn2 <- dbConnect(SQLite(), ":memory:")
dbWriteTable(conn = conn2, name = "x", value = data.frame(x = 1))
dbWriteTable(conn = conn2, name = "y", value = data.frame(y = 1))
union(tbl(conn2 , "x"), tbl(conn2 , "y")) %>% show_query
#> <SQL>
#> SELECT `x`, NULL AS `y`
#> FROM (SELECT *
#> FROM `x`)
#> UNION
#> SELECT NULL AS `x`, `y`
#> FROM (SELECT *
#> FROM `y`)

The issue seems to occur because

dplyr::sql_escape_ident(conn, NA_character_)
#> Error: Cannot pass NA to dbQuoteIdentifier()

whereas

dplyr::sql_escape_ident(conn2, NA_character_)
#> [1] "NULL"

computer becomes unresponsive with MonetDBLite working hard

On my laptop (MacBook Air, four years old) when I execute a query in MonetDBLite (from R.app) that creates a table much larger than memory, the computer becomes unresponsive while the query executes. About half the time there's the spinning beachball cursor, the other half there's a normal cursor but other applications are extremely slow to respond (eg, typing takes multiple seconds per character).

The behaviour looks similar to what you get with memory overuse and heavy paging, but it isn't: there's still free RAM. The 'Activity Monitor' still indicates 20-40% idle CPU time, but it shows that most of the CPU time being used is in the system, not in user space.

Is there some sort of checking for interrupts that MonetDBLite could do more often, or something?

Error messages being squelched

Hi!

I'm developing an R package that creates a MonetDBLite database of biological sequence data (https://github.com/ropensci/restez). Users can run automated functions for downloading large amounts of sequence data and store it in the database.

Recently, one user encountered the following error:

: Invalid argument
simpleError in gsub("\n", " ", res, fixed = TRUE): Unable to execute statement 'ROLLBACK'.
Server says 'SQLException:sql.trans:2DM30!ROLLBACK: not allowed in auto commit mode'.>
 
Error in gsub("\n", " ", res, fixed = TRUE) : 
  input string 1 is invalid UTF-8

The error appears to be due to an unexpected encoding for the incoming character string to gsub. I suspected it was due to foreign language symbols (the user's computer is in French). Initially, I was able to recreate the error on a Windows machine in Swedish. I was then able to fix the gsub error by re-encoding the incoming string for gsub in the monetdb_embedded_query function using iconv (see updated function below). Now when re-running the original user's code (you can recreate the error with this script), I get a message about what is exactly failing (partly in Swedish).

ftruncate: Invalid argument
Error in .local(conn, statement, ...) : 
  Unable to execute statement 'COMMIT'.
 ].ver says 'ERROR: HEAPextend: failed to extend for 05\516.theap: GDKmremap() failed ' [#OS: Enheten k‰nner inte igen kommandot.

.... which indicates that we're asking too much of our machine.

If I have diagnosed the problem correctly, could the monetdb_embedded_query function be updated to handle foreign language characters? Below is my updated version of the function, note the addition of iconv lines.

Many thanks,
Dom

monetdb_embedded_query <- function(conn, query, execute=TRUE, resultconvert=TRUE, int64=FALSE) {
  print('My function')
  if (!inherits(conn, MonetDBLite:::classname)) {
    stop("Invalid connection")
  }
  query <- as.character(query)
  if (length(query) != 1) {
    stop("Need a single query as parameter.")
  }
  if (!MonetDBLite:::monetdb_embedded_env$is_started) {
    stop("Call monetdb_embedded_startup() first")
  }
  if (MonetDBLite:::monetdb_embedded_env$started_dir != ":memory:" &&
      !dir.exists(file.path(MonetDBLite:::monetdb_embedded_env$started_dir, "bat"))) {
    stop("Someone killed all the BATs! Call Brigitte Bardot!")
  }
  execute <- as.logical(execute)
  if (length(execute) != 1) {
    stop("Need a single execute flag as parameter.")
  }
  resultconvert <- as.logical(resultconvert)
  if (length(resultconvert) != 1) {
    stop("Need a single resultconvert flag as parameter.")
  }
  int64 <- as.logical(int64)
  if (length(resultconvert) != 1) {
    stop("Need a single int64 flag as parameter.")
  }
  if (int64 && !requireNamespace("bit64", quietly = TRUE)) {
    stop("Need bit64 package for integer64 support")
  }
  
  # make sure the query is terminated
  query <- paste(query, "\n;", sep="")
  res <- .Call(MonetDBLite:::monetdb_query_R, conn, query, execute, resultconvert, interactive() &&
                 getOption("monetdb.progress", FALSE), int64)
  resp <- list()
  if (is.character(res)) { # error
    resp$type <- "!" # MSG_MESSAGE
    res <- iconv(res, to = 'UTF-8')
    resp$message <- gsub("\n", " ", res, fixed=TRUE)
  }
  if (is.numeric(res)) { # no result set, but successful
    resp$type <- 2 # Q_UPDATE
    resp$rows <- res
  }
  if (is.list(res)) {
    resp$type <- 1 # Q_TABLE
    if ("__prepare" %in% names(attributes(res))) {
      resp$type <- Q_PREPARE
      resp$prepare = attr(res, "__prepare")
      attr(res, "__prepare") <- NULL
    }
    attr(res, "row.names") <- c(NA_integer_, as.integer(-1 * attr(res, "__rows")))
    class(res) <- "data.frame"
    names(res) <- gsub("\\", "", names(res), fixed=T)
    resp$tuples <- res
  }
  resp
}

Error connecting to database when src_monetdb object exists in Global Environment

I've been very intrigued by the promise of the MonetDBLite with dplyr combo. However, problems with fatal error messages have kept me from using it more routinely. Picking it back up today, I thought I had figured out my problem and posted the following StackOverflow answer:

https://stackoverflow.com/a/53307673/3705612

To save people from needing to read that response, the error messages I receive are similar to this one:

Error in UseMethod("db_query_fields") : no applicable method for 'db_query_fields' applied to an object of class "MonetDBEmbeddedConnection"

I've seen more than just "db_query_fields" come up. For example, if I try and run a dplyr query on using the tbl object, I get an Error in UseMethod("db_sql_render") message instead.

After digging into it more, I've lost confidence in my theory that the order of package loading was the culprit. I'm now thinking the error arises simply from the fact that I load an R session with a src_monetdb object already in the Global Environment. As long as I load an R session with an empty Environment, I can load MonetDBLite before dplyr and dbplyr.

I use monet_shutdown() before restarting the R Session. Yet that doesn't seem to help. If the src_monetdb object is in the Global Environment on startup, I get the error.

So while I'm guessing this is not a bug, I was hoping to track down why this is happening. I've been using R, and very gratefully, for a few years now but the magic behind DBI , MonetDBILite, dbplyr, etc. is exactly that to me---I don't understand the intricacies of how they work. Please excuse me if I'm missing something obvious.

My session info is pasted below. I'm been using RStudio Version 1.2.1060 while investigating the issue. I also tried using the standard R Gui, making sure to save the src_monetdb object into the workspace, but could not reproduce the error there.

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dbplyr_1.2.2 dplyr_0.7.8 DBI_1.0.0
[4] MonetDBLite_0.6.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 codetools_0.2-15 crayon_1.3.4 packrat_0.4.9-3
[5] digest_0.6.18 assertthat_0.2.0 R6_2.3.0 magrittr_1.5
[9] pillar_1.3.0 rlang_0.3.0.1 rstudioapi_0.8 bindrcpp_0.2.2
[13] tools_3.5.1 glue_1.3.0 purrr_0.2.5 compiler_3.5.1
[17] pkgconfig_2.0.2 tidyselect_0.2.5 bindr_0.1.1 knitr_1.20
[21] tibble_1.4.2

string manipulations

Monetdblight would be even greater if it would support string manipulations like replacements out of the box. You have implemented a function for the LIKE operator without pcre, how about implementing string replacements as with REPLACE? That would be very handy for ad hoc data cleaning of larger datasets. Now, you would have to setup monetdb with pcre / an r string replacement function.

MonetDBLite installation

Hi all,

I'm trying to install monetdblite in a fresh Linux Mint machine following your pointers at https://www.monetdb.org/blog/monetdblite-r, but lots of issues came up.
I actually solved them, but I'm not sure it was the best solution. I'll sum them up next.
$ sudo -i R

Fresh MonetDB-Lite install with the following errors:

install.packages("MonetDBLite")
---- error: "zutil.h:21:22: fatal error: string.h: No such file or directory"
According to link: https://stackoverflow.com/questions/25184167/
It appears to be missing stdlib.h as well as stdio.h
These are standard C headers, that are found in the libc package.

SOLUTION:
$ sudo apt-get install libc6-dev

back to R
$ sudo -i R

install.packages("dplyr")
---- Error in library(dplyr) : there is no package called ‘dplyr’

Issue solved in: tidyverse/dplyr#2261

install.packages("devtools")
install.packages("Rcpp")
---- ERROR: compilation failed for package ‘Rcpp’ (no g++ at this point. Remember the fresh install)

$ apt-get install zlib1g-dev libssl-dev libssh2-1-dev libcurl4-openssl-dev g++

back to R
$ sudo -i R

install.packages("dplyr")

Now I figured that you meant "dbplyr", instead of "dplyr" when I got this:

ms <- MonetDBLite::src_monetdblite(dbdir)
---- Error in loadNamespace(name) : there is no package called ‘dbplyr’
install.packages("dbplyr")

Now, the example works at monetdblite-r link

Hope this helps you improving monetdb (great project, btw).

cheers,
Eduardo

Install dev version from github fails

Tried to install dev version from here by following the instructions, seems like something is missing when cloning the repo.

davidjankoski@davidjankoski-ubuntu:~⟫ git clone https://github.com/hannesmuehleisen/MonetDBLite-R.git --depth 1 --recursive

Cloning into 'MonetDBLite-R'...
remote: Counting objects: 53, done.
remote: Compressing objects: 100% (52/52), done.
remote: Total 53 (delta 0), reused 39 (delta 0), pack-reused 0
Unpacking objects: 100% (53/53), done.
Submodule 'src/monetdblite' (https://github.com/hannesmuehleisen/MonetDBLite-C) registered for path 'src/monetdblite'
Cloning into '/home/davidjankoski/MonetDBLite-R/src/monetdblite'...
remote: Counting objects: 474, done.
remote: Compressing objects: 100% (432/432), done.
remote: Total 474 (delta 90), reused 128 (delta 41), pack-reused 0
Receiving objects: 100% (474/474), 2.24 MiB | 718.00 KiB/s, done.
Resolving deltas: 100% (90/90), done.
error: no such remote ref 86c9c223716fedb4545ee1edc58fc2a92cd63de3
Fetched in submodule path 'src/monetdblite', but it did not contain 86c9c223716fedb4545ee1edc58fc2a92cd63de3. Direct fetching of that commit failed.

davidjankoski@davidjankoski-ubuntu:~⟫ R CMD INSTALL MonetDBLite-R

* installing to library ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4’
* installing *source* package ‘MonetDBLite’ ...
** libs
make: *** No rule to make target 'build/debug/monetdblite/common/stream.o', needed by 'MonetDBLite.so'.  Stop.
ERROR: compilation failed for package ‘MonetDBLite’
* removing ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
* restoring previous ‘/home/davidjankoski/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’

thanks. david.

dplyr/dbplyr mutate/case_when

The code

library(DBI)
library(dplyr)
library(dbplyr)
con <- dbConnect(MonetDBLite::MonetDBLite())
#con <- dbConnect(RSQLite::SQLite())
data(iris)
names(iris)<-tolower(sub("\\.","_",names(iris)))
dbWriteTable(con,"iris_table",iris)
irisdb<-tbl(con,"iris_table")

mutate(irisdb, sepal_category = case_when(sepal_length>5.5~"long", sepal_length>5~"medium", TRUE~"short")) 

gives the error

Error: Columns `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `species` must be length 1 or 150, not 10, 10, 10, 10, 10

This could also be a dplyr/dbplyr bug, but using RSQLite instead of MonetDBLite (as with the commented-out line above) doesn't give the error, so I thought I should at least mention it here.

Concurrent read access by R processes

Hello,

At the end of the monetDB lite page I read:

"MonetDBLite does not allow multiple R sessions to connect to a single database concurrently. As soon as a single R session loads an embedded server, that server is locked down and inaccessible to other R consoles."

I was wondering what is the technical reason for this limitation and whether this limitation could be removed by, e.g., having the accessed tables be read-only. The reason I am interested is that monetDB with concurrent read access by R processes would be a killer "frontend db" for R powered web apps using, e.g., openCPU, which allows for api calls to an R package which are served by R processes (it would be useful for shiny too, of course). Any thoughts?

Riccardo.

Weird arithmetic (types?)

The following code

library(dplyr)
dbdir <- file.path(tempdir(), "dplyrdir")
my_db <- MonetDBLite::src_monetdblite(dbdir)
my_iris  <- copy_to(my_db, iris)
my_iris <- mutate(my_iris, set=ifelse(Species=="setosa",1,0))
my_iris %>% mutate(x= (set-.3)/987654321) %>% summarise(min(x),max(x)) %>% collect()

gives zero for the maximum and minimum (the entire column x is equal to zero). Since both set and x show up as double when my_iris is printed, x should be non-zero floating point numbers.

I suspect this is related to something I don't have a minimal example of, where I'm getting math overflow errors in summing a (quite long) column of numbers that should also be double.

Option to suppress messages about reserved SQL characters?

While I appreciate the attention to detail in the messages about reserved characters needing to be quoted in SQL queries, it would be great if there was an option to suppress these messages (either via an options() setting and/or function argument).

Many R users will use MonetDBLite as a database backend with dplyr / dbplyr, which already automatically quotes reserved characters, making these messages unnecessary and potentially confusing. I'm also using MonetDBLite as a backend in a few packages (for which I have a huge Thank You! being able to access a modern, fast, column-oriented database system that supports windowing functions without any server-side installation is completely game-changing), and would rather hide these messages from users since the quoting is taken care of.

Also wondering if they should be typed as warnings instead of messages? I suppose a case could be made either way.

Anyway, thanks for considering this request, and thanks again for this excellent package. I recently read your arxiv piece which I now send to pretty much everyone.

Support for compressed CSV files

Most CSV readers (notably, base R readers) support compressed (gz, bzip2...) CSV files transparently. It would be a nice addition to MonetDBLite::monetdb.read.csv, because big CSVs are commonly gzip'ed.

grep like commands are not translated correctly

I can't get the string detection to work through dbplyr:

tbl(con,"results") %>% 
   filter(grepl("position",name))

'ParseException:SQLparser:42000!SELECT: no such binary operator 'grepl(char,clob)''

or the stringr package:

tbl(con, "results" %>% 
   filter(str_detect(pattern = "position",string = name))

'ParseException:SQLparser:42000!SELECT: no such binary operator 'instr(clob,char)''

While it does work if I create the SQL:

DBI::dbGetQuery(con,"select * from results where name like '%position%'")

Where is this translation implemented? Is this a DBI issue, dbplyr issue? Where should I look?

compiler error installing MonetDBlite 0.5.0 on Ubuntu 17.10

Hi,

I upgraded Ubuntu to 17.10 recently and installed MonetDBLite 0.4.1 with no problem.
Today, attempting to install 0.5.0 from within RStudio I get a compiler error:

> install.packages("MonetDBLite")
Installing package into ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/MonetDBLite_0.5.0.tar.gz'
Content type 'application/x-gzip' length 1381343 bytes (1.3 MB)
==================================================
downloaded 1.3 MB

* installing *source* package ‘MonetDBLite’ ...
** package ‘MonetDBLite’ successfully unpacked and MD5 sums checked
** libs
gcc -std=gnu99  -g -O2 -fdebug-prefix-map=/build/r-base-8tMiyU/r-base-3.4.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -I/usr/share/R/include -DNDEBUG -fpic -Wno-format -DLIBGDK -DLIBMAL -DLIBOPTIMIZER -DLIBSTREAM -DHAVE_EMBEDDED_R -DMONETDBLITE_COMPILE -Imonetdblite/src/ -Imonetdblite/src/common -Imonetdblite/src/embedded -Imonetdblite/src/gdk -Imonetdblite/src/mal/mal -Imonetdblite/src/mal/modules -Imonetdblite/src/mal/optimizer -Imonetdblite/src/mal/sqlbackend -Imonetdblite/src/sql/include -Imonetdblite/src/sql/common -Imonetdblite/src/sql/server -Imonetdblite/src/sql/storage -Imonetdblite/src/sql/storage/bat -MMD -MF build/deps/monetdblite/common/stream.d -c monetdblite/src/common/stream.c -o build/objects/monetdblite/common/stream.o
cc1: error: -Wformat-security ignored without -Wformat [-Werror=format-security]
cc1: some warnings being treated as errors
Makevars:194: recipe for target 'build/objects/monetdblite/common/stream.o' failed
make: *** [build/objects/monetdblite/common/stream.o] Error 1
ERROR: compilation failed for package ‘MonetDBLite’
* removing ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
* restoring previous ‘/home/ross/R/x86_64-pc-linux-gnu-library/3.4/MonetDBLite’
Warning in install.packages :
  installation of package ‘MonetDBLite’ had non-zero exit status

The downloaded source packages are in
	‘/tmp/Rtmp7rYzho/downloaded_packages’

I don't have a Makevars file, so presumably the compiler flags are the defaults for installation. I have so far avoided tinkering with the compiler flags as I don't know what the maintainers would regard as essential.

Is this something the maintainers would care about or more likely something idiosyncratic about my set-up?

R crashing when trying to establish connection to existing database

I recently updated to R 3.5.1 and when I try to connect to my existing database it causes R to crash.

I can create a new database without a problem:

require(DBI)
require(RMySQL)
require(MonetDBLite)

test_db <- src_monetdblite("test_db_Monet", create = TRUE)

str(test_db)
List of 1
 $ con:Formal class 'MonetDBEmbeddedConnection' [package "MonetDBLite"] with 1 slot
  .. ..@ connenv:<environment: 0x7f9d6c79ab00> 
 - attr(*, "class")= chr [1:3] "src_monetdb" "src_sql" "src"

dbDisconnect(test_db$con, shutdown = TRUE)
rm(test_db)

I can reconnect to it as well without crashing:

test_db <- src_monetdblite("test_db_Monet", create = FALSE)
str(test_db)
List of 1
 $ con:Formal class 'MonetDBEmbeddedConnection' [package "MonetDBLite"] with 1 slot
  .. ..@ connenv:<environment: 0x7f9d6b30e270> 
 - attr(*, "class")= chr [1:3] "src_monetdb" "src_sql" "src"

The only thing that seems to work is to establish a connection to another MonetDBLite database first, and then try connecting to my existing database:

require(DBI)
require(RMySQL)
require(MonetDBLite)

test_db <- src_monetdblite("test_db_Monet", create = FALSE)
dbDisconnect(test_db$con, shutdown = TRUE)
rm(test_db)

statcast_db <- src_monetdblite("/Users/williampetti/statcast_database/statcast_db_Monet", create = FALSE)

dbListTables(statcast_db$con)

[1] "boxscore_gameday_info" "player_info"           "statcast_17"          
[4] "temp"                  "umpires_games"   

I should note the crash only occurs if I run the code in RStudio or from the R GUI. If I source a file, for example, the issue does not seem to occur.

The old database was created with a prior version of MonetDBLite and R.

Any idea what may be causing this?

String concatenation

Does the library support any string concatenation function?
I have seen that MonetDB offer support to string concatenation methods, but I've tried using paste, paste0, str_c and cat and no one seems to be supported by the library.

As an example, using paste inside a mutate throws an error.

This is the code I'm executing:

elections <- tbl(monet_connection, 'elections') %>%
  distinct()
elections <- elections %>%
  rename(name = election_description)
elections <- elections %>%
  mutate(name = toupper(name))
elections <- elections %>%
  mutate(name = paste('test', year)) %>% 
  collect()

The query generated by explain is:

SELECT "year", PASTE('test', "year") AS "name"
FROM (SELECT "year", UPPER("name") AS "name"
FROM (SELECT "year", "election_description" AS "name"
FROM (SELECT DISTINCT *
FROM "elections") "xjjnkzodlb") "hfvbvasaux") "yhkoodzuan"

And the resulting error is

Error in .local(conn, statement, ...) : Unable to execute statement 'SELECT "year", PASTE('test', "year") AS "name" FROM (SELECT "year", UPPER("name") AS "name" FROM (SE...'. Server says 'ParseException:SQLparser:42000!SELECT: no such binary operator 'paste(char,int)''.

Fix valgrind

==39140== Conditional jump or move depends on uninitialised value(s)
==39140==    at 0x504A73: checkValues (svn/R-devel/src/main/logic.c:424)
==39140==    by 0x504A73: do_logic3 (svn/R-devel/src/main/logic.c:480)
==39140==    by 0x4CB5F7: bcEval (svn/R-devel/src/main/eval.c:6736)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x515B87: dispatchMethod.isra.3 (svn/R-devel/src/main/objects.c:419)
==39140==    by 0x515E7D: Rf_usemethod (svn/R-devel/src/main/objects.c:455)
==39140==    by 0x5162E0: do_usemethod (svn/R-devel/src/main/objects.c:535)
==39140==    by 0x4CB42F: bcEval (svn/R-devel/src/main/eval.c:6756)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D9FCC: Rf_eval (svn/R-devel/src/main/eval.c:747)
==39140==    by 0x4DDE4F: do_set (svn/R-devel/src/main/eval.c:2745)
==39140==  Uninitialised value was created by a client request
==39140==    at 0x5113D7: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2797)
==39140==    by 0x16950942: monetdb_r_dressup (packages/tests-vg/MonetDBLite/src/embeddedr/converters.c.h:147)
==39140==    by 0x169528CA: bat_to_sexp (packages/tests-vg/MonetDBLite/src/embeddedr/converters.c.h:236)
==39140==    by 0x16953549: monetdb_query_R (packages/tests-vg/MonetDBLite/src/embeddedr/embeddedr.c:158)
==39140==    by 0x4D0282: bcEval (svn/R-devel/src/main/eval.c:7243)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D1E95: bcEval (svn/R-devel/src/main/eval.c:6704)
==39140==    by 0x4D9E3F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==39140==    by 0x4DB72E: R_execClosure (svn/R-devel/src/main/eval.c:1764)
==39140==    by 0x4D9FCC: Rf_eval (svn/R-devel/src/main/eval.c:747)
==39140==    by 0x4DCCBC: do_begin (svn/R-devel/src/main/eval.c:2362)

The limit of data size

I am writing a data I am retrieving from an remote Postgresql server to MonetDBLite. I got the error message: Error: cannot allocate vector of size 712.8 Mb. I thought as long as the data is smaller than my disk, MonetDBLite will be able to handle it. Is there any limit for the size of a single data to be able to fit into MonetDBLite?

Insert vector from R into existing table column

I would like to use monetDB lite as an alternative to SAS for my medium data problems. To that end I need to add new columns consisting of transformations of existing columns done in R. I have searched the internet and documentation for a solution to this and have not found an answer.

I would like to do something like this:

con <- dbConnect( MonetDBLite::MonetDBLite() , ":memory:" )
dbWriteTable(con, "mtcars", mtcars)

dbSendQuery(con, "ALTER TABLE mtcars ADD COLUMN mpg_per_cyl double")
dbListFields(con, "mtcars")

To update the new column I have tried using databinding, but it seems this only works for one value:

## ?? how to populate new column from R?
insert <- dbSendQuery(con, 'PREPARE INSERT INTO mtcars (mpg_per_cyl) VALUES (?)')

> dbBind(insert, list(mpg_per_cyl=mtcars$mpg/mtcars$cyl))
Error in vapply(params, function(x) { : values must be length 1,
 but FUN(X[[1]]) result is length 32
In addition: Warning message:
In if (is.na(x)) "NULL" else if (is.numeric(x) || is.logical(x)) { :
  the condition has length > 1 and only the first element will be used

The documentation for dbSendUpdate has the only reference to placeholders in the MonetDBLite package I could find. I am a bit new to working with DBs so forgive me if I've overlooked something obvious. Does this capability already exist?

failed initialization

I'm trying to re-connect to a previously-initiated permanently-stored database on my local machine.

library(package = DBI)
library(package = MonetDBLite)

dbdir <- "C:/path/to/database_directory"

con <- DBI::dbConnect(drv = MonetDBLite::MonetDBLite(), dbdir)

Establishment of the con fails due to the following error:

Error in monetdb_embedded_startup(embedded, !getOption("monetdb.debug.embedded", : 
Failed to initialize embedded MonetDB !FATAL: 
Fatal error during initialization:
SQLException:SQLinit:Catalogue initialization failed

The sessionInfo is as follows:

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bindrcpp_0.2      zhaoy_0.0.1       purrr_0.2.4       MonetDBLite_0.5.1 lubridate_1.7.2  
[6] dplyr_0.7.4       DBI_0.7          

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15     knitr_1.20       bindr_0.1        feather_0.3.1    magrittr_1.5     hms_0.4.1       
 [7] R6_2.2.2         rlang_0.1.6      stringr_1.3.0    tools_3.4.3      readxl_1.0.0     yaml_2.1.16     
[13] assertthat_0.2.0 digest_0.6.15    rprojroot_1.3-2  tibble_1.4.2     codetools_0.2-15 glue_1.2.0      
[19] stringi_1.1.6    cellranger_1.1.0 compiler_3.4.3   pillar_1.1.0     backports_1.1.2  pkgconfig_2.0.1

Thank you!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.