
go-duckdb's Introduction

Go SQL driver for DuckDB

The DuckDB driver conforms to the built-in database/sql interface.


Installation

go get github.com/marcboeker/go-duckdb

go-duckdb uses CGO to make calls to DuckDB. You must build your binaries with CGO_ENABLED=1.

Usage

go-duckdb hooks into the database/sql interface provided by the Go stdlib. To open a connection, simply specify the driver type as duckdb.

db, err := sql.Open("duckdb", "")
if err != nil {
    ...
}
defer db.Close()

This creates an in-memory instance of DuckDB. To open a persistent database, you need to specify a filepath to the database file. If the file does not exist, then DuckDB creates it.

db, err := sql.Open("duckdb", "/path/to/foo.db")
if err != nil {
	...
}
defer db.Close()

If you want to set specific config options for DuckDB, you can add them as query-style parameters in the form of name=value pairs to the DSN.

db, err := sql.Open("duckdb", "/path/to/foo.db?access_mode=read_only&threads=4")
if err != nil {
    ...
}
defer db.Close()
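Option values that contain special characters need to be URL-encoded. The following buildDSN helper is a hypothetical sketch (not part of go-duckdb) showing how such a DSN can be assembled with the standard library:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN is a hypothetical helper that appends DuckDB config
// options to a database file path as query-style name=value pairs.
func buildDSN(path string, opts map[string]string) string {
	if len(opts) == 0 {
		return path
	}
	v := url.Values{}
	for name, value := range opts {
		v.Set(name, value)
	}
	// Encode URL-escapes the values and sorts keys alphabetically.
	return path + "?" + v.Encode()
}

func main() {
	dsn := buildDSN("/path/to/foo.db", map[string]string{
		"access_mode": "read_only",
		"threads":     "4",
	})
	fmt.Println(dsn) // /path/to/foo.db?access_mode=read_only&threads=4
}
```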

Alternatively, you can use sql.OpenDB. That way, you can perform initialization steps in a callback function before opening the database. Here's an example that installs and loads the JSON extension when opening a database with sql.OpenDB(connector).

connector, err := duckdb.NewConnector("/path/to/foo.db?access_mode=read_only&threads=4", func(execer driver.ExecerContext) error {
    bootQueries := []string{
        "INSTALL 'json'",
        "LOAD 'json'",
    }

    for _, query := range bootQueries {
        _, err := execer.ExecContext(context.Background(), query, nil)
        if err != nil {
            ...
        }
    }
    return nil
})
if err != nil {
    ...
}

db := sql.OpenDB(connector)
defer db.Close()

Please refer to the database/sql documentation for further usage instructions.

Memory Allocation

DuckDB runs in-process, so all of its memory is allocated by the driver and lives in the host process, which is the Go application. Especially for long-running applications, it is crucial to call the corresponding Close functions as specified in database/sql. The following is a list of examples.

db, err := sql.Open("duckdb", "")
defer db.Close()

conn, err := db.Conn(context.Background())
defer conn.Close()

rows, err := conn.QueryContext(context.Background(), "SELECT 42")
// alternatively, rows.Next() has to return false
rows.Close()

appender, err := NewAppenderFromConn(conn, "", "test")
defer appender.Close()

// if not passed to sql.OpenDB
connector, err := NewConnector("", nil)
defer connector.Close()

DuckDB Appender API

If you want to use the DuckDB Appender API, you can obtain a new Appender by passing a DuckDB connection to NewAppenderFromConn().

connector, err := duckdb.NewConnector("test.db", nil)
if err != nil {
	...
}
defer connector.Close()

conn, err := connector.Connect(context.Background())
if err != nil {
	...
}
defer conn.Close()

// obtain an appender from the connection
// NOTE: the table 'test_tbl' must exist in test.db
appender, err := NewAppenderFromConn(conn, "", "test_tbl")
if err != nil {
	...
}
defer appender.Close()

err = appender.AppendRow(...)
if err != nil {
	...
}

DuckDB Apache Arrow Interface

If you want to use the DuckDB Arrow Interface, you can obtain a new Arrow by passing a DuckDB connection to NewArrowFromConn().

connector, err := duckdb.NewConnector("", nil)
if err != nil {
	...
}
defer connector.Close()

conn, err := connector.Connect(context.Background())
if err != nil {
	...
}
defer conn.Close()

// obtain the Arrow from the connection
arrow, err := duckdb.NewArrowFromConn(conn)
if err != nil {
	...
}

rdr, err := arrow.QueryContext(context.Background(), "SELECT * FROM generate_series(1, 10)")
if err != nil {
	...
}
defer rdr.Release()

for rdr.Next() {
  // process records
}

The Arrow interface is a heavy dependency. If you do not need it, you can disable it by passing -tags=no_duckdb_arrow to go build. This will be made opt-in in V2.

go build -tags="no_duckdb_arrow"

Vendoring

If you want to vendor a module containing go-duckdb, please use modvendor to include the missing header files and libraries. See issue #174 for more details.

  1. go install github.com/goware/modvendor@latest
  2. go mod vendor
  3. modvendor -copy="**/*.a **/*.h" -v

Now you can build your module as usual.

Linking DuckDB

By default, go-duckdb statically links DuckDB into your binary. Statically linking DuckDB adds around 30 MB to your binary size. On Linux (Intel) and macOS (Intel and ARM), go-duckdb bundles pre-compiled static libraries for fast builds.

Alternatively, you can dynamically link DuckDB by passing -tags=duckdb_use_lib to go build. You must have a copy of libduckdb available on your system (.so on Linux or .dylib on macOS), which you can download from the DuckDB releases page. For example:

# On Linux
CGO_ENABLED=1 CGO_LDFLAGS="-L/path/to/libs" go build -tags=duckdb_use_lib main.go
LD_LIBRARY_PATH=/path/to/libs ./main

# On macOS
CGO_ENABLED=1 CGO_LDFLAGS="-L/path/to/libs" go build -tags=duckdb_use_lib main.go
DYLD_LIBRARY_PATH=/path/to/libs ./main

Notes

TIMESTAMP vs. TIMESTAMP_TZ

In the C API, DuckDB stores both TIMESTAMP and TIMESTAMP_TZ as duckdb_timestamp, which holds the number of microseconds elapsed since January 1, 1970 UTC (i.e., an instant without offset information). When passing a time.Time to go-duckdb, go-duckdb transforms it to an instant with UnixMicro(), even when using TIMESTAMP_TZ. Later, scanning either type of value returns an instant, as SQL types do not model time zone information for individual values.

go-duckdb's People

Contributors

adityahegde, ajzo90, andyyu2004, begelundmuller, binkynz, carlopi, chhetripradeep, deka108, elefeint, esevastyanov, flarco, florianuekermann, goober, junhwong, k-anshul, maiadegraaf, marcboeker, mathiasgr, mesge, mrflynn, panta, pborzenkov, pjain1, rakeshsharma14317, rrgilchrist, segfault88, taniabogatsch, tn1ck, williamhbaker, wolfeidau


go-duckdb's Issues

Parquet cannot read string/byte array field

Hello,

I am trying to use your project for reading parquet files. It seems to work alright for types like int/double/boolean; however, when it comes to string values, it always returns nil. I am using the pre-compiled DuckDB 0.4.0 on Ubuntu. Below is the code for generating and reading the file. I did try Python's pyarrow.parquet package as well, which can read the table okay. Any thoughts would be appreciated. Using Go 1.17.

Thanks.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"
	"os"
	"path"
	"reflect"
	"time"

	_ "github.com/marcboeker/go-duckdb"

	"github.com/segmentio/parquet-go"
	"github.com/segmentio/parquet-go/compress"
	"github.com/segmentio/parquet-go/encoding"
	"github.com/segmentio/parquet-go/encoding/plain"
)

type (
	Schema struct {
		fields []parquet.Field
	}

	Field struct {
		name string
		typ  parquet.Type
	}
)

// For leaf nodes, returns the type of values of the parquet column.
//
// Calling this method on non-leaf nodes will panic.
func (s Schema) Type() parquet.Type {
	// TODO:
	return parquet.Group{}.Type()
}

func (s Schema) Optional() bool {
	return false
}

func (s Schema) Repeated() bool {
	return false
}

func (s Schema) Required() bool {
	return true
}

func (s Schema) Leaf() bool {
	return false
}

func (s Schema) Fields() []parquet.Field {
	return s.fields
}

func (s Schema) Encoding() encoding.Encoding {
	return &plain.Encoding{}
}

func (s Schema) Compression() compress.Codec {
	return nil
}

func (s Schema) String() string {
	return fmt.Sprintf("%+v", s.fields)
}

func (s Schema) GoType() reflect.Type {
	panic("not supported")
}

func (f Field) String() string {
	return fmt.Sprintf("%s: %v", f.name, f.typ)
}

func (f Field) Type() parquet.Type {
	return f.typ
}

func (f Field) Optional() bool {
	return false
}

func (f Field) Repeated() bool {
	return false
}

func (f Field) Required() bool {
	return false
}

func (f Field) Leaf() bool {
	return true
}

func (f Field) Fields() []parquet.Field {
	return nil
}

func (f Field) Encoding() encoding.Encoding {
	return &plain.Encoding{}
}

func (f Field) Compression() compress.Codec {
	return nil
}

func (f Field) GoType() reflect.Type {
	panic("not supported")
}

func (f Field) Name() string {
	return f.name
}

func (f Field) Value(base reflect.Value) reflect.Value {
	panic("not supported")
}

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func RandStringBytes(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

func RandBool() bool {
	rand.Seed(time.Now().UnixNano())
	return rand.Intn(2) == 1
}

func main() {
	tempDir, _ := os.MkdirTemp("/tmp", "writer_test")
	parquetFileNameDataTable := path.Join(tempDir, "test.parquet")
	log.Print(parquetFileNameDataTable)
	file, err := os.Create(parquetFileNameDataTable)
	if err != nil {
		log.Fatal(err)
	}

	fields := make([]parquet.Field, 4)
	fields[0] = Field{
		name: "int_field",
		typ:  parquet.Int64Type,
	}
	fields[1] = Field{
		name: "string_field",
		typ:  parquet.ByteArrayType,
	}
	fields[2] = Field{
		name: "double_field",
		typ:  parquet.DoubleType,
	}
	fields[3] = Field{
		name: "bool_field",
		typ:  parquet.BooleanType,
	}

	schema := parquet.NewSchema("", Schema{fields: fields})

	writer := parquet.NewWriter(file, &parquet.WriterConfig{
		Schema: schema,
	})

	// Boolean           Kind = Kind(format.Boolean)
	// Int32             Kind = Kind(format.Int32)
	// Int64             Kind = Kind(format.Int64)
	// Int96             Kind = Kind(format.Int96)
	// Float             Kind = Kind(format.Float)
	// Double            Kind = Kind(format.Double)
	// ByteArray         Kind = Kind(format.ByteArray)
	// FixedLenByteArray Kind = Kind(format.FixedLenByteArray)

	row := make(parquet.Row, 4)

	var val0 int64
	var val1 string
	var val2 float64
	var val3 bool

	for i := 1; i < 5; i++ {

		val0 = rand.Int63()
		val1 = RandStringBytes(10)
		val2 = rand.Float64()
		val3 = RandBool()
		log.Print(val0, ", ", val1, ", ", val2, ", ", val3)
		row[0] = parquet.ValueOf(val0).Level(0, 0, 0)
		row[1] = parquet.ValueOf(val1).Level(0, 0, 1)
		row[2] = parquet.ValueOf(val2).Level(0, 0, 2)
		row[3] = parquet.ValueOf(val3).Level(0, 0, 3)
		rows := []parquet.Row{row}
		// write the row
		_, err = writer.WriteRows(rows)
		if err != nil {
			log.Fatal(err)
		}
	}

	if err = writer.Flush(); err != nil {
		log.Fatal(err)
	}
	if err = writer.Close(); err != nil {
		log.Fatal(err)
	}

	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var tmpVal1 []byte

	setting := db.QueryRow(fmt.Sprintf("SELECT * FROM  read_parquet('%s')", parquetFileNameDataTable))
	err = setting.Scan(&val0, &tmpVal1, &val2, &val3)
	log.Print("*********************************************")
	log.Print(val0, ", ", tmpVal1, ", ", val2, ", ", val3)
	if err != nil {
		log.Fatal(err)
	}
}

This is the output:

2022/06/22 14:19:54 /tmp/writer_test2668547560/test.parquet
2022/06/22 14:19:54 5577006791947779410, VlBzgbaiCM, 0.8136399609900968, true
2022/06/22 14:19:54 4705042688094899580, lmRoXEYsyY, 0.6691002750474139, false
2022/06/22 14:19:54 2812245154115238472, ZskpxqxDvC, 0.6354819286351501, true
2022/06/22 14:19:54 409990495156318611, xxwmKfeUSu, 0.953294072562028, false
2022/06/22 14:19:54 *********************************************
2022/06/22 14:19:54 5577006791947779410, [], 0.8136399609900968, true

As seen, the string comes back empty.

Fails to build with `go mod vendor`

/scratch/duckdb$ go build
/scratch/duckdb$ go mod vendor
/scratch/duckdb$ go build
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lduckdb: No such file or directory
collect2: error: ld returned 1 exit status

/scratch/duckdb$ rm -rf vendor/
/scratch/duckdb$ go build

Sorry for the drive-by issue. I just solved it by not using vendor. It might break my Docker build, but it works for now. Will follow up if I find a solution.

Querying Parquet files

I compiled go-duckdb into a static binary and ran the example

panic: Catalog Error: Table with name my.parquet does not exist!
Did you mean "pg_proc"?
LINE 1: select count(*) from 'my.parquet';
                             ^

goroutine 1 [running]:
main.check(...)

I modified the query to use the read_parquet function as described in the DuckDB docs and I ran into this error

./main
panic: Catalog Error: Function with name read_parquet is not on the catalog, but it exists in the parquet extension. To Install and Load the extension, run: INSTALL parquet; LOAD parquet;

goroutine 1 [running]:
main.check(...)

How does one configure this extension as part of the go-duckdb binary?

Compilation failure on Linux

I got a massive amount of errors about libc. I thought this package used a pre-bundled library? It looks like it is trying to compile from source.

# github.com/marcboeker/go-duckdb
~/go/pkg/mod/github.com/marcboeker/[email protected]/deps/linux_amd64/libduckdb.a(duckdb.o): In function `std::vector<duckdb::PhysicalIndex, std::allocator<duckdb::PhysicalIndex> >::operator=(std::vector<duckdb::PhysicalIndex, std::allocator<duckdb::PhysicalIndex> > const&) [clone .isra.0]':
duckdb.cpp:(.text+0x1d97e): undefined reference to `std::__throw_bad_array_new_length()'
~/go/pkg/mod/github.com/marcboeker/[email protected]/deps/linux_amd64/libduckdb.a(duckdb.o): In function `std::vector<duckdb::LogicalType, std::allocator<duckdb::LogicalType> >::vector(std::initializer_list<duckdb::LogicalType>, std::allocator<duckdb::LogicalType> const&) [clone .constprop.0]':
duckdb.cpp:(.text+0x26bbb): undefined reference to `__libc_single_threaded'
~/go/pkg/mod/github.com/marcboeker/[email protected]/deps/linux_amd64/libduckdb.a(duckdb.o): In function `duckdb::InitializeValidities(duckdb::Vector&, unsigned long&)':
duckdb.cpp:(.text+0x2d288): undefined reference to `__libc_single_threaded'
duckdb.cpp:(.text+0x2d30c): undefined reference to `__libc_single_threaded'
~/go/pkg/mod/github.com/marcboeker/[email protected]/deps/linux_amd64/libduckdb.a(duckdb.o): In function `duckdb::RelationStatement::Copy() const':
duckdb.cpp:(.text+0x2e9b9): undefined reference to `__libc_single_threaded'

The system I am running on is old, so maybe stdc++ is also old. Not sure if it is related.

Support for decimal types

Scanning of decimal is not currently implemented.
If we try to scan a decimal we get the error not supported type 19.

Decimal types come up pretty often. Something as simple as select 1.0 * 1.0 yields a decimal result.
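Conceptually, a DuckDB decimal is an unscaled integer plus width/scale metadata, so a driver (or a caller post-processing raw values) can render it exactly. The following decimalString helper is an illustrative sketch, not go-duckdb's API; the DECIMAL(4,2) example type is an assumption about what SELECT 1.0 * 1.0 yields:

```go
package main

import (
	"fmt"
	"math/big"
)

// decimalString renders an unscaled integer value with the given
// scale, e.g. (100, 2) -> "1.00", without floating-point rounding.
// Illustrative sketch, not part of go-duckdb.
func decimalString(unscaled int64, scale int) string {
	r := new(big.Rat).SetFrac(
		big.NewInt(unscaled),
		// denominator = 10^scale
		new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil),
	)
	return r.FloatString(scale)
}

func main() {
	// e.g. a DECIMAL(4,2) value 1.00 stored as unscaled 100
	fmt.Println(decimalString(100, 2)) // 1.00
	fmt.Println(decimalString(-12345, 3)) // -12.345
}
```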

Append rows using sql.Null* types

Appender.AppendRowArray doesn't recognize the sql.Null* value types like sql.NullString. Clients can provide a driver.Value that is either nil (to insert NULL) or a fundamental type, but the sql.Null* types are more readable and type-safe.

Static linking on Windows broken

When compiling the amalgamation file to build a static library the compiler complains about definition is marked dllimport. Others have this problem too.

But when adding -DDUCKDB_BUILD_LIBRARY, the errors and warnings are gone and the resulting libduckdb.a is successfully created. However, linking against it throws the following errors (for more examples see log.txt):

C:\Users\dev\AppData\Local\Temp\go-link-2035893457\000007.o: in function `_cgo_37088fb8ef74_Cfunc_duckdb_bind_uint16':
/tmp/go-build/cgo-gcc-prolog:313: undefined reference to `__imp_duckdb_bind_uint16'

This is very weird as the Darwin and Linux builds succeed. The code is located under https://github.com/marcboeker/go-duckdb/tree/windows-builds

Support for composite types (lists, structs, maps)

Hey, Marc!

For my use cases, I need to query composite types. I noticed this is something that isn't currently implemented, was it something that you planned to implement? I've been experimenting on a fork that adds support for this. If this is something you're interested in I'd love to get this merged.

Unfortunately, the changes are relatively large, so I'd like to check a few things with you first. (changes can be found here)

  • Some changes require generics and therefore go1.18. The rationale is it provides a nicer API for scanning composites. If this is something you'd prefer to not do, I'm sure I can find a way to rewrite it.
  • I also pulled in testify and sqlscan, purely for convenient testing. If you prefer to have zero dependencies, I'm happy to rewrite the tests.
  • The primary change is using the duckdb data_chunk API instead of duckdb_column_data (which apparently is deprecated in favor of data_chunk anyway). I couldn't find a way to get composites to work otherwise.

Let me know what you think! If you're interested, I will open a PR.

Support query cancellation and deadlines using contexts

Currently, the go-duckdb driver doesn't support cancelling queries when a context is cancelled. I just discovered it might be easy to add:

  • By implementing database/sql/driver.StmtExecContext instead of database/sql/driver.Stmt, we get access to the query context and can check for cancellation.
  • In statement.go, duckdb_execute_prepared is used to run a query. That will block for the duration of the entire query, which means we can't cancel a DuckDB query if the context gets cancelled. However, DuckDB provides an alternative that allows the caller to periodically regain control during query execution – duckdb_pending_prepared and related functions (docs).

It looks like the Julia driver moved to using this (PR) to allow garbage collection to run. We could use it to periodically check the context, and cancel a query if the context has been cancelled or exceeded its deadline.

What do you think of this idea? Would be happy to submit a PR for this

Querying Apache Arrow table

I'm wondering if it's possible to load an Apache Arrow table for querying with go-duckdb - if so, could you please point me to an example usage?

Thanks for your time! 🙂

functions about parquet seem not to be working

Hi,

I wrote a simple code in golang, just running a SQL statement that borrowed from duckdb example, like this:

db, err := sql.Open("duckdb", "")
if err != nil {
	log.Fatal(err)
}
defer db.Close()

rows, err := db.Query(`SELECT * FROM read_parquet('test.parquet')`)

I got error like below:

2022/05/27 17:33:34 query: Catalog Error: Table Function with name read_parquet does not exist!
Did you mean "read_csv_auto"?
LINE 1: SELECT f1 FROM read_parquet('input.parquet')

It seems like the driver (or DuckDB, I'm not an expert in C++) doesn't recognize any function about parquet,
such as:

  • read_parquet
  • parquet_scan

I use:

  • go 1.18
  • duckdb 0.3.4
  • go-duckdb@latest

Thanks

Implement Appender

Hi, was looking to use the Appender method described here: https://duckdb.org/docs/data/appender

I figure this could be implemented in Go to use the C++ method?

I would take a stab at it, but am unfamiliar with using C methods in Go. Is this something of considerable lift?

I also noticed a bunch of lines referencing the appender class, here in the duckdb.h file.

windows cgo question

Sorry to ask this here. What is the proper way to set the go env in windows for this package? I've not used cgo and I'm a bit lost and not feeling very smart. I plan on carefully reading more on cgo in the medium term but what I've read so far hasn't helped.

I've downloaded the library libduckdb-windows-amd64 and see the following files:

duckdb.dll
duckdb.h
duckdb.hpp

Thanks for any help or advice.

how to link to duckdb library 0.8.0?

I tried

go clean -cache -modcache

but it looks like I'm still getting duckdb 0.7.1:

could not open database: IO Error: Trying to read a database file with version number 51, but we can only read version 43.

Any hints on getting linked with duckdb 0.8.0 ?

too many arguments in call to _Cfunc_duckdb_nparams

github.com/marcboeker/go-duckdb

../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:41:18: too many arguments in call to _Cfunc_duckdb_nparams
have (_Ctype_duckdb_prepared_statement, *_Ctype_ulonglong)
want (_Ctype_duckdb_prepared_statement)
make: *** [build] Error 2

platform: macOS
duckdb version: 0.3.0

Cannot execute multi-statement queries

This didn't take long, but I encountered a bug with the refactored bindings. Seems like prepared statements can't be used for multi-statement queries.

Executing a query that contains multiple statements, such as:

_, err = db.Exec("INSTALL 'json'; LOAD 'json';")
if err != nil {
	fmt.Printf("error: %v", err.Error())
}

Outputs the following error:

error: Invalid Error: Cannot prepare multiple statements at once!

I don't know how other DB drivers deal with this, but one idea is to only use prepared statements for queries that have args? (Not sure if that has its own caveats.)
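A caller-side workaround is to split the string and run each statement separately. This is a naive sketch (it would break on semicolons inside string literals or comments, so it is only a stopgap); the exec callback stands in for db.Exec:

```go
package main

import (
	"fmt"
	"strings"
)

// execEach splits a multi-statement string on ';' and hands each
// non-empty statement to exec. Naive sketch: it does not handle
// semicolons inside string literals or comments.
func execEach(query string, exec func(stmt string) error) error {
	for _, stmt := range strings.Split(query, ";") {
		stmt = strings.TrimSpace(stmt)
		if stmt == "" {
			continue
		}
		if err := exec(stmt); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	var ran []string
	_ = execEach("INSTALL 'json'; LOAD 'json';", func(s string) error {
		ran = append(ran, s)
		return nil
	})
	fmt.Println(ran) // [INSTALL 'json' LOAD 'json']
}
```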

Use DuckDB amalgamation instead of shared libraries

Hello! First of all, thank you for building this lovely library!

I wanted to ask if you have considered using CGo to build DuckDB from source instead of downloading and linking the DuckDB shared libraries? We cross-compile and distribute our binaries, so needing to bundle a shared library is quite the headache.

DuckDB is available as a single combined C++ source and header file (a so-called "amalgamation"), which should make it easy to bundle together with this repo. That would be similar to what the go-sqlite3 library does (here's a direct link to the amalgamated SQLite code in the repo).

One caveat is that building DuckDB takes a while (as an example, 10 minutes on my 4-year old MacBook Pro). That would only apply when you first install the library, though. After that Go caches the build (for CI/CD, caching the builds also appears to be reasonably easy).

Overall, assuming this works, the pros/cons would be:

  • Pros: gets rid of downloads and build flags (incl. Makefile) and you get a standalone output binary
  • Cons: initial install will take several minutes

If you're interested in this, we would be very happy to develop/test it and submit it as a PR!

Query "select * from table_foo" leads to "unsupported type 9" error

I am using the QueryContext API to run a query of the form "select * from table_foo", and it always leads to an error.
Using DuckDB CLI, the same query works fine against the db. I have tried different tables with the same outcome.
The data types in my table are regular string, int and nothing fancy.

I am using DuckDB 0.4.1 and github.com/marcboeker/go-duckdb v0.0.0-20220802072201-273b1311a677
Is this a known issue?

-Thanks.

Running `install https` results in IO Error not found

I'm trying to use the httpfs extension but hitting issues.
I'm on macOS using the latest duckdb version here (v1.0.7).

Running the following:

	db, err := sql.Open("duckdb", "")
	dir, err := ioutil.TempDir("", "duckdb_home")
	require.NoError(t, err)
	log.Printf("duckdb_home: %s", dir)

	_, err = db.Exec(`
	SET home_directory='` + dir + `';
	INSTALL 'httpfs';`)
	require.NoError(t, err)

I get:

	Error:      	Received unexpected error:
        	            	IO Error: Extension "/…/duckdb_home4021941023/.duckdb/extensions/v0.6.1/osx_amd64/httpfs.duckdb_extension" not found

Possible memory leak when processing parquet files

We have seen that the used memory keeps increasing when we are processing parquet files with the go-duckdb library.
After we have processed a single parquet file and closed the db connection, I would assume that the used memory is released. However, as seen in the example graph below, that is not the case, and for repeated processing of the same source the memory continues to increase.

Expected behaviour

The memory used for querying a parquet file is released when it is finished.

Actual behaviour

The memory continues to increase and never seems to be released.
[graph: memory usage growing across repeated runs]

Reproduce

I have created a tiny project to reproduce the issue that can be found here: https://github.com/goober/duckdb-memory-leak

Build on MacOS fails with GOOS=linux

I'm trying to build a small Lambda app to be deployed in an AWS container.
I'm building on macOS with

CGO_ENABLED=1 GOOS=linux GOARCH=amd64 go build -o af-bdnd-athena-query af-bdnd-athena-query.go

and fails with

# runtime/cgo
linux_syscall.c:67:13: error: call to undeclared function 'setresgid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
linux_syscall.c:67:13: note: did you mean 'setregid'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:593:6: note: 'setregid' declared here
linux_syscall.c:73:13: error: call to undeclared function 'setresuid'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
linux_syscall.c:73:13: note: did you mean 'setreuid'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/unistd.h:595:6: note: 'setreuid' declared here

Support extensions when using a connection pool

We want to use the built-in connection pool in database/sql to run concurrent DuckDB queries. So far, we have not done so (disabled by setting db.SetMaxOpenConns(1)) because of two consistency problems:

  1. Changes made in one connection were not immediately available in other connections – that should be fixed by #61
  2. Extensions need to be loaded on each new connection (they're not shared), but database/sql automatically creates new connections (and to my knowledge, there's no hook to do custom init)

We would like to submit a PR that addresses point 2. I can think of a two different solutions:

  1. Support a custom ?extensions=json,parquet,... config syntax, which the driver intercepts. It will then call INSTALL for each specified extension when the DB is opened, and subsequently call LOAD for each extension when a new connection is created.
  2. Add an init callback option to the connector, which you can pass using sql.OpenDB – e.g. something like sql.OpenDB(duckdb.NewConnector(dsn, connInitFn)) where connInitFn will be invoked each time a new connection is created for the DB handle.

What do you think of these options – or do you have a better idea for how we can solve this problem?

As an aside, we have done some benchmarking of using multiple DuckDB connections, and achieved a 2x performance boost for processing 1k queries when using a connection pool of 10 (matching the number of cores on the system) instead of 1.

User column from parquet

Selecting column named user from parquet returns unexpected results.

An example with R.

df <- data.frame(
  user = c("Me", "You"),
  x = 1L
)

arrow::write_parquet(df, "test.parquet")

con <- DBI::dbConnect(duckdb(), ":memory:")

DBI::dbGetQuery(con, "SELECT user FROM read_parquet('test.parquet');")

DBI::dbDisconnect(con)
  main.user()
1      duckdb
2      duckdb

Use ExtractStatement to split query into multiple statements

DuckDB offers a method in the C++ API to split a cmd string into multiple statements. Now the method has been ported to the C API in a first PR. I've started to implement the the duckdb_extracted_statements function.

What should be the expected behaviour if the following code is given?

db.Exec("INSERT INTO foo VALUES (?); INSERT INTO bar VALUES (?);", "hello", "world")

a) Should we extract the single statements and distribute the parameters over them? The first statement would receive hello and the second world.

b) Should we provide all arguments to all statements? Statements 1 and 2 would then each receive hello and world, which would result in an error, as each statement only expects one argument.

c) Should we throw an error if multiple statements are provided along with one or more arguments, as is implemented at the moment?

My personal opinion would be c, or maybe a. But a feels counterintuitive and contains too much magic. b seems totally weird to me.
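For reference, option a would mean counting the ? placeholders per extracted statement and slicing the argument list accordingly. A naive sketch of that distribution step (it ignores ? inside string literals; distributeArgs is a hypothetical helper, not proposed API):

```go
package main

import (
	"fmt"
	"strings"
)

// distributeArgs assigns args to statements in order, based on the
// number of '?' placeholders in each. Naive sketch: it does not
// skip '?' inside string literals.
func distributeArgs(stmts []string, args []any) ([][]any, error) {
	out := make([][]any, len(stmts))
	for i, s := range stmts {
		n := strings.Count(s, "?")
		if n > len(args) {
			return nil, fmt.Errorf("not enough arguments for statement %d", i)
		}
		out[i], args = args[:n], args[n:]
	}
	return out, nil
}

func main() {
	groups, _ := distributeArgs(
		[]string{"INSERT INTO foo VALUES (?)", "INSERT INTO bar VALUES (?)"},
		[]any{"hello", "world"},
	)
	fmt.Println(groups) // [[hello] [world]]
}
```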

Another thing is that only the last statement would return a result, which also seems confusing. Have you ever had the use-case of multiple SELECT statements in one query string in your application?

What do you think @begelundmuller ?

Thanks!

Linux Build Fails

Me again, but this time I'm using the latest Fedora.

Here is the simplified code taken from the basic example:

package main

import (
	"database/sql"
	"log"

	_ "github.com/marcboeker/go-duckdb"
)

var (
	db *sql.DB
)

func main() {
	// Use second argument to store DB on disk
	// db, err := sql.Open("duckdb", "foobar.db")

	var err error
	db, err = sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	check(db.Ping())

}

func check(args ...interface{}) {
	err := args[len(args)-1]
	if err != nil {
		panic(err)
	}
}

I have the following files:

$ ls duckdb/
duckdb.h  duckdb.hpp  libduckdb-linux-amd64.zip  libduckdb.so

Then I try to build and get the following message. I don't use cgo much, so I am likely doing something wrong:

$ export CGO_CFLAGS="-I/Projects/duckduckgo/duckdb/"
$ export CGO_LDFLAGS="-L/Projects/duckduckgo/duckdb/"
$ go run main.go
# github.com/marcboeker/go-duckdb
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:41:32: cannot use &pc (type *_Ctype_ulonglong) as type *_Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:53:45: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:58:46: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:63:46: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:68:46: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:68:96: cannot use _Ctype_longlong(v) (type _Ctype_longlong) as type _Ctype_long in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:73:47: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:78:48: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:86:48: cannot use _Ctype_ulonglong(i + 1) (type _Ctype_ulonglong) as type _Ctype_ulong in assignment

Use math/big for HugeInt and time.Duration for interval

I'm running into some issues when serializing HugeInts. When I looked at the code, I also saw that INTERVALs are returned as DuckDB types.

Are you interested in a PR from me that moves to types from the standard library? I would replace HugeInt with math/big.Int and Interval with time.Duration.
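For reference, DuckDB's HUGEINT is a 128-bit value split into an int64 upper half and a uint64 lower half in the C API, so the conversion to math/big.Int is upper*2^64 + lower. The helper below is a sketch of that conversion, not go-duckdb's implementation:

```go
package main

import (
	"fmt"
	"math/big"
)

// hugeIntToBig reconstructs upper*2^64 + lower as a math/big.Int,
// matching the duckdb_hugeint layout in the C API. Sketch, not
// go-duckdb's implementation.
func hugeIntToBig(upper int64, lower uint64) *big.Int {
	v := new(big.Int).Lsh(big.NewInt(upper), 64)
	return v.Add(v, new(big.Int).SetUint64(lower))
}

func main() {
	// 2^64 + 1
	fmt.Println(hugeIntToBig(1, 1)) // 18446744073709551617
	// -1 is represented as upper = -1, lower = 2^64 - 1.
	fmt.Println(hugeIntToBig(-1, ^uint64(0))) // -1
}
```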

Getting 403 while doing concurrent queries in S3 parquet files

I am trying to query Parquet files in S3 from Go using this library. I have the following setup:

  1. First I install and load necessary extensions postgres_scanner and httpfs and configure S3 creds: s3_region, s3_secret_access_id, s3_secret_access_key and s3_session_token

  2. Then I copy selected tables from a Postgres DB and write individual tables to S3 as parquet files. No issues here.

COPY (SELECT * FROM POSTGRES_SCAN('tablename', 'public', 'pgConnUrl')) TO 's3://bucket/tablename.parquet'
  3. Then I create views linking the parquet files in S3. No issues here either.
CREATE OR REPLACE VIEW <table_1> AS SELECT * FROM read_parquet('s3://bucket/table_1.parquet');
CREATE OR REPLACE VIEW <table_2> AS SELECT * FROM read_parquet('s3://bucket/table_2.parquet');
  4. Now, I query the data, joining multiple tables (4-5).
SELECT * 
FROM table_1 t1 
INNER JOIN table_2 t2 on t2.user_id = t1.id

This works fine when I perform a single query. But when I try two or more queries concurrently, I get this error:

# bucket name redacted
Invalid Error: Unable to connect to URL "s3://<bucket>/file.parquet": 403 (Forbidden)
  • I am pretty sure it is not related to AWS credentials expiring or not having access, because it works when I perform a single query.
  • Also, there was no error when I tried querying the same bucket from the duckdb CLI with 4 terminal tabs open concurrently.

Overall, it looks like a rate-limiting error, but the 403 status code is confusing.

Is there any httpfs config to play around with? Is there any difference in the way this library breaks down queries vs. the duckdb CLI or other language libraries? What is the best way to debug this?

Invalid blob data for particular column

Hi,
I have a data source which loads and queries fine on the duckdb CLI and the Node.js client: https://drive.google.com/file/d/1HaWhVs3Wy5_kFEBE41w9ewLyeTAjLSnC/view?usp=sharing

When I import it using this binding, Go panics with an address violation or invalid address. Here is the code I am using:

package main

import (
  "database/sql"
  "fmt"
  _ "github.com/marcboeker/go-duckdb"
)

func ParseRows(rows *sql.Rows) {
  columns, err := rows.Columns()
  if err != nil {
    fmt.Println(err.Error())
    return
  }

  count := len(columns)
  values := make([]interface{}, count)
  scanArgs := make([]interface{}, count)
  for i := range values {
    scanArgs[i] = &values[i]
  }

  for rows.Next() {
    err = rows.Scan(scanArgs...)
    if err != nil {
      fmt.Printf("Scan Error: %s\n", err)
    } else {
      fmt.Println(values)
    }
  }
  err = rows.Err()
  if err != nil {
    fmt.Printf("Error: %s\n", err)
  }
  defer rows.Close()
}

const TableName = "nyc311_reduced"
const SourceFile = "nyc311-reduced.parquet"

func main() {
  db, err := sql.Open("duckdb", "stage.db?access_mode=READ_WRITE")
  if err != nil {
    fmt.Println(err)
    return
  }

  res, err := db.Query("DROP TABLE " + TableName)
  res, err = db.Query("CREATE TABLE " + TableName + " AS SELECT * FROM '" + SourceFile + "'")
  if err != nil {
    fmt.Println(err)
  }
  res, err = db.Query("PRAGMA table_info(" + TableName + ")")
  if err != nil {
    fmt.Println(err)
    return
  }
  ParseRows(res)
}

The following applies to the PRAGMA query:

  1. The panic happens for row index 31 (column 31 in the parquet).
  2. It is always for the dflt_value column of the PRAGMA output.
  3. When there is no panic, the PRAGMA returns a garbage value for the default value.

The length as seen in row.scanBlob for this is sometimes negative. Entries in s.prefix are sometimes all 0. It feels like there is some cursor-movement issue. I couldn't pinpoint whether duckdb is passing it incorrectly or whether there is an issue with cursor movement.

Potentially dumb question on compiling under Windows

I'm trying to build a project using go-duckdb under Win11 with Mingw64 (myself rather an OSX user) and experiencing the following:

$ go version
go version go1.19.7 windows/386

$ CGO_ENABLED=1 CGO_LDFLAGS="-L/d/Projects/duckdb/build/release/src" go build -v -tags=duckdb_use_lib -o bin/sample.exe main.go
github.com/marcboeker/go-duckdb
# github.com/marcboeker/go-duckdb
vendor\github.com\marcboeker\go-duckdb\rows.go:275:11: array length 1 << 31 (untyped int constant 2147483648) must be integer
vendor\github.com\marcboeker\go-duckdb\rows.go:381:44: cannot use value (variable of type _Ctype_struct___7) as type [4]byte in struct literal
vendor\github.com\marcboeker\go-duckdb\rows.go:381:49: too few values in _Ctype_struct___6{…}

Same issue when trying to build in PS1 shell:

PS D:\Projects\app> $env:CGO_ENABLED=1
PS D:\Projects\app> $env:CGO_LDFLAGS="-LD:\Projects\app\lib\libduckdb-windows-amd64"
PS D:\Projects\app> go build -v -tags=duckdb_use_lib -o bin\app.exe .\main.go
github.com/marcboeker/go-duckdb
# github.com/marcboeker/go-duckdb
vendor\github.com\marcboeker\go-duckdb\rows.go:275:11: array length 1 << 31 (untyped int constant 2147483648) must be integer
vendor\github.com\marcboeker\go-duckdb\rows.go:381:44: cannot use value (variable of type _Ctype_struct___7) as type [4]byte in struct literal
vendor\github.com\marcboeker\go-duckdb\rows.go:381:49: too few values in _Ctype_struct___6{…}

Any guidelines/help welcome!

You cannot compile when using cross compilation

CGO_ENABLED=1 GOOS=linux GOARCH=arm64 CXX=/media/sd1data/opt/rockchip_rk3568/host/bin/aarch64-linux-g++ go build -o shmViewV2_main main.go

# runtime/cgo
gcc_arm64.S: Assembler messages:
gcc_arm64.S:28: Error: no such instruction: `stp x29,x30,[sp,'
gcc_arm64.S:32: Error: too many memory references for `mov'
gcc_arm64.S:34: Error: no such instruction: `stp x19,x20,[sp,'
gcc_arm64.S:37: Error: no such instruction: `stp x21,x22,[sp,'
gcc_arm64.S:40: Error: no such instruction: `stp x23,x24,[sp,'
gcc_arm64.S:43: Error: no such instruction: `stp x25,x26,[sp,'
gcc_arm64.S:46: Error: no such instruction: `stp x27,x28,[sp,'
gcc_arm64.S:50: Error: too many memory references for `mov'
gcc_arm64.S:51: Error: too many memory references for `mov'
gcc_arm64.S:52: Error: too many memory references for `mov'
gcc_arm64.S:54: Error: no such instruction: `blr x20'
gcc_arm64.S:55: Error: no such instruction: `blr x19'
gcc_arm64.S:57: Error: no such instruction: `ldp x27,x28,[sp,'
gcc_arm64.S:60: Error: no such instruction: `ldp x25,x26,[sp,'
gcc_arm64.S:63: Error: no such instruction: `ldp x23,x24,[sp,'
gcc_arm64.S:66: Error: no such instruction: `ldp x21,x22,[sp,'
gcc_arm64.S:69: Error: no such instruction: `ldp x19,x20,[sp,'
gcc_arm64.S:72: Error: no such instruction: `ldp x29,x30,[sp],'
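A likely cause of these errors is that only CXX points at the cross toolchain, so Go falls back to the host C compiler to assemble the arm64 runtime/cgo stub (gcc_arm64.S), and the x86 assembler rejects arm64 instructions like stp/ldp. A sketch of a full cross-compile invocation, with placeholder toolchain paths, would set CC as well:

```shell
# Sketch only: both CC and CXX must point at the target toolchain;
# the paths below are placeholders for your aarch64 cross-compilers.
export CGO_ENABLED=1 GOOS=linux GOARCH=arm64
export CC=/path/to/aarch64-linux-gnu-gcc
export CXX=/path/to/aarch64-linux-gnu-g++
go build -o app main.go
```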

Support configuring duckdb

Issue

Unable to set configuration attributes on duckdb

Context

The duckdb C documentation shows configuring various settings:

duckdb_set_config(config, "access_mode", "READ_WRITE"); // or READ_ONLY
duckdb_set_config(config, "threads", "8");
duckdb_set_config(config, "max_memory", "8GB");
duckdb_set_config(config, "default_order", "DESC");

Settings like access_mode affect query concurrency, so it would be nice to be able to set these. I'm unsure how to integrate them into the database/sql interface in Go, but I'm willing to help with some guidance.

Windows compiling error with "_access_s" error

On my local Windows environment, compiling any simple test Go program with the TDM-GCC-64 compiler fails with an error like the following. I did not find any similar error reports after searching. Could you please take a look?

github.com/marcboeker/go-duckdb

duckdb.cpp: In function 'bool duckdb_httplib::detail::is_file(const string&)':
duckdb.cpp:160873:34: error: '_access_s' was not declared in this scope
return _access_s(path.c_str(), 0) == 0;
^

cc1plus.exe: out of memory allocating 65536 bytes

Passing a map as an argument to a query

Thank you for creating this library (and just merging #40 for static builds)!

I created a table with a column of map datatype, and when I attempt to insert a record using the Map type, I get the error:

unsupported type duckdb.Map, a map

I noticed this line has a TODO. Is this for supporting additional data types as passed arguments to queries? Is it a matter of filling out more of these cases?

Segmentation fault when using `INSERT OR REPLACE INTO`

Thank you for building this library, it's been great so far.

I ran into an error with INSERT OR REPLACE INTO queries, as the following fairly minimal reproduction illustrates:

package main

import (
	"database/sql"

	_ "github.com/marcboeker/go-duckdb"
)

func main() {
	db, err := sql.Open("duckdb", "")
	if err != nil {
		panic(err)
	}
	_, err = db.Exec("CREATE TABLE foo (id int PRIMARY KEY, b text);")
	if err != nil {
		panic(err)
	}
	_, err = db.Exec("INSERT OR REPLACE INTO foo VALUES (?, ?);", 1, "b")
	if err != nil {
		panic(err)
	}
	err = db.Close()
	if err != nil {
		panic(err)
	}
}

This exits with a fatal error on the INSERT OR REPLACE INTO line (which I'm truncating slightly for brevity):

fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=0x1052ed7b0]

runtime stack:
runtime.throw({0x1060b1960?, 0xaaaaaaaaaaaaaaab?})
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/panic.go:1047 +0x40 fp=0x16b0623c0 sp=0x16b062390 pc=0x104dd1c50
runtime.sigpanic()
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/signal_unix.go:821 +0x240 fp=0x16b062400 sp=0x16b0623c0 pc=0x104de7bc0

goroutine 1 [syscall]:
runtime.cgocall(0x104e58908, 0x14000058778)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/runtime/cgocall.go:157 +0x50 fp=0x14000058740 sp=0x14000058700 pc=0x104da4310
github.com/marcboeker/go-duckdb._Cfunc_duckdb_pending_prepared(0x6000027d42e0, 0x140000b2050)
        _cgo_gotypes.go:1319 +0x34 fp=0x14000058770 sp=0x14000058740 pc=0x104e52754
github.com/marcboeker/go-duckdb.(*stmt).execute.func1(0x140000587f8?, 0x140000960a0?)
        /Users/hermanschaaf/go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:198 +0x78 fp=0x140000587b0 sp=0x14000058770 pc=0x104e56c78
github.com/marcboeker/go-duckdb.(*stmt).execute(0x140000aa108, {0x1062c7468, 0x140000b0000}, {0x140000960a0?, 0x1062aa380?, 0x1062c6ab0?})
        /Users/hermanschaaf/go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:198 +0x84 fp=0x14000058860 sp=0x140000587b0 pc=0x104e563a4
github.com/marcboeker/go-duckdb.(*stmt).ExecContext(0x14000090130?, {0x1062c7468?, 0x140000b0000?}, {0x140000960a0?, 0x14000058938?, 0x1?})
        /Users/hermanschaaf/go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:158 +0x38 fp=0x140000588e0 sp=0x14000058860 pc=0x104e560c8
github.com/marcboeker/go-duckdb.(*conn).ExecContext(0x140000589c8?, {0x1062c7468, 0x140000b0000}, {0x1060b145b?, 0x140000960a0?}, {0x140000960a0, 0x2, 0x2})
        /Users/hermanschaaf/go/pkg/mod/github.com/marcboeker/[email protected]/connection.go:47 +0xd0 fp=0x14000058960 sp=0x140000588e0 pc=0x104e52cf0
database/sql.ctxDriverExec({0x1062c7468?, 0x140000b0000?}, {0x1062c7028?, 0x14000090130?}, {0x0?, 0x0?}, {0x1060b145b?, 0x1062bd868?}, {0x140000960a0?, 0x104da9fc0?, ...})
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/ctxutil.go:31 +0xac fp=0x140000589f0 sp=0x14000058960 pc=0x104e358dc
database/sql.(*DB).execDC.func2()
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1675 +0x12c fp=0x14000058ac0 sp=0x140000589f0 pc=0x104e3a64c
database/sql.withLock({0x1062c72b0, 0x140000c4090}, 0x14000058c78)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:3405 +0x7c fp=0x14000058b00 sp=0x14000058ac0 pc=0x104e3aa4c
database/sql.(*DB).execDC(0x140000acb60?, {0x1062c7468, 0x140000b0000}, 0x140000c4090, 0x10639b860?, {0x1060b145b, 0x29}, {0x14000058f48, 0x2, 0x2})
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1670 +0x19c fp=0x14000058d10 sp=0x14000058b00 pc=0x104e3a13c
database/sql.(*DB).exec(0x14000058da8?, {0x1062c7468, 0x140000b0000}, {0x1060b145b, 0x29}, {0x14000058f48, 0x2, 0x2}, 0xa0?)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1655 +0xa0 fp=0x14000058d90 sp=0x14000058d10 pc=0x104e39f40
database/sql.(*DB).ExecContext.func1(0x38?)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1634 +0x54 fp=0x14000058e00 sp=0x14000058d90 pc=0x104e39e14
database/sql.(*DB).retry(0x14000058ee8?, 0x14000058e90)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1538 +0x4c fp=0x14000058e50 sp=0x14000058e00 pc=0x104e39c2c
database/sql.(*DB).ExecContext(0x1060aabba?, {0x1062c7468?, 0x140000b0000?}, {0x1060b145b?, 0x1400004a728?}, {0x14000058f48?, 0x1400004a768?, 0x104dd41dc?})
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1633 +0x94 fp=0x14000058ef0 sp=0x14000058e50 pc=0x104e39d34
database/sql.(*DB).Exec(...)
        /opt/homebrew/Cellar/go/1.20.1/libexec/src/database/sql/sql.go:1647
main.main()
        /Users/hermanschaaf/code/cloudquery/cloudquery/plugins/destination/duckdb/bugs/test.go:18 +0xe4 fp=0x14000058f70 sp=0x14000058ef0 pc=0x104e57194
runtime.main()

If I change the query to a normal INSERT INTO, it works fine, so it seems specific to INSERT OR REPLACE INTO. I'm using github.com/marcboeker/go-duckdb v1.2.0.

I haven't dug into the internals of this library yet, so any pointers would be appreciated.

Reading rows of strings/byte arrays panics after second call

Morning Marc,

just did some more testing, and I found a panic for the string/blob type. When I query multiple rows and want to retrieve them, the second call fails with a panic. It works fine with integers. Updated code below.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"math/rand"
	"os"
	"path"
	"reflect"
	"time"

	_ "github.com/marcboeker/go-duckdb"

	"github.com/segmentio/parquet-go"
	"github.com/segmentio/parquet-go/compress"
	"github.com/segmentio/parquet-go/encoding"
	"github.com/segmentio/parquet-go/encoding/plain"
)

type (
	Schema struct {
		fields []parquet.Field
	}

	Field struct {
		name string
		typ  parquet.Type
	}
)

// For leaf nodes, returns the type of values of the parquet column.
//
// Calling this method on non-leaf nodes will panic.
func (s Schema) Type() parquet.Type {
	// TODO:
	return parquet.Group{}.Type()
}

func (s Schema) Optional() bool {
	return false
}

func (s Schema) Repeated() bool {
	return false
}

func (s Schema) Required() bool {
	return true
}

func (s Schema) Leaf() bool {
	return false
}

func (s Schema) Fields() []parquet.Field {
	return s.fields
}

func (s Schema) Encoding() encoding.Encoding {
	return &plain.Encoding{}
}

func (s Schema) Compression() compress.Codec {
	return nil
}

func (s Schema) String() string {
	return fmt.Sprintf("%+v", s.fields)
}

func (s Schema) GoType() reflect.Type {
	panic("not supported")
}

func (f Field) String() string {
	return fmt.Sprintf("%s: %v", f.name, f.typ)
}

func (f Field) Type() parquet.Type {
	return f.typ
}

func (f Field) Optional() bool {
	return false
}

func (f Field) Repeated() bool {
	return false
}

func (f Field) Required() bool {
	return false
}

func (f Field) Leaf() bool {
	return true
}

func (f Field) Fields() []parquet.Field {
	return nil
}

func (f Field) Encoding() encoding.Encoding {
	return &plain.Encoding{}
}

func (f Field) Compression() compress.Codec {
	return nil
}

func (f Field) GoType() reflect.Type {
	panic("not supported")
}

func (f Field) Name() string {
	return f.name
}

func (f Field) Value(base reflect.Value) reflect.Value {
	panic("not supported")
}

const letterBytes = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

func RandStringBytes(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letterBytes[rand.Intn(len(letterBytes))]
	}
	return string(b)
}

func RandBool() bool {
	rand.Seed(time.Now().UnixNano())
	return rand.Intn(2) == 1
}

func main() {
	tempDir, _ := os.MkdirTemp("/tmp", "writer_test")
	parquetFileNameDataTable := path.Join(tempDir, "test.parquet")
	log.Print(parquetFileNameDataTable)
	file, err := os.Create(parquetFileNameDataTable)
	if err != nil {
		log.Fatal(err)
	}

	fields := make([]parquet.Field, 4)
	fields[0] = Field{
		name: "int_field",
		typ:  parquet.Int64Type,
	}
	fields[1] = Field{
		name: "string_field",
		typ:  parquet.ByteArrayType,
	}
	fields[2] = Field{
		name: "double_field",
		typ:  parquet.DoubleType,
	}
	fields[3] = Field{
		name: "bool_field",
		typ:  parquet.BooleanType,
	}

	schema := parquet.NewSchema("", Schema{fields: fields})

	writer := parquet.NewWriter(file, &parquet.WriterConfig{
		Schema: schema,
	})

	// Boolean           Kind = Kind(format.Boolean)
	// Int32             Kind = Kind(format.Int32)
	// Int64             Kind = Kind(format.Int64)
	// Int96             Kind = Kind(format.Int96)
	// Float             Kind = Kind(format.Float)
	// Double            Kind = Kind(format.Double)
	// ByteArray         Kind = Kind(format.ByteArray)
	// FixedLenByteArray Kind = Kind(format.FixedLenByteArray)

	row := make(parquet.Row, 4)

	var val0 int64
	var val1 string
	var val2 float64
	var val3 bool

	for i := 1; i < 5; i++ {

		val0 = rand.Int63()
		val1 = RandStringBytes(10)
		val2 = rand.Float64()
		val3 = RandBool()
		log.Print(val0, ", ", []byte(val1), ", ", val2, ", ", val3)
		row[0] = parquet.ValueOf(val0).Level(0, 0, 0)
		row[1] = parquet.ValueOf([]byte(val1)).Level(0, 0, 1)
		row[2] = parquet.ValueOf(val2).Level(0, 0, 2)
		row[3] = parquet.ValueOf(val3).Level(0, 0, 3)
		rows := []parquet.Row{row}
		// writer header
		_, err = writer.WriteRows(rows)
		if err != nil {
			log.Fatal(err)
		}
	}

	if err = writer.Flush(); err != nil {
		log.Fatal(err)
	}
	if err = writer.Close(); err != nil {
		log.Fatal(err)
	}

	db, err := sql.Open("duckdb", "")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// works okay
	rows, err := db.Query(fmt.Sprintf("SELECT int_field FROM  read_parquet('%s')", parquetFileNameDataTable))
	if err != nil {
		log.Fatal(err)
	}
	log.Print("*********************************************")
	for rows.Next() {
		err = rows.Scan(&val0)
		if err != nil {
			log.Fatal(err)
		}
		log.Print(val0)
	}
	rows.Close()

	// does not work for more than one call; the second call throws a panic
	rows, err = db.Query(fmt.Sprintf("SELECT string_field FROM  read_parquet('%s')", parquetFileNameDataTable))
	if err != nil {
		log.Fatal(err)
	}
	log.Print("*********************************************")
	for rows.Next() {
		var tmpVal1 []byte
		err = rows.Scan(&tmpVal1)
		if err != nil {
			log.Fatal(err)
		}
		log.Print(tmpVal1)
	}
	rows.Close()

	if err != nil {
		log.Fatal(err)
	}
}

Problem running go get according to instructions on macOS Monterey with M1 chip

Failing to install this package. Perhaps someone can guide me on how to get up and running?

Installed duckdb and verified that it works.

I get a different error when passing in the environment variables; I'm guessing it's because of a version mismatch? Using the latest (Mar 4, 2022) commit from the duckdb master branch, 8d8a8d16dee075984c847d9e81cd0e2d7539f1c7.

Thanks!

>> go get github.com/marcboeker/go-duckdb
# github.com/marcboeker/go-duckdb
../../go/pkg/mod/github.com/marcboeker/[email protected]/connection.go:4:10: fatal error: 'duckdb.h' file not found
#include <duckdb.h>
         ^~~~~~~~~~
1 error generated.
>> CGO_LDFLAGS="-L/Users/christianpersson/repos/duckdb/build/release/src" CGO_CFLAGS="-I/Users/christianpersson/repos/duckdb/src/include" DYLD_LIBRARY_PATH="/Users/christianpersson/repos/duckdb/build/release/src" go get github.com/marcboeker/go-duckdb

# github.com/marcboeker/go-duckdb
../../go/pkg/mod/github.com/marcboeker/[email protected]/connection.go:105:40: res.error_message undefined (type _Ctype_struct___0 has no field or method error_message)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:28:34: r.r.column_count undefined (type *_Ctype_struct___0 has no field or method column_count)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:30:25: c.name undefined (type _Ctype_struct___1 has no field or method name)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:41:26: r.r.row_count undefined (type *_Ctype_struct___0 has no field or method row_count)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:47:25: r.r.column_count undefined (type *_Ctype_struct___0 has no field or method column_count)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:50:13: col._type undefined (type _Ctype_struct___1 has no field or method _type)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:54:48: col.data undefined (type _Ctype_struct___1 has no field or method data)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:56:48: col.data undefined (type _Ctype_struct___1 has no field or method data)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:58:49: col.data undefined (type _Ctype_struct___1 has no field or method data)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:60:49: col.data undefined (type _Ctype_struct___1 has no field or method data)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:60:49: too many errors

field undefined

../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:70:12: val.year undefined (type _Ctype_struct___3 has no field or method year)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:71:19: val.month undefined (type _Ctype_struct___3 has no field or method month)
../../go/pkg/mod/github.com/marcboeker/[email protected]/rows.go:72:12: val.day undefined (type _Ctype_struct___3 has no field or method day)
../../go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:41:18: too many arguments in call to _Cfunc_duckdb_nparams

duckdb version: 0.3.0

windows build exit status 0xc0000135

I'm trying to build the example.go file and am seeing exit status 0xc0000135. I'm running Windows 10. I'm using the available duckdb 0.3.4 dlls and header file and not building these myself. I put the dll here C:\ProgramData\chocolatey\lib\mingw\tools\install\mingw64\lib and the header files in the go-duckdb library. I believe this means a dll is missing. I tried installing dotnet which didn't resolve the issue. Any thoughts on what might be missing?

possible to compile from source and link the json or other extension statically?

Thanks for the golang binding for duckdb.
I have a question about building DuckDB from source with the JSON extension included, so that when I run it in a Docker container, it does not need to download the extension from the internet.

Originally, I used the line below for a static build, and it succeeded:
RUN CGO_ENABLED=1 go build -a -ldflags '-extldflags "-static" -w -s' -o app ./cmd

But when I run it in a Docker container and use the JSON extension, it tries to download the extension into the ~/.duckdb directory. The download takes a long time and, in the end, it does not work.

So is it possible to link the JSON extension statically?
The DuckDB CMake file has the following options, but I do not know how to pass them to a customized DuckDB build via CGO and link the extension statically. Is this possible?

option(BUILD_JSON_EXTENSION "Build the JSON extension." FALSE)
option(EXTENSION_STATIC_BUILD
"Extension build linking statically with DuckDB. Required for building linux loadable extensions."
FALSE)

heap corruption (windows) at check(db.Ping())

On regular Windows 10, running the example program, I get exit status 0xc0000374 (heap corruption) at the line check(db.Ping()) when it panics.

Go 1.18.2, duckdb 0.3.4

Running with:

$env:LIB="duckdb.dll"
$env:CGO_LDFLAGS = "-LC:\Users\me\golang\duckdb_testing"
$env:CGO_CFLAGS = "-IC:\Users\me\golang\duckdb_testing"
go run -ldflags="-r C:\Users\me\golang\duckdb_testing" main.go

Question: Provided duckdb binary references shared libraries (libstdc++.so.6, libm.so.6 etc)

First off, thanks so much for the project, it works very well already.

I encountered a problem with the provided duckdb binary (linux x64) referencing shared libraries and the binary crashing if those are not there.
I'm not too familiar with C++ build chains or how static libraries are supposed to behave, but right now the duckdb binary requires certain shared libraries to be available, like libstdc++.so.6, libm.so.6, libgcc_s.so.1, glibc, etc. Is that expected behavior? I would have expected a static library to have no dependencies; otherwise, it should be documented somewhere that the OS must provide those. The problem mostly surfaced in our deployment image, which is very minimal, without any build chains, and there the binary would complain about the missing libraries.

I can make an example repo with a Docker build that exhibits this behavior if needed.

DuckDB query hangs on deeply nested query when scheduled on a non-main Go thread

I've been debugging a bug where DuckDB queries hang/freeze when a) the query has a deeply nested structure, and b) Go schedules the call on a non-main thread.

Initially, I mistakenly thought I had replicated the bug purely in C and filed an issue in DuckDB: duckdb/duckdb#4802. However, it turns out that it doesn't replicate the actual freeze (though it probably replicates the root cause, which is a stack overflow). After much debugging, I now believe the bug is not related to DuckDB at all, but is rather a bug in how Go handles stack overflows in cgo calls. I have filed a bug report here: golang/go#55919.

If I'm correct, then this bug isn't related to go-duckdb, but since it sits at the intersection of Go and DuckDB, I want to share it here for reference. Depending on feedback in the Go issue, we can assess how to implement a workaround. The simplest solution (suggested in the DuckDB issue) is to limit the max_expression_depth to a value unlikely to cause a stack overflow.
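The workaround mentioned above could be applied per connection as a regular statement; max_expression_depth is a DuckDB setting, but the safe value below is an assumption that depends on thread stack size:

```sql
-- Sketch: cap expression nesting below the level that overflows
-- the C stack on non-main Go threads.
SET max_expression_depth TO 500;
```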

The following test case reproduces the bug (at least on macOS):

func TestDeepQuery(t *testing.T) {
	t.Parallel()
	db := openDB(t)
	defer db.Close()

	wg := sync.WaitGroup{}
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			rows, err := db.Query(`
				SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1
				UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1 UNION SELECT 1;`)
			require.NoError(t, err)
			defer rows.Close()
			require.True(t, rows.Next())
		}()
	}

	wg.Wait()
}

Unique BLOB not supported?

In v1.4.1, I get this error:

Not implemented Error: Indexes cannot contain BLOBs that contain null-terminated bytes.
when trying to insert a random blob (containing one or more zero bytes) into a UNIQUE blob column in this table schema:

CREATE SEQUENCE IF NOT EXISTS seq_hashes START 1;
CREATE TABLE IF NOT EXISTS "Hashes" (
    "id" bigint PRIMARY KEY DEFAULT nextval('seq_hashes'),
    "hash" bytea NOT NULL UNIQUE
);
  • Removing the UNIQUE keyword lets me insert the rows. I checked whether the error message is legit, and it is.
  • The concept of null-termination doesn't make any sense for BLOBs, so the error message is misleading: it wrongly suggests that I intend to misuse the BLOB datatype, when I want to use it for the exact reason it exists, to represent raw bytes of any size.
  • My natural assumption was that unimplemented SQL statements must fail early; that is, the above table creation should not be allowed to execute. I would like to see the "Not implemented Error" thrown immediately upon table creation.

Is it possible to have a clear roadmap/TODO list of the missing features, e.g. basic clauses like UNIQUE, PRIMARY KEY, and REFERENCES, and Postgres data types that are not fully supported in some operations?

Thanks for this cool project; thrilled to see it develop toward the maturity level of sqlite3!

bug: panic when querying a table that does not exist

panic: database/sql/driver: misuse of duckdb driver: Close with active Rows [recovered]
	panic: database/sql/driver: misuse of duckdb driver: Close with active Rows

goroutine 25 [running]:
testing.tRunner.func1.2({0xa00120, 0xba5400})
	/opt/golang/go1.17.2/src/testing/testing.go:1209 +0x24e
testing.tRunner.func1()
	/opt/golang/go1.17.2/src/testing/testing.go:1212 +0x218
panic({0xa00120, 0xba5400})
	/opt/golang/go1.17.2/src/runtime/panic.go:1038 +0x215
github.com/marcboeker/go-duckdb.(*stmt).Close(0xc000220480)
	/go/pkg/mod/github.com/marcboeker/[email protected]/statement.go:25 +0x5f
database/sql.(*driverStmt).Close(0xc00028a340)
	/opt/golang/go1.17.2/src/database/sql/sql.go:678 +0x9e
database/sql.(*driverConn).finalClose(0xc000220480)
	/opt/golang/go1.17.2/src/database/sql/sql.go:642 +0xbc
database/sql.(*DB).Close(0xc0002900d0)
	/opt/golang/go1.17.2/src/database/sql/sql.go:904 +0x27e
