Giter Site home page Giter Site logo

Comments (17)

marcboeker avatar marcboeker commented on July 24, 2024

Hi Tom,

thanks for the report. Could you please provide a sample repo to reproduce the problem. And could you please provide additional information about your build system/architecture that you are using.

Thanks
Marc

from go-duckdb.

chrisirhc avatar chrisirhc commented on July 24, 2024

I'm seeing the following build errors when trying to use this library:

external/go_sdk/pkg/tool/linux_amd64/link: running /usr/bin/gcc failed: exit status 1
external/com_github_marcboeker_go_duckdb/deps/linux_amd64/libduckdb.a(duckdb.o):duckdb.cpp:function duckdb::QueryProfiler::ToJSON[abi:cxx11]() const: error: undefined reference to 'std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
external/com_github_marcboeker_go_duckdb/deps/linux_amd64/libduckdb.a(duckdb.o):duckdb.cpp:function duckdb::TableCatalogEntry::ToSQL[abi:cxx11](): error: undefined reference to 'std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
external/com_github_marcboeker_go_duckdb/deps/linux_amd64/libduckdb.a(duckdb.o):duckdb.cpp:function duckdb::PhysicalExport::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::GlobalSourceState&, duckdb::LocalSourceState&) const: error:
undefined reference to 'std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
external/com_github_marcboeker_go_duckdb/deps/linux_amd64/libduckdb.a(duckdb.o):duckdb.cpp:function duckdb::PhysicalExport::GetData(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::GlobalSourceState&, duckdb::LocalSourceState&) const: error:
undefined reference to 'std::__cxx11::basic_stringstream<char, std::char_traits<char>, std::allocator<char> >::basic_stringstream()'
collect2: error: ld returned 1 exit status

These errors go away when I:

  1. Clone this repo
  2. Run make deps.linux.amd64
  3. Copy and use local build of deps/linux_amd64/libduckdb.a

I'm not sure if it's due to the way these builds are being built.

from go-duckdb.

marcboeker avatar marcboeker commented on July 24, 2024

We've seen this problem when switching the underlying build OS from Ubuntu 20.04 to 22.04. It seems, that 22.04 ships with a different std library. What happens if you take the libduckdb.a from the master branch? Does this work?
What is your build OS?

from go-duckdb.

chrisirhc avatar chrisirhc commented on July 24, 2024

I'm on Debian GNU/Linux 10 (buster). Taking libduckdb.a from the master branch (ba7f5a8) has the same issue. I believe this wasn't due to a recent change, so it's unrelated to fix in ba7f5a8.

from go-duckdb.

chrisirhc avatar chrisirhc commented on July 24, 2024

I'm wondering if these compile issues on linux_amd64 are due to the current build not being hermetic. I'm not an expert of c++ toolchains, but this article mentions some properties of hermetric toolchains which lead me to believe that the issues I'm (we're) seeing (#58, ba7f5a8) might be due to relying on the C++ toolchain of the host running the GitHub action.

One suggestion I have is to consider using zig c++ to cross-compile for all targets. This also allows you to run all the build actions on one machine and simplify your Github action. See envoyproxy/envoy#19535 (comment) for zig's use on envoyproxy.

Let me know your thoughts. The change seems simple, and likely would have strong benefits in making the behavior more consistent across linux flavors.

Change would look like:

  • Make sure zig is available on the machine building it, potentially use https://github.com/marketplace/actions/setup-zig for the GitHub action
  • Change g++ command to use zig c++ instead. All options should remain the same since it's a drop-in replacement.

More info about zig: https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html

from go-duckdb.

motiejus avatar motiejus commented on July 24, 2024

I'm wondering if these compile issues on linux_amd64 are due to the current build not being hermetic.

The problem here is that C++ does not have a stable ABI. Therefore, we cannot compile our application with clang (i.e. zig c++) and link it with a gcc-prebuilt libduckdb.a. If we do, we will get linker errors when lucky. If we are less lucky, undefined behavior.

Most Linux systems use GCC, therefore most developers ship gcc pre-compiled binaries. This is, unfortunately, quite prominent in Go ecosystem. I've seen this done with zstd, v8, now with libduckdb.

Not only we lose ability to use a different compiler by bundling static libraries, we lose flexibility:

  • ability to specify platform (-mcpu)
  • debug level (-gX).
  • sanitizers (tsan, ubsan, etc).
  • optimizations (e.g. -O2 vs -O3 or even -Os).

We should work on making compilations faster instead, not avoiding them altogether.

from go-duckdb.

marcboeker avatar marcboeker commented on July 24, 2024

@chrisirhc Does it also compile if you're using -tags=duckdb_from_source on your go run and go build?
I'm not an expert in compiler chains and cross compiling. The idea behind the current system is adapted from https://github.com/mattn/go-sqlite3. Have you tried using an older version like 1.0.3 for building you app. Does that work with the precompiled libs?

@motiejus Do you have an idea on how to make compilation faster?

from go-duckdb.

motiejus avatar motiejus commented on July 24, 2024

@motiejus Do you have an idea on how to make compilation faster?

There are a few ways to go about it:

  1. Use a caching C++ compiler. This highly depends on the environment. In the simplest case, it kind of works with Go. I.e. in theory one compiles duckdb only once on a specific machine.
  2. Use different optimization flags depending on the compilation mode. E.g. on my laptop:
$ hyperfine -r 1 -w 0  -L compiler g++,clang++-15 -L opt O0,O1,O2,O3  '{compiler} -{opt} -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'                                                                                                                       
Benchmark 1: g++ -O0 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        165.526 s               [User: 160.322 s, System: 5.137 s]
 
Benchmark 2: clang++-15 -O0 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        84.982 s               [User: 82.857 s, System: 2.117 s]
 
Benchmark 3: g++ -O1 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        291.096 s               [User: 286.639 s, System: 4.373 s]
 
Benchmark 4: clang++-15 -O1 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        356.233 s               [User: 353.957 s, System: 2.047 s]
 
Benchmark 5: g++ -O2 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        503.793 s               [User: 498.994 s, System: 4.574 s]
 
Benchmark 6: clang++-15 -O2 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        419.577 s               [User: 417.509 s, System: 1.905 s]
 
Benchmark 7: g++ -O3 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        660.839 s               [User: 655.709 s, System: 4.968 s]
 
Benchmark 8: clang++-15 -O3 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o
  Time (abs ≡):        572.025 s               [User: 569.888 s, System: 2.091 s]
 
Summary
  'clang++-15 -O0 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o' ran
    1.95 times faster than 'g++ -O0 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    3.43 times faster than 'g++ -O1 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    4.19 times faster than 'clang++-15 -O1 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    4.94 times faster than 'clang++-15 -O2 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    5.93 times faster than 'g++ -O2 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    6.73 times faster than 'clang++-15 -O3 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'
    7.78 times faster than 'g++ -O3 -Wno-unqualified-std-cast-call -DGODUCKDB_FROM_SOURCE -c duckdb.cpp -o duckdb.o'

The fastest and slowest optimization modes (-O0 and -O3) differ by 488 and 495 seconds for gcc and clang respectively (almost 5 minutes!).

Go does not support explicit "build modes" (e.g. opt for optimized builds, fastbuild for, er, fast builds). So with the Go's build system the closest is something like this:

  1. Do not set optimization flags in CFLAGS. Only what is truly necessary to build the library.
  2. Set the CFLAGS in the environment:
CGO_CFLAGS="-O2" go build <...>

Most library users do not set CGO_CFLAGS by default, unfortunately, and, if the user does not set this flag, more likely than not an un-optimized duckdb will end up in production.

Therefore, I don't have a good answer here besides "use a build system that supports compilation modes". But I acknowledge it's not for everyone.

from go-duckdb.

marcboeker avatar marcboeker commented on July 24, 2024

Thanks a lot for the benchmark. I've create a branch to play around with different compilers and versions to see what works the best. You can find the latest version here https://github.com/marcboeker/go-duckdb/tree/switch-compiler

from go-duckdb.

marcboeker avatar marcboeker commented on July 24, 2024

@TN1ck @chrisirhc I've switched from g++ to clang++. Could you please check, if the precompiled libs in the https://github.com/marcboeker/go-duckdb/tree/switch-compiler branch fix your problem?

from go-duckdb.

begelundmuller avatar begelundmuller commented on July 24, 2024

Just saw this thread and wanted to share some notes from when I contributed the pre-built static builds:

  1. About Zig: Zig would be awesome in theory, but when I tried setting it up for our builds in Rill, we got unstable binaries on macOS that appeared to crash when compiled C++ code raised an exception (even though the exception was handled). Don't know if that's been resolved, but in general, at the time, it felt like Zig was still not really production-ready as a C++ cross-compiler.

  2. About static builds: You're right these builds aren't fully static, but merely statically link the DuckDB code. I'm also not an expert on C++ builds, but I agree it sounds better to produce a fully static binary. However, I did note that the DuckDB CLI binaries are not static:

# On Linux:
$ ldd duckdb
	linux-vdso.so.1 (0x0000ffff91eec000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffff8fa60000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff8fa40000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff8f9a0000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8f7f0000)
	/lib/ld-linux-aarch64.so.1 (0x0000ffff91eb3000)

# On macOS: 
$ otool -L /opt/homebrew/bin/duckdb
/opt/homebrew/bin/duckdb:
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.32.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

Would love input from someone familiar with the trade-offs of fully static vs. not (as well as how to configure a fully static build).

  1. About optimizer flags: IIRC DuckDB recommends building with -O3, that's why I put it in the CFLAGS (since Go's build system isn't very explicit about this as @motiejus points out). I think it's worth it to prevent a situation where someone deploys an un-optimized build into production. Maybe it makes sense to remove it from the duckdb_from_source flags and only do it for the pre-built static libraries. If you want to build from source with -O0, I believe the CGO_CXXFLAGS env var takes precedence over those defined in the source code.

  2. Overall, the point of the pre-built static libraries was to provide a convenient default for quick compilation times for most users. For people who want to customize optimizer and linker flags, compiling from source using -tags=duckdb_from_source should provide that flexibility.

from go-duckdb.

motiejus avatar motiejus commented on July 24, 2024

Wow, thanks for your thoughtful reply!

Just saw this thread and wanted to share some notes from when I contributed the pre-built static builds:

1. About Zig: Zig would be awesome in theory, but when I tried [setting it up](https://github.com/rilldata/rill-developer/pull/846) for our builds in Rill, we got unstable binaries on macOS that appeared to crash when compiled C++ code raised an exception (even though the exception was handled). Don't know if that's been resolved, but in general, at the time, it felt like Zig was still not really production-ready as a C++ cross-compiler.

I wasn't aware of the Darwin issues for C++ code, thanks for pointing this out. Linux, though, is a different story -- as of a few weeks ago zig c++ is compiling most of Uber's C/C++ code (also see https://lists.sr.ht/~motiejus/bazel-zig-cc/%3CCAFVMu-rYbf_jDTT4p%3DCS2KV1asdS5Ovo5AyuCwgv2AXr8OOP0g%40mail.gmail.com%3E). So I can recommend using zig c++ for Linux in production by now.

2. About static builds: You're right these builds aren't fully static, but merely statically link the DuckDB code. I'm also not an expert on C++ builds, but I agree it sounds better to produce a fully static binary. However, I did note that the DuckDB CLI binaries are not static:
# On Linux:
$ ldd duckdb
	linux-vdso.so.1 (0x0000ffff91eec000)
	libdl.so.2 => /lib/aarch64-linux-gnu/libdl.so.2 (0x0000ffff8fa60000)
	libpthread.so.0 => /lib/aarch64-linux-gnu/libpthread.so.0 (0x0000ffff8fa40000)
	libm.so.6 => /lib/aarch64-linux-gnu/libm.so.6 (0x0000ffff8f9a0000)
	libc.so.6 => /lib/aarch64-linux-gnu/libc.so.6 (0x0000ffff8f7f0000)
	/lib/ld-linux-aarch64.so.1 (0x0000ffff91eb3000)

Runnable executables are trivial, at least on Linux -- musl-gcc or zig c++ -target x86_64-linux-musl will do the trick.

On macOS:

$ otool -L /opt/homebrew/bin/duckdb
/opt/homebrew/bin/duckdb:
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1300.32.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.0.0)

I don't think it's possible to create a fully static executable on MacOS (If I remember correctly what @kubkon told me). But also there is no point: MacOS is much more homogenous.

Would love input from someone familiar with the trade-offs of fully static vs. not (as well as how to configure a fully static build).

If we are talking about static binaries, there is one caveat: that kind-of implies using musl these days. However, musl's malloc implementation is quite different from glibc: it is much smaller and simpler, but may exhibit contention when many threads are allocating/freeing memory concurrently. This is not a problem in 99% of cases, but if someone were to make it official, they should benchmark it. I do not know, however, how much multi-threaded duckdb is.

3. About optimizer flags: IIRC DuckDB recommends building with `-O3`, that's why I put it in the `CFLAGS` (since Go's build system isn't very explicit about this as @motiejus points out). I think it's worth it to prevent a situation where someone deploys an un-optimized build into production. Maybe it makes sense to remove it from the `duckdb_from_source` [flags](https://github.com/marcboeker/go-duckdb/blob/d4c1fb4dcc45bca75279ece9dac5adb226309793/cgo_source.go#L6) and only do it for the pre-built static libraries. If you want to build from source with `-O0`, I believe the `CGO_CXXFLAGS` env var takes precedence over those defined in the source code.

I do not have an opinion here.

4. Overall, the point of the pre-built static libraries was to provide a convenient default for quick compilation times for most users. For people who want to customize optimizer and linker flags, compiling from source using `-tags=duckdb_from_source` should provide that flexibility.

The problem here is that the pre-built libraries are incompatible between clang and gcc. Most of the cases Linux users use gcc, but some weird ones (like me) prefer clang, which results in a very strange linker erorr, or, even worse, runtime crash.

I did not look into go-duckdb carefully though. If it can be made to use the C ABI only (i.e. only the functions that are behind extern "C" {...}, then it would work regardless of with which C++ compiler libduckdb is compiled.

from go-duckdb.

kubkon avatar kubkon commented on July 24, 2024

I just wanted to mention that the MachO linker in Zig handles C++ exceptions since about a week or so: ziglang/zig#14397 ;-)

from go-duckdb.

kubkon avatar kubkon commented on July 24, 2024

Also, re static binaries on macOS, they are completely unsupported on Apple Silicon, however, still work on x86_64. Zig's MachO linker currently doesn't emitting a non-PIE binaries.

from go-duckdb.

chrisirhc avatar chrisirhc commented on July 24, 2024

@TN1ck @chrisirhc I've switched from g++ to clang++. Could you please check, if the precompiled libs in the https://github.com/marcboeker/go-duckdb/tree/switch-compiler branch fix your problem?

Hey, just wanted to come back to this to report back that due to the constraint mentioned before that I'm using this go-duckdb library in a monorepo that implements a hermetic c++ compiler (zig c++, in our scenario), I won't be able to test this branch (due to Because C++ ABI is unstable, everything that's in the process using C++ ABI needs to be compiled with the same compiler).

I also want to report back that I've successfully gotten everything to work via zig c++. I haven't seen the MacOS crash. Perhaps I haven't hit that code path... yet.

from go-duckdb.

TN1ck avatar TN1ck commented on July 24, 2024

@TN1ck @chrisirhc I've switched from g++ to clang++. Could you please check, if the precompiled libs in the https://github.com/marcboeker/go-duckdb/tree/switch-compiler branch fix your problem?

Still showcases the same problems for me, but this is actually not important or us, I simply used a Docker image with those shared libraries.

I did create a test repo where the problem is showcased: https://github.com/TN1ck/go-duckdb-static-test

from go-duckdb.

derekperkins avatar derekperkins commented on July 24, 2024

@chrisirhc Could you share your build setup? I've been having issues building for linux-x64 from a Mac M3.

from go-duckdb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.