
Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.

Home Page: https://gluten.apache.org/

License: Apache License 2.0


Apache Gluten (Incubating): A Middle Layer for Offloading JVM-based SQL Engines' Execution to Native Engines

This project is still under active development and does not yet have a stable release. You are welcome to evaluate it.

1 Introduction

1.1 Problem Statement

Apache Spark is a stable, mature project that has been under development for many years. It is one of the best frameworks for scale-out processing of petabyte-scale datasets. However, the Spark community has had to address performance challenges requiring various optimizations over time. As a key optimization introduced in Spark 2.0, Whole-Stage Code Generation replaced the Volcano model and achieved roughly a 2x speedup. Since then, most optimizations have been made at the query-plan level, and the performance of individual operators has almost stopped improving.

On the other hand, native SQL engines have been the subject of research for many years, and libraries such as ClickHouse, Arrow, and Velox have emerged from that work. By using native implementations, columnar data formats, and vectorized data processing, these libraries can outperform Spark's JVM-based SQL engine. However, they only support single-node execution.

1.2 Gluten's Solution

“Gluten” is Latin for glue. The main goal of the Gluten project is to “glue” native libraries to Spark SQL, so that we can benefit from both the high scalability of the Spark SQL framework and the high performance of native libraries.

The basic rule of Gluten's design is to reuse Spark's whole control flow and as much JVM code as possible, while offloading the compute-intensive data processing to native code. Here is what Gluten does (see the sketch after this list):

  • Transforms Spark's whole-stage physical plan into a Substrait plan and sends it to the native side
  • Offloads performance-critical data processing to the native library
  • Defines clear JNI interfaces for native libraries
  • Makes it easy to switch between available native backends
  • Reuses Spark's distributed control flow
  • Manages data sharing between the JVM and native code
  • Remains extensible to support more native accelerators
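
Gluten hooks into Spark through Spark's standard columnar extension points rather than through a Spark fork. The sketch below illustrates only the mechanism, using Spark's public ColumnarRule API with hypothetical class names (OffloadToNative, ExampleExtensions); Gluten's real rules live in its query-plan-conversion component and emit Substrait plans instead:

import org.apache.spark.sql.SparkSessionExtensions
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{ColumnarRule, SparkPlan}

// Hypothetical placeholder rule: a real implementation would replace
// supported plan nodes with columnar, natively backed equivalents.
case class OffloadToNative() extends Rule[SparkPlan] {
  override def apply(plan: SparkPlan): SparkPlan = plan // no-op for illustration
}

// Registered via spark.sql.extensions=ExampleExtensions
class ExampleExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = {
    ext.injectColumnar(_ => new ColumnarRule {
      override def preColumnarTransitions: Rule[SparkPlan] = OffloadToNative()
    })
  }
}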

1.3 Target User

Gluten's target user is anyone who wants to fundamentally accelerate Spark SQL. As a plugin to Spark, Gluten doesn't require any changes to DataFrame APIs or SQL queries; users only need to set the correct configuration. See Gluten configuration properties here.
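
For instance, once the configuration shown in the How to Use section below is in place, ordinary DataFrame code runs unchanged. A minimal sketch, assuming a spark-shell session where spark is predefined (whether a given operator is offloaded, and the exact node names printed by explain(), depend on the Gluten version and backend):

import org.apache.spark.sql.functions.sum

// Ordinary Spark code: no API changes are needed when Gluten is enabled.
val df = spark.range(0, 1000000)
  .selectExpr("id % 10 AS k", "id AS v")
  .groupBy("k")
  .agg(sum("v"))

df.explain() // offloaded operators show up as native/transformer nodes
df.show(10)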

1.4 References

You can follow the links below for more related information.

2 Architecture

The overview chart is shown below. Substrait provides a well-defined, cross-language specification for data compute operations (see more details here). The Spark physical plan is transformed into a Substrait plan, which is then passed to the native side through a JNI call. On the native side, the operator chain is built and offloaded to the native engine. Gluten returns columnar batches to Spark, and the Spark Columnar API (available since Spark 3.0) is used at execution time. Gluten uses the Apache Arrow data format as its basic data format, so the data returned to the Spark JVM is an ArrowColumnarBatch.

Currently, Gluten supports only the ClickHouse and Velox backends. Velox is a C++ database acceleration library that provides reusable, extensible, and high-performance data processing components. More details can be found at https://github.com/facebookincubator/velox/. Gluten can also be extended to support more backends.

There are several key components in Gluten:

  • Query Plan Conversion: converts Spark's physical plan into a Substrait plan.
  • Unified Memory Management: controls native memory allocation.
  • Columnar Shuffle: shuffles Gluten columnar data. The shuffle service still reuses the one in Spark core; a columnar exchange operator is implemented to support Gluten's columnar data format.
  • Fallback Mechanism: supports falling back to vanilla Spark for unsupported operators. Gluten's ColumnarToRow (C2R) and RowToColumnar (R2C) operators convert between Gluten columnar data and Spark's internal row data when needed; both C2R and R2C are implemented in native code as well (see the sketch after this list).
  • Metrics: collected from the Gluten native engine to help identify bugs, performance bottlenecks, etc., and displayed in the Spark UI.
  • Shim Layer: supports multiple Spark versions. We plan to support only Spark's latest two or three releases. Currently, Spark 3.2, Spark 3.3, and Spark 3.4 (experimental) are supported.
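
A rough way to gauge how much of a query fell back is to count the row/columnar transition nodes in its executed plan. The sketch below uses only standard Spark plan-inspection APIs; matching on node names is a heuristic assumption, since the exact operator names vary across Gluten versions:

import org.apache.spark.sql.DataFrame

// Count row<->columnar transition nodes as a rough fallback indicator.
// Node-name matching is a heuristic, not a stable API.
def countTransitions(df: DataFrame): Int =
  df.queryExecution.executedPlan.collect {
    case p if p.nodeName.contains("ColumnarToRow") ||
              p.nodeName.contains("RowToColumnar") => p
  }.size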

3 How to Use

There are two ways to use Gluten.

3.1 Use Released Jar

One way is to use a released jar. Here is a simple example. Currently, only CentOS 7/8 and Ubuntu 20.04/22.04 are well supported.

spark-shell \
 --master yarn --deploy-mode client \
 --conf spark.plugins=org.apache.gluten.GlutenPlugin \
 --conf spark.memory.offHeap.enabled=true \
 --conf spark.memory.offHeap.size=20g \
 --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
 --jars https://github.com/apache/incubator-gluten/releases/download/v1.0.0/gluten-velox-bundle-spark3.2_2.12-ubuntu_20.04_x86_64-1.0.0.jar
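
Once the shell is up, a quick sanity check (run inside spark-shell, using plain Spark APIs) confirms that the plugin and the columnar shuffle manager were picked up:

// Verify the Gluten plugin and columnar shuffle manager are configured.
println(spark.sparkContext.getConf.get("spark.plugins"))
println(spark.sparkContext.getConf.get("spark.shuffle.manager"))

// The physical plan of a query should now contain offloaded operators.
spark.sql("SELECT count(*) FROM range(1000000)").explain()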

3.2 Custom Build

Alternatively, you can build Gluten from source and then configure Spark to enable the Gluten plugin. Here is a simple example; please refer to the corresponding backend section below for more details.

export gluten_jar=/PATH/TO/GLUTEN/backends-velox/target/<gluten-jar>
spark-shell \
  --master yarn --deploy-mode client \
  --conf spark.plugins=org.apache.gluten.GlutenPlugin \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=20g \
  --conf spark.driver.extraClassPath=${gluten_jar} \
  --conf spark.executor.extraClassPath=${gluten_jar} \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
  ...

3.2.1 Build and install Gluten with Velox backend

If you want to use Gluten Velox backend, see Build with Velox to build and install the necessary libraries.

3.2.2 Build and install Gluten with ClickHouse backend

If you want to use the Gluten ClickHouse backend, see Build with ClickHouse Backend. The ClickHouse backend is developed by Kyligence; please visit https://github.com/Kyligence/ClickHouse for more information.

3.2.3 Build options

See Gluten build guide.

4 Contribution

You are welcome to contribute to the Gluten project! See the contributing guide for how to make contributions.

4.1 Community

Gluten joined the Apache Incubator in March 2024. We welcome developers and users who are interested in the Gluten project. Here are several ways to contact us:

Gluten website

https://gluten.apache.org/

Mailing lists

For any technical discussion, please send email to [email protected]. See archives. Please click here to subscribe.

Wechat group

We also have a WeChat group (in Chinese), which may be more convenient for PRC developers and users. Due to WeChat's group-invite limitations, please contact weitingchen at apache.org or zhangzc at apache.org to be invited to the group.

Slack channel

There is also a Spark channel in the Velox Slack workspace (in English) for community communication about the Velox backend. Please see the Velox documentation: https://github.com/facebookincubator/velox?tab=readme-ov-file#community

4.2 Issue Report

Please feel free to create a GitHub issue to report a bug or propose an enhancement. For code contributions, please file an issue first and mention it in your PR.

4.3 Documentation

Currently, all Gluten documents are kept in docs. The documents may not reflect the latest designs; please feel free to contact us to get design details or to share your design ideas.

5 Performance

We use Decision Support Benchmark1 (TPC-H like) to evaluate Gluten's performance; it is a query set modified from the TPC-H benchmark. We use the Parquet file format for Velox testing and the MergeTree file format for ClickHouse testing, with vanilla Spark on Parquet as the baseline. See Decision Support Benchmark1.

Test environment: a single node with 2TB of data; Spark 3.3.2 for both the baseline and Gluten. The Decision Support Benchmark1 results (tested in June 2023) show an overall speedup of 2.71x, and up to 14.53x on a single query, with the Gluten Velox backend.

Performance

Test environment: an 8-node AWS cluster with 1TB of data; Spark 3.1.1 for both the baseline and Gluten. The Decision Support Benchmark1 results show an average speedup of 2.12x, and up to 3.48x, with the Gluten ClickHouse backend.

Performance

6 License

Gluten is licensed under the Apache 2.0 license.

7 Contact

Gluten was initiated by Intel and Kyligence in 2022. Several companies, including Intel, Kyligence, BIGO, Meituan, Alibaba Cloud, NetEase, Baidu, Microsoft, and others, are actively participating in Gluten's development. If you are interested in the Gluten project, please contact the email addresses below for further discussion.

[email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]

8 Thanks to our contributors

* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.


incubator-gluten's Issues

Do we need to exclude Pre-Projection from Aggregate?

  • TPC-H Q6's Aggregation includes:
    Pre-Projection (Multiply)
    Aggregate (Sum)
    Post-Projection (Cast to String)

In a local development branch, I have excluded the Post-Projection from Aggregate on the Scala side by creating a new ProjectRel when needed. Do we need to do the same for the Pre-Projection?
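
For reference, the shape in question can be reproduced with a Q6-style aggregation. A sketch assuming a hypothetical lineitem DataFrame with the standard TPC-H columns:

import org.apache.spark.sql.functions.{col, sum}

// Pre-Projection (multiply) -> Aggregate (sum) -> Post-Projection (cast).
val revenue = lineitem // assumed: a DataFrame over the TPC-H lineitem table
  .select((col("l_extendedprice") * col("l_discount")).as("rev")) // pre-projection
  .agg(sum(col("rev")).cast("string").as("revenue"))              // sum + cast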

Separate the base layer and the backend layer

The base layer will include the common code and configs used by every backend. The backend layer will include the specific code and configs used only by that backend. In this way, each backend will use its own specific layer on top of the base layer, and the computation for different backends will be cleanly separated.

Remove Alias

Currently, Spark has the Alias expression to assign a new name to a computation. But since Substrait is index-based, this expression is unneeded. Do we need to remove Alias?
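
For context, here is a minimal sketch (plain Spark APIs) of where Alias shows up and why Substrait can drop it:

import org.apache.spark.sql.functions.col

val df = spark.range(10).selectExpr("id AS a", "id * 2 AS b")

// Spark wraps the renamed computation in an Alias expression:
val projected = df.select((col("a") + col("b")).as("total"))
println(projected.queryExecution.analyzed) // Project [(a + b) AS total#...]

// In Substrait, the same projection is positional: (a + b) is simply output
// field 0 of the ProjectRel, so there is no name for Alias to carry.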

Use the unified function names with Substrait

We previously used self-defined function names, which makes them difficult for the backends to consume. Therefore, we need to switch to the unified names specified in the Substrait YAML files.

Use unified JNI interfaces

The parts below need to be cleaned up and unified:

  • ExpressionEvaluator
  • ExpressionEvaluatorJniWrapper
  • BatchIterator.java
  • JniUtils and JniInstance
  • createNativeKernelWithIterator
  • add a config to decide whether to load the Gandiva and Arrow libraries

Run spark-shell with gazelle-jni-jvm-1.2.0-snapshot-jar-with-dependencies.jar failed.

This was run on Ubuntu 20.04; the command used is shown in the screenshot below:
(screenshot omitted)

Then I checked the library libspark_columnar_jni.so with ldd, and there are some undefined symbol errors:

root@ubuntu:/home/gazelle/gazelle-jni/cpp/build/releases# ldd -r libspark_columnar_jni.so
        linux-vdso.so.1 (0x00007ffcd0ae1000)
        libprotobuf.so.17 => /lib/x86_64-linux-gnu/libprotobuf.so.17 (0x00007f03889ec000)
        libdouble-conversion.so.3 => /lib/x86_64-linux-gnu/libdouble-conversion.so.3 (0x00007f03889d6000)
        libsnappy.so.1 => /lib/x86_64-linux-gnu/libsnappy.so.1 (0x00007f03889cb000)
        libglog.so.0 => /usr/local/lib/libglog.so.0 (0x00007f0388984000)
        libarrow.so.400 (0x00007f038724f000)
        libgandiva.so.400 (0x00007f0384ff7000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0384e13000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0384cc4000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0384ca9000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0384ab7000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f038b0b5000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f0384a9b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0384a78000)
        libgflags.so.2.2 => /usr/local/lib/libgflags.so.2.2 (0x00007f0384a49000)
        libunwind.so.8 => /lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f0384a2c000)
        libcrypto.so.1.1 => /lib/x86_64-linux-gnu/libcrypto.so.1.1 (0x00007f0384756000)
        libssl.so.1.1 => /lib/x86_64-linux-gnu/libssl.so.1.1 (0x00007f03846c3000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f03846bd000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f03846b2000)
        libcurl.so.4 => /lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f038461f000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f03845f6000)
        libnghttp2.so.14 => /lib/x86_64-linux-gnu/libnghttp2.so.14 (0x00007f03845cd000)
        libidn2.so.0 => /lib/x86_64-linux-gnu/libidn2.so.0 (0x00007f03845ac000)
        librtmp.so.1 => /lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f038458c000)
        libssh.so.4 => /lib/x86_64-linux-gnu/libssh.so.4 (0x00007f038451c000)
        libpsl.so.5 => /lib/x86_64-linux-gnu/libpsl.so.5 (0x00007f0384509000)
        libgssapi_krb5.so.2 => /lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f03844bc000)
        libldap_r-2.4.so.2 => /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f0384466000)
        liblber-2.4.so.2 => /lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f0384455000)
        libbrotlidec.so.1 => /lib/x86_64-linux-gnu/libbrotlidec.so.1 (0x00007f0384447000)
        libunistring.so.2 => /lib/x86_64-linux-gnu/libunistring.so.2 (0x00007f03842c3000)
        libgnutls.so.30 => /lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f03840ed000)
        libhogweed.so.5 => /lib/x86_64-linux-gnu/libhogweed.so.5 (0x00007f03840b6000)
        libnettle.so.7 => /lib/x86_64-linux-gnu/libnettle.so.7 (0x00007f038407c000)
        libgmp.so.10 => /lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f0383ff8000)
        libkrb5.so.3 => /lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f0383f1b000)
        libk5crypto.so.3 => /lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f0383ee8000)
        libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f0383ee1000)
        libkrb5support.so.0 => /lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f0383ed2000)
        libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f0383eb6000)
        libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f0383e99000)
        libgssapi.so.3 => /lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f0383e54000)
        libbrotlicommon.so.1 => /lib/x86_64-linux-gnu/libbrotlicommon.so.1 (0x00007f0383e2f000)
        libp11-kit.so.0 => /lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f0383cf9000)
        libtasn1.so.6 => /lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f0383ce3000)
        libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f0383cdc000)
        libheimntlm.so.0 => /lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f0383cd0000)
        libkrb5.so.26 => /lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f0383c3b000)
        libasn1.so.8 => /lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f0383b94000)
        libhcrypto.so.4 => /lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f0383b5c000)
        libroken.so.18 => /lib/x86_64-linux-gnu/libroken.so.18 (0x00007f0383b43000)
        libffi.so.7 => /lib/x86_64-linux-gnu/libffi.so.7 (0x00007f0383b37000)
        libwind.so.0 => /lib/x86_64-linux-gnu/libwind.so.0 (0x00007f0383b0d000)
        libheimbase.so.1 => /lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f0383af9000)
        libhx509.so.5 => /lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f0383aab000)
        libsqlite3.so.0 => /lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f0383982000)
        libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f0383947000)
undefined symbol: _ZN3fLB10FLAGS_avx2E  (./libspark_columnar_jni.so)
undefined symbol: _ZN3fLB32FLAGS_velox_exception_stacktraceE    (./libspark_columnar_jni.so)
undefined symbol: _ZN3fLB10FLAGS_bmi2E  (./libspark_columnar_jni.so)
undefined symbol: _ZN3fLI46FLAGS_velox_exception_stacktrace_rate_limit_msE      (./libspark_columnar_jni.so)
undefined symbol: _ZN3fLB22FLAGS_velox_use_mallocE      (./libspark_columnar_jni.so)
undefined symbol: _ZNK8facebook5velox7process10StackTrace8toStringB5cxx11Ev     (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710013put_mem_blockEPv (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost13match_resultsIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS_9sub_matchISB_EEEE12maybe_assignERKSF_  (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base646encodeB5cxx11EN5folly5RangeIPKcEE  (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox4dwio6common10encryptioneqERKNS3_20EncryptionPropertiesES6_ (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base6420calculateDecodedSizeEPKcRmb       (./libspark_columnar_jni.so)
undefined symbol: event_base_new        (./libspark_columnar_jni.so)
undefined symbol: _ZN4date11locate_zoneESt17basic_string_viewIcSt11char_traitsIcEE      (./libspark_columnar_jni.so)
undefined symbol: event_active  (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox4dwrf10ProtoUtils9writeTypeERKNS0_4TypeERNS1_5proto6FooterEPNS6_4TypeE      (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox7process12TraceContext10statusLineB5cxx11Ev (./libspark_columnar_jni.so)
undefined symbol: jump_fcontext (./libspark_columnar_jni.so)
undefined symbol: event_add     (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox7process10StackTraceC1Ei    (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost13match_resultsIPKcSaINS_9sub_matchIS2_EEEE12maybe_assignERKS6_      (./libspark_columnar_jni.so)
undefined symbol: ZSTD_getErrorName     (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox7process12TraceContextD1Ev  (./libspark_columnar_jni.so)
undefined symbol: event_base_set        (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710012perl_matcherIN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEESaINS_9sub_matchISC_EEENS_12regex_traitsIcNS_16cpp_regex_traitsIcEEEEE14construct_initERKNS_11basic_regexIcSJ_EENS_15regex_constants12_match_flagsE   (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base646decodeEPKcmPc      (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base646encodeEPKcmPc      (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710019raise_runtime_errorERKSt13runtime_error  (./libspark_columnar_jni.so)
undefined symbol: ZSTD_decompress       (./libspark_columnar_jni.so)
undefined symbol: event_base_free       (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base6420calculateEncodedSizeEmb   (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost11basic_regexIcNS_12regex_traitsIcNS_16cpp_regex_traitsIcEEEEE9do_assignEPKcS7_j     (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710013get_mem_blockEv  (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710014verify_optionsEjNS_15regex_constants12_match_flagsE      (./libspark_columnar_jni.so)
undefined symbol: event_set     (./libspark_columnar_jni.so)
undefined symbol: ZSTD_getFrameContentSize      (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox8encoding6Base646decodeB5cxx11EN5folly5RangeIPKcEE  (./libspark_columnar_jni.so)
undefined symbol: event_base_loop       (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710012perl_matcherIPKcSaINS_9sub_matchIS3_EEENS_12regex_traitsIcNS_16cpp_regex_traitsIcEEEEE14construct_initERKNS_11basic_regexIcSA_EENS_15regex_constants12_match_flagsE       (./libspark_columnar_jni.so)
undefined symbol: event_del     (./libspark_columnar_jni.so)
undefined symbol: _ZNK5boost16re_detail_10710031cpp_regex_traits_implementationIcE17transform_primaryB5cxx11EPKcS4_     (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox7process12TraceContextC1ENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb      (./libspark_columnar_jni.so)
undefined symbol: _ZNK4date9time_zone13get_info_implENSt6chrono10time_pointINS1_3_V212system_clockENS1_8durationIlSt5ratioILl1ELl1EEEEEE        (./libspark_columnar_jni.so)
undefined symbol: ZSTD_isError  (./libspark_columnar_jni.so)
undefined symbol: _ZNK5boost16re_detail_10710031cpp_regex_traits_implementationIcE9transformB5cxx11EPKcS4_      (./libspark_columnar_jni.so)
undefined symbol: LZ4_decompress_safe   (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox4dwio6common9exception18getExceptionLoggerEv        (./libspark_columnar_jni.so)
undefined symbol: event_get_version     (./libspark_columnar_jni.so)
undefined symbol: _ZN5boost16re_detail_10710024get_default_error_stringENS_15regex_constants10error_typeE       (./libspark_columnar_jni.so)
undefined symbol: event_base_loopbreak  (./libspark_columnar_jni.so)
undefined symbol: make_fcontext (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox4dwio6common11compression13lzoDecompressEPKcS5_PcS6_        (./libspark_columnar_jni.so)
undefined symbol: ZSTD_getErrorCode     (./libspark_columnar_jni.so)
undefined symbol: _ZNK4date9time_zone13get_info_implENSt6chrono10time_pointINS_7local_tENS1_8durationIlSt5ratioILl1ELl1EEEEEE   (./libspark_columnar_jni.so)
undefined symbol: ZSTD_compress (./libspark_columnar_jni.so)
undefined symbol: _ZN8facebook5velox35DeserializationRegistryForSharedPtrB5cxx11Ev      (./libspark_columnar_jni.so)
undefined symbol: event_base_get_method (./libspark_columnar_jni.so)

For example, for undefined symbol: _ZN3fLB10FLAGS_avx2E (./libspark_columnar_jni.so), I used c++filt to demangle the symbol name:

root@ubuntu:/home/gazelle/gazelle-jni/cpp/build/releases# c++filt _ZN3fLB10FLAGS_avx2E
fLB::FLAGS_avx2

The flag FLAGS_avx2 is used by Velox, but I cannot find its definition.
I have no idea what to do next; can someone help?
I compiled gazelle_jni on branch velox_dev and Velox on branch substrait. @rui-mo, can you give me some help?

Fix some fallback issues

Currently, there are some fallback issues when the SparkPlan is SerializeFromObjectExec, ObjectHashAggregateExec, or V2CommandExec. For example:

val tookTimeArr = Array(12, 23, 56, 100, 500, 20)
import spark.implicits._
val df = spark.sparkContext.parallelize(tookTimeArr.toSeq, 1).toDF("time")
df.summary().show(100, false)

When executing the above code, it returns a 'null' result.
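
To narrow down which operator misbehaves, one can dump the node names of the executed plan. This is only a debugging sketch using standard Spark APIs, not a fix:

// List the physical operators the summary() query actually executed;
// the unsupported ones (e.g. ObjectHashAggregateExec) should appear here.
val summaryDf = df.summary()
summaryDf.queryExecution.executedPlan.foreach(p => println(p.nodeName))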
