intel-ai / hdk

A low-level execution library for analytic data processing.

License: Apache License 2.0

CMake 2.26% C++ 84.38% C 0.27% Dockerfile 0.02% Python 3.82% Cython 0.94% FreeMarker 0.60% Java 5.23% Cuda 2.06% LLVM 0.14% Makefile 0.01% CSS 0.01% HTML 0.10% NASL 0.01% Shell 0.09% Ruby 0.01% PowerShell 0.04% Batchfile 0.01%
sql gpu query analytics query-engine modin pandas data-science machine-learning query-builder


hdk's Issues

Running tests puts `${sys:MAPD_LOG_DIR}` directories in test folder

It appears that the environment variable MAPD_LOG_DIR set here https://github.com/intel-ai/omniscidb/blob/jit-engine/Calcite/CMakeLists.txt#L30 is not being picked up by the log4j properties file(s) https://github.com/intel-ai/omniscidb/blob/jit-engine/Calcite/java/calcite/src/main/resources/log4j2.properties.

To reproduce, build as normal, then enter build/Tests and run ArrowBasedExecuteTest --gtest_filter=Select.GroupBy (the filter keeps the test run brief).

@vlad-penkin Ilya suggested you might have some ideas about how to debug?

Allow building the engine with L0 support in conda environment

Since the default L0 driver location is in the system libraries, there is an issue when building the jit engine. CMake's find_package looks for the headers and finds them in /usr/include, but CMake does not add that path to the include directories even when target_include_directories is set explicitly to include the system paths (see https://gitlab.kitware.com/cmake/cmake/-/issues/17966 for details). Providing hints/paths to find_package breaks the linking process for other libraries due to conflicts.
There is also currently no conda package we could use to avoid the system includes/libraries. We need to either build such a package or create a workaround for building the jit engine with L0 under a conda env.

Add Modin tests to HDK suite

Add a smoke/sanity test for Modin powered by HDK to the GitHub Actions tests in this repo. For now it will need to use the HDK branch in the Modin repository; a sketch of such a test is below.
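A minimal sketch of what such a smoke test could look like, assuming the HDK-enabled Modin branch selects the backend via Modin's StorageFormat config (the exact knob and value are assumptions and may differ on that branch):

import modin.config as cfg

# Assumption: the HDK branch exposes an HDK storage-format selector.
cfg.StorageFormat.put("hdk")

import modin.pandas as pd


def test_hdk_smoke():
    # A trivial round trip that forces execution through the HDK engine.
    df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
    assert int(df["a"].sum()) == 6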

Add working pyhdk example -- to readme?

e.g.:

import pandas
import pyarrow
import pyhdk

# Set up storage and register it with the data manager.
storage = pyhdk.storage.ArrowStorage(1)  # 1 is the schema id
data_mgr = pyhdk.storage.DataMgr()
data_mgr.registerDataProvider(storage)

# SQL frontend and executor.
calcite = pyhdk.sql.Calcite(storage)
executor = pyhdk.Executor(data_mgr)

# Import a small Arrow table built from a pandas DataFrame.
at = pyarrow.Table.from_pandas(
    pandas.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
)
opt = pyhdk.storage.TableOptions(2)
storage.importArrowTable(at, "test", opt)

# Parse the query to relational algebra and execute it.
sql = "SELECT * FROM test;"
ra = calcite.process(sql)
rel_alg_executor = pyhdk.sql.RelAlgExecutor(executor, storage, data_mgr, ra)
print(rel_alg_executor.execute().to_arrow().to_pandas())

# Explain-only run.
print(rel_alg_executor.execute(just_explain=True).to_explain_str())

Support bringing jit-engine branch in as a module

Functionality required:

  • QueryEngine
  • Analyzer
  • DataMgr / ArrowStorage
  • Possibly Calcite, though initially we can directly generate Analyzer nodes
  • ArrowStorageExecuteTest

Functionality not required/desired:

  • Parser/ParserNode
  • Catalog
  • Thrift/DBHandler

Initial attempts have failed due to linking problems, but we can try again once https://github.com/intel-ai/omniscidb/pull/332 lands.

Also requires:

  • document endpoints exposed for integration
  • support build and minimal test w/ CI to prevent regressions (on either side)

Modin doesn't work with PyHDK when the submodule is updated to the latest jit-engine branch

If I update the omniscidb submodule to the latest jit-engine branch, I get this error when trying to parse RelAlg JSON queries in Calcite:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonIncludeProperties
        at com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.findPropertyInclusionByName(JacksonAnnotationIntrospector.java:321) ~[calcite-1.0-SNAPSHOT-jar-with-dependencies.jar:?]

Looks like it is related to the latest change in the jackson-databind version used by Calcite. The problem can be reproduced using the ienkovich/config branch of HDK and the ienkovich/pyhdk-config branch of Modin.

JVM initialization prevents back-to-back test runs

Running pytest from the hdk tests directory causes a crash on the second test.

Specifically, this line is failing:

    if (JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args) != JNI_OK) {
      LOG(FATAL) << "Couldn't initialize JVM.";
    }

And because the logger is no longer around at that point, we fail to log the message and abort.

The problem appears to be the JNI context attempting to initialize the JVM a second time.
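A minimal sketch of the failing pattern, using the pyhdk API from the example earlier on this page (the explicit teardown stands in for pytest's per-test cleanup):

import pyhdk

# First "test": creating Calcite spins up the embedded JVM.
storage = pyhdk.storage.ArrowStorage(1)
calcite = pyhdk.sql.Calcite(storage)
del calcite  # test teardown

# Second "test" in the same process: JNI_CreateJavaVM fails the second
# time around, and the failed LOG(FATAL) aborts the process.
calcite = pyhdk.sql.Calcite(storage)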

Full backtrace:

Thread 1 "python3" received signal SIGABRT, Aborted.
0x00007ffff7c8e36c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7c8e36c in ?? () from /usr/lib/libc.so.6
#1 0x00007ffff7c3e838 in raise () from /usr/lib/libc.so.6
#2 0x00007ffff7c28535 in abort () from /usr/lib/libc.so.6
#3 0x00007fff29d4cac0 in logger::Logger::~Logger (this=0x7fffffff6110, __in_chrg=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Logger/Logger.cpp:459
#4 0x00007fff29959551 in (anonymous namespace)::JVM::createJVM (max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:144
#5 (anonymous namespace)::JVM::getInstance (max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:88
#6 CalciteJNI::Impl::Impl (this=0x5555564e1060, schema_provider=..., udf_filename=..., calcite_max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:171
#7 0x00007fff2995ab13 in std::make_unique<CalciteJNI::Impl, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long&> ()
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/unique_ptr.h:857
#8 CalciteJNI::CalciteJNI (this=0x555556a04190, schema_provider=..., udf_filename=..., calcite_max_mem_mb=1024)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:602
#9 0x00007fff2997906e in __gnu_cxx::new_allocator<CalciteJNI>::construct<CalciteJNI, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (this=<optimized out>, __p=0x555556a04190)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/ext/new_allocator.h:146
#10 std::allocator_traits<std::allocator<CalciteJNI> >::construct<CalciteJNI, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=..., __p=0x555556a04190)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/alloc_traits.h:483
#11 std::_Sp_counted_ptr_inplace<CalciteJNI, std::allocator<CalciteJNI>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (
__a=..., this=0x555556a04180)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:548
#12 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<CalciteJNI, std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=...,
__p=<optimized out>, this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:679
#13 std::__shared_ptr<CalciteJNI, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__tag=...,
this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:1344
#14 std::shared_ptr<CalciteJNI>::shared_ptr<std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__tag=..., this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr.h:359
#15 std::allocate_shared<CalciteJNI, std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=...)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr.h:702

Support extract/date time runtime in L0 backend

For Taxi Q3/Q4, we need to determine how to pull the Date/Time runtime into SPIRV. For CUDA, we compile the extension functions into a CUDA FatBinary at build time, then use the CUDA linker. We could follow a similar approach with SPIRV, or move the time extraction functions to the module and inline them during the JIT process. The downside to this could be increased module compile time (though there are some optimizations meant to keep such increases to a minimum), so we are considering building a benchmark to test.

Heterogeneous execution fails on assert

There seems to be a flaw with recompilation when QueryMustRunOnCPU is thrown.
ArrowBasedExecutionTest fails:

2022-11-17T08:53:04.345347 F 2829187 0 0 RelAlgExecutor.cpp:622 Check failed: co.device_type == ExecutorDeviceType::GPU

Executor holds dangling reference to data mgr

In the unit tests, we delete storage between tests. This deletes the DataMgr, but the Executor keeps a pointer to the old DataMgr. This results in a segfault the next time an Executor is created, because the Python Executor class calls getExecutor, which pulls from the Executor pool. A sketch of the pattern is below.
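The sketch reuses the pyhdk API from the example earlier on this page; the explicit deletes stand in for the per-test storage teardown:

import pyhdk

# First test: storage plus a pooled Executor.
storage = pyhdk.storage.ArrowStorage(1)
data_mgr = pyhdk.storage.DataMgr()
data_mgr.registerDataProvider(storage)
executor = pyhdk.Executor(data_mgr)  # cached in the Executor pool

# Teardown deletes the storage and DataMgr...
del executor, data_mgr, storage

# ...but the pooled Executor still points at the old DataMgr, so the next
# Executor creation (getExecutor pulls from the pool) segfaults.
data_mgr = pyhdk.storage.DataMgr()
executor = pyhdk.Executor(data_mgr)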

Bringing in jit-engine causes compiler error with date/time runtime

Compiler error when bringing in the extract-from-time code:

heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp: In function 'int64_t ExtractFromTime(ExtractField, int64_t)':
heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp:156:1: error: inlining failed in call to always_inline 'int64_t extract_epoch(int64_t)': function body can be overwritten at link time
  156 | extract_epoch(const int64_t timeval) {
      | ^~~~~~~~~~~~~
heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp:270:27: note: called from here
  270 |       return extract_epoch(timeval);
      |              ~~~~~~~~~~~~~^~~~~~~~~

Allow GPU execution

Add GPU manager initialization and options for controlling device selection at the user level.

pyhdk failing inside Jupyter notebook

When running inside a Jupyter notebook we get:

      [3] data_mgr = pyhdk.storage.DataMgr()
      [4] data_mgr.registerDataProvider(storage)
----> [6] calcite = pyhdk.sql.Calcite(storage)
      [7] executor = pyhdk.Executor(data_mgr)
      [9] import pyarrow

File _sql.pyx:36, in pyhdk._sql.Calcite.__cinit__()

RuntimeError: Couldn't initialize JVM.

pyhdk from conda-forge segfaults

Here is the scenario:

conda env remove -n omnisci-dev
conda env update -f omniscidb/scripts/mapd-deps-conda-dev-env.yml
git clone https://github.com/intel-ai/modin.git
conda activate omnisci-dev
mamba install -c conda-forge pyhdk
cd modin/
git checkout ienkovich/pyhdk
pip install -e .
cd ..
python python/tests/modin/modin_smoke_test.py

Here is the error:

UserWarning: Distributing <class 'list'> object. This may take some time.
FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
0    12
Name: a, dtype: int64
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2bcc5c3b85, pid=2895537, tid=2895537
#
# JRE version: OpenJDK Runtime Environment (11.0.15) (build 11.0.15-internal+0-adhoc..src)
# Java VM: OpenJDK 64-Bit Server VM (11.0.15-internal+0-adhoc..src, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libjimage.so+0x2b85]  ImageStrings::find(Endian*, char const*, int*, unsigned int)+0x65
#
# Core dump will be written. Default location: /localdisk2/afedotov/git/hdk/core
#
# An error report file with more information is saved as:
# /localdisk2/afedotov/git/hdk/hs_err_pid2895537.log
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)

Fix import of Arrow table with time32[s] data

Currently, ArrowStorage fails to import such data due to improper schema checks, but other issues might also exist in the actual data import.

C++ exception with description "Mismatched type for column col4: timestamp[s] vs. time32[s]" thrown in the test body.
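A minimal reproducer sketch, reusing the import flow from the pyhdk example earlier on this page:

import pyarrow
import pyhdk

storage = pyhdk.storage.ArrowStorage(1)
opt = pyhdk.storage.TableOptions(2)

# A single time32[s] column is enough to trip the schema check.
at = pyarrow.table({"col4": pyarrow.array([1, 2, 3], type=pyarrow.time32("s"))})
storage.importArrowTable(at, "t", opt)
# -> C++ exception: Mismatched type for column col4: timestamp[s] vs. time32[s]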

Allow building HDK from an arbitrary folder

Our current flow assumes an exact location for the build folder. This is a request to lift that restriction and allow something like this:

cd /my/build/folder
cmake /path/to/hdk

InsertOrderFragmenter depends on Catalog code

The insertData method depends on Catalog::getTableEpochs for error handling. This requires linking Catalog into Fragmenter, and Fragmenter is currently a dependency of the data-fetch path in QueryEngine. We need to lift the Catalog accesses out of Fragmenter to remove the dependency.

Enable HDK on Windows

Enabling includes successful execution of all OmniSci and HDK tests and integration with Modin.

Replicate jit-engine CI here

In preparation for the repo merge, we should add the GitHub Actions workflows from the other repo here, to run under the omniscidb folder. We will need to copy the workflows over and update the paths.

Create manylinux2014_x86_64 build

Status:

The manylinux2014_x86_64 container does not work because of an outdated repo URL. The container from cibuildwheel cannot be used because it does not have sudo.

It seems people build their own containers for their builds and check them using auditwheel.

Flaky Select.FilterAndSimpleAggregation test

After CUDA tests were introduced in CI, we saw failures of the Select.FilterAndSimpleAggregation test. The failure is flaky, but when it fails, it always fails in the same way:

Expected equality of these values:
  20
  v<int64_t>( run_simple_agg("SELECT COUNT(*) FROM test WHERE MOD(x, 7) <> 7;", dt))
    Which is: 22

I found that it's enough to leave only this particular query in the test to reproduce the failure. The query is supposed to return the number of rows (20) but somehow returns a greater value. The input table has 10 fragments, 2 rows each.

I dumped the generated IR module and all data copied to the CUDA device. The dumps are the same for good and bad runs. It looks like we run the same code on the same data but get different results.

I was able to reproduce it on an August 30 version of the jit-engine branch, so the problem is not new; we don't know when it was introduced. A reduced pyhdk reproducer sketch is below.
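For reference, a pyhdk sketch of the reduced reproducer (setup follows the example near the top of this page; building the 10-fragment test table is elided):

import pyhdk

storage = pyhdk.storage.ArrowStorage(1)
data_mgr = pyhdk.storage.DataMgr()
data_mgr.registerDataProvider(storage)
calcite = pyhdk.sql.Calcite(storage)
executor = pyhdk.Executor(data_mgr)

# ... import the "test" table here: 10 fragments, 2 rows each ...

ra = calcite.process("SELECT COUNT(*) FROM test WHERE MOD(x, 7) <> 7;")
rel_alg_executor = pyhdk.sql.RelAlgExecutor(executor, storage, data_mgr, ra)
print(rel_alg_executor.execute().to_arrow().to_pandas())  # expect 20; flaky runs print 22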
