intel-ai / hdk
A low-level execution library for analytic data processing.
License: Apache License 2.0
It appears that the environment variable MAPD_LOG_DIR, set here https://github.com/intel-ai/omniscidb/blob/jit-engine/Calcite/CMakeLists.txt#L30, is not being picked up by the log4j properties file: https://github.com/intel-ai/omniscidb/blob/jit-engine/Calcite/java/calcite/src/main/resources/log4j2.properties.
To reproduce, build as normal, then enter build/Tests and run ArrowBasedExecuteTest --gtest_filter=Select.GroupBy (the filter keeps the test run brief).
@vlad-penkin Ilya suggested you might have some ideas about how to debug this?
Since the default L0 driver location is in the system libraries, there's an issue when building the jit-engine branch. CMake's find_package looks for the headers and finds them in /usr/include, but CMake does not add that directory to the include paths even when target_include_directories explicitly lists the system paths (see https://gitlab.kitware.com/cmake/cmake/-/issues/17966 for details). Providing hints/paths to find_package breaks the linking process for other libraries due to conflicts.
There's also currently no conda package we could use to avoid the system includes/libraries. We need to either build such a package or create a workaround for building jit-engine with L0 under a conda env.
Blocked by #34
Add a smoke/sanity test for Modin powered by HDK to the GitHub Actions tests in this repo. Will need to use the HDK branch in the Modin repository for now.
e.g.:
import pyarrow
import pandas

import pyhdk

storage = pyhdk.storage.ArrowStorage(1)  # 1 is the schema id
data_mgr = pyhdk.storage.DataMgr()
data_mgr.registerDataProvider(storage)
calcite = pyhdk.sql.Calcite(storage)
executor = pyhdk.Executor(data_mgr)

at = pyarrow.Table.from_pandas(
    pandas.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})
)
opt = pyhdk.storage.TableOptions(2)
storage.importArrowTable(at, "test", opt)

sql = "SELECT * FROM test;"
ra = calcite.process(sql)
rel_alg_executor = pyhdk.sql.RelAlgExecutor(executor, storage, data_mgr, ra)
print(rel_alg_executor.execute().to_arrow().to_pandas())
print(rel_alg_executor.execute(just_explain=True).to_explain_str())
See #51 for details
My attempt to use Ubuntu 22 was blocked by an invalid CUDA package for this Ubuntu version; see https://askubuntu.com/questions/1421423/cuda-11-7-dependencies-issue-on-ubuntu-22-04
I'll check whether it's possible to update Maven only.
Most of the code generation enabling requires a simple change: switching to the correct pointer address space and calling convention. For native code generation, this is handled by CodegenTraits; however, the existing codegen logic does not allow passing CodegenTraits to CodeGenerator at construction time.
From Igor:
we have a version mark here https://github.com/intel-ai/hdk/blob/main/CMakeLists.txt#L5 and here https://github.com/intel-ai/hdk/blob/main/CMakeLists.txt#L32 - we could probably drop one of them so the version doesn't have to be changed in two places each time
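Until one of the marks is dropped, a small consistency check could catch drift between the two places. A sketch (the helper and regex are illustrative, not part of the repo):

```python
import re

def versions_match(cmake_text: str) -> bool:
    # Collect every "VERSION x.y[.z]" occurrence in the CMake source and
    # check that they all agree, so the two marks cannot silently diverge.
    versions = re.findall(r"VERSION\s+(\d+\.\d+(?:\.\d+)?)", cmake_text)
    return len(set(versions)) <= 1
```

Such a check could run in CI against CMakeLists.txt until the duplication is removed.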
Remove the java target folder and the Cython-generated .cpp files in the python folder when running make clean from this repo.
Functionality required:
Functionality not required/desired:
Initial attempts have failed due to linking problems, but we can try again once https://github.com/intel-ai/omniscidb/pull/332 lands.
Also requires:
If I update omniscidb submodule to the latest jit-engine branch then I get this error trying to parse RelAlg JSON queries in Calcite:
java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonIncludeProperties
at com.fasterxml.jackson.databind.introspect.JacksonAnnotationIntrospector.findPropertyInclusionByName(JacksonAnnotationIntrospector.java:321) ~[calcite-1.0-SNAPSHOT-jar-with-dependencies.jar:?]
Looks like it is related to the latest change in the jackson-databind version used by Calcite. The problem can be reproduced using the ienkovich/config branch of HDK and the ienkovich/pyhdk-config branch of Modin.
Use GitHub Pages + Sphinx + GitHub Actions for auto-building the docs?
Running pytest from the hdk tests directory causes a crash on the second test.
Specifically, this line is failing:
if (JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args) != JNI_OK) {
LOG(FATAL) << "Couldn't initialize JVM.";
}
And because the logger is no longer around at that point, we fail to log the message and abort.
The problem appears to be the JNI context trying to initialize the JVM twice.
Full backtrace:
Thread 1 "python3" received signal SIGABRT, Aborted.
0x00007ffff7c8e36c in ?? () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff7c8e36c in ?? () from /usr/lib/libc.so.6
#1 0x00007ffff7c3e838 in raise () from /usr/lib/libc.so.6
#2 0x00007ffff7c28535 in abort () from /usr/lib/libc.so.6
#3 0x00007fff29d4cac0 in logger::Logger::~Logger (this=0x7fffffff6110, __in_chrg=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Logger/Logger.cpp:459
#4 0x00007fff29959551 in (anonymous namespace)::JVM::createJVM (max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:144
#5 (anonymous namespace)::JVM::getInstance (max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:88
#6 CalciteJNI::Impl::Impl (this=0x5555564e1060, schema_provider=..., udf_filename=..., calcite_max_mem_mb=<optimized out>)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:171
#7 0x00007fff2995ab13 in std::make_unique<CalciteJNI::Impl, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long&> ()
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/unique_ptr.h:857
#8 CalciteJNI::CalciteJNI (this=0x555556a04190, schema_provider=..., udf_filename=..., calcite_max_mem_mb=1024)
at /home/alexb/Projects/hdk/omniscidb/Calcite/CalciteJNI.cpp:602
#9 0x00007fff2997906e in __gnu_cxx::new_allocator<CalciteJNI>::construct<CalciteJNI, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (this=<optimized out>, __p=0x555556a04190)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/ext/new_allocator.h:146
#10 std::allocator_traits<std::allocator<CalciteJNI> >::construct<CalciteJNI, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=..., __p=0x555556a04190)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/alloc_traits.h:483
#11 std::_Sp_counted_ptr_inplace<CalciteJNI, std::allocator<CalciteJNI>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (
__a=..., this=0x555556a04180)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:548
#12 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<CalciteJNI, std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=...,
__p=<optimized out>, this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:679
#13 std::__shared_ptr<CalciteJNI, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__tag=...,
this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr_base.h:1344
#14 std::shared_ptr<CalciteJNI>::shared_ptr<std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__tag=..., this=<optimized out>)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr.h:359
#15 std::allocate_shared<CalciteJNI, std::allocator<CalciteJNI>, std::shared_ptr<SchemaProvider>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, unsigned long&> (__a=...)
at /home/alexb/.conda/envs/omnisci-dev/x86_64-conda-linux-gnu/include/c++/9.4.0/bits/shared_ptr.h:702
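The getInstance path in frame #5 is meant to create the JVM once per process; every later caller should reuse the existing instance. The intended pattern, sketched in Python with illustrative names:

```python
import threading

class JVM:
    # The JVM can only be created once per process, so later calls must
    # reuse the existing instance instead of re-creating it.
    _instance = None
    _lock = threading.Lock()

    def __init__(self, max_mem_mb: int):
        self.max_mem_mb = max_mem_mb

    @classmethod
    def get_instance(cls, max_mem_mb: int = 1024) -> "JVM":
        # Double-creation is exactly the failure seen above: the lock plus
        # the None check guarantee a single construction per process.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(max_mem_mb)
            return cls._instance
```

The C++ side already routes creation through JVM::getInstance, so the question is why a second initialization path reaches JNI_CreateJavaVM.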
In case of an Arrow data import error in PyHDK, we just get a segfault with no proper diagnostic message. Python exceptions would be much nicer.
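A minimal sketch of the desired behavior, assuming a hypothetical import_arrow_table wrapper around the existing storage call (all names here are illustrative):

```python
def import_arrow_table(storage, table, name: str, options):
    # Hypothetical wrapper: convert a low-level import failure into a
    # descriptive Python exception instead of crashing the process.
    try:
        storage.importArrowTable(table, name, options)
    except Exception as e:
        raise RuntimeError(f"Arrow import failed for table {name!r}: {e}") from e
```

In practice this would mean catching the C++ exception at the Cython boundary and translating it, rather than letting it escape.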
Need to enable PR checks in the HDK CI.
For Taxi Q3/Q4, we need to determine how to pull the Date/Time runtime into SPIRV. For CUDA, we compile the extension functions into a CUDA FatBinary at build time, then use the CUDA linker. We could follow a similar approach with SPIRV, or move the time extraction functions to the module and inline them during the JIT process. The downside to this could be increased module compile time (though there are some optimizations meant to keep such increases to a minimum), so we are considering building a benchmark to test.
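If we do build such a benchmark, a minimal timing harness might look like this (compile_fn stands for a hypothetical zero-arg hook into the real JIT compile step):

```python
import timeit

def min_compile_time(compile_fn, repeat: int = 5) -> float:
    # Time the compile step several times and take the minimum, the usual
    # low-noise estimate for a deterministic operation. number=1 because a
    # single compile is already the unit of interest.
    return min(timeit.repeat(compile_fn, number=1, repeat=repeat))
```

Running it once with the extension functions inlined into the module and once without would quantify the compile-time cost of the inlining approach.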
There's a bug in how we process steps if multiple threads are running.
There also seems to be a flaw in recompilation when QueryMustRunOnCPU is thrown.
ArrowBasedExecutionTest fails:
2022-11-17T08:53:04.345347 F 2829187 0 0 RelAlgExecutor.cpp:622 Check failed: co.device_type == ExecutorDeviceType::GPU
In the unit tests, we delete storage between tests. This deletes the DataMgr, but the Executor ends up with a pointer to the old DataMgr. The result is a segfault the next time an Executor is created, because the Python Executor class calls getExecutor, which pulls from the Executor pool.
Needs investigation.
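One way to avoid the stale pointer, sketched in Python with hypothetical names: key the executor pool on the DataMgr an executor was built with, so a rebuilt DataMgr can never receive a cached executor that still points at the deleted one.

```python
_executor_pool: dict = {}

class Executor:
    # Stand-in for the real Executor; it captures the DataMgr at creation,
    # which is exactly the pointer that goes stale in the bug above.
    def __init__(self, data_mgr):
        self.data_mgr = data_mgr

def get_executor(data_mgr) -> Executor:
    # Keying on the DataMgr's identity means a new DataMgr always yields
    # a fresh Executor instead of one holding a dangling reference.
    key = id(data_mgr)
    if key not in _executor_pool:
        _executor_pool[key] = Executor(data_mgr)
    return _executor_pool[key]
```

The real fix likely also needs to evict pool entries when storage is deleted, since id() keys can be reused after the object is garbage-collected.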
Build library exposing required endpoints and Python wrapper
Compiler error when bringing in the extract-from-time code:
heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp: In function 'int64_t ExtractFromTime(ExtractField, int64_t)':
heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp:156:1: error: inlining failed in call to always_inline 'int64_t extract_epoch(int64_t)': function body can be overwritten at link time
156 | extract_epoch(const int64_t timeval) {
| ^~~~~~~~~~~~~
heterogeneous-data-kernels/omniscidb/QueryEngine/ExtractFromTime.cpp:270:27: note: called from here
270 | return extract_epoch(timeval);
| ~~~~~~~~~~~~~^~~~~~~~~
Add GPU manager initialization and options for controlling device selection at the user level.
When running inside a Jupyter notebook we get:
[3] data_mgr = pyhdk.storage.DataMgr()
[4] data_mgr.registerDataProvider(storage)
----> [6] calcite = pyhdk.sql.Calcite(storage)
[7] executor = pyhdk.Executor(data_mgr)
[9] import pyarrow
File _sql.pyx:36, in pyhdk._sql.Calcite.__cinit__()
RuntimeError: Couldn't initialize JVM.
Here is the scenario:
conda env remove -n omnisci-dev
conda env update -f omniscidb/scripts/mapd-deps-conda-dev-env.yml
git clone https://github.com/intel-ai/modin.git
conda activate omnisci-dev
mamba install -c conda-forge pyhdk
cd modin/
git checkout ienkovich/pyhdk
pip install -e .
cd ..
python python/tests/modin/modin_smoke_test.py
Here is the error:
UserWarning: Distributing <class 'list'> object. This may take some time.
FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
0 12
Name: a, dtype: int64
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f2bcc5c3b85, pid=2895537, tid=2895537
#
# JRE version: OpenJDK Runtime Environment (11.0.15) (build 11.0.15-internal+0-adhoc..src)
# Java VM: OpenJDK 64-Bit Server VM (11.0.15-internal+0-adhoc..src, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libjimage.so+0x2b85] ImageStrings::find(Endian*, char const*, int*, unsigned int)+0x65
#
# Core dump will be written. Default location: /localdisk2/afedotov/git/hdk/core
#
# An error report file with more information is saved as:
# /localdisk2/afedotov/git/hdk/hs_err_pid2895537.log
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
Aborted (core dumped)
Currently, ArrowStorage fails to import such data due to improper schema checks. But other issues might also exist in the actual data import.
C++ exception with description "Mismatched type for column col4: timestamp[s] vs. time32[s]" thrown in the test body.
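A proper schema check would compare column types up front and raise a descriptive error, rather than letting the import crash deeper in the C++ code. A Python sketch of the idea (types compared as strings purely for illustration):

```python
def check_column_types(expected: dict, actual: dict) -> None:
    # Validate the incoming Arrow schema against the table schema before
    # importing any data, producing the same kind of message as the C++
    # exception quoted above instead of a segfault.
    for col, exp_type in expected.items():
        act_type = actual.get(col)
        if act_type != exp_type:
            raise TypeError(
                f"Mismatched type for column {col}: {exp_type} vs. {act_type}"
            )
```

The real implementation would compare pyarrow DataType objects, not strings, and would also need to allow safe implicit casts (e.g. widening integer types).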
Our current flow assumes an exact location for the build folder. This is a request to lift that restriction to allow something like this:
cd /my/build/folder
cmake /path/to/hdk
The method insertData depends on Catalog::getTableEpochs for error handling. This requires linking Catalog into Fragmenter, while Fragmenter is currently a dependency of data fetch in QueryEngine. We need to elevate the Catalog accesses to remove the dependency.
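One option is to have Fragmenter depend on a narrow interface rather than the full Catalog; then only the interface needs to be linked into the QueryEngine data-fetch chain. A Python sketch of the idea (all names hypothetical):

```python
from typing import List, Protocol

class TableEpochsProvider(Protocol):
    # The single Catalog capability that insertData actually needs
    # for its error handling.
    def get_table_epochs(self, table_id: int) -> List[int]: ...

class Fragmenter:
    def __init__(self, epochs: TableEpochsProvider):
        # Depending on the narrow interface keeps the full Catalog out
        # of Fragmenter's (and hence QueryEngine's) link dependencies.
        self._epochs = epochs

    def insert_data(self, table_id: int) -> List[int]:
        # Capture epochs up front so a failed insert can be rolled back.
        return self._epochs.get_table_epochs(table_id)
```

In C++ this would be an abstract base class implemented by Catalog and passed into Fragmenter, the same dependency-inversion move.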
Enabling this includes successful execution of all OmniSci and HDK tests and integration with Modin.
In preparation for the repo merge, we should add the GitHub Actions from the other repo here, to run under the omniscidb folder. We will need to copy the actions over and update the paths.
Status:
The manylinux2014_x86_64 container does not work because of an outdated repo URL. The container from cibuildwheel cannot be used because it does not have sudo.
It seems people build their own containers for their builds and check them using auditwheel.
After CUDA tests were introduced to CI, we saw failures of the Select.FilterAndSimpleAggregation test. The failure is flaky, but when it fails, it always fails in the same way:
Expected equality of these values:
20
v<int64_t>( run_simple_agg("SELECT COUNT(*) FROM test WHERE MOD(x, 7) <> 7;", dt))
Which is: 22
I found that it's enough to leave only this particular query in the test to reproduce the failure. The query is supposed to return the number of rows but somehow returns a greater value. The input table has 10 fragments with 2 rows each.
I dumped generated IR module and all data copied to a CUDA device. Dumps are the same for good and bad runs. It looks like we run the same code on the same data but get different results.
I was able to reproduce it on an August 30 version of the jit-engine branch, so the problem is not new. I don't know when it was introduced.
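For context on the expected value: MOD(x, 7) is always in [0, 6] for non-negative x, so the predicate MOD(x, 7) <> 7 matches every row, and the count must equal the table's row count (10 fragments × 2 rows = 20). The semantics can be sanity-checked in Python:

```python
# Stand-in data: 20 rows, matching 10 fragments of 2 rows each.
rows = list(range(20))
# x % 7 is in [0, 6] for non-negative x, so the predicate never fails.
count = sum(1 for x in rows if x % 7 != 7)
assert count == len(rows)
```

Since the predicate cannot filter anything out, a result of 22 means extra rows are being counted, consistent with a fragment/row bookkeeping bug on the GPU path rather than a wrong filter.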
Support installing all required dependencies on Linux using vcpkg.