
clickhouse / clickbench

Stars: 570 · Watchers: 25 · Forks: 126 · Size: 3.32 MB

ClickBench: a Benchmark For Analytical Databases

Home Page: https://benchmark.clickhouse.com/

License: Other

Languages: Shell 16.78%, Python 0.28%, HTML 80.22%, JavaScript 2.72%
Topics: analytics, benchmark, big-data, databases, olap, sql

clickbench's Introduction



ClickHouse® is an open-source column-oriented database management system that allows generating analytical data reports in real-time.

How To Install (Linux, macOS, FreeBSD)

curl https://clickhouse.com/ | sh

Useful Links

  • Official website has a quick high-level overview of ClickHouse on the main page.
  • ClickHouse Cloud ClickHouse as a service, built by the creators and maintainers.
  • Tutorial shows how to set up and query a small ClickHouse cluster.
  • Documentation provides more in-depth information.
  • YouTube channel has a lot of content about ClickHouse in video format.
  • Slack and Telegram allow chatting with ClickHouse users in real-time.
  • Blog contains various ClickHouse-related articles, as well as announcements and reports about events.
  • Code Browser (github.dev) with syntax highlighting.
  • Contacts can help you get your questions answered, if you have any.

Monthly Release & Community Call

Every month we get together with the community (users, contributors, customers, those interested in learning more about ClickHouse) to discuss what is coming in the latest release. If you are interested in sharing what you've built on ClickHouse, let us know.

Upcoming Events

Keep an eye out for upcoming meetups and events around the world. Somewhere else you want us to be? Please feel free to reach out to tyler <at> clickhouse <dot> com. You can also peruse ClickHouse Events for a list of all upcoming trainings, meetups, speaking engagements, etc.

Recent Recordings

  • Recent Meetup Videos: Meetup Playlist. Whenever possible, recordings of the ClickHouse Community Meetups are edited and presented as individual talks. Currently featuring "Modern SQL in 2023", "Fast, Concurrent, and Consistent Asynchronous INSERTS in ClickHouse", and "Full-Text Indices: Design and Experiments".
  • Recording available: v24.2 Release Call. All the features of 24.2 in one convenient video! Watch it now!

Interested in joining ClickHouse and making it your full-time job?

We are a globally diverse and distributed team, united behind a common goal of creating industry-leading, real-time analytics. Here, you will have an opportunity to solve some of the most cutting-edge technical challenges and have direct ownership of your work and vision. If you are a contributor by nature, a thinker and a doer - we’ll definitely click!

Check out our current openings here: https://clickhouse.com/company/careers

Can't find what you are looking for, but want to let us know you are interested in joining ClickHouse? Email [email protected]!

clickbench's People

Contributors

alexey-milovidov, arno756, cnkk, farthur-cmd, felixoid, hello-stephen, jasonthorsness, justindeguzman, jychen7, kmitchener, mofeiatwork, mosinnik, mytherin, nickitat, nmreadelf, osawyerr, patricklauer, philippemnoel, pkartaviy, puzpuzpuz, qoega, shhnwz, silverbullet233, sundy-li, tbragin, thunderxblitz, tinybit, toschmidt, tylerhannan, waitingkuo


clickbench's Issues

Q17 doesn't have sorting

Q17 is Q16 but without sorting, so Q17 results are always unpredictable. Should it be updated to use ORDER BY somehow?

Q17 now is:

SELECT UserID, SearchPhrase, COUNT(*) FROM hits GROUP BY UserID, SearchPhrase LIMIT 10;
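A minimal sketch of one possible fix (the tie-breaker columns here are an assumption, not an agreed choice):

clickhouse-client <<'SQL'
-- Order by the aggregate, with explicit tie-breakers so the top 10 is stable.
SELECT UserID, SearchPhrase, COUNT(*) AS c
FROM hits
GROUP BY UserID, SearchPhrase
ORDER BY c DESC, UserID, SearchPhrase
LIMIT 10;
SQL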

feat: add GitHub Actions to auto-generate index.html

Background

Currently, after merging an individual PR, we also need to manually run generate-results.sh to update index.html.

Problem

  1. It is inconvenient.
  2. It is confusing when a PR for one system "unexpectedly" updates another system's result (e.g. this Datafusion PR updates the ClickHouse result; I think this is because ClickHouse forgot to update index.html earlier).

Proposal

Add a GitHub Action on the main branch to auto-generate index.html and push it back to the main branch.
(Be careful about an infinite loop; we could prevent it by checking the git log for whether anything besides index.html was modified, as sketched below.)
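A rough sketch of such a guard (a hypothetical shell step; the surrounding workflow wiring is omitted):

# Regenerate and push only if the last commit touched something other than
# index.html, so the bot's own push does not retrigger the workflow.
if git diff --name-only HEAD~1 HEAD | grep -qvx 'index.html'; then
    ./generate-results.sh
    git add index.html
    git commit -m 'Auto-generate index.html' || true   # no-op when unchanged
    git push
fi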

Add YTsaurus support

What is YTsaurus: https://ytsaurus.tech/

Since YTsaurus does not have a built-in benchmark tool, it could be a good idea to add YTsaurus support to ClickBench as a way to benchmark it. Since YTsaurus supports YQL and has a CLI, adapting ClickBench should be quite easy (at least from the interface perspective).

Honestly, I'm not sure how typical the ClickBench workloads are for YTsaurus; this needs to be clarified.

@gritukan, you might be interested in this issue. Resolving it can help with ytsaurus/ytsaurus#40.

Help wanted: Databricks

We need your help adding Databricks.

Note: if you don't help, I will add it myself, and it might get less attention to nuances.

Inaccurate table size calculation for MySQL

At the moment, the command below is used to calculate the storage size of the table in MySQL:

sudo du -bcs /var/lib/mysql

Because of MySQL's unrelated files (about 1 GB) and binlog files (tens of gigabytes), it doesn't seem reasonable to me to calculate the table size this way. Instead, it may be more appropriate to use the query below:

SELECT 
    table_name AS `Table`, 
    round(((data_length + index_length) / 1024 / 1024), 2) `Size in MB` 
FROM information_schema.TABLES 
WHERE table_schema = "test"
    AND table_name = "hits";

This will give us the actual size of the hits table. For 10M rows, I got 14 GB with the first command and 6.4 GB with the proposed query. I found out that about 6 GB of the 14 GB was due to the binlog files, which are ephemeral, and I don't see any point in including them in the storage result.

Consequently, for the full hits table, we may end up with at most 70 GB instead of 160 GB, which seems more reasonable to me.
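
Alternatively, the existing du measurement could at least exclude the binlogs (a sketch assuming GNU du and MySQL 8's default binlog.* file naming; older setups may use mysql-bin.*):

# Measure the MySQL data directory while skipping binlog files.
sudo du -bcs --exclude='binlog*' /var/lib/mysql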

What do you think?

Syntax error for postgresql CREATE TABLE

Hi, when I used ClickBench to run the benchmark for PostgreSQL, I found a syntax error in create.sql: we should remove the trailing comma.

diff --git a/postgresql/create.sql b/postgresql/create.sql
index 10185d5..41c961c 100644
--- a/postgresql/create.sql
+++ b/postgresql/create.sql
@@ -104,5 +104,5 @@ CREATE TABLE hits
     HasGCLID SMALLINT NOT NULL,
     RefererHash BIGINT NOT NULL,
     URLHash BIGINT NOT NULL,
-    CLID INTEGER NOT NULL,
+    CLID INTEGER NOT NULL
 );

Elasticsearch benchmarks flush the cache between queries

I've been trying to understand, through local benchmarking of roughly equivalent queries from my app (at least for the data I need), why ES outperforms CH by a factor of 10x in some cases, especially compared to the benchmarks here.

https://github.com/ClickHouse/ClickBench/blob/main/elasticsearch/run.sh#L11-L12

I can sort of see the reasoning for clearing the ES cache between queries, but I feel it's misleading with regard to the real-world performance people will see.

While it's true from my observations that my equivalent CH and ES queries take around the same time from a cold start, ES quickly improves while CH stays relatively consistent and so is much slower on the whole. ES also deals with concurrency much better: using Apache Benchmark with a concurrency of 3, I saw little change for ES, but CH took up to 4x longer than normal to handle 1000 requests.

I feel that, at the very least, there should be some indication that caches are not a factor in these benchmarks, and that the results should therefore not be viewed as representative of real-world performance, at least in this case.
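
For reference, the kind of cache clearing under discussion looks roughly like this (a sketch; the exact lines in run.sh may differ):

# Clear Elasticsearch's index-level caches before each query run.
curl -X POST 'http://localhost:9200/hits/_cache/clear'
# Many ClickBench run scripts also drop the OS page cache between queries.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches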

Benchmark for StarRocks

Hey there, I noticed you ran into some problems when trying to benchmark StarRocks, and I'm willing to help make the benchmark results public.

Several ways to download StarRocks:

The current latest stable version is 2.2.2, and the latest RC version is 2.3.0-rc2.

If you run into any more problems when downloading or benchmarking, please contact our team in this issue or in our Slack channel.

Segmentation fault in hardware.sh while running query 17

Hello,

I am running hardware.sh on an Ubuntu 22.04 LTS setup with an i9-12900K processor, but I always get a segfault, which I guess is not what should happen?

~/ClickBench/hardware/clickhouse-benchmark ~/ClickBench/hardware
 19:05:33 up 36 min,  0 users,  load average: 0.48, 1.17, 1.08
Starting clickhouse-server
Waiting for clickhouse-server to start
Ok.
Dataset already downloaded
Dataset already prepared

Will perform benchmark. Results:

[0.001, 0.013, 0.001],
[0.011, 0.008, 0.011],
[0.044, 0.037, 0.032],
[0.062, 0.030, 0.025],
[0.100, 0.060, 0.074],
[0.161, 0.184, 0.139],
[0.001, 0.001, 0.001],
[0.013, 0.015, 0.008],
[0.414, 0.374, 0.367],
[0.473, 0.419, 0.410],
[0.147, 0.095, 0.086],
[0.128, 0.115, 0.121],
[0.671, 0.624, 0.630],
[0.876, 0.797, 0.802],
[0.805, 0.705, 0.645],
[0.629, 0.672, 0.673],
[2.454, [Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.218443 [ 11856 ] <Fatal> BaseDaemon: ########################################
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.218522 [ 11856 ] <Fatal> BaseDaemon: (version 22.10.1.934 (official build), build id: 833BDA9C186AC201676309D415DD72D3E0D5F458) (from thread 11592) (query_id: 464568b4-3d14-4903-9a6c-a5a6f48410d7) (query: SELECT UserID, SearchPhrase, count() FROM hits GROUP BY UserID, SearchPhrase ORDER BY count() DESC LIMIT 10;) Received signal Segmentation fault (11)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.218565 [ 11856 ] <Fatal> BaseDaemon: Address: 0x10 Access: read. Address not mapped to object.
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.218577 [ 11856 ] <Fatal> BaseDaemon: Stack trace: 0x11c78a52 0x11adcdf6 0x134b2e08 0x132dd655 0x132dd1e6 0x132f92c6 0x132ed49c 0x132efb1d 0xcd88f8c 0xcd8e69e 0x7f335b04ab43 0x7f335b0dca00
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612117 [ 11856 ] <Fatal> BaseDaemon: 2.1. inlined from ./build_docker/../base/base/StringRef.h:78: compareSSE2(char const*, char const*)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612187 [ 11856 ] <Fatal> BaseDaemon: 2.2. inlined from ./build_docker/../base/base/StringRef.h:152: memequalSSE2Wide(char const*, char const*, unsigned long)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612198 [ 11856 ] <Fatal> BaseDaemon: 2.3. inlined from ./build_docker/../base/base/StringRef.h:171: operator==(StringRef, StringRef)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612211 [ 11856 ] <Fatal> BaseDaemon: 2.4. inlined from ./build_docker/../src/Common/HashTable/HashTable.h:95: bool bitEquals<StringRef const&>(StringRef const&, StringRef const&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612224 [ 11856 ] <Fatal> BaseDaemon: 2.5. inlined from ./build_docker/../src/Common/HashTable/HashMap.h:174: HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>::keyEquals(StringRef const&, unsigned long) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612238 [ 11856 ] <Fatal> BaseDaemon: 2.6. inlined from ./build_docker/../src/Common/HashTable/HashMap.h:175: HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>::keyEquals(StringRef const&, unsigned long, HashTableNoState const&) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612252 [ 11856 ] <Fatal> BaseDaemon: 2.7. inlined from ./build_docker/../src/Common/HashTable/HashTable.h:486: HashTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >::findCell(StringRef const&, unsigned long, unsigned long) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612270 [ 11856 ] <Fatal> BaseDaemon: 2.8. inlined from ./build_docker/../src/Common/HashTable/HashTable.h:989: void HashTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >::emplaceNonZero<StringRef const&>(StringRef const&, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>*&, bool&, unsigned long)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612284 [ 11856 ] <Fatal> BaseDaemon: 2.9. inlined from ./build_docker/../src/Common/HashTable/HashTable.h:1069: void HashTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >::emplace<StringRef const&>(StringRef const&, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>*&, bool&, unsigned long)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612314 [ 11856 ] <Fatal> BaseDaemon: 2.10. inlined from ./build_docker/../src/Common/HashTable/HashMap.h:232: void HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >::mergeToViaEmplace<void DB::Aggregator::mergeDataImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true>, HashMapTable> >, true, false, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> > >(HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, DB::Arena*) const::'lambda'(char*&, char*&, bool), false>(HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, void DB::Aggregator::mergeDataImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true>, HashMapTable> >, true, false, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> > >(HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, DB::Arena*) const::'lambda'(char*&, char*&, bool)&&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.612333 [ 11856 ] <Fatal> BaseDaemon: 2. ./build_docker/../src/Interpreters/Aggregator.cpp:2437: void DB::Aggregator::mergeDataImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true>, HashMapTable> >, true, false, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> > >(HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true> >&, DB::Arena*) const @ 0x11c78a52 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.903818 [ 11856 ] <Fatal> BaseDaemon: 3.1. inlined from ./build_docker/../src/Interpreters/Aggregator.cpp:2616: void DB::Aggregator::mergeBucketImpl<DB::AggregationMethodSerialized<TwoLevelHashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, DefaultHash<StringRef>, HashTableNoState>, DefaultHash<StringRef>, TwoLevelHashTableGrower<8ul>, Allocator<true, true>, HashMapTable> > >(std::__1::vector<std::__1::shared_ptr<DB::AggregatedDataVariants>, std::__1::allocator<std::__1::shared_ptr<DB::AggregatedDataVariants> > >&, int, DB::Arena*, std::__1::atomic<bool>*) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.904102 [ 11856 ] <Fatal> BaseDaemon: 3. ./build_docker/../src/Interpreters/Aggregator.cpp:1668: DB::Aggregator::mergeAndConvertOneBucketToBlock(std::__1::vector<std::__1::shared_ptr<DB::AggregatedDataVariants>, std::__1::allocator<std::__1::shared_ptr<DB::AggregatedDataVariants> > >&, DB::Arena*, bool, unsigned long, std::__1::atomic<bool>*) const @ 0x11adcdf6 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.967403 [ 11856 ] <Fatal> BaseDaemon: 4. ./build_docker/../src/Processors/Transforms/AggregatingTransform.cpp:127: DB::ConvertingAggregatedToChunksSource::generate() @ 0x134b2e08 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.983802 [ 11856 ] <Fatal> BaseDaemon: 5.1. inlined from ./build_docker/../src/Processors/Chunk.h:90: DB::Chunk::hasRows() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.983867 [ 11856 ] <Fatal> BaseDaemon: 5.2. inlined from ./build_docker/../src/Processors/Chunk.h:92: DB::Chunk::empty() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.983878 [ 11856 ] <Fatal> BaseDaemon: 5.3. inlined from ./build_docker/../src/Processors/Chunk.h:93: DB::Chunk::operator bool() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.983887 [ 11856 ] <Fatal> BaseDaemon: 5. ./build_docker/../src/Processors/ISource.cpp:125: DB::ISource::tryGenerate() @ 0x132dd655 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.990772 [ 11856 ] <Fatal> BaseDaemon: 6.1. inlined from ./build_docker/../contrib/libcxx/include/optional:321: std::__1::__optional_storage_base<DB::Chunk, false>::has_value() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.990843 [ 11856 ] <Fatal> BaseDaemon: 6.2. inlined from ./build_docker/../contrib/libcxx/include/optional:975: std::__1::optional<DB::Chunk>::operator bool() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.990849 [ 11856 ] <Fatal> BaseDaemon: 6. ./build_docker/../src/Processors/ISource.cpp:94: DB::ISource::work() @ 0x132dd1e6 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.997860 [ 11856 ] <Fatal> BaseDaemon: 7.1. inlined from ./build_docker/../src/Processors/Executors/ExecutionThreadContext.cpp:0: DB::executeJob(DB::ExecutingGraph::Node*, DB::ReadProgressCallback*)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:53.997936 [ 11856 ] <Fatal> BaseDaemon: 7. ./build_docker/../src/Processors/Executors/ExecutionThreadContext.cpp:92: DB::ExecutionThreadContext::executeTask() @ 0x132f92c6 in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.011623 [ 11856 ] <Fatal> BaseDaemon: 8. ./build_docker/../src/Processors/Executors/PipelineExecutor.cpp:228: DB::PipelineExecutor::executeStepImpl(unsigned long, std::__1::atomic<bool>*) @ 0x132ed49c in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.019734 [ 11856 ] <Fatal> BaseDaemon: 9.1. inlined from ./build_docker/../src/Processors/Executors/PipelineExecutor.cpp:0: operator()
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.019756 [ 11856 ] <Fatal> BaseDaemon: 9.2. inlined from ./build_docker/../contrib/libcxx/include/type_traits:3640: decltype(static_cast<DB::PipelineExecutor::spawnThreads()::$_0>(fp)()) std::__1::__invoke<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PipelineExecutor::spawnThreads()::$_0>(DB::PipelineExecutor::spawnThreads()::$_0&&)::'lambda'()&>(DB::PipelineExecutor::spawnThreads()::$_0&&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.019762 [ 11856 ] <Fatal> BaseDaemon: 9.3. inlined from ./build_docker/../contrib/libcxx/include/__functional/invoke.h:61: void std::__1::__invoke_void_return_wrapper<void, true>::__call<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PipelineExecutor::spawnThreads()::$_0>(DB::PipelineExecutor::spawnThreads()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PipelineExecutor::spawnThreads()::$_0>(DB::PipelineExecutor::spawnThreads()::$_0&&)::'lambda'()&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.019768 [ 11856 ] <Fatal> BaseDaemon: 9.4. inlined from ./build_docker/../contrib/libcxx/include/__functional/function.h:230: std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PipelineExecutor::spawnThreads()::$_0>(DB::PipelineExecutor::spawnThreads()::$_0&&)::'lambda'(), void ()>::operator()()
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.019770 [ 11856 ] <Fatal> BaseDaemon: 9. ./build_docker/../contrib/libcxx/include/__functional/function.h:711: void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<DB::PipelineExecutor::spawnThreads()::$_0>(DB::PipelineExecutor::spawnThreads()::$_0&&)::'lambda'(), void ()> >(std::__1::__function::__policy_storage const*) @ 0x132efb1d in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.025178 [ 11856 ] <Fatal> BaseDaemon: 10.1. inlined from ./build_docker/../base/base/wide_integer_impl.h:772: bool wide::integer<128ul, unsigned int>::_impl::operator_eq<wide::integer<128ul, unsigned int> >(wide::integer<128ul, unsigned int> const&, wide::integer<128ul, unsigned int> const&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.025190 [ 11856 ] <Fatal> BaseDaemon: 10.2. inlined from ./build_docker/../base/base/wide_integer_impl.h:1439: bool wide::operator==<128ul, unsigned int, 128ul, unsigned int>(wide::integer<128ul, unsigned int> const&, wide::integer<128ul, unsigned int> const&)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.025195 [ 11856 ] <Fatal> BaseDaemon: 10.3. inlined from ./build_docker/../base/base/strong_typedef.h:42: StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag>::operator==(StrongTypedef<wide::integer<128ul, unsigned int>, DB::UUIDTag> const&) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.025200 [ 11856 ] <Fatal> BaseDaemon: 10.4. inlined from ./build_docker/../src/Common/OpenTelemetryTraceContext.h:39: DB::OpenTelemetry::Span::isTraceEnabled() const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.025202 [ 11856 ] <Fatal> BaseDaemon: 10. ./build_docker/../src/Common/ThreadPool.cpp:296: ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0xcd88f8c in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031601 [ 11856 ] <Fatal> BaseDaemon: 11.1. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:312: std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >::reset(std::__1::__thread_struct*)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031608 [ 11856 ] <Fatal> BaseDaemon: 11.2. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:269: ~unique_ptr
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031613 [ 11856 ] <Fatal> BaseDaemon: 11.3. inlined from ./build_docker/../contrib/libcxx/include/tuple:210: ~__tuple_leaf
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031616 [ 11856 ] <Fatal> BaseDaemon: 11.4. inlined from ./build_docker/../contrib/libcxx/include/tuple:470: ~tuple
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031621 [ 11856 ] <Fatal> BaseDaemon: 11.5. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:54: std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()> >::operator()(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()>*) const
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031628 [ 11856 ] <Fatal> BaseDaemon: 11.6. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:315: std::__1::unique_ptr<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()>, std::__1::default_delete<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()> > >::reset(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()>*)
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031632 [ 11856 ] <Fatal> BaseDaemon: 11.7. inlined from ./build_docker/../contrib/libcxx/include/__memory/unique_ptr.h:269: ~unique_ptr
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031636 [ 11856 ] <Fatal> BaseDaemon: 11. ./build_docker/../contrib/libcxx/include/thread:295: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>, bool)::'lambda0'()> >(void*) @ 0xcd8e69e in /home/mike/ClickBench/hardware/clickhouse-benchmark/clickhouse
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031641 [ 11856 ] <Fatal> BaseDaemon: 12. ? @ 0x7f335b04ab43 in ?
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.031644 [ 11856 ] <Fatal> BaseDaemon: 13. ? @ 0x7f335b0dca00 in ?
[Ubuntu-2204-jammy-amd64-base] 2022.10.05 19:05:54.118399 [ 11856 ] <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: B355A90358E27C0884733C9EB49D0A6E)
Code: 32. DB::Exception: Attempt to read after eof: while receiving packet from localhost:9000: (in query: SELECT UserID, SearchPhrase, count() FROM hits GROUP BY UserID, SearchPhrase ORDER BY count() DESC LIMIT 10;). (ATTEMPT_TO_READ_AFTER_EOF), Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)],
[Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR), Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR), Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)],
[Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR), Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR), Code: 210. DB::NetException: Connection refused (localhost:9000). (NETWORK_ERROR)],

Biased reports: Benchmark uses a different input for Clickhouse than for other databases

Since Clickhouse is running these benchmarks, it's not surprising that Clickhouse gets preferential treatment.

For example the "Detailed Comparison" shows in a strong red color that Snowflake took 42 minutes to load the data, while Clickhouse only a little more than 2 minutes:


How is that possible? What's the magic that allows Clickhouse to load data in 2 minutes, instead of 42?

The magic is that Clickhouse is feeding Clickhouse a different source of data.

If Clickhouse had used the same 100 Parquet files as input for Snowflake, loading times would have been roughly equivalent, as this is an I/O-bound operation that can be parallelized.

Disclosure: I'm Felipe Hoffa, and I work for Snowflake. By the way, I'm glad to see the great results Snowflake got in this "potentially" biased benchmark.

P.S.: While we are here, I would recommend Clickhouse delete the unprofessional snarky comments on https://github.com/ClickHouse/ClickBench/blob/main/snowflake/NOTES.md - if they want to keep up the appearance of running a fair comparison.

inconsistent parquet format between hits.parquet and hits_0.parquet

hits.parquet has the String tag for binaries; all the fields are required

e.g. required binary field_id=-1 Title (String);

In [7]: pq.read_metadata('hits.parquet').schema
Out[7]: 
<pyarrow._parquet.ParquetSchema object at 0x11238ab40>
required group field_id=-1 schema {
  required int64 field_id=-1 WatchID;
  required int32 field_id=-1 JavaEnable (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 Title (String);
  required int32 field_id=-1 GoodEvent (Int(bitWidth=16, isSigned=true));
  required int64 field_id=-1 EventTime;
  required int32 field_id=-1 EventDate (Int(bitWidth=16, isSigned=false));
  required int32 field_id=-1 CounterID;
  required int32 field_id=-1 ClientIP;
  required int32 field_id=-1 RegionID;
  required int64 field_id=-1 UserID;
  required int32 field_id=-1 CounterClass (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 OS (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 UserAgent (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 URL (String);
  required binary field_id=-1 Referer (String);
  required int32 field_id=-1 IsRefresh (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 RefererCategoryID (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 RefererRegionID;
  required int32 field_id=-1 URLCategoryID (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 URLRegionID;
  required int32 field_id=-1 ResolutionWidth (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 ResolutionHeight (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 ResolutionDepth (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 FlashMajor (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 FlashMinor (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 FlashMinor2 (String);
  required int32 field_id=-1 NetMajor (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 NetMinor (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 UserAgentMajor (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 UserAgentMinor (String);
  required int32 field_id=-1 CookieEnable (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 JavascriptEnable (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsMobile (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 MobilePhone (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 MobilePhoneModel (String);
  required binary field_id=-1 Params (String);
  required int32 field_id=-1 IPNetworkID;
  required int32 field_id=-1 TraficSourceID (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 SearchEngineID (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 SearchPhrase (String);
  required int32 field_id=-1 AdvEngineID (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsArtifical (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 WindowClientWidth (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 WindowClientHeight (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 ClientTimeZone (Int(bitWidth=16, isSigned=true));
  required int64 field_id=-1 ClientEventTime;
  required int32 field_id=-1 SilverlightVersion1 (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 SilverlightVersion2 (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 SilverlightVersion3;
  required int32 field_id=-1 SilverlightVersion4 (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 PageCharset (String);
  required int32 field_id=-1 CodeVersion;
  required int32 field_id=-1 IsLink (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsDownload (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsNotBounce (Int(bitWidth=16, isSigned=true));
  required int64 field_id=-1 FUniqID;
  required binary field_id=-1 OriginalURL (String);
  required int32 field_id=-1 HID;
  required int32 field_id=-1 IsOldCounter (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsEvent (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 IsParameter (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 DontCountHits (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 WithHash (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 HitColor (String);
  required int64 field_id=-1 LocalEventTime;
  required int32 field_id=-1 Age (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 Sex (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 Income (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 Interests (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 Robotness (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 RemoteIP;
  required int32 field_id=-1 WindowName;
  required int32 field_id=-1 OpenerName;
  required int32 field_id=-1 HistoryLength (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 BrowserLanguage (String);
  required binary field_id=-1 BrowserCountry (String);
  required binary field_id=-1 SocialNetwork (String);
  required binary field_id=-1 SocialAction (String);
  required int32 field_id=-1 HTTPError (Int(bitWidth=16, isSigned=true));
  required int32 field_id=-1 SendTiming;
  required int32 field_id=-1 DNSTiming;
  required int32 field_id=-1 ConnectTiming;
  required int32 field_id=-1 ResponseStartTiming;
  required int32 field_id=-1 ResponseEndTiming;
  required int32 field_id=-1 FetchTiming;
  required int32 field_id=-1 SocialSourceNetworkID (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 SocialSourcePage (String);
  required int64 field_id=-1 ParamPrice;
  required binary field_id=-1 ParamOrderID (String);
  required binary field_id=-1 ParamCurrency (String);
  required int32 field_id=-1 ParamCurrencyID (Int(bitWidth=16, isSigned=true));
  required binary field_id=-1 OpenstatServiceName (String);
  required binary field_id=-1 OpenstatCampaignID (String);
  required binary field_id=-1 OpenstatAdID (String);
  required binary field_id=-1 OpenstatSourceID (String);
  required binary field_id=-1 UTMSource (String);
  required binary field_id=-1 UTMMedium (String);
  required binary field_id=-1 UTMCampaign (String);
  required binary field_id=-1 UTMContent (String);
  required binary field_id=-1 UTMTerm (String);
  required binary field_id=-1 FromTag (String);
  required int32 field_id=-1 HasGCLID (Int(bitWidth=16, isSigned=true));
  required int64 field_id=-1 RefererHash;
  required int64 field_id=-1 URLHash;
  required int32 field_id=-1 CLID;
}

while hits_0.parquet has no String tag for binaries, and all the fields are optional

e.g. optional binary field_id=-1 Title;

In [8]: pq.read_metadata('hits_0.parquet').schema
Out[8]: 
<pyarrow._parquet.ParquetSchema object at 0x114c30fc0>
required group field_id=-1 schema {
  optional int64 field_id=-1 WatchID;
  optional int32 field_id=-1 JavaEnable (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 Title;
  optional int32 field_id=-1 GoodEvent (Int(bitWidth=16, isSigned=true));
  optional int64 field_id=-1 EventTime;
  optional int32 field_id=-1 EventDate (Int(bitWidth=16, isSigned=false));
  optional int32 field_id=-1 CounterID;
  optional int32 field_id=-1 ClientIP;
  optional int32 field_id=-1 RegionID;
  optional int64 field_id=-1 UserID;
  optional int32 field_id=-1 CounterClass (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 OS (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 UserAgent (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 URL;
  optional binary field_id=-1 Referer;
  optional int32 field_id=-1 IsRefresh (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 RefererCategoryID (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 RefererRegionID;
  optional int32 field_id=-1 URLCategoryID (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 URLRegionID;
  optional int32 field_id=-1 ResolutionWidth (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 ResolutionHeight (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 ResolutionDepth (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 FlashMajor (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 FlashMinor (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 FlashMinor2;
  optional int32 field_id=-1 NetMajor (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 NetMinor (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 UserAgentMajor (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 UserAgentMinor;
  optional int32 field_id=-1 CookieEnable (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 JavascriptEnable (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsMobile (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 MobilePhone (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 MobilePhoneModel;
  optional binary field_id=-1 Params;
  optional int32 field_id=-1 IPNetworkID;
  optional int32 field_id=-1 TraficSourceID (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 SearchEngineID (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 SearchPhrase;
  optional int32 field_id=-1 AdvEngineID (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsArtifical (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 WindowClientWidth (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 WindowClientHeight (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 ClientTimeZone (Int(bitWidth=16, isSigned=true));
  optional int64 field_id=-1 ClientEventTime;
  optional int32 field_id=-1 SilverlightVersion1 (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 SilverlightVersion2 (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 SilverlightVersion3;
  optional int32 field_id=-1 SilverlightVersion4 (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 PageCharset;
  optional int32 field_id=-1 CodeVersion;
  optional int32 field_id=-1 IsLink (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsDownload (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsNotBounce (Int(bitWidth=16, isSigned=true));
  optional int64 field_id=-1 FUniqID;
  optional binary field_id=-1 OriginalURL;
  optional int32 field_id=-1 HID;
  optional int32 field_id=-1 IsOldCounter (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsEvent (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 IsParameter (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 DontCountHits (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 WithHash (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 HitColor;
  optional int64 field_id=-1 LocalEventTime;
  optional int32 field_id=-1 Age (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 Sex (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 Income (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 Interests (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 Robotness (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 RemoteIP;
  optional int32 field_id=-1 WindowName;
  optional int32 field_id=-1 OpenerName;
  optional int32 field_id=-1 HistoryLength (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 BrowserLanguage;
  optional binary field_id=-1 BrowserCountry;
  optional binary field_id=-1 SocialNetwork;
  optional binary field_id=-1 SocialAction;
  optional int32 field_id=-1 HTTPError (Int(bitWidth=16, isSigned=true));
  optional int32 field_id=-1 SendTiming;
  optional int32 field_id=-1 DNSTiming;
  optional int32 field_id=-1 ConnectTiming;
  optional int32 field_id=-1 ResponseStartTiming;
  optional int32 field_id=-1 ResponseEndTiming;
  optional int32 field_id=-1 FetchTiming;
  optional int32 field_id=-1 SocialSourceNetworkID (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 SocialSourcePage;
  optional int64 field_id=-1 ParamPrice;
  optional binary field_id=-1 ParamOrderID;
  optional binary field_id=-1 ParamCurrency;
  optional int32 field_id=-1 ParamCurrencyID (Int(bitWidth=16, isSigned=true));
  optional binary field_id=-1 OpenstatServiceName;
  optional binary field_id=-1 OpenstatCampaignID;
  optional binary field_id=-1 OpenstatAdID;
  optional binary field_id=-1 OpenstatSourceID;
  optional binary field_id=-1 UTMSource;
  optional binary field_id=-1 UTMMedium;
  optional binary field_id=-1 UTMCampaign;
  optional binary field_id=-1 UTMContent;
  optional binary field_id=-1 UTMTerm;
  optional binary field_id=-1 FromTag;
  optional int32 field_id=-1 HasGCLID (Int(bitWidth=16, isSigned=true));
  optional int64 field_id=-1 RefererHash;
  optional int64 field_id=-1 URLHash;
  optional int32 field_id=-1 CLID;
}
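
To see the full difference at a glance, the two schema dumps can be diffed directly (a sketch assuming pyarrow is installed and both files are in the current directory):

python3 -c "import pyarrow.parquet as pq; print(pq.read_metadata('hits.parquet').schema)"   > schema_hits.txt
python3 -c "import pyarrow.parquet as pq; print(pq.read_metadata('hits_0.parquet').schema)" > schema_hits_0.txt
diff schema_hits.txt schema_hits_0.txt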

Add result for Apache Doris

Hi,
I want to make a PR to add an Apache Doris result, but I'm not sure about the reproducibility requirement.
It says 20 minutes. If I load the data into Doris, wait for 5 minutes, and then run the queries, is that OK?
Doris still performs compaction in the background after the data load is done, which affects query performance.

Add Quickwit support

It would be nice to see a Quickwit comparison as well, since at least Elasticsearch is supported here right now.

Pinging @fulmicoton; you might be interested in this, at least from the perspective of "where Quickwit performance could be improved compared to other engines".

ClickBench query 29

Running Q29 produces results like:

http://rihanner.ferio.ru/katalogOrigin	38	149869	http://rihanner.ferio.ru/katalogOrigin
http://irr.ru/jobs-educations/tehnik	36	123637	http://irr.ru/jobs-educations/tehnik
https://google.com/fee=меньше	35	2958167	https://google.com/fee=меньше
http://kirov.irr.ru/index.php%3Ftb	34	273645	http://kirov.irr.ru/index.php%3Ftb
http://video.yandsearch/price=от	34	268732	http://video.yandsearch/price=от
http://irr.ru/jobinmoscow.ru/Nike	33	260592	http://irr.ru/jobinmoscow.ru/Nike
http://bdsmpeople.ru/register2123	33	164925	http://bdsmpeople.ru/register2123
...

I would have expected to see host names returned as a result.

With Postgres I receive the correct result:

postgres=# 
postgres=# SELECT REGEXP_REPLACE('http://irr.ru/jobs-educations/tehnik', '^https?://(?:www.)?([^/]+)/.*$', '\1');
 regexp_replace 
----------------
 irr.ru
(1 row)

postgres=# SELECT REGEXP_REPLACE('http://rihanner.ferio.ru/katalogOrigin', '^https?://(?:www.)?([^/]+)/.*$', '\1');
  regexp_replace   
-------------------
 rihanner.ferio.ru
(1 row)

postgres=# 
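For comparison, the same check in ClickHouse could look like this sketch, assuming Q29 uses replaceRegexpOne (note that ClickHouse string literals need doubled backslashes):

clickhouse-client <<'SQL'
-- Expected output: irr.ru, matching the Postgres result above.
SELECT replaceRegexpOne('http://irr.ru/jobs-educations/tehnik',
                        '^https?://(?:www\\.)?([^/]+)/.*$', '\\1');
SQL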

ClickHouse appears to run into an overflow on Q3

I ran into this while adding result verification: the result produced by ClickHouse for Q3 appears to be incorrect, likely due to an internal overflow (the average of non-negative UserID values comes out negative, which suggests the sum wraps around in a 64-bit accumulator):

SELECT AVG(UserID) FROM hits;
┌─────────avg(UserID)─┐
│ -55945124888.916016 │
└─────────────────────┘

I'm not sure if this is intended behavior; it does not appear to be documented.

Adding a cast to INT128 or DOUBLE fixes the problem:

SELECT AVG(toInt128(UserID)) FROM hits;
┌─avg(toInt128(UserID))─┐
│   2528953029789716000 │
└───────────────────────┘

SELECT AVG(CAST(UserID AS DOUBLE)) FROM hits;
┌─avg(CAST(UserID, 'DOUBLE'))─┐
│         2528953029789716000 │
└─────────────────────────────┘

Can we get a larger dataset?

Is it possible to get a larger dataset, say 2 TB or 5 TB? Testing on a 200 GB dataset that is easily compressible down to 50 GB with modern compression algorithms might exclude disk I/O from the equation on systems with large caches (even if those are simple disk caches).
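
In the meantime, one crude way to get a larger dataset is to self-insert the table (a ClickHouse sketch; duplicated rows change result cardinalities, so this is only useful for stressing I/O, not for verifying results):

clickhouse-client <<'SQL'
-- Each pass doubles the row count; repeat N times for roughly 2^N x the data.
INSERT INTO hits SELECT * FROM hits;
SQL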
