Comments (6)
Thanks Alexey for making your intentions clear:
-
If there are 2 ways of doing something with Snowflake, you will choose the one that makes Snowflake look worse. Here you have acknowledged that there is a better way, but you refuse to change. Give Snowflake the same files that you provided to everyone else and yourself, and the numbers will change.
-
Of the 38 systems you tested, only Snowflake got a snarky NOTES.md. Then you want us to believe that's because your main goal is to make Snowflake better. I doubt that's your main goal.
from clickbench.
You are mixing up the results for ClickHouse and clickhouse-local:
Loading data into ClickHouse:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/benchmark.sh#L24
It takes 476 seconds to load from TSV file on c6a.4xlarge machine:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.4xlarge.json
Or 417 seconds if you use zstd compression:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.4xlarge.zstd.json
It takes 137 seconds to load from TSV file on c6a.metal machine:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.metal.json
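For context, the load step behind these numbers is a single-stream insert. A minimal sketch (table name and file as used in the benchmark scripts), with the TSV piped to clickhouse-client on stdin:

```sql
-- Executed as: clickhouse-client --time --query "INSERT INTO hits FORMAT TSV" < hits.tsv
-- One stream, no parallel loading, per the benchmark methodology
INSERT INTO hits FORMAT TSV
```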
In contrast, clickhouse-local is a stateless system (like AWS Athena): it takes no time to load the data, because it queries the files as-is without loading them, but query performance is lower.
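As a sketch of that stateless mode, assuming clickhouse-local is installed and hits.tsv is present locally, the file is queried in place with no load step:

```sql
-- Run inside clickhouse-local: the file() table function reads hits.tsv as-is,
-- so there is zero load time, at the cost of slower per-query performance
SELECT count() FROM file('hits.tsv', 'TSV');
```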
ClickHouse and clickhouse-local are present as different entries in the benchmark.
In your screenshot, you are comparing Snowflake with ClickHouse, and ClickHouse indeed loads faster.
There is no magic and you can reproduce the result by following the script.
The loading is not parallelized, and should not be, as per the methodology:
https://github.com/ClickHouse/ClickBench#data-loading
from clickbench.
About the comments in NOTES.md
I've spent multiple hours figuring out how to load the data.
First I tried to load it with SnowSQL. But it uses Python code to parse the CSV, saturated a single CPU core, and did not finish in 24 hours.
Fortunately, I found another way to load the data.
The usability issue with SnowSQL is real. I tried specifying my account name multiple times before discovering that I also needed to specify the region name on the command line. This was unclear from the documentation and is a usability issue worth fixing. There were two different substrings that looked like my account name; it was unclear which one to copy-paste, and neither worked by default.
The syntax @test.public.%hits does look weird.
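For readers unfamiliar with the notation: @db.schema.%table is Snowflake's "table stage", an implicit stage attached to each table. A minimal sketch of how it is typically used (the local file path is hypothetical):

```sql
-- Upload a local file to the table's implicit stage...
PUT file:///tmp/hits.csv @test.public.%hits;
-- ...then load; COPY INTO with no FROM clause reads from the table stage
COPY INTO test.public.hits;
```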
The pricing is also not quite clear. It shows the price in "credits", but it is difficult to find out what a credit is worth.
I eventually found it in some PDF, but it was not easy: the search in the documentation does not help, and random internet pages show contradictory info. I could not find the billing information in the UI. This is an opportunity for improvement.
The internet is flooded with half-spam pages that "help you figure out the cost of Snowflake".
That said, I found the overall UI experience to be one of the best. It works well and looks polished.
The ability to resize a warehouse in seconds is unique among comparable services.
Query performance is very consistent - all queries run fine.
While it's slower on average than ClickHouse, it is fairer to compare it with similar services, such as Redshift and Redshift Serverless.
I've already told my colleagues that Snowflake surprised me in a good way.
(easy scaling + good user experience)
from clickbench.
If there are 2 ways of doing something with Snowflake, you will choose the one that makes Snowflake look worse. Here you have acknowledged that there is a better way, but you refuse to change. Give Snowflake the same files that you provided to everyone else and yourself, and the numbers will change.
No, I selected the best way I found to load the data.
As mentioned in the NOTES.md, I've ended up using
COPY INTO test.public.hits2 FROM 's3://clickhouse-public-datasets/hits_compatible/hits.csv.gz' FILE_FORMAT = (TYPE = CSV, COMPRESSION = GZIP, FIELD_OPTIONALLY_ENCLOSED_BY = '"')
If there is an even better variant of data loading within the rules of this benchmark, let's use it.
from clickbench.
Of the 38 systems you tested, only Snowflake got a snarky NOTES.md. Then you want us to believe that's because your main goal is to make Snowflake better. I doubt that's your main goal.
You will find similar comments about other systems' usability, for example:
https://github.com/ClickHouse/ClickBench/tree/main/bigquery
from clickbench.
Please note that poor onboarding, obsolete documentation, and nothing working by default are sad but typical among these services, and your service is far from the worst in this regard.
I think that capturing the experience of a "clueless", "ignorant" user who is trying your service for the first time is the most valuable input for improving the product.
from clickbench.