fhoffa commented on May 18, 2024

Thanks Alexey for making your intentions clear:

  • If there are 2 ways of doing something with Snowflake, you will choose the one that makes Snowflake look worse. Here you have acknowledged that there is a better way, but you refuse to change. Give Snowflake the same files that you provided to everyone else and yourself, and the numbers will change.

  • Of the 38 systems you tested, only Snowflake got a snarky NOTES.md. Then you want us to believe that's because your main goal is to make Snowflake better. I doubt that's your main goal.

from clickbench.

alexey-milovidov commented on May 18, 2024

You are mixing up the results for ClickHouse and clickhouse-local:

Loading data into ClickHouse:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/benchmark.sh#L24

It takes 476 seconds to load from a TSV file on a c6a.4xlarge machine:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.4xlarge.json

Or 417 seconds if you use zstd compression:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.4xlarge.zstd.json

It takes 137 seconds to load from a TSV file on a c6a.metal machine:
https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.metal.json
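The snippet below puts the three load times quoted here side by side as a quick sanity check. It is a minimal sketch: the figures are the ones stated in this comment, and the `speedup` and `percent_saved` helpers are hypothetical names used only for illustration.

```python
# ClickHouse load times quoted in this comment, in seconds.
LOAD_SECONDS = {
    "c6a.4xlarge (TSV)": 476,
    "c6a.4xlarge (TSV, zstd)": 417,
    "c6a.metal (TSV)": 137,
}


def speedup(baseline: float, other: float) -> float:
    """How many times faster `other` is than `baseline` (hypothetical helper)."""
    return baseline / other


def percent_saved(baseline: float, other: float) -> float:
    """Share of the baseline load time saved, in percent (hypothetical helper)."""
    return 100.0 * (baseline - other) / baseline


base = LOAD_SECONDS["c6a.4xlarge (TSV)"]
for name, seconds in LOAD_SECONDS.items():
    print(f"{name}: {seconds} s, "
          f"{speedup(base, seconds):.2f}x vs. baseline, "
          f"{percent_saved(base, seconds):.1f}% time saved")
```

By this arithmetic, zstd compression trims roughly 12% off the c6a.4xlarge load, and the c6a.metal machine loads about 3.5 times faster than the c6a.4xlarge.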

In contrast, clickhouse-local is a stateless system (like AWS Athena). It spends no time loading the data because it queries the files as-is, but its query performance is lower.

ClickHouse and clickhouse-local are present as different entries in the benchmark.
In your screenshot, you are comparing Snowflake with ClickHouse, and ClickHouse does indeed load faster.

There is no magic: you can reproduce the result by following the script.

Loading is not parallelized, and according to the Methodology it should not be:
https://github.com/ClickHouse/ClickBench#data-loading

alexey-milovidov commented on May 18, 2024

About the comments in NOTES.md:

I spent multiple hours figuring out how to load the data.
First, I tried to load it with SnowSQL, but SnowSQL parses the CSV with Python code, used only one CPU core, and did not finish within 24 hours.
Fortunately, I found another way to load the data.

The usability issue with SnowSQL is real. I tried to specify my account name multiple times before I found out that I also needed to specify the region name on the command line. This was unclear from the documentation and is a usability issue worth fixing. There were two different substrings that looked like my account name; it was unclear which one to copy-paste, and neither worked by default.

The syntax @test.public.%hits does look weird.

The pricing is also not very clear. Prices are shown in "credits", but it is difficult to find out what a credit is worth.
I eventually found it in a PDF, but it was not easy: the documentation search does not help, and random internet pages give conflicting information. I could not find the billing information in the UI. This is an opportunity for improvement.

The internet is flooded with half-spam pages that "help to figure out the cost of Snowflake".

Finally, I found the overall UI experience to be one of the best. It works well and looks polished.
The ability to resize a warehouse in seconds is unique among these services.
Query performance is very consistent: all queries run fine.
While it is slower than ClickHouse on average, it is fairer to compare it with similar services, like Redshift and Redshift Serverless.

I've already told my colleagues that Snowflake surprised me in a good way (easy scaling and a good user experience).

alexey-milovidov commented on May 18, 2024

If there are 2 ways of doing something with Snowflake, you will choose the one that makes Snowflake look worse. Here you have acknowledged that there is a better way, but you refuse to change. Give Snowflake the same files that you provided to everyone else and yourself, and the numbers will change.

No, I've selected the best way to load the data.

As mentioned in NOTES.md, I ended up using:

COPY INTO test.public.hits2 FROM 's3://clickhouse-public-datasets/hits_compatible/hits.csv.gz' FILE_FORMAT = (TYPE = CSV, COMPRESSION = GZIP, FIELD_OPTIONALLY_ENCLOSED_BY = '"')

If there is an even better variant of data loading within the rules of this benchmark, let's use it.

alexey-milovidov commented on May 18, 2024

Of the 38 systems you tested, only Snowflake got a snarky NOTES.md. Then you want us to believe that's because your main goal is to make Snowflake better. I doubt that's your main goal.

You will find similar comments about other systems' usability, for example:
https://github.com/ClickHouse/ClickBench/tree/main/bigquery

alexey-milovidov commented on May 18, 2024

Please note that poor onboarding, obsolete documentation, and nothing working by default are, sadly, typical among these services, and your service is not the worst in this respect.

I think that capturing the experience of a "clueless" and "ignorant" user who is using your service for the first time is the most valuable input for improving the product.
