
Comments (7)

illuhad commented on May 27, 2024

I agree with your analysis. Your proposal is much closer to the classical STREAM workload. For measuring throughput, I think we should rather follow STREAM.

This is of course quite a large number, and a much larger size than many other benchmarks can support (if, e.g., it is used as the range in both dimensions of a 2D buffer). One possible solution would be to always multiply the size by some fixed factor, e.g. 1024. However, I generally feel that the mapping of the --size parameter to the actual workload is already too arbitrary. It might be convenient for running lots of benchmarks in batch, but in reality I think we'll have to hand-tune these values (as well as any additional parameters a benchmark might have) for each individual platform anyway. I'm thus wondering whether individual parameters for each benchmark would make more sense (e.g. a buffer-size= for this one).
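
For illustration, a small sketch of how a run script or wrapper might map a generic size to a benchmark-specific parameter. The buffer-size= name and the factor of 1024 are taken from the suggestion above; the mapping function itself is a hypothetical assumption, not anything in sycl-bench today:

```python
# Hypothetical sketch: per-benchmark parameters instead of one global --size meaning.
def benchmark_args(name, size):
    if name == "dram":
        # DRAM benchmark: dedicated flag, derived here from the generic size by a
        # fixed factor so batch runs still work without hand-tuning every value.
        return [f"--buffer-size={size * 1024}"]
    # Default: pass --size through unchanged (e.g. used as a 2D range).
    return [f"--size={size}"]
```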

The run-suite script starts with a very small --size parameter (I think something like 64) and then doubles the problem size until the measured runtime reaches a minimum determined by the test profile in run-suite. I imagine this could already work well for many cases. If it doesn't here, the test profiles can override the tested problem sizes for individual benchmarks; we could simply edit the test profile to start with larger values if necessary.
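
For reference, a minimal sketch of that doubling logic. This is not the actual run-suite code: the --size flag comes from the discussion, while the wall-clock timing around the whole process and the other parameters are simplifying assumptions:

```python
# Minimal sketch of the size-doubling idea (illustrative only).
import subprocess
import time

def find_problem_size(benchmark, min_runtime_s, start_size=64, max_size=2**30):
    size = start_size
    while size < max_size:
        start = time.perf_counter()
        subprocess.run([benchmark, f"--size={size}"], check=True)
        # Stop once a single run takes at least the profile-defined minimum time.
        if time.perf_counter() - start >= min_runtime_s:
            break
        size *= 2
    return size
```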

> In the same vein, having the ability to compute custom metrics would also be great. For example, it would be cool if this benchmark could actually print its achieved throughput, instead of having to manually compute it.

Yes, I agree. The main challenge is that the benchmarks are individual applications that don't know which results the other benchmarks will emit to the CSV file. If the benchmarks emit different columns, correct formatting is impossible. A solution could be to add a standard field that is present in the result of every benchmark, say metric, which benchmarks can choose to implement or report as N/A, similar to verification.
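
One way this could look, sketched here with made-up column names (the real sycl-bench output format may differ; the function and column names are assumptions for illustration only):

```python
# Hypothetical illustration of a fixed column set shared by all benchmarks,
# where benchmarks without a custom metric simply report "N/A".
import csv
import sys

COLUMNS = ["benchmark", "problem-size", "run-time-mean", "metric-name", "metric-value"]

def emit_row(name, size, runtime_s, metric_name="N/A", metric_value="N/A"):
    writer = csv.DictWriter(sys.stdout, fieldnames=COLUMNS)
    writer.writerow({
        "benchmark": name,
        "problem-size": size,
        "run-time-mean": runtime_s,
        "metric-name": metric_name,
        "metric-value": metric_value,
    })

# The DRAM benchmark could fill in its throughput, others fall back to N/A:
#   emit_row("dram", 2**28, 0.42, "throughput-GiB/s", 119.3)
#   emit_row("matmul", 1024, 0.21)
```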


psalz commented on May 27, 2024

> Yes, I agree. The main challenge is that the benchmarks are individual applications that don't know which results the other benchmarks will emit to the CSV file. If the benchmarks emit different columns, correct formatting is impossible. A solution could be to add a standard field that is present in the result of every benchmark, say metric, which benchmarks can choose to implement or report as N/A, similar to verification.

That could work, yes. It is of course a bit limiting and not very descriptive. Hypothetically, if the various benchmarks were to emit additional metrics/columns -- how useful is it really to have all results for all benchmarks in a single file? Couldn't we just generate separate CSVs for each executable?

Additionally, to compute e.g. throughput, the benchmark would need access to the timing results, which, given the current plugin/hook architecture, seems a bit messy. Maybe timing should be considered a "core" functionality of the framework instead, much like verification?


illuhad commented on May 27, 2024

> That could work, yes. It is of course a bit limiting and not very descriptive. Hypothetically, if the various benchmarks were to emit additional metrics/columns -- how useful is it really to have all results for all benchmarks in a single file? Couldn't we just generate separate CSVs for each executable?

The run-suite script invokes each benchmark a dozen times with different combinations of problem size and local size, so we would end up with a lot of very small CSV files, each with only one or a couple of rows. For any meaningful analysis, those files would need to be aggregated into one file again anyway. It could be done, but I don't think it would be pretty.

> Additionally, to compute e.g. throughput, the benchmark would need access to the timing results, which, given the current plugin/hook architecture, seems a bit messy. Maybe timing should be considered a "core" functionality of the framework instead, much like verification?

Hm, good point. There may be more metrics than just bandwidth that require access to one or multiple measurements (e.g. flops/s/watt, once we can measure power consumption). This makes me think that metrics which are transformations of actual measurements might be better implemented in a post-processing step (e.g. in run-suite), for maximum flexibility?
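
As a toy example of that post-processing idea: derive metrics from measured columns in a later step rather than inside each benchmark. The column names ("flop-count", "run-time-mean", "power-mean") are invented; nothing here is actual run-suite code:

```python
# Sketch only: compute derived metrics from already-recorded measurement columns.
def derived_metrics(row):
    flops = float(row["flop-count"])
    seconds = float(row["run-time-mean"])
    watts = float(row["power-mean"])
    return {
        "gflops-per-s": flops / seconds / 1e9,
        "gflops-per-s-per-watt": flops / seconds / watts / 1e9,
    }
```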


sohansharma commented on May 27, 2024

> The run-suite script invokes each benchmark a dozen times with different combinations of problem size and local size, so we would end up with a lot of very small CSV files, each with only one or a couple of rows. For any meaningful analysis, those files would need to be aggregated into one file again anyway. It could be done, but I don't think it would be pretty.

Do we need so many different configurations? If we can restrict their number, keeping everything in one CSV file would not be that messy. For example, we do not need to vary both the problem size and the local size for every experiment.


psalz commented on May 27, 2024

> Hm, good point. There may be more metrics than just bandwidth that require access to one or multiple measurements (e.g. flops/s/watt, once we can measure power consumption). This makes me think that metrics which are transformations of actual measurements might be better implemented in a post-processing step (e.g. in run-suite), for maximum flexibility?

The problem I see with that approach is that, e.g. in this case, for DRAM bandwidth we also need to know the number of bytes copied. If we go with the mapping from --size to 1D/2D/3D ranges as you proposed on my PR, the number of bytes might depend on that parameter, on the dimensionality of the kernel, and on the data type being copied. So either we have the benchmark log this information somewhere (coming back to the issue of per-benchmark metrics), or we hardcode this somehow into run-suite, which seems like a bad idea to me.
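
To make that dependence concrete, a rough sketch of the arithmetic. The read-plus-write factor of two assumes a simple copy kernel, and the function names are illustrative, not part of sycl-bench:

```python
# Rough sketch: the bytes moved depend on --size, the kernel's dimensionality
# and the element type, which only the benchmark itself knows for certain.
def bytes_copied(size, dims, dtype_bytes, accesses_per_element=2):
    # assuming a copy kernel that reads and writes each element once
    return (size ** dims) * dtype_bytes * accesses_per_element

def bandwidth_gib_per_s(size, dims, dtype_bytes, runtime_s):
    return bytes_copied(size, dims, dtype_bytes) / runtime_s / 2**30

# e.g. a 2D float copy with --size=16384 that takes 50 ms:
#   bandwidth_gib_per_s(16384, 2, 4, 0.05)  ->  40.0 GiB/s
```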


psalz commented on May 27, 2024

> The run-suite script invokes each benchmark a dozen times with different combinations of problem size and local size, so we would end up with a lot of very small CSV files, each with only one or a couple of rows. For any meaningful analysis, those files would need to be aggregated into one file again anyway. It could be done, but I don't think it would be pretty.

Since the script knows which runs belong to the same benchmark, it could just concatenate those outputs into a single CSV. Then we'd have one file per benchmark - that sounds manageable to me.
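
A minimal sketch of that concatenation step, assuming each run writes its own small CSV with a common header row (the file layout, glob pattern and names are assumptions, not the actual run-suite behaviour):

```python
# Minimal sketch: merge the per-run CSVs of one benchmark into a single file.
import csv
import glob

def merge_runs(pattern, out_path):
    header, rows = None, []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            header = next(reader)
            rows.extend(reader)
    if header is None:
        return  # no runs found for this benchmark
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

# e.g. merge_runs("results/dram-*.csv", "results/dram.csv")
```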


psalz commented on May 27, 2024

Closing this as we have merged the simpler version of the DRAM benchmark in #26 and added throughput metrics in #30. Thanks everyone!

