
chrome 124: weird results about js-framework-benchmark HOT 23 CLOSED

krausest commented on August 16, 2024

chrome 124: weird results

from js-framework-benchmark.

Comments (23)

krausest commented on August 16, 2024

Just when I had finished planning what I'd do after quitting this project, I tried one last thing 😄
What if I add a one-second sleep before each interaction?
It turns out that this results in the following charts for three independent puppeteer runs:

Those charts look much closer to the manual testing results (and, old as I'm feeling, a second between clicks seems realistic).

Now I have to find out:

How does the sleep affect the charts for the other test drivers?
Where do we need the sleep calls? (hopefully not at many places)
What's the minimum duration for the sleep?
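A minimal TypeScript sketch of the "sleep before each interaction" idea, assuming each interaction can be expressed as an async function. `sleep` and `runWithDelay` are hypothetical names, not taken from the benchmark's driver code:

```typescript
// Hypothetical helper: a setTimeout wrapped in a promise.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical wrapper: pauses `delayMs` before every interaction, then runs the interactions in order.
async function runWithDelay(
  delayMs: number,
  interactions: Array<() => Promise<void>>,
): Promise<void> {
  for (const interact of interactions) {
    if (delayMs > 0) {
      await sleep(delayMs); // the "human" pause between clicks
    }
    await interact();
  }
}
```

With puppeteer an interaction might be `() => page.click("#run")` (the selector is an assumption here). A `delayMs` of 0 reproduces the original no-sleep run, which makes it easy to sweep over several sleep durations.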

krausest commented on August 16, 2024

The disaster is complete

On the right are the Chrome 123 results. The four charts on the left are Chrome 124 with varying sleep durations after every click during initBenchmark and before and after forceGC. Chrome 123 used no sleeps and should therefore correspond to the "no sleep" chart.
I've drawn a baseline since the axis scale differs between the charts.
A 1-second sleep produces the chart most similar to Chrome 123, but it is slower than the others and slower than Chrome 123.
Basically, I'd say the fastest results should be considered the most correct (Vmax here means t_min, no matter how it's achieved).
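Taking "Vmax means t_min" literally, the value reported for a benchmark would be the fastest run across all driver configurations. A small sketch of that rule; the configuration keys and the `fastest` helper are hypothetical:

```typescript
// Durations (msecs) of one benchmark, keyed by a hypothetical driver configuration such as "no sleep".
type DurationsByConfig = Record<string, number[]>;

// Fastest run per configuration, plus the overall minimum (t_min) across all of them.
// `overall` is Infinity if there are no runs at all.
function fastest(durations: DurationsByConfig): { perConfig: Record<string, number>; overall: number } {
  const perConfig: Record<string, number> = {};
  let overall = Infinity;
  for (const config of Object.keys(durations)) {
    const values = durations[config];
    if (values.length === 0) continue; // skip configurations without runs
    const min = Math.min(...values);
    perConfig[config] = min;
    overall = Math.min(overall, min);
  }
  return { perConfig, overall };
}
```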

I'm not sure how to get out of this. Currently I see no way to measure with any confidence on Chrome 124.

Thanks to https://github.com/GoogleChromeLabs/chrome-for-testing#json-api-endpoints I was able to download chrome 123 and check if it's really chrome 124 that causes this effect.
So here's the same chart with three different delays for chrome 123:

This looks fine: the order is preserved regardless of the sleep duration. The 1-second sleep is a bit faster than no sleep, though 100 msecs for create many rows is a little odd, but nowhere near what Chrome 124 does. It doesn't help us with Chrome 124, though.

krausest commented on August 16, 2024

Chrome issue report is here: https://issues.chromium.org/issues/337900449

antonmak1 commented on August 16, 2024

@krausest Maybe this has something to do with the HTML and DOM changes in version 124? Specifically, "Document render-blocking" or "setHTMLUnsafe and parseHTMLUnsafe" or something like that?

titoBouzout commented on August 16, 2024

May I suggest trying Chrome 126 (dev channel) to see if the problem persists? I'm not sure it's worth it, but you could also try the beta and canary channels. Maybe they have already fixed something. What about experiments? Last time I read about them, you couldn't turn them off.

krausest commented on August 16, 2024

@syduki I don't agree at all.
I didn't expect chrome to perform equally across versions (and that wasn't the case in the past).
But when performance changes between versions there should be a reason we understand (like new layout engine etc.).

Performance within one Chrome version must be consistent, i.e. one must be able to validate results from automated testing with manual tests. This is not the case for Chrome 124. The automated test reports that ivi is fastest, but manual testing shows no evidence that it really is (and manual testing indeed yields pretty much the same results as Chrome 123). And of course adding some delays in the warm-up and before running the actual benchmark must not influence the measured duration.

krausest commented on August 16, 2024

If I add a RAF (requestAnimationFrame) call to vanillajs, I get results pretty close to imba, and I get a second commit event.

Maybe it was the wrong decision to measure up to the first commit and ignore the second commit in this case.
It seems like that second commit is a consequence of the RAF call (and not something spurious) and should be included in the measurement.

krausest commented on August 16, 2024

Ran the benchmark for the keyed frameworks again last night. At least results are very consistent ☹️
Here's a screenshot:

or here if you want to look at the table.
It'll take some time for further analysis.

krausest commented on August 16, 2024

Here's a look at create rows:
A rather normal vanillajs trace:

The fastest ivi trace:

This wasn't the case with chrome 123:

With manual testing I haven't managed to get faster results for ivi; it looks more like Chrome 123.

Using puppeteer as the test runner reports 34.1 msecs for ivi and 37.3 msecs for vanillajs.
Webdrivercdp reports 33.9 msecs for ivi and 33.8 for vanillajs, which is more consistent with manual testing.

Other tests (so far it was the same browser window, but a new tab per run):
puppeteer with a new browser window per run: 33.4 for ivi and 36.2 for vanillajs.
puppeteer running the create bench in one tab: 33.6 for ivi and 34.3 for vanillajs

Manual testing gives me 38.5 msecs for ivi and 36.8 for vanillajs. Something is wrong here...
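Using only the numbers quoted above, the inversion can be made explicit: compute ivi minus vanillajs for each setup and check whether its sign matches manual testing. A quick TypeScript sketch:

```typescript
// ivi and vanillajs durations (msecs) for "create rows", as quoted in the comment above.
interface Pair { ivi: number; vanillajs: number }

const reported: Record<string, Pair> = {
  "puppeteer": { ivi: 34.1, vanillajs: 37.3 },
  "webdrivercdp": { ivi: 33.9, vanillajs: 33.8 },
  "puppeteer, new browser window per run": { ivi: 33.4, vanillajs: 36.2 },
  "puppeteer, create bench in one tab": { ivi: 33.6, vanillajs: 34.3 },
};
const manual: Pair = { ivi: 38.5, vanillajs: 36.8 };

// Positive: ivi is slower than vanillajs; negative: ivi is faster.
function delta(p: Pair): number {
  return p.ivi - p.vanillajs;
}

// Setups whose ivi/vanillajs ordering agrees with manual testing.
function agreeWithManual(): string[] {
  const expected = Math.sign(delta(manual));
  return Object.keys(reported).filter((setup) => Math.sign(delta(reported[setup])) === expected);
}
```

Only webdrivercdp matches the manual ordering (ivi slower than vanillajs), and only by about 0.1 msecs; every puppeteer variant reports ivi as faster.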

krausest commented on August 16, 2024

I took a closer look at the create 1k rows issue - but I have no good news:
I ran a manual test for create 1k rows with 8 runs for ivi, solid, doohtml and vanillajs. This is what a boxplot looks like for this manual test. I'd consider that to be the ground truth:

Please note that there's a suspicious outlier for vanillajs with 34.29 msecs. I kept a screenshot of that trace.
There's a quite clear ordering between the other three: doohtml < ivi < solid
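The boxplot reasoning above (ground truth from 8 manual runs, one suspicious outlier) can be sketched as follows. This is a hypothetical helper, not part of the benchmark; it assumes linear-interpolation quartiles and the common 1.5 * IQR whisker rule, which may differ from what the charting tool actually uses:

```typescript
// Quantile with linear interpolation between closest ranks (the default in many tools, e.g. NumPy).
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

interface BoxStats { q1: number; median: number; q3: number; outliers: number[] }

// Box-plot summary; values outside [q1 - 1.5 * IQR, q3 + 1.5 * IQR] are flagged as outliers.
function boxStats(values: number[]): BoxStats {
  if (values.length === 0) throw new Error("no values");
  const sorted = values.slice().sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const median = quantile(sorted, 0.5);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const outliers = sorted.filter((v) => v < q1 - 1.5 * iqr || v > q3 + 1.5 * iqr);
  return { q1, median, q3, outliers };
}
```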

What's interesting is that the chart above is remarkably similar to the puppeteer chrome 123 results:

However, the results for all test drivers are disappointing. I performed three runs for each of them:

Puppeteer: "clearly ivi is fastest"

At least results are repeatable though not similar to the manual results.

Playwright: "solid is fastest"

Webdrivercdp: "where is my mind?"

The first run fits "doo < ivi < solidjs", though vanillajs is closer to the outlier from the manual testing above, but runs two and three just look random.

Currently I'm out of ideas. I don't see how I could publish chrome 124 results soon.

antonmak1 commented on August 16, 2024

@krausest Perhaps, if the old code somehow behaves incorrectly, it first needs to be tested alongside new code written specifically for version 124. Clearly something nonsensical is coming out: all the results now start at 1.05, and frameworks and libraries that used to stay roughly in the same place now jump 10 positions ahead, then 10 positions behind. How can 7+ releases be fine and then everything breaks with the new one? This means there is definitely a new error somewhere in the code that either wasn't an error before or wasn't considered one.

mksunny1 commented on August 16, 2024

I'll give this some thought and see if I can come up with something, though I'm not too familiar with this yet.

krausest commented on August 16, 2024

Not much code was updated: 30c247a
I also tried updating all the dependencies, but it didn't change anything, so I rolled them back. So I really think the factor that makes the difference is Chrome 124.

I checked whether Windows reports better results, but it doesn't look like it does.

krausest commented on August 16, 2024

1: I don't see that the delay helps the other test drivers.
playwright with a delay of 1 second:

Looks different from the chart without delays, but not close to the manual testing result.

Nor does webdrivercdp:

So we'll stick with puppeteer; the other drivers do not report values closer to manual testing.

krausest commented on August 16, 2024

Chrome 124 is bizarre. I first tried adding sleeps between all interactions and then narrowing down where they are actually needed.

I got this chart with puppeteer:

Look how bad those numbers are: > 50 msecs. That's ridiculous.
Here's a trace for one such bad run:

And here's the trace file:
doohtml-keyed_01_run1k_5.json
The trace looks right, the computation of the duration is OK, there's no RAF, no GC, nothing suspicious. But scripting 5 ms and rendering 48 ms is just incredibly bad.

It turns out that one sleep call causes that bad performance:
Before runBenchmark we call forceGC. If we sleep after that we get the bad performance:

Uncommenting the line (which just sleeps for a second via setTimeout in a promise) causes the bad performance.
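The sequence described above (forceGC, then an optional one-second sleep, then runBenchmark) can be sketched as follows. `forceGC` and `runBenchmark` are passed in as hypothetical async callbacks, since the real driver code is not shown in this thread:

```typescript
// Hypothetical step type; in the real driver these steps would talk to the browser.
type Step = () => Promise<void>;

// setTimeout wrapped in a promise, like the commented-out line described above.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// forceGC, then an optional sleep, then the measured benchmark.
// sleepAfterGcMs = 0 corresponds to the sleep line commented out, 1000 to it being active.
async function measureAfterGc(forceGC: Step, runBenchmark: Step, sleepAfterGcMs: number): Promise<void> {
  await forceGC();
  if (sleepAfterGcMs > 0) {
    await sleep(sleepAfterGcMs);
  }
  await runBenchmark();
}
```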
Without that sleep:

Just 37 msecs.
doohtml-keyed_01_run1k_2.json
The script duration is now 2 msecs and the rendering duration 37 msecs (!).

Found out something: this can be resolved by adding the trace category "disabled-by-default-v8.cpu_profiler". With it enabled, the results were below 40 msecs even with the sleep after forceGC().
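One way to make the extra trace category switchable is sketched below. The base category list is a hypothetical placeholder, not the benchmark's actual configuration; only `disabled-by-default-v8.cpu_profiler` comes from the comment above:

```typescript
// Hypothetical base list; the benchmark's real category list may differ.
const BASE_CATEGORIES: readonly string[] = ["devtools.timeline", "disabled-by-default-devtools.timeline"];
const CPU_PROFILER_CATEGORY = "disabled-by-default-v8.cpu_profiler";

// Returns a fresh category list, optionally with the CPU profiler category appended (no duplicates).
function traceCategories(withCpuProfiler: boolean): string[] {
  const categories = BASE_CATEGORIES.slice();
  if (withCpuProfiler && categories.indexOf(CPU_PROFILER_CATEGORY) === -1) {
    categories.push(CPU_PROFILER_CATEGORY);
  }
  return categories;
}

// With puppeteer this could be passed as:
//   await page.tracing.start({ path: "trace.json", categories: traceCategories(true) });
```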

krausest commented on August 16, 2024

This comment serves just as a summary and will be linked from the chrome issue report. It contains no new information to the above.

With Chrome 124 I got strange results for the benchmark, as can be seen in the chart below:

The results for chrome 123 on the right should be seen as a baseline. The chart "no sleep" is using the same code as chrome 123 and should be identical, but it is far off!
I also performed the same benchmark manually with Chrome 124, clicking and extracting the duration from the timeline in Chrome, and it gives results close to Chrome 123:

I tried delaying the actions from the benchmark driver, and indeed adding a one-second sleep gives an order similar to Chrome 123, but at a different speed. Varying the sleep duration makes the ranking of the frameworks arbitrary: ivi can be fastest, slowest, or third. There's a chart above that shows that the Chrome 123 results are stable across those sleep durations.

The duration for the benchmark is measured via traces from the click to the paint commit event:

For Chrome 124 with a 500 msecs delay we get something like 38.81 msecs for one benchmark run.
Without sleep it takes only 34.78 msecs.
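The measurement itself (from the click to the end of the paint commit) can be sketched like this. The event names `EventDispatch` (with `data.type === "click"`) and `Commit` are assumptions about the trace contents; `ts` and `dur` are in microseconds as in Chrome's trace event format. The hypothetical `policy` switch chooses between the first and the last commit after the click, which matters once an extra RAF-triggered commit shows up:

```typescript
// Minimal subset of a Chrome trace event.
interface TraceEvent {
  name: string;
  ts: number;   // start, microseconds
  dur?: number; // duration, microseconds
  args?: { data?: { type?: string } };
}

type CommitPolicy = "first" | "last";

// Msecs from the start of the click to the end of the first (or last) commit after it.
function clickToCommitMs(events: TraceEvent[], policy: CommitPolicy): number | undefined {
  const click = events.find((e) => e.name === "EventDispatch" && e.args?.data?.type === "click");
  if (click === undefined) return undefined;
  const clickTs = click.ts;
  const commits = events
    .filter((e) => e.name === "Commit" && e.ts >= clickTs)
    .sort((a, b) => a.ts - b.ts);
  if (commits.length === 0) return undefined;
  const commit = policy === "first" ? commits[0] : commits[commits.length - 1];
  const commitEnd = commit.ts + (commit.dur ?? 0);
  return (commitEnd - clickTs) / 1000; // microseconds -> msecs
}
```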

Please note that the difference comes from the rendering duration, though rendering shouldn't behave differently when some sleep delays are added before the click event.

krausest commented on August 16, 2024

@titoBouzout I tried that before but didn't keep the result. Anyway, I repeated the run. This is what it looks like without sleep:

Just as bad as chrome 124
@localvoid Any idea why ivi is most impacted?

localvoid commented on August 16, 2024

@krausest I am not sure about other libraries, but I think the only difference (in DOM operations) between ivi and vanillajs is that when the table is cleared, ivi removes rows by replacing the <tbody> DOM node with a new one instead of removing them with textContent = ""
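The two ways of clearing the table can be sketched as below. These are hypothetical helpers, not code from ivi or the vanillajs implementation; the `Replaceable` interface is an assumption added so the sketch can be exercised without a browser (real DOM elements satisfy it via `replaceWith`):

```typescript
// vanillajs-style clear: drop all rows at once by resetting textContent.
function clearWithTextContent(tbody: { textContent: string | null }): void {
  tbody.textContent = "";
}

// Minimal structural type for anything that can be swapped out of the tree.
interface Replaceable<T> {
  replaceWith(...nodes: T[]): void;
}

// ivi-style clear (as described above): swap the whole <tbody> for a freshly created, empty one.
// In a browser, createEmptyTbody would be () => document.createElement("tbody").
function clearByReplacingTbody<T extends Replaceable<T>>(tbody: T, createEmptyTbody: () => T): T {
  const fresh = createEmptyTbody();
  tbody.replaceWith(fresh);
  return fresh; // the old <tbody> is detached; later inserts must target the new one
}
```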

syduki commented on August 16, 2024

Very exciting "issue" indeed 😄. But seriously, this was bound to happen sooner or later, and I doubt it is a Chrome 124 issue. It is naive, to say the least, to expect benchmark consistency between browser releases when there is no separation between "Olympics" and "Paralympics" while they run together in the same marathon; the same goes for benchmarking two different kinds of "manufacturing processes", like "stamping" and "handcrafting", which is exactly the case when some frameworks use innerHTML to build the DOM tree and others use createElement.

To me, the results look pretty predictable, considering the recent effort put into reviving HTML, and thus into HTML optimizations. It is logical that innerHTML should be faster than createElement, as it once was. As for the "sleep" case, it can be explained by HTML parser cache optimization: without a sleep the run benefits from a cache hit, whereas a sleep causes a cache miss.

It seems to me that we will see the same "issue" in the future versions of Chrome, so my suggestion is to publish the results as they are, for the sake of history.

syduki commented on August 16, 2024

@krausest Well, I can only admit that we have very different perceptions of benchmark consistency. I consider consistency relevant only in a narrow field, for a specific framework/scenario/environment, not something that should be universally comparable across any arbitrary mix of those.
Optimizations for different methods of layout creation are obviously different, so I don't expect inconsistencies only from major browser changes, i.e. breaking changes that affect all frameworks (such as the layout engine), while disregarding changes/optimizations in a specific subroutine that may be neither a breaking change nor worth a public report, but may affect only a subset of frameworks.
As I already alluded, I think this issue is somehow related to the latest changes/optimizations in markup handling. It could be that the recent addition of setHTMLUnsafe somehow affected innerHTML handling, as they share the same underlying code. I didn't dig deeper into that code, but this source code makes me believe that such optimizations are now used more aggressively.
Regarding the manual/automated tests, I agree that they should exhibit the same behavioral results, but I don't see why one should expect the same performance results/consistency when these are actually different test environments.

krausest commented on August 16, 2024

I'm really stuck because I have no confidence in the chrome 124+ results, but I have a proposal to resolve that:
If we manage to create a new vanillajs implementation that performs create 1,000 rows, replace rows, and create 10,000 rows as fast as (or faster than) ivi in the no-delay scenario above, I'll regain confidence. If we fail, I claim that Chrome reports incorrect values.
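The proposed criterion is simple enough to state as code. A minimal sketch; comparing one number per benchmark (for example a mean or median over several runs) is an assumption, since the proposal doesn't say which statistic is compared:

```typescript
// The three benchmarks named in the proposal.
type BenchmarkName = "create 1,000 rows" | "replace rows" | "create 10,000 rows";
type Durations = Record<BenchmarkName, number>; // msecs, no-delay scenario

// True if the new vanillajs is as fast as (or faster than) ivi on all three benchmarks.
function restoresConfidence(vanillajs: Durations, ivi: Durations): boolean {
  const benchmarks: BenchmarkName[] = ["create 1,000 rows", "replace rows", "create 10,000 rows"];
  return benchmarks.every((b) => vanillajs[b] <= ivi[b]);
}
```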
@syduki @trueadm @localvoid or anyone else: Any chance submitting a new vanillajs version?
(One more rule: Just cloning ivi and using that as a lib wouldn't count 😄 )
I opened an issue for that: #1661

krausest commented on August 16, 2024

Thanks to you (especially @robbiespeed) we found something: The GC didn't work properly for ivi. Tweaking the GC calls made results much more as expected.

I uploaded new results for keyed implementations here: https://krausest.github.io/js-framework-benchmark/2024/table_chrome_124_preview2.html

Looks much more reasonable. Even the remove row benchmark looks better, and imba is back to normal.

There's still one decision open:
Should we measure RAF frameworks like bobril as 22.64 msecs (from click to end of 2nd commit)

or as 16.04 msecs (from click to end of first commit)

So far we take the 2nd approach, but if the 2nd commit happens for all RAF frameworks, switching to alternative 1 seems more logical to me.
It seems there's no second commit for frameworks that don't use RAF.

I'll check both assumptions and report back.
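Once commit counts per framework are collected from the traces, both assumptions can be checked mechanically. A hypothetical sketch; how `usesRaf` is determined for each implementation is left open:

```typescript
interface Observation {
  framework: string;
  usesRaf: boolean;    // whether the implementation schedules its update via requestAnimationFrame
  commitCount: number; // commit events seen after the click in the trace
}

interface AssumptionViolations {
  rafWithoutSecondCommit: string[]; // violates "all RAF frameworks show a 2nd commit"
  nonRafWithSecondCommit: string[]; // violates "non-RAF frameworks show no 2nd commit"
}

// Both lists empty means both assumptions hold for the observed frameworks.
function checkCommitAssumptions(observations: Observation[]): AssumptionViolations {
  return {
    rafWithoutSecondCommit: observations.filter((o) => o.usesRaf && o.commitCount < 2).map((o) => o.framework),
    nonRafWithSecondCommit: observations.filter((o) => !o.usesRaf && o.commitCount >= 2).map((o) => o.framework),
  };
}
```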

krausest commented on August 16, 2024

One last check. It's really crazy. Forcing GC should help most frameworks, since it prevents GC during execution of the framework. Except that it doesn't help ivi...

Duration in msecs

Framework	no GC	7x window.gc()	Full GC
vanillajs	41.3	38	38
ivi	37.6	35.2	39.2

Full GC = window.gc({type:'major',execution:'sync',flavor:'last-resort'}), which has the same effect as a loop of HeapProfiler.collectGarbage and window.gc()
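The three GC variants in the table can be sketched as one helper. This is a hypothetical sketch: `gc` is injected rather than read from `window`, it assumes the page was started with `--js-flags=--expose-gc`, and the option names and values are the ones quoted above rather than a documented, stable API:

```typescript
// Option shape assumed from the window.gc(...) call quoted above.
interface GcOptions {
  type?: "major" | "minor";
  execution?: "sync" | "async";
  flavor?: "regular" | "last-resort";
}
type GcFunction = (options?: GcOptions) => unknown;

// "none" = no GC column, "repeat7" = 7x window.gc(), "full" = Full GC.
type GcMode = "none" | "repeat7" | "full";

// Triggers garbage collection according to `mode`; returns how many times gc was called.
function forceGc(gc: GcFunction, mode: GcMode): number {
  switch (mode) {
    case "none":
      return 0;
    case "repeat7":
      for (let i = 0; i < 7; i++) gc();
      return 7;
    case "full":
      gc({ type: "major", execution: "sync", flavor: "last-resort" });
      return 1;
    default:
      throw new Error(`unknown GC mode: ${String(mode)}`);
  }
}
```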

Memory after create 1k

Framework	no GC	7x window.gc()	Full GC
vanillajs	1.78	1.81	1.77
ivi	2.09	2.09	2.07

Anyway, I'm closing this issue now, since the Chrome 124 anomalies disappeared with the new Full GC.
