
Comments (9)

welcome commented on August 20, 2024

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! 🤗

If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively.
You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! 👋

Welcome to the Jupyter community! 🎉


fcollonval commented on August 20, 2024

Hey @suganya-sk

The goal of the snapshots is actually not to check for consistency between versions but between runs with the same version. But I definitely understand that it creates some confusion.

If you look at the GitHub workflow we are using, you will see that we run the tests once to update the snapshots before running them again to compute statistics:

```sh
# Update test screenshots
BENCHMARK_NUMBER_SAMPLES=1 PW_VIDEO=1 jlpm run test --project ${{ inputs.reference_project }} -u
```

So I would encourage you to do the same.
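Locally, that boils down to two passes. Here is a minimal sketch, assuming a placeholder project name (the `--project` value must match your own Playwright configuration):

```sh
# Placeholder project name - an assumption, match it to your playwright.config.js
REFERENCE_PROJECT=jlab-reference

# Pass 1: refresh the snapshots (a single sample keeps this pass fast)
BENCHMARK_NUMBER_SAMPLES=1 PW_VIDEO=1 jlpm run test --project "$REFERENCE_PROJECT" -u

# Pass 2: run the benchmark again to compute the statistics
jlpm run test --project "$REFERENCE_PROJECT"
```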

Additional comment: if the versions you are comparing are plain open-source JupyterLab, you can directly fork this repository and execute the action to compute the benchmarks; see https://jupyterlab-benchmarks.readthedocs.io/en/latest/benchmarks/ci.html. In particular, the challenger repo will be jupyterlab/jupyterlab and the git references will be the tags you want to compare.
For example, I started a benchmark between 3.4.7 and 4.0.0a29; see that job.
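If you take the fork route, the workflow can be dispatched from the Actions tab or, hypothetically, from the CLI. Purely as a sketch - the workflow file name and input names below are assumptions, so verify them against `.github/workflows` in the repository:

```sh
# Hypothetical dispatch of the benchmark workflow on a fork.
# The workflow file name and the input names are assumptions;
# check .github/workflows in the repository for the real ones.
gh workflow run benchmark.yml \
  -f challenger=jupyterlab/jupyterlab \
  -f reference_ref=v3.4.7 \
  -f challenger_ref=v4.0.0a29
```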


fcollonval commented on August 20, 2024

For completeness, these are the results of the above-mentioned job.

The switch action can appear faster in 3.4.7 because it used a different approach to switch tabs. That approach was actually reverted because it broke Chromium-based browsers if one of the tabs contained an iframe (like external documentation or a PDF viewer).

Benchmark report

The execution times (in milliseconds) are grouped by test file, test type and browser.
For each case, the following values are computed: min <- [1st quartile - median - 3rd quartile] -> max.

The mean relative comparison is computed with 95% confidence.

Results table
| Test file | large_code_100_notebook | large_md_100_notebook | longOutput - A single output with 100x100 divs |
| --- | --- | --- | --- |
| **open** (chromium) | | | |
| v3.4.7 | 5012 <- [5371 - 6002 - 6342] -> 7516 | 1740 <- [1847 - 1882 - 1944] -> 2441 | 2104 <- [2189 - 2221 - 2279] -> 2702 |
| expected | 588 <- [632 - 658 - 688] -> 947 | 1116 <- [1191 - 1239 - 1279] -> 1797 | 1471 <- [1520 - 1550 - 1588] -> 2145 |
| Mean relative change | 779.7% ± 25.3% | 51.0% ± 3.9% | 43.1% ± 2.4% |
| **switch-from-copy** (chromium) | | | |
| v3.4.7 | 57 <- [164 - 221 - 361] -> 878 | 47 <- [200 - 241 - 300] -> 1041 | 61 <- [78 - 125 - 271] -> 797 |
| expected | 161 <- [198 - 221 - 359] -> 650 | 250 <- [293 - 316 - 435] -> 856 | 849 <- [905 - 968 - 1085] -> 1456 |
| Mean relative change | 1.5% ± 8.9% | -29.3% ± 5.0% | -80.8% ± 1.8% |
| **switch-to-copy** (chromium) | | | |
| v3.4.7 | 509 <- [538 - 554 - 573] -> 662 | 506 <- [525 - 545 - 555] -> 687 | 507 <- [529 - 541 - 552] -> 616 |
| expected | 506 <- [511 - 516 - 523] -> 652 | 505 <- [510 - 516 - 523] -> 645 | 1019 <- [1121 - 1175 - 1245] -> 1447 |
| Mean relative change | 7.0% ± 0.7% | 4.2% ± 0.6% | -54.2% ± 0.4% |
| **switch-from-txt** (chromium) | | | |
| v3.4.7 | 54 <- [80 - 147 - 234] -> 445 | 47 <- [76 - 139 - 225] -> 298 | 63 <- [76 - 85 - 123] -> 272 |
| expected | 124 <- [154 - 164 - 177] -> 258 | 128 <- [175 - 184 - 194] -> 263 | 180 <- [201 - 211 - 228] -> 316 |
| Mean relative change | -2.0% ± 5.9% | -16.6% ± 4.7% | -47.7% ± 2.8% |
| **switch-to-txt** (chromium) | | | |
| v3.4.7 | 49 <- [68 - 104 - 120] -> 326 | 45 <- [66 - 77 - 205] -> 377 | 63 <- [79 - 112 - 171] -> 494 |
| expected | 136 <- [162 - 174 - 190] -> 272 | 295 <- [327 - 339 - 353] -> 479 | 811 <- [873 - 907 - 1021] -> 1399 |
| Mean relative change | -35.4% ± 4.2% | -63.3% ± 2.6% | -85.9% ± 0.9% |
| **close** (chromium) | | | |
| v3.4.7 | 757 <- [822 - 900 - 945] -> 1072 | 561 <- [616 - 635 - 663] -> 786 | 877 <- [927 - 940 - 962] -> 1022 |
| expected | 245 <- [274 - 291 - 307] -> 458 | 288 <- [326 - 343 - 362] -> 481 | 477 <- [526 - 545 - 565] -> 628 |
| Mean relative change | 197.3% ± 9.1% | 83.9% ± 4.9% | 73.1% ± 2.2% |

Changes are computed with expected as reference.


v4.0.0a29 = 8e08e4252a79a6c816535b9e80759adff984cad7 | v3.4.7 = 8fea0391c8d9d72f4f2aabb7af3b3064be4fa52e
Go to action log
Changelog covered

❗ Test metadata have changed
```diff
--- /dev/fd/63	2022-09-29 18:18:00.202102049 +0000
+++ /dev/fd/62	2022-09-29 18:18:00.202102049 +0000
@@ -1,7 +1,7 @@
 {
   "benchmark": {
     "BENCHMARK_OUTPUTFILE": "lab-benchmark.json",
-    "BENCHMARK_REFERENCE": "v3.4.7"
+    "BENCHMARK_REFERENCE": "actual"
   },
   "browsers": {
     "chromium": "106.0.5249.30"
```


suganya-sk commented on August 20, 2024

Hello, thank you for the thorough response here.

> If you look at the GitHub workflow we are using, you will see that we run the tests once to update the snapshots before running them again to compute statistics:

Let me try this and update.


suganya-sk commented on August 20, 2024

Please correct me if I'm wrong here - I had understood `-u` to mean that it updates snapshots and also records this run's results as the expected values in `tests-out/lab-benchmark-expected.json`. If this is so, when I use `-u` before running tests on the challenger, would this not reset the values in `tests-out/lab-benchmark-expected.json` as well?


fcollonval commented on August 20, 2024

Indeed it will; on CI we restore a backup after the snapshot-update call and before the challenger tests:

```sh
# Update test screenshots
BENCHMARK_NUMBER_SAMPLES=1 PW_VIDEO=1 jlpm run test --project ${{ inputs.challenger_project }} -u
# Copy reference here otherwise it will use the value from the update screenshots
# command called just before
cp /tmp/lab-benchmark-expected.json ./tests-out
jlpm run test --project ${{ inputs.challenger_project }}
```
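Spelled out end to end for a local run, the sequence might look like the following. This is only a sketch, assuming the `tests-out/lab-benchmark-expected.json` path discussed above and placeholder project names:

```sh
REFERENCE_PROJECT=jlab-reference    # placeholder names - adjust both
CHALLENGER_PROJECT=jlab-challenger  # to your playwright.config.js projects

# Reference run: -u updates snapshots and records this run as the expected values
BENCHMARK_NUMBER_SAMPLES=1 PW_VIDEO=1 jlpm run test --project "$REFERENCE_PROJECT" -u
jlpm run test --project "$REFERENCE_PROJECT"

# Back up the reference's expected values before the challenger overwrites them
cp ./tests-out/lab-benchmark-expected.json /tmp/

# Challenger snapshot update (this resets the expected file) ...
BENCHMARK_NUMBER_SAMPLES=1 PW_VIDEO=1 jlpm run test --project "$CHALLENGER_PROJECT" -u

# ... so restore the reference values before the measured challenger run
cp /tmp/lab-benchmark-expected.json ./tests-out
jlpm run test --project "$CHALLENGER_PROJECT"
```

The only essential point is that the `cp` restoring the expected file happens between the challenger's `-u` pass and its measured pass, exactly as in the CI workflow above.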


suganya-sk commented on August 20, 2024

Ah, I missed that, sorry. Thank you, this makes it very clear.

Can we consider adding this to the instructions here, for comparing a reference and a challenger between which there are expected UI differences?

I would be happy to raise a PR, if that works.


fcollonval commented on August 20, 2024

Sure, PRs are welcome.


suganya-sk commented on August 20, 2024

Raised #125 to add documentation as discussed above.

