Hypothesis
Moving scripts from the end of the DOM (just before the closing </body> tag) into the <head>, and setting the defer attribute on them, will improve page performance.
Theory
On some GOV.UK pages, a WebPageTest waterfall chart shows a significant lag in the discovery of <script> tags, because they are located at the very bottom of the DOM and are the last elements to be streamed to the browser.
Given the lag seen in the chart above, moving the scripts into the head should bring their discovery forward in the waterfall. Assets that are discovered sooner can be requested and downloaded sooner.
Script Execution
Because the scripts are now in the <head>, the DOM has not been fully downloaded and parsed when they are discovered, so they cannot safely execute straight away. We have two options, the async or the defer attribute:
async - this attribute downloads the script asynchronously alongside the HTML, but executes it as soon as the download completes (pausing the parser while it runs, and in no guaranteed order relative to other scripts).
defer - this attribute also downloads the script in parallel, but execution only happens once the document has been fully parsed, in document order, after the domInteractive timing mark and just before DOMContentLoaded fires.
In theory the parallel downloading of the script and HTML should improve page performance.
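The difference between the two attributes can be made concrete with a small model. The sketch below is a simplification, not browser code: it assumes each script has a known download-finish time and that HTML parsing finishes at a single instant (parseDoneMs). The names ScriptTiming and executionOrder, and the script file names, are made up for illustration.

```typescript
type LoadMode = "async" | "defer";

interface ScriptTiming {
  name: string;
  mode: LoadMode;
  downloadedAtMs: number; // when the download finishes (assumed known)
}

interface Execution {
  name: string;
  atMs: number;
}

// Simplified model: async scripts run as soon as they finish downloading;
// defer scripts wait for parsing to finish and then run in document order.
function executionOrder(scripts: ScriptTiming[], parseDoneMs: number): Execution[] {
  const runs: Execution[] = [];

  for (const s of scripts) {
    if (s.mode === "async") {
      runs.push({ name: s.name, atMs: s.downloadedAtMs });
    }
  }

  // A deferred script cannot run before parsing is done, before its own
  // download is done, or before the deferred script ahead of it has run.
  let cursor = parseDoneMs;
  for (const s of scripts) {
    if (s.mode !== "defer") continue;
    cursor = Math.max(cursor, s.downloadedAtMs);
    runs.push({ name: s.name, atMs: cursor });
  }

  // Sort by time; the sort is stable, so ties keep the order built above.
  return runs.sort((a, b) => a.atMs - b.atMs);
}

const order = executionOrder(
  [
    { name: "analytics.js", mode: "async", downloadedAtMs: 100 },
    { name: "libs.js", mode: "defer", downloadedAtMs: 300 },
    { name: "app.js", mode: "defer", downloadedAtMs: 50 },
  ],
  200,
);
console.log(order.map((e) => e.name).join(", "));
// prints "analytics.js, libs.js, app.js"
```

Note that app.js finishes downloading first but still runs last: defer preserves document order, so it waits behind libs.js. The model says nothing about bandwidth contention between resources, which is what the tests below measure.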
Current GOV.UK setup
Slimmer currently has a processor that moves all the script tags found in the page to just before the closing </body> tag. This processor can be seen here. It may be possible to reverse this move either for all scripts, or only selected scripts.
Testing
Thankfully, it is possible to test the theory above without touching any production code, by using a Cloudflare Worker and the HTMLRewriter API.
Worker setup
With the worker we will be rewriting the HTML 'on-the-fly' as the request is made. We will be:
- Removing some inline scripts from the page (they can't use defer)
- Removing the JavaScript references from the bottom of the page
- Adding the JavaScript back into the <head> and appending the defer attribute
- Adding the inline scripts back into the page by creating a custom response for a specific url.pathname
A response passed through the above worker looks completely normal to the browser, which parses and executes the page as usual.
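The steps above could be sketched roughly as follows. This is not the worker used for these tests: the script URL, the /worker-inline.js path and the helper names are all hypothetical, and ElementLike is a hand-written subset of the element object that HTMLRewriter passes to its handlers, declared here so the logic can be read and exercised outside the Workers runtime. Because HTMLRewriter streams the document, the <head> is processed before the scripts at the bottom of the page are seen, so the sketch assumes the script URLs are known up front rather than collected during the rewrite. It also treats every inline script in the body as one to remove, whereas the tests removed only some.

```typescript
// Subset of the HTMLRewriter Element interface used in this sketch.
interface ElementLike {
  getAttribute(name: string): string | null;
  remove(): void;
  append(content: string, options?: { html?: boolean }): void;
}

// Hypothetical values: the real script URLs and inline code would go here.
const DEFERRED_SCRIPTS: string[] = ["/assets/application.js"];
const INLINE_SCRIPT_PATH = "/worker-inline.js";
const INLINE_SCRIPT_BODY = "/* contents of the removed inline scripts */";

// Build a deferred script tag, escaping the characters that matter in an attribute.
function scriptTag(src: string): string {
  const safeSrc = src.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `<script src="${safeSrc}" defer></script>`;
}

// Handler for "head": re-add the scripts (and the former inline code) with defer.
const headHandler = {
  element(head: ElementLike): void {
    for (const src of DEFERRED_SCRIPTS.concat([INLINE_SCRIPT_PATH])) {
      head.append(scriptTag(src), { html: true });
    }
  },
};

// Handler for "body script": remove inline scripts and the known external ones.
const bodyScriptHandler = {
  element(script: ElementLike): void {
    const src = script.getAttribute("src");
    if (src === null || DEFERRED_SCRIPTS.indexOf(src) !== -1) {
      script.remove();
    }
  },
};

// Custom response body for the path that now serves the inline code.
function inlineScriptFor(pathname: string): string | null {
  return pathname === INLINE_SCRIPT_PATH ? INLINE_SCRIPT_BODY : null;
}

// In a Worker these would be wired up along the lines of:
//   new HTMLRewriter()
//     .on("head", headHandler)
//     .on("body script", bodyScriptHandler)
//     .transform(originResponse);
```

Matching scripts by their src rather than by position keeps the selectors from depending on the order of elements in the page.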
WebPageTest setup
Using WebPageTest we can test the performance of our modified page against the baseline setup, allowing us to see whether the change improves performance or makes it worse. To do this we must use the overrideHost feature available in WebPageTest:
overrideHost www.gov.uk govuk-worker-example.workers.dev
This command essentially says "route any request to www.gov.uk from the original page through the worker", which therefore modifies the response presented back to WebPageTest.
The full WebPageTest script can be seen below:
setCookie https://www.gov.uk/ cookies_policy={"essential":true,"settings":true,"usage":true,"campaigns":true}
setCookie https://www.gov.uk/ cookies_preferences_set=true
setCookie https://www.gov.uk/ global_bar_seen={"count":999,"version":8}
addHeader x-bypass-transform:true
overrideHost www.gov.uk govuk-worker-example.workers.dev
navigate %URL%
Note: govuk-worker-example.workers.dev isn't a real domain, just an example placeholder.
The x-bypass-transform:true header allows us to bypass the transformation and capture a baseline that is itself passed through the worker, but left unmodified.
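Inside the worker, the bypass could be a single header check. This is a minimal sketch, assuming the worker compares the header value to the string "true"; HeadersLike and shouldTransform are illustrative names, and HeadersLike mirrors only the get method of the standard Headers object.

```typescript
// Only the part of the Headers interface this check needs.
interface HeadersLike {
  get(name: string): string | null;
}

// Returns false when the request asks for the untransformed baseline.
function shouldTransform(headers: HeadersLike): boolean {
  return headers.get("x-bypass-transform") !== "true";
}
```

In a fetch handler this would gate the rewrite: when it returns false, the origin response is passed back untouched, so the baseline still pays the cost of the extra hop through the worker.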
Cookies have been set in order to accept the cookie banner for all tests.
Page setup
The News and Communication page was chosen because it displayed the issue from the original hypothesis and also has a large DOM structure, so the issue could be clearly seen. It is possible to use any page, but because workers modify the HTML using CSS selectors, they can be quite flaky if you rely on the basic ordering of elements on a page, e.g. removing the 7th script element on a page (body > script:nth-of-type(7)).
Browser setup
I decided to test pages in Chromium under a 3G Fast connection on a desktop device and also a real Moto G4 device (located in Dulles, USA). This should give us more of an idea as to "real world" performance.
Results Chrome Desktop - 3G Fast
Baseline (run through worker, no transform)
Chrome Desktop (run through worker, with HTML transforms applied)
Comparing the charts above it is possible to see the JS has been shifted forwards, and in doing so it is requested in the middle of the highest priority CSS. The CSS and JS will now be competing for limited bandwidth (3G Fast connection).
Visual Impact
Here we compare both tests with each other:
As you can see from the filmstrip and the visual progress there's been some improvement to the rendering of the font, but nothing substantial under these conditions. Looking at the visual progress graph we can actually see that the baseline started to render sooner than our transformed version.
Results Moto G4 Mobile - 3G Fast
Baseline (run through worker, no transform)
Moto G4 Mobile (run through worker, with HTML transforms applied)
In both of these tests the requests all start at a similar time. The real difference is the interleaving of CSS and JS once the scripts are moved into the head with the defer attribute. At this point these resources are competing for limited bandwidth.
Visual Impact
Here we compare both tests with each other:
As you can see above, moving the scripts to the <head> and adding defer has had a huge negative impact on performance. First paint is 500ms slower after the modifications to the page are applied. A device with limited memory, CPU and bandwidth is having to share these resources as best it can. Splitting the bandwidth between CSS and JS delays the download and parsing of all the CSS, which in turn delays the creation of the render tree. No render tree = nothing painted to the screen.
Conclusion
From these two sets of tests it is clear that moving scripts into the head and adding defer will not always improve performance. In some cases it actually appears to hinder it.
Recommendation
My recommendation off the back of this testing is to keep the tag_mover.rb processor in place, but add some functionality to allow developers to exclude certain scripts if required, e.g. via the addition of a data attribute. This would give the flexibility to keep certain scripts in the head when required, without breaking the current functionality.
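The exclusion logic could look something like the sketch below. tag_mover.rb is Ruby, so this TypeScript version only illustrates the decision, not the actual change; the data-keep-in-head attribute name and the ScriptTag shape are invented for the example.

```typescript
interface ScriptTag {
  src: string;
  attributes: Record<string, string>;
}

// Hypothetical opt-out attribute; the real name would be up to the team.
const KEEP_ATTRIBUTE = "data-keep-in-head";

// Split scripts into those left where they are and those moved before </body>.
function partitionScripts(scripts: ScriptTag[]): { kept: ScriptTag[]; moved: ScriptTag[] } {
  const kept: ScriptTag[] = [];
  const moved: ScriptTag[] = [];
  for (const script of scripts) {
    if (KEEP_ATTRIBUTE in script.attributes) {
      kept.push(script);
    } else {
      moved.push(script);
    }
  }
  return { kept, moved };
}
```

Scripts without the attribute are still moved, so existing pages behave exactly as they do today unless a developer opts a script out.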