If not already obvious, the title of this weblog post is clickbait. There are very good reasons to have highlight.js (‘picked on’ for being probably the most popular), Prism, et al., in a client-side package. These syntax highlighters are meant for highlighting dynamic content in web applications, where a server-side round trip would be slow & expensive. In fact, the aforementioned options can be used as part of a build-time or server-side highlighting solution.
The focus of this post, however, is the static side of the equation, where syntax highlighting on the client is a problem; it is not a swipe at any particular syntax highlighting library.
A probably not uncommon scenario
Imagine: you’re a developer & you just built a cool project or have a cool idea, & now the time has come to publish it and/or write content about it.
Since you’re a dev & your audience is largely other devs, you’ll be needing code samples & demonstrations of how to use your project/idea.
Your text editor has ‘fancy’ colors for the different syntactic elements of your code, & your docs/posts deserve that same treatment.
With other devs reading, they’ll likely be used to this colorful setup too, & prefer it too, since the colors help most coders quickly parse the code in their heads.
But you’ve run into a problem: HTML’s built-in
<pre> tag offers no such highlighting—no, you are stuck with black & white (or whatever colors you’ve designed with, or whatever colors the consumer’s user agent selected), but this is solvable because you’ve seen it done other places on the web!
You fire up your favorite privacy-friendly search engine to blast off a query: “how to add syntax highlighting to a website”.
SEO being a game best played by the so-called ‘gurus’, the results bubbling up at the time of this writing point specifically to highlight.js or Prism, complete with copypastable
<script> &
<style> tags for the head, served from a third party.
You, not wanting to think about this more than necessary, paste these resources into your project page &, as if by magic, your new flagship laptop & flagship smartphone on an unlimited 5G data plan render the rainbow with but a small flicker.
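The copied snippet usually looks something like this (the CDN host & version here are illustrative placeholders, not a recommendation):

```html
<!-- theme stylesheet & highlighter script, both from a third party -->
<link rel="stylesheet" href="https://cdn.example.com/highlight.js/11/styles/default.min.css">
<script src="https://cdn.example.com/highlight.js/11/highlight.min.js"></script>
<!-- re-run the highlighter on every single page load -->
<script>hljs.highlightAll();</script>
```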
Case closed. Or was it?
What’s happening with these scripts?
A couple of scripts were copied into our sources.
First, if these scripts weren’t vendored onto our domain, we’re probably connecting to a third-party CDN, & with none of the examples I saw checking integrity or talking about vendoring, we can probably assume this to be true in many cases. But these CDNs are useless & dangerous: they offer little to no performance improvement, can be hacked (tho integrity checks can mitigate that somewhat), do go down, expose user IP addresses, & require adding exceptions to our content security policy (CSP).
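For what it’s worth, if a third-party CDN is used anyway, an integrity check & a CSP exception are the minimum mitigations; a sketch with a placeholder host & hash:

```html
<!-- the CSP then needs a carve-out, e.g.
     Content-Security-Policy: script-src 'self' https://cdn.example.com -->
<script src="https://cdn.example.com/highlight.min.js"
        integrity="sha384-PLACEHOLDER-HASH-OF-THE-EXACT-FILE"
        crossorigin="anonymous"></script>
```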
If we assume the dev did the right thing & vendored their sources, the files will be downloaded once the user agent encounters them in the
<head> (or elsewhere 😞), & then a second script must execute the syntax highlighter on the appropriate sources after both the script is downloaded+parsed & the
DOMContentLoaded event has fired.
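That load-order dance, sketched with a vendored highlight.js (its highlightAll() helper is real; the paths are illustrative):

```html
<head>
  <!-- defer: start downloading now, execute only after the document is parsed -->
  <script defer src="/vendor/highlight.min.js"></script>
  <script>
    // deferred scripts finish before DOMContentLoaded fires,
    // so hljs is defined by the time this handler runs
    document.addEventListener('DOMContentLoaded', () => hljs.highlightAll());
  </script>
</head>
```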
The elements we selected will need to be queried from the DOM; their text contents will be run thru a lexer to gather tokens, followed by a parser to make sense of those tokens, which generates a syntax tree of the contents.
That tree is then handed to a printer function that needs to wrap all the relevant parts with
class attributes in a way that matches a corresponding style sheet.
The DOM element’s contents are wiped & then replaced with our printer’s output of
<span> elements (and maybe line numbers & other things if enabled).
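The before & after of that wipe+replace looks roughly like this (the hljs-* class names are highlight.js’s; other libraries use their own):

```html
<!-- what the author shipped -->
<pre><code class="language-js">const n = 1;</code></pre>

<!-- what the DOM holds after the highlighter has run -->
<pre><code class="language-js hljs"><span class="hljs-keyword">const</span> n = <span class="hljs-number">1</span>;</code></pre>
```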
To the outsider this is magical. The fact that it can happen fairly quickly is a bit of a marvel in & of itself. However, a lot is happening & it’s happening on every request & for every user.
Why is it bad to do syntax highlighting like this?
Our biggest offender: idempotency
Idempotency, in mathematics & computer science, is when a certain action repeated multiple times produces the same result. The makers of our syntax highlighter options are smart & have test suites to more or less guarantee that, given a particular version of the parser setup & a certain blob of text to parse, the printer will give us the same output. Under our client-side setup, we are doing this load+query+lex+parse+print+insert loop on every page refresh & for each page we navigate to. But it’s not just a single user’s time/CPU cycles that were wasted: every user’s machine consuming this content is doing the exact same load+query+lex+parse+print+insert task to get the exact same resulting HTML. This task isn’t cheap either, especially on low-end hardware (an e-reader, for instance, should be an optimal device for reading weblog posts/documentation, & e-readers are not known for being CPU powerhouses). The larger your project & the more users viewing it, the more resources are wasted.
If we were, for example, going to calculate the Fibonacci sequence for a large number, we might employ a technique like memoization to cache the results of previous iterations so we can look in that cache for values we’ve calculated already. If we applied such a technique to syntax highlighting, it would look like this: our build tool, in CI or otherwise, would run this highlighting once & we would serve those results to our users, so only the build tool ever needs to calculate it. Similarly, we may have a dynamic-ish page that can cache these views at a choose-your-layer of the stack. There are very few situations where the content changes often enough to justify an end user ever doing this parsing; on this principle alone, the client-side approach falls apart.
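As a refresher, memoization in a dozen lines of plain JavaScript (no library involved):

```javascript
// cache survives across calls, so each value is computed at most once
const memo = new Map([[0, 0], [1, 1]]);

function fib(n) {
  if (!memo.has(n)) {
    memo.set(n, fib(n - 1) + fib(n - 2));
  }
  return memo.get(n);
}

console.log(fib(30)); // a repeat call is now a single Map lookup
```

Build-time highlighting is the same idea taken to its limit: the ‘cache’ is the generated HTML itself, computed once & served to everyone.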
Delaying our experience & flashing content
Due to awaiting the page & the script’s loading + full execution, we will always cause repaints & flashes for users. There is no way around this with client-side rendering. This can lead to anything from mild annoyance, to dropped users tired of waiting for loading, to unnecessarily chewing thru a user’s battery. A slow initial paint can lead to worse performance metrics & unprioritized SEO in some cases as well.
Network implies latency & it’s not free
While it’s nice that syntax highlighters usually break up their scripts per language to save on size, even optimized, these requests delay your page load times. In many parts of the world (or just for folks that don’t like to be wasteful) downloading these scripts takes considerable time. While this is true of all scripts, not all scripts are as useless as syntax highlighters on static content.
As such, making experiences that work for these folks, likely power users, is ideal (within reason—we don’t need to resort to checkbox hacks & such).
In the scenario of build-time or server-side rendering, you give these users the same, optimal experience.
As a bonus, this can help the low-end, broken-X11, or saving-every-watt-of-battery folks, where even
elinks, the TUI browser, supports CSS & could get the nice highlighted experience.
To become a JS allowlister using uBlock Origin:
- Navigate to the add-on’s settings (triple cogwheel ⚙️)
When a site that requires JS is encountered:
- Open the fly-out menu
Solving by highlighting syntax just once
There are a wealth of highlighting options like Tree-sitter, highlight, Pygments, Rouge, Chroma, just to name a few. We also shouldn’t forget the JS options leading this post, highlight.js & Prism, which function just as well. All of these have been, or could be, adapted to a CLI or come in library form—meaning they can be used at build time for a static site (like documentation) or run quickly on the server side & sent to the user. Doing highlighting at this phase fixes all of the drawbacks covered in the previous section of this post on the pitfalls of client-side highlighting.
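To make the shape of that build step concrete, here is a deliberately tiny stand-in for a real highlighter (in practice you’d call highlight.js, Pygments, Chroma, etc. & write the result into your output directory):

```javascript
// highlight once at build time, emitting static class’d-<span> HTML;
// a real library would do proper lexing, not a keyword regex
const KEYWORDS = /\b(const|let|function|return)\b/g;

function highlightToHtml(source) {
  return source
    .replace(/&/g, '&amp;')   // escape before inserting our own markup
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(KEYWORDS, '<span class="keyword">$1</span>');
}

// every visitor receives this string as-is; no client-side work remains
const page = `<pre><code>${highlightToHtml('const n = 1;')}</code></pre>`;
console.log(page);
```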
Can we think of (or create) some pitfalls?
Bad, but excusable, is the solo developer or small team; however, there are some big projects perpetuating this bad practice, & some of those projects provide the documentation for lots of downstream projects. I want to call them out openly since I would like to see the landscape change. All of these projects could & should be baking syntax highlighting into their systems, & they have enough people to look at the problem. I hope in the future to strike thru these tools after they fix it.
<a> anchor tag), but I’ll spare you the other gripes. Not only does it do syntax highlighting client-side via highlight.js, but, by default, it’s using a public CDN without integrity checks. The scripts themselves are blocking in the head without
defer.
Popular in the Rust community, & I guess soon to be used for Nixpkgs, mdBook’s syntax highlighting section states it ships with highlight.js (no CDN), which can be extended by the user.
Also, the rendering situation is made worse by
<script> tags at the end of the body instead of in the
<head> with
defer; this means the scripts aren’t blocking, which is good, but when a script is in the
<head> it clues the user agent into starting to download these resources so they’re ready for when the document is loaded. Alternatively, a prefetch link header would have a similar effect, but it’s missing (the extra bytes might just favor
defer anyhow in some situations). There are even merge requests, open for years, about build-time rendering, but nothing has happened with any of them.
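The difference described above, sketched out (paths illustrative):

```html
<!-- today: end-of-body, non-blocking, but only discovered after
     the user agent has parsed the whole document -->
  <script src="highlight.js"></script>
</body>

<!-- alternative: discovered (& downloading) immediately, still
     executed only after parsing finishes -->
<head>
  <script defer src="highlight.js"></script>
  <!-- or, keeping the tag at the end of body, hint the fetch early -->
  <link rel="preload" as="script" href="highlight.js">
</head>
```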
- The Pijul Nest
- Recently this forge did upgrade to no longer requiring Cloudflare’s public CDN for its highlight.js scripts, choosing instead to vendor those scripts; but a) the entire architecture is now on Cloudflare’s ‘edge’ offerings & b) if you’re going to use Cloudflare’s architecture, at least cache these views on the edge instead of hurting the user experience.
Client-side syntax highlighting has unseen costs to many developers. We can solve a lot of these costs by moving the syntax highlighting to the server and/or build tool. Developers should be considerate of a user’s time, their data usage, their power usage, & the impact frivolous computations have on the planet. These ideas can & should expand to other parts of the document with heavy parsing/rendering requirements such as LaTeX, MathJax, diagrams (Mermaid, Graphviz, etc.), & more.