The “Developer Experience” Bait-and-Switch

TL;DR: we cannot continue to use as much JavaScript as is now “normal” and expect the web to flourish. At the same time, most developers experience no constraint on their use of JS…until it’s too late. “JS neutral” (or negative) tools are here, but we’re stuck in a rhetorical rut. We need to reset our conversation about “developer experience” to factor in the asymmetric cost of JS.

JavaScript is the web’s CO2. We need some of it, but too much puts the entire ecosystem at risk. Those who emit the most are furthest from suffering the consequences — until the ecosystem collapses. The web will not succeed in the markets and form-factors where computing is headed unless we get JS emissions under control.

Against this grim backdrop, there’s something peculiar about conversations regarding the costs of JS-oriented development: a rhetorical substitution of developer value for user value. Here’s a straw-man composite from several recent conversations:

These tools let us move faster. Because we can iterate faster we’re delivering better experiences. If performance is a problem, we can do progressive enhancement through Server-Side Rendering.

This argument substitutes good intentions and developer value (“moving faster”, “less complexity”) for questions about the lived experience of users. It also tends to do so without evidence. We’re meant to take it on faith that it will all work out if only the well intentioned people are never questioned about the trajectory of the outcomes.

Most unfortunately, this substitution is frequently offered to shield the preferences of those in a position to benefit at the expense of folks who can least afford to deal with the repercussions. Polluters very much prefer conversations that don’t focus on the costs of emissions.

The backdrop to this argument is a set of nominally shared values to which folks assign different weights:

  • Universality and accessibility
  • Fidelity and richness
  • Competitive (or superior) cost to produce

The “developer experience” bait-and-switch works by appealing to the listener’s parochial interests as developers or managers, claiming supremacy in one category in order to remove others from the conversation. The swap is executed by implying that by making things better for developers, users will eventually benefit equivalently. The unstated agreement is that developers share all of the same goals with the same intensity as end users and even managers. This is not true.

Shifting the conversation away from actual user experiences to team-level advantages enables a culture in which the folks who receive focus and attention are developers, rather than end-users or the business. It naturally follows that teams can then substitute tools for goals.

This has predictable consequences, particularly when developers, through their privileged positions as expensive-knowers-of-things-about-computers, are allowed to externalize costs. And they do. Few teams I’ve encountered have actionable metrics associated with the real experiences of their users. I can count on one hand the number of teams I’ve worked with who have goals that allow them to block launches for latency regressions, including Google products. Nearly all developers in the modern frontend shops do not experience performance constraints until it’s too late. The brakes aren’t applied until performance is so poor that it actively hurts the business.

If one views the web as a way to address a fixed market of existing, wealthy web users, then it’s reasonable to bias towards richness and lower production costs. If, on the other hand, our primary challenge is in growing the web along with the growth of computing overall, the ability to reasonably access content bumps up in priority. If you believe the web’s future to be at risk due to the unusability of most web experiences for most users, then discussion of developer comfort that isn’t tied to demonstrable gains for marginalized users is at best misguided.

Competition between these forces is as old as debates about imagemaps vs. tables for layout. What’s new is JavaScript; or rather, the amount we’re applying to solve our problems:

Mobile pages have gone from an average of 50KB of JS in 2011 to more than 300KB today.

Median mobile sites have gone from ~50KB of JS in 2011 to more than 350KB today. That unzips to roughly 2MB of script.

I’ve previously outlined why JavaScript is the most expensive way to accomplish anything in a browser. This has been coupled with an attempt to lean on evolving facts about computing (it’s all going to mobile, mostly to Android, and not high-end devices). My hope is that anyone who connects these ideas will come to understand that we can’t afford to continue on as we have. We must budget. We must cap-and-trade JS. There is no other way to fix what we have now broken with script — we simply need to use less of it.

There have been positive signs that this message has taken root in certain quarters, but it has not generally changed the dynamic. Despite the heroic efforts of Polymer, Preact, Svelte, Ionic, and Vue to create companion “starter kits” or “CLI” tools that provide the structure necessary to send less JS be default, as many (or more) JS-heavy performance disasters cross my desk in an average month as in previous years.

And still framework marketing continues unmodified. The landing pages of popular tools talk about “speed” without context. Relatively few folks bring WPT traces to arguments. Appeals to “Developer Experience” are made without context. Which set of users do we intend to serve? All? Or the wealthy few? It is apparently possible to present performance arguments to the JavaScript community in 2018 — a time when it has never been easier to collect and publish traces — without traces against the global baseline or an explanation of why that baseline is inappropriate. The bait-and-switch still works, and that’s a hell of a problem.

Perhaps my arguments have not been effective because I hold to a policy of not posting analyses without site owner’s consent. This leaves me as open to critique by Hitchen’s Razor as my dataless interlocutors. The evidence has never been easier to gather and the aggregates paint a chilling picture. But aggregates aren’t specific, citable incidents. Video of a single slow-loading page lands in a visceral way; abstract graphs don’t.

And the examples are there, many of them causing material, negative business impact. A decent hedge-fund strategy would be to run a private WPT instance and track JS bloat and TTI for commercial-intent sites — and then short firms that regress because they just rewrote everything in The One True Framework. Seeing the evidence instills terror, yet I’ve been hamstrung to do more than roughly sketch the unfolding disaster while working behind the scenes with teams.

There is, however, one exception to my rule: the public sector. Specifically public sector sites in countries where I pay taxes. Today, that’s the US and the UK, although I suspect I could be talked into a more blanket exception.

So I’m going to start posting and dissecting a lot more traces of public sector work, but the goal isn’t to mock or shame the fine folks doing hard work for too little pay. Rather, it’s to demonstrate what “modern frontend” is doing to the accessibility of the web — not in the traditional “a11y” sense, but in the “is going to this site reasonable for its intended users?” sense. That is, I will be talking about this as a proxy for the data I can’t share.

Luckily, the brilliant folks at the USDS and the UK’s Government Digital Service have been cleaning up many of the worst examples of government-procurement-gone-wild. My goal isn’t to detract anything from this extraordinary achievement:

My hope, instead, is that by showing specific outcomes and the overwhelming volume of these examples it will become possible to talk more specifically about what’s wrong, using and pervasively citing data. I hope that by talking about what it means to build well when trying to serve everybody, we can show businesses how short they’re falling of the mark — and why those common root-causes in JS-centric development are so toxic. And if the analysis manages to help clean up some public sector services, so much the better; we’re all paying for it anyway.

This isn’t Plan A, but neither was the CDS talk in ’16 that got everyone so upset. I don’t like that this is where we are as a community and as a platform. I hate that this continues to estrange me from the JS community. We need tools. We need frameworks. But we need to judge them by whether or not the deliver a better developer experience without fundamentally impairing the user experience. We must get to JS-neutral (or, my preferred, Time-to-Interactive-neutral or negative) tooling. Frameworks and tooling need to explain clearly, in small words, with reproducible instructions how they deliver under-budget experiences, how much room is left after their budgetary cost, and what devices and networks their tools are appropriate in. This will mean that many popular tools are relegated to prototyping. That’s OK.

This is very much Plan D…or E. But the crisis is real and it isn’t inevitable. It is not exogenous. We made it, and we can fix it.

To get this fixed, we need to confront the “developer experience” bait-and-switch. Tools that cost the poorest users to pay wealthy developers are bunk. To do better, we need to move the conversation to an evidence-based footing. I wish the arguments folks made against my positions were data-driven. There’s so much opening! Perhaps a firm is doing market analysis and only cares about ever reaching users who make more than $100K USD/yr or who are in enterprise settings. Perhaps research will demonstrate that interactivity isn’t as valuable as getting bits on screen (the usual SSR argument). Or, more likely, that acknowledgement (bits on screen) buys a larger-than-anticipated amount of perceptual padding (perhaps due to scanning). Perhaps the global network landscape is shifting so dramatically that the budget for client-side JS runtime has increased. Perhaps the median CPU improvement that doesn’t look set to materialize until 2021 at the earliest will happen much earlier; i.e., maybe the current baseline is wrong!

But we aren’t having that conversation. And we aren’t going to have it until we identify, call-out, and end the “developer experience” bait-and-switch.

Thanks and apologies to Ade Oshineye, Ojan Vafai, Frances Berriman, Dion Almaer, Addy Osmani, Gray Norton and Philip Walton for their feedback on drafts of this post.

Effective Standards Work, Part 2: Threading the Needle

The web standards process fails us too often. This series explores the forces at work, how we’re improving the situation, and how you can shape new features more effectively.

“Part 1: The Lay of The Land” discussed persistent challenges in standards and forces that give rise to misunderstandings. It also described the ecosystem dynamics that make change difficult, even before considering the varying firm-level strategies of browser vendors.

Essential Ingredients

Making progress on new features is extraordinarily challenging in this environment. However, armed with a clear understanding of the situation, it’s possible to chart a narrow but reliable path forward. Necessary ingredients in solving problems on the web platform include:

  • The ability to fail cheaply
    …at least early on. Most ideas and designs aren’t good, and most of the ones we eventually accept don’t start good. Spaces that allow ideas to spring to life and quietly pass into time, or radically change without undue drama, are essential to improving our outcomes.
  • Participation by web developers and browser engineers
    Nothing good happens without both groups at the table.
  • A venue outside a chartered Working Group in which to design and iterate
    Pre-determined outcomes rarely yield new insights and approaches. Long-term relationships of WG participants can also be toxic to new ideas. Nobody takes their first tap-dancing lessons under Broadway’s big lights. Start small and nimble, build from there.
  • A path towards eventual standardisation
    Care must be taken to ensure that IP obligations can be met the future, even if the loose, early group isn’t convened with a strict IP policy
  • Face-to-face deliberation
    I’ve never witnessed early design work go well without in-person collaboration. At a minimum, it bootstraps the human relationships necessary to jointly explore alternatives.

It’s attractive to think that design can (or should) happen within a formal Working Group. A well-functioning WG should include both developers and implementers, after all. Those groups often have face-to-face meetings, and the path toward standardisation is shortest in those venues! But it doesn’t work; not often enough to be useful, anyway.

Starting your journey there leads to pain and failure. Why? The deck is stacked against design-in-committee, both structurally and procedurally.

Structurally, it is the job of a Working Group to evaluate proposals for inclusion in a specification. The basis for inclusion in nearly all standards I know of is not rigorous or scientific. Evidence is not (yet) a compelling argument. The norms of standards organisations are set, largely, by social cohesion amongst those working to improve the systems they maintain. The older the specification and the more stable the composition of the group, the harder it is for new ideas and people to enter with credibility.

A further difficulty for non-implementers (in another universe, “customers”) within these groups is the information asymmetry inherent in the producer/consumer relationship. Implementers feel a responsibility to resist designs they feel would be detrimental to either their architecture or their competitive position. New ideas have to enter this environment roughly “done” to even get on the agenda.

Procedurally, it’s the responsibility of chairs and the overall group to make progress towards the promised deliverables. Working Group charters typically set up a scoped set of deliverables and a time-table, and while there’s lots of play built into these things, groups that don’t continue to produce new versions on a regular basis are considered problematic. Problematic groups tend not to continue to receive the organisational support they require to continue.

Failure and iteration are the lifeblood of good design, but these groups are geared for success. They aggressively filter out new ideas to preserve their ability to ship new versions of specs. Once something is locked into a WG agenda, it’s “in”. This is inherently anti-iteration.

If you’ve never been to a functioning standards meeting, it’s easy to imagine languid intellectual salons wherein brilliant ideas spring forth unbidden and perfect consensus is forged in a blinding flash. Nothing could be further from the real experience. Instead, the time available to cover updates and get into nuances of proposed changes can easily eat all of the scheduled time. And this is expensive time! Even when participants don’t have to travel to meet, high-profile groups are comically busy. Recall that the most in-demand members of the group (chairs, engineers from the most consequential firms) are doing this as a part-time commitment. Standards work is time away from the day-job, so making the time and expense count matters. Before anyone gets into the room, everyone knows what the important topics will be, and if precious time is taken from resolving those issues — particularly to explore “half baked” ideas — influential folks (and the teams they represent) will be upset. Not a recipe for agreement.

The idea that a public, agenda-driven, minuted, chaired forum with a full docket and a room full of powerful decision-makers primed to say “no” is where your best design work will happen is barmy. Policies aren’t dreamt up in open session at Parliament, Congress, or the UN; rather they’re presented and voted on, possibly with minor amendments.

Note: There are many dysfunctional standards groups; they tend to have lighter agendas or a great deal of make-work. Those groups are unlikely to be well-attended by busy implementers. Groups that can’t keep implementer interest aren’t worth investing time in.

This insight is why the Chrome team now insists on doing design work in “incubation” forums. These can be embedded into a WG’s formal process (as at TC39), or in separate forums which are feeders for formal, chartered groups (e.g. WICG or RICG).

Design β†’ Iterate β†’ Ship & Standardise

What I’ve learned over the past decade trying to evolving the web platform is a frustratingly short list given the amount of pain involved in extracting each insight:

  • Do early design work in small, invested groups
  • Design in the open, but away from the bright lights of the big stage
  • Iterate furiously early on because once it’s in the web, it’s forever
  • Prioritize plausible interoperability; if an implementer says “that can’t work”, believe them!
  • Ship to a limited audience as soon as possible to get feedback
  • Drive standards with evidence and developer feedback from those iterations
  • Prioritise interop over perfect specs; tests create compatibility as much or more than tight prose or perfect IDL
  • Dot “i”s and cross “t”s; chartered Working Groups and wide review are important ways to improve your design later in the game

These derive from our overriding goal: ship the right thing.

All too often we’ve seen designs (cough AppCache cough) that could have been improved by listening to available feedback. Design processes without web developers involved tend to fail because they can’t error correct. Implementers most acutely feel the constraints of their system, not web developer reality. Without the voices of web developers, designs tend towards easy-to-build — rather than fit-for-purpose. Group-think too often takes hold, as those represented share the same perspective, making change and iteration harder.

Similarly, design efforts without implementers present are missing the constraints that lead to successful design. Proposals without this grounding are easily written off. It’s tempting to get a group together to design future APIs in a vacuum, but without implementers critical mass never forms.

So how can you shape the future of the platform as a web developer?

The first thing to understand is that browser engineers want to solve important problems, but they might not know which problems are worth their time. Making progress with implementers is often a function of helping them understand the positive impact of solving a problem. They don’t feel it, so you may need to sell it!

Building this understanding is a social process. Available, objective evidence can be an important tool, but so are stories. Getting these in front of a sympathetic audience within a browser team is perhaps harder. Thankfully, functional browser engine teams now staff sizable outreach and Developer Relations groups (oh hai, @ChromiumDev, @mozappsdev, MSEdgeDev, and Jonathan!). Similarly, if you happen to work for a top-1k web property, your team may already have a connection to a browser’s partnerships team. Those teams can route thoughtful questions to the right engineers.

Other models for early collaborations involve sideline conversations at industry gatherings, e.g. TPAC or BlinkOn. Special-purpose vehicles like W3C Workshops are somewhat harder to organize, but browser engineers are willing to join them. I can’t speak for other vendors, but Chromies are also willing to travel for ad-hoc gatherings to do early design work. Andrew Betts masterfully orchestrated such an event while at the FT, kicking off what became Service Workers. You might not have Andrew’s wealth of connections, but odds are you probably know someone who does. Remember, at the start this is about individuals. Drawing attention to an issue that you think is important means building a small group of like-minded folks. It’s effort to find “your people”, but it’s far from impossible!

Next, recognize that the design, development, iteration, and eventual standardisation phases take time. Sometime a lot of time. As a web developer, it’s unlikely that you’ll be able to sustain professional interest in such a process as there’s no practical way it can bear fruit in time for your current (or even next) project. This is not a personal failing, it’s just how the gearing works. You have information that browser teams don’t, but less leverage and time. Setting them on a better course is helping the next person and, if you’re doing this as your profession, may eventually help you too. Don’t feel guilt for needing to drop out of the process at some point.

It has gotten ever easier to stay engaged as designs iterate. After initial meetings, early designs are sketched up and frequently posted to GitHub where you can provide comments. Forums like WICG let you provide direct design feedback during development — a very intentional shift by the Chrome and Edge teams to give developers a louder voice when designs are still maleable.

Further along the process, Chrome is now running a series of “Origin Trials”, an idea the Chrome team borrowed from Jacob Rossi at MSFT. Origin Trials allow developers to test new features on live sites and shape their evolution. Teams running these trials actively solicit feedback and frequently change them in response.

Astute readers will note none of this involves joining a Working Group or keeping up with busy mailing lists. Affecting the trajectory of the web platform has never been easier, assuming you know which side of the amplifier to approach.

“Ship The Right Thing”

These relatively new opportunities for participation outside formal processes have been intentionally constructed to give developers and evidence a larger role in the design process. We’ve supported their creation because they help us to separate open design and iteration from standardisation, allowing each process to assist the community in improving features at the point where they are most effective.

These processes aren’t perfect by any stretch, and it would be an epic understatement to suggest that the broader browser and standards communities agree with design via incubation outside of formal Working Groups. Maintenance work is a particularly thorny topic. Regardless, the Chrome team has gathered compelling evidence that this is a better way to work.

Prizing collaboration, iteration, and evidence has enabled us to shape process to support those values. Incubation and related processes let us be more responsive to developers while simultaneously increasing confidence that features shipped to Stable meaningfully address problems worth solving. Hopefully this series will help you shape the future with fewer misunderstandings. After all, we all want to see the right thing ship.


I’ve been drafting and re-drafting versions of this post for almost 4 years. In that time I’ve promised a dozen or more people that I had a post in process that talked about these issues, but for some of the reasons I cited at the beginning, it has never seemed a good time to hit “Publish”. To those folks, my apologies for the delay.

There’s a meta-critique of formal standards and the defacto-exclusionary processes used to create them. This series didn’t deal in it deeply because doing so would require a long digression into the laws surrounding anti-trust and competition. Suffice to say, I have a deep personal interest in bringing more voices into developing the future of the web platform, and the changes to Chrome’s approach to standards discussed above have been made with an explicit eye towards broader diversity, inclusion, and a greater role for evidence.

Deep thanks to Andrew Betts, Bruce Lawson, Chris Wilson, and Mariko Kosaka for reviewing drafts of this series and correcting many of the errors within.

Effective Standards Work, Part 1: The Lay Of The Land

The web standards process fails us too often. This series explores the forces at work, how we’re improving the situation, and how you can shape new features more effectively.

“Why don’t browsers match standards!” muttered the thoughtful developer (just before filing an issue at crbug.com). “The point of standards is so that everything works the same.”

Once something is in The Standard, everyone will implement interoperably…right?

Maybe. Confusion about how interoperability is really created sparks many of the worst arguments I’ve witnessed in my ~12 years working on web platform features.

Not every effort I’ve been involved in has succeeded (e.g. ES4), while some have but took too long. A few even went well. Having made most of the available mistakes, I was handed the responsibility to keep Chrome engineers from error as our “Web Standards Tech Lead”. I don’t write about it much because anything said is prone to misinterpretation; standards-making is inherently political. In the interest of progress, this 2-part series is a calculated risk.

Some problems on the web can be solved in userland. Others require platform changes. For historical and practical reasons, changes to the web platform must be codified in web standards. The community of web developers and browser vendors share this norm. Browsers are substitutable and largely rival goods. As a result, every party has reasons to value compatibility and interoperability. Developers benefit when interop extends the reach of their products and services. Browsers benefit from interop as end-users abandon browsers that can’t access existing services and content. New features, then, require a trajectory towards standardisation. Features shipped without corresponding standards proposals are viewed critically. For business folks primed to hear “proprietary” as a positive, it can be surprising to encounter the web community’s loathing of non-standard features.

From far enough away, it may appear as though new features “happen” at the W3C or WHATWG or ECMA or IETF. Some presume that features which are standardised at these organisations originated within them — that essential design work is the product of conversations in committee. If vendors implement what standards say, then surely being part of the standards process is how to affect change.

Standards Theory

This isn’t how things work in practice, nor is it how feature design should work. Instead, new features and modifications are brought to Standards Development Organisations (“SDOs”) by developers and vendors as coherent proposals. It’s important to separate design from standards making. Design is the process of trying to address a problem with a new feature. Standardisation is the process of documenting consensus.

The process of feature design is a messy, exciting exploration embarked upon from a place of trust and hope. It requires folks who have problems (web developers) and the people who can solve them (browser engineers) to have wide-ranging conversations. Deep exploration of the potential solution space — discarding dozens of ideas along the way — is essential. A lot of the work is just getting to agreement on a problem statement. This is not what formal standards processes do.

For boring reasons of IP licensing, anti-trust law, and SDO governance, chartered Working Groups (e.g., at the W3C) state what it is they’re going to deliver, often years before they actually do. Some SDOs require “signs of life” from potential implementers (usually 2 or more) to progress in their processes. IETF famously refers to this as “running code and rough consensus”. Clear problem statements and potential solutions must be proposed and partially agreed to even get a hearing. This puts formal standards-making late in the process (when done correctly).

TL;DR: SDOs and their formal Working Groups aren’t in the business of feature design.

SDOs and Working Groups can’t tell you if a design solves an important problem or solves it well. They are set up to pass judgement on the details of a specification and a consensus surrounding the specific words in spec documents.

Working Groups and SDOs are not fitness functions for features.

Getting a design through committee says next to nothing about its quality. Many an august and thoughtful person has engaged in outrageous groupthink when these processes are asked to predict the future rather than document consensus. Without developers trying a feature and providing feedback, Working Groups quickly wander into irrelevant pastures. And who’s to tell them they’re wrong? They are staffed with experts, after all!

SDOs are best understood as amplifiers: they take raw inputs, filter them to prevent major harm if played at top volume, then use processes to broadcast them. How the inputs came to be gets obscured in this process.

I can report these histories aren’t lost, but they are unattractive. Participants have reasons not to tell them — boredom with a topic, the phantom pain of arguments nearly won, and the responsibilities of statesmanship towards counterparties. Plucky documentarians sometimes try, but standards dramas don’t exactly jump off the page.

When the deeper histories aren’t told or taught, it becomes hard for newcomers to get oriented. There’s no effective counter-narrative to “progress come from standards bodies”, and no SDO is going to turn down new members. Nobody tells developers not to look to SDOs for answers. Confusion reigns.

The Forces At Play

Feature design starts by exploring problems without knowing the answers, whereas participation in Working Groups entails sifting a set of proposed solutions and integrating the best proposals. Late iteration can happen there, but every change made without developer feedback is dangerous — and Working Groups aren’t set up to collect or prioritise it.

Even when consensus is roughly achieved, standards processes are powerless to compel anyone to implement anything, even Working Group participants! Voluntary standards are not regulations. Implementers adopt standards because customers demand interoperability to hedge against vendor market power.

It can’t be repeated enough: the fact of something appearing in a web standard compels nobody to implement, even those who implemented previous versions. This flows from the fundamental relationships between browsers, developers, and users.

Browser vendors exercise absolute control over the bits in their binaries because they need to do what’s best for users. A standard might declare that window.open() creates a new browsing context, yet browsers block popups in practice. Chrome has even formalized this idea of “interventions”.

Preventing abuse is one thing, but perhaps new features are different? A browser can add dozens of proprietary features, but if developers decline to adopt, users don’t benefit. Developer adoption is frequently gated on interoperability, a proxy for “can I reach all the users I want to if my app requires this feature?” That is, developers often decline to use features that aren’t already in every engine they care about.

Circularly, non-implementing engines view features without broad use as cost rather than potential benefit. This is particularly true when the alternative is to improve performance of existing features. A sure way for a browser engineer to attract kudos is to make existing content work better, thereby directly improving things for users who choose your browser. Said differently, product managers intuitively prefer surefire improvements to their products rather than speculative additions which might benefit competitors as much (or more). Enlightened ecosystem thinkers are rare, although the web has been blessed with several of them recently.

Last, but not least, a lack of solidarity looms over the enterprise. Browser engineers don’t make websites for a living, so they also lack intrinsic motivation to improve the situation. Many an epic web standards battle to advance an “obviously” missing feature can be traced to this root.

Note: the whole ecosystem breaks down when users do not have a real choice of engine, e.g. on iOS. When competition is limited vendors are free to structurally under-invest and the platform becomes less dynamic and responsive to needs. Everyone loses, but it’s a classic deadweight loss, making it harder to spot and call out.

The status quo is powerful. Every vendor has plausible (if played-up) reasons not to implement new features. They can stall progress until they are last (or second-to last in a 4+ party ecosystem) to implement. There’s no developer wrath until that point, and it’s easy to cast FUD on not-yet-pervasive features while holding a pocket veto. Further, it’s the job of SDOs and formal Working Groups to kill ideas that are not obviously fit for purpose or which lack momentum.

This situation looks hopeless for folks trying to make progress. The slow pace of some essential but overdue improvements (Responsive Images, CSS variables, ES6 Classes, Promises & async/await, Web Components, Streams, etc.) would give any sane observer pause. Is it worth even trying?

The enormous positive impact that changes to the web platform can deliver makes me believe the answer is “yes”. In Part 2 I’ll share details of how the Chrome Team changed its thinking about feature development and standards and how that’s enabling the entire web community to deliver more progress faster.

Keep Reading: Part 2, Threading The Needle

Hearty thanks to Andrew Betts, Bruce Lawson, and Mariko Kosaka for reviewing drafts and correcting many of my innumerable errors.

Can You Afford It?: Real-world Web Performance Budgets

TL;DR: performance budgets are an essential but under-appreciated part of product success and team health. Most partners we work with are not aware of the real-world operating environment and make inappropriate technology choices as a result. We set a budget in time of <= 5 seconds first-load Time-to-Interactive and <= 2s for subsequent loads. We constrain ourselves to a real-world baseline device + network configuration to measure progress. The default global baseline is a ~$200 Android device on a 400Kbps link with a 400ms round-trip-time (“RTT”). This translates into a budget of ~130-170KB of critical-path resources, depending on composition — the more JS you include, the smaller the bundle must be.

We’ve had the pleasure of working with dozens of teams over the past few years. This work has been illuminating, sometimes in very unexpected ways. One of the most surprising results has been the frequent occurrence of “ambush by JavaScript”:

Business leaders who green-light the development of Progressive Web Apps frequently cite the ability to reach new users with near-zero friction as a primary motivator. At the same time, teams are reaching for tools which make achieving this goal impossible. Nobody is trying to do a poor job, and yet the results of a “completed” PWA project often require weeks or months of painstaking rework to deliver minimally acceptable performance.

This rework delays launch which, in turn, delays gathering data about the viability of a PWA strategy. Teams we aren’t able to work with directly sometimes do not catch these problems until it’s too late, launching experiences which are simply unusable for all but the wealthiest.

Setting A Baseline

Teams that avoid unpleasant surprises tend to share a few traits:

  1. Executive sponsors are enthusiastic. They use “do what it takes” language to describe the efforts to get and stay fast
  2. Performance budgets are set early in the life of the project
  3. Budgets are scaled to a benchmark network & device
  4. Tools and CI systems help them monitor progress & prevent regressions

These properties build on each other: it’s difficult to get the space you need to plan to do things well without decision makers who value user experience and long-term business value. Teams with this support are free to set performance budgets, do “bakeoffs” between competing approaches, and invest in performance infrastructure. They’re also more able to go against the “industry standard” grain when popular tools prove to be inappropriate.

Performance budgets keep everyone on the same. They help to create a culture of shared enthusiasm for improving the lived user experience. Teams with budgets also find it easier to track and graph progress. This helps support executive sponsors who then have meaningful metrics to point to in justifying the investments being made.

Budgets set an objective frame for determining which changes to the codebase represent progress and which are regressions from the user perspective. Without them it’s impossible to avoid slipping into the trap of pretending you can afford more than you can. Very rarely have we seen a team succeed that doesn’t set budgets, gather RUM metrics, and carry representative customer devices.

Partner meetings are illuminating. We get a strong sense for how bad site performance is going to be based on the percentage of engineering leads, PMs, and decision makers carrying high-end phones which they primarily use in urban areas.

Doing better by users involves 2 phases:

  • Challenging assumptions & growing understanding of real-world conditions
  • Automating testing against an objective baseline

Never before have front-end teams enjoyed access to such good performance tools and diagnostic techniques, yet poor results are the norm. What’s going on here?

JS Is Your Most Expensive Asset

One distinct trend is a belief that a JavaScript framework and Single-Page Architecture (SPA) is a must for PWA development. This isn’t true (more on that in a follow-up post), and sites which are built this way implicitly require more script in each document (e.g., for router components). We regularly see sites loading more than 500KB of script (compressed). This matters because all script loading delays the metric we value most: Time to Interactive. Sites with this much script are simply inaccessible to a broad swath of the world’s users; statistically, users do not (and will not) wait for these experiences to load. Those that do experience horrendous jank.

We’re often asked “what’s the big deal about 200KB of JS, some of our images are that size?” A good question! Answering it requires an understanding of how browsers process resources (which differs by type) and the concept of the critical path. For a timely introduction, I recommended Kevin Schaaf’s recent talk.

Late-loading JavaScript can cause “server-side rendered” pages to fail in infuriating ways. This uncanny-valley effect is the reason we focus on when pages become reliably interactive.

Consider a page like:

<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="/styles.css">
<script src="/app.js" async></script>
</head>
<body>
<my-app>
<picture slot="hero-image">
<source srcset="img@desktop.png, img@desktop-2x.png 2x"
media="(min-width: 990px)">
<source srcset="img@tablet.png, img@tablet-2x.png 2x"
media="(min-width: 750px)">
<img srcset="img@mobile.png, img@mobile-2x.png 2x"
alt="I don't know why. It's a perfectly cromunlent word!">
</picture>
</my-app>
</body>
</html>

The browser encounters this document in response for a GET request to https://example.com/. The server sends it as a stream of bytes and when the browser encounters each of the sub-resources referenced in the document, it requests them.

For this page to be done loading it needs to be responsive to user input — the “interactive” in “Time to Interactive”. Browsers process user input by generating DOM events that application code listens to. This input processing happens on document’s main thread, where JavaScript runs.

Here are some operations that can happen on other threads, allowing the browser to stay responsive:

  • Parsing HTML
  • Parsing CSS
  • Parsing and compiling JavaScript (sometimes)
  • Some JS garbage collection tasks
  • Parsing and rasterizing images
  • GPU-accelerated CSS transformations and animations
  • Main-document scrolling (assuming no active touch listeners)

These operations, however, must happen on the main thread:

  • Execution of JavaScript
  • Construction of DOM
  • Layout
  • Processing input (including scrolling w/ active touch listeners)

If our example document wasn’t reliant on JavaScript to construct the <my-app> custom element, the contents of the document would likely be interactive as soon as enough CSS and content was available to render meaningfully.

Script execution delays interactivity in a few ways:

  • If the script executes for more than 50ms, time-to-interactive is delayed by the entire amount of time it takes to download, compile, and execute the JS
  • Any DOM or UI created in JS is not available for use until the script runs

Images, on the other hand, do not block the main thread, do not block interaction when parsed or rasterized, and do not prevent other parts of the UI from getting or staying interactive. Therefore, while a 150KB image won’t appreciably increase TTI, 150KB of JS will delay interactivity by the time required to:

  • Request the code, including DNS, TCP, HTTP, and decompression overhead
  • Parse and compile the top-level functions of the JS
  • Execute the script

These steps are largely serialized.

If script execution could stay under 50ms for a bundle this large, TTI would not be delayed, but that’s not feasible. 150KB of gzipped JavaScript expands to roughly 1MB of code, and as Addy documented, that’s going to take more than a second on most of the world’s phones not including the time to fetch it.

JavaScript is the single most expensive part of any page in ways that are a function of both network capacity and device speed. For developers and decision makers with fast phones on fast networks this is a double-whammy of hidden costs.

Global Ground-Truth

Deciding what benchmark to use for a performance budget is crucial. Some teams and businesses know their audience intimately and can make informed estimates about the devices and networks current and prospective users are on. Most, however, do not have such a baseline easily to-hand. Where to start?

Two numbers set the stage:

The median user is on a slow network. Just how slow is a matter of some debate.

Our metrics at Google show a conflicted picture (which I’m working to get to clarity on). Some systems show median RTTs near ~100ms for 3G users. Others show the median user unable to transmit and receive an individual packet in less than 400ms in some major markets.

I suggest we should be conservative. Contended, over-subscribed cells can make “fast” networks brutally slow, transport variance can make TCP much less efficient, and the bursty nature of web traffic works against us.

Googlers enjoy access to a simulated “degraded 3G” network to help validate the behaviour of their apps under these conditions. It simulates a link with a 400ms RTT and 400-600Kbps of throughput (plus latency variability and simulated packet loss). Given the conflicted data we see across our other systems, this seems about right as a baseline.

Simulated packet loss and variable latency, however, can make benchmarking extremely difficult and slow. The effect of a lost packet during DNS lookup can be a difference of seconds, making it frustrating to compare before/after for changes at development time. Our baseline, then, should probably trade lower throughput/higher-latency for packet loss. What we lose in real-world fidelity, we gain in repeatability and the ability to compare across changes and across products. There’s much, much more to say about the effects of DNS, TLS, network topology, and other factors. For those who want to go deeper on this, I highly recommend Ilya Grigorik’s “High Performance Browser Networking”. The coverage of RRC alone makes it worth your time.

Back to our baseline, we now have a sense for what our simulated network conditions should be: 400ms RTT, 400Kbps bandwidth. What about the device itself?

At last year’s Chrome Dev Summit I discussed some of the thermal and power-limiting factors that create a huge disparity between desktop and mobile device performance. Add onto that the yawning chasm between low-end and high-end device performance thanks to chip design factors like cache sizes, and it can be difficult to know where to set a device baseline. Thankfully, this is somewhat easier than network speeds: more than half of American mobile users are on Android devices. As you look abroad, worldwide smartphone shipments are (and for the past 5 years have been) overwhelmingly Android-based. The average selling price for those devices is falling in most geographies, driven by the ubiquity of Android and relentless price drops within that ecosystem. This, in turn, drives the single most important trend in setting the global web performance budget hardware baseline: the next billion users will largely come online when they can afford to. This will drive declines in smartphone average-selling-price (“ASP”) in emerging markets for the foreseeable future. This, in turn, means that all improvements to transistor-count-per-dollar will translate into lower selling prices, not faster devices (on average).

The true median device from 2016 sold at about ~$200 unlocked. This year’s median device is even cheaper, but their performance is roughly equivalent. Expect continued performance stasis at the median for the next few years. This is part of the reason I suggested the Moto G4 last year and recommend it or the Moto G5 Plus this year.

Putting it all together, our global baseline for performance benchmarking is a:

  • ~$200 (new, unlocked) Android phone
  • On a slow 3G network, emulated at:
    • 400ms RTT
    • 400Kbps transfer

For most technologists, building applications for this environment might as well be farming on Mars. Luckily, this configuration is available on webpagetest.org/easy, meaning we can re-create these conditions here on earth, any time we like.

The Affordability Calculation

The last thing we need for our perf budget is time. How long is too long?

I like Monica’s definition:

…but that’s more qualitative than quantitative. Numerically, we’d prefer every page load occur in under a second (see RAIL). That’s not possible on real-world networks, so we’ve set the following Time-to-Interactive (TTI) metric goal with partners:

  • TTI under 5 seconds for first load
  • TTI under 2 seconds for subsequent loads

We now have everything we need to create a ballpark perf budget for a product in 2017.

First Load

Working backwards from time, network conditions, and the primary stages of the critical path, we get a few interesting results. We can start with our first-load budget of 5 seconds and begin to calculate how much transfer we can afford.

First we subtract 1.6 seconds from our budgets for DNS lookup and TLS handshaking, leaving us 3.4s to work with.

Then, we calculate how much data we can send over this link in 3.4 seconds: 400 Kbps = 50KB/s. 50KB/s * 3.4 = 170KB.

NOTE: This discussion is sure to infuriate competent network engineers. Previous versions of this article discussed slow-start, bdp, tcp window scaling, and the like. They were commensurately difficult to follow. Simplifying has relatively little impact on the overall story, so those details are elided.

Modern web applications are largely composed of JS, meaning we also need to subtract the amount of time the JS needs to parse and evaluate. The gzip compression factor for JS code is between 5x and 7x. 170KB of JS then becomes ~850KB-1MB of JS which, based on earlier estimates, may take a second to run (presuming it doesn’t do any expensive DOM work, which of course it will). Playing with these numbers a little bit, we can get back below 3.4s of download and eval by limiting ourselves to 130KB of JS transferred on the wire.

One last wrench in the works: if any of our critical-path resources come from a different origin (e.g., a CDN), we need to subtract connection setup time for that origin (~1.6s) from the budget, further limiting how much of our 5s we actually get to can spend on network transfer and client-side work.

Putting it all together, under ideal conditions, our rough budget for critical-path resources (CSS, JS, HTML, and data) at:

  • 170KB for sites without much JS
  • 130KB for sites built with JS frameworks

This gives us the ability to consider the single most pressing question in front-end development today: “can you afford it?”

For example, if your JS framework takes ~40KB of transfer on a JS-heavy site (which gets a budget of 130KB thanks to JS eval time), you’re left with only 90KB of “headroom”. Your entire app must fit into that space. A 100KB framework loaded from a CDN is already 20KB over budget.

Think back: your framework of choice might be 40K, but what about that data system? The router you added? Suddenly 130KB doesn’t seem like a lot when you also need to include data, templates, and styles.

Living on a budget means constantly asking yourself “can I really afford this?”

Second Load

In an ideal world, all page loads happen in under a second, but for many reasons that’s often not feasible. Therefore we’re going to give ourselves a bit of a breather and budget 2 seconds for second (third, fourth, etc.) load.

Why not 5? Because we shouldn’t need to ever go to the network to get our app’s UI booted once we’ve visited it the first time. Service Workers and “offline first” architectures enables us to put interactive pixels on screen without ever touching the network. This is the key to achieving reliable performance.

Two seconds is forever in modern CPU terms, but we still need to spend it wisely. Factors we need to account for include:

  • Process creation time (Android is relatively slow vs. other OSes)
  • Time required to read bytes from disk (it’s not zero, even on flash-based storage!)
  • Time to execute and run our code

Every app I’ve seen that hits a 5s initial load and implements offline-first correctly stays under this 2s budget, and sub 1s is possible! But getting to offline-first is a huge challenge for many teams. Architecting to save last-seen user data locally, cache app resources in a reliable and coherent way, and juggle application code upgrades using the Service Worker lifecycle can be a major undertaking.

I’m looking forward to tools continuing to evolve in this area. The most comprehensive bootstrap I know of today is the Polymer App Toolbox, so if you’re not sure where to start, start there.

130-170KB…Surely You’re Kidding!?!

Many teams we talk to wonder if it’s even possible to deliver something useful in as little as 130KB. It is! the PRPL pattern shows the way through aggressive code-splitting based on route awareness, Service Worker caching of granular (subsequent-page) resources, and clever use of modern protocol enhancements like HTTP/2 Push.

Taken together, these tools enable us to deliver functional, modern experiences in under 100KB for the critical path.

Sadly, it’s still sort of difficult to tell from a specific trace which parts of the page load are critical-path resources for TTI and which aren’t, but I’m optimistic that tools will evolve quickly to help us understand this key metric.

Regardless, we know it’s possible, even without giving up on frameworks entirely. Both Wego and Ele.me are built with modern tools (Polymer and Vue, respectively) and help users complete real transactions today. Most apps are less complex than they are. Life on a budget isn’t starvation.

Tools for Teams on a Budget

Getting under-budget is hard, but the benefits to the business and to users are immense. Less often discussed are the benefits to engineering teams and their leaders. No tech-lead or PM wants to be on the wrong side of an executive who walks into their area with a phone asking “so why is this so slow when I’m on vacation?”

This isn’t theoretical.

I’ve seen teams that have just finished re-building on a modern tech stack cringe for an hour as we walk them through the experience of using their “better”, “faster” experiences under real-world conditions.

Everyone loses face when the product fails to meet expectations. Months of unplanned performance fire-fighting delay the addition of new features and have a draining effect on team morale. When performance becomes a crisis, mid-level managers get caught between being the “shit umbrella” their teams count on and crushing self doubt. Worse, they may begin to doubt their team. The other side of a performance crisis is a long road; how can the organisation trust the team to deliver a quality product? Can they trust the TLs to recommend new technology or large re-investments? Recriminations follow. This is a terrible experience, specifically for developers who are too often on the receiving end of incredible pressure to “fix it”, ASAP — and “it” may be a core technology the product is built on.

In the worst cases, the product may be unfixable on a short enough timeframe to help the business. A lot of progress is Darwinian and for startups and small teams, betting on the wrong stack without the benefit of a long runway can be fatal. Worse, this can go un-diagnosed for a long, long time. If the whole team carries the latest iOS devices on fast, urban networks and the product’s economics are premised on growing a broad-based audience, the failure of that audience to arrive barely makes a sound.

Performance isn’t the (entire) product, of course. Lots of slow or market-limited products do incredibly well. Having a unique service that people want (and will go out of their way for) can override all of these other concerns. Some folks even succeed in App Stores where friction-to-acquire an experience is intense. But products in competitive marketplaces need every advantage.

Some specific tools and techniques can help teams that adopt a performance budget:

  • webpagetest.org/easy: this is our go-to tool for one-off analysis.
  • WPT scripting: for teams that don’t want to set up a custom WPT instance and have public URLs for their WIP apps, integrating with WPT scripting can be a great way to get regular “checks”
  • WPT private instances: teams that want to integrate WPT directly into their CI or commit-queue systems should investigate setting up a private WPT server and hardware
  • Scripted Lighthouse: not ready for a full WPT instance? Scripting Lighthouse can help your CI automate analysis of your site and catch regressions
  • grunt-perfbudget is an even-easier, automated WPT testing for your CI. Use it!
  • Speedcurve and Calibre: these hosted services automate tracking performance over time, delivering an outstanding real-world gut-check
  • Webpack Performance Budgets: for teams using webpack in their build steps, enabling this configuration can provide great development-time warning for resources that exceed budgets.
  • bundlesize and pr-bot let you set per-script budgets which can be automatically enforced as part of your pull-request process. Recommended!

Success in combating bloat often means turning warnings into hard errors. Teams with CI or commit-queue systems should strongly consider disallowing commits that break the (performance) bank.

For teams starting fresh, my strong recommendation is to start with a stack that embeds strong opinions about app structure, code splitting, and build targets. The best of those today are:

Whatever tools your team chooses, a budget is essential. Without one, even the most advanced, “lightweight” frameworks can easily create bloated, unusable apps. Starting from the global baseline and only increasing the budget based on hard numbers is the best way I know of to ensure your project lands well for everyone.

Endnotes

In the interest of time and space, discussion of future-friendly architectures will have to wait for another post. The curious can dig into Service Workers, Navigation Preload, and Streams. Their powers combined are going to fundamentally transform the optimal page-load for 2018 and beyond.

Lastly, thanks to everyone who reviewed early drafts of this post, including (but not limited to): Vinamrata Singal, Paul Kinlan, Peter O’Shaughnessy, Addy Osmani, and Gray Norton. Hopefully their valiant attempts to direct this article away from error were not overcome by my talent in adding it.