code-for-a-living December 7, 2020

Can developer productivity be measured?

Defining and measuring programmer productivity is something of a great white whale in the software industry. It’s the basis of enormous investment, the value proposition of numerous startups, and one of the most difficult parts of an engineering manager or CTO’s job description. It’s also a source of anxiety for developers at all experience levels: how do you know if you’re doing enough, both on and off the clock? When everything you do is intangible, how should you measure it? Can it be measured at all? In this article I’ll discuss the biggest pitfalls of productivity measurement and a few ways to do it well.

In software development, as in any other field, many people think of productivity in terms of inputs and outputs. A full-time developer works 40 hours per week for an average salary of $107,510 per year in the United States. Hours and salary are visible, easily quantifiable inputs. The developer then produces software features, documentation, deployments, and/or bug fixes on a recurring basis. These are outputs. If developers are as simple as the software we imagine they are writing, then increasing their productivity should be as simple as asking them to work more hours or paying them higher salaries. Of course, this is a fairy tale. Neither developers nor software work like that.

The problems of input measurement

“Hours worked” is one of several false metrics used as a proxy for job performance. I mention it first because it’s an oft-unexamined default, a path of least resistance. If a company doesn’t intentionally avoid doing so, it will sooner or later deteriorate into an hours-only environment. Outside of a pandemic where remote work is the norm, the symptoms of an hours-only environment are easy to recognize. Working hours are seen as non-negotiable, and being present at the office is seen as proof that someone is working. Anyone who tries to leave the office a couple hours early is met with hostility (sometimes as muted as a few raised eyebrows, sometimes more brazen). Anyone who works late into the evening or comes in on the weekend is seen as a high performer. The incentives of this “last to leave the gym” culture are unfortunate: developers are pushed to spend more and more of their lives at work, left without any other way to demonstrate their value, and lulled into paying only secondary attention to their work output. As time goes on, the workplace becomes more and more a place where everyone is working but nothing is getting done.

The problems don’t end there. If we assume that all work is “positive work”—that is, that all work represents progress toward a goal—then we are mistaken. Developers who have worked while exhausted, distracted, or sick tend to be familiar with the concept of “negative work”: work so poorly done that it must be undone or compensated for later, thus increasing rather than decreasing the amount of work remaining. Software development is complex, abstract, attentive work, and therefore hypersensitive to a developer’s mental state. That is, there are hidden inputs at play: anxiety, depression, burnout, toxicity at work, grief, microaggressions, and a hundred other things that can reduce or invert individual productivity on any given day. If company culture demands long hours week after week, or even just eight-hour days with no flexibility or vacation time, developers will inevitably spend time doing negative work: they will literally accomplish less by staying late than they would have if they had gone home earlier. And due to fatigue, they’ll accomplish less the next day too.

On the other hand, an hours-only environment is not the worst-case scenario. It has a veneer of fairness about it: if two developers are working the same number of hours, there is one clear dimension on which they are equals. Neither of them appears to be slacking off, and neither appears to be doing more than their fair share. If they produce less than expected, well, at least they put in their time. And the “hours worked” metric doesn’t explicitly incentivize bad code like some metrics do. So while it’s a poor metric, and even works against productivity in many situations, there are much worse metrics we should discuss.

Consider the other obvious input to software development: money. I have jokingly suggested to my manager once or twice that productivity should be measured by salary, and if my salary were doubled I would produce code at the level of a world-class software architect. Of course, you know intuitively that this is ridiculous. Paying someone more money doesn’t immediately make them more productive (although, indirectly and on a limited scale, it may). Yet, in my mind, money and hours belong to the same category: not just inputs, but auxiliary ones, only tenuously driving productivity. One is given by the employer, the other by the employee, but this exchange is incidental to the creation of useful software.

Long story short, measuring inputs is a deficient technique because software development is not an equation and code cannot be built by assembly line. So let’s talk about outputs.

The pitfalls of output measurement

Here, perhaps counterintuitively, we find many of the worst metrics in the software development world. Some have famously fallen into the trap of thinking that the work output of software development is lines of code or commits in version control. Certainly these are part of the process, but they’re more like byproducts than results. Strictly speaking, a line of code that doesn’t solve a problem is worse than no code at all. So measuring a developer’s productivity by how much code they contribute is like measuring a power plant by how much waste it produces or measuring Congress by how many bills it passes; it’s tangential to actual value.

What’s worse, gaming these measurements is trivially easy. A developer who gets paid per line of code can easily earn an entire year’s salary in a single day without creating any business value whatsoever. Most developers will adopt a subtler approach, but all the same, you should be careful what you wish for.
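
To make the point concrete, here is a minimal sketch of how a line-count metric can be inflated without changing behavior. Everything in it is hypothetical and for illustration only: the `total` function, the `count_lines` metric, and the sample data are assumptions, not taken from any real codebase or tool.

```python
# Two hypothetical implementations of the same behavior, kept as source
# text so that their line counts can be compared directly.
COMPACT = '''
def total(prices):
    return sum(prices)
'''

PADDED = '''
def total(prices):
    result = 0
    index = 0
    count = len(prices)
    while index < count:
        price = prices[index]
        result = result + price
        index = index + 1
    return result
'''


def count_lines(source: str) -> int:
    """Count non-blank lines: a naive 'lines of code' metric."""
    return sum(1 for line in source.splitlines() if line.strip())


def load_total(source: str):
    """Execute the source text and return the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


compact_total = load_total(COMPACT)
padded_total = load_total(PADDED)
sample = [3, 4, 5]

# Same result for the same input, but the padded version "scores"
# several times higher on the line-count metric.
print(count_lines(COMPACT), compact_total(sample))
print(count_lines(PADDED), padded_total(sample))
```

Both versions return the same answer for the same input, so the business value is identical; only the metric moves.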

When a measure becomes a target, it ceases to be a good measure.
~Goodhart’s Law

Developers, by and large, understand this—and yet, embarrassingly, we still tend to use commits and lines of code as proverbial peacock feathers. Our eyes widen when we read that Google (meaning all Google-branded products, as of 2015) spans over two billion lines of code, or that the Windows team does over 8,400 code pushes per day, even though we know that neither of these is what makes Google or Windows useful. Sometimes the community even produces nonsense like this:

[Image: a developer’s contribution graph]

(As an aside, I congratulate the person whose contribution graph this is for building a daily coding habit, and also for taking a day off now and then. Both positive signs as far as I’m concerned, although I wouldn’t go so far as to say this person is productive without a much deeper look at their contribution history.)

In any case, we can add these measures to our list of ineffective proxies. Measuring productivity in terms of bugs fixed, tasks completed, or features shipped is equally futile, if marginally more difficult to game. If the goal is to fix more bugs, developers can write intentionally buggy software and then write a plethora of fixes; or, to achieve the opposite goal, they can reduce their bug count by writing features as slowly as possible. If the goal is to ship features, they can write them quickly and naively, resulting in slow and barely-functioning software; if the goal is to finish tasks, the entire team can dissolve into politics as each developer jockeys for the easiest (or most overestimated) ones. A good team may be able to ignore your measures and just work, but even in the best of circumstances a bad measure is a hindrance that’s hard to ignore.

Some organizations, in a display of profound paranoia, install spyware on their employees’ computers to track the minutiae of their moment-to-moment work with artifacts like mouse movements, keypresses, and screenshots. It’s unclear to me how any employee can do creative work under this kind of scrutiny. I expect most developers would quit immediately. But as with the measures discussed above, this one’s most obvious failing is that it doesn’t capture anything truly meaningful to the business or its customers. Would you discipline a highly productive developer because they spend a lot of time on Reddit or don’t move their mouse enough? Would you promote a developer because they spend a lot of time typing in Visual Studio, even if they’re difficult to work with? Some managers apparently do, but hopefully most of us are smarter than that.

Measuring productivity at the right level

Now that you’ve been warned off the worst measures you might be tempted to use, let’s talk about a few good ones. Unfortunately, individual performance can rarely be measured beyond a binary state of “this team member contributes” or “this team member does not contribute.” And it cannot be measured at a distance.

A software development team is not a group of isolated individuals working alone; each team member’s work output is a function of the work output of all their teammates, not to mention many meaningful but unmeasurable interactions throughout the day. The interdependencies and nuances of individual work are too complex to be measured by an outside observer. For example, some team members are force multipliers for the rest of their team: they may not accomplish a lot on their own, but their teammates would be significantly less productive without their help and influence. Individuals like this are a secret weapon of effective engineering organizations, but their productivity cannot be measured on an individual scale. Other team members may not produce a lot of features, but act as “code janitors,” carefully testing, cleaning up, and refactoring code wherever they go so that their teammates can develop features more quickly and painlessly. Their productivity as individuals is also impossible to measure, but their effect on the team’s productivity is outsized. Even for programmers who regularly ship new features, productivity tends to vary greatly over the short term, frustrating efforts to track it with any specificity. For reasons like these, individual performance is best left for individual contributors to measure in themselves and each other.

Team performance, on the other hand, is far more visible. Perhaps the best way to track it is to ask, does this team consistently produce useful software on a timescale of weeks to months? This echoes the third Agile principle: “Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.” A team that produces useful software on a regular basis is productive. A team that doesn’t should be asked why not. There are usually legitimate reasons for a lack of productivity; most unproductive teams want to be productive, and most productive teams want to be more productive.

Team productivity can be measured at an organizational scale with simple, holistic observations. And since teammates tend to be well aware of each other’s contributions (whether measurable or not), any serious failings in individual productivity can be discovered by means of good organizational habits, such as having frequent one-on-one interviews between managers and their direct reports; regularly gathering honest, anonymous feedback; and encouraging each team member to exercise personal accountability by reporting their accomplishments and taking responsibility for their failures.

There’s a lot here that depends on human beings rather than trend charts and raw data. This is an inescapable fact of software: it’s far more about humans than ones and zeros, and always has been. Productivity tracking tools and incentive programs will never have as great an impact as a positive culture in the workplace. And when accountability and healthy communication are baked into this type of culture, critical moments for productivity will quickly become visible to the people most able to address them.

Many organizations use velocity as their preferred metric for team productivity, and when done right, this can be a useful tool for understanding the software development process. Velocity is an aggregate measure of tasks completed by a team over time, usually taking into account developers’ own estimates of the relative complexity of each task. It answers questions like, “how much work can this team do in the next two weeks?” The baseline answer is “about as much as they did in the last two weeks,” and velocity is the context for that statement. It’s a planning measure, not a retrospective measure, and anyone who tries to attach incentives to it will find that its accuracy evaporates under pressure (for more on this, see The Nature of Software Development by Ron Jeffries). Understanding the velocity of a team, department or company can be foundational as you prioritize feature development, set expectations with clients, and plan the future of your products.
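
As a rough illustration of velocity as a planning measure, the sketch below sums the team’s own complexity estimates for completed tasks in each sprint and uses recent sprints as the baseline forecast. The `Task` structure, the sample tasks, and the optional averaging window are assumptions made for this example; real teams and tools define and track velocity in their own ways.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Task:
    name: str
    points: int  # the team's own relative complexity estimate
    done: bool


def sprint_velocity(tasks):
    """Sum the estimates of the tasks that were actually completed."""
    return sum(task.points for task in tasks if task.done)


def forecast(past_velocities, window=1):
    """Baseline forecast: about as much as the most recent sprint(s).

    window=1 mirrors "as much as they did in the last two weeks";
    a larger window (an assumption here) smooths short-term variation.
    """
    return mean(past_velocities[-window:])


# Hypothetical sprint data, for illustration only.
sprints = [
    [Task("login form", 3, True), Task("search API", 5, True),
     Task("billing export", 8, False)],
    [Task("billing export", 8, True), Task("fix flaky test", 2, True)],
    [Task("audit log", 5, True), Task("dark mode", 3, True),
     Task("single sign-on", 13, False)],
]

velocities = [sprint_velocity(sprint) for sprint in sprints]  # [8, 10, 8]
print(velocities, forecast(velocities), forecast(velocities, window=3))
```

Note that nothing here ranks individuals or rewards a higher number: the output is only an input to planning the next sprint.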

There is no useful measure that operates at a finer grain than “tasks multiplied by complexity.” Measuring commits, lines of code, or hours spent coding, as some tools do, is no more useful at a team scale than it is at an individual scale. There simply is no relation between the number of code artifacts a team produces, or the amount of time they spend on them, and the value of their contributions.

Many organizations thrive without any hard-and-fast measures at all. When an organization understands that useful software is both the goal and the primary (albeit hard-to-quantify) result of development work, and deprioritizes inputs accordingly, the implications are profound and far-reaching. Developers are liberated to do their best work, whenever and wherever they’re most productive. This may or may not look like a 9-to-5. Some will, by preference or necessity, do the bulk of their work early in the morning and late at night. Others will work in odd chunks: an hour here, a few more hours there. Some will work at home, some at the office, and others on the road. This is a feature, not a bug. It emphasizes true productivity rather than trying to shoehorn it into an observable heuristic, and it makes the workplace viable for a deeper talent pool that includes, for example, working parents and people with disabilities. Much has been written and said about the benefits of Results Only Work Environments (ROWE), remote work, reducing time spent in meetings, and flexible hours; each of these is just a manifestation of savvy productivity measurement.

It’s been said that you get what you measure. So it follows that you should only measure what you really, truly want—whether or not it can be drawn as a line graph. For some, it can be frustrating to do or manage work that can’t be reduced to a number. But with work as nuanced and abstract as software development, the further we entrench ourselves in details, the more we defeat our own purposes. Useful software is our goal, and we shouldn’t settle for (or measure) anything less.
