Can developer productivity be measured?
Defining and measuring programmer productivity is something of a great white whale in the software industry. It’s the basis of enormous investment, the value proposition of numerous startups, and one of the most difficult parts of an engineering manager or CTO’s job description. It’s also a source of anxiety for developers at all experience levels: how do you know if you’re doing enough, both on and off the clock? When everything you do is intangible, how should you measure it? Can it be measured at all? In this article I’ll discuss the biggest pitfalls of productivity measurement and a few ways to do it well.
In software development, as in any other field, many people think of productivity in terms of inputs and outputs. A full-time developer works 40 hours per week for an average salary of $107,510 per year in the United States. Hours and salary are visible, easily quantifiable inputs. The developer then produces software features, documentation, deployments, and/or bug fixes on a recurring basis. These are outputs. If developers are as simple as the software we imagine they are writing, then increasing their productivity should be as simple as asking them to work more hours or paying them higher salaries. Of course, this is a fairy tale. Neither developers nor software work like that.
The problems of input measurement
“Hours worked” is one of several false metrics used as a proxy for job performance. I mention it first because it’s an oft-unexamined default, a path of least resistance. If a company doesn’t intentionally avoid doing so, it will sooner or later deteriorate into an hours-only environment. Outside of a pandemic where remote work is the norm, the symptoms of an hours-only environment are easy to recognize. Working hours are seen as non-negotiable, and being present at the office is seen as proof that someone is working. Anyone who tries to leave the office a couple hours early is met with hostility (sometimes as muted as a few raised eyebrows, sometimes more brazen). Anyone who works late into the evening or comes in on the weekend is seen as a high performer. The incentives of this “last to leave the gym” culture are unfortunate: developers are pushed to spend more and more of their lives at work, left without any other way to demonstrate their value, and lulled into paying only secondary attention to their work output. As time goes on, the workplace becomes more and more a place where everyone is working but nothing is getting done.
The problems don’t end there. If we assume that all work is “positive work”—that is, that all work represents progress toward a goal—then we are mistaken. Developers who have worked while exhausted, distracted, or sick tend to be familiar with the concept of “negative work”: work so poorly done that it must be undone or compensated for later, thus increasing rather than decreasing the amount of work remaining. Software development is complex, abstract, attentive work, and therefore hypersensitive to a developer’s mental state. That is, there are hidden inputs at play: anxiety, depression, burnout, toxicity at work, grief, microaggressions, and a hundred other things that can reduce or invert individual productivity on any given day. If company culture demands long hours week after week, or even just eight-hour days with no flexibility or vacation time, developers will inevitably spend time doing negative work: they will literally accomplish less by staying late than they would have if they had gone home earlier. And due to fatigue, they’ll accomplish less the next day too.
On the other hand, an hours-only environment is not the worst-case scenario. It has a semblance of fairness about it: if two developers are working the same number of hours, there is one clear dimension on which they are equals. Neither of them appears to be slacking off, and neither appears to be doing more than their fair share. If they produce less than expected, well, at least they put in their time. And the “hours worked” metric doesn’t explicitly incentivize bad code like some metrics do. So while it’s a poor metric, and even works against productivity in many situations, there are much worse metrics we should discuss.
Consider the other obvious input to software development: money. I have jokingly suggested to my manager once or twice that productivity should be measured by salary, and if my salary were doubled I would produce code at the level of a world-class software architect. Of course, you know intuitively that this is ridiculous. Paying someone more money doesn’t immediately make them more productive (although, indirectly and on a limited scale, it may). Yet, in my mind, money and hours belong to the same category: not just inputs, but auxiliary ones, only tenuously driving productivity. One is given by the employer, the other by the employee, but this exchange is incidental to the creation of useful software.
Long story short, measuring inputs is a deficient technique because software development is not an equation and code cannot be built by assembly line. So let’s talk about outputs.
The pitfalls of output measurement
Here, perhaps counterintuitively, we find many of the worst metrics in the software development world. Some have famously fallen into the trap of thinking that the work output of software development is lines of code or commits in version control. Certainly these are part of the process, but they’re more like byproducts than results. Strictly speaking, a line of code that doesn’t solve a problem is worse than no code at all. So measuring a developer’s productivity by how much code they contribute is like measuring a power plant by how much waste they produce or measuring Congress by how many bills they pass; it’s tangential to actual value.
What’s worse, gaming these measurements is trivially easy. A developer who gets paid per line of code can easily earn an entire year’s salary in a single day without creating any business value whatsoever. Most developers will adopt a subtler approach, but all the same, you should be careful what you wish for.
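To make the point concrete, here is a minimal Python sketch. It is my own illustration under stated assumptions, not taken from any real codebase: the two `total` functions and the `count_code_lines` helper are hypothetical. Both versions behave identically, yet a naive line counter scores one of them four times higher.

```python
# Two hypothetical implementations with identical behavior.
COMPACT = '''
def total(prices):
    return sum(prices)
'''

PADDED = '''
def total(prices):
    result = 0
    index = 0
    while index < len(prices):
        price = prices[index]
        result = result + price
        index = index + 1
    return result
'''


def count_code_lines(source: str) -> int:
    """Count non-blank lines: the naive 'lines of code' measure."""
    return sum(1 for line in source.splitlines() if line.strip())


def load_total(source: str):
    """Execute the source text and return the `total` function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["total"]


compact_loc = count_code_lines(COMPACT)
padded_loc = count_code_lines(PADDED)

# Same input, same answer: the extra lines add no value.
prices = [3, 4, 5]
same_result = load_total(COMPACT)(prices) == load_total(PADDED)(prices)

print(f"compact: {compact_loc} lines, padded: {padded_loc} lines, "
      f"same result: {same_result}")
```

A developer rewarded per line has every reason to write the second version, and nothing in the metric can tell the difference.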
“When a measure becomes a target, it ceases to be a good measure.” (Goodhart’s law)
Developers, by and large, understand this—and yet, embarrassingly, we still tend to use commits and lines of code as proverbial peacock feathers. Our eyes widen when we read that Google (meaning all Google-branded products, as of 2015) spans over two billion lines of code, or that the Windows team does over 8,400 code pushes per day, even though we know that neither of these is what makes Google or Windows useful. Sometimes the community even produces nonsense like this:
(As an aside, I congratulate the person whose contribution graph this is for building a daily coding habit, and also for taking a day off now and then. Both positive signs as far as I’m concerned, although I wouldn’t go so far as to say this person is productive without a much deeper look at their contribution history.)
In any case, we can add these measures to our list of ineffective proxies. Measuring productivity in terms of bugs fixed, tasks completed, or features shipped is equally futile, if marginally more difficult to game. If the goal is to fix more bugs, developers can write intentionally buggy software and then write a plethora of fixes; or, to achieve the opposite goal, they can reduce their bug count by writing features as slowly as possible. If the goal is to ship features, they can write them quickly and naively, resulting in slow and barely-functioning software; if the goal is to finish tasks, the entire team can dissolve into politics as each developer jockeys for the easiest (or most overestimated) ones. A good team may be able to ignore your measures and just work, but even in the best of circumstances a bad measure is a hindrance that’s hard to ignore.
Some organizations, in a display of profound paranoia, install spyware on their employees’ computers to track the minutiae of their moment-to-moment work with artifacts like mouse movements, keypresses, and screenshots. It’s unclear to me how any employee can do creative work under this kind of scrutiny. I expect most developers would quit immediately. But as with the measures discussed above, this one’s most obvious failing is that it doesn’t capture anything truly meaningful to the business or its customers. Would you discipline a highly productive developer because they spend a lot of time on Reddit or don’t move their mouse enough? Would you promote a developer because they spend a lot of time typing in Visual Studio, even if they’re difficult to work with? Some managers apparently do, but hopefully most of us are smarter than that.
Measuring productivity at the right level
Now that you’ve been warned off the worst measures you might be tempted to use, let’s talk about a few good ones. Unfortunately, individual performance can rarely be measured beyond a binary state of “this team member contributes” or “this team member does not contribute.” And it cannot be measured at a distance.
A software development team is not a group of isolated individuals working alone; each team member’s work output is a function of work output from all their teammates, not to mention several meaningful non-measurable interactions throughout the day. The interdependencies and nuances of individual work are too complex to be measured by an outside observer. For example, some team members are force multipliers for the rest of their team: they may not accomplish a lot on their own, but their teammates would be significantly less productive without their help and influence. Individuals like this are a secret weapon of effective engineering organizations, but their productivity cannot be measured on an individual scale. Other team members may not produce a lot of features, but act as “code janitors,” carefully testing, cleaning up, and refactoring code wherever they go so that their teammates can develop features more quickly and painlessly. Their productivity as individuals is also impossible to measure, but their effect on the team’s productivity compounds over time. Even for programmers who regularly ship new features, productivity tends to vary greatly over the short term, frustrating efforts to track it with any specificity. For reasons like these, individual performance is best left for individual contributors to assess in themselves and each other.
Team performance, on the other hand, is far more visible. Perhaps the best way to track it is to ask, does this team consistently produce useful software on a timescale of weeks to months? This echoes the third Agile principle: “Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.” A team that produces useful software on a regular basis is productive. A team that doesn’t should be asked why not. There are usually legitimate reasons for a lack of productivity; most unproductive teams want to be productive, and most productive teams want to be more productive.
Team productivity can be measured at an organizational scale with simple, holistic observations. And since teammates tend to be well aware of each other’s contributions (whether measurable or not), any serious failings in individual productivity can be discovered by means of good organizational habits, such as having frequent one-on-one interviews between managers and their direct reports; regularly gathering honest, anonymous feedback; and encouraging each team member to exercise personal accountability by reporting their accomplishments and taking responsibility for their failures.
There’s a lot here that depends on human beings rather than trend charts and raw data. This is an inescapable fact of software: it’s far more about humans than ones and zeros, and always has been. Productivity tracking tools and incentive programs will never have as great an impact as a positive culture in the workplace. And when accountability and healthy communication are baked into this type of culture, critical moments for productivity will quickly become visible to the people most able to address them.
Many organizations use velocity as their preferred metric for team productivity, and when done right, this can be a useful tool for understanding the software development process. Velocity is an aggregate measure of tasks completed by a team over time, usually taking into account developers’ own estimates of the relative complexity of each task. It answers questions like, “how much work can this team do in the next two weeks?” The baseline answer is “about as much as they did in the last two weeks,” and velocity is the context for that statement. It’s a planning measure, not a retrospective measure, and anyone who tries to attach incentives to it will find that its accuracy evaporates under pressure (for more on this, see The Nature of Software Development by Ron Jeffries). Understanding the velocity of a team, department or company can be foundational as you prioritize feature development, set expectations with clients, and plan the future of your products.
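As a rough sketch of how velocity works as a planning measure, consider the Python below. Everything in it is an assumption for illustration: the story-point numbers are invented, and `velocity` and `forecast` are hypothetical helpers that use a plain average over recent sprints. Real teams may window or weight their history differently.

```python
from statistics import mean

# Hypothetical history: each sprint lists the complexity estimates
# (story points) of the tasks the team actually completed.
completed_sprints = [
    [3, 5, 2, 8],     # sprint 1
    [5, 5, 3],        # sprint 2
    [8, 3, 3, 2, 1],  # sprint 3
]


def velocity(sprint):
    """Velocity of one sprint: completed tasks weighted by complexity."""
    return sum(sprint)


def forecast(history, window=3):
    """Baseline plan: about as much as the team did in recent sprints."""
    recent = history[-window:]
    return mean(velocity(sprint) for sprint in recent)


velocities = [velocity(sprint) for sprint in completed_sprints]
next_sprint_capacity = forecast(completed_sprints)

print(f"past velocities: {velocities}")
print(f"planning capacity for next sprint: {next_sprint_capacity} points")
```

Note that the only use of the number here is to decide how much work to pull into the next sprint. Nothing in the sketch ranks people or rewards a higher total, which is the point of treating velocity as a planning measure.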
There is no useful measure that operates at a finer grain than “tasks multiplied by complexity.” Measuring commits, lines of code, or hours spent coding, as some tools do, is no more useful at a team scale than it is at an individual scale. There simply is no relation between the number of code artifacts a team produces, or the amount of time they spend on them, and the value of their contributions.
Many organizations thrive without any hard-and-fast measures at all. In organizations where useful software is well-understood to be both the goal and the primary (albeit hard-to-quantify) measured result of development work, and inputs are correspondingly deprioritized, there are profound and far-reaching implications. Developers are liberated to do their best work, whenever and wherever they’re most productive. This may or may not look like a 9-to-5. Some will, by preference or necessity, do the bulk of their work early in the morning and late at night. Others will work in odd chunks: an hour here, a few more hours there. Some will work at home, some at the office, and others on the road. This is a feature, not a bug. It emphasizes true productivity rather than trying to shoehorn it into an observable heuristic, and it makes the workplace viable for a deeper talent pool that includes, for example, working parents and people with disabilities. Much has been written and said about the benefits of Results Only Work Environments (ROWE), remote work, reducing time spent in meetings, and flexible hours; each of these is just a manifestation of savvy productivity measures.
It’s been said that you get what you measure. So it follows that you should only measure what you really, truly want, whether or not it can be drawn as a line graph. For some, it can be frustrating to do or manage work that can’t be reduced to a number. But with work as nuanced and abstract as software development, the further we entrench ourselves in details, the more we defeat our own purposes. Useful software is our goal, and we shouldn’t settle for (or measure) anything less.
Tags: management, measurement, productivity
A note: generally it is considered proper and less offensive to use people first language, i.e. people with disabilities.
Person-first language is often rejected by those who are supposed to be its beneficiaries, including autistic people such as myself. You can even see that in the “Criticism” section of the Wikipedia article that you cited.
The logic of person-first language seems to be that putting the word “person” before one’s condition/disability emphasizes one’s personhood over the condition. That logic runs headlong into the fact that emphasis and word order are only tenuously related.
I’m an autistic person. I believe that “people first language” is just a silly word game. The idea that re-ordering a sentence can in some way change a person’s perception when they hear it is weakly supported by experimental evidence.
I actually find the practice of attempting to manipulate me with carefully crafted sentence structure the greater offence.
Call a spade a spade.
Maybe it was intentional in your case, but I’ve been told not to use that phrase (call a spade a spade) because it could be interpreted as having racist connotations. Which is weird, because as far as I know it has been about garden tools for about 500 years. Apparently calling a person a spade is a euphemism for “black person” that started in the 1920s, though, and due to the fairly racist status quo in those days (and even now, really) it picked up a negative connotation. Google “npr is-it-racist-to-call-a-spade-a-spade” if you want the full discussion. [I’m aware the claim could be made that in certain communities, calling a person a racist is in itself a type of hate speech or at least divisive speech. To be fair, with most forms of social identity and mental attributes, using a binary category to describe a person falls flat; it should mostly be used for relative measurements. There is no such thing as “hot” or “cold”, there is only “hotter than X” and “colder than Y”. For brevity’s sake, we often leave out the X and Y, and that leads to all sorts of problems.]
The phrase comes from Plutarch in ancient Greece: https://en.wikipedia.org/wiki/Call_a_spade_a_spade
Is this truly the most noteworthy thing (in your eyes) about this article?
Programmers are today’s coal mine workers, being squeezed by people who are not really involved in building the software and who have better salaries. I recommend that programmers start their own businesses and get out of companies as fast as possible.
That very much depends on the company. I wouldn’t recommend starting your own business if you’re not a business person. A good company that has your back beats doing fiscal paperwork you’re not fond of on the weekends any time (speaking from my own experience).
I totally agree with you. Additionally, programmers are turned into a commodity (just look at a typical whiteboarding job interview for confirmation).
But unfortunately, for the majority it is hard to switch from an employee mentality to being on your own (questions of what to do, the feeling of security, etc.).
Did anyone else do the math on the 14,489 commits per year Tweet? If we assume that’s a normal 2080 hours a year (which I’m guessing it probably isn’t), that’s an average of almost 7 commits an hour, or 1 commit about every 8.6 minutes. I hope that’s for a team, a massively large project, or even multiple projects, not a single dev. Otherwise there are likely thousands of commits that aren’t actually useful.
“I added some whitespace, I’d better commit.”
“I deleted some whitespace, I’d better commit.”
“I renamed a variable, I’d better commit.”
Are you assuming that someone is working 9-5, 5 days a week?
I know startup founders who work 10 hours a day, 7 days a week.
Even at 80 hours a week, a commit every 17 minutes (on average) is completely unrealistic for a single person.
As a startup founder, I definitely understand working 12-14 hrs a day, 6 days a week, which I did for years. Yet there’s no way what I was doing merited commits that often. I also had to do way more than just program that whole time, and that’s even with spending 40 hrs/week of that time as a dev at my day job.
With that many commits, they likely have to spend a considerable amount of time not developing. At 5 days a week, that’s an average of nearly 56 commits a day. Even working 365 days that year, it’s still nearly 40 commits a day. If you have even close to a decent commit workflow with code reviews and other checks and balances to prevent issues, someone is spending massive amounts of time dealing with those commits.
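For anyone who wants to re-run the numbers in this thread, here is a small Python sketch. It assumes the 14,489 commits figure quoted above and the working patterns mentioned in these comments (2,080 hours a year, 80-hour weeks, 5-day weeks, and all 365 days); the function names are hypothetical and chosen only for illustration.

```python
COMMITS_PER_YEAR = 14_489  # figure quoted in the comments above


def minutes_between_commits(hours_per_year):
    """Average gap between commits, in minutes, for a given workload."""
    return hours_per_year * 60 / COMMITS_PER_YEAR


def commits_per_day(days_per_year):
    """Average commits per working day."""
    return COMMITS_PER_YEAR / days_per_year


full_time = minutes_between_commits(40 * 52)    # 2,080 hours a year
double_time = minutes_between_commits(80 * 52)  # 80-hour weeks
weekdays_only = commits_per_day(5 * 52)         # 260 working days
every_day = commits_per_day(365)                # no days off at all

print(f"40 h/week: one commit every {full_time:.1f} minutes")
print(f"80 h/week: one commit every {double_time:.1f} minutes")
print(f"5 days/week: {weekdays_only:.1f} commits per day")
print(f"365 days/year: {every_day:.1f} commits per day")
```

Whichever assumption you pick, the pace stays implausible for one person writing meaningful code changes.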
GitHub activity includes comments on issues, opening PRs, and forking repos. This guy is just trying to make himself look cool, but obviously the majority of that is not code. It’s just “activity”.
Developer Productivity = Meeting Deadlines
Estimation is an art learned over many years of experience, and the first time you work on a platform or technology, you should increase your estimates by 25%.
I’ve worked in software development for 20 years, and have always found that having a solid foundation in the tech you are using helps speed up progress, but the number one way to measure productivity is the ability to deliver on time, where the key variable is being able to accurately estimate how long your work will take.
I now run a team of 10.
If you are really good with estimations, you are a bad software developer.
As simple as that.
Well, because if you were good, you would have already automated away all the parts that are “easy to estimate” because you have done them multiple times and know what has to be done. Otherwise, how could you be good at estimating them in the first place? 🙂
Well, in the long term, I’d prefer that a developer’s organization measure the quality of the developer’s work. This is much more relevant to the outcome of what someone builds.
The quality of development work is now relatively easy to measure:
* how a piece of code is documented
* how it is tested
* how well it addresses the specifications
All these criteria are closely related to how the organisation is managed, how the best rules are shared within the company, etc.
Why is defining and measuring programmer productivity necessary? Lack of trust?
In the worst case, yes, it’s a lack of trust – the same reason why so many businesses are wary of remote work. But in many cases it’s simply an outdated assembly-line attitude about software development: if we can measure production, then we can create goals and incentives to increase it. Even though this attitude is ultimately self-defeating when it comes to software, its purported effectiveness in a few other fields (e.g. manufacturing, logistics, marketing) has given it a lot of unearned reach.
I believe that if your productivity is measured by this particular thing, then he did a great job :).
It shouldn’t be about how much code you’ve contributed but about the quality of clean code. To me, a whole year’s work can’t add up to that many commits. We have a policy that you push your code for each new feature added or problem resolved. Clean code and quality matter a lot.
hi~ very very informative! thank you for the insightful article.
But dear stackoverflow team, comment section here is very very touch-sensitive and i had to tap ‘cancel reply’ several times while i was scrolling to the bottom. it can be improved maybe? 🥰
It’s hard to do much better than a simple performance review at project completion. Did the developer meet expectations, exceed them (assign bigger/more important projects), or fall short in some way (remedial action required, or assign simpler tasks – maybe eventually three strikes and you’re out)? Developers quickly find their appropriate level.
I think a good measure of an engineer’s productivity is to work backwards. Take an 8-hour work day and subtract all the time spent on activities that produce no value (useless meetings, bureaucratic processes, waiting for compilation to complete, etc.). What remains would be my measure of productivity.