Measuring the performance of software engineering teams has long been seen as a complicated, daunting task. This is particularly true as software becomes more complex and more decentralized.
To deliver better software, engineering teams need visibility and data so they can make the decisions that help them continuously improve. The applications that software engineering teams use to manage their processes and release their software have access to more data than ever before. Teams can use this data to measure their performance, provided they know which data most accurately reflects team performance.
The DevOps Research and Assessment (DORA) team at Google designed a six-year program to understand what sets high-performing software engineering teams apart from low-performing software engineering teams. They surveyed thousands of teams across multiple industries to measure and understand DevOps practices and capabilities. It is the longest-running academically rigorous investigation of its kind, providing visibility into what drives high performance in technology delivery and, ultimately, organizational outcomes.
The DORA team had two hypotheses that they wanted to validate:
- Software engineering team performance can be measured in a meaningful way.
- High-performing software engineering teams (based on the measures they found) can predict wider organizational performance. In simple terms, high-performing teams bring high value to organizations.
DORA found that high-performing organizations focused on engineering outcomes over outputs and teams over individuals. From the research, they identified four key metrics that indicate the performance of a software engineering team:
- Deployment frequency
- Mean change lead time
- Mean time to recovery
- Change failure rate
These metrics provide a data-driven approach to analyzing and improving performance based on real research. DORA used these metrics to identify elite, high-, medium-, and low-performing teams. Their research found that elite teams were twice as likely as low-performing teams to achieve or surpass their organizational performance goals.
Let’s dive a bit deeper into these four metrics to understand how focusing on them can help teams deliver software faster and more effectively.
The DORA engineering metrics
At a high level, the DORA engineering metrics measure the velocity of a software engineering team and the stability of the software they build and release. If a team can constantly improve on these metrics, they can release higher-quality software to customers more quickly.
For each of the four DORA engineering metrics below, we’ll cover what the metric is, how it’s calculated, why it matters, how to improve it, and the target value for an elite team.
Deployment frequency
Deployment frequency is how often a software engineering team deploys code to production. This important metric can serve as a proxy for how often a team provides new value to customers.
Continuous delivery and shipping code as fast, small, frequent deployments are key components of DevOps. Deployment frequency reveals how efficient a team’s working and releasing processes are. For example, if deployment frequency slows down, that might indicate an issue with a new workflow. Measuring deployment frequency can reveal the wider impacts of change to team structure, personnel, or process. Measuring deployment frequency alongside other metrics ensures the changes being deployed add real value for customers.
Deployment frequency is usually reported in deployments per day. You can automate this measurement by pulling data from your team’s continuous integration/continuous delivery tools.
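As a rough illustration of that calculation, here is a minimal Python sketch. It assumes you have already exported a list of production deployment timestamps from your CI/CD tool; the function name `deployments_per_day` and the sample timestamps are hypothetical and do not come from any particular product.

```python
from datetime import date, datetime


def deployments_per_day(deploy_times: list[datetime], start: date, end: date) -> float:
    """Average number of production deployments per calendar day in [start, end]."""
    days = (end - start).days + 1  # both boundary days count
    if days <= 0:
        raise ValueError("end must not be before start")
    in_window = [t for t in deploy_times if start <= t.date() <= end]
    return len(in_window) / days


# Hypothetical timestamps, standing in for data exported from a CI/CD tool.
deploys = [
    datetime(2024, 3, 4, 9, 15),
    datetime(2024, 3, 4, 14, 40),
    datetime(2024, 3, 5, 11, 5),
    datetime(2024, 3, 7, 16, 30),
]

frequency = deployments_per_day(deploys, date(2024, 3, 4), date(2024, 3, 8))
print(f"{frequency:.1f} deployments per day")
```

Counting every calendar day in the window, including days with no deployments, keeps quiet days from inflating the average.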
There are a few practices a software engineering team can adopt to improve their deployment frequency:
- Reduce the size of each batch of work so a team can ship smaller pieces of work more frequently. Another advantage: Less-risky deployments that can be easily tracked or rolled back should there be an issue.
- Integrate with continuous integration/continuous delivery tools to improve the efficiency of your release process.
- Use automated tests to increase confidence in code quality and reduce the requirement for slow manual testing before deploying new changes to production.
An elite team deploys changes to production multiple times per day to continuously add value for customers.
Mean change lead time
Change lead time (also known as cycle time) is the time it takes from code being committed to code successfully running in production. It allows you to track the pace of a software engineering team. Faster teams have optimized processes and can get new features to market faster. This increased efficiency opens up opportunities to increase organizational revenue, improve customer renewal rates, and create a happy and efficient team. On the other hand, slower delivery means there is waste or inefficiency in the process, causing delays for customers.
Measuring change lead time also helps teams identify bottlenecks in their workflow, so they can optimize and improve.
Mean change lead time is calculated by tracking the time from each code commit to that code being delivered in production, then averaging those times.
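A minimal Python sketch of that average follows. It assumes each change is recorded as a pair of timestamps (commit time, production deploy time); the function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def mean_change_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Average time from commit to production across (committed, deployed) pairs."""
    if not changes:
        raise ValueError("at least one change is required")
    total = sum((deployed - committed for committed, deployed in changes), timedelta())
    return total / len(changes)


# Hypothetical (commit time, production deploy time) pairs.
changes = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 45)),    # 45 minutes
    (datetime(2024, 3, 4, 13, 0), datetime(2024, 3, 4, 14, 30)),  # 90 minutes
    (datetime(2024, 3, 5, 10, 0), datetime(2024, 3, 5, 10, 45)),  # 45 minutes
]

lead_time = mean_change_lead_time(changes)
print(lead_time)  # 1:00:00
```

Because a mean can hide a few very slow changes, it may be worth looking at the individual durations as well when hunting for bottlenecks.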
What steps can software engineering teams take to improve their mean change lead time?
- Integrate testing into the development process.
- Automate tests instead of manually testing.
- Integrate with continuous integration/continuous delivery tools.
- Streamline the code review process to reduce delays.
An elite team has a mean change lead time as low as one hour.
Mean time to recovery
Mean time to recovery measures how quickly a software engineering team recovers from a failure. A failure is anything that interrupts the expected production service quality, from a new bug introduced in deployment to a hosting infrastructure going down. Mean time to recovery indicates how quickly a software engineering team can understand and resolve problems that occur in production. Downtime is never good for customers. A low mean time to recovery gives teams confidence that if production is impacted, it can be quickly restored to a functional state.
Mean time to recovery is calculated by tracking the average time between a production bug or failure being reported and that issue being fixed.
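The same calculation can be sketched in Python. This is only an illustration: the `Incident` record and its field names are assumptions, standing in for whatever your incident tracker exports.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    reported_at: datetime
    resolved_at: datetime


def mean_time_to_recovery_minutes(incidents: list[Incident]) -> float:
    """Average minutes between an incident being reported and being fixed."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total_seconds = sum(
        (i.resolved_at - i.reported_at).total_seconds() for i in incidents
    )
    return total_seconds / len(incidents) / 60


incidents = [
    Incident(datetime(2024, 3, 4, 10, 0), datetime(2024, 3, 4, 10, 20)),  # 20 minutes
    Incident(datetime(2024, 3, 6, 15, 0), datetime(2024, 3, 6, 16, 10)),  # 70 minutes
]

mttr = mean_time_to_recovery_minutes(incidents)
print(f"{mttr:.0f} minutes")  # 45 minutes
```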
Here are some ways a software engineering team can improve their mean time to recovery:
- Introduce monitoring tools that quickly report failures in production.
- Implement a robust on-call and support documentation system.
- Improve deployment time so fixed issues can be quickly released to production.
- Use feature flags that allow you to turn on/off features in production with the click of a button. This can reduce the mean time to recovery to seconds.
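To make the feature flag idea concrete, here is a deliberately simplified Python sketch. The `FeatureFlags` class, the `new_checkout` flag, and the `checkout` function are all hypothetical; a real system would typically read flags from a shared configuration service rather than from memory.

```python
class FeatureFlags:
    """In-memory flag store, for illustration only."""

    def __init__(self, flags: dict[str, bool]) -> None:
        self._flags = dict(flags)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code paths stay dark by default.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def checkout(flags: FeatureFlags) -> str:
    """Route to the new code path only while its flag is on."""
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


flags = FeatureFlags({"new_checkout": True})
before = checkout(flags)

# The new flow misbehaves in production: switch it off without redeploying.
flags.set("new_checkout", False)
after = checkout(flags)

print(before, "->", after)
```

Because the legacy path stays in the codebase, recovery is a configuration change rather than a new deployment.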
An elite team aims to have a mean time to recovery of less than one hour.
Change failure rate
Change failure rate measures how often a software engineering team releases a change to production that causes a failure. These are changes that lead to a bug or have to be rolled back because they did not meet customers’ expectations. This metric indicates the quality of the software a team builds. Fixing bugs and rolling back code is a costly exercise as it takes away from time that could be spent building new features that add value for customers, so a high change failure rate suggests lower-quality software that frustrates customers.
Change failure rate is expressed as a percentage: the number of deployments to production that cause a failure, divided by the total number of deployments to production.
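As a quick worked example, the Python sketch below computes that percentage. The function name and the sample counts are hypothetical.

```python
def change_failure_rate(failed_deployments: int, total_deployments: int) -> float:
    """Percentage of production deployments that caused a failure."""
    if total_deployments <= 0:
        raise ValueError("total_deployments must be positive")
    if not 0 <= failed_deployments <= total_deployments:
        raise ValueError("failed_deployments must be between 0 and total_deployments")
    return 100 * failed_deployments / total_deployments


# Hypothetical month: 40 production deployments, 3 of which caused a failure.
rate = change_failure_rate(failed_deployments=3, total_deployments=40)
print(f"{rate:.1f}%")  # 7.5%
```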
Software engineering teams can implement these practices to improve their change failure rate:
- Introduce automated code review tools to catch issues that manual code reviews miss.
- Add automated tests for all new code.
- Run all automated tests as part of the release process using continuous integration/continuous delivery tools.
- Introduce incident retrospectives so the team can understand what caused an incident and work to ensure it does not happen again.
An elite team aims to have a change failure rate between zero and 15%.
Of course, these aren’t the only metrics you can consider when assessing the performance of a software engineering team. Many other metrics can provide insight into your team’s performance. However, DORA found that these four metrics were the most correlated with wider organizational success.
A word of caution
The four DORA metrics seem straightforward, but when used incorrectly, they can create problems.
Every team that uses the DORA engineering metrics exists within its own context, and its product or service will differ from those of other teams. The metrics should be used to help individual teams continuously improve their delivery. An anti-pattern is using the metrics to rate your teams against each other. This is unfair, because each team’s context and starting point are different.
Consider all four metrics together rather than focusing on a subset. The metrics are intended to balance speed and quality, so zeroing in on a subset may lead to worse performance. For example, a high deployment frequency can negatively impact quality if many of the changes you are releasing have bugs.
These metrics shouldn’t consume your teams and become the only things they focus on. Improving metrics should never be your main goal. Goodhart’s law is commonly stated as, “When a measure becomes a target, it ceases to be a good measure.” The goal should be to constantly, effectively deliver value to the customer, using the metrics to reflect your team’s progress toward that goal.
Finally, ensure that the DORA engineering metrics do not become vanity metrics that display numbers but give no obvious clue about what action to take. If a team is deploying 100 times, what does that mean? Was it 100 times in a day, a week, a year? Has that number improved since the last measure? How can it be improved? Are other metrics suffering? The DORA metrics help teams assess and improve their performance, but in order for teams to take meaningful action, they need to understand what the metrics do (and don’t) indicate in context.
Software engineering teams are constantly looking for ways to improve their processes and delivery. For many years, teams have lacked an objective, meaningful way to measure their performance. The DORA team wants to change that by focusing on the metrics that not only indicate how a team is performing but also reveal important clues about the organization’s overall health.
How would your team fare with respect to the four DORA metrics? What does that say about your organization?