[Ed. note: While we take some time to rest up over the holidays and prepare for next year, we are re-publishing our top ten posts for the year. Please enjoy our favorite work this year and we’ll see you in 2025.]
In this era of exponential acceleration that AI has brought forth, the average VS Code or JetBrains developer benefits from a toolkit stacked with modern, state-of-the-art components. From IDEs to extensions to CLIs, if you stop and consider all the tools you use weekly, which would you guess is the oldest? That is, which tool do you interact with regularly (say, an hour per week) that hasn't substantively changed in years?
Since IDEs are refreshed every few years, maybe you've guessed that your oldest tool in active use is "git." It was first released nearly 20 years ago, back in 2005. Or maybe you prefer to code with the classic old-school text editors, like Sublime Text (2008) or vim (1991).
According to our research at GitClear, the oldest tool most developers are still actively using (more than an hour per week) hasn't changed since before the Berlin Wall came down. In 1986, an unheralded Tucson computer science professor named Eugene Myers published his seminal research paper and the "Myers Diff Algorithm" was born. Few developers today know the algorithm by name, but they can recognize the familiar red-and-green byproduct of the Myers algorithm, which is the default diff generator of git, and thus, of git platforms like GitHub:
How did GitHub designate which lines to color red and green? By implementing the formulation of Eugene Myers, who offered what became the canonical solution for representing the difference between the state of a git repo “before” and “after” a developer’s git commit.
Is it possible to improve upon this method, or did Professor Myers knock it out of the park? New research we’ve undertaken tests an updated set of "diff operators" that extend the usual "add" and "delete" to include "move," "update," "find/replace," and others. The goal is to measure whether a deeper lexicon of diff operators can condense how a commit is represented. Can change be shown more concisely than what was possible nearly 40 years ago?
Our work followed two lines of investigation: one empirical, one observational. We'll summarize both sides of the research below. The headline finding, published as 30% Less is More: Exploring Strategies to Reduce Pull Request Review Time, should come as welcome news for developers. The research is accompanied by examples and videos to substantiate the “30% less code to review in pull requests” finding.
How much time is currently spent on pull requests?
Before we could determine how much time could be saved, we had to establish a baseline: how much time was being spent on code review in the 2020s?
According to CodeGrip's 2022 online survey of "1,000+ CxOs and developers," code review is utilized by around 84% of companies. The same survey found that the median developer in 2022 spent two to five hours per week on code review. Thirty percent of respondents reported spending more than five hours per week on review. Thus, for that group, more than 10% of a 40-hour work week is consumed by code review (five of 40 hours is 12.5%).
The primary correlate of "code review time" is "lines of code to review." And the higher an organization’s contributor count rises, the more likely that developers’ burden of "code to review" will become 20% or more of their total hours available per week.
Finding specific opportunities to reduce lines requiring review
To understand how GitClear's "Commit Cruncher" diff algorithm can generate a more precise diff than Myers, let’s consider a couple of specific examples.
The Myers diff algorithm classifies all code change lines as binary: either "add" or "delete."
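To make the binary vocabulary concrete, here is a minimal Python sketch. It leans on the standard library's difflib, whose matcher is not the Myers algorithm, so treat it as an illustration of the add/delete labeling rather than of git's implementation. The function name and the sample lines are invented for this example.

```python
import difflib

def binary_diff(before, after):
    """Label every changed line as either 'delete' or 'add' (nothing else)."""
    ops = []
    # autojunk=False disables difflib's popularity heuristic for long inputs.
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" has no label of its own here: it becomes delete + add.
        if tag in ("delete", "replace"):
            ops += [("delete", line) for line in before[i1:i2]]
        if tag in ("insert", "replace"):
            ops += [("add", line) for line in after[j1:j2]]
    return ops

# Hypothetical before/after versions of a three-line file.
before = ["total = 0", "for item in cart:", "    total += item.price"]
after = ["total = 0", "for item in cart:", "    total += item.price * item.qty"]

changes = binary_diff(before, after)
for op, line in changes:
    print(op, repr(line))
```

Even though only a few characters changed on one line, the reviewer is shown two highlighted lines: one deleted, one added.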
The Commit Cruncher algorithm tested recognizes three times as many types of change operations: Added, Deleted, Updated, Moved, Find/Replaced, and Copy/Pasted (examples of recognized code operations here). The last operation (Copy/Paste) featured prominently in GitClear's popular AI Code Quality research from earlier in 2024, which was cited by more than ten developer media sources, including Stack Overflow’s podcast.
The extent to which Commit Cruncher reduces lines to read vs. the Myers algorithm depends on the content of a given pull request. The more refactoring a pull request includes, the greater the potential savings of “lines to review.” A few examples where Commit Cruncher can express a diff more concisely follow.
White space changes
One example where Myers requires more work by a reviewer is when a code change involves white space, like the change shown earlier in this post:
The same diff, through the lens of Commit Cruncher:
By recognizing whitespace changes as trivial updates, a diff viewer can focus attention on the subset of lines where meaningful changes occurred.
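One way to picture the idea is the short Python sketch below. It is not Commit Cruncher's logic, just an assumed rule: two lines count as a whitespace-only change when they contain the same tokens once runs of whitespace are ignored. The function name and sample pairs are hypothetical.

```python
def split_whitespace_only(pairs):
    """Separate (old_line, new_line) pairs into trivial and meaningful changes.

    Assumption for this sketch: a change is trivial when both lines yield the
    same tokens after splitting on whitespace.
    """
    trivial, meaningful = [], []
    for old, new in pairs:
        if old.split() == new.split():
            trivial.append((old, new))
        else:
            meaningful.append((old, new))
    return trivial, meaningful

# Hypothetical pairs that a line-based diff reported as changed.
pairs = [
    ("  return x", "    return x"),      # re-indented only
    ("count = 1", "count = 2"),          # real edit
    ("foo( a, b )", "foo( a,  b )"),     # spacing only
]
trivial, meaningful = split_whitespace_only(pairs)
print(len(trivial), "trivial;", len(meaningful), "meaningful")
```

In a whitespace-significant language such as Python, an indentation change can alter behavior, so a real tool would need to be more careful than this rule.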
File renames and refactored functions
Another example comes from a pull request comparison posted to GitClear’s YouTube channel. It shows a case where a file was renamed and a block of code that had initially been in the body of a React component was extracted to a standalone function. In GitHub’s Myers-based diff presentation, the change is shown as a new, 30+ line, added method:
The same diff, processed by Commit Cruncher, is less exciting:
“Less excitement” reviewing pull requests is precisely what the average code reviewer seeks. By eliding the no-op changes to the relocated method, the Commit Cruncher presentation conserves a reviewer’s attention for lines with more substantive content differences.
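As a rough illustration of how relocated lines could be set aside, the Python sketch below relabels an added line as "moved" when an identical line (ignoring indentation) was deleted elsewhere in the same change. This matching rule, the function name, and the sample lines are assumptions made for the example, not a description of Commit Cruncher's heuristics.

```python
from collections import Counter

def label_moves(deleted, added):
    """Relabel added lines that also appear among the deleted lines as 'moved'."""
    # Count deleted lines by content, ignoring indentation and blank lines.
    available = Counter(line.strip() for line in deleted if line.strip())
    labeled = []
    for line in added:
        key = line.strip()
        if key and available[key] > 0:
            available[key] -= 1          # each deleted line can match only once
            labeled.append(("moved", line))
        else:
            labeled.append(("added", line))
    return labeled

# Hypothetical extraction of two lines into a standalone function.
deleted = ["const total = items.reduce(sum, 0);", "return format(total);"]
added = [
    "function cartTotal(items) {",
    "  const total = items.reduce(sum, 0);",
    "  return format(total);",
    "}",
]
labeled = label_moves(deleted, added)
to_review = [line for op, line in labeled if op == "added"]
print(len(to_review), "of", len(added), "added lines need a close read")
```

Here only the new function signature and closing brace remain as genuinely added lines; the body is recognized as relocated.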
Incremental updates
Another line reduction opportunity is presented when lines receive an incremental modification, like this trivial change from a recent React commit:
The same diff through a GitClear lens condenses the incremental update to a single line, where the new (or removed) characters are shown inline:
In the GitClear database, containing more than one billion line changes, around 10-15% of those changes are incremental updates like this. These small improvements add up.
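A single-line, inline view of an incremental update can be approximated in a few lines of Python. The sketch below uses difflib's character-level matching and a made-up marker syntax (`[-removed-]` and `{+added+}`); both the markers and the function name are assumptions for illustration only.

```python
import difflib

def inline_update(old, new):
    """Render an old/new line pair as one line with inline change markers."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
            continue
        if i2 > i1:                       # characters removed from the old line
            parts.append("[-" + old[i1:i2] + "-]")
        if j2 > j1:                       # characters introduced by the new line
            parts.append("{+" + new[j1:j2] + "+}")
    return "".join(parts)

# Hypothetical one-character edit.
print(inline_update("timeout = 30", "timeout = 60"))
```

The reviewer reads one line with the changed characters called out, instead of one red line and one green line.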
Weaving the changes together
Commit Cruncher employs another, more subtle difference in how it derives which lines changed compared to Myers. To understand this difference, consider a pull request that includes a set of commits, [A, B, C], being proposed for merge.
The Myers diff algorithm works by inspecting two inputs: the repo state before commit A and the state after commit C. The only information that this diff algorithm has available to construct a visual diff is the state of the repo at two points in time. Consider a case where Commit B renames a file, followed by Commit C adding and removing a few lines from the renamed file. A comparison that considers only [A, C] would show on the "before" side the pre-rename version of the renamed file as deleted lines. On the "after" side, the hundreds or thousands of lines from the renamed file would be presented as if the file had been newly created.
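The rename scenario can be sketched with toy data. In the Python below, a repo state is modeled as a dictionary from path to lines, and the rename from Commit B is passed in as a simple old-path-to-new-path mapping. Those structures, the file names, and the function names are all hypothetical; the point is only to show how much the changed-line count shifts once intermediate history is taken into account.

```python
import difflib

def changed_lines(before, after):
    """Count added plus deleted lines between two versions of one file."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    return sum((i2 - i1) + (j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

def endpoint_diff(state_a, state_c):
    """Compare two snapshots path by path, with no knowledge of renames."""
    paths = set(state_a) | set(state_c)
    return sum(changed_lines(state_a.get(p, []), state_c.get(p, [])) for p in paths)

def rename_aware_diff(state_a, state_c, renames):
    """Same comparison, but pair each renamed file with its earlier path."""
    moved_from = {new: old for old, new in renames.items()}
    total, matched_old = 0, set()
    for path, lines in state_c.items():
        old_path = moved_from.get(path, path)
        matched_old.add(old_path)
        total += changed_lines(state_a.get(old_path, []), lines)
    # Files that existed before and have no counterpart now are true deletions.
    total += sum(len(lines) for p, lines in state_a.items() if p not in matched_old)
    return total

# Before commit A: one 200-line file.
original = [f"line {n}" for n in range(200)]
state_a = {"cart.js": original}
# Commit B renames the file; commit C edits one line and appends another.
edited = original[:10] + ["line 10 (edited)"] + original[11:] + ["new line"]
state_c = {"basket.js": edited}

print(endpoint_diff(state_a, state_c))
print(rename_aware_diff(state_a, state_c, {"cart.js": "basket.js"}))
```

With this toy data the path-only comparison reports every line of the file as deleted and re-added, while the rename-aware comparison reports only the handful of lines touched in Commit C.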
In contrast, Commit Cruncher employs the more computationally intensive approach of following each changed line through each commit that it appears within, to build what is labeled a "Commit Group":
From GitClear Commit Group PDF explainer
There are a few benefits that emerge from doing the added work of traversing each line through its entire commit history.
One benefit is that, when hovering on the line, the developer can access a historical record of the commit messages, which often elucidates why a particular line evolved into its final form. From GitClear's video:
Another benefit is that, when a line undergoes multiple changes like being moved and having a find/replace applied, Commit Cruncher can still show the reviewer the original location of the multi-updated line. This means less uncertainty about how this line will perform in a production environment (since it was already deployed in the original form).
Empirical compare method: What 12,638 pull requests reveal
To assess the real-world impact of using the Commit Cruncher diff algorithm vs. classic Myers, GitClear analyzed 12,638 pull requests that it processed during the second half of May 2024. The pull requests’ diffs were processed by GitClear and compared to their GitHub equivalent. The contributing repos were about 25% popular open-source projects (React, VS Code, Chromium, TensorFlow) and roughly 75% SaaS customers who had opted into anonymized data sharing.
GitClear used the GitHub API's compare endpoint to capture the count of "added" and "deleted" lines that GitHub would show for each pull request. It then recorded the count of changed lines per pull request, as derived by Commit Cruncher.
28% fewer lines to review
Our metric for comparison was the number of green or red highlighted lines in the classic Myers diff as used on GitHub (and elsewhere) versus the number of highlighted lines on GitClear (using the updated diff algorithm). We lumped add, remove, and other operations into a single value, as that’s what any code reviewer would see.
Here were the 12,638 pull requests, with average and median changed line counts for variously-sized pull requests:
The data shows that developers reviewing with GitClear and its "Commit Cruncher" algorithm were presented with, on average, 22% to 29% fewer changed lines to review per pull request.
The median difference between "Myers" and "Commit Cruncher" ranges from 27% to 31%, depending on the total magnitude of the change set. This implies that updating git diff processing tools could reduce the volume of lines requiring review by almost a third. A detailed description of the database queries that were used to produce these numbers is offered in the "Appendix" section A6.
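For readers who want to run a similar comparison on their own repositories, the arithmetic is simple. The Python sketch below uses invented line counts (not GitClear's data) and shows two reasonable ways to summarize a reduction: the mean or median of per-pull-request percentages, and the reduction in the average line count. Which of these the study's tables use is not stated here, so treat the choice as an assumption.

```python
from statistics import mean, median

def percent_reduction(baseline, candidate):
    """Percentage by which candidate is smaller than baseline."""
    return 100.0 * (baseline - candidate) / baseline

# Hypothetical (Myers changed lines, Commit Cruncher changed lines) per pull request.
pull_requests = [(120, 90), (40, 30), (800, 560), (15, 12)]

per_pr = [percent_reduction(myers, cruncher) for myers, cruncher in pull_requests]
mean_of_reductions = mean(per_pr)
median_of_reductions = median(per_pr)

# Alternative summary: compare the average line counts themselves.
reduction_of_means = percent_reduction(
    mean(m for m, _ in pull_requests),
    mean(c for _, c in pull_requests),
)

print(f"mean of per-PR reductions:   {mean_of_reductions:.1f}%")
print(f"median of per-PR reductions: {median_of_reductions:.1f}%")
print(f"reduction in average lines:  {reduction_of_means:.1f}%")
```

The two summaries can disagree when a few very large pull requests dominate the totals, which is one reason to report results by pull request size.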
Interview compare method: Are shorter pull requests more prone to missed bugs?
While the raw line counts suggest that Commit Cruncher-processed pull requests will require fewer lines to be reviewed, lead developers, CTOs, and VPs of engineering may fairly wonder whether a new diff processor brings less desirable side effects along with it.
To research these questions, 48 developer research participants were recruited randomly from the web platform CodeMentor. Each participant was assigned to review two different pull requests in a programming language familiar to them. Each pair of pull requests alternated which git platform it was shown on. For example, the first participant would review PR #1 on GitHub and PR #2 on GitClear. The second participant would review PR #1 on GitClear and PR #2 on GitHub.
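The alternating assignment described above can be written down in a few lines. This Python sketch assumes participants are numbered from zero in recruitment order; the labels and function name are placeholders, not the study's actual tooling.

```python
def assign_platforms(participant_index):
    """Alternate which platform shows which pull request (0-based index)."""
    if participant_index % 2 == 0:
        return {"PR #1": "GitHub", "PR #2": "GitClear"}
    return {"PR #1": "GitClear", "PR #2": "GitHub"}

schedule = [assign_platforms(i) for i in range(48)]

# Each pull request ends up reviewed equally often on each platform.
github_first = sum(1 for s in schedule if s["PR #1"] == "GitHub")
print(github_first, "participants saw PR #1 on GitHub")
```

Balancing the assignment this way means that differences in difficulty between the two pull requests do not favor either platform.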
The research provides tabular results for these 48 developer interviews, collecting data to evaluate:
1) Do reductions in "lines to review" actually translate to a corresponding reduction in "real-world time to review"?
2) Does reviewing pull requests in less time correlate with negative or positive impacts to "percentage of bugs discovered"?
A study by Bacchelli and Bird supports the contention that, when reviewing code, most understanding and attention is spent in search of "Finding Defects":
Statements from GitClear's survey participants support this interpretation for "code review motivation":
- “The most difficult thing when doing a code review is understanding the reason for the change.”
- “Understanding the code takes most of the reviewing time.”
- “In a successful code review submission, the author is sure that his peers understand and approve the change.”
To shorten the code review process, a tool needs to accelerate the rate at which a developer evolves from "encountering code" to "contextualizing it" to "evaluating whether it satisfies the author's goals."
Code interviews: 23-36% less time, equivalent comprehension
Our interviews found reductions in review time for each of the three programming languages tested:
Here's how the aggregated data looks in graph form, with the yellow bars illustrating the absolute difference between the two data points:
The most notable difference was for pull request #25610, with a 42% decrease (an average of 13.2 minutes with GitClear's Commit Cruncher vs. 22.7 minutes with GitHub).
Code comprehension metrics
GitClear's research found that pull request comprehension was equivalent, within the margin of error, for GitClear and GitHub reviewers.
Question accuracy percentages differed by less than 5%. A statistically insignificant benefit was found in favor of Commit Cruncher diffs when evaluated across the entire pool of results.
The raw data of the evaluation metrics for each individual session was plotted using a scatter chart, comparing question accuracy scores against the code review duration, as seen in the figure below. The code review duration decrease is visually outlined by the increased frequency of blue (GitClear) dots on the left side of the chart.
On durability of status quo
Perhaps the most substantive question that comes to mind while reading this research: how has diff viewing evolved to be more homogeneous than any of "developer IDE," "git platform," or "system OS"? Compared to the innumerable programming languages that have come and gone in the decades since the Myers algorithm was created, how did so many products converge on a solution developed nearly 40 years ago? We offer two ideas on this puzzler.
The first possibility is that most developers don’t even recognize that it’s possible to represent a diff without Myers. Since diffs have looked the same since their careers began, nobody thinks to go looking for other options.
The second reason is that Myers is a much “cleaner” algorithm than any successor would be. Choosing Myers offers an instantly available, multi-generation-tested means to show a diff. And when it comes to reviewing a diff, getting every line right, every time, is incredibly important.
While Commit Cruncher shows significant improvement over Myers in this research, it relies upon a set of iteratively tuned heuristics. None of the large git platforms can afford to imperil user trust as they iterate on a more granular representation of what changed within a commit. Much like all source control providers flocked to git once it was proven reliable at enterprise scale, no single company is likely to evolve its diff tool until it has strong incentives to do so. As future research corroborates that a savings of more than 20% is possible in pull request review time, the current stasis may end.
Conclusion
The evidence presented herein raises new questions for the oldest tool still widely used by contemporary developers.
The implications of a 28% drop in code review time could be significant. Scale the 2022 CodeGrip code review survey result to a 10-member team and the math works out to about 50 hours per week spent reviewing code. If the combined developer and manager salaries average $150,000, then a 10-developer team invests around $16,000 per month of salary toward code review. This doesn’t include the difficult-to-measure (but familiar to any developer) time needed to context shift into and out of "code review mode."
A reduction of the magnitude observed here would mean this 10-developer team could reallocate 40 hours per month for more coding, less reviewing (one hour/week * four weeks/month * 10 developers). Considering that code review is often one of the most unpleasant, high-willpower chores included in a developer's responsibilities, the morale improvement gained by reducing code review time may rival the gains in "time saved."
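The back-of-the-envelope math in the last two paragraphs can be reproduced as follows. This Python sketch assumes five review hours per developer per week (the figure implied by "about 50 hours" for a 10-member team) and a 40-hour work week; the variable names are ours.

```python
HOURS_PER_WORK_WEEK = 40
team_size = 10
review_hours_per_dev_per_week = 5      # assumed: implied by ~50 team hours per week
average_salary = 150_000               # dollars per year, from the text

team_review_hours_per_week = team_size * review_hours_per_dev_per_week
review_share_of_week = review_hours_per_dev_per_week / HOURS_PER_WORK_WEEK

monthly_payroll = team_size * average_salary / 12
monthly_review_cost = review_share_of_week * monthly_payroll

# Savings figure from the text: one hour/week * four weeks/month * 10 developers.
hours_saved_per_month = 1 * 4 * team_size

print(f"team review hours per week: {team_review_hours_per_week}")
print(f"share of the work week:     {review_share_of_week:.1%}")
print(f"monthly review cost:        ${monthly_review_cost:,.0f}")
print(f"hours reallocated per month: {hours_saved_per_month}")
```

The computed monthly cost lands just under the rounded $16,000 quoted above.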
The full text and citations of GitClear's latest research are available to download for free: 30% Less is More: Exploring Strategies to Cut Pull Request Review Time. The pull request tool can be trialed at no cost by visiting our Best GitHub Alternative Pull Request Review Tool page, which allows pasting a pull request URL from GitHub for a direct, side-by-side comparison of the competing diff algorithms.