Use Git tactically
[Ed. note: While we take some time to rest up over the holidays and prepare for next year, we are re-publishing our top ten posts for the year. Please enjoy our favorite work this year and we’ll see you in 2023.]
In the movie Free Solo the rock climber Alex Honnold trains to perform a free solo climb of El Capitan, a granite rock formation in Yosemite.
It’s a good movie, but if you haven’t seen it, free solo climbing is when you scale a rock face without ropes, harness, or safety equipment. If you lose your grip and fall, you’ll die. El Capitan, just to rub it in, is 914 meters of vertical rock. Free-climbing it is an incredible endeavor, but Honnold gets it done by committing to one move at a time (this article is about using Git, after all).
Save point
Honnold didn’t just free-climb El Capitan. He trained deliberately towards the goal of free climbing El Capitan.
The documentary shows how he repeatedly climbs El Capitan with safety equipment. He plans a route and climbs it several times. On each of the training ascents, he uses ropes, a harness, and various fasteners for the ropes. When he falls during training, he doesn’t fall far, because the rope, harness, and fasteners arrest the fall at the last anchor point.
It’s almost like a video game save point.
In one memorable scene, Honnold considers a jump from one position to another. Hundreds of meters in the air, parallel to a vertical rock face. It’s a truly precarious maneuver. If he fails, he’ll die.
Or rather, that’s true for the free solo climb. At first, he rehearses the move using rope and harness. This enables him to perform a potentially fatal jump in relative safety. When it goes wrong, he’s back at the point where he fixed his rope, and he can try again.
When you’re making large code changes, such as migrating to a new implementation, you can create save points to prevent catastrophes. Like Alex Honnold, you can fix your code in place to give yourself a better chance of getting to the next successful build.
Precarious editing
When you edit code, you go from one working state to another, but during the process, the code doesn’t always run or compile.
Consider an interface like this:
public interface IReservationsRepository
{
    Task Create(Reservation reservation);
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        DateTime dateTime);
    Task<Reservation?> ReadReservation(Guid id);
    Task Update(Reservation reservation);
    Task Delete(Guid id);
}
This, like most of the code in this article, is from my book Code That Fits in Your Head. As I describe in the section on the Strangler Fig pattern, at one point I had to add a new method to the interface. The new method should be an overload of the ReadReservations method with this signature:
Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max);
Once you start typing that method definition, however, your code no longer works:
Task<IReadOnlyCollection<Reservation>> ReadReservations(
    DateTime dateTime);
T
Task<Reservation?> ReadReservation(Guid id);
If you’re editing in Visual Studio, it’ll immediately light up with red squiggly underlines, indicating that the code doesn’t parse.
You have to type the entire method declaration before the red squiggly lines disappear, but even then, the code doesn’t compile. While the interface definition may be syntactically valid, adding the new method broke some other code. The code base contains classes that implement the IReservationsRepository interface, but none of them define the method you just added. The compiler knows this and complains:
Error CS0535 ‘SqlReservationsRepository’ does not implement interface member ‘IReservationsRepository.ReadReservations(DateTime, DateTime)’
There’s nothing wrong with that. I’m just trying to highlight how editing code involves a transition between two working states, with an intermediate period where the code doesn’t work at all.
In Free Solo the entire climb is dangerous, but there’s a particularly perilous maneuver that Alex Honnold has to make because he can’t find a safer route. For most of the climb, he climbs using safer techniques, moving from position to position in small increments, never losing grip or footing as he shifts his center of gravity.
There’s a reason he favors climbing like that. It’s safer.
Micro-commits
You can’t edit code without temporarily breaking it. What you can do, however, is move in small, deliberate steps. Every time you reach a point where the code compiles and all tests pass: commit the changes to Git.
Tim Ottinger calls this a micro-commit. Not only should you commit every time you have a green bar—you should deliberately move in such a way that the distance between two commits is as short as possible. If you can think of alternative ways to change the code, choose the pathway that promises the smallest steps.
Why make dangerous leaps when you can advance in small, controlled moves?
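In practice, the rhythm can be as mundane as this sketch; the build and test commands are placeholders, assuming a .NET tool chain like the one used later in this article:

$ dotnet build && dotnet test   # substitute your own build and test commands
$ git add . && git commit -m "Describe the small step you just took"

Repeat that loop every few minutes and the save points accumulate almost without you thinking about it.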
Git is a wonderful tool for maneuverability. Most people don’t think of it like that. They start programming, and hours later, they may commit to Git in order to push a branch out.
Tim Ottinger doesn’t do that, and neither do I. I use Git tactically.
I’ll walk you through an example.
Adding an interface method
As described above, I wanted to add a ReadReservations overload to the IReservationsRepository interface. The motivation for that is described in Code That Fits in Your Head, but that’s not the point here. The point is to use Git to move in small increments.
When you add a new method to an existing interface, the code base fails to compile when you have existing classes that implement that interface. How do you deal with that situation? Do you just forge ahead and implement the new method? Or are there alternatives?
Here’s an alternative path that moves in smaller increments.
First, lean on the compiler (as Working Effectively with Legacy Code puts it). The compiler errors tell you which classes lack the new method. In the example code base, it’s SqlReservationsRepository
and FakeDatabase
. Open one of those code files, but don’t do anything yet. Instead, copy the new ReadReservations
method declaration to the clipboard. Then stash the changes:
$ git stash
Saved working directory and index state WIP on tactical-git: [...]
The code is now back in a working state. Now find a good place to add the new method to one of the classes that implement the interface.
SQL implementation
I’ll start with the SqlReservationsRepository class. Once I’ve navigated to the line in the file where I want to add the new method, I paste in the method declaration:
Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max);
That doesn’t compile because the method ends with a semicolon and has no body.
So I make the method public, delete the semicolon, and add curly brackets:
public Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max)
{
}
This still doesn’t compile, because the method declaration promises to return a value, but the body is empty.
What’s the shortest way to a working system?
public Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max)
{
    throw new NotImplementedException();
}
You may not want to commit code that throws NotImplementedException, but this is in a brand-new method that has no callers. The code compiles and all tests pass—of course they do: no existing code changed.
Commit the changes:
$ git add . && git commit
[tactical-git 085e3ea] Add ReadReservations overload to SQL repo
1 file changed, 5 insertions(+)
This is a save point. Saving your progress enables you to back out of this work if something else comes up. You don’t have to push that commit anywhere. If you feel icky about that NotImplementedException, take comfort that it exists exclusively on your hard drive.
Moving from the old working state to the new working state took less than a minute.
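If something else does come up, backing out can be as simple as parking the branch. A minimal sketch, assuming the repository’s default branch is called main:

$ git checkout main            # the save point stays behind on tactical-git
...handle whatever came up...
$ git checkout tactical-git    # pick up exactly where you left off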
The natural next step is to implement the new method. You may consider doing this incrementally as well, using TDD as you go, and committing after each green and refactor step (assuming you follow the red-green-refactor checklist).
I’m not going to do that here because I try to keep SqlReservationsRepository a Humble Object. The implementation will turn out to have a cyclomatic complexity of 2. Weighed against how much trouble it is to write and maintain a database integration test, I consider that sufficiently low to forgo adding a test (but if you disagree, nothing prevents you from adding tests in this step).
public async Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max)
{
    const string readByRangeSql = @"
        SELECT [PublicId], [Date], [Name], [Email], [Quantity]
        FROM [dbo].[Reservations]
        WHERE @Min <= [Date] AND [Date] <= @Max";
    var result = new List<Reservation>();
    using var conn = new SqlConnection(ConnectionString);
    using var cmd = new SqlCommand(readByRangeSql, conn);
    cmd.Parameters.AddWithValue("@Min", min);
    cmd.Parameters.AddWithValue("@Max", max);
    await conn.OpenAsync().ConfigureAwait(false);
    using var rdr = await cmd.ExecuteReaderAsync().ConfigureAwait(false);
    while (await rdr.ReadAsync().ConfigureAwait(false))
        result.Add(
            new Reservation(
                (Guid)rdr["PublicId"],
                (DateTime)rdr["Date"],
                new Email((string)rdr["Email"]),
                new Name((string)rdr["Name"]),
                (int)rdr["Quantity"]));
    return result.AsReadOnly();
}
Granted, this takes more than a minute to write, but if you’ve done this kind of thing before, it probably takes less than ten—particularly if you’ve already figured out the SELECT statement beforehand, perhaps by experimenting in a query editor.
Once again, the code compiles and all tests pass. Commit:
$ git add . && git commit
[tactical-git 6f1e07e] Implement ReadReservations overload in SQL repo
1 file changed, 25 insertions(+), 2 deletions(-)
Status so far: We’re two commits in, and all code works. The time spent coding between each commit has been short.
Fake implementation
The other class that implements IReservationsRepository is called FakeDatabase. It’s a Fake Object (a kind of Test Double) that exists only to support automated testing.
The process for implementing the new method is exactly the same as for SqlReservationsRepository. First, add the method:
public Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max)
{
    throw new NotImplementedException();
}
The code compiles and all tests pass. Commit:
$ git add . && git commit
[tactical-git c5d3fba] Add ReadReservations overload to FakeDatabase
1 file changed, 5 insertions(+)
Then add the implementation:
public Task<IReadOnlyCollection<Reservation>> ReadReservations(DateTime min, DateTime max)
{
    return Task.FromResult<IReadOnlyCollection<Reservation>>(
        this.Where(r => min <= r.At && r.At <= max).ToList());
}
The code compiles and all tests pass. Commit:
$ git add . && git commit
[tactical-git e258575] Implement FakeDatabase.ReadReservations overload
1 file changed, 2 insertions(+), 1 deletion(-)
Each of these commits represents only a few minutes of programming time; that’s the whole point. By committing often, you have granular save points that you can retreat to if things start to go wrong.
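At this point the branch history looks roughly like this (the hashes are taken from the commit output above):

$ git log --oneline -4
e258575 Implement FakeDatabase.ReadReservations overload
c5d3fba Add ReadReservations overload to FakeDatabase
6f1e07e Implement ReadReservations overload in SQL repo
085e3ea Add ReadReservations overload to SQL repo

If, say, the FakeDatabase work had gone wrong, git reset --hard 6f1e07e would take you straight back to the last good save point.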
Now change the interface
Keep in mind that we’ve been adding the methods in anticipation that the IReservationsRepository interface will change. It hasn’t changed yet, remember. I stashed that edit.
The new method is now in place everywhere it needs to be: both on SqlReservationsRepository and FakeDatabase.
Now pop the stash:
$ git stash pop
On branch tactical-git
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: Restaurant.RestApi/IReservationsRepository.cs
no changes added to commit (use "git add" and/or "git commit -a")
Dropped refs/stash@{0} (4703ba9e2bca72aeafa11f859577b478ff406ff9)
This re-adds the ReadReservations method overload to the interface. When I first tried to do this, the code didn’t compile because the classes that implement the interface didn’t have that method.
Now, on the other hand, the code immediately compiles and all tests pass. Commit:
$ git add . && git commit
[tactical-git de440df] Add ReadReservations overload to repo interface
1 file changed, 2 insertions(+)
We’re done. By a tactical application of git stash, it was possible to partition what looked like one long, unsafe maneuver into five smaller, safer steps.
Tactical Git
Someone once, in passing, mentioned that one should never be more than five minutes away from a commit. That’s the same kind of idea. When you begin editing code, do yourself the favor of moving in such a way that you can get to a new working state in five minutes.
This doesn’t mean that you have to commit every five minutes. It’s okay to take time to think. Sometimes, I go for a run, or go grocery shopping, to allow my brain to chew on a problem. Sometimes, I just sit and look at the code without typing anything. And sometimes, I start editing the code without a good plan, and that’s okay, too… Often, by dawdling with the code, inspiration comes to me.
When that happens, the code may be in some inconsistent state. Perhaps it compiles; perhaps it doesn’t. It’s okay. I can always reset to my latest save point. Often, I reset by stashing the results of my half-baked experimentation. That way, I don’t throw anything away that may turn out to be valuable, but I still get to start with a clean slate.
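A minimal sketch of that reset-by-stashing move; the stash message is only an example:

$ git stash push -m "half-baked experiment"
$ git stash list
stash@{0}: On tactical-git: half-baked experiment

Later, git stash pop brings the experiment back if it turns out to be valuable, and git stash drop discards it if not.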
git stash is probably the command I use the most for increased maneuverability. After that, being able to move between branches locally is also useful. Sometimes, I do a quick-and-dirty prototype in one branch. Once I feel that I understand the direction in which I must go, I commit to that branch, reset my work to a more proper commit, make a new branch and do the work again, but now with tests or other things that I skipped during the prototype.
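Sketched as commands, with hypothetical branch names, that prototype-then-redo workflow might look like this:

$ git checkout -b prototype          # quick-and-dirty spike
...hack away, committing freely...
$ git checkout -b feature main       # start over from a proper commit
...redo the work, this time with tests and small commits...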
Being able to stash changes is also great when you discover that the code you’re writing right now needs something else to be in place (e.g. a helper method that doesn’t yet exist). Stash the changes, add the thing you just learned about, commit that, and then pop the stash. Subsection 11.1.3 Separate Refactoring of Test and Production Code in Code That Fits in Your Head contains an example of that.
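In command form, that little detour looks something like this (the commit message is hypothetical):

$ git stash                          # park the half-finished edit
...add the helper method you just discovered you need...
$ git add . && git commit -m "Add helper method"
$ git stash pop                      # resume the original edit on top of the helper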
I also use git rebase a lot. While I’m no fan of squashing commits, I’ve no compunction about reordering commits on my local Git branches. As long as I haven’t shared the commits with the world, rewriting history can be beneficial.
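Reordering the last few local commits, for example, is a matter of an interactive rebase. A sketch, safe only as long as the commits haven’t been pushed anywhere:

$ git rebase -i HEAD~4
# Git opens a todo list; move the 'pick' lines into the order you want,
# then save and close, and Git replays the commits in that order.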
Git enables you to experiment, to try out one direction, and to back out if the direction begins to look like a dead end. Just stash or commit your changes, move back to a previous save point and try an alternative direction. Keep in mind that you can leave as many incomplete branches on your hard drive as you like. You don’t have to push them anywhere.
That’s what I consider tactical use of Git. It’s maneuvers you perform to be productive in the small. The artifacts of these moves remain on your local hard drive, unless you explicitly choose to share them with others.
Conclusion
Git is a tool with more potential than most people realize. Usually, programmers use it to synchronize their work with others. Thus, they use it only when they feel the need to do that. That’s git push and git pull.
While that’s a useful and essential feature of Git, if that’s all you do, you might as well use a centralized source control system.
The value of Git is the tactical advantage it also provides. You can use it to experiment, make mistakes, flail, and struggle on your local machine, and at any time, you can just reset if things get too hard.
In this article, you saw an example of adding an interface method, only to realize that it involves more work than you may have initially thought. Instead of pushing through on an ill-planned, unsafe maneuver with no clear end, you can back out by stashing the changes so far, then move deliberately in smaller steps, and finally pop the stash.
Just as a climber like Alex Honnold trains with ropes and a harness, Git enables you to proceed in small steps with fallback options. Use it to your advantage.
Tags: git, micro-commits, strangler fig pattern
45 Comments
Git is still a terrible tool. It was produced specifically for use in the development of one extreme-outlier product whose development model looks nothing like 99.9% of all software projects, and the fact that it’s managed to achieve such dominance in today’s world despite virtually everyone who uses it being willing to freely, openly say that they hate it, is a clear sign of something being very, very wrong in our industry.
Yes, you’re right that Git is terrible in many ways. My first exposure to DVCS was with Mercurial, which was much easier to pick up.
I think, however, we should identify what it is that makes Git ‘terrible’. Would it be fair to say that it’s the user interface?
I would certainly agree that it could be more user-friendly.
On the other hand, the capabilities that Git offers are, in my experience, far from terrible. I think that the benefits that Git offers so far outweigh the disadvantages that it’s easy to understand why it dominates. To me, this doesn’t indicate that something is wrong in our industry. Git is a local maximum. Higher peaks are possible, but in order to achieve something better, you must first leave the ‘peak of Git’ and traverse the ‘abyss of migration’ until you can reach another peak.
And who knows whether the peak you eventually arrive at is actually higher than the ‘peak of Git’.
I have, as a hobby (because I’m not a software professional), traversed the abysmal span of creating another version control system. I realize that again we need to address Nietzsche’s question regarding history: why has something as baggy as history been a unique indication of advanced civilization? The positives and negatives of this question can be better fleshed out philosophically before we can even start to endeavor to create a better version control system. However, I like the rock-climbing adventure analogy. I love living meagerly as an adventurer.
Something produced for use with X isn’t necessarily bad for use with Y. You don’t give a real reason why git is terrible for other projects. Explain?
Is it just me, or would a code example much closer to “Hello, World” than that used here leave more room in the reader’s head for the crux of this article, which of course is tactical git? Simplest Example –> Greatest Reach
That’s a good idea. I don’t own the idea of using Git this way, so I’ll be happy if someone would like to write such an article.
Often when I write a simple example, I’m met with the opposite criticism: The example isn’t realistic enough.
I don’t hear many people say they hate it. Sure, for many it’s their first interaction with a CLI, and they might hate that (learning curve), but neither does one need to use Git from the CLI, nor do people hate Git itself.
Git is terrible? I was using Mercurial for so long, and after switching to Git, I consider Mercurial pretty terrible, and Git is amazing in comparison. Mercurial is glitchy and slow. Git is stable and fast, as well as I find being more logical.
I seem to recall a quote stating something along the lines of Democracy seeming like the worst possible form of government … until you consider the alternatives.
Git revision control seems to fall into a similar trap, it does seem like an awful way to track development … until you consider the alternatives.
You’re awfully wrong if you call Git terrible.
I used many a source control tool from CVS to Subversion, heck, even Perforce (whoops, just dated myself, but whatever). Git is so superior to these it isn’t even in the same ballpark—once you understand its basic concepts, which arguably took me several months, it becomes second nature, an extension of your mental representation of the development process. It’s the ultimate Swiss army knife of source control, with the uncanny perk of making it nearly impossible to mess up, you can undo anything (talk about a liberating solution). Its representation model
This is why it dominates in today’s world. I only hated it until I understood it (and isn’t that the case in so many areas of life, heh), and no, I’m not some kind of prodigy or eccentric geek, just an ordinary developer.
Oh, and it isn’t only suitable for “source controlling” Linux kernel development, as you implied, I have used it with undeniable, smooth success from trivial home projects all the way to one of the world’s largest corporate monorepos.
Now, is it perfect, obviously not, its commands and quirky option syntax make me curse occasionally to this day, its GC tends to kick in at the worst possible times, but again, all its flaws I’m aware of are minor hurdles definitely worth negotiating.
And that’s a terrible comment that leads off with a non sequitur, backflipping straight onto a bandwagon fallacy which then crashes in on itself because it fails to give a single example as to why people “dislike” it in order to spark any meaningful conversation. The hint is that nobody hates it for doing what it’s meant to be doing (keeping versions), defeating the entire criticism.
Only clear sign here is that the statement came off sounding like it’s from someone who cannot write and most probably cannot checkout code either. Sincerely hope nobody has to touch work coming from authors with a similar misunderstanding of reality.
Thank you Mark.
I use git almost this way… but I learned something from rethinking the way you implemented a new method on an interface: stash it, implement the method in the implementing classes, and then return to the top-level goal.
Why does it say “git statsh” in the graphic with the climber? 😀
The diagram invokes git statsh twice.
I love the analogy. Do you think that micro-commits serve as training for free-solo development where you commit only at the summit? 😛
Off topic maybe, but,
Why await on every row of while (await rdr.ReadAsync().ConfigureAwait(false))? I wouldn’t have thought to make that async.
The default implementation of that method seems to just call the sync version
– https://stackoverflow.com/q/47229906
– https://docs.microsoft.com/en-us/dotnet/api/system.data.common.dbdatareader.readasync?view=net-6.0&viewFallbackFrom=dotnet-plat-ext-6.0#system-data-common-dbdatareader-readasync
I’m wondering if your implementation better supports async than the default one, and which implementation you are using.
SQL queries return bulk data (based on some SELECT statement return a list of items), so intuitively I’d guess that the (maybe) slow part is the query, not the “read item i of the results” part, unless perhaps the part of the driver that converts it to object data is slow.
Though I’m a little confused by that part, too, because computers are generally very fast at converting data from one type to another, and I am kinda wondering if the overhead of async is significant relative to the time spent converting the SQL results to something C# can use.
Also, obligatory “response time” analysis, parsing entries in the results is
– *Maybe* something that can be done in parallel, if your SQL implementation supports this I would find that pretty cool!
– Without that, though, there is no noticeable response time improvement to the user. There’s nothing we can respond with until we have all the results, unless we want to return one HTTP response per SQL row.
Or is this sort of a “future proofing” thing? E.g., that this code may be used with a different SQL implementation / server implementation?
It’s not as deep as that.
When I wrote that code, I was already in an asynchronous context, since the `ReadReservations` method returns a Task. In those situations, if an API offers me an asynchronous version of a method, I use that by default. I’m assuming that any library that offers both a synchronous and an asynchronous version of a method only offers the synchronous version for backwards compatibility, and that the asynchronous version is the preferred method.
Even if the current implementation of `ReadAsync` currently runs synchronously, future versions may behave differently.
Unless there’s a good reason to do otherwise, that’s my modus operandi.
I rarely attempt to predict how my code is going to perform. It usually performs well enough. In the rare cases where it doesn’t, I prefer measuring rather than speculating on what might be faster. I consider my approach to performance optimisation to be compatible with [Eric Lippert’s](https://ericlippert.com/2012/12/17/performance-rant/).
The method is actually truly async; the documentation is just misleading. ReadAsync() will first call ReadAsync(CancellationToken.None) [0] and from there it will fall back to sync.
But SqlDataReader overrides the ReadAsync method that takes a CancellationToken with an actual async implementation [1], so both methods will be async. The docs on ReadAsync without a CancellationToken are inherited from the abstract base class DbDataReader, though, so this is not explained when you click on that method in the MS Docs instead of the one with the CancellationToken.
[0] https://github.com/microsoft/referencesource/blob/5697c29004a34d80acdaf5742d7e699022c64ecd/System.Data/System/Data/Common/DbDataReader.cs#L242
[1] https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqldatareader.readasync?view=dotnet-plat-ext-6.0#system-data-sqlclient-sqldatareader-readasync(system-threading-cancellationtoken)
I feel like this whole micro-commit thing is overkill. Sure, you should commit little and often, but what’s the value in committing every time you have code that compiles if you’ve not actually made any real progress towards the overall solution; let’s not forget that syntactically sound code isn’t the same as logically sound code.
I feel like there’s a far simpler solution to this, and that’s just to make sure that you have a high undo-max-steps set in your IDE. Ctrl+Z is your friend.
I don’t know if this is how everyone else uses it but I tend to commit experimental stuff to branches that are easy to delete / revert.
A pile of commits to main / the current feature branch that end up being the wrong solution (and sometimes you don’t know until you’re a bit deep into it) can be a bit confusing to follow if you end up rewriting the changes or reverting them.
Of course knowing when you’re about to do something that might be bad is part of the “fun”. My crystal ball is pretty good these days thankfully.
Tried a git gui app that allows to graphically move commits around? This way you can stack the preparatory cleanups and function extractions etc up front, then handle those with one or more merges, using your pipeline to verify all still works. And then the new feature or bugfix becomes a small commit that clearly shows what logic was the actual fix or feature.
I don’t think that I wrote that one should commit every time the code compiles.
I also use Ctrl+Z frequently, but the problem with relying too much on the undo functionality of an editor is that the undo history is linear. Sometimes, you may want to undo something you did a while back, while still retaining something more recent. Git, being a directed acyclic graph rather than a linked list, allows more nimble manoeuvring.
You should never be more than [2 minutes away from checking in and going home](https://www.youtube.com/watch?v=aWiwDdx_rdo), that’s the message from the excellent presentation/demo by Woody Zuill and Llewellyn Falco. This does not translate directly to unconditionally committing something every two minutes, but you should commit whenever you have something that is consistent enough to compile and pass tests (which you of course are verifying by using [git test](https://github.com/mhagger/git-test), right?). You can always join commits later on with an interactive rebase if you feel the existing commits are too small/detailed (that’s a trivial and conflict-free operation when the commits are history neighbors). E.g. in this example you could join the five commits into one before making a pull request/merging the result. But you should most certainly create individual commits like presented here while working on the feature.
@Mas I’ve never had the idea that ‘virtually everyone’ hates Git, in fact I only ever hear positive things about it, other than that it’s hard to learn.
How does this interact with test-driven development? Do you keep the target tests outside of the code so you can keep micro-commits, or do you modify your notion of “target” tests to be more incremental?
It works fine with TDD. Write a test, make it pass, commit. Refactor, commit. Write a new test, make it pass, commit. Refactor, commit.
Since commits are really ‘snapshots’ rather than deltas, the name ‘micro-commit’ is a bit misleading. (I also know Tim; he didn’t actually coin the term ‘micro-commit’.)
It’s really more like a ‘baby step’ (or a ‘small move’, if you like the movie/book ‘Contact’).
Fortunately, in the SCM-patterns world, this has been called ‘Private Checkpoint’ (since the late 90s).
For example, if practicing TDD, with its famous progression of Red-Green-Refactor, it would be common to make a (Private) Checkpoint after each one of those steps, even though you likely wouldn’t merge until after the refactoring was done for each such TDD cycle.
With Git in particular, although such frequent checkpointing may be common, once you are ready to merge/integrate your changes so far, you should probably ‘squash’ all those previous checkpoints come merge-time.
One thing I dislike about `git stash` is that the stack of stashes is not per-branch but shared globally for the repository. If I need to switch branches in the middle of a messy edit, I much prefer to make a commit of non-compiling code with a message marked “WIP”, and when I return I can just `git reset HEAD^1` to get back to where I was.
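For what it’s worth, that workflow can be sketched like this (the branch name is hypothetical):

$ git add -A && git commit -m "WIP: does not compile"
$ git checkout other-branch        # deal with whatever interrupted you
$ git checkout -                   # back to the original branch
$ git reset HEAD^1                 # unwrap the WIP commit; the changes stay in the working tree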
That had me looking up “statsh” as a Git command.
Me too, on any other website I would just think “this is just a typo”. But on stackoverflow I thought this might be a new feature 😂.
Maybe it’s because this is a contrived example, but why the need for stashing here?
Why not just add the method, and go stub out the implementations in the classes that implement the interface? If it’s about risk, you rightly say “…this is in a brand-new method that has no callers.” What’s the risk in doing all the stubs at once?
I hate “stash”. In almost every way imaginable, I find Git to be better than TFS. One thing from TFS that I miss sorely, though, is shelf-sets. Why can’t I “apply” individual files? I know I can convert a stash to a proper branch, but that’s a lot of forced process just to grab one file. I’m sure a lot of that is just spending the time to automate ways of doing things that I can’t currently do easily.
Great article, though. It’s so tempting to use “big bang” programming. Then, you wind up in the “a bridge too far” state!
Jamie
What you suggest is also what I did in the book. With this article I wanted to show that a more fine-grained alternative exists.
I’m all for committing often but I think these commits are way too small. If it takes roughly the same amount of time to actually commit the code as it did to write it then you’re just wasting your time. Especially if people you work with value a “clean history” then you’ll have to squash those commits before pushing anyway which just creates more busywork.
Do you consider typing speed to be the bottleneck in programming? I don’t: https://blog.ploeh.dk/2018/09/17/typing-is-not-a-programming-bottleneck
Oh, and I don’t squash my commits: https://blog.ploeh.dk/2020/10/05/fortunately-i-dont-squash-my-commits
What is the aversion to making commits in a non-compiling state? You don’t have to make every branch public. When you figure out a problem or are about to make a big change you should commit, build error or no. Someone wants to test something out and they need it buildable they can just use a different branch/commit and rebase when you get to a good point.
It’s up to each team to decide a policy on this matter, but I prefer all commits to be buildable and passing all tests, because it enables troubleshooting. For various reasons, I tend to often rebase branches as I work with them, and I favour being able to run the build (including tests) for each commit to verify that I didn’t break anything if I reordered commits, etc.
Granted, I have exotic needs for some of my repositories.
Still, imagine that you need to do a git bisect to find the source of a defect. First, you write an automated test that reproduces the bug, then you rewrite history by adding that new test in the past, and finally you run git bisect to find the first bad commit. This is easiest to do if all commits compile and pass all tests (apart from the new hypothetical repro).
This may sound like a stretch, but I mostly consider commits that compile and pass all tests good hygiene.
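A rough sketch of such a bisect session, with a placeholder commit reference and assuming a .NET test runner:

$ git bisect start
$ git bisect bad HEAD                  # the defect is present here
$ git bisect good <known-good-commit>
# For each commit Git checks out, build and run the repro test, then report:
$ git bisect good                      # or: git bisect bad
# ...repeat until Git names the first bad commit, then clean up:
$ git bisect reset

If every commit builds and passes its tests, the search can even be automated with something like git bisect run dotnet test.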
If you use an IntelliJ-based IDE, e.g. Rider, you have a “micro-commit” system built in, called “Local History”. With it, you can revert a file or folder to any past state. This saved my head many times already, and it’s one of the reasons I prefer Rider over Visual Studio or VS Code.
Time to put my pedant hat on
Free Climb != Free Solo
The terms are not interchangeable, they even have separate Wikipedia articles
https://en.wikipedia.org/wiki/Free_climbing
https://en.wikipedia.org/wiki/Free_solo_climbing
I’m switching to TCR (circumstance permitting), and I’m finding it more efficient than the contrived micro committing tactics. YMMV.
Mark,
Thanks for an interesting article. I see two potential problems with this methodology, and am eager to hear your thoughts.
My main issue with unsquashed micro-commits is a potential loss of context. You seem to suggest that a single new feature should consist of separate commits for, say, adding instance members, adding interface members, implementing instance members, adding calls to the new members, etc. Imagine now that I (possibly at a much later date) want to know why a line was changed, and I use git blame. I imagine it would be more helpful to me if the commit contained the changes for the entire feature, so that I know the context for the change (including a hopefully helpful commit message), and can easily see what other changes were part of the same feature. That way, I wouldn’t have to look for the commit in the git log, and then hunt up and down from that commit to try to identify which commits are part of the feature and then piece together a coherent context for that change. (One could perhaps use branches for this, but unlike commit messages, branch names are very restricted, and AFAIK you’d need to push them to the remote, which would end up with very many branches after many features have been implemented.)
Another issue: You have previously said on your blog that you can only trust a test if you have seen it fail. In terms of red-green-refactor, why not commit failing tests? If a bug is reported, isn’t it worth having a commit “Add failing test for …” to explicitly document that that test, at some point in time, did indeed fail?
The arguments for and against squashing commits remind me of the controversy about event sourcing. Fundamentally, commit-squashing is a destructive process. Once you’ve squashed many commits into one, you can no longer reconstruct the original history.
On the other hand, if you retain the original commit history, you can always do a ‘private squash’ if you need to. In the situation you outline, if you truly feel you could get better insights from a squashed commit, then squash it on your local machine, and keep the squash there.
That suggestion, however, is mostly for didactic purposes. Git is perfectly able to show a diff across multiple commits. You don’t have to squash the commits to see that ‘condensed’ view.
With that out of the way, I’m not ready to accept the premise that a squashed commit is somehow more informative. Is one massive commit better than many small commits? In a big commit, you have to navigate a large diff set. With many small commits, you do need to navigate up and down a branch, but that’s possible, too. I usually use gitk for that.
The argument that it may be easier to navigate one big thing instead of many small things reminds me of another discussion: How to structure files in a code base. I’m a proponent of many small files in a code base. Some people react negatively to a myriad of small files because they feel that it makes it hard to navigate the code base. Apparently, they feel that a few God Classes are easier to navigate.
I disagree.
You write: “One could perhaps use branches for this, but unlike commit messages, branch names are very restricted, and AFAIK you’d need to push them to the remote, which would end up with very many branches after many features have been implemented.”
Yes, indeed, that’s how git merge works. Once you’ve merged a branch, you can ‘delete’ it so that it doesn’t look as though you have a lot of clutter. The only thing that you delete when you do that, though, is the branch name. The history (and the branch) remains, which, to be clear, I consider a good thing. Again, you can always squash locally if you have the original history, while the converse is impossible.
As to the other issue, I consider it good hygiene that every commit builds; i.e. compiles and passes all tests.
Why? It’s useful to know that if you go back and examine an old commit, the code in that snapshot was considered correct at the time.
If, for the sake of argument, you allow tests to fail in a commit, why stop there? Should we also allow commits where the code doesn’t compile?
You wrote: ” isn’t it worth having a commit “Add failing test for …” to explicitly document that that test, at some point in time, did indeed fail?”
You can still do that even if you check the test and the fix in as one commit: Check out that commit, and then check out all files except for the test file from the commit prior to that. You should then be able to observe the failing test.
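One way to observe that, sketched with a placeholder commit reference and the production-code directory from this code base (git restore requires Git 2.23 or later):

$ git checkout <commit-with-test-and-fix>
$ git restore --source=HEAD~1 -- Restaurant.RestApi/   # roll only the production code back one commit
$ dotnet test                                          # assuming a .NET test runner; the new test should now fail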
Git is so useful and powerful if you let it.
At my work, every commit must pass all tests, so we cannot merge a failing test.
Usually we do what Mark said and commit a test that fails on its own together with the change to the production code that makes it pass. When I review code, I sometimes verify that the test fails by running it without the change to the production code. Often the situation is simple enough that a static analysis is convincing.
Other times, we commit a characterization test and a comment next to it saying so. Then, in a separate commit when we later make the change to the production code, we make a tiny change to the test. For example, we might change the assertion in the test from Assert.false to Assert.true.
Thank you for the clarifications. While I am already conscious of commit sizes, I am now more convinced than before that it would be beneficial to reduce my commit sizes even more. I’m not sure I buy into the degree presented here (e.g., separate commits for adding placeholder implementations that throw NotImplementedException), at least not for the commits I push to the remote, but I’ll keep it in mind and use my best judgement as I go along.
I assume part of the story is that I use Rider as my IDE, which already gives me excellent maneuverability in terms of stashes (or shelves, as Rider calls it), diffs, committing parts of changes (similar to git add -i), etc. Commit sizes have therefore not been an issue for me in terms of tactical maneuverability while developing, experimenting, etc.
Funny enough, I blogged in the same spirit a couple of years ago. Glad to see that I’m not the only one doing it 🙂
https://fullstack.info/why-you-should-use-an-scm/