A look under the hood: how branches work in Git
Git has won the race for the most popular version control system. But why exactly is it so popular? The answer, at least in my opinion, is pretty clear: branches! They allow you to keep different versions of your code cleanly separated—ideal when collaborating in a team with other people, but also for yourself when you begin working on a new feature.
Although other version control systems also offer some form of branching, Git’s concept and implementation are just stunning. It has made working with branches so quick and easy that many developers have adopted the concept for their daily work.
In this post, I’d like to explore and explain the what, why, and how of branches:
- What are branches, in a nutshell?
- Why are branches so useful?
- How are they used, especially in teams?
- What do branches look like under the hood in Git?
We’ll be looking at the concept of branches in Git from a high-level perspective—with the ultimate goal of better understanding what they are and how they should be used so that you and your team can get the most benefit out of them.
What are branches, in a nutshell?
Put simply, a code base is a collection of files. Any meaningful change (e.g. when developing a new feature or fixing a problem) will most likely involve a couple of those files.
Just for a moment, let’s pretend that version control and especially its concept of “branches” didn’t exist. In such a situation you would have to be very careful when making changes—because you couldn’t easily undo them and you would risk breaking the current, working version. But there’s an easy way to mitigate these risks: you could simply duplicate the whole project folder! In this copy, you could then make any changes you like, without worrying about breaking something.
Branches in Git have the exact same effect and purpose: they provide developers with separate workspaces for their code. Of course, they are much more clever than our simple “copy folder” strategy. For example, they don’t waste disk space (which a simple file system copy would do) and they are much more capable when it comes to collaborating with other developers in the same project.
Let’s dive a little deeper into the details!
Why are branches so useful?
Branching’s value becomes clear if you think about your workflow without them: imagine that a team of 10 or 20 developers all work in the same “folder,” so to speak. All of them simultaneously editing the same source code files. Suddenly, their code is now your code. Some may be implementing brand new features, which could introduce bugs in the beginning. Every experiment that a developer has to perform in order to make progress and every bug that is unavoidable in that process will affect all of their teammates. A handful of people working together in such a way is enough to bring the whole development process to a halt.
When only one context exists that all teammates and all features have to share, commit history looks something like this:
As the (quite confusing) dotted red lines show, working without branches quickly results in chaos: it becomes hard to understand which commits belong to which feature! In the distant past, where a powerful branching model had not been available, version control was—quite frankly—less useful because a major structural element was not available.
Let’s imagine the same scenario with branches in mind. Every experiment, every new feature, every bug fix gets its own branch, its own context that is completely separate from all the other development contexts. “Separate” in this regard also means “safe”: if a bug occurs in one context, none of the other contexts are affected by that. Branches, in other words, are convenient containers that make developing in multiple, simultaneous contexts safe (and possible in the first place).
Features and bug fixes are clearly separated from each other, and so are their commits and their code:
In the above example, the changes in “FAQ Content” and “New Design #1” don’t get in each other’s way.
To make a long story short: the main benefit of using branches is that they make software development safe. They prevent bugs and problems from leaking from one context into another and help you undo mistakes more easily.
Why Git?
As said, branches are not exclusive to Git. Many other version control systems offer branches as part of their toolsets. But there are a couple of advantages that set Git’s branching model apart:
- Simplicity: Using branches in Git is really easy. Beginners sometimes hesitate before creating a new branch and ask themselves if a particular situation justifies creating a new branch. Such hesitation is unnecessary: creating new branches in Git is so simple, and it doesn’t come with any drawbacks. Just take disk space as an example: creating a new branch in Git does not copy around files, possibly creating a large amount of duplicate files; instead, Git’s file management makes sure that nothing has to be copied!
- Speed: In Git, all operations around working with branches are very quick. While some version control systems, for example, create copies of files when a new branch is created, Git’s internal management does not need this. As a result, there’s no waiting time involved when creating new branches or working with existing ones.
- Merging: Creating safe separate contexts using branches is one thing. But integrating (or “merging”) branches back into other contexts is the other part of the game. In many other version control systems, merging can be a complicated business: you might be asked to provide the exact revisions or select specific changesets. Git, on the other hand, makes it very simple and user-friendly. In most cases, it just works without any user intervention.
Git was built with the concept of branches in mind. It was neither an afterthought nor a “nice to have” feature, but instead one of the core requirements from the start.
How are branches used in practice?
The general idea remains the same: branches provide a safe, separate context for the many different kinds of work that happen in any project at any time. In practice, this takes a number of different forms.
Experimentation/New features
Whenever you start “something new” in your project, it makes sense to create a new, separate branch for that. It doesn’t matter if it’s just a little experiment or a full-blown new feature: you want to protect the mainline of code and your coworkers from all the small and big bugs that inevitably happen when new code is being produced.
Having a separate context has the additional benefit that it’s easy to get rid of—if your experiment doesn’t yield anything or your new feature doesn’t work out. Again, just think about a situation like that without branches: your new experimental commits would have been mingled with other features or even the mainline of code, making them difficult to get rid of later.
Bug fixes
Even for bug fixes (if they require more than a single line of code), a new separate branch very often makes sense. One advantage, again, is that the relatively untested new code that you are producing won’t interfere with the work of others. But the more important advantage, in this case, is about organization: by having the bug fix in its own branch, it’s easier to integrate it both when and where you want. Maybe you want to integrate the fix only at a certain point in time or into certain other contexts—all of which is easily possible with branches.
Short-lived branches
Both of the previous examples—experiments/new features and bug fixes—belong to what are known as short-lived branches. As the name implies, these branches aren’t meant to live forever. Instead, they were created for certain well-defined purposes (e.g. implementing a new feature idea) and can then be deleted once they have been integrated into a long-running branch.
Long-running branches
Unlike their short-lived colleagues, long-running branches typically remain in a project during its complete lifetime. These branches are not tied to specific features or topics. Instead, they represent states or stages in a project:
- the typical “main” branch for the project’s mainline or production code
- often also a “develop” branch where different short-lived branches are integrated
- sometimes also branches like “staging” or “production” that represent certain stages or environments
Along with this distinction between short-lived and long-running branches, there’s also often a golden rule: never directly commit on a long-running branch; code should only land on these branches through deliberate integration (via merge or rebase).
How do branches work technically, under the hood?
When trying to imagine how branches work, it’s tempting to use the concept of “folders.” After all, creating a new branch feels very much like copying the project’s current state and pasting it into a new, separate folder.
But part of the genius behind Git is that it doesn’t just “copy all contents” of your project, which would make things slow and use up lots of disk space. It’s much smarter than that!
Pointers, not file system copies
If branches aren’t copies of the file system, what are they? To answer this, it’s important to know that commits in Git are incredibly safe and efficient at storing data:
- Safety: A commit’s contents and metadata are run through an SHA-1 hash algorithm, which makes it very hard to manipulate data after the fact.
- Efficiency: Commits don’t store data again if it has already been stored previously. This means that files that have NOT been changed between commits aren’t taking up unnecessary disk space.
Keeping this in mind, would there be any need to “invent” something on top of this? No, of course not! That’s why branches in Git are an extremely lightweight concept – far from heavy “copies of the filesystem”.
A branch is simply a pointer to the latest commit in a given context.
This pointer is automatically moved every time you add a new commit on top. Very practical!
Let’s look at this fact again, from another angle: commits in Git are identified by their SHA-1 hash, those 40-character long, cryptic strings. These commit hashes are static and immutable. Branches, on the other hand, are highly flexible, always changing to the commit hash of the latest commit anytime you create a new commit in that branch.
What is a pointer, really?
But what do those mystical pointers look like, and where do they live? To answer this one, we’ll have to take a quick journey into the inner workings of Git: the hidden “.git” directory, located in the root of your project. In here, you’ll find a subdirectory named “refs” and another subfolder named “heads”: it’s here that each branch is represented by a file named after that branch.
When you create a new branch, e.g. with a command like “git branch my-new-branch”, you’ll find a new physical file in here, named “my-new-branch”.
And if you were to look at the contents of such a file in a text editor, you’d find that it only contains a single piece of information: the SHA-1 hash of the commit it currently points to!
The image above shows the contents of such a “pointer file”: the branch named “main” currently points to the commit with the SHA-1 hash “bf88ef73c”.
Which of those pointers is currently active?
But how does Git know which of your local branches is currently active? What is the branch that new commits are added to and that moves its pointer along when this happens?
Git keeps another, special pointer named “HEAD”. The sole purpose of HEAD is to point to the currently active (or “checked out”) branch:
In the local .git repository folder, you’ll find this information saved in a file appropriately named “HEAD”. And the content of that file typically looks something like this:
ref: refs/heads/my-new-branch
What happens when switching branches?
Finally, let’s take a look at what happens when you switch the currently active branch—e.g. with a command like git checkout other-branch
or git switch other-branch
.
Such a command will do two things:
- a) The HEAD pointer is changed and now points to “other-branch”.
- b) The files in your working copy are swapped and now represent the state from the latest commit in that branch.
Understanding Git == becoming a better developer
I hope that this post gave you a better understanding of how branches—one of the core tools in Git—really work.
Like in many other areas, it’s only when you deeply understand something that you can really reap its benefits. This is especially true for Git, in my opinion: advanced tools like interactive rebase, submodules, or the reflog have the power to make you a better software developer—but you’ll have to take a bit of time to properly learn and understand them.
If you want to up your Git game, I put together the “First Aid Kit for Git“—a free collection of short videos that explain lots of helpful tools like the reflog, interactive rebase and others that can help you avoid or repair mistakes.
Have fun becoming a better programmer!
Tags: branches, git, source control
20 Comments
Non-Fungible Commit Hashes 😱️
Also, there’s a typo with “Ccommits” about 2/3 down. Nice writeup!
Nice article, very educational! Can I request you to please write something similar for different ways of ‘reuniting’ the branches e.g. merge vs rebase etc. Thanks!
That’s a wonderful idea, Aditya. Thanks for the feedback!
What a great write-up.
Thanks a lot.
As a ClearCase expert (emeritus), I find git branches to be very tedious to use – although better than svn.
The big problem is that git branches seem to be associated with individual files rather than groups of files. Yes you can find check out all the files with the same branch tag, and yes you can diff all the files with a branch tag against corresponding files with a different branch tag.
And this is much better than svn.
But the thing that thing that annoys me is the fact that you can’t easily get the branch point for the collection of all files in the checked out branch within spending three hours digging through online docs that are mostly wrong or incomplete.
I have the solution – you find the common ancestor of all files with the branch and pretend that that is really the base branch from which to children derive. While this seems to work, it’s very frightening because git let’s you do history editing and who knows what mess someone on your team might have made while innocently trying to follow examples on the internet.
Or maybe I have not yet found the magic command that says please give me the branching functionality that ClearCase used to give me.
Yep, git branches might be better than some other source control systems, but they are really bad compared to mercurial.
Mercurial is really similar to git, it is a distributed source control and it has everything git offers plus stellar branch management. Also, it has “revsets”, which is a mini language that allows you to find commits really easily. Branches in mercurial are so simple and efficient! You never struggle to find the origin of a branch. Each commit has the name of its branch embedded in it, you can’t get lost! I don’t know how many times I had to drill down git history just to find the origin of a branch.
It’s simply maddening that mercurial lost to git in the source control war…
I appreciate the article, as I need to try to figure out how Git does things (it is just so awkward to work with).
“Although other version control systems also offer some form of branching, Git’s concept and implementation are just stunning.”
Going to have to disagree 100% there. Mercurial did this much more intuitively than Git. It’s a shame that Git , an awkward and hard to use system, won the popularity contest.
You can tell the system is flawed when the standard practice in many workflows is to use “push -f” to force things. As with any tool, if you have to force it something is wrong.
It’s good to know that I’m not the only one who wishes Mercurial had become mainstream!
If your standard practice is to force push than you don’t know how to use git properly.
Then a large number of projects don’t know how to use Git properly. I’ve seen advice on dev lists for major projects to do forced pushes. It seemed to be related to Git’s awkward user interface and not working as people expect it to. I myself have always used Mercurial (so much easier to use than Git – all of the benefits a distributed system with an interface that makes so much more sense), when I contributed to an open source project, I basically had to re-clone the project after each contribution because after my changes were integrated upstream my existing clone of the source would cause problems if I tried to contribute a new unrelated change… It was a painful experience.
Same here! Mercurial was better in every way… So sad it didn’t win over git…
I’m with you that Mercurial is indeed a wonderful alternative to Git! Absolutely agreed!
My personal experience (and that of my team and the teams I talk to) is that things like “force push” should NOT and don’t have to be standard practice. Designing workflows and collaboration carefully should result in these “bad habits” to be necessary only very rarely.
This article *looks* like what I had written long time ago.
https://xosh.org/explain-git-in-simple-words/
It definitely doesn’t read like it!
After reviewing both, I find your work is more concise and comprehensive (TY). Those seeking to understand Git may wish to read your article first, then return here.
you can find an amazing interactive way to learn how git works on https://try.github.io/ under “learn git branching”.
I agree that branching in git is 100% more efficient than in SVN, but I wouldn’t say it’s actually easier, just from a user point of view – git always has extra steps and gotchas.
And SVN branches are probably easier for the average person to understand, just because they are simply a copy of everything. Also not sure why you believe merging is any easier – you’d still need to resolve any conflicts, and as ever git seems to offer a hundred possible ways to do that and we’re left trying to figure out what it all means.
Having switched from SVN to git, I wouldn’t go back, because of the lightweight branches and specifically how those work with a continuous integration process, but if the simplicity of SVN could be combined with the efficiency of git, I’d be a very happy developer.
If you like the simplicity of SVN then you should use Mercurial. Mercurial even has extensions so you can use the Mercurial commands to clone and push to a Git repo.
Thank you, Tobias, for a clear and easily understandable write-up.
Enjoyed reading it 🙂
I just started learning Git/Github and this has cleared up so many questions. Great read A++!