Thanks to David Meyers, Principal Engineer at Flipp, for introducing me to this concept, normalizing it, and implementing it flawlessly.
Early-stage startups are often chaotic and scattered. "Move fast and break things!" was repeated in the standups of hundreds of startup engineering orgs. Startups typically build monolithic systems to reduce friction in creating and changing features. As time goes on, the monolithic architecture begins to show strain, and almost inevitably gets broken up into separate services.
Teams begin to take ownership of the new services, usually adhering to Conway's Law. Since each service is owned by a single team (or should be), questions inevitably arise about which technologies each service should use.
Should the backend administrative system, used by a dozen internal employees, be written using the same frameworks and stack as the tight, performance-critical end-user system? Should batch processors have the same technical footprint as stream processors? Should analytics databases use the same solution as the one used for ad-hoc queries?
Some engineers love to experiment with new, bleeding-edge technologies. Others warn of the perils and demand adherence to the tried-and-true. As the company grows ever bigger, the same sorts of problems emerge over and over again to be solved. Do you solve those problems the same way you did earlier because it's already done? Or do you go with a new approach that offers more benefits?
How these problems get solved becomes more critical as a startup transitions to a medium-sized company.
This article discusses all kinds of technology from a software engineering perspective. This can include programming languages, frameworks, database systems, ways to store and transfer files, schema definition systems, etc. I will use the word tech as shorthand for this list of solutions, systems, and projects.
Option 1: The Wild West
At one extreme, we could simply allow every team to choose whatever tech it wants to use, with full autonomy. There's no oversight committee, no red tape, and nothing blocking teams from doing what they want, how they want.
Advantages:
- Teams are empowered to make their own technology decisions based on the information they have.
- The "best" tech for the problem being solved can be used.
- Teams are not tied down by previous technical decisions made either by themselves or other teams.
- Morale is high since engineers get to use new, fun tech rather than being stuck with old or out-of-style choices.
- Engineers feel that their voices matter in technology decisions.
Disadvantages:
- It is more difficult to reuse previous solutions written in other languages or frameworks.
- It is harder to generalize solutions across similar problems since reuse is harder.
- There is risk of "technology sprawl": a single engineer has to know multiple technologies in order to succeed at their job.
- This can result in superficial broad knowledge of languages or systems rather than deep knowledge of one tech, which can affect the ability to debug problems or make more advanced changes.
- Hiring and internal training become more onerous. Either the company has to hire engineers who already know the full set of tech or spend more time training them once they are hired.
- Internal support becomes more fractured since there will be fewer experts in any one tech to help less experienced developers.
- Building internal tools may become more difficult since they have to work with all the tech that the team (or company) supports.
Option 2: Lock it down
At the other extreme, whatever choice was made at the company's inception becomes the one and only solution that must be used. If the first application was made using Python and Postgres, then every service must use only Python and Postgres.
The advantages and disadvantages are swapped here.
Advantages:
- Reusing previous solutions and generalizing them becomes much easier since you are guaranteed to use the same tech stack.
- Each engineer can be trained on a single stack, can more easily develop deep knowledge of it, and can therefore help support other engineers.
- Hiring is easier since only one set of skills is necessary.
- Internal tools are simpler since they only have to deal with one set of tech.
Disadvantages:
- Engineers become demoralized since they feel stuck with old tech that may not be well suited to the problem at hand.
- Trying to squeeze a tech into solving a problem it isn't designed for can result in increasingly hacky and costly kludges.
- Engineers do not feel empowered since they have no agency to make technological decisions.
The golden mean
As you can guess, the "right path" is somewhere in the middle of these extremes. At Flipp, the term used is the Tech Toolbox, and it works like this.
- The team or company curates a list of approved tech. This list should be very small.
- The contents of the list should start with whatever the company is using at the moment.
- Each tech on the list should be given an overall status of approved, pending approval, discouraged, or not allowed.
- Further, each entry should specify which use cases the tech is approved for. For example, Postgres might be approved for ad-hoc queries and internal tools, while MongoDB might be approved for more performance-critical uses where the queries are well-known.
- All new projects must by default use tech on this list, and all other tech is not allowed.
- Approved tech is given official support by internal tools and our engineering guilds. All other tech is not supported.
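To make the shape of such a list concrete, here is a minimal sketch of how a toolbox could be kept as data and queried. It is not Flipp's actual implementation: the names `Status`, `ToolboxEntry`, `TOOLBOX`, and `is_allowed`, as well as the use-case labels, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """The four overall statuses described in the list above."""
    APPROVED = "approved"
    PENDING_APPROVAL = "pending approval"
    DISCOURAGED = "discouraged"
    NOT_ALLOWED = "not allowed"


@dataclass(frozen=True)
class ToolboxEntry:
    """One tech in the toolbox, with the use cases it is approved for."""
    name: str
    status: Status
    use_cases: frozenset = field(default_factory=frozenset)


# Hypothetical entries mirroring the Postgres/MongoDB example above.
TOOLBOX = {
    "postgres": ToolboxEntry(
        "postgres",
        Status.APPROVED,
        frozenset({"ad-hoc queries", "internal tools"}),
    ),
    "mongodb": ToolboxEntry(
        "mongodb",
        Status.APPROVED,
        frozenset({"performance-critical queries"}),
    ),
}


def is_allowed(tech: str, use_case: str) -> bool:
    """Return True only if the tech is approved for this specific use case.

    Anything missing from the toolbox is treated as not allowed, which
    matches the "all other tech is not allowed" default.
    """
    entry = TOOLBOX.get(tech.lower())
    if entry is None:
        return False
    return entry.status is Status.APPROVED and use_case in entry.use_cases
```

Under these assumptions, `is_allowed("postgres", "ad-hoc queries")` returns `True`, while a tech that is absent from the list, or an approved tech used outside its listed use cases, returns `False`.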
Crucial to the success of this framework is that it should have three processes:
- Adding a tech to the toolbox
- Auditing and removing tech from the toolbox
- Allowing one-off exceptions to the policy
There should be a committee of senior, staff, or principal engineers who manage these processes. This sounds heavy-handed, but in reality, once the toolbox is set and disseminated properly, these processes happen very rarely.
Adding a new tech
If a team feels that none of the tech in the toolbox currently solves its problem and that a new tech could be broadly useful across the team or company, it can petition the committee to add it to the toolbox.
This petition needs to be presented as a business case. This doesn't mean that the team needs to do research with actual dollars attached to it. They need to be able to argue one or more of the following:
- The tech allows us to significantly reduce infrastructure cost by being more performant.
- The tech allows us to reduce engineering costs and go to market faster by making it significantly faster to develop with.
- The tech saves on testing and quality costs by reducing errors.
- The tech increases morale by providing a less stressful development environment. (This can be a broad category.)
Generally, there should be some kind of data attached to this. For the last point, it might simply be surveys of the engineers on the team.
Once the committee is satisfied, the tech is considered pending approval. The requesting team should proceed with a proof-of-concept implementation of the new tech so that any kinks or surprises can be ironed out. Once this is deemed a success, the tech can be approved in the toolbox.
If the tech fails the process, that doesn't mean the proposal is dead in the water! Proposals can always be revisited if there is new information or use cases. And the proposing team always has the option of using it for one-offs (see below).
Auditing and removing tech
Periodically, the committee should speak to the engineers about the systems they own and see if a particular tech has fallen out of favor, and if so, why. Sometimes a new tech entirely supplants an older one, and the older one is deemed unwise to use since the new one is better in most respects for the use case in question.
If the committee and engineers agree, the tech in question is demoted to discouraged. This means that no new projects should be built with it, and all existing projects using it should have some kind of plan to move off of it if possible.
Once as many services as possible have moved off it, the tech can be moved to not allowed. There can be legacy systems using the old tech that are grandfathered in and allowed to stay, but these are "special cases" and don't affect the overall status of the tech in the toolbox.
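The status changes described in the last two sections form a small lifecycle, which can be sketched as a state machine. This is only an illustration under stated assumptions: `PROPOSED` is a hypothetical pre-toolbox state for a petition the committee has not yet accepted, and `ALLOWED_TRANSITIONS` and `transition` are made-up names. Rejected or revisited proposals and grandfathered legacy systems are deliberately not modeled.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"  # assumption: not a toolbox status, just a petition
    PENDING_APPROVAL = "pending approval"
    APPROVED = "approved"
    DISCOURAGED = "discouraged"
    NOT_ALLOWED = "not allowed"


# Each status maps to the statuses it may move to next.
ALLOWED_TRANSITIONS = {
    # The committee is satisfied with the business case.
    Status.PROPOSED: {Status.PENDING_APPROVAL},
    # The proof-of-concept implementation is deemed a success.
    Status.PENDING_APPROVAL: {Status.APPROVED},
    # An audit finds the tech has fallen out of favor.
    Status.APPROVED: {Status.DISCOURAGED},
    # As many services as possible have moved off the tech.
    Status.DISCOURAGED: {Status.NOT_ALLOWED},
    # End of the lifecycle in this simplified sketch.
    Status.NOT_ALLOWED: set(),
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise ValueError if the step is skipped."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(
            f"cannot move from {current.value!r} to {target.value!r}"
        )
    return target
```

Encoding the steps this way makes it explicit that a tech cannot jump straight from approved to not allowed: it has to pass through discouraged first, which gives existing projects time to plan a migration.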
Allowing one-off exceptions
No one likes working for tyrants, and the tech leadership committee should not act like one. There are always cases where a tech that didn't make the cut or is marked as discouraged still needs to be used.
With a tech toolbox, we have a single central document we can point to explaining what we use at Flipp and why. We have a "paved road" to make it easier to create projects using this technology, such as an app generator, shared libraries, deployment support, and more. We have guilds that meet regularly and take on projects to improve the usage and documentation for how we use that tech within the company. And we have an explicit process to make changes to this list so engineers don't feel disenfranchised.
As your company grows, technology decisions become more costly. Having a framework like this in place will provide a defined path forward.