Why you should build on Kubernetes from day one
If you’re starting a new project from scratch—a new app, service, or website—your main concern usually isn’t how to operate it at web scale with high availability. Rather, you’re likely to focus on building the right thing for the customer you’re targeting or finding product market fit. If you’re creating an MVP for a startup, you need to get there before scaling; otherwise, who are you scaling for? If you’re a developer at an enterprise, you want to make sure that you’re meeting the expectations and needs of the business. Operating at scale is, at best, a tomorrow problem.
Hence, when it comes to choosing the right set of technologies, Kubernetes—commonly associated with large, distributed systems—might not be on your radar right now. After all, it comes with a significant amount of overhead: setting up and operating a cluster, containerizing your app, defining services, deployments, load-balancers, and so on. This may look like massive overkill at the early stage, and you might think that your time would be better spent on other tasks, such as coding the first couple of iterations of your actual app.
When we started building Stack Overflow in 2008, we didn’t have that choice. There was no Docker (2013), no Kubernetes (2014). Cloud computing was in its infancy: Azure had just launched (2008), and Amazon Web Services was about two years old. What we built was designed for specific hardware and made a lot of assumptions about it. Now that we’re modernizing our codebase and moving to the cloud, there’s a fair amount of work we have to put in to make Kubernetes and containers work.
Going through this process has given us a new perspective. If you’re building a new app today, it might be worth taking a closer look at making it cloud-native and using Kubernetes from the jump. The effort to set up Kubernetes is less than you think. Certainly, it’s less than the effort it would take to refactor your app later on to support containerization.
Here are three reasons why building your app on Kubernetes from the start might not necessarily be such a bad idea anymore.
Managed Kubernetes does the heavy lifting
At Stack Overflow, when we set up our first in-house Kubernetes cluster a couple of years ago, it took us close to a week to get everything up and running: provision virtual machines, install, configure, configure, configure. Once the cluster was up, there was ongoing maintenance. What we ultimately realized was that Kubernetes was great for us—but we wanted somebody else to run it.
Today, managed Kubernetes services such as Amazon’s Elastic Kubernetes Service (EKS), Microsoft’s Azure Kubernetes Service (AKS), or Google’s Google Kubernetes Engine (GKE) allow you to set up your own cluster literally in minutes. For example, in AKS, you can just click a few buttons in the portal and fill out a couple of forms.
This is convenient, but you might want to stop short of actually creating the cluster at the end of the workflow. Go through the wizard, but don’t click that blue “Create” button at the end! Instead, download the configuration you just created as an ARM template and check it into your source control system. Now you have the best of both worlds—ease of use and infrastructure as code (IaC)!
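To illustrate (this is a sketch rather than a prescribed setup; the workflow name, file paths, resource group, and secret name below are all placeholders), a small CI workflow can redeploy the checked-in template whenever it changes:

```yaml
# Hypothetical GitHub Actions workflow: redeploy the exported ARM template
# from source control. Resource group, paths, and secret names are placeholders.
name: deploy-aks
on:
  push:
    branches: [main]
    paths: ["infra/**"]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}   # service principal credentials
      - name: Deploy the exported ARM template
        run: |
          az deployment group create \
            --resource-group my-aks-rg \
            --template-file infra/aks-cluster.json \
            --parameters infra/aks-cluster.parameters.json
```

You could just as well run the same `az deployment group create` command by hand; the point is that the cluster definition now lives in your repository alongside your code.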
Once you’re set up, there’s little left to do when you start scaling your application except write bigger checks to your cloud provider. Any additional resource allocation is easy. The problems that come with scale—fault tolerance, load balancing, traffic shaping—are already handled. At no point will you hit that moment of being overwhelmed with success; you future-proofed your app without too much extra effort.
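To give a sense of scale for that claim: a replicated deployment exposed through a cloud load balancer is only a couple of short manifests. Here is a minimal sketch, with a hypothetical app name and placeholder image:

```yaml
# Minimal sketch (placeholder names and image): three replicas of a web app
# exposed through a provider-managed load balancer.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # fault tolerance: Kubernetes keeps three pods running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: ghcr.io/example/web:1.0.0   # placeholder image
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: LoadBalancer          # the managed service provisions the external load balancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```

Scaling later is largely a matter of raising `replicas` or adding a HorizontalPodAutoscaler, rather than re-architecting.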
You can stay (somewhat) cloud agnostic
If your project is successful, chances are that technology decisions made in the early stages will still have an impact months or years down the road. Stack Overflow, for instance, was originally written in C#. 13 years later, it’s still written in C#. Occasionally someone (ahem) suggests that we rewrite it in Node.js. That’s still never happened.
The same can be said about dependencies on cloud services. You might build your new app on top of infrastructure as a service (IaaS) products like Amazon’s EC2. Or maybe you’re starting to take dependencies on platform as a service (PaaS) offerings such as Microsoft’s Azure SQL. But are you willing to make a long-term commitment to the cloud providers behind them at this stage? If you don’t know yet where your journey is going to take you, maybe you’d prefer to stay cloud agnostic for a little longer.
Let’s get back to infrastructure as code: throwing a tool such as Terraform into the mix is going to help you stay cloud agnostic to some degree. It provides a unified toolkit and configuration language (HCL) to manage your resources across different cloud and infrastructure providers. Your app is unlikely to be truly cloud agnostic, however, in the sense that you’ll be able to just switch your cloud provider as easily as your internet or electricity provider at home.
Here’s a good discussion on this topic in HashiCorp’s forum: Is Terraform really cloud agnostic? As one of the commenters points out:
> “A Kubernetes cluster is a good example of an abstraction over compute resources: there are many hosted and self-managed implementations of it on different platforms, all of which offer a common API and common set of capabilities.”
This sums it up nicely! It’s still not a perfect abstraction. For example, each cloud provider is likely to have its own custom way of implementing things like public load balancers and persistent volumes in Kubernetes. Still, it’s fair to say that if you’re building on Kubernetes, you’re going to stay cloud agnostic to a certain degree.
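Persistent volumes are a good illustration of that trade-off. The claim below uses the same Kubernetes API on every provider; only the storage class name, shown here with illustrative values, ties it to a particular platform:

```yaml
# Same PersistentVolumeClaim API on every provider; only the storage class
# (and the disk technology behind it) is provider-specific. The class name
# below is illustrative, e.g. "managed-csi" on AKS or "gp2" on EKS.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi   # swap per provider
  resources:
    requests:
      storage: 10Gi
```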
You can easily spin up new environments—as many as you like!
Kubernetes is usually seen as a way to manage your production infrastructure. But here at Stack Overflow, we’ve been using it to manage our test environments on the fly. We’re using Kubernetes to host what we call PR Environments. Every pull request can be run in an isolated test environment at the push of a button.
And when we say “isolated environment”, we mean everything: the app itself (with the code changes in the PR branch) with its own dedicated instances of SQL Server, Redis, Elasticsearch, and other supporting services. All spun up from scratch within minutes and running in a handful of containers in a dedicated namespace, just for you and anyone who’s interested in your PR.
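Our actual tooling is more involved than this, but the core idea can be sketched in a few lines of YAML: everything for a given pull request lives in its own namespace, running an image built from that PR’s branch (all names and the PR number below are illustrative):

```yaml
# Illustrative sketch, not our actual tooling: everything for PR #1234 lives
# in its own namespace, so it can be created and deleted as a unit.
apiVersion: v1
kind: Namespace
metadata:
  name: pr-1234
  labels:
    purpose: pr-environment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: pr-1234
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: registry.example.com/app:pr-1234   # image built from the PR branch
          ports:
            - containerPort: 8080
```

Tearing the whole environment down again is a single `kubectl delete namespace pr-1234`.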
This isn’t something we invented; other organizations have been using this concept. The idea is that every code change goes into a version control system like Git through a pull request. Other developers will review the code, but the code won’t tell the whole story. You want to see the code in action. Normally, you’d have to download all the code locally, compile, and run it. That could be simple, but if you’re running a large application that draws code from multiple repos or—have mercy—a microservice architecture, then you may run into several hours of debugging.
Even better, let’s say you’ve squashed all of the commits for a new feature into a single one and are committing it as a single PR. Send that PR environment to sales or marketing as a single link so that they can preview the feature in action. If your sales team wants to demo the app with specific features or custom builds, send them a PR environment link. You won’t have to spend time walking your less technical colleagues through the build process.
A lot of groundwork was required to get to this point. First off, running classic .NET Framework in Windows Containers wasn’t really an avenue we wanted to pursue. It’s possible in theory—Windows node support has been generally available in Kubernetes since v1.14—but the Docker/Kubernetes ecosystem is really more centered around Linux. Thankfully, our migration to .NET Core was already underway, so we decided to bet on Linux containers.
This, of course, came with its own set of challenges. When you’re dealing with a 10+ year old codebase, you’re likely going to find assumptions about the infrastructure it’s running on: hardcoded file paths (including our favorite: forward slash vs. backslashes), service URLs, configuration, and so on. But we got there eventually, and now we’re in a place where we can spin up an arbitrary number of test instances of Stack Overflow, the Stack Exchange network, and our Teams product on our auto-scaling Kubernetes cluster. What a time to be alive!
Looking back at the early days of Stack Overflow, having this kind of tooling available back then would have been a game changer. In the early stages of building a product, you typically want to build, measure, and learn as much and as fast as possible. Using containers and Kubernetes lets you build the tooling for that, and it future-proofs your app in case you end up needing to scale.
So, should you use Kubernetes from day one? Maybe! It still depends, of course, on your specific project, your requirements, and your priorities.
But have you been saying “we don’t need Kubernetes because we don’t have product market fit yet”? Take a closer look, and maybe you’ll find yourself saying “we need Kubernetes because we don’t have product market fit yet.”
Tags: kubernetes, software development, testing
33 Comments
Why you should stop telling people what to do on day 1 without knowing their circumstances.
If you can find the time to finish reading the blog post, you might agree that I’m not actually doing that.
That’s kind of off-topic, but instead of ARM templates, you can now use Bicep (https://github.com/Azure/bicep), a dedicated language designed specifically to make deploying Azure resources easier. It’s a lot more readable than the JSON files that make up ARM templates.
Not at all off-topic – thanks! Do you have any thoughts on using Bicep vs Terraform?
We’re fully invested in Azure, so in our case, Terraform doesn’t seem to provide additional value. Also, we were lucky that Bicep was available when we started our IaC journey, so we didn’t evaluate other tooling. I can just say that IaC is awesome, and made our life so much easier 😀
Great to hear, thanks for sharing!
No, no and no. Rolling out a Kubernetes cluster is considerably complex and definitely not worth it for every startup, especially as most cloud providers charge for it, even if it is idle. It is more important to run your code in containers. This gives you the flexibility to run on compute instances initially, and then migrate to K8s when you need the ability to scale up.
I respect your strong opinion and appreciate you sharing it. However, I have a different view. Rolling out a cluster really isn’t particularly hard anymore, as managed offerings have improved significantly and take care of the heavy lifting. I agree that it’s not worth it for every startup, as that always depends on individual circumstances and goals. However, especially in the early stage, frequent iterations are crucial; the ability to dynamically spin up environments in separate K8S namespaces is, at least for us, a very valuable tool which has increased productivity and enabled faster iterations. Our designers and PMs have described it as a game-changer, and we wish we had this in our toolbox in our own early startup days. Hence, I’m proposing to make that part of your consideration. It’s true that even an idle cluster will accrue charges, but autoscaling will mitigate this to a large degree, and the cost of running just a handful of VMs is relatively low – especially compared to the cost of engineering/designer/PM bandwidth. I agree that it’s important to run your code in containers, but that’s a prerequisite for running Kubernetes, not an argument against it. Good luck!
Spinning up a vanilla cluster is indeed trivial, from my experience on AWS; however, it doesn’t stop there. We spent a week installing additional services. For example, if you want to do configuration management properly you will need Kustomize, Calico if you want to control egress, etc.
In short you are proposing a Ferrari to drive to the corner shop a mile down the road. It will get you there, however, a Fiat Panda will do equally well and is much cheaper to maintain.
> After all, it comes with a significant amount of overhead: setting up and operating a cluster, containerizing your app, defining services, deployments, load-balancers, and so on.
I was hoping this article would address these pains a little more directly but it seems to only have a solution to the “setting up a cluster” part. As someone currently building a small MVP app, I’d *like* to use k8s day-one, but dealing with all the moving parts involved just isn’t practical right now.
Setting up virtual networks, load balancers, deployments, autoscaling, etc… it takes a lot of time and I can’t justify it for a project with zero users.
You’re making a great point here – if setting up tons of infrastructure pieces takes a lot of time, your bandwidth will certainly be better spent on building the actual product. It’s of course impossible for me to judge how many of the pieces you’re mentioning would be needed for your specific project; however, in my personal experience, it’s often possible to get off the ground with a relatively small subset of everything that Kubernetes has to offer in little time. Autoscaling won’t be needed until you have actual users. “Setting up” load balancers and deployments means writing a few lines of YAML. And typically, putting these pieces in place addresses problems you’ll have to solve regardless, with or without Kubernetes. I mean, deployments … you’re gonna have to deploy your code somehow, right?
And especially at MVP stage, you’re gonna be living and breathing fast iterations while seeking tons of feedback from various stakeholders. Wouldn’t it be nice if you were able to have different versions of your app and quickly spin them up and down as needed? Kubernetes will allow you to do that. Good luck!
All good points! I’ll have to look a little more into it 🙂
I know GKE also offers an ‘autopilot’ version of the cluster which is easier to set up and go for smaller apps, not sure if it’s available on other cloud providers.
Hey buddy, I’m with you – Kubernetes is the answer for scalable and successful startups, but there have to be more layers of abstraction to get us from nothing to a working deployment that can be reproduced in any cloud provider or on-prem cluster. The technology is just not there yet, unless someone has a different point of view.
I second that! And such abstractions are emerging already. For example, have a look at https://github.com/deckhouse/deckhouse — it’s Kubernetes that works on any infrastructure (just as you mentioned) and is simplified in management/maintenance.
If the Sr. leadership has committed to, let’s say, Azure – would you still use Kube? Or go with App Services and Az Functions?
Great question! In that case, staying cloud agnostic is not a concern. Your technology choices will always depend on your specific circumstances and goals. If you would benefit from dynamically spinning up test environments in separate namespaces in a K8S cluster, as outlined in the blog post, I’d recommend factoring this into your consideration.
If you are running IaC as recommended in the post, I would go for Web Apps and Functions for sure. You get the same possibility to do PR-dedicated environments from those services too. The big setback for me when deciding to go for Kubernetes at an early point is that people need to understand it to operate it (and sadly not a lot of people do), which will slow you down. Additionally, when building the service within Kubernetes from scratch you will have to be careful not to construct a distributed monolith; when building on pure PaaS services you are more likely to build independent services by design. Finally, at the Build conference Microsoft went live with project Lima/Azure Stack for web apps and functions, which means that you can deploy your services as containers in any Kubernetes cluster: AKS, any other cloud, or your locally hosted one.
With that said, is it worth learning and understanding kubernetes? Sure!
Another blog equating “app” with “web app”. Is Kubernetes needed for a desktop app? For an automotive app? For an embedded app?
These articles are not assuming your “app” is a “web app”; they are assuming it has “cloud” infrastructure.
Nowadays, even refrigerators have an internet connection and use cloud infrastructure, so yes, those considerations are valid for any app that isn’t strictly offline, including embedded automotive apps.
Great post.
I wish more would invest in containers and Kubernetes on day one. Kubernetes solves complex scaling pain and makes web applications a dream to build.
As an ops guy who knows how to code, I understand kubernetes. For a dev guy, learning to do ops may seem like a mountain of work to get an application up on the internet and so kubernetes is avoided. Trust me, kubernetes is as good as this post makes it. When implemented properly as this post implies, kubernetes will make your app unstoppable.
Yup – that is all handy and dandy, but either we are using Kubernetes in the wrong fashion or there is a paradigm mismatch in what we are trying to accomplish as a startup. First off, we totally recognize that Kubernetes is the foundation of our data analytics and machine learning platform, which we are developing and aiming to sell in public cloud marketplaces. There are just a few problems here. First, deploying our app stack into a place like AWS costs thousands of dollars a month, so what do we do? We have a hybrid cloud structure where we want scripts that can deploy our platform into an on-prem Kubernetes cluster or a public cloud; the problem is all the little details involved, which only Kubernetes experts can understand.

It’s hard to get your head around the platform in the first place based on the way it’s designed. Some of our on-prem clusters are using local file storage for PVCs while cloud providers offer their own. So the time it has taken to re-engineer or re-configure the off-the-shelf Helm charts needed for the open source software we use has been horrible! Every time we spin up a new cluster we have to re-clone an environment repo and find/replace values with cluster-specific ones, and with the mix of open source and custom-developed artifacts in Kubernetes it is extremely difficult to come up with a consistent way to build pipelines using Jenkins. What do you do, just send over an ssh command and hope it works? How do we get to the point of having scripts and automation that can deploy and scale our core product across an on-prem or public cloud solution?

In my humble opinion there is a lot of work to do in this space, and more abstraction layers are needed to make Kubernetes less painful. I totally agree it’s an awesome platform; it just sucks when startups have to spend their capital and time on getting things to work within it. The goal is to be cloud agnostic and have a script that can specify CPU/storage requirements for dozens of open source products plus our own stack, and at the end of the day it is confusing as hell. Does anyone else feel this pain?
Hey Max!
I enjoyed the accompanying podcast to this article!
I agree that cloud providers do a huge part of the heavy lifting, but I think it’s also important to mention that baseline cloud clusters are, for most projects, not production ready.
It is easy to get the cluster up, but there is still a full stack of applications necessary to make it observable and reliable. Different apps to satisfy requirements like
* monitoring/metrics
* centralized logging
* handling external traffic (api gateway, ingress, cloud service load balancer, nodeport?)
* developer permissions and deployment process
* pod permissions
* etc
In general, I agree with you that Kubernetes still makes sense from Day 1 for a lot of teams, but even with cloud providers, it takes a significant amount of effort and continual maintenance of the cluster. Of course, the argument can be made that you need all of these things set up no matter what your deployment service/platform is, which is true, but Kubernetes seems to be a much more open space.
I’d be interested in a follow-up post where you lay out what it means for your team to have a production environment be truly production ready. A stack like this:
* prometheus/grafana for monitoring/metrics
* fluent-bit + elastic for centralized logging
* ambassador (now named emissary-ingress) for api gateway/loadbalancer
* cluster-autoscaler
* HPA
* Kube2iam (or equivalent for other cloud providers)
* knative
* argocd?
* etc.
I’d add a caveat to your stance that Kubernetes from Day 1 is worth it if you have at least one engineer who is very comfortable installing and debugging that production stack (or have the time to allow one of your engineers to dedicate their full time to learning it).
It has become relatively easy to provision a bare cluster that does the basics, but that is a very small part of the puzzle: make the cluster scale up in number of nodes and scale back down, enable auto DNS for load balancers, install the Datadog agent / Prometheus / Container Insights, integrate with the cloud provider’s persistent volume system (EBS in AWS, etc.) so you can easily use and take snapshots, integrate with secrets management systems like AWS Secrets Manager because secrets are not safe in k8s, set up backup and failover to another cluster, integrate the container management with the app’s cloud resource management like databases and queues and buckets, get layer 7 metrics (not available out of the box), figure out what resource limits to use on each container, integrate with identity management (e.g. IRSA in AWS) so your pods can safely communicate with the cloud provider API, and and and…
Yes you can start small without any of those, but there will be lots of manual work, lots of security holes, lots of time understanding why some behavior of the cluster is not working the way the docs seem to say it should.
It’s certainly not going to be enough to click on a few buttons, deploy your containers, and marvel at how far things have come. Managed Kubernetes (in whatever provider) is an important gain, but we’re far from having commoditized Kubernetes.
I’d like to add all of this to my comment above!
The more I think about this, the more I think a lot of companies should start with an alternative container service – especially those not at massive scale.
I’d bet that the most productive thing to come out of the small number of companies that decide to do Kubernetes from Day 1 will not be their company’s product, but the Kubernetes tooling they’ve built around their processes that can be made available to other teams looking to get started quickly.
You should defs use Kubernetes on day1 for your complex and highly scalable web based service.
Unless you already understand Erlang, in which case, you won’t be needing Kuber-anything.
How does it compare price-wise/running-cost-wise if similar infrastructure is deployed on, let’s say, App Services?
It’s not the same for everyone but I tend to agree. It’s interesting that in this blog there’s a list of everyone that k8s helps, but no mention of the most important person: the customer.
If you’re essentially making a website, your end user doesn’t care about your elastic scaling, they care about what it does. It just can’t make good business sense to invest up front in something you could benefit from several years down the line, if you hit that million subscriber mark. It would be like buying 20 vacant buildings as the first action of opening a shop.
Having said all that, investing in correct design so you can migrate with less pain, and understanding what that means, always makes sense.
No no no. Absolutely terrible advice. Just worked with a company that has done this and it may have cost them everything.
You get to build the right architecture AFTER you build the right product, not before.
As someone who has spent the best part of the past year trying to migrate a .NET web app from hosting on a VM to Kubernetes (AKS), I’m as much envious of your achievement as I am in disagreement at your advice.
The biggest problem is the technical scope of k8s. It’s *huge*. At every turn there is another level of configuration needed. I would liken choosing k8s to walking through a door: the door is the same size as any other door, and someone tells you that when you walk through the door labelled ‘k8s’ your life will become simpler and everything will just work. So you open the door and enter. Then you find yourself in a very large hall where every wall is filled with more doors, each of them labelled with something you need to configure. You walk through the nearest and find yourself in another large hall filled with yet more doors. You walk through one of them and… it’s the resource configuration equivalent of a Mandelbrot Set.
But that shouldn’t be too much of a problem because everything is well-known and documented. Sort of. In the case of Azure, expect there to be missing documentation, too much documentation, or contradictory documentation. You can spend days going down each rabbit hole trying to determine how to configure one seemingly-niche aspect.
[Here’s one example of this contradictory information: We spent weeks (off and on) trying to figure out why there were occasional outages from our cluster. We opened a support ticket (Azure support… OUCH!) where the support ‘engineer’ was adamant that the problem was in our code base. We eventually found that it was due to our use of burstable VMs – the Azure ‘B’ series. We pointed this out to support and their response was “Never use a burstable VM for Kubernetes. It won’t work.” We showed them a webpage where they *recommended* burstable VMs as being ideal for Kubernetes… no response came back.]
And then there’s the use of YAML – the world’s least friendly markup language. I like the idea of having our provisioning under source control, but… YAML?
Oh, let’s not forget Helm – you’ll almost certainly need to become familiar with Helm as well – and how to configure multiple layers of resource provisioning through other tools (ARM templates > Helm > Azure CLI) all done through YAML referencing additional, often external, scripts.
This whole process might go smoothly if you have devops or sysops people onboard who are familiar with k8s. If you don’t have such people, get such people before you start. Otherwise this process will take years off your life.
And once it has been deployed and is running you’ll get – if you’ve followed best practices – tons of data metrics which you’re expected to digest and use to go back to your cluster to fine-tune its performance. Continuously.
Yes, having a solution where you can deploy your PR branches to k8s and send out a URL for testing/approval is a great use of technology, and it feels like a major step forward once you get it up and running – one worth blogging about, perhaps. But getting there is an arduous, painful journey.
I think the advice that everyone should move to k8s from day one is quite reckless. You need training, patience, and luck. A lot of each. And if you are primarily a developer who builds web services, ask yourself if you really want to be spending a lot of time doing devops, because once you head down this rabbit hole you’ll be spending a lot of time thereafter going back down the same hole. Are you prepared to do this?
Lots of excellent points.
One thing to bear in mind, even with my own comment above, is that you *can* start small in some cases, but not all the time.
E.g. I built a system that used a very simple GitOps setup without Flux or anything, with a Python script that calls kubectl and helm to do things. There was no ingress controller, no container insights, etc., and it did the job. Was it worth hosting the app services in Kubernetes? To some extent, yes: the containers being immutable, the templating offered by Helm, and the lifecycle offered by Kubernetes provided some advantages.
BUT this was a basic setup and required some amount of manual work, e.g. copying settings between resources provisioned by Terraform and Helm charts, and creating Route 53 entries.
Then, as you say, there start to be lots of doors to look at to simplify things. E.g. you can install external-dns, container insights, etc., and all of these involve some configuration, and sometimes it is hard to figure out what is not configured properly; e.g. I’m dealing with that right now for the cluster autoscaler and the Datadog agent.
Another example: Helm charts can be installed from Terraform, which obviates the need to pass data from Terraform outputs to Helm chart values files (you do this with its helm provider). I’m really not a fan of the Helm template syntax, so at that point it is tempting to eliminate Helm and use Terraform’s templating, which is a lot nicer and more powerful. But I will stay away from writing k8s manifests in HCL; rather, I keep them as files and load them with the kubectl provider (which is not the same as the kubernetes provider: it reads manifests from strings, i.e. you can load them from templatefile(), have access to all those niceties, and then apply them through the provider).
So you can start someplace, and that place is easier than it was, but once you are there, yes, you will need someone who specializes in Kubernetes and Terraform. The notion that developers can be good at those and still have time to develop a microservices app is wrong; as soon as you get past day 1 this stops being viable (actually, maybe even from day 1, because there are many concepts that will distract from their job). I did software engineering for many years and I had to make a switch; I could not do both. Ironically, infrastructure as code involves quite a bit of programming, which makes me very happy.
The title is a bit misleading; it would be better named “Why you should make sure your applications are easily portable to the cloud from day one”. I think setting up k8s in the beginning is a lot of unnecessary overhead for a POC/MVP. However, once you’re past that, it makes sense to invest in k8s.
Sorry, but it feels like the article is sponsored. There are so many ways to scale an application, and k8s is just one of them. One should make sure that an application can actually be scaled from day one. Deploying an application with a lot of bottlenecks on k8s won’t help, no matter how many instances/containers you have. And I really doubt that solving bottlenecks/scalability issues is easier than porting a scalable application to k8s or whatever.