In the first decade of the 2000s, we saw a shift from physical hardware to virtualization. This shift was driven by increases in clock frequencies and in the number of cores in modern CPUs. It was no longer cost-effective to run a single application on a single physical server. Both open source and commercial hypervisor solutions started gaining tremendous popularity in the server market.
Today we’re seeing another shift, from virtual machines to containers. Containers are virtual runtime environments that run on top of the operating system kernel and emulate the operating system itself. The difference is that virtual machines emulate hardware, while containers (which may themselves run on top of a VM) emulate just the operating system.
Operating system emulation has numerous benefits. Processes running in containers are isolated from each other. Containers can be started and terminated quickly as their footprint or size is typically minimal, making them a great choice for fast scaling.
Virtual machines and containers allow software to better utilize purchased hardware. Multiple applications can run concurrently on the same hardware. For finance departments, high utilization means effective use of capital expenditures.
That’s where the serverless model comes in. Serverless computing describes an application architecture designed as a collection of managed cloud services. Managed services like AWS Lambda, Aurora, or CloudFront take over much of the operational work and let you build your applications quickly. You spend your time developing your business logic rather than fiddling with infrastructure, such as making sure your database operates as a cluster. The serverless application model helps you spin up your application stack and easily organize and manage the state of the cloud resources associated with your application.
Serverless computing was made possible by significant technological advancements in the container space. Cloud providers striving to reduce overhead and provision resources quickly on their platforms found that they needed to develop new types of hardware-accelerated hypervisors.
The combination of hardware-accelerated hypervisors like AWS Nitro and virtual machine monitors like AWS Firecracker allows fast deployment and execution of container environments. Thanks to this new generation of hardware-accelerated hypervisors, compute services like AWS Lambda now offer much higher CPU, disk, and network I/O performance.
A cost-effective cloud environment is all about high utilization of provisioned resources. Serverless computing promises that you’re always running your applications at 100% utilization. When you run your applications on serverless, your provisioning is always perfect because you only pay for the machine time you actually use.
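To make the utilization argument concrete, here is a minimal sketch of the arithmetic. The function names and the hourly rate are hypothetical and chosen only for illustration; they are not real AWS prices, and in practice the per-use rate usually differs from the rate for an always-on server.

```python
def utilization(busy_hours: float, provisioned_hours: float) -> float:
    """Fraction of the paid-for machine time that did useful work."""
    if provisioned_hours <= 0:
        raise ValueError("provisioned_hours must be positive")
    return busy_hours / provisioned_hours


def provisioned_cost(provisioned_hours: float, hourly_rate: float) -> float:
    """Cost of an always-on server: you pay for every hour, busy or idle."""
    return provisioned_hours * hourly_rate


def pay_per_use_cost(busy_hours: float, hourly_rate: float) -> float:
    """Cost when you are billed only for the time your code actually runs."""
    return busy_hours * hourly_rate


if __name__ == "__main__":
    # Hypothetical month: 720 hours provisioned, 180 of them actually busy.
    provisioned, busy = 720, 180
    rate = 0.5  # made-up price per hour, in arbitrary currency units

    print("utilization:", utilization(busy, provisioned))
    print("always-on cost:", provisioned_cost(provisioned, rate))
    print("pay-per-use cost:", pay_per_use_cost(busy, rate))
```

With the same rate on both sides, the pay-per-use bill is the always-on bill multiplied by the utilization, which is the sense in which billing only for used time behaves like running at 100% utilization.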
Serverless compute scales well with your incoming traffic compared to classic compute services like EC2. Scaling with classic EC2 typically involves another service called auto scaling. Auto scaling itself is a complicated feature that handles tracking select metrics like CPU load or network utilization of your virtual machines. Auto scaling uses these metrics to trigger alarms to scale your EC2 instance fleet up or down. As you can tell, EC2 fleet scaling involves a lot of steps, and it’s really easy to miss something because every application is quite different.
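The metric-and-alarm loop described above can be sketched as a simple threshold rule. This is a simplified illustration, not the AWS Auto Scaling API: `ScalingPolicy`, `desired_instances`, and the threshold values are all hypothetical names and numbers invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: thresholds on average CPU load plus fleet bounds."""
    scale_out_above: float  # average CPU % above which one instance is added
    scale_in_below: float   # average CPU % below which one instance is removed
    min_instances: int
    max_instances: int


def desired_instances(current: int, avg_cpu_percent: float,
                      policy: ScalingPolicy) -> int:
    """Return the fleet size after evaluating one metric sample."""
    if avg_cpu_percent > policy.scale_out_above:
        target = current + 1   # "alarm" fired: scale out by one
    elif avg_cpu_percent < policy.scale_in_below:
        target = current - 1   # fleet is underused: scale in by one
    else:
        target = current       # within the comfortable band: no change
    # Never leave the configured fleet bounds.
    return max(policy.min_instances, min(policy.max_instances, target))


if __name__ == "__main__":
    policy = ScalingPolicy(scale_out_above=70.0, scale_in_below=30.0,
                           min_instances=2, max_instances=10)
    for cpu in (85.0, 50.0, 10.0):
        print(cpu, "->", desired_instances(4, cpu, policy))
```

Even this toy version needs four tuning knobs, and the right values depend on the application, which is where real fleets tend to get misconfigured.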
With serverless, you can forget all that complexity. Scaling a serverless application simply means copying the final container to selected hypervisors, executing it, and routing incoming events to it. If your serverless application fits within the container size and runtime memory limits, you’re good to go and your application will scale just right. This is enabled by containers and hypervisors that run closer to the hardware. Serverless has little to no operating system overhead because each container runs just kernel code and the application.
Scaling is something serverless excels at, but it also has some limits. Serverless is a perfect compute model for event-based processing. Most workloads on the web are event-based or can be converted to an event-based design. Currently, serverless doesn’t work well with long-running processes because AWS Lambda and other cloud services have execution time limits. Classic VM compute or other types of container-based compute are still better suited to long-running processes. As long as your workload can be event-driven and processing an event is relatively quick with low resource requirements, serverless should work well for you. None of the current serverless limits seem to be permanent, and service offerings should develop and improve over time. A limit that was a problem yesterday may not even exist tomorrow.
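As a rough illustration of what event-based processing looks like in code, here is a minimal sketch of a Python function in the shape AWS Lambda expects: a handler that receives an `event` and a `context`, does a small amount of work, and returns. The event fields (`order_id`, `items`, `price`, `quantity`) are hypothetical and exist only for this example.

```python
import json


def handler(event, context):
    """Process one event and return quickly; no state is kept between calls."""
    # Hypothetical event shape: an order with a list of priced items.
    items = event.get("items", [])
    total = sum(item["price"] * item["quantity"] for item in items)

    # Return a small JSON-serializable result for the caller.
    return {
        "statusCode": 200,
        "body": json.dumps({"order_id": event.get("order_id"), "total": total}),
    }


if __name__ == "__main__":
    # Local invocation with a made-up event; context is unused here.
    sample_event = {"order_id": "A1", "items": [{"price": 2, "quantity": 3}]}
    print(handler(sample_event, None))
```

Because each call handles a single short event and holds no state, the platform is free to run as many copies as the incoming traffic requires.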
Finally, the most important thing: cost. How much cheaper is it to develop and run serverless applications? I’d say it depends on the application. When your applications scale well, they provide better value for you and your users, and they cost less to run. Serverless helps by eliminating overprovisioning and overpaying for compute resources you don’t utilize. While serverless improves scaling, other billable usage still applies: you should still think about how much bandwidth, storage, and database volume you use. Serverless will not help you better utilize those resources at the moment. That’s still up to you and your application.
You may have read posts around the internet about serverless and vendor lock-in. I’d say you can still control those aspects in the architecture of your applications. Serverless computing has numerous benefits: great performance, easy setup, and quick development and deployment. This is why serverless has become an important tool in my toolbox for building high-performance applications that scale well while keeping costs low. If you haven’t tried serverless yet, you should. Start by exploring the serverless application model.
Editor's note: For those curious about the serverless offerings outside of those mentioned in this article, check out AWS Lambda, Azure Functions, and Google Cloud Functions. h/t to Chris in the comments.