Q&A: Bustle Digital Group’s CTO on the Media Company’s Serverless Architecture

Bustle Digital Group is the largest premium publisher reaching millennial women, attracting over 80 million readers a month to properties like Bustle.com and Elite Daily. This past March, its CTO and co-founder Tyler Love tweeted that the company had fully adopted serverless architecture. “We serve upwards of a billion requests to 80 million people using SSR preact and react per month,” he wrote. “We are a thriving example of modern JavaScript at scale.”

Our Engineering Manager Sara Chipps sat down with Love as part of a series of Q&As on how engineering teams across sectors and industries work and collaborate. They dove deeper into his background in Serverless and why it was the right decision for Bustle to fully adopt it.

Sara Chipps: What benefits do you think Serverless has added to Bustle’s tech stack and culture?

Tyler Love: I’ll start on the technical side. I’ve built automated infrastructure at a few places in the past. Back then, we built everything and pushed it out in Chef. This was before Docker was even a viable solution and it just felt really painful. But every time we wanted to make a change to that, or we wanted a new type of stack, we repeated a lot of work. So I think what it added to the stack was the ability to take the infrastructure code out of the equation and not have to build that same thing over and over. We've been able to focus a little more on other challenges.

On the culture side, I think that's a little more nebulous of an answer. I would say we've just been able to be engineers with more of a focused skill set. The culture part is a little bit separate from any of the Serverless things. But we've found engineers that are okay with trying something new. There's not an old known, established way of doing things that you don't question or people are less willing to change. So it's helpful with that on the culture side.

Sara Chipps: So you don’t need a DevOps team?

Tyler Love: We do not have a DevOps team. I think it's two things.
There's less DevOps, and there's less overhead maintenance. But it also means we've all taken on a little of that. We still have code that handles everything from deployment and monitoring, and you still have to know how those things work. We're just using cloud tools to do more of it, but we still have to configure it and control where our logs end up and that sort of thing.

Sara Chipps: So I've seen a spectrum of hiring strategies, and what you're saying makes me wonder if you need full stack people that are even fuller stack. Or do you hire with specialties in mind and just encourage people to be more aware of this part of the environment?

Tyler Love: That’s something that seems to be a little different from engineer to engineer. I think the most interesting part is some engineers tend to like to be focused on the micro problems and engineer really succinct solutions to those. And then some engineers like to do the more sweeping changes. I've been thinking of the differences between full-stack, front-end, and back-end engineers a little bit less. But I kind of relate to both of them. It’s really fun to just write the simplest solution to something that seems basic, and also to zoom out so you can figure out how to configure something that's super high level.

Sara Chipps: Bustle Digital Group’s properties attract over 80 million visitors. When it comes to companies that are Serverless, I bet that your traffic is on the higher side. Would you say that that's accurate?

Tyler Love: Yeah, I would. As far as I know, we are the highest trafficked user-facing Serverless website. I'm pretty sure that's still true. But I'm certain that there are people that are doing internal stuff and using Serverless that's way beyond the scale that we're at.

Sara Chipps: Have you run into any challenges because of that, or are you just surprised that more people aren't doing it because you haven't?

Tyler Love: It was certainly challenging figuring out the user-facing side at the beginning.
So much so that we took on a few risks, but discovered it wasn't possible to do a few things. Fortunately, all that stuff seems to have been resolved by now. But I don't think Amazon had intended for user-facing websites to launch with API Gateway and Lambda up front. So we wound up waiting for them to add some features and crossing our fingers that they would come. It was basic stuff, like handling response codes under certain conditions, and it was simpler stuff than you may think. It was like doing a redirect where you could wire up redirects when they launched it, but you couldn't pass the URL that it was supposed to redirect to from your function. So we had to hack some really wild things like that.

Sara Chipps: Wow, that's wild.

Tyler Love: I compare a lot of our experience to how it changed DevOps. That's what really changed. We're still just writing business logic in JavaScript. The other end was monitoring and logging, so we didn't know what we were going to do there. There weren't any conventions, best practices, or any plug-and-play tools. But all of the logs are showing up somewhere, so we took on a lot of that ourselves. That wound up being a really nice solution. All of our logs by default in Amazon go to CloudWatch. We just have a function that's completely isolated from our app that fans it out to any service that we choose. So it's really just a matter of us taking a look at the SDK, or whatever the API is that we want to send the logs to. We figure out if we want to sample it or send all of them there, and we just write what winds up being a function to adopt any monitoring tool. When we got started, that solution was scary and challenging. We thought, "How do we see stack traces?" And we wound up with something that was a little bit better than what we started with.

Sara Chipps: That's neat. Do you store your logs in the same places?

Tyler Love: We just kind of built all that from the ground up as we moved over to Serverless.
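
The log fan-out function described above can be sketched roughly as follows. This is a minimal illustration, not Bustle's actual code: it assumes the function is subscribed to a CloudWatch Logs log group (which delivers a base64-encoded, gzip-compressed JSON payload under `event.awslogs.data`), and the names `decodeLogPayload`, `createFanOutHandler`, and the sink shape `{ name, sampleRate, send }` are hypothetical.

```javascript
const zlib = require('node:zlib');

// Decode the payload CloudWatch Logs hands to a subscribed Lambda:
// base64-encoded, gzip-compressed JSON under event.awslogs.data.
function decodeLogPayload(event) {
  const compressed = Buffer.from(event.awslogs.data, 'base64');
  return JSON.parse(zlib.gunzipSync(compressed).toString('utf8'));
}

// Build a handler that forwards log events to every configured sink.
// Each sink is a hypothetical object: { name, sampleRate, send(batch) },
// where sampleRate is the fraction of events to forward (1 = all, 0 = none)
// and send() wraps whatever SDK or HTTP API the monitoring tool exposes.
// `random` is injectable so sampling can be made deterministic in tests.
function createFanOutHandler(sinks, random = Math.random) {
  return async function handler(event) {
    const { logGroup, logEvents } = decodeLogPayload(event);

    // allSettled keeps one failing sink from blocking the others.
    const results = await Promise.allSettled(
      sinks.map(async (sink) => {
        const batch = logEvents.filter(() => random() < sink.sampleRate);
        if (batch.length > 0) {
          await sink.send({ logGroup, events: batch });
        }
        return batch.length;
      })
    );

    // Summarize how many events each sink received, or mark it as failed.
    const summary = {};
    results.forEach((result, i) => {
      summary[sinks[i].name] =
        result.status === 'fulfilled' ? result.value : 'failed';
    });
    return summary;
  };
}
```

Under these assumptions, adopting another monitoring tool means adding one more sink object whose `send` calls that tool's API, and choosing a `sampleRate` for it. The application code that writes the logs does not change.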
We have CloudWatch that parses, and we draw the basic graphs that you would want to see, which replace things like CPU and memory utilization. Now we’re interested in execution time, monitoring cold starts, and the impact they have. We adopted things like Scalyr, where the bulk of our logs go if we actually need full-text search.

Sara Chipps: There are a million monitoring products. I haven't heard of Scalyr.

Tyler Love: It was easy to adopt. We also have Sentry, which is another one for application crashes. That's where our stack traces wind up for a 500 error. So between those two tools, we're able to get all the things we ever wanted from logging and monitoring. We’re able to reproduce bugs and issues that are happening in production and development easily so we can fix them. We can also say, “If I write this log statement in the application here, I know how to build a graph out of it, or see the volume, or just see what's happening in production.”

Sara Chipps: Do you see an end in sight? Is there something in front of you where you say, "We’re no longer going to be able to function like this?" Or are you in a place that you're happy with how the infrastructure is set up?

Tyler Love: I'm pretty happy with where it's set up. I think the next scaling concerns we have are partitioning our data store sometime in the next year or two, and that unravels a new set of problems. Serverless has kind of taken that out of the equation. I'll be honest, though. For our specific workload, we're a read-heavy website. We still do a lot of writes for data, but we don't have to scale those things with each other.

Sara Chipps: That makes sense.

Tyler Love: Now our data set's getting larger, so our scaling problems have never been particularly challenging when it comes to just serving HTTP requests that are user-generated. I guess it's funny to say, though. A decade ago, thinking about the amount of traffic we have would've been terrifying to most people.
Now I think any relatively competent group of engineers could probably figure it out with all sorts of different techs.

Sara Chipps: That transitions well to one remaining question. What stresses you out and is keeping you up at night?

Tyler Love: This is the fun part of what I get to do. What keeps me up at night is always making sure there are engaging, fun projects for engineers to work on. So, knowing what we need as a business, then contextualizing that into engineering challenges and problems that are going to really engage engineers. And I think you need to understand the context for the problem and what engineering skill set is actually going to be required to solve that. I think that's an important part of the culture that I am trying to build on the team here. We’re a relatively small engineering team, so keeping 15 to 20 people busy, feeling like they're learning something, and also making a business impact is pretty hard to do. I think because of the tech that we've chosen, you have the freedom to experiment. Then, the things that are menial or repetitive, we've kind of solved with the tech stack that we've chosen. So I think GraphQL and Serverless have made it so you're not configuring a web server to do something different in a way. It's kept us focused on the business logic and the actual code you write to sell something, which is fun.

Learn how Stack Overflow can help engineering teams of all sizes (from 10 to 10,000) improve productivity.

This interview has been edited and condensed for clarity.
