How chaos engineering preps developers for the ultimate game day (Ep. 531)

In complex service-oriented architectures, failure can happen in individual servers and containers, then cascade through your system. Good engineering takes into account possible failures. But how do you test whether a solution actually mitigates failures without risking the ire of your customers? That’s where chaos engineering comes in, injecting failures and uncertainty into complex systems so your team can see where your architecture breaks.

On this sponsored episode, our fourth in the series with Intuit, Ben and Ryan chat with Deepthi Panthula, Senior Product Manager, and Shan Anwar, Principal Software Engineer, both of Intuit, about using self-serve chaos engineering tools to control the blast radius of failures, how game day tests and drills keep their systems resilient, and how their investment in open-source software powers their program.

Episode notes:

Sometimes old practices work in new environments. The Intuit team uses Failure Mode and Effects Analysis (FMEA), a procedure developed by the US military in 1949, to ensure that their developers understand possible points of failure before code makes it to production.
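For readers who haven't met FMEA before, here is a minimal sketch of the risk priority number (RPN) that many FMEA worksheets use to rank failure modes: severity times occurrence times detection, each scored from 1 to 10. This is an illustration of the general technique, not Intuit's process. The `FailureMode` class, the `rank_by_risk` helper, and the example failure modes and scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    name: str
    severity: int    # 1 (negligible impact) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (frequent)
    detection: int   # 1 (almost always caught) .. 10 (likely to go unnoticed)

    def __post_init__(self):
        # Reject scores outside the conventional 1-10 scale.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity * occurrence * detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_risk(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Illustrative entries only; the scores are made up for this sketch.
modes = [
    FailureMode("database primary unavailable", severity=9, occurrence=3, detection=2),
    FailureMode("pod evicted under memory pressure", severity=5, occurrence=6, detection=3),
    FailureMode("expired TLS certificate", severity=8, occurrence=2, detection=7),
]

ranked = rank_by_risk(modes)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.name}")
```

In this made-up worksheet, the expired certificate ranks first even though it is the least likely failure, because it is both severe and hard to detect. That kind of ordering can suggest which failure modes to rehearse first in a game day.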

The team uses LitmusChaos to inject failures into their Kubernetes-based system and power their chaos engineering efforts. It’s open source and maintained by Intuit and others.
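To make the idea of injecting failures with a controlled blast radius concrete, here is a minimal, tool-agnostic sketch in plain Python. It does not use LitmusChaos or any Kubernetes API. The names `with_chaos`, `InjectedFault`, `call_with_fallback`, and `fetch_price` are hypothetical, and the 20% failure rate is an arbitrary assumption standing in for a blast-radius limit.

```python
import random


class InjectedFault(Exception):
    """Raised in place of a real failure during an experiment."""


def with_chaos(func, failure_rate, rng, enabled=True):
    """Wrap func so that roughly failure_rate of calls raise InjectedFault.

    failure_rate acts as a crude blast-radius control: it caps the share of
    calls the experiment is allowed to break.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0.0 and 1.0")

    def wrapper(*args, **kwargs):
        if enabled and rng.random() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_with_fallback(call, fallback):
    """The mitigation under test: degrade to a fallback instead of crashing."""
    try:
        return call()
    except InjectedFault:
        return fallback


def fetch_price():
    """Hypothetical downstream dependency."""
    return 42


rng = random.Random(531)  # seeded so the experiment is repeatable
flaky_fetch = with_chaos(fetch_price, failure_rate=0.2, rng=rng)

results = [call_with_fallback(flaky_fetch, fallback=None) for _ in range(1000)]
degraded = results.count(None)
print(f"{degraded} of {len(results)} calls fell back to the degraded path")
```

The point of the sketch is the shape of the experiment: break a bounded fraction of calls on purpose, then check that the caller degrades gracefully rather than failing outright. Setting `enabled=False` turns the wrapper into a pass-through, which is a simple stand-in for a kill switch.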

If you’ve been following this series, you know that Intuit is a big fan of open-source software. Special shout-out to Argo Workflows, which makes their compute-intensive Kubernetes jobs run much more smoothly.

Connect on LinkedIn with Deepthi Panthula and Zeeshan (Shan) Anwar.

If you want to see what Stack Overflow users are saying about chaos engineering, check out “Chaos engineering best practice,” asked by user NingLee two years ago.
