\u003C/a>\u003C/figure>\u003C/div>\n\u003C!-- /wp:image -->\n\n\u003C!-- wp:heading -->\n\u003Ch2 id=\"h-channeling-feedback-into-better-runbooks\">Channeling feedback into better runbooks\u003C/h2>\n\u003C!-- /wp:heading -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>Here’s where the feedback loop begins. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>At some point, an ops person is going to get an alert that is insufficiently documented and needs to be escalated to the developers. Once you go through every step on the runbook and exhaust every idea you have, it’s time to talk to the folks who wrote the code. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>This is the first point of feedback: the ops engineer reads the runbook and tries to implement the process documented therein. The dev may have documented every alert, but this is where the rubber hits the road; if a runbook doesn’t fix a problem when it says it does, the ops person should correct it immediately or at least identify the problem. When the ops person escalates to the developer, that’s the start of a feedback loop. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>If a problem does get escalated, then that should trigger an update to the doc. Whether it’s the ops engineer adding what they were unsure about, noting the use case that triggered the escalation, or the developer augmenting the bullet list, the end result makes operations more self-sufficient and reduces future escalations. Maybe your clever ops engineer wrote a shell script that bundles ten bullets into a single command; edit the runbook and include it. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>And yet, sometimes you run into unknown or unpredictable problems. At a previous company, I got a runbook that was just one bulleted item: “This should never happen. If it does, call the developers.” I spoke with the developer and, without accusing them of being lazy, asked if this was the best runbook entry they could make? It turned out it was! They were monitoring for an assertion that should never happen, but if it ever did happen, they wanted to know. It was the kind of situation where a fix could only be determined after the first time it happened. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>It is important to remove speed bumps to fixing and updating runbooks. Management should create a culture where everyone is expected and encouraged to edit frequently, in real-time, not at the end of a project. I recommend that ops engineers go one step further. When you read a runbook, do it in edit mode. It removes the temptation to procrastinate about updates. You may think, 'I’m in the middle of addressing a problem in production, I'll come back later and edit this.' No, you're never going to come back. There's never a later. If you're in edit mode, you can fix that broken comma, paste in an updated command, or at least make a note that something needs improvement. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>If a step doesn’t work or you’re not sure what does, still put in the edit. These are living documents, so not every edit has to be perfect. Put in a note to self or the developer: \"that last statement is confusing\" or \"bullet three didn’t work.\" At least the next person that sees this will know to be careful when they get to bullet item number three. Identifying the problem is better than silence.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>The feedback loop gives developers control over how often they are escalated to. If the devs feel like they’re being escalated to too much, well, physician, heal thyself. Developers can reduce future escalations by improving the document. Either add those procedures that would have kept ops from interrupting your dinner or put in deliberate escalation calls. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>The feedback loop gives operations the confidence to escalate without feeling like they are nagging or giving up too soon. Operations can be hesitant to escalate an issue for fear of creating a “boy who cried wolf” situation or that they will look stupid for not knowing how to handle a situation. Instead, operations can “show their homework” to justify their escalation. It is difficult to be upset that your dinner was interrupted when the operations person calling you can clearly show they followed the written procedure and the results of each step.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>The benefit to this feedback loop is that it gives you as much or as little documentation as the process needs, and as many or as few escalations as needed. It empowers the developers to fix the problem of getting too many escalations.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:heading -->\n\u003Ch2 id=\"h-share-don-t-hoard\">Share, don’t hoard\u003C/h2>\n\u003C!-- /wp:heading -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>Everyone should want to build an organization where information is shared, and that sharing is rewarded.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>I once worked at a company where one person was celebrated because they were the most knowledgeable at handling every kind of on-call situation. In hindsight, this was a red flag. Why was one person so much better than everyone else? Are we allowed to have crappy service the weeks other people are on call? I learned from that experience. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>I now prefer to celebrate the people that share their knowledge so that everyone is great at on-call duty. We should honor those with more experience, but celebrate the people that share their knowledge in ways that empower everyone to be better. This includes everything from one-on-one teaching, to answering questions on Stack Overflow, to writing runbooks. People that do that enable excellence no matter whose turn it is to carry the pager.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>We can reframe this in terms of power dynamics. The old school way to maintain power and influence at a company was to hoard information. I remember admiring a person at a previous company because of what information they held. Need a [insert technical task] done? They’re the only person that knows how to do it. Visit their office. Show your respect. Prostrate to their higher mind. If they deem you are worthy, they will do the tasks for you. If you want a smoothly run company, toss that toxic attitude into the dustbin of history.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>The new way is the opposite. Power comes from how much you give away. We admire the person that shares their knowledge: the person that teaches people how to do things instead of doing things for them; that documents constantly, not at the end of the project; that is generous with their knowledge in everything they do. They are powerful because everyone points to them and says, “that’s the person that made me successful!”\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:heading -->\n\u003Ch2 id=\"h-easily-transferable-knowledge\">Easily transferable knowledge\u003C/h2>\n\u003C!-- /wp:heading -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>Here at Stack Overflow, we have a collection in our Teams instance that contains all of our runbooks and related procedures. We push all of our developers to use this feedback process with the docs. The format—articles and questions that anyone can comment on or edit—makes feedback easy. The most commonly used runbooks are heavily edited and refined. Less recently edited runbooks are easy to identify and update. The level of effort reflects the need.\u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>Then there are the knock-on effects, bonus efficiencies that come from having field-tested knowledge easily available. The technologies and skills mentioned in the runbooks inform us when writing the job advertisement for new operational staff. Once hired, runbooks can be used as a training tool. Newly hired operations people can be walked through each runbook, or use the runbooks as a self-study aide. Once they have reviewed all the runbooks, they are ready for on-call. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>Key to all of these feedback loops is the ease in which everyone—developers, SREs, and new hires—can ask questions, raise issues, and offer suggestions for better solutions. Sometimes the anxiety of being seen as less knowledgeable can prevent people from jumping in and commenting. You can alleviate that by explicitly providing a space for questions in every runbook. Even better, you can use a platform like Stack Overflow for Teams that is designed to gather business-critical information from questions and answers. \u003C/p>\n\u003C!-- /wp:paragraph -->\n\n\u003C!-- wp:paragraph -->\n\u003Cp>A good feedback loop won’t solve every problem in the on-call process. But it will make it a smoother one, from debugging an issue that pops up in production all the way through hiring and training and onboarding. When done well, it’s a very effective way to improve your organization through small, ingrained processes. \u003C/p>\n\u003C!-- /wp:paragraph -->","html","2021-12-21T15:00:23.000Z",{"current":443},"creating-a-good-feedback-loop-between-ops-and-devs-using-documentation",[445,453,456,460],{"_createdAt":446,"_id":447,"_rev":448,"_type":449,"_updatedAt":446,"slug":450,"title":452},"2023-05-23T16:43:21Z","wp-tagcat-code-for-a-living","9HpbCsT2tq0xwozQfkc4ih","blogTag",{"current":451},"code-for-a-living","Code for a Living",{"_createdAt":446,"_id":454,"_rev":448,"_type":449,"_updatedAt":446,"slug":455,"title":70},"wp-tagcat-devops",{"current":70},{"_createdAt":446,"_id":457,"_rev":448,"_type":449,"_updatedAt":446,"slug":458,"title":459},"wp-tagcat-documentation",{"current":459},"documentation",{"_createdAt":446,"_id":461,"_rev":448,"_type":449,"_updatedAt":446,"slug":462,"title":463},"wp-tagcat-sre",{"current":463},"sre","“This should never happen. If it does, call the developers.”",[466,472,478,484],{"_id":467,"publishedAt":468,"slug":469,"sponsored":12,"title":471},"65472515-0b62-40d1-8b79-a62bdd2f508a","2025-08-25T16:00:00.000Z",{"_type":10,"current":470},"making-continuous-learning-work-at-work","Making continuous learning work at work",{"_id":473,"publishedAt":474,"slug":475,"sponsored":12,"title":477},"1b0bdf8c-5558-4631-80ca-40cb8e54b571","2025-08-21T14:00:25.054Z",{"_type":10,"current":476},"research-roadmap-update-august-2025","Research roadmap update, August 2025",{"_id":479,"publishedAt":480,"slug":481,"sponsored":12,"title":483},"5ff6f77f-c459-4080-b0fa-4091583af1ac","2025-08-20T14:00:00.000Z",{"_type":10,"current":482},"documents-the-architect-s-programming-language","Documents: The architect’s programming language",{"_id":16,"publishedAt":17,"slug":485,"sponsored":12,"title":20},{"_type":10,"current":19},{"count":487,"lastTimestamp":488},26,"2023-05-25T09:47:43Z",["Reactive",490],{"$sarticleModal":437},["Set"],["ShallowReactive",493],{"sanity-Ww_VovdzYZh2DQ4xZaNmuDMh9-5SYFmv2_iv-AjSuJk":-1,"sanity-comment-wp-post-17644-1756129967968":-1},"/2021/12/21/creating-a-good-feedback-loop-between-ops-and-devs-using-documentation/?cb=1"]