The following is a chapter summary for “The DevOps Handbook” by Gene Kim, Jez Humble, John Willis, and Patrick Debois for an online book club.
The book club is a weekly lunchtime meeting of technology professionals. As a group, the book club selects, reads, and discusses books related to our profession. Participants are uplifted via group discussion of foundational principles & novel innovations. Attendees do not need to read the book to participate.
Background on The DevOps Handbook
More than ever, the effective management of technology is critical for business competitiveness. For decades, technology leaders have struggled to balance agility, reliability, and security. The consequences of failure have never been greater―whether it’s the healthcare.gov debacle, cardholder data breaches, or missing the boat with Big Data in the cloud.
And yet, high performers using DevOps principles, such as Google, Amazon, Facebook, Etsy, and Netflix, are routinely and reliably deploying code into production hundreds, or even thousands, of times per day.
Following in the footsteps of The Phoenix Project, The DevOps Handbook shows leaders how to replicate these incredible outcomes, by showing how to integrate Product Management, Development, QA, IT Operations, and Information Security to elevate your company and win in the marketplace.
The Third Way focuses on creating a culture of continual learning and experimentation. The goal is to create a high-trust culture, reinforcing that everyone is a lifelong learner who must take risks in daily work.
Applying a scientific approach to both process improvement and product development results in learning from successes and failures; these learnings help identify which ideas don’t work and reinforce those that do.
Aspects of positive work culture:
- Reserve time for the improvement of daily work and to ensure learning.
- Consistently introduce stress into applications & infrastructure to force continual improvement.
- Simulate and inject failures in production services under controlled conditions to increase resilience.
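The last item above, injecting failures under controlled conditions, can be illustrated with a small sketch. This is not taken from the book; it is a minimal example that assumes a hypothetical `lookup_price` service call and shows one possible way to make a fraction of calls fail on purpose so that the calling code's fallback path gets exercised. The names `InjectedFailure`, `with_fault_injection`, and `call_with_fallback` are invented for illustration.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real result when a fault is injected."""


def with_fault_injection(func, failure_rate, rng, enabled=True):
    """Wrap func so that a fraction of calls fail on purpose.

    failure_rate is the probability (0.0 to 1.0) that a call raises
    InjectedFailure. Passing enabled=False turns injection off entirely.
    """
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be between 0.0 and 1.0")

    def wrapper(*args, **kwargs):
        # rng.random() returns a float in [0.0, 1.0)
        if enabled and rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_with_fallback(func, fallback, *args, **kwargs):
    """Call func; return the fallback value if a failure was injected."""
    try:
        return func(*args, **kwargs)
    except InjectedFailure:
        return fallback


def lookup_price(item):
    # Hypothetical service call, used only for illustration.
    return {"book": 30, "pen": 2}[item]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so a run can be repeated
    flaky_lookup = with_fault_injection(lookup_price, failure_rate=0.3, rng=rng)
    results = [call_with_fallback(flaky_lookup, None, "book") for _ in range(10)]
    print(results)  # a mix of 30 (success) and None (fallback used)
```

Keeping the failure rate explicit and the random source seeded is one way to keep such an experiment "controlled": the blast radius is bounded and a surprising run can be replayed.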
Enabling Organizational Learning and a Safety Culture
When accidents affect customers, teams should seek to understand why they happened. The root cause is often deemed to be human error, and the all too common management response is to “name, blame, and shame” the person who caused the problem.
Three types of culture (Dr. Ron Westrum):
Pathological organizations are characterized by large amounts of fear and threat. People often hoard information, withhold it for political reasons, or distort it to make themselves look better. Failure is often hidden.
Bureaucratic organizations are characterized by rules and processes, often to help individual departments maintain their “turf.” Failure is processed through a system of judgment, resulting in either punishment or justice and mercy.
Generative organizations are characterized by actively seeking and sharing information to better enable the organization to achieve its mission. Responsibilities are shared throughout the value stream, and failure results in reflection and genuine inquiry.
In the technology value stream, establish a generative culture by creating a safe system of work. When accidents and failures occur, instead of looking for human error, look for how the team can redesign the system to prevent the accident from happening again.
For instance, the team may conduct a blameless post-mortem after every incident to gain the best understanding of how the accident occurred and agree upon what the best countermeasures are to improve the system, ideally preventing the problem from occurring again and enabling faster detection and recovery.
Institutionalize the Improvements of Daily Work
Teams are often not able or not willing to improve the processes they operate within. The result is they continue to suffer from their current problems and their suffering grows worse over time.
In the technology value stream, when teams avoid fixing their problems and instead rely on daily workarounds, their problems and technical debt accumulate until all they do is perform workarounds in an attempt to avoid disaster, with no cycles left over for doing productive work.
Daily work is improved by explicitly reserving time to pay down technical debt, fix defects, and refactor and improve problematic areas of code as well as environments. Cycles in each development interval must be reserved for this work, or teams should schedule kaizen blitzes, which are periods when engineers self-organize into teams to work on fixing any problem they want.
In the technology value stream, as the team makes their system of work safer, they find and fix problems from ever weaker failure signals.
Transform Local Discoveries Into Global Improvements
When new learnings are discovered locally, there must also be some mechanism to enable the rest of the organization to use and benefit from that knowledge.
When teams or individuals have experiences that create expertise, the goal should be to convert that knowledge into explicit, codified knowledge, which becomes someone else’s expertise through practice.
Inject Resilience Patterns Into Daily Work
The process of applying stress to increase resilience was named “antifragility” by author and risk analyst Nassim Nicholas Taleb.
In the technology value stream, teams can introduce the same type of tension into systems by seeking to always reduce deployment lead times, increase test coverage, decrease test execution times, and re-architect if necessary to increase developer productivity or reliability.
Leaders Reinforce a Learning Culture
Traditionally, leaders were expected to set objectives, allocate people & resources to achieve those objectives, and establish the right combination of incentives.
Greatness is not achieved by leaders making all the right decisions; instead, the leader’s role is to create the conditions so their team can discover greatness in their daily work. Leaders must elevate the value of learning and disciplined problem-solving.
Target conditions frame the scientific experiment: explicitly state the problem the team is seeking to solve, generate a hypothesis of how the proposed countermeasure will solve it, describe the methods for testing that hypothesis, write the interpretation of the results, and use the learnings to inform the next iteration.
Leaders can coach the person running the experiment with questions such as:
- What was your last step and what happened?
- What did you learn?
- What is your condition now?
- What is your next target condition?
- What obstacle are you working on now?
- What is your next step?
- What is your expected outcome?
- When can we check?
The principles of the Third Way address the need for valuing organizational learning, enabling high trust, accepting that failures will always occur in complex systems, and making it acceptable to talk about problems to create a safe system of work.
The Third Way also requires making the improvement of daily work part of the institution’s culture by converting local learnings into global learnings that can be used by the entire organization, as well as continually injecting tension into daily work.