The following is a chapter summary for “The DevOps Handbook” by Gene Kim, Jez Humble, John Willis, and Patrick Debois for an online book club.
The book club is a weekly lunchtime meeting of technology professionals. As a group, the book club selects, reads, and discusses books related to our profession. Participants benefit from group discussion of foundational principles and novel innovations. Attendees do not need to read the book to participate.
Background on The DevOps Handbook
More than ever, the effective management of technology is critical for business competitiveness. For decades, technology leaders have struggled to balance agility, reliability, and security. The consequences of failure have never been greater, whether it’s the healthcare.gov debacle, cardholder data breaches, or missing the boat with Big Data in the cloud.
And yet, high performers using DevOps principles, such as Google, Amazon, Facebook, Etsy, and Netflix, are routinely and reliably deploying code into production hundreds, or even thousands, of times per day.
Following in the footsteps of The Phoenix Project, The DevOps Handbook shows leaders how to replicate these incredible outcomes, by showing how to integrate Product Management, Development, QA, IT Operations, and Information Security to elevate your company and win in the marketplace.
The theme of this section is enabling Development and Operations to reduce the risk of production changes before they are made.
The peer review process at GitHub is an example of how inspection can increase quality, make deployments safe, and be integrated into the flow of everyone’s daily work. GitHub pioneered the “pull request”, one of the most popular forms of peer review that spans Dev and Ops. Once a pull request is sent, interested parties can review the set of changes, discuss potential modifications, and even push follow-up commits if necessary.
At GitHub, pull requests are the mechanism used to deploy code into production through a collective set of practices called “GitHub Flow”. The process is how engineers request code reviews, integrate feedback, and declare that code will be deployed to production.
GitHub Flow consists of five steps:
- To work on something new, the engineer creates a descriptively named branch off of master.
- The engineer commits to that branch locally, regularly pushing their work to the same named branch on the server.
- When they need feedback or help, or when they think the branch is ready for merging, they open a pull request.
- When they get their desired reviews and get any necessary approvals of the feature, the engineer can then merge it into master.
- Once the code changes are merged and pushed to master, the engineer deploys them into production.
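The five steps above can be sketched as a sequence of git commands. This is a minimal illustration, not GitHub’s actual tooling: the function name `github_flow_steps`, the remote name `origin`, and the example branch name are assumptions, the merge is shown on the command line although it is often done from the pull request page, and the deployment step is left as a placeholder because it is specific to each organization.

```python
from typing import List, Tuple


def github_flow_steps(branch: str, base: str = "master") -> List[Tuple[str, str]]:
    """Return (step, action) pairs for one pass through GitHub Flow.

    Hypothetical helper for illustration; the remote is assumed to be "origin".
    """
    return [
        # Step 1: create a descriptively named branch off of master.
        ("create branch", f"git checkout -b {branch} {base}"),
        # Step 2: commit locally and regularly push to the same named branch.
        ("commit locally", "git commit -a"),
        ("push to server", f"git push -u origin {branch}"),
        # Step 3: open a pull request (done in the GitHub web UI).
        ("open pull request", "(open a pull request in the GitHub web UI)"),
        # Step 4: after reviews and approvals, merge into master.
        (
            "merge after review",
            f"git checkout {base} && git merge --no-ff {branch} && git push origin {base}",
        ),
        # Step 5: deploy; the actual command is specific to each organization.
        ("deploy", "(site-specific deployment step)"),
    ]


for name, action in github_flow_steps("fix-login-timeout"):
    print(f"{name}: {action}")
```

Keeping the branch name descriptive (here the made-up `fix-login-timeout`) makes the pull request easier to find and review.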
The Dangers of the Change Approval Process
When high-profile deployment incidents occur, there are typically two responses. The first narrative is that the accident was due to a change control failure, which seems valid because we can imagine better change control practices detecting the risk earlier and preventing the change from going into production. The second narrative is that the accident was due to a testing failure.
The reality is that in environments with low-trust, command-and-control cultures, these change control and testing countermeasures often increase the likelihood that problems will occur again.
Potential Dangers of “Overly Controlling Changes”
Traditional change controls can lead to unintended outcomes, such as longer lead times and weaker, less immediate feedback from the deployment process.
Common controls include:
- Adding more questions that need to be answered to the change request form.
- Requiring more authorizations, such as one more level of management approval or more stakeholders.
- Requiring more lead time for change approvals so that change requests can be properly evaluated.
Enable Coordination and Scheduling of Changes
Whenever multiple groups work on systems that share dependencies, changes will likely need to be coordinated to ensure that they don’t interfere with each other. For more complex organizations and organizations with more tightly-coupled architectures, teams may need to deliberately schedule changes, where representatives from the teams get together, not to authorize changes, but to schedule and sequence their changes in order to minimize accidents.
Enable Peer Review of Changes
Instead of requiring approval from an external body prior to deployment, require engineers to get peer reviews of their changes. The goal is to find errors by having fellow engineers close to the work scrutinize changes.
This review improves the quality of changes and also provides the benefits of cross-training, peer learning, and skill improvement. A logical place to require reviews is prior to committing code to trunk in source control, where changes could potentially have a team-wide or global impact.
The principle of small batch sizes also applies to code reviews. The larger the size of the change that needs to be reviewed, the longer it takes to understand and the larger the burden on the reviewing engineer.
“There is a non-linear relationship between the size of the change and the potential risk of integrating that change: when you go from a ten line code change to a one hundred line code change, the risk of something going wrong is more than ten times higher, and so forth.” (Randy Shoup)
“Ask a programmer to review ten lines of code, he’ll find ten issues. Ask him to do five hundred lines, and he’ll say it looks good.” (Giray Özil)
Guidelines for Code Reviews include:
- Everyone must have someone to review their changes before committing to trunk.
- Everyone should monitor the commit stream of their fellow team members so that potential conflicts can be identified and reviewed.
- Define which changes qualify as high risk and may require review from a designated subject matter expert.
- If someone submits a change that is too large to reason about easily, then it should be split up into multiple, smaller changes that can be understood at a glance.
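These guidelines can be partly automated as a pre-merge check. The sketch below is only one possible interpretation: the `Change` type, the 200-line size limit, and the high-risk path prefixes are all hypothetical values chosen for illustration, and each team would define its own.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Assumed thresholds for illustration; the guidelines do not prescribe values.
MAX_REVIEWABLE_LINES = 200
HIGH_RISK_PATHS = ("db/migrations/", "auth/")


@dataclass
class Change:
    author: str
    lines_changed: int
    paths: List[str]
    reviewers: Set[str] = field(default_factory=set)
    expert_reviewers: Set[str] = field(default_factory=set)


def review_problems(change: Change) -> List[str]:
    """Return the reasons a change is not yet ready to commit to trunk."""
    problems = []
    # Everyone must have someone else review their change.
    if not (change.reviewers - {change.author}):
        problems.append("needs at least one reviewer other than the author")
    # Changes too large to reason about should be split up.
    if change.lines_changed > MAX_REVIEWABLE_LINES:
        problems.append("too large; split into smaller changes")
    # High-risk changes may require a designated subject matter expert.
    high_risk = any(path.startswith(HIGH_RISK_PATHS) for path in change.paths)
    if high_risk and not change.expert_reviewers:
        problems.append("high-risk change needs a subject matter expert review")
    return problems


small = Change("alice", 20, ["app/views.py"], reviewers={"bob"})
print(review_problems(small))
```

An empty list means the change satisfies all three checks; otherwise each entry names a guideline that still needs attention.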
Code Review Formats:
- Pair programming: programmers work in pairs.
- “Over-the-shoulder”: One developer looks over the author’s shoulder as the latter walks through the code.
- Email pass-around: A source code management system emails code to reviewers automatically after the code is checked in.
- Tool-assisted code review: Authors and reviewers use specialized tools designed for peer code review or facilities provided by the source code repositories.
Potential Danger of Doing More Manual Testing and Change Freezes
When testing failures occur, the typical reaction is to do more testing. This is especially problematic when the additional testing is manual, because manual testing is naturally slower and more tedious than automated testing.
Doing more manual testing often means testing takes significantly longer, which means deploying less frequently and thus increasing the deployment batch size. Instead of performing testing on large batches of changes that are scheduled around change freeze periods, fully integrate testing into daily work as part of the smooth and continual flow into production.
Enable Pair Programming to Improve Changes
Pair programming is when two engineers work together at the same workstation, a method popularized by Extreme Programming and Agile in the early 2000s.
In one common pattern of pairing, one engineer fills the role of the driver, the person who actually writes the code, while the other engineer acts as the navigator, observer, or pointer, the person who reviews the work as it is being performed. The driver focuses their attention on the tactical aspects of completing the task, using the observer as a safety net and guide.
Dr. Laurie Williams performed a study in 2001 that showed “paired programmers are 15% slower than two independent individual programmers, while ‘error-free’ code increased from 70% to 85%.”
“Pairs typically consider more design alternatives than programmers working alone and arrive at simpler, more maintainable designs; they also catch design defects early.” (Dr. Laurie Williams)
Pair programming has the additional benefit of spreading knowledge throughout the organization and increasing information flow within the team.
Evaluating the Effectiveness of the Pull Request Process
One method to evaluate the effectiveness of peer review is to look at production outages and examine the peer review process for any relevant changes.
Ryan Tomayko, CIO and co-founder of GitHub:
- “A bad pull request is one that doesn’t have enough context for the reader, having little or no documentation of what the change is intended to do.”
- “A great pull request has sufficient detail on why the change is being made, how the change was made, as well as any identified risks and resulting countermeasures.”
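One way a team might act on this advice is to check pull request descriptions for the context a reader needs. The sketch below assumes a hypothetical team convention in which descriptions contain “Why”, “How”, and “Risks” headings on their own lines; neither the headings nor the function name come from GitHub or the book.

```python
from typing import List

# Hypothetical section headings a team might require in every description.
REQUIRED_SECTIONS = ("why", "how", "risks")


def missing_context(description: str) -> List[str]:
    """Return the required sections that are absent from a pull request description."""
    # Normalize each line so that "## Why", "Why:" and "why" all match.
    headings = {
        line.strip().lstrip("#").strip().rstrip(":").lower()
        for line in description.splitlines()
    }
    return [section for section in REQUIRED_SECTIONS if section not in headings]


print(missing_context("Fix the login bug"))
```

A check like this only confirms that the sections exist; whether they contain enough detail is still a judgment for the reviewer.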
Fearlessly Cut Bureaucratic Process
Many companies still have long-standing processes for approval that require months to navigate. These approval processes can significantly increase lead times, not only preventing teams from delivering value quickly to customers, but potentially increasing the risk to our organizational objectives.
“A great metric to publish widely is how many meetings and work tickets are mandatory to perform a release; the goal is to relentlessly reduce the effort required for engineers to perform work and deliver it to the customer.” (Adrian Cockcroft)
By implementing feedback loops, teams can enable everyone to work together toward shared goals, see problems as they occur, and ensure that features not only operate as designed in production, but also achieve organizational goals and support organizational learning.