Red Green Refactor

Book Club: The DevOps Handbook (Chapter 4. The Third Way: The Principles of Continual Learning and Experimentation)

This entry is in the series DevOps Handbook

The following is a chapter summary for “The DevOps Handbook” by Gene Kim, Jez Humble, John Willis, and Patrick DeBois for an online book club.

The book club is a weekly lunchtime meeting of technology professionals. As a group, the book club selects, reads, and discusses books related to our profession. Participants are uplifted via group discussion of foundational principles & novel innovations. Attendees do not need to read the book to participate.

Background on The DevOps Handbook

More than ever, the effective management of technology is critical for business competitiveness. For decades, technology leaders have struggled to balance agility, reliability, and security. The consequences of failure have never been greater―whether it’s the healthcare.gov debacle, cardholder data breaches, or missing the boat with Big Data in the cloud.
And yet, high performers using DevOps principles, such as Google, Amazon, Facebook, Etsy, and Netflix, are routinely and reliably deploying code into production hundreds, or even thousands, of times per day.
Following in the footsteps of The Phoenix Project, The DevOps Handbook shows leaders how to replicate these incredible outcomes, by showing how to integrate Product Management, Development, QA, IT Operations, and Information Security to elevate your company and win in the marketplace.
The DevOps Handbook

Chapter 4

The Third Way focuses on creating a culture of continual learning and experimentation. The goal is to create a high-trust culture, reinforcing that everyone is a lifelong learner who must take risks in daily work.

Applying a scientific approach to both process improvement and product development lets teams learn from successes and failures, identifying which ideas don’t work and reinforcing those that do.

Aspects of a positive work culture:

Reserve time for the improvement of daily work and to ensure learning.
Consistently introduce stress into applications & infrastructure to force continual improvement.
Simulate and inject failures in production services under controlled conditions to increase resilience.
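To make the third practice concrete, here is a minimal Python sketch of controlled fault injection: a wrapper fails a call at a configurable rate, but only while an explicit switch is on, and the calling code is expected to degrade gracefully. The names `with_fault_injection` and `fetch_with_fallback`, the failure rate, and the cached-value fallback are illustrative assumptions, not the API of any particular chaos-engineering tool.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Raised on purpose to simulate a failing dependency."""


def with_fault_injection(call: Callable[[], T], failure_rate: float,
                         enabled: bool, rng: random.Random) -> T:
    # The `enabled` switch keeps the experiment under control: turning it off
    # stops all injected failures immediately.
    if enabled and rng.random() < failure_rate:
        raise InjectedFault("simulated dependency failure")
    return call()


def fetch_with_fallback(call: Callable[[], T], failure_rate: float,
                        enabled: bool, rng: random.Random, fallback: T) -> T:
    # The resilience pattern under test: serve a fallback instead of crashing.
    try:
        return with_fault_injection(call, failure_rate, enabled, rng)
    except InjectedFault:
        return fallback


rng = random.Random(42)  # seeded so the experiment is repeatable
results = [fetch_with_fallback(lambda: "live", 0.1, True, rng, "cached")
           for _ in range(1000)]
print(results.count("cached"), "of 1000 calls used the fallback")
```

Starting with a low failure rate and raising it only as confidence grows keeps the experiment within the "controlled conditions" the practice calls for.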

Enabling Organizational Learning and a Safety Culture

When accidents affect customers, teams should seek to understand why they happened. The root cause is often deemed to be human error, and the all-too-common management response is to “name, blame, and shame” the person who caused the problem.

Three types of culture (Dr. Ron Westrum):

Pathological organizations are characterized by large amounts of fear and threat. People often hoard information, withhold it for political reasons, or distort it to make themselves look better. Failure is often hidden.
Bureaucratic organizations are characterized by rules and processes, often to help individual departments maintain their “turf.” Failure is processed through a system of judgment, resulting in either punishment or justice and mercy.
Generative organizations are characterized by actively seeking and sharing information to better enable the organization to achieve its mission. Responsibilities are shared throughout the value stream, and failure results in reflection and genuine inquiry.
Dr. Ron Westrum

In the technology value stream, establish a generative culture by creating a safe system of work. When accidents and failures occur, instead of looking for human error, look for ways the team can redesign the system to prevent the accident from happening again.

For instance, the team may conduct a blameless post-mortem after every incident to understand how the accident occurred and agree on the best countermeasures for improving the system, ideally preventing the problem from recurring and enabling faster detection and recovery.

Institutionalize the Improvements of Daily Work

Teams are often unable or unwilling to improve the processes they operate within. As a result, they continue to suffer from their current problems, and their suffering grows worse over time.

In the technology value stream, when teams avoid fixing their problems and instead rely on daily workarounds, their problems and technical debt accumulate until all they do is perform workarounds to avoid disaster, with no cycles left over for productive work.

Daily work is improved by explicitly reserving time to pay down technical debt, fix defects, and refactor and improve problematic areas of code and environments. Either cycles in each development interval must be reserved for this work, or teams should schedule kaizen blitzes: periods when engineers self-organize into teams to work on fixing any problem they want.

In the technology value stream, as the team makes their system of work safer, they find and fix problems from ever weaker failure signals.

Transform Local Discoveries Into Global Improvements

When new learnings are discovered locally, there must also be some mechanism to enable the rest of the organization to use and benefit from that knowledge.

When teams or individuals have experiences that create expertise, the goal should be to convert that knowledge into explicit, codified knowledge, which becomes someone else’s expertise through practice.

Inject Resilience Patterns Into Daily Work

The process of applying stress to increase resilience was named “antifragility” by author and risk analyst Nassim Nicholas Taleb.

In the technology value stream, teams can introduce the same type of tension into systems by always seeking to reduce deployment lead times, increase test coverage, and decrease test execution times, and by re-architecting if necessary to increase developer productivity or reliability.
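One way to sketch this kind of deliberate tension is a "ratchet" check in the pipeline: each build's metrics must be no worse than the best values previously recorded, and improvements become the new baseline. This is a hypothetical Python sketch; the metric names, units, and the ratchet rule itself are assumptions rather than something the book prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineMetrics:
    lead_time_hours: float        # commit to running in production
    test_coverage: float          # fraction of lines covered, 0.0 to 1.0
    test_runtime_minutes: float   # wall-clock time for the full suite


def ratchet_violations(baseline: PipelineMetrics, current: PipelineMetrics) -> list[str]:
    """List every metric that moved in the wrong direction versus the baseline."""
    problems = []
    if current.lead_time_hours > baseline.lead_time_hours:
        problems.append("deployment lead time increased")
    if current.test_coverage < baseline.test_coverage:
        problems.append("test coverage decreased")
    if current.test_runtime_minutes > baseline.test_runtime_minutes:
        problems.append("test execution time increased")
    return problems


def tighten(baseline: PipelineMetrics, current: PipelineMetrics) -> PipelineMetrics:
    """Keep the best value seen for each metric, so the bar only ever rises."""
    return PipelineMetrics(
        lead_time_hours=min(baseline.lead_time_hours, current.lead_time_hours),
        test_coverage=max(baseline.test_coverage, current.test_coverage),
        test_runtime_minutes=min(baseline.test_runtime_minutes, current.test_runtime_minutes),
    )
```

A build that passes `ratchet_violations` would then store `tighten(baseline, current)` as the baseline for the next run.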

Leaders Reinforce a Learning Culture

Traditionally, leaders were expected to set objectives, allocate people & resources to achieve those objectives, and establish the right combination of incentives.

Greatness is not achieved by leaders making all the right decisions; instead, the leader’s role is to create the conditions for their team to discover greatness in their daily work. Leaders must elevate the value of learning and disciplined problem-solving.

Target conditions frame the scientific experiment: explicitly state the problem the team is seeking to solve, form a hypothesis of how the proposed countermeasure will solve it, describe the methods for testing that hypothesis, interpret the results, and use the learnings to inform the next iteration.
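The structure of a target condition maps naturally onto a small record type. The Python sketch below is illustrative only: the `Experiment` class and its fields are hypothetical names for the elements of the experiment described above, and refusing to plan the next iteration until results are interpreted is one assumed way to enforce the discipline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Experiment:
    """One iteration toward a target condition, framed as a scientific experiment."""
    problem: str                      # the problem the team is seeking to solve
    hypothesis: str                   # how the proposed countermeasure will solve it
    method: str                       # how the hypothesis will be tested
    result: Optional[str] = None      # interpretation of what actually happened
    learnings: list[str] = field(default_factory=list)

    def record(self, result: str, learning: str) -> None:
        """Write down the interpreted result and what it taught the team."""
        self.result = result
        self.learnings.append(learning)

    def next_iteration(self, hypothesis: str, method: str) -> "Experiment":
        """Plan the next experiment on the same problem, carrying learnings forward."""
        if self.result is None:
            raise ValueError("interpret the results before planning the next iteration")
        return Experiment(self.problem, hypothesis, method,
                          learnings=list(self.learnings))
```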

Leader’s Questions

What was your last step and what happened?
What did you learn?
What is your condition now?
What is your next target condition?
What obstacle are you working on now?
What is your next step?
What is your expected outcome?
When can we check?

Conclusion

The principles of the Third Way address the need for valuing organizational learning, enabling high trust, accepting that failures will always occur in complex systems, and making it acceptable to talk about problems to create a safe system of work.

The Third Way also requires making the improvement of daily work part of the institution’s culture by converting local learnings into global learnings that can be used by the entire organization, as well as continually injecting tension into daily work.

From the Pipeline v27.0

This entry is in the series From the Pipeline

The following is a regular feature where we share articles, podcasts, and webinars of interest from the web.

The Importance of Feature Flags in CI/CD

The DevOps Research and Assessment group (DORA) measures four things: frequency of deployment, lead time for change, mean time to repair, and change failure rate. Elite performers deploy many times per day. Feature flags allow teams to move fast and break nothing. Progressive delivery means releasing to a subset of the user base, then gradually expanding the release once it is confirmed successful. Feature flags in the delivery pipeline lend visibility into the configuration of each release, with the capability to include performance-related metrics.
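Progressive delivery with a feature flag is often implemented as a percentage rollout. A minimal Python sketch, assuming users are bucketed by a stable hash of the flag name and user ID (an assumption; commercial flag services use their own schemes): raising the percentage only adds users, so anyone who already has the feature keeps it.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """True if this user falls inside the current rollout percentage."""
    return rollout_bucket(flag_name, user_id) < rollout_percent


# "new-checkout" is a hypothetical flag name. Widen the rollout only after
# the smaller cohort looks healthy.
for percent in (1, 10, 50, 100):
    enabled = sum(is_enabled("new-checkout", f"user-{i}", percent) for i in range(10_000))
    print(f"{percent}% rollout -> {enabled} of 10,000 users")
```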

Top 5 Trending Test Automation Actions

TestProject maintains a library of over 1,500 automated actions shared with the community as Addons. The community recently ranked the actions by usage: (1) Click If Visible, (2) Click (using JavaScript), (3) HTTP Get Request, (4) Get CSS Value, and (5) Compare Image with UI Element. The fifth-ranked action is used for visual testing. The JavaScript click action helps when WebDriver has difficulty interacting with an element. The HTTP Get Request action provides the standard request methods used in API testing. The Click If Visible action is used to more closely mimic the user experience. Finally, the Get CSS Value action enables the automation to probe specific CSS properties.
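Three of these actions correspond to calls exposed by Selenium's Python bindings (`is_displayed()`, `click()`, `value_of_css_property()`, and `execute_script()`). The helpers below are a sketch of equivalent behavior, not TestProject's Addon code; they take the driver and element as parameters so they work with a real Selenium `WebDriver` and the element returned by `driver.find_element(...)`, or with any object offering the same methods.

```python
from typing import Any


def click_if_visible(element: Any) -> bool:
    """Click the element only if it is displayed; report whether a click happened."""
    if element.is_displayed():
        element.click()
        return True
    return False


def click_with_javascript(driver: Any, element: Any) -> None:
    """Fallback click dispatched through the page's JavaScript, bypassing the native click."""
    driver.execute_script("arguments[0].click();", element)


def css_value(element: Any, prop: str) -> str:
    """Read a computed CSS property such as 'color' or 'display'."""
    return element.value_of_css_property(prop)
```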

Microsoft revealed the latest truths about working from home. One is truly disturbing

Microsoft looked into a year of working from home and found some interesting facts. For one, the share of IMs sent between 6pm and midnight increased by 53%. During the pandemic, most IT leaders described themselves as thriving, yet workers don’t share the same sentiment. Microsoft says 37% of employees say companies are making them work too hard. As the potential for return-to-work or hybrid models looms, there will be another disruption as the workforce changes gears again.

Predicting Security Vulnerabilities with Behavioral Code Analysis

Security vulnerabilities correlate with low code health, development hotspots, and high author churn in the organization. In the article, CodeScene argues that code quality is as much a business issue as a technical one. Low code health leads to technical debt, which consumes development resources. Low code health also leads to a higher total number of security errors. In general, the more experience a team has in the domain and codebase, the fewer security errors it makes. Code health is an aggregated metric that classifies code with respect to correctness and ease of understanding. Violations of code health properties such as DRY, Developer Congestion, and Bumpy Road lead to a higher number of vulnerabilities. Additionally, there is a strong correlation between security error density and hotspots: complicated code that developers spend much of their time working on.
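A rough approximation of hotspot analysis ranks files by how often they change multiplied by how large they are. This Python sketch is not CodeScene's code health metric; lines of code stand in for complexity, and the file names and counts used to exercise it would come from your own repository.

```python
from collections import Counter


def hotspots(changed_files: list[str], lines_of_code: dict[str, int],
             top: int = 3) -> list[tuple[str, int]]:
    """Rank files by change frequency times size, highest first.

    `changed_files` has one entry per file per commit, e.g. the output of
    `git log --format= --name-only` with blank lines removed. Lines of code
    are a crude stand-in for complexity.
    """
    churn = Counter(changed_files)
    scores = {path: churn[path] * lines_of_code.get(path, 0) for path in churn}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top]
```

Files near the top of this ranking are the ones where, per the article, security review effort is most likely to pay off.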

How to Design DevSecOps Compliance Processes to Free Up Developer Resources

With the expectation of fast delivery, it’s imperative to include security from day one. Security is a shared responsibility that must be included in the end-to-end delivery pipeline. Compliance can be designed into the system via automation such as vulnerability scanning, auditing, logging, and monitoring to track changes in real time.
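A compliance check built into the pipeline can both block risky builds and produce audit evidence as a side effect. The Python sketch below assumes a hypothetical finding format (`id` and `severity` keys) and a four-level severity scale; real scanners define their own formats and levels.

```python
# Assumed severity scale; real scanners define their own.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def compliance_gate(findings: list[dict[str, str]],
                    max_allowed: str = "medium") -> tuple[bool, list[str]]:
    """Fail the stage if any finding is more severe than `max_allowed`.

    Every finding is written to an audit log, blocking or not, so the
    compliance record is produced automatically on each pipeline run.
    """
    limit = SEVERITY_RANK[max_allowed]
    audit_log = []
    passed = True
    for finding in findings:
        blocking = SEVERITY_RANK[finding["severity"]] > limit
        if blocking:
            passed = False
        audit_log.append(f"{finding['id']} severity={finding['severity']} blocking={blocking}")
    return passed, audit_log
```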

Book Club: The DevOps Handbook (Chapter 3. The Second Way: The Principles of Feedback)


Chapter 3

The Second Way describes the principles that enable fast and constant feedback at all stages of the value stream. In technology, work happens almost entirely within complex systems with a high risk of catastrophic consequences. As in manufacturing, problems are often discovered only when large failures are underway, such as a massive production outage or a security breach resulting in the theft of customer data.

Working Safely Within Complex Systems

The goal is to make the system of work safer by creating fast, frequent, high-quality information flow throughout the value stream and organization, which includes feedback and feedforward loops. One of the defining characteristics of a complex system is that it defies any single person’s ability to see the system as a whole and understand how all the pieces fit together.

Four conditions to help make complex systems safe:

Complex work is managed so that problems in design and operations are revealed.
Problems are swarmed and solved, resulting in quick construction of new knowledge.
New local knowledge is exploited globally throughout the organization.
Leaders create other leaders who continually grow these types of capabilities.

See Problems As They Occur

Our goal is to increase information flow in the system from as many areas as possible, with as much clarity between cause and effect as possible. In the technology value stream, poor outcomes arise because of the absence of fast feedback. When feedback is delayed and infrequent, it’s too slow to prevent undesirable outcomes. Fast feedback can be enabled by creating automated build, integration, and test processes.
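The core of an automated build, integration, and test process is a sequence of stages that stops and reports at the first failure. A minimal Python sketch, assuming each stage is a callable returning pass/fail; the stage names a team would pass in (build, unit tests, integration tests) are hypothetical, and ordering the fastest checks first is a design choice that shortens the feedback loop.

```python
from typing import Callable


def run_pipeline(stages: list[tuple[str, Callable[[], bool]]]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns whether every stage passed and which stages actually ran. Ordering
    the fastest checks first means most failures are reported within minutes.
    """
    ran = []
    for name, stage in stages:
        ran.append(name)
        if not stage():
            return False, ran   # fail fast: skip slower stages, report immediately
    return True, ran
```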

Feedback loops enable quick detection of and recovery from problems, and they show us how to prevent these problems from occurring again in the future. Doing this increases the quality and safety of the system of work and creates organizational learning.

Swarm and Solve Problems To Build New Knowledge

Swarming is necessary for the following reasons:

It prevents the problem from progressing downstream (cost and effort to repair it increases downstream and technical debt accumulates).
It prevents the work center from starting new work, which will likely introduce new errors into the system.
If the problem is not addressed, the team could have the same problem in the next cycle or iteration, which requires more fixes and work.
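The "stop the line" behavior behind swarming can be sketched as a gate that refuses new work while a reported failure is open. This is a hypothetical Python model (the `AndonLine` name borrows the Lean Andon cord idea); in practice teams usually enforce the same rule through build-status or branch-protection settings.

```python
from typing import Optional


class AndonLine:
    """A reported failure halts new work until someone swarms and fixes it."""

    def __init__(self) -> None:
        self.stopped_by: Optional[str] = None
        self.merged: list[str] = []

    def report_failure(self, change: str) -> None:
        # Pull the cord: the team stops starting new work and swarms this problem.
        self.stopped_by = change

    def resolve(self) -> None:
        self.stopped_by = None

    def merge(self, change: str) -> bool:
        """Accept new work only while the line is running."""
        if self.stopped_by is not None:
            return False
        self.merged.append(change)
        return True
```

Refusing the merge directly enforces the second reason above: the work center cannot start new work that would introduce new errors while the current problem is open.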

Keep Pushing Quality Closer To The Source

In complex systems, adding more inspection steps and approval processes actually increases the likelihood of future failures. The effectiveness of approval processes decreases as decision-making is pushed further away from where the work is performed.

Examples of Ineffective Quality Controls:

Requiring another team to complete tedious, error-prone, and manual tasks that could be easily automated and run as needed by the team who needs the work performed.
Requiring approvals from busy people who are distant from the work, forcing them to make decisions without an adequate knowledge of the work or the potential implications, or to merely rubber stamp their approvals.
Creating large volumes of documentation of questionable detail that become obsolete shortly after they are written.
Pushing large batches of work to teams and special committees for approval and processing and then waiting for responses.

Enable Optimizing For Downstream Work Centers

Lean defines two types of customers to design for: (1) the external customer, who most likely pays for the service being delivered, and (2) the internal customer, who receives and processes the work immediately after the development team.

According to Lean, the most important customer is the one next step downstream.

Creating fast feedback is critical to achieving quality, reliability, and safety in the technology value stream. Fast feedback is achieved by seeing problems as they occur, swarming and solving problems to build new knowledge, pushing quality closer to the source, and continually optimizing for downstream work centers.

From the Pipeline v26.0


Evaluating Test Cases, Checks, and Tools

In his latest blog post, Michael Bolton asks the reader to perform an experiment to observe whether they are putting too much stock in tooling or test cases: start a list of the bugs found and, for each one, note how significant an impact the artifact had in finding it. For the purposes of the blog post, Bolton considers test cases, automated checks, and testing tools to be artifacts. The scoring system allows a negative score when the artifact costs time or distracts from the task of finding problems. An overall high score might indicate the artifacts are helping the testing effort; a low score might suggest they are hindering it. What ultimately matters is the experience, not the score. Learning to understand the context in which artifacts can help or hurt testing is a path to improvement, especially in guarding against over-reliance on particular artifacts.

Bad software sent postal workers to jail, because no one wanted to admit it could be wrong

UK Post Office employees used a piece of software called Horizon whose bugs made it look as though employees had stolen tens of thousands of British pounds. Some local postmasters were convicted of crimes. More than 736 employees were convicted over the course of 15 years, yet software defects were responsible for the reported shortfalls. Many of those affected have since had their convictions overturned, and there is currently an inquiry into the software in question.

Simplify Cucumber Steps

In the later stages of a test automation project, code complexity increases as development continues. Ali Fuat Atest provides tips to reduce maintenance cost and complexity and to achieve a better project structure. A Background is recommended to reduce repetition of the common steps that begin each scenario. Scenario Outlines are recommended when working with datasets, to improve readability. Additionally, common and repeated steps should be combined into a higher-level step definition. Lastly, a table structure is useful for a step that has many components, such as data entry.

Cypress Courses

Cypress is a front-end testing tool capable of functional automated checks as well as unit tests. It’s growing in popularity, especially with Angular web applications, because it’s written in JavaScript. Cypress differs from Selenium in that it executes in the same run loop as the application, whereas Selenium executes outside the browser and sends remote commands. The Cypress team has collected many of the training courses offered across multiple platforms into one place.

Book: 33 Routines to Make You a Better Tester

A brief book review of “33 Routines to Make You a Better Tester”. The book covers topics such as exploratory testing, risk-based testing, and static testing techniques. Each section defines the topic briefly, demonstrates an approach, and has several key takeaways. It’s available on Amazon Kindle for just $1.50.

Book Club: The DevOps Handbook (Chapter 2. The First Way: The Principles of Flow)


Chapter 2

The First Way requires the fast and smooth flow of work from Development to Operations, to deliver value to customers quickly. Flow is increased by making work visible, by reducing batch sizes and intervals of work, and by building quality in, preventing defects from being passed to downstream work centers. The ultimate goal is to decrease the amount of time required for changes to be deployed into production and to increase the reliability and quality of those services.

Make Work Visible

A significant difference between technology and manufacturing value streams is that work is invisible. Work can bounce between teams due to incomplete information, or work can be passed on to downstream work centers with problems that remain invisible. One of the best methods of making work visible is by using boards to track work, such as Kanban boards or sprint planning boards, where work is represented on physical or electronic cards.

Kanban

In Kanban, work flows from left to right. Measure lead time from when a card is placed on the board to when it is moved into the “Done” column. Work is DONE when it reaches production.
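Measuring lead time this way only needs two timestamps per card. A minimal Python sketch, assuming each card records when it was placed on the board and when its work reached production (the `placed` and `in_production` keys are hypothetical):

```python
from datetime import datetime


def lead_time_days(placed_on_board: str, in_production: str) -> float:
    """Days from a card entering the board until its work reaches production."""
    start = datetime.fromisoformat(placed_on_board)
    done = datetime.fromisoformat(in_production)
    return (done - start).total_seconds() / 86400


def average_lead_time_days(cards: list[dict[str, str]]) -> float:
    """Mean lead time across cards whose work has reached production."""
    times = [lead_time_days(c["placed"], c["in_production"]) for c in cards]
    return sum(times) / len(times)
```

Using the production timestamp rather than the time a developer marks the card finished keeps the measurement consistent with the rule that work is only DONE in production.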

Limit Work in Progress (WIP)

In technology, work is usually far more dynamic than in manufacturing. This is especially the case in shared services, where teams must satisfy the demands of many different stakeholders. Often daily work becomes dominated by recency priority, with requests for urgent work coming in through every communication mechanism possible, including ticketing systems, outage calls, emails, phone calls, chat rooms, and management escalations.

Interrupting technology workers is easy, because the consequences are often invisible. Multitasking can be limited by using a Kanban board to manage work and by enforcing WIP (work in progress) limits, which put an upper limit on the number of cards that can be in each column or work center. For example, if the testing column has a WIP limit of three cards, no new card can enter it until one of those three is completed or moved back to the queue. Limiting WIP also makes it easier to visually identify problems that prevent the completion of work.
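A WIP limit is simple to model: a column refuses a new card once it is full. This Python sketch is illustrative; the column names and limits are hypothetical, and `None` is used here to mean "no limit".

```python
from typing import Optional


class KanbanBoard:
    """Columns refuse new cards once their WIP limit is reached (None = no limit)."""

    def __init__(self, wip_limits: dict[str, Optional[int]]) -> None:
        self.wip_limits = wip_limits
        self.columns = {name: [] for name in wip_limits}

    def pull(self, card: str, column: str) -> bool:
        """Move `card` into `column` if the column has room; return whether it moved."""
        limit = self.wip_limits[column]
        if limit is not None and len(self.columns[column]) >= limit:
            return False  # finish something in this column before starting more
        for cards in self.columns.values():
            if card in cards:
                cards.remove(card)  # a card lives in exactly one column
        self.columns[column].append(card)
        return True
```

A refused pull is the visible signal the section describes: the blocked card points straight at the column where work is stuck.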

Reduce Batch Sizes

Another key component to creating smooth and fast flow is performing work in small batch sizes. The theoretical lower limit for batch size is single-piece flow, where each operation is performed one unit at a time.

The negative outcomes associated with large batch sizes are just as relevant to the technology value stream as in manufacturing. The larger the change going into production, the more difficult the production errors are to diagnose and fix, and the longer they take to remediate.
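The envelope-stuffing illustration common in Lean writing makes the batch-size argument concrete. This Python sketch assumes one worker, ten envelopes, four steps (fold, insert, seal, stamp), and one time unit per step; those numbers are illustrative.

```python
def completion_times(items: int, steps: int, batch_size: int) -> list[int]:
    """Time at which each item is finished, for one worker and one time unit per step.

    Within a batch, the worker performs each step on every item before moving
    on to the next step; batches are processed one after another.
    """
    finished = []
    clock = 0
    for batch_start in range(0, items, batch_size):
        batch = min(batch_size, items - batch_start)
        clock += (steps - 1) * batch      # every step but the last, for the whole batch
        for _ in range(batch):
            clock += 1                    # the last step finishes items one at a time
            finished.append(clock)
    return finished


large = completion_times(items=10, steps=4, batch_size=10)
single = completion_times(items=10, steps=4, batch_size=1)
print(large[0], single[0])    # first finished envelope: 31 vs 4
print(large[-1], single[-1])  # all ten finished: 40 in both cases
```

Total time is the same, but single-piece flow delivers the first finished item almost eight times sooner, so a defect introduced in the first step would show up in a finished envelope after 4 time units instead of 31.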

Reduce The Number of Handoffs

Moving code through the value stream requires multiple departments to work on a variety of tasks, including functional testing, integration testing, environment creation, server administration, storage administration, networking, load balancing, and information security.

A recommended approach is to automate significant portions of the work or to reorganize teams so they can deliver value to the customer themselves instead of having to be constantly dependent on others.

Continually Identify and Elevate Constraints

To reduce lead times and increase throughput, teams should continually identify the system’s constraints and improve its work capacity.

“In any value stream, there is always a direction of flow, and there is always one and only one constraint; any improvement not made at that constraint is an illusion.”
Beyond the Goal by Eli Goldratt

Five Focusing Steps to Address Constraints

Identify the system’s constraint.
Decide how to exploit the system’s constraint.
Subordinate everything else to the above decisions.
Elevate the system’s constraint.
If in the previous steps a constraint has been broken, go back to step one, but do not allow inertia to cause a system constraint.
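The first step, and Goldratt's point that any improvement not made at the constraint is an illusion, can be illustrated with a tiny model of a serial value stream, where overall throughput is capped by the slowest stage. The stage names and capacities are hypothetical.

```python
def find_constraint(capacity_per_day: dict[str, float]) -> tuple[str, float]:
    """Return the stage with the lowest capacity and that capacity.

    In a serial value stream, the system cannot deliver faster than this stage.
    """
    stage = min(capacity_per_day, key=capacity_per_day.get)
    return stage, capacity_per_day[stage]


stream = {"environment creation": 2.0, "deployment": 5.0, "testing": 8.0}
print(find_constraint(stream))                      # ('environment creation', 2.0)

# Improving a non-constraint leaves system throughput unchanged.
print(find_constraint({**stream, "testing": 20.0}))  # ('environment creation', 2.0)

# Elevating the constraint moves it elsewhere: back to step one.
print(find_constraint({**stream, "environment creation": 10.0}))  # ('deployment', 5.0)
```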

DevOps Constraint Progression

Environment Creation: On-demand deployments are not achievable if the wait time is weeks or months for production or test environments. The countermeasure is to create environments that are on demand and completely self-serviced, so that they are always available when we need them.

Code Deployment: On-demand deployments are not possible if each of the production code deployments take weeks or months to perform. The countermeasure is to automate our deployments as much as possible, with the goal of being completely automated so they can be done self-service by any developer.

Test Setup and Run: On-demand deployments are not possible if every code deployment requires two weeks to set up the test environments and data sets, and another four weeks to manually execute all our regression tests. The countermeasure is to automate our tests so we can execute deployments safely, and to parallelize them so the test rate can keep up with the code development rate.
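Parallelizing independent test suites is one way to let the test rate keep up. A minimal Python sketch using the standard library's `ThreadPoolExecutor`, assuming each suite is an independent callable that returns pass/fail and shares no data with the others; suite names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_suites_in_parallel(suites: dict[str, Callable[[], bool]],
                           workers: int = 4) -> tuple[dict[str, bool], float]:
    """Run independent test suites concurrently.

    Returns pass/fail per suite and the wall-clock seconds taken. Threads suit
    I/O-bound suites (API or browser tests); CPU-bound suites would need processes.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(suite) for name, suite in suites.items()}
        results = {name: future.result() for name, future in futures.items()}
    return results, time.perf_counter() - start
```

With enough workers, wall-clock time approaches that of the slowest suite rather than the sum of all suites.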

Overly Tight Architecture: On-demand deployments are not possible if overly tight architecture means that every time a code change occurs the engineers must attend scores of committee meetings in order to get permission to make changes. The countermeasure is to create more loosely-coupled architecture so that changes can be made safely and with more autonomy, increasing developer productivity.

Eliminate Hardships and Waste in The Value Stream

Modern interpretations of Lean have noted that “eliminating waste” can have a demeaning and dehumanizing context; instead, the goal is reframed to reduce hardship and drudgery in daily work through continual learning in order to achieve the organization’s goals.

Categories of Waste and Hardship

Partially Done Work: Any work in the value stream that has not been completed (requirement documents or change orders not yet reviewed) and work that is sitting in queue (waiting for QA review or admin tickets). Partially done work becomes obsolete and loses value as time progresses.

Extra Processes: Any additional work being performed in a process that does not add value to the customer. This may include documentation not used in a downstream work center or reviews or approvals that do not add value to the output. Extra processes add effort and increase lead times.

Extra Features: Features built into the service that are not needed by the organization or the customer (“gold plating”). Extra features add complexity and effort to testing and managing functionality.

Task Switching: When people are assigned to multiple projects and value streams, requiring them to context switch and manage dependencies between work, adding additional effort and time into the value stream.

Waiting: Any delays between work requiring people to wait until they can complete the current work. Delays increase cycle time and prevent the customer from getting value.

Motion: The amount of effort to move information or materials from one work center to another. Motion waste can be created when people who need to communicate frequently are not co-located. Handoffs also create motion waste and often require additional communication to resolve ambiguities.

Defects: Incorrect, missing, or unclear information, materials, or products create waste, as effort is needed to resolve these issues. The longer the time between defect creation and defect detection, the more difficult it is to resolve the defect.

Nonstandard or Manual Work: Reliance on nonstandard or manual work from others, such as using non-rebuilding servers, test environments, and configurations. Ideally, any dependencies on Operations should be automated, self-serviced, and available on demand.

Heroics: In order for an organization to achieve goals, individuals and teams are put in a position where they must perform unreasonable acts, which may even become a part of their daily work (nightly 2AM problems in production, creating hundreds of work tickets as part of every software release).

Book Club: The DevOps Handbook (Chapter 1. Agile, Continuous Delivery, and the Three Ways)


Chapter 1

The Manufacturing Value Stream

In manufacturing operations, the value stream is often easy to see and observe: it starts when a customer order is received and the raw materials are released onto the plant floor.

Value Stream: “the sequence of activities an organization undertakes to deliver upon a customer request” or “the sequence of activities required to design, produce, and deliver a good or service to a customer, including the dual flows of information and material.” – Value Stream Mapping by Karen Martin & Mike Osterling

To enable fast and predictable lead times in any value stream:

Create a smooth and even flow of work
Use techniques such as small batch sizes
Reduce work in process (WIP)
Prevent rework, ensuring defects are not passed to downstream work centers
Constantly optimize the system toward global goals

The Technology Value Stream

The same principles and patterns that enable the fast flow of work in physical processes are equally applicable to technology work.

In DevOps, the technology value stream is defined as the process required to convert a business hypothesis into a technology-enabled service that delivers value to the customer. Value is created only when services run in production.

Focus on Deployment Lead Time

Deployment Lead Time begins when a developer checks a change into version control. Deployment Lead Time ends when that change is successfully running in production, providing value to the customer and generating useful feedback and telemetry.

Instead of work going sequentially through the design/development value stream and then through the test/operations value stream, testing and operations happen simultaneously with design and development.

Defining Lead Time vs. Processing Time:

The lead time clock starts when the request is made and ends when it is fulfilled.
The process time clock starts only when work begins on the customer request—specifically, it omits the time that the work is in queue, waiting to be processed.
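The difference between the two clocks is easy to compute from three timestamps. A minimal Python sketch with hypothetical timestamps; for deployment lead time, "requested" would be the commit to version control and "fulfilled" the moment the change is running in production.

```python
from datetime import datetime


def lead_and_process_hours(requested: str, work_started: str,
                           fulfilled: str) -> tuple[float, float]:
    """Lead time runs from request to fulfillment; process time starts when work begins.

    Simplification: work is assumed to proceed without further waiting once started.
    """
    t_req, t_start, t_done = (datetime.fromisoformat(t)
                              for t in (requested, work_started, fulfilled))
    lead = (t_done - t_req).total_seconds() / 3600
    process = (t_done - t_start).total_seconds() / 3600
    return lead, process


lead, process = lead_and_process_hours("2021-06-01T09:00", "2021-06-03T09:00",
                                       "2021-06-03T13:00")
print(lead, process)  # 52.0 4.0: the request spent 48 hours waiting in queue
```

Here lead time is thirteen times process time; the customer experiences the 52 hours, not the 4.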

The Common Scenario: Deployment Lead Times Requiring Months

When we have long deployment lead times, heroics are required at almost every stage of the value stream. We may discover that nothing works at the end of the project when we merge all the development team’s changes together.

Our DevOps Ideal: Deployment Lead Times of Minutes

Developers receive fast, constant feedback on their work, which enables them to quickly and independently implement, integrate, and validate their code, and have the code deployed into the production environment.

This is achieved by checking small code changes into a version control repository, performing automated and exploratory testing against them, and deploying them into production. It also requires an architecture that is modular, well encapsulated, and loosely coupled.

Teams are capable of working with high degrees of autonomy; failures are small and contained, and they do not cause global disruptions. Deployment lead time is measured in minutes or, in the worst case, hours.


Observing “%C/A” As A Measure of Rework

The third key metric in the technology value stream is percent complete and accurate (%C/A). This metric reflects the quality of the output of each step in our value stream.

“The %C/A can be obtained by asking downstream customers what percentage of the time they receive work that is ‘usable as is,’ meaning that they can do their work without having to correct the information that was provided, add missing information that should have been supplied, or clarify information that should have and could have been clearer.”
Value Stream Mapping
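Value stream mapping practice often rolls per-step %C/A values into a single figure by multiplying them together (the "rolled %C/A"). The summary above does not cover this, so treat the following as an illustrative sketch with hypothetical step values:

```python
def rolled_pct_complete_accurate(step_pcts: list[float]) -> float:
    """Multiply per-step %C/A values (0-100) to estimate the share of work
    that passes through every step without any rework."""
    result = 1.0
    for pct in step_pcts:
        if not 0 <= pct <= 100:
            raise ValueError(f"invalid %C/A value: {pct}")
        result *= pct / 100
    return result * 100


# Hypothetical three-step stream: each step looks fine on its own.
print(round(rolled_pct_complete_accurate([90, 80, 95]), 1))  # 68.4
```

Three steps at 90%, 80%, and 95% mean only about 68% of work moves through the whole stream usable as is.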

The Three Ways: The Principles Underpinning DevOps

The First Way enables fast left-to-right flow of work from Development to Operations to the customer. In order to maximize flow, we need to make work visible, reduce our batch sizes and intervals of work, build in quality by preventing defects from being passed to downstream work centers, and constantly optimize for the global goals.

The First Way

Goal: Speed up flow through the technology value stream, reduce the lead time required to fulfill requests, and increase throughput.

Practices:

Continuous build, integration, test, and deployment processes
Creating environments on demand
Limiting work in process (WIP)
Building systems and organizations that are safe to change

The Second Way

Goal: Fast and constant flow of feedback from right to left at all stages of our value stream.

Practices:

Amplify feedback to prevent problems
Enable faster detection and recovery

The Third Way

Goal: Creation of a generative, high-trust culture that supports a dynamic, disciplined, and scientific approach to experimentation and risk-taking, facilitating the creation of organizational learning, both from our successes and failures.

Practices:

Designing systems to multiply the effects of new knowledge (turning local discoveries into global improvements)

Conclusion

Chapter One covered the concepts of value streams, lead time as one of the key measures of effectiveness for technology organizations, and the high-level concepts behind each of the Three Ways. The following chapter summaries will cover each of the Three Ways in greater detail.

From the Pipeline v25.0

This entry is in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web.

Making GitHub CI Workflow 3x Faster

GitHub has started a “building GitHub” blog series to provide insight into their engineering team practices. In the first post, they share how they decreased the time from commit to production deployment. The GitHub codebase is a monolith with thousands of tests executed across 25 CI jobs for every commit. To reduce the time from commit to deployment, they first categorized the types of CI jobs, then fixed the flaky tests. They then modified their deployment process with a “deferred compliance” tool that lets changes through, but when the CI jobs flag an issue, gives the team 72 hours to fix it before the change is rolled back. The teams are notified of these compliance issues via Slack. Overall an interesting read, and I’m looking forward to the next three posts in the series.
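The deferred-compliance rule boils down to a deadline check. This is a hypothetical sketch of that rule only: the 72-hour window comes from the post, while the function name, status strings, and inputs are illustrative assumptions, not GitHub's tool:

```python
from datetime import datetime, timedelta

# The 72-hour window comes from the GitHub post; everything else is hypothetical.
FIX_WINDOW = timedelta(hours=72)


def compliance_status(issue_flagged_at: datetime, fixed: bool, now: datetime) -> str:
    """Classify a change that was deployed before its CI jobs finished."""
    if fixed:
        return "compliant"    # the flagged issue was resolved
    if now - issue_flagged_at < FIX_WINDOW:
        return "pending-fix"  # still inside the window; notify the team
    return "rollback"         # window expired without a fix


flagged = datetime(2020, 10, 1, 12, 0)
print(compliance_status(flagged, fixed=False, now=flagged + timedelta(hours=10)))  # pending-fix
print(compliance_status(flagged, fixed=False, now=flagged + timedelta(hours=73)))  # rollback
```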

A Sustainable Pattern with Shared Library

Thomas Bjerre describes how he uses Shared Libraries in Jenkins. Shared Libraries hold reusable Pipeline code; they can be defined in external source control repositories and loaded into existing Pipelines. This helps reduce duplicated code, provides a form of documentation, and offers a standard way to reuse patterns. Thomas constructs a build plan that decides up front what the build will do, which streamlines the rest of the code. A public API standardizes what consumers of the library invoke.

How to Use Page Object Model in Selenium

This post by Perfecto is an overview of the Page Object Model. “Page Object Model (POM) in Selenium is a design pattern that creates a repository of objects, such as buttons, input fields, and other elements. The main goal of using POM in Selenium is to reduce code duplication and improve the maintenance of tests in the future.” To keep the test code maintainable, page objects should never make verifications themselves; the one exception is verifying that the page loaded correctly. Lastly, only add elements that are actually used, to prevent clutter.
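Those guidelines can be sketched in a few lines. This is not Selenium's API: `Driver` and `FakeDriver` are hypothetical stand-ins so the example runs without a browser (with Selenium, the page object's methods would wrap WebDriver calls such as `find_element`), and the element IDs are made up:

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical subset of browser-driver behavior the page object needs."""
    def title(self) -> str: ...
    def type_into(self, element_id: str, text: str) -> None: ...
    def click(self, element_id: str) -> None: ...


class LoginPage:
    # Only the elements the tests actually use.
    USERNAME = "username"
    PASSWORD = "password"
    SUBMIT = "login-button"

    def __init__(self, driver: Driver) -> None:
        self.driver = driver
        # The one check a page object makes: that the right page loaded.
        if "Login" not in driver.title():
            raise RuntimeError(f"Expected the login page, got {driver.title()!r}")

    def log_in(self, user: str, password: str) -> None:
        # Actions only; no assertions about the outcome.
        self.driver.type_into(self.USERNAME, user)
        self.driver.type_into(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


class FakeDriver:
    """In-memory stand-in that records actions instead of driving a browser."""
    def __init__(self) -> None:
        self.actions: list[tuple[str, ...]] = []

    def title(self) -> str:
        return "Login | Example"

    def type_into(self, element_id: str, text: str) -> None:
        self.actions.append(("type", element_id, text))

    def click(self, element_id: str) -> None:
        self.actions.append(("click", element_id))


# The verification lives in the test, not in the page object.
driver = FakeDriver()
LoginPage(driver).log_in("alice", "s3cret")
assert driver.actions[-1] == ("click", "login-button")
```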

Antipatterns and Patterns

This is a fascinating article that not only explains the difference between an antipattern (an approach that is ineffective) and a pattern (an approach that is effective and improves desired outcomes), but also provides examples of pattern/antipattern pairs within an organization. The full collection of patterns and antipatterns is included in an associated book, “Sooner Safer Happier”.

Java for QA Engineers: How to Learn

John Selawasky lays out a path for manual testers moving into automation testing in a Java domain. His recommendations include: (1) learn Java Core and solve many small coding tasks; (2) use a good IDE (he recommends IntelliJ IDEA); (3) learn unit testing; (4) verify your code with your own unit tests rather than System.out.println; (5) read about code refactoring; (6) learn SQL at the beginner level; (7) learn a little bit about Gradle, Maven, and Spring; (8) read, check, and improve other people’s code; (9) work with Mockito (or another mocking framework); and (10) only then learn your testing tools.

From the Pipeline v24.0

This entry is in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web.

Introducing Boa Constrictor: The .NET Screenplay Pattern

Andy Knight, the “Automation Panda”, has released a new open source tool for implementing the Screenplay Pattern for test automation. The Screenplay Pattern has slowly chipped away at the Page Object Model’s dominance in the web automation space. The reason for the shift is that the pattern follows a good coding design principle: separation of concerns. In the pattern, actors use abilities to perform interactions. Andy provides a brief tutorial in his article, along with a link to the open source code.
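To show what "actors use abilities to perform interactions" looks like in code, here is a minimal Python sketch of the pattern's shape. It is not Boa Constrictor's .NET API; the class and method names only loosely echo Screenplay conventions, and `BrowseTheWeb` wraps a fake in-memory browser rather than a real WebDriver:

```python
from dataclasses import dataclass, field


class BrowseTheWeb:
    """Ability: wraps the tool an actor uses (an in-memory fake browser here)."""

    def __init__(self) -> None:
        self.current_url = ""

    def open(self, url: str) -> None:
        self.current_url = url


@dataclass
class Navigate:
    """Interaction: a single action an actor performs using an ability."""
    url: str

    def perform_as(self, actor: "Actor") -> None:
        actor.ability(BrowseTheWeb).open(self.url)


@dataclass
class Actor:
    name: str
    abilities: dict = field(default_factory=dict)

    def can(self, ability: object) -> "Actor":
        # Abilities are looked up by type, keeping "who" separate from "how".
        self.abilities[type(ability)] = ability
        return self

    def ability(self, kind: type):
        return self.abilities[kind]

    def attempts_to(self, *interactions) -> None:
        for interaction in interactions:
            interaction.perform_as(self)


andy = Actor("Andy").can(BrowseTheWeb())
andy.attempts_to(Navigate("https://example.com"))
print(andy.ability(BrowseTheWeb).current_url)  # https://example.com
```

Each concern lives in its own class: swapping the fake browser for a real one changes only the ability, not the actor or the interactions.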

Let’s Focus More on Quality and Less on Testing

Joel Montvelisky is one of the luminaries in the testing field. In this article posted to StickyMinds (which also has an accompanying conference presentation), Joel explains how the role of the tester has shifted and gives his recommendations for providing the most value to a team and organization. “In order to understand a tester’s value, we need to look at the role and understand the impact of the changing development process on this role.”

Comparing Java and Ruby

Deepak Vohra provides a good overview of the differences between Java and Ruby for someone looking to learn their first programming language. I would recommend it for anyone trying to understand the difference between interpreted and compiled languages, statically typed and dynamically typed languages, as well as OOP principles. The article is brief but is a good jumping-off point.

An Unlikely Union – DevOps and Audit

IT Revolution has made one of their white papers on DevOps available free to the public. This is absolutely worth a read for those of you working in organizations that must go through security, compliance, and audit to make changes. “Many organizations are adopting DevOps patterns and practices and are enjoying the benefits that come from that adoption: More speed. Higher quality. Better value. However, many organizations often get stymied when dealing with information security, compliance, and audit requirements. There seems to be a misconception that DevOps practices won’t work in organizations which are under SOX or PCI regulations. In this paper, we will provide some high-level guidance on three major concerns about DevOps Practices: (1) DevOps and Change Control, (2) DevOps and Security, (3) DevOps and Separation of Duties”

Kobiton Odyssey Recordings

This past summer, Kobiton hosted an online conference called Kobiton Odyssey. They invited industry leaders in the quality space to provide experience reports, and they have made these conference talks freely available to everyone. I recommend listening to the sessions by Joel Montvelisky, Paul Grizzaffi, and Melissa Tondi.

Slaying the Leviathan: Containerized Execution of Test Automation-part 2

This entry is in the series Slaying the Leviathan

Introduction

In part 1 of this series on automated testing with Docker, we covered the basics of the automation framework we are using, as well as an overview of Docker. In part 2, we dive into actually using the framework.

Docker Applied

In our framework we have a Dockerfile in the root directory.

This Dockerfile contains all the steps required to build a Docker image that sets up and runs a Ruby/Watir test automation framework as a Docker container.

In Docker, the RUN commands are executed to build the image. The build steps of the Image include:

Ruby 2.6.6 Installation
Chrome Installation: this installs whatever is considered the most recent stable Chrome version.
ChromeDriver Download and Unzip: we download the ChromeDriver for Chrome 84, as that is the stable Chrome version currently being pulled down. This may need to be changed depending on when you execute this code.
Git Setup

The build steps for Image setup are similar to what we did for our workspace setup in part 1 of this series. That is intentional since we need the same things within the context of the image.

The final line in the Dockerfile holds the CMD instruction. Unlike RUN, the CMD commands do not execute during the image build; they run when a container is started from the image, in the writable layer that the container adds on top of the image’s read-only layers.

This CMD step does the following:

Clones the framework from Git
Sets the Ruby version in rbenv
Installs the necessary gems via Bundler
Runs the dynamic_tags.rb file, which splits the build based on the variables passed
Sets the location of the Chrome browser and ChromeDriver
Specifies which tests to run within the framework
Kicks off the Rake task, which starts the Cucumber run

On your local machine, build the Docker image by running ‘docker image build -t cucumber-example ./’ from the root directory of our framework.

The build will take longer the first time you run it.

Docker Single Threaded Execution

Now we have an image named cucumber-example. This can be seen by running the docker images command.

We can now run a container based on the image we generated, using this command:

docker container run -e total_number_of_builds=2 -e build_number=1 --name cucumber-run-4 cucumber-example

The container then runs, executing all of the CMD commands listed in the Dockerfile.

Note that we set two environment variables when the container starts: total_number_of_builds and build_number.

These environment variables allow the dynamic_tags.rb script within the container to select a subset of the tests to run.

Docker Compose

Docker Compose lets us define, in YAML format, how we want to run multiple containers from one or more images simultaneously.

We have a docker-compose.yaml file in the root directory of this framework.


We use the Compose file to set up multiple container instances from the cucumber-example image we generated. The services section of the docker-compose.yaml file lists a numerical alias for each instance of the image we will run.

For each of these services, we use YAML inheritance to pass in the image and the total number of builds, since both are the same for every service. Each service has a unique value for build_number, because the dynamic_tags.rb script splits the regression among the containers based on that number.

We are running 12 containers in the Compose file, so one-twelfth of the regression runs on each container. This can be adjusted by removing service instances and decreasing the total_number_of_builds value accordingly.

Another parameter we pass to all containers is restart: “no”, which stops the containers from restarting once they complete the tests assigned to them. “no” is already the Compose default, so setting it explicitly mainly guards against a policy such as always, under which the containers would run in an endless restart loop. Automatic restarts are useful when the containers host a long-running service such as a web app, but not for a finite process like running a test set.
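Because the service count and total_number_of_builds must always agree, one option is to generate the Compose file rather than edit it by hand. The sketch below is a hypothetical Python generator, not the post’s actual file. Unlike the post, it spells out every service instead of using YAML inheritance, and it assumes the numbered-word aliases (one, two, …) and the settings described above:

```python
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve"]


def compose_file(total_builds: int, image: str = "cucumber-example") -> str:
    """Render docker-compose.yaml text whose service count always matches
    total_number_of_builds."""
    if not 1 <= total_builds <= len(NUMBER_WORDS):
        raise ValueError("this sketch only names up to twelve services")
    lines = ['version: "3"', "services:"]
    for build_number in range(1, total_builds + 1):
        alias = NUMBER_WORDS[build_number - 1]
        lines += [
            f"  {alias}:",
            f"    image: {image}",
            '    restart: "no"',  # do not restart once the tests finish
            "    environment:",
            f'      total_number_of_builds: "{total_builds}"',
            f'      build_number: "{build_number}"',
        ]
    return "\n".join(lines) + "\n"


print(compose_file(2))
```

Changing the argument from 12 to 8 removes four services and updates total_number_of_builds in the same step.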

Docker Compose Runtime

Now for the fun part: running a set of containers with Docker Compose.

The first thing we do is remove all existing containers related to this Docker Compose instance. These exist on my local machine because I have executed this before; they won’t exist on yours during your first run.

We want to ensure that these are removed so that we are running in fresh Containers rather than Docker restarting the existing Containers for the Compose file.

One important thing to note is that Compose generates the container names when it executes. Each name is a combination of:

The directory that the Compose file is housed within
The service alias in the Compose file
The index of that running service

*If you didn’t change the root directory name during part 1, now would be the time to change it to sample_cucumber.

The container generated for service alias one would therefore be named sample_cucumber_one_1.

Next, we can run ‘docker-compose up’ in our framework’s root directory. All of the necessary containers will be created.

Note that the output from all of the running Compose containers is interleaved in the command-line output.

You can prevent this by running in detached mode (docker-compose up -d).

Once Docker Compose has run and all of the containers have finished executing, each container stops rather than restarting.

The last thing to discuss is how we retrieve the results from the containers that have run.

Docker has a copy command in which we can take the contents of a directory housed in the Container and store a copy externally or vice versa.

docker container cp sample_cucumber_one_1:docker_web_repo/output ./docker_output/1

sample_cucumber_one_1 is the container name
docker_web_repo/output is the path to the directory inside the container
./docker_output/1 is where to store the copied files on the host

This gives us the test results of an individual container, which we can review outside the container that produced them.
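With 12 containers, typing that command 12 times gets tedious. Below is a hedged Python sketch that builds one docker container cp command per service and only runs them when asked. The helper names are hypothetical. The naming follows the underscore convention described above; newer Compose releases (V2) join the parts with hyphens instead (sample_cucumber-one-1), so adjust container_name if that is what docker ps shows.

```python
import os
import subprocess


def container_name(project: str, service: str, index: int = 1) -> str:
    # <directory>_<service alias>_<index>, as described above.
    return f"{project}_{service}_{index}"


def copy_commands(project: str, services: list[str],
                  src: str = "docker_web_repo/output",
                  dest_root: str = "./docker_output") -> list[list[str]]:
    """Build one 'docker container cp' command per Compose service."""
    return [
        ["docker", "container", "cp",
         f"{container_name(project, service)}:{src}",
         f"{dest_root}/{build_number}"]
        for build_number, service in enumerate(services, start=1)
    ]


def collect_results(project: str, services: list[str],
                    dest_root: str = "./docker_output",
                    run: bool = False) -> list[list[str]]:
    commands = copy_commands(project, services, dest_root=dest_root)
    if run:
        # Create the parent folder so docker cp can create each numbered folder.
        os.makedirs(dest_root, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Dry run: print the commands instead of executing them.
for cmd in collect_results("sample_cucumber", ["one", "two", "three"]):
    print(" ".join(cmd))
```

The first printed line matches the single-container command shown earlier; pass run=True to actually copy the results.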

Conclusion and Next Steps

In part 2, we covered Docker images, Docker containers, and Docker Compose. Part 3 of this series will deal with implementing this framework to run in a CI/CD tool.

Book Club: The DevOps Handbook (Introduction)

This entry is in the series DevOps Handbook

The following is a chapter summary for “The DevOps Handbook” by Gene Kim, Jez Humble, John Willis, and Patrick DeBois for an online book club.


An Introduction to DevOps

“Imagine a world where product owners, Development, QA, IT Operations, and Infosec work together, not only to help each other, but also to ensure that the overall organization succeeds. By working toward a common goal, they enable the fast flow of planned work into production (e.g., performing tens, hundreds, or even thousands of code deploys per day), while achieving world-class stability, reliability, availability, and security.”
An Introduction to DevOps

In this world, cross-functional teams rigorously test their hypotheses of which features will most delight users and advance the organizational goals.

Simultaneously, QA, IT Operations, and Infosec are always working on ways to reduce friction for the team, creating the work systems that enable developers to be more productive and get better outcomes.

This enables organizations to create a safe system of work, where small teams are able to quickly and independently develop, test, and deploy code and value quickly, safely, securely, and reliably to customers.

By adopting Lean principles and practices, manufacturing organizations dramatically improved plant productivity, customer lead times, product quality, and customer satisfaction, enabling them to win in the marketplace.

Before the Lean manufacturing revolution, average manufacturing plant order lead times were six weeks, with fewer than 70% of orders being shipped on time.

By 2005, with the widespread implementation of Lean practices, average product lead times had dropped to less than three weeks, and more than 95% of orders were being shipped on time.

Most organizations are not able to deploy production changes in minutes or hours, instead requiring weeks or months. These same organizations are not able to deploy hundreds or thousands of changes into production per day. They struggle to deploy monthly or even quarterly. Production deployments are not routine, but instead involve outages and firefighting.

The Core, Chronic Conflict

In almost every IT organization, there is built-in conflict between Development and IT Operations that creates a downward spiral, resulting in slower time to market for new products and features, reduced quality, increased outages, and an ever-increasing amount of technical debt.

Technical Debt: the term “technical debt” was coined by Ward Cunningham. Technical debt describes how the decisions we make lead to problems that get increasingly more difficult to fix over time, continually reducing our available options in the future; even when taken on judiciously, we still incur interest.

Two competing organizational interests are at play: responding to the rapidly changing competitive landscape and providing a stable service to the customer.

Development takes responsibility for responding to changes in the market, deploying features and changes into production. IT Operations takes responsibility for providing customers with IT service that is stable and secure, making it difficult for anyone to introduce production changes that could jeopardize production. Dr. Eli Goldratt called this type of configuration “the core, chronic conflict”.

The Downward Spiral

The first act begins in IT Operations, where our goal is to keep applications and infrastructure running so that our organization can deliver value to customers. In our daily work, many of our problems are due to applications and infrastructure that are complex, poorly documented, and incredibly fragile. The systems most prone to failure are also our most important and are at the epicenter of our most urgent changes.

The second act begins when somebody has to compensate for the latest broken promise—it could be a product manager promising a bigger, bolder feature to dazzle customers with or a business executive setting an even larger revenue target. Then they commit the technology organization to deliver upon this new promise. Development is tasked with another urgent project that inevitably requires solving new technical challenges and cutting corners to meet the promised release date, further adding to our technical debt.

The third and final act is where everything becomes just a little more difficult, bit by bit: everybody gets a little busier, work takes a little more time, communications become a little slower, and work queues get a little longer. Our work becomes more tightly coupled, smaller actions cause bigger failures, and we become more fearful and less tolerant of making changes. Work requires more communication, coordination, and approvals; teams must wait longer for their dependent work to get done; and our quality keeps getting worse.

Why Does the Downward Spiral Happen?

First, every IT organization has two opposing goals; second, every company is a technology company, whether it knows it or not. The vast majority of capital projects have some reliance upon IT.

“When people are trapped in this downward spiral for years, especially those who are downstream of Development, they often feel stuck in a system that pre-ordains failure and leaves them powerless to change the outcomes. This powerlessness is often followed by burnout, with the associated feelings of fatigue, cynicism, and even hopelessness and despair.”
An Introduction to DevOps

A culture can be created where people are afraid to do the right thing because of fear of punishment, failure, or jeopardizing their livelihood. This can create the condition of learned helplessness, where people become unwilling or unable to act in a way that avoids the same problem in the future.

Counteracting the Downward Spiral

By creating fast feedback loops at every step of the process, everyone can immediately see the effects of their actions. Whenever changes are committed into version control, fast automated tests are run in production-like environments, giving continual assurance that the code and environments operate as designed and are always in a secure and deployable state. Automated testing helps developers discover their mistakes quickly.

High-profile product and feature releases become routine by using dark launch techniques. Long before the launch date, we put all the required code for the feature into production, invisible to everyone except internal employees and small cohorts of real users, allowing us to test and evolve the feature until it achieves the desired business goal.
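The summary does not say how the hidden code is exposed only to employees and small cohorts; a common approach is a feature toggle with stable percentage bucketing. The sketch below assumes that approach, and every name in it is hypothetical:

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Stable bucketing: the same user always lands in the same bucket for a feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100  # 0..99
    return bucket < percent


def feature_visible(user_id: str, is_employee: bool, feature: str, percent: float) -> bool:
    # The code is already deployed; this check only decides who can see it.
    return is_employee or in_rollout(user_id, feature, percent)


print(feature_visible("u1", is_employee=True, feature="new-checkout", percent=0))   # True
print(feature_visible("u1", is_employee=False, feature="new-checkout", percent=0))  # False
```

Raising percent from 0 to 100 widens the cohort gradually without another deployment, which is what lets the launch itself become routine.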

In a DevOps culture, everyone has ownership of their work regardless of their role in the organization.

The Business Value of DevOps

High-performing organizations succeed in the following areas:

Throughput metrics
  Code and change deployments (thirty times more frequent)
  Code and change deployment lead time (two hundred times faster)
Reliability metrics
  Production deployments (sixty times higher change success rate)
  Mean time to restore service (168 times faster)
Organizational performance metrics
  Productivity, market share, and profitability goals (two times more likely to exceed)
  Market capitalization growth (50% higher over three years)

When we increase the number of developers, individual developer productivity often significantly decreases due to communication, integration, and testing overhead. DevOps shows us that when we have the right architecture, the right technical practices, and the right cultural norms, small teams of developers are able to quickly, safely, and independently develop, integrate, test, and deploy changes into production.

Organizations adopting DevOps are able to linearly increase the number of deploys per day as they increase their number of developers.

“The purpose of the DevOps Handbook is to provide the theory, principles, and practices needed to successfully start a DevOps initiative. This guidance is based on decades of management theory, study of high-performing technology organizations, work the authors have done helping organizations transform, and research that validates the effectiveness of the prescribed DevOps practices.”
An Introduction to DevOps

The reader is not expected to have extensive knowledge of any of these domains, or of DevOps, Agile, ITIL, Lean, or process improvement. Each of these topics is introduced and explained in the book.

The goal is to create a working knowledge of the critical concepts in each of the above listed areas.