Three Team Activities to Improve the Quality of Automation

This is the second in a series of posts about the strategy and tactics of test automation. The first, on common challenges, can be found HERE. Our team has experience working at multiple large firms with an enterprise-wide scope. Throughout our time in IT, we have encountered challenges with existing test automation implementations and made several mistakes along the way. Our hope is to relay some valuable activities for building robustness into an automation suite so you can defeat the automation supervillains.

“I’m not a great programmer; I’m just a good programmer with great habits.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

The following is an overview of Regression Analysis, Code Reviews, and Refactoring Sessions for test automation. Like any other programmers, automation testers are developing an application; it just happens that the application is designed to test other applications. Automation test suites accumulate technical debt like any other code base. Overly complicated scenarios, single-use steps, and data management miscues are just a few of the issues facing an automation test suite. The quality standards expected of the application delivered to stakeholders should also apply to the automation suite that tests it.

Activity One: Regression Analysis

Regression testing has many definitions depending on the source: a set of automated tests executed regularly, the 20% of tests that cover 80% of an application’s functionality, testing performed after an application undergoes a change, or any test executed in the past. Regression testing provides value by informing a team whether a change (new release, upgrade, patch, or hot fix) negatively impacts an application. Michael Bolton has offered that regression testing also helps us learn about the relationships between parts of the software, so we better understand where future changes might have an impact. A recurring concern is determining the appropriate number of tests, or test coverage, to adequately observe the system. Regression testing is important, but so is performing new tests that extend coverage to features being developed, and time & budget often limit how much testing can be done before a change is implemented. Teams therefore need a standard mechanism for selecting the tests included in regression, which is why a Regression Analysis meeting to add, modify, and remove regression tests is important to supporting those change events.

A Regression Analysis meeting determines (1) which tests associated with the release should become part of core regression and (2) which tests should be removed from the current core regression suite. The core regression should be understood by all members of the team and business representatives as the set of automated tests executed for any release, patch, or hot fix. The output of a Regression Analysis meeting is a regression suite that reflects the core functionality of the application, so that for any of those events the team has confidence the application will behave as expected.

Before the Regression Analysis meeting is held, whoever is taking responsibility as quality lead for the application compiles a list of all new release scripts and all existing core regression scripts. That lead provides both lists to all expected participants ahead of time to give everyone an opportunity to review. A representative of the business provides metrics on application usage broken down by feature, which can include the most used platforms, popular conversion paths, tracked application exit points, active A/B tests, and any other relevant details they believe the development team should know. The application manager or product owner should provide a list of upcoming projects with high-level feature changes to identify features that may be deprecated or modified in the next release. Lastly, a representative of production support (incident and/or service request) provides metrics on issues from the application’s most recent release and any issues in the months prior to the release. The Regression Analysis meeting therefore includes at least the QA lead, business sponsor, application manager / product owner, and production support representative.

The purpose of having these four roles represented in the meeting is to make educated, evidence-based decisions about testing coverage and effort. Since testing is often limited by both time and budget constraints, all stakeholders of an application should understand the risks of excluding or limiting testing activities for a given event (release, patch, hot fix). Helping those stakeholders understand the coverage of testing, the time involved, and the division of that work (manual & automated) for a given event aligns expectations with outcomes.

Regression Analysis should be conducted on a regular basis, matching the cadence of the release cycle if applicable. If teams are releasing daily, they should establish a working agreement to add new tests to the core regression by default and hold the meeting at predefined intervals to remove any tests the above stakeholders deem unnecessary. The purpose of this practice is to bias toward lower risk by including those tests rather than allowing a coverage gap of weeks or months to build up before another review is held. During a Regression Analysis meeting, the participants review the individual tests from the release to be added to the core regression and determine which tests to remove based on the data points from the four representatives. This decision process can be left open-ended if all participants agree, or a checklist can be used to help decide what to include and exclude. It’s important that the meeting be held live rather than over email because, like many team ceremonies, it focuses the attendees on the subject at hand, which is key to establishing a shared understanding.

Beyond updating the core regression suite to reflect the state of the application, the Regression Analysis meeting produces effort estimates to be used in future releases and a list of risks & assumptions the team can use in their working agreements or Test Plans. It’s a powerful event for focusing a team on executing valuable tests rather than letting the regression suite become overgrown and inaccurate.

Activity Two: Code Reviews

“If you can get today’s work done today, but you do it in such a way that you can’t possibly get tomorrow’s work done tomorrow, then you lose.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Code Reviews are a best-practice development activity to ensure mistakes are caught early in the development lifecycle. The activity helps ensure the team has “built the thing right”. Code review activities include peer reviews by a technical lead, pair programming with another developer, or demonstration to a wider audience. A good practice is to leverage a static code analysis tool (e.g., Cuke Sniffer for Ruby Cucumber) and to participate in code reviews. To help ensure a feature has been tested using automation, the team should also conduct an informal walk-through of the feature under development before it’s promoted to higher environments.

Code Reviews conducted by a peer or a larger team should ensure that all requirements for the feature under development have been met. Additionally, the feature should have all required traceability and follow all accepted team standards of development. These standards can vary significantly from team to team, so it’s recommended that any teams that cross-impact each other establish common standards; otherwise, code and projects that move across multiple teams will only be as strong as the weakest team’s practice. Most importantly, the automation scripts should actually execute on a regular basis and meet pass/fail expectations consistently. At times during the software development lifecycle (SDLC), features provided by the development team aren’t ready for test automation or data is not available. These external factors should be taken into account during a code review so that expectations for pass/fail are met. Any automation script that fails due to outside circumstances is worth noting for review at a later date. Overall, the team should look for the following during a code review:

  • All possible automation scripts for the feature are indeed scripted
  • The automation scripts are understandable by the entire team
  • The automation scripts do not duplicate effort already present
  • All required environment, UI-locator, service, and data needs are addressed
  • The Features and Scenarios best represent the state of the application (living documentation)
  • All agreed team and enterprise standards & practices are followed (traceability, compatibility, formatting, etc.)

In the above general guidelines, a static code analysis tool was recommended to support team standards programmatically. The advantage of such a tool is that it can be executed frequently to assess the current state of the codebase in a consistent manner. For instance, “Cuke Sniffer” is a Ruby gem used to find “broken windows” in a Ruby Cucumber project. Executing this static code analysis tool against a project produces a list of issues and recommended improvements. Each problem area is assigned a score; the more important the issue, the higher the score. The combined scores for the individual areas in a project form the overall score, and the higher the overall score, the more improvement the project needs. The tool also allows each team to update the standard set of rules to address specific needs of an application. Tracking the score over time provides telemetry about one aspect of the quality of the test automation as features are added to the application under test. In addition to the general code review guidelines above, here are some specific “broken windows” to catch (a sketch of scripting such a tool appears at the end of this section):

  • Tests without names or descriptions
  • Tests lacking traceability back to the original requirements
  • Overly long test descriptions
  • Imperative-style Gherkin steps that focus on UI mechanics rather than describing behavior in a declarative style (see the example after this list)
  • Empty files with no tests
  • Features with too many scenarios
  • Hard-coded data (the data may work now but not in the future)
  • Tests that use “selfish data” (data that is used once and then is no longer valid)
  • Tests that use “toxic data” (data that represents a security risk, especially if that data is pulled from production without sanitization)
  • Tests that never fail (this is an often-overlooked issue. If the application is unavailable and the test still passes, then you don’t have a test)
  • . . . the list goes on and on.
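
To illustrate the imperative-versus-declarative point above, here is the same hypothetical sign-in behavior written both ways (the step wording, data, and locators are invented for illustration, not taken from a real suite):

    # Imperative style: the scenario narrates UI mechanics, so any change to the screen breaks it.
    Scenario: Customer signs in
      Given I navigate to "/login"
      When I type "customer@example.com" into the "email" field
      And I type "s3cret!" into the "password" field
      And I click the "Sign In" button
      Then I should see the element "#account-dashboard"

    # Declarative style: the scenario states the behavior; UI details live in the step definitions.
    Scenario: Customer signs in
      Given a registered customer
      When the customer signs in with valid credentials
      Then the customer sees their account dashboard

The declarative version survives UI changes because only the step definitions, not the scenarios, need to be updated, and it reads as living documentation for the whole team.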

Many of the issues listed above have been encountered by experienced automation developers. It’s incumbent on those individuals to support newer developers in identifying issues and to educate colleagues across their organization on practices that avoid these common mistakes. Code Reviews are an effective early detection mechanism, and the collaborative nature of the activity between developers helps build technical ability.
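
To make these checks repeatable, the static analysis itself can be scripted and run on every build. The sketch below is a minimal example of driving Cuke Sniffer from Ruby; the directory locations and option names are assumptions about a typical project layout, so consult the gem’s documentation for the exact API.

    # Minimal sketch: run Cuke Sniffer against a Ruby Cucumber project and save the report.
    require 'cuke_sniffer'

    # The locations below assume a conventional layout; adjust to the project's structure.
    cuke_sniffer = CukeSniffer::CLI.new(
      features_location:         'features',
      step_definitions_location: 'features/step_definitions',
      hooks_location:            'features/support'
    )

    # Writes the scored list of issues and suggested improvements to an HTML report
    # the team can review during a Code Review or Refactoring Session.
    cuke_sniffer.output_html('cuke_sniffer_report.html')

Capturing the report and its overall score from each run provides the telemetry mentioned above for tracking improvement over time.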

Activity Three: Refactoring Sessions

“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Code refactoring is an activity to improve existing code without changing its external behavior. The advantages include improved code readability and reduced complexity, which can improve maintainability, create more expressive features, and improve extensibility.

Refactoring is often motivated by noticing a “code smell”. Once a code smell has been identified, the feature can be addressed by refactoring or even transforming the code, so that the feature behaves the same as before but no longer “smells”. There are two main benefits to refactoring:

  • Maintainability. Easy-to-read code is easier to fix, and its intent is self-apparent. One example is breaking overly long, complex methods into concise, single-purpose methods; another is moving a method to a more appropriate class or removing poor comments (a sketch follows this list).
  • Extensibility. It’s easier to extend the automation suite if the appropriate (and agreed upon) design patterns are followed, and it provides flexibility to write more automation scripts without adding support code.
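
As a small illustration of the maintainability point above, consider a step definition that mixes navigation, form entry, and verification in one long block. The sketch assumes a Capybara-based suite; the page names, selectors, and the CatalogPage class are hypothetical.

    # Before: one long step definition that narrates every UI interaction.
    When('the customer places an order') do
      visit '/catalog'
      find('#item-42').click
      fill_in 'quantity', with: 1
      click_button 'Add to cart'
      click_button 'Checkout'
      expect(page).to have_css('#order-confirmation')
    end

    # After: concise, single-purpose methods on a (hypothetical) page object; the step reads as intent.
    When('the customer places an order') do
      catalog = CatalogPage.new(page)
      catalog.add_to_cart('item-42', quantity: 1)
      catalog.checkout
      expect(catalog).to be_order_confirmed  # relies on CatalogPage#order_confirmed?
    end

The behavior of the step is unchanged; only its structure improves, which is the essence of refactoring.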

Refactoring should be conducted regularly and with specific goals in mind. Many small changes made while refactoring can add up to a larger-scale improvement. A set of guiding principles can help a team treat refactoring as part of the development process rather than an exception-based or occasional activity. Static code analysis tools can be used to supplement the following guiding principles:

  • Duplication. A violation of the “Don’t Repeat Yourself” (DRY) principle.
  • Nonorthogonal Design. Code or a design choice that could be made more orthogonal. Examples of orthogonal design are scenarios, data management, methods, and classes in an automation suite that are independent of one another.
  • Outdated Knowledge. Applications change frequently, and requirements tend to shift during the course of a project. Over time the team’s knowledge of the application improves, and code based on the older understanding becomes a source of many of these code smells. The automation suite should serve as living documentation, reflecting the current state of the application under test.
  • Performance. Automation scripts should execute quickly and often. Added wait times and long scenario setup should be minimized to improve performance. Explicit wait times, flaky scenarios, and overly long scenarios hinder the feedback loop for automation results. Poor performance of automation scripts is exposed when the development team uses a CI/CD pipeline to deploy frequently and the automated testing becomes the bottleneck to build success (see the sketch after this list).
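
As an example of the performance point above, a hard-coded sleep wastes time on fast runs and still fails on slow ones, while a condition-based wait continues as soon as the application is ready. The sketch assumes a Capybara-based suite; the selector and timeout values are hypothetical.

    # Before: explicit wait. Every run burns 15 seconds, and the step still fails if the
    # results take 16 seconds to appear.
    When('the search results load') do
      sleep 15
      expect(page).to have_css('#results')
    end

    # After: condition-based wait. Capybara polls for the element and continues as soon as
    # it appears, failing only once the timeout is exceeded.
    When('the search results load') do
      expect(page).to have_css('#results', wait: 20)
    end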

“I’ve found that refactoring helps me write fast software. It slows the software in the short term while I’m refactoring, but it makes the software easier to tune during optimization. I end up well ahead.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

As with Code Reviews, every team should hold Refactoring Sessions on a recurring basis. In each session, the team should follow a set of standards enforced by a static code analysis tool and working agreements; these standards are in addition to any existing federated standards for the enterprise. The sessions should be led by a member of the team and supported by an automation developer from outside the team for peer review. The reason for outside assistance is to provide a fresh viewpoint on the state of the automation suite: if the code is not self-documenting, that person should be able to raise concerns. Think of the external representative as another form of Code Review in support of quality.

The refactoring sessions should start at approximately one hour per week and be focused on active project work. This establishes a baseline expectation for the team AND makes the activity “billable” work if time tracking is a concern. To provide guardrails for choosing the focus of a given session, there are a few recommendations: (1) utilize a static code analysis tool to identify problem areas, (2) leverage daily Regression/Release execution reports, (3) select a feature being actively developed, and (4) use telemetry on execution performance (speed and consistency of test execution). The following describes the roles & responsibilities during a refactoring session.

The Team Leader is responsible for scheduling the weekly sessions and ensuring attendance by the team for the application under test. The Team Leader can choose to focus on one area or multiple areas, time permitting. Topic responsibility belongs to the Team Leader, but they may rotate topic selection to other members of the team to support collective ownership. The Leader can select from multiple topic areas during a session; this provides the so-called guardrails, so the team stays within scope and has a fresh topic each session. The topic areas are:

  • Static Code Analysis Report
    • Review the rules enforced by the team in the static code analysis tool, then execute a fresh report. Use the report to address items in the improvement list (top-down or bottom-up), remove dead steps, improve features & scenarios, refactor step definitions, or refactor hooks. The team can also choose to update the static code analysis rules at this time regarding enforcement and scoring. The history of executions should be captured to provide telemetry on the state of the automated suite.
  • Active Work
    • Select a feature from the current or previous cycle, then execute the scripts in the appropriate test environment. The team should ensure the feature has the required traceability, proper formatting, and follows all coding standards. Next, the team should ensure all associated data are properly included for successful test execution. The team should confirm the functionality does not duplicate existing work. After any updates to the existing test cases, the team identifies technical debt and assigns action items for after the session (adding or updating any test cases they feel are necessary to fulfill the functional and non-functional requirements for the feature). Lastly, the team re-executes the feature to confirm expectations of pass/fail.
  • Daily Release / Regression
    • The Team Leader will select a feature containing regression scenarios and execute the scripts in the highest test environment. The team identifies any regression scripts they feel are no longer relevant to the core functionality of the application and tags them for Regression Analysis as an action item. The team should ensure the feature has traceability, proper formatting, and follows all coding standards. Any scenarios that depend on one another to succeed need to be decoupled, and any duplicated functionality in the regression should be removed. Lastly, the team re-executes the selected release/regression scripts to confirm expectations of pass/fail.
  • Execution Performance
    • The Team Leader opens multiple recent CI executions and reviews the results with the team with a focus on performance. The goal is a root cause analysis to determine whether the scripts suffer because of: (1) application performance, (2) the test environment, (3) data issues, (4) automation timing issues such as explicit waits, or (5) a change in expected functionality. Flaky tests should be removed from regular execution until the underlying issue(s) are addressed. Explicit wait times should be eliminated to improve execution time; instead, use implicit or conditional waits that proceed as soon as the application service or UI is available. Additionally, the team should establish failure criteria in the automated tests for response times that exceed a threshold (a sketch of one such failure criterion follows this list). After addressing the issue(s), the CI job should be executed and the project tracking tool updated if needed.
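
One way to implement the response-time failure criterion mentioned above is a Cucumber Around hook that times each scenario and fails it when an agreed threshold is exceeded. The tag name and threshold below are hypothetical and would come from the team’s working agreement.

    # features/support/performance_hook.rb
    # Fails any scenario tagged @timed that runs longer than the agreed threshold.
    MAX_SCENARIO_SECONDS = 30  # hypothetical team-agreed limit

    Around('@timed') do |scenario, block|
      started = Time.now
      block.call
      elapsed = Time.now - started
      if elapsed > MAX_SCENARIO_SECONDS
        raise "#{scenario.name} took #{elapsed.round(1)}s (limit #{MAX_SCENARIO_SECONDS}s)"
      end
    end

Tracking these failures over time feeds directly into the execution-performance telemetry the team reviews in this topic area.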

The Automation Guide is responsible for reporting the meeting outcome to the entire development team and tracking results in a location accessible to the organization at large. The purpose of tracking this changelog is to demonstrate improvement over time. Information tracked includes the features addressed in the team meeting, the cause for review or refactoring, and the successful outcome. Consistent problem areas can be incorporated into team & personal development goals if the root cause is automation, or reported to the application development team if the root cause is development or requirements.

The Automation Guide also serves as the technical oracle for the team during the meeting. When there are questions about implementation or upholding automation standards, the guide acts as the point of contact for solving those problems during the meeting and is responsible for follow-up if an issue cannot be addressed in one meeting. The Automation Guide plays a support role and should allow the team to select the features and problem areas of focus.

Summary

“Functional tests are a different animal. They are written to ensure the software as a whole works. They provide quality assurance to the customer and don’t care about programmer productivity. They should be developed by a different team, one who delights in finding bugs.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Regression Analysis, Code Reviews, and Refactoring Sessions for test automation help build quality into a test automation suite and, by extension, the application under test. Regression Analysis aligns business partners with their development teams to establish a shared understanding of the application. Code Reviews help ensure the team has “built the thing right” by catching mistakes early in the development process. Refactoring improves existing code without changing its external behavior through increased code readability and reduced complexity. It’s not enough for any team to just say they’ll commit to regression analysis or code reviews or refactoring; building rigor around these activities and making them habitual helps bias a team toward long-term success.

Further Exploration

In the interest of continuous improvement, developers participating in the above activities will gain new understanding of standards & best practices. However, learning does not stop at the meeting’s end. Many of the guiding principles for the Regression Analysis, Code Review, and Refactoring sessions are derived from seminal works in programming, and additional study is required to progress beyond static code analysis tools and team standards. Listed below are some recommended background reading materials on software craftsmanship:
