From the Pipeline v4.0

This entry is part 4 of 36 in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web. This week we’ll showcase articles on Jenkins upgrades, DevOps, Checking versus Testing, the Screenplay pattern, and an external post by yours truly.

Docker Images for Agents: New Names and What’s Next

Jenkins is moving away from the antiquated “slave” name in favor of “agent” because the former name was considered inappropriate. The Jenkins Docker images will also expand the availability of Windows images, add support for additional platforms, and provide multi-platform Docker images.

How to Decide Which Types of Test Cases to Automate

Perfecto has published part of my work on the strategy and tactics of test automation on their blog.

Making DevOps Evolution Happen

Helen Beal provides a high-level summary of the troubles surrounding the DevOps evolution that most companies are trying to overcome. It’s not simply renaming the build team to “DevOps”, but rather a slow and committed process of changing an organization’s culture.

Checking vs Testing

Jason Arbon presents his view (with a bit of humor) on the “checking” versus “testing” debate that has played out online and at conferences across the globe. In the article, he differentiates between automated regression testing and generative automated testing. For Jason, regression tests are the frequently repeated test scripts that validate existing application behavior, while generative automated testing analyzes the software specifications, implementation, and application itself to automatically generate test coverage.

Understanding Screenplay

Matt Wynne provides part 4 in his series on the Screenplay pattern for test automation. You can follow the trail back to part 1. The series is a great comparison piece for people who use the PageObject pattern for their application testing.

Three Team Activities to Improve the Quality of Automation

This is the second in a series of posts about the strategy and tactics of test automation; the first covered common challenges. Our team has experience working at multiple large firms with an enterprise-wide scope. Throughout our time working in IT, we have encountered challenges with existing test automation implementations and made several mistakes along the way. Our hope is to relay some valuable activities that build robustness into an automation suite so you can defeat the automation supervillains.

“I’m not a great programmer; I’m just a good programmer with great habits.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

The following is an overview of Regression Analysis, Code Reviews, and Refactoring Sessions for test automation. Just like any programmer, automation testers are developing an application; it just so happens that the application is designed to test other applications. Automation test suites accumulate technical debt like any other code base. Overly complicated scenarios, single-use steps, and data management miscues are just a few of the issues facing an automation test suite. The quality standards one would expect of the application delivered to stakeholders should also apply to the automation suite that tests it.

Activity One: Regression Analysis

Regression testing has many definitions depending on the source. These include a set of automated tests executed regularly, the 20% of tests that cover 80% of an application’s functionality, testing after an application undergoes some change, or any test executed in the past. Regression testing provides value by informing a team whether a change (new release, upgrade, patch, or hot fix) negatively impacts an application. Michael Bolton has previously offered that regression testing also helps us learn about the relationships between parts of the software, so we better understand where future changes might have an impact. One of the concerns surrounding regression testing is “what is the appropriate number of tests” or “how much test coverage” is needed to adequately observe the system. Regression testing is important, but so is performing new tests that extend coverage to features being developed. Plus, time and budget often limit how much testing can be done before the change is implemented. Teams therefore need a standard mechanism for selecting the tests included in regression. This is why a Regression Analysis meeting to add, modify, and remove regression tests is important for supporting those change events.

A Regression Analysis meeting determines (1) which tests associated with the release should become part of core regression and (2) which tests should be removed from the current core regression suite. All members of the team and the business representatives should understand the core regression to be the automated tests executed for any release, patch, or hot fix. The output of a Regression Analysis meeting is a regression suite that reflects the core functionality of the application, so that for any of those events the team has confidence the application will behave as expected.

Before the Regression Analysis meeting is held, whoever takes responsibility as quality lead for the application compiles a list of all new release scripts and all existing core regression scripts. That lead provides both lists to all expected participants ahead of time to give everyone an opportunity to review them. A representative of the business provides metrics on application usage broken down by feature. These can include the most used platforms, popular conversion paths, tracked application exit points, active A/B tests, and any other relevant details they believe the development team should know. The application manager or product owner provides a list of upcoming projects with high-level feature changes, to identify features that may be deprecated or modified in the next release. Lastly, a representative of production support (incident and/or service request) provides metrics on issues for the application’s most recent release and for the months prior to that release. The Regression Analysis meeting will therefore include at least the QA lead, business sponsor, application manager / product owner, and production support representative.

The purpose of having these four roles represented in the meeting is to make educated, evidence-based decisions about testing coverage and effort. Since testing is often limited by both time and budget, all stakeholders of an application should understand the risks of excluding or limiting testing activities for a given event (release, patch, hot fix). Helping those stakeholders understand the coverage of testing, the time involved, and the division of that work (manual and automated) for a given event aligns expectations with outcomes.

Regression Analysis should be conducted on a regular basis, matching the cadence of the release cycle if applicable. If teams are releasing daily, they should establish a working agreement to add new tests to the core regression by default. They should then hold the meeting at predefined intervals to remove any tests the stakeholders above determine are no longer necessary. This practice biases the team toward lower risk by including those tests, rather than allowing a coverage gap of weeks or months to build up before another review is held. During a Regression Analysis meeting, the participants review the individual tests from the release to be added to the core regression and determine which tests to remove based on the data points from the four representatives. This decision process can be left open-ended if all participants agree, or a checklist can be used to help decide what to include and exclude. It’s important that the meeting be held live rather than over email because, like many team ceremonies, it focuses the attendees on the subject at hand, which is key to establishing a shared understanding.
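If a team opts for a checklist, it can help to write the criteria down in a form everyone can run and review. The Ruby sketch below is one hypothetical way to do that. Each candidate test is tagged with the feature it covers. The inputs from the roles above then drive a keep/remove/discuss recommendation: usage share by feature from the business, features slated for deprecation from the product owner, and recent incident counts by feature from production support. The `Candidate` struct, the `recommend` method, the rule order, and the 5% usage threshold are all illustrative assumptions, not a standard. The meeting still makes the final call.

```ruby
# Hypothetical checklist for a Regression Analysis meeting.
# Field names, rule order, and thresholds are illustrative assumptions.
Candidate = Struct.new(:name, :feature, keyword_init: true)

# Recommend :keep, :remove, or :discuss for each candidate test.
#   usage_share - { feature => fraction of usage } from the business
#   deprecated  - features going away, from the product owner
#   incidents   - { feature => recent incident count } from production support
def recommend(candidates, usage_share:, deprecated:, incidents:, min_usage: 0.05)
  candidates.to_h do |candidate|
    feature = candidate.feature
    decision =
      if deprecated.include?(feature)
        :remove   # the feature is slated for deprecation
      elsif incidents.fetch(feature, 0).positive?
        :keep     # recent production issues: keep the coverage
      elsif usage_share.fetch(feature, 0.0) >= min_usage
        :keep     # heavily used feature
      else
        :discuss  # low usage and no incidents: bring it to the meeting
      end
    [candidate.name, decision]
  end
end
```

The ordering is a deliberate choice in this sketch. Deprecation is checked first, so a test for a feature that is going away is proposed for removal even if it once had incidents. A team could reasonably order the rules differently.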

Beyond updating the core regression suite to reflect the state of the application, the Regression Analysis meeting provides effort estimates for future releases and a list of risks and assumptions the team can use in their working agreements or Test Plans. It’s a powerful event for focusing a team on executing valuable tests rather than letting the regression suite become overgrown and inaccurate.

Activity Two: Code Reviews

“If you can get today’s work done today, but you do it in such a way that you can’t possibly get tomorrow’s work done tomorrow, then you lose.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Code Reviews are a best-practice development activity for catching mistakes early in the development lifecycle; they help ensure the team has “built the thing right”. Code review activities include peer review by a technical lead, pair programming with another developer, or a demonstration to a wider audience. A good practice is to run a static code analysis tool (e.g., Cuke Sniffer for Ruby Cucumber) in addition to participating in code reviews. To help ensure a feature has been tested using automation, the team should also conduct an informal walk-through of the feature under development before it’s promoted to higher environments.

Code Reviews conducted by a peer or a larger team should ensure that all requirements for the given feature under development have been met. Additionally, the feature should have all required traceability and follow all accepted team standards of development. These standards can vary significantly team-to-team, so it’s recommended any teams that cross-impact each other establish common standards. Otherwise code and projects that move across multiple teams will only be as strong as the weakest team practice. Most importantly, the automation scripts should actually execute on a regular basis and meet the expectations of pass/fail consistently. At times during the software development lifecycle (SDLC), features provided by the development team aren’t ready for test automation or data is not available. These external factors should be taken into account during a code review, so expectations for pass / fail are met. Any automation script that fails due to outside circumstances is worth noting for review at a later date. Overall, the team should look for the following during a code review:

  • All possible automation scripts for the feature are indeed scripted
  • The automation scripts are understandable by the entire team
  • The automation scripts do not duplicate effort already present
  • All required environmental, UI-locator, services, and data needs are addressed
  • The Features and Scenarios best represent the state of the application (living documentation)
  • All agreed team and enterprise standards & practices are followed (traceability, compatibility, formatting, etc.)

The general guidelines above recommended a static code analysis tool to support team standards programmatically. The advantage of such a tool is that it can be run frequently to assess the current state of the codebase in a consistent manner. For instance, “Cuke Sniffer” is a Ruby gem used to find “broken windows” in a Ruby Cucumber project. Running it against a project produces a list of issues and recommended improvements. Each issue is assigned a score, with more important issues receiving higher scores, and the combined scores form the project’s overall score. The higher the score, the more improvements the project needs. The tool also allows each team to adjust the standard set of rules to address the specific needs of an application. Tracking the score over time provides telemetry on one aspect of the quality of the test automation as features are added to the application under test. In addition to the code review guidelines listed above, here are some specific “broken windows” to catch:

  • Tests without names or descriptions
  • Tests lacking traceability back to the original requirements
  • Overly long test descriptions
  • Imperative-style Gherkin steps that focus on UI mechanics rather than describing behavior in a declarative style
  • Empty files with no tests
  • Features with too many scenarios
  • Hard-coded data (the data may work now but not in the future)
  • Tests that use “selfish data” (data that is used once and then is no longer valid)
  • Tests that use “toxic data” (data that represents a security risk, especially if that data is pulled from production without sanitization)
  • Tests that never fail (this is an often-overlooked issue. If the application is unavailable and the test still passes, then you don’t have a test)
  • . . . the list goes on and on.
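To make the idea of scored “broken windows” concrete, here is a deliberately tiny, hypothetical checker in Ruby. It is not Cuke Sniffer and does not use its API. It only scans Gherkin text for three of the issues listed above (empty files, unnamed scenarios, and features with too many scenarios) and sums a per-rule score, mirroring the “higher score means more to fix” idea. The rule names, weights, and the default limit of 10 scenarios are assumptions.

```ruby
# A toy static check for Gherkin feature text. Rule weights are assumptions.
RULES = {
  empty_file:         10, # file contains no scenarios at all
  unnamed_scenario:    5, # "Scenario:" with nothing after the colon
  too_many_scenarios:  3  # feature exceeds the agreed scenario limit
}.freeze

# Returns a list of [rule, detail] findings for one feature file's text.
def find_broken_windows(text, max_scenarios: 10)
  findings = []
  scenario_lines = text.lines.map(&:strip).select do |line|
    line.start_with?('Scenario:', 'Scenario Outline:')
  end

  findings << [:empty_file, 'no scenarios found'] if scenario_lines.empty?

  scenario_lines.each do |line|
    name = line.split(':', 2).last.strip
    findings << [:unnamed_scenario, line] if name.empty?
  end

  if scenario_lines.size > max_scenarios
    findings << [:too_many_scenarios, "#{scenario_lines.size} scenarios"]
  end
  findings
end

# Overall score for a set of findings: higher means more improvements needed.
def score(findings)
  findings.sum { |rule, _detail| RULES.fetch(rule) }
end
```

Running the check on every commit and recording the score gives the kind of trend line described above. A real tool covers far more rules and lets the team tune them.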

Many of the above listed issues have been encountered by experienced automation developers. It’s incumbent on those individuals to support newer developers in identifying issues and educating colleagues across their organization on practices that avoid these common mistakes. Code Reviews are an effective early detection mechanism and the collaborative nature of the activity between developers helps build technical ability.

Activity Three: Refactoring Sessions

“Whenever I have to think to understand what the code is doing, I ask myself if I can refactor the code to make that understanding more immediately apparent.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Code refactoring is an activity to improve existing code without changing its external behavior. The advantages include improved code readability and reduced complexity, which can improve code maintainability and create more expressive features or improve extensibility.

Refactoring is often motivated by noticing a “code smell”. Once a code smell has been identified, the feature can be addressed by refactoring the code or even transforming it, so the feature behaves the same as before but no longer “smells”. There are two main benefits to refactoring:

  • Maintainability. Easy-to-read code is easier to fix, and its intent is self-apparent. One example is breaking overly long and complex methods into concise, single-purpose methods. Other examples include moving a method to a more appropriate class or removing poor comments.
  • Extensibility. It’s easier to extend the automation suite if the appropriate (and agreed upon) design patterns are followed, and it provides flexibility to write more automation scripts without adding support code.
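As a small illustration of the maintainability point, the Ruby sketch below shows one long, hypothetical helper that builds a test user. It then shows the same behavior split into concise, single-purpose methods (the “Extract Method” refactoring). The method names and data are invented for the example. What matters is that the external behavior, meaning the returned hash and the error raised for a bad email, stays the same.

```ruby
# Before: one method normalizes the name, validates the email, and builds the record.
def build_test_user_before(name, email)
  name = name.strip.split.map(&:capitalize).join(' ')
  email = email.strip.downcase
  raise ArgumentError, 'email must contain @' unless email.include?('@')
  { name: name, email: email, role: 'customer' }
end

# After: the same behavior, with each step extracted into a named method.
def build_test_user(name, email)
  { name: normalize_name(name), email: validated_email(email), role: 'customer' }
end

# Collapses extra whitespace and capitalizes each word.
def normalize_name(name)
  name.strip.split.map(&:capitalize).join(' ')
end

# Lowercases the address and rejects anything without an @.
def validated_email(email)
  normalized = email.strip.downcase
  raise ArgumentError, 'email must contain @' unless normalized.include?('@')
  normalized
end
```

Each extracted method now has a name that states its intent, so a reader of `build_test_user` no longer has to decode the string manipulation to understand it.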

Refactoring should be conducted regularly and with specific goals in mind; many small changes can add up to a larger-scale change. A set of guiding principles helps a team treat refactoring as part of the development process rather than an exception-based or occasional activity. Static code analysis tools can be used to supplement the following guiding principles:

  • Duplication. A violation of the “Don’t Repeat Yourself” (DRY) principle.
  • Nonorthogonal Design. Code or a design choice that could be made more orthogonal. In an orthogonal design, the scenarios, data management, methods, classes, etc. in an automation suite are independent of each other, so a change to one does not ripple into the others.
  • Outdated Knowledge. Applications can change frequently, and requirements tend to shift during the course of a project. As the team’s knowledge of the application improves over time, code written with an earlier understanding becomes the source of many code smells. The automation suite should represent living documentation, reflecting the current state of the application under test.
  • Performance. Automation scripts should execute quickly and often. Added wait times and long scenario setup should be minimized to improve performance. Explicit wait times (fixed, hard-coded pauses), flaky scenarios, and overly long scenarios slow the feedback loop for automation results. Poor performance of automation scripts is exposed when the development team uses a CI/CD pipeline to deploy frequently and the automated tests become the bottleneck to a successful build.
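On the performance point, a common fix for fixed pauses (such as `sleep 5`) is to poll for the condition the test actually cares about and move on as soon as it holds. The helper below is a minimal, framework-agnostic sketch of that idea in plain Ruby. `wait_until`, `WaitTimeoutError`, and the `timeout`/`interval` defaults are hypothetical names and values. Browser-automation libraries such as Selenium WebDriver provide their own wait utilities, which would normally be used for UI checks.

```ruby
# Raised when the condition never becomes truthy within the timeout.
class WaitTimeoutError < StandardError; end

# Poll a condition instead of sleeping for a fixed time.
# Returns the block's truthy value as soon as it appears, or raises
# WaitTimeoutError after `timeout` seconds. Defaults are illustrative.
def wait_until(timeout: 10, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise WaitTimeoutError, "condition not met within #{timeout}s"
    end
    sleep interval
  end
end
```

A call such as `wait_until(timeout: 5) { order_status == 'SHIPPED' }` (where `order_status` is a hypothetical lookup) returns as soon as the status changes, instead of always paying the full five seconds. The monotonic clock keeps the deadline correct even if the system clock is adjusted mid-test.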

“I’ve found that refactoring helps me write fast software. It slows the software in the short term while I’m refactoring, but it makes the software easier to tune during optimization. I end up well ahead.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Similar to Code Reviews, every team should hold Refactoring Sessions on a recurring basis. In each session, the team should follow a set of standards enforced by a static code analysis tool and their working agreements. These standards are in addition to any existing federated standards for the enterprise. The sessions should be led by a member of each team and supported by an automation developer from outside the team for peer review. The reason for outside assistance is to provide a fresh viewpoint on the state of the automation suite: if the code is not self-documenting, that person should be able to raise concerns. Think of the external representative as another form of Code Review in support of quality.

The refactoring sessions should start at approximately one hour per week and be focused on active project work. This establishes a baseline expectation for the team AND makes the activity “billable” work if time tracking is a concern. To provide guardrails that help the team determine the focus of a given session, there are a few recommendations: (1) use a static code analysis tool to identify problem areas, (2) use the daily Regression/Release execution reports, (3) select a feature being actively developed, and (4) use telemetry on execution performance (speed and consistency of test execution). The following describes the roles and responsibilities during a refactoring session.

The Team Leader is responsible for scheduling the weekly sessions and ensuring attendance by the team for the application under test. The Team Leader can choose to focus on one area or multiple areas, time permitting. Responsibility for the topic belongs to the Team Leader, but they may choose to rotate topic selection among other members of the team to support collective ownership. The Leader can select from multiple topic areas during a session; these provide the so-called guardrails, so the team stays within scope and has a fresh topic each session. The topic areas are:

  • Static Code Analysis Report
    • Review the rules enforced by the team in the static code analysis tool, then run a fresh report. Use the report to address items in the improvement list (top-down or bottom-up), remove dead steps, improve features and scenarios, refactor step definitions, or refactor hooks. The team can also update the static code analysis rules at this time, adjusting enforcement and scoring. The execution history should be captured to provide telemetry on the state of the automated suite.
  • Active Work
    • Select a feature from the current or previous cycle, then execute its scripts in the appropriate test environment. The team should ensure the feature has the required traceability, proper formatting, and follows all coding standards. Next, the team should ensure all associated data is properly included for successful test execution, and confirm the feature is not duplicating existing work. After any updates to the existing test cases, the team identifies technical debt and assigns action items for after the session (to add or update any test cases they feel are necessary to fulfill the functional and non-functional requirements for the feature). Lastly, the team re-executes the feature to confirm expectations of pass / fail.
  • Daily Release / Regression
    • The Team Leader will select a feature containing regression scenarios, and the team executes its scripts in the highest test environment. The team will identify any regression scripts they feel are no longer relevant to the core functionality of the application and tag those for Regression Analysis as an action item. The team should ensure the feature has traceability, proper formatting, and follows all coding standards. Any scenarios that depend on one another to succeed need to be decoupled, and any duplicated functionality in the regression should be removed. Lastly, the team will re-execute the selected release / regression scripts to confirm expectations of pass / fail.
  • Execution Performance
    • The Team Leader opens multiple recent CI executions and reviews the results with the team with a focus on performance. The goal is a root cause analysis to determine whether the scripts suffer because of (1) application performance, (2) the test environment, (3) data issues, (4) automation timing issues such as explicit waits, or (5) a change in expected functionality. Flaky tests should be removed from regular execution until the underlying issue(s) are addressed. Explicit (fixed) wait times should be eliminated to improve execution time; instead, use waits that proceed as soon as the application service or UI is available. Additionally, the team should establish failure criteria in the automated tests for response times that exceed a threshold. After addressing the issue(s), the CI job should be re-executed and the project tracking tool updated if needed.
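For the Execution Performance topic, a first pass over recent CI results can be scripted before the session. The Ruby sketch below assumes the results have already been exported as simple records (test name, pass/fail, duration in seconds). That `Result` shape, the `triage` method, and the 30-second default threshold are hypothetical. The script flags tests that both passed and failed across the runs as flaky candidates to pull from regular execution. It also flags tests whose slowest run exceeded the agreed threshold.

```ruby
# One row per test per CI run. This record shape is an assumption.
Result = Struct.new(:test, :passed, :seconds, keyword_init: true)

# Returns { flaky: [...], slow: [...] } across a set of recent runs.
def triage(results, slow_threshold: 30.0)
  by_test = results.group_by(&:test)

  # Mixed outcomes for the same test across runs suggest flakiness.
  flaky = by_test.select { |_name, rows| rows.map(&:passed).uniq.size > 1 }.keys

  # A test counts as slow if any of its runs exceeded the threshold.
  slow = by_test.select { |_name, rows| rows.map(&:seconds).max > slow_threshold }.keys

  { flaky: flaky.sort, slow: slow.sort }
end
```

The output is only a starting list for the root cause analysis. A test can look flaky because of the environment or data rather than the script itself, which is exactly what the session is meant to sort out.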

The Automation Guide is responsible for reporting the session outcome to the entire development team and tracking results in a location accessible to the organization at large. The purpose of this changelog is to demonstrate improvement over time. Information tracked will include the features addressed in the session, the reason for review or refactoring, and the outcome. Recurring problem areas can be handled in one of two ways. If the root cause is the automation, they can be incorporated into team and personal development goals. If the root cause is development or requirements, they can be reported to the application development team.

The Automation Guide also serves as the technical oracle for the team during the session. When there are questions about implementation or upholding automation standards, the guide acts as the point of contact for solving those problems, and is responsible for follow-up if an issue cannot be addressed in one session. The Automation Guide plays a support role and should allow the team to select the features and problem areas of focus.


“Functional tests are a different animal. They are written to ensure the software as a whole works. They provide quality assurance to the customer and don’t care about programmer productivity. They should be developed by a different team, one who delights in finding bugs.”

– Martin Fowler, Refactoring: Improving the Design of Existing Code

Regression Analysis, Code Reviews, and Refactoring Sessions help build quality into a test automation suite and, by extension, the application under test. Regression Analysis helps align business partners with their development teams to establish a shared understanding of the application. Code Reviews help ensure the team has “built the thing right” by catching mistakes early in the development process. Refactoring improves existing code without changing its external behavior by increasing code readability and reducing complexity. It’s not enough for a team to just say they’ll commit to regression analysis, code reviews, or refactoring; building rigor around these activities and making them habitual helps bias a team toward long-term success.

Further Exploration

In the interest of continuous improvement, developers participating in the above activities will gain a new understanding of standards and best practices. However, learning does not stop at the meeting’s end. Many of the guiding principles for the Regression Analysis, Code Reviews, and Refactoring Sessions are derived from seminal works in programming, and additional study is required to progress beyond static code analysis tools and team standards. Listed below are some recommended background reading materials on software craftsmanship:

Retrieve Fantasy Football Stats using ESPN’s API: Introduction

For the past several years I have been passionate about making things easier in the automation world by taking advantage of APIs. I think many people who don’t have experience working with web services can feel intimidated by them, and might be looking for a good excuse to practice with them. A couple of years ago I found a post explaining how to connect to ESPN’s “hidden” API using Python. I’m a huge fantasy football nut, and since I work with Ruby so much, I decided to build my own project that would connect to ESPN and extract various data for my fantasy football league.


In this post we will mainly be using the Ruby rest-client gem to send GET requests to the API, and then pulling data out of the JSON that we receive back. The main purpose is to show you how to pull ESPN data, but we will look at this from a learning perspective and highlight practices that we can use when working with any web service. We’ll build out several classes that interact with different pieces of data and organize our code in a way that makes sense. First, let’s give a little background on fantasy football and why this is fun data to pull. Even if you don’t care about fantasy football, hopefully this post will still provide some useful information for you to learn from.

For the uninitiated, fantasy football is when a group of degenerates pit their imaginary football teams against each other in weekly matchups. Everyone gets to draft real players to fill out their rosters, set their lineups, make trades, pick up free agents, and much more. Points are awarded based on stats such as yards gained and touchdowns. Many fantasy football platforms supply you with lots of good data, but we don’t get the raw data to play around with and analyze. We could just use Selenium to scrape data off the site, but websites are subject to change and APIs tend to be much more stable.

Note: From here on, I will assume that you have a valid installation of Ruby and RubyMine. For instructions on this, see Josh’s previous blog post and stop at “Install Appium”.

So let’s get started building our new project. We’re going to begin with a new Ruby file called main.rb. We’ll also want to create a Gemfile to bring in the necessary libraries. As mentioned before, the only gem we’ll need for now is rest-client. In my environment, I was also receiving an error for the FFI gem, so we’ll specify a version for that as well.

Gemfile for our project.
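The Gemfile itself appeared as an image in the original post. A minimal Gemfile along the lines described would look something like the sketch below. The exact FFI version pinned in the post is not recoverable here, so the version line is left as a placeholder to fill in with whichever version resolves the error in your environment.

```ruby
# Gemfile (a sketch; the FFI version from the original post is not known)
source 'https://rubygems.org'

gem 'rest-client'

# Pinned in the post to work around an install error; replace the
# placeholder with a version that works in your environment, e.g.
#   gem 'ffi', '~> x.y'
gem 'ffi'
```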

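Go ahead and run bundle install if you don’t already have these gems (Tools >> Bundler >> Install in RubyMine). Then we’ll want to require our libraries at the top of main.rb (json is part of Ruby’s standard library, so it doesn’t need to be in the Gemfile):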

require 'rest-client'
require 'json'

If you’re following along for your own fantasy league, you may need to pause here. For those of you with private leagues, you will need to go into your browser to retrieve some cookies. Instructions for this can be found here. Those with public leagues can skip that step. This is a good time to point out that oftentimes the hardest part of accessing an API is authentication. Web services use a wide variety of authentication methods, and it is important to keep in mind that simply getting hooked up might take more time than you may think.

The other piece of data we’ll need to get started is our league ID. This ID can be retrieved if you go to your league page in ESPN and look at the URL:

Fantasy League ID
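If you’d rather not copy the ID by hand, it can be pulled out of the league page URL. The sketch below assumes the ID appears as a `leagueId` query parameter (something like `.../league?leagueId=123456`). Check that against what your browser actually shows. `league_id_from` is a hypothetical helper built on Ruby’s standard `uri` library.

```ruby
require 'uri'

# Extract the leagueId query parameter from a league page URL.
# Assumes the ID is passed as ?leagueId=...; returns nil if it is absent.
def league_id_from(url)
  query = URI.parse(url).query
  return nil if query.nil?

  URI.decode_www_form(query).to_h['leagueId']
end
```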

Let’s go ahead and set our league ID to a global variable at the top of our file, since we’ll need to use that variable across multiple files. If you have a private league, let’s assign the S2 value and SWID in the same place. This is lazy and generally bad practice, but we’ll make sure to come back later and move those variables somewhere more appropriate. Our file should now look something like this:

Our main.rb file so far.

Now we can test our API call. As a general rule, we want to wrap our rest-client calls inside a begin/rescue block. This is because rest-client raises an exception when a call fails, and an unhandled exception will crash our whole suite. This is usually not the desired behavior, because we will either (a) want the test to try again, or (b) want to do something useful with the failure message so we can see what the issue is. Our rest-client call is going to need a URL, a method, and some headers if we are accessing a private league.

Since we are only retrieving data, our method will be a GET. To test our connection, we can use the following URL: {$league_id}?view=mMatchup&view=mMatchupScore&scoringPeriodId=1

In this URL, “seasons/2019/” specifies that we want to look at the 2019 season. Then we specify our league ID, and the “scoringPeriodId=1” query parameter tells the API to pull the data for week 1 of the season. For now, let’s assign this value to a variable called “url”.
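Because `view` appears twice, the query string is easiest to build from an array of pairs rather than a Hash, which cannot hold a duplicate key. The sketch below uses Ruby’s standard `URI.encode_www_form` to produce the same query as the URL above, with the week as a parameter. `matchup_url` is a hypothetical helper, and `base_url` stands in for the endpoint portion of the URL that precedes the league ID, which you would supply.

```ruby
require 'uri'

# Build the matchup URL for a given week. base_url is whatever endpoint
# precedes the league ID (supplied by the caller); this helper is hypothetical.
def matchup_url(base_url, league_id, week)
  query = URI.encode_www_form([
    %w[view mMatchup],          # repeated key: an array of pairs keeps both
    %w[view mMatchupScore],
    ['scoringPeriodId', week]   # week 1 = first scoring period
  ])
  "#{base_url}#{league_id}?#{query}"
end
```

Parameterizing the week this way makes it easy to loop over a whole season later instead of editing the URL string by hand.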

We will get into the API endpoints as we go forward, but this is the main one that we’ll be working with for now. If you are using a private league, we can assign our headers value to a variable as well. You don’t need to specify headers if you are using a public league. Our headers will look like this:

headers = {
  swid: $swid,
  espn_s2: $s2
}

Our rest-client request will look like this:

Our rest-client request.
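The request itself appeared as an image in the original post. As a hedged sketch of the begin/rescue idea (not the post’s exact code), a small wrapper can run any request and print a short “Request failed” message instead of letting the exception crash the script. `safe_request` is a hypothetical helper. The rest-client call in the usage comment uses `RestClient::Request.execute`, rest-client’s general-purpose request method, with the `url` and `headers` variables defined above.

```ruby
# Run a request block; on any error, print a short message and return nil
# instead of letting the exception crash the whole script.
# Usage with rest-client (url and headers defined as above):
#   response = safe_request('week 1 matchups') do
#     RestClient::Request.execute(method: :get, url: url, headers: headers)
#   end
def safe_request(description)
  yield
rescue StandardError => e
  # The method body acts as an implicit begin block. rest-client raises
  # for HTTP error statuses and for network failures alike.
  warn "Request failed (#{description}): #{e.class}: #{e.message}"
  nil
end
```

Returning `nil` keeps the calling code simple (`return unless response`), and the printed exception class and message make it easier to tell a bad league ID or expired cookie from a network problem.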

Here we can point out a few good practices that we’ve implemented for this basic action that will make our lives easier as our scripts get larger. We mentioned that it’s helpful to wrap our requests in begin/rescue blocks. The above code will give us a much cleaner failure than if we let the program output the failure on its own. Also, our call is nice and clean because we have variables defined for the URL and headers.

Go ahead and execute your code. If your console doesn’t show any text, then congratulations! Your call was successful. If our “Request failed” text is displaying, then you may need to go back and verify your league ID or ESPN cookies.

Now let’s explore what we have in our successful response. We have a large JSON block stored inside a RestClient::Response object. Here we can use our JSON library that we required earlier to parse this data into a Hash that we can more easily read.

Debugger view of response variable

We can perform this action and assign the hash to a variable with the code:

Convert JSON to Hash
Data Hash
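In code, the conversion is a single call to `JSON.parse` on the response body. In the sketch below, a short literal string stands in for `response.body` so the example runs on its own. Its keys and values are made up and far smaller than the real payload.

```ruby
require 'json'

# Stand-in for response.body from the rest-client call; contents are hypothetical.
body = '{"id": 12345, "scoringPeriodId": 1, "teams": [{"id": 1, "abbrev": "ABC"}]}'

# JSON.parse turns the JSON text into Ruby Hashes and Arrays with String keys.
data = JSON.parse(body)

puts data.keys.inspect              # prints ["id", "scoringPeriodId", "teams"]
puts data['teams'].first['abbrev']  # prints ABC
```

Note that the keys come back as Strings, so lookups use `data['teams']` rather than `data[:teams]`.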

It looks like we’ve received quite a bit of data back! For simply pulling stats, we aren’t going to need most of this at the moment. We can see that we have pulled some league data, some scheduling data, our league ID and scoring period, and other various data. The key that we’re going to be concerned with for now is Teams. When we start to explore this entry, we’re going to be hit in the face with a pretty deep hash:
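Deeply nested hashes like this are easier to work with through `Hash#dig`. It walks a chain of keys and Array indexes, and returns nil instead of raising if any level is missing. The structure below is a hypothetical, heavily trimmed stand-in. The real key names under Teams may differ, so inspect your own response before relying on any path.

```ruby
# Hypothetical, trimmed-down nesting; real key names may differ.
data = {
  'teams' => [
    { 'id' => 1,
      'roster' => { 'entries' => [{ 'playerId' => 101 }, { 'playerId' => 202 }] } }
  ]
}

# dig follows keys and array indexes in order, returning nil if a level is missing.
first_player = data.dig('teams', 0, 'roster', 'entries', 0, 'playerId')
missing      = data.dig('teams', 5, 'roster') # nil: there is no sixth team

# Collect every player ID on every team, tolerating teams without a roster.
player_ids = data['teams'].flat_map do |team|
  (team.dig('roster', 'entries') || []).map { |entry| entry['playerId'] }
end
```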

Now we get to the fun part! In the interest of added suspense, we’re going to end this post here, before we dive into parsing out and organizing our data. If you don’t want to wait on me, the Python blog post mentioned at the top should have enough information for you to continue on your own. Let’s review what we covered so far:

  1. ESPN has a semi-hidden API that we can use to pull data from.
  2. We can easily use our rest-client gem to pull data cleanly.
  3. Global variables for data are usually bad and we still need to address how we are storing our data before it starts to pile up.
  4. We should typically be wrapping our API requests in begin/rescue blocks in order to better handle potential errors.
  5. JSON responses can be easily converted into Hash objects in order to make them more usable.

It may not feel like we’ve accomplished much so far, but we are well on our way to pulling lots of useful data that we can have some fun with. Look for Part Two soon!

From the Pipeline v3.0

This entry is part 3 of 36 in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web. This week we’ll showcase articles on CI/CD Pipelines, Continuous Testing, the Spotify Model, Unit Tests, and a Webinar series on Automation.

Announcing General Availability of YAML CD features in Azure Pipelines

For those of you working in Azure DevOps, Microsoft recently made an update to their Pipelines feature to help support CI/CD. Entire CI/CD workflows can be defined in a YAML file and be versioned with the rest of the code.

How to Create An Automated Test Strategy + Plan

Great article by Perfecto that provides a high-level view of crafting an automated test strategy. Nearly every software company is aiming for CI/CD or maximizing the efficiency of their existing CI/CD. The article provides those steps, from value stream mapping the pipeline to building flexibility into the testing platform. There are plenty of solid references in the article as well for those looking to learn more about automated testing in general and continuous testing in particular.

Failed Squad Goals

A wonderful look at the Spotify Model by Jeremiah Lee from his time at the company. The Spotify Model is revealed to be more aspirational than actual, with the company struggling with everything from managing growth to team collaboration. As someone who has previously used the Spotify Health Check Model for teams, I’m fascinated by this look into Spotify and feedback from people who actually worked there.

Unit Tests Are Tests of Modularity

Michael Feathers posts a fascinating article that questions the size of a unit test. He posits that a unit test can cover a class, a function, or a cluster of either, so long as it’s something “small” that is a unit of the application under test. The unit test should align with and enforce modularity and encapsulation. I think his views offer a smart philosophy for approaching code – if you are having difficulty writing tests, that’s a good indication the code could be more modular so you can see the distinct pieces.

The Summer of Learning

BrowserStack established a free “Summer of Learning” webinar series for people interested in automated testing of web and mobile applications. Recently David Burns joined the BrowserStack team. David is a core contributor to Selenium and was previously responsible for GeckoDriver (Firefox) while working at Mozilla. This webinar series is a great idea to uplift skills while most of us are working from home.

Episode 1 — The Basics: Getting started with Selenium: An introduction to Selenium, how to set up/write your first test scripts, and how to pick the right framework. This is a great introductory session for those looking to learn test automation in 60 minutes.

Episode 2 — Introduction to BrowserStack Automate: In this episode, you’ll learn how to set up and run your first test with Automate, how to test on various real devices and browsers on the BrowserStack Real Device cloud, how to test your local instance on the cloud, and how to collaborate and debug better.

Episode 3 — Continuous testing at scale: You’ll learn how to build an efficient, well-integrated CI pipeline that helps release quality software at speed. You’ll also learn how to use BrowserStack to deploy faster and listen to stories from great companies like The Weather Channel, who release to millions of users every day.

Episode 4 — Selenium + BrowserStack at scale: In Episode 4, David Burns, a core contributor to Selenium, will explain how to plan parallelization more effectively to achieve faster build times, the best ways to maintain test hygiene while scaling your team or automation suite, and how to monitor test feedback effectively.

Episode 5 — Testing for a mobile-first market: There are 9,000 distinct mobile devices in the market—and you most definitely can’t test on them all. But with this episode, you’ll learn the best strategy to pick the right devices for testing your website or mobile app.

Cukes and Apples: App Automation with Ruby and Appium

(Part One)

This post will be the first of a series that demonstrates how to build robust mobile test automation using Ruby, Cucumber, and Appium. The initial implementation is relatively simple – a good place to start, but not mature. Over this series, we will upgrade this Ruby Cucumber test suite to add capability and improve ease of use. This post will focus on introducing tools and setting up our project.

Workspace Setup

I’m using a Windows PC and Android phone at this time. Some things I write will be specific to that platform configuration and would differ for users of Mac and iOS. I will point out the differences where I can but focus primarily on Android implementation. I hope to include much more information about Mac and iOS in a future entry about cross-platform support.

Install Ruby

As a Windows user, I use RubyInstaller to install Ruby. For this series, I selected Ruby 2.6. In this case, as I usually do, I chose the recommended version (as shown below) for greatest gem compatibility.

Download RubyInstaller from the following location and install with default options selected:

If you are a Mac user, you have Ruby installed by default, but an older version. You can install a newer version of Ruby with a version manager like rbenv or rvm. I have used rbenv and I recommend it. Check out the installation instructions here:

Install RubyMine

You don’t need RubyMine, but I do recommend it. I use it myself, so my examples will show it. I’m using RubyMine 2020.1.

Download RubyMine from the following location and install with default options selected:

Install Appium

If you download Appium from the Appium website, you get Appium Desktop, which augments Appium with a graphical interface and tools for inspecting elements in mobile apps. I’m using Appium Desktop 1.15.1 (exe).

Download Appium from the following location and install with default options selected:

The install process is very straightforward for Appium Desktop, but the Getting Started page in the official Appium documentation explains how you can also install Appium (without the GUI) using NPM.

Set Up a Device

We can test our Appium installation against a device or a simulator – for this post, I will be using a real Android phone. If you are using Mac and iOS, I recommend starting with an iOS simulator.

To allow automation on an Android device, you must allow USB debugging in the developer options. The process of enabling developer options varies for different phones, so you will need to find documentation specific to your phone. For mine, I had to launch the Settings app and tap on the Build Number seven times. The following messages were displayed while tapping and afterward.

Once developer options are enabled, you should find a switch for USB debugging under Developer Options. Toggle that on.

Be mindful of popups that appear when your device is connected to a computer. Requests for access, like the one below, can prevent Appium from controlling the device.

Install the Android SDK

The Android SDK includes adb.exe, which allows us to query for device names and control connected Android devices. To acquire the Android SDK, we must install Android Studio.

Download Android Studio from the following location and install with default options selected, or choose custom installation to install device emulators with the Android Virtual Device option.

After installing the Android SDK, you will need to add it to your system path. Create an ANDROID_HOME environment variable for the SDK install location.

And then add the following two directories to your Path variable.

With the Android SDK installed, it should be possible to retrieve the name of your connected device.
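The usual command for this is `adb devices`, which prints a "List of devices attached" header followed by one `serial<TAB>state` line per device. If you would rather pick the name up from Ruby (for the deviceName capability later), a small parser might look like this. `connected_devices` is a hypothetical helper, and the serials in the sample output are made up:

```ruby
# Parse the text printed by `adb devices` into the serials that are ready for
# automation (state "device"). A device listed as "unauthorized" usually means
# the USB-debugging prompt on the phone has not been accepted yet.
def connected_devices(adb_output)
  adb_output.lines
            .map(&:strip)
            .reject { |line| line.empty? || line.start_with?('List of devices') }
            .map { |line| line.split("\t") }
            .select { |_serial, state| state == 'device' }
            .map(&:first)
end

sample = "List of devices attached\nR58M123ABCD\tdevice\nemulator-5554\tunauthorized\n\n"
puts connected_devices(sample).inspect # ["R58M123ABCD"]

# Against a real machine (requires adb on your Path):
#   puts connected_devices(`adb devices`).inspect
```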

Check Appium

We can verify our device setup and Appium configuration by using Appium to launch an app on a device.

When you first launch Appium, you should be able to select Start Server without making any other configuration changes. After the Start Server button is selected, Appium displays a message that confirms the server is running.

Create a New Project

Create an empty Ruby project and add a Gemfile which includes both the cucumber and appium_lib gems.

Use Bundler to install cucumber and appium_lib.
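A minimal Gemfile for this setup might look like the following. Running `bundle install` from the project root reads it and records the resolved versions in Gemfile.lock. No versions are pinned here because the post does not specify any; pinning them (for example with `~>`) is worth considering once the suite stabilizes.

```ruby
# Gemfile
source 'https://rubygems.org'

gem 'cucumber'
gem 'appium_lib'
```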

Note: This installation process may produce specific errors later. If you see LoadErrors with the message “Cannot load such file” and references to ffi or eventmachine, I recommend uninstalling the offending gem, then reinstalling it with the platform argument.

ex. “gem install eventmachine --platform ruby”

A LoadError, as mentioned above

Cucumber Directories

Create the following directories for a Cucumber test suite, and create a Ruby file named env.rb in features/support

  • features/gherkin
  • features/step_definitions
  • features/support
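If you prefer to script this step, a short sketch using FileUtils from Ruby's standard library creates the same layout. The helper name `scaffold_cucumber` is hypothetical:

```ruby
require 'fileutils'

CUCUMBER_DIRS = %w[features/gherkin features/step_definitions features/support].freeze

# Create the Cucumber directories under project_root plus an empty env.rb.
# mkdir_p and touch are safe to re-run: existing directories and file contents are kept.
def scaffold_cucumber(project_root)
  CUCUMBER_DIRS.each { |dir| FileUtils.mkdir_p(File.join(project_root, dir)) }
  FileUtils.touch(File.join(project_root, 'features/support/env.rb'))
end

# Usage, from the project root:
#   scaffold_cucumber(Dir.pwd)
```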

Bringing It All Together

Open up env.rb. We are going to use it to require the Appium gem ‘appium_lib’, and then write a simple script to prove our workspace setup was successful.

The screenshot above features all of the code we need to verify that Ruby, Cucumber, and Appium are all cooperating. When Cucumber starts, env.rb will be executed, an Appium driver will be created, and an app will be launched.

Where did the capabilities information come from? The value associated with deviceName was identified with adb. The appPackage and appActivity are both references to the Google Play Store app, but any app will do.
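Since the code itself is only in the screenshot, here is a sketch of what env.rb might contain. It assumes appium_lib's `Appium::Driver.new(opts, global_driver)` and `start_driver` (as found in recent appium_lib releases), an Appium server on its default local address, and the commonly cited Play Store activity name. The device name is a placeholder for whatever adb reported. The `RUN_APPIUM` environment-variable guard and the lazy `require` are my additions so the capabilities can be checked without a phone or the gem, and `android_caps`/`launch_app` are hypothetical helper names:

```ruby
# features/support/env.rb (sketch)

# Capabilities for an Android device. Replace device_name with the serial
# reported by `adb devices`. Defaults point at the Google Play Store app.
def android_caps(device_name:,
                 app_package: 'com.android.vending',
                 app_activity: 'com.google.android.finsky.activities.MainActivity')
  {
    caps: {
      platformName: 'Android',
      deviceName: device_name,
      appPackage: app_package,
      appActivity: app_activity
    }
  }
end

# Create the Appium driver and launch the app described by the capabilities.
# appium_lib is required here, rather than at the top of the file, only so the
# helper above can be exercised on a machine without the gem installed.
def launch_app(opts)
  require 'appium_lib'
  driver = Appium::Driver.new(opts, true) # true: also register it as the global $driver
  driver.start_driver
  driver
end

# Hypothetical switch: only contact the Appium server when RUN_APPIUM is set.
launch_app(android_caps(device_name: 'YOUR_DEVICE_NAME')) if ENV['RUN_APPIUM']
```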

Mac and iOS users should skip the appPackage and appActivity capabilities and use bundleId instead. See this documentation:

Did It Work?

As shown in the screenshots below, the Cucumber process executed successfully (with no scenarios) and the Google Play app was launched on my phone.

Coming Up Next

There is more work required before our mobile test suite will be functional, and a lot more before it’s mature. Today, we covered workspace setup. In the future, we hope to deliver some or all of the following topics:

  • Full integration of Appium and Cucumber – managing the driver and capabilities across tests, writing steps for mobile automation
  • Implementing the Page Object pattern – building mobile page objects to organize information and behavior
  • Cross-platform mobile automation – creating flexible execution mechanisms, page objects that cover multiple platforms, tags for platform-specific execution


From the Pipeline v2.0

This entry is part 2 of 36 in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web. This week we’ll showcase articles on Robotic Process Automation, Source Code management, Mobile testing, Automation as Documentation, and a free virtual conference.

3 Steps for Deploying Robotic Process Automation

Jeff Machols outlines a high-level approach to adopting Robotic Process Automation (RPA). RPA has become a hot tech topic in the last few years, with many companies adopting an RPA solution to automate business processes. The article also links to several additional articles about RPA that are useful for anyone looking to learn about the subject.

Patterns for Managing Source Code Branches

Martin Fowler is back with another gem about source code management. The article is worth the read and should be part of a book on source control. Details below:

“Modern source-control systems provide powerful tools that make it easy to create branches in source code. But eventually these branches have to be merged back together, and many teams spend an inordinate amount of time coping with their tangled thicket of branches. There are several patterns that can allow teams to use branching effectively, concentrating around integrating the work of multiple developers and organizing the path to production releases. The over-arching theme is that branches should be integrated frequently and efforts focused on a healthy mainline that can be deployed into production with minimal effort.”

Emulator vs Simulator vs Real Devices: Which One to Choose for Testing

A short article about the context for testing mobile applications when weighing speed and reliability. The article also links to a few podcasts, one of which features Perfecto Mobile’s developer advocate Eran Kinsbruner.

SmartBear Connect Conference

This year the SmartBear Connect conference is going virtual. The event is free and will be held April 27-28. There are some great speakers lined up to present at the conference. Red Green Refactor will post additional events on our Events page HERE.

Replacing the Water Cooler

An open invitation by Atomist to use automation as a means of transferring knowledge, which they call “skills”. They have an early preview available if you sign up at the bottom of the article.

From the Pipeline v1.0

This entry is part 1 of 36 in the series From the Pipeline

The following will be a regular feature where we share articles, podcasts, and webinars of interest from the web. This week we’ll focus on five articles published recently about refactoring, code reviews, leadership, bug reports, and unit testing.

Refactoring: This Class is Too Large

From Martin Fowler’s blog, this piece by Clare Sudbury takes the reader through the step-by-step process of refactoring. The article is long but well worth the time as you learn how to systematically identify smells in a codebase and clean them up.

How To Improve Your Git Code Review Workflow

While the purpose of this article leads toward a particular tool, the lessons from the code review process are important and tool agnostic: require code reviews before merging changes, make reviews accessible to global teams, set up an effective workflow, and integrate with CI.

The Difference Between Compliance and Commitment and How to Create Committed Teams

From the leadership viewpoint, this article explains the difference between compliant and committed employees. Building a culture of accountability is essential to engage employees and, according to studies cited in the article, leads to higher productivity.

Writing Good Bug Reports

Andy Knight provides some essentials on writing good bug reports. This is a solid investigation into what goes into a bug report, why we should write them, and what the “do’s” and “don’ts” of bug reports are.

JUnit vs. TestNG: Choosing a Framework for Unit Testing

A comparison of two popular Unit Testing Frameworks by Junaid Ahmed. In the article, Junaid provides great details on the differences in annotations, executing the test suite, reporting, and ultimately what criteria developers should consider when selecting a Unit Test framework.

Slaying the Hydra: Parallel Execution of Test Automation

This entry is part 1 of 5 in the series Slaying the Hydra

The Great Constraint

“How long does it take to run the regression?”

“Why does the regression take so long?”

These questions point to a major constraint on test automation execution. If the suite doesn’t provide feedback in an appropriate time frame, it impacts both the decision-making ability of our stakeholders and our ability to maintain the quality of the tests.

To put it more simply, single-threaded execution of automated tests is often too slow to meet the business needs of the application under test.

Modifications can be made to increase the speed of the suite by shaving down the run time of individual scenarios and/or removing unnecessary scenarios. Ultimately, though, we end up in the same place: the regression is simply too slow.

Thomas has a great blog post about the common failure points of automation implementation. I would strongly suggest reading this as it is a good starting point to understanding automation challenges and provides a foundation for where we are going.

The Real Question

The real question to pose back to your team should be “What is the appropriate execution time of the regression?”

The answer “as fast as possible” is not acceptable. Increased speed means increased resources and planning that will cost the team both time and money. Getting an accurate answer to this question becomes the basis of our research on the cost of the solution.
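Once the team has a target, a back-of-the-envelope calculation shows the scale of resources involved. The sketch below is a simplification: it assumes scenarios can be spread evenly across workers and ignores setup, teardown, and reporting overhead. The numbers in the example are illustrative, not measurements:

```ruby
# Estimate how many parallel workers are needed to bring a regression's total
# serial run time under a target feedback time.
# Assumes perfectly even distribution and no per-worker overhead.
def workers_needed(total_serial_minutes, target_minutes)
  raise ArgumentError, 'target must be positive' unless target_minutes.positive?

  (total_serial_minutes.to_f / target_minutes).ceil
end

# Illustrative: a 6-hour serial regression with a 30-minute target
puts workers_needed(360, 30) # 12
puts workers_needed(100, 30) # 4
```

In practice the longest single scenario sets a floor: no number of workers brings the run below it, which is another reason to keep individual scenarios short.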


For the sake of argument, let’s say you have a specific execution time for the feedback loop. If the current infrastructure does not support a feedback loop that short, the team should consider:

Are the individual test scenarios robust and independent enough to handle being executed in parallel?

If the answer here is no for any reason, this work should be included as part of the effort. In an ideal world, a test scenario should execute completely independently of other scenarios, meaning it should not impact or be impacted by other scenarios (commonly called “atomic” tests).
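One common ingredient of atomic tests is giving each scenario its own data instead of a shared fixed account. A minimal sketch of that idea, using a hypothetical `unique_test_user` helper and made-up field names; in a Cucumber suite it would typically be called from a `Before` hook or a scenario's first step:

```ruby
require 'securerandom'

# Hypothetical helper: build fresh data for each scenario instead of sharing one
# fixed account, so two scenarios running in parallel never edit the same record.
def unique_test_user(prefix = 'qa')
  token = SecureRandom.hex(4) # 8 random hex characters
  { username: "#{prefix}_#{token}", email: "#{prefix}+#{token}@example.com" }
end

first = unique_test_user
second = unique_test_user
puts first[:username]
puts second[:username]
```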

Does the team have the drive to provide time and resources to this effort?

The resources could be anything from additional physical or virtual machines to time with other developers/team members to help build the solution. If the team is not able to free up team members to work on this solution, then it’s a wasted effort. Additionally, ensure that there are motivated, capable individuals on the team who can contribute.

Past Solutions

I’ve experienced the speed of the regression impacting the teams I have supported in my career. The solutions below are ones that I have implemented in the past that I would not recommend:

In Cucumber, tagging is done at the feature or scenario level to group scenarios into locatable and executable sections. This process is helpful for smoke tests, regressions, or functional areas that can then be executed or excluded at run-time. I would not recommend splitting the regression for parallel execution using static tags, because tagging should be used to signify the logical groups a test belongs within and nothing more.

An extension of the above would be running different logical groups at different times. For example: running checkout scenarios on Tuesday and search scenarios on Wednesday. The feedback loop for the regression is now multiple days and doesn’t provide the rapid feedback we expect.


So far, I have told you what I believe to be the most common constraint in test automation feedback loops, some questions I would ask your team, and some things I would not recommend doing. In this section I am going to go full ten commandments style and lay down the requirements of what we want from our tool.

Our tool should be able to:

  • Execute on multiple workstations in parallel in order to increase the efficiency of running the scenarios.
  • Utilize a CI/CD tool to allow for orchestration of the process.
  • Report back the status of the regression in a meaningful and consumable way to our stakeholders.
  • Allow for easy modification where/when required.  
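To make the first requirement concrete without relying on static tags, one simple approach is to split the list of feature files across workers at run time. The round-robin sketch below is only an illustration, not the approach this series builds in later parts, and `split_features` is a hypothetical helper. A smarter split would balance by historical run time rather than file count:

```ruby
# Distribute feature files across N workers in round-robin order.
# Each worker could then run: cucumber <its files...>
# In a real suite the list might come from Dir.glob('features/**/*.feature').
def split_features(feature_files, worker_count)
  raise ArgumentError, 'worker_count must be positive' unless worker_count.positive?

  buckets = Array.new(worker_count) { [] }
  feature_files.sort.each_with_index do |file, index|
    buckets[index % worker_count] << file
  end
  buckets
end

files = %w[features/search.feature features/cart.feature
           features/checkout.feature features/login.feature features/profile.feature]
split_features(files, 2).each_with_index do |bucket, i|
  puts "worker #{i + 1}: cucumber #{bucket.join(' ')}"
end
```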

Going forward

With this information in mind, the following series of blog posts will serve as a guide to fulfilling these requirements:

Part 1 – Orchestration overview and setting a clean slate – In this section the practical implementation of the orchestration component will be discussed along with the importance of ensuring a clean slate.

Part 2 – Run-time state and splitting up the execution – Discussion of what should happen during and immediately before the tests begin running.  

Part 3 – Consolidation of information and reporting – How to collect test result information and report it to the stakeholders.

Part 4 – Modifications and next steps – What potential changes could occur and what are the next steps from here.

Six Common Challenges of Test Automation and How to Beat Them

This will be a series of posts about the strategy and tactics of test automation. My team has experience working at multiple large firms with an enterprise-wide scope. Throughout our time working in IT, we have encountered challenges with existing test automation implementations and, unfortunately, made several mistakes along the way. The content of this post will focus on UI-automated tests because in our experience that’s where most of the misplaced effort is found in test automation. Our hope is to relay some of these common challenges and solutions to a wider audience so you can defeat the automation supervillains.

Challenge One: The Automation Firehose

Just because a scenario CAN be automated does not mean it SHOULD be automated. Teams that adopt automation often rush to automate everything they can: the automation firehose. The firehose results from teams being enamored with a new tool and wanting to use it everywhere. It’s a natural inclination for those of us in technology to be excited about new tech.

Instead, teams should adopt a risk-based approach to determine the most critical scenarios needing attention. For those scenarios that should be automated, every team must adopt an implementation plan to ensure value is derived from reliable automated test execution. That plan should include entry and exit criteria for any automated scripts that take into account schedule, budget, and the technical skillset of the developers. Additionally, the automation scripts should focus on frequently used or critical paths, scenarios with heavy data dependency, and areas of legal risk (SOX compliance, ADA compliance, etc.).

One recommendation is to use an “automation scorecard” to identify the most important scenarios to automate. The columns should be criteria you will use to judge whether or not a scenario should be automated. The rows will include either feature-level work or individual scenarios. In the example provided, we use simple checkboxes to help determine features that should be automated. Checkboxes could easily be replaced with a scale of zero to ten, low-medium-high, or whatever criteria the team agrees to use. Only four categories are used in the example, but you could easily extend this based on team or organizational values. A key component of using this sort of scorecard is to establish a threshold for scenarios to be automated so teams can start with the most VALUABLE scenarios first and work their way down the list. The result is often a more focused UI-automation suite, with more valuable tests that require less upkeep (because there are fewer of them).
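A checkbox scorecard is also easy to express in code. In the sketch below, each criterion is true or false, a scenario's score is the number of boxes checked, and only scenarios at or above a team-chosen threshold are kept, highest score first. The criteria names, scenarios, and threshold are hypothetical placeholders for whatever your team agrees on:

```ruby
# Hypothetical criteria; swap in the columns your team actually uses.
CRITERIA = %i[critical_path frequently_used data_heavy legal_risk].freeze

# Score = number of criteria checked for a scenario
def score(checks)
  CRITERIA.count { |criterion| checks.fetch(criterion, false) }
end

# Keep scenarios at or above the threshold, most valuable first
def automation_candidates(scorecard, threshold)
  scorecard.map { |name, checks| [name, score(checks)] }
           .select { |_name, points| points >= threshold }
           .sort_by { |name, points| [-points, name] }
end

scorecard = {
  'Checkout with saved card' => { critical_path: true, frequently_used: true, legal_risk: true },
  'Update profile photo'     => { frequently_used: true },
  'Accessible login form'    => { critical_path: true, legal_risk: true },
  'Export order history'     => { data_heavy: true }
}

automation_candidates(scorecard, 2).each { |name, points| puts "#{points}  #{name}" }
```

Moving from checkboxes to a zero-to-ten scale would only change `score` to sum the values instead of counting checked boxes.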

Challenge Two: Data Failure

When a team writes an automated test only considering a single test environment, they are selling themselves short. An even larger problem for testers is simply not having access to or control over their own test data. The data can be stale, applicable to only a single environment, restricted by an external team, or sourced from an external vendor. There are many ways we can run into data challenges in testing, and they extend to our automated tests as well. A test that only works in a single environment cannot achieve the same value proposition as a test that works across multiple environments. Consider one of the “selling” points of test automation – those automated checks can run many times unattended or as part of a build pipeline to provide the team insight about the state of the application. A test that only works in one environment has limited scope and cannot achieve its full potential. Perhaps that automated check shouldn’t have been written in the first place because it takes more time to write and maintain than it would to execute manually.

To address this challenge, make sure cross-environment compatibility is an up-front concern. Before the development work even begins on a feature, test data generation and manipulation across multiple environments should be part of the “ready” state criteria. Additionally, execution of those automated checks across multiple environments should be part of the “done” state criteria. Advising people to adopt this approach is the easy part. How can control of test data for automation be achieved? Through persistence and patience. As a precursor to making test data across environments part of any “ready” and “done” state criteria, it’s important to capture what your data needs are and how best to use that data. Some of these tips are in a prior blog post, Fictional Test Data. Map out the application under test using a context-dependency diagram. Identify the inputs and outputs of your system and the expected outcomes. From that refined view, it will be more apparent what data is needed and when you need to create, read, update, and delete it (simple CRUD).
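One small, concrete step toward cross-environment compatibility is keeping environment-specific data out of the test code itself, so the same check can run anywhere the data is defined. A minimal sketch, assuming a hypothetical `TEST_ENV` variable and made-up environment names, URLs, and IDs; a real suite might keep this data in per-environment YAML files instead:

```ruby
# Hypothetical per-environment test data, looked up at run time instead of
# being hard-coded into step definitions.
TEST_DATA = {
  'dev' => { base_url: 'https://dev.example.com', customer_id: 'CUST-DEV-001' },
  'qa'  => { base_url: 'https://qa.example.com',  customer_id: 'CUST-QA-042' },
  'uat' => { base_url: 'https://uat.example.com', customer_id: 'CUST-UAT-007' }
}.freeze

# Fail loudly when an environment has no data, instead of silently running
# the check against the wrong records.
def test_data(env = ENV.fetch('TEST_ENV', 'qa'))
  TEST_DATA.fetch(env) do
    raise KeyError, "No test data defined for environment '#{env}'"
  end
end

puts test_data('dev')[:base_url] # https://dev.example.com
```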

While the topic of test data at large is beyond the scope of this post, for automated checks we first identify what the needs are and then fight to get access to that data. The best persuasive argument that you can make to the management and cross-impacted teams is to show empirical evidence where this lack of data is hurting the company. What bugs have escaped because you couldn’t test in an environment? What automated checks needed to be tested manually across those environments? What stale data or selfish data do you have today that is hindering the team’s ability to deliver in a timely manner? Identifying those concerns using evidence will help build your case to get the access needed or at least pave the way to generate fictional test data for those automated checks. Once you have that clear picture, then adopt those “ready” and “done” state criteria requiring test data so your tests can be cross-environment compatible and have a higher ROI.

Challenge Three: Flickering Tests

Flickering or “flaky” tests are tests that can either pass or fail even when run on the same code. Automated tests that don’t consistently pass are soon ignored by the entire team. The execution report, dashboard, and notification emails should mean something. Flickering tests are pernicious threats to an automation suite: they steal time away from more valuable activities, they hurt the trustworthiness of our automated executions, and they limit the success of future tests because they can’t be used as building blocks.

“A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random. Non-determinism can plague any kind of test, but it’s particularly prone to affect tests with a broad scope, such as acceptance or functional tests.” – Martin Fowler

Martin Fowler has a response to flickering tests that is quite appropriate given the current state of the world: quarantine. First, remove the flickering tests from any active executions (triggered by scheduled jobs or part of a build pipeline). The quality of the automated tests must be maintained lest we lose confidence from our team and our stakeholders. Next, perform root cause analysis on each flickering test to determine the source of the flakiness: our own coding practices, environment, data, the application under test, external services, or any combination of these. This can be a time-intensive endeavor, but it’s important to address these issues before your automation suite turns into a monster you can no longer fight. If the source of failure can be addressed, then the test can be added back to the rest of the executions; otherwise, remove it.
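Before quarantining, it helps to confirm a test really is non-deterministic in Fowler's sense: same code, same environment, different outcomes. A rough sketch of that check, where each test is any callable returning true or false and `flaky?`/`quarantine` are hypothetical helpers, not part of Cucumber or any test runner:

```ruby
# Run the same check several times; a mix of passes and failures marks it as
# flickering. A check that fails every time is a consistent failure, not a flake.
def flaky?(runs = 5)
  outcomes = Array.new(runs) { yield }
  outcomes.uniq.size > 1
end

# Split tests into those that stay in active runs and those to quarantine.
def quarantine(tests, runs = 5)
  flickering, stable = tests.partition { |_name, check| flaky?(runs) { check.call } }
  { active: stable.map(&:first), quarantined: flickering.map(&:first) }
end

calls = 0
tests = {
  'search returns results' => -> { true },
  'checkout total updates' => -> { (calls += 1).odd? }, # alternates pass/fail
  'legacy report export'   => -> { false }              # fails consistently
}

puts quarantine(tests).inspect
```

A test that flickers only rarely can pass all of its reruns, so a clean result is evidence rather than proof; the root cause analysis described above is still the real fix.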

Challenge Four: Long Tests

Another common problem seen in automation suites is overly long tests with literally hundreds of validations. These tests perhaps started with a small scope but suffered steady scope creep as more and more validations were tacked on to a flow. Validations for fields and messages and navigation – any and everything could be added to these monstrous test cases. There are a host of problems with this approach. For one, long tests take a long time to execute. If the goal is fast feedback, especially fast feedback in a CI/CD pipeline, then long tests will kill your ability to deliver quickly. Another issue is missed validations. Many automated testing platforms skip the remaining validations within a single test once a step fails. If a long test fails at step 20 of 300, then you have no idea if there are issues with steps 21 through 300. The team now has less knowledge about the state of the application because those missed validations are unknown until you move beyond that failed step. Lastly, many of the validations in those long tests should be unit or integration tests. That test is sacrificing speed and quality and returning little of value.

Slice and dice long tests. Ideally each automated check focuses on a single validation or “outcome”. UI tests should be focused on a successful outcome from a user’s perspective. For those fields and messages and database calls, instead implement the tests most suited to fast feedback and robustness. An automation approach needs to place unit tests and integration tests as a priority over UI tests. Automate UI as needed to verify end-user behavior.

Challenge Five: Shaky Foundation

We have all been victim to the “project deadline” bug. Whatever the best intentions were at the outset of a project, we become constrained by timelines that simply will not move. All the grand ideas we had about architecting an awesome solution are soon cast aside in favor of getting “done”. So we continue to make sacrifices to the quality of our work for the sake of getting done again and again. The problems with our automation suite pile up and soon we’re writing our code to get to the next day rather than to help the poor schmuck who will have to dig through our code 5 years from now. Whoever that guy/gal is will likely throw the codebase away and start anew because we’ve built the entire suite on a shaky foundation.

Our team has thrown plenty of legacy automation suites in the garbage, and a few of our own joined the pile early on when we realized the mistakes we made were not sustainable. An automation suite that is not constructed properly from the beginning and maintained throughout its life will eventually fall apart. It’s a lot easier to make many small bad decisions to get “done” than to make costly up-front decisions that ultimately save us time down the line. Once that shaky suite is built, it’s hard for the creators and maintainers to discard it because of the sunk-cost fallacy. A better path is to architect an automated solution with good practices from the start and to consistently engage in activities to promote the quality of the suite.

Treat the automation code with the same care and expectations as you would expect of the development code. That means leveraging the existing principles of “Don’t Repeat Yourself” (DRY) and “Keep It Simple, Stupid” (KISS), implementing design patterns to support the overall goal of the automated solution, scheduling regular code reviews, using code analysis tools, and engaging in regular refactoring sessions.

The preceding topics and their associated sources are too large for a single article to cover, but we’ll attempt to do them justice in some concise advice. If you’re testing web applications, it’s a good idea to consider using the Page Object pattern or the Screenplay pattern. These are tried-and-true patterns with a great deal of background material and active training to support learning. Many of the existing version control tools out there have built-in policies to ensure code reviews are performed before branches are merged. These automatic tollgates can help enforce code review practices agreed to by a team and help spread domain knowledge by checking each other’s work. Static code analysis tools or linters are great at picking up common errors in the code; execution of such linters can be made standard practice with each commit or separately executed to support refactoring sessions. Lastly, regular refactoring sessions should be held by the team and supported by an outside “automation oracle” to help improve the state of the codebase while also sharing domain knowledge. More will be shared on refactoring sessions in a later article.
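For readers new to the Page Object pattern, the core idea is that one class owns a page's locators and actions, so tests speak in terms of user intent and a changed locator is fixed in one place. A minimal sketch follows. `FakeDriver` stands in for a real Selenium or Appium driver so the example runs on its own, and the page, locators, and method names are hypothetical:

```ruby
# Minimal stand-in for a WebDriver-like object: records typed text and clicks.
class FakeDriver
  attr_reader :typed, :clicked

  def initialize
    @typed = {}
    @clicked = []
  end

  def type(locator, text)
    @typed[locator] = text
  end

  def click(locator)
    @clicked << locator
  end
end

# Page object: locators live here once; tests call intention-revealing methods.
class LoginPage
  USERNAME = { id: 'username' }.freeze
  PASSWORD = { id: 'password' }.freeze
  SUBMIT   = { css: 'button[type=submit]' }.freeze

  def initialize(driver)
    @driver = driver
  end

  def log_in(username, password)
    @driver.type(USERNAME, username)
    @driver.type(PASSWORD, password)
    @driver.click(SUBMIT)
    self
  end
end

driver = FakeDriver.new
LoginPage.new(driver).log_in('qa_user', 'not-a-real-password')
puts driver.clicked.inspect
```

With the selenium-webdriver gem, the fake `type` and `click` calls would roughly become `driver.find_element(locator).send_keys(text)` and `driver.find_element(locator).click`; the test code calling `log_in` would not change.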

All these activities described above are designed to support the quality of the automation code. It certainly sounds like a lot of work – but quality doesn’t come free. The team must be committed to the quality of the testing activities with the same vigor we expect of development work or business analysis or infrastructure. Avoiding a shaky foundation through good practices will help keep the automation in a healthy state.

Challenge Six: Automation Lags Behind Development

Similar to the “deadline driven development” described in the prior challenge, teams often run into a time crunch in the handoff from development to testing. Development runs past its initial estimates, and the time allocated for testing shrinks. Since automation scripting takes time, teams can fall into the trap of skipping automation in favor of manual validation, or of pushing the automation to the next project or Sprint so they can release to production on time. The result is either a build-up of automation technical debt, because likely automation candidates simply go unautomated, or a broken working agreement, because the team pushes through development work that hasn’t been tested properly. Continuing this practice project after project or Sprint after Sprint lets that debt accumulate and limits the test coverage of the application. Ultimately, defects will escape into production if a team constantly has testing as a whole (and automation specifically) lagging behind development.

To address the issue of automation lagging behind development, it’s imperative for a team to incorporate automation feasibility into the entry criteria for any feature to be developed. That means the team identifies test automation candidates during refinement of new stories, including the access to test data discussed in Challenge #2. Additionally, teams must treat completed (and executed!) scripts as part of the definition of done or exit criteria for any development work. If deadlines preclude this work, teams should adopt a working agreement that the “missed” automation is added as technical debt to be addressed at the beginning of the next Sprint or project cycle. If this becomes a common occurrence, the team must address the underlying reason its estimates keep falling short of what’s needed to deliver a product tested to its standards.
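The exit criteria and the working agreement above can be sketched as a small end-of-Sprint check. This is a hypothetical model, not a feature of any particular tracker. `Story`, `close_sprint`, and the story keys are invented for illustration. A story counts as done only when every automation candidate identified during refinement has an executed script. Any gap becomes a debt item for the start of the next Sprint.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Story:
    key: str
    automation_candidates: List[str]  # identified during refinement (entry criteria)
    executed_scripts: List[str] = field(default_factory=list)  # written AND run

    def missing_automation(self) -> List[str]:
        return [c for c in self.automation_candidates if c not in self.executed_scripts]


def close_sprint(stories: List[Story]) -> Tuple[List[str], List[str]]:
    """Return (keys meeting the definition of done, debt items for next Sprint)."""
    done: List[str] = []
    debt: List[str] = []
    for story in stories:
        missing = story.missing_automation()
        if missing:
            # Working agreement: missed automation is recorded, not silently dropped.
            debt.extend(f"{story.key}: automate '{case}'" for case in missing)
        else:
            done.append(story.key)
    return done, debt


stories = [
    Story("SHOP-1", ["checkout happy path"], ["checkout happy path"]),
    Story("SHOP-2", ["store locator", "empty cart"], ["empty cart"]),
]
done, debt = close_sprint(stories)
print(done)  # ['SHOP-1']
print(debt)  # ["SHOP-2: automate 'store locator'"]
```

If the debt list is non-empty Sprint after Sprint, that pattern is the signal to revisit the team’s estimates.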

To help ensure automation progresses concurrently with development, teams should adopt development practices that make automation an upfront concern, such as Test-Driven Development (TDD), Acceptance Test-Driven Development (ATDD), and Behavior-Driven Development (BDD). These practices promote testing up front and from the perspective of the user. For UI automated tests, we recommend that developers follow agreed standards for element locator IDs so that automation developers can write scripts while the feature is still being built.
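One way to make a locator standard enforceable is to encode it in a tiny shared helper that both developers and automation developers use. The `<page>-<component>-<role>` kebab-case format and the role list below are a hypothetical convention, not a standard. The IDs could be rendered as `id` or `data-testid` attributes, whichever the team agrees on.

```python
import re

# Hypothetical team convention: <page>-<component>-<role>, lowercase kebab-case,
# agreed during refinement so scripts can be written before the UI exists.
_VALID_PART = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
ROLES = {"button", "input", "link", "message", "list"}


def locator_id(page: str, component: str, role: str) -> str:
    """Build an element ID that follows the agreed convention, or raise ValueError."""
    if role not in ROLES:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(ROLES)}")
    for part in (page, component):
        if not _VALID_PART.match(part):
            raise ValueError(f"{part!r} is not lowercase kebab-case")
    return f"{page}-{component}-{role}"


# Agreed up front: developers put these IDs on the elements, and
# automation scripts reference the very same values.
CHECKOUT_SUBMIT = locator_id("checkout", "place-order", "button")
CHECKOUT_PROMO = locator_id("checkout", "promo-code", "input")
print(CHECKOUT_SUBMIT)  # checkout-place-order-button
```

Because the IDs are agreed before the UI exists, a script can target `checkout-place-order-button` on day one. There is no need to wait for a finished page and then reverse-engineer brittle XPath or CSS selectors.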

Post Credits Scene

The challenges discussed in this post are not an exhaustive list of all the problems a team could face with test automation, but they do provide insight into common issues. Test automation is a big investment for an organization; it’s not a magic wand that makes all testing less costly or finds all your bugs. Automation is another tool to support the team in their quest for quality. Teams that treat their automation code the same as development code and follow practices that promote good code quality are more likely to have long-term success with their automated tests. You don’t have to be a superhero to write good automated tests – all you need is a desire to improve and the will to see it through.

Fictional Test Data

An underlying principle in our work as software developers is that everyone should understand our work. From design to production, we strive to produce sensible models for other humans to understand. We design requirements for clarity, and hammer them out until everyone involved agrees that they make sense. We write code that is self-documenting, employs conventions, and uses design patterns, so that other developers can better comprehend how it works. We write tests that tell detailed stories about software behavior – stories that are not only truthful, but easily understood. We enshrine this principle in the tools and processes we use, especially in quality assurance, where tools like Cucumber and its Gherkin syntax emphasize collaboration and communication.

We are storytellers

To that end, I propose an exercise in which we try on a new hat – the storyteller.

I sense a parallel between my experience as a reader and my experience in quality assurance. I am sensitive to the difference between an easy, accessible writing style and writing that is denser and more challenging. Popular authors like Stephen King are criticized for being too prolific, too popular, and too easy to read, but there is great value in accessibility: reaching a wide audience is good for the business model.

In software development, striving for accessibility can be valuable. Most of the difficulty that I’ve witnessed and experienced can be attributed not to the inherent complexity of code, or to the scale of a system, but to simple miscommunications that occur as we work to build them. From the perspective of quality assurance, it’s particularly harmful when our tests, our expressions of expected system behavior, are difficult to understand. In particular, I find that test data which drives a test is difficult to understand, untrustworthy, and time-consuming to manage.

When I say “test data”, I’m speaking broadly about information in our systems as it is employed by our tests. It’s helpful to break this down – a common model categorizes information as master data, transactional data, and analytical data.

Most of the data that we directly reference in our tests falls into the category of master data. Master data includes business entities like users, products, and accounts. This data is persistent in the systems that we test, and it becomes persistent in our tests too: most test cases involve authenticating as some kind of user, or interacting with some kind of object (like a product). This data usually provides the main characters in our stories.

Transactional data is just what it sounds like: transactions. In our systems, this may include purchases, orders, submissions, etc., meaning any record of an interaction within the system. We don’t usually express this directly in our tests, but transactional data is intrinsically linked to master data, and the entities that we use in our tests are further defined by any associated transactional data.

The last category is analytical data, which is not obviously expressed in our tests. This encompasses metrics and measurements collected from production systems and users to make business decisions that drive software development. It tells us about the means by which users access our systems, and the way that they use them. This data is also a part of our tests – we employ information about real users and real interactions to improve our testing, and all of our test data becomes a reflection of the real world.

What does our test data typically look like?

I wouldn’t judge a book by its cover, but I would like to read test data at a glance. That’s not easy to do when we share user data that looks like the following example:

We don’t know much about this user without doing further research, like slinging SQL queries, or booting up the app-under-test to take a look. This information is not recognizable or memorable, and it undermines the confidence of anyone who would attempt to read it or use it. It tells a poor story.

Why do we construct data like this? The test data I remember using most often was not particularly well-designed, but simply very common. Sometimes a user is readily shared amongst testers because it is difficult to find or create something better – I give this user to you because it was given to me. At best, we could infer that this is a fair representative of a “generic user” – at worst, we may not even think about it. When we discover some strange new behavior in the system, something which may be a real defect to act on, we often need to ask first “was this data valid?”

Would our work be easier if our data was more carefully constructed?

As an example, I present the Ward family. I designed the Ward family to test tiers of a loyalty points system, and each member represents a specific tier. For the highest tier user, with more rewards than the others, I created Maury Wards. For the middle tier, a user with some rewards – Summer Wards. To represent the user who has earned no rewards – Nora Wards. If the gag isn’t obvious, try sounding out the names as you read them.

I created these users without much thought. I was just trying to be funny. I don’t like writing test data, and making a joke of it can be motivating. What I didn’t realize until later is that this data set was not only meaningful, but memorable. For months, I found myself re-using the Ward family every time I needed a specific loyalty tier. I knew what this data represented, and I knew exactly when I needed to use it.

Beyond the names, I employed other conventions that also made this data easier to use. For example, I could summon these users with confidence in all of our test environments because I gave them email addresses that indicated not only what kind of user each one was, but also which environment it was created in. I recommend applying such conventions to any visible and non-critical information to imbue data with meaning and tell a clear story.
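Here is a minimal sketch of the Ward family as reusable fixtures. The exact email format the author used isn’t given, so the `first.last.tier.env@example.com` pattern below is a hypothetical convention. The tier names and the environment labels (`qa`, `staging`) are also illustrative. The idea is that a reader can tell who a user is and where the user lives just by reading the address.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HIGH = "high"   # most rewards
    MID = "mid"     # some rewards
    NONE = "none"   # no rewards earned


@dataclass(frozen=True)
class LoyaltyUser:
    first: str
    last: str
    tier: Tier

    def email(self, env: str) -> str:
        # Hypothetical convention: name, tier, and environment are readable in the address.
        return f"{self.first}.{self.last}.{self.tier.value}.{env}@example.com".lower()


# Sound the names out: "more rewards", "some rewards", "no rewards".
WARD_FAMILY = {
    Tier.HIGH: LoyaltyUser("Maury", "Wards", Tier.HIGH),
    Tier.MID: LoyaltyUser("Summer", "Wards", Tier.MID),
    Tier.NONE: LoyaltyUser("Nora", "Wards", Tier.NONE),
}

print(WARD_FAMILY[Tier.MID].email("qa"))  # summer.wards.mid.qa@example.com
```

A test that needs a user with no rewards asks for `WARD_FAMILY[Tier.NONE]` instead of an opaque account ID, and the email alone shows which environment that account belongs to.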

What could we do to tell a more detailed story?

User stories are relayed to us through an elaborate game of telephone, and something is often lost along the way. Take a look at the following example, and you may see what I mean.

“As a user”. Right there. This example may seem contrived, but I’ve seen it often: a user story without a real user. This does nothing to encourage us to consider the different real-world people who will interact with our software, and the kinds of tests that we should design for them. Including much more explicit information about the user would probably make an individual requirement clumsy, but that information is important. Imagine that this feature was tested with “a user”, and it passed without issue. Great. But what about Dan? Dan does all of his business online, and doesn’t shop in-store. Where he lives, our system won’t even recommend a nearby store. How can we avoid forgetting about users like Dan?

If we can’t describe the users in a requirement, what can we do?

Alan Cooper, software developer and author of The Inmates Are Running The Asylum, argues that we can only be successful if we design our software for specific users. We don’t want all users to be somewhat satisfied – we want specific users to be completely satisfied. He recommends the use of personas – hypothetical archetypes that represent actual users through the software design process. UX designers employ personas to infer the needs of real-world users and design solutions that will address them, and for quality assurance, we should use the same personas to drive test case design and execution.

If I expanded a member of the Ward family into a full persona, it might look like the following example – a little something about who the user is, where they are, and how they interact with our system.

A persona includes personal information about a user, even seemingly irrelevant information such as a picture, name, age, and career, to make them feel like a real, relatable person. Thinking about a real human being will help us understand which features matter to the user, and how the user will experience these new features, so that we can design test cases which support them.

A persona includes geographic location, especially when relevant in our tests. Software might behave differently depending on the user’s specific GPS location, local time zone, and even legislation. A user may be directed to nearby store locations or use a specific feature while in-store. Our software may behave differently depending on time and date – for example, delivery estimates, or transaction cut-off times. Our software may need to accommodate laws that make it illegal to do business across geographic boundaries, or to do business differently. The California Consumer Privacy Act (CCPA) is a recognizable example with implications for all kinds of software-dependent businesses.
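Transaction cut-off times show how location alone can change expected behavior. This sketch assumes a hypothetical same-day processing cut-off of 17:00 local time. It uses fixed standard-time UTC offsets for a January date so that it needs no time zone database. Honolulu does not observe daylight saving time, but Los Angeles and Chicago do, so a real suite should use `zoneinfo` with IANA zone names instead of fixed offsets.

```python
from datetime import datetime, time, timedelta, timezone

# Fixed standard-time offsets, valid for the January order below.
# Honolulu has no DST; LA and Chicago do, so real tests should use zoneinfo.
HONOLULU = timezone(timedelta(hours=-10))
LOS_ANGELES = timezone(timedelta(hours=-8))
CHICAGO = timezone(timedelta(hours=-6))

CUTOFF = time(17, 0)  # hypothetical same-day cut-off, in the user's local time


def ships_same_day(order_utc: datetime, user_tz: timezone) -> bool:
    """True if the order lands before the cut-off in the user's local time."""
    local = order_utc.astimezone(user_tz)
    return local.time() < CUTOFF


# One order, placed at the same instant everywhere.
order = datetime(2021, 1, 15, 2, 30, tzinfo=timezone.utc)
for name, tz in [("Honolulu", HONOLULU), ("Los Angeles", LOS_ANGELES), ("Chicago", CHICAGO)]:
    print(name, order.astimezone(tz).strftime("%H:%M"), ships_same_day(order, tz))
# Honolulu 16:30 True
# Los Angeles 18:30 False
# Chicago 20:30 False
```

A suite that only ever runs from a Chicago office would never exercise the Honolulu branch. That gap is exactly what a location-aware persona helps surface.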

A persona also includes information about the technology that a user favors. This is the lens through which they view our software, and it changes the user experience dramatically. How is this feature presented on PCs, smartphones, and tablets? Does it work for users on different operating systems? Which browsers, or clients, do we support? We can design personas for users with many combinations of hardware and software, and then execute the same test with each of them.

Hope lives in Honolulu, Hawaii, and I chose the name because the ‘H’ sound reminds me that she lives in Honolulu. She is in the Hawaii-Aleutian time zone, which is easy to forget about if we do most of our testing against a corporate office address. She uses a Google Pixel 3 and keeps the operating system up to date (currently Android 10). While Honolulu is a major city, I took the liberty of assuming a poor internet connection, something else that may not be tested if we don’t build personas like this.

Lee lives in Los Angeles, CA – Pacific Time Zone. He uses an iPhone XS Max, and he doesn’t update the operating system immediately – he’s currently using iOS 12. He has a good network connection, but there’s a wrinkle – he’s using other apps that could compete for bandwidth and hardware resources.

Cass lives in Chicago, IL – Central Time Zone. She’s another Android user, this time a Samsung device, currently running Android 9. She has a good connection, but she’s using other apps which also use her GPS location.
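Recording personas as data makes it easy to “execute the same test with each of them”. In this sketch, the `Persona` fields are taken from the Hope, Lee, and Cass descriptions above. `check_store_locator` is a hypothetical placeholder. A real test would configure a device or emulator, its location, and network throttling from the persona before running the scenario.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Persona:
    name: str
    city: str
    time_zone: str
    device: str
    os_version: str
    network: str  # free-form note on connection conditions


PERSONAS: List[Persona] = [
    Persona("Hope", "Honolulu, HI", "Hawaii-Aleutian", "Google Pixel 3", "Android 10",
            "poor connection"),
    Persona("Lee", "Los Angeles, CA", "Pacific", "iPhone XS Max", "iOS 12",
            "good, but other apps compete for bandwidth and hardware"),
    Persona("Cass", "Chicago, IL", "Central", "Samsung", "Android 9",
            "good, but other apps also use GPS"),
]


def check_store_locator(persona: Persona) -> str:
    """Hypothetical placeholder for one test scenario run under a persona's conditions."""
    # A real implementation would set device, location, and network from the persona,
    # drive the app, and assert on the outcome.
    return f"{persona.name}: ran on {persona.device} ({persona.os_version}) in {persona.city}"


# The same scenario, executed once per persona.
results = [check_store_locator(p) for p in PERSONAS]
for line in results:
    print(line)
```

With pytest, the same idea could be expressed as `@pytest.mark.parametrize("persona", PERSONAS, ids=lambda p: p.name)`. Each persona’s result is then reported by name, so a failure reads as “Hope failed” rather than “test case 3 failed”.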

How do we manage all of this?

If I asked you today, “where can I find a user who meets a specific condition,” where would you start? How is your test data managed today? There are plenty of valid solutions, like SharePoint, wikis, network drives, etc – but don’t think of the application database as a test data library – in test environments, it is not a library, but a landfill. There is too much to parse, too many duplicates, too much invalid data – we can only find helpful data if we are very good at finding it. Keep personas and detailed data somewhere that can be easily accessed and manipulated.

We can further reduce the work of test data management by treating the collection like a curated, personal library, where every story is included for a reason. Take care to reduce noise by eliminating duplicate data sets, and removing invalid ones. Name data sets for reference so that they can be updated or recreated as needed without disrupting the requirements, test cases, and software developers that use them.
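The curation rules above can be sketched as a small catalog. `DataSet`, `DataCatalog`, and the data set names are hypothetical, and in practice the catalog might live in a wiki, a spreadsheet, or a repository file rather than in code. Each entry is looked up by a stable name. Invalid data is rejected, and a second name for an already-catalogued account raises an error. Re-adding an existing name recreates the entry without breaking anything that references it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class DataSet:
    name: str          # stable reference used by requirements and test cases
    email: str
    loyalty_tier: str
    environment: str
    valid: bool = True


class DataCatalog:
    """Curated catalog: every entry is named, unique, and known to be valid."""

    def __init__(self) -> None:
        self._by_name: Dict[str, DataSet] = {}

    def add(self, ds: DataSet) -> bool:
        """Add or recreate a named data set; return False if rejected as invalid."""
        if not ds.valid:
            return False  # invalid data never enters the library
        for existing in self._by_name.values():
            if existing.name != ds.name and existing.email == ds.email:
                raise ValueError(f"{ds.email} is already catalogued as {existing.name!r}")
        # Re-adding an existing name replaces it, so references to the name keep working.
        self._by_name[ds.name] = ds
        return True

    def get(self, name: str) -> DataSet:
        return self._by_name[name]

    def find(self, condition: Callable[[DataSet], bool]) -> List[DataSet]:
        """Answer 'where can I find a user who meets a specific condition?'"""
        return [ds for ds in self._by_name.values() if condition(ds)]


catalog = DataCatalog()
catalog.add(DataSet("ward-high-tier-qa", "maury.wards@example.com", "high", "qa"))
catalog.add(DataSet("ward-no-tier-qa", "nora.wards@example.com", "none", "qa"))
catalog.add(DataSet("legacy-user-7", "old.user@example.com", "mid", "qa", valid=False))

no_rewards = catalog.find(lambda ds: ds.loyalty_tier == "none" and ds.environment == "qa")
print([ds.name for ds in no_rewards])  # ['ward-no-tier-qa']
```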

In summary, I advocate the following:

  • Test data should be recognizable and memorable
  • Test data should be realistic and relatable
  • Test data should be curated and readily available

Additional Resources:

The Inmates Are Running The Asylum, Alan Cooper
Types of Enterprise Data