Slaying the Hydra: Parallel Execution of Test Automation

This entry is part 1 of 5 in the series Slaying the Hydra

The Great Constraint

“How long does it take to run the regression?”

“Why does the regression take so long?”

These questions represent the a large constraint with test automation execution. If the suite doesn’t provide feedback in an appropriate time frame it impacts both the decision-making ability of our stakeholders and our ability to maintain the quality of the tests.

To put things more simply, single threaded execution of automated tests are often too slow to meet the business needs of the application under test.

Modifications can be made to increase the speed of the suite by shaving down the run time of individual scenarios and/or removing unnecessary scenarios. Ultimately, we end up in the same place where the regression is just simply too slow.

Thomas has a great blog post about the common failure points of automation implementation. I would strongly suggest reading this as it is a good starting point to understanding automation challenges and provides a foundation for where we are going.

The Real Question

The real questions posed back to your team should be “What is the appropriate execution time of the regression?”

The answer “as fast as possible” is not acceptable. Increased speed means increased resources and planning that will cost the team both time and money. Getting an accurate answer to this question becomes the basis of our research on the cost of the solution.


For the sake of argument let’s say you have a specific execution time for the feedback loop. If the current infrastructure does not support a feedback loop that short, the team should consider:

Are the individual test scenarios robust and independent enough to handle being executed in parallel?

If the answer here is no for any reason this work should be included as part of the effort. In an ideal world a test scenario should execute completely independent of other scenarios, meaning it should not impact or be impacted by other scenarios (commonly called “atomic” tests).

Does the team have the drive to provide time and resources to this effort?

The resources could be everything from additional physical or virtual machines to time with other developers/team members to help build the solution. If the team is not able to free up team members to work on this solution then it’s a wasted effort. Additionally, ensure that there are motivated capable individuals on the team that can contribute.

Past Solutions

I’ve experienced the speed of the regression impacting the teams I have supported in my career. The solutions below are ones that I have implemented in the past that I would not recommend:

In Cucumber, tagging is a process done at the feature or scenario level to group scenarios into locatable and executable sections. This process is helpful for smoke tests, regressions or functional areas that can the be executed or excluded at run-time. I would not recommend splitting regression for parallel execution utilizing static tags because tagging should be used to signify the logical groups a test belongs within and nothing more.

An extension of the above would be running different logical groups at different times. For example: running checkout scenarios on Tuesday and search scenarios on Wednesday. The feedback loop for the regression is now multiple days and doesn’t provide rapid feedback which we expect.


So far, I have told you what I believe to be the most common constraint in test automation feedback lops, some questions I would ask your team, and some things I would not recommend doing. In this section I am going to go full ten commandments style and lay down the requirements of what we want from our tool.

Our tool should be able to:

  • Execute on multiple workstations in parallel in order to increase the efficiency of running the scenarios.
  • Utilize a CI/CD tool to allow for orchestration of the process.
  • Report back the status of the regression in a meaningful and consumable way to our stakeholders.
  • Allow for easy modification where/when required.  

Going forward

With this information in mind the following course is going to be taken as a series of blog posts in order to serve as a guide in fulfilling these requirements:

Part 1 – Orchestration overview and setting a clean slate – In this section the practical implementation of the orchestration component will be discussed along with the importance of insuring a clean slate.

Part 2 – Run-time state and splitting up the execution – Discussion of what should happen during and immediately before the tests begin running.  

Part 3 – Consolidation of information and reporting – How to collect test result information and report it to the stakeholders.

Part 4 – Modifications and next steps – What potential changes could occur and what are the next steps from here.

Six Common Challenges of Test Automation and How to Beat Them

This will be a series of posts about the strategy and tactics of test automation. My team has experience working at multiple large firms with an enterprise-wide scope. Throughout our time working in IT, we have encountered challenges with existing test automation implementations and unfortunately committed several mistakes on the way. The content of this post will focus on UI-automated tests because in our experience that’s where most of the misplaced effort is found in test automation. Our hope is to relay some of these common challenges and solutions to a wider audience so you can defeat the automation supervillains.

Challenge One: The Automation Firehose

Just because a scenario CAN be automated does not mean it SHOULD be automated. Teams that adopt automation often rush to automate everything they can — the automation firehose. The firehose results from teams being enamored with a new tool and they want to use it everywhere. It’s a natural inclination for those of us in technology to be excited about new tech.

Instead teams should adopt a risk-based approach to determine the most critical scenarios needing attention. For those scenarios that should be automated, every team must adopt an implementation plan to ensure value is derived from reliable automated test execution. That plan should include entry and exit criteria for any automated scripts that take into account schedule, budget, and technical skillset of the developers. Additionally, the automation scripts should be focused on frequently used / critical paths, heavy data dependency, and include legal risk (SOX compliance, ADA compliance, etc.).

One recommendation is to use an “automation scorecard” to identify the most important scenarios to automate. The columns should be criteria you will use to judge whether or not a scenario should be automated. The rows will include either feature-level work or individual scenarios. In the example provided we use simple checkboxes to help determine features that should be automated. Checkboxes could easily be replaced with a scale of zero to ten, low-medium-high, or whatever criteria the team agrees to use. Only four categories are used in the example, but you could easily extend this based on team or organizational-values. A key component of using this sort of scorecard is to establish a threshold for scenarios to be automated so teams can start with the most VALUABLE scenarios first and work their way down the list.  The result is often a more focused UI-automation suite, with more valuable tests that require less upkeep (because there are fewer of them).

Challenge Two: Data Failure

When a team writes an automated test only considering a single test environment, they are selling themselves short. An even larger problem for testers is simply not having access to or control over their own test data. The data can be stale in an environment or only be applicable to a single environment or be restricted by an external team or come from an external vendor. There are many ways we can run into data challenges in testing, which also extends to our automated tests. A test that only works in a single environment cannot achieve the same value proposition as a test that works across multiple environments. Consider one of the “selling” points on test automation – those automated checks can run many times unattended or part of a build pipeline to provide the team insight about the state of the application. A test that only works in one environment has limited scope and cannot achieve its full potential. Perhaps that automated check shouldn’t have been written in the first place because it takes more time to write & maintain than it would to execute manually.

To address this challenge, make sure cross-environment compatibility is an up-front concern. Before the development work even begins on a feature, test data generation & manipulation across multiple environments should be part of the “ready” state criteria. Additionally, execution of those automated checks across multiple environments should be part of the “done” state criteria. Advising people to adopt this approach is the easy part. How can control of test data for automation be achieved? Through persistence and patience. As a precursor to having test data across environments part of any “ready” and “done” state criteria, it’s important to capture what your data needs are and how to best use that data. Some of these tips are in a prior blog post, Fictional Test Data. Map out the application under test using a context-dependency diagram. Identify the inputs & outputs of your system and the expected outcomes. From that refined view it will be more apparent what data is needed and when you need to create, read, update, and delete (simple CRUD).

While the topic of test data at large is beyond the scope of this post, for automated checks we first identify what the needs are and then fight to get access to that data. The best persuasive argument that you can make to the management and cross-impacted teams is to show empirical evidence where this lack of data is hurting the company. What bugs have escaped because you couldn’t test in an environment? What automated checks needed to be tested manually across those environments? What stale data or selfish data do you have today that is hindering the team’s ability to deliver in a timely manner? Identifying those concerns using evidence will help build your case to get the access needed or at least pave the way to generate fictional test data for those automated checks. Once you have that clear picture, then adopt those “ready” and “done” state criteria requiring test data so your tests can be cross-environment compatible and have a higher ROI.

Challenge Three: Flickering Tests

Flickering Tests or “Flaky” tests are tests that can either pass or fail even when run on the same code. Automated tests that don’t consistently pass are soon ignored by the entire team. The execution report, dashboard, and notification emails should mean something. Flickering tests are pernicious threats to an automation suite; they steal time away from more valuable activities; they hurt the trustworthiness of our automated executions; and the limit the success of future tests because they can’t be used as building blocks.

“A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random. Non-determinism can plague any kind of test, but it’s particularly prone to affect tests with a broad scope, such as acceptance or functional tests.” – Martin Fowler

Martin Fowler has a response to flickering tests that is quite appropriate given the current state of the world: quarantine. First remove the flickering tests from any active executions (triggered by scheduled jobs or part of a build pipeline). The quality of the automated tests must be maintained lest we lose confidence from our team and our stakeholders. Next perform root cause analysis on each flickering test to determine the source of the flakiness: our own coding practices, environment, data, the application under test, external service, or any combination of the listed reasons. This can be a time intensive endeavor but it’s important to address these issues before your automation suite turns into a monster you can no longer fight. If the source of failure can be addressed, then the test can be added to the rest of the executions; otherwise remove it.

Challenge Four: Long Tests

Another common problem seen in automation suites are overly long tests with literally hundreds of validations. These tests perhaps started with a small scope but began a long scope creep as more and more validations were tacked on to a flow. Validations for fields and messages and navigation – any and everything could be added to these monstrous test cases. There are a host of problems with this approach. For one, long tests take a long time to execute. If the goal is fast feedback, especially fast feedback in a CI/CD pipeline, then long tests will kill your ability to deliver quickly. Another issue is missed validations. Many automated testing platforms skip the remaining validations within a single test once a step fails. If a long test fails at step 20 of 300, then you have no idea if there are issues with step 21 through 300. The team now has less knowledge about the state of the application because those missed validations are unknown until you move beyond that failed step. Lastly, many of the validations in those long tests should be unit or integrations tests. That test is sacrificing speed and quality and returning little of value.

Slice and dice long tests. Ideally each automated check focuses on a single validation or “outcome”. UI tests should be focused on a successful outcome from a user’s perspective. For those fields and messages and database calls, instead implement the tests most suited to fast feedback and robustness. An automation approach needs to place unit tests and integration tests as a priority over UI tests. Automate UI as needed to verify end-user behavior.

Challenge Five: Shaky Foundation

We have all been victim to the “project deadline” bug. Whatever the best intentions were at the outset of a project, we become constrained by timelines that simply will not move. All the grand ideas we had about architecting an awesome solution are soon thrown by the wayside in favor of getting “done”. So we continue to make sacrifices to the quality of our work for the sake of getting done again and again. The problems with our automation suite pile up and soon we’re writing our code to get to the next day rather than help the poor schmuck who will have to dig through our code 5 years from now. Whomever that guy/gal is will likely throw the codebase away and start anew because we’ve built the entire suite on a shaky foundation.

Our team has thrown plenty of legacy automation suites in the garbage and a few of our own joined the pile early on when we realized the mistakes we made were not sustainable. An automation suite that is not constructed properly from the beginning and maintained throughout its life will eventually fall apart. It’s a lot easier to make a lot of small bad decisions to get “done” than short-term costly up-front decisions that ultimately save us time down the line. Once that shaky suite is built it’s hard for the creators and maintainers to discard it because of the sunk-cost fallacy. A better path is to architect an automated solution with good practices from the start and to consistently engage in activities to promote the quality of the suite.

Treat the automation code with the same care and expectations as you would expect of the development code. That means leveraging the existing principles of “Don’t Repeat Yourself” (DRY) &“Keep It Simple, Stupid” (KISS), implementing design patterns to support the overall goal of the automated solution, scheduling regular code reviews, using code analysis tools, and engaging in regular refactoring sessions.

The preceding topics and their associated sources are too large for a single article to cover, but we’ll attempt to do them justice in some concise advice. If you’re testing web applications, it’s a good idea to consider using the Page Object pattern or the Screenplay pattern. These are tried-and-true patterns with a great deal of background material and active training to support learning. Many of the existing version control tools out there have built-in policies to ensure code reviews are performed before branches are merged. These automatic tollgates can help enforce code review practices agreed to by a team and help spread domain knowledge by checking each other’s work. Static code analysis tools or linters are great at picking up common errors in the code; execution of such linters can be made standard practice with each commit or separately executed to support refactoring sessions. Lastly, regular refactoring sessions should be held by the team and supported by an outside “automation oracle” to help improve the state of the codebase while also sharing domain knowledge. More will be shared on refactoring sessions in a later article.

All these activities described above are designed to support the quality of the automation code. It certainly sounds like a lot of work – but quality doesn’t come free. The team must be committed to the quality of the testing activities with the same vigor we expect of development work or business analysis or infrastructure. Avoiding a shaky foundation through good practices will help keep the automation in a healthy state.

Challenge Six: Automation Lags Behind Development

Similar to the “deadline driven development” described in the prior challenge, teams often run into a time crunch in the handoff from development to testing. Development extends beyond their initial estimations and the time allocated for testing becomes more limited. Since automation scripting does take time, teams can fall into a trap of skipping automation for the sake of manual validation or pushing the automation to the next project or Sprint for the sake of pushing to production on time. This creates a build-up of automation technical debt since there are likely candidate test cases for automation that are simply not done, or the team violates their working agreement and pushes through development work that hasn’t been tested properly. Continuing this practice project-after-project or sprint-after-sprint results in an accumulation of technical debt that limits the test coverage of an application. Ultimately defects will escape into production if a team constantly has testing as a whole (and automation specifically) lagging behind development.

To address the issue of automation lagging behind development, it’s imperative for a team to incorporate automation feasibility into the entry criteria for any feature to be developed. That means the team determines test automation candidates during refinement of new stories, which include the aforementioned access to test data from Challenge #2. Additionally, teams must consider completed (and executed!) scripts as part of the definition of done or exit criteria for any development work. If deadlines preclude this work from being done, teams should adopt working agreements that the “missed” automation is added as technical debt to be addressed at the beginning of the next Sprint or Project cycle. If this becomes a common occurrence, then the team must address the underlying cause of their estimations being lower than what’s needed to deliver a product that is tested according to their standards.

To help ensure automation runs concurrently with development, teams should adopt development standards that help promote automation as an upfront concern. That can include Test-Driven Development (TDD), Acceptance Test-Driven Development (ATDD), as well as Behavior-Driven Development (BDD). These practices promote testing up front and testing from the perspective of the user. When working on UI automated tests, it’s recommended the developers maintain standards for element locator IDs so the automation developers can write scripts concurrently with development.

Post Credits Scene

The challenges discussed in this post were not an exhaustive list of all the problems a team could face with test automation but do provide insight into common issues. Test automation is a big investment for an organization; it’s not a magic wand that makes all testing less costly or finds all your bugs. Automation is another tool to support the team in their quest for quality. Teams that treat their automation code the same as development code and follow practices that promote good code quality are more likely to have long-term success with their automated tests. You don’t have to be a superhero to write good automated tests – all you need is a desire to improve and the will to see it through.

Fictional Test Data

An underlying principle in our work as software developers is that everyone should understand our work. From design to production, we strive to produce sensible models for other humans to understand. We design requirements for clarity, and hammer them out until everyone involved agrees that they make sense. We write code that is self-documenting, employs conventions, and uses design patterns, so that other developers can better comprehend how it works. We write tests that tell detailed stories about software behavior – stories that are not only truthful, but easily understood. We enshrine this principle in the tools and processes we use, in quality assurance especially, with tools like Cucumber and Gherkin, which emphasize collaboration and communication.

We are storytellers

To that end, I propose an exercise in which we try on a new hat – the storyteller.

I sense some parallel between my experience as a reader and my experience in quality assurance. I feel sensitive to the difference between an easy, accessible writing style, and writing that is denser and more challenging. Popular authors like Stephen King are criticized for being too prolific, too popular, and too easy to read, but there is a great value in accessibility – reaching a wide audience is good for the business model.

In software development, striving for accessibility can be valuable. Most of the difficulty that I’ve witnessed and experienced can be attributed not to the inherent complexity of code, or to the scale of a system, but to simple miscommunications that occur as we work to build them. From the perspective of quality assurance, it’s particularly harmful when our tests, our expressions of expected system behavior, are difficult to understand. In particular, I find that test data which drives a test is difficult to understand, untrustworthy, and time-consuming to manage.

When I say “test data”, I’m speaking broadly about information in our systems as it is employed by our tests. It’s helpful to break this down – a common model categorizes information as master data, transactional data, and analytical data.

Most of the data that we directly reference in our tests falls into the category of master data. Master data includes business entities like users, products, and accounts. This data is persistent in the systems that we test, and it becomes persistent in our tests too – most test cases involve authenticating as some kind of user, or interactioning with some kind of object (like a product). This is usually the main character in our stories.

Transactional data is just what it sounds like – transactions. In our systems, this may include purchases, orders, submissions, etc – any record of an interaction within the system. We don’t usually express this directly in our tests, but transactional data is intrinsically linked to master data, and the entities that we use in our tests are further defined by any associated transactional data.

The last category is analytical data, which is not obviously expressed in our tests. This encompasses metrics and measurements collected from production systems and users to make business decisions that drive software development. It tells us about the means by which users access our systems, and the way that they use them. This data is also a part of our tests – we employ information about real users and real interactions to improve our testing, and all of our test data becomes a reflection of the real world.

What does our test data typically look like?

I wouldn’t judge a book by it’s cover, but I would like to read test data at a glance. That’s not easy to do when we share user data that looks like the following example:

We don’t know much about this user without doing further research, like slinging SQL queries, or booting up the app-under-test to take a look. This information is not recognizable or memorable, and it undermines the confidence of anyone who would attempt to read it or use it. It tells a poor story.

Why do we construct data like this? The test data I remember using most often was not particularly well-designed, but simply very common. Sometimes a user is readily shared amongst testers because it is difficult to find or create something better – I give this user to you because it was given to me. At best, we could infer that this is a fair representative of a “generic user” – at worst, we may not even think about it. When we discover some strange new behavior in the system, something which may be a real defect to act on, we often need to ask first “was this data valid?”

Would our work be easier if our data was more carefully constructed?

As an example, I present the Ward family. I designed the Ward family to test tiers of a loyalty points system, and each member represents a specific tier. For the highest tier user, with more rewards than the others, I created Maury Wards. For the middle tier, a user with some rewards – Summer Wards. To represent the user who has earned no rewards – Nora Wards. If the gag isn’t obvious, try sounding out the names as you read them.

I created these users without much though. I was just trying to be funny. I don’t like writing test data, and making a joke of it can be motivating. What I didn’t realize until later is that this data set was not only meaningful, but memorable. I found myself re-using the Ward family, every time I needed a specific loyalty tier, for months. I knew what this data represented, and I knew exactly when I needed to use it.

Beyond the names, I employed other conventions that also made this data easier to use. For example, I could summon these users with confidence in all of our test environments because I gave them email addresses that indicated not only what kind of user they are, but what environment they were created in. I recommend applying such conventions to any visible and non-critical information to imbue data with meaning and tell a clear story.

What could we do to tell a more detailed story?

User stories are relayed to us through an elaborate game of telephone, and something is often lost along the way. Take a look at the following example, and you may see what I mean.

“As a user”. Right there. This example may seem contrived, but I’ve seen it often – a user story without a real user. This doesn’t explicitly encourage us to consider the different real-world people who will interact with our software, and the kind of tests that we should design for them. It would probably make an individual requirement clumsy to include much more explicit information about the user, but it is important. Imagine that this feature was tested with “a user”, and it passed without issue – great. But what about Dan? Dan does all of his business online, and doesn’t shop in-store. Where he lives, our system won’t even recommend a nearby store. How can we avoid forgetting about users like Dan?

If we can’t describe the users in a requirement, what can we do?

Alan Cooper, software developer and author of The Inmates Are Running The Asylum, argues that we can only be successful if we design our software for specific users. We don’t want all users to be somewhat satisfied – we want specific users to be completely satisfied. He recommends the use of personas – hypothetical archetypes that represent actual users through the software design process. UX designers employ personas to infer the needs of real-world users and design solutions that will address them, and for quality assurance, we should use the same personas to drive test case design and execution.

If I expanded a member of the Wards family into a full persona, it might look like the following example – a little something about who the user is, where they are, and how they interact with our system.

A persona includes personal information about a user, even seemingly irrelevant information, like a picture, name, age, career, etc – to make them feel like a real, relatable person. Thinking about a real human being will help us understand which features matter to the user, and how the user will experience these new features, to design test cases which support them.

A persona includes geographic location, especially when relevant in our tests. Software might behave differently depending on the user’s specific GPS location, local time zone, and even legislation. A user may be directed to nearby store locations or use a specific feature while in-store. Our software may behave differently depending on time and date – for example, delivery estimates, or transaction cut-off times. Our software may need to accommodate laws that make it illegal to do business across geographic boundaries, or to do business differently. The California Consumer Privacy Act (CCPA) is a recognizable example with implications for all kinds of software-dependent businesses.

A persona also includes information about the technology that a user favors. This is the lens through which they view our software, and it changes the user experience dramatically. How is this feature presented on PCs, smartphones, and tablets? Does it work for users on different operating systems? Which browsers, or clients, do we support? We can design personas for users with many combinations of hardware and software, and then execute the same test with each of them.

Hope lives in Honolulu, Hawaii, and I chose the name because the ‘H’ sound reminds me that. She lives in the Hawaiian-Aleution time zone, which can be easy to forget about if we do most of our testing against a corporate office address. She uses a Google Pixel 3 and keeps the operating system up-to-date – currently Android 10. While Honolulu is a major city, I took a liberty of assuming a poor internet connection – something else which may not be tested if don’t build personas like this.

Lee lives in Los Angeles, CA – Pacific Time Zone. He uses an iPhone-XS Max, and he doesn’t update the operating system immediately – he’s currently using iOS 12. He has a good network connection, but there’s a wrinkle – he’s using other apps that could compete for bandwidth and hardware resources.

Cass lives in Chicago, IL – Central Time Zone. She’s another Android user, this time a Samsung device, currently running Android 9. She has a good connection, but she’s using other apps which also use her GPS location.

How do we manage all of this?

If I asked you today, “where can I find a user who meets a specific condition,” where would you start? How is your test data managed today? There are plenty of valid solutions, like SharePoint, wikis, network drives, etc – but don’t think of the application database as a test data library – in test environments, it is not a library, but a landfill. There is too much to parse, too many duplicates, too much invalid data – we can only find helpful data if we are very good at finding it. Keep personas and detailed data somewhere that can be easily accessed and manipulated.

We can further reduce the work of test data management by treating the collection like a curated, personal library, where every story is included for a reason. Take care to reduce noise by eliminating duplicate data sets, and removing invalid ones. Name data sets for reference so that they can be updated or recreated as needed without disrupting the requirements, test cases, and software developers that use them.

In summary, I advocate the following:

  • Test data should be recognizable and memorable
  • Test data should be realistic and relatable
  • Test data should be curated and readily available

Additional Resources:

The Inmates Are Running The Asylum, Alan Cooper
Types of Enterprise Data

Automated Accessibility (ADA) Testing with Pa11y

ADA validation is the oft-forgotten child in QA Automation conversations. As Quality Assurance professionals we focus on functional, performance, and security testing but miss the value and importance of accessibility validations. Any site that is customer-facing has an obligation to comply to ADA standards. Therefore, it’s important for us to make accessibility an up-front concern in testing.

For a little background, the Americans with Disabilities Act was signed into law in 1990. The law legally obligates the prevention of discrimination in all areas of public life. There are five sections of the law, but for web applications only the regulations within Title 3 – Public Accommodations apply.


The frequency and severity of the lawsuits related to ADA Title 3 are rising year over year, as evidenced by the chart above. Most of the lawsuits require the company to set aside resources to become ADA compliant.

The commonly accepted standard to gauge ADA compliance is the WCAG (Web Content Accessibility Guideline). The guidelines are provided by the W3C (World Wide Web Consortium). They are broken down into four sections.

Perceivable – Information and user interface components must be presentable to users in ways they can perceive.

Operable – User interface components and navigation must be operable.

Understandable – Information and the operation of user interface must be understandable.

Robust – Content must be robust enough that it can be interpreted by a wide variety of user agents, including assistive technologies.

Each guideline has multiple subsections and multiple success criteria to determine compliance; the criteria are judged on A, AA, and AAA standards with A being the lowest compliance level and AAA being the highest. Additional information can be found HERE.

There are multiple options for ADA compliance testing that fall into a few categories: (1) In house QA utilizing tools like JAWS to go through the web applications flow utilizing tools designed for disabled individuals; (2) Companies that will complete scans and/or manual validations as a service; and, (3) Static tools built as add-on’s into browsers such as Axe and Wave. I personally have no problem with any approach that nets results but would like to provide a fourth option to the list.

Pa11y is an accessibility testing tool utilized to determine a websites’ WCAG Compliance. The tool can scan based on A, AA, and AAA standards and can be executed directly against a web page or an HTML file.

To jump into the weeds a bit, Pa11y utilizes Puppeteer which is a node library that provides an API to interact with headless Chrome. When Pa11y runs Puppeteer it’s creating a headless Chrome browser, navigating to the web page or opening the HTML file in the browser and then the page is scanned against an WCAG compliance rule-set.

The next logical question is “what rule-set is utilized to determine WCAG compliance?” Pa11y by default utilizes HTML Code Sniffer, which is a client side script that scans HTML source code and detects violations of a defined coding standard.

Pa11y will return the following for each violation found: a description of the error, the WCAG guideline violated, the path to the HTML element, and the content of the HTML element. Pa11y by default will output to the command line but with some configuration modifications can export either CSV or JSON.

Additionally, Pa11y has the ability to run the Axe rule-set against the HTML spun up in the browser. This can provide a good level set if your developers are utilizing Axe as well.

So now that we have covered Pa11y, the next step will be discussing the ways in which we can implement Pa11y to run automatically.

The first way is built into Pa11y: we can implement the actions functionality that allows the browser to navigate through the web-page utilizing CSS Selectors via Puppeteer.


The second way is to utilize an existing test automation framework to complete the following steps:

  1. Utilize your test framework to navigate to the desired scanned page, scrape, and save the HTML to disk.
  2. From the command line pass the HTML file to Pa11y

The first option is beneficial if you’re building from the ground up as no additional framework is needed. The second option is beneficial if you have a built automation framework and you want to utilize the existing navigation to all the various pages that require validation.

Whichever choice you make the scripts should be built into a Continuous Integration (CI) job utilizing your tool of choice (Jenkins, Bamboo, etc). In addition to providing a way to continuously execute your scripts the CI tool will provide a location for storage of the scans to prove compliance if a lawsuit requires proof of compliance effort.

One important note: Automated scans with Pa11y do not replace the need for manual validation as there are WCAG requirements that cannot be validated via an automated scanning tool.

In summary, every web development team should be validating WCAG compliance as part of its software development life cycle. Also, WCAG compliance should be included in teams definition of done for a given card. Lastly, to maximize success for an application under test the results should be transparent and utilize your preexisting automation framework if possible to do the heavy lifting.

Validating Site Analytics

Almost every modern company with an e-commerce presence makes decisions with the help of site data and analytics. The questions posed surrounding a user base can be almost endless. Which pages are people viewing? What marketing campaigns and promotions are actually working? How much revenue is being generated and where is coming from?

In an environment where data is valuable and accessible, it’s important to take a step back and ask the question: is this data accurate? If the data Brad Pitt was basing decisions on to run a baseball organization in the movie Moneyball wasn’t correct, then it would’ve been an extremely short movie (if not somewhat comical). Ultimately, the analytics collected from our websites and applications are used to make important decisions for our organizations. When that data turns out to be inaccurate then it becomes worthless, or worse yet, negatively impacts our business.

Throughout my professional career I have noticed ensuring the integrity of this data can often be put on the backburner within individual software teams. Sure, it’s one of the most important things to leadership, but in our day-to-day job we are often focused on more visible functionality rather than the one network call in the background that is reporting data and doesn’t have anything to do with our apps actually working. At the end of the day, if this data is valuable to our leaders and organization, then it should be valuable to us.

Let’s look at an imaginary business scenario. Say we have a site that sells kittens. Our site sells all kinds of kitten breeds. Our Agile team been working on the site for a long time and feels pretty good about our development pipelines and practices. The automated testing suite for the site is robust and well maintained, with lots of scripts and solid site coverage.

Then one day we find out that Billy from the business team has been doing user acceptance testing on our Adobe Analytics once every couple months. He’s got about 200 scripts that he manually goes through, and he does his best to look at all the really important functionality. But wait a second… we know that our site records data for about 100 unique user events. What’s more, there are about 200 additional fields of data that we are sending along with those events, and we are sending data on almost every page for almost every significant site interaction. This could easily translate into thousands of test cases! How could we possibly be confident in our data integrity when we are constantly making changes to these pages? How in the world is Billy okay with running through these scripts all the time? Is Billy a robot? Can we really trust Billy?

This new information seems like a potential quality gap to our team, and we wonder how we can go about automating away this effort. It definitely checks all the boxes for a good process to automate. It is manual, mundane, easily repeated, and will result in significant time savings. So what are our options? Our Selenium tests can hit the front end, but have no knowledge of the network calls behind the scenes. We know that there are 3rd party options, but we don’t have the budget to invest in a new tool. Luckily, there’s an open source tool that will hook up to our existing test suite and won’t be hard to implement.

The tool that we’re talking about is called Browserup Proxy (BUP), formerly known as Browsermob proxy. BUP works by setting up a local proxy that network traffic can be passed through. This proxy then captures all of the request and response data passing through it, and allows us to access and manipulate that data. This proxy can do a lot for us, such as blacklisting/whitelisting URLs, simulating network conditions (e.g. high latency), and control DNS settings, but what we really care about is capturing that HTTP data.

BUP makes it relatively easy for us to include a proxy instance for our tests when we instantiate our Selenium driver. We simply have to start our proxy, create a Selenium Proxy object using our running proxy, and pass the Selenium Proxy object into our driver capabilities. Then we execute one command that tells the driver to create HAR files containing request and response data.

from the BUP GitHub page at

Since we will be working with HAR files, let’s talk about what those actually are. HAR stands for “HTML Archive”. When we go into our Network tab in our browser’s Developer Tools and export that data, it’s also saved in this format. These files hold every request/response pair in an entry. Each entry contains data such as URL’s, query string parameters, response codes, and timings.

HAR file example from using Google’s HAR Analyzer
HAR entry details example

Now we can better visualize what we’re working with here. Assuming we’ve already collected our 200 regression scenarios from Billy the Robot, we should have a good jumping off point to start validating this data more thoroughly. The beauty of this approach is we can now hook these validations up to our existing tests. We already have plenty of code to navigate through the site, right? Now all we need is some additional code to perform some new validations.

Above we mentioned that our site is using Adobe Analytics. This service passes data from our site to the cloud using some interesting calls. Each Adobe call will be a GET that passes its data via the query parameters. So in this case we need to find the call that we’re looking to validate, and then make sure that the correct data is included in that call. To find the correct call, we can simply use a unique identifier (e.g. signInClickEvent) and sort through the request URLs until we find the correct call. It might be useful to use the following format to store our validation data:

Data stored in YML format

Storing data this way makes it simple and easy to worth with. We have a descriptive name, we have an identifier to find the correct request, and we have a nice list of fields that we want to validate. We can allow our tests to simply ignore the fields that we’re not specifically looking to validate. Our degree of difficulty will increase somewhat if we are trying to validate entire request or response payloads, but this general format is still workable. So to review our general workflow for these types of validations:

  1. Use suite to instantiate Proxy
  2. Pass Proxy into Selenium driver
  3. Run Selenium scripts as normal and generate desired event(s)
  4. Load HTTP traffic from Proxy object
  5. Find correct call based on unique identifier
  6. Perform validation(s)
  7. (optional) Save HAR file for logs

Not too bad! We can assume that our kitten site probably already has a lot of our scenarios built out, but we just didn’t know it before. There’s a good chance that we can simply slap some validations onto the end of some existing scripts and they’ll be ready to go. We’ll soon be able to get those 200 UAT scripts built out in our suite and executing regularly, and Billy will have a little less work on his plate going forward (the psychopath).

In my opinion, it’s a very good idea to implement these validations into your test automation frameworks. The amount of value they provide compared with the amount of effort required (assuming you are already running Selenium scripts) makes this a smart functionality to implement. Building out these tests for my teams has contributed to finding a number of analytics defects that probably would’ve never been found otherwise and, as a result, has increased the quality of our site’s data.

A few notes:
– We don’t necessarily want to instantiate our Proxy with every Selenium test we run. The proxy will consumer additional resources compared to running normal tests, but how much this affects your test box will vary depending on hardware. It is recommended that you use some sort of flag or environment variable to determine if the Proxy should be instantiated.
– It can seem practical to make a separate testing suite to perform these validations, but with that approach you will have to maintain practically duplicate code in more than one place. It is easier to plug this into existing suites.
– BUP is a Java application that has it’s own directory and files. The easiest way to manage distribution of these files is to plug it into version control in a project’s utility folder. There is no BUP installation required outside of having a valid Java version.
– I wanted to keep this post high level, but if you are using Ruby then there are useful gems to work with Browserup/Browsermob and HAR files (“browsermob-proxy” and “har”, respectively).

Happy testing!

Additional References:

Browserup Proxy
Browsermob Proxy Ruby gem
HAR Ruby gem

Welcome to Red Green Refactor

We officially welcome you to the start of Red Green Refactor, a technology blog about automation and DevOps. We are a group of passionate technologists who care about learning and sharing our knowledge. Information Technology is a huge field and even though we’re a small part of it – we wanted another outlet to collaborate with the community.

Why Red Green Refactor?

Red Green Refactor is a term commonly used in Test Driven Development to support a test first approach to software design. Kent Beck is generally credited with discovering or “rediscovering” the phrase “Test Driven Development”. The mantra for the practice is red-green-refactor, where the colors refer to the status of the test driving the development code.

The Red is writing a small piece of test code without the development code implemented. The test should fail upon execution – a red failure. The Green is writing just enough development code to get the test code to pass. The test should pass upon execution – a green pass. The Refactor is making small improvements to the development code without affecting the behavior. The quality of the code is improved according to team standards, addressing “code smells” (making the code readable, maintainable, removing duplication), or using simple design patterns. The point of the practice is to make the code more robust by catching the mistakes early, with an eye on quality of the code from the beginning. Writing in small batches helps the practitioner think about the design of their program consistently.

“Refactoring is a controlled technique for improving the design of an existing codebase.”

Martin Fowler

The goal of Red Green Refactor is similar to the practice of refactoring: to make small-yet-cumulative positive changes, but instead in learning to help educate the community about automation and DevOps. The act of publishing also encourages our team to refine our materials in preparation for a larger audience. Many of the writers on Red Green Refactor speak at conferences, professional groups, and the occasional webinar. The learning at Red Green Refactor will be bi-directional – to the readers and to the writers.

Who Are We?

The writers on Red Green Refactor come from varied backgrounds but all of us made our way into information technology, some purposefully and some accidentally. Our primary focus was on test automation, which has evolved into DevOps practices as we expanded our scope into operations. Occasionally we will invite external contributors to post on a subject of interest. We have a few invited writers lined up and ready to contribute.

“Automation Team” outing with some of Red-Green-Refactor authors

As for myself, I have a background in Physics & Biophysics, with over a decade spent in research science studying fluorescence spectroscopy and microscopy before joining IT. I’ve worked as a requirements analyst, developer, and tester before joining the ranks of pointed-headed management. That doesn’t stop me from exploring new tech at home though or posting about it on a blog.

What Can You Expect From Red Green Refactor?


Some companies are in the .NET stack, some are Java shops, but everyone needs some form of automation. The result is many varied implementations of both test & task automation. Our team has supported almost all the application types under the sun (desktop, web, mobile, database, API/services, mainframe, etc.). We’ve also explored with many tools both open-source and commercial at companies with ancient tech and bleeding edge. Our posts will be driven by both prior experience as well as exploration to the unknown.

We’ll be exploring programming languages and tools in the automation space.  Readers can expect to learn about frameworks, cloud solutions, CI/CD, design patterns, code reviews, refactoring, metrics, implementation strategies, performance testing, etc. – it’s open ended.

Continuous Improvement

We aim to keep our readers informed about continuous improvement activities in the community. One of the great things about this field is there is so much to learn and it’s ever-changing. It can be difficult at times with the firehose of information coming at you since there are only so many hours in the day. We tend to divide responsibility among our group to perform “deep dives” into certain topics and then share that knowledge with a wider audience (for example: Docker, Analytics or Robot Process Automation). In the same spirit we plan to share information on Red Green Refactor about continuous improvement. Posts about continuous improvement will include: trainings, conference recaps, professional groups, aggregated articles, podcasts, tech book summaries, career development, and even the occasional job posting.

Once again welcome to Red Green Refactor. Your feedback is always welcome.