This will be a series of posts about the strategy and tactics of test automation. My team has experience working at multiple large firms with an enterprise-wide scope. Throughout our time working in IT, we have encountered challenges with existing test automation implementations and, unfortunately, made several mistakes of our own along the way. This post focuses on UI-automated tests because, in our experience, that’s where most of the misplaced effort in test automation is found. Our hope is to relay some of these common challenges and solutions to a wider audience so you can defeat the automation supervillains.
Challenge One: The Automation Firehose
Just because a scenario CAN be automated does not mean it SHOULD be automated. Teams that adopt automation often rush to automate everything they can: the automation firehose. The firehose results from teams becoming enamored with a new tool and wanting to use it everywhere. It’s a natural inclination for those of us in technology to be excited about new tech.
Instead, teams should adopt a risk-based approach to determine the most critical scenarios needing attention. For those scenarios that should be automated, every team must adopt an implementation plan to ensure value is derived from reliable automated test execution. That plan should include entry and exit criteria for any automated scripts that take into account the schedule, the budget, and the technical skillset of the developers. Additionally, the automation scripts should focus on scenarios that are frequently used or on critical paths, that have heavy data dependencies, or that carry legal risk (SOX compliance, ADA compliance, etc.).
One recommendation is to use an “automation scorecard” to identify the most important scenarios to automate. The columns should be the criteria you will use to judge whether or not a scenario should be automated. The rows will include either feature-level work or individual scenarios. In the example provided we use simple checkboxes to help determine features that should be automated. Checkboxes could easily be replaced with a scale of zero to ten, low-medium-high, or whatever scale the team agrees to use. Only four categories are used in the example, but you could easily extend this based on team or organizational values. A key component of using this sort of scorecard is to establish a threshold for scenarios to be automated so teams can start with the most VALUABLE scenarios first and work their way down the list. The result is often a more focused UI-automation suite, with more valuable tests that require less upkeep (because there are fewer of them).
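To make the scorecard idea concrete, here is a minimal sketch in Python. The four criteria names are our assumption, borrowed from the focus areas listed earlier (frequent use, critical path, data dependency, legal risk), and the scenarios and threshold are invented for illustration. Your team's columns and cut-off will differ.

```python
from dataclasses import dataclass

# Hypothetical scorecard columns; a team would substitute its own criteria.
CRITERIA = ("frequently_used", "critical_path", "data_heavy", "legal_risk")


@dataclass
class Scenario:
    name: str
    checks: dict  # criterion name -> True/False (a ticked checkbox)

    def score(self) -> int:
        # Count ticked boxes; criteria not listed in CRITERIA are ignored.
        return sum(1 for c in CRITERIA if self.checks.get(c, False))


def automation_candidates(scenarios, threshold):
    """Return scenarios meeting the threshold, highest score first."""
    eligible = [s for s in scenarios if s.score() >= threshold]
    return sorted(eligible, key=lambda s: s.score(), reverse=True)


# Illustrative rows only; these are not taken from a real scorecard.
scenarios = [
    Scenario("Login", {"frequently_used": True, "critical_path": True}),
    Scenario(
        "Export audit report",
        {"critical_path": True, "data_heavy": True, "legal_risk": True},
    ),
    Scenario("Change avatar", {"frequently_used": False}),
]

for s in automation_candidates(scenarios, threshold=2):
    print(s.name, s.score())
```

With a threshold of 2, only the first two scenarios make the list, and the audit report comes first because it ticks three boxes. Swapping the checkboxes for a zero-to-ten scale would only require changing how score() adds up the values.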
Challenge Two: Data Failure
When a team writes an automated test with only a single test environment in mind, they are selling themselves short. An even larger problem for testers is simply not having access to or control over their own test data. The data can be stale in an environment, apply only to a single environment, be restricted by an external team, or come from an external vendor. There are many ways we can run into data challenges in testing, and they extend to our automated tests. A test that only works in a single environment cannot achieve the same value proposition as a test that works across multiple environments. Consider one of the “selling” points of test automation: automated checks can run many times unattended or as part of a build pipeline to give the team insight into the state of the application. A test that only works in one environment has limited scope and cannot achieve its full potential. Perhaps that automated check shouldn’t have been written in the first place because it takes more time to write & maintain than it would to execute manually.
To address this challenge, make sure cross-environment compatibility is an up-front concern. Before the development work even begins on a feature, test data generation & manipulation across multiple environments should be part of the “ready” state criteria. Additionally, execution of those automated checks across multiple environments should be part of the “done” state criteria. Advising people to adopt this approach is the easy part. How can control of test data for automation be achieved? Through persistence and patience. Before making cross-environment test data part of any “ready” and “done” state criteria, it’s important to capture what your data needs are and how best to use that data. Some of these tips are in a prior blog post, Fictional Test Data. Map out the application under test using a context-dependency diagram. Identify the inputs & outputs of your system and the expected outcomes. From that refined view it will be more apparent what data is needed and when you need to create, read, update, and delete it (simple CRUD).
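One way to keep a check portable is to keep every environment-specific value out of the check itself and look it up by environment name. The sketch below assumes a hypothetical TEST_ENV environment variable and made-up URLs and customer IDs. It is not tied to any particular framework.

```python
import os

# Hypothetical per-environment data; names and values are illustrative only.
TEST_DATA = {
    "dev": {"base_url": "https://dev.example.test", "customer_id": "DEV-1001"},
    "qa": {"base_url": "https://qa.example.test", "customer_id": "QA-2001"},
}


def load_test_data(env=None):
    """Pick the data set for the target environment, failing loudly if absent."""
    # TEST_ENV is an assumed variable name; fall back to "dev" when unset.
    env = env or os.environ.get("TEST_ENV", "dev")
    if env not in TEST_DATA:
        raise LookupError(f"No test data defined for environment {env!r}")
    return TEST_DATA[env]


def customer_page_url(data):
    # The check never hard-codes an environment-specific value.
    return f"{data['base_url']}/customers/{data['customer_id']}"


print(customer_page_url(load_test_data("qa")))
```

The loud failure for an unknown environment is deliberate. A missing data set then shows up as a gap to raise with the team, not as a confusing test failure.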
While the topic of test data at large is beyond the scope of this post, for automated checks we first identify what the needs are and then fight to get access to that data. The most persuasive argument you can make to management and the other impacted teams is empirical evidence of where this lack of data is hurting the company. What bugs have escaped because you couldn’t test in an environment? What automated checks needed to be tested manually across those environments? What stale data or selfish data do you have today that is hindering the team’s ability to deliver in a timely manner? Identifying those concerns using evidence will help build your case to get the access needed or at least pave the way to generate fictional test data for those automated checks. Once you have that clear picture, adopt those “ready” and “done” state criteria requiring test data so your tests can be cross-environment compatible and have a higher ROI.
Challenge Three: Flickering Tests
Flickering tests, or “flaky” tests, are tests that can either pass or fail even when run on the same code. Automated tests that don’t consistently pass are soon ignored by the entire team. The execution report, dashboard, and notification emails should mean something. Flickering tests are pernicious threats to an automation suite: they steal time away from more valuable activities, they hurt the trustworthiness of our automated executions, and they limit the success of future tests because they can’t be used as building blocks.
“A test is non-deterministic when it passes sometimes and fails sometimes, without any noticeable change in the code, tests, or environment. Such tests fail, then you re-run them and they pass. Test failures for such tests are seemingly random. Non-determinism can plague any kind of test, but it’s particularly prone to affect tests with a broad scope, such as acceptance or functional tests.” – Martin Fowler
Martin Fowler has a response to flickering tests that is quite appropriate given the current state of the world: quarantine. First remove the flickering tests from any active executions (triggered by scheduled jobs or part of a build pipeline). The quality of the automated tests must be maintained lest we lose confidence from our team and our stakeholders. Next perform root cause analysis on each flickering test to determine the source of the flakiness: our own coding practices, environment, data, the application under test, external service, or any combination of the listed reasons. This can be a time intensive endeavor but it’s important to address these issues before your automation suite turns into a monster you can no longer fight. If the source of failure can be addressed, then the test can be added to the rest of the executions; otherwise remove it.
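As a rough illustration of the quarantine step, the sketch below reruns each check several times against unchanged code and sets aside any check whose results disagree. The function names and the alternating stand-in check are our own inventions. A real suite would lean on its test runner's rerun and tagging features instead of a hand-rolled loop.

```python
def classify(test_fn, runs=10):
    """Run a check repeatedly against unchanged code and label the outcome."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable"
    if outcomes == {"fail"}:
        return "failing"  # a consistent failure is a real signal, not flakiness
    return "flickering"


def partition(tests, runs=10):
    """Split checks into the active suite and the quarantine list."""
    active, quarantined = [], []
    for name, fn in tests.items():
        if classify(fn, runs) == "flickering":
            quarantined.append(name)
        else:
            active.append(name)
    return active, quarantined


def steady_check():
    assert 1 + 1 == 2


calls = {"n": 0}


def alternating_check():
    # Stand-in for a non-deterministic check: fails on every other call.
    calls["n"] += 1
    assert calls["n"] % 2 == 1


active, quarantined = partition(
    {"steady": steady_check, "alternating": alternating_check}, runs=4
)
print("active:", active)
print("quarantined:", quarantined)
```

Note that a consistently failing check stays in the active list, because it is reporting something real. Only checks that disagree with themselves are pulled aside for root cause analysis.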
Challenge Four: Long Tests
Another common problem seen in automation suites is overly long tests with literally hundreds of validations. These tests perhaps started with a small scope but crept outward as more and more validations were tacked on to a flow. Validations for fields and messages and navigation: anything and everything could be added to these monstrous test cases. There are a host of problems with this approach. For one, long tests take a long time to execute. If the goal is fast feedback, especially fast feedback in a CI/CD pipeline, then long tests will kill your ability to deliver quickly. Another issue is missed validations. Many automated testing platforms skip the remaining validations within a single test once a step fails. If a long test fails at step 20 of 300, then you have no idea whether there are issues with steps 21 through 300. The team now has less knowledge about the state of the application because those missed validations are unknown until you move beyond that failed step. Lastly, many of the validations in those long tests should be unit or integration tests. A long test like this sacrifices speed and quality and returns little of value.
Slice and dice long tests. Ideally each automated check focuses on a single validation or “outcome”. UI tests should be focused on a successful outcome from a user’s perspective. For those fields and messages and database calls, instead implement the tests most suited to fast feedback and robustness. An automation approach needs to place unit tests and integration tests as a priority over UI tests. Automate UI as needed to verify end-user behavior.
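Here is a small sketch of what “one outcome per check” can look like using Python's built-in unittest module. The place_order function is a hypothetical stand-in for a user flow, not part of any real application.

```python
import unittest


def place_order(cart):
    """Hypothetical stand-in for the user flow under test."""
    total = sum(price for _, price in cart)
    return {"status": "confirmed" if cart else "rejected", "total": total}


class OrderOutcomes(unittest.TestCase):
    # Each method checks one outcome, so one failure cannot hide the others.

    def test_order_with_items_is_confirmed(self):
        self.assertEqual(place_order([("book", 12)])["status"], "confirmed")

    def test_empty_cart_is_rejected(self):
        self.assertEqual(place_order([])["status"], "rejected")

    def test_total_is_sum_of_item_prices(self):
        self.assertEqual(place_order([("book", 12), ("pen", 3)])["total"], 15)


# Run the three checks and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrderOutcomes)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, "checks run,", len(result.failures), "failed")
```

If the total calculation broke, the report would still show whether confirmation and rejection work. A single long test that stops at its first failed step cannot tell you that.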
Challenge Five: Shaky Foundation
We have all been victim to the “project deadline” bug. Whatever the best intentions were at the outset of a project, we become constrained by timelines that simply will not move. All the grand ideas we had about architecting an awesome solution soon fall by the wayside in favor of getting “done”. So we continue to make sacrifices to the quality of our work for the sake of getting done, again and again. The problems with our automation suite pile up and soon we’re writing our code to get to the next day rather than to help the poor schmuck who will have to dig through our code 5 years from now. Whoever that person is will likely throw the codebase away and start anew because we’ve built the entire suite on a shaky foundation.
Our team has thrown plenty of legacy automation suites in the garbage, and a few of our own joined the pile early on when we realized the mistakes we had made were not sustainable. An automation suite that is not constructed properly from the beginning and maintained throughout its life will eventually fall apart. It’s a lot easier to make a lot of small bad decisions to get “done” than to make decisions that are costly up front but ultimately save us time down the line. Once that shaky suite is built, it’s hard for the creators and maintainers to discard it because of the sunk-cost fallacy. A better path is to architect an automated solution with good practices from the start and to consistently engage in activities that promote the quality of the suite.
Treat the automation code with the same care and expectations as the development code. That means leveraging the existing principles of “Don’t Repeat Yourself” (DRY) & “Keep It Simple, Stupid” (KISS), implementing design patterns to support the overall goal of the automated solution, scheduling regular code reviews, using code analysis tools, and engaging in regular refactoring sessions.
The preceding topics and their associated sources are too large for a single article to cover, but we’ll attempt to do them justice in some concise advice. If you’re testing web applications, it’s a good idea to consider using the Page Object pattern or the Screenplay pattern. These are tried-and-true patterns with a great deal of background material and active training to support learning. Many of the existing version control tools out there have built-in policies to ensure code reviews are performed before branches are merged. These automatic tollgates can help enforce code review practices agreed to by a team and help spread domain knowledge by checking each other’s work. Static code analysis tools or linters are great at picking up common errors in the code; execution of such linters can be made standard practice with each commit or separately executed to support refactoring sessions. Lastly, regular refactoring sessions should be held by the team and supported by an outside “automation oracle” to help improve the state of the codebase while also sharing domain knowledge. More will be shared on refactoring sessions in a later article.
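For readers new to the Page Object pattern, here is a minimal sketch. FakeDriver is a hypothetical in-memory stand-in we made up so the example runs without a browser. In a real suite the page object would wrap your actual driver, and the locator IDs shown are invented.

```python
class FakeDriver:
    """Hypothetical in-memory stand-in for a browser driver, for illustration."""

    def __init__(self):
        self.fields = {}
        self.clicked = []

    def type_into(self, locator, text):
        self.fields[locator] = text

    def click(self, locator):
        self.clicked.append(locator)


class LoginPage:
    # Locators live in one place, so a UI change means one edit here.
    USERNAME = "login-username"
    PASSWORD = "login-password"
    SUBMIT = "login-submit"

    def __init__(self, driver):
        self.driver = driver

    def log_in(self, username, password):
        """Expose the user's intent, not the individual element interactions."""
        self.driver.type_into(self.USERNAME, username)
        self.driver.type_into(self.PASSWORD, password)
        self.driver.click(self.SUBMIT)


driver = FakeDriver()
LoginPage(driver).log_in("pat", "s3cret")
print(driver.fields)
print(driver.clicked)
```

The test only says “log in”, while the page object knows which elements that involves. This is also DRY in practice: when a locator changes, every test that logs in is fixed by editing one constant.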
All these activities described above are designed to support the quality of the automation code. It certainly sounds like a lot of work – but quality doesn’t come free. The team must be committed to the quality of the testing activities with the same vigor we expect of development work or business analysis or infrastructure. Avoiding a shaky foundation through good practices will help keep the automation in a healthy state.
Challenge Six: Automation Lags Behind Development
Similar to the “deadline driven development” described in the prior challenge, teams often run into a time crunch in the handoff from development to testing. Development runs past its initial estimates and the time allocated for testing shrinks. Since automation scripting does take time, teams can fall into a trap of skipping automation in favor of manual validation, or pushing the automation to the next project or Sprint for the sake of releasing to production on time. This creates a build-up of automation technical debt, since there are likely candidate test cases for automation that are simply not done, or the team violates their working agreement and pushes through development work that hasn’t been tested properly. Continuing this practice project-after-project or sprint-after-sprint results in an accumulation of technical debt that limits the test coverage of an application. Ultimately defects will escape into production if a team constantly has testing as a whole (and automation specifically) lagging behind development.
To address the issue of automation lagging behind development, it’s imperative for a team to incorporate automation feasibility into the entry criteria for any feature to be developed. That means the team determines test automation candidates during refinement of new stories, which includes the aforementioned access to test data from Challenge #2. Additionally, teams must consider completed (and executed!) scripts as part of the definition of done or exit criteria for any development work. If deadlines preclude this work from being done, teams should adopt working agreements that the “missed” automation is added as technical debt to be addressed at the beginning of the next Sprint or Project cycle. If this becomes a common occurrence, then the team must address the underlying cause: their estimates are lower than what’s needed to deliver a product that is tested according to their standards.
To help ensure automation runs concurrently with development, teams should adopt development standards that help promote automation as an upfront concern. That can include Test-Driven Development (TDD), Acceptance Test-Driven Development (ATDD), as well as Behavior-Driven Development (BDD). These practices promote testing up front and testing from the perspective of the user. When working on UI automated tests, it’s recommended the developers maintain standards for element locator IDs so the automation developers can write scripts concurrently with development.
Post Credits Scene
The challenges discussed in this post are not an exhaustive list of all the problems a team could face with test automation, but they do provide insight into common issues. Test automation is a big investment for an organization; it’s not a magic wand that makes all testing less costly or finds all your bugs. Automation is another tool to support the team in their quest for quality. Teams that treat their automation code the same as development code and follow practices that promote good code quality are more likely to have long-term success with their automated tests. You don’t have to be a superhero to write good automated tests – all you need is a desire to improve and the will to see it through.