Validating Site Analytics

Almost every modern company with an e-commerce presence makes decisions with the help of site data and analytics. The questions we can ask about a user base are almost endless. Which pages are people viewing? What marketing campaigns and promotions are actually working? How much revenue is being generated and where is it coming from?

In an environment where data is valuable and accessible, it’s important to take a step back and ask the question: is this data accurate? If the data Brad Pitt was basing decisions on to run a baseball organization in the movie Moneyball wasn’t correct, then it would’ve been an extremely short movie (if not somewhat comical). Ultimately, the analytics collected from our websites and applications are used to make important decisions for our organizations. When that data turns out to be inaccurate, it becomes worthless or, worse yet, negatively impacts our business.

Throughout my professional career I have noticed that ensuring the integrity of this data is often put on the back burner within individual software teams. Sure, it’s one of the most important things to leadership, but in our day-to-day jobs we are often focused on more visible functionality rather than the one network call in the background that reports data and has nothing to do with whether our apps actually work. At the end of the day, if this data is valuable to our leaders and organization, then it should be valuable to us.

Let’s look at an imaginary business scenario. Say we have a site that sells kittens of all kinds of breeds. Our Agile team has been working on the site for a long time and feels pretty good about our development pipelines and practices. The automated testing suite for the site is robust and well maintained, with lots of scripts and solid site coverage.

Then one day we find out that Billy from the business team has been doing user acceptance testing on our Adobe Analytics once every couple months. He’s got about 200 scripts that he manually goes through, and he does his best to look at all the really important functionality. But wait a second… we know that our site records data for about 100 unique user events. What’s more, there are about 200 additional fields of data that we are sending along with those events, and we are sending data on almost every page for almost every significant site interaction. This could easily translate into thousands of test cases! How could we possibly be confident in our data integrity when we are constantly making changes to these pages? How in the world is Billy okay with running through these scripts all the time? Is Billy a robot? Can we really trust Billy?

This new information seems like a potential quality gap to our team, and we wonder how we can go about automating away this effort. It definitely checks all the boxes for a good process to automate. It is manual, mundane, easily repeated, and will result in significant time savings. So what are our options? Our Selenium tests can hit the front end, but have no knowledge of the network calls behind the scenes. We know that there are 3rd party options, but we don’t have the budget to invest in a new tool. Luckily, there’s an open source tool that will hook up to our existing test suite and won’t be hard to implement.

The tool that we’re talking about is called Browserup Proxy (BUP), formerly known as Browsermob proxy. BUP works by setting up a local proxy that network traffic can be passed through. This proxy then captures all of the request and response data passing through it, and allows us to access and manipulate that data. This proxy can do a lot for us, such as blacklisting/whitelisting URLs, simulating network conditions (e.g. high latency), and controlling DNS settings, but what we really care about is capturing that HTTP data.

BUP makes it relatively easy for us to include a proxy instance for our tests when we instantiate our Selenium driver. We simply have to start our proxy, create a Selenium Proxy object using our running proxy, and pass the Selenium Proxy object into our driver capabilities. Then we execute one command that tells the driver to create HAR files containing request and response data.

A setup example is available on the BUP GitHub page at https://github.com/browserup/browserup-proxy

Since we will be working with HAR files, let’s talk about what those actually are. HAR stands for “HTTP Archive”. When we go into the Network tab in our browser’s Developer Tools and export that data, it’s also saved in this format. These files hold every request/response pair in an entry. Each entry contains data such as URLs, query string parameters, response codes, and timings.

HAR file example from google.com using Google’s HAR Analyzer
HAR entry details example
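
To make that structure concrete, here is a minimal sketch of a HAR document with a single entry, plus a helper that walks the entries. The overall layout (a top-level log object holding a list of entries, each with request, response, and timings) follows the HAR format, but the URL, creator name, and values are made up for illustration, and real files carry many more fields.

```python
import json

# A trimmed-down HAR document with one entry. All values are illustrative.
SAMPLE_HAR = json.loads("""
{
  "log": {
    "version": "1.2",
    "creator": {"name": "example-proxy", "version": "0.0"},
    "entries": [
      {
        "startedDateTime": "2020-01-01T00:00:00.000Z",
        "time": 42,
        "request": {
          "method": "GET",
          "url": "https://analytics.example.com/collect?pageName=home",
          "queryString": [{"name": "pageName", "value": "home"}]
        },
        "response": {"status": 200},
        "timings": {"send": 1, "wait": 40, "receive": 1}
      }
    ]
  }
}
""")


def summarize_entries(har):
    """Return (method, url, status, total time in ms) for each HAR entry."""
    return [
        (
            entry["request"]["method"],
            entry["request"]["url"],
            entry["response"]["status"],
            entry["time"],
        )
        for entry in har["log"]["entries"]
    ]


for row in summarize_entries(SAMPLE_HAR):
    print(row)
```

Because a HAR file is plain JSON, nothing beyond the standard library is needed to read one.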

Now we can better visualize what we’re working with here. Assuming we’ve already collected our 200 regression scenarios from Billy the Robot, we should have a good jumping off point to start validating this data more thoroughly. The beauty of this approach is we can now hook these validations up to our existing tests. We already have plenty of code to navigate through the site, right? Now all we need is some additional code to perform some new validations.

Above we mentioned that our site is using Adobe Analytics. This service passes data from our site to the cloud using some interesting calls. Each Adobe call will be a GET that passes its data via the query parameters. So in this case we need to find the call that we’re looking to validate, and then make sure that the correct data is included in that call. To find the correct call, we can simply use a unique identifier (e.g. signInClickEvent) and sort through the request URLs until we find the correct call. It might be useful to use the following format to store our validation data:

Data stored in YML format
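
As an illustration only, a record of this kind might look like the following once it has been loaded into Python (for example from a YAML file). Every name and value here is hypothetical. In particular, the field names are placeholders and not a statement of Adobe’s actual parameter names.

```python
# Hypothetical validation record: a descriptive name, an identifier used to
# find the right request, and the fields we expect that request to carry.
SIGN_IN_CLICK = {
    "name": "Sign-in button click",
    "identifier": "signInClickEvent",
    "fields": {
        "pageName": "home",
        "loginState": "anonymous",
    },
}

REQUIRED_KEYS = {"name", "identifier", "fields"}


def missing_keys(record):
    """Return the required keys a record lacks (an empty set if it is complete)."""
    return REQUIRED_KEYS - set(record)


print(missing_keys(SIGN_IN_CLICK))
```

A small shape check like missing_keys can catch a malformed record before any browser is started.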

Storing data this way makes it simple and easy to work with. We have a descriptive name, we have an identifier to find the correct request, and we have a nice list of fields that we want to validate. We can allow our tests to simply ignore the fields that we’re not specifically looking to validate. Our degree of difficulty will increase somewhat if we are trying to validate entire request or response payloads, but this general format is still workable. So to review our general workflow for these types of validations:

  1. Use suite to instantiate Proxy
  2. Pass Proxy into Selenium driver
  3. Run Selenium scripts as normal and generate desired event(s)
  4. Load HTTP traffic from Proxy object
  5. Find correct call based on unique identifier
  6. Perform validation(s)
  7. (optional) Save HAR file for logs
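
Steps 4 through 6 can be sketched roughly as follows, assuming the captured traffic is already available as a parsed HAR dictionary. The function names, URLs, and parameter names are all hypothetical, and the proxy and Selenium setup from steps 1 through 3 is left out.

```python
from urllib.parse import parse_qs, urlsplit


def find_call(har, identifier):
    """Return the first HAR entry whose request URL contains the identifier."""
    for entry in har["log"]["entries"]:
        if identifier in entry["request"]["url"]:
            return entry
    return None


def validate_fields(entry, expected_fields):
    """Compare expected query parameters with the request and return mismatches.

    Parameters present in the request but absent from expected_fields are ignored.
    """
    query = parse_qs(urlsplit(entry["request"]["url"]).query, keep_blank_values=True)
    mismatches = {}
    for name, expected in expected_fields.items():
        actual = query.get(name, [None])[0]
        if actual != expected:
            mismatches[name] = {"expected": expected, "actual": actual}
    return mismatches


# Made-up traffic standing in for what the proxy would hand back (step 4).
har = {
    "log": {
        "entries": [
            {"request": {"url": "https://www.example.com/kittens"}},
            {"request": {"url": "https://metrics.example.com/b/ss/demo?"
                                "pev2=signInClickEvent&pageName=home&v1=tabby"}},
        ]
    }
}

entry = find_call(har, "signInClickEvent")                               # step 5
problems = validate_fields(entry, {"pageName": "home", "v1": "siamese"})  # step 6
print(problems)
```

Returning a dictionary of mismatches, instead of failing on the first one, lets a test report every incorrect field in a single run. Note that this sketch matches the identifier against the raw URL, so an identifier that gets URL-encoded in transit would need to be decoded first.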

Not too bad! We can assume that our kitten site probably already has a lot of our scenarios built out, but we just didn’t know it before. There’s a good chance that we can simply slap some validations onto the end of some existing scripts and they’ll be ready to go. We’ll soon be able to get those 200 UAT scripts built out in our suite and executing regularly, and Billy will have a little less work on his plate going forward (the psychopath).

In my opinion, it’s a very good idea to implement these validations into your test automation frameworks. The amount of value they provide compared with the amount of effort required (assuming you are already running Selenium scripts) makes this a smart functionality to implement. Building out these tests for my teams has contributed to finding a number of analytics defects that probably would’ve never been found otherwise and, as a result, has increased the quality of our site’s data.

A few notes:
– We don’t necessarily want to instantiate our Proxy with every Selenium test we run. The proxy will consume additional resources compared to running normal tests, but how much this affects your test box will vary depending on hardware. It is recommended that you use some sort of flag or environment variable to determine if the Proxy should be instantiated.
– It can seem practical to make a separate testing suite to perform these validations, but with that approach you will have to maintain practically duplicate code in more than one place. It is easier to plug this into existing suites.
– BUP is a Java application that has its own directory and files. The easiest way to manage distribution of these files is to plug it into version control in a project’s utility folder. There is no BUP installation required outside of having a valid Java version.
– I wanted to keep this post high level, but if you are using Ruby then there are useful gems to work with Browserup/Browsermob and HAR files (“browsermob-proxy” and “har”, respectively).
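
For the first note above, one possible way to gate the proxy behind an environment variable is sketched below. The variable name ANALYTICS_PROXY and the accepted values are assumptions chosen for illustration, and the suite’s setup code would decide what to do with the result.

```python
import os


def proxy_enabled(environ=None):
    """Return True when the hypothetical ANALYTICS_PROXY variable is switched on."""
    environ = os.environ if environ is None else environ
    value = environ.get("ANALYTICS_PROXY", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


if proxy_enabled():
    print("Starting the proxy for this run")
else:
    print("Running without the proxy")
```

Passing the environment in as an argument keeps the check easy to unit test without touching the real process environment.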

Happy testing!

Additional References:

Browserup Proxy
Browsermob Proxy Ruby gem
HAR Ruby gem

Welcome to Red Green Refactor

We officially welcome you to the start of Red Green Refactor, a technology blog about automation and DevOps. We are a group of passionate technologists who care about learning and sharing our knowledge. Information Technology is a huge field and even though we’re a small part of it – we wanted another outlet to collaborate with the community.

Why Red Green Refactor?

Red Green Refactor is a term commonly used in Test Driven Development to support a test first approach to software design. Kent Beck is generally credited with developing, or “rediscovering”, Test Driven Development. The mantra for the practice is red-green-refactor, where the colors refer to the status of the test driving the development code.

The Red is writing a small piece of test code without the development code implemented. The test should fail upon execution – a red failure. The Green is writing just enough development code to get the test code to pass. The test should pass upon execution – a green pass. The Refactor is making small improvements to the development code without affecting the behavior. The quality of the code is improved according to team standards, addressing “code smells” (making the code readable, maintainable, removing duplication), or using simple design patterns. The point of the practice is to make the code more robust by catching the mistakes early, with an eye on quality of the code from the beginning. Writing in small batches helps the practitioner think about the design of their program consistently.
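
As a small sketch of one pass through the cycle, consider a made-up function that totals an order. The function names and data are hypothetical, and the three stages are shown side by side only so they can be compared. In practice the green version would be edited in place during the refactor step.

```python
# Red: the test is written first. Run before any implementation exists, it fails.
def check_total_price(total_price):
    assert total_price([]) == 0
    assert total_price([("tabby", 2, 50), ("siamese", 1, 120)]) == 220


# Green: just enough code to make the test pass.
def total_price_green(items):
    total = 0
    for item in items:
        total = total + item[1] * item[2]
    return total


# Refactor: same behavior, clearer names, no manual accumulator.
def total_price(items):
    return sum(quantity * unit_price for _breed, quantity, unit_price in items)


# The same test guards both versions, so the refactor cannot silently change behavior.
check_total_price(total_price_green)
check_total_price(total_price)
```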

“Refactoring is a controlled technique for improving the design of an existing codebase.”

Martin Fowler

The goal of Red Green Refactor is similar to the practice of refactoring: to make small-yet-cumulative positive changes, in this case to our own learning and to the community’s understanding of automation and DevOps. The act of publishing also encourages our team to refine our materials in preparation for a larger audience. Many of the writers on Red Green Refactor speak at conferences, professional groups, and the occasional webinar. The learning at Red Green Refactor will be bi-directional – to the readers and to the writers.

Who Are We?

The writers on Red Green Refactor come from varied backgrounds but all of us made our way into information technology, some purposefully and some accidentally. Our primary focus was on test automation, which has evolved into DevOps practices as we expanded our scope into operations. Occasionally we will invite external contributors to post on a subject of interest. We have a few invited writers lined up and ready to contribute.

“Automation Team” outing with some of Red-Green-Refactor authors

As for myself, I have a background in Physics & Biophysics, with over a decade spent in research science studying fluorescence spectroscopy and microscopy before joining IT. I’ve worked as a requirements analyst, developer, and tester before joining the ranks of pointed-headed management. That doesn’t stop me from exploring new tech at home though or posting about it on a blog.

What Can You Expect From Red Green Refactor?

Technology

Some companies are in the .NET stack, some are Java shops, but everyone needs some form of automation. The result is many varied implementations of both test & task automation. Our team has supported almost all the application types under the sun (desktop, web, mobile, database, API/services, mainframe, etc.). We’ve also explored many tools, both open-source and commercial, at companies whose tech ranged from ancient to bleeding edge. Our posts will be driven by both prior experience and exploration of the unknown.

We’ll be exploring programming languages and tools in the automation space.  Readers can expect to learn about frameworks, cloud solutions, CI/CD, design patterns, code reviews, refactoring, metrics, implementation strategies, performance testing, etc. – it’s open ended.

Continuous Improvement

We aim to keep our readers informed about continuous improvement activities in the community. One of the great things about this field is there is so much to learn and it’s ever-changing. It can be difficult at times with the firehose of information coming at you since there are only so many hours in the day. We tend to divide responsibility among our group to perform “deep dives” into certain topics and then share that knowledge with a wider audience (for example: Docker, Analytics or Robotic Process Automation). In the same spirit we plan to share information on Red Green Refactor about continuous improvement. Posts about continuous improvement will include: trainings, conference recaps, professional groups, aggregated articles, podcasts, tech book summaries, career development, and even the occasional job posting.

Once again welcome to Red Green Refactor. Your feedback is always welcome.