Safely Testing Applications in Production

There are many different types of application testing, such as unit tests, integration tests, performance tests, availability tests, and many more. This article is entirely about system-level testing: the outermost level, which tests the user experience. For a web application, for example, system-level testing exercises the application from the end user’s perspective. System-level testing is the most important from a functional standpoint because it verifies that users can do business. Other forms of functional testing operate at a lower level, and the defects they catch may or may not prevent users from doing business.

Unfortunately, system-level tests are the most difficult to automate and to keep comprehensive. They are also the most fragile: simple user interface changes can break automated tests written with Selenium or other testing products. I’ve seen entire teams spend weeks attempting to automate system-level tests for complicated web applications with little benefit to show for it. Such efforts also tend to slow the delivery of new features to end users. Clearly, the traditional tactic of automating system-level testing doesn’t work very well. So, what’s the alternative?

Outsource system-level testing to a small percentage of end users in production. Instead of formally maintaining fragile web UI automation, expose a small percentage of end users to new features in production and measure their error rate. The benefits are many. There are no fragile system-level tests to maintain, and end-user actions will be more representative of real-world usage than scripted system-level tests could ever be.

There are two primary tactics for testing in production: canary deployments and feature toggles. Canary deployments establish a sister implementation of the application in production, along with a “traffic cop” that decides which version of the application each end user is directed to. Feature toggles are embedded in application logic and act as a runtime “traffic cop” that decides whether a user is exposed to a new feature. In both cases, the traffic cop’s rules can be changed easily without redeploying the application, and new functionality is “deployed” to production but active only for the users or transactions you choose.
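Both kinds of traffic cop typically need to put the same user in the same cohort on every request. One common approach, shown here as a minimal sketch, is to hash a user identifier into one of 100 buckets and expose the user when the bucket falls below the rollout percentage. The function names `bucket_for`, `in_rollout`, and `choose_version`, and the use of SHA-256 with a per-feature salt, are illustrative assumptions rather than any particular product’s behavior.

```python
import hashlib


def bucket_for(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in the range 0-99 (hypothetical helper)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it to 0-99.
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(user_id: str, salt: str, percentage: int) -> bool:
    """True if the user's bucket falls below the rollout percentage."""
    return bucket_for(user_id, salt) < percentage


def choose_version(user_id: str, canary_percentage: int) -> str:
    """Canary-style routing decision built on the same bucketing."""
    return "canary" if in_rollout(user_id, "canary", canary_percentage) else "stable"
```

Because a user’s bucket never changes, raising the percentage from 5 to 20 keeps the original 5% of users in the cohort and only adds new ones, so nobody flips between versions mid-rollout. Salting with the feature name gives each feature an independent cohort, so the same users are not always the first to see every new feature.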

Both canary deployments and feature toggles surface valuable information about errors, which can be passed to the development team for remediation. Ideally, automated unit and/or integration tests would have caught the error before the new functionality reached end users. That said, some defects will reach production no matter how system-level tests are conducted. When errors are reported and diagnosed, resolve the issue and also enhance unit and/or integration tests to catch the error. When fixes are ready, they can be deployed to production and retested by a small percentage of end users.
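Measuring the exposed cohort’s error rate only helps if something acts on it. The sketch below compares the new cohort against the baseline and returns a decision; a “halt” would mean changing the traffic cop’s rules to stop exposing users while the defect is fixed. The `CohortStats` type, the `rollout_decision` function, and the thresholds (500 requests, one percentage point of tolerance) are hypothetical and purely illustrative; a production system might use a proper statistical test instead of a fixed margin.

```python
from dataclasses import dataclass


@dataclass
class CohortStats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Guard against division by zero before any traffic has arrived.
        return self.errors / self.requests if self.requests else 0.0


def rollout_decision(baseline: CohortStats, candidate: CohortStats,
                     min_requests: int = 500, tolerance: float = 0.01) -> str:
    """Return 'wait', 'halt', or 'continue' for the candidate cohort.

    Hypothetical policy: halt when the candidate's error rate exceeds the
    baseline's by more than `tolerance` (an absolute difference).
    """
    if candidate.requests < min_requests:
        return "wait"  # too little traffic to judge yet
    if candidate.error_rate > baseline.error_rate + tolerance:
        return "halt"  # stop exposing users; fix forward
    return "continue"  # safe to keep or widen the rollout
```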

Both tactics support continuous deployment and delivery. These days, speed to market for new features often provides a competitive advantage for the business. Essentially, both tactics allow you to “fail forward”: fix defects that make it to production instead of formally backing out a new release.

Both canary deployments and feature toggles have implementation costs. Nothing is free. Let’s look at the costs and implementation requirements of each.

Canary deployments require 100% infrastructure as code (IaC). Without it, it is not practical to establish a “sister” version of the application that is guaranteed to operate identically in every way. This takes a level of DevOps maturity and discipline. My focus for this article is not on IaC, but if you still maintain infrastructure manually, you shouldn’t: manual maintenance worsens all four DORA metrics.

Canary deployments require a traffic cop capable of deciding which implementation each end-user transaction is directed to. DNS is the most straightforward tactic, but it is coarse-grained: it cannot identify the specific end users you want to enlist. Often a proxy such as NGINX is used instead, because it can implement more sophisticated selection rules.
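To make “more sophisticated selection rules” concrete, here is the kind of decision a proxy can make and DNS cannot, written as a Python sketch rather than proxy configuration. (In NGINX itself, the `split_clients` directive hashes a variable into percentage buckets and `map` can select values based on headers or cookies.) The upstream URLs, the `X-Canary` header, and the employee rule are all hypothetical examples.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Request:
    user_id: str
    # Simplified: real HTTP header names are case-insensitive.
    headers: dict[str, str] = field(default_factory=dict)
    is_employee: bool = False


STABLE_UPSTREAM = "http://app-stable.internal"  # hypothetical address
CANARY_UPSTREAM = "http://app-canary.internal"  # hypothetical address


def select_upstream(req: Request, canary_percentage: int) -> str:
    """Pick an upstream using rules that DNS alone cannot express."""
    # Rule 1: an explicit opt-in header (e.g., for testers) always gets the canary.
    if req.headers.get("X-Canary") == "always":
        return CANARY_UPSTREAM
    # Rule 2: employees use the canary before customers do.
    if req.is_employee:
        return CANARY_UPSTREAM
    # Rule 3: a stable, hash-based percentage of everyone else.
    digest = hashlib.sha256(req.user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return CANARY_UPSTREAM if bucket < canary_percentage else STABLE_UPSTREAM
```

The rules are evaluated in order, so specific cohorts (testers, employees) are enlisted first, and the percentage rule controls how far the canary reaches into the general user base.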

Canary deployments frequently increase runtime costs, as two copies of the application are deployed at the same time. How material those runtime costs are varies by application. For some applications, the increase is low and makes up only a small percentage of the overall cost structure.

Feature toggles require a traffic cop to evaluate whether an end user should be exposed to a new feature. While it is possible to build your own runtime evaluation engine for feature toggles, most teams don’t. There are open-source alternatives, such as Unleash, as well as a wide variety of commercial ones. At runtime, the application code invokes the evaluation engine to decide whether a feature is active for the current transaction. Because the evaluation engine is a separate application, its activation rules can be changed at any time, and rolling out or recalling a feature doesn’t require redeploying your application.
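Engines such as Unleash ship client SDKs that perform this runtime check for you. The class below is not the Unleash API; it is a hypothetical in-process stand-in that illustrates the idea: the application asks `is_enabled(feature, user_id)` on every transaction, while the rules behind that answer can be replaced at any time. The names `ToggleEngine`, `ToggleRule`, and the rule fields are assumptions for illustration.

```python
import hashlib
import threading
from dataclasses import dataclass, field


@dataclass
class ToggleRule:
    enabled: bool = False
    percentage: int = 0  # share of users exposed, 0-100
    allow_users: set[str] = field(default_factory=set)  # always exposed


class ToggleEngine:
    """Hypothetical stand-in for a feature toggle evaluation service."""

    def __init__(self) -> None:
        self._rules: dict[str, ToggleRule] = {}
        self._lock = threading.Lock()

    def set_rule(self, feature: str, rule: ToggleRule) -> None:
        # In a real service an operator changes rules through a UI or API,
        # and running applications pick them up without a redeploy.
        # Replace the rule object rather than mutating it in place.
        with self._lock:
            self._rules[feature] = rule

    def is_enabled(self, feature: str, user_id: str) -> bool:
        with self._lock:
            rule = self._rules.get(feature)
        if rule is None or not rule.enabled:
            return False  # unknown or switched-off features default to off
        if user_id in rule.allow_users:
            return True
        digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100 < rule.percentage
```

Recalling a misbehaving feature is then just `set_rule(feature, ToggleRule(enabled=False))`; the application code that calls `is_enabled` does not change.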

Feature toggles temporarily increase complexity in application code. Because they require conditional logic to determine whether a new feature is active for a given transaction, they make the application more complex. Additionally, code for both the current version and the new feature exists in the application while the feature is rolled out.
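The added complexity looks like this in practice: a branch at the toggle point, with both the current and the new implementation living side by side. In this hypothetical sketch, the feature name `new-shipping-quote`, both pricing functions, and their rates are invented for illustration; `is_enabled` is injected so the function can be driven by any evaluation engine (or a stub in tests).

```python
from typing import Callable


def legacy_shipping_quote(weight_kg: float) -> float:
    # Current behavior (hypothetical): flat rate per kilogram.
    return round(weight_kg * 5.0, 2)


def new_shipping_quote(weight_kg: float) -> float:
    # New feature under test (hypothetical): base fee plus a lower per-kg rate.
    return round(4.0 + weight_kg * 3.5, 2)


def shipping_quote(weight_kg: float, user_id: str,
                   is_enabled: Callable[[str, str], bool]) -> float:
    # The toggle is the only branch point. Once the new feature is released
    # to all users, delete this check and legacy_shipping_quote.
    if is_enabled("new-shipping-quote", user_id):
        return new_shipping_quote(weight_kg)
    return legacy_shipping_quote(weight_kg)
```

Keeping the branch in one place, rather than scattering toggle checks throughout the code, makes the eventual cleanup a small, local change.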

It is important to remove feature toggles after new features are released to all users. Treat obsolete feature toggle code as technical debt that makes application code harder to maintain going forward.
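A small report can help keep that debt visible. This sketch flags toggles that have been fully released for longer than a grace period, making them candidates for removal. The `ToggleRecord` fields, the `removal_candidates` function, and the 14-day default are hypothetical; in practice the data would come from whatever evaluation engine you use.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class ToggleRecord:
    name: str
    percentage: int  # current rollout share, 0-100
    fully_released_on: Optional[date]  # when it reached 100%, if ever


def removal_candidates(toggles: list[ToggleRecord], today: date,
                       grace: timedelta = timedelta(days=14)) -> list[str]:
    """Names of toggles that have been at 100% for longer than `grace`."""
    return sorted(
        t.name
        for t in toggles
        if t.percentage == 100
        and t.fully_released_on is not None
        and today - t.fully_released_on > grace
    )
```

Run on a schedule (for example, as part of a CI job), a report like this turns “remember to clean up toggles” into a concrete list of work.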

Feature toggles do not increase runtime costs as canary deployments do, but they do increase development costs due to the temporary increase in complexity. In other words, nothing is free. Some level of discipline is needed to manage feature toggles through their life cycle.

It may be counterintuitive, but testing applications in production with a managed set of production users is the safest, fastest, and most comprehensive testing you can do. Furthermore, it complements continuous delivery and deployment. It relieves you of the aggravation of creating and maintaining fragile system-level tests, and it lets you get new features into users’ hands faster than your competition.