The testing renaissance

Monday, 30 October 2017

Don’t you love to crank code? As developers, we love to refactor; we love to be productive. As managers, how can we let the coding cowboys on our team be free, while maintaining code quality? At Twitter, we’ve been experimenting with feature testing. There are at least seven teams doing some form of feature testing of their services. Some of these teams have even discarded most of their single class tests, resulting in a test suite that’s 95% feature tests. It may be hard to believe, but throwing away single class tests can actually make development teams faster.

Let’s back up a bit and first define some terms:

Single class test - A test verifying methods of a single class. Any dependencies external to the class are ignored or mocked out. Note that some single class tests also qualify as feature tests in a few cases, depending on the scope of the “feature” under test.

Feature test (#featuretests) - A test verifying a service or library as the customer would use it, but within a single process. A few examples:

Given a search service that returns tweets based on a query, a test that feeds in a fake tweet, queries for that tweet, and verifies it’s found. This is clearly not a single class test.
A test verifying a library such as Netty using its public APIs only, perhaps mocking JDK APIs for failure testing.
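
The first example can be sketched in a few lines. This is a minimal illustration only: SearchService, TweetStore, and their methods are hypothetical names invented for the sketch, not part of any real Twitter service. The point is that the test drives only the public entry point, while more than one class does the work behind it.

```java
import java.util.ArrayList;
import java.util.List;

// Drives the scenario the way a customer would: ingest a fake tweet, query, inspect the result.
class SearchFeatureSketch {
    static List<String> ingestThenSearch(String tweet, String query) {
        SearchService service = new SearchService(new TweetStore());
        service.ingest(tweet);
        service.ingest("an unrelated tweet about lunch");
        return service.search(query);
    }

    public static void main(String[] args) {
        List<String> hits = ingestThenSearch("Feature tests are fun", "feature tests");
        if (!hits.equals(List.of("Feature tests are fun"))) {
            throw new AssertionError("expected only the fake tweet, got " + hits);
        }
        System.out.println("found: " + hits);
    }
}

// Hypothetical storage layer; the test never touches it directly.
class TweetStore {
    private final List<String> tweets = new ArrayList<>();

    void add(String text) { tweets.add(text); }

    List<String> all() { return new ArrayList<>(tweets); }
}

// Hypothetical public entry point of the service under test.
class SearchService {
    private final TweetStore store;

    SearchService(TweetStore store) { this.store = store; }

    void ingest(String tweetText) { store.add(tweetText); }

    // Case-insensitive substring match, kept deliberately simple for the sketch.
    List<String> search(String query) {
        String needle = query.toLowerCase();
        List<String> hits = new ArrayList<>();
        for (String text : store.all()) {
            if (text.toLowerCase().contains(needle)) {
                hits.add(text);
            }
        }
        return hits;
    }
}
```

Because the test never references TweetStore, the storage layer could be replaced or merged into another class without touching the test.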

Integration test - Several definitions are in use, so we’ll avoid this term:

A test covering many servers and verifying that they work together
A test covering many classes and verifying that they work together

Unit test - A test verifying a “unit” of functionality. What qualifies as a unit test is ambiguous, since the size of a unit can vary, and the term is sometimes used for any test written in JUnit. We will avoid this term as well.

Having established the terminology, we should be clear that this concept is not new. It was alluded to here ten years ago. Today, feature tests, sometimes referred to as integration tests, are being used on multiple projects. More novel, however, is the decision to delete single class tests on a project and use all or mostly feature tests.

Although creating your first feature test on a service takes a bit of time and can require complex setup, the benefits typically outweigh the cost. Let’s look at some of the advantages.

Advantages

Tests as documentation of use cases - Developers can easily step through the entire service, and working through the feature tests gives new developers a good understanding of the feature. The end purpose of a feature test is generally much clearer than that of an individual single class test.

A safety net for refactoring - Properly designed feature tests provide comprehensive code coverage and don’t need to be rewritten because they only use public APIs. Attempting to refactor a system that only has single class tests is often painful, because developers usually have to completely rewrite the test suite at the same time, invalidating the safety net. This incentivizes hacking, which creates tech debt.
Testing from the customer’s point of view - Leads to better user APIs.
Test end-to-end behavior - With only single class tests, the test suite may pass but the feature may be broken, if a failure occurs in the interface between modules. Feature tests will verify end-to-end feature behavior and catch these bugs.
Write fewer tests - A feature test typically covers a larger volume of your system than a single class test.
Service as pluggable library - If set up correctly, feature tests lead towards a service design in which the service module itself is embeddable in other applications.
Test remote service failure and recovery - It’s much easier to verify major failure conditions and recovery in feature tests, by invoking API calls and checking the response.
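
To illustrate the last point, here is a minimal sketch of a failure-and-recovery check. GreetingApi, ProfileService, and MockProfileService are hypothetical names made up for this example, and the status strings are an assumption about how such an API might report errors. The test flips the mocked remote service into an outage, calls the API, then restores the service and calls again.

```java
// Exercises the API during a simulated remote outage and again after recovery.
class FailureRecoverySketch {
    static String[] outageThenRecovery() {
        MockProfileService remote = new MockProfileService();
        GreetingApi api = new GreetingApi(remote);

        remote.failing = true;                  // simulate the remote outage
        String duringOutage = api.handle(7L);

        remote.failing = false;                 // the remote service comes back
        String afterRecovery = api.handle(7L);

        return new String[] { duringOutage, afterRecovery };
    }

    public static void main(String[] args) {
        String[] responses = outageThenRecovery();
        System.out.println(responses[0]);
        System.out.println(responses[1]);
    }
}

// Hypothetical remote dependency of the service.
interface ProfileService {
    String fetchName(long userId);
}

// Hand-written mock whose failure mode the test can switch on and off.
class MockProfileService implements ProfileService {
    boolean failing;

    @Override
    public String fetchName(long userId) {
        if (failing) {
            throw new IllegalStateException("simulated outage");
        }
        return "user" + userId;
    }
}

// Hypothetical public API of the service under test.
class GreetingApi {
    private final ProfileService profiles;

    GreetingApi(ProfileService profiles) { this.profiles = profiles; }

    String handle(long userId) {
        try {
            return "200 Hello, " + profiles.fetchName(userId);
        } catch (RuntimeException e) {
            // Degrade gracefully instead of propagating the remote failure.
            return "503 profile service unavailable";
        }
    }
}
```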

Common concerns

We have encountered many concerns about feature testing over the years, but we’ve found that most of those arguing against it have not fully embraced it on a project. Below are some common concerns we’ve heard:

“I am adding this filter and now, since we are doing feature testing, there are so many more combinations I need to test. How do I do this?” - It turns out the number of important combinations is roughly equivalent to those in a single class test suite. Think about how many tests you were going to write as a single class test. Write feature tests covering the same conditions, and you are done for now. As corner case bugs pop up, add test cases for them.
“My feature test broke, and it’s way harder to debug than single class tests.” - Think about how much time you will spend if a customer reports a new bug. First, you spend time trying to reproduce the bug. Next, you have to debug and fix the problem. Finally, you have to write single class tests. In general, reproducing the bug alone will cost more time than you spent debugging a feature test. In fact, debugging a properly designed feature test is the same as debugging your whole server, and if you have trouble debugging your server, we’d argue you have bigger issues.
“I have no idea what this test is testing or doing. We should do single class unit testing instead.” - To us, this seems like an issue of not understanding your customers and their use cases. Feature tests naturally provide great documentation (see the advantages above), so step through that test case, reverse engineer it, and retrain the team on the lost knowledge. It’s true that customer features sometimes go away, so learning an old, unused feature can be a waste of time. However, the implementation still exists in this case; you could remove the code and rely on your other feature tests to make sure you remove only this feature. If you had only single class tests, it would not be obvious which tests are safe to delete and which cover other features. Admittedly, this is a pain point in both worlds.
“What happens when you add a feature test and the only way to get it to pass is to break another feature test?” - This is where feature tests really shine. You now have discovered a case in which doing something for one customer will break another customer. Having caught this (likely production, not testing) issue early, you can refactor your system and tests to support both use cases.
“I can’t test a certain snippet of code like this Runnable or TimerTask from a feature test.” - In every specific case we’ve encountered so far, we’ve been able to create feature tests covering the code. Occasionally this is tricky, but there are feature testing patterns for guidance, and often refactoring the code a bit to make feature testing easier leads to better production code.
“It takes a big time investment to set up feature testing, and even more on a legacy system.” - Yup, all we can say here is that we found it well worth the effort, especially if you end up deleting most of the single class tests. Once you’ve converted your test suite, you can refactor safely without touching tests, as mentioned above (unless, of course, you modify feature behavior -- in that case you’d have to be very careful regardless, to avoid breaking customers).
“We depend on too many remote services that require mocking.” - I start with feature testing specifically to avoid this. There are ways to work around this issue, such as composing multiple service calls into one, and mocking that new interface (this is rarely necessary, but sometimes cleans up a ton of test code).
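
The workaround in the last answer, composing multiple service calls into one and mocking the new interface, might look like the sketch below. All names here (IdSearch, Hydrator, TweetLookup, RemoteTweetLookup, SearchPage) are hypothetical and chosen only for illustration. Production code calls two remote services through one composed interface, so a test needs a single one-line mock instead of two.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// A test that mocks only the composed interface, not the two remote services behind it.
class FacadeSketch {
    static String renderWithMockedFacade() {
        TweetLookup mock = query -> List.of("first fake tweet", "second fake tweet");
        SearchPage page = new SearchPage(mock);
        return page.render("anything");
    }

    public static void main(String[] args) {
        System.out.println(renderWithMockedFacade());
    }
}

// Two hypothetical remote services that production code would otherwise call separately.
interface IdSearch {
    List<Long> findIds(String query);
}

interface Hydrator {
    Map<Long, String> hydrate(List<Long> ids);
}

// The composed interface: one call that hides both remote hops.
interface TweetLookup {
    List<String> findTweets(String query);
}

// Production implementation that chains the two remote calls.
class RemoteTweetLookup implements TweetLookup {
    private final IdSearch idSearch;
    private final Hydrator hydrator;

    RemoteTweetLookup(IdSearch idSearch, Hydrator hydrator) {
        this.idSearch = idSearch;
        this.hydrator = hydrator;
    }

    @Override
    public List<String> findTweets(String query) {
        List<Long> ids = idSearch.findIds(query);
        Map<Long, String> texts = hydrator.hydrate(ids);
        // Preserve the order returned by the search service.
        return ids.stream().map(texts::get).collect(Collectors.toList());
    }
}

// Code under test depends only on the composed interface.
class SearchPage {
    private final TweetLookup lookup;

    SearchPage(TweetLookup lookup) { this.lookup = lookup; }

    String render(String query) {
        return String.join(" | ", lookup.findTweets(query));
    }
}
```

The trade-off is that RemoteTweetLookup itself is no longer covered by tests that mock TweetLookup, so it should stay thin.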

Examples

Let’s start with an example of a webapp written using webpieces.

The design of this small project is a query for tweets that hits a remote service TweetSearchService to retrieve tweet ids, and then hits HydratorService to retrieve the actual tweet data. The design looks like:

(The design diagram embedded here is unavailable.)

The code to set up our test suite is fairly terse (excluding the reusable setup code called from this method):

(The embedded setup code is unavailable.)

This exemplifies a well-designed feature test framework, in that minimal code is required to:

Create the server (“new Server(...)”)
Start the server (“webserver.start()”)
Create and connect an HTTP client (“http11Socket = connectHttp(...)”)
Declare standard mock responses for dependent services

If you look closely, there is also an AppOverridesModule which allows you to override remote endpoints with mocks, using Guice:

(The embedded AppOverridesModule code is unavailable.)

Now, let’s move on to a typical test case, which is similarly terse:

(The embedded test case code is unavailable.)

Feature tests generally follow the same basic template:

First, add responses (often shared amongst tests) to the mocks
Next, create and send an actual customer request (request objects are also typically shared amongst tests)
Last, assert the response from the API, occasionally also asserting expected requests to downstream services
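
The three steps above can be sketched in plain Java. This is not the webpieces code from the project: the SearchController, TweetSearchService, and HydratorService names come from the design described earlier, but their method signatures, the hand-written mocks, and the request and response strings are all assumptions made so the sketch is self-contained. It also wires the mocks through constructors rather than Guice and calls the controller directly instead of going through a real HTTP client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TemplateSketch {
    // Runs one typical test; queries seen by the search mock are copied into recordedQueries.
    static String runTypicalTest(List<String> recordedQueries) {
        MockTweetSearch search = new MockTweetSearch();
        MockHydrator hydrator = new MockHydrator();
        SearchController controller = new SearchController(search, hydrator);

        // Step 1: add responses to the mocks.
        search.nextIds = List.of(11L, 12L);
        hydrator.tweets.put(11L, "hello world");
        hydrator.tweets.put(12L, "hello again");

        // Step 2: send a customer request (a simplified request line here).
        String response = controller.handle("GET /search?q=hello");

        // Step 3 happens in the caller: assert the response and, optionally, downstream requests.
        recordedQueries.addAll(search.queries);
        return response;
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        String response = runTypicalTest(seen);
        if (!response.equals("200 hello world, hello again")) throw new AssertionError(response);
        if (!seen.equals(List.of("hello"))) throw new AssertionError(seen.toString());
        System.out.println(response);
    }
}

// Assumed shapes for the two remote services.
interface TweetSearchService { List<Long> search(String query); }
interface HydratorService { List<String> hydrate(List<Long> ids); }

class MockTweetSearch implements TweetSearchService {
    List<Long> nextIds = List.of();
    final List<String> queries = new ArrayList<>();

    @Override
    public List<Long> search(String query) {
        queries.add(query);   // record the downstream request for later assertions
        return nextIds;
    }
}

class MockHydrator implements HydratorService {
    final Map<Long, String> tweets = new HashMap<>();

    @Override
    public List<String> hydrate(List<Long> ids) {
        List<String> texts = new ArrayList<>();
        for (Long id : ids) {
            String text = tweets.get(id);
            if (text != null) texts.add(text);
        }
        return texts;
    }
}

class SearchController {
    private final TweetSearchService search;
    private final HydratorService hydrator;

    SearchController(TweetSearchService search, HydratorService hydrator) {
        this.search = search;
        this.hydrator = hydrator;
    }

    String handle(String requestLine) {
        int at = requestLine.indexOf("q=");
        if (at < 0) return "400 missing query";
        String query = requestLine.substring(at + 2);
        return "200 " + String.join(", ", hydrator.hydrate(search.search(query)));
    }
}
```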

Setting up the first feature test is a bit of work, but many have been shocked at how little test code you have to write once your framework is in place. You really get a big bang for your buck in feature testing. This one test covers all five classes shown in the design diagram above. Using the single-class test strategy, you’d have to write five tests to cover those same classes.

The full code can be found here.

Advanced cases

In feature testing, there are some pretty interesting and fun design scenarios that come up. For instance, how do you write a feature test if SearchController above is using an ExecutorService? How do you test code that uses Thread.sleep? How do you deal with code that has a while(true) loop? Dealing with time (e.g. System.currentTimeMillis()) can be tricky in feature testing, as well as getting timer tasks to fire appropriately in the feature test (e.g., how do you make a timer task run now, even though it was scheduled to go off tomorrow in the production code?).

There are a few patterns to help with these advanced cases, which are beyond the scope of this article, but we are available to help with specific scenarios -- feel free to reach out!
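
As a small taste of what such a pattern can look like, here is one common approach to the time and timer questions: inject a clock and an executor so the test controls both. This is a generic sketch under our own assumptions, not necessarily the pattern used in webpieces; Reminders and MutableClock are hypothetical names, while java.time.Clock and java.util.concurrent.Executor are standard JDK types.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executor;

class TimeControlSketch {
    // Returns how many tasks had fired before and after the clock was moved forward a day.
    static int[] scheduleForTomorrowThenAdvance() {
        MutableClock clock = new MutableClock(Instant.parse("2017-10-30T00:00:00Z"));
        Executor direct = Runnable::run;          // runs each task on the calling thread
        Reminders reminders = new Reminders(clock, direct);
        List<String> fired = new ArrayList<>();

        reminders.schedule(Duration.ofDays(1), () -> fired.add("reminder"));
        reminders.runDueTasks();
        int before = fired.size();

        clock.advance(Duration.ofDays(1));        // "tomorrow" arrives instantly in the test
        reminders.runDueTasks();
        int after = fired.size();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = scheduleForTomorrowThenAdvance();
        System.out.println("fired before advance: " + counts[0] + ", after: " + counts[1]);
    }
}

// Test clock whose current instant only changes when the test says so.
class MutableClock extends Clock {
    private Instant now;

    MutableClock(Instant start) { this.now = start; }

    void advance(Duration amount) { now = now.plus(amount); }

    @Override public ZoneId getZone() { return ZoneOffset.UTC; }
    @Override public Clock withZone(ZoneId zone) { return this; }
    @Override public Instant instant() { return now; }
}

// Hypothetical production class: reads time from the injected Clock, never from System.
class Reminders {
    private static final class Task {
        final Instant due;
        final Runnable action;
        Task(Instant due, Runnable action) { this.due = due; this.action = action; }
    }

    private final Clock clock;
    private final Executor executor;
    private final List<Task> pending = new ArrayList<>();

    Reminders(Clock clock, Executor executor) {
        this.clock = clock;
        this.executor = executor;
    }

    void schedule(Duration delay, Runnable action) {
        pending.add(new Task(clock.instant().plus(delay), action));
    }

    void runDueTasks() {
        Instant now = clock.instant();
        List<Task> due = new ArrayList<>();
        for (Iterator<Task> it = pending.iterator(); it.hasNext(); ) {
            Task task = it.next();
            if (!task.due.isAfter(now)) {
                it.remove();
                due.add(task);
            }
        }
        for (Task task : due) executor.execute(task.action);
    }
}
```

In production the same class would receive Clock.systemUTC() and a real thread pool, so the feature test exercises the production logic unchanged.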

Summary

As with anything in software engineering, feature testing is not a silver bullet, but we are ecstatic about our development speed these days. Refactoring is much more likely to happen on our team, as we do not have the burden of worrying about rewriting tests and blowing away our safety net. As our safety nets grow, we gain the freedom to be coding cowboys and cowgirls, confident that our mistakes will be caught. Hit us up on Twitter at @deansoldtweets or @frogbuddha42 and feel free to use the Twitter hashtag #featuretests.

Although this blog article is too brief to make a complete case, we hope we’ve given you a compelling reason to consider feature testing.
