Diffy: Testing services without writing tests

Thursday, 3 September 2015

Today, we’re excited to release Diffy, an open-source tool that automatically catches bugs in Apache Thrift and HTTP-based services. It needs minimal setup and is able to catch bugs without requiring developers to write many tests.

Service-oriented architectures like our platform see a large number of services evolve at a very fast pace. As new features are added with each commit, existing code is inevitably modified daily, and developers may wonder whether they have broken something. Unit tests offer some confidence, but writing good tests can take more time than writing the code itself. What’s more, unit tests cover small, tightly scoped segments of code but don’t address the aggregate behavior of a system composed of many such segments.

Each independent code path requires its own test.

As the complexity of a system grows, it very quickly becomes impossible to get adequate coverage using hand-written tests, and there’s a need for more advanced automated techniques that require minimal effort from developers. Diffy is one such approach we use.

What is Diffy?
Diffy finds potential bugs in your service by running instances of your new and old code side by side. It behaves as a proxy and multicasts whatever requests it receives to each of the running instances. It then compares the responses, and reports any regressions that surface from these comparisons.

The premise for Diffy is that if two implementations of the service return “similar” responses for a sufficiently large and diverse set of requests, then the two implementations can be treated as equivalent and the newer implementation is regression-free.

We use the language “similar” instead of “same” because responses may be prone to a good deal of noise that can make some parts of the response data structure non-deterministic. For example:

  • Server-generated timestamps embedded in the response
  • Use of random generators in the code
  • Race conditions in live data served by downstream services

All of these create a strong need for noise to be automatically eliminated. Noisy results are useless for developers, because trying to manually distinguish real regressions from noise is like looking for a needle in a haystack. Diffy’s novel noise cancellation technique distinguishes it from other comparison-based regression analysis tools.
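One way to make “similar” concrete is to compare responses field by field instead of as opaque blobs, so that noise stays confined to the fields that produce it. The Python sketch below is illustrative only (Diffy itself is written in Scala, and its comparison logic is not reproduced here); `flatten` and `diff_fields` are hypothetical helpers that turn a JSON-like response into path/value pairs and report which paths differ.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel so a field present on only one side counts as different


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts and lists into {"path.to[0].field": leaf_value} pairs."""
    if isinstance(value, dict):
        out: Dict[str, Any] = {}
        for key, child in value.items():
            out.update(flatten(child, f"{prefix}.{key}" if prefix else str(key)))
        return out
    if isinstance(value, list):
        out = {}
        for index, child in enumerate(value):
            out.update(flatten(child, f"{prefix}[{index}]"))
        return out
    return {prefix: value}


def diff_fields(a: Any, b: Any) -> Set[str]:
    """Return the paths whose values differ between two responses."""
    fa, fb = flatten(a), flatten(b)
    return {p for p in fa.keys() | fb.keys() if fa.get(p, _MISSING) != fb.get(p, _MISSING)}


# Two known-good instances answering the same request; only the timestamp differs.
primary = {"user": {"id": 42, "name": "ada"}, "served_at": 1441238400123}
secondary = {"user": {"id": 42, "name": "ada"}, "served_at": 1441238400456}
print(diff_fields(primary, secondary))  # {'served_at'}
```

Because the timestamp is isolated as a single path, it can later be recognized as noise without hiding genuine changes elsewhere in the response.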

How Diffy works
Diffy acts as a proxy which accepts requests drawn from any source you provide and multicasts each of these requests to three different service instances:

  1. A candidate instance running your new code
  2. A primary instance running your last known-good code
  3. A secondary instance running the same known-good code as the primary instance
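The fan-out itself is simple to picture. In this sketch each instance is modeled as a plain Python function (`old_code` and `new_code` are hypothetical stand-ins; real instances would be reached over HTTP or Thrift), and a hypothetical `multicast` helper sends one request to all three concurrently and collects the responses by name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Handler = Callable[[Request], Any]  # stand-in for a network call to one instance


def multicast(request: Request, instances: Dict[str, Handler]) -> Dict[str, Any]:
    """Send the same request to every instance concurrently; return responses keyed by instance name."""
    with ThreadPoolExecutor(max_workers=len(instances)) as pool:
        futures = {name: pool.submit(handler, request) for name, handler in instances.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical service code: the "new" version changes the greeting.
def old_code(req: Request) -> Dict[str, str]:
    return {"greeting": "hello " + req["name"]}


def new_code(req: Request) -> Dict[str, str]:
    return {"greeting": "hi " + req["name"]}


responses = multicast(
    {"name": "ada"},
    {"candidate": new_code, "primary": old_code, "secondary": old_code},
)
print(responses["candidate"], responses["primary"])  # {'greeting': 'hi ada'} {'greeting': 'hello ada'}
```

Sending the requests concurrently keeps the three instances exposed to roughly the same moment in time, which matters when responses depend on live downstream data.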

Here’s a diagram illustrating how Diffy operates:

When Diffy receives a request, it sends the same request to the candidate, primary and secondary instances. When those instances respond, Diffy compares the responses and looks for two things:

  1. Raw differences observed between the candidate and primary instances.
  2. Non-deterministic noise observed between the primary and secondary instances. Since both of these instances are running known-good code, we would ideally expect responses to be identical. For most real services, however, we observe that some parts of the responses end up being different and exhibit nondeterministic behavior.
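The two comparisons for a single request can be sketched as below, assuming each response has already been flattened into a map from field path to value. `compare` and `differing` are hypothetical names, and subtracting the noisy fields from the raw ones is a naive per-request filter shown only for illustration.

```python
from typing import Any, Dict, Set, Tuple

Flat = Dict[str, Any]  # assumption: responses already flattened to field path -> value
_MISSING = object()


def differing(a: Flat, b: Flat) -> Set[str]:
    """Field paths whose values differ, counting a field missing on one side as a difference."""
    return {k for k in a.keys() | b.keys() if a.get(k, _MISSING) != b.get(k, _MISSING)}


def compare(candidate: Flat, primary: Flat, secondary: Flat) -> Tuple[Set[str], Set[str]]:
    raw = differing(candidate, primary)    # 1. raw differences: candidate vs primary
    noise = differing(primary, secondary)  # 2. noise: primary vs secondary (same code)
    return raw, noise


primary = {"user.name": "ada", "served_at": 100}
secondary = {"user.name": "ada", "served_at": 105}
candidate = {"user.name": "Ada", "served_at": 103}

raw, noise = compare(candidate, primary, secondary)
print(sorted(raw), sorted(noise))  # ['served_at', 'user.name'] ['served_at']
print(sorted(raw - noise))         # ['user.name']  (naive per-request filter)
```

This works when noise shows up on every request, as the timestamp does here, but it breaks down for noise that only appears some of the time.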

These differences may not show up consistently on a per-request basis. Imagine a random boolean embedded in the response. On any given request there is a 50% chance that primary and secondary agree on the boolean, and independently a 50% chance that the candidate’s value differs from the primary’s. So on 25% of requests (0.5 × 0.5) the candidate differs from primary while primary and secondary show no noise at all, and a per-request check would report a regression that is really just noise. For this reason, Diffy looks at the aggregate frequency of each type of error across all the requests it has seen to date. It measures how often primary and secondary disagree with each other versus how often primary and candidate disagree with each other. If these two rates are roughly the same, it concludes that the difference is noise and can be ignored.
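The random-boolean example can be checked with a small simulation. This Python sketch assumes two fields: `flag`, a boolean each instance draws independently (pure noise), and `total`, a deterministic value the candidate gets wrong through a hypothetical off-by-one bug. The `tolerance` used to decide whether raw differences clearly exceed noise is an arbitrary illustrative value, not a Diffy setting, and `run` and `regressions` are hypothetical names.

```python
import random
from collections import Counter
from typing import Set, Tuple


def run(n_requests: int, seed: int = 0) -> Tuple[Counter, Counter, int]:
    """Simulate comparisons for two fields:
    'flag'  - a random boolean each instance draws independently (pure noise)
    'total' - a deterministic value the candidate gets wrong (a real regression)
    """
    rng = random.Random(seed)
    raw, noise = Counter(), Counter()  # per-field disagreement counts
    naive_false_alarms = 0  # requests where a per-request check would blame 'flag'
    for i in range(n_requests):
        primary, secondary, candidate = [
            {"flag": rng.random() < 0.5, "total": i} for _ in range(3)
        ]
        candidate["total"] = i + 1  # hypothetical off-by-one bug in the new code
        for field in primary:
            raw[field] += int(candidate[field] != primary[field])
            noise[field] += int(primary[field] != secondary[field])
        # P(primary == secondary) * P(candidate != primary) = 0.5 * 0.5 = 0.25
        if primary["flag"] == secondary["flag"] and candidate["flag"] != primary["flag"]:
            naive_false_alarms += 1
    return raw, noise, naive_false_alarms


def regressions(raw: Counter, noise: Counter, n: int, tolerance: float = 0.05) -> Set[str]:
    """Fields whose candidate/primary disagreement rate exceeds the primary/secondary rate by more than `tolerance`."""
    return {f for f in raw if (raw[f] - noise[f]) / n > tolerance}


n = 10_000
raw, noise, naive = run(n)
print(f"per-request false-alarm rate for 'flag': {naive / n:.3f}")  # close to 0.25
print(sorted(regressions(raw, noise, n)))  # ['total']
```

Counting per field over all requests lets the noisy `flag` cancel out (both rates hover around 50%), while `total` stands out because primary and secondary never disagree on it.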

Getting started
Here’s how you can start using Diffy to compare three instances of your service:
1. Deploy your old code to localhost:9990. This is your primary.
2. Deploy your old code to localhost:9991. This is your secondary.
3. Deploy your new code to localhost:9992. This is your candidate.
4. Build the Diffy jar from source with the “./sbt assembly” command.
5. Run the Diffy jar with the following command from the diffy directory:
java -jar ./target/scala-2.11/diffy-server.jar \
-candidate="localhost:9992" \
-master.primary="localhost:9990" \
-master.secondary="localhost:9991" \
-service.protocol="http" \
-serviceName="My Service" \
-proxy.port=:31900 \
-admin.port=:31159 \
-http.port=:31149 \
-rootUrl="localhost:31149"
6. Send a few test requests to your Diffy instance:
curl localhost:31900/your_application_route
7. Watch the differences show up in your browser at localhost:31149. You should see something like this:

8. You can also see the full request that triggered the behavior and the full responses from primary and candidate:

Visit the GitHub repo for more detailed instructions and examples.
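Because Diffy’s conclusions rest on a large and diverse set of requests, a handful of curl calls is only a smoke test. A minimal Python sketch for replaying many requests through the proxy follows; `replay` and `fetch` are hypothetical helpers, the base URL assumes the `-proxy.port=:31900` setting from step 5, and the routes you pass in (for example, paths sampled from production access logs) are up to you.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def fetch(url: str, timeout: float = 5.0) -> int:
    """Send a GET request and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def replay(paths: Iterable[str],
           base_url: str = "http://localhost:31900",
           fetcher: Callable[[str], int] = fetch) -> List[int]:
    """Send each path through the Diffy proxy; return one status code per request."""
    base = base_url.rstrip("/")
    return [fetcher(f"{base}/{path.lstrip('/')}") for path in paths]


# Example (requires a running Diffy proxy):
# replay(["/your_application_route", "/your_application_route?id=7"])
```

The `fetcher` parameter exists so the replay loop can be exercised without a live proxy; in normal use the default `fetch` is enough.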

As engineers we all want to focus on building and shipping products quickly. Diffy enables us to do that by keeping track of potential bugs for us. We hope you can gain from the project just as we have, and help us to continue improving it over time.