Mobile app playbook: progressive improvements and testing

By Bear Douglas
Thursday, 18 February 2016

About a year ago, I was helping out our recruiting team at a college job fair when a candidate earnestly asked me “Why are you looking for Android engineers? I mean, the Twitter app is fully built, so what are they working on?” The question caught me a little off guard, but we went on to have a pretty good conversation about the improvements we’re continually making to the app, and how the development work is never really done.

Part eight: progressive improvements and testing

There’s always lots to work on in any app: fixing bugs; adopting new tools and OS-level APIs; improving load times and resilience under poor network conditions; making accessibility and internationalization updates; or experimenting with new layouts.

Often, there’s a business goal behind what you’re doing, whether it’s explicitly stated or not. Fixing bugs is a good thing to do for its own sake, but it’s also key for retention, since people tend to delete or stop using slow, crashy apps. Proper internationalization is important for adoption because many people don’t use apps that aren’t available in their country or don’t work in their language. Tweaks in user experience and design can have massive impact on engagement or purchases.

Whenever you make significant changes, it’s important to keep track of how they’re affecting the people using your app. Talking to your customers, in person or online, is always important. Feedback conversations provide nuance and details that you can’t always capture with a few top-line metrics. You can and should use qualitative research to shape your approach to customers, but you can’t always be in touch with each segment of people that’s using your app. Logging metrics about the changes you make is critical, because analytics are user feedback at scale.

When you’re ready to start moving past your MVP and shipping major updates, we have a few suggestions for how to do it right:

Establish a baseline

It’s hard to measure the impact of changes you make if you don’t know how you’re doing to start with. Before going into any experiments, you should already know key numbers like your current DAUs, MAUs, retention rates, and conversion rates on events like purchases or social shares (bonus: our free analytics tool Answers gives you these stats in real time).
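As a minimal sketch of what "knowing your baseline" means in practice, here's how DAU and MAU can be computed from a raw activity log. The `events` list and user IDs are hypothetical stand-ins; in a real app these numbers would come from your analytics backend rather than being computed by hand.

```python
from datetime import date

# Hypothetical event log of (user_id, activity date) pairs.
events = [
    ("u1", date(2016, 2, 1)), ("u1", date(2016, 2, 2)),
    ("u2", date(2016, 2, 1)), ("u2", date(2016, 2, 2)),
    ("u3", date(2016, 2, 15)),
]

def dau(events, day):
    """Distinct users active on a given day."""
    return len({u for u, d in events if d == day})

def mau(events, year, month):
    """Distinct users active at any point in a given month."""
    return len({u for u, d in events if (d.year, d.month) == (year, month)})

feb_1_dau = dau(events, date(2016, 2, 1))   # users active on Feb 1
feb_mau = mau(events, 2016, 2)              # distinct users in February
stickiness = feb_1_dau / feb_mau            # rough DAU/MAU ratio for that day
```

The DAU/MAU ratio ("stickiness") is a common shorthand for how often your monthly audience actually comes back, and it's a useful single number to watch alongside retention.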

Decide if your change actually needs to be user tested

You can safely assume that you should just fix bugs, or just ship a version of your app that works with a screen reader. You should certainly try to measure the effects these types of changes have on your business metrics, but unless a change significantly alters the way your app operates, you may not need to spend much time testing it first.

Take the time to actually construct a hypothesis to test

What do you think might happen after you make this change? How will you know that it has or hasn’t happened? What do you consider success? Measure the effect you think it will have, but also keep an eye on your other key metrics — you may notice unintended effects alongside the ones you were testing.

Check the math

As in other parts of this series, we’re assuming that you’re operating lean and don’t have a whole data science team behind you to help create and manage these tests. Here are a few important things to keep in mind when you do that:

  • Make sure you have substantial minimum sample sizes. Whether you’re running an A/B test or a multivariate test, you want your results to be statistically significant — that is, you want to be clear that the results you’re observing are due to a real difference between the two samples and not just due to random variation. Think about the average traffic your app or site gets, and get a sense of how long it’s going to take to collect enough responses to draw a conclusion (1,000 is a reasonable lower bound on responses, but you can use tools like this handy calculator to see what minimum sample you should collect).
  • Pay attention to how daily, weekly, or seasonal usage of your app can affect the sample of people you gather for the test. Most testing methods assume that you have a random sample of your users. For example, if you run your test only during the weekdays, you may be skewing the sample set by excluding people who only use your app on weekends. Consider running tests for at least a full week.
  • Be careful when running simultaneous tests. Tests can affect one another, particularly if you’re running the test with the same group of people, and it can be difficult to disentangle the results. If you want to run tests simultaneously, try to pick independent components of your app to test.
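To make the sample-size point above concrete, here is a sketch of the standard power calculation for detecting an absolute lift in a conversion rate, again using the normal approximation (the same math behind online sample-size calculators). The baseline rate and target lift are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist

def min_sample_per_arm(p_base, lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute `lift`
    over baseline rate `p_base`, at significance `alpha` and the
    given statistical power (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_beta = nd.inv_cdf(power)
    p1, p2 = p_base, p_base + lift
    p_bar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * p_bar * (1 - p_bar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / lift ** 2
    return ceil(n)

# Detecting a 2-point lift from a 10% baseline conversion rate:
n = min_sample_per_arm(0.10, 0.02)
```

For these assumed numbers the answer is several thousand users per arm, which is why 1,000 responses is only a rough lower bound: the smaller the effect you want to detect, the more traffic you need.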

Running a statistically robust experiment isn’t as simple as it looks on the surface. This blog post from ConversionXL gives some useful details on common mistakes people make, including ignoring validity threats, and increasing chances of false positives by testing too many variations at once. A/B testing frameworks like Optimizely help take a lot of the guesswork out of running tests and bake in some best practices around measurement and interpretation of the results, so you can make better decisions. There are also lots of great tips from around the web about running robust tests for your business.
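The "too many variations" problem mentioned above has a simple back-of-the-envelope fix worth knowing even if your testing framework handles it for you: a Bonferroni correction, which divides your significance threshold across the number of simultaneous comparisons. The variant count below is an arbitrary example.

```python
def bonferroni_alpha(alpha, num_comparisons):
    """Per-test significance threshold that keeps the overall
    (family-wise) false-positive rate near `alpha`."""
    return alpha / num_comparisons

# Testing five variants against control at a naive 0.05 each gives roughly
# a 1 - 0.95**5, or about 23%, chance of at least one false positive.
# Tightening each test's threshold keeps the overall rate near 5%:
threshold = bonferroni_alpha(0.05, 5)
```

This is conservative (it makes each individual test harder to pass), but it's a reasonable default when you're running lean without a data science team to pick a more refined correction.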

What key things do you measure for your app? Tweet using #MobileAppPlaybook to tell us what’s important to you!

And don’t forget to check out this new interview about Cards with the Developer Advocate, Jonathan Cipriano.