
Introducing Serial: Improved Data Serialization on Android

By Ali Fauci
Monday, 6 November 2017

Smooth timeline scrolling on the Twitter for Android app is important for the user experience, and we’re always looking for ways to improve it. With some profiling, we discovered that serializing and deserializing data to and from the database using standard Android Externalizable classes was taking around 15% of the UI thread time. Existing libraries provided little support for making iterative changes, and any changes that broke serialization caused bugs that were difficult to identify and fix without wiping the database clean.

We set out to fix this, and today we are excited to announce Serial, a new open source library for serialization.

When we started developing Serial, we identified four main pain points around the standard Android serialization libraries:

  • Performance: Slow serialization was directly impacting the user experience.
  • Debuggability: When there was a bug in our serialized data, the debugging information was opaque and provided very little insight into how to approach a fix.
  • Backwards compatibility: Android libraries had little to no support for making changes to objects that are serialized without wiping the serialized data completely, which made iteration difficult.
  • Flexibility: We wanted a library that could be easily adopted by our existing code and model structure.

While other serialization libraries like Kryo and FlatBuffers attempt to solve some overlapping problems, none of the libraries we found fit these needs effectively on Android. They tend to target performance and backwards compatibility, but ignore debuggability and often require major changes to the existing codebase in order to adopt the framework.

Performance

We pinpointed reflection as a culprit for performance. With Externalizable, information about the class, including the class name and package, is added to the byte array when serializing the object. This allows the framework to identify which object to instantiate with the serialized data, and where to find that class in the app package structure, but it is a time-consuming process.

To remove this inefficiency, Serial allows the developer to define a Serializer for each object that needs to be serialized. The Serializer can explicitly enumerate how each field should be written to and read from the serialized stream. This removes the need for reflection and dynamic lookups.

Note: The real Tweet object and other model objects in our codebase that get the most benefit from these changes are significantly larger than this example, but for simplicity this is a scaled down version.
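As a rough illustration of the idea only (this is not Serial's actual API), the sketch below writes and reads each field by hand using the JDK's DataOutputStream and DataInputStream. TweetSketch and TweetSketchSerializer are hypothetical names invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, scaled-down model object used only for illustration.
final class TweetSketch {
    final long id;
    final String text;

    TweetSketch(long id, String text) {
        this.id = id;
        this.text = text;
    }
}

// Hypothetical stateless serializer: every field is named explicitly,
// so nothing has to be discovered through reflection at runtime.
final class TweetSketchSerializer {
    static final TweetSketchSerializer INSTANCE = new TweetSketchSerializer();

    private TweetSketchSerializer() {}

    byte[] toBytes(TweetSketch tweet) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeLong(tweet.id);   // field 1
            out.writeUTF(tweet.text);  // field 2
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    TweetSketch fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            long id = in.readLong();     // fields are read in the order they were written
            String text = in.readUTF();
            return new TweetSketch(id, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the serializer holds no state, a single shared instance is enough, and no class name or package ends up in the output bytes.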


Additionally, Serializers are stateless and can be used directly as static instances.

In a test with a large TweetEntities object, we found significant improvements in the size and average serialization and deserialization speed.


As shown in this table of results, roundtrip serialization was more than 3x faster, with serialization being almost 5x faster and deserialization about 2.5x faster (deserialization is inherently slower because of the time spent in creating a new instance of the object). The space it took to serialize this object decreased by almost 5x. These improvements will vary based on the object being serialized, but the gains are significant.

Debuggability

To improve debuggability of serialization, we took advantage of the structure of our serializer to provide meaningful and descriptive error messages. The majority of issues occur when deserialization finds an unexpected value in the input, either because something has changed since the object was serialized or there’s a bug in the serialization code. When a problem occurs, Serial provides an exception with exactly what the unexpected value was, and a dump of the structure of the serialized data to allow the developer to see where in the object the issue occurred. This helps developers easily identify and fix the issue.
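One way such descriptive errors could be produced is sketched below, assuming every value is preceded by a one-byte type header. The tag values and the TypedReaderSketch class are assumptions made for this example and are not taken from Serial's source.

```java
import java.nio.ByteBuffer;

// Hypothetical reader that validates a type header before each value.
final class TypedReaderSketch {
    // Tag values are invented for this sketch; Serial's real headers may differ.
    static final byte TYPE_INT = 1;
    static final byte TYPE_LONG = 2;

    private final ByteBuffer input;

    TypedReaderSketch(byte[] bytes) {
        this.input = ByteBuffer.wrap(bytes);
    }

    int readInt() {
        expect(TYPE_INT, "int");
        return input.getInt();
    }

    long readLong() {
        expect(TYPE_LONG, "long");
        return input.getLong();
    }

    // Fails with a message naming the expected type, the type found, and the offset.
    private void expect(byte expectedType, String name) {
        int position = input.position();
        byte actualType = input.get();
        if (actualType != expectedType) {
            throw new IllegalStateException("Expected " + name + " (type " + expectedType
                    + ") at byte " + position + " but found type " + actualType);
        }
    }
}
```

Reading a long where an int was written fails immediately with the offending offset and type, instead of silently producing a corrupted object.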


Backwards compatibility

In the past, if a developer wanted to make a change to a model object, it broke serialization and usually required a database clean. To fix this problem and allow developers to make changes quickly and without additional cost, we added versioning. If a field is added or removed, you can increment the version of the Serializer. This version is written into the serialized data so that during deserialization, the version is read and can be used in the deserialization code to specify exactly what fields are expected for that version.
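The versioning idea can be sketched as follows, using plain java.nio.ByteBuffer rather than Serial's API. VersionedTweetSketch, its lang field, and the "und" default are hypothetical; the assumption is that version 1 stored only the id and version 2 added a language code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical model: version 1 stored only `id`; version 2 added `lang`.
final class VersionedTweetSketch {
    static final int CURRENT_VERSION = 2;

    final long id;
    final String lang;

    VersionedTweetSketch(long id, String lang) {
        this.id = id;
        this.lang = lang;
    }

    // Always writes the current version first, followed by the fields.
    byte[] toBytes() {
        byte[] langBytes = lang.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 8 + 4 + langBytes.length)
                .putInt(CURRENT_VERSION)
                .putLong(id)
                .putInt(langBytes.length)
                .put(langBytes)
                .array();
    }

    // Reads the version first, then only the fields that version contains.
    static VersionedTweetSketch fromBytes(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int version = in.getInt();
        long id = in.getLong();
        String lang = "und"; // assumed default for data written before `lang` existed
        if (version >= 2) {
            byte[] langBytes = new byte[in.getInt()];
            in.get(langBytes);
            lang = new String(langBytes, StandardCharsets.UTF_8);
        }
        return new VersionedTweetSketch(id, lang);
    }
}
```

Data written by the older version still deserializes, so adding the field does not force a database wipe.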


Additionally, we removed any reference to the class name or package from the serialized data. With the standard Android libraries, moving or renaming a class breaks serialization; Serial lets developers make such changes to their codebase without these repercussions.

Flexibility

In order to make Serial easy to use, it was important that it not require significant changes to the existing model structure. Since a Serializer is defined as a standalone class that can be statically called to serialize an object, almost any model object can be serialized using this framework, even existing Android objects or objects from other imported libraries. To use a Serializer to serialize an object, you just need to create an instance of the class Serial, which allows you to transform an object to and from a byte array using the defined Serializer.
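To show why a standalone serializer works for classes you do not own, here is a minimal sketch in which java.util.UUID stands in for a third-party type. SketchSerializer, SketchSerial, and UuidSerializer are hypothetical stand-ins written for this example; they only mirror the shape described above and are not the library's classes.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical serializer contract, defined outside the model class.
interface SketchSerializer<T> {
    void write(ByteBuffer out, T value);
    T read(ByteBuffer in);
}

// Hypothetical entry point that converts objects to and from byte arrays.
final class SketchSerial {
    <T> byte[] toByteArray(T value, SketchSerializer<T> serializer) {
        ByteBuffer out = ByteBuffer.allocate(64); // fixed capacity keeps the sketch short
        serializer.write(out, value);
        byte[] bytes = new byte[out.position()];
        out.flip();
        out.get(bytes); // copy only the bytes that were actually written
        return bytes;
    }

    <T> T fromByteArray(byte[] bytes, SketchSerializer<T> serializer) {
        return serializer.read(ByteBuffer.wrap(bytes));
    }
}

// Serializer for a class we cannot modify; UUID itself is left untouched.
final class UuidSerializer implements SketchSerializer<UUID> {
    static final UuidSerializer INSTANCE = new UuidSerializer();

    private UuidSerializer() {}

    @Override
    public void write(ByteBuffer out, UUID value) {
        out.putLong(value.getMostSignificantBits()).putLong(value.getLeastSignificantBits());
    }

    @Override
    public UUID read(ByteBuffer in) {
        return new UUID(in.getLong(), in.getLong());
    }
}
```

The model class needs no interface, annotation, or base class; all serialization knowledge lives in the serializer.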


Under the hood

The Serial framework serializes objects into a compact and well-structured byte stream. Each object is serialized into a byte array with an object start, the serialized object fields, and an object end. Each serialized field is either a serialized object itself or a simple type, which consists of a type header and a value.


For the Tweet object example above, the resulting byte array would have the following structure.


The object start and end delimiters help us ensure that the serialized data is valid, and let us dump the structure as shown in the debuggability section above. Additionally, we can store the version number of the serialized object in the header so it’s the first thing read during deserialization and can be used to support backwards compatibility when reading the subsequent fields.
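A minimal sketch of this kind of framing, limited to int fields for brevity, might look like the following. The tag values and the FramingSketch class are assumptions made for illustration and do not reflect Serial's actual wire format.

```java
import java.nio.ByteBuffer;

// Hypothetical framing: start tag + version, (type tag + value)*, end tag.
final class FramingSketch {
    // Tag values are invented for this sketch only.
    static final byte TYPE_INT = 1;
    static final byte OBJECT_START = 10;
    static final byte OBJECT_END = 11;

    // Writes the version into the object header so it is the first thing read back.
    static byte[] writeObject(int version, int... fields) {
        ByteBuffer out = ByteBuffer.allocate(1 + 4 + fields.length * 5 + 1);
        out.put(OBJECT_START).putInt(version);
        for (int field : fields) {
            out.put(TYPE_INT).putInt(field);
        }
        out.put(OBJECT_END);
        return out.array();
    }

    // Walks the tags and renders a human-readable dump of the structure.
    static String dump(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        StringBuilder text = new StringBuilder();
        while (in.hasRemaining()) {
            byte tag = in.get();
            switch (tag) {
                case OBJECT_START:
                    text.append("Object(v").append(in.getInt()).append(") {");
                    break;
                case TYPE_INT:
                    text.append(" int:").append(in.getInt());
                    break;
                case OBJECT_END:
                    text.append(" }");
                    break;
                default:
                    throw new IllegalStateException(
                            "Unknown tag " + tag + " at byte " + (in.position() - 1));
            }
        }
        return text.toString();
    }
}
```

In this sketch the headers cost one byte per field plus a small fixed overhead per object, and they are what make both the structure dump and the version check possible.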

Compared to including package and class data in the serialized stream, Serial is much more efficient with its use of metadata. While these headers add some extra data, the benefits around debuggability and backwards compatibility resulting from them add more value.

Try it out

You can find all of Twitter’s open source projects on our GitHub page at https://twitter.github.io, and you can download Serial at https://github.com/twitter/serial. We're excited to continue contributing to the open source community and look forward to your feedback.

 


Ali Fauci

@fauciforthewin

Software Engineer
