Update - July 11, 2012, 9:45am
We want to correct an error regarding the slab calcification problem we mentioned in the original post. This problem only applied to our v1.4.4 fork of Memcached; this correction is reflected below. The recent Memcached version has addressed some of these problems.
We built Twemcache because we needed a more robust and manageable version of Memcached, suitable for our large-scale production environment. Today, we are open-sourcing Twemcache under the New BSD license. As one of the largest adopters of Memcached, a popular open source caching system, we have used Memcached over the years to help us scale our ever-growing traffic. Today, we have hundreds of dedicated cache servers keeping over 20TB of data from over 30 services in-memory, including crucial data such as user information and Tweets. Collectively these servers handle almost 2 trillion queries on any given day (that’s more than 23 million queries per second). As we continued to grow, we needed a more robust and manageable version of Memcached suitable for our large scale production environment.
We have been running Twemcache in production for more than a year and a half. Twemcache is based on a fork of Memcached v1.4.4 that is heavily modified to improve maintainability and help us monitor our cache servers better. We improved performance, removed code that we didn’t find necessary, refactored large source files and added observability related features. The following sections will provide more details on why we did this and what those new features are.
Almost all of our cache use cases fall into two categories:
An example of these two optimizations is “caching of Tweets”. All Tweets are persisted to disk when they are created, but most Tweets requested by users need to be served out of memory for performance reasons. We use Twemcache to store recent and frequently accessed Tweets, as an optimization for disk. When a Tweet shows up in a particular client, it takes a particular presentation - rendered Tweet - which has other metadata like number of retweets, favorites etc. We also use Twemcache to store the recently rendered Tweets, as an optimization for cpu.
To effectively address the use cases mentioned above, it’s extremely important that caches are always available and have predictable performance with respect to item hit rate even when operating at full capacity. Caches should also be able to adapt to changing item sizes on-the-fly as application data size grows or shrinks over time. Finally, it is critical to have observability into caches to monitor the health and effectiveness of our cache clusters. It turns out that all these problems are interrelated because adapting to changing item sizes usually requires a cache reconfiguration — which impacts availability and predictability. Twemcache tries to address these needs with the help of the following features:
The v1.4.4 implementation of Memcached, which Twemcache is based on, suffers from a problem we call slab calcification. In Memcached, a slab can only store items of a given maximum size and once a slab has been allocated to a slab class, it cannot be reassigned to another slab class. In other words, slabs once allocated are locked to their respective slab classes. This is the crux of the slab calcification problem. When items grow or shrink in size, new slabs must be to allocated to store them. Over time, when caches reach full memory capacity, to store newer items we must rely on evicting existing items in the same slab class. If the newer items are of a size with no slabs allocated, write requests may fail completely. Meanwhile, slabs allocated to a different slab class may sit idle. Slab calcification leads to loss of capacity and efficiency.
To solve this problem without resorting to periodically restarting the server instances, we introduced a new eviction strategy called random eviction. In this strategy, when a new item needs to be inserted and it cannot be accommodated by the space occupied by an expired item or the available free memory, we’ll simply pick a random slab from the list of all allocated slabs, evict all items within that slab, and reallocate it to the slab class that fits the new item.
It turns out that this feature is quite powerful for two reasons:
By providing a stable hit rate, random eviction prevents performance degradation due to data pattern change and system instability associated with restarts. The video below illustrates how over time Twemcache is able to adapt to a shifting size pattern and still remain effective.
Each line of the command log gives precise information on the client, the time when a request was received, the command header including the command, key, flags and data length, a return code, and reply message length. In fact, the only thing missing is the item value itself, which turns out to be unimportant for our analysis.
The command logger supports lockless read/write into ring buffers. Each worker thread logs into a thread-local buffer as they process incoming queries, and a background thread asynchronously dumps buffer contents to either a file or a socket. Thus the overhead on worker threads is minimal and so would not affect the service availability. The logger has been tested to log at about 100k requests-per-second. To control the speed of log generation, the command logger also supports sampling. Once we know what keys are accessed, the way they are accessed, and their return status, we can perform offline data analysis to estimate optimal working set size, item heat map, etc.
Twemcache is the result of our effort to turn Memcached into a reliable building block in Twitter’s data infrastructure. We kept the simplicity of the Memcached protocol intact, but made the service more dependable and more informative with Twemcache, without sacrificing performance. While we initially focused on the challenging goal of making Memcached work extremely well within the Twitter infrastructure, we look forward to sharing our code and ideas with the Memcached community in the long term.
In the near future, we plan to evolve Twemcache in the open, address the hashtable lock contention issue that would further improve scalability, support more eviction strategies, support bootstrapping the cache from disk and provide a complete set of real-time data analysis tools. To view the source code and share feedback, please visit the Twemcache GitHub page. You can also follow Twemcache’s Twitter account (@Twemcache) for updates. We would love to hear any ideas you have in improving Twemcache via pull requests or issues. Or better yet, why not consider joining the flock (@jointheflock) if you want to help build a world class caching system?
Other work: Twemproxy
Twemcache is one of the building blocks that comprise the caching system at Twitter. Another fundamental building block in our caching system is Twemproxy, a proxy for memcached protocol that we recently open sourced. Twemproxy minimizes the connections to our backend caching servers and enables us to scale horizontally. Furthermore, we are also actively developing the client side of our caching system on top of the Twitter Finagle stack.
Twemcache was primarily engineered by Manju Rajashekhar (@manju) and Yao Yue (@thinkingfish). In addition, we’d like to acknowledge the following folks who contributed to the project either directly or indirectly and its deployment and maintenance in our datacenters: Anirudh Srinivas (@asrin), David Lam (@kkdlam), Krishna Gade (@krishnagade), Joshua Coats (@shu), Owen Vallis (@o_e_bert), Rob Benson (@rgbenson), Brandon Mitchell (@bitbckt) and Xin Xiang (@xiangxin72).
- Chris Aniszczyk, Manager of Open Source (@cra)
Did someone say … cookies?