For the first time, Twitter participated in the Google Summer of Code (GSoC) and we want to share news on the resulting open source activities. Unlike many GSoC participating organizations that focus on a single ecosystem, we have a variety of projects spanning multiple programming languages and communities.
Today, we’re excited to open source Clockwork Raven, a web application that allows users to easily submit data to Mechanical Turk for manual review and then analyze that data. Clockwork Raven steps in to do what algorithms cannot: it sends your data analysis tasks to real people and gets fast, cheap and accurate results. We use Clockwork Raven to gather tens of thousands of judgments from Mechanical Turk users every week.
We are a heavy user of Apache Hadoop with a large set of data that resides in its clusters, so it’s important for us to understand how these resources are utilized. At our July Hack Week, we experimented with developing HDFS-DU to give us an interactive visualization of the underlying Hadoop Distributed File System (HDFS).
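To make the idea of a disk-usage view concrete, here is a minimal Python sketch of the kind of roll-up such a visualization needs: summing file sizes into every ancestor directory. It is not HDFS-DU's code; the `directory_usage` function, the `(path, size)` input format, and the sample listing are all assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def directory_usage(files):
    """Roll file sizes up into every ancestor directory.

    `files` is an iterable of (path, size_in_bytes) pairs; this input
    format is an assumption made for the sketch.
    """
    usage = defaultdict(int)
    for path, size in files:
        # .parents yields each ancestor directory, ending at the root "/".
        for parent in PurePosixPath(path).parents:
            usage[str(parent)] += size
    return dict(usage)


# Hypothetical listing, not real cluster data.
sample = [
    ("/logs/2012/07/a.log", 300),
    ("/logs/2012/08/b.log", 200),
    ("/user/alice/data.seq", 500),
]
usage = directory_usage(sample)
```

A treemap or similar chart could then size each directory's box by its entry in `usage`.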
Trident is a new high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.0 (released today). It allows you to seamlessly mix high throughput (millions of messages per second), stateful stream processing with low latency distributed querying.
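As a rough illustration of what mixing stateful stream processing with querying means, the following Python sketch keeps running word counts that a stream of batches updates and that a query can read at any time. It does not use Trident's API; the `WordCountState` class and its methods are hypothetical, and everything runs in a single process rather than on a distributed cluster.

```python
from collections import Counter


class WordCountState:
    """In-memory stand-in for the state a stream processor would persist."""

    def __init__(self):
        self._counts = Counter()

    def update(self, batch):
        # Stream side: fold one batch of sentences into the running counts.
        for sentence in batch:
            self._counts.update(sentence.lower().split())

    def query(self, words):
        # Query side: sum the current counts for the requested words.
        return sum(self._counts[w] for w in words)


state = WordCountState()
state.update(["the cow jumped over the moon"])
state.update(["the man went to the store"])
total = state.query(["the", "cow"])
```

In a real deployment the state would live in a durable store and be updated in parallel, which this sketch leaves out.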
We recently open sourced TwitterCLDR under the Apache License 2.0. TwitterCLDR is an “ICU level” internationalization library for Ruby that supports dates, times, numbers, currencies, world languages, sorting, text normalization, time spans, plurals, and Unicode code point data. By sharing our code with the community, we hope to collaborate on improving internationalization support for websites all over the world.