Leaving the Nest: Heron donated to Apache Software Foundation

By Maosong Fu
Monday, 26 February 2018

In 2014, we began development on Heron, a real-time streaming and analytics platform, to reliably process billions of events generated at Twitter every day. Today, we are proud to donate Heron to the Apache Incubator where the community will continue to grow and thrive under the guidance of the Apache Software Foundation. We are excited to see usage of Heron grow beyond our use cases here at Twitter, and look forward to continued community growth and collaborative support.

Heron is a next-generation distributed streaming engine built to be backward compatible with Apache Storm, which we open sourced in 2011 and donated to Apache. It was built to improve our developer and operational experiences with Storm, and it introduced a wide array of architectural improvements along with native support for Apache Aurora. Heron has become our primary streaming system, reliably powering all of Twitter’s real-time analytics and running hundreds of development and production topologies deployed on thousands of nodes.

Twitter Heron galvanized the streaming community by introducing several new ideas in stream processing including:

  • Backpressure, which adjusts the pace of a topology's execution to match its slowest component.
  • A modular design, similar to a microkernel, that allows multi-language support and alternative implementations of each module (useful in a constantly changing big data landscape).
  • Isolation at various levels for ease of debugging and troubleshooting.
  • Native containerization, with support for cgroups and Docker.
  • A process-based rather than thread-based execution model, which simplifies profiling and troubleshooting.
  • Support for diverse workloads in a single deployment: a topology can be tuned for latency or for throughput with a simple configuration change.
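
To make the backpressure idea concrete, here is a minimal, single-threaded sketch in Python. It is not Heron code: `BoundedStage`, `offer`, `tick`, and `run` are hypothetical names invented for this illustration, and the fixed-size buffer is an assumed stand-in for whatever signal a real engine uses. The point is only that a fast source ends up emitting at the pace of the slow stage downstream.

```python
from collections import deque


class BoundedStage:
    """Hypothetical processing stage with a fixed-size input buffer."""

    def __init__(self, capacity, rate):
        self.capacity = capacity  # max tuples buffered before pushing back
        self.rate = rate          # tuples this stage can process per tick
        self.buffer = deque()
        self.processed = 0

    def offer(self, item):
        """Accept an item only if the buffer has room; report whether it was accepted."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(item)
        return True

    def tick(self):
        """Process up to `rate` buffered items."""
        for _ in range(min(self.rate, len(self.buffer))):
            self.buffer.popleft()
            self.processed += 1


def run(source_rate, stage, ticks):
    """Simulate a source feeding a stage; return (emitted, throttled)."""
    emitted = 0
    throttled = 0
    for _ in range(ticks):
        for _ in range(source_rate):
            if stage.offer(emitted):
                emitted += 1
            else:
                throttled += 1  # backpressure: the source holds this tuple back
        stage.tick()
    return emitted, throttled
```

With a source that attempts 5 tuples per tick and a stage that handles 2 per tick, the source settles at 2 accepted tuples per tick once the buffer fills, so nothing is dropped and the buffer never grows past its capacity.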

In 2016, we were excited to open source Heron, which enabled the project to grow into the vibrant and active community it is today. Some of the significant contributions from collaboration with the community include:

  • New APIs. First, a high-level functional API called Streamlets has been introduced for both Java and Python. Second, a low-level API similar to the Storm API has been added for Python. The key advantage is that topologies written with the Python API run in the native Python interpreter and receive and process data directly. In addition, the ECO (Extensible, Component, Orchestrator) API supports stitching topologies together using YAML.
  • Incorporation of stateful and effectively-once processing, with adaptors for Apache Hadoop, Apache BookKeeper, and the local file system.
  • In collaboration with Microsoft, Heron pioneered Dhalion, which grew out of operational experience running Heron. Dhalion allows Heron to self-tune, self-heal, and self-stabilize when topologies behave unexpectedly because of system behavior or changes in data rate and volume, all without manual intervention.
  • Open source Heron can optionally use Apache BookKeeper as storage, which provides a seamless experience for distributing job jars and also serves as state storage for exactly-once processing.
  • Support for deployment on several schedulers, such as Kubernetes, DC/OS, Nomad, SLURM, and standalone, thereby taking advantage of the features of each scheduler.
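
As a rough illustration of what "self tune" can mean in practice, the toy calculation below picks a parallelism level from an observed data rate. It is an assumption-laden sketch, not Dhalion's actual policy or API: the function name `recommend_parallelism`, its parameters, and the 80% utilization target are all hypothetical.

```python
import math


def recommend_parallelism(arrival_rate, per_instance_rate, headroom=0.8):
    """Return how many instances keep each one at or below `headroom` utilization.

    arrival_rate: observed tuples per second entering the component (assumed metric)
    per_instance_rate: tuples per second one instance can process (assumed metric)
    headroom: target fraction of an instance's capacity to use (assumed default)
    """
    needed = math.ceil(arrival_rate / (per_instance_rate * headroom))
    return max(1, needed)  # always keep at least one instance running
```

For example, if 1,000 tuples per second arrive and each instance handles 100 per second, the sketch recommends 13 instances so that none runs above 80% of its capacity. A self-regulating system would re-evaluate such a rule as the data rate changes rather than relying on an operator to do so.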

We’ve worked collaboratively with the open source community and have benefited from contributed features that are now running in production at Twitter. We have also shared our experiences in streaming by publishing several papers at premier conferences.

Thank you to the Real-Time Compute team at Twitter, and to the Heron community, for your continued support as the project moves on to the next phase of its life cycle. Please follow us on Twitter at @heronstreaming, subscribe to the mailing list, and help support Heron in its new home!

References

[1] Dhalion: Self-Regulating Stream Processing in Heron, Proceedings of the Very Large Data Bases (VLDB), September 2017.

[2] Twitter Heron: Towards Extensible Streaming Engines, IEEE 33rd International Conference on Data Engineering (ICDE), May 2017.

[3] Optimizing Twitter Heron, Twitter Engineering Blog, March 2017.

[4] Open Sourcing Twitter Heron, Twitter Engineering Blog, May 2016.

[5] Streaming@Twitter, Bulletin of the Technical Committee on Data Engineering, IEEE Computer Society, December 2015.

[6] Twitter Heron: Streaming at Scale, Proceedings of ACM SIGMOD Conference, Melbourne, Australia, June 2015.

[7] Flying Faster with Heron, Twitter Engineering Blog, May 2015.

[8] Storm@Twitter, Proceedings of ACM SIGMOD Conference, Snowbird, Utah, June 2014.
