Distributed training of sparse ML models — Part 3: Observed speedups

Our customized data- and model-parallel distributed training strategy delivers training speedups of up to 60x over single-node training for sparse machine learning models at Twitter.

Distributed training of sparse ML models — Part 2: Optimized strategies

We use a combination of data parallelism and model parallelism in a customized distributed training strategy to enable fast training of large sparse machine learning models at Twitter.
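The combination described above can be pictured with a short, heavily simplified sketch. It is not Twitter's implementation: the shard count, vocabulary size, variable names, and the `sharded_lookup` helper are all illustrative assumptions. The idea is that the large sparse embedding table is split by feature id across several variables (model parallelism), while the dense part of the model would be replicated and fed a different batch on each replica (data parallelism).

```python
import tensorflow as tf

NUM_SHARDS = 4        # hypothetical number of parameter shards
VOCAB_SIZE = 100_000  # hypothetical sparse-feature vocabulary size
EMB_DIM = 64          # hypothetical embedding width

# Model parallelism: the embedding table exists only in pieces, one shard per
# parameter device; in a real multi-machine job each shard would be pinned to
# a different parameter server with tf.device(...).
shards = [
    tf.Variable(
        tf.random.normal([VOCAB_SIZE // NUM_SHARDS, EMB_DIM]),
        name=f"embedding_shard_{i}",
    )
    for i in range(NUM_SHARDS)
]


def sharded_lookup(ids):
    """Route each feature id to the shard that owns it and gather its row."""
    shard_ids = ids % NUM_SHARDS   # which shard owns each id
    local_ids = ids // NUM_SHARDS  # row index within that shard
    pieces = []
    for i, shard in enumerate(shards):
        mask = tf.equal(shard_ids, i)
        # Gather a row for every id (index 0 as a harmless placeholder for ids
        # owned by other shards), then zero out the placeholder rows.
        gathered = tf.gather(shard, tf.where(mask, local_ids, 0))
        pieces.append(tf.where(mask[:, None], gathered, tf.zeros_like(gathered)))
    return tf.add_n(pieces)


# Data parallelism: each replica would run this lookup on its own batch of ids
# and feed the result into a replicated dense tower.
print(sharded_lookup(tf.constant([3, 17, 42, 99_999])).shape)  # (4, 64)
```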

Distributed training of sparse ML models — Part 1: Network bottlenecks

We detail the optimizations behind our custom approach to distributed training. We begin with the distribution strategies provided by TensorFlow and the difficulties we encountered using them at Twitter.
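For orientation, here is the kind of out-of-the-box setup Part 1 starts from, using one of TensorFlow's built-in strategies. This is only a sketch: the model and layer sizes are illustrative, and tf.distribute.MirroredStrategy stands in for the stock strategies the post discusses. MirroredStrategy replicates every variable, including the wide sparse embedding table, on every device, which is where large sparse models start to run into trouble.

```python
import tensorflow as tf

# Data parallelism with a stock TensorFlow distribution strategy: every
# variable, including the sparse-feature embedding table, is replicated on
# each local device, and gradients are synchronized across replicas.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        # Illustrative sparse-feature embedding; production tables are far larger.
        tf.keras.layers.Embedding(input_dim=100_000, output_dim=64),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")

# model.fit(dataset) would then run synchronized data-parallel training steps.
```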