Latest posts
By Andrew Bean and Ricardo Cervera-Navarro on
Our customized data- and model-parallel distributed training strategy provides training speed improvements of up to 60x over single-node training for sparse machine learning models at Twitter.
By Andrew Bean on
We use a combination of data parallelism and model parallelism in a customized distributed training strategy to enable fast training of large sparse machine learning models at Twitter.
By Andrew Bean on
We detail the optimizations behind our custom approach to distributed training, beginning with the distribution strategies provided by TensorFlow and the difficulties we encountered using them at Twitter.
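As a rough illustration (not drawn from the post itself), data-parallel training with TensorFlow's built-in distribution strategies looks roughly like the sketch below; the model and dataset are placeholders.

```python
# Minimal sketch: data parallelism via tf.distribute.MirroredStrategy.
# The model architecture and toy dataset here are illustrative assumptions.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicate the model across local GPUs

with strategy.scope():
    # Variables created inside the scope are mirrored on every replica.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Toy dataset; each replica processes a shard of every global batch.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal([1024, 16]), tf.random.normal([1024, 1]))
).batch(64)

model.fit(dataset, epochs=1)
```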