In October 2018, we published the first comprehensive, public archive of data related to state-backed information operations. Since then, we’ve shared 37 datasets of attributed platform manipulation campaigns originating from 17 countries, spanning more than 200 million Tweets and nine terabytes of media. More than 26,000 researchers have accessed these datasets, empowering an unprecedented level of empirical research into state-backed attacks on the integrity of the conversation on Twitter.
We strive to provide timely updates, alongside comprehensive data, whenever our teams identify and remove these campaigns. This year, however, technical issues and significant risks to the physical safety of our employees posed by certain disclosures have limited us to a single update. During this time, we’ve been working to identify a sustainable path forward without compromising our goal of providing meaningful transparency.
Today, in addition to disclosing eight additional datasets in our archive, we’re sharing an update on what we’ve learned from these efforts and how we intend to advance data-driven transparency in 2022 and beyond.
What we’ve learned so far
Where we’re headed in 2022
With these lessons in mind, and in light of the emergent risks we see to the physical safety of our employees around the world tied to potential disclosures, we’re changing our approach so that we can continue to provide expanded transparency about our content moderation actions. Here’s what you’ll see in the coming months:
In early 2022, we will launch the Twitter Moderation Research Consortium (TMRC), a global group of experts from across academia, civil society, NGOs, and journalism studying platform governance issues. Membership will be open to applicants who can demonstrate:
- A proven track record of research on content moderation and integrity topics (or affiliation with a group that does such research, such as a university, research lab, or newspaper).
- Appropriate plans and systems for safeguarding the privacy and security of the data provided by the consortium.
Later in 2022, we will for the first time share similarly comprehensive data about other policy areas, including misinformation, coordinated harmful activity, and safety.
As part of this change, we will discontinue our fully public dataset releases, prioritizing release to the consortium instead. Existing datasets will remain available for download indefinitely, and our public data offerings, including free access to our APIs and to the full archive of Tweets, are unchanged.
Transparency is core to our mission. Our goal with these changes is to provide more transparency about more issues, while grappling with the considerable safety, security, and integrity challenges in this space. We’ll continue to learn and iterate on our approach over time and share those findings publicly along the way.