
Expanding access beyond information operations

Thursday, 2 December 2021

Editorial note: This blog was first published on 2 December 2021 and last updated 7 June 2022 to include updates to our approach.

In October 2018, we published the first comprehensive, public archive of data related to state-backed information operations. Since then, we’ve shared 37 datasets of attributed platform manipulation campaigns originating from 17 countries, spanning more than 200 million Tweets and nine terabytes of media. More than 26,000 researchers have accessed these datasets, empowering an unprecedented level of empirical research into state-backed attacks on the integrity of the conversation on Twitter.

We strive to provide timely updates, alongside comprehensive data, whenever our teams identify and remove these campaigns. This year, however, due to technical issues and significant risks to the physical safety of our employees posed by certain disclosures, we have provided only one update. During this time, we’ve been working to identify a sustainable path forward without compromising on our goal of providing meaningful transparency.

Today, in addition to disclosing eight additional datasets in our archive, we’re sharing an update on what we’ve learned from these efforts and how we intend to advance data-driven transparency in 2022 and beyond.

What we’ve learned so far

  • Meaningful transparency begins with access to data. The data we publish about information operations allows researchers to understand not just that a platform manipulation campaign took place and that Twitter removed it — but precisely which narratives that campaign aimed to advance, and how widely they spread on Twitter. Access to raw content, rather than limited samples and aggregate information, is important.
  • Raw data isn’t accessible to everyone. Many of the datasets we’ve released include hundreds of thousands of Tweets and gigabytes of media. Processing this information often requires advanced tooling and capabilities. Academics, independent researchers, NGOs and data journalists play a key part in translating raw data into meaningful insights, as well as providing critical context for understanding how bad actors operate. Partnerships with the Stanford Internet Observatory and the Australian Strategic Policy Institute have helped put these datasets in analytic and narrative context, as did a conference dedicated to studying this data that we held in conjunction with the Carnegie Institute.
  • Confident attribution isn’t always possible. Our transparency approach has focused on activity we can confidently attribute to a state actor. Emergent behaviors, including the use of disinformation-for-hire vendors and increasing operational security, sometimes make confident attribution impossible based solely on Twitter’s own data. This doesn’t make the activity in question less important to analyze, but our policies presently prevent dataset disclosure in these cases. Moreover, access to this data, even without attribution, may allow experts to piece together operations across multiple platforms and services in a way that no single company can.
  • Information operations are just one area of public concern. We’ve provided an unprecedented level of transparency about state-backed information operations, given their severe impact on public discourse around the world. As Camille François and evelyn douek have pointed out, other content moderation domains of equal public concern don’t receive the same treatment.

Where we’re headed in 2022

With these lessons in mind, as well as the emergent risks we see to the physical safety of our employees around the world tied to potential disclosures, we’re changing our approach in an effort to continue to provide expanded transparency about our content moderation actions. Here’s what you’ll see in the coming months:

In early 2022, we will launch the Twitter Moderation Research Consortium (TMRC) — a global group of experts from across academia, civil society, NGOs, and journalism studying platform governance issues.

  • Membership in the consortium will be granted to groups or individuals with:
      - A proven track record of research on content moderation and integrity topics (or affiliation with a group that does such research, such as a university, research lab, or newspaper).
      - Appropriate plans and systems for safeguarding the privacy and security of the data provided by the consortium.

  • We will be fully public about the standards used to determine membership in the consortium, and will bias towards inclusion and access, particularly for emerging researchers and researchers from historically under-represented communities and parts of the world.
  • Twitter will not exercise any control or judgment over the findings or focus areas of the research produced using this data by members of the consortium.
  • The more than 200 researchers around the world with existing access to our unhashed information operations datasets will be invited to join the consortium through an expedited process. Other qualifying individuals and institutions are welcome to apply. We will share additional details about this process in early 2022 in advance of any disclosures to the consortium.
  • We will provide comprehensive data about attributed platform manipulation campaigns to members of the consortium, who may independently choose to publish their findings on the basis of the data we share and their own research. Under this model, we will also begin to share data about platform manipulation campaigns for which we have not been able to arrive at confident attribution to a state actor, and campaigns where we are unable to provide broad access due to employee safety concerns.

Later in 2022, we will for the first time share similarly comprehensive data about other policy areas, including misinformation, coordinated harmful activity, and safety. 

As part of this change, we will discontinue our fully public dataset releases, prioritizing release to the consortium. Existing datasets will continue to be available for download indefinitely, and our public data offerings, including free access to our APIs (and through them the full archive of Tweets), remain available.

Our efforts in this space are underpinned by our Privacy Policy, which has long informed people how we may use the data they share with us. This includes sharing or disclosing information if we believe that it is reasonably necessary to protect the safety or integrity of our platform, including to help prevent spam, abuse, or malicious actors, or to explain why we have removed content or accounts from our services. As we highlighted in our position paper setting out principles for policymakers drafting new regulations, we urge policymakers to build protections for this kind of data sharing into the laws that govern data privacy.

Transparency is core to our mission. Our goal with these changes is to provide more transparency about more issues, while grappling with the considerable safety, security, and integrity challenges in this space. We’ll continue to learn and iterate on our approach over time and share those findings publicly along the way.


7 June 2022

Today, we’re opening up the Twitter Moderation Research Consortium to a limited group of researchers. We’ll use this initial period to gather learnings and make adjustments to program design, where needed, ahead of our forthcoming public launch. Feedback from these researchers will help shape and inform our work. 

During this period, membership will be open to applicants who were granted access to our information operations datasets during prior disclosures. Researchers with prior access may re-apply for the Consortium during this phase and will be evaluated against the updated criteria below:

  • Hold a primary institutional affiliation with an academic, journalistic, nonprofit, or civil society research organization. Students must be at the master’s or PhD level; undergraduate students are ineligible at this time.
  • Have prior experience and relevant skills for data-driven analysis. Consortium datasets are primarily shared as JSON files and require technical skills to analyze.
  • Demonstrate a specific public interest research use case for the data provided by the Consortium. (“Public interest research use case” means non-commercial research for journalistic, academic, or nonprofit/civil society purposes.)
  • Be equipped with industry-standard plans and systems for safeguarding the privacy and security of the data provided by the Consortium. Consortium members are required to sign a data use agreement.
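
As a rough illustration of the kind of data-driven analysis the second criterion describes, the Python sketch below tallies hashtags across a set of JSON records. It is only a sketch under stated assumptions: the one-object-per-line layout and the field names `tweet_text` and `hashtags` are hypothetical and are not taken from any published Consortium schema, so they would need to be adapted to the files actually provided.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, assuming one JSON object per line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def top_hashtags(records: Iterable[dict], n: int = 10) -> list[tuple[str, int]]:
    """Count hashtags case-insensitively and return the n most common."""
    counts = Counter()
    for record in records:
        # "hashtags" is a hypothetical field name; missing or null values are skipped.
        for tag in record.get("hashtags") or []:
            counts[tag.lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real file opened with open(path).
    sample = [
        '{"tweet_text": "a", "hashtags": ["News", "news"]}',
        '{"tweet_text": "b", "hashtags": null}',
        '{"tweet_text": "c", "hashtags": ["Vote"]}',
    ]
    print(top_hashtags(iter_records(sample), n=2))
```

Because `iter_records` reads lazily, the same functions can be pointed at an open file handle, which keeps memory use flat even for datasets with hundreds of thousands of Tweets.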

Later this year, we’ll open up the application for Consortium membership to the wider public and share key learnings from the beta period.

As we’ve said previously, transparency is core to our work here. Through this updated approach, we aim to share more about what we’re seeing on the Twitter service, while addressing the safety, security, and integrity challenges that accompany these disclosures. Down the line, we’ll disclose data about other policy areas, including misinformation, coordinated harmful activity, and safety. More to come.

 
