As we shared earlier this year, we believe it’s critical to study the effects of machine learning (ML) on the public conversation and share our findings publicly. This effort is part of our ongoing work to look at algorithms across a range of topics. We recently shared the findings of our analysis of bias in our image cropping algorithm and how they informed changes in our product.
Today, we’re publishing learnings from another study: an in-depth analysis of whether our recommendation algorithms amplify political content. The first part of the study examines Tweets from elected officials* in seven countries (Canada, France, Germany, Japan, Spain, the United Kingdom, and the United States). Since Tweets from elected officials cover just a small portion of political content on the platform, we also studied whether our recommendation algorithms amplify political content from news outlets.
Since 2016, people on Twitter have been able to choose between viewing algorithmically ordered Tweets first in the Home timeline or viewing the most recent Tweets in reverse chronological order. An algorithmic Home timeline displays a stream of Tweets from accounts you have chosen to follow on Twitter, as well as recommendations of other content we think you might be interested in based on accounts you interact with frequently, Tweets you engage with, and more. As a result, what an individual sees on their Home timeline is a function of how they interact with the algorithmic system, as well as how the system is designed.
The purpose of this study was to better understand the amplification of elected officials’ political content on our algorithmically ranked Home timeline versus the reverse chronological Home timeline. We hope our findings will contribute to an evidence-based discussion of the role these algorithms play in shaping political content consumption on the internet.
In our study, we examined algorithmic amplification of political content in the Home timeline, asking whether Tweets from elected officials are amplified more in the algorithmically ranked timeline than in the reverse chronological one, whether that amplification varies across or within political parties, and whether political content from some news outlets is amplified more than content from others.
How we conducted the study
We analyzed millions of Tweets posted from April 1 to August 15, 2020, from accounts operated by elected officials in seven countries. We used this data to test whether these Tweets were amplified more on the algorithmically ranked Home timeline than on the reverse chronological timeline, and whether amplification varied across or within parties. We used public, third-party sources (such as official institutional websites) to identify political party affiliation. We did not use Tweet content to attempt to infer the political views of elected officials.
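To make the comparison concrete, one simple way to quantify amplification for a group of Tweets is the ratio of how often they reach users on the ranked timeline versus the chronological one. The sketch below is illustrative only: the function, the per-party counts, and the numbers are all hypothetical and are not the study's actual metric or data.

```python
# Hypothetical sketch: estimating an amplification ratio for a group of Tweets.
# A ratio > 1.0 means the ranked timeline surfaces the content to
# proportionally more users than the reverse chronological timeline would.

def amplification_ratio(ranked_impressions: int, chrono_impressions: int) -> float:
    """Compare reach on the ranked timeline against the chronological baseline."""
    if chrono_impressions == 0:
        raise ValueError("no chronological impressions to compare against")
    return ranked_impressions / chrono_impressions

# Hypothetical aggregated counts per party (illustrative values, not real data):
party_counts = {
    "Party A": (1_200_000, 800_000),
    "Party B": (900_000, 750_000),
}

for party, (ranked, chrono) in party_counts.items():
    print(f"{party}: amplification = {amplification_ratio(ranked, chrono):.2f}")
```

Comparing such ratios between parties, and between accounts within a party, is one way to frame the questions the study asks.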
To study algorithmic amplification of news outlets, we analyzed hundreds of millions of Tweets containing links to articles shared by people on Twitter during the same time period. The outlets were categorized based on media bias ratings from two independent organizations, AllSides and Ad Fontes Media. We excluded Tweets pointing to non-political content such as recipes or sports.
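The outlet analysis described above amounts to bucketing article links by the outlet's bias rating and summarizing amplification within each bucket. The following sketch shows that grouping step with made-up outlets, ratings, and values; it is not the study's pipeline or data.

```python
# Hypothetical sketch: bucketing Tweets that link to news articles by the
# outlet's media bias rating, then averaging a per-Tweet amplification
# estimate within each bucket. All values below are illustrative.
from collections import defaultdict
from statistics import mean

# (outlet, bias_rating, amplification_estimate) — invented examples
tweets = [
    ("outlet_a", "left", 1.1),
    ("outlet_b", "center", 1.3),
    ("outlet_c", "right", 1.6),
    ("outlet_d", "right", 1.4),
]

by_bias = defaultdict(list)
for outlet, bias, amp in tweets:
    by_bias[bias].append(amp)

for bias, amps in sorted(by_bias.items()):
    print(f"{bias}: mean amplification {mean(amps):.2f} over {len(amps)} Tweets")
```

In the actual study, the bias categories came from AllSides and Ad Fontes Media ratings rather than being assigned ad hoc as here.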
What did we find?
You can read our findings in full in the paper here.
In this study, we identify what is happening: certain political content is amplified on the platform. Establishing why these observed patterns occur is a significantly harder question, as they are a product of the interactions between people and the platform. The mission of the ML Ethics, Transparency and Accountability (META) team, as researchers and practitioners embedded within a social media company, is to identify both the what and the why, and to mitigate any inequity that may result.
This research study highlights the complex interplay between an algorithmic system and the people using the platform. Algorithmic amplification is not problematic by default – all algorithms amplify. It becomes problematic when preferential treatment stems from how the algorithm is constructed rather than from the interactions people have with it. Further root-cause analysis is needed to determine what changes, if any, are required to reduce adverse impacts of our Home timeline algorithm.
How can these findings be validated?
It’s important for us to share the data we used to conduct this study so other researchers can reproduce our work. To aid this, we are making aggregated datasets available, upon request, to third-party researchers who wish to reproduce our main findings and validate our methodology. Details on what is included in this data are given in the paper. For full transparency, researchers would ideally have access to the raw data from which these aggregates were calculated, but that is extremely difficult to provide without compromising privacy.
For the past several months, META has been exploring ways to make large datasets available responsibly to support validation. We’re finalizing a partnership that leverages privacy-preserving technology to enable third-party researchers to reproduce this type of work, while protecting and safeguarding the privacy of people who use Twitter. This approach is new and hasn’t been used at this scale, but we are optimistic that it will address the privacy-versus-accountability tradeoffs that can hinder algorithmic transparency. We’re excited about the opportunities this work may unlock for future collaboration with external researchers looking to reproduce, validate, and extend our internal research. We’ll share more about this partnership soon.
We hope that by sharing this analysis today, we can help spark a productive conversation with the broader research community to examine various hypotheses for why we generally observe comparatively greater amplification of right-leaning political content from elected officials on Twitter.
If you have any questions about Responsible ML, or the work META’s doing, feel free to ask us using #AskTwitterMETA. If you’d like to help, join us.
This research was conducted by Ferenc Huszár (Twitter; University of Cambridge), Sofia Ira Ktena (now at DeepMind Technologies), Conor O’Brien (Twitter), Luca Belli (Twitter), Andrew Schlaikjer (Twitter), and Moritz Hardt (UC Berkeley; work performed while a paid consultant for Twitter).
*Elected officials in this study are defined as follows:
Canada, House of Commons members.
France, French National Assembly members.
Germany, German Bundestag members.
Japan, House of Representatives members.
Spain, Congress of Deputies members.
United Kingdom, House of Commons members.
United States, official and personal accounts of House of Representatives and Senate members.