We’ve all said words we’ve wanted to take back in the heat of an argument. There’s no pause button for moments like this in real life. But when it comes to debates that take a turn for the worse online — the kind of spat that results in name-calling and generally unwanted replies on your timeline — there may be a digital aid, one that’s rooted in a much-studied mechanism of human behavior.
Social scientists call them nudges. These are subtle cues that influence our actions by either removing barriers to make it easier to do something (like watch the next episode of your favorite streaming show without having to click a button) or by creating friction to make it harder to do something (like drive too fast in a school zone, hence speed bumps). They’re so embedded in our lives, we hardly recognize them for the role they play in our everyday decision-making.
Twitter has been experimenting with nudges to help create a healthier online environment, one that reflects what we expect when we interact with one another in real life. The team has not only seen positive results in how these nudges help facilitate healthier conversations in English; it is now expanding them to languages like Portuguese, Spanish, and Turkish.
Paul Lee, a senior product manager in Twitter’s Content Health group, has been working with a team of researchers, designers, copywriters, data scientists, and engineers to find ways to break negative feedback loops — to not only slow down abusive comments, but to help normalize healthier behaviors that remind folks that there are real people on the receiving end. One of the ways they’re doing this is by building mechanisms that ask people to pause before they post something potentially harmful.
A new feature that prompts people to reconsider Tweet replies containing harmful language is seeing promising results: when prompted, English-language users in the U.S. changed or deleted their replies over 30% of the time, and Portuguese-language users in Brazil did so around 47% of the time.
There’s a field of behavioral economics that argues that using these techniques can help us design a safer, more productive world. The UK even has an arm of its government nicknamed “The Nudge Unit,” which aims to apply behavioral science to public policy. Can nudging citizens to smoke less, vote more, and conserve energy, for example, make communities healthier, more engaged, and cleaner? Could we apply these same techniques to make our online communities better? Twitter has been actively exploring this.
Can Twitter help persuade people to be better?
Twitter has strict rules against harassment and abusive behavior. But there is harmful language—just plain ugly speech—that doesn't reach the threshold of a rules violation.
“We've seen and heard that this type of behavior is one of the main reasons people leave Twitter,” said Lee, adding that this type of content in large volumes can be especially threatening to historically excluded communities and groups that have been shown to experience disproportionate levels of abuse. “They either don't use the platform or avoid using it in certain ways because they fear targeted instances of marginally-to-more-severely abusive behavior that result in direct harm.”
That’s why the team has been using nudges — prompts that appear if you try responding to someone using harmful or offensive language — as one way to explore encouraging better behavior.
Here’s how it works: when a person composes a potentially offensive reply to a Tweet and then clicks the “Reply” button to publish the text, a pop-up appears asking whether they would like to review the post before Tweeting it. The message includes the options to “Edit,” “Delete,” or “Tweet” as originally written. If the person clicks “Tweet,” the reply is posted as they originally typed it. Those who click “Edit” can revise the text before sending. The “Delete” option cancels the reply entirely.
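The decision flow above can be sketched in a few lines of Python. Everything here is illustrative: the `looks_offensive` check stands in for whatever classifier Twitter actually uses, and the function and option names are hypothetical, not Twitter’s code.

```python
# Illustrative sketch of the reply-prompt flow described above.
# The offensiveness check and all names are hypothetical placeholders.
from typing import Callable, Optional, Tuple

def looks_offensive(text: str) -> bool:
    """Stand-in for a trained language classifier."""
    blocklist = {"insult", "slur"}  # toy placeholder vocabulary
    return any(word in text.lower() for word in blocklist)

def submit_reply(
    text: str,
    ask_user: Optional[Callable[[str], Tuple[str, Optional[str]]]] = None,
) -> Optional[str]:
    """Return the final reply text, or None if the user deletes it.

    `ask_user` presents the prompt and returns one of
    "tweet", "edit", or "delete", plus the edited text when editing.
    """
    if not looks_offensive(text):
        return text  # no prompt; the reply posts immediately
    choice, new_text = ask_user(text)
    if choice == "tweet":
        return text      # post as originally written
    if choice == "edit":
        return new_text  # post the revised reply
    return None          # "delete": cancel the reply entirely

# Example: a user who chooses to edit when prompted
final = submit_reply("what an insult", lambda t: ("edit", "I disagree"))
```

The key design point is that the prompt only adds friction on the flagged path; inoffensive replies post with no extra step.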
Twitter designed the intervention to bring more awareness to the moments when people get caught in what the team calls a “hot state” — when they’re about to use words they may later regret.
“One of the primary motivations behind this was to reach users who might be having a temporary loss of composure and remind them, hey, you have a moment here to pause and reflect on what you are attempting to say, and be more thoughtful and constructive, even if you're disagreeing with someone,” said Lee.
Cody Elam, a researcher on the team who spoke with customers to understand how they interact on Twitter, says they have a phrase for these instances: “We call these regrettable moments.”
“Our data shows a lot of people — and there's a lot of research out there — who are caught up in the moment who exhibit regrettable behavior,” he said.
In one of the surveys that the team conducted during the development process for the nudges, a majority of people admitted that they had Tweeted something they regretted or deleted something later because they felt sorry about it.
Early signs of usefulness
Twitter has been experimenting with nudges since 2020, including an earlier version of its offensive Reply prompt. But the nudge that’s probably most recognizable is the “Want to read the article first?” prompt that appears when you attempt to Retweet an article without first clicking on it. These prompts are Twitter’s way of encouraging a more informed discussion: People opening articles before Retweeting increased by 33% after they rolled out the nudge.
Led by Alberto Parrella, a Product Manager at Twitter, the company began testing the latest version of the offensive Reply prompt in February 2021. For six weeks, the team studied how well these interventions worked, compared to a control group that received no prompts. The team behind the experiment — Matthew Katsaros, who is a part-time research advisor at Twitter and the director of the Social Media Governance Initiative at Yale’s Justice Collaboratory, and Twitter data scientists Kathy Yang and Lauren Fratamico — compiled their findings in an academic paper that will be presented in June.
The findings suggest that nudges can encourage less offensive speech online without hindering participation in online conversations. People who were prompted to reconsider their replies canceled them 9% of the time and revised them 22% of the time (37% of those revisions were less offensive than the original). Overall, those who were prompted posted 6% fewer offensive Tweets.
The team also observed a decrease in both the number of future offensive Tweets written by users who got the nudge, and the number of offensive replies they themselves received.
Yang, the lead data scientist on the experiment, added that these results go beyond showing how prompts can be an effective tool for decreasing harmful and abusive content on the platform. They also represent a shift in how social media networks can think about content moderation.
Helping, not hindering, expression
As Twitter continues to explore ways to reduce harmful content and encourage users to pause and engage in more constructive and thoughtful ways, the company also needs to understand people’s differing needs. Universal treatments applied to all accounts equally, such as nudges, are just one way to reduce this content overall.
“What we're moving towards in the future is less reliance on these universal treatments and more focus on personalized controls and settings that allow users who don't want to see potentially offensive content to successfully avoid it, and successfully avoid the types of accounts [that] share that content,” said Lee.
“First you want to have the platform reminding and educating people about pro-social norms [through things like nudges],” he said. “But the more powerful change is giving people enough control to be able to enforce their own preferences and norms, which will hopefully lead to a healthier, more relevant, and higher quality user experience overall.”