Filter bubbles

In the ongoing media post-mortem of the 2016 US presidential election, a number of news outlets have questioned the role of social media in determining public opinion. The Guardian exposed people to ideologically opposed Facebook news feeds, and Buzzfeed published an analysis of specific Facebook features they believe have outsized political impact. These and other recent articles are in whole or in part inspired by the idea of “filter bubbles,” or the notion that people are guided into ideological cul-de-sacs by the well-meaning but ultimately harmful algorithms that decide what we see and don’t see on Facebook, Twitter, etc. This idea was popularized by Eli Pariser, author of the book The Filter Bubble: What the Internet is Hiding From You, and co-founder of reliably cringe-inducing clickbait factory Upworthy.

In an echo chamber, exposure to ideologically divergent views is limited by near-complete homogeneity. You can’t absorb an idea if nobody ever brings it to the table. Filter bubbles are different—in a filter bubble, what you see is controlled by an algorithm that tracks your behavior, and thereby your ideological position. The worry is that if I only “like” ideologically conservative content, the Facebook news feed algorithm will give me more content it thinks I will “like,” limiting my exposure to an ideologically heterogeneous world.

Researchers at Facebook and the School of Information at the University of Michigan undertook a large study of Facebook users (n = 10.1 million) to assess whether the Facebook news feed algorithm by its very nature limited exposure to “ideologically discordant” or “cross-cutting” content (Bakshy, Messing, & Adamic, 2015). They identify three progressive stages where exposure to such content can be limited: first, people in your social network can fail to share it; second, the news feed algorithm can hide it; third, you can fail to click on it when it appears in your news feed. As a basis for comparison, the authors measure how much cross-cutting content people would be exposed to if their friend networks were random, rather than selected at least partially on the basis of ideological similarity.
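To make these three stages concrete, here is a rough sketch of how cross-cutting exposure could be tallied at each step. This is my own toy reconstruction from the paper’s description, not the authors’ actual pipeline, and the event-log format and helper functions are invented:

```python
# Toy reconstruction of the three-stage measurement in Bakshy et al. (2015).
# The StoryEvent record and its fields are hypothetical.

from dataclasses import dataclass
from typing import Iterable

@dataclass
class StoryEvent:
    user_ideology: str      # "liberal" or "conservative" (self-reported)
    story_ideology: str     # ideological alignment of the shared story
    shared_by_friend: bool  # stage 1: someone in the user's network shared it
    shown_in_feed: bool     # stage 2: the ranking algorithm displayed it
    clicked: bool           # stage 3: the user selected it

def is_cross_cutting(event: StoryEvent) -> bool:
    """A story is cross-cutting if its slant opposes the user's ideology."""
    return event.user_ideology != event.story_ideology

def cross_cutting_fraction(events: Iterable[StoryEvent]) -> float:
    """Fraction of the given stories that are ideologically discordant."""
    events = list(events)
    if not events:
        return 0.0
    return sum(is_cross_cutting(e) for e in events) / len(events)

def pipeline_report(events: Iterable[StoryEvent]) -> dict:
    """Cross-cutting fraction at each successive stage of the pipeline."""
    shared = [e for e in events if e.shared_by_friend]
    exposed = [e for e in shared if e.shown_in_feed]
    selected = [e for e in exposed if e.clicked]
    return {
        "potential exposure (shared by friends)": cross_cutting_fraction(shared),
        "actual exposure (shown in the feed)": cross_cutting_fraction(exposed),
        "selection (clicked)": cross_cutting_fraction(selected),
    }
```

Comparing the first of these numbers to the cross-cutting fraction you would get from a randomly rewired friend network gives the baseline the authors use.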

Figure 3B from Bakshy et al. (2015)

Drops in the level of cross-cutting content at two of these stages could show a filter bubble effect: the move from potential to actual exposure, and the move from exposure to selection (or clicking). The latter matters for filter bubbles because articles that appear higher in the feed are clicked on more frequently, and an algorithm determines this position. The results do show a filter bubble effect, but a small one: the news feed hides some cross-cutting content from liberal and conservative users completely, and shifts some further down the feed. Without a filter, liberal feeds have 24% cross-cutting content on average, and conservative feeds have 35%. With the filter, these figures drop to 19% and 33%, respectively. Another way of looking at it: Facebook users actually do see most of the cross-cutting content shared by their network, but because of how we pick our friends, there’s not a lot of cross-cutting content in our networks to begin with.
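A quick back-of-the-envelope check, using only the averages quoted above; the relative figures below are derived from those numbers here, not taken from the paper:

```python
# Percentage-point and relative drops implied by the averages quoted above.
unfiltered = {"liberal": 0.24, "conservative": 0.35}  # shared by friends
filtered = {"liberal": 0.19, "conservative": 0.33}    # shown after ranking

for group in unfiltered:
    drop = unfiltered[group] - filtered[group]
    print(f"{group}: {drop * 100:.0f} percentage points "
          f"({drop / unfiltered[group]:.0%} relative reduction)")
# liberal: 5 percentage points (21% relative reduction)
# conservative: 2 percentage points (6% relative reduction)
```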

Figure S6 from Bakshy et al. (2015)

Recent reporting on fake news found it’s largely targeted toward conservatives. Are conservatives also more affected by filter bubbles? Compared to randomly selected social networks, both liberals and conservatives have substantially less potential exposure to cross-cutting content. Perhaps surprisingly, users who identify as conservative are both exposed to and actually click on a larger percentage of cross-cutting content than liberals. And self-identified liberals have much more ideologically homogeneous networks than conservatives. So if anybody is living in a filter bubble, it’s liberals.

There are a number of missing pieces that make these findings hard to interpret. Is this small filter bubble effect nevertheless big enough to shift political outcomes? (If so, all things being equal, we should expect more conservatives to be pulled out of their bubbles than liberals.) How strong a filter bubble do we want in the first place, and what would our ideal filter bubble look like? What kinds of trade-offs are we willing to make to increase ideological heterogeneity in our social networks? (What if for every two friends you added on Facebook you were forced to connect with a random person from outside your social network? Would that be too much? Too little? The more I think about this the more it reminds me of false notions of “balance” in reporting: for every movie I watch about an old man who knits hats for prematurely born children on Upworthy, should I be required to watch an anime racist on YouTube deliver a lecture on the Clinton body count?) What is the effect of exposure to cross-cutting content in the first place? Does exposure cause us to reconsider our beliefs, or do we react by bolstering our preexisting point of view? (Do we gleefully “hate watch” cross-cutting news like a hilariously bad movie?) This research focused on “hard news”—but to what extent can reading “hard news” change our beliefs?

I suspect one of the reasons that filter bubbles, echo chambers, and fake news are getting so much air time right now is because these explanations point to relatively simple moral failings and possible solutions: Facebook fails to accurately represent the heterogeneity of our social networks, Facebook fails to identify fake news, and so on. If this is the case, we are blameless, and Facebook could change their algorithms, either voluntarily or because of legal regulation, and thus dramatically improve our political situation. The alternative is scarier: echo chambers, filter bubbles, and fake news are relatively minor problems that expose preexisting cracks in our civic fundament: the tendency to associate with people who look like we do, or who think like we do; the fragmentation of the attention economy and the difficulty of identifying and interpreting facts; the overall low quality of “real” news; racism; apathy; disenfranchisement; economic structures that reinforce isolation; the potential impact of changing social norms; and so on. It is easier to blame Facebook than to turn our attention and energy toward social forces that emerge from teeming interpersonal flows.

Bibliography

Bakshy, E., Messing, S., & Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science, 348(6239), 1130–1133. http://doi.org/10.1126/science.aaa1160

Fake News

For the past day or two the people I follow on Twitter have been loudly calling for Facebook to crack down on fake news. The impetus is likely a Buzzfeed article detailing how a group of enterprising young Macedonians are making a lot of money from advertising on fake news sites. Their articles are designed to inflame partisan hatred, with clickbait-style titles like “Proof surfaces that Obama was born in Kenya—Trump was right all along…” Conservatives are their primary targets. Of this, one of the fake news authors said simply “People in America prefer to read news about Trump.” In a delightful twist, one of these sites is now running an article titled “‘Fake news’ on social media influenced US election voters, experts say.” 

In A Thousand Years of Nonlinear History, Manuel de Landa makes a case for the importance of energy gradients as an evolutionary and cultural force. Wherever the flows of matter and energy in the universe create a boundary or a gradient, organisms will evolve to exploit it. For example, solar energy gradients are exploited by phytoplankton that evolved to capture light using photosynthesis. The authors of fake news are exploiting information and attention gradients, transforming the cognitive processes of Facebook users into cold hard cash.

One reason this story is compelling is that we have the intuition that if fake news on Facebook is profitable, then we have a real structural problem. We can’t handle the moral dilemma: generating profit (capturing an energy flow) is good, but spreading misinformation is bad and could weaken our political institutions. In his defense of Facebook’s algorithmic approach, Mark Zuckerberg gives anarcho-capitalism a relativist spin: “Identifying the ‘truth’ is complicated.” As repellent as I find this on its face, he has a point. If it were easy to identify the truth, maybe people would not consume fake news in the first place. Zuckerberg also suggests that spreading misinformation is not so bad, all things considered: “on Facebook, more than 99% of what people see is authentic.” He seems to want to have it both ways: identifying fake news is hard, but we can trust our users to do it.

This raises a lot of questions, some of which I alluded to in my previous post examining evidence for echo chambers in online news reading. How much inauthentic information is the right amount? How do we gauge the impact of fake news? If 1% of the information on Facebook is incorrect, how much damage do we expect to our political institutions? What about 2% or 0.5%? How do we estimate the effect of fake news on our elections? (Remember “swiftboating”?) Who are the people reading the fake news? If they are already die-hard partisans, do we expect the news to change their thinking? Do people in fact believe the fake news is true? Why should I have the intuition that “the Pope endorsed Trump” is obviously false and should be ignored, while someone else has the intuition that it’s important and should be shared?

I think Facebook should identify and limit the spread of fake news articles. I agree with Zeynep Tufekci that Mark Zuckerberg is in denial. But I think it is more urgent to address why we are susceptible to fake news in the first place, and what the effects of fake news truly are. Only after answering some of these questions can we build social institutions that resist the spread of misinformation.

Online echo chambers

In the wake of Trump’s election, many of my friends have raised the question of whether social media sites like Facebook and Twitter and search engines such as Google have exacerbated existing political divisions by creating “echo chambers” and “filter bubbles.” Today Mark Zuckerberg, CEO of Facebook, defended the company’s news feed algorithms, making the case that the proliferation of fake news articles was not a problem, and emphasizing the necessity of not introducing bias into the system. This sidesteps a perhaps larger concern: that even a “neutral” site structure can allow users to create strongly biased echo chambers that reinforce partisan animus.

In this post I review Flaxman, Goel, & Rao (2016), an impressive paper by researchers at Oxford, Stanford, and Microsoft Research that examines internet news reading behavior. The paper directly addresses whether or not social media and search engines contribute to an echo chamber effect, where users are overexposed to content reinforcing their political beliefs, and underexposed to content offering diverging perspectives.

Data set and study limitations

The authors collected web browsing time series data during a three-month period in 2013 from 50,383 people who actively read the news. These users were drawn from a larger sample of 1.2 million US users of the Bing Toolbar, a plugin for Internet Explorer. The sample was cut down to size by (1) removing participants who viewed fewer than 10 news articles, and (2) removing participants who viewed fewer than 2 opinion pieces. After (1), they had 173,450 people, and after (2) they were left with the final count of 50,383, or ~4% of the sample. It is striking that out of 1.2 million users, at most ~14% actively read more than 3 news articles per month. Off the bat, this suggests that any echo chamber effects on political thought and action are likely to be much smaller than the effect of overall disengagement with news.
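The winnowing is easy to reproduce from the counts reported above:

```python
# Reproducing the sample percentages quoted above.
toolbar_users = 1_200_000    # US users of the Bing Toolbar
after_news_filter = 173_450  # viewed at least 10 news articles
final_sample = 50_383        # also viewed at least 2 opinion pieces

print(f"active news readers: {after_news_filter / toolbar_users:.1%}")  # ~14.5%
print(f"final analysis sample: {final_sample / toolbar_users:.1%}")     # ~4.2%
```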

The data were further limited to the top 100 websites visited by users, restricted to sites classified as news, politics/news, politics/media, or regional/news in the Open Directory Project. Highlighting the low overall level of engagement with news, the authors note that only 1 in 300 outbound clicks from Facebook “correspond to substantive news,” based on their criteria. Most clicks instead go to “video- and photo-sharing sites,” e.g., YouTube, Instagram, etc. This strikes me as a blind spot, as these “sharing” sites are home to a wide variety of political content. It’s also important to note that the data contained only clicked links and typed website addresses—“likes,” “favorites,” “shares,” “retweets,” and other site-specific actions that reflect political thought were not tracked. This study gives us no way to evaluate echo chamber effects on the thought and action of the vast majority of internet users, who avoid the news and stick to social and sharing sites. However, if regular newsreaders are also more likely to vote (which seems plausible to me), assessing the strength of an echo chamber effect on this group seems prudent.

Measuring political slant

The political slant of news sites was measured as their “conservative share”: the fraction of each site’s readership originating from IP addresses in zip codes that voted Republican vs. Democratic in the 2012 presidential election, weighted by each zip code’s percentage of actual Republican vs. Democratic voters. The rank ordering of news sites by conservative share (in the supplementary materials) is interesting in itself. Many of the most ideologically polarized sites are local news, e.g., the Orange County Register (conservative share = .15) and the Knoxville News Sentinel (conservative share = .96). This could be indirect evidence for the “big sort” hypothesis, that people are relocating to areas with others who share their political views.
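Roughly, the measure works like the sketch below. This is my simplified reconstruction; the paper’s weighting scheme is more involved, and the data structures here are made up:

```python
# Sketch of a site's "conservative share": the vote-share-weighted fraction
# of its visits coming from Republican-leaning zip codes. Simplified relative
# to Flaxman et al. (2016); the inputs are hypothetical dictionaries.

def conservative_share(visits_by_zip, rep_vote_share_by_zip):
    """
    visits_by_zip: {zip_code: number of visits to the site from that zip code}
    rep_vote_share_by_zip: {zip_code: Republican share of the two-party vote
                            in the 2012 presidential election}
    """
    total_visits = sum(visits_by_zip.values())
    weighted_visits = sum(
        n * rep_vote_share_by_zip[zip_code]
        for zip_code, n in visits_by_zip.items()
    )
    return weighted_visits / total_visits

# A site read mostly from heavily Republican zip codes scores near 1;
# a site read mostly from heavily Democratic zip codes scores near 0.
```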

Sites on equal but opposite sides of the 50% conservative share mark do not strike me as having equal but opposite biases in their coverage. For example, the BBC (which reads to me as centrist/liberal—certainly not leftist) has a conservative share of .3, whereas its conservative-share opposite at .7 is Breitbart (which I consider to be on the far right). Sites with conservative shares greater than .7 are mostly local news, with the exception of Topix at .96.

To evaluate the strength of echo chamber effects, the authors ask two key questions. First, how ideologically polarized are news readers? And second, do users read a range of news across the ideological spectrum, or do they stick to a narrow band reflecting a single ideology?

How polarized are news readers?

The authors assess polarization by defining “ideological segregation” as the expected value of the difference in polarity scores between two randomly selected users. The polarity score of an individual is the mean of conservative shares across their visited news sites, estimated using a fancy hierarchical model. Approximately 66% of users had a polarity score between .41 and .54—i.e., they are boring centrists who get their news from, e.g., USA Today (conservative share = .47). Segregation across the user population is .11, which the authors note corresponds to the difference in polarity between, e.g., ABC (conservative share = .48) and Fox News (conservative share = .59).
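In code, the two quantities look roughly like this. It is only a sketch: a plain mean stands in for the paper’s hierarchical model, I am assuming the pairwise comparison uses the absolute difference, and the example population is invented:

```python
# Sketch: polarity of a user and ideological segregation of a population.
from itertools import combinations
from statistics import mean

def polarity(conservative_shares):
    """Mean conservative share across the news sites a user visited."""
    return mean(conservative_shares)

def segregation(user_polarities):
    """Expected absolute difference in polarity between two random users."""
    return mean(abs(a - b) for a, b in combinations(user_polarities, 2))

# Hypothetical population: three near-centrists and one strong conservative,
# described by the conservative shares of the sites each one reads.
users = [[0.47, 0.48, 0.50], [0.42, 0.47], [0.31, 0.39], [0.59, 0.96]]
polarities = [polarity(sites) for sites in users]
print(f"segregation: {segregation(polarities):.2f}")
```

Computing this separately for each channel (aggregator, direct, social, search), and separately for opinion pieces and descriptive news, gives the comparisons discussed next.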

One way to look for an echo chamber effect is to ask if social media and search behavior show greater segregation than other kinds of browsing. The authors assess this by splitting user activity into categories based on the type of site used to find news, and whether the news is identified as an opinion piece (using a fancy corpus-based binary classifier). The types of browsing activity (or “channels”) identified by the authors are: aggregators (e.g., Google News), direct (typing a URL into the browser), social (e.g., Facebook), and search (e.g., Google, Bing). Segregation was higher for opinion pieces regardless of site type. Segregation was also higher for social and search than direct traffic, and quite a bit lower for aggregators, consistent with a small but reliable echo chamber effect.

Figure 3 from Flaxman et al. (2016)

Directly entering a URL into the browser accounts for 67% of opinion traffic and 79% of news traffic. This is followed by search (opinion: 23%, news: 14%), social (10% opinion, 6% news), and aggregators (.4%). From this study alone we can’t know how users develop their direct browsing habits—it’s plausible, for example, that those habits are shaped by earlier social media and search browsing.

How much segregation is healthy? The highest segregation value reported is .2, for users who find opinion articles using search, which corresponds to the distance between The Daily Kos (conservative share = .39) and Fox News (conservative share = .59). The authors note that this difference “represents meaningful differences in coverage,” and is “within the mainstream political spectrum.” In the wake of Trump’s election, this is cold comfort. A controlled experiment would manipulate ideological segregation across a set of polities on equal political and cultural footing and observe the results: presumably the rate of protests, coups, uprisings, and so on, would be affected. A correlational approach would observe browsing habits across a range of polities, but would need to contend with confounding inter-polity differences in political structure and culture.

How big is the ideological range of the average user?

Do users get their news from a variety of sources with different political slants, or do they stick to reading a limited number of sources in a small ideological range? The authors calculate the “isolation” of each user as the standard deviation of the conservative shares of the sites they read. They find that users are highly isolated, rarely reading news with conservative shares further than ±.06 from their average. For example, a centrist who typically reads NBC (conservative share = .5) will only rarely read news sources with conservative shares lower than .44 or higher than .56—excluding sites such as CNN (conservative share = .42), the New York Times (conservative share = .31), and Fox News (conservative share = .59). Recapitulating the theme of general disengagement, this isolation arises because 78% of users get the majority of their news from a single site, and 94% get the majority of their news from two sites at most.
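The measure itself is simple. A minimal sketch follows; I am assuming a population standard deviation, which may differ from the authors’ estimator, and the example reader is hypothetical:

```python
# Sketch: a user's "isolation" as the spread of conservative shares across
# the sites they read. A small value means a narrow ideological band, i.e.,
# a more isolated reader by this measure.

from statistics import pstdev

def isolation(conservative_shares):
    """Standard deviation of the conservative shares of a user's news sites."""
    return pstdev(conservative_shares)

# A hypothetical centrist who mostly reads NBC and USA Today but
# occasionally strays to CNN and Fox News:
reader = [0.50, 0.50, 0.47, 0.42, 0.59]
print(f"isolation: {isolation(reader):.3f}")  # ~0.055, a narrow band
```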

Isolation varies with polarity, where users who stick to sites with a conservative share around .5 are the most isolated (~.06), those who read sites with conservative share less than .3 are slightly less isolated (~.08), and those who read sites with conservative share greater than .7 are the least isolated (~.19). However, as observed before, the linear conservative share spectrum does not correspond to a linear ideological spectrum—sites on opposite sides of the 50% share line are not equally biased in opposite directions. What we really want to know is how frequently users read content from news sources that are meaningfully ideologically different from their regular news.

Figure 5 from Flaxman et al. (2016)

In general, people stay in their ideological comfort zone, and more so for partisans on both sides than for centrists. However, as mentioned above, centrists have the narrowest ideological bandwidth, so although they are willing to cross the aisle, they are not doing so by reading far left or far right news sources. At the extremes, highly polarized users get between 1% and 4% of their news from ideologically opposed sources. Users arrive at opposing articles more frequently via aggregators, social media, and search than via direct browsing, which by itself results in almost no exposure to opposing views. The effect of social media and search on direct browsing choices remains unclear. These results are hard to interpret and, as with the results concerning ideological segregation, they raise a normative question: how many ideologically opposing articles should people read?

Discussion

Social media and search activity show small but meaningful echo chamber effects for regular news readers. At the same time, social media and search expose people to a larger share of ideologically opposing news articles than direct browsing. These effects are absolutely dwarfed by two larger effects: (1) the vast majority of internet users do not read the news, and (2) the vast majority of those who do read the news use one or two websites at most, which they browse to directly, without social media or search as an intermediary.

This research should not be read as disconfirming the importance of echo chamber effects. Behavior on social media and sharing sites, including “likes,” “shares,” “favorites,” and so on, is not in the scope of the analysis, which excludes the vast majority of politically relevant internet behavior. It is possible that there are profound echo chamber effects that cannot be detected based on data from the URL bar alone.

Further, though it emphasizes the dominance of direct browsing, this study offers no information about how direct browsing habits are influenced by previous experiences using social media and search. We know that individual users have extremely narrow ideological bandwidth—but would this be the case if the internet or social media were structured in a different way? What if, instead of letting users choose their own friends on Facebook, everyone you conversed with in your day-to-day life were automatically added to your friends list? Would this increase or decrease ideological bandwidth? What if web browsers loaded a bipartisan news aggregator in new windows by default? Other factors also undoubtedly affect ideological bandwidth, and may contribute to echo chamber effects, including television watching, education, religion, the “big sort,” and so on.

Investigation of echo chamber effects raises important normative questions. How much ideologically opposing news material should people read? Do social media sites and search engines have a moral responsibility to create ideological balance? Or do they have a higher responsibility to the truth, independent of political ideology? Making the case that social and search companies should change their algorithms requires answers to some of these normative questions, as well as additional research tying a broader range of user actions (e.g., “likes” and “shares”) to important political outcomes (e.g., voting likelihood, discriminatory behavior, etc).

Bibliography

Flaxman, S. R., Goel, S., & Rao, J. M. (2016). Filter Bubbles, Echo Chambers, and Online News Consumption. Public Opinion Quarterly, 80, 298–320. http://doi.org/10.1093/poq/nfw006

Joining the communication ecosystem

Over the past four years I’ve been working on a PhD in Psychological and Brain Sciences at Dartmouth College. My research draws equally from the traditions of social and cognitive psychology and neuroscience. During my training, the “replication crisis” in the sciences has become the subject of many discussions, both private and popular. I have had the confusing yet wholly enriching experience of learning a set of methods and practices for isolating facts and interpreting meanings, while also learning that these methods and practices are deeply flawed and must be subject to stringent, ceaseless critique if they are to survive.

A key feature of these discussions has been their openness. They have been conducted largely on the internet—Twitter, blogs, Facebook groups—bypassing traditional print publishing gatekeepers. One obvious upside to this approach is the accessibility of the conversation. Anyone, from interested laypeople to tenured professors, can tap into the discourse. But for me, the more profound effect of openness has been seeing how the thinking of the community has changed over time. Rather than presenting as a single, tight statement that individuals must take or leave, the discourse of the replication crisis is a slow-shifting landscape of argumentation, reaction, feeling, and communication. Following the crisis means seeing up close how thoughts evolve under stress.

I’ve learned from, among many others, Michael Inzlicht, who publicly shared his doubts about ego depletion research earlier this year; Sanjay Srivastava, whose Everything is fucked: The syllabus is a superb guide to the methodological and philosophical issues lying beneath the replication crisis; and Simine Vazire, whose casual tone at sometimes i’m wrong belies sharp, persuasive reasoning. I’ve also learned from people who have radically different views of scientific practice than my own, such as Jason Mitchell, whose On the evidentiary emptiness of failed replications was a strong driver of productive discussion. And I have been influenced by scientists who write openly about their research and the research of others, such as Niko Kriegeskorte, who publishes all his peer reviews on his blog, and Micah Allen, who regularly shares on Twitter the day-to-day details of the messy process of research.

To date, my own work has been conducted largely under the old—and I think busted—paradigm of operating in complete secrecy until publication, followed by some limited outreach to the public via the popular press. This despite the fact that I have learned more from scientists who discuss their research in the open than from scientific papers read in isolation. Well, no more. The purpose of this blog is to bring my thinking process out into the open, with the hope that it will be as useful to others as the many eloquent thinkers of open science have been useful to me.

The decision to open up my thinking arrived alongside the election of Donald Trump as the 45th President of the United States of America. Many of my close friends and I have wondered how America has become so profoundly divided—so much so that the speech of those with different political views seems not simply wrong, but uninterpretable. Social psychology has long attempted to understand these divisions, from the seminal Robbers Cave experiment by Muzafer Sherif, to the study of minimal group formation by Henri Tajfel, to recent work on the idea of implicit bias by Mahzarin Banaji, Brian Nosek, and many others. However, as Emmett Rensin notes in The smug style in American liberalism, social psychology also has a long history of fucking things up pretty bad, offering pat, simplistic non-solutions to deep problems, and making it very hard to distinguish truth from researchers’ wishful thinking. My own research is shifting toward some of these politically charged topics, and I hope that sharing my experience of making sense of it all will be a meaningful contribution to our communication ecosystem.