China's apparatus for control of speech on the Internet is perhaps the largest and most robust in the world. However secretive and mysterious it may be, its fingerprints aren't hard to read, if you're willing to sift through lots and lots of data. And the results offer some interesting clarity about the methodology of Beijing's bustling online whac-a-mole.
Researchers at Harvard have mined millions of Chinese blog posts to divine the intentions of the government's massive censorship program. The team, led by Professor Gary King, processed and coded thousands of posts, then returned later to see whether they had been censored. Their findings suggest that, contrary to popular opinion, censorship in China is designed not to limit criticism of the government, but to limit speech that has the potential to incite collective action. Also, porn and – how meta – anything that disses the censors themselves.
"The results are clear," the paper states, "posts are censored if they are in a topic area with collective action potential and not otherwise. Whether or not the posts are in favor of the government, its leaders, and its policies has no effect on the probability of censorship."
Two outliers to this theory are pornography and criticism of the censors themselves — both of which are removed to the full extent of the censors' abilities. The faceless censors – abetted by an army of citizen police known as fifty-centers – are ruthless about their image. "They offer freedom to the Chinese people to criticize every political leader except for the censors, every policy except the one they implement, and every program except the one they run," says the study. "Even within the strained logic the Chinese state uses to justify censorship, Figure 7 (Panel b) — which reveals consistently high levels of censored posts that involve criticisms of the censors — is remarkable."
When news about former Chongqing mayor Bo Xilai's expulsion from the Party was announced yesterday – along with the finding that he had "seriously violated discipline," including abuses related to a murder case involving his wife, taking vast quantities of bribes and "having or maintaining inappropriate sexual relations with multiple women" – social media sites were abuzz with discussion, gossip and disdain for a system thought to be rife with corruption. But as Josh Chin at the Journal's China blog reported, few posts were censored. One exception was from an anonymous Weibo user named DarrenLIU:
"Inappropriate sexual relations with multiple women. Damn. That's not the sexual problem most Chinese officials have."
As with any project trying to explicate the contents of something as substantial and diverse as the Chinese internet, the study is subject to biases and limitations. Most glaringly, the paper skips any discussion of Weibo, China's version of Twitter. Weibo acts as a central forum for discussion, and its omission is a large one. Perhaps most importantly, as the paper notes:
[The methodology] misses self-censorship and censorship that may occur before we are able to obtain the post in the first place; it also does not quantify the direct effects of The Great Firewall, keyword blocking, or search filtering in finding what others say. We have also not studied the effect of physical violence, such as the arrest of bloggers, or threats of the same.
Given that one of the strongest aspects of Chinese censorship is the conspicuous lack of anything resembling a list of rules to follow so that you don't get put under house arrest, self-censorship seems like a big thing to miss.
Earlier this year, another paper, published in the online journal First Monday, took the opposite tack and focused on posts to Twitter and Weibo. Under the direction of Carnegie Mellon's Noah Smith, researchers collected 57 million messages, then sampled them both randomly and by keyword. Their findings jibe with the Harvard study, suggesting that, outside of a predictable base rate of censorship, post deletion can be tied, quantitatively, to keywords.
The paper suggests a rather clever hypothesis: measuring the relative frequency of terms between domestically hosted microblogs like Weibo and internationally hosted services like Twitter. Methodologically, they write, the discrepancies "may be a productive source of information for automatically identifying which terms are politically sensitive in contemporary discourse online."
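The basic idea can be sketched in a few lines of code. This is not the paper's actual pipeline — the function, thresholds, and whitespace tokenization here are illustrative assumptions — but it captures the intuition: a term that is common on the internationally hosted service yet scarce on the domestic one is a candidate for being politically sensitive.

```python
from collections import Counter

def sensitive_candidates(domestic_posts, international_posts,
                         min_count=5, ratio=3.0):
    """Flag terms that are far less frequent on the domestically hosted
    service than on the international one -- a rough proxy for the
    frequency-discrepancy signal described in the First Monday paper.
    (Hypothetical helper; thresholds are arbitrary for illustration.)"""
    # Naive whitespace tokenization; real Chinese text would need a
    # proper word segmenter.
    dom = Counter(tok for post in domestic_posts for tok in post.split())
    intl = Counter(tok for post in international_posts for tok in post.split())
    dom_total = sum(dom.values()) or 1
    intl_total = sum(intl.values()) or 1

    flagged = []
    for term, n_intl in intl.items():
        if n_intl < min_count:
            continue  # too rare abroad to trust the comparison
        intl_rate = n_intl / intl_total
        dom_rate = dom.get(term, 0) / dom_total
        # Common internationally but absent or rare domestically.
        if dom_rate == 0 or intl_rate / dom_rate >= ratio:
            flagged.append(term)
    return flagged
```

In practice the comparison would need far more care — matching time windows, segmenting Chinese text properly, and correcting for the very different user bases of the two platforms — but the discrepancy signal itself is this simple.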
One of the most interesting things they found was a deletion rate that correlated with geographic origin:
Messages that self-identify as originating from the outlying provinces of Tibet, Qinghai, and Ningxia are deleted at phenomenal rates: up to 53% of all messages originating from Tibet are deleted, compared with 12% from Beijing and 11.4% for Shanghai.
The Harvard study also touches on geography, but with different conclusions. In it, the researchers found highly localized censorship of issues, without corresponding national-level deletions. This suggests, they write, that local governments in China can act with autonomy (something that was already established) and that localized censorship can be used to examine "the differences between the priorities of various sub-national units of government."
The Harvard authors suggest that their data "clearly expose[s] government intent," and while that may be an overstatement — one could say that the intentions of the Chinese government remain largely inscrutable — both papers go a long way toward making the tea-leaf-reading of political science a little more data-oriented.