Sunday, August 4, 2013

Hackers, bloggers and professors team up to tap into blocked microblog content

The state-sponsored newspaper Global Times published an article about Weibo censorship on July 28, 2013, both in print and online. The article was removed from the website two days later. It is reproduced below.
Update on Aug 12: According to The Diplomat, "A source close to the matter inside the Global Times tells The Diplomat, 'After Kaifu Lee tweeted it on Weibo, it got too much attention and got on the authorities' radar.' The same source also confirms that the propaganda department did play a role in taking it down."

Hackers, bloggers and professors team up to tap into blocked microblog content
Global Times | 2013-7-28 19:13:01
By Xuyang Jingjing
With over 500 million registered users and over 46 million daily active users, Sina Weibo is the largest and most influential social media platform in China. It has also become known as a platform that fosters discussions with a more liberal slant.
But what is not allowed to be discussed on Weibo perhaps says just as much as what can be. There are a number of projects that aim to uncover content blocked on Weibo. Most of the people behind such efforts are China watchers based overseas or foreigners living in China. While they may have different approaches and backgrounds, their efforts have succeeded in bringing vanished content back to light.
One such project, Freeweibo.com, won the 2013 Bobs, or Best of the Blogs awards, for best innovation in June. The Bobs awards, started by Deutsche Welle in 2004, are given out in 34 categories in 14 languages, and aim to honor the open exchange of ideas and freedom of expression.
Hu Yong, a professor at Peking University and a new media observer, served as a juror at the awards. He commented that Freeweibo preserves digital memories and makes disappeared content visible again, according to the official website of the Bobs.
Alternative Weibo universe
Launched on October 10, 2012, Freeweibo retrieves data automatically from Weibo to provide "uncensored and anonymous Sina Weibo searches."
"We ignore relevant laws, legislation and policy," the welcome message on the website reads, a twist on the phrase Weibo and Chinese search engines use to explain why searches for certain words come back empty.
The website, in both English and Chinese, displays posts that are blocked or deleted on Sina Weibo. When searching for keywords, Freeweibo breaks search results down into "blocked by Sina Weibo" and "official search results," which allows users to see which search results are missing from the official Weibo.
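The split between the two groups of results amounts to a set comparison. Here is a minimal Python sketch of that idea, not Freeweibo's actual code: it assumes a hypothetical local archive of posts collected earlier (post ID mapped to text) and the list of post IDs that the official Weibo search returned for the same keyword. The function name `split_results` is our own.

```python
def split_results(keyword, archive, official_ids):
    """Partition archived posts that mention `keyword` into two groups.

    archive: dict mapping post ID -> post text (hypothetical local copy
             of posts collected before any deletions)
    official_ids: post IDs returned by the official Weibo search
    """
    official = set(official_ids)
    matches = {pid for pid, text in archive.items() if keyword in text}
    return {
        # matching posts that Sina's own search also shows
        "official search results": sorted(matches & official),
        # matching posts the archive holds but Sina's search leaves out
        "blocked by Sina Weibo": sorted(matches - official),
    }


# Made-up archive and search response, for illustration only:
archive = {101: "六四 纪念", 102: "今天天气不错", 103: "六四"}
print(split_results("六四", archive, [101]))
# {'official search results': [101], 'blocked by Sina Weibo': [103]}
```

Only posts the archive already holds can be flagged, which is why such a system has to collect posts before they are deleted.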
Freeweibo has around 10,000 unique visitors per day, with most coming from China, including Taiwan, based on the language setting, according to Percy Alpha, the pseudonym used by one of the founders.
A week after the website went live, it was blocked on the mainland. But the creators of the website have also been trying to provide mirror sites that are accessible without a VPN.
From the list of blocked keywords provided on the website, it is also clear when some words become sensitive and when such scrutiny is lifted.
For instance, the name of Bo Xilai, former Party chief of Chongqing who was recently prosecuted on corruption charges, was banned from searches until July 25, the day the news of his prosecution was announced.
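Seeing when a word becomes sensitive and when the scrutiny is lifted boils down to turning repeated tests into time intervals. The sketch below assumes a hypothetical log of daily (date, blocked) results for a single keyword; it illustrates the idea and is not how Freeweibo stores its data.

```python
from datetime import date


def blocked_periods(observations):
    """Turn daily (date, is_blocked) test results for one keyword into
    blocked periods.

    Returns a list of (first_blocked_day, last_blocked_day) tuples, where
    last_blocked_day is None if the keyword was still blocked at the last
    test.
    """
    periods = []
    start = None      # first blocked day of the current run, if any
    previous = None   # date of the previous test
    for day, blocked in sorted(observations):
        if blocked and start is None:
            start = day                          # a blocked run begins
        elif not blocked and start is not None:
            periods.append((start, previous))    # run ended at the previous test
            start = None
        previous = day
    if start is not None:
        periods.append((start, None))            # still blocked at the end
    return periods


# Hypothetical test log: blocked, blocked, then unblocked.
log = [(date(2013, 7, 23), True), (date(2013, 7, 24), True),
       (date(2013, 7, 25), False)]
print(blocked_periods(log))
```

A lift like the one for Bo Xilai's name would show up as a period that closes on the last day the keyword still tested as blocked.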
Meet the founders
The same team also founded Greatfire.org back in 2011, a website that enables real-time testing of what is being blocked by the Great Firewall of China (GFW). URLs being tested are added by users or imported from other similar projects. At the moment, the website regularly monitors over 10,000 websites to see whether they are blocked and then analyzes the precise blocking methods used, such as connection resets and DNS poisoning, explained Percy Alpha.
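To tell these methods apart, a test has to compare what the local resolver returns with what an uncensored resolver returns, and note whether the connection was reset. The following is a simplified sketch under those assumptions; the function name and its three inputs are hypothetical, and it is not GreatFire's implementation. One caveat: sites behind content delivery networks legitimately resolve to different addresses in different places, so a mismatch is only a hint of poisoning.

```python
def classify_probe(resolved_ips, expected_ips, connection_reset):
    """Label one test of a host made from inside the network under test.

    resolved_ips: addresses the local (possibly poisoned) resolver returned
    expected_ips: addresses an uncensored resolver outside China returned
    connection_reset: True if the TCP connection was reset during the fetch
    """
    resolved, expected = set(resolved_ips), set(expected_ips)
    if not resolved:
        return "dns_failure"          # no answer at all
    if not (resolved & expected):
        return "dns_poisoning"        # only addresses we did not expect
    if connection_reset:
        return "connection_reset"     # right address, but the connection died
    return "accessible"


# 203.0.113.7 is a documentation address standing in for a bogus answer.
print(classify_probe({"203.0.113.7"}, {"207.97.227.239"}, False))  # dns_poisoning
```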
The website also provides an up-to-date database of URLs and keywords that are blocked.
Greatfire is also blocked in the mainland. Test data collected by the website clearly showed a six-month gap between the blocking of its Chinese version and the blocking of its English version.
The three founders of the two websites have remained anonymous, but one of them is an American in China who goes by the alias Martin Johnson.
Percy Alpha would only say in an e-mail interview that he lived in China for a long time and is now based in the US. He said they are collaborating with other organizations and developers, though he wouldn't disclose the nature of the organizations they are working with or give further details about their collaboration.
According to their own introduction on Greatfire, they are self-financed but are exploring ways to "make the website a financially sustaining entity."
Percy Alpha said that what pushed him over the edge and made him start the project was the Google China dispute in 2010. Google refused to comply with China's regulations to filter search terms and later moved its Google China servers to Hong Kong.
Not long after that, searches for individual characters, mostly those contained in Chinese leaders' names, were also blocked, even though those characters are frequently used in other phrases and expressions.
"Chinese people in general know very little about censorship," Percy Alpha told the Global Times. He said that when he talked to Chinese people about the Google withdrawal from the mainland and searches being blocked, he found that most didn't seem to care and repeated the official line that censorship is just and necessary.
China's regulation on Internet information lists nine types of banned content, most of which concerns national security, state unity, rumors, pornography and violence. But in practice it isn't always clear where the line is and in the event of a breaking incident, certain words or phrases that are otherwise normal might become sensitive for a period of time.
Data provided by Greatfire has been used by other researchers to get to grips with Internet restrictions. In May, for instance, two professors from Northwestern University in the US used its data to study how the GFW affects users' online behavior.
Percy Alpha says the team is also developing easy-to-use tools that allow people to access the free Internet and to make information available in China.
Zhang Zhi'an, an associate professor in new media at Sun Yat-sen University, said plenty of Chinese scholars also observe and study Weibo regulation. He acknowledged it might be easier for researchers overseas as they are not restricted by the GFW and take fewer risks when doing so.
"I don't know about their motives, but by presenting this blocked information, they allow more people to know about Internet regulation in China and provide data for other scholars who might be interested in studying China's Internet monitoring," he said.
Academic support
Their team isn't the first or the only one watching the censors and collecting data about blocked content. Many individual or academic efforts are also being made to take a closer look at how China's Internet and social media operate. Oftentimes, such projects inspire each other and even use each other's data.
For example, Freeweibo was inspired by and uses data from WeiboScope, a data collection and visualization system developed by the Journalism and Media Studies Center of the University of Hong Kong in 2011.
WeiboScope uses API tools provided by Weibo to retrieve posts from 350,000 users at set time intervals to show how posts are diffused and censored. People can also search for the most reposted microblogs with images within the past 24 hours or search for specific keywords in several languages. This allows people to get a real-time idea of trending topics on Weibo, without online monitoring.
With this tool, researchers at the school are able to assess online monitoring on Weibo and the impact of policies such as the real-name registration policy enacted last year, which requires microbloggers to register with their real identities.
The web page for WeiboScope is also not accessible in the mainland.
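Polling the same accounts at set intervals also lets a system like WeiboScope narrow down when a post vanished: somewhere between the last snapshot that still contained it and the first one that did not. This Python sketch assumes hypothetical snapshots of visible post IDs and says nothing about why a post disappeared (users can delete their own posts too); it is not WeiboScope's code.

```python
def deletion_windows(snapshots):
    """Estimate when posts first disappeared between periodic snapshots.

    snapshots: list of (timestamp, set of post IDs visible at that time),
               e.g. from polling the same accounts at set intervals
    Returns {post_id: (last_seen, first_missing)}: the post vanished
    somewhere in that window. Posts still visible at the end are omitted.
    """
    last_seen = {}
    windows = {}
    for ts, visible in sorted(snapshots, key=lambda s: s[0]):
        for pid in visible:
            last_seen[pid] = ts
        for pid, seen in last_seen.items():
            if pid not in visible and pid not in windows:
                windows[pid] = (seen, ts)   # first snapshot without the post
    return windows


# Made-up snapshots taken every 600 seconds:
snaps = [(0, {1, 2}), (600, {1, 2}), (1200, {1})]
print(deletion_windows(snaps))  # {2: (600, 1200)}
```

The shorter the polling interval, the narrower the window, at the cost of more API calls.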
Another project centered in academia is China Digital Times, a bilingual news website that brings "uncensored news and online voices from China to the world." It is supported by the Counter-Power Lab at the School of Information, University of California, Berkeley. Both the Chinese and English websites are blocked.
In 2011, it started a research project that aims to construct a database of sensitive Weibo search keywords. It's an open-source project that Web users can contribute to.
Xiao Qiang, the founder and editor-in-chief of China Digital Times, is an adjunct professor at UC Berkeley. He was a theoretical physicist by training and later became a human rights activist.
Other efforts
One of the few projects that remains accessible in the mainland is a Tumblr page called Blocked on Weibo, which documents words blocked on Sina Weibo and also offers contexts and explanations for the bans. The creator of the blog is Jason Q. Ng, a 2013 Google Policy Fellow at the University of Toronto's Citizen Lab.
Ng uses a different approach. He developed an automated process to check individual words to see whether they are blocked or not. He tested 700,000 Chinese Wikipedia titles in early 2012. The script performed searches on Weibo for three months and recorded whether they were censored. He collected over 150 terms and explained why they were sensitive in a book also entitled Blocked on Weibo which will be published next month.
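An automated check like Ng's can be as simple as searching each term and looking for the censorship notice in the response. The sketch below is our own illustration, not Ng's script: `search` stands in for a hypothetical function that fetches Weibo's result page for a term, and `notice` is whatever fixed text marks a blocked search. Running roughly 700,000 searches over about three months works out to about one query every 11 seconds if the script ran continuously, hence the optional pause between queries.

```python
import time
from typing import Callable, Iterable, List


def find_blocked_terms(terms: Iterable[str],
                       search: Callable[[str], str],
                       notice: str,
                       pause_s: float = 0.0) -> List[str]:
    """Return the terms whose search result page contains `notice`.

    search:  hypothetical function that fetches the result page for a term
    notice:  fixed text assumed to appear only on blocked searches
    pause_s: seconds to wait between queries so the site isn't hammered
    """
    blocked = []
    for term in terms:
        page = search(term)
        if notice in page:
            blocked.append(term)
        if pause_s:
            time.sleep(pause_s)
    return blocked


# Usage with a stand-in for Weibo's search page:
fake_pages = {"六四": "...search results cannot be displayed...", "天气": "20 results"}
print(find_blocked_terms(fake_pages, fake_pages.get, "cannot be displayed"))
```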
Ng, a US citizen and a graduate student in East Asian Studies at the University of Pittsburgh, said he didn't have a background in computer science prior to this project.
He said he doesn't have an agenda with Blocked on Weibo, and that it's a "fun little challenge" for him as "coding is akin to solving a puzzle, solving little pieces at a time."
In his past career as a book editor, Ng worked on a book about China Central Television and developed an interest in how media works in China, he explained.
Ng wrote on his blog that he hopes his site "proves the resourcefulness and resiliency of Chinese netizens as well as the sense of responsibility that Chinese leaders (in the government and in private organizations) have for shepherding the country forward. You could even claim that the CCP [CPC] cares too much for its citizens."
Ng explained he meant no sarcasm by this. "Even though I don't agree with such a sentiment, I think it is part of an argument that needs to be legitimately considered in order for those outside China to begin understanding why such restrictions are in place in China," he said.


Sunday, March 31, 2013

Sina testing subtle censorship ahead of Tiananmen anniversary

What happened?

On May 31, 2013 at 7am, I observed that searching for keywords that are normally blocked, for example “六四事件” (June 4th incident), surprisingly returned some results and no censorship notice. This temporary lifting of censorship ended at 9am but started and stopped a few more times into the early afternoon, as if somebody at Sina headquarters were literally flipping a switch on and off.
Update on June 2: Sina is still constantly switching between those two methods.
Update on June 8: From June 3-4 onwards, Sina Weibo seems to have switched back to explicitly and completely blocking those keywords.

Change in Tack

To understand what is happening you need to be familiar with Sina’s various censorship methods. I observed and reported last year that Sina had implemented new tactics to censor particular keyword searches. Just days before the June 4th anniversary, Sina is again tweaking its censorship mechanisms. During the morning hours of May 31, Sina completely abandoned its old style, explicit approach to censorship, which displayed a message but no search results:
“According to relevant laws, regulations and policies, search results for [the blocked keyword] can not be displayed.”
No, Sina has not suddenly decided to fully support freedom of speech. On the contrary, it would appear that Sina is using more advanced and subtler methods to censor search results. All keywords mentioned below are normally explicitly and completely blocked. But each behaved a little bit differently on the morning of May 31.

Strict filtering

The results I received when I searched for “六四事件” (June 4th incident) showed that the first page displayed not all results but a carefully selected few. While the first results page indicates that there are more than 50 pages of results, no results are shown when you click through to the second page or any page beyond it. This censorship tactic was also employed last year for “香港” (Hong Kong) (Solidot story in Chinese).
Keyword example: “六四事件”(June 4th incident)

Delayed censorship

I have also previously reported that Sina has delayed search results for sensitive keywords.
When testing the delayed censorship tactic, I conducted two simultaneous searches of similar keywords, one sensitive and one non-sensitive. In the case of searching for results for “六四” (June 4), I used “五七” (May 7) as a control group, to be sure that search results were indeed intentionally delayed. As suspected, results for “五七” (May 7) displayed posts that were ten minutes old while the most recent June 4 posts were several hours old.
This indicates a marked improvement on Sina’s time delay censorship mechanism. Previously the default time delay was seven days. It is likely that many Sina Weibo users would find this time delay to be odd but could attribute this to a bug or a glitch in the system. With today’s hours-long time delay, most users will likely think that discussion around this topic is ongoing.
This is an example of censorship at its worst: users who suspect that their search term might be blocked receive no censorship notice, and are instead led to believe that the term is not sensitive and that few people are saying anything interesting about it anyway.
A good example of this can be found when searching for “天安门事件” (Tiananmen incident). The current search results do in fact bring up quite a bit of information about an incident which occurred in Tiananmen Square - in 1976.
Keyword examples: “六四” (June 4th), “天安门事件” (Tiananmen incident), “24周年” (24th anniversary), “法轮功” (Falun Gong)
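The control-group comparison described above can be expressed as a small check. This sketch assumes you have collected the post timestamps from two searches run at the same moment, one for the sensitive keyword and one for a comparable control such as “五七”; the one-hour threshold is an arbitrary choice of ours, not a figure Sina uses, and the timestamps in the usage lines are made up.

```python
from datetime import datetime, timedelta


def newest_age(post_times, checked_at):
    """Age of the most recent post in a list of result timestamps."""
    return checked_at - max(post_times)


def looks_delayed(sensitive_times, control_times, checked_at,
                  threshold=timedelta(hours=1)):
    """Compare two searches run at the same moment.

    Returns (flag, gap): gap is how much older the freshest sensitive
    result is than the freshest control result; flag is True when that
    gap exceeds the (arbitrary) threshold.
    """
    gap = (newest_age(sensitive_times, checked_at)
           - newest_age(control_times, checked_at))
    return gap > threshold, gap


now = datetime(2013, 5, 31, 9, 0)
sensitive = [now - timedelta(hours=3), now - timedelta(hours=5)]
control = [now - timedelta(minutes=10), now - timedelta(minutes=25)]
print(looks_delayed(sensitive, control, now))
```

The check only means something if the control keyword is about as popular as the sensitive one; a rarely used control would make any keyword look fresh.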

Implicit complete censorship

Sina now returns a fake “Sorry, no results can be found” message instead of the normal censorship message that we have come to at once love and hate: “According to relevant laws, regulations and policies, search results for [the blocked keyword] can not be displayed.”
Keyword example: “64事件” (64 incident)

Implicit semi-censorship

Sina returned a partial censorship notice from page 2 onwards. The first page looks completely normal, leading users to believe that there are no decent search results for that keyword - why continue searching?
Keyword example: “天安门大屠杀” (Tiananmen massacre)

V-user-only censorship

All search results come from V users only. V users on Sina are similar to verified accounts on Twitter. These users are more likely to follow the rules, as they risk having their V status revoked or their accounts blocked.
Keyword example: “游行” (demonstration)
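The tactics above leave different fingerprints on the results pages, so observations from a test search can be sorted into them with a few rules. The following sketch is a hypothetical helper with field names of our own; delayed censorship is left out because detecting it requires comparing post timestamps against a control keyword rather than looking at the pages. A "no results" page is only suspicious for a keyword that would normally be busy.

```python
from dataclasses import dataclass


@dataclass
class SearchObservation:
    """What one test search looked like (field names are our own)."""
    page1_notice: bool    # explicit "relevant laws..." notice on page 1
    page1_posts: int      # posts shown on page 1
    claimed_pages: int    # number of pages the pager advertises
    page2_posts: int      # posts actually shown on page 2
    page2_notice: bool    # censorship notice shown on page 2
    verified_only: bool   # every post shown comes from a V user


def classify_tactic(obs: SearchObservation) -> str:
    """Map page-level observations to the tactics described in this post."""
    if obs.page1_notice:
        return "explicit complete censorship"
    if obs.page1_posts == 0:
        # a fake "no results" page, assuming the keyword is normally busy
        return "implicit complete censorship"
    if obs.page2_notice:
        return "implicit semi-censorship"
    if obs.claimed_pages > 1 and obs.page2_posts == 0:
        return "strict filtering"
    if obs.verified_only:
        return "V-user-only censorship"
    return "no tactic detected"


print(classify_tactic(SearchObservation(
    page1_notice=False, page1_posts=20, claimed_pages=50,
    page2_posts=0, page2_notice=False, verified_only=False)))  # strict filtering
```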

Wednesday, January 23, 2013

GitHub blocked in China - how it happened, how to get around it, and where it will take us

What happened?

Update: On January 23, https://github.com was unblocked again.

On January 18, or possibly the day before (though our test data doesn’t cover this), the Great Firewall began to reset connections containing “*.github.com”. As a result, code sharing projects hosted on a subdomain of GitHub, such as aoxu.github.com, were blocked in China. The main GitHub website was mostly unaffected, for two reasons. Firstly, it’s hosted on github.com, without a subdomain. Secondly, it serves encrypted content only, thus preventing the Great Firewall from resetting connections based on keywords.
A day later, the block was extended by adding github.com, without subdomains, to the list of keywords causing connections to be reset. Chinese users could still access GitHub as long as they manually typed https://github.com into their browser (notice the https). Strangely, www.github.com was DNS poisoned, but no other hosts were. The www subdomain is not used by GitHub.
On January 21, DNS poisoning was extended to all github.com hosts including the root domain as well as all its subdomains. In effect, all of GitHub was blocked in China.
Interestingly, the blocking of GitHub has seemingly not been censored on social media. The keyword “github” has not been blocked on Sina Weibo, and we have not detected any deleted posts containing “github” on FreeWeibo.
For further information on how the blocking was introduced, including data references, see the Timeline at the end of this article.

Why oh why?

As always when online censorship in China changes, the first question asked is why. While we cannot be certain, it doesn’t stop us, or anyone else, from speculating.
Some have suggested that it may be because of the Mongol project, hosted on GitHub. Mongol is an open-source tool used to detect routers that block certain connections going out of China - in essence tracking where the Great Firewall is located. While such a tool may seem threatening from the point of view of the Chinese authorities, there are a few facts that make the blocking of Mongol seem unlikely: the tool was released a full month ago, the working principle of the software was released back in 2011 and the paper describing it is still not blocked.
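How can a tool find where the filtering happens? As we understand it, the principle published in 2011 is to send a packet carrying a trigger keyword with a small IP time-to-live (TTL) and raise the TTL one hop at a time. Packets that expire before reaching the filtering device come back as "time exceeded" errors; the first TTL that draws a reset instead points at the filter's position on the path. The sketch below shows only that search logic. `probe` is a hypothetical function, since real probes need raw sockets, and this is not Mongol's code.

```python
from typing import Callable, Optional


def locate_filter(probe: Callable[[int], str], max_ttl: int = 30) -> Optional[int]:
    """Return the smallest TTL at which a trigger packet draws a reset.

    probe(ttl) is a hypothetical function that sends a packet containing
    a filtered keyword with the given IP time-to-live and reports what
    came back: "expired" (ICMP time exceeded from a router on the way),
    "reset" (an injected TCP reset) or "none".
    """
    for ttl in range(1, max_ttl + 1):
        if probe(ttl) == "reset":
            return ttl     # first TTL at which the packet reached the filter
    return None            # no reset within max_ttl hops


# Simulated path on which the filtering device sits at hop 7:
print(locate_filter(lambda ttl: "reset" if ttl >= 7 else "expired"))  # 7
```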
Another theory is that the government jumped on the opportunity to block an all-encrypted file-sharing service which, though intended for code sharing, can also be used to share politically sensitive material. Other file sharing services have faced similar dilemmas in China, including Dropbox which was blocked in 2010. Was GitHub being used by activists to share information?

The train ticket theory

The most gripping tale though ties this story in with China’s annual mass migration during the new year holiday. Each year tens of millions of Chinese scramble to purchase a limited and insufficient number of train tickets so they can make the journey home to spend the holiday with their families. Train tickets in China can only be bought 18 days ahead of the planned journey. With tens of millions of people traveling home for the Spring Festival, getting hold of the right ticket is a real challenge. Failure can mean missing out on the often only once-a-year chance to meet up with the family.
With the increased use of the internet, however, a lot of ticket sales are done online via the government-run website 12306.cn. While waiting for the right ticket to go on sale, users will often reload a web page continuously. This is of course a problem easily solved by creative software developers. Several Chinese web browser providers rolled out add-ons that automatically reload the government website and book the ticket as soon as it's available.
A particularly interesting add-on was called 12306_ticket_helper (https://github.com/iccfish/12306_ticket_helper, now deleted). The add-on relied on files hosted on GitHub. Its sudden popularity caused such a traffic load that GitHub temporarily went offline, and an employee sent an abuse complaint to 12306.cn. GitHub didn't know that it was actually the browser add-on that embedded the file, and not the 12306.cn website itself.
On January 18, at the same time that the GitHub block was introduced, the Ministry of Railways was said to be asking Kingsoft, one of the other browser providers, to disable their ticket-buying add-on. On the same day, the Ministry of Industry ordered all browser providers to remove similar add-ons.
Is the GitHub block just a matter of the site being in the wrong place at the wrong time? It is not inconceivable that when the Ministries of Railways and Industry say “dance”, everyone dances. After all of the accomplices involved in the ticket scandal made amends, it is likely that the authorities looked further to see who else was involved, and GitHub may simply have been caught in that net.
If this is true, then this episode does reveal something about the Chinese censorship mechanism, because one of two things would have had to happen for GitHub to be blocked. Either the person with his finger on the censorship button had free rein to censor whatever he thought needed censoring in relation to the ticket scandal, which would indicate that this civil servant does not have to jump through many hoops when he thinks a site should be blocked. Or the powers-that-be in the censorship bureau who gave the go-ahead to block GitHub were so incompetent that they could not comprehend the fallout of closing down the site: they were either too lazy to investigate, too distracted to care or simply oblivious to the role that GitHub plays for many developers across China.
Our tests indicate that the likely answer is a combination of the two explanations above. At first the censors started resetting connections to *.github.com but found that this was ineffective, so they moved to a more comprehensive block once they understood that the first one was not working. This would mean that the powers-that-be had no understanding of how GitHub works and that the civil servant with his finger on the button can push it whenever he wants.

The HTTPS theory (true either way)

Because GitHub is HTTPS-only, the Great Firewall cannot block individual pages. Regardless of the specific project the authorities wanted to block access to, the only way they could do it was to block GitHub altogether. This could have severe implications for other websites as well. As more and more of the Internet is switching to encrypted connections, the ability for online censorship authorities to selectively block content decreases. If, or perhaps when, Google Search, Wikipedia and CNN switch to HTTPS-only, will the Chinese authorities decide to block them altogether as well?
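A simplified model makes the point concrete. The sketch assumes, for illustration only, that a filter on the path can read the host name and path of a plain HTTP request but only the host name of an HTTPS connection (for example via the DNS lookup); it does not describe the Great Firewall's real internals. The function names are our own.

```python
def visible_to_filter(scheme: str, host: str, path: str) -> str:
    """Text an on-path filter can read in this simplified model.

    Assumption for illustration: plain HTTP exposes host and path, while
    HTTPS exposes only the host name (e.g. through the DNS lookup) and
    keeps the path inside the encrypted connection.
    """
    if scheme == "http":
        return host + path
    return host


def matching_keywords(scheme, host, path, keywords):
    """Keywords the filter could match for this request."""
    visible = visible_to_filter(scheme, host, path)
    return [k for k in keywords if k in visible]


keywords = ["12306_ticket_helper", "github.com"]
path = "/iccfish/12306_ticket_helper"
print(matching_keywords("http", "github.com", path, keywords))   # ['12306_ticket_helper', 'github.com']
print(matching_keywords("https", "github.com", path, keywords))  # ['github.com']
```

Under HTTPS the only keyword left to match is the host name itself, so stopping one project means blocking github.com as a whole.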

What will the knock on effects be?

According to Alexa, GitHub is the 276th most popular website in China. Globally, GitHub is ranked 209th. Since it targets a very specific audience (software developers), that's not a bad ranking. GitHub itself told Techinasia that China ranks fourth in terms of visits to the site. The only foreign-hosted websites ranked higher than GitHub in China are Google, Bing (and Live.com, Microsoft.com, Msn.com), Amazon, Yahoo, Wikipedia, Apple, eBay and Adobe.
While GitHub is popular, there are many other code-sharing services offering alternatives. Google Code is not blocked, though its HTTPS version sometimes is, and if or when it switches to HTTPS-only it may well face the same dilemma as GitHub. Neither SourceForge nor many other smaller providers are blocked.
Software developers often have to work with whatever code sharing service their project is already using. Switching from one to another is somewhat complicated. Many Chinese developers, especially the ones that work with customers abroad, will now have to use circumvention tools to stay in business. With such tools being actively targeted, some of them may not be able to continue their work at all.
China has been successful in attracting a lot of foreign developer houses to the country due to lower costs and access to plenty of developer talent. Foreign investors in this area may now start to question if it is a wise decision to place so many human resources in a country that may prevent or limit access to key technical resources without warning. Companies who run Gmail for their enterprises have learned the hard way that their communications can be turned off on a whim. Most who experienced outages when China completely blocked Google last November have probably found enterprise alternatives to Gmail already. Companies will now likely consider more stable alternatives to China.
The most devastating impact could come in an attitude shift amongst young Chinese. China’s censors have effectively just pissed off a whole nation of developers. It is likely they knew how to get around the firewall anyway but when developers have to turn on VPNs or fiddle with proxies in order to do their jobs, they will get upset. Does China really want to create a generation of would-be hackers? Especially within her borders? Could this signal the birth of a Chinese Anonymous? Perhaps an end to online censorship in China is now closer than we think?

How to get around it?

If the Great Firewall has not fallen by the time you read this, then you can follow these instructions to circumvent the blocking of GitHub.
If you are using a VPN, all your traffic is rerouted through a foreign server and GitHub will work as usual. Unless the Great Firewall also blocks GitHub's IP address, a simpler alternative is to manually edit the so-called hosts file, adding the following entry:
207.97.227.239 github.com
With such an entry in place, connections to https://github.com will work from inside the Great Firewall. The unencrypted http://github.com will not work, so remember to add the “https” manually.
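Editing the hosts file by hand works fine, but the step can also be scripted. This Python sketch appends the entry above unless the host name is already mapped; the function name is our own, the path differs per system (typically /etc/hosts on Linux and Mac, C:\Windows\System32\drivers\etc\hosts on Windows), and writing the real file requires administrator rights.

```python
from pathlib import Path


def add_hosts_entry(hosts_path, ip="207.97.227.239", host="github.com"):
    """Append "<ip> <host>" to a hosts file unless host is already mapped.

    Returns True if a line was added. The default IP is the one quoted
    in this post and may be out of date.
    """
    path = Path(hosts_path)
    text = path.read_text() if path.exists() else ""
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) >= 2 and host in fields[1:]:
            return False                          # host already has an entry
    if text and not text.endswith("\n"):
        text += "\n"                              # keep the new entry on its own line
    path.write_text(text + f"{ip} {host}\n")
    return True


# e.g. add_hosts_entry("/etc/hosts")  (run with administrator rights)
```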
The IP address of GitHub may change at any time, of course. A more stable solution is to use an encrypted DNS lookup service such as DNSCrypt, which effectively bypasses DNS poisoning. Ironically, the download links for the Mac version point to GitHub, which of course is blocked. But the final download link is not blocked: http://download.dnscrypt.org/guis/opendns/osx/dnscrypt-osx-client-0.19.dmg.

Timeline

Date    Event
Jan 18  Connection reset of *.github.com, including www.github.com (not DNS poisoned)
Jan 19  Connection reset of github.com
Jan 19  DNS poisoning of www.github.com
Jan 19  The keyword www.github.com causes connection resets on Google Search
Jan 20  Connection reset of *.github.com (still not DNS poisoned)
Jan 21  DNS poisoning of the github.com root domain (as well as *.github.com)