Scaling the Wall One Word at a Time

Researchers study the computer science behind Internet censorship

Type "Falun Gong" into www.Yahoo.cn, the Chinese version of the popular search engine, and you'll probably receive a message like this: "Connection Interrupted. The connection to the server was reset while the page was loading."

It's not your computer or the browser. The problem rests thousands of miles away in China. By searching for "Falun Gong," a religious practice banned by the Chinese government, you've run headlong into the Great Firewall of China (GFC), the Chinese Internet censorship program.

But try that same search several more times and there's a good chance it will get through. That inconsistency in what should be a solid perimeter is what's so interesting to Jed Crandall, assistant professor of computer science. For the last year and a half, Crandall and computer science doctoral student Jong Chun Park, along with other collaborators from UNM and the University of California, Davis, have been using machine learning, algorithms, and some old-fashioned sleuthing techniques to study the technical workings of the GFC and Chinese Internet censorship. Their findings could help shape future Internet policy and secure networks closer to home.

The Science of Words

The research team's first goal is to understand the computer science behind the GFC. Central to that effort is a list of 130 banned words assembled from the Chinese version of Wikipedia. Two UNM computer science students, Erik Webb and Tamsen Schurman, took the lead in discovering the most recent list. The team prioritizes which keywords to test using a process called ConceptDoppler. Named after Doppler weather radar, the weather-tracking tool that uses the mixing of gases to forecast weather changes, ConceptDoppler uses the complex relationship between sensitive concepts and blacklisted keywords to track Internet censorship.
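One plausible way to prioritize candidate keywords is to rank them by how often they appear in the same documents as a known sensitive concept. The sketch below is an illustration only, not the team's actual method: the toy corpus, the seed term, and the function names are all assumptions made up for this example.

```python
import math

def occurrence_vector(term, documents):
    """1.0 for each document that contains the term, else 0.0."""
    return [1.0 if term in doc else 0.0 for doc in documents]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(seed, candidates, documents):
    """Order candidate terms from most to least similar to the seed concept."""
    seed_vec = occurrence_vector(seed, documents)
    scored = [(cosine(seed_vec, occurrence_vector(t, documents)), t)
              for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [term for _, term in scored]

# Hypothetical toy corpus: each document is reduced to a set of words.
documents = [
    {"protest", "square", "students"},
    {"protest", "square", "students"},
    {"weather", "radar", "storm"},
    {"protest", "square", "march"},
]

ranking = rank_candidates("protest", ["weather", "students", "square"], documents)
print(ranking)  # ['square', 'students', 'weather']
```

Terms at the top of the ranking would be tested against the filter first, so a limited number of probes goes to the words most likely to be blacklisted.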

With that list in hand, Park tests how the system works by repeatedly sending keyword requests and mapping their progress through the Internet. "You can see how far a packet goes before it's detected by a filter," explains Crandall. A packet is a piece of a message sent over the network. "If it goes through 14 routers in China and gets censored, then we know it's the 14th hop on that route that's sending the reset." A "reset" is essentially a disconnection forged by the censors, which stops the computers from sharing more data.
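One common way to make this kind of measurement is to limit each probe's time-to-live (TTL), so that a packet carrying a keyword expires after a chosen number of routers. The article does not spell out the mechanism, so treat that as an assumption. Under it, the filter sits at the smallest TTL that draws a reset. The sketch below works on recorded results rather than live traffic, and the function name and data are hypothetical.

```python
from typing import Dict, Optional

def first_filtering_hop(probe_results: Dict[int, bool]) -> Optional[int]:
    """Return the smallest TTL whose probe drew a reset, or None if none did.

    probe_results maps a probe's TTL (how many routers it could cross)
    to whether a forged reset came back for that probe.
    """
    censored_ttls = [ttl for ttl, got_reset in probe_results.items() if got_reset]
    return min(censored_ttls) if censored_ttls else None

# Hypothetical measurements matching the example in the text:
# probes that expire before hop 14 pass unnoticed, later ones are reset.
probes = {ttl: ttl >= 14 for ttl in range(1, 21)}

hop = first_filtering_hop(probes)
print(hop)  # 14
```

Repeating the same analysis over many routes gives a map of where along each path the filtering takes place.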

Park's testing process has also revealed censorship disparities in the different layers of the Internet. Because of how Internet protocols work and how packets are routed, the application-level behavior of the censor (what users actually see) can vary widely from city to city and even between two computers on the same network. For example, two reporters in two rooms in the same hotel might consistently get different results on whether a page is accessible, because of effects such as traffic engineering that operate at the level of individual packets. Park and Crandall are trying to understand these packet-level dynamics and the dramatic effects they can have on the application-level behavior of censorship.

Because Crandall and Park can't conduct their research in China, they enlisted the help of Associate Professor of Computer Science Terran Lane, who took a page from neuroscience. Brain researchers use "computed tomography," a way of measuring what's happening inside the brain from the outside using a CAT scan. In a similar process called "network tomography," Crandall and colleagues use different technologies to measure the network from the outside to get a clearer picture of what's happening inside the Internet in China.

Finding Faults

They've already made several interesting discoveries. Unlike a true firewall that stops everything at the border, the GFC waits to block keywords deeper in the network. The level of censorship also varies based on the time of day and the volume of Internet traffic. Censorship is less efficient when usage is high. And, surprisingly, 28 percent of the tested paths weren't censored at all. "It's like closing access to Albuquerque by putting road blocks on the freeways and leaving all the side roads open," explains Crandall.

The keyword list reveals interesting results, too. Some words, like Tiananmen Square, are expected on the blacklist. Others aren't. For example, when North Rhine-Westphalia, an area in western Germany, is translated into Chinese characters it appears to contain the word "Falun" and is, therefore, censored. Crandall says such imprecise censorship has wide-reaching social effects, from unintentionally banning innocuous content to heightening self-censorship. He says people won't go to certain web sites because they assume they're broken, and by manipulating blacklisted words, the government can stop news of a protest from spreading. "It's effective, but not in the sense that people normally would think about censorship. It's more of a mass effect," he adds.
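The North Rhine-Westphalia case is what a filter that matches keywords as raw substrings would produce. The sketch below assumes that kind of matching. It uses romanized stand-in strings rather than the actual Chinese text, and the one-entry blacklist is hypothetical.

```python
def matched_keywords(text, blacklist):
    """Return every blacklisted keyword that appears anywhere inside text."""
    return [keyword for keyword in blacklist if keyword in text]

# Hypothetical blacklist with one entry (romanized stand-in).
BLACKLIST = ["falun"]

# Illustrative queries: a targeted term, an innocuous place name whose
# transliteration happens to contain the keyword, and an unrelated search.
queries = ["falun gong", "weisitefalun", "albuquerque weather"]

blocked = {query: bool(matched_keywords(query, BLACKLIST)) for query in queries}
for query, is_blocked in blocked.items():
    print(query, "->", "blocked" if is_blocked else "allowed")
```

The place name is blocked even though it has nothing to do with the banned topic, which is the kind of collateral damage Crandall describes.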

While Crandall and Park are focusing on China, Internet censorship is a worldwide issue. "Censorship in the Internet is surprisingly widespread and increasing quickly," says Stephanie Forrest, chair of the Computer Science Department. "Yet, very little is understood about the what, why, where, and how of Internet censorship. Professor Crandall's work on ConceptDoppler is a significant technical advance in Internet measurement, one which is allowing us to begin answering these questions."

The OpenNet Initiative, which monitors Internet filtering and surveillance, says at least 26 countries are known to conduct some form of Internet filtering. "If we don't set effective global censorship policies, then the Internet could be transformed into something that's controlled in different domains," warns Crandall.

Results from the research have broad applications. Because the technology used by the Chinese is the same as that behind intrusion detection, results could help researchers and the large ISPs that control the Internet backbone develop better security tools for networks covering large geographic regions. Crandall and Park plan to continue their testing and are developing partnerships with other researchers to expand the scope of their work in hopes of unraveling more clues hidden behind the GFC. In doing so, they'll wield intellect and computing power instead of sledgehammers to bring down the walls that divide people around the world.