
Reality Mining: How Mass Surveillance Works in Xinjiang, China

(Articles below updated last: 05 JUN 2019)

‘Reverse Engineering’ Police App Reveals Profiling and Monitoring Strategies

(Synopsis; full report further below)

New York, May 2, 2019

Chinese authorities are using a mobile app to carry out illegal mass surveillance and arbitrary detention of Muslims in China’s western Xinjiang region.

The Human Rights Watch report, “China’s Algorithms of Repression: Reverse Engineering a Xinjiang Police Mass Surveillance App,” presents new evidence about the surveillance state in Xinjiang, where the government has subjected 13 million Turkic Muslims to heightened repression as part of its “Strike Hard Campaign against Violent Terrorism.” Between January 2018 and February 2019, Human Rights Watch was able to reverse engineer the mobile app that officials use to connect to the Integrated Joint Operations Platform (IJOP), the Xinjiang policing program that aggregates data about people and flags those deemed potentially threatening. By examining the design of the app, which at the time was publicly available, Human Rights Watch revealed specifically the kinds of behaviors and people this mass surveillance system targets.

“Our research shows, for the first time, that Xinjiang police are using illegally gathered information about people’s completely lawful behavior – and using it against them,” said Maya Wang, senior China researcher at Human Rights Watch. “The Chinese government is monitoring every aspect of people’s lives in Xinjiang, picking out those it mistrusts, and subjecting them to extra scrutiny.”

Human Rights Watch published screenshots from the IJOP app, in the original Chinese and translated into English.

The app prompts government officials to collect a wide array of information from ordinary people in Xinjiang.

From a drop-down menu, officials are prompted to choose the circumstances under which information is being collected.

The information it gathers ranges from people’s blood type to their height, from their “religious atmosphere” to their political affiliation.

The app’s source code also reveals that the police platform targets 36 types of people for data collection. Those include people who have stopped using smart phones, those who fail to “socialize with neighbors,” and those who “collected money or materials for mosques with enthusiasm.”

The IJOP platform tracks everyone in Xinjiang. It monitors people’s movements by tracing their phones, vehicles, and ID cards. It keeps track of people’s use of electricity and gas stations.

Human Rights Watch found that the system and some of the region’s checkpoints work together to form a series of invisible or virtual fences. People’s freedom of movement is restricted to varying degrees depending on the level of threat authorities perceive they pose, determined by factors programmed into the system.

A former Xinjiang resident told Human Rights Watch a week after he was released from arbitrary detention: “I was entering a mall, and an orange alarm went off.” The police came and took him to a police station. “I said to them, ‘I was in a detention center and you guys released me because I was innocent.’… The police told me, ‘Just don’t go to any public places.’… I said, ‘What do I do now? Just stay home?’ He said, ‘Yes, that’s better than this, right?’”

The authorities have programmed the IJOP so that it treats many ordinary and lawful activities as indicators of suspicious behavior. For example:

Officials are prompted to investigate those determined to have used an “unusual” amount of electricity.

Officials can select from a list of reasons for unusual electricity consumption, such as: “purchased new electronics for domestic use” or “doing renovations.”

The system detects when the registered owner of the car is not the same as the person who is buying gasoline.

The app’s source code suggests that nearby officials are required to investigate by logging the reasons for the mismatch, …

… and deciding whether this case seems suspicious and requires further police investigation.

The app alerts officials to people who took trips abroad that it considers excessively long, …

… then prompts officials to interrogate the “overdue” person or their relatives and other acquaintances, asking them for details about the travel.

The system alerts officials if it has lost track of someone’s phone, to determine whether the owner’s actions are suspicious and require investigation.

Some of the investigations involve checking people’s phones for any one of the 51 internet tools that are considered suspicious, including WhatsApp, Viber, Telegram, and Virtual Private Networks (VPNs), Human Rights Watch found. The IJOP system also monitors people’s relationships, identifying as suspicious travelling with anyone on a police watch list, for example, or anyone related to someone who has recently obtained a new phone number.

Based on these broad and dubious criteria, the system generates lists of people to be evaluated by officials for detention. Official documents state individuals “who ought to be taken, should be taken,” suggesting the goal is to maximize detentions for people found to be “untrustworthy.” Those people are then interrogated without basic protections. They have no right to legal counsel, and some are tortured or otherwise mistreated, for which they have no effective redress.

The IJOP system was developed by China Electronics Technology Group Corporation (CETC), a major state-owned military contractor in China. The IJOP app was developed by Hebei Far East Communication System Engineering Company (HBFEC), a company that, at the time of the app’s development, was fully owned by CETC.

Under the Strike Hard Campaign, Xinjiang authorities have also collected biometrics, including DNA samples, fingerprints, iris scans, and blood types of all residents in the region ages 12 to 65. The authorities require residents to give voice samples when they apply for passports. All of this data is being entered into centralized, searchable government databases. While Xinjiang’s systems are particularly intrusive, their basic designs are similar to those the police are planning and implementing throughout China.

The Chinese government should immediately shut down the IJOP platform and delete all the data that it has collected from individuals in Xinjiang, Human Rights Watch said. Concerned foreign governments should impose targeted sanctions, such as under the US Global Magnitsky Act, including visa bans and asset freezes, against the Xinjiang Party Secretary, Chen Quanguo, and other senior officials linked to abuses in the Strike Hard Campaign. They should also impose appropriate export control mechanisms to prevent the Chinese government from obtaining technologies used to violate basic rights. United Nations member countries should push for an international fact-finding mission to assess the situation in Xinjiang and report to the UN Human Rights Council.






Read the full report:

China’s Algorithms of Repression

Reverse Engineering a Xinjiang Police Mass Surveillance App

A Xinjiang Police College webpage shows police officers collecting information from villagers in Kargilik (or Yecheng) County in Kashgar Prefecture, Xinjiang. Source: Xinjiang Police College website


Since late 2016, the Chinese government has subjected the 13 million ethnic Uyghurs and other Turkic Muslims in Xinjiang to mass arbitrary detention, forced political indoctrination, restrictions on movement, and religious oppression. Credible estimates indicate that under this heightened repression, up to one million people are being held in “political education” camps. The government’s “Strike Hard Campaign against Violent Terrorism” (Strike Hard Campaign, 严厉打击暴力恐怖活动专项行动) has turned Xinjiang into one of China’s major centers for using innovative technologies for social control.

This report provides a detailed description and analysis of a mobile app that police and other officials use to communicate with the Integrated Joint Operations Platform (IJOP, 一体化联合作战平台), one of the main systems Chinese authorities use for mass surveillance in Xinjiang. Human Rights Watch first reported on the IJOP in February 2018, noting the policing program aggregates data about people and flags to officials those it deems potentially threatening; some of those targeted are detained and sent to political education camps and other facilities. But by “reverse engineering” this mobile app, we now know specifically the kinds of behaviors and people this mass surveillance system targets.

The findings have broader significance, providing an unprecedented window into how mass surveillance actually works in Xinjiang, because the IJOP system is central to a larger ecosystem of social monitoring and control in the region. They also shed light on how mass surveillance functions in China. While Xinjiang’s systems are particularly intrusive, their basic designs are similar to those the police are planning and implementing throughout China.

Many—perhaps all—of the mass surveillance practices described in this report appear to be contrary to Chinese law. They violate the internationally guaranteed rights to privacy, to be presumed innocent until proven guilty, and to freedom of association and movement. Their impact on other rights, such as freedom of expression and religion, is profound.



Human Rights Watch finds that officials use the IJOP app to fulfill three broad functions: collecting personal information, reporting on activities or circumstances deemed suspicious, and prompting investigations of people the system flags as problematic.

Analysis of the IJOP app reveals that authorities are collecting massive amounts of personal information—from the color of a person’s car to their height down to the precise centimeter—and feeding it into the IJOP central system, linking that data to the person’s national identification card number. Our analysis also shows that Xinjiang authorities consider many forms of lawful, everyday, non-violent behavior—such as “not socializing with neighbors, often avoiding using the front door”—as suspicious. The app also labels the use of 51 network tools as suspicious, including many Virtual Private Networks (VPNs) and encrypted communication tools, such as WhatsApp and Viber.

The IJOP app demonstrates that Chinese authorities consider certain peaceful religious activities suspicious, such as donating to mosques or preaching the Quran without authorization. But most of the other behavior the app considers problematic is ethnic- and religion-neutral. Our findings suggest the IJOP system surveils and collects data on everyone in Xinjiang. The system is tracking the movement of people by monitoring the “trajectory” and location data of their phones, ID cards, and vehicles; it is also monitoring the use of electricity and gas stations of everybody in the region. This is consistent with Xinjiang local government statements that emphasize officials must collect data for the IJOP system in a “comprehensive manner” from “everyone in every household.”

When the IJOP system detects irregularities or deviations from what it considers normal, such as when people are using a phone that is not registered to them, when they use more electricity than “normal,” or when they leave the area in which they are registered to live without police permission, the system flags these “micro-clues” to the authorities as suspicious and prompts an investigation.

Another key element of the IJOP system is the monitoring of personal relationships. Authorities seem to consider some of these relationships inherently suspicious. For example, the IJOP app instructs officers to investigate people who are related to people who have obtained a new phone number or who have foreign links.

The authorities have sought to justify mass surveillance in Xinjiang as a means to fight terrorism. While the app instructs officials to check for “terrorism” and “violent audio-visual content” when conducting phone and software checks, these terms are broadly defined under Chinese laws. It also instructs officials to watch out for “adherents of Wahhabism,” a term suggesting an ultra-conservative form of Islamic belief, and “families of those…who detonated [devices] and killed themselves.” But many—if not most—behaviors the IJOP system pays special attention to have no clear relationship to terrorism or extremism. Our analysis of the IJOP system suggests that gathering information to counter genuine terrorism or extremist violence is not a central goal of the system.

The app also scores government officials on their performance in fulfilling tasks and is a tool for higher-level supervisors to assign tasks to, and keep tabs on the performance of, lower-level officials. The IJOP app, in part, aims to control government officials to ensure that they are efficiently carrying out the government’s repressive orders.

In creating the IJOP system, the Chinese government has benefitted from Chinese companies who provide them with technologies. While the Chinese government has primary responsibility for the human rights violations taking place in Xinjiang, these companies also have a responsibility under international law to respect human rights, avoid complicity in abuses, and adequately remedy them when they occur.

As detailed below, the IJOP system and some of the region’s checkpoints work together to form a series of invisible or virtual fences. Authorities describe them as a series of “filters” or “sieves” throughout the region, sifting out undesirable elements. Depending on the level of threat authorities perceive, as determined by factors programmed into the IJOP system, individuals’ freedom of movement is restricted to different degrees. Some are held captive in Xinjiang’s prisons and political education camps; others are subjected to house arrest, not allowed to leave their registered locales, not allowed to enter public places, or not allowed to leave China.

Government control over movement in Xinjiang today bears similarities to the Mao Zedong era (1949-1976), when people were restricted to where they were registered to live and police could detain anyone for venturing outside their locales. After economic liberalization was launched in 1979, most of these controls had become largely obsolete. However, Xinjiang’s modern police state—which uses a combination of technological systems and administrative controls—empowers the authorities to reimpose a Mao-era degree of control, but in a graded manner that also meets the economy’s demands for largely free movement of labor.

The intrusive, massive collection of personal information through the IJOP app helps explain reports by Turkic Muslims in Xinjiang that government officials have asked them or their family members a bewildering array of personal questions. When government agents conduct intrusive visits to Muslims’ homes and offices, for example, they typically ask whether the residents own exercise equipment and how they communicate with families who live abroad; it appears that such officials are fulfilling requirements sent to them through apps such as the IJOP app. The IJOP app does not require government officials to inform the people whose daily lives are pored over and logged of the purpose of such intrusive data collection or of how their information is being used or stored, much less to obtain consent for such data collection.


A checkpoint in Turpan, Xinjiang. Some of Xinjiang’s checkpoints are equipped with special machines that, in addition to recognizing people through their ID cards or facial recognition, are also vacuuming up people’s identifying information from their electronic devices. © 2018 Darren Byler


The Strike Hard Campaign has shown complete disregard for the rights of Turkic Muslims to be presumed innocent until proven guilty. In Xinjiang, authorities have created a system that considers individuals suspicious based on broad and dubious criteria, and then generates lists of people to be evaluated by officials for detention. Official documents state that individuals “who ought to be taken, should be taken,” suggesting the goal is to maximize the number of people they find “untrustworthy” in detention. Such people are then subjected to police interrogation without basic procedural protections. They have no right to legal counsel, and some are subjected to torture and mistreatment, for which they have no effective redress, as we have documented in our September 2018 report. The result is Chinese authorities, bolstered by technology, arbitrarily and indefinitely detaining Turkic Muslims in Xinjiang en masse for actions and behavior that are not crimes under Chinese law.

And yet Chinese authorities continue to make wildly inaccurate claims that their “sophisticated” systems are keeping Xinjiang safe by “targeting” terrorists “with precision.” In China, the lack of an independent judiciary and free press, coupled with fierce government hostility to independent civil society organizations, means there is no way to hold the government or participating businesses accountable for their actions, including for the devastating consequences these systems inflict on people’s lives.

The Chinese government should immediately shut down the IJOP and delete all the data it has collected from individuals in Xinjiang. It should cease the Strike Hard Campaign, including all compulsory programs aimed at surveilling and controlling Turkic Muslims. All those held in political education camps should be unconditionally released and the camps shut down. The government should also investigate Party Secretary Chen Quanguo and other senior officials implicated in human rights abuses, including violating privacy rights, and grant access to Xinjiang, as requested by the Office of the United Nations High Commissioner for Human Rights and UN human rights experts.

Concerned foreign governments should impose targeted sanctions, such as under the US Global Magnitsky Act, including visa bans and asset freezes, against Party Secretary Chen and other senior officials linked to abuses in the Strike Hard Campaign. They should also impose appropriate export control mechanisms to prevent the Chinese government from obtaining technologies used to violate basic rights.





Uighurs pray in Xinjiang in 2015. UN chief Antonio Guterres was under pressure from rights groups to publicly confront Beijing over the mass detention of the Muslim minority (AFP Photo/Greg Baker)

China data leak exposes mass surveillance in Xinjiang

Beijing (AFP) – A Chinese technology firm has compiled a range of personal information on 2.6 million people in Xinjiang — from their ethnicity to locations — according to a data leak highlighting the wide extent of surveillance in the restive region.

Xinjiang is home to most of China’s Uighur ethnic minority and has been under heavy police surveillance in recent years after violent inter-ethnic tensions.

Nearly one million Uighurs and other Turkic language-speaking minorities in Xinjiang are reportedly held in re-education camps, according to a UN panel of experts.

The leak was discovered last week by security researcher Victor Gevers, who found that Chinese tech company SenseNets had stored the records of individuals in an open database “fully accessible to anyone”.

The records included information such as their Chinese ID number, birthday, address, ethnicity, and employer.

The exposed data also linked individuals to GPS coordinates — labelled with descriptions such as “mosque” — captured by tracking devices around the region.

Within a 24-hour period, more than six million locations were saved by SenseNets’ tracking devices, according to Gevers, who works at Dutch online security non-profit GDI Foundation and posted his findings on Twitter.

“You can clearly see they have absolutely no clue about network security,” he told AFP, describing SenseNets’ IT skills as belonging “to the early 90s”.

“Who in their right mind runs a database which is completely open and gives any visitors full administrative rights so then those database records can be manipulated by anyone with an internet connection?” he said.

“It simply does not compute.”

The database had been exposed since last July but was closed last Thursday, after Gevers reported the leak to SenseNets, he said.

SenseNets told AFP it was not accepting media interviews. The Xinjiang government did not immediately respond to AFP’s request for comment.

– Blacklisted –

The demand for high-tech surveillance in the Xinjiang region has led to the placing of surveillance cameras inside mosques, restaurants and other public places, while police checkpoints have been set up across the region.

It has also created lucrative business opportunities for artificial intelligence companies like SenseNets, which specialises in facial recognition.

On its website, the Shenzhen-based firm showcases different applications, from detecting “blacklisted” individuals in a crowd to tracing a suspect’s whereabouts.

The technology firm partners with public security bureaus around the country, as well as US tech firms such as Microsoft and semiconductor company AMD.

In 2016, for instance, it helped local police in southern Guangdong province identify individuals involved in organising an “illegal gathering” — a term that often refers to protests in China.

SenseNets is majority-owned by NetPosa, a public company listed on the Shenzhen stock exchange. On its website, the Beijing-based firm calls itself a “leading manufacturer of video surveillance platforms” and boasted coverage of over 1.5 million roads in China at the end of 2017.






China’s Xinjiang Region A Surveillance State Unlike Any the World Has Ever Seen

In western China, Beijing is using the most modern means available to control its Uighur minority. Tens of thousands have disappeared into re-education camps. A journey to an eerily quiet region.

Police patrol a night food market near the Id Kah Mosque in Kashgar in China’s Xinjiang Uighur Autonomous Region on June 25, 2017, a day before the Eid al-Fitr holiday. The increasingly strict curbs imposed on the mostly Muslim Uighur population have stifled life in the tense Xinjiang region, where beards are partially banned and no one is allowed to pray in public. Beijing says the restrictions and heavy police presence seek to control the spread of Islamic extremism and separatist movements, but analysts warn that Xinjiang is becoming an open air prison. (Johannes Eisele/AFP/Getty Images)


July 26, 2018

These days, the city of Kashgar in westernmost China feels a bit like Baghdad after the war. The sound of wailing sirens fills the air, armed trucks patrol the streets and fighter jets roar above the city. The few hotels that still host a smattering of tourists are surrounded by high concrete walls. Police in protective vests and helmets direct the traffic with sweeping, bossy gestures, sometimes yelling at those who don’t comply.

But now and then, a ghostly calm descends on the city. Just after noon, when it’s time for Friday prayers, the square in front of the huge Id Kah Mosque lies empty. There’s no muezzin piercing the air, just a gentle buzz on the rare occasion that someone passes through the metal detector at the entrance to the mosque. Dozens of surveillance cameras overlook the square. Security forces, some in uniform and others in plain-clothes, do the rounds of the Old Town with such stealth it’s as if they were trying to read people’s minds.

Journalists are not immune to their attentions. No sooner have we arrived than two police officers insist on sitting down with us for a “talk.” The next day in our hotel, one of them emerges from a room on our floor. When we take a walk through the city in the morning, we’re followed by several plain-clothes officers. Eventually, we’re being tailed by some eight people and three cars, including a black Honda with a covered license plate — apparently the secret police. Occasionally, our minders seem to be leaving us alone, but already awaiting us at the next intersection are the surveillance cameras that reach into every last corner of Kashgar’s inner city. The minute we strike up conversation with anyone, officials appear and start interrogating them.

Before too long, they’ll detain us too. More on that later. But while the authorities in Xinjiang keep close tabs on foreign reporters, their vigilance is nothing compared to their persecution of the Uighur population.

Nowhere in the world, not even in North Korea, is the population monitored as strictly as it is in the Xinjiang Uighur Autonomous Region, an area that is four times the size of Germany and shares borders with eight countries, including Pakistan, Afghanistan, Tajikistan and Kazakhstan.

Oppression has been in place for years, but has worsened massively in recent months. It is targeted primarily at the Uighur minority, a Turkic ethnic group of some 10 million Sunni Muslims considered by Beijing to be a hindrance to the development of a “harmonious society.” A spate of attacks involving Uighur militants has only consolidated this belief.

The Uighurs see themselves as a minority facing cultural, religious and economic discrimination. When Xinjiang was incorporated into the People’s Republic of China in 1949, they comprised roughly 80 percent of the region’s population. Controlled migration to Xinjiang of Han Chinese has reduced this share to 45 percent, and it is mainly these migrants who benefit from the economic boom in the region, which has plentiful supplies of oil, gas and coal.

With the Uighurs protesting, Beijing has tightened its grip and turned Xinjiang into a security state that is extreme even by the standards of China, itself a police state. According to Adrian Zenz, a German expert on Xinjiang, the provincial government has recruited over 90,000 police officers in the last two years alone, twice as many as it recruited in the previous seven years. With around 500 police officers for every 100,000 inhabitants, the police presence will soon be almost as tight as it is in neighboring Tibet.

At the same time, Beijing is equipping the far-western region with state-of-the-art surveillance technology, with cameras illuminating every street all over the region, from the capital Urumqi to the most remote mountain village. Iris scanners are in use in stations, airports and at the ubiquitous checkpoints, as are WiFi sniffers: tools and programs that allow data traffic from wireless networks to be monitored.

The data is then collated by an “integrated joint operations platform” that also stores further data on the populace — from consumer habits to banking activity, health status and indeed the DNA profile of every single inhabitant of Xinjiang.


FILE — Uighurs, a group of mostly Sunni Muslims, in Kashgar in the far western Chinese region of Xinjiang, Dec. 7, 2015. The Chinese government, dominated by the Han ethnic group, has tightened control and confiscated passports in areas with Uighurs, who are mostly Sunni Muslims. (Adam Dean/The New York Times)


Anyone with a potentially suspicious data trail can be detained. The government has built up a grid of hundreds of re-education camps. Tens of thousands of people have disappeared into them in recent months. Zenz estimates the number to be closer to hundreds of thousands. More precise figures are difficult to obtain. Censorship in Xinjiang is the strictest in China and its authorities the most inscrutable.

But a distinct impression forms after a trip through the territory and numerous conversations with its inhabitants, who all want to remain anonymous. Xinjiang, one of the most remote and backward regions in booming China, has become a real-life dystopia. It provides a glimpse of what an authoritarian regime armed with 21st century technology is capable of.

Urumqi: Police, Block Leaders and Snitches

With its ultra-modern skyline, the capital of Xinjiang is home to a population of some 3.5 million, 75 percent of which are Han Chinese. The Uighurs make up the largest minority. Kazakhs, Mongolians and Chinese-speaking Muslim Hui people also live here. “All ethnic groups belong together like the seeds of a pomegranate,” reads a banner overlooking Urumqi’s multilane ring road.

“The truth is, you can’t trust the Uighurs,” says a Han Chinese who used to work for the military. “They act like they’re your friend but they only really stick together.”

Mistrust between these two ethnic groups has been growing for decades. In 2009, tensions erupted in Xinjiang and claimed nearly 200 lives. Most of the dead were Han Chinese. In 2014, knife-wielding Uighur militants killed 31 people in Kunming. Just months later, two cars sped into a busy street market in Urumqi, killing dozens. There have been fewer major attacks since, but rumors abound among the Han Chinese that serious incidents frequently occur in the south of Xinjiang but go unreported.

In a bid to see calm return to the region, Beijing brought in hardliner Chen Quanguo, party boss in Tibet, and put him in charge in Xinjiang. Within two years, he implemented the same policy he enacted in Tibet and installed police stations across the region. These bunker-like, barricaded and heavily guarded buildings now litter every crossroads of the major cities.

Chen also introduced a block leader system not unlike the old German “Blockwarts,” with members of the local Communist Party committee given powers to inspect family homes and interrogate them about their lives: Who lives here? Who visited? What did you talk about? Even the controllers are getting controlled: Many apartments have bar code labels on the inside of the front door which the official must scan to prove that he or she carried out the visit.

To optimize social control, neighbors are now also instructed to turn each other in. “They came to me at the start of the year,” says a businessman from Urumqi. “They said: You and your neighbor are now responsible for each other. If either of you does anything unusual, the other will be held responsible.” The businessman says he loves his country. “But I refuse to spy on my neighbor.”

Chen’s predecessor pinned his hopes on an economic upswing in Xinjiang, says a driver who also lives in Urumqi, gesturing at the downtown skyscrapers. He hoped that the more economically comfortable the population became, the safer the region would be. “No one believes that anymore. The economy continues to grow, but the first priority now is repression.”

Turpan: A Duty To Ramp Up Security

A two-hour drive south of Urumqi is the city-oasis of Turpan, historically located directly on the Silk Road. Over the centuries, temples and mosques were built here by Chinese, Persians, Uighurs, Buddhists, Manichees and Muslims. It’s also a wine-growing region and a place suited to prayer and contemplation. Beyond the oasis are two ancient city ruins. A grand modern museum in the city center charts their history. But anyone who enters must show an ID and there’s a barbed wire fence outside. A dozen surveillance cameras watch the surrounding park, complete with pond and playground.

The museum’s security guards wear helmets and flak jackets. Next to the baggage scanners at the entrance are protective shields used by police for crowd control. It “can all be purchased,” says an assistant in the museum shop. “On the other side of the street.”

Indeed, there is a store selling security equipment just opposite the museum: Helmets and bayonets, surveillance electronics, 12-packs of batons and, above all, protective vests. “300 yuan each,” says a salesperson. That’s about 40 euros. “But they only help against stab wounds. We’ve got bulletproof vests too, but they’re much more expensive. Do you have the paperwork?”


Map of the Xinjiang region


All this gear is intended for use by security personnel protecting stores, restaurants, museums, hospitals and hotels. Their operators are obligated to ramp up security measures. “There’s just been a new directive,” says one hotel manager in Turpan, holding up a stamped piece of paper. Guests must show IDs when they check in and every time they re-enter — however often they leave and return. More security staff also have to be employed. In Xinjiang, these tightened security measures are designed not only to make the region a safer place but also to create jobs.

“There are 30 men in each bunker,” says a Uighur with suppressed anger as he passes one of the new police stations. “Thirty men, 30 breakfasts, 30 lunches and dinners. Every day. What for? Who’s paying for everything?”

Hotan: ‘Sent To School’

Hotan, a city of 300,000 people, is an oasis in the southwestern fringe of the Taklamakan desert. Attacks have been common there and surveillance is therefore especially prevalent.

When DER SPIEGEL visited Hotan in 2014, it was still possible to meet with a man who told us about the Chinese government’s harsh measures in the surrounding towns. Such a meeting would be out of the question today, the man now informs us through a messaging app. It’s not even possible to drive from one town to another without written permission, much less meet with a foreigner. “Maybe in a few years,” he writes, adding: “Delete this conversation from your phone immediately. Delete everything that could be suspicious.”

There is a modern shopping center at the edge of the city, though barely one in five stores is still open. Most of the others were closed recently due to “security and stability measures,” according to the official seals adhered to the doors. “Everyone was sent to school,” one passerby says quietly while looking around.

“Qu xuexi,” meaning to go or be sent to study, is one of the most common expressions in Xinjiang these days. It is a euphemism for having been taken away and not having been seen or heard from since. The “schools” are re-education centers in which the detainees are being forced to take courses in Chinese and patriotism, without any indictment, due process or a fair hearing.

More than half the people we met along the way during our journey spoke of family members or acquaintances who were “sent to school.” One driver in Hotan talked about his 72-year-old grandfather. A person in Urumqi told the story of his daughter’s professor. An airplane passenger spoke of his best friend.

The stories differ, yet they all contain important parallels. Most of the people affected are men. The arrests usually occur at night or in the early morning. The reasons cited include contacts abroad, too many visits to a mosque or possessing forbidden content on a mobile phone or computer. Relatives of those who are apprehended often don’t hear from them for months. And when they do manage to see them again, it’s never in person but rather via video stream from the prison visitor area.

During a conversation with a rug salesman at the market in Hotan, a woman in a short dress shows up and joins the chat. She says she works for an office nearby, and that she has taken the day off. She offers to translate the conversation with the salesman from Uighur into Chinese. No, she will later say as she walks across the nearly empty market, the store closures have nothing to do with re-education camps. “The employees were sent away for technical training,” she says. Then she politely says goodbye.

A few hours later, we arrive at the train station for the 500-kilometer (311 mile) ride to Kashgar. The station is guarded like a military base. Travelers must pass through three checkpoints and dozens of surveillance cameras to get to the platform.

“Ah,” the ticket inspector says to her colleague as we inquire about our seats. “This is the foreign journalist.” The train is nearly full, with hundreds of passengers aboard. A few compartments away, I later notice the woman in the short dress who offered her services as a translator at the market.

Kashgar: ‘Allergic Images’

The train to Kashgar takes six hours and passes by more oasis towns and settlements, the names of which are synonymous with the Uighur resistance in China: Moyu, Pishan, Shache, Shule. All the train stations are surrounded by checkpoints and barbed wire fences. When the train stops at a platform, the train dispatcher is often accompanied by a police officer with either a billy club or a gun.

Kashgar is more than 2,000 years old. It was one of the most important stations along the old Silk Road. Visitors could once gaze upon one of the best preserved Islamic old cities in central Asia, made almost entirely of mud houses. But the government demolished most of the old buildings and erected a picturesque tourist quarter in its stead.

Unlike in Urumqi and Turpan, most taxis in Kashgar are outfitted with two cameras. One is aimed at the passenger up front while the other points at those in the backseat. “That was imposed over a year ago,” one driver says. “The cameras are directly connected to Public Security. They turn them on and off whenever they want. We have no influence.”

Normal journalistic research in Kashgar is inconceivable. No one wants to talk. A Uighur human rights activist who met up with us four years ago didn’t respond to a single one of our text messages. His phone number is no longer listed. As we later learned, he disappeared months ago. But whether he was thrown into a re-education camp or prison is unknown.

And then the police officers from the beginning of this story show up again and don’t let us out of their sight.

There’s a bit of drama as we buy apricots from a fruit shop. We speak to a woman who’s sitting and reading a book. It’s a language book: the woman is learning to speak Chinese. In the south of Xinjiang, very few Uighurs above the age of 20 speak Chinese well.

We only exchange a few words with the woman, but as we leave the store, three of our minders, including a woman in a red jacket, walk inside and confront her. I go back and begin to film the scene with my phone. Surprised, the government officials stop the conversation, pretend to be shopping and hide their faces.

An hour later, a police officer flanked by several government officials approaches us. The woman in the red coat is with them. She’s a tourist, the officials claim, and she just learned that she was filmed without her permission. According to Chinese law, the footage must now be deleted. The officer escorts us to a police station, where he confiscates the phone and not only deletes the clip from the fruit stand, but also other clips in which our government minders are recognizable. One of the officials warns us against taking any more such “allergic images.” We are then allowed to go.

The surveillance infrastructure in Kashgar is state of the art, but the Chinese government is already working on the next level of control. It wants to introduce a “social credit system” that rates the “trustworthiness” of each citizen, to reward loyalty and punish bad behavior. While the rollout of this system in the densely populated east has been sluggish and spotty, the Uighurs are evidently already subjected to a similar point-based system. This system primarily involves details that could be interesting to the police.

Every family begins with 100 points, one person affected by the system tells us. But anyone with contacts or relatives abroad, especially in Islamic countries like Turkey, Egypt or Malaysia, is punished by losing points. A person with fewer than 60 points is in danger. One wrong word, a prayer or one telephone call too many and they could be sent to “school” in no time.





First published: 02 August 2018

Funding information
National Natural Science Foundation of China, Grant Numbers: 61562093, 61772575; China Education & Research Network Innovation Project, Grant Numbers: NGII20170419, NGII20170631



Salient facial feature discovery is one of the important research tasks in ethnical group face recognition. In this paper, we first construct an ethnical group face dataset including Chinese Uyghur, Tibetan, and Korean. Then, we show that the sparse sensing approach, which is effective for general face recognition, no longer works for ethnical group facial recognition if features based on the whole face image are used. This is partially due to the fact that each ethnical group may have its own characteristics manifesting only in specific face regions. Therefore, we analyze the particularity of the three ethnical groups and aim to find common characterizations in some local regions for them. For this purpose, we first use the facial landmark detector STASM to find important landmarks in a face image; then we use the well‐known data mining technique, the mRMR algorithm, to select the salient geometric length features from all possible lines connecting any two landmarks. Second, based on these selected salient features, we construct three “T” regions in a face image for ethnical feature representation and show them to be effective areas for ethnicity recognition. Finally, extensive experiments are conducted, and the results reveal that the proposed “T” regions with extracted features are quite effective for ethnical group facial recognition when the L2‐norm is adopted using the sparse sensing approach. For comparison with general face recognition, the proposed three “T” regions are evaluated on the Olivetti Research Laboratory (ORL) face dataset, and the results show that the “T” regions constructed for ethnicity recognition are not suitable for general face recognition.

This article is categorized under:

  • Algorithmic Development > Structure Discovery
  • Algorithmic Development > Biological Data Mining
  • Fundamental Concepts of Data and Knowledge > Knowledge Representation
  • Technologies > Classification



The analysis of race, nation, and ethnical groups based on facial images has recently become a popular topic in the face recognition community (Fu, He, & Hou, 2014). With the rapid advance of globalization, face recognition has great application potential in border control, customs checks, and public security. Meanwhile, it is also an important research branch in physical anthropology. Usually, facial features are influenced jointly by genes, environment, society, and other factors. However, the genes of one ethnical group are hardly unique and may include various gene fragments from other ethnical groups. Hence, this may lead to similarities of facial features among several ethnicities (Jianwen, Lihua, Lilongguang, & Shourong, 2010). Therefore, it is significant to analyze facial attributes for different ethnicities computationally, using artificial intelligence. This work is also helpful to research in anthropology, as it may indicate how facial features evolve (Cunrui, Qingling, Xiaodong, Yuangang, & Zedong, 2018).

This paper focuses on the analysis of some Chinese ethnical groups. First, it is necessary for us to differentiate three definitions, namely race, nation, and ethnicity (Wade, 2007). Race is a concept which is formed based on differences in physical structures such as skin, hair, and so on, while nation is a social‐oriented concept which refers to a community based on the economics, language, and culture of a given area. Ethnicity describes a group of people who have similar genes, culture, and language in geographically close regions. One can find that race and ethnicity are closely related though they have differences. For example, Chinese includes ethnical groups such as Han, Korean, Jing, Mongolian, Tibetan, Qiang, Miao, Turkic, Jurchen, and so on (Shiyuan, 2002). Based on homologous genes, ethnicities are steady groups and their facial features are regular and exhibit certain patterns. Although race and ethnicity are closely related, the analysis of facial features among ethnicities is more difficult than that among races, as facial features are harder to discriminate between ethnicities than between races (Fu et al., 2014).

Also, in cognitive process for a human face, human brains receive ethnicity or race information prior to age, gender, and expression. As shown in Figure 1, the information of ethnicity or race is processed in 80–120 ms, and the rest features, such as age and gender, are then gradually perceived later (Ito & Bartholow, 2009). This implies that race or ethnicity information is very important in face recognition.


Figure 1 — The order of attribute identification in face recognition


In recent years, sparse representation (SR) has found broad application in face recognition, expression recognition, and age estimation (Ortiz, Wright, & Shah, 2013; Ptucha, Tsagkatakis, & Savakis, 2011; Sun, Wang, & Tang, 2015; Wagner et al., 2012), but it has rarely been used in ethnical group facial analysis (Fu et al., 2014). Although SR is highly effective for face recognition in general, it is not effective in ethnicity recognition with features from the whole face image, as demonstrated in this paper, especially when the sample size of each ethnicity is small. We believe this phenomenon is due to the fact that the significant facial features for each ethnicity are located only in some typical regions of a face image, and the features from other regions reduce the discriminative capability for ethnical group recognition. Thus, we need to find the salient regions for these corresponding features and discover the effective facial features for ethnicity recognition.



The past decade has witnessed the increasing popularity of facial ethnical recognition. Many studies have been conducted on extracting ethnical facial features using various approaches such as geometrical features, holistic features, local features, and fusion features. Chan and Bledsoe (1965) analyzed the facial features of the White by using the distance and ratio of facial geometrical features. According to the geometrical relationship of eyes, mouth, and underjaw, Kanade (1977) matched face images in a dataset constructed by himself. Brunelli and Poggio (1993) measured face similarity using facial geometrical features, which include nose length, mouth width, and underjaw shape, and the results indicated that geometrical features could be used to identify ethnical groups quite well. Brooks and Gwinn (2010) analyzed the differences between the White and Black using the skin color. According to their proposed skin color model, Gwinn extracted the facial features from Asian and European. Akbari and Mozaffari (2012) explored the relations of facial skin color using south Indian, Australian, and African. Anzures, Pascalis, Quinn, Slater, and Lee (2011) confirmed that skin color was very sensitive to illumination, so that the skin color was usually fused in combined features to classify people primarily. Since Turk and Pentland (1991) successfully proposed principal component analysis (PCA) for facial feature analysis including eyes, nose, and mouth, PCA has been a popular method in face recognition. Based on PCA, Levine (1996) conducted facial feature extraction between Burman and non‐Burman. Awwad, Ahmad, and Salameh (2013) accomplished facial features analysis for Arabian, Asian, and Caucasian. Based on scale, illumination and pose, Yan and Zhang (2009) used PCA to analyze the facial features on CMU and UCSD databases.
Recently, many deep neural network methods have also been used for face analysis and recognition (Chen, Zhang, Dong, Le, & Rao, 2017; Luan et al., 2018; Trigeorgis, Snape, Kokkinos, & Zafeiriou, 2017; Zhang, Song, & Qi, 2017). Srinivas et al. (2017) focused on predicting ethnicity using a convolutional neural network (CNN) with the Wild East Asian Face Dataset.

Local features can reduce the influence of illumination and occlusion, and they usually perform better than holistic features. For example, wavelets and local binary patterns (LBP) have shown their effectiveness on the FERET database (Kumar, Berg, Belhumeur, & Nayar, 2011; Salah, Du, & Al‐Jawad, 2013). In addition, Fu, Yang, and Hou (2011) analyzed facial expression using embedded topographic independent component analysis (TICA), and the results showed the advantages of local features. However, combined features, which usually include skin color features, local wavelet features, and holistic features, were used in practice instead of a single type of facial feature. Ding, Huang, Wang, and Chen (2013) described face representations using texture and geometrical shape. Previously, we also combined several different geometric features to represent ethnical groups, such as length, angular, and proportion features (Li et al., 2017). In addition, the semantic descriptions for ethnical groups were constructed based on Axiomatic Fuzzy Set (AFS) theory, and the manifolds of ethnical groups were learned in our recent study (Duan, Li, Wang, Zhang, & Liu, 2016; Wang, Duan, Liu, Wang, & Li, 2016).

SR has been intensively used in the fields of face recognition, expression analysis, age estimation, and facial image super‐resolution (Dian, Fang, & Li, 2017). Wright, Yang, Ganesh, Sastry, and Ma (2009) proposed sparse representation‐based classification (ESRC) approach and brought the SR into face recognition. It assumed that a face image could be viewed as a sparse linear representation of other face images for the same person. Aharon, Elad, and Bruckstein (2006) applied the ESRC approach directly for occluded facial expression recognition. The performance is not as good as expected due to the fact that the identity information of a human face is more obvious than that of expression, which implies that the features of identity would affect facial expression recognition severely. Recently, the SR has been extended to some recognition tasks with small sample size. Mairal, Leordeanu, Bach, Hebert, and Ponce (2008) proposed the extended ESRC approach, which has refined SR by adding general learning in the framework of ESRC. This method improved the performance for the small sample size face recognition problem and the single sample based face recognition problem by utilizing the information extracted from other datasets. Yang, Zhang, Yang and Zhang (2010) proposed the sparse variation dictionary learning (SVDL) approach, in which one could obtain the projection matrix according to a training set. The SVDL was then embedded in ESRC to conduct face recognition. However, SVDL needs plenty of training data containing all types of images for each class to learn an effective dictionary. Yang, Zhang, Yang, and Niu (2007) proposed the sparse illumination learning and transfer (SILT) approach. This approach could match a few targets for obtaining information of face images with different illuminations. The methods mentioned above improve face recognition performance to different extents, and have also made significant progress on the small sample size problem in face recognition.
In this paper, we aim to use the SR approach to solve the ethnical group recognition with extracted regional features via data mining.



In order to investigate ethnicity description and recognition, we collected a dataset of facial images of students from different ethnical groups on the campus of Dalian Minzu University, aged 18 to 22 years. The database includes three ethnicities, namely Korean, Tibetan, and Uyghur. The students of the three ethnicities are from the regions inhabited by the corresponding ethnical groups, as shown in Figure 2. For each ethnicity, 100 students are selected and their facial images are captured. The capture environment and setup are illustrated in Figure 3, in which we have three cameras and three lights, with one person sitting in the center. The images of several participants are shown in Figure 4. Note that only the frontal images are used in this paper, though we have collected images with different poses and expressions.


Figure 2 — The living area distribution of three ethnicities


Figure 3 — Data capture environment


Figure 4 — A part of face dataset



Due to the variations in pose, illumination, and camera parameters, it is necessary to align the images before further processing. This mainly involves face alignment and illumination normalization. The aim of face alignment is to correct face pose and resize the face resolution. The details are shown as follows:

  • The coordinates of the eyes are obtained automatically by an eye detector, and the coordinates of the two eye corners are denoted by E_l and E_r.
  • The face image is rotated to make the line segment connecting E_l and E_r horizontal.
  • The facial area is cropped out according to the ratio of the eye separation to the rest of the face.
  • The cropped face images are resized to a given resolution.
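The rotation step in the list above can be illustrated with plain geometry. The following is a minimal sketch, not the authors' implementation: the function names and the sample coordinates are hypothetical, and cropping and resizing are left out.

```python
import math

def eye_alignment_angle(e_l, e_r):
    """Angle in degrees of the line running from eye corner e_l to e_r."""
    dx, dy = e_r[0] - e_l[0], e_r[1] - e_l[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(p, center, angle_deg):
    """Rotate p about center by -angle_deg, which levels the eye line."""
    t = math.radians(-angle_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(t) - y * math.sin(t),
            center[1] + x * math.sin(t) + y * math.cos(t))

# Hypothetical eye-corner coordinates (pixels), for illustration only.
e_l, e_r = (30.0, 40.0), (70.0, 50.0)
center = ((e_l[0] + e_r[0]) / 2, (e_l[1] + e_r[1]) / 2)
angle = eye_alignment_angle(e_l, e_r)
new_l = rotate_point(e_l, center, angle)
new_r = rotate_point(e_r, center, angle)
```

After the rotation both eye corners share the same vertical coordinate, so the segment between them is horizontal. In practice the same angle would be applied to the whole image by an image library.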

As the skin color and texture of different ethnicities vary a lot, due to influence rendered by gene or environment, we conduct the illumination normalization in face image preprocessing stage. However, illumination normalization will affect the skin color. According to literature (Brooks & Gwinn, 2010), the skin color has a poor correlation with facial ethnic attributes. Hence, illumination normalization is implemented and the skin color change is ignored in our study.

Many methods have been proposed to deal with illumination variations (Biglari, Mirzaei, & Ebrahimpour‐Komeh, 2013) such as single scale retinex (SSR), multiscale retinex (MSR), and homomorphic filtering (HOMO). In this paper, the SSR is used to normalize the illumination variations for simplicity. Assuming that the light is smoothly distributed over space, the brightness of an object depends on the lighting of the environment and the reflection of the object surface, as shown in formula (1),

S(x, y) = L(x, y) \cdot R(x, y)  (1)

where S(x, y) is the facial image captured by camera, L(x, y) indicates component of lighting, and R(x, y) represents reflection components of object. In order to separate reflection components and lighting components, logarithm operation is used as follows:

\log S(x, y) = \log L(x, y) + \log R(x, y)  (2)

where R(x, y) corresponds to the high‐frequency components of the image and L(x, y) represents the low‐frequency components of the image. In order to obtain R(x, y), the Gaussian filter (Hyvärinen, Hoyer, & Oja, 1999) is then applied to estimate L(x, y) as follows:

L(x, y) = S(x, y) * G(x, y)  (3)

where * denotes convolution and G(x, y) is the Gaussian function with G(x, y) = K \exp(-(x^2 + y^2) / c^2); c is the scale of Gaussian function; x is the size of Gaussian kernel; y is standard deviation of Gaussian distribution; K is a constant chosen so that ∬G(x, y)dxdy = 1.

As shown in Figure 5, the experimental results demonstrate that SSR not only has good performance in illumination normalization, but also has a quick computational speed.
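The SSR computation described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes the standard SSR form in which the illumination is estimated by Gaussian-blurring the image and the output is log S minus log of that estimate; the function names, the kernel radius, the border handling, and the `eps` guard against log(0) are all hypothetical choices.

```python
import math

def gaussian_kernel(c, radius):
    """1-D Gaussian weights exp(-t^2 / c^2) for t in [-radius, radius]."""
    return [math.exp(-(t * t) / (c * c)) for t in range(-radius, radius + 1)]

def blur_1d(row, kernel):
    """Blur one row; weights are renormalised at the borders to sum to 1."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = wsum = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(row):
                acc += w * row[j]
                wsum += w
        out.append(acc / wsum)
    return out

def blur_2d(img, kernel):
    """Separable Gaussian blur: rows first, then columns."""
    rows = [blur_1d(r, kernel) for r in img]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def single_scale_retinex(img, c=2.0, radius=4, eps=1e-6):
    """Return log S - log(G * S), an estimate of the log-reflectance."""
    illum = blur_2d(img, gaussian_kernel(c, radius))
    return [
        [math.log(s + eps) - math.log(l + eps) for s, l in zip(srow, lrow)]
        for srow, lrow in zip(img, illum)
    ]

# Toy grey-level image (hypothetical values) and a copy lit twice as brightly.
img = [[10.0 * (x + 1) + y for x in range(6)] for y in range(5)]
bright = [[2.0 * v for v in row] for row in img]
base_out = single_scale_retinex(img, eps=0.0)
bright_out = single_scale_retinex(bright, eps=0.0)
```

Because the blur is linear, doubling the brightness of every pixel leaves the output unchanged. That is the sense in which the method normalizes illumination.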


Figure 5 — The results of face image using single scale retinex



5.1 The kNN‐based fast sparse sensing for ethnicity recognition

The SR for facial ethnicity recognition contains two steps. First, the K‐nearest neighbors of a sample are selected from the whole training set for each group. Second, the sample is described and categorized by the selected K‐nearest neighbors via SR. The testing sample is described as a linear combination of its K‐nearest neighbors (Waqas, Yi, & Zhang, 2013).

The proposed fast SR algorithm consists of three steps: K‐nearest neighbors identification, linear representation, and classification. In K‐nearest neighbors identification, the K‐nearest neighbors are identified and the corresponding labels are recorded. If a training sample belongs to the jth (j = 1, 2, ⋯, L) class, j is taken as its label. Suppose {x_1, ⋯, x_K} are the K‐nearest neighbors of a testing sample y; their labels form a new set C = {c_1, c_2, ⋯, c_d}. The number of elements d in this set is less than or equal to both L and K. That is to say, C is a subset of {1, 2, ⋯, L}.

The testing sample y could be represented as a linear combination of the K‐nearest neighbors

y = a_1 x_1 + a_2 x_2 + \cdots + a_K x_K  (4)

where a_i (i = 1, 2, ⋯, K) are the coefficients. Formula (4) can be rewritten as follows:

y = XA  (5)

where A = [a_1, ⋯, a_K]^T and X = [x_1, ⋯, x_K]. Our aim is to minimize the error between XA and y while keeping the norm of A as small as possible. This optimization problem can be described by a Lagrangian function,

L(A) = \| y - XA \|_2^2 + \mu \| A \|_2^2  (6)

where μ is a positive constant. According to the Lagrangian method, A should satisfy \partial L(A) / \partial A = 0. Therefore, the optimal solution can be obtained as follows:

A = (X^T X + \mu I)^{-1} X^T y  (7)

where I is an identity matrix. The class label of a testing sample is estimated according to the weighted contribution of its K‐nearest neighbors in the SR (Wang et al., 2016). Specifically, if the subset {x_s, ⋯, x_t} of the K‐nearest neighbors of a testing sample belongs to the rth (r ∈ C) class, the contribution of the rth class is described as follows:

g_r = a_s x_s + \cdots + a_t x_t  (8)

The error between g_r and the testing sample is given in formula (9).

e_r = \| y - g_r \|^2  (9)

The smaller the value of e_r = ‖y − g_r‖^2, the greater the contribution of the rth class. The testing sample y is then classified into the class which has the greatest contribution. In addition, if none of the K‐nearest neighbors is from the rth class, r does not belong to C. Hence, the SR will not classify the sample y as the rth class. Two kinds of similarity measure are usually used in the SR. One is Euclidean distance (Aharon et al., 2006), the other is cosine measure,

s(y, x_i) = \frac{y^T x_i}{\| y \| \, \| x_i \|}  (10)

where s(y, x i) represents the similarity between y and ith neighbor.
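Returning to formulas (6) and (7), the step between them can be written out explicitly. The derivation below is a sketch that assumes the objective in (6) is the usual squared reconstruction error plus a squared L2 penalty on A, which is the form consistent with the closed-form solution containing the identity matrix I.

```latex
\begin{aligned}
L(A) &= \| y - XA \|_2^2 + \mu \| A \|_2^2, \\
\frac{\partial L}{\partial A} &= -2 X^{T} (y - XA) + 2 \mu A = 0, \\
(X^{T} X + \mu I)\, A &= X^{T} y
\;\Longrightarrow\;
A = (X^{T} X + \mu I)^{-1} X^{T} y .
\end{aligned}
```

Since μ is positive, the matrix X^T X + μI is positive definite and therefore always invertible, even when the K neighbors are linearly dependent.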

5.2 Holistic ethnical facial features based on SR

In this paper, the “O” region represents the whole face. We first implement SR based on holistic facial features, which implies that the whole image of a testing sample y is approximated by a linear combination of all the training images. The class label of a testing sample y is then assigned based on the difference between y and the weighted combination of the samples from each class. Let A = (A_1, A_2, ⋯, A_n) denote n training samples; the testing sample y can be approximated as a linear combination of all training samples

y = \beta_1 A_1 + \beta_2 A_2 + \cdots + \beta_n A_n  (11)

Without loss of generality, the formula is expressed as follows:

y = A\beta

where β = (β_1, β_2, ⋯, β_n)^T, A = (A_1, A_2, ⋯, A_n).

If A^T A is nonsingular, the coefficients β could be obtained by β = (A^T A)^{-1} A^T y. Otherwise, if A^T A is singular, β could be calculated by β = (A^T A + γI)^{-1} A^T y, where γ is a small positive number, and I is an identity matrix.

It can be seen from formula (11) that each training sample has its contribution to the representation of the testing sample, and the contribution of the ith training sample is β_i A_i. Suppose the training samples from the kth class are A_s, ⋯, A_t, and the total contribution of these samples to the testing sample y is denoted by g_k = β_s A_s + ⋯ + β_t A_t; the error of the SR could be calculated by e_k = ‖y − g_k‖^2. A smaller error value implies that the contribution from the samples of the kth class is greater.

We now conduct a simple experiment for ethnicity recognition based on the holistic facial features of the dataset captured in this paper. The experimental results are shown in Table 1: the accuracy rate of ethnicity recognition is only 45%, with 90% of the samples used for training and 10% for testing in 10‐fold experiments. It can be seen that the ethnicity recognition accuracy based on holistic facial features is quite low. In fact, the ethnic face is represented by a sparse combination of various faces. However, one particular problem of ethnicity recognition is that the ethnic attributes come from various individuals, but the facial attributes of individuals from different ethnicities may have significant contributions. In addition, holistic face features may contain features that are insensitive to ethnicity classification, since facial ethnical differences are mainly conveyed by local features. Hence, it is important to identify the local facial regions that are related to ethnic differences, and to investigate whether the sparsity of such local features is useful for facial feature representation in terms of ethnicity recognition. We will investigate the local feature extraction issue via a data mining approach in the next section.

Table 1. The ethnicity recognition based on holistic features

5.3 Salient ethnic facial region extraction

In this section, some salient ethnic facial regions will be investigated based on the three ethnicities. Since the geometric features are often used in anthropometry, this work also analyzes salient ethnic facial regions according to geometrical relationship of key points based on facial components. Here, we use the facial landmark detector STASM (Milborrow & Nicolls, 2014) to extract 77 landmarks as shown in Figure 6.


Figure 6 — Landmarks obtained using STASM


Based on these 77 landmarks, we can construct 2,926 facial features by connecting any two landmarks. Considering the redundancy and relevance of the obtained line features, the well‐known data mining technique, the minimal‐redundancy‐maximal‐relevance (mRMR) (Ding & Peng, 2005; Peng, Long, & Ding, 2005) feature selection method, is applied to select the most salient features. According to mutual information, the mRMR aims to select the significant features based on the minimal redundancy and maximal relevance, using the Equations 12 and 13,

\max_{F} D(F, c), \quad D(F, c) = \frac{1}{|F|} \sum_{f_i \in F} I(f_i, c)  (12)
\min_{F} R(F), \quad R(F) = \frac{1}{|F|^2} \sum_{f_i, f_j \in F} I(f_i, f_j)  (13)

where F is the facial geometrical feature subset, c is the class label of ethnicities, and f_i is the ith feature of F. I(f_i, c) is the mutual information between feature f_i and class c, and I(f_i, f_j) indicates the mutual information between f_i and f_j. The mutual information is calculated by Equation 14, and the mRMR selection criterion is given by Equation 15.

I(x, y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}  (14)
\max_{F} \left[ D(F, c) - R(F) \right]  (15)
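The quantities used by mRMR can be made concrete with a small sketch for discrete variables. This is only an illustration under stated assumptions: it uses the empirical (plug-in) estimate of mutual information and the "relevance minus mean redundancy" form of the criterion; the function names and the toy data are hypothetical, and continuous length features would first have to be discretized.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (nats) of two equally long discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (cnt / n) * math.log((cnt / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), cnt in pxy.items()
    )

def mrmr_score(candidate, selected, labels):
    """Relevance of a candidate feature minus its mean redundancy with selected ones."""
    relevance = mutual_information(candidate, labels)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(candidate, s) for s in selected) / len(selected)
    return relevance - redundancy

# Toy data (hypothetical): f1 determines the label, f2 is independent of it.
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
mi_f1 = mutual_information(f1, labels)
mi_f2 = mutual_information(f2, labels)
```

A greedy selector would repeatedly pick the unselected feature with the highest `mrmr_score`. A copy of an already selected feature scores poorly because its redundancy cancels its relevance.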

Based on these 2,926 facial features, 195 salient length features are then selected out to represent the ethnic attributes of the three ethnicities using the mRMR approach. These features are divided into four parts, and then compared with anthropological features (Farkas, 1994). As shown in Figure 7, these four parts of the features are plotted on facial images. Figures 7a, b, c, and d show 19, 37, 63, and 65 length features, respectively. One can see that the best weights of features focus on nose, eyes, and eyebrows and these feature regions together form a “T” region, which can be seen clearly in Figure 7a and c. With the weights decreasing, the important region would extend to mouth area gradually. This observation shows that this “T” region is ethnic salient as demonstrated in next section.
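The figure of 2,926 candidate features quoted above is simply the number of unordered pairs among 77 landmarks. The short sketch below checks the count and computes one length per pair; the landmark coordinates are placeholders, not real STASM output.

```python
import math
from itertools import combinations

def pairwise_lengths(landmarks):
    """Euclidean length of every line joining two distinct landmarks."""
    return [math.dist(p, q) for p, q in combinations(landmarks, 2)]

# 77 placeholder points (hypothetical coordinates, for illustration only).
landmarks = [(float(i), float(i * i % 7)) for i in range(77)]
lengths = pairwise_lengths(landmarks)
n_pairs = math.comb(77, 2)  # 77 * 76 / 2
```

Both `len(lengths)` and `n_pairs` equal 2,926, matching the number of line features stated in the text.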

Figure 7 — The various weight of length facial features

5.4 Local facial feature based ethnicity SR

Following the analysis in the previous section, the ethnic‐salient “T” regions are first identified according to the analysis of facial geometrical features, and the facial “T” regions are then used to recognize the ethnicity. To deal with the various situations described below, we propose three types of “T” regions with different shapes, denoted as “T1,” “T2,” and “T3,” which contain different facial components. As shown in Figure 8, “T1” includes eyes and nose, “T2” contains eyebrows, eyes, and nose, and “T3” contains eyebrows, eyes, nose, and mouth. Furthermore, the images of “T” regions are encoded according to zigzag rule for feature extraction in ethnicity recognition. “O” region represents the whole face image as explained below in Figure 9.


Figure 8 — Facial feature region of various weights
Figure 9 — The image coding of “T” region
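The exact zigzag rule of Figure 9 is not reproduced here. As an illustration of what such an encoding can look like, the sketch below flattens a 2-D region by walking its anti-diagonals in alternating directions, as in the JPEG scan order. The choice of that scan order and the name `zigzag_scan` are assumptions.

```python
import numpy as np

def zigzag_scan(region):
    """Flatten a 2-D array by traversing anti-diagonals in alternating directions."""
    region = np.asarray(region)
    h, w = region.shape
    out = []
    for s in range(h + w - 1):              # s = row index + column index
        rows = range(max(0, s - w + 1), min(s, h - 1) + 1)
        if s % 2 == 0:                      # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        out.extend(region[i, s - i] for i in rows)
    return np.array(out)

# 3x3 example: the scan visits the top-left corner first and the bottom-right last
print(zigzag_scan(np.arange(9).reshape(3, 3)).tolist())  # [0, 1, 3, 6, 4, 2, 5, 7, 8]
```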

In the following analysis, the feature vector of the “T” region is extracted to represent ethnic attributes. The K‐nearest neighbors of a testing sample are selected based on the features from the corresponding “T” regions. The SR approach described in the previous section is then applied to describe the ethnicity attributes, which are located only in these “T” regions. The detailed algorithm is as follows:

  • The “T” regions are identified based on the landmarks obtained by STASM.
  • The facial images are divided into a training set X = [x_1, x_2, ⋯, x_m] and a testing set Y = [y_1, y_2, ⋯, y_n], where x_i and y_i are the feature vectors extracted from the corresponding “T” regions.
  • The K‐nearest neighbors of each testing sample are selected, and their training labels are recorded.
  • The testing sample y ∈ Y is represented by a linear combination:
[Equation 16: representation of y as a linear combination]  (16)
  • Problem (16) can be solved by Lagrange optimization; the optimal solution is given by:
[Equation 17: optimal solution of (16)]  (17)
  • The contribution g_r of every class r is calculated:
[Equation 18: class contribution g_r]  (18)
  • The error e_r between y and g_r is obtained, and the class label of y is then identified according to e_r:
[Equation 19: error e_r between y and g_r]  (19)

Different norms can be selected in (19).
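The steps above can be sketched in a few lines of NumPy. Because Equations 16 to 19 are not legible in this copy, the sketch assumes a ridge-regularized least-squares representation over the K nearest neighbors, with a closed-form solution. The regularization weight `lam`, the Euclidean neighbor search, and the function name `knn_sr_classify` are all assumptions rather than the authors' exact formulation.

```python
import numpy as np

def knn_sr_classify(train, train_labels, y, k=5, lam=1e-3, norm_ord=2):
    """Assign y to the class whose neighbors reconstruct it with the smallest error.

    train: (m, d) array with one training feature vector per row.
    Returns (predicted_label, {label: error}).
    """
    train = np.asarray(train, dtype=float)
    train_labels = np.asarray(train_labels)
    y = np.asarray(y, dtype=float)

    # K nearest neighbors of y by Euclidean distance (assumed metric)
    dist = np.linalg.norm(train - y, axis=1)
    nn = np.argsort(dist)[:k]
    A = train[nn].T                     # (d, k): neighbors as dictionary columns
    nn_labels = train_labels[nn]

    # Coefficients of the linear combination, assumed ridge closed form:
    # alpha = (A^T A + lam * I)^(-1) A^T y
    alpha = np.linalg.solve(A.T @ A + lam * np.eye(len(nn)), A.T @ y)

    # Contribution g_r of each class and its reconstruction error e_r
    errors = {}
    for r in np.unique(nn_labels):
        mask = nn_labels == r
        g_r = A[:, mask] @ alpha[mask]
        errors[r] = float(np.linalg.norm(y - g_r, ord=norm_ord))
    return min(errors, key=errors.get), errors
```

The `norm_ord` argument corresponds to the choice of norm in the error step: passing 0, 1, or 2 switches between the L0, L1, and L2 measures of the residual.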

In summary, to represent each ethnic group effectively, we used the STASM facial landmark detector to extract 77 landmarks from each facial image and constructed 2,926 geometrical facial features from them. Because this number of features is too large, we used the data mining approach mRMR to select the salient geometrical features for the three ethnic groups, which yielded 195 salient features. These salient features are mainly located in a “T” region, from which three types of “T” regions are constructed. We believe the features in these “T” regions are the most important for ethnic group recognition. In the next section, we demonstrate the effectiveness of the proposed framework.
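The count of 2,926 is consistent with taking the distance between every unordered pair of the 77 landmarks, since 77 × 76 / 2 = 2,926. A minimal sketch of that construction follows. It assumes 2-D landmark coordinates and Euclidean distance, and the name `pairwise_length_features` is hypothetical.

```python
import numpy as np
from itertools import combinations

def pairwise_length_features(landmarks):
    """Euclidean distance between every unordered pair of landmarks.

    landmarks: (n, 2) array of (x, y) coordinates.
    Returns (distances, pairs), where pairs[k] is the (i, j) index pair of distances[k].
    """
    landmarks = np.asarray(landmarks, dtype=float)
    pairs = list(combinations(range(len(landmarks)), 2))
    i, j = np.array(pairs).T
    return np.linalg.norm(landmarks[i] - landmarks[j], axis=1), pairs

# 77 landmarks (random stand-ins here) give 77 * 76 / 2 length features
rng = np.random.default_rng(0)
lengths, pairs = pairwise_length_features(rng.random((77, 2)))
print(len(lengths))  # 2926
```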



In this section, we conduct several experiments on face images of the Uyghur, Tibetan, and Korean groups. Four types of regions, namely “O,” “T1,” “T2,” and “T3,” are established to extract ethnic salient features using the data mining technique mRMR. The captured face images are first preprocessed: the faces are aligned and the illumination is normalized. The effectiveness of the extracted features is then verified using several different norms.

The performance of the ethnicity recognition models on the different “T” regions is evaluated with several criteria: the true positive rate (TPR), false positive rate (FPR), Precision, Recall, and F‐measure, as defined in (Anselmo, 1991; Bouckaert et al., 2010; Han, Pei, & Kamber, 2011). We first conduct experiments on the different “T” regions to validate the effectiveness of the proposed approach.
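For reference, the five criteria can be computed from the confusion counts of a single class treated one-versus-rest. The sketch below uses the usual textbook definitions, which are assumed to match those in the cited works. The function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """TPR, FPR, precision, recall, and F-measure from confusion counts."""
    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is never seen or predicted
        return num / den if den else 0.0

    tpr = ratio(tp, tp + fn)            # true positive rate
    fpr = ratio(fp, fp + tn)            # false positive rate
    precision = ratio(tp, tp + fp)
    recall = tpr                        # recall is the same quantity as TPR
    f_measure = ratio(2 * precision * recall, precision + recall)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision,
            "Recall": recall, "F-measure": f_measure}

# Example: 8 true positives, 2 false positives, 18 true negatives, 2 false negatives
print(binary_metrics(8, 2, 18, 2))
```

For a three-class problem such as the one studied here, the counts would be taken per class and the resulting metrics averaged; how the averaging is weighted is a separate choice.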

6.1 The effectiveness of the three “T” regions

In this section, the number of coefficients K is set to 90, and the L2 norm is applied in the SR. Table 2 lists the results obtained for the different types of “T” regions. The result for the “T3” region is the best among all regions, which indicates that the “T” region comprising the eyes, eyebrows, nose, and mouth is more effective for ethnicity recognition than the other “T” regions. Meanwhile, the “O” region gives the worst results of all regions, because holistic facial images contain too much identity information relative to ethnic group features. The SR based on local features is therefore an effective approach to facial ethnicity recognition. We believe that identifying the “T” region via data mining plays an important role in this result.

Table 2. The results based on various “T” regions
FPR, false positive rate; TPR, true positive rate.


6.2 Parameter selection

To study the influence of the norm and the number of neighbors K, a series of different norms and values of K are used to identify ethnicities, and the recognition performance is compared across the features from the different facial regions. Specifically, the L0, L1, and L2 norms are adopted to evaluate recognition performance, and the accuracy curves are plotted in Figure 10, Figure 11, and Figure 12, respectively.
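As a reminder of what the three norms measure, the snippet below evaluates them on a small residual vector with `numpy.linalg.norm`. The residual values are made up for illustration, and how each norm enters the authors' pipeline is not specified beyond the text above.

```python
import numpy as np

residual = np.array([0.0, 3.0, -4.0])   # hypothetical reconstruction residual

l0 = np.linalg.norm(residual, ord=0)    # number of nonzero entries
l1 = np.linalg.norm(residual, ord=1)    # sum of absolute values
l2 = np.linalg.norm(residual, ord=2)    # Euclidean length

print(l0, l1, l2)  # 2.0 7.0 5.0
```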


Figure 10 — The accuracy based on L0 norm


Figure 11 — The accuracy based on L1 norm


Figure 12 — The accuracy based on L2 norm


Figure 10 shows the ethnicity recognition accuracy when the performance is evaluated using the L0 norm. The best accuracy is achieved when the number of neighbors K equals 77 and the features are extracted from the “T3” region.

The results obtained with the L1 norm are shown in Figure 11. The recognition rate peaks for the features from the “T3” region when the number of neighbors is 50. As the number of neighbors increases from 15 to 50, the accuracy achieved on the “T3” region increases gradually, with fluctuations. This suggests that the “T3” region is the salient region for ethnic feature extraction and ethnicity recognition.

Figure 12 presents the recognition results using the L2 norm. The best recognition performance is achieved with the features extracted from the “T3” region when the number of neighbors K is 80. Compared with the L0 and L1 norms shown in Figure 10 and Figure 11, the L2 norm achieves the highest accuracy, which indicates that it is more appropriate than the other two norms for facial ethnicity recognition. In addition, the experimental results show that the performance obtained with the features from the “T3” region is better than that of “T1” and “T2,” which means that identifying the ethnic salient region can improve the recognition rate significantly.

In summary, the proposed “T3” region, in combination with the L2 norm in the SR, is the most effective region for ethnicity recognition. Next, we develop a software platform for visualizing the facial ethnic feature description.

6.3 Facial ethnic feature description

Based on previous analysis, this work attempts to describe the ethnic attributes according to the contribution of testing samples. As shown in Figure 13, a facial ethnicity evaluation system is constructed based on the SR coefficients. The k‐nearest neighbors of a testing image on the left are determined based on the feature vector extracted from the “T” region. The SR coefficient (coe), the distance from the testing image to its k‐nearest neighbors (dis), and the ethnicity identity of the testing image (type) are then obtained accordingly.


Figure 13 — The software for face ethnic analysis


The error distance (err) from the testing sample to its k‐nearest neighbors can serve as an important reference for facial ethnic description. As illustrated in Figure 14, the error distance err of a testing sample to the Uyghur male class is 0.01992, which means that the most likely ethnic category of this sample is Uyghur. Although the k‐nearest neighbors of this testing sample belong to several different ethnicities, as can be seen in Figure 13, its ethnicity can be estimated more precisely from the error distance err. The ethnicity recognition therefore depends on the ethnic features in the constructed “T” region. Note that the error distance in the software platform is normalized for ease of use.


Figure 14 — The results of classifiers


6.4 The investigation of the “T” region for face recognition

The previous sections show that the facial “T” region is effective for ethnicity recognition. It is therefore natural to ask whether it is also useful for face recognition. To answer this question, face recognition experiments are conducted on the Olivetti Research Laboratory (ORL) database (Figure 15; Samaria, Harter & Harter, 1994). The ORL database includes 400 face images of 40 persons with minor pose variations and has been used to evaluate face recognition algorithms for decades. Because it lacks pattern variation, the recognition rates of many face recognition systems on it exceed 90%.


Figure 15 — Olivetti Research Laboratory face dataset


The facial images of the ORL database are divided into training and testing sets, and the feature vectors are extracted from the “T3” and “O” regions separately. The fast sparse classification based on k‐nearest neighbors is again used to perform face recognition, and the results are shown in Figure 16 and Table 3. The recognition rate obtained on the holistic face (“O” region) is clearly much better than that obtained on the local region (“T3” region). When k = 90, the recognition rate based on the “T3” region is only 63%, whereas the accuracy based on the holistic face reaches 90%. In fact, the performance achieved with the “T3” region never exceeds 70%, no matter how many neighbors are taken into consideration.


Figure 16 — Use of the Olivetti Research Laboratory database for testing


Table 3. Different T‐zone recognition results for the Olivetti Research Laboratory dataset


FPR, false positive rate; TPR, true positive rate.


The experimental results indicate that the constructed “T” region is suitable for ethnicity identification but not for face recognition. This is mainly caused by the difference in the reference samples used in the SR: the reference samples for ethnicity recognition come from different individuals, whereas those for face recognition come from one individual with different poses and expressions. Moreover, the ethnic salient information is concentrated in the “T” regions, but the information enclosed in the “T” regions is not sufficient for general face recognition. The facial features extracted from the “T” regions are more suitable for ethnicity recognition because the unrelated information has been filtered out.



This paper aims to extract salient features via data mining for ethnicity recognition. First, features extracted from holistic facial images are used for ethnicity recognition, and the recognition rate is quite low, because facial ethnic features differ from the features extracted for face recognition. Consequently, this work goes on to extract salient regions for ethnicity recognition. For this purpose, 77 facial landmarks are detected to construct features for ethnicity representation according to anthropometry. The distance between each pair of landmarks is used to form a feature set, producing 2,926 length features for ethnic group description, of which 195 are retained after mRMR feature selection. Second, based on the features selected with the data mining technique mRMR, three “T” regions containing the most salient ethnic features are constructed. Experiments are conducted on the features extracted from the holistic face and from the “T1,” “T2,” and “T3” regions; the results show that the features from the “T3” region achieve the best performance when the L2 norm is adopted. Third, to verify the suitability of the “T” region for face recognition, facial features are extracted from the “T” region on the ORL dataset, and the fast sparse classification approach based on k‐nearest neighbors is used to conduct face recognition. The results suggest that the proposed “T” region is not suitable for face recognition.

The contributions of this paper are as follows: (a) Holistic facial features are shown to be ineffective for ethnicity analysis and recognition based on sparse sensing recognition. (b) The ethnic salient “T” region is proposed for ethnic attribute description via a data mining technique. (c) The effectiveness of the “T” region for ethnicity classification is verified. (d) The applicability of the “T” region is investigated: it is suitable for ethnicity recognition but not for face recognition. In addition, this paper proposes a new approach for extracting facial ethnic features based on sparse description via data mining. The testing samples are sparsely represented and then assigned their ethnic category accurately, even under small sample sizes. Meanwhile, a framework for facial feature analysis is proposed, that is, a framework for salient area search based on data‐driven feature selection, which can improve the effectiveness of attribute discrimination using the SR.

In the future, we will use approaches other than the SR to investigate the ethnicity recognition problem, which differs from general face recognition, as shown in this paper. One possible direction is to extract the geometric features in the identified “T” region and use deep learning (Pathirage, Li, & Liu, 2017) or stochastic configuration neural networks (Wang & Li, 2017) for classification.



We thank the two reviewers and the Associate Editor for their constructive comments; the quality of this paper was significantly improved by careful revision based on them. This work is supported by the National Natural Science Foundation of China under grant numbers 61562093 & 61772575 and by the China Education & Research Network Innovation projects under grant numbers NGII20170419 & NGII20170631.