From Iljimae, a Korean drama.
“There are two kinds of sword — one to save others and one to kill others.”
The drama series shows us the power of a weapon-master. In the hands of a righteous weapon-master, a sword saves people from harm. In the hands of a ruthless one, it mows people down with ease. The range and capability of a powerful weapon are what make it feared; its use is limited only by imagination. The same goes for data: data is an extremely powerful weapon.
One illustration of this is a social experiment in Xinjiang, China.
In Xinjiang, the Chinese government is constantly rolling out technology to acquire data about its citizens and build their digital profiles. The most benign interpretation is that the technology measures physical and digital observables, which are then collated for applications such as crime prevention. A few examples of the physical observables that could be collected, and the information we could gather from them:
- The habits of people. We look for microscopic observables such as recognised faces, the times at which these faces are spotted at particular locations, and the cameras that detect them (which provide geolocation information). A repeated pattern raises the likelihood of a habit. For instance, if a person is spotted at a pub every Wednesday evening at about the same time, entering through the front door but never emerging from it, that could indicate a suspicious habit (what would make people enter through the front door, but not leave by it?).
- Crowd management. A problem with big events is the management of crowds. Left unmanaged, crowds lead to poor user experience and security threats. To help us, we measure macroscopic observables such as flux (the number of people moving through a defined gateway, area or volume per unit time) and crowd levels. If the flux across an area not designated as a chokepoint is too low, it could indicate an infrastructural constraint that needs fixing.
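The two observables above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any real system: the function names `recurring_slots` and `flux`, the choice to bucket sightings by weekday and hour, and the `min_count` threshold are all hypothetical, and the timestamps are made up.

```python
from collections import Counter
from datetime import datetime, timedelta


def recurring_slots(sightings, min_count=3):
    """Return the (weekday, hour) slots that occur at least min_count times.

    sightings: timestamps at which one face was seen at one location.
    Bucketing by weekday and hour is an assumption made for illustration.
    """
    counts = Counter((ts.weekday(), ts.hour) for ts in sightings)
    return {slot: n for slot, n in counts.items() if n >= min_count}


def flux(crossing_times, window_start, window_end):
    """Return people per second crossing a gateway in [window_start, window_end)."""
    seconds = (window_end - window_start).total_seconds()
    if seconds <= 0:
        raise ValueError("window must have positive duration")
    crossings = sum(window_start <= t < window_end for t in crossing_times)
    return crossings / seconds


# Made-up sightings: three Wednesday evenings and one Friday lunchtime.
sightings = [
    datetime(2024, 1, 3, 19, 5),
    datetime(2024, 1, 10, 19, 12),
    datetime(2024, 1, 17, 19, 1),
    datetime(2024, 1, 5, 12, 30),
]
habits = recurring_slots(sightings)  # {(2, 19): 3}, i.e. Wednesday, 19:00-19:59

# Made-up gate crossings: one person every half second for one minute.
start = datetime(2024, 1, 3, 18, 0)
crossings = [start + timedelta(seconds=0.5 * i) for i in range(120)]
rate = flux(crossings, start, start + timedelta(minutes=1))  # 2.0 people per second
```

Note that a recurring slot only says that a pattern repeats; whether the pattern means anything is a judgement the data cannot make on its own.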
These applications, by themselves, are not controversial. However, the actions taken upon the data are controversial, especially because such data may not be anonymised. There are use cases for non-anonymised data, such as physical security (identifying anomalous behaviour, and then acting on a security threat pre-emptively). However, the same non-anonymised data can be misused to socially engineer certain types of behaviour. Let us walk through how the accumulation of data can become controversial because of the stories such data can tell.
Let us assume we have the mobile number of a person, which means we can also trace the ISP of the person’s mobile. Through GPS data, we are able to obtain geolocation information, which means that we can track that person’s whereabouts. Through the camera network, we may be able to infer whether this person carries his or her mobile device wherever he or she goes. Since the government controls the car register, it can also check whether the licence plate of the car the person drives matches the digital records. Any discrepancy between the physical and digital records may be an anomaly worth investigating.
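The comparison of physical and digital records can be sketched as a simple lookup. This is only an illustration under stated assumptions: the function name `find_discrepancies`, the dictionary-based register and the identifiers are hypothetical, and the data is invented.

```python
def find_discrepancies(register, observations):
    """Return observations whose plate differs from the registered plate.

    register: dict mapping a person identifier to a registered plate.
    observations: list of (person identifier, observed plate) pairs.
    A person missing from the register is reported with a registered
    plate of None.
    """
    discrepancies = []
    for person_id, observed_plate in observations:
        registered_plate = register.get(person_id)
        if registered_plate != observed_plate:
            discrepancies.append((person_id, observed_plate, registered_plate))
    return discrepancies


# Invented records for illustration only.
register = {"person-001": "ABC123", "person-002": "XYZ789"}
observations = [
    ("person-001", "ABC123"),  # matches the register
    ("person-002", "QRS456"),  # plate differs from the register
    ("person-003", "LMN000"),  # person not in the register
]
anomalies = find_discrepancies(register, observations)
```

A mismatch here is only a flag. It could equally come from a borrowed car or a stale record, which is why it marks something to look into and not a conclusion.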
From the data plan, we can trace the API calls made to find out whether a user should be deemed “suspicious”. For instance, he or she could use mobile applications that connect to suspicious applications or dissident locations. It does not matter, at this point, what the nature of the calls really is; the anomaly alone is taken as grounds for further investigation.
The fear of over-collection of data by the authorities is real. In the right hands, the data held by any particular agency can be used for good. In the wrong hands, it can be used for evil. This naturally leads us to the next question: how do data owners, data collectors and their respective entities protect such data? This falls under the subject of information assurance, which will be covered in the next part of the series.