Microsoft Deletes 10 Million Facial Recognition Photos Amid Privacy Concerns

Microsoft has axed a massive database of facial recognition photos after a press report questioned the ethics of the project and whether it infringed on the privacy of millions of individuals.

Two universities have also suspended their photo databases over concerns that the images have been used without permission of the individuals photographed.

Microsoft’s MS Celeb dataset contains photos of more than 10 million faces scraped from public websites such as Flickr, where photos are posted under a Creative Commons license. That means they can be used without infringing on copyright protections.

MS Celeb contains many images of celebrities, and Microsoft claimed it was the world’s largest public-face dataset — data selected and processed by artificial intelligence software. But the dataset also contains photos of journalists, researchers, and academics who are not famous.

Photo Set Still Exists

Microsoft says it deleted the dataset because the researcher who created it had left the company. But the dataset still exists because many academics and companies have already downloaded it.

The dataset has been used as training data for AI facial recognition systems, including by military researchers and companies such as IBM, Panasonic and Alibaba. Concerns have been raised that some of these systems could be used for repressive measures.

Two Chinese companies that have used the dataset, Sensetime and Megvii, supply facial recognition tools to government offices in Xinjiang, where Uighur minorities and other Muslims are tracked and held in detention camps.

The Financial Times reported that Microsoft deleted its database of photos a few days after the paper reported on its activities. Meanwhile, two other databases have also been axed since the report, with Duke and Stanford universities both taking down their facial recognition datasets.

Nervous Over Privacy Issues

Some of the people whose faces were included in the Microsoft dataset were unaware that their photos were being used. Some have questioned whether using people’s photos without permission might contravene Europe’s data protection rules, though Microsoft said it was not aware of any such issues with the dataset.

The removal of MS Celeb is significant because it may indicate nervousness by tech companies about privacy issues surrounding facial recognition systems.

Facial recognition systems are already used at airports and by police agencies and security services. Retailers have started to implement them in stores to monitor customer reactions to merchandising and enable instant check-outs. Banks are using them for biometric authentication for transactions.

Amazon, Google and Facebook are investing in the technology, which could have wide applications in the future. Tests have already been run on using facial recognition to judge people’s emotional responses to films and experiences. Some tech analysts predict that facial recognition will become a core technology, used to judge whether people are healthy or sick and to see if they are physically fit to handle heavy machinery.

But creating AI systems for facial recognition requires huge amounts of data, including millions of photos to train algorithms to identify and differentiate faces.

Risking a Backlash

Tech companies have used publicly available photos without getting explicit permission from the individuals concerned. This risks creating a backlash if people discover their photos have been used for state repression, police surveillance or military purposes.

Many photos found on the web are posed and are usually only uploaded when the face is clearly seen. But training AI software to recognize faces in a variety of situations requires many photos taken “in the wild,” when people are unaware they are being photographed.

The people in these photos need to appear in the shade, wearing hats and glasses or wrapped up in scarves. That is why companies are so eager to scrape social media sites and photo-sharing apps for as many “natural” photos as possible. If they can use 10 or 20 different pictures of the same person, they can help train the algorithm to understand facial differentiation in a variety of settings.

This points to an important issue with the AI future. The terabytes of data needed to train algorithms for AI activity mean tech companies must scour the Internet for photos, social media posts and online activity undertaken by the general population. Much of the training data that fuels the AI revolution will be provided by the public.

But the benefits will go to the tech giants.