Humans can communicate a range of nonverbal emotions, from terrified shrieks to exasperated groans. Vocal inflections and cues can convey subtle feelings, from ecstasy and agony to arousal and disgust. Even in ordinary speech, the human voice is stuffed with meaning, and it holds a lot of potential value for any company collecting personal data.
Now, researchers at Imperial College London have used AI to mask the emotional cues in users' voices when they're speaking to internet-connected voice assistants. The idea is to put a "layer" between the user and the cloud their data is uploaded to by automatically converting emotional speech into "normal" speech. They recently published their paper, "Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants," on the arXiv preprint server.
Our voices can reveal our confidence and stress levels, physical condition, age, gender, and personal traits. This isn't lost on smart speaker makers, and companies such as Amazon are always working to improve the emotion-detecting abilities of AI.
An accurate emotion-detecting AI could pin down people's "personal preferences, and emotional states," said lead researcher Ranya Aloufi, "and may therefore significantly compromise their privacy."
Their method for masking emotion involves collecting speech, analyzing it, and extracting emotional features from the raw signal. Next, an AI program trains on this signal and replaces the emotional indicators in speech, flattening them. Finally, a voice synthesizer regenerates the normalized speech from the AI's outputs, and that normalized speech is what gets sent to the cloud. The researchers say that this method reduced emotional identification by 96 percent in an experiment, although speech recognition accuracy decreased, with a word error rate of 35 percent.
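For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a speech recognizer's transcript into the correct one, divided by the number of words in the correct transcript. The Python sketch below shows that standard calculation. It is not the researchers' code, and the function name and sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: two of the five reference words are misrecognized.
example = word_error_rate("turn on the kitchen lights",
                          "turn off the kitchen light")
```

By this definition, a word error rate of 35 percent corresponds to roughly 35 word-level mistakes for every 100 words actually spoken.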
Understanding emotion is an important part of making a machine seem human, a longtime goal of many AI companies and futurists. Speech Emotion Recognition (SER), as a field, has been around since the late '90s, and is a lot older than the artificially intelligent speech recognition systems running on Alexa, Siri, or Google Home devices. But emotionality has become a serious goal for AI speech engineers in recent years.
Last year, Huawei announced that it was working on AI that detects emotions, to build into its hugely popular voice assistant programs. “We think that, in the future, all our end users wish [that] they can interact with the system in the emotional mode,” Felix Zhang, vice president of software engineering at Huawei, told CNBC. “This is the direction we see in the long run.”
And in May, Alexa speech engineers unveiled research on using adversarial networks to discern emotion in Amazon's home voice assistants. Emotion, they wrote, "can aid in health monitoring; it can make conversational-AI systems more engaging; and it can provide implicit customer feedback that could help voice agents like Alexa learn from their mistakes."
These companies are also hoping to use this highly sensitive data to sell you more specifically targeted advertisements.
Amazon's 2017 patent for emotional speech recognition uses illness as an example: "physical conditions such as sore throats and coughs may be determined based at least in part on a voice input from the user, and emotional conditions such as an excited emotional state or a sad emotional state may be determined based at least in part on voice input from a user," the patent says. "A cough or sniffle, or crying, may indicate that the user has a specific physical or emotional abnormality."
According to the patent, a virtual assistant such as Alexa would then use these cues, combined with your browsing and purchase history, to offer you hyper-specific advertisements for medications or other products.
Unless smart home devices like Alexa or Google Home integrate more privacy protections, there's not much consumers can do to protect themselves against this use of their data, Aloufi said.
"Users should be informed and given awareness on the ways in which these systems will use and analyse their recording so they are able to make informed choices about the placement of these devices in their households, or their interaction with these devices," she said. "There is a dire need to apply these [privacy-protecting] technologies to protect the users’ and their emotional state."