The European Association of Biometrics recently hosted a workshop on the proposed EU AI Act. This blog post shares my practical perspectives on technological and socio-technical aspects concerning voice biometrics, based on the presentation I delivered at the workshop.

The proposed EU AI Act takes an innovation-encouraging, risk-based approach to regulating AI. This approach categorises AI applications into four risk groups: unacceptable, high, limited and minimal risk. Applications that fall into the unacceptable category will be prohibited from being sold and deployed in the EU. Under limited circumstances, biometric systems are considered an unacceptable risk: when used for real-time, remote identification of people in public spaces. This scoping aims to prohibit public (state) surveillance applications and mitigate the risks associated with the use of biometrics by law enforcement. Naturally, it also permits many applications of biometrics for other purposes.

The proposed EU AI Act makes a definitive distinction between biometric verification applications, used for authentication and access control, and biometric identification. The difference between these two applications is that verification conducts a 1:1 comparison to validate whether two samples of biometric data match, while identification conducts a 1:many comparison, a process similar to looking up an individual in a database. As verification does not compare biometric data against a database of people, it is not considered to pose a surveillance risk. Instead, lawmakers view the GDPR and consumer protection acts as sufficient for guarding against the mis- and abuse of verification systems. In this article I distinguish between verification and identification technologies, but I refer to both of them as biometrics. Limiting the scope of biometrics to identification alone creates an artificial divide that is misleading when considering the nature of the technologies.
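The 1:1 versus 1:many distinction can be made concrete with a small sketch. The example below is a simplified, hypothetical illustration, not any vendor's implementation: it assumes voices have already been converted into fixed-length embedding vectors, compares them with cosine similarity, and uses an arbitrary placeholder threshold of 0.7.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two voice embeddings (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enrolled: np.ndarray, probe: np.ndarray, threshold: float = 0.7) -> bool:
    """1:1 verification: does the probe match the single enrolled record?"""
    return cosine_similarity(enrolled, probe) >= threshold

def identify(database: dict, probe: np.ndarray, threshold: float = 0.7):
    """1:many identification: which enrolled identity, if any, best matches the probe?"""
    best_id, best_score = None, threshold
    for identity, embedding in database.items():
        score = cosine_similarity(embedding, probe)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id
```

Note that `identify` is simply `verify` run against every record in a database: the same embeddings and the same comparison function serve both purposes, which is part of why the two technologies are hard to separate in practice.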

Biometrics in the proposed EU AI Act have been considered primarily from the perspective of face recognition technologies. The proposal thus does not treat the specific contexts and associated risks of voice-based systems differently from those of facial recognition systems. This is a concern, as voice-based biometrics are widely used in billions of devices, from smartphones and smartwatches to speakers, TVs, and cameras. In the near future it is highly likely that every digital consumer device will have a microphone, and that every microphone will have the capability of supporting voice biometrics in order to control access to the services that the device enables. Moreover, call centers offering public and private services use voice biometrics for a large variety of applications, ranging from serving personalised tax information to proof-of-life verification of pensioners, workforce monitoring and customer experience enhancement. In the remainder of the article I will examine how the perspectives on biometrics in the proposed EU AI Act hold up against the reality of, and emerging trends in, voice-based biometric systems. To frame a discussion on voice biometrics and the proposed Act, we first need some background knowledge on speech signals.

The building blocks of voice biometrics

In our day-to-day lives, we think of speech signals mostly in relation to the information that we convey with words through our voice. However, our speech carries other personal information, such as information about our emotional state, intent, wakefulness and health. In the case of biometrics, our voice can be used to identify us. In the technical domain, the technologies that do this are called speaker verification and speaker identification. Speaker verification compares a stored record of your voice against a new voice record: the speech you produce to authenticate yourself. This process can be active or passive. Active verification requires you to set a spoken passphrase, for example "my voice is my ID", which you use repeatedly. Passive verification does not need this and can be applied even if you are not aware of it. Whether active or passive, verification databases accumulate a large amount of personal information contained in our speech signals and background audio signals (often thought of as noise).

Despite over 60 years of scientific research on paralinguistics (i.e. the vocal and sometimes non-vocal aspects of spoken communication that do not involve words), the proposed EU AI Act does not consider sensitive information contained in background audio data and speech signals a surveillance threat. In the context of the proposed EU AI Act and public surveillance, however, the collection of voice data ought to raise new questions, as biometric applications are not the only mechanism that can be used to surveil citizens. In 2019 ProPublica examined the deployment of voice-activated aggression detectors in hundreds of schools, raising concerns that large-scale public monitoring of voice attributes limits our personal freedom and poses a risk of its own. ProPublica found that the surveillance technology does not work as claimed, failing to detect low-pitched and low-volume aggressive behaviour and instead misclassifying high-pitched girls’ laughter as aggression. When the aggression detectors are triggered, they record the conversations that follow the trigger, jeopardising the privacy of playground conversations. ProPublica raised the important question of whether the unvalidated benefits of the technology justify the deployment of voice-based surveillance in public institutions. Without negating the risks of identification, inferring personal attributes from voice data has significant privacy implications and potential for harm, as the ProPublica case demonstrates.

While biometric verification and identification serve quite different purposes, the two applications share a technological backbone that makes them very similar. Importantly, they share a common data infrastructure. Once deployed on a cloud-based server, a system that verifies can be viewed as a data collection mechanism for a system that identifies. Any database built for cloud-based verification can thus, in future, be adapted to develop identification technology. This leads us to a difficult question: is it practically possible to safeguard citizens from surveillance if most corporate verification and identification applications are permitted, and only public use of identification technologies is prohibited?

Privacy, scope-creep, and consent

Consumer devices present a particular set of concerns. Speaker verification is an essential component of voice-activated services (think smart speakers that you can instruct with a wakeword phrase, like “Hi Siri”, to turn on the lights), as it constitutes an important security mechanism that prevents intruders from accessing your digital services. However, the security benefit of speaker verification can stand in tension with our privacy desires: in order to secure voice-activated services, technology providers need to collect and process your voice data. One approach to mitigating privacy concerns is to store voice data only for short periods of time. However, this does not guarantee that the collected data will not be repurposed, or that the data will in fact be deleted. In certain cases, deletion also erases evidence that citizens and consumers may need to contest decision-making or service quality. On-device processing offers another solution, one that can practically prevent the repurposing of verification technology for identification purposes. Irrespective of whether we deploy systems with temporary data storage, on-device processing, or alternative approaches to privacy preservation, in the absence of physical proof users will ultimately need to trust technology providers that their services comply with their promises.

From consumer products to customer services, voice biometrics remain a mostly hidden technology. “This call is being recorded for quality and training purposes”. You have probably heard this line many times while waiting for a call center agent to assist you. But it raises numerous questions: who is training what? What quality purposes does my voice serve? And what voice-based AI is the call center using: passive verification, identification, emotion recognition, health monitoring, or all of them? I have yet to find a call center that lets me opt out of the recording. This makes me wonder: what constitutes informed consent in a system using voice biometrics? Is consenting to use the system to verify my voice the same as consenting to provide my voice data to train it? Is consenting to provide my voice data to train a verification system in the EU the same as consenting to it being used to build identification technology that will be sold outside of the EU (permitted under the proposed EU AI Act)? And if I do not have an expert understanding of the difference between verification and identification, will I be able to tell these purposes apart?

Purely technical approaches are insufficient to assure that speaker verification and identification technologies are only used as intended and will not be repurposed for other applications. Instead, we need to look to governance mechanisms that ensure transparency and offer avenues for contestation, so that citizens and community-based organisations can hold technology deployers and vendors to account. At a high level, such governance mechanisms should enforce purpose clarity and purpose limitation and prevent repurposing, thus supporting the restrictions on processing special categories of data laid out in Article 9 of the GDPR. Without such governance mechanisms, it may be practically impossible to monitor the use and deployment of voice biometrics, rendering the restrictions imposed by the GDPR morally aspirational but operationally infeasible. Additionally, in multi-use devices that simultaneously provide benign services (e.g. playing music) and essential services (e.g. activating emergency response), performance guarantees are needed to assure that services do not discriminate against users based on demographic attributes, and that they meet quality requirements proportionate to the severity of potential harms.

In April this year, Mozilla published its recommendations on the proposed EU AI Act. The recommendations call for effective allocation of responsibility along the AI supply chain, making the public AI database a bedrock of transparency and effective oversight, and giving people and communities the means to take action when harmed. In the case of voice biometrics, these recommendations provide mechanisms to support purpose clarity, purpose limitation and the prevention of repurposing.