The voice in the machine: how AI vocal analysis could improve our health

Martin Cooper MBCS speaks to Tim Bashford about exciting developments in healthcare technology which may allow the early detection and treatment of respiratory diseases such as COPD through AI-based vocal analysis.

Why don’t you introduce yourself, your department and your university?

We are the Wales Institute for Digital Information (WIDI) — a partnership between the University of Wales Trinity Saint David (UWTSD), University of South Wales (USW) and Digital Health and Care Wales (DHCW). Our mission is to improve the health of Wales through organisational development, training, research and innovation within digital healthcare by working with health boards, Welsh government, and industry. Our staff work across the three organisations, with the collaboration reducing barriers to working together and sharing expertise.

Talk to us about the ‘helicopter project’ — what is the project’s full name and how did it come about? What are its goals?

The overarching piece of work is called ARIA-TRE, created as a collaboration between Respiratory Innovation Wales (RIW) and WIDI. The goal of this project is to analyse voice data to detect respiratory disease using artificial intelligence (AI) and make the resultant analysis available through a Trusted Research Environment. We have developed a significant software component as part of this, named VoxLab, to perform the computational analysis on the recorded voice samples.

The helicopter project itself is named after a speech-breathing task in which we ask the subject to repeat the word ‘helicopter’ as fast as possible for 20 seconds. During the COVID-19 pandemic, we thought about how to measure people’s lung function remotely, even at home. Respiratory clinicians noted that they could often accurately identify issues with patients' lung function when speaking with them on the phone, but could not precisely identify what quality of the voice led them to their conclusions. This led us to design a protocol by which respiratory health could be measured using recorded voice.
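As an illustration only, and not the project's actual protocol or code, the 20-second repetition task could be summarised from the onset time of each spoken ‘helicopter’. The function name, the dictionary keys and the example onset times below are all hypothetical; the only detail taken from the interview is the 20-second task length.

```python
def repetition_stats(onsets_s, task_duration_s=20.0):
    """Summarise a repeated-word task from word onset times in seconds.

    Hypothetical helper: the onsets are assumed to come from some earlier
    time-alignment step that is not shown here.
    """
    # Keep only onsets that fall inside the task window, in time order.
    onsets = sorted(t for t in onsets_s if 0.0 <= t < task_duration_s)
    count = len(onsets)
    # Gaps between consecutive repetitions; longer gaps may reflect pauses.
    gaps = [later - earlier for earlier, later in zip(onsets, onsets[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return {
        "repetitions": count,
        "rate_per_second": count / task_duration_s,
        "mean_gap_s": mean_gap,
    }


# Made-up example: one repetition every 0.8 seconds for the whole task.
example_onsets = [i * 0.8 for i in range(25)]
stats = repetition_stats(example_onsets)
print(stats)
```

A slower rate or longer gaps would not by themselves indicate disease; this only shows the kind of simple timing measure such a task makes available.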

Who are the people working on the project – what are their specialisms and backgrounds?

On the project team we have: Dr Tim Bashford, WIDI research lead, a computer scientist based at UWTSD; Dr Tom Powell, Head of Innovation at Cwm Taf Morgannwg University Health Board; Dr Biao Zeng, a psychology lecturer and linguistics expert based at USW; Nathan Morgan, a WIDI software engineer based at UWTSD; Hok-Shing Lau, a WIDI artificial intelligence engineer based at UWTSD; Mark Huntly, a WIDI software engineer based at UWTSD; and Adesua Iyenoma, a WIDI data scientist based at UWTSD. Additionally, respiratory science expert Professor Mark Williams was an invaluable contributor to the early-stage development of the protocol, but sadly passed away earlier in the year.

What’s special about the word ‘helicopter’?

The helicopter task sounds simple and straightforward — which is partially by design. The word was selected from a speech therapy word list used to test articulation. Helicopter is a four-syllable word, is not easy to articulate, and offers a certain challenge to breathing, or airflow. Moreover, helicopter is a well-known and somewhat commonly used word. Most people can repeat it fluently without significant cognitive effort; it is not a word that most people will need to memorise. Finally, in the word ‘helicopter’, ‘h’ and ‘p’ are two consonants which are explosive and tax our breathing system. This process of applying strain to the vocal cords permits the VoxLab software to acquire the greatest amount of data from the recorded voice sample.

We are also in the process of conducting validation studies to test alternative words with similar properties, for example for participants who may not have English as their first language.

Can you tell us how the project works under the hood? How does the tool gather data, order it, process and finally come to a recommendation? What are the critical data and computational steps between a patient speaking and the tool making a recommendation?

VoxLab is a software tool designed for the scientific community, utilising AI to interrogate and transform audio files, to provide quantitative output that can be analysed, stored, and shared. Features can be extracted — for example through the removal of noise, and balancing of time-frequency ranges — through a cloud-based pipeline to be analysed. The models implemented will extract the features from input audio files, providing a wealth of information to form a vocal biomarker. The pipeline is a multi-model feature extraction process to feed into the sandboxed lab environment for researchers to visualise and interrogate, with the ability to use the features further to fit ML models that will classify respiratory conditions.
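The idea of a staged pipeline can be sketched in a few lines. This is a minimal sketch under stated assumptions and not VoxLab's code: the two stages (removing a constant offset and scaling to a peak of 1) are simple stand-ins for the noise removal and balancing steps mentioned above, and every name here is hypothetical.

```python
from typing import Callable, List, Sequence

Samples = List[float]
Stage = Callable[[Samples], Samples]


def remove_dc_offset(samples: Samples) -> Samples:
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def peak_normalise(samples: Samples) -> Samples:
    """Scale so the largest absolute sample is 1 (silence is left as-is)."""
    peak = max(abs(s) for s in samples)
    return samples[:] if peak == 0 else [s / peak for s in samples]


def run_pipeline(samples: Samples, stages: Sequence[Stage]) -> Samples:
    """Apply each stage in order, feeding the output of one into the next."""
    for stage in stages:
        samples = stage(samples)
    return samples


# Tiny made-up signal with a constant offset of 1.0.
cleaned = run_pipeline([0.5, 1.5, 0.5, 1.5], [remove_dc_offset, peak_normalise])
print(cleaned)
```

Keeping each stage as an independent function means stages can be added, removed or reordered without touching the others, which is one common way to structure a multi-step audio pipeline.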

Talk to us about some of the key technologies you’re using to build and deploy.

As a priority, we wanted something secure and portable. By implementing Microsoft Entra ID we were able to leverage OAuth2 out of the box to ensure that all the pages on the platform are secured using the best practices defined by Azure, with considerations of the data held adhering to guidelines specified by the HDR and GDPR. We elected to use a web application to avoid coupling the software to any particular vendor, device or architecture, especially given the long-term ambition to make the tool available in the developing world.

By utilising single-page application technology we achieved a scalable, robust and reactive service. To further ensure portability, several techniques were implemented that are focused on minimising the workload of the client, such as rendering visuals and performing calculations on the cloud, permitting low-powered devices to make use of complex computational workloads. Using cutting-edge MLOps and DevOps paradigms and technologies we can deploy to the cloud rapidly and reliably, allowing for continuous integration and continuous deployment.

Where does AI fit into the process?

AI, or more specifically machine learning (ML), is the foundation of the pipeline that will extract features from the sound recording/sound wave. These ML models are able to provide a speech-to-text function to create a transcript, segment the sound wave into granular segments at the word, syllable and phoneme level, and time-align them within the sound wave. Another model further extracts acoustic features and biomarkers, such as frequency, amplitude and formants, as well as prosodic features that can be interrogated by researchers within the sandboxed lab environment. There are a huge number of features which can be extracted.
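To make two of the acoustic features named above concrete, the following sketch computes amplitude and a rough frequency estimate for a synthetic tone. It is an assumption-laden illustration, not the models used in the project: root-mean-square is one common amplitude measure, and counting zero crossings is a deliberately crude frequency estimate that works for a clean tone but not reliably for real speech. The function names and the 200 Hz test tone are hypothetical.

```python
import math


def rms_amplitude(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_frequency(samples, sample_rate_hz):
    """Estimate frequency from sign changes (two crossings per cycle)."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration_s = len(samples) / sample_rate_hz
    return crossings / (2 * duration_s)


# One second of a synthetic 200 Hz sine wave sampled at 8 kHz.
# The half-sample offset avoids samples that land exactly on zero.
sample_rate_hz = 8000
tone = [
    math.sin(2 * math.pi * 200 * (n + 0.5) / sample_rate_hz)
    for n in range(sample_rate_hz)
]

amplitude = rms_amplitude(tone)
frequency_hz = zero_crossing_frequency(tone, sample_rate_hz)
print(round(amplitude, 3), round(frequency_hz, 1))
```

Formants and prosodic features need considerably more machinery than this, which is part of why dedicated models and tooling are used for them.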

How do you see the project being used within a care pathway? Will a GP prescribe it or will specialists use it?

Approximately 15 million people in the UK live with a chronic condition, with daily symptoms that can change and evolve over time. Often, they have no cure but can be managed with the right treatment. When speaking to our friends, family, and colleagues, we can often detect subtle changes in the way someone speaks, especially if they have a cold. It can be more subtle, such as repetition of certain words, a style or manner of speech, or sounds made when speaking, or the speed at which speech is delivered.

We know that people with a range of chronic health problems speak differently when their symptoms worsen. These differences are detectable in their speech and breathing. By using AI to analyse regular digital voice recordings we propose to define speech signatures that are indicative of deteriorating health. Early warning could allow patients to seek health care and avoid escalation of the condition, lessening the impact on the patient and the NHS.

We envisage an automated system that can send alerts to family members and healthcare practitioners when an individual’s COPD symptoms begin to worsen, potentially allowing early intervention, promoting better care and preventing hospital admissions. NICE estimated that each exacerbation of COPD results in a direct cost of £1868 for every patient whose symptoms deteriorate to the point they have to be hospitalised. While the primary aim of developing our AI approach is to improve patient care, given the number of individuals identified and the costs associated with COPD care, there is a clear financial benefit to developing solutions that prevent hospitalisation.
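One simple way to picture the alerting idea is to compare a new measurement with the same person's own earlier recordings. The sketch below is a generic z-score check and not the project's method, which is still under research: the feature (repetitions per second), the threshold of 2 standard deviations and all the numbers are made up for illustration.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, latest, z_threshold=2.0):
    """Return True if `latest` is unusually far from the personal baseline.

    Hypothetical rule: flag when the new value is more than `z_threshold`
    sample standard deviations from the mean of earlier values.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline values
    if sigma == 0:
        # No variation in the baseline: any change at all counts.
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold


# Made-up repetition rates (per second) from six earlier recordings.
baseline_rates = [1.30, 1.25, 1.28, 1.32, 1.27, 1.29]

print(deviates_from_baseline(baseline_rates, 1.29))  # close to usual
print(deviates_from_baseline(baseline_rates, 1.05))  # much slower than usual
```

A real system would need clinically validated features and thresholds; the point here is only that an alert can be defined relative to an individual's own history rather than a population average.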

Where is the project in terms of its maturity? When do you see it being widely used by NHS Wales? Is there any special approval or certification needed due to its medical focus?

The project is still within the R&D stage but it is showing significant promise. We are in the process of collating data sets of recorded voice samples for a range of respiratory diseases to facilitate the training of the ML model.

If somebody would like to learn more about the project, are there any publications you can recommend?

Some great resources include the ARIA page within RIW’s website, and the following publications:

Zeng et al. Exploring the acoustic and prosodic features of a lung-function-sensitive repeated-word speech articulation test, Frontiers in Psychology, vol. 14, 2023. doi:10.3389/fpsyg.2023.1167902
Williams, Breathing an Inspired History. London, UK: Reaktion Books, 2021.

Additionally, in June 2024 we are receiving funding from the Wales Innovation Network and setting up a speech-breathing network, details of which can be found on LinkedIn.

Finally, how do you see the project changing the world? How is it a prime example of IT being good for society?

Recent advances in computing power and AI now offer great potential to derive clinically relevant information from existing physiological signals that have previously been overlooked. New innovative digital approaches for non-specialist settings (such as the home and primary care) are envisaged that will provide diagnostic chronic disease insight. There are many benefits to 'closer to the patient' approaches, which have the potential to reduce the burden on patients and health services by achieving early intervention in chronic disease trajectories, reducing the impact and cost of hospital admissions, and preventing disease exacerbations.
