Dr. Fasih Haider FBCS, Research Fellow at the Usher Institute, University of Edinburgh and Entrepreneurial Lead at Innovate UK ICURe Program, explains how AI can analyse how we speak to detect early signs of neurological and mental health conditions, and also highlights important considerations around bias, regulation and ethics.

Most of us are now comfortable with the idea that devices can track our steps, monitor our heart rate, or even measure our sleep. These forms of passive health monitoring have become part of everyday life. But what if the same devices could also pick up early signs of neurodegenerative diseases such as Alzheimer’s and Parkinson’s, or of cognitive decline, not by listening to what you say, but by analysing how you say it?

This is the premise of speech as a digital biomarker, an area of rapid innovation that could transform the way we approach neurological health through remote monitoring and personalised care. Advances in digital technologies are making it possible to detect subtle changes in rhythm, tone, pauses, lexical richness, disfluency and vocal energy that reveal information associated with neurological and mental health. Crucially, this can be done without recording or storing the content of speech, addressing one of the most pressing concerns in digital health: privacy.

Speech-based biomarkers are already being trialled in research and early commercial settings, with prototypes moving from clinical validation to real-world deployment.

Why speech matters now

The pressures on health systems are immense. In the UK, the number of people living with dementia is projected to surpass 1.6 million by 2040. Parkinson’s disease and other neurodegenerative disorders are rising rapidly as populations age. Existing methods of assessment are costly and resource-intensive. Questionnaires are subjective and often miss early signs. Specialist neurological examinations are precise but not scalable. In clinical research, screening and monitoring participants remains one of the most expensive bottlenecks, with failed clinical trials costing billions.

Speech presents a compelling alternative. Unlike invasive tests or clinical monitoring, voice is readily available. Almost everyone has access to a smartphone or computer microphone. A short daily voice diary, or analysis of speech during a routine phone call, could provide real-time insights into neurological and mental health. For patients and carers, this could mean earlier interventions. For pharmaceutical companies, it could accelerate trial recruitment and monitoring. And for health systems, it could mean a more scalable and affordable approach to wellbeing.

Listening without listening

When people hear about voice analysis, their first thought is usually the content of speech. But speech as a digital biomarker works differently. It does not focus on what someone says but instead on how they say it. The relevant signals lie in speech features such as pitch variation, rhythm, pauses, lexical richness, speech rate, vocal intensity, tremor and breathiness.

Parkinson’s disease can cause changes in articulation, pitch range and vocal tremor. Cognitive decline may reveal itself in disrupted rhythm, longer hesitations, disfluency, reduced lexical richness and altered speech patterns. These shifts are subtle and usually imperceptible in casual conversation, but artificial intelligence models can detect them with remarkable sensitivity.

A typical pipeline begins with audio captured through a device microphone. The signal is cleaned and normalised to account for noise, loudness and recording conditions.
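
As a rough illustration of this cleaning step, the sketch below removes DC offset and normalises loudness with numpy; the target RMS level and the clipping guard are illustrative choices rather than a standard, and real pipelines typically add noise reduction and silence trimming on top.

```python
import numpy as np

def normalise_audio(signal: np.ndarray, target_rms: float = 0.1) -> np.ndarray:
    """Remove DC offset and scale a mono recording to a target RMS level,
    so loudness differences between devices do not dominate later features."""
    signal = signal - np.mean(signal)          # remove DC offset
    rms = np.sqrt(np.mean(signal ** 2))
    if rms > 0:
        signal = signal * (target_rms / rms)   # loudness normalisation
    return np.clip(signal, -1.0, 1.0)          # guard against clipping
```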

Features can be extracted in several ways: for example, by using traditional descriptors like mel-frequency cepstral coefficients (MFCCs), which represent how sound energy is distributed across different frequencies and help capture vocal qualities such as timbre and articulation. Another approach trains neural networks to automatically learn numerical representations of complex speech patterns. These learned representations can exceed the capabilities of manually designed features, because the underlying multi-layer models are trained on large, diverse datasets of recorded and transcribed speech.
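
For readers who want to experiment, here is a minimal sketch of the traditional route using the open-source librosa library. The file name is illustrative, and 13 coefficients is a common convention rather than a requirement.

```python
import librosa
import numpy as np

# Load a short voice recording (file name is illustrative) at 16 kHz mono.
y, sr = librosa.load("voice_diary.wav", sr=16000)

# 13 MFCCs per ~25 ms frame: a compact description of spectral shape.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Summarise frame-level features into one fixed-length vector per recording,
# a common starting point before classical machine learning.
features = np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])
print(features.shape)  # (26,)
```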

Models ranging from traditional machine learning to modern self-supervised deep learning architectures then map these features to health-related outcomes. The result is not a diagnosis in itself, but a probabilistic signal that can highlight risk or track change over time. 
Both traditional machine learning and deep learning are used today. Classical models such as support vector machines or random forests are efficient and easier to interpret, while deep learning models (such as CNNs and transformers) can achieve higher accuracy but demand far more data and computational power.
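
A minimal sketch of such a classical baseline using scikit-learn, with synthetic random data standing in for real per-recording feature vectors (so the reported accuracy here is meaningless by design):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: one 26-dimensional feature vector per recording
# (e.g. MFCC means and standard deviations) and a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 26))
y = rng.integers(0, 2, size=200)

# A support vector machine with feature scaling: a typical classical baseline.
# probability=True yields the probabilistic signal described above.
model = make_pipeline(StandardScaler(), SVC(probability=True))

# Cross-validation gives a more honest estimate than a single train/test split.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```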

The technical challenge is not just accurate detection, but interpretation. A system must translate noisy, real-world voice recordings into insights that are meaningful and trustworthy for clinicians, researchers and individuals. That means not only building accurate models but also ensuring they are explainable and resistant to confounding factors like accent, age, gender and background noise.
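
One common, model-agnostic way to probe what a model actually relies on is permutation importance: shuffle one feature at a time and see how much performance drops. The sketch below uses scikit-learn’s implementation on the same kind of synthetic placeholder data, with a random forest as a stand-in model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic placeholder features and labels, as in the earlier sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 26))
y = rng.integers(0, 2, size=200)

model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the performance drop, giving
# a model-agnostic view of which features drive the predictions.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1][:5]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```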

Skills and knowledge for IT professionals

For BCS members, the appeal of this field is not only its potential impact but also the technical challenges it poses. Speech is a complex signal that changes continuously over time but remains roughly stable over short intervals (known as ‘quasi-stationary’ behaviour), requiring advanced skills in speech processing and machine learning. Understanding how to extract useful features from raw data remains crucial, even as end-to-end neural models grow in popularity.
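
The quasi-stationary assumption is usually exploited through short-time analysis: splitting the signal into overlapping frames within which speech is treated as stable, and tracking change from frame to frame. A minimal numpy sketch, with the conventional (but not mandatory) 25 ms frames and 10 ms hops:

```python
import numpy as np

def frame_signal(signal: np.ndarray, sr: int,
                 frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Split a signal into overlapping short frames: speech is treated as
    stable within each ~25 ms frame, and change is measured across frames."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    return np.stack([signal[i * hop_len: i * hop_len + frame_len]
                     for i in range(n_frames)])

# One second of illustrative audio at 16 kHz yields 98 frames of 400 samples.
frames = frame_signal(np.zeros(16000), sr=16000)
print(frames.shape)  # (98, 400)
```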

Privacy is another central concern. Because speech is so personal, any monitoring system must be designed with privacy at its core. That means processing data locally on devices wherever possible, ensuring that models can operate without transmitting or storing raw audio. Foundation models are particularly relevant here; once trained, they can be deployed in lightweight, self-contained form directly on a device, extracting only anonymised acoustic representations without sending audio streams to the internet. Federated learning further strengthens this approach by allowing models to improve collaboratively across distributed devices without centralising sensitive information. At the same time, content-agnostic analysis, which focuses on the properties of speech rather than its content, provides an additional safeguard for user privacy.
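
To make the federated idea concrete, here is a toy sketch of federated averaging in numpy: each simulated device takes a gradient step of logistic regression on its own data, and only model weights are shared with the server. It is a deliberate simplification, with synthetic data and single-step local updates; real deployments use dedicated frameworks and add protections such as secure aggregation.

```python
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """One gradient step of logistic regression on a device's own data.
    Raw audio features never leave the device; only weights are shared."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return weights - lr * grad

def federated_round(global_weights: np.ndarray, clients: list) -> np.ndarray:
    """Federated averaging: each client trains locally and the server
    averages the resulting weights without ever seeing the data."""
    updates = [local_update(global_weights.copy(), X, y) for X, y in clients]
    return np.mean(updates, axis=0)

# Five simulated devices, each with its own synthetic local dataset.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 26)), rng.integers(0, 2, size=50).astype(float))
           for _ in range(5)]
w = np.zeros(26)
for _ in range(10):
    w = federated_round(w, clients)
```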

Bias is a constant risk. Accents, dialects, gender and microphone types can all introduce systematic errors. Without care, systems could perform well for some populations but poorly for others, undermining trust and fairness. Addressing this requires data augmentation, fairness-aware modelling, and techniques that disentangle health signals from irrelevant confounders such as accent, microphone quality or background noise, which can distort model predictions if left uncontrolled.
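
A simple starting point is a per-group audit: measuring performance separately for each subgroup before deployment. A sketch with scikit-learn and synthetic labels, where the accent group names are placeholders:

```python
import numpy as np
from sklearn.metrics import accuracy_score

def audit_by_group(y_true, y_pred, groups) -> dict:
    """Report accuracy separately for each subgroup (e.g. accent, gender,
    microphone type) to expose systematic performance gaps."""
    return {g: accuracy_score(y_true[groups == g], y_pred[groups == g])
            for g in np.unique(groups)}

# Illustrative data: predictions for speakers tagged by accent group.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)
y_pred = rng.integers(0, 2, size=300)
groups = rng.choice(["accent_a", "accent_b", "accent_c"], size=300)
print(audit_by_group(y_true, y_pred, groups))
```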

Finally, practical deployment brings challenges of its own. Unlike retrospective clinical assessments, speech-based monitoring can operate periodically (for example, through short daily recordings) or continuously in the background, integrated within the built environment through smart homes or ambient health systems, depending on the application and user needs.

Systems must therefore be efficient enough to run on resource-limited devices, yet reliable enough to support clinical or wellbeing use. Edge computing, model compression, and robust cloud–edge hybrid architectures are all part of the solution.
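
As one example of model compression, PyTorch offers dynamic quantisation, which converts linear layers to 8-bit integer weights to shrink the model and speed up inference on modest hardware. The small classifier head below is purely illustrative.

```python
import torch
import torch.nn as nn

# An illustrative small classifier head over 26 acoustic features.
model = nn.Sequential(nn.Linear(26, 64), nn.ReLU(), nn.Linear(64, 2))

# Dynamic quantisation converts the linear layers to 8-bit integer weights,
# reducing size and inference cost on resource-limited devices.
quantised = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 26)
print(quantised(x))  # same interface, smaller footprint
```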

Applications emerging today

Although it is still an emerging field, real-world applications of speech biomarkers are already taking shape. Researchers are developing tasks that detect the earliest vocal changes in Parkinson’s and Alzheimer’s disease. Pharmaceutical companies are piloting voice-based pre-screening tools to accelerate recruitment for clinical trials. Each of these applications raises ethical questions, and the debates go beyond the technical: they touch on societal issues where IT professionals must contribute to responsible design and governance.

Adoption and regulatory barriers

Despite growing evidence and enthusiasm, the road from laboratory innovation to real-world adoption is rarely straightforward. Speech biomarkers sit at the intersection of health, technology, and data protection, and this makes regulatory approval complex. 

Clinical validation can take years, and healthcare procurement frameworks often move more slowly than technological innovation, creating a gap between what is technically possible and what is clinically adopted. Beyond regulation, adoption depends on trust, from clinicians, patients and the public. Health professionals need confidence in model interpretability and reproducibility; patients need assurance of privacy and ethical use. Without transparent validation pathways and evidence-based standards, even the most promising technology risks remaining in pilot stages rather than achieving large-scale deployment.

The road ahead

Looking forward, several trends are likely to shape the future of speech as a biomarker. Everyday devices such as smartphones, smart watches, and even vehicles may soon offer optional wellbeing checks based on voice. Regulators will need to establish clear frameworks for safety and efficacy, just as they do for traditional medical devices. Systems will become more personalised, tracking an individual’s own baseline over time rather than comparing them to a generic population, making it easier to detect meaningful changes. And hybrid approaches that combine speech with other non-invasive signals such as typing patterns, brain signals or eye movements may provide even stronger insights.
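
Personal-baseline tracking can be surprisingly simple in principle. The sketch below compares today’s value of a single speech feature against an individual’s own history using a z-score; the pause-length feature and the numbers are invented for illustration, and a real system would use far richer features and change-detection methods.

```python
import numpy as np

def baseline_zscore(history: np.ndarray, today: float) -> float:
    """Compare today's measurement of a speech feature (e.g. mean pause
    length) against the individual's own rolling baseline, rather than a
    population norm. Large z-scores flag meaningful personal change."""
    mu, sigma = history.mean(), history.std()
    return (today - mu) / sigma if sigma > 0 else 0.0

# Illustrative: 30 days of a speaker's own pause-length measurements.
rng = np.random.default_rng(0)
history = rng.normal(loc=0.40, scale=0.05, size=30)   # seconds
print(baseline_zscore(history, today=0.62))           # unusually long pauses
```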

The risks cannot be ignored. False positives could cause unnecessary alarm. Overhyped claims could undermine trust. Misuse of the technology for surveillance or commercial profiling would erode confidence. But the benefits of affordable, accessible, non-invasive monitoring of neurological and mental health make this an area too important to dismiss.

Conclusion

Speech is one of the most natural and universal forms of human behaviour, and one that could underpin remote monitoring and personalised care. It is also a rich signal of brain and body working together. Advances in AI mean we can now measure those signals in ways that were unimaginable a decade ago. For IT professionals, this represents both an opportunity and a responsibility. The opportunity lies in applying skills from data science, engineering, and privacy-preserving AI to help address some of society’s greatest health challenges. The responsibility lies in ensuring these systems are fair, ethical, and trustworthy.