The future of speech tech

Dr Sharon Goldwater talks to Martin Cooper AMBCS about her work in natural language processing, her Needham Award lecture and why we’re unlikely to ever see a Babel fish.

Systems like Siri and Cortana are now everyday helpers. But the apparent popularity of speech-based interfaces belies the fact that comparatively few languages can be processed using current natural language processing technologies.

English, because of its popularity and the fact that it’s spoken by many academics, has been the focus of most machine learning research. Millions of people, Dr Sharon Goldwater says, are missing out on the advantages speech tech offers and she hopes her research will redress this imbalance.

The 2016 Needham Lecture

Dr Goldwater is a Reader at the University of Edinburgh’s School of Informatics and the winner of the 2016 Roger Needham Award - an award made annually for distinguished research contributions in computer science by a UK-based researcher. Along with the award, the winner is given the opportunity to give a public lecture.

Dr Goldwater’s talk was called ‘Language learning in humans and machines: making connections to make progress.’ Explaining where she hopes her research will lead, she says: ‘There are languages in Africa that have millions of speakers, yet there’s zero language technology. Especially in areas with low literacy, developing speech technology would be very useful - users could call up on their mobile phone, ask a question and get a spoken response. Using current technology, that’s not possible.’

Fascinated by how words work

‘I’m interested in how computational systems can learn language,’ says Goldwater as she begins to explain her work. ‘And when I say computational system, it could be an actual computer or it could be the human mind - which I think of as a computational system too. It receives input, does some sort of computation and produces output.’

‘When you say you’re interested in language,’ she observes, ‘people always say “oh, so you want to be a writer or you’re interested in literature”. That’s not what I’m interested in. I’ve always been fascinated by the structural nature of language. What is it that makes Russian different from English? That’s what linguists are interested in - the scientific study of language.’

Along with being fascinated by language, Dr Goldwater says that she’s equally beguiled by mathematics and logical problem solving. An academic career in computational language processing seems, then, the perfect fit.

Looking back at her university years, Dr Goldwater admits that she was very lucky. She met a professor who was a leader in the field of natural language processing - the technological side of computational linguistics. ‘I was already studying computer science and linguistics and I discovered that there is a field that involves both of those things,’ she says.

An everyday revolution

Natural language processing is something of a hot topic in the tech industry. With the arrival of Siri, Cortana and their cousins, people are becoming increasingly comfortable talking to their devices - in much the same way they became accustomed to touch-based interfaces a few years ago. Of course, that’s not always been the case. ‘Not so long ago NLP was a very niche subject,’ Dr Goldwater says. ‘If I tried to tell anybody what I was involved in - even if they worked in computer science - they had no idea what I was talking about,’ she laughs. ‘Now the number of people turning up to conferences has increased massively.’

Looking beyond English

One of Goldwater’s main focusses is on making language processing accessible to people who speak languages that aren’t necessarily widely spoken.

English, she explains, is probably the best language to speak if you’d like to benefit from the burgeoning technology’s current and future advantages. That’s because there’s been so much research done in turning the spoken language into digital data.

Commercial speech-to-text software for English has been available for over twenty years, she says. And in that time it’s been refined. Even today, she explains, if you speak a language other than English, computers will find understanding you just that bit more difficult.

‘It’s nothing fundamental about English itself,’ she explains - English words aren’t shaped or formed in a way that flatters machines. More prosaically, Dr Goldwater says: ‘It’s just a historical fact that many people who have worked in the field have worked in English speaking countries and so most of the work that has happened has been in English. For a long time most of the development resources were also mainly in English.’

Tough to teach understanding

So why is teaching a computer to understand language so hard? Firstly, Dr Goldwater explains, language is an infinite system - it is possible to take a language’s building blocks and string them together to make a sentence that nobody has ever said before.

‘Humans,’ she explains, ‘are very good at understanding how these building blocks fit together and extracting meaning from those completely novel sentences.’ That, she says, is very tough for computers. That holds true at the level of understanding and also at the level of speech recognition.

It gets more difficult too. ‘If I say the word “blue” and then say the word “blue” again, the precise acoustics of that word will be different each time. Getting past just that and getting a computer to generalise about things it’s seen before and things it hasn’t seen before is difficult.’

There are also differences in languages’ structural conventions. This means, if you develop a system that works well in one language, it may not work so well in another.

Doubling down on difficulty

The other major hurdle computers must overcome is ambiguity - ambiguity at the level of sound and also in meaning. Take the word ‘interest’, for example. You could be showing interest in something or discussing your bank account’s dividends. And, of course, the word can be pronounced in different ways.

Humans, Dr Goldwater explains, are good at making sense of ambiguity. Computers, on the other hand, aren’t so capable and this problem is multiplied when a computer is translating from one language to another. The different connotations of ‘interest’ in English may translate to completely different words in another language.

‘That kind of ambiguity comes up astonishingly often and people are generally completely unaware of it,’ she says. ‘We only become aware when somebody makes a pun or we misunderstand what somebody’s saying... but that happens very infrequently. People just don’t notice the astonishing amount of ambiguity that’s going on.’ People, she says, are also very good at filling in missing information because we know a lot about the world.

Universal access but no sci-fi

Part of Dr Goldwater’s work in helping computers understand human language focusses on ‘unsupervised learning’. Supervised learning sees researchers give a computer a series of recordings and the corresponding transcriptions (for speech recognition) or sentences along with their translations (for machine translation): that is, the computer learns from examples of language input together with the correct output.

Most language processing systems, Dr Goldwater explains, are trained using that kind of data. Google Translate, for example, is. Based on that set of information the computer tries to generalise and then produce new transcriptions from new audio.
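To make the idea of learning from examples paired with the correct output concrete, here is a minimal sketch in Python. It is a toy, not the method used by Dr Goldwater or by any real speech recogniser: each ‘recording’ is assumed to have been reduced to two made-up numbers, the labels stand in for transcriptions, and the names train, transcribe and EXAMPLES are hypothetical. The model simply averages the examples for each word and labels new input with the closest average.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Average the feature vectors seen for each label (a nearest-centroid model)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for features, label in examples:
        sums[label][0] += features[0]
        sums[label][1] += features[1]
        counts[label] += 1
    return {
        label: (total[0] / counts[label], total[1] / counts[label])
        for label, total in sums.items()
    }


def transcribe(model, features):
    """Return the label whose average is closest to the new, unseen input."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (features of a recording, correct transcription).
# The same word never produces exactly the same numbers twice.
EXAMPLES = [
    ((1.0, 1.2), "blue"),
    ((0.9, 1.0), "blue"),
    ((1.1, 0.8), "blue"),
    ((3.0, 3.1), "green"),
    ((3.2, 2.9), "green"),
]

model = train(EXAMPLES)
# An input that matches no training example exactly still gets a label.
print(transcribe(model, (1.05, 1.1)))
```

The point of the sketch is the shape of the data rather than the model: every training item needs a human-supplied label, which is where the cost comes from.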

Supervised learning in natural language processing - learning based on human-derived translations or transcriptions - is expensive to do. You need to pay humans to do the hard work. And this, Dr Goldwater reveals, is one of the reasons why she’s motivated to find another means of training computers: by definition today’s methodology favours rich countries. ‘I’m keen to give broader access to language technology across all languages.’

Unsupervised machine learning focuses on teaching machines to identify patterns, and even meaning in language, without the need for transcriptions or other explicit labels telling the computer what the right output is.

Unsupervised learning can also mean having some other kind of information available, Dr Goldwater reveals. ‘One example may be you’re observing somebody giving some instructions and you can then see what was executed. From that you can try and learn what each of the individual words meant.

‘In much of my work, though, we have just the audio. So you can listen to all that audio and start to spot repeated patterns and parts. The machine starts to recognise that those repetitions have meaning... there might be something useful in the patterns.’
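The idea of spotting repeated patterns without any labels can be illustrated with a small Python sketch. It rests on loud assumptions: real systems work on audio, whereas here each ‘utterance’ is an unsegmented string of letters standing in for a stream of sounds, and the function name repeated_patterns and its parameters are hypothetical. It is not Dr Goldwater’s method, only a picture of the principle that stretches which keep recurring are candidates for meaningful units.

```python
from collections import Counter


def repeated_patterns(utterances, min_len=3, max_len=6, min_count=2):
    """Count every substring of min_len..max_len symbols; keep those that recur."""
    counts = Counter()
    for utterance in utterances:
        for n in range(min_len, max_len + 1):
            for start in range(len(utterance) - n + 1):
                counts[utterance[start:start + n]] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


# Hypothetical unlabelled input: no spaces, no transcriptions, no word list.
UTTERANCES = ["lookatthedog", "thedogruns", "seethedog"]

patterns = repeated_patterns(UTTERANCES)
# Show the longest recurring stretches first.
for pattern in sorted(patterns, key=len, reverse=True)[:3]:
    print(pattern, patterns[pattern])
```

Nothing tells the program where words begin or end, yet the stretch ‘thedog’ surfaces because it occurs in all three utterances, while ‘runs’, heard only once, does not. Deciding which recurring stretches are actually words is the hard part that the sketch leaves out.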

Unsupervised learning doesn’t mean the machine can immediately understand a language it’s never seen before, however. Bad news, she says, if you’re expecting the arrival of a Star Trek Universal Translator or Douglas Adams’ Babel fish any time soon.