Recent years have witnessed the development of a wide range of computational tools that process and generate natural language text. Many of these have become familiar to mainstream computer users in the form of web search, question answering, sentiment analysis, and notably machine translation.

The accessibility of the web could be further enhanced with applications that translate not only between different languages (for example, from English to French) but also within the same language, between different modalities, or between different data formats. The web is, after all, rife with non-linguistic data such as video, images, and source code. Such data cannot easily be indexed or searched, since most retrieval tools operate over text.

In her Karen Spärck Jones Lecture on 23 October 2019, Professor Mirella Lapata argued that, in order to render electronic data more accessible to individuals and computers alike, new types of translation models need to be developed.

Professor Lapata focused on three examples: text simplification, source code generation, and movie summarization. She illustrated how recent advances in deep learning can be extended in order to induce general representations for different modalities and learn how to translate between these and natural language.

Watch Professor Lapata's 2019 Karen Spärck Jones Lecture

About the speaker

Mirella Lapata is professor of natural language processing in the School of Informatics at the University of Edinburgh.

Her research focuses on getting computers to understand, reason with, and generate natural language. She was the first recipient (2009) of the BCS Information Retrieval Specialist Group (BCS/IRSG) Karen Spärck Jones Award and is a Fellow of the Royal Society of Edinburgh.

She has also received best paper awards at leading NLP conferences and has served on the editorial boards of the Journal of Artificial Intelligence Research, the Transactions of the ACL, and Computational Linguistics. In 2018 she was president of SIGDAT, the group that organises EMNLP.