Translating from multiple modalities to text and back

Recent years have witnessed the development of a wide range of computational tools that process and generate natural language text. Many of these have become familiar to mainstream computer users in the form of web search, question answering, sentiment analysis, and notably machine translation.

The accessibility of the web could be further enhanced by applications that translate not only between different languages (for example, from English to French) but also within the same language, between different modalities, or between different data formats. The web is, after all, rife with non-linguistic data such as video, images and source code. Such data cannot easily be indexed or searched, since most retrieval tools operate over text.

In her Karen Spärck Jones Lecture on 23 October 2019, Professor Mirella Lapata argued that new types of translation models need to be developed to make electronic data more accessible to individuals and computers alike.

Professor Lapata focused on three examples: text simplification, source code generation, and movie summarization. She illustrated how recent advances in deep learning can be extended to induce general representations of different modalities and to learn to translate between these representations and natural language.