Martin Wheatman, winner of the 2016 BCS SGAI Machine Intelligence Competition, describes how natural language could replace the GUI metaphor by making our smartphones chatty.

You may have recently seen a technology piece on the BBC website about ‘chatty’ smartphones from those nice folks at Google. The Beeb’s technology correspondent, Rory Cellan-Jones, says something like, ‘Ok Google, who is the current Prime Minister of Great Britain?’ and it comes up with the correct answer. He then asks, ‘how old is she?’, and it resolves the pronoun ‘she’ to Theresa May and comes up with another correct reply. Spot on.

The search engine giant’s James Nugent then says that, because we talk like this, the assistant is better than a screen and keyboard; certainly, as a front-end to search, it is a useful tool.

But hang on, do we really talk like this? Imagine the exciting evening you’re in for if you get caught between two people exchanging facts at a dinner party! A conversation is clearly not just a string of facts: it needs to be able to express the non-computable as well.

Constructing reality

Facts are traditionally divided into the analytic, which are self-evident, and the synthetic, which are based on experience. So ‘all bachelors are unmarried’ is analytic, whereas ‘Martin is a bachelor’ is synthetic. However, there are also non-computable ideas: ‘what is my favourite colour?’, ‘what is the name of my son?’ or ‘do I like celery?’

Nobody can know these things: they are not self-evident, you can’t search for them on the internet, and they may not even be true; they are subjective. Such utterances co-operate to form repertoires which support some facet of reality, so the concept of need is composed of ‘I need X’, ‘I do not need X’, ‘what do I need?’ and ‘do I need X?’. This provides a framework within which analytic and synthetic facts can be set.
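The following minimal Python sketch gives a rough illustration of how the four ‘need’ utterances co-operate over one shared piece of state. It assumes a single in-memory list for one user and fixed reply wording. The class `NeedsList` and its replies are hypothetical and do not reflect how Enguage is actually implemented.

```python
class NeedsList:
    """Hypothetical sketch of a 'need' repertoire for one user (not Enguage's code)."""

    def __init__(self) -> None:
        self.items: list[str] = []  # what the user has said they need

    def interpret(self, utterance: str) -> str:
        # Normalise case and drop trailing punctuation before matching.
        u = utterance.strip().rstrip("?.!").lower()
        if u == "what do i need":
            if not self.items:
                return "you don't need anything"
            return "you need " + ", ".join(self.items)
        if u.startswith("do i need "):
            x = u[len("do i need "):]
            if x in self.items:
                return "yes, you need " + x
            return "no, you don't need " + x
        if u.startswith("i do not need "):
            x = u[len("i do not need "):]
            if x in self.items:
                self.items.remove(x)
                return "ok, you don't need " + x
            return "I know"
        if u.startswith("i need "):
            x = u[len("i need "):]
            if x in self.items:
                return "I know"
            self.items.append(x)
            return "ok, you need " + x
        return "I don't understand"


me = NeedsList()
print(me.interpret("I need a coffee"))  # ok, you need a coffee
print(me.interpret("what do I need?"))  # you need a coffee
```

Nothing here draws on outside knowledge. Every answer is reflected back from what the user said earlier.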

The problem with such ‘constructed’ realities, though, is that they are effectively hearsay; in a court of law they would probably carry the least weight. Does this mean they are of no importance? In 1950, Alan Turing proposed a parlour game that used just such subjective questioning to create a reality [1]. It was not necessarily the real one, but one in which to determine whether men and women could be distinguished solely by their responses.

By extension, this might be used to tell whether a machine could be indistinguishable from a human, neatly side-stepping the thorny question of what intelligence is. In the intervening years, this constructed reality has been drowned out by the mantra, ‘if you can maintain a conversation, you must be intelligent’.

The creation of meaningful reality, however, has not been lost. The building of worlds, the objective model of subjectivity, has flourished in computing, particularly through object-oriented technologies, but this has remained firmly within the sphere of the context-free. Until now. This article describes the winning entry at the 2016 BCS SGAI Machine Intelligence Competition, its concepts and their common features.

Ok Enguage, make it so

Enguage, the language engine, is a program which maps one utterance, an arbitrary string of strings, onto another. Thus, it deconstructs natural language in natural language, all within the context-dependent domain. It therefore recognises subtle nuances in speech which have a significant effect on meaning. It can deal with, for example, how ‘I have a coffee’ is not equivalent to ‘I need a coffee’, whereas ‘I need to go to town’ is the same as ‘I have to go to town’.

This is not a chatbot: the whole utterance must be matched, rather than a keyword or phrase, and the emphasis is not on maintaining a conversation but on replacing the graphical interface metaphor.
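The Python sketch below makes ‘the whole utterance must be matched’ concrete. It matches an utterance against a pattern written in the style Enguage uses in its concepts, such as ‘I am meeting PHRASE-WHOM’. The matching conventions are assumptions for illustration:

- An all-capitals word of two or more letters (such as SUBJECT) binds exactly one word.
- A `PHRASE-` variable binds one or more words.
- Any other word must match literally, ignoring case.
- A match succeeds only if every word of the utterance is consumed.

The functions `match` and `translate` are hypothetical names.

```python
from typing import Optional

Bindings = dict[str, str]


def _is_variable(token: str) -> bool:
    # Assumed convention: all-capitals words of two or more letters are variables,
    # so the pronoun "I" is still matched literally.
    return len(token) > 1 and token.isupper()


def _match(pat: list[str], words: list[str], bound: Bindings) -> Optional[Bindings]:
    if not pat:
        # Succeed only when the whole utterance has been consumed.
        return bound if not words else None
    if not words:
        return None
    head, rest = pat[0], pat[1:]
    if head.startswith("PHRASE-"):
        name = head[len("PHRASE-"):]
        # Try each non-empty run of words for the phrase, shortest first.
        for i in range(1, len(words) + 1):
            result = _match(rest, words[i:], {**bound, name: " ".join(words[:i])})
            if result is not None:
                return result
        return None
    if _is_variable(head):
        return _match(rest, words[1:], {**bound, head: words[0]})
    if head.lower() == words[0].lower():
        return _match(rest, words[1:], bound)
    return None


def match(pattern: str, utterance: str) -> Optional[Bindings]:
    """Return variable bindings if the whole utterance fits the pattern, else None."""
    return _match(pattern.split(), utterance.split(), {})


def translate(pattern: str, template: str, utterance: str) -> Optional[str]:
    """Rewrite a matching utterance into another utterance, as a simple mapping does."""
    bindings = match(pattern, utterance)
    if bindings is None:
        return None
    return " ".join(bindings.get(token, token) for token in template.split())


print(translate("I am meeting PHRASE-WHOM", "_user is meeting WHOM", "I am meeting my brother"))
# prints: _user is meeting my brother
```

A keyword-spotting chatbot might fire on any sentence containing ‘meeting’. This matcher instead rejects ‘I am not meeting anyone’, because the utterance as a whole does not fit the pattern.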

Deconstruction is interspersed with context-free function calls, and mappings are organised into concepts. The following is an excerpt from the meeting concept [3], which is read in whenever the word ‘meeting’ is uttered.

On “I am meeting PHRASE-WHOM”, _user is meeting WHOM.

On “SUBJECT is meeting PHRASE-WHOM”:
  assert SUBJECT is meeting WHOM;
  reply “I know”;
  if not, perform “list add SUBJECT meeting WHOM”;
  set output format to “,LOCATOR LOCATION, WHEN”;
  then, reply “ok, SUBJECT is meeting ...”.

This shows two mappings: a simple translation and a more complex transformation. An utterance is deconstructed until a reply is found. Whether each thought within the deconstruction is interpreted depends on the felicity of the thought before it. If the assertion, for example, is infelicitous, the dependent reply “I know” is not interpreted, and the ‘if not’ thought is interpreted instead. Implementing general utterances in terms of more explicit ones allows a rich variety of utterances to be supported.
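The felicity rule can be sketched in Python as a list of guarded thoughts. This is one assumption-laden reading of the meeting excerpt, not Enguage's interpreter. It assumes the following rules:

- An unprefixed thought, or one prefixed ‘then,’, runs only if the last interpreted thought was felicitous.
- An ‘if not,’ thought runs only if the last interpreted thought was infelicitous.
- A skipped thought leaves the felicity unchanged.
- The first reply ends the deconstruction.

The `meeting` function, its in-memory `store` and the shortened final reply ‘ok’ are hypothetical, and the output-format step is omitted.

```python
from typing import Callable, Optional

# A thought returns (felicitous, reply); reply is None when the thought does not answer.
Result = tuple[bool, Optional[str]]
Thought = tuple[str, Callable[[], Result]]  # (guard, action); guard is "then" or "if not"


def interpret(thoughts: list[Thought]) -> Optional[str]:
    """Interpret thoughts in order, gated on the felicity of the last one interpreted."""
    felicitous = True
    for guard, action in thoughts:
        runs_when = guard == "then"  # "then" needs felicity; "if not" needs infelicity
        if felicitous != runs_when:
            continue  # skipped thoughts leave the felicity unchanged
        felicitous, reply = action()
        if reply is not None:
            return reply  # the first reply ends the deconstruction
    return None


def meeting(store: set[tuple[str, str]], subject: str, whom: str) -> Optional[str]:
    """Hypothetical rendering of the 'SUBJECT is meeting PHRASE-WHOM' mapping."""

    def add_meeting() -> Result:
        store.add((subject, whom))  # stands in for: perform "list add ..."
        return True, None

    thoughts: list[Thought] = [
        ("then", lambda: ((subject, whom) in store, None)),  # assert SUBJECT is meeting WHOM
        ("then", lambda: (True, "I know")),                   # reply "I know"
        ("if not", add_meeting),                              # if not, perform "list add ..."
        ("then", lambda: (True, "ok")),                       # then, reply (shortened to "ok")
    ]
    return interpret(thoughts)


store: set[tuple[str, str]] = set()
print(meeting(store, "Martin", "his brother"))  # ok
print(meeting(store, "Martin", "his brother"))  # I know
```

The first call records the meeting and replies ‘ok’. Repeating the same utterance makes the assertion felicitous, so the sketch answers ‘I know’ instead.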

Concepts are loaded and unloaded automatically, giving a spontaneity to language support. In this way, the system mediates utterances, finding the most appropriate understanding in ambiguous situations [2].
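The meeting concept is read in whenever ‘meeting’ is uttered, and the Python sketch below is one way to picture that automatic loading and unloading. Several details are assumptions made for illustration:

- The trigger rule: a concept loads when its name appears as a word in the utterance.
- The unloading policy: a concept is dropped after `max_idle` utterances in which it is not mentioned.
- The names `ConceptLoader`, `prepare` and `patterns`, and the ‘need’ patterns in the library.

```python
class ConceptLoader:
    """Hypothetical sketch: load a concept when its name is uttered, unload it when idle."""

    def __init__(self, library: dict[str, list[str]], max_idle: int = 3) -> None:
        self.library = library  # concept name -> the patterns it contributes
        self.max_idle = max_idle
        self.idle: dict[str, int] = {}  # loaded concept -> utterances since last mentioned

    def prepare(self, utterance: str) -> list[str]:
        """Update the loaded set for this utterance and return the loaded concept names."""
        words = set(utterance.lower().rstrip("?.!").split())
        for name in list(self.idle):
            self.idle[name] += 1
            if self.idle[name] > self.max_idle:
                del self.idle[name]  # unload a concept nobody has mentioned lately
        for name in self.library:
            if name in words:
                self.idle[name] = 0  # load (or refresh) a concept that was just uttered
        return sorted(self.idle)

    def patterns(self) -> list[str]:
        """Patterns currently available for matching, from loaded concepts only."""
        return [p for name in sorted(self.idle) for p in self.library[name]]


library = {
    "meeting": ["I am meeting PHRASE-WHOM", "SUBJECT is meeting PHRASE-WHOM"],
    "need": ["I need PHRASE-X", "what do I need"],  # hypothetical patterns
}
loader = ConceptLoader(library, max_idle=1)
print(loader.prepare("I need a coffee"))          # ['need']
print(loader.prepare("and another"))              # ['need']
print(loader.prepare("I am meeting my brother"))  # ['meeting']
```

Keeping a concept loaded for a few turns is what would let a follow-up such as ‘and another’ still reach the need concept. The right policy is a design choice the article does not specify.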

This programmability means that the system can be extended to support new concepts without having to recreate the same concept for similar ideas. Common features include numerical abilities, such as verbal arithmetic (‘what is 1 + 2?’), which other systems can presumably also handle because it is self-evident.

However, Enguage also employs arithmetic within concepts: ‘I need a coffee / and another’. It also demonstrates the ability to be corrected: ‘I need a coffee / and another / no, I need another three’. Unless you are going to force users to learn a menu and interact vocally with your system, you will need the features of something very much like Enguage.
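Below is a minimal Python sketch of that exchange, under one particular reading of it:

- ‘another’ refers back to the item last mentioned.
- ‘another three’ means three more.
- An utterance beginning ‘no’ undoes the effect of the previous utterance before being interpreted.

On this reading, the dialogue ends with four coffees. The number-word table, the class `Needs` and the reply wording are hypothetical, and plurals are not handled.

```python
from typing import Optional

# Hypothetical, deliberately tiny number-word table.
NUMBERS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


class Needs:
    """Hypothetical sketch of counting needs with 'another' and corrections."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}
        self.last: Optional[tuple[str, int]] = None  # change made by the previous utterance
        self.topic: Optional[str] = None  # item that 'another' refers back to

    def say(self, utterance: str) -> str:
        words = utterance.lower().replace(",", "").rstrip("?.!").split()
        if words[:1] == ["no"]:
            # Correction: undo the previous change, then interpret the rest afresh.
            if self.last is not None:
                item, amount = self.last
                self.counts[item] -= amount
                self.last = None
            words = words[1:]
        if words[:1] == ["and"]:
            # 'and another' is treated as elliptical for 'I need another'.
            words = ["i", "need"] + words[1:]
        if words[:2] != ["i", "need"] or len(words) < 3:
            return "I don't understand"
        rest = words[2:]
        if rest[0] == "another":
            if self.topic is None:
                return "another what?"
            item = self.topic
            amount = NUMBERS.get(rest[1], 1) if len(rest) > 1 else 1
        elif rest[0] in NUMBERS and len(rest) > 1:
            item, amount = " ".join(rest[1:]), NUMBERS[rest[0]]
        else:
            item, amount = " ".join(rest), 1
        self.counts[item] = self.counts.get(item, 0) + amount
        self.last = (item, amount)
        self.topic = item
        return f"ok, you need {item} (total {self.counts[item]})"


needs = Needs()
for line in ["I need a coffee", "and another", "no, I need another three"]:
    print(line, "->", needs.say(line))
# I need a coffee -> ok, you need coffee (total 1)
# and another -> ok, you need coffee (total 2)
# no, I need another three -> ok, you need coffee (total 4)
```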

You can search the internet for this

This non-computable approach is also apparent in other apps, such as Memrica Prompt, which is aimed at helping people with early-stage dementia but should prove useful to us all: what is the name of the window cleaner? Where am I meeting my brother tomorrow? Such a spatio-temporal concept, meeting, has been developed as a prototype [3].

Current development is focused on refactoring and on reducing the onerous task of teaching a language [3]. To facilitate discussion, the source code is available at https://github.com/martinwheatman. This work shows that machine understanding can work on two levels. The first is that of the concept, shared between the machine and the user. The second is that of the personal meaning reflected back to the user, since the machine need not know what is meant by ‘coffee’. It also shows the pre-eminence of ideas over facts.

A fact can be true or false, and known or unknown; an idea, however, affords the imagination necessary to deal with conundrums such as ‘if we are holding hands, whose hand am I holding?’ [4]. Unless and until machines can deal with such subtleties, they won’t get invited to the right sort of dinner parties.

Key concepts and further reading

Several assistant-type apps are available from companies other than Google, some of which are not directed at search; the point, however, is that software is directed, whereas non-computable ideas are universal.

Context-dependent analysis means the utterance need not be English, or indeed a vocal language: if an utterance can be serialised, it can be mediated.