Dr. Martin Wheatman introduces Enguage, a language interpreter with the ability to program purely through speech.

On 21 June we’ll see the 70th anniversary of the Manchester Baby, marking the inception of stored programs as data. Programs, written in high-level languages, are defined by a fixed syntax from which an implementation, in a low-level language or machine code, can be derived.

This article describes a novel approach of using instructions defined at run-time. These instructions are identified by patterns rather than a formal syntax; and have intentions - their meanings - attached, to be executed when the pattern is matched.

One impact is that these instructions need no punctuation symbols - curly brackets and the like - and so can be spoken. This removes the need for the written artefact we know as a program.

Let’s look at a simple example: what is a factorial?

What is a factorial?

int factorial( int n ) {
       /* n <= 1 rather than n == 1, so that factorial( 0 ) also terminates */
       return n <= 1 ? 1 : n * factorial( n-1 );
}

If you can remember back to your secondary school maths, the factorial of a number is the number multiplied by the factorial of the number minus one. Keep this in mind. In essence, it can be written in C as a recursive function, above, or procedurally as:

int factorial( int n ) {
       int total = 1;
       while (n > 1)
              total *= n--;
       return total;
}

But could that description of factorial work: is speech ever precise enough? In the early days of computing, one idea was to make programming languages more like natural language, as with COBOL and CORAL-66, so that they would be easier to program in. In practice, this goal became ‘merely easier to read when programming’; programming itself did not become a trivial activity.

There remains a difficult mapping between natural and programming languages; over the years, many techniques have evolved to transform specifications into code.

However, if we think in utterances, surely, these thoughts can be expressed in further, more precise, utterances? For example, the factorial example can be broken down into:

the factorial of 1 is 1.

the factorial of N is N times the factorial of N-1.

This transforms the original definition into more precise statements because, in this case, we have defined where the recursion stops. But we’re still not looking at a traditional programming language: there isn’t a curly bracket in sight!

But this forms the basis of a complete textual definition, which can be read by the language engine Enguage. And, unlike the source code above, it has no punctuation, so it can be spoken. There is no need for a program as a written artefact.

A self-describing system

What distinguishes Enguage from other voice-controlled systems, such as Amazon’s remarkable Alexa, is, put simply, that it is self-describing. This simple but powerful technique is akin to the C compiler being written in C. What follows are the descriptions of factorial, developed using output from the speech-to-text function of Android, forming part of the Enguage unit test suite. The first of these is given simply as:

to the phrase the factorial of 1 reply 1.

ok.

This shows that inductive utterances can describe deductive ones. This construction of meaning involves adding intentions to an utterance pattern: what you want to think, do, or say when you match a given utterance. In this first case, the only intention is to return the value 1.

To highlight this inductive state, the machine will reply with:

go on

during construction to indicate it is waiting for further instruction. Induction is then terminated by uttering:

ok
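As a rough illustration of the idea, and not of how Enguage is actually implemented, the pairing of an utterance pattern with an intention can be sketched in C as a table that is filled in at run-time. The names struct sign, add_sign and interpret are hypothetical, as are the ‘sorry’ and ‘i do not understand’ replies; matching here is a plain string comparison, whereas real patterns would also carry variables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIGNS 16

/* A pattern paired with its intention; in this sketch the only kind of
 * intention is a fixed reply. All names here are hypothetical. */
struct sign {
    const char *pattern;
    const char *reply;
};

static struct sign signs[MAX_SIGNS];
static size_t sign_count = 0;

/* Models "to the phrase <pattern> reply <reply>": registers the pairing at
 * run-time and returns the machine's acknowledgement. */
const char *add_sign(const char *pattern, const char *reply)
{
    assert(pattern != NULL && reply != NULL);
    if (sign_count == MAX_SIGNS)
        return "sorry";              /* assumed reply when the table is full */
    signs[sign_count].pattern = pattern;
    signs[sign_count].reply = reply;
    sign_count++;
    return "ok";
}

/* Matches an utterance against the registered patterns (exact match only)
 * and carries out the attached intention by returning its reply. */
const char *interpret(const char *utterance)
{
    assert(utterance != NULL);
    for (size_t i = 0; i < sign_count; i++)
        if (strcmp(signs[i].pattern, utterance) == 0)
            return signs[i].reply;
    return "i do not understand";    /* assumed reply when nothing matches */
}
```

Calling add_sign("the factorial of 1", "1") returns "ok", after which interpret("the factorial of 1") returns "1"; no source code is edited or recompiled between the two calls.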

Enguage can describe statements as being spatially or temporally significant, as well as connect to third party programs via TCP/IP sockets. It is also supported by numeric and persistent classes through the perform command. Two such calls are used in the factorial example, one covering multiplication...

interpret multiply numeric variable a by numeric variable b thus.

first perform numeric evaluate variable a times variable b.

ok.

...and a corresponding one for subtraction. These allow a factorial to be fully described as a question, thus:

interpret what is the factorial of numeric variable n thus.

first subtract 1 from variable n.

then what is the factorial of whatever.

then multiply whatever by variable n.

then reply whatever the factorial of variable n is whatever.

ok.

‘Whatever’ refers to the answer given to the previous utterance: a reply is a formatted answer. Though this is ostensibly given in English, Enguage, in fact, works with arbitrary strings; it can work with any alphabet your speech-to-text software can deliver.
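To make the role of ‘whatever’ concrete, the four intentions of the spoken definition can be traced in C, one statement per utterance. This is only a sketch of the reading given above: the function names are hypothetical, the check for n equal to 1 stands in for the more specific ‘the factorial of 1’ description being matched first, and the wording of the formatted reply is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Traces "what is the factorial of numeric variable n".
 * `whatever` holds the answer given to the previous utterance. */
long factorial_by_utterance(long n)
{
    long whatever;

    assert(n >= 1);    /* this sketch only handles positive integers */

    /* Stands in for "to the phrase the factorial of 1 reply 1". */
    if (n == 1)
        return 1;

    whatever = n - 1;                             /* first subtract 1 from variable n */
    whatever = factorial_by_utterance(whatever);  /* then what is the factorial of whatever */
    whatever = whatever * n;                      /* then multiply whatever by variable n */
    return whatever;
}

/* Models the final "then reply ..." step: the answer is formatted into a
 * full sentence. Returns the length that snprintf reports. */
int factorial_reply(long n, char *out, size_t size)
{
    return snprintf(out, size, "the factorial of %ld is %ld",
                    n, factorial_by_utterance(n));
}
```

With a 64-byte buffer, factorial_reply(4, buf, sizeof buf) leaves "the factorial of 4 is 24" in buf. Each assignment to whatever overwrites the previous answer, which is why the spoken steps never need to name an intermediate variable.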

Any input, and any output, can be encoded: slot filling is less of a pejorative term when you define your own slots! This inductive framework could easily be translated into another language, or it can be used directly. There is more inductive description given in the Enguage test suite, such as:

this implies variable n is a positive integer.

if not reply sorry I cannot calculate the factorial of variable n.

Also, this is not interpretation. While the user can speak whatever the software has been designed to catch, the output has to be unequivocal. This is so that the user can be assured that the correct understanding of the utterance has been applied; hence the wordy factorial reply and ‘go on’ being the reply when constructing an utterance description.

There is also a disambiguation mechanism, so that where an unexpected reply is received, the phrase can be repeated, prefixed by ‘no’, to achieve a different meaning. Enguage is, therefore, a mediator, not an interpreter: it finds the most appropriate meaning for a given utterance.

Programming with utterance

Arguably, this is a new computing paradigm - perhaps an overused phrase - and it could possibly be a model of informatics. The universal machine can be seen as the forerunner of the modern process address space: a linear space of bytes, a program counter and discrete state changes. The Lambda Calculus is the forerunner of programming languages: a textual representation of algorithm.

These form a basis for computer science, inwardly looking at the ultimate truths of algorithm and silicon. Whereas Enguage, with its emphasis on pragmatism through arbitrary strings, is outward-looking. If informatics is an engineering discipline, outwardly facing society, then Enguage clearly fits within it.

Some may point out that Enguage depends on software written using source code, in particular for performing arithmetic. However, the same can be said for source code which is implemented in machine code, and even machine code is implemented in microcode or hardware, so dipping into other paradigms is no barrier.

Some may see the Enguage examples, above, as simply another expression of function. But, before we can reach that conclusion, there are several points to address. Firstly, there are several properties of functions which Enguage breaks, not least maintaining state, and passing out several values pertinent to its operation, such as an utterance’s felicity.

However, many of these have also been broken by the looser structuring of code through procedures. But most importantly, the ability to program Enguage moves it further away from functions: its inputs define its functionality. Ultimately, Enguage is unimportant; what is said to it is of consequence.

This example shows that the essence of factorial - a recursive algorithm - is not only delivered, but it is created, by utterance. The paradigm here is that voice programs voice. The impact is that we have reached a juncture where software works as its own user interface - the instruction sets envisaged in the Church-Turing thesis, plus the GUI paradigm, are unified.

This fits in with the BCS goal of ‘making IT good for society’: it should make a digital society for people with screens and keyboards into a digital society for all.

Feel free to try it for yourself:

https://github.com/martinwheatman/
