Tom McEwan CEng FBCS CITP from Edinburgh Napier University and Chair of the BCS Interaction SG takes a look at the new Microsoft Kinect and what it means for computer interaction.

At the start of the year, everyone’s Facebook profiles seemed to be full of pictures of family members jumping in the air on Christmas Day. The outputs of Microsoft’s Kinect and Xbox became this generation’s version of taking to the streets on Boxing Day to show off your roller skates, hula hoops or just that tasteful new pullover.

A few months on, are these items part of everyday life, or are they piled in the closet? Does Kinect illustrate how we will interact with information and media in future, with gestural interfaces freeing us from RSI? Are the iPad and Kinect really ‘game-changers’, technologies that nobody saw coming but that are destined to endure?

Is this the future of how we will interact with computers? Are we no longer to be shackled to WIMP (windows, icons, menus, pointers) interfaces, and can we look forward to lives free of RSI?

Kinect is a nice technology and a lot of fun, but its recent release confirms that we can predict what technology will be hot on the market in 2031 by looking at what is being demonstrated in labs today. Just as WIMP had a lengthy gestation, from Doug Engelbart’s demo in 1968 through to commercial systems in the mid-1980s, Kinect is the latest refinement of a series of innovations.

Ten years ago, in an article for the Spring 2001 issue of BCS’s Interfaces magazine, I reviewed Intel’s Me2Cam - a £30 webcam on sale in the high street that let you interact with software through gestures made in front of the camera. Similar technology appeared later in Sony’s EyeToy for the PlayStation, and Kinect is basically a 3D version of this - adding the ability to detect the distance of objects from the sensors.

At the time I wrote the article, I was still fairly new to the world of academia, and surprised that a commercial product like this could appear for which I could find no recent academic research. After a bit of digging around, it turned out that the research for this had been pretty well wrapped up 15-20 years earlier by Myron W. Krueger. This research had been applied not just for computer games but in machine vision, for example in factories.

The length of time it takes for a research breakthrough to become a mainstream product might seem surprising, but there are many reasons for this.

Innovation diffuses slowly - something that Harvard economist Zvi Griliches pointed out in the 1950s (in relation to hybrid seed corn), and a body of work that has been mined repeatedly ever since - you might be familiar with the popular business book by Geoffrey Moore, Crossing the Chasm. Gaines & Shaw provide a related model, BRETAM (breakthrough, replication, empiricism, theory, automation, maturity), suggesting that it takes eight years to go through each of these stages.

So what breakthroughs are the ones that we should be watching out for? The one that leaps to mind is Brain-Computer Interaction, although this too is surprisingly well-trodden in research terms, with both invasive (such as cochlear implants) and non-invasive (such as those based on electroencephalography (EEG)) technologies turning into widely available, if still-expensive, commercial products.

Another interesting area is the use of eye-tracking devices, now no more intrusive than a webcam and widely used in web usability studies, to measure not only attention but emotional response. However, as with most attempts to infer thoughts by measuring physical reactions, we can usually tell that something is going on, but not what it signifies. The commercial field of neuro-marketing can feel peppered with half-substantiated claims reminiscent of the era of ‘subliminal cuts’.

We have to wait and see what people can and will do with all of these technologies - and this is another reason for the slow diffusion of innovation. What the OECD define as ‘non-R&D innovation’ seems to have far more significance than R&D in new interaction technologies. The psychoacoustic phenomena found in the 1960s and 1970s informed the definition of MP3 by the end of the eighties.

This led to the first online sales of MP3-encoded music by 1994 and the first portable MP3 players by 1998. But it was only after a few years of seeing how consumers used these devices that Apple started work on the iPod, and it’s only now that online sales outstrip physical sales in the singles charts. What happened in between was less to do with R&D (beyond the application of Moore’s Law) and more to do with how the consumers and the producers of music formed new dance steps around each other’s interests.

So, as well as looking at research labs for a guide to the long-term future, look at current novel products in the market to see what things will be like five years hence. Recently I’ve had a chance to play with some of the new tech on the block.

I’ve been demonstrating Edinburgh Napier’s Interactive Collaborative Environment (ICE) room to a dozen groups of first-year students. After seeing colleagues demonstrate Microsoft’s Kinect, my family had a new device under the TV on Xmas morning. I also took possession of the new Galaxy Tab - half the size and weight of an iPad, Android-based and already becoming a constant in our lives.

My experiences with these technologies suggest a growing division between interactions that depend on selecting one or more points on a screen, and moving or clicking, and collision-based interaction of different objects displayed on a screen. While WIMP, and before that the command line, have been a part of our lives for years, let’s remember that, for a new generation, the single point that you click, double click, right-click, touch and/or drag is no longer dominant in the market.

Almost every phone and laptop sold now supports multi-touch - the ability to touch two or more locations on screen (or touchpad) and, by means of a gesture, do things, for example pinching and stretching with two fingers to expand or contract an image. Yet when you think about it, this gesture is not really obvious - when you let go of a stretched picture shouldn’t it contract?
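As a rough illustration of the pinch-and-stretch gesture, the zoom applied to an image can be taken as the ratio of the current distance between two fingers to the distance when they first touched. This is a minimal sketch only: the function name and the (x, y) tuple format are assumptions made for illustration, not any vendor's actual API.

```python
import math


def pinch_scale(start_a, start_b, now_a, now_b):
    """Return the zoom factor implied by two fingers moving apart or together.

    Each argument is a hypothetical (x, y) touch position in pixels.
    A result above 1 means stretch (zoom in); below 1 means pinch (zoom out).
    """
    start_distance = math.dist(start_a, start_b)
    current_distance = math.dist(now_a, now_b)
    if start_distance == 0:
        raise ValueError("the two fingers must start at distinct points")
    return current_distance / start_distance


# Fingers start 5 pixels apart and end 10 pixels apart.
print(pinch_scale((0, 0), (3, 4), (0, 0), (6, 8)))
```

Note that the scale persists after the fingers lift, which is exactly the behaviour questioned here: nothing in the physical world stays stretched when you let go.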

At HCI2009’s Open house a team from Microsoft demonstrated an alternative circular gesture to zoom in or out. Again, this will be familiar to some, e.g. users of telescopes and binoculars, but to the average user?

Teams around the world seek out new, yet familiar, gestures in order to patent or trade mark them. We are starting to see the first few threats of gesture-related lawsuits since the 1998 Standard Life v BT spat over the rights to using one’s hand, in an advert, to indicate a phone call. Putting two fingers up to those who claim IP in gestures might cost you!

Anyway - back to multi-touch. Edinburgh Napier’s ICE has been built to explore how we cope with multiple hands all touching the same back screen at once. When multiple people flick documents back and forth to each other, or draw multiple dataflow lines at once, this leads to their hands overlapping. How do we know which point of contact with the touchscreen belongs to whom?

One solution detects the finger locations and the heat of the palm to infer a hand and to keep this separate from another hand. We can happily have a hundred simultaneous points of contact on our big table (although processing all those instructions, as you can imagine, has an overhead).
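To give a flavour of how contact points might be attributed to hands, the sketch below assigns each touch to the nearest detected palm, provided it lies within a plausible finger's reach. It is not the ICE table's actual algorithm: the function name, the coordinate format and the 120-pixel reach are all assumptions chosen for illustration.

```python
import math


def assign_touches_to_hands(touches, palms, max_reach=120.0):
    """Map each touch to the index of the nearest palm, or None if too far.

    touches and palms are lists of hypothetical (x, y) positions in pixels;
    max_reach is an assumed upper bound on fingertip-to-palm distance.
    """
    owners = []
    for touch in touches:
        best_palm, best_distance = None, max_reach
        for index, palm in enumerate(palms):
            distance = math.dist(touch, palm)
            if distance <= best_distance:
                best_palm, best_distance = index, distance
        owners.append(best_palm)
    return owners


palms = [(100, 100), (400, 100)]
touches = [(110, 40), (390, 50), (800, 800)]
print(assign_touches_to_hands(touches, palms))
```

Every touch is compared with every palm, so the work grows with the product of the two counts, which hints at the processing overhead of a hundred simultaneous contacts.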

In any case, is the point of contact always what we are interested in? Do we use this to infer something that we would be better off detecting in other ways? Computer games don’t rely only on what you click. In fact most of the interaction is the result of you controlling an object that collides with another object in some way - either by simulating impact in a 3D space, or by overlapping in a 2D view of a 3D world.
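The 2D overlap case can be sketched with the common axis-aligned bounding box test, in which two on-screen objects collide when their rectangles intersect on both axes. The Box class and its field names are assumptions for illustration; real games typically use finer-grained shapes.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A hypothetical on-screen object: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def overlaps(a, b):
    """Return True when the two rectangles intersect on both axes."""
    return (
        a.x < b.x + b.w
        and b.x < a.x + a.w
        and a.y < b.y + b.h
        and b.y < a.y + a.h
    )


avatar_hand = Box(0, 0, 10, 10)
ball = Box(5, 5, 10, 10)
print(overlaps(avatar_hand, ball))
```

Here no point is ever selected: the interaction is triggered simply because one object the player controls has moved into another.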

The Kinect, Nintendo Wii and Sony PlayStation Eye systems all use your physical movement to control an on-screen avatar. In a few cases the avatar points and clicks at something, but what is more interesting is that the avatar is effectively immersed in an environment and interacts with other virtual objects within it.

The future

In trying to predict the future interactions you might have with computers, don’t just think about the latest amazing technology; think about how, and with what, the human controlling it needs to interact in the information space. Then think about the extent to which that technology can support that interaction.

Some things are as simple as ‘select object, do action’ (or vice versa) while others are more immersive and about being in that world. Then, perhaps your technology investments will remain in use, rather than get buried in a closet after the fun has gone.

Further references

  • Krueger, M., Artificial Reality, Addison-Wesley, 1983.
  • Freeman, W.T. and Weissman, C., Television control by hand gestures. In M. Bichsel, editor, Intl. Workshop on automatic face and gesture-recognition, pp. 179-183, Zurich, Switzerland, 1995. Dept. of Computer Science, University of Zurich, CH-8057.