

Wave goodbye to the mouse
By Peter Ferne, Technical Consultant, Futurelab
Computer Vision, which comes under the umbrella of Artificial Intelligence, aims to get computers to 'understand' what they can 'see' when attached to a camera. Until recently it has been a somewhat arcane field of study that has not significantly impinged on the public consciousness, but that is about to change. Gestural interfaces apply computer vision techniques to track the motion of an object within a scene (typically a hand) and use this information to control the computer.
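To make the idea of tracking motion within a scene a little more concrete, here is a minimal sketch of one of the simplest approaches, frame differencing: compare two successive camera frames and find the centre of whatever changed. The function name, the threshold value and the representation of a frame as a list of rows of brightness values are all assumptions made for illustration; none of the products mentioned in this article is known to work this way.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # greyscale image as rows of 0-255 brightness values


def motion_centroid(prev: Frame, curr: Frame,
                    threshold: int = 30) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centre of the pixels that changed between two frames.

    Assumption: anything that changes by more than `threshold` is the moving
    hand; a real tracker would need to be far more discriminating.
    """
    sum_x = sum_y = count = 0
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (before, after) in enumerate(zip(row_prev, row_curr)):
            if abs(after - before) > threshold:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None  # nothing moved between the two frames
    return (sum_x / count, sum_y / count)
```

Feeding the centroid from each pair of frames to the rest of the program gives a crude pointer position that follows the moving hand.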
Although webcams have been around for a while - the first showed a coffee pot in a computer lab so that programmers several floors away could tell whether the pot was full or empty without making a wasted journey - it is only recently that the rapid spread of broadband has made them useful and cost effective communication tools. And as they become more common we are starting to see innovative new applications for them. Two things in particular are driving the mass adoption of cameras in computing devices - games and cameraphones.
The EyeToy, a webcam for the PlayStation which came bundled with a collection of minigames called EyeToy Play, has been a phenomenal success for Sony, selling 4 million copies worldwide within a year of launch and spawning a slew of follow-on games (EyeToy Antigrav, Groove, MonkeyMania etc). The idea has since crossed over to the Mac with ToySight and to the Xbox with the Xbox Live Camera, and will no doubt become a standard accessory on any gaming platform worth its salt.
Over a quarter of a billion cameraphones were sold worldwide in 2004, outselling digital still cameras by four to one, and Nokia is now well on the way to becoming the world's largest camera manufacturer. Most of these are being used to snap pictures and, with the introduction of 3G, to stream video (most visibly for TV news broadcasts by BBC journalists), but that is not all.
The twin effects of this flood of devices are firstly, and most obviously, a reduction in cost and an increase in quality of the devices, but secondly, and more significantly in the long run, more software developers coming up with more creative uses. Marble Revolution, for example, uses the camera in a very unexpected way: as a control mechanism for rolling a marble around a maze, which makes for an entirely natural interface. Of course the camera is also used as the controller for a wide range of EyeToy games, allowing the player to use their whole body to interact with the game.
Whilst the dream of being able to control machines by the power of thought alone (as demonstrated by James Burke in his Connections TV series as long ago as 1978) still seems to be some way off, we will soon be able to control them by pointing and waving at them. Perhaps you will wave goodbye to your PC to log off as you leave work. Research such as that being done by Professor Roberto Cipolla's Computer Vision and Robotics group at Cambridge University often seems to make its way slowly from the lab to the high street, but sometimes new products arrive from unexpected directions.
The film Minority Report probably did more than anything else to bring the idea of the gestural interface into public consciousness. The sight of Tom Cruise sifting through a vast collection of documents and photos on screen by waving his hands around clearly struck a chord with a lot of people. Why put up with the awkward translation from desktop to Desktop, 'pointing' at things on screen by pushing a mouse around your desk, when you can just point directly at the thing itself with your finger instead?
John Underkoffler was the technical advisor on the film tasked with making the vision of the future believable. The key to this was consistency. John and his team went so far as to make training videos for the non-existent interface so that the actors could use it convincingly. They were so convincing that when an engineer at US defence contractor Raytheon saw the film she contacted John to talk about building it for real.
John and his team retained the right to commercially exploit the system, and at the Game Developers Conference in San Francisco earlier this year they finally broke cover and announced G-Speak. Although, disappointingly, John didn't give a live demo of the system, he did show three videos of the interface being used in different scenarios.
The first clip saw the user exploring a large panoramic image. To pan left or right around the image he held his hand flat, more or less parallel to the screen, and simply slid the image on screen sideways. To zoom in to the image he pointed at the area of interest and moved his hand in towards the screen; to zoom out he moved it back out. And using two hands he was able to zoom and pan simultaneously.
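As a rough illustration of how gestures like these might be turned into view changes, here is a minimal sketch. It assumes some tracker already reports each hand's pose and how far it has moved since the last frame; the pose names "flat" and "point", the View class and the gain values are my own inventions for the example, not anything shown in the G-Speak videos.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    offset_x: float = 0.0  # horizontal pan of the image, in pixels
    zoom: float = 1.0      # magnification; 1.0 shows the image at normal size


def apply_hand(view: View, pose: str, dx: float, dz: float,
               pan_gain: float = 1.0, zoom_gain: float = 0.01) -> View:
    """Update the view from one hand's pose and movement since the last frame.

    dx is sideways movement; dz is movement away from the screen (negative
    means towards it). Pose names and gains are illustrative assumptions.
    """
    if pose == "flat":   # flat hand slides the image sideways
        return View(view.offset_x + pan_gain * dx, view.zoom)
    if pose == "point":  # pointing hand: towards the screen zooms in
        return View(view.offset_x, view.zoom * math.exp(-zoom_gain * dz))
    return view          # unrecognised pose: leave the view alone
```

Two-handed use then falls out naturally: call apply_hand once for each hand in the same frame and the pan and the zoom are applied together.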
In the second example the user was examining high resolution video footage. To play the video he held his hand flat, parallel to the floor, and slid his hand to the right. To play backwards he slid his hand to the left. To pause the clip he turned his hand through ninety degrees, thumb upwards, and restarted it by turning his hand flat again. And, again by using two hands, he was able to zoom in and out of a video while it was playing.
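The playback gestures amount to a small state machine, which could be sketched along the following lines. Again this is only a guess at the logic from watching the videos: the pose names "flat" and "thumb_up", the Playback class and the slide threshold are assumptions for illustration, and I have assumed that turning the hand flat again resumes playback in whichever direction it was going before the pause.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Playback:
    direction: int = 0    # +1 forwards, -1 backwards, 0 not yet started
    paused: bool = False

    @property
    def rate(self) -> int:
        """Signed playback rate to hand to the video player."""
        return 0 if self.paused else self.direction


def update(state: Playback, pose: str, dx: float,
           slide_threshold: float = 0.1) -> Playback:
    """Advance the playback state from the hand pose and sideways movement dx.

    Pose names and the threshold are hypothetical, chosen for this sketch.
    """
    if pose == "thumb_up":          # hand turned through ninety degrees: pause
        return replace(state, paused=True)
    if pose == "flat":              # flat hand: play, or resume after a pause
        direction = state.direction
        if dx > slide_threshold:    # slide to the right plays forwards
            direction = 1
        elif dx < -slide_threshold: # slide to the left plays backwards
            direction = -1
        return Playback(direction=direction, paused=False)
    return state                    # unrecognised pose: no change
```

Keeping the direction separate from the paused flag is what lets the clip pick up where it left off when the hand goes flat again.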
The last scenario showed a user flying through a 3D environment. The movement of his viewpoint through the scene was controlled in a similar fashion to the forward/stop/backward gestures of the video playback. His orientation within the scene was controlled by holding his thumb, index finger and middle finger at right angles to one another, echoing the x, y and z axes. Rotating his hand in any direction produced a corresponding rotation of the scene on screen.
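The three-finger gesture is easy to describe mathematically: three directions at right angles define a rotation. The sketch below builds a rotation matrix from the measured thumb and index finger directions. Which finger stands for which axis is my assumption (thumb as x, index as y), as are the function names; the middle finger direction is derived rather than measured, so that the result is always a proper rotation even when the fingers are not held at perfect right angles. It also assumes the two vectors are non-zero and not parallel.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def _normalise(v: Vec) -> List[float]:
    """Scale a vector to unit length (assumes it is not the zero vector)."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]


def hand_rotation(thumb: Vec, index: Vec) -> List[List[float]]:
    """Return a 3x3 rotation matrix whose columns are the hand's x, y, z axes.

    Assumption for this sketch: thumb = x axis, index finger = y axis, and
    the z axis (middle finger) is computed as their cross product.
    """
    x = _normalise(thumb)
    # Remove any part of the index direction that lies along the thumb,
    # so that y ends up exactly perpendicular to x (Gram-Schmidt step).
    dot = sum(a * b for a, b in zip(index, x))
    y = _normalise([a - dot * b for a, b in zip(index, x)])
    # The cross product x × y completes a right-handed set of axes.
    z = [x[1] * y[2] - x[2] * y[1],
         x[2] * y[0] - x[0] * y[2],
         x[0] * y[1] - x[1] * y[0]]
    return [[x[i], y[i], z[i]] for i in range(3)]
```

Applying the resulting matrix to the scene, frame by frame, would give the effect described: turn the hand and the scene turns with it.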
The overwhelming impression from watching these demonstrations was of the fluidity and naturalness of the gestures; they seemed to correspond remarkably closely to how you would manipulate the physical counterparts of the digital objects, making them easy to learn and use. Whilst there will no doubt be some detailed close-up work better suited to other tools, such as a keyboard or graphics tablet, for the bulk of day-to-day use gestural interfaces will quickly become the de facto standard.
I believe that after using a good gestural interface you would no more go back to using a mouse than to using punched cards. I expect that within five years all of our computing and communication devices will have cameras as standard accessories, or even built-in, and gesture recognition will be part of all mainstream operating systems.
And finally...
While you are waiting for the future to arrive you might like to practise not clicking on things with your mouse at the Institute for Interactive Research. If that tickles your fancy you could start using mouse gestures in your web browser or, if you use a Mac, across all of your applications.
If you've already invested in learning to touch type, the Das Keyboard claims to increase your speed and accuracy by the simple technique of having entirely blank keys. Yes, really. But if, like me, you hunt and peck and like to look at the keys as you type, the Optimus keyboard, where each key has its own screen, might be worth a look.
By Peter Ferne (www.petef.com)
Links
Computer Vision on Wikipedia: en.wikipedia.org/wiki/Computer_vision
The Trojan Room Coffee Pot: www.cl.cam.ac.uk/coffee/qsf/coffee.html
EyeToy: www.eyetoy.com
ToySight: www.toysight.com
Xbox Live Camera: news.teamxbox.com/xbox/8416/Xbox-360-Peripherals-High-Resolution-Pictures
Marble Revolution: www.bit-side.com/entertainment/MOBILE%20GAMES/Marble
Computer Vision and Robotics group, Cambridge University: mi.eng.cam.ac.uk/~cipolla/research.html
Minority Report - for real: www.defensetech.org/archives/001491.html
G-Speak: www.g-speak.com
Institute for Interactive Research: www.dontclick.it
Mouse Gestures for Firefox: optimoz.mozdev.org/gestures
xGestures - Mouse Gestures for OS X: stout.hampshire.edu/~bjk02/xGestures
Das Keyboard: daskeyboard.com
Optimus Keyboard: artlebedev.com/portfolio/optimus
August 2005