Volume 2 No. 5, September 1999

TECHNOLOGY

Artificial Intelligence: Encounters with the Real World

How do we create a mind? This quest goes back to ancient times and is far older than the colourful name "Artificial Intelligence" (AI), which is only forty years old. In its initial avatar AI was purely cerebral, and it could solve only toy puzzles. In recent years, realizing that intelligence becomes stunted without participatory stimulus, AI has reincarnated itself in the real world, which it tries to fathom through sensors, actuators, and active exploration. We describe here a stream of interdisciplinary research at the Center for Robotics at IIT Kanpur which looks at AI in this context, using tools such as Computer Vision, Machine Learning, and Natural Language Modelling.

Visuo-Motor Learning in Robot Soccer

The robot with a camera must decide what it will do. To the camera the image shown is just a series of numbers (255 is the brightest and 0 is black). The top of the ball looks like this:

22   24   20   32   23   24   25   27   26   24   20   23   24   24
22   23   24   25   27   26   123   121   17   14   23   24   24   110
25   27   30   24   52   172   131   69   20   24   23   24   24   30
24   23   24   24   80   144   125   132   21   24   18   24   19   20
23   24   24   50   125   177   175   98   60   72   31   29   20   24

From this, the robot must figure out a course of action. In learning the soccer goal-hitting task, it classifies this type of image data (the visual stimulus) into a set of states, and then takes an action which is initially random. In each visual state, actions which have a higher likelihood of scoring a goal are reinforced (Reinforcement Learning). This process, initially coded in simulation, is implemented on a real robot as shown here. Having started with learning to score goals without any opposition, we are now in the process of building a team of three micro-robots for playing soccer. The strategy for the team is also learned using Reinforcement Learning.
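As a rough illustration of how such a grid of numbers might be reduced to a visual state, the sketch below takes the brightness values listed above, keeps the pixels brighter than a cut-off, and reports whether they lie mostly in the left, centre, or right third of the image. The cut-off of 100, the three-way split, and the function name ball_state are assumptions made for this example; the article does not say how the states are actually formed.

```python
# Brightness values copied from the image fragment above (0 = black, 255 = brightest).
IMAGE = [
    [22, 24, 20, 32, 23, 24, 25, 27, 26, 24, 20, 23, 24, 24],
    [22, 23, 24, 25, 27, 26, 123, 121, 17, 14, 23, 24, 24, 110],
    [25, 27, 30, 24, 52, 172, 131, 69, 20, 24, 23, 24, 24, 30],
    [24, 23, 24, 24, 80, 144, 125, 132, 21, 24, 18, 24, 19, 20],
    [23, 24, 24, 50, 125, 177, 175, 98, 60, 72, 31, 29, 20, 24],
]


def ball_state(image, threshold=100):
    """Return 'left', 'centre', 'right', or 'none' for the bright region.

    The threshold and the three-way split are illustrative assumptions.
    """
    # Collect the column index of every pixel brighter than the threshold.
    bright_columns = [
        col
        for row in image
        for col, value in enumerate(row)
        if value > threshold
    ]
    if not bright_columns:
        return "none"
    mean_col = sum(bright_columns) / len(bright_columns)
    width = len(image[0])
    # Map the mean column to one of three equal-width bands.
    third = min(int(3 * mean_col / width), 2)
    return ("left", "centre", "right")[third]


print(ball_state(IMAGE))  # centre
```

A real system would need a far richer description (ball size, goal position, opponents), but the idea of collapsing many pixel values into a few discrete states is the same.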

Such robots are built based on a new model of intelligence called the reactive or agent model.

Earlier AI attempted to create intelligence by integrating complete knowledge of the world into a set of rules, but this failed due to the increasing complexity of consistently combining a large set of rules.

Robot learns to shoot goals: Initially the robot moves randomly, but it reinforces those actions that eventually result in a goal. Gradually it forms a model of actions, e.g., "If the ball is to the left of the scene and the goal is in the centre, turn right."
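The reinforcement idea can be sketched in a few lines. The example below is a simplified, one-step form of reinforcement learning on a made-up simulator, not the algorithm used on the actual robot: the states (ball and goal each at left, centre, or right), the three actions, the learning rate, and the function toy_goal_scored are all assumptions chosen so that the rule quoted above emerges.

```python
import random

POSITIONS = ["left", "centre", "right"]
ACTIONS = ["turn_left", "go_straight", "turn_right"]


def toy_goal_scored(ball, goal, action):
    """Hypothetical stand-in for a real trial: True if the action scores."""
    b, g = POSITIONS.index(ball), POSITIONS.index(goal)
    if b < g:
        return action == "turn_right"
    if b > g:
        return action == "turn_left"
    return action == "go_straight"


def learn_policy(episodes=5000, alpha=0.5, epsilon=0.2, seed=0):
    """Learn which action to take in each (ball, goal) state by trial and error."""
    rng = random.Random(seed)
    states = [(b, g) for b in POSITIONS for g in POSITIONS]
    # Every action starts with the same value, so early choices are random.
    value = {s: {a: 0.0 for a in ACTIONS} for s in states}
    for _ in range(episodes):
        state = rng.choice(states)
        if rng.random() < epsilon:
            # Occasionally explore a random action.
            action = rng.choice(ACTIONS)
        else:
            # Otherwise pick the best-valued action, breaking ties at random.
            best = max(value[state].values())
            action = rng.choice([a for a in ACTIONS if value[state][a] == best])
        reward = 1.0 if toy_goal_scored(state[0], state[1], action) else 0.0
        # Reinforce: move the stored value towards the reward just observed.
        value[state][action] += alpha * (reward - value[state][action])
    return {s: max(ACTIONS, key=lambda a: value[s][a]) for s in states}


policy = learn_policy()
print(policy[("left", "centre")])
```

On the real robot the reward arrives only after a sequence of moves, so the credit for a goal has to be spread back over earlier actions; this sketch leaves that part out.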

The reactive model attempts to encode intelligence as a set of behaviors each of which acts on a limited context such as the world of the soccer model above or the urban park model of the natural language understanding system to be discussed below. These agents are then able to function in these niche domains, and it is hoped that combining many such agents will lead to a more general model of intelligence.

From Storytelling to Cinema

Natural Language research, like Computer Vision, has shifted its goals from trying to "understand" to the more limited objective of executing the task at hand. In this context-driven work, a story about a scene in a park is typed in and then interpreted by a context-specific semantic model. The model includes details about the appearance of each object, its possible articulations, size ranges, nominal behaviour, etc. It also interprets spatial relations involving objects or agents ("There is a man near the hedge."). Finally, the system creates all the objects and agents and then animates the actions described in each scene as they are typed in, resulting in a cinema from the story.
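To make the interpretation of a spatial relation concrete, here is a minimal sketch of how a sentence such as "There is a man near the hedge." might be turned into a position in the scene. Everything specific in it is an assumption for illustration: the landmark coordinates, the distance range attached to each relation, the single sentence pattern, and the function name interpret. The actual system uses a much richer semantic model.

```python
import re

# Hypothetical park model: landmark positions in metres (assumed values).
LANDMARKS = {"hedge": (10.0, 5.0), "bench": (2.0, 8.0)}

# Assumed meaning of each relation as a (min, max) distance in metres.
RELATION_RANGE = {"near": (0.5, 2.0), "far from": (8.0, 15.0)}

# One sentence pattern only: "There is a/an <object> <relation> the <landmark>."
PATTERN = re.compile(
    r"There is an? (\w+) (near|far from) the (\w+)\.?$", re.IGNORECASE
)


def interpret(sentence):
    """Return a scene entry for the sentence, or None if it is not understood."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    obj = match.group(1).lower()
    relation = match.group(2).lower()
    landmark = match.group(3).lower()
    if landmark not in LANDMARKS:
        return None
    low, high = RELATION_RANGE[relation]
    distance = (low + high) / 2.0  # pick the middle of the allowed range
    lx, ly = LANDMARKS[landmark]
    # Place the object to one side of the landmark at the chosen distance.
    return {
        "object": obj,
        "relation": relation,
        "landmark": landmark,
        "position": (lx + distance, ly),
    }


print(interpret("There is a man near the hedge."))
```

The point of the sketch is that a vague word like "near" becomes usable once the known domain supplies a range of acceptable distances for it.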

Virtual Director: The user types in a story set in a known world (e.g., an urban park). The language is interpreted using semantic models specific to the known domain, and these are used to generate a graphics animation. "He gave the flower to the woman" results in the man walking across to the woman and handing her the flower.

The tools of Artificial Intelligence need increasingly sophisticated sensory capability. These are often based on models of the world they are looking at, such as in the soccer-vision system above. Much of the interaction in these systems must be with humans, and therefore, it is important to be able to identify humans and their motions, which is the focus of a stream of activity described next. Here the system is aware of the nominal dimensions of the human body and uses these in locating various limbs and their motions.

Virtual Engineering

Virtual Reality technologies are increasingly being used to test Manufacturing System Design before deployment. Here, the designer needs to make complex decisions about systems that one is not able to test. By linking the human motion tracking input to the Virtual Reality model of the plant, one can build Manufacturing Simulations, as in the TISCO Gantry Crane model shown in the figure captioned EOT Crane Simulation.

Human motion tracking is also useful for controlling complex multi-degree-of-freedom machines, such as robots. In the figure, a robot responds by emulating the arm motion of the master who can, with a single motion, provide input for all its joints.

EOT Crane Simulation: The user signals the gantry crane using the same gestures as he actually uses on the shop floor. In this project for TISCO, a camera is used to interpret these gestures and move a simulation of the crane. The image on top shows a Master-Slave operation where the robot is following the motion of the human hand.

Distance Training in Dance and Surgery

In forms of education such as sports coaching, surgery, dance, and engineering maintenance, it is not sufficient to give a lecture: the guru needs to be next to the pupil, guiding her hand. Unfortunately, good gurus are rare, and may not be close at hand. This problem can be solved by using training sites at remote locations which are fitted with inexpensive camera-based trackers for identifying the trainee's arm motions in 3D. This 3D data is very compact and needs little bandwidth (about 20 bytes per frame). Based on this data, a pre-loaded model of the trainee is re-created in front of the guru at the remote site, who can also view the pupil from other angles and provide feedback. The model recreated remotely may also be in a totally different dress/costume than the one actually used!

Distance Training: The trainee wearing informal clothes is seen by the Guru in a remote location wearing full costume but executing the same dance in real time. This is possible because only the joint angles are transmitted at about 20 bytes per frame.
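A small sketch shows why a frame of joint angles can be so compact. It assumes ten joint angles per frame, each stored as a 16-bit integer, which gives exactly 20 bytes; the article states only the figure of about 20 bytes per frame, so the number of joints, the encoding, and the 25 frames per second used below are assumptions.

```python
import struct

NUM_JOINTS = 10                        # assumed number of tracked joint angles
FRAME_FORMAT = "<%dH" % NUM_JOINTS     # little-endian unsigned 16-bit integers


def encode_frame(angles_deg):
    """Pack joint angles (degrees) into a compact byte string."""
    if len(angles_deg) != NUM_JOINTS:
        raise ValueError("expected %d joint angles" % NUM_JOINTS)
    # Map each angle in [0, 360) onto the 16-bit range 0..65535.
    quantised = [round((a % 360.0) / 360.0 * 65535) for a in angles_deg]
    return struct.pack(FRAME_FORMAT, *quantised)


def decode_frame(frame):
    """Recover approximate joint angles (degrees) from a packed frame."""
    return [q / 65535 * 360.0 for q in struct.unpack(FRAME_FORMAT, frame)]


def bytes_per_second(frames_per_second):
    """Data rate needed to stream frames at the given rate."""
    return struct.calcsize(FRAME_FORMAT) * frames_per_second


pose = [0.0, 15.5, 30.0, 45.25, 90.0, 120.0, 180.0, 270.0, 300.0, 359.0]
frame = encode_frame(pose)
print(len(frame))            # 20
print(bytes_per_second(25))  # 500
```

At an assumed 25 frames per second this comes to 500 bytes per second, which is tiny compared with sending video of the trainee.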

Virtual Reality in Entertainment

Another interactive model based on human tracking is the Virtual Reality game Resultant Force, which sends the user on a space walk with jet thrusters along each arm. Instructions in Hindi, English, or Bengali guide him as he tries to return to the mother ship by positioning his arms in space. We are looking for organizations interested in installing this Indian Language VR game.

Resultant Force: In this Virtual Reality game, the astronaut tries to return to the mother ship by adjusting his arms which have jet thrusters mounted on them. The images are analysed in real time to determine the arm positions and thereby his resultant motion. In the picture, he has just landed back on the mother ship platform.
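The name of the game refers to the vector sum of the two thruster forces. The sketch below works this out in two dimensions under stated assumptions: each arm is described by a single angle, each thruster fires outward along its arm with the same strength, and the astronaut is pushed in the opposite direction. The game's actual geometry and physics are not described in the article, and the function names are made up for this example.

```python
import math


def thruster_force(arm_angle_deg, thrust=1.0):
    """Force on the astronaut from one arm thruster (2D, assumed model).

    The jet fires outward along the arm, so the reaction force on the
    astronaut points the opposite way.
    """
    angle = math.radians(arm_angle_deg)
    return (-thrust * math.cos(angle), -thrust * math.sin(angle))


def resultant_force(left_arm_deg, right_arm_deg, thrust=1.0):
    """Vector sum of the forces from both arm thrusters."""
    left = thruster_force(left_arm_deg, thrust)
    right = thruster_force(right_arm_deg, thrust)
    return (left[0] + right[0], left[1] + right[1])


# Arms spread wide in opposite directions: the two forces cancel.
print(resultant_force(180.0, 0.0))
# Both arms pointing straight down: the astronaut is pushed upward.
print(resultant_force(270.0, 270.0))
```

With the arms held in opposite directions the forces cancel and the astronaut drifts; bringing both arms to the same side produces the largest push, directed away from the arms.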

The overall goal of all AI research is to build a system that can function in the real world and perform some useful task, such as assisting a human user. Increasingly, AI is linking up with robotics to form systems that can directly build their models based on real world experiences.

The Future?

If there is one point we should understand about AI, it is this: that which is machine-achievable, that which is explicitly known, is no longer in the purview of AI. How many people using Mathematica today realize that symbolic computation started as a topic in AI? Similarly, speech recognition, neural networks, and to a large extent, object-oriented computing, are the children of AI. Many other topics (even machine chess) have ceased to be AI activities, since they have become demystified enough to be no longer associated with AI. Other topics such as Natural Language and Computer Vision, which are yet to be solved, are still in AI, though problems such as speech recognition or fingerprint identification have recently stopped being AI since they are now commercial technologies.

What will tomorrow bring? Looking at the exponential growth in computational capability which has been going on for more than a hundred years now (from pre-silicon days), computational devices with capability comparable to the human brain will be there within a decade or so. But how much longer will it take to attain human level performance from such machines? In some areas where the problem is relatively well defined, success will come early. Speech, Language and Vision also seem to be making faster headway than seemed possible a few years ago. Difficult tasks such as balancing the goals of a complex system may take longer, but sooner or later human level functionality in machines is sure to happen. Machines with emotion, machines with some form of self-awareness (consciousness?), machines that can replace parts of the human brain that have atrophied, machines that will encode entire individual minds, cell by cell, and react identically to different situations... these may turn out to be more science than fiction even in our lifetimes.

Amitabha Mukerjee

Department of Mechanical Engineering and

Centre for Robotics

Indian Institute of Technology

Kanpur - 208016

e-mail: amit@iitk.ac.in

