Real Time Vision Guided Reactive Strategies
for Micro Robot Soccer
A Project Report Submitted
in the Partial Fulfillment of the Requirements
for the Degree of
Bachelor of Technology
by
Nishant Agrawal
Tamhant Jain
Under the guidance of
Dr. Amitabha Mukerjee
and Dr. K. S. Venkatesh
Department of Electrical Engineering
Indian Institute of Technology, Kanpur April 2000
Abstract
Soccer robot games are rapidly becoming a major research tool in field of AI, machine vision and motion control. The dynamism of the environment in the size of state space makes such a design challenging. The dynamism and the rate of information required for optimal behaviors also puts huge demands on Real time machine vision and image processing algorithms being used. A wide range of approaches are being followed all over the world to design startegies and behaviors for the game as well as to implement machine vision ends of the system. In this work we develop optimal reactive strategies for the game using ground truth data from experts, with potential fields as base of the strategies and Machine vision based on colour segmentation and clustering to complement strategies with data at a rate that tackles the dynamism of environment well. We thus developed a system ready to be integrated with robot hardware to function as a fully functional team.
Contents
1. Introduction
Soccer is certainly one of the widest spread sports in the world. In the last decade, soccer has been taken to another dimension. It is now being played by robots. Technology has been put together with a very simple game. Similar rules now enable play at a different level.
FIRA (Federation of International Robot-soccer Association) organizes these games on an annual basis. The FIRA Robot World Cup provides an arena for multi-agent research dealing with research subjects such as cooperation protocol by distributed control, effective communication and fault tolerance, while having efficiency of cooperation, adaptation, robustness and being executed in real-time.
With the ever increasing number of robots in the industrial environment, scientists/technologists are often faced with issues on cooperation and coordination among different robots and their self-governance in a workspace. This has led to development of control in multi-robot cooperative autonomous systems. The proponents of multi-robot autonomous systems needed a model to test the theories being proposed for its efficacy and efficiency. It is therefore not surprising that this research has strongly focussed on robot soccer.
Robot soccer makes heavy demands in all the key areas of robot technology, mechanics, sensors and intelligence. It does so in a competitive setting that people around the world can understand and enjoy. The Micro-Robot World Cup Soccer Tournament (MIROSOT) thus was given birth, and new interdisciplinary research area emerged, where scientists and technologists from diverse fields like, robotics, intelligent control, communication, computer technology, sensor technology, image processing, mechatronics, artificial life, etc., work together to make the multi-robot systems a reality. MIROSOT involves multiple robots that need to collaborate in an adversarial environment to achieve specific objectives.
In multi-robot systems, other robots in addition to the uncertainty that may be inherent in the domain, can determine the environment’s dynamics. They have dynamic environments as other robots intentionally affect the environment in unpredictable ways. The key aspect being the need for robots not only to control themselves, but also to track and control the ball which is a passive part of the environment. The interesting theoretical issue behind MIROSOT experiments is the use of soccer as a prototype example of a complex, adaptive system. Our efforts have been concerned with several aspects of MIROSOT game. An overview of the game, rules and schematic description follow in next section.
2. OVERVIEW
2.1 MIROSOT
The game is played on a soccer field that is scaled down to the size of the Ping-Pong table. The robots attempt to push a golf ball into the opponent’s goal. Two halves each of five minutes are played, and the team with the highest score at the end wins. An overview of the game board is shown in figure 1.

Figure 1: MIROSOT Board
Inside a small chassis of robots (7.5 cm cube) are DC motors for motion, a small radio communication module and a microprocessor based control module. The robots are controlled by a central computer that communicates through the radio to each one of them. Also, over the play field is mounted a color camera. This camera provides the view of field to the host computer. Figure 2 shows some such robots playing the game.
2. 2 SCHEMATIC DESCRIBTION
The computer extracts information regarding present coordinates of robots and ball from the visual feedback provided by the camera. This is done using complex image processing techniques so as to achieve speed and accuracy needed. This information is used by strategy decision algorithms to decide about the motions to be effected for robots. The motion commands are then conveyed to each of the robots through a radio link according as a pre-decided protocol. Each of the robots has a communication module, which receives this radio signal and transfers it to onboard micro-controller. Micro-controller then decides if the command sequence is meant for that particular player or not. The command is then interpreted and the requisite motion effected using a control algorithm residing on the microprocessor and using feedback from the motors.

Figure 2:A game of micro robot soccer
The following figure shows the overall view of the whole system:

Figure 3: Overall view of the system
.3. PROBLEM CONCEPTUALIZATION
The soccer robot domain provides for immense opportunities in the following
areas :
Our efforts in the present work have been concentrated on the Artificial Intelligence, Multi agent systems, Machine vision and Image processing related aspects of the problem. The aim being to develop an exhaustive, fully integrated and extendible software backbone ready to be integrated with associated hardware, to result in a fully functional and competitive soccer robot team. Earlier we had developed first generation of robot hardware, which is now being developed into second generation of such robots. In following subsections we briefly mention several aspects of soccer robot domain and associated challenges that extend to aforesaid areas.
3.1 Artificial Intelligence and multi-agent system
Robot soccer playing involves two main challenges : coping with a very dynamical environment and performing robot cooperation towards team playing. A mobile robot has to carry out complex deliberative actions. At the same time, fast reactions are necessary to "survive" in a dynamical environment. Mutli-agent coordination is essential to success
in robot soccer domain [3].
Designing for multi-robot cooperation is complex as it involves both the dynamics of individual robots and dynamics of the team [2]. In this environment robot can’t learn where the obstacle will be and must contend with them reactively while maintaining longer term strategic plan. Performance evaluation in this environment is not easy since problem could lie in either individual design or the group dynamics. Thus, it is difficult to assess which components in the multi agent system should be modified for improvement. Due to this credit assignment problem, many multiagent designs specific to an environment may not be transferable to other. Therefore, the environment requires a careful consideration when designing multiagent cooperation. The demands on design of intelligent behaviors for a multi agent system of this nature are further intensified by the dynamic nature of environment. As the robots move very fast (say 15 robot lengths per second) so they have to be tracked very fast which in turn means that very limited time is available for decision making regarding robot coordination. The strategies have to be complete enough to guide the robots to play the game without any human-intervention or feedback during the game. The only input being the coordinates of all the robots and the ball received after processing of every image frame. Another issue to be addressed is handling of the enormous state space in this domain. Even with a coarse position resolution of one robot length, angular resolution of 10 degrees for orientation of the robot, we have a state space as large as (20 x 15 x 20) 7 = 6 x 10 21 states.
3.2 Computer vision and Image processing
As one looks, one may find the game of Soccer, played by robots very dynamic and fast. The Robots may be moving at speeds greater than 1m/s, some having reached the speed of even 2m/s. A fast vision system minimizes errors in navigational control of the robots. Tracking the motion of the robots precisely, at such high speeds becomes an extremely difficult task and computationally intensive task. The order of complexity of the problem increases merely by the fact that 7 objects have to be tracked with each robot having one out of three different pre allotted colors.
The camera grabs the image and the image is used for further processing and display via the digitizer(Matrox PCI Frame Grabber in our case).In each of these images, the robots have to be identified on the basis of color, each team having robots of same color. The color information is 3 bytes per pixel, an image having size 320x240 pixels. Also the ball must be identified. After Identification, each of these objects have to be tracked at least 30 frames per second. All this has to be done within 33ms for each of the 7 objects in the visual field. So, one can calculate the amount of information to be handled as: (30 fps)x(320 by 240 image size)x(3 bytes of color information)x(7 objects)x(2 bytes for location) = 94.5 MegaBytes of information in one second. So one can see the enormity of the amount of data to be handled.
4. CONTEMPORARY RESEARCH
In this section we give a brief overview of concurrent research around the globe on aspects of AI, Multi agent systems, Computer vision and Real time image processing related to soccer robots.
4.1 AI and Multi agent systems
The present methodologies being followed for development of strategies include the following:
Learning tasks in this domain mainly utilizes feed-forward neural networks in the role of supervised learning [5]. This leads to methods, which are off-line, thus unable to adapt to opponents. Being black box models they are difficult for a human to understand. The requirements for a supervised learning algorithm are also much higher because of its need of optimal state–action pairs [4]. Rule based strategies are less demanding to develop but they depend on programmer a lot for their success. Also they fail drastically under new situations, which have not been considered by the programmer. Decision trees have also been used to predict and to learn optimal actions under different situations. Most of neural nets and other learning methods have mainly been black box methods and as such impact of each and every factor of game cannot be visualized easily. On the contrary are methods involving potential fields which are easy to visualize and one knows precisely as to what will be the qualitative impact of any variation in associated parameters upon the overall strategies [1],[2]. However, the design of these potential fields has mainly been dependent on the programmer. Also these potential functions once developed had no formal procedure for optimization. After lot of trial and error the designer would change the parameters relating to potential functions to improve upon them.
In this work we have worked with potential field approach, as it is more intuitive and less computationally intensive. To bring these strategies closer to the way human play we have based our strategies on ground truth data collected from several soccer players. As a method of optimization of these strategies we have used genetic algorithms.
4.2 Computer Vision and Image Processing
If one looks at the vision systems for Soccer robots, used by teams all over world, one may find no dearth of innovative techniques used for Real Time Color based segmentation and tracking. The fundamental challenges that drive much of the research in this field are the enormous data bandwidth implied by high resolution frames at high frame rates, and the desire for real-time interactive performance. Numerous innovative methods have been proposed. However, most of these methods utilize sophisticated models such as edges, snakes, splines, templates or computationally intensive eigen-image or condensation algorithms. Although these approaches are broad in their abilities offering reliable object recognition in addition to tracking, they are yet unable to run on full video resolution images at high frames rates.
Some of these methods worth mentioning are:
All these methods are statistical in nature and relies heavily on training and pre conditioned data.
The first one was the obvious choice to work on and improve further as it was very efficient, fast and independent of the varying illumination.
5. REACTIVE STRATEGIES
Multi-dimensional fuzzy functions or Potential Fields are a common approach to implementing reactive behaviours [1][2].
Often these potential fields are designed based on subjective decisions made by the programmer, which turn out to be sub-optimal. In this work we develop and implement approach which constructs the potential fields based on Groundtruths, or the correct behaviour outcomes.
5.1 Potential Fields
In soccer robot domain the objective is to design a reactive strategy function which will indicate the appropriate actions of each robot under different states. Designing such behaviors involves planning for the individual robot as well the team dynamics, thus the reactive function is defined on a state vector which consists of the position and orientation relative to goal, position and headings of other robots (opponent /self team), ball ownership, etc. Now, based on the current task (defence, pass receiving, striking, etc.), a suitable reactive strategy has to be defined. In our approach, this is done by defining the strategy as a sum of several multi-dimensional fuzzy functions more popularly known as potential fields [7]. In this approach the robot is represented as an entity under the influence of several artificial potential fields whose local variations are expected to reflect the favourability of a particular action concerning that region. The potential function are defined over entire field so that attractive potential make robot towards moving to a more favourable location (eg: closer to the ball, closer to the goal) while repulsive potential push robot away from unfavourable locations (eg: away from opponent robot).

Figure 4:Representative Field Layout
Three opponent players(black) are trying to prevent the other three (grey/white) from scoring.The white player is trying to position himself for receiving a pass.The field colouring reflects relative desirability of various positions in this state under a reactive behaviour function, which is to be tuned using groundtruth.
5.2 What is Groundtruth?
The term "Groundtruth" arises in Satellite Image Processing, where
it implies the correct labeling of zones in the image. In recent times
it has come to acquire a wider meaning in vision and document interpretation where it represents the correct results calibrated on the sensory data. In the context of agent behaviours, we take it to mean the "correct" action in a given state, i.e., the action, which the reactive function should generate. We present two approaches for determining this groundtruth. To our knowledge, this concept has not been used in the design of multi-agent system. Here we take groundtruth to mean the correct action of the agent given a set of objectives in a certain configuration.
Performance evaluation in this domain is not easy; the ultimate objective is to score a goal, but we need some feedback at the intermediate stages as well. Approaches that attempt to reinforce the actions leading to success require storing all the actions and their consequences over a large number of simulation runs, and then mapping, for each state, the probability of success using all actions. This state-action space, for problems like robot soccer, is often very large. In these instances, one may use
alternate sources, In practice, two methods can be used for obtaining the for defining optimal reactive behavior or "groundtruth":
We have used Delphi method in designing our strategies.
5.3 Polling Human Experts
Since every expert might have different measure of the optimality of behaviour, the criteria for optimality also get encoded in this information which had to be decided by the programmer in case of simulations and it is possible that one might miss on some important criterion. Also since we get the information about optimality criteria from this data once we have made our strategies compatible with this data, generalization can be done with less chances of an error since all possible criterions have been accounted for in response to above data.
In our work we prepared a response form with eighteen different configurations ( 9 stressing on defence , 9 stressing on offence) and handed each of them to 20 different experts to fill in the place where they think the robot in question should go. Since the all robots of each team are treated identically so these 18 situations in effect represent a much larger set of situation. Also these situations capture the essence of views of experts about relative importance of different factors in decision making so they are sufficient to provide a framework which can be generalized to develop comprehensive strategies. The form showed the field along with goals, players and the ball at an instant and the directions in which opponent robots are heading and their expected position at planning horizon, there response was also to be for planning horizon. Two such configurations and corresponding responses are shown in the figures below.


Figure 5:Sample Query Sheet given to human experts
The human expert is asked to indicate the direction and velocity in which the white robot should move. Opponent goal is to the left in each field. A total of eighteen configurations were handed to 20 experts.


Figure 6: Variation in human responses
Human responses shown as a scatter plot for field configurations 1 and 2 of figure (***********). In configuration 1 there are two possible responses modeled as two separate clusters (between ball and goal, or between ball and opponent). In configuration 2 there is
only one major cluster though it has a wide variation. There are some outlier behaviours in each sample. In comparing the data, each cluster was separately averaged and outliers were not considered.
5.4 The Strategies
The strategies can broadly be classified as defence and offence strategies. Here we describe all the potential fields that come into picture for defence and for offence. All of them are added to find the resultant field for all the points on the grid for a particular configuration and for decisions for a particular robot. The decision on offence or defence is made according as which team possesses the ball.
5.5 Defence
For defence the following fields were designed, the summation of which creates the resultant field. The play area is discretized into a grid of 15 x 13. The reactive behaviour expects the robot to move towards the position with the minimum potential.
The component fields are
the position closer to goal are strategically more advantageous than
positions away from it .
At any point on the field the goal field is defined as
![]()
where dist_g = Distance of point from home goal, K_G is an amplitude parameter and n_g is the exponent.


Figure 7: The Goal field
The figure on the top shows the goal field (k_g = 1.5, n_g=2) in colour graded scheme. Lighter colour represents lesser potential (more favoured positions). The plot below shows the variation of the field numerically, 5 isoclines are also shown, which are essentially circular. The x axis represents X axis on the field i.e. from left to right in adjoining figure and y axis represents direction parallel to two goal lines. Z axis represents value of potential.
5.5.1 Ball Field
This field tries to take the robot closer to the ball. This field is to ascertain that the robot of home team keep close to ball as the control of ball becomes very important in a fast game of this nature. Also it prevents the robot to go to position, which might be a good place for stopping ball but which are far away from current focus of play. This positions and associated planning can thus prove to be over ambitious.
At any point on the field it is defined as
![]()
where dist_b=Distance of point from ball, k_b is an amplitude parameter and n_b is the exponent.
This field is important as most of the users (90 percent) preferred a position of robot closer to the ball than other positions.


Figure 7: The Ball field ( k_b = 2, n_b =2)
5.5.2 Other player field
This field is designed to avoid collisions with other players. This assign high potential to points very close to other players of same or opponent team.
![]()
![]()
For d > D_max
where dist_op =Distance from other player and D_max is a threshold beyond which this effect is insignificant. Figure (*****) shows a graphical representation of this field.
5.5.3 Point Occlusion field
Locations behind the other robots are not reachable by the agent. A pencil region is defined from the robot center to encompass the other robot with some clearance on each side. The width of the pencil = 1.5 x Robot size. All points in the pencil lying beyond the other robot are then assigned a very high field value (the maximum current field strength). This rules out regions which are unreachable through direct motion from the current position.

Figure 8: The Other Player field (k_op = 8000)
(shown only for opponent robots)

Figure 9: Point Occlusion Field
The resultant of point occlusion field and the goal field. The region behind Home team robot is unreachable and is shown dark due to point occlusion field.
5.5.4 Obstruction field
It is desirable in Defensive play that the robot should obstruct the path from the opponent robot to the ball (region OB) or from the ball to the goal (region BG).
Region BG is an X-shaped double-pencil drawn from each goal post (Home team) to the ball and extended backwards till the opponent robot.
Region OB extends from a point on each side of the opponent robot to the ball and is extended as region OBB beyond the ball.
In addition, all the points lying beyond the opponent robot are assigned a high potential
![]()
where k_obs = amplitude parameter.
dist_orb = Distance of Opponent robot from ball.
For blocking motions, it is desirable to head for the Region OB blocking the opponent robot from reaching the ball. This is achieved by applying an additional high potential Field_obs everywhere except for region OB.
In particular, points in the intersection region of OB and BG are particularly dangerous since the opponent can score goals from this region. An additional Field_obs is added everywhere except in this intersection zone to account for this. This process is repeated for each of the opponent robots.

Figure 10: The Obstruction field
The figure shows the combined Obstruction field, where some points have several Field_obs added.
5.5.5 Energy Conservation
This potential field is based on preferring points which require minimum energy and time to reach. it thus directly follows that this has to depend on the distance of a point from the robot. It is defined at every point on play field as
![]()
where k_e is amplitude parameter,
n_e is exponential parameter
dist_rob = distance of a point from the robot.
This also is important to ensure that not all the robots of the same side try to rush to same location as it ascertains that a robot playing in left half prefers to continue to play in left half and the robot in right half prefers to continue to play in right half.


Figure 11: The Energy conservation field
The figure shows the Energy conservation field (k_e =1.5, n_e=2) in colour graded scheme, Dark colours represent high potential while light colors represent low potentials or the preferred regions for energy conservation.
5.6 Offence
In case of offence it is assumed that the one of our team members (not the one we are making decision for ) has the possession of the ball. In this case, Ball Field, Energy conservation field, and Other player field are the same as in defence. Goal Field is centered on the opponent goal. The point occlusion field is similarly oriented towards the opponent goal instead of home goal. Obstruction field is not used and a Ball Occlusion field is used instead.
5.6.1 Ball occlusion field
This field tries to eliminate those points where ball can not reach because of occlusion by opponent or other home team robot. One ray each is drawn from the ball to points on each side of the other robot. The region beyond the other robot is occluded and is assigned the existing maximum field strength.

Figure 12: The Ball Occlusion field
The darker regions are regions which have very low probability of ball reaching them and hence are the ones which are less preferred for receiving a pass.
5.6.2 Striking behavior
For this behavior, potential functions of offence are used, which include the goal field, the other player field, the occlusion field and the ball field. Thus if goal is not occluded and is not far from present ball location than robot will strike towards the goal.
5.7
Predictive PositioningThe soccer robot domain represents a dynamic environment. In planning in such a environment the ability to predict and use this prediction to plan for the planning horizon is very important for optimal behaviours. To accomplish this we have developed a prediction model that uses the present velocities of the opponent robots to include a estimate of the situation ahead and adapt the potential field to it. This ensures that behaviours designed pertain to future configuration rather than present configuration. We have introduced some uncertainty about expected future configuration by assigning a probability distribution to a number of possible configuration. We then find such future configurations and find complete resultant potential field distribution for each of these configuration. Then these potential distribution values are multiplied by corresponding probability of that particular configuration. These weighted potential distributions are then added to find net potential for deciding on the behavior.
The possible configuration are found by assuming that each robot can take a number of discrete positions (presently 5) between its present and predicted position at the planning horizon (the end of the line showing the direction of motion in query figures) with some probability for each of the five locations.

Figure 13: Variation in Behaviour by inclusion of a prediction model
The figure to the left shows effective potential variation and the corresponding behavioral decision taken for the white robot without using a prediction model while figure to the right shows the same parameters with the prediction model taken into account. There is a marked difference between the occluded and the obstructed regions as they take probable motion of the opponent robots into account.
5.8 Optimization of the behaviours
The groundtruth values provide a reference to which the results of the current behaviour can be compared. For any configuration, the groundtruth provides a direction in which the robot must move, as well as a magnitude of motion. These are then compared with the direction and amplitude suggested by the reactive model generated by combining all the potential fields described above.
If there are N calibrated data sets (ie. N configurations), then the net error can be characterized as
![]()
![]()
where subscript Behaviour reflects data for the actual behaviours obtained from strategies with particular parameters, and Ground indicates the groundtruth (the mean of the clusters). This results in a 2-D vector error (r, theta). In practical terms, the angle error is more relevant since it determines the immediate direction heading and has a higher impact on future decisions. A single error parameter is arrived at as follows

at present R_const has been fixed at 5 Robot length.
In the behaviours being modeled here, the parameter set consists of nine parameters for Offence, and nine for Defence. The parameters are as follows for both of them but they take different values:
n_e, n_g, n_b, k_e, k_b, k_g, D_max, K_obs and k_op.
The objective of optimization is to determine a set of the potential parameters that results in a minimum error from the groundtruth. Since the domain is very rough in terms of local minimas so we have concentrated on using several runs of Genetic algorithms with different initial population.
5.9 Genetic algorithms
GA’s are very popular tool for engineering optimization problems [6]. In our problem we have used roulette-wheel selection, a single-point crossover, and a bit wise mutation operator. The coding has been done as 45 bits of which each parameter has been allotted bits as follows:
|
Parameter |
Number of bits allocated |
Range of value
|
|
K_g |
5 |
0 - 10 |
|
k_b |
5 |
0 - 10 |
|
k_e |
5 |
0 - 10 |
|
n_g |
5 |
0 - 5 |
|
n_b |
5 |
0 - 5 |
|
n_e |
5 |
0 - 5 |
|
k_obs |
8 |
500 – 5000 (defence) 1000 – 10000 (offence) |
|
d_max |
3 |
40 – 120 |
|
k_op |
4 |
1000 – 15000 |
Crossover Rate = 0.7
Mutation Rate = 0.05
Every simulation of genetic algorithms ran for 50 generations.
6. REAL TIME COMPUTER VISION AND IMAGE PROCESSING
For implementing the vision system for the soccer robots, color is the most visual cue for identifying robots. Color has been widely used in machine-based vision systems for tasks like image segmentation, object recognition and tracking. It offers several significant advantages like robustness under partial occlusion, rotation in depth, scale changes and resolution changes. The overall problem can be broken down into three tasks that have to be performed. They are:
6.1 Color Segmentation
In machine vision, color pixels usually contain Red, Green and Blue values each measured in 8 bits. Typical color segmentation would involve a conversion of these values to some color model parameters, then, comparison of these parameters to the assumed object model invariants.
The difficulty using RGB model is that it is quite sensitive to variation in color values brought about by lightning changes. Intensity is distributed throughout all 3 parameters, rendering color values highly sensitive to scene brightness. A simple approach to color constancy is to use the HSV color space [9],[10] which consists of Hue angle(H), color saturation(S) and Intensity(I). In order to obtain a limited level of intensity invariance,
Colors can be modeled in HS Space. Also, speed of segmentation may become an issue in real time applications where use of computationally expensive operations during the segmentation process may be restrictive.
The color segmentation can be divided further into:
(1)
Where a mixing parameter
corresponds to a prior probability that pixel ,
, was generated by component j and where ,
. Each mixture component is a Gaussian with mean,
, and covariance matrix,
, i.e. in case of a 2D color space:
For color segmentation of the image HS model, due to reasons mentioned previously, was used. This conversion of a pixel’s RGB value to HS value was done according to standard algorithm mentioned in text books[9],[10]the conversion, for color segmentation the following steps were followed:

Figure 14: Training the Gaussian Model

Figure 15: HS Space
Plotting the HS space(Lower left Window)
Thresholded HS space(Upper Right Window)
1. Training the Gaussian Model based on a particular object’s color. For every pixel of the training object the HS value was calculated and then the mean and the variance were calculated in the HS space. These calculated values of mean and variance were used as mean and variance for a Gauusian function. A single gaussion was found to be sufficient for segmenting a single color.
2. For faster performance, the whole RGB cube (8 bits per color, 16.6 million colors) was compressed into a total of 4 bits per color, 4K colors). This compression algorithm was implemented in Assembly language for faster runtime. This resulted in a lesser search space at the same time accounting for small variations in color of the object as the last 4 bits of color information were neglected.
3. Using the training data and the RGB values calculated as above, a LUT table was generated. This LUT would facilitate in on the fly decisions regarding the containership a color pixel to an object.
4. Using this LUT generated, the image was segmented based on a particular color for which it was trained.
5. Similar LUTs were generated for different colors which were there in a particular image.
All this training and segmenting the image was done in the order of a few seconds which is quite fast for such a complicated sequence of steps.

Figure 16: Segmenting and Deciding the cluster locations(lower left)
6.2 Object Recognition (Clustering)
Clustering techniques seek to separate a set of data into groups or clusters. This is used to determine to location and size of the robots, once segmentation is completed. Some famous algorithms for clustering are k-means algorithm or the greedy algorithm but all of these techniques suffer from the draw back of being slow in convergence to a solution. The following method is suggested for clustering.
The direct manipulation of the data set is impractical due it’s large size. So first data compression or consolidation and then region segmentation is applied. While doing all this, it must be ensured that the essential character of the data and the context of the image is not lost. This is achieved by:
It should be noted that the density map is S2 times smaller than the original binary image. Such data compression significantly speeds up the clustering process, since only density map pixels are used to further consideration.
It should be noted that clusters are representative of the object shapes. Segmenting a image only means that we know what’s the color of a particular object. The position of the object is still to be calculated.
This requires further processing as mentioned below:
1. Taking the segmented image, for faster calculations a density map of the segmented image was generated. This density map contains as pixel values, the number of valid object pixels in a smaller sample window. We took the size of the sample window to be 8. This resulted in a total reduction of 64 times in the image size. (40x30 for an initial image size of 320x240). This density map is an accurate representation of the image itself and can thus be used for further processing. All further operations were performed on the density map.
2. A cluster was determined based on the 8 neighbour approach and connected components algorithm. For the actual implementation, Depth first Search algorithm was
used for storing the valid neighbors of a pixel. By valid neighbor, we mean a neighbor belonging to the object.
3. Finally the location of the cluster in terms of x and y coordinates and size of the cluster were calculated in the density map and later it was translated linearly to the actual image.
The location and size obtained using this method were quite accurate and at the same time, being calculated in
time where n is the total number of image pixels considered, which in our case is the number of pixels in the density map (1200 only as opposed to 76800 in the original image)

Figure 17: Tracking a Particular object
6.3 Tracking
The tracking dynamics involve estimating the position, width, and height of the object. This box provides a focus of attention for further processing. The position and size of the box are found by computing the mean,
, and standard deviation,
, of the local color probability distribution within a rectangular search area centered on,
, at time t [11],[12]. The dimensions of this search area are determined by scaling the dimensions of the bounding box at time t-1. In our case, we took size of the search box to be about 1.5 times that of the bounding box.
For a given time frame t, the box position,
, is estimated as an offset from the position ,
:
(3)
where
ranges over all image coordinates of interest and ,
, is the HS color vector at image position
. To improve accuracy, probabilities
are thresholded. Probabilities lower that the threshold are taken to be the background and are consequently set to zero in order to nullify their influence on,
, and,
. The size of the box can also be dynamically estimated.
Once the robots location was determined, the robots were tracked.
1. Based on the robot location and size determined previously, a search box was generated within the image within which the robot was searched. This helped in reducing the actual search area within the image and the search became faster. By calibration and by knowing the speed of the robots, an approximate size of the search box can be determined centered at the object center. In our case, we found that a search box of size
times the object box was sufficient.
2. For all the pixels in the search box, the HS values were calculated which gave the RGB values in the smaller subspace of the RGB space. Then the probability, for the Gaussion Model were found out by using LUT. In this way, precious time for calculation of probabilities was saved.
3. The probabilities were then thresholded to effectively recognize the object and to eliminate any spurious object pixels being generated by noise or to compensate for errors in segmentation.
4. Using all the valid object pixel probabilities, new values for the object location were determined based on the formulae given before.
The new size of the object can also be calculated afresh.
All this steps took a total of about a few ms, which was sufficient for us as the frames are coming a time gap of about 33ms(30 fps).
7. Results
With use of GA’s the performance of the strategies gradually improved in terms of ground truth. Following table shows the variation of parameters and the goodness with generations of GA for case of defence similar results were obtained for offence.
|
Genera—tion number |
Goodness
|
k_g |
k_b |
k_e |
n_g |
n_b |
n_e |
k_obs |
D_max |
k_op |
|
0 |
.49 |
1.5 |
2 |
1.5 |
2 |
2 |
1 |
800 |
60 |
8000 |
|
10 |
.51 |
3.87 |
8.06 |
9.68 |
3.39 |
3.39 |
.81 |
1877 |
108 |
12200 |
|
20 |
.51 |
3.87 |
.32 |
3.23 |
4.03 |
4.84 |
1.94 |
1078 |
62 |
8466 |
|
30 |
.53 |
9.03 |
5.81 |
4.84 |
1.45 |
1.61 |
.81 |
1384 |
40 |
10333 |
|
40 |
.54 |
5.16 |
.65 |
10 |
3.23 |
4.03 |
1.45 |
1792 |
97.14 |
2866 |
|
50 |
.55 |
5.81 |
5.48 |
5.48 |
4.35 |
4.52 |
4.19 |
738 |
97.14 |
14066 |
We optimized strategies on a set of 18 different configuration here we present the results of optimized strategies on 6 such cases.



Figure 18: Results compared to the Groundtruth (for trained data).
The white line indicates the behaviour after tuning; black lines are the cluster averages, gray lines represents actual expert actions. The colour gradation represents the potential distribution.
Once the strategies were optimized using the training set, we tested strategies on some untrained random data sets. Here we show six such situations and leave it to reader to judge the performance of the strategies.



Figure 19: Results for random configurations ( untrained data) .
The colour gradation represents the potential distribution. The white line from the white robot shows the decision arrived at by optimized reactive strategies.
We were successful in tracking multiple objects(3) of same color at pretty fast frames rates at about 25 fps. The time taken for clustering was to the order of a few ms for multiple objects which is quite fast but the time taken for segmenting the image was around a 10 ms. This can be attributed to the huge amount of computation involved in segmenting the whole image.
9. References




Figure 20: Tracking
These images show the tracking of three objects of same color. Notice the bottom most object in first figure. This has been moved to the right. Also Notice how in subsequent frames the tracking algorithm tries to track the object by successively moving the tracking window to the right and finally locks on the accurate object location in the last frame. The motion of the window is shown by the white line which indicates displacement from the previous frame.
We also studied the performance of the complete work in cluttered environments which turned out to be quite satisfactory. Not only were we able to successfully segment and cluster the object in a cluttered image having multiple objects but, were also successful in tracking it.


Figure 21: Cluttered Environment
These images show the results of segmenting and tracking in a cluttered environment.
9. References