Computer Vision Laboratory

Computer vision is the science and technology of machines that can see. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from images. The image data can take many forms, such as video sequences, views from multiple cameras, or multi-dimensional data from a medical scanner. As a technological discipline, computer vision seeks to apply its theories and models to the construction of computer vision systems. Research areas include: applications in display technology, computer vision for navigation, metrology, high-level video analysis, human-computer interfaces, pharmaceutical biomathematical modeling, and signal and system theory.

Faculty:


Ph. D. Students:
  • Prithwijit Guha
  • Rajesh Bhatt
  • Meenakshi
  • Anima Majumder
  • Raju Ranjan
  • Naveen Kumar

M. Tech. Students:
  • G Gupta
  • A. Mishra Sharma
  • Akshata Singh
  • Biswajit Sharma
  • Anurag Srivastava
  • Kamlesh Verma
  • Armin Mustafa
  • Irfan Khorakiwala
  • Shah Jaykumar K
  • K Sudhir
  • Gautam LV
  • Koteswara Rao G
  • Satheesh KVV
  • Shivranjan Popli

Staff:
  • Narendra Singh

Facilities:
  • Hardware and software facilities include the following:
    Cameras:
    1 21 MP Canon EOS 5D, 1 Canon EOS 10D, 1 Sony DCR-TRV328 Handycam, 1 3-CCD PDX-10P camcorder, 4 Sony SNC-RZ30 PTZ cameras, 1 Silicon Video 5 MP industrial camera, 1 Basler 0.24 MP 120 fps monochrome camera
    Illumination sources, Sensors and setup:
    10 illumination sources at 850 nm and 950 nm, 1 2 klm DLP projector, various portable and non-portable illuminators, 1 m and 5 m range IR sensors, 3-side chroma keying setup.
    One locally fabricated 3-wheeled robot with an n-pi rotating platform and a ZigBee wireless interface, 2 LEGO Mindstorms robot kits, one calibrated 2-axis platform.

Research:
  • Digital Video Stabilization through curve warping techniques:
    For digital video stabilization, a sufficient number of frames is extracted from the video. For each frame, a signature is computed by the integral projection method (summing the pixel values of each row and each column), generating two characteristic curves per frame, as depicted in Fig. 1. By matching the curves of consecutive frames, we calculate the global motion vector. Under rapidly varying illumination and in the presence of large moving objects, the Dynamic Curve Warping method provides a more robust solution for such complex scenes. The validity of the algorithm will be tested using integral projection error, block matching error, and dynamic time warping error under various conditions, such as large moving objects and fast-changing illumination.
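The integral projection and curve-matching steps can be sketched as follows. This is a minimal illustration, not the lab's implementation: frames are assumed to be 2D lists of grayscale values, and the curve match is a simple sum-of-absolute-differences search rather than Dynamic Curve Warping.

```python
def integral_projections(frame):
    """Row and column projections: sum pixel values along each row and column."""
    rows = [sum(r) for r in frame]
    cols = [sum(r[j] for r in frame) for j in range(len(frame[0]))]
    return rows, cols

def best_shift(curve_a, curve_b, max_shift=3):
    """Shift of curve_b relative to curve_a minimizing mean absolute difference."""
    best, best_err = 0, float("inf")
    n = len(curve_a)
    for s in range(-max_shift, max_shift + 1):
        pairs = [(i, i + s) for i in range(n) if 0 <= i + s < n]
        err = sum(abs(curve_a[i] - curve_b[j]) for i, j in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best

def global_motion(prev, curr, max_shift=3):
    """Estimate the global motion vector (dy, dx) between two frames
    by matching their row and column projection curves."""
    pr, pc = integral_projections(prev)
    cr, cc = integral_projections(curr)
    return best_shift(pr, cr, max_shift), best_shift(pc, cc, max_shift)
```

The stabilizing transform is then the negative of the estimated motion vector, applied to each frame.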
  • Automatic Target Detection and tracking for thermal image sequences:
    Automatic target detection and tracking extracts and tracks targets of interest and removes undesirable clutter from an image sequence automatically, using image processing techniques. Thermal imagers find application in circumstances (e.g., at night or in bad weather) where sensing in the visible spectrum becomes infeasible or severely impaired. Thermal imaging extends our vision beyond the visible regions of the EM spectrum into the infrared (IR) region (0.7 µm to 1000 µm), allowing us to see in complete darkness. ATDT for thermal image sequences is a very challenging task due to the high variability of targets and background clutter, the low spatial resolution, and the often insufficient contrast between targets and background. The following steps are involved: 1. Shot detection. 2. Within each shot, frames are decomposed into background and foreground. 3. Images are compared with previously learned background models to detect the foreground.
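Step 3, comparison against a learned background model, can be illustrated with a minimal sketch. Frames are assumed to be 2D lists of intensities and the model is a simple per-pixel mean; the actual background models used by the project are not specified here.

```python
def learn_background(frames):
    """Per-pixel mean over a set of target-free frames: a minimal background model."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[sum(f[i][j] for f in frames) / len(frames) for j in range(w)]
            for i in range(h)]

def detect_foreground(frame, background, threshold=20):
    """Mark pixels deviating from the background model by more than `threshold`
    as foreground (candidate targets)."""
    return [[1 if abs(frame[i][j] - background[i][j]) > threshold else 0
             for j in range(len(frame[0]))] for i in range(len(frame))]
```

In a thermal sequence, hot targets typically deviate strongly from the learned background, so even this simple rule isolates candidate regions for the tracker.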
  • Human Activity analysis based on pose detection:
    The proposed method first obtains foreground information, then estimates a pose for each frame, followed by regularization of the noisy pose estimates. Next, the temporal collection of part centroids and orientations is used as raw features. These raw features are then converted into equal-length feature vectors. Finally, a classifier is trained on these feature vectors. The datasets used are from:
    a) http://www.ics.uci.edu/~dramanan/papers/parse/index.html
    b) http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html
    Results of Pose Estimation are shown in the figure below:
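Converting variable-length raw feature sequences into equal-length feature vectors could be done, for example, by linear resampling. The function below is a hypothetical sketch; the `resample` name and the interpolation scheme are illustrative assumptions, not the project's actual method.

```python
def resample(sequence, target_len):
    """Linearly resample a variable-length sequence of scalar features
    (e.g., a part centroid coordinate over time) to target_len samples."""
    n = len(sequence)
    if n == 1 or target_len == 1:
        return [sequence[0]] * target_len
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional position in the original
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(sequence[i] * (1 - frac) + sequence[i + 1] * frac)
    return out
```

Applying this per feature channel yields fixed-size vectors that any standard classifier can consume, regardless of how many frames each activity clip contains.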
  • Action Recognition in Videos:
    The proposed approach is: a. obtain a foreground blob for each frame; b. extract a tracking box centered on the actor; c. create a shape representation per frame; d. calculate the average magnitude of the moving-window Fourier spectrum of the temporally varying shapes; e. train a classifier for supervised learning of the action representation; f. obtain modes/clusters for unsupervised learning. Datasets used: the Weizmann dataset (10 actions: bend, jack, jump, pjump, run, side, skip, walk, wave1, wave2; 9 actors; static camera) and the KTH human action dataset (6 actions: walk, jog, run, box, wave, clap; 25 actors; static camera; indoor/outdoor).
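Step d can be sketched for a scalar per-frame shape feature. This is a hedged illustration: the actual shape representation and window parameters are not specified here, and the DFT is computed directly rather than with an FFT library.

```python
import cmath

def dft_magnitudes(window):
    """Magnitude spectrum of one window, via a direct DFT."""
    n = len(window)
    return [abs(sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) for k in range(n)]

def avg_spectrum(signal, win=8, step=4):
    """Average the magnitude spectra of sliding windows over a temporally
    varying per-frame shape feature: a translation-robust action descriptor."""
    windows = [signal[s:s + win] for s in range(0, len(signal) - win + 1, step)]
    spectra = [dft_magnitudes(w) for w in windows]
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(win)]
```

Averaging magnitudes (not complex spectra) over windows discards phase, so the descriptor is insensitive to where in the clip an action cycle begins.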
  • Multiple objects tracking using multiple cameras:
    We applied a region-feature-based approach for tracking humans through partial occlusions. We implement a part-based paradigm that employs both color and edge information to accurately localize different parts of a human target. A method is proposed to automatically segment any human target into three parts, namely head, torso, and legs, using projection histograms. Using their non-parametric color probability density estimates, these parts are tracked individually through mean shift. Following mean shift convergence, the strong local edges present in the mean shift window are matched iteratively against those of the initially learnt edge template. This robust edge matching validates and refines the estimates of the mean shift procedure. The same approach is extended to tracking multiple objects. Finally, we make use of multiple cameras to track persons efficiently. Some results are shown below:
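The projection-histogram split into head, torso, and legs might look like the sketch below. The search bands for the neck and waist minima are illustrative assumptions, not the paper's rule; the silhouette is assumed to be a binary 2D list.

```python
def row_projection(mask):
    """Horizontal projection histogram: foreground pixel count per row."""
    return [sum(row) for row in mask]

def split_body(mask):
    """Split a binary human silhouette into (head, torso, legs) row ranges
    at the row-profile minima: neck searched in the top third of the
    silhouette, waist in the middle third (assumed search bands)."""
    prof = row_projection(mask)
    h = len(prof)
    neck = min(range(1, h // 3), key=lambda r: prof[r])
    waist = min(range(h // 3, 2 * h // 3), key=lambda r: prof[r])
    return (0, neck), (neck, waist), (waist, h)
```

Each returned row range then seeds a separate color model, so the three parts can be tracked independently through mean shift.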
  • Camera placement and network surveillance
    Problem statement: given a floor plan, place cameras so as to cover the whole floor. We need to decide the locations of the cameras on the boundary of the floor plan, as well as their orientations and directions, while using the minimum number of cameras needed to cover the whole floor area. A greedy algorithm is used: it repeatedly executes a procedure that tries to maximize the return based on examining local conditions, employing strategies that are simple to implement and require a minimal amount of resources. In the input floor plan, obstacles are black and free space is white. Each camera subtends a cone depending on its location and orientation. The area covered by a camera is colored green, and that is Ac; the area occluded from the camera is colored blue, and that is Ao.
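The greedy selection can be sketched as a set-cover loop over candidate camera poses. This is a minimal illustration: the visibility set of each (position, orientation) candidate is assumed to be precomputed from the floor plan via the cone/occlusion test described above.

```python
def greedy_camera_placement(cells_to_cover, candidate_cameras):
    """Greedy set cover: candidate_cameras maps a camera pose id to the set of
    floor cells it sees (its green area Ac). Repeatedly pick the pose that
    covers the most still-uncovered cells until everything is covered."""
    uncovered = set(cells_to_cover)
    chosen = []
    while uncovered:
        best = max(candidate_cameras,
                   key=lambda c: len(candidate_cameras[c] & uncovered))
        gain = candidate_cameras[best] & uncovered
        if not gain:
            break  # remaining cells are occluded from every candidate pose
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered
```

Greedy set cover is not guaranteed to be minimal, but it achieves a logarithmic approximation factor and matches the "maximize local return" strategy described above.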
  • Analysis and annotation of cricket videos.
    Video can be treated as a sequence of frames, shots, event sequences, or stories at different levels of abstraction. Video shot detection and classification is a fundamental step for efficient accessing, retrieval, browsing, highlight generation, etc. Summarization of large amounts of video data is one of the popular fields of video research in recent times. Sports videos in particular have clear domain knowledge, dependent on the particular sport, which helps mainly in shot classification. The aim of the present work is to analyze cricket videos from the annotation point of view, and to use this analysis in shot classification to automatically classify shots into various semantic categories and hence generate annotation cues for them. The features used in the shot segmentation step are color histograms. For shot classification, we use both color histograms and mean shot energy as features. In addition to automatic shot classification, the work also addresses one of the popular applications of sports video annotation: highlight generation by detecting slow-motion replay shots.
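Histogram-based shot segmentation can be sketched as a hard-cut detector: a boundary is declared wherever the histogram distance between consecutive frames exceeds a threshold. Grayscale frames, the bin count, and the threshold are illustrative assumptions here.

```python
def histogram(frame, bins=8, max_val=256):
    """Intensity histogram of a 2D frame."""
    h = [0] * bins
    for row in frame:
        for v in row:
            h[v * bins // max_val] += 1
    return h

def shot_boundaries(frames, threshold=0.5):
    """Indices where the normalized L1 histogram distance to the previous
    frame exceeds `threshold`, marking hard cuts between shots."""
    cuts = []
    for i in range(1, len(frames)):
        a, b = histogram(frames[i - 1]), histogram(frames[i])
        dist = sum(abs(x - y) for x, y in zip(a, b)) / (2 * sum(a))
        if dist > threshold:
            cuts.append(i)
    return cuts
```

The normalized distance lies in [0, 1], so the threshold is independent of frame size; gradual transitions would need a windowed variant of the same test.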
  • Foreground extraction and object tracking.
    Real-time segmentation of moving regions in image sequences is a fundamental step in many vision systems, such as automated visual surveillance. Objects that temporally occlude near-static parts of the scene (called the background) are extracted as the foreground by background subtraction algorithms. This process consists of different stages: pre-processing, color space selection, feature extraction, model-based learning, and segmentation. Our contributions are in the pre-processing and feature extraction stages. We propose a novel pre-processing technique for background subtraction that represents the image at multiple scales, and we propose the co-linearity statistic as the feature to work with; we show that this combination gives the best results. We then focus on background subtraction for pan-tilt cameras, proposing a suitable technique for mosaicing the background model. Finally, we develop a target tracking system (DynaTracker).
  • Human activity representation, analysis, and recognition.
    Human activity recognition is an active area of research in computer vision, with wide-scale applications in video surveillance, motion analysis, virtual reality interfaces, robot navigation and recognition, video indexing, browsing, HCI, choreography, sports video analysis, etc. It consists of analyzing the characteristic features of various human actions and classifying them. The system consists of the following stages: background subtraction, tracking, feature extraction, and classification. We build a motion decomposition approach to analyze the periodicity in human actions. We further propose a novel video compression idea to compress these extracted periodic activities from the videos. Our method exploits the correlation between frames over a longer length of time, of the order of the period of the activity, compared to traditional video compression algorithms, which use correlation between a few neighboring frames for motion prediction and compensation. We also consider the problem of silhouette normalization for activity analysis. Keywords: Activity recognition, Principal component analysis, Object based compression, Combination classifier.
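The period of a repetitive activity feature can be estimated, for instance, by autocorrelation. The sketch below is illustrative only and is not the thesis's motion decomposition method; it assumes a scalar per-frame feature covering several action cycles.

```python
def estimate_period(signal, min_lag=2):
    """Return the lag maximizing the autocorrelation of a zero-meaned signal:
    a simple estimate of the period of a repetitive activity feature.
    Assumes the signal spans at least two full periods."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    def corr(lag):
        return sum(x[t] * x[t + lag] for t in range(n - lag)) / (n - lag)
    return max(range(min_lag, n // 2), key=corr)
```

Once the period is known, frames one period apart are highly correlated, which is exactly the redundancy the proposed long-range compression scheme exploits.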
  • Multi Camera Pan-Tilt Surveillance Networks:
    With the advent of cheaper cameras and faster computing facilities, multi-camera surveillance systems are becoming increasingly popular for wide-area surveillance. In this thesis, the challenges of multi-person, multi-camera tracking on a network of collaborating cameras are addressed. Each camera is modeled as a smart entity, capable of performing certain basic surveillance tasks independently. PTZ cameras have been chosen in order to reduce the coverage cost. A novel auto-calibration strategy is proposed that forms the backbone for assimilating the multiple marginal fields of view of a PTZ camera into one global mosaic. This approach is further extended during the implementation of surveillance algorithms for online adaptive background modeling on active cameras and for target localization followed by tracking in a dynamic background. These smart entities are then grouped together hierarchically, so as to form a plug-and-play system. A set-theoretic approach to finding spatial correspondences between pairwise cameras in a supervised manner is proposed for a practical 2D manifold, and its implications in a 3D environment are studied. Keywords: Multi Camera, Pan-Tilt, Surveillance, Image Processing.
  • Unsupervised Object Categorization from Surveillance Videos:
    Appearance descriptors computed over the complete animacy of an object form a powerful tool for scene analysis with object discovery in mind. This thesis proposes a means of obtaining such descriptors in an unsupervised manner from the tracking algorithm output. During its scene presence, an object presents itself in many poses with differing frequencies, thus generating multiple modes in the appearance feature space. For each object, we focus on its unoccluded intervals and obtain time-indexed vectors from shape and Haar templates, which are then clustered to obtain appearance classes. The object is modeled as a probability distribution over the space of co-occurrent shape and Haar templates. These object models are clustered in an unsupervised manner using the Bhattacharyya distance metric between object models. Further, an algorithm is proposed that gives asynchronous feedback to the database constructed in the discovery phase upon the arrival of new, unseen objects or appearances in further frames. Keywords: Object Discovery and Detection, Spatio-Temporal, Animacy, Vocabulary Update-Correction.
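The Bhattacharyya distance used to cluster the object models can be computed as below for discrete probability distributions. This minimal sketch assumes both distributions have overlapping support, since the distance diverges when the Bhattacharyya coefficient is zero.

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability distributions
    (e.g., object models over co-occurrent shape and Haar templates).
    Zero for identical distributions; larger for more dissimilar ones."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc)
```

Because the coefficient depends only on the overlap of the two distributions, the metric compares object models without requiring the same number of observations behind each.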
  • Visual Recognition of Hand Gestures in ASL:
    Sign language recognition has emerged as one of the most important research areas in the field of human-computer interaction. The aim of sign language recognition is to provide an efficient and accurate mechanism to transcribe sign language into text or speech. Existing recognition systems rely heavily on expensive instrumented gloves or markers to determine the signer's manual configuration; this is unnatural and restrictive for the signer. These systems have also mostly concentrated on finger spelling, where the user spells each word with hand signs corresponding to the letters of the alphabet. However, most signing does not involve finger spelling but, instead, gestures representing whole words, which allows signed conversations to proceed at about the pace of spoken conversation. In our work, we look at recognition of signs representing whole words in ASL rather than finger spelling. Keywords: ASL, Gesture, Recognition, Classification.
  • Intrusion detection in the presence of continuously and highly varying projector illumination, and finger gesture recognition in a dynamic environment with a mobile camera:
    I. The algorithm for the first part includes: a. carefully recording videos under certain constraints, followed by a training/learning phase and calibration of the projector and camera to match the number of frames per second; b. luminance compensation; c. dominant color compensation.
    II. An algorithm for finger gesture recognition in a dynamic environment with a mobile camera is also implemented. Presently, we are working on the following set of gestures: single-finger gestures include click, snapshot tool frame, arbitrary move, rotate (clockwise and anti-clockwise), and pan; two-finger gestures include drag and zoom (in and out).
  • Laser Pointer mouse:
    The aim of this project is to make a laser pointer mouse that can be used as a normal mouse during presentations or slide shows. The computer display is projected on a screen using a projector, and a laser pointer is used as a tool to direct the mouse cursor position. The projection is captured using a FireWire digital camera; the captured stream is then analyzed, the position of the laser dot is located, and the mouse cursor is moved to the corresponding position. The other button functions of the mouse are implemented using a normal radio-frequency wireless mouse. Algorithm: 1. Project the computer display on the screen. 2. Set the camera in a symmetric position. 3. Capture the screen and adjust the gain. 4. For each frame of the live video stream: i. get the frame; ii. save it as a PPM image; iii. extract the red channel image; iv. find all pixels with the highest red intensity; v. find the centroid of these maximum-red-intensity points; vi. set the mouse cursor to the coordinates of the centroid.
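Steps iv and v of the algorithm can be sketched as follows, assuming the red channel of a frame is a 2D list of intensities (a minimal illustration, not the project's code).

```python
def laser_dot_position(red_channel):
    """Centroid (row, col) of all pixels attaining the maximum red
    intensity in the frame: the estimated laser dot position."""
    peak = max(max(row) for row in red_channel)
    pts = [(i, j) for i, row in enumerate(red_channel)
           for j, v in enumerate(row) if v == peak]
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))
```

Averaging over all peak pixels, rather than taking the first one found, keeps the cursor stable when the saturated laser dot spans several pixels.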
  • A Novel Mouse-Based Authentication System:
    We have developed a mouse-based authentication system using Java technology (on the Java 2 platform). In our system, the user sets pass-actions (instead of passwords) using mouse motion, and through numerous user-friendly GUIs (graphical user interfaces) each user can enter the system by performing valid pass-actions. We have developed the GUIs keeping in mind that users have varying interests; thus, each user can personalize his or her GUIs and pass-actions. This reduces the need to memorize long keywords or PINs. The following flowchart demonstrates the four stages involved in our authentication system.
  • Novel Representations, Techniques and Error Evaluation for 3D Reconstruction:
    This work proposes a Multiple Axis Object Centered Cylindrical Coordinate System (MAOCCCS) for the representation of 3D models reconstructed by the silhouette-based technique. A single-axis cylindrical coordinate system is insufficient for representing objects with multiple auxiliary components because it leads to an unequal distribution of points over the object. It also proposes a camera calibration method using mesh-grid patterns for the laser-based technique; using this method, we are able to achieve sub-millimeter accuracy in the reconstructed 3D models. A novel technique for the error evaluation of a reconstruction method, closely related to the conventional concept of the visual hull, has been introduced in this work. We have also performed texture mapping on the reconstructed models.
  • Novel Techniques for 3D reconstruction:
    The problem of reconstructing the exact shape and texture of an object from its multiple views is considered. We observe the object from multiple views and capture its silhouette and texture images. We propose a method to automatically reconstruct the shape of the object from these silhouette images. However, silhouette images can only give the visual hull of the object, so in order to reconstruct the concavities, we propose a laser-light-based approach that can reconstruct even concave surfaces. Finally, after reconstructing the exact shape, we map the texture onto the object using images captured at multiple angles. Keywords: 3D Reconstruction, Texture Mapping, Structured Light Based Reconstruction, Silhouette Based Reconstruction. To see photos of the lab's work and setup, refer to the link below.

Publications:
  • R Mallik, A Misra, K S Venkatesh, "Nano-Scaffold Matrices For Size-Controlled, Pulsatile Transdermal Testosterone Delivery: Nanosize Effects On The Time Dimension", 2008 Nanotechnology (Electronic Journal) IOP Publishing Ltd.
  • Tripuresh Mishra, A Dutta, K S Venkatesh, "Regrasp Planning For Capture Of A Moving And Deforming Object Using Vision" Mechatronics, a Journal of IFAC, the International Federation of Automatic Control, Elsevier, 2008
  • T Santra, K S Venkatesh, A Mukerjee, "Self Managing Systems Based On Human Social Behaviors", Proceedings of the Ninth Intl. Symp. on Artificial Intelligence and Mathematics (AIMATH 2006), Ft. Lauderdale, Florida, USA, Jan 4 to 6, 2006
  • P Guha, K S Venkatesh, A Mukerjee, "A Multi-Scale Colinearity Statistic Based Approach To Robust Background Modeling", Proceedings of the Seventh Asian Conf. on Computer Vision (ACCV 06), Hyderabad, Jan 13 to 16, 2006
  • Kiran Kumar, K S Venkatesh, "A New Computationally Efficient Shot Detection Algorithm for Cricket Videos", Proceedings of the Twelfth National Conference on Communications, NCC 06, IIT Delhi January 26 to 28, 2006
  • P Guha, A Mukerjee, K.S. Venkatesh, "Spatio-temporal Discovery: Appearance + Behavior = Agent", Proceedings of the 5th Indian Conference on Vision, Graphics and Image Processing, Madurai (India), December 13 to 16, 2006, Lecture Notes in Computer Science (LNCS), Vol. 4338, Springer, pp. 516-527.
  • P Guha, A Biswas, A Mukerjee, P Sateesh, K.S. Venkatesh, "Surveillance Video Mining", Proceedings of the Third IET International Conference on Visual Information Engineering, pp. 447-453, Bangalore (India), September 26 to 28, 2006
  • A Biswas, P Guha, A Mukerjee, K.S. Venkatesh, "Intrusion Detection and Tracking with Pan-Tilt Cameras", Proceedings of the Third IET International Conference on Visual Information Engineering, pp. 565-571, Bangalore (India), September 26 to 28, 2006.
  • P Guha, A Mukerjee, K.S. Venkatesh, P Mitra, "Activity Discovery from Surveillance Videos", Proceedings of the 18th International Conference on Pattern Recognition (ICPR), Vol. 1, pp. 433-436, Hong Kong (China), August 20 to 24, 2006
  • P Guha, A Mukerjee, K.S. Venkatesh, "Appearance Based Multi-Agent Tracking Under Complex Occlusions", Proceedings of the 9'th Pacific Rim International Conference on Artificial Intelligence (PRICAI), Lecture Notes in Computer Science (LNCS), Vol. 4099, Springer, pp. 593-602, Guilin (China), August 7 to 11, 2006.
  • P Guha, A Biswas, A Mukerjee, K.S. Venkatesh, "Occlusion Sequence Mining for Complex Multi-Agent Activity Discovery", Proceedings of The Sixth IEEE International Workshop on Visual Surveillance (In conjunction with ECCV 2006), pp. 33 to 40, Graz (Austria), 13'th May, 2006
  • M P Sriram, K.S. Venkatesh, "A Robust Algorithm for Automatic Novel View Synthesis/Stereo Compression With Efficient Handling of Occlusions and Illumination Effects", Proceedings of the Ninth Asian Symposium on Information Display (ASID), New Delhi, India, Oct 8 to 12, 2006
  • Varsha H Chandrashekhar, K S Venkatesh, "Action Energy Images for Reliable Human Action Recognition", Proceedings of the Ninth Asian Symposium on Information Display (ASID), New Delhi, India, Oct 8 to 12, 2006
  • A Jamal, K S Venkatesh, "A New Color Based Optical Flow Algorithm for Environment Mapping Using a Mobile Robot", IEEE Multi-conference on Systems and Control (MSC-2007) Oct 1 to 3, 2007, Singapore
  • Meenakshi Gupta, T Naveen Kumar, Laxmidhar Behera, K S Venkatesh and Ashish Dutta, "Environment Modelling in Mobile Robotics through Takagi-Sugeno Fuzzy Model", Irish Signals and Systems Conference, June 2009, Dublin
  • Meenakshi Gupta, Laxmidhar Behera, K S Venkatesh and Girijesh Prasad, "Optimized Takagi Sugeno Fuzzy Motion Controller Design For Object Tracking With The Mobile Robot", IEEE TechSym 2010, IIT Kharagpur

Contact Person:
Dr. K. S. Venkatesh
Email: venkats@iitk.ac.in
Location: ACES 207

Photos: