Thanh-Hai TRAN

Image and Video Understanding 

Natural lines detection for image representation
This work was conducted in the context of my master's and Ph.D. theses at the GRAVIR laboratory, Inria Rhône-Alpes, Montbonnot, France, under the supervision of Prof. Augustin Lux from 09/2001 to 06/2006. In this work, we proposed a new definition of ridges and valleys based on an analysis of the Laplacian response of the image in scale space. We detect ridges and valleys as local directional maxima of the Laplacian; they correspond to the central axes of structures, together with an estimate of their width. Ridges and valleys show significant advantages for representing all kinds of natural line structures. We applied them to text and human detection in images and obtained very good results [more detail ... ]
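The detection idea can be sketched as follows — a minimal NumPy illustration, not our original implementation: the image is Gaussian-smoothed, a scale-normalized Laplacian response is computed, and ridge points are taken as local maxima of the negative Laplacian along one direction (fixed to the vertical here for simplicity; the actual method searches along the estimated ridge direction). Function names and the response threshold are illustrative.

```python
import numpy as np

def gaussian_kernel(sigma):
    # 1-D Gaussian kernel, normalized to sum to 1
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(img, sigma):
    # Separable Gaussian smoothing via two 1-D convolutions
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode='same')

def laplacian(img):
    # 5-point discrete Laplacian (borders left at zero)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4.0 * img[1:-1, 1:-1])
    return lap

def ridge_points(img, sigma, thresh=0.01):
    # Ridges as local maxima of the scale-normalized negative Laplacian,
    # taken here across rows (i.e. for roughly horizontal ridges)
    resp = -laplacian(smooth(img, sigma)) * sigma**2
    pts = []
    for r in range(1, resp.shape[0] - 1):
        for c in range(resp.shape[1]):
            if resp[r, c] > thresh and resp[r, c] > resp[r-1, c] \
               and resp[r, c] > resp[r+1, c]:
                pts.append((r, c))
    return pts
```

On a synthetic image containing a horizontal bar, the detected points trace the central axis of the bar, which is exactly the behaviour the definition asks for.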
Visual localization and mapping
This research was carried out in the framework of the Ph.D. thesis of Quoc-Hung Nguyen, under my supervision and that of Prof. Quang-Hoan Nguyen, from 12/2010 to 12/2016 at Hanoi University of Science and Technology. In this work, we made three main contributions: i) proposed a hybrid representation of the environment for visual servoing; ii) enriched keypoints detected from images of floor planes to improve the visual servoing algorithm; iii) improved vision-based localization in indoor environments by discarding similar images and keeping only distinctive key locations. The experiments were conducted in three environments: a floor of the B1 building, a floor of the Ta Quang Buu library, and a floor of the Nguyen Dinh Chieu school. We have successfully built an application that supports the visually impaired in navigating indoor environments using a robot [more detail ... ].
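The idea behind contribution iii) — keeping only distinctive key locations — can be sketched as follows. This is a simplified illustration using gray-level histograms and cosine similarity; the descriptor, similarity measure, and threshold are assumptions for the sketch, not the ones used in the thesis.

```python
import numpy as np

def hist_descriptor(img, bins=16):
    # Normalized gray-level histogram as a cheap whole-image descriptor
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    h = h.astype(float)
    return h / (h.sum() + 1e-9)

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def select_keyframes(frames, sim_thresh=0.95):
    # Keep a frame as a new key location only if it is sufficiently
    # dissimilar from the last kept keyframe
    keys, last = [], None
    for i, f in enumerate(frames):
        d = hist_descriptor(f)
        if last is None or cosine_sim(d, last) < sim_thresh:
            keys.append(i)
            last = d
    return keys
```

Repeated views of the same location collapse onto one keyframe, so the localization database stores only distinctive places.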
Object detection and fitting from point cloud data
We proposed a geometry-based method for 3D object fitting and localization in the context of building a grasping-aid service for visually impaired people using information from a Kinect sensor. Given the two constraints of this application, (1) the object of interest lies on a table and (2) its geometrical form is known in advance from the user's query, the proposed system consists of three steps: table plane detection, object detection, and object fitting and localization. Our work makes three contributions. First, we propose to use an organized point cloud representation instead of an unorganized point cloud in order to speed up computation and improve the accuracy of table plane detection. Second, we employ MLESAC (Maximum LikElihood SAmple Consensus), which gives better results for object fitting. Third, we introduce a new method for evaluating the object localization task and make a quantitative evaluation of object localization on our captured dataset [more detail ... ]
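A robust plane fit of the kind used for table plane detection can be sketched as below. Note that this sketch scores hypotheses with truncated squared residuals (MSAC), a common simplification of the MLESAC likelihood score; the tolerance and iteration count are illustrative.

```python
import numpy as np

def fit_plane(pts):
    # Least-squares plane through 3+ points: returns unit normal n and
    # offset d such that n . p + d = 0 for points p on the plane
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    n = vt[-1]                       # direction of smallest variance
    return n, -float(n @ centroid)

def msac_plane(points, iters=200, tol=0.02, rng=np.random.default_rng(0)):
    # MSAC-style robust plane fit: minimize the sum of truncated squared
    # point-to-plane residuals over random 3-point hypotheses
    best, best_score = None, np.inf
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n, d = fit_plane(sample)
        r = np.abs(points @ n + d)
        score = np.minimum(r**2, tol**2).sum()
        if score < best_score:
            best, best_score = (n, d), score
    return best
```

Points within `tol` of the winning plane are the table inliers; the remaining points above the plane are passed on to object detection and fitting.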
Image analysis in biodiversity

We applied and improved image processing techniques in three main biodiversity applications. The first is a system for automated classification of rice varieties for rice seed production. We investigated various feature extraction techniques for efficient rice seed image representation and analyzed the performance of powerful classifiers on the extracted features to find the most robust one. Images of six different rice seed varieties from northern Vietnam were acquired and analyzed. This result can be used to develop a computer-aided machine vision system for automated assessment of rice seed purity. The second application addresses tea harvesting: counting tender tea shoots in a sampled area is required before deciding to pluck, but it is a tedious task that takes a large amount of time. We proposed a vision-based method for automatically detecting and counting the number of tea shoots in an image acquired from a tea field, offering an elegant way to build an assisting tool for tea harvesting. The third application identifies medicinal plants using multimodal information [more detail ... ]
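As a toy illustration of the feature-extraction-plus-classifier pipeline used for rice variety classification, the sketch below computes a few simple shape and intensity features per seed image and classifies with a nearest-centroid rule. The features and classifier are deliberately simplistic stand-ins for the descriptors and classifiers actually studied.

```python
import numpy as np

def seed_features(img, thresh=0.5):
    # Illustrative per-seed features: area, elongation and mean intensity
    # of the thresholded seed region
    mask = img > thresh
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return np.zeros(3)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    return np.array([float(mask.sum()),        # area
                     max(h, w) / min(h, w),    # elongation
                     img[mask].mean()])        # mean intensity

def classify(train_feats, train_labels, query):
    # Nearest-centroid classifier over per-variety feature means
    labels = sorted(set(train_labels))
    cents = {l: np.mean([f for f, y in zip(train_feats, train_labels) if y == l],
                        axis=0)
             for l in labels}
    return min(labels, key=lambda l: np.linalg.norm(query - cents[l]))
```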
Human detection and tracking
Object tracking is a problem that interests more and more researchers due to its wide range of real applications. In monitoring applications, where we need to detect and track multiple people in a scene, multiple object tracking is necessary. This problem is much more complex than single object tracking because of the additional steps of data association and management of the appearance and disappearance of objects. In this work, we proposed to detect humans in a fast and efficient manner. First, we apply background subtraction to discard all static regions of the image and keep only moving regions of interest (ROIs); then we verify whether each ROI contains a human with a HOG-SVM human detector. For human tracking, we proposed a framework for multiple object tracking using a Kalman filter, with data association based on a histogram-based similarity measurement. This method tracks multiple people in real time and can thus be applied in real applications [more detail ... ]
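The tracking stage can be sketched as a constant-velocity Kalman filter per tracked person (the histogram-based data association step is omitted here); the noise parameters below are illustrative.

```python
import numpy as np

class Kalman2D:
    # Constant-velocity Kalman filter for one tracked person:
    # state [x, y, vx, vy], measurement [x, y] (detected person centre)
    def __init__(self, x0, y0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # large initial uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0         # x += vx, y += vy
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0         # observe position only
        self.Q = np.eye(4) * q                    # process noise
        self.R = np.eye(2) * r                    # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)            # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In the full framework, each verified detection is matched to a track by histogram similarity before `update` is called, and unmatched tracks or detections trigger track deletion or creation.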
Human activity representation and recognition
In this study, we address common issues in fall detection by combining multi-modal features (skeleton and RGB) from a Kinect sensor to take advantage of each data characteristic. If a skeleton is available, we apply a rule-based technique on the vertical velocity and the height above the floor plane of the human center. Otherwise, we compute a motion map from a continuous gray-scale image sequence, represent it by an improved kernel descriptor, and feed it to a linear Support Vector Machine. This combination speeds up the proposed system and avoids missed detections beyond the measurable range of the Kinect sensor. We then deploy this method with multiple Kinects to handle large environments, based on a client-server architecture with late fusion techniques. We evaluated the method on several freely available datasets for fall detection. Compared to recent methods, ours has a lower false alarm rate while keeping the highest accuracy. We also validated our system on-line using multiple Kinects in a large lab-based environment. Its on-line deployment on multiple Kinects shows the potential to apply it to real living spaces [more detail ... ]
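The skeleton branch of the rule can be sketched as below; the velocity and height thresholds are hypothetical values chosen for illustration, not those tuned in the paper.

```python
def detect_fall(centers, dt=1.0 / 30, v_thresh=-1.5, h_thresh=0.4):
    # centers: per-frame height (metres) of the skeleton centre above the
    # detected floor plane. A fall is flagged when the vertical velocity is
    # strongly downward AND the centre ends up close to the floor.
    for i in range(1, len(centers)):
        v = (centers[i] - centers[i - 1]) / dt   # vertical velocity (m/s)
        if v < v_thresh and centers[i] < h_thresh:
            return True
    return False
```

Requiring both conditions is what separates a fall from slow, intentional movements such as sitting down or lying on a bed.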
Tracking for augmented reality
This work develops a fully automatic tracking system based on matching keypoints at both steps: initialization and tracking. At initialization, the pose of the camera is reliably estimated by matching the first image against all reference images in a database using the SIFT matching algorithm [1]. Then we apply a fast algorithm, proposed in our previous works [9], to match the current image with the previous one and speed up the tracking. When the matching is not satisfactory, SIFT matching is applied to re-initialize the tracker. The main contribution is a camera tracking system that is fully automatic and works in near real time, making it suitable for real-time applications such as augmented reality, for which we show results [more detail...]
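Keypoint matching of the kind used at initialization typically relies on Lowe's ratio test; a minimal sketch over precomputed descriptors (the descriptors themselves, e.g. SIFT, are assumed to be given):

```python
import numpy as np

def ratio_match(desc1, desc2, ratio=0.8):
    # Lowe-style ratio test: accept a match only when the nearest
    # descriptor in desc2 is clearly closer than the second nearest,
    # which suppresses ambiguous correspondences
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches
```

The surviving correspondences feed the pose estimation; when too few survive, the tracker falls back to full SIFT matching against the reference database to re-initialize.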

Vision-based Human-Machine Interaction

Hand posture recognition for human robot interaction
Hand posture recognition is an extremely active research topic in Computer Vision and Robotics, with many applications ranging from automatic sign language recognition to human-system interaction. We performed this work in the context of the Ph.D. thesis of Van-Toi Nguyen and made three contributions. For hand detection, we introduced internal Haar-like features, which are detected only within the region of the hand; this method outperforms the Viola-Jones method on the same dataset. For hand representation, we proposed a new descriptor based on the kernel descriptor method (KDES), which is more robust to scale change, rotation, and differences in object structure. We have built an application that uses hand postures to control a robot in the library [more detail ... ]
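Haar-like features, internal or otherwise, are computed in constant time from an integral image; below is a minimal sketch of a two-rectangle feature (the restriction of feature placement to the interior of the hand region, which is the contribution here, is not shown).

```python
import numpy as np

def integral_image(img):
    # Summed-area table with a zero first row/column so rectangle sums
    # need no boundary special cases
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(0).cumsum(1)
    return ii

def rect_sum(ii, r, c, h, w):
    # Sum of img[r:r+h, c:c+w] in O(1) from the integral image
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_two_rect(ii, r, c, h, w):
    # Two-rectangle (left vs right) Haar-like feature: the difference of
    # the sums over two adjacent w-wide rectangles of height h
    return rect_sum(ii, r, c, h, w) - rect_sum(ii, r, c + w, h, w)
```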
Dynamic hand gesture recognition and its application to human system interaction
This work introduces a new prototype of dynamic hand gestures and shows its advantages for controlling smart home appliances. The proposed gestures convey cyclical patterns of hand shapes as well as hand movements. Thanks to the periodicity of the defined gestures, on one hand, some common technical issues that appear when deploying the application (e.g., spotting gestures in a video stream) are addressed; on the other hand, periodicity is a supportive feature for building robust recognition schemes. To this end, we propose a novel hand representation in spatio-temporal space; in particular, the phase continuity of the gesture's trajectory is taken into account in the constructed space. This scheme obtains very competitive results, with a best accuracy of 96%. We deployed the proposed techniques to control home appliances such as lamps and fans. These systems have been evaluated both in a lab-based environment and at real exhibitions. In the future, the proposed gestures will be evaluated in terms of naturalness for end-users and robustness of the system [more detail ... ]
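The periodicity of the defined gestures can be exposed by autocorrelating one coordinate of the hand trajectory; a minimal sketch is given below (the actual representation additionally encodes phase continuity in a spatio-temporal space).

```python
import numpy as np

def dominant_period(signal):
    # Estimate the cycle length (in frames) of a roughly periodic 1-D
    # trajectory component via autocorrelation of the mean-removed signal
    x = np.asarray(signal, float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode='full')[len(x) - 1:]
    ac = ac / ac[0]
    # first local maximum after the zero-lag peak marks one full cycle
    for lag in range(1, len(ac) - 1):
        if ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]:
            return lag
    return None
```

A stable recovered period is itself a strong cue for spotting a deliberate cyclic gesture in a continuous video stream.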

Face detection and recognition
This work concerns the problem of face recognition from video under uncontrolled lighting conditions. To cope with illumination change, we adopt the idea of preprocessing input images so that their representation is robust to illumination change [1]. We then use the embedded Hidden Markov Model (EHMM), a well-known model for face recognition, to identify faces [2]. The main reason we study and experiment with this model is that it represents the structure of face images, making the face representation more explicit than numeric face descriptors. The traditional EHMM was applied to face recognition from still images; in our work, we deal with face recognition from video. Therefore, we propose to combine the recognition results obtained from several frames to make the decision more confident, which significantly improves the recognition rate. We trained our model and tested it on two face databases: the Yale-B database and a database created by MICA Institute.
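Combining per-frame results can be sketched as accumulating per-identity scores (e.g., EHMM log-likelihoods) over frames before deciding; the identities and score values below are purely illustrative.

```python
def fuse_frames(frame_scores):
    # frame_scores: one dict per video frame mapping identity -> score
    # (e.g. the log-likelihood of that frame under each person's EHMM).
    # Summing log-likelihoods over frames yields a decision that is more
    # confident than any single-frame decision.
    total = {}
    for scores in frame_scores:
        for ident, s in scores.items():
            total[ident] = total.get(ident, 0.0) + s
    return max(total, key=total.get)
```

A single noisy frame (here the second one) no longer flips the decision, which is exactly the gain over still-image recognition.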