Ms. Pham Thi Thanh Thuy, a PhD student at the MICA Institute, brilliantly defended her thesis in Hanoi on 23 March 2017 and thereby earned the degree of Doctor of Science, specializing in Computer Science.



Title: Person localization and identification by fusion of vision and WiFi


Thesis supervisor: Dr LE Thi Lan
Thesis co-supervisor: Dr DAO Trung Kien



Prof. Huynh Quyet Thang (Chief, SoICT – HUST)

Dr. Tran Thi Thanh Hai (Secretary, MICA – HUST)

Prof. Ngo Quoc Tao (Member, IOIT – VAST)

Prof. Tran Dinh Que (Member, Posts and Telecommunications Institute of Technology)

Prof. Do Nang Toan (Member, ITI – VNU)

Prof. Tran Duc Tan (Member, Coltech – VNU)

Dr. Tran Quang Vinh (Member, SET – HUST)



Modern technology is changing human life in many ways, notably in the way people interact with technological products. Human-computer interaction is becoming more natural and convenient, which makes our lives more enjoyable and comfortable. A new concept has emerged from this change: Ambient Intelligence (AmI). Although the vision of AmI was introduced more than ten years ago and research on it has strengthened and expanded, the development and implementation of the resulting systems and environments are still in their infancy. Many practical challenges remain to be addressed in each of the contributing technological areas and in particular applications.

In this research, the contextual information of person position and identity is considered in indoor environments. These are two of the most crucial attributes for ambient environments. In order to determine position (where a person is) and identity (who a person is) in indoor environments, the two problems of person localization and identification need to be solved. A wide range of sensors can be used to handle these problems, such as Ultra-Wideband (UWB), ultrasound, Radio-Frequency Identification (RFID), camera, WiFi, etc. It has by now become apparent that no solution based on a single technology is suitable for all cases. Therefore, besides developing optimal algorithms for each technology, fusing them is a new trend in solving the problem of person localization and identification in indoor environments. The main purpose of the fusion is to retain the benefits of each individual sensor technology while mitigating its weaknesses. Motivated by this, our research focuses on person localization and identification by fusion of WiFi-based and vision-based technologies.



• Contribution 1: An improvement for WiFi-based user localization is proposed. An efficient path-loss model is defined that takes obstacle constraints in indoor environments into account, so that the relationship between RSSI and the distance between a mobile user and the APs can be modeled effectively. The well-known fingerprinting method is used with a new radio map that yields stable and reliable fingerprint data for localization. To match a query sample against the fingerprint data, the KNN method is used with an additional coefficient reflecting the chronological changes of the fingerprint data in the environment. The WiFi-based localization results are used to activate the vision-based localization processes at the cameras that lie within range of the position returned by the WiFi system.

• Contribution 2: An efficient shadow removal method is proposed that combines chromaticity-based and physical features. A density-based score fusion scheme is built to integrate the shadow-matching scores achieved by each independent feature. The proposed shadow removal method is particularly effective for human detection, tracking, localization and Re-ID.

• Contribution 3: A scheme combining HOG-SVM and GMM background subtraction is proposed for human detection. The scheme takes advantage of the high speed of GMM and the accuracy of HOG-SVM. For the HOG-SVM detector, we build HOG descriptors and train the SVM on our own database and the standard INRIA dataset. This improves the performance of HOG-SVM human detection in the considered environments.

• Contribution 4: A method is proposed for linking person trajectories in camera networks. Based on the observation that the cameras share views of a common floor plane on which people move, each pair of cameras forms a stereo vision system on a single floor plane. By calibrating the cameras for this stereo vision, a person's trajectories in the images captured by different cameras can be transformed to the corresponding real-world locations on a single floor map.

• Contribution 5: An efficient appearance-based human descriptor is applied for person Re-ID in camera networks. The descriptor is built on each human ROI returned by the human detector. Three different features (gradient, color and shape) are extracted from a human ROI at three levels (pixel, patch and whole human ROI), and three match kernel functions are then built from them. Fusing these match kernel functions results in a descriptor that is invariant to the scale and rotation of human images captured from different camera view angles and distances. This is especially helpful for multi-camera surveillance scenarios that exhibit high intraclass


• Contribution 6: A new fusion method is proposed for a combined WiFi and camera system for person localization and identification. By using the state prediction and correction steps of the Kalman filter together with an optimal assignment, the proposed fusion method maintains the high accuracy of vision-based person localization. In addition, with this fusion, tracking by identification based on ID cues from the WiFi adapter offers a better solution for person tracking and Re-ID in camera networks.

Apart from the main contributions mentioned above, a fully automated person surveillance system for indoor environments is proposed in this thesis. The system reflects the real surveillance scenarios found in most buildings. Towards building such a surveillance system, experiments are conducted to show the performance of other reported methods for person localization, identification and Re-ID. Additionally, we build our own datasets for testing the proposed methods and other reported methods.
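As an illustration of the WiFi-based localization idea in Contribution 1, the sketch below pairs a log-distance path-loss model, extended with a per-obstacle attenuation term, with a KNN match whose distance is inflated for older fingerprints. The model form, all parameter values (`p0`, `n`, `wall_losses`, `decay`) and the function names are assumptions made for illustration; the actual model and time coefficient of the thesis are not given in this summary.

```python
import math

def rssi_from_distance(d, p0=-40.0, n=3.0, d0=1.0, wall_losses=()):
    """Predicted RSSI (dBm) at distance d (m) from an AP.

    Log-distance path loss plus a fixed loss (dB) per obstacle crossed.
    p0 is the RSSI at reference distance d0; n is the path-loss exponent.
    """
    d = max(d, d0)  # clamp so distances below d0 do not raise the RSSI above p0
    return p0 - 10.0 * n * math.log10(d / d0) - sum(wall_losses)

def distance_from_rssi(rssi, p0=-40.0, n=3.0, d0=1.0, wall_losses=()):
    """Invert the model above: estimate the distance (m) from a measured RSSI."""
    return d0 * 10.0 ** ((p0 - sum(wall_losses) - rssi) / (10.0 * n))

def knn_locate(query, fingerprints, k=3, decay=0.0):
    """Weighted KNN over a radio map.

    fingerprints: list of ((x, y), rssi_vector, age_in_days).
    decay is a hypothetical coefficient that penalizes older fingerprints.
    """
    scored = []
    for pos, rssi, age in fingerprints:
        dist = math.sqrt(sum((q - r) ** 2 for q, r in zip(query, rssi)))
        scored.append((dist * (1.0 + decay * age), pos))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Closer fingerprints (in signal space) get larger weights.
    weights = [1.0 / (score + 1e-6) for score, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return (x, y)

if __name__ == "__main__":
    radio_map = [
        ((0.0, 0.0), [-45.0, -70.0], 2.0),
        ((5.0, 0.0), [-60.0, -55.0], 30.0),
        ((5.0, 5.0), [-72.0, -48.0], 1.0),
    ]
    print(knn_locate([-58.0, -57.0], radio_map, k=2, decay=0.01))
```

With `decay=0.0` the match reduces to ordinary weighted KNN, which makes it easy to compare the effect of the age penalty on the same radio map.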


General framework and outline of the thesis

In each camera FOV, person localization is performed in three phases: human detection, tracking and 3D localization, which output the person identity (IDC) and corresponding positions (PC). Because the WiFi range covers the camera FOVs, the vision-based positioning results in each camera FOV are combined with the WiFi-based localization results by a fusion algorithm in order to make effective decisions about the position and identity of each person in the environment. When people move from one camera FOV to another, they are re-identified to update the ID of each individual trajectory. The trajectories across cameras are also linked to show the entire route in the environment.
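The fusion step described above can be sketched roughly as follows: each camera track runs a Kalman-style predict and correct cycle on its floor position, and WiFi position fixes are matched to tracks by a minimum-cost assignment so that each track inherits a device ID. This is a minimal sketch under stated assumptions: the random-walk motion model, the noise variances, the brute-force assignment (a stand-in for an optimal assignment solver such as the Hungarian algorithm) and every name in the code are hypothetical and not taken from the thesis.

```python
import itertools
import math

class Track:
    """One tracked person: 2D floor position with a scalar variance."""

    def __init__(self, x, y, var=1.0):
        self.pos = [x, y]
        self.var = var
        self.wifi_id = None  # filled in once a WiFi device is assigned

    def predict(self, process_var=0.05):
        # Random-walk model: the position is kept, the uncertainty grows.
        self.var += process_var

    def correct(self, z, meas_var):
        # Standard Kalman correction for a direct position measurement.
        gain = self.var / (self.var + meas_var)
        self.pos = [p + gain * (m - p) for p, m in zip(self.pos, z)]
        self.var *= (1.0 - gain)

def assign_ids(tracks, wifi_fixes):
    """Return {track_index: device_id} minimizing the total distance.

    Brute force over permutations; assumes no more devices than tracks
    and only a handful of people in the camera FOV.
    """
    ids = list(wifi_fixes)
    best_cost, best = math.inf, ()
    for perm in itertools.permutations(range(len(tracks)), len(ids)):
        cost = sum(math.dist(tracks[t].pos, wifi_fixes[d])
                   for t, d in zip(perm, ids))
        if cost < best_cost:
            best_cost, best = cost, perm
    return dict(zip(best, ids))

def fuse_step(tracks, vision_obs, wifi_fixes, vision_var=0.05, wifi_var=4.0):
    """One fusion cycle. vision_obs[i] is (x, y) or None for tracks[i]."""
    for trk, z in zip(tracks, vision_obs):
        trk.predict()
        if z is not None:
            trk.correct(z, vision_var)  # accurate camera measurement
    for t_idx, dev in assign_ids(tracks, wifi_fixes).items():
        tracks[t_idx].wifi_id = dev
        tracks[t_idx].correct(wifi_fixes[dev], wifi_var)  # coarse WiFi fix
    return tracks

if __name__ == "__main__":
    people = [Track(0.0, 0.0), Track(5.0, 5.0)]
    fuse_step(people, [(0.2, 0.1), (5.1, 4.9)],
              {"device-a": (4.0, 6.0), "device-b": (1.0, -1.0)})
    for p in people:
        print(p.wifi_id, p.pos)
```

Giving the WiFi fix a much larger measurement variance than the camera observation means the fused position stays close to the vision result, while the WiFi side mainly contributes the identity.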

In this thesis, the algorithms for person localization and identification are developed and evaluated in the combined system of WiFi and camera. The thesis is divided into five chapters, preceded by an introduction and followed by a conclusion with future research directions:

• Introduction: The motivation and objective of the thesis; the considered context, constraints and challenges that arise when dealing with the problems in the thesis. Additionally, an overview framework, thesis outline and the contributions of the thesis are also presented in this chapter.

• Chapter 1: Related work on person localization based on WiFi systems, camera systems and their fusion is discussed. In addition, the related issues of person Re-ID in camera networks are surveyed in this chapter.

• Chapter 2: The details of the proposed algorithm for WiFi-based localization and its experimental evaluation are presented.

• Chapter 3: A vision-based person localization system is proposed with three main phases: human detection, tracking and human localization. Improvements are made in each phase in order to enhance the system performance.

• Chapter 4: The problems of face-based person identification and appearance-based person Re-ID are addressed in real-time scenarios of a multi-camera surveillance system. An efficient appearance-based human descriptor is proposed and evaluated on person Re-ID in a camera network.

• Chapter 5: An integration of WiFi and visual signals for person localization, identification, and Re-ID is proposed in this chapter.

• Conclusion and future works: Major findings of the thesis are recapitulated and future directions are proposed for further research and development.
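The floor-map localization of Chapter 3 and the trajectory linking across cameras both rest on mapping image points to a shared floor plane. A minimal sketch of that mapping is given below, assuming each camera already has a calibrated 3x3 image-to-floor homography. The matrices `H_A` and `H_B`, the `max_gap` threshold and the function names are made-up illustration values, not calibration results or methods from the thesis.

```python
import math

def to_floor(H, pixel):
    """Map an image point (u, v), e.g. a person's foot point, to floor
    coordinates (x, y) using a 3x3 image-to-floor homography H."""
    u, v = pixel
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity on the floor plane")
    return (x / w, y / w)

def can_link(traj_a, traj_b, H_a, H_b, max_gap=0.5):
    """Decide whether trajectory B (camera B) may continue trajectory A
    (camera A): the end of A and the start of B must lie within max_gap
    meters of each other on the common floor map."""
    end_a = to_floor(H_a, traj_a[-1])
    start_b = to_floor(H_b, traj_b[0])
    return math.dist(end_a, start_b) <= max_gap

# Hypothetical calibration results for two cameras (image -> floor, meters).
H_A = [[0.01, 0.0, -1.0],
       [0.0, 0.01, -2.0],
       [0.0, 0.0, 1.0]]
H_B = [[0.02, 0.0, 0.0],
       [0.0, 0.02, 0.0],
       [0.0, 0.002, 1.0]]

if __name__ == "__main__":
    walk_in_a = [(100, 700), (200, 700), (300, 700)]
    walk_in_b = [(200, 500), (300, 500)]
    print([to_floor(H_A, p) for p in walk_in_a])
    print([to_floor(H_B, p) for p in walk_in_b])
    print(can_link(walk_in_a, walk_in_b, H_A, H_B))
```

A purely geometric test like this is only a first filter; in the framework described above, re-identification is what confirms that two linked trajectories belong to the same person.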