Estimating camera pose parameters and constructing a robust, accurate map from a sequence of monocular images is a challenging task in computer vision and robotics. It is closely related to several fundamental problems in computer vision, e.g., 3D reconstruction, image registration, and feature matching. Simultaneous Localization and Mapping (SLAM) and Augmented Reality (AR) are two applications whose performance depends heavily on the accuracy of camera pose estimation. Visual SLAM aims to estimate the camera trajectory while simultaneously constructing a sparse or dense representation of the environment: visual SLAM solutions incrementally build a map of the observed scene and then use this map to localize the camera. When the sensor is a range scanner or an RGB-D camera, the depth of extracted feature points is measured directly with known precision, so no initialization is needed for newly detected features. A monocular camera, in contrast, is a bearing-only sensor that provides only 2D measurements of a 3D environment; because depth is unavailable for newly observed features, they must be explicitly initialized.
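One widely used remedy for this initialization problem (not necessarily the one used here) is inverse-depth parameterization: a new feature is anchored at the camera position where it was first seen, along its observed bearing, with a deliberately uncertain inverse-depth prior. The sketch below is purely illustrative; the function names and the prior values `rho0` and `sigma_rho` are assumptions, not part of this project's pipeline.

```python
import numpy as np

def init_inverse_depth_feature(cam_pos, R_wc, pixel, K, rho0=0.1, sigma_rho=0.5):
    """Hypothetical inverse-depth initialization of a newly observed feature.

    cam_pos   : (3,) camera centre in the world frame at first observation
    R_wc      : (3,3) rotation from camera frame to world frame
    pixel     : (u, v) image observation of the feature
    K         : (3,3) camera intrinsic matrix
    rho0      : initial inverse depth in 1/m (assumed weak prior)
    sigma_rho : std. dev. of the inverse-depth prior (large = very uncertain)
    """
    # Back-project the pixel to a bearing ray in the camera frame ...
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    ray_cam = np.linalg.inv(K) @ uv1
    # ... then rotate the ray into the world frame and normalize it.
    ray_world = R_wc @ ray_cam
    ray_world /= np.linalg.norm(ray_world)
    # Feature state: anchor position, bearing, and uncertain inverse depth.
    return {"anchor": np.asarray(cam_pos, dtype=float),
            "bearing": ray_world, "rho": rho0, "sigma_rho": sigma_rho}

def feature_to_point(f):
    """Convert an inverse-depth feature to a Euclidean 3D world point."""
    return f["anchor"] + f["bearing"] / f["rho"]
```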

In this project, we developed a feature-based approach for tracking a hand-held camera in room-sized workspaces. The approach tracks a set of sparse features across successive video frames. A particle filter framework is employed to estimate the 6-DOF camera pose parameters, and a hierarchical (coarse-to-fine) method based on the Lucas-Kanade registration technique tracks the sparse extracted features, as illustrated in the sketches below.
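For the tracking front end, a hierarchical Lucas-Kanade scheme of the kind described above is available off the shelf as OpenCV's pyramidal tracker, `cv2.calcOpticalFlowPyrLK`. The fragment below is a minimal sketch of this general technique; the parameter values (window size, pyramid levels, corner-detector settings) are assumptions, not the project's actual configuration.

```python
import cv2
import numpy as np

def track_features(prev_gray, next_gray, prev_pts):
    """Track sparse features with pyramidal (hierarchical) Lucas-Kanade."""
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21),   # search window at each pyramid level (assumed)
        maxLevel=3,         # number of coarse-to-fine pyramid levels (assumed)
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    ok = status.ravel() == 1  # keep only features tracked successfully
    return prev_pts[ok], next_pts[ok]

# Example usage on two consecutive grayscale frames:
# prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
#                                    qualityLevel=0.01, minDistance=10)
# old_pts, new_pts = track_features(prev_gray, next_gray, prev_pts)
```

The particle filter over 6-DOF pose hypotheses can be summarized as a predict/weight/resample loop, with each particle scored by the reprojection error of the tracked features. The sketch below shows that structure under a pinhole camera model; the noise scales, the measurement sigma, and the helper functions are illustrative assumptions rather than the system's exact implementation.

```python
import numpy as np

def exp_so3(w):
    """Rotation matrix from a rotation vector w (Rodrigues' formula)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    k = w / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project(pts3d, pose, K):
    """Project world points with pose (R, t), where x_cam = R x_world + t."""
    R, t = pose
    pc = (R @ pts3d.T).T + t
    uv = (K @ pc.T).T
    return uv[:, :2] / uv[:, 2:3]  # perspective division

def pf_step(particles, pts3d, obs2d, K,
            trans_noise=0.01, rot_noise=0.005, meas_sigma=2.0):
    """One predict / weight / resample iteration over pose particles."""
    n = len(particles)
    weights = np.empty(n)
    for i in range(n):
        R, t = particles[i]
        # Predict: random-walk perturbation of each pose hypothesis.
        R = exp_so3(np.random.randn(3) * rot_noise) @ R
        t = t + np.random.randn(3) * trans_noise
        particles[i] = (R, t)
        # Weight: Gaussian likelihood of the feature reprojection error.
        err = project(pts3d, (R, t), K) - obs2d
        weights[i] = np.exp(-0.5 * np.sum(err**2) / meas_sigma**2)
    weights += 1e-12          # guard against total weight underflow
    weights /= weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = np.random.choice(n, size=n, p=weights)
    return [particles[j] for j in idx]
```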