Cognitive feature fusion for effective pattern recognition in multi-modal images and videos

Yan, Yijun

Thesis

Creator

Yan, Yijun

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2018

Thesis identifier

T15035

Person Identifier (Local)

201351638

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Abstract

Image retrieval and object detection have long been popular topics in computer vision, in which feature extraction and analysis play an important role. Effective feature descriptors can represent the characteristics of images and videos; however, across diverse images and videos, a single feature is no longer sufficient because of its inherent limitations. Fusing multiple feature descriptors is therefore desirable for extracting comprehensive information from images, and statistical learning techniques can be combined with this fusion to improve decision making in object detection and matching. This thesis focuses on three topics: logo image retrieval, image saliency detection, and small object detection from videos.

Trademark/logo image retrieval (TLIR), a branch of content-based image retrieval (CBIR), has drawn wide attention for many years. However, most TLIR methods are derived from CBIR methods that were not designed for trademark and logo images, which lack the rich colour and texture information of ordinary images. The proposed TLIR method extracts the characteristics of logo images by exploiting colour and spatial features. Furthermore, a novel adaptive fusion strategy is proposed for feature matching and image retrieval. Experimental results show that the proposed approach outperforms three benchmark methods.

Image saliency detection simulates human visual attention (i.e. bottom-up and top-down mechanisms) to extract the region of attention in images, and has been widely applied in tasks such as image segmentation, object detection, and classification. However, saliency detection in complex natural environments remains very challenging. Although various techniques have been proposed and have produced good results in particular cases, they are seldom modelled in a generic way grounded in human perception mechanisms. Inspired by Gestalt laws, a novel unsupervised saliency detection framework is proposed, in which both top-down and bottom-up perception mechanisms are used along with low-level colour and spatial features. Guided by several Gestalt laws, the proposed method can suppress the background and highlight the regions of interest. Comprehensive experiments on several popular large datasets validate the superior performance of the proposed methodology against eight unsupervised approaches.

Pedestrian detection is an important task in urban surveillance and can be extended to pedestrian tracking and recognition. Visible and thermal imagery are two commonly used data sources, and each has its own strengths and weaknesses. A novel approach is proposed to fuse the two data sources for effective pedestrian detection and tracking in videos. For pedestrian detection, background subtraction is used, in which an adaptive Gaussian mixture model (GMM) measures the distribution of colour and intensity in multi-modality images (RGB images and thermal images). These are integrated to determine the background model, and biological knowledge is used to help refine the background subtraction results. In addition, a constrained mean-shift algorithm is proposed to detect individual persons within groups. Experiments demonstrate the efficacy of the proposed approach in detecting pedestrians and separating them from groups for successful tracking in videos.
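The idea of background subtraction over paired RGB and thermal frames can be illustrated with a minimal sketch. This is not the method of the thesis: it swaps the adaptive Gaussian mixture model for a single running Gaussian per pixel, and it fuses the two modalities with a plain logical OR. The class `RunningGaussianBackground`, the function `fuse_masks`, and all parameter values are hypothetical names and choices made for illustration only.

```python
import numpy as np


class RunningGaussianBackground:
    """Per-pixel single-Gaussian background model (a simplification of a GMM)."""

    def __init__(self, first_frame, learning_rate=0.05, threshold=2.5, initial_var=25.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, initial_var)
        self.learning_rate = learning_rate  # assumed value, not from the thesis
        self.threshold = threshold          # distance threshold in standard deviations

    def apply(self, frame):
        """Return a boolean foreground mask and update the background statistics."""
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        # A pixel is foreground if it lies more than `threshold` std devs from the mean.
        foreground = diff ** 2 > (self.threshold ** 2) * self.var
        # Adapt the model only where the pixel still looks like background.
        rho = np.where(foreground, 0.0, self.learning_rate)
        self.mean += rho * diff
        self.var += rho * (diff ** 2 - self.var)
        self.var = np.maximum(self.var, 1e-6)  # keep the variance strictly positive
        # Multi-channel input: flag the pixel if any channel deviates.
        if foreground.ndim == 3:
            foreground = foreground.any(axis=2)
        return foreground


def fuse_masks(rgb_mask, thermal_mask):
    """Hypothetical fusion rule: a pixel is foreground if either modality says so."""
    return rgb_mask | thermal_mask


# Synthetic demo: a static scene, then a warm object visible only in the thermal image.
rgb_bg = np.full((8, 8, 3), 100, dtype=np.uint8)
thermal_bg = np.full((8, 8), 20, dtype=np.uint8)

rgb_model = RunningGaussianBackground(rgb_bg)
thermal_model = RunningGaussianBackground(thermal_bg)

for _ in range(3):  # let both models adapt to the empty scene
    rgb_model.apply(rgb_bg)
    thermal_model.apply(thermal_bg)

thermal_frame = thermal_bg.copy()
thermal_frame[2:4, 2:4] = 80  # warm region, unchanged in RGB

mask = fuse_masks(rgb_model.apply(rgb_bg), thermal_model.apply(thermal_frame))
```

In this synthetic case the RGB model sees no change, so only the thermal model contributes to the fused mask. A mixture model, a more careful fusion rule, and the refinement and mean-shift steps described in the abstract would all go beyond this sketch.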

Advisor / supervisor

Ren, Jinchang
Soraghan, John

Resource Type

Doctoral thesis

DOI

10.48730/sx74-5j86

Date Created

2018

Former identifier

9912680193502996

Relations

Items

Title	Date Uploaded	Visibility
PDF of thesis T15035	2021-07-02	Public
