Deep spiking neural networks with applications to human gesture recognition

Xing, Yannan

Thesis

Deep spiking neural networks with applications to human gesture recognition

Download PDF

Creator

Xing, Yannan

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2020

Thesis identifier

T15900

Person Identifier (Local)

201763014

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Abstract

The spiking neural networks (SNNs), as the 3rd generation of Artificial Neural Networks (ANNs), are a class of event-driven neuromorphic algorithms that potentially have a wide range of application domains and are applicable to a variety of extremely low power neuromorphic hardware. The work presented in this thesis addresses the challenges of human gesture recognition using novel SNN algorithms. It discusses the design of these algorithms for both visual and auditory domain human gesture recognition as well as event-based pre-processing toolkits for audio signals.From the visual gesture recognition aspect, a novel SNN-based event-driven hand gesture recognition system is proposed. This system is shown to be effective in an experiment on hand gesture recognition with its spiking recurrent convolutional neural network (SCRNN) design, which combines both designed convolution operation and recurrent connectivity to maintain spatial and temporal relations with address-event-representation (AER) data. The proposed SCRNN architecture can achieve arbitrary temporal resolution, which means it can exploit temporal correlations between event collections. This design utilises a backpropagation-based training algorithm and does not suffer from gradient vanishing/explosion problems.From the audio perspective, a novel end-to-end spiking speech emotion recognition system (SER) is proposed. This system employs the MFCC as its main speech feature extractor as well as a self-designed latency coding algorithm to effciently convert the raw signal to AER input that can be used for SNN. A two-layer spiking recurrent architecture is proposed to address temporal correlations between spike trains. The robustness of this system is supported by several open public datasets, which demonstrate state of the arts recognition accuracy and a significant reduction in network size, computational costs, and training speed.In addition to directly contributing to neuromorphic SER, this thesis proposes a novel speech-coding algorithm based on the working mechanism of humans auditory organ system. The algorithm mimics the functionality of the cochlea and successfully provides an alternative method of event-data acquisition for audio-based data. The algorithm is then further simplified and extended into an application of speech enhancement which is jointly used in the proposed SER system. This speech-enhancement method uses the lateral inhibition mechanism as a frequency coincidence detector to remove uncorrelated noise in the time-frequency spectrum. The method is shown to be effective by experiments for up to six types of noise.

Advisor / supervisor

Soraghan, John

Resource Type

Doctoral thesis

DOI

10.48730/9qhn-xv15

Relations

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	PDF of thesis T15900	2021-07-28	Public	Download

Deep spiking neural networks with applications to human gesture recognition

Downloadable Content

Relations

Items