Optimised algorithm implementation for effective alignment of gene sequences

Rights statement
Awarding institution
  • University of Strathclyde
Date of award
  • 2022
Thesis identifier
  • T16337
Person Identifier (Local)
  • 201886414
Qualification Level
Qualification Name
Department, School or Faculty
Abstract
  • With the completion of the Human Genome Project (HGP) and the vigorous development of the Model Organism Genome Project, ever more molecular sequence data have been generated. The analysis and processing of these data not only promote the development of bioinformatics research methods and technologies, but also have broad applications in the prevention, diagnosis and treatment of human diseases, in new drug development, and in the response to major epidemics. How to give an effective graphical representation of a gene sequence, and how to analyze the similarity and evolutionary relationships of genes on that basis, has become a hot topic in bioinformatics. This dissertation mainly studies the graphical representation of DNA sequences, the similarity analysis of biological sequences, and algorithms for constructing phylogenetic trees. The main achievements are summarized below. Firstly, the JZ-curve, a new graphical representation of the gene sequence, is introduced. By defining three mathematical mappings, a gene sequence can be transformed into three curves. We show that the JZ-curve not only avoids the limitations associated with crossing and overlapping, but also retains the biological information of the gene sequence. Secondly, we construct a new characteristic matrix, named the J/J matrix. When sequence similarity is studied on the basis of a graphical representation of the DNA sequence, the J/J characteristic matrix derived from the JZ-curve can describe the chemical characteristics and the biological significance of gene sequences. An examination of the similarities and dissimilarities among the coding sequences of the first exon of the β-globin gene of different species illustrates the utility of the approach. Thirdly, based on the JZ-curve, a fuzzy clustering algorithm founded on spectral graph theory is proposed for constructing phylogenetic trees.
With this cluster analysis method, we build phylogenetic trees and determine the evolutionary relationships between the sequences. The algorithm considers not only the divergence between classes but also the similarity between classes, which increases the accuracy of the results. The phylogenetic relationships obtained for the coding sequences of the first exon of the β-globin gene of 11 different species, and for the NA (H1N1) sequences of avian influenza virus, show that the algorithm gives credible results.
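The abstract does not give the three mappings that define the JZ-curve, so they cannot be reproduced here. As an illustration of the general idea only (turning a DNA sequence into three cumulative curves), the sketch below uses the well-known Z-curve as a stand-in. The function name z_curve and the STEPS table are assumptions made for this example and are not taken from the thesis.

```python
from itertools import accumulate

# Per-base steps of the classic Z-curve, used here as a stand-in: the
# JZ-curve mappings themselves are not stated in the abstract.
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak hydrogen bond (A, T) vs strong hydrogen bond (G, C)
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(sequence):
    """Return three lists (xs, ys, zs), one cumulative curve per mapping.

    Raises KeyError for any character other than A, C, G or T.
    """
    steps = [STEPS[base] for base in sequence.upper()]
    if not steps:
        return [], [], []
    # Transpose the per-base steps into three axes, then take running sums.
    xs, ys, zs = (list(accumulate(axis)) for axis in zip(*steps))
    return xs, ys, zs


if __name__ == "__main__":
    xs, ys, zs = z_curve("ATGC")
    print(xs, ys, zs)
```

Each curve has one point per base, so two sequences of equal length can be compared point by point. How the thesis derives its J/J characteristic matrix from such curves is not described in the abstract and is not attempted here.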
Advisor / supervisor
  • Ren, Jinchang
  • Marshall, Stephen
Resource Type