Computational investigation on protein sequencing and genome rearrangement problems

dc.contributor.advisor: Chairperson, Graduate Committee: Binhai Zhu [en]
dc.contributor.author: Qingge, Letu [en]
dc.date.accessioned: 2018-12-05T19:30:09Z
dc.date.available: 2018-12-05T19:30:09Z
dc.date.issued: 2018 [en]
dc.description.abstract: De novo protein sequencing and genome rearrangement are classical problems in bioinformatics. The de novo protein sequencing problem seeks to determine the complete amino acid sequence of a protein from mass spectrometry data, without relying on a database search. Genome rearrangement problems seek to identify the evolutionary process relating two species. In this dissertation, we first describe how to construct target protein sequences from both top-down and bottom-up tandem mass spectra. In addition to the mass spectrometry data, we use a homologous protein sequence as a reference to attempt to fill any remaining gaps in the constructed protein scaffold. Initial results on real datasets yield 96-100% coverage and 73-91% accuracy with respect to the target protein sequence. Second, we use different genome rearrangement operations to transform one genome into another such that the similarity between the two genomes is maximized. We explore these problems through both theoretical and experimental analysis. For the problem of sorting unsigned genomes by double cut and join (DCJ) operations, we design a randomized fixed-parameter tractable (FPT) approximation algorithm that computes the DCJ distance with approximation factor 4/3 + ε and running time O*(2^{d*}), where d* is the optimal DCJ distance. For the one-sided exemplar adjacency number problem, we reformulate the problem as maximum independent set in a colored interval graph, which reduces the number of occurrences of each gene to at most two. Moreover, we design a factor-2 approximation algorithm and show that a certain local search technique cannot improve the approximation factor below 2. Finally, we apply integer linear programming to solve the reduced instance exactly.
For the minimum copy number generation problem, we analyze the complexity of several variants of the problem and present a practical algorithm for the general case based on a greedy method. [en]
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/14693 [en]
dc.language.iso: en [en]
dc.publisher: Montana State University - Bozeman, College of Engineering [en]
dc.rights.holder: Copyright 2018 by Letu Qingge [en]
dc.subject.lcsh: Amino acid sequence [en]
dc.subject.lcsh: Genomics [en]
dc.subject.lcsh: Bioinformatics [en]
dc.subject.lcsh: Mass spectrometry [en]
dc.subject.lcsh: Algorithms [en]
dc.subject.lcsh: Linear programming [en]
dc.title: Computational investigation on protein sequencing and genome rearrangement problems [en]
dc.type: Dissertation [en]
mus.data.thumbpage: 136 [en]
thesis.degree.committeemembers: Members, Graduate Committee: Brendan Mumey; David Millman; Brittany Fasy. [en]
thesis.degree.department: Gianforte School of Computing. [en]
thesis.degree.genre: Dissertation [en]
thesis.degree.name: PhD [en]
thesis.format.extentfirstpage: 1 [en]
thesis.format.extentlastpage: 136 [en]

Files

Original bundle

Name: QinggeL0818.pdf
Size: 683.06 KB
Format: Adobe Portable Document Format
