Computational investigation on protein sequencing and genome rearrangement problems

Qingge, Letu

dc.contributor.advisor	Chairperson, Graduate Committee: Binhai Zhu	en
dc.contributor.author	Qingge, Letu	en
dc.date.accessioned	2018-12-05T19:30:09Z
dc.date.available	2018-12-05T19:30:09Z
dc.date.issued	2018	en
dc.description.abstract	De novo protein sequencing and genome rearrangement problems are classical problems in bioinformatics. The de novo protein sequencing problem is to determine the complete amino acid sequence of a protein from mass spectrometry data without resorting to a database search. Genome rearrangement problems aim to infer the evolutionary process that relates the genomes of two species. In this dissertation, we first describe how to construct target protein sequences from mass spectrometry data, using both top-down and bottom-up tandem mass spectra. In addition to mass spectrometry data, we apply de novo sequencing techniques that use a homologous protein sequence as a reference to fill in any gaps remaining in the constructed protein scaffold. Initial results on real datasets yield 96-100% coverage and 73-91% accuracy with respect to the target protein sequence. Second, we use different genome rearrangement operations to transform one genome into another so that the similarity between the two genomes is maximized. We study these problems both theoretically and experimentally. For the problem of sorting unsigned genomes by double cut and join (DCJ) operations, we design a randomized fixed-parameter tractable (FPT) approximation algorithm that computes the DCJ distance with approximation factor 4/3 + ε in O*(2^{d*}) time, where d* is the optimal DCJ distance. For the one-sided exemplar adjacency number problem, we reformulate the problem as a maximum independent set problem on a colored interval graph, which lets us reduce any instance to one in which each gene appears at most twice. Moreover, we design a factor-2 approximation algorithm and show that the approximation factor cannot be improved below 2 by a certain local search technique. Finally, we apply integer linear programming to solve the reduced instance exactly. 
For the minimum copy number generation problem, we analyze the complexity of several variants of the problem and present a practical greedy algorithm for the general case.	en
dc.identifier.uri	https://scholarworks.montana.edu/handle/1/14693	en
dc.language.iso	en	en
dc.publisher	Montana State University - Bozeman, College of Engineering	en
dc.rights.holder	Copyright 2018 by Letu Qingge	en
dc.subject.lcsh	Amino acid sequence	en
dc.subject.lcsh	Genomics	en
dc.subject.lcsh	Bioinformatics	en
dc.subject.lcsh	Mass spectrometry	en
dc.subject.lcsh	Algorithms	en
dc.subject.lcsh	Linear programming	en
dc.title	Computational investigation on protein sequencing and genome rearrangement problems	en
dc.type	Dissertation	en
mus.data.thumbpage	136	en
thesis.degree.committeemembers	Members, Graduate Committee: Brendan Mumey; David Millman; Brittany Fasy.	en
thesis.degree.department	Gianforte School of Computing.	en
thesis.degree.genre	Dissertation	en
thesis.degree.name	PhD	en
thesis.format.extentfirstpage	1	en
thesis.format.extentlastpage	136	en

Files

Original bundle

Name:: QinggeL0818.pdf
Size:: 683.06 KB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 826 B
Format:: Plain Text
Collections

Theses and Dissertations at Montana State University (MSU)