Duplications and deletions in genomes: theory and applications

Zou, Peng

dc.contributor.advisor	Chairperson, Graduate Committee: Binhai Zhu	en
dc.contributor.author	Zou, Peng	en
dc.date.accessioned	2022-11-09T22:41:02Z
dc.date.available	2022-11-09T22:41:02Z
dc.date.issued	2022	en
dc.description.abstract	In computational biology, duplications and deletions in genome rearrangements are important for understanding evolutionary processes. In cancer genomics research, intra-tumor genetic heterogeneity is one of the central problems. Gene duplications and deletions are observed to occur rapidly during tumour formation and are therefore recognized as critical mutations in cancer evolution. Understanding these mutations is important for understanding the origins of cancer cell diversity, which could help with cancer prognostics and with explaining drug resistance. In this dissertation, we first prove that the tandem duplication distance problem is NP-complete, even if |Σ| ≥ 4, settling a 16-year-old open problem. We also obtain positive results by showing that if one of the input sequences, S, is exemplar, then one can decide whether S can be transformed into T using at most k tandem duplications in time 2^{O(k^2)} + poly(n). Motivated by computing duplication patterns in sequences, we introduce a new fundamental problem called the longest letter-duplicated subsequence (LLDS) and investigate several of its variants. Because mutations in cancer occur rapidly, genome rearrangements on copy number profiles are used more often than the genomes themselves. We explore the Minimum Copy Number Generation problem and prove that it is NP-hard even to obtain a constant-factor approximation. We also show that the corresponding parameterized version is W[1]-hard. These results either improve a previous hardness result or solve an open problem. We then give a polynomial-time algorithm for the Copy Number Profile Conforming problem. Finally, we investigate the pattern matching with 1-reversal distance problem. With the known results on Longest Common Extension queries, one can design an O(n+m) time algorithm for this problem. However, we find empirically that this algorithm is very slow for small m. We then design an algorithm based on Karp-Rabin fingerprints which runs in expected O(nm) time. The algorithms are implemented and tested on a real bacterial sequence dataset. The empirical results show that the shorter the pattern length (i.e., when m < 200), the more substrings with 1-reversal distance the bacterial sequences have.	en
dc.identifier.uri	https://scholarworks.montana.edu/handle/1/16981	en
dc.language.iso	en	en
dc.publisher	Montana State University - Bozeman, College of Engineering	en
dc.rights.holder	Copyright 2022 by Peng Zou	en
dc.subject.lcsh	Genomics	en
dc.subject.lcsh	Computational biology	en
dc.subject.lcsh	Cancer	en
dc.subject.lcsh	Mutation (Biology)	en
dc.subject.lcsh	Algorithms	en
dc.title	Duplications and deletions in genomes: theory and applications	en
dc.type	Dissertation	en
mus.data.thumbpage	23	en
thesis.degree.committeemembers	Members, Graduate Committee: David Millman; Sean Yaw; Brittany Fasy	en
thesis.degree.department	Computing.	en
thesis.degree.genre	Dissertation	en
thesis.degree.name	PhD	en
thesis.format.extentfirstpage	1	en
thesis.format.extentlastpage	132	en

Files

Original bundle

Name: zou-duplications-2022.pdf
Size: 683.1 KB
Format: Adobe Portable Document Format
Description: Duplications and deletions in genomes (PDF)

Collections

Theses and Dissertations at Montana State University (MSU)