Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732
Item: String analysis and algorithms with genomic applications (Montana State University - Bozeman, College of Engineering, 2024)
Liyana Ralalage, Adiesha Lakshan Liyanage; Chairperson, Graduate Committee: Binhai Zhu
In biology, genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Understanding how genome rearrangements occur can help us understand the evolutionary history of extant species, improve genetic engineering, and understand the basis of genetic diseases. In this dissertation, we explored four problems related to genome partitioning and to tandem duplication and deletion rearrangement operations. Our interest was focused on determining how difficult these problems are to solve and on identifying efficient algorithms to solve them. The proposed problems were formulated as string problems and then analyzed using complexity theory. In the first chapter, we explored several variations of the F-strip recovery problem, called XSR-F and GSR-F, and their complexity under different parameters. We proved that the XSR-F problem is hard to solve unless we restrict the allowed block sizes to one size. We provided a polynomial time algorithm for GSR-F under a fixed alphabet and fixed F. In the second and third chapters, we introduced two string problems, longest letter-duplicated subsequence (LLDS) and longest subsequence-repeated subsequence (LSRS), formulated as alternatives to the tandem-duplication distance problem that allow information to be extracted about segments of genes that may have undergone tandem duplication. We analyzed the complexity of their variations and devised efficient algorithms to solve them.
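To make the LLDS problem more concrete, here is a minimal sketch of a quadratic-time dynamic program. The abstract does not define the problem, so the sketch rests on an assumption: a letter-duplicated subsequence is taken to be a subsequence made of consecutive blocks of identical letters, each block having length at least 2. The function name `llds_length` is hypothetical and the code is an illustration only, not the dissertation's algorithm.

```python
def llds_length(s: str) -> int:
    """Length of a longest letter-duplicated subsequence of s.

    ASSUMED definition (not stated in the abstract): the subsequence is a
    concatenation of blocks, each block being one letter repeated >= 2 times.
    """
    n = len(s)
    # best[i] = optimal length using only the prefix s[:i]
    best = [0] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]          # option: do not use s[i-1]
        letter = s[i - 1]
        count = 0                      # occurrences of `letter` in s[j:i]
        # Try every start j for the window holding the last block.
        for j in range(i - 1, -1, -1):
            if s[j] == letter:
                count += 1
            if count >= 2:             # a block needs at least two copies
                best[i] = max(best[i], best[j] + count)
    return best[n]
```

For example, under this assumed definition `llds_length("aabb")` uses the whole string, while `llds_length("abab")` can only keep one duplicated letter.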
We proved that constrained versions of the LLDS and LSRS problems are NP-hard for parameter d ≥ 4, while the general versions are polynomially solvable, which hints that any variations closer to the original tandem duplication distance problem are still hard to solve. In the final chapter, we delved into two heuristic algorithms designed to compute the genomic distance between two mitochondrial genomes and a heuristic algorithm to predict ancestral gene order under the TDRL (tandem-duplication random loss) model. We improved a previously studied method, developed for permutation strings, by adjusting its heuristic choices for calculating the minimum distance between two genomes so that it applies to non-permutation strings. These heuristic algorithms were implemented and tested on a real-world mitochondrial genome data set.
Item: Extensions to modeling and inference in continuous time Bayesian networks (Montana State University - Bozeman, College of Engineering, 2014)
Sturlaugson, Liessman Eric; Chairperson, Graduate Committee: John Sheppard
The continuous time Bayesian network (CTBN) enables reasoning about complex systems in continuous time by representing a system as a factored, finite-state, continuous-time Markov process. The dynamics of the CTBN are described by each node's conditional intensity matrices, determined by the states of the parents in the network. As the CTBN is a relatively new model, many extensions that have been defined with respect to Bayesian networks (BNs) have not yet been extended to CTBNs. This thesis presents five novel extensions to CTBN modeling and inference. First, we prove several complexity results specific to CTBNs. It is known that exact inference in CTBNs is NP-hard due to the use of a BN for the initial distribution. We prove that exact inference in CTBNs is still NP-hard, even when the initial states are given, and prove that approximate inference in CTBNs, as with BNs, is also NP-hard.
Second, we formalize performance functions for the CTBN and show how they can be factored in the same way as the network, even when the performance functions are defined with respect to interaction between multiple nodes. Performance functions extend the model, allowing it to represent complex, user-specified functions of the behaviors of the system. Third, we present a novel method for node marginalization called "node isolation" that approximates a set of conditional intensity matrices with a single unconditional intensity matrix. The method outperforms previous node marginalization techniques in all of our experiments by better describing the long-term behavior of the marginalized nodes. Fourth, using the node isolation method we developed, we show how methods for sensitivity analysis of Markov processes can be applied to the CTBN while exploiting the conditional independence structure of the network. This enables efficient sensitivity analysis to be performed on our CTBN performance functions. Fifth, we formalize both uncertain and negative types of evidence in the context of CTBNs and extend existing inference algorithms to be able to support all combinations of evidence types. We show that these extensions make the CTBN more powerful, versatile, and applicable to real-world domains.
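The idea of conditional intensity matrices described in this abstract can be illustrated with a small sketch. It assumes a hypothetical two-node CTBN (a binary parent and a binary child) with made-up rates, and forward-samples one trajectory. The names `Q_PARENT`, `Q_CHILD`, and `sample_trajectory` are illustrative assumptions. This is not the thesis's code, and it does not implement node isolation or any of the other extensions.

```python
import random

# Hypothetical rates. Each intensity matrix has rows summing to zero;
# the off-diagonal entry q[i][j] is the rate of moving from state i to j.
Q_PARENT = [[-1.0, 1.0],
            [2.0, -2.0]]
# The child has one intensity matrix per parent state (conditional on parent).
Q_CHILD = {
    0: [[-0.5, 0.5], [0.1, -0.1]],   # slow dynamics while parent = 0
    1: [[-3.0, 3.0], [4.0, -4.0]],   # fast dynamics while parent = 1
}

def sample_trajectory(t_end, state=(0, 0), rng=None):
    """Forward-sample (time, parent, child) tuples up to time t_end."""
    rng = rng or random.Random(0)
    t, (p, c) = 0.0, state
    trajectory = [(t, p, c)]
    while True:
        rate_p = -Q_PARENT[p][p]        # rate at which the parent leaves p
        rate_c = -Q_CHILD[p][c][c]      # child's rate depends on parent state
        total = rate_p + rate_c
        t += rng.expovariate(total)     # time until the next transition
        if t >= t_end:
            return trajectory
        # Pick which node transitions, in proportion to its rate.
        if rng.random() < rate_p / total:
            p = 1 - p                   # binary node: flip state
        else:
            c = 1 - c
        trajectory.append((t, p, c))
```

Calling `sample_trajectory(5.0)` returns a list of transition events. Because only one node changes at each event, the child's dynamics switch between its two matrices whenever the parent changes state.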