Theses and Dissertations at Montana State University (MSU)

Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732

Search Results

Now showing 1 - 4 of 4
  • Item
    Duplications and deletions in genomes: theory and applications
    (Montana State University - Bozeman, College of Engineering, 2022) Zou, Peng; Chairperson, Graduate Committee: Binhai Zhu
    In computational biology, duplications and deletions in genome rearrangements are important for understanding evolutionary processes. In cancer genomics research, intra-tumor genetic heterogeneity is one of the central problems. Gene duplications and deletions are observed to occur rapidly in cancer during tumor formation; hence, they are recognized as critical mutations in cancer evolution. Understanding these mutations is important for understanding the origins of cancer cell diversity, which could help with cancer prognostics as well as with explaining drug resistance. In this dissertation, we first prove that the tandem duplication distance problem is NP-complete, even if |Σ| ≥ 4, settling a 16-year-old open problem. We then obtain some positive results by showing that if one of the input sequences, S, is exemplar, then one can decide whether S can be transformed into T using at most k tandem duplications in time 2^O(k^2) + poly(n). Motivated by computing duplication patterns in sequences, we investigate a new fundamental problem called the longest letter-duplicated subsequence (LLDS), along with several of its variants. Because of fast mutations in cancer, genome rearrangements on copy number profiles are used more often than the genomes themselves. We explore the Minimum Copy Number Generation problem and prove that it is NP-hard to obtain even a constant-factor approximation. We also show that the corresponding parameterized version is W[1]-hard. These results either improve the previous hardness result or solve an open problem. We then give a polynomial-time algorithm for the Copy Number Profile Conforming problem. Finally, we investigate the pattern matching with 1-reversal distance problem. With the known results on Longest Common Extension queries, one can design an O(n+m) time algorithm for this problem. However, we find empirically that this algorithm is very slow for small m. We then design an algorithm based on Karp-Rabin fingerprints that runs in expected O(nm) time. The algorithms are implemented and tested on a real bacterial sequence dataset. The empirical results show that the shorter the pattern length (i.e., when m < 200), the more substrings with 1-reversal distance the bacterial sequences have.
  • Item
    Computational pan-genomics: algorithms and applications
    (Montana State University - Bozeman, College of Engineering, 2018) Cleary, Alan Michael; Chairperson, Graduate Committee: Brendan Mumey
    As the cost of sequencing DNA continues to drop, the number of sequenced genomes grows rapidly. In the recent past, the cost dropped so low that it is no longer prohibitively expensive to sequence multiple genomes for the same species. This has led to a shift from the paradigm of a single reference genome per species to the more comprehensive pan-genomics approach, where populations of genomes from one or more species are analyzed together. The total genomic content of a population is vast, requiring algorithms for analysis that are more sophisticated and scalable than existing methods. In this dissertation, we explore new algorithms and their applications to pan-genome analysis, at both the nucleotide and genic resolutions. Specifically, we present the Approximate Frequent Subpaths and Frequented Regions problems as a means of mining synteny blocks from pan-genomic de Bruijn graphs and provide efficient algorithms for mining these structures. We then explore a variety of analyses that mining synteny blocks from pan-genomic data enables, including meaningful visualization, genome classification, and multidimensional scaling. We also present a novel interactive data mining tool for pan-genome analysis, the Genome Context Viewer, which allows users to explore pan-genomic data distributed across a heterogeneous set of data providers by using gene family annotations as a unit of search and comparison. Using this approach, the tool is able to perform traditionally cumbersome analyses on demand in a federated manner.
  • Item
    A web-based interface for the NeuroSys database project
    (Montana State University - Bozeman, College of Engineering, 2004) Howard, Stuart W.; Chairperson, Graduate Committee: John Paxton
    This paper describes and documents the implementation of a web-based interface for an existing database application. In response to the demand for managing and storing large amounts of scientific data, programmers at the Center for Computational Biology at Montana State University have developed client-side software that allows a user to access, edit, and query a remote database. To use this application, the client computer must have a current version of Java installed, the application must be downloaded, and browser settings may need to be adjusted. While many of the end users are computer savvy, the setup process challenges other end users. The goal of this project was to develop a server-side, web-based interface for the application that allows users to browse, query, and edit a database by simply visiting a website and logging in. Principles of good design for user interfaces and interactive systems are reviewed, and the resulting system is compared with three other database applications using these design principles as criteria. Results of the comparisons are summarized, and recommendations for future work on the project are made.
  • Item
    A non-autonomous bursting model for neurons
    (Montana State University - Bozeman, College of Letters & Science, 2007) Latulippe, Joe Jean-Marc; Chairperson, Graduate Committee: Mark Pernarowski
    Certain mammalian visual neurons exhibit On and Off responses when given a light stimulus. In addition to these responses, [51] showed that retinal ganglion cells also exhibit a Mixed response when given two simultaneous stimuli in different regions of the cell's receptive field. This Mixed response is a nonlinear combination of the On and Off responses. In this dissertation, a single-cell model that can reproduce On, Off, and Mixed responses is developed and examined using leading-order analyses and averaging. This model is developed from a current balance equation that includes a non-autonomous input I(t), and consists of three coupled, first-order nonlinear differential equations that describe the dynamics of the membrane potential of the cell. When I(t) is assumed to be a constant current pulse, the On and Off responses can be reproduced but depend on both the duration and the amplitude of the input. When I(t) is assumed to be monotone and slowly decreasing, the model can reproduce the nonlinear properties for two simultaneous stimuli. Conditions that guarantee each type of response are found using the different subsystems of the model.