Theses and Dissertations at Montana State University (MSU)
Permanent URI for this community: https://scholarworks.montana.edu/handle/1/732
2 results
Search Results
Item: Flow decomposition algorithms for multiassembly problems (Montana State University - Bozeman, College of Engineering, 2022) Williams, Lucia Gean; Chairperson, Graduate Committee: Brendan Mumey
Current genetic sequencing technologies allow for fast and cheap measurement of short substrings of genetic sequence, called reads, which must be assembled to recover the full unknown sequence. In some cases, such as when assembling RNA transcripts or the genomes of a mixture of species taken in a single sample, the reads come from multiple sequences. In this case, we would like to recover all of the distinct unknown sequences and their relative abundances, a task which we call multiassembly. A common model underlying many multiassembly approaches is flow decomposition, which decomposes a flow network into a set of paths and weights that parsimoniously explains the flow. In this dissertation, we formalize two new variations on flow decomposition to better model the information available when performing multiassembly from reads. The first, inexact flow decomposition, allows for some uncertainty in the flow measurements. The second, flow decomposition with subpath constraints, incorporates additional information that may be provided by longer reads. We give algorithms to solve these problems and demonstrate their usefulness for RNA assembly on a simulated dataset. Additionally, we give the first polynomial-size integer linear programming (ILP) formulation for minimum flow decomposition and show that it can be adapted to encode both of the variants mentioned above.
An implementation of the ILP using the ILP solver CPLEX runs faster than existing exact minimum flow decomposition (MFD) solvers on RNA sequencing datasets.
Item: Directed graph descriptors and distances for analyzing multivariate time series data (Montana State University - Bozeman, College of Letters & Science, 2022) Belton, Robin Lynne; Chairperson, Graduate Committee: Tomas Gedeon
Local maxima and minima, or extremal events, in experimental time series can be used as a coarse summary to characterize data. However, the discrete sampling in recording experimental measurements suggests uncertainty in the true timing of extrema during the experiment. This in turn gives uncertainty in the timing order of extrema within the time series. Motivated by applications in genomic time series and biological network analysis, we use techniques from persistent homology to construct a weighted directed acyclic graph (DAG), called an extremal event DAG, that is robust to measurement noise. Furthermore, we define a distance between extremal event DAGs based on the edit distance between strings. We prove several properties, including local stability of the extremal event DAG distance with respect to pairwise L1 distances between functions in the time series data. Lastly, we provide algorithms, free and publicly available software, and implementations for extremal event DAG construction and comparison.
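For readers unfamiliar with the string edit distance that the abstract above builds on, the following is a minimal sketch of the classic (Levenshtein) edit distance between two strings. It is not the extremal event DAG distance defined in the dissertation, and the function name `edit_distance` is chosen here purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn `a` into `b`.

    Illustrative only; this is NOT the DAG distance from the dissertation.
    """
    # prev[j] holds the distance between the first i-1 characters of `a`
    # and the first j characters of `b`; row 0 is "insert j characters".
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Transforming the first i characters of `a` into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,                      # delete ca
                    curr[j - 1] + 1,                  # insert cb
                    prev[j - 1] + substitution_cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]
```

The sketch keeps only two rows of the dynamic programming table, so memory use grows with the length of the second string rather than with the product of both lengths.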