Flow decomposition algorithms for multiassembly problems

dc.contributor.advisor: Chairperson, Graduate Committee: Brendan Mumey
dc.contributor.author: Williams, Lucia Gean
dc.date.accessioned: 2022-11-09T22:47:43Z
dc.date.available: 2022-11-09T22:47:43Z
dc.date.issued: 2022
dc.description.abstract: Current genetic sequencing technologies allow for fast and cheap measurement of short substrings of a genetic sequence, called reads, which must be assembled to recover the full unknown sequence. In some cases, such as when assembling RNA transcripts or the genomes of a mixture of species taken in a single sample, the reads come from multiple sequences. In this case, we would like to recover all of the distinct unknown sequences and their relative abundances, a task which we call multiassembly. A common model underlying many multiassembly approaches is flow decomposition, which decomposes a flow network into a set of paths and weights that parsimoniously explains the flow. In this dissertation, we formalize two new variations on flow decomposition to better model the information available when performing multiassembly from reads. The first, inexact flow decomposition, allows for some uncertainty in the flow measurements. The second, flow decomposition with subpath constraints, incorporates additional information that may be provided by longer reads. We give algorithms to solve these problems and demonstrate their usefulness for RNA assembly on a simulated dataset. Additionally, we give the first polynomial-size integer linear programming (ILP) formulation for minimum flow decomposition (MFD) and show that it can be adapted to encode both of the variants mentioned above. An implementation of the ILP using the ILP solver CPLEX runs faster than existing exact MFD solvers on RNA sequencing datasets.
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/16974
dc.language.iso: en
dc.publisher: Montana State University - Bozeman, College of Engineering
dc.rights.holder: Copyright 2022 by Lucia Gean Williams
dc.subject.lcsh: Nucleotide sequence
dc.subject.lcsh: Directed graphs
dc.subject.lcsh: Algorithms
dc.title: Flow decomposition algorithms for multiassembly problems
dc.type: Dissertation
mus.data.thumbpage: 69
thesis.degree.committeemembers: Members, Graduate Committee: Binhai Zhu; Sean Yaw; David Millman
thesis.degree.department: Computing.
thesis.degree.genre: Dissertation
thesis.degree.name: PhD
thesis.format.extentfirstpage: 1
thesis.format.extentlastpage: 91

Files

Original bundle

Name: williams-flow-2022.pdf
Size: 1.48 MB
Format: Adobe Portable Document Format
Description: Flow decomposition algorithms for multiassembly problems (PDF)

License bundle

Name: license.txt
Size: 826 B
Format: Plain Text
Description: Copyright (c) 2002-2022, LYRASIS. All rights reserved.