Safety in multi-assembly via paths appearing in all path covers of a DAG

dc.contributor.author: Caceres, Manuel
dc.contributor.author: Mumey, Brendan
dc.contributor.author: Husic, Edin
dc.contributor.author: Rizzi, Romeo
dc.contributor.author: Cairo, Massimo
dc.contributor.author: Sahlin, Kristoffer
dc.contributor.author: Tomescu, Alexandru I. Ioan
dc.date.accessioned: 2022-09-01T15:57:01Z
dc.date.available: 2022-09-01T15:57:01Z
dc.date.issued: 2021-01
dc.description.abstract: A multi-assembly problem asks to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph, namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and are up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their greater length compared to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs. (en_US)
dc.identifier.citation: Caceres, Manuel, Brendan Mumey, Edin Husic, Romeo Rizzi, Massimo Cairo, Kristoffer Sahlin, and Alexandru I. Ioan Tomescu. "Safety in multi-assembly via paths appearing in all path covers of a DAG." IEEE/ACM Transactions on Computational Biology and Bioinformatics (2021). (en_US)
dc.identifier.issn: 1545-5963
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/17042
dc.language.iso: en_US (en_US)
dc.publisher: Institute of Electrical and Electronics Engineers (en_US)
dc.rights: cc-by (en_US)
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ (en_US)
dc.subject: safety dag (en_US)
dc.title: Safety in multi-assembly via paths appearing in all path covers of a DAG (en_US)
dc.type: Article (en_US)
mus.citation.extentfirstpage: 1 (en_US)
mus.citation.extentlastpage: 12 (en_US)
mus.citation.journaltitle: IEEE/ACM Transactions on Computational Biology and Bioinformatics (en_US)
mus.data.thumbpage: 6 (en_US)
mus.identifier.doi: 10.1109/TCBB.2021.3131203 (en_US)
mus.relation.college: College of Engineering (en_US)
mus.relation.department: Computer Science. (en_US)
mus.relation.university: Montana State University - Bozeman (en_US)
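
Illustration of the abstract's central notion

The abstract defines a path cover as a set of paths that together cover all vertices of a DAG, and a safe path as one that appears in all path cover solutions. The sketch below checks that definition by brute force on a tiny graph. It is not the efficient algorithm of the article: it enumerates every cover and so takes exponential time. It assumes that every path in a cover runs from a source (no incoming edge) to a sink (no outgoing edge) and that no further constraints apply. The function names and the example graph are hypothetical.

```python
from itertools import combinations


def source_sink_paths(adj):
    """Return (vertices, paths): all source-to-sink paths of a DAG.

    adj maps each vertex to the list of its out-neighbours.
    Assumption: cover paths start at a source and end at a sink.
    """
    has_in = {w for outs in adj.values() for w in outs}
    vertices = set(adj) | has_in
    sources = sorted(v for v in vertices if v not in has_in)
    paths = []

    def extend(path):
        outs = adj.get(path[-1], [])
        if not outs:  # reached a sink: the path is complete
            paths.append(tuple(path))
            return
        for w in outs:
            extend(path + [w])

    for s in sources:
        extend([s])
    return vertices, paths


def is_subpath(p, q):
    """True if tuple p occurs as a contiguous run inside tuple q."""
    return any(q[i:i + len(p)] == p for i in range(len(q) - len(p) + 1))


def path_covers(vertices, paths):
    """Yield every set of paths whose vertices together equal `vertices`."""
    for k in range(1, len(paths) + 1):
        for cover in combinations(paths, k):
            if set().union(*cover) == vertices:
                yield cover


def is_safe(candidate, adj):
    """True if `candidate` is a subpath of some path in every path cover."""
    vertices, paths = source_sink_paths(adj)
    candidate = tuple(candidate)
    return all(
        any(is_subpath(candidate, q) for q in cover)
        for cover in path_covers(vertices, paths)
    )


# Hypothetical example: two sources (a, b) merge at c; d splits to sinks e, f.
adj = {"a": ["c"], "b": ["c"], "c": ["d"], "d": ["e", "f"]}

print(is_safe(("c", "d"), adj))            # True: every path uses c -> d
print(is_safe(("a", "c", "d"), adj))       # True: a must be covered, and any path from a continues c, d
print(is_safe(("a", "c", "d", "e"), adj))  # False: the cover {a-c-d-f, b-c-d-e} avoids it
```

In this example the path a, c, d is safe even though vertex c has two incoming edges, which gives a small-scale picture of how a safe path can extend past a branching point.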

Files

Original bundle

Name: caceres-dag-2021.pdf
Size: 6.52 MB
Format: Adobe Portable Document Format
Description: safety paths dag

License bundle

Name: license.txt
Size: 826 B
Format: Item-specific license agreed upon to submission