Improving RNA Assembly via Safety and Completeness in Flow Decompositions

dc.contributor.author: Khan, Shahbaz
dc.contributor.author: Kortelainen, Milla
dc.contributor.author: Cáceres, Manuel
dc.contributor.author: Williams, Lucia
dc.contributor.author: Tomescu, Alexandru I.
dc.date.accessioned: 2023-01-27T18:42:10Z
dc.date.available: 2023-01-27T18:42:10Z
dc.date.issued: 2022-12
dc.description.abstract: Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports significantly higher coverage (≈50% more) than the other safe algorithms. On the other hand, the greedy-width algorithm, although reporting better coverage, also reports significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width (by ≈20%) on a unified metric (F-score) that considers both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it has superior time (4-5×) and space (1.2-2.2×) performance, resulting in a better and more practical approach for bioinformatic applications of flow decomposition. [en_US]
dc.identifier.citation: Khan, S., Kortelainen, M., Cáceres, M., Williams, L., & Tomescu, A. I. (2022). Improving RNA assembly via safety and completeness in flow decompositions. Journal of Computational Biology, 29(12), 1270-1287. [en_US]
dc.identifier.issn: 1557-8666
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/17657
dc.language.iso: en_US [en_US]
dc.publisher: Mary Ann Liebert Inc [en_US]
dc.rights: cc-by [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.subject: directed acyclic graphs [en_US]
dc.subject: flow decomposition [en_US]
dc.subject: flow networks [en_US]
dc.subject: RNA assembly [en_US]
dc.subject: safety [en_US]
dc.title: Improving RNA Assembly via Safety and Completeness in Flow Decompositions [en_US]
dc.type: Article [en_US]
mus.citation.extentfirstpage: 1 [en_US]
mus.citation.extentlastpage: 18 [en_US]
mus.citation.issue: 12 [en_US]
mus.citation.journaltitle: Journal of Computational Biology [en_US]
mus.citation.volume: 29 [en_US]
mus.identifier.doi: 10.1089/cmb.2022.0261 [en_US]
mus.relation.college: College of Engineering [en_US]
mus.relation.department: Computer Science. [en_US]
mus.relation.university: Montana State University - Bozeman [en_US]
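
The abstract defines a safe path as a subpath of at least one path in every flow decomposition. The Python sketch below is only an illustration and is not the authors' implementation. All names in it (excess_flow, is_safe_by_excess, widest_path, greedy_width, FLOW) and the toy flow are hypothetical. For a path P = (u_1, ..., u_k), it computes an "excess flow": the flow on the first edge of P minus the flow on every out-edge of an internal node u_i that does not continue along P. If this value is positive, P is safe. The reason is that weight entering P through its first edge can leave P at an internal node only through those diverging edges, so at least the excess must follow all of P in any decomposition. The sketch uses this only as a sufficient condition; see the paper for the exact statement of its local characterization. For comparison, it also implements the greedy-width heuristic as it is commonly described: repeatedly take a widest (maximum-bottleneck) source-to-sink path and subtract its width. Here that path is found by dynamic programming in topological order of the DAG.

```python
from __future__ import annotations

import math
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical representation: a flow on a DAG as {(u, v): positive flow value}.
Flow = dict[tuple[str, str], float]


def excess_flow(flow: Flow, path: list[str]) -> float:
    """Flow on the first edge of `path` minus the flow on every out-edge of an
    internal node of `path` that leaves the path."""
    edges = list(zip(path, path[1:]))
    if not edges or any(e not in flow for e in edges):
        raise ValueError("path needs at least one edge, all present in flow")
    out = defaultdict(list)
    for (u, v), f in flow.items():
        out[u].append((v, f))
    excess = flow[edges[0]]
    for u, nxt in edges[1:]:
        # u is an internal node of the path: subtract all flow diverging from it.
        excess -= sum(f for v, f in out[u] if v != nxt)
    return excess


def is_safe_by_excess(flow: Flow, path: list[str]) -> bool:
    # Sufficient condition only: positive excess certifies that `path` is a
    # subpath of some weighted path in every decomposition of `flow`.
    return excess_flow(flow, path) > 0


def widest_path(residual: Flow, source: str, sink: str) -> tuple[list[str], float] | None:
    """Maximum-bottleneck source-to-sink path via DP in topological order."""
    out = defaultdict(list)
    preds: dict[str, set[str]] = {}
    for (u, v), f in residual.items():
        out[u].append((v, f))
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    best = {source: math.inf}  # best bottleneck width reaching each node
    parent: dict[str, str] = {}
    for u in TopologicalSorter(preds).static_order():
        if u not in best:
            continue  # not reachable from source
        for v, f in out[u]:
            width = min(best[u], f)
            if width > best.get(v, 0):
                best[v], parent[v] = width, u
    if sink == source or sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1], best[sink]


def greedy_width(flow: Flow, source: str, sink: str, eps: float = 1e-9) -> list[tuple[list[str], float]]:
    """Greedy-width heuristic: repeatedly subtract a widest path's width."""
    residual = {e: f for e, f in flow.items() if f > eps}
    decomposition = []
    while residual:
        found = widest_path(residual, source, sink)
        if found is None:
            break  # leftover values do not form a valid source-to-sink flow
        path, width = found
        decomposition.append((path, width))
        for e in zip(path, path[1:]):
            residual[e] -= width
            if residual[e] <= eps:
                del residual[e]  # the bottleneck edge always drops out
    return decomposition


# Hypothetical toy flow satisfying conservation at every internal node.
FLOW: Flow = {
    ("s", "a"): 6, ("s", "b"): 2, ("a", "c"): 6, ("b", "c"): 2,
    ("c", "d"): 5, ("c", "e"): 3, ("d", "t"): 5, ("e", "t"): 3,
}

if __name__ == "__main__":
    print(excess_flow(FLOW, ["s", "a", "c", "d"]))        # 3
    print(is_safe_by_excess(FLOW, ["s", "b", "c", "d"]))  # False
    print(greedy_width(FLOW, "s", "t"))                   # widths 5, 2, 1
```

On the toy flow, s-a-c-d has excess 6 - 3 = 3, so it is certified safe. For s-b-c-d the excess is 2 - 3 = -1, and the check does not certify it. In fact the greedy-width decomposition (s-a-c-d-t with weight 5, s-b-c-e-t with weight 2, s-a-c-e-t with weight 1) contains no path that includes s-b-c-d.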

Files

Original bundle

Name: khan-rna-2022.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Description: rna assembly

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: Copyright (c) 2002-2022, LYRASIS. All rights reserved.