Browsing by Author "Cáceres, Manuel"

Improving RNA Assembly via Safety and Completeness in Flow Decompositions
(Mary Ann Liebert Inc, 2022-12) Khan, Shahbaz; Kortelainen, Milla; Cáceres, Manuel; Williams, Lucia; Tomescu, Alexandru I.
Decomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking and transportation planning to bioinformatics. In some applications, we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or the length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criterion guarantees the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021), and the popular heuristic greedy-width. On the one hand, we found that, besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage (≈50% more) compared with the other safe algorithms. On the other hand, the greedy-width algorithm, although it reports a better coverage, also reports a significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms greedy-width (by ≈20%) on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has superior time (4-5x) and space (1.2-2.2x) performance, resulting in a better and more practical approach for bioinformatic applications of flow decomposition.
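For readers unfamiliar with the greedy-width baseline named in the abstract, the following is a minimal Python sketch of the heuristic as it is commonly described: repeatedly extract a source-to-sink path of maximum bottleneck flow and subtract that bottleneck. It is not the authors' implementation and it is not the safe-path algorithm; the function names (`widest_path`, `greedy_width`), the edge-dictionary input format, and the example flow are assumptions made for illustration.

```python
from collections import defaultdict


def widest_path(residual, source, sink):
    """Return (path, width) for a source-to-sink path of maximum bottleneck.

    residual: dict mapping edge (u, v) -> remaining flow on a DAG.
    Assumes at least one positive-flow path from source to sink exists.
    """
    adj = defaultdict(list)
    for (u, v), f in residual.items():
        if f > 0:
            adj[u].append(v)

    memo = {}

    def best(u):
        # Best bottleneck from u to the sink, and the next node to take.
        if u == sink:
            return (float("inf"), None)
        if u not in memo:
            cand = (0, None)
            for v in adj[u]:
                width = min(residual[(u, v)], best(v)[0])
                if width > cand[0]:
                    cand = (width, v)
            memo[u] = cand
        return memo[u]

    width, _ = best(source)
    path = [source]
    while path[-1] != sink:
        path.append(best(path[-1])[1])
    return path, width


def greedy_width(flow, source, sink):
    """Greedy-width sketch: peel off widest paths until no flow remains.

    flow: dict mapping edge (u, v) -> positive integer flow on a DAG,
    assumed to satisfy flow conservation at every inner node.
    """
    residual = dict(flow)
    paths = []
    while any(f > 0 for f in residual.values()):
        path, width = widest_path(residual, source, sink)
        for edge in zip(path, path[1:]):
            residual[edge] -= width
        paths.append((path, width))
    return paths


# Hypothetical example flow (not taken from the paper).
example_flow = {
    ("s", "a"): 5, ("s", "b"): 3,
    ("a", "t"): 2, ("a", "b"): 3,
    ("b", "t"): 6,
}
example_paths = greedy_width(example_flow, "s", "t")
```

The returned paths always sum back to the input flow, but nothing guarantees that they match the paths that generated it, which is the gap the safe-path approach in the abstract addresses.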
Width Helps and Hinders Splitting Flows
(Association for Computing Machinery, 2024-01) Cáceres, Manuel; Cairo, Massimo; Grigorjew, Andreas; Khan, Shahbaz; Mumey, Brendan; Rizzi, Romeo; Tomescu, Alexandru I.; Williams, Lucia
Minimum flow decomposition (MFD) is the NP-hard problem of finding a smallest decomposition of a network flow/circulation X on a directed graph G into weighted source-to-sink paths whose weighted sum equals X. We show that, for acyclic graphs, considering the width of the graph (the minimum number of paths needed to cover all of its edges) yields advances in our understanding of its approximability. For the version of the problem that uses only non-negative weights, we identify and characterise a new class of width-stable graphs, for which a popular heuristic is an O(log Val(X))-approximation (Val(X) being the total flow of X), and strengthen its worst-case approximation ratio from Ω(√m) to Ω(m/log m) for sparse graphs, where m is the number of edges in the graph. We also study a new problem on graphs with cycles, Minimum Cost Circulation Decomposition (MCCD), and show that it generalises MFD through a simple reduction. For the version allowing also negative weights, we give a (⌈log ‖X‖⌉ + 1)-approximation (‖X‖ being the maximum absolute value of X on any edge) using a power-of-two approach, combined with parity fixing arguments and a decomposition of unitary circulations (‖X‖ ≤ 1), using a generalised notion of width for this problem. Finally, we disprove a conjecture about the linear independence of minimum (non-negative) flow decompositions posed by Kloster et al. [2018], but show that its useful implication (polynomial-time assignments of weights to a given set of paths to decompose a flow) holds for the negative version.
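To make the decomposition condition in the abstract concrete (weighted paths whose weighted sum equals X), the short Python sketch below checks whether a given list of weighted paths decomposes a flow. It only verifies a candidate decomposition and does not implement any algorithm from the paper; the function name `is_flow_decomposition`, the input format, and the example values are assumptions made for illustration. Weights may be negative, as in the second version of the problem.

```python
from collections import defaultdict


def is_flow_decomposition(flow, weighted_paths):
    """Check that the weighted sum of the paths equals the flow on every edge.

    flow: dict mapping edge (u, v) -> flow value X(u, v).
    weighted_paths: list of (path, weight) pairs, where path is a list of
    nodes and weight is an integer (negative weights are allowed).
    """
    total = defaultdict(int)
    for path, weight in weighted_paths:
        for edge in zip(path, path[1:]):
            total[edge] += weight
    # Compare on the union of edges so that stray edges are also caught.
    edges = set(flow) | set(total)
    return all(flow.get(e, 0) == total.get(e, 0) for e in edges)


# Hypothetical flow in which every edge carries one unit.
unit_flow = {
    ("s", "a"): 1, ("s", "b"): 1,
    ("a", "c"): 1, ("b", "c"): 1,
    ("c", "d"): 1, ("c", "e"): 1,
    ("d", "t"): 1, ("e", "t"): 1,
}

# Two paths with non-negative weights.
two_paths = [
    (["s", "a", "c", "d", "t"], 1),
    (["s", "b", "c", "e", "t"], 1),
]

# Four paths, two of them with negative weights, summing to the same flow.
four_paths = [
    (["s", "a", "c", "d", "t"], 2),
    (["s", "b", "c", "e", "t"], 2),
    (["s", "a", "c", "e", "t"], -1),
    (["s", "b", "c", "d", "t"], -1),
]
```

Both candidate lists satisfy the check on `unit_flow`. MFD asks for the smallest such list, so the two-path decomposition is the one of interest here.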