Browsing by Author "Lafond, Manuel"
Computing the Tandem Duplication Distance is NP-Hard (Society for Industrial & Applied Mathematics, 2022-03)
Lafond, Manuel; Zhu, Binhai; Zou, Peng

In computational biology, tandem duplication is an important biological phenomenon that can occur either at the genome level or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it immediately after the segment; as a string operation, this can be written as AXB⇒AXXB. Tandem exon duplications have been found in many species, such as human, fly, and worm, and have been studied extensively in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute a shortest sequence of TDs that converts S into T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, even though TDs have received much attention since then. In this paper, we focus on the special case in which all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ|=4, hence proving the NP-hardness of the TD problem for any |Σ|≥4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable.
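To make the operation AXB⇒AXXB and the distance definition concrete, here is a minimal Python sketch that computes the TD distance by breadth-first search over all strings reachable from S. It is an illustrative brute force, not the reduction or the FPT algorithm from the paper, and it runs in exponential time, so it only suits very small inputs. The function names `tandem_duplications` and `td_distance` are hypothetical. The search relies on the fact that every TD lengthens the string, so intermediate strings longer than T can be discarded.

```python
from collections import deque
from typing import Iterator, Optional


def tandem_duplications(s: str, max_len: int) -> Iterator[str]:
    """Yield every string obtained from s by one tandem duplication
    A X B -> A X X B (X non-empty) whose length is at most max_len."""
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if n + (j - i) > max_len:
                break  # longer segments X would only give longer results
            yield s[:j] + s[i:j] + s[j:]


def td_distance(s: str, t: str) -> Optional[int]:
    """Minimum number of tandem duplications turning s into t, found by
    breadth-first search; returns None if t is unreachable from s."""
    if len(s) > len(t):
        return None  # a TD never shortens a string
    dist = {s: 0}
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == t:
            return dist[cur]
        for nxt in tandem_duplications(cur, len(t)):
            if nxt not in dist:  # first visit in BFS is the shortest
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None
```

For example, `td_distance("abc", "abcbc")` duplicates the segment `bc` once and returns 1, while `td_distance("ab", "ba")` returns `None` because no sequence of TDs reaches `ba`. Since `abc` has pairwise distinct characters, the first call is an exemplar instance in the sense of the abstract.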
Our results open the door to many other questions, and we conclude with several open problems.

Permutation-constrained Common String Partitions with Applications (Springer Science and Business Media LLC, 2024-09)
Lafond, Manuel; Zhu, Binhai

We study a new combinatorial problem based on the famous Minimum Common String Partition (MCSP) problem, which we call Permutation-constrained Common String Partition (PCSP for short). In PCSP, we are given two sequences (genomes) s and t of the same length and a permutation π on [ℓ], and the question is to decide whether s and t can be decomposed into ℓ blocks that can be matched according to some specified requirements and that conform with the permutation π. Our main result is that PCSP is FPT in the parameter ℓ + d, where d is the maximum number of occurrences that any symbol may have in s or t. We also study a variant where the input specifies whether each matched pair of blocks needs to be preserved as is or reversed. Using this result on PCSP, we show that a series of genome rearrangement problems are FPT in the parameter k + d, where k is the rearrangement distance between the two genomes of interest.
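The following Python sketch decides a PCSP instance by exhaustive search, assuming the simplest matching requirement, borrowed from MCSP: block i of s must equal block π(i) of t, with no reversals. The abstract leaves the exact requirements to the paper, so this is an assumption, and the function `pcsp_bruteforce` is hypothetical; π is given 0-based as a Python list. The sketch enumerates every split of s into ℓ non-empty blocks and tests whether rearranging those blocks according to π spells out t. That costs one check per split, i.e. C(n−1, ℓ−1) checks, and is not the FPT algorithm in ℓ + d described above.

```python
from itertools import combinations
from typing import List, Optional, Sequence


def pcsp_bruteforce(s: str, t: str, pi: Sequence[int]) -> Optional[List[str]]:
    """Return a split of s into len(pi) non-empty blocks such that block i
    of s equals block pi[i] of t (0-based), or None if no split works.
    Assumes matched blocks must be identical (no reversals)."""
    num_blocks = len(pi)
    n = len(s)
    if len(t) != n or sorted(pi) != list(range(num_blocks)):
        return None
    if not 1 <= num_blocks <= n:
        return None  # every block must be non-empty
    # inv[j] is the index of the s-block that must sit at position j of t
    inv = [0] * num_blocks
    for i, j in enumerate(pi):
        inv[j] = i
    # choose num_blocks - 1 cut points strictly inside s
    for cuts in combinations(range(1, n), num_blocks - 1):
        bounds = (0, *cuts, n)
        blocks = [s[bounds[k]:bounds[k + 1]] for k in range(num_blocks)]
        if "".join(blocks[inv[j]] for j in range(num_blocks)) == t:
            return blocks
    return None
```

For instance, `pcsp_bruteforce("abcd", "cdab", [1, 0])` returns `['ab', 'cd']`: block 0 of s (`ab`) becomes block 1 of t and vice versa. With the identity permutation `[0, 1]` the same strings yield `None`, since keeping two blocks of `abcd` in order can only spell `abcd` itself.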