Computing the Tandem Duplication Distance is NP-Hard

dc.contributor.author: Lafond, Manuel
dc.contributor.author: Zhu, Binhai
dc.contributor.author: Zou, Peng
dc.date.accessioned: 2023-01-24T21:42:17Z
dc.date.available: 2023-01-24T21:42:17Z
dc.date.issued: 2022-03
dc.description: © SIAM [en_US]
dc.description.abstract: In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment---this can be represented as the string operation AXB⇒AXXB. Tandem exon duplications have been found in many species such as human, fly, and worm and have been largely studied in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute the smallest sequence of TDs required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that TDs have received much attention ever since. In this paper, we focus on the special case when all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ|=4, hence proving the NP-hardness of the TD problem for any |Σ|≥4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems. [en_US]
dc.identifier.citation: Lafond, M., Zhu, B., & Zou, P. (2022). Computing the Tandem Duplication Distance is NP-Hard. SIAM Journal on Discrete Mathematics, 36(1), 64-91. [en_US]
dc.identifier.issn: 0895-4801
dc.identifier.uri: https://scholarworks.montana.edu/handle/1/17625
dc.language.iso: en_US [en_US]
dc.publisher: Society for Industrial & Applied Mathematics [en_US]
dc.rights: copyright Society for Industrial & Applied Mathematics 2022 [en_US]
dc.rights.uri: Society for Industrial & Applied Mathematics [en_US]
dc.subject: tandem duplication [en_US]
dc.subject: text processing [en_US]
dc.subject: formal languages [en_US]
dc.subject: computational genomics [en_US]
dc.subject: FPT algorithms [en_US]
dc.title: Computing the Tandem Duplication Distance is NP-Hard [en_US]
dc.type: Article [en_US]
mus.citation.extentfirstpage: 1 [en_US]
mus.citation.extentlastpage: 28 [en_US]
mus.citation.issue: 1 [en_US]
mus.citation.journaltitle: SIAM Journal on Discrete Mathematics [en_US]
mus.citation.volume: 36 [en_US]
mus.data.thumbpage: 17 [en_US]
mus.identifier.doi: 10.1137/20M1356257 [en_US]
mus.relation.college: College of Engineering [en_US]
mus.relation.department: Computer Science. [en_US]
mus.relation.university: Montana State University - Bozeman [en_US]
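
Illustration (not part of the record): the abstract defines a tandem duplication as the string operation AXB⇒AXXB and the TD distance as the fewest such operations needed to turn S into T. The following is a minimal Python sketch of those two definitions only. It is a brute-force breadth-first search, not the paper's reduction or its fixed-parameter algorithm, and the names `tandem_duplicate` and `td_distance` are hypothetical.

```python
from collections import deque


def tandem_duplicate(s, i, j):
    """Duplicate the segment s[i:j] in place: A X B -> A X X B."""
    if not 0 <= i < j <= len(s):
        raise ValueError("need 0 <= i < j <= len(s)")
    return s[:j] + s[i:j] + s[j:]


def td_distance(source, target):
    """Fewest tandem duplications turning source into target, or None.

    Exhaustive search over every reachable string. Each duplication
    makes the string strictly longer, so strings longer than the
    target are pruned and the search always terminates.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, steps = queue.popleft()
        for i in range(len(current)):
            for j in range(i + 1, len(current) + 1):
                # Longer segments only overshoot further, so stop here.
                if len(current) + (j - i) > len(target):
                    break
                candidate = tandem_duplicate(current, i, j)
                if candidate == target:
                    return steps + 1
                if candidate not in seen:
                    seen.add(candidate)
                    queue.append((candidate, steps + 1))
    return None
```

For example, `td_distance("abc", "ababcbc")` takes two steps: duplicate "ab" to get "ababc", then duplicate "bc". The search may visit exponentially many strings, so it is only practical for very short inputs.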

Files

Original bundle

Name: lafond-computing-2022.pdf
Size: 531.36 KB
Format: Adobe Portable Document Format
Description: computing tandem duplication