Scholarly Work - Computer Science
Permanent URI for this collection: https://scholarworks.montana.edu/handle/1/3034
3 results
Search Results
Item
The longest letter-duplicated subsequence and related problems (Springer Science and Business Media LLC, 2024-07) Lai, Wenfeng; Liyanage, Adiesha; Zhu, Binhai; Zou, Peng
Motivated by computing duplication patterns in sequences, a new problem called the longest letter-duplicated subsequence (LLDS) is proposed. Given a sequence S of length n, a letter-duplicated subsequence is a subsequence of S in the form $x_1^{d_1} x_2^{d_2} \cdots x_k^{d_k}$ with $x_i \in \Sigma$, $x_j \neq x_{j+1}$ and $d_i \geq 2$ for all i in [k] and j in [k − 1]. A linear time algorithm for computing a longest letter-duplicated subsequence (LLDS) of S can be easily obtained. In this paper, we focus on two variants of this problem: (1) the 'all-appearance' version, i.e., all letters in Σ must appear in the solution, and (2) the weighted version. For the former, we obtain dichotomous results: We prove that, when each letter appears in S at least 4 times, the problem and a relaxed version on feasibility testing (FT) are both NP-hard. The reduction is from $(3^+, 1, 2^-)$-SAT, where all 3-clauses (i.e., containing 3 literals) are monotone (i.e., containing only positive literals) and all 2-clauses contain only negative literals. We then show that when each letter appears in S at most 3 times, the problem admits an O(n) time algorithm. Finally, we consider the weighted version, where the weight of a block $x_i^{d_i}$ ($d_i \geq 2$) could be any positive function which might not grow with $d_i$. We give a non-trivial O(n^2) time dynamic programming algorithm for this version, i.e., computing an LD-subsequence of S whose weight is maximized.

Item
Computing the Tandem Duplication Distance is NP-Hard (Society for Industrial & Applied Mathematics, 2022-03) Lafond, Manuel; Zhu, Binhai; Zou, Peng
In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level.
A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB ⇒ AXXB. Tandem exon duplications have been found in many species such as human, fly, and worm and have been largely studied in computational biology. The tandem duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet Σ, compute the smallest sequence of TDs required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that TDs have received much attention ever since. In this paper, we focus on the special case when all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. We first prove that this problem is NP-hard when the alphabet size is unbounded, settling the 16-year-old open problem. We then show how to adapt the proof to |Σ| = 4, hence proving the NP-hardness of the TD problem for any |Σ| ≥ 4. One of the tools we develop for the reduction is a new problem called Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.

Item
Dispersing and grouping points on planar segments (Elsevier BV, 2022-09) He, Xiaozhou; Lai, Wenfeng; Zhu, Binhai; Zou, Peng
Motivated by (continuous) facility location, we study the problem of dispersing and grouping points on a set of segments (of streets) in the plane.
In the former problem, given a set of n disjoint line segments in the plane, we investigate the problem of computing a point on each of the n segments such that the minimum Euclidean distance between any two of these points is maximized. We prove that this 2D dispersion problem is NP-hard; in fact, it is NP-hard even if all the segments are parallel and are of unit length. This is in contrast to the polynomial solvability of the corresponding 1D problem by Li and Wang (2016), where the intervals are in 1D and are all disjoint. With this result, we also show that the Independent Set problem on Colored Linear Unit Disk Graph (meaning the convex hulls of points with the same color form disjoint line segments) remains NP-hard, and the parameterized version of it is in W[2]. In the latter problem, given a set of n disjoint line segments in the plane, we study the problem of computing a point on each of the n segments such that the maximum Euclidean distance between any two of these points is minimized. We present a factor-1.1547 approximation algorithm which runs in time. Our results can be generalized to the Manhattan distance.
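
The two objectives in the last abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: it is not the algorithm from the paper, and the names `point_on`, `min_pairwise`, `max_pairwise`, and `brute_force` are hypothetical. It places one point on each segment by searching a coarse grid of positions, which takes time exponential in the number of segments and so only suits tiny instances.

```python
from itertools import combinations, product
from math import dist


def point_on(segment, t):
    """Return the point at parameter t in [0, 1] along a segment ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = segment
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def min_pairwise(points):
    """Dispersion objective: the smallest Euclidean distance between any two points."""
    return min(dist(p, q) for p, q in combinations(points, 2))


def max_pairwise(points):
    """Grouping objective: the largest Euclidean distance between any two points."""
    return max(dist(p, q) for p, q in combinations(points, 2))


def brute_force(segments, objective, pick, steps=10):
    """Try every combination of grid positions, one per segment.

    `pick` is the built-in max (dispersion) or min (grouping).
    Illustrative only: the search space has (steps + 1) ** len(segments) entries.
    """
    grid = [i / steps for i in range(steps + 1)]
    candidates = (
        [point_on(s, t) for s, t in zip(segments, ts)]
        for ts in product(grid, repeat=len(segments))
    )
    return pick(candidates, key=objective)


# Two parallel unit-length segments, one unit apart (hypothetical input).
segments = [((0, 0), (1, 0)), ((0, 1), (1, 1))]
dispersed = brute_force(segments, min_pairwise, max)  # maximize the minimum distance
grouped = brute_force(segments, max_pairwise, min)    # minimize the maximum distance
```

On this input the dispersed points land on opposite corners and the grouped points sit directly above one another. The grid search only approximates the continuous problems in general; it happens to be exact here because the optimal positions are segment endpoints.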