-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Motivation
STRING-TO-STRING CORRECTION (P168) from Garey & Johnson, A4 SR20. A classical NP-complete problem concerning the minimum-cost transformation of one string into another using only deletion and adjacent-symbol interchange operations. While the standard edit distance (with insert, delete, change) is solvable in polynomial time via dynamic programming (Wagner-Fischer algorithm), restricting the operation set to only deletions and adjacent swaps makes the problem NP-complete for unbounded alphabets.
Associated reduction rules:
- R114: SET COVERING -> STRING-TO-STRING CORRECTION (this is the GJ reference reduction)
Definition
Name: StringToStringCorrection
Canonical name: String-to-String Correction (also: Extended Edit Distance with Swaps and Deletions)
Reference: Garey & Johnson, Computers and Intractability, A4 SR20
Mathematical definition:
INSTANCE: Finite alphabet Sigma, two strings x,y in Sigma*, and a positive integer K.
QUESTION: Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single symbol deletion or adjacent symbol interchange?
The problem is a decision (satisfaction) problem: the answer is YES or NO depending on whether the restricted edit distance (using only swap and delete) from x to y is at most K.
Variables
- Count:
bound_k(the maximum number of operations K). Encoded as K operation slots. - Per-variable domain: Each operation slot is an integer in
{0, ..., 2*source_length}. Values0..source_lengthrepresent "delete at position i", valuessource_length..2*source_lengthrepresent "swap adjacent positions i and i+1", and the value2*source_lengthrepresents "no-op". dims():vec![2 * source.len() + 1; bound_k]evaluate(): Apply the sequence of operations left-to-right to a mutable copy ofsource. Skip no-ops. If any delete/swap index is out of bounds for the current intermediate string, returnfalse. After all operations, returntrueiff the result equalstarget.
Schema (data type)
Type name: StringToStringCorrection
Variants: none (no graph or weight type parameter; operates on strings over a finite alphabet)
| Field | Type | Description |
|---|---|---|
alphabet_size |
usize |
Size of the finite alphabet Sigma (symbols indexed 0..alphabet_size) |
source |
Vec<usize> |
The source string x, encoded as a sequence of symbol indices |
target |
Vec<usize> |
The target string y, encoded as a sequence of symbol indices |
bound_k |
usize |
The positive integer K: maximum number of operations allowed |
Size fields (getter methods for overhead expressions and declare_variants!):
alphabet_size()— returnsalphabet_sizesource_length()— returnssource.len()target_length()— returnstarget.len()bound_k()— returnsbound_k
Complexity
- Decision complexity: NP-complete for unbounded alphabet size (Wagner, 1975; transformation from SET COVERING).
- Best known exact algorithm: Brute-force enumeration over the configuration space: all sequences of at most K operations, each being a delete, adjacent swap, or no-op, gives O((2*source_length+1)^bound_k) in the worst case.
declare_variants!complexity string:"(2 * source_length + 1)^bound_k"- Special polynomial cases:
- If insert and change operations are also allowed (even without swap), solvable in O(|x| * |y|) time (Wagner-Fischer, 1974).
- If only adjacent swap is allowed (no delete), solvable in polynomial time (Wagner, 1975) -- equivalent to counting inversions.
- Binary alphabet: some restricted cases are polynomial (Meister, 2015).
- References:
- [Wagner, 1975] R. A. Wagner, "On the complexity of the extended string-to-string correction problem", Proc. 7th STOC, pp. 218-223.
- [Wagner and Fischer, 1974] R. A. Wagner and M. J. Fischer, "The string-to-string correction problem", JACM 21, pp. 168-173.
Extra Remark
Full book text:
INSTANCE: Finite alphabet Sigma, two strings x,y in Sigma*, and a positive integer K.
QUESTION: Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single symbol deletion or adjacent symbol interchange?
Reference: [Wagner, 1975]. Transformation from SET COVERING.
Comment: Solvable in polynomial time if the operation set is expanded to include the operations of changing a single character and of inserting a single character, even if interchanges are not allowed (e.g., see [Wagner and Fischer, 1974]), or if the only operation is adjacent symbol interchange [Wagner, 1975]. See reference for related results for cases in which different operations can have different costs.
How to solve
- It can be solved by (existing) bruteforce. Enumerate all sequences of at most K operations (each being a delete or adjacent swap at some position), apply each sequence to x, and check whether the result equals y.
- It can be solved by reducing to integer programming.
- Other: For small alphabet or special cases, dynamic programming approaches exist (CELLAR algorithm).
Example Instance
alphabet_size = 4 (symbols: {0, 1, 2, 3})
source = [0, 1, 2, 3, 1, 0] (length 6)
target = [0, 1, 3, 2, 1] (length 5)
bound_k = 2
Step-by-step solution (one possible sequence of 2 operations):
- Start: [0, 1, 2, 3, 1, 0]
- Swap positions 2 and 3 (values 2 and 3): [0, 1, 3, 2, 1, 0]
- Delete position 5 (value 0): [0, 1, 3, 2, 1]
- Result: [0, 1, 3, 2, 1] = target
Total operations: 2 (swap + delete) = bound_k. Answer: YES.
Verification that fewer operations are insufficient:
- With 0 operations: [0, 1, 2, 3, 1, 0] != [0, 1, 3, 2, 1] (different length and content).
- With 1 operation: a single delete removes one character from a 6-element string to get a 5-element string, but no single deletion of [0, 1, 2, 3, 1, 0] yields [0, 1, 3, 2, 1]. A single swap keeps length 6, which cannot equal the 5-element target. So 1 operation is insufficient.
- The minimum cost is 2, which equals bound_k.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status