Skip to content

[Model] StringToStringCorrection #439

@isPANN

Description

@isPANN

Motivation

STRING-TO-STRING CORRECTION (P168) from Garey & Johnson, A4 SR20. A classical NP-complete problem concerning the minimum-cost transformation of one string into another using only deletion and adjacent-symbol interchange operations. While the standard edit distance (with insert, delete, change) is solvable in polynomial time via dynamic programming (Wagner-Fischer algorithm), restricting the operation set to only deletions and adjacent swaps makes the problem NP-complete for unbounded alphabets.

Associated reduction rules:

  • R114: SET COVERING -> STRING-TO-STRING CORRECTION (this is the GJ reference reduction)

Definition

Name: StringToStringCorrection
Canonical name: String-to-String Correction (also: Extended Edit Distance with Swaps and Deletions)
Reference: Garey & Johnson, Computers and Intractability, A4 SR20

Mathematical definition:

INSTANCE: Finite alphabet Sigma, two strings x,y in Sigma*, and a positive integer K.
QUESTION: Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single symbol deletion or adjacent symbol interchange?

The problem is a decision (satisfaction) problem: the answer is YES or NO depending on whether the restricted edit distance (using only swap and delete) from x to y is at most K.

Variables

  • Count: bound_k (the maximum number of operations K). Encoded as K operation slots.
  • Per-variable domain: Each operation slot is an integer in {0, ..., 2*source_length}. Values 0..source_length represent "delete at position i", values source_length..2*source_length represent "swap adjacent positions i and i+1", and the value 2*source_length represents "no-op".
  • dims(): vec![2 * source.len() + 1; bound_k]
  • evaluate(): Apply the sequence of operations left-to-right to a mutable copy of source. Skip no-ops. If any delete/swap index is out of bounds for the current intermediate string, return false. After all operations, return true iff the result equals target.

Schema (data type)

Type name: StringToStringCorrection
Variants: none (no graph or weight type parameter; operates on strings over a finite alphabet)

Field Type Description
alphabet_size usize Size of the finite alphabet Sigma (symbols indexed 0..alphabet_size)
source Vec<usize> The source string x, encoded as a sequence of symbol indices
target Vec<usize> The target string y, encoded as a sequence of symbol indices
bound_k usize The positive integer K: maximum number of operations allowed

Size fields (getter methods for overhead expressions and declare_variants!):

  • alphabet_size() — returns alphabet_size
  • source_length() — returns source.len()
  • target_length() — returns target.len()
  • bound_k() — returns bound_k

Complexity

  • Decision complexity: NP-complete for unbounded alphabet size (Wagner, 1975; transformation from SET COVERING).
  • Best known exact algorithm: Brute-force enumeration over the configuration space: all sequences of at most K operations, each being a delete, adjacent swap, or no-op, gives O((2*source_length+1)^bound_k) in the worst case.
  • declare_variants! complexity string: "(2 * source_length + 1)^bound_k"
  • Special polynomial cases:
    • If insert and change operations are also allowed (even without swap), solvable in O(|x| * |y|) time (Wagner-Fischer, 1974).
    • If only adjacent swap is allowed (no delete), solvable in polynomial time (Wagner, 1975) -- equivalent to counting inversions.
    • Binary alphabet: some restricted cases are polynomial (Meister, 2015).
  • References:
    • [Wagner, 1975] R. A. Wagner, "On the complexity of the extended string-to-string correction problem", Proc. 7th STOC, pp. 218-223.
    • [Wagner and Fischer, 1974] R. A. Wagner and M. J. Fischer, "The string-to-string correction problem", JACM 21, pp. 168-173.

Extra Remark

Full book text:

INSTANCE: Finite alphabet Sigma, two strings x,y in Sigma*, and a positive integer K.
QUESTION: Is there a way to derive the string y from the string x by a sequence of K or fewer operations of single symbol deletion or adjacent symbol interchange?
Reference: [Wagner, 1975]. Transformation from SET COVERING.
Comment: Solvable in polynomial time if the operation set is expanded to include the operations of changing a single character and of inserting a single character, even if interchanges are not allowed (e.g., see [Wagner and Fischer, 1974]), or if the only operation is adjacent symbol interchange [Wagner, 1975]. See reference for related results for cases in which different operations can have different costs.

How to solve

  • It can be solved by (existing) bruteforce. Enumerate all sequences of at most K operations (each being a delete or adjacent swap at some position), apply each sequence to x, and check whether the result equals y.
  • It can be solved by reducing to integer programming.
  • Other: For small alphabet or special cases, dynamic programming approaches exist (CELLAR algorithm).

Example Instance

alphabet_size = 4 (symbols: {0, 1, 2, 3})
source = [0, 1, 2, 3, 1, 0] (length 6)
target = [0, 1, 3, 2, 1] (length 5)
bound_k = 2

Step-by-step solution (one possible sequence of 2 operations):

  1. Start: [0, 1, 2, 3, 1, 0]
  2. Swap positions 2 and 3 (values 2 and 3): [0, 1, 3, 2, 1, 0]
  3. Delete position 5 (value 0): [0, 1, 3, 2, 1]
  4. Result: [0, 1, 3, 2, 1] = target

Total operations: 2 (swap + delete) = bound_k. Answer: YES.

Verification that fewer operations are insufficient:

  • With 0 operations: [0, 1, 2, 3, 1, 0] != [0, 1, 3, 2, 1] (different length and content).
  • With 1 operation: a single delete removes one character from a 6-element string to get a 5-element string, but no single deletion of [0, 1, 2, 3, 1, 0] yields [0, 1, 3, 2, 1]. A single swap keeps length 6, which cannot equal the 5-element target. So 1 operation is insufficient.
  • The minimum cost is 2, which equals bound_k.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Ready

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions