Skip to content

[Model] AdditionalKey #445

@isPANN

Description

@isPANN

Motivation

ADDITIONAL KEY (P175) from Garey & Johnson, A4 SR27. A classical NP-complete problem from relational database theory that asks whether a relational schema has a candidate key not already in a given set of known keys. This problem is central to database normalization: when designing schemas, one must enumerate all candidate keys to determine normal forms (especially BCNF). The NP-completeness of this problem means that verifying completeness of key enumeration is computationally intractable.

Associated rules:

  • R121: Hitting Set -> Additional Key (this model is the target)

Definition

Name: AdditionalKey
Canonical name: Additional Key
Reference: Garey & Johnson, Computers and Intractability, A4 SR27, p.232

Mathematical definition:

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, a subset R ⊆ A, and a set K of keys for the relational scheme <R,F>.
QUESTION: Does R have a key not already contained in K, i.e., is there an R' ⊆ R such that R' not in K, (R',R) in F*, and for no R'' strictly contained in R' is (R'',R) in F*?

Variables

  • Count: |R| binary variables, one per attribute in the relation scheme R.
  • Per-variable domain: binary {0, 1} — whether the attribute is included in the candidate key R'.
  • dims(): vec![2; relation_attrs.len()]
  • evaluate(): Let R' = {relation_attrs[i] : config[i] = 1}. Compute the closure of R' under F*. Return true iff (1) closure(R') = R (the full relation attributes), (2) R' is minimal (no proper subset of R' also has closure = R), and (3) R' is not in known_keys.

Schema (data type)

Type name: AdditionalKey
Variants: none (no graph or weight parameters)

Field Type Description
num_attributes usize Number of attributes in A (attributes indexed 0..num_attributes)
dependencies Vec<(Vec<usize>, Vec<usize>)> Functional dependencies F; each pair (lhs, rhs) means lhs -> rhs
relation_attrs Vec<usize> Subset R ⊆ A: the relation scheme attributes
known_keys Vec<Vec<usize>> Set K of known candidate keys for <R,F>

Size fields (getter methods for overhead expressions and declare_variants!):

  • num_attributes() — returns num_attributes
  • num_dependencies() — returns dependencies.len()
  • num_relation_attrs() — returns relation_attrs.len()
  • num_known_keys() — returns known_keys.len()

Notes:

  • This is a satisfaction (decision) problem: Metric = bool, implementing SatisfactionProblem.
  • No weight type is needed.
  • The answer is YES if and only if there exists a candidate key for <R,F> that is not in the known set K.
  • The problem is related to key enumeration: if the Lucchesi-Osborne algorithm finds all keys, then checking if any are missing from K is straightforward, but the enumeration itself may produce exponentially many keys.
  • The input includes both the full attribute set A (for functional dependency context) and the relation subset R.

Complexity

  • Best known exact algorithm: Enumerate all subsets of R, check each for key property and absence from K. Worst case: O(2^num_relation_attrs * num_dependencies * num_attributes).
  • NP-completeness: NP-complete [Beeri and Bernstein, 1979], via transformation from HITTING SET.
  • declare_variants! complexity string: "2^num_relation_attrs * num_dependencies * num_attributes"
  • Relationship to Hitting Set: The reduction from Hitting Set encodes the covering constraint as a key-determination constraint, making the additional key search equivalent to finding an uncovered hitting set.
  • References:
    • C. Beeri and P. A. Bernstein (1979). "Computational problems related to the design of normal form relational schemas." ACM Transactions on Database Systems, 4(1), pp. 30-59.

Extra Remark

Full book text:

INSTANCE: A set A of attribute names, a collection F of functional dependencies on A, a subset R ⊆ A, and a set K of keys for the relational scheme <R,F>.
QUESTION: Does R have a key not already contained in K, i.e., is there an R' ⊆ R such that R' not in K, (R',R) ∈ F*, and for no R'' ⊆ R' is (R'',R) ∈ F*?
Reference: [Beeri and Bernstein, 1978]. Transformation from HITTING SET.

Connection to BCNF normalization: A relation scheme <R,F> is in Boyce-Codd Normal Form (BCNF) if and only if every non-trivial functional dependency X -> Y has X as a superkey. Checking BCNF requires knowing all candidate keys, which in turn requires solving the Additional Key problem to ensure no keys have been missed. The NP-completeness of Additional Key implies that BCNF testing is also intractable in general (Beeri and Bernstein, 1979).

How to solve

  • It can be solved by (existing) bruteforce -- enumerate all subsets of R, check each for key property (closure = R, minimality), and verify it is not in K.
  • It can be solved by reducing to integer programming -- binary variable per attribute in R; constraint that closure covers R; constraint that the selected set is not equal to any known key (expressed via at least one differing attribute per known key); minimality as additional constraint.
  • Other: Use Lucchesi-Osborne key enumeration algorithm to find all keys, then check against K. Alternatively, reduce to SAT: encode closure computation, minimality, and exclusion of known keys as Boolean constraints.

Example Instance

Instance 1 (YES — additional key exists):
num_attributes = 6 (attributes: {0, 1, 2, 3, 4, 5})
relation_attrs = [0, 1, 2, 3, 4, 5] (full relation)
Functional dependencies F:

  • {0, 1} -> {2, 3}
  • {2, 3} -> {4, 5}
  • {4, 5} -> {0, 1}
  • {0, 2} -> {3}
  • {3, 5} -> {1}

known_keys = [[0, 1], [2, 3], [4, 5]]

The three known keys form a cycle: {0,1} -> {2,3} -> {4,5} -> {0,1}. But there is a hidden key {0, 2} that requires a 3-step closure chain:

  • Start: {0, 2}
  • Apply {0, 2} -> {3}: closure = {0, 2, 3}
  • Apply {2, 3} -> {4, 5}: closure = {0, 2, 3, 4, 5}
  • Apply {4, 5} -> {0, 1}: closure = {0, 1, 2, 3, 4, 5} = A
  • {0, 2} is minimal: {0} alone has closure {0}, {2} alone has closure {2}. Neither determines A.
  • {0, 2} is not in known_keys.

Answer: YES, {0, 2} is an additional key not in known_keys.

Instance 2 (NO — no additional key):
num_attributes = 3 (attributes: {0, 1, 2})
relation_attrs = [0, 1, 2]
Functional dependencies F:

  • {0} -> {1, 2}

known_keys = [[0]]

Analysis:

  • {0}: closure = {0,1,2} = A. Key.
  • {1}: closure = {1}. Not a key.
  • {2}: closure = {2}. Not a key.
  • {0,1}: contains {0} which is already a key, so not minimal.
  • {0,2}: same, contains {0}.
  • {1,2}: closure = {1,2}. Not a key.
  • The only candidate key is {0}, which is already in known_keys.

Answer: NO, no additional key exists.

Metadata

Metadata

Assignees

No one assigned

    Labels

    GoodAn issue passed all checks.modelA model problem to be implemented.

    Type

    No type

    Projects

    Status

    Ready

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions