Resolver logic bug does not permit multiple ACME challenges for multiple subdomains #157

@DCKcode

What happened?

This webhook has a bug that allows only one ACME DNS challenge to exist at a time across multiple subdomains, although having several at once is perfectly valid and expected.

As a result, this webhook cannot reliably support a cluster with multiple different subdomains: certificate renewal will fail at random. The contention gets worse as more subdomains are used, because the chance that two of them request certificates close enough together to contend for the same challenge record increases. I should note that it's not always possible to use the HTTP-01 challenge as an alternative, as that requires the actual HTTP endpoints to be publicly available. The DNS-01 challenge allows getting a certificate for an endpoint that is not publicly available.

How can we reproduce this?

Here's what happens:

  1. Alice requests a certificate for alice.example.com using the DNS-01 challenge
  2. stackit-cert-manager-webhook creates a TXT record at _acme-challenge.example.com with a challenge for alice.example.com
  3. Slightly later, Bob requests a certificate for bob.example.com using the DNS-01 challenge
  4. stackit-cert-manager-webhook attempts to create a TXT record at _acme-challenge.example.com

What should happen at this point:

  • stackit-cert-manager-webhook creates a second, separate TXT record at _acme-challenge.example.com with a challenge for bob.example.com
  • Both Alice and Bob get a new certificate for their subdomains.

What actually happens instead (this is the bug):

  • stackit-cert-manager-webhook finds that a record set already exists at that name (Alice's record), and just updates Alice's existing TXT record with a new TTL. Bob's request is otherwise completely ignored, and no challenge is created for Bob's domain.
  • Only Alice gets a new certificate for her subdomain. Bob doesn't.
  • Every time Bob requests a new certificate, stackit-cert-manager-webhook just updates Alice's ACME challenge with a new TTL instead. So Bob remains without a new certificate until after a cleanup of Alice's challenge occurs.
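To make the expected behavior concrete, here is a minimal sketch in Go of a "present" step that appends a challenge key to an existing record set instead of only refreshing its TTL. The `recordSet` and `zone` types and the `present` method are hypothetical in-memory stand-ins for illustration; they are not the webhook's actual code or the STACKIT DNS API.

```go
package main

// recordSet is a hypothetical in-memory stand-in for the TXT record set
// stored at one DNS name. It is NOT the STACKIT DNS API type.
type recordSet struct {
	TTL    int
	Values []string
}

// zone maps a fully qualified record name to its TXT record set
// (again, an assumption made purely for illustration).
type zone map[string]*recordSet

// present adds the challenge key for fqdn without disturbing keys that
// other pending challenges have already placed at the same name.
func (z zone) present(fqdn, key string, ttl int) {
	rs, ok := z[fqdn]
	if !ok {
		// No record set yet: create one holding just this key.
		z[fqdn] = &recordSet{TTL: ttl, Values: []string{key}}
		return
	}
	rs.TTL = ttl
	for _, v := range rs.Values {
		if v == key {
			return // key already present, so presenting again is a no-op
		}
	}
	// A record set exists for another challenge: append, do not overwrite.
	rs.Values = append(rs.Values, key)
}
```

With this shape, Alice's and Bob's keys end up side by side in the same record set, so both challenges can be validated independently.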

Search

  • I did search for other open and closed issues before opening this.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Additional context

Note that Let's Encrypt in particular expects multiple TXT records.

You can have multiple TXT records in place for the same name.

There's also this old cert-manager discussion where this was explicitly allowed and other implementations were updated.

There seems to be a related bug in CleanUp(): the cert-manager specification quoted in the comments states that only the requested challenge key should be deleted, but the implementation just throws out the entire record set for a specific domain. Both code paths seem to rest on a hidden assumption that only one ACME challenge can exist at a time, whereas multiple concurrent challenges are both allowed and expected.
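A cleanup that respects this could look roughly like the following sketch, which removes only the requested key and drops the record set once it is empty. As before, `recordSet`, `zone`, and `cleanUp` are hypothetical in-memory names used for illustration, not the webhook's real implementation or the STACKIT DNS API.

```go
package main

// recordSet is a hypothetical in-memory stand-in for the TXT values
// stored at one DNS name.
type recordSet struct {
	Values []string
}

// zone maps a fully qualified record name to its TXT record set.
type zone map[string]*recordSet

// cleanUp removes only the given challenge key from the record set at
// fqdn and leaves keys belonging to other pending challenges in place.
func (z zone) cleanUp(fqdn, key string) {
	rs, ok := z[fqdn]
	if !ok {
		return // nothing to clean up
	}
	// Filter in place, keeping every value except the requested key.
	kept := rs.Values[:0]
	for _, v := range rs.Values {
		if v != key {
			kept = append(kept, v)
		}
	}
	if len(kept) == 0 {
		// Last challenge gone: remove the whole record set.
		delete(z, fqdn)
		return
	}
	rs.Values = kept
}
```

Cleaning up Alice's challenge this way would leave Bob's key untouched, and the record set disappears only after the last pending challenge has been cleaned up.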
