
ore,server-core,pgwire,balancerd,environmentd: migrate TLS from openssl to rustls#35866

Draft
jasonhernandez wants to merge 42 commits into main from
jason/sec-218-core-tls-rustls-v2


@jasonhernandez
Contributor

Summary

  • mz-ore: Add crypto module with FIPS-aware CryptoProvider helper. Replace openssl/tokio-openssl in async feature with tokio-rustls. Replace native-tls/hyper-tls in tracing feature with hyper-rustls. Update AsyncReady impls for tokio-rustls stream types.
  • mz-server-core: Rewrite TlsCertConfig::load_context() to produce rustls::ServerConfig. Update ReloadingSslContext to wrap Arc<RwLock<Arc<ServerConfig>>> with acceptor() method.
  • mz-pgwire-common: Introduce TlsStream enum wrapping both server and client rustls streams with SNI extraction support.
  • mz-pgwire: Update SSL accept to TlsAcceptor pattern.
  • mz-balancerd: Migrate server-side TLS accept, client-side TLS connect (with NoVerifier for internal connections), and SNI extraction.
  • mz-environmentd: Migrate HTTP server TLS accept and console proxy HTTPS client from hyper-tls to hyper-rustls.
  • tests: Migrate JWT signing from EC to RSA keys (matching Frontegg/OIDC RS256), update error message assertions for rustls.

Part of SEC-218 (PR 6: Core TLS infrastructure and downstream consumers).
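The `Arc<RwLock<Arc<ServerConfig>>>` shape mentioned for `ReloadingSslContext` can be illustrated with a minimal std-only sketch. `ServerConfig` here is a stand-in struct, not `rustls::ServerConfig`, and `ReloadingConfig`, `reload`, and the `cert_label` field are hypothetical names chosen for illustration; the real `acceptor()` presumably returns a TLS acceptor rather than the raw config.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for `rustls::ServerConfig`; only the label matters in this sketch.
#[derive(Debug)]
pub struct ServerConfig {
    pub cert_label: String,
}

/// Hypothetical analogue of `ReloadingSslContext`: a shared slot holding the
/// current config. The outer Arc shares the slot, the inner Arc is the snapshot.
#[derive(Clone)]
pub struct ReloadingConfig {
    current: Arc<RwLock<Arc<ServerConfig>>>,
}

impl ReloadingConfig {
    pub fn new(initial: ServerConfig) -> Self {
        Self {
            current: Arc::new(RwLock::new(Arc::new(initial))),
        }
    }

    /// Snapshot the current config for one handshake. Only the inner Arc is
    /// cloned, so the read lock is held very briefly.
    pub fn acceptor(&self) -> Arc<ServerConfig> {
        let guard = self.current.read().expect("lock poisoned");
        Arc::clone(&*guard)
    }

    /// Swap in a freshly loaded config. Handshakes that already took a
    /// snapshot keep using the old one until they finish.
    pub fn reload(&self, next: ServerConfig) {
        *self.current.write().expect("lock poisoned") = Arc::new(next);
    }
}
```

The point of the double `Arc` is that a certificate reload never mutates a config that an in-flight handshake is using; it only replaces the pointer that new connections read.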

Test plan

  • cargo check passes for all modified crates
  • cargo check -p mz-environmentd --features test passes
  • clippy, rustfmt, lint all pass (Linux + macOS)
  • cargo-test-1 and cargo-test-2 pass
  • mzcompose integration tests (pending Docker image build)

Checklist

  • Code compiles cleanly
  • cargo fmt clean
  • Full CI passes
  • Release notes updated (if user-visible)

🤖 Generated with Claude Code

jasonhernandez and others added 30 commits April 2, 2026 10:10
Add bin/lint-openssl to detect all openssl dependencies, feature flags,
and source imports across the workspace. This is the first step toward
migrating from openssl to rustls—it serves as a tracking tool for
migration progress and can later be promoted to a CI gate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add migration plan (doc/developer/openssl-to-rustls-migration.md) with
tiered breakdown of all 28 affected crates, dependency graph, replacement
crate mapping, and links to Linear issues (SEC-176 through SEC-200).

Include raw linter output snapshots (.txt and .json) as baseline for
tracking progress.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace all non-FIPS crypto crate recommendations (ring, sha2, hmac,
pbkdf2, subtle, rsa, ed25519-dalek, aes+cbc) with aws-lc-rs equivalents.
Add FIPS 140-3 strategy section, workspace fips feature flag (SEC-201),
and updated replacement crate mapping table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add [[bans.deny]] entries for crypto crates that are not FIPS 140-3
validated: sha2, hmac, subtle, ring, pbkdf2, ed25519-dalek, aes, cbc,
rsa. All new crypto code must use aws-lc-rs instead.

Existing workspace and third-party usage is allowed via wrappers, with
TODO comments to remove them as each crate is migrated to aws-lc-rs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add bin/lint-fips-containers to scan Dockerfiles for FIPS 140-3
compliance gaps: non-FIPS base images, crypto-relevant package
installations, and non-FIPS algorithms in cert generation scripts.

Distinguishes production images (must comply) from test/dev
(informational). Supports --strict and --json flags.

Current results: 8 production findings across 4 base images.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
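As a rough illustration of the base-image check described above, the following std-only sketch flags `FROM` lines whose image is not on an approved list. It is not the implementation of `bin/lint-fips-containers`; `Finding`, `scan_dockerfile`, and the approved-prefix list are hypothetical, and the sketch does not recognize references to earlier build stages.

```rust
/// One non-approved base image, with its 1-based line number.
#[derive(Debug, PartialEq, Eq)]
pub struct Finding {
    pub line: usize,
    pub image: String,
}

/// Return every `FROM` line whose image does not start with an approved prefix.
/// The prefix list is an assumption of this sketch, supplied by the caller.
pub fn scan_dockerfile(contents: &str, approved_prefixes: &[&str]) -> Vec<Finding> {
    contents
        .lines()
        .enumerate()
        .filter_map(|(idx, raw)| {
            let mut words = raw.split_whitespace();
            // Only `FROM` instructions are of interest here.
            if !words.next()?.eq_ignore_ascii_case("FROM") {
                return None;
            }
            // Skip flags such as `--platform=...` that may precede the image.
            let image = words.find(|w| !w.starts_with("--"))?;
            if approved_prefixes.iter().any(|p| image.starts_with(*p)) {
                None
            } else {
                Some(Finding {
                    line: idx + 1,
                    image: image.to_string(),
                })
            }
        })
        .collect()
}
```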
Covers all three compliance layers: Rust binaries (137 openssl findings
across 28 crates + sha2/hmac/subtle), container images (8 production
findings across 4 base images), and Kubernetes/Helm deployment (Ed25519,
image validation, external services, FIPS toggle).

Includes full issue inventory (SEC-176 through SEC-213), remediation
strategy, recommended execution order, and FIPS validation caveat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove the rustls ban from deny.toml, unblocking all openssl-to-rustls
migration work.

Add `aws-lc-rs` as an optional dependency in mz-ore with two feature
flags:
- `crypto`: enables aws-lc-rs in standard mode
- `fips`: enables aws-lc-rs with FIPS 140-3 validated module

mz-ore is the natural distribution channel since every crate in the
workspace depends on it. Downstream crates enable `mz-ore/crypto`
(or `mz-ore/fips` for FIPS builds) to get the validated backend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
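A small sketch of how code might observe the two feature flags at compile time. `CryptoBackend`, `pick_backend`, and `selected_backend` are hypothetical names, and the rule that `fips` takes precedence when both flags are set is an assumption of this sketch, not something the commit states.

```rust
/// Which crypto backend a build selects under the assumed feature names.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CryptoBackend {
    /// aws-lc-rs built with the FIPS 140-3 validated module.
    Fips,
    /// aws-lc-rs in standard mode.
    Standard,
    /// Neither feature enabled.
    Disabled,
}

/// Pure selection logic; `fips` winning over `crypto` is an assumption.
pub fn pick_backend(fips: bool, crypto: bool) -> CryptoBackend {
    match (fips, crypto) {
        (true, _) => CryptoBackend::Fips,
        (false, true) => CryptoBackend::Standard,
        (false, false) => CryptoBackend::Disabled,
    }
}

/// Resolve the backend from Cargo features at compile time.
pub fn selected_backend() -> CryptoBackend {
    pick_backend(cfg!(feature = "fips"), cfg!(feature = "crypto"))
}
```

A downstream crate would opt in through its own manifest by enabling `mz-ore/crypto` or `mz-ore/fips`, as the commit message describes.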
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `fips` feature on mz-ore enables `aws-lc-rs/fips`, which pulls in
`aws-lc-fips-sys`. That crate builds the AWS-LC FIPS module via cmake,
requiring Go for integrity verification. Since cargo-test runs with
`--all-features`, Go must be available in the CI builder.

Fixes SEC-232.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
uuid 1.23.0 changed error message format which breaks the
fmt_ids test in mz-persist-types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pin three deps that were inadvertently bumped during Cargo.lock
regeneration:
- os_info 3.11.0: avoids objc2 0.6.x which causes E0275 on macOS
- chrono-tz 0.8.1: avoids Egypt timezone data change that breaks
  test_pg_timezone_abbrevs
- serde_path_to_error 0.1.8: avoids error message format change
  that breaks test_mcp_observatory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch TLS backend from openssl/native-tls to rustls:
- mz-cloud-resources: kube openssl-tls → rustls-tls
- mz-npm: reqwest native-tls-vendored → rustls-tls-webpki-roots-no-provider
- mz-testdrive: reqwest native-tls-vendored → rustls-tls-webpki-roots-no-provider

Uses the -no-provider variant for reqwest to avoid pulling in ring,
allowing aws-lc-rs to serve as the crypto provider instead.

Deferred: tiberius (SEC-223, fork needs rustls fix), segment and duckdb
(no rustls feature available), storage-types (has direct native-tls dep).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The client-legacy feature was previously activated transitively. After
Cargo.lock regeneration, the transitive activation stopped and
`hyper_openssl::client` became configured out. Enable it explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The webpki-roots 1.0.6 crate uses the CDLA-Permissive-2.0 license, which
is already allowed in deny.toml but was missing from about.toml (the
cargo-about config that must be manually kept in sync).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- mz-aws-util: remove custom hyper-tls HTTP client override; the AWS SDK
  already uses rustls by default, so the native-TLS override was
  unnecessary
- mz (CLI): reqwest default-tls → rustls-tls-webpki-roots-no-provider
- mz-persist: reqwest default-tls → rustls-tls-webpki-roots-no-provider

Deferred: mz-dyncfg-launchdarkly (LD SDK takes hyper_tls::HttpsConnector
directly — needs upstream/fork change), mz-persist openssl-sys removal
(has openssl_sys::init() hack that needs investigation), mz CLI
openssl-probe removal (needs source changes for cert discovery).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove references to native-TLS policy override and hyper-tls dep in
generated docs. The AWS SDK's default rustls client is now used directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The client-legacy feature was previously activated transitively through
hyper-tls in mz-ore. After replacing hyper-tls with hyper-rustls, the
transitive activation stopped and `hyper_openssl::client` became
configured out. Enable it explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The PR removed the custom hyper-tls HTTP client from aws-util but
didn't enable the `default-https-client` feature on aws-config. With
default-features = false, no HTTP client was bundled, causing
environmentd to crash on startup.

Also pin os_info to 3.11.0 to avoid pulling in objc2 0.6.x which
causes E0275 overflow on macOS clippy due to its blanket
IntoIterator impl on Retained<T>.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eature

Add a `telemetry` feature (default-enabled) to mz-adapter that gates
`launchdarkly-server-sdk` and `mz-segment` as optional deps. Add #[cfg]
guards on LD-specific code in config.rs and config/frontend.rs.

WIP: segment client refs in client.rs, coord.rs, coord/ddl.rs, and
coord/message_handler.rs still need cfg guards. The pattern is proven
but the threading is extensive — see SEC-229 for remaining work.

Compiles with default features. Does not yet compile with
--no-default-features (missing segment cfg guards).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…for FIPS

In FIPS mode, non-essential third-party SDKs must be excluded from the
binary at compile time (not just disabled at runtime). This adds a
`telemetry` Cargo feature to mz-adapter, mz-environmentd, and
mz-balancerd, plus a `sentry` feature to mz-ore, mz-orchestrator-tracing,
and mz-service.

When these features are disabled:
- Segment analytics client is compiled out via SegmentClient type alias
- LaunchDarkly SDK and dyncfg sync are excluded
- Sentry error reporting and panic integration are excluded
- CLI args are still accepted but values are ignored

All features are default-enabled so standard builds are unaffected.
FIPS builds use `--no-default-features` to exclude them.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
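The "compiled out via a type alias" approach can be sketched roughly as follows. The `telemetry` module contents, `build_client`, and `telemetry_compiled_in` are hypothetical; only the `telemetry` feature name and the `SegmentClient` alias come from the commit message, and the stub client is not the Segment SDK.

```rust
/// Stand-in for a real analytics client, present only with the feature on.
#[cfg(feature = "telemetry")]
mod telemetry {
    #[derive(Debug, Default)]
    pub struct Client {
        pub sent: usize,
    }

    impl Client {
        pub fn track(&mut self, _event: &str) {
            self.sent += 1;
        }
    }
}

/// No-op replacement compiled in when the feature is off, so call sites
/// keep compiling without pulling in any third-party SDK.
#[cfg(not(feature = "telemetry"))]
mod telemetry {
    #[derive(Debug, Default)]
    pub struct Client;

    impl Client {
        pub fn track(&mut self, _event: &str) {}
    }
}

/// Call sites name only this alias and never the concrete type.
pub type SegmentClient = telemetry::Client;

/// Mirrors "CLI args are still accepted but values are ignored": the key is
/// taken in both configurations; this sketch never uses it.
pub fn build_client(_api_key: Option<&str>) -> SegmentClient {
    SegmentClient::default()
}

/// Reports whether the telemetry code path was compiled into this build.
pub fn telemetry_compiled_in() -> bool {
    cfg!(feature = "telemetry")
}
```

Because the alias resolves at compile time, a `--no-default-features` build contains no telemetry code at all, which is the property the commit says FIPS mode needs.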
Add a section to the FIPS compliance report documenting the Cargo feature
flags that gate third-party telemetry SDKs (Segment, LaunchDarkly, Sentry)
for FIPS builds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace all openssl cryptographic primitives in src/auth/src/hash.rs
with aws-lc-rs equivalents to ensure FIPS 140-3 compliance:

- openssl::rand::rand_bytes -> aws_lc_rs::rand::SystemRandom
- openssl::memcmp::eq -> aws_lc_rs::constant_time::verify_slices_are_equal
- openssl::pkey::PKey::hmac + openssl::sign::Signer -> aws_lc_rs::hmac
- openssl::sha::sha256 -> aws_lc_rs::digest
- openssl::pkcs5::pbkdf2_hmac -> aws_lc_rs::pbkdf2

Removes the openssl dependency from mz-auth entirely.

Part of SEC-198.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Motivation

This is a stacked PR on top of the OIDC login PR:
#35440
This PR lets the user retrieve the ID token for the psql connection string.

Only the changes from the last commit would go in.

### Description

- Added an OIDC Connection modal, similar to the Connect modal for the cloud
console, to show the connection instructions and ID token

<img width="2680" height="1598" alt="image"
src="https://github.com/user-attachments/assets/494b2949-827f-489d-afd9-6ca86bf890b5"
/>



### Verification
Once logged in using SSO, take the connection string and paste it into the
terminal. You will be prompted for a password; copy and paste the ID token
to authenticate.
Context: Instructions for running bin/environmentd with postgres cause a
panic
https://materializeinc.slack.com/archives/CU7ELJ6E9/p1775151866083609.
Replace sha2, hmac, and subtle crate usage with aws-lc-rs equivalents
to consolidate on a single FIPS 140-3 validated crypto backend.

Changes by crate:
- mz-adapter: sha2::Sha256 → aws_lc_rs::digest
- mz-avro: sha2/digest traits → aws_lc_rs::digest (fingerprint API
  changed from generic type parameter to algorithm reference)
- mz-catalog: sha2::Sha256 → aws_lc_rs::digest
- mz-expr: sha2/sha1 digests → aws_lc_rs::digest, hmac (SHA-1 through
  SHA-512) → aws_lc_rs::hmac, subtle::ConstantTimeEq →
  aws_lc_rs::constant_time (hmac crate retained for MD5 HMAC only)
- mz-fivetran-destination: sha2::Sha256 → aws_lc_rs::digest
- mz-license-keys: sha2::Sha256 → aws_lc_rs::digest
- mz-npm: sha2::Sha256 → aws_lc_rs::digest
- mz-orchestrator-kubernetes: sha2::Sha256 → aws_lc_rs::digest
- mz-orchestratord: sha2::Sha256 → aws_lc_rs::digest
- mz-persist: removed sha2 "asm" perf dependency (aws-lc-rs uses
  native assembly)
- mz-storage: sha2 Digest trait → aws_lc_rs::digest::Context

Dependencies removed: sha2 (10 crates), subtle (1 crate), sha1 (1
crate), digest (1 crate). The hmac crate remains in mz-expr for
HMAC-MD5 (not available in aws-lc-rs).

Part of SEC-206.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…c-rs/rustls

Migrate 7 leaf crates away from direct openssl/native-tls dependencies as part
of FIPS 140-3 compliance:

- mz-adapter: openssl::rand::rand_bytes → aws_lc_rs::rand::fill
- mz-catalog: openssl::rand::rand_bytes → aws_lc_rs::rand::fill,
  openssl::sha::sha256 → aws_lc_rs::digest
- mz-ssh-util: Ed25519 keygen from openssl PKey → aws_lc_rs::signature
- mz-frontegg-mock: test RSA keygen from openssl → aws_lc_rs::rsa
- mz-oidc-mock: RSA key parsing from openssl → aws_lc_rs::rsa + manual DER
- mz-ccsr: native-tls cert handling → base64 PEM parsing + reqwest validation,
  reqwest native-tls-vendored → rustls-tls-webpki-roots
- mz-storage-types: remove NativeTls/Openssl error variants from CsrConnectError

mz-debug and mz-postgres-util migrations are blocked by SEC-192 (mz-tls-util)
since they consume mz_tls_util::make_tls which returns OpenSSL-based types.

Part of SEC-220.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ustls

Replace openssl-based test certificate generation and TLS connector
construction with rcgen (cert generation) and tokio-postgres-rustls.

- Ca struct now uses rcgen::CertificateParams + KeyPair instead of
  openssl X509/PKey. Certificate and key are stored as PEM bytes.
- New TestTlsConfig builder replaces the closure-based
  SslConnectorBuilder pattern with a declarative config struct.
- make_pg_tls now takes TestTlsConfig and returns MakeRustlsConnect.

Test files (auth.rs, server.rs, balancerd/tests) still need call-site
migration to the new API — tracked as remaining work for SEC-219.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
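To show what a declarative config struct looks like compared with a closure that mutates a connector builder, here is a minimal sketch. The field and method names are hypothetical and do not reflect the actual `TestTlsConfig` API; only the type name comes from the commit message.

```rust
/// Hypothetical declarative test TLS config. Each test states what it wants
/// as data instead of passing a closure that mutates a builder.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
pub struct TestTlsConfig {
    /// PEM-encoded CA certificate to trust, if any.
    pub root_cert_pem: Option<Vec<u8>>,
    /// PEM-encoded (certificate, private key) pair for client auth, if any.
    pub client_identity: Option<(Vec<u8>, Vec<u8>)>,
    /// Whether the server certificate should be verified.
    pub verify_server: bool,
}

impl TestTlsConfig {
    /// Start from a verifying config with no trust roots and no identity.
    pub fn new() -> Self {
        Self {
            verify_server: true,
            ..Self::default()
        }
    }

    pub fn root_cert(mut self, pem: &[u8]) -> Self {
        self.root_cert_pem = Some(pem.to_vec());
        self
    }

    pub fn client_identity(mut self, cert_pem: &[u8], key_pem: &[u8]) -> Self {
        self.client_identity = Some((cert_pem.to_vec(), key_pem.to_vec()));
        self
    }

    /// Disable server verification, for tests that exercise non-verifying modes.
    pub fn no_verify(mut self) -> Self {
        self.verify_server = false;
        self
    }
}
```

A plain struct like this can be compared, cloned, and logged in test failures, which a closure cannot.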
jasonhernandez and others added 9 commits April 3, 2026 11:24
Update all test files to use the new TestTlsConfig-based API:

- auth.rs: Migrate ~50 make_pg_tls call sites, replace SslConnectorBuilder
  closures with TestTlsConfig builder. Switch JWT from RS256 to ES256
  (matching rcgen's ECDSA key generation). Stub make_http_tls/make_ws_tls
  with TODO comments for full rustls migration.
- environmentd/tests/server.rs: Migrate make_pg_tls calls, JWT keys,
  reqwest cert access, and X509 comparisons (with TODO stubs).
- balancerd/tests/server.rs: Migrate make_pg_tls calls, JWT keys,
  reqwest cert access, and X509 comparisons (with TODO stubs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace TODO stubs with working implementations:

- peer_certificate_der(): raw tokio-rustls handshake to inspect peer
  certificates. reqwest's TlsInfo::peer_certificate() only works with
  the native-tls backend, returning None with rustls, so we drop down
  to tokio_rustls::TlsConnector directly, where
  ClientConnection::peer_certificates() always works.

- cert_file_to_der(): parse PEM cert files to DER for comparison.

- make_http_tls(): now honors TestTlsConfig (builds hyper-rustls
  connector from the client config that trusts the test CA).

- make_ws_tls(): uses rustls::StreamOwned for synchronous TLS
  WebSocket connections.

Cert reloading test assertions in both environmentd and balancerd are
now fully restored — no remaining TODO stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
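For readers unfamiliar with the PEM-to-DER step, the conversion amounts to stripping the armor lines and base64-decoding the body. The std-only sketch below illustrates that idea; `pem_cert_to_der` and `base64_decode` are hypothetical helpers, not the test utility's `cert_file_to_der()`, which presumably relies on a PEM-parsing crate. The sketch handles only the first certificate in the input.

```rust
/// Decode standard base64 (RFC 4648 alphabet). Decoding stops at the first
/// `=`; any other non-alphabet byte is rejected.
fn base64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::with_capacity(input.len() / 4 * 3);
    let mut buf: u32 = 0; // bits not yet emitted
    let mut bits: u32 = 0; // number of valid bits in `buf`
    for byte in input.bytes() {
        let val = match byte {
            b'A'..=b'Z' => byte - b'A',
            b'a'..=b'z' => byte - b'a' + 26,
            b'0'..=b'9' => byte - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break,
            _ => return None,
        };
        buf = (buf << 6) | u32::from(val);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Extract the first `CERTIFICATE` block from PEM text and return its DER bytes.
pub fn pem_cert_to_der(pem: &str) -> Option<Vec<u8>> {
    const BEGIN: &str = "-----BEGIN CERTIFICATE-----";
    const END: &str = "-----END CERTIFICATE-----";
    let start = pem.find(BEGIN)? + BEGIN.len();
    let end = start + pem[start..].find(END)?;
    // PEM bodies are line-wrapped, so drop all whitespace before decoding.
    let body: String = pem[start..end]
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect();
    base64_decode(&body)
}
```

Comparing DER bytes rather than PEM text avoids false mismatches caused by line wrapping or trailing newlines in the certificate files.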
Rewrite the central TLS utility crate to use rustls instead of openssl:

- make_tls: returns MakeRustlsConnect (rustls-based) instead of
  postgres-openssl MakeTlsConnector. Supports SslMode verification,
  client certificates, and a NoVerifier for non-verifying modes.
- pkcs12der_from_pem: validates PEM with rustls-pki-types instead of
  openssl. Stores concatenated PEM in the Pkcs12Archive for backward
  compatibility (consumers use reqwest::Identity::from_pem).
- TlsError: OpenSsl variant replaced with Rustls variant.
- MakeRustlsConnect + RustlsConnect: implements tokio_postgres MakeTlsConnect
  trait using tokio-rustls, with RustlsTlsStream wrapper for TlsStream trait.

Updated consumers:
- mz-postgres-util: removed openssl + postgres-openssl deps, updated error types
- mz-postgres-client: updated TlsError match arm
- mz-debug: replaced MakeTlsConnector/TlsStream with rustls equivalents
- mz-ccsr: pkcs12der_from_pem error type changed (already updated in SEC-220)
- mz-storage-types: pkcs12der_from_pem returns anyhow::Error (compatible)

Part of SEC-192.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sl to rustls

Replace openssl/tokio-openssl/native-tls/hyper-tls with rustls/tokio-rustls/
hyper-rustls across the core TLS infrastructure and all downstream consumers:

- mz-ore: Add `crypto` module with FIPS-aware CryptoProvider helper.
  Replace openssl/tokio-openssl in `async` feature with tokio-rustls.
  Replace native-tls/hyper-tls in `tracing` feature with hyper-rustls.
  Update AsyncReady impls for tokio-rustls stream types.
- mz-server-core: Rewrite TlsCertConfig::load_context() to produce
  rustls::ServerConfig. Update ReloadingSslContext to wrap
  Arc<RwLock<Arc<ServerConfig>>> with acceptor() method.
- mz-pgwire-common: Introduce TlsStream enum wrapping both server and
  client rustls streams with SNI extraction support.
- mz-pgwire: Update SSL accept to TlsAcceptor pattern.
- mz-balancerd: Migrate server-side TLS accept, client-side TLS connect
  (with NoVerifier for internal connections), and SNI extraction.
- mz-environmentd: Migrate HTTP server TLS accept and console proxy
  HTTPS client from hyper-tls to hyper-rustls.

Part of SEC-218.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
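The idea of one `TlsStream` enum covering both server and client streams, with SNI available only on the server side, can be sketched with std types. This is a synchronous illustration under stated assumptions: the real type presumably wraps the tokio-rustls stream types and implements the async I/O traits, and `ServerSession` and `MockServerStream` are hypothetical names used only to exercise the pattern.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical view of an accepted TLS session that can report the SNI
/// hostname the client sent during the handshake.
pub trait ServerSession {
    fn server_name(&self) -> Option<&str>;
}

/// One type for both directions, so callers need not be generic over them.
pub enum TlsStream<S, C> {
    Server(S),
    Client(C),
}

impl<S: ServerSession, C> TlsStream<S, C> {
    /// SNI is only meaningful on accepted (server-side) connections.
    pub fn sni(&self) -> Option<&str> {
        match self {
            TlsStream::Server(s) => s.server_name(),
            TlsStream::Client(_) => None,
        }
    }
}

/// Reads are forwarded to whichever variant is active.
impl<S: Read, C: Read> Read for TlsStream<S, C> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self {
            TlsStream::Server(s) => s.read(buf),
            TlsStream::Client(c) => c.read(buf),
        }
    }
}

/// In-memory stand-in for an accepted TLS stream, used only in this sketch.
pub struct MockServerStream {
    sni: Option<String>,
    data: Cursor<Vec<u8>>,
}

impl MockServerStream {
    pub fn new(sni: Option<&str>, data: &[u8]) -> Self {
        Self {
            sni: sni.map(str::to_owned),
            data: Cursor::new(data.to_vec()),
        }
    }
}

impl ServerSession for MockServerStream {
    fn server_name(&self) -> Option<&str> {
        self.sni.as_deref()
    }
}

impl Read for MockServerStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.data.read(buf)
    }
}
```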
- Fix clippy::clone_on_ref_ptr in tls-util (Arc::clone instead of .clone())
- Fix rustfmt line width in pgwire server.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Frontegg mock server uses RS256 (RSA) for JWT signing, but the tests
were passing the CA's ECDSA key via `from_ec_pem()`. This worked with
the old openssl backend (which generated RSA keys for CAs) but fails
with rcgen (which generates ECDSA keys).

Use `Ca::generate_jwt_rsa_keypair()` to create a separate RSA keypair
for JWT signing, and switch all `from_ec_pem` calls to `from_rsa_pem`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…wt_keys

- DecodingKey needs the RSA public key, not private key
- Add jwt_keys initialization to OIDC-only test functions that were
  missing it
- Remove unused JwtRsaKeyPair import from balancerd tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Change JWT algorithm from ES256 to RS256 to match RSA keys
- Fix OIDC issuer switch test: use jwt_keys instead of ca1.key_pem
- Update error message assertions for rustls:
  - "packet length too long" → "InvalidContentType"
  - "unable to get local issuer certificate" → "UnknownIssuer"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix second occurrence of "unable to get local issuer certificate"
  assertion that was missed in prior commit
- Update miri ignore comments: remove stale OPENSSL_init_ssl reference
- Clean up outdated SslConnectorBuilder reference in test_util.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: compute: , storage: , adapter: , sql:

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

jasonhernandez and others added 3 commits April 3, 2026 16:08
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fips_crypto_provider() now calls install_default() so that all rustls
usage in the process (including transitive deps like hyper-rustls,
tokio-postgres-rustls, and any code that doesn't explicitly pass a
provider) picks up aws-lc-rs automatically.

This fixes materialized crashing on startup in Docker containers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rovider"

install_default() causes SIGABRT during process shutdown when both
aws-lc-rs and openssl (via rdkafka) are linked. Revert to the explicit
provider-passing approach which worked in build 120006.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
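The trade-off between a process-wide default and explicit provider passing can be modeled with a std-only sketch. `CryptoProvider`, `ConfigBuilder`, and the functions below are stand-ins loosely modeled on the shape of the rustls API, not rustls itself, and the sketch does not reproduce the shutdown crash described above.

```rust
use std::sync::{Arc, OnceLock};

/// Stand-in for a crypto provider; not the rustls type.
#[derive(Debug, PartialEq, Eq)]
pub struct CryptoProvider {
    pub name: &'static str,
}

/// Process-wide slot: hidden global state shared by every library in the binary.
static PROCESS_DEFAULT: OnceLock<Arc<CryptoProvider>> = OnceLock::new();

/// First caller wins; later callers get their provider handed back as an error.
pub fn install_default(provider: CryptoProvider) -> Result<(), Arc<CryptoProvider>> {
    PROCESS_DEFAULT.set(Arc::new(provider))
}

/// Hypothetical TLS config builder that records which provider it uses.
pub struct ConfigBuilder {
    provider: Arc<CryptoProvider>,
}

impl ConfigBuilder {
    /// Explicit passing: the dependency is visible at every call site and
    /// no global state is touched.
    pub fn with_provider(provider: Arc<CryptoProvider>) -> Self {
        Self { provider }
    }

    /// Implicit lookup: works only if something installed a default earlier.
    pub fn with_default() -> Option<Self> {
        PROCESS_DEFAULT
            .get()
            .cloned()
            .map(|provider| Self { provider })
    }

    pub fn provider_name(&self) -> &'static str {
        self.provider.name
    }
}
```

Explicit passing costs more plumbing, but it keeps the choice of provider local to the code that builds each config, which is the approach the revert returns to.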
3 participants