Skip to content

feat: reimplement dropped features (reload, health, tenant guard)#81

Merged
intel352 merged 14 commits intomainfrom
feat/reimplementation
Mar 10, 2026
Merged

feat: reimplement dropped features (reload, health, tenant guard)#81
intel352 merged 14 commits intomainfrom
feat/reimplementation

Conversation

@intel352
Copy link
Contributor

@intel352 intel352 commented Mar 9, 2026

Summary

Reimplements 3 of the 4 features dropped during the CrisisTextLine/modular upstream reset:

Dynamic Reload Manager

  • Reloadable interface for modules that support live config reloading
  • ReloadOrchestrator with buffered request queue, atomic CAS single-flight, circuit breaker (exponential backoff), and rollback on partial failure
  • ConfigDiff with change tracking, prefix filtering, sensitive field redaction
  • 4 reload lifecycle events via CloudEvents observer pattern
  • 10 tests including concurrency and rollback verification

Aggregate Health Service

  • HealthProvider interface with HealthStatus enum (Unknown/Healthy/Degraded/Unhealthy)
  • AggregateHealthService with concurrent fan-out, panic recovery, TTL cache (250ms), force-refresh context key
  • Separate readiness (non-optional only) vs health (all providers) aggregation
  • Temporary error detection (Degraded vs Unhealthy)
  • 3 provider adapters: Simple, Static, Composite
  • 17 tests including panic recovery and cache behavior

TenantGuard Enforcement

  • TenantGuard interface with ValidateAccess() for 3 modes: Strict (block), Lenient (log), Disabled (no-op)
  • Ring buffer for bounded violation history (default 1000 entries, FIFO eviction)
  • Whitelist support for explicit cross-tenant access
  • Builder integration: WithTenantGuardMode(), WithTenantGuardConfig()
  • 13 tests including concurrent access (100 goroutines)

Revised Plans

  • Gap analysis for all 4 features against existing codebase
  • BDD/Contract Testing plan updated (65% already exists, remaining items depend on reload + health)

Test plan

  • go build ./... passes
  • go vet ./... passes
  • All 40 new tests pass with -race flag
  • No modifications to existing interfaces (additive only)

🤖 Generated with Claude Code

intel352 and others added 4 commits March 9, 2026 18:23
Analyzed existing codebase against all 4 plans:
- TenantGuard: ~50% exists (context, service, config, decorators)
- Dynamic Reload: ~25% exists (observer, field tracking, config providers)
- Aggregate Health: ~15% exists (reverseproxy health checker)
- BDD/Contract Testing: ~65% exists (121 tests, contract CLI, CI)

Revised checklists to only cover remaining work.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… emission

Implements HealthProvider interface, HealthReport/AggregatedHealth types,
provider adapters (simple, static, composite), and AggregateHealthService
with fan-out evaluation, panic recovery, cache TTL, temporary error
detection, and CloudEvent emission on status changes. 17 tests including
concurrency and race-detector verified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces TenantGuard interface and StandardTenantGuard implementation
with strict/lenient/disabled modes, whitelist support, ring buffer
violation tracking, and CloudEvents emission on violations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…and rollback

Add Reloadable interface to module.go for modules that support runtime
config reloading. Implement ReloadOrchestrator with single-flight
execution, exponential backoff circuit breaker, reverse-order rollback
on partial failure, and CloudEvents emission for reload lifecycle.

New files:
- reload.go: ChangeType, ConfigChange, FieldChange, ConfigDiff, ReloadTrigger types
- reload_orchestrator.go: ReloadOrchestrator with queue, circuit breaker, rollback
- reload_test.go: comprehensive tests (ConfigDiff, orchestrator, rollback, circuit breaker, concurrency)

Also fix pre-existing WithLogger name collision in health_service.go -> WithHealthLogger.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2026

📋 API Contract Changes Summary

No breaking changes detected - only additions and non-breaking modifications

Changed Components:

Core Framework

Contract diff saved to artifacts/diffs/core.json

Module: auth

Contract diff saved to artifacts/diffs/auth.json

Module: cache

Contract diff saved to artifacts/diffs/cache.json

Module: chimux

Contract diff saved to artifacts/diffs/chimux.json

Module: database

Contract diff saved to artifacts/diffs/database.json

Module: eventbus

Contract diff saved to artifacts/diffs/eventbus.json

Module: eventlogger

Contract diff saved to artifacts/diffs/eventlogger.json

Module: httpclient

Contract diff saved to artifacts/diffs/httpclient.json

Module: httpserver

Contract diff saved to artifacts/diffs/httpserver.json

Module: jsonschema

Contract diff saved to artifacts/diffs/jsonschema.json

Module: letsencrypt

Contract diff saved to artifacts/diffs/letsencrypt.json

Module: logmasker

Contract diff saved to artifacts/diffs/logmasker.json

Module: reverseproxy

Contract diff saved to artifacts/diffs/reverseproxy.json

Module: scheduler

Contract diff saved to artifacts/diffs/scheduler.json

Artifacts

📁 Full contract diffs and JSON artifacts are available in the workflow artifacts.

intel352 and others added 2 commits March 9, 2026 19:16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reintroduces previously dropped framework capabilities by adding new core primitives for dynamic configuration reloads, aggregate health reporting, and tenant isolation enforcement, plus associated tests and plan updates.

Changes:

  • Add Reloadable + reload diff/change types and a ReloadOrchestrator with queueing, single-flight, rollback, circuit-breaker, and reload lifecycle events.
  • Add aggregate health types (HealthProvider, HealthStatus, reports) and AggregateHealthService with concurrent fan-out, caching, panic recovery, and health events.
  • Add TenantGuard types + StandardTenantGuard with modes, whitelist, ring-buffer violation history, tenant violation events, and builder options.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
tenant_guard.go Adds TenantGuard interfaces/types and default implementation with violation tracking + event emission.
tenant_guard_test.go Adds unit + concurrency tests for tenant guard modes, whitelist, ring buffer, deep copy.
builder.go Adds tenant guard builder options and attempts to construct a guard during build.
errors.go Adds tenant guard sentinel errors.
observer.go Adds new CloudEvent type constants for tenant violations, config reload lifecycle, and health.
module.go Adds the Reloadable optional module interface.
reload.go Adds reload diff/change models (ConfigDiff, FieldChange, triggers) and helper methods.
reload_orchestrator.go Implements reload orchestrator (queue, CAS single-flight, rollback, circuit breaker, events).
reload_test.go Adds tests for diff helpers and orchestrator behavior (success, rollback, circuit breaker, concurrency, noop).
health.go Adds health models (HealthStatus, HealthProvider, reports) and provider adapters.
health_service.go Implements concurrent aggregate health checks with caching and health event emission.
health_test.go Adds tests for providers, aggregation semantics, caching, panic recovery, events, concurrency.
docs/plans/dynamic-reload.md Updates plan doc to reflect current state and remaining items post-reset.
docs/plans/aggregate-health.md Updates plan doc to reflect current state and remaining items post-reset.
docs/plans/tenant-guard.md Updates plan doc to reflect current state and remaining items post-reset.
docs/plans/bdd-contract-testing.md Updates plan doc to reflect existing infra and remaining contract work.

intel352 and others added 2 commits March 9, 2026 19:23
- Replace dynamic errors.New() with sentinel errors (err113)
- Add StatusUnknown case to exhaustive switch (exhaustive)
- Derive request context from parent context (contextcheck)
- Wrap interface method error return (wrapcheck)
- Fix gofmt alignment in builder.go, contract_verifier.go, observer.go
- Update letsencrypt go.sum for httpserver@v1.12.0 checksum

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reload orchestrator:
- Sort targets by module name for deterministic reload/rollback order
- Make Stop() idempotent with sync.Once, reject requests after stop
- Move noop check before started event to avoid misleading event stream
- Apply default 30s timeout when ReloadTimeout() returns <= 0
- Add doc comment noting application integration is a follow-up

Health service:
- Switch from *log.Logger to modular.Logger for consistent structured logging
- Return deep copies from Check() to prevent cache mutation by callers
- Use NewCloudEvent helper for proper event ID/specversion
- Prefer StatusUnhealthy over StatusUnknown on tie-breaks

Tenant guard:
- Switch from *log.Logger to modular.Logger for consistent structured logging
- Deep-copy whitelist in constructor, convert to set for O(1) lookups
- Use NewCloudEvent helper for proper event ID/specversion
- Use structured logging (Warn with key-value pairs) instead of Printf
- Wire guard into application service registry via builder

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 9, 2026 23:49
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 7 comments.

- Add ErrReloadStopped for stopped orchestrator (distinct from channel full)
- Guard against nil logger with nopLogger fallback in NewReloadOrchestrator
- Fix BenchmarkReload to call processReload directly instead of queue+drain
- Update rollback test to assert deterministic sorted order

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov
Copy link

codecov bot commented Mar 10, 2026

- reload_orchestrator: add recover guard in RequestReload to prevent
  panic on send-to-closed-channel race with Stop()
- builder: use app.RegisterService() instead of direct SvcRegistry map
  mutation for tenant guard registration
- reload_contract_bdd_test: make rollback assertion deterministic now
  that targets are sorted by name

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 10, 2026 00:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 5 comments.

Comment on lines +255 to +276
// worstStatus returns the worse of two health statuses.
// StatusUnknown is treated as StatusUnhealthy for aggregation purposes.
// When both normalize to the same severity, StatusUnhealthy is preferred
// over StatusUnknown.
func worstStatus(a, b HealthStatus) HealthStatus {
ar := normalizeForAggregation(a)
br := normalizeForAggregation(b)
if ar > br {
return a
}
if br > ar {
return b
}
// Tie-break: prefer StatusUnhealthy over StatusUnknown
if a == StatusUnknown && b == StatusUnhealthy {
return b
}
if b == StatusUnknown && a == StatusUnhealthy {
return a
}
return a
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worstStatus is documented to treat StatusUnknown as StatusUnhealthy for aggregation, but it returns the original input status when it “wins” the comparison. For example, worstStatus(StatusHealthy, StatusUnknown) returns StatusUnknown, which allows aggregated readiness/health to remain "unknown" instead of "unhealthy". To match the doc, return a normalized status (or explicitly map Unknown→Unhealthy before returning).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d29ee4a. worstStatus now maps StatusUnknownStatusUnhealthy in the return value, so aggregated health never reports "unknown" — consistent with the documented "treated as Unhealthy" semantics.

reload_test.go Outdated
Comment on lines +267 to +269
// Wait for processing.
time.Sleep(100 * time.Millisecond)

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These reload orchestrator tests rely on fixed time.Sleep(...) delays to wait for async processing. This can be flaky on slower/contended CI runners and can also slow the suite unnecessarily. Prefer deterministic synchronization (e.g., a channel/WaitGroup in the mock Reloadable or polling until a condition/event is observed with a bounded timeout) so the test proceeds as soon as the reload completes.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d29ee4a. Replaced all time.Sleep waits with a waitFor polling helper that checks the condition every 5ms with a 2s timeout. Tests are now deterministic and CI-safe regardless of runner speed.

Comment on lines +157 to +160
diff := rc.newDiff()
rc.reloadErr = rc.orchestrator.RequestReload(rc.ctx, ReloadManual, diff)
time.Sleep(200 * time.Millisecond)
return nil
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These Godog step implementations use time.Sleep(...) to wait for asynchronous reload processing. This can introduce intermittent failures and make the BDD suite sensitive to timing. Consider synchronizing on an explicit completion signal (e.g., wait until the expected event type is observed, with a timeout) instead of sleeping a fixed duration.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d29ee4a. Replaced all time.Sleep calls with bddWaitForEvent and bddWaitForCalls polling helpers that wait for the expected event type or call count with a 2s timeout. No more timing-sensitive assertions.

Comment on lines +124 to +133
// Derive a context from the start context. If the request carries
// a deadline, apply it to the parent context.
rctx := ctx
if req.Ctx != nil {
if deadline, ok := req.Ctx.Deadline(); ok {
dctx, cancel := context.WithDeadline(ctx, deadline)
rctx = dctx
defer cancel()
}
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Start(), the per-request deadline context uses defer cancel() inside the loop. Those defers won’t run until the goroutine exits, so timers/contexts accumulate and can leak resources under sustained reload traffic. Call cancel immediately after processReload returns (or scope the context creation to a helper function) rather than deferring inside the loop.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d29ee4a. Extracted handleReload() method that creates a scoped context with context.WithCancel and uses defer cancel() within the method — context is released immediately after each reload, no accumulation in the loop.

Comment on lines +124 to +133
// Derive a context from the start context. If the request carries
// a deadline, apply it to the parent context.
rctx := ctx
if req.Ctx != nil {
if deadline, ok := req.Ctx.Deadline(); ok {
dctx, cancel := context.WithDeadline(ctx, deadline)
rctx = dctx
defer cancel()
}
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RequestReload accepts a context, but Start() only copies the deadline from req.Ctx and otherwise runs reloads under the Start() parent context. This means request cancellation/values won’t propagate to module Reload calls, which is surprising for callers and makes it impossible to cancel a queued reload. Consider deriving rctx from req.Ctx (e.g., rctx := req.Ctx with a fallback to ctx) and only overriding the parent when needed.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d29ee4a. handleReload() now uses req.Ctx as the base context when provided (falls back to the Start parent context otherwise), so caller cancellation and values fully propagate to module Reload() calls. A context.WithCancel wrapper ensures cleanup on completion.

intel352 and others added 2 commits March 9, 2026 21:19
- reload_orchestrator: extract handleReload() to properly scope context
  lifetime (no defer cancel leak in loop), propagate req.Ctx values and
  cancellation to module Reload calls instead of only copying deadline
- health_service: worstStatus now maps StatusUnknown → StatusUnhealthy
  in output, consistent with documented aggregation behavior
- reload_test: replace all time.Sleep waits with polling waitFor helper
  for deterministic CI-safe synchronization
- reload_contract_bdd_test: replace time.Sleep waits with event/call
  polling helpers (bddWaitForEvent, bddWaitForCalls)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Keep context chain rooted in parentCtx and apply request deadline via
context.WithDeadline instead of swapping the base context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 10, 2026 01:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 6 comments.

Comment on lines +111 to +123
// checkReloadIdempotent calls Reload with empty changes twice and returns an error
// if either call fails.
func (v *StandardContractVerifier) checkReloadIdempotent(module Reloadable) error {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

if err := module.Reload(ctx, nil); err != nil {
return fmt.Errorf("first call: %w", err)
}
if err := module.Reload(ctx, nil); err != nil {
return fmt.Errorf("second call: %w", err)
}
return nil
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkReloadIdempotent calls module.Reload(ctx, nil) directly. If a module violates the contract by ignoring ctx cancellation/timeouts (the very thing this verifier is meant to detect), VerifyReloadContract can hang indefinitely. Consider running Reload in a goroutine and selecting on ctx.Done() so the verifier returns a violation instead of blocking forever.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. checkReloadIdempotent now runs each Reload call in a goroutine with a select on ctx.Done(), so a misbehaving module that ignores context cancellation causes a timeout violation instead of blocking the verifier.

Comment on lines +145 to +147
for range len(providers) {
result := <-ch

Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AggregateHealthService.Check waits for exactly len(providers) results using a blocking receive, but it never selects on ctx.Done(). If a provider blocks or ignores context cancellation, Check can hang indefinitely even after the caller context is cancelled/deadline exceeded. Consider receiving results with a select on ctx.Done() and returning ctx.Err() (or a partial result) when the context is done to prevent stuck callers.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. Check() now uses select on both the result channel and ctx.Done() when collecting provider results. If the caller context is cancelled or times out, Check returns ctx.Err() immediately instead of blocking.

Comment on lines +238 to +242
// Emit event using NewCloudEvent helper (sets ID, specversion, time)
if g.subject != nil {
event := NewCloudEvent(EventTypeTenantViolation, "com.modular.tenant.guard", violation, nil)
_ = g.subject.NotifyObservers(ctx, event)
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidateAccess ignores the error returned by Subject.NotifyObservers. This can silently drop tenant violation events (e.g., misconfigured observers) and makes troubleshooting harder. Consider handling the returned error (at least log it via g.logger when available, or propagate it depending on desired semantics).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. The error from NotifyObservers is now logged via g.logger.Warn() when a logger is configured, instead of being silently discarded.

Comment on lines +14 to +18
// TenantGuardStrict blocks the operation and returns an error on violation.
TenantGuardStrict TenantGuardMode = iota
// TenantGuardLenient logs the violation but allows the operation to proceed.
TenantGuardLenient
// TenantGuardDisabled performs no validation at all.
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The TenantGuardLenient doc comment says it "logs the violation", but logging is conditional on both LogViolations and a non-nil logger being configured. Either update the comment to reflect that logging is best-effort/optional, or ensure a default logger is always set when lenient mode is enabled (e.g., in builder wiring).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. Updated the TenantGuardLenient doc comment to clarify that logging is best-effort and conditional on LogViolations being true and a logger being configured.

Comment on lines +351 to +356
func (rc *ReloadBDDContext) theCircuitBreakerShouldEventuallyReset() error {
// Manually reset the circuit breaker state to simulate backoff elapsed.
rc.orchestrator.cbMu.Lock()
rc.orchestrator.circuitOpen = false
rc.orchestrator.failures = 0
rc.orchestrator.cbMu.Unlock()
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BDD step "the circuit breaker should eventually reset" manually clears orchestrator.circuitOpen/failures, so it doesn't actually validate the backoff/reset behavior implemented in isCircuitOpen/backoffDuration. To make this contract meaningful, simulate elapsed backoff by adjusting lastFailure to sufficiently in the past (or inject a clock) and then assert that RequestReload is accepted and the circuit is reset without directly mutating circuitOpen/failures.

Copilot generated this review using guidance from repository custom instructions.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. The BDD step now simulates elapsed backoff by backdating lastFailure to time.Now().Add(-circuitBreakerMaxDelay - 1s), which exercises the actual isCircuitOpen()/backoffDuration() logic rather than bypassing it.

Comment on lines +134 to +148
// When the request carries a context with a deadline, that deadline is applied to
// the parent context so that both caller deadlines and Start() cancellation are
// respected. Request context values are not propagated to keep the context chain
// rooted in parentCtx (required by contextcheck linter).
func (o *ReloadOrchestrator) handleReload(parentCtx context.Context, req ReloadRequest) {
rctx, cancel := context.WithCancel(parentCtx)
defer cancel()

// If the request carries a deadline, tighten the context with it.
if req.Ctx != nil {
if deadline, ok := req.Ctx.Deadline(); ok {
rctx, cancel = context.WithDeadline(rctx, deadline) //nolint:contextcheck // deadline from request
defer cancel()
}
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handleReload only uses req.Ctx to copy over a deadline; it does not respect req.Ctx cancellation when there is no deadline (or when the caller cancels early). Since RequestReload accepts a ctx, callers may reasonably expect cancellation to abort in-flight reload work. Consider wiring req.Ctx.Done() into rctx cancellation (without propagating values) so both deadlines and explicit cancellation are honored.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in e42a1d4. handleReload now spawns a background goroutine that watches req.Ctx.Done() and cancels the reload context when the request context is cancelled. This ensures both deadlines and explicit cancellation from callers propagate to module Reload() calls.

intel352 and others added 2 commits March 9, 2026 21:42
- contract_verifier: guard checkReloadIdempotent with goroutine+select
  so a misbehaving module cannot block the verifier indefinitely
- health_service: Check now selects on ctx.Done() when collecting
  provider results, returning ctx.Err() on cancellation/timeout
- tenant_guard: log NotifyObservers errors instead of silently dropping;
  update TenantGuardLenient doc to clarify logging is best-effort
- reload_contract_bdd_test: simulate elapsed backoff by backdating
  lastFailure instead of manually clearing circuit breaker state
- reload_orchestrator: propagate req.Ctx cancellation via background
  goroutine watching req.Ctx.Done(), not just deadline

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add ErrReloadTimeout sentinel error, use in contract verifier
- Wrap ctx.Err() in health_service.Check for wrapcheck compliance

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 10, 2026 01:50
@intel352 intel352 merged commit bcb82fb into main Mar 10, 2026
33 checks passed
@intel352 intel352 deleted the feat/reimplementation branch March 10, 2026 01:54
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 3 comments.

Comment on lines +251 to +254
// In strict mode, return error
if g.config.Mode == TenantGuardStrict {
return ErrTenantIsolationViolation
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In strict mode ValidateAccess always returns ErrTenantIsolationViolation, even when the violation type is MissingContext/InvalidContext. Since ErrTenantContextMissing is defined, consider returning that (at least for MissingContext) so callers can distinguish missing tenant context from a true cross-tenant isolation violation.

Copilot uses AI. Check for mistakes.
Comment on lines +132 to +134
go func() {
ch <- result{err: module.Reload(ctx, nil)}
}()
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contract verification currently calls module.Reload without panic recovery (both in the goroutine here and in the direct call in checkReloadRespectsCancel). A panicking Reload implementation will crash the verifier/test run instead of being reported as a contract violation; wrap Reload invocations with recover and surface a violation describing the panic.

Copilot uses AI. Check for mistakes.
Comment on lines +174 to +177
go func() {
reports, err := provider.HealthCheck(ctx)
ch <- result{reports, err}
}()
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerifyHealthContract runs provider.HealthCheck in a goroutine (and again later for cancellation) without panic recovery. A panicking provider will crash the verifier/test suite; add a recover guard and report a contract violation when HealthCheck panics instead of letting the panic escape.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants