## Describe the bug
When upgrading ClickHouse from one version to another (e.g., 25.3 → 25.8) while simultaneously adding version-specific settings, the pods enter CrashLoopBackOff and never recover without manual intervention.
The root cause is the reconciliation order introduced in v0.24.3:

1. ConfigMaps are updated first with the new version-specific settings.
2. The operator attempts `SYSTEM SHUTDOWN` (a software restart) on the existing pod.
3. The pod restarts with the OLD image but mounts the NEW ConfigMap.
4. ClickHouse crashes because the old version doesn't recognize the new settings.
5. The operator gets stuck in `waitHostIsReady()` waiting for the unhealthy host.
6. The StatefulSet update (which would apply the new image) is never reached.
This is a regression from v0.24.3, where the restart behavior was changed:

> "Changed a way ClickHouse is restarted in order to pickup server configuration change. Instead of pod re-creation it tries `SYSTEM SHUTDOWN` first."
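The failure mode can be reduced to a toy simulation (all names here — `host`, `restart`, `knownSettings` — are illustrative, not the operator's actual code): applying the new ConfigMap before the StatefulSet update leaves the old binary rejecting the new setting, so the host never becomes healthy and the image bump is never reached.

```go
package main

import "fmt"

// Illustrative model only, not the operator's real types: a pod that mounts
// the latest ConfigMap on restart but keeps its StatefulSet image.
type host struct {
	image    string   // image the pod currently runs
	settings []string // settings from the mounted ConfigMap
	healthy  bool
}

// knownSettings models which settings each server version understands.
var knownSettings = map[string]map[string]bool{
	"25.3": {},
	"25.8": {"write_marks_for_substreams_in_compact_parts": true},
}

// restart mimics SYSTEM SHUTDOWN: the container comes back with the same
// image but the newest ConfigMap contents.
func (h *host) restart() {
	h.healthy = true
	for _, s := range h.settings {
		if !knownSettings[h.image][s] {
			h.healthy = false // DB::Exception: Unknown setting '...'
		}
	}
}

func main() {
	h := &host{image: "25.3", healthy: true}

	// v0.24.3+ order: 1) ConfigMap, 2) software restart, 3) StatefulSet.
	h.settings = []string{"write_marks_for_substreams_in_compact_parts"}
	h.restart()
	fmt.Println("healthy after restart:", h.healthy) // prints "healthy after restart: false"

	// Step 3 (the image bump) never runs: the operator blocks in
	// waitHostIsReady() on the unhealthy host.
	if h.healthy {
		h.image = "25.8"
	}
	fmt.Println("image after reconcile:", h.image) // prints "image after reconcile: 25.3"
}
```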
## To Reproduce

1. Deploy a ClickHouse cluster with operator v0.25.6 (or any version >= 0.24.3).
2. Use ClickHouse version 25.3.
3. Update the CHI to:
   - change the image to ClickHouse 25.8
   - add a setting that only exists in 25.5+ (e.g., `write_marks_for_substreams_in_compact_parts`)
4. Apply the updated CHI.
5. Observe the pods entering `CrashLoopBackOff`.
Example CHI change:

```yaml
spec:
  configuration:
    settings:
      merge_tree:
        # This MergeTree setting is only valid in ClickHouse 25.5+
        write_marks_for_substreams_in_compact_parts: 0
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:25.8.16.34 # Changed from 25.3
```

## Expected behavior
When an image change is detected, the operator should:

1. Update the StatefulSet with the new image first (or skip the software restart for image changes)
2. Wait for the new pod to be running with the new image
3. Then apply ConfigMap changes

Alternatively, the operator should detect that an image change requires a full pod replacement and skip the `SYSTEM SHUTDOWN` software restart attempt.
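A minimal sketch of that guard (the function and step names are hypothetical, not the operator's actual API): choose the step order based on whether the reconcile diff includes an image change.

```go
package main

import "fmt"

// reconcileSteps sketches the ordering proposed above, not the operator's
// actual code: an image change forces the StatefulSet update (full pod
// replacement) before any ConfigMap change; without an image change, the
// v0.24.3 fast path (config + SYSTEM SHUTDOWN) remains safe.
func reconcileSteps(imageChanged bool) []string {
	if imageChanged {
		return []string{"updateStatefulSet", "waitNewPodReady", "applyConfigMap"}
	}
	return []string{"applyConfigMap", "systemShutdownRestart", "waitHostIsReady"}
}

func main() {
	fmt.Println(reconcileSteps(true))  // [updateStatefulSet waitNewPodReady applyConfigMap]
	fmt.Println(reconcileSteps(false)) // [applyConfigMap systemShutdownRestart waitHostIsReady]
}
```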
## Actual behavior

1. The operator detects the image change correctly (visible in the logs).
2. ConfigMaps are updated with the new settings (including version-specific ones).
3. The operator sends `SYSTEM SHUTDOWN` to ClickHouse.
4. Kubernetes restarts the container with the same old image (the StatefulSet has not been updated yet).
5. ClickHouse 25.3 fails to start due to the unrecognized MergeTree setting:
   ```
   Code: 137. DB::Exception: Unknown setting 'write_marks_for_substreams_in_compact_parts'.
   ```
6. The pod enters `CrashLoopBackOff`.
7. The operator is stuck in `waitHostIsReady()` indefinitely.
8. StatefulSet reconciliation (with the new image) is never executed.
## Workaround

Manual recovery:

```shell
kubectl rollout restart deployment clickhouse-operator -n <namespace>
```

This triggers a fresh reconciliation that properly updates the StatefulSet.

Prevention:

Perform the upgrade in two phases:

1. First: update the image only (no new settings).
2. Second: add the version-specific settings after the pods are running the new image.
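Assuming the same CHI as in the reproduction above, the two phases could look like this (a sketch showing only the changed fields):

```yaml
# Phase 1: bump the image only; no new settings yet.
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:25.8.16.34
---
# Phase 2: after all pods report the new version, add the setting.
spec:
  configuration:
    settings:
      merge_tree:
        write_marks_for_substreams_in_compact_parts: 0
```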
## Operator logs

Change detection working correctly:

```
I0217 09:59:32.115253       1 worker-reconciler-chi.go:182] logSWVersion():Host:0-0[0/0]:default/clickhouse:Host software version: 0-0 25.8.16[25.8.16.34/parsed from the tag: '25.8.16.34']
diff item [14]:'.Templates.PodTemplates[0].Spec.Containers[0].Image' = '"clickhouse/clickhouse-server:25.8.16.34"'
```

Stuck waiting for the unhealthy host:

```
I0217 09:59:XX.XXXXXX       1 worker.go:XXX] waitHostIsReady(): Waiting for host to be ready...
[Repeated indefinitely - the host never becomes ready due to CrashLoopBackOff]
```
## Environment

| Component | Version |
|---|---|
| Operator | 0.25.6 |
| Previous operator | 0.24.2 (worked correctly) |
| ClickHouse (before) | 25.3 |
| ClickHouse (after) | 25.8 |
| Kubernetes | 1.34 |
| Installation method | Helm |
## Additional context

- This issue was introduced in v0.24.3 with the `SYSTEM SHUTDOWN` optimization.
- v0.24.2 and earlier did not have this issue because they always performed a full pod replacement.
- The issue only manifests when upgrading the image AND adding version-specific settings simultaneously.