Summary
When a shard is removed from a CHI (e.g. reducing `shardsCount` from 2 to 1), the operator does not clean up ZooKeeper entries for the removed shard's replicated tables. After the reconcile completes, paths like `/clickhouse/{cluster}/tables/{removed_shard_index}/...` remain in ZooKeeper.
This is confirmed by the `test_010014_0` "Remove shard" step, which has been XFAIL'd:

```python
with Then(f"Shard is removed from {self.context.keeper_type}", flags=XFAIL):
    out = clickhouse.query_with_error(
        chi_name,
        f"SELECT count() FROM system.zookeeper WHERE path ='/clickhouse/{cluster}/tables/1/default'",
    )
    assert "DB::Exception: No node" in out or out == "0"
```

The assertion always fails: `count()` returns 1 (or more) instead of 0 / "No node".
Root Cause
Issue 1: dropZKReplicas shard callback is a no-op
In `pkg/controller/chi/worker-deleter.go`, line 64:

```go
func (w *worker) dropZKReplicas(ctx context.Context, cr *api.ClickHouseInstallation) {
	cr.EnsureRuntime().ActionPlan.WalkRemoved(
		func(cluster api.ICluster) {},
		func(shard api.IShard) {}, // ← no-op: shard removal is silently ignored
		func(host *api.Host) {
			_ = w.dropZKReplica(ctx, host, NewDropReplicaOptions().SetRegularDrop())
		},
	)
}
```

When `shardsCount` decreases, `messagediff.DeepDiff` detects the removal at the shard level and adds the `*ChiShard` to `specDiff.Removed`. Since `*ChiShard` implements `IShard`, `shardFunc` is called, which is a no-op. The `hostFunc` callback for individual hosts is never reached.
Issue 2: dropZKReplica uses hosts from the removed shard itself
Even if hosts were enumerated individually, dropZKReplica resolves the "host to run on" as:
```go
if shard := hostToDrop.GetShard(); shard != nil {
	hostToRunOn = shard.FirstHost() // first host of the REMOVED shard
}
```

When an entire shard is removed, `FirstHost()` of the removed shard is itself being deleted. By the time `dropZKReplicas` runs (after `clean()`), those pods are already gone, so the SQL connection fails silently.
Issue 3: SYSTEM DROP REPLICA is wrong tool for whole-shard removal
Even if `hostToRunOn` were a surviving host from another shard, `SYSTEM DROP REPLICA 'host-1-x'` run on shard 0 would not clean shard 1's ZK paths: shard 0's ClickHouse does not have shard 1's tables in its `system.replicas`.
Correct Fix
The proper approach is to run `DROP TABLE IF EXISTS ... SYNC` on the removed shard's hosts before `clean()` deletes their pods. This is the same mechanism used during full CHI deletion (`deleteTables` / `HostDropTables`), and it correctly:
- Removes the replica's ZK entries
- If it's the last replica, cleans the entire table ZK subtree
Concretely, in `worker-reconciler-chi.go`, before `w.clean(ctx, new)`:

```go
w.dropTablesFromRemovedHosts(ctx, new) // NEW: drop tables on removed hosts while pods still running
w.clean(ctx, new)
// ...
w.dropZKReplicas(ctx, new)
```

Here `dropTablesFromRemovedHosts` walks `ActionPlan.WalkRemoved` and calls `deleteTables()` on each host in removed shards (respecting the existing `HostCanDeleteAllPVCs` guard for `reclaimPolicy: Retain`).
Additionally, `dropZKReplicas` should implement the `shardFunc` callback to handle removed shards explicitly.
Impact
- ZooKeeper accumulates stale table paths every time a shard is scaled down
- If the same shard index is later re-added, it may see stale ZK metadata
- `SYSTEM DROP REPLICA` is never logged for shard removal (confirmed by the query-log check in `test_010014_0`)
Related
- `test_010014_0` "Remove shard" step XFAILs:
  - "Shard is removed from zookeeper"
  - "system.query_log should contain SYSTEM DROP REPLICA statements"
- Full CHI deletion (`deleteCHIProtocol`) does NOT have this issue: it explicitly calls `deleteTables` on every host before deleting K8s resources
- Single-replica removal (scaling replicas 2→1 within a shard) also has a related issue: `dropZKReplica` correctly gets `shard.FirstHost()` as the surviving replica, but that only works when at least one replica remains in the shard