diff --git a/docs/geneva/jobs/performance.mdx b/docs/geneva/jobs/performance.mdx
index ffb55d8..f7c377e 100644
--- a/docs/geneva/jobs/performance.mdx
+++ b/docs/geneva/jobs/performance.mdx
@@ -1,12 +1,47 @@
 ---
-title: Distributed Job Performance
-sidebarTitle: Performance
+title: Distributed Job Performance and Cluster Sizing
+sidebarTitle: Performance and Sizing
 description: Learn how to tune Geneva distributed job performance by scaling compute resources and balancing write bandwidth.
 icon: chart-line
 ---
 
 When Geneva runs in distributed mode, jobs are deployed against a Kubernetes KubeRay instance that dynamically provisions a Ray cluster. Job execution time depends on sufficient CPU/GPU resources for *computation* and sufficient *write bandwidth* to store the output values. Tuning the performance of a job boils down to configuring the table or cluster resources.
 
+## Geneva defaults
+
+Geneva sets the following defaults when creating a KubeRay cluster via `KubeRayClusterBuilder`. These apply as both Kubernetes requests and limits.
+
+### Head Node
+| Resource | Default |
+| --- | --- |
+| CPU | 4 |
+| Memory | 8 GiB |
+| GPU | 0 |
+| Node Selector | `geneva.lancedb.com/ray-head: true` |
+| Service Account | `geneva-service-account` |
+
+The head node runs the Ray GCS (Global Control Store), the driver task, and the dashboard. It does not run UDF applier actors.
+
+### CPU Workers
+| Resource | Default |
+| --- | --- |
+| CPU | 4 |
+| Memory | 8 GiB |
+| GPU | 0 |
+| Node Selector | `geneva.lancedb.com/ray-worker-cpu: true` |
+| Replicas | 1 (min: 0, max: 100) |
+| Idle Timeout | 60 seconds |
+
+### GPU Workers
+| Resource | Default |
+| --- | --- |
+| CPU | 8 |
+| Memory | 16 GiB |
+| GPU | 1 |
+| Node Selector | `geneva.lancedb.com/ray-worker-gpu: true` |
+| Replicas | 1 (min: 0, max: 100) |
+| Idle Timeout | 60 seconds |
+
 ## Scaling computation resources
 
 Geneva jobs can split and schedule computational work into smaller batches that are assigned to *tasks* which are distributed across the cluster. As each task completes, it writes its output into a checkpoint file. If a job is interrupted or run again, Geneva will look to see if a checkpoint for the computation is already present and if not will kick off computations.
@@ -49,6 +84,48 @@ Typical per-row sizes:
 - Videos: ~10MB–200MB
 - Embeddings: `dimension * data_type_size` bytes (e.g. float32 embeddings use 4 bytes per value, so a 1536-dim embedding is `1536 * 4 = 6144` bytes)
 
+#### Internal Actor Resource Overhead
+In addition to UDF applier actors, each backfill job creates internal actors that consume resources:
+| Component | CPU | Memory | Count |
+| --- | --- | --- | --- |
+| Driver task | 0.1 | — | 1 |
+| JobTracker | 0.1 | 128 MiB | 1 |
+| Writer actors | 0.1 | 1 GiB | 1 per applier |
+| Queue actors | 0 | — | 1 per applier |
+| Applier actors | UDF `num_cpus * intra_applier_concurrency` | UDF memory | `concurrency` |
+
+#### Example
+For a job with `concurrency=4` and `intra_applier_concurrency=2`, where the UDF requests 1 CPU and 1 GiB of memory:
+
+| Component | CPU | Memory | Count |
+| --- | --- | --- | --- |
+| Driver task | 0.1 | — | 1 |
+| JobTracker | 0.1 | 128 MiB | 1 |
+| Writer actors | 0.1 | 1 GiB | 4 |
+| Queue actors | 0 | — | 4 |
+| Applier actors | 1 * 2 = 2 | 1 GiB | 4 |
+| Total | 0.1 + 0.1 + 4 * 0.1 + 4 * 2 = **8.6** | 128 MiB + 4 * 1 GiB + 4 * 1 GiB = **8.125 GiB** | |
+
+### Overall Cluster Sizing
+
+Cluster size varies dramatically with the workload. If you don't yet know how to estimate yours, the following cluster sizes are a reasonable starting point:
+
+| Cluster Size | Head Node | Workers |
+| --- | --- | --- |
+| **Small** (Dev / CI) | 1 CPU, 8 GB | 4 x 2 CPU, 8 GB |
+| **Medium** (Staging) | 2 CPU, 8 GB | 4 x 4 CPU, 8 GB |
+| **Large** (Production) | 4 CPU, 8 GB | 8+ x 8 CPU, 16 GB |
+
+### Validation Thresholds
+
+The cluster builder validates memory configuration and warns or errors on suspicious values:
+
+| Threshold | Value | Behavior |
+| --- | --- | --- |
+| GPU worker minimum memory | `< 4 GiB` | **Error** — build fails if below this |
+| Large memory warning | `> 100 GB` | **Warning** — may exceed K8s node capacity |
+| Memory-per-CPU ratio warning | `> 16 GiB per CPU` | **Warning** — unusual ratio, likely misconfigured |
+
 ### GKE node pools
 
 GKE + KubeRay can autoscale the number of VM nodes on demand. Limitations on the amount of resources provisioned are configured via [node pools](https://cloud.google.com/kubernetes-engine/docs/how-to/node-pools#scale-node-pool). Node pools can be managed to scale vertically (type of machine) or horizontally (# of nodes).
diff --git a/docs/geneva/jobs/troubleshooting.mdx b/docs/geneva/jobs/troubleshooting.mdx
index 187c8b8..a763121 100644
--- a/docs/geneva/jobs/troubleshooting.mdx
+++ b/docs/geneva/jobs/troubleshooting.mdx
@@ -120,6 +120,16 @@ The client could not connect to the Ray cluster (e.g. after starting a KubeRay o
 
 If you see errors about the client already being connected (e.g. when re-running a notebook or script), ensure you're not holding an old Ray client connection. Restart the kernel or process so Ray is re-initialized cleanly. Geneva disconnects the client when the context exits; leaving a context open or reusing a stale connection can cause this.
 
+### Head node out of memory
+
+If the head node is under-provisioned (e.g. 1 CPU / 2 GB), it can OOM when:
+
+- Many workers connect and register with GCS
+- The Ray dashboard accumulates metrics
+- Object store spillover occurs
+
+**Recommendation**: Use at least 4 CPU / 8 GiB for the head node in any non-trivial deployment. This is the current default.
+
 ---
 
 ## Serialization library or `attrs` version mismatch
diff --git a/tests/py/test_geneva_defaults.py b/tests/py/test_geneva_defaults.py
new file mode 100644
index 0000000..83502f1
--- /dev/null
+++ b/tests/py/test_geneva_defaults.py
@@ -0,0 +1,55 @@
+# SPDX-License-Identifier: Apache-2.0
+# SPDX-FileCopyrightText: Copyright The LanceDB Authors
+
+# These tests assert the default resource values for Geneva KubeRay cluster nodes.
+# If any of these values change, update the tables in:
+# docs/geneva/jobs/performance.mdx (the "Geneva defaults" section)
+
+import pytest
+
+
+# TODO: remove skip once geneva 0.12.0 is on PyPI (head node defaults changed to 4 CPU / 8Gi)
+@pytest.mark.skip(reason="requires geneva>=0.12.0")
+def test_head_node_defaults():
+    # If these change, update the Head Node table in performance.mdx
+    from geneva.cluster.builder import KubeRayClusterBuilder
+
+    builder = KubeRayClusterBuilder()
+    assert builder._head_cpus == 4
+    assert builder._head_memory == "8Gi"
+    assert builder._head_gpus == 0
+    assert builder._head_node_selector == {"geneva.lancedb.com/ray-head": "true"}
+    assert builder._head_service_account == "geneva-service-account"
+
+
+def test_cpu_worker_defaults():
+    # If these change, update the CPU Workers table in performance.mdx
+    from geneva.cluster.builder import CpuWorkerBuilder
+
+    worker = CpuWorkerBuilder()
+    assert worker._num_cpus == 4
+    assert worker._memory == "8Gi"
+    assert worker._node_selector == {"geneva.lancedb.com/ray-worker-cpu": "true"}
+    assert worker._replicas == 1
+    assert worker._min_replicas == 0
+    assert worker._max_replicas == 100
+    assert worker._idle_timeout_seconds == 60
+
+    # Confirm build produces 0 GPUs
+    config = worker.build()
+    assert config.num_gpus == 0
+
+
+def test_gpu_worker_defaults():
+    # If these change, update the GPU Workers table in performance.mdx
+    from geneva.cluster.builder import GpuWorkerBuilder
+
+    worker = GpuWorkerBuilder()
+    assert worker._num_cpus == 8
+    assert worker._memory == "16Gi"
+    assert worker._num_gpus == 1
+    assert worker._node_selector == {"geneva.lancedb.com/ray-worker-gpu": "true"}
+    assert worker._replicas == 1
+    assert worker._min_replicas == 0
+    assert worker._max_replicas == 100
+    assert worker._idle_timeout_seconds == 60
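Reviewer note: the sizing arithmetic in the Internal Actor Resource Overhead table this patch adds can be sketched as a small helper. The function name and signature are illustrative only (the constants come from the table, not from the Geneva API):

```python
def estimate_backfill_resources(
    concurrency: int,
    intra_applier_concurrency: int,
    udf_cpus: float,
    udf_memory_gib: float,
) -> tuple[float, float]:
    """Estimate total (CPU, memory in GiB) for one backfill job,
    per the Internal Actor Resource Overhead table (illustrative helper)."""
    # Fixed overhead: driver task (0.1 CPU) + JobTracker (0.1 CPU, 128 MiB)
    cpu = 0.1 + 0.1
    mem = 128 / 1024
    # One writer actor per applier: 0.1 CPU, 1 GiB each
    cpu += 0.1 * concurrency
    mem += 1.0 * concurrency
    # Applier actors: UDF num_cpus * intra_applier_concurrency CPUs,
    # UDF memory each, `concurrency` of them
    cpu += udf_cpus * intra_applier_concurrency * concurrency
    mem += udf_memory_gib * concurrency
    return cpu, mem


# The patch's worked example: concurrency=4, intra_applier_concurrency=2,
# UDF requesting 1 CPU and 1 GiB
cpu, mem = estimate_backfill_resources(4, 2, 1.0, 1.0)
print(round(cpu, 2), mem)  # 8.6 8.125
```

This reproduces the 8.6 CPU / 8.125 GiB totals from the `#### Example` table.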
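The three checks in the Validation Thresholds table can likewise be illustrated in code. This is a hypothetical re-implementation for the docs, not the actual Geneva builder logic; it treats all values uniformly in GiB even though the docs quote the large-memory threshold as "100 GB":

```python
def validate_memory_config(
    memory_gib: float, num_cpus: int, is_gpu_worker: bool
) -> list[str]:
    """Hypothetical sketch of the documented memory-validation thresholds."""
    issues = []
    if is_gpu_worker and memory_gib < 4:
        # Hard error: GPU worker below the 4 GiB minimum fails the build
        issues.append("error: GPU worker memory below 4 GiB minimum")
    if memory_gib > 100:
        issues.append("warning: memory above 100 GB may exceed K8s node capacity")
    if num_cpus > 0 and memory_gib / num_cpus > 16:
        issues.append("warning: more than 16 GiB per CPU, likely misconfigured")
    return issues


print(validate_memory_config(2, 8, is_gpu_worker=True))   # one hard error
print(validate_memory_config(16, 8, is_gpu_worker=True))  # []
```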
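Finally, the per-row embedding size formula quoted in performance.mdx (`dimension * data_type_size`) as a one-liner; the function name is made up for illustration:

```python
def embedding_row_bytes(dimension: int, dtype_size_bytes: int = 4) -> int:
    # Per-row output size: dimension * data_type_size (float32 -> 4 bytes)
    return dimension * dtype_size_bytes


print(embedding_row_bytes(1536))    # 6144, the doc's float32 example
print(embedding_row_bytes(768, 2))  # 1536, e.g. a float16 768-dim embedding
```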