
fix: pad vace_input_frames to min temporal size to prevent 3x1x1 kernel underflow#674

Open
livepeer-tessa wants to merge 6 commits into main from
fix/vace-temporal-kernel-underflow

Conversation

@livepeer-tessa
Contributor

Problem

Fixes #673.

VaceEncodingBlock._encode_with_conditioning hard-crashes with a PyTorch convolution error when the input chunk has fewer than num_frame_per_block × vae_temporal_downsample_factor (= 12) frames:

RuntimeError: Calculated padded input size per channel: (2 x 64 x 64).
Kernel size: (3 x 1 x 1). Kernel size can't be greater than actual input size

The WAN VAE encoder contains a 3×1×1 temporal convolution kernel. After temporal downsampling, 8 pixel-space frames become 2 latent-space frames, which is below the kernel's minimum temporal size of 3.

Observed in prod on 2026-03-12 for the streamdiffusionv2 pipeline with VACE conditioning.

Solution

In _encode_with_conditioning, detect when num_frames < min_frames and pad to min_frames by repeating the last input frame (and last mask frame when vace_input_masks is also supplied). A WARNING is logged for observability without crashing.
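A minimal sketch of this guard, assuming `[B, C, T, H, W]` layout for both frames and masks (the standalone helper name and signature are illustrative; the real check lives inside `_encode_with_conditioning`):

```python
import logging

import torch

logger = logging.getLogger(__name__)


def pad_temporal(frames, masks=None, min_frames=12):
    """Pad [B, C, T, H, W] tensors to at least min_frames frames by
    repeating the last frame along the temporal axis (sketch only)."""
    t = frames.shape[2]
    if t >= min_frames:
        return frames, masks
    pad = min_frames - t
    # WARNING keeps short chunks visible in logs without crashing
    logger.warning("vace_input_frames has %d frames; padding to %d", t, min_frames)
    last = frames[:, :, -1:].expand(-1, -1, pad, -1, -1)
    frames = torch.cat([frames, last], dim=2)
    if masks is not None:
        last_mask = masks[:, :, -1:].expand(-1, -1, pad, -1, -1)
        masks = torch.cat([masks, last_mask], dim=2)
    return frames, masks
```

Repeating the last frame (rather than zero-padding) keeps the padded tail visually consistent with the chunk, so the VAE sees plausible content in the extra latent frames.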

Changes

  • src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py: Add temporal underflow guard before VAE encoding in _encode_with_conditioning

Testing

  • Existing tests should still pass (no behaviour change when num_frames >= min_frames)
  • To reproduce locally: pass a vace_input_frames tensor with shape [1, 3, 8, H, W] to streamdiffusionv2 with VACE conditioning enabled. This crashed before the fix; it now pads and logs a WARNING.
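The underlying PyTorch failure can also be reproduced in isolation with a bare Conv3d, without loading the pipeline (the channel count and spatial size here are arbitrary stand-ins, not the VAE's real dimensions):

```python
import torch
import torch.nn as nn


def reproduce_underflow(t: int = 2):
    """Return the RuntimeError raised by a 3x1x1 temporal conv when the
    temporal dim t is below the kernel size, or None if it succeeds."""
    conv = nn.Conv3d(4, 4, kernel_size=(3, 1, 1))
    x = torch.randn(1, 4, t, 8, 8)  # temporal dim t; kernel needs >= 3
    try:
        conv(x)
        return None
    except RuntimeError as err:
        return err


print(reproduce_underflow(2))  # the "Kernel size can't be greater..." error
```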

livepeer-robot added 4 commits March 11, 2026 18:37
Float8DynamicActivationFloat8WeightConfig is not compatible with
torch.compile(fullgraph=False). During warmup on H100 (where compile=True),
AOT autograd's gen_alias_from_base calls aten.as_strided on Float8Tensor
outputs, which is not implemented in torchao:

  NotImplementedError: Float8Tensor dispatch: attempting to run unimplemented
  operator/function: func=<OpOverload(op='aten.as_strided', overload='default')>

The crash manifests specifically after longlive (also FP8) because
torch._dynamo's compile cache is never reset between pipeline switches,
allowing longlive's Float8 dispatch state to persist and influence Krea's
subsequent compile attempt.

Two fixes:

1. krea_realtime_video/pipeline.py: when FP8 quantization is active, skip
   block.compile() — the two optimizations are currently mutually exclusive
   with fullgraph=False. FP8 alone still provides meaningful memory/compute
   savings on H100 without compile.

2. pipeline_manager.py: call torch._dynamo.reset() on every pipeline unload
   to clear stale compiled graphs and Float8 dispatch state, preventing
   cross-pipeline cache pollution.

Fixes #669

Signed-off-by: livepeer-robot <robot@livepeer.org>
…ale-cache recompile

If torch._dynamo.reset() raises during pipeline unload, stale Dynamo/FP8
compile caches remain active in the worker process. Previously the code
swallowed the exception and published pipeline_unloaded unconditionally,
leaving the next krea-realtime-video load free to torch.compile against
those stale caches — re-entering the warmup crash from the FP8→Krea
conflict.

Fix: set self._dynamo_reset_failed = True on reset failure. The Krea load
path now checks this flag and forces compile=False for the lifetime of the
worker, with a clear log warning to restart the process to re-enable
compilation.

Addresses CodeRabbit review comment on PR #670.

Signed-off-by: livepeer-robot <robot@livepeer.org>
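The flag-based fallback described above can be sketched as follows (class and method names are hypothetical, not the repo's actual identifiers; `reset_fn` stands in for `torch._dynamo.reset`):

```python
class PipelineWorkerState:
    """Tracks whether Dynamo caches are known-good for this worker (sketch)."""

    def __init__(self) -> None:
        self._dynamo_reset_failed = False

    def on_unload(self, reset_fn) -> None:
        try:
            reset_fn()
        except Exception:
            # Stale compile caches may survive in this process: disable
            # compilation for the worker's lifetime instead of risking the
            # FP8-vs-compile warmup crash on the next load.
            self._dynamo_reset_failed = True

    def should_compile(self, requested: bool) -> bool:
        # The Krea load path consults this before calling block.compile()
        return requested and not self._dynamo_reset_failed
```

The worker-lifetime scope is deliberate: once a reset fails there is no reliable way to verify the cache state, so only a process restart re-enables compilation.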
…ompile=False

When compile=False, kv_cache_attention_bias was still being set to
DEFAULT_KV_CACHE_ATTENTION_BIAS (0.3), which causes the warmup loop to enter
the flex_attention code path and trigger torch._dynamo tracing even though no
block.compile() call was ever made. This meant the _dynamo_reset_failed guard
in pipeline_manager.py had no effect on the warmup-induced recompilation.

Fix:
- Import KV_CACHE_ATTENTION_BIAS_DISABLED (1.0) from causal_model and use it
  as the initial kv_cache_attention_bias when compile=False. This sentinel
  makes causal_model.py take the standard attention branch and skip the
  flex_attention/torch.compile path entirely.
- Guard the warmup loop behind 'if compile:' — warmup exists solely to prime
  the compiled flex_attention kernel, so it is a no-op (and harmful) when
  compilation is disabled. Log a message when skipped for observability.

Addresses CodeRabbit review comment on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
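The sentinel selection can be sketched as a small helper (the constant names and values come from the commit message above; the function itself is illustrative):

```python
# Sentinel from causal_model: makes the standard attention branch run,
# skipping the flex_attention/torch.compile path entirely.
KV_CACHE_ATTENTION_BIAS_DISABLED = 1.0
DEFAULT_KV_CACHE_ATTENTION_BIAS = 0.3


def initial_kv_cache_attention_bias(compile_enabled: bool) -> float:
    """Pick the starting bias so that compile=False never triggers
    torch._dynamo tracing via the flex_attention code path."""
    if compile_enabled:
        return DEFAULT_KV_CACHE_ATTENTION_BIAS
    return KV_CACHE_ATTENTION_BIAS_DISABLED
```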
The comment at line 230 already specifies ceil(local_attn_size / num_frame_per_block) + 1,
but the implementation was using floor division (//). When local_attn_size is not
evenly divisible by num_frame_per_block, this meant warmup stopped one iteration early,
leaving the cache short of the steady-state shape and triggering a recompile on the
first live request.

Replace with the ceiling equivalent: (a + b - 1) // b to avoid importing math.

Fixes coderabbitai suggestion on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
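The floor-to-ceiling replacement above is the standard integer trick:

```python
def ceil_div(a: int, b: int) -> int:
    # ceiling of a/b using only integer arithmetic (no math.ceil import)
    return (a + b - 1) // b


# warmup iterations: ceil(local_attn_size / num_frame_per_block) + 1
# e.g. local_attn_size=13, num_frame_per_block=4 -> ceil_div(13, 4) + 1 = 5
```

When `a` divides evenly, `a + b - 1` stays below the next multiple of `b`, so the result matches floor division; otherwise it lands in the next multiple, giving floor + 1.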
@coderabbitai

coderabbitai bot commented Mar 13, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


livepeer-robot added 2 commits March 13, 2026 06:21
… underflow

The WAN VAE encoder contains a 3x1x1 temporal convolution kernel.  When
the input chunk has fewer than (num_frame_per_block × vae_temporal_downsample_factor)
frames, the latent temporal dimension after downsampling falls below 3, causing:

  RuntimeError: Calculated padded input size per channel: (2 x 64 x 64).
  Kernel size: (3 x 1 x 1). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-12) for the streamdiffusionv2 pipeline with
VACE conditioning enabled (fal.ai job 10670fc6).

Fix: in _encode_with_conditioning, detect when num_frames < min_frames and
pad to min_frames by repeating the last input frame (and last mask frame
when vace_input_masks is also provided).  A WARNING is emitted so short
chunks remain visible in logs without crashing.

Related: #557 (same block, different axis — spatial width underflow)
Signed-off-by: livepeer-robot <robot@livepeer.org>
Signed-off-by: livepeer-robot <robot@livepeer.org>
@livepeer-tessa livepeer-tessa force-pushed the fix/vace-temporal-kernel-underflow branch from 2c90d83 to b183ecb Compare March 13, 2026 06:21
@github-actions
Contributor

github-actions bot commented Mar 13, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-674--preview
WebSocket wss://fal.run/daydream/scope-pr-674--preview/ws
Commit b183ecb

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-674--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-674--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

livepeer-tessa pushed a commit that referenced this pull request Mar 15, 2026
…derflow

The WAN VAE encoder contains a 3×3 spatial convolution kernel.  When
the input chunk has spatial dimensions < 3 on either axis the forward
pass raises:

  RuntimeError: Calculated padded input size per channel: (2 x 513).
  Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-15, 10:48–10:59 UTC) on krea-realtime-video
pipeline, fal.ai job 5193400c-da0f-4eef-8bdd-dd0fdd26c1db: 2,372 errors
over 11 minutes (~4 errors/second) from an input with height=2 pixels.

Fix: in _encode_with_conditioning, detect when height or width < 3 and
pad to the minimum safe size using F.pad.  The corresponding masks tensor
is also padded to keep shapes consistent.  block_state.height/width are
updated so the downstream resolution check still passes.  A WARNING is
emitted so the unusual input remains visible in logs without a crash.

This is the spatial analogue of the 3×1×1 temporal kernel guard (issue #673,
PR #674).

Fixes #557
Signed-off-by: livepeer-robot <robot@livepeer.org>
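The spatial guard described above can be sketched with `F.pad` as follows (the helper name is hypothetical; replicate padding is assumed here as the spatial analogue of repeating the last frame, and the `block_state.height/width` bookkeeping is omitted):

```python
import torch
import torch.nn.functional as F

MIN_SPATIAL = 3  # lower bound imposed by the 3x3 spatial conv kernel


def pad_spatial(frames, masks=None):
    """Pad [B, C, T, H, W] tensors so H and W are each >= MIN_SPATIAL (sketch)."""
    h, w = frames.shape[-2:]
    pad_h = max(0, MIN_SPATIAL - h)
    pad_w = max(0, MIN_SPATIAL - w)
    if pad_h or pad_w:
        # F.pad on a 5D tensor takes (w_left, w_right, h_top, h_bottom,
        # t_front, t_back); the temporal axis is left untouched here.
        pad = (0, pad_w, 0, pad_h, 0, 0)
        frames = F.pad(frames, pad, mode="replicate")
        if masks is not None:
            # pad the masks too so downstream shapes stay consistent
            masks = F.pad(masks, pad, mode="replicate")
    return frames, masks
```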


Development

Successfully merging this pull request may close these issues.

VaceEncodingBlock temporal kernel underflow in streamdiffusionv2: (2 x 64 x 64) input < (3 x 1 x 1) kernel

1 participant