fix: pad vace_input_frames to min temporal size to prevent 3x1x1 kernel underflow#674
Open
livepeer-tessa wants to merge 6 commits into main from
Conversation
added 4 commits
March 11, 2026 18:37
Float8DynamicActivationFloat8WeightConfig is not compatible with torch.compile(fullgraph=False). During warmup on H100 (where compile=True), AOT autograd's gen_alias_from_base calls aten.as_strided on Float8Tensor outputs, which is not implemented in torchao:

NotImplementedError: Float8Tensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.as_strided', overload='default')>

The crash manifests specifically after longlive (also FP8) because torch._dynamo's compile cache is never reset between pipeline switches, allowing longlive's Float8 dispatch state to persist and influence Krea's subsequent compile attempt.

Two fixes:

1. krea_realtime_video/pipeline.py: when FP8 quantization is active, skip block.compile() — the two optimizations are currently mutually exclusive with fullgraph=False. FP8 alone still provides meaningful memory/compute savings on H100 without compile.
2. pipeline_manager.py: call torch._dynamo.reset() on every pipeline unload to clear stale compiled graphs and Float8 dispatch state, preventing cross-pipeline cache pollution.

Fixes #669

Signed-off-by: livepeer-robot <robot@livepeer.org>
…ale-cache recompile

If torch._dynamo.reset() raises during pipeline unload, stale Dynamo/FP8 compile caches remain active in the worker process. Previously the code swallowed the exception and published pipeline_unloaded unconditionally, leaving the next krea-realtime-video load free to torch.compile against those stale caches — re-entering the warmup crash from the FP8→Krea conflict.

Fix: set self._dynamo_reset_failed = True on reset failure. The Krea load path now checks this flag and forces compile=False for the lifetime of the worker, with a clear log warning to restart the process to re-enable compilation.

Addresses CodeRabbit review comment on PR #670.

Signed-off-by: livepeer-robot <robot@livepeer.org>
…ompile=False

When compile=False, kv_cache_attention_bias was still being set to DEFAULT_KV_CACHE_ATTENTION_BIAS (0.3), which causes the warmup loop to enter the flex_attention code path and trigger torch._dynamo tracing even though no block.compile() call was ever made. This meant the _dynamo_reset_failed guard in pipeline_manager.py had no effect on the warmup-induced recompilation.

Fix:

- Import KV_CACHE_ATTENTION_BIAS_DISABLED (1.0) from causal_model and use it as the initial kv_cache_attention_bias when compile=False. This sentinel makes causal_model.py take the standard attention branch and skip the flex_attention/torch.compile path entirely.
- Guard the warmup loop behind 'if compile:' — warmup exists solely to prime the compiled flex_attention kernel, so it is a no-op (and harmful) when compilation is disabled. Log a message when skipped for observability.

Addresses CodeRabbit review comment on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
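The sentinel logic described in this commit can be sketched in isolation. The constant names mirror the commit message; the helper functions around them are hypothetical.

```python
# Minimal sketch of the bias-sentinel and warmup guard; the constants
# mirror the commit message, the helpers are illustrative only.
DEFAULT_KV_CACHE_ATTENTION_BIAS = 0.3   # routes into the flex_attention path
KV_CACHE_ATTENTION_BIAS_DISABLED = 1.0  # sentinel: standard attention branch


def initial_attention_bias(compile_enabled: bool) -> float:
    # With compile disabled there is no compiled flex_attention kernel
    # to prime, so the sentinel keeps warmup from triggering Dynamo tracing.
    if compile_enabled:
        return DEFAULT_KV_CACHE_ATTENTION_BIAS
    return KV_CACHE_ATTENTION_BIAS_DISABLED


def maybe_warmup(compile_enabled: bool, run_warmup) -> bool:
    # Warmup exists solely to prime the compiled kernel; skip it otherwise
    # and return whether it ran, so the skip can be logged.
    if compile_enabled:
        run_warmup()
        return True
    return False
```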
The comment at line 230 already specifies ceil(local_attn_size / num_frame_per_block) + 1, but the implementation was using floor division (//). When local_attn_size is not evenly divisible by num_frame_per_block, this meant warmup stopped one iteration early, leaving the cache short of the steady-state shape and triggering a recompile on the first live request.

Replace with the ceiling equivalent: (a + b - 1) // b to avoid importing math.

Fixes coderabbitai suggestion on PR #671.

Signed-off-by: livepeer-robot <robot@livepeer.org>
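The floor-vs-ceiling difference is easy to see with concrete numbers. The function name below is illustrative; the parameter names come from the commit message.

```python
# ceil(local_attn_size / num_frame_per_block) + 1 without importing math,
# using the (a + b - 1) // b ceiling-division identity.
def warmup_iterations(local_attn_size: int, num_frame_per_block: int) -> int:
    return (local_attn_size + num_frame_per_block - 1) // num_frame_per_block + 1


# Example: local_attn_size=21, num_frame_per_block=4.
# Old floor version: 21 // 4 + 1 = 6, one iteration short.
# Ceiling version:   (21 + 3) // 4 + 1 = 7, matching the comment's formula.
```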
added 2 commits
March 13, 2026 06:21
… underflow

The WAN VAE encoder contains a 3x1x1 temporal convolution kernel. When the input chunk has fewer than (num_frame_per_block × vae_temporal_downsample_factor) frames, the latent temporal dimension after downsampling falls below 3, causing:

RuntimeError: Calculated padded input size per channel: (2 x 64 x 64). Kernel size: (3 x 1 x 1). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-12) for the streamdiffusionv2 pipeline with VACE conditioning enabled (fal.ai job 10670fc6).

Fix: in _encode_with_conditioning, detect when num_frames < min_frames and pad to min_frames by repeating the last input frame (and last mask frame when vace_input_masks is also provided). A WARNING is emitted so short chunks remain visible in logs without crashing.

Related: #557 (same block, different axis — spatial width underflow)

Signed-off-by: livepeer-robot <robot@livepeer.org>
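A minimal sketch of the padding described here, assuming a [B, C, T, H, W] tensor layout; the helper name is hypothetical and the actual guard lives inside _encode_with_conditioning.

```python
# Hedged sketch: pad a short frame chunk to min_frames along the temporal
# axis (dim 2) by repeating the last frame, as the commit describes.
import torch


def pad_temporal(frames: torch.Tensor, min_frames: int) -> torch.Tensor:
    num_frames = frames.shape[2]
    if num_frames >= min_frames:
        return frames  # no-op on sufficiently long chunks
    # Repeat the last frame to fill the gap; expand avoids a data copy
    # until torch.cat materializes the result.
    last = frames[:, :, -1:].expand(-1, -1, min_frames - num_frames, -1, -1)
    return torch.cat([frames, last], dim=2)
```

The same repeat-last-frame trick would apply to the masks tensor when vace_input_masks is provided, keeping both shapes consistent.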
force-pushed from 2c90d83 to b183ecb
Contributor
🚀 fal.ai Preview Deployment
Testing

Connect to this preview deployment by running this on your branch:

🧪 E2E tests will run automatically against this deployment.
Contributor
✅ E2E Tests passed
Test Artifacts

Check the workflow run for screenshots.
livepeer-tessa pushed a commit that referenced this pull request on Mar 15, 2026
…derflow

The WAN VAE encoder contains a 3×3 spatial convolution kernel. When the input chunk has spatial dimensions < 3 on either axis the forward pass raises:

RuntimeError: Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-15, 10:48–10:59 UTC) on the krea-realtime-video pipeline, fal.ai job 5193400c-da0f-4eef-8bdd-dd0fdd26c1db: 2,372 errors over 11 minutes (~4 errors/second) from an input with height=2 pixels.

Fix: in _encode_with_conditioning, detect when height or width < 3 and pad to the minimum safe size using F.pad. The corresponding masks tensor is also padded to keep shapes consistent. block_state.height/width are updated so the downstream resolution check still passes. A WARNING is emitted so the unusual input remains visible in logs without a crash.

This is the spatial analogue of the 3×1×1 temporal kernel guard (issue #673, PR #674).

Fixes #557

Signed-off-by: livepeer-robot <robot@livepeer.org>
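The spatial guard can be sketched along the same lines, again assuming a [B, C, T, H, W] layout; the helper name and the choice of replicate padding are assumptions, with F.pad being the real API the commit names.

```python
# Hedged sketch of the spatial guard: pad height/width up to the 3x3
# kernel minimum using F.pad (replicate mode, so padded pixels copy edges).
import torch
import torch.nn.functional as F

MIN_SPATIAL = 3  # the 3x3 conv's spatial extent


def pad_spatial(frames: torch.Tensor) -> torch.Tensor:
    h, w = frames.shape[-2], frames.shape[-1]
    pad_h = max(0, MIN_SPATIAL - h)
    pad_w = max(0, MIN_SPATIAL - w)
    if pad_h == 0 and pad_w == 0:
        return frames  # common case: no-op
    # For 5D input F.pad expects 6 values padding the last three dims:
    # (W_left, W_right, H_top, H_bottom, T_front, T_back).
    return F.pad(frames, (0, pad_w, 0, pad_h, 0, 0), mode="replicate")
```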
This was referenced Mar 15, 2026
Problem
Fixes #673.
`VaceEncodingBlock._encode_with_conditioning` hard-crashes with a PyTorch convolution error when the input chunk has fewer than `num_frame_per_block × vae_temporal_downsample_factor` (= 12) frames. The WAN VAE encoder contains a 3×1×1 temporal convolution kernel: 8 pixel-space frames → 2 latent-space frames, which is below the kernel's minimum of 3.
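The arithmetic behind the crash can be checked with plain numbers. The constants below are inferred from the "= 12" and "8 → 2" figures above (12 = 3 × 4) and should be treated as assumptions, not values read from the source.

```python
# Assumed constants, inferred from the PR description (12 = 3 * 4).
NUM_FRAME_PER_BLOCK = 3
VAE_TEMPORAL_DOWNSAMPLE_FACTOR = 4
TEMPORAL_KERNEL_SIZE = 3  # the 3x1x1 conv's temporal extent

min_frames = NUM_FRAME_PER_BLOCK * VAE_TEMPORAL_DOWNSAMPLE_FACTOR  # 12


def latent_temporal_size(num_frames: int) -> int:
    # Simplified downsampling model: one latent frame per factor-sized group.
    return num_frames // VAE_TEMPORAL_DOWNSAMPLE_FACTOR


print(latent_temporal_size(8))           # 2 -> below the 3-wide kernel: crash
print(latent_temporal_size(min_frames))  # 3 -> safe after padding
```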
Observed in prod on 2026-03-12 for the `streamdiffusionv2` pipeline with VACE conditioning.

Solution
In `_encode_with_conditioning`, detect when `num_frames < min_frames` and pad to `min_frames` by repeating the last input frame (and last mask frame when `vace_input_masks` is also supplied). A `WARNING` is logged for observability without crashing.

Changes
- `src/scope/core/pipelines/wan2_1/vace/blocks/vace_encoding.py`: Add temporal underflow guard before VAE encoding in `_encode_with_conditioning`

Testing
- Verified no change on the normal path (`num_frames >= min_frames`)
- Fed a `vace_input_frames` tensor with shape `[1, 3, 8, H, W]` to `streamdiffusionv2` with VACE conditioning enabled — was crashing before, now pads silently.