fix: pad vace_input_frames to min spatial size to avoid 3×3 kernel underflow #696
Open
livepeer-tessa wants to merge 1 commit into main
…derflow

The WAN VAE encoder contains a 3×3 spatial convolution kernel. When the input chunk has spatial dimensions < 3 on either axis, the forward pass raises:

    RuntimeError: Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Observed in prod logs (2026-03-15, 10:48–10:59 UTC) on the krea-realtime-video pipeline, fal.ai job 5193400c-da0f-4eef-8bdd-dd0fdd26c1db: 2,372 errors over 11 minutes (~4 errors/second) from an input with height=2 pixels.

Fix: in _encode_with_conditioning, detect when height or width < 3 and pad to the minimum safe size using F.pad. The corresponding masks tensor is also padded to keep shapes consistent. block_state.height/width are updated so the downstream resolution check still passes. A WARNING is emitted so the unusual input remains visible in logs without a crash.

This is the spatial analogue of the 3×1×1 temporal kernel guard (issue #673, PR #674).

Fixes #557

Signed-off-by: livepeer-robot <robot@livepeer.org>
Contributor
🚀 fal.ai Preview Deployment
Testing: E2E tests will run automatically against this deployment.
Contributor
✅ E2E Tests passed
Test Artifacts: Check the workflow run for screenshots.
Summary
Fixes #557
Guards VaceEncodingBlock._encode_with_conditioning against inputs whose spatial dimensions are below the WAN VAE's 3×3 convolution minimum. This is the spatial analogue of the temporal guard in PR #674 / issue #673.
Root Cause
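The WAN VAE encoder has a 3×3 spatial convolution kernel in its first layer. When vace_input_frames has height or width < 3 pixels, PyTorch raises:

    RuntimeError: Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

Observed in Prod (2026-03-15)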
- Pipeline: krea-realtime-video
- github_f1lhgmk5v76a0ev1w0u378by-scope-app--prod
- fal.ai job: 5193400c-da0f-4eef-8bdd-dd0fdd26c1db
Fix
In _encode_with_conditioning, after extracting (batch, channels, frames, height, width) from vace_input_frames:
- If height < 3 or width < 3, pad to the minimum safe size using F.pad
- vace_input_masks (if provided) is padded to match
- block_state.height/width are updated so the downstream resolution assertion still passes
- A WARNING is emitted so the unusual input remains visible in logs
Testing
The fix is a purely defensive guard: normal inputs (height ≥ 3, width ≥ 3) are completely unaffected. Abnormal inputs (< 3 on either axis) now produce a warning instead of crashing.
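For reviewers, here is a minimal sketch of the guard described under Fix. It is not the code in this PR. The helper names (spatial_pad_amounts, pad_to_min_spatial), zero-valued constant padding, padding on the bottom/right edges, and masks sharing the frames' height and width are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)

MIN_SPATIAL = 3  # a 3x3 kernel needs at least 3 rows and 3 columns


def spatial_pad_amounts(height, width, min_size=MIN_SPATIAL):
    """Return (pad_h, pad_w): extra rows/columns needed to reach min_size."""
    return max(0, min_size - height), max(0, min_size - width)


def pad_to_min_spatial(frames, masks=None, min_size=MIN_SPATIAL):
    """Pad a (batch, channels, frames, height, width) tensor up to min_size.

    Hypothetical helper; returns (frames, masks, height, width) so the
    caller can update its own height/width bookkeeping.
    """
    # Imported lazily so spatial_pad_amounts stays usable without PyTorch.
    import torch.nn.functional as F

    height, width = frames.shape[-2:]
    pad_h, pad_w = spatial_pad_amounts(height, width, min_size)
    if pad_h == 0 and pad_w == 0:
        return frames, masks, height, width  # normal inputs pass through untouched

    logger.warning(
        "vace_input_frames spatial size %dx%d is below %d; padding by (%d, %d)",
        height, width, min_size, pad_h, pad_w,
    )
    # F.pad lists pads from the last dimension backwards:
    # (left, right, top, bottom) covers width, then height.
    pad = (0, pad_w, 0, pad_h)
    frames = F.pad(frames, pad)  # default mode="constant", value=0 (assumed choice)
    if masks is not None:
        masks = F.pad(masks, pad)  # assumes masks share the frames' H and W
    return frames, masks, height + pad_h, width + pad_w
```

With the shape from the error message, spatial_pad_amounts(2, 513) returns (1, 0), so one row is added and the width is left alone.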
Related: PR #674 (temporal kernel guard, same block)
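The spatial and temporal guards protect against the same condition: a convolution kernel that is larger than the (padded) input along some axis. A small sketch of that check follows, using the shapes from the error message. kernel_fits is a hypothetical helper for illustration, not part of PyTorch or this repository.

```python
def kernel_fits(padded_size, kernel_size):
    """True if no kernel dimension exceeds the matching padded input dimension."""
    return all(k <= n for n, k in zip(padded_size, kernel_size))


# Shape reported in the RuntimeError: padded input (2 x 513), kernel (3 x 3).
print(kernel_fits((2, 513), (3, 3)))  # False: height 2 < kernel height 3

# After padding the height to 3 the kernel fits.
print(kernel_fits((3, 513), (3, 3)))  # True

# The temporal analogue: a 3x1x1 kernel over an illustrative 2-frame chunk.
print(kernel_fits((2, 8, 8), (3, 1, 1)))  # False: 2 frames < kernel depth 3
```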