fix(vace): guard against sub-minimum spatial dims in WAN VAE encoder #714
livepeer-tessa wants to merge 1 commit into main
The WAN VAE encoder has 3 spatial downsample stages, each implemented as ZeroPad2d(0,1,0,1) + Conv2d(kernel=3, stride=2, padding=0). Any spatial dimension < 8px causes a RuntimeError at the second or third stage:

Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size

This was observed 500+ times in 2 min in krea-realtime-video (job 5193400c-da0f-4eef-8bdd-dd0fdd26c1db), crashing every chunk for the affected session.

Changes in vace/utils/encoding.py:
- Add MIN_VAE_SPATIAL_SIZE = 8 constant with derivation comment
- Add _check_spatial_size() helper that raises a descriptive ValueError early (before the cryptic PyTorch kernel-size error) for main frames
- Add a warning+skip guard for reference images: since refs are supplementary, we degrade gracefully (no ref conditioning for that chunk) rather than aborting the session
- Nest the mask-channel padding and latent concatenation inside the else branch so ref_latent_batch is never accessed on the skip path
- Add logging import and module-level logger

Distinct from #673 (temporal kernel underflow in streamdiffusionv2).

Fixes #713

Signed-off-by: livepeer-robot <robot@livepeer.org>
✅ E2E Tests passed
Test Artifacts: check the workflow run for screenshots.
Summary
Fixes #713 (krea-realtime-video / WAN2.1 VAE: Conv2d crashes with (2 × 513) input < (3 × 3) spatial kernel in VaceEncodingBlock).

Root Cause
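The WAN VAE encoder has 3 spatial downsample stages (DS1 to DS3), each implemented as ZeroPad2d(0,1,0,1) followed by Conv2d(kernel=3, stride=2, padding=0).
Tracing the minimum viable height through all three stages: each stage pads H to H+1, the 3 × 3 kernel needs a padded size of at least 3 (so H ≥ 2), and the stage outputs floor(H/2). An 8px input therefore shrinks 8 → 4 → 2 → 1 and passes every stage.
An input with H=2 survives DS1 (output height 1) but crashes at DS2, where the padded height is only 2, with PyTorch's opaque:
Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3). Kernel size can't be greater than actual input size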
The minimum safe spatial input is 8px in both dimensions (derived by working backwards through all 3 stages to ensure each padded intermediate is ≥ 3).
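The derivation can be checked with a few lines of arithmetic. The sketch below models each stage only by the pad-then-convolve size rule described above; it does not touch the real encoder, and the function name first_failing_stage is made up for illustration. Under this model a 1px dimension would already fail at the first stage, 2 to 3px at the second, and 4 to 7px at the third.

```python
from typing import Optional

# Model of one WAN VAE spatial downsample stage as described in this PR:
# one-sided zero padding (+1px), then Conv2d(kernel=3, stride=2, padding=0).
KERNEL, STRIDE, NUM_STAGES = 3, 2, 3


def first_failing_stage(size: int) -> Optional[int]:
    """Return the 1-based stage where `size` underflows the kernel, or None."""
    for stage in range(1, NUM_STAGES + 1):
        padded = size + 1  # ZeroPad2d adds a single row/column on one side
        if padded < KERNEL:
            return stage  # PyTorch would raise its kernel-size RuntimeError here
        size = (padded - KERNEL) // STRIDE + 1  # standard conv output-size formula
    return None


if __name__ == "__main__":
    for h in (1, 2, 4, 7, 8):
        print(f"H={h}: first failing stage = {first_failing_stage(h)}")
```

The smallest size for which the function returns None is 8, which agrees with the MIN_VAE_SPATIAL_SIZE constant introduced by this PR.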
The crash was triggered at line 123 of vace/utils/encoding.py when encoding reference images with vae.encode_to_latent(refs_stacked, use_cache=False).

Changes
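src/scope/core/pipelines/wan2_1/vace/utils/encoding.py
- Add MIN_VAE_SPATIAL_SIZE = 8 constant with derivation comment
- Add _check_spatial_size() helper that raises a descriptive ValueError early (before the cryptic PyTorch kernel-size error) for main frame encoding paths
- Add a warning+skip guard for reference images: since refs are supplementary, we degrade gracefully (no ref conditioning for that chunk) rather than aborting the session
- Nest the mask-channel padding and latent concatenation inside the else branch so ref_latent_batch is never accessed on the skip path (was previously a latent NameError risk)
- Add import logging and module-level logger

Behaviour After Fix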
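Main frames below the minimum size: ValueError surfaced immediately with resolution guidance (vs. cryptic PyTorch kernel message after several encoder stages).
Reference images below the minimum size: a warning is logged and ref conditioning is skipped for that chunk, so the session continues.

Testing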
This fix is defensive (no new code paths in the happy path). Verified with:
python3 -c "import ast; ast.parse(open('src/scope/core/pipelines/wan2_1/vace/utils/encoding.py').read()); print('ok')" ✅

Reviewers: @mjh1 @emranemran