
fix(vace): guard against sub-minimum spatial dims in WAN VAE encoder#714

Open
livepeer-tessa wants to merge 1 commit into main from fix/713-vace-encoding-min-spatial-size

Conversation

@livepeer-tessa
Contributor

Summary

Fixes #713: krea-realtime-video / WAN2.1 VAE: Conv2d crashes with (2 × 513) input < (3 × 3) spatial kernel in VaceEncodingBlock.


Root Cause

The WAN VAE encoder has 3 spatial downsample stages, each implemented as:

nn.Sequential(nn.ZeroPad2d((0, 1, 0, 1)), nn.Conv2d(dim, dim, 3, stride=(2, 2)))

Tracing the minimum viable height through all three stages:

Stage   Input H   After ZeroPad   After Conv2d(3, stride=2)
DS1     2         3               1
DS2     1         2               CRASH (2 < 3)

An input with H=2 survives DS1 but crashes at DS2 with PyTorch's opaque error:

RuntimeError: Calculated padded input size per channel: (2 x 513).
Kernel size: (3 x 3). Kernel size can't be greater than actual input size

The minimum safe spatial input is 8px in both dimensions (derived by working backwards through all 3 stages to ensure each padded intermediate is ≥ 3).
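Under the stage structure above, each downsample maps H to ((H + 1) − 3) // 2 + 1 and requires the padded size H + 1 to be at least the kernel size 3. A small arithmetic sketch (no PyTorch needed) confirms both the crash trace and the 8px minimum:

```python
# Trace a spatial dimension through the WAN VAE's three downsample stages:
# each stage is ZeroPad2d((0, 1, 0, 1)) followed by Conv2d(kernel=3, stride=2,
# padding=0), so the padded size must be >= 3 and the output is (padded - 3) // 2 + 1.

def trace(h, stages=3):
    """Return the final size, or None if a stage would crash (padded < kernel)."""
    for _ in range(stages):
        padded = h + 1              # ZeroPad2d adds 1 pixel on one side
        if padded < 3:              # Conv2d kernel can't exceed the input
            return None
        h = (padded - 3) // 2 + 1   # stride-2 conv output size
    return h

assert trace(2) is None   # H=2: DS1 -> 1, then DS2 pads to 2 < 3 and crashes
assert trace(7) is None   # H=7: 7 -> 3 -> 1, then DS3 pads to 2 < 3
assert trace(8) == 1      # H=8: 8 -> 4 -> 2 -> 1, the minimum safe input
assert all(trace(h) is None for h in range(1, 8))
```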

The crash was triggered at line 123 of vace/utils/encoding.py when encoding reference images with vae.encode_to_latent(refs_stacked, use_cache=False).


Changes

src/scope/core/pipelines/wan2_1/vace/utils/encoding.py

  • Added MIN_VAE_SPATIAL_SIZE = 8 constant with derivation comment
  • Added _check_spatial_size() helper that raises a descriptive ValueError early (before the cryptic PyTorch kernel-size error) for main frame encoding paths
  • Added a warning + skip guard for reference image encoding: since refs are supplementary conditioning, we log a warning and continue without ref conditioning for that chunk, degrading gracefully instead of crashing the session (the error fired 500+ times in the affected session)
  • Moved the mask-channel padding and latent concatenation into the else branch so ref_latent_batch is never accessed on the skip path (was previously a latent NameError risk)
  • Added import logging and module-level logger

Behaviour After Fix

  • Ref images too small (< 8px in H or W): Warning logged, chunk continues without reference conditioning. Session does not crash.
  • Main frames too small: Descriptive ValueError surfaced immediately with resolution guidance (vs. cryptic PyTorch kernel message after several encoder stages).

Testing

This fix is defensive (no new code paths in the happy path). Verified with:

  • python3 -c "import ast; ast.parse(open('src/scope/core/pipelines/wan2_1/vace/utils/encoding.py').read()); print('ok')"

Reviewers: @mjh1 @emranemran

Commit message:

The WAN VAE encoder has 3 spatial downsample stages, each implemented as
ZeroPad2d(0,1,0,1) + Conv2d(kernel=3, stride=2, padding=0).  Any spatial
dimension < 8px causes a RuntimeError at the second or third stage:

  Calculated padded input size per channel: (2 x 513). Kernel size: (3 x 3).
  Kernel size can't be greater than actual input size

This was observed 500+ times in 2 min in krea-realtime-video (job
5193400c-da0f-4eef-8bdd-dd0fdd26c1db), crashing every chunk for the
affected session.

Changes in vace/utils/encoding.py:
- Add MIN_VAE_SPATIAL_SIZE = 8 constant with derivation comment
- Add _check_spatial_size() helper that raises a descriptive ValueError
  early (before the cryptic PyTorch kernel-size error) for main frames
- Add a warning+skip guard for reference images: since refs are
  supplementary, we degrade gracefully (no ref conditioning for that
  chunk) rather than aborting the session
- Nest the mask-channel padding and latent concatenation inside the
  else branch so ref_latent_batch is never accessed on the skip path
- Add logging import and module-level logger

Distinct from #673 (temporal kernel underflow in streamdiffusionv2).

Fixes #713

Signed-off-by: livepeer-robot <robot@livepeer.org>
@coderabbitai

coderabbitai bot commented Mar 18, 2026

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@github-actions
Contributor

🚀 fal.ai Preview Deployment

App ID     daydream/scope-pr-714--preview
WebSocket  wss://fal.run/daydream/scope-pr-714--preview/ws
Commit     6cc5d17

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-714--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

✅ E2E Tests passed

Status   passed
fal App  daydream/scope-pr-714--preview
Run      View logs

Test Artifacts

Check the workflow run for screenshots.
