ByteDance / CascadeV

Model's Last Updated: September 02 2024

CascadeV | An Implementation of the Würstchen Architecture for High-Resolution Video Generation

News

[2024.07.17] We release the code and pretrained weights of a DiT-based video VAE, which supports video reconstruction with a high compression factor (1x32x32=1024). The T2V model is still on the way.

Introduction

CascadeV is a video generation pipeline built upon the Würstchen architecture. By using a highly compressed latent representation, we can generate longer videos with higher resolution.
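To make the compression arithmetic concrete, here is a minimal sketch that derives the latent grid from a video shape and per-axis compression factors. The helper names are hypothetical and not part of the CascadeV code; channels are ignored, and the 8x1024x1024 video shape is the one quoted in the usage section of this card.

```python
from math import prod

def latent_shape(video_shape, factors):
    """Latent (T, H, W) for a video (T, H, W) under per-axis compression factors."""
    if any(size % f for size, f in zip(video_shape, factors)):
        raise ValueError("each video dimension must be divisible by its factor")
    return tuple(size // f for size, f in zip(video_shape, factors))

def compression_factor(factors):
    """Total compression factor, e.g. 1 x 32 x 32 = 1024."""
    return prod(factors)

video = (8, 1024, 1024)   # frames, height, width
cascade = (1, 32, 32)     # temporal, height, width factors
print(compression_factor(cascade))    # 1024
print(latent_shape(video, cascade))   # (8, 32, 32)
```

Under these assumptions, an 8-frame 1024x1024 clip maps to the 8x32x32 latent grid used in the comparison below.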

Video VAE

Comparison of Our Cascade Approach with Other VAEs (on Latent Space of Shape 8x32x32)

Video Reconstruction: Original (left) vs. Reconstructed (right) | Click to view the videos

1. Model Architecture
1.1 DiT

We use PixArt-Σ as our base model with the following modifications:

  • Replace the original VAE (from SDXL) with the one from Stable Video Diffusion.
  • Use the semantic compressor from StableCascade to provide the low-resolution latent input.
  • Remove the text encoder and all multi-head cross-attention layers, since we do not use text conditioning.
  • Replace all 2D attention layers with 3D attention. We find that 3D attention outperforms 2+1D attention (i.e. alternating spatial and temporal attention), especially in temporal consistency.

Comparison of 2+1D Attention (left) vs. 3D Attention (right)
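The difference between the two attention layouts comes down to how the (T, H, W) token grid is flattened into sequences. The sketch below is a minimal NumPy illustration, not the repository's implementation: the function names are hypothetical, learned projections and multiple heads are omitted, and the tensor sizes are arbitrary.

```python
import numpy as np

def attention(x):
    """Single-head self-attention over sequences shaped (N, L, C), no projections."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def attention_3d(x):
    """x: (B, T, H, W, C). Every token attends to all T*H*W tokens."""
    b, t, h, w, c = x.shape
    return attention(x.reshape(b, t * h * w, c)).reshape(b, t, h, w, c)

def attention_2plus1d(x):
    """Spatial attention within each frame, then temporal attention per location."""
    b, t, h, w, c = x.shape
    # Spatial pass: one sequence of H*W tokens per (batch, frame).
    s = attention(x.reshape(b * t, h * w, c)).reshape(b, t, h, w, c)
    # Temporal pass: one sequence of T tokens per (batch, location).
    s = s.transpose(0, 2, 3, 1, 4).reshape(b * h * w, t, c)
    s = attention(s).reshape(b, h, w, t, c)
    return s.transpose(0, 3, 1, 2, 4)

x = np.random.default_rng(0).standard_normal((1, 8, 4, 4, 16))
print(attention_3d(x).shape, attention_2plus1d(x).shape)
```

In the 3D variant a token can attend directly to any location in any frame, whereas the 2+1D variant only mixes information across frames at the same spatial location.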

1.2 Grid Attention

3D attention requires far more computational resources than 2D or 2+1D attention, especially at higher resolutions. As a compromise, we replace some 3D attention layers with alternating spatial and temporal grid attention.
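A rough count of query-key pairs shows where the extra cost of 3D attention comes from. This is a back-of-the-envelope sketch with illustrative token grid sizes (the DiT's actual token grid is not stated here). It counts attention-score entries per head only and does not model the grid attention layout itself.

```python
def pairs_3d(t, h, w):
    """Query-key pairs for full 3D attention over T*H*W tokens."""
    n = t * h * w
    return n * n

def pairs_2plus1d(t, h, w):
    """Query-key pairs for one spatial layer plus one temporal layer."""
    spatial = t * (h * w) ** 2    # T frames, each with (H*W)^2 pairs
    temporal = (h * w) * t ** 2   # H*W locations, each with T^2 pairs
    return spatial + temporal

for t, h, w in [(8, 32, 32), (8, 64, 64)]:
    full, factored = pairs_3d(t, h, w), pairs_2plus1d(t, h, w)
    print(f"{t}x{h}x{w}: 3D={full:,}  2+1D={factored:,}  ratio={full / factored:.1f}x")
```

The 3D count grows with the square of T·H·W, while the factored count is dominated by T·(H·W)², so the gap widens as the number of frames grows.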

2. Evaluation

Dataset: We perform a quantitative comparison with other baselines on Inter4K, sampling the first 200 videos from the dataset to create a video dataset with a resolution of 1024x1024 at 30 FPS.

Metrics: We use PSNR, SSIM, and LPIPS to measure per-frame similarity between the original and reconstructed videos, and VBench to assess the quality of the reconstructed videos independently of the originals.
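As an illustration of the per-frame reconstruction metrics, the following sketch computes PSNR with the standard definition 10·log10(MAX² / MSE) and averages it over frames. It is not the evaluation script used for the numbers below; the function names, the (T, H, W, C) layout, and the 8-bit value range are assumptions.

```python
import numpy as np

def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB between two frames of identical shape."""
    original = np.asarray(original, dtype=np.float64)
    reconstructed = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((original - reconstructed) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))

def video_psnr(original, reconstructed, max_value=255.0):
    """Mean of per-frame PSNR for videos shaped (T, H, W, C)."""
    scores = [psnr(o, r, max_value) for o, r in zip(original, reconstructed)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(8, 64, 64, 3)).astype(np.float64)
noisy = np.clip(video + rng.normal(0.0, 5.0, size=video.shape), 0, 255)
print(f"mean per-frame PSNR: {video_psnr(video, noisy):.2f} dB")
```

Because PSNR penalizes any pixel-level deviation, a reconstruction that adds plausible detail not present in the source can score lower than a blurrier but closer one.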

2.1 PSNR/SSIM/LPIPS

Diffusion-based VAEs (like StableCascade and our model) perform poorly on reconstruction metrics because they tend to produce videos with more fine-grained details that are less similar to the original ones.

| Model | Compression Factor | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|
| Open-Sora-Plan v1.1 | 4x8x8=256 | 25.7282 | 0.8000 | 0.1030 |
| EasyAnimate v3 | 4x8x8=256 | 28.8666 | 0.8505 | 0.0818 |
| StableCascade | 1x32x32=1024 | 24.3336 | 0.6896 | 0.1395 |
| Ours | 1x32x32=1024 | 23.7320 | 0.6742 | 0.1786 |

2.2 VBench

Our approach performs comparably to previous VAEs in both frame-wise and temporal quality, even with a much larger compression factor.

| Model | Compression Factor | Subject Consistency | Background Consistency | Temporal Flickering | Motion Smoothness | Imaging Quality | Aesthetic Quality |
|---|---|---|---|---|---|---|---|
| Open-Sora-Plan v1.1 | 4x8x8=256 | 0.9519 | 0.9618 | 0.9573 | 0.9789 | 0.6791 | 0.5450 |
| EasyAnimate v3 | 4x8x8=256 | 0.9578 | 0.9695 | 0.9615 | 0.9845 | 0.6735 | 0.5535 |
| StableCascade | 1x32x32=1024 | 0.9490 | 0.9517 | 0.9430 | 0.9639 | 0.6811 | 0.5675 |
| Ours | 1x32x32=1024 | 0.9601 | 0.9679 | 0.9626 | 0.9837 | 0.6747 | 0.5579 |

3. Usage
3.1 Installation

We recommend using Conda:

conda create -n cascadev python==3.9.0
conda activate cascadev

Install PixArt-Σ

bash install.sh

3.2 Download Pretrained Weights

bash pretrained/download.sh

3.3 Video Reconstruction

A sample script for video reconstruction with a compression factor of 32:

bash recon.sh

Results of Video Reconstruction: w/o LDM (left) vs. w/ LDM (right)

Reconstructing a video of shape 8x1024x1024 takes almost 1 minute on a single NVIDIA A800 GPU.

3.4 Train VAE

  • Replace "video_list" in configs/s1024.effn-f32.py with your own video dataset.
  • Then run:

bash train_vae.sh
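The expected format of "video_list" is not documented here. If it accepts a plain Python list of file paths (an assumption), a small helper like the hypothetical one below could build that list from a directory of clips. The extension set and the example path are illustrative.

```python
from pathlib import Path

# Illustrative set of extensions; adjust to match your dataset.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi"}

def collect_videos(root):
    """Return sorted paths of video files found recursively under root."""
    return sorted(
        str(p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )

# Hypothetical usage: paste or import the result into the config.
# video_list = collect_videos("/path/to/my/videos")
```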
Acknowledgement

License: https://choosealicense.com/licenses/openrail++

Model page: https://huggingface.co/ByteDance/CascadeV
