Nvidia unveils Pixel Diffusion Decoder for high-resolution images

2049.news · 26.05.2026, 14:35:03

Nvidia introduced the Pixel Diffusion Decoder (PiD), a decoder designed to produce high-resolution images directly from latent representations.

PiD merges decoding and upsampling into a single module, enabling faster generation of large images compared with traditional VAE-based decoders.

How PiD works

Most text-to-image models generate in compressed latent spaces, then use a VAE decoder to reconstruct final pixel images at target resolution.

Traditional decoders focus on faithful reconstruction rather than synthesizing additional high-frequency detail, and computational cost rises with larger output resolutions.

Performance and specifications

Nvidia reports PiD can decode a 512×512 latent into a 2048×2048 image in about 1 second on an RTX 5090 GPU while using 13 GB of VRAM.

The approach also supports generating 4096×4096 images from larger latents, combining fewer denoising steps with integrated upsampling for efficiency.

Compatibility and checkpoints

PiD is compatible with existing VAE and RAE models, and Nvidia has released checkpoints intended for integration with Flux and related pipelines.

Nvidia lists checkpoints for Flux, Flux 2, SD3, DINOv2 and SigLIP, while Z-Image uses Flux's VAE and so has no separate PiD checkpoint.

Community and tooling

Kijai has officially joined the ComfyUI code maintainers, and experimental support for PiD in ComfyUI workflows is in progress.

Test workflows and community nodes have appeared on public forums, indicating growing developer interest and practical experimentation around PiD-enabled models.

Nvidia also provided 2K and 4K weight variants, trained to map 512 and 1024 latents to 2K and 4K pixel outputs respectively, simplifying upscale pipelines.
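The sizes quoted above imply the same per-side scale factor for both variants. The short Python sketch below works through that arithmetic. It assumes "512" and "1024" are latent side lengths and that "2K" and "4K" mean 2048 and 4096 pixels per side, as the reported figures suggest. The helper name and the VARIANTS table are hypothetical and are not part of any Nvidia release.

```python
# Illustrative arithmetic only: upscale_summary and VARIANTS are hypothetical
# names, not part of PiD or any Nvidia tooling.

def upscale_summary(latent_side: int, image_side: int) -> dict:
    """Return the per-side scale, pixel-count ratio and output megapixels."""
    if latent_side <= 0 or image_side % latent_side != 0:
        raise ValueError("image side must be a positive multiple of latent side")
    scale = image_side // latent_side
    return {
        "scale_per_side": scale,
        "pixel_ratio": scale * scale,  # output positions per latent position
        "output_megapixels": image_side * image_side / 1e6,
    }

# Assumed mapping of weight variants to (latent side, output side).
VARIANTS = {"2K": (512, 2048), "4K": (1024, 4096)}

for name, (latent_side, image_side) in VARIANTS.items():
    print(name, upscale_summary(latent_side, image_side))
```

Under these assumptions both variants upscale 4× per side, so each latent position corresponds to 16 output pixels. The 4K output also holds four times as many pixels as the 2K output.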

