Nari-labs releases open-source Dia2 streaming speech generator
Nari-labs has published Dia2, an open-source speech generation model that supports streaming output and conditioning on per-speaker voice samples.
Model overview
Dia2 provides a streaming mode that can begin producing audio from the first words without waiting for full text preprocessing, enabling lower perceived latency for interactive scenarios.
- Variants: 1B and 2B model sizes.
- Language: English only (Russian is not supported); generates up to 2 minutes of audio per clip.
- Hardware: designed to run on GPUs with around 8 GB VRAM or less.
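To make the streaming claim concrete, here is a minimal sketch of how chunk-by-chunk playback might look. The post does not document a Python API, so the package name, `Dia2.from_pretrained`, `generate_stream`, the `[S1]`/`[S2]` speaker tags, and the 44.1 kHz sample rate are all assumptions for illustration only.

```python
# Hypothetical sketch: stream Dia2 audio to the sound card as it is generated.
# The dia2 package, Dia2.from_pretrained, and generate_stream are assumed names,
# not a documented interface; the 44.1 kHz sample rate is also an assumption.
import sounddevice as sd  # plays raw audio chunks as they arrive

from dia2 import Dia2  # hypothetical package and class names

model = Dia2.from_pretrained("nari-labs/Dia2-2B")  # a 1B variant also exists

text = "[S1] Hello, this is a streaming demo. [S2] Playback starts before generation finishes."

# Write each generated chunk straight to the output stream instead of waiting
# for the full clip, which is what lowers perceived latency in interactive use.
with sd.OutputStream(samplerate=44100, channels=1, dtype="float32") as stream:
    for chunk in model.generate_stream(text):
        stream.write(chunk)  # chunk: float32 NumPy array of audio samples
```

The point of the loop is simply that audio leaves the model in pieces and can be played immediately, rather than being assembled into one file first.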
Behavior and limitations
Early testing shows variable output quality: Dia2 may insert words that are not in the input text, exhibit inconsistent loudness, and deliver speech with overly rapid pacing and too few pauses.
Developers note that stable speaker rendering requires either supplying per-speaker audio samples as prefixes or fine-tuning the model on target voices.
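The per-speaker prefix approach might look like the sketch below: a short reference clip plus its transcript anchor a speaker tag to a specific voice. The argument name `audio_prompt` and the transcript-prefix convention are assumptions here, not a documented Dia2 interface.

```python
# Hypothetical sketch: condition generation on a per-speaker voice sample.
# The generate() signature and audio_prompt argument are assumed, not documented.
import soundfile as sf

from dia2 import Dia2  # hypothetical package and class names

model = Dia2.from_pretrained("nari-labs/Dia2-2B")

# Prefix the script with a transcript of the reference clip so the model can
# tie speaker [S1] to the supplied voice sample.
prompt_text = "[S1] This sentence matches the reference recording."
target_text = "[S1] And this new line should come out in the same voice."

audio = model.generate(
    prompt_text + " " + target_text,
    audio_prompt="speaker1_reference.wav",  # short clip of the target speaker
)
sf.write("output.wav", audio, 44100)  # assumed 44.1 kHz output rate
```

Fine-tuning on the target voices is the heavier alternative the developers mention when prefix conditioning alone is not stable enough.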
License and ecosystem
The project is released under the permissive Apache 2.0 license, which is intended to facilitate commercial and community adoption of the code and models.
While Nari-labs positions Dia2 as less mature in quality than established commercial offerings, the maintainers expect community contributions to address current instability and improve naturalness over time.