Two Undergrads Unveil SOTA Open-Source Speech AI
Published: May 2025
Image source: Nari Labs
The Rundown
Korean startup Nari Labs, founded by two undergraduates with no outside funding, has released Dia, a 1.6-billion-parameter, open-source text-to-speech model that rivals leading commercial systems such as ElevenLabs and Sesame's CSM-1B.
Try Dia on GitHub: Nari Labs Dia Model
Key Features of Dia
- Expressive Emotional Tones: Delivers nuanced speech conveying joy, sadness, and urgency.
- Multi-Speaker Support: Tag voices to create distinct characters or personas.
- Nonverbal Cues: Includes realistic laughter, coughing, whispers, and screams.
- Open-Source & Free: No licensing fees, making it ideal for startups, researchers, and hobbyists.
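The multi-speaker and nonverbal-cue features above work through tags embedded in the input text. As a rough sketch, a script can be assembled into a single tagged prompt before being passed to the model; the `[S1]`/`[S2]` speaker tags and parenthesized cues like `(laughs)` follow the conventions shown in the Dia repository, but the `build_transcript` helper here is purely illustrative and not part of Dia's API:

```python
def build_transcript(turns):
    """Join (speaker, text) turns into one tagged prompt string.

    Each turn becomes "[S1] ..." or "[S2] ...", with nonverbal cues
    such as (laughs) or (whispers) written inline in the text itself.
    """
    return " ".join(f"[{speaker}] {text}" for speaker, text in turns)

prompt = build_transcript([
    ("S1", "Dia can voice more than one character. (laughs)"),
    ("S2", "And it renders nonverbal cues like coughs and whispers."),
])
print(prompt)
# [S1] Dia can voice more than one character. (laughs) [S2] And it renders nonverbal cues like coughs and whispers.
```

The resulting string would then be handed to the model's generate call as a single prompt; see the Nari Labs repository for the actual loading and generation interface.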
Performance Benchmarks
In side-by-side tests, Dia outperformed:
- ElevenLabs Studio in waveform naturalness and timing
- Sesame CSM-1B in latency and throughput during large-batch generation
How Two Undergrads Did It
- TPU Research Cloud: Leveraged Google’s free TPU credits to train Dia.
- Inspired by NotebookLM: Borrowed techniques from Google’s AI research.
- Zero Funding: Bootstrapped development through academic collaboration.
“We wanted to prove you don’t need deep pockets to build world-class AI,” says founder Toby Kim. Dia is now the foundation for Nari Labs’ upcoming consumer app focused on social content creation and remixing.
Why It Matters
Dia exemplifies the democratization of AI:
- Accessibility: Open-source models close the gap between industry giants and independent creators.
- Innovation: Young developers can launch competitive products with minimal resources.
- Community Growth: Contributions to GitHub will accelerate voice-tech research and applications.
Get Started with Dia Today
Experience the future of speech AI—download or fork Dia on GitHub:
Nari Labs Dia Model Repository
Keywords: open source speech AI, text-to-speech SOTA, Dia TTS model, Nari Labs, undergrad AI startup