Two Undergrads Unveil SOTA Open-Source Speech AI
Published: May 2025
Image source: Nari Labs
The Rundown
Korean startup Nari Labs, founded by two undergraduates with no outside funding, has released Dia, a 1.6-billion-parameter, open-source text-to-speech model that rivals leading commercial systems such as ElevenLabs and Sesame's CSM-1B.
Try Dia on GitHub: Nari Labs Dia Model
Key Features of Dia
- Expressive Emotional Tones: Delivers nuanced speech conveying joy, sadness, and urgency.
- Multi-Speaker Support: Tag voices to create distinct characters or personas.
- Nonverbal Cues: Includes realistic laughter, coughing, whispers, and screams.
- Open-Source & Free: No licensing fees, making it ideal for startups, researchers, and hobbyists.
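The multi-speaker and nonverbal-cue features above work through tags embedded in the input text. As a rough sketch, a script can be assembled into a single tagged prompt before being passed to the model; the `[S1]`/`[S2]` speaker tags and parenthesized cues like `(laughs)` follow the conventions shown in the Dia repository, but the `build_transcript` helper here is purely illustrative and not part of Dia's API:

```python
def build_transcript(turns):
    """Join (speaker, text) turns into one tagged prompt string.

    Each turn becomes "[S1] ..." or "[S2] ...", with nonverbal cues
    such as (laughs) or (whispers) written inline in the text itself.
    """
    return " ".join(f"[{speaker}] {text}" for speaker, text in turns)

prompt = build_transcript([
    ("S1", "Dia can voice more than one character. (laughs)"),
    ("S2", "And it renders nonverbal cues like coughs and whispers."),
])
print(prompt)
# [S1] Dia can voice more than one character. (laughs) [S2] And it renders nonverbal cues like coughs and whispers.
```

The resulting string would then be handed to the model's generate call as a single prompt; see the Nari Labs repository for the actual loading and generation interface.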
Performance Benchmarks
In side-by-side tests, Dia outperformed:
- ElevenLabs Studio in waveform naturalness and timing
- Sesame CSM-1B in latency and throughput during large-batch generation
How Two Undergrads Did It
- TPU Research Cloud: Leveraged Google’s free TPU credits to train Dia.
- Inspired by NotebookLM: Borrowed techniques from Google’s AI research.
- Zero Funding: Bootstrapped development through academic collaboration.
“We wanted to prove you don’t need deep pockets to build world-class AI,” says founder Toby Kim. Dia is now the foundation for Nari Labs’ upcoming consumer app focused on social content creation and remixing.
Why It Matters
Dia exemplifies the democratization of AI:
- Accessibility: Open-source models close the gap between industry giants and independent creators.
- Innovation: Young developers can launch competitive products with minimal resources.
- Community Growth: Contributions to GitHub will accelerate voice-tech research and applications.
Get Started with Dia Today
Experience the future of speech AI—download or fork Dia on GitHub:
Nari Labs Dia Model Repository
Keywords: open source speech AI, text-to-speech SOTA, Dia TTS model, Nari Labs, undergrad AI startup