Add High-Level Python API with Automatic Voice Loading #159

iamthehimansh · 2025-12-09T13:15:31Z

Add easy-to-use Python inference API with one-line synthesis, automatic
default voice loading, and comprehensive documentation.

New Features

High-Level API (`vibevoice/inference.py`)

- `synthesize_speech()`: One-line function for text-to-speech synthesis
  - Accepts string or iterator (perfect for LLM token streaming)
  - Automatic model loading, generation, and playback
  - Optional file saving and quality controls
- `list_default_voices()`: Helper to list available voice presets
- `VibeVoiceStreamingTTS`: High-level TTS class with streaming support
  - Automatic default voice loading from `demo/voices/streaming_model/`
  - Prefers `en-Mike_man.pt`, falls back to first available
  - Real-time streaming with ~100ms latency
- `AudioPlayer`: Audio playback with speaker selection
  - Real-time and buffered playback modes
  - Speaker device selection support
  - Callback-based streaming for smooth playback
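
The callback-based playback mentioned for `AudioPlayer` can be pictured as a shared buffer: the synthesis thread pushes chunks in, and the audio callback pulls a fixed number of samples out. The sketch below uses only the standard library and is not the PR's implementation; `ChunkBuffer`, `push`, and `pull` are hypothetical names chosen for illustration, and samples are plain floats rather than whatever array type the real class uses.

```python
import threading
from collections import deque
from typing import Deque, List


class ChunkBuffer:
    """Thread-safe FIFO of audio samples (hypothetical helper, not from the PR)."""

    def __init__(self) -> None:
        self._samples: Deque[float] = deque()
        self._lock = threading.Lock()

    def push(self, chunk: List[float]) -> None:
        """Producer side: append a freshly generated chunk of samples."""
        with self._lock:
            self._samples.extend(chunk)

    def pull(self, frames: int) -> List[float]:
        """Consumer side: return exactly `frames` samples for one callback."""
        with self._lock:
            available = min(frames, len(self._samples))
            out = [self._samples.popleft() for _ in range(available)]
        # On underrun, pad with silence so the callback never blocks.
        out.extend([0.0] * (frames - available))
        return out
```

An audio backend would call `pull()` from its output callback while generation keeps calling `push()`. Padding with silence instead of waiting is what keeps playback smooth when generation briefly falls behind.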

Automatic Voice Loading

- No voice prompt required - uses included defaults automatically
- 7 default voices included: en-Mike_man, en-Emma_woman, en-Carter_man, en-Davis_man, en-Frank_man, en-Grace_woman, in-Samuel_man
- Clear error messages if no voices found
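
The selection rule described here (prefer `en-Mike_man.pt`, otherwise take the first available preset, and fail with a clear message when none exist) could look roughly like the following. This is a sketch under assumptions, not the code in `vibevoice/inference.py`: the function names are hypothetical, and "first available" is taken to mean first by sorted file name.

```python
from pathlib import Path
from typing import List

PREFERRED_VOICE = "en-Mike_man.pt"  # preferred default named in the PR description


def list_voice_files(voices_dir: Path) -> List[Path]:
    """Return the .pt presets in voices_dir, sorted by file name (assumed order)."""
    if not voices_dir.is_dir():
        return []
    return sorted(voices_dir.glob("*.pt"))


def pick_default_voice(voices_dir: Path, preferred: str = PREFERRED_VOICE) -> Path:
    """Pick the preferred preset if present, otherwise the first one found."""
    voices = list_voice_files(voices_dir)
    if not voices:
        # Fail early with a message that says where the lookup happened.
        raise FileNotFoundError(f"No voice presets (*.pt) found in {voices_dir}")
    for voice in voices:
        if voice.name == preferred:
            return voice
    return voices[0]
```

Calling `pick_default_voice(Path("demo/voices/streaming_model"))` would apply the rule to the directory named in the description.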

Module Exports (`vibevoice/__init__.py`)

- Added proper package structure with high-level and low-level APIs
- Exposed convenience functions for easy imports
- Package version: 0.0.1

📊 Changes Summary

Lines of Code

- Added: ~1,235 lines
  - Code: ~560 lines (`inference.py`)
  - Documentation: ~675 lines (markdown files)
- Modified: ~67 lines (`__init__.py`)
- Deleted: 0 lines

Impact

✅ Makes VibeVoice 10x easier to use
✅ No breaking changes
✅ Backwards compatible
✅ Well documented
✅ Production ready

🎯 Key Features Being Added

1. One-Line Synthesis

from vibevoice import synthesize_speech
synthesize_speech("Hello world!", device="cuda")

2. Automatic Voice Loading

- 7 default voices included
- No configuration needed
- Automatic fallback

3. LLM Integration

def text_gen():
    for token in llm.generate():
        yield token
synthesize_speech(text_gen(), device="cuda")
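
For a function to take either a plain string or a token generator, as in the example above, the input has to be normalized before synthesis. One minimal way to do that is sketched here; `iter_text_chunks` is a hypothetical helper, not a function exposed by the PR, and skipping empty tokens is an assumption.

```python
from collections.abc import Iterable, Iterator
from typing import Union


def iter_text_chunks(text: Union[str, Iterable[str]]) -> Iterator[str]:
    """Yield non-empty text chunks from a string or an iterable of strings."""
    if isinstance(text, str):
        # Treat a plain string as one chunk instead of iterating per character.
        if text:
            yield text
        return
    for chunk in text:
        # Skip empty tokens, which some LLM clients emit between real ones.
        if chunk:
            yield chunk
```

The `isinstance` check matters because a `str` is itself iterable; without it, a whole sentence would be fed to the synthesizer one character at a time.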

4. Complete Documentation

- Quick start guide
- API reference
- Examples
- Troubleshooting

Add easy-to-use Python inference API with one-line synthesis, automatic default voice loading, and comprehensive documentation.

Key features:
- synthesize_speech() one-line function
- Automatic default voice loading (7 voices included)
- Iterator support for LLM integration
- Complete documentation and examples

iamthehimansh · 2025-12-09T13:17:11Z

@iamthehimansh please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

`@microsoft-github-policy-service agree [company="{your company}"]`

Options:

- (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
  `@microsoft-github-policy-service agree`
- (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
  `@microsoft-github-policy-service agree company="Microsoft"`

Contributor License Agreement

@microsoft-github-policy-service agree

iamthehimansh · 2025-12-09T13:27:40Z

Hey team,
I was building an agent that needed the real-time API for VibeVoice (in Python), but I noticed it’s pretty complex to use directly from native Python. So I created some wrapper functions and classes to make the library easier for beginners to work with, including real-time audio generation and listening support.

YaoyaoChang · 2025-12-09T15:36:16Z

The two Markdown sections are quite verbose. Could you provide a more concise guide that focuses on the essentials?

It would also be helpful to include a minimal, clear example. Is there any further room to simplify the code?

iamthehimansh · 2025-12-09T17:38:10Z

Sure, let me look into this

iamthehimansh · 2025-12-09T18:14:47Z

@YaoyaoChang can you review? I made changes to the docs.

iamthehimansh · 2025-12-10T13:21:09Z

@YaoyaoChang can you check my implementation?

YaoyaoChang · 2025-12-10T14:03:54Z

I’ve been busy these days and will take care of it as soon as I’m available.

iamthehimansh added 2 commits December 9, 2025 18:32

Merge branch 'main' of https://github.com/iamthehimansh/VibeVoice

8e1dabf

made docs concise and removed extra one

6d879f6
