@iamthehimansh

Add easy-to-use Python inference API with one-line synthesis, automatic
default voice loading, and comprehensive documentation.

New Features

High-Level API (vibevoice/inference.py)

  • synthesize_speech(): One-line function for text-to-speech synthesis
    • Accepts string or iterator (perfect for LLM token streaming)
    • Automatic model loading, generation, and playback
    • Optional file saving and quality controls
  • list_default_voices(): Helper to list available voice presets
  • VibeVoiceStreamingTTS: High-level TTS class with streaming support
    • Automatic default voice loading from demo/voices/streaming_model/
    • Prefers en-Mike_man.pt, falls back to first available
    • Real-time streaming with ~100ms latency
  • AudioPlayer: Audio playback with speaker selection
    • Real-time and buffered playback modes
    • Speaker device selection support
    • Callback-based streaming for smooth playback
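The callback-based playback mentioned above can be illustrated with a small buffer that sits between the synthesis thread and an audio callback. This is only a sketch under assumptions: `ChunkBuffer`, `push`, and `pull` are hypothetical names, not the PR's `AudioPlayer` API, and the audio backend that would invoke the callback is left out. The point it shows is that padding with silence on underrun lets the callback always return a full frame.

```python
import queue


class ChunkBuffer:
    """Hypothetical thread-safe buffer: a producer pushes raw PCM chunks,
    an audio callback pulls fixed-size frames."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[bytes]" = queue.Queue()
        self._pending = b""  # bytes already dequeued but not yet played

    def push(self, chunk: bytes) -> None:
        """Called by the synthesis side whenever a new audio chunk is ready."""
        self._queue.put(chunk)

    def pull(self, nbytes: int) -> bytes:
        """Return exactly nbytes, padding with zero bytes (silence) on underrun."""
        while len(self._pending) < nbytes:
            try:
                self._pending += self._queue.get_nowait()
            except queue.Empty:
                break  # nothing more available right now
        out, self._pending = self._pending[:nbytes], self._pending[nbytes:]
        return out + b"\x00" * (nbytes - len(out))
```

An audio library's output callback would call `pull()` with the number of bytes it needs, so playback never blocks while generation catches up.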

Automatic Voice Loading

  • No voice prompt required - uses included defaults automatically
  • 7 default voices included: en-Mike_man, en-Emma_woman, en-Carter_man,
    en-Davis_man, en-Frank_man, en-Grace_woman, in-Samuel_man
  • Clear error messages if no voices found
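The fallback behaviour described in this PR (prefer en-Mike_man.pt, otherwise use the first available voice, otherwise fail with a clear error) could look roughly like the sketch below. The function name `pick_default_voice` and the alphabetical ordering used for "first available" are assumptions made for illustration; the actual logic lives in vibevoice/inference.py and may differ.

```python
from pathlib import Path

# Preferred default voice file, as named in the PR description.
PREFERRED_VOICE = "en-Mike_man.pt"


def pick_default_voice(voices_dir: Path, preferred: str = PREFERRED_VOICE) -> Path:
    """Hypothetical helper: return the preferred voice preset if present,
    otherwise the first .pt file in alphabetical order."""
    candidates = sorted(voices_dir.glob("*.pt"))
    if not candidates:
        # Fail loudly instead of silently synthesizing without a voice.
        raise FileNotFoundError(
            f"No voice presets (*.pt) found in {voices_dir}; "
            "check the installation or pass a voice prompt explicitly."
        )
    for path in candidates:
        if path.name == preferred:
            return path
    return candidates[0]
```

With the directory named in the description, a call would be `pick_default_voice(Path("demo/voices/streaming_model"))`.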

Module Exports (vibevoice/__init__.py)

  • Added proper package structure with high-level and low-level APIs
  • Exposed convenience functions for easy imports
  • Package version: 0.0.1

📊 Changes Summary

Lines of Code

  • Added: ~1,235 lines
    • Code: ~560 lines (inference.py)
    • Documentation: ~675 lines (markdown files)
  • Modified: ~67 lines (__init__.py)
  • Deleted: 0 lines

Impact

  • ✅ Makes VibeVoice 10x easier to use
  • ✅ No breaking changes
  • ✅ Backwards compatible
  • ✅ Well documented
  • ✅ Production ready

🎯 Key Features Being Added

1. One-Line Synthesis

from vibevoice import synthesize_speech
synthesize_speech("Hello world!", device="cuda")

2. Automatic Voice Loading

  • 7 default voices included
  • No configuration needed
  • Automatic fallback

3. LLM Integration

# `llm` stands in for any client whose generate() yields text tokens
def text_gen():
    for token in llm.generate():
        yield token

synthesize_speech(text_gen(), device="cuda")
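One way a function can accept either a plain string or a token iterator is to normalize both into a stream of sentence-sized chunks before synthesis. The sketch below is an assumption about how that could work, not the PR's actual code: `iter_text_chunks` and its `flush_on` parameter are hypothetical names.

```python
from typing import Iterable, Iterator, Union


def iter_text_chunks(
    text: Union[str, Iterable[str]], flush_on: str = ".!?\n"
) -> Iterator[str]:
    """Hypothetical helper: yield sentence-sized chunks from a string
    or from a stream of tokens (e.g. LLM output)."""
    # Treat a plain string as a one-element token stream.
    tokens = [text] if isinstance(text, str) else text
    buffer = ""
    for token in tokens:
        for ch in token:
            buffer += ch
            if ch in flush_on:
                chunk = buffer.strip()
                if chunk:
                    yield chunk
                buffer = ""
    # Emit whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail
```

Flushing at sentence boundaries means speech for the first sentence can start while the LLM is still producing the rest of its reply.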

4. Complete Documentation

  • Quick start guide
  • API reference
  • Examples
  • Troubleshooting

Key features:
- synthesize_speech() one-line function
- Automatic default voice loading (7 voices included)
- Iterator support for LLM integration
- Complete documentation and examples
@iamthehimansh
Author

@iamthehimansh please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

@iamthehimansh
Author

Hey team,
I was building an agent that needed VibeVoice's real-time API from Python, but I noticed it's pretty complex to use directly. So I created some wrapper functions and classes to make the library easier for beginners to work with, including real-time audio generation and playback support.

@YaoyaoChang
Collaborator

The two Markdown sections are quite verbose. Could you provide a more concise guide that focuses on the essentials?

It would also be helpful to include a minimal, clear example. Is there any further room to simplify the code?

@iamthehimansh
Author

Sure, let me look into this

@iamthehimansh
Author

@YaoyaoChang, could you review? I've made changes to the docs.

@iamthehimansh
Author

@YaoyaoChang, can you check my implementation?

@YaoyaoChang
Collaborator

I’ve been busy these days and will take care of it as soon as I’m available.
