Add Google Gemini TTS Integration #752

zxdxjtu · 2025-07-08T02:50:35Z


Summary

This PR adds support for the Google Gemini Text-to-Speech (TTS) API, giving users 15 high-quality voice options for video narration. Users who only have a Gemini API key can now use Gemini TTS for narration without setting up an additional TTS service.

What's Changed

✨ New Features

- Google Gemini TTS Integration: added a gemini_tts() function in app/services/voice.py that correctly handles the Linear PCM audio returned by the API
- 15 Voice Options: each voice has its own characteristics, suited to different content styles. Examples include:
  - Female voices: Zephyr, Kore, Fennel, Aoede
  - Male voices: Puck, Charon, Krypton, Orion, Pegasus
- WebUI Integration: added a "Google Gemini TTS" option to the audio settings dropdown
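A minimal sketch of how the voice list could be turned into WebUI dropdown entries. It assumes the "gemini:<Voice>-<Gender>" label format from the example under "How to Use"; the voices and genders are only the ones named in the list above (not the full set of 15), and GEMINI_VOICES and gemini_voice_labels are hypothetical names, not the PR's actual code.

```python
# Hypothetical mapping: voice names and genders as listed in this PR's description.
GEMINI_VOICES = {
    "Zephyr": "Female",
    "Kore": "Female",
    "Fennel": "Female",
    "Aoede": "Female",
    "Puck": "Male",
    "Charon": "Male",
    "Krypton": "Male",
    "Orion": "Male",
    "Pegasus": "Male",
}


def gemini_voice_labels(voices: dict) -> list:
    """Build dropdown labels such as 'gemini:Kore-Female' (hypothetical helper)."""
    return [f"gemini:{name}-{gender}" for name, gender in voices.items()]


labels = gemini_voice_labels(GEMINI_VOICES)
print(labels[:2])  # ['gemini:Zephyr-Female', 'gemini:Kore-Female']
```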

🐛 Bug Fixes

- Audio Format Issue: fixed a critical bug where the Gemini API response was incorrectly treated as base64-encoded data, which produced 0.07-second audio files
- PCM Audio Handling: the Linear PCM audio (24 kHz, mono, 16-bit) returned by the Gemini API is now handled correctly
- Dependency Management: made faster_whisper an optional dependency to improve compatibility
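A quick sanity check for the PCM handling: with the format stated above (24 kHz, mono, 16-bit), one second of audio is 24,000 × 1 × 2 = 48,000 bytes, so a 0.07-second file holds only about 3,360 bytes of audio. The sketch below computes duration from a raw PCM buffer; pcm_duration_seconds is a hypothetical helper, not part of the PR.

```python
# Stream format as stated in this PR: 24 kHz, mono, 16-bit Linear PCM.
SAMPLE_RATE_HZ = 24_000
CHANNELS = 1
SAMPLE_WIDTH_BYTES = 2  # 16-bit samples


def pcm_duration_seconds(pcm: bytes,
                         sample_rate: int = SAMPLE_RATE_HZ,
                         channels: int = CHANNELS,
                         sample_width: int = SAMPLE_WIDTH_BYTES) -> float:
    """Return the playback length of a headerless Linear PCM buffer."""
    bytes_per_second = sample_rate * channels * sample_width
    return len(pcm) / bytes_per_second


print(pcm_duration_seconds(bytes(48_000)))  # 1.0
print(pcm_duration_seconds(bytes(3_360)))   # 0.07
```

Checking the duration this way after synthesis is a cheap guard against the truncated-audio failure described in the bug fix.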

🔧 Technical Details

- The Gemini API returns raw PCM audio bytes, not base64-encoded data
- Audio is converted to MP3 for compatibility with the video processing pipeline
- Integrated with the existing SubMaker for subtitle synchronization
- Maintains compatibility with the existing TTS providers (Edge TTS, Azure, etc.)
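Because the API returns headerless PCM, the bytes need format metadata before any audio tool can read them. A minimal sketch using only the standard library's wave module, assuming the 24 kHz / mono / 16-bit format stated in this PR; pcm_to_wav_bytes is a hypothetical helper, and the PR itself lists pydub for the conversion step.

```python
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24_000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap headerless Linear PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    # wave does not close file objects it did not open, so buf stays usable.
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Half a second of silence in the stated format: 24,000 bytes = 12,000 frames.
silence = bytes(24_000 * 2 // 2)
wav_bytes = pcm_to_wav_bytes(silence)

with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getframerate(), wav.getnchannels(), wav.getsampwidth(), wav.getnframes())
    # 24000 1 2 12000
```

With pydub, the equivalent step would be roughly AudioSegment(data=pcm, sample_width=2, frame_rate=24000, channels=1).export("out.mp3", format="mp3"); note that MP3 export through pydub requires ffmpeg to be installed.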

Testing

✅ Tested with various text lengths (5-55 characters)
✅ Verified both English and Chinese text synthesis
✅ Confirmed proper audio duration (1-12 seconds for test cases)
✅ Full video generation pipeline tested successfully

How to Use

- Add your Gemini API key to config.toml:
  gemini_api_key = "your-api-key-here"
- Select "Google Gemini TTS" in the WebUI audio settings
- Choose one of the 15 available voices (e.g., "gemini:Kore-Female")
- Generate videos with high-quality Gemini narration
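The log lines in the comment further down show "voice name: Kore" for the "Kore" voice, which suggests the "gemini:" prefix and the gender suffix are stripped from the selected label before the API call. A minimal sketch of that parsing step, assuming the "gemini:<Voice>-<Gender>" format; parse_gemini_voice is a hypothetical helper, not the PR's code.

```python
def parse_gemini_voice(label: str) -> str:
    """Turn a label like 'gemini:Kore-Female' into the bare voice name 'Kore'."""
    prefix = "gemini:"
    if not label.startswith(prefix):
        raise ValueError(f"not a Gemini voice label: {label!r}")
    name = label[len(prefix):]
    # Drop a trailing '-Female' / '-Male' tag if present; keep the name otherwise.
    base, sep, _gender = name.rpartition("-")
    return base if sep else name


print(parse_gemini_voice("gemini:Kore-Female"))  # Kore
```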

Dependencies

- google-generativeai (for the Gemini API)
- pydub (for audio processing)
No other new dependencies are required.

Error when generating audio:

2025-10-27 14:29:22 | INFO | "./app\services\voice.py:1472": gemini_tts - start, voice name: Zephyr, try: 1
2025-10-27 14:29:22 | ERROR | "./app\services\voice.py:1557": gemini_tts - Gemini TTS failed, error: Protocol message GenerationConfig has no "response_modalities" field.
2025-10-27 14:29:22 | ERROR | "./app\services\task.py:84": generate_audio - failed to generate audio:

check if the language of the voice matches the language of the video script.
check if the network is available. If you are in China, it is recommended to use a VPN and enable the global traffic mode.
2025-10-27 14:29:22 | ERROR | "./webui\Main.py:971": - Video Generation Failed
2025-10-27 14:29:45 | INFO | "./app\services\voice.py:1472": gemini_tts - start, voice name: Kore, try: 1
2025-10-27 14:29:45 | ERROR | "./app\services\voice.py:1557": gemini_tts - Gemini TTS failed, error: Protocol message GenerationConfig has no "response_modalities" field.
2025-10-27 14:29:45 | INFO | "./app\services\voice.py:1472": gemini_tts - start, voice name: Kore, try: 1
2025-10-27 14:29:45 | ERROR | "./app\services\voice.py:1557": gemini_tts - Gemini TTS failed, error: Protocol message GenerationConfig has no "response_modalities" field.

I'm not in China, though, so what else could cause this?

zxdxjtu added 2 commits July 8, 2025 10:39

fix: make faster_whisper dependency optional

d2706a5

- Add try/except import for faster_whisper
- Gracefully handle missing dependency with warning
- Prevents import errors on systems without faster_whisper
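The commit's approach can be sketched as a guarded import that falls back to a None placeholder and logs a warning, so importing the module never fails. The require_whisper helper is a hypothetical addition that surfaces a clear error only when a feature actually needs the library; it is not from the commit.

```python
import logging

logger = logging.getLogger(__name__)

# Optional dependency: use it if installed, otherwise keep a None placeholder.
try:
    from faster_whisper import WhisperModel
except ImportError:
    WhisperModel = None
    logger.warning("faster_whisper is not installed; features that need it are disabled")


def require_whisper():
    """Return WhisperModel, or raise a clear error at the point of use (hypothetical)."""
    if WhisperModel is None:
        raise RuntimeError("this feature needs faster_whisper: pip install faster-whisper")
    return WhisperModel
```

Deferring the failure to the point of use keeps the TTS path working on systems where faster_whisper cannot be installed.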

zxdxjtu marked this pull request as draft July 8, 2025 02:51

zxdxjtu marked this pull request as ready for review July 8, 2025 02:52

muranja approved these changes Oct 1, 2025

alperktt commented Oct 27, 2025
