A comprehensive collection of demos, tools, and experiments for Google's Gemini Live API, featuring real-time audio/video interaction, function calling, and advanced conversation management.
This repository contains multiple implementations and testing approaches for the Gemini Live API, ranging from simple proof-of-concepts to production-ready voice assistants with advanced features like:
- Real-time Audio/Video Processing: Bidirectional audio streaming with camera and screen capture
- Function Calling: Smart home control, weather queries, and custom tool integration
- Advanced Conversation Management: Context optimization, session handling, and instruction management
- Multiple Interface Options: CLI, web-based, and programmatic interfaces
- Agent Development Kit (ADK) Integration: Professional-grade audio processing capabilities
testing_live_gemini_tool/
├── Core Live API Demos
│ ├── gemini_live_01.py # Basic audio processing with file I/O
│ ├── gemini_live_02.py # Simple text-based live session
│ ├── gemini_live_tool_call.py # Enhanced multimodal demo with tools
│ └── live_simple_cli.py # Official DeepMind CLI implementation
│
├── Function Calling & Tools
│ ├── function_calling.py # Smart home voice assistant
│ └── name_correction.py # Booking management utilities
│
├── Advanced Features
│ ├── adk_audio_to_audio.py # ADK-based audio processing
│ ├── instruction_optimization.py # Context management system
│ └── test_minimal.py # API connection testing
│
├── Web Interface
│ ├── index.tsx # TypeScript web component
│ └── simple_websocket_server.py # WebSocket server
│
├── Configuration & Setup
│ ├── setup.py # Automated setup script
│ ├── requirements.txt # Python dependencies
│ ├── env.txt # Environment configuration reference
│ └── .gitignore # Git ignore rules
│
├── Assets
│ ├── audio.wav # Sample audio output
│ └── sample.wav # Sample audio input
│
└── Virtual Environment
    └── testing_live_function_calling/ # Isolated environment for testing
- Python 3.8+
- Google AI Studio API Key
- Audio devices (microphone/speakers or headphones)
- Webcam (optional, for video demos)
- Clone and navigate to the repository
  cd testing_live_gemini_tool
- Run the automated setup
  python setup.py
- Configure your API key
  - Edit the .env file created by setup
  - Add your Gemini API key from Google AI Studio
  GEMINI_API_KEY=your_api_key_here
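As a minimal sketch of how a script might pick up the key configured above, the helper below checks the process environment first and then falls back to parsing the .env file. The function name `load_api_key` and its parameters are assumptions made for illustration; the demos in this repository may load the key differently.

```python
import os


def load_api_key(env_path=".env", var_name="GEMINI_API_KEY"):
    """Return the API key from the environment, falling back to a .env file.

    Hypothetical helper for illustration; not taken from the repository.
    """
    key = os.environ.get(var_name)
    if key:
        return key

    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as env_file:
            for line in env_file:
                line = line.strip()
                # Skip blank lines, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                name, _, value = line.partition("=")
                if name.strip() == var_name:
                    value = value.strip().strip('"').strip("'")
                    if value:
                        return value

    raise RuntimeError(f"{var_name} not found; set it in {env_path} or export it")
```

Checking the environment first means a key exported in the shell overrides the file, which is convenient when the same code runs in production without a .env file.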
If you prefer manual installation:
# Install core dependencies
pip install google-genai opencv-python pyaudio pillow mss websockets
# Install additional dependencies for advanced features
pip install google-adk daily-python pipecat-ai
# Install web dependencies (if using TypeScript components)
npm install @google/genai lit
Simple audio-to-audio processing using the Gemini Live API:
python gemini_live_01.py
Features:
- Loads audio from sample.wav
- Processes through Gemini Live API
- Outputs response to audio.wav
- Demonstrates basic audio format conversion
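The file I/O side of this flow can be sketched with the standard library alone. The example below assumes the response arrives as raw 16-bit mono PCM at 24,000 Hz (the output settings listed later in this README) and wraps it in a WAV container; the helper names are illustrative and are not taken from gemini_live_01.py.

```python
import wave

OUTPUT_SAMPLE_RATE = 24_000  # Hz, output rate listed in this README
SAMPLE_WIDTH_BYTES = 2       # 16-bit PCM
CHANNELS = 1                 # mono


def write_pcm_to_wav(pcm_bytes, path, sample_rate=OUTPUT_SAMPLE_RATE):
    """Wrap raw PCM bytes in a WAV header so ordinary players can open the file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)


def read_wav_as_pcm(path):
    """Return (raw_frames, sample_rate, channels, sample_width) from a WAV file."""
    with wave.open(path, "rb") as wav_file:
        params = wav_file.getparams()
        frames = wav_file.readframes(params.nframes)
    return frames, params.framerate, params.nchannels, params.sampwidth
```

Reading the parameters back before sending audio is a cheap way to notice an input file whose rate or channel count differs from what the API expects.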
Minimal text-based live session:
python gemini_live_02.py
Features:
- Simple "Hello" message processing
- Text-only response modality
- Demonstrates basic session management
Full-featured demo with audio, video, and function calling:
python gemini_live_tool_call.py --mode camera
# or
python gemini_live_tool_call.py --mode screen
Features:
- Real-time camera or screen capture
- Bidirectional audio streaming
- Function calling capabilities
- Enhanced logging and conversation management
- Session statistics and conversation saving
- Multiple input modes (text + audio + video)
Commands:
- help - Show available commands
- stats - Display session statistics
- save [filename] - Save conversation log
- clear - Clear screen
- q or quit - Exit
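One way to separate these console commands from ordinary chat input is a small parser like the sketch below. It is an assumption about how such a loop could be structured, not the code in gemini_live_tool_call.py; the `parse_command` name and the `"message"` fallback are hypothetical.

```python
def parse_command(line):
    """Classify a console line as a built-in command or a chat message.

    Returns a (kind, argument) tuple. Hypothetical helper for illustration.
    """
    text = line.strip()
    if not text:
        return None, None

    parts = text.split(maxsplit=1)
    command = parts[0].lower()
    argument = parts[1] if len(parts) > 1 else None

    if command in ("q", "quit") and argument is None:
        return "quit", None
    if command in ("help", "stats", "clear") and argument is None:
        return command, None
    if command == "save":
        # argument is the optional filename; None lets the caller pick a default.
        return "save", argument

    # Anything else is forwarded to the model as a normal text turn.
    return "message", text
```

Requiring that `help`, `stats`, and `clear` appear alone keeps a sentence such as "help me book a flight" from being swallowed as a command.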
Production-ready voice assistant with smart home integration:
python function_calling.py
Capabilities:
- Control smart lights (on/off/dim)
- Get weather information
- Adjust thermostat settings
- Natural language processing
- Audio and text input/output
Example Commands:
- "Turn on the living room lights"
- "What's the weather in San Francisco?"
- "Set the temperature to 72 degrees"
- "Dim the bedroom lights to 30%"
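Requests like these are typically mapped to tools through a declaration (a name, a description, and a JSON-schema-style parameter block) plus a local handler that runs when the model asks for the call. The sketch below shows that pattern with plain dictionaries and stub handlers. Every function name, field, and return value here is hypothetical and is not taken from function_calling.py.

```python
# Hypothetical declaration in the JSON-schema style used for function calling.
SET_LIGHT_DECLARATION = {
    "name": "set_light",
    "description": "Turn a light on or off, optionally dimming it.",
    "parameters": {
        "type": "object",
        "properties": {
            "room": {"type": "string"},
            "state": {"type": "string", "enum": ["on", "off"]},
            "brightness": {"type": "integer", "description": "0-100 percent"},
        },
        "required": ["room", "state"],
    },
}


def set_light(room, state, brightness=None):
    """Stub handler: a real assistant would talk to a smart home hub here."""
    return {"room": room, "state": state, "brightness": brightness}


def set_thermostat(temperature_f):
    """Stub handler: echoes the requested temperature."""
    return {"temperature_f": temperature_f}


def get_weather(city):
    """Stub handler: no real weather service is queried in this sketch."""
    return {"city": city, "forecast": "unavailable in this sketch"}


HANDLERS = {
    "set_light": set_light,
    "set_thermostat": set_thermostat,
    "get_weather": get_weather,
}


def dispatch_tool_call(name, args):
    """Route a tool call requested by the model to the matching local handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown function: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised when the model sends missing or unexpected arguments.
        return {"error": str(exc)}
```

Under this sketch, "Dim the bedroom lights to 30%" would arrive as a `set_light` call with `room="bedroom"`, `state="on"`, and `brightness=30`, and the dictionary returned by the handler would be sent back to the model as the tool response.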
Professional-grade audio processing using Google's ADK:
python adk_audio_to_audio.py
Features:
- Advanced audio processing pipeline
- Name correction utilities integration
- Session management with persistent context
- Order status checking functionality
- Professional audio quality optimization
TypeScript/Lit-based web component for browser integration:
Features:
- Browser-based audio recording
- Real-time audio playback
- Visual audio indicators
- WebRTC audio processing
- Responsive web interface
Comprehensive API connection testing:
python test_minimal.py
Tests:
- Multiple Gemini model compatibility
- API key validation
- Connection stability
- Error handling verification
Intelligent conversation context management:
Features:
- Dynamic prompt assembly
- Session-based context injection
- Conversation phase tracking
- Memory-efficient context segmentation
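To make dynamic prompt assembly and phase tracking concrete, here is a rough sketch of the idea: keep one short instruction segment per conversation phase and include only the segment for the current phase. The class and field names mirror the ones imported in this README's usage example, but the definitions, the extra phases, and `assemble_prompt` are assumptions for illustration and do not reproduce instruction_optimization.py.

```python
from dataclasses import dataclass
from enum import Enum


class ConversationPhase(Enum):
    GREETING = "greeting"
    TASK = "task"        # hypothetical phase
    CLOSING = "closing"  # hypothetical phase


@dataclass
class SessionMetadata:
    user_id: str
    session_id: str
    current_phase: ConversationPhase


# Hypothetical instruction segments; only one is sent per turn.
BASE_INSTRUCTION = "You are a helpful voice assistant."
PHASE_SEGMENTS = {
    ConversationPhase.GREETING: "Greet the caller and ask how you can help.",
    ConversationPhase.TASK: "Focus on completing the caller's request.",
    ConversationPhase.CLOSING: "Summarize what was done and say goodbye.",
}


def assemble_prompt(session):
    """Build the system instruction from the base text plus the current phase."""
    segments = [
        BASE_INSTRUCTION,
        PHASE_SEGMENTS[session.current_phase],
        f"Session {session.session_id} for user {session.user_id}.",
    ]
    return "\n".join(segments)
```

Sending only the active segment keeps the instruction short, which is the point of segmenting the context in the first place.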
Usage:
from instruction_optimization import SessionMetadata, ConversationPhase
# Create session metadata
session = SessionMetadata(
    user_id="user123",
    session_id="session456",
    current_phase=ConversationPhase.GREETING,
)
Booking management and name correction system:
Correction Types:
- Spelling corrections
- Name swaps
- Gender corrections
- Maiden name changes
- Title removals
Reference your environment variables against env.txt for complete setup:
- Google Cloud credentials
- API keys and authentication
- Audio processing libraries
- Development dependencies
- gemini-2.0-flash-exp - Latest experimental model
- gemini-2.0-flash - Production flash model
- gemini-1.5-flash - Fast processing model
- gemini-1.5-pro - Advanced reasoning model
- gemini-2.5-flash-preview-native-audio-dialog - Native audio dialog
- gemini-live-2.5-flash-preview - Live preview model
Audio input:
- Format: PCM 16-bit
- Sample Rate: 16,000 Hz
- Channels: Mono
- Chunk Size: 1024 bytes
Audio output:
- Format: PCM 16-bit
- Sample Rate: 24,000 Hz
- Channels: Mono
- Voice Options: Zephyr, Kore
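These settings determine how much audio each chunk carries. As a quick worked calculation, taking the 1024 figure as bytes (as stated above), a 16-bit mono chunk holds 512 samples, which is 32 ms at 16,000 Hz. The helper names below are illustrative.

```python
def chunk_duration_ms(chunk_bytes, sample_rate_hz, sample_width_bytes=2, channels=1):
    """Duration in milliseconds of one chunk of raw PCM audio."""
    frames = chunk_bytes / (sample_width_bytes * channels)
    return 1000.0 * frames / sample_rate_hz


def bytes_per_second(sample_rate_hz, sample_width_bytes=2, channels=1):
    """Raw PCM data rate in bytes per second."""
    return sample_rate_hz * sample_width_bytes * channels


if __name__ == "__main__":
    print(chunk_duration_ms(1024, 16_000))  # 16,000 Hz stream
    print(chunk_duration_ms(1024, 24_000))  # 24,000 Hz stream
    print(bytes_per_second(16_000), bytes_per_second(24_000))
```

If the chunk size is in fact 1024 frames rather than bytes (audio libraries often count frames), each chunk at 16,000 Hz would instead last 64 ms.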
- API Key Protection
  - Store API keys in .env files
  - Never commit API keys to version control
  - Use environment variables in production
- Audio Privacy
  - Use headphones to prevent audio feedback
  - Be aware of microphone permissions
  - Consider audio data handling policies
- Function Calling Security
  - Validate all function parameters
  - Implement proper error handling
  - Use permission-based access controls
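Parameter validation matters because function arguments come from model output and should be treated as untrusted input. A minimal sketch for a light-control call is shown below; the argument names, the room allow-list, and the 0-100 brightness range are assumptions for illustration.

```python
ALLOWED_ROOMS = {"living room", "bedroom", "kitchen"}  # hypothetical allow-list
ALLOWED_STATES = {"on", "off"}


def validate_light_args(args):
    """Return a list of problems with the arguments; an empty list means valid."""
    errors = []

    room = args.get("room")
    if not isinstance(room, str) or room.lower() not in ALLOWED_ROOMS:
        errors.append(f"unknown room: {room!r}")

    state = args.get("state")
    if state not in ALLOWED_STATES:
        errors.append(f"state must be 'on' or 'off', got {state!r}")

    brightness = args.get("brightness")
    if brightness is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        is_int = isinstance(brightness, int) and not isinstance(brightness, bool)
        if not is_int or not 0 <= brightness <= 100:
            errors.append(f"brightness must be an integer from 0 to 100, got {brightness!r}")

    return errors
```

Returning the full list of problems, rather than raising on the first one, lets the assistant report everything back to the model in a single tool response so it can retry with corrected arguments.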
- Audio Feedback
  - Solution: Use headphones instead of speakers
  - Cause: Microphone picks up speaker output
- API Connection Errors
  - Solution: Verify API key in .env file
  - Check: Internet connection and firewall settings
- Module Import Errors
  - Solution: Install missing dependencies with pip install -r requirements.txt
  - Check: Virtual environment activation
- Camera/Screen Capture Issues
  - Solution: Grant necessary permissions to terminal/application
  - macOS: System Preferences → Security & Privacy → Camera/Screen Recording
- API Key not found: Check .env file configuration
- Connection refused: Verify network connectivity
- Audio device error: Check microphone/speaker connections
- Permission denied: Grant required system permissions
- Use appropriate chunk sizes (1024-2048 bytes)
- Implement proper buffering strategies
- Consider audio compression for network efficiency
- Limit frame rate to 1 FPS for efficiency
- Resize images to max 1024x1024
- Use JPEG compression for bandwidth optimization
- Implement async function execution
- Use proper error handling and timeouts
- Cache frequently used data
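The async execution and timeout advice can be sketched with `asyncio.wait_for`, which cancels a coroutine that runs longer than the allowed time. The wrapper and the example handler below are hypothetical and only show the pattern.

```python
import asyncio


async def run_tool_with_timeout(handler, args, timeout_s=5.0):
    """Await an async tool handler, converting timeouts and errors into results."""
    try:
        result = await asyncio.wait_for(handler(**args), timeout=timeout_s)
        return {"ok": True, "result": result}
    except asyncio.TimeoutError:
        return {"ok": False, "error": f"timed out after {timeout_s} s"}
    except Exception as exc:  # keep the live session alive if a handler fails
        return {"ok": False, "error": str(exc)}


async def slow_weather_lookup(city):
    """Hypothetical handler that simulates a slow network call."""
    await asyncio.sleep(0.2)
    return {"city": city, "forecast": "placeholder"}


if __name__ == "__main__":
    print(asyncio.run(run_tool_with_timeout(slow_weather_lookup, {"city": "San Francisco"})))
    print(asyncio.run(run_tool_with_timeout(slow_weather_lookup, {"city": "San Francisco"}, timeout_s=0.05)))
```

Returning a structured error instead of raising means a slow or failing tool produces a response the model can react to, rather than stalling the audio stream.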
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Google AI Studio - Get your API key
- Gemini Live API Documentation - Official documentation
- Google ADK Documentation - Agent Development Kit
- WebRTC Documentation - Web audio standards
For issues and questions:
- Check the troubleshooting section above
- Review the official Gemini documentation
- Open an issue in this repository
- Join the Google AI developer community
Note: This is an experimental project for testing and learning purposes. Use responsibly and in accordance with Google's terms of service.