SignScribe is an AI solution that provides real-time sign language translation. It allows deaf and hard-of-hearing people (about 5% of the global population) to follow live events and online content by generating sign language images in real time.
We used Pipecat to orchestrate the entire pipeline, which is structured as follows:
- Pipecat listens to the speaker's audio and transcribes it to text using Google's Cloud Speech-to-Text API.
- The transcribed text is passed to Gemini, which rewrites it in ASL grammar.
- The ASL-grammar text is passed to the Gemini Image Generation API, which generates the sign language images.
- The generated images are passed back through the Pipecat pipeline and displayed to the user.
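
The four steps above can be sketched in plain Python to show how data flows from one stage to the next. This is only an illustration under stated assumptions, not our actual Pipecat code: `SignFrame`, `run_pipeline`, and the three `fake_*` stages are hypothetical names, and the placeholder stages do not call Cloud Speech-to-Text or Gemini (uppercasing the text is just a stand-in for a real ASL translation).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class SignFrame:
    """One unit of output: the transcript, its ASL-grammar form, and image bytes."""
    transcript: str
    gloss: str
    image: bytes


def run_pipeline(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    to_asl_gloss: Callable[[str], str],
    generate_image: Callable[[str], bytes],
) -> Iterator[SignFrame]:
    """Push each audio chunk through the three model stages in order."""
    for chunk in audio_chunks:
        transcript = transcribe(chunk).strip()
        if not transcript:
            continue  # silence or noise: nothing to translate
        gloss = to_asl_gloss(transcript)
        yield SignFrame(transcript, gloss, generate_image(gloss))


# Hypothetical placeholder stages so the sketch runs offline. Real stages
# would call the speech-to-text and Gemini services instead.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_gloss(text: str) -> str:
    return text.upper()  # stand-in only, not a real ASL translation


def fake_image(gloss: str) -> bytes:
    return f"<image:{gloss}>".encode("utf-8")


frames = list(
    run_pipeline([b"hello", b"  ", b"thank you"], fake_transcribe, fake_gloss, fake_image)
)
for frame in frames:
    # The display step: here we only print, a real client would render the image.
    print(frame.gloss, len(frame.image))
```

Passing the stages in as callables keeps each service swappable, which is roughly the role the orchestrator plays: each stage only needs to know the type of data it receives and emits.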
- Pipecat
- Gemini
- Cloud Speech-to-Text API
- Generative Language API
We started this project from scratch. We got the idea on our way to the hackathon this morning, so all the code was written today. Both technologies are new to us, so we had to learn them on the fly!
It was our first time using GCP, so it took a while to get everything set up.
