Bar Keepers

AI-Powered Educational Raps: An Automated Video Generation Pipeline

Final - Instagram Post

This project is an automated pipeline that transforms any educational topic into a short, engaging music video featuring a talking head avatar, dynamic subtitles, and AI-generated music. The final output is designed for short-form video platforms like TikTok and Instagram to make learning fun and memorable.

The entire pipeline is built to run on Modal, leveraging the power of a single NVIDIA H100 GPU to accelerate the most demanding AI tasks, from 3D head generation to audio transcription[1][3][5][7].

The Concept

The core idea is to fuse education with entertainment. Instead of dry text, users learn through a catchy rap song, with the key information reinforced through a visually engaging video format.

The Automated Workflow

The project consists of several automated stages that work together to produce the final video:

Lyric Generation: An LLM creates educational rap lyrics about a user-provided topic.
Voice Synthesis: The lyrics are converted into a high-quality, rap-style voice using ElevenLabs.
Music Generation: A background beat is generated to match the rhythm and style of the vocals.
Talking Head Synthesis: The core of the project. Using the Real3D-Portrait model, a realistic 3D talking head is generated, with lip movements perfectly synced to the rap audio[4].
Word-by-Word Subtitles: The audio is transcribed using whisper-timestamped to get precise timings for every word. These are then rendered onto the video as a "karaoke-style" animation.

Tech Stack & Key Components

This project integrates several cutting-edge AI models and tools:

3D Talking Head Generation: Based on the ICLR 2024 Spotlight paper Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis.
Cloud Compute & Infrastructure: All heavy computation is offloaded to Modal, running containerized environments on demand. The most intensive tasks leverage the speed of an NVIDIA H100 80GB GPU for fast processing.
Audio Transcription: whisper-timestamped is used to obtain precise word-level start and end times for dynamic subtitle animation.
Video & Audio Processing: FFmpeg and MoviePy are used for pre-processing input data and rendering the final subtitled video.
Containerization: The entire complex software environment, including specific versions of PyTorch, CUDA, and system libraries, is defined in a Dockerfile to ensure reproducibility[1].

Project Structure

The repository is organized into a clean and scalable structure:

Knowunity_project/
├── .gitignore
├── Dockerfile
├── README.md
├── data/
│ ├── raw/
│ └── processed/
├── docs/
├── notebooks/
├── output/
└── src/
├── helpers/
├── add_subtitles_modal.py
├── preprocess_data.py
└── run_modal.py

How to Run

This project is designed to be run from your local machine, with all heavy lifting executed remotely on Modal.

1. Setup

First, ensure you have Modal installed and configured:

Install the Modal client pip install modal

Set up your Modal token (this will open a browser window) modal token new

Clone this repository git clone https://github.com/Poyen-Chen/Knowunity_project.git cd Knowunity_project

Make sure you have a Hugging Face secret named 'huggingface-secret' in your Modal account. You can create one on the Modal website with your Hugging Face token. text

2. Usage Workflow

The process is broken down into three simple command-line steps. Run these commands from the root directory of the project (Knowunity_project/).

Step 1: Pre-process Your Data

Prepare your driving audio and video files. This script converts them to the formats required by the AI models. Place your raw files in the data/raw/ folder first.

Example: python src/preprocess_data.py --input-dir data/raw --output-dir data/processed --audio-file your_audio.mp3 --video-file your_video.mp4

text The processed files will be saved in data/processed/.

Step 2: Generate the Main Video

This command runs the core pipeline on Modal, using an H100 GPU to generate the talking head video. It uses your processed data as input.

Example: python src/run_modal.py --src-img data/raw/your_source_image.png --drv-aud data/processed/your_audio_16khz.wav --drv-pose data/processed/your_video_512x512.mp4 --bg-img data/raw/your_background.png --out-name my_video.mp4

text This will save the result to the output/ folder.

Step 3: Add Dynamic Subtitles

This command takes the generated video, transcribes it, and burns in the word-by-word animated subtitles[1].

Example: python src/add_subtitles_modal.py --input-video output/my_video.mp4 --output-video output/my_video_with_subs.mp4 --gpu T4 --model medium

text The final, shareable video will be saved in the output/ folder.

Acknowledgements

This project's 3D talking head generation is powered by the incredible work from the authors of Real3D-Portrait.
Cloud infrastructure and GPU acceleration are provided by Modal Labs.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
.ipynb_checkpoints		.ipynb_checkpoints
notebook		notebook
src		src
.gitignore		.gitignore
11L-A_high-energy_West_C-1749927461988.mp3		11L-A_high-energy_West_C-1749927461988.mp3
Dockerfile		Dockerfile
ElevenLabs_2025-06-14T17_26_15_KL_ivc_sp100_s50_sb75_v3.mp3		ElevenLabs_2025-06-14T17_26_15_KL_ivc_sp100_s50_sb75_v3.mp3
README.md		README.md
combined_audio.mp3		combined_audio.mp3
idea.png		idea.png
not_like_us_instrumental.mp3		not_like_us_instrumental.mp3
workflow.png		workflow.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Bar Keepers

Final - Instagram Post

The Concept

The Automated Workflow

Tech Stack & Key Components

Project Structure

How to Run

1. Setup

2. Usage Workflow

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Bar Keepers

Final - Instagram Post

The Concept

The Automated Workflow

Tech Stack & Key Components

Project Structure

How to Run

1. Setup

2. Usage Workflow

Acknowledgements

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages