Skip to content
Back to projects
Media · 2025 ·Creator · In progress

Lyric Video Generator

A tool that turns an audio file and its lyrics into a styled MP4 lyric video — using OpenAI Whisper for word-level timing, then rendering animated text and audio visualizers with optional GPU acceleration.

L

Overview

Making a lyric video by hand means syncing text to audio one line at a time — tedious, finicky work. This tool automates the whole thing: feed it a song and its lyrics, get back a finished, styled lyric video.

What I built

  • Word-level alignment — OpenAI Whisper transcribes with timestamps, then fuzzy matching reconciles the transcription against the real lyrics so timing lands on the right words.
  • Styled rendering — animated text with configurable fonts, colors, and positioning, plus spectrum and waveform visualizer overlays.
  • Built for speed and reach — optional CUDA GPU acceleration (3–5× faster) and three ways to drive it: CLI, GUI, and a FastAPI web interface.

Status

A work in progress with a solid core — the alignment-and-render pipeline works; the hosted web experience is the next step.

Built with Python Whisper MoviePy PyTorch FastAPI
Next project AI Support Engine