Built a production-grade, event-driven audio processing system from scratch. The architecture decouples REST API ingestion from ML inference using Redis Streams as a message broker, enabling horizontal scalability without tight coupling.

Core pipeline:
→ FastAPI handles async job ingestion and real-time status updates via REST
→ Redis Streams queues audio jobs to independent ML workers
→ OpenAI Whisper performs automatic speech recognition
→ pyannote delivers speaker diarization with labeled segments
→ PostgreSQL (Supabase) persists structured transcription data
→ Multi-cloud deployment: Hugging Face Spaces (API) · Upstash Redis (TLS) · Cloudflare Pages (frontend)

Tackled real infrastructure challenges: TLS-secured inter-service communication, environment-based config across a multi-cloud stack, and async job orchestration under load.

Stack: Python · FastAPI · Redis Streams · OpenAI Whisper · pyannote · PostgreSQL · Docker · Supabase · Cloudflare Pages
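As a rough illustration of how the API and the ML workers can stay decoupled through a stream, here is a minimal sketch of the producer and consumer sides. It is not the project's actual code: the stream name `audio:jobs`, the group name `ml-workers`, the field names, and both helper functions are hypothetical. The `client` argument is assumed to be a redis-py `Redis` instance created with `decode_responses=True`, and the consumer group is assumed to exist already.

```python
import uuid
from typing import Any, Callable

STREAM = "audio:jobs"   # hypothetical stream name
GROUP = "ml-workers"    # hypothetical consumer group name


def enqueue_job(client: Any, audio_url: str) -> str:
    """API side: append a job to the stream and return its job id."""
    job_id = str(uuid.uuid4())
    # XADD appends one entry; the API never talks to a worker directly.
    client.xadd(STREAM, {"job_id": job_id, "audio_url": audio_url})
    return job_id


def process_batch(
    client: Any,
    consumer: str,
    handler: Callable[[dict], None],
    count: int = 10,
    block_ms: int = 5000,
) -> int:
    """Worker side: read new entries for this consumer and ack the ones handled.

    Returns the number of entries that were handled and acknowledged.
    """
    # ">" asks for entries never delivered to any consumer in the group.
    response = client.xreadgroup(
        GROUP, consumer, {STREAM: ">"}, count=count, block=block_ms
    )
    handled = 0
    for _stream, entries in response or []:
        for entry_id, fields in entries:
            try:
                handler(fields)  # e.g. transcription + diarization
            except Exception:
                # No XACK: the entry stays in the group's pending list
                # so it can be retried or claimed by another worker.
                continue
            client.xack(STREAM, GROUP, entry_id)
            handled += 1
    return handled
```

With redis-py, the client could be built with something like `redis.Redis.from_url(url, decode_responses=True)`, using a `rediss://` URL for a TLS endpoint, and the group created once via `client.xgroup_create(STREAM, GROUP, id="0", mkstream=True)`. Adding workers then only means starting more processes that call `process_batch` with different consumer names.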