Production-deployable fully local AI assistant with zero cloud dependency. 1. Runs Whisper STT, moondream2 VQA, Kokoro TTS, and Phi-3 LLM entirely on local hardware 2. BART-MNLI zero-shot semantic router dispatches queries across three processing paths: personal knowledge retrieval, OS-level tool execution, and general conversation 3. Hybrid RAG using MiniLM dense retrieval via Qdrant plus BM25 sparse, reranked by a cross-encoder 4. GBNF grammar-constrained sampling reduces JSON parse failures and tool hallucination to near zero 5. Next.js frontend with FastAPI orchestrator, fully containerized with Docker