Built a fully deployed, production-ready medical chatbot that answers health questions using Retrieval-Augmented Generation grounded entirely in the Gale Encyclopedia of Medicine, not the model's general knowledge.
How it works: Medical PDFs are split into 700-character chunks with an 80-character overlap, embedded with BAAI/bge-small-en-v1.5, and indexed in Pinecone. Retrieval uses MMR (maximal marginal relevance) search so results are relevant but not redundant. LLaMA 3.1 8B Instant, served through the Groq API, generates answers strictly from the retrieved context, cutting response time from 60-90s on local Mistral/GGUF to 2-5s.
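To make the chunking and MMR steps concrete, here is a minimal pure-Python sketch. It is an illustration, not the project's code: the project uses LangChain and Pinecone, whose splitter and MMR implementations may differ in detail. The function names (`chunk_text`, `cosine`, `mmr`) and the `lambda_mult` default are assumptions made for this example.

```python
import math


def chunk_text(text, size=700, overlap=80):
    """Split text into character windows of `size` that overlap by `overlap`.

    Plain sliding window for illustration; a real splitter may also
    respect paragraph or sentence boundaries.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # last window reached the end
            break
    return chunks


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance.

    Each step picks the candidate that maximises
    lambda_mult * relevance - (1 - lambda_mult) * redundancy,
    where redundancy is the highest similarity to anything already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lower values trade relevance for diversity, which is what keeps near-duplicate encyclopedia passages from crowding the context window.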
Hallucination control: A 3-layer filtering pipeline (an LLM medical-intent check, a similarity threshold of ≥ 0.75, and a refusal-phrase check) stops the bot from answering anything outside medicine. In testing, all 35 non-medical questions spanning tech, geography, and general knowledge were correctly refused (35/35), and all 15 medical questions were answered accurately with verified page citations (15/15).
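The three layers could be wired together roughly as follows. This is a sketch under stated assumptions rather than the project's actual implementation: the callables `is_medical`, `retrieve`, and `generate` are hypothetical stand-ins for the LLM intent check, the Pinecone retriever, and the Groq call, and the refusal phrases and refusal message are invented for illustration. It also assumes the refusal-phrase check is applied to the model's draft answer.

```python
# Hypothetical refusal message and phrases, chosen for this example only.
REFUSAL = "I can only answer medical questions covered by the encyclopedia."
REFUSAL_PHRASES = ("i don't know", "not in the provided context")


def answer_with_guards(question, is_medical, retrieve, generate, threshold=0.75):
    """Run a question through three filters before returning an answer.

    is_medical(question) -> bool
    retrieve(question)   -> list of (chunk_text, similarity_score)
    generate(question, chunks) -> str
    """
    # Layer 1: intent check. Reject questions that are not medical.
    if not is_medical(question):
        return REFUSAL

    # Layer 2: similarity threshold. Keep only sufficiently close chunks.
    hits = [(text, score) for text, score in retrieve(question) if score >= threshold]
    if not hits:
        return REFUSAL

    # Layer 3: refusal-phrase check. If the model itself signals that the
    # context did not contain an answer, return the standard refusal.
    draft = generate(question, [text for text, _ in hits])
    if any(phrase in draft.lower() for phrase in REFUSAL_PHRASES):
        return REFUSAL
    return draft
```

Ordering the checks from cheapest rejection to most expensive means an off-topic question never reaches retrieval or generation.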
Beyond the model, it's a full product: Google OAuth login with guest mode; PostgreSQL-backed session history, bookmarks, and persistent source citations; a dark/light theme; text-to-speech playback; HTML chat export; and an in-app issue reporter. It is containerized with Docker and served through Nginx with SSL on AWS EC2.
Stack: Python, Flask, LangChain, Groq API, Pinecone, PostgreSQL, SQLAlchemy, Docker, AWS EC2, Nginx, Google OAuth 2.0
🔗 Live: https://medencyclo.duckdns.org
💻 Open-source: MIT licensed