Designed and deployed an LLM-powered pipeline that extracts structured knowledge from scientific literature at scale.
- Built a hybrid retrieval-and-extraction system that ingests PDF uploads, arXiv papers, and raw text and converts unstructured research documents into structured JSON.
- Enabled automated information extraction and summarization over a corpus of 200M+ papers accessed via APIs (Semantic Scholar) and retrieval systems.
- Reduced end-to-end latency from ~45s to under 4s by migrating from local inference (Ollama/Qwen) to Groq-accelerated Llama 3.1 with optimized prompt execution.
- Engineered a reliability layer (retry logic, schema validation, output normalization) that yields consistent, schema-conformant results and recovers from transient failures.
- Structured the system as modular components, integrating the LLM pipeline with semantic retrieval for scalable, production-ready deployment.
Tech Stack: Python, Llama 3.1 (Groq), Streamlit, Sentence-Transformers, Semantic Scholar API, Async Processing
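The reliability layer is described only at a high level. As a minimal sketch, assuming the model is prompted to return a single JSON object, retry-with-validation could look like the code below. The schema fields and the names `extract_with_retries`, `validate`, `normalize`, and `ExtractionError` are hypothetical; `call_llm` stands in for whatever client (for example, a Groq client) the real pipeline uses, and the backoff policy is an assumption. Only the Python standard library is needed.

```python
from __future__ import annotations

import json
import time
from typing import Any, Callable

# Hypothetical extraction schema (field name -> expected type); the real
# pipeline's schema is not specified, so these fields are illustrative only.
SCHEMA: dict[str, type] = {
    "title": str,
    "methods": list,
    "datasets": list,
    "findings": list,
}


class ExtractionError(Exception):
    """Raised when no attempt produced a schema-valid result."""


def extract_json_object(text: str) -> dict[str, Any]:
    """Parse the outermost {...} span, since models often wrap JSON in prose."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])  # JSONDecodeError is a ValueError


def validate(obj: dict[str, Any]) -> None:
    """Check that every schema field is present with the expected type."""
    for field, expected in SCHEMA.items():
        if field not in obj:
            raise ValueError(f"missing field: {field}")
        if not isinstance(obj[field], expected):
            raise ValueError(f"field {field!r} should be {expected.__name__}")


def normalize(obj: dict[str, Any]) -> dict[str, Any]:
    """Keep only schema fields, trim strings, drop empty and duplicate list items."""
    out: dict[str, Any] = {}
    for field in SCHEMA:
        value = obj[field]
        if isinstance(value, str):
            out[field] = value.strip()
            continue
        items: list[str] = []
        for item in value:
            cleaned = str(item).strip()
            if cleaned and cleaned not in items:
                items.append(cleaned)
        out[field] = items
    return out


def extract_with_retries(
    call_llm: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> dict[str, Any]:
    """Call the model until its output parses and validates, backing off exponentially."""
    last_error: Exception | None = None
    for attempt in range(max_attempts):
        try:
            obj = extract_json_object(call_llm(prompt))
            validate(obj)
            return normalize(obj)
        except (ValueError, ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                time.sleep(base_delay * 2**attempt)  # 0.5s, 1s, 2s, ...
    raise ExtractionError(f"no valid output after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    # Stand-in for a real model client: the first reply is malformed, the second is valid.
    replies = iter([
        "Here is the data: {not valid json}",
        'Result: {"title": "  Example Paper ", "methods": ["Transformer", " Transformer", ""], '
        '"datasets": ["Dataset A"], "findings": ["Finding 1"]}',
    ])
    print(extract_with_retries(lambda _prompt: next(replies), "Extract fields as JSON.", base_delay=0.0))
```

Running the demo retries once after the malformed reply and prints `{'title': 'Example Paper', 'methods': ['Transformer'], 'datasets': ['Dataset A'], 'findings': ['Finding 1']}`. In this sketch, malformed JSON, schema violations, and transient network errors are all treated as retryable, while any other exception propagates immediately.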