Built a Tic Tac Toe simulator with two competing agents: one trained with reinforcement learning (Q-learning) and one using MiniMax search with memoization.
Trained the RL agents using Q-learning with ε-greedy exploration, converging to near-optimal play after 10,000+ episodes.
Implemented MiniMax with memoization, achieving perfect adversarial play and outperforming under-trained RL agents.
Created an interactive environment in Jupyter Notebook for step-by-step simulation of RL vs RL, RL vs MiniMax, and MiniMax vs MiniMax matches.
Added dynamic performance tracking with plots and tables of win/loss/draw statistics.
Demonstrated the contrast between model-free learning and deterministic search, using learning curves and match outcomes to compare the two approaches.
Tech Stack: Python · Jupyter Notebook · NumPy · Matplotlib
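
The two techniques named above can be illustrated with a minimal sketch. This is not the project's actual code: the 9-character string board encoding, the function names (`winner`, `minimax`, `best_move`, `epsilon_greedy`, `q_update`), the dictionary Q-table, and the `alpha`/`gamma` defaults are all assumptions made for illustration. Memoization here is done with the standard library's `functools.lru_cache`.

```python
import random
from functools import lru_cache

# Assumed encoding: a 9-character string, one of "X", "O", or " " per cell.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)  # memoization: each (board, player) pair is solved once
def minimax(board, player):
    """Game value with `player` to move: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    nxt = "O" if player == "X" else "X"
    values = [minimax(board[:i] + player + board[i + 1:], nxt)
              for i, cell in enumerate(board) if cell == " "]
    return max(values) if player == "X" else min(values)

def best_move(board, player):
    """Pick the empty cell with the best minimax value for `player`."""
    nxt = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == " "]
    score = lambda i: minimax(board[:i] + player + board[i + 1:], nxt)
    return max(moves, key=score) if player == "X" else min(moves, key=score)

def epsilon_greedy(q, board, epsilon, rng=random):
    """With probability epsilon explore a random move, otherwise exploit the Q-table."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda a: q.get((board, a), 0.0))

def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update; `next_state` is the board the agent sees next."""
    next_moves = [i for i, cell in enumerate(next_state) if cell == " "]
    terminal = winner(next_state) is not None or not next_moves
    best_next = 0.0 if terminal else max(q.get((next_state, a), 0.0) for a in next_moves)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

In this sketch an unseen state-action pair defaults to a Q-value of 0.0, and `minimax(" " * 9, "X")` evaluates the empty board, which is a draw under optimal play from both sides.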