This project is a Python system for detecting duplicate questions on Stack Overflow. It combines a Convolutional Neural Network (CNN) with Natural Language Processing (NLP) techniques to identify semantically similar questions. A Flask-based web interface lets users submit individual questions for analysis, process multiple questions in batches, and train the model from the UI with progress tracking. Training runs in a background thread so that the UI is not blocked. The interface returns predictions in real time and displays model performance statistics. Sample questions from Stack Overflow are included for convenience.

Prerequisites: Python 3.7+, pip, and Git.
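The background training with progress tracking described above can be illustrated with a minimal sketch. It uses only the standard library and is not the project's actual code: the `TrainingJob` class, its method names, and the placeholder `_train_one_epoch` step are all hypothetical. In a Flask app, one route would presumably call `start()` and another would return `progress()` as JSON for the UI to poll.

```python
import threading
import time


class TrainingJob:
    """Hypothetical helper: runs a placeholder training loop in a
    background thread and exposes its progress to other threads."""

    def __init__(self, epochs=5):
        self.epochs = epochs
        self._lock = threading.Lock()  # guards _state across threads
        self._state = {"status": "idle", "epoch": 0, "epochs": epochs}
        self._thread = None

    def _train_one_epoch(self, epoch):
        # Placeholder for real work (assumption): a real implementation
        # would run one pass of CNN training here.
        time.sleep(0.01)

    def _run(self):
        for epoch in range(1, self.epochs + 1):
            self._train_one_epoch(epoch)
            with self._lock:
                self._state["epoch"] = epoch
        with self._lock:
            self._state["status"] = "done"

    def start(self):
        """Start training in the background; return False if already running."""
        with self._lock:
            if self._state["status"] == "running":
                return False
            self._state.update(status="running", epoch=0)
        # daemon=True so a stuck training thread does not block interpreter exit
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return True

    def progress(self):
        """Return a snapshot copy of the current state, safe to serialize."""
        with self._lock:
            return dict(self._state)

    def wait(self, timeout=None):
        """Block until training finishes (useful in scripts and tests)."""
        if self._thread is not None:
            self._thread.join(timeout)
```

Because `start()` returns immediately, the request that triggers training can respond right away, and the UI can poll `progress()` to update a progress bar. The lock keeps the state dictionary consistent when the web thread reads it while the training thread writes to it.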
Built with