This project predicts the Air Quality Index (AQI) with machine learning, focusing on robust data imputation, outlier handling, and model selection. Missing values were filled with Multiple Imputation by Chained Equations (MICE), using estimators such as Ridge Regression, Bayesian Regression, and Elastic Net, to produce a complete dataset. Outliers were removed before imputation so that extreme values would not skew the imputed estimates.
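The preprocessing described above (outlier removal first, then chained-equation imputation with several estimators) can be sketched with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the project's code. The 1.5 × IQR outlier rule, the pollutant column names, and the synthetic data are hypothetical. scikit-learn's `IterativeImputer` is inspired by MICE but returns a single imputation by default. To approximate multiple imputations, you can run it with `sample_posterior=True` (which needs an estimator such as `BayesianRidge` that supports `return_std`) under different seeds.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (unlocks IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge, ElasticNet, Ridge


def remove_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop rows with any value outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: an IQR rule stands in for the project's (unspecified) outlier method.
    NaNs are skipped by the quantiles and never cause a row to be dropped,
    so they survive for the imputer to fill.
    """
    q1, q3 = df.quantile(0.25), df.quantile(0.75)
    iqr = q3 - q1
    in_range = df.ge(q1 - k * iqr) & df.le(q3 + k * iqr)
    keep = (in_range | df.isna()).all(axis=1)
    return df[keep]


def mice_impute(df: pd.DataFrame, estimator, max_iter: int = 10, seed: int = 0) -> pd.DataFrame:
    """Fill NaNs by regressing each column on the others, round-robin, with `estimator`."""
    imputer = IterativeImputer(estimator=estimator, max_iter=max_iter, random_state=seed)
    filled = imputer.fit_transform(df)
    return pd.DataFrame(filled, columns=df.columns, index=df.index)


# Hypothetical pollutant readings; column names are illustrative only.
rng = np.random.default_rng(42)
data = pd.DataFrame(rng.normal(50, 10, size=(200, 3)), columns=["PM2.5", "NO2", "CO"])
data.iloc[::15, 1] = np.nan  # knock out some NO2 readings
data.iloc[3, 0] = 500.0      # one extreme PM2.5 reading

clean = remove_iqr_outliers(data)  # outliers removed *before* imputation
imputed = {
    name: mice_impute(clean, est)
    for name, est in {
        "Ridge": Ridge(),
        "BayesianRidge": BayesianRidge(),
        "ElasticNet": ElasticNet(),
    }.items()
}
```

Running the imputer once per estimator yields one completed dataset for each. You can then compare these datasets, or feed each one to the models below.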
Models Implemented:
LinearRegression
Ridge
SGDRegressor
ElasticNet
Lasso
SVR
DecisionTreeRegressor
RandomForestRegressor (Top Performer)
GradientBoostingRegressor
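Here is one way to compare the nine models on a common train/test split, sketched with scikit-learn. The synthetic `make_regression` data stands in for the imputed AQI dataset. The 80/20 split, default hyperparameters, and RMSE/R² metrics are all assumptions, because the project does not state its evaluation protocol. On this linear toy data the linear models will naturally do well, so the resulting ranking says nothing about the project's actual AQI results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge, SGDRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the imputed AQI feature matrix and target (assumption).
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# SGD and SVR are especially sensitive to feature scale, so they get a StandardScaler.
models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "SGDRegressor": make_pipeline(StandardScaler(), SGDRegressor(random_state=0)),
    "ElasticNet": ElasticNet(),
    "Lasso": Lasso(),
    "SVR": make_pipeline(StandardScaler(), SVR()),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

# Print models from lowest to highest test RMSE.
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:<26} RMSE={s['rmse']:8.2f}  R2={s['r2']:.3f}")
```

Fixing the split and every `random_state` makes the comparison repeatable. A k-fold cross-validation (for example `cross_val_score`) would give a steadier ranking than a single split.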
After evaluation, the Random Forest Regressor produced the most accurate AQI predictions of the nine models, and visualizations comparing model performance confirmed it as the most reliable predictor. The project serves as a practical framework for AQI prediction in support of air-quality monitoring.
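As a rough illustration of how the selected model might be trained and used, here is a Random Forest sketch. The feature names, the synthetic AQI target (driven mostly by particulates), `n_estimators=200`, and the sample reading are all hypothetical. They show the `RandomForestRegressor` workflow, not the project's data or tuned settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = ["PM2.5", "PM10", "NO2", "SO2", "CO", "O3"]  # hypothetical feature set
X = pd.DataFrame(rng.uniform(0, 200, size=(400, len(features))), columns=features)
# Hypothetical target: AQI dominated by PM2.5, then PM10 and NO2, plus noise.
y = 0.6 * X["PM2.5"] + 0.3 * X["PM10"] + 0.1 * X["NO2"] + rng.normal(0, 5, size=len(X))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0, n_jobs=-1)
rf.fit(X_train, y_train)

test_r2 = rf.score(X_test, y_test)  # R^2 on held-out rows
print(f"Test R^2: {test_r2:.3f}")

# Impurity-based importances: which inputs the forest relied on most.
importances = pd.Series(rf.feature_importances_, index=features).sort_values(ascending=False)
print(importances)

# Predict AQI for one new (hypothetical) reading.
new_reading = pd.DataFrame([[80, 120, 40, 10, 1.2, 60]], columns=features)
predicted_aqi = rf.predict(new_reading)
print(predicted_aqi)
```

Feature importances like these are one natural thing to visualize next to the model comparison. Keep in mind that impurity-based importances can be biased toward high-cardinality features. Permutation importance is a common cross-check.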