Scope-3 emissions estimation

Estimate Scope-3 emissions & score supplier risk

Open Source

Project Overview

This project provides an end-to-end data pipeline and predictive analytics system designed to estimate Scope-3 emissions, identify high-risk suppliers, and deliver actionable insights through an interactive Power BI dashboard. It addresses the challenge of quantifying Scope-3 emissions, particularly Category 1 (Purchased Goods & Services), by mapping procurement spend to emission factors and developing a supplier-level risk score.
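As a rough illustration of the spend-based approach described above, the sketch below maps each procurement line to a NAICS code, joins an emission factor, imputes missing factors, and multiplies spend by the factor. It is a minimal sketch, not the project's actual pipeline: the column names, the `estimate_line_emissions` helper, the median-imputation rule, and the factor values are all assumptions made for illustration, and the factors are placeholders rather than real USEEIO figures.

```python
import pandas as pd

# Hypothetical procurement lines; column names are illustrative only.
procurement = pd.DataFrame({
    "supplier": ["Acme", "Acme", "Borealis", "Cobalt"],
    "category": ["Steel", "Freight", "Steel", "Consulting"],
    "spend_usd": [100_000.0, 20_000.0, 50_000.0, 30_000.0],
})

# Hypothetical category -> NAICS mapping, chosen for illustration.
category_to_naics = {
    "Steel": "331110",
    "Freight": "484121",
    "Consulting": "541611",
}

# Placeholder emission factors in kg CO2e per USD (NOT real USEEIO values).
# "Consulting" is deliberately absent to exercise the imputation step.
factors = pd.DataFrame({
    "naics": ["331110", "484121"],
    "kg_co2e_per_usd": [1.0, 0.5],
})


def estimate_line_emissions(lines, mapping, ef):
    """Return a copy of `lines` with NAICS code, factor, and line-level emissions."""
    out = lines.copy()
    out["naics"] = out["category"].map(mapping)
    out = out.merge(ef, on="naics", how="left")
    # Assumed imputation rule: fall back to the median of the known factors,
    # and flag the imputed rows so they can be reviewed later.
    out["factor_imputed"] = out["kg_co2e_per_usd"].isna()
    fallback = ef["kg_co2e_per_usd"].median()
    out["kg_co2e_per_usd"] = out["kg_co2e_per_usd"].fillna(fallback)
    out["emissions_kg_co2e"] = out["spend_usd"] * out["kg_co2e_per_usd"]
    return out


line_emissions = estimate_line_emissions(procurement, category_to_naics, factors)
supplier_emissions = line_emissions.groupby("supplier")["emissions_kg_co2e"].sum()
print(supplier_emissions)
```

Flagging imputed rows, as `factor_imputed` does here, keeps the estimate auditable: totals that lean heavily on fallback factors can be treated with more caution.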

Key Features:

ETL Pipeline: Processes raw procurement data, maps categories to NAICS codes, joins with EPA emission factors, handles imputation for missing data, and calculates line-level emissions.
Supplier Risk Scoring: Generates a composite score based on normalized emissions and spend to prioritize suppliers.
Machine Learning Model: Uses a Random Forest model to predict supplier emissions, with reported results of 97.6% accuracy and R² = 0.9968. Linear Regression, Ridge Regression, and Gradient Boosting were also evaluated.
Data Visualization: Presents findings through a comprehensive 4-page Power BI dashboard, including an executive overview, supplier and category analysis, and supplier drilldown views.
Data Sources: Leverages procurement data (e.g., from Kaggle), US EPA USEEIO emission factors, and NAICS codes for accurate estimation.
Methodology: Employs spend-based emissions calculation aligned with the GHG Protocol Scope 3 Standard and US EPA USEEIO methodology.
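The supplier risk score listed above is described only as a composite of normalized emissions and spend. One plausible way to build such a score is a weighted sum of min-max normalized values, sketched below. The weights (0.7 and 0.3), the `min_max` and `risk_score` helpers, and the sample figures are assumptions for illustration; the project's actual weighting and normalization may differ.

```python
import pandas as pd

# Hypothetical supplier-level totals; values are illustrative only.
suppliers = pd.DataFrame({
    "supplier": ["Acme", "Borealis", "Cobalt"],
    "emissions_kg_co2e": [110_000.0, 50_000.0, 22_500.0],
    "spend_usd": [120_000.0, 50_000.0, 30_000.0],
}).set_index("supplier")


def min_max(series):
    """Scale a series to [0, 1]; a constant series maps to all zeros."""
    span = series.max() - series.min()
    if span == 0:
        return pd.Series(0.0, index=series.index)
    return (series - series.min()) / span


def risk_score(df, w_emissions=0.7, w_spend=0.3):
    """Weighted sum of normalized emissions and spend (weights are assumed)."""
    return (
        w_emissions * min_max(df["emissions_kg_co2e"])
        + w_spend * min_max(df["spend_usd"])
    )


suppliers["risk_score"] = risk_score(suppliers)
ranked = suppliers.sort_values("risk_score", ascending=False)
print(ranked)
```

Because both inputs are scaled to the same range before weighting, neither the emissions unit nor the currency magnitude dominates the score; only the chosen weights do.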

Tech Stack:

Data Engineering: Python 3.11, Pandas, NumPy
Machine Learning: scikit-learn (Random Forest, Linear Regression, Ridge, Gradient Boosting)
Visualization: Matplotlib, Seaborn, Plotly
Business Intelligence: Power BI Desktop, DAX
SQL Analytics: PostgreSQL-compatible queries

The project includes detailed notebooks for the ETL pipeline and ML model training, SQL query examples, and clear instructions on how to run the system.
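For readers who want a feel for the model comparison before opening the notebooks, the sketch below fits the four regressors named in this project on synthetic data and compares held-out R². It is not the project's training code: the features, the data-generating formula, and the hyperparameters are assumptions, and the scores it prints say nothing about the figures reported for the real dataset.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for supplier data: emissions = spend * factor + noise.
rng = np.random.default_rng(0)
n_samples = 400
spend = rng.uniform(1_000, 100_000, n_samples)
factor = rng.uniform(0.1, 1.0, n_samples)
X = np.column_stack([spend, factor])
y = spend * factor + rng.normal(0, 500, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are scikit-learn defaults or arbitrary choices, not tuned values.
models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record R² on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, r2 in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R2 = {r2:.4f}")
```

Scoring on a held-out split rather than the training data matters here, since tree ensembles can fit training data almost perfectly and would otherwise overstate R².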

Built with

Python, Pandas, NumPy, scikit-learn, Power BI, SQL, Jupyter Notebook, Matplotlib, Seaborn, Plotly