- Master’s Project in collaboration with Dr. Polo Chau and a team of graduate students.
- Designed a cluster of over 30 machines to reliably detect anomalies in power sensor data.
- Built a data ingestion pipeline in which a multithreaded Java reverse proxy writes incoming sensor data to HBase.
- Created a Spark Streaming analysis pipeline to perform anomaly detection at scale.
- Achieved ingestion throughput of 400k data points per second on a 30-node cluster, with linear scalability.
- Published at an International Parallel & Distributed Processing Symposium (IPDPS) workshop.