Carleton University - School of Computer Science Honours Project
Winter 2020
Changepoint Detection in Hadoop Workloads
Alex Trostanovsky
ABSTRACT
Apache Hadoop is a software framework that enables efficient, scalable computation of Big Data analytic queries across networks of processors. Resource consumption by the nodes in such networks varies as queries execute, and dynamic allocation of resources can improve performance. To monitor Hadoop jobs and anticipate their resource requirements, Genkin and Dehne [8] introduced the ChangeDetector, a workload transition classifier that predicts transient states during Hadoop workload execution. This work evaluates the ChangeDetector and compares its Change Point Detection (CPD) accuracy to that of other well-established algorithms. We find that the ChangeDetector is competitive with optimally configured, well-performing offline algorithms, and conclude that it is well suited to transition classification for Hadoop workloads.