Ahmed Mostafa

Date of Award





Computer Science - Applied Computing Track

Degree Type

Master of Science


TSYS School of Computer Science

First Advisor

Yi Zhou

Second Advisor

Rania Hodhod

Third Advisor

Shamim Khan


Hadoop, a pioneering open-source framework, has revolutionized the big data world through its ability to process vast amounts of unstructured and semi-structured data. This ability has made Hadoop the go-to technology for many industries that generate big data, and it is also more cost-effective than many legacy systems. Hadoop MapReduce is used in large-scale, data-parallel applications to process massive amounts of data across a cluster, and it handles the scheduling, processing, and execution of jobs. In effect, MapReduce is Hadoop's core processing engine, since its library is needed to process these large data sets. This thesis proposes a smart framework model that uses Machine Learning (ML) algorithms to profile MapReduce tasks so that it can place data effectively in Hadoop clusters and activate only as many nodes as are needed to finish the data processing within the task's planned deadline. The model will ensure energy efficiency by using the minimum number of necessary nodes, at maximum utilization and with the least energy consumption, thereby reducing the overall operating cost of data centers that deploy Hadoop clusters.
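The core idea of activating only enough nodes to meet a deadline can be illustrated with a small sketch. The abstract does not name the specific ML algorithms used, so this example substitutes a plain least-squares linear regression as a stand-in profiler. It assumes hypothetical profiling data of (input size, single-node runtime) pairs and ideal linear speedup across nodes. The names `fit_linear` and `min_nodes` are hypothetical illustrations, not part of Hadoop or the thesis.

```python
import math
from typing import Sequence


def fit_linear(sizes: Sequence[float], runtimes: Sequence[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of runtime ~ slope * size + intercept.

    Stand-in for the (unspecified) ML profiler: learns how single-node
    runtime grows with input size from past profiled jobs.
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def min_nodes(single_node_runtime: float, deadline: float, max_nodes: int) -> int:
    """Smallest number of nodes that finishes within the deadline.

    Assumes ideal linear speedup (runtime on n nodes = runtime / n),
    which real clusters only approximate.
    """
    if deadline <= 0:
        raise ValueError("deadline must be positive")
    needed = max(1, math.ceil(single_node_runtime / deadline))
    if needed > max_nodes:
        raise ValueError("deadline cannot be met with the available nodes")
    return needed


# Hypothetical profiling data: input size in GB, single-node runtime in seconds.
slope, intercept = fit_linear([10, 20, 40], [120, 230, 450])
predicted = slope * 100 + intercept          # predicted runtime for a 100 GB job
nodes = min_nodes(predicted, deadline=300.0, max_nodes=16)
print(f"predicted single-node runtime: {predicted:.0f} s, nodes to activate: {nodes}")
```

Under these assumptions the remaining nodes in the cluster could stay powered down. A real model would also need to account for data placement and non-linear scaling overheads.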