Author

Ahmed Mostafa

Date of Award

5-2020

Type

Thesis

Major

Computer Science - Applied Computing Track

Degree Type

Master of Science

Department

TSYS School of Computer Science

First Advisor

Yi Zhou

Second Advisor

Rania Hodhod

Third Advisor

Shamim Khan

Abstract

Hadoop, a pioneering open-source framework, has revolutionized the big data world through its ability to process vast amounts of unstructured and semi-structured data. This capability has made Hadoop the go-to technology for many industries that generate big data, and it is also more cost effective than legacy systems. Hadoop MapReduce is used in large-scale data-parallel applications to schedule, process, and execute jobs across a cluster, handling massive amounts of data. In effect, MapReduce is the core processing engine of Hadoop, as its library is required to process these large data sets. This thesis proposes a smart framework model that profiles MapReduce tasks using Machine Learning (ML) algorithms to place data effectively in Hadoop clusters and to activate only the number of nodes sufficient to complete the data processing within the task's planned deadline. The model achieves energy efficiency by using the minimum number of necessary nodes at maximum utilization and with the least energy consumption, thereby reducing the overall cost of operations in data centers that deploy Hadoop clusters.
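The abstract does not specify which profile features or ML algorithm the framework uses; the following is a minimal sketch of the general idea, assuming hypothetical task-profile features (input size, map/reduce task counts, active node count) and a scikit-learn regressor that estimates job runtime, from which the smallest node count meeting a deadline is chosen.

```python
# Hypothetical sketch: choose the minimum number of Hadoop nodes that lets a
# MapReduce job finish within its deadline, based on task-profile features.
# The feature set and training data below are illustrative assumptions,
# not the thesis's actual profiling model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative historical job profiles:
# [input_size_gb, num_map_tasks, num_reduce_tasks, active_nodes]
X_train = np.array([
    [100, 800, 32, 4],
    [100, 800, 32, 8],
    [500, 4000, 64, 8],
    [500, 4000, 64, 16],
    [1000, 8000, 128, 16],
    [1000, 8000, 128, 32],
])
# Observed completion times (minutes) for each profile above.
y_train = np.array([95, 52, 210, 115, 260, 140])

# Train a regressor that predicts completion time from a job profile.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

def min_nodes_for_deadline(job_profile, deadline_min, max_nodes=64):
    """Return the smallest node count whose predicted runtime meets the deadline."""
    for nodes in range(1, max_nodes + 1):
        features = np.array([job_profile + [nodes]])
        if model.predict(features)[0] <= deadline_min:
            return nodes
    return max_nodes  # fall back to the full cluster if the deadline is very tight

# Example: a 750 GB job with 6000 map and 96 reduce tasks and a 3-hour deadline.
print(min_nodes_for_deadline([750, 6000, 96], deadline_min=180))
```

In this sketch, nodes beyond the predicted minimum could remain powered down, which is the energy-saving mechanism the abstract describes.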
