Abstract
The era of big data computing has become a fact of daily life. As data-intensive computing becomes a reality in many scientific disciplines, finding an efficient strategy for massive data computing systems has become a multi-objective optimization problem. Processing such massive data on distributed hardware clusters, such as Clouds, requires a powerful computation model like Hadoop-MapReduce. In this paper, we study the various schedulers developed for Hadoop in Cloud environments, along with their features and issues. Most existing studies consider performance improvement from a single point of view (scheduling, data locality, data correctness, etc.), but very few works address multi-objective improvements (quality requirements, scheduling entities, and dynamic environment adaptation), especially in heterogeneous parallel and distributed systems. Hadoop and MapReduce are two important technologies in big data for handling both structured and unstructured data. Designing an algorithm for node selection is essential to improving and optimizing MapReduce performance. This paper presents a survey of previous work on Hadoop-MapReduce scheduling and offers some suggestions for improving it.