Abstract
This paper introduces a novel data placement policy for distributing large video data across the nodes of a Hadoop cluster. Video processing applications commonly process only the region of interest (ROI), so two equal-sized video segments may impose different workloads and therefore take different execution times. As a result, a MapReduce implementation of such processing inevitably suffers from workload imbalance when Hadoop's default data placement policy is used to distribute the video data. This imbalance reduces data locality, which in turn incurs data migration overhead. The proposed policy improves the performance of such MapReduce-based applications in homogeneous environments by mitigating the workload imbalance and minimizing the data transfer overhead. Experimental results demonstrate the performance improvement achieved.