Abstract
In distributed computing, data scheduling is becoming an important field of research with the emergence of Big Data. High-level features provided by software data schedulers often rely on data management policies, possibly user-defined, such as fault tolerance, multi-protocol file transfer, reliable and multi-tenant storage, security and data privacy, and locality-aware data distribution. Executing data-intensive applications now requires such advanced features, which in turn requires that data and task schedulers be able to cooperate closely. In this paper, we propose a data-driven cooperative platform that combines two existing middleware systems: XtremWeb-HEP as the task scheduler and BitDew as the data scheduler. Taking advantage of both, our solution allows users to select a suitable data scheduling strategy and an adequate task granularity, which together provide the optimal data distribution. To evaluate our approach, we compare different strategies for scheduling tasks and data, and show that cooperation between data and task schedulers is efficient for executing data-intensive applications.