Abstract
Compared with traditional distributed computing environments such as Grids and Clusters, Cloud Computing can offer a more cost-effective way to deploy scientific applications modeled as scientific workflows. Because these workflows consume and produce large datasets, data placement is becoming an increasingly challenging task. When the tasks of data-intensive workflows are executed, they may require massive volumes of data that are physically distributed across multiple servers. Moving these datasets is costly in terms of workflow execution time, communication cost and energy consumption. An efficient data placement strategy is therefore needed, one that reduces workflow communication, total data movement and energy consumption. However, finding an optimal mapping of data to Cloud Storage Services in reasonable time is difficult, because the problem is NP-hard. In this paper, we propose a decentralized and scalable agent-based approach that handles the placement of both the original datasets and the intermediate datasets generated during workflow execution, with the main goal of reducing the execution time of the data placement algorithm. The approach involves two steps. In the first step, the whole data placement process is distributed over a set of cooperative agents, based on the idea that the formal context can be partitioned into a set of formal sub-contexts. In the second step, datasets newly generated during the execution of workflow tasks ("intermediate data") are placed using an incremental algorithm for constructing the Concept Lattice (or Galois Lattice). Experimental results indicate that the proposed strategy is effective in reducing the total execution time and finds an optimal data mapping in polynomial time.
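To make the incremental step more concrete, here is a minimal Python sketch of incremental concept construction in formal concept analysis. It assumes (our illustration, not necessarily the authors' encoding) that objects are datasets and attributes are the workflow tasks that use them, so a concept groups datasets shared by the same tasks. It maintains only the set of concept intents and omits the lattice order that a full incremental lattice algorithm would also update. The names `add_dataset`, `build_intents` and `extent` are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical formal context: dataset name -> set of tasks that use it.
Context = Dict[str, FrozenSet[str]]
Intent = FrozenSet[str]


def add_dataset(intents: Set[Intent], tasks: Intent) -> Set[Intent]:
    """Update the set of concept intents after adding one object (dataset).

    Previously computed intents remain intents; the only new ones are the
    intersections of existing intents with the new object's attribute set.
    Assumes `tasks` is a subset of the fixed task set used to build `intents`.
    """
    return intents | {intent & tasks for intent in intents}


def build_intents(context: Context, all_tasks: Intent) -> Set[Intent]:
    # The full attribute set is the intent of the bottom concept
    # (the intersection over an empty set of objects).
    intents: Set[Intent] = {frozenset(all_tasks)}
    for tasks in context.values():
        intents = add_dataset(intents, tasks)
    return intents


def extent(context: Context, intent: Intent) -> FrozenSet[str]:
    # Datasets used by every task in the intent.
    return frozenset(d for d, tasks in context.items() if intent <= tasks)


if __name__ == "__main__":
    all_tasks = frozenset({"t1", "t2", "t3"})
    context: Context = {
        "d1": frozenset({"t1", "t2"}),
        "d2": frozenset({"t2", "t3"}),
        "d3": frozenset({"t1", "t2"}),
    }
    intents = build_intents(context, all_tasks)
    for intent in sorted(intents, key=lambda i: (len(i), sorted(i))):
        print(sorted(intent), sorted(extent(context, intent)))

    # An intermediate dataset produced at run time and consumed by t3 is
    # added incrementally instead of rebuilding all concepts from scratch.
    context["d4"] = frozenset({"t3"})
    intents = add_dataset(intents, context["d4"])
    print(len(intents))
```

In this toy context the concept with intent `{t1, t2}` has extent `{d1, d3}`, i.e. two datasets that might be co-located because the same tasks read them; adding `d4` raises the number of concepts from 4 to 6. Extents are recomputed on demand to keep the sketch short. How the agents' sub-context results are combined in the first step is specific to the authors' approach and is not modeled here.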