Abstract
Conference Title: 2014 International Conference on Parallel, Distributed and Grid Computing (PDGC)
Conference Start Date: 2014, Dec. 11
Conference End Date: 2014, Dec. 13
Conference Location: Solan, India

Writing code from scratch for heterogeneous architectures with processors and accelerators from multiple vendors, or translating existing serial code to run on them, requires considerable effort and investment from the application developer. This problem will become more prominent when HPC applications are moved into the Cloud, as Cloud providers frequently update their architectures to keep up with market trends. In these scenarios, automatic parallelization tools will have an important role to play. An important constituent of these tools would be the ability to perform pertinent domain decomposition of the serial code to maximize utilization of the available computational elements. One of the first steps in this direction is to understand how the number and type of computational elements in a heterogeneous architecture affect the overall performance of an application. This paper presents observations from two case studies run on five architectures that differ in the type and number of their computational elements. Results show that the perceived speedup and the actual speedup are not related.