Abstract
Solving large-scale problems in a variety of scientific and engineering fields requires efficient hierarchical methods that exploit parallelism. In this paper we present optimizations that enhance the performance of parallel N-body simulations (NBS) using the Barnes-Hut approximation on a 60-core MIC accelerator. We focus on two sources of performance degradation in NBS: (1) semi-static parallelism, which leads to dynamic load imbalance, and (2) the processing of very large data sets that exceed the cache capacity. The first proposed optimization balances the load dynamically by using the load measured in one iteration as an estimate of the load in the next iteration, so that work can be distributed evenly across cores for that next iteration. The second proposed optimization subdivides the data into well-adjusted chunks to enhance data reuse in the shared caches. The proposed optimizations are tested on a 60-core MIC accelerator. Evaluation results show that the optimized NBS achieves a speedup of up to 33% due to dynamic load balancing and 260% due to enhanced reuse of cached data.
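To illustrate the first optimization, the following is a minimal sketch of cost-based partitioning, not the implementation evaluated in this paper. It assumes that a per-body cost (hypothetically, the number of force interactions counted during the previous Barnes-Hut traversal) is available, and that bodies are assigned to workers in contiguous index ranges. The names `partition_by_cost`, `prev_costs`, and `num_workers` are illustrative assumptions.

```python
def partition_by_cost(prev_costs, num_workers):
    """Split body indices into contiguous ranges of roughly equal estimated cost.

    prev_costs[i] is the work measured for body i in the previous iteration
    (an assumed metric, e.g. an interaction count); it serves as the estimate
    of the work body i will need in the next iteration.
    Returns a list of (start, end) half-open index ranges, one per worker.
    """
    total = sum(prev_costs)
    n = len(prev_costs)
    ranges = []
    start = 0
    running = 0  # cumulative cost of all bodies assigned so far
    for w in range(1, num_workers + 1):
        # Cumulative cost that workers 1..w together should not exceed.
        target = total * w / num_workers
        end = start
        # The last worker takes every remaining body; earlier workers stop
        # before the cumulative cost would pass their target.
        while end < n and (w == num_workers or running + prev_costs[end] <= target):
            running += prev_costs[end]
            end += 1
        ranges.append((start, end))
        start = end
    return ranges


# One expensive body (cost 8) followed by five cheap ones.
costs = [8, 1, 1, 2, 2, 2]
ranges = partition_by_cost(costs, 2)
loads = [sum(costs[a:b]) for a, b in ranges]
```

In this example an equal-count split (three bodies per worker) would give loads of 10 and 6, whereas the cost-based split assigns the ranges (0, 1) and (1, 6) with loads of 8 and 8. A single body whose cost exceeds a worker's share can leave a neighbouring worker with an empty range in this simple scheme; a finer-grained unit of work would be needed to avoid that.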