Abstract
Conference Title: GLOBECOM 2018 - 2018 IEEE Global Communications Conference
Conference Start Date: 2018, Dec. 9
Conference End Date: 2018, Dec. 13
Conference Location: Abu Dhabi, United Arab Emirates

The new generation of data analytics frameworks requires sub-second response times. This in turn requires low-latency, high-throughput schedulers to support distributed real-time data analysis. Recent studies in resource allocation for datacenters have focused on probe-based distributed schedulers to improve throughput and job response time. However, reducing the search space by relying on probing causes poor response times, especially when the cluster load increases. As a result, high-performance cluster schedulers operate at the expense of overprovisioning, which increases operational cost. This paper proposes an Adaptive Probe Size Estimation (APSE) algorithm for cluster management in datacenters. The algorithm efficiently estimates a minimum probe size that limits overprovisioning while keeping the region of high throughput and low response time intact. This approach is particularly effective in large clusters and under heavy loads. The algorithm has been implemented and tested in Sparrow with synthetic and trace-driven data. The results show considerable improvement in response time, load balancing, and provisioning cost. The algorithm can be used in probe-based schedulers such as Sparrow, Tarcil, Piper, and Hawk.
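
The abstract does not describe APSE's internals, so the sketch below is not the paper's algorithm. It only illustrates the probe-based placement that such schedulers build on: a job with m tasks probes a random subset of workers and places its tasks on the least loaded ones. The function name `probe_and_place`, its parameters, and the fixed `probe_ratio` are hypothetical and chosen for illustration; an adaptive scheme would presumably tune the probe size rather than fix it.

```python
import random


def probe_and_place(queue_lengths, num_tasks, probe_ratio, rng):
    """Place num_tasks tasks on the least loaded of the probed workers.

    queue_lengths: list of current queue lengths, one per worker (mutated in place).
    probe_ratio:   probes sent per task (hypothetical fixed value, not adaptive).
    rng:           a random.Random instance, passed in for reproducibility.
    """
    num_workers = len(queue_lengths)
    if num_tasks > num_workers:
        raise ValueError("this sketch assumes at most one task per worker per job")

    # Probe size is capped by the cluster size.
    probe_size = min(num_workers, probe_ratio * num_tasks)
    probed = rng.sample(range(num_workers), probe_size)

    # Keep the num_tasks probed workers with the shortest queues.
    chosen = sorted(probed, key=lambda w: queue_lengths[w])[:num_tasks]
    for w in chosen:
        queue_lengths[w] += 1
    return chosen


if __name__ == "__main__":
    queues = [5, 0, 3, 1]
    placed = probe_and_place(queues, num_tasks=2, probe_ratio=2, rng=random.Random(1))
    print(placed, queues)
```

A larger `probe_ratio` widens the search and tends to find shorter queues, but it also sends more probe messages per job; this is the trade-off that motivates estimating a minimum probe size.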