Abstract
As consumers increasingly engage in social networking
and e-commerce, businesses come to rely on big data
analytics for intelligence, and traditional IT infrastructures
continue to migrate to the cloud and edge,
demand for distributed data storage is rising
at an unprecedented rate. Erasure coding has quickly
emerged as a promising technique to reduce storage
cost while providing reliability similar to that of replicated
systems, and it has been widely adopted by companies such as
Facebook, Microsoft, and Google. However, it also brings new challenges in characterizing
and optimizing the access latency when data objects
are erasure coded in distributed storage. The aim of this
monograph is to provide a review of recent progress (both
theoretical and practical) on systems that employ erasure
codes for distributed storage.
In this monograph, we will first identify the key challenges
and present a taxonomy of the research problems, and then give an
overview of different models and approaches that have been
developed to quantify latency of erasure-coded storage. This
includes recent work leveraging MDS-Reservation, Fork-Join,
Probabilistic, and Delayed-Relaunch scheduling policies, as
well as their applications to characterizing access latency
(e.g., mean, tail, and asymptotic latency) of erasure-coded
distributed storage systems. We will also extend the discussion
to video streaming from erasure-coded distributed
storage systems. Next, we will bridge the gap between theory
and practice, and discuss lessons learned from prototype
implementations. In particular, we will discuss exemplary
implementations of erasure-coded storage, illuminate key
design degrees of freedom and tradeoffs, and summarize
remaining challenges in real-world storage systems, such as
those for content delivery and caching. Open problems for future
research are discussed at the end of each chapter.