Abstract
This paper examines data distribution for machine learning tasks, weighing the advantages and drawbacks commonly associated with data partitioning and its different models. Taking the point of view of the distributed data, it reviews some of the algorithms that have been used to handle each case; it is not a review of learning or computation algorithms as such. Finally, it considers the issues that newer models based on data partitioning, such as MapReduce, have brought to distributed learning.