Map-Reduce for Machine Learning on Multicore

Title: Map-Reduce for Machine Learning on Multicore
Publication Type: Conference Paper
Year of Publication: 2006
Authors: Ng, Andrew Y., Bradski, Gary, Chu, Cheng-Tao, Olukotun, Kunle, Kim, Sang Kyun, Lin, Yi-An, and Yu, YuanYuan
Conference Name: NIPS
Date Published: 12/2006
Keywords: machine learning, mapreduce

We are at the beginning of the multicore era. Computers will have increasingly
many cores (processors), but there is still no good programming framework for
these architectures, and thus no simple and unified way for machine learning to
take advantage of the potential speedup. In this paper, we develop a broadly
applicable parallel programming method, one that is easily applied to many
different learning algorithms. Our work is in distinct contrast to the tradition
in machine learning of designing (often ingenious) ways to speed up a single
algorithm at a time. Specifically, we show that algorithms that fit the
Statistical Query model [15] can be written in a certain "summation form," which
allows them to be easily parallelized on multicore computers. We adapt Google's
map-reduce [7] paradigm to demonstrate this parallel speedup technique on a
variety of learning algorithms including locally weighted linear regression
(LWLR), k-means, logistic regression (LR), naive Bayes (NB), SVM, ICA, PCA,
Gaussian discriminant analysis (GDA), EM, and backpropagation (NN). Our
experimental results show basically linear speedup with an increasing number of
processors.
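The "summation form" idea can be illustrated with a minimal sketch (not code from the paper): for ordinary least squares, the heart of LWLR, the sufficient statistics A = Σᵢ xᵢxᵢᵀ and b = Σᵢ xᵢyᵢ decompose into sums over disjoint data subsets, so each core computes partial sums (the map step) and a single reduce adds them before solving Aθ = b. The chunking and thread-pool choices below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: summation-form parallelization of least squares, map-reduce style.
# Each worker maps over one data partition to produce partial sufficient
# statistics; the reduce step sums them and solves the normal equations.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_sums(chunk):
    """Map step: per-core partial sufficient statistics (A_k, b_k)."""
    X, y = chunk
    return X.T @ X, X.T @ y

def fit_parallel(X, y, n_cores=4):
    """Split rows across cores, map to partial sums, reduce, solve."""
    chunks = list(zip(np.array_split(X, n_cores),
                      np.array_split(y, n_cores)))
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        parts = list(pool.map(partial_sums, chunks))
    # Reduce step: the partial statistics simply add.
    A = sum(p[0] for p in parts)
    b = sum(p[1] for p in parts)
    return np.linalg.solve(A, b)
```

Because the reduce is a plain sum, the same skeleton covers the other listed algorithms once their per-example statistics are identified (e.g., per-cluster sums for k-means, per-example gradients for logistic regression).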

Attachment: ChuCT_etal_2006.pdf (466.24 KB)