A Comparative Analysis of Cloud-Based Recommendation System on MapReduce and Spark
Publisher
Pulchowk Campus
Abstract
Big Data is now a major topic in both industry and academia. Data-processing
requirements are changing as data volumes grow and as the number of sources increases,
which leads to a diversity of data structures. Although the relational database management
system (RDBMS) remains the primary technology for managing structured data and has
proven itself for more than 40 years, it has reached its limits because of the massive
growth in the volume and variety of data. Many researchers and organizations now focus
on the MapReduce and Spark frameworks, which have had great success in processing and
analyzing large volumes of data on clusters. This study evaluates the performance of
MapReduce, RDBMS, and Spark using several comparison measures. The comparison and
analysis proceed in three steps: (a) a recommendation system is developed with each of
the three technologies, (b) the system is run on various network configurations and data
sizes, and (c) the output is analyzed and compared on the basis of computation time,
memory consumption, and CPU usage. In addition, the observed results for each technology
under the respective node and network configurations are statistically validated using the
Friedman rank test and the Holm post-hoc test. Overall, the observations show that Spark
is about 2.5x to 5x faster than MapReduce and 10x to 20x faster than RDBMS. These
speedups are attributed to the efficiency of the alternating least squares (ALS) algorithm
and to reduced CPU and disk overheads due to RDD caching in Spark.
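The statistical validation named in the abstract can be illustrated with a minimal sketch in plain Python. It computes the Friedman chi-square statistic from per-configuration ranks and applies the Holm step-down adjustment to a set of p-values. The run times, the pairwise p-values, and all function names below are hypothetical and chosen only for illustration; they are not results or code from the study, and the sketch omits the tie correction and the pairwise test statistics that a full post-hoc analysis would need.

```python
import math

def average_ranks(row):
    """Rank the values in one block (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def friedman_statistic(blocks):
    """Friedman chi-square (no tie correction) for n blocks x k treatments."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [r / n for r in rank_sums]

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for pos, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        running_max = max(running_max, (m - pos) * p_values[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

# Hypothetical run times in seconds, one row per node/network configuration;
# columns are (Spark, MapReduce, RDBMS). These are NOT measurements from the study.
times = [
    [12.0, 30.0, 240.0],
    [20.0, 55.0, 410.0],
    [35.0, 90.0, 700.0],
    [50.0, 240.0, 990.0],
]
chi2, mean_ranks = friedman_statistic(times)

# With k = 3 treatments the chi-square reference distribution has 2 degrees
# of freedom, whose survival function is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Hypothetical unadjusted p-values for three pairwise comparisons.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

The chi-square approximation is rough for so few configurations, so a real analysis with small samples would more likely rely on exact tables or an established statistics library.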