A Comparative Analysis of Cloud based Recommendation System on Mapreduce and Spark

Publisher
Pulchowk Campus
Abstract
Big Data is an active topic in both industry and academia. Data processing needs are changing as data volumes grow and as the number of sources increases, leading to a diversity of structures. Although the relational database management system (RDBMS) remains the primary technology for managing structured data and has proven itself for more than 40 years, it has reached its limits because of the massive growth in the volume and diversity of data. Many researchers and organizations now focus on the MapReduce and Spark frameworks, which have been highly successful at processing and analyzing large volumes of data across clusters. This study evaluates the performance of MapReduce, RDBMS, and Spark using several comparison measures. The comparison proceeds in three steps: (a) a recommendation system is developed with all three algorithms, (b) the system is run on various data networks and data sizes, and (c) the output is analyzed and compared on the basis of computation time, memory consumption, and CPU usage. In addition, the observed results from all the algorithms, with their respective node and network configurations, are statistically validated using the Friedman rank test and the Holm post-hoc test. Overall, the observations show that Spark is about 2.5x and 5x faster than MapReduce, and 10 and 20 times faster than RDBMS. These speedups are attributed to the efficiency of the alternating least squares (ALS) algorithm and to reduced CPU and disk overheads due to RDD caching in Spark.