Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/7292
Title: A Comparative Analysis of Cloud based Recommendation System on Mapreduce and Spark
Authors: Ghimire, Sarala
Keywords: Cloud Computing;MapReduce;Multi-node cluster;Hadoop
Issue Date: Nov-2017
Publisher: Pulchowk Campus
Institute Name: Institute of Engineering
Level: Masters
Abstract: Big Data is currently a prominent topic in both industry and academia. Data processing needs are changing as data volume gradually increases and as the growing number of sources leads to a diversity of structures. Although the relational database management system (RDBMS) remains the primary technology for managing structured data and has proven itself for more than 40 years, it has reached its limits because of the massive growth in the volume and variety of data. Many researchers and organizations now focus on the MapReduce and Spark frameworks, which have had great success in processing and analyzing large volumes of data on clusters. This study evaluates the performance of MapReduce, RDBMS, and Spark using several comparison measures. The comparison and analysis proceed in three steps: (a) a recommendation system is developed with all three algorithms, (b) the system is run on various network configurations and data sizes, and (c) the output is analyzed and compared on the basis of computation time, memory consumption, and CPU usage. In addition, the observed results for all the algorithms under each node and network configuration are statistically validated using the Friedman rank test and the Holm post-hoc test. Overall, the observations show that Spark is about 2.5x and 5x faster than MapReduce, and 10x and 20x faster than RDBMS. These speedups are attributed to the efficiency of the alternating least squares (ALS) algorithm and to the reduced CPU and disk overheads that result from RDD caching in Spark.
URI: https://elibrary.tucl.edu.np/handle/123456789/7292
Appears in Collections: Electronics and Computer Engineering

Files in This Item:
File: Sarala Ghimire.pdf
Size: 4.75 MB
Format: Adobe PDF

