Optimization of Shard Selection Techniques on Elasticsearch

Pant, Yashasvi Raj

Please use this identifier to cite or link to this item: https://elibrary.tucl.edu.np/handle/123456789/7731

Title:	Optimization of Shard Selection Techniques on Elasticsearch
Authors:	Pant, Yashasvi Raj
Keywords:	Nodes; Elasticsearch; Index; Shards; ReDDe; Sushi; Rank-S
Issue Date:	Aug-2021
Publisher:	Pulchowk Campus
Institute Name:	Institute of Engineering
Level:	Masters
Citation:	MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING
Abstract:	Distributed systems typically consist of several nodes connected together for handling search operations. Data is divided among those nodes for parallel processing and replication. Elasticsearch is a popular distributed search engine in which data is organized into indices. Each Elasticsearch index consists of one or more shards, and those shards can be distributed over different nodes. When a search operation is performed on a particular index, sending the search request to all the related shards distributed over different nodes can result in high latency, especially when the cluster is large and the nodes are far apart. Shard selection is a technique that attempts to forward the query only to the most relevant shards, discarding the non-relevant ones and thus decreasing latency. Shard selection comes at a cost in relevance: applying a shard selection algorithm may decrease the relevance of the query results. Several shard selection algorithms have been developed over time; among them, ReDDe, SUSHI, and Rank-S are very popular. In this paper, a new shard selection algorithm called the Hybrid Optimized Shard Selection Algorithm (HOSSA) is developed by extracting core features from each of these three algorithms and by optimizing shard-related parameters. HOSSA has shown improvements in both latency and relevance compared to the existing shard selection algorithms. The experimentation is performed using the Insider Threat Test Dataset (CERT V6.2) collected from the Carnegie Mellon University site. In terms of average latency, HOSSA performs 19.34%, 15.6%, and 7.30% better than SUSHI, ReDDe, and Rank-S, respectively. In terms of average document score, HOSSA performs 33.09%, 18.89%, and 3.31% better than SUSHI, ReDDe, and Rank-S, respectively.
Description:	Distributed systems typically consist of several nodes connected together for handling search operations.
URI:	https://elibrary.tucl.edu.np/handle/123456789/7731
Appears in Collections:	Electronics and Computer Engineering
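
To illustrate the idea of shard selection described in the abstract, the following is a minimal sketch of ReDDe-style shard ranking. It is not HOSSA and not the thesis's implementation. It assumes a small central sample index has already been queried, so that `sample_hits` holds `(shard_id, score)` pairs in descending score order. The function names, the `top_n` cutoff, and the input dictionaries are hypothetical. The second helper builds a string in the form accepted by the `preference` search parameter in Elasticsearch (`_shards:...`), which restricts a search to the listed shards.

```python
from collections import defaultdict


def rank_shards(sample_hits, shard_sizes, sample_sizes, top_n=10):
    """Rank shards by estimated number of relevant documents (ReDDe-like voting).

    sample_hits  -- list of (shard_id, score), best hit first (assumed input)
    shard_sizes  -- {shard_id: number of documents in the full shard}
    sample_sizes -- {shard_id: number of documents sampled from that shard}
    """
    votes = defaultdict(float)
    for shard_id, _score in sample_hits[:top_n]:
        # Each sampled hit stands in for (shard size / sample size) real documents.
        votes[shard_id] += shard_sizes[shard_id] / sample_sizes[shard_id]
    # Highest estimated count first; ties broken by shard id for determinism.
    return sorted(votes, key=lambda s: (-votes[s], s))


def shard_preference(ranked_shards, k):
    """Build a preference string that targets only the top-k ranked shards."""
    return "_shards:" + ",".join(str(s) for s in ranked_shards[:k])
```

Forwarding the query only to the top-k shards is what trades some relevance for lower latency: shards that received no votes from the sample are never contacted.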

Files in This Item:

File	Description	Size	Format
Thesis_Final_Deliveries.zip		56.8 MB	Unknown