Please use this identifier to cite or link to this item:
https://elibrary.tucl.edu.np/handle/123456789/7731
Title: | Optimization of Shard Selection Techniques on Elasticsearch |
Authors: | Pant, Yashasvi Raj |
Keywords: | Nodes; Elasticsearch; Index; Shards; ReDDe; Sushi; Rank-S |
Issue Date: | Aug-2021 |
Publisher: | Pulchowk Campus |
Institute Name: | Institute of Engineering |
Level: | Masters |
Citation: | MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING |
Abstract: | Distributed systems typically consist of several nodes connected together for handling search operations. Data is divided among those nodes for parallel processing and replication. Elasticsearch is a popular distributed search engine in which data is organized into indices. Each Elasticsearch index consists of one or more shards, and those shards can be distributed over different nodes. When a search operation is performed on a particular index, sending the search request to all the related shards distributed over different nodes might result in high latency, especially when the cluster is large and the nodes are far apart. Shard selection is a technique that attempts to forward the query only to the highly relevant shards, discarding the non-relevant ones and thus decreasing latency. Shard selection comes at a cost in relevance: applying a shard selection algorithm might decrease the relevance of the query results. Several shard selection algorithms have been developed over time. Among them, ReDDe, Sushi, and Rank-S are very popular. In this paper, a new shard selection algorithm called Hybrid Optimized Shard Selection Algorithm (HOSSA) is developed by extracting core features from each of these three algorithms and also optimizing shard-related parameters. HOSSA has shown improvements in both latency and relevance compared to the existing shard selection algorithms. The experimentation is performed using the Insider Threat Test Dataset (CERT V6.2) collected from the Carnegie Mellon University site. In terms of average latency, HOSSA performs 19.34%, 15.6%, and 7.30% better than Sushi, ReDDe, and Rank-S respectively. In terms of Average Document Score, HOSSA performs 33.09%, 18.89%, and 3.31% better than Sushi, ReDDe, and Rank-S respectively. |
Description: | Distributed systems typically consist of several nodes connected together for handling search operations. |
URI: | https://elibrary.tucl.edu.np/handle/123456789/7731 |
Appears in Collections: | Electronics and Computer Engineering |
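The shard selection idea described in the abstract can be illustrated with a minimal sketch. This is not HOSSA and not the author's implementation; it is a simplified ReDDe-style vote, assuming a small central sample index has already been queried and that each top-ranked sampled document stands for a proportional number of documents in its home shard. The function names, inputs, and example numbers are hypothetical. The resulting string uses the `_shards:` form accepted by the Elasticsearch search `preference` parameter.

```python
from collections import defaultdict


def redde_rank(sample_hits, shard_sizes, sample_sizes, top_n):
    """Rank shards by estimated number of relevant documents (ReDDe-style).

    sample_hits  -- shard ids of the sampled documents that matched the
                    query, best hit first (hypothetical input)
    shard_sizes  -- shard id -> number of documents in the full shard
    sample_sizes -- shard id -> number of that shard's documents in the sample
    top_n        -- how many of the top sampled hits get to vote
    """
    votes = defaultdict(float)
    for shard in sample_hits[:top_n]:
        # One sampled hit represents shard_size / sample_size real documents.
        votes[shard] += shard_sizes[shard] / sample_sizes[shard]
    # Highest estimate first; ties broken by shard id for determinism.
    return sorted(votes, key=lambda s: (-votes[s], s))


def shards_preference(ranked_shards, k):
    """Build a `_shards:` preference value for the k best-ranked shards."""
    return "_shards:" + ",".join(str(s) for s in ranked_shards[:k])


# Illustrative numbers only, not taken from the thesis.
sample_hits = [2, 0, 2, 1, 2, 0]
shard_sizes = {0: 1000, 1: 4000, 2: 2000}
sample_sizes = {0: 100, 1: 100, 2: 100}

ranked = redde_rank(sample_hits, shard_sizes, sample_sizes, top_n=6)
preference = shards_preference(ranked, k=2)
print(ranked, preference)
```

Sending the query only to the `k` best-ranked shards is what trades relevance for latency: a smaller `k` contacts fewer nodes but risks skipping a shard that holds relevant documents.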
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Thesis_Final_Deliveries.zip | | 56.8 MB | Unknown | View/Open |