Optimization of Shard Selection Techniques on Elasticsearch

Pant, Yashasvi Raj

Files

Thesis_Final_Deliveries.zip (55.47 MB)

Date

2021-08

Authors

Pant, Yashasvi Raj

Publisher

Pulchowk Campus

Abstract

Distributed systems typically consist of several nodes connected together for handling search operations. Data is divided among those nodes for parallel processing and replication. Elasticsearch is a popular distributed search engine in which data is organized into indices. Each Elasticsearch index consists of one or more shards, and those shards can be distributed over different nodes. When a search operation is performed on a particular index, sending the search request to all the related shards distributed over different nodes might result in high latency, especially when the cluster is large and the nodes are far apart. Shard selection is a technique that attempts to forward the query only to the most relevant shards, discarding the non-relevant ones and thus decreasing latency. Shard selection comes at a cost in relevance: applying a shard selection algorithm might decrease query relevance. Several shard selection algorithms have been developed over time; among them, ReDDe, SUSHI, and Rank-S are very popular. In this paper, a new shard selection algorithm called Hybrid Optimized Shard Selection Algorithm (HOSSA) is developed by extracting core features from each of these three algorithms and by optimizing shard-related parameters. HOSSA has shown improvements in both latency and relevance compared to the existing shard selection algorithms. The experimentation is performed using the Insider Threat Test Dataset (CERT V6.2) collected from the Carnegie Mellon University site. In terms of average latency, HOSSA performs 19.34%, 15.6%, and 7.30% better than SUSHI, ReDDe, and Rank-S, respectively. In terms of average document score, HOSSA performs 33.09%, 18.89%, and 3.31% better than SUSHI, ReDDe, and Rank-S, respectively.
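To make the idea of shard selection concrete, the following is a minimal sketch of a sample-based ranking step in the spirit of ReDDe. It is not HOSSA and not the thesis implementation. It assumes a small central sample index has already been queried, and that each top-ranked sampled document "votes" for its home shard with a weight equal to the shard size divided by the sample size. The function names, the example shard sizes, and the hit list are all hypothetical. The last helper only builds a string in the `_shards:<ids>` form accepted by the Elasticsearch search `preference` parameter; it does not contact a cluster.

```python
from collections import defaultdict


def rank_shards(sample_hits, shard_sizes, sample_sizes):
    """Rank shards by estimated number of relevant documents.

    sample_hits  -- shard id of each top-ranked document returned by the
                    (assumed) central sample index, one entry per hit
    shard_sizes  -- shard id -> total number of documents in the shard
    sample_sizes -- shard id -> number of documents sampled from the shard
    """
    votes = defaultdict(float)
    for shard_id in sample_hits:
        # Each sampled hit stands in for (shard size / sample size) documents.
        votes[shard_id] += shard_sizes[shard_id] / sample_sizes[shard_id]
    # Highest estimate first; ties broken by shard id for determinism.
    return sorted(votes, key=lambda s: (-votes[s], s))


def select_shards(sample_hits, shard_sizes, sample_sizes, k):
    """Keep only the k highest-ranked shards."""
    return rank_shards(sample_hits, shard_sizes, sample_sizes)[:k]


def shard_preference(shard_ids):
    """Build a `_shards:...` preference string for the selected shards."""
    return "_shards:" + ",".join(str(s) for s in sorted(shard_ids))


# Hypothetical three-shard index with 100 sampled documents per shard.
shard_sizes = {0: 1000, 1: 1000, 2: 4000}
sample_sizes = {0: 100, 1: 100, 2: 100}
sample_hits = [0, 0, 2, 1, 0]  # home shards of the top five sampled hits

selected = select_shards(sample_hits, shard_sizes, sample_sizes, k=2)
preference = shard_preference(selected)
```

In this example shard 2 receives a single vote worth 40, shard 0 three votes worth 10 each, and shard 1 one vote worth 10, so the two shards kept are 2 and 0. Restricting the query to those shards is what trades some relevance for lower latency, as described above.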

Description

Distributed systems typically consist of several nodes connected together for handling search operations.

Keywords

Nodes, Elasticsearch, Index, Shards, ReDDe, Sushi, Rank-S

Citation

MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING

URI

https://hdl.handle.net/20.500.14540/7731

Collections

Electronics and Computer Engineering
