Optimization of Shard Selection Techniques on Elasticsearch
Publisher
Pulchowk Campus
Abstract
Distributed search systems typically consist of several interconnected nodes that
handle search operations. Data is divided among those nodes for parallel processing
and replication. Elasticsearch is a popular distributed search engine in which data
is organized into indices. Each Elasticsearch index consists of one or more shards,
and those shards can be distributed over different nodes. When a search is performed
on a particular index, sending the request to all of the related shards on different
nodes can result in high latency, especially when the cluster is large and the nodes
are far apart. Shard selection is a technique that forwards the query only to the
most relevant shards and skips the rest, thereby decreasing latency. Shard selection
comes at a cost in relevance: because some shards are not searched, applying a shard
selection algorithm may reduce the relevance of the results. Several shard selection
algorithms have been developed over time; among them, ReDDe, SUSHI, and Rank-S are
very popular. In this paper, a new shard selection algorithm called the Hybrid
Optimized Shard Selection Algorithm (HOSSA) is developed by extracting core features
from each of these three algorithms and by optimizing shard-related parameters.
HOSSA shows improvements in both latency and relevance compared to the existing
shard selection algorithms.
The experiments use the Insider Threat Test Dataset (CERT V6.2), obtained from the
Carnegie Mellon University site. In terms of average latency, HOSSA performs 19.34%,
15.6%, and 7.30% better than SUSHI, ReDDe, and Rank-S, respectively. In terms of
average document score, HOSSA performs 33.09%, 18.89%, and 3.31% better than SUSHI,
ReDDe, and Rank-S, respectively.
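To make the idea of shard selection concrete, the following is a minimal sketch of
a ReDDe-style vote count. It is not HOSSA and not the thesis implementation. It
assumes a small central sample index has already ranked sampled documents for the
query. The function name `select_shards` and all of its parameters are hypothetical
and chosen only for illustration.

```python
from collections import defaultdict


def select_shards(ranked_sample_hits, shard_sizes, sample_sizes, top_n=100, k=2):
    """Return the k shard ids with the highest estimated relevant-document count.

    ranked_sample_hits: shard ids of the sampled documents, in the order a
        central sample index ranked them for the query (assumed input).
    shard_sizes: shard id -> number of documents in the full shard.
    sample_sizes: shard id -> number of that shard's documents in the sample.
    """
    votes = defaultdict(float)
    for shard_id in ranked_sample_hits[:top_n]:
        # Each sampled hit stands in for (shard size / sample size) real documents.
        votes[shard_id] += shard_sizes[shard_id] / sample_sizes[shard_id]
    # Highest estimate first; ties are broken by shard id so the result is stable.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [shard_id for shard_id, _ in ranked[:k]]


# Illustrative numbers only: shard 1 is five times larger than shards 0 and 2.
hits = [0, 0, 1, 2, 1, 0]
sizes = {0: 1000, 1: 5000, 2: 1000}
samples = {0: 100, 1: 100, 2: 100}
selected = select_shards(hits, sizes, samples, k=2)
```

With these made-up inputs, shard 1 receives the largest estimate because each of its
sampled hits represents more documents. In Elasticsearch, a selected list like this
could in principle be passed to the search API's `preference` parameter in its
`_shards:` form; the abstract does not say whether the thesis does so.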
Citation
MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING
