Optimization of Shard Selection Techniques on Elasticsearch

dc.contributor.author Pant, Yashasvi Raj
dc.date.accessioned 2022-01-26T10:15:07Z
dc.date.available 2022-01-26T10:15:07Z
dc.date.issued 2021-08
dc.description Distributed systems typically consist of several connected nodes that handle search operations. en_US
dc.description.abstract Distributed systems typically consist of several connected nodes that handle search operations. Data is divided across those nodes for parallel processing and replication. Elasticsearch is a popular distributed search engine in which data is organized into indices. Each Elasticsearch index consists of one or more shards, and those shards can be distributed over different nodes. When a search is performed on a particular index, sending the request to every related shard across different nodes can result in high latency, especially when the cluster is large and the nodes are far apart. Shard selection is a technique that forwards the query only to the most relevant shards and discards the others, thereby decreasing latency. Shard selection comes at a cost in relevance: because some shards are skipped, applying a shard selection algorithm may decrease the relevance of the query results. Several shard selection algorithms have been developed over time; among them, ReDDe, SUSHI, and Rank-S are very popular. In this paper, a new shard selection algorithm called Hybrid Optimized Shard Selection Algorithm (HOSSA) is developed by extracting core features from each of these three algorithms and by optimizing shard-related parameters. HOSSA shows improvements in both latency and relevance compared to the existing shard selection algorithms. The experiments were performed using the Insider Threat Test Dataset (CERT V6.2) collected from the Carnegie Mellon University site. In terms of average latency, HOSSA performs 19.34%, 15.6%, and 7.30% better than SUSHI, ReDDe, and Rank-S, respectively. In terms of average document score, HOSSA performs 33.09%, 18.89%, and 3.31% better than SUSHI, ReDDe, and Rank-S, respectively. en_US
dc.identifier.citation MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING en_US
dc.identifier.uri https://hdl.handle.net/20.500.14540/7731
dc.language.iso en en_US
dc.publisher Pulchowk Campus en_US
dc.subject Nodes en_US
dc.subject Elasticsearch en_US
dc.subject Index en_US
dc.subject Shards en_US
dc.subject ReDDe en_US
dc.subject Sushi en_US
dc.subject Rank-S en_US
dc.title Optimization of Shard Selection Techniques on Elasticsearch en_US
dc.type Thesis en_US
local.academic.level Masters en_US
local.affiliatedinstitute.title Pulchowk Campus en_US
local.institute.title Institute of Engineering en_US
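
The abstract describes shard selection only in general terms, so the following is a simplified ReDDe-style sketch for illustration, not the HOSSA algorithm from the thesis. It assumes a small central sample index has already been queried, and that each returned hit is labeled with the shard it was sampled from. The function names, shard sizes, and sample sizes are all hypothetical. The only Elasticsearch feature relied on is the search `preference` parameter, which accepts a value of the form `_shards:0,1` to restrict a search to specific shards.

```python
from collections import defaultdict


def redde_select(sample_hits, shard_sizes, sample_sizes, top_n=4, k=2):
    """Return the k shards with the highest estimated relevant-document count.

    sample_hits: shard id of each hit from the sample index, best hit first.
    """
    votes = defaultdict(float)
    for shard_id in sample_hits[:top_n]:
        # Each sampled hit stands in for (shard size / sample size) documents.
        votes[shard_id] += shard_sizes[shard_id] / sample_sizes[shard_id]
    # Highest estimate first; ties broken by shard id for determinism.
    ranked = sorted(votes, key=lambda s: (-votes[s], s))
    return ranked[:k]


def shard_preference(shard_ids):
    """Build a value for the Elasticsearch search `preference` parameter."""
    return "_shards:" + ",".join(str(s) for s in sorted(shard_ids))


# Hypothetical cluster: three shards, 100 documents sampled from each.
shard_sizes = {0: 1000, 1: 4000, 2: 500}
sample_sizes = {0: 100, 1: 100, 2: 100}
sample_hits = [1, 0, 1, 2, 0]

selected = redde_select(sample_hits, shard_sizes, sample_sizes)
preference = shard_preference(selected)
print(selected, preference)
```

Passing the resulting string as the `preference` of a search request would send the query only to the selected shards, which is the latency-for-relevance trade-off the abstract describes.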

Files

Original bundle

Name: Thesis_Final_Deliveries.zip
Size: 55.47 MB
Format: Unknown data format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission