Parallelization of Star Alignment Algorithm for Multiple Sequence Alignment using MapReduce Model

Journal Title
Journal ISSN
Volume Title
Publisher
Pulchowk Campus
Abstract
Multiple sequence alignment (MSA) is an important problem in molecular biology. Biological sequences are aligned with each other vertically to show possible similarities or differences among these sequences. To solve an MSA problem is to find an alignment of multiple sequences with the highest score based on a given scoring criterion among sequences. Dynamic programming algorithms like Needleman-Wunch and Smith-Waterman produce accurate alignments but these algorithms are computation intensive, computational complexity of O(n2) and are limited to a small number of short sequences. Similarly multiple sequence alignment that processes the sequences one by one, called star alignment, takes time until O(k2n2). However the computation result still has high accuracy. Consequently, it is very important to get a better way to improve the performance. To achieve this, a MapReduce model of star alignment is designed and implemented that executes in parallel on a hadoop clusters. Since hadoop already handles work/job dispatching and work balance among distributed worker nodes, we need note handle node failure and load balancing required with the traditional distributed computing. The experimental result shows that the MapReduce model of star alignment improve the execution time by 3 times with 8 physical nodes than single node with datasets size greater than 1 GB.
Description
Multiple sequence alignment (MSA) is an important problem in molecular biology.
Citation
MASTER OF SCIENCE IN COMPUTER SYSTEM AND KNOWLEDGE ENGINEERING