ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

Abstract

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith-Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith-Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/

Keywords

Smith–Waterman algorithmComputer scienceSIMDSpeedupMultiple sequence alignmentSequence alignmentAlgorithmParallel computingHeuristicsComputationSimilarity (geometry)HeuristicSequence (biology)DiagonalBiologyArtificial intelligenceMathematics

MeSH Terms

AlgorithmsComputational BiologyDatabasesFactualInformation Storage and RetrievalSensitivity and SpecificitySequence AlignmentSoftware

Affiliated Institutions

Related Publications

Accelerated Profile HMM Searches

Sean R. Eddy

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, pr...

2011 PLoS Computational Biology 6891 citations

HH-suite3 for fast remote homology detection and deep protein annotation

Martin Steinegger , Markus Meier , Milot Mirdita +3 more

Abstract Background HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignme...

2019 BMC Bioinformatics 1177 citations

A new bioinformatics analysis tools framework at EMBL-EBI

M. Goujon , Hamish McWilliam , Weizhong Li +4 more

The EMBL-EBI provides access to various mainstream sequence analysis applications. These include sequence similarity search services such as BLAST, FASTA, InterProScan and multi...

2010 Nucleic Acids Research 1870 citations

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets

Maria Hauser , Martin Steinegger , Johannes Söding

Abstract Motivation: Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improve...

2016 Bioinformatics 276 citations

Combining many multiple alignments in one improved alignment.

K Bucka-Lassen , O Caprani , Jotun Hein

Abstract MOTIVATION: The fact that the multiple sequence alignment problem is of high complexity has led to many different heuristic algorithms attempting to find a solution in ...

1999 Bioinformatics 42 citations

Publication Info

Year: 2001
Type: article
Volume: 29
Issue: 7
Pages: 1647-1652
Citations: 60
Access: Closed

External Links

Download PDF (Free) View on DOI.org PubMed Semantic Scholar

Social Impact

Altmetric

ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches

PlumX Metrics

Social media, news, blog, policy document mentions

Citation Metrics

OpenAlex

Influential

CrossRef

Cite This

APA Style

                            
                                    Torbjørn Rognes
                                
                            (2001). 
                            ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches. 
                            Nucleic Acids Research
                            , 29
                            (7)
                            , 1647-1652.
                            https://doi.org/10.1093/nar/29.7.1647

Identifiers

DOI: 10.1093/nar/29.7.1647
PMID: 11266569
PMCID: PMC31274

Data Quality

Data completeness: 86%