Abstract
In metagenomic analyses, DNA sequences are often translated into protein sequences and then assigned to protein families, because searching at the protein level is more sensitive than comparing DNA directly. However, the volume of sequence data is now so large that even a standard homology search with BLASTX has become computationally prohibitive. We designed a new homology search algorithm that finds seed sequences using suffix arrays built from both the query and the database, and implemented it as GHOSTX. GHOSTX ran approximately 131 to 165 times faster than BLASTX at a similar level of sensitivity. GHOSTX is distributed under the BSD 2-clause license and is available for download at http://www.bi.cs.titech.ac.jp/ghostx/. Sequencing technology continues to improve, and sequencers produce ever larger quantities of data, making computational analysis with existing tools increasingly difficult. We offer GHOSTX as a potential solution to this problem.
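The two ideas in the abstract, BLASTX-style translation of DNA reads and seed finding over the suffix arrays of a query and a database, can be illustrated with a minimal Python sketch. This is not the GHOSTX implementation: it assumes exact-match seeds of a fixed, hypothetical length `k`, a single database protein, the standard genetic code (NCBI table 1), and naive suffix-array construction. The published method uses its own seed-selection and extension criteria, which this sketch does not reproduce. All function names here are illustrative.

```python
# Minimal sketch (not GHOSTX): six-frame translation plus exact k-mer seeds
# found by merging the suffix arrays of a query peptide and a database protein.

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate from position 0; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> list[str]:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rc = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return [translate(strand[f:]) for strand in (dna, rc) for f in range(3)]


def suffix_array(text: str) -> list[int]:
    """Naive construction by sorting suffix slices; fine for a demo only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seeds(query: str, database: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs whose length-k substrings match exactly.

    Both suffix arrays are sorted, so their k-prefixes appear in non-decreasing
    order and suffixes sharing a k-prefix form one contiguous run; a merge walk
    over the two arrays therefore finds every shared k-mer.
    """
    sa_q, sa_d = suffix_array(query), suffix_array(database)
    seeds = []
    i = j = 0
    while i < len(sa_q) and j < len(sa_d):
        pq = query[sa_q[i]:sa_q[i] + k]
        pd = database[sa_d[j]:sa_d[j] + k]
        if len(pq) < k:  # suffix too short to hold a seed
            i += 1
        elif len(pd) < k:
            j += 1
        elif pq < pd:
            i += 1
        elif pq > pd:
            j += 1
        else:
            # Collect the run of suffixes sharing this k-prefix on each side.
            i_end = i
            while i_end < len(sa_q) and query[sa_q[i_end]:sa_q[i_end] + k] == pq:
                i_end += 1
            j_end = j
            while j_end < len(sa_d) and database[sa_d[j_end]:sa_d[j_end] + k] == pd:
                j_end += 1
            seeds.extend((a, b) for a in sa_q[i:i_end] for b in sa_d[j:j_end])
            i, j = i_end, j_end
    return sorted(seeds)


def search_read(read: str, protein: str, k: int = 4) -> list[tuple[int, int, int]]:
    """Return (frame, query_pos, db_pos) seeds for every translated frame of a read."""
    hits = []
    for frame, peptide in enumerate(six_frame_translations(read)):
        hits.extend((frame, q, d) for q, d in find_seeds(peptide, protein, k))
    return hits
```

For example, the read `AAAACTGCTTATATTGCT` translates to `KTAYIA` in forward frame 0, so `search_read("AAAACTGCTTATATTGCT", "MKTAYIAKQR")` reports the three 4-mer seeds `(0, 0, 1)`, `(0, 1, 2)` and `(0, 2, 3)`. In a real search, each seed would then be extended into a scored alignment; that step is omitted here.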
Publication Info
- Year: 2014
- Type: article
- Volume: 9
- Issue: 8
- Pages: e103833
- Citations: 91
- Access: Closed
Identifiers
- DOI: 10.1371/journal.pone.0103833