Abstract
Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy while making searches significantly faster for virtually all ncRNA families. However, searches with these rigorous filters remain slower than searches with heuristic filters could be.
Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, which incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.
Availability: The source code is available under the GNU General Public License at the supplementary web site.
Contact: zasha@cs.washington.edu
Supplementary information: Technical details, results, and C++ code.
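The general filter-then-verify idea behind such heuristics can be illustrated with a small sketch: a cheap sequence-only model scans every window of the genome, and only windows scoring above a threshold are passed to the expensive structure-aware model. This is not the paper's method. The names `profile_score`, `filter_then_verify` and `toy_cm_score` are hypothetical, the "profile" is a simplified ungapped position-specific score table (a profile HMM would also have insert and delete states), and `toy_cm_score` is a crude stand-in for a CM that only rewards complementary base pairs at the window ends. Because the filter uses a score threshold rather than a provable bound, it can discard windows the expensive model would have accepted; that is the speed-versus-accuracy trade-off separating heuristic filters from rigorous ones.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical simplified profile: one dict of log-odds scores per position.
# Unlike a real profile HMM, there are no insert/delete states (no gaps).
Profile = List[Dict[str, float]]

MISMATCH_SCORE = -4.0  # assumed penalty for a base absent from a column


def profile_score(profile: Profile, window: str) -> float:
    """Cheap filter score: sum of per-position scores for an ungapped window."""
    return sum(col.get(base, MISMATCH_SCORE) for col, base in zip(profile, window))


def toy_cm_score(window: str) -> float:
    """Crude stand-in for an expensive CM: counts Watson-Crick pairs that
    would close a 3-bp stem between the window's two ends."""
    pairs = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
    return float(sum(1 for i in range(3) if (window[i], window[-1 - i]) in pairs))


def filter_then_verify(
    genome: str,
    profile: Profile,
    filter_threshold: float,
    expensive_score: Callable[[str], float],
    final_threshold: float,
) -> List[Tuple[int, float]]:
    """Scan every window with the cheap profile; call the expensive scorer
    only on windows that pass the filter threshold."""
    width = len(profile)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if profile_score(profile, window) < filter_threshold:
            continue  # filtered out: the expensive model never sees this window
        score = expensive_score(window)
        if score >= final_threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy profile favouring GGG, then AAA, then CCC (a hypothetical example).
    profile = [{"G": 2.0}] * 3 + [{"A": 2.0}] * 3 + [{"C": 2.0}] * 3
    print(filter_then_verify("UUUGGGAAACCCUUU", profile, 10.0, toy_cm_score, 3.0))
```

In this toy run only the window starting at position 3 passes the filter, so the expensive scorer is called once instead of seven times; in a real search the savings come from the CM being far costlier than the profile scan.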
Publication Info
- Year: 2005
- Type: article
- Volume: 22
- Issue: 1
- Pages: 35-39
- Citations: 94
- Access: Closed
Identifiers
- DOI: 10.1093/bioinformatics/bti743
- PMID: 16267089