Abstract

Motivation: Non-coding RNAs (ncRNAs) are functional RNA molecules that do not code for proteins. Covariance Models (CMs) are a useful statistical tool for finding new members of an ncRNA gene family in a large genome database, using both sequence and, importantly, RNA secondary structure information. Unfortunately, CM searches are extremely slow. Previously, we created rigorous filters, which provably sacrifice none of a CM's accuracy while making searches significantly faster for virtually all ncRNA families. However, these rigorous filters make searches slower than heuristics could be.

Results: In this paper we introduce profile HMM-based heuristic filters. We show that their accuracy is usually superior to that of heuristics based on BLAST. Moreover, we compared our heuristics with those used in tRNAscan-SE, whose heuristics incorporate a significant amount of work specific to tRNAs, whereas our heuristics are generic to any ncRNA. Performance was roughly comparable, so we expect that our heuristics provide a high-quality solution that, unlike family-specific solutions, can scale to hundreds of ncRNA families.

Availability: The source code is available under GNU Public License at the supplementary web site.

Contact: zasha@cs.washington.edu

Supplementary information: (Technical details, results, C++ code)
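The filtering idea in the abstract (a cheap sequence-based score decides which windows are handed to the slow CM) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an ungapped log-odds profile as a simplified stand-in for a profile HMM (no insert or delete states), a uniform background over A, C, G, U, and a caller-supplied `expensive_score` function standing in for the CM. All function names and the threshold are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

ALPHABET = "ACGU"  # assumption: sequences contain only these four bases


def build_log_odds_profile(
    alignment: List[str], pseudocount: float = 1.0
) -> List[Dict[str, float]]:
    """Per-column log2-odds scores against a uniform background.

    Assumes an ungapped alignment whose rows all have the same length.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        profile.append(
            {base: math.log2((counts[base] / total) / 0.25) for base in ALPHABET}
        )
    return profile


def filter_score(profile: List[Dict[str, float]], window: str) -> float:
    """Cheap sequence-only score: sum of the column scores for the window."""
    return sum(column[base] for column, base in zip(profile, window))


def filtered_search(
    genome: str,
    profile: List[Dict[str, float]],
    threshold: float,
    expensive_score: Callable[[str], float],
) -> List[Tuple[int, float]]:
    """Run the expensive scorer only on windows that pass the cheap filter."""
    width = len(profile)
    hits = []
    for start in range(len(genome) - width + 1):
        window = genome[start:start + width]
        if filter_score(profile, window) >= threshold:
            hits.append((start, expensive_score(window)))
    return hits
```

Because the filter is heuristic, a window that the expensive model would have accepted can fall below the threshold and be missed; lowering the threshold trades speed for sensitivity, which is the trade-off the abstract contrasts with the earlier rigorous filters.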

Keywords

Heuristics; Computer science; Non-coding RNA; Source code; Heuristic; Annotation; Theoretical computer science; Computational biology; RNA; Artificial intelligence; Biology; Genetics; Gene

MeSH Terms

Algorithms; Computational Biology; Genome; Humans; Markov Chains; Models, Statistical; Nucleic Acid Conformation; Protein Structure, Secondary; Proteins; RNA; RNA, Transfer; RNA, Untranslated; ROC Curve; Sensitivity and Specificity; Sequence Alignment; Software

Publication Info

Year: 2005
Type: article
Volume: 22
Issue: 1
Pages: 35-39
Citations: 94
Access: Closed

Citation Metrics

OpenAlex: 94
Influential: 2
CrossRef: 70

Cite This

Zasha Weinberg, Walter L. Ruzzo (2005). Sequence-based heuristics for faster annotation of non-coding RNA families. Bioinformatics, 22(1), 35-39. https://doi.org/10.1093/bioinformatics/bti743

Identifiers

DOI: 10.1093/bioinformatics/bti743
PMID: 16267089
