Abstract
A computer program called BLASTX was previously shown to be effective in identifying and assigning putative function to likely protein coding regions by detecting significant similarity between a conceptually translated nucleotide query sequence and members of a protein sequence database. We present and assess the sensitivity of a new option to this software tool, herein called BLASTC, which employs information obtained from biases in codon utilization, along with the information obtained from sequence similarity. A rationale for combining these diverse information sources was derived, and analyses of the information available from codon utilization in several species were performed, with wide variation seen. Codon bias information was found on average to improve the sensitivity of detection of short coding regions of human origin by about a factor of 5. The implications of combining information sources on the interpretation of positive findings are discussed.
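One way to picture how the two information sources might be combined is as additive log-odds scores. The sketch below is illustrative only and is not taken from BLASTC: it assumes that codon-usage evidence and similarity evidence are independent, that both are expressed in bits, and that the background model treats all 64 codons as equally likely. The function names `codon_bias_bits` and `combined_bits` and the toy frequency table are hypothetical.

```python
import math


def codon_bias_bits(seq, coding_freq, background_freq=1.0 / 64):
    """Sum per-codon log2 odds (coding model vs. background) over frame 0.

    coding_freq maps codon -> relative frequency in known coding regions.
    A uniform background of 1/64 per codon is an assumption of this sketch.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    total = 0.0
    for i in range(0, usable, 3):
        p = coding_freq.get(seq[i:i + 3])
        if p is None or p <= 0:
            continue  # assumption: codons with no frequency estimate add nothing
        total += math.log2(p / background_freq)
    return total


def combined_bits(similarity_bits, codon_bits):
    """Add the two scores; valid only if the evidence sources are independent."""
    return similarity_bits + codon_bits


# Toy, made-up frequencies (not real codon usage data).
toy_freq = {"GCC": 1.0 / 16, "GCT": 1.0 / 64}
biased = codon_bias_bits("GCCGCC", toy_freq)    # two codons favored 4:1
unbiased = codon_bias_bits("GCTGCT", toy_freq)  # two codons at background rate
total = combined_bits(10.0, biased)             # 10.0 is a made-up similarity score
```

Under these assumptions, a short region whose similarity score alone falls below a reporting threshold can cross it once the codon-bias bits are added, which is the kind of sensitivity gain the abstract describes for short coding regions.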
Related Publications
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence ...
Use of the UGA terminator as a tryptophan codon in yeast mitochondria.
We propose that the UGA terminator regularly occurs as a tryptophan codon in yeast mitochondrial DNA. This conclusion is based on the sequence analysis of mitochondrial DNA regi...
Species-Diagnostic Differences in a Ribosomal DNA Internal Transcribed Spacer from the Sibling Species Anopheles Freeborni and Anopheles Hermsi (Diptera:Culicidae)
Approximately 460 base pairs (bp) of DNA sequence that included the second internal transcribed spacer (ITS2) and some flanking 5.8S and 28S ribosomal RNA coding regions were co...
The 16s/23s ribosomal spacer region as a target for DNA probes to identify eubacteria.
Variable regions of the 16s ribosomal RNA have been frequently used as the target for DNA probes to identify microorganisms. In some situations, however, there is very little se...
Outlier Detection and False Discovery Rates for Whole-Genome DNA Matching
We define a statistic, called the matching statistic, for locating regions of the genome that exhibit excess similarity among cases when compared to controls. Such regi...
Publication Info
- Year: 1994
- Type: article
- Volume: 1
- Issue: 1
- Pages: 39-50
- Citations: 149
- Access: Closed
Identifiers
- DOI: 10.1089/cmb.1994.1.39