Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps

Abstract

Motivation: Locating protein-coding exons (CDSs) on a eukaryotic genomic DNA sequence is the initial and essential step in predicting the functions of the genes embedded in that part of the genome. Accurate prediction of CDSs may be achieved by directly matching the DNA sequence with a known protein sequence or with a profile of homologous family members.

Results: A new convention for encoding a DNA sequence into a series of 23 possible letters (translated codon, or tron, code) was devised to improve this type of analysis. Using this convention, a dynamic programming algorithm was developed to align a DNA sequence with a protein sequence or profile so that the spliced and translated sequence optimally matches the reference, in the same way as a standard protein sequence alignment that allows for long gaps. The objective function also takes account of frameshift errors, coding potentials, and translational initiation, termination and splicing signals. This method was tested on Caenorhabditis elegans genes of known structure. The accuracy of prediction, measured as a correlation coefficient (CC) at the nucleotide level, was about 95% for the 288 genes tested, and 97.0% for the 170 genes whose product and closest homologue share more than 30% identical amino acids. We also propose a strategy to improve the accuracy of prediction for a set of paralogous genes by iterating gene prediction and reconstruction of the reference profile derived from the predicted sequences.

Availability: The source code for the program ‘aln’, written in ANSI-C, and the test data will be available via anonymous FTP at ftp.genome.ad.jp/pub/genomenet/saitama-cc.

Contact: gotoh@cancer-c.pref.saitama.jp
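The abstract reports accuracy as a nucleotide-level correlation coefficient (CC) but does not spell out the formula. The sketch below assumes the CC is the correlation between predicted and annotated per-nucleotide coding labels (the Matthews correlation commonly used to evaluate gene predictors); that is an assumption, not something the abstract confirms. The function names, the interval convention and the toy exon coordinates are hypothetical and are not taken from the paper or from the ‘aln’ program.

```python
from math import sqrt


def mask_from_exons(length, exons):
    """Build a per-nucleotide coding mask from 0-based, half-open (start, end) exons."""
    mask = [False] * length
    for start, end in exons:
        for i in range(start, end):
            mask[i] = True
    return mask


def nucleotide_cc(predicted, actual):
    """Correlation coefficient between two boolean coding masks.

    Assumed definition (not stated in the abstract):
    CC = (TP*TN - FP*FN) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
    """
    if len(predicted) != len(actual):
        raise ValueError("masks must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    if denom == 0:
        raise ValueError("CC is undefined when a marginal count is zero")
    return (tp * tn - fp * fn) / denom


# Toy example: the second predicted exon starts 5 nt too late.
actual = mask_from_exons(100, [(10, 30), (50, 80)])
predicted = mask_from_exons(100, [(10, 30), (55, 80)])
print(round(nucleotide_cc(predicted, actual), 3))
```

Under this definition, a CC of 1.0 means every nucleotide is labelled correctly, so the reported values of about 95% and 97.0% would correspond to CC values of roughly 0.95 and 0.97.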

Keywords

Gene prediction, Gene, Genetics, Algorithm, Coding region, Sequence (biology), Computer science, Computational biology, Exon, Sequence analysis, Genome, Biology

Affiliated Institutions

Saitama Cancer Center JP

Related Publications

GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions

J Besemer

Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence ...

2001 Nucleic Acids Research 2290 citations

Use of the UGA terminator as a tryptophan codon in yeast mitochondria.

G. Macino, Gloria M. Coruzzi, F. G. Nóbrega +2 more

We propose that the UGA terminator regularly occurs as a tryptophan codon in yeast mitochondrial DNA. This conclusion is based on the sequence analysis of mitochondrial DNA regi...

1979 Proceedings of the National Academy o... 153 citations

Improved metagenomic analysis with Kraken 2

Derrick E. Wood, Jennifer Lu, Ben Langmead

2019 Genome biology 6015 citations

SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing

Juliane C. Dohm, Claudio Lottaz, Tatiana Borodina +1 more

The latest revolution in the DNA sequencing field has been brought about by the development of automated sequencers that are capable of generating giga base pair data sets quick...

2007 Genome Research 281 citations

Fast and accurate short read alignment with Burrows–Wheeler transform

Heng Li, Richard Durbin

Abstract Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A...

2009 Bioinformatics 59569 citations

Publication Info

Year: 2000
Type: article
Volume: 16
Issue: 3
Pages: 190-202
Citations: 68
Access: Closed

Cite This

APA Style

                            
                                    Osamu Gotoh
                                
                            (2000). 
                            Homology-based gene structure prediction: simplified matching algorithm using a translated codon (tron) and improved accuracy by allowing for long gaps. 
                            Bioinformatics
                            , 16
                            (3)
                            , 190-202.
                            https://doi.org/10.1093/bioinformatics/16.3.190

Identifiers

DOI: 10.1093/bioinformatics/16.3.190