Abstract

Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes affect different types of analyses to different degrees. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to newly sequenced genomes, we emphasize that continuously updating popular public databases is an urgent and unresolved problem. Given the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

Keywords

Annotation, Genome, Genome project, DNA sequencing, Computational biology, Computer science, Gene Annotation, Bacterial genome size, Biology, Gene, Genetics

Publication Info

Year: 2010
Type: review
Volume: 156
Issue: 7
Pages: 1909-1917
Citations: 105
Access: Closed

Citation Metrics

105 (OpenAlex)

Cite This

Maria Poptsova, J. Peter Gogarten (2010). Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology, 156(7), 1909-1917. https://doi.org/10.1099/mic.0.033811-0

Identifiers

DOI: 10.1099/mic.0.033811-0