Abstract
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes affect different types of analyses to different degrees. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to newly sequenced genomes, we emphasize that continuously updating popular public databases is an urgent and unresolved problem. Owing to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.
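The idea of locating missing orthologues by comparing genomes can be illustrated with a minimal sketch. This is not the procedure used in the review; it assumes a hypothetical presence table that maps each genome to the set of orthologue-group identifiers annotated in it, and it flags groups that are annotated in most of the other genomes but absent from one. The function name, the `min_fraction` threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def find_missing_orthologues(
    presence: Dict[str, Set[str]],
    min_fraction: float = 0.8,
) -> Dict[str, List[str]]:
    """Flag orthologue groups that may have been missed in a genome.

    presence maps a genome name to the set of orthologue-group IDs
    annotated in that genome (hypothetical input format).
    A group is reported for a genome when it is absent there but present
    in at least min_fraction of the other genomes.
    """
    genomes = list(presence)
    all_groups = set().union(*presence.values())
    candidates: Dict[str, List[str]] = {g: [] for g in genomes}

    for group in sorted(all_groups):
        for genome in genomes:
            if group in presence[genome]:
                continue  # already annotated, nothing to flag
            others = [g for g in genomes if g != genome]
            if not others:
                continue  # a single genome gives no basis for comparison
            fraction = sum(group in presence[o] for o in others) / len(others)
            if fraction >= min_fraction:
                candidates[genome].append(group)
    return candidates


# Toy data: four related genomes and four orthologue groups (illustrative only).
toy_presence = {
    "genome_A": {"og1", "og2", "og3"},
    "genome_B": {"og1", "og2", "og3"},
    "genome_C": {"og1", "og3"},
    "genome_D": {"og1", "og2", "og4"},
}

for genome, groups in find_missing_orthologues(toy_presence).items():
    print(genome, groups)
```

A flagged group is only a candidate: the gene may be truly absent, so each hit would still need to be checked against the genome sequence, for example with a similarity search.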
Publication Info
- Year: 2010
- Type: review
- Volume: 156
- Issue: 7
- Pages: 1909-1917
- Citations: 105
- Access: Closed
Identifiers
- DOI: 10.1099/mic.0.033811-0