Abstract

With the goal of solving the whole‐cell problem with Escherichia coli K‐12 as a model cell, highly accurate genomes were determined for two closely related K‐12 strains, MG1655 and W3110. Completion of the W3110 genome and comparison with the MG1655 genome revealed differences at 267 sites, including 251 sites with short, mostly single‐nucleotide, insertions or deletions (indels) or base substitutions (totaling 358 nucleotides), in addition to 13 sites with an insertion sequence element or defective prophage in only one strain and two sites for the W3110 inversion. Direct DNA sequencing of PCR products for the 251 regions with short indel and base disparities revealed that only eight sites are true differences. The other 243 discrepancies were due to errors in the original MG1655 sequence, including 79 frameshifts, one amino‐acid residue deletion, five amino‐acid residue insertions, 73 missense, and 17 silent changes within coding regions. Errors in the original MG1655 sequence (<1 per 13 000 bases) were mostly within portions sequenced with out‐dated technology based on radioactive chemistry.
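The counts in the abstract can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original paper: the variable names are illustrative, and the "other" bucket is an assumption covering errors that the abstract does not itemize (it lists only the categories that fall within coding regions).

```python
# Counts quoted in the abstract.
short_sites = 251        # sites with short indels or base substitutions
true_differences = 8     # confirmed by direct sequencing of PCR products

# Discrepancies attributed to errors in the original MG1655 sequence.
mg1655_errors = short_sites - true_differences

# Error categories the abstract lists for coding regions.
coding_errors = {
    "frameshift": 79,
    "residue_deletion": 1,
    "residue_insertion": 5,
    "missense": 73,
    "silent": 17,
}
coding_total = sum(coding_errors.values())

# Assumption: whatever the abstract does not itemize is grouped as "other".
other_errors = mg1655_errors - coding_total

print(mg1655_errors, coding_total, other_errors)
```

The difference between the two totals is the number of errors the abstract leaves uncategorized; the abstract itself does not say where in the genome they lie.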

Keywords

Indel; Biology; Genome; Prophage; Genetics; Escherichia coli; Whole genome sequencing; INDEL Mutation; Base pair; Computational biology; DNA; Gene; Bacteriophage; Single-nucleotide polymorphism; Genotype

MeSH Terms

Base Sequence; Escherichia coli; Genome, Bacterial; Mutation; Sequence Analysis, DNA

Publication Info

Year: 2006
Type: article
Volume: 2
Issue: 1
Pages: 2006.0007
Citations: 496
Access: Closed

Citation Metrics

OpenAlex: 496
Influential: 59
CrossRef: 387

Cite This

Kôji Hayashi, Naoki Morooka, Yoshihiro Yamamoto et al. (2006). Highly accurate genome sequences of Escherichia coli K‐12 strains MG1655 and W3110. Molecular Systems Biology, 2(1), 2006.0007. https://doi.org/10.1038/msb4100049

Identifiers

DOI: 10.1038/msb4100049
PMID: 16738553
PMCID: PMC1681481
