Abstract

Elimination of the data processing bottleneck in high-throughput sequencing will require both improved accuracy of data processing software and reliable measures of that accuracy. We have developed and implemented in our base-calling program phred the ability to estimate a probability of error for each base-call, as a function of certain parameters computed from the trace data. These error probabilities are shown here to be valid (correspond to actual error rates) and to have high power to discriminate correct base-calls from incorrect ones, for read data collected under several different chemistries and electrophoretic conditions. They play a critical role in our assembly program phrap and our finishing program consed.
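phred reports each base-call's estimated error probability p as a quality value q = -10 log10(p). For example, q = 20 corresponds to p = 0.01 and q = 30 to p = 0.001. Below is a minimal sketch of that conversion and of the kind of validity check the abstract describes, where predicted error rates are compared with observed ones. It assumes you already have quality values plus a per-call correct/incorrect label obtained against a trusted reference sequence. phred's own mapping from trace parameters to p (a lookup table) is not reproduced here, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def error_prob_to_quality(p: float) -> int:
    """Phred-scaled quality q = -10 * log10(p), rounded to the nearest integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10.0 * math.log10(p))


def quality_to_error_prob(q: int) -> float:
    """Inverse mapping: p = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


def calibration_table(calls):
    """Compare predicted and observed error rates for each quality value.

    `calls` is an iterable of (quality, is_error) pairs (hypothetical input),
    where is_error is True if the base-call disagrees with a trusted reference.
    Returns {quality: (n_calls, predicted_rate, observed_rate)}, sorted by quality.
    """
    counts = defaultdict(lambda: [0, 0])  # quality -> [n_calls, n_errors]
    for q, is_error in calls:
        counts[q][0] += 1
        counts[q][1] += int(is_error)
    return {
        q: (n, quality_to_error_prob(q), errors / n)
        for q, (n, errors) in sorted(counts.items())
    }


if __name__ == "__main__":
    # Synthetic data: 1000 calls at q=20 with 10 errors, 1000 at q=30 with 1 error.
    calls = [(20, i < 10) for i in range(1000)] + [(30, i < 1) for i in range(1000)]
    for q, (n, predicted, observed) in calibration_table(calls).items():
        print(f"q={q}: n={n}, predicted={predicted:.4f}, observed={observed:.4f}")
```

In this framing, a set of error probabilities is "valid" when the observed rate in each quality bin is close to the predicted rate. The discrimination property the abstract mentions is a separate question: how well the scores rank incorrect calls below correct ones.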

Keywords

Bottleneck, Base (topology), Software, Computer science, Biology, Function (biology), TRACE (psycholinguistics), Word error rate, Algorithm, Artificial intelligence, Genetics, Mathematics, Programming language, Embedded system

MeSH Terms

Base Sequence; Chimera; Cloning, Molecular; Data Interpretation, Statistical; Discriminant Analysis; Genetic Vectors; Human Genome Project; Humans; Probability; Quality Control; Reproducibility of Results; Sequence Analysis, DNA; Software

Publication Info

Year: 1998
Type: article
Volume: 8
Issue: 3
Pages: 186-194
Citations: 5469
Access: Closed

Citation Metrics

OpenAlex: 5469
Influential: 330
CrossRef: 3718

Cite This

Brent Ewing, Phil Green (1998). Base-Calling of Automated Sequencer Traces Using Phred. II. Error Probabilities. Genome Research, 8(3), 186-194. https://doi.org/10.1101/gr.8.3.186

Identifiers

DOI: 10.1101/gr.8.3.186
PMID: 9521922