Abstract
A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human beta T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.
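To make the notion of percent identity between adjacent pattern copies concrete, here is a minimal Python sketch. It is not the algorithm presented in the paper: it ignores indels, applies no statistical recognition criteria, and scans a user-supplied range of candidate pattern sizes, which the paper's method does not require. The function names `adjacent_copy_identity` and `best_period` are hypothetical and chosen for illustration only.

```python
from typing import Tuple


def adjacent_copy_identity(seq: str, period: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + period].

    Simplifying assumption: copies differ only by substitutions,
    so adjacent copies line up at a fixed offset (no indels).
    """
    if period <= 0 or period >= len(seq):
        raise ValueError("period must be between 1 and len(seq) - 1")
    compared = len(seq) - period
    matches = sum(1 for i in range(compared) if seq[i] == seq[i + period])
    return matches / compared


def best_period(seq: str, max_period: int) -> Tuple[int, float]:
    """Return the (period, identity) pair with the highest identity.

    Ties are broken in favour of the smaller period.
    """
    candidates = range(1, min(max_period, len(seq) - 1) + 1)
    scored = ((p, adjacent_copy_identity(seq, p)) for p in candidates)
    return max(scored, key=lambda item: (item[1], -item[0]))


if __name__ == "__main__":
    exact = "GAAGAAGAAGAA"    # four exact copies of GAA
    mutated = "GAAGAAGTAGAA"  # one substitution in the third copy
    print(best_period(exact, 6))
    print(adjacent_copy_identity(mutated, 3))
```

For the exact repeat, period 3 scores an identity of 1.0; the single substitution in the second string lowers the period-3 identity to 7/9, because the altered base disagrees with both of its neighbouring copies. Handling indels would require an alignment between copies rather than a fixed offset.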
Publication Info
- Year: 1999
- Type: article
- Volume: 27
- Issue: 2
- Pages: 573-580
- Citations: 9221
- Access: Closed
Identifiers
- DOI: 10.1093/nar/27.2.573