Abstract
We study the distribution of a statistic useful in calculating the significance of the number of k-tuple matches detected by biological sequence homology algorithms. The statistic is $R_{n,k}$, the total number of heads in head runs of length k or more in a sequence of n i.i.d. Bernoulli trials. Calculation of the mean is straightforward. Poisson approximation formulas have been used for the variance because they are simple and powerful. Unfortunately, when p = P(Head) is large, the Poisson approximation no longer works well. In our application p is large, say 0.75, so we have turned instead to direct calculation of the variance. Surprisingly, we are able to show that the variance, which is based on the interactions of $O(n^2)$ random variables, can be computed in constant time, independent of the sequence length n and the probability p. This result can be used to calculate the mean and variance of a number of other head-run statistics in constant time. Additionally, we show how to extend the result to sequences generated by a stationary Markov process, where the variance can be calculated in $O(n)$ time.
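As a concrete illustration of $R_{n,k}$ and of why the mean is straightforward, here is a minimal Python sketch. It is not the authors' constant-time variance method. It assumes Head is encoded as 1 and Tail as 0, and all function names are hypothetical. The exact mean follows from one decomposition, derived here rather than taken from the paper: each head run of length L ≥ k contributes k at its start plus L − k, one for each window of k + 1 consecutive heads inside it. This gives E[R_{n,k}] = k·p^k·(1 + (n − k)q) + (n − k)·p^{k+1} for n ≥ k, with q = 1 − p. The brute-force enumeration computes the exact mean and variance for small n. Its 2^n cost shows why a closed form for the variance is valuable.

```python
from itertools import product


def head_run_total(seq, k):
    """Return R_{n,k}: the number of heads lying in head runs of length >= k.

    Assumed encoding: seq holds 0/1 values, where 1 means Head.
    """
    total = 0
    run = 0
    for x in list(seq) + [0]:  # a trailing 0 closes the final run
        if x == 1:
            run += 1
        else:
            if run >= k:
                total += run
            run = 0
    return total


def mean_formula(n, k, p):
    """Exact E[R_{n,k}] for n i.i.d. Bernoulli(p) trials.

    Each run of length L >= k contributes k at its start plus L - k,
    one for every window of k + 1 consecutive heads inside the run.
    """
    if n < k:
        return 0.0
    q = 1.0 - p
    run_starts = p**k + (n - k) * q * p**k  # expected number of runs of length >= k
    long_windows = (n - k) * p ** (k + 1)   # expected number of (k+1)-head windows
    return k * run_starts + long_windows


def exact_moments(n, k, p):
    """Mean and variance of R_{n,k} by enumerating all 2**n sequences.

    Exponential in n, so this is only a check for small cases.
    """
    q = 1.0 - p
    m1 = m2 = 0.0
    for seq in product((0, 1), repeat=n):
        heads = sum(seq)
        weight = p**heads * q ** (n - heads)  # probability of this sequence
        r = head_run_total(seq, k)
        m1 += weight * r
        m2 += weight * r * r
    return m1, m2 - m1 * m1
```

For example, the mean returned by `exact_moments(12, 3, 0.75)` agrees with `mean_formula(12, 3, 0.75)` to floating-point precision. The variance has no such simple check in this sketch. With k = 1, $R_{n,1}$ is just the number of heads, so the enumerated variance reduces to npq.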
Publication Info
- Year: 1998
- Type: article
- Volume: 5
- Issue: 1
- Pages: 87-100
- Citations: 13
- Access: Closed
Identifiers
- DOI: 10.1089/cmb.1998.5.87