On the Distribution of K-tuple Matches for Sequence Homology: A Constant Time Exact Calculation of the Variance

1998 Journal of Computational Biology 13 citations

Abstract

We study the distribution of a statistic useful in calculating the significance of the number of k-tuple matches detected in biological sequence homology algorithms. The statistic is $R_{n,k}$, the total number of heads in head runs of length k or more in a sequence of iid Bernoulli trials of length n. Calculation of the mean is straightforward. Poisson approximation formulas have been used for the variance because they are simple and powerful. Unfortunately, when p = P(Head) is large, the Poisson approximation no longer works well. In our application, p is large, say 0.75, and we have turned instead to direct calculation of the variance. Surprisingly, we are able to show that the variance, which is based on the interactions of $O(n^2)$ random variables, can be computed in constant time, independent of the length of the sequence and probability p. This result can be used to calculate the mean and variance of a number of other head run statistics in constant time. Additionally, we show how to extend the result to sequences generated by a stationary Markov process where the variance can be calculated in O(n) time.
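To make the definition of the statistic concrete, here is a minimal Python sketch. It is not the constant-time method of the paper, which the abstract does not spell out. It only computes the statistic for one sequence and, as a reference for small n, obtains the exact mean and variance by brute-force enumeration of all 2^n outcomes. The function names `heads_in_long_runs` and `exact_mean_and_variance` are hypothetical, chosen for illustration.

```python
from itertools import groupby, product


def heads_in_long_runs(trials, k):
    """Return the total number of heads lying in head runs of length >= k.

    `trials` is a sequence of 0/1 values, where 1 denotes a head.
    """
    total = 0
    for is_head, run in groupby(trials):
        length = sum(1 for _ in run)
        if is_head and length >= k:
            total += length
    return total


def exact_mean_and_variance(n, k, p):
    """Exact mean and variance of the statistic by full enumeration.

    Assumes iid Bernoulli(p) trials. The cost is exponential in n, so this
    is only a correctness reference for small n, not a practical method.
    """
    mean = 0.0
    second_moment = 0.0
    for trials in product((0, 1), repeat=n):
        heads = sum(trials)
        prob = p ** heads * (1 - p) ** (n - heads)
        r = heads_in_long_runs(trials, k)
        mean += prob * r
        second_moment += prob * r * r
    return mean, second_moment - mean ** 2
```

With k = 1 every head counts, so the statistic reduces to the binomial head count, with mean np and variance np(1 - p). That gives a simple sanity check for the enumeration.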

Keywords

Mathematics, Constant (computer programming), Variance (accounting), Homology (biology), Sequence (biology), Distribution (mathematics), Statistics, Combinatorics, Biology, Computer science, Mathematical analysis, Genetics

Publication Info

Year: 1998
Type: article
Volume: 5
Issue: 1
Pages: 87-100
Citations: 13
Access: Closed

Cite This

Gary Benson, Xiaoping Su (1998). On the Distribution of K-tuple Matches for Sequence Homology: A Constant Time Exact Calculation of the Variance. Journal of Computational Biology, 5(1), 87-100. https://doi.org/10.1089/cmb.1998.5.87

Identifiers

DOI: 10.1089/cmb.1998.5.87