Modified Randomization Tests for Nonparametric Hypotheses

Abstract

Suppose $X_1, \cdots, X_m, Y_1, \cdots, Y_n$ are $m + n = N$ independent random variables, the $X$'s identically distributed and the $Y$'s identically distributed, each with a continuous cdf. Let $$z = (z_1, \cdots, z_m, z_{m + 1}, \cdots, z_N) = (x_1, \cdots, x_m, y_1, \cdots, y_n)$$ represent an observation on the $N$ random variables and let $$u(z) = (1/m) \sum^m_{i = 1} z_i - (1/n) \sum^N_{i = m + 1} z_i = \bar x - \bar y.$$ Consider the $r = N!$ $N$-tuples obtained from $(z_1, \cdots, z_N)$ by making all permutations of the indices $(1, \cdots, N)$. Since we assume continuous cdf's, these $r$ $N$-tuples will be distinct with probability one. Denote them by $z^{(1)}, \cdots, z^{(r)}$, and suppose that they have been ordered so that $$u(z^{(1)}) \geqq \cdots \geqq u(z^{(r)}).$$ Notice that since $$\bar x - \bar y = (1/m) \sum^N_{i = 1} z_i - (N/m)\bar y = (N/n)\bar x - (1/n) \sum^N_{i = 1} z_i,$$ the same ordering can be induced by choosing $u(z) = c\bar x$ or $u(z) = -c\bar y$ for any $c > 0$.

Assuming that the cdf's of $X_1, Y_1$ are of the form $F(x), F(x - \Delta)$ respectively, Pitman [2] suggested essentially the following test of the hypothesis $H'$ that $\Delta = 0$. Select a set of $k$ $(k > 0)$ integers $i_1, \cdots, i_k$ $(1 \leqq i_1 < \cdots < i_k \leqq r)$. If the observed $z$ is one of the points $z^{(i_1)}, \cdots, z^{(i_k)}$, reject $H'$; otherwise accept. When $H'$ is true, the probability of a type one error does not depend on the specific form of the distribution of the $X$'s and the $Y$'s and is in fact equal to $k/r$. The choice of the rejection set $i_1, \cdots, i_k$ should depend on the alternative hypothesis. For instance, if the experimenter wants protection against the alternative that the "$X$'s tend to be larger than the $Y$'s," then the labels $1, \cdots, k$ might be reasonable. For the alternative that the "$X$'s tend to be smaller than the $Y$'s," the analogous procedure is to use the other tail, $r - k + 1, \cdots, r$. Against both alternatives, a two-tail procedure could be used.

Lehmann and Stein have shown in [1] that in the class of all tests (of size $\alpha = k/r$) of the hypothesis $$H: \text{the distribution of } X_1, \cdots, X_m, Y_1, \cdots, Y_n \text{ is invariant under all permutations},$$ the single-tail test based on $1, \cdots, k$ is uniformly most powerful against the alternatives that $F_1$ is an $N(\theta, \sigma)$ cdf, $F_2$ is an $N(\theta + \Delta, \sigma)$ cdf, $\Delta < 0$; the test based on $r - k + 1, \cdots, r$ is uniformly most powerful for $\Delta > 0$.

A practical shortcoming of this procedure is the great difficulty of enumerating the points $z^{(i)}$ and evaluating $u(z^{(i)})$ for each of them. For instance, even after eliminating those permutations which always give the same value of $u$, for sample sizes $m = n = 5$ there are $\binom{10}{5} = 252$ permutations to examine, and for sample sizes $m = n = 10$ there are $\binom{20}{10} = 184{,}756$ permutations to examine. In the following section, we propose the almost obvious procedure of examining a "random sample" of permutations and making the decision to accept or reject $H$ on the basis of those permutations only. Bounds are determined for the ratio of the power of the original procedure to that of the modified one. Some numerical values of these bounds are given in Table 1. The bounds listed there correspond to tests which in both original and modified form have size $\alpha$, and for which the modified test is based on a random sample of $s$ permutations drawn with replacement. These have been computed for a certain class of alternatives which is described below. For simplicity, we have restricted the main exposition to the two-sample problem. In Section 5, we point out extensions to the more general hypotheses of invariance studied in [1].
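
The contrast between full enumeration and a random sample of permutations can be illustrated with a short Python sketch. It is not the paper's procedure in detail: the function names, the floating-point tolerance `TOL`, and the "add one" convention (counting the observed arrangement alongside the $s$ sampled ones) are assumptions made for illustration. The exact version enumerates the $\binom{N}{m}$ distinct splits of the pooled sample, and the sampled version draws $s$ independent random permutations, that is, with replacement. Both use the upper tail of $u(z) = \bar x - \bar y$.

```python
import itertools
import random

TOL = 1e-12  # assumed tolerance for floating-point comparisons of u values


def mean_difference(xs, ys):
    """u(z) = x-bar minus y-bar for one arrangement of the pooled sample."""
    return sum(xs) / len(xs) - sum(ys) / len(ys)


def exact_upper_tail_pvalue(xs, ys):
    """Fraction of all C(N, m) splits whose u is at least the observed u."""
    pooled = list(xs) + list(ys)
    m, n = len(xs), len(ys)
    total = sum(pooled)
    observed = mean_difference(xs, ys)
    hits = 0
    splits = 0
    for idx in itertools.combinations(range(m + n), m):
        sum_x = sum(pooled[i] for i in idx)
        u = sum_x / m - (total - sum_x) / n
        splits += 1
        if u >= observed - TOL:
            hits += 1
    return hits / splits


def sampled_upper_tail_pvalue(xs, ys, s, rng):
    """Upper-tail p-value from s random permutations drawn with replacement.

    The observed arrangement is counted as one extra permutation, so the
    result lies in [1 / (s + 1), 1]. This convention is an assumption.
    """
    pooled = list(xs) + list(ys)
    m = len(xs)
    observed = mean_difference(xs, ys)
    hits = 1  # the observed arrangement itself
    for _ in range(s):
        perm = rng.sample(pooled, len(pooled))  # one uniform random permutation
        if mean_difference(perm[:m], perm[m:]) >= observed - TOL:
            hits += 1
    return hits / (s + 1)


if __name__ == "__main__":
    xs = [5.1, 6.3, 7.2, 8.4, 9.0]  # made-up data, m = 5
    ys = [1.2, 2.5, 3.1, 4.4, 4.9]  # made-up data, n = 5
    print("exact:  ", exact_upper_tail_pvalue(xs, ys))
    print("sampled:", sampled_upper_tail_pvalue(xs, ys, 999, random.Random(0)))
```

In this made-up data set every $x$ exceeds every $y$, so the observed split is the single most extreme of the 252 and the exact upper-tail value is $1/252$. The sampled version examines only $s = 999$ permutations, and its cost stays fixed as $m$ and $n$ grow, while the number of splits to enumerate does not.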

Keywords

Combinatorics, Mathematics, Independent and identically distributed random variables, Bar (unit), Random variable, Statistics, Physics

Publication Info

Year: 1957
Type: article
Volume: 28
Issue: 1
Pages: 181-187
Citations: 731
Access: Closed

Cite This

APA Style

Meyer Dwass (1957). Modified Randomization Tests for Nonparametric Hypotheses. The Annals of Mathematical Statistics, 28(1), 181-187. https://doi.org/10.1214/aoms/1177707045

Identifiers

DOI: 10.1214/aoms/1177707045