
Large-scale simultaneous hypothesis testing: the choice of a null hypothesis.

Journal of the American Statistical Association

March 01, 2004 | Efron, Bradley | COPYRIGHT 1999 American Statistical Association. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

1. INTRODUCTION

Until recently, "simultaneous inference" meant considering two or five or perhaps even 10 hypothesis tests at the same time, as in Miller's classic text (Miller 1981). Rapid progress in technology, particularly in genomics and imaging, has vastly upped the ante for simultaneous inference problems. Now 500 or 5,000 or even 50,000 tests may need to be evaluated simultaneously, raising new problems for the statistician, but also opening new analytic opportunities. This article explores choosing an appropriate null hypothesis in large-scale testing situations, and how this choice affects well-known inference methods, such as the false discovery rate (FDR).

Simultaneous hypothesis testing begins with a collection of null hypotheses,

$H_1, H_2, \ldots, H_N$; (1)

corresponding test statistics, possibly not independent,

$Y_1, Y_2, \ldots, Y_N$; (2)

and their p values, $P_1, P_2, \ldots, P_N$, with $P_i$ measuring how strongly $y_i$, the observed value of $Y_i$, contradicts $H_i$; for instance, $P_i = \Pr_{H_i}\{|Y_i| > |y_i|\}$. "Large-scale" means that N is a big number, say N > 100.

It is convenient, although not necessary, to work with z-values instead of the $Y_i$'s or $P_i$'s,

$z_i = \Phi^{-1}(P_i), \quad i = 1, 2, \ldots, N,$ (3)

with $\Phi$ indicating the standard normal cumulative distribution function (cdf); for example, $\Phi^{-1}(.95) = 1.645$. If $H_i$ is exactly true, then $z_i$ will have a standard normal distribution,

$z_i \mid H_i \sim N(0, 1).$ (4)

I call (4) the theoretical null hypothesis.
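Transformation (3) takes one line with Python's standard library. The sketch below, assuming p values that are exactly Uniform(0, 1) under a true null, checks the example $\Phi^{-1}(.95) = 1.645$ and shows that simulated null p values map to z-values that behave like draws from the theoretical null (4). The helper name `p_to_z` is hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev

PHI = NormalDist()  # standard normal: mean 0, sd 1

def p_to_z(p_values):
    """Transform p values into z-values, z_i = Phi^{-1}(P_i), as in (3)."""
    # inv_cdf requires 0 < p < 1; an exact 0 or 1 raises StatisticsError.
    return [PHI.inv_cdf(p) for p in p_values]

print(round(p_to_z([0.95])[0], 3))  # 1.645, the example in the text

# If every H_i is exactly true, each continuous p value is Uniform(0, 1),
# so the z-values should look like N(0, 1) draws: the theoretical null (4).
rng = random.Random(0)
null_z = p_to_z([rng.random() for _ in range(5000)])
print(round(fmean(null_z), 2), round(stdev(null_z), 2))  # close to 0 and 1
```

Any valid p value that is uniform under its null gives the same N(0, 1) behavior, which is why working on the z scale costs nothing.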

Our motivating example concerns a study of 1,391 patients with human immunodeficiency virus (HIV) infection, investigating which of 6 protease inhibitor (PI) drugs cause mutations at which of 74 sites on the viral genome. Each patient provided a vector of predictors,

$x = (x_1, x_2, \ldots, x_6),$ (5)

with $x_j = 1$ or 0 indicating whether or not the patient used PI$_j$, $1 \le \sum_{j=1}^{6} x_j \le 6$; and a vector of responses,

$v = (v_1, v_2, \ldots, v_{74}),$ (6)

with $v_k = 1$ or 0 indicating whether or not a mutation occurred at site k. Remark A of Section 7 describes the study in more detail.
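One way to hold the data in (5) and (6) is one record per patient that enforces the stated constraints: six binary drug indicators with at least one drug used, and 74 binary mutation indicators. This is a minimal sketch; the `Patient` class and its field names are hypothetical, not taken from the study.

```python
from dataclasses import dataclass
from typing import Tuple

N_DRUGS = 6   # protease inhibitors PI_1, ..., PI_6
N_SITES = 74  # genomic sites k = 1, ..., 74

@dataclass(frozen=True)
class Patient:
    """One patient's record: predictors x as in (5), responses v as in (6)."""
    x: Tuple[int, ...]  # x_j = 1 if the patient used PI_j, else 0
    v: Tuple[int, ...]  # v_k = 1 if a mutation occurred at site k, else 0

    def __post_init__(self):
        if len(self.x) != N_DRUGS or len(self.v) != N_SITES:
            raise ValueError("x must have 6 entries and v must have 74")
        if any(b not in (0, 1) for b in self.x + self.v):
            raise ValueError("all entries must be 0 or 1")
        # Every patient used at least one of the six drugs: 1 <= sum(x) <= 6.
        if not 1 <= sum(self.x) <= N_DRUGS:
            raise ValueError("patient must have used at least one PI")

patient = Patient(x=(1, 0, 0, 1, 0, 0), v=(0,) * 73 + (1,))
print(sum(patient.x))  # number of PIs this patient used
```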

For each of the 74 genomic sites, a separate logistic regression analysis was run using all 1,391 cases, with that site's mutation indicators as responses and the PI indicators as predictors. Together these yielded 444 = 6 × 74 z-values, one for testing each null hypothesis that drug j does not cause mutations at site k, j = 1, 2, ..., 6 and k = 1, 2, ..., 74. The z-values were based on the usual approximation

$z_i = y_i / \mathrm{se}_i, \quad i = 1, 2, \ldots, 444,$ (7)

[using a single subscript i in place of (j, k)], where $y_i$ is the maximum likelihood estimate (MLE) of the logistic regression coefficient and $\mathrm{se}_i$ is its approximate large-sample standard error.
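The study fits each site's indicators on all six PI indicators jointly, and those coefficients have no closed form. As a simplified sketch, assume instead a single binary drug indicator as the only predictor: then the logistic MLE of its coefficient is the log odds ratio of the 2 × 2 table of drug use by mutation, and its large-sample standard error is the square root of the summed reciprocal cell counts, so (7) can be computed directly. The function name and the counts are hypothetical.

```python
import math

def wald_z_single_drug(n11, n10, n01, n00):
    """Wald z-value, z = y / se as in (7), for a logistic regression of one
    site's mutation indicator on ONE binary drug indicator.

    Patient counts (all must be positive):
      n11: used the drug, mutation    n10: used the drug, no mutation
      n01: no drug, mutation          n00: no drug, no mutation
    """
    # With one binary predictor, the logistic MLE of the slope equals the
    # log odds ratio of the 2x2 table ...
    y = math.log((n11 * n00) / (n10 * n01))
    # ... and its large-sample standard error has this closed form.
    se = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return y / se

print(round(wald_z_single_drug(20, 80, 10, 90), 3))  # 1.946
```

In the joint six-drug fit the MLE is found iteratively (for example by Newton-Raphson). The sign convention here, positive z meaning more mutations among drug users, may also differ from the one behind Figure 1, where negative z-values indicate greater mutational effects.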

Figure 1 shows a histogram of the 444 z-values, with negative $z_i$'s indicating greater mutational effects. The smooth curve, f(z), is a natural spline with 7 df, fit to the histogram counts by Poisson regression. It emphasizes the central peak near z = 0, presumably representing the large majority of uninteresting drug-site combinations that have negligible mutation effects. Near its center, the peak is well described by a normal density with mean -.35 and standard deviation 1.20, which will be called the empirical null hypothesis,

$z_i \mid H_i \sim N(-.35, 1.20^2).$ (8)

Section 3 describes the estimation methodology for (8), with a brief discussion of the normality assumption in Remark D of Section 7.
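Section 3's methodology is not reproduced here. As a rough stand-in (a different, simpler technique), the center and spread of the central peak can be estimated robustly by the median and by the interquartile range rescaled by its standard normal value of about 1.349; both change little when only a small fraction of cases is interesting. The function name and the simulated data are hypothetical.

```python
import random
from statistics import NormalDist, median, quantiles

def robust_null_estimate(z_values):
    """Crude empirical-null estimate (delta0, sigma0) from the central z-values.

    Stand-in only, NOT the paper's Section 3 method: the center is the median
    and the scale is the interquartile range divided by its N(0, 1) value.
    Both assume interesting cases are a small minority.
    """
    q1, _, q3 = quantiles(z_values, n=4)
    std = NormalDist()
    iqr_std_normal = std.inv_cdf(0.75) - std.inv_cdf(0.25)  # about 1.349
    return median(z_values), (q3 - q1) / iqr_std_normal

# Hypothetical data: 4,000 null cases drawn from N(-.35, 1.20^2) plus
# 40 interesting cases (1%) far out in the left tail.
null_part = NormalDist(-0.35, 1.20).samples(4000, seed=1)
rng = random.Random(2)
interesting = [rng.gauss(-5.0, 1.0) for _ in range(40)]
delta0, sigma0 = robust_null_estimate(null_part + interesting)
print(round(delta0, 2), round(sigma0, 2))  # near -0.35 and 1.20
```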

The difference between the theoretical null $N(0, 1)$ and the empirical null $N(-.35, 1.20^2)$ may not seem worrisome here, but it will be shown to substantially affect any simultaneous inference procedure. A more dramatic example is given in Section 6, for a microarray analysis in which going from the theoretical to the empirical null totally negates any findings of significance. Situations going in the reverse direction can also occur.
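To see how the choice of null moves significance, compare the two-sided tail probability of one observed z-value under the theoretical null (4) and under the empirical null (8). The value z = -3 is hypothetical, and a two-sided tail area is only one of several ways to measure significance.

```python
from statistics import NormalDist

THEORETICAL_NULL = NormalDist(0.0, 1.0)   # (4)
EMPIRICAL_NULL = NormalDist(-0.35, 1.20)  # (8), HIV example

def two_sided_p(z, null):
    """Two-sided tail probability of an observed z under a normal null."""
    lower = null.cdf(z)
    return 2 * min(lower, 1 - lower)

z_obs = -3.0  # hypothetical observed z-value
p_theo = two_sided_p(z_obs, THEORETICAL_NULL)
p_emp = two_sided_p(z_obs, EMPIRICAL_NULL)
print(round(p_theo, 4), round(p_emp, 4))  # about 0.0027 vs about 0.027
```

Under the empirical null the same z-value is roughly ten times less extreme, the kind of shift that changes which cases a simultaneous procedure flags.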

[FIGURE 1 OMITTED]

In classic situations involving only a single hypothesis test, one must, out of necessity, use the theoretical null hypothesis, $z \sim N(0, 1)$. The main point of this article is that large-scale testing situations permit empirical estimation of the null distribution. Sections 3-5 explore reasons why the empirical and theoretical nulls might differ, and which might be preferable in different situations.

There are scientific as well as statistical differences between small-scale and large-scale hypothesis testing situations. A single hypothesis test is most often run with the expectation and hope of rejecting the null, "with 80% power" in a typical clinical trial. Nobody wants to reject 80% of N = 5,000 null hypotheses. The usual point of large-scale testing is to identify a small percentage of interesting cases that deserve further investigation. Although we are not exactly looking for a needle in a haystack, we do not want the whole haystack either. An important assumption of what follows is that the proportion of interesting cases is small, perhaps 1% or 5% of N, but not more than 10%. This is made explicit in Section 2, in the description of the local false discovery rate as an analytic tool for large-scale testing. There are situations in which the 10% limit is irrelevant (e.g., in constructing prediction models), but these lie outside our purpose here.

The terminology "Interesting/Uninteresting" used in this article in preference to "Significant/Nonsignificant" is discussed near the end of Section 5. We conclude in Sections 7 and 8 with remarks, including most of the technical details, and a summary.

2. THE LOCAL FALSE DISCOVERY RATE

It is convenient to discuss large-scale testing problems in terms of the local false discovery rate (fdr), an empirical Bayes version of Benjamini and Hochberg's (1995) methodology focusing on densities rather than tail areas (see Efron et al. 2001; Efron and Tibshirani 2002; Storey 2002, 2003).

We begin with a simple Bayes model. Suppose that each of the N z-values falls into one of two classes, "Uninteresting" or "Interesting," corresponding to whether or not $z_i$ is generated according to the null hypothesis, with prior probabilities $p_0$ and $p_1 = 1 - p_0$ for the classes. Assume that $z_i$ has density either $f_0(z)$ or $f_1(z)$, depending on its class,

$p_0 = \Pr\{\text{Uninteresting}\}$, $f_0(z)$ density if Uninteresting (Null), (9)

$p_1 = \Pr\{\text{Interesting}\}$, $f_1(z)$ density if Interesting (Nonnull).
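Under the two-class model (9), Bayes' rule gives the posterior probability that a case is Uninteresting given its z-value: $p_0 f_0(z)$ divided by $p_0 f_0(z) + p_1 f_1(z)$. The sketch below assumes normal densities for both classes with hypothetical values $p_0 = .95$, $f_0 = N(0, 1)$ and $f_1 = N(-3, 1)$; in practice $f_1$ is unknown. The function name is also hypothetical.

```python
from statistics import NormalDist

def prob_uninteresting(z, p0, f0, f1):
    """Posterior Pr{Uninteresting | z} under the two-class model (9).

    f0 and f1 are NormalDist objects standing in for the null and nonnull
    densities; Bayes' rule divides p0*f0(z) by p0*f0(z) + p1*f1(z).
    """
    p1 = 1.0 - p0
    null_part = p0 * f0.pdf(z)
    return null_part / (null_part + p1 * f1.pdf(z))

# Hypothetical parameters: 95% Uninteresting cases following N(0, 1) and
# 5% Interesting cases following N(-3, 1).
f0, f1 = NormalDist(0.0, 1.0), NormalDist(-3.0, 1.0)
for z in (0.0, -2.0, -4.0):
    print(z, round(prob_uninteresting(z, 0.95, f0, f1), 3))
```

Near the center almost every case is Uninteresting; only far in the tail does the small Interesting class dominate, which is why a small $p_1$ matters.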

The smooth curve in Figure 1 estimates the mixture density, f(z),

$f(z) = p_0 f_0(z) +$ …
