AccessMyLibrary provides FREE access to over 30 million articles from top publications available through your library.
Create a link to this page
Copy and paste this link tag into your Web page or blog:
1. INTRODUCTION
This article introduces a new class of independent and identically distributed (iid) Monte Carlo algorithms which can be used for inference in semiparametric linear mixed models under minimal assumptions for the random-effects distribution. An important feature of our algorithms is that they combine frequentist parametric estimates with Bayesian nonparametric techniques and as such seek to exploit the strengths of both approaches in producing an overall more flexible and efficient method for inference. Our iid algorithms are extensions of the weighted Chinese restaurant (WCR) algorithms discussed by Ishwaran, James, and Lo (2001) for fitting semiparametric models. Lo, Brunner, and Chan (1996) have provided general methodology related to iid WCR algorithms, and Ishwaran and James (2000) and Ishwaran, James, and Sun (2001) have given extensions to generalized Chinese restaurant processes and algorithms. Related work on sequential importance sampling (SIS) has been done by MacEachern, Clyde, and Liu (1999), Quin tana (1998), and Quintana and Newton (2000).
One of the primary focuses of the article is the widely used linear mixed model for continuous longitudinal data, where models include fixed-effects terms for population parameters and random-effects terms for subject-specific variation (Laird and Ware 1982). However, here we relax the usual parametric assumption of normal random effects to focus on more general inference for the random-effects distribution, such as estimation of higher-order moments and detection of skewness and multimodality. Although estimation for the fixed-effects parameters is relatively robust to misspecification of the random effects, for example, by empirical best linear unbiased estimators (EBLUE) obtained from restricted maximum likelihood (REML) estimation (see Butler and Louis 1992 for some empirical evidence and Jiang 1998 for theoretical results), it is becoming widely recognized that inference for random effects can be misleading when normal distributional assumptions do not hold (Verbeke and Lesaffre 1996). Moreover, even tho ugh certain features for random-effects distributions, such as variances, can be estimated accurately even when the assumption of normality is violated (Richardson and Welsh 1994; Jiang 1996), detecting higher-order properties of distributions such as skewness and multimodality necessitates dispensing with normality assumptions. Identifying such deviations from normality can be important, sometimes leading to critical insight into the data. For example, Zhang and Davidian (2002) used a semiparametric approach to identify population differences in cholesterol levels by detecting skewness in random intercept terms, while Greene (2001) was able to identify clinical differences in the progression of renal kidney disease by identifying skewed and thick-tailed random-slope distributions. (We return later in more detail to this last example as one illustration of our approach.) For other non-Bayesian semiparametric approaches to linear mixed models see Verbeke and Lesaffre (1996) and Aitkin (1999) who used a finite- mixture approach with implementation by the EM algorithm.
Although the longitudinal problem is a key application, our methods also apply to single-measurement data when only one observation is recorded per subject; for example, when it is anticipated that subject variation is larger than can be explained by measurement error. If error distributions are unknown, then such data can be viewed more generally as arising from a semiparametric regression model and cannot always be fit properly using standard methods such as REML. The assumption throughout, for both single and repeated measurements, is that random-effects distributions are unknown. Our approach, discussed in detail in Section 2, is to use a Bayesian nonparametric framework in which random effects are modeled using a Ferguson Dirichlet process.
The article is organized as follows. Section 2 provides the necessary background material on the iid WCR procedure and provides an overview of our approach. Connections with this work to the Bayesian nonparametric literature are discussed. Section 3 characterizes the posterior of Dirichlet process mixture models in terms of the partition structure, and thus provides the blueprint for estimating random effects. In particular, Rao--Blackwell estimates for higher-order moments for random effects, such as the skewness and kurtosis, estimates for standard errors, and methods for density estimation are discussed. Sections 4 and 5 discuss the WCR procedures for the single-measurement case. The methods developed in these sections are then extended to the longitudinal setting of Section 6, which describes how to combine REML estimates with the iid WCR algorithms for inference in Laird--Ware random-effects models. Section 7 discusses the extensive testing of the method by simulation and then illustrates its application to a study involving chronic renal disease. Section 8 summarizes key computational features.
2. SEMIPARAMETRIC HIERARCHICAL MODELS: OVERVIEW OF METHODS
For ease of presentation, it is simpler for us to first consider the single-measurement case, and later extend the methods for longitudinal data in Section 6. With single-measurement data, inference for the fixed-effects parameter [beta] (p-dimensional) and random-effects parameters [[alpha].sub.i] (s-dimensional) are based on data Y = ([Y.sub.1],..., [Y.sub.n]), where
[Y.sub.i] = [X'.sub.i][beta] + [Z'.sub.i][[alpha].sub.i] + [[member of].sub.i], i = 1,...., n. (1)
Thus we observe one random-effects parameter [[alpha].sub.i] per subject i. In (1), the [X.sub.i] are the p-dimensional fixed-effects covariates, [Z.sub.i] are the s-dimensional random-effects covariates, and [[member of].sub.i] are iid normal random variables (the measurement errors) with mean 0 and variance [[sigma.sup.2]. We assume that [[alpha].sub.i] are iid from an unknown distribution [Q.sub.0]; thus (1) can be viewed generally as a semiparametric regression model. For example, if s = 1 and [Z.sub.i] = 1, then
[Y.sub.i] = [X'.sub.i][beta] + [[zeta].sub.i], i = 1,..., n,
where [[xi].sub.i] = [[alpha].sub.i] + [[member of].sub.i] are iid measurement errors with unknown distribution.
An illustrative example in Section 5 we look at is what we call the "two-slope" problem, which can be described as follows. In group 1, the data are assumed to follow a slope--intercept model with a fixed-effects term for the slope and a random-effects term for the intercept, whereas in group 2, both slopes and intercepts are assumed to be random. For convenience, assuming that group 1 corresponds to observations 1,..., m, and group 2 corresponds to observations m + 1,..., n, the model is
[Y.sub.i] = [[alpha].sub.i, 0] + [X.sub.i][[beta].sub.1] + [[member of].sub.i], i = 1,..., m, (2)
and
[Y.sub.i] = [[alpha].sub.i, 0] + [X.sub.i][[alpha].sub.i, 1] + [[member of].sub.i], i = m + 1,..., n. (3)
It is clear this can be written as (1).
Consider the simulation presented in Figure 1 based on n = 1,000 observations (with data evenly distributed between the two groups). In group 2, random slopes [[alpha].sub.i, 1] were simulated from a skewed distribution with mean chosen to coincide with the fixed-effects slope [[beta].sub.1] of group 1, whereas random intercepts [[alpha].sub.i, 0] were drawn from a discrete two-point mixture distribution. Analysis of this data will be challenging to methods that do not relax assumptions of normality. For example, if intercept distributions are assumed normal, then it becomes theoretically impossible to separate the variance of intercepts from the variance [[sigma].sup.2] for the measurement error. Thus a method such as REML will be inconsistent here (see Sec. 5 for simulation results). Conventional methods will also have difficulty estimating the nonnormal random-effects distributions. For example, random effects typically estimated by empirical best linear unbiased prediction (EBLUP) will have poor performan ce here. Consider Figure 2, which plots the EBLUP estimates obtained from REML. It is clear that EBLUP cannot pick up on the discreteness of the intercept distribution, whereas the slope distribution, although skewed, still looks somewhat normal. Compare this to the WCR posterior estimates E[[[alpha].sub.i, 0]\Y] and E[[[alpha].sub.i, 1]\Y] plotted in Figure 3, which clearly identify the bimodal and skewed shape of the random effects. We return to this example in more detail in Section 5. Later, in Section 6, we consider more complex models for longitudinal data.
2.1 Hierarchical Models
Our approach to the linear mixed model uses a Bayesian nonparametric technique of modeling an unknown distribution by a nonparametric prior. In the mixed model, this works by modeling [Q.sub.0] using a random measure P with some prior Y. The method is best conceptualized by reexpressing (1) in a semiparametric hierarchical framework. Under the prior Y, the Bayesian semiparametric model for (1) is (conditioning on fixed effects and the measurement error variance)
([Y.sub.i]\[[alpha].sub.i], [beta], [[sigma].sup.2]) [~.sup.ind] N ([[eta].sub.i], [[sigma].sup.2]), i = 1,..., n,
([[alpha].sub.i]\P) [~.sup.iid] P,
P ~ P, (4)
where [[eta].sub.i] = [X.sup.'.sub.i][beta] + [Z'.sub.i][[alpha].sub.i]. Inference for [Q.sub.0] is thus based on the posterior for P from (4). The WCR methods discussed later can be applied to general discrete random measures P (Ishwaran and James 2000), although here we illustrate its use applied to the Ferguson (1973, 1974) Dirichlet process. Thus throughout, we write P to refer to DP([alpha].sub.0]H), the Ferguson Dirichlet process with finite measure [alpha].sub.0](*), where [alpha].sub.0] > 0 is some constant and H(*) is a probability measure over [R.sup.s]. The measure H(*) can be thought of as the guess of the distribution of the random effects, and [a.sub.0] as a measure of the strength of this belief. In our applications, it is convenient to take H(*) to be a normal distribution. These details are spelled out later in the article.
Hierarchical models subject to the Dirichlet process, similar in nature to (4), are now increasingly used for inference in nonparametric and semiparametric problems (see West, Muller, and Escobar 1994; Escobar and West 1998 for background and examples). Applications of such models specifically to linear mixed models have been considered in various contexts; for example, Bush and MacEachern (1996) considered a semiparametric randomized block design, and Kleinman and Ibrahim (1998) looked at mixed models for longitudinal data. Such approaches typically rely on the use of Markov chain Monte Carlo for model fitting, typically using a Polya urn Gibbs sampler, a general Gibbs sampling technique for fitting Dirichlet process mixture models (Escobar 1988, 1994). (See, however, Tao, Palta, Yandell, and Newton 1999 for a predictive recursive algorithm for fitting models.) But little work has been done along the lines of fitting linear mixed models based on iid sampling, or on methods that…
Source: HighBeam Research, Independent and identically distributed Monte Carlo algorithms for...