
Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation.

American Political Science Review

March 01, 2001 | KING, GARY; HONAKER, JAMES; JOSEPH, ANNE; SCHEVE, KENNETH | COPYRIGHT 2001 Cambridge University Press. This material is published under license from the publisher through the Gale Group, Farmington Hills, Michigan. All inquiries regarding rights should be directed to the Gale Group.

On average, about half the respondents to surveys do not answer one or more questions analyzed in the average survey-based political science article. Almost all analysts contaminate their data at least partially by filling in educated guesses for some of these items (such as coding "don't know" on party identification questions as "independent"). Our review of a large part of the recent literature suggests that approximately 94% use listwise deletion to eliminate entire observations (losing about one-third of their data, on average) when any one variable remains missing after filling in guesses for some.(1) Of course, similar problems with missing data occur in nonsurvey research as well.

This article addresses the discrepancy between the treatment of missing data by political scientists and the well-developed body of statistical theory that recommends against the procedures we routinely follow.(2) Even if the missing answers we guess for nonrespondents are right on average, the procedure overestimates the certainty with which we know those answers. Consequently, standard errors will be too small. Listwise deletion discards one-third of cases on average, removing not only the few nonresponses but also the many observed responses in those cases. The result is a loss of valuable information at best and severe selection bias at worst.
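To make the cost of listwise deletion concrete, the following Python sketch offers a toy illustration under stated assumptions. Each cell of a hypothetical data matrix is missing independently with the same probability, an assumption made purely for simplicity and not a claim about real surveys, and all function names are illustrative. With 8 variables and 5% of cells missing, only about 0.95^8, or roughly 66%, of rows are fully observed, so listwise deletion drops about one-third of the observations even though few individual answers are missing.

```python
import random


def listwise_delete(rows):
    """Keep only the rows in which every variable is observed (no None)."""
    return [row for row in rows if all(value is not None for value in row)]


def simulate_missing(n_rows, n_vars, p_missing, seed=0):
    """Toy data matrix: each cell is missing (None) independently
    with probability p_missing. This independence is an assumption
    made for illustration only."""
    rng = random.Random(seed)
    return [
        [None if rng.random() < p_missing else rng.gauss(0.0, 1.0)
         for _ in range(n_vars)]
        for _ in range(n_rows)
    ]


def expected_complete_fraction(n_vars, p_missing):
    """Share of rows with no missing cell under independent missingness."""
    return (1.0 - p_missing) ** n_vars


if __name__ == "__main__":
    data = simulate_missing(n_rows=2000, n_vars=8, p_missing=0.05, seed=1)
    kept = listwise_delete(data)
    print("share of cells missing (by design):", 0.05)
    print("share of rows kept after listwise deletion:", len(kept) / len(data))
    print("share expected under independence:", expected_complete_fraction(8, 0.05))
```

The discarded rows still contain observed answers on most variables, which is the information loss described above. If missingness is related to the quantities being studied, the rows that survive are no longer representative, which is where selection bias can enter.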

Some researchers avoid the problems missing data can cause by using sophisticated statistical models optimized for their particular applications (such as censoring or truncation models; see Appendix A). When possible, it is best to adapt one's statistical model specially to deal with missing data in this way. Unfortunately, doing so may put heavy burdens on the investigator, since optimal models for missing data differ with each application, are not programmed in currently available standard statistical software, and do not exist for many applications (especially when missingness is scattered throughout a data matrix).

Our complementary approach is to find a better choice in the class of widely applicable and easy-to-use methods for missing data. Instead of the default method for coping with the issue--guessing answers in combination with listwise deletion--we favor a procedure based on the concept of "multiple imputation" that is nearly as easy to use but avoids the problems of current practices (Rubin 1977).(3) Multiple imputation methods have been around for about two decades and are now the choice of most statisticians in principle, but they have not made it into the toolbox of more than a few applied statisticians or social scientists. In fact, aside from the experts, "the method has remained largely unknown and unused" (Schafer and Olsen 1998). The problem is only in part a lack of information and training. A bigger issue is that although this method is easy to use in theory, in practice it requires computational algorithms that can take many hours or days to run and cannot be fully automated. Because these algorithms rely on concepts of stochastic (rather than deterministic) convergence, knowing when the iterations are complete and the program should be stopped requires much expert judgment, but unfortunately, there is little consensus about this even among the experts.(4) In part for these reasons, no commercial software includes a correct implementation of multiple imputation.(5)
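The logic of multiple imputation can be sketched briefly: the analyst creates m completed data sets, runs the same analysis on each, and combines the m results. The Python sketch below shows only the combining step, using the standard combining rules from the statistics literature (usually attributed to Rubin). It does not implement any imputation algorithm, and the function and variable names are hypothetical. The between-imputation term is what a single filled-in guess leaves out, which is why standard errors from a single imputation tend to be too small.

```python
import math
import statistics


def combine_imputations(estimates, std_errors):
    """Combine one quantity estimated on each of m completed data sets.

    estimates:  point estimates, one per imputed data set
    std_errors: their standard errors, in the same order
    Returns (combined_estimate, combined_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching standard errors")

    # Combined point estimate: the average across imputations.
    q_bar = statistics.mean(estimates)

    # Within-imputation variance: average of the squared standard errors.
    within = statistics.mean([se ** 2 for se in std_errors])

    # Between-imputation variance: sample variance of the point estimates.
    between = statistics.variance(estimates)

    # Total variance adds the between term, inflated by (1 + 1/m)
    # because only a finite number of imputations is used.
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    q, se = combine_imputations([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print("combined estimate:", q)
    print("combined standard error:", se)
```

When the m estimates agree exactly, the between term is zero and the combined standard error equals the usual one. The more the imputations disagree, the larger the reported uncertainty.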

We begin with a review of three types of assumptions one can make about missing data. Then we demonstrate analytically the disadvantages of listwise deletion. Next, we introduce multiple imputation and our alternative algorithm. We discuss what can go wrong and provide Monte Carlo evidence that shows how our method compares with existing practice and how it is equivalent to the standard approach recommended in the statistics literature, except that it runs much faster. We then present two examples of applied research to illustrate how assumptions about and methods for missing data can affect our conclusions about government and politics.

ASSUMPTIONS ABOUT MISSINGNESS

We now introduce three assumptions about the process by which data become missing. Briefly in the conclusion to this section and more extensively in subsequent sections, we will discuss how the various methods crucially depend upon them (Little 1992).
