On average, about half the respondents to surveys do not answer one or more questions analyzed in the average survey-based political science article. Almost all analysts contaminate their data at least partially by filling in educated guesses for some of these items (such as coding "don't know" on party identification questions as "independent"). Our review of a large part of the recent literature suggests that approximately 94% use listwise deletion, eliminating an entire observation (and losing about one-third of their data, on average) whenever any one variable remains missing after guesses have been filled in for some items.(1) Of course, similar problems with missing data occur in nonsurvey research as well.
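Listwise deletion loses so much because a single missing item anywhere in a row removes the whole row, so losses compound across variables. The sketch below illustrates this under a deliberately simple, hypothetical setup that is not from the article: 10 variables, each cell missing independently with probability 0.05. Under that assumption the expected share of complete rows is (1 - p)^k; the function names are illustrative only.

```python
import numpy as np

def listwise_retention(missing_mask: np.ndarray) -> float:
    """Share of rows with no missing cells, i.e. the rows listwise deletion keeps."""
    complete_rows = ~missing_mask.any(axis=1)
    return float(complete_rows.mean())

def expected_retention(p_missing: float, n_vars: int) -> float:
    """Expected share of complete rows if each cell is missing
    independently with probability p_missing (an illustrative assumption)."""
    return (1.0 - p_missing) ** n_vars

# Hypothetical data: 10,000 respondents, 10 items, 5% of cells missing at random.
rng = np.random.default_rng(0)
n_obs, n_vars, p = 10_000, 10, 0.05
mask = rng.random((n_obs, n_vars)) < p  # True marks a missing cell

print(f"cells missing:         {mask.mean():.3f}")
print(f"rows kept (simulated): {listwise_retention(mask):.3f}")
print(f"rows kept (expected):  {expected_retention(p, n_vars):.3f}")
```

Here 0.95^10 is about 0.599, so roughly 40% of rows are discarded even though only about 5% of individual answers are missing, and nearly all of the discarded cells were actually observed.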
This article addresses the discrepancy between the treatment of missing data by political scientists and the well-developed body of statistical theory that recommends against the procedures we routinely follow.(2) Even if the missing answers we guess for nonrespondents are right on average, the procedure overestimates the certainty with which we know those answers. Consequently, standard errors will be too small. Listwise deletion discards one-third of cases on average, which deletes both the few nonresponses and the many responses in those cases. The result is a loss of valuable information at best and severe selection bias at worst.
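Why a guess that is "right on average" still understates uncertainty can be seen with the simplest possible guess: replacing every nonresponse with the observed mean and then treating the filled-in values as real data. This is only one illustrative form of guessing, not the specific recodings the article describes, and the six observed answers below are made-up numbers.

```python
import numpy as np

def se_of_mean(x: np.ndarray) -> float:
    """Conventional standard error of a sample mean: s / sqrt(n)."""
    return float(x.std(ddof=1) / np.sqrt(len(x)))

# Hypothetical survey item: 6 observed answers and 4 nonresponses (NaN).
y = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 8.0, np.nan, np.nan, np.nan, np.nan])
observed = y[~np.isnan(y)]

# Complete-case analysis: use only the 6 observed values.
se_cc = se_of_mean(observed)

# Single "educated guess": fill each gap with the observed mean, then
# analyze the 10 values as if all of them had been observed.
filled = np.where(np.isnan(y), observed.mean(), y)
se_filled = se_of_mean(filled)

print(f"complete cases: mean={observed.mean():.2f}, SE={se_cc:.3f}")
print(f"mean-filled:    mean={filled.mean():.2f}, SE={se_filled:.3f}")
```

Both analyses report a mean of 5.00, but the filled-in version reports a standard error of about 0.516 instead of about 0.894: the guesses add no spread and inflate n, so the analysis claims certainty it does not have.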
Some researchers avoid the problems missing data can cause by using sophisticated statistical models optimized for their particular applications (such as censoring or truncation models; see Appendix A). When possible, it is best to adapt one's statistical model specially to deal with missing data in this way. Unfortunately, doing so may put heavy burdens on the investigator, since optimal models for missing data differ with each application, are not programmed in currently available standard statistical software, and do not exist for many applications (especially when missingness is scattered throughout a data matrix).
Our complementary approach is to find a better choice in the class of widely applicable and easy-to-use methods for missing data. Instead of the default method for coping with the issue--guessing answers in combination with listwise deletion--we favor a procedure based on the concept of "multiple imputation" that is nearly as easy to use but avoids the problems of current practices (Rubin 1977).(3) Multiple imputation methods have been around for about two decades and are now the choice of most statisticians in principle, but they have not made it into the toolbox of more than a few applied statisticians or social scientists. In fact, aside from the experts, "the method has remained largely unknown and unused" (Schafer and Olsen 1998). The problem is only in part a lack of information and training. A bigger issue is that although this method is easy to use in theory, in practice it requires computational algorithms that can take many hours or days to run and cannot be fully automated. Because these algorithms rely on concepts of stochastic (rather than deterministic) convergence, knowing when the iterations are complete and the program should be stopped requires much expert judgment, but unfortunately, there is little consensus about this even among the experts.(4) In part for these reasons, no commercial software includes a correct implementation of multiple imputation.(5)
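Whatever algorithm produces the imputations, the analysis step of multiple imputation is simple: run the usual model on each of the m completed datasets, then pool the m results. A minimal sketch of the conventional pooling step (Rubin's combining rules) follows. It does not generate imputations and is not the alternative algorithm this article introduces; the function name and the five coefficient/SE pairs are hypothetical.

```python
import numpy as np

def combine_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors from m
    completed (imputed) datasets using Rubin's combining rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("need at least two imputations")
    q_bar = q.mean()               # pooled point estimate
    w = u.mean()                   # average within-imputation variance
    b = q.var(ddof=1)              # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b    # total variance of the pooled estimate
    return float(q_bar), float(np.sqrt(t))

# Hypothetical coefficient estimates and standard errors from m = 5 imputations.
coefs = [0.52, 0.47, 0.55, 0.49, 0.50]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
est, se = combine_rubin(coefs, np.square(ses))
print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term (1 + 1/m)B is exactly what a single filled-in guess leaves out: variation across imputations reflects uncertainty about the missing values, and adding it to the within-imputation variance keeps the pooled standard error honest.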
We begin with a review of three types of assumptions one can make about missing data. Then we demonstrate analytically the disadvantages of listwise deletion. Next, we introduce multiple imputation and our alternative algorithm. We discuss what can go wrong and provide Monte Carlo evidence that shows how our method compares with existing practice and how it is equivalent to the standard approach recommended in the statistics literature, except that it runs much faster. We then present two examples of applied research to illustrate how assumptions about and methods for missing data can affect our conclusions about government and politics.
ASSUMPTIONS ABOUT MISSINGNESS
We now introduce three assumptions about the process by which data become missing. Briefly in the conclusion to this section and more extensively in subsequent sections, we will discuss how the various methods crucially depend upon them (Little 1992).
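In the statistics literature these three assumptions are conventionally labeled missing completely at random (MCAR), missing at random given the observed data (MAR), and nonignorable (NI). The simulation below is a hypothetical illustration, not an analysis from the article: x is always observed, y = x + noise has true mean 0, and y goes missing under each mechanism with arbitrary probabilities. It compares the complete-case mean of y with a simple regression adjustment that uses the observed x; that adjustment is a stand-in for methods that exploit MAR, not multiple imputation itself.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
x = rng.normal(size=n)           # always observed
y = x + rng.normal(size=n)       # true mean of y is 0

def missing_mask(mechanism: str) -> np.ndarray:
    """True marks a missing y. Probabilities are arbitrary illustration values."""
    u = rng.random(n)
    if mechanism == "MCAR":      # unrelated to any data
        return u < 0.3
    if mechanism == "MAR":       # depends only on the observed x
        return u < np.where(x > 0, 0.6, 0.1)
    if mechanism == "NI":        # depends on the unobserved y itself
        return u < np.where(y > 0, 0.6, 0.1)
    raise ValueError(f"unknown mechanism: {mechanism}")

def estimates(miss: np.ndarray) -> tuple[float, float]:
    """Return (complete-case mean of y, regression-adjusted mean of y)."""
    obs = ~miss
    complete_case = float(y[obs].mean())
    # Fit y = a + b*x on complete cases, then average predictions over all x.
    b, a = np.polyfit(x[obs], y[obs], 1)
    model_based = float((a + b * x).mean())
    return complete_case, model_based

results = {}
for mech in ("MCAR", "MAR", "NI"):
    cc, mb = estimates(missing_mask(mech))
    results[mech] = (cc, mb)
    print(f"{mech:5s} complete-case mean = {cc:+.3f}   "
          f"regression-adjusted mean = {mb:+.3f}")
```

Under MCAR both estimates are close to 0. Under MAR the complete-case mean is biased (roughly -0.3 with these settings) while the adjustment that conditions on x recovers the truth. Under NI both are biased, because no function of the observed data alone can correct for missingness that depends on the missing values themselves.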
Source: HighBeam Research, Analyzing Incomplete Political Science Data: An Alternative Algorithm...