"Exploratory" and "confirmatory" data analysis can both be viewed as methods for comparing observed data to what would be obtained under an implicit or explicit statistical model. For example, many of Tukey's methods can be interpreted as checks against hypothetical linear models and Poisson distributions. In more complex situations, Bayesian methods can be useful for constructing reference distributions for the various plots used in exploratory data analysis. This article proposes an approach to unify exploratory data analysis with more formal statistical methods based on probability models. These ideas are developed in the context of examples from fields including psychology, medicine, and social science.
Key Words: Bayesian inference; Bootstrap; Graphs; Multiple imputation; Posterior predictive checks.
1. INTRODUCTION
This article proposes a unified approach to exploratory and confirmatory data analysis, based on considering graphical data displays as comparisons to a reference distribution. The comparison can be explicit, as when data are compared to sets of fake data simulated from the model, or implicit, as when patterns in a two-way plot are compared to an assumed model of independence. Confirmatory analysis has the same structure, but the comparisons are numerical rather than visual.
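As a rough illustration of the explicit kind of comparison, the following Python sketch simulates fake datasets from a fitted model and sets an observed summary against the same summary computed on each fake dataset. Everything specific here is an assumption made for the example rather than something taken from this article: the normal model, the plug-in parameter estimates (a fully Bayesian version would instead draw parameters from their posterior distribution), the choice of the sample maximum as the summary, and the function names `simulate_replicates` and `reference_comparison`.

```python
import random
import statistics


def simulate_replicates(n_obs, mean, sd, n_reps, rng):
    """Draw n_reps fake datasets of size n_obs from a normal(mean, sd) model."""
    return [[rng.gauss(mean, sd) for _ in range(n_obs)] for _ in range(n_reps)]


def reference_comparison(observed, n_reps=200, seed=0):
    """Compare the observed maximum with its reference distribution.

    Assumption for this sketch: the comparison model is a normal
    distribution with plug-in estimates of the mean and standard deviation.
    """
    rng = random.Random(seed)
    mean = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    replicates = simulate_replicates(len(observed), mean, sd, n_reps, rng)

    # Summary chosen for illustration: the largest value in each dataset.
    t_obs = max(observed)
    t_rep = [max(rep) for rep in replicates]

    # Fraction of fake datasets whose maximum is at least the observed one.
    tail = sum(t >= t_obs for t in t_rep) / n_reps
    return t_obs, t_rep, tail


# Hypothetical data with one unusually large value.
observed = [0.1, -0.3, 0.2, 0.0, -0.1, 0.4, -0.2, 0.3, 5.0, -0.4]
t_obs, t_rep, tail = reference_comparison(observed)
```

A histogram of `t_rep` with `t_obs` marked on it would give the visual, exploratory form of the comparison. The single number `tail` is its numerical, confirmatory counterpart.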
From the standpoint of exploratory data analysis, our methodology has three major benefits:
1. Explicit identification of a comparison model allows one to simulate replicated data to be used as a reference distribution for an exploratory plot.
2. Symmetries in the underlying model can be used to construct exploratory graphs that are easier to interpret, sometimes (as with a residual plot) without the need for explicit comparison to a reference distribution.