ABSTRACT
Attention to statistical power and effect size can improve the design and the reporting of behavioral accounting research. Three accounting journals representative of current empirical behavioral accounting research are analyzed for their power (1 - β), or control of Type II errors (β), and compared to research in other disciplines. Given this study's findings, additional attention should be directed to adequacy of sample sizes and study design to ensure sufficient power when Type I error is controlled at α = .05 as a baseline. We do not suggest replacing traditional significance testing, but rather augmenting it with the reporting of β to complement and interpret the relevance of a reported α in any given study. In addition, the presentation of results in alternative formats, such as those suggested in this study, will enhance the current reporting of significance tests. In turn, this will allow the reader a richer understanding of, and an increased trust in, a study's results and implications.
Statistical significance testing is such an integral part of behavioral accounting research that the importance of demonstrating statistical significance is probably unquestioned by researchers. Yet for decades, researchers in other disciplines such as psychology, education, and management have discussed deficiencies of null-hypothesis testing for making inferences in behavioral sciences. Critics argue that results of statistical significance testing are often misinterpreted and that the likelihood of Type II errors (failing to reject H₀ when H₀ is false) is ignored (Brewer 1972; Cohen 1992; Greenwald et al. 1996; Mone et al. 1996). The debate among methodologists in psychology on ascribing meaning to failure to reject the null hypothesis has gone so far that the American Psychological Association (APA) is considering banning significance tests from its journals and devoted the January 1997 issue of Psychological Science to this question (Shrout 1997).
Accounting researchers would probably accept that the power of a statistical test (the probability of rejecting H₀ when H₀ is false) is important, but they may not be aware of the attention that has been given to the calculation and reporting of statistical power in other disciplines such as psychology and education. Yet the discussion is especially pertinent to behavioral accounting researchers who often employ methodologies that have been derived from research in these disciplines. Attention to statistical power and effect size [1] can improve both the design and the reporting of behavioral accounting research. For example, if a researcher cannot reject the null hypothesis, then the ability to demonstrate that the statistical tests performed were of sufficient power to detect an effect would strengthen conclusions that could be drawn from the research.
In particular, Burgstahler (1987, 204) argues that many accounting researchers are "Bayesians who revise their prior beliefs based on observed empirical evidence." If accounting studies are designed such that their hypothesis tests are of low power, then "little probability revision should be induced regardless of whether significant results are observed" (Burgstahler 1987, 212) (emphasis added). In other words, a highly significant test with very low power should "properly have little or no impact on the beliefs of a Bayesian" (Burgstahler 1987, 203). However, most published accounting research reports only significance levels without reporting power. How can readers judge the true impact of a study's findings on their prior beliefs if they cannot assess the actual strength and reliability of those findings?
In this paper, after a brief review of the concepts of statistical power and effect size, we report an analysis of the statistical power of published behavioral accounting research from three journals, Issues in Accounting Education, Behavioral Research in Accounting, and Journal of Management Accounting Research, published 1993 through 1997, and compare our results to retrospective power analyses in other disciplines. This selection of accounting journals allows a comparison of statistical power among studies that use student subjects and studies that use professional accountants as subjects. Obtaining a sufficient sample size when using professionals as subjects is often a critical issue in designing behavioral accounting research. We expect that studies using student subjects will be more powerful because researchers can obtain larger samples at a lower cost than when using professional accountants as subjects.
There is one previous study on statistical power that examines accounting research. Lindsay (1993) analyzed statistical power in studies on budgetary planning and control. Results were analyzed by journal to detect trends in the reporting of power due to type of journal, and articles were examined for evidence that researchers incorporated power considerations into planning their studies. This study extends Lindsay's (1993) research in three ways: (1) by significantly increasing the number of studies included in the relevant time period; [2] (2) by analyzing more recent research to assess changes in power in response to Lindsay (1993) and other authors of power analyses; and (3) by not restricting the analysis to a single research topic. Another contribution of this study is a series of techniques through which accounting researchers can report their research results to provide more information on the power of tests and on effect size.
STATISTICAL POWER ANALYSIS
In 1962, Jacob Cohen published the first study of the statistical power of published research. In what is now considered a classic study on power analysis, Cohen argued that researchers in psychology place disproportionate attention on control of Type I error (i.e., concluding that there is a relation or effect when there is none). In designing research, they largely ignore the power of a statistical test, which is related to Type II error. As Cohen (1988, 1) noted, "Since statistical significance is so earnestly sought and devoutly wished for by behavioral scientists, one would think that the a priori probability of its accomplishment would be routinely determined and well understood." Yet he demonstrated that studies published in the 1960 Journal of Abnormal Psychology had, on average, very low power to detect even a moderate effect in the population. Subsequent studies in a number of disciplines have found similar results.
Power Determination
A Type II error occurs when a statistical test fails to reject the null hypothesis when it is false. The probability of a Type II error is referred to as β, and power is (1 - β). Power is primarily a function of three determinants: the level of significance (α), sample size (n), and the effect size in the population (the difference between the null and the alternative hypothesis). In a retrospective power analysis of published research, the power of reported statistical tests can be computed if values can be obtained for the three power determinants (Cohen 1962). For example, the following formula is used to calculate the power of a one-tailed test of differences between two population means:
$$\text{Power} = 1 - \beta = \text{Probability}\left[Z < Z_{\alpha} - d\right]$$
where $\beta$ = probability of Type II error, $d$ = effect size = $|\mu_0 - \mu_a| / \sigma$, $\alpha$ = level of significance, $\mu_0$ and $\mu_a$ = the respective means of the two samples, and $\sigma$ = standard deviation of either sample (samples are assumed to be equal).
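As a rough illustration of how the three determinants combine, the sketch below computes power for a one-tailed, two-sample z test in Python. It is not the authors' procedure. It assumes equal group sizes and a known common standard deviation, and it makes the role of sample size explicit by scaling d by the square root of n/2, a standard normal-approximation form rather than the formula exactly as printed above. The function name and the example values (d = 0.5, 30 subjects per group) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_tailed_z(d, n_per_group, alpha=0.05):
    """Approximate power of a one-tailed, two-sample z test.

    d            -- standardized effect size, |mu_0 - mu_a| / sigma
    n_per_group  -- observations in each group (equal sizes assumed)
    alpha        -- Type I error rate
    """
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha)     # upper-tail critical value
    shift = d * sqrt(n_per_group / 2)          # mean of the test statistic under H_a
    return 1 - std_normal.cdf(z_crit - shift)  # P(Z > z_crit - shift)


# Hypothetical example: a moderate effect with 30 subjects per group.
power = power_one_tailed_z(d=0.5, n_per_group=30)
beta = 1 - power                               # Type II error rate
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

When d is 0 the function returns alpha, as expected: with no true effect, the only rejections are Type I errors.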
Level of Significance
Traditionally, researchers have placed more emphasis on avoiding Type I errors than Type II errors (Pollard 1993; Schmidt 1996). "As we do not wish the research literature to be riddled with spurious effects, nor to encourage pointless experiments that build on such effects, we are more concerned with avoiding Type I errors" (Pollard 1993, 450). However, while α levels have been controlled, Cohen (1962) demonstrated that β levels had been ignored in psychological research, and that the power of published research ranges from 50 to 80 percent. In practice, this means that a study with power of 55 percent has a Type II error rate of 45 percent, causing difficulties in accepting that study's results without serious reservations about the results' validity.
Cohen (1988) suggests that the conventional Type II error rate should be 0.20, which would set power conventionally at 0.80. A materially smaller power would result in an unacceptable risk of Type II error, while a significantly larger value would probably require a larger sample size than is generally practical (Cohen 1992). Setting β at 0.20 is consistent with the prevailing view that Type I error is more serious. Since α is conventionally set at 0.05, Cohen suggested setting β at four times that value (Cohen 1988, 1992).
Sample Size
The second determinant of power is sample size. Power increases as the number of observations increases. As the sample size increases, the standard deviations of the sampling distributions for H₀ and H₁ decrease, which results in less overlap of the distributions and increased power (Sedlmeier and Gigerenzer 1989). The relationship of sample size and statistical power is especially salient for behavioral accounting researchers. Since behavioral accounting research focuses on the response of individuals to accounting issues or information, researchers can rarely use archival data with a large number of observations. Lab experiments, and even surveys, often produce smaller sample sizes of necessity because of the cost of data collection.
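The link between sample size and power can be made concrete by asking how many subjects per group a one-tailed test needs to reach a target power. The following is a minimal sketch, assuming two equal-sized groups, a known common standard deviation, and the usual normal approximation n = 2((z for 1 - α, plus z for the target power) / d)². The function name and the effect sizes tried (0.2, 0.5, 0.8) are illustrative choices, not values taken from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a one-tailed, two-sample z test.

    d      -- standardized effect size (must be positive)
    power  -- target power, 1 - beta
    alpha  -- Type I error rate
    """
    z = NormalDist().inv_cdf                   # standard normal quantile function
    n = 2 * ((z(1 - alpha) + z(power)) / d) ** 2
    return ceil(n)                             # round up to whole subjects


# Illustrative effect sizes, from small to large.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group_for_power(d)} subjects per group")
```

Under these assumptions, the required sample grows with the inverse square of the effect size, so halving d roughly quadruples the number of subjects needed.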
Effect Size
The final determinant of power is effect size (d), the true size of the difference between H₀ and H₁ (the null hypothesis is that the effect size is 0). Alternatively, effect size can be described as the strength of a relationship among two or more variables (Sawyer and Ball 1981). Other things being equal, the greater the effect size, the greater the power. Probably the most difficult aspect of power analysis is specifying, or at least estimating, the effect size.…