The social structure of public education in the U.S. typically clusters students within classrooms, classrooms within schools, and schools within districts. Because of this clustering, a practical design for conducting educational experiments is to randomly assign an intervention to districts within a state, schools within a district, or classrooms within a school. This is called a Cluster Randomized Design (CRD). Because certain characteristics (such as student achievement) tend to be shared within clusters (students in the same school resemble one another more than students drawn at random from different schools), CRDs must account for these statistical dependencies both in the design of the experiment and in the analysis of results.
The “power” of an experiment to detect a treatment effect is determined by its sample design and size. In a 2011 REESE PI Meeting workshop on Introduction to Power Analysis for Clustered Designs and the Online Variance Almanac, Hedges and Hedberg note that because observations within a cluster are correlated, more information is required to determine the power of a CRD than that of a simple random sample. Power analyses for CRDs involve:
- The number of clusters and units within clusters.
- An estimated effect size.
- An estimated intraclass correlation (ICC), which is the proportion of the total variation that occurs between the clusters (see the online Variance Almanac of Academic Achievement, WebVA and the VA User’s Guide, which describes the data sets used to calculate ICCs).
- Estimated R2 values for each level from the use of any covariates in the analyses, i.e., the proportions of variance at the cluster and unit levels accounted for by the covariates (see the WebVA).
- The required statistical significance level (α).
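These ingredients combine in a standard closed-form power calculation for a balanced two-arm, two-level CRD. The sketch below uses a normal approximation to the noncentral t test; the function name and the illustrative input values (effect size, ICC, cluster counts) are assumptions for demonstration, not figures from the Variance Almanac.

```python
from scipy.stats import norm

def crt_power(delta, icc, n_per_cluster, n_clusters, alpha=0.05,
              r2_between=0.0, r2_within=0.0):
    """Approximate power of a balanced two-arm, two-level cluster
    randomized trial (normal approximation).

    delta          standardized effect size
    icc            intraclass correlation (between-cluster share of variance)
    n_per_cluster  units per cluster
    n_clusters     total clusters, split evenly between arms
    r2_between/r2_within  variance explained by covariates at each level
    """
    # Variance of the standardized treatment-effect estimate:
    # (4/J) * [ICC*(1 - R2_between) + (1 - ICC)*(1 - R2_within)/n]
    var = 4.0 / n_clusters * (icc * (1 - r2_between)
                              + (1 - icc) * (1 - r2_within) / n_per_cluster)
    ncp = delta / var ** 0.5                  # noncentrality parameter
    z_crit = norm.ppf(1 - alpha / 2)          # two-sided critical value
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)

# Illustrative values: effect size 0.25 SD, ICC 0.20,
# 25 students per school, 60 schools in total.
print(crt_power(0.25, 0.20, 25, 60))
```

The formula makes the familiar trade-off visible: adding clusters raises power far faster than adding students within clusters, and a strong cluster-level covariate (high R2_between) can substitute for additional clusters.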
In addition to the Variance Almanac of Academic Achievement, a compendium of ICCs and related variance components that also includes a custom Stata program for calculating power, the ARC website contains links to resources on experimental design (including a link to the NCER Summer Institute on “Designing Cluster Randomized Trials”), statistical power (see also the VA User’s Guide), and sampling (including multilevel designs). Hedges and Hedberg also touched on each of these issues in the VA Workshop referenced above.
In another 2011 REESE PI Meeting workshop, on the Design and Analysis of Clustered Randomized Experiments in Education, Maier and Konstantopoulos note that analyses of multilevel data must also take clustering into account in order to calculate correct standard errors for regression estimates. Because observations within clusters are correlated, the standard errors of treatment effect estimates are typically underestimated when clustering is ignored. This inflates t-statistics and thus the chance of committing a Type I error (declaring a treatment effect where none exists).
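The size of this understatement can be quantified with the design effect: for equal-sized clusters of n units and intraclass correlation ρ, DEFF = 1 + (n − 1)ρ, and the naive standard error must be inflated by √DEFF. A minimal sketch, with hypothetical values:

```python
def design_effect(n_per_cluster, icc):
    """Design effect (DEFF) for equal-sized clusters: 1 + (n - 1) * ICC."""
    return 1 + (n_per_cluster - 1) * icc

# Hypothetical analysis that ignored clustering:
naive_se = 0.05
deff = design_effect(25, 0.20)            # 25 students per school, ICC 0.20
corrected_se = naive_se * deff ** 0.5     # SE more than doubles here
print(deff, corrected_se)
```

Even a modest ICC of 0.20 yields a design effect of 5.8 at 25 students per cluster, so a t-statistic computed from the naive standard error would be overstated by a factor of about 2.4.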
Post hoc methods for calculating the standard errors of clustered data include multiplying the standard error of the estimate by the square root of the design effect, or using Huber-White robust standard errors (available in Stata). A more direct method is to use analytic software designed for multilevel models (e.g., SAS PROC MIXED, HLM, Stata xtmixed). The ARC website has links to information on such statistical software, while Konstantopoulos and Maier provide instructions for using HLM software to conduct two-level analyses. They also provide an example SPSS dataset to get started:
These data were produced from a two-level cluster randomized design in which schools were randomly assigned to a treatment or a control condition. The data have two levels: 2,117 students at level 1 nested in 74 schools at level 2. The outcome is posttest reading score (standardized), and the level-1 covariates are student SES (1 = low, 0 = high) and pretest reading score (standardized). The treatment indicator is at the second level (1 = treatment school, 0 = control school). The level-2 covariates are school SES (the aggregate of student SES) and school pretest reading score (the aggregate of student pretest reading scores).
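A two-level analysis of this kind can also be run outside HLM or SPSS. The sketch below simulates data with the same shape as the example dataset (74 schools, about 29 students each, a school-level treatment indicator) and fits a random-intercept model with statsmodels; the variance components, the 0.3 SD treatment effect, and all variable names are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulate 74 schools with 29 students each (2,146 students, close to the
# 2,117 in the example), a school-level treatment effect of 0.3 SD, and an
# ICC of 0.25 / (0.25 + 1.0) = 0.2.
n_schools, n_students = 74, 29
school = np.repeat(np.arange(n_schools), n_students)
treat = np.repeat(rng.permutation(np.arange(n_schools) % 2), n_students)
u = rng.normal(0, 0.5, n_schools)[school]        # school random intercepts
e = rng.normal(0, 1.0, n_schools * n_students)   # student-level residuals
y = 0.3 * treat + u + e

df = pd.DataFrame({"posttest": y, "treat": treat, "school": school})

# Random-intercept (two-level) model: students nested within schools,
# treatment entered at the school level.
result = smf.mixedlm("posttest ~ treat", df, groups=df["school"]).fit()
print(result.summary())
```

The fitted treatment coefficient carries a standard error that reflects the between-school variance, which is exactly what a single-level OLS regression on the pooled student records would understate.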