Behavioral Intervention Tools Rating Rubric
Design (Group Design)
Does the study design allow us to conclude that the intervention program, rather than extraneous variables, was responsible for the results?
Full Bubble: Students were randomly assigned. At pretreatment, program and control groups were not statistically significantly different and had a mean standardized difference that fell within 0.25 SD on measures used as covariates or on pretest measures also used as outcomes, and on demographic measures. There was no attrition bias.* The unit of analysis matched the unit of random assignment (controlling for variance associated with potential dependency at higher levels of the unit of randomization is permitted; e.g., when randomizing at the student level, controlling for variance at the classroom level).
Half Bubble:
Students were randomly assigned, but the other conditions for a full bubble were not met;
OR
Students were not randomly assigned, but a strong quasi-experimental design was used. At pretreatment, program and control groups were not statistically significantly different and had a mean standardized difference that fell within 0.25 SD on measures central to the study (i.e., pretest measures also used as outcomes) and on demographic measures, and outcomes were analyzed to adjust for pretreatment differences. There was no attrition bias. The unit of analysis matched the assignment strategy.
Empty Bubble: Fails full and half bubble.
*NCII follows guidance from the What Works Clearinghouse (WWC) in determining attrition bias. The WWC model for determining bias based on a combination of differential and overall attrition rates can be found on pages 13-14 of this document: https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_procedures_v2_1_standards_handbook.pdf
Design (Single Case Design)
Does the study design allow us to evaluate experimental control?
Full Bubble: The study includes three data points, or a sufficient number to document stable performance, within each phase. There is the opportunity for at least three demonstrations of experimental control.*
Half Bubble: The study includes one or two data points within a phase. There is the opportunity for two demonstrations of experimental control. Or, the study is a non-concurrent multiple baseline design.
Empty Bubble: Fails full and half bubble.
*For alternating treatment designs, five repetitions of the alternating sequence are required for a full bubble, and four are required for a half bubble.
Effect Size (Group Design)
The effect size is a measure of the magnitude of the relationship between two variables. Specifically, on this chart, the effect size represents the magnitude of the relationship between participating in a particular intervention and an outcome of interest. The larger the effect size, the greater the impact that participating in the intervention had on the outcome. A positive effect size indicates that participating in the intervention led to improvement in performance on the outcome measure, while a negative effect size indicates that participating in the intervention led to a decline in performance on the outcome measure. According to guidelines from the What Works Clearinghouse, an effect size of .25 or greater is considered “substantively important.” Additionally, we note on this tools chart those effect sizes that are statistically significant. Effect sizes that are statistically significant can be considered more trustworthy than effect sizes of the same magnitude that are not.
There are many different methods for calculating effect size. To ensure comparability of effect sizes across studies on this chart, the NCII follows guidance from the What Works Clearinghouse and uses a standard formula to calculate effect size across all studies and outcome measures: Hedges' g, corrected for small-sample bias.
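In its standard WWC form (the notation here is illustrative rather than quoted from this rubric: T denotes the program group and C the comparison group), Hedges' g divides the difference between group means by the pooled within-group standard deviation and applies a small-sample correction factor ω:

g = \omega \cdot \frac{\bar{y}_T - \bar{y}_C}{\sqrt{\frac{(n_T - 1)s_T^2 + (n_C - 1)s_C^2}{n_T + n_C - 2}}}, \qquad \omega = 1 - \frac{3}{4N - 9}, \qquad N = n_T + n_C

where \bar{y}, s^2, and n are the group means, variances, and sample sizes, respectively.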
Developers of programs on the chart were asked to submit the necessary data to compute the effect sizes. Where available, the NCII requests adjusted posttest means, which refer to posttest means that have been adjusted to correct for any pretest differences between the program and control groups. In the event that developers are unable to access or report adjusted means, the NCII will calculate and report effect size based on unadjusted posttests. However, unadjusted posttests are typically reported only in instances in which we can assume pretest group equivalency. Therefore, the default effect size reported will be Hedges' g based on adjusted posttest means. NCII will report effect size based on unadjusted posttests only for studies that (a) are unable to provide adjusted means and (b) have pretest differences on the measure that fall within .25 SD and are not statistically significant.*
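As an illustration only (a sketch of the calculation described above, not NCII's actual tooling; the function name and example values are hypothetical), the computation can be expressed as:

import math

def hedges_g(mean_program, mean_control, sd_program, sd_control, n_program, n_control):
    """Hedges' g with the small-sample correction factor.

    Means may be adjusted posttest means (the default described above) or
    unadjusted posttest means when pretest group equivalency can be assumed.
    """
    # Pooled within-group standard deviation
    pooled_sd = math.sqrt(
        ((n_program - 1) * sd_program ** 2 + (n_control - 1) * sd_control ** 2)
        / (n_program + n_control - 2)
    )
    # Small-sample correction factor (omega)
    omega = 1 - 3 / (4 * (n_program + n_control) - 9)
    return omega * (mean_program - mean_control) / pooled_sd

# Hypothetical example: the program group outscores the control group by
# 5 points on an outcome with a pooled standard deviation of 10.
print(round(hedges_g(55.0, 50.0, 10.0, 10.0, 60, 60), 3))  # ~0.497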
The chart includes, for each study, the number and type of outcome measures and, for each type of outcome measure, a mean effect size. Additionally, for some studies, effect sizes are reported for one or more disaggregated sub-samples. By clicking on any of the individual effect size cells, users can see a full list of effect sizes for each measure used in the study.
Studies that include a “—” in the effect size cell either do not have the necessary data or do not meet the assumptions required for calculating and reporting effect size using the associated formula. The reason for the missing data is provided when users click on the cell.
*An exception to this rule will be made only if developers establish a link between an instrument administered only at posttest and a comparable instrument administered at pretest. If Center staff verify that the pretest and posttest measures assess the same construct and that there were negligible between-group differences in this domain (the pretest ES fell within 0.25 SD and was not statistically significant), NCII will attempt to calculate a difference-in-differences adjusted ES. For further details, see Appendix F (pages F.4-F.5) of the current WWC Procedures Handbook. If you believe that one of your outcome measures is a suitable “proxy” for another measure, please indicate this in your submission or contact us (ToolsChartHelp@air.org) and explain your thinking.
Visual Analysis (Single Case Design)
Does visual analysis of the data demonstrate evidence of a relationship between the independent variable and the primary outcome of interest?
Full Bubble: Visual or other analysis demonstrates clear, consistent, and meaningful change in pattern of data as a result of intervention (level, trend, variability, immediacy). The number of data points is sufficient to demonstrate a stable level of performance for the dependent variable; there are at least three demonstrations of a treatment effect*, and no documented non-demonstrations.
Half Bubble: Visual or other analysis demonstrates minimal or inconsistent change in the pattern of data. There are two demonstrations of a treatment effect and no documented non-effects, or the ratio of effects to non-effects is less than or equal to 3:1.
Empty Bubble: Visual analysis demonstrates no change in pattern of the data. Fails full and half bubble.
* In determining demonstration of a treatment effect, the TRC will consider the following:
- Do the baseline data document a pattern in need of change?
- Do the baseline data demonstrate a predictable baseline pattern?
  - Is the variability sufficiently consistent?
  - Is the trend either stable or moving away from the therapeutic direction?
- Do the data within each non-baseline phase document a predictable data pattern?
  - Is the variability sufficiently consistent?
  - Is the trend either sufficiently low or moving in the hypothesized direction (i.e., away from anticipated treatment effects during baseline conditions and toward treatment effects in intervention conditions)?
- Do the data between phases document the presence of basic effects?
  - Is the level discriminably different between the last three data points of one phase and the first three data points of the adjacent phase?
  - Is the trend discriminably different between the last three data points of one phase and the first three data points of the adjacent phase?
  - Is there an overall level change between baseline and treatment phases?
  - Is there an overall change in trend between baseline and treatment phases?
  - Is there an overall change in variability between baseline and treatment phases?
  - Is there sufficiently low overlap between baseline and treatment phases to document an experimental effect?
  - Do the data in similar phases (e.g., intervention-to-intervention) demonstrate similar patterns? (Only applicable to reversal designs or embedded probe designs.)
Participants (Group & Single Case Design)
Do the students in the study exhibit intensive social, emotional or behavioral challenges?
Full Bubble: Evidence is convincing that all participants currently exhibit intensive social, emotional, or behavioral challenges, as measured by an ED label, placement in an alternative school/classroom, non-response to Tiers 1 and 2,* or designation of severe problem behaviors on a validated scale or through observation.
Half Bubble: Evidence is convincing that some participants currently exhibit intensive social, emotional, or behavioral challenges, as measured by an ED label, placement in an alternative school/classroom, non-response to Tiers 1 and 2, or designation of severe problem behaviors on a validated scale or through observation.
Empty Bubble: Evidence is unconvincing that participants currently exhibit intensive social, emotional, or behavioral challenges.
* Non-response to Tiers 1 and 2 is applicable for interventions studied in settings in which a tiered behavioral intervention system is in place and the student has failed to meet the school’s or district’s criteria for “response” to both Tier 1 (schoolwide/universal program) and Tier 2 (secondary behavioral intervention) supports. Detailed information about these non-response criteria should be included in the study description.
Fidelity of Implementation (Group & Single Case Design)
Was it clear that the intervention program was implemented as it is designed to be used?
Full Bubble: Measurement of fidelity of implementation was conducted adequately* and observed with adequate intercoder agreement (e.g., between .8 and 1.0) or permanent product, and levels of fidelity indicate that the intervention program was implemented as intended (e.g., a reasonable average across multiple measures, or 75% or above for a single measure).
Half Bubble:
Measurement of fidelity of implementation was conducted adequately and observed with adequate intercoder agreement (e.g., between .8 and 1.0) or permanent product, but levels of fidelity are moderate (e.g., an average below 60% across multiple measures, or 60%-75% for a single measure);
OR
Levels of fidelity indicate that the intervention program was implemented as intended (e.g., a reasonable average across multiple measures, or 75% or above for a single measure), but measurement of fidelity of implementation either was not conducted adequately or was not observed with adequate intercoder agreement or permanent product.
Empty Bubble: Fails full and half bubble.
* In determining whether measurement of fidelity of implementation was conducted adequately, the TRC will consider the following:
- a clear and comprehensive rationale for the indicators making up the implementation measures, reflecting what the intervention developers believe are the active intervention ingredients;
- the number of times implementation is measured;
- the extent to which implementation fidelity observers are independent of the intervention development team.
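As a point of reference only (an illustrative calculation, not a requirement or method specified in this rubric), intercoder agreement is often summarized as simple percent agreement: if two independent observers agree on 45 of 50 observed intervals, agreement = 45 / 50 = .90, which falls within the adequate range (.8 to 1.0) referenced above.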
Measures (Targeted) (Group & Single Case Design)
Were the study measures accurate and important?
Full Bubble: Measure(s) directly assess behaviors targeted by the intervention. Empirical evidence (e.g., psychometrics, inter-observer agreement) of the quality of each targeted measure was provided for the current sample and results are adequate (e.g., IOA between .8 and 1.0 for all measures).
Half Bubble: Measure(s) directly assess behaviors targeted by the intervention. Empirical evidence (e.g., psychometrics, inter-observer agreement) of the quality of most or all targeted measures was provided for the current sample, but results were adequate only for some measures or were marginally acceptable.
Empty Bubble: Fails full and half bubble.
Dash: No targeted measures used in the study.
Measures (Broader) (Group & Single Case Design)
Were the study measures accurate and important?
Full Bubble: Measure(s) assess outcomes not directly targeted by the intervention. Empirical evidence (e.g., psychometrics, inter-observer agreement) of the quality of each related measure was provided for the current sample and results are adequate (e.g., IOA between .8 and 1.0 for all measures).
Half Bubble: Measure(s) assess outcomes not directly targeted by the intervention. Empirical evidence (e.g., psychometrics, inter-observer agreement) of the quality of most or all related measures was provided for the current sample, but results were adequate only for some measures or were marginally acceptable.
Empty Bubble: Fails full and half bubble.
Dash: No broader measures used in the study.
Disaggregated Effect Size Data Available
Indicates whether disaggregated outcome data are available (and, if so, for which subgroups).
Targeted Behavior(s)
Target behaviors include externalizing and/or internalizing behaviors.
Delivery
Delivery methods may include administration to an individual student, small groups of students, and/or a classroom of students.
Fidelity of Implementation Check List Available
Indicates whether a fidelity of implementation checklist is available.
Minimum Interventionist Requirements
The title or credentials an interventionist should possess in order to administer the program.
Minimum Training Requirements
The minimum training time required to prepare an instructor or interventionist to implement the program.
Intervention Reviewed by What Works Clearinghouse
Indicates whether the intervention or study was reviewed by the What Works Clearinghouse.
Other Research Potentially Eligible for NCII Review
Indicates the number of other research studies that are potentially eligible for NCII review but have not yet been reviewed.