Behavior Screening Rating Rubric
Classification Accuracy
Note: Classification Accuracy will be rated separately for each criterion measure and each time of year of administration (e.g., Fall, Winter, Spring). Ratings will be provided for up to two different criterion measures and up to three different time points. Data for additional criterion measures or administration times may be reported, but will not be rated.
Full Bubble: All of Q1 – Q3 (below) rated as YES; and the lower bound of the confidence interval around the Area Under the Curve (AUC) estimate is ≥ 0.75, or, if a confidence interval is not available, the lowest estimate of the AUC is ≥ 0.75¹; and Sensitivity ≥ 0.70 and Specificity ≥ 0.70.
Half Bubble: All of Q1 – Q3 (below) rated as YES; and the lower bound of the confidence interval around the Area Under the Curve (AUC) estimate is ≥ 0.70 but < 0.75, or, if a confidence interval is not available, the lowest estimate of the AUC is ≥ 0.70¹; and Sensitivity ≥ 0.60 and Specificity ≥ 0.60.
Empty Bubble: Does not meet full or half bubble.
¹Note: This option will only be included in the rubric for the 2017 and 2018 review cycles and will be phased out in 2019.
Q1. Was an appropriate external measure of academic performance used as an outcome?
Q2. Was risk adequately defined within an RTI approach to screening (e.g., 20th percentile) and consistent with the base rate?
Q3. Were the classification analyses and cut-points adequately performed?
Area Under the Curve (AUC) Statistic: an overall indication of the diagnostic accuracy of a Receiver Operating Characteristic (ROC) curve. A ROC curve summarizes the full set of sensitivity and specificity combinations that a predictor can achieve across its possible cut points. AUC values closer to 1 indicate that the screening measure reliably distinguishes between students with satisfactory and unsatisfactory performance on the criterion measure, whereas a value of 0.50 indicates the predictor is no better than chance.
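As a minimal illustration of how these statistics relate to the bubble thresholds above, the sketch below computes AUC, sensitivity, and specificity for a small set of hypothetical screener scores using scikit-learn. The scores, risk statuses, the 16-point cut score, and all variable names are illustrative assumptions, not values from any reviewed tool.

```python
# Hypothetical sketch: classification accuracy statistics for a screener.
# Lower screener scores are assumed here to indicate greater risk.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

screen_scores = np.array([12, 30, 7, 22, 18, 3, 14, 9, 20, 25])  # screener scores
at_risk       = np.array([ 1,  0, 1,  0,  1, 1,  0, 1,  0,  0])  # 1 = at risk on the external criterion

# AUC: 1.0 = perfect discrimination, 0.50 = chance. Scores are negated because
# lower scores are assumed to indicate risk (the positive class).
auc = roc_auc_score(at_risk, -screen_scores)

# Sensitivity and specificity at a hypothetical cut point of 16
# (students scoring at or below the cut are flagged as at risk).
flagged = (screen_scores <= 16).astype(int)
tn, fp, fn, tp = confusion_matrix(at_risk, flagged, labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)   # proportion of truly at-risk students who were flagged
specificity = tn / (tn + fp)   # proportion of not-at-risk students who were not flagged

# Full Bubble requires an AUC lower bound >= 0.75, sensitivity >= 0.70, specificity >= 0.70;
# Half Bubble relaxes these to 0.70, 0.60, and 0.60.
print(f"AUC = {auc:.2f}, sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

In an actual review, the lower bound of a confidence interval around the AUC (e.g., from a bootstrap) would be compared against the thresholds rather than the point estimate alone.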
Reliability
Full Bubble: At least two types of reliability are reported that are appropriate¹ for the purpose of the tool, and the analyses are drawn from at least two samples representative of students across all performance levels, and the median of the estimates for each type met or exceeded 0.70.
Half Bubble: At least two types of reliability are reported that are appropriate¹ for the purpose of the tool, and:
- the analyses are drawn from one sample representative of students across all performance levels, and the median of the estimates met or exceeded 0.70; or
- the analyses are drawn from at least two samples representative of students across all performance levels, and the median of the estimates for each type met or exceeded 0.60.
Empty Bubble: Does not meet full or half bubble.
Dash: Reliability data were not provided.
¹Tests which require human judgment must report inter-rater reliability to be eligible for a Full or Half Bubble rating. Other types of reliability must include justification of appropriateness given the purpose of the tool.
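A minimal sketch of how the reliability decision rules above might be applied, assuming reliability estimates are organized by type and drawn from representative samples; the reliability types, coefficients, and sample count below are hypothetical.

```python
# Hypothetical sketch of the reliability rating logic. The data layout
# (type -> one estimate per representative sample) and values are illustrative.
from statistics import median

reliability = {
    "inter-rater": [0.78, 0.74],   # required when scoring involves human judgment
    "test-retest": [0.81, 0.69],
}
n_samples = 2  # number of representative samples the analyses are drawn from

medians = {rtype: median(vals) for rtype, vals in reliability.items()}

if len(reliability) >= 2 and n_samples >= 2 and all(m >= 0.70 for m in medians.values()):
    rating = "Full Bubble"
elif len(reliability) >= 2 and (
    (n_samples == 1 and all(m >= 0.70 for m in medians.values()))
    or (n_samples >= 2 and all(m >= 0.60 for m in medians.values()))
):
    rating = "Half Bubble"
else:
    rating = "Empty Bubble"  # a Dash would apply if no reliability data were provided

print(medians, rating)
```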
Validity
Full Bubble: At least two types of appropriately justified¹ validity analyses are reported, and the analyses are drawn from at least one sample representative of students across all performance levels, and the median of the estimates for each type met or exceeded 0.60 (or was within an acceptable range given the expected relationship with the criterion measure(s)).
Half Bubble: One type of appropriately justified¹ validity analysis is reported, and the analysis is drawn from a sample representative of students across all performance levels, and the median of the estimates met or exceeded 0.60 (or was within an acceptable range given the expected relationship with the criterion measure(s)).
Empty Bubble: Does not meet full or half bubble.
Dash: Validity data were not provided.
¹Appropriately justified analyses must include at least one criterion measure that is external to the screening system and theoretically linked to the underlying construct measured by the tool.
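A minimal sketch of one validity analysis of the kind described above: correlating screener scores with an external criterion measure and comparing the coefficient to the 0.60 benchmark. The scores and names are hypothetical, and an actual submission would report estimates for the required number of validity types and samples.

```python
# Hypothetical sketch: concurrent validity as the correlation between screener
# scores and an external criterion measure. All values are illustrative.
import numpy as np

screener  = np.array([12, 30, 7, 22, 18, 3, 14, 9, 20, 25])     # screening tool scores
criterion = np.array([35, 80, 25, 70, 55, 15, 50, 30, 60, 75])  # external criterion measure

r = np.corrcoef(screener, criterion)[0, 1]
print(f"concurrent validity r = {r:.2f}; meets the 0.60 benchmark: {abs(r) >= 0.60}")
```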
Sample Representativeness
National with Cross-Validation: At least one classification accuracy analysis was conducted using a national sample¹, and at least one cross-validation study was conducted.
National without Cross-Validation: At least one classification accuracy analysis was conducted using a national sample¹, without a cross-validation study.
Regional with Cross-Validation: At least one classification accuracy analysis was conducted using one or more state or regional samples and at least one cross-validation study was conducted.
Regional without Cross-Validation: At least one classification accuracy analysis was conducted using one or more state or regional samples without a cross-validation study.
Local with Cross-Validation: At least one classification accuracy analysis was conducted using one or more local district samples and at least one cross-validation study was conducted.
Local without Cross-Validation: At least one classification accuracy analysis was conducted using one or more local district samples without a cross-validation study.
No Evidence: Insufficient or no evidence was provided on the characteristics of the sample used for the classification accuracy analysis.
¹A national sample consists of at least 150 students across at least three of the nine geographical divisions defined by the U.S. Census Bureau: https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf
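A minimal sketch of the national-sample check in the footnote above, assuming hypothetical student counts per Census division.

```python
# Hypothetical check of the national-sample definition: at least 150 students
# spread across at least three of the nine Census divisions.
students_per_division = {
    "New England": 40,
    "South Atlantic": 65,
    "Pacific": 55,
}

total_students = sum(students_per_division.values())
n_divisions = sum(1 for count in students_per_division.values() if count > 0)
is_national = total_students >= 150 and n_divisions >= 3
print(f"{total_students} students across {n_divisions} divisions -> national sample: {is_national}")
```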
Bias Analysis Conducted
Bias Analysis refers to an analysis that examines the degree to which a tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners).
Yes: One or more of the following types of analyses were conducted:
- Multiple-group confirmatory factor models for categorical item responses
- Explanatory group models such as multiple-indicators, multiple-causes (MIMIC) or explanatory IRT with group predictors
- Differential Item Functioning from Item Response Theory (DIF in IRT)
- Testing differential classification accuracy across demographic groups (see the sketch after this list)
No: Does not meet Yes.
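A minimal sketch of the last analysis type listed above, testing differential classification accuracy across demographic groups, using hypothetical screening decisions, criterion outcomes, and group labels.

```python
# Hypothetical sketch: compare sensitivity and specificity across two
# demographic groups. Decisions, outcomes, and group labels are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

flagged = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])  # screener decision (1 = flagged as at risk)
at_risk = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])  # status on the external criterion
group   = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g in np.unique(group):
    mask = group == g
    tn, fp, fn, tp = confusion_matrix(at_risk[mask], flagged[mask], labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"group {g}: sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")

# Large gaps between groups would suggest the cut point functions differently
# across subgroups and warrant follow-up (e.g., DIF or MIMIC models).
```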
Admin Format
The administration format may include individual student administration and/or small groups of students.
Admin & Scoring Time
The time needed to administer and score the assessment.
Scoring Format
The scoring format may include manual scoring (i.e., hand scoring) and/or automatic scoring (i.e., computer scoring).
Types of Decision Rules
Indicates the decision rules, or criteria for decision making, available for the tool.
Evidence Available for Multiple Decision Rules
Indicates whether evidence was provided in support of multiple decision rules.
Usability Study Conducted
Indicates whether a usability study was conducted on the assessment.