Behavior Screening Rating Rubric
Please note that the following rubrics are applied separately for each sub-scale, grade level/span, and informant targeted by the tool.
Classification Accuracy
Note: Classification Accuracy will be rated separately for each criterion measure and each time of year of administration (e.g., Fall, Winter, Spring). Ratings will be provided for up to two different criterion measures and up to three different time points. Data for additional criterion measures or administration times may be reported, but will not be rated.
Full Bubble: All of Q1 – Q3 (below) rated as YES; and the lower bound of the confidence interval around the Area Under the Curve (AUC) estimate ≥ 0.75, or if a confidence interval is not available, the lowest estimate of the AUC is ≥ 0.75*; and Sensitivity ≥ 0.70 and Specificity ≥ 0.70.
Half Bubble: All of Q1 – Q3 (below) rated as YES; and the lower bound of the confidence interval around the Area Under the Curve (AUC) estimate ≥ 0.70 but < 0.75, or if a confidence interval is not available, the lowest estimate of the AUC is ≥ 0.70*; and Sensitivity ≥ 0.60 and Specificity ≥ 0.60.
Empty Bubble: Does not meet full or half bubble.
*Note: This option will only be included in the rubric for the 2017 and 2018 review cycles and will be phased out in 2019.
Q1. Was an appropriate measure of social, emotional, or behavior skills used as an outcome?
Q2. Was a convincing rationale provided for the selection of the comparison point against which the screener was judged (e.g., percentile, cut score)?
Q3. Were the classification analyses adequately performed and the cut-points appropriately determined?
Area Under the Curve (AUC) Statistic: an overall index of the diagnostic accuracy summarized by a Receiver Operating Characteristic (ROC) curve. An ROC curve traces the set of possible combinations of sensitivity and specificity for a predictor. AUC values closer to 1 indicate the screening measure reliably distinguishes between students with and without social, emotional, or behavioral risk, whereas a value of 0.50 indicates the predictor performs no better than chance.
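For concreteness, the sketch below shows how these quantities could be computed and checked against the bounds above, using scikit-learn and entirely hypothetical data and cut score; it illustrates the criteria rather than prescribing a review procedure.

```python
# A minimal sketch, using entirely hypothetical data and a hypothetical
# cut score: AUC with a bootstrap confidence interval, plus sensitivity
# and specificity, for comparison against the bounds above.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 500
y_true = rng.integers(0, 2, size=n)           # 1 = at risk on the criterion
scores = y_true * 1.2 + rng.normal(size=n)    # screener scores
cut = 0.6                                     # hypothetical cut score

auc = roc_auc_score(y_true, scores)
boot = []
for _ in range(2000):                         # percentile bootstrap for the CI
    idx = rng.integers(0, n, n)               # resample cases with replacement
    boot.append(roc_auc_score(y_true[idx], scores[idx]))
auc_lower = np.percentile(boot, 2.5)          # lower bound of 95% CI

pred = scores >= cut
sensitivity = pred[y_true == 1].mean()        # true-positive rate
specificity = (~pred)[y_true == 0].mean()     # true-negative rate

print(f"AUC = {auc:.3f}, CI lower bound = {auc_lower:.3f}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
print("full bubble:", auc_lower >= 0.75 and sensitivity >= 0.70 and specificity >= 0.70)
```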
Technical Standards
Reliability
Full Bubble: Either (a) a model-based approach to reliability was reported with at least two sources of variance, or (b) at least two other types of reliability appropriate for the purpose of the tool were reported, drawn from at least two samples representative of students across all performance levels. In addition, for each type of reliability reported, the lower bound of the confidence interval around the median estimate met or exceeded 0.70 or, if a confidence interval was not available, the lowest estimate met or exceeded 0.70* (one way to check such a bound is sketched after this rubric).
Half Bubble: Either (a) a model-based approach to reliability was reported with at least two sources of variance, or (b) at least two other types of reliability appropriate for the purpose of the tool were reported, drawn from at least two samples representative of students across all performance levels. And/or, for each type of reliability reported, the lower bound of the confidence interval around the median estimate fell below 0.70 but met or exceeded 0.60 or, if a confidence interval was not available, the lowest estimate fell below 0.70 but met or exceeded 0.60*.
Empty Bubble: Does not meet full or half bubble.
Dash: Reliability data were not provided.
*Note: This option will only be included in the rubric for the 2017 and 2018 review cycles and will be phased out in 2019.
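As one concrete illustration, the sketch below computes Cronbach's alpha (a common internal-consistency estimate; the rubric accepts several types of reliability evidence) with a bootstrap confidence interval whose lower bound can be checked against the 0.70 and 0.60 cut-offs. All data are hypothetical.

```python
# A minimal sketch, assuming hypothetical rating data: Cronbach's alpha
# with a percentile-bootstrap CI for the lower bound.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of item scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def bootstrap_lower_bound(items, n_boot=2000, seed=0):
    """Lower bound of a 95% percentile-bootstrap CI for alpha."""
    rng = np.random.default_rng(seed)
    n = items.shape[0]
    boots = [cronbach_alpha(items[rng.integers(0, n, n)])
             for _ in range(n_boot)]
    return np.percentile(boots, 2.5)

# Hypothetical data: 200 students x 10 rating-scale items scored 0-4.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
items = np.clip(np.round(2 + latent + rng.normal(scale=0.8, size=(200, 10))), 0, 4)

lb = bootstrap_lower_bound(items)
print(f"alpha = {cronbach_alpha(items):.3f}, CI lower bound = {lb:.3f}")
print("meets full-bubble criterion (>= 0.70):", lb >= 0.70)
```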
Validity
Full Bubble: At least two types of appropriately justified validity analyses* were reported from a sample representative of students across all performance levels, and the lower bound of the confidence interval around each standardized estimate met or exceeded 0.60 (or, if not, fell within an acceptable range given the expected relationship with the criterion measure(s)) or, if a confidence interval was not available, the lowest estimate met or exceeded 0.60** (a brief computational sketch follows this rubric).
Half Bubble: Analyses, measures, and sample were appropriate, but evidence was mixed, with one or more estimates either not meeting or exceeding 0.60 or not falling within an acceptable range given the expected relationship with the criterion measure(s).
Empty Bubble: Does not meet full or half bubble.
Dash: Validity data were not provided.
*Appropriately justified analyses must include at least one criterion measure that is external to the screening system and theoretically linked to the underlying construct measured by the tool.
**Note: This option will only be included in the rubric for the 2017 and 2018 review cycles and will be phased out in 2019.
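As one illustration of a single validity estimate, the sketch below correlates hypothetical screener scores with a hypothetical external criterion and builds a Fisher-z confidence interval whose lower bound can be compared against 0.60; it is not the rubric's mandated analysis.

```python
# A minimal sketch, assuming hypothetical data: a concurrent-validity
# correlation between screener scores and an external criterion, with a
# Fisher-z confidence interval for comparison against the 0.60 bound.
import numpy as np
from scipy import stats

def pearson_ci(x, y, conf=0.95):
    """Pearson r with a Fisher z-transform confidence interval."""
    r, _ = stats.pearsonr(x, y)
    z, se = np.arctanh(r), 1 / np.sqrt(len(x) - 3)
    zcrit = stats.norm.ppf(1 - (1 - conf) / 2)
    return r, np.tanh(z - zcrit * se), np.tanh(z + zcrit * se)

# Hypothetical data: 300 students' screener and criterion scores.
rng = np.random.default_rng(0)
screener = rng.normal(size=300)
criterion = 0.7 * screener + rng.normal(scale=0.6, size=300)

r, lo, hi = pearson_ci(screener, criterion)
print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
print("lower bound meets 0.60:", lo >= 0.60)
```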
Sample Representativeness
Full Bubble: Large representative national sample (at least 150 students across at least three geographic divisions*) and cross-validation (i.e., multiple studies).
Half Bubble: Large representative national sample (at least 150 students across at least three geographic divisions) or multiple regional/state samples with no cross-validation; or one or more regional/state samples with cross-validation.
Empty Bubble: One regional or state sample with no cross-validation; or one or more local samples.
* The nine geographic divisions defined by the U.S. Census Bureau: https://www2.census.gov/geo/pdfs/maps-data/maps/reference/us_regdiv.pdf
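Because this rating is a deterministic function of a few sample characteristics, the rule can be written compactly. The sketch below encodes it as stated above; the function name, arguments, and labels are illustrative only.

```python
# A minimal sketch encoding the sample-representativeness decision rule
# above; all names and labels are hypothetical, not part of the rubric.
def rate_sample_representativeness(scope: str, n_students: int,
                                   n_divisions: int, n_samples: int,
                                   cross_validated: bool) -> str:
    """scope: 'national', 'regional/state', or 'local'."""
    large_national = (scope == "national"
                      and n_students >= 150 and n_divisions >= 3)
    if large_national and cross_validated:
        return "full bubble"
    if large_national or (scope == "regional/state"
                          and (n_samples >= 2 or cross_validated)):
        return "half bubble"
    return "empty bubble"

# e.g., a cross-validated national sample of 200 students in 4 divisions:
print(rate_sample_representativeness("national", 200, 4, 2, True))  # full bubble
```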
Bias Analysis Conducted
Bias Analysis refers to an analysis that examines the degree to which a tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners).
Yes: One or more of the following types of analyses were conducted (the last is illustrated in a brief sketch after this list):
- Multiple-group confirmatory factor models for categorical item responses
- Explanatory group models such as multiple-indicators, multiple-causes (MIMIC) or explanatory IRT with group predictors
- Differential Item Functioning (DIF) analyses within Item Response Theory (IRT)
- Testing differential classification accuracy across demographic groups
No: Does not meet Yes.
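As a minimal illustration of the last analysis type listed above, the sketch below estimates AUC separately for two simulated demographic groups and compares the results; all data, group labels, and effect sizes are hypothetical.

```python
# A minimal sketch, using simulated data, of comparing classification
# accuracy (AUC) across demographic groups.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 600
group = rng.choice(["A", "B"], size=n)        # hypothetical subgroups
y_true = rng.integers(0, 2, size=n)           # 1 = at risk on the criterion
scores = y_true * 1.2 + rng.normal(size=n)    # screener scores

for g in np.unique(group):
    m = group == g
    print(f"group {g}: AUC = {roc_auc_score(y_true[m], scores[m]):.3f}")

# Large subgroup gaps in AUC (or in sensitivity/specificity at the
# operational cut score) would signal differential accuracy.
```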
Usability Features
Admin Format
The administration format may include individual administration and/or administration to small groups of students.
Admin & Scoring Time
The time needed to administer and score the assessment.
Scoring Format
The scoring format may include manual scoring (i.e., hand scoring) and/or automatic scoring (i.e., computer scoring).
Types of Decision Rules
Indicates the decision rules, or criteria for decision making, available for the tool.
Evidence Available for Multiple Decision Rules
Indicates whether evidence was provided in support of multiple decision rules.
Usability Study
Indicates whether a usability study was conducted on the assessment.