DBR-SIS (Direct Behavior Rating - Single Item Scale)
Disruptive Behavior

Summary

Descriptive Information

As a behavioral assessment methodology, DBR combines characteristics of systematic direct observations and behavioral rating scales. Specifically, DBR-SIS reflects a teacher’s rating regarding the proportion of time in which a target student was observed to engage in a specific behavior, using as a scale from 0 (never) to 10 (always), during a specified observation period. For example, if a student received a score of 8 out of 10 on a DBR-SIS form while being observed for Academic Engagement over a 20-minute period, this score would be interpreted as the student being academically engaged during 80% of the period. While observation periods and settings may vary depending on student and behavior specific factors, DBR-SIS forms reflecting student behaviors are always completed immediately following the observation.

Acquisition & Cost

Where to Obtain:: PAR, Inc. – in conjunction with authors Chafouleas & Riley-Tillman / PAR, Inc.; https://dbr.education.uconn.edu/

Initial Cost:: Contact vendor for pricing details.

Replacement Cost:: Contact vendor for pricing details.

Included in Cost:: Training is free of charge via the online training module: http://dbrtraining.education.uconn.edu/

Training & Technical Support

Training Requirements:: Less than one hour of training. Training is free of charge via the online training module: http://dbrtraining.education.uconn.edu/

Qualified Administrators:: There are no minimum qualifications of the examiner.

Access to Technical Support:

Administration

Assessment Format:

Direct observation
Rating scale

Scoring Time:

Scoring is automatic OR
1 minutes per student

Scores Generated:

Administration Time:

15 minutes per student

Scoring Method:

Automatically (computer-scored)

Technology Requirements:

Computer or tablet
Internet connection

Tool Information

Descriptive Information

Please provide a description of your tool:: As a behavioral assessment methodology, DBR combines characteristics of systematic direct observations and behavioral rating scales. Specifically, DBR-SIS reflects a teacher’s rating regarding the proportion of time in which a target student was observed to engage in a specific behavior, using as a scale from 0 (never) to 10 (always), during a specified observation period. For example, if a student received a score of 8 out of 10 on a DBR-SIS form while being observed for Academic Engagement over a 20-minute period, this score would be interpreted as the student being academically engaged during 80% of the period. While observation periods and settings may vary depending on student and behavior specific factors, DBR-SIS forms reflecting student behaviors are always completed immediately following the observation.

Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?: End-year goal
Short-term skill

The tool is intended for use with the following grade(s).

Preschool / Pre - kindergarten
selected

Kindergarten
selected

First grade
selected

Second grade
not selected

Third grade
selected

Fourth grade
selected

Fifth grade
not selected

Sixth grade
selected

Seventh grade
selected

Eighth grade
not selected

Ninth grade
not selected

Tenth grade
not selected

Eleventh grade
not selected

Twelfth grade

The tool is intended for use with the following age(s).

0-4 years old
selected

5 years old
selected

6 years old
selected

7 years old
selected

8 years old
selected

9 years old
selected

10 years old
selected

11 years old
selected

12 years old
selected

13 years old
selected

14 years old
not selected

15 years old
not selected

16 years old
not selected

17 years old
not selected

18 years old

The tool is intended for use with the following student populations.

Students in general education
selected

Students with disabilities
not selected

English language learners

ACADEMIC ONLY: What dimensions does the tool assess?

Reading

Global Indicator of Reading Competence
not selected

Listening Comprehension
not selected

Vocabulary
not selected

Phonemic Awareness
not selected

Decoding

Passage Reading
not selected

Word Identification
not selected

Comprehension

Spelling & Written Expression

Global Indicator of Spelling Competence
not selected

Global Indicator of Writting Expression Competence

Mathematics

Global Indicator of Mathematics Comprehension
not selected

Early Numeracy
not selected

Mathematics Concepts
not selected

Mathematics Computation
not selected

Mathematics Application
not selected

Fractions

Algebra

Other

Please describe specific domain, skills or subtests:

BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.: Disruptive Behavior - A student action that interrupts regular school or classroom activity.

BEHAVIOR ONLY: Which category of behaviors does your tool target?: Externalizing

Acquisition and Cost Information

Where to obtain:

Email Address
Address
Phone Number
Website: https://dbr.education.uconn.edu/

Initial cost for implementing program:

Cost
Unit of cost

Replacement cost per unit for subsequent use:

Cost
Unit of cost
Duration of license

Additional cost information:

Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.: Training is free of charge via the online training module: http://dbrtraining.education.uconn.edu/

Provide information about special accommodations for students with disabilities.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?

General education teacher
selected

Special education teacher
not selected

Parent

Child

External observer
selected

Other

If other, please specify:

Anyone with consistent access to the student throughout the observation period.

BEHAVIOR ONLY: What is the administration format?

Direct observation
selected

Rating scale
not selected

Checklist

Performance measure
not selected

Other

If other, please specify:

BEHAVIOR ONLY: What is the administration setting?

General education classroom
selected

Special education classroom
selected

School office
selected

Recess

Lunchroom

Home

Other

If other, please specify:

Does the program require technology?

Yes

If yes, what technology is required to implement your program? (Select all that apply)

Computer or tablet
selected

Internet connection
not selected

Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?

Individual

Small group If small group, n=

Large group If large group, n=

Computer-administered
not selected

Other

If other, please specify:

What is the administration time?

Time in minutes

per (student/group/other unit)

student

Additional scoring time:

Time in minutes

per (student/group/other unit)

student

How many alternate forms are available, if applicable?

Number of alternate forms

per (grade/level/unit)

ACADEMIC ONLY: What are the discontinue rules?

No discontinue rules provided
not selected

Basals

Ceilings

Other

If other, please specify:

BEHAVIOR ONLY: Can multiple students be rated concurrently by one administrator?

If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?: Yes

Describe the time required for administrator training, if applicable:: Less than one hour of training. Training is free of charge via the online training module: http://dbrtraining.education.uconn.edu/

Please describe the minimum qualifications an administrator must possess.: There are no minimum qualifications of the examiner.; No minimum qualifications

Are training manuals and materials available?: Yes

Are training manuals/materials field-tested?: Yes

Are training manuals/materials included in cost of tools?: Yes
If No, please describe training costs:

Can users obtain ongoing professional and technical support?: No
If Yes, please describe how users can obtain support:

Scoring

BEHAVIOR ONLY: What types of scores result from the administration of the assessment?

Score
Observation	Behavior Rating
Frequency Duration Interval Latency	Raw score

Conversion
Observation	Behavior Rating
Rate Percent	Standard score Subscale/ Subtest Composite Stanine Percentile ranks Normal curve equivalents IRT based scores

Interpretation
Observation	Behavior Rating
Error analysis Peer comparison Rate of change	Dev. benchmarks Age-Grade equivalent

How are scores calculated?

Manually (by hand)
selected

Automatically (computer-scored)
not selected

Other

If other, please specify:

Do you provide basis for calculating performance level scores?

Yes

What is the basis for calculating performance level and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
not selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

What types of performance level scores are available?

Raw score

Standard score
not selected

Percentile score
not selected

Grade equivalents
not selected

IRT-based score
not selected

Age equivalents
not selected

Stanines

Normal curve equivalents
not selected

Developmental benchmarks
not selected

Developmental cut points
not selected

Equated

Probability
not selected

Lexile score
not selected

Error analysis
not selected

Composite scores
not selected

Subscale/subtest scores
not selected

Other

If other, please specify:

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.: The scoring format is a 0-10 scale, with items rated using the scale following each observation period. There is a single rating (item) per sub-domain. As previously described, there are three domains that form school-based core behavior competencies (academically engaged, disruptive, respectful). Observers are asked to estimate the proportion of time that a student exhibited each behavioral competency during the observation period, and to convert that percentage to the 0-10 scale, with “0” indicating that the competency was not observed and “10” indicating that the competency was observed throughout the entire observation period. To calculate level of performance for each sub-domain, it is recommended that the average rating across more than 5 occasions be utilized. Rate of change is completed at the individual level, as consistent with single subject design logic that 5 or more data points are recommended (minimum of 3).

Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?: No

ACADEMIC ONLY: Do you provide benchmarks for the slopes?

ACADEMIC ONLY: Do you provide percentile ranks for the slopes?

What is the basis for calculating slope and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
not selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.

Levels of Performance and Usability

Are levels of performance specified in your manual or published materials?

Yes

If yes, specify the levels of performance and how they are used for progress monitoring:

Chafouleas, S. M., Kilgus, S. P., Jaffery, R., Riley-Tillman, T. C., & Welsh, M. (2013). Direct Behavior Rating as a school-based behavior screener for elementary and middle grades. Journal of School Psychology. Johnson, A. H., Miller, F. G., Chafouleas, S. M., Welsh, M. E., Riley-Tillman, T. C., & Fabiano, G. (2016). Evaluating the technical adequacy of DBR-SIS in tri-annual behavioral screening: A multisite investigation. Journal of school psychology, 54, 39-57. As noted in the manuscripts, levels of performance were obtained through use of ROC analyses. These analyses result in conditional probability indices which can be used to determine an optimal cut score for determining risk. This cut score serves as the level of performance with which a comparison of individual student can be made. The above-noted manuscripts established cut scores with relatively small confidence intervals. Findings indicated that the established cuts were much more accurate in identifying at-risk students than would be expected from identifying students via chance alone. The following cut scores were decided for various grade groups: Early Elementary (K-2) DB = 2 Upper Elementary (3-5) DB = 1 Middle School (6-8) DB = 1 This information is presented as preliminary, and with a few important caveats. For example, our subsequent analyses utilizing data collected from a different sample seem to indicate that it may not be appropriate to set uniform cuts across a grade group. In particular, cuts are not consistent across grade levels for upper elementary students, and different cuts may be needed for different portions of the school year (Fall, Winter, Spring).

What is the basis for specifying levels of performance?

Norm-referenced
not selected

Criterion-referenced
not selected

Other

If other, please specify:

False

If norm-referenced, describe the normative profile.

National representation (check all that apply):

Northeast:

New England

Middle Atlantic

Midwest:

East North Central

West North Central

South:

South Atlantic

East South Central

West South Central

West:

Mountain

Pacific

Local representation (please describe, including number of states)

Representation from two states in the Northeast (New England, Middle Atlantic) CT an NY and one state in the Midwest (West North Central) MO. In total 22 schools sampled across rural, suburban and urban districts.

Date: 2011-2012 School Year
Size: 629 (Fall), 606 (Winter), 609 (Spring)

Gender (Percent)

Male: 52 (Fall), 52.1 (Winter), 51.7 (Spring)
Female: 48 (Fall), 47.9 (Winter), 48.3 (Spring)
Unknown

SES indicators (Percent)

Eligible for free or reduced-price lunch: Mean rate of students eligible for free or reduced lunch in schools within sample 36.4%
Other SES Indicators

Race/Ethnicity (Percent)

White, Non-Hispanic: 81.4 (Fall), 82.8 (Winter), 82.3 (Spring)
Black, Non-Hispanic: 12.2 (Fall), 11.0 (Winter), 11.3 (Spring)
Hispanic: 7.5 (Fall), 7.6 (Winter), 7.4 (Spring)
American Indian/Alaska Native
Asian/Pacific Islander: 1.7 (Fall), 1.7 (Winter), 1.7 (Spring)
Other: Multiracial: 1.0 (Fall), 1.0 (Winter), 0.9 (Spring)
Unknown: 3.7 (Fall), 3.5 (Winter), 3.6 (Spring).

Disability classification (Please describe)
First language (Please describe)
Language proficiency status (Please describe)

If criterion-referenced, describe procedures for specifying levels of performance.

Describe any other procedures for specifying levels of performance.

Has a usability study been conducted on your tool (i.e., a study that examines the extent to which the tool is convenient and practicable for use?)

Yes

If yes, please describe, including the results:

Riley-Tillman, T.C., Chafouleas, S.M., Briesch, A.M., & Eckert, T. (2008). Daily Behavior Report Cards and Systematic Direct Observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17, 313-327. doi:s10864-008-9070-5 Abstract: More than ever, educators require assessment procedures and instrumentation that are technically adequate as well as efficient to guide data-based decision making. Thus, there is a need to understand perceptions of available tools, and the decisions made when using collected data, by the primary users of those data. In this paper, two studies that surveyed members of the National Association of School Psychologists with regard to two procedures useful in formative assessment, (i.e.,Daily Behavior Report Cards; Systematic Direct Observation), are presented. Participants reported greater overall levels of training and use of Systematic Direct Observation than Daily Behavior Report Cards, yet both techniques were rated as equally acceptable for use in formative assessment. Furthermore, findings supported that school psychologists tend to make similar intervention decisions when presented with both types of data. Implications, limitations, and future directions are discussed. Miller, F. G., Neugebauer, S. R., Chafouleas, S. M., Briesch, A. M., Welsh, M. E., Riley-Tillman, T. C., & Fabiano, G. A. (2012, August). Teacher perceptions of behavior screening assessments. Poster presentation at the American Psychological Association Annual Convention, Orlando, FL. Abstract: This study aimed to investigate teachers’ perceptions of the usability (acceptability, understanding, feasibility, home-school collaboration, systems climate, and systems support) of three school-based behavior assessments. Public school teachers in grades 1, 2, 4, 5, 7, and 8 and located across three geographic locations (N = 133) served as participants. Overall, teachers rated the three behavioral assessments positively, with perceived greater understanding of DBR-SIS than other measures. One possible reason for the perceived greater understanding may be that DBR-SIS may be more easily interpreted than other rating scale formats because DBR-SIS ratings are intend to reflect the percentage of time a student engaged in a target behavior. Understanding teacher perceptions of behavioral rating scales is important; such assessments can be used to identify barriers to implementation for the purpose of either removing those barriers or for selecting an alternative option with greater likelihood of success.

Has a social validity study been conducted on your tool (i.e., a study that examines the significance of goals, appropriateness of procedures (e.g., ethics, cost, practicality), and the importance of treatment effects)?

If yes, please describe, including the results:

Performance Level

Reliability

Age / Grade Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Offer a justification for each type of reliability reported, given the type and purpose of the tool.: DBR-SIS was originally developed with intent to mirror opportunities for formative data streams as provided within systematic direct observation. As such, issues of reliability (particularly which types to emphasize) can be openly discussed and debated. In this section, we present a two-pronged approach to reliability that includes a) intraclass correlations b) generalizability theory. The first approach involves reliability estimated by converting intraclass correlation coefficients to reliability coefficients using the approach suggested by Shrout & Fleiss (1979), with data obtained from studies designed to examine DBR-SIS within screening purposes. These reliability estimates are based on large samples of students across a diverse range of general classroom settings, and they address a wide range of grade levels and ultimately consider the variability between students and within observations These data provide insight into the consistency of student ratings across observation periods and indicate that ratings are very stable across observation periods. Using generalizability theory, reliability data are calculated through dependability studies to demonstrate how reliability varies based on number of observations and days observed. This approach is appropriate as DBR-SIS data are rating scales and the ability to generalize scores such that we can assume a student would receive a similar rating from a different observer is of key concern. Generalization studies allow for reliability estimates across several thresholds of ratings, in this case, determining how many observations are needed to obtain various estimates. This provides practitioners with a range of administration options depending on the type of decision to be made (e.g. low stakes intervention, high stakes interventions). Reliability coefficients assuming assessment considerations (differing numbers of observations, type of rater scoring students) are discussed in the sources listed below, which purposely sampled from classrooms in which variability in student behavior was expected (e.g. inclusive classroom with intensive intervention needs).

*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.: Sample information for Johnson et. al (2016) study: Subgroup Fall Winter Spring Male 52.0% 52.1% 51.7% Female 48.0% 47.9% 48.3% White 81.4% 82.8% 82.3% Black 12.2% 11.0% 11.3% Asian/Pacific Islander 1.7% 1.7% 1.7% Non-Hispanic 92.5% 92.4% 92.6% Hispanic 7.5% 7.6% 7.4% Other 3.7% 3.5% 3.6% Multiracial 1.0% 1.0% 0.9% Sample information for Chafouleas et. al (2013) study: Grades K-5: 51.7% female, White, Non-Hispanic (N = 553; 89.6%), White, Hispanic (N = 12; 1.9%), Black (N = 9; 1.5%), American Indian or Alaska Native (N = 2; 0.3%), Asian (N = 193 2.1%), Other (N = 8; 1.3%), missing (N = 20 3.2%). Grades 6-8: 46.3% female, 89.7% White, non-Hispanic. Sample information for Chafouleas et. al (2010) study: Seven 8th-grade students attending an inclusive language arts classroom. Students demographics included: 3 boys/4 girls, 6 hispanic/1 african-american, 4 receiving special education services. Raters included the classroom teacher, a special education teacher who provided services in the classroom, and two research assistants. In the actual study, raters observed students three times a day over six consective days for a period of 45-60 minutes. Reliability coefficients below present the reliability for raters including classroom teachers and research assistants separately, across a variety of total observations.

*Describe the analysis procedures for each reported type of reliability.: Analysis procedures for Johnson et. al (2016) study:: Average DBR-SIS DB scores across 6-10 observations per student were used for analysis. Specifically, data reliability was calculated from a one-way intra-class correlation coefficient (ICC) that examined variability between students and within observations, corresponding to ICC(I,k), using a formula proposed by Shrout and Fleiss (1979). The average ICC (k=6-10) was selected. Analysis procedures for Chafouleas et. al (2013) study: Students were rated on DB by teachers across 5-10 data points and these scores were averaged to obtain a mean value. ICCs were then calculated, using a formula in accordance with Shrout and Fleiss’ (1979) recommendations. Intra-class correlation (ICC) coefficients were examined for each DBR-SIS behavior target to assess the appropriateness of this within-student DBR-SIS data aggregation. Analysis procedures for Chafouleas et. al (2010) study: Four primary facets of interest were identified (i.e., person, rater, day and rating occasion). Every student was rated on every occasion by every rater and, given that the goal was to generalize results beyond the specific students, raters, and rating occasions examined in the current study, all facets were considered to be random. An ANOVA with Type III sum of squares was used to derive all variance components (Chafouleas et al., 2010).

*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:: Additional reliability data is available from the Center upon request.

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Validity

Age / Grade Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.: Concurrent validity serves as the primary source of data presented as related to DBR-SIS. As described, the intended purpose of DBR-SIS is in formative uses. As such, a primary source of validity data comes from concurrent comparisons with variety of behavior observation. While there is no single behavior assessment method that combines both teacher ratings and formative assessment, comparisons the Behavioral and Emotional Screening System and Student Risk Screening Scale (teacher ratings), both established and technically sound screening measures, provide information about the validity of DBR-SIS

*Describe the sample(s), including size and characteristics, for each validity analysis conducted.: Sample information for Johnson et. al (2016) study: Subgroup Fall Winter Spring Male 52.0% 52.1% 51.7% Female 48.0% 47.9% 48.3% White 81.4% 82.8% 82.3% Black 12.2% 11.0% 11.3% Asian/Pacific Islander 1.7% 1.7% 1.7% Non-Hispanic 92.5% 92.4% 92.6% Hispanic 7.5% 7.6% 7.4% Other 3.7% 3.5% 3.6% Multiracial 1.0% 1.0% 0.9% Sample information for Kilgus, Riley-Tillman, Chafouleas, Christ, & Welsh (2014) study: The sample consisted of 1108 students in the 1st, 4th, and 7th grades sampled from 13 schools across three geographic regions (northeast, southeast, Midwest). Specifically, the sample consisted of 410 first grade students – 31 teachers, 354 fourth grade students – 25 teachers, and 344 seventh grade students – 23 teachers. Regarding region the sample consisted of 28 teachers at the Northeast site (first-grade n = 8, fourth-grade n = 9, and seventh-grade n = 11), 29 teachers at the Southeast site (first grade n = 14, fourth-grade n = 10, and seventh-grade n = 5) and 22 teachers at the Midwest site (first-grade n = 9, fourth-grade n = 6, and seventh-grade n = 7). The majority of students were identified as White, non Hispanic (n = 536; 48.38%); 141 as White, Hispanic (12.73%); 297 as Black or African American (26.81%); 20 as American Indian or Alaskan Native (1.81%); 45 as Asian American (4.06%); and 32 as Other (2.89%). Race/ethnicity data were not provided for 37 students (3.33%). A review of data indicated that the student sample at each geographic site was representative of its corresponding state population with regard to gender and race/ethnicity, with a slight underrepresentation of White, non-Hispanic students. Sample information for Chafouleas et. al (2013) study: Elementary (K-5) 617 Elementary Students (K-90; 1st-116; 2nd- 106; 3rd- 92; 4th-122; 5th-91) Lower Elementary (K-2) – 312 Upper Elementary (3-5) – 305 Female (51.7%) White, Non-Hispanic (N = 553; 89.6%) White, Hispanic (N = 12; 1.9%) Black (N = 9; 1.5%) American Indian or Alaska Native (N = 2; 0.3%) Asian (N = 193 2.1%) Other (N = 8; 1.3%) Missing (N = 20 3.2%). Middle School (6-8) 214 middle school students (6th-18; 7th-155; 8th-41) 46.3% female 89.7% White, non-Hispanic

*Describe the analysis procedures for each reported type of validity.: Analysis procedures for Johnson et. al (2016) study: Correlation coefficients were calculated between BESS T-scores and mean DBR-SIS DB scores. Analysis procedures for Kilgus et. al (2014) study: Pearson product-moment bi-variate correlations between screening scales (e.g., DB, BESS, and SRSS) were calculated across grades. Analysis procedures for Chafouleas et. al (2013) study: Concurrent validity evaluated by calculating Pearson product-moment correlation coefficients (r) assessing the correlation between mean DBR-SIS DB scores and computed SRSS summed scores and BESS T scores.

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:: Internal Validity: The following steps were taken to protect against the threat of internal validity: (a) counterbalancing of measure presentation, (b) random order assignment of students on individual measures, and (c) random selection of students within classrooms. Counterbalancing of presentation order took place by measure through the random assignment of conditions to teacher participants, with corrections made after random assignment in order to ensure even distribution of conditions within site and grade group.

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.: Results of Johnson et. al (2016) study: The DB scale, in which lower scores indicated less risk (e.g., lower disruption) was positively correlated with the BESS-T scale for which lower scores indicate less risk across all grades and time-points. All correlations were statistically significant from 0 at the p<.001 level using the Holm–Bonferroni correction for Type I error inflation (Holm, 1979). These results, in addition to the steps taken to protect against threats to internal validity (see above), provide evidence strengthening the validity of DBR-SIS DB scores. Results of Kilgus et. al (2014) study: Bivariate correlations between the BESS and DBR-SIS DB and SRSS and DBR-SIS DB were all in the expected direction (e.g., Lower DB scores positively correlated with BESS and SRSS higher risk scores) and were statistically significant at the p<.001 level. Results of Chafouleas et. al (2013) study: All correlations between DBR-SIS DB and BESS-T and DBR-SIS DB and SRSS scores were statistically significant at the .001 level and in the expected direction. Additionally, the influence of subgroup size (e.g., ratings of students within two vs three subgroups) was taken into consideration and no differences in correlation scores were found.

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Bias Analysis

Age / Grade: Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating	Not Provided	Not Provided

Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.: No

If yes,
a. Describe the method used to determine the presence or absence of bias:

b. Describe the subgroups for which bias analyses were conducted:

c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.

Growth Standards

Sensitivity to Behavior Change

Age / Grade: Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Describe evidence that the monitoring system produces data that are sensitive to detect incremental change (e.g., small behavior change in a short period of time such as every 20 days, or more frequently depending on the purpose of the construct). Evidence should be drawn from samples targeting the specific population that would benefit from intervention. Include in this example a hypothetical illustration (with narrative and/or graphics) of how these data could be used to monitor student performance frequently enough and with enough sensitivity to accurately assess change:: Evidence that DBR-SIS can produce data that are sensitive to detect incremental change (e.g., small behavior change in a short period of time) is provided in the 3 studies below. Actual, not hypothetical, data are available to demonstrate how DBR-SIS has been used to monitor student performance on a frequent basis to inform decisions about student performance. The studies below represent a continuum of classwide (middle school, elementary) to individual (elementary) student focus. Graphs are provided in 2 of the 3 (JOBE, AEI) manuscripts to illustrate how data present enough sensitivity to assess change – the third manuscript (Exceptional Children) presents aggregated information in table format only given the volume of data. Chafouleas, S. M., Sanetti, L.M.H., Kilgus, S. P., & Maggin, D. M. (2012). Evaluating sensitivity to behavioral change across consultation cases using Direct Behavior Rating Single-Item Scales (DBR-SIS). Exceptional Children, 78, 491-505. Abstract. In this study, the sensitivity of Direct Behavior Rating Single Item Scales (DBR-SIS) for assessing behavior change in response to an intervention was evaluated. Data from 20 completed behavioral consultation cases involving a diverse sample of elementary participants and contexts utilizing a common intervention in an A-B design were included in analyses. Secondary purposes of the study were to investigate the utility of five metrics proposed for understanding behavioral response as well as the correspondence among these metrics and teachers’ ratings of intervention acceptability. Overall, results suggest that DBR-SIS demonstrated sensitivity to behavior change regardless of the metric used. Furthermore, there was limited association between student change and teachers’ ratings of acceptability. Chafouleas, S. M., Sanetti, L.M.H., Jaffery, R., & Fallon, L. (2012). Research to practice: An evaluation of a class-wide intervention package involving self-management and a group contingency on behavior of middle school students. Journal of Behavioral Education, 21, 34-57. Doi:10.1007/s10864-011-9135-8. Abstract. The effectiveness of an intervention package involving self-management and a group contingency at increasing appropriate classroom behaviors was evaluated in a sample of middle school students. Participants included all students in each of the 3 eighth-grade general education classrooms and their teachers. The intervention package included strategies recommended as part of best practice in classroom management to involve both building skill (self-management) and reinforcing appropriate behavior (group contingency). Data sources involved assessment of targeted behaviors using Direct Behavior Rating—single item scales completed by students and systematic direct observations completed by external observers. Outcomes suggested that, on average, student behavior moderately improved during intervention as compared to baseline when examining observational data for off-task behavior. Results for Direct Behavior Rating data were not as pronounced across all targets and classrooms in suggesting improvement for students. Limitations and future directions, along with implications for school-based practitioners working in middle school general education settings, are discussed. Riley-Tillman, T.C., Methe, S.A., & Weegar, K. (2009). Examining the use of Direct Behavior Rating methodology on classwide formative assessment: A case study. Assessment for Effective Intervention, 34, 242-250. doi:10.1177/1534508409333879 Abstract. High-quality formative assessment data are critical to the successful application of any problem-solving model (e.g., response to intervention). Formative data available for a wide variety of outcomes (academic, behavior) and targets (individual, class, school) facilitate effective decisions about needed intervention supports and responsiveness to those supports. The purpose of the current case study is to provide preliminary examination of direct behavior rating methods in class-wide assessment of engagement. A class-wide intervention is applied in a single-case design (B-A-B-A), and both systematic direct observation and direct behavior rating are used to evaluate effects. Results indicate that class-wide direct behavior rating data are consistent with systematic direct observation across phases, suggesting that in this case study, direct behavior rating data are sensitive to classroom-level intervention effects. Implications for future research are discussed. In addition, the following study provides evidence DBR-SIS for both Academic Engagement and Disruptive behavior is also sensitive to change in an intensive need population. In this study students, all had demonstrated weakness in social confidence and a majority were diagnosed formally with autism or emotional disturbances. In these studies, it was demonstrated that when behavior changed over time, both DBR and systemic direct observation altered accordingly. In this case, SDO was used as a marker to document DBR sensitivity to change for both academic engagement and disruptive behavior. Kilgus, S. P., Riley-Tillman, T. C., Stichter, J. P., Schoemann, A., & Owens, S. (in press). Examining the concurrent criterion-related validity of Direct Behavior Rating Single Item Scales (DBR-SIS) with students with high functioning autism. Assessment for Effective Intervention. A line of research has supported the development and validation of Direct Behavior Rating – Single Item Scales (DBR-SIS) for use in progress monitoring. Yet, this research was largely conducted within the general education setting with typically developing children. It is unknown whether the tool may be defensibly used with students exhibiting more substantial concerns, including students with social competence difficulties. The purpose of this investigation was to examine the concurrent validity of DBR-SIS in a middle school sample of students exhibiting substantial social competence concerns (n = 58). Students were assessed using both DBR-SIS and systematic direct observation (SDO) across three target behaviors. Each student was enrolled in one of two interventions: the Social Competence Intervention or a business-as-usual control condition. Students were assessed across three time points, including baseline, mid-intervention, and post-intervention. A review of across-time correlations indicated small to moderate correlations between DBR-SIS and SDO data (r = .25-.45). Results further suggested that the relationships between DBR-SIS and SDO targets were small to large at baseline. Correlations attenuated over time, though differences across time points were not statistically significant. This was with the exception of academic engagement correlations, which remained moderate-high across all time points.

Reliability (Intensive Population): Reliability for Students in Need of Intensive Intervention

Age / Grade Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Offer a justification for each type of reliability reported, given the type and purpose of the tool:: DBR-SIS was originally developed with intent to mirror opportunities for formative data streams as provided within systematic direct observation. As such, issues of reliability (particularly which types to emphasize) can be openly discussed and debated. In this section, we present a two-pronged approach to reliability that includes a) intraclass correlations and b) Min N Estimation Kilgus, S. P., Riley-Tillman, T. C., Stichter, J. P., Schoemann, A. M., & Bellesheim, K. (2016). Reliability of Direct Behavior Ratings – Social Competence (DBR-SC) data: How many ratings are necessary? School Psychology Quarterly, 31, 431-442. The first approach involves reliability estimated by converting intraclass correlation coefficients to reliability coefficients using the approach suggested by Shrout & Fleiss (1979), with data obtained from a study designed to examine DBR-SIS with a population of children who have intensive needs related to social competence. These data provide insight into the consistency of student ratings across observation periods and indicate that ratings are very stable across observation periods. In addition, analysis estimated the minimum number of observations necessary to reach .8 reliability. This part is particularly important as it provides guidance as to how many observations are necessary for the estimate to be reliable.

Describe the sample(s), including size and characteristics, for each reliability analysis conducted:: Participants met the following inclusion criteria: (a) student aged 11 to 14, (b) diagnosis of ASD or Special Education eligibility criteria of autism or school-identified social need,1 and (c) cognitive functioning (i.e., full-scale IQ) within 2.0 standard deviations of the mean. A sample of 33 students at six schools constituted the SCI-A group and 30 students at six schools constituted the BAU group for a total of 63 participants. Two students were dropped from analyses because of misreported IQ scores and one additional student was dropped because of a lack of data on outcome measures. The resulting sample includes 60 students (29 SCI-A and 31 BAU). Parent consent and student assent were obtained before the start of the study. Across all student participants, 55 students were male and 5 were female. The majority of participants met criteria for special education services, specifically 43.33% in the Autism category, 25% in the Emotional Disturbance category, and 20% in the Other Health Impairment category. Two students met eligibility for Specific Learning Disability, and one student met eligibility for Speech/Language Impairment. Four students did not have a current individualized education plan (IEP), and one student had a Section 504 Plan without an IEP.

Describe the analysis procedures for each reported type of reliability:: To assess DBR-SC performance, intraclass correlations (ICC) coefficients were first calculated to evaluate the consistency of DBR-SC data points across time within students. ICC and other statistics were calculated separately for different time points and different groups (SCI-A and BAU). This resulted in a 2 (treatment groups) 3 (times of assessment) mixed design with repeated measures on the time of assessment. ICCs were calculated via a two level unconditional multilevel structural equation model, with DBR-SC observations at level 1 and students at level 2 of the model. For each model, variances and covariances of each DBR-SC subscale were estimated both between and within observations. ICC was computed as the ratio of between group variance to total variance (between group variance within group variance). Next, ICCs were used to generate reliability estimates in accordance with recommendations from Shrout and Fleiss (1979).

In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Report results by age range or grade level (if relevant) and include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:: Reliability Type Age or Grade n (examinees) n (raters) Median Coefficient Confidence Interval ICC 11-14 Years Old 23 60 0.793 - 0.947 -

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Validity (Intensive Population): Validity for Students in Need of Intensive Intervention

Age / Grade Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.: Concurrent validity serves as the primary source of data presented as related to DBR-SIS. As described, the intended purpose of DBR-SIS is in formative uses. As such, a primary source of validity data comes from concurrent comparisons with Systematic Direct Observation. Kilgus, S. P., Riley-Tillman, T. C., Stichter, J. P., Schoemann, A., & Owens, S. (in press). Examining the concurrent criterion-related validity of Direct Behavior Rating Single Item Scales (DBR-SIS) with students with high functioning autism. Assessment for Effective Intervention. SDO data were collected across 15-min observation sessions, each of which was divided into 30-sec intervals. The SDO employed both partial interval recording and momentary time sampling recording to estimate percentage of time target students engaged in relevant classroom behaviors. Three OCF behaviors were considered as part of this study. Academic engagement (SDO-AE) was defined as physical orientation to the teacher or current stimuli or active participation in the lesson or social interaction. Disruptive behavior (SDO-DB) was defined as purposeful engagement in behavior that interrupts the natural flow of academic instruction or classroom functioning. Noncompliance (SDO-NC) was defined as failure to follow/complete verbal or gestural behavioral directions provided by the teacher to a group or target student within 5 seconds. Note that SDO-DB and NC were coded using partial interval recording (where a behavior was marked as having occurred if it was observed at any point within each 30-sec interval), whereas SDO-AE was coded using momentary time sampling (where a behavior was marked as having occurred if it was observed at the end of each 30-sec interval). Partial interval was deemed appropriate given the typically irregular and brief, albeit still interruptive, nature of both DB and NC. Momentary time sampling was also considered appropriate given the expectation of frequent and nearly continuous AE within the classroom.

Describe the sample(s), including size and characteristics, for each validity analysis conducted.: Participants met the following inclusion criteria: (a) student aged 11 to 14, (b) diagnosis of ASD or Special Education eligibility criteria of autism or school-identified social need,1 and (c) cognitive functioning (i.e., full-scale IQ) within 2.0 standard deviations of the mean. A sample of 33 students at six schools constituted the SCI-A group and 30 students at six schools constituted the BAU group for a total of 63 participants. Two students were dropped from analyses because of misreported IQ scores and one additional student was dropped because of a lack of data on outcome measures. The resulting sample includes 60 students (29 SCI-A and 31 BAU). Parent consent and student assent were obtained before the start of the study. Across all student participants, 55 students were male and 5 were female. The majority of participants met criteria for special education services, specifically 43.33% in the Autism category, 25% in the Emotional Disturbance category, and 20% in the Other Health Impairment category. Two students met eligibility for Specific Learning Disability, and one student met eligibility for Speech/Language Impairment. Four students did not have a current individualized education plan (IEP), and one student had a Section 504 Plan without an IEP.

Describe the analysis procedures for each reported type of validity.: Correlation coefficients were calculated to examine the relationship between each DBR and SDO target within each time point (i.e., pre, mid, post). hypothesized convergent relations corresponded to the pairings of (a) DBR-AE and SDO-AE, (b) DBR-DB and SDO-DB, (c) DBR-RS and SDO-NC. All other DBR-SDO pairings were hypothesized to be discriminant relations and were thus expected to be lower in magnitude relative to convergent relations. We followed Cohen’s (1988) guidelines for effect size interpretations of correlation magnitudes, where r ≥ .10 was considered small, r ≥ .30 medium, and r ≥ .50 large. In the interest of limiting over-interpretation of spurious or non-meaningful relations, conclusions regarding the presence of concurrent criterion-related validity were limited to medium and large correlations. Next, correlation coefficients were compared across time points within each DBR-SDO pairing to examine the extent to which correlation magnitude varied over time. This testing was accomplished via chi-square nested model comparisons between a model with correlations freely estimated across time to a model that specified correlational equivalence (H0: ρ1 = ρ2= ρ3). Finally, a single overall correlation was estimated and evaluated within each DBR-SDO pair to evaluate the relationship between each measure across all time points. All correlations were estimated with Mplus v. 7.11 (Muthén & Muthén, 1998–2013). Participants met the following inclusion criteria: (a) student aged 11 to 14, (b) diagnosis of ASD or Special Education eligibility criteria of autism or school-identified social need, and (c) cognitive functioning (i.e., full-scale IQ) within 2.0 standard deviations of the mean.

In the table(s) below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Decision Rules: Data to Support Intervention Change

Age / Grade: Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Are validated decision rules for when changes to the intervention need to be made specified in your manual or published materials?: No
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules?

Decision Rules: Data to Support Intervention Selection

Age / Grade: Informant	Grades K-5 Teacher	Grades 6-8 Teacher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Are validated decision rules for what intervention(s) to select specified in your manual or published materials?: No
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules?

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.

Summary

Tool Information
Descriptive Information
Administration
Training & Scoring
Usability

Performance Level
Reliability
Validity
Bias Analysis

Growth Standards
Sensitivity to Behavior Change
Reliability (Intensive Population)
Validity (Intensive Population)
Decision Rules

Data Collection Practices

DBR-SIS (Direct Behavior Rating - Single Item Scale)Disruptive Behavior

Summary

Tool Information

Descriptive Information

Acquisition and Cost Information

Administration

Training & Scoring

Training

Scoring

Levels of Performance and Usability

Performance Level

Reliability

Validity

Bias Analysis

Growth Standards

Sensitivity to Behavior Change

Reliability (Intensive Population): Reliability for Students in Need of Intensive Intervention

Validity (Intensive Population): Validity for Students in Need of Intensive Intervention

Decision Rules: Data to Support Intervention Change

Decision Rules: Data to Support Intervention Selection

Data Collection Practices

DBR-SIS (Direct Behavior Rating - Single Item Scale)
Disruptive Behavior