Momentary Time-Sampling
Academic Engagement

Summary

Descriptive Information

Momentary time-sampling (MTS) is a behavior assessment methodology within systematic direct observation in which an observation period is divided into intervals, and each interval is scored as an occurrence if the target behavior is occurring at the moment the interval begins or ends (depending on the specific procedures used). By dividing the number of intervals scored as occurrences by the total number of intervals, MTS provides an estimate of the proportion or percentage of an observation period during which a target behavior was occurring (i.e., “prevalence”). Depending on the characteristics of the underlying behavior, MTS may also be used to estimate the frequency of a target behavior (see Suen & Ary, 1989), although this use of MTS is uncommon in the social sciences literature. Like interval-recording procedures (i.e., partial-interval and whole-interval recording), MTS can be used with a range of interval lengths, observation durations, and target behaviors, depending on the assessment question. This review focuses on the use of MTS with 15-second intervals, with academic engagement (AE; defined as including both passive and active engagement, as described below) as the target behavior, and with individual students as the focus (rather than progress monitoring academic engagement for an entire class or small group). The studies reviewed are those that explicitly examine the reliability, validity, and levels of performance of data derived from 15-second MTS of AE in individual students.
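As a minimal sketch of the sampling rule described above, the Python below assumes a hypothetical per-second record of whether the target behavior is occurring (the kind of record a continuous observation would yield). It scores each 15-second interval from the behavior at the moment the interval ends (or begins), then compares the resulting MTS prevalence with the proportion of seconds actually spent in the behavior. The function names and the simulated behavior stream are illustrative assumptions, not part of any published MTS protocol.

```python
import random

def mts_record(stream, interval_s=15, sample_at="end"):
    """Score each complete interval of a per-second record using momentary time-sampling.

    stream: sequence of booleans, one per second (True = behavior occurring).
    sample_at: "end" scores the last second of each interval, "start" the first.
    """
    if sample_at not in ("start", "end"):
        raise ValueError("sample_at must be 'start' or 'end'")
    offset = interval_s - 1 if sample_at == "end" else 0
    n_intervals = len(stream) // interval_s  # a trailing partial interval is ignored
    return [bool(stream[i * interval_s + offset]) for i in range(n_intervals)]

def prevalence(record):
    """Proportion of intervals (or seconds) scored as occurrences."""
    return sum(record) / len(record)

def simulate_stream(seconds, mean_bout_s=40, p_engaged=0.7, seed=0):
    """Hypothetical behavior stream built from bouts of engaged / not-engaged behavior."""
    rng = random.Random(seed)
    stream = []
    while len(stream) < seconds:
        engaged = rng.random() < p_engaged
        bout = max(1, int(rng.expovariate(1 / mean_bout_s)))  # bout length in seconds
        stream.extend([engaged] * bout)
    return stream[:seconds]

if __name__ == "__main__":
    stream = simulate_stream(seconds=15 * 60)  # one hypothetical 15-minute observation
    record = mts_record(stream, interval_s=15)
    print(f"Continuous-record proportion: {prevalence(stream):.3f}")
    print(f"MTS estimate ({len(record)} intervals): {prevalence(record):.3f}")
```

Because MTS inspects only one moment per interval, the two proportions will usually differ somewhat for any single observation; the validity studies summarized later in this review quantify that discrepancy for AE.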

Acquisition & Cost

Where to Obtain:: John Hintze & William Matthews

Initial Cost:: Free

Replacement Cost:: Free

Included in Cost:: Momentary time-sampling is described in numerous books, articles, and presentations. Its methods are simple and transparent, so MTS may be considered free to use. However, training in MTS typically occurs within a graduate course on behavior assessment (e.g., in school psychology, special education, or applied behavior analysis), so some costs may be associated with its use. Such training is not necessarily a prerequisite for using the method, and its cost will vary by user (from free to thousands of dollars). The information provided on MTS in published tools varies widely.

Training & Technical Support

Training Requirements:: Training time varies by study but is generally 8 hours or more. Reported training times: Fellers & Saudargas, 1987, 12 hours (using SECOS system); Briesch, Chafouleas, & Riley-Tillman, 2010, 8 hours (using only MTS); Hintze & Matthews, 2004, 4 hours (using only MTS); Slate & Saudargas, 1986b, 13-15 hours (using SECOS system).

Qualified Administrators:: No minimum qualifications specified.

Access to Technical Support:: MTS is a common methodology for direct observation, and technical support should be available from any expert in behavior assessment.

Administration

Assessment Format:

Direct observation

Scoring Time:

Scoring is automatic OR

Scores Generated:

Administration Time:

minutes per

Scoring Method:

Manually (by hand)
Automatically (computer-scored)

Technology Requirements:

Tool Information

Descriptive Information


Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?: End-year goal
Short-term skill

The tool is intended for use with the following grade(s).

Preschool / Pre - kindergarten
selected

Kindergarten
selected

First grade
selected

Second grade
selected

Third grade
selected

Fourth grade
selected

Fifth grade
selected

Sixth grade
selected

Seventh grade
selected

Eighth grade
selected

Ninth grade
selected

Tenth grade
selected

Eleventh grade
selected

Twelfth grade

The tool is intended for use with the following age(s).

0-4 years old
selected

5 years old
selected

6 years old
selected

7 years old
selected

8 years old
selected

9 years old
selected

10 years old
selected

11 years old
selected

12 years old
selected

13 years old
selected

14 years old
selected

15 years old
selected

16 years old
selected

17 years old
selected

18 years old

The tool is intended for use with the following student populations.

Students in general education
selected

Students with disabilities
selected

English language learners

ACADEMIC ONLY: What dimensions does the tool assess?

Reading

Global Indicator of Reading Competence
not selected

Listening Comprehension
not selected

Vocabulary
not selected

Phonemic Awareness
not selected

Decoding

Passage Reading
not selected

Word Identification
not selected

Comprehension

Spelling & Written Expression

Global Indicator of Spelling Competence
not selected

Global Indicator of Written Expression Competence

Mathematics

Global Indicator of Mathematics Comprehension
not selected

Early Numeracy
not selected

Mathematics Concepts
not selected

Mathematics Computation
not selected

Mathematics Application
not selected

Fractions

Algebra

Other

Please describe specific domain, skills or subtests:

BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.: This review focuses on examinations of the properties of data derived from 15-second MTS for measuring AE, with the construct defined as including both active (e.g., writing on a piece of paper) and passive (e.g., looking at the teacher during a lecture) engagement. In other words, a student displaying either active or passive engagement is scored as engaged in AE.

BEHAVIOR ONLY: Which category of behaviors does your tool target?: Externalizing

Acquisition and Cost Information

Where to obtain:

Email Address
Address
Phone Number
Website

Initial cost for implementing program:

Cost: $0.00
Unit of cost

Replacement cost per unit for subsequent use:

Cost: $0.00
Unit of cost
Duration of license

Additional cost information:

Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.: Momentary time-sampling is described in numerous books, articles, and presentations. Its methods are simple and transparent, so MTS may be considered free to use. However, training in MTS typically occurs within a graduate course on behavior assessment (e.g., in school psychology, special education, or applied behavior analysis), so some costs may be associated with its use. Such training is not necessarily a prerequisite for using the method, and its cost will vary by user (from free to thousands of dollars).

Provide information about special accommodations for students with disabilities.: The information provided on MTS in published tools varies widely.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?

General education teacher
selected

Special education teacher
not selected

Parent

Child

External observer
not selected

Other

If other, please specify:

BEHAVIOR ONLY: What is the administration format?

Direct observation
not selected

Rating scale
not selected

Checklist

Performance measure
not selected

Other

If other, please specify:

BEHAVIOR ONLY: What is the administration setting?

General education classroom
selected

Special education classroom
not selected

School office
selected

Recess

Lunchroom

Home

Other

If other, please specify:

Does the program require technology?

If yes, what technology is required to implement your program? (Select all that apply)

Computer or tablet
not selected

Internet connection
not selected

Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?

Individual

Small group If small group, n=

Large group If large group, n=

Computer-administered
not selected

Other

If other, please specify:

What is the administration time?

Time in minutes

per (student/group/other unit)

Additional scoring time:

Time in minutes

per (student/group/other unit)

How many alternate forms are available, if applicable?

Number of alternate forms

per (grade/level/unit)

ACADEMIC ONLY: What are the discontinue rules?

No discontinue rules provided
not selected

Basals

Ceilings

Other

If other, please specify:

BEHAVIOR ONLY: Can multiple students be rated concurrently by one administrator?

Yes

If yes, how many students can be rated concurrently?

Briesch, Hemphill, Volpe, and Daniels (2015) examined this issue. When 14 students were observed at a time, the resulting data most closely approximated criterion levels when students were observed individually in a random sequence by interval. Dart, Radley, Briesch, Furlow, and Cavell (2016) extended this work, finding that 15-s MTS procedures in which students were observed individually in either a fixed or a random sequence by interval both yielded data that closely approximated a criterion estimate of academic engagement for a 13-student classroom.
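The two observation orders compared in these studies can be sketched as schedules that assign one target student to each 15-second interval. In the Python below, the fixed sequence cycles through students in the same order, while the random sequence reshuffles the order on each pass through the class. That reshuffling rule, the student labels, and the helper names are assumptions made for illustration; the exact randomization procedures used in the cited studies are not described here.

```python
import random
from collections import defaultdict

def fixed_sequence(students, n_intervals):
    """One target student per interval, cycling through a fixed order."""
    return [students[i % len(students)] for i in range(n_intervals)]

def random_sequence(students, n_intervals, seed=None):
    """One target student per interval; the order is reshuffled on every pass
    through the class (an assumption), so each student is observed once per pass."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_intervals:
        order = list(students)
        rng.shuffle(order)
        schedule.extend(order)
    return schedule[:n_intervals]

def prevalence_by_student(schedule, scores):
    """Proportion of each student's sampled intervals scored as engaged."""
    hits, totals = defaultdict(int), defaultdict(int)
    for student, engaged in zip(schedule, scores):
        totals[student] += 1
        hits[student] += int(engaged)
    return {s: hits[s] / totals[s] for s in totals}

if __name__ == "__main__":
    students = [f"S{i}" for i in range(1, 14)]  # hypothetical 13-student class
    n = 20 * 60 // 15                           # 20-minute observation, 15-s intervals
    print("Fixed: ", fixed_sequence(students, n)[:13])
    print("Random:", random_sequence(students, n, seed=1)[:13])
```

Pooling all sampled intervals (the number scored as engaged divided by the number of intervals) gives a class-wide AE estimate. With 80 intervals and 13 students, each student is sampled only six or seven times, so per-student estimates from such schedules are coarse.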

Training & Scoring

Training

Is training for the administrator required?: Yes

Describe the time required for administrator training, if applicable:: Training time varies by study but is generally 8 hours or more. Reported training times: Fellers & Saudargas, 1987, 12 hours (using SECOS system); Briesch, Chafouleas, & Riley-Tillman, 2010, 8 hours (using only MTS); Hintze & Matthews, 2004, 4 hours (using only MTS); Slate & Saudargas, 1986b, 13-15 hours (using SECOS system).

Please describe the minimum qualifications an administrator must possess.: No minimum qualifications

Are training manuals and materials available?: Yes

Are training manuals/materials field-tested?: No

Are training manuals/materials included in cost of tools?: Yes
If No, please describe training costs:

Can users obtain ongoing professional and technical support?: Yes
If Yes, please describe how users can obtain support:: MTS is a common methodology for direct observation, and technical support should be available from any expert in behavior assessment.

Scoring

BEHAVIOR ONLY: What types of scores result from the administration of the assessment?

Score
Observation	Behavior Rating
Frequency Duration Interval Latency	Raw score

Conversion
Observation	Behavior Rating
Rate Percent	Standard score Subscale/ Subtest Composite Stanine Percentile ranks Normal curve equivalents IRT based scores

Interpretation
Observation	Behavior Rating
Error analysis Peer comparison Rate of change	Dev. benchmarks Age-Grade equivalent

How are scores calculated?

Manually (by hand)
selected

Automatically (computer-scored)
not selected

Other

If other, please specify:

Do you provide basis for calculating performance level scores?

Yes

What is the basis for calculating performance level and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
not selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

What types of performance level scores are available?

Raw score

Standard score
not selected

Percentile score
not selected

Grade equivalents
not selected

IRT-based score
not selected

Age equivalents
not selected

Stanines

Normal curve equivalents
not selected

Developmental benchmarks
not selected

Developmental cut points
not selected

Equated

Probability
not selected

Lexile score
not selected

Error analysis
not selected

Composite scores
not selected

Subscale/subtest scores
not selected

Other

If other, please specify:

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.: Typically, when interpreting level of performance, comparisons are made either within a student (“absolute” decision-making, using data collected for a specific student across time) or between a target student and a peer (“relative” decision-making, as in the Behavioral Observation of Students in Schools [BOSS]). Prevalence is the most common score derived from MTS; it is calculated by dividing the number of intervals scored as occurrences by the total number of intervals observed. Prevalence can be converted into a percentage (by multiplying it by 100) or into a duration estimate (by multiplying it by the length of the observation period). Frequency can also be calculated using formulas in Suen & Ary (1989) when certain criteria for the interval length and behavior stream are met.
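The prevalence, percentage, and duration conversions described above amount to a few lines of arithmetic. The Python below is a minimal sketch that assumes each interval is recorded as True (occurrence) or False. The 80-interval records for a target student and a comparison peer are invented for illustration, and the peer comparison mirrors only the general idea of relative decision-making, not any specific BOSS scoring rule.

```python
def mts_scores(record, observation_min):
    """Convert MTS interval scores (True = occurrence) into summary scores."""
    if not record:
        raise ValueError("record must contain at least one interval")
    prevalence = sum(record) / len(record)
    return {
        "prevalence": prevalence,                      # proportion of intervals
        "percentage": prevalence * 100,                # percentage of intervals
        "duration_min": prevalence * observation_min,  # estimated minutes in the behavior
    }

# Hypothetical 20-minute observation with 15-s intervals (80 intervals).
target = [True] * 48 + [False] * 32
peer = [True] * 68 + [False] * 12

t = mts_scores(target, observation_min=20)
p = mts_scores(peer, observation_min=20)
print(f"Target: {t['percentage']:.1f}% of intervals, ~{t['duration_min']:.1f} min engaged")
print(f"Peer:   {p['percentage']:.1f}% of intervals")
print(f"Target minus peer: {t['percentage'] - p['percentage']:.1f} percentage points")
```

With these invented records, the target student is engaged during 60% of intervals (an estimated 12 of 20 minutes), 25 percentage points below the peer's 85%.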

Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?: No

ACADEMIC ONLY: Do you provide benchmarks for the slopes?

ACADEMIC ONLY: Do you provide percentile ranks for the slopes?

What is the basis for calculating slope and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
not selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.

Levels of Performance and Usability

Are levels of performance specified in your manual or published materials?

Yes

If yes, specify the levels of performance and how they are used for progress monitoring:

[NOTE: These data are aggregated across students, as befits information regarding general levels of performance.]

Fellers & Saudargas, 1987. Observed behavior of two groups of 15 female students (LD and non-LD; total n = 30) in grades 2, 4, and 5 in public elementary schools. LD and non-LD students were matched by classroom (i.e., one student from each group was drawn from each classroom). Behavior was observed using the SECOS (State-Event Classroom Observation System), which applies 15-s MTS procedures to a combined definition of academic engagement called “schoolwork.” Students were observed at least three times for 20 minutes across two weeks. Percentage of total intervals during which “schoolwork” was indicated, as mean (M) and standard deviation (SD): LD group, M = 68.3%, SD = 12.7%; non-LD group, M = 73.9%, SD = 14.3%. Fellers, G., & Saudargas, R. A. (1987). Classroom Behaviors of LD and Nonhandicapped Girls. Learning Disability Quarterly, 10(3), 231. http://doi.org/10.2307/1510495

Slate & Saudargas, 1986a (“Differences in learning disabled and average students’ classroom behaviors”). Observed behavior of two groups of 14 male students (LD and non-LD; total n = 28) in grades 3, 4, and 5 in public elementary schools. LD group: White = 7, Black = 7; non-LD group: White = 6, Black = 8. Behavior was observed using the SECOS, with 15-s MTS procedures and the combined “schoolwork” definition of academic engagement. Students were observed four to six times for 20 minutes across 10 weeks. Percentage of total intervals during which “schoolwork” was indicated, as M and SD: LD group, M = 67.9%, SD = 12.1%; non-LD group, M = 68.1%, SD = 8.53%. Slate, J. R., & Saudargas, R. A. (1986). Differences in Learning Disabled and Average Students’ Classroom Behaviors. Learning Disability Quarterly, 9(1), 61. http://doi.org/10.2307/1510402

Slate & Saudargas, 1986b (“Differences in the classroom behaviors of behaviorally disordered and regular class children”). Observed behavior of two groups of 13 male students (behaviorally disordered [BD] and non-BD; total n = 26) in grades 3, 4, and 5 in public elementary schools. Behavior was observed using the SECOS, with 15-s MTS procedures and the combined “schoolwork” definition of academic engagement. Students were observed four times for 20 minutes, with each student’s observations occurring within a single two-week period. Percentage of total intervals during which “schoolwork” was indicated, as M and SD: BD group, M = 66.83%, SD = 14.38%; non-BD group, M = 67.52%, SD = 7.40%. Slate, J. R., & Saudargas, R. A. (1986). Differences in the classroom behaviors of behaviorally disordered and regular class children. Behavioral Disorders, 45–53.

Zigmond, Kerr, & Schaeffer, 1988. Observed behavior of three groups of students: students with LD, students with emotional disturbance (ED), and a control group. On-task behavior was observed using 15-s MTS procedures. Students were observed twice weekly for 30 minutes. LD group: n = 36 (male = 28, female = 8), grades 9 to 11. ED group: n = 8 (male = 7, female = 1), grades 9 to 12. Control group: typical students, randomly selected at each observation of a student with LD or ED. Number of intervals during which “on-task” was indicated, as M and SD (total intervals = 15): LD group, M = 8.49, SD = 2.734; ED group, M = 8.78, SD = 1.974; control group, M = 8.82, SD = 1.742. Zigmond, N., Kerr, M. M., & Schaeffer, A. (1988). Behavior patterns of learning disabled and non-learning-disabled adolescents in high school academic classes. Remedial and Special Education, 9(2), 6–11.
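Because Zigmond, Kerr, and Schaeffer (1988) report interval counts rather than percentages, a quick conversion helps when setting their results beside the SECOS studies above. The Python sketch below simply divides the reported means and SDs by the 15 total intervals stated for that study (dividing by a constant rescales the SD by the same factor). Definitions differ across studies (“on-task” versus SECOS “schoolwork”), so the converted values are only a rough point of comparison.

```python
# Means and SDs of intervals scored "on-task", as reported for
# Zigmond, Kerr, & Schaeffer (1988), out of 15 total intervals.
TOTAL_INTERVALS = 15
REPORTED = {
    "LD": (8.49, 2.734),
    "ED": (8.78, 1.974),
    "Control": (8.82, 1.742),
}

def to_percent(count, total=TOTAL_INTERVALS):
    """Express an interval count (or an SD of counts) as a percentage of intervals."""
    return 100 * count / total

for group, (mean_count, sd_count) in REPORTED.items():
    print(f"{group}: M = {to_percent(mean_count):.1f}%, SD = {to_percent(sd_count):.1f}%")
```

This places all three groups at roughly 57% to 59% of intervals on-task.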

What is the basis for specifying levels of performance?

Norm-referenced
not selected

Criterion-referenced
not selected

Other

If other, please specify:


If norm-referenced, describe the normative profile.

National representation (check all that apply):

Northeast:

New England

Middle Atlantic

Midwest:

East North Central

West North Central

South:

South Atlantic

East South Central

West South Central

West:

Mountain

Pacific

Local representation (please describe, including number of states)

Date: 1986-1988
Size: Small separate samples.

Gender (Percent)

Male
Female
Unknown

SES indicators (Percent)

Eligible for free or reduced-price lunch
Other SES Indicators

Race/Ethnicity (Percent)

White, Non-Hispanic
Black, Non-Hispanic
Hispanic
American Indian/Alaska Native
Asian/Pacific Islander
Other
Unknown

Disability classification (Please describe)
First language (Please describe)
Language proficiency status (Please describe)

If criterion-referenced, describe procedures for specifying levels of performance.

Describe any other procedures for specifying levels of performance.

Has a usability study been conducted on your tool (i.e., a study that examines the extent to which the tool is convenient and practicable for use?)

Yes

If yes, please describe, including the results:

The broader class of systematic direct observation (SDO) methodologies, which includes MTS, has been examined in a combined usability and social validity study by Riley-Tillman, Chafouleas, Briesch, and Eckert (2008). The total sample across two samples of school psychologists was 191 (92 in Study 1, 99 in Study 2). Most respondents worked in public schools (83.7% and 88.9% in Studies 1 and 2, respectively) and were female (76.1%, 74.7%); the most common credential was “Masters plus 30” (48.9%, 41.4%), and respondents were fairly evenly split across years in practice, urbanicity, and age group served. Responses to 16 Likert-type items (1 = strongly disagree, 6 = strongly agree) indicated that SDO procedures were generally perceived as acceptable to very acceptable (mean scores for positively worded items ranged from 4.4 to 5.1 across samples). Items addressing time demands and intrusiveness on teachers/staff, school psychologists, and the general classroom environment each began with the stem “The use of this technique was overly intrusive on…”; mean ratings on these items ranged from 2.0 to 2.8, indicating that respondents tended to disagree that the procedures were overly intrusive. Mean responses to the item “This technique provides a feasible method of assessing the effectiveness of an intervention” were 4.7 and 4.8 across samples. Riley-Tillman, T., Chafouleas, S., Briesch, A., & Eckert, T. (2008). Daily Behavior Report Cards and Systematic Direct Observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17(4), 313-327. doi:10.1007/s10864-008-9070-5

Has a social validity study been conducted on your tool (i.e., a study that examines the significance of goals, appropriateness of procedures (e.g., ethics, cost, practicality), and the importance of treatment effects)?

Yes

If yes, please describe, including the results:

Many of the items in the study described above (Riley-Tillman, Chafouleas, Briesch, & Eckert, 2008) relate to the social validity of SDO procedures. For instance, mean responses were 4.9 and 5.0 across samples for “This technique should prove effective in monitoring an intervention,” 4.7 and 4.4 for “Use of this technique was a good way to handle the child’s problems,” and 4.9 and 4.6 for “Overall, using this technique would be beneficial for the child.” Riley-Tillman, T., Chafouleas, S., Briesch, A., & Eckert, T. (2008). Daily Behavior Report Cards and Systematic Direct Observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17(4), 313-327. doi:10.1007/s10864-008-9070-5

Performance Level

Reliability

Age / Grade Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^d Disaggregated data available

*Offer a justification for each type of reliability reported, given the type and purpose of the tool.

*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.: BRIESCH, CHAFOULEAS, & RILEY-TILLMAN (2010): Examinee sample: 12 students. Mean age = 5 years 11 months. White = 10, African-American = 1, Asian = 1. Female = 7, Male = 5. SDO rater sample: 2 researchers, trained with videos to a criterion of 95% interobserver agreement (IOA; kappa = .89). Training lasted 8 hours.
WOOD, HOJNOSKI, LARACY, & OLSON (2015): Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38 – 65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary language English = 23, Spanish = 1. Special education services for speech/language = 6; special education services for unidentified needs = 1. Rater sample: 3 researchers, trained using three training videos, with a criterion of 85% agreement met on 3 consecutive videos.
ZAKSZESKI, HOJNOSKI, & WOOD (2017): Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38 – 66 months. Mean age = 51 months, SD = 8 months. White = 16. Primary language English = 23, Spanish = 1. Special education services = 7. Rater sample: 3 researchers, trained using three training videos to a criterion of 85% agreement on all behavior categories.
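Interobserver agreement criteria like those above are commonly computed interval by interval. The Python below is a minimal sketch, assuming each observer's MTS record is a list of True/False interval scores. Percentage agreement counts matching intervals, and Cohen's kappa corrects that agreement for the amount expected by chance given each observer's rate of scoring AE. The 20-interval records are invented, and the cited studies may have used different agreement formulas.

```python
def interval_agreement(obs_a, obs_b):
    """Interval-by-interval IOA: percentage of intervals on which both observers agree."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("records must be non-empty and the same length")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * agreements / len(obs_a)

def cohens_kappa(obs_a, obs_b):
    """Cohen's kappa for two observers' binary (engaged / not engaged) interval scores."""
    n = len(obs_a)
    p_observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n
    pa, pb = sum(obs_a) / n, sum(obs_b) / n  # each observer's rate of scoring "engaged"
    p_chance = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance alone
    if p_chance == 1:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical records: the observers disagree only on interval 14.
obs_a = [True] * 14 + [False] * 6
obs_b = [True] * 13 + [False] * 7
print(f"IOA = {interval_agreement(obs_a, obs_b):.1f}%")
print(f"kappa = {cohens_kappa(obs_a, obs_b):.2f}")
```

Here the observers agree on 19 of 20 intervals (95%), yet kappa is only about .89 because both observers score AE often, which raises the agreement expected by chance.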

*Describe the analysis procedures for each reported type of reliability.

*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:: *Briesch, Chafouleas, & Riley-Tillman, 2010 +Hintze & Matthews, 2004 ^(Johnson, Chafouleas, & Briesch, 2017)

Manual cites other published reliability studies:: Yes

Provide citations for additional published studies.: Briesch, A. M., Chafouleas, S. M., & Riley-Tillman, T. C. (2010). Generalizability and dependability of behavior assessment methods to estimate academic engagement: A comparison of systematic direct observation and direct behavior rating. School Psychology Review, 39(3), 408. Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911. Zakszeski, B. N., Hojnoski, R. L., & Wood, B. K. (2017). Considerations for Time Sampling Interval Durations in the Measurement of Young Children’s Classroom Engagement. Topics in Early Childhood Special Education, 37(1), 42-53.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Validity

Age / Grade Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^d Disaggregated data available

*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.

*Describe the sample(s), including size and characteristics, for each validity analysis conducted.: WOOD, HOJNOSKI, LARACY, & OLSON, 2015: Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38 – 65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary Language of English = 23, Spanish = 1. Special education services for speech/language = 6. Special education services for unidentified needs = 1. Rater sample: 3 researchers. Trained using three training videos with criterion of 85% agreement on 3 consecutive videos met. SAUDARGAS & ZANOLLI, 1990: Examinee sample: 16 students. Grade 1 = 2, Grade 2 = 1, Grade 3 = 5, Grade 4 = 8. Rater sample: 2 graduate students. Trained using videotapes.

*Describe the analysis procedures for each reported type of validity.

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:: *Wood, Hojnoski, Laracy, & Olson, 2015 +Zakszeski, Hojnoski, & Wood, 2017 ^Saudargas & Zanolli, 1990

Manual cites other published validity studies:: Yes

Provide citations for additional published studies.: Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911. Zakszeski, B. N., Hojnoski, R. L., & Wood, B. K. (2017). Considerations for Time Sampling Interval Durations in the Measurement of Young Children’s Classroom Engagement. Topics in Early Childhood Special Education, 37(1), 42-53.

Describe the degree to which the provided data support the validity of the tool.: As with sensitivity to change, validity evidence for estimates of academic engagement derived from 15-second MTS procedures is sparse, given that time-sampling procedures in general, and MTS specifically, are often viewed as a gold-standard measure when continuous observation is not feasible. Recently, however, Wood, Hojnoski, Laracy, and Olson (2015) compared MTS-derived estimates of prevalence with those derived from continuous observation. MTS produced the least error-prone estimates when compared with partial-interval (PI) and whole-interval (WI) recording. Mean absolute error across students was 6.28%, whereas mean signed error (which preserves the direction of over- or underestimation) was -3.35%. The Pearson correlation between MTS-derived estimates and continuous-observation estimates was .83, and Spearman’s rho, a nonparametric rank-order correlation, was .71 when MTS-derived estimates were compared with expert rankings of student engagement. In a follow-up study, Zakszeski, Hojnoski, and Wood (2017) again compared MTS-derived estimates of prevalence with those from continuous observation. The Pearson correlation was .890 (p < .01), with an observed measurement error of 2.04% (the percentage derived from continuous observation subtracted from the percentage derived from MTS). In a less quantitative study, Saudargas and Zanolli (1990) used visual analysis to compare engagement estimates derived from continuous observation and MTS. In almost all cases, trends in the two data series were consistent across days, even when their levels differed. Quantitative results reported by the authors indicate that scores from MTS and continuous observation differed by less than 9% for 18 of 22 observations (82%).
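The error statistics summarized above can be illustrated with a small sketch. The Python below assumes hypothetical per-student percentages from MTS and from continuous observation and, following the convention described for Zakszeski, Hojnoski, and Wood (2017), defines error as the MTS percentage minus the continuous-observation percentage. It reports mean signed error, mean absolute error, the Pearson correlation, and the share of observations whose discrepancy falls below a threshold (9 points here, treating the Saudargas and Zanolli "9%" as percentage points, which is an assumption). The data and function names are invented for illustration.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def error_summary(mts_pct, continuous_pct, threshold=9.0):
    """Compare MTS estimates with continuous-observation estimates (both in %)."""
    errors = [m - c for m, c in zip(mts_pct, continuous_pct)]  # MTS minus criterion
    return {
        "mean_signed_error": mean(errors),                      # negative = underestimates on average
        "mean_absolute_error": mean([abs(e) for e in errors]),  # size of error, direction ignored
        "pearson_r": pearson_r(mts_pct, continuous_pct),
        "share_within_threshold": sum(abs(e) < threshold for e in errors) / len(errors),
    }

# Hypothetical percentages engaged for five students.
mts = [60, 75, 40, 85, 55]
continuous = [65, 72, 48, 83, 58]
print(error_summary(mts, continuous))
```

Mean signed and mean absolute error can diverge sharply: overestimates and underestimates cancel in the former but not in the latter, which is why studies often report both.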

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published validity studies:: No

Provide citations for additional published studies.

Bias Analysis

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating	Not Provided	Not Provided

Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.: No

If yes,
a. Describe the method used to determine the presence or absence of bias:

b. Describe the subgroups for which bias analyses were conducted:

c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.

Growth Standards

Sensitivity to Behavior Change

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

d = Disaggregated data available

Describe evidence that the monitoring system produces data that are sensitive to detect incremental change (e.g., small behavior change in a short period of time such as every 20 days, or more frequently depending on the purpose of the construct). Evidence should be drawn from samples targeting the specific population that would benefit from intervention. Include in this example a hypothetical illustration (with narrative and/or graphics) of how these data could be used to monitor student performance frequently enough and with enough sensitivity to accurately assess change:

Reliability (Intensive Population): Reliability for Students in Need of Intensive Intervention

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Offer a justification for each type of reliability reported, given the type and purpose of the tool:

Describe the sample(s), including size and characteristics, for each reliability analysis conducted:

Describe the analysis procedures for each reported type of reliability:

In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Report results by age range or grade level (if relevant) and include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Validity (Intensive Population): Validity for Students in Need of Intensive Intervention

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.

Describe the sample(s), including size and characteristics, for each validity analysis conducted.

Describe the analysis procedures for each reported type of validity.

In the table(s) below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Decision Rules: Data to Support Intervention Change

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Are validated decision rules for when changes to the intervention need to be made specified in your manual or published materials?: No
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules?

Decision Rules: Data to Support Intervention Selection

Age / Grade: Informant	Early childhood / K Researcher	Grades K-5 Researcher
Rating

Are validated decision rules for what intervention(s) to select specified in your manual or published materials?: No
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules?

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.
