Momentary Time-Sampling
Academic Engagement

Summary

Momentary time-sampling (MTS) is a behavior assessment methodology within systematic direct observation wherein an observation period is divided into intervals, and behavior during each interval is scored as an occurrence if the behavior is occurring at the moment the interval begins or ends (depending on the specific procedures used). By dividing intervals scored as occurrences by the total number of intervals, MTS provides an estimate of the proportion or percentage of an observation period during which a target behavior was occurring (i.e., “prevalence”). Depending on the characteristics of the underlying behavior, MTS may also be used to estimate the frequency of a target behavior (see Suen & Ary, 1989). However, this latter use of MTS is uncommon in the social sciences literature. Like interval-recording procedures (i.e., partial-interval and whole-interval recording), MTS can be used with a number of interval lengths, observation durations, and target behaviors depending upon the target assessment question. This review focuses on the use of MTS with 15-second intervals, with a target behavior of academic engagement (or AE; defined as including both passive and active engagement, as described below) and a focus on individual students (rather than progress monitoring academic engagement for an entire class or small group). The studies reviewed are those that explicitly examine the reliability, validity, and levels of performance of data derived from MTS with 15-second intervals for AE with individual students as the target.
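To make the procedure concrete, the sketch below (Python; a hypothetical illustration, not part of any published MTS tool) scores one momentary sample at the start of each 15-second interval and returns the resulting prevalence estimate. The helper is_behavior_occurring() is an assumption standing in for the observer's momentary judgment of the target behavior.

    # A minimal sketch of one MTS session with 15-second intervals.
    # is_behavior_occurring() is hypothetical: it represents the observer's
    # judgment of whether the target behavior is occurring at this moment.
    import time

    def run_mts_session(is_behavior_occurring, n_intervals=60, interval_s=15):
        """Return prevalence: intervals scored as occurrences / total intervals."""
        occurrences = 0
        for _ in range(n_intervals):
            if is_behavior_occurring():   # momentary sample at interval onset
                occurrences += 1
            time.sleep(interval_s)        # wait out the remainder of the interval
        return occurrences / n_intervals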

Where to Obtain:
John Hintze & William Matthews
Initial Cost:
Free
Replacement Cost:
Free
Included in Cost:
Momentary time-sampling is described in numerous books, articles, and presentations. Its methods are simple and transparent, and as a result, MTS may be considered free to use. Training in the use of MTS typically occurs within the context of a graduate course on behavior assessment (e.g., school psychology, special education, applied behavior analysis), so some costs may be associated with its use. Such training is not a prerequisite to using the procedure, however, and its cost will vary by user (from free to thousands of dollars).
The information provided on MTS in published tools varies widely.
Training Requirements:
Generally, 8 hours or more; training time varies by study:
  • Fellers & Saudargas (1987): 12 hours (using the SECOS system)
  • Briesch, Chafouleas, & Riley-Tillman (2010): 8 hours (using only MTS)
  • Hintze & Matthews (2004): 4 hours (using only MTS)
  • Slate & Saudargas (1986b): 13-15 hours (using the SECOS system)
Qualified Administrators:
No minimum qualifications specified.
Access to Technical Support:
MTS is a common methodology for direct observation, and technical support should be available from any expert in behavior assessment.
Assessment Format:
  • Direct observation
Scoring Time:
  • Scoring is automatic
Scores Generated:
Administration Time:
Scoring Method:
  • Manually (by hand)
  • Automatically (computer-scored)
Technology Requirements:

Tool Information

Descriptive Information

Please provide a description of your tool:
Momentary time-sampling (MTS) is a behavior assessment methodology within systematic direct observation wherein an observation period is divided into intervals, and behavior during each interval is scored as an occurrence if the behavior is occurring at the moment the interval begins or ends (depending on the specific procedures used). By dividing intervals scored as occurrences by the total number of intervals, MTS provides an estimate of the proportion or percentage of an observation period during which a target behavior was occurring (i.e., “prevalence”). Depending on the characteristics of the underlying behavior, MTS may also be used to estimate the frequency of a target behavior (see Suen & Ary, 1989). However, this latter use of MTS is uncommon in the social sciences literature. Like interval-recording procedures (i.e., partial-interval and whole-interval recording), MTS can be used with a number of interval lengths, observation durations, and target behaviors depending upon the target assessment question. This review focuses on the use of MTS with 15-second intervals, with a target behavior of academic engagement (or AE; defined as including both passive and active engagement, as described below) and a focus on individual students (rather than progress monitoring academic engagement for an entire class or small group). The studies reviewed are those that explicitly examine the reliability, validity, and levels of performance of data derived from MTS with 15-second intervals for AE with individual students as the target.
Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?
not selected
not selected
The tool is intended for use with the following grade(s).
selected Preschool / Pre - kindergarten
selected Kindergarten
selected First grade
selected Second grade
selected Third grade
selected Fourth grade
selected Fifth grade
selected Sixth grade
selected Seventh grade
selected Eighth grade
selected Ninth grade
selected Tenth grade
selected Eleventh grade
selected Twelfth grade

The tool is intended for use with the following age(s).
selected 0-4 years old
selected 5 years old
selected 6 years old
selected 7 years old
selected 8 years old
selected 9 years old
selected 10 years old
selected 11 years old
selected 12 years old
selected 13 years old
selected 14 years old
selected 15 years old
selected 16 years old
selected 17 years old
selected 18 years old

The tool is intended for use with the following student populations.
selected Students in general education
selected Students with disabilities
selected English language learners

ACADEMIC ONLY: What dimensions does the tool assess?

Reading
not selected Global Indicator of Reading Competence
not selected Listening Comprehension
not selected Vocabulary
not selected Phonemic Awareness
not selected Decoding
not selected Passage Reading
not selected Word Identification
not selected Comprehension

Spelling & Written Expression
not selected Global Indicator of Spelling Competence
not selected Global Indicator of Written Expression Competence

Mathematics
not selected Global Indicator of Mathematics Comprehension
not selected Early Numeracy
not selected Mathematics Concepts
not selected Mathematics Computation
not selected Mathematics Application
not selected Fractions
not selected Algebra

Other
Please describe specific domain, skills or subtests:


BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
This review focuses on examinations of the properties of data derived from MTS with 15-second intervals for measuring AE, when this construct is defined as including both active (e.g., writing on a piece of paper) and passive (e.g., looking at the teacher during a lecture) engagement. In other words, a student displaying either active or passive engagement is considered academically engaged.
BEHAVIOR ONLY: Which category of behaviors does your tool target?
Externalizing

Acquisition and Cost Information

Where to obtain:
Email Address
Address
Phone Number
Website
Initial cost for implementing program:
Cost
$0.00
Unit of cost
Replacement cost per unit for subsequent use:
Cost
$0.00
Unit of cost
Duration of license
Additional cost information:
Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.
Momentary time-sampling is described in numerous books, articles, and presentations. Its methods are simple and transparent, and as a result, MTS may be considered free to use. Training in the use of MTS typically occurs within the context of a graduate course on behavior assessment (e.g., school psychology, special education, applied behavior analysis), so some costs may be associated with its use. Such training is not a prerequisite to using the procedure, however, and its cost will vary by user (from free to thousands of dollars).
Provide information about special accommodations for students with disabilities.
The information provided on MTS in published tools varies widely.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?
selected
selected
not selected
not selected
selected
not selected
If other, please specify:

BEHAVIOR ONLY: What is the administration format?
selected
not selected
not selected
not selected
not selected
If other, please specify:

BEHAVIOR ONLY: What is the administration setting?
selected
selected
not selected
selected
selected
selected
not selected
If other, please specify:

Does the program require technology?

If yes, what technology is required to implement your program? (Select all that apply)
not selected
not selected
not selected

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?
selected
selected    If small group, n=
not selected    If large group, n=
not selected
not selected
If other, please specify:

What is the administration time?
Time in minutes
per (student/group/other unit)

Additional scoring time:
Time in minutes
per (student/group/other unit)

How many alternate forms are available, if applicable?
Number of alternate forms
per (grade/level/unit)

ACADEMIC ONLY: What are the discontinue rules?
not selected
not selected
not selected
not selected
If other, please specify:

BEHAVIOR ONLY: Can multiple students be rated concurrently by one administrator?

If yes, how many students can be rated concurrently?
Briesch, Hemphill, Volpe, & Daniels (2015) examined this issue, finding that data derived from observing 14 students at a time most closely approximated criterion levels when students were individually observed in a random sequence by interval. Dart, Radley, Briesch, Furlow, & Cavell (2016) extended this work, finding that 15s MTS procedures in which students were individually observed in either a fixed or a random sequence by interval both produced data closely approximating a criterion estimate of academic engagement for a 13-student classroom.
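A minimal sketch of such a rotated procedure appears below (hypothetical; observe(student) stands in for a single momentary observation of the named student and is not a function from any published tool):

    # Rotated 15s MTS across several students: one student per interval,
    # in either a fixed or a per-cycle randomized sequence.
    import random

    def rotated_mts(students, n_intervals, observe, randomize=True):
        """Return a per-student prevalence estimate."""
        counts = {s: 0 for s in students}
        samples = {s: 0 for s in students}
        order = list(students)
        for i in range(n_intervals):
            if randomize and i % len(order) == 0:
                random.shuffle(order)      # new random sequence each cycle
            student = order[i % len(order)]
            samples[student] += 1
            if observe(student):           # momentary sample for this student
                counts[student] += 1
        return {s: counts[s] / samples[s] for s in students if samples[s]}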

Training & Scoring

Training

Is training for the administrator required?
Yes
Describe the time required for administrator training, if applicable:
Generally, 8 hours or more; training time varies by study:
  • Fellers & Saudargas (1987): 12 hours (using the SECOS system)
  • Briesch, Chafouleas, & Riley-Tillman (2010): 8 hours (using only MTS)
  • Hintze & Matthews (2004): 4 hours (using only MTS)
  • Slate & Saudargas (1986b): 13-15 hours (using the SECOS system)
Please describe the minimum qualifications an administrator must possess.
selected No minimum qualifications
Are training manuals and materials available?
Yes
Are training manuals/materials field-tested?
No
Are training manuals/materials included in cost of tools?
Yes
If No, please describe training costs:
Can users obtain ongoing professional and technical support?
Yes
If Yes, please describe how users can obtain support:
MTS is a common methodology for direct observation, and technical support should be available from any expert in behavior assessment.

Scoring

BEHAVIOR ONLY: What types of scores result from the administration of the assessment?
Score
Observation Behavior Rating
selected Frequency
selected Duration
not selected Interval
not selected Latency
not selected Raw score
Conversion
Observation Behavior Rating
not selected Rate
selected Percent
not selected Standard score
not selected Subscale/ Subtest
not selected Composite
not selected Stanine
not selected Percentile ranks
not selected Normal curve equivalents
not selected IRT based scores
Interpretation
Observation Behavior Rating
not selected Error analysis
selected Peer comparison
not selected Rate of change
not selected Dev. benchmarks
not selected Age-Grade equivalent
How are scores calculated?
selected Manually (by hand)
selected Automatically (computer-scored)
not selected Other
If other, please specify:

Do you provide basis for calculating performance level scores?
Yes

What is the basis for calculating performance level and percentile scores?
not selected Age norms
not selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

What types of performance level scores are available?
not selected Raw score
not selected Standard score
not selected Percentile score
not selected Grade equivalents
not selected IRT-based score
not selected Age equivalents
not selected Stanines
not selected Normal curve equivalents
not selected Developmental benchmarks
not selected Developmental cut points
not selected Equated
not selected Probability
not selected Lexile score
not selected Error analysis
not selected Composite scores
not selected Subscale/subtest scores
not selected Other
If other, please specify:

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
Typically, when calculating level of performance, comparisons are made within-student ("absolute" decision-making, using data collected for a specific student across time) or between a target student and a peer ("relative" decision-making, as takes place within the Behavioral Observation of Students in Schools [BOSS]). Prevalence is the most frequent score resulting from MTS and may be calculated by summing the number of intervals scored as an occurrence and dividing this value by the total number of intervals observed. Prevalence can be converted into a percentage (by multiplying prevalence by 100) or into a duration estimate (by multiplying prevalence by the length of the observation period). Frequency can also be calculated according to formulas found in Suen & Ary (1989) when certain criteria for the interval length and behavior stream are met.
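The conversions described above are simple arithmetic; the following sketch (Python, with hypothetical data) shows each one for a 15-minute observation divided into 60 fifteen-second intervals:

    # Hypothetical MTS record: True = behavior occurring at the sampled moment.
    intervals = [True] * 42 + [False] * 18        # 60 intervals total

    prevalence = sum(intervals) / len(intervals)  # 42 / 60 = 0.70
    percentage = prevalence * 100                 # 70.0% of the observation
    duration_min = prevalence * 15                # ~10.5 of 15 observed minutes
    print(f"{percentage:.1f}% engaged, ~{duration_min:.1f} minutes")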
Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?
No
ACADEMIC ONLY: Do you provide benchmarks for the slopes?
ACADEMIC ONLY: Do you provide percentile ranks for the slopes?
What is the basis for calculating slope and percentile scores?
not selected Age norms
not selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.

Levels of Performance and Usability

Are levels of performance specified in your manual or published materials?
Yes
If yes, specify the levels of performance and how they are used for progress monitoring:
[NOTE: These data are aggregated across students, as befits information regarding general levels of performance.]

Fellers & Saudargas (1987). Observed the behavior of two groups of 15 female students (LD and non-LD; total n = 30) across grades 2, 4, and 5 from public elementary schools. LD and non-LD students were matched based on classroom (i.e., one student for each group drawn from each classroom). Observed using the SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed at least three times for 20 minutes across two weeks. Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD): LD group, M = 68.3%, SD = 12.7%; non-LD group, M = 73.9%, SD = 14.3%.
Fellers, G., & Saudargas, R. A. (1987). Classroom behaviors of LD and nonhandicapped girls. Learning Disability Quarterly, 10(3), 231. http://doi.org/10.2307/1510495

Slate & Saudargas (1986a; “Differences in learning disabled and average students’ classroom behaviors”). Observed the behavior of two groups of 14 male students (LD and non-LD; total n = 28) across grades 3, 4, and 5 from public elementary schools. Of the LD group, White = 7, Black = 7; of the non-LD group, White = 6, Black = 8. Observed using the SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed four to six times for 20 minutes across 10 weeks. Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD): LD group, M = 67.9%, SD = 12.1%; non-LD group, M = 68.1%, SD = 8.53%.
Slate, J. R., & Saudargas, R. A. (1986). Differences in learning disabled and average students’ classroom behaviors. Learning Disability Quarterly, 9(1), 61. http://doi.org/10.2307/1510402

Slate & Saudargas (1986b; “Differences in the classroom behaviors of behaviorally disordered and regular class children”). Observed the behavior of two groups of 13 male students (behaviorally disordered [BD] and non-BD; total n = 26) across grades 3, 4, and 5 from public elementary schools. Observed using the SECOS system, which utilizes a combined definition of academic engagement called “schoolwork” with 15s MTS procedures. Students were observed four times for 20 minutes, with each student’s observations occurring within a single two-week period. Percentage of total intervals during which “seatwork” was indicated, as mean (M) and standard deviation (SD): BD group, M = 66.83%, SD = 14.38%; non-BD group, M = 67.52%, SD = 7.40%.
Slate, J. R., & Saudargas, R. A. (1986). Differences in the classroom behaviors of behaviorally disordered and regular class children. Behavioral Disorders, 45–53.

Zigmond, Kerr, & Schaeffer (1988). Observed the behavior of three groups of students: students with LD (n = 36; Male = 28, Female = 8; grades 9 to 11), students with emotional disturbance (ED; n = 8; Male = 7, Female = 1; grades 9 to 12), and a control group of typical students, randomly selected at each observation of a student with LD or ED. Observed on-task behavior using 15s MTS procedures. Students were observed twice weekly for 30 minutes. Number of total intervals (of 15) during which “on-task” was indicated, as mean (M) and standard deviation (SD): LD group, M = 8.49, SD = 2.734; ED group, M = 8.78, SD = 1.974; control group, M = 8.82, SD = 1.742.
Zigmond, N., Kerr, M. M., & Schaeffer, A. (1988). Behavior patterns of learning disabled and non-learning-disabled adolescents in high school academic classes. Remedial and Special Education, 9(2), 6–11.

What is the basis for specifying levels of performance?
selected
not selected
not selected Other
If other, please specify:

If norm-referenced, describe the normative profile.

National representation (check all that apply):
Northeast:
not selected New England
not selected Middle Atlantic
Midwest:
not selected East North Central
not selected West North Central
South:
not selected South Atlantic
not selected East South Central
not selected West South Central
West:
not selected Mountain
not selected Pacific

Local representation (please describe, including number of states)
Date
1986-1988
Size
Small separate samples.
Gender (Percent)
Male
Female
Unknown
SES indicators (Percent)
Eligible for free or reduced-price lunch
Other SES Indicators
Race/Ethnicity (Percent)
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
American Indian/Alaska Native
Asian/Pacific Islander
Other
Unknown
Disability classification (Please describe)


First language (Please describe)


Language proficiency status (Please describe)
If criterion-referenced, describe procedures for specifying levels of performance.

Describe any other procedures for specifying levels of performance.

Has a usability study been conducted on your tool (i.e., a study that examines the extent to which the tool is convenient and practicable for use?)

If yes, please describe, including the results:
The broader class of systematic direct observation (SDO) methodologies, which includes MTS, has been examined in a combined usability and social validity study conducted by Riley-Tillman, Chafouleas, Briesch, and Eckert (2008). The total sample size across two samples of school psychologists was 191 (92 in Study 1, 99 in Study 2). Most respondents worked in public schools (83.7% and 88.9% by study), were female (76.1%, 74.7%), practiced with a “Masters plus 30” credential (48.9%, 41.4%), and were fairly evenly split across years in practice, urbanicity, and age group served. Results from responses to 16 Likert-type items (1 = strongly disagree, 6 = strongly agree) indicated that SDO procedures were generally perceived as acceptable to very acceptable (mean scores for positively worded items were 4.4 to 5.1 across samples). Items addressing the time taken from, and intrusiveness upon, teachers/staff, school psychologists, and the general classroom environment (each beginning with the stem “The use of this technique was overly intrusive on…”) had means of 2.0 to 2.8 on the same scale, indicating low to moderate perceived intrusiveness. Mean responses to the item “This technique provides a feasible method of assessing the effectiveness of an intervention” were 4.7 and 4.8 across samples.
Riley-Tillman, T., Chafouleas, S., Briesch, A., & Eckert, T. (2008). Daily behavior report cards and systematic direct observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17(4), 313–327. doi:10.1007/s10864-008-9070-5


Has a social validity study been conducted on your tool (i.e., a study that examines the significance of goals, appropriateness of procedures (e.g., ethics, cost, practicality), and the importance of treatment effects)?
Yes
If yes, please describe, including the results:
Many of the items in the study described above (Riley-Tillman, Chafouleas, Briesch, & Eckert, 2008) relate specifically to the social validity of SDO procedures. For instance, mean responses across samples were 4.9 and 5.0 for “This technique should prove effective in monitoring an intervention,” 4.7 and 4.4 for “Use of this technique was a good way to handle the child’s problems,” and 4.9 and 4.6 for “Overall, using this technique would be beneficial for the child.”
Riley-Tillman, T., Chafouleas, S., Briesch, A., & Eckert, T. (2008). Daily behavior report cards and systematic direct observation: An investigation of the acceptability, reported training and use, and decision reliability among school psychologists. Journal of Behavioral Education, 17(4), 313–327. doi:10.1007/s10864-008-9070-5

Performance Level

Reliability

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Convincing evidence
Age / Grade: Grades K-5; Informant: Researcher; Rating: Convincing evidence
*Offer a justification for each type of reliability reported, given the type and purpose of the tool.
*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
BRIESCH, CHAFOULEAS, & RILEY-TILLMAN (2010): Examinee sample: 12 students. Mean age = 5 years 11 months. White = 10, African-American = 1, Asian = 1. Female = 7, Male = 5. SDO rater sample: 2 researchers, trained with videos to a 95% interobserver agreement (IOA) criterion (kappa = .89); training lasted 8 hours.

WOOD, HOJNOSKI, LARACY, & OLSON (2015): Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38-65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary language English = 23, Spanish = 1. Special education services for speech/language = 6; special education services for unidentified needs = 1. Rater sample: 3 researchers, trained using three training videos until a criterion of 85% agreement on 3 consecutive videos was met.

ZAKSZESKI, HOJNOSKI, & WOOD (2017): Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38-66 months. Mean age = 51 months, SD = 8 months. White = 16. Primary language English = 23, Spanish = 1. Special education services = 7. Rater sample: 3 researchers, trained using three training videos to a criterion of 85% agreement on all behavior categories.
*Describe the analysis procedures for each reported type of reliability.

*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
*Briesch, Chafouleas, & Riley-Tillman (2010); +Hintze & Matthews (2004); ^Johnson, Chafouleas, & Briesch (2017)
Manual cites other published reliability studies:
Yes
Provide citations for additional published studies.
Briesch, A. M., Chafouleas, S. M., & Riley-Tillman, T. C. (2010). Generalizability and dependability of behavior assessment methods to estimate academic engagement: A comparison of systematic direct observation and direct behavior rating. School Psychology Review, 39(3), 408. Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911. Zakszeski, B. N., Hojnoski, R. L., & Wood, B. K. (2017). Considerations for Time Sampling Interval Durations in the Measurement of Young Children’s Classroom Engagement. Topics in Early Childhood Special Education, 37(1), 42-53.
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.

Validity

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Partially convincing evidence
Age / Grade: Grades K-5; Informant: Researcher; Rating: Partially convincing evidence
*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
*Describe the sample(s), including size and characteristics, for each validity analysis conducted.
WOOD, HOJNOSKI, LARACY, & OLSON (2015): Examinee sample: 24 children. Female = 11, Male = 13. Age range = 38-65 months. Mean age = 51 months, SD = 8.30 months. Majority Caucasian. Primary language English = 23, Spanish = 1. Special education services for speech/language = 6; special education services for unidentified needs = 1. Rater sample: 3 researchers, trained using three training videos until a criterion of 85% agreement on 3 consecutive videos was met.

SAUDARGAS & ZANOLLI (1990): Examinee sample: 16 students. Grade 1 = 2, Grade 2 = 1, Grade 3 = 5, Grade 4 = 8. Rater sample: 2 graduate students, trained using videotapes.
*Describe the analysis procedures for each reported type of validity.

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of validity analysis not compatible with above table format:
*Wood, Hojnoski, Laracy, & Olson (2015); +Zakszeski, Hojnoski, & Wood (2017); ^Saudargas & Zanolli (1990)
Manual cites other published validity studies:
Yes
Provide citations for additional published studies.
Wood, B. K., Hojnoski, R. L., Laracy, S. D., & Olson, C. L. (2015). Comparison of Observational Methods and Their Relation to Ratings of Engagement in Young Children. Topics in Early Childhood Special Education, 0271121414565911. Zakszeski, B. N., Hojnoski, R. L., & Wood, B. K. (2017). Considerations for Time Sampling Interval Durations in the Measurement of Young Children’s Classroom Engagement. Topics in Early Childhood Special Education, 37(1), 42-53.
Describe the degree to which the provided data support the validity of the tool.
As is true for information regarding sensitivity to change, validity evidence for estimates of academic engagement derived from 15-second MTS procedures is sparse, given that time-sampling procedures in general, and MTS specifically, are often viewed as a gold-standard measure when continuous observation is not feasible. Recently, however, Wood, Hojnoski, Laracy, and Olson (2015) examined the error of MTS-derived estimates of prevalence when compared to those derived from continuous observation. MTS was found to produce the least error-prone estimates when compared to partial-interval (PI) and whole-interval (WI) sampling. Absolute mean error (across students) was 6.28%, while mean measurement error that preserved the direction of over/underestimation was -3.35%. The Pearson correlation coefficient between MTS-derived estimates and those from continuous observation was .83, and Spearman’s rho, a non-parametric rank-order correlation coefficient, was .71 when MTS-derived estimates were compared to expert rankings of student engagement. In a follow-up to this study, Zakszeski, Hojnoski, and Wood (2017) examined the error of MTS-derived estimates of prevalence when compared to those derived from continuous observation. The Pearson correlation coefficient between MTS-derived estimates and those from continuous observation was .890 (p < .01), with an observed measurement error of 2.04% (the percentage derived from continuous observation subtracted from the percentage derived from MTS). In a less quantitative study, Saudargas and Zanolli (1990) used visual analysis to examine patterns of engagement estimates derived from both continuous observation and MTS. In almost all cases, trends in the two data patterns were consistent across days, even when level was discrepant. Quantitative results reported by the authors indicate a discrepancy of less than 9% between scores derived from MTS and continuous observation for 18 of 22 observations (82%).
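To illustrate how a measurement-error figure of this kind is computed (a simulation sketch with made-up data, not a reproduction of the cited studies), one can generate a second-by-second behavior stream, treat the full record as the continuous-observation criterion, and compare it against estimates from every-15-seconds momentary samples:

    # Compare an MTS prevalence estimate against "continuous observation".
    import random

    random.seed(1)
    session_s = 1200                           # one 20-minute observation
    stream = [random.random() < 0.7 for _ in range(session_s)]  # per-second engagement

    true_prev = sum(stream) / session_s        # criterion: the continuous record
    mts_samples = stream[::15]                 # momentary sample every 15 s
    mts_prev = sum(mts_samples) / len(mts_samples)

    error_pts = (mts_prev - true_prev) * 100   # MTS minus continuous, in points
    print(f"continuous {true_prev:.1%}, MTS {mts_prev:.1%}, error {error_pts:+.2f} pts")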
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated validity data.

Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
No
Provide citations for additional published studies.

Bias Analysis

Age / Grade: Early childhood / K; Informant: Researcher; Rating: No
Age / Grade: Grades K-5; Informant: Researcher; Rating: No
Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
No
If yes,
a. Describe the method used to determine the presence or absence of bias:
b. Describe the subgroups for which bias analyses were conducted:
c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.

Growth Standards

Sensitivity to Behavior Change

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Data unavailable
Age / Grade: Grades K-5; Informant: Researcher; Rating: Data unavailable
Describe evidence that the monitoring system produces data that are sensitive to detect incremental change (e.g., small behavior change in a short period of time such as every 20 days, or more frequently depending on the purpose of the construct). Evidence should be drawn from samples targeting the specific population that would benefit from intervention. Include in this example a hypothetical illustration (with narrative and/or graphics) of how these data could be used to monitor student performance frequently enough and with enough sensitivity to accurately assess change:

Reliability (Intensive Population): Reliability for Students in Need of Intensive Intervention

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Data unavailable
Age / Grade: Grades K-5; Informant: Researcher; Rating: Data unavailable
Offer a justification for each type of reliability reported, given the type and purpose of the tool:
Describe the sample(s), including size and characteristics, for each reliability analysis conducted:
Describe the analysis procedures for each reported type of reliability:

In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Report results by age range or grade level (if relevant) and include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No
If yes, fill in data for each subgroup with disaggregated reliability data.
Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.

Validity (Intensive Population): Validity for Students in Need of Intensive Intervention

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Dash
Age / Grade: Grades K-5; Informant: Researcher; Rating: Dash
Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
Describe the sample(s), including size and characteristics, for each validity analysis conducted.
Describe the analysis procedures for each reported type of validity.
In the table(s) below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
No
Provide citations for additional published studies.
Describe the degree to which the provided data support the validity of the tool.
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No
If yes, fill in data for each subgroup with disaggregated validity data.
Type | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% CI Lower Bound | 95% CI Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
No
Provide citations for additional published studies.

Decision Rules: Data to Support Intervention Change

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Data unavailable
Age / Grade: Grades K-5; Informant: Researcher; Rating: Data unavailable
Are validated decision rules for when changes to the intervention need to be made specified in your manual or published materials?
No
If yes, specify the decision rules:
What is the evidentiary basis for these decision rules?

Decision Rules: Data to Support Intervention Selection

Age / Grade: Early childhood / K; Informant: Researcher; Rating: Data unavailable
Age / Grade: Grades K-5; Informant: Researcher; Rating: Data unavailable
Are validated decision rules for what intervention(s) to select specified in your manual or published materials?
No
If yes, specify the decision rules:
What is the evidentiary basis for these decision rules?

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.