aimswebPlus Reading
Oral Reading Fluency
Summary
aimswebPlus Oral Reading Fluency is an individually administered, standardized test of oral reading for Grades 1-8, with options for Grades 9-12 to use off-grade-level forms. There are 23 test forms for each grade. ORF is designed to be used for universal screening of all students at the beginning, middle, and end of the school year and for progress monitoring of students identified as at risk. Oral Reading Fluency (ORF) measures a student’s ability to read literary (fictional) English texts aloud. ORF is designed to measure a student’s reading rate and accuracy, with the ability to track patterns in reading errors. Parallel test forms within each grade level are composed of unique, unfamiliar passages equated for text complexity and style. Students are given 1 minute to read the text presented on a single printed page while the examiner follows along on a digital record-form copy of the passage on the computer to record the student's reading accuracy, note the form of any miscues, and mark the last word read. Examiners have the option to audio record the student’s reading and add details to digital record forms after test administration. Words read incorrectly due to mispronunciations, word substitutions, skipped words, or pausing without initiating an attempt within 3 seconds are marked as errors. The final score represents the student’s reading rate, or the number of words read correctly in 1 minute. Reading accuracy is also reported as the percentage of words read correctly out of the total number of words attempted. Digital record forms are also available for reviewing patterns in student errors.
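To make the scoring arithmetic concrete, here is a minimal R sketch of how a words-correct-per-minute (WCPM) rate and an accuracy percentage can be computed. The function and argument names are hypothetical, and the early-finish proration is a common convention for timed reading measures, not a documented aimswebPlus rule.

```r
# Minimal sketch of ORF scoring (hypothetical names; proration is an assumed
# convention for students who finish the passage before the 1-minute limit).
score_orf <- function(words_correct, words_attempted, seconds_used = 60) {
  list(
    wcpm     = words_correct * 60 / seconds_used,     # reading rate per minute
    accuracy = 100 * words_correct / words_attempted  # % of attempted words read correctly
  )
}

score_orf(words_correct = 92, words_attempted = 100)
# wcpm = 92, accuracy = 92
```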
- Where to Obtain:
- Pearson Inc.
- aimswebsupport@pearson.com
- Pearson Clinical Assessment, 927 E. Sonterra Blvd., Suite 119, San Antonio, TX, 78258
- 1-866-313-6194
- www.pearsonassessments.com/aimswebplus
- Initial Cost:
- $7.00 per student
- Replacement Cost:
- $7.00 per student per year
- Included in Cost:
- aimswebPlus is a subscription-based online solution that includes digital editions of training manuals and testing materials within the application. The per-student cost of $7.00 for one year grants access to all measures (reading and math). An aimswebPlus Unlimited subscription is available for districts with enrollment of 2,500 students or fewer. It includes all aimswebPlus measures (reading and math) and these supplemental measures: Shaywitz DyslexiaScreen, BASC-3 BESS Teacher and Student forms, WriteToLearn, and RAN Objects, Colors and Shapes. The cost for one year is $4,995.00.
- Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Oral Reading Fluency is an individually administered, timed test that employs strict time limits to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. Only these accommodations are allowed for Oral Reading Fluency: enlarging student test pages and modifying the environment (e.g., special lighting, adaptive furniture).
- Training Requirements:
- Less than one hour of administrator training is required to read the administration and scoring guidelines and become familiar with the testing materials.
- Qualified Administrators:
- Paraprofessional or professional educators may be trained to administer ORF.
- Access to Technical Support:
- Pearson provides an extensive online Help database and offers both phone- and email-based support. A customer forum facilitates asking and answering questions, and additional on-site, virtual, and on-demand training may be purchased.
- Assessment Format:
- Individual
- Scoring Time:
- Scoring is automatic OR
- 2 minutes per student
- Scores Generated:
- Raw score
- Percentile score
- Error analysis
- Administration Time:
- 1 minute per student
- Scoring Method:
- Manually (by hand)
- Automatically (computer-scored)
- Other: Test forms are scored on a digital record form by the examiner while students complete the measure; final scores are then calculated by the computer based on the words read correctly, words read incorrectly, the last word attempted, and whether the student finished the text before the 1-minute time limit.
- Technology Requirements:
- Computer or tablet
- Internet connection
Tool Information
Descriptive Information
- Please provide a description of your tool:
- aimswebPlus Oral Reading Fluency is an individually administered, standardized test of oral reading for Grades 1-8, with options for Grades 9-12 to use off-grade-level forms. There are 23 test forms for each grade. ORF is designed to be used for universal screening of all students at the beginning, middle, and end of the school year and for progress monitoring of students identified as at risk. Oral Reading Fluency (ORF) measures a student’s ability to read literary (fictional) English texts aloud. ORF is designed to measure a student’s reading rate and accuracy, with the ability to track patterns in reading errors. Parallel test forms within each grade level are composed of unique, unfamiliar passages equated for text complexity and style. Students are given 1 minute to read the text presented on a single printed page while the examiner follows along on a digital record-form copy of the passage on the computer to record the student's reading accuracy, note the form of any miscues, and mark the last word read. Examiners have the option to audio record the student’s reading and add details to digital record forms after test administration. Words read incorrectly due to mispronunciations, word substitutions, skipped words, or pausing without initiating an attempt within 3 seconds are marked as errors. The final score represents the student’s reading rate, or the number of words read correctly in 1 minute. Reading accuracy is also reported as the percentage of words read correctly out of the total number of words attempted. Digital record forms are also available for reviewing patterns in student errors.
- Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?
-
- ACADEMIC ONLY: What dimensions does the tool assess?
- BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
- BEHAVIOR ONLY: Which category of behaviors does your tool target?
Acquisition and Cost Information
Administration
Training & Scoring
Training
- Is training for the administrator required?
- Yes
- Describe the time required for administrator training, if applicable:
- Less than one hour of administrator training is required to read the administration and scoring guidelines and become familiar with the testing materials.
- Please describe the minimum qualifications an administrator must possess.
- Paraprofessional or professional educators may be trained to administer ORF.
- No minimum qualifications
- Are training manuals and materials available?
- Yes
- Are training manuals/materials field-tested?
- No
- Are training manuals/materials included in cost of tools?
- Yes
- If No, please describe training costs:
- Can users obtain ongoing professional and technical support?
- Yes
- If Yes, please describe how users can obtain support:
- Pearson provides an extensive online Help database and offers both phone- and email-based support. A customer forum facilitates asking and answering questions, and additional on-site, virtual, and on-demand training may be purchased.
Scoring
- Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
- Students receive 1 point for each word read correctly within the 1-minute time limit. ORF scores therefore capture a student’s reading rate in the form of the number of words read correctly per minute. Benchmark screening forms, each composed of two passages, are scored as the average words read correctly per minute based on the scores on both passages. Users may optionally mark categories of miscues and qualitative observations to enhance score interpretation. Reading accuracy is also reported as the percentage of words read correctly out of the total number of words attempted. National percentile rankings are provided to indicate how the student’s final score relates to peers in their grade level, based on a large, nationally representative norming sample of students. aimswebPlus also provides options for customers to view percentile rankings based on other students who have completed the same ORF forms within the customer’s account: when at least 30 other test scores have been gathered, real-time local norms are calculated to indicate a score’s percentile ranking at the school or district level (see the sketch below).
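A minimal R sketch of the two reporting conventions described above: benchmark averaging across two passages, and real-time local percentile ranks gated on the 30-score minimum. All names and the percentile formula are illustrative assumptions, not the aimswebPlus implementation.

```r
# Benchmark screening forms use two passages; the reported score is the
# average WCPM across both (hypothetical function names).
benchmark_wcpm <- function(passage1_wcpm, passage2_wcpm) {
  mean(c(passage1_wcpm, passage2_wcpm))
}

# Local percentile rank, reported only once at least 30 other scores exist
# in the account (the percentile formula here is an illustrative assumption).
local_percentile <- function(score, other_scores, min_n = 30) {
  if (length(other_scores) < min_n) return(NA)  # too few scores for local norms
  round(100 * mean(other_scores <= score))
}
```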
- Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?
- Yes
- ACADEMIC ONLY: Do you provide benchmarks for the slopes?
- Yes
- ACADEMIC ONLY: Do you provide percentile ranks for the slopes?
- Yes
- Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
- Once a student has been identified as in need of intensive intervention with progress monitoring, teachers or interventionists create a progress monitoring schedule within aimswebPlus. Initiating progress monitoring with ORF begins with identifying a student’s baseline performance. Baseline scores are typically gathered during benchmark screening but can also be obtained using progress monitoring forms directly. For students performing significantly below grade-level expectations, we offer options for baseline performance to be set using off-grade-level forms, which can be necessary to ensure that progress monitoring forms use text at an appropriate level for ORF to be sensitive to growth (e.g., avoiding floor effects). Baseline scores are compared against national percentile norms to determine a student’s initial performance level. aimswebPlus uses national norming data on student growth conditional on initial performance level (student growth percentiles, or SGPs) to provide the administrator/examiner feedback when creating a progress monitoring schedule. Administrators choose a target end date for the progress monitoring schedule, an interval of testing time points, and a target goal score. Choosing an optimal target goal score for the target end date is supported with feedback based on SGP norms matching the student’s initial performance level. Feedback informs the administrator how to choose a goal score that is ambitious enough to help close the gap but not unrealistically high or low. Oral Reading Fluency is an individually administered measure using printed pages shown to students while an examiner records the student’s performance on a computer-based digital record form. The 1-to-1 administration style allows the examiner to directly observe performance, which can provide important formative information, in addition to final scores, about how to continue the student’s instructional support and progress monitoring schedule. ORF is designed to be a brief measure, to minimize testing time especially for students who have difficulties directly related to the measure's content. For each progress monitoring test session, the student sees a one-page story and reads as much of it as possible in 1 minute. To support assessment with diverse populations, instructions are designed to be brief, using simple, grade-appropriate language. Scoring rules include specific guidance about not penalizing students who pronounce words differently due to regional dialects or articulation differences. All stories were reviewed by qualified experts to minimize bias. When allowed by a student’s IEP, accommodations such as enlarging materials or adapting the physical environment are permitted.
Rates of Improvement and End of Year Benchmarks
- Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in your manual or published materials?
- Yes
- If yes, specify the growth standards:
- Growth standards are based on student growth percentiles (SGPs) computed for each benchmark period (Fall to Winter, Winter to Spring, or Fall to Spring). An SGP indicates the percentage of students in the national sample whose seasonal (or annual) rate of improvement (ROI) fell at or below a specified ROI. Separate SGP distributions are computed for each of five initial performance levels: Well Below Average (<11th percentile), Below Average (11th-25th percentile), Average (26th-74th percentile), Above Average (75th-89th percentile), and Well Above Average (>89th percentile). Guidance is provided in our manuals about how to interpret SGPs and form decisions on whether students are showing acceptable growth. This includes evaluating ROIs observed between benchmark screening test administrations, and ROI slopes estimated from multiple progress monitoring scores over time. aimswebPlus progress monitoring features also provide real-time feedback for administrators when setting progress monitoring goals. Goals are set in the system by selecting the measure and baseline score, the goal date, the monitoring frequency (default is weekly), and the goal score. When the user defines the goal score, the system automatically labels the ambitiousness of the goal: the rate of improvement needed to achieve the goal is computed and translated into an SGP. An SGP < 50 is considered Insufficient; an SGP between 50 and 85 is considered Closes the Gap; an SGP between 85 and 97 is considered Ambitious; and an SGP > 97 is considered Overly Ambitious (see the sketch below). aimswebPlus recommends setting performance goals at the top of the Closes the Gap range. Our manuals provide extensive guidance, including case-study examples, to help the administrator make the most appropriate decision about what growth rate is appropriate for each individual student.
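The SGP cut points above translate directly into a labeling rule. A minimal R sketch follows; the handling of boundary values (e.g., an SGP of exactly 85) is an assumption, since the text does not specify it.

```r
# Goal-ambitiousness labels from the SGP cut points described above;
# boundary handling (<=) is an assumption.
label_goal <- function(sgp) {
  if (sgp < 50)       "Insufficient"
  else if (sgp <= 85) "Closes the Gap"
  else if (sgp <= 97) "Ambitious"
  else                "Overly Ambitious"
}

label_goal(85)  # "Closes the Gap" -- the recommended top of the range
```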
- Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?
- Yes
- If yes, specify the end-of-year performance standards:
- aimswebPlus allows users to select from a range of end-of-year targets, with recommendations on how to decide on the most appropriate goal accounting for their school's unique student body and instructional needs as well as alignment to any state-defined criteria for proficiency. aimswebPlus defines a meaningful target as one that is objective, quantifiable, and can be linked to a criterion that has inherent meaning for teachers. To establish a meaningful performance target using aimswebPlus tiers, the account manager (e.g., a school/district administrator) is advised to choose a target that is linked to a criterion, is challenging and achievable, and reflects historical performance results (when available). Users are also advised to consider the resources available to achieve the goal. The targets are primarily based on spring reading or math composite score national percentiles but can also be applied to individual measures in the aimswebPlus assessment system. Twelve national percentile targets, ranging from the 15th through the 70th percentile in increments of 5, are provided. This range was chosen because it covers the breadth of passing rates on state assessments and the historical range of targets typically used. The system provides a default spring performance target of the 30th national percentile. Targets can be set separately for Reading and Math. Guides and other resources provide more detail to help users define a high-quality performance target and present a step-by-step method to align spring performance targets to performance levels on state accountability tests. Once a target is selected, the aimswebPlus system automatically identifies the fall (or winter) cut scores that divide the score distribution into three instructional tiers (see the sketch below). Students above the higher cut score are in Tier 1 and have a high probability (80%–95%) of meeting the performance target; students between the upper and lower cut scores are in Tier 2 and have a moderate probability (40%–70%) of meeting the performance target; and students below the lower cut score are in Tier 3 and have a low probability (10%–40%) of meeting the performance target.
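A minimal R sketch of the tier assignment just described, assuming two hypothetical cut scores; the treatment of scores falling exactly on a cut is an assumption.

```r
# Tier assignment from the two derived cut scores (placeholder values;
# boundary handling is an assumption).
assign_tier <- function(fall_score, lower_cut, upper_cut) {
  if (fall_score > upper_cut)       "Tier 1"  # high probability (80%-95%) of meeting target
  else if (fall_score >= lower_cut) "Tier 2"  # moderate probability (40%-70%)
  else                              "Tier 3"  # low probability (10%-40%)
}

assign_tier(fall_score = 42, lower_cut = 30, upper_cut = 55)  # "Tier 2"
```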
- Date
- 2013-2014
- Size
- 18,088
- Male
- 50
- Female
- 50
- Unknown
- 0
- Eligible for free or reduced-price lunch
- Based on schoolwide eligibility for free or reduced lunch, students were sorted into Low (1-32% eligible), Moderate (33-66% eligible), and High (67-100% eligible) SES categories. Students were distributed fairly evenly among the three SES levels.
- Other SES Indicators
- White, Non-Hispanic
- 48-54%
- Black, Non-Hispanic
- Hispanic
- American Indian/Alaska Native
- Asian/Pacific Islander
- Other
- Unknown
- Disability classification (Please describe)
- Participating schools were required to assess all students in the selected grades except those with moderate to severe intellectual disabilities or moderate to severe motor impairment and those who are blind or deaf.
- First language (Please describe)
- English
- Language proficiency status (Please describe)
- Participating schools were required to assess all students in the selected grades except those with an English Language Proficiency score of less than 3.
Performance Level
Reliability
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- *Offer a justification for each type of reliability reported, given the type and purpose of the tool.
- The purpose of Oral Reading Fluency (ORF) as a progress monitoring tool is to measure reading performance across multiple time points. For each grade level, ORF uses multiple alternate forms that are equivalent in passage difficulty and consistent in form. Here we report two types of reliability evidence: alternate-form reliability and internal consistency (Cronbach's alpha). Justification for Study 1: Alternate-form reliability, where equivalent forms are administered close together in time, is important for progress monitoring ORF measures because it shows the consistency of scores from independently timed administrations with different content. Justification for Study 2: Internal consistency of test form scores is important to assess how reliably the forms collectively measure the same underlying construct.
- *Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
- To conduct each reliability analysis independently, two student samples at each grade level were drawn from all students completing ORF progress monitoring measures during the 2022-2023 school year. The two samples were defined based on their demographic characteristics (i.e., gender and ethnicity); the purpose was to create two representative samples. Sample 1 was used to calculate internal consistency (Male: 51.25%, Female: 43.75%, Non-Binary/Unknown: 5%; Asian: 1.5%, Black: 15.5%, Hispanic: 13.5%, Native American: 2%, White: 44.5%, Multiple/Other/Unknown: 23.4%). Internal consistency was calculated on students taking 80% or more of the progress monitoring forms. To include students who did not complete all progress monitoring forms, we employed the multiple imputation by chained equations (MICE) algorithm using the mice package in R, with 5 multiple imputations using predictive mean matching (see the sketch below). The percentage of imputed data was less than 7.5% for each grade level. Sample 2 was used in computing alternate-form reliability (Male: 49.9%, Female: 43.1%, Nonbinary/Not Reporting: 7.1%; Asian: 1.6%, Black: 16.0%, Hispanic: 13.4%, Native American: 2.0%, White: 41.3%, Multiple/Other/Unknown: 25.9%).
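A sketch of the imputation step as described, using the mice package the text names. The data frame `pm_scores` (one row per student, one column per progress monitoring form) is an assumed layout.

```r
# Imputation of missed progress monitoring administrations, as described:
# MICE with 5 imputations and predictive mean matching ("pmm").
library(mice)

imp <- mice(pm_scores, m = 5, method = "pmm", seed = 1)
completed <- complete(imp, action = 1)  # extract one of the five imputed data sets
```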
- *Describe the analysis procedures for each reported type of reliability.
- Alternate-form reliability and internal consistency analyses were completed on separate samples of aimswebPlus progress monitoring data. Alternate-form reliability evidence was calculated as the Pearson correlation coefficient using test scores across all pairs of progress monitoring forms. For internal consistency of ORF forms, Cronbach's alpha was calculated by treating ORF scores from each administration of the progress monitoring forms as separate indicators of the student’s ability.
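A sketch of the two reliability computations under the assumed `pm_scores` layout above (one column per form). The median across form pairs is an illustrative aggregation; the text does not state how pairwise correlations were summarized.

```r
# Alternate-form reliability: Pearson correlations across all pairs of forms,
# summarized here by their median (an illustrative choice).
r_matrix   <- cor(pm_scores, use = "pairwise.complete.obs", method = "pearson")
alt_form_r <- median(r_matrix[lower.tri(r_matrix)])

# Internal consistency: Cronbach's alpha, treating each form's score as a
# separate indicator of the same underlying ability.
library(psych)
alpha_out <- psych::alpha(pm_scores)
alpha_out$total$raw_alpha
```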
*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated reliability data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- Provide citations for additional published studies.
Validity
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- *Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- Two external criterion measures were used in our validity analyses: the English Language Arts end-of-year assessment from the Tennessee Comprehensive Assessment Program (TCAP-ELA) and the Northwest Evaluation Association Measures of Academic Progress for Reading (NWEA MAP Growth Reading). TCAP-ELA was used as the external criterion for our concurrent validity analyses for Grades 3-8 and our predictive validity analyses for Grades 1-8. TCAP-ELA is a state summative assessment measuring reading and writing abilities, with a focus on reading literary and informational texts. MAP Growth Reading RIT scores were used as the external criterion for our concurrent validity analyses for Grades 1-2. MAP Reading assesses standards-aligned reading skills. TCAP-ELA and MAP Reading are appropriate criterion measures for these analyses because ORF is intended to measure a fundamental reading ability, which has been found to be strongly associated with reading proficiency more broadly (e.g., Fuchs et al., 2021; Petscher & Kim, 2011; Washburn, 2022).
- *Describe the sample(s), including size and characteristics, for each validity analysis conducted.
- Two samples of students, from Tennessee and Illinois, were chosen for our validity analyses. The Tennessee sample comes from a large school district with students represented across 51 elementary schools (Grades 1-5) and 21 middle schools (Grades 6-8) in urban, suburban, and rural regions. Students of all ability levels in this district completed ORF as part of their universal screening assessments at the beginning, middle, and end of the school year, and a portion of these students used ORF for progress monitoring. The Illinois sample comes from a school district with students represented across 10 elementary schools of varying sizes and locations around a medium-sized city. Demographic data indicate the Illinois sample was drawn from a diverse district composed of multiple ethnic and socioeconomic backgrounds. For predictive validity analyses, ORF data for Grades 3-8 of the Tennessee sample were gathered during the fall of the 2022-2023 school year. For predictive validity analyses for Grades 1-2, ORF data for Grade 1 of the Tennessee sample were gathered during the spring of 2021, and ORF data for Grade 2 were gathered during the spring of 2022. For concurrent validity analyses, ORF data for Grades 3-8 came from the Tennessee sample and were gathered during the spring of 2023. For concurrent validity analyses for Grades 1-2, ORF data came from the Illinois sample and were gathered during the winter of 2022-2023. All students with valid ORF and external criterion scores meeting the criteria for our analyses were included.
- *Describe the analysis procedures for each reported type of validity.
- Two types of validity analyses were conducted with ORF and the external criterion measures: predictive validity and concurrent validity. Predictive validity analyses were conducted by examining the strength of the Pearson correlation coefficient between ORF scores gathered multiple months prior to TCAP-ELA scores for the same students. For the predictive validity analyses for Grades 3-8, correlation coefficients were calculated between scores from ORF tests given in the fall of 2022 (August-November) and the TCAP-ELA given in the spring (March) of 2023. For the predictive validity analysis for Grade 1, correlation coefficients were calculated between scores from ORF tests given in the spring of 2020 and TCAP scores of the same students assessed at the end of Grade 3 in the spring of 2023. For the predictive validity analysis for Grade 2, correlation coefficients were calculated between scores from ORF tests given in the spring of 2021 and TCAP scores of the same students assessed at the end of Grade 3 in the spring of 2023. Concurrent validity analyses were conducted with the Illinois sample for Grades 1-2 and the Tennessee sample for Grades 3-8, by examining the strength of the Pearson correlation coefficient between ORF scores and external criterion measures given within 2 months of each other. For the concurrent validity analysis for Grades 1 and 2, correlation coefficients were calculated between MAP Growth Reading scores from the winter of 2023 (end of January to beginning of February) and ORF scores from around the same time (January 2023). For the concurrent validity analysis for Grades 3-8, correlation coefficients were calculated between scores from ORF tests given in the spring of 2023 and the TCAP-ELA given in the spring (March) of 2023. For both predictive and concurrent validity analyses, 95% confidence intervals for the correlation coefficients were computed using the Fisher z-transformation (see the sketch below).
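In R, `cor.test` computes the Pearson coefficient together with a 95% confidence interval based on the Fisher z-transformation, matching the procedure described. The score vectors here are placeholders for matched student scores.

```r
# Pearson validity coefficient with its Fisher z-based 95% CI
# ('orf_scores' and 'criterion_scores' are placeholder vectors).
ct <- cor.test(orf_scores, criterion_scores, method = "pearson")
ct$estimate  # Pearson r
ct$conf.int  # 95% CI computed via the Fisher z-transformation
```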
*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- Results of the predictive validity studies indicate that ORF scores have a strong positive relationship with the general reading abilities assessed by the TCAP summative assessment of ELA. This positive relationship was observed across all grade levels, especially among elementary school-aged students. Furthermore, results in Grades 1 and 2 support the validity of ORF as measuring fundamental reading abilities associated with the development of reading skills relevant to end-of-year state summative assessments like the TCAP-ELA. Results of the concurrent validity analyses, especially for Grades 3-8, further indicate that ORF is a valid measure of passage reading associated with reading skills assessed on the TCAP-ELA. Concurrent validity coefficients for Grades 1-2 indicate a stronger positive relationship between ORF and MAP Growth Reading scores in Grade 2 than in Grade 1, which may reflect the broader range of reading and language skills assessed on Grade 1 MAP assessments that may not be directly related to ORF passage reading skills. Similarly, the trend of predictive and concurrent validity coefficients being stronger in the lower grade levels than in the higher grade levels (e.g., Grades 6-8) may indicate that oral reading skills are more closely aligned with the kinds of reading practices emphasized in elementary school, and are relatively foundational compared to the higher-level reading comprehension abilities emphasized on summative state assessments like the TCAP.
- Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated validity data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
Bias Analysis
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating | No | No | No | No | No | No | No | No |
- Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
- Yes
- If yes,
- a. Describe the method used to determine the presence or absence of bias:
- ORF passages are written to be a fair assessment of reading accuracy and speed for all students, and the identification of reading errors is a critical component of progress monitoring with ORF. Therefore, bias analyses focused on evaluating how passages are composed at the word level, to ensure that the words chosen are of equal difficulty for students of different genders or of different racial/ethnic backgrounds. DIF analyses for ORF tests were conducted using the logistic regression method. Items were evaluated by the delta R2, the difference in Nagelkerke's R2 coefficients between models (see the sketch below). The effect sizes are classified as "negligible", "moderate", or "large" based on Zumbo and Thomas (1997). In this analysis, items with "moderate" or "large" effect sizes were flagged as DIF items. DIF analyses were performed on all ORF parallel forms, with the words of the passages representing "items". This approach therefore examines the extent to which the words used within our ORF passages assess reading fairly across gender and ethnicity groups.
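A sketch of the logistic-regression DIF procedure for a single item (word), with a hand-rolled Nagelkerke R2. Variable names (`correct`, `total`, `group`, `item_data`) are placeholders, and the effect-size bands in the final comment are the published Zumbo-Thomas thresholds the text cites.

```r
# Logistic-regression DIF for one item (word). 'correct' is 0/1 accuracy on
# the word, 'total' the student's ORF score, 'group' the demographic variable.
nagelkerke_r2 <- function(model) {
  null  <- update(model, . ~ 1)                             # intercept-only model
  n     <- nobs(model)
  r2_cs <- 1 - exp((deviance(model) - deviance(null)) / n)  # Cox & Snell R2
  r2_cs / (1 - exp(-deviance(null) / n))                    # Nagelkerke rescaling
}

m0 <- glm(correct ~ total,         family = binomial, data = item_data)  # no DIF terms
m1 <- glm(correct ~ total * group, family = binomial, data = item_data)  # group + interaction
delta_r2 <- nagelkerke_r2(m1) - nagelkerke_r2(m0)

# Zumbo & Thomas (1997) bands: < .13 negligible, .13-.26 moderate, > .26 large;
# items in the moderate or large bands are flagged as DIF items.
```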
- b. Describe the subgroups for which bias analyses were conducted:
- DIF analyses were performed in two ways to examine test performance within two demographic categories: gender and race/ethnicity. First, we compared results between students identified as male or female. Second, we investigated evidence of DIF between the following racial/ethnic subgroups: White, Black, Hispanic, Asian, and Other (including unknown, multiple races, and other frequently reported ethnicity groups). The total student sample in DIF analyses across Grades 1-8 included 355,540 students across 1,833 school districts for DIF analyses based on gender (52% Male; 48% Female) and 302,509 students across 1,560 districts for DIF analyses based on ethnicity (2.5% Asian, 18.3% Black, 16.9% Hispanic, 53% White, and 9.21% Other).
- c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
- Word reading accuracy and the identification of common reading miscues is a critical feature of testing with ORF, especially for progress monitoring students with reading difficulties and disabilities. ORF scores equal the number of words read correctly from a passage within a 1-minute time limit. Results indicate that the text of ORF passages provides a fair assessment of reading accuracy across gender and racial/ethnic groups. At all grade levels, over 99% of items on average showed no or negligible evidence of DIF between students identified as male and female or between racial/ethnic groups. The average proportion of items showing either moderate or large effect sizes ranged across grade levels from 0.0018 to 0.0096 for DIF analyses based on gender, and from 0.0011 to 0.0052 for DIF analyses based on race/ethnicity.
Growth Standards
Sensitivity: Reliability of Slope
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- Describe the sample, including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- aimswebPlus progress monitoring data from ORF tests administered during the 2022-2023 school year were analyzed to identify student cases with sufficient data for our reliability of slope analyses. First, we included only cases in which students used on-grade-level ORF forms. Next, we identified students at each grade level with at least 10 ORF test scores gathered at regular intervals across at least 20 weeks. From these students, we included only those with a baseline data point (first ORF test score) below the 30th percentile according to national norms.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- Students included in the reliability analysis had completed at least 10 ORF forms, with some students' schedules including up to 20 unique progress monitoring forms. Testing intervals ranged from once per week to once per month; students with gaps between testing sessions longer than 40 days were excluded from the analysis. All students included in the analysis completed testing schedules spanning at least 20 weeks.
- Describe the analysis procedures.
- Reliability of slope analyses for ORF progress monitoring were conducted separately for each grade level, using a mixed-effects regression model. The ‘lme4’ package in R was used to model the average rate of improvement (ROI) in ORF test scores observed across time (fixed effect) while accounting for individual variation in baseline scores (random intercepts) and ROIs (random slopes) observed in each individual student's data. The resulting model was used to extract the true score variance of the slope, representing the true variability in score changes detected by ORF in students over time, and the total variance in slope, which includes both the true score variance and other sources of variance such as measurement error. The reliability of slope coefficient was calculated as the ratio of true slope variance to total slope variance (see the sketch below).
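A sketch of the mixed-effects model in lme4 under an assumed long-format layout (`pm_long` with columns `id`, `week`, `score`). Estimating the total slope variance from independently fit per-student OLS slopes is one plausible reading of the described computation, not a confirmed detail.

```r
# Mixed-effects growth model: fixed week effect (average ROI), random
# intercepts and slopes per student.
library(lme4)

fit <- lmer(score ~ week + (week | id), data = pm_long)

vc        <- as.data.frame(VarCorr(fit))
tau_slope <- vc$vcov[vc$grp == "id" & vc$var1 == "week" & is.na(vc$var2)]  # true slope variance

# Total slope variance estimated (an assumed approach) as the variance of
# per-student OLS slopes fit independently.
ols_slopes  <- sapply(split(pm_long, pm_long$id),
                      function(d) coef(lm(score ~ week, data = d))["week"])
reliability <- tau_slope / var(ols_slopes)
```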
In the table below, report reliability of the slope (e.g., ratio of true slope variance to total slope variance) by grade level (if relevant).
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
- No
If yes, fill in data for each subgroup with disaggregated reliability of the slope data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- Provide citations for additional published studies.
Sensitivity: Validity of Slope
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
-
- Describe the sample(s), including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- Describe the analysis procedures for each reported type of validity.
In the table below, report predictive validity of the slope (correlation between the slope and achievement outcome) by grade level (if relevant).
NOTE: The TRC suggests controlling for initial level when the correlation for slope without such control is not adequate.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- Do you have validity of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
If yes, fill in data for each subgroup with disaggregated validity of the slope data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/ examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Provide citations for additional published studies.
Alternate Forms
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- Describe the sample for these analyses, including size and characteristics:
- What is the number of alternate forms of equal and controlled difficulty?
- If IRT based, provide evidence of item or ability invariance
- If computer administered, how many items are in the item bank for each grade level?
- If your tool is computer administered, please note how the test forms are derived instead of providing alternate forms:
Decision Rules: Setting & Revising Goals
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- In your manual or published materials, do you specify validated decision rules for how to set and revise goals?
- Yes
- If yes, specify the decision rules:
-
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.
Decision Rules: Changing Instruction
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|
Rating |
- In your manual or published materials, do you specify validated decision rules for when changes to instruction need to be made?
- If yes, specify the decision rules:
-
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.
Data Collection Practices
Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.