aimswebPlus Reading
Oral Reading Fluency

Summary

Descriptive Information

aimswebPlus Oral Reading Fluency is an individually administered, standardized test of oral reading for Grades 1-8, with options for Grades 9-12 to use off grade-level forms. There are 23 test forms for each grade ORF is designed to be used in the universal screening for all students at the beginning, middle, and end of the school year and for progress monitoring of students identified as at risk. Oral Reading Fluency (ORF) measures a student’s ability to read literary (fictional) English texts aloud. ORF is designed to measure a student’s reading rate and accuracy, with the ability to track patterns in reading errors. Parallel test forms within each grade level are composed of unfamiliar unique passages equated for text complexity and style. Students are given 1 minute to read the text presented on a single printed page while the examiner follows along on a digital record form copy of the passage on the computer to record the student's reading accuracy, note the form of any miscues, and mark the last word read. Examiners have the option to audio record the student’s reading and add details to digital record forms after test administration. Words read incorrectly due to mispronunciations, word substitutions, skipped words, or pausing and not initiating an attempt within 3 seconds are marked as errors. The final score represents the student’s reading rate, or the number of words read correctly in 1 minute. Reading accuracy is also reported as the percentage of words read correctly out of the total number of words attempted. Digital record forms are also available to review patterns in student errors.

Acquisition & Cost

Where to Obtain:: Pearson Inc.; aimswebsupport@pearson.com; Pearson Clinical Assessment, 927 E. Sonterra Blvd., Suite 119, San Antonio, TX, 78258; 1-866-313-6194; www.pearsonassessments.com/aimswebplus

Initial Cost:: $7.00 per student

Replacement Cost:: $7.00 per student per year

Included in Cost:: aimswebPlus is a subscription-based online solution that includes digital editions of training manuals and testing materials within the application. The per-student cost of $7.00 for one year grants access to all measures (reading and math). An aimswebPlus Unlimited subscription is available for districts with enrollment of 2,500 students or fewer. It includes all aimswebPlus measures (reading and math) and these supplemental measures: Shaywitz DyslexiaScreen, BASC-3 BESS Teacher and Student forms, WriteToLearn, and RAN Objects, Colors and Shapes. The cost for one year is $4995.00.; Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Oral Reading Fluency is an individually administered, timed test that employs strict time limits to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. Only these accommodations are allowed for Oral Reading Fluency: enlarging student test pages and modifying the environment (e.g., special lighting, adaptive furniture).

Training & Technical Support

Training Requirements:: Less than one hour of administrator training is required read the administration and scoring guidelines and become familiar with the testing materials.

Qualified Administrators:: Paraprofessional or professional educators may be trained to administer ORF.

Access to Technical Support:: Pearson provides an extensive online Help database and offers both phone- and email-based support. A customer forum facilitates asking and answering questions, and additional on-site, virtual, and on-demand training may be purchased.

Administration

Assessment Format:

Individual

Scoring Time:

Scoring is automatic OR
2 minutes per student

Scores Generated:

Raw score
Percentile score
Error analysis

Administration Time:

1 minutes per student

Scoring Method:

Manually (by hand)
Automatically (computer-scored)
Other : Test forms are scored on a digital record form by the examiner while students complete the measure, then final scores are calculated by the computer based on the words read correctly, words ready incorrectly, last word attempted, and if the student finished the text before the 1-minute time limit.

Technology Requirements:

Computer or tablet
Internet connection

Tool Information

Descriptive Information

Please provide a description of your tool:: aimswebPlus Oral Reading Fluency is an individually administered, standardized test of oral reading for Grades 1-8, with options for Grades 9-12 to use off grade-level forms. There are 23 test forms for each grade ORF is designed to be used in the universal screening for all students at the beginning, middle, and end of the school year and for progress monitoring of students identified as at risk. Oral Reading Fluency (ORF) measures a student’s ability to read literary (fictional) English texts aloud. ORF is designed to measure a student’s reading rate and accuracy, with the ability to track patterns in reading errors. Parallel test forms within each grade level are composed of unfamiliar unique passages equated for text complexity and style. Students are given 1 minute to read the text presented on a single printed page while the examiner follows along on a digital record form copy of the passage on the computer to record the student's reading accuracy, note the form of any miscues, and mark the last word read. Examiners have the option to audio record the student’s reading and add details to digital record forms after test administration. Words read incorrectly due to mispronunciations, word substitutions, skipped words, or pausing and not initiating an attempt within 3 seconds are marked as errors. The final score represents the student’s reading rate, or the number of words read correctly in 1 minute. Reading accuracy is also reported as the percentage of words read correctly out of the total number of words attempted. Digital record forms are also available to review patterns in student errors.

Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?: End-year goal
Short-term skill

The tool is intended for use with the following grade(s).

Preschool / Pre - kindergarten
not selected

Kindergarten
selected

First grade
selected

Second grade
selected

Third grade
selected

Fourth grade
selected

Fifth grade
selected

Sixth grade
selected

Seventh grade
selected

Eighth grade
not selected

Ninth grade
not selected

Tenth grade
not selected

Eleventh grade
not selected

Twelfth grade

The tool is intended for use with the following age(s).

0-4 years old
not selected

5 years old
selected

6 years old
selected

7 years old
selected

8 years old
selected

9 years old
selected

10 years old
selected

11 years old
selected

12 years old
selected

13 years old
selected

14 years old
not selected

15 years old
not selected

16 years old
not selected

17 years old
not selected

18 years old

The tool is intended for use with the following student populations.

Students in general education
selected

Students with disabilities
selected

English language learners

ACADEMIC ONLY: What dimensions does the tool assess?

Reading

Global Indicator of Reading Competence
not selected

Listening Comprehension
not selected

Vocabulary
not selected

Phonemic Awareness
selected

Decoding

Passage Reading
not selected

Word Identification
not selected

Comprehension

Spelling & Written Expression

Global Indicator of Spelling Competence
not selected

Global Indicator of Writting Expression Competence

Mathematics

Global Indicator of Mathematics Comprehension
not selected

Early Numeracy
not selected

Mathematics Concepts
not selected

Mathematics Computation
not selected

Mathematics Application
not selected

Fractions

Algebra

Other

Please describe specific domain, skills or subtests:

BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.

BEHAVIOR ONLY: Which category of behaviors does your tool target?

Acquisition and Cost Information

Where to obtain:

Email Address: aimswebsupport@pearson.com
Address: Pearson Clinical Assessment, 927 E. Sonterra Blvd., Suite 119, San Antonio, TX, 78258
Phone Number: 1-866-313-6194
Website: www.pearsonassessments.com/aimswebplus

Initial cost for implementing program:

Cost: $7.00
Unit of cost: student

Replacement cost per unit for subsequent use:

Cost: $7.00
Unit of cost: student
Duration of license: year

Additional cost information:

Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.: aimswebPlus is a subscription-based online solution that includes digital editions of training manuals and testing materials within the application. The per-student cost of $7.00 for one year grants access to all measures (reading and math). An aimswebPlus Unlimited subscription is available for districts with enrollment of 2,500 students or fewer. It includes all aimswebPlus measures (reading and math) and these supplemental measures: Shaywitz DyslexiaScreen, BASC-3 BESS Teacher and Student forms, WriteToLearn, and RAN Objects, Colors and Shapes. The cost for one year is $4995.00.

Provide information about special accommodations for students with disabilities.: Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Oral Reading Fluency is an individually administered, timed test that employs strict time limits to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. Only these accommodations are allowed for Oral Reading Fluency: enlarging student test pages and modifying the environment (e.g., special lighting, adaptive furniture).

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?

General education teacher
not selected

Special education teacher
not selected

Parent

Child

External observer
not selected

Other

If other, please specify:

BEHAVIOR ONLY: What is the administration format?

Direct observation
not selected

Rating scale
not selected

Checklist

Performance measure
not selected

Other

If other, please specify:

BEHAVIOR ONLY: What is the administration setting?

General education classroom
not selected

Special education classroom
not selected

School office
not selected

Recess

Lunchroom

Home

Other

If other, please specify:

Does the program require technology?

Yes

If yes, what technology is required to implement your program? (Select all that apply)

Computer or tablet
selected

Internet connection
not selected

Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?

Individual

Small group If small group, n=

Large group If large group, n=

Computer-administered
not selected

Other

If other, please specify:

What is the administration time?

Time in minutes

per (student/group/other unit)

student

Additional scoring time:

Time in minutes

per (student/group/other unit)

student

How many alternate forms are available, if applicable?

Number of alternate forms

per (grade/level/unit)

grade

ACADEMIC ONLY: What are the discontinue rules?

No discontinue rules provided
not selected

Basals

Ceilings

Other

If other, please specify:

Progress Monitoring forms (1 passage) of ORF do not have a discontinue rule. Benchmark screening forms (2 passages) have a discontinue rule where if a student is unable to correctly read the 1st 10 words on the 1st passage, the test can be discontinued, and the student does not need to attempt the 2nd passage.

BEHAVIOR ONLY: Can multiple students be rated concurrently by one administrator?

If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?: Yes

Describe the time required for administrator training, if applicable:: Less than one hour of administrator training is required read the administration and scoring guidelines and become familiar with the testing materials.

Please describe the minimum qualifications an administrator must possess.: Paraprofessional or professional educators may be trained to administer ORF.; No minimum qualifications

Are training manuals and materials available?: Yes

Are training manuals/materials field-tested?: No

Are training manuals/materials included in cost of tools?: Yes
If No, please describe training costs:

Can users obtain ongoing professional and technical support?: Yes
If Yes, please describe how users can obtain support:: Pearson provides an extensive online Help database and offers both phone- and email-based support. A customer forum facilitates asking and answering questions, and additional on-site, virtual, and on-demand training may be purchased.

Scoring

BEHAVIOR ONLY: What types of scores result from the administration of the assessment?

Score
Observation	Behavior Rating
Frequency Duration Interval Latency	Raw score

Conversion
Observation	Behavior Rating
Rate Percent	Standard score Subscale/ Subtest Composite Stanine Percentile ranks Normal curve equivalents IRT based scores

Interpretation
Observation	Behavior Rating
Error analysis Peer comparison Rate of change	Dev. benchmarks Age-Grade equivalent

How are scores calculated?

Manually (by hand)
selected

Automatically (computer-scored)
selected

Other

If other, please specify:

Test forms are scored on a digital record form by the examiner while students complete the measure, then final scores are calculated by the computer based on the words read correctly, words ready incorrectly, last word attempted, and if the student finished the text before the 1-minute time limit.

Do you provide basis for calculating performance level scores?

Yes

What is the basis for calculating performance level and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

What types of performance level scores are available?

Raw score

Standard score
selected

Percentile score
not selected

Grade equivalents
not selected

IRT-based score
not selected

Age equivalents
not selected

Stanines

Normal curve equivalents
not selected

Developmental benchmarks
not selected

Developmental cut points
not selected

Equated

Probability
not selected

Lexile score
selected

Error analysis
not selected

Composite scores
not selected

Subscale/subtest scores
not selected

Other

If other, please specify:

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.: Students receive 1 point for each word read correctly within the 1-minute time limit. Therefore, ORF scores capture a student’s reading rate in the form of the number of words read correctly per minute. Benchmark screening forms, comprised of two passages, are scored as the average words read correctly per minute based of the scores on both passages. Users may optionally mark categories of miscues and qualitative observations to enhance score interpretation. Reading accuracy is also reported as the percentage of number of word read correctly out of the total number of words attempted. National percentile rankings are provided to indicate how the student’s final score relates to peers in their grade level based on a large nationally representative norming sample of students. AimswebPlus also provides options for customers to view percentile rankings based on other students who have completed the same ORF forms within the customer’s account. When there are at least 30 other test scores gathered, real-time local norms are calculated to indicate a scores percentile ranking at the school or district level.

Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?: Yes

ACADEMIC ONLY: Do you provide benchmarks for the slopes?: Yes

ACADEMIC ONLY: Do you provide percentile ranks for the slopes?: Yes

What is the basis for calculating slope and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
selected

Schoolwide norms
not selected

Stanines

Normal curve equivalents

Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.: Once a student has been identified as in need of intensive intervention with progress monitoring, teachers or interventionists will create a progress monitoring schedule within aimswebPlus. Initiating progress monitoring with ORF begins with identifying a student’s baseline performance. Baseline scores are typically gathered during Benchmark screening but can also be obtained using Progress Monitoring forms directly. For student’s performing significantly below grade level expectations, we offer options for baseline performance to be set using off-grade-level forms, which can be necessary to ensure that progress monitoring forms use text at an appropriate level for ORF to be sensitive to growth (e.g. avoiding floor effects). Baseline scores are compared against national percentile norms to determine a student’s initial performance level. AimswebPlus uses national norming data on student growth conditional to initial performance levels (student growth percentiles or SGP) to provide the administrator/examiner feedback when creating a progress monitoring schedule. Administrators choose a target end date for the progress monitoring schedule, an interval of testing time points, and a target goal score. Choosing an optimal target goal score for the target end date is supported with feedback based on SGP norms matching the student’s initial performance level. Feedback informs the administrator how to choose a goal score that is ambitious enough to help close the gap, but not unrealistically too high or too low. Oral Reading Fluency is an individually administered measure using printed pages shown to students while an examiner records the student’s performance on a computer-based digital record form. The 1-to-1 administration style allows the examiner to directly observe performance, which can provide important formative information in addition to final scores about how to continue the student’s instructional support and progress monitoring schedule. ORF is designed to be a brief measure, to minimize testing time especially with students who have difficulties directly related to the measure's content. For each progress monitoring test session, the student sees a one-page story and reads as much of it as possible in 1 minute. To support assessment with diverse populations, instructions are designed to be brief, using simple, grade-appropriate language. Scoring rules include specific guidance about not penalizing students who pronounce words differently due to regional dialects or articulation differences. All stories were reviewed by qualified experts to minimize bias. When allowed by a student’s IEP accommodations like enlarging materials or adapting the physical environment is permitted.

Rates of Improvement and End of Year Benchmarks

Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in your manual or published materials?: Yes; If yes, specify the growth standards:; Winter, Winter to Spring, or Fall to Spring). An SGP indicates the percentage of students in the national sample whose seasonal (or annual) rate of improvement (ROI) fell at or below a specified ROI. Separate SGP distributions are computed for each of five initial performance levels: Well Below Average (<11th percentile), Below average (11th-25th percentile), Average (26th-74th percentile). Above Average (75th-89th percentile), and Well Above Average (>89th Percentile). Guidance is provided in our manuals about how to interpret SGPs and form decisions on whether students are showing acceptable growth. This includes evaluating ROIs observed between benchmark screening test administrations, and ROI slopes estimated from multiple progress monitoring scores over time. aimswebPlus progress monitoring features also provide real-time feedback for administrators when setting progress monitoring goals. Goals are set in the system by selecting the measure and baseline score, the goal date, the monitoring frequency (default is weekly), and the goal score. When the user defines the goal score, the system automatically labels the ambitiousness of the goal. The rate of improvement needed to achieve the goal is computed and translated into an SGP. An SGP < 50 is considered Insufficient; an SGP between 50 and 85 is considered Closes the Gap; an SGP between 85 and 97 is considered Ambitious; and an SGP > 97 is considered Overly Ambitious. aimswebPlus recommends setting performance goals at the top of the Closes the Gap range. Our manuals provide extensive guidance including case-study examples to help the administrator make the most appropriate decision about what growth rate is appropriate for each individual student.

Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?: Yes; If yes, specify the end-of-year performance standards:; aimswebPlus allows users to select from a range of end-of-year targets, with recommendations on how to decide on the most appropriate goal accounting for their school's unique student body and instructional needs as well as alignments to any state-defined criteria for proficiency. aimswebPlus defines a meaningful target as one that is objective, quantifiable, and can be linked to a criterion that has inherent meaning for teachers. To establish a meaningful performance target using aimswebPlus tiers, the account manager (e.g., a school/district administrator) is advised to choose a target that is linked to a criterion, is challenging and achievable, and reflects historical performance results (when available). Users are also advised to consider the resources available to achieve the goal. The targets are primarily based on spring reading or math composite score national percentiles but can also be applied to individual measures in the aimswebPlus assessment system. Twelve national percentile targets ranging from the 15th through the 70th percentile, in increments of 5 are provided. This range was chosen because it covers the breadth of passing rates on state assessments and the historical range of targets typically used. The system provides a default spring performance target of the 30th national percentile. Targets can be set separately for Reading and Math. Guides and other resources provide more detail to help users define a high-quality performance target and present a step-by-step method to align spring performance targets to performance levels on state accountability tests. Once a target is selected, the aimswebPlus system automatically identifies the fall (or winter) cut score that divides the score distribution into three instructional Tiers. Students above the highest cut score are in Tier 1 and have a high probability (80%–95%) of meeting the performance target; students between the upper and lower cut scores are in Tier 2 and have a moderate probability (40%–70%) of meeting the performance target; and students below the lower cut score are in Tier 3 and have a low probability (10%–40%) of meeting the performance target.

What is the basis for specifying minimum acceptable growth and end of year benchmarks?

Norm-referenced
not selected

Criterion-referenced
not selected

Other

If other, please specify:

False

If norm-referenced, describe the normative profile.

National representation (check all that apply):

Northeast:

New England

Middle Atlantic

Midwest:

East North Central

West North Central

South:

South Atlantic

East South Central

West South Central

West:

Mountain

Pacific

Local representation (please describe, including number of states)

Date: 2013-2014
Size: 18,088

Gender (Percent)

Male: 50
Female: 50
Unknown: 0

SES indicators (Percent)

Eligible for free or reduced-price lunch: Based on schoolwide eligibility for free or reduced lunch, students were sorted into Low (1-32% eligible), Moderate (33-66% eligible), and High (67-100% eligible) SES categories. Students were distributed fairly evenly among the three SES levels.
Other SES Indicators

Race/Ethnicity (Percent)

White, Non-Hispanic: 48-54%
Black, Non-Hispanic: 12-16%
Hispanic: 24-27%
American Indian/Alaska Native
Asian/Pacific Islander
Other: 9-11%
Unknown

Disability classification (Please describe): Participating schools were required to assess all students in the selected grades except those with moderate to severe intellectual disabilities or moderate to severe motor impairment and those who are blind, or deaf.
First language (Please describe): English
Language proficiency status (Please describe): Participating schools were required to assess all students in the selected grades except those with an English Language Proficiency score of less than 3.

Do you provide, in your user’s manual, norms which are disaggregated by race or ethnicity? If so, for which race/ethnicity?

White, Non-Hispanic
not selected

Black, Non-Hispanic
not selected

Hispanic

American Indian/Alaska Native
not selected

Asian/Pacific Islander
not selected

Other

Unknown

If criterion-referenced, describe procedure for specifying criterion for adequate growth and benchmarks for end-of-year performance levels.

Describe any other procedures for specifying adequate growth and minimum acceptable end of year performance.

To get the most value from progress monitoring, aimswebPlus recommends the following: (1) establish a time frame, (2) determine the level of performance expected, and (3) determine the criterion for success. Typical time frames include the duration of the intervention or the end of the school year. An annual time frame is typically used when IEP goals are written for students who are receiving special education services. aimswebPlus provides several ways to define a level of expected performance. The goal can be based on: well-established performance benchmarks that can be linked to aimswebPlus measures via national percentiles (e.g., the link to state test performance levels) or total score (e.g., words read per minute in Grade 2); a national performance norm benchmark (e.g., the 50th national percentile is often used to indicate on-grade level performance); a local performance norm benchmark; or an expected or normative rate of improvement (ROI). When users set progress monitoring goals, aimswebPlus uses student growth percentiles to describe these normative rates of improvement. Within the aimswebPlus software, the user enters the goal date and moves a digital slider to the desired ROI. As the slider moves, it provides feedback about the strength of the ROI: Insufficient, Closes the Gap, Ambitious, or Overly Ambitious. Users are encouraged to use the Ambitious (85th–97th SGP) for students who need intensive intervention.

Performance Level

Reliability

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Offer a justification for each type of reliability reported, given the type and purpose of the tool.: The purpose of Oral Reading Fluency (ORF) as a progress monitoring tool is to measure reading performance across multiple time points. For each grade level, ORF uses multiple alternative forms that are equivalent in passage difficulty and consistent in form. Here we report two types of reliability evidence: alternative-form reliability and internal-consistency (alpha). Justification for Study 1: Alternate-form reliability, where equivalent forms are administered close together in time, is important for progress monitoring ORF measures because it shows the consistency of the scores from independently timed administrations with different content. Justifi cation for Study 2: Internal consistency of test form scores is important to assess howreliably the forms collectively measure the same underlying construct.

*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.: To conduct each reliability analysis independently, two student samples at each grade level were drawn from allstudents completing ORF progress monitoring measures during the 2022-2023 school year. The two samples weredefi ned based on their demographic characteristics (i.e., gender and ethnicity). The purpose was to create tworepresentative samples. Sample 1 was used to calculate internal consistency (Male: 51.25%, Female: 43.75%, Non-Binary/Unknown: 5%, Asian: 1.5%, Black: 15.5%, Hispanic: 13.5%, Native American: 2%, White: 44.5%,Multiple/Other/Unknown: 23.4). Internal Consistency was calculated on students taking 80% or more of theprogress monitoring forms. To include students who did not complete all PM scores, we employed the multipleimputation by chained equations (MICE) algorithm using the mice package in R, with 5 multiple imputations usingpredicative mean matching. The percentage of imputed data was less than 7.5% for each grade level. Sample 2 wasused in computing alternative form reliability (Male: 49.9%, Female: 43.1%, Nonbinary/Not Reporting: 7.1%, Asian:1.6%, Black: 16.0%, Hispanic: 13.4%, Native American: 2.0%, White: 41.3%, Multiple/Other/Unknown: 25.9).

*Describe the analysis procedures for each reported type of reliability.: Alternate-form reliability and Internal consistency analyses were completed on separate samples of aimswebPlusprogress monitoring data. For Alternate-form reliability evidence was calculated as the Pearson correlationcoeffi cient using test scores across all pairs of progress monitoring forms. For our internal consistency of ORF forms,Alpha (Cronbach’s Alpha) was calculated by treating ORF scores from each administration of the progressmonitoring forms as separate indicators of the student’s ability.

*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:

Provide citations for additional published studies.

Validity

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.: Two external criterion measures were used in our validity analyses: the English Language Arts end-of-year assessment from the Tennessee Comprehensive Assessment Program (TCAP-ELA) and the Northwest Evaluation Association Measures of Academic Progress for Reading (NWEA-MAP Reading). TCAP-ELA was used as our external criterion measure for our concurrent validity analyses for grades 3-8 and predictive validity analyses for grades 1-8. TCAP-ELA is a state-summative assessment measuring reading and writing abilities, with a focus on reading literary and informational texts. The Map Growth Reading RIT scores were used as our external criterion measure for our concurrent validity analyses for grades 1-2. MAP Reading assesses standards-aligned reading skills. TCAP-ELA and MAP Reading are appropriate criterion measure for these analyses as ORF is intended to measure a fundamental reading ability, which has been found to be strongly associated with reading proficiency more broadly (e.g. Fuchs et al, 2021, Petscher & Kim, 2011, Washburn, 2022).

*Describe the sample(s), including size and characteristics, for each validity analysis conducted.: Two samples of students from Tennessee and Illinois were chosen for our validity analyses. The Tennessee sample comes from a large school district with students represented across 51 elementary schools (Grades 1-5) and 21 middle schools (Grades 6-8) in urban, suburban, and rural regions. Students of all ability levels in this district completed ORF as a part of their universal screening assessments at the beginning, middle, and end of the school year and a portion of these students use ORF for progress monitoring. The Illinois sample comes from a school district with students represented across 10 elementary schools of varying sizes and locations around a medium sized city . Demographic data indicate the Illinois sample was drawn from a diverse district composed of multiple ethnic and socioeconomic backgrounds. For predictive validity analyses, ORF data for Grades 3-8 of the Tennessee sample were gathered during Fall of the 2022-2023 school year. For predictive validity analyses for grades 1-2, ORF data for Grade 1 of the Tennessee sample were gathered during the Spring of the 2021, and ORF data for Grade 2 was gathered during the Spring of 2022. For concurrent validity analyses, ORF data for Grades 3-8 came from the Tennessee sample and was gathered during the spring of 2023. For concurrent validity analyses for Grades 1-2, ORF data came from the Illinois Sample and was gathered during the winter of 2022-2023. All students with valid ORF and External criterion data scores meeting the criteria for our analyses were included.

*Describe the analysis procedures for each reported type of validity.: Two type of validity analyses were conducted with ORF and the external criterion measures: predictive validity and concurrent validity. Predictive Validity analyses were conducted by examining the strength of the Pearson correlation coefficient between ORF scores gathered multiple months prior to TCAP-ELA scores for the same students. For predictive validity analyses for grades 3-8 correlation coefficients were calculated between scores from ORF tests given in the Fall of 2022 (August-November) and TCAP-ELA given in the Spring (March) of 2023. For the predictive validity analysis for Grade 1, correlation coefficients were calculated between scores from ORF tests given in the Spring of 2020 and TCAP scores of the same students assessed at the end of Grade 3 in the Spring of 2023. For the predictive validity analysis for Grade 2, correlation coefficients were calculated between scores from ORF tests given in the Spring of 2021 and TCAP scores of the same students assessed at the end of Grade 3 in the Spring of 2023. Concurrent Validity analyses were conducted with the Illinois Sample for Grades 1-2 and the Tennessee Sample for Grades 3-8, by examining the strength of the Pearson correlation coefficient between ORF scores and the external criterion measures given within 2 months of each other. For the concurrent validity analysis for Grades 1 and 2 correlation coefficients were calculated between MAP Growth Reading scores in the Winter of 2023 (end of January to beginning of February) and ORF scores around the same time (January 2023). For the concurrent validity analysis for Grades 3-8, correlation coefficients were calculated between scores from ORF tests given in the Spring of 2023 and TCAP-ELA given in the spring (March) of 2023. For predictive and concurrent validity analyses, 95% confidence intervals for the correlation coefficients using the Fischer z-transformation.

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.: Results of the predictive validity studies indicate that ORF scores have a strong positive relationship with general assessments of Reading abilities assessed in the TCAP summative assessment of ELA. This positive relationship was observed across all grade levels, especially among elementary school-aged students. Furthermore, results in grades 1 and 2 support the validity of ORF as measuring fundamental reading abilities associated with the development of reading skills relevant to end-of-years state summative assessments like the TCAP-ELA. Results of the concurrent validity analyses, especially for Grades 3-8 further indicate that ORF is a valid measure of passage reading associated with reading skills assessed on the TCAP-ELA. Concurrent validity coefficients for Grades 1-2, indicate a stronger positive relationship between ORF and MAP Growth Reading scores in 2nd grade than 1st grade, which may reflect a broader range of reading and language skills assessed on Grade 1 MAP assessments that may not be directly related to ORF passage reading skills. Similarly, trends of predictive and concurrent validity coefficients being stronger for younger grade levels than higher grade levels (e.g. Grades 6-8) may indicate how oral reading skills are more indicative of the kinds of reading practices emphasized in elementary school, and a relatively foundational compared to higher-level reading comprehension abilities emphasized on summative state assessments like the TCAP.

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Bias Analysis

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating	Not Provided	Not Provided	Not Provided	Not Provided	Not Provided	Not Provided	Not Provided	Not Provided

Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.: Yes

If yes,
a. Describe the method used to determine the presence or absence of bias:: ORF passages are written to be a fair assessment of reading accuracy and speed for all students, and the identification of reading errors is a critical component of progress monitoring with ORF. Therefore, bias Analyses focused on evaluating how passages are composed at the word level, to ensure that the words chosen are of equal difficulty to students of different genders or backgrounds associated with differing race/ethnicity groups. DIF analyses for ORF tests were conducted using the logistic regression method. Items were evaluated by the delta R2, the differences in Nagelkerke's R2 coefficients. The effect sizes are classified as "negligible", "moderate" or "large" based on Zumbo and Thomas (1997). In this analysis, items with “moderate” or “large” effect sizes were flagged as DIF items. Diff analyses were performed on all ORF parallel forms, with words of the passages represent "items". Therefore, this approach examines the extent to which the words used within our ORF passages assess reading fairly between gender and ethnicity groups.

b. Describe the subgroups for which bias analyses were conducted:: DIF analyses were performed in two ways to examine test performance within two demographic categories: gender and race/ethnicity. First, we compared results between students identified as male or female. Second, we investigated evidence of DIF between the following racial/ethnic subgroups: White, Black, Hispanic, Asian, and Other (including unknown, multiple races, and other frequently reported ethnicity groups). The total student sample in DIF analyses across Grades 1-8 included 355,540 students across 1833 school districts for DIF analyses based on gender (52% Male; 48% Female) and 302,509 students across 1560 districts for DIF analyses based on ethnicity (2.5% Asian, 18.3% Black, 16.9% Hispanic, 53% White, and 9.21% Other).

c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.: Word reading accuracy and the identification of common reading miscues is a critical feature of testing with ORF, especially for progress monitoring students with reading difficulties and disabilities. ORF scores equal the number of words read correctly from a passage within a 1-minute time limit. Results indicate that the text of ORF passages provide a fair assessment of reading accuracy across gender and racial/ethnic groups. At all grade levels over 99% of items on average showed no or negligible evidence of DIF between students identified as male and females or between racial/ethnic groups. The average proportion of items showing either moderate or large effect sizes ranged across grade levels from 0.0018-0.0096 for DIF analyses based on gender, and 0.0011-0.0052 for Diff analyses based on race/ethnicity.

Growth Standards

Sensitivity: Reliability of Slope

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Describe the sample, including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.: aimswebPlus progress monitoring data from ORF tests administered during the 2022-2023 school year was analyzed to identify student cases with sufficient data for our reliability of slope analyses. First, we only included student cases where students used on-grade-level ORF forms. Next, we identified students at each grade level with at least 10 ORF test scores gathered at regular intervals across at least 20 weeks. From these students, included only students with a baseline data point (1st ORF test score) below the 30% percentile according to national norms.

Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).: Students included in the reliability analysis had completed at least 10 ORF forms, with some student's schedules completing up to 20 unique progress monitoring forms. Testing intervals were no shorter than once per week and up to once per month. Students with gaps between testing sessions longer than 40 days were excluded from the analysis. All students included in the analysis completed testing schedules over a span of at least 20 weeks.

Describe the analysis procedures.: Reliability of slope analyses for ORF progress monitoring were conducted separately for each grade level, using a mixed effects regression model. The ‘lme4’ package in R was used to model the average rate of improvement (ROI) in ORF test scores observed across time (fixed effect) while accounting for individual variation in baseline scores (random intercept) and ROIs (random slopes) observed in each individual student's data. The resulting model was used to extract the true score variance of the slope, representing the true variability in score changes detected by ORF in students over time, and the total variance in slope, which includes both the true score variance, individual variation (random effects), and other forms of variance. The reliability of slope coefficient was calculated to be the ration of true score variance to total slope variance in the model.

In the table below, report reliability of the slope (e.g., ratio of true slope variance to total slope variance) by grade level (if relevant).

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?: No

If yes, fill in data for each subgroup with disaggregated reliability of the slope data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:

Provide citations for additional published studies.

Sensitivity: Validity of Slope

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.

Describe the sample(s), including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.

Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).

Describe the analysis procedures for each reported type of validity.

In the table below, report predictive validity of the slope (correlation between the slope and achievement outcome) by grade level (if relevant).
NOTE: The TRC suggests controlling for initial level when the correlation for slope without such control is not adequate.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published validity studies:

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.

Do you have validity of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?

If yes, fill in data for each subgroup with disaggregated validity of the slope data.

Type of	Subscale	Subgroup	Informant	Age / Grade	Test or Criterion	n (sample/ examinees)	n (raters)	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published validity studies:

Provide citations for additional published studies.

Alternate Forms

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

Describe the sample for these analyses, including size and characteristics:

What is the number of alternate forms of equal and controlled difficulty?

If IRT based, provide evidence of item or ability invariance

If computer administered, how many items are in the item bank for each grade level?

If your tool is computer administered, please note how the test forms are derived instead of providing alternate forms:

Decision Rules: Setting & Revising Goals

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

In your manual or published materials, do you specify validated decision rules for how to set and revise goals?: Yes
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules? NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.

Decision Rules: Changing Instruction

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

In your manual or published materials, do you specify validated decision rules for when changes to instruction need to be made?
If yes, specify the decision rules:

What is the evidentiary basis for these decision rules? NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.

Summary

Tool Information
Descriptive Information
Administration
Training & Scoring
Benchmarks

Performance Level
Reliability
Validity
Bias Analysis

Growth Standards
Sensitivity
Alternate Forms
Decision Rules

Data Collection Practices

aimswebPlus ReadingOral Reading Fluency

Summary

Tool Information

Descriptive Information

Acquisition and Cost Information

Administration

Training & Scoring

Training

Scoring

Rates of Improvement and End of Year Benchmarks

Performance Level

Reliability

Validity

Bias Analysis

Growth Standards

Sensitivity: Reliability of Slope

Sensitivity: Validity of Slope

Alternate Forms

Decision Rules: Setting & Revising Goals

Decision Rules: Changing Instruction

Data Collection Practices

aimswebPlus Reading
Oral Reading Fluency