Amira ISIP
Reading

Summary

Amira has two essential jobs: (1) provide teachers with supplemental instructional materials and student data that bridge science of reading professional development to classroom execution, and (2) engage students in the "time on the tongue" needed to close identified reading skill gaps. Amira’s personalized learning software listens to students read aloud, identifies students at risk for dyslexia, continuously assesses reading mastery, and delivers individualized tutoring. Our solution empowers the learning community, from students to paraprofessionals, teachers, parents, and administrators, with AI-powered tools for real-time results. Using Amira, teachers can continuously assess and monitor student progress in oral language, phonological awareness, phonics, fluency, vocabulary, and comprehension. Amira optimizes instructional strategies and resource allocation to provide a comprehensive solution. Any time a student reads with Amira, all stakeholders receive up-to-the-moment feedback that is critical for teachers to make the right intervention-support decisions.

Where to Obtain:
Amira Learning, Inc.
orders@amiralearning.com
5214F Diamond Heights Blvd #3255, San Francisco, CA 94131
650-455-4380
www.amiralearning.com
Initial Cost:
$7.50 per student
Replacement Cost:
$7.50 per student per year
Included in Cost:
Amira is a comprehensive and holistic assessment, instruction, and tutoring suite. Assess ($7.50 per student per year): benchmark, dyslexia screening, and ongoing progress monitoring to diagnose skill gaps. Instruct ($12.50 per student per year): lesson plans prescribed and delivered based on targeted needs that can be linked to each district’s high-quality instructional materials. Tutor ($15 per student per year): combined literacy instruction and practice during school or through extended learning opportunities. Amira Full Suite (includes Assess, Instruct, and Tutor): $35 per student per year. These student license costs include the screening tool software, access for students and district/school personnel, virtual (live and asynchronous) professional development, and foundational data for implementation and monitoring. Bulk discounts are available depending on the number of student licenses needed by states/districts.
Amira software is accessibility ready, adhering to the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA and following best practices for UX development, which ensures that content is accessible and enhances usability for all users. Amira is also SOC 2 Type 2 certified. All tasks in Amira’s English suite of assessments and practice are also available in Spanish. Details on Amira’s accommodations are available in the Teacher Manual (2024), pages 16-31, at https://go.amiralearning.com/hubfs/Assessment/Amira%20Teacher%20Manual%20(2024).pdf.
Training Requirements:
Administrators are encouraged to attend or complete the 45-minute online asynchronous training, “Getting Started with Amira,” prior to initial administration. Following administration, administrators may access self-paced asynchronous training on Amira Academy, including training on understanding data. Amira handles most of the tasks that are typically difficult for teachers: the software acts as a proctor, guiding the student through each task; it serves as a highly adept support technician, identifying hardware and software issues that may impact the assessment process; and it produces consistent, comprehensive scoring of the items, providing the teacher with a framework for evaluating outputs.
Qualified Administrators:
No minimum qualifications specified.
Access to Technical Support:
Assessment Format:
Scoring Time:
  • Scoring is automatic OR
  • 0 minutes per student
Scores Generated:
  • Raw score
  • Standard score
  • Percentile score
  • Grade equivalents
  • IRT-based score
  • Developmental benchmarks
  • Developmental cut points
  • Equated
  • Lexile score
  • Error analysis
  • Composite scores
  • Subscale/subtest scores
Administration Time:
  • 20 minutes per student
Scoring Method:
  • Manually (by hand)
  • Automatically (computer-scored)
Technology Requirements:
  • Computer or tablet
  • Internet connection
Accommodations:
Amira software is accessibility ready, adhering to the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA and following best practices for UX development, which ensures that content is accessible and enhances usability for all users. Amira is also SOC 2 Type 2 certified. All tasks in Amira’s English suite of assessments and practice are also available in Spanish. Details on Amira’s accommodations are available in the Teacher Manual (2024), pages 16-31, at https://go.amiralearning.com/hubfs/Assessment/Amira%20Teacher%20Manual%20(2024).pdf.

Descriptive Information

Please provide a description of your tool:
Amira has two essential jobs: (1) provide teachers with supplemental instructional materials and student data that bridge science of reading professional development to classroom execution, and (2) engage students in the "time on the tongue" needed to close identified reading skill gaps. Amira’s personalized learning software listens to students read aloud, identifies students at risk for dyslexia, continuously assesses reading mastery, and delivers individualized tutoring. Our solution empowers the learning community, from students to paraprofessionals, teachers, parents, and administrators, with AI-powered tools for real-time results. Using Amira, teachers can continuously assess and monitor student progress in oral language, phonological awareness, phonics, fluency, vocabulary, and comprehension. Amira optimizes instructional strategies and resource allocation to provide a comprehensive solution. Any time a student reads with Amira, all stakeholders receive up-to-the-moment feedback that is critical for teachers to make the right intervention-support decisions.
The tool is intended for use with the following grade(s).
not selected Preschool / Pre - kindergarten
selected Kindergarten
selected First grade
selected Second grade
selected Third grade
selected Fourth grade
selected Fifth grade
selected Sixth grade
not selected Seventh grade
not selected Eighth grade
not selected Ninth grade
not selected Tenth grade
not selected Eleventh grade
not selected Twelfth grade

The tool is intended for use with the following age(s).
not selected 0-4 years old
selected 5 years old
selected 6 years old
selected 7 years old
selected 8 years old
selected 9 years old
selected 10 years old
selected 11 years old
not selected 12 years old
not selected 13 years old
not selected 14 years old
not selected 15 years old
not selected 16 years old
not selected 17 years old
not selected 18 years old

The tool is intended for use with the following student populations.
selected Students in general education
selected Students with disabilities
selected English language learners

ACADEMIC ONLY: What skills does the tool screen?

Reading
Phonological processing:
selected RAN
selected Memory
selected Awareness
selected Letter sound correspondence
selected Phonics
not selected Structural analysis

Word ID
selected Accuracy
selected Speed

Nonword
selected Accuracy
selected Speed

Spelling
selected Accuracy
selected Speed

Passage
selected Accuracy
selected Speed

Reading comprehension:
selected Multiple choice questions
selected Cloze
not selected Constructed Response
selected Retell
not selected Maze
not selected Sentence verification
not selected Other (please describe):


Listening comprehension:
selected Multiple choice questions
not selected Cloze
not selected Constructed Response
selected Retell
not selected Maze
not selected Sentence verification
selected Vocabulary
selected Expressive
selected Receptive

Mathematics
Global Indicator of Math Competence
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Early Numeracy
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematics Concepts
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematics Computation
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematic Application
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Fractions/Decimals
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Algebra
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Geometry
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

not selected Other (please describe):

Please describe specific domain, skills or subtests:
BEHAVIOR ONLY: Which category of behaviors does your tool target?


BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.

Acquisition and Cost Information

Where to obtain:
Email Address
orders@amiralearning.com
Address
5214F Diamond Heights Blvd #3255, San Francisco, CA 94131
Phone Number
650-455-4380
Website
www.amiralearning.com
Initial cost for implementing program:
Cost
$7.50
Unit of cost
student
Replacement cost per unit for subsequent use:
Cost
$7.50
Unit of cost
student
Duration of license
year
Additional cost information:
Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.
Amira is a comprehensive and holistic assessment, instruction, and tutoring suite. Assess ($7.50 per student per year): benchmark, dyslexia screening, and ongoing progress monitoring to diagnose skill gaps. Instruct ($12.50 per student per year): lesson plans prescribed and delivered based on targeted needs that can be linked to each district’s high-quality instructional materials. Tutor ($15 per student per year): combined literacy instruction and practice during school or through extended learning opportunities. Amira Full Suite (includes Assess, Instruct, and Tutor): $35 per student per year. These student license costs include the screening tool software, access for students and district/school personnel, virtual (live and asynchronous) professional development, and foundational data for implementation and monitoring. Bulk discounts are available depending on the number of student licenses needed by states/districts.
Provide information about special accommodations for students with disabilities.
Amira software is accessibility ready, adhering to the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA and following best practices for UX development, which ensures that content is accessible and enhances usability for all users. Amira is also SOC 2 Type 2 certified. All tasks in Amira’s English suite of assessments and practice are also available in Spanish. Details on Amira’s accommodations are available in the Teacher Manual (2024), pages 16-31, at https://go.amiralearning.com/hubfs/Assessment/Amira%20Teacher%20Manual%20(2024).pdf.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?
not selected General education teacher
not selected Special education teacher
not selected Parent
not selected Child
not selected External observer
not selected Other
If other, please specify:

What is the administration setting?
not selected Direct observation
not selected Rating scale
not selected Checklist
not selected Performance measure
not selected Questionnaire
not selected Direct: Computerized
not selected One-to-one
not selected Other
If other, please specify:

Does the tool require technology?
Yes

If yes, what technology is required to implement your tool? (Select all that apply)
selected Computer or tablet
selected Internet connection
not selected Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?
selected Individual
selected Small group   If small group, n=4
selected Large group   If large group, n=20
selected Computer-administered
selected Other
If other, please specify:
any sized group

What is the administration time?
Time in minutes
20
per (student/group/other unit)
student

Additional scoring time:
Time in minutes
0
per (student/group/other unit)
student

ACADEMIC ONLY: What are the discontinue rules?
not selected No discontinue rules provided
selected Basals
not selected Ceilings
not selected Other
If other, please specify:


Are norms available?
Yes
Are benchmarks available?
Yes
If yes, how many benchmarks per year?
3
If yes, for which months are benchmarks available?
Aug-Nov, Dec-March, April-July; Amira ISIP offers unlimited additional administrations of progress monitoring throughout the school year in addition to benchmark assessment.
BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?
Yes
Describe the time required for administrator training, if applicable:
Administrators are encouraged to attend or complete the 45-minute online asynchronous training, “Getting Started with Amira,” prior to initial administration. Following administration, administrators may access self-paced asynchronous training on Amira Academy, including training on understanding data. Amira handles most of the tasks that are typically difficult for teachers: the software acts as a proctor, guiding the student through each task; it serves as a highly adept support technician, identifying hardware and software issues that may impact the assessment process; and it produces consistent, comprehensive scoring of the items, providing the teacher with a framework for evaluating outputs.
Please describe the minimum qualifications an administrator must possess.
selected No minimum qualifications
Are training manuals and materials available?
Yes
Are training manuals/materials field-tested?
Yes
Are training manuals/materials included in cost of tools?
Yes
If No, please describe training costs:
Can users obtain ongoing professional and technical support?
Yes
If Yes, please describe how users can obtain support:

Scoring

How are scores calculated?
selected Manually (by hand)
selected Automatically (computer-scored)
not selected Other
If other, please specify:

Do you provide basis for calculating performance level scores?
Yes
What is the basis for calculating performance level and percentile scores?
not selected Age norms
selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

What types of performance level scores are available?
selected Raw score
selected Standard score
selected Percentile score
selected Grade equivalents
selected IRT-based score
not selected Age equivalents
not selected Stanines
not selected Normal curve equivalents
selected Developmental benchmarks
selected Developmental cut points
selected Equated
not selected Probability
selected Lexile score
selected Error analysis
selected Composite scores
selected Subscale/subtest scores
not selected Other
If other, please specify:

Does your tool include decision rules?
No
If yes, please describe.
Can you provide evidence in support of multiple decision rules?
No
If yes, please describe.
Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
Amira leverages machine learning and AI to automatically score every student interaction. Interactions that Amira automatically scores include rapid automatized naming (RAN), letter sound fluency, blending, word reading, word part manipulation, spelling, listening comprehension, oral reading, multiple-choice questions, open-ended oral responses, and open-ended written responses, among others. The cornerstone of Amira’s scoring system is its ability to automatically and accurately score these varied interactions. Where necessary, Amira incorporates rubric-based scoring to further enhance precision, particularly for open-ended responses such as Amira’s dialogue-based comprehension questions. Because each activity is scored instantly and automatically upon completion, Amira can provide immediate feedback to students while maintaining a continuously updated, real-time profile of each student's progress, achievements, and instructional needs; the scoring process also immediately updates all teacher-facing reports, so that up-to-date profile is always available to educators. An integral feature of Amira’s scoring system is its mechanisms for quality and equity assurance. The first is a “meta-analysis” conducted by our machine learning models, which identifies discrepancies that may indicate a misrepresentation of a student’s true abilities and flags activities that may not represent a student’s best effort. The second is a set of automated data integrity tests that ensure data quality and consistency across the locations where data is stored. The third, and critical, mechanism is that recordings of activities are always available to educators should they want to listen and adjust any of Amira’s scoring: educators may click on an activity in any of Amira’s reports to bring up a recording of the activity and adjust scores as needed. Amira’s scoring system rigorously and universally adheres to the following principles: (a) the scoring of ALL items is visible and transparent to teachers/proctors; there is no black box, and educators can see every item taken and every Amira score; (b) educators can listen to student responses, enabling 100% auditability of Amira’s scoring; and (c) educators/teachers/proctors can correct Amira’s scoring manually. If desired, a state/district can allow educators to have the final word on the scoring of any student’s assessment, with the ability to override Amira’s scoring item by item.
Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
Amira’s voice-administered system is fully validated and adaptable for students in kindergarten through 6th grade, ensuring it meets the requirements for effective early literacy assessment across various grade levels and student needs. The system’s validation process includes rigorous testing to ensure that it provides accurate and reliable results, even when administered to young learners, including those with special needs or limited prior education. Amira’s mode of administration is designed to be flexible and inclusive, allowing for both digital, AI-driven assessments and partially teacher-facilitated modes. This adaptability ensures that the Screener is appropriate for students in early elementary grades, including those with specific educational needs. The following evidence demonstrates how Amira ISIP has been validated across diverse student populations: Voice Administration Validation: Amira’s voice-enabled assessment system has been validated specifically for students in K-6. The model training and evaluation process leverages an extensive dataset of over 10,000 hours of children's speech, carefully balanced to ensure representation across diverse populations representative of national demographics along dimensions of gender, race/ethnicity, socioeconomic status, ELL/MLL students, and accent/dialect group. These studies confirmed that the system could accurately capture and assess reading performance, with a classification accuracy of over 90% across these grade levels. This validation is crucial for ensuring that the assessments provide reliable data, even in a fully digital, AI-driven mode where the system autonomously guides students through tasks. Consistency and Standardization: The consistency and standardization provided by Amira’s AI reduce the risk of bias and human error, aligning with best practices for large-scale assessments (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014). Amira’s machine-proctored model is supported by extensive research demonstrating that AI-driven assessments can offer greater consistency and reliability than human-proctored ones. Human proctoring introduces variability due to differences in proctor training and execution, which can lead to measurement error and bias (Kane, 2013). In contrast, Amira’s AI provides a uniform testing experience, ensuring that every student is assessed under the same conditions, thereby upholding the integrity of the results (Wilson & Draney, 2002). At the individual item scoring level, currently the model being used in production has a 96% agreement with human expert judgment. To benchmark this number, human inter-rater reliability metrics for the same assessment range between 96-97%. At the aggregate assessment score level, there is an extremely high correlation (0.95 - 0.98, depending on the grade level) between the model scores and the same assessments scored by human experts. Proctoring and Administration for Kindergarteners: In both English and Spanish, Amira has proctored and administered millions of screenings. The system has been effectively used in thousands of classrooms, including those in more than 1,200 districts across all 50 states. The proctoring model for kindergarteners emphasizes small group administration, with a teacher serving as the meta-proctor. In this model, the teacher helps students log in, creating a staggered start that aids in managing the process. 
Once logged in, Amira leads students through a simple, voice-based dialog to ensure the environment is test-ready. This model allows Amira to function as an effective teacher’s assistant, handling most situations independently and alerting the teacher only when necessary. Adaptability to Student Needs: The Screener can be administered in various modes to accommodate different student needs. For younger students or those requiring additional support, Amira can operate in a small group, partially teacher-facilitated mode. In this setup, the teacher facilitates the assessment, while Amira provides automated guidance and feedback. Research supports this approach, indicating that younger children and those with specific needs benefit from the combination of human interaction and AI-driven assessment, which offers reassurance and allows for immediate intervention. Developmental Alignment and Task Appropriateness: Amira’s tasks are carefully aligned with the developmental milestones of K-6 students, ensuring the assessments are both appropriate and effective for identifying reading difficulties, including dyslexia risk at each stage. For example, tasks in kindergarten focus on phonological awareness, while those in 2nd grade emphasize reading fluency. This developmental alignment ensures that the Screener is sensitive to the literacy skills typical of each age group, providing a fair and relevant assessment regardless of the student’s prior education. Evidence of Effectiveness in Diverse Settings: Amira has been implemented and validated in various educational settings, including urban schools in large states such as California and Oklahoma where it was used in whole-classroom settings. Across the country, Amira has been administered in over 500,000 sessions, spanning diverse environments such as urban, suburban, and rural schools. These include Title I schools, charter schools, and districts serving large populations of English language learners and students with disabilities. The system’s predictive validity has been confirmed in these environments, demonstrating that it can provide accurate assessments even when administered to entire classrooms of young students. This large-scale validation ensures that Amira’s fully digital mode is not only effective but also scalable for broad application across different types of schools and student populations. Accommodations for Special Needs: Amira employs a Universal Design for Learning (UDL) approach to ensure that the Screener is accessible to all students, including those with disabilities. The system provides a range of accommodations, including the option for a fully proctored, one-on-one assessment or the use of paper-based alternatives like the TPRI or Tejas Lee for students who may not be well-suited to digital assessments. To date, Amira has been validated with over 7,000 students, including those requiring specific accommodations, with approximately 20% utilizing options such as one-on-one proctoring or bilingual support. This extensive validation confirms that the system reliably accommodates and accurately assesses students with various needs, including those with visual or auditory impairments, English language learners, and students with individualized education programs (IEPs). Amira’s UDL approach includes features that are WCAG 2.1 Level AA compliant, ensuring accessibility for students with disabilities. 
These accommodations have been tested and validated across multiple studies to ensure that they do not compromise the accuracy or reliability of the assessments. This commitment to accessibility has allowed Amira to meet the needs of a diverse student population while maintaining high standards of assessment validity. Use with English Learners (ELs): Amira’s Screener is configurable for English language learners, offering the option to screen in English, Spanish, or both. This flexibility is supported by research indicating that assessing literacy skills in both the native language and English can provide a more accurate picture of a student’s abilities. Amira’s ability to offer proctoring in Spanish or administer the assessment in a hybrid mode ensures that ELs receive an equitable assessment experience. In summary, Amira’s voice administration is fully validated and adaptable for all K-6 students, including those with special needs or limited prior education. The system’s design and extensive validation across diverse populations ensure that it is both effective and appropriate for early literacy assessment in various educational settings. The flexibility in administration modes, combined with robust evidence of accuracy and reliability, makes Amira an ideal tool for assessing young learners.

Technical Standards

Classification Accuracy & Cross-Validation Summary

Grade Kindergarten
Grade 1
Grade 2
Grade 3
Grade 4
Grade 5
Grade 6
Classification Accuracy Fall Partially convincing evidence Convincing evidence Convincing evidence Convincing evidence Partially convincing evidence Partially convincing evidence Partially convincing evidence
Classification Accuracy Winter Convincing evidence Convincing evidence Convincing evidence Convincing evidence Partially convincing evidence Partially convincing evidence Data unavailable
Classification Accuracy Spring Convincing evidence Convincing evidence Convincing evidence Convincing evidence Partially convincing evidence Partially convincing evidence Partially convincing evidence
Legend
Full BubbleConvincing evidence
Half BubblePartially convincing evidence
Empty BubbleUnconvincing evidence
Null BubbleData unavailable
dDisaggregated data available

NWEA MAP

Classification Accuracy

Select time of year
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
The criterion outcome measure used in this study was the NWEA MAP Reading assessment, specifically the RIT scores from the end-of-year administration. This measure was selected for its well-established validity and reliability in assessing students' reading ability across a broad range of skills and grade levels. The NWEA MAP Reading assessment is entirely independent from the screening measure utilized in this study. The two assessments are developed and administered separately, and there is no overlap in test items, scoring rubrics, or administration processes. The only shared characteristic is their common purpose of measuring the construct of reading ability. This shared construct ensures alignment between the screening and outcome measures without introducing direct dependency, thereby supporting the validity of the classification accuracy study.
Do the classification accuracy analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
The degree to which the Amira Screener can accurately identify students who need intensive intervention was evaluated using classification accuracy statistics based on the Amira cut scores, which determine whether students are classified by their ARM scores as at-risk or not-at-risk, and the criterion measure cut scores, which indicate whether students actually need intensive intervention. The classification accuracy analysis was conducted as follows. Each student's (a) ARM score was compared with the candidate ARM cut score, and (b) their score on the criterion measure was compared with the criterion measure cut score; the student was then assigned to one of four designations:
  • TP: Students classified by the screener as "At-Risk" who are actually "At-Risk"
  • FP: Students classified by the screener as "At-Risk" who are actually "Not At-Risk"
  • FN: Students classified by the screener as "Not At-Risk" who are actually "At-Risk"
  • TN: Students classified by the screener as "Not At-Risk" who are actually "Not At-Risk"
The designations were aggregated to obtain total counts in each cell for students in the sample, and the following statistics were computed:
  • Classification Accuracy Rate = (TP + TN) / (total sample size): proportion of the study sample whose classification by the ARM cut scores was consistent with classification by the criterion measure.
  • False Negative (FN) Rate = FN / (FN + TP): proportion of "at-risk" students identified as "not at-risk" by the ARM score.
  • False Positive (FP) Rate = FP / (FP + TN): proportion of "not at-risk" students identified as "at-risk" by the ARM score.
  • Positive Predictive Value (PPV) = TP / (TP + FP): proportion of students identified as "at-risk" by ARM scores who are actually at risk.
  • Negative Predictive Value (NPV) = TN / (TN + FN): proportion of students identified as "not at-risk" by ARM scores who are actually not at risk.
  • Sensitivity = TP / (TP + FN): proportion of "at-risk" students identified as "at-risk" by the ARM score.
  • Specificity = TN / (TN + FP): proportion of "not at-risk" students identified as "not at-risk" by ARM scores.
  • Area Under the Curve (AUC): the area under the receiver operating characteristic (ROC) curve, reported with the lower (AUC-LB) and upper (AUC-UB) bounds of the 95% confidence interval. Confidence intervals were calculated using a 1,000-sample bootstrap method to construct a two-sided interval. AUC measures how well ARM scores separate the study sample into "at-risk" and "not at-risk" categories that match those from the criterion measure cut scores.
The cut points for the Amira Screener were determined to align closely with the identification of students at risk of requiring intensive intervention, as defined by their performance on the criterion measure (NWEA MAP Reading assessment). Specifically, candidate cut scores were tested at the 20th, 25th, and 30th percentiles of the ARM scores for each grade and testing window. These cut points were chosen because they correspond to widely accepted thresholds for identifying students performing below grade-level expectations and, consequently, at risk of academic difficulty. The final cut points were selected based on their ability to maximize classification accuracy while minimizing the rates of false negatives and false positives, as evaluated using standard classification metrics.
Specifically, the cut points that resulted in the highest lower bound on the AUC were selected. By achieving an optimal balance of sensitivity and specificity, the selected cut scores ensure that students most in need of support are accurately identified without unnecessarily flagging those who are not at risk. The alignment between the cut points and students at risk was validated through the classification accuracy analysis, which compared ARM classifications with those based on the criterion measure. High sensitivity values demonstrated the cut points' ability to identify the majority of at-risk students, while high specificity values confirmed that students not at risk were correctly excluded. This approach ensures the cut points serve as a reliable tool for early identification and targeted intervention. The groups contrasted were students who are at risk versus students who are not at risk.
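To make the computation concrete, the following is a minimal Python sketch of how these classification statistics and the bootstrapped AUC confidence interval can be computed from paired screener and criterion scores. It is illustrative only (not Amira's production code); the scores, cut points, and variable names are hypothetical placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical paired scores: ARM screener scores and NWEA MAP RIT scores.
arm_scores = rng.normal(2.5, 1.0, size=500)                       # screener measure
map_scores = 175 + 8 * arm_scores + rng.normal(0, 6, size=500)    # criterion measure

arm_cut = 2.5   # candidate screener cut score (hypothetical)
map_cut = 183   # criterion cut score at the 20th percentile (hypothetical)

# "At-risk" means scoring below the cut on each measure.
screener_at_risk = arm_scores < arm_cut
actual_at_risk = map_scores < map_cut

tp = np.sum(screener_at_risk & actual_at_risk)
fp = np.sum(screener_at_risk & ~actual_at_risk)
fn = np.sum(~screener_at_risk & actual_at_risk)
tn = np.sum(~screener_at_risk & ~actual_at_risk)

stats = {
    "classification_accuracy": (tp + tn) / (tp + fp + fn + tn),
    "sensitivity": tp / (tp + fn),
    "specificity": tn / (tn + fp),
    "false_positive_rate": fp / (fp + tn),
    "false_negative_rate": fn / (fn + tp),
    "ppv": tp / (tp + fp),
    "npv": tn / (tn + fn),
}

# AUC: a higher "risk score" should indicate higher probability of being at risk,
# so the negated ARM score is used as the ranking score.
auc = roc_auc_score(actual_at_risk, -arm_scores)

# 95% confidence interval via a 1,000-sample bootstrap.
boot_aucs = []
n = len(arm_scores)
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    if actual_at_risk[idx].min() == actual_at_risk[idx].max():
        continue  # skip degenerate resamples containing only one class
    boot_aucs.append(roc_auc_score(actual_at_risk[idx], -arm_scores[idx]))
auc_lb, auc_ub = np.percentile(boot_aucs, [2.5, 97.5])

print(stats)
print(f"AUC = {auc:.2f} (95% CI {auc_lb:.2f}-{auc_ub:.2f})")
```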
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?
No
If yes,
Select time of year.
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
Do the cross-validation analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Classification Accuracy - Fall

Evidence Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Criterion measure NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP
Cut Points - Percentile rank on criterion measure 20 20 20 20 20 20 20
Cut Points - Performance score on criterion measure 140 158 171 183 192 200 203
Cut Points - Corresponding performance score (numeric) on screener measure -0.10 0.70 1.71 2.50 3.28 4.41 5.41
Classification Data - True Positive (a) 1074 1528 1879 1718 26 322 53
Classification Data - False Positive (b) 1193 869 723 702 475 896 6
Classification Data - False Negative (c) 326 345 459 337 10 114 22
Classification Data - True Negative (d) 4872 5652 5803 4921 1435 2314 20
Area Under the Curve (AUC) 0.79 0.84 0.85 0.86 0.73 0.73 0.74
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.77 0.83 0.84 0.85 0.70 0.71 0.61
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.80 0.85 0.86 0.87 0.78 0.75 0.87
Statistics Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Base Rate 0.19 0.22 0.26 0.27 0.02 0.12 0.74
Overall Classification Rate 0.80 0.86 0.87 0.86 0.75 0.72 0.72
Sensitivity 0.77 0.82 0.80 0.84 0.72 0.74 0.71
Specificity 0.80 0.87 0.89 0.88 0.75 0.72 0.77
False Positive Rate 0.20 0.13 0.11 0.12 0.25 0.28 0.23
False Negative Rate 0.23 0.18 0.20 0.16 0.28 0.26 0.29
Positive Predictive Power 0.47 0.64 0.72 0.71 0.05 0.26 0.90
Negative Predictive Power 0.94 0.94 0.93 0.94 0.99 0.95 0.48
Sample Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Date 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024
Sample Size 7465 8394 8864 7678 1946 3646 101
Geographic Representation East North Central (IL, IN)
Mountain (AZ)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
West South Central (LA)
Male 41.8% 47.7% 48.4%        
Female 41.1% 45.9% 46.4%        
Other              
Gender Unknown              
White, Non-Hispanic 49.9% 25.9% 15.3%        
Black, Non-Hispanic 22.1% 19.8% 19.9%        
Hispanic 11.5% 10.4% 9.7%        
Asian/Pacific Islander 4.2% 3.5% 3.5%        
American Indian/Alaska Native              
Other 6.8% 10.1% 2.9%        
Race / Ethnicity Unknown              
Low SES 6.7% 6.3% 6.5%        
IEP or diagnosed disability 9.4% 11.6% 12.2%        
English Language Learner 13.7% 13.3% 13.0%        

Classification Accuracy - Winter

Evidence Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5
Criterion measure NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP
Cut Points - Percentile rank on criterion measure 20 20 20 20 20 20
Cut Points - Performance score on criterion measure 140 158 171 183 192 200
Cut Points - Corresponding performance score (numeric) on screener measure 0.21 0.94 1.97 2.92 3.58 4.71
Classification Data - True Positive (a) 1111 926 1400 1563 64 173
Classification Data - False Positive (b) 1317 850 875 574 545 1045
Classification Data - False Negative (c) 218 172 313 319 16 42
Classification Data - True Negative (d) 5497 6392 6138 7655 1320 2385
Area Under the Curve (AUC) 0.82 0.86 0.85 0.87 0.75 0.75
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.81 0.85 0.84 0.85 0.72 0.72
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.83 0.88 0.86 0.88 0.78 0.77
Statistics Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5
Base Rate 0.16 0.13 0.20 0.19 0.04 0.06
Overall Classification Rate 0.81 0.88 0.86 0.91 0.71 0.70
Sensitivity 0.84 0.84 0.82 0.83 0.80 0.80
Specificity 0.81 0.88 0.88 0.93 0.71 0.70
False Positive Rate 0.19 0.12 0.12 0.07 0.29 0.30
False Negative Rate 0.16 0.16 0.18 0.17 0.20 0.20
Positive Predictive Power 0.46 0.52 0.62 0.73 0.11 0.14
Negative Predictive Power 0.96 0.97 0.95 0.96 0.99 0.98
Sample Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5
Date 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024
Sample Size 8143 8340 8726 10111 1945 3645
Geographic Representation East North Central (IL, IN)
Mountain (AZ)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
Male 38.3% 48.0% 49.2%      
Female 37.7% 46.2% 47.1%      
Other            
Gender Unknown            
White, Non-Hispanic 45.7% 26.0% 15.5%      
Black, Non-Hispanic 20.3% 19.9% 20.2%      
Hispanic 10.6% 10.4% 9.8%      
Asian/Pacific Islander 3.9% 3.5% 3.6%      
American Indian/Alaska Native            
Other 6.2% 10.2% 2.9%      
Race / Ethnicity Unknown            
Low SES 6.1% 6.4% 6.6%      
IEP or diagnosed disability 8.7% 11.6% 12.4%      
English Language Learner 12.6% 13.4% 13.2%      

Classification Accuracy - Spring

Evidence Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Criterion measure NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP NWEA MAP
Cut Points - Percentile rank on criterion measure 20 20 20 20 20 20 20
Cut Points - Performance score on criterion measure 140 158 171 183 192 200 203
Cut Points - Corresponding performance score (numeric) on screener measure 0.47 1.43 2.19 3.19 3.82 4.97 5.97
Classification Data - True Positive (a) 1096 1210 1497 1290 88 225 63
Classification Data - False Positive (b) 1209 746 794 545 512 666 7
Classification Data - False Negative (c) 208 289 328 238 25 83 13
Classification Data - True Negative (d) 5572 6145 6190 5579 1291 1672 18
Area Under the Curve (AUC) 0.83 0.85 0.85 0.88 0.75 0.72 0.77
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.82 0.84 0.84 0.87 0.72 0.70 0.63
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.84 0.86 0.87 0.89 0.77 0.75 0.89
Statistics Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Base Rate 0.16 0.18 0.21 0.20 0.06 0.12 0.75
Overall Classification Rate 0.82 0.88 0.87 0.90 0.72 0.72 0.80
Sensitivity 0.84 0.81 0.82 0.84 0.78 0.73 0.83
Specificity 0.82 0.89 0.89 0.91 0.72 0.72 0.72
False Positive Rate 0.18 0.11 0.11 0.09 0.28 0.28 0.28
False Negative Rate 0.16 0.19 0.18 0.16 0.22 0.27 0.17
Positive Predictive Power 0.48 0.62 0.65 0.70 0.15 0.25 0.90
Negative Predictive Power 0.96 0.96 0.95 0.96 0.98 0.95 0.58
Sample Kindergarten Grade 1 Grade 2 Grade 3 Grade 4 Grade 5 Grade 6
Date 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024 2023-2024
Sample Size 8085 8390 8809 7652 1916 2646 101
Geographic Representation East North Central (IL, IN)
Mountain (AZ)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
East North Central (IL)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (AZ)
Pacific (CA)
South Atlantic (MD)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
Mountain (NV)
South Atlantic (SC)
West South Central (OK)
West South Central (LA)
Male 38.6% 47.7% 48.7%        
Female 37.9% 45.9% 46.7%        
Other              
Gender Unknown              
White, Non-Hispanic 46.1% 25.9% 15.4%        
Black, Non-Hispanic 20.4% 19.8% 20.0%        
Hispanic 10.7% 10.4% 9.7%        
Asian/Pacific Islander 3.9% 3.5% 3.5%        
American Indian/Alaska Native              
Other 6.3% 10.1% 2.9%        
Race / Ethnicity Unknown              
Low SES 6.2% 6.3% 6.5%        
IEP or diagnosed disability 8.7% 11.6% 12.3%        
English Language Learner 12.7% 13.3% 13.1%        

Reliability

Grade Kindergarten
Grade 1
Grade 2
Grade 3
Grade 4
Grade 5
Grade 6
Rating Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence
Legend
Full BubbleConvincing evidence
Half BubblePartially convincing evidence
Empty BubbleUnconvincing evidence
Null BubbleData unavailable
dDisaggregated data available
*Offer a justification for each type of reliability reported, given the type and purpose of the tool.
Reliability refers to the relative stability with which a test measures the same skills across minor differences in conditions. Two types of reliability are reported in the table below: parallel forms reliability and Cronbach’s coefficient alpha. Parallel forms reliability is crucial for ensuring the consistency of the Amira Progress Monitoring assessment. This analysis measures the consistency of results across different assessment forms, which is essential for accurately tracking student growth, since students receive a different form each time they take a progress monitoring assessment. By confirming that the forms are equivalent, we can ensure that any observed improvements in student scores are due to actual learning, not to differences in the complexity or difficulty of the test forms. The coefficient reported is the average correlation among alternate forms of the measure; high alternate-form reliability coefficients suggest that the multiple forms are measuring the same construct. Coefficient alpha, commonly known as Cronbach's alpha, is a measure of internal consistency reliability used widely in education research and other fields. It estimates the proportion of total variance in a set of scores that is attributable to true score variance, reflecting the reliability of the measurement.
*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
The samples used to establish reliability include students who tested in the 2023-2024 school year. Both samples encompassed dozens of districts across the country for each grade. These districts were selected to emulate the diversity and variation of the national population of students and are representative across a variety of dimensions, including school type, socioeconomic status, geographic region, gender, race, and ethnicity. Students in the parallel forms reliability sample each took two different Progress Monitoring forms within the same one-week window. Students in the internal consistency analyses were those who had taken at least 5 forms (instances) of Progress Monitoring across the 2023-2024 school year.
*Describe the analysis procedures for each reported type of reliability.
To assess parallel forms reliability, two forms of the assessment were administered to the same group of students within one week of each other. The scores obtained on the two forms were then correlated to assess the degree of consistency between them. We measure these correlations using Pearson’s correlation coefficient, a measure of the strength of the linear relationship between two variables. The practical significance of the reliability coefficients was evaluated as follows: poor (0−0.39), adequate (0.40−0.59), good (0.60−0.79), and excellent (0.80−1.0). These thresholds are arbitrary but conventionally used, and they provide a useful heuristic for interpreting the reliability data. Confidence intervals were then calculated for the correlation coefficients computed across distinct pairs of forms. To obtain an estimate of internal consistency reliability, Cronbach's alphas were calculated for students who had taken at least 5 forms of the Progress Monitoring assessment over the year. The 95% confidence interval of each reliability metric is computed using the bootstrap method, in which 1,000 samples with replacement are drawn from the data and the 2.5% and 97.5% quantiles are calculated and reported.
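As an illustration of the procedures described above, the sketch below computes a parallel forms Pearson correlation and Cronbach's alpha over a students-by-forms score matrix, with a bootstrap confidence interval. It is a simplified, hypothetical Python example; the data and variable names are placeholders rather than Amira's actual analysis pipeline.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# --- Parallel forms reliability (hypothetical data) ---
# Scores from two Progress Monitoring forms taken by the same students
# within a one-week window.
true_ability = rng.normal(0, 1, size=400)
form_a = true_ability + rng.normal(0, 0.4, size=400)
form_b = true_ability + rng.normal(0, 0.4, size=400)
r, _ = pearsonr(form_a, form_b)
print(f"Parallel forms r = {r:.2f}")

# --- Cronbach's alpha (internal consistency) ---
def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: students x forms matrix of scale scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Students with at least 5 Progress Monitoring forms across the year (hypothetical).
forms = np.column_stack([true_ability + rng.normal(0, 0.4, size=400) for _ in range(5)])
alpha = cronbach_alpha(forms)

# 95% CI via bootstrap: resample students 1,000 times, take the 2.5% / 97.5% quantiles.
n = len(forms)
boot = [cronbach_alpha(forms[rng.integers(0, n, size=n)]) for _ in range(1000)]
lb, ub = np.percentile(boot, [2.5, 97.5])
print(f"Cronbach's alpha = {alpha:.2f} (95% CI {lb:.2f}-{ub:.2f})")
```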

*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval
Lower Bound
95% Confidence Interval
Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval
Lower Bound
95% Confidence Interval
Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.

Validity

Grade Kindergarten
Grade 1
Grade 2
Grade 3
Grade 4
Grade 5
Grade 6
Rating Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence Convincing evidence
Legend
Full BubbleConvincing evidence
Half BubblePartially convincing evidence
Empty BubbleUnconvincing evidence
Null BubbleData unavailable
dDisaggregated data available
*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
Concurrent validity measures how well Amira scores correlate with the scores of another test that is administered at the same time and is already established as valid for measuring the same construct. Predictive validity refers to the extent to which scores on the Amira assessment can accurately predict future performance on a related outcome or criterion. The external assessments used in these studies are the i-Ready Reading Diagnostic and the NWEA MAP Reading assessment. Both are nationally normed, computer-adaptive measures of reading ability that are widely used in many states and have established validity studies of their own.
*Describe the sample(s), including size and characteristics, for each validity analysis conducted.
The samples include students who tested in the 2022-2023 school year, drawn from hundreds of districts across the country. These districts were selected to emulate the diversity and variation of the national population of students and are representative across a variety of dimensions, including school type, socioeconomic status, geographic region, gender, race, and ethnicity. Sample sizes for each validity study vary by testing window, grade, and criterion measure, ranging from 988 to 5,643.
*Describe the analysis procedures for each reported type of validity.
Concurrent validity was established by correlating Amira’s Reading Mastery (ARM) scores from students in grades K through 6 who took both an Amira assessment and the external measure within the same two-week window of one another. The predictive validity of Amira was examined by correlating Amira’s assessment scores taken during the beginning of the year (Fall) window to scores from external measures taken at the end of the school year (Spring). In both forms of validity, the relationship between Amira’s scores and the external criterion measure was evaluated using Pearson’s correlation coefficient. Coefficients were calculated using bootstrap sampling across 100 random samples, and median correlation coefficients as well as 95% confidence intervals on the correlation coefficients are reported. All median and lower-bound correlation coefficients are 0.70 or higher, indicating a strong positive linear relationship between Amira and the external measure.
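The short Python sketch below illustrates, with hypothetical data, the bootstrap procedure described above: Pearson correlations between Amira (ARM) scores and an external criterion are computed across 100 random bootstrap samples, and the median coefficient and a 95% confidence interval are reported. All data and variable names are placeholders, not the actual study data.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)

# Hypothetical paired scores: Fall ARM scores and Spring criterion scores for the
# same students (predictive validity); for concurrent validity the two measures
# would instead be taken within the same two-week window.
arm_fall = rng.normal(2.0, 1.0, size=1200)
criterion_spring = 180 + 9 * arm_fall + rng.normal(0, 7, size=1200)

n = len(arm_fall)
coeffs = []
for _ in range(100):                        # 100 bootstrap samples
    idx = rng.integers(0, n, size=n)
    r, _ = pearsonr(arm_fall[idx], criterion_spring[idx])
    coeffs.append(r)

median_r = np.median(coeffs)
lb, ub = np.percentile(coeffs, [2.5, 97.5])
print(f"median r = {median_r:.2f}, 95% CI [{lb:.2f}, {ub:.2f}]")
```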

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval
Lower Bound
95% Confidence Interval
Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
Yes
Provide citations for additional published studies.
https://www.amiralearning.com/amira-technical-guide.html
Rice, M. L., & Hoffman, L. (2015). Predicting vocabulary growth in children with and without specific language impairment: A longitudinal study from 2;6 to 21 years of age. Journal of Speech, Language, and Hearing Research, 58(2), 345–359.
Boscardin, C. K., Muthén, B., Francis, D. J., & Baker, E. L. (2008). Early identification of reading difficulties using heterogeneous developmental trajectories. Journal of Educational Psychology, 100(1), 192.
Describe the degree to which the provided data support the validity of the tool.
All results show a correlation of 0.7 or higher (strong correlation) between Amira’s Progress Monitoring scores and external criterion scores, so the provided data support the validity of the tool to a high degree.
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval
Lower Bound
95% Confidence Interval
Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
No
Provide citations for additional published studies.

Bias Analysis

Grade Kindergarten
Grade 1
Grade 2
Grade 3
Grade 4
Grade 5
Grade 6
Rating Provided Provided Provided Provided Provided Provided Provided
Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
Yes
If yes,
a. Describe the method used to determine the presence or absence of bias:
We conducted a Differential Item Functioning (DIF) analysis using the Zumbo & Thomas (ZT) classification system with logistic regression implemented in the difR package in R software. The analysis examined 2,598 items across 10 subtests spanning grades Pre-K through 8, using a sample of 65,000 students from three large districts across three states. Items were classified using Nagelkerke's R² effect size thresholds: A items (negligible DIF) ≤ 0.13, B items (slight to moderate DIF) > 0.13 but ≤ 0.26, and C items (moderate to large DIF) > 0.26. To ensure robust DIF detection, items with fewer than 100 responses were excluded from analysis. DIF was examined using both overall reading ability scores and subscale scores as matching criteria to validate findings across different ability matching approaches. Following statistical analysis, curriculum experts reviewed all B and C flagged items to determine whether observed DIF represented construct-irrelevant variance (bias) or legitimate construct-related performance differences.
b. Describe the subgroups for which bias analyses were conducted:
Bias analyses were conducted across the following subgroups: (1) gender (male and female students) and (2) race/ethnicity (Hispanic/Latino, African American/Black, and White students). The sample was strategically selected to provide sufficient demographic diversity and adequate sample sizes for reliable DIF detection across these key demographic groups, which are central to educational equity considerations in academic screening.
c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
The DIF analysis revealed exceptionally low levels of differential item functioning across all demographic subgroups.
Results summary:
  • 99% of items were classified as A (negligible DIF)
  • <1% were classified as B (slight to moderate DIF)
  • <1% were classified as C (moderate to large DIF)
Expert review results: All flagged items underwent curriculum expert review for construct-irrelevant variance. Items were evaluated for evidence of bias versus legitimate construct-related performance variations. A small number of items (approximately 0.20%) were identified as exhibiting bias and were removed from the pool.
Interpretation: The removal of biased items ensures that the final assessment maintains technical rigor and equity across diverse student populations. The extremely low percentage of items requiring removal (0.20%) demonstrates that the vast majority of items function equivalently across demographic subgroups, providing strong evidence for test fairness.
Construct validity support: Pearson correlations between overall and subscale scores ranged from 0.72-0.90 (R² = 0.52-0.81), indicating strong construct coherence and supporting the validity of our matching criteria approach.
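For context, the following is a simplified Python sketch of logistic regression DIF with a Zumbo & Thomas style effect-size classification for a single item. The analysis described above was conducted with the difR package in R; this sketch is not that code. It fits a compact model (matching ability only) and an augmented model (ability, group, and their interaction), computes Nagelkerke's R² for each, and classifies the item by the R² difference against the thresholds listed above. The item responses and group data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

def nagelkerke_r2(result, n):
    """Nagelkerke's R^2 from a fitted statsmodels Logit result."""
    cox_snell = 1 - np.exp(2 * (result.llnull - result.llf) / n)
    max_cox_snell = 1 - np.exp(2 * result.llnull / n)
    return cox_snell / max_cox_snell

# Hypothetical data for one item: 1/0 item responses, a matching ability score
# (e.g., overall reading ability), and a focal/reference group indicator.
n = 2000
ability = rng.normal(0, 1, size=n)
group = rng.integers(0, 2, size=n)          # 0 = reference group, 1 = focal group
logit = 1.2 * ability - 0.2                 # this simulated item has no built-in DIF
response = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Compact model: ability only. Augmented model: ability + group + interaction.
X_compact = sm.add_constant(np.column_stack([ability]))
X_augmented = sm.add_constant(np.column_stack([ability, group, ability * group]))

r2_compact = nagelkerke_r2(sm.Logit(response, X_compact).fit(disp=0), n)
r2_augmented = nagelkerke_r2(sm.Logit(response, X_augmented).fit(disp=0), n)
delta_r2 = r2_augmented - r2_compact

# Zumbo & Thomas effect-size classification of the R^2 difference.
if delta_r2 <= 0.13:
    label = "A (negligible DIF)"
elif delta_r2 <= 0.26:
    label = "B (slight to moderate DIF)"
else:
    label = "C (moderate to large DIF)"
print(f"delta R^2 = {delta_r2:.4f} -> {label}")
```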

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.