IXL Universal Math Screener
Mathematics

Summary

IXL’s Universal Math Screener for grades K-8 identifies students who may need intervention. Criterion-referenced and designed by education experts using modern assessment principles and frameworks, the Universal Screener quickly flags students who are at risk of not meeting grade-level standards. The IXL Universal Math Screener is quick, requiring on average only 20 minutes to complete, and helps administrators plan intervention accordingly. Administrators can easily set up a screener window, up to five times per year, with desired start and end dates. During a screener window, students see a notification on their dashboard that directs them to complete their screener, available in either English or Spanish. Students have up to three minutes to answer each question and are allotted 30 minutes to complete the screener. The screener’s adaptive algorithm adjusts to a student’s abilities after each question to obtain precise proficiency levels. Once the screener window has closed, administrators can review the Universal Screener Levels report to understand which students are on or above grade level, below grade level, or far below grade level (by district, school, grade level, or individual student), both overall and by each strand.

Where to Obtain:
IXL Learning
orders@ixl.com
777 Mariners Island Blvd., Suite 600, San Mateo, CA 94404
855-255-8800
www.ixl.com/membership/quote
Initial Cost:
$5.00 per student
Replacement Cost:
$5.00 per student per school year, covering up to five screens per school year per purchase
Included in Cost:
The IXL Universal Math Screener is priced on an annual per-student basis. Each student license covers up to five screens per school year. Training is priced separately. Access to the IXL Math learning platform is an additional cost; it is offered together with the IXL Universal Math Screener at $13.25 per annual student license.
Accommodations:
The IXL Universal Math Screener supports assistive technologies, including screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the IXL Universal Math Screener is highly adaptive, which allows it to quickly adjust and meet students' diverse abilities and proficiency levels.
Training Requirements:
Training not required
Qualified Administrators:
No minimum qualifications specified.
Access to Technical Support:
Administrators and teachers can refer to online resources including the IXL Universal Screener Technical Manual and Teacher Implementation Guide for guidance on administration and use of the Screener. Users can also refer to IXL’s online help center for additional user guides and answers to frequently asked questions at www.ixl.com/help-center. IXL offers technical support via phone (855-255-6676) from 7 AM to 7 PM, Monday through Friday. Users may also contact IXL via email (help@ixl.com). Except for closures on major holidays, IXL staff respond to inquiries within one business day.
Assessment Format:
Scoring Time:
  • Scoring is automatic
Scores Generated:
  • Percentile score
  • Developmental benchmarks
  • Other: After students complete the screener, administrators and teachers have access to key data for planning instruction: nationally-normed percentiles, overall achievement levels, and achievement levels by strand.
Administration Time:
  • 20 minutes per student
Scoring Method:
  • Automatically (computer-scored)
Technology Requirements:
  • Computer or tablet
  • Internet connection
Accommodations:
The IXL Universal Math Screener supports assistive technologies, including screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the IXL Universal Math Screener is highly adaptive, which allows it to quickly adjust and meet students' diverse abilities and proficiency levels.

Descriptive Information

Please provide a description of your tool:
IXL’s Universal Math Screener for grades K-8 identifies students who may need intervention. Criterion-referenced and designed by education experts using modern assessment principles and frameworks, the Universal Screener quickly flags students who are at risk of not meeting grade-level standards. The IXL Universal Math Screener is quick, requiring on average only 20 minutes to complete, and helps administrators plan intervention accordingly. Administrators can easily set up a screener window, up to five times per year, with desired start and end dates. During a screener window, students see a notification on their dashboard that directs them to complete their screener, available in either English or Spanish. Students have up to three minutes to answer each question and are allotted 30 minutes to complete the screener. The screener’s adaptive algorithm adjusts to a student’s abilities after each question to obtain precise proficiency levels. Once the screener window has closed, administrators can review the Universal Screener Levels report to understand which students are on or above grade level, below grade level, or far below grade level (by district, school, grade level, or individual student), both overall and by each strand.
The tool is intended for use with the following grade(s).
not selected Preschool / Pre - kindergarten
selected Kindergarten
selected First grade
selected Second grade
selected Third grade
selected Fourth grade
selected Fifth grade
selected Sixth grade
selected Seventh grade
selected Eighth grade
not selected Ninth grade
not selected Tenth grade
not selected Eleventh grade
not selected Twelfth grade

The tool is intended for use with the following age(s).
not selected 0-4 years old
selected 5 years old
selected 6 years old
selected 7 years old
selected 8 years old
selected 9 years old
selected 10 years old
selected 11 years old
selected 12 years old
selected 13 years old
not selected 14 years old
not selected 15 years old
not selected 16 years old
not selected 17 years old
not selected 18 years old

The tool is intended for use with the following student populations.
selected Students in general education
selected Students with disabilities
selected English language learners

ACADEMIC ONLY: What skills does the tool screen?

Reading
Phonological processing:
not selected RAN
not selected Memory
not selected Awareness
not selected Letter sound correspondence
not selected Phonics
not selected Structural analysis

Word ID
not selected Accuracy
not selected Speed

Nonword
not selected Accuracy
not selected Speed

Spelling
not selected Accuracy
not selected Speed

Passage
not selected Accuracy
not selected Speed

Reading comprehension:
not selected Multiple choice questions
not selected Cloze
not selected Constructed Response
not selected Retell
not selected Maze
not selected Sentence verification
not selected Other (please describe):


Listening comprehension:
not selected Multiple choice questions
not selected Cloze
not selected Constructed Response
not selected Retell
not selected Maze
not selected Sentence verification
not selected Vocabulary
not selected Expressive
not selected Receptive

Mathematics
Global Indicator of Math Competence
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Early Numeracy
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Mathematics Concepts
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Mathematics Computation
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Mathematic Application
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Fractions/Decimals
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Algebra
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

Geometry
selected Accuracy
not selected Speed
selected Multiple Choice
selected Constructed Response

not selected Other (please describe):

Please describe specific domain, skills or subtests:
The IXL Universal Math Screener covers five strands: Numbers & Operations, Algebra & Algebraic Thinking, Fractions, Geometry, and Data & Measurement.
BEHAVIOR ONLY: Which category of behaviors does your tool target?


BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.

Acquisition and Cost Information

Where to obtain:
Email Address
orders@ixl.com
Address
777 Mariners Island Blvd., Suite 600, San Mateo, CA 94404
Phone Number
855-255-8800
Website
www.ixl.com/membership/quote
Initial cost for implementing program:
Cost
$5.00
Unit of cost
student
Replacement cost per unit for subsequent use:
Cost
$5.00
Unit of cost
student
Duration of license
school year, covering up to five screens per school year per purchase
Additional cost information:
Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.
The IXL Universal Math Screener is priced on an annual per-student basis. Each student license covers up to five screens per school year. Training is priced separately. Access to the IXL Math learning platform is an additional cost; it is offered together with the IXL Universal Math Screener at $13.25 per annual student license.
Provide information about special accommodations for students with disabilities.
The IXL Universal Math Screener supports assistive technologies, including screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the IXL Universal Math Screener is highly adaptive, which allows it to quickly adjust and meet students' diverse abilities and proficiency levels.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?
not selected General education teacher
not selected Special education teacher
not selected Parent
not selected Child
not selected External observer
not selected Other
If other, please specify:

What is the administration setting?
not selected Direct observation
not selected Rating scale
not selected Checklist
not selected Performance measure
not selected Questionnaire
not selected Direct: Computerized
not selected One-to-one
not selected Other
If other, please specify:

Does the tool require technology?
Yes

If yes, what technology is required to implement your tool? (Select all that apply)
selected Computer or tablet
selected Internet connection
not selected Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:
The Screener can be accessed via popular web browsers such as Chrome, Firefox, Safari, or Edge.

What is the administration context?
selected Individual
selected Small group   If small group, n=
selected Large group   If large group, n=
selected Computer-administered
not selected Other
If other, please specify:

What is the administration time?
Time in minutes
20
per (student/group/other unit)
student

Additional scoring time:
Time in minutes
0
per (student/group/other unit)

ACADEMIC ONLY: What are the discontinue rules?
not selected No discontinue rules provided
not selected Basals
not selected Ceilings
selected Other
If other, please specify:
The assessment ends for a student when any one of the four stopping-rule components has been met. The four components of the stopping rule are (1) overall precision, (2) confidence in classification, (3) maximum number of items, and (4) maximum amount of time. There is a 30-minute time limit for completion of the Universal Screener.
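To make the stopping rule concrete, the minimal Python sketch below shows how the four components could be checked after each item. The threshold values and all function and variable names are illustrative assumptions, not IXL's operational parameters, which are documented in the Technical Manual.

```python
# Illustrative stopping-rule check for a computer-adaptive screener.
# All threshold values and names below are hypothetical.

def should_stop(se_theta, p_at_risk, items_given, seconds_elapsed,
                se_target=0.30, confidence_target=0.95,
                max_items=40, max_seconds=30 * 60):
    """Return True when any of the four stopping-rule components is met."""
    if se_theta <= se_target:                          # (1) overall precision
        return True
    if max(p_at_risk, 1 - p_at_risk) >= confidence_target:
        return True                                    # (2) confidence in classification
    if items_given >= max_items:                       # (3) maximum number of items
        return True
    if seconds_elapsed >= max_seconds:                 # (4) maximum time (30-minute limit)
        return True
    return False

# Example: estimate is precise enough after 25 items and 14 minutes -> stop
print(should_stop(se_theta=0.28, p_at_risk=0.80, items_given=25, seconds_elapsed=840))
```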


Are norms available?
Yes
Are benchmarks available?
Yes
If yes, how many benchmarks per year?
The IXL Universal Math Screener is typically administered in three distinct windows (beginning-of-year, middle-of-year, and end-of-year). It may be administered up to five times per academic year.
If yes, for which months are benchmarks available?
Benchmarks are set relative to the beginning (August 1 – November 30), middle (December 1 – February 28), and end (March 1 – June 1) of the school year.
BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?
No
Describe the time required for administrator training, if applicable:
Administrator training for the Universal Screener is optional, though highly recommended. Administrator training is offered as 30-minute virtual sessions.
Please describe the minimum qualifications an administrator must possess.
selected No minimum qualifications
Are training manuals and materials available?
Yes
Are training manuals/materials field-tested?
Yes
Are training manuals/materials included in cost of tools?
Yes
If No, please describe training costs:
Administrators and teachers can refer to online resources including the IXL Universal Screener Technical Manual (https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf) and Teacher Implementation Guide (https://www.ixl.com/materials/us/i_guides/Teacher_Guide_Universal_Screener.pdf) for guidance on administration and use of the Screener. These are available at no additional cost. Live virtual administrator training is available as an additional purchase; sessions are offered at $295.00 for a 30-minute session with up to 50 attendees and $549.00 for up to 200 attendees.
Can users obtain ongoing professional and technical support?
Yes
If Yes, please describe how users can obtain support:
Administrators and teachers can refer to online resources including the IXL Universal Screener Technical Manual and Teacher Implementation Guide for guidance on administration and use of the Screener. Users can also refer to IXL’s online help center for additional user guides and answers to frequently asked questions at www.ixl.com/help-center. IXL offers technical support via phone (855-255-6676) from 7 AM to 7 PM, Monday through Friday. Users may also contact IXL via email (help@ixl.com). Except for closures on major holidays, IXL staff respond to inquiries within one business day.

Scoring

How are scores calculated?
not selected Manually (by hand)
selected Automatically (computer-scored)
not selected Other
If other, please specify:

Do you provide basis for calculating performance level scores?
Yes
What is the basis for calculating performance level and percentile scores?
not selected Age norms
selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

What types of performance level scores are available?
not selected Raw score
not selected Standard score
selected Percentile score
not selected Grade equivalents
not selected IRT-based score
not selected Age equivalents
not selected Stanines
not selected Normal curve equivalents
selected Developmental benchmarks
not selected Developmental cut points
not selected Equated
not selected Probability
not selected Lexile score
not selected Error analysis
not selected Composite scores
not selected Subscale/subtest scores
selected Other
If other, please specify:
After students complete the screener, administrators and teachers have access to key data for planning instruction: nationally-normed percentiles, overall achievement levels, and achievement levels by strand.

Does your tool include decision rules?
No
If yes, please describe.
We provide administrators with a table linking classifications of students into proficiency levels with national percentile ranks. Classification at a given time of the year provides a criterion-referenced inference regarding a student’s math ability at that time relative to standards-based expectations for their grade level. Percentile ranks provide a norm-referenced inference about a student’s performance relative to their peers; specifically, percentile ranks indicate the percentage of scores that fall below a specific score. This table serves three purposes: (1) it provides educators with a high-level breakdown of student performance nationwide; (2) it allows for the differentiation of students who received the same classification; and (3) educators can use the percentile ranges associated with the Far Below Grade Level classification, as well as school and individual student contexts, to identify appropriate cutoffs for Tier III interventions within a Response-to-Intervention/Multi-Tiered System of Supports (RTI/MTSS) framework.
Can you provide evidence in support of multiple decision rules?
No
If yes, please describe.
Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
The IXL Universal Math Screener includes a mix of selected-response and constructed-response questions that are scored in real time. The screener covers five strands: Numbers & Operations, Algebra & Algebraic Thinking, Fractions, Geometry, and Data & Measurement. After students complete the screener, administrators and teachers have access to key data for planning instruction: nationally-normed percentiles, overall achievement levels, and achievement levels by strand. Nationally-normed percentiles provide information about the typical levels of performance for an identifiable population of students or schools. For example, a student may achieve the highest test score in her class on a given assessment but still fall below the national average of students at her grade level who have completed the same assessment (i.e., a percentile rank < 50). This information allows educators to compare their students’ scores to the scores of students across the United States who completed the same assessment. (More information about national norms may be found in the IXL Universal Math Screener Technical Manual, https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf.) The IXL Universal Math Screener also classifies students as performing far below grade level, below grade level, or on or above grade level. In addition, achievement levels are provided for each of the five strands (e.g., far below/below/on or above grade level in the Numbers & Operations strand, the Algebra & Algebraic Thinking strand, etc.).

Due to the adaptive nature of the IXL Universal Math Screener, the total number of items delivered to students varies based on several factors. However, lower and upper bounds on the number of items from any content strand are set based on the student’s grade and administration window (i.e., beginning, middle, or end of year). Though not included here, the test constraints associated with each time of year are published in the IXL Universal Math Screener Technical Manual.

As a student progresses through the assessment, the algorithm updates its estimate of ability and the standard error of that estimate after each item response, using a conventional Bayesian expected a posteriori (EAP) estimation method. As the algorithm becomes more certain of its running estimate of student ability (as indicated by the decreasing standard error of the estimate), it selects items better suited to the student’s ability estimate; that is, it chooses more difficult items as the student answers correctly and less challenging items as the student answers incorrectly. This adaptivity reduces the number of items needed to measure math ability, increasing test efficiency and improving the test experience.
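As one concrete illustration of the EAP scoring described above, the following Python sketch computes an EAP ability estimate and its standard error under a Rasch model by numerical integration over a standard-normal prior, and then selects a next item whose difficulty is closest to the current estimate. The response pattern, item difficulties, item bank, and all names are hypothetical; this is a minimal sketch of the general method, not IXL's implementation.

```python
import numpy as np

def rasch_prob(theta, b):
    """P(correct) under the Rasch model for ability theta and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap_estimate(responses, difficulties, n_quad=61):
    """EAP ability estimate and its SEM (posterior SD), computed by numerical
    integration over a standard-normal prior (cf. Bock & Mislevy, 1982)."""
    grid = np.linspace(-4.0, 4.0, n_quad)
    prior = np.exp(-0.5 * grid ** 2)                 # unnormalized N(0, 1) prior
    likelihood = np.ones_like(grid)
    for x, b in zip(responses, difficulties):
        p = rasch_prob(grid, b)
        likelihood *= p ** x * (1.0 - p) ** (1 - x)  # Bernoulli term for each item
    posterior = prior * likelihood
    posterior /= posterior.sum()
    eap = np.sum(grid * posterior)                   # posterior mean = ability estimate
    sem = np.sqrt(np.sum((grid - eap) ** 2 * posterior))  # posterior SD = SEM
    return eap, sem

# Hypothetical response pattern (1 = correct) and item difficulties in logits
responses = [1, 1, 0, 1, 0, 0, 1]
difficulties = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, 1.5]
theta_hat, se = eap_estimate(responses, difficulties)

# Adaptive selection: pick the item whose difficulty is closest to the current
# ability estimate (where a Rasch item is most informative).
item_bank = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
next_item = item_bank[np.argmin(np.abs(item_bank - theta_hat))]
print(f"theta = {theta_hat:.2f}, SEM = {se:.2f}, next item difficulty = {next_item}")
```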
Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
The IXL Universal Math Screener is a brief, accurate assessment designed specifically for identifying students from kindergarten through eighth grade who are experiencing moderate or severe math difficulties. It may be administered up to five times per academic year, with performance expectations automatically adjusting for the beginning, middle, and end of the academic year. By scheduling the first administration early in the academic year, educators can promptly identify students with moderate or severe difficulties. Results are available immediately following the administration so that intervention resources can be deployed quickly to the students who need them. Results are also available by strand, providing insight into the specific areas where each student is struggling.

Content for the IXL Universal Math Screener was developed using a principled assessment design framework. Principled assessment design is a general framework for designing, developing, and implementing an assessment to support the ongoing accumulation and synthesis of evidence for the validity claims made by the assessment (e.g., Ferrara et al., 2016). This framework requires assessment targets to be clearly defined at the beginning of the process. These assessment targets drive the entire assessment development plan, and continuous focus on the targets ensures that all subsequent decisions are consistent with providing evidence to support validity claims. The assessment targets for the IXL Universal Math Screener are achievement level descriptors (ALDs) derived from all 50 states’ standards, including the Common Core. Achievement level descriptors are statements that describe expectations of what students at specific achievement levels should know and be able to do. Because the IXL Universal Math Screener is designed to identify students who are experiencing moderate or severe difficulties in math, the relevant achievement level was conceptualized as the threshold between on-grade and below-grade achievement. Therefore, the ALDs were written for each educational standard to describe what a minimally competent student would know and be able to do.

An external panel of experienced educators was recruited to review sample items alongside the evaluation of ALDs described previously. This subject-matter expert (SME) panel consisted of in-service educators with significant experience teaching or supervising the grade level(s) they evaluated, and with demographics matching those of elementary and middle school teachers in the U.S. Multiple rounds of internal review were then conducted to ensure that all items appropriately targeted the intended achievement level descriptors while meeting the content and item-writing specifications.

The format of the IXL Universal Math Screener includes a mix of selected-response questions, for which a student chooses an answer from a provided set of options, and constructed-response questions, for which a student produces their own answer. Where appropriate, the writers of these items applied the principles of Universal Design (Thompson et al., 2002) to reduce construct-irrelevant variance while simultaneously increasing accessibility. Items were written to target achievement level descriptors while considering important factors such as reading level, cognitive load, vocabulary, and sentence length, among others. The IXL Universal Math Screener is designed to be appropriate for students with diverse backgrounds (e.g., race, ethnicity, culture, and gender) and levels of ability.
The screener is available in Spanish, making it accessible for Spanish-speaking students regardless of English language proficiency. It also supports assistive technologies to provide equal accessibility for all students. These include screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the IXL Universal Math Screener is highly adaptive, which allows it to quickly adjust and meet students' diverse abilities and proficiency levels.

Technical Standards

Classification Accuracy & Cross-Validation Summary

Grade: Grade 2, Grade 3, Grade 4, Grade 5, Grade 6, Grade 7, Grade 8
Classification Accuracy Fall: Partially convincing evidence (all grades)
Classification Accuracy Winter: Data unavailable (Grades 2 and 3); Partially convincing evidence (Grades 4-8)
Classification Accuracy Spring: Data unavailable (all grades)
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available

NWEA MAP Growth

Classification Accuracy

Select time of year
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
The criterion measure is the NWEA MAP® Growth™ math assessment (MAP Growth), a computer-adaptive assessment of math ability. The MAP Growth math assessment and the IXL Universal Math Screener are independent assessments developed by different organizations. The classification analysis samples included students in all four U.S. Census regions: West (CA, AZ), Midwest (MI, OH), South (KY, TX, WV), and Northeast (PA). These states represent seven U.S. Census Divisions: Pacific (CA), Mountain (AZ), East North Central (MI, OH), West South Central (TX), East South Central (KY), Middle Atlantic (PA), and South Atlantic (WV).
Do the classification accuracy analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
For the Fall classification accuracy analyses, both measures were administered at the beginning of the 2024-25 school year. The IXL Universal Math Screener was administered between August 1, 2024 and November 30, 2024; the MAP Growth math assessment was administered between August 7, 2024 and November 8, 2024. Because these two measurements were taken in close temporal proximity, concurrent validity analyses were conducted. For the Winter classification accuracy analyses, both measures were administered in the middle of the 2024-25 school year. The IXL Universal Math Screener was administered between December 1, 2024 and February 12, 2025; the MAP Growth math assessment was administered between December 1, 2024 and February 13, 2025. Because these two measurements were taken in close temporal proximity, concurrent validity analyses were conducted.
Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Cut points on the MAP Growth math assessment were determined using the 2020 nationwide MAP Growth norms for student achievement. In line with NCII TRC guidance, students with RIT scores below the 20th percentile were considered “at risk,” while those with RIT scores at or above the 20th percentile were considered “not at risk.” Cut points on the screening measure were identified as theta values that maximized classification accuracy with the criterion measure. Using these values, students were classified as “at risk” if they scored below the cut point and “not at risk” if they scored at or above the cut point.
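A minimal sketch of this procedure, assuming hypothetical data, is shown below: students are labeled at risk when their criterion percentile falls below 20, candidate screener cut points are scanned, and the cut that maximizes overall classification accuracy is retained. The functions, the range of cuts scanned, and the simulated data are illustrative only.

```python
import numpy as np

def classification_stats(screener_theta, at_risk, cut):
    """2x2 classification statistics for a candidate screener cut point."""
    flagged = screener_theta < cut                    # screener says "at risk"
    tp = np.sum(flagged & at_risk)                    # true positives (a)
    fp = np.sum(flagged & ~at_risk)                   # false positives (b)
    fn = np.sum(~flagged & at_risk)                   # false negatives (c)
    tn = np.sum(~flagged & ~at_risk)                  # true negatives (d)
    n = tp + fp + fn + tn
    return {"overall": (tp + tn) / n,
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "ppp": tp / (tp + fp) if tp + fp else float("nan"),
            "npp": tn / (tn + fn) if tn + fn else float("nan")}

def best_cut(screener_theta, criterion_percentile, candidate_cuts):
    """At risk = criterion percentile below 20; return the screener cut point
    that maximizes overall classification accuracy."""
    at_risk = criterion_percentile < 20
    accuracy = [classification_stats(screener_theta, at_risk, c)["overall"]
                for c in candidate_cuts]
    return candidate_cuts[int(np.argmax(accuracy))]

# Hypothetical paired data: screener thetas and criterion percentile ranks
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, 1000)
criterion = theta + rng.normal(0.0, 0.6, 1000)                 # correlated criterion score
pct = 100.0 * criterion.argsort().argsort() / len(criterion)   # percentile ranks
cut = best_cut(theta, pct, np.linspace(-2.0, 0.0, 41))
print(cut, classification_stats(theta, pct < 20, cut))
```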
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?
No
If yes,
Select time of year.
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
Do the cross-validation analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Classification Accuracy - Fall

Evidence Grade 2 Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Criterion measure NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth
Cut Points - Percentile rank on criterion measure 20 20 20 20 20 20 20
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure -1.9 -1.4 -1.4 -1.1 -1.4 -1.1 -0.9
Classification Data - True Positive (a) 23 172 168 175 223 208 157
Classification Data - False Positive (b) 23 115 59 78 62 78 65
Classification Data - False Negative (c) 10 80 78 83 196 140 86
Classification Data - True Negative (d) 266 724 924 931 1265 1050 857
Area Under the Curve (AUC) 0.81 0.77 0.81 0.80 0.74 0.76 0.79
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.73 0.74 0.78 0.77 0.72 0.74 0.76
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.89 0.84 0.84 0.83 0.77 0.79 0.82
Statistics Grade 2 Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Base Rate 0.10 0.23 0.20 0.20 0.24 0.24 0.21
Overall Classification Rate 0.90 0.82 0.89 0.87 0.85 0.85 0.87
Sensitivity 0.70 0.68 0.68 0.68 0.53 0.60 0.65
Specificity 0.92 0.86 0.94 0.92 0.95 0.93 0.93
False Positive Rate 0.08 0.14 0.06 0.08 0.05 0.07 0.07
False Negative Rate 0.30 0.32 0.32 0.32 0.47 0.40 0.35
Positive Predictive Power 0.50 0.60 0.74 0.69 0.78 0.73 0.71
Negative Predictive Power 0.96 0.90 0.92 0.92 0.87 0.88 0.91
Sample Grade 2 Grade 3 Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Date Fall 2024 Fall 2024 Fall 2024 Fall 2024 Fall 2024 Fall 2024 Fall 2024
Sample Size 322 1091 1229 1267 1746 1476 1165
Geographic Representation Middle Atlantic (PA)
West South Central (TX)
East North Central (MI)
Middle Atlantic (PA)
East North Central (MI)
Middle Atlantic (PA)
East South Central (KY)
Middle Atlantic (PA)
East North Central (MI)
Middle Atlantic (PA)
East North Central (MI)
Middle Atlantic (PA)
East North Central (MI)
Middle Atlantic (PA)
Male              
Female              
Other              
Gender Unknown              
White, Non-Hispanic              
Black, Non-Hispanic              
Hispanic              
Asian/Pacific Islander              
American Indian/Alaska Native              
Other              
Race / Ethnicity Unknown              
Low SES              
IEP or diagnosed disability              
English Language Learner              

Classification Accuracy - Winter

Evidence Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Criterion measure NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth NWEA MAP Growth
Cut Points - Percentile rank on criterion measure 20 20 20 20 20
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure -1.0 -0.7 -0.9 -0.4 -0.3
Classification Data - True Positive (a) 75 141 59 91 70
Classification Data - False Positive (b) 45 92 46 56 62
Classification Data - False Negative (c) 23 24 5 18 9
Classification Data - True Negative (d) 151 269 142 141 116
Area Under the Curve (AUC) 0.77 0.80 0.84 0.78 0.77
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.72 0.76 0.79 0.73 0.72
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.82 0.84 0.88 0.82 0.82
Statistics Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Base Rate 0.33 0.31 0.25 0.36 0.31
Overall Classification Rate 0.77 0.78 0.80 0.76 0.72
Sensitivity 0.77 0.85 0.92 0.83 0.89
Specificity 0.77 0.75 0.76 0.72 0.65
False Positive Rate 0.23 0.25 0.24 0.28 0.35
False Negative Rate 0.23 0.15 0.08 0.17 0.11
Positive Predictive Power 0.63 0.61 0.56 0.62 0.53
Negative Predictive Power 0.87 0.92 0.97 0.89 0.93
Sample Grade 4 Grade 5 Grade 6 Grade 7 Grade 8
Date Winter 2025 Winter 2025 Winter 2025 Winter 2025 Winter 2025
Sample Size 294 526 252 306 257
Geographic Representation Pacific (CA)
West South Central (TX)
Mountain (AZ)
West South Central (TX)
Pacific (CA)
West South Central (TX)
East North Central (OH)
West South Central (TX)
Middle Atlantic (PA)
South Atlantic (WV)
Male          
Female          
Other          
Gender Unknown          
White, Non-Hispanic          
Black, Non-Hispanic          
Hispanic          
Asian/Pacific Islander          
American Indian/Alaska Native          
Other          
Race / Ethnicity Unknown          
Low SES          
IEP or diagnosed disability          
English Language Learner          

Reliability

Grade: Grade 2, Grade 3, Grade 4, Grade 5, Grade 6, Grade 7, Grade 8
Rating: Convincing evidence (all grades)
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Offer a justification for each type of reliability reported, given the type and purpose of the tool.
Often, the reliability of a traditional, fixed-form assessment is evaluated via internal consistency measures such as Cronbach’s alpha or McDonald’s omega. However, in a CAT assessment context, these measures are not appropriate. Below, we report two types of reliability that are appropriate for CAT assessments: marginal reliability and the standard error of measurement (SEM; including accompanying plots). Marginal reliability is reported in the table further below.

Standard error of measurement: A student’s standard error of measurement (SEM) is an indicator of the precision of their score and describes the range in which a score may vary upon repeated testing due to chance. SEM is a function of the interaction between the ability of a student, the difficulty of the items, and the number of items on a test. A lower SEM indicates less error and more precision around a score. Because the CAT algorithm selects items based on a student’s estimated ability level, it is able to target a student more accurately and significantly decrease the SEM with fewer items than a traditional fixed-form assessment. Although the CAT algorithm targets students more accurately according to their ability, it generally performs better for students whose ability and performance are closer to the typical ability of students at or near their grade level. As seen in the plots (available from the Center upon request), the SEM is lowest for grade 2 students whose ability is around -2.0, grade 3 students whose ability is between -2.0 and 0.0, grade 4 students whose ability is between -0.5 and 0.5, grade 5 students whose ability is between 0.0 and 1.0, grade 6 students whose ability is between 0.5 and 1.5, grade 7 students whose ability is around 1.5, and grade 8 students whose ability is between 1.5 and 2.5.
*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
The sample for calculating the marginal reliability and SEM included student records from the 2022-2023 school year. The sample included students in all nine U.S. Census Bureau divisions. Sample sizes by grade are presented in the table below. Students in this sample spanned all performance levels.
*Describe the analysis procedures for each reported type of reliability.
Marginal reliability: Although traditional measures of reliability are not estimable for adaptive assessments, marginal reliability provides a method that closely approximates the traditional measures of internal consistency when the ability distribution and item parameters are known (see Dimitrov, 2003; Samejima, 1977, 1994).

SEM: As mentioned, the IXL Universal Math Screener uses EAP to estimate student ability and the SEM to indicate the level of certainty, or reliability, of this estimate. It derives the SEM by calculating the standard deviation of the posterior distribution of the EAP estimate, integrating over all possible values of ability given a response pattern (see Bock & Mislevy, 1982). Although not summarized here, the Technical Manual for the IXL Universal Math Screener includes additional reliability information, including test-retest reliability and classification consistency.
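As an illustration, the sketch below applies one common formulation of marginal reliability for adaptive tests (observed variance of ability estimates minus mean error variance, divided by observed variance) to hypothetical EAP estimates and SEMs. This is a generic sketch of the idea, not necessarily the exact computation used for the IXL Universal Math Screener.

```python
import numpy as np

def marginal_reliability(theta_hat, sem):
    """One common formulation for adaptive tests: the proportion of observed
    ability-estimate variance that is not measurement-error variance."""
    theta_hat, sem = np.asarray(theta_hat), np.asarray(sem)
    observed_var = np.var(theta_hat, ddof=1)
    mean_error_var = np.mean(sem ** 2)
    return (observed_var - mean_error_var) / observed_var

# Hypothetical grade-level sample of EAP estimates and their SEMs
rng = np.random.default_rng(1)
theta_hat = rng.normal(0.0, 1.0, 500)
sem = rng.uniform(0.40, 0.50, 500)     # e.g., median SEMs in the 0.44-0.48 range
print(f"marginal reliability ~ {marginal_reliability(theta_hat, sem):.2f}")
```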

*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).

Type of reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Standard error of measurement (SEM) describes the range in which a score may vary upon repeated testing due to chance. It is inversely related to marginal reliability, with lower SEM values indicating higher precision. In this sample, median SEM coefficients ranged from 0.441 (grade 4) to 0.484 (grade 2), indicating low error and high precision around students’ scores on the IXL Universal Math Screener across grades 2-8. Plots of SEM for each grade in the analysis are available from the Center upon request. Accompanying histograms of test-takers by ability level (also available upon request) show the wide spectrum of math ability among students in this sample across grades 2-8, demonstrating that the sample included students at all performance levels.
Manual cites other published reliability studies:
No
Provide citations for additional published studies.
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
Provide citations for additional published studies.

Validity

Grade: Grade 2, Grade 3, Grade 4, Grade 5, Grade 6, Grade 7, Grade 8
Rating: Convincing evidence (all grades)
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
The IXL Universal Math Screener Technical Manual provides validity evidence based on test content (i.e., subject-matter expert review), internal structure (i.e., unidimensionality and DIF), and relations to other variables. (For more information, see the Technical Manual: https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf.) In this section, we focus on the relationship between the screener and an external assessment across all four U.S. Census Bureau regions and five of the nine U.S. Census Bureau geographical divisions. In both the concurrent and predictive validity analyses, the criterion measure was the NWEA MAP® Growth™ math assessment (MAP Growth), a widely used computer-adaptive assessment of math ability. The MAP Growth math assessment and the IXL Universal Math Screener are independent assessments developed by different organizations. Although the two assessments were developed separately, they are expected to be related because the underlying construct being measured is the same: students’ math ability.
*Describe the sample(s), including size and characteristics, for each validity analysis conducted.
Students were included in the sample if they completed both measures (the IXL Universal Math Screener and the NWEA MAP Growth math assessment) within the time frame of interest. For the concurrent analyses, the time frame of interest for both assessments was August-November 2024. For the predictive analyses, the IXL Universal Math Screener was administered in August-November 2024, and the NWEA MAP Growth math assessment was administered in February-May 2025. In the concurrent analyses, sample sizes ranged from 322 (Grade 2) to 1,746 (Grade 6). In the predictive analyses, sample sizes ranged from 249 (Grade 3) to 669 (Grade 6). The samples for both analyses represented students across all performance levels, all four U.S. Census Bureau regions, and five of the nine U.S. Census Bureau geographical divisions.
*Describe the analysis procedures for each reported type of validity.
Pearson product-moment correlation analyses were conducted between IXL Universal Math Screener theta values (i.e., continuous measures of student math ability) and students’ scores on the outcome measure (MAP Growth RIT scores). Confidence intervals (95%) were calculated using Fisher’s r-to-z transformation. For the concurrent analyses, a positive correlation between the two variables indicates that students who have higher math ability as measured by the IXL Universal Math Screener also have higher math ability as measured by a concurrent administration of the criterion assessment. For the predictive analyses, a positive correlation indicates that students who have higher math ability as measured by the IXL Universal Math Screener also have higher math ability as measured by an administration of the criterion assessment that occurred later in time.
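A minimal sketch of this analysis, using simulated screener theta values and MAP Growth RIT scores, is given below; the data and variable names are hypothetical.

```python
import numpy as np

def pearson_with_fisher_ci(x, y, z_crit=1.959964):
    """Pearson r between screener thetas (x) and criterion RIT scores (y),
    with a 95% confidence interval from Fisher's r-to-z transformation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    z = np.arctanh(r)                       # Fisher r-to-z
    se = 1.0 / np.sqrt(len(x) - 3)
    return r, (np.tanh(z - z_crit * se), np.tanh(z + z_crit * se))

# Hypothetical paired scores for one grade
rng = np.random.default_rng(2)
theta = rng.normal(0.0, 1.0, 400)
rit = 200.0 + 12.0 * theta + rng.normal(0.0, 8.0, 400)
r, (lo, hi) = pearson_with_fisher_ci(theta, rit)
print(f"r = {r:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```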

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Establishing the validity of an assessment includes “accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations” (AERA, APA, & NCME, 2014, p. 11). One way to provide evidence for such interpretations in a CAT context is by establishing unidimensionality, meaning that only a single construct is the target of measurement. Because most methods of testing for unidimensionality are challenging, if not impossible, with CAT data due to sparseness, we used strand-level subscores to examine grade-level construct validity and to determine whether using a unidimensional Rasch model was appropriate. We fitted unidimensional confirmatory factor analysis (CFA) models to grade-level subscore data collected from August 2024 through January 2025 (Wang, McCall, Jiao, & Harris, 2013). For grade 2, we specified a single factor with four strand-level subscores representing Algebra & Algebraic Thinking, Data & Measurement, Geometry, and Numbers & Operations. For grades 3-8, we specified a single factor with five strand-level subscores, expanding the set of strands used for grade 2 to include Fractions. Student scores came from the 2024-25 school year. Results show that the assumption of unidimensionality was tenable across all grades (the full results table is available from the Center upon request), with all CFI values greater than 0.95, all RMSEA values less than 0.06 except for grade 8 (0.074), and all SRMR values less than 0.08 (Hu & Bentler, 1999). Together, these statistics indicate that a single construct (math ability) is being assessed, i.e., that a unidimensional Rasch model is appropriate for administering the IXL Universal Math Screener.
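For readers unfamiliar with the fit indices cited above, the short sketch below computes CFI and RMSEA from hypothetical model and baseline chi-square statistics using their standard definitions; the values are invented solely to illustrate the Hu and Bentler (1999) cutoffs and are not IXL's results.

```python
import numpy as np

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index from the fitted model (m) and baseline model (b)."""
    return 1.0 - max(chi2_m - df_m, 0.0) / max(chi2_b - df_b, chi2_m - df_m, 0.0)

def rmsea(chi2_m, df_m, n):
    """Root Mean Square Error of Approximation for a sample of size n."""
    return np.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))

# Hypothetical single-factor model for five strand subscores (df = 5), fit to
# n = 1,000 students, compared against an independence (baseline) model.
chi2_model, df_model = 18.0, 5
chi2_base, df_base = 1500.0, 10
n = 1000
print(f"CFI = {cfi(chi2_model, df_model, chi2_base, df_base):.3f}")   # > 0.95 indicates good fit
print(f"RMSEA = {rmsea(chi2_model, df_model, n):.3f}")                # < 0.06 indicates good fit
```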
Manual cites other published validity studies:
Yes
Provide citations for additional published studies.
Hargis, M. B. (2023). Assessing the concurrent and predictive validity of the IXL Universal Math Screener using NWEA MAP Growth as criterion. IXL Learning. https://www.ixl.com/materials/us/research/Assessing_the_Concurrent_and_Predictive_Validity_of_the_IXL_Math_Universal_Screener_Using_NWEA_MAP_Growth_as_Criterion.pdf
Describe the degree to which the provided data support the validity of the tool.
In each grade, there was a statistically significant positive correlation between concurrent administrations of IXL’s universal screener and an external math assessment, NWEA MAP Growth, as well as between earlier administrations of IXL's universal screener and later administrations of NWEA MAP Growth. Coefficients ranged from .70 to .86, reflecting strong relationships between students’ ability as measured by the IXL Universal Math Screener and the widely-used MAP Growth assessment. These results illustrate a strong relationship between the IXL Universal Math Screener and NWEA MAP Growth in terms of both concurrent and predictive validity, with the advantage that IXL’s universal screener requires substantially less time to administer (about half the time needed for administering MAP), allowing completion in less than a class period. In addition, unidimensionality analyses indicate that, as expected, a single construct (math ability) is being assessed by the IXL Universal Math Screener.
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
Provide citations for additional published studies.

Bias Analysis

Grade: Grade 2, Grade 3, Grade 4, Grade 5, Grade 6, Grade 7, Grade 8
Rating: Provided (all grades)
Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
Yes
If yes,
a. Describe the method used to determine the presence or absence of bias:
Differential item functioning (DIF) analysis investigates each item for signs of interactions with sample characteristics. An item is said to exhibit DIF when equally able individuals from different groups have notably different probabilities of answering the item correctly. DIF detection procedures help to gather validity evidence for the proposed interpretations of test scores by ensuring that scores are free from potential bias and that individual items do not create an advantage for one group over another. The majority of DIF detection methods are based on manifest groups, where the group membership of interest (e.g., gender) is known prior to the DIF analysis. However, when the manifest group membership is unknown, unobservable, or does not provide a valid indication of true group membership, the use of a latent (i.e., identified using data) group membership may be warranted. We report both latent and manifest DIF analyses.

To examine latent DIF, we first identified the number of latent classes through an exploratory mixture Rasch model analysis, which provides a probability of group membership for each student across all classes. Models were constructed with different numbers of latent classes, and model fit was compared using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and log-likelihood. A chi-square likelihood ratio test was then conducted to assess the goodness of fit of the different models, and Cohen’s omega was calculated to determine the effect size (Cohen, 1988). Using Cohen’s omega, we assessed whether there were meaningful differences between the models, which would indicate potential bias.

To examine manifest DIF, we used the Rasch separate calibration t-test method. This method is based on the differences between two separate calibrations of the same item from the subpopulations of interest, holding the other item and person parameters constant to ensure scale stability (Wright & Stone, 1979). Determining the flagging criterion for the Rasch separate calibration t-test method typically involves setting a magnitude for the difference between the two calibrations and an appropriate p-value for significance. Wright and Douglas (1975) proposed a “half-logit” rule, where a difference in item difficulty between examinee subgroups of at least 0.5 logits warrants additional investigation. This critical value reflects a sizable portion of the typical range of item difficulty in operational tests (about -2.5 to about +2.5 logits); accordingly, a shift of 0.5 logits (10% of the scale) begins to affect the accuracy of the measurement. These criteria (a statistically significant p-value and a shift of at least 0.5 logits) were used to assess manifest DIF in the IXL Universal Math Screener. A minimal illustration of this flagging rule appears after part (c) below.
b. Describe the subgroups for which bias analyses were conducted:
Student gender (male vs. female).
c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
Latent DIF: For each grade, the 2-class model fit better than the 1-class model; however, Cohen’s omega showed no practically meaningful difference between the two models (Cohen’s omega ranged from 0.016 to 0.026), which suggests that only a single class is present in the data. This result provides evidence to support the Rasch model assumption of invariance and suggests that potential bias, which might advantage one group over another, is undetectable.

Manifest DIF: When investigating DIF based on reported sex, out of the thousands of items that received sufficient exposures from the relevant groups, we flagged 46 items in grade 2, 64 items in grade 3, 100 items in grade 4, 82 items in grade 5, 78 items in grade 6, 92 items in grade 7, and 74 items in grade 8 for potential DIF. More importantly, all of the flagged items were found to be free from substantive DIF (see below).

Substantive DIF: It is important to distinguish between statistical DIF and substantive DIF (Penfield & Lam, 2000; Roussos & Stout, 1996). Statistical DIF refers to the statistical identification of DIF, whereas substantive DIF refers to the identification of construct-irrelevant factors responsible for the statistical DIF (i.e., potential sources of bias). Statistical DIF is simply a detection strategy, and careful scrutiny of items is always warranted to identify and address substantive DIF. DIF detection methods may identify some items as being unbiased even though they are indeed biased, while some items identified as having bias may not actually be biased. Therefore, all items identified as potentially biased using the methods outlined above were reviewed by subject-matter experts to ensure that all items are free from substantive DIF.
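As a minimal illustration of the manifest DIF procedure described in part (a), the sketch below applies the Rasch separate-calibration comparison with the half-logit magnitude criterion and a conventional significance threshold. The item parameters and names are hypothetical, not values from the IXL item bank.

```python
import math

def flag_dif(b_group1, se1, b_group2, se2, logit_threshold=0.5, z_crit=1.96):
    """Rasch separate-calibration DIF check: flag the item when the difference
    between its two group calibrations is statistically significant AND at
    least half a logit (the Wright & Douglas half-logit rule)."""
    diff = b_group1 - b_group2
    t = diff / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(diff) >= logit_threshold and abs(t) >= z_crit

# Hypothetical item calibrated separately for two groups (difficulties in logits)
print(flag_dif(b_group1=0.85, se1=0.12, b_group2=0.20, se2=0.11))   # True -> flag for review
```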

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.