IXL Universal Math Screener
Mathematics
Summary
IXL's Universal Math Screener for grades K-8 identifies students who may need intervention. Criterion-referenced and designed by education experts using modern assessment principles and frameworks, the Universal Screener quickly flags students who are at risk of not meeting grade-level standards. The screener is brief, requiring on average only 20 minutes to complete, and helps administrators plan intervention accordingly. Administrators can easily set up a screener window, up to five times per year, with desired start and end dates. During a screener window, students see a notification on their dashboard directing them to complete the screener, which is available in either English or Spanish. Students have up to three minutes to answer each question and are allotted 30 minutes to complete the screener. The screener's adaptive algorithm adjusts to a student's ability after each question to obtain a precise proficiency level. Once the screener window has closed, administrators can review the Universal Screener Levels report to see which students are on or above grade level, below grade level, or far below grade level (by district, school, grade, or individual student), both overall and by strand.
- Where to Obtain:
- IXL Learning
- orders@ixl.com
- 777 Mariners Island Blvd., Suite 600, San Mateo, CA 94404
- 855-255-8800
- www.ixl.com/membership/quote
- Initial Cost:
- $5.00 per student
- Replacement Cost:
- $5.00 per student per school year, covering up to five screens per school year per purchase
- Included in Cost:
- The IXL Universal Math Screener is priced on an annual per student basis. Each student license purchase covers up to five screens per school year for the student. Training is priced separately. Access to the IXL Math learning platform is an additional cost, and offered together with the IXL Universal Math Screener at $13.25 per annual student license.
- Accommodations:
- The IXL Universal Math Screener supports assistive technologies, including screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the screener is highly adaptive, allowing it to quickly adjust to students' diverse abilities and proficiency levels.
- Training Requirements:
- Training not required
- Qualified Administrators:
- No minimum qualifications specified.
- Access to Technical Support:
- Administrators and teachers can refer to online resources, including the IXL Universal Screener Technical Manual and the Teacher Implementation Guide, for guidance on administering and using the screener. Users can also refer to IXL's online help center for additional user guides and answers to frequently asked questions at www.ixl.com/help-center. IXL offers technical support via phone (855-255-6676) from 7 a.m. to 7 p.m., Monday to Friday, and via email (help@ixl.com). Except during major holidays, IXL staff respond to inquiries within one business day.
- Assessment Format:
- Scoring Time:
- Scoring is automatic
- Scores Generated:
- Percentile score
- Developmental benchmarks
- Other: After students complete the screener, administrators and teachers have access to key data for planning instruction: nationally normed percentiles, overall achievement levels, and achievement levels by strand.
- Administration Time:
- 20 minutes per student
- Scoring Method:
- Automatically (computer-scored)
- Technology Requirements:
- Computer or tablet
- Internet connection
- Accommodations:
- The IXL Universal Math Screener supports assistive technologies, including screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the screener is highly adaptive, allowing it to quickly adjust to students' diverse abilities and proficiency levels.
Descriptive Information
- Please provide a description of your tool:
- IXL's Universal Math Screener for grades K-8 identifies students who may need intervention. Criterion-referenced and designed by education experts using modern assessment principles and frameworks, the Universal Screener quickly flags students who are at risk of not meeting grade-level standards. The screener is brief, requiring on average only 20 minutes to complete, and helps administrators plan intervention accordingly. Administrators can easily set up a screener window, up to five times per year, with desired start and end dates. During a screener window, students see a notification on their dashboard directing them to complete the screener, which is available in either English or Spanish. Students have up to three minutes to answer each question and are allotted 30 minutes to complete the screener. The screener's adaptive algorithm adjusts to a student's ability after each question to obtain a precise proficiency level. Once the screener window has closed, administrators can review the Universal Screener Levels report to see which students are on or above grade level, below grade level, or far below grade level (by district, school, grade, or individual student), both overall and by strand.
ACADEMIC ONLY: What skills does the tool screen?
- Please describe specific domain, skills or subtests:
- The IXL Universal Math Screener covers five strands: Numbers & Operations, Algebra & Algebraic Thinking, Fractions, Geometry, and Data & Measurement.
- BEHAVIOR ONLY: Which category of behaviors does your tool target?
- BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
Acquisition and Cost Information
Administration
- Are norms available?
- Yes
- Are benchmarks available?
- Yes
- If yes, how many benchmarks per year?
- The IXL Universal Math Screener is typically administered in three distinct windows (beginning-of-year, middle-of-year, and end-of-year). It may be administered up to five times per academic year.
- If yes, for which months are benchmarks available?
- Benchmarks are set relative to the beginning (August 1 – November 30), middle (December 1 – February 28), and end (March 1 – June 1) of the school year.
- BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
- If yes, how many students can be rated concurrently?
Training & Scoring
Training
- Is training for the administrator required?
- No
- Describe the time required for administrator training, if applicable:
- Administrator training for the Universal Screener is optional, though highly recommended. Administrator training is offered as 30-minute virtual sessions.
- Please describe the minimum qualifications an administrator must possess.
- No minimum qualifications
- Are training manuals and materials available?
- Yes
- Are training manuals/materials field-tested?
- Yes
- Are training manuals/materials included in cost of tools?
- Yes
- If No, please describe training costs:
- Administrators and teachers can refer to online resources, including the IXL Universal Screener Technical Manual (https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf) and the Teacher Implementation Guide (https://www.ixl.com/materials/us/i_guides/Teacher_Guide_Universal_Screener.pdf), for guidance on administering and using the screener. These are available at no additional cost. Live virtual administrator training is available as an additional purchase: $295.00 for a 30-minute session with up to 50 attendees, or $549.00 for up to 200 attendees.
- Can users obtain ongoing professional and technical support?
- Yes
- If Yes, please describe how users can obtain support:
- Administrators and teachers can refer to online resources, including the IXL Universal Screener Technical Manual and the Teacher Implementation Guide, for guidance on administering and using the screener. Users can also refer to IXL's online help center for additional user guides and answers to frequently asked questions at www.ixl.com/help-center. IXL offers technical support via phone (855-255-6676) from 7 a.m. to 7 p.m., Monday to Friday, and via email (help@ixl.com). Except during major holidays, IXL staff respond to inquiries within one business day.
Scoring
- Do you provide basis for calculating performance level scores?
- Yes
- Does your tool include decision rules?
- No
- If yes, please describe.
- We provide administrators with a table linking students' proficiency-level classifications to national percentile ranks. Classification at a given time of year provides a criterion-referenced inference about a student's math ability at that time relative to standards-based expectations for their grade level. Percentile ranks provide a norm-referenced inference about a student's performance relative to their peers; specifically, a percentile rank indicates the percentage of scores that fall below a specific score. This table serves three purposes: (1) it gives educators a high-level breakdown of student performance nationwide; (2) it allows for differentiation among students who received the same classification; and (3) educators can use the percentile ranges associated with the "Far below grade level" classification, together with school and individual student contexts, to identify appropriate cutoffs for Tier III interventions within a Response-to-Intervention/Multi-Tiered System of Supports (RTI/MTSS) framework, as sketched in the example below.
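The snippet below is a purely illustrative sketch of how such a classification-to-percentile lookup might be applied. The level names match the screener's reported categories, but the percentile cutoffs are hypothetical placeholders, not IXL's published values.

```python
# Illustrative sketch only: the percentile cutoffs below are hypothetical
# placeholders, NOT IXL's published classification table.
HYPOTHETICAL_LEVELS = [
    ("Far below grade level", 0, 10),
    ("Below grade level", 10, 35),
    ("On or above grade level", 35, 101),
]

def classify(percentile_rank: int) -> str:
    """Map a national percentile rank (0-100) to a proficiency level."""
    for label, low, high in HYPOTHETICAL_LEVELS:
        if low <= percentile_rank < high:
            return label
    raise ValueError("percentile rank must be between 0 and 100")

# A district might place its Tier III cutoff inside the lowest band,
# e.g., below the 5th percentile, based on local context and resources.
print(classify(4))   # -> Far below grade level
```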
- Can you provide evidence in support of multiple decision rules?
- No
- If yes, please describe.
- Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
- The IXL Universal Math Screener includes a mix of selected-response and constructed-response questions, which are scored in real time. The screener covers five strands: Numbers & Operations, Algebra & Algebraic Thinking, Fractions, Geometry, and Data & Measurement. After students complete the screener, administrators and teachers have access to key data for planning instruction: nationally normed percentiles, overall achievement levels, and achievement levels by strand. Nationally normed percentiles provide information about the typical levels of performance for an identifiable population of students or schools. For example, a student may achieve the highest test score in her class on a given assessment but still fall below the national average of students at her grade level who have completed the same assessment (i.e., a percentile rank < 50). This information allows educators to compare their students' scores to the scores of students across the United States who completed the same assessment. (More information about national norms may be found in the IXL Universal Math Screener Technical Manual, https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf.) The IXL Universal Math Screener also classifies students as performing far below grade level, below grade level, or on or above grade level. In addition, achievement levels are provided for each of the five strands (e.g., far below/below/on or above grade level in the Numbers & Operations strand, the Algebra & Algebraic Thinking strand, etc.). Due to the adaptive nature of the IXL Universal Math Screener, the total number of items delivered to a student varies based on several factors. However, lower and upper bounds on the number of items from any content strand are set based on the student's grade and the administration window (i.e., beginning, middle, or end of year). Though not reproduced here, the test constraints associated with each time of year are published in the IXL Universal Math Screener Technical Manual. As a student progresses through the assessment, the algorithm updates its estimate of ability, and the standard error of that estimate, each time the student responds to an item, using a conventional Bayesian expected a posteriori (EAP) estimation method. As the algorithm becomes more certain of its running estimate of student ability (as indicated by the decreasing standard error of the estimate), it selects items better suited to that estimate: more difficult items as the student answers correctly, and less challenging items as the student answers incorrectly. This adaptivity reduces the number of items needed to measure math ability, increasing test efficiency and improving the test experience.
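To make the EAP mechanics concrete, here is a minimal, self-contained sketch of adaptive Rasch-based ability estimation on a quadrature grid. The item bank, the item-selection rule, and all parameter values are simulated assumptions for illustration; this is not IXL's production algorithm.

```python
import numpy as np

# Minimal sketch of Bayesian EAP ability estimation with adaptive item
# selection under a Rasch model. All data are simulated for illustration.
GRID = np.linspace(-4.0, 4.0, 161)      # quadrature grid over ability (theta)
PRIOR = np.exp(-0.5 * GRID**2)          # standard-normal prior (unnormalized)

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def eap(posterior):
    """Posterior mean (EAP ability estimate) and standard deviation (SEM)."""
    mean = np.sum(GRID * posterior)
    sem = np.sqrt(np.sum((GRID - mean) ** 2 * posterior))
    return mean, sem

rng = np.random.default_rng(0)
bank = rng.uniform(-3, 3, size=200)     # simulated item-difficulty bank
true_theta = 0.5                        # simulated examinee ability
posterior = PRIOR / PRIOR.sum()
used = set()

for _ in range(20):
    theta_hat, sem = eap(posterior)
    # Select the unused item whose difficulty best matches the current estimate.
    item = min(set(range(len(bank))) - used, key=lambda j: abs(bank[j] - theta_hat))
    used.add(item)
    correct = rng.random() < rasch_p(true_theta, bank[item])
    # Bayesian update: multiply the posterior by the response likelihood.
    like = rasch_p(GRID, bank[item]) if correct else 1.0 - rasch_p(GRID, bank[item])
    posterior = posterior * like
    posterior /= posterior.sum()

theta_hat, sem = eap(posterior)
print(f"EAP estimate: {theta_hat:.2f} (true 0.50), SEM: {sem:.2f}")
```

As the comments note, the SEM shrinks as responses accumulate, which is exactly the mechanism the paragraph above describes for reducing test length.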
- Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
- The IXL Universal Math Screener is a brief, accurate assessment designed specifically to identify students from kindergarten through eighth grade who are experiencing moderate or severe math difficulties. It may be administered up to five times per academic year, with performance expectations automatically adjusting for the beginning, middle, and end of the academic year. By scheduling the first administration early in the academic year, educators can promptly identify students with moderate or severe difficulties. Results are available immediately following administration so that intervention resources can be deployed quickly to the students who need them. Results are also available by strand, providing insight into the specific areas where each student is struggling. Content for the IXL Universal Math Screener was developed using a principled assessment design framework. Principled assessment design is a general framework for designing, developing, and implementing an assessment to support the ongoing accumulation and synthesis of evidence for the validity claims made by the assessment (e.g., Ferrara et al., 2016). This framework requires assessment targets to be clearly defined at the beginning of the process. These targets drive the entire assessment development plan, and continuous focus on them ensures that all subsequent decisions are consistent with providing evidence to support validity claims. The assessment targets for the IXL Universal Math Screener are achievement level descriptors (ALDs) derived from all 50 states' standards, including the Common Core. Achievement level descriptors are statements that describe expectations of what students at specific achievement levels should know and be able to do. Because the IXL Universal Math Screener is designed to identify students who are experiencing moderate or severe difficulties in math, the achievement level of interest was conceptualized as the threshold between on-grade and below-grade achievement. Therefore, the ALDs were written for each educational standard to describe what a minimally competent student would know and be able to do. An external panel of subject-matter experts (SMEs) was recruited to review sample items alongside the evaluation of the ALDs described previously. The SME panel consisted of in-service educators with significant experience teaching or supervising the grade level(s) they evaluated, and with demographics matching those of elementary and middle school teachers in the U.S. Multiple rounds of internal review were then conducted to ensure all items appropriately targeted the intended achievement level descriptors while meeting the content and item-writing specifications. The format of the IXL Universal Math Screener includes a mix of selected-response questions, for which a student chooses an answer from a provided set of options, and constructed-response questions, for which a student produces their own answer. Where appropriate, item writers applied the principles of Universal Design (Thompson et al., 2002) to reduce construct-irrelevant variance while simultaneously increasing accessibility. Items were written to target achievement level descriptors while considering factors such as reading level, cognitive load, vocabulary, and sentence length, among others. The IXL Universal Math Screener is designed to be appropriate for students with diverse backgrounds (e.g., race, ethnicity, culture, and gender) and levels of ability.
The screener is available in Spanish, making it accessible for Spanish-speaking students regardless of English language proficiency. It also supports assistive technologies to provide equal accessibility for all students. These include screen reader compatibility, audio support, keyboard shortcuts, browser zoom up to 200 percent, and high contrast ratios. More broadly, the IXL Universal Math Screener is highly adaptive, which allows it to quickly adjust and meet students' diverse abilities and proficiency levels.
Technical Standards
Classification Accuracy & Cross-Validation Summary
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8
---|---|---|---|---|---|---|---
Classification Accuracy Fall | | | | | | | |
Classification Accuracy Winter | | | | | | | |
Classification Accuracy Spring | | | | | | | |

*(Ratings in this table are displayed as bubble images on the original page and are not reproduced here.)*
NWEA MAP Growth
Classification Accuracy
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- The criterion measure is the NWEA MAP® Growth™ math assessment (MAP Growth), a computer-adaptive assessment of math ability. The MAP Growth math assessment and the IXL Universal Math Screener are independent assessments developed by different organizations. The classification analysis samples included students in all four U.S. Census regions: West (CA, AZ), Midwest (MI, OH), South (KY, TX, WV), and Northeast (PA). These states represent seven U.S. Census Divisions: Pacific (CA), Mountain (AZ), East North Central (MI, OH), West South Central (TX), East South Central (KY), Middle Atlantic (PA), and South Atlantic (WV).
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- For the Fall classification accuracy analyses, both measures were administered at the beginning of the 2024-25 school year. The IXL Universal Math Screener was administered between August 1, 2024 and November 30, 2024; the MAP Growth math assessment was administered between August 7, 2024 and November 8, 2024. As these two measurements were taken in close temporal proximity, concurrent validity analyses were conducted. For the Winter classification accuracy analyses, both measures were administered in the middle of the 2024-25 school year. The IXL Universal Math Screener was administered between December 1, 2024 and February 12, 2025; the MAP Growth math assessment was administered between December 1, 2024 and February 13, 2025. As these two measurements were taken in close temporal proximity, concurrent validity analyses were conducted.
- Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Cut points on the MAP Growth math assessment were determined using the 2020 nationwide MAP Growth norms for student achievement. In line with NCII TRC guidance, students with RIT scores below the 20th percentile were considered “at risk,” while those with RIT scores at or above the 20th percentile were considered “not at risk.” Cut points on the screening measure were identified as theta values that maximized classification accuracy with the criterion measure. Using these values, students were classified as “at risk” if they scored below the cut point and “not at risk” if they scored at or above the cut point.
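The search described above can be pictured with a short sketch: scan candidate cut points on the screener's theta scale and keep the value that maximizes agreement with the criterion's at-risk/not-at-risk labels. The data and the quantile-based candidate grid below are synthetic assumptions for illustration, not IXL's analysis code.

```python
import numpy as np

# Sketch of selecting a screener cut point that maximizes classification
# accuracy against a binary at-risk criterion. All data are synthetic.
rng = np.random.default_rng(42)
theta = rng.normal(0.0, 1.0, 1000)                  # screener ability estimates
criterion = theta + rng.normal(0.0, 0.7, 1000)      # noisy criterion score
at_risk = criterion < np.quantile(criterion, 0.20)  # bottom 20% = "at risk"

def accuracy_at(cut):
    """Overall agreement when scores below the cut are classified at risk."""
    return np.mean((theta < cut) == at_risk)

candidates = np.quantile(theta, np.linspace(0.01, 0.99, 99))
best = max(candidates, key=accuracy_at)
print(f"best cut point: {best:.2f}, accuracy: {accuracy_at(best):.3f}")
```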
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- No
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Cross-Validation
- Has a cross-validation study been conducted?
- No
- If yes,
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Classification Accuracy - Fall
Evidence | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Criterion measure | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth |
Cut Points - Percentile rank on criterion measure | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
Cut Points - Performance score on criterion measure | |||||||
Cut Points - Corresponding performance score (numeric) on screener measure | -1.9 | -1.4 | -1.4 | -1.1 | -1.4 | -1.1 | -0.9 |
Classification Data - True Positive (a) | 23 | 172 | 168 | 175 | 223 | 208 | 157 |
Classification Data - False Positive (b) | 23 | 115 | 59 | 78 | 62 | 78 | 65 |
Classification Data - False Negative (c) | 10 | 80 | 78 | 83 | 196 | 140 | 86 |
Classification Data - True Negative (d) | 266 | 724 | 924 | 931 | 1265 | 1050 | 857 |
Area Under the Curve (AUC) | 0.81 | 0.77 | 0.81 | 0.80 | 0.74 | 0.76 | 0.79 |
AUC Estimate’s 95% Confidence Interval: Lower Bound | 0.73 | 0.74 | 0.78 | 0.77 | 0.72 | 0.74 | 0.76 |
AUC Estimate’s 95% Confidence Interval: Upper Bound | 0.89 | 0.84 | 0.84 | 0.83 | 0.77 | 0.79 | 0.82 |
Statistics | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Base Rate | 0.10 | 0.23 | 0.20 | 0.20 | 0.24 | 0.24 | 0.21 |
Overall Classification Rate | 0.90 | 0.82 | 0.89 | 0.87 | 0.85 | 0.85 | 0.87 |
Sensitivity | 0.70 | 0.68 | 0.68 | 0.68 | 0.53 | 0.60 | 0.65 |
Specificity | 0.92 | 0.86 | 0.94 | 0.92 | 0.95 | 0.93 | 0.93 |
False Positive Rate | 0.08 | 0.14 | 0.06 | 0.08 | 0.05 | 0.07 | 0.07 |
False Negative Rate | 0.30 | 0.32 | 0.32 | 0.32 | 0.47 | 0.40 | 0.35 |
Positive Predictive Power | 0.50 | 0.60 | 0.74 | 0.69 | 0.78 | 0.73 | 0.71 |
Negative Predictive Power | 0.96 | 0.90 | 0.92 | 0.92 | 0.87 | 0.88 | 0.91 |
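Each statistic above follows directly from the 2×2 classification counts in the Evidence table. As a check, plugging the Grade 2 Fall counts (a = 23, b = 23, c = 10, d = 266) into the standard definitions reproduces the reported values:

```python
# Recompute the Grade 2 Fall statistics from the classification counts above.
a, b, c, d = 23, 23, 10, 266        # TP, FP, FN, TN
n = a + b + c + d                   # 322, matching the reported sample size

print(f"Base rate:                   {(a + c) / n:.2f}")  # 0.10
print(f"Overall classification rate: {(a + d) / n:.2f}")  # 0.90
print(f"Sensitivity:                 {a / (a + c):.2f}")  # 0.70
print(f"Specificity:                 {d / (b + d):.2f}")  # 0.92
print(f"False positive rate:         {b / (b + d):.2f}")  # 0.08
print(f"False negative rate:         {c / (a + c):.2f}")  # 0.30
print(f"Positive predictive power:   {a / (a + b):.2f}")  # 0.50
print(f"Negative predictive power:   {d / (c + d):.2f}")  # 0.96
```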
Sample | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Date | Fall 2024 | Fall 2024 | Fall 2024 | Fall 2024 | Fall 2024 | Fall 2024 | Fall 2024 |
Sample Size | 322 | 1091 | 1229 | 1267 | 1746 | 1476 | 1165 |
Geographic Representation | Middle Atlantic (PA), West South Central (TX) | East North Central (MI), Middle Atlantic (PA) | East North Central (MI), Middle Atlantic (PA) | East South Central (KY), Middle Atlantic (PA) | East North Central (MI), Middle Atlantic (PA) | East North Central (MI), Middle Atlantic (PA) | East North Central (MI), Middle Atlantic (PA) |
Male | |||||||
Female | |||||||
Other | |||||||
Gender Unknown | |||||||
White, Non-Hispanic | |||||||
Black, Non-Hispanic | |||||||
Hispanic | |||||||
Asian/Pacific Islander | |||||||
American Indian/Alaska Native | |||||||
Other | |||||||
Race / Ethnicity Unknown | |||||||
Low SES | |||||||
IEP or diagnosed disability | |||||||
English Language Learner |
Classification Accuracy - Winter
Evidence | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|
Criterion measure | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth | NWEA MAP Growth |
Cut Points - Percentile rank on criterion measure | 20 | 20 | 20 | 20 | 20 |
Cut Points - Performance score on criterion measure | |||||
Cut Points - Corresponding performance score (numeric) on screener measure | -1.0 | -0.7 | -0.9 | -0.4 | -0.3 |
Classification Data - True Positive (a) | 75 | 141 | 59 | 91 | 70 |
Classification Data - False Positive (b) | 45 | 92 | 46 | 56 | 62 |
Classification Data - False Negative (c) | 23 | 24 | 5 | 18 | 9 |
Classification Data - True Negative (d) | 151 | 269 | 142 | 141 | 116 |
Area Under the Curve (AUC) | 0.77 | 0.80 | 0.84 | 0.78 | 0.77 |
AUC Estimate’s 95% Confidence Interval: Lower Bound | 0.72 | 0.76 | 0.79 | 0.73 | 0.72 |
AUC Estimate’s 95% Confidence Interval: Upper Bound | 0.82 | 0.84 | 0.88 | 0.82 | 0.82 |
Statistics | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|
Base Rate | 0.33 | 0.31 | 0.25 | 0.36 | 0.31 |
Overall Classification Rate | 0.77 | 0.78 | 0.80 | 0.76 | 0.72 |
Sensitivity | 0.77 | 0.85 | 0.92 | 0.83 | 0.89 |
Specificity | 0.77 | 0.75 | 0.76 | 0.72 | 0.65 |
False Positive Rate | 0.23 | 0.25 | 0.24 | 0.28 | 0.35 |
False Negative Rate | 0.23 | 0.15 | 0.08 | 0.17 | 0.11 |
Positive Predictive Power | 0.63 | 0.61 | 0.56 | 0.62 | 0.53 |
Negative Predictive Power | 0.87 | 0.92 | 0.97 | 0.89 | 0.93 |
Sample | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|
Date | Winter 2025 | Winter 2025 | Winter 2025 | Winter 2025 | Winter 2025 |
Sample Size | 294 | 526 | 252 | 306 | 257 |
Geographic Representation | Pacific (CA), West South Central (TX) | Mountain (AZ), West South Central (TX) | Pacific (CA), West South Central (TX) | East North Central (OH), West South Central (TX) | Middle Atlantic (PA), South Atlantic (WV) |
Male | |||||
Female | |||||
Other | |||||
Gender Unknown | |||||
White, Non-Hispanic | |||||
Black, Non-Hispanic | |||||
Hispanic | |||||
Asian/Pacific Islander | |||||
American Indian/Alaska Native | |||||
Other | |||||
Race / Ethnicity Unknown | |||||
Low SES | |||||
IEP or diagnosed disability | |||||
English Language Learner |
Reliability
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8
---|---|---|---|---|---|---|---
Rating | | | | | | | |

*(Ratings in this table are displayed as bubble images on the original page and are not reproduced here.)*
- *Offer a justification for each type of reliability reported, given the type and purpose of the tool.
- Often, the reliability of a traditional, fixed-form assessment is evaluated via internal consistency measures such as Cronbach's alpha or McDonald's omega. In a CAT context, however, these measures are not appropriate. Below, we report two types of reliability that are appropriate for CAT assessments: marginal reliability and standard error of measurement (SEM; including accompanying plots). Marginal reliability is reported in the table further below. Standard error of measurement: a student's SEM is an indicator of the precision of their score and describes the range in which a score may vary upon repeated testing due to chance. SEM is a function of the interaction between the ability of a student, the difficulty of the items, and the number of items on a test. A lower SEM indicates less error and more precision around a score. Because the CAT algorithm selects items based on a student's estimated ability level, it can target a student more accurately and significantly decrease the SEM with fewer items than a traditional fixed-form assessment. Although the CAT algorithm targets students according to their ability, it generally performs better for students whose ability and performance are closer to the typical ability of students at or near their grade level. As seen in the plots (available from the Center upon request), the SEM is lowest for grade 2 students whose ability is around -2.0, grade 3 students whose ability is between -2.0 and 0.0, grade 4 students whose ability is between -0.5 and 0.5, grade 5 students whose ability is between 0.0 and 1.0, grade 6 students whose ability is between 0.5 and 1.5, grade 7 students whose ability is around 1.5, and grade 8 students whose ability is between 1.5 and 2.5.
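For reference, one common formulation of marginal reliability for IRT-based scores (cf. Dimitrov, 2003, cited below) expresses it as the proportion of ability variance not attributable to measurement error; the manual's exact computation may differ:

```latex
\bar{\rho} \;=\; \frac{\sigma_{\theta}^{2} - \overline{\mathrm{SEM}^{2}}}{\sigma_{\theta}^{2}}
\;=\; 1 \;-\; \frac{\overline{\mathrm{SEM}^{2}}}{\sigma_{\theta}^{2}}
```

where $\sigma_{\theta}^{2}$ is the variance of the ability estimates and $\overline{\mathrm{SEM}^{2}}$ is the mean squared standard error across examinees.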
- *Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
- The sample for calculating the marginal reliability and SEM included student records from the 2022-2023 school year. The sample included students in all nine U.S. Census Bureau divisions. Sample sizes by grade are presented in the table below. Students in this sample spanned all performance levels.
- *Describe the analysis procedures for each reported type of reliability.
- Marginal reliability: Although traditional measures of reliability are not estimable for adaptive assessments, marginal reliability provides a method that closely approximates the traditional measures of internal consistency when the ability distribution and item parameters are known (see Dimitrov, 2003; Samejima, 1977, 1994). SEM: As mentioned, the IXL Universal Math Screener uses EAP to estimate student ability and SEM to indicate the level of certainty or reliability of this estimate. It derives the SEM by calculating the standard deviation of the posterior distribution of the EAP estimate by integrating over all possible values of ability given a response pattern (see Bock & Mislevy, 1982). Although not summarized here, the Technical Manual for the IXL Universal Math Screener includes additional reliability information, including test-retest reliability and classification consistency.
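In the notation of Bock and Mislevy (1982), the EAP estimate and its SEM are the mean and standard deviation of the posterior distribution of ability $\theta$ given a response pattern $\mathbf{u}$, with response-pattern likelihood $L(\mathbf{u}\mid\theta)$ and prior $\pi(\theta)$:

```latex
\hat{\theta}_{\mathrm{EAP}}
  = \frac{\int \theta \, L(\mathbf{u}\mid\theta)\,\pi(\theta)\,d\theta}
         {\int L(\mathbf{u}\mid\theta)\,\pi(\theta)\,d\theta},
\qquad
\mathrm{SEM}
  = \sqrt{\frac{\int \bigl(\theta - \hat{\theta}_{\mathrm{EAP}}\bigr)^{2}
                \, L(\mathbf{u}\mid\theta)\,\pi(\theta)\,d\theta}
               {\int L(\mathbf{u}\mid\theta)\,\pi(\theta)\,d\theta}}
```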
*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).
Type of Reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
---|---|---|---|---|---|---|---|---
- Results from other forms of reliability analysis not compatible with above table format:
- Standard error of measurement (SEM) describes the range in which a score may vary upon repeated testing due to chance. It is inversely related to marginal reliability, with lower SEM values indicating higher precision. In this sample, median SEM coefficients ranged from 0.441 (grade 4) to 0.484 (grade 2), indicating low error and high precision around students' scores on the IXL Universal Math Screener across grades 2-8. Plots of SEM for each grade, along with histograms of test-taker counts by ability level across grades 2-8, are available from the Center upon request; the histograms show the wide spectrum of math ability in this sample, demonstrating that it included students across all performance levels.
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated reliability data.
Type of Reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
---|---|---|---|---|---|---|---|---
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- Provide citations for additional published studies.
Validity
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8
---|---|---|---|---|---|---|---
Rating | | | | | | | |

*(Ratings in this table are displayed as bubble images on the original page and are not reproduced here.)*
- *Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- The IXL Universal Math Screener Technical Manual provides validity evidence based on test content (i.e., subject-matter expert review), internal structure (i.e., unidimensionality and DIF), and relations to other variables. (For more information, see the Technical Manual: https://www.ixl.com/materials/us/IXL_Math_Universal_Screener_Technical_Manual.pdf.) In this section, we focus on the relationship between the screener and an external assessment across all four U.S. Census Bureau regions and five of the nine U.S. Census Bureau geographical divisions. In both the concurrent and predictive validity analyses, the criterion measure was the NWEA MAP® Growth™ math assessment (MAP Growth), a widely used computer-adaptive assessment of math ability. The MAP Growth math assessment and the IXL Universal Math Screener are independent assessments developed by different organizations. Although the two assessments were developed separately, they are expected to be related because they measure the same underlying construct, i.e., students' math ability.
- *Describe the sample(s), including size and characteristics, for each validity analysis conducted.
- Students were included in the sample if they completed both measures (the IXL Universal Math Screener and the NWEA MAP Growth assessment) within the time frame of interest. For the concurrent analyses, the time frame of interest for both assessments was August-November 2024. For the predictive analyses, the IXL Universal Math Screener was administered in August-November 2024, and the NWEA MAP Growth math assessment was administered in February-May 2025. In the concurrent analyses, sample sizes ranged from 322 (grade 2) to 1,746 (grade 6). In the predictive analyses, sample sizes ranged from 249 (grade 3) to 669 (grade 6). The samples for both analyses represented students across all performance levels, all four U.S. Census Bureau regions, and five of the nine U.S. Census Bureau geographical divisions.
- *Describe the analysis procedures for each reported type of validity.
- Pearson product-moment correlation analyses were conducted between IXL Universal Math Screener theta values (i.e., continuous measures of student math ability) and students' scores on the outcome measure (MAP Growth RIT scores). Confidence intervals (95%) were calculated using Fisher's r-to-z transformation. For the concurrent analyses, a positive correlation between the two variables indicates that students who have higher math ability as measured by the IXL Universal Math Screener also have higher math ability as measured by a concurrent administration of the criterion assessment. For the predictive analyses, a positive correlation indicates that students who have higher math ability as measured by the IXL Universal Math Screener also have higher math ability as measured by an administration of the criterion assessment that occurred later in time.
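The Fisher r-to-z interval mentioned above is straightforward to compute. The sketch below uses an illustrative correlation and sample size in the reported ranges; it is not IXL's actual analysis code.

```python
import numpy as np
from scipy import stats

def pearson_ci(r: float, n: int, conf: float = 0.95):
    """Confidence interval for a Pearson correlation via Fisher's r-to-z."""
    z = np.arctanh(r)                      # Fisher transformation
    se = 1.0 / np.sqrt(n - 3)              # standard error of z
    crit = stats.norm.ppf(0.5 + conf / 2)  # ~1.96 for a 95% interval
    return float(np.tanh(z - crit * se)), float(np.tanh(z + crit * se))

# Illustrative values in the reported range: r = .80 with n = 1,000 students
print(pearson_ci(0.80, 1000))  # approximately (0.78, 0.82)
```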
*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type of Validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
---|---|---|---|---|---|---|---|---
- Results from other forms of validity analysis not compatible with above table format:
- Establishing the validity of an assessment includes "accumulating relevant evidence to provide a sound scientific basis for the proposed score interpretations" (AERA, APA, & NCME, 2014, p. 11). One way to provide evidence for such interpretations in a CAT context is by establishing unidimensionality, meaning that only a single construct is the target of measurement. Given that most methods of testing for unidimensionality are challenging, if not impossible, with CAT data because of sparseness, we used strand-level subscores to examine grade-level construct validity and to determine whether a unidimensional Rasch model was appropriate. We fitted unidimensional confirmatory factor analysis (CFA) models to grade-level subscore data collected from August 2024 through January 2025 (Wang, McCall, Jiao, & Harris, 2013). For grade 2, we specified a single factor with four strand-level subscores representing Algebra & Algebraic Thinking, Data & Measurement, Geometry, and Numbers & Operations. For grades 3-8, we specified a single factor with five strand-level subscores, expanding the grade 2 set to include Fractions. Student scores came from the 2024-25 school year. Results show that the assumption of unidimensionality was tenable across all grades (full results table available from the Center upon request), with all CFI values greater than 0.95, all RMSEA values less than 0.06 except for grade 8 (0.074), and all SRMR values less than 0.08 (Hu & Bentler, 1999). Together, these statistics indicate that a single construct (math ability) is being assessed, i.e., that a unidimensional Rasch model is appropriate for administering the IXL Universal Math Screener.
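For readers who want to see the shape of such a check, here is a hedged sketch of a single-factor CFA over five strand subscores. It assumes the Python semopy package and placeholder column names; IXL does not state which software was used, and the thresholds are the Hu and Bentler (1999) rules of thumb cited above.

```python
# Sketch of a single-factor CFA over strand subscores. Assumes the semopy
# package and placeholder column names; not IXL's actual analysis code.
import pandas as pd
import semopy

# One latent math factor indicated by five strand-level subscores.
MODEL_DESC = "math =~ num_ops + algebra + fractions + geometry + data_meas"

def check_unidimensionality(subscores: pd.DataFrame) -> bool:
    """Fit the one-factor model and apply Hu & Bentler (1999) thresholds."""
    model = semopy.Model(MODEL_DESC)
    model.fit(subscores)              # subscores: one column per strand
    fit = semopy.calc_stats(model)    # DataFrame of fit indices
    cfi = fit["CFI"].values[0]
    rmsea = fit["RMSEA"].values[0]
    return cfi > 0.95 and rmsea < 0.06
```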
- Manual cites other published validity studies:
- Yes
- Provide citations for additional published studies.
- Hargis, M. B. (2023). Assessing the concurrent and predictive validity of the IXL Universal Math Screener using NWEA MAP Growth as criterion. IXL Learning. https://www.ixl.com/materials/us/research/Assessing_the_Concurrent_and_Predictive_Validity_of_the_IXL_Math_Universal_Screener_Using_NWEA_MAP_Growth_as_Criterion.pdf
- Describe the degree to which the provided data support the validity of the tool.
- In each grade, there was a statistically significant positive correlation between concurrent administrations of the IXL Universal Math Screener and an external math assessment, NWEA MAP Growth, as well as between earlier administrations of the screener and later administrations of MAP Growth. Coefficients ranged from .70 to .86, reflecting strong relationships between students' ability as measured by the IXL Universal Math Screener and the widely used MAP Growth assessment. These results illustrate a strong relationship between the two assessments in terms of both concurrent and predictive validity, with the advantage that the IXL screener requires substantially less time to administer (about half the time needed for MAP Growth), allowing completion in less than a class period. In addition, the unidimensionality analyses indicate that, as expected, a single construct (math ability) is being assessed by the IXL Universal Math Screener.
- Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated validity data.
Type of Validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
---|---|---|---|---|---|---|---|---
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Provide citations for additional published studies.
Bias Analysis
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8
---|---|---|---|---|---|---|---
Rating | Provided | Provided | Provided | Provided | Provided | Provided | Provided
- Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
- Yes
- If yes,
- a. Describe the method used to determine the presence or absence of bias:
- Differential item functioning (DIF) analysis investigates each item for signs of interaction with sample characteristics. An item is said to exhibit DIF when equally able individuals from different groups have notably differing probabilities of answering the item correctly. DIF detection procedures help gather validity evidence for the proposed interpretations of test scores by ensuring that scores are free from potential bias and that individual items do not create an advantage for one group over another. The majority of DIF detection methods are based on manifest groups, where the group membership of interest (e.g., gender) is known prior to the DIF analysis. However, when the manifest group membership is unknown, unobservable, or does not provide a valid indication of true group membership, the use of a latent (i.e., data-identified) group membership may be warranted. We report both latent and manifest DIF analyses. To examine latent DIF, we first identified the number of latent classes through an exploratory mixture Rasch model analysis, which provides a probability of group membership for each student across all classes. Models were constructed with different numbers of latent classes, and model fit was compared using the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and log-likelihood. A chi-square likelihood-ratio test was then conducted to assess the goodness of fit of the competing models, and Cohen's omega was calculated to determine the effect size (Cohen, 1988). Using Cohen's omega, we assessed whether there were meaningful differences between the models, which would indicate potential bias. To examine manifest DIF, we used the Rasch separate-calibration t-test method. This method is based on the differences between two separate calibrations of the same item from the subpopulations of interest, holding the other item and person parameters constant to ensure scale stability (Wright & Stone, 1979). Determining the flagging criterion for this method typically involves setting a magnitude for the difference between the two calibrations and an appropriate p-value for significance. Wright and Douglas (1975) proposed a "half-logit" rule, where a difference in item difficulty between examinee subgroups of at least 0.5 logits warrants additional investigation. This critical value reflects a sizable portion of the typical range of item difficulty in operational tests (about -2.5 to +2.5 logits); accordingly, a shift of 0.5 logits (10% of the scale) begins to affect the accuracy of the measurement. These criteria (a statistically significant p-value and a shift of at least 0.5 logits) were used to assess manifest DIF in the IXL Universal Math Screener.
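A bare-bones sketch of the separate-calibration flagging rule just described: an item is flagged when the difficulty difference between the two group calibrations is at least 0.5 logits and statistically significant. The difficulty and standard-error values below are synthetic, and the normal approximation to the t-statistic is our simplification, not IXL's implementation.

```python
import numpy as np
from scipy import stats

def dif_flag(b1, se1, b2, se2, min_logits=0.5, alpha=0.05):
    """Flag an item for DIF under the separate-calibration rule.

    b1/se1 and b2/se2 are the item's difficulty and standard error from
    separate Rasch calibrations in the two groups (synthetic values below).
    """
    diff = b1 - b2
    t = diff / np.sqrt(se1**2 + se2**2)  # cf. Wright & Stone (1979)
    p = 2 * stats.norm.sf(abs(t))        # two-sided, normal approximation
    return abs(diff) >= min_logits and p < alpha

print(dif_flag(b1=0.80, se1=0.10, b2=0.10, se2=0.12))  # True: >= 0.5 logits and significant
print(dif_flag(b1=0.30, se1=0.10, b2=0.10, se2=0.12))  # False: shift under 0.5 logits
```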
- b. Describe the subgroups for which bias analyses were conducted:
- Student gender (male vs. female).
- c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
- Latent DIF: For each grade, the 2-class model fit better than the 1-class model; however, Cohen's omega showed no practical effect-size difference between the two models (Cohen's omega ranged from 0.016 to 0.026), which suggests that only a single class is present in the data. This result supports the Rasch model assumption of invariance and suggests that potential bias advantaging one group over another is undetectable. Manifest DIF: When investigating DIF based on reported sex, out of thousands of items that received sufficient exposures from the relevant groups, we flagged 46 items in grade 2, 64 in grade 3, 100 in grade 4, 82 in grade 5, 78 in grade 6, 92 in grade 7, and 74 in grade 8 for potential DIF. More importantly, all the flagged items were found to be free from substantive DIF (see below). Substantive DIF: It is important to distinguish between statistical DIF and substantive DIF (Penfield & Lam, 2000; Roussos & Stout, 1996). Statistical DIF refers to the statistical identification of DIF, whereas substantive DIF refers to the identification of construct-irrelevant factors responsible for the statistical DIF (i.e., potential sources of bias). Statistical DIF is simply a detection strategy, and careful scrutiny of items is always warranted to identify and address substantive DIF. DIF detection methods may identify some items as unbiased even though they are biased, while some items identified as having bias may not actually be biased. Therefore, all items flagged as potentially biased using the methods outlined above were reviewed by subject-matter experts to ensure that they are free from substantive DIF.
Data Collection Practices
Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.