aimswebPlus Math
Number Comparison Fluency - Triads

Summary

aimswebPlus Number Comparison Fluency - Triads (NCF-T) is a computer-based, standardized test that measures a student’s ability to quickly evaluate the magnitude of numbers and make accurate comparisons. During the test, students answer multiple-choice items requiring magnitude comparisons among three answer options. Each item is presented as a triad of numbers: a target number on top and two reference numbers below it to the left and right. The student must determine whether the magnitude of the target number is closer to that of the reference number on the left, closer to that of the reference number on the right, or exactly between the two. The student attempts as many items as possible in three minutes. Students receive one point for each correct response, no points for skipped items, and lose a half point for each incorrect response, which functions as a correction for guessing. Students may earn up to 40 total points. aimswebPlus scoring reports offer ways for examiners and teachers to evaluate which specific items students answer incorrectly, grouped into categories of items assessing similar subskills. As a timed test, scores reflect a student’s fluency with magnitude processing. NCF-T is designed for Grades 2-8, with options for Grades 9-12 to use off-grade-level forms. The forms for each grade level present students with numbers of varying complexity, following the common progression of number concepts taught in Grades 2-8, including multi-digit integers, fractions and decimals, negatives, and numbers with exponents, including scientific notation. There are 23 equivalent alternate test forms for each grade, allowing NCF-T to be used both for universal screening with all students at the beginning, middle, and end of the school year and for frequent progress monitoring of students identified as at risk. As a computer-based measure, NCF-T may be administered to groups or to individual students. Students complete NCF-T on a computer, answering multiple-choice items via TestNav, a secure web-based testing platform.

Where to Obtain:
Pearson Inc.
aimswebsupport@pearson.com
Pearson Clinical Assessment, 927 E. Sonterra Blvd., Suite 119, San Antonio, TX, 78258
1-866-313-6194
www.pearsonassessments.com/aimswebplus
Initial Cost:
$7.00 per student
Replacement Cost:
$7.00 per student per year
Included in Cost:
aimswebPlus is a subscription-based online solution that includes digital editions of training manuals and testing materials within the application. The per-student cost of $7.00 for one year grants access to all measures (reading and math). An aimswebPlus Unlimited subscription is available for districts with enrollment of 2,500 students or fewer. It includes all aimswebPlus measures (reading and math) and these supplemental measures: Shaywitz DyslexiaScreen, BASC-3 BESS Teacher and Student forms, WriteToLearn, and RAN Objects, Colors and Shapes. The cost for one year is $4995.00.
Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Number Comparison Fluency–Triads is an individually administered, timed test that employs strict time limits to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. Only these accommodations are allowed for Number Comparison Fluency–Triads: enlarging test screens and modifying the environment (e.g., special lighting, adaptive furniture).
Training Requirements:
Less than one hour of administrator training is required: administrators read the administration and scoring directions and learn the rules for proctoring testing sessions.
Qualified Administrators:
Administrators may be paraprofessional or professional members of the educational staff. Test administrators must have knowledge of the NCF-T administration and scoring guidelines.
Access to Technical Support:
Users can obtain ongoing professional and technical support from Pearson by phone (1-866-313-6194) or email (aimswebsupport@pearson.com).
Assessment Format:
  • Individual
  • Small group
  • Large group
  • Computer-administered
Scoring Time:
  • Scoring is automatic (0 minutes per student)
Scores Generated:
  • Raw score
  • Percentile score
  • Error analysis
  • Subscale/subtest scores
Administration Time:
  • 3 minutes per student
Scoring Method:
  • Automatically (computer-scored)
Technology Requirements:
  • Computer or tablet
  • Internet connection

Tool Information

Descriptive Information

Please provide a description of your tool:
aimswebPlus Number Comparison Fluency - Triads (NCF-T) is a computer-based, standardized test that measures a student’s ability to quickly evaluate the magnitude of numbers and make accurate comparisons. During the test, students answer multiple-choice items requiring magnitude comparisons among three answer options. Each item is presented as a triad of numbers: a target number on top and two reference numbers below it to the left and right. The student must determine whether the magnitude of the target number is closer to that of the reference number on the left, closer to that of the reference number on the right, or exactly between the two. The student attempts as many items as possible in three minutes. Students receive one point for each correct response, no points for skipped items, and lose a half point for each incorrect response, which functions as a correction for guessing. Students may earn up to 40 total points. aimswebPlus scoring reports offer ways for examiners and teachers to evaluate which specific items students answer incorrectly, grouped into categories of items assessing similar subskills. As a timed test, scores reflect a student’s fluency with magnitude processing. NCF-T is designed for Grades 2-8, with options for Grades 9-12 to use off-grade-level forms. The forms for each grade level present students with numbers of varying complexity, following the common progression of number concepts taught in Grades 2-8, including multi-digit integers, fractions and decimals, negatives, and numbers with exponents, including scientific notation. There are 23 equivalent alternate test forms for each grade, allowing NCF-T to be used both for universal screening with all students at the beginning, middle, and end of the school year and for frequent progress monitoring of students identified as at risk. As a computer-based measure, NCF-T may be administered to groups or to individual students. Students complete NCF-T on a computer, answering multiple-choice items via TestNav, a secure web-based testing platform.
Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?
selected
not selected
The tool is intended for use with the following grade(s).
not selected Preschool / Pre - kindergarten
not selected Kindergarten
not selected First grade
selected Second grade
selected Third grade
selected Fourth grade
selected Fifth grade
selected Sixth grade
selected Seventh grade
selected Eighth grade
not selected Ninth grade
not selected Tenth grade
not selected Eleventh grade
not selected Twelfth grade

The tool is intended for use with the following age(s).
not selected 0-4 years old
not selected 5 years old
not selected 6 years old
selected 7 years old
selected 8 years old
selected 9 years old
selected 10 years old
selected 11 years old
selected 12 years old
selected 13 years old
selected 14 years old
not selected 15 years old
not selected 16 years old
not selected 17 years old
not selected 18 years old

The tool is intended for use with the following student populations.
selected Students in general education
selected Students with disabilities
selected English language learners

ACADEMIC ONLY: What dimensions does the tool assess?

Reading
not selected Global Indicator of Reading Competence
not selected Listening Comprehension
not selected Vocabulary
not selected Phonemic Awareness
not selected Decoding
not selected Passage Reading
not selected Word Identification
not selected Comprehension

Spelling & Written Expression
not selected Global Indicator of Spelling Competence
not selected Global Indicator of Written Expression Competence

Mathematics
not selected Global Indicator of Mathematics Comprehension
not selected Early Numeracy
selected Mathematics Concepts
not selected Mathematics Computation
not selected Mathematics Application
selected Fractions
not selected Algebra

Other
Please describe specific domain, skills or subtests:
Number notation knowledge and magnitude processing

BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
BEHAVIOR ONLY: Which category of behaviors does your tool target?

Acquisition and Cost Information

Where to obtain:
Email Address
aimswebsupport@pearson.com
Address
Pearson Clinical Assessment, 927 E. Sonterra Blvd., Suite 119, San Antonio, TX, 78258
Phone Number
1-866-313-6194
Website
www.pearsonassessments.com/aimswebplus
Initial cost for implementing program:
Cost
$7.00
Unit of cost
student
Replacement cost per unit for subsequent use:
Cost
$7.00
Unit of cost
student
Duration of license
year
Additional cost information:
Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.
aimswebPlus is a subscription-based online solution that includes digital editions of training manuals and testing materials within the application. The per-student cost of $7.00 for one year grants access to all measures (reading and math). An aimswebPlus Unlimited subscription is available for districts with enrollment of 2,500 students or fewer. It includes all aimswebPlus measures (reading and math) and these supplemental measures: Shaywitz DyslexiaScreen, BASC-3 BESS Teacher and Student forms, WriteToLearn, and RAN Objects, Colors and Shapes. The cost for one year is $4995.00.
Provide information about special accommodations for students with disabilities.
Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Number Comparison Fluency–Triads is an individually administered, timed test that employs strict time limits to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. Only these accommodations are allowed for Number Comparison Fluency–Triads: enlarging test screens and modifying the environment (e.g., special lighting, adaptive furniture).

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?
not selected
not selected
not selected
not selected
not selected
not selected
If other, please specify:

BEHAVIOR ONLY: What is the administration format?
not selected
not selected
not selected
not selected
not selected
If other, please specify:

BEHAVIOR ONLY: What is the administration setting?
not selected
not selected
not selected
not selected
not selected
not selected
not selected
If other, please specify:

Does the program require technology?

If yes, what technology is required to implement your program? (Select all that apply)
selected
selected
not selected

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?
selected
selected    If small group, n=
selected    If large group, n=
selected
not selected
If other, please specify:

What is the administration time?
Time in minutes
3
per (student/group/other unit)
student

Additional scoring time:
Time in minutes
0
per (student/group/other unit)
student

How many alternate forms are available, if applicable?
Number of alternate forms
23
per (grade/level/unit)
grade

ACADEMIC ONLY: What are the discontinue rules?
selected
not selected
not selected
not selected
If other, please specify:

BEHAVIOR ONLY: Can multiple students be rated concurrently by one administrator?
If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?
Yes
Describe the time required for administrator training, if applicable:
Less than one hour of administrator training is required: administrators read the administration and scoring directions and learn the rules for proctoring testing sessions.
Please describe the minimum qualifications an administrator must possess.
Administrators may be paraprofessional or professional members of the educational staff. Test administrators must have knowledge of the NCF-T administration and scoring guidelines.
not selected No minimum qualifications
Are training manuals and materials available?
Yes
Are training manuals/materials field-tested?
No
Are training manuals/materials included in cost of tools?
Yes
If No, please describe training costs:
Can users obtain ongoing professional and technical support?
Yes
If Yes, please describe how users can obtain support:
Users can obtain ongoing professional and technical support from Pearson by phone (1-866-313-6194) or email (aimswebsupport@pearson.com).

Scoring

BEHAVIOR ONLY: What types of scores result from the administration of the assessment?
Score
Observation Behavior Rating
not selected Frequency
not selected Duration
not selected Interval
not selected Latency
not selected Raw score
Conversion
Observation Behavior Rating
not selected Rate
not selected Percent
not selected Standard score
not selected Subscale/ Subtest
not selected Composite
not selected Stanine
not selected Percentile ranks
not selected Normal curve equivalents
not selected IRT based scores
Interpretation
Observation Behavior Rating
not selected Error analysis
not selected Peer comparison
not selected Rate of change
not selected Dev. benchmarks
not selected Age-Grade equivalent
How are scores calculated?
not selected Manually (by hand)
selected Automatically (computer-scored)
not selected Other
If other, please specify:

Do you provide basis for calculating performance level scores?
Yes

What is the basis for calculating performance level and percentile scores?
not selected Age norms
selected Grade norms
not selected Classwide norms
selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

What types of performance level scores are available?
selected Raw score
not selected Standard score
selected Percentile score
not selected Grade equivalents
not selected IRT-based score
not selected Age equivalents
not selected Stanines
not selected Normal curve equivalents
not selected Developmental benchmarks
not selected Developmental cut points
not selected Equated
not selected Probability
not selected Lexile score
selected Error analysis
not selected Composite scores
selected Subscale/subtest scores
not selected Other
If other, please specify:

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
There are 40 total items on NCF-T forms. The NCF-T total score is calculated using a correction for guessing based on the accuracy of all items attempted. Students receive 1 point for each correct response (CR) and lose ½ point for each incorrect response (IR); the result is then rounded down to the nearest integer. Items not attempted (skipped with no response) and items not reached are ignored. The maximum total score is 40 points.
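For illustration, the guessing-corrected total score described above can be computed as in the following minimal R sketch (the function name and example values are ours, not part of the aimswebPlus software; handling of negative raw values is not specified in the published materials and is left as-is here):

    # Guessing-corrected NCF-T total score: 1 point per correct response,
    # minus 1/2 point per incorrect response; skipped and not-reached items
    # contribute nothing. The maximum possible total is 40.
    ncf_t_total <- function(correct, incorrect) {
      raw <- correct - 0.5 * incorrect
      floor(raw)  # rounded down to the nearest integer
    }
    ncf_t_total(correct = 25, incorrect = 3)  # 25 - 1.5 = 23.5, reported as 23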
Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?
Yes
ACADEMIC ONLY: Do you provide benchmarks for the slopes?
Yes
ACADEMIC ONLY: Do you provide percentile ranks for the slopes?
Yes
What is the basis for calculating slope and percentile scores?
not selected Age norms
selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
Once a student has been identified as in need of intensive intervention with progress monitoring, teachers or interventionists create a progress monitoring schedule within aimswebPlus. Initiating progress monitoring with NCF-T begins with identifying a student’s baseline performance. Baseline scores are typically gathered during benchmark screening but can also be obtained using progress monitoring forms directly. For students performing significantly below grade-level expectations, we offer options for baseline performance to be set using off-grade-level forms, which can be beneficial for detecting meaningful growth (e.g., avoiding floor effects). Baseline scores are compared against national percentile norms to determine a student’s initial performance level. aimswebPlus uses national norming data on student growth conditional on initial performance level (student growth percentiles, or SGPs) to provide the administrator/examiner feedback when creating a progress monitoring schedule. Administrators choose a target end date for the progress monitoring schedule, an interval of testing time points, and a target goal score. Choosing an optimal target goal score for the target end date is supported with feedback based on SGP norms matching the student’s initial performance level. The feedback helps the administrator choose a goal score that is ambitious enough to help close the gap, but neither unrealistically high nor too low. As a curriculum-based measure, NCF-T is designed to be a brief assessment that minimizes testing time, especially for students who have difficulties directly related to the measure's content. NCF-T is typically administered individually or in small-group testing sessions for progress monitoring, at regular testing intervals specific to each student's progress monitoring schedule. After test administrators read the assessment directions, students complete NCF-T measures independently on the computer. The progress monitoring report displays scores from each assessment across time, overlaid on a plot showing the goal trend line from the baseline score to the student’s progress monitoring goal score. Once at least 3 progress monitoring data points are gathered, aimswebPlus calculates and displays the student’s ROI trend line, which updates after each subsequent assessment. To support assessment with diverse populations, instructions are designed to be brief, using simple, grade-appropriate language. Students may wear headphones to hear on-screen text instructions read aloud and may also enable TestNav accommodative tools (e.g., contrast settings, magnifier/zoom). A version of NCF-T with Spanish instructions is also available as part of a 2-measure Spanish Number Sense Fluency form. When allowed by a student’s IEP, accommodations such as adapting the physical environment are permitted.
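The ROI trend line mentioned above is, in essence, a least-squares trend through the progress monitoring scores. A minimal R sketch under that assumption (the data values and variable names are illustrative; this is not the vendor's implementation):

    # Illustrative ROI trend line from progress monitoring scores.
    pm <- data.frame(
      week  = c(0, 2, 4, 6, 8, 10),      # weeks since baseline
      score = c(12, 13, 13, 15, 16, 18)  # NCF-T total scores (example values)
    )
    fit <- lm(score ~ week, data = pm)   # least-squares trend through the points
    coef(fit)["week"]                    # estimated rate of improvement per week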

Rates of Improvement and End of Year Benchmarks

Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in your manual or published materials?
Yes
If yes, specify the growth standards:
aimswebPlus uses national norming data on student growth to provide student growth percentiles (SGPs). SGPs are calculated to communicate the range of typical growth trajectories for students within each grade level, conditional on the student's initial performance level, and are specified for growth intervals through the school year (i.e., Fall to Winter, Winter to Spring, or Fall to Spring). An SGP indicates the percentage of students in the national sample whose seasonal (or annual) rate of improvement (ROI) fell at or below a specified ROI. Separate SGP distributions are computed for each of five initial performance levels: Well Below Average (<11th percentile), Below Average (11th-25th percentile), Average (26th-74th percentile), Above Average (75th-89th percentile), and Well Above Average (>89th percentile). Guidance is provided in our manuals about how to interpret SGPs and decide whether students are showing acceptable growth. This includes evaluating ROIs observed between benchmark screening test administrations, and ROI slopes estimated from multiple progress monitoring scores over time. aimswebPlus progress monitoring features also provide real-time feedback for administrators when setting progress monitoring goals. Goals are set in the system by selecting the measure and baseline score, the goal date, the monitoring frequency (default is weekly), and the goal score. When the user defines the goal score, the system automatically labels the ambitiousness of the goal: the rate of improvement (ROI) needed to achieve the goal is computed and compared against SGP norms. An SGP < 50 is labeled “Insufficient” for closing the gap; an SGP between 50 and 85 is labeled “Closes the Gap”; an SGP between 85 and 97 is considered “Ambitious”; and an SGP > 97 is considered “Overly Ambitious”. aimswebPlus recommends setting performance goals at the top of the Closes the Gap range. Our manuals provide extensive guidance, including case-study examples, to help the administrator decide what growth rate is appropriate for each individual student.
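The SGP ranges above imply a simple mapping from the SGP of the required ROI to the ambitiousness label. A minimal R sketch of that mapping (our function, not the aimswebPlus code; behavior at the exact boundary values is an assumption, since the published ranges do not specify it):

    # Label goal ambitiousness from the SGP of the ROI needed to reach it.
    label_goal <- function(sgp) {
      if (sgp < 50)  return("Insufficient")    # unlikely to close the gap
      if (sgp <= 85) return("Closes the Gap")  # recommended range (top end)
      if (sgp <= 97) return("Ambitious")
      "Overly Ambitious"
    }
    label_goal(84)  # "Closes the Gap"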
Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?
Yes
If yes, specify the end-of-year performance standards:
aimswebPlus allows users to select from a range of end-of-year targets, with recommendations on how to decide on the most appropriate goal accounting for their school's unique student body and instructional needs, including state-defined criteria for proficiency. aimswebPlus defines a meaningful target as one that is objective, quantifiable, and can be linked to a criterion that has inherent meaning for teachers. To establish a meaningful performance target using aimswebPlus tiers, the account manager (e.g., a school/district administrator) is advised to choose a target that is linked to a criterion, is challenging yet achievable, and reflects historical performance results (when available). Users are also advised to consider the resources available to achieve the goal. The targets are primarily based on spring reading or math composite score national percentiles but can also be applied to individual measures in the aimswebPlus assessment system. Twelve national percentile targets, ranging from the 15th through the 70th percentile in increments of 5, are provided. This range was chosen because it covers the breadth of passing rates on state assessments and the historical range of targets typically used. The system provides a default spring performance target of the 30th national percentile. Targets can be set separately for Reading and Math. Guides and other resources provide more detail to help users define a high-quality performance target and present a step-by-step method to align spring performance targets to performance levels on state accountability tests. Once a target is selected, the aimswebPlus system automatically identifies the fall (or winter) cut scores that divide the score distribution into three instructional tiers: students above the highest cut score are in Tier 1 and have a high probability (80%-95%) of meeting the performance target; students between the upper and lower cut scores are in Tier 2 and have a moderate probability (40%-70%); and students below the lower cut score are in Tier 3 and have a low probability (10%-40%).
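As an illustration of the tier logic described above, the following R sketch assigns a tier from a fall screening score given the two cut scores the system derives for the selected spring target (the cut scores shown are invented placeholders, and the handling of scores exactly at a cut is an assumption):

    # Assign an instructional tier from a screening score and two cut scores.
    assign_tier <- function(score, lower_cut, upper_cut) {
      if (score > upper_cut)  return("Tier 1")  # high probability (80%-95%)
      if (score >= lower_cut) return("Tier 2")  # moderate probability (40%-70%)
      "Tier 3"                                  # low probability (10%-40%)
    }
    assign_tier(score = 22, lower_cut = 15, upper_cut = 25)  # "Tier 2"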
What is the basis for specifying minimum acceptable growth and end of year benchmarks?
selected
not selected
not selected Other
If other, please specify:

If norm-referenced, describe the normative profile.

National representation (check all that apply):
Northeast:
selected New England
selected Middle Atlantic
Midwest:
selected East North Central
selected West North Central
South:
selected South Atlantic
selected East South Central
selected West South Central
West:
selected Mountain
selected Pacific

Local representation (please describe, including number of states)
Date
2013-2014
Size
18,000
Gender (Percent)
Male
50
Female
50
Unknown
0
SES indicators (Percent)
Eligible for free or reduced-price lunch
Based on schoolwide eligibility for free or reduced lunch, students were sorted into Low (1-32% eligible), Moderate (33-66% eligible), and High (67-100% eligible) SES categories. Students were distributed fairly evenly among the three SES levels.
Other SES Indicators
Race/Ethnicity (Percent)
White, Non-Hispanic
50%
Black, Non-Hispanic
Hispanic
American Indian/Alaska Native
Asian/Pacific Islander
Other
Unknown
Disability classification (Please describe)
Participating schools were required to assess all students in the selected grades except those with moderate to severe intellectual disabilities or moderate to severe motor impairment and those who are blind or deaf.

First language (Please describe)
English

Language proficiency status (Please describe)
Participating schools were required to assess all students in the selected grades except those with an English Language Proficiency score of less than 3.
Do you provide, in your user’s manual, norms which are disaggregated by race or ethnicity? If so, for which race/ethnicity?
not selected White, Non-Hispanic
not selected Black, Non-Hispanic
not selected Hispanic
not selected American Indian/Alaska Native
not selected Asian/Pacific Islander
not selected Other
not selected Unknown

If criterion-referenced, describe procedure for specifying criterion for adequate growth and benchmarks for end-of-year performance levels.

Describe any other procedures for specifying adequate growth and minimum acceptable end of year performance.
To get the most value from progress monitoring, aimswebPlus recommends the following: (1) establish a time frame, (2) determine the level of performance expected, and (3) determine the criterion for success. Typical time frames include the duration of the intervention or the end of the school year. An annual time frame is typically used when IEP goals are written for students who are receiving special education services. aimswebPlus provides several ways to define a level of expected performance. The goal can be based on: well-established performance benchmarks that can be linked to aimswebPlus measures via national percentiles (e.g., the link to state test performance levels) or total score; a national performance norm benchmark (e.g., the 50th national percentile is often used to indicate on-grade-level performance); a local performance norm benchmark; or an expected or normative rate of improvement (ROI). When users set progress monitoring goals, aimswebPlus references student growth percentile norms to provide feedback about the value and ambitiousness of the goal. Within the aimswebPlus software, the user enters the goal date and moves a digital slider to the desired ROI. As the slider moves, it provides feedback about the strength of the ROI: Insufficient, Closes the Gap, Ambitious, or Overly Ambitious. Users are encouraged to use the Ambitious range (85th-97th SGP) for students who need intensive intervention.

Performance Level

Reliability

Grade 2: Convincing evidence
Grade 3: Convincing evidence
Grade 4: Convincing evidence
Grade 5: Convincing evidence
Grade 6: Convincing evidence
Grade 7: Convincing evidence
Grade 8: Partially convincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Offer a justification for each type of reliability reported, given the type and purpose of the tool.
The purpose of Number Comparison Fluency - Triads as a progress monitoring tool is to consistently measure math performance across multiple time points. For each grade level, NCF-T uses multiple equivalent alternate forms. Here we report two types of reliability evidence: alternate-form reliability and internal consistency (Cronbach's alpha). Justification for Study 1: Alternate-form reliability is important for progress monitoring NCF-T measures because it shows that pairs of independently administered test forms with different content can produce similar results. Justification for Study 2: Internal consistency of test form scores is important to demonstrate how a set of test forms, when considered together, consistently measures the same underlying construct. Together, these methods provide robust evidence of the reliability of our test measures, both as a cohesive set and as individual forms.
*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
To conduct each reliability analysis independently, two student samples were drawn from all students completing NCF-T progress monitoring measures during the 2022-2023 school year. The two samples were defined based on their demographic characteristics (i.e., gender and ethnicity), with the purpose of creating two representative samples. Sample 1 was used to calculate internal consistency (Female: 50.1%, Male: 45.0%, Non-binary/Not Reported: 4.7%; Meal Status Free: 10.7%, Reduced: 0.7%; Ethnicity Asian: 1.0%, Black: 11.5%, Hispanic: 15.7%, White: 41.0%, Multiple/Other/Unknown: 23.9%). At each grade level, internal consistency was calculated for students taking 80% or more of the progress monitoring forms. To include students who did not complete all progress monitoring forms, we employed the multiple imputation by chained equations (MICE) algorithm using the mice package in R, with 5 multiple imputations using predictive mean matching. The percentage of imputed data was less than 12.5% for each grade level. Sample 2 was used to compute alternate-form reliability (Female: 49.3%, Male: 45.0%, Non-binary/Not Reported: 5.7%; Meal Status Free: 10.7%, Reduced: 0.7%; Ethnicity Asian: 1.4%, Black: 13.9%, Hispanic: 15.0%, White: 42.0%, Multiple/Other/Unknown: 25.3%). All students taking at least two progress monitoring forms were included in the analysis to maximize the variability in performance levels represented.
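The imputation step described above corresponds to a standard call to the mice package in R. A minimal sketch under the stated settings, 5 imputations with predictive mean matching (the data here are synthetic and the data frame name pm_wide is illustrative):

    library(mice)
    # Illustrative wide-format data: one row per student, one column per
    # progress monitoring form, NA where a form was not completed.
    set.seed(1)
    pm_wide <- as.data.frame(matrix(sample(c(5:30, NA), 120, replace = TRUE),
                                    nrow = 20))
    imp <- mice(pm_wide, m = 5, method = "pmm", seed = 1)  # 5 imputations, PMM
    pm_complete <- complete(imp, 1)                        # first completed set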
*Describe the analysis procedures for each reported type of reliability.
Alternate-form reliability and internal consistency analyses were completed on separate samples of aimswebPlus progress monitoring data. Alternate-form reliability was calculated as the Pearson correlation coefficient between test scores across all pairs of progress monitoring forms. Internal consistency of NCF-T scores across forms (Cronbach's alpha) was calculated by treating NCF-T scores from each administration of the progress monitoring forms as separate indicators of the student's ability.
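A minimal R sketch of the two computations, assuming a wide data frame of per-form scores (the data are synthetic and the name scores is illustrative; psych::alpha is one standard implementation of Cronbach's alpha):

    library(psych)
    set.seed(2)
    # Illustrative scores: one row per student, one column per form, with
    # form scores driven by a common underlying ability plus noise.
    ability <- rnorm(50, mean = 20, sd = 4)
    scores  <- as.data.frame(replicate(5, round(ability + rnorm(50, 0, 2))))
    # Alternate-form reliability: Pearson correlations across all pairs of forms.
    form_cors <- cor(scores, use = "pairwise.complete.obs", method = "pearson")
    # Internal consistency: Cronbach's alpha, treating each form's score as an
    # indicator of the same underlying ability.
    alpha_est <- psych::alpha(scores)$total$raw_alpha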

*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.

Type of reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
Provide citations for additional published studies.

Validity

Grade 2: Partially convincing evidence
Grade 3: Convincing evidence
Grade 4: Convincing evidence
Grade 5: Convincing evidence
Grade 6: Convincing evidence
Grade 7: Convincing evidence
Grade 8: Unconvincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
Two external criterion measures were used in our validity analyses: the math state summative assessment from the Tennessee Comprehensive Assessment Program (TCAP-Math) and the Northwest Evaluation Association Measures of Academic Progress for Math Growth (MAP Growth Math). TCAP-Math, used as the criterion measure for concurrent and predictive analyses, is a state summative assessment measuring math abilities aligned to Tennessee's academic standards; each grade level's assessment is composed of four subdomains matching key domains of the standards. Concurrent validity analyses focused on construct validity by assessing the strength of the association between NCF-T scores and the TCAP-Math subdomains intended to assess the most similar math abilities. Since NCF-T is intended to assess a more specific math ability than the full extent of the TCAP-Math blueprint, concurrent validity analyses used the sum of raw scores from the TCAP-Math subdomains that most directly align to the number comparison abilities assessed on each grade level's NCF-T blueprint (Grades 3-5: Fractions, Number Relationships and Patterns; Grade 6: Number Relationships, Ratios and Rates; Grade 7: Number Relationships, Proportional Reasoning; Grade 8: Number Relationships). Predictive validity analyses assessed how well the magnitude processing and number knowledge abilities measured by NCF-T predict on-grade-level math proficiency; these analyses therefore used TCAP-Math scale scores as the criterion measure for Grades 2-8. MAP Growth Math RIT scores, which assess standards-aligned math skills, were used as the external criterion measure for the Grade 2 concurrent validity analyses. TCAP-Math and MAP Growth Math are appropriate criterion measures for these analyses because NCF-T is intended to measure grade-appropriate number knowledge and magnitude processing abilities related to the standards-aligned skills assessed by these external criterion assessments.
*Describe the sample(s), including size and characteristics, for each validity analysis conducted.
Two samples of students, from Tennessee and Illinois, were chosen for our validity analyses. The Tennessee sample comes from a large school district with students represented across 51 elementary schools (Grades 2-5) and 21 middle schools (Grades 6-8) in urban, suburban, and rural regions. Students of all ability levels in this district completed NCF-T as part of their universal screening assessments at the beginning, middle, and end of the school year, and a portion of these students used NCF-T for progress monitoring. The Illinois sample comes from a school district with students represented across 10 elementary schools of varying sizes and locations around a medium-sized city. Demographic data indicate the Illinois sample was drawn from a diverse district composed of multiple ethnic and socioeconomic backgrounds. For predictive validity analyses, NCF-T data for Grades 3-8 of the Tennessee sample were gathered during fall of the 2022-2023 school year; NCF-T data for Grade 2 were gathered during the spring of 2022. For concurrent validity analyses, NCF-T data for Grades 3-8 came from the Tennessee sample and were gathered during the spring of 2023; NCF-T data for Grade 2 came from the Illinois sample and were gathered during the winter of 2022-2023. All students with valid NCF-T and external criterion scores meeting the criteria for our analyses were included.
*Describe the analysis procedures for each reported type of validity.
Two types of validity analyses were conducted with NCF-T scores and the external criterion measures: predictive validity and concurrent validity. Predictive validity analyses examined the strength of the Pearson correlation coefficient between NCF-T scores and TCAP-Math scale scores for the same students gathered several months later. For Grades 3-8, correlation coefficients were calculated between scores from NCF-T tests given in the fall of 2022 (August-November) and TCAP-Math given in the spring (March) of 2023. For Grade 2, correlation coefficients were calculated between scores from NCF-T tests given in the spring of 2022 and TCAP scores of the same students assessed at the end of Grade 3 in the spring of 2023. Concurrent validity analyses were conducted with the Illinois sample for Grade 2 and the Tennessee sample for Grades 3-8 by examining the strength of the Pearson correlation coefficient between NCF-T scores and the external criterion measures observed within 2 months of each other. For Grade 2, correlation coefficients were calculated between MAP Growth Math scores from the winter of 2023 (end of January to beginning of February) and the same students' NCF-T scores from the closest administration date (January-February 2023). For Grades 3-8, correlation coefficients were calculated between scores from TCAP-Math tests given in the spring of 2023 (March) and the same students' NCF-T tests with the closest administration date in the spring of 2023 (March-May). The sums of raw scores from the TCAP-Math subdomains at each grade level that most directly align to number comparison abilities were used as the external criterion score. For both predictive and concurrent validity analyses, 95% confidence intervals for the correlation coefficients were calculated using the Fisher z-transformation.
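The Fisher z-based confidence interval mentioned above follows a standard formula. A minimal R sketch (not tied to the specific samples described; r and n are supplied by the caller):

    # 95% CI for a Pearson correlation r from a sample of size n,
    # via the Fisher z-transformation.
    fisher_ci <- function(r, n, conf = 0.95) {
      z    <- atanh(r)                  # Fisher z-transform of r
      se   <- 1 / sqrt(n - 3)           # standard error of z
      crit <- qnorm(1 - (1 - conf) / 2)
      tanh(z + c(-1, 1) * crit * se)    # back-transform the interval endpoints
    }
    fisher_ci(r = 0.62, n = 400)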

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Results reported provide strong support for the validity of NCF-T as an assessment of magnitude processing and number knowledge abilities. Overall, predictive and concurrent validity results indicate that NCF-T scores show a moderate-to-strong positive relationship with the general math abilities assessed by the TCAP-Math summative assessment. This positive relationship was observed across all grade levels, especially among upper-elementary-aged students. It was also observed in predictive validity results comparing Grade 2 NCF-T scores with TCAP scores observed a full year later, supporting the validity of NCF-T as measuring fundamental number processing abilities associated with the development of the broader math abilities reflected in summative state test outcomes such as TCAP-Math. Results of the concurrent validity analyses, especially for Grades 3-8, further indicate that NCF-T measures variability in number knowledge and magnitude processing abilities similar to the relevant subdomains of the TCAP-Math assessment. The concurrent validity coefficients observed in the Illinois Grade 2 sample were lower than those at all other grade levels, which may reflect limitations of the smaller sample with a more limited range of observed abilities. Additionally, the moderate correlation coefficients observed among Grade 8 students are lower than those of preceding grade levels, which may indicate that as mathematics content becomes increasingly complex and student knowledge becomes more advanced, fundamental number comparison fluency may show less variability or may be less directly related to the skills assessed in state summative tests.
Manual cites other published validity studies:
No
Provide citations for additional published studies.
Describe the degree to which the provided data support the validity of the tool.
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
No
Provide citations for additional published studies.

Bias Analysis

Grade 2: Yes
Grade 3: Yes
Grade 4: Yes
Grade 5: Yes
Grade 6: Yes
Grade 7: Yes
Grade 8: Yes
Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
Yes
If yes,
a. Describe the method used to determine the presence or absence of bias:
To assess how fairly NCF-T items function for different student subgroups, we ran DIF analyses for NCF-T tests using the logistic regression method. Items were evaluated by delta R², the difference in Nagelkerke's R² coefficients between nested models. Effect sizes were classified as "negligible", "moderate", or "large" based on Zumbo and Thomas (1997). In this analysis, items with "moderate" or "large" effect sizes were flagged as DIF items.
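A minimal R sketch of this logistic-regression DIF procedure for a single item (our helper functions, not the authors' code; the 0.13/0.26 cutoffs are the commonly cited Zumbo-Thomas thresholds and are an assumption here):

    # Nagelkerke's R^2 for a fitted glm, relative to an intercept-only model.
    nagelkerke_r2 <- function(model, null_model) {
      n     <- nobs(model)
      ll0   <- as.numeric(logLik(null_model))
      ll1   <- as.numeric(logLik(model))
      r2_cs <- 1 - exp((2 / n) * (ll0 - ll1))  # Cox-Snell R^2
      r2_cs / (1 - exp((2 / n) * ll0))         # rescaled to a 0-1 range
    }

    # Delta R^2 between the matching model (total score only) and the full
    # model (total score, group, and their interaction) for one 0/1 item.
    dif_one_item <- function(item, total, group) {
      null <- glm(item ~ 1,             family = binomial)
      m1   <- glm(item ~ total,         family = binomial)
      m3   <- glm(item ~ total * group, family = binomial)
      delta_r2 <- nagelkerke_r2(m3, null) - nagelkerke_r2(m1, null)
      cut(delta_r2, breaks = c(-Inf, 0.13, 0.26, Inf),
          labels = c("negligible", "moderate", "large"))
    }

    # Illustrative call with synthetic data:
    set.seed(3)
    total <- rnorm(500); group <- rbinom(500, 1, 0.5)
    item  <- rbinom(500, 1, plogis(total))  # item depends on ability only
    dif_one_item(item, total, group)        # expect "negligible"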
b. Describe the subgroups for which bias analyses were conducted:
DIF analyses were conducted on two demographic categories: gender and race/ethnicity. NCF-T data collected from benchmark screening and progress monitoring assessments were analyzed for each grade level. For gender, we analyzed the presence of DIF between students identified as male (47.5%) or female (52.5%). Gender DIF analyses across all grade levels included a total of 133,926 students from 1,328 school districts across the US. For racial/ethnic groups, we analyzed the presence of DIF between students identified as Asian (2.0%), Black (17.8%), Hispanic (19.4%), White (51.6%), or other groups, including multiple races, unknown, and rarely observed racial/ethnic groups (9.2%). DIF analyses based on race/ethnicity across all grade levels included a total of 113,089 students from 1,118 school districts across the US.
c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
Results of our DIF analyses indicate that NCF-T provides a fair assessment of math abilities across demographic groups. Summarized as the average percentage of items showing moderate to large DIF effect sizes, the results indicated that across all grades and alternate forms, more than 99% of items did not show DIF between gender groups or between racial/ethnic groups. The average proportion of items showing moderate to large effect sizes ranged from 0.000 to 0.007.

Growth Standards

Sensitivity: Reliability of Slope

Grade 2: Convincing evidence
Grade 3: Convincing evidence
Grade 4: Convincing evidence
Grade 5: Convincing evidence
Grade 6: Convincing evidence
Grade 7: Convincing evidence
Grade 8: Unconvincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
Describe the sample, including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
The sample data for the reliability of slope analyses were drawn from all progress monitoring data gathered during the 2022-2023 school year. Progress monitoring testing sequences were evaluated against the following inclusion criteria. First, the baseline assessment score for each student was compared against the NCF-T national norms for the season the test was administered, to identify students scoring below the 30th percentile. Students with progress monitoring schedules in aimswebPlus consist almost exclusively of students in need of intensive intervention; moreover, this criterion further restricted our analyses to students with intensive needs. Second, only students with at least 10 valid progress monitoring scores gathered at regular testing intervals across at least 20 weeks were retained for analysis. The final sample for all grade levels included 19,785 students representing 884 districts across the US with a diverse demographic profile (Female: 52%, Male: 39.0%, Non-Binary/Not Reported: 7.9%; Asian: 1.2%, Black: 14.0%, Hispanic: 15.7%, White: 39.5%, Other Categories or Not Reported: 29.7%).
Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
Data for this analysis comprised students with at least 10 data points collected over at least 20 weeks. Furthermore, test scores needed to be collected at regular intervals, which varied across students. Typically, tests were administered weekly or twice per month (every other week); disruptions in schedules were allowed only up to 40 days between test dates.
Describe the analysis procedures.
To analyze the reliability of the slope for NCF-T, we fit a mixed effects model estimating the average rate of improvement (ROI) as a linear function of time (fixed effect), while accounting for individual differences in baseline scores (random intercepts) and individual growth rates (random slopes). The model was fit using the lme4 package in R. From the model we extracted the true slope variance and the total slope variance to calculate the reliability of the slope (reliability of slope = true slope variance / total slope variance).
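A minimal lme4 sketch of this approach (the data are synthetic and variable names are illustrative; the source does not give the exact variance decomposition, so the total slope variance is approximated here by the variance of per-student OLS slopes):

    library(lme4)
    set.seed(4)
    # Illustrative long-format data: 100 students x 12 weekly scores.
    pm <- expand.grid(student = factor(1:100), week = 0:11)
    slope_i <- rnorm(100, 0.5, 0.3)  # per-student true growth rates
    pm$score <- 15 + slope_i[pm$student] * pm$week + rnorm(nrow(pm), 0, 2)

    # Random-intercept, random-slope growth model.
    fit <- lmer(score ~ week + (week | student), data = pm)

    # True (between-student) slope variance from the random-effects estimates.
    vc <- as.data.frame(VarCorr(fit))
    true_slope_var <- vc$vcov[vc$grp == "student" & vc$var1 == "week" &
                              is.na(vc$var2)]

    # Total slope variance approximated by variance of per-student OLS slopes.
    ols_slopes <- sapply(split(pm, pm$student),
                         function(d) coef(lm(score ~ week, data = d))["week"])
    true_slope_var / var(ols_slopes)  # reliability of slope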

In the table below, report reliability of the slope (e.g., ratio of true slope variance to total slope variance) by grade level (if relevant).

Type of reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.
Do you have reliability of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
No

If yes, fill in data for each subgroup with disaggregated reliability of the slope data.

Type of reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
No
Provide citations for additional published studies.

Sensitivity: Validity of Slope

Grade 2: Data unavailable
Grade 3: Data unavailable
Grade 4: Data unavailable
Grade 5: Data unavailable
Grade 6: Data unavailable
Grade 7: Data unavailable
Grade 8: Data unavailable
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
Describe the sample(s), including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
Describe the analysis procedures for each reported type of validity.

In the table below, report predictive validity of the slope (correlation between the slope and achievement outcome) by grade level (if relevant).
NOTE: The TRC suggests controlling for initial level when the correlation for slope without such control is not adequate.

Type of validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
Provide citations for additional published studies.
Describe the degree to which the provided data support the validity of the tool.
Do you have validity of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?

If yes, fill in data for each subgroup with disaggregated validity of the slope data.

Type of validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
Provide citations for additional published studies.

Alternate Forms

Grade 2: Unconvincing evidence
Grade 3: Unconvincing evidence
Grade 4: Unconvincing evidence
Grade 5: Unconvincing evidence
Grade 6: Unconvincing evidence
Grade 7: Unconvincing evidence
Grade 8: Unconvincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
Describe the sample for these analyses, including size and characteristics:
What is the number of alternate forms of equal and controlled difficulty?
NCF-T forms include 3 benchmark screening forms and 20 progress monitoring forms per grade, for 23 alternate forms in total. All forms are designed to be equivalent in content and item difficulty. All forms for each grade level are composed according to a common blueprint, with equal proportions of items assessing different categories of number comparison subskills based on the number of digits in the numbers and the numeric notation. Equivalency studies were conducted during form development to control the difficulty of the forms.
If IRT based, provide evidence of item or ability invariance
If computer administered, how many items are in the item bank for each grade level?
The item banks used to compose the alternate forms of NCF-T range from 586 to 589 items per grade.
If your tool is computer administered, please note how the test forms are derived instead of providing alternate forms:

Decision Rules: Setting & Revising Goals

Grade 2: Data unavailable
Grade 3: Data unavailable
Grade 4: Data unavailable
Grade 5: Data unavailable
Grade 6: Data unavailable
Grade 7: Data unavailable
Grade 8: Data unavailable
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
In your manual or published materials, do you specify validated decision rules for how to set and revise goals?
If yes, specify the decision rules:
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.

Decision Rules: Changing Instruction

Grade 2: Data unavailable
Grade 3: Data unavailable
Grade 4: Data unavailable
Grade 5: Data unavailable
Grade 6: Data unavailable
Grade 7: Data unavailable
Grade 8: Data unavailable
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
In your manual or published materials, do you specify validated decision rules for when changes to instruction need to be made?
If yes, specify the decision rules:
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.