aimswebPlus Math
Number Sense Fluency
Summary
aimswebPlus is a brief and valid assessment system for screening and monitoring reading and math skills for all students in Kindergarten through Grade 8. Normative data were collected in 2013-14 on a combination of fluency measures that are sensitive to growth and new standards-based assessments of classroom skills. The resulting scores and reports inform instruction and help improve student performance. Number Sense Fluency is administered in a group setting. Once testing is complete, summary and detailed reports for students, classrooms, and districts can be generated immediately.
- Where to Obtain:
- Pearson
- info@aimsweb.com
- San Antonio Office 19500 Bulverde Road, #201 San Antonio, TX, 78259
- 1-866-313-6194
- www.aimswebplus.com
- Initial Cost:
- $8.50 per student
- Replacement Cost:
- $8.50 per student per year
- Included in Cost:
- aimswebPlus is an online solution that includes digital editions of training manuals and testing materials within the application. Cost per student for 1 year: $8.50/student/year for access to all measures (reading and math). Cost per student for subsequent years: $8.50.
- aimswebPlus is a subscription-based tool. There are three subscription types available for customers:
● aimswebPlus Complete is $8.50 per student and includes all measures.
● aimswebPlus Reading is $6.50 per student and includes Early Literacy and Reading measures.
● aimswebPlus Math is $6.50 per student and includes Early Numeracy and Math measures.
Note. Current aimsweb customers upgrading to aimswebPlus receive a $2/student discount off of the subscription. Test accommodations that are documented in a student’s Individual Education Plan (IEP) are permitted with aimswebPlus. However, not all measures allow for accommodations. Number Sense Fluency—the combined measures of Mental Computation Fluency (MCF) and Number Comparison Fluency–Triads (NCF–T)—is a computer-administered, timed test that employs strict time limits, in part, to generate rate-based scores. As such, valid interpretation of national norms, which are an essential aspect of decision-making during benchmark testing, depends on strict adherence to the standard administration procedures. The following accommodation is allowed for Number Sense Fluency during screening and progress monitoring:
● modifying the environment (e.g., special lighting, adaptive furniture)
- Training Requirements:
- Less than one hour of training.
- Qualified Administrators:
- Paraprofessional or professional.
- Access to Technical Support:
- Pearson provides phone and email-based support, as well as a user group forum that facilitates the asking and answering of questions.
- Assessment Format:
-
- Small group
- Large group
- Computer-administered
- Scoring Time:
-
- Scoring is automatic (0 minutes per student/group)
- Scores Generated:
-
- Raw score
- Percentile score
- Composite scores
- Administration Time:
-
- 7 minutes per student/group
- Scoring Method:
-
- Automatically (computer-scored)
- Technology Requirements:
-
- Computer or tablet
- Internet connection
Tool Information
Descriptive Information
- Please provide a description of your tool:
- aimswebPlus is a brief and valid assessment system for screening and monitoring reading and math skills for all students in Kindergarten through Grade 8. Normative data were collected in 2013-14 on a combination of fluency measures that are sensitive to growth and new standards-based assessments of classroom skills. The resulting scores and reports inform instruction and help improve student performance. Number Sense Fluency is administered in a group setting. Once testing is complete, summary and detailed reports for students, classrooms, and districts can be generated immediately.
- Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?
-
- ACADEMIC ONLY: What dimensions does the tool assess?
- BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
- BEHAVIOR ONLY: Which category of behaviors does your tool target?
Acquisition and Cost Information
Administration
Training & Scoring
Training
- Is training for the administrator required?
- Yes
- Describe the time required for administrator training, if applicable:
- Less than one hour of training.
- Please describe the minimum qualifications an administrator must possess.
- Paraprofessional or professional.
- No minimum qualifications
- Are training manuals and materials available?
- Yes
- Are training manuals/materials field-tested?
- Yes
- Are training manuals/materials included in cost of tools?
- Yes
- If No, please describe training costs:
- Can users obtain ongoing professional and technical support?
- Yes
- If Yes, please describe how users can obtain support:
- Pearson provides phone and email-based support, as well as a user group forum that facilitates the asking and answering of questions.
Scoring
- Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
- Number Sense Fluency (NSF) comprises two sections, which are always administered together.
● Mental Computation Fluency (MCF): a timed measure that assesses fluency in solving 1- and 2-step computation problems.
● Number Comparison Fluency–Triads (NCF–T): a timed measure that assesses fluency in number comparison.
Both NCF–T and MCF employ a correction for guessing when calculating the total score. The corrected total score is NC – NW/2, where NC is the number of items answered correctly and NW is the number of items answered incorrectly. Scores are then rounded to the nearest whole number. Corrected total scores can range from 0 to 40 (NCF–T) or 0 to 42 (MCF). Items not attempted and items not reached are ignored in the calculation of the corrected total score. Together, these measures combine into a Number Sense Fluency score, which is the simple sum of the NCF–T and MCF corrected scores. This NSF score is the basis for progress monitoring decisions.
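The guessing correction described above is simple enough to express in code. The sketch below (Python; the function name is ours, not Pearson's) applies NC – NW/2, rounds, and clamps to the valid range. The manual does not specify how exact .5 values are rounded, so Python's default rounding is an assumption here:

```python
def corrected_total(num_correct, num_wrong, max_score):
    """Corrected total score: NC - NW/2, rounded to the nearest whole
    number and clamped to 0..max_score. Items not attempted or not
    reached are simply excluded from both counts. Tie behavior for .5
    values is an assumption (Python's round); the manual only says
    'rounded to the nearest whole number'."""
    raw = num_correct - num_wrong / 2
    return max(0, min(max_score, round(raw)))

# NCF-T corrected scores range 0-40; MCF scores range 0-42.
ncf_t = corrected_total(num_correct=30, num_wrong=6, max_score=40)  # 27
mcf = corrected_total(num_correct=25, num_wrong=4, max_score=42)    # 23
nsf = ncf_t + mcf  # NSF is the simple sum of the two corrected scores: 50
```

Note that the clamp at zero matters for low performers: a student with many wrong answers would otherwise receive a negative corrected score.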
- Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?
- Yes
- ACADEMIC ONLY: Do you provide benchmarks for the slopes?
- Yes
- ACADEMIC ONLY: Do you provide percentile ranks for the slopes?
- Yes
- Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
- Number Sense Fluency (NSF) measures a student’s automaticity with comparing numbers within and across number systems and mentally solving one- and two-step computation problems. As noted above, NSF is divided into two sections: Number Comparison Fluency–Triads (NCF–T) and Mental Computation Fluency (MCF). NCF–T has a 3-minute time limit and MCF has a 4-minute time limit. Students answer as many items as they can within the time limit for each section. The two measures are described below.
Number Comparison Fluency–Triads (NCF–T)
● Overview: Measures a student’s ability to assess magnitude and compare numbers within and across number systems. Content at each grade level reflects the expectations outlined in the Common Core State Standards for mathematics.
● Test Format: group, online, timed; all items have three response options.
● Test Content: The student answers multiple-choice math items, each requiring the comparison of a set of three numbers. Each item is presented as a triad of numbers, with the student determining whether the top number in the triad is closer in value to the bottom left number, the bottom right number, or exactly between the two. Each NCF–T form contains 40 items, presented four per screen. (See test blueprint information and sample items below.)
● Forms: 23 unique forms per grade (3 benchmark and 20 progress monitoring); PM testing is conducted at teacher-determined intervals.
● Score: 1 point for each correctly answered item; total scores are then adjusted for guessing.
● Time limit: 3 minutes
Number Comparison Fluency–Triads (NCF–T) Blueprint
An NCF–T test blueprint was developed to reflect the expectations described in the Common Core State Standards at each grade level. This blueprint was then used to develop all 23 NCF–T forms per grade (3 screening forms and 20 progress monitoring forms). The following table outlines NCF–T content by grade, topic area, and item count.
Mental Computation Fluency (MCF)
● Overview: Measures a student’s ability to mentally solve computation problems. Content at each grade level reflects the expectations outlined in the Common Core State Standards for mathematics.
● Test Format: group, online, timed; all items have three response options.
● Test Content: The student answers multiple-choice math items, each requiring the mental computation of a math expression. Each MCF form contains 42 items, presented two per screen. (See test blueprint information and sample items on the following pages.)
● Forms: 23 unique forms per grade (3 benchmark and 20 progress monitoring); PM testing is conducted at teacher-determined intervals.
● Score: 1 point for each correctly answered item; total scores are then adjusted for guessing.
● Time limit: 4 minutes
Mental Computation Fluency (MCF) Blueprint
An MCF test blueprint was developed to reflect the expectations described in the Common Core State Standards at each grade level. This blueprint was then used to develop all 23 MCF forms per grade (3 screening forms and 20 progress monitoring forms). The following table outlines MCF content by grade, topic area, and item count.
All NSF student-facing test content contains numbers and math computation symbols. Instructional text was written using simple, grade-appropriate language that keeps the students’ receptive language load to a minimum. In addition, audio is available for any students who would prefer to have instructional text read to them.
Rates of Improvement and End of Year Benchmarks
- Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in your manual or published materials?
- Yes
- If yes, specify the growth standards:
- aimswebPlus provides student growth percentiles (SGP) by grade and initial performance level (Fall and Winter) for establishing growth standards. An SGP indicates the percentage of students in the national sample whose seasonal (or annual) rate of improvement (ROI) fell at or below a specified ROI. Separate SGP distributions are computed for each of five levels of initial (Fall or Winter) performance. Goals are set in the system by selecting the measure and baseline score, the goal date, the monitoring frequency (default is weekly), and the goal score. When the user defines the goal score, the system automatically labels the ambitiousness of the goal. The rate of improvement needed to achieve the goal is computed and translated into an SGP. An SGP < 50 is considered Insufficient; an SGP between 50 and 85 is considered Closes the Gap; an SGP between 85 and 97 is considered Ambitious; and an SGP > 97 is considered Overly Ambitious. aimswebPlus recommends setting performance goals that represent rates of growth between the 85th and 97th SGP. However, the user ultimately determines what growth rate is appropriate on an individual basis.
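The SGP bands above map directly to goal labels. A minimal sketch follows (Python; the function name is ours, and the handling of the boundary values 50, 85, and 97 is our reading of the inclusive ranges quoted above):

```python
def goal_ambitiousness(sgp):
    """Label the ambitiousness of a goal from the student growth
    percentile (SGP) that its required rate of improvement translates
    to, per the aimswebPlus bands described in the text."""
    if sgp < 50:
        return "Insufficient"
    elif sgp <= 85:
        return "Closes the Gap"    # 50-85
    elif sgp <= 97:
        return "Ambitious"         # 85-97, the recommended range
    else:
        return "Overly Ambitious"  # > 97
```

The system performs this labeling automatically once the user enters the goal score; the user can still override the recommendation on an individual basis.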
- Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?
- Yes
- If yes, specify the end-of-year performance standards:
- aimswebPlus allows users to select, from a range of end-of-year targets, the one that is most appropriate for their instructional needs. aimswebPlus defines a meaningful target as one that is objective, quantifiable, and can be linked to a criterion that has inherent meaning for teachers. To establish a meaningful performance target using aimswebPlus tiers, the account manager (e.g., a school/district administrator) is advised to choose a target that:
● is linked to a criterion,
● is challenging and achievable,
● closes the achievement gap, and
● reflects historical performance results (when available).
Customers are also advised to give consideration to the availability of resources to achieve the goal. The targets are based on spring reading or math composite score national percentiles. Twelve national percentile targets, ranging from the 15th through the 70th percentile in increments of 5, are provided. This range was chosen because it covers the breadth of passing rates on state assessments and the historical range of targets our customers typically use. The system provides a default spring performance target of the 30th national percentile. Targets can be set separately for Reading and Math. The aimswebPlus Tiers Guide provides more detail to help customers define a high-quality performance target. It also provides a step-by-step method to align spring performance targets to performance levels on state accountability tests. Once a target is selected, the aimswebPlus system automatically identifies the fall (or winter) cut score that divides the score distribution into three instructional Tiers.
Students above the highest cut score are in Tier 1 and have a high probability (80%–95%) of meeting the performance target; students between the upper and lower cut scores are in Tier 2 and have a moderate probability (40%–70%) of meeting the performance target; and students below the lower cut score are in Tier 3 and have a low probability (10%–40%) of meeting the performance target. The system recommends that a progress monitoring schedule be defined for any student below the 25th national percentile in a given season, or in Tiers 2 or 3.
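Once the system has derived the two cut scores for a chosen target, tier assignment reduces to a comparison. A sketch (Python; the function name is ours, and whether a score exactly equal to a cut score lands in Tier 2 is an assumption, since the text only says "above" and "below"):

```python
def assign_tier(score, lower_cut, upper_cut):
    """Instructional tier from a fall (or winter) benchmark score and
    the cut scores the aimswebPlus system derives for the chosen
    spring performance target."""
    if score > upper_cut:
        return 1  # high probability (80%-95%) of meeting the target
    elif score < lower_cut:
        return 3  # low probability (10%-40%)
    else:
        return 2  # moderate probability (40%-70%)
```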
- Date
- 2013-14
- Size
- 3000
- Male
- 50%
- Female
- 50%
- Unknown
- Eligible for free or reduced-price lunch
- Other SES Indicators
- Based on schoolwide eligibility for free or reduced lunch, students were sorted into Low (1-32% eligible), Moderate (33-66% eligible), and High (67-100% eligible) SES categories. Students were distributed fairly evenly among the three SES levels.
- White, Non-Hispanic
- 53-58%
- Black, Non-Hispanic
- Hispanic
- American Indian/Alaska Native
- Asian/Pacific Islander
- Other
- Unknown
- Disability classification (Please describe)
- The norm sample includes all students in the classroom with exceptions for moderate to severe intellectual disability; blind or deaf; or moderate to severe motor coordination disability.
- First language (Please describe)
- Language proficiency status (Please describe)
- ELL (Percent): 10
Performance Level
Reliability
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | | | | | | | |
- *Offer a justification for each type of reliability reported, given the type and purpose of the tool.
- Alternate-form reliability, where equivalent forms are administered close together in time, is highly appropriate for progress monitoring CBM measures because it shows the consistency of scores from independently timed administrations with different content. Internal consistency reliability is not appropriate for speeded CBM measures. The stability coefficient, where equivalent forms are administered with an interval of several months, reflects additional measurement error due to true change over time. As a result, these reliabilities are generally lower. The alternate-form stability coefficient is based on correlations between fall-winter and winter-spring benchmark scores.
- *Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
- The concurrent alternate-form reliability sample is based on 25 schools from across the U.S., representing each of the three SES levels (described above). Participating schools administered the alternate forms to all students in Grades 2 through 8 in the school, with few exceptions for moderate to severe intellectual disabilities. Each student completed 3 alternate NSF forms, with forms administered in sets and each set counterbalanced by reversing the order. Each set included one anchor form and two unique forms (e.g., 1, 2, 3 or 1, 3, 4). In total, 24 sets were administered. The number of students completing each pair ranged from 77–153 across Grades 2 through 8. The table reports the median reliability coefficient, and the sample size is the minimum sample for that grade. The stability coefficient is derived from the national norm sample described above.
- *Describe the analysis procedures for each reported type of reliability.
- Pearson correlation coefficients of the scores from alternate forms.
*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated reliability data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
Validity
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | | | | | | | |
- *Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- Six criterion measures were used to calculate criterion validity for aimswebPlus Math: • Iowa Tests of Basic Skills®–Total Math (ITBS®) • Illinois Standards Achievement Test (ISAT) • New Mexico Standards Based Assessment (NMSBA) • Northwest Evaluation Association Measures of Academic Progress® (NWEA–MAP®) • State of Texas Academic Assessment of Readiness (STAAR) • aimswebPlus Concepts & Applications (CA) The ITBS is a comprehensive, group-administered, paper-based assessment of reading and math achievement. ITBS’s Total Math score reflects performance on standards-based math concepts, problem solving, and computation. The ISAT is the end-of-year achievement test assessing Illinois learning standards covering five math strands: Number Sense, Measurement, Algebra, Geometry, and Data Analysis and Probability. The NMSBA is used to measure student proficiency on New Mexico’s reading and math learning standards. NWEA–MAP is a computer-adaptive test that assesses achievement in reading and mathematics. Results are reported on an RIT scale, which is then linked to each state’s performance standards. The STAAR assesses student performance on Texas’s mathematics and reading learning standards. Finally, aimswebPlus Concepts & Applications (CA) is a standards-based interim assessment of math skills that is administered at the beginning, middle, and end of the school year. This assessment consists of 29–31 math concepts and problem solving items aligned to Common Core State Standards (CCSS) in mathematics in each of Grades 2 through 8. It is an individually administered power test in which students are given the time they need to complete each item. Its content differs from and has no overlap with Number Sense Fluency.
- *Describe the sample(s), including size and characteristics, for each validity analysis conducted.
Criterion | Grade | N | % Female | % Male | % Black | % Hispanic | % Other | % White
---|---|---|---|---|---|---|---|---
ITBS | 2 | 178 | 61 | 39 | 24 | 35 | 1 | 31
ISAT | 3 | 69 | 49 | 51 | 1 | 25 | 13 | 61
ISAT | 4 | 175 | 51 | 49 | 4 | 28 | 9 | 58
ISAT | 5 | 189 | 53 | 47 | 2 | 21 | 9 | 68
ISAT | 6 | 273 | 59 | 41 | 22 | 6 | 8 | 64
ISAT | 7 | 130 | 45 | 55 | 13 | 2 | 3 | 82
ISAT | 8 | 122 | 37 | 63 | 5 | 1 | 3 | 91
STAAR | 3 | 146 | 55 | 45 | 10 | 39 | 14 | 37
STAAR | 4 | 207 | 51 | 49 | 8 | 46 | 14 | 32
STAAR | 5 | 91 | 47 | 53 | 2 | 52 | 6 | 41
STAAR | 6 | 61 | 55 | 45 | 5 | 44 | 3 | 48
STAAR | 7 | 61 | 40 | 60 | 5 | 43 | 4 | 49
STAAR | 8 | 75 | 61 | 39 | 15 | 53 | 0 | 32
MAP | 2 | 218 | 48 | 52 | 5 | 31 | 12 | 53
MAP | 3 | 129 | 46 | 54 | 1 | 40 | 14 | 44
MAP | 4 | 150 | 59 | 41 | 5 | 35 | 10 | 49
MAP | 5 | 125 | 47 | 53 | 3 | 43 | 11 | 43
MAP | 6 | 141 | 55 | 45 | 22 | 9 | 10 | 59
NMSBA | 6 | 206 | 51 | 49 | 3 | 63 | 1 | 32
NMSBA | 7 | 216 | 47 | 53 | 2 | 62 | 0 | 36
NMSBA | 8 | 219 | 45 | 55 | 6 | 70 | 1 | 24
CA | 2 | 1484 | 50 | 50 | 14 | 23 | 10 | 53
CA | 3 | 1497 | 50 | 50 | 14 | 23 | 10 | 53
CA | 4 | 1482 | 50 | 50 | 14 | 22 | 10 | 54
CA | 5 | 1492 | 50 | 50 | 14 | 23 | 10 | 53
CA | 6 | 737 | 50 | 50 | 13 | 24 | 9 | 53
CA | 7 | 959 | 50 | 50 | 14 | 23 | 5 | 58
CA | 8 | 1026 | 50 | 50 | 12 | 22 | 8 | 58
- *Describe the analysis procedures for each reported type of validity.
- All criterion measures were administered in the Spring. The concurrent studies are correlations between Spring NSF scores and the criteria, and the predictive studies are correlations between Fall NSF scores and the criteria.
*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- Number Sense Fluency (NSF) is designed to measure a student's automaticity with comparing numbers within and across number systems and mentally solving one- and two-step computation problems, foundational skills considered important for success and included as learning standards in the Common Core State Standards. These validity studies support the interpretation of NSF scores as foundational for success in the general math domain. Furthermore, they demonstrate that performance on NSF has a moderately strong relationship with end-of-year general math achievement.
- Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated validity data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
Bias Analysis
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | No | No | No | No | No | No | No |
- Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
- No
- If yes,
- a. Describe the method used to determine the presence or absence of bias:
- Note. Number Sense Fluency is a number-based assessment and does not require the same kind of bias analyses as more vocabulary- and context-heavy assessments (e.g., math word problems, reading comprehension). Instructional text was written using simple, grade-appropriate language that keeps the students’ receptive language load to a minimum. In addition, audio is available for any students who would prefer to have instructional text read to them.
- b. Describe the subgroups for which bias analyses were conducted:
- c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
Growth Standards
Sensitivity: Reliability of Slope
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating |
- Describe the sample, including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- The sample consisted of students who scored below the 25th national percentile on the fall Number Sense Fluency (NSF) benchmark and who were assigned a math performance goal and received frequent progress monitoring with NSF. All progress monitoring schedules were at least 20 weeks in duration during the 2016–17 school year.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- The interval between the first and last administration was a minimum of 20 weeks and a maximum of 42 weeks. Most administrations occurred weekly, with a small percentage conducted twice monthly.
- Describe the analysis procedures.
- Each student’s progress monitoring administrations were sequenced by date and divided into two groups: odd-numbered administrations (e.g., 1, 3, 5) and even-numbered administrations (e.g., 2, 4, 6). Linear regression was used to compute the slope for each student in each group, using the following model: Score_i = Intercept + Slope × Date_i, where Date_i is the amount of time since the start of progress monitoring and i ranges from 1 to the number of administrations. The correlation between odd-group and even-group slopes across all students was computed and converted to a split-half reliability coefficient using the Spearman-Brown formula: 2r/(1 + r).
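The odd/even split-half procedure can be sketched as follows (illustrative Python with NumPy; function names are ours, and the per-student regressions use ordinary least squares via np.polyfit, which matches the linear model described but is not the vendor's code):

```python
import numpy as np

def ols_slope(dates, scores):
    """OLS slope of score regressed on time since monitoring began."""
    return np.polyfit(dates, scores, deg=1)[0]

def split_half_slope_reliability(students):
    """students: iterable of (dates, scores) pairs, sorted by date.
    Splits each student's administrations into odd- and even-numbered
    groups, computes a slope per group, correlates the two slope sets
    across students, and applies the Spearman-Brown correction
    2r / (1 + r)."""
    odd_slopes, even_slopes = [], []
    for dates, scores in students:
        dates = np.asarray(dates, dtype=float)
        scores = np.asarray(scores, dtype=float)
        odd_slopes.append(ols_slope(dates[0::2], scores[0::2]))   # 1st, 3rd, ...
        even_slopes.append(ols_slope(dates[1::2], scores[1::2]))  # 2nd, 4th, ...
    r = np.corrcoef(odd_slopes, even_slopes)[0, 1]
    return 2 * r / (1 + r)
```

For perfectly linear score trajectories the odd and even slopes coincide, so the corrected coefficient approaches 1; measurement noise in individual administrations pulls it down.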
In the table below, report reliability of the slope (e.g., ratio of true slope variance to total slope variance) by grade level (if relevant).
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
- No
If yes, fill in data for each subgroup with disaggregated reliability of the slope data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
Sensitivity: Validity of Slope
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating |
- Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- Math Concepts & Applications (CA) was used as the criterion measure.
CA is a standards-based interim assessment administered as a separate test in the aimswebPlus Fall, Winter, and Spring benchmark math assessment battery. This assessment consists of 29–31 math concepts and problem solving items aligned to Common Core State Standards (CCSS) in mathematics in each of Grades 2 through 8. It is an individually administered power test in which students are given the time they need to complete each item. The math CA content differs from and has no overlap with Number Sense Fluency.
NSF measures a student’s automaticity with comparing numbers within and across number systems and mentally solving one- and two-step computation problems, foundational skills considered important for success and included as learning standards in the Common Core State Standards. This foundational skill is instrumental for success with general math problem solving skills, number concepts, and algebraic reasoning. As such, it is expected that students who improve the most on NSF from Fall to Spring should have greater proficiency in the Spring with number concepts and problem solving skills, as measured by CA.
- Describe the sample(s), including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- The sample is the same as that used to compute the reliability of the slope.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- The interval between the first and last administration was a minimum of 20 weeks. Most administrations occurred weekly, with a small percentage conducted twice monthly.
- Describe the analysis procedures for each reported type of validity.
- Spring CA scores were regressed onto the Fall-to-Spring PM slope for NSF and the Fall NSF score. Including the Fall NSF score controls for differences in initial performance, thus removing its effect on the relationship between slope and outcome. Standardized regression coefficients and associated standard errors are reported in the table below. Model 1a: CA Spring Score = Intercept + b1(NSF Slope) + b2(NSF Fall Score)
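Model 1a can be sketched by z-scoring every variable first, so the fitted coefficients are directly the standardized coefficients described above (illustrative Python with NumPy; the function name is ours and this is not the vendor's analysis code):

```python
import numpy as np

def model_1a(ca_spring, nsf_slope, nsf_fall):
    """Fit CA Spring = Intercept + b1*(NSF slope) + b2*(NSF Fall score)
    on z-scored variables; returns the standardized (b1, b2).
    Including the fall score as a covariate removes the effect of
    initial level on the slope-outcome relationship."""
    z = lambda v: (np.asarray(v, dtype=float) - np.mean(v)) / np.std(v)
    y, slope_z, fall_z = z(ca_spring), z(nsf_slope), z(nsf_fall)
    X = np.column_stack([np.ones_like(slope_z), slope_z, fall_z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1], beta[2]  # standardized coefficients for slope, fall score
```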
In the table below, report predictive validity of the slope (correlation between the slope and achievement outcome) by grade level (if relevant).
NOTE: The TRC suggests controlling for initial level when the correlation for slope without such control is not adequate.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- These results support the validity of the inference that growth in the NSF score reflects growth in math proficiency more generally, because growth in the underlying construct contributes to higher criterion scores in the Spring. Because NSF differs in content and format from the criterion measure, one would not expect a high correlation between NSF growth and Spring criterion performance. Therefore, moderate correlations such as these are good supporting evidence.
- Do you have validity of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
- No
If yes, fill in data for each subgroup with disaggregated validity of the slope data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
Alternate Forms
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | | | | | | | |
- Describe the sample for these analyses, including size and characteristics:
- The sample consisted of more than 2,000 students per grade, with at least 200 schools from across the country represented at each grade. All students had a math performance goal and progress monitoring schedule and scored at or below the 30th national percentile on the spring NSF benchmark form. Performance is based on the first PM form administered following the fall benchmark; on average, this occurred about 21 days after benchmark testing. Although students were randomly assigned to forms, ability differences were taken into account via analysis of covariance in which fall benchmark scores were treated as a covariate. The means reported in the table below were adjusted for ability differences based on fall NSF benchmark scores.
- What is the number of alternate forms of equal and controlled difficulty?
- 20 per grade. The adjusted mean performance on the first PM administration was the basis of form comparability. To demonstrate comparability, we provide the effect size at each grade as the mean difference between each form and the average difficulty across all forms, in standard deviation units: (X_i − μ)/SD. Additionally, the comparability of the entire set of 20 forms is summarized by the percentage of the overall score variance attributed to Form via analysis of variance, with Form treated as a fixed factor. For each grade, Form accounted for a trivial percentage of variance (typically less than 2%), and nearly all of the effect sizes were less than 0.30, with most near 0.10, indicating a small difference between each form and the overall mean. More detailed information about alternate form comparability is available from the Center upon request.
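The two comparability summaries described above, the per-form effect size (X_i − μ)/SD and the percentage of variance attributable to Form, can be sketched as follows. The function name and the toy input are illustrative assumptions; the variance percentage is eta-squared from a one-way ANOVA with Form as the factor, which matches the fixed-factor analysis described:

```python
def form_comparability(scores_by_form):
    """Per-form effect sizes and % of score variance due to Form.
    scores_by_form: dict mapping form id -> list of (adjusted) scores."""
    all_scores = [s for v in scores_by_form.values() for s in v]
    n = len(all_scores)
    grand_mean = sum(all_scores) / n
    sd = (sum((s - grand_mean) ** 2 for s in all_scores) / (n - 1)) ** 0.5
    # Effect size for each form: (form mean - grand mean) / overall SD
    es = {f: (sum(v) / len(v) - grand_mean) / sd
          for f, v in scores_by_form.items()}
    # Eta-squared: between-form sum of squares over total sum of squares
    ss_total = sum((s - grand_mean) ** 2 for s in all_scores)
    ss_form = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                  for v in scores_by_form.values())
    return es, 100 * ss_form / ss_total
```

With equally difficult forms, the form means coincide with the grand mean, so every effect size is 0 and Form accounts for 0% of the variance; the reported values (effect sizes mostly near 0.10, variance share under 2%) are close to that ideal.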
- If IRT based, provide evidence of item or ability invariance
- If computer administered, how many items are in the item bank for each grade level?
- If your tool is computer administered, please note how the test forms are derived instead of providing alternate forms:
Decision Rules: Setting & Revising Goals
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | | | | | | | |
- In your manual or published materials, do you specify validated decision rules for how to set and revise goals?
- Yes
- If yes, specify the decision rules:
- To get the most value from progress monitoring, aimswebPlus recommends the following: (1) establish a time frame, (2) determine the level of performance expected, and (3) determine the criterion for success. Typical time frames include the duration of the intervention or the end of the school year. An annual time frame is typically used when IEP goals are written for students who are receiving special education services. For example, aimswebPlus goals can be written as follows: In 34 weeks, the student will compare numbers and answer computational problems to earn a score of 30 points on Grade 4 Number Sense Fluency Forms. aimswebPlus provides several ways to define a level of expected performance. The goal can be based on: ● well-established performance benchmarks that can be linked to aimswebPlus measures via national percentiles (e.g., the link to state test performance levels) or total score (e.g., words read per minute in Grade 2); ● a national performance norm benchmark (e.g., the 50th national percentile is often used to indicate on-grade level performance); ● a local performance norm benchmark; ● or an expected or normative rate of improvement (ROI), such as the 85th national student growth percentile. To use this last method (student growth percentile), the user begins by selecting the measure and baseline score, the goal date, the monitoring frequency (default is weekly), and a tentative goal score. The system automatically labels the ambitiousness of the goal as Insufficient (SGP below 50), Closes the Gap (SGP between 50 and 85), Ambitious (SGP between 86 and 97), or Overly Ambitious (SGP above 97). The user can then adjust the goal (or the goal date) in light of this feedback. For students in need of intensive intervention, aimswebPlus recommends setting performance goals that represent rates of growth between the 86th and 97th SGP (Ambitious). An SGP of 86 represents a growth rate achieved by just 15% of the national sample, which is why it is considered ambitious.
However, it is reasonable to expect significantly higher than average growth when implementing effective, intensive intervention. If the goal is set according to a benchmark based on raw scores or national or local norms, the aimswebPlus system still labels the ambitiousness of the goal in one of the four levels described above. If the goal corresponds to an Insufficient or Overly Ambitious rate of growth, users are advised to consider adjusting the goal. However, the user ultimately determines what growth rate is required on an individual basis. With respect to the decision to revise a goal, aimswebPlus provides empirically-based feedback about the student’s progress relative to the initial goal using the statistical tool described in our response to question B5 below. If the projected score at the goal date is fully Above Target (i.e., the 75% confidence interval for the student’s projected score at the goal date is entirely above the goal score), we recommend that the user consider raising the goal if the goal date is at least 12 weeks out. Otherwise, we recommend not changing the goal. On the other hand, if the upper end of the confidence interval on the projected score lies Below Target, we recommend either changing the intervention, increasing its intensity, or lowering the goal if the initial goal was Overly Ambitious.
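The four SGP bands used to label goal ambitiousness can be expressed as a simple lookup. A minimal sketch with a hypothetical function name; the thresholds are the ones stated above (below 50, 50-85, 86-97, above 97):

```python
def goal_ambitiousness(sgp):
    """Map a student growth percentile (SGP) to the aimswebPlus
    ambitiousness label for a tentative goal."""
    if sgp < 50:
        return "Insufficient"
    if sgp <= 85:
        return "Closes the Gap"
    if sgp <= 97:
        return "Ambitious"        # recommended band for intensive intervention
    return "Overly Ambitious"
```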
- What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place. - As described, users have flexibility in the method they use to set and revise goals in aimswebPlus. The SGP-based labeling of goals as Overly Ambitious, Ambitious, Closes the Gap, or Insufficient is intended to assist the user in choosing a goal, but it is not an automatic goal-setting system. Likewise, the analytical system that generates a confidence interval for the student's predicted performance at the goal date helps the user manage progress monitoring but does not make a decision about revising the goal. Certainly, a decision to lower a goal would rely primarily on the educator's judgment, since the first consideration would be to change the intervention. No experiment has been conducted in which the aimswebPlus information related to setting and revising goals was provided for some students receiving intensive intervention but not others.
Decision Rules: Changing Instruction
Grade | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|
Rating | | | | | | | |
- In your manual or published materials, do you specify validated decision rules for when changes to instruction need to be made?
- Yes
- If yes, specify the decision rules:
- aimswebPlus applies a statistical procedure, based on linear regression, to the student’s progress monitoring scores in order to provide empirically-based guidance about whether the student is likely to meet, fall short of, or exceed his/her goal. The calculation procedure (presented below) is fully described in the aimswebPlus Progress Monitoring Guide (Pearson, 2017). aimswebPlus users will not have to do any calculations—the online system does this automatically. The decision rule is based on a 75% confidence interval for the student’s predicted score at the goal date. This confidence interval is student-specific and takes into account the number and variability of progress monitoring scores and the duration of monitoring. Starting at the sixth week of monitoring (when there are at least four monitoring scores), the aimswebPlus report following each progress monitoring administration includes one of the following statements: A. Below Target. Projected to not meet the goal. This statement appears if the confidence interval is completely below the goal score. B. Above Target. Projected to meet or exceed the goal. This statement appears if the confidence interval is completely above the goal score. C. Near Target. Projected score at goal date: Between (X) and (Y). This statement appears if the confidence interval includes the goal score, with X and Y indicating the bottom and top of the confidence interval, respectively. If Statement A appears, the user has a sound basis for deciding that the current intervention is not sufficient and a change to instruction should be made. If Statement B appears, there is an empirical basis for deciding that the goal is not sufficiently challenging and should be increased. 
If Statement C appears, the student’s progress is not clearly different from the aimline, so there is not a compelling reason to change the intervention or the goal; however, the presentation of the confidence-interval range enables the user to see whether the goal is near the upper limit or lower limit of the range, which would signal that the student’s progress is trending below or above the goal. A 75% confidence interval was chosen for this application because it balances the costs of the two types of decision errors. Incorrectly deciding that the goal will not be reached (when in truth it will be reached) has a moderate cost: an intervention that is working will be replaced by a different intervention. Incorrectly deciding that the goal may be reached (when in truth it will not be reached) also has a moderate cost: an ineffective intervention will be continued rather than being replaced. Because both kinds of decision errors have costs, it is appropriate to use a modest confidence level. Calculation of the 75% confidence interval for the score at the goal date: (1) Calculate the trend line. This is the ordinary least-squares regression line through the student’s monitoring scores. (2) Calculate the projected score at the goal date. This is the value of the trend line at the goal date. (3) Calculate the standard error of estimate (SEE) of the projected score at the goal date, using the following formula: SEE_predicted = sqrt( Σ(y_i − y′_i)² / (k − 2) ) × sqrt( 1 + 1/k + (GW − w̄)² / Σ(w_i − w̄)² ), where k = number of completed monitoring administrations, w_i = week number of a completed administration, w̄ = (Σ w_i)/k = mean week number, GW = week number of the goal date, y_i = monitoring score, and y′_i = predicted monitoring score at that week (from the student’s trend line). The sums and means are calculated across all of the completed monitoring administrations up to that date. (4) Add and subtract 1.25 times the SEE to the projected score, and round to the nearest whole numbers.
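The calculation steps above can be sketched end to end. This is a minimal illustration under the stated formula, not the production implementation; the function name and the example data are assumptions:

```python
import math

def monitoring_decision(weeks, scores, goal_week, goal_score):
    """Fit an OLS trend line through the monitoring scores, project the
    score at the goal date, build a 75% confidence interval as
    projection +/- 1.25 x SEE, and return the guidance statement."""
    k = len(scores)
    w_bar = sum(weeks) / k
    y_bar = sum(scores) / k
    sxx = sum((w - w_bar) ** 2 for w in weeks)
    sxy = sum((w - w_bar) * (y - y_bar) for w, y in zip(weeks, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * w_bar
    projected = intercept + slope * goal_week
    # Residual variance uses k - 2 degrees of freedom (slope and intercept).
    sse = sum((y - (intercept + slope * w)) ** 2 for w, y in zip(weeks, scores))
    see = math.sqrt(sse / (k - 2)) * math.sqrt(
        1 + 1 / k + (goal_week - w_bar) ** 2 / sxx)
    lo = round(projected - 1.25 * see)
    hi = round(projected + 1.25 * see)
    if hi < goal_score:
        label = "Below Target"   # interval entirely below the goal
    elif lo > goal_score:
        label = "Above Target"   # interval entirely above the goal
    else:
        label = "Near Target"    # interval includes the goal
    return label, lo, hi
```

Note how the SEE grows with the distance between the goal week and the mean monitoring week, so an extrapolation far beyond the observed data yields a wider, appropriately more cautious interval.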
-
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place. - The decision rules are statistically rather than empirically based. The guidance statements from applying the 75% confidence interval to the projected score are correct probabilistic statements under certain assumptions. One assumption is that the student's progress to date can be described by a linear trend line; if the pattern of the student's monitoring scores is obviously curvilinear, then the projected score based on a linear trend will likely be misleading. We provide training in the aimswebPlus Progress Monitoring Guide about the need for users to take nonlinearity into account when interpreting progress monitoring data. Another assumption is that the student will continue to progress at the same rate as they have been progressing to that time. This is an unavoidable assumption for a decision system based on extrapolating from past growth. No controlled experimental study has been conducted to support the decision rules; however, an empirical study of actual progress monitoring results was undertaken to evaluate the accuracy of the decision rules at various points during the progress monitoring schedule. aimswebPlus Number Sense Fluency (NSF) and Oral Reading Fluency (ORF) progress monitoring data collected during the 2016-17 school year were used to evaluate the accuracy of the decision feedback. All students on a PM schedule who scored below the 30th national percentile on the fall benchmark and who had at least 20 PM administrations were included. Grades 2 and 3 were chosen, with more than 1,000 students' scores used in each grade. Most administrations were collected about weekly. Because we did not have each student's actual goal score, we generated a goal score based on the ROI that corresponds to a student growth percentile of 55.
This level was chosen because it represents an average rate of improvement and it resulted in about 50% of the students meeting the goal. The goal score was computed as follows: Fall Benchmark Score + ROI55 × Weeks, where ROI55 is the ROI associated with an SGP of 55 and Weeks is the number of weeks between the baseline score (Fall Benchmark) and the Spring Benchmark. For each student, beginning with the 8th score and going through the last score, we computed the score feedback based on the rules described in the previous section. If the student was projected to be below target, an intervention change was deemed necessary and coded 1; otherwise, the student was assigned a score of zero for that administration (no change is needed). We computed the accuracy of the decision to change interventions by comparing the decision to whether the student ultimately did not meet the goal score by the Spring Benchmark. Accuracy was computed as the percentage of students who did not ultimately meet the goal for whom a decision to change the intervention was made. The results showed that decision accuracy improved with each successive administration, reaching 70%-75% by the 8th administration, 75%-80% by the 15th administration, and 90% by the 20th administration. This trend was replicated in each sample, and it provides evidence that the decision rules validly indicate when a change in the intervention should be made because the student is unlikely to achieve the goal at the current rate of improvement.
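The goal construction and the accuracy metric described above can be sketched briefly. Function names and inputs are illustrative assumptions; accuracy here is the hit rate among students who did not meet the goal, as the study defines it:

```python
def goal_score(fall_benchmark, roi55, weeks):
    """Synthetic goal: baseline + weekly growth at SGP 55 x number of
    weeks between the Fall and Spring benchmarks."""
    return fall_benchmark + roi55 * weeks

def decision_accuracy(change_flags, met_goal):
    """change_flags: 1 if 'change intervention' was signaled for a
    student at a given administration, else 0. met_goal: whether that
    student ultimately met the goal by the Spring Benchmark.
    Returns the proportion of non-meeting students who were flagged."""
    missed = [f for f, m in zip(change_flags, met_goal) if not m]
    return sum(missed) / len(missed)
```

Repeating `decision_accuracy` at each administration (8th through 20th) yields the accuracy trend reported above.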
Data Collection Practices
Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.