i-Ready Diagnostic and Growth Monitoring
Mathematics
Summary
i-Ready Growth Monitoring is a brief, computer-delivered, periodic adaptive assessment in mathematics for students in grades K–8, assessing Number & Operations/The Number System, Algebra & Algebraic Thinking, Geometry, and Measurement & Data. Growth Monitoring is part of the i-Ready Assessment suite and is designed to be used jointly with i-Ready Diagnostic to allow for progress monitoring throughout the year and to determine whether students are on track for appropriate growth. Growth Monitoring is designed to be administered monthly but may be administered as frequently as every week in which the i-Ready Diagnostic assessment is not administered. i-Ready Growth Monitoring is a general outcome measure form of progress monitoring. The reports show whether students are on track for their target growth by projecting where their ability level will likely be at the end of the school year and comparing the projected growth to growth targets. For students who are below level, Growth Monitoring can be used as a tool for Response to Intervention (RTI) programs. Curriculum Associates designed and developed i-Ready specifically to assess student mastery of state and Common Core State Standards (CCSS). The Growth Monitoring assessment takes approximately 15 minutes and may be conducted with all students or with specific groups of students who have been identified as at risk of academic failure. i-Ready’s sophisticated adaptive algorithm automatically selects from thousands of multiple-choice and technology-enhanced items to get to the core of each student's strengths and challenges, regardless of the grade level at which the student is performing. The system automatically analyzes and scores student responses. Available as soon as a student completes the assessment, i-Ready’s intuitive Growth Monitoring reports—available at the student and class levels—focus solely on how students are tracking toward their end-of-year growth.
- Where to Obtain:
- Curriculum Associates, LLC
- RFPs@cainc.com
- 153 Rangeway Road, N. Billerica MA 01862
- 800-225-0248
- www.curriculumassociates.com
- Initial Cost:
- $7.25 per student
- Replacement Cost:
- $7.25 per student per year
- Included in Cost:
- $7.25/student/year for i-Ready Assessment for mathematics, which includes the Diagnostic and Growth Monitoring. The license fee includes online student access to the assessment, plus staff access to the management and reporting suite, downloadable lesson plans, and user resources including the i-Ready Central® support website; account set-up and secure hosting; all program maintenance/updates/enhancements during the active license term; and unlimited user access to U.S.-based service and support via toll-free phone and email during business hours. Professional development is required and available at an additional cost ($2,200/session, up to six hours). Site license pricing is also available.
- In most cases, students who require accommodations will not require additional help to use i-Ready. The design emphasizes making necessary adjustments so that a large percentage of students requiring accommodations will be able to take the test and complete the instruction in a standard manner, without compromising the interpretation or purpose of the test or lesson quiz. Even though items on universally designed assessments and lessons will be accessible for most students, there will still be some students who continue to need accommodations. The goal of Universal Design in such cases is to facilitate the use of the appropriate accommodations and to reduce threats to the validity and comparability of scores. In i-Ready Assessment, universal accessibility features are available to all students and do not need to be enabled. Additionally, there are processes and tools in i-Ready Assessment that are only used to support students who have documented needs, which are usually mandated supports provided as a part of a student’s IEP or 504 plan. IEP teams and other educators determine which accommodations a student receives. Although Curriculum Associates provides guidance on how our products support various accommodations, educators who work with individual students determine which accommodations are needed and how to correctly implement those accommodations. To make i-Ready accessible to the widest population of students, we offer a range of accessibility supports that can also meet the requirements of a number of student accommodations. This accessibility update is designed to provide educators with information about i-Ready’s current accessibility supports, insight into our vision, and plans for future enhancements. For a current list of embedded and non-embedded student supports and accommodations, please refer to the i-Ready Accessibility and Accommodations Update: https://i-readycentral.com/view-resource/?id=33812. This documentation is updated regularly to reflect our most recent i-Ready accessibility developments. For more information, please also refer to the i-Ready Diagnostic Universal Accessibility Features and Accommodations Guidance resource: https://i-readycentral.com/download/?res=27077&view_pdf=1. We have also included guidance on accessibility features available in i-Ready Standards Mastery: https://i-readycentral.com/download/?res=28726&view_pdf=1. For video demonstrations of certain features, please refer to our series of short video resources that demonstrate i-Ready’s embedded accessibility features: https://i-readycentral.com/articles/video-demonstrations-of-i-readys-embedded-accessibility-features/. The linked documents and resources are housed online at i-Ready Central (i-ReadyCentral.com/iReadyAccessibilityResources) along with other helpful accessibility resources. We regularly update our documentation and resources with each release to reflect new accessibility gains and reductions in exceptions.
- Training Requirements:
- 4–8 hours of training.
- Qualified Administrators:
- Paraprofessional or professional
- Access to Technical Support:
- Partner Success Manager plus unlimited access to in-house technical support during business hours.
- Assessment Format:
- Individual
- Computer-administered
- Scoring Time:
- Scoring is automatic
- Scores Generated:
- Percentile score
- IRT-based score
- Developmental benchmarks
- Other: On-grade achievement level placements
- Administration Time:
- 15 minutes per student
- Scoring Method:
- Automatically (computer-scored)
- Technology Requirements:
- Computer or tablet
- Internet connection
Tool Information
Descriptive Information
- Please provide a description of your tool:
- i-Ready Growth Monitoring is a brief, computer-delivered, periodic adaptive assessment in mathematics for students in grades K–8, assessing Number & Operations/The Number System, Algebra & Algebraic Thinking, Geometry, and Measurement & Data. Growth Monitoring is part of the i-Ready Assessment suite and is designed to be used jointly with i-Ready Diagnostic to allow for progress monitoring throughout the year and to determine whether students are on track for appropriate growth. Growth Monitoring is designed to be administered monthly but may be administered as frequently as every week in which the i-Ready Diagnostic assessment is not administered. i-Ready Growth Monitoring is a general outcome measure form of progress monitoring. The reports show whether students are on track for their target growth by projecting where their ability level will likely be at the end of the school year and comparing the projected growth to growth targets. For students who are below level, Growth Monitoring can be used as a tool for Response to Intervention (RTI) programs. Curriculum Associates designed and developed i-Ready specifically to assess student mastery of state and Common Core State Standards (CCSS). The Growth Monitoring assessment takes approximately 15 minutes and may be conducted with all students or with specific groups of students who have been identified as at risk of academic failure. i-Ready’s sophisticated adaptive algorithm automatically selects from thousands of multiple-choice and technology-enhanced items to get to the core of each student's strengths and challenges, regardless of the grade level at which the student is performing. The system automatically analyzes and scores student responses. Available as soon as a student completes the assessment, i-Ready’s intuitive Growth Monitoring reports—available at the student and class levels—focus solely on how students are tracking toward their end-of-year growth.
- Is your tool designed to measure progress towards an end-of-year goal (e.g., oral reading fluency) or progress towards a short-term skill (e.g., letter naming fluency)?
- ACADEMIC ONLY: What dimensions does the tool assess?
- BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
- BEHAVIOR ONLY: Which category of behaviors does your tool target?
Acquisition and Cost Information
Administration
Training & Scoring
Training
- Is training for the administrator required?
- Yes
- Describe the time required for administrator training, if applicable:
- 4–8 hours of training.
- Please describe the minimum qualifications an administrator must possess.
- Paraprofessional or professional
- No minimum qualifications
- Are training manuals and materials available?
- Yes
- Are training manuals/materials field-tested?
- Yes
- Are training manuals/materials included in cost of tools?
- Yes
- If No, please describe training costs:
- Can users obtain ongoing professional and technical support?
- Yes
- If Yes, please describe how users can obtain support:
- Partner Success Manager plus unlimited access to in-house technical support during business hours.
Scoring
- Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
- i-Ready scale scores are linear transformations of logit values. Logits, also known as “log-odds units,” are measurement units for logarithmic probability models such as the Rasch model. Logits are used to determine both student ability and item difficulty. Within the Rasch model, if the ability matches the item difficulty, then the person has a .50 chance of answering the item correctly. For i-Ready, student ability and item logit values generally range from around -6 to 6. When the i-Ready vertical scale was updated in August 2016, the equipercentile equating method was applied to the updated logit scale. The appropriate scaling constant and slope were applied to the logit value to convert to scale score values between 100 and 800 (Kolen and Brennan, 2014). This scaling is accomplished by converting the estimated logit values with the following equation: Scale Value = 466.41 + 25.42 × Logit Value. Once this conversion is made, floor and ceiling values are imposed to keep the scores within the 100–800 scale range. This is achieved by simply recoding all values below 100 up to 100 and all values above 800 down to 800. The scale score range, mean, and standard deviation on the updated scale are either exactly the same as (range) or very similar (mean and standard deviation) to those from the scale prior to the August 2016 scale update, which generally allows year-over-year comparisons of i-Ready scale scores. Additional information on the formulas used to derive raw scores is available from the Center upon request. i-Ready is a computer-adaptive test that uses Item Response Theory (IRT) to estimate a student’s score. In addition to the measurement model used to provide student scores, i-Ready Growth Monitoring also has a projection model that yields projected scores, which are particularly useful to educators interested in progress monitoring. The Growth Monitoring projection model was developed after the first full-year implementation of the assessment. Several models were evaluated in an extensive research study, in collaboration with independent researchers from Harvard University. The model that had the best psychometric characteristics (e.g., low residual, low residual bias, consistent projection precision across the school year) and was operationally feasible was selected. The final projection model has the following key structural features: 1) Projection is based on a weighted combination of two values: a) the average across all test scores a student receives during the academic year, including Diagnostic and Growth Monitoring (grand mean, or GM), and b) the predicted end-of-year scale score based on a simple linear regression (linear prediction, or LP). 2) Weighting of the GM and the LP is determined by fitting multiple linear regression models to the preceding year’s assessment data on the relationship between GM and LP and the actual end-of-year Diagnostic test scores students obtained in the previous year. A set of multiple regression intercept and weighting factors is derived for each of the nine grades (K–8), two subjects, three ability groups based on fall percentile rank (bottom 25%, middle 50%, and top 25%), and eight months (October to May). Thus, a total of 432 (9 × 2 × 3 × 8) sets of model parameters are developed. These structural features of the projection model have a few advantages: 1) Because model parameters are obtained based on operational data, they can be updated yearly with the most current growth pattern from the past academic year.
2) Because model parameters are obtained for three ability groups, the differential growth rate for students at the high and low ends of the ability spectrum is taken into consideration. 3) Because model parameters are obtained for each month, the projection error stays low even at the beginning of the school year, when the number of data points is small. To illustrate the accuracy of the Growth Monitoring projection model, all students from the 2014–2015 school year were randomly assigned into one of two samples: the training sample or the validation sample. The training sample was used to derive weighting parameters for each of the 432 models. These parameters were then applied to the validation sample. Figure 4 of the Technical Manual shows the normalized root-mean-square error (NRMSE) from the validation sample. NRMSE is zero when the prediction matches the actual test score perfectly; an NRMSE of less than .10 is considered adequate fit. Figure 4 of the Technical Manual shows that, while the prediction error is relatively higher in October, when only three months of test data are available and the projection is more than six months out, it quickly drops to a lower level (i.e., most values are below .10) in November and stays low and stable across the rest of the year. Section 2.2 of the i-Ready Technical Manual provides more details about the projection model. The methodology for setting growth targets is described in Chapter 6 of the i-Ready Technical Manual. Consumers interested in more detailed information should contact the publisher of the i-Ready Technical Manual, Curriculum Associates. A brief illustrative sketch of the scale conversion and projection weighting is shown below.
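The following is a minimal, illustrative Python sketch (not Curriculum Associates' operational code) of the two mechanics described above: the published logit-to-scale-score conversion with its 100–800 floor and ceiling, and the general form of a weighted GM/LP projection. The regression intercept and weights shown are hypothetical placeholders; the operational parameters are estimated from prior-year data for each grade, subject, ability group, and month.

```python
# Illustrative sketch only; the projection intercept/weights are hypothetical,
# not i-Ready's operational parameters.

def logit_to_scale(logit: float) -> int:
    """Convert a Rasch logit estimate to the i-Ready 100-800 scale."""
    scale = 466.41 + 25.42 * logit                 # published linear transformation
    return int(round(min(max(scale, 100), 800)))   # impose floor and ceiling


def projected_eoy_score(scores: list[float], months: list[float],
                        intercept: float, w_gm: float, w_lp: float) -> float:
    """Weighted combination of the grand mean (GM) and a simple linear
    regression prediction (LP) of the end-of-year scale score."""
    n = len(scores)
    gm = sum(scores) / n                           # grand mean of scores so far
    mean_x = sum(months) / n
    sxx = sum((x - mean_x) ** 2 for x in months)
    sxy = sum((x - mean_x) * (y - gm) for x, y in zip(months, scores))
    slope = sxy / sxx if sxx else 0.0
    eoy_month = 9                                  # hypothetical end-of-year month index
    lp = gm + slope * (eoy_month - mean_x)         # simple linear extrapolation
    return intercept + w_gm * gm + w_lp * lp


# Example with made-up scores and weights:
scores = [float(logit_to_scale(l)) for l in (-0.8, -0.5, -0.3)]
print(projected_eoy_score(scores, months=[1, 2, 3],
                          intercept=5.0, w_gm=0.40, w_lp=0.55))
```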
- Do you provide basis for calculating slope (e.g., amount of improvement per unit in time)?
- ACADEMIC ONLY: Do you provide benchmarks for the slopes?
- Yes
- ACADEMIC ONLY: Do you provide percentile ranks for the slopes?
- No
- Describe the tool’s approach to progress monitoring, behavior samples, test format, and/or scoring practices, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
- i-Ready Growth Monitoring is a brief, computer-delivered, periodic adaptive assessment in mathematics for students in grades K–8. Growth Monitoring is part of the i-Ready Assessment suite and is designed to be used jointly with i-Ready Diagnostic to allow for progress monitoring throughout the year to determine whether students are on track for appropriate growth. Growth Monitoring is a periodic assessment that may be administered as frequently as every week in which the i-Ready Diagnostic assessment is not administered. The reports for these brief assessments (an average duration of 15 minutes or less) show whether students are on track for their target growth by projecting where their ability level will likely be at the end of the school year and comparing the projected growth to growth targets. For students who are below level, Growth Monitoring can be used as a tool for RTI programs. Growth Monitoring is a general outcome measure form of progress monitoring. The reports associated with Growth Monitoring—available at the student and class levels—focus solely on how students are tracking toward their end-of-year growth. Curriculum Associates is committed to fair and unbiased product development. i-Ready is developmentally, linguistically, and culturally appropriate for a wide range of students at each of the assessed grades. For instance, the names, characters, and scenarios used within the program are ethnically and culturally diverse. We developed all items and passages in i-Ready to be accessible for all students regardless of their need for accommodation. In most cases, students who require accommodations (e.g., large print or extra time) will not require additional help to complete an i-Ready assessment. The design of the assessment emphasizes making necessary adjustments to the items so that a large percentage of students requiring accommodations will be able to take the test in a standard manner without compromising the interpretation or purpose of the test. According to the Standards (AERA, APA, NCME, 2014), “Universal Design processes strive to minimize access challenges by taking into account test characteristics that may impede access to the construct for certain test takers.” i-Ready was developed with the universal principles of design for assessment in mind and followed the seven elements of Universal Design for large-scale assessments recommended by NCEO (2002): 1. Inclusive assessment population, 2. Precisely defined constructs, 3. Accessible, non-biased items, 4. Amenable to accommodations, 5. Simple, clear, and intuitive instructions and procedures, 6. Maximum readability and comprehensibility, 7. Maximum legibility. Curriculum Associates periodically runs differential item functioning (DIF) analysis to ensure that items are operating properly and to identify items that need to go through additional review by subject matter experts and key stakeholders to determine if the items should be removed from the item pool for further editing or replaced. Items with moderate and large DIF are subjected to this extensive review to identify the potential sources of differential functioning. We then determine whether each item should remain in the operational pool, be removed from the item pool, or be revised and resubmitted for field-testing.
DIF analysis and subsequent item reviews are important quality assurance procedures to support the validity of the items in the item pool, and are carried out annually by Curriculum Associates following the best practices in the field of educational measurement. Validity refers to the degree to which evidence and theory can support the interpretations of scores used for the assessment (AERA, APA, NCME, 2014). Under the Rasch item response theory (IRT) model, the probability of a correct response to an item is only dependent on the item difficulty and the person’s ability level. If an item favors one group of students over another based on the test taker’s characteristics (e.g., gender, ethnicity), then the assumption of IRT is violated, and the item is considered biased and unfair. A biased item will exhibit DIF. DIF analysis is a procedure used to determine if items are fair and appropriate for assessing the knowledge of various subgroups (e.g., gender and ethnicity) while controlling for ability. However, it should be noted that the presence of DIF alone is not evidence of item bias. Differences in item responses would be expected when the student groups differ in the knowledge or ability level being measured. Consequently, a difference in item performance between groups of students with different ability levels does not by itself represent item bias. The determination of item bias, therefore, should be based not only on DIF analysis but also on content experts’ comprehensive review. The following describes the latest DIF analysis conducted on the i-Ready items. DIF was investigated using WINSTEPS® by comparing the item difficulty measure for two demographic categories in a pairwise comparison through a combined calibration analysis. The essence of this methodology is to investigate the interaction of the person-groups with each item, while fixing all other item and person measures to those from the combined calibration. The method used to detect DIF is based on the Mantel-Haenszel procedure (MH), and the work of Linacre & Wright (1989) and Linacre (2012). Typically, the group representing test takers in a specific demographic group is referred to as the focal group. The group made up of test takers from outside this group is referred to as the reference group. For example, for gender, Female is the focal group, and Male is the reference group. More information is provided in section 3.4 of the i-Ready Technical Manual. Consumers interested in more detailed information should contact the publisher of the i-Ready Technical Manual, Curriculum Associates. A minimal sketch of the Mantel-Haenszel statistic is shown below.
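For illustration, the following sketch computes the Mantel-Haenszel common odds ratio described above from ability-stratified counts, along with its log (a DIF measure in logits) and the corresponding ETS delta value. The counts are made up; in practice the strata come from matching reference and focal examinees on total score or ability estimate.

```python
# Sketch of the Mantel-Haenszel DIF statistic from stratified 2x2 tables.
from math import log

# Per ability stratum: (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
strata = [
    (120, 30, 95, 45),
    (200, 40, 180, 60),
    (150, 20, 140, 30),
]

num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
alpha_mh = num / den               # MH common odds ratio across strata
dif_logits = log(alpha_mh)         # log odds ratio; > 0 favors the reference group
delta_mh = -2.35 * dif_logits      # ETS delta metric; negative values flag DIF against the focal group
print(round(alpha_mh, 3), round(dif_logits, 3), round(delta_mh, 3))
```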
Rates of Improvement and End of Year Benchmarks
- Is minimum acceptable growth (slope of improvement or average weekly increase in score by grade level) specified in your manual or published materials?
- Yes
- If yes, specify the growth standards:
- i-Ready provides two growth targets for each student, depending on their grade level and starting placement level. Typical Growth represents the average expected growth for the given grade level and starting placement level. From lowest to highest starting placement level, the Typical Growth targets are:
Grade K: 32, 24, 21
Grade 1: 36, 29, 26, 21
Grade 2: 29, 26, 22, 18
Grade 3: 30, 27, 26, 25, 21
Grade 4: 24, 23, 23, 23, 19
Grade 5: 20, 18, 18, 18, 14
Grade 6: 15, 14, 14, 13, 13
Grade 7: 13, 13, 12, 12, 11
Grade 8: 12, 10, 9, 9, 9
Stretch Growth represents aspirational growth designed to put students on a 1-year, 2-year, or more than 2-year path toward proficiency. From lowest to highest starting placement level, the Stretch Growth targets are:
Grade K: 39, 38, 35
Grade 1: 57, 37, 36, 32
Grade 2: 48, 36, 35, 31
Grade 3: 55, 43, 35, 34, 30
Grade 4: 47, 41, 34, 33, 24
Grade 5: 41, 35, 31, 29, 20
Grade 6: 35, 30, 26, 25, 20
Grade 7: 33, 25, 23, 22, 20
Grade 8: 31, 23, 22, 21, 19
- Are benchmarks for minimum acceptable end-of-year performance specified in your manual or published materials?
- Yes
- If yes, specify the end-of-year performance standards:
- This information is provided directly to districts and schools as part of our support process.
- Date
- Size
- Male
- Female
- Unknown
- Eligible for free or reduced-price lunch
- Other SES Indicators
- White, Non-Hispanic
- Black, Non-Hispanic
- Hispanic
- American Indian/Alaska Native
- Asian/Pacific Islander
- Other
- Unknown
- Disability classification (Please describe)
- First language (Please describe)
- Language proficiency status (Please describe)
Performance Level
Reliability
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- *Offer a justification for each type of reliability reported, given the type and purpose of the tool.
- For the i-Ready Diagnostic, Curriculum Associates prepares the IRT-based marginal reliability, as well as the standard error of measurement (SEM). Given that the i-Ready Diagnostic is a computer-adaptive assessment that does not have a fixed form, some traditional reliability estimates such as Cronbach’s alpha are inappropriate for quantifying consistency of student scores. The IRT analogue to classical reliability is called marginal reliability, and operates on the variance of the theta scores (i.e., proficiency) and the average of the expected error variance. The marginal reliability uses the classical definition of reliability as proportion of variance in the total observed score due to true score under an IRT model (the i-Ready Diagnostic uses a Rasch model to be specific). In addition to marginal reliability, SEMs are also important for quantifying the precision of scores. In an IRT model, SEMs are affected by factors such as how well the data fit the underlying model, student response consistency, student location on the ability continuum, match of items to student ability, and test length. Given the adaptive nature of i-Ready and the wide difficulty range in the item bank, standard errors are expected to be low and very close to the theoretical minimum for tests of similar length. The theoretical minimum would be reached if each interim estimate of student ability is assessed by an item with difficulty matching perfectly to the student’s ability estimated from previous items. Theoretical minimums are restricted by the number of items served in the assessment—the more items that are served up, the lower the SEM could potentially be. For mathematics, the minimum SEM for overall scores is 6.00. In addition to providing the mean SEM by subject and grade, the graphical representations of the conditional standard errors of measurement (CSEM) provide additional evidence of the precision with which i-Ready measures student ability across the operational score scale. In the context of model-based reliability analyses for computer adaptive tests, such as i-Ready, CSEM plots permit test users to judge the relative precision of the estimate. These figures are available from the Center upon request.
- *Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
- Data for obtaining the marginal reliability and SEM were from the August and September 2016 administrations of the i-Ready Diagnostic (reported in Table 4.4 of the i-Ready Diagnostic Technical Manual). All students tested within that time frame were included; this period was selected because it coincides with most districts’ first administration of the i-Ready Diagnostic. Sample sizes by grade are presented in the table shown under question #4.
- *Describe the analysis procedures for each reported type of reliability.
- This marginal reliability uses the classical definition of reliability as the proportion of variance in the total observed score due to true score. The true score variance is computed as the observed score variance minus the error variance: ρ_θ = (σ_θ² − σ̄_E²) / σ_θ², where ρ_θ is the marginal reliability estimate, σ_θ² is the observed variance of the ability estimates, and σ̄_E² is the observed average conditional error variance. Similar to a classical reliability coefficient, the marginal reliability estimate increases as the standard error decreases; it approaches 1 when the standard error approaches 0. The observed score variance, the error variance, and the SEM (the square root of the error variance) are obtained through WINSTEPS calibrations. A separate calibration was conducted for each grade. A brief computational sketch is shown below.
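To make the computation concrete, here is a minimal sketch under the definition above. It assumes a list of student ability estimates (in logits) and their conditional standard errors, as would be exported from a WINSTEPS calibration; the values are made up.

```python
# Minimal sketch: marginal reliability = (observed variance - average error variance) / observed variance.
from statistics import pvariance

# Hypothetical ability estimates (logits) and conditional standard errors per student.
thetas = [-1.2, -0.4, 0.1, 0.6, 1.3, 2.0]
sems = [0.32, 0.28, 0.27, 0.27, 0.29, 0.35]

obs_var = pvariance(thetas)                           # observed variance of ability estimates
avg_err_var = sum(s ** 2 for s in sems) / len(sems)   # average conditional error variance
marginal_reliability = (obs_var - avg_err_var) / obs_var
print(round(marginal_reliability, 3))
```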
*In the table(s) below, report the results of the reliability analyses described above (e.g., model-based evidence, internal consistency or inter-rater reliability coefficients). Include detail about the type of reliability data, statistic generated, and sample size and demographic information.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated reliability data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
Validity
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- *Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- All of the following criterion measures are external to the i-Ready progress monitoring tool and include widely used assessments of mathematics ability:
• The NWEA Measures of Academic Progress (NWEA MAP) Mathematics assessment is a computerized adaptive assessment measuring a student's general knowledge in Mathematics.
• The Smarter Balanced Assessment (SBA) for Mathematics is a summative, standards-aligned assessment administered in the Spring and used in various states to provide information about students’ Mathematics achievement and to support effective teaching and learning.
• The Mathematics Alabama Comprehensive Assessment Program (ACAP) summative assessment is a computer-based, criterion-referenced assessment, administered in the Spring and designed to measure student progress on the Alabama Courses of Study Standards in Mathematics in grades 2–8.
• The Massachusetts Comprehensive Assessment System (MCAS) Mathematics test is a statewide standards-based summative assessment in mathematics administered in the Spring and developed to help parents, students, educators, and policymakers determine where districts, schools, and students are meeting expectations and where they need additional support.
- *Describe the sample(s), including size and characteristics, for each validity analysis conducted.
- For the concurrent validity analyses, the sample includes students taking both the i-Ready progress monitoring tool and the criterion measure (MAP, ACAP, MCAS) in the Spring testing window. Sample sizes varied from 211 to 14,755 and included students at all performance levels. For predictive validity analyses in grades K-2, the sample includes students taking the i-Ready progress monitoring tool between 12 and 36 months prior to the criterion measure (SBA, MCAS). Sample sizes varied from 853 to 18,172 and included students at all performance levels. For the predictive validity analyses in grades 3-8, the sample includes students taking both the progress monitoring tool and the criterion measure (SBA) in different testing windows within the same academic year, with the i-Ready progress monitoring tool being administered in the Fall and the criterion measure in the following Spring. Sample sizes varied from 38,920 to 44,438 and included students at all performance levels.
- *Describe the analysis procedures for each reported type of validity.
- For both concurrent and predictive analyses, Pearson correlation coefficients between the screening and criterion measures were calculated and the 95% confidence interval around the Pearson r correlation coefficient was computed using Fisher r-to-z transformation. For concurrent analyses, scores on the i-Ready progress monitoring tool administered in the Spring were correlated to scores on the criterion measure administered in the Spring of the same academic year. For predictive analyses in grades K-2, scores on the i-Ready progress monitoring tool administered in the Spring were correlated to scores on the criterion measure for the same students in subsequent academic years. Specifically, screening scores in grades K and 2 were correlated to criterion scores in grade 3, and screening scores in grade 1 were correlated to criterion scores in grade 4. For grades 3-8, scores on the i-Ready progress monitoring tool administered in the Fall (predictive) and Spring (concurrent) were correlated to scores on the criterion measure administered in the Spring of the same academic year. Fisher r-to-z transformations were conducted to standardize the correlations and the 95% confidence interval around the correlations was computed.
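As an illustration of the procedure described above, the sketch below computes a Pearson correlation and its 95% confidence interval via the Fisher r-to-z transformation. The score vectors are hypothetical placeholders, not i-Ready or state-assessment data (statistics.correlation requires Python 3.10+).

```python
# Sketch: Pearson r with a 95% confidence interval from the Fisher r-to-z transformation.
from math import atanh, sqrt, tanh
from statistics import correlation

screening = [420, 455, 470, 488, 502, 515, 530, 551]          # hypothetical scale scores
criterion = [2380, 2415, 2450, 2441, 2489, 2502, 2520, 2555]  # hypothetical criterion scores

r = correlation(screening, criterion)
z = atanh(r)                                # Fisher r-to-z
se = 1 / sqrt(len(screening) - 3)           # standard error of z
lower, upper = tanh(z - 1.96 * se), tanh(z + 1.96 * se)  # back-transform the CI bounds
print(round(r, 3), round(lower, 3), round(upper, 3))
```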
*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- Concurrent validity coefficients for grades K–2 were positive and significant, and the lower bound of the confidence interval around the coefficients ranged from .65 to .83, indicating a strong relationship between the i-Ready progress monitoring tool and the external criterion measures (MAP, ACAP). Predictive validity coefficients for grades K–2 were positive and significant, and the lower bound of the confidence interval around the coefficients ranged from .60 to .79. Given that the i-Ready progress monitoring tool for these analyses was administered 12 to 36 months prior to the criterion measure, these correlation coefficients are predictably lower than those for measures taken closer in time. However, these validity coefficients still meet the minimum threshold of .60, indicating a robust positive relationship between the i-Ready progress monitoring tool and high-stakes statewide assessments (SBA, MCAS), even across an extended period of time. Concurrent and predictive validity coefficients for grades 3–8 were mostly in the .80s, suggesting a strong relationship between the i-Ready Diagnostic progress monitoring tool and the criterion state summative assessments in the same subject and grade.
- Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
If yes, fill in data for each subgroup with disaggregated validity data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
Bias Analysis
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
- Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
- Yes
- If yes,
- a. Describe the method used to determine the presence or absence of bias:
- DIF was investigated using WINSTEPS® (Version 3.92) by comparing item difficulty for pairs of demographic subgroups through a combined calibration analysis. This methodology evaluates the interaction of the person-level subgroups with each item, while fixing all other item and person measures to those from the combined calibration. The method used to detect DIF is based on the Mantel-Haenszel procedure (MH), and the work of Linacre & Wright (1989) and Linacre (2012). Typically, the groups of test takers are referred to as “reference” and “focal” groups. For example, for analysis of gender bias, Female test takers are the focal group, and Male test takers are the reference group. More information is provided in section 3.4 of the i-Ready Technical Manual. Consumers interested in more detailed information should contact the publisher of the i-Ready Technical Manual, Curriculum Associates.
- b. Describe the subgroups for which bias analyses were conducted:
- The latest large-scale DIF analysis included a random sample (20%) of students from the 2015–2016 i-Ready operational data. Given the large size of the 2015–2016 i-Ready student population, it is practical to carry out the calibration analysis with a random sample. The following demographic categories were compared: Female vs. Male; African American and Hispanic vs. Caucasian; English Learner vs. non–English Learner; Special Ed vs. General Ed; Economically Disadvantaged vs. Not Economically Disadvantaged. In each pairwise comparison, estimates of item difficulty for each category in the comparison were calculated. The table below presents the total number and percentage of students included in the DIF analysis.
Subgroup | n | Percent |
---|---|---|
Male | 267,200 | 52 |
Female* | 247,000 | 48 |
White | 126,400 | 34.1 |
African American or Hispanic* | 244,100 | 65.9 |
Non-EL | 262,700 | 80.8 |
EL* | 62,400 | 19.2 |
General Education | 181,000 | 85.1 |
Special Education* | 31,600 | 14.9 |
Not Economically Disadvantaged | 192,100 | 67.1 |
Economically Disadvantaged* | 94,100 | 32.9 |
*Denotes the focal group
- c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
- All active items in the current item pool for the 2015–2016 school year are included in the DIF analysis, covering items administered to students in grades K–12. The total number of items is 3,103 for mathematics. WINSTEPS was used to conduct the calibration for DIF analysis by grade. To help interpret the results, the Educational Testing Service (ETS) criteria using the delta method (Zwick, Thayer, & Lewis, 1999) were used to categorize DIF, as presented below.
ETS DIF Category:
A (negligible): |DIF| < 0.43
B (moderate): |DIF| ≥ 0.43 and |DIF| < 0.64
C (large): |DIF| ≥ 0.64
B- or C- suggests DIF against the focal group; B+ or C+ suggests DIF against the reference group.
Tables reporting the numbers and percentages of items exhibiting DIF for each of the demographic categories are available, upon request, from the Center. The majority of items showed negligible DIF (at least 90 percent), and for very few categories did more than 3 percent of items show large DIF (level C) by grade. A brief sketch of this categorization is shown below.
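For illustration only, the following sketch applies the ETS thresholds listed above to a DIF measure expressed in logits; the input values are hypothetical.

```python
# Sketch: classify DIF magnitude using the ETS thresholds quoted above (logit scale).
def ets_dif_category(dif_measure: float) -> str:
    """Return the ETS DIF category for a DIF measure in logits.

    ETS additionally appends '-' or '+' to B and C items to flag direction
    (against the focal or reference group, respectively), as noted above.
    """
    size = abs(dif_measure)
    if size < 0.43:
        return "A (negligible)"
    if size < 0.64:
        return "B (moderate)"
    return "C (large)"


for value in (0.12, 0.50, 0.70):    # hypothetical DIF measures
    print(value, ets_dif_category(value))
```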
Growth Standards
Sensitivity: Reliability of Slope
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- Describe the sample, including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- Describe the analysis procedures.
In the table below, report reliability of the slope (e.g., ratio of true slope variance to total slope variance) by grade level (if relevant).
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- Provide citations for additional published studies.
- Do you have reliability of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
If yes, fill in data for each subgroup with disaggregated reliability of the slope data.
Type of Reliability | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- Provide citations for additional published studies.
Sensitivity: Validity of Slope
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
-
- Describe the sample(s), including size and characteristics. Please provide documentation showing that the sample was composed of students in need of intensive intervention. A sample of students with intensive needs should satisfy one of the following criteria: (1) all students scored below the 30th percentile on a local or national norm, or the sample mean on a local or national test fell below the 25th percentile; (2) students had an IEP with goals consistent with the construct measured by the tool; or (3) students were non-responsive to Tier 2 instruction. Evidence based on an unknown sample, or a sample that does not meet these specifications, may not be considered.
- Describe the frequency of measurement (for each student in the sample, report how often data were collected and over what span of time).
- Describe the analysis procedures for each reported type of validity.
In the table below, report predictive validity of the slope (correlation between the slope and achievement outcome) by grade level (if relevant).
NOTE: The TRC suggests controlling for initial level when the correlation for slope without such control is not adequate.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Provide citations for additional published studies.
- Describe the degree to which the provided data support the validity of the tool.
- Do you have validity of the slope data that is disaggregated by subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)?
If yes, fill in data for each subgroup with disaggregated validity of the slope data.
Type of Validity | Subscale | Subgroup | Informant | Age / Grade | Test or Criterion | n (sample/examinees) | n (raters) | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Provide citations for additional published studies.
Alternate Forms
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- Describe the sample for these analyses, including size and characteristics:
- The i-Ready assessment forms are assembled automatically by Curriculum Associates’ computer-adaptive testing (CAT) algorithm, subject to objective content and other constraints described in section 2.1.3 in Chapter 2 of the i-Ready Technical Manual. As such, the sample size per form that would be applicable to linear (i.e., non-adaptive) assessments does not directly apply to Curriculum Associates’ i-Ready Diagnostic assessment. Note that many analyses that Curriculum Associates conducts (e.g., to estimate growth targets) are based on normative samples, which, for the 2015–2016 school year, included 3.9 million i-Ready Diagnostic assessments taken by more than one million students from over 4,000 schools. The demographics of the normative sample at each grade closely match those of the national student population. Tables 7.3 and 7.4 of the Technical Manual present the sample sizes for each normative sample and the demographics of the samples compared with the latest population target, as reported by the National Center for Education Statistics. Consumers interested in more detailed information should contact the publisher of the i-Ready Technical Manual, Curriculum Associates.
- What is the number of alternate forms of equal and controlled difficulty?
- Virtually infinite. Because i-Ready is a computer-adaptive test, all administrations are equivalent forms. However, each student is presented with an individualized testing experience in which he or she is served test items based on responses to previous questions. In essence, this provides a virtually infinite number of test forms, because individual student testing experiences are largely unique.
- If IRT based, provide evidence of item or ability invariance
- Section 2.1.3 in Chapter 2 of the i-Ready Technical Manual describes the adaptive nature of the tests and how the item selection process works. The i-Ready Growth Monitoring assessments are a general outcome measure of student ability and measure a subset of skills that are tested on the Diagnostic. Items on Growth Monitoring are from the same domain item pool as the Diagnostic. Test items are served based on the same IRT ability estimate and item selection logic. Often, test developers want to show that the items in their measure are invariant, meaning the items are measuring both groups similarly. To illustrate the property of item invariance across the groups of i-Ready test takers in need of intensive intervention (i.e., below the national norming sample’s 30th percentile rank in terms of overall mathematics scale score) and those without such need (i.e., at or above the 30th percentile rank), a special set of item calibrations was prepared. Correlations between independent item calibrations for subgroups of students below and at or above the 30th percentile rank were computed to demonstrate the extent to which i-Ready parameter estimates are appropriate for use with both groups. To demonstrate comparable item parameter estimates, correlations between the below and at-or-above the 30th percentile item difficulty parameter estimates and their corresponding confidence intervals—constructed using Fisher’s r-to-z transformation (Fisher, R. A. 1915. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika, 10(4), 507–521)—were provided. Correlations and corresponding confidence intervals can serve as a measure of the consistency between the item difficulty estimates. Student response data used for item invariance analyses were from the August and September 2017 administrations of the i-Ready Diagnostic. Students tested within this timeframe were subjected to the same inclusion rules that Curriculum Associates uses for new item calibration (i.e., embedded field test). This administration window was selected because it coincides with most districts’ first administration of the i-Ready Diagnostic. To ensure appropriately precise item parameter estimates, the sample was restricted to those items for which there were at least 300 students from each group (those below and those at or above the 30th percentile rank). Subgroup sample sizes and the counts of items included by grade for mathematics are presented in the table below.
Analysis | Grade | n (< 30th percentile) | n (≥ 30th percentile) | Items | Coefficient | CI |
---|---|---|---|---|---|---|
Item Invariance | K | 75,436 | 136,444 | 227 | 0.886 | [0.854, 0.911] |
Item Invariance | 1 | 106,874 | 263,264 | 383 | 0.832 | [0.798, 0.860] |
Item Invariance | 2 | 146,696 | 277,506 | 470 | 0.861 | [0.836, 0.883] |
Item Invariance | 3 | 167,020 | 315,559 | 467 | 0.849 | [0.821, 0.872] |
Item Invariance | 4 | 160,444 | 338,955 | 540 | 0.826 | [0.798, 0.851] |
Item Invariance | 5 | 163,664 | 328,824 | 603 | 0.825 | [0.798, 0.849] |
Item Invariance | 6 | 146,499 | 247,250 | 623 | 0.797 | [0.767, 0.824] |
Item Invariance | 7 | 121,737 | 215,261 | 655 | 0.788 | [0.757, 0.815] |
Item Invariance | 8 | 116,054 | 185,534 | 679 | 0.787 | [0.756, 0.814] |
Note: Counts of students include all measurement occasions and hence may include the same unique student tested more than once.
- If computer administered, how many items are in the item bank for each grade level?
- For grades K-8, typical item pool sizes are 911, 1348, 1885, 2489, 2843, 3295, 3741, 4104, and 4303, respectively. Students who perform at an extremely high level will be served with items from grade levels higher than the grade level restriction.
- If your tool is computer administered, please note how the test forms are derived instead of providing alternate forms:
- The i-Ready Diagnostic and Growth Monitoring tests are computer adaptive, meaning the items presented to each student vary depending upon how the student has responded to the previous items. Upon completion of an item randomly selected from a set of five items around a predetermined starting difficulty level, interim ability estimates are updated, and the next item is chosen relative to the new interim ability estimate. Thus, the items can better target the estimated student ability, and more information is obtained from each item presented.
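As a rough illustration of this adaptive loop, the sketch below repeatedly selects an unadministered item whose Rasch difficulty is close to the current interim ability estimate and nudges that estimate after each (simulated) response. It is a simplified stand-in for the operational algorithm: the item bank, the starting point, the update rule, and the test length here are all hypothetical, and the actual content constraints and ability-estimation method are described in the Technical Manual.

```python
# Simplified sketch of computer-adaptive item selection under a Rasch model.
import math
import random

# Hypothetical item bank: item id -> difficulty in logits.
item_bank = {f"item{i}": random.uniform(-3, 3) for i in range(200)}


def prob_correct(ability: float, difficulty: float) -> float:
    """Rasch model probability of a correct response."""
    return 1 / (1 + math.exp(-(ability - difficulty)))


def next_item(ability: float, administered: set) -> str:
    """Choose among the few unadministered items closest in difficulty to the interim ability."""
    candidates = [i for i in item_bank if i not in administered]
    closest = sorted(candidates, key=lambda i: abs(item_bank[i] - ability))[:5]
    return random.choice(closest)                 # small random pool adds variety


ability, administered = 0.0, set()
for _ in range(15):                               # a short, Growth Monitoring-length test
    item = next_item(ability, administered)
    administered.add(item)
    p = prob_correct(ability, item_bank[item])
    correct = random.random() < p                 # simulated student response
    ability += 0.4 * ((1 if correct else 0) - p)  # crude interim ability update
print(round(ability, 2))
```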
Decision Rules: Setting & Revising Goals
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- In your manual or published materials, do you specify validated decision rules for how to set and revise goals?
- No
- If yes, specify the decision rules:
-
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.
Decision Rules: Changing Instruction
Grade | Kindergarten | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 |
---|---|---|---|---|---|---|---|---|---|
Rating |
- In your manual or published materials, do you specify validated decision rules for when changes to instruction need to be made?
- No
- If yes, specify the decision rules:
-
What is the evidentiary basis for these decision rules?
NOTE: The TRC expects evidence for this standard to include an empirical study that compares a treatment group to a control and evaluates whether student outcomes increase when decision rules are in place.
Data Collection Practices
Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.