Iowa Assessments
Reading Tests, Forms E, F, G
Summary
The Iowa Assessments are a comprehensive set of measures that assess student achievement in Kindergarten through Grade 12. The tests are designed to provide a thorough assessment of a student’s progress in skills and standards that are essential to successful learning, and they provide a continuous standard score scale based on the performance of nationally representative groups of students. The tests are available for web-based or paper-based administration; three forms are available for Grades K–8 and two at the high school levels. The exceptional quality of the Iowa Assessments comes in part from a unique, collaborative test development process: the tests are written by researchers at The University of Iowa and comprise a nationally norm-referenced test series developed by independent authors who teach measurement courses, conduct research on measurement theory and practice, develop tests, and administer a statewide testing program.

Constructing meaning from print, or reading comprehension, should be the main focus of reading instruction regardless of grade level, and the Iowa Assessments Reading tests are designed with this underlying philosophy in mind. The scope and sequence provides a framework for assessing reading skills that is aligned with proven methods of instruction across the grades. The Reading tests focus on two critical strands in the process of learning to read: Reading Comprehension and Vocabulary. The scores from the Reading and Vocabulary tests together produce the Reading Total score. At the primary grades (Kindergarten through Grade 3), an optional Word Analysis test is available.
- Where to Obtain:
- Developer: Iowa Testing Programs, College of Education, The University of Iowa
- Publisher: Riverside Assessments LLC, dba Riverside Insights
- Inquiry@riversideinsights.com
- Riverside Insights, Attention Customer Service, One Pierce Place, Suite 900W, Itasca, IL 60143
- 800.323.9540
- https://www.riversideinsights.com/solutions/iowa-assessments
- Initial Cost:
- $14.50 per student
- Replacement Cost:
- Contact vendor for pricing details.
- Included in Cost:
- Initial cost is for online administration; paper-based testing cost varies by grade, version of the test, and scoring package selected. Replacement cost depends on the mode of administration. The Iowa Assessments can be administered online or on paper; paper administration uses a scannable test booklet (Grades K–2) or a reusable test booklet with a scannable answer sheet (Grades 3–12). The cost varies by mode of administration, grade level/level of the test, scoring services desired, and version of the assessment (Complete/Core/Survey). Online testing for any grade level is $14.50 per student. Machine-scorable test booklets are available in packages of 5 or 25, at an average cost per student of $9.20–$12.80. Reusable test booklets are also available in packages of 5 or 25, at an average cost per student of $10.43–$12.50. Answer documents come in packages of 25 and range from $1.88 to $2.00 per student. Braille and Large-Print versions are also available for student testing. When the Iowa Assessments are administered online, students must have access to an internet-connected computer (Mac or PC). Teachers need internet-connected computers to access DataManager, Riverside Insights' web-based system for managing test administrations and accessing score reports and ancillary materials.
- All measures were developed following Universal Design for Assessment guidelines to reduce the need for accommodations. A number of accommodations that are commonly used with students taking the Iowa Assessments are listed in the Directions for Administration. Braille (Grades 3–12) and Large-Print (Grades 1–8) editions are available; these forms require a paper-based administration.
- Training Requirements:
- 1-4 hrs of training
- Qualified Administrators:
- professional level
- Access to Technical Support:
- Help Desk via email and phone
- Assessment Format:
- Direct: Computerized
- Other: Direct: Paper-based
- Scoring Time:
- Scoring is automatic
- Scores Generated:
- Raw score
- Standard score
- Percentile score
- Grade equivalents
- Stanines
- Normal curve equivalents
- Equated
- Lexile score
- Composite scores
- Subscale/subtest scores
- Other: predicted scores are available
- Administration Time:
- 52 minutes per student
- Scoring Method:
- Automatically (computer-scored)
- Technology Requirements:
- Computer or tablet
- Internet connection
- Other technology: Printers may be used to print reports and manuals; headphones may also be needed.
- Accommodations:
- All measures were developed following Universal Design for Assessment guidelines to reduce the need for accommodations. A number of accommodations that are commonly used with students taking the Iowa Assessments are listed in the Directions for Administration. Braille (Grades 3–12) and Large-Print (Grades 1–8) editions are available; these forms require a paper-based administration.
Descriptive Information
- Please provide a description of your tool:
- The Iowa Assessments are a comprehensive set of measures that assess student achievement in Kindergarten through Grade 12. The tests are designed to provide a thorough assessment of a student’s progress in skills and standards that are essential to successful learning, and they provide a continuous standard score scale based on the performance of nationally representative groups of students. The tests are available for web-based or paper-based administration; three forms are available for Grades K–8 and two at the high school levels. The exceptional quality of the Iowa Assessments comes in part from a unique, collaborative test development process: the tests are written by researchers at The University of Iowa and comprise a nationally norm-referenced test series developed by independent authors who teach measurement courses, conduct research on measurement theory and practice, develop tests, and administer a statewide testing program. Constructing meaning from print, or reading comprehension, should be the main focus of reading instruction regardless of grade level, and the Iowa Assessments Reading tests are designed with this underlying philosophy in mind. The scope and sequence provides a framework for assessing reading skills that is aligned with proven methods of instruction across the grades. The Reading tests focus on two critical strands in the process of learning to read: Reading Comprehension and Vocabulary. The scores from the Reading and Vocabulary tests together produce the Reading Total score. At the primary grades (Kindergarten through Grade 3), an optional Word Analysis test is available.
ACADEMIC ONLY: What skills does the tool screen?
- Please describe specific domain, skills or subtests:
- BEHAVIOR ONLY: Which category of behaviors does your tool target?
- BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.
Acquisition and Cost Information
Administration
- Are norms available?
- Yes
- Are benchmarks available?
- No
- If yes, how many benchmarks per year?
- If yes, for which months are benchmarks available?
- BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
- If yes, how many students can be rated concurrently?
Training & Scoring
Training
- Is training for the administrator required?
- Yes
- Describe the time required for administrator training, if applicable:
- 1-4 hrs of training
- Please describe the minimum qualifications an administrator must possess.
- professional level
- Are training manuals and materials available?
- Yes
- Are training manuals/materials field-tested?
- No
- Are training manuals/materials included in cost of tools?
- No
- If No, please describe training costs:
- Training is provided through the Riverside Training Academy for an annual cost of $500–$1500, depending on the number of educators in the district.
- Can users obtain ongoing professional and technical support?
- Yes
- If Yes, please describe how users can obtain support:
- Help Desk via email and phone
Scoring
- Do you provide basis for calculating performance level scores?
- Yes
- Does your tool include decision rules?
- Yes
- If yes, please describe.
- Consistent with an RTI approach to screening, we recommend the 10th–20th percentile rank for the decision rule when identifying students within a grade in need of intensive intervention. However, at the request of a district or school, alternative decision rules are considered. For example, some schools elect to use a local percentile rank of 15, as well as a cut point corresponding to the “below basic” category on the state’s accountability exam.
- Can you provide evidence in support of multiple decision rules?
- Yes
- If yes, please describe.
- There are two cut scores defined on the Iowa Assessments for each school system, which correspond to the state’s lowest performance level cut and to the 15th percentile rank (PR) of the system’s state assessment score distribution. The former can be considered a cut point for moderate intervention and the latter for intensive intervention. A sketch of this two-cut-score rule follows.
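As a concrete illustration, here is a minimal Python sketch of the two-cut-score decision rule described above. The `risk_tier` helper and the cut values are hypothetical placeholders, not published Iowa Assessments cuts.

```python
# Hypothetical sketch of the two-cut-score decision rule; cut values invented.

def risk_tier(standard_score: float,
              moderate_cut: float,
              intensive_cut: float) -> str:
    """Classify a student's Iowa Assessments standard score into a risk tier.

    intensive_cut -- score at the 15th PR of the system's state assessment
                     score distribution (intensive intervention)
    moderate_cut  -- score at the state's lowest performance-level cut
                     (moderate intervention); assumed >= intensive_cut
    """
    if standard_score < intensive_cut:
        return "intensive intervention"
    if standard_score < moderate_cut:
        return "moderate intervention"
    return "not at risk"

# Example with made-up cuts for a single grade:
print(risk_tier(168, moderate_cut=185, intensive_cut=172))  # intensive intervention
```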
- Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
- As the Iowa Assessments are fixed-form tests, each form has its own raw score to scale score conversion. Raw scores are converted to national standard scores to adjust for random differences in the difficulty of the parallel forms, as well as to place scores from different grade levels onto the vertical scale to support interpretations about growth. This score scale was derived from a national standardization reflective of the entire student population. Composite scores are obtained by averaging the standard scores from specific subtests (e.g., Reading Total is the average of Reading Comprehension and Vocabulary). Standard scores are converted to national percentile ranks and grade equivalents to support additional interpretations. Scoring services are provided by Riverside Insights; score keys and Norms and Score Conversions guides can be purchased for local scoring. A sketch of this scoring flow follows.
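A minimal sketch of that scoring flow, with invented numbers: the `RAW_TO_SS_FORM_E_GRADE4_READING` lookup is a hypothetical stand-in for the form-specific conversion tables in the Norms and Score Conversions guides, and the composite is the average of subtest standard scores.

```python
# Hypothetical raw-score-to-standard-score table for one form/grade/subtest.
RAW_TO_SS_FORM_E_GRADE4_READING = {
    30: 181, 31: 183, 32: 185,  # raw score -> national standard score
}

def standard_score(raw: int, table: dict[int, int]) -> int:
    """Look up the standard score for a raw score on a specific fixed form."""
    return table[raw]

def composite(*subtest_standard_scores: float) -> float:
    """Composite score = average of the subtest standard scores."""
    return sum(subtest_standard_scores) / len(subtest_standard_scores)

reading = standard_score(31, RAW_TO_SS_FORM_E_GRADE4_READING)
vocabulary = 189  # from the Vocabulary test's own conversion table (invented)
print(composite(reading, vocabulary))  # Reading Total, e.g. 186.0
```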
- Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
- Comparative data collected at the time of standardization enable norm-referenced interpretations of student performance in addition to standards-based interpretations. It is through the standardization process that scores, scales, and norms are developed. The procedures used in the standardization of the Iowa Assessments are designed to make the norming sample reflect the national population as closely as possible, ensuring proportional representation of important groups of students. Many public and non-public schools cooperated in the National Comparison Study, and the standardization program, planned jointly by Iowa Testing Programs and the company, was carried out as a single enterprise.

The standardization sample is selected to represent the national population with respect to ability and achievement. It must be large enough to represent the diverse characteristics of the population, but a carefully selected sample of reasonable size is preferred over a larger but less carefully selected sample. Sampling units are chosen primarily on the basis of district size, region of the country, and socioeconomic characteristics as determined by the school’s Title I status and percent of students eligible for free and reduced-price lunch, with a balance between public and non-public schools. To ensure the applicability of norms to all students, testing accommodations for students who require them are a regular part of the standard administrative conditions, as designated in a student’s Individualized Education Program (IEP) and in the accommodation practices of the participating schools.

Additionally, various approaches to understanding group differences in test scores are a regular part of research and test development efforts for the Iowa Assessments. To ensure that assessment materials are appropriate and fair for different groups, careful test development procedures are followed: sensitivity reviews by content and fairness committees and extensive statistical analyses of the items and tests are conducted. The precision of measurement for important groups in the National Comparison Study is evaluated when examining the measurement characteristics of the tests. Differences between groups in average performance and in the variability of performance are also of interest, and these are examined for changes over time. In addition to descriptions of group differences in test performance, analyses of differential item functioning are undertaken with results from the national item tryouts as well as with results from the National Comparison Study.
Technical Standards
Classification Accuracy & Cross-Validation Summary
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Classification Accuracy Fall | | | | | | | | | | | | |
Classification Accuracy Winter | | | | | | | | | | | | |
Classification Accuracy Spring | | | | | | | | | | | | |
ACT
Classification Accuracy
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- The criterion is 100% independent of the Iowa Assessments: the two tests are published by separate organizations and for slightly different purposes. For Grades 6–12, students' ACT scores served as the primary criterion. The ACT, generally taken during Grades 11–12, is commonly used by school districts to gauge their students’ level of college and career readiness, as well as by postsecondary institutions for admissions and placement decisions. The ACT is designed to measure knowledge and skills in four core academic content areas: English, mathematics, reading, and science. Reading is used here because, like the Reading test on the Iowa Assessments, it is a measure of general reading comprehension that requires students to refer to what was explicitly stated, reason to determine implicit meanings, determine main ideas, understand sequences of events, and draw generalizations, among other things. Both tests offer diagnostic information in terms of a student’s strengths and weaknesses.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Cut points were defined to identify students in need of intensive intervention. This was set to the 10th PR on the ACT Reading test. In the state of Iowa, this corresponds to an ACT Reading score of 14, while scores on the Iowa Assessments will vary by grade. Note that state PRs were used, not PRs that are specific to the matched sample. Classification accuracy and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_2020-06-30.pdf for more details). Students needing intensive intervention based on their ACT score were contrasted with those not in need of intensive intervention.
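The classification statistics reported in the tables that follow all derive from the four cell counts of the 2×2 table: true positives (a), false positives (b), false negatives (c), and true negatives (d). A minimal Python sketch, using the Grade 3 fall counts from the Classification Accuracy tables later in this section:

```python
# Grade 3 fall counts from the Classification Accuracy - Fall table below.
a, b, c, d = 307, 966, 40, 3982  # TP, FP, FN, TN

sensitivity = a / (a + c)          # hit rate among truly at-risk students
specificity = d / (b + d)          # correct rejections among not-at-risk students
ppp = a / (a + b)                  # positive predictive power
npp = d / (c + d)                  # negative predictive power
base_rate = (a + c) / (a + b + c + d)
overall = (a + d) / (a + b + c + d)

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"PPP={ppp:.2f} NPP={npp:.2f} base rate={base_rate:.2f} "
      f"overall={overall:.2f}")
# sensitivity=0.88 specificity=0.80 PPP=0.24 NPP=0.99 base rate=0.07 overall=0.81
```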
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- No
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Cross-Validation
- Has a cross-validation study been conducted?
- No
- If yes,
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Iowa Statewide Assessment of Student Progress (ISASP)
Classification Accuracy
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- The criterion is 100% independent of the Iowa Assessments: the tests are published by separate organizations and for different purposes. Whereas the Iowa Assessments are nationally administered achievement tests published by Riverside Insights, the ISASP is published independently by Pearson and available only in the state of Iowa. However, both assessments measure general reading comprehension and are aligned to core standards.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Cut points were defined to identify students in need of intensive intervention. This was set to the 10th PR on the ISASP in the state of Iowa. As these samples vary by grade, the ISASP cut score and the Iowa Assessment score vary by grade as well. Note that state PRs were used, not PRs that are specific to the matched sample. Classification accuracy and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_2020-06-30.pdf for more details). Students needing intensive intervention based on their ISASP score were contrasted with those not in need of intensive intervention.
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- No
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Cross-Validation
- Has a cross-validation study been conducted?
- No
- If yes,
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
FAST
Classification Accuracy
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- The criterion is 100% independent of the Iowa Assessments: the two tests are produced by separate organizations and for slightly different purposes. In Grades 1–5, the grade-specific FAST aReading score served as a criterion. The FAST aReading assessment is a computer-adaptive measure of broad reading ability for Grades K–12. aReading is designed for universal screening to identify students at risk for academic delays and to differentiate instruction for all students. It targets phonological awareness, orthography and morphology, vocabulary, concepts of print, and phonics.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Cut points were defined to identify students in need of intensive intervention. This was set to either the 10th PR or 20th PR on the FAST in this district, depending on grade. Classification accuracy and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_2020-06-30.pdf for more details). Students needing intensive intervention based on their FAST score were contrasted with those not in need of intensive intervention. The grade-specific FAST aReading score was used with the corresponding Reading score on the Iowa Assessments in each grade.
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- No
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Cross-Validation
- Has a cross-validation study been conducted?
- No
- If yes,
- Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
- Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
- Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
- Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
- If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Classification Accuracy - Fall
Evidence | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Criterion measure | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | ACT | ACT | ACT | ACT | ACT | ACT | ACT |
Cut Points - Percentile rank on criterion measure | 20 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
Cut Points - Performance score on criterion measure | ||||||||||||
Cut Points - Corresponding performance score (numeric) on screener measure | 136 | 154 | 170 | 181 | 192 | 217 | 232 | 247 | 257 | 263 | 279 | 279 |
Classification Data - True Positive (a) | 52 | 66 | 307 | 327 | 300 | 217 | 232 | 197 | 235 | 183 | 227 | 156 |
Classification Data - False Positive (b) | 52 | 338 | 966 | 816 | 866 | 1567 | 1502 | 1279 | 1882 | 1569 | 1723 | 1138 |
Classification Data - False Negative (c) | 67 | 15 | 40 | 71 | 68 | 53 | 53 | 42 | 52 | 45 | 56 | 35 |
Classification Data - True Negative (d) | 432 | 1357 | 3982 | 4299 | 4332 | 6274 | 6412 | 5917 | 7533 | 6282 | 7958 | 4553 |
Area Under the Curve (AUC) | 0.77 | 0.88 | 0.91 | 0.90 | 0.91 | 0.87 | 0.88 | 0.91 | 0.89 | 0.88 | 0.89 | 0.88 |
AUC Estimate’s 95% Confidence Interval: Lower Bound | 0.72 | 0.86 | 0.90 | 0.89 | 0.89 | 0.86 | 0.87 | 0.89 | 0.88 | 0.87 | 0.87 | 0.86 |
AUC Estimate’s 95% Confidence Interval: Upper Bound | 0.82 | 0.91 | 0.92 | 0.92 | 0.92 | 0.89 | 0.90 | 0.92 | 0.91 | 0.90 | 0.91 | 0.90 |
Statistics | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Base Rate | 0.20 | 0.05 | 0.07 | 0.07 | 0.07 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 | 0.03 |
Overall Classification Rate | 0.80 | 0.80 | 0.81 | 0.84 | 0.83 | 0.80 | 0.81 | 0.82 | 0.80 | 0.80 | 0.82 | 0.80 |
Sensitivity | 0.44 | 0.81 | 0.88 | 0.82 | 0.82 | 0.80 | 0.81 | 0.82 | 0.82 | 0.80 | 0.80 | 0.82 |
Specificity | 0.89 | 0.80 | 0.80 | 0.84 | 0.83 | 0.80 | 0.81 | 0.82 | 0.80 | 0.80 | 0.82 | 0.80 |
False Positive Rate | 0.11 | 0.20 | 0.20 | 0.16 | 0.17 | 0.20 | 0.19 | 0.18 | 0.20 | 0.20 | 0.18 | 0.20 |
False Negative Rate | 0.56 | 0.19 | 0.12 | 0.18 | 0.18 | 0.20 | 0.19 | 0.18 | 0.18 | 0.20 | 0.20 | 0.18 |
Positive Predictive Power | 0.50 | 0.16 | 0.24 | 0.29 | 0.26 | 0.12 | 0.13 | 0.13 | 0.11 | 0.10 | 0.12 | 0.12 |
Negative Predictive Power | 0.87 | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Sample | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | Fall 2016 | Fall 2017 | Fall 2017 | Fall 2017 | Fall 2017 | Fall 2002 | Fall 2003 | Fall 2004 | Fall 2005 | Fall 2006 | Fall 2007 | Fall 2008 |
Sample Size | 603 | 1776 | 5295 | 5513 | 5566 | 8111 | 8199 | 7435 | 9702 | 8079 | 9964 | 5882 |
Geographic Representation | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) |
Male | 40.3% | 50.6% | 51.8% | 50.9% | 51.1% | 46.3% | 45.5% | 46.4% | 45.9% | 46.5% | 47.6% | 46.1% |
Female | 35.0% | 51.2% | 48.9% | 49.6% | 49.0% | 55.7% | 54.4% | 53.6% | 54.2% | 54.3% | 55.2% | 54.5% |
Other | ||||||||||||
Gender Unknown | 0.1% | 0.1% | ||||||||||
White, Non-Hispanic | 67.5% | 82.6% | 79.6% | 78.7% | 79.3% | 89.9% | 90.7% | 93.0% | 94.3% | 95.7% | 95.9% | 92.7% |
Black, Non-Hispanic | 2.0% | 2.4% | 5.8% | 6.7% | 5.3% | 3.0% | 2.8% | 2.1% | 1.4% | 1.5% | 2.1% | 2.3% |
Hispanic | 5.6% | 11.5% | 8.4% | 9.0% | 8.6% | 2.2% | 2.2% | 2.0% | 2.0% | 1.4% | 2.1% | 2.6% |
Asian/Pacific Islander | ||||||||||||
American Indian/Alaska Native | ||||||||||||
Other | 0.2% | 5.2% | 6.9% | 6.2% | 6.9% | 5.2% | 4.1% | 2.4% | 2.4% | 2.2% | 2.6% | 3.0% |
Race / Ethnicity Unknown | 1.8% | 0.2% | 0.6% | 0.0% | 0.0% | 0.0% | ||||||
Low SES | 18.7% | 34.0% | 33.8% | 34.2% | 32.8% | 13.3% | 13.3% | 12.3% | 11.2% | 10.0% | 11.1% | 12.0% |
IEP or diagnosed disability | 3.2% | 9.2% | 11.7% | 11.2% | 11.8% | 2.8% | 3.0% | 2.6% | 2.4% | 1.9% | 1.5% | 1.7% |
English Language Learner | 2.0% | 3.1% | 4.3% | 4.0% | 3.5% | 1.3% | 1.2% | 0.5% | 0.7% | 0.4% | 0.5% | 0.6% |
Classification Accuracy - Winter
Evidence | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Criterion measure | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | Iowa Statewide Assessment of Student Progress (ISASP) | ACT | ACT | ACT | ACT | ACT | ACT | ACT |
Cut Points - Percentile rank on criterion measure | 20 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
Cut Points - Performance score on criterion measure | ||||||||||||
Cut Points - Corresponding performance score (numeric) on screener measure | 142 | 155 | 172 | 185 | 200 | 220 | 232 | 247 | 253 | 263 | 279 | 279 |
Classification Data - True Positive (a) | 51 | 161 | 561 | 680 | 668 | 103 | 139 | 152 | 124 | 128 | 126 | 86 |
Classification Data - False Positive (b) | 56 | 734 | 1683 | 1762 | 1812 | 780 | 848 | 867 | 626 | 838 | 771 | 601 |
Classification Data - False Negative (c) | 40 | 41 | 121 | 148 | 148 | 23 | 34 | 33 | 28 | 29 | 29 | 19 |
Classification Data - True Negative (d) | 442 | 3007 | 8507 | 8730 | 8771 | 3481 | 4579 | 4858 | 3630 | 3352 | 3512 | 2445 |
Area Under the Curve (AUC) | 0.86 | 0.89 | 0.91 | 0.91 | 0.90 | 0.90 | 0.89 | 0.90 | 0.90 | 0.90 | 0.90 | 0.88 |
AUC Estimate’s 95% Confidence Interval: Lower Bound | 0.80 | 0.87 | 0.90 | 0.90 | 0.89 | 0.87 | 0.87 | 0.88 | 0.88 | 0.88 | 0.88 | 0.85 |
AUC Estimate’s 95% Confidence Interval: Upper Bound | 0.91 | 0.91 | 0.92 | 0.92 | 0.91 | 0.92 | 0.91 | 0.92 | 0.92 | 0.92 | 0.91 | 0.91 |
Statistics | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Base Rate | 0.15 | 0.05 | 0.06 | 0.07 | 0.07 | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 | 0.03 | 0.03 |
Overall Classification Rate | 0.84 | 0.80 | 0.83 | 0.83 | 0.83 | 0.82 | 0.84 | 0.85 | 0.85 | 0.80 | 0.82 | 0.80 |
Sensitivity | 0.56 | 0.80 | 0.82 | 0.82 | 0.82 | 0.82 | 0.80 | 0.82 | 0.82 | 0.82 | 0.81 | 0.82 |
Specificity | 0.89 | 0.80 | 0.83 | 0.83 | 0.83 | 0.82 | 0.84 | 0.85 | 0.85 | 0.80 | 0.82 | 0.80 |
False Positive Rate | 0.11 | 0.20 | 0.17 | 0.17 | 0.17 | 0.18 | 0.16 | 0.15 | 0.15 | 0.20 | 0.18 | 0.20 |
False Negative Rate | 0.44 | 0.20 | 0.18 | 0.18 | 0.18 | 0.18 | 0.20 | 0.18 | 0.18 | 0.18 | 0.19 | 0.18 |
Positive Predictive Power | 0.48 | 0.18 | 0.25 | 0.28 | 0.27 | 0.12 | 0.14 | 0.15 | 0.17 | 0.13 | 0.14 | 0.13 |
Negative Predictive Power | 0.92 | 0.99 | 0.99 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Sample | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | Winter 2016-2017 | Winter 2017-18 | Winter 2017-18 | Winter 2017-18 | Winter 2017-18 | Winter 2002 | Winter 2003 | Winter 2004 | Winter 2005 | Winter 2006 | Winter 2007 | Winter 2008 |
Sample Size | 589 | 3943 | 10872 | 11320 | 11399 | 4387 | 5600 | 5910 | 4408 | 4347 | 4438 | 3151 |
Geographic Representation | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) |
Male | 46.7% | 52.4% | 52.2% | 51.6% | 51.4% | 45.2% | 45.5% | 45.3% | 44.6% | 44.6% | 54.2% | 44.9% |
Female | 42.8% | 48.2% | 48.1% | 48.6% | 48.7% | 54.1% | 54.4% | 54.7% | 55.4% | 55.9% | 68.1% | 55.7% |
Other | ||||||||||||
Gender Unknown | 0.6% | 0.1% | ||||||||||
White, Non-Hispanic | 81.7% | 86.4% | 81.4% | 81.3% | 82.6% | 92.0% | 91.4% | 93.9% | 92.2% | 93.7% | 114.4% | 93.0% |
Black, Non-Hispanic | 2.0% | 0.9% | 2.3% | 2.1% | 2.1% | 0.6% | 1.4% | 1.7% | 1.5% | 1.6% | 2.1% | 2.2% |
Hispanic | 4.4% | 9.0% | 10.7% | 10.6% | 10.0% | 1.2% | 2.0% | 2.6% | 2.7% | 3.3% | 3.6% | 3.2% |
Asian/Pacific Islander | ||||||||||||
American Indian/Alaska Native | ||||||||||||
Other | 3.1% | 4.4% | 5.9% | 6.2% | 5.5% | 3.2% | 2.8% | 1.7% | 1.5% | 1.8% | 2.3% | 2.3% |
Race / Ethnicity Unknown | 0.3% | 3.1% | 2.4% | 0.0% | 2.0% | |||||||
Low SES | 36.7% | 42.1% | 37.9% | 38.3% | 38.2% | 12.3% | 12.9% | 13.3% | 14.2% | 15.4% | 16.5% | 14.6% |
IEP or diagnosed disability | 8.7% | 13.5% | 12.3% | 12.4% | 12.3% | 2.5% | 2.5% | 2.2% | 1.9% | 2.0% | 1.9% | 2.0% |
English Language Learner | 1.4% | 4.7% | 5.4% | 5.0% | 4.2% | 0.5% | 1.0% | 0.7% | 1.0% | 0.6% | 0.7% | 0.8% |
Classification Accuracy - Spring
Evidence | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Criterion measure | FAST | FAST | FAST | FAST | FAST | ACT | ACT | ACT | ACT | ACT | ACT | ACT |
Cut Points - Percentile rank on criterion measure | 10 | 10 | 20 | 20 | 20 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
Cut Points - Performance score on criterion measure | ||||||||||||
Cut Points - Corresponding performance score (numeric) on screener measure | 138 | 151 | 162 | 175 | 187 | 231 | 241 | 247 | 253 | 263 | 276 | 272 |
Classification Data - True Positive (a) | 82 | 107 | 169 | 166 | 103 | 62 | 50 | 91 | 75 | 78 | 59 | 51 |
Classification Data - False Positive (b) | 131 | 132 | 102 | 92 | 180 | 434 | 403 | 393 | 316 | 387 | 298 | 228 |
Classification Data - False Negative (c) | 14 | 22 | 33 | 28 | 22 | 12 | 12 | 22 | 16 | 19 | 14 | 11 |
Classification Data - True Negative (d) | 692 | 732 | 783 | 858 | 798 | 1739 | 1615 | 2397 | 1779 | 1549 | 1302 | 1116 |
Area Under the Curve (AUC) | 0.90 | 0.91 | 0.93 | 0.96 | 0.90 | 0.88 | 0.88 | 0.91 | 0.92 | 0.88 | 0.87 | 0.88 |
AUC Estimate’s 95% Confidence Interval: Lower Bound | 0.87 | 0.89 | 0.91 | 0.94 | 0.88 | 0.84 | 0.85 | 0.88 | 0.89 | 0.85 | 0.84 | 0.84 |
AUC Estimate’s 95% Confidence Interval: Upper Bound | 0.93 | 0.93 | 0.95 | 0.97 | 0.93 | 0.92 | 0.91 | 0.93 | 0.94 | 0.91 | 0.90 | 0.92 |
Statistics | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Base Rate | 0.10 | 0.13 | 0.19 | 0.17 | 0.11 | 0.03 | 0.03 | 0.04 | 0.04 | 0.05 | 0.04 | 0.04 |
Overall Classification Rate | 0.84 | 0.84 | 0.88 | 0.90 | 0.82 | 0.80 | 0.80 | 0.86 | 0.85 | 0.80 | 0.81 | 0.83 |
Sensitivity | 0.85 | 0.83 | 0.84 | 0.86 | 0.82 | 0.84 | 0.81 | 0.81 | 0.82 | 0.80 | 0.81 | 0.82 |
Specificity | 0.84 | 0.85 | 0.88 | 0.90 | 0.82 | 0.80 | 0.80 | 0.86 | 0.85 | 0.80 | 0.81 | 0.83 |
False Positive Rate | 0.16 | 0.15 | 0.12 | 0.10 | 0.18 | 0.20 | 0.20 | 0.14 | 0.15 | 0.20 | 0.19 | 0.17 |
False Negative Rate | 0.15 | 0.17 | 0.16 | 0.14 | 0.18 | 0.16 | 0.19 | 0.19 | 0.18 | 0.20 | 0.19 | 0.18 |
Positive Predictive Power | 0.38 | 0.45 | 0.62 | 0.64 | 0.36 | 0.13 | 0.11 | 0.19 | 0.19 | 0.17 | 0.17 | 0.18 |
Negative Predictive Power | 0.98 | 0.97 | 0.96 | 0.97 | 0.97 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Sample | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Date | Spring 2017 | Spring 2017 | Spring 2017 | Spring 2017 | Spring 2017 | Spring 2017 | Spring 2004 | Spring 2005 | Spring 2006 | Spring 2007 | Spring 2008 | Spring 2009 |
Sample Size | 919 | 993 | 1087 | 1144 | 1103 | 2247 | 2080 | 2903 | 2186 | 2033 | 1673 | 1406 |
Geographic Representation | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) | West North Central (IA) |
Male | 49.9% | 50.2% | 50.8% | 51.9% | 51.3% | 49.1% | 46.9% | 44.5% | 44.7% | 45.5% | 65.3% | 44.9% |
Female | 50.5% | 49.8% | 50.1% | 48.9% | 49.6% | 57.9% | 55.1% | 55.5% | 55.3% | 56.3% | 80.0% | 55.1% |
Other | ||||||||||||
Gender Unknown | 0.1% | 0.4% | ||||||||||
White, Non-Hispanic | 54.8% | 53.8% | 53.3% | 53.8% | 54.3% | 98.0% | 97.4% | 90.6% | 89.3% | 89.7% | 129.0% | 89.1% |
Black, Non-Hispanic | 27.0% | 27.9% | 29.4% | 26.8% | 28.6% | 0.7% | 0.7% | 4.3% | 5.3% | 5.8% | 8.1% | 4.9% |
Hispanic | 14.1% | 13.9% | 14.0% | 14.9% | 14.1% | 1.4% | 1.1% | 1.9% | 2.2% | 2.5% | 3.6% | 2.5% |
Asian/Pacific Islander | ||||||||||||
American Indian/Alaska Native | ||||||||||||
Other | 4.4% | 4.3% | 4.1% | 4.9% | 3.7% | 4.8% | 2.4% | 3.2% | 3.2% | 3.6% | 4.5% | 3.5% |
Race / Ethnicity Unknown | 0.1% | 0.1% | 0.1% | 0.3% | 0.1% | 2.3% | 0.9% | 0.2% | ||||
Low SES | 58.2% | 59.5% | 59.0% | 54.0% | 55.5% | 14.6% | 13.5% | 17.6% | 20.3% | 20.6% | 26.7% | 18.7% |
IEP or diagnosed disability | 19.4% | 19.9% | 20.6% | 20.6% | 20.9% | 2.6% | 2.5% | 3.5% | 3.5% | 3.4% | 3.9% | 2.6% |
English Language Learner | 3.5% | 3.2% | 3.2% | 3.3% | 3.4% | 0.8% | 0.2% | 1.1% | 3.0% | 1.2% | 1.3% | 1.3% |
Reliability
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Rating | d | d | d | d | d | d | d | d | | | | |
- *Offer a justification for each type of reliability reported, given the type and purpose of the tool.
- The model-based reliability estimates the proportion of variability within a single administration of a test that is due to inconsistency among the items that comprise the test. This is akin to commonly reported internal consistency measures (e.g., Cronbach’s alpha, KR-20).
- *Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
- Data come from a national probability sample that was weighted to match the characteristics of the national student population. Table 2.1 presents the counts for the national probability sample. Table 2.1.A presents the percentages of students by school type; Tables 2.1.B and 2.1.C present the percentages by geographic region for public and private schools, respectively; and Table 2.1.D presents the racial-ethnic representation. Note that Tables 2.1.A–D include Grades K–12.
- *Describe the analysis procedures for each reported type of reliability.
- Reliability is defined as the proportion of test score variance that is attributable to true variation in the trait the test measures. The model-based reliability coefficients were estimated from the item-response records in the standardization sample under classical test theory. This national sample was weighted to be reflective of the national student population and used to calculate reliability under an essentially tau-equivalent model. The 95% confidence interval for each reliability estimate was computed using Feldt's (1965) procedure. Although not summarized here, the Iowa Assessments Research and Development Guide includes several other types of reliability information, including split-half reliability, test-retest reliability, alternate-forms reliability, and conditional standard errors of measurement (CSEMs). Another study examined the comparability of paper-based and computer-based administrations of the Iowa Assessments. In this study, the same students took the test in both administration modes; the order of testing modes was counterbalanced, and an interval of one to two weeks separated the two administrations. Correlations between scores in different modes can be interpreted as estimates of test-retest reliability. While the mode of administration does represent an additional source of variation in these scores, high correlations constitute evidence that the combined effects of temporal changes in examinees and administrative conditions are small.
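A minimal sketch of this analysis, assuming coefficient alpha as the essentially tau-equivalent (model-based) estimate and Feldt's (1965) F-distribution interval; the toy data, sample size, and item count below are invented for illustration.

```python
import numpy as np
from scipy import stats

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha under an essentially tau-equivalent model.
    scores: (n_examinees, k_items) matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def feldt_ci(alpha: float, n: int, k: int, level: float = 0.95):
    """Feldt (1965) CI: (1 - alpha_true) / (1 - alpha_hat) ~ F(n-1, (n-1)(k-1))."""
    df1, df2 = n - 1, (n - 1) * (k - 1)
    lo_f = stats.f.ppf((1 - level) / 2, df1, df2)
    hi_f = stats.f.ppf(1 - (1 - level) / 2, df1, df2)
    return 1 - (1 - alpha) * hi_f, 1 - (1 - alpha) * lo_f

# Toy data: a common true score plus item-level noise for 500 examinees, 40 items.
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 1)) + rng.normal(scale=0.8, size=(500, 40))
a = cronbach_alpha(x)
print(a, feldt_ci(a, n=500, k=40))
```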
*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).
Type of Reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
- Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- Yes
If yes, fill in data for each subgroup with disaggregated reliability data.
Type of Reliability | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|
- Results from other forms of reliability analysis not compatible with above table format:
- Manual cites other published reliability studies:
- No
- Provide citations for additional published studies.
Validity
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Rating | | | | | | | | | | | | |
- *Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
- Generally speaking, the better a test measures its intended construct and supports its intended interpretations, the stronger its validity is said to be. Validity evidence comes from a variety of sources, including reviews of the adequacy and coverage of a test’s content, the ability of test scores to predict performance on other measures and provide diagnostic information, and the ability to draw accurate inferences about the current status of a test taker with respect to a given construct. The Technical Manuals for the Iowa Assessments summarize validity evidence from a variety of sources while also providing test blueprints and test development frameworks, including Universal Design, so that potential users may review and evaluate the suitability of the Iowa Assessments for their specific needs.

The validity evidence presented here is based on the relationships between scores on the Iowa Assessments and state content-aligned accountability tests or other large-scale achievement tests. These empirical relationships, summarized with correlations, include concurrent, predictive, and construct evidence. Concurrent evidence comes from administrations in which the criterion test is taken in close proximity to the Iowa Assessments, usually about a month apart. Predictive evidence comes from correlations between scores on the Iowa Assessments and scores from an external measure taken 3 months to 6 years after the student took the Iowa Assessments. In addition, construct validity evidence comes from the degree and stability of the relationship of Iowa Assessments scores with ISASP scores. Whereas the Iowa Assessments are nationally administered achievement tests published by Riverside Insights, the ISASP is published independently by Pearson and available only in the state of Iowa. Because both assessments are measures of general achievement, are aligned to core standards, and, in the Grades 3–8 samples, were mandated by the state at the time of administration, the correlations between them provide evidence of construct validity for the Iowa Assessments and their underlying ability scale.

In Grades 1–5, one criterion was the grade-specific FAST aReading score. The FAST aReading assessment is a computer-adaptive measure of broad reading ability for Grades K–12, designed for universal screening to identify students at risk for academic delay. It targets phonological awareness, orthography and morphology, vocabulary, concepts of print, and phonics. Like the Iowa Assessments, FAST is designed to identify students' strengths and weaknesses, monitor student growth in reading, and inform instruction.

Scores on the ACT served as a criterion in Grades 6–12. The ACT is generally taken during Grades 11–12 and is commonly used by school districts to gauge their students' level of college and career readiness, as well as by postsecondary institutions for admissions and placement decisions. The ACT is designed to measure knowledge and skills in four core academic content areas: English, mathematics, reading, and science. Reading is used here because, like the Reading test on the Iowa Assessments, it is a measure of general reading comprehension that requires students to refer to what was explicitly stated, reason to determine implicit meanings, determine main ideas, understand sequences of events, and draw generalizations, among other similarities.
The State of Texas Assessments of Academic Readiness, or STAAR®, is the state testing program that was implemented in the 2011–2012 school year. The Texas Education Agency (TEA), in collaboration with the Texas Higher Education Coordinating Board (THECB) and Texas educators, developed the STAAR program in response to requirements set forth by the 80th and 81st Texas legislatures. STAAR is an assessment program designed to measure the extent to which students have learned and are able to apply the knowledge and skills defined in the state-mandated curriculum standards, the Texas Essential Knowledge and Skills. Predictive validity coefficients with the Texas STAAR are also presented for some grades.
- *Describe the sample(s), including size and characteristics, for each validity analysis conducted.
- The FAST samples come from one of Iowa's largest and most diverse districts and represent the full range of student achievement. Sample sizes were generally above 1,000; other characteristics of each grade are presented in the Classification Accuracy Summary tables. FAST scores come from the Spring 2018 semester. The ACT samples comprise the same students presented in the Classification Accuracy summary tables. Each sample represents potential college-bound students and demonstrates how, even in middle school, the development of student achievement is strongly related to students' later readiness for college. Samples were representative of the student population and always above 10,000 when examining predictive validity; for concurrent validity in Grades 9–11, sample sizes ranged from 70 to over 1,000. Some predictive validity coefficients come from the Iowa Assessments and Texas STAAR prediction study, in which prediction equations were developed. Results from the Iowa Assessments administration in January 2017 for Grades 3 through 5 and the STAAR assessment in March and May 2017 were used; the Texas STAAR samples had over 4,000 students in each grade (see report for additional details). Construct-related validity coefficients come from correlations between the Iowa Assessments, taken during the 2017–2018 school year, and the ISASP, taken during the 2018–2019 school year. Except for Grade 2, these statewide samples contain over 35,000 students and are representative of the full student population.
- *Describe the analysis procedures for each reported type of validity.
- For each type of validity, correlations between scores on the Iowa Assessments and the criterion were estimated and the 95% confidence interval computed using the Fisher z-transformation.
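A minimal sketch of that computation; `fisher_ci` is a hypothetical helper, and the correlation and sample size shown are illustrative rather than values from the studies described above.

```python
import math
from scipy import stats

def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a correlation via the Fisher z-transformation."""
    z = math.atanh(r)                      # z = 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)            # standard error of z
    crit = stats.norm.ppf(1 - (1 - level) / 2)
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

print(fisher_ci(0.75, n=1000))  # roughly (0.72, 0.78)
```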
*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.
Type of Validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- Yes
- Provide citations for additional published studies.
- Fina, A. (2014). Growth and college readiness of Iowa students: A longitudinal study linking growth to college outcomes. Unpublished doctoral dissertation, The University of Iowa, Iowa City. Retrieved from http://ir.uiowa.edu/etd/1455
- Fina, A., Welch, C. J., Dunbar, S. B., and Ansley, T. N. (2015). College readiness with the Iowa Assessments. Iowa City, IA: Iowa Testing Programs. Retrieved from https://itp.education.uiowa.edu/ia/WhitePapers.aspx
- Furgol, K., Fina, A., and Welch, C. J. (2011). Establishing validity evidence to assess college readiness through a vertical scale. Iowa City, IA: Iowa Testing Programs. Retrieved from https://itp.education.uiowa.edu/ia/LinkToResearch.aspx
- Kapoor, S. (2014). Growth sensitivity and standardized assessments: New evidence on the relationship. Unpublished doctoral dissertation, The University of Iowa, Iowa City. Retrieved from http://ir.uiowa.edu/etd/1472
- Rankin, A. D., LaFond, L., Welch, C. J., and Dunbar, S. B. (2013). Fairness report for the Iowa Assessments. Iowa City, IA: Iowa Testing Programs. Retrieved from https://itp.education.uiowa.edu/ia/Research.aspx
- Tudor, J. (2015). Developing a national frame of reference on student achievement by weighing student records from a state assessment. Unpublished doctoral dissertation, The University of Iowa, Iowa City.
- Wang, M., Chen, K., and Welch, C. J. (2012). Evaluating college readiness for English language learners and Hispanic and Asian students. Paper presented at the annual meeting of the American Educational Research Association, Vancouver. Retrieved from https://itp.education.uiowa.edu/ia/LinkToResearch.aspx
- Welch, C. J., and Dunbar, S. B. (2011). K–12 assessments and college readiness: Necessary validity evidence for educators, teachers and parents. Paper presented at the annual meeting of the American Educational Research Association, New Orleans. Retrieved from https://itp.education.uiowa.edu/ia/LinkToResearch.aspx
- Welch, C. J., and Dunbar, S. B. (2014a). Comparative evaluation of online and paper and pencil forms of the Iowa Assessments. Iowa City, IA: Iowa Testing Programs. Retrieved from https://itp.education.uiowa.edu/ia/LinkToResearch.aspx
- Welch, C. J., and Dunbar, S. B. (2014b). Measuring growth with the Iowa Assessments. Iowa Testing Programs Black and Gold Paper 01–14. Iowa City, IA: Iowa Testing Programs. Retrieved from https://itp.education.uiowa.edu/ia/WhitePapers.aspx
- Describe the degree to which the provided data support the validity of the tool.
- The evidence presented above strongly supports the use of the Iowa Assessments as a screening and intervention tool. For example, even though FAST has an adaptive platform while the Iowa Assessments used in this study are paper-based, validity coefficients with FAST were high (.70-.80). With the ACT, the strong and consistent relationships observed between scores on the Iowa Assessments and the ACT are remarkable. The validity coefficients between the Iowa Assessments and the Texas STAAR were high as well, as were the correlations between the Iowa Assessments and ISASP. This validity evidence demonstrates consistently strong relationships between the Iowa Assessments and the criterion measures, across all grades and times of year reported.
- Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
- No
If yes, fill in data for each subgroup with disaggregated validity data.
Type of Validity | Subgroup | Informant | Age / Grade | Test or Criterion | n | Median Coefficient | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
---|---|---|---|---|---|---|---|---|
- Results from other forms of validity analysis not compatible with above table format:
- Manual cites other published validity studies:
- No
- Provide citations for additional published studies.
Bias Analysis
Grade | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade 10 | Grade 11 | Grade 12 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Rating | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
- Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
- Yes
- If yes,
- a. Describe the method used to determine the presence or absence of bias:
- Sensitivity review: Item writers are instructed to write questions in contexts accessible to students with a variety of backgrounds and interests, and a goal of all test development in Iowa Testing Programs is to assemble test materials that reflect the diversity of the test-taking population in the United States. Reviewers are given information about the purposes of the tests, the content areas, and the cognitive classifications. They are asked to look for possible racial-ethnic, regional, cultural, or gender biases in the way an item is written or in the information required to answer the question. The reviewers rate items as “probably fair,” “possibly unfair,” or “probably unfair,” comment on the balance of the items, and make recommendations for change. Based on these reviews, items identified by the reviewers as problematic are either revised to eliminate objectionable features or eliminated from consideration for the final forms.

Differential Item Functioning (DIF): DIF analysis identifies items that function differently for two groups of examinees with the same total test score. In many cases, one group will be more likely to answer an item correctly on average than another group; these differences might be due to differing levels of knowledge and skills between the groups. DIF analyses take these group differences into account and help identify items that might unfairly favor one group over another. Items identified as potentially unfair by DIF are then presented for additional review. The statistical analyses of items for DIF are based on variants of the Mantel-Haenszel (MH) procedure (Dorans and Holland, 1993); a sketch of this statistic follows. Specific item-level comparisons of performance are made for groups of males and females, Blacks and Whites, and Hispanics and Whites. The number of items identified as favoring a given group, according to the classification scheme used by the Educational Testing Service (ETS) for the National Assessment of Educational Progress (NAEP), is shown in Table 5.1 below.

Differential Test Functioning (DTF): A series of logistic regressions predicts success on an end-of-year outcome measure (i.e., Georgia Milestones and Texas STAAR) from risk status as determined by the Iowa Assessments, membership in a gender or ethnicity group, and an interaction term between the two variables. The presence or absence of bias is determined based on the statistical significance of the interaction term.
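A minimal sketch of the Mantel-Haenszel DIF statistic described above, expressed on the ETS delta (MH D-DIF) scale with the A/B/C thresholds. This simplified version omits the significance tests used in the full ETS classification rules, and the toy data are invented.

```python
import numpy as np

def mh_dif(item: np.ndarray, group_ref: np.ndarray, total: np.ndarray):
    """Mantel-Haenszel common odds ratio across total-score strata, plus the
    ETS delta (MH D-DIF) metric used for A/B/C DIF classification.

    item      -- 0/1 item scores
    group_ref -- True for reference-group examinees, False for focal group
    total     -- total test score used as the matching variable
    """
    num = den = 0.0
    for t in np.unique(total):
        s = total == t
        a = item[s & group_ref].sum()           # reference group, correct
        b = (1 - item[s & group_ref]).sum()     # reference group, incorrect
        c = item[s & ~group_ref].sum()          # focal group, correct
        d = (1 - item[s & ~group_ref]).sum()    # focal group, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    alpha_mh = num / den                        # common odds ratio
    delta = -2.35 * np.log(alpha_mh)            # ETS MH D-DIF scale
    if abs(delta) < 1.0:
        category = "A (negligible DIF)"
    elif abs(delta) < 1.5:
        category = "B (moderate DIF)"
    else:
        category = "C (large DIF)"
    return alpha_mh, delta, category

# Toy example: item behavior depends only on total score, so DIF should be
# negligible (category A).
rng = np.random.default_rng(1)
total = rng.integers(0, 41, size=2000)
group_ref = rng.random(2000) < 0.5
p_correct = 1 / (1 + np.exp(-(total - 20) / 5))
item = (rng.random(2000) < p_correct).astype(int)
print(mh_dif(item, group_ref, total))
```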
- b. Describe the subgroups for which bias analyses were conducted:
- Gender groups (Male vs. Female) and ethnicity groups (Black vs. White and Hispanic vs. White) were considered for DIF and DTF. Analyses were conducted across Grades K–8.
- c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
- Differential Item Functioning: The overall percentages of items flagged for DIF on each form are very small and generally balanced across comparison groups, which is the goal of careful attention to content relevance and sensitivity during test development. Table 5.2 presents the number of items identified in category C from a national standardization study.
Data Collection Practices
Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.