Iowa Assessments
Reading Tests, Forms E, F, G

Summary

Descriptive Information

The Iowa Assessments are a comprehensive set of measures that assess student achievement in Kindergarten through Grade 12. The tests are designed to provide a thorough assessment of a student’s progress in skills and standards that are essential to successful learning. The tests provide a continuous standard score scale based on the performance of nationally representative groups of students. Available for web-based or paper-based administration, three forms are available for Grades K–8; two forms are available at the high school levels. The exceptional quality of the Iowa Assessments comes in part from the unique, collaborative test development process. The tests are written by researchers at The University of Iowa and are the nation’s only nationally norm-referenced test series developed by independent authors who teach measurement courses, conduct research on both measurement theory and practice, develop tests, and administer a statewide testing program. Constructing meaning from print, or reading comprehension, should be the main focus of reading instruction regardless of grade level. The Iowa Assessments Reading tests are designed with this underlying philosophy in mind. The scope and sequence provides a framework for assessing reading skills that is aligned with proven methods of instruction across the grades. The Reading tests focus on two critical strands in the process of learning to read: Reading Comprehension and Vocabulary. The scores from the Reading and Vocabulary tests together produce the Reading Total score. At the primary grades (Kindergarten through Grade 3), an optional Word Analysis test is available.

Acquisition & Cost

Where to Obtain:: Developer: Iowa Testing Programs, College of Education, The University of Iowa Publisher: Houghton Mifflin Harcourt; AssessmentsCS@hmhco.com; HMH, Attention Customer Experience Support-Assessments, 255 38th Avenue, Suite L, St. Charles, IL 60174; 800.323.9540; https://www.hmhco.com/programs/iowa-assessments

Initial Cost:: $14.00 per student

Replacement Cost:: Contact vendor for pricing details.

Included in Cost:: Replacement cost depends on the mode of administration. The Iowa Assessments can be administered on paper, requiring a scannable test booklet (Grades K–2) or a reusable test booklet and a scannable answer sheet (Grades 3–12). They can also be administered online. The cost varies by mode of administration, grade level/level of the test, scoring services desired, and version of the assessment (Complete/Core/Survey). Machine-scorable Test Booklets are available in packages of 5 or 25, at an average cost per student between $9.15 and $12.27. Reusable Test Booklets are also available in packages of 5 or 25, at an average cost per student between $7.82 and $9.60. Answer Documents come in packages of 15, 50, or 100, and range in cost per student from $1.56 to $1.92. Braille and Large Print versions are also available for student testing. When the Iowa Assessments are administered online, students must have access to an Internet-connected computer (Mac or PC). Teachers need Internet-connected computers to access DataManager, HMH’s web-based system for managing test administrations and accessing score reports and ancillary materials. Online Testing is priced at $14/student. All measures were developed following Universal Design for Assessment guidelines to reduce the need for accommodations. A number of accommodations that are commonly used with students taking the Iowa Assessments are listed in the Directions for Administration. Braille (Grades 3–12) and Large-Print (Grades 1–8) editions are available; these forms require a paper-based administration.

Training & Technical Support

Training Requirements:: 1-4 hrs of training

Qualified Administrators:: professional level

Access to Technical Support:: Help Desk via email and phone

Administration

Assessment Format:

Direct: Computerized
Other: Direct: Paper based

Scoring Time:

Scoring is automatic

Scores Generated:

Raw score
Standard score
Percentile score
Grade equivalents
Stanines
Normal curve equivalents
Equated
Lexile score
Composite scores
Subscale/subtest scores
Other: predicted scores are available

Administration Time:

52 minutes per student

Scoring Method:

Automatically (computer-scored)

Technology Requirements:

Computer or tablet
Internet connection
Other technology: Educators can also use printers to print reports and manuals if they wish; headphones may also be used.

Accommodations:: All measures were developed following Universal Design for Assessment guidelines to reduce the need for accommodations. A number of accommodations that are commonly used with students taking the Iowa Assessments are listed in the Directions for Administration. Braille (Grades 3–12) and Large-Print (Grades 1–8) editions are available; these forms require a paper-based administration.

Descriptive Information

Please provide a description of your tool:: The Iowa Assessments are a comprehensive set of measures that assess student achievement in Kindergarten through Grade 12. The tests are designed to provide a thorough assessment of a student’s progress in skills and standards that are essential to successful learning. The tests provide a continuous standard score scale based on the performance of nationally representative groups of students. Available for web-based or paper-based administration, three forms are available for Grades K–8; two forms are available at the high school levels. The exceptional quality of the Iowa Assessments comes in part from the unique, collaborative test development process. The tests are written by researchers at The University of Iowa and are the nation’s only nationally norm-referenced test series developed by independent authors who teach measurement courses, conduct research on both measurement theory and practice, develop tests, and administer a statewide testing program. Constructing meaning from print, or reading comprehension, should be the main focus of reading instruction regardless of grade level. The Iowa Assessments Reading tests are designed with this underlying philosophy in mind. The scope and sequence provides a framework for assessing reading skills that is aligned with proven methods of instruction across the grades. The Reading tests focus on two critical strands in the process of learning to read: Reading Comprehension and Vocabulary. The scores from the Reading and Vocabulary tests together produce the Reading Total score. At the primary grades (Kindergarten through Grade 3), an optional Word Analysis test is available.

The tool is intended for use with the following grade(s).

Preschool / Pre - kindergarten
selected

Kindergarten
selected

First grade
selected

Second grade
selected

Third grade
selected

Fourth grade
selected

Fifth grade
selected

Sixth grade
selected

Seventh grade
selected

Eighth grade
selected

Ninth grade
selected

Tenth grade
selected

Eleventh grade
selected

Twelfth grade

The tool is intended for use with the following age(s).

0-4 years old
selected

5 years old
selected

6 years old
selected

7 years old
selected

8 years old
selected

9 years old
selected

10 years old
selected

11 years old
selected

12 years old
selected

13 years old
selected

14 years old
selected

15 years old
selected

16 years old
selected

17 years old
selected

18 years old

The tool is intended for use with the following student populations.

Students in general education
not selected

Students with disabilities
not selected

English language learners

ACADEMIC ONLY: What skills does the tool screen?

Reading

Phonological processing:

RAN

Memory

Awareness

Letter sound correspondence
not selected

Phonics

Structural analysis

Word ID

Accuracy

Speed

Nonword

Accuracy

Speed

Spelling

Accuracy

Speed

Passage

Accuracy

Speed

Reading comprehension:

Multiple choice questions
not selected

Cloze

Constructed Response
not selected

Retell

Maze

Sentence verification
not selected

Other (please describe):

Listening comprehension:

Multiple choice questions
not selected

Cloze

Constructed Response
not selected

Retell

Maze

Sentence verification
not selected

Vocabulary
not selected

Expressive
not selected

Receptive

Mathematics

Global Indicator of Math Competence

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Early Numeracy

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Mathematics Concepts

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Mathematics Computation

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Mathematic Application

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Fractions/Decimals

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Algebra

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Geometry

Accuracy

Speed

Multiple Choice
not selected

Constructed Response

Other (please describe):

Please describe specific domain, skills or subtests:

BEHAVIOR ONLY: Which category of behaviors does your tool target?: Internalizing
Externalizing
Internalizing and Externalizing

BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.

Acquisition and Cost Information

Where to obtain:

Email Address: AssessmentsCS@hmhco.com
Address: HMH, Attention Customer Experience Support-Assessments, 255 38th Avenue, Suite L, St. Charles, IL 60174
Phone Number: 800.323.9540
Website: https://www.hmhco.com/programs/iowa-assessments

Initial cost for implementing program:

Cost: $14.00
Unit of cost: student

Replacement cost per unit for subsequent use:

Cost
Unit of cost
Duration of license: 1 year

Additional cost information:

Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.: Replacement cost depends on the mode of administration. The Iowa Assessments can be administered on paper, requiring a scannable test booklet (Grades K–2) or a reusable test booklet and a scannable answer sheet (Grades 3–12). They can also be administered online. The cost varies by mode of administration, grade level/level of the test, scoring services desired, and version of the assessment (Complete/Core/Survey). Machine-scorable Test Booklets are available in packages of 5 or 25, at an average cost per student between $9.15 and $12.27. Reusable Test Booklets are also available in packages of 5 or 25, at an average cost per student between $7.82 and $9.60. Answer Documents come in packages of 15, 50, or 100, and range in cost per student from $1.56 to $1.92. Braille and Large Print versions are also available for student testing. When the Iowa Assessments are administered online, students must have access to an Internet-connected computer (Mac or PC). Teachers need Internet-connected computers to access DataManager, HMH’s web-based system for managing test administrations and accessing score reports and ancillary materials. Online Testing is priced at $14/student.
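
The per-student ranges quoted above can be combined into a rough materials estimate for paper testing. The sketch below is illustrative only: it assumes that the Grades 3–12 materials cost is simply the reusable booklet plus one answer document, and it leaves out scoring services, training, and any savings from reusing booklets, none of which are priced in this entry. The function name and the use of 0 for Kindergarten are hypothetical.

```python
from decimal import Decimal
from typing import Tuple

# Per-student price ranges quoted in the pricing description (USD).
SCANNABLE_BOOKLET = (Decimal("9.15"), Decimal("12.27"))  # Grades K-2
REUSABLE_BOOKLET = (Decimal("7.82"), Decimal("9.60"))    # Grades 3-12
ANSWER_DOCUMENT = (Decimal("1.56"), Decimal("1.92"))     # Grades 3-12
ONLINE = Decimal("14.00")                                # any grade


def paper_materials_range(grade: int) -> Tuple[Decimal, Decimal]:
    """Return (low, high) per-student materials cost for paper testing.

    Grade 0 stands for Kindergarten. Scoring services are not included.
    """
    if grade <= 2:
        # Grades K-2 mark answers directly in a machine-scorable booklet.
        return SCANNABLE_BOOKLET
    # Grades 3-12: reusable booklet plus a scannable answer document
    # (assumed additive for this sketch).
    low = REUSABLE_BOOKLET[0] + ANSWER_DOCUMENT[0]
    high = REUSABLE_BOOKLET[1] + ANSWER_DOCUMENT[1]
    return (low, high)


low, high = paper_materials_range(5)
print(f"Grade 5 paper materials: ${low} to ${high}; online: ${ONLINE}")
```

Under these assumptions, a Grade 5 paper administration comes to $9.38 to $11.52 per student for materials, before scoring, compared with $14 per student online.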

Provide information about special accommodations for students with disabilities.: All measures were developed following Universal Design for Assessment guidelines to reduce the need for accommodations. A number of accommodations that are commonly used with students taking the Iowa Assessments are listed in the Directions for Administration. Braille (Grades 3–12) and Large-Print (Grades 1–8) editions are available; these forms require a paper-based administration.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?

General education teacher
not selected

Special education teacher
not selected

Parent

Child

External observer
not selected

Other

If other, please specify:

What is the administration setting?

Direct observation
not selected

Rating scale
not selected

Checklist

Performance measure
not selected

Questionnaire
selected

Direct: Computerized
not selected

One-to-one
selected

Other

If other, please specify:

Direct: Paper based

Does the tool require technology?

Yes

If yes, what technology is required to implement your tool? (Select all that apply)

Computer or tablet
selected

Internet connection
selected

Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

Educators can also use printers to print reports and manuals if they wish; headphones may also be used.

What is the administration context?

Individual
selected

Small group If small group, n=
selected

Large group If large group, n=
not selected

Computer-administered
not selected

Other

If other, please specify:

What is the administration time?

Time in minutes

per (student/group/other unit)

student

Additional scoring time:

Time in minutes

per (student/group/other unit)

student

ACADEMIC ONLY: What are the discontinue rules?

No discontinue rules provided
not selected

Basals

Ceilings

Other

If other, please specify:

Are norms available?: Yes

Are benchmarks available?: No
If yes, how many benchmarks per year?
If yes, for which months are benchmarks available?

BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?: Yes

Describe the time required for administrator training, if applicable:: 1-4 hrs of training

Please describe the minimum qualifications an administrator must possess.: professional level; No minimum qualifications

Are training manuals and materials available?: Yes

Are training manuals/materials field-tested?: No

Are training manuals/materials included in cost of tools?: No
If No, please describe training costs:: Training can occur either in person or online. Online training starts at $200.00. A more in-depth online training is $800.00, and onsite training starts at $2,950.00.

Can users obtain ongoing professional and technical support?: Yes
If Yes, please describe how users can obtain support:: Help Desk via email and phone

Scoring

How are scores calculated?

Manually (by hand)
selected

Automatically (computer-scored)
not selected

Other

If other, please specify:

Do you provide basis for calculating performance level scores?: Yes

What is the basis for calculating performance level and percentile scores?

Age norms

Grade norms
not selected

Classwide norms
not selected

Schoolwide norms
selected

Stanines

Normal curve equivalents

What types of performance level scores are available?

Raw score

Standard score
selected

Percentile score
selected

Grade equivalents
not selected

IRT-based score
not selected

Age equivalents
selected

Stanines

Normal curve equivalents
not selected

Developmental benchmarks
not selected

Developmental cut points
selected

Equated

Probability
selected

Lexile score
not selected

Error analysis
selected

Composite scores
selected

Subscale/subtest scores
selected

Other

If other, please specify:

predicted scores are available

Does your tool include decision rules?: Yes
If yes, please describe.: In general, the Iowa Assessments adopt the 20th local percentile rank as the decision rule for identifying students within a grade who need intensive intervention. However, at the request of a district or school, alternative decision rules are considered. For example, some schools elect to use a local percentile rank of 15, as well as a cut point corresponding to the “below basic” category on the state’s accountability exam.
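
A local percentile-rank rule of this kind can be sketched in a few lines. The sketch assumes a midpoint definition of percentile rank (percent of the local group scoring below a student, plus half of those tied) and flags students at or below the cut; the publisher's exact percentile-rank computation is not given here, so both choices are assumptions, as are the function names and the sample scores.

```python
def local_percentile_rank(score, all_scores):
    """Percent of the local group below `score`, plus half of those tied."""
    below = sum(s < score for s in all_scores)
    tied = sum(s == score for s in all_scores)
    return 100.0 * (below + 0.5 * tied) / len(all_scores)


def flag_for_intervention(scores_by_student, cut_pr=20.0):
    """Return sorted IDs of students whose local PR is at or below cut_pr."""
    all_scores = list(scores_by_student.values())
    return sorted(
        student
        for student, score in scores_by_student.items()
        if local_percentile_rank(score, all_scores) <= cut_pr
    )


# Hypothetical standard scores for ten students in one grade.
grade_scores = {
    "a": 150, "b": 160, "c": 170, "d": 180, "e": 190,
    "f": 200, "g": 210, "h": 220, "i": 230, "j": 240,
}
print(flag_for_intervention(grade_scores))             # default 20th PR rule
print(flag_for_intervention(grade_scores, cut_pr=10))  # stricter alternative
```

Passing a different `cut_pr` corresponds to a district choosing an alternative rule, such as the 15th local percentile rank.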

Can you provide evidence in support of multiple decision rules?: Yes
If yes, please describe.: As shown in the attached prediction study report, two cut scores are defined on the Iowa Assessments for each school system: one corresponding to the state’s lowest performance level cut and one corresponding to the 15th percentile rank (PR) of the system’s state assessment score distribution. The former can be considered a cut for moderate intervention and the latter a cut for intensive intervention. However, for the NCII Screening Tools application, only the 20th PR for the ACT and FAST, and the 15th PR for the Georgia Milestones and Texas STAAR, should be considered.

Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.: Score keys and Norms and Score Conversions Guides can be purchased. The number of questions a student gets right on a test is the student’s raw score. By itself, a raw score has little or no meaning; therefore, raw scores are usually converted to other types of scores for interpretational purposes, including standard scores. Composite scores are obtained by averaging the developmental standard scores from certain component tests. The average standard score can be converted to a percentile rank, grade equivalent, or other type of score for interpretational purposes.
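
The two computations described above, counting correct answers and averaging standard scores, can be illustrated as follows. This is a minimal sketch: the raw-to-standard-score conversion depends on the published Norms and Score Conversions Guides and is deliberately not reproduced, the score values are hypothetical, and no rounding rule is applied because none is stated here.

```python
def raw_score(responses, answer_key):
    """Count the items answered correctly (an omitted item is None)."""
    return sum(r == k for r, k in zip(responses, answer_key))


def composite_standard_score(standard_scores):
    """Average the developmental standard scores of the component tests."""
    return sum(standard_scores) / len(standard_scores)


# Hypothetical four-item test: the third response is wrong.
print(raw_score(["A", "B", "C", "D"], ["A", "B", "D", "D"]))

# Hypothetical Reading and Vocabulary standard scores combined into a
# composite, assuming the composite is their simple average.
print(composite_standard_score([182, 190]))
```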

Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.: Comparative data collected at the time of standardization enable norm-referenced interpretations of student performance in addition to standards-based interpretations. It is through the standardization process that scores, scales, and norms are developed. The procedures used in the standardization of the Iowa Assessments are designed to make the norming sample reflect the national population as closely as possible, ensuring proportional representation of important groups of students. Many public and non-public schools cooperated in the National Comparison Study. The standardization program, planned jointly by Iowa Testing Programs and the company, was carried out as a single enterprise. The standardization program sample should be selected to represent the national population with respect to ability and achievement. It should be large enough to represent the diverse characteristics of the population, but a carefully selected sample of reasonable size would be preferred over a larger but less carefully selected sample. Sampling units should be chosen primarily on the basis of district size, region of the country, and socioeconomic characteristics as determined by the school’s Title I status and percent of students eligible for free- and reduced-price lunch. A balance between public and non-public schools should be obtained. To ensure applicability of norms to all students, testing accommodations for students who require them should be a regular part of the standard administrative conditions as designated in a student’s Individualized Education Program (IEP) and in the accommodation practices of the participating schools.
Additionally, various approaches to understanding group differences in test scores are a regular part of research and test development efforts for the Iowa Assessments. To ensure that assessment materials are appropriate and fair for different groups, careful test development procedures are followed. Sensitivity reviews by content and fairness committees and extensive statistical analysis of the items and tests are conducted. The precision of measurement for important groups in the National Comparison Study is evaluated when examining the measurement characteristics of the tests. Differences between groups in average performance and in the variability of performance are also of interest, and these are examined for changes over time. In addition to descriptions of group differences in test performance, analyses of differential item functioning are undertaken with results from the national item tryouts as well as with results from the National Comparison Study.

Technical Standards

Classification Accuracy & Cross-Validation Summary

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Classification Accuracy Fall
Classification Accuracy Winter
Classification Accuracy Spring

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

d = Disaggregated data available

FAST

Classification Accuracy

Select time of year

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion is 100% independent of the Iowa Assessments. They are produced by separate organizations and for slightly different purposes. The criterion measures vary by grade. In Grades 1–8, the grade-specific FAST aReading score served as a criterion. The FAST aReading assessment is a computer-adaptive measure of broad reading ability for Grades K–12. aReading is designed for universal screening to identify students at risk for academic delays and to differentiate instruction for all students. It targets phonological awareness, orthography and morphology, vocabulary, concepts of print, and phonics.

Do the classification accuracy analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details). Cut points were selected to appropriately identify students who need intensive intervention (either 15th PR or 20th PR). For the analyses related to the FAST assessment, the grade-specific FAST aReading score was used with the corresponding Reading score on the Iowa Assessments. In each grade, scores associated with the 20th percentile rank (PR) were used to set the cut score on both the FAST and the Iowa Assessments.
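
The classification rates referred to above come from cross-tabulating the screener decision against the criterion status. The sketch below shows that cross-tabulation under stated assumptions: a student counts as flagged, or as truly at risk, when the score is at or below the respective cut, and the scores and cut values are hypothetical rather than taken from the FAST study. It is not the NCII procedure itself, which is described in the linked FAQ.

```python
def classification_rates(screener, criterion, screener_cut, criterion_cut):
    """Cross-tabulate screener decisions against criterion status.

    Requires at least one at-risk and one not-at-risk student, otherwise
    sensitivity or specificity is undefined (division by zero).
    """
    tp = fp = fn = tn = 0
    for s, c in zip(screener, criterion):
        flagged = s <= screener_cut   # screener says "at risk"
        at_risk = c <= criterion_cut  # criterion says "at risk"
        if flagged and at_risk:
            tp += 1
        elif flagged:
            fp += 1
        elif at_risk:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical paired scores for six students.
iowa = [140, 150, 160, 170, 180, 190]
fast = [400, 480, 450, 500, 520, 540]
rates = classification_rates(iowa, fast, screener_cut=155, criterion_cut=460)
print(rates)
```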

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?: Yes
If yes,

Select time of year.

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion is 100% independent of the Iowa Assessments. They are produced by separate organizations and for slightly different purposes. The criterion measures vary by grade. In Grades 1–5, the primary criterion was the grade-specific FAST aReading score. The FAST aReading assessment is a computer-adaptive measure of broad reading ability for Grades K–12. aReading is designed for universal screening to identify students at risk for academic delays and to differentiate instruction for all students. It targets phonological awareness, orthography and morphology, vocabulary, concepts of print, and phonics.

Do the cross-validation analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: In each analysis, the cut point corresponded to the score associated with the 20th PR in the relevant sample in Section 1 (Classification Accuracy). These cut points identify students who need intensive intervention. Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details).
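
The key point of this design is that the cut points are fixed in the original sample and then applied unchanged to a separate sample. A minimal sketch of that idea follows. It assumes a nearest-rank definition of the score at the 20th PR and an "at or below the cut" flagging convention, and it uses hypothetical scores; the function names are illustrative.

```python
def cut_score_at_pr(scores, pr=20):
    """Nearest-rank score at the given percentile rank (integer pr)."""
    ordered = sorted(scores)
    rank = max(1, (pr * len(ordered) + 99) // 100)  # ceil(pr * n / 100)
    return ordered[rank - 1]


def cross_validated_accuracy(orig_screen, orig_crit, new_screen, new_crit, pr=20):
    """Derive cuts in the original sample, then score agreement in a new one."""
    screen_cut = cut_score_at_pr(orig_screen, pr)
    crit_cut = cut_score_at_pr(orig_crit, pr)
    agree = sum(
        (s <= screen_cut) == (c <= crit_cut)
        for s, c in zip(new_screen, new_crit)
    )
    return screen_cut, crit_cut, agree / len(new_screen)


# Hypothetical original (derivation) sample.
orig_screen = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
orig_crit = [20, 22, 25, 21, 30, 31, 33, 35, 36, 40]
# Hypothetical cross-validation sample, scored with the frozen cuts.
new_screen = [105, 115, 125, 108, 150]
new_crit = [19, 24, 21, 30, 35]

result = cross_validated_accuracy(orig_screen, orig_crit, new_screen, new_crit)
print(result)
```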

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

ACT

Classification Accuracy

Select time of year

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion is 100% independent of the Iowa Assessments. They are produced by separate organizations and for slightly different purposes. The criterion measures vary by grade. In Grades 6–12, students’ ACT scores served as the primary criterion. The ACT is generally taken during Grades 11–12 and it is commonly used by school districts to gauge their students’ level of college and career readiness as well as by postsecondary institutions for admissions and placement decisions. The ACT test is designed to measure knowledge and skills in four core academic content areas: English, mathematics, reading, and science. Reading is used here because, like the Reading test on the Iowa Assessments, it is a measure of general reading comprehension that requires students to refer to what is explicitly stated, reason to determine implicit meanings, determine main ideas, understand sequence of events, and draw generalizations, among other things.

Do the classification accuracy analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details). Cut points were selected to appropriately identify students who need intensive intervention (either 15th PR or 20th PR). When the ACT is the criterion, the sample represents all students in Iowa who took the ACT and graduated from an Iowa high school in the Spring of 2009. In Grades 6–12, each student’s grade-specific Reading score on the Iowa Assessments was matched to the Grade 11/12 ACT score. The ACT score and Iowa Assessments score associated with the 20th PR in this sample were used to identify students in need of intensive intervention. An ACT score of 17 was associated with a PR of 20. On the Iowa Assessments, the grade-specific cut score varied by grade but always represented the 20th PR. Students who took the ACT within 3 months of the Iowa Assessments were removed from the sample, as were students who took the ACT prior to the Iowa Assessments within a given grade.
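
The sample exclusion described in the last sentence can be expressed as a simple date filter. The sketch below assumes that "within 3 months" can be approximated as 90 days and that a gap of exactly 90 days is excluded; the original analysis may have used a different definition. The function name, the `min_gap_days` parameter, and the dates are hypothetical.

```python
from datetime import date


def keep_record(iowa_date, act_date, min_gap_days=90):
    """Keep a student only if the ACT followed the Iowa Assessments
    by more than `min_gap_days` (90 days approximates 3 months)."""
    gap = (act_date - iowa_date).days
    # A negative gap means the ACT came first; a small positive gap means
    # the two tests were too close together. Both cases are excluded.
    return gap > min_gap_days


iowa_testing = date(2008, 10, 15)
print(keep_record(iowa_testing, date(2009, 4, 10)))   # ACT about 6 months later
print(keep_record(iowa_testing, date(2008, 12, 1)))   # ACT within 3 months
print(keep_record(iowa_testing, date(2008, 6, 1)))    # ACT before Iowa testing
```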

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?: Yes
If yes,

Select time of year.

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion is 100% independent of the Iowa Assessments. They are produced by separate organizations and for slightly different purposes. The criterion measures vary by grade. In Grades 6–12, students’ ACT scores served as the criterion. The ACT is generally taken during Grades 11–12 and it is commonly used by school districts to gauge their students’ level of college and career readiness as well as by postsecondary institutions for admissions and placement decisions. The ACT test is designed to measure knowledge and skills in four core academic content areas: English, mathematics, reading, and science. Reading is used here because, like the Reading test on the Iowa Assessments, it is a measure of general reading comprehension that requires students to refer to what is explicitly stated, reason to determine implicit meanings, determine main ideas, understand sequence of events, and draw generalizations, among other things.

Do the cross-validation analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: In each analysis, the cut point corresponded to the score associated with the 20th PR in the relevant sample in Section 1 (Classification Accuracy). These cut points identify students who need intensive intervention. Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details).

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

State of Texas Assessments of Academic Readiness (STAAR)

Classification Accuracy

Select time of year

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion is 100% independent of the Iowa Assessments. They are produced by separate organizations and for slightly different purposes. The criterion measures vary by grade. The State of Texas Assessments of Academic Readiness, or STAAR®, is the state testing program that was implemented in the 2011–2012 school year. The Texas Education Agency (TEA), in collaboration with the Texas Higher Education Coordinating Board (THECB) and Texas educators, developed the STAAR program in response to requirements set forth by the 80th and 81st Texas legislatures. STAAR is an assessment program designed to measure the extent to which students have learned and are able to apply the knowledge and skills defined in the state-mandated curriculum standards, the Texas Essential Knowledge and Skills (TEKS).

Do the classification accuracy analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details). Cut points were selected to appropriately identify students who need intensive intervention (either 15th PR or 20th PR).
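The classification rates referred to above all derive from a 2×2 cross-tabulation of risk status on the screener against risk status on the criterion. The following is a minimal Python sketch of that arithmetic, not the procedure actually used for this submission. It assumes a nearest-rank percentile definition, assumes that scores at or below a cut count as at risk, and applies a percentile-rank cut to both measures; the names `percentile_cut`, `classify`, and `rates` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # a: flagged by screener, at risk on criterion
    fp: int  # b: flagged by screener, not at risk on criterion
    fn: int  # c: not flagged by screener, at risk on criterion
    tn: int  # d: not flagged by screener, not at risk on criterion


def percentile_cut(scores, pr):
    """Score at the pr-th percentile rank (nearest-rank definition, an assumption)."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(len(ordered) * pr / 100))
    return ordered[rank - 1]


def classify(screener, criterion, screener_cut, criterion_cut):
    """Cross-tabulate paired scores; scores at or below a cut count as at risk."""
    tp = fp = fn = tn = 0
    for s, c in zip(screener, criterion):
        flagged = s <= screener_cut
        at_risk = c <= criterion_cut
        if flagged and at_risk:
            tp += 1
        elif flagged:
            fp += 1
        elif at_risk:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def rates(m):
    """Classification statistics; assumes every row and column total is non-zero."""
    n = m.tp + m.fp + m.fn + m.tn
    return {
        "base_rate": (m.tp + m.fn) / n,
        "overall_classification_rate": (m.tp + m.tn) / n,
        "sensitivity": m.tp / (m.tp + m.fn),
        "specificity": m.tn / (m.tn + m.fp),
        "false_positive_rate": m.fp / (m.fp + m.tn),
        "false_negative_rate": m.fn / (m.fn + m.tp),
        "positive_predictive_power": m.tp / (m.tp + m.fp),
        "negative_predictive_power": m.tn / (m.tn + m.fn),
    }
```

Calling `classify` with cuts from `percentile_cut(scores, 15)` or `percentile_cut(scores, 20)` mirrors the two cut points named above. The dictionary keys follow the row labels of the Statistics tables on this page.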

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?: No
If yes,

Select time of year.

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.

Do the cross-validation analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Georgia Milestones

Classification Accuracy

Select time of year

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.: Each criterion measure is fully independent of the Iowa Assessments: the measures are produced by separate organizations and serve somewhat different purposes. The criterion measures vary by grade. The Georgia Milestones Assessment System (Georgia Milestones) is a comprehensive summative assessment program spanning Grades 3 through high school. Georgia Milestones measures how well students have learned the knowledge and skills outlined in the state-adopted content standards in English Language Arts, mathematics, science, and social studies. Students in Grades 3 through 8 take an end-of-grade assessment in English Language Arts and mathematics, while students in Grades 5 and 8 are also assessed in science and social studies. High school students take an end-of-course assessment for each of the ten courses designated by the State Board of Education.

Do the classification accuracy analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).: Classification rates and area-under-the-curve results were calculated as described in the “Frequently Asked Questions” document (see https://intensiveintervention.org/sites/default/files/NCII_AcademicScreening_FAQ_July2018.pdf for more details). Cut points were selected to appropriately identify students who need intensive intervention (either 15th PR or 20th PR).
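The area under the ROC curve can be read as the probability that a randomly chosen at-risk student scores lower on the screener than a randomly chosen student who is not at risk. The sketch below computes it that way in Python. The confidence interval uses the Hanley and McNeil (1982) standard-error approximation, which is an assumption made for illustration: the text above does not state which interval method was used. The function names `auc` and `auc_ci` are hypothetical.

```python
import math
from statistics import NormalDist


def auc(at_risk_scores, not_at_risk_scores):
    """Share of (at-risk, not-at-risk) pairs in which the at-risk student
    has the lower screener score; ties count one half."""
    wins = 0.0
    for r in at_risk_scores:
        for n in not_at_risk_scores:
            if r < n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(at_risk_scores) * len(not_at_risk_scores))


def auc_ci(a, n_at_risk, n_not_at_risk, level=0.95):
    """Approximate interval from the Hanley-McNeil standard error (an assumption)."""
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (
        a * (1 - a)
        + (n_at_risk - 1) * (q1 - a * a)
        + (n_not_at_risk - 1) * (q2 - a * a)
    ) / (n_at_risk * n_not_at_risk)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = crit * math.sqrt(var)
    # Clip to the valid range of an AUC.
    return max(0.0, a - half), min(1.0, a + half)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based computation gives the same value more efficiently for samples of several thousand students.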

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?: No
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Cross-Validation

Has a cross-validation study been conducted?: No
If yes,

Select time of year.

Fall

Winter

Spring

Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.

Do the cross-validation analyses examine concurrent and/or predictive classification?

Concurrent
Predictive

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.

Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).

Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Classification Accuracy - Fall

Evidence	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	Georgia Milestones	Georgia Milestones	Georgia Milestones	ACT	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	15th percentile	15th percentile	15th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.88	0.89	0.91	0.85	0.87	0.88	0.88	0.87	0.87	0.86
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.86	0.87	0.89	0.84	0.86	0.87	0.87	0.86	0.86	0.84
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.90	0.91	0.93	0.86	0.88	0.89	0.89	0.88	0.88	0.87

Statistics	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Fall 2017	Fall 2017	Fall 2017	Fall 2002	Fall 2003	Fall 2004	Fall 2005	Fall 2006	Fall 2007	Fall 2008
Sample Size
Geographic Representation	South Atlantic (GA)	South Atlantic (GA)	South Atlantic (GA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Classification Accuracy - Winter

Evidence	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	State of Texas Assessments of Academic Readiness (STAAR)	State of Texas Assessments of Academic Readiness (STAAR)	State of Texas Assessments of Academic Readiness (STAAR)	ACT	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	15th percentile	15th percentile	15th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.89	0.89	0.90	0.87	0.86	0.87	0.88	0.87	0.88	0.87
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.87	0.88	0.89	0.85	0.85	0.86	0.87	0.86	0.87	0.85
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.90	0.90	0.91	0.88	0.88	0.88	0.89	0.88	0.89	0.88

Statistics	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Early 2017	Early 2017	Early 2017	Winter 2002–03	Winter 2003–04	Winter 2004–05	Winter 2005–06	Winter 2006–07	Winter 2007–08	Winter 2008–09
Sample Size
Geographic Representation	West South Central (TX)	West South Central (TX)	West South Central (TX)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Classification Accuracy - Spring

Evidence	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	FAST	FAST	FAST	FAST	FAST	FAST	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.88	0.91	0.93	0.96	0.92	0.91	0.86	0.88	0.88	0.87	0.88	0.88
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.86	0.89	0.91	0.94	0.90	0.89	0.84	0.87	0.86	0.86	0.87	0.86
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.91	0.93	0.95	0.97	0.94	0.94	0.88	0.90	0.90	0.89	0.90	0.90

Statistics	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Spring 2017	Spring 2017	Spring 2017	Spring 2017	Spring 2017	Spring 2017	Spring 2004	Spring 2005	Spring 2006	Spring 2007	Spring 2008	Spring 2009
Sample Size
Geographic Representation	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Cross-Validation - Fall

Evidence	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	ACT	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.84	0.86	0.86	0.85	0.84	0.85	0.86
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.83	0.84	0.85	0.83	0.83	0.84	0.84
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.86	0.87	0.88	0.86	0.86	0.86	0.87

Statistics	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Fall 2006	Fall 2007	Fall 2008	Fall 2009	Fall 2010	Fall 2011	Fall 2012
Sample Size
Geographic Representation	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Cross-Validation - Winter

Evidence	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	ACT	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.82	0.85	0.85	0.85	0.84	0.85	0.86
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.81	0.83	0.83	0.84	0.82	0.84	0.84
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.84	0.86	0.86	0.86	0.86	0.86	0.87

Statistics	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Winter 2006–07	Winter 2007–08	Winter 2008–09	Winter 2009–10	Winter 2010–11	Winter 2011–12	Winter 2012–13
Sample Size
Geographic Representation	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Cross-Validation - Spring

Evidence	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Criterion measure	FAST	FAST	FAST	FAST	FAST	ACT	ACT	ACT	ACT	ACT	ACT
Cut Points - Percentile rank on criterion measure
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile	20th percentile
Classification Data - True Positive (a)
Classification Data - False Positive (b)
Classification Data - False Negative (c)
Classification Data - True Negative (d)
Area Under the Curve (AUC)	0.92	0.94	0.92	0.91	0.93	0.87	0.88	0.88	0.88	0.88	0.87
AUC Estimate’s 95% Confidence Interval: Lower Bound	0.90	0.93	0.90	0.89	0.91	0.86	0.87	0.86	0.87	0.87	0.85
AUC Estimate’s 95% Confidence Interval: Upper Bound	0.94	0.96	0.94	0.93	0.95	0.88	0.90	0.89	0.89	0.89	0.88

Statistics	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Base Rate
Overall Classification Rate
Sensitivity
Specificity
False Positive Rate
False Negative Rate
Positive Predictive Power
Negative Predictive Power

Sample	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Date	Spring 2016	Spring 2016	Spring 2016	Spring 2016	Spring 2015	Spring 2008	Spring 2009	Spring 2010	Spring 2011	Spring 2012	Spring 2013
Sample Size
Geographic Representation	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)	West North Central (IA)
Male
Female
Other
Gender Unknown
White, Non-Hispanic
Black, Non-Hispanic
Hispanic
Asian/Pacific Islander
American Indian/Alaska Native
Other
Race / Ethnicity Unknown
Low SES
IEP or diagnosed disability
English Language Learner

Reliability

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Rating	^d	^d	^d	^d	^d	^d	^d	^d

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Offer a justification for each type of reliability reported, given the type and purpose of the tool.: Internal Consistency: The internal consistency reliability coefficient estimates the proportion of variability within a single administration of a test that is due to inconsistency among the items that comprise the test. Test-retest Reliability: The Iowa Assessments are often used as an interim assessment, and students can take the assessment multiple times a year. Therefore, the test-retest reliability estimate is appropriate for estimating the stability of scores for the same students.

*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.: Sources of variation that might affect scores on large-scale assessments were investigated in two reliability studies based on test administrations from multiple occasions. The first used data from a national probability sample that was weighted to match the characteristics of the national student population. The second used data from a comparability study comparing paper-and-pencil administrations to computer-based administrations. Table 2.1 presents the n-counts for the national probability sample. Table 2.1.A presents the percentages of students by school type. Tables 2.1.B and 2.1.C present the percentages by geographical region for public and private schools, respectively. Table 2.1.D presents the percentages for racial-ethnic representation. Note that Tables 2.1.A to 2.1.D include Grades K to 12.

*Describe the analysis procedures for each reported type of reliability.: • For internal consistency, K-R 20 reliability coefficients were calculated from the item-response records in the standardization sample. This national sample was weighted to be reflective of the national student population. The 95% confidence interval for the reliability estimate was computed using Feldt’s (1965) approximations for a sample size of 500. • Test-retest reliability estimates were calculated for a subsample of the national probability sample used in the standardization. Specifically, several hundred students in each grade were tested in the fall and then retested in the spring using the same form. Reliability estimates were calculated by correlating scores from the two administrations. • Although not summarized here, the Research and Development Guide for the Iowa Assessments includes several other types of reliability information, including split-half reliability, alternate forms reliability, and conditional standard errors of measurement (CSEMs). Another study examined the comparability of paper-based and computer-based administrations of the Iowa Assessments. In this study, the same students took the test in both administration modes. The order of testing modes was counterbalanced, and an interval of between one and two weeks separated the two administrations. Correlations between scores in different modes can be interpreted as estimates of test-retest reliability. While the mode of administration does represent an additional source of variation in these scores, high correlations constitute evidence that the combined effects of temporal changes in examinees and administrative conditions are small.
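The two coefficients described above can be illustrated with a short Python sketch. `kr20` applies the Kuder-Richardson Formula 20 to a matrix of 0/1 item scores, and `pearson_r` gives the correlation used for a test-retest estimate. Both function names are hypothetical, the population-variance convention is an assumption, and neither the sample weighting nor Feldt’s confidence interval is reproduced here.

```python
from statistics import mean, pvariance


def kr20(responses):
    """KR-20 for dichotomous items.

    responses: one list of 0/1 item scores per student, all the same length.
    Uses population variances throughout (an assumption about convention).
    """
    k = len(responses[0])
    totals = [sum(student) for student in responses]
    sum_pq = 0.0
    for i in range(k):
        p = mean([student[i] for student in responses])  # proportion correct
        sum_pq += p * (1 - p)                            # item variance
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


def pearson_r(x, y):
    """Pearson correlation between paired scores, e.g. fall and spring."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

For a test-retest estimate, `x` would hold each student's fall score and `y` the same student's spring score on the same form, in matching order.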

*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).

Type of	Subgroup	Informant	Age / Grade	Test or Criterion	n	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: Yes

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of	Subgroup	Informant	Age / Grade	Test or Criterion	n	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of reliability analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Validity

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Rating

Legend

Convincing evidence

Partially convincing evidence

Unconvincing evidence

Data unavailable

^dDisaggregated data available

*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.: All analyses presented here utilized a criterion that was external to the assessment system. In Grades 1–5, one criterion was the grade-specific FAST aReading score. The FAST aReading assessment is a computer-adaptive measure of broad reading ability for Grades K–12. aReading is designed for universal screening to identify students at risk for academic delay. It targets phonological awareness, orthography and morphology, vocabulary, concepts of print, and phonics. Like the Iowa Assessments, FAST is designed to identify students’ strengths and weaknesses, monitor student growth in reading, and inform student instruction. Scores on the ACT served as a criterion in Grades 6–12. The ACT is generally taken during Grades 11–12, and it is commonly used by school districts to gauge their students’ level of college and career readiness as well as by postsecondary institutions for admissions and placement decisions. The ACT test is designed to measure knowledge and skills in four core academic content areas: English, mathematics, reading, and science. Reading is used here because, like the Reading test on the Iowa Assessments, it is a measure of general reading comprehension that requires students to refer to what was explicitly stated, reason to determine implicit meanings, determine main ideas, understand sequences of events, and draw generalizations, among other similarities. Validity coefficients with Georgia Milestones and Texas STAAR are also presented. Iowa Assessments scores are often used to predict performance on other state assessments at the end of the school year.

*Describe the sample(s), including size and characteristics, for each validity analysis conducted.: The FAST samples come from one of Iowa’s largest and most diverse districts. This district includes the city of Davenport and portions of neighboring cities. It represents the full range of student achievement. Sample sizes and characteristics of each grade are presented in the supporting tables in Section 1 (Classification Accuracy) and Section 4 (Cross-Validation). The ACT samples comprise the same students presented in the supporting tables in Sections 1 (Classification Accuracy) and 4 (Cross-Validation). Sample sizes and student characteristics are presented in each section. This sample represents potential college-bound students and demonstrates how, even in middle school, the development of student achievement is strongly related to students’ later readiness for college. The Georgia Milestones samples come from Newton County Schools, and the sample sizes range from 1,275 to 1,359 across Grades 3 to 8. The Texas STAAR samples come from Houston Independent School District, and the sample sizes range from 4,253 to 5,609 across Grades 3 to 8.

*Describe the analysis procedures for each reported type of validity.: For evidence of concurrent and predictive validity, correlations between scores on the Iowa Assessments and the FAST or ACT were examined and the 95% confidence interval was computed for each sample. Evidence was classified as concurrent when the Iowa Assessments and the criterion were administered in the same grade. When there was a year or more between administrations, the evidence was classified as predictive. Concurrent validity coefficients between the Iowa Assessments and Georgia Milestones are from the Iowa Assessments and Georgia Milestones Assessment System (the Milestones) prediction study, in which the prediction equations were developed. Results from the Iowa Assessments administration in Fall 2017 and the Milestones administration in Spring 2018 were used in this study. Concurrent validity coefficients between the Iowa Assessments and Texas STAAR are from the Iowa Assessments and Texas STAAR prediction study, in which the prediction equations were developed. Results from the Iowa Assessments administration in January 2017 for Grades 3 through 7 and the STAAR assessment in March and May 2017 were used in this study.
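A 95% confidence interval for a correlation coefficient is commonly obtained with the Fisher z-transformation. The Python sketch below shows that approach; it is an assumption for illustration, since the text above does not say which interval method was used, and the function name `correlation_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via Fisher's z.

    r: observed correlation (strictly between -1 and 1)
    n: number of paired observations (must exceed 3)
    """
    z = math.atanh(r)                 # transform r to the z scale
    se = 1 / math.sqrt(n - 3)         # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)
```

As an example of scale, a coefficient of .80 with a sample of 1,275 students (the smallest Georgia Milestones sample reported above) gives an interval about two hundredths wide on either side of the estimate.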

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of	Subgroup	Informant	Age / Grade	Test or Criterion	n	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Describe the degree to which the provided data support the validity of the tool.: The evidence presented above strongly supports the use of the Iowa Assessments as an intervention tool. For example, even though FAST has an adaptive platform while the Iowa Assessments used in this study are paper-based, correlation coefficients with the FAST were high (.70–.80). With the ACT, correlation coefficients were higher than .70 in every grade except Grade 6. This exception is not surprising given that the Grade 6 assessment was administered about 5 years before students took the ACT. What is remarkable is the strong and consistent relationship observed between scores on the Iowa Assessments and the ACT. The correlation coefficients between the Iowa Assessments and Georgia Milestones, and between the Iowa Assessments and Texas STAAR, were generally about .80 across Grades 3 to 8. This evidence supports a high or appropriate level of concurrent and predictive validity for the Iowa Assessments.

Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?: No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of	Subgroup	Informant	Age / Grade	Test or Criterion	n	Median Coefficient	95% Confidence Interval Lower Bound	95% Confidence Interval Upper Bound

Results from other forms of validity analysis not compatible with above table format:

Manual cites other published reliability studies:: No

Provide citations for additional published studies.

Bias Analysis

Grade	Grade 1	Grade 2	Grade 3	Grade 4	Grade 5	Grade 6	Grade 7	Grade 8	Grade 9	Grade 10	Grade 11	Grade 12
Rating	Not Provided	Not Provided	Provided	Provided	Provided	Provided	Provided	Provided	Not Provided	Not Provided	Not Provided	Not Provided

Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.: Yes

If yes,
a. Describe the method used to determine the presence or absence of bias:: questions in contexts accessible to students with a variety of backgrounds and interests. A goal of all test development in Iowa Testing Programs is to assemble test materials that reflect the diversity of the test-taking population in the United States. Reviewers are given information about the purposes of the tests, the content areas, and cognitive classifications. They are asked to look for possible racial-ethnic, regional, cultural, or gender biases in the way the item was written or in the information required to answer the question. The reviewers rate items as “probably fair,” “possibly unfair,” or “probably unfair,” comment on the balance of the items, and make recommendations for change. Based on these reviews, items identified by the reviewers as problematic are either revised to eliminate objectionable features or eliminated from consideration for the final forms. Differential Item Functioning (DIF): DIF identifies items that function differently for two groups of examinees with the same total test score. In many cases, one group will be more likely to answer an item correctly on average than another group. These differences might be due to differing levels of knowledge and skills between the groups. DIF analyses take these group differences into account and help identify items that might unfairly favor one group over another. The items that are identified as potentially unfair by DIF are then presented for additional review. The statistical analyses of items for DIF are based on variants of the Mantel-Haenszel (MH) procedure (Dorans and Holland, 1993). Specific item-level comparisons of performance are made for groups of males and females, Blacks and Whites, and Hispanics and Whites.
The number of items identified as favoring a given group according to the classification scheme used by the Educational Testing Service (ETS) for the National Assessment of Educational Progress (NAEP) is shown in Table 5.1 below. Differential Test Functioning (DTF): A series of logistic regressions is conducted to predict success on an end-of-year outcome measure (i.e., Georgia Milestones or Texas STAAR) from risk status as determined by the Iowa Assessments, membership in a gender or ethnicity group, and an interaction term between the two variables. The presence or absence of bias is determined based on the statistical significance of the interaction term.
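The Mantel-Haenszel statistic behind the DIF analysis can be sketched in Python as follows. Examinees are matched on total score, a 2×2 table of group by item correctness is formed at each score level, and the common odds ratio across levels is converted to the ETS delta scale. This is an illustrative sketch, not the exact variant used for the Iowa Assessments: the names `mh_d_dif` and `magnitude_category` are hypothetical, and the category rule here uses magnitude thresholds only, whereas the full ETS scheme also applies significance tests.

```python
import math


def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF for one item.

    strata: one tuple per matched total-score level, holding counts of
    (reference correct, reference incorrect, focal correct, focal incorrect).
    Negative values indicate the item favors the reference group.
    Assumes at least one level has non-zero cross products.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den                # common odds ratio across score levels
    return -2.35 * math.log(alpha)   # conversion to the ETS delta metric


def magnitude_category(d_dif):
    """Magnitude-only approximation of the ETS A/B/C categories (an assumption)."""
    size = abs(d_dif)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"
```

An item whose odds of a correct response are equal for both groups at every score level yields a common odds ratio of 1 and a D-DIF of 0, which falls in category A.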

b. Describe the subgroups for which bias analyses were conducted:: Gender (Male vs. Female) and ethnicity groups (Black vs. White and Hispanic vs. White) are considered for DIF and DTF.

c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.: Differential Item Functioning: The overall percentages of items flagged for DIF in each form are very small and generally balanced across comparison groups. This outcome is the goal of careful attention to content relevance and sensitivity during test development. Table 5.2 presents the number of items identified in category C from a national standardization study.

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.
