Imagine+ Screener (formerly EarlyBird Education)
Dyslexia and Early Literacy Screener

Summary

The Imagine+ Screener, formerly known as the EarlyBird Dyslexia and Early Literacy Screener, is a gamified universal screener and comprehensive literacy assessment for students in grades PreK-2. Developed, tested, and scientifically validated by experts at Boston Children’s Hospital and the Florida Center for Reading Research, the screener combines the relevant predictors of dyslexia risk and word-reading potential into one child-centered, easy-to-administer assessment. Appropriate for both pre-readers and early readers, the Imagine+ Screener addresses the components of reading and targets literacy milestones predictive of subsequent reading success. The screener contains the tools necessary to easily and accurately identify students at risk of dyslexia and reading difficulties during the critical early years when intervention is most effective. The screener is self-administered in small groups with adult oversight at the beginning, middle, and end of the school year to assess word-reading performance against both low and moderate performance expectations. Students navigate the gamified assessment with guidance from Pip, their feathery friend, who leads them through a city adventure to meet various animal friends who introduce each subtest. This immersive design keeps the assessment engaging and accessible for young learners. Educator resources include a web-based dashboard that provides actionable and intuitive assessment data at the student, classroom, school, and district levels; a teacher intervention resource platform; and professional learning workshops that help teachers use collected data to drive instruction.

Where to Obtain:
Imagine Learning LLC
solutions@imaginelearning.com
100 S Mill Ave., Suite 1700 Tempe, AZ 85281
877-725-4257
https://www.imaginelearning.com/
Initial Cost:
$6.00 per student
Replacement Cost:
$6.00 per student per 12 months
Included in Cost:
The cost for the Imagine+ Screener is $6 per student annually. This provides students with online access to all virtual subtest assessments relevant to their grade level; staff access to the dashboard (with automatic scoring) and data reporting; downloadable supplemental paper/pencil subtests; intuitive, easily accessed help articles and tutorials for educators; student-specific reports for sharing results with families (available in multiple languages); At Home Activities to share with families; easy-access service and support; secure hosting; and all maintenance, upgrades, and enhancements for the duration of the license. Professional development is also available at an additional cost.
Imagine+ Screener has been designed to support a wide range of learners, with accommodating features including the following. Students may use the touchscreen, mouse, or keyboard, accommodating students with limited mobility skills. With the exception of the RAN and Oral Reading Fluency subtests, Imagine+ Screener is untimed, and the student or administrator can pause the game at any time for a break and resume it easily. Students can adjust the audio as needed and can use the assessment with noise-buffering headphones. The Imagine+ Screener can be administered individually or in small groups, as needed. The Imagine+ Screener was also designed to reduce assessment bias: as part of item development, all items were reviewed for bias and fairness, and during administration the screener uses voice recognition AI, so each student is evaluated consistently, free of administrator scoring bias or fatigue. Finally, because it is a fun, interactive game, students are relaxed and engaged and participate fully in the assessment, often without realizing they are being assessed, which reduces stress and the effects of test anxiety on performance.
Training Requirements:
Because Imagine+ Screener is an intuitive and easy-to-understand game, a brief 30-45 minute virtual or on-site training is all that is required for staff participating in screener administration.
Qualified Administrators:
No minimum qualifications specified.
Access to Technical Support:
Imagine+ Screener provides full-service support. Intuitive, easily accessed help articles, tutorials, and next-step resources are also available to educators. Each Imagine+ Screener customer has a Customer Success Manager who provides support and training throughout all stages of the customer journey. Professional development sessions are available at an additional cost. In addition to professional development/training, Imagine Learning maintains a robust Help Center of online resources as well as live customer and technical support seven days a week at no additional cost. Support services are available via phone, chat, and a web-based help desk. Standard support hours (Eastern Time) are Monday through Friday, 7:30 am to 9:30 pm, and Saturday and Sunday, 9:00 am to 5:00 pm. Imagine Learning systems are monitored 24 hours per day, 7 days per week, and our Network Operating Center may be reached at any time to report system issues.
Assessment Format:
Scoring Time:
  • Scoring is automatic OR
  • 5 minutes per student (to confirm RAN score)
Scores Generated:
  • Percentile score
  • IRT-based score
  • Probability
  • Subscale/subtest scores
  • Other: Teachers are able to see the breakdown of individual subtest skills as they pertain to the foundational skills of reading. The subtest scores are presented in the form of percentiles (which indicate how the child’s performance on the subtest compares to a nationally representative sample of students). In addition to a percentile score, the Letter Name and Letter Sound subtests (Kindergarten) also provide raw scores (number correct out of the total given).
Administration Time:
  • 30 minutes per student (on average, varies by grade)
Scoring Method:
  • Automatically (computer-scored)
  • Other: As soon as the student completes each subtest in the app, scores are automatically calculated and displayed on the teacher dashboard. The RAN subtest is an exception, as it is completed by the student in the app but then requires score confirmation (asynchronously, at the convenience of the educator, by listening to the recording). Optional supplemental paper-and-pencil subtests (Oral Reading Fluency, Reading Comprehension), if administered, must be scored by hand while the student is completing the task.
Technology Requirements:
  • Computer or tablet
  • Internet connection
Accommodations:
Imagine+ Screener has been designed to support a wide range of learners, with accommodating features including the following. Students may use the touchscreen, mouse, or keyboard, accommodating students with limited mobility skills. With the exception of the RAN and Oral Reading Fluency subtests, Imagine+ Screener is untimed, and the student or administrator can pause the game at any time for a break and resume it easily. Students can adjust the audio as needed and can use the assessment with noise-buffering headphones. The Imagine+ Screener can be administered individually or in small groups, as needed. The Imagine+ Screener was also designed to reduce assessment bias: as part of item development, all items were reviewed for bias and fairness, and during administration the screener uses voice recognition AI, so each student is evaluated consistently, free of administrator scoring bias or fatigue. Finally, because it is a fun, interactive game, students are relaxed and engaged and participate fully in the assessment, often without realizing they are being assessed, which reduces stress and the effects of test anxiety on performance.

Descriptive Information

Please provide a description of your tool:
The Imagine+ Screener, formerly known as the EarlyBird Dyslexia and Early Literacy Screener, is a gamified universal screener and comprehensive literacy assessment for students in grades PreK-2. Developed, tested, and scientifically validated by experts at Boston Children’s Hospital and the Florida Center for Reading Research, the screener combines the relevant predictors of dyslexia risk and word-reading potential into one child-centered, easy-to-administer assessment. Appropriate for both pre-readers and early readers, the Imagine+ Screener addresses the components of reading and targets literacy milestones predictive of subsequent reading success. The screener contains the tools necessary to easily and accurately identify students at risk of dyslexia and reading difficulties during the critical early years when intervention is most effective. The screener is self-administered in small groups with adult oversight at the beginning, middle, and end of the school year to assess word-reading performance against both low and moderate performance expectations. Students navigate the gamified assessment with guidance from Pip, their feathery friend, who leads them through a city adventure to meet various animal friends who introduce each subtest. This immersive design keeps the assessment engaging and accessible for young learners. Educator resources include a web-based dashboard that provides actionable and intuitive assessment data at the student, classroom, school, and district levels; a teacher intervention resource platform; and professional learning workshops that help teachers use collected data to drive instruction.
The tool is intended for use with the following grade(s).
selected Preschool / Pre - kindergarten
selected Kindergarten
selected First grade
selected Second grade
not selected Third grade
not selected Fourth grade
not selected Fifth grade
not selected Sixth grade
not selected Seventh grade
not selected Eighth grade
not selected Ninth grade
not selected Tenth grade
not selected Eleventh grade
not selected Twelfth grade

The tool is intended for use with the following age(s).
selected 0-4 years old
selected 5 years old
selected 6 years old
selected 7 years old
selected 8 years old
not selected 9 years old
not selected 10 years old
not selected 11 years old
not selected 12 years old
not selected 13 years old
not selected 14 years old
not selected 15 years old
not selected 16 years old
not selected 17 years old
not selected 18 years old

The tool is intended for use with the following student populations.
selected Students in general education
selected Students with disabilities
selected English language learners

ACADEMIC ONLY: What skills does the tool screen?

Reading
Phonological processing:
selected RAN
selected Memory
selected Awareness
selected Letter sound correspondence
selected Phonics
not selected Structural analysis

Word ID
selected Accuracy
not selected Speed

Nonword
selected Accuracy
not selected Speed

Spelling
selected Accuracy
not selected Speed

Passage
selected Accuracy
selected Speed

Reading comprehension:
selected Multiple choice questions
not selected Cloze
not selected Constructed Response
not selected Retell
not selected Maze
not selected Sentence verification
not selected Other (please describe):


Listening comprehension:
selected Multiple choice questions
not selected Cloze
not selected Constructed Response
not selected Retell
not selected Maze
not selected Sentence verification

Vocabulary
selected Expressive
selected Receptive

Mathematics
Global Indicator of Math Competence
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Early Numeracy
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematics Concepts
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematics Computation
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Mathematics Application
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Fractions/Decimals
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Algebra
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

Geometry
not selected Accuracy
not selected Speed
not selected Multiple Choice
not selected Constructed Response

not selected Other (please describe):

Please describe specific domain, skills or subtests:
Imagine+ Screener is a comprehensive assessment, with subtests in the critical skill areas related to the science of reading.
Kindergarten: Phonemic Awareness (First Sound Matching, Rhyming, Blending, Deletion, Nonword Repetition); Phonics (Letter Name, Letter Sound, Nonword Reading, Nonword Spelling); Fluency (Word Reading, Oral Reading Fluency, Object RAN); Vocabulary (Receptive Vocabulary, Word Matching); Comprehension (Oral Sentence Comprehension, Follow Directions, Reading Comprehension).
Grades 1 and 2: Phonemic Awareness (Nonword Repetition); Phonics (Nonword Reading, Nonword Spelling); Fluency (Oral Reading Fluency, Word Reading, Letter RAN); Vocabulary (Expressive Vocabulary, Word Matching); Comprehension (Reading Comprehension, Follow Directions).
BEHAVIOR ONLY: Which category of behaviors does your tool target?


BEHAVIOR ONLY: Please identify which broad domain(s)/construct(s) are measured by your tool and define each sub-domain or sub-construct.

Acquisition and Cost Information

Where to obtain:
Email Address
solutions@imaginelearning.com
Address
100 S Mill Ave., Suite 1700 Tempe, AZ 85281
Phone Number
877-725-4257
Website
https://www.imaginelearning.com/
Initial cost for implementing program:
Cost
$6.00
Unit of cost
student
Replacement cost per unit for subsequent use:
Cost
$6.00
Unit of cost
student
Duration of license
12 months
Additional cost information:
Describe basic pricing plan and structure of the tool. Provide information on what is included in the published tool, as well as what is not included but required for implementation.
The cost for the Imagine+ Screener is $6 per student annually. This provides students with online access to all virtual subtest assessments relevant to their grade level; staff access to the dashboard (with automatic scoring) and data reporting; downloadable supplemental paper/pencil subtests; intuitive, easily accessed help articles and tutorials for educators; student-specific reports for sharing results with families (available in multiple languages); At Home Activities to share with families; easy-access service and support; secure hosting; and all maintenance, upgrades, and enhancements for the duration of the license. Professional development is also available at an additional cost.
Provide information about special accommodations for students with disabilities.
Imagine+ Screener has been designed to support a wide range of learners, with accommodating features including the following. Students may use the touchscreen, mouse, or keyboard, accommodating students with limited mobility skills. With the exception of the RAN and Oral Reading Fluency subtests, Imagine+ Screener is untimed, and the student or administrator can pause the game at any time for a break and resume it easily. Students can adjust the audio as needed and can use the assessment with noise-buffering headphones. The Imagine+ Screener can be administered individually or in small groups, as needed. The Imagine+ Screener was also designed to reduce assessment bias: as part of item development, all items were reviewed for bias and fairness, and during administration the screener uses voice recognition AI, so each student is evaluated consistently, free of administrator scoring bias or fatigue. Finally, because it is a fun, interactive game, students are relaxed and engaged and participate fully in the assessment, often without realizing they are being assessed, which reduces stress and the effects of test anxiety on performance.

Administration

BEHAVIOR ONLY: What type of administrator is your tool designed for?
not selected General education teacher
not selected Special education teacher
not selected Parent
not selected Child
not selected External observer
not selected Other
If other, please specify:

What is the administration setting?
not selected Direct observation
not selected Rating scale
not selected Checklist
not selected Performance measure
not selected Questionnaire
not selected Direct: Computerized
not selected One-to-one
not selected Other
If other, please specify:

Does the tool require technology?
Yes

If yes, what technology is required to implement your tool? (Select all that apply)
selected Computer or tablet
selected Internet connection
not selected Other technology (please specify)

If your program requires additional technology not listed above, please describe the required technology and the extent to which it is combined with teacher small-group instruction/intervention:

What is the administration context?
selected Individual
selected Small group   If small group, n=24
not selected Large group   If large group, n=
selected Computer-administered
not selected Other
If other, please specify:

What is the administration time?
Time in minutes
30
per (student/group/other unit)
student (on average, varies by grade)

Additional scoring time:
Time in minutes
5
per (student/group/other unit)
student (to confirm RAN score)

ACADEMIC ONLY: What are the discontinue rules?
not selected No discontinue rules provided
not selected Basals
not selected Ceilings
selected Other
If other, please specify:
The majority of Imagine+ Screener tasks (with the exception of RAN, Letter Name, Letter Sound, Oral Reading Fluency, and Reading Comprehension) are based on computer adaptive algorithms that leverage an Item Response Theory (IRT) framework to optimally match students to items. Because IRT item difficulties and person ability estimates are co-located on the same scale, algorithms are able to move students through individual assessments according to their response on individual items within a task. Correct responses to items typically result in students being administered relatively more difficult items based on the student’s ability whereas incorrect responses to items typically result in students being administered relatively easier items based on the student’s ability. The advantage of Computer Adaptive Testing (CAT) is that the student receives items targeted to their unique ability estimate and tasks can be administered quickly to obtain reliable information. It allows for the most precise assessment within the shortest amount of time.
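To make the adaptive logic concrete, the sketch below shows one common way a CAT loop can be built on an IRT model. It is a minimal illustration only: the two-parameter logistic (2PL) model, the five-item bank, the maximum-information item selection, and the EAP ability update are assumptions for the example, not the Imagine+ Screener's proprietary algorithm or item parameters.

# Illustrative sketch only: a minimal 2PL computer-adaptive loop in the spirit of
# the IRT-based item selection described above. Item parameters, the ability
# update, and the stopping rule are hypothetical placeholders.
import numpy as np

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a**2 * p * (1 - p)

def eap_theta(responses, items, grid=np.linspace(-4, 4, 161)):
    """Expected a posteriori ability estimate under a standard-normal prior."""
    prior = np.exp(-0.5 * grid**2)
    like = np.ones_like(grid)
    for (a, b), u in zip(items, responses):
        p = p_correct(grid, a, b)
        like *= p if u == 1 else (1 - p)
    post = prior * like
    post /= post.sum()
    return float(np.sum(grid * post))

# Hypothetical item bank: (discrimination a, difficulty b)
bank = [(1.2, -1.5), (0.9, -0.5), (1.5, 0.0), (1.1, 0.7), (1.3, 1.4)]

theta, administered, responses = 0.0, [], []
for _ in range(3):  # administer three adaptive items in this toy example
    remaining = [i for i in range(len(bank)) if i not in administered]
    # choose the unadministered item with maximum information at the current theta
    nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
    administered.append(nxt)
    # simulate a response for illustration (real use would record the child's answer)
    responses.append(int(np.random.rand() < p_correct(theta, *bank[nxt])))
    theta = eap_theta(responses, [bank[i] for i in administered])
print("final theta estimate:", round(theta, 2))

In this toy loop, a correct response pushes the ability estimate up, so the next selected item tends to be harder; an incorrect response pulls it down, so easier items follow, which is the behavior the discontinue-rule description above relies on.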


Are norms available?
Yes
Are benchmarks available?
Yes
If yes, how many benchmarks per year?
3
If yes, for which months are benchmarks available?
Beginning of Year (BOY) is available August to November, Middle of Year (MOY) is December to February, and End of Year (EOY) is March to June
BEHAVIOR ONLY: Can students be rated concurrently by one administrator?
If yes, how many students can be rated concurrently?

Training & Scoring

Training

Is training for the administrator required?
Yes
Describe the time required for administrator training, if applicable:
Because Imagine+ Screener is an intuitive and easy-to-understand game, a brief 30-45 minute virtual or on-site training is all that is required for staff participating in screener administration.
Please describe the minimum qualifications an administrator must possess.
The app is designed to be self-explanatory and easy for children to understand, so it can be administered by any adult, with no minimum qualifications or special training required. The Letter Name, Letter Sound, Oral Reading Fluency, and Reading Comprehension subtests (paper-and-pencil supplemental subtests outside of the app) require the administrator to read the directions for the student and score the student's responses in real time.
selected No minimum qualifications
Are training manuals and materials available?
Yes
Are training manuals/materials field-tested?
Yes
Are training manuals/materials included in cost of tools?
Yes
If No, please describe training costs:
Can users obtain ongoing professional and technical support?
Yes
If Yes, please describe how users can obtain support:
Imagine+ Screener provides full-service support. Intuitive, easily accessed help articles, tutorials, and next-step resources are also available to educators. Each Imagine+ Screener customer has a Customer Success Manager who provides support and training throughout all stages of the customer journey. Professional development sessions are available at an additional cost. In addition to professional development/training, Imagine Learning maintains a robust Help Center of online resources as well as live customer and technical support seven days a week at no additional cost. Support services are available via phone, chat, and a web-based help desk. Standard support hours (Eastern Time) are Monday through Friday, 7:30 am to 9:30 pm, and Saturday and Sunday, 9:00 am to 5:00 pm. Imagine Learning systems are monitored 24 hours per day, 7 days per week, and our Network Operating Center may be reached at any time to report system issues.

Scoring

How are scores calculated?
not selected Manually (by hand)
selected Automatically (computer-scored)
selected Other
If other, please specify:
As soon as the student completes each subtest in the app, scores are automatically calculated and displayed on the teacher dashboard. The RAN subtest is an exception, as it is completed by the student in the app but then requires score confirmation (asynchronously, at the convenience of the educator, by listening to the recording). Optional supplemental paper-and-pencil subtests (Oral Reading Fluency, Reading Comprehension), if administered, must be scored by hand while the student is completing the task.

Do you provide basis for calculating performance level scores?
Yes
What is the basis for calculating performance level and percentile scores?
not selected Age norms
selected Grade norms
not selected Classwide norms
not selected Schoolwide norms
not selected Stanines
not selected Normal curve equivalents

What types of performance level scores are available?
not selected Raw score
not selected Standard score
selected Percentile score
not selected Grade equivalents
selected IRT-based score
not selected Age equivalents
not selected Stanines
not selected Normal curve equivalents
not selected Developmental benchmarks
not selected Developmental cut points
not selected Equated
selected Probability
not selected Lexile score
not selected Error analysis
not selected Composite scores
selected Subscale/subtest scores
selected Other
If other, please specify:
Teachers are able to see the breakdown of individual subtest skills as they pertain to the foundational skills of reading. The subtest scores are presented in the form of percentiles (which indicate how the child’s performance on the subtest compares to a nationally representative sample of students). In addition to a percentile score, the Letter Name and Letter Sound subtests (Kindergarten) also provide raw scores (number correct out of the total given).

Does your tool include decision rules?
Yes
If yes, please describe.
Kindergarten: The Dyslexia Risk score uses predictive algorithms and cut points to classify students into the categories of ‘at risk’ or ‘not at risk,’ based on performance on an outcome measure. Because it uses a cut point based on the 16th percentile on the outcome measure, the Imagine+ Dyslexia Screener can be used to help teachers identify which students are in need of further assessment/intervention.
Grades 1 and 2: The Potential for Word Reading score is the probability, expressed as a percentage, that reflects the likelihood at each testing period that a student will reach grade-level expectations in word reading by the end of the year, presuming the student does not receive appropriate, evidence-based remediation. It is determined by an algorithm that was created through a multifactorial analysis of all available subtests for each grade and time period, and driven by those subtests statistically determined to be most predictive of end-of-year reading outcomes in each grade. Low performance (low potential for word reading), for the purposes of this analysis, is defined as likely scoring at or below the 16th percentile on the KTEA-3 Letter and Word Recognition subtest at end of year. Because low performance uses a cut point based on the 16th percentile on the outcome measure, the screener can be used to help teachers identify which students are in need of further assessment/intervention.
The concept of risk or success can be viewed in many ways, including as a “percent chance,” a number between 1 and 99, with 1 meaning there is a low chance that a student will develop a problem and 99 meaning there is a high chance that the student will develop a problem. When attempting to identify children who are “at risk” for poor performance on some future measure of reading achievement, this is typically a yes/no decision based upon a “cut point” along a continuum of risk. Decisions concerning appropriate cut points are made based on the level of correct classification that is desired from the screening assessments. A variety of statistics may be used to guide such choices (e.g., sensitivity, specificity, positive and negative predictive power; see Schatschneider, Petscher & Williams, 2008), and each was considered in light of the others in choosing appropriate cut points.
Can you provide evidence in support of multiple decision rules?
Yes
If yes, please describe.
Kindergarten: The Imagine+ Dyslexia and Early Literacy assessment yields two predictive profile scores. The Dyslexia Screener Risk Score can be used to identify students needing intensive intervention, as it uses a cut point based on the 16th percentile on the outcome measure. The second Imagine+ predictive profile, called Potential for Word Reading (PWR), uses a cut point based on the 40th percentile on the outcome measure. For the kindergarten validation study, students’ performance was coded as ‘1’ for scores at or above the 40th percentile on the SAT10 for PWR, and ‘0’ for scores that did not meet this criterion. The PWR predictive profile represents a prediction of success, indicating the probability that the student will reach grade-level expectations by end of year without appropriate intervention. If a student is “flagged” for the PWR predictive profile, the student would likely benefit from intervention in certain foundational skills related to literacy, as indicated by low percentiles on one or more subtests. Additional information can be found in the Imagine+ Dyslexia and Early Literacy Technical Manual.
Grades 1 and 2: Predicted Potential for Word Reading performance is displayed in the form of a percent and categorized into one of three groups: high performance, moderate performance, and low performance. High performance, for the purposes of this analysis, is defined as likely scoring above the 40th percentile on the KTEA-3 Letter and Word Recognition subtest at end of year. Low performance is defined as likely scoring at or below the 16th percentile on the same subtest at end of year. Moderate performance includes all students who do not meet either of the criteria for the high or low performance categories. A student in the moderate performance category would likely benefit from intervention in certain foundational skills related to literacy, as indicated by low percentiles on one or more subtests. Additional information can be found in the Imagine+ Screener Grade 1 & 2 Dyslexia and Early Literacy Technical Manual.
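One way the two predictive profiles can translate into the displayed categories is sketched below. This is a minimal illustration under assumptions: the probability thresholds and the order in which the two model probabilities are checked are hypothetical placeholders, not the Imagine+ Screener's published decision rule.

# Illustrative sketch only. The category labels mirror the description above
# (high > 40th percentile criterion, low <= 16th percentile criterion, moderate
# otherwise); the probability thresholds are hypothetical placeholders.
def pwr_category(p_high: float, p_low: float,
                 high_threshold: float = 0.5, low_threshold: float = 0.5) -> str:
    """Assign a Potential for Word Reading category from two model probabilities.

    p_high: predicted probability of scoring above the 40th percentile on the
            end-of-year criterion (a prediction of success).
    p_low:  predicted probability of scoring at or below the 16th percentile
            (a prediction of severe risk).
    """
    if p_low >= low_threshold:
        return "low performance"
    if p_high >= high_threshold:
        return "high performance"
    return "moderate performance"

# Example: a student with a 12% chance of clearing the 40th percentile and a
# 61% chance of falling at or below the 16th percentile would be flagged as low.
print(pwr_category(p_high=0.12, p_low=0.61))  # -> "low performance"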
Please describe the scoring structure. Provide relevant details such as the scoring format, the number of items overall, the number of items per subscale, what the cluster/composite score comprises, and how raw scores are calculated.
Imagine+ Screener subtests (with the exception of RAN, Letter Name, Letter Sound, Oral Reading Fluency, and Reading Comprehension) are computer adaptive, based on item response theory (IRT). After practice questions, each subtest presents a set of initial items, which span a developmentally appropriate range of difficulty and are presented in a random order. These fixed items calibrate the child’s initial ability level and allow the computer adaptive algorithm to present additional items to further pinpoint the child’s ability score. This ensures that fewer questions are asked overall, and all are within an appropriate level of difficulty for the individual child. Once a subtest has been completed by the student, it is scored automatically, generating a raw score that corresponds to a final theta ability estimate. The final theta is converted to a normed percentile, which is displayed on the Imagine+ Screener Report and indicates how the child’s performance on the subtest compares to a nationally representative sample of students. The theta-to-percentile conversion tables are updated periodically to reflect the most recent representative sample of student scores available. Percentiles in the lowest two quintiles are highlighted blue (light blue for < 40th percentile; dark blue for < 20th percentile), which gives teachers a quick visual summary of each child’s areas of strength and need, as well as those of the classroom as a whole. In addition to the individual subtest percentiles, weighted algorithms of a subset of subtests yield a Potential for Word Reading (PWR) score, which indicates the probability that the student will reach grade-level expectations in word reading by the end of the year without appropriate intervention. The PWR score identifies which students are in need of further diagnostic testing and/or intervention. Subtest percentiles and the PWR score appear on the classroom and student dashboard views, clearly labeled and defined for educators. Subtests are grouped on the dashboard according to broader categories corresponding to the science of reading: Phonemic Awareness, Phonics, Fluency, Vocabulary, and Comprehension.
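The theta-to-percentile step and the dashboard highlight rule can be illustrated with a short sketch. The norming sample below is simulated and the lookup is a stand-in; the actual conversion tables are proprietary and, as noted above, updated periodically.

# Illustrative sketch only: converting a final theta estimate to a normed
# percentile via an empirical lookup, then applying the highlight rule described
# above (< 40th percentile light blue, < 20th percentile dark blue).
import numpy as np

rng = np.random.default_rng(0)
norm_sample_thetas = rng.normal(loc=0.0, scale=1.0, size=5000)  # stand-in norm sample

def theta_to_percentile(theta: float, norm_thetas: np.ndarray) -> int:
    """Percent of the norming sample scoring at or below this theta."""
    return int(round(100.0 * np.mean(norm_thetas <= theta)))

def highlight(percentile: int) -> str:
    if percentile < 20:
        return "dark blue"
    if percentile < 40:
        return "light blue"
    return "none"

pct = theta_to_percentile(-0.9, norm_sample_thetas)
print(pct, highlight(pct))  # prints something like "18 dark blue" for this simulated norm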
Describe the tool’s approach to screening, samples (if applicable), and/or test format, including steps taken to ensure that it is appropriate for use with culturally and linguistically diverse populations and students with disabilities.
The Imagine+ Screener assessment addresses literacy milestones in pre-readers that have been found to be predictive of subsequent reading success. Imagine+ Screener includes screening for dyslexia as well as other reading challenges, with analyses providing “predictive profiles” as well as subtest-specific diagnostic norms for each child. Imagine+ Screener is designed for benchmarking three times per year: Beginning of Year (BOY) July to November, Middle of Year (MOY) December to February, and End of Year (EOY) March to June. The subtest battery and risk algorithms are adapted to each time of year to accommodate child development and expectations.
Multiple validation studies, involving nationally representative samples of students (from multiple geographic regions of the United States, attending a mix of public, private, and charter schools), have been designed and carried out to establish construct, predictive, and concurrent validity, as well as to create normed percentiles for each subtest. The samples included students with and without a familial history of diagnosed or suspected dyslexia and a range of socioeconomic backgrounds (as determined by the percentage of students receiving free or reduced-price lunch at the participating schools). In terms of race and ethnicity, every attempt has been made to ensure that samples closely reflect U.S. census data.
The gamified aspects of Imagine+ Screener were designed by a group of experts at the Massachusetts Institute of Technology (MIT) to be developmentally appropriate for pre-readers and early readers alike. Teachers report that children are engaged when using Imagine+ Screener and that the directions are simple, clear, and age-appropriate. This is by design. Additionally, the game is set in an urban setting, with only animal characters, so that it is broadly appealing and widely understood across diverse populations. Finally, the game was designed with instructional best practices: provide instruction, model the activity, and allow for hands-on practice before assessment begins. At the beginning of the Imagine+ Screener assessment, the child is asked, “Are you ready to go on an adventure?” The child is then shown a map of a cartoon city and introduced to a new feathery friend, Pip, who will join them on their journey. The narrator explains that the child will meet more animal friends (located at fixed points along the path shown on the map), play games with them, and collect prizes on the way to the final destination. Each “game” is a subtest, associated with a different animal friend. When the child selects the icon, the animal friend greets the child and explains how to do the task with the help of animated visuals. Next, Pip demonstrates the task. For most subtests, the child then attempts 1-2 practice questions, for which corrective feedback is provided.
Accommodations, in many cases, are not needed with Imagine+ Screener. Because reading is not required to play the Imagine+ Screener game, English Language Learners (Level 1 or above) who can understand the spoken directions in English generally do well with Imagine+ Screener. They can also be assisted by translators who explain the directions for each subtest. Districts that have used Imagine+ Screener with their Dual Language Learners report that they get highly valued information early in the year that they could not get any other way. Students with behavioral challenges also respond well to Imagine+ Screener. Since it looks and feels like a gentle game, it captures the attention of some children who can be harder to assess, including students on the autism spectrum. Because the game is adaptive, students are assessed at an appropriate level for their ability. Imagine+ Screener uses an AI speech recognition engine to assess students’ verbal responses, thus reducing implicit bias in assessment. Imagine+ Screener provides teachers with a much more comprehensive picture of a student’s reading proficiency at a much earlier age. AI technology automatically scores the child’s responses so that the score is available for immediate reporting.

Technical Standards

Classification Accuracy & Cross-Validation Summary

Grade | Kindergarten | Grade 1 | Grade 2
Classification Accuracy Fall | Partially convincing evidence | Convincing evidence | Convincing evidence
Classification Accuracy Winter | Partially convincing evidence | Convincing evidence | Convincing evidence
Classification Accuracy Spring | Data unavailable | Convincing evidence | Convincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available

Kaufman Test of Educational Achievement (KTEA) - Phonological Processing subtest

Classification Accuracy

Select time of year
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
The Kaufman Test of Educational Achievement, Third Edition (KTEA–3 Comprehensive Form; Kaufman & Kaufman, 2014) is an individually administered measure of academic achievement for students in pre-kindergarten through grade 12+ (ages 4 through 25 years). The KTEA-3 has 19 subtests, one of which is named “Phonological Processing.” The student responds orally to items that require manipulation of sounds. The following skills are included in this subtest: Rhyming, Sound Matching, Blending and Segmenting Phonemes, and Deleting Sounds. While the items involve phonological processing, phonological awareness, and phonemic awareness (often referred to collectively as phonological skills), the publisher (Pearson) named the subtest “Phonological Processing.” In general, phonological skills can be defined as understanding that words consist of smaller sound units (syllables, phonemes) and being able to manipulate these smaller units. This ability to manipulate the sounds of one's oral language enables emergent readers to analyze the phonological structure of a word and subsequently link it to the corresponding orthographic and lexical-semantic features, which establishes and facilitates word recognition. Phonological skills can be measured using phonological processing and phonological and phonemic awareness tasks. Additionally, the KTEA-3 manual emphasizes a strong relationship between the Phonological Processing subtest and word reading and decoding skills. More specifically, the KTEA-3 manual reports correlations of around 0.6 between the Phonological Processing subtest and the Reading Comprehension and Letter and Word Recognition subtests in pre-K (page 149, Table B.1) as well as kindergarten (page 150, Table B.2). Furthermore, in grade 1, the Phonological Processing subtest correlates with the Nonword Decoding, Letter and Word Recognition, and Reading Comprehension subtests (all above 0.6). Correlations with general ability measures are reported to be much lower (around 0.2-0.3; page 74, Table 2.16), emphasizing the relationship between Phonological Processing and measures of reading and decoding. With regard to children with reading disabilities, the manual reports a significant difference on the Phonological Processing subtest between children with and without a reading disorder, with children with reading disabilities exhibiting significantly lower scores (page 80, Table 2.18). Many of the skill areas covered by the Phonological Processing subtest of the KTEA-3 are also addressed through subtests in the Imagine+ Screener (e.g., rhyming, first sound matching, blending, deletion), although the Imagine+ Screener subtests and items within those subtests were developed independently of the KTEA-3. Kaufman, A. S., & Kaufman, N. L. (2014). Kaufman Test of Educational Achievement, Third Edition. Bloomington, MN: NCS Pearson.
Do the classification accuracy analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Logistic regressions were used, in part, to calculate classification accuracy. Students’ performance on the selected criterion measure was coded as ‘1’ for performance below the 16th percentile on the KTEA-3 Phonological Processing subtest for the Dyslexia Risk flag, and ‘0’ for scores that did not meet this criterion. In this way, the Dyslexia flag is a prediction of risk. The dichotomous variable was then regressed on a combination of Imagine+ Screener subtest scores. As such, students could be identified as not at risk on the multifactorial combination of screening tasks via the joint probability and demonstrating adequate performance on the criterion (i.e., specificity or true negatives), at risk on the combination of screening task scores via the joint probability and not demonstrating adequate performance on the criterion (i.e., sensitivity or true positives), not at risk based on the combination of screening task scores but at risk on the criterion (i.e., false negative error), or at risk on the combination of screening task scores but not at risk on the criterion (i.e., false positive error). Classification of students into these categories allows for the evaluation of cut points on the combination of screening tasks to determine which cut point maximized the selected indicators. The concept of risk or success can be viewed in many ways, including as a “percent chance,” a number between 1 and 99, with 1 meaning there is a low chance that a student may develop a problem and 99 meaning there is a high chance that the student may develop a problem. When attempting to identify children who are “at risk” for poor performance on some future measure of reading achievement, Imagine+ Screener uses a yes/no decision based upon a “cut point” along a continuum of risk. Decisions concerning appropriate cut points are made based on the level of correct classification that is desired from the screening assessments. A variety of statistics may be used to guide such choices (e.g., sensitivity, specificity, positive and negative predictive power; see Schatschneider, Petscher & Williams, 2008), and each was considered in light of the others in choosing appropriate cut points. Area under the curve, sensitivity, and specificity estimates from the final logistic regression model were bootstrapped 1,000 times in order to obtain a 95% confidence interval of scores using the cutpointr package in R statistical software.
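The published analyses were run with the cutpointr package in R, as noted above. The sketch below is a Python analogue on simulated data, shown only to illustrate the general shape of such a procedure: dichotomize the criterion at the 16th percentile, regress it on a combination of subtest scores, choose a probability cut point, and bootstrap the AUC. The data, model weights, and cut-point search are placeholders, not the actual pipeline.

# Illustrative sketch only (simulated data; not the published analysis).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 600
subtests = rng.normal(size=(n, 4))                       # simulated subtest scores
criterion = subtests @ [0.8, 0.6, 0.4, 0.3] + rng.normal(scale=1.0, size=n)
at_risk = (criterion <= np.percentile(criterion, 16)).astype(int)  # '1' = at/below 16th pct

model = LogisticRegression().fit(subtests, at_risk)
prob = model.predict_proba(subtests)[:, 1]

# choose the probability cut point that maximizes sensitivity + specificity (Youden's J)
cuts = np.linspace(0.01, 0.99, 99)
def sens_spec(cut):
    pred = (prob >= cut).astype(int)
    tp = np.sum((pred == 1) & (at_risk == 1)); fn = np.sum((pred == 0) & (at_risk == 1))
    tn = np.sum((pred == 0) & (at_risk == 0)); fp = np.sum((pred == 1) & (at_risk == 0))
    return tp / (tp + fn), tn / (tn + fp)
best_cut = max(cuts, key=lambda c: sum(sens_spec(c)) - 1)

# bootstrap a 95% confidence interval for the AUC (1,000 resamples)
aucs = []
for _ in range(1000):
    idx = rng.integers(0, n, n)
    if at_risk[idx].min() == at_risk[idx].max():
        continue  # skip degenerate resamples containing a single class
    aucs.append(roc_auc_score(at_risk[idx], prob[idx]))
print("AUC:", round(roc_auc_score(at_risk, prob), 2),
      "95% CI:", np.round(np.percentile(aucs, [2.5, 97.5]), 2),
      "cut point:", round(best_cut, 2))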
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
Yes
If yes, please describe the intervention, what children received the intervention, and how they were chosen.
Imagine+ Screener studies did not include an intervention component and Imagine+ Screener did not collect data related to intervention services conducted at the participating schools. That said, because the samples were comprised of children demonstrating a wide range of performance in terms of literacy-related skills, it is likely that some children included in the samples may have received intervention in addition to classroom instruction between administration of the screening measure and outcome assessment.

Cross-Validation

Has a cross-validation study been conducted?
No
If yes,
Select time of year.
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
Do the cross-validation analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest

Classification Accuracy

Select time of year
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
The Kaufman Test of Educational Achievement, Third Edition (KTEA–3 Comprehensive Form; Kaufman & Kaufman, 2014) is an individually administered measure of academic achievement for students in pre-kindergarten through grade 12+ (ages 4 through 25 years). The KTEA-3 has 19 subtests, one of which is named “Letter and Word Recognition.” The student orally names letters and reads real English words in isolation. The skills covered in the Letter and Word Recognition subtest of the KTEA-3 are also addressed through subtests in the Imagine+ Screener assessment (e.g., the Letter Name and Word Reading subtests), but the Imagine+ Screener subtests and the items within those subtests were developed independently of the KTEA-3.
Do the classification accuracy analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the classification analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Logistic regressions were used, in part, to calculate classification accuracy. Grade 1 and Grade 2 study participants’ performance on the selected criterion measure was coded as ‘1’ for performance above the 40th percentile on the KTEA-3 Letter and Word Recognition subtest (for "high performance" PWR) or at or below the 16th percentile on the KTEA-3 Letter and Word Recognition subtest (for "low performance" PWR), and ‘0’ for scores that did not meet these criteria. In this way, the "high performance" PWR represents a prediction of success and the "low performance" PWR a prediction of severe risk. Each dichotomous variable was then regressed on a combination of Imagine+ Screener subtest scores. As such, students could be identified as not at risk on the multifactorial combination of screening tasks via the joint probability and demonstrating adequate performance on the criterion (i.e., specificity or true negatives), at risk on the combination of screening task scores via the joint probability and not demonstrating adequate performance on the criterion (i.e., sensitivity or true positives), not at risk based on the combination of screening task scores but at risk on the criterion (i.e., false negative error), or at risk on the combination of screening task scores but not at risk on the criterion (i.e., false positive error). Classification of students into these categories allows for the evaluation of cut points on the combination of screening tasks to determine which cut point maximized the selected indicators. The concept of risk or success can be viewed in many ways, including as a “percent chance,” a number between 1 and 99, with 1 meaning there is a low chance that a student may develop a problem and 99 meaning there is a high chance that the student may develop a problem. When attempting to identify children who are “at risk” for poor performance on some future measure of reading achievement, Imagine+ Screener reports the likelihood of reading success both in terms of a percent and by classifying students into categories based on their performance.
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
Yes
If yes, please describe the intervention, what children received the intervention, and how they were chosen.
The studies did not include an intervention component and Imagine+ Screener did not collect data related to intervention services conducted at the participating schools. That said, because the samples were comprised of children demonstrating a wide range of performance in terms of literacy-related skills, it is likely that some children included in the samples may have received intervention in addition to classroom instruction between administration of the screening measure and outcome assessment.

Cross-Validation

Has a cross-validation study been conducted?
No
If yes,
Select time of year.
Describe the criterion (outcome) measure(s) including the degree to which it/they is/are independent from the screening measure.
Do the cross-validation analyses examine concurrent and/or predictive classification?

Describe when screening and criterion measures were administered and provide a justification for why the method(s) you chose (concurrent and/or predictive) is/are appropriate for your tool.
Describe how the cross-validation analyses were performed and cut-points determined. Describe how the cut points align with students at-risk. Please indicate which groups were contrasted in your analyses (e.g., low risk students versus high risk students, low risk students versus moderate risk students).
Were the children in the study/studies involved in an intervention in addition to typical classroom instruction between the screening measure and outcome assessment?
If yes, please describe the intervention, what children received the intervention, and how they were chosen.

Classification Accuracy - Fall

Evidence Kindergarten Grade 1 Grade 2
Criterion measure Kaufman Test of Educational Achievement (KTEA) - Phonological Processing subtest Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest
Cut Points - Percentile rank on criterion measure 16 16 16
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure
Classification Data - True Positive (a) 24 80 75
Classification Data - False Positive (b) 29 93 63
Classification Data - False Negative (c) 7 13 8
Classification Data - True Negative (d) 124 418 377
Area Under the Curve (AUC) 0.85 0.93 0.94
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.80 0.90 0.92
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.90 0.95 0.96
Statistics Kindergarten Grade 1 Grade 2
Base Rate 0.17 0.15 0.16
Overall Classification Rate 0.80 0.82 0.86
Sensitivity 0.77 0.86 0.90
Specificity 0.81 0.82 0.86
False Positive Rate 0.19 0.18 0.14
False Negative Rate 0.23 0.14 0.10
Positive Predictive Power 0.45 0.46 0.54
Negative Predictive Power 0.95 0.97 0.98
Sample Kindergarten Grade 1 Grade 2
Date August - November 2019 September-November 2019, 2020, 2021, 2022 September-November 2019, 2020, 2021, 2022
Sample Size 184 604 523
Geographic Representation
  Kindergarten: Middle Atlantic (NY, PA), Mountain (MT), New England (MA, RI), West North Central (MO), West South Central (LA, TX)
  Grade 1: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
  Grade 2: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
Male: Kindergarten 48.4%; Grade 1 and Grade 2 not reported
Female: Kindergarten 50.0%; Grade 1 and Grade 2 not reported
Other (gender): not reported
Gender Unknown: Kindergarten 1.6%; Grade 1 and Grade 2 not reported
White, Non-Hispanic: Kindergarten 73.4%; Grade 1 38.2%; Grade 2 34.6%
Black, Non-Hispanic: Kindergarten 6.5%; Grade 1 34.6%; Grade 2 30.8%
Hispanic: Kindergarten 9.2%; Grade 1 17.4%; Grade 2 25.6%
Asian/Pacific Islander: Kindergarten 0.5%; Grade 1 4.6%; Grade 2 4.0%
American Indian/Alaska Native: not reported
Other (race/ethnicity): Kindergarten 8.7%; Grade 1 4.8%; Grade 2 4.2%
Race / Ethnicity Unknown: Kindergarten 1.6%; Grade 1 0.3%; Grade 2 0.8%
Low SES: Kindergarten not reported; Grade 1 57.0%; Grade 2 47.6%
IEP or diagnosed disability: Kindergarten 6.5%; Grade 1 and Grade 2 not reported
English Language Learner: not reported
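The summary statistics reported above follow arithmetically from the 2x2 classification counts. As a worked check (not part of the source submission), the short sketch below reproduces the Kindergarten fall values from TP=24, FP=29, FN=7, TN=124.

# Worked check: derive the summary statistics from the 2x2 classification counts.
def classification_stats(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    return {
        "base rate": (tp + fn) / n,
        "overall classification rate": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false positive rate": fp / (fp + tn),
        "false negative rate": fn / (fn + tp),
        "positive predictive power": tp / (tp + fp),
        "negative predictive power": tn / (tn + fn),
    }

for name, value in classification_stats(tp=24, fp=29, fn=7, tn=124).items():
    print(f"{name}: {value:.2f}")  # matches the Kindergarten fall column above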

Classification Accuracy - Winter

Evidence Kindergarten Grade 1 Grade 2
Criterion measure Kaufman Test of Educational Achievement (KTEA) - Phonological Processing subtest Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest
Cut Points - Percentile rank on criterion measure 16 16 16
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure
Classification Data - True Positive (a) 24 84 76
Classification Data - False Positive (b) 29 88 61
Classification Data - False Negative (c) 7 9 7
Classification Data - True Negative (d) 124 423 379
Area Under the Curve (AUC) 0.85 0.95 0.95
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.80 0.92 0.93
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.90 0.97 0.97
Statistics Kindergarten Grade 1 Grade 2
Base Rate 0.17 0.15 0.16
Overall Classification Rate 0.80 0.84 0.87
Sensitivity 0.77 0.90 0.92
Specificity 0.81 0.83 0.86
False Positive Rate 0.19 0.17 0.14
False Negative Rate 0.23 0.10 0.08
Positive Predictive Power 0.45 0.49 0.55
Negative Predictive Power 0.95 0.98 0.98
Sample Kindergarten Grade 1 Grade 2
Date August - November 2019 December-February 2019, 2020, 2021, 2022, 2023 December-February 2019, 2020, 2021, 2022, 2023
Sample Size 184 604 523
Geographic Representation
  Kindergarten: Middle Atlantic (NY, PA), Mountain (MT), New England (MA, RI), West North Central (MO), West South Central (LA, TX)
  Grade 1: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
  Grade 2: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
Male: Kindergarten 48.4%; Grade 1 and Grade 2 not reported
Female: Kindergarten 50.0%; Grade 1 and Grade 2 not reported
Other (gender): not reported
Gender Unknown: Kindergarten 1.6%; Grade 1 and Grade 2 not reported
White, Non-Hispanic: Kindergarten 73.4%; Grade 1 38.2%; Grade 2 34.6%
Black, Non-Hispanic: Kindergarten 6.5%; Grade 1 34.6%; Grade 2 30.8%
Hispanic: Kindergarten 9.2%; Grade 1 17.4%; Grade 2 25.6%
Asian/Pacific Islander: Kindergarten 0.5%; Grade 1 4.6%; Grade 2 4.0%
American Indian/Alaska Native: not reported
Other (race/ethnicity): Kindergarten 8.7%; Grade 1 4.8%; Grade 2 4.2%
Race / Ethnicity Unknown: Kindergarten 1.6%; Grade 1 0.3%; Grade 2 0.8%
Low SES: Kindergarten not reported; Grade 1 57.0%; Grade 2 47.6%
IEP or diagnosed disability: Kindergarten 6.5%; Grade 1 and Grade 2 not reported
English Language Learner: not reported

Classification Accuracy - Spring

Evidence Grade 1 Grade 2
Criterion measure Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest Kaufman Test of Educational Achievement (KTEA) - Letter and Word Recognition subtest
Cut Points - Percentile rank on criterion measure 16 16
Cut Points - Performance score on criterion measure
Cut Points - Corresponding performance score (numeric) on screener measure
Classification Data - True Positive (a) 46 74
Classification Data - False Positive (b) 50 47
Classification Data - False Negative (c) 7 8
Classification Data - True Negative (d) 240 394
Area Under the Curve (AUC) 0.92 0.95
AUC Estimate’s 95% Confidence Interval: Lower Bound 0.88 0.92
AUC Estimate’s 95% Confidence Interval: Upper Bound 0.97 0.97
Statistics Grade 1 Grade 2
Base Rate 0.15 0.16
Overall Classification Rate 0.83 0.89
Sensitivity 0.87 0.90
Specificity 0.83 0.89
False Positive Rate 0.17 0.11
False Negative Rate 0.13 0.10
Positive Predictive Power 0.48 0.61
Negative Predictive Power 0.97 0.98
Sample Grade 1 Grade 2
Date March-April 2020, 2021, 2022, 2023 March-April 2020, 2021, 2022, 2023
Sample Size 343 523
Geographic Representation
  Grade 1: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
  Grade 2: New England (MA), Pacific (OR), South Atlantic (FL, GA, SC)
Male: not reported
Female: not reported
Other (gender): not reported
Gender Unknown: not reported
White, Non-Hispanic: Grade 1 38.2%; Grade 2 34.6%
Black, Non-Hispanic: Grade 1 34.7%; Grade 2 30.8%
Hispanic: Grade 1 17.5%; Grade 2 25.6%
Asian/Pacific Islander: Grade 1 4.7%; Grade 2 4.0%
American Indian/Alaska Native: not reported
Other (race/ethnicity): Grade 1 4.7%; Grade 2 4.2%
Race / Ethnicity Unknown: Grade 1 0.3%; Grade 2 0.8%
Low SES: Grade 1 56.9%; Grade 2 47.6%
IEP or diagnosed disability: not reported
English Language Learner: not reported

Reliability

Grade | Kindergarten | Grade 1 | Grade 2
Rating | Convincing evidence | Convincing evidence | Convincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Offer a justification for each type of reliability reported, given the type and purpose of the tool.
Marginal reliability is an appropriate model-based measure of reliability to use, given that most Imagine+ Screener subtests are computer adaptive and use Item Response Theory (IRT). Reliability is reported for the subtests that contribute to the Potential for Word Reading score used for screening. Reliability describes how consistent test scores will be across multiple administrations over time, as well as how well one form of the test relates to another. Because the Imagine+ Screener uses IRT as its measurement framework, reliability takes on a different meaning than it does from a Classical Test Theory (CTT) perspective. The biggest difference between the two approaches is the assumption made about the measurement error related to the test scores. CTT treats the error variance as being the same for all scores, whereas the IRT view is that the level of error depends on the ability of the individual. As such, reliability in IRT becomes more about the precision of measurement across the ability range. Although it is often more useful to graphically represent the standard error across ability levels to gauge for what range of abilities the test is more or less informative, it is possible to estimate marginal reliability through a calculation. We also report reliability based on a k-fold cross-validation of the pass/fail decisions at each cut point for the Dyslexia Risk Score and Potential for Word Reading. This reliability method is the most appropriate for our screening tool because the risk classification is a logistic-regression-based risk prediction that estimates the predicted, person-level log-odds of a dichotomized status on the standardized word reading outcome. This differs from many traditional screeners that use a single composite scale score with a risk cut score and hence report more traditional calculations of reliability.
*Describe the sample(s), including size and characteristics, for each reliability analysis conducted.
Kindergarten: Marginal reliability for the Rhyming, First Sound Matching, Nonword Repetition, and Vocabulary kindergarten subtests was estimated using a representative sample of 419 kindergarten students in 19 schools across eight states in every region of the country (MT, MO, MA, NY, LA, PA, RI, and TX) who took the Imagine+ Screener assessment between August and November 2019. The sample was 75.5% White, 12.92% Black or African American, 4.9% Asian, 2.45% American Indian, and 0.89% Native Hawaiian / Pacific Islander; 12.22% identified as Hispanic, and 3.34% did not respond. For the rest of the kindergarten subtests (Letter Name, Letter Sound, Blending, Deletion, Word Reading, Word Matching, Follow Directions, and Oral Sentence Comprehension), a statewide representative sample of kindergarten students that roughly reflected Florida's demographic diversity and academic ability (N ~ 2,400) was collected as part of a larger K-2 validation and linking study. Because the samples used for data collection did not strictly adhere to the state distribution of demographics (i.e., percent limited English proficiency, Black, White, Latino, and eligible for free/reduced lunch), sample weights according to student demographics were used.

Grades 1 and 2: Marginal reliability was estimated for the screener task in the fall, winter, and spring for Grade 1 and Grade 2 students. Samples were collected from 23 schools for Grade 1 and 21 schools for Grade 2 across 5 states (FL, SC, GA, OR, MA), spanning multiple geographic regions. The fall Grade 1 sample had 447 students; the winter Grade 1 sample had 627 students; the spring Grade 1 sample had 270 students; the fall Grade 2 sample had 288 students; the winter Grade 2 sample had 509 students; and the spring Grade 2 sample had 257 students.
*Describe the analysis procedures for each reported type of reliability.
An estimate of reliability, known as marginal reliability (Sireci, Thissen, & Wainer, 1991), was calculated using the variance of ability together with the mean squared error. The formula and additional information about the procedure are available in the Imagine+ Screener Dyslexia and Early Literacy Assessment Technical Manuals. We fitted single- and multiple-predictor logistic regression models to predict pass/fail status from the set of screening predictors, then evaluated decision-level reliability with a 20-fold stratified cross-validation. For each fold, we refit the model on 19/20 of the sample, generated predicted probabilities for all students in the sample, and converted those probabilities to a binary risk flag using a fixed log-odds threshold. We then compared every pair of fold-specific classifications, calculating Cohen's κ to quantify how consistently the screen would label the same child as at-risk across independent replications of the modeling process. The 20-fold cross-validated pass/fail decision yielded a mean Cohen's κ of > .95 across all administrations, grades, and cut points.
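A minimal sketch of the decision-level reliability procedure described above, assuming NumPy arrays X (screening predictors) and y (pass/fail outcome) and a placeholder log-odds threshold; it uses scikit-learn for brevity and is not the original analysis code:

```python
# Sketch of the 20-fold cross-validated pass/fail decision reliability described
# above: refit the model per fold, flag every student against a fixed log-odds
# cut, then average pairwise Cohen's kappa across the fold-specific flag vectors.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold

def kfold_decision_kappa(X, y, threshold_log_odds=0.0, n_splits=20, seed=0):
    """Mean pairwise Cohen's kappa of fold-specific at-risk flags (X, y as NumPy arrays)."""
    flags = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, _ in skf.split(X, y):
        model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        # Score every student with this fold's model; decision_function returns log-odds.
        log_odds = model.decision_function(X)
        flags.append((log_odds >= threshold_log_odds).astype(int))
    kappas = [cohen_kappa_score(a, b) for a, b in combinations(flags, 2)]
    return float(np.mean(kappas))
```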

*In the table(s) below, report the results of the reliability analyses described above (e.g., internal consistency or inter-rater reliability coefficients).

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval: Lower Bound 95% Confidence Interval: Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
Yes
Provide citations for additional published studies.
Barbara R. Foorman, Yaacov Petscher, Christopher Stanley & Adrea Truckenmiller (2017). Latent Profiles of Reading and Language and Their Association With Standardized Reading Outcomes in Kindergarten Through Tenth Grade. Journal of Research on Educational Effectiveness, 10:3, 619-645. DOI: 10.1080/19345747.2016.1237597; Foorman, B. R., Petscher, Y., & Herrera, S. (2018, March 7). Unique and common effects of decoding and language factors in predicting reading comprehension in grades 1–10. Learning and Individual Differences. Retrieved April 24, 2022, from https://www.sciencedirect.com/science/article/abs/pii/S1041608018300414#preview-section-abstract
Do you have reliability data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated reliability data.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval: Lower Bound 95% Confidence Interval: Upper Bound
Results from other forms of reliability analysis not compatible with above table format:
Manual cites other published reliability studies:
Yes
Provide citations for additional published studies.
Barbara R. Foorman, Yaacov Petscher, Christopher Stanley & Adrea Truckenmiller (2017). Latent Profiles of Reading and Language and Their Association With Standardized Reading Outcomes in Kindergarten Through Tenth Grade. Journal of Research on Educational Effectiveness, 10:3, 619-645. DOI: 10.1080/19345747.2016.1237597; Foorman, B. R., Petscher, Y., & Herrera, S. (2018, March 7). Unique and common effects of decoding and language factors in predicting reading comprehension in grades 1–10. Learning and Individual Differences. Retrieved April 24, 2022, from https://www.sciencedirect.com/science/article/abs/pii/S1041608018300414#preview-section-abstract

Validity

Grade Kindergarten Grade 1 Grade 2
Rating Partially convincing evidence Convincing evidence Convincing evidence
Legend
Full Bubble: Convincing evidence
Half Bubble: Partially convincing evidence
Empty Bubble: Unconvincing evidence
Null Bubble: Data unavailable
d: Disaggregated data available
*Describe each criterion measure used and explain why each measure is appropriate, given the type and purpose of the tool.
Kindergarten: The Phonological Processing subtest on the KTEA-3 was used to determine predictive validity and construct (convergent) validity for the Dyslexia Risk part of the Imagine+ Screener assessment. The Phonological Processing subtest measures the ability to perform a variety of tasks related to phonological awareness, such as rhyming, blending, and segmenting words, and yields a composite score. The SAT-10 Word Reading was used to determine additional predictive and concurrent validity. The Word Reading subtest of the SAT-10/SESAT measures the ability to read words. Both tests assess some of the constructs that Imagine+ Screener also measures, but each is a standardized, paper-and-pencil psychometric test that was developed and published separately by Pearson and so is external to the Imagine+ Screener. Grades 1 and 2: The Letter and Word Recognition subtest on the KTEA-3 was used to determine predictive validity and concurrent validity for the Potential for Word Reading predictive score. The Letter and Word Recognition subtest measures the student's ability to identify letters and read real words in isolation. Imagine+ Screener also measures those constructs (using the Letter Name subtest and Word Reading subtest), but the KTEA-3 is a standardized, paper-and-pencil psychometric test that was developed and published separately by Pearson and so is external to the Imagine+ Screener assessment.
*Describe the sample(s), including size and characteristics, for each validity analysis conducted.
Kindergarten: A validity study was conducted for the Dyslexia Risk (KTEA-3 outcome measure) aspect of the Imagine+ Screener for Kindergarten during the 2019-2020 school year. Students were administered the full Imagine+ Screener assessment (all Kindergarten-appropriate subtests) during fall/winter and the KTEA-3 during spring/summer 2020. Having two data points from approximately 200 participants (located in 8 states across every region of the country), one from the app the previous fall and one from the psychometric assessments in late spring/summer 2020, allowed for the evaluation of the screener's predictive validity. Separately, data collection for the PWR Risk (SAT-10/SESAT outcome measure) aspect of the Imagine+ Screener began by testing item pools for the Screen tasks (i.e., Letter Sounds, Phonological Awareness, Word Reading, Vocabulary Pairs, and Following Directions). A statewide representative sample of students that roughly reflected Florida's demographic diversity and academic ability (N ~ 2,400) was collected on students in Kindergarten as part of a larger K-2 validation and linking study. Because the samples used for data collection did not strictly adhere to the state distribution of demographics (i.e., percent limited English proficiency, Black, White, Latino, and eligible for free/reduced lunch), sample weights according to student demographics were used to inform the item and student parameter scores.

Grades 1 and 2: A validity study was conducted to collect data on new subtests used in the Imagine+ Screener and to create the algorithm for the new Potential for Word Reading predictive score in Grade 1 and Grade 2. There were 447 Grade 1 participants in the fall, 627 in the winter, and 270 in the spring, and there were 288 Grade 2 participants in the fall, 509 in the winter, and 257 in the spring. The Grade 1 sample was approximately 35% Black, 38% White, 17% Hispanic, 5% Asian, and 5% Multiracial; the Grade 2 sample was approximately 31% Black, 35% White, 26% Hispanic, 4% Asian, and 4% Multiracial. Participants were located across 5 states (FL, GA, SC, OR, MA) spanning multiple geographic regions. 57% of the Grade 1 participants and 48% of the Grade 2 participants were eligible for free/reduced price lunch. Students in each grade were administered all grade-appropriate subtests during fall, winter, and spring, and the KTEA-3 Letter and Word Recognition subtest in the spring. Having two data points (from the app in fall and winter and from the outcome measure in spring) allowed for the evaluation of the screener's predictive validity. Having spring app data and spring outcome measure data allowed for the evaluation of the screener's concurrent validity.
*Describe the analysis procedures for each reported type of validity.
Kindergarten: Predictive Validity. The predictive validity of the Dyslexia Risk screening tasks against the KTEA-3 Phonological Processing subtest was evaluated through a series of multiple regression analyses testing the additive and interactive relations between Imagine+ Screener subtests and the K-PA outcome (KTEA-3, Phonological Processing) to find the fewest number of tasks that maximized the percentage of explained variance in K-PA. The final model included the Following Directions, Nonword Repetition, and Rhyming subtests, with an R2 of .37 (multiple r = .61, 95% CI = .50, .69, n = 184). The predictive and concurrent validity of the PWR screening tasks against the SAT-10 Word Reading (SESAT in K) was addressed through a series of linear and logistic regressions. The linear regressions were run two ways. First, a correlation analysis was used to evaluate the strength of relations between each of the screening task ability scores and the SESAT. Pearson correlations between PWR tasks and the SESAT Word Reading task ranged from .38 to .59. Second, a multiple regression was run to estimate the total amount of variance that the linear combination of the predictors explained in the SESAT (46%).

Construct Validity. Construct validity describes how well scores from an assessment measure the construct it is intended to measure. A component of construct validity is convergent validity, which can be evaluated by testing relations between a developed assessment (like the Imagine+ Screener Rhyming subtest) and another related assessment (like the Phonological Processing subtest of the KTEA-3). The goal of convergent validity is to yield a high association, which indicates that the developed measure converges on, or is empirically linked to, the intended construct. Concurrent validity (correlation) analyses were also conducted. Phonological awareness skills (like First Sound Matching) and sound/symbol correspondence tasks (like Letter Sounds) would be expected to have moderate associations between them; thus, the expectation is that moderate correlations would be observed. Predictive, convergent, and concurrent validity results are reported below.

Grades 1 and 2: The predictive validity of the Potential for Word Reading screening against the KTEA-3 Letter and Word Recognition subtest was evaluated through a series of multiple regression analyses testing the additive and interactive relations between Imagine+ Screener subtests and the K-LWR outcome (KTEA-3, Letter and Word Recognition) to find the fewest number of tasks that maximized the percentage of explained variance in K-LWR. The predictive and concurrent validity of the PWR screener against the outcome measure was addressed through a series of linear and logistic regressions. First, a correlation analysis was used to evaluate the strength of relations between each of the screener subtest ability scores and the outcome measure. Second, a multiple regression was run to estimate the total amount of variance that the screener tasks explained in the KTEA-3 Letter and Word Recognition. Predictive and concurrent validity results can be found in the Imagine+ Screener Grade 1 & 2 Dyslexia and Early Literacy technical manual.
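As an illustration of the correlation and multiple-regression steps described above (not the published analysis code), a sketch of this type of predictive validity check is shown below; the data file and column names are placeholders:

```python
# Illustrative sketch: bivariate correlations of each screener subtest with the
# outcome, followed by a multiple regression for the combined R^2. All column
# names and the file path are assumptions, not the actual study variables.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

predictors = ["letter_sounds", "phonological_awareness", "word_reading",
              "vocabulary_pairs", "following_directions"]
outcome = "outcome_score"  # e.g., a standardized word-reading measure

df = pd.read_csv("screener_scores.csv").dropna(subset=predictors + [outcome])

# Step 1: Pearson correlation of each subtest ability score with the outcome.
for p in predictors:
    r, _ = pearsonr(df[p], df[outcome])
    print(f"{p}: r = {r:.2f}")

# Step 2: multiple regression to estimate the variance in the outcome explained
# by the linear combination of screener subtests (R^2).
model = LinearRegression().fit(df[predictors], df[outcome])
print("R^2 =", round(model.score(df[predictors], df[outcome]), 2))
```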

*In the table below, report the results of the validity analyses described above (e.g., concurrent or predictive validity, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval: Lower Bound 95% Confidence Interval: Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Kindergarten only: Convergent and concurrent validity analyses were also conducted. The convergent validity correlation between the Imagine+ Screener Rhyming subtest and the KTEA-3 Phonological Processing subtest was .53 (n = 215). The correlation between the Imagine+ Screener First Sound Matching (n = 191) and Letter Sounds (n = 213) subtests was .51.
Manual cites other published validity studies:
Yes
Provide citations for additional published studies.
Barbara R. Foorman, Yaacov Petscher, Christopher Stanley & Adrea Truckenmiller (2017) Latent Profiles of Reading and Language and Their Association With Standardized Reading Outcomes in Kindergarten Through Tenth Grade, Journal of Research on Educational Effectiveness, 10:3, 619-645, DOI: 10.1080/19345747.2016.1237597; Foorman, B. R., Petscher, Y., & Herrera, S. (2018, March 7). Unique and common effects of decoding and language factors in predicting reading comprehension in grades 1–10. Learning and Individual Differences. Retrieved April 24, 2022, from https://www.sciencedirect.com/science/article/abs/pii/S1041608018300414#preview-section-abstract; Rhinehart, L.V. and Gotlieb, R.J.M. (2023), English Learners’ Performance on a Measure of Dyslexia Risk. Learning Disabilities Research & Practice, 38: 199-208. https://doi.org/10.1111/ldrp.12316
Describe the degree to which the provided data support the validity of the tool.
Do you have validity data that are disaggregated by gender, race/ethnicity, or other subgroups (e.g., English language learners, students with disabilities)?
No

If yes, fill in data for each subgroup with disaggregated validity data.

Type of Subgroup Informant Age / Grade Test or Criterion n Median Coefficient 95% Confidence Interval: Lower Bound 95% Confidence Interval: Upper Bound
Results from other forms of validity analysis not compatible with above table format:
Manual cites other published validity studies:
Provide citations for additional published studies.

Bias Analysis

Grade Kindergarten Grade 1 Grade 2
Rating Provided Provided Provided
Have you conducted additional analyses related to the extent to which your tool is or is not biased against subgroups (e.g., race/ethnicity, gender, socioeconomic status, students with disabilities, English language learners)? Examples might include Differential Item Functioning (DIF) or invariance testing in multiple-group confirmatory factor models.
Yes
If yes,
a. Describe the method used to determine the presence or absence of bias:
Kindergarten: DIF analysis on items / Guidelines for retaining items. Several criteria were used to evaluate item performance. The first step was to identify items that demonstrated strong floor or ceiling effects (response rates >= 95%). Such items are not useful in an item bank because there is little variability in whether students succeed on the item. In addition to evaluating the descriptive response rate, we estimated item-total correlations. Items with negative values are indicative of poor functioning, suggesting that individuals who correctly answer the question tend to have lower total scores; similarly, items with low item-total correlations indicate the lack of a relation between item and total test performance. Items with correlations < .15 were flagged for removal. Following the descriptive analysis of item performance, difficulty and discrimination values from the IRT analyses were used to further identify poorly functioning items. Items were flagged for revision if the item discrimination was negative or the item difficulty was greater than +4.0 or less than -4.0.

Secondary criteria for evaluating the retained items comprised a differential item functioning (DIF) analysis. DIF refers to instances where individuals from different groups with the same level of underlying ability differ significantly in their probability of correctly endorsing an item. Left unchecked, items that demonstrate DIF will produce biased test results. For the PWR study, DIF testing was conducted comparing: Black-White students, Latino-White students, Black-Latino students, students eligible for Free or Reduced Price Lunch (FRL) with students not receiving FRL, and English Language Learner with non-English Language Learner students. DIF testing in the PWR study was conducted with a multiple indicator multiple cause (MIMIC) analysis in Mplus (Muthén & Muthén, 2008); moreover, a series of four standardized and expected score effect size measures were generated using VisualDF software (Meade, 2010) to quantify various technical aspects of score differentiation between the groups.

First, the signed item difference in the sample (SIDS) index was created, which describes the average unstandardized difference in expected scores between the groups. The second effect size calculated was the unsigned item difference in the sample (UIDS), which can be used to supplement the SIDS. When the absolute values of the SIDS and UIDS are equivalent, the differential functioning between groups is equivalent; however, when the absolute value of the UIDS is larger than the SIDS, it provides evidence that the item characteristic curves for expected score differences cross, indicating that differences in the expected scores between groups change across the level of the latent ability score. The D-max index is reported as the maximum SIDS value in the sample and may be interpreted as the greatest difference for any individual in the sample in the expected response. Lastly, an expected score standardized difference (ESSD) was generated, computed similarly to a Cohen's (1988) d statistic; as such, it is interpreted as a measure of the standard deviation difference between the groups for the expected score response, with values of .2 regarded as small, .5 as medium, and .8 as large. Items demonstrating DIF were flagged for further study in order to ascertain why groups with the same latent ability performed differently on the items.
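The item-retention screens described above (floor/ceiling response rates, item-total correlations, and IRT difficulty and discrimination bounds) amount to a small set of flagging rules. The sketch below illustrates them with an assumed item-level data frame; the column names are placeholders, not the actual item bank fields:

```python
# Illustrative item-flagging rules based on the retention criteria described
# above. `items` is a placeholder pandas DataFrame with one row per item.
import pandas as pd

def flag_items(items: pd.DataFrame) -> pd.DataFrame:
    flags = pd.DataFrame(index=items.index)
    # Strong floor/ceiling effect: response rate at or above 95%.
    flags["floor_ceiling"] = items["p_correct"] >= 0.95
    # Weak or negative item-total correlation (< .15 flagged for removal).
    flags["low_item_total_r"] = items["item_total_r"] < 0.15
    # Poorly functioning IRT parameters: negative discrimination or |difficulty| > 4.0.
    flags["bad_irt_params"] = (items["discrimination"] < 0) | (items["difficulty"].abs() > 4.0)
    flags["flagged"] = flags.any(axis=1)
    return items.join(flags)
```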
DIF testing in the Dyslexia Risk study was estimated using the difR package (Magis, Beland, & Raiche, 2020) with the Mantel-Haenszel (1959) method for detecting uniform DIF. For each of the six tasks, DIF was tested for three primary contrasts: 1) Male vs. Female, 2) White vs. Sample, and 3) Black vs. Sample. The Mantel-Haenszel chi-square statistic was reported for each test by item, and the chi-square was used to derive an effect size estimate (i.e., the ETS delta scale; Holland & Thayer, 1988). Effect size values <= 1.0 are considered small, 1.0-1.5 moderate, and >= 1.5 large.

Differential Test Functioning: A component of checking the validity of cut points and scores on the assessments involved also testing differential accuracy of the regression equations across demographic groups. This procedure involved a series of logistic regressions predicting success on the SESAT outcome measure (i.e., scoring at or above the 40th percentile). The independent variables included a variable representing whether students were identified as not at-risk based on the identified cut point on a combination score of the screening tasks, a variable representing a selected demographic group, and an interaction term between the two. A statistically significant interaction term would suggest that differential accuracy in predicting end-of-year risk status existed for different groups of individuals based on the risk status identified by the PWR screener.

Grades 1 and 2: Differential item functioning (DIF) analysis on items / Guidelines for retaining items. Several criteria were used to evaluate item performance. The first step was to identify items that demonstrated strong floor or ceiling effects (response rates >= 95%). Such items are not useful in an item bank because there is little variability in whether students succeed on the item. In addition to evaluating the descriptive response rate, we estimated item-total correlations. Items with negative values are indicative of poor functioning, suggesting that individuals who correctly answer the question tend to have lower total scores; similarly, items with low item-total correlations indicate the lack of a relation between item and total test performance. Items with correlations < .15 were flagged for removal. Following the descriptive analysis of item performance, difficulty and discrimination values from the IRT analyses were used to further identify poorly functioning items. Items were flagged for revision if the item discrimination was negative or the item difficulty was greater than +4.0 or less than -4.0. Secondary criteria for evaluating the retained items comprised a differential item functioning (DIF) analysis. DIF refers to instances where individuals from different groups with the same level of underlying ability differ significantly in their probability of correctly endorsing an item. Left unchecked, items that demonstrate DIF will produce biased test results.

Differential Test Functioning, or testing differential accuracy of the regression equations across demographic groups, was also conducted. This procedure involved a series of logistic regressions predicting success on the outcome measure (KTEA-3 Letter and Word Recognition subtest). The independent variables included a variable representing whether students were identified as not at-risk based on an identified cut point on a combination score of the screening tasks, a variable representing a selected demographic group, and an interaction term between the two. A statistically significant interaction term would suggest that differential accuracy in predicting end-of-year risk status existed for different groups of individuals based on the risk status identified by the PWR screener.
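A minimal sketch of the differential test functioning check described above, assuming a data set with an outcome pass/fail indicator, a not-at-risk flag derived from the screener cut point, and a binary group indicator (all names are placeholders):

```python
# Illustrative differential test functioning check: logistic regression predicting
# outcome success from screener risk status, group membership, and their
# interaction. Column names and the file path are placeholders.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("validation_sample.csv")

# passed_outcome: 1 if at/above the criterion cut on the outcome measure, else 0
# not_at_risk:    1 if the screener combination score is above the risk cut point
# group:          1 for the focal demographic group, 0 for the reference group
model = smf.logit("passed_outcome ~ not_at_risk * group", data=df).fit()
print(model.summary())

# A statistically significant interaction coefficient (not_at_risk:group) would
# indicate differential accuracy of the screener's risk decision across groups.
print("Interaction p-value:", model.pvalues["not_at_risk:group"])
```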
b. Describe the subgroups for which bias analyses were conducted:
Kindergarten: For the PWR study, DIF testing was conducted comparing: Black-White students, Latino-White students, Black-Latino students, students eligible for Free or Reduced Price Lunch (FRL) with students not receiving FRL, and English Language Learner with non-English Language Learner students. For the Dyslexia Risk study, DIF was tested for 1) Male vs. Female, 2) White vs. Sample, and 3) Black vs. Sample. Differential accuracy was tested separately for the PWR study for Black and Latino students, as well as for students identified as English Language Learners (ELL) and students eligible for Free/Reduced Price Lunch (FRL). Grades 1 and 2: Differential test functioning was calculated for Black students, Hispanic students, Male (vs. Female) students, and students identified as English Language Learners (ELL).
c. Describe the results of the bias analyses conducted, including data and interpretative statements. Include magnitude of effect (if available) if bias has been identified.
Kindergarten: Differential Item Functioning (DIF) - Across all Kindergarten tasks and comparisons, only 12 items demonstrated DIF with at least a moderate effect size (i.e., ETS delta >= 1.0): 2 Nonword Repetition items and 10 Word Matching items. These items were removed from the item bank for further study and testing. All remaining items presented with ETS delta values < 1.00, indicating small DIF. Differential Test Functioning - No statistically significant differential accuracy was found for any demographic subgroup. For more information, see pages 16-17 and 29 in the Imagine+ Technical Manual. Grades 1 and 2: No significant differences were found for any demographic subgroup. See Tables 8-11 in the Imagine+ Screener Grade 1 & 2 Dyslexia and Early Literacy technical manuals.

Data Collection Practices

Most tools and programs evaluated by the NCII are branded products which have been submitted by the companies, organizations, or individuals that disseminate these products. These entities supply the textual information shown above, but not the ratings accompanying the text. NCII administrators and members of our Technical Review Committees have reviewed the content on this page, but NCII cannot guarantee that this information is free from error or reflective of recent changes to the product. Tools and programs have the opportunity to be updated annually or upon request.