Achieve3000’s LevelSet

Reading

Cost

Technology, Human Resources, and Accommodations for Special Needs

Service and Support

Purpose and Other Implementation Information

Usage and Reporting

Initial Cost:

$11.00 per student

 

Replacement Cost:

$11.00 per student per year.


Annual license renewal fee subject to change.

 

Included in Cost:

LevelSet can be licensed for $11 per student per year. Online training is available at $440 per session, and onsite training is $2,300 per day. The license includes LevelSet pre-, interim-, and post-test administrations in English and Spanish for grades 2-12 and adult learners, with three equivalent, alternate forms per grade. As a cloud-based solution, LevelSet can be used on any device with Internet connectivity.

 

Technology Requirements:

  • Computer or tablet
  • Internet connection

 

Training Requirements:

  • Less than 1 hour of training

 

Qualified Administrators:

  • No minimum qualifications specified

 

Accommodations:

The Achieve3000 platform uses design principles that meet ADA and Section 508 requirements. Scaffolds are not provided during the LevelSet assessment for students with disabilities.

 

Where to Obtain:

Website: www.achieve3000.com

Address: 1985 Cedar Bridge Avenue, Suite 3, Lakewood, NJ 08701

Phone number: 732.367.5505
Email: orders@achieve3000.com


Access to Technical Support:

On-demand online resources in Ask Achieve3000 provide step-by-step instructions for teachers administering the LevelSet assessment, and student-facing videos introduce students to the assessment, its purpose, administration tips, and preparation guidelines. In addition to these on-demand assets, Achieve3000 curriculum and implementation managers (through onsite or live online training) and the customer support department (by phone or email) can respond to any questions that arise as the assessment is administered.

 

Developed in collaboration with MetaMetrics®, Inc., the makers of the Lexile Framework for Reading®, the LevelSet™ universal screener establishes each student’s initial Lexile reading level in English or in Spanish. LevelSet measures a student’s ability to comprehend informational text and provides a scale score that matches reading ability with text complexity. It can be administered up to three times per year, first as a pre-test to establish a baseline Lexile level, forecast readiness against university and career benchmarks, match students with differentiated, tailored text, and identify the solution and implementation that will promote accelerated growth for every student. Interim and post-test administrations provide a summative measure of student growth.

 

LevelSet can be used as a stand-alone assessment or in conjunction with Achieve3000 differentiated instruction. During the test, students read a series of approximately 30 paragraph-long passages and answer a cloze-style question about each one.

Assessment Format:

  • Direct: Computerized

 

Administration Time:

  • 15 minutes per student

 

Scoring Time:

  • Scoring is automatic

 

Scoring Method:

  • Calculated automatically

 

Scores Generated:

  • Standard score
  • Normal curve equivalents
  • Lexile score

 

Classification Accuracy

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Criterion 1 Fall | Full bubble | Full bubble | Full bubble | Full bubble | – | – | – |
| Criterion 1 Winter | – | – | – | – | – | – | – |
| Criterion 1 Spring | – | – | – | – | – | – | – |
| Criterion 2 Fall | Empty bubble | Half-filled bubble | Half-filled bubble | Full bubble | Full bubble | Half-filled bubble | Half-filled bubble |
| Criterion 2 Winter | – | – | – | – | – | – | – |
| Criterion 2 Spring | – | – | – | – | – | – | – |

Primary Sample

 

Criterion 1, Fall

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Criterion | CAASPP | CAASPP | CAASPP | CAASPP | Not Provided | Not Provided | Not Provided |
| Cut points: Percentile rank on criterion measure | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided |
| Cut points: Performance score (numeric) on criterion measure | 2338 | 2375 | 2415 | 2438 | Not Provided | Not Provided | Not Provided |
| Cut points: Corresponding performance score (numeric) on screener measure | 227.5 | 347.5 | 462.5 | 517.5 | Not Provided | Not Provided | Not Provided |
| Base rate in the sample for children requiring intensive intervention | 0.13 | 0.12 | 0.17 | 0.15 | Not Provided | Not Provided | Not Provided |
| False Positive Rate | 0.18 | 0.12 | 0.09 | 0.08 | Not Provided | Not Provided | Not Provided |
| False Negative Rate | 0.13 | 0.16 | 0.24 | 0.28 | Not Provided | Not Provided | Not Provided |
| Sensitivity | 0.87 | 0.84 | 0.76 | 0.72 | Not Provided | Not Provided | Not Provided |
| Specificity | 0.82 | 0.88 | 0.91 | 0.92 | Not Provided | Not Provided | Not Provided |
| Positive Predictive Power | 0.42 | 0.49 | 0.54 | 0.53 | Not Provided | Not Provided | Not Provided |
| Negative Predictive Power | 0.98 | 0.97 | 0.96 | 0.96 | Not Provided | Not Provided | Not Provided |
| Overall Classification Rate | 0.82 | 0.87 | 0.89 | 0.90 | Not Provided | Not Provided | Not Provided |
| Area Under the Curve (AUC) | 0.92 | 0.94 | 0.93 | 0.94 | Not Provided | Not Provided | Not Provided |
| AUC 95% Confidence Interval Lower Bound | 0.91 | 0.93 | 0.92 | 0.93 | Not Provided | Not Provided | Not Provided |
| AUC 95% Confidence Interval Upper Bound | 0.93 | 0.95 | 0.94 | 0.95 | Not Provided | Not Provided | Not Provided |
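
The metrics in these tables are the standard 2x2 screening statistics, so the predictive values can be reproduced from the reported base rate, sensitivity, and specificity. The sketch below (Python; a consistency check written for this review, not Achieve3000 code) recovers the Grade 3, Criterion 1 figures; small differences elsewhere, including the overall classification rate, reflect rounding of the published inputs.

```python
# Minimal sketch: deriving the remaining Criterion 1 (Fall), Grade 3 statistics
# from the reported base rate, sensitivity, and specificity.

base_rate = 0.13     # proportion of the sample requiring intensive intervention
sensitivity = 0.87   # P(screener flags student | student truly at risk)
specificity = 0.82   # P(screener clears student | student truly not at risk)

false_positive_rate = 1 - specificity   # 0.18, as reported
false_negative_rate = 1 - sensitivity   # 0.13, as reported

# Predictive power follows from Bayes' rule applied to the 2x2 table.
p_flagged = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
positive_predictive_power = sensitivity * base_rate / p_flagged          # ~0.42
negative_predictive_power = (specificity * (1 - base_rate)
                             / (specificity * (1 - base_rate)
                                + (1 - sensitivity) * base_rate))        # ~0.98

# Comes out ~0.83 from the rounded inputs; the table reports 0.82.
overall_classification_rate = (sensitivity * base_rate
                               + specificity * (1 - base_rate))

print(round(positive_predictive_power, 2),
      round(negative_predictive_power, 2),
      round(overall_classification_rate, 2))
```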

 

Criterion 2, Fall

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Criterion | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR |
| Cut points: Percentile rank on criterion measure | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided |
| Cut points: Performance score (numeric) on criterion measure | 1282 | 1370 | 1418 | 1450 | 1511 | 1558 | 3440 |
| Cut points: Corresponding performance score (numeric) on screener measure | 227.5 | 347.5 | 462.5 | 517.5 | 587.5 | 622.5 | 737.5 |
| Base rate in the sample for children requiring intensive intervention | 0.30 | 0.28 | 0.26 | 0.25 | 0.26 | 0.27 | 0.22 |
| False Positive Rate | 0.38 | 0.34 | 0.26 | 0.18 | 0.20 | 0.12 | 0.29 |
| False Negative Rate | 0.09 | 0.10 | 0.14 | 0.20 | 0.21 | 0.31 | 0.15 |
| Sensitivity | 0.91 | 0.90 | 0.86 | 0.80 | 0.79 | 0.69 | 0.85 |
| Specificity | 0.62 | 0.66 | 0.74 | 0.82 | 0.80 | 0.88 | 0.71 |
| Positive Predictive Power | 0.50 | 0.51 | 0.53 | 0.60 | 0.58 | 0.69 | 0.45 |
| Negative Predictive Power | 0.94 | 0.94 | 0.94 | 0.92 | 0.91 | 0.89 | 0.94 |
| Overall Classification Rate | 0.70 | 0.73 | 0.77 | 0.82 | 0.80 | 0.83 | 0.74 |
| Area Under the Curve (AUC) | 0.86 | 0.86 | 0.88 | 0.89 | 0.88 | 0.89 | 0.87 |
| AUC 95% Confidence Interval Lower Bound | 0.85 | 0.85 | 0.87 | 0.88 | 0.87 | 0.88 | 0.86 |
| AUC 95% Confidence Interval Upper Bound | 0.87 | 0.87 | 0.89 | 0.90 | 0.89 | 0.90 | 0.88 |
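
The AUC rows in both tables are, in standard usage, areas under the ROC curve for predicting the dichotomized criterion (requires intensive intervention vs. does not). Equivalently, AUC is the probability that a randomly chosen at-risk student earns a lower screener score than a randomly chosen not-at-risk student, with ties counted as half. A rank-based sketch of that interpretation, using made-up scores rather than LevelSet data:

```python
# Illustrative sketch of the AUC statistic as a rank (Mann-Whitney) probability.
# The scores below are invented; they are not LevelSet results.

def auc_from_scores(at_risk_scores, not_at_risk_scores):
    """Probability that an at-risk student scores below a not-at-risk student."""
    wins = 0.0
    for a in at_risk_scores:
        for b in not_at_risk_scores:
            if a < b:
                wins += 1.0   # lower screener score correctly ranks the at-risk student
            elif a == b:
                wins += 0.5   # ties split evenly
    return wins / (len(at_risk_scores) * len(not_at_risk_scores))

# Toy Lexile-like screener scores for two small groups.
print(auc_from_scores([150, 220, 300, 410], [380, 450, 520, 610, 700]))  # 0.95
```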

 

Reliability

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Rating | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble |
  1. Justification for each type of reliability reported, given the type and purpose of the tool: Internal Consistency: Internal-consistency reliability examines the extent to which a test measures a single basic concept. One procedure for determining the internal consistency of a test is coefficient alpha (α). Coefficient alpha sets an upper limit to the reliability of tests constructed in terms of the domain-sampling model.

Test-Retest: Test-retest reliability examines the stability of test scores over time. When the same test is administered twice within a reasonable time, the correlation of the results provides evidence of test-retest reliability. The closer the results, the greater the test-retest reliability of the assessment.

Alternate Form: Alternate-form reliability examines the consistency of test scores sampled from the same domain of items. When two forms that are considered to be parallel, or interchangeable (i.e., LevelSet Forms D and E), are administered to the same group of students, the correlation coefficient provides information about how well the two parallel forms yield the same results for students and is often referred to as a coefficient of stability and equivalence.

 

  2. Description of the sample(s), including size and characteristics, for each reliability analysis conducted: Internal Consistency: Internal-consistency reliability examines the extent to which a test measures a single basic concept. One procedure for determining the internal consistency of a test is coefficient alpha (α). Coefficient alpha sets an upper limit to the reliability of tests constructed in terms of the domain-sampling model.

Test-Retest: Test-retest reliability examines the stability of test scores over time. When the same test is administered twice within a reasonable time, the correlation of the results provides evidence of test-retest reliability. The closer the results, the greater the test-retest reliability of the assessment.

Alternate Form: Alternate-form reliability examines the consistency of test scores sampled from the same domain of items. When two forms that are considered to be parallel, or interchangeable (i.e., LevelSet Forms D and E), are administered to the same group of students, the correlation coefficient provides information about how well the two parallel forms yield the same results for students and is often referred to as a coefficient of stability and equivalence.

 

  3. Description of the analysis procedures for each reported type of reliability: Internal Consistency: Internal-consistency reliability examines the extent to which a test measures a single basic concept. One procedure for determining the internal consistency of a test is coefficient alpha (α). Coefficient alpha sets an upper limit to the reliability of tests constructed in terms of the domain-sampling model.

Test-Retest: Test-retest reliability examines the stability of test scores over time. When the same test is administered twice within a reasonable time, the correlation of the results provides evidence of test-retest reliability. The closer the results, the greater the test-retest reliability of the assessment.

Alternate Form: Alternate-form reliability examines the consistency of test scores sampled from the same domain of items. When two forms that are considered to be parallel, or interchangeable (i.e., LevelSet Forms D and E), are administered to the same group of students, the correlation coefficient provides information about how well the two parallel forms yield the same results for students and is often referred to as a coefficient of stability and equivalence.
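
For readers who want to see the computations behind the three estimates described above, the sketch below (Python with NumPy; toy data, not LevelSet records) shows coefficient alpha for a student-by-item score matrix and the Pearson correlation used for both the test-retest and alternate-form coefficients.

```python
# Sketch of the reliability computations described above, on invented toy data.
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (n_students x n_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def pearson_r(x, y):
    """Correlation used for both test-retest and alternate-form reliability."""
    return float(np.corrcoef(x, y)[0, 1])

# Toy data: right/wrong item scores for six students, plus Lexile-like scores
# from two parallel administrations (e.g., Forms D and E).
item_scores = [[1, 1, 1, 0],
               [1, 0, 1, 1],
               [0, 0, 1, 0],
               [1, 1, 1, 1],
               [0, 0, 0, 0],
               [1, 1, 0, 1]]
form_d = [520, 610, 430, 700, 380, 650]
form_e = [540, 590, 450, 720, 360, 640]

print(round(cronbach_alpha(item_scores), 2), round(pearson_r(form_d, form_e), 2))
```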

 

  4. Reliability of performance level score (e.g., model-based, internal consistency, inter-rater reliability).

| Type of Reliability | Age or Grade | n | Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound |
| Internal Consistency | 2 | 1,586 | 0.90 | 0.89 | 0.91 |
| Internal Consistency | 3 | 686 | 0.84 | 0.82 | 0.86 |
| Internal Consistency | 4 | 775 | 0.85 | 0.83 | 0.87 |
| Internal Consistency | 5 | 960 | 0.81 | 0.79 | 0.83 |
| Internal Consistency | 6 | 605 | 0.80 | 0.77 | 0.83 |
| Internal Consistency | 7 | 459 | 0.80 | 0.77 | 0.83 |
| Internal Consistency | 8 | 180 | 0.72 | 0.64 | 0.78 |
| Internal Consistency | 9 | 329 | 0.81 | 0.77 | 0.84 |
| Internal Consistency | 10 | 321 | 0.80 | 0.76 | 0.84 |
| Internal Consistency | 11 | 75 | 0.65 | 0.50 | 0.76 |
| Internal Consistency | 12 | 562 | 0.77 | 0.74 | 0.80 |
| Test-Retest | 2 | 102 | 0.91 | 0.87 | 0.94 |
| Test-Retest | 3 | 118 | 0.92 | 0.87 | 0.94 |
| Test-Retest | 4 | 115 | 0.95 | 0.93 | 0.97 |
| Test-Retest | 5 | 177 | 0.94 | 0.92 | 0.96 |
| Test-Retest | 6 | 166 | 0.94 | 0.92 | 0.96 |
| Test-Retest | 7 | 120 | 0.97 | 0.96 | 0.98 |
| Test-Retest | 8 | 104 | 0.96 | 0.94 | 0.97 |
| Test-Retest | 9 | 109 | 0.92 | 0.89 | 0.94 |
| Test-Retest | 10 | 100 | 0.96 | 0.94 | 0.97 |
| Alternate Form | 2 | 187 | 0.80 | 0.74 | 0.85 |
| Alternate Form | 3 | 230 | 0.89 | 0.86 | 0.91 |
| Alternate Form | 4 | 228 | 0.93 | 0.91 | 0.95 |
| Alternate Form | 5 | 327 | 0.93 | 0.91 | 0.94 |
| Alternate Form | 6 | 284 | 0.93 | 0.91 | 0.94 |
| Alternate Form | 7 | 270 | 0.94 | 0.93 | 0.95 |
| Alternate Form | 8 | 190 | 0.95 | 0.93 | 0.96 |
| Alternate Form | 9 | 209 | 0.92 | 0.90 | 0.94 |
| Alternate Form | 10 | 168 | 0.92 | 0.89 | 0.94 |
| Alternate Form | 11 | 30 | 0.95 | 0.90 | 0.98 |
| Alternate Form | 12 | 12 | 0.94 | 0.80 | 0.98 |
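
The manual's exact interval method is not documented in this review, but a common approach that reproduces intervals like those above is the Fisher z transformation of a correlation coefficient. The sketch below is an assumed method, not vendor code; it happens to recover the Grade 4 test-retest interval of 0.93-0.97.

```python
# Sketch: approximate 95% confidence interval for a correlation-type reliability
# coefficient via the Fisher z transformation (assumed, not the vendor's code).
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation r estimated from n students."""
    z = math.atanh(r)                     # Fisher z transform of r
    se = 1.0 / math.sqrt(n - 3)           # standard error on the z scale
    lower, upper = z - z_crit * se, z + z_crit * se
    return math.tanh(lower), math.tanh(upper)   # back-transform to the r scale

# Example: the Grade 4 test-retest coefficient reported above (r = 0.95, n = 115).
print(tuple(round(bound, 2) for bound in fisher_ci(0.95, 115)))   # (0.93, 0.97)
```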

 

Disaggregated Reliability

The following disaggregated reliability data are provided for context and did not factor into the Reliability rating.

| Type of Reliability | Subgroup | Age or Grade | n | Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound |
| None | | | | | | |

Validity

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Rating | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble | Full bubble |

1. Description of each criterion measure used and explanation as to why each measure is appropriate, given the type and purpose of the tool: CAASPP. The CAASPP is the California state assessment of ELA and is administered during the spring to all students in Grades 3-8 and 11.

STAAR. The STAAR is the Texas state assessment of ELA and is administered during the spring to all students in Grades 3-8, English I, and English II. Data for the validity analyses came from students who were administered the two assessments within 3 weeks of each other.

 

2. Description of the sample(s), including size and characteristics, for each validity analysis conducted: CAASPP. The sample consisted of 14,831 students in Grades 3-6, of whom 45.97% were female and 52.57% were male; 10.26% were Filipino, 67.99% were Hispanic or Latino, and 11.16% were White (not Hispanic); and 51.53% were classified as economically disadvantaged.

STAAR. The sample consisted of 41,148 students in Grades 3-8, English I, and English II, of whom 49.04% were female and 50.95% were male; 22.56% were Black/African American, 64.52% were Hispanic, and 9.76% were White (not Hispanic); and 80.77% were classified as eligible for free or reduced-price lunch.

 

3. Description of the analysis procedures for each reported type of validity: CAASPP. Correlation between interim or state assessment scale scores in the spring (prior grade level) and the Achieve3000 Lexile measure in the fall.

STAAR. Correlation between interim or state assessment scale scores in the spring (prior grade level) and the Achieve3000 Lexile measure in the fall.
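
As a concrete illustration of that procedure, the sketch below (Python with pandas; the student IDs, scores, and column names are hypothetical) merges a prior-spring state-test extract with a fall LevelSet extract on student ID and computes a Pearson coefficient of the kind reported in the table that follows.

```python
# Illustrative sketch of the validity analysis described above; all data and
# column names are hypothetical.
import pandas as pd

state = pd.DataFrame({                      # prior-spring state assessment extract
    "student_id": [1, 2, 3, 4, 5],
    "state_scale_score": [2338, 2410, 2290, 2502, 2375],
})
levelset = pd.DataFrame({                   # fall LevelSet pre-test extract
    "student_id": [1, 2, 3, 4, 5],
    "fall_lexile": [410, 655, 300, 820, 515],
})

merged = state.merge(levelset, on="student_id", how="inner")
r = merged["state_scale_score"].corr(merged["fall_lexile"])   # Pearson correlation
print(round(r, 2), len(merged))             # coefficient and n, as tabulated below
```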

 

4. Validity for the performance level score (e.g., concurrent, predictive, evidence based on response processes, evidence based on internal structure, evidence based on relations to other variables, and/or evidence based on consequences of testing), and the criterion measures.

| Type of Validity | Age or Grade | Test or Criterion | n | Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound |
| Predictive | 3 | CAASPP | 3,418 | 0.80 | 0.79 | 0.81 |
| Predictive | 4 | CAASPP | 3,593 | 0.83 | 0.82 | 0.84 |
| Predictive | 5 | CAASPP | 3,939 | 0.82 | 0.81 | 0.83 |
| Predictive | 6 | CAASPP | 3,881 | 0.80 | 0.79 | 0.81 |
| Concurrent | 3 | CAASPP | 2,787 | 0.81 | 0.80 | 0.83 |
| Concurrent | 4 | CAASPP | 2,915 | 0.82 | 0.81 | 0.84 |
| Concurrent | 5 | CAASPP | 3,239 | 0.81 | 0.80 | 0.82 |
| Concurrent | 6 | CAASPP | 3,347 | 0.81 | 0.80 | 0.82 |
| Predictive | 3 | STAAR | 6,362 | 0.73 | 0.71 | 0.74 |
| Predictive | 4 | STAAR | 6,490 | 0.72 | 0.70 | 0.73 |
| Predictive | 5 | STAAR | 6,420 | 0.73 | 0.72 | 0.74 |
| Predictive | 6 | STAAR | 5,961 | 0.75 | 0.74 | 0.76 |
| Predictive | 7 | STAAR | 5,657 | 0.75 | 0.74 | 0.76 |
| Predictive | 8 | STAAR | 5,593 | 0.76 | 0.74 | 0.77 |
| Predictive | 9 | STAAR | 4,465 | 0.70 | 0.69 | 0.72 |
| Concurrent | 3 | STAAR | 3,570 | 0.77 | 0.76 | 0.79 |
| Concurrent | 4 | STAAR | 4,081 | 0.80 | 0.79 | 0.81 |
| Concurrent | 6 | STAAR | 4,088 | 0.84 | 0.83 | 0.85 |
| Concurrent | 7 | STAAR | 3,278 | 0.83 | 0.82 | 0.84 |

 

5. Results for other forms of validity (e.g., factor analysis) not conducive to the table format: Not Provided

 

6. Describe the degree to which the provided data support the validity of the tool: When scores from two tests developed to assess the same construct (i.e., reading comprehension) are highly correlated, this supports the validity argument for using the test scores as measures of that construct. Correlation coefficients showing the relationship between LevelSet test scores and state or nationally normed reading tests provide evidence of criterion-related validity for the Achieve3000 LevelSet tests. The correlations shown indicate that the two tests measure a similar construct: reading comprehension.

 

Disaggregated Validity

The following disaggregated validity data are provided for context and did not factor into the Validity rating.

| Type of Validity | Subgroup | Age or Grade | Test or Criterion | n | Coefficient | 95% Confidence Interval: Lower Bound | 95% Confidence Interval: Upper Bound |
| None | | | | | | | |

Results for other forms of disaggregated validity (e.g., factor analysis) not conducive to the table format: Not Provided

 

 

If your manual cites other published validity studies, provide these citations: In the LevelSet Technical Manual:

Study 1. The NWEA MAP is an interim assessment of reading comprehension that is typically administered three times per year to all students in a school. The ISTEP+ was the Indiana state summative assessment of ELA, administered to all students in Grades 3-8 and 10. The HSA was the Hawaii state summative assessment of ELA, administered to all students in Grades 3-10. Data from Fall 2014 administrations of LevelSet from five school districts across the United States were included in this validation study. This sample was a subset of the sample collected for the reliability studies. These school districts provided Achieve3000 with data from LevelSet administrations from their KidBiz3000, TeenBiz3000, and Empower3000 programs. In addition, scores from another test of reading comprehension administered during Spring 2014 were provided to serve as a criterion measure of reading comprehension.

 

Study 2. The Gates-MacGinitie Reading Test is a group-administered, norm-referenced assessment that yields scores for Vocabulary, Reading Comprehension, and Total Reading. The test was administered as a pre-test and a post-test. The sample for the study was selected from four school districts located in three regions of the United States (the West South Central region, the East North Central region, and the Pacific region). Two districts were classified as large suburb and two as large city. Within each grade in the study, teachers were randomly assigned to the treatment or control group. Only treatment teachers implemented the Achieve3000 program, while both groups used their usual ELA materials. A total of 512 students were in the treatment group, with 127 (24.8%) in Grade 3, 263 (51.4%) in Grade 6, and 122 (23.8%) in Grade 9. The treatment group consisted of 222 (43.4%) females and 290 (56.6%) males; 178 (34.8%) students classified as Hispanic and 334 (65.2%) not classified as Hispanic; and 329 (64.3%) students classified as White, 116 (22.7%) classified as Black or African American, 26 (5.1%) classified as Asian, and 36 (7.0%) classified as other. Of the students in the treatment group, 41 (8.0%) were classified as needing special education services, 183 (35.7%) received free or reduced-price lunch, and 59 (11.5%) were classified as English language learners (ELL).

Sample Representativeness

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | Local without Cross-Validation | Local without Cross-Validation | Local without Cross-Validation | Local without Cross-Validation | Local without Cross-Validation | Local without Cross-Validation | Local without Cross-Validation |

Primary Classification Accuracy Sample

Criterion 1, Fall

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Criterion | CAASPP | CAASPP | CAASPP | CAASPP | Not Provided | Not Provided | Not Provided |
| National/Local Representation | CA | CA | CA | CA | Not Provided | Not Provided | Not Provided |
| Date | Spring 2017 | Spring 2017 | Spring 2017 | Spring 2017 | Not Provided | Not Provided | Not Provided |
| Sample Size | 2,787 | 2,915 | 3,239 | 3,347 | Not Provided | Not Provided | Not Provided |
| Male | 47% | 46% | 47% | 47% | Not Provided | Not Provided | Not Provided |
| Female | 53% | 54% | 53% | 53% | Not Provided | Not Provided | Not Provided |
| Gender Unknown | 0% | 0% | 0% | 0% | Not Provided | Not Provided | Not Provided |
| Free or Reduced-price Lunch Eligible | 52% | 51% | 52% | 51% | Not Provided | Not Provided | Not Provided |
| White, Non-Hispanic | 12% | 11% | 11% | 10% | Not Provided | Not Provided | Not Provided |
| Black, Non-Hispanic | 4% | 4% | 3% | 3% | Not Provided | Not Provided | Not Provided |
| Hispanic | 67% | 67% | 68% | 69% | Not Provided | Not Provided | Not Provided |
| American Indian/Alaska Native | 2% | 2% | 2% | 3% | Not Provided | Not Provided | Not Provided |
| Other | 10% | 10% | 10% | 10% | Not Provided | Not Provided | Not Provided |
| Race/Ethnicity Unknown | 3% | 4% | 4% | 4% | Not Provided | Not Provided | Not Provided |
| Disability Classification | 11% | 11% | 12% | 12% | Not Provided | Not Provided | Not Provided |
| First Language | 53% | 52% | 51% | 51% | Not Provided | Not Provided | Not Provided |
| Language Proficiency Status | 30% | 23% | 17% | 14% | Not Provided | Not Provided | Not Provided |

     

Criterion 2, Fall

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Criterion | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR |
| National/Local Representation | TX | TX | TX | TX | TX | TX | TX |
| Date | May 9, 2017 | May 9, 2017 | March 29, 2017 | May 9, 2017 | May 9, 2017 | March 29, 2017 | April 10, 2017 |
| Sample Size | 6,362 | 6,490 | 6,420 | 5,961 | 5,657 | 5,593 | 4,665 |
| Male | 50% | 52% | 50% | 50% | 51% | 51% | 52% |
| Female | 50% | 48% | 50% | 50% | 49% | 49% | 48% |
| Gender Unknown | 82% | 83% | 82% | 82% | 79% | 77% | 78% |
| Free or Reduced-price Lunch Eligible | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided |
| White, Non-Hispanic | 23% | 22% | 22% | 23% | 22% | 22% | 23% |
| Black, Non-Hispanic | 62% | 65% | 64% | 65% | 65% | 65% | 66% |
| Hispanic | 2% | 2% | 2% | 2% | 2% | 1% | 2% |
| American Indian/Alaska Native | 1% | 1% | 1% | 1% | 1% | 1% | 1% |
| Other | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided | Not Provided |
| Race/Ethnicity Unknown | 5% | 6% | 8% | 28% | 24% | 21% | 21% |
| Disability Classification | 44% | 43% | 40% | 30% | 25% | 22% | 23% |
| First Language | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR | STAAR |
| Language Proficiency Status | TX | TX | TX | TX | TX | TX | TX |

     

Bias Analysis Conducted

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Rating | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
1. Description of the method used to determine the presence or absence of bias: The Mantel-Haenszel (MH) Log Odds Ratio statistic, or estimated effect size, is used to determine the direction of differential item functioning (DIF) (SAS Institute Inc., 1985). This measure is obtained by combining the odds ratios, α_j, across levels with the formula for weighted averages. Educational Testing Service (ETS) classifies DIF based on the MH D-DIF statistic (Zwick, 2012), developed by Holland and Thayer. Within Winsteps (Linacre, 2011), items are classified according to the ETS DIF categories.
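
A simplified sketch of that statistic appears below (Python; written for this review, not vendor code, and omitting the significance tests required by the full ETS A/B/C rules). It combines the per-level odds ratios into the MH common odds ratio and converts it to the MH D-DIF effect size.

```python
# Simplified sketch of the Mantel-Haenszel DIF statistics described above.
import math

def mh_odds_ratio(level_tables):
    """level_tables: per matched ability level, counts of
    (reference right, reference wrong, focal right, focal wrong)."""
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in level_tables:
        n_level = ref_right + ref_wrong + focal_right + focal_wrong
        numerator += ref_right * focal_wrong / n_level
        denominator += ref_wrong * focal_right / n_level
    return numerator / denominator          # the combined odds ratio, alpha_MH

def mh_d_dif(alpha_mh):
    """ETS delta-metric effect size; the sign indicates which group is favored."""
    return -2.35 * math.log(alpha_mh)

# Toy item: 2x2 counts at three matched ability levels (invented data).
levels = [(40, 10, 35, 15), (30, 20, 22, 28), (15, 35, 10, 40)]
d = mh_d_dif(mh_odds_ratio(levels))
category = "C" if abs(d) >= 1.5 else ("A" if abs(d) < 1.0 else "B")  # magnitude only
print(round(d, 2), category)
```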

     

2. Description of the subgroups for which bias analyses were conducted: Gender – 1,310 items (96.3% of items in the study): Male (N = 207,716) and Female (N = 195,174);

Race – 1,070 items (78.7% of items in the study): Non-white (N = 20,778) and White (N = 16,977) – optional reporting field;

Ethnicity – 506 items (37.2% of items in the study): Non-Hispanic (N = 5,954) and Hispanic (N = 32,227) – optional reporting field; and

SES Status (Free and Reduced-Price Lunch) – 893 items (66.0% of items in the study): No (N = 8,474) and Yes (N = 14,162) – optional reporting field.

     

3. Description of the results of the bias analyses conducted, including data and interpretative statements: Across the 1,360 LevelSet (version 2) items and Form B items in the field study, 42 items (3.28%) showed Class C DIF in relation to gender, 95 items (8.88%) showed Class C DIF in relation to race, 32 items (6.32%) showed Class C DIF in relation to ethnicity (Hispanic/non-Hispanic) status, and 82 items (9.18%) showed Class C DIF in relation to socioeconomic status.

     

Administration Format

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | Individual | Individual | Individual | Individual | Individual | Individual | Individual |

Administration & Scoring Time

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | 15 minutes | 15 minutes | 15 minutes | 15 minutes | 15 minutes | 15 minutes | 15 minutes |

Scoring Format

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | Automatic | Automatic | Automatic | Automatic | Automatic | Automatic | Automatic |

Types of Decision Rules

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | Discontinue Rule | Discontinue Rule | Discontinue Rule | Discontinue Rule | Discontinue Rule | Discontinue Rule | Discontinue Rule |

Evidence Available for Multiple Decision Rules

| Grade | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| Data | No | No | No | No | No | No | No |