Hot Math Tutoring

Study: Fuchs, Fuchs, Craddock, Hollenbeck, Hamlett, et al. (2008)

Fuchs, L.S., Fuchs, D., Craddock, C., Hollenbeck, K.N., Hamlett, C.L., & Schatschneider, C. (2008). Effects of small-group tutoring with and without validated classroom instruction on at-risk students’ math problem solving: Are two tiers of prevention better than one? Journal of Educational Psychology, 100, 491-509. (NIHMS62377; PMID 19122881)

Hot Math Tutoring is a third-grade small-group tutoring program designed to enhance at-risk (AR) students’ word-problem performance. Based on schema theory, Hot Math Tutoring provides explicit instruction on (a) solution strategies for four word-problem types and (b) how to transfer those solution strategies to word problems with unexpected features, such as problems that include irrelevant information, present a novel question requiring an extra step, present relevant information in charts or graphs, or combine problem types.

Hot Math Tutoring centers on four word-problem types, chosen from common third-grade curricula: “shopping list” word problems, “half” problems, step-up function or “buying bags” problems, and 2-step “pictograph” problems. The program is divided into 3-week units (three 20-30 minute sessions per week); one unit is devoted to each of the four word-problem types, and a one-week review is conducted following winter break. Frequent cumulative review across word-problem types is incorporated.

During the first 5 sessions of each unit, problem-solution instruction is delivered. Sessions 6-9 in each unit are designed to teach students to transfer the solution strategy to problems with unexpected questions or irrelevant information.
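The schedule arithmetic implied by the unit structure can be checked with a short sketch. The constant and function names are illustrative and not part of the program materials; the numbers come from the description above. One assumption is labeled in the code: the review week is taken to have the same three sessions as any other week.

```python
# Illustrative names; numeric values are taken from the program description.
UNITS = 4                        # one unit per word-problem type
WEEKS_PER_UNIT = 3
SESSIONS_PER_WEEK = 3
REVIEW_WEEKS = 1                 # one-week review after winter break
MINUTES_PER_SESSION = (20, 30)   # shortest and longest session length


def schedule_summary():
    """Return total weeks, sessions per unit, total sessions, and total minutes."""
    total_weeks = UNITS * WEEKS_PER_UNIT + REVIEW_WEEKS
    sessions_per_unit = WEEKS_PER_UNIT * SESSIONS_PER_WEEK
    # Assumption: the review week also has three sessions.
    total_sessions = total_weeks * SESSIONS_PER_WEEK
    min_minutes = MINUTES_PER_SESSION[0] * total_sessions
    max_minutes = MINUTES_PER_SESSION[1] * total_sessions
    return {
        "total_weeks": total_weeks,
        "sessions_per_unit": sessions_per_unit,
        "total_sessions": total_sessions,
        "total_minutes": (min_minutes, max_minutes),
    }


if __name__ == "__main__":
    print(schedule_summary())
```

Under these figures, each unit has nine sessions (five of problem-solution instruction plus sessions 6-9 on transfer), and the full program spans 13 weeks.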


Hot Math Tutoring is intended for use in third grade. It is designed for use with students with disabilities (including learning disabilities, intellectual disabilities, and behavioral disabilities) and any student at risk of academic failure. The academic area of focus is math word problems.

Hot Math Tutoring has been used in more than 200 schools across the country.


Where to obtain:

The Fuchs Research Group

Lynn Davies
228 Peabody
Vanderbilt University
Nashville, TN 37220
Phone: 615-343-4782


Cost: Initial cost per student for implementing program: $80 per tutor plus ~$25 per student in copying

Replacement cost per student for subsequent use: ~$25

Included: Manual ($40), master copies of all materials ($40)
Not included: individual student copies of materials, concrete reinforcers

The manual provides all information necessary for implementation and includes master copies of all materials. Schools need to make copies of materials (lamination for posters and reusable materials is recommended).
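As a rough budgeting aid, the cost figures above can be combined as in the following sketch. The function names and the example school (2 tutors, 8 students) are hypothetical; the copying cost is approximate in the source ("~$25"), and items listed as not included, such as concrete reinforcers, as well as the recommended lamination, are left out.

```python
# Figures from the cost description; names are illustrative only.
MATERIALS_COST_PER_TUTOR = 80     # manual ($40) + master copies ($40)
COPYING_COST_PER_STUDENT = 25     # approximate, per the source


def initial_cost(tutors: int, students: int) -> int:
    """Estimated first-year cost: tutor materials plus student copies."""
    return tutors * MATERIALS_COST_PER_TUTOR + students * COPYING_COST_PER_STUDENT


def replacement_cost(students: int) -> int:
    """Estimated cost for a later cohort: student copies only."""
    return students * COPYING_COST_PER_STUDENT


if __name__ == "__main__":
    # Hypothetical school: 2 tutors serving 8 students.
    print(initial_cost(tutors=2, students=8))   # 360
    print(replacement_cost(students=8))         # 200
```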

Hot Math Tutoring is designed for use with individual students or small groups of two to four students.

Hot Math Tutoring takes 20-30 minutes per session with a recommended three sessions per week for 13 weeks.

The program includes a highly specified teacher’s manual. The program is not affiliated with a basal text, but can be used with a Classroom Hot Math program (2 times per week for 30-45 minutes per session); efficacy data support Hot Math Tutoring with or without Classroom Hot Math. No special technology is required.


One full day of training, plus follow-up by school or district staff with weekly supervision of tutors, is required.

Tutors are trained in one full-day session. Tutors are introduced to the program and its goals and provided instruction, demonstrations, and scripted materials. They are paired to practice the program and are provided feedback from the trainer. Additional consultation with the trainer is available by email or phone following training. Tutors attend weekly meetings to learn about and practice upcoming program topics and to discuss challenges. These weekly meetings are supervised by a building or district instructional support person.

Instructors may be certified teachers or paraprofessionals. The training manuals have been used widely, and users report high levels of satisfaction.

To schedule Hot Math tutor training, contact


Participants: Convincing Evidence

Sample size: 84 third-grade students across 120 classrooms (2,023 students screened initially; 56 students in the treatment group and 28 students in the control group)

Risk Status: All of these students scored below the district criterion designating risk for math learning disabilities on the Test of Computational Fluency. The at-risk sample was at the 24th percentile (lowest 72 of each cohort’s 300 students). The 300 students were a representative sample on a combination of the pretest immediate transfer measure of math problem solving (a reliable index that correlates well with commercial measures of math problem solving) and pretest performance on the Test of Computational Fluency, a reliable and widely used measure of mathematics skill. I use the term “representative sample” in the research design sense, i.e., representing the full range of performance (e.g., not a sample of students selected for low or high performance). The students in this study were in a metropolitan area with a high proportion of students receiving subsidized lunch, so in terms of a national sample, it is safe to assume the sample is below the 25th percentile of a nationally representative sample in the demographic sense.
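The 24th-percentile figure follows directly from the counts given above, as this small sketch shows. The function name is illustrative and not from the study.

```python
def percentile_cutoff(n_lowest: int, cohort_size: int) -> float:
    """Percentile rank marking the top of the lowest-scoring n_lowest students."""
    return 100 * n_lowest / cohort_size


if __name__ == "__main__":
    # Lowest 72 of each cohort's 300 students.
    print(percentile_cutoff(72, 300))  # 24.0
```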





p of chi square

Grade level
  Grade 1
  Grade 2
  Grade 3: NS (p=1.00)
  Grade 4
  Grade 5
  Grade 6
  Grade 7
  Grade 8
  Grade 9
  Grade 10
  Grade 11
  Grade 12

NS (p=0.396)
  American Indian
  Asian/Pacific Islander

Socioeconomic status
  Subsidized lunch: NS (p=0.988)
  No subsidized lunch

Disability status
  Speech-language impairments
  Learning disabilities
  Behavior disorders
  Intellectual disabilities
  Not identified with a disability: NS (p=0.266)

ELL status
  English language learner: NS (p=0.158)
  Not English language learner

NS (p=0.494)







Training of Instructors: None of the tutors was a certified teacher; only one tutor had previous tutoring experience. Tutors were trained in one full-day session. They were introduced to the program and its goals and provided instruction, demonstrations, and scripted materials. They were paired to practice the program. Then they conducted one lesson for a trainer and were judged on a point-by-point system for fidelity to treatment. A tutor who achieved 95% fidelity was considered reliable. A tutor who scored lower than 95% was coached on the points he or she missed, asked to practice more, and then re-rated at a later time on another lesson. At weekly meetings, tutors met with a trainer to solve problems that arose. At the beginning of each unit, a 3-hour training session was conducted to orient tutors and distribute supporting materials. Across the four years of the study, the typical tutor was one to two years beyond undergraduate education and studying for a graduate degree in education, special education, counseling, or education policy. The majority of tutors worked for the project for one year, with three tutors working for more than one year. Each year of the study, two full-time project coordinators, typically with bachelor's or master's degrees outside of education, also tutored. Each year, five or six tutors were needed. (No tutor conducted both Classroom Hot Math and Hot Math Tutoring.)

Design: Convincing Evidence

Did the study use random assignment?: Yes.

If not, was it a tenable quasi-experiment?: Not applicable.

If the study used random assignment, at pretreatment, were the program and control groups not statistically significantly different and had a mean standardized difference that fell within 0.25 SD on measures used as covariates or on pretest measures also used as outcomes?: Yes.

If not, at pretreatment, were the program and control groups not statistically significantly different and had a mean standardized difference that fell within 0.25 SD on measures central to the study (i.e., pretest measures also used as outcomes), and outcomes were analyzed to adjust for pretreatment differences?: Not applicable.

Were the program and control groups demographically comparable at pretreatment?: Yes.

Was there attrition bias¹?: No.

Did the unit of analysis match the unit for random assignment (for randomized studies) or the assignment strategy (for quasi-experiments)?: Yes.

¹ NCII follows guidance from the What Works Clearinghouse (WWC) in determining attrition bias. The WWC model for determining bias based on a combination of differential and overall attrition rates can be found on pages 13-14 of this document:


Fidelity of Implementation: Convincing Evidence

Describe when and how fidelity of treatment information was obtained: Each tutoring session was audiotaped. At the study’s end, four research assistants independently listened to tapes while completing a checklist to identify the percentage of points addressed. We sampled tapes so that, within conditions, tutors, groups, and session numbers were sampled equitably. For each of 64 tutoring small groups, 20% of sessions were sampled (7-8 tapes distributed equally across the four units). Intercoder agreement, calculated on 20% of the sampled tapes, was 96.4%.

Provide documentation (i.e., in terms of numbers) of fidelity of treatment implementation: The mean percentage of points addressed across all units was 98.12 (SD = 1.28).
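The "7-8 tapes" figure in the sampling description is consistent with a 20% sample of a 39-session program, as the sketch below illustrates. The 39-session count is an assumption (13 weeks at three sessions per week, per the program description); the source does not state the number of sessions each study group actually received. Names are illustrative.

```python
import math

# Assumption: each group received 13 weeks x 3 sessions = 39 sessions.
SESSIONS_PER_GROUP = 13 * 3
SAMPLE_PERCENT = 20   # 20% of sessions were sampled per group


def sampled_tapes_range(sessions: int, percent: int) -> tuple[int, int]:
    """Whole-number range of tapes that a given percent sample implies."""
    exact = sessions * percent / 100
    return math.floor(exact), math.ceil(exact)


if __name__ == "__main__":
    print(sampled_tapes_range(SESSIONS_PER_GROUP, SAMPLE_PERCENT))  # (7, 8)
```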

Measures Targeted: Convincing Evidence

Measures Broader: Convincing Evidence

Proximal Measure: Immediate Transfer

Score type & range of measure: Number correct (0-44)

Reliability statistics: Coefficient alpha on this sample 0.84-0.95

Relevance to program instructional content: Incorporates novel problems in the same format as the problems used for problem-solution instruction; none of the cover stories are used for instruction.

Distal Measure: Near Transfer

Score type & range of measure: Number correct (0-79)

Reliability statistics: Coefficient alpha on this sample 0.87-0.96

Relevance to program instructional content: Incorporates novel problems that vary from the problems used for problem-solution instruction in terms of one or more of the transfer features addressed in Hot Math Tutoring: unfamiliar vocabulary, different question, irrelevant information, or combination of problem types. Comprises nine problems: a shopping list problem with a novel format (information shown in bulleted format, with a selection rather than an open-ended response format); a shopping list problem with a novel question (asking for money left at the end); a buying bags problem with a different key word (packages instead of bags); a buying bags problem with a novel question (comparing prices of two packaging options); a half problem with unfamiliar vocabulary (share equally instead of half); a pictograph problem with a novel question (asking for money left at the end); a pictograph problem with a novel question (comparing quantities at the end); a problem with irrelevant information that combined a buying bags problem with a pictograph problem and combined novel vocabulary with a novel question; and a problem with irrelevant information that combined a shopping list problem with a buying bags problem and combined a novel format with a novel question.

Distal Measure: Far Transfer

Score type & range of measure: Number correct (0-72)

Reliability statistics: Coefficient alpha on this sample 0.91-0.94

Relevance to program instructional content: Designed to mirror real-life problems; varies from the problems used for instruction in multiple ways: is formatted to look like a commercial, standardized test; presents a multi-paragraph narrative with four questions; some of the information needed to answer the questions is removed from the multi-paragraph narrative and placed in figures or question stems; contains multiple pieces of numeric and narrative irrelevant information; provides opportunities for students to formulate decisions; combines all four problem types; and varies all four Hot Math Tutoring transfer features. Simultaneously assessed transfer of all four problem types and the four transfer features addressed in Hot Math Tutoring. Also, to decrease association between the task and classroom or Hot Math Tutoring, far transfer was formatted to look like a commercial test (printed with a formal cover, on green paper, with photographs and graphics interspersed throughout the test booklet). Two assessments were constructed as alternate forms: although the contexts of the problem situations differed, the structure of the problem situations and the questions are identical, and the problem solutions and reading demands are equivalent.


Number of Outcome Measures: 3 Math

Mean ES - Targeted: 1.15*

Mean ES - Broader: 0.60*

Effect Size:

Targeted Measures

Construct Measure Effect Size
Math Immediate Transfer 1.15***

Broader Measures

Construct Measure Effect Size
Math Near Transfer 0.82***
Math Far Transfer 0.38


* p ≤ 0.05
** p ≤ 0.01
*** p ≤ 0.001
– Developer was unable to provide necessary data for NCII to calculate effect sizes
u Effect size is based on unadjusted means
† Effect size based on unadjusted means not reported due to lack of pretest group equivalency, and effect size based on adjusted means is not available
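The reported mean effect sizes are consistent with a simple unweighted average of the per-measure effect sizes, as the sketch below shows. That averaging rule is an assumption that happens to reproduce the reported figures; the source does not state how NCII computes the means. The dictionary and function names are illustrative.

```python
from statistics import mean

# Effect sizes as reported in the tables above.
EFFECT_SIZES = {
    "targeted": {"Immediate Transfer": 1.15},
    "broader": {"Near Transfer": 0.82, "Far Transfer": 0.38},
}


def mean_effect_size(domain: str) -> float:
    """Unweighted mean of the effect sizes in a domain, to two decimals."""
    # Assumption: the reported mean ES is a simple average across measures.
    return round(mean(list(EFFECT_SIZES[domain].values())), 2)


if __name__ == "__main__":
    print(mean_effect_size("targeted"))  # 1.15
    print(mean_effect_size("broader"))   # 0.6
```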


Visual Analysis (Single Subject Design): N/A

Disaggregated Data for Demographic Subgroups: No

Disaggregated Data for <20th Percentile: No

Administration Group Size: Small group (n=2-4)

Duration of Intervention: 20-30 minutes, 3 times a week, 13 weeks

Minimum Interventionist Requirements: Paraprofessional, 8 hours of training plus weekly follow-up

Reviewed by WWC or E-ESSA: E-ESSA

What Works Clearinghouse

This program was not reviewed by What Works Clearinghouse.


Evidence for ESSA

No studies considered met Evidence for ESSA's inclusion requirements.

Other Research: Potentially Eligible for NCII Review: 0 studies