001) and Year 2 (p < .001), when evaluations were administered on paper in the classroom for all face-to-face courses and online for all online courses. Although the difference in response rate between face-to-face and online courses during the Year 3 administration was statistically reliable (when both face-to-face and online courses were evaluated with online surveys), the effect was small (partial η² = .02). Thus, there was minimal difference in response rate between face-to-face and online courses when evaluations were administered online for all courses. No other factors or interactions included in the analysis were statistically reliable.

Evaluation Ratings

The same 2 × 3 × 3 analysis of variance model was used to evaluate mean SET ratings. This analysis produced two statistically significant main effects. The first main effect involved evaluation year, F(1.86, 716) = 3.44, MSE = 0.18, p = .03 (partial η² = .01; see Footnote 1). Evaluation ratings associated with the Year 3 administration (M = 3.26, SD = 0.60) were significantly lower than the evaluation ratings associated with both the Year 1 (M = 3.35, SD = 0.53) and Year 2 (M = 3.38, SD = 0.54) administrations. Thus, all courses received lower SET scores in Year 3, regardless of course delivery method and course level. However, the size of this effect was small (the largest difference in mean rating was 0.11 on a five-item scale). The second statistically significant main effect involved delivery mode, F(1, 358) = 23.51, MSE = 0.52, p = .01 (partial η² = .06; see Footnote 2). Face-to-face courses (M = 3.41, SD = 0.50) received significantly higher mean ratings than did online courses (M = 3.13, SD = 0.63), regardless of evaluation year and course level. No other factors or interactions included in the analysis were statistically reliable.

1 A Greenhouse–Geisser adjustment of the degrees of freedom was performed in anticipation of a sphericity assumption violation.

2 A test of the homogeneity of variance assumption revealed no statistically significant difference in response rate variance between the two delivery modes for the 1st, 2nd, and 3rd years.

Stability of Ratings

The scatterplot presented in Figure 1 illustrates the relation between SET scores and response rate. Although the correlation between SET scores and response rate was small and not statistically significant, r(362) = .07, visual inspection of the plot of SET scores suggests that SET ratings became less variable as response rate increased. We conducted Levene’s test to evaluate the variability of SET scores above and below the 60% response rate, which several researchers have recommended as an acceptable threshold for response rates (Berk, 2012, 2013; Nulty, 2008). The difference in the variability of scores above and below the 60% threshold was not statistically reliable, F(1, 362) = 1.53, p = .22.

Discussion

Online administration of SETs in this study was associated with lower response rates, yet it is curious that online courses experienced a 10% increase in response rate when all courses were evaluated with online forms in Year 3. Online courses had suffered from chronically low response rates in previous years, when
