Experimental setup analysis
For the experimental setup, two analyses were performed: a within course/between section comparison, looking at the achievement of sections that were part of the experiment versus the control sections; and a within section comparison, looking at the achievement of students who swapped for a physical resource versus those who did not.
Within course comparison
For the within course comparison, three sub-analyses were performed: all sections of a course were compared against each other; experimental sections and control sections were each grouped together and compared; and finally, formats (i.e. physical and digital) were grouped together and compared.
All sections comparison
For the all sections analysis, each section was compared to the others within its course to determine whether any section differed to a statistically significant degree, regardless of whether it was a control or experimental section. The normality of the grade distribution in each section was tested using the Shapiro-Wilk test. Each section showed deviations from normality (p < .05). Because of the small sample size for each section, a non-parametric test was used to assess differences between each course’s sections. A Kruskal-Wallis test was used in place of a Mann-Whitney U test because some courses had more than two sections. One statistically significant difference was detected, in Course 2, χ²(2) = 8.429, p = .015. Pairwise comparisons of sections were performed using Dunn’s procedure with a Bonferroni correction for multiple comparisons. Adjusted p-values are presented. Values are mean ranks unless otherwise indicated. The post hoc analysis found a significant difference in student grades between sections C1 (44.78) and C2 (65.91) (p = .012), but not between E1 (57.31) and either of the control sections.
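As an illustration of the omnibus test described above, the following is a minimal sketch of a Kruskal-Wallis H test with the usual tie correction, written against the Python standard library only. It is not the software used for the reported analysis, which the text does not name. The function names (`midranks`, `chi2_sf`, `kruskal_wallis`, `bonferroni`) are hypothetical, and the Bonferroni helper assumes the adjustment is the raw p-value multiplied by the number of pairwise comparisons and capped at 1.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Start from the closed form for df = 2 (even) or df = 1 (odd),
    # then step up two degrees of freedom at a time.
    if df % 2 == 0:
        q, nu = math.exp(-half), 2
    else:
        q, nu = math.erfc(math.sqrt(half)), 1
    while nu < df:
        q += half ** (nu / 2) * math.exp(-half) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def kruskal_wallis(groups):
    """Return (H, df, p) for a list of samples, one list of grades per section.

    Assumes at least two distinct values overall (otherwise the tie
    correction divides by zero).
    """
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = midranks(pooled)
    h, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        h += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)  # tie correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value (assumed form: p times m, capped at 1)."""
    return min(1.0, p * n_comparisons)
```

Under these assumptions, `chi2_sf(8.429, 2)` evaluates to approximately .015, which agrees with the p-value reported for Course 2.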
Experimental sections versus control sections comparison
The normality of each group (i.e. experimental and control, for each course) was tested using the Shapiro-Wilk test and the presence of outliers was assessed using boxplots. Both groups for each course showed deviations from normality (p < .05); some groups additionally had outliers. Because of the small sample size, a non-parametric test was used in place of a parametric test. A Mann-Whitney U test was used to compare the distribution of grades between the experimental and control groups for each course. None of the results were statistically significant (i.e. p > .05 in every case), indicating that the null hypothesis of each group having the same grade distribution should be retained. The results of the test for each course are summarized below.
Mann-Whitney U test of experimental versus control flagged students by course
Course | U | z | p |
---|---|---|---|
Course 1 | 1753.5 | 1.011 | .312 |
Course 2 | 1417.5 | .310 | .756 |
Course 3 | 582.0 | 1.156 | .248 |
Course 4 | 1860.0 | .162 | .871 |
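The U, z, and p columns above can be illustrated with a minimal sketch of the Mann-Whitney U test using the normal approximation with a tie correction. Several choices here are assumptions, since the text does not state them: U is computed for the first sample, so a positive z means the first sample tends to rank higher; no continuity correction is applied; and the p-value is two-sided. The function names are hypothetical, and only the Python standard library is used.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return 2 * NormalDist().cdf(-abs(z))


def mann_whitney_u(x, y):
    """Return (U, z, p) comparing sample x against sample y.

    U is the statistic for x (assumed orientation); z uses the normal
    approximation with a tie correction and no continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u - n1 * n2 / 2) / sigma
    return u, z, two_sided_p(z)
```

As a consistency check under these assumptions, `two_sided_p(1.011)` is approximately .312, which agrees with the p-value reported for Course 1.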
Physical versus digital format comparison
For this analysis, students were grouped as either digital (i.e. students in control sections plus students in experimental sections who opted not to swap) or physical (i.e. students in experimental sections who swapped), and the grades of the two groups were compared. It should additionally be noted that while some students explicitly rejected the physical format when offered (and were thus recorded as “digital” for flagging purposes), students who were never present, and therefore did not express a preference either way, are also included in the digital group.
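The grouping rule described above can be sketched as a small function. The field names and values used here (`section_type`, `swapped`, and the strings `"experimental"` and `"control"`) are hypothetical and chosen only for illustration; `swapped` is `None` for a student who was never present and so expressed no preference.

```python
from typing import Optional


def format_group(section_type: str, swapped: Optional[bool]) -> str:
    """Assign a student to the physical or digital format group.

    Only students in experimental sections who actually swapped count as
    physical. Control students, students who declined the swap, and
    students who never expressed a preference all default to digital.
    """
    if section_type == "experimental" and swapped is True:
        return "physical"
    return "digital"
```

Note that this rule places students who never expressed a preference in the same group as those who explicitly declined, which is the caveat raised above.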
The normality of each group (i.e. physical and digital, for each course) was tested using a Shapiro-Wilk test. Both groups for each course displayed deviations from normality. Because of the small sample size, a non-parametric test was chosen. A Mann-Whitney U test was used to compare the distribution of grades in each group for each course. The results are summarized below.
Mann-Whitney U test of physical versus digital resource format by course
Course | U | z | p | Mean rank (physical) | Mean rank (digital) |
---|---|---|---|---|---|
Course 1 | 1661.0 | 2.514 | .031* | 70.87 | 55.62 |
Course 2 | 1180.5 | 1.000 | .317 | | |
Course 3 | 376.5 | 2.029 | .042* | 43.15 | 30.53 |
Course 4 | 1748.5 | -.240 | .810 | | |
* denotes statistical significance at the p < .05 level
Thus, when students are grouped by resource format, there are statistically significant differences in the distributions of grades for Courses 1 and 3, with the physical format group having a higher mean rank than the digital group in both courses. Because the shapes of the two groups’ distributions were dissimilar in each course, the difference in medians between the groups was not compared.
Within section comparison
For each experimental section that had a swap rate of less than 100% (3 sections total), students were grouped based on their resource format (i.e. physical versus digital) and a comparison of the two groups was conducted. As above, students ended up in the digital group of an experimental section if they either opted not to swap their resource or if they never indicated a preference and thus defaulted to their digital IPM resource. The normality of each group (i.e. physical versus digital) was tested using the Shapiro-Wilk test. Deviations from normality were detected in each group for each course section (p < .05), except for the digital format group of Course 4, section E2.
Because of the small sample size, a non-parametric test was used to assess differences between the groups. A Mann-Whitney U test was used to compare the distribution of grades between the format groups. Values are mean ranks unless otherwise noted. One statistically significant difference was found for Course 1, with the physical format group (23.55) having a higher mean rank than the digital format group (14.05), U = 241.5, z = 2.281, p = .023. Because of the dissimilar shapes of the distributions, however, no comparison of medians was performed. Courses 2, 3, and 4 did not show any statistically significant differences between the grade distributions of the two formats.