
18 RStudio Workshop: Analysis of Variance (ANOVA)

ANOVA (Analysis of Variance) is a statistical technique for comparing the means of three or more groups to determine whether they differ significantly. It helps you judge whether the observed differences in means reflect genuine effects or could plausibly have arisen by chance.

The fundamental idea of ANOVA is to split the total variation in the data into two parts: variation between groups and variation within groups. When the between-group variation is large relative to the within-group variation, this suggests that at least one group mean differs from the others.
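The split described above can be sketched by hand in R. This is a minimal illustration on made-up numbers (the vectors y and group are hypothetical, not workshop data). It computes the between-group and within-group sums of squares, forms the F statistic from them, and compares the result with R's built-in aov().

```r
# Hypothetical toy data: three groups of four observations each
y     <- c(4, 5, 6, 5,  7, 8, 9, 8,  10, 11, 12, 11)
group <- factor(rep(c("A", "B", "C"), each = 4))

grand_mean  <- mean(y)
group_means <- ave(y, group)   # each observation's own group mean

# Between-group variation: group means around the grand mean
ss_between <- sum((group_means - grand_mean)^2)
# Within-group variation: observations around their own group mean
ss_within  <- sum((y - group_means)^2)
# Total variation: observations around the grand mean
ss_total   <- sum((y - grand_mean)^2)

k <- nlevels(group)            # number of groups
n <- length(y)                 # total number of observations

# F statistic: between-group mean square divided by within-group mean square
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

# The two parts add up to the total variation
all.equal(ss_between + ss_within, ss_total)

# aov() reports the same sums of squares and F value
summary(aov(y ~ group))
```

A large F value (here the between-group mean square is many times the within-group mean square) corresponds to a small p-value.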

There are different types of ANOVA, including one-way ANOVA (comparing means across the levels of a single factor) and two-way ANOVA (examining the influence of two factors, and optionally their interaction, on a response variable). In each case, ANOVA tells you whether the variability in the response is significantly associated with the groupings or factors.
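As a rough sketch of how the two types look in R, the example below fits both with aov(). It assumes the ToothGrowth dataset that ships with R, chosen here only for illustration and not part of the workshop data: len is the response, and supp and dose are the two factors.

```r
# Illustrative data only: ToothGrowth is built into R
data(ToothGrowth)
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a grouping factor, not a number

# One-way ANOVA: does mean tooth length differ across dose levels?
one_way <- aov(len ~ dose, data = tg)
summary(one_way)

# Two-way ANOVA: effects of supplement type, dose, and their interaction
two_way <- aov(len ~ supp * dose, data = tg)
summary(two_way)
```

Writing supp + dose instead of supp * dose would fit the two main effects without the interaction term.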

The assumptions of an ANOVA are similar to those of a t-test, but they apply to multiple groups. The main assumptions are:

  • Independence: Observations within each group are independent of each other. This assumption rests on the experimental design and is not directly tested statistically, so consult your supervisor if you are worried about violating it.
  • Normality: The residuals (differences between observed and predicted values) are approximately normally distributed within each group.
    • Visual Inspection: Create histograms or Q-Q plots of the residuals for each group. If the histograms look roughly bell-shaped, or the points on the Q-Q plots fall close to a straight line, normality is likely met.
    • Use “shapiro.test()” to perform a Shapiro-Wilk normality test on the residuals for each group. A p-value above your significance level (commonly 0.05) means the test found no evidence against normality.

# Shapiro-Wilk normality test for Group 1
# (residuals_group1 is a numeric vector holding that group's residuals)

shapiro.test(residuals_group1)

# Shapiro-Wilk normality test for Group 2

shapiro.test(residuals_group2)

# Repeat for other groups
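One way to obtain the per-group residual vectors used above is to fit the model first and then split its residuals by group. The sketch below assumes the PlantGrowth dataset that ships with R (weight measured in three groups), used here purely as stand-in data. The names model, res_by_group and p_values are illustrative.

```r
# Illustrative data only: PlantGrowth is built into R
data(PlantGrowth)

# Fit the one-way ANOVA model, then extract its residuals
model <- aov(weight ~ group, data = PlantGrowth)

# Split the residuals into one numeric vector per group
res_by_group <- split(residuals(model), PlantGrowth$group)

# Run the Shapiro-Wilk test on each group's residuals
shapiro_results <- lapply(res_by_group, shapiro.test)

# Collect the p-values in one named vector for easy comparison
p_values <- sapply(shapiro_results, function(res) res$p.value)
p_values
```

The same list can be reused for visual checks, for example by calling hist() or qqnorm() followed by qqline() on each element of res_by_group.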

  • Homogeneity of Variance (Equal Variances): The variances of the residuals are roughly equal across all groups.

Use leveneTest() from the “car” package to perform Levene’s test for homogeneity of variances.

# Levene's test for homogeneity of variances
# install.packages("car")   # run once if the package is not yet installed
library(car)

leveneTest(residuals ~ group_variable)

Remember to replace residuals with your calculated residuals and group_variable with the variable that defines your groups. If the assumptions are not met, consider alternatives such as Welch’s ANOVA (for unequal variances) or a non-parametric test such as the Kruskal-Wallis test. Always assess the assumptions carefully, as violations can affect the validity of your ANOVA results. Again, consult with your supervisor about these assumptions and tests.
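The variance check and the two alternatives mentioned above can be sketched with base R alone. The example again assumes the built-in PlantGrowth data as a stand-in, and the object names are illustrative. The first step reproduces the idea behind Levene's test by hand: a one-way ANOVA on absolute deviations from each group's median, which is the centring that leveneTest() in the "car" package uses by default.

```r
# Illustrative data only: PlantGrowth is built into R
data(PlantGrowth)
pg <- PlantGrowth

# Levene's test by hand: absolute deviation of each observation
# from its own group's median, then a one-way ANOVA on those deviations
pg$abs_dev <- abs(pg$weight - ave(pg$weight, pg$group, FUN = median))
levene_by_hand <- anova(lm(abs_dev ~ group, data = pg))
levene_by_hand

# Welch's ANOVA: does not assume equal variances across groups
welch <- oneway.test(weight ~ group, data = pg, var.equal = FALSE)
welch

# Kruskal-Wallis test: rank-based, non-parametric alternative
kw <- kruskal.test(weight ~ group, data = pg)
kw
```

A small p-value in the first table would point to unequal variances, in which case the Welch result is the safer one to report. Which alternative suits your data is a judgment call worth discussing with your supervisor.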