R Cheat Sheet: Regression & Experimental Analysis

Quick Reference for Lab Session

Author

Affiliation

Linear Regression

Fitting Models

# Simple regression
model <- lm(dependent_var ~ independent_var, data = mydata)

# Multiple regression
model <- lm(dependent_var ~ var1 + var2, data = mydata)

# With log transformation
model <- lm(log(dependent_var) ~ independent_var, data = mydata)

Examining Results

summary(model)                    # Full model summary
coef(model)                       # Extract coefficients
confint(model)                    # Confidence intervals
summary(model)$r.squared          # R²

Model Diagnostics

# Extract values
fitted_values <- fitted(model)      # Fitted values (predictions)
residuals_values <- residuals(model) # Residuals

# Diagnostic plots
plot(fitted_values, residuals_values)  # Residuals vs Fitted
qqnorm(residuals_values)               # Q-Q plot
qqline(residuals_values)               # Add reference line

t-Tests

Independent Samples t-test

# Basic syntax
t.test(outcome ~ group, data = mydata)

# With equal variances assumed
t.test(outcome ~ group, data = mydata, var.equal = TRUE)

# With unequal variances (Welch's test)
t.test(outcome ~ group, data = mydata, var.equal = FALSE)

Paired t-test

t.test(after ~ before, data = mydata, paired = TRUE)

Effect Size (Cohen’s d)

library(effectsize)
cohens_d(outcome ~ group, data = mydata)

ANOVA

One-Way ANOVA

# Using aov()
model <- aov(outcome ~ group, data = mydata)
summary(model)

# Using lm() (gives same results)
model <- lm(outcome ~ group, data = mydata)
anova(model)

Effect Size (η²)

library(effectsize)
eta_squared(model)

Post-Hoc Tests

# Tukey HSD (for aov models)
TukeyHSD(model)

# Alternative using emmeans (works with lm too)
library(emmeans)
emmeans(model, pairwise ~ group)

Assumption Checking

Normality Tests

# Shapiro-Wilk test (H0: data is normally distributed)
shapiro.test(mydata$variable)

# For residuals
shapiro.test(residuals(model))

# Visual check
qqnorm(mydata$variable)
qqline(mydata$variable)

Equal Variances Test

library(car)
# Levene's test (H0: variances are equal)
leveneTest(outcome ~ group, data = mydata)

Data Transformation

Log Transformation

# Create new variable
mydata <- mydata %>%
  mutate(log_var = log(variable))

# Interpretation: coefficients = % change
# Formula: percentage_change = (exp(coefficient) - 1) * 100

Square/Quadratic

# For U-shaped or inverted-U relationships
mydata <- mydata %>%
  mutate(var_squared = variable^2)

model <- lm(outcome ~ variable + var_squared, data = mydata)

Visualization (ggplot2)

Scatter Plot with Regression Line

ggplot(data, aes(x = predictor, y = outcome)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE)

Boxplot for Group Comparisons

ggplot(data, aes(x = group, y = outcome, fill = group)) +
  geom_boxplot() +
  geom_jitter(width = 0.2, alpha = 0.5) +
  stat_summary(fun = mean, geom = "point", shape = 23, size = 3, fill = "red")

Bar Plot with Error Bars

# First calculate means and SE
summary_data <- data %>%
  group_by(group) %>%
  summarise(
    mean = mean(outcome),
    se = sd(outcome) / sqrt(n())
  )

# Then plot
ggplot(summary_data, aes(x = group, y = mean)) +
  geom_col(fill = "steelblue") +
  geom_errorbar(aes(ymin = mean - 1.96*se, ymax = mean + 1.96*se), 
                width = 0.2)

Common dplyr Operations

library(dplyr)

# Filter rows
data %>% filter(variable > 10)

# Select columns
data %>% select(var1, var2)

# Create new variables
data %>% mutate(new_var = var1 + var2)

# Group operations
data %>%
  group_by(group) %>%
  summarise(
    mean = mean(outcome),
    sd = sd(outcome),
    n = n()
  )

Interpretation Guidelines

p-values

p < 0.05: Statistically significant (reject null hypothesis)
p ≥ 0.05: Not statistically significant (fail to reject null)
Report exact p-values when possible (not just “p < 0.05”)

Effect Sizes

Cohen’s d (for t-tests)

Small: d ≈ 0.2
Medium: d ≈ 0.5
Large: d ≈ 0.8

η² (eta-squared, for ANOVA)

Small: η² ≈ 0.01 (1% of variance)
Medium: η² ≈ 0.06 (6% of variance)
Large: η² ≈ 0.14 (14% of variance)

R² (for regression)

Proportion of variance explained
Range: 0 to 1 (or 0% to 100%)
Higher is better, but context matters!

Quick Troubleshooting

Common Errors

Error	Likely Cause	Solution
`object not found`	Variable name typo or data not loaded	Check spelling, use `head(data)`
`non-numeric argument`	Using categorical where numeric expected	Check variable type with `str(data)`
`missing values`	NA values in data	Use `na.omit()` or check with `sum(is.na(data))`
Package not found	Library not installed	Run `install.packages("packagename")`

Getting Help in R

?function_name        # Help for specific function
??search_term         # Search help files
example(lm)          # See examples
str(data)            # Structure of data
head(data)           # First 6 rows
summary(data)        # Summary statistics

Essential Packages to Load

library(ggplot2)      # Visualization
library(dplyr)        # Data manipulation
library(broom)        # Model tidying (optional, but useful)
library(effectsize)   # Effect size calculations
library(car)          # Assumption tests (Levene's test)

Statistical Decision Tree

Do you have...

1 continuous outcome + 1 continuous predictor?
  → Linear Regression (lm)

1 continuous outcome + 2+ continuous predictors?
  → Multiple Regression (lm)

1 continuous outcome + 1 categorical predictor (2 groups)?
  → Independent t-test (t.test)

1 continuous outcome + 1 categorical predictor (3+ groups)?
  → One-Way ANOVA (aov or lm)

Non-linear relationship?
  → Consider transformation (log, quadratic)

Binary outcome (0/1)?
  → Logistic Regression (glm, family = binomial)

Tip: Keep this sheet handy during the exercise portion. Most questions can be answered by adapting these templates to your specific variable names!

Other Formats

Linear Regression

Fitting Models

Examining Results

Model Diagnostics

t-Tests

Independent Samples t-test

Paired t-test

Effect Size (Cohen’s d)

ANOVA

One-Way ANOVA

Effect Size (η²)

Post-Hoc Tests

Assumption Checking

Normality Tests

Equal Variances Test

Data Transformation

Log Transformation

Square/Quadratic

Visualization (ggplot2)

Scatter Plot with Regression Line

Boxplot for Group Comparisons

Bar Plot with Error Bars

Common dplyr Operations

Interpretation Guidelines

p-values

Effect Sizes

Cohen’s d (for t-tests)

η² (eta-squared, for ANOVA)

R² (for regression)

Quick Troubleshooting

Common Errors

Getting Help in R

Essential Packages to Load

Statistical Decision Tree