Wrapping my head around random versus fixed effects took me a while in graduate school. In part, this is because multiple definitions exist. Within ecology, the general definition I see is that a fixed effect is estimated by itself whereas a random effect comes from a higher distribution. Two examples drilled this home for me and helped it click.
First, the question: “Do we care about the specific group or only that the groups might be having an impact?” helped me see the difference between fixed and random effects. For example, if we were interested in air quality data as a function of temperature across cities, city could be either a fixed or random effect. If city was a fixed effect, then we would be interested the air quality at that specific city (e.g., the air quality in New York, Los Angles, and Chicago). Conversely, if city as a random effect, then we would not care about a specific city, only that a city might impact the results due to city specific conditions.
Second, an example in one of Marc Kerry’s book on WinBugs drilled home the point. Although he used WinBugs, the R package lme4 can be used to demonstrate this. Additionally, although his example was something about snakes, a generic regression will work. (I mostly remember the figure and had to recreate it from memory. It was about ~5 or 6 years ago and I have not been able to find the example in his book to recreate it, hence I coded this from memory). Here’s the code
population = rep(c(“a”, “b”, “c”), each = 3)
intercept = rep( c(1, 5, 6), each = 3)
slope = 4
sd = 2.0
dat = data.frame(
population = population,
interceptKnown = intercept,
slopeKnown = slope,
sdKnown = sd,
predictor = rep(1:3, times = 3))
dat$response = with(dat,
rnorm(n = nrow(dat), mean = interceptKnown, sd = sdKnown) +
predictor * slopeKnown
## Run models
lmOut <- lm(response ~ predictor + population, data = dat)
lmerOut <- lmer( response ~ predictor + (1 | population), data = dat)
## Create prediction dataFrame
dat$lm <- predict(lmOut, newData = dat)
dat$lmer <- predict(lmerOut, newData = dat)
ggplot(dat, aes(x = predictor, y = response, color = population)) +
geom_point(size = 2) +
scale_color_manual(values = c(“red”, “blue”, “black”)) +
geom_line(aes(x = predictor, y = lm)) +
geom_line(aes(x = predictor, y = lmer), linetype = 2)
Which produces this figure:
Example of a fixed-effect intercept (solid line) compared to a random-effect (dashed line) regression analysis.
Play around with the code if you want to explore this more. At first, I could not figure out how to make the dashed lines be farther apart from the solid lines. Change the simulated standard deviation to see what happens. Hint, my initial guess of decreasing did not help.