Mixed Effects Logistic Regression

Generalized linear models use a link function \(g(\cdot)\) that transforms the continuous, unbounded response variable \(y\) of linear regression onto some discrete, bounded space. This allows us to model outcomes that are not continuous and do not have normally distributed errors. To obtain the relationship between the predictors and the untransformed response variable, we apply the inverse link function \(g^{-1}(\cdot)\) to the right hand side of the model.

\[ y = g^{-1}(\mathbf{X}\boldsymbol{\beta}) \]

This linear combination of the explanatory variables is known as the linear predictor and is represented by the greek letter \(\eta\). Since software handles all the nuts and bolts of estimation via maximum likelihood, we can just throw some random effects terms into \(\eta\) without having to worry. The only change we have to make in R is to use glmer() instead of lmer(). Let’s try this out with some survey data. Reveal the code below and run it in R, since I’m using some tricks to clean the data that we won’t learn until later in the semester.

library(plyr) # load plyr first to avoid dplyr conflicts later
library(tidyverse)
library(countrycode)

## read in data
ab <- rio::import('http://afrobarometer.org/sites/default/files/data/round-6/merged_r6_data_2016_36countries2.sav')
au <- read.csv('https://jayrobwilliams.com/data/teaching/au.csv')

## drop missing observations, subset variables, and attach region
ab <- ab %>% filter(Q90A %in% 0:1, # close to party
                    EA_FAC_A %in% 0:1, # post office
                    EA_FAC_B %in% 0:1, # school
                    EA_FAC_C %in% 0:1, # police station
                    EA_FAC_D %in% 0:1, # clinic
                    Q4A %in% 0:5, # present economic condition
                    Q11A %in% 0:3, # property crime
                    Q11B %in% 0:3, # violent crime
                    Q52A %in% 0:3, # trust president
                    Q97 %in% 0:9) %>% # education level
  mutate(COUNTRY = countrycode(gsub('[0-9]*', '', RESPNO), origin = 'cowc',
                               destination = 'country.name', nomatch = NULL),
         COUNTRY = recode(COUNTRY, 'BDI' = 'Burundi', 'CVE' = 'Cape Verde',
                          'MAU' = 'Mauritius', 'MLW' = 'Malawi',
                          'MOZ' = 'Mozambique', 'NGR' = 'Niger',
                          'SRL' = 'Sierra Leone', 'TAN' = 'Tanzania')) %>%
  select(COUNTRY, Q90A, EA_FAC_A:EA_FAC_D, Q4A, Q11A, Q11B, Q52A, Q97) %>%
  left_join(au) %>%
  filter(!is.na(REGION))

## create custom coefficient map for texreg tables
tab_map <- list('EA_FAC_A' = 'Post Office', 'EA_FAC_B' = 'School',
                'EA_FAC_C' = 'Police Station', 'EA_FAC_D' = 'Clinic',
                'Q4A' = 'Economic Condition', 'Q11A' = 'Property Crime',
                'Q11B' = 'Violent Crime', 'Q52A' = 'Trust President',
                'Q97' = 'Education', '0|1' = '0|1', '1|2' = '1|2', '2|3' = '2|3')

Now that we’ve got our data, let’s fit a model. Write a model where whether a respondent feels close to a party is a function of their access to post offices, schools, police stations, and clinics, and their education level. Include a random intercept by country. In scalar form, this model looks like this:

\[\begin{align} \text{party}_i &= \text{logit}^{-1}(\beta_0 + \beta_1post~office + \beta_2school + \beta_3police + \beta_4clinic + \beta_5education + \alpha_{country} + \epsilon) \\ \alpha &\sim \mathcal{N}(0, \sigma^2) \end{align}\]
library(lme4)
library(texreg)

## fit model w/ nested random intercepts
mod <- glmer(Q90A ~ EA_FAC_A + EA_FAC_B + EA_FAC_C + EA_FAC_D + Q97 +
               (1 | COUNTRY), data = ab, family = binomial(link = 'logit'))

## present results
htmlreg(mod, stars = .05, custom.coef.map = tab_map)
Statistical models
Model 1
Post Office -0.12*
(0.03)
School 0.09*
(0.04)
Police Station -0.03
(0.03)
Clinic 0.03
(0.03)
Education -0.01*
(0.01)
AIC 53450.56
BIC 53511.46
Log Likelihood -26718.28
Num. obs. 44371
Num. groups: COUNTRY 33
Var: COUNTRY (Intercept) 0.56
*p < 0.05

Interpretation of the fixed effects (in the lmer() sense!) is straightforward and equivalent to a regular logit. To understand the random effects, we need to do a little more work. The table tells us that the variance of the random intercept by country is 0.56. Since we have assumed mean 0 for the distribution of the random effects, a variance of 0.56 means that there isn’t large variation in the baseline probability of feeling close to a political party in the countries in our data.

Quantities of Interest

Since coefficients are no longer the marginal effect of \(x\) on \(y\) in GLMs, we often present quantities of interest to facilitate model intepretation. Unlike regular GLMs, we can no longer follow our standard approach and simply calculate \(\text{logit}^{-1}(\mathbf{X}\boldsymbol{\beta})\) for a number of \(\boldsymbol{\beta}\) vectors sampled from a multivariate normal distributions. Doing this only accounts for uncertainty in the fixed effects and ignores uncertainty in the random effects because it treats them as known and ignores the fact that they are estimates drawn from a distribution. If we do this, we’re in effect ignoring a large amount of the uncertainty in our model, and so the confidence intervals of our predictions will be smaller than they should be.

Luckily the simulate.merMod() function takes care of this for us. It is similar to the base predict(), except it allows you to draw new random effects from their distribution. This means we can use it to generate predicted probabilities be averaging across the predictions for all of our simulated coefficients. Use the code from this week’s lecture slides as a starting point to calculate the predicted probability of feeling close the president as a function of education level. Plot the results for Mauritius, Senegal, and Sierra Leone.

library(MASS) # multivariate random normal distribution
library(doParallel) # parallel plyr
registerDoParallel(parallel::detectCores()) # use all cores
source('https://raw.githubusercontent.com/jayrobwilliams/RWmisc/master/theme_rw.R')

## subset to countries of interest
pred_dat <- ab %>% subset(COUNTRY %in% c('Mauritius', 'Senegal', 'Sierra Leone'))

## repeat data for each value in Q97
pred_dat_full <- pred_dat[rep(seq_len(nrow(pred_dat)), length(unique(ab$Q97))), ]

## replace Q97 with simulated range of values
pred_dat_full$Q97 <- with(ab, rep(seq(min(Q97), max(Q97),
                                      length.out = length(unique(Q97))),
                                  each = nrow(pred_dat)))

## sample 500 draws from distribution of fixed effects
coef_samp <- mvrnorm(n = 500, mu = fixef(mod), Sigma = vcov(mod))

## predict outcomes for all 500 sampled fixed effects
preds <- alply(coef_samp, 1, function(x) {
  simulate(mod, nsim = 1, # 1 simulation
           re.form = ~ 0, # sample all REs
           newdata = pred_dat_full, # generate predictions from simulated data
           newparams = list(beta = x)) # use sampled fixed effects
  }, .parallel = T, .paropts = list(.export = c('mod', 'pred_dat_full')))

## create empty column to hold simulated predictions
pred_dat_full$pred <- NA

## calculate predicted probabilities from simulated predictions
preds_agg_all <- ldply(preds, function(x, dat) {
  dat$pred <- x[[1]] # set prediction to simulated outcome
  ddply(dat, c('COUNTRY', 'Q97'), # split data by country and Q97
        function(m) mean(m$pred)) # calculate proportion of 1s
  }, dat = pred_dat_full, .parallel = T,
  .paropts = list(.export = c('mod', 'pred_dat_full')))

## calculate median and quantiles for plotting
preds_agg <- ddply(preds_agg_all, c("COUNTRY", "Q97"),
                   function (y) quantile(y$V1, c(0.05, 0.5, 0.95)))

## rename columns for ggplot access
names(preds_agg)[3:5] <- c("LB", "PE", "UB")

## plot predicted probabilities
ggplot(preds_agg, aes(x = Q97, y = PE, ymin = LB, ymax = UB)) +
  geom_linerange() +
  geom_point() +
  facet_wrap(~ COUNTRY) +
  theme_rw()

We can see that there isn’t a statistically significant difference between any of the countries, but that shouldn’t be surprising given that the variance of our country random intercept is 0.56.

Nested Random Effects

One of the advantages of lme4 is that it allows us to estimate nested random effects models when we have data with multiple levels of dependence. We can extend the model above to allow the mean to the country random intercept to vary as a function of region.

\[\begin{align} \text{party}_i &= \text{logit}^{-1}(\mathbf{x}_i\boldsymbol{\beta} + \alpha_{j[i]}) \\ \alpha &\sim \mathcal{N}(\mu, \sigma^2) \\ \mu_i &= \gamma\text{Region}_i \\ \end{align}\]

The way to do this in glmer() is (1 | group1 / group2) where group2 is a smaller group nested in group1 e.g. countries in regions, cities in states, armed groups in countries, etc. Estimate a model in glmer() that matches this new statistical mode.

## fit model w/ nested random intercepts
mod <- glmer(Q90A ~ EA_FAC_A + EA_FAC_B + EA_FAC_C + EA_FAC_D + Q97 +
               (1 | REGION / COUNTRY), data = ab, family = binomial(link = 'logit'))

## present results
htmlreg(mod, stars = .05, custom.coef.map = tab_map)
Statistical models
Model 1
Post Office -0.12*
(0.03)
School 0.09*
(0.04)
Police Station -0.03
(0.03)
Clinic 0.03
(0.03)
Education -0.01*
(0.01)
AIC 53452.01
BIC 53521.62
Log Likelihood -26718.01
Num. obs. 44371
Num. groups: COUNTRY:REGION 33
Num. groups: REGION 5
Var: COUNTRY:REGION (Intercept) 0.51
Var: REGION (Intercept) 0.05
*p < 0.05

While this nested approach allows us to account for more complex depdence structures, it also complicates the interpetation of our results. Now to get the intercept for a given country, we have to combine the fixed effect intercept, the country:region intercept, and the region intercept. Do this for South Africa, but remember that a merMod object is not an lm object, so we have to use the beta slot or fixef() to access the fixed effects.

fixef(mod)['(Intercept)'] + ranef(mod)[[1]][grepl('South Africa', rownames(ranef(mod)[[1]])), ] +
  ranef(mod)[[2]][grepl('Southern', rownames(ranef(mod)[[2]])), ]
## (Intercept) 
##        0.98

Ordered Logits

While there are lots of different GLMs out in the world, glmer() can only really fit binomial and Poisson models. If you want to fit a multinomial probit, you’ll have to turn to another package. For today, we’re just going to look at ordered logistic regression via the clmm() function in the ordinal package. Cumulative link mixed models are another way of referring to random effects ordered logit (or probit) models. Just like lme4, ordinal supports nested random effects. Try fitting a model that explains a respondent’s level of trust in the president as a function of their economic condition, whether they’ve been a victim of property or violent crime, and their education level.

library(ordinal)

## fit model
mod <- clmm(Q52A ~ Q4A + Q11A + Q11B + Q97 + (1 | REGION / COUNTRY), data = ab)
## Error in getY(fullmf): response needs to be a factor

Unfortunately, clmm() isn’t quite as smart as glmer(). To make it happy, we need to make sure that our outcome variable, and any grouping variables, are factors. We don’t have to do this for lmer() because the function automatically checks that these variables are integers and converts them to factors if they are. Once we’ve taken care of that, we can fit the model.

## convert outcome and country to factor
ab <- ab %>% mutate(Q52A = as.factor(Q52A), COUNTRY = as.factor(COUNTRY))

## fit model
mod <- clmm(Q52A ~ Q4A + Q11A + Q11B + Q97 + (1 | REGION / COUNTRY), data = ab)

## present results
htmlreg(mod, stars = .05, custom.coef.map = tab_map, groups = list('Predictors' = 1:4, 'Cutpoints' = 5:7))
Statistical models
Model 1
Predictors
     Economic Condition 0.45*
(0.01)
     Property Crime -0.06*
(0.01)
     Violent Crime -0.10*
(0.02)
     Education -0.09*
(0.00)
Cutpoints
     0|1 -0.72*
(0.11)
     1|2 0.48*
(0.11)
     2|3 1.53*
(0.11)
Log Likelihood -55049.99
AIC 110117.97
BIC 110196.27
Num. obs. 44371
Groups (COUNTRY:REGION) 33
Groups (REGION) 5
Variance: COUNTRY:REGION: (Intercept) 0.35
Variance: REGION: (Intercept) 0.00
*p < 0.05

The cutpoints divide the outcome variable as a function of the linear predictor, so if \(\eta_i = 1.22\) after the inclusion of the appropriate random intercept terms, then \(\hat{y_i} = 2\) because \(0.48 < 1.22 \leq 1.53\).

Individual Exercise

Use the Global Terrorism Database contained in GTD.csv to estimate a model where the number of terrorist attacks in a country-year is explained by GDP per capita and VDEM’s polyarchy score (v2x_polyarchy). WDI and the vdem packages (https://github.com/xmarquez/vdem) are your friends. Include a random intercept term by country, and allow the mean of country random intercepts to vary by year. Produce a publication quality table of your results. Is there more variation between countries or between years?