library(network)
library(sna)
library(latentnet)
One of the simplest distinctions among types of networks is between directed and undirected networks. Undirected networks consist of ties between actors that have no directionality, e.g. trade networks and military alliances. Directed networks still measure ties between actors, but also capture which actor created the tie, e.g. which state in an alliance proposed the agreement or which combatant in a conflict initiated the violence. The data in this lab are from Cranmer et al. (2017), and describe the network of organizations involved in Swiss climate change mitigation. We have a number of exogenous covariates that we wish to use to explain the formation of ties.
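The directed/undirected distinction shows up directly in the adjacency matrix: an undirected network has a symmetric matrix, while a directed one generally does not. A minimal sketch in base R, using made-up three-actor matrices (the actor names and ties are assumptions, not the lab data):

```r
# Hypothetical toy networks; actor names and ties are illustrative only
actors <- c("A", "B", "C")

# Undirected: a tie between A and B appears in both [A, B] and [B, A]
undirected <- matrix(c(0, 1, 0,
                       1, 0, 1,
                       0, 1, 0),
                     nrow = 3, byrow = TRUE,
                     dimnames = list(actors, actors))

# Directed: rows are senders, columns are targets,
# so A -> B does not imply B -> A
directed <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     0, 0, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(actors, actors))

isSymmetric(undirected)  # TRUE
isSymmetric(directed)    # FALSE
```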
Read in these data from the .csv files.
# policy forum affiliation data
# 1 = affiliation; 0 = no affiliation
# committee names are in the column labels; organizations in the row labels
forum <- as.matrix(read.table(file = 'climate0205-committee.csv',
header = T, row.names = 1, sep = ';'))
# influence reputation data
# square matrix with influence attribution
# 1 = influential; 0 = not influential
# cells contain the ratings of row organizations about column organizations
infrep <- as.matrix(read.table(file = 'climate0205-rep.csv',
header = T, row.names = 1, sep = ';'))
# collaboration; directed network
collab <- as.matrix(read.table(file = 'climate0205-collab.csv',
header = T, row.names = 1, sep = ';'))
# type of organization; vector with five character types
types <- as.character(read.table(file='climate0205-type.csv',
header = T, row.names = 1, sep = ';')[, 2])
# alliance-opposition perception; -1 = row organization perceives column organization as
# an opponent; 1 = row organization perceives column organization as an ally; 0 = neutral
allopp <- as.matrix(read.table(file = 'climate0205-allop.csv',
header = T, row.names = 1, sep = ';'))
# preference dissimilarity; Manhattan distance over four important policy issues
prefdist <- as.matrix(read.table(file = 'climate0205-prefdist.csv',
header = T, row.names = 1, sep = ';'))
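To see what the read.table() arguments used above do, here is a small sketch that parses a made-up semicolon-delimited table supplied inline through the `text` argument instead of a file. The contents of `toy_csv` are hypothetical; the real files may be laid out differently.

```r
# Hypothetical file contents, supplied inline via `text =` instead of `file =`
toy_csv <- ";AA;BB;CC
AA;0;1;0
BB;0;0;1
CC;1;0;0"

# header = TRUE takes column labels from the first line;
# row.names = 1 takes row labels from the first column;
# sep = ";" splits fields on semicolons
toy <- as.matrix(read.table(text = toy_csv, header = TRUE,
                            row.names = 1, sep = ";"))

dim(toy)         # 3 rows, 3 columns
rownames(toy)    # organization labels taken from the first column
toy["AA", "BB"]  # 1: AA sends a tie to BB
```

Checking dim() and the row and column labels right after reading is a cheap way to confirm that the separator and label columns were interpreted as intended.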
Next we need to prepare the covariates for use in a network model. Multiply the forum matrix by its transpose to compute the one-mode projection of this membership matrix. Then create a matrix where each entry denotes whether its organization pair is a private-NGO pairing (hint: will this matrix be symmetric?). Finally, create matrices that capture whether the target of a tie is a government organization and whether the sender of a tie is an NGO.
# compute one-mode projection over different policy forums
forum <- forum %*% t(forum)
# 0 out the diagonal because it has no meaning
diag(forum) <- 0
# create matrix capturing all private-NGO pairs
priv_ngo <- matrix(0, nrow = nrow(collab), ncol = ncol(collab))
for (i in 1:nrow(priv_ngo)) {
  for (j in 1:ncol(priv_ngo)) {
    if ((types[i] == 'private' && types[j] == 'ngo') ||
        (types[i] == 'ngo' && types[j] == 'private')) {
      priv_ngo[i, j] <- 1
      priv_ngo[j, i] <- 1
    }
  }
}
# create matrix capturing whether alter is a government organization
gov_alt <- matrix(rep(as.numeric(types == 'gov'), length(types)), byrow = T,
nrow = length(types))
# create matrix capturing whether ego is an NGO
ngo_ego <- matrix(rep(as.numeric(types == 'ngo'), length(types)), byrow = F,
nrow = length(types))
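The same constructions can be tried on a toy example small enough to check by hand. This is a sketch under assumed data: `memb`, `toy_types`, and the other object names are hypothetical, and outer() is swapped in for the double loop used above.

```r
# Hypothetical two-mode membership matrix: 3 organizations x 2 forums
memb <- matrix(c(1, 0,
                 1, 1,
                 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("o1", "o2", "o3"), c("f1", "f2")))

# One-mode projection: entry [i, j] counts the forums shared by i and j
proj <- memb %*% t(memb)
diag(proj) <- 0

# Hypothetical organization types for the same three organizations
toy_types <- c("gov", "ngo", "private")

# Alter (column) covariate: every row repeats the type indicator
alt_gov <- matrix(as.numeric(toy_types == "gov"), nrow = 3, ncol = 3,
                  byrow = TRUE)
# Ego (row) covariate: every column repeats the type indicator
ego_ngo <- matrix(as.numeric(toy_types == "ngo"), nrow = 3, ncol = 3)

# Pair covariate: 1 if one organization is private and the other an NGO
is_priv <- toy_types == "private"
is_ngo  <- toy_types == "ngo"
pair_pn <- (outer(is_priv, is_ngo, "&") | outer(is_ngo, is_priv, "&")) * 1

isSymmetric(proj)     # TRUE
isSymmetric(pair_pn)  # TRUE
isSymmetric(alt_gov)  # FALSE
```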
Which of these matrices will be symmetric? Which will be asymmetric? Why are some symmetric and some asymmetric? Finally, we need to convert the collaboration matrix to a network object so that we can fit a latent network model.
# create network object
nw_collab <- network(collab)
# inspect network object
nw_collab
## Network attributes:
## vertices = 34
## directed = TRUE
## hyper = FALSE
## loops = FALSE
## multiple = FALSE
## bipartite = FALSE
## total edges= 207
## missing edges= 0
## non-missing edges= 207
##
## Vertex attribute names:
## vertex.names
##
## No edge attributes
We can see that we have no edge attributes and only one vertex attribute: the names of the vertices themselves. Since we want to use exogenous covariates to predict ties in a latent space network model, we need to get some covariates into our network object. We’ll be using several measures of network position as well as exogenous covariates to explain this network.
Betweenness centrality is a measure of how central a vertex is for paths connecting other vertices. It is calculated by:
\[ g(v) = \sum_{s\neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}} \]
where \(\sigma_{st}\) is the total number of shortest paths from vertex \(s\) to vertex \(t\), and \(\sigma_{st}(v)\) is the number of those shortest paths that pass through \(v\).
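As a small worked example (a hypothetical three-vertex network, not the lab data), take the directed path \(a \to b \to c\). The only ordered pair of other vertices connected through \(b\) is \((a, c)\), which has a single shortest path, and that path passes through \(b\):
\[ g(b) = \frac{\sigma_{ac}(b)}{\sigma_{ac}} = \frac{1}{1} = 1, \qquad g(a) = g(c) = 0 \]
The endpoints score zero because no shortest path between two other vertices passes through either of them.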
In a directed network, indegree centrality is the number of ties where a vertex is the target. With senders in the rows and targets in the columns of the adjacency matrix, as in the matrices above, it is calculated in matrix form as:
\[ k^{in} = \mathbf{A}^\top\mathbf{e} \]
where \(\mathbf{A}\) is the adjacency matrix of the network and \(\mathbf{e}\) is an \(n\) length vector of 1s, i.e. the column sums of the adjacency matrix. Outdegree centrality is the number of ties where a vertex is the sender, and is given by the row sums of the adjacency matrix, \(\mathbf{Ae}\). Use the set.vertex.attribute() function to add information on organization type, betweenness centrality, and indegree centrality in the subjective influence network to each vertex.
# set node attribute for organization type
set.vertex.attribute(nw_collab, 'orgtype', types)
# set node attribute for betweenness centrality
set.vertex.attribute(nw_collab, 'betweenness', betweenness(nw_collab))
# set node attribute for degree centrality in influence network
set.vertex.attribute(nw_collab, 'influence', degree(infrep, gmode = 'digraph', cmode = 'indegree'))
# inspect vertex attributes of a random organization
nw_collab$val[[17]]
## $na
## [1] FALSE
##
## $vertex.names
## [1] "AQ"
##
## $orgtype
## [1] "party"
##
## $betweenness
## [1] 24
##
## $influence
## [1] 17
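To make the degree definitions concrete, here is a base R sketch on a made-up directed network. It assumes the convention used by the matrices in this lab, where rows send (or rate) and columns receive, so the ties a vertex receives sit in its column. The matrix `A` is hypothetical.

```r
# Hypothetical directed network: rows are senders, columns are targets
A <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
e <- rep(1, nrow(A))

# Indegree: ties received, i.e. column sums
indeg <- as.vector(t(A) %*% e)
# Outdegree: ties sent, i.e. row sums
outdeg <- as.vector(A %*% e)

indeg   # 0 1 2
outdeg  # 2 1 0
```

Comparing these against colSums(A) and rowSums(A) is a quick way to confirm which direction a given matrix encodes.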
Now we’re ready to fit a latent space network model. The ergmm() function allows us to fit these models, but the formula is significantly different from those we’re used to seeing. It still starts with nw_collab ~ since the network is our response variable, but the right side is where things get weird. We can’t just include our covariates; we have to specify how they enter the model. The nodematch() term captures homophily on a given vertex attribute, i.e. whether the two vertices in a pair share the same value; in our case, we want to include homophily for organization type, so use nodematch() on our organization type vertex attribute. The edgecov() term includes the edge values of a given matrix in our model, so use this term with the government target, NGO sender, private-NGO pairing, forum membership, subjective influence, preference distance, and perceived alliance-opposition matrices (one call per matrix). The nodeicov() term includes the value of a given vertex attribute for the target of each potential tie, so include our indegree centrality influence measure, along with the absdiff() term, which will include the absolute difference in influence between each pair of vertices. Finally, we need to define the dimensionality of our latent space. Use the euclidean() term and set d = 2 and G = 0 for a latent space model with two dimensions and no clusters.
# set seed to use in GOF statistics later
seed <- 01110011
mod_0c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 0),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
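One way to think about the vertex-attribute terms in the formula is as dyadic covariate matrices, just like the ones passed to edgecov(). The base R sketch below builds such matrices for made-up attributes; it illustrates what the terms represent and is not a description of how latentnet computes them internally. All object names and values are hypothetical.

```r
# Hypothetical vertex attributes for four organizations
orgtype   <- c("gov", "ngo", "ngo", "private")
influence <- c(10, 4, 7, 1)
n <- length(orgtype)

# nodematch('orgtype'): 1 when sender and target share a type
# (diagonal entries are irrelevant because the network has no loops)
match_type <- outer(orgtype, orgtype, "==") * 1

# absdiff('influence'): absolute difference between sender and target
abs_diff <- abs(outer(influence, influence, "-"))

# nodeicov('influence'): the value of the target (column vertex)
target_infl <- matrix(influence, nrow = n, ncol = n, byrow = TRUE)

match_type[2, 3]   # 1: both are NGOs
abs_diff[1, 4]     # 9
target_infl[3, 2]  # 4: influence of the target, vertex 2
```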
Let’s take a quick look at the output of our model.
summary(mod_0c)
## NOTE: It is not certain whether it is appropriate to use latentnet's BIC to select latent space dimension, whether or not to include actor-specific random effects, and to compare clustered models with the unclustered model.
##
## ==========================
## Summary of model fit
## ==========================
##
## Formula: nw_collab ~ nodematch("orgtype") + edgecov(gov_alt) + edgecov(ngo_ego) +
## edgecov(priv_ngo) + edgecov(forum) + edgecov(infrep) + edgecov(prefdist) +
## edgecov(allopp) + nodeicov("influence") + absdiff("influence") +
## euclidean(d = 2, G = 0)
## Attribute: edges
## Model: Bernoulli
## MCMC sample of size 10000, draws are 100 iterations apart, after burnin of 50000 iterations.
## Covariate coefficients posterior means:
## Estimate 2.5% 97.5% 2*min(Pr(>0),Pr(<0))
## (Intercept) -1.9936 -2.8848 -1.11 <2e-16 ***
## nodematch.orgtype 1.4304 0.8222 2.02 0.0002 ***
## edgecov.gov_alt 0.6999 0.0688 1.35 0.0278 *
## edgecov.ngo_ego 1.9418 1.0577 2.89 <2e-16 ***
## edgecov.priv_ngo -1.4141 -2.5814 -0.34 0.0084 **
## edgecov.forum 1.1867 0.4598 1.90 0.0012 **
## edgecov.infrep 1.6127 1.0942 2.15 <2e-16 ***
## edgecov.prefdist -0.9956 -1.9536 -0.02 0.0458 *
## edgecov.allopp 1.4552 0.9756 1.94 <2e-16 ***
## nodeicov.influence 0.1236 0.0840 0.17 <2e-16 ***
## absdiff.influence -0.0609 -0.1097 -0.01 0.0200 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Overall BIC: 932
## Likelihood BIC: 661
## Latent space/clustering BIC: 271
##
## Covariate coefficients MKL:
## Estimate
## (Intercept) 1.519
## nodematch.orgtype 1.277
## edgecov.gov_alt -0.246
## edgecov.ngo_ego -0.570
## edgecov.priv_ngo -1.415
## edgecov.forum 1.477
## edgecov.infrep -0.231
## edgecov.prefdist -1.809
## edgecov.allopp -0.199
## nodeicov.influence -0.061
## absdiff.influence 0.054
Coefficients indicate the effect of each statistic on the log-odds, and thus the probability, of a tie between organizations \(i\) and \(j\). For example, the positive edgecov.gov_alt coefficient means that a tie is more likely to be sent to an organization when it is a governmental one than when it is another type. Similarly, the negative edgecov.prefdist coefficient means that a tie is less likely between two organizations with greater preference distance. Least surprising of all, the presence of a perceived alliance between two organizations makes ties between them more likely.
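In the Bernoulli model with a Euclidean latent space, the log-odds of a tie are the linear predictor minus the distance between the two latent positions, \(\text{logit}\, P(Y_{ij} = 1) = \beta^\top x_{ij} - \lVert z_i - z_j \rVert\). The sketch below turns two of the posterior means from the summary output into tie probabilities. The latent distance of 1 and the choice to set every other covariate to 0 are assumptions made purely for illustration.

```r
# Posterior means copied from the summary output above
b_intercept <- -1.9936
b_allopp    <-  1.4552

# Assumptions for illustration: a latent distance of 1 between the two
# organizations and every other covariate equal to 0
latent_dist <- 1

eta_neutral <- b_intercept - latent_dist
eta_ally    <- b_intercept + b_allopp - latent_dist

# plogis() is the inverse logit, mapping log-odds to probabilities
p_neutral <- plogis(eta_neutral)
p_ally    <- plogis(eta_ally)

# Multiplicative change in the odds of a tie for a perceived alliance
odds_ratio <- exp(b_allopp)

c(neutral = p_neutral, ally = p_ally, odds_ratio = odds_ratio)
```

The odds ratio does not depend on the assumed distance or on the other covariates, whereas the two probabilities do.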
Now let’s fit two more models with the same covariates, but one with one cluster, and one with two clusters.
# one cluster
mod_1c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 1),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
# two clusters
mod_2c <- ergmm(nw_collab ~
nodematch('orgtype') +
edgecov(gov_alt) +
edgecov(ngo_ego) +
edgecov(priv_ngo) +
edgecov(forum) +
edgecov(infrep) +
edgecov(prefdist) +
edgecov(allopp) +
nodeicov('influence') +
absdiff('influence') +
euclidean(d = 2, G = 2),
seed = seed,
control = control.ergmm(sample.size = 10000, burnin = 50000, interval = 100))
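Because the three model calls differ only in G, the formulas could also be generated programmatically. The sketch below uses base R's reformulate(); `make_formula()` and `base_terms` are hypothetical helpers introduced here, not part of latentnet, and the commented-out ergmm() call is an untested suggestion that assumes the lab objects are loaded.

```r
# Shared right-hand-side terms, written as strings
base_terms <- c("nodematch('orgtype')", "edgecov(gov_alt)",
                "edgecov(ngo_ego)", "edgecov(priv_ngo)", "edgecov(forum)",
                "edgecov(infrep)", "edgecov(prefdist)", "edgecov(allopp)",
                "nodeicov('influence')", "absdiff('influence')")

# make_formula() is a hypothetical helper, not part of latentnet
make_formula <- function(G) {
  latent_term <- sprintf("euclidean(d = 2, G = %d)", G)
  reformulate(c(base_terms, latent_term), response = "nw_collab")
}

formulas <- lapply(0:2, make_formula)
names(formulas) <- c("mod_0c", "mod_1c", "mod_2c")

# With the lab objects loaded, each formula could then be passed to ergmm():
# mods <- lapply(formulas, ergmm, seed = seed,
#                control = control.ergmm(sample.size = 10000,
#                                        burnin = 50000, interval = 100))
```

Writing the calls out in full, as above, is easier to read in a lab; generating them mainly helps when comparing many values of G.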
Let’s compare the results for all three models:
texreg::htmlreg(list(mod_0c, mod_1c, mod_2c), html.tag = F, head.tag = F, body.tag = F)
| | Model 1 | Model 2 | Model 3 |
|---|---|---|---|
| (Intercept) | -1.99* | -2.03* | -2.06* |
| | [-2.88; -1.11] | [-2.93; -1.14] | [-2.94; -1.16] |
| nodematch.orgtype | 1.43* | 1.42* | 1.41* |
| | [0.82; 2.02] | [0.82; 2.02] | [0.81; 2.01] |
| edgecov.gov_alt | 0.70* | 0.70* | 0.69* |
| | [0.07; 1.35] | [0.05; 1.33] | [0.07; 1.34] |
| edgecov.ngo_ego | 1.94* | 1.96* | 2.00* |
| | [1.06; 2.89] | [1.10; 2.89] | [1.08; 3.07] |
| edgecov.priv_ngo | -1.41* | -1.34* | -1.36* |
| | [-2.58; -0.34] | [-2.57; -0.22] | [-2.50; -0.32] |
| edgecov.forum | 1.19* | 1.20* | 1.18* |
| | [0.46; 1.90] | [0.47; 1.93] | [0.44; 1.89] |
| edgecov.infrep | 1.61* | 1.63* | 1.61* |
| | [1.09; 2.15] | [1.12; 2.13] | [1.10; 2.16] |
| edgecov.prefdist | -1.00* | -0.98* | -0.99* |
| | [-1.95; -0.02] | [-1.96; -0.02] | [-1.93; -0.02] |
| edgecov.allopp | 1.46* | 1.48* | 1.44* |
| | [0.98; 1.94] | [1.02; 1.97] | [0.99; 1.92] |
| nodeicov.influence | 0.12* | 0.12* | 0.12* |
| | [0.08; 0.17] | [0.08; 0.17] | [0.08; 0.16] |
| absdiff.influence | -0.06* | -0.06* | -0.06* |
| | [-0.11; -0.01] | [-0.11; -0.01] | [-0.11; -0.01] |
| BIC (Overall) | 931.74 | 929.84 | 883.55 |
| BIC (Likelihood) | 660.52 | 649.31 | 654.56 |
| BIC (Latent Positions) | 271.22 | 280.53 | 228.99 |

\* 0 outside the confidence interval
Unfortunately, as the note in the summary output warns, it is not clear that BIC is appropriate for comparing latent space network models with different numbers of clusters, so let’s calculate some other goodness of fit statistics for each model. The gof() function allows us to supply the statistics we want (and there are a lot of them for network models). Calculate the dyad-wise and edge-wise shared partners for each model, as well as the indegree and outdegree distributions. Make sure you set control = control.gof.ergmm(seed = seed) so that these calculations use the same seed as the models. After obtaining these GOF statistics, use plot() to plot them against the observed values of the statistics for the collaboration network.
# goodness of fit assessments
gof_0c <- gof(mod_0c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
gof_1c <- gof(mod_1c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
gof_2c <- gof(mod_2c, GOF = ~ dspartners + espartners +
idegree + odegree, control = control.gof.ergmm(seed = seed))
par(mfrow = c(2,2))
plot(gof_0c, main = 'no cluster model goodness of fit')
par(mfrow = c(2,2))
plot(gof_1c, main = 'one cluster model goodness of fit')
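The shared-partner statistics requested above can be illustrated on a toy network. The sketch assumes that, in a directed network, a shared partner of the ordered pair (i, j) is a vertex k with i -> k and k -> j; the exact definition used by gof() should be checked against the package documentation. The network `adj` is hypothetical.

```r
# Hypothetical directed network with ties 1->2, 1->3, 1->4, 2->3, 4->3
adj <- matrix(0, nrow = 4, ncol = 4)
adj[cbind(c(1, 1, 1, 2, 4), c(2, 3, 4, 3, 3))] <- 1

# Two-path counts: sp[i, j] = number of k with i -> k and k -> j
sp <- adj %*% adj
off_diag <- row(adj) != col(adj)

# Dyad-wise: distribution of shared partners over all ordered pairs i != j
dsp_counts <- table(sp[off_diag])
# Edge-wise: the same distribution restricted to pairs with a tie i -> j
esp_counts <- table(sp[adj == 1])

dsp_counts  # 11 pairs with 0 shared partners, 1 pair with 2
esp_counts  # 4 ties with 0 shared partners, 1 tie with 2
```

gof() compares distributions like these, computed on the observed network, against the same distributions computed on networks simulated from the fitted model.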