# 💻 **Tutorial 02**: Introduction to VIMuRe in R

VIMuRe v0.1.3 (latest)

If you use `VIMuRe`

in your research, please cite (De Bacco et al. 2023).

TLDR: By the end of this tutorial, you will be able to:

- Load data into
`VIMuRe`

- Fit a model to your data
- Obtain estimates of the latent network structure

Found an interesting use case for `VIMuRe`

? Let us know! Open a discussion on our GitHub repository.

# ⚙️ Setup

**Import packages**

```
library(haven)
library(tidyr)
library(dplyr)
library(magrittr)
library(igraph)
library(vimure)
```

⚠️ Ensure you have installed the latest version of `VIMuRe`

before running this tutorial. Follow the 📦 Installation instructions if you haven’t already.

# 📥 Step 1: Load the data

Here, we’ll use the `VIMuRe`

package to model social networks data from a village in Karnataka, India, which were gathered as part of a project studying the adoption of microfinance (Banerjee et al. 2013b). The data files we use here are derived from those openly-available on the Harvard Dataverse page (Banerjee et al. 2013a) and raw data files available on Matthew O. Jackson’s website. See 💻 Tutorial 1 for more details on how to prepare data for `VIMuRe`

.

We have selected a particular village to focus on. The dataset contains information on multiple types of social relationships. We are interested in relationships that were “double-sampled” in the original survey, meaning those elicited through two different prompts. For example, respondents were asked both *“If you needed to borrow kerosene or rice, to whom would you go?”* as well as *“Who would come to you if he/she needed to borrow kerosene or rice?”* These two different prompts should give us different perspectives on what could be the same relationship. If everyone in the network is being surveyed, then this should mean that we are eliciting information on a relationship twice (hence, “double sampling”): we’re asking `ego`

if they go to `alter`

for help, and also asking `ego`

if `alter`

comes to them for help.

Four different tie types (or “layers”) were double-sampled in this way (i.e., elicited with two prompts). Here, we will use all four layers to model the latent network structure. The four layers are:

`money`

: lending and borrowing money`advice`

: giving and receiving advice`visit`

: visiting and being visited`kerorice`

: lending and borrowing of household items

We stored all the data in a single data frame, `edgelist`

.

## Click to see the code we used

```
# Here we are using the get_layer() function defined in Tutorial 01
library(haven)
library(tidyr)
library(dplyr)
library(magrittr)
library(igraph)
<- get_layer(1, "money", get_indivinfo(METADATA_FILE), raw_csv_folder=RAW_CSV_FOLDER)
layer_money <- get_layer(1, "advice", get_indivinfo(METADATA_FILE), raw_csv_folder=RAW_CSV_FOLDER)
layer_advice <- get_layer(1, "kerorice", get_indivinfo(METADATA_FILE), raw_csv_folder=RAW_CSV_FOLDER)
layer_kerorice <- get_layer(1, "visit", get_indivinfo(METADATA_FILE), raw_csv_folder=RAW_CSV_FOLDER)
layer_visit
<- bind_rows(layer_money$edgelist,
edgelist
layer_advice$edgelist,
layer_kerorice$edgelist,
layer_visit$edgelist)<- as.character(edgelist$ego)
edgelist$ego <- as.character(edgelist$alter)
edgelist$alter <- as.character(edgelist$reporter)
edgelist$reporter
<- union(layer_money$reporters, layer_advice$reporters) %>% union(layer_kerorice$reporters) %>% union( layer_visit$reporters)
reporters
rm(layer_money, layer_advice, layer_kerorice, layer_visit)
library(vimure)
vm_config()
# The code below should help you confirm that all reporters appear in the edgelist:
# This should return an empty set setdiff(reporters, unique(edgelist$reporter))
```

```
# Load in the network data that we processed in [Tutorial 1](/latest/tutorials/python/tutorial01-data-preparation.qmd) for Village 1
<- "docs/data/2010-0760_Data/Data/Raw_csv"
PROCESSED_CSV_FOLDER
# Read in all of the csv's and combine them into one dataframe
<- do.call("rbind", list(
edgelist money = read.csv("village1_money.csv"),
layer_advice = read.csv("village1_advice.csv"),
layer_kerorice = read.csv("village1_kerorice.csv"),
layer_visit = read.csv("village1_visit.csv")
))
# Clean up the row names
rownames(edgelist) <- NULL
```

After our preprocessing, the input data takes the form of the dataset given below.

`ego`

: generally the requester, who is also referred to as \(i\) in (De Bacco et al. 2023) (required ✔️)`alter`

: generally the giver, also referred to as \(j\) in (De Bacco et al. 2023) (required ✔️)`reporter`

: the person reporting on the tie, as referred to as \(m\) in (De Bacco et al. 2023) (required ✔️)`tie_type`

: the prompt type that was given, e.g., borrow money or lend money (optional)`layer`

: the relationship type, e.g., money (optional)`weight`

: the weight of the edge, e.g., the actual amount of money transferred (optional)

Suppose we have a data frame `edgelist`

of our village’s data that looks like this:

```
set.seed(100) # for reproducibility
%>% dplyr::sample_n(size = 10, replace = FALSE) edgelist
```

ego | alter | reporter | tie_type | layer | weight |
---|---|---|---|---|---|

110501 | 110601 | 110501 | visitcome | visit | 1 |

108602 | 109502 | 109502 | keroricego | kerorice | 1 |

107308 | 106302 | 107308 | lendmoney | money | 1 |

114701 | 117501 | 114701 | keroricecome | kerorice | 1 |

104001 | 102101 | 104001 | visitcome | visit | 1 |

115503 | 115504 | 115504 | helpdecision | advice | 1 |

110601 | 102901 | 110601 | lendmoney | money | 1 |

112902 | 106802 | 106802 | helpdecision | advice | 1 |

110402 | 116203 | 116203 | borrowmoney | money | 1 |

100702 | 100801 | 100702 | lendmoney | money | 1 |

(Ensure that column data types are strings, not numeric.)

Alternatively, you might have your data as a **directed** `igraph`

object. In this case, you must ensure that the `igraph`

object has the following **edge attributes**:

`reporter`

: the person reporting on the tie, as referred to as \(m\) in (De Bacco et al. 2023) (required ✔️)`tie_type`

: the prompt type that was given (optional)`layer`

: the tie type (optional)`weight`

: the weight of the edge (optional)

Note that the `graph_from_data_frame`

function assumes that the first two columns in the data frame comprise the edge list (i.e., the \(ego\) and \(alter\) columns and treat all subsequent columns as edge attributes).

`<- igraph::graph_from_data_frame(edgelist, directed=TRUE) G `

Suppose our village’s data is in the form of an `igraph`

object `G`

:

`summary(G)`

which gives us the following summary:

```
46333aa DNW- 417 2690 --
IGRAPH + attr: name (v/c), reporter (e/c), tie_type (e/c), layer (e/c), weight | (e/n)
```

# 📊 Step 2: Summary statistics

Let’s take a closer look at how many nodes and edges we have in our data. We can also look at how many reporters we have in our data.

## 2.1 Number of nodes

```
<- dplyr::union(edgelist$ego, edgelist$alter) %>% unique()
unique_nodes
cat("Number of nodes:", length(unique_nodes), "\n")
cat("Number of edges:", nrow(edgelist), "\n")
```

```
Number of nodes: 417
Number of edges: 2690
```

```
cat("Number of nodes:", igraph::vcount(G), "\n")
cat("Number of edges:", igraph::ecount(G), "\n")
```

```
Number of nodes: 417
Number of edges: 2690
```

## 2.2 Number of reporters

```
<- edgelist$reporter %>% unique()
reporters cat("Number of reporters: ", length(reporters), "\n")
```

`Number of reporters: 203 `

```
<- igraph::get.edge.attribute(G, "reporter") %>% unique()
reporters cat("Number of reporters: ", length(reporters), "\n")
```

`Number of reporters: 203 `

## 2.3: Average number of ties per reporter

There are a couple of things to note about this dataset: reporters could name a maximum of four alters for each question, and reporters can only report on ties that they are involved in (e.g., if a reporter is involved, they cannot report on it). Because we are modelling double-sampled questions, each reporter can report a maximum of 8 ties.

Let’s create a plot to visualise the distribution of the number of ties per reporter:

```
# Create a plot_df to summarise the number of ties per reporter
<- edgelist %>%
plot_df group_by(layer, tie_type, reporter) %>%
summarize(n_ties = n(), .groups="keep") %>%
ungroup() %>%
group_by(n_ties, tie_type, layer) %>%
summarize(n_reporters = n(), .groups="keep") %>%
ungroup() %>%
arrange(layer, n_ties, tie_type)
```

```
# Create a plot_df to summarise the number of ties per reporter
<- data.frame(get.edgelist(G))
plot_df colnames(plot_df) <- c("ego", "alter")
<- list.edge.attributes(G)
add_cols for (i in 1:length(add_cols)) {
<- get.edge.attribute(G, add_cols[i])
plot_df[, add_cols[i]]
}
<- plot_df %>%
plot_df group_by(layer, tie_type, reporter) %>%
summarize(n_ties = n(), .groups="keep") %>%
ungroup() %>%
group_by(n_ties, tie_type, layer) %>%
summarize(n_reporters = n(), .groups="keep") %>%
ungroup() %>%
arrange(layer, n_ties, tie_type)
```

producing the following table:

n_ties | tie_type | layer | n_reporters |
---|---|---|---|

1 | giveadvice | advice | 115 |

1 | helpdecision | advice | 104 |

2 | giveadvice | advice | 45 |

2 | helpdecision | advice | 73 |

3 | giveadvice | advice | 6 |

3 | helpdecision | advice | 13 |

4 | giveadvice | advice | 1 |

4 | helpdecision | advice | 2 |

n_ties | tie_type | layer | n_reporters |
---|---|---|---|

1 | keroricecome | kerorice | 41 |

1 | keroricego | kerorice | 42 |

2 | keroricecome | kerorice | 99 |

2 | keroricego | kerorice | 100 |

3 | keroricecome | kerorice | 45 |

3 | keroricego | kerorice | 40 |

4 | keroricecome | kerorice | 2 |

4 | keroricego | kerorice | 4 |

n_ties | tie_type | layer | n_reporters |
---|---|---|---|

1 | borrowmoney | money | 57 |

1 | lendmoney | money | 66 |

2 | borrowmoney | money | 92 |

2 | lendmoney | money | 86 |

3 | borrowmoney | money | 31 |

3 | lendmoney | money | 16 |

4 | borrowmoney | money | 1 |

n_ties | tie_type | layer | n_reporters |
---|---|---|---|

1 | visitcome | visit | 45 |

1 | visitgo | visit | 36 |

2 | visitcome | visit | 84 |

2 | visitgo | visit | 67 |

3 | visitcome | visit | 39 |

3 | visitgo | visit | 38 |

4 | visitcome | visit | 11 |

4 | visitgo | visit | 31 |

Alternatively as a bar plot:

## Click to show plot code

```
library(ggplot2)
$tie_type <-
plot_dffactor(plot_df$tie_type,
levels = c("visitgo", "visitcome",
"keroricego", "keroricecome",
"borrowmoney", "lendmoney",
"helpdecision", "giveadvice"))
$layer <-
plot_dffactor(plot_df$layer,
levels = c("advice", "money", "kerorice", "visit"))
= (
g ggplot(plot_df, aes(x = tie_type, y=n_reporters, fill=layer)) +
geom_col() +
coord_flip() +
facet_grid(n_ties ~ ., labeller=label_both) +
theme_bw() +
theme(legend.position = "bottom",
legend.box.spacing = unit(0.5, "pt")) +
labs(x = "Number of ties reported for this prompt",
y = "Number of reporters",
fill = "Layer")
)
ggsave(plot=g, filename="tutorial02_fig01.png", width=6, height=8, dpi=300)
```

Note that the plot above only represents the reports made by the 203 nodes that appear in the `reporter`

column. Given that the total number of nodes in the network is 417, either some nodes did not report on any ties, or they were not interviewed at all.

VIMuRe will consider all nodes in the network, even if they are not present in the `reporter`

column. If you want to restrict your analysis to include only the network of reporters, you must filter the edge list before proceeding.

## 2.4: Union vs. Intersection and the concept of Concordance

Concordance is the proportion of ties in a network that both reporters report. It measures the extent to which the two reporters agree about the tie (see (Ready and Power 2021) for a discussion of concordance in the Karnataka network data). It is calculated as follows:

\[ \text{Concordance} = \frac{\text{\# of ties reported by both reporters}}{\text{\# number of unique ties reported}} \]

```
# Take the intersection: keep only records where
# both reporters report on the same tie in both tie_types
<-
df_intersection %>%
edgelist group_by(ego, alter, .drop=FALSE) %>%
filter(n() == 2) %>%
select(ego, alter) %>%
distinct()
# Take the union: keep all ties reported
# irrespective of tie_type and how many times they were reported
<- edgelist %>% select(ego, alter) %>% distinct()
df_union
# Concordance
cat(paste0("Concordance is: ", nrow(df_intersection) / nrow(df_union)))
```

```
# Take the intersection: keep only records where
# both reporters report on the same tie in both tie_types
<- E(G)[count_multiple(G, E(G)) == 2]
which_edges_intersection
<-
G_intersection ::subgraph.edges(G, which_edges_intersection) %>%
igraph::simplify()
igraph
# Take the union: keep all ties reported
# irrespective of tie_type and how many times they were reported
<- igraph::simplify(G)
G_union
# Concordance
cat(
paste0(
"Concordance is: ",
::ecount(G_intersection) / igraph::ecount(G_union)
igraph
) )
```

producing:

`Concordance is: 0.200226244343891`

# 📦 Step 3: Use `vimure`

package to run the model

Now that you have prepared your data, you can use `vimure`

to run the model:

```
library(vimure)
# Run the model
<- vimure(edgelist) model
```

```
library(vimure)
# Run the model
<- vimure(G) model
```

(*If you would like to see what the model is doing, you can set verbose=True in the fit() method*)

After which, you might be interested to look at the posterior estimates of the model parameters:

```
# Returns a list with the posterior estimates of the model parameters
<- model$get_posterior_estimates()
posterior_estimates
# Check which parameters were estimated
names(posterior_estimates)
```

`"nu" "theta" "lambda" "rho" `

We will look at what these mean in the next section.

## 3.1: Posterior Estimates

The four values you get from the `get_posterior_estimates()`

method are the geometric expectations of the distributions of the following parameters:

Parameter | Symbol | Description |
---|---|---|

`nu` |
\(\nu\) | The inferred mutuality for the network (latent equivalent to \(\eta\)) |

`theta` |
\(\theta_{lm}\) | The “reliability” of a reporter \(m\) on layer \(l\) |

`lambda` |
\(\lambda_{lk}\) | The latent contribution that \(Y_l=k\) has on \(\theta_{lm}\), where \(k \in \{1, \ldots, K\}\) represents the weight of the tie. |

`rho` |
\(\rho_{lijk}\) | The probability that a directed tie of weight \(k-1\) exists between nodes \(i\) and \(j\) on a particular layer \(l\) |

Remember that the expected value of our model is:

\[ \mathbb{E} \left[X_{lijm}^{}|Y_{lij}^{} = k\right] = \theta_{lm}\lambda_{lk}^{}+\eta X_{ljim}^{} \]

where \(X_{lijm}^{}\) is the observed value of the tie between nodes \(i\) and \(j\) reported by reporter \(m\) on layer \(l\) and \(Y_{lij}^{}\) is the “ground truth” of the strength of the directed tie between nodes \(i\) and \(j\) on layer \(l\).

In the subsections below, we will provide an explanation of \(\nu\) and \(\theta_{lm}\) parameters and how to reason about the mutuality and “reliability” of reporters. \(\rho\) and \(\lambda\) are more technical and we will go into detail about them in a future tutorial.

**A quick note about \(\rho\)**

If you want to extract a quick point estimate of the inferred network, you can simply take the expected value of \(\rho\) for \(k=1\):

```
# Expected value of rho for k=1
= posterior_estimates[['rho']][1,,,2] rho
```

The code above will return a \(N \times N\) weighted adjacency matrix, which can be thought of as the probability of a tie between individuals \(i\) and \(j\) on layer \(l = 0\) (Tutorial 03 will go into more detail about how to interpret the \(\rho\) parameter.)

Now let’s look at parameters \(\nu\) and \(\theta_{lm}\):

### 3.1.1: Mutuality (\(\nu\) or \(\eta\))

In our case study:

`'nu']] posterior_estimates[[`

`0.6289576`

This indicates a large mutuality in the network. That is, if \(i\) reports that \(j\) *lends* money to them, then \(i\) is likely to report that \(j\) also *borrows* money from them.

This is a network-wide parameter and encompasses all layers. If you have reasons to believe mutuality varies across layers, you have to re-run the model for each layer separately.

### 3.1.2: Under-reporting vs over-reporting: a combination of (\(\theta_{lm} \times \lambda_{lk}\))

First, let’s look at the shape of the \(\theta\) array:

`'theta']]) dim(posterior_estimates[[`

` 4 417`

We have four layers, so the first dimension is 4. The second dimension is the number of reporters in the network. In our case, this shows that VIMuRe infers that there are 417 reporters in the network.

This parameter, combined with \(\lambda_{lk}\) represents the “reliability” of a reporter \(m\) on layer \(l\):

`'lambda']]) dim(posterior_estimates[[`

` 4 2`

The first dimension is the number of layers \(l\), and the second dimension is the strength of the tie, which in this case can assume the values \(k \in \{0, 1\}\). We only have two possible values for \(k\) because that is the default setting for the `VimureModel`

class. If you want to model more than two possible values for \(k\), you can set the `K`

parameter when running the fit function: `model.fit(..., K=3)`

.

These two parameters combined indicate the probability that a reporter will *tend* to report ties, in general, for a given layer (recall the expected value of the VIMuRe model shown in Section 3.1: Posterior Estimates). Loosely speaking, we can think of this value as the “reliability” of a reporter on a given layer.

You might be wondering why there are so many reporters, 417. After all, in Section 2.2: Number of reporters, we calculated this number to be around 200! This is due to a technicality in the way the VIMuRe package handles the data. Our current implementation represents the total number of reporters, \(M\) as \(M = N\) (i.e. the number of nodes is equal to the number of reporters).

This is akin to saying each node has *the potential* to be a reporter. But to control for which nodes are actually reporters, we use a binary mask, \(R_{lijm}\), which is 1 if the node \(m\) is a reporter on layer \(l\) for the tie between nodes \(i\) and \(j\) and 0 otherwise. You can control how this mask is built by passing the `R`

parameter to the `fit()`

method. Read more about it in section 3.3: A note on the reporter mask (`R`

).

Changing this is a priority for the next version of the package. But, as it is not a trivial change, we have decided to leave it as it is for now.

**How are reporters’ “reliability” distributed across layers?**

Let’s check which reporters have the highest “reliability” values, and plot the values for each layer:

## Click to see the code

```
# Get the node mapping: which node corresponds to which index in the array
<- model$nodeNames
df_nodes
# Which nodes are reporters?
# Note: this assumes you have a `reporters` list from Tutorial 01
<- df_nodes %>% filter(name %in% reporters) %>% pull(id)
id_reporters
<- data.frame(
df_lambda_k1 layer = model$layerNames,
lambda_k1 = posterior_estimates$lambda[,2]
)
<- purrr::map_dfr(id_reporters, ~ {
df_theta_reporters <- .x
reporter_id ::map_dfr(seq_along(model$layerNames), ~ {
purrr<- .x
l <- model$layerNames[l]
layer_name <- posterior_estimates$theta[l, reporter_id] * df_lambda_k1$lambda_k1[l]
theta <- df_nodes$name[reporter_id]
reporter_name tibble(
layer = layer_name,
reporter_id = reporter_id,
reporter_name = reporter_name,
theta_scaled = theta
)
})
})
# Sort by reliability (scaled by lambda_k=1)
%>%
df_theta_reporters pivot_wider(names_from = layer, values_from = theta_scaled, values_fn = list) %>%
select(-1) %>%
unnest(cols = everything(), names_repair = "unique") %>%
arrange(desc(advice), desc(kerorice), desc(money), desc(visit)) %>%
head()
```

## Click to see the code

```
library(ggplot2)
= c('#E69F25', '#5CB4E4', '#069F72', '#F1E545', '#0773B2', '#CC79A8')
COLORS
= (ggplot(df_theta_reporters, aes(x=theta_scaled, fill=layer)) +
g geom_histogram(binwidth=0.1, color="#212121") +
theme_bw() +
facet_grid(layer ~ ., scales="free_y") +
scale_fill_manual(values=COLORS, guide="none") +
theme(axis.text.x=element_text(size=10),
axis.text.y=element_text(size=10),
panel.grid.minor=element_blank(),
panel.grid.major=element_blank(),
strip.background = element_blank(),
strip.text.x = element_text(size=10),
axis.title=element_text(size=16)) +
scale_x_continuous(name = expression(paste(theta * lambda[k==1])),
breaks = seq(0, 1.1, by = 0.1),
limits = c(0, 1.1)) +
scale_y_continuous(name="Count", breaks=seq(0, 60, by=10), limits=c(0, 50)) +
ggtitle(expression(paste("Reliability distribution across layers (", theta[{l}], " \u00D7 ", lambda[k==1], ")"))))
ggsave(plot=g, filename="theta_reporters.png", width=7, height=9, dpi=300)
```

**How should we interpret the values?**

The interpretation depends on both the mutuality of the network and the survey design. If the mutuality is high (as indicated by \(\nu\) being closer to 1), reporters will naturally tend to reciprocate the ties they report, by default, thus reporters with high \(\theta\) values (relative to the \(\theta\) distribution) are likely “over-reporting” ties.

Let’s investigate this a bit further by checking the ties reported by a top reporter (113302) in the advice layer:

reporter_name | advice | kerorice | money | visit |
---|---|---|---|---|

113302 | 0.8970675 | 0.7397009 | 0.4591973 | 0.2099856 |

113901 | 0.8780586 | 0.8209907 | 0.8680566 | 0.6693385 |

109902 | 0.8504384 | 0.7928275 | 0.0000155 | 0.6415688 |

109505 | 0.8504182 | 0.0000130 | 0.8384146 | 0.8622429 |

105902 | 0.8504050 | 0.0000128 | 0.4580637 | 0.3737390 |

105402 | 0.8503861 | 0.5638047 | 0.4760844 | 0.7225281 |

Most reporters have reliability in the \([0.4, 0.6]\) interval, but 113302 has a “reliability” of \(\approx 0.9\) in the advice layer. Let’s check the ties reported by this reporter:

`%>% filter(reporter == "113302", layer == "advice") edgelist `

ego | alter | reporter | tie_type | layer | weight |
---|---|---|---|---|---|

113302 | 110502 |
113302 | giveadvice | advice | 1 |

113302 | 113102 |
113302 | giveadvice | advice | 1 |

105701 |
113302 | 113302 | helpdecision | advice | 1 |

110501 |
113302 | 113302 | helpdecision | advice | 1 |

113601 |
113302 | 113302 | helpdecision | advice | 1 |

113301 |
113302 | 113302 | helpdecision | advice | 1 |

Node `113302`

does not report any reciprocal ties, but they do report ties with six other nodes.

**Are all nodes also reporters?**

Two of these six nodes are not reporters:

```
<- c("110502", "113102", "105701", "110501", "113601", "113301")
nodes_mentioned
<- nodes_mentioned %>% setdiff(reporters)
not_reporters not_reporters
```

` "113102" "105701"`

We will never be able to confirm ties between `113302`

and these nodes and this is already one of the reasons why `113302`

’s “reliability” is high. (Note that these two non-reporter nodes will have reliability values of \(\approx 0\)).

On top of that, when we look at the reports made by the remaining four reporters, only node `113301`

confirms the helpdecision tie reported by `113302`

:

```
= c("110502", "110501", "113601", "113301")
potential_friends
%>% filter(reporter %in% potential_friends & (ego == "113302" | alter == "113302")) edgelist
```

ego | alter | reporter | tie_type | layer | weight |
---|---|---|---|---|---|

113301 | 113302 | 113301 | giveadvice | advice | 1 |

That is why `113302`

’s \(\theta\lambda_{k=1}\) is high: they report a lot of ties, but only one of them is confirmed by other reporters.

Nodes with values in the average range of the \(\theta_{lm}\lambda_{lk}\) distribution is likely to be more “reliable” reporters.

## 3.2: Implicit assumptions

You might have noticed the following warning messages when you ran `vimure()`

:

`UserWarning: The set of nodes was not informed, using ego and alter columns to infer nodes.`

`UserWarning: The set of reporters was not informed, assuming set(reporters) = set(nodes) and N = M.`

`UserWarning: Reporters Mask was not informed (parameter R). Parser will build it from reporter column, assuming a reporter can only report their own ties.`

The messages are not errors; they are simply indications that the VIMuRe package has made some implicit assumptions when loading the data. If you would like to set the nodes, and the reporter mask *explicitly*, you can pass these values as arguments to the fit() method. For example:

```
vimure(edgelist,
nodes=all_nodes,
R=reporter_mask)
```

## 3.3: A note on the reporter mask (`R`

)

While nodes and reporters are simple lists of strings, the reporter mask `R`

is a multidimensional sparse logical array (a tensor) with dimensions \(L \times N \times N \times N\). Here, `R[l, i, j, m]`

indicates that reporter \(m\) can report on the tie between \(i\) and \(j\) if the tie belongs to layer \(l\).

So, for example, if the survey design was such that reporters could only report on ties involving themselves, this implies:

\[\begin{cases} R_{lijm} = 1 & \text{if } i = m \text{ or } j = m \\ R_{lijm} = 0 & \text{otherwise}. \end{cases}\]Depending on your survey design, you might want to construct `R`

manually. For example, if all reporters were asked about all ties – regardless of whether they were involved in the ties or not – then `R`

would be a multi-dimensional array of ones. We are planning a future tutorial on how to handle different survey designs.

It’s worth noting that currently, the last dimension has size \(N\) instead of \(M\). This is because the current implementation of the code assumes that all reporters are part of the network. The mathematical model, as described in (De Bacco et al. 2023), is more flexible and supports an arbitrary set of reporters, whether they are a subset of nodes or not. However, we have not yet implemented this functionality in the code. Our survey designs have so far assumed that all reporters are part of the network, so this has not been a problem. If this is a feature you would like to see sooner, let us know by adding a discussion on GitHub.

# 4. Next steps

Future tutorials will cover:

- How to extract point estimates and visualise the resulting network using \(\rho\)
- How to handle different survey designs
- How to sample from the posterior distribution
- (Advanced) Tweaking the optimization parameters
- (Advanced) Understanding the convergence diagnostics

## References

*Science*341 (6144): 1236498. https://doi.org/10.1126/science.1236498.

*Journal of the Royal Statistical Society Series A: Statistics in Society*, February, qnac004. https://doi.org/10.1093/jrsssa/qnac004.

*Network Science*9 (4): 387–402. https://doi.org/10.1017/nws.2021.18.