Model-assisted design of experiments in the presence of network correlated outcomes

We consider the problem of how to assign treatment in a randomized experiment, in which the correlation among the outcomes is informed by a network available pre-intervention. Working within the potential outcome causal framework, we develop a class of models that posit such a correlation structure among the outcomes. Then we leverage these models to develop restricted randomization strategies for allocating treatment optimally, by minimizing the mean square error of the estimated average treatment effect. Analytical decompositions of the mean square error, due both to the model and to the randomization distribution, provide insights into aspects of the optimal designs. In particular, the analysis suggests new notions of balance based on specific network quantities, in addition to classical covariate balance. The resulting balanced, optimal restricted randomization strategies are still design unbiased, in situations where the model used to derive them does not hold. We illustrate how the proposed treatment allocation strategies improve on allocations that ignore the network structure, with extensive simulations.


Introduction
The past decade has witnessed a surge of interest in causal analyses in the context of social networks, social media platforms and online advertising (e.g., Christakis and Fowler 2007;Aral et al. 2009;Bakshy et al. 2011Bakshy et al. , 2012Bond et al. 2012;Kim et al. 2015;Gui et al. 2015;Phan and Airoldi 2015;Cavusoglu et al. 2016). From a statistical perspective, the challenging aspect of these applications is how to account for the presence of connections, or network data, observed pre-intervention, possibly with uncertainty, and often missing (Airoldi and Rubin 2016).
While there is a well-developed literature on several aspects of the statistical analysis of network data (e.g., Wasserman 1994;Kolaczyk 2009;Goldenberg et al. 2010;Bickel and Chen 2009), the literature about methods for experimentation and causal analyses that leverages observed connections is at a budding stage (e.g., Here, we consider the problem of how to assign treatment in a randomized experiment, when the correlation among the outcomes is informed by a network available at the design stage.

Related work
The need to account for network connections in causal analyses has led scholars to focus on two specific problem settings: (i) network interference (Airoldi et al. 2012;Toulis and Kao 2013;Aronow and Samii 2013;Ugander et al. 2013;Eckles et al. 2014), a situation where the potential outcomes of unit i are a function of the treatment assigned to unit i and of the treatment assigned to other units that are related to unit i through the network, or of the observed outcomes of these related units, (ii) network-correlated outcomes (McPherson et al. 2001;Manski 2013;Shalizi and Thomas 2011), an alternative setting where the network informs the correlation among the potential outcomes, because the potential outcomes of unit i are a function of its covariates and those of other units, and the covariates of units that are connected are more similar than the covariates of the units that are not. In this paper, we focus on the network-correlated outcomes setting which has received less attention. With few exceptions (Aral et al. 2009;Shalizi and Thomas 2011), the literature considers these problems in isolation even as it is motivated by scenarios in which both are plausible (e.g., see Christakis and Fowler 2007). The effects of network interference, whether as the target of inference or as a nuisance, have been mostly studied in randomized experiments (Parker 2011;Airoldi et al. 2012;Toulis and Kao 2013;Aronow and Samii 2013;Ugander et al. 2013;Eckles et al. 2014; Karwa and Airoldi 2016; Sussman and Airoldi 2016) with a recent exception (Forastiere et al. 2016).
The confounding due to correlated outcomes, typical of problems where homophily is plausible (McPherson et al. 2001), has been mostly explored in observational studies using potential outcomes (Manski 2013) or causal graphical models (e.g., see Shalizi and Thomas 2011;Shalizi and McFowland III 2016). Aral et al. (2009) proposed a randomization strategy to disentangle interference and homphily in an application to online marketing, in the context of a dynamic network. However, this randomization strategy is tailored to the application and hard to analyze theoretically.
We have been working toward an analytical understanding of how to best identify and estimate the causal effect of interference in the presence of confounding due to correlated outcomes, in the context of randomized experiments on networks. Our approach is to develop restricted randomization strategies that leverage a (static) network available preintervention. Thus far, we analyzed shortcomings of popular randomization strategies for estimating causal effects (Karwa and Airoldi 2016), and developed elements of a theory of estimation for them Sussman and Airoldi 2016), in the presence of network interference. We leveraged these results to estimate causal effects in observational studies (Forastiere et al. 2016). This body of work has already led to some insights and general principles (Airoldi and Rubin 2016).
In this paper, we propose a collection of model-assisted restricted randomization strategies to improve estimation of the average treatment effect, in the experiments where a network is available pre-intervention, in the presence of network-correlated outcomes. Restricted randomization as a way to increasing the precision of estimates has a long tradition in the field of experimental design (e.g., see Yates 1948;Youden 1972;Simon 1979;Bailey 1983;Higham et al. 2015). The basic idea is that some assignments are considered problematic (e.g., when some covariates are unbalanced between the treatment arms) and can be excluded. In

Contributions
Our approach is inspired by classical work on model-assisted survey sampling in which a specific model is used to reduce the variance of a given estimator (typically, a liner weighted estimator) in a way that does not harm its properties when the model is wrong (e.g., see Särndal et al. 2003).
Drawing inspiration from the model-assisted survey sampling literature, we propose a model-assisted approach to the design of experiments. Namely, we posit a working model for the potential outcomes specified conditionally on a network observed pre-intervention, and then restrict the randomization to assignments for which the estimator of interests achieves a low mean square error. The class of models we propose leads to analytical expressions for the mean square error that suggest new notions of balance in terms of network statistics related to the degree distribution. We also develop new theoretical results showing that the modelassisted restricted randomization approach we propose maintains the design unbiasedness of the difference-in-means estimator even when the model is misspecified, and reduces the expected variance of the estimator when the model holds.

Analytical insights for evaluating allocations
Here we introduce the model-assisted approach to designing experiments; we posit a model for the potential outcomes that is used for calculating the mean square error of the difference of means estimator. Explicit formulas for the mean square error, for fixed allocation vector Z and averaged over allocation vectors Zs, indicate which aspects of the network play a role in the estimation of causal effects, in this setting, in Sections 2.3 and 2.4. We then introduce a more general model in Section 2.5 and we show, in the appendix, the extent to which the intuition developed in the illustrative model holds more generally. While the models we posit help developing analytical insights, and are useful in specific applications, the methodology we propose is design-unbiased even when these models do not hold, as we show in Section 3.6.

Causal inference setup
We work within the potential outcomes framework (Rubin 1974;Holland 1986;Imbens and Rubin 2015). We consider a population of N units, a binary treatment, denoted Z i = 1 if unit i is assigned to treatment, and real-valued outcomes, denoted Y i . The corresponding vectors are denoted Y, Z, respectively. We assume the stable unit-treatment-value assumption holds, which implies that the outcome of unit i is only a function of the treatment assigned to it, Y i (Z) = Y i (Z i ), thus excluding interference (Rubin 1974).
We consider a finite population setting, where the potential outcomes Y (Z) are unknown constant quantities, given Z. The only source of variation is how treatment is allocated to units. We assume treatment is allocated according to a distribution on the space of all binary vectors of length N , typically referred to as the randomization distribution (Imbens and Rubin 2015).
For illustrating the idea of model-assisted restricted randomizations, we consider the average treatment effect as the the inferential target of interest, defined as and the popular difference-in-means estimator for the average treatment effect, (1)

The normal-sum model
The model-assisted approach to experimental design requires a model, which is used to improve the inferential properties of the difference-in-means estimator when the model holds.
We posit a the model that depends on a network, which is available at the design stage in our setup.
Consider N units and an undirected network G among them, or, equivalently, a binary adjacency matrix A of size N × N , with the added constraint that A ii = 1 for all i, which we refer to as the extended adjacency matrix. The neighborhood of a unit i is defined as the index set N i = {j s.t. A ij = 1 or A ji = 1}. Let us posit the following model, The network induces correlation among the outcomes that are assigned to control because the mean of each Y i (0) is given by the sum of the covariate values, X j , of units j in a neighborhood of unit i. The effect of treatment is additive. Equations (2)-(4) define the normal-sum model. The implied model for the observed outcomes, Y obs , is given in Appendix D.1. We consider a more general version of this model in Section 2.5, but will otherwise focus on this model for the sake of clarity in presenting the restricted randomization approach.
For our purposes, the normal-sum model provides a useful abstraction for exploring the problem of optimal design of experiments in the presence of network-correlated outcomes. However, an illustrative application will help anchor the intuition. The normal-sum model arises naturally, for example, when considering the time users spend on a social media platform. Consider the binary treatment Z i to be the exposure to a new feature of the website designed to increase engagement and time spent online, and let Y i (Z i ) be the time spent online by user i when assigned to treatment Z i . The causal effect of interest τ is then the effect of the new feature on the time spent online. Let us assume a constant, additive treatment effect, for simplicity. In the absence of network connections and in the absence of treatment (i.e., Z i = 0), X i is the expected value of Y i (0) conditional on X i . So X i can be thought of as the intrinsic propensity of user i to spend time on the website. The model then captures the fact that the time spent on the website by user i increases with the number of its neighbors, with the neighbors' propensities to spend time on the website, and with the exposure to the new feature if the treatment has an effect.
In the appendix, we also consider a variant of the normal-sum model, which we refer to as the normal-mean model, in which the mean of Y i (0) is given by the average of the covariate values, X j , of units j in a neighborhood of unit i. In Section 2.5 we consider a general family of models that subsumes the normal-sum and normal-mean models. Model-assisted design strategies for models in this family, similar to those developed in Sections 2 and 3 for the normal-sum model, can be developed by using the mean-square error calculations analysis detailed in Appendix A.
The rest of the paper explores the implications of the normal-sum model for designing optimal treatment allocation strategies.

Interpretation of the mean square error for a fixed treatment allocation vector
We compute the mean square error of the difference-in-means estimator, according to the normal-sum model for Y obs , defined as mse(τ | Z) ≡ E{(τ − τ * ) 2 | Z}, for a fixed treatment allocation vector Z. We refer to this quantity as the conditional mean square error. We have, (5) We can identify desirable assignments by evaluating their conditional mean square error. This idea is the basis for the model-assisted restricted randomization strategies, in Section 3.2.
In the absence of specific constraints on the number of treated units, different treatment allocation vectors will generally have a different number of treated and untreated units, The bias is proportional to the difference in the average neighborhood sizes of treated and untreated units. Intuitively, this difference measures a lack of balance between the two groups, in terms of their network characteristics-specifically, the average degree. A larger value of the mean µ amplifies the contribution of this imbalance to the mean square error. Since the designer does not have control over µ, desirable treatment assignments minimize bias by balancing the average neighborhood size between treated and untreated units. The first variance term is which is minimized when N 1 = N 0 . Intuitively, this term penalizes the difference between the number of treated and untreated units. A larger value of the parameter γ amplifies the contribution of this imbalance to the mean square error. This result is consistent with classical results on the optimality of balanced randomizations for estimating the average treatment effect in the absence of network correlated outcomes (Fisher 1954;Imbens and Rubin 2015). The second variance term involves features of the network; it is The factor on the right hand side of (8) is proportional to the average number of shared neighbors among pairs of units both assigned to the treatment group. The factor in (9) is proportional to the average number of shared neighbors among pairs of units both assigned to the control group. The factor in (10) is proportional to the average number of shared neighbors among pairs of units, one assigned to treatment and one assigned to control. Considering the signs in front of these three factors, the second variance term may be minimized by assigning units with shared neighbors to different groups, and by avoiding assigning treatment or control to entire clusters of units that are densely connected.
2.4 Interpretation of the mean square error averaged over alloca-

tion vectors
Next, we compute the mean square error of the difference-in-means estimator, according to the normal-sum model and the distribution on the allocation vectors implied by a complete randomization strategy-which assigns equal probability to all of the treatment allocation vectors Z for which the numbers of units in treatment and control are fixed to (N 0 , N 1 ). We refer to this quantity, defined as mse(τ ) ≡ E[E{(τ − τ * ) 2 | Z}], as the marginal mean square error. It is, The right hand side of (11) (top) is the mean square error of the difference-in-means estimator due to a complete randomization strategy in the absence of a network, since (γ 2 + σ 2 ) is the total variance implied by the network-sum model. The three factors C 1:3 in (11) (bottom) can be seen as the contribution to the variance due to the presence of network-correlated outcomes. The term C 1 is proportional to the average degree of the nodes; thus networks with higher average degrees will tend to lead to higher mean square errors, ceteris paribus. The term C 2 is proportional to the average number of shared neighbors among all pairs of nodes; thus networks that are locally denser will tend to have lower mean square error, ceteris paribus. The term C 3 is proportional to the variance of observed degrees; thus low variability in the degree of the nodes will lead to lower mean square error, ceteris paribus. Interestingly, this contribution is not necessarily positive, because of term C 2 , which summarizes average local density. However, the contribution depends on summaries of the degree distribution of the network available pre-intervention that are not under control of the designer.

More general models of network-correlated potential outcomes
The normal-sum model introduced in Section 2.2, and the normal-mean model introduced in Appendix D.4, are special cases of a more general model which replaces Equation (3) in the main paper and Equations (37) in the appendix with the more general formulation, with regularity conditions on the function g, to essentially ensuring that, for any subset of nodes S ⊂ N i , the conditional expectation E(g[{X j } j∈N i ]|{X j } j∈S ) is well behaved. We detail the regularity conditions (i.e., positivity, symmetry, and monotonicity) as well as the general form of the mean square error for this more general model in Appendix A. In addition, we show that the general form of the mean squared error suggests that good designs seek to decrease the number of neighbors shared within treatment groups and increase the number of units shared between treatment groups, while balancing the size of the groups, as well as the distribution of neighborhood sizes. These derivations indicate that the network balance criteria the proposed restricted randomizations are based upon extend well beyond the normal-sum model. Moreover, model-assisted strategies come with theoretical guarantees that hold regardless of the validity of the model, as we show next.

Methodology and theory
Randomization strategies are probability distribution on the set of binary vectors Z. Restricted randomization strategies are probability distributions implied by discarding allocation vectors Z ∈ Z according to a set of rules. We review classical strategies in Section 3.1, introduce new strategies in Section 3.2 and study their theoretical properties in Section 3.6. Section 3.5 briefly discusses inference.

Classical randomization and restricted randomization strategies
According to a Bernoulli randomization strategy with parameter p ∈ (0, 1), each treatment allocation vector Z ∈ Z has individual treatments Z i drawn as independent Bernoulli random variables with probability of success p, for i = 1, . . . , N units in the population.
A completely randomized design with parameters (N 0 , N 1 ), where N 0 + N 1 = N , only considers treatment allocation vectors Z ∈ Z such that N i−1 Z i = N 1 , and assigns equal probability to them. If N 0 = N 1 = N/2 we refer to it as a balanced completely randomized design.
Restricted randomization strategies stem from the observation that, when designing an experiment, it is often clear how to evaluate whether a treatment allocation vector is undesirable. For instance, when an allocation vector Z leads to statistical imbalance for one or more key covariates, it leaves the door open to confounding even in the presence of randomization (Gosset 1938;Cox 1982). Indeed, the most common form of restricted randomization is to discard treatment allocations that lead to covariate imbalances (Lock-Morgan and Rubin 2012).

Model-assisted restricted randomization strategies
We introduce four model-assisted designs, which differ by the degree of reliance on the model; namely, on the conditional mean square errorfor the difference-in-means estimator.
First, we consider balanced restricted randomization strategies, which discard treatment allocation vectors where the number of treated units N 1 differs from the number of untreated units N 0 -or differs by more than one in the case of N odd. This strategy aims at minimizing the contribution of the total variance to the conditional mean square error, according to (7).
Second, we introduce unbiased restricted randomization strategies, which discard treatment allocation vectors where the average number of neighbors for treated units differs from the average number of neighbors for untreated units. This strategy aims at minimizing the contribution of the bias to the conditional mean square error, as suggested by the discussion of (6).
Third, we introduce optimal restricted randomization strategies, which discard treatment allocation vectors that minimize the average number of shared neighbors among pairs of treated units, according to (8), minimize the average number of shared neighbors among pairs of untreated units, according to (9), and maximize the average number of shared neighbors among pairs of units one of which is treated and the other untreated, according to (10).
Let Z ≡ {0, 1} N be the set of all possible treatment allocation vector on N units. Formally, we can define sets of allocations corresponding to the restricted randomization defined above as where q MSE α is the α quantile of the distribution of the conditional mean square error. These subsets of randomizations depend on network statistics that the normal-sum model suggests as relevant for computing the conditional mean square error, discussed in Section 2.3.
The rest of the paper focuses on the first three model-assisted strategies: balanced restricted randomization design, which assigns equal probability to all Z ∈ Z b , balanced/unbiased restricted randomization design, which assigns equal probability to all Z ∈ Z b ∩ Z u , balanced/unbiased/optimal restricted randomization design, which assigns equal probability to The fourth model-assisted strategy, which we refer to as unconstrained/optimal restricted randomization design, aims at trading off small increases in bias for significant reductions in variance. This design assigns equal probability to all Z ∈ Z min , defined as Even in situations where the set Z min contains a single allocation vector, because we can only approximately search the space Z for the optimal vector Z and because we use multiple initializations to perform such a search, in practice Z min contains multiple allocation vectors.

Model-based optimal treatment allocation strategies
The four model-assisted strategies in Section 3.2 leverage a model for the outcomes for selecting allocations that improve properties of the difference-in-means estimator, which ignores the model. One may wonder why not leveraging the model for the outcomes to also derive a better estimator for the average treatment effect, and then selecting allocations that improve properties of that estimator. Here, we develop such a model-based optimal treatment allocation strategy.
The natural next step is to replace the difference-in-means estimator with the maximum likelihood estimator for τ under the normal-sum model. The estimatorτ mle and its condi-tional mean square error are derived in Appendix D.5. The optimal maximum likelihood design is then the model-based restricted randomization strategy that assigns equal probability to all Z ∈ Z mle , defined as In the appendix, we evaluate the performance of the maximum likelihood estimator for τ using a completely randomized design, as a baseline, to quantify the improvement due to optimal restricted randomization. When evaluating the performance of both model-based strategies, we fix parameters µ, σ, and γ at their true value, and consider τ as the only unknown parameter.

Restricted randomizations via rerandomization
A general approach for sampling from arbitrary restricted randomization designs, referred to as rerandomization, has been recently formalized by Lock-Morgan and Rubin (2012). If φ is a binary function such that assignment Z is belongs to the restricted randomization set if and only if φ(Z) = 1, then one simple way to sample from the restricted randomization design by using a simple rejection sampling approach: draw an assignment Z from the original design, then keep the assignment if φ(Z) = 1, and reject it if φ(Z) = 0. In our setting, the restricted sets defined in 13-15 can be defined terms of different functions φ. Denote the indicator function by I(.), then Thus rerandomization can be used to sample from the restricted randomization designs we proposed. Rerandomization as a means to implement restricted randomization strategies is particularly useful when performing exact tests and computing confidence intervals, as detailed next.

Inference via inversion of a sequence of exact Fisher tests
There are traditionally three types of confidence intervals in randomization-based inference: Neymanian intervals, bootstrap intervals, and Fisher intervals. Neymanian inference in the context of restricted randomization is generally a challenging problem. Recent work has proposed an asymptotic theory of re-randomization (Li et al. 2016). Unfortunately, the asymptotic regime considered there is not compatible with our setting for two reasons: (i) proposed methods require the number of covariates to be fixed in the asymptotic regime, while in our case the quantities that are analogous to covariates include the number of neighbors shared by each pair of units, which grows with the number of units in the asymptotic regimes of interest; and (ii) proposed methods also require the constraints to be a function only of the vector of differences in means (between treated and control units) for the observed covariates, and of the variance-covariance matrix of that vector; a condition that does not hold in our case (see appendix). Bootstrap intervals are difficult to implement since the correlation structure of the outcomes may be complex.
Instead, we propose using Fisher intervals, which are obtained by inverting a sequence of Fisher exact tests (e.g., Rosenbaum et al. 2002). This can be accomplished by means of rerandomization (e.g., see Lock-Morgan and Rubin 2012, sec 2.2), but by using the proposed restricted randomization distributions as the permutation distributions. Details are given in Appendix B.
To illustrate the potential gains from Fished interval estimates based on restricted randomization, we ran a simulation in which we generated outcomes from the normal-sum model, and computed Fisher intervals based on balanced optimal restricted randomization (with α = 0.05). For a fixed network of 500 nodes, we generated two hundred realizations of the potential outcomes according to the normal-sum model, and two hundred observed assignments. For each realization, we computed a Fisher confidence interval based on balanced optimal restricted randomization and Fisher intervals based on balanced complete randomization. The results are displayed in Figure 2. We see that the proposed design based on restricted randomizations reduces the size of the Fisher confidence intervals (at the same level of confidence) compared to the length of the intervals obtained under complete randomization. The exact design of this simulation, as well as a more extensive simulation and corresponding results, are given in Appendix B.

Theory
An advantage of model-assisted designs is that they only partly depend on a model for the outcomes. Thus it is reasonable to expect that these designs might lead to desirable inferential properties even when the model they rely on for evaluating treatment allocations is wrong. Following this intuition, here we show that the difference-in-means estimator is design unbiased (e.g., see Särndal et al. 2003) for the restricted randomization strategies developed in Section 3.2.
Definition 1 (Design unbiasedness). An estimatorτ is unbiased with respect to a distribution on Z, typically referred to as a design on Z, if we have: The main result follows. Proofs are given in Appendix E.
Theorem 2. The difference-in-means estimatorτ defined in (1) is an unbiased estimator of the average treatment effect with respect to the following distributions: (i) Uniform distribution on Z b , which defines the balanced design (ii) Uniform distribution on Z b ∩ Z u , which defines the balanced/unbiased design (iii) Uniform distribution on Z b ∩ Z o , which defines the balanced/optimal design (iv) Uniform distribution on Z b ∩ Z u ∩ Z o , which defines the balanced/unbiased/optimal.
Intuitively, as a consequence of design unbiasedness and of the increasingly nested supports, we can compare variances ofτ implied by the designs in Theorem 2, in expectation.
Corollary 3. Letτ be the estimator defined in (1). We have, And similar inequalities can be derived easily for any pairs of nested designs in Theorem 2.
These results are based on symmetry arguments, which is why Z b is always part of the support of designs that make the difference-in-means estimator unbiased. This notion of symmetry is made precise in the following Lemma.
As a consequence, if we required the unconstrained optimal design to be balanced, by restricting its support to Z b ∩ Z min , we would recover design unbiasedness for the differencein-means estimator. However, we do not consider balanced unconstrained optimal designs.

Discussion
In this paper we have introduced a strategy for model-assisted design of experiments. Given the difference-in-means estimator, we use a model for the outcomes to compute its mean squared error conditional on a fixed treatment allocation vector Z ∈ Z. This calculation identifies network statistics that are relevant in controlling bias and variance terms. We then restrict the support of the randomization distribution to specific subsets of Z that minimize some of these terms. Should the model not hold, the difference-in-means estimator remains unbiased for the average treatment effects with respect to the restricted randomization distribution, as detailed in Theorem 2. In the model-assisted survey sampling literature, in contrast, given a linear weighted estimator such as the Horwitz-Thompson, a model is used to derive correction factors for the weights. The corrected estimator has reduced variance if the model holds, and is otherwise unbiased with respect to the sampling distribution independently of the model. The idea behind model-assisted design is fairly general, two key elements being the estimator and the model. The theoretical guarantees in Section 3.6 are limited to estimators satisfying the symmetry condition of Lemma 1, and to the model family in Appendix A. Extending the theory to a larger class of estimators and models is conceptually feasible, although often results in complex expressions for the mean square error, and hard-to-interpret balance criteria.
In practice, there are often additional issues to consider, which we have assumed away in our analysis for simplicity of exposition. Importantly, covariates will have to be taken into account, and the parameters µ, σ 2 , and γ will need to be specified or estimated. Options for inference on the parameters include point priors (Box and Lucas 1959), or specifying full priors to work with the integrated mean squared error. In both situations, historical data and pilot studies might be used to calibrate these priors, and are recommended for optimal design in practice (Kim et al. 2015;Shakya et al. 2016). The theory we developed for a more general model of network-correlated outcomes, and extensive simulation studies, both detailed in the Appendix, show that the gains in terms of efficiency one can expect to achieve with model-assisted design of experiments (over design-based and model-based strategies) are robust to a large degree of misspecifications. This paper is a starting point. In the context of the literature on homophily and peerinfluence, this paper suggests a viable strategy to get an analytical handle on which features of a network might be useful to control when designing an experiment. However, we limit ourselves to the case of network-correlated outcomes in the absence of peer-influence, we only analyze the conditional mean square error for the differences-in-means estimator under the normal-sum model, in Sections 2 and 3, and under the normal-mean model, in AppendixD.4, the conditional mean square error for the maximum likelihood estimator, in Appendix D.5, and we carry out an empirical sensitivity analysis. We initially choose to tackle networkcorrelated outcomes in isolation to gain clear analytical insights. We are currently working on combining these insights to design randomization strategies that can optimally estimate causal effects of interest in the presence of both network interference and confounding due to network correlations.

Supplementary Material
The supplement is organized as follows. Section A provides an in-depth analysis of the general model mentioned in the article, including all proofs. Section B provides a detailed procedure for inverting Fisher exact tests and obtaining confidence intervals with restricted randomization designs. It also reports results of an extended simulation study comparing the size of the confidence intervals obtained by inverting Fisher exact tests based on restricted randomizations to the size of confidence intervals obtained by inverting Fisher exact tests based on complete randomization. Section C presents the results of extensive simulation studies illustrating the robustness of our model-assisted strategies to model-misspecification, in contrast with the model-based strategies. Section D presents detailed derivations and proofs of theorems and lemmas.
A More general models of network-correlated outcomes In this section, we introduce a more general class of models, for which the interpretation of the conditional mean square error-used for the purpose of restricting randomizations in the model-assisted design strategies-generalizes that of the normal-sum and normal-mean models. This illustrates the extent to which the design guidelines derived from the simpler models hold more generally. Section A.1 introduces the general model formulation and states the results for it. Section A.2 gives examples of different instances of this model. Section A.3 details the proofs.

A.1 Elements of experimental design for more general models
Although the exact network balance criteria will depend on the model, some broad experimental design guidelines are available for a relatively large class of models. Consider the family, where the g satisfies the following regularity conditions: Regularity conditions. Let {X 1 , . . . , } a sequence of iid random variables. There exists a real-valued set function φ and a real-valued function of three variables h(·, ·, ·) such that for any collection {X k } k∈χ indexed by a finite set χ and any subset of indices S ⊂ χ, the following hold: 2. h(n, s, ·) is a monotone function of its third argument, for n and s fixed. That is, h(n, s, ·) is either non-increasing for every n and s, or non-decreasing for every n and s.
We now state our main new theorems. We will give examples of functions g satisfying the constraints immediately after: Theorem 4. If the regularity conditions above hold, we have: and: and: where q, v and m all depend on g, µ, and σ 2 and are defined in Lemma 3, below. In addition, the function m has the following properties: 1. m(a, b, c) ≥ 0 for all a, b ≥ 0 and c ≤ min(a, b).
2. m is symmetric in its first two arguments: m(a, b, c) = m(b, a, c) 3. m is a non-decreasing function of its third argument, when the first two arguments are held constant.
Remark 1. The functions q and h in the theorem depend on the specific choice of g, and on the parameters. However when looking at the variance one can see that everything else being equal, the mean square error is minimized by decreasing within-group overlap, and by increasing between-group overlap, precisely as in the basic normal-sum model. A general heuristic for experimental design would then be: minimize shared neighbors within treatment and control groups, maximize shared neighbors between these groups, while keeping the distribution of neighborhood sizes similar in both treatment and control groups.

A.2 Example models
We now give examples of functions g satisfying the three regularity conditions stated above.
Example 1. Consider g({X j } Ni ) = j∈Ni X j . Let S ⊂ N i . We have: is an increasing function of x for fixed n and s, and that φ({X k } k∈S ) is a non-decreasing function of any element of the set, after fixing the others. So g({X j } Ni ) = j∈Ni X i satisfies all three regularity conditions.
With regards to the first regularity condition we have: where φ({X k } k∈S ) = S∈S S. The other two regularity conditions are easily verifiable.
We have used the fact that the distribution of max(S) depends on S and is a function of |N i | and |S| and max({X k } S ). It is easy to verify the last two regularity conditions are also satisfied.

A.3 Proofs
Lemma 2. If the regularity conditions hold, then for any X 1 , . . . iid random variables there exists a function m(·, ·, ·) such that for any χ 1 and χ 2 two sets of indices such that χ 1 = χ 2 , and such that S = χ 1 ∩ χ 2 = ∅, we have the following: is a non-decreasing function.
We have: where the last equality uses the first regularity condition. Since the X's are iid, the covariance depends on {X k } k∈S only through |S|, the number of random variables in the set, and so we can write: where we emphasize once again that implicitly, m depends on g and on the model. We now study the properties of m: Partial Symmetry: Since the covariance is symmetric, it is clear that: m(|χ 1 |, |χ 2 |, |S|) = m(|χ 2 |, |χ 1 |, |S|) Positivity: By the second regularity condition, h(n, ·) is either a non-decreasing function of its second argument for all n, or a non-increasing function of its second argument for all n. But it is known (see e.g Thorisson (1995), section 2) that the covariance of two monotone functions of random variables is positive. Thus: cov h(|χ 1 |, φ({X k } k∈S )), h(|χ 2 |, φ({X k } k∈S )) ≥ 0 and we have: m(|χ 1 |, |χ 2 |, |S|) ≥ 0.
Monotonicity: Let χ 1 and χ 2 such that |χ 1 | = |χ 1 | and |χ 2 | = |χ 2 | and χ 1 ∩ χ 2 = {s 0 } ∪ S ≡ S . On the one hand, we have: on the other hand, we also have: But we have: The only random element in the covariance of the last line is X s0 , since we condition on all the other random variables. But by the third regularity condition, the function: for fixed {X k } k∈S is non-decreasing. Thus as above the covariance will be positive (Thorisson (1995)), and so: and so we also have: and putting it all together, we have: and combining with Equation 21, we have: that is, m(|χ 1 |, |χ 2 |, |S| + 1) ≥ m(|χ 1 |, |χ 2 |, |S|) and so by induction, m is a non-decreasing function of its third argument.
Lemma 3. If g satisfies the regularity conditions, we have:

(partial symmetry) m is symmetric in its first two arguments: m(a, b, c) = m(b, a, c)
3. (monotonicity) m is a non-decreasing function of its third argument, when the first two arguments are held constant.
Proof. We proceed in order: Expectation: We now apply Lemma 2 to the RHS of the last equality, with χ 1 = N i , χ 2 = N j , and S = N i ∩ N j , and we immediately obtain: and the properties of m are also obtained from Lemma 2.
Proof of Theorem. Recall that:τ We have: where we have used Lemma 3. We now turn to the variance. Let ω(Z i ) = Zi N1 − 1−Zi N0 . We have: But we have: Applying Lemma 3 to the first term of the RHS of the last line gives: Now notice that: otherwise So applying Lemma 3 to the second term of the RHS of the last line gives: where the −2 N1N0 term is obtained by symmetry of m with respect to its first two argument. Now using the fact that mse Θ (τ , τ |Z) = Bias Θ (τ , τ |Z) 2 + V Θ [τ |Z], we have: which completes the proof.

B Fisher intervals based on restricted randomizations B.1 Inferential procedure
This section simply combines together classic results on inverting Fisher tests (Rosenbaum et al. 2002) and on exact tests with restricted randomization (Lock-Morgan and Rubin 2012). The inferential procedure assumes that we have imposed balance, so Z ∈ Z b , which is the case considered in the article. We begin by describing how to obtain a p-value for the following type of sharp null hypothesis, then we describe how to invert a sequence of such tests to obtain a confidence interval. Let Z ∼ R be any of the restricted randomization schemes we proposed that imposes exact balance on the size of the treatment groups. A p-value for H τ * is obtained as follows: 1. let T obs =τ (Z obs ).

Compute the two-sided p-value p
The p-value p (M ) τ * thus obtained is a monte-carlo approximation of the true p-value p τ * for the test H τ * . It can be made arbitrarily precise by increasing M . The difference with the traditional Fisher test here occurs in Step 3, when we sample Z ∼ R, where R is a restricted randomization design. This is usually performed using rerandomization as described in the main article.
Confidence intervals are then obtained by inverting a sequence of such Fisher exact tests. Specifically, let τ min < τ max be such that p M τ min < α, p M τ max < α, and such that there exists τ min < τ < τ max such that p M τ > α. These can always be found, by construction. Let δ K = (τ max − τ min )/K. A 100 × (1 − α) interval for τ can be constructed as follows: 1. For k = 1, . . . , K, compute p M τ min +k * δ K .

B.2 Simulation study
This simulation compares the size of the Fisher intervals obtained with the balanced optimal model-assisted design, to that of the Fisher intervals obtained with balanced complete randomization. The setup is as follows. We generated 100 Erdos-Renyi graphs, with N nodes, and parameter p = 0.15. For each graph, we computed the mean square error for the associated normal sum model with parameters µ = 1, σ = 2, γ = 1, and τ = 1. Then for each graph we generated 200 independent realizations of the potential outcomes vector from the true model, and for each realization we computed the Fisher intervals from balanced complete randomization, and from balanced optimal randomization with α = 0.05. Thus in total, we obtained 20000 intervals for each method. We then then repeated the same step but with misspecified models. Specifically, we consider the 'small' misspecification case in which we randomly modify 5% of the edges in each graph, and the 'large' misspecification case in which we randomly modify 10% of the edges in each graph. In each misspecification case, we then proceed as if the misspecified graph was the correct graph (so we compute the mean square error based on the misspecified graph), but we generate the potential outcomes from the true model. Then as in the correctly specified case, we compute the size of the Fisher intervals obtained using our method to those obtained under balanced randomization.
Figures 2 summarizes the results. Each panel shows the percentage reduction in the size of confidence intervals obtained by using our method, for a single network (so 200 values). Each row correspond to a different degree of misspecification. The columns indicate how we chose the network displayed in the panels: for the panel in the first column and first row, we chose the network for which the mean percentage reduction was smallest, in the correctly specified case. The other panels can be interpreted similarly. The key message of this plot is that our method leads to smaller confidence intervals, even under strong misspecification, and that this reduction is consistent across different networks.

C Numerical results and robustness to misspecification
In this section, we report simulation results to assess the performance of the proposed randomization and re-randomization strategies against standard completely randomized allocation, Bernoulli allocation, and more recent re-randomization strategies based on these strategies. We perform three sets of simulations. In Section C.2, the proposed randomization strategies are obtained by relying on diffuse prior distributions for key parameters centered around the true values. In Section C.3, we explore comparative performance when the actual model (namely, the network used to specify the model) is misspecified. In Section C.4, we explore comparative performance when the prior distributions informing the proposed strategies are increasingly misspecified.

C.1 Design of simulation experiments
We consider four families of networks: Erdös-Renyi, power law, stochastic blockmodel, and small world on a ring lattice (Goldenberg et al. 2010). We do this for convenience, but without loss of generality, since the formulas for the mean square error in Sections 2.3-2.4 and the theory and methods in Section 3 depend on observed network statistics. We generate 100 networks, each with 500 nodes, from these families. These networks all have comparable densities (0.08 ± 0.02) by design. The outcomes are generated according to the model in Equations 2-4, with parameters µ = 1, σ = 2 and γ = 1. We note that several allocation strategies described in Section 3.2 require solving optimization problems, for which we can only provide approximate solutions. All optimizations are carried out via stochastic optimization (Goldberg and Holland 1988). We discuss the variability in the results due to this approximation when appropriate.

C.2 Comparative performance analysis
The goal of this set of simulations is to quantify the order of magnitude of improvements in integrated mean squared error an analyst can expect, under controlled conditions. In these simulations, we compare the performance of the different estimators when the data are simulated from the model in Equations 2-4. For each of the 400 networks described in Section C.1, we generate 300 assignments for each of the methods described in Section 3. For each assignment we compute the mean square error in Equation 5. Thus the results here compare performance of the randomization strategies coupled with the simple difference-in-means estimator. We postpone the discussion of the maximum likelihood estimator to the following section. Figure 3 shows the mean square error densities for seven randomization strategies, estimated from 30,000 replicated experiments for each network family. Figure 4 shows the mean square error densities for seven randomization strategies, estimated from 300 replicated experiments for the first simulated network in each family. In both Figures, we truncated the X axis at 5, however the mean square errors for the Bernoulli and balanced randomizations take values as high as 10.
The results suggest a number of observations. Balancing the number of treated and control units generally improves the mean squared error over Bernoulli randomizations. Removing bias through rerandomization, by balancing the average degree of treated and untreated units (see Equation 6), generally improves the mean squared error over balanced randomizations. Rerandomizations that effectively discard balanced randomizations with high mean square error often outperform balanced unbiased rerandomizations, with the exception of power law networks. In these networks, the balanced unbiased rerandomization still outperforms the rerandomization that keeps only allocation vectors with the 20% highest mean square error. This happens because the degree distribution in power law networks is very skewed and even the top 20% allocations in terms of mean square error display quite a lack of balance in terms of average degree between treated and non treated units.
We note that different rerandomization strategies explore the treatment allocation vectors with different criteria, thus these improvements are simply a consequence of the difficulty in exploring a vast space; we sampled 300 allocation vectors in a space that has roughly 2 100 elements. In our experience in designing large experiments practitioners typically generate tens of thousands of allocation vectors, but only look closely at hundreds of them, so our simulation is realistic in this respect (Kim et al. 2015, and ongoing work by the authors). The other three network families, on the other hand, have much more symmetric degree distributions, and so explicitly disregarding balance on average degree does not lead to heavy bias and higher mean square errors.
Interestingly, balanced unbiased rerandomizations that explicitly control the variance terms in the conditional mean square error (in Equations 8-10) substantially improve the mean squared error over balanced rerandomizations based on the overall mean square error. This suggest that is is unlikely to find allocation vectors with good variance control in a small set of allocations. Unconstrained rerandomizations that directly control the sum of bias and variance terms substantially improve the mean squared error over balanced unbiased optimal rerandomizations. This is consistent with the findings in classical estimation tasks, where a small increase in bias may lead to larger reductions in variance, and thus to lower mean square error. Lastly, the gap between the mean square error of unconstrained optimal rerandomizations and the mean square error of balanced unbiased optimal rerandomizations depends on the family of networks we consider.
Overall, model-assisted rerandomizations perform better, under ideal conditions. The theory in Section 3.6 provides assurances in terms of unbiasedness when the model does not hold.

C.3 Robustness to network misspecification
The goal of this set of simulations is to quantify the loss in performance of the randomization strategies we are considering when the network the model conditions on is misspecified.
We perturbed each of the 400 networks simulated for Section C.1 by randomly rewiring different proportions of the edges. For each perturbed network we generated 100 allocation vectors using six randomization strategies-those considered in the previous section with the exclusion of the Bernoulli randomizations (same color scheme as above). Perturbations in the network only affect the proposed model-assisted rerandomization strategies, which rely on explicit bias and variance terms that now depend on the perturbed network, while the outcomes are generated according to a model that relies on the unperturbed network. In addition, we consider the randomization strategy that minimizes the analytical expression for the mean square error of the maximum likelihood estimator, in Equation 40, which also leverages the perturbed network for estimating the ATE (in purple). We evaluated assignments in terms of marginal mean squared error, computed using the unperturbed networks.  ization strategies described above, and for the MLE under a balanced complete randomization as a baseline for the MLE (in pink). This baseline allows us to quantify the effects model misspecification on the mean square error because of failures in the estimation task only (when treatment is assigned using balanced complete randomizations), and to contrast it with the effects model misspecification on the mean square error because of failures in both estimation and optimal treatment allocation tasks. The four panels in Figure 5 show the mean square error for the four different network families. The X axis measures the fraction of edges rewired that defines the severity of the network perturbation. For instance, at 0.01 we rewire 1% of the edges; in networks with 500 nodes and density 0.15, this corresponds to 188 edges on average. At zero, mean square errors correspond to unperturbed networks.
The results suggest a few observations. There is a clear contrast between randomization strategies based on the MLE and those based on the difference-in-means estimator. While strategies targeting the MLE outperform the other strategies in the absence of model misspefication (i.e., no edges rewired), even for modest misspecification (i.e., 5% edges rewired) their mean square error increases substantially and exceeds that of strategies based on the difference-in-means estimator. Perhaps surprisingly, the balanced complete randomization for the MLE (pink curve) performs worst than the optimizing treatment assignment for the MLE (purple curve) for the range of misspecification explored. This over-sensitivity to model misspecification makes MLE-based randomization strategies, and MLE estimation of the ATE, unattractive options in practice.
In contrast, randomization strategies based on the difference-in-means estimator are generally insensitive to increasing amounts of misspecification, which is plausible since this estimator does not depend on the network. Any mount of misspecification (in the range we consider) does not alter the ordering the proposed rerandomization strategies suggested in Section C.2, in terms of average marginal mean square error over the simulated networks.

C.4 Robustness to prior misspecification
The goal of this set of simulations is to quantify the loss in performance of the randomization strategies we are considering when parameters in the model for the outcomes are misspecified.
For each of the 400 networks used in the previous simulations, we generated 100 assignments from each of the six randomization strategies based on the difference-in-means estimator, for each of the nine sets of parameters specified in Table 1. Recall that γ 2 is the variance of the outcomes, and (µ, σ 2 ) are mean and variance of the individual features that induce correlation among the outcomes along a given network. The parameter sets are listed in increasing order or average marginal mean square error expected under unconstrained optimal rerandomizations. The first set of parameter values gives the values used to generate the outcomes. Figure 4 shows the resulting mean sqartuare errors (mean ± 2 standard deviations) for the six randomization strategies (same color scheme as above). The results suggest a few observations. The balanced complete randomization strategy (in gray) is insensitive to the changes in the model since it does not rely on any aspect of it for assigning treatment. The two balanced rerandomization strategies (5% in red, 20% in orange) select allocations based on their conditional mean square error, which is computed using misspecified parameters; these strategies suffer in settings where both the parameter that controls the bias µ is wrongly assumed to be negligible and the parameter that controls the variability in the outcomes γ is wrongly assumed to be much bigger than its real value. The balanced unbiased rerandomization strategy (in green) is insensitive to misspecification; it disregards the variance components of the mean square error, thus eliminating sensitivity to the misspecification of γ and σ, and it selects allocations that zero out the term δ N in Equation 6, thus eliminating any potential adverse effects due to the misspecification of µ. The balanced unbiased optimal rerandomization strategy (in blue) is generally robust to parameter misspecification, while achieving low mean square error. Interestingly, the optimal unconstrained rerandomization strategy (in black), which de-Set no. spite parameter misspecifications achieves the lowest mean square error, in settings 8-9 suffers from trading too little bias (which equals µ · δ N ) for variance, since it wrongly assumes a high value for µ, and thus looses its advantage over the balanced unbiased optimal rerandomization strategy.

D Analytical derivations
Notes on the appendix. As in the main text, A will denote the extended adjacency matrix of the graph. That is, it is the adjacency matrix with all diagonal terms equal to 1. For clarity, we will distinguish all expectations, variances and covariances with respect to the randomization distribution, denoted by E Z , V Z , Cov, from the expectations, variances and covariances with respect to the model, denoted by E Θ , V Θ , Cov Θ . Expectations, variances and covariances without subscripts are to be understood as joint operations over the randomization distribution and the model.

D.1 Model for the observed data for the normal-sum model
We first derive the correlation between control potential outcomes It follows that It then follows that the observed model can be written D.2 Derivation of the conditional mean square error for the normalsum model It follows from the calculations in Appendix D.1 that: where ω(Z) = Z/N 1 − (1 − Z)/N 0 . So we have shown that: It is quick to derive that: . Thus, we immediately have: which gives the following conditional mean square error: D.3 Derivation of the marginal mean square error for the normalsum model The purpose of this section is to derive the analytical expression for: We start by noticing that: so Bias(τ ) = 0. Also, we have: So we have: So we can focus on this quantity. We have I'll start with E Θ [S 2 1 ] and we'll see that it's actually equal to E Θ [S 2 0 ]. Also, I'll write Y i = Y i (1) and Y =Ȳ (1) in order to simplify the notation. Now, let: Remember that we have: and Now note that: and that: which leads to: Clearly, none of the above would change for S 2 0 since the τ 's cancel out. So finally have: Note: a simple sanity check is to look at what would happen if there was no network. That is, if |N i | = 1 for all i, and |N i ∩ N j | = 0 for all i = j. The above formula then reduces to ( 1 N0 + 1 N1 )(γ 2 + σ 2 ), which is correct. This suggests a refactorization of the equation above: where: is the variance term in the absence of a network, and is the variance term correction when a network structure is present. So in conclusion, we have: which completes the proof.

D.4 Analysis of the difference-in-means estimator under the normalmean model
We consider the normal-means model, as an alternative: It easy to verify that for all Z, we have: E Θ [τ |Z] = 0. Then, as in the sum model, the variance can be expressed as: where: whereÃ is the matrix such thatÃ ij = A ij /|N i |. And so finally Which we can write in longer form as: The first term penalizes, as before, imbalance in the sizes of the treatment groups. The last three terms look a lot like what we had with the sum-model, except that we now have weighted averages. with more painful algebra, we can derive the marginal mean square error, and show that it is: The different terms of the equation can once again be used as new measures of balance that are functions of network quantities, although the interpretation is slightly more involved.
D.5 Analysis of the maximum likelihood estimator under the normalsum model The naive estimator does not make any reference to the network. This is not the case for the MLE, and we need to introduce a distinction between true and observed network. Let A 0 be the adjacency matrix associated with the true unobserved network, and A be the adjacency matrix associated with the noisy observed network. The model we use will be based on the observed network, while the evaluation will be with respect to the true network. We have shown in the observed model of (23) that the observed outcomes are jointly multivariate normal. Let v = Aµ + τ Z obs be the mean, and let Σ = γ 2 I + σ 2 AA be the variance. We also denote by Σ 0 = γ 2 I + σ 2 A 0 A 0 the variance based on the true covariance matrix. Finally, define µ * = Aµ and µ 0 = A 0 µ. With this, standard results show that: Remark 2. In all the simulations, we plug the true value of µ, σ, and γ in the mle, in order to be consistent with what we assume known at design-time when we compare it with the other methods.
but then under the true model, we have: and so the bias is: The variance is quickly derived: and so finally: E Proofs E.1 Proof of Corollary 3

Statement of Corollary 3
Letτ be the estimator defined in (1). We have, Proof. The key intuition for the proof is that Z b ∩ Z o ⊂ Z b and that the assignments that are in Z b and not in Z b ∩ Z o have large model mean square error.
Notice that since Bias Z b ∩Z o (τ |Y ) = Bias Z b (τ |Y ) = 0, we have: and similarly which concludes the proof.

E.3 Proof of Theorem 2
There are four parts to this theorem: we must show that the estimator is unbiased under the uniform designs on Z b , Z b ∩ Z u , Z b ∩ Z o , and Z b ∩ Z u ∩ Z o . Th proofs follow the same general ideas, so we will skip the details whenever the proofs are similar. Proof. This proof could be carried exactly as above. The longer proof that we use introduces concepts that will be reused in most of the following proofs, but in a simple scenario. By definition we have: Now, introduce for all i the sets: And since we also have: we conclude that: Which implies that for all i. We can then write: which completes the proof.
(iii) Unbiasedness under the uniform distribution on Z b ∩ Z o Proof. It is clear from the previous proof, that the key element of the proof is to show that: for all i. For this, we by start proving that: By the Lemma, we haveτ (Z * ) = 2τ −τ (Z). Now, let Z ∈ Z b ∩ Z o , we then have: ≤ q M SE α which means that Z * ∈ Z b ∩ Z o . So we have proved that: The rest of the proof unfolds exactly as in the proof of (i).
(ii) Unbiasedness under the uniform distribution on Z b ∩ Z u Proof. Here again, the key is to show that: Let Z ∈ Z b ∩ Z u , then by definition, we have: Bias(τ , τ |Z) = E[τ (Z) − τ |Z] = 0 (50) but then, since Z b ∩ Z u ⊂ Z b , by the Lemma, we have: = −Bias(τ |Z) = 0 which implies, that Z * ∈ Z b ∩ Z u . So we have proved that: The rest of the proof follows as in the previous two proofs.
(iv) Unbiasedness under the uniform distribution on Z b ∩ Z u ∩ Z o This proof is exactly the same as the previous one. We will just prove the fact that if Z b ∩ Z u = ∅, then Z b ∩ Z u ∩ Z o contains at least two elements. The proof is simple: Let Z ∈ Z b ∩ Z u . There exists a Z 0 that minimizes the mean square error on the set Z b ∩ Z u and so Z 0 ∈ Z b ∩Z u ∩Z o . But we have shown in the proof of (i) that for Z ∈ Z b , we have MSE(τ , τ |Z) = MSE(τ |Z * ). And so since Z 0 ∈ Z b ∩ Z u ∩ Z o ⊂ Z b , we have: which means that Z * ∈ Z b ∩ Z u ∩ Z o , and so |Z b ∩ Z u ∩ Z o | ≥ 2.