How Much Information is Needed to Infer Reticulate Evolutionary Histories?

Phylogenetic networks are a generalization of evolutionary trees and are an important tool for analyzing reticulate evolutionary histories. Recently, there has been great interest in developing new methods to construct rooted phylogenetic networks, that is, networks whose internal vertices correspond to hypothetical ancestors, whose leaves correspond to sampled taxa, and in which vertices with more than one parent correspond to taxa formed by reticulate evolutionary events such as recombination or hybridization. Several methods for constructing evolutionary trees use the strategy of building up a tree from simpler building blocks (such as triplets or clusters), and so it is natural to look for ways to construct networks from smaller networks. In this article, we shall demonstrate a fundamental issue with this approach. Namely, we show that even if we are given all of the subnetworks induced on all proper subsets of the leaves of some rooted phylogenetic network, we still do not have all of the information required to completely determine that network. This implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history. We also discuss some potential consequences of this result for constructing phylogenetic networks.

Modern systematics assumes a tree as an integral component of the evolutionary model (Penny et al. 1992). However, genome science is also delivering a level of complexity previously under appreciated for many biological systems and organisms (e.g., Mallet 2007;Abbott et al. 2013;Liu et al. 2013;Muhlfeld et al. 2014). This growing appreciation has in turn motivated the development of phylogenetic networks (see e.g., Huson et al. 2010;Morrison 2011. These networks are a generalization of evolutionary trees and, in the broadest sense, can be any type of graph-theoretical network that is used to represent potentially complex patterns of evolutionary relationship. Some networks, also referred to as data-display or split networks (Dress and Huson 2004;Morrison 2010), attempt only to represent bipartitions or splits in data, and the evidence these splits provide for contradictory relationships. In these networks the internal nodes usually have no explicit meaning. Such networks generalize unrooted evolutionary trees and have been used to visualize homoplasy and detect errors in human sequence data (e.g., Bandelt et al. 2000;Bandelt et al. 2001), for visualizing the support for particular bifurcating trees and hypotheses (e.g., Holland et al. 2005) and for exploring the genetic complexity of plant and animal data sets (e.g., Morrison 2005). There are numerous ways of computing these graphs (e.g., Huson and Bryant 2006). Methods essentially differ in the extent to which they visualize incompatibilities either because of the way they compute the splits and/or because of the dimensionality of the displayed network.
Other networks, also referred to as genealogical networks, are constructed to model evolutionary history wherein the evolution is suspected of being reticulate in nature. These networks, which are the focus of the present study, are typically rooted and contain internal vertices that represent hypothetical ancestors and leaves that represent taxa sampled from the data (extant or extinct). They are directed graphs with a single root vertex and leaves labeled with taxon names (see e.g., Fig. 1 and Mathematical Definitions section). They also contain no directed cycles, thus ensuring that no taxon can be a descendent of itself. In these networks, vertices with more than one parent correspond to taxa that are formed by reticulate evolutionary events such as recombination or hybridization. In particular, a rooted evolutionary tree is a special type of rooted phylogenetic network which does not represent any reticulate evolutionary events. Genealogical networks are reviewed in for example , and have been used to study the evolution of organisms such as plants (Marcussen et al. 2011), viruses (Visser et al. 2012), and bacteria (Kunin et al. 2005).
Various methods have been proposed to construct genealogical phylogenetic networks, although it is generally agreed that there is still much more to be done in this direction (see e.g., Nakhleh 2011; Bapteste et al. 2013). Many of these methods follow a strategy that is also commonly used to build evolutionary trees (e.g., to construct supertrees), namely to infer networks from building blocks such as triplets (evolutionary trees with three leaves) (Huber et al. 2011), evolutionary trees (Kelk et al. 2012;D.Huson andScornavacca 2012), or clusters/clades (van Iersel et al. 2010). However, a fundamental issue with this strategy is that the commonly used building blocks do not necessarily determine or encode networks, in contrast to evolutionary trees. In other words, there can be pairs of rooted phylogenetic networks that do not represent the same evolutionary histories, but still display exactly the same building blocks (see e.g., Gambette and Huber (2012) for triplets and clusters, and Willson (2011) for evolutionary trees). For example, considering the two networks in Figure 1 (i) and (ii), the first one of which is adapted from the network pictured in van Iersel et al. (2009, Fig. 10), which was constructed from a data set of the yeast Cryptococcus gattii (see Hagen et al. 2013, for a following up study). Both networks display the same collection of evolutionary trees (pictured in the Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s)), and therefore the same triplets, but they are not equivalent as networks. This is of importance since it implies that even if all of the building blocks for some reticulate evolutionary history were to be taken as the input for any given network building method, the method might still output an incorrect history.
To address this problem, it was recently proposed that networks be constructed using a network analog of triplets called trinets (Huber and Moulton 2012). Trinets are rooted phylogenetic networks with three leaves (see e.g., Fig. 1 (iii)-(v)); they can be induced on any three leaves of a rooted network by taking the union of all paths from the root to one of the three leaves, and then removing all vertices that lie above the last vertex that is on all such paths, and suppressing parallel edges (see Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s) for an illustration). For example, in Figure 1 the trinet pictured in (iii) is induced on the three leaves c,e,f of the network pictured in (i). Note that in this example, even though the two networks in (i) and (ii) both induce the trinet pictured in (v), the trinets (iii) and (iv) that they induce on c,e,f are not equivalent. In particular, it follows that the networks in (i) and (ii) are also not equivalent. Thus, considering trinets could hold some promise for distinguishing between networks, especially since some special types of rooted phylogenetic networks (e.g., level-1, level-2, and tree-child) are in fact encoded by their trinets (see e.g.,Huber and Moulton 2012; van Iersel and Moulton 2014).
Even so, when trying to extend these results on trinet encodings to more general networks we were somewhat surprised to discover that trinets do not necessarily encode networks. Indeed, more generally, in this article we shall show that even if we are given the networks induced on all subsets of the leaves of a network except for the leaf-set itself, (which includes all possible trinets), we still do not necessarily have enough information to encode the network. More specifically, for any set X of taxa of size at least three, we shall present an example of two nonequivalent rooted, binary phylogenetic networks with leaf set X which both induce exactly the same network on any subset Y of X with Y = X (see Theorem 2). As an illustration, we present these two networks in the case that X has four elements in Figure 2. In addition, in the Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s), we show that these networks also induce exactly the same set of evolutionary trees. Hence, even knowing all of the induced networks together with all of the induced trees for each of these two networks is still not enough information to distinguish between them.
Our examples were inspired by some results due to Thatte concerning the reconstructability of the socalled pedigree graphs (Thatte 2008), which are used to represent ancestral relationships between individuals in a population. Thatte was able to show that a pedigree cannot in general be reconstructed from the collection of its proper subpedigrees. Although this result is similar in nature to ours, it is not a simple corollary, as pedigree graphs have quite a different structure to phylogenetic networks (e.g., a pedigree graph can have multiple roots or "founders" and all other vertices have two parents). Moreover, Thatte's concept of a subpedigree is different from our concept of a network induced on a subset of a network's leaves. Intriguingly, both Thatte's and our results are somewhat related to the Kelly-Ulam reconstruction conjecture that states that a graph is uniquely determined by all of its subgraphs. This conjecture is still open, although for directed graphs it is known to be false (see e.g., Stockmeyer 1977). Even so there are again important mathematical distinctions between graphs in general and phylogenetic networks and pedigrees (e.g., graphs are not labeled by a set of taxa and the concept of a subgraph is different from an induced network).
The contents of the rest of the article are as follows. First we present some mathematical preliminaries on phylogenetic networks and also some terminology concerning binary sequences which will be key for constructing our examples. Then, given any leaf set of size at least three we present an example of two distinct nonbinary, rooted phylogenetic networks having the same leaf set which both induce exactly the same network on any proper subset of their leaves. These were the first examples that we discovered, and at the time we were uncertain as to whether or not there could be examples of binary networks with this property, as there are various mathematical results in phylogenetics that hold for binary trees/networks but not for nonbinary ones. However, by adapting our nonbinary networks we are also able to construct two binary networks with the same property. Since the proof of this fact follows the same approach to that for the nonbinary case but is considerably more technical, we shall present this in the Appendix. We conclude with a brief discussion of some ramifications and future directions as well as some potential consequences of our results for constructing reticulate evolutionary histories.

Digraphs
The basic graph-theoretical structure that underlies the phylogenetic networks in this article is called a digraph. This is a connected, directed graph G consisting of a set of vertices V(G) representing taxa (both hypothetical and sampled) and a set E(G) of directed edges or arcs that join pairs of them. We denote an arc starting at vertex u and ending at vertex v by (u,v), and call u a parent of v and v a child of u. This represents the fact that u is a direct ancestor of v. The in-and outdegree of a vertex v in G is the number of arcs ending and starting at v, respectively. A vertex of G that has outdegree 0 is called a leaf of G (which corresponds to a sampled taxon, either extinct or extant), and the set of all leaves of G is denoted by L(G). Note that vertices with indegree at most one and outdegree at least two represent speciations, whereas those with indegree at least two represent reticulations (e.g., evolutionary events such as hybridization and recombination). If a digraph G has a unique vertex with indegree zero, corresponding to a common ancestor of all of the taxa in question, then that vertex is called the root of G, denoted by (G), and we call G a rooted digraph. If G is rooted and G is a further rooted digraph then we say that G and G are isomorphic (as digraphs) if they are isomorphic in the usual graph-theoretical sense. If, in addition to being isomorphic, every leaf is mapped to itself by the underlying map, then G and G are called equivalent.
A digraph with no directed cycles is called a directed acyclic graph (DAG). For a rooted DAG G, a vertex in G that is neither a leaf nor the root is called an interior vertex of G. In addition, a vertex u in G is called an ancestor of a vertex v in G if u and v are equal (Although this means that in mathematical terms every vertex is considered to be an ancestor of itself, we adopt this mathematical convention as it simplifies the mathematics and is a common assumption in the theory of directed graphs.) or there exists a directed path in G starting at u and ending at v. If u is an ancestor of v but u = v then we say that v is below u. Thus, in a DAG a vertex v can never be below itself, which corresponds to the fact that v cannot be a biological descendent of itself. Furthermore, if G has at least three vertices and v is a vertex with outdegree one then we call v degenerate if indegree of v is not at least two. Finally, we call G binary if the outdegree of (G) is two and the sum of the indegree and outdegree of every interior vertex of G is three.

Phylogenetic Trees and Networks
Suppose for the remainder of the article that X is some (nonempty) set of taxa. A (phylogenetic) network N (on X) is a rooted DAG without degenerate vertices whose set of leaves is X. Unless the phylogenetic network N in question has precisely two vertices, we always assume that the outdegree of the root of N is at least two. Note that a network N that does not contain vertices with indegree two or more is just an evolutionary or phylogenetic tree (on X). As usual, we call a phylogenetic tree in which every leaf is the child of the root a star tree.
Now, suppose that Y is a nonempty subset of the set X of species. We now consider the subnet of N induced by restricting our attention to the leaves in Y. The lowest as the phylogenetic network on Y obtained from N as follows: First, delete all vertices of N (and their incident arcs) that are not on a directed path from LSA(Y) to some element in Y. Next, repeatedly suppress all resulting degenerate vertices (i.e. replace any such vertex v and the two arcs (u,v) and (v,w) containing it by a single arc (u,w)) and remove all parallel arcs until a phylogenetic network on Y is obtained. This definition for a subnet was introduced by Huber and Moulton (2012), and it aims to capture features that can be recovered from data (e.g., all degenerate vertices are suppressed as it would not be possible to decide how many degenerate vertices to include in a reconstructed network). Note that N | X = N if and only if N is recoverable. Also note that every subnet of N induced by restricting N to some nonempty subset of its leaves is necessarily recoverable.
We say that two phylogenetic networks N and N on X are network-equivalent if for every nonempty, proper subset Y of X, the phylogenetic networks N | Y and N | Y are equivalent. Thus, two phylogenetic networks are equivalent if and only if they represent the same evolutionary histories. Note that the following useful observation concerning network-equivalence is an immediate consequence of our definitions.
Lemma 1. Suppose that N 1 and N 2 are two recoverable phylogenetic networks on X. If N 1 and N 2 are equivalent, then N 1 and N 2 are network-equivalent.

Binary Sequences
All of our networks will be constructed using special types of binary sequences, that is, sequences over the alphabet {0,1}. We use binary sequences since they provide a convenient way to encode the vertices of certain phylogenetic trees that will be relevant to our constructions.
As our examples rely on using some special types of binary sequences we now introduce some general terminology concerning such sequences. Suppose that n is a nonnegative integer. We denote by l(w) the length of a binary sequence w. We let ∅ denote the empty sequence, that is, the unique sequence with length 0, let w k,n be the binary sequence of length n with 0's in all but the k-th place, 1 ≤ k ≤ n, and let 0 n , 1 n be the binary sequences of length n consisting of all 0's and all 1's, respectively. We also let B n denote the set of all binary sequences that have length n. Note that B 0 = {∅}. Now, assume n ≥ 1. For each sequence w in B n and all 1 ≤ i ≤ n, we denote by [w] i the i-th letter of w starting from the left. We define the weight of w as n i=1 [w] i , that is, the number of 1's contained in w. Moreover, for each sequence w ∈ B n , we define the support supp(w) of w to be the subset of {1,2,...,n} consisting of all indices i ∈ {1,...,n} with [w] i = 1. Finally, we denote by B 1 n and B 2 n the subsets of B n consisting of sequences whose weights are odd and even, respectively. Note that we will assume that 0 n is the only sequence in B n that is contained in neither B 1 n nor B 2 n . Thus, |B 1 n |=2 n−1 while |B 2 n |=2 n−1 −1. As an illustration of these definitions, w 2,3 = 010, the weight of the sequence 011 is 2 and its support is {2,3}, and B 1 3 = {001,010,100,111}, B 2 3 ={110,101,011}. Now, suppose that w and w are two binary sequences. Then w is called a prefix of w if l(w ) ≤ l(w) and [w ] i =[w] i holds for all 1 ≤ i ≤ l(w ). Note that the empty sequence is a prefix of every binary sequence. Also, if B is a set of binary sequences and w ∈ B, then we call a sequence w ∈ B −{w} a precursor of w (in B) if w is a prefix of w, that is, a prefix of w is a precursor of w in B if and only if this prefix is contained in B −{w}. And if, in addition, every precursor of w in B other than w is also a precursor of w in B, then we say that w is the maximal precursor of w (in B). Note that if this exists, then it is unique. Finally, we call w a common precursor of B if, for every sequence w ∈ B −{w}, w is a precursor of w in B.

Nonbinary Network Examples
In this section we shall present two nonbinary, phylogenetic networks N 1 and N 2 on an arbitrary set X with at least three elements that are not equivalent and prove that they are network-equivalent.
We begin by defining two rooted DAGs D 1 and D 2 from which we will obtain N 1 and N 2 , respectively. Let n ≥ 3, let X ={x 1 ,...,x n }, and let Y ={y 1 ,...,y n } be a set such that X ∩Y =∅. For i = 1,2, associate to X and Y the rooted DAG D i with vertex set X ∪Y ∪B i n ∪{ i }, and arc set comprising of (i) for all u ∈ B i n the arcs ( i ,u), (ii) for all 1 ≤ j ≤ n the arcs (y j ,x j ), and (iii) for all 1 ≤ j ≤ n and u ∈ B i n the arcs (u,y j ) if and only if [u] j = 1. Note that X is the set of leaves of D i and i is the root. We illustrate these DAGs for n = 4 in Figure 3 and list the binary sequences that label the vertices in B 1 n and B 2 n in its caption. To obtain the phylogenetic networks N 1 and N 2 we just suppress 106 SYSTEMATIC BIOLOGY VOL. 64 all degenerate vertices of D 1 and D 2 , respectively. Note that both N 1 and N 2 are recoverable because we have LSA(X) = 1 in N 1 and LSA(X) = 2 in N 2 .
We now prove the first of our main results.
Theorem 1 For every n ≥ 3, the networks N 1 and N 2 are not equivalent. However, N 1 and N 2 are network-equivalent.
Proof . To see that N 1 and N 2 are not equivalent note that B 1 n ∩B 2 n =∅ and that sequence 1 n is contained in B 1 n ∪ B 2 n . Consequently, there exists a child of the root of N 1 or N 2 (but not both) that has outdegree n. Thus, N 1 and N 2 cannot be equivalent. We next show that N 1 and N 2 are network-equivalent. Let k ∈{1,...,n} and put X k = X −{x k }, Y k = Y −{y k }. Note that N 1 | X k and N 2 | X k are recoverable as they are subnets of N 1 and N 2 , respectively. In view of Lemma 1, it therefore suffices to show that N 1 | X k and N 2 | X k are equivalent.
To this end, for any k ∈{1,...,n}, we associate for i = 1,2 a rooted DAG D k i to D i with the leaf x k removed. In particular, we define D k i to be the rooted DAG with leaf set X k obtained from D i by first deleting all arcs from D i that do not lie on a path from the root i of D i to a leaf in X k and then removing all resulting isolated vertices. Note that For brevity, for the rest of this proof, we let V k i and E k i denote the vertex set and edge set of D k i , respectively, and we put w k = w k,n . We define a map k from V k 1 to V k 2 as follows. Let ϕ k = ϕ k,n denote the map from B n to B n that "flips" precisely the k-th letter of a sequence in B n , that is the map given by, for all w ∈ B n , putting [ϕ k (w)] j =[w] j for j ∈{1,...,n}−{k}, and [ϕ k (w)] j = 1−[w] j for j = k. Note that w k ∈ B 1 n and supp(ϕ k (w k )) =∅. Moreover, the map ϕ k induces a bijection ϕ k from B 1 n −{w k } to B 2 n . Using this bijection, we now define the map k by putting, for v ∈ V k 1 , We shall show that this map is a bijection from V k 1 to V k 2 that extends to an isomorphism from D k 1 to D k 2 which maps every element in X k to itself. This implies that N 1 | X k and N 2 | X k are equivalent.
Clearly, k maps every element in X k to itself. To see that k is a bijection, note that since w k is the only element in B 1 n that is contained in V(D 1 ) but not V k 1 , we have B 1 n −{w k }⊆V k 1 . Combined with the fact that B 2 n ⊆ V k 2 also holds and that ϕ k is a bijection, it follows that k is a bijection.
To see that k induces an isomorphism between D k 1 and D k 2 it suffices to show for all v,w ∈ V k 1 , that (v,w) ∈ E k 1 if and only if ( k (v), k (w)) ∈ E k 2 . In view of k (u) = u holding for all u ∈ V k 1 −B 1 n and ( i ,w) ∈ E(D i ) holding for all w ∈ B i n , it follows that we may restrict our attention to showing that for all j ∈{1,...,n}−{k} and all w ∈ B 1 n −{w k } we have that (w,y j ) ∈ E k 1 if and only if ( k (w), k (y j )) ∈ E k 2 . So let j ∈{1,...,n}−{k} and w ∈ B 1 n −{w k }. Assume first that (w,y j ) ∈ E k 1 . Then [w] j = 1 and so [ k (w)] j = [ϕ k (w)] j = 1 as k = j. Thus, ( k (w), k (y j )) = ( k (w),y j ) ∈ E k 2 as k (y j ) = y j . Conversely, assume that ( k (w), k (y j )) ∈ E k 2 . Then since k (y j ) = y j we have [ϕ k (w)] j =[ k (w)] j = 1, and, hence [w] j = 1 in view of j = k. Thus, (w,y j ) ∈ E k 1 , as required. FIGURE 4. Constructing the network H 1 from the network D 1 in the case |X|=4. At each stage, we indicate those vertices that have been inserted by unfilled circles. i) The network D 1,1 in which the root has been replaced by the tree P 4 . ii) The network D 1,2 with the vertices in the middle layer indicated by squares. The tree R 1 is indicated in bold. iii) The network D 1,3 . The tree C supp(w) associated to the binary sequence w = 1110 is indicated in bold. iv) The network H 1 obtained by suppressing all vertices in D 1,3 having indegree and outdegree equal to one.

Binary Network Examples
We now extend the definitions of the networks D 1 and D 2 defined in the previous section so as to define two binary phylogenetic networks H 1 and H 2 that are not equivalent, but which are network-equivalent. We shall just present the definitions of H 1 and H 2 ; the proof of their network-equivalence is quite technical and can be found in the Appendix.
Let n ≥ 3 and i ∈{1,2}. Starting with the rooted DAGs D i defined in the previous section, we shall define a sequence of three rooted DAGs all having leaf set X, the last one of which will yield H i . We illustrate this process in Figure 4, for the rooted DAG D 1 depicted in Figure 3.
Step 1: We begin by replacing the star tree containing the root vertex of D i by a tree with leaf set B i n that is a subtree of a certain tree P n which is defined as follows. Let A n be the set of all binary sequences with length at most n. The tree P n is the rooted tree with vertex set A n and arc set consisting of all pairs (w,w ) ∈ A n ×A n for which w is the maximal precursor of w in A n . Note that the common precursor of A n is clearly the root of P n . In addition, since each sequence w ∈ A n is the maximal precursor of exactly two sequences in A n if w ∈ B n , and is not the maximal precursor of any sequence in A n if w ∈ B n , it follows that P n is a binary phylogenetic tree on B n . We depict the tree P 3 in Figure 5(i). Now, we replace the subgraph of D i with vertex set consisting of the root i of D i and the children of i (i .e. the star tree on B i n with root i ) by the (necessarily binary) restriction P n | B i n of P n to B i n . Let D i,1 denote the resulting rooted DAG (see e.g., Fig. 4(i)).
Step 2: We now replace each of the vertices y j , 1≤ j ≤ n, in the "bottom layer" of D i,1 by a tree R j . This tree is defined by reversing the direction of all arcs in the tree 108 SYSTEMATIC BIOLOGY VOL. 64 obtained by restricting the tree P n to the set B i n,j of all binary sequences in B i n whose j-th letter is 1. Note that the unique leaf of R j is y j since for all 1 ≤ j ≤ n the source set of R j is B i n,j and n j=1 B i n,j = B i n which is the leaf set of P n | B i n . Now, note that, for all 1 ≤ j ≤ n, the indegree of y j in D i is |B i n,j |=2 n−2 and that therefore the indegree of y j in D i,1 is also 2 n−2 . We now replace for all 1 ≤ j ≤ n the subgraph of D i,1 induced on the set {y j }∪B i n,j by R j . Let D i,2 denote the resulting rooted DAG (see e.g., Fig. 4(ii)).
Step 3: The final stage of the construction involves replacing each of the vertices in the "middle layer" of D i,2 with another phylogenetic tree which is defined as follows.
Let A ={a 1 ,...,a k }, k ≥ 1, denote a set of positive integers with a 1 < ··· < a k . If k = 1, then we denote by C A the phylogenetic tree whose unique leaf is labeled by the sole element in A. More generally, for k ≥ 2 we denote by C A the (up to equivalence) unique binary phylogenetic tree on A such that, over all non-leaf vertices v of C A , the collection of leaves below v is 1≤j<k {{a j ,a j+1 ,...,a k }}. Note that C A is an example of a rooted caterpillar tree (see e.g., Semple and Steel 2003). In Figure 5(ii) we present the tree C A for A ={1,2,3,5,7}. Now, any nondegenerate vertex w of D i,2 is not binary if and only if w ∈ B i n and |supp(w)| > 2. Therefore, we shall consider vertices in B i n whose support has size at least three. We shall replace all such vertices w by a rooted tree that is derived from the tree C supp(w) as follows (essentially, we replace w and its outgoing arcs by a rooted caterpillar whose leaves are the children of w). Put Y w := {y t ∈ Y : t ∈ supp(w)}. Then, since for all 1 ≤ j < l ≤ n the trees R j and R l defined in Step 2 do not share an interior vertex in D i,2 , it follows that for every child w of w there exists a unique vertex y ∈ Y w below w . For all t ∈ supp(w) let a t denote the child of w in D i,2 that lies on the path from w to y t so that, in particular, the set of children of w is {a t : t ∈ supp(w)}. To obtain the final digraph D i,3 in our sequence, we replace, for each w ∈ B i n with |supp(w)| > 2, the subgraph of D i,2 induced on the set consisting of w and its children by the tree obtained from C supp(w) by replacing each of its leaves t ∈ supp(w) by the corresponding child a t of w and replacing its root by w (see e.g., Fig. 4(iii)).
The phylogenetic network H i is now defined to be the rooted DAG obtained from D i,3 by suppressing all degenerate vertices (see e.g., Fig. 4(iv)). Note that, by construction, the leaf set of H i is X and H i is binary. Also note that H i is recoverable.
The proof of our second main result is quite technical and is given in the Appendix.
Theorem 2 For every n ≥ 3, the binary phylogenetic networks H 1 and H 2 are not equivalent. However, H 1 and H 2 are network-equivalent.

DISCUSSION
Our examples illustrate a problem with generalizing evolutionary models from rooted trees to rooted networks. We show that there are pairs of phylogenetic networks on an arbitrary set of taxa that are not equivalent, and yet display the same set of evolutionary trees (see Supplementary Material available on Dryad (http://dx.doi.org/10.5061/dryad.f6n8s) for additional details), as well as the same set of induced subnetworks. Although these examples are artificial in their construction, they still point to the possibility that this phenomenon could arise in nature, especially since phylogenetic networks can be extremely complex (see e.g., Kunin et al. 2005;Dagan et al. 2008).
The problem that we have presented has some potential ramifications for the development and use of new methods for constructing networks that explicitly represent evolution. First, as mentioned in the introduction, it implies that in practice we will have to be careful to ensure that the output from any network construction method is uniquely determined by its input. This in itself is not necessarily a great problem since even when we construct phylogenetic trees there can be multiple solutions (e.g., there can be several most parsimonious trees). Second, given that we know that there are cases where a network cannot be uniquely recovered from all of its induced subnetworks, it becomes important to characterize under which conditions these cases will be manifested, and we should try to understand how often biological data will 109 actually meet these conditions. Finally, in the context of extending consensus and supertree methods to include phylogenetic networks, our result shows that, unlike trees, it will not be possible to develop supernetwork methods in general that are consistent, that is methods that are guaranteed to output a given network from all of its induced subnetworks. However, again it will be interesting to better understand how important this will actually be in practice.
Even though we have found that networks are not necessarily encoded by their induced subnets in general, some classes of networks are. For example, level-2 networks and thus also phylogenetic trees and level-1 networks are encoded by their trinets (note that the level of a binary phylogenetic network is the maximum number of indegree-2 vertices taken over all biconnected components of the network) (van Iersel and Moulton 2014). Hence, it might be of interest to determine which types of networks are encoded by their induced subnets and also to possibly concentrate on developing methods to construct these special types of networks. Note that various methods have already been designed to construct special types of networks (see e.g., Willson 2012), but this obviously requires some care to ensure that the properties of the networks under consideration are realistic enough to represent real data. Note also that the level of the networks in our examples is exponential in |X| (it is (2 n−2 −1)n with n =|X|). Hence, it could be of interest to decide whether networks with reasonably low level relative to the size of their leaf set (e.g., linear level as function of |X|) are encoded by their subnets.
Even if we are not necessarily able to encode a network by its subnets, it could still be of interest to investigate whether at least some parameters (e.g., the number of reticulation vertices) can be determined or at least approximated by the knowledge of their induced subnets (or even trees). In addition, it could be useful to decide whether or not networks might be encoded if more information is available (e.g., if we are given branch lengths/dates for vertices or some model of evolution). Note that Thatte and Steel investigated reconstructability of pedigrees assuming a certain probabilistic model and were able to prove some encoding results for pedigrees in general (see e.g., Thatte and Steel 2008;Thatte 2013), so analogous results might also hold for phylogenetic networks.
There are some related mathematical problems that are also worth mentioning. It has been shown that a graph drawn uniformly at random is encoded by its subgraphs with probability 1, as the size of the vertex set goes to infinity (see e.g., Bollobás 1990). It would be interesting to work out the probability that a randomly selected phylogenetic network is encoded by its induced subnets. This might also provide some clues about whether or not networks arising in practice could be expected to be encoded by induced subnets or not. In particular, the aforementioned probabilistic result suggests that maybe networks on large sets of taxa that are not encoded by their induced subnetworks might be quite rare in practice. In addition, an interesting algorithmic question is the following: if we are given a phylogenetic network, can we decide efficiently if it is uniquely encoded by its induced subnets? And, if we are given a set of networks, can we efficiently decide if they are induced subnets of some network?
In conclusion, even if there may be more than one network that can induce the same set of trees and/or subnetworks, it is still useful to find ways to construct these networks so that alternative evolutionary scenarios can be explored. This has already proven a useful strategy in phylogenetics (for example, understanding the number of reconciliations of a gene tree with a species tree (Bansal et al. 2013)). In regards to this, it would be interesting to develop ways to determine how many networks can potentially display the same set of subnetworks. More generally, a better understanding of the structure of networks in terms of substructures could also give us a better understanding of the performance of current methods for network construction, and will hopefully also eventually help us to design new methods for confidently recovering reticulate evolutionary histories.

APPENDIX
In this appendix we prove Theorem 2. Assume that n is a positive integer and that k ∈{1,...,n}. We will use the following lemmas concerning the trees P n and C A used in the construction of H 1 and H 2 . For brevity, for any sequence w ∈ B n with nonempty support (with the natural order), we put C w = C supp(w) . Also and as in the proof of Theorem 1, we let ϕ k : B n → B n be the map that "flips" precisely the k-th letter of a sequence in B n .
Proof . Put w k = w k,n . Let w ∈ B n −{0 n ,w k }. Then neither supp(w) =∅ nor supp(w) ={k} holds. This implies that the trees C w and C ϕ k (w) are both well-defined. Let w denote the unique sequence in B n whose support is supp(w)−{k}, which is clearly not the empty set. Then the tree C w is also well-defined and, as is straightforward to see, it is isomorphic to C w | supp(w ) . In view of ∅ = supp(w ) ⊆ supp(ϕ k (w)), we also have that We now establish a similar result for the tree P n . For A n as defined in Step 1 of the construction, a set A ⊆ A n of binary sequences is called complete if it contains a (necessarily unique) common precursor. Given such a set, generalizing the definition of P n , we let P[A] denote the directed tree with vertex set A and arc set the set of all arcs (w,w ) ∈ A×A for which w is the maximal precursor of w in A. Note that P n = P[A n ].
(i) The trees P n | B 1 n −{w n,k } and P n | B 2 n are isomorphic. (ii) For all j ∈{1,...,n}−{k}, the trees P n | B 1 n,j and P n | B 2 n,j are isomorphic.
Proof . Let A * be the set of all binary sequences of finite length. For all l ≥ 1, define the map l : A * → A * by l : A * → A * : w → ϕ l,n (w) if w ∈ B n for some l ≤ n, w else. Now, note that if a subset A ⊆ A n is complete, then the set l (A) is complete since, for any two distinct sequences w,w ∈ A, we have that w is a precursor of w in A if and only if l (w) is a precursor of l (w ) in l (A). Moreover, for l = k the map k induces an isomorphism between the trees P[A] and P[ k (A)]. In particular, k induces, for every subset B ⊆ B n , an isomorphism between P n | B and P n | k (B) .
Both statements in the lemma now follow in view of the fact that l (B 1 n −{w k,n }) = ϕ l (B 1 n −{w k,n }) = B 2 n holds for all 1 ≤ l ≤ n and l (B 1 n,j ) = B 2 n,j holds for all 1 ≤ j,l ≤ n with j = l.
We now prove Theorem 2. We first show that H 1 and H 2 are not equivalent. Indeed, let i ∈{1,2}. If v is a vertex in D i then the set of leaves below v equals X if and only if v is the root of D i or v = 1 n . Since if m is even 1 m ∈ B 2 m and if m is odd 1 m ∈ B 1 m , the number of vertices v in D i for which the set of leaves below v equals X is different in D 1 and D 2 . Thus, H 1 and H 2 cannot be equivalent.
We now prove that H 1 and H 2 are network-equivalent. Let X k = X −{x k } and Y k = Y −{y k }. Note that the subnets H k 1 = H 1 | X k and H k 2 = H 2 | X k are recoverable as they are subnets of H 1 and H 2 , respectively. Hence, in view of Lemma 1, it suffices to show that H k 1 and H k 2 are equivalent. Now, put D i,0 = D i . For 0 ≤ j ≤ 3, let D k i,j = (V k i,j ,E k i,j ) denote the rooted DAG with leaf set X k obtained from D i,j by first deleting all arcs from D i,j that do not lie on a path from the root of D i,j to a leaf in X k and then removing the resulting isolated vertices. Note that D k i,0 = D k i and that H k i is the phylogenetic network on X k obtained from D k i by suppressing all degenerate vertices. By Lemma 3(i), there exists a bijection k 1 : V k 1,1 → V k 2,1 that extends to an isomorphism from D k 1,1 to D k 2,1 such that, for all v ∈ V k 1,1 , we have k and k 1 (v) = k (v) else where k is the bijection from V k 1 to V k 2 given in the proof of Theorem 1. Using Lemma 3(ii), there also exists a bijection k 2 : V k 1,2 → V k 2,2 that extends to an isomorphism from D k 1,2 to D k 2,2 for which k 2 (v) = k 1 (v) holds for all v ∈ V k 1,1 . Moreover, by Lemma 2, there also exists a bijection k 3 : V k 1,3 → V k 2,3 that extends to an isomorphism from D k 1,3 to D k 2,3 for which k 3 (v) = k 2 (v) holds for all v ∈ V k 1,2 . Consequently, H k 1 and H k 2 must be equivalent, as required.

SUPPLEMENTARY MATERIAL
Data available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.f6n8s. FUNDING L.vI. was supported by a Veni grant from The Netherlands Organisation for Scientific Research (NWO).