GLEE: Geometric Laplacian Eigenmap Embedding

Graph embedding seeks to build a low-dimensional representation of a graph G. This low-dimensional representation is then used for various downstream tasks. One popular approach is Laplacian Eigenmaps, which constructs a graph embedding based on the spectral properties of the Laplacian matrix of G. The intuition behind it, and many other embedding techniques, is that the embedding of a graph must respect node similarity: similar nodes must have embeddings that are close to one another. Here, we dispose of this distance-minimization assumption. Instead, we use the Laplacian matrix to find an embedding with geometric properties instead of spectral ones, by leveraging the so-called simplex geometry of G. We introduce a new approach, Geometric Laplacian Eigenmap Embedding (or GLEE for short), and demonstrate that it outperforms various other techniques (including Laplacian Eigenmaps) in the tasks of graph reconstruction and link prediction.


INTRODUCTION
Graphs are ubiquitous in real-world systems from the internet to the world wide web to social media to the human brain. The application of machine learning to graphs is a popular and active research area. One way to apply known machine learning methods to graphs is by transforming the graph into a representation that can be directly fed to a general machine learning pipeline. For this purpose, the task of graph representation learning, or graph embedding, seeks to build a vector representation of a graph by assigning to each node a feature vector that can then be fed into any machine learning algorithm.
Popular graph embedding techniques seek an embedding where the distance between the latent representations of two nodes represents their similarity. For example, Chen et al. [10] call this the "community aware" property (nodes in a community are considered similar, and thus their representations must be close to one another), while Chen et al. [11] call it a "symmetry" between the node domain and the embedding domain. Others refer to methods based on this property by various names, such as "positional" embeddings [43] or "proximity-based" embeddings [21]. Consequently, many of these approaches are formulated in such a way that the distance (in the embedding space) between nodes that are similar (in the original data domain) is small. Here, we present a different approach. Instead of focusing on minimizing the distance between similar nodes, we seek an embedding that preserves the most basic structural property of the graph, namely adjacency; the works [21,43] call this approach "structural" node embedding. Concretely, if the nodes i and j are neighbors in a graph G with n nodes, we seek d-dimensional vectors s_i and s_j such that the adjacency between i and j is encoded in the geometric properties of s_i and s_j, for some d ≪ n. Examples of geometric properties are the dot product of two vectors (which is a measure of the angle between them), the length (or area, or volume) of a line segment (or polygon, or polyhedron), and the center of mass or convex hull of a set of vectors, among others. In Section 3 we propose one such geometric embedding technique, called Geometric Laplacian Eigenmap Embedding (GLEE), that is based on the properties of the Laplacian matrix of G, and we then compare it to the original formulation of Laplacian Eigenmaps as well as other popular embedding techniques.
GLEE has deep connections with the so-called simplex geometry of the Laplacian [12,15]. Fiedler [15] first made this observation, which highlights the bijective correspondence between the Laplacian matrix of an undirected, weighted graph and a geometric object known as a simplex. Using this relationship, we find a graph embedding such that the representations s_i, s_j of two non-adjacent nodes i and j are always orthogonal, s_i · s_j = 0, thus achieving a geometric encoding of adjacency. Note that this does not satisfy the "community aware" property of [10]. For example, the geometric embedding s_i of node i is orthogonal to the embedding of each non-neighboring node, including those in its community. Thus, s_i is not close to the embeddings of other nodes in its community, whether we define closeness in terms of Euclidean distance or cosine similarity. However, we show that this embedding, based on the simplex geometry, contains desirable information, and that it outperforms the original, distance-minimizing formulation of Laplacian Eigenmaps (LE) on the tasks of graph reconstruction and link prediction in certain cases.
The contributions of this work are as follows.
(1) We present a geometric framework for graph embedding that departs from the tradition of seeking representations that minimize the distance between similar nodes, by highlighting the intrinsic geometric properties of the Laplacian matrix.
(2) The proposed method, Geometric Laplacian Eigenmap Embedding (GLEE), while closely related to the Laplacian Eigenmaps (LE) method, outperforms LE in the tasks of link prediction and graph reconstruction. Moreover, a common critique of LE is that it only considers first-order adjacency in the graph; we show that GLEE takes higher-order connections into account (see Section 3.2).
(3) The performance of existing graph embedding methods (which minimize distance between similar nodes) suffers when the graph's average clustering coefficient is low. This is not the case for GLEE.
In Section 2 we recall the original formulation of LE, in order to define the Geometric Laplacian Eigenmap Embedding (GLEE) in Section 3 and discuss its geometric properties. We review related work in Section 4 and present experimental studies of GLEE in Section 5. We finish with concluding remarks in Section 6.

BACKGROUND ON LAPLACIAN EIGENMAPS
Belkin and Niyogi [3,4] introduced Laplacian Eigenmaps as a general-purpose method for embedding and clustering an arbitrary data set. Given a data set {x_i}_{i=1}^n, a proximity graph G = (V, A) is constructed with node set V = {x_i} and edge weights A = (a_ij). The edge weights are built using one of many heuristics that determine which nodes are close to each other, and they can be binary or real-valued; examples include k-nearest neighbors, ϵ-neighborhoods, and heat kernels. To perform the embedding, one considers the Laplacian matrix of G, defined as L = D − A, where D is the diagonal matrix whose entries are the degrees of the nodes. One of the defining properties of L is the value of its quadratic form,

y^T L y = Σ_{i<j} a_ij (y_i − y_j)².   (1)

The vector y* that minimizes the value of (1) (subject to a normalization constraint) is such that the total weighted distance between all pairs of nodes is minimized. Here, y_i can be thought of as the one-dimensional embedding of node i. One can then extend this procedure to arbitrary d-dimensional node embeddings by noting that tr(Y^T L Y) = Σ_{i<j} a_ij ∥y_i − y_j∥², where Y ∈ R^{n×d} and y_i is the ith row of Y. The objective function in this case is

Y* = argmin_Y tr(Y^T L Y).   (2)

Importantly, the quantity tr(Y^T L Y) has a global minimum at Y = 0. Therefore, a restriction is necessary to guarantee a non-trivial solution. Belkin and Niyogi [3,4] choose Y^T D Y = I, though others are possible. Applying the method of Lagrange multipliers, one can see that the solution of (2) is achieved at the matrix Y* whose columns solve the generalized eigenvalue problem

L y = λ D y.   (3)

When the graph contains no isolated nodes, each such y is an eigenvector of the matrix D^{-1} L, also known as the normalized Laplacian matrix. The embedding of a node j is then the vector whose entries are the jth elements of the eigenvectors y*_1, y*_2, ..., y*_d corresponding to the smallest non-trivial eigenvalues.
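As a concrete reference point, the LE procedure above can be sketched in a few lines of NumPy. The toy path graph, variable names, and dimension below are illustrative assumptions, not the authors' code:

```python
import numpy as np

# Toy proximity graph: a 5-node path with unit edge weights (an assumption).
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

D = np.diag(A.sum(axis=1))   # degree matrix
L = D - A                    # combinatorial Laplacian

# Generalized eigenproblem L y = lambda D y, solved via D^{-1} L
# (valid here because the path graph has no isolated nodes).
vals, vecs = np.linalg.eig(np.linalg.inv(D) @ L)
order = np.argsort(vals.real)

# Drop the trivial constant eigenvector (eigenvalue 0) and keep the next d
# eigenvectors: row i of Y is then the d-dimensional LE embedding of node i.
d = 2
Y = vecs[:, order[1 : d + 1]].real
```

The smallest eigenvalue is 0 (with the constant eigenvector), which is why the embedding starts from the second-smallest one.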

PROPOSED APPROACH: GEOMETRIC LAPLACIAN EIGENMAPS
We first give our definition and then proceed to discuss both the algebraic and geometric motivations behind it.
Definition 3.1 (GLEE). Given a graph G, consider its Laplacian matrix L. Using the singular value decomposition we may write L = SS^T for a unique matrix S. Define S^d as the matrix of the first d columns of S. If i is a node of G, define its d-dimensional Geometric Laplacian Eigenmap Embedding (GLEE) as the ith row of S^d, denoted s_i^d. If the dimension d is unambiguous, we write simply s_i.
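For the positive semidefinite Laplacian, the SVD in Definition 3.1 coincides with the eigendecomposition L = PΛP^T, so the computation can be sketched as S = P√Λ restricted to the d largest eigenvalues. A minimal NumPy sketch on a toy star graph (the example graph and names are our own illustrative choices):

```python
import numpy as np

def glee(A, d):
    """d-dimensional GLEE of the graph with adjacency matrix A (a sketch)."""
    L = np.diag(A.sum(axis=1)) - A
    vals, P = np.linalg.eigh(L)            # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:d]       # keep the d largest eigenvalues
    # Rows of S are the node embeddings; clip tiny negative eigenvalues.
    return P[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# 4-node star graph: node 0 connected to nodes 1, 2, 3.
A = np.zeros((4, 4))
A[0, 1:] = A[1:, 0] = 1.0

S_full = glee(A, 4)                        # full-dimensional embedding
L = np.diag(A.sum(axis=1)) - A
err = np.abs(S_full @ S_full.T - L).max()  # with d = n, S S^T recovers L
```

With d = n the product S S^T recovers L exactly; for d < n it is the best rank-d approximation in Frobenius norm.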
Algebraic motivation. In the case of positive semidefinite matrices, such as the Laplacian, the singular values coincide with the eigenvalues. Moreover, it is well known that S^d (S^d)^T is the rank-d matrix closest to L in Frobenius norm, i.e., ∥L − S^d (S^d)^T∥_F ≤ ∥L − M∥_F for all matrices M of rank at most d. Because of this, we expect S^d to achieve better performance in the graph reconstruction task than any other d-dimensional embedding (see Section 5.1).
As can be seen from Equation (1), the original formulation of Laplacian Eigenmaps minimizes the distance between the embeddings of neighboring nodes, under the restriction Y^T D Y = I. We can also formulate GLEE in terms of the distance between neighboring nodes. Perhaps counterintuitively, GLEE solves a distance-maximization problem, as follows. The proof is a routine application of Lagrange multipliers and is omitted.
Theorem 3.2. Let Λ^d be the diagonal matrix whose entries are the d largest eigenvalues of L. Consider the optimization problem

maximize Σ_{i<j} a_ij ∥y_i − y_j∥²  subject to  Y^T Y = Λ^d.   (4)

Its solution is the matrix S^d whose columns are the eigenvectors corresponding to the d largest eigenvalues of L, scaled by the square roots of those eigenvalues.
The importance of Theorem 3.2 is that it highlights how distance minimization may be misleading when it comes to exploiting the properties of the embedding space. Indeed, the original formulation of Laplacian Eigenmaps, as set up in Equation (2), yields the eigenvectors corresponding to the smallest eigenvalues of L. However, standard results in linear algebra tell us that the best low-rank approximation of L is given by the eigenvectors corresponding to the largest eigenvalues. Therefore, these are the ones used in the definition of GLEE.
Geometric motivation. The geometric reasons underlying Definition 3.1 are perhaps more interesting than the algebraic ones. A recent review paper [12] highlights the work of Fiedler [15], who discovered a bijective correspondence between the Laplacian matrix of a graph and a higher-dimensional geometric object called a simplex.
Definition 3.3 (Simplex). If the points v_0, v_1, ..., v_k ∈ R^n are affinely independent (that is, the set {v_i − v_0 : i = 1, ..., k} is linearly independent), then their convex hull is called a simplex.
A simplex is a high-dimensional polyhedron that generalizes the 2-dimensional triangle and the 3-dimensional tetrahedron. To see the connection between the Laplacian matrix of a graph and simplex geometry we invoke the following result; the interested reader will find the proof in [12,15].

Theorem 3.4. Let Q be a positive semidefinite k × k matrix. There exists a k × k matrix S such that Q = SS^T. The rows of S lie at the vertices of a simplex if and only if the rank of Q is k − 1. □

Corollary 3.5. Let G be a connected graph with n nodes. Its Laplacian matrix L is positive semidefinite, has rank n − 1, and has eigendecomposition L = PΛP^T. Write S = P√Λ. Then L = SS^T and the rows of S are the vertices of an (n − 1)-dimensional simplex, called the simplex of G. □

Corollary 3.5 is central to the approach in [12], providing a correspondence between graphs and simplices. It also shines a new light on GLEE: the matrix S^d from Definition 3.1 corresponds to the first d dimensions of the simplex of G. In other words, computing the GLEE embedding of a graph G is equivalent to computing the simplex of G and projecting it down to d dimensions. We proceed to explore the geometric properties of this simplex that can aid in the interpretation of GLEE embeddings. From [12] we have the following result.

Corollary 3.6. Let s_i be the ith row of S in Corollary 3.5. Then s_i is the simplex vertex corresponding to node i, and it satisfies

s_i · s_j = L_ij, that is, s_i · s_i = deg(i) and s_i · s_j = −a_ij for i ≠ j.

In particular, s_i is orthogonal to the embedding of any non-neighboring node j. □

Corollary 3.6 highlights some of the basic geometric properties of the simplex (such as lengths and dot products) that can be interpreted in graph-theoretical terms (resp., degrees and adjacency). In Figure 1 we show examples of these properties. It is worth noting that other common matrix representations of graphs do not admit a spectral decomposition that yields a simplex.
For example, the adjacency matrix A is not in general positive semidefinite, and the normalized Laplacian D −1 L (used by LE) is not symmetric. Therefore, Theorem 3.4 does not apply to them. We now proceed to show how to take advantage of the geometry of GLEE embeddings, which can all be thought of as coming from the simplex, in order to perform common graph mining tasks. In the following we focus on unweighted, undirected graphs.
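These geometric properties are easy to check numerically. The sketch below (toy graph and names are our own illustrative assumptions) verifies that, in full dimension, squared embedding lengths recover degrees and dot products recover negative adjacency:

```python
import numpy as np

# Toy graph: edges 0-1, 0-2, 1-2, 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Full-dimensional simplex: S = P sqrt(Lambda), rows are simplex vertices.
vals, P = np.linalg.eigh(L)
S = P * np.sqrt(np.maximum(vals, 0.0))

G = S @ S.T                                       # Gram matrix of embeddings
deg_ok = np.allclose(np.diag(G), A.sum(axis=1))   # ||s_i||^2 = deg(i)
adj_ok = np.allclose(G - np.diag(np.diag(G)), -A) # s_i . s_j = -a_ij, i != j
```

In particular, rows of S corresponding to non-adjacent nodes have dot product 0, i.e., they are orthogonal.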

Graph Reconstruction
For a graph G with n nodes, consider its d-dimensional GLEE embedding S^d. When d = n, in light of Corollary 3.6, the dot product between the embeddings s_i, s_j of any two distinct nodes can only take the values −1 or 0, and one can reconstruct the graph perfectly from its simplex. However, if d < n, the distribution of dot products takes on real values around −1 and 0 with varying amounts of noise; the larger the dimension d, the less noise we find around the two modes. It is important to distinguish which pairs of nodes i, j have embeddings s_i, s_j whose dot product belongs to the mode at 0 and which to the mode at −1, for this determines whether or not the nodes are neighbors in the graph. One possibility is to simply "split the difference" and consider i and j as neighbors whenever s_i · s_j < −0.5. More generally, given a graph G and its embedding S^d, define L̂(θ) as the estimated Laplacian matrix obtained with the above heuristic at threshold θ; that is, for i ≠ j,

L̂(θ)_ij = −1 if s_i · s_j < θ, and 0 otherwise.   (5)

Then we seek the value of θ, call it θ_opt, that minimizes the reconstruction loss,

θ_opt = argmin_θ ∥L̂(θ) − L∥_F.   (6)

If all we have access to is the embedding, but not the original graph, we cannot optimize Equation (6) directly. Thus, we have to estimate θ_opt heuristically. As explained above, one simple estimator is the constant θ̂_c = −0.5. We develop two other estimators, θ̂_k and θ̂_g, obtained by applying Kernel Density Estimation and Gaussian Mixture Models, respectively. We do so in Appendix A, as their development has little to do with the geometry of GLEE embeddings. Our experiments show that the different thresholds θ̂_c, θ̂_k, and θ̂_g produce excellent results on different data sets; see Appendix A for discussion.
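A minimal sketch of this reconstruction procedure, using the constant threshold θ̂_c = −0.5 on a random toy graph (the graph, seed, and dimension are illustrative assumptions; with d = n the reconstruction is exact):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 30, 30                                 # d = n: reconstruction is exact
A = (rng.random((n, n)) < 0.15).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # symmetric, no self-loops

# GLEE embedding: d largest eigenpairs of the Laplacian.
L = np.diag(A.sum(axis=1)) - A
vals, P = np.linalg.eigh(L)
idx = np.argsort(vals)[::-1][:d]
S = P[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Predict an edge (i, j) whenever s_i . s_j < theta_c = -0.5.
dots = S @ S.T
A_hat = (dots < -0.5).astype(float)
np.fill_diagonal(A_hat, 0.0)
n_wrong = int(np.abs(A_hat - A).sum())        # misclassified node pairs
```

For d < n one would instead rank pairs by how far the dot product falls below 0, and use one of the estimated thresholds.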

Link Prediction
Since the objective of GLEE is to directly encode graph structure in a geometric way, rather than solve any one particular task, we are able to use it in two different ways to perform link prediction. These are useful in different kinds of networks.

Number of Common Neighbors.
It is well known that heuristics such as the number of common neighbors (CN) or the Jaccard similarity (JS) between neighborhoods are highly effective for the task of link prediction in networks with a strong tendency toward triadic closure [39]. Here, we show that we can use the geometric properties of GLEE to approximately compute CN. For ease of exposition, we assume d = n throughout this section unless stated otherwise.
Given an arbitrary subset of nodes V in the graph G, we denote by |V| its number of elements. We further define the centroid of V, denoted C_V, as the centroid of the simplex vertices corresponding to its nodes, i.e., C_V = (1/|V|) Σ_{i∈V} s_i. The following lemma, which can be found in [12], highlights the graph-theoretical interpretation of the geometric object C_V.

Lemma 3.7 (From [12]). Given a graph G and its GLEE embedding S, consider two disjoint node sets V_1 and V_2. Then the number of edges with one endpoint in V_1 and the other in V_2, denoted e(V_1, V_2), is given by

e(V_1, V_2) = −|V_1| |V_2| (C_{V_1} · C_{V_2}).

Proof. By linearity of the dot product, we have

C_{V_1} · C_{V_2} = (1/(|V_1| |V_2|)) Σ_{i∈V_1} Σ_{j∈V_2} s_i · s_j = −(1/(|V_1| |V_2|)) Σ_{i∈V_1} Σ_{j∈V_2} a_ij.

The expression on the right is precisely the required quantity. □

Lemma 3.7 says that we can use the dot product between the centroids of two node sets to count the number of edges between them. Thus, we now reformulate the problem of finding the number of common neighbors of two nodes in terms of centroids of node sets. In the following, we use N(i) to denote the neighborhood of node i, that is, the set of nodes connected to it.

Lemma 3.8. Let i, j ∈ V be non-neighbors. Then the number of common neighbors of i and j, denoted CN(i, j), is given by

CN(i, j) = −deg(i) (C_{N(i)} · s_j).

Proof. Apply Lemma 3.7 to the node sets V_1 = N(i) and V_2 = {j}, or, equivalently, to V_1 = N(j) and V_2 = {i}. □

Now assume we have the d-dimensional GLEE of G. We approximate CN(i, j) by estimating both deg(i) and C_{N(i)}. First, we know from Corollary 3.6 that deg(i) ≈ ∥s_i^d∥². Second, we define the approximate neighbor set of i as N̂(i) = {k : s_k^d · s_i^d < θ̂}, where θ̂ is any of the estimators from Section 3.1. We can now write

ĈN(i, j) = −∥s_i^d∥² (C_{N̂(i)} · s_j^d).   (10)

The higher the value of this expression, the more confident is our prediction that the link (i, j) exists.

Figure 1: Simplex geometry and GLEE. Given a graph G with n nodes (top row), there is an (n − 1)-dimensional simplex that perfectly encodes the structure of G, given by the rows of the matrix S from Corollary 3.5 (middle row); the examples show a 2D simplex (triangle) and a 3D simplex (tetrahedron). The first d columns of S yield the Geometric Laplacian Eigenmap Embedding (GLEE) of G (bottom row). In each example, embeddings are color-coded according to the node they represent. For n = 3, all nodes in the triangle graph are interchangeable; accordingly, their embeddings all have the same length and subtend equal angles with each other. For n = 4, the green and purple nodes are interchangeable, and thus their embeddings are symmetric. Note that the length of each embedding corresponds to the degree of the corresponding node. For n = 34 we show the Karate Club network [51], in which we highlight one node in green and all of its neighbors in purple. In the bottom-right panel, the dotted line is orthogonal to the green node's embedding. Note that most of the non-neighbors' embeddings (in gray) are close to orthogonal to the green node's embedding, while all neighbors (in purple) are not.
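Lemma 3.8 can be checked numerically. The sketch below uses a 4-cycle, where nodes 0 and 3 are non-adjacent and share two common neighbors (the graph and names are illustrative assumptions):

```python
import numpy as np

# 4-cycle: edges 0-1, 0-2, 1-3, 2-3; nodes 0 and 3 are non-adjacent.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
vals, P = np.linalg.eigh(L)
S = P * np.sqrt(np.maximum(vals, 0.0))   # full-dimensional simplex

i, j = 0, 3
nbrs_i = np.flatnonzero(A[i])            # N(i) = {1, 2}
centroid = S[nbrs_i].mean(axis=0)        # C_{N(i)}
cn_est = -A[i].sum() * (centroid @ S[j]) # -deg(i) * (C_{N(i)} . s_j)

cn_true = int((A @ A)[i, j])             # (A^2)_ij counts common neighbors
```

In full dimension the two quantities agree exactly; with d < n the same formula, computed from ∥s_i^d∥² and N̂(i), becomes an approximation.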

Number of Paths of Length 3.
A common critique of the original Laplacian Eigenmaps algorithm is that it only takes into account first-order connections, which were considered in Section 3.2.1. Furthermore, Kovács et al. [23] point out that the link-prediction heuristics CN and JS lack a solid theoretical grounding for certain types of biological networks, such as protein-protein interaction networks. They propose instead to use the (normalized) number of paths of length three (L3) between two nodes to perform link prediction. We next present a way to approximate L3 using GLEE. This achieves good performance in those networks where CN and JS are invalid, and shows that GLEE can take into account higher-order connectivity of the graph.

Lemma 3.9. Assume S is the GLEE of a graph G of dimension d = n. Then the number of paths of length three between two distinct nodes i and j is

(A³)_ij = Σ_{k ∈ N(i) ∩ N(j)} deg(k) − deg(i) deg(j) (C_{N(i)} · C_{N(j)}).   (11)

Proof. The number of paths of length three between i and j is (A³)_ij, where A is the adjacency matrix of G. By the linearity of the dot product and Corollary 3.6, we have

deg(i) deg(j) (C_{N(i)} · C_{N(j)}) = Σ_{k∈N(i)} Σ_{l∈N(j)} s_k · s_l = Σ_{k ∈ N(i) ∩ N(j)} deg(k) − Σ_{k∈N(i)} Σ_{l∈N(j)} a_kl,

where the last expression is equivalent to (11). □

When d < n, we can estimate deg(i) by ∥s_i^d∥² and N(i) by N̂(i) as before, with the help of an estimator θ̂ from Section 3.1.
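The following sketch checks, on a toy path graph, a centroid-based expression for (A³)_ij consistent with Lemma 3.7: (A³)_ij = Σ_{k∈N(i)∩N(j)} deg(k) − deg(i) deg(j) (C_{N(i)} · C_{N(j)}). In full dimension the identity is exact (the graph and names are illustrative assumptions):

```python
import numpy as np

# Path graph 0-1-2-3.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
deg = A.sum(axis=1)

L = np.diag(deg) - A
vals, P = np.linalg.eigh(L)
S = P * np.sqrt(np.maximum(vals, 0.0))   # full-dimensional simplex

def l3_estimate(i, j):
    # Centroid-based count of length-3 paths between distinct nodes i, j.
    Ni, Nj = np.flatnonzero(A[i]), np.flatnonzero(A[j])
    Ci, Cj = S[Ni].mean(axis=0), S[Nj].mean(axis=0)
    common = np.intersect1d(Ni, Nj)
    return deg[common].sum() - deg[i] * deg[j] * (Ci @ Cj)

A3 = np.linalg.matrix_power(A, 3)
est_03, true_03 = l3_estimate(0, 3), A3[0, 3]
est_01, true_01 = l3_estimate(0, 1), A3[0, 1]
```

With d < n, replacing degrees by ∥s_i^d∥² and neighborhoods by N̂(i) turns this identity into the GLEE-L3 score.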

Runtime analysis
On a graph G with n nodes, finding the k largest eigenvalues and eigenvectors of the Laplacian takes O(kn²) time when using algorithms for fast approximate singular value decomposition [18,45]. Given a k-dimensional embedding matrix S, reconstructing the graph is as fast as computing the product S S^T and applying the threshold θ to each entry, which takes O(n^ω + n²) time, where ω is the exponent of matrix multiplication. Approximating the number of common neighbors of nodes i and j depends only on the dot products between the embeddings of their neighbors, and thus takes O(k · min(deg(i), deg(j))) time, while approximating the number of paths of length 3 takes O(k · deg(i) · deg(j)) time.

RELATED WORK
Spectral analyses of the Laplacian matrix have multiple applications in graph theory, network science, and graph mining [30,41,46]. Indeed, the eigendecomposition of the Laplacian has been used for sparsification [42], clustering [48], dynamics [36,47], robustness [20,40], etc. We here discuss those applications that are related to the general topic of this work, namely, dimensionality reduction of graphs.
One popular application is the use of Laplacian eigenvectors for graph drawing [22,35], which can be thought of as graph embedding for the specific objective of visualization. One such method is outlined in [35]; similarly to GLEE, it assigns a vector, or higher-dimensional position, to each node in a graph using the eigenvectors of its Laplacian matrix, in such a way that the resulting vectors have certain desirable geometric properties. However, in the case of [35], those geometric properties are externally enforced as constraints in an optimization problem, whereas GLEE uses the intrinsic geometry already present in a particular decomposition of the Laplacian. Furthermore, their method focuses on the eigenvectors corresponding to the smallest eigenvalues of the Laplacian, while GLEE uses those corresponding to the largest eigenvalues, i.e., those that give the best approximation to the Laplacian through singular value decomposition.
On another front, many graph embedding algorithms have been proposed; see for example [16,19] for extensive reviews. Most of these methods fall into one of the following categories: matrix factorization, random walks, or deep architectures. Of special importance to us are methods that rely on matrix factorization; among their many advantages, we have at our disposal the full toolbox of spectral linear algebra to study them [7-9,28]. Examples in this category are the aforementioned Laplacian Eigenmaps (LE) [3,4] and Graph Factorization (GF) [1]. One important difference between GLEE and LE is that LE uses the small eigenvalues of the normalized Laplacian D^{-1}L, while GLEE uses the large eigenvalues of L. Furthermore, LE does not present the rich geometry of the simplex. Graph Factorization finds a decomposition of the weighted adjacency matrix W with a regularization term. Its objective is to find embeddings {s_i} such that s_i · s_j = a_ij, whereas in our case we reconstruct s_i · s_j = L_ij. This means that the embeddings found by Graph Factorization present different geometric properties. There are many other methods of dimensionality reduction on graphs that depend on matrix factorization [5,25,50]. However, even if some parameterization, or special case, of any of these methods results in something resembling the singular value decomposition of the Laplacian (thus imitating GLEE), to the authors' knowledge none of these methods makes direct use of its intrinsic geometry.
Among the methods based on random walks we find DeepWalk [34] and node2vec [17], both of which adapt the framework of word embeddings [29] to graphs by using random walks and optimizing a shallow architecture. It is also worth mentioning NetMF [37], which subsumes several of these methods in a single algorithm that depends on matrix factorization, thus bridging the two previous categories.
Among the methods using deep architectures, we have the deep autoencoder Structural Deep Network Embedding (SDNE) [49], which penalizes representations of similar nodes that are far from each other using the same objective as LE. Thus, SDNE is also based on the distance-minimization approach. There is also [6], which obtains a non-linear mapping between the pointwise mutual information (PMI) matrix of a sampled network and the embedding space. This is akin to applying the distance-minimization assumption not to the graph directly but to the PMI matrix.
Others have used geometric approaches to embedding. For example, [14] and [33] find embeddings on the surface of a sphere, while [32] and [31] use the hyperbolic plane. These methods are generally developed under the assumption that the embedding space is used to generate the network itself. They are therefore aimed at recovering the generating coordinates, and not, as in GLEE's case, at finding a general representation suitable for downstream tasks.

EXPERIMENTS
We put into practice the procedures detailed in Sections 3.1 and 3.2 to showcase GLEE's performance in the tasks of link prediction and graph reconstruction. Code to compute the GLEE embeddings of networks and related computations is publicly available at [44]. For our experiments, we use the following baselines: GF because it is a direct factorization of the adjacency matrix, node2vec because it is regarded as a reference point among methods based on random walks, SDNE because it aims to recover the adjacency matrix of a graph (a task GLEE excels at), NetMF because it generalizes several other well-known techniques, and LE because it is the method that most closely resembles our own. In this way we cover all of the categories explained in Section 4, using either methods that resemble GLEE closely or methods that have been found to generalize other techniques. For node2vec and SDNE we use default parameters. For NetMF we use the spectral approximation with rank 256. The data sets we use are outlined in Table 1. Besides comparing GLEE to the other algorithms, we are interested in how the graph's structure affects the performance of each method. For this reason, we have chosen data sets that have similar numbers of nodes and edges but different average clustering coefficients. Accordingly, we report our results with respect to the average clustering coefficient of each data set and the number of dimensions of the embedding (the only parameter of GLEE). In Appendix B we compare the performance of the estimators explained in Section 3.1. In the following experiments we use θ̂_k as our estimator for θ_opt.
Graph Reconstruction
We use as our performance metric precision at k, defined as the precision over the first k reconstructed edges. Note that precision at k must eventually decrease as k grows large, since few correct edges remain to be reconstructed.
Following Section 3.1, we reconstruct the edge (i, j) if s_i^d · s_j^d < θ̂. The further the dot product is from 0 (the ideal value for non-edges), the more confident we are in the existence of this edge. For LE, we reconstruct the edge (i, j) according to how small the distance between the two embeddings is. For GF, node2vec, and NetMF, we reconstruct edges based on how large their dot product is. SDNE is a deep autoencoder, and thus its very architecture involves a mechanism to reconstruct the adjacency matrix of the input graph.
We show results in Figure 2, where we have ordered data sets from left to right in ascending order of clustering coefficient, and from bottom to top in ascending order of embedding dimension. GF results are omitted from this figure, as it scored close to 0 for all values of k and d. On CA-GrQc, for low embedding dimension d = 32, SDNE performs best among all methods, followed by node2vec and LE. However, as d increases, GLEE substantially outperforms all others, reaching an almost perfect precision score over the first 10,000 reconstructed edges. Interestingly, the other methods do not substantially improve as d increases. This analysis also holds for CA-HepTh, another data set with high clustering coefficient. However, on PPI, our data set with the lowest clustering coefficient, GLEE drastically outperforms all other methods for all values of d. Interestingly, LE and node2vec perform well compared to other methods on data sets with high clustering, but their performance drops to near zero on PPI. We hypothesize that this is because LE and node2vec depend on the "community-aware" assumption, thereby assuming that two proteins in the same cluster would interact with each other; this is the exact point that [23] refutes. On the other hand, GLEE directly encodes graph structure, making no assumptions about the original graph, and its performance depends more directly on the embedding dimension than on the clustering coefficient or any other assumption about graph structure. GLEE's performance on the data sets PPI, Wiki-Vote, and caida points to the excellent potential of our method in the case of low clustering coefficient.

Link Prediction
Given the embedding of a large subgraph of some graph G, can we identify which edges are missing? The experimental setup is as follows. Given a graph G with n nodes, node set V, and edge set E_obs, we randomly split its edges into train and test sets E_train and E_test. We use |E_train| = 0.75n, and we make sure that the subgraph induced by E_train, denoted G_train, is connected and contains every node of V. We then compute the GLEE of G_train and test on E_test. We report the AUC metric for this task. We use both techniques described in Sections 3.2.1 and 3.2.2, which we label GLEE and GLEE-L3, respectively. Figure 3 shows that node2vec repeats the behavior seen in graph reconstruction of increased performance as the clustering coefficient increases, though again it is fairly constant with respect to the embedding dimension. The same observation holds for NetMF. On the high-clustering data sets, LE and GLEE have comparable performance to each other. However, either GLEE or GLEE-L3 performs better than all others on the low-clustering data sets PPI and Wiki-Vote, as expected. Also as expected, the performance of GLEE-L3 decreases as average clustering increases. Note that GLEE and LE generally improve as d increases, whereas node2vec and SDNE do not. (GF and SDNE are not shown in Figure 3 for clarity; they scored close to 0.5 and 0.6 on all data sets, independently of d.) The reason why none of the methods studied here performs better than 0.6 AUC on the caida data set is an open question left for future research. We conclude that the hybrid approach of NetMF is ideal for high clustering coefficients, whereas GLEE is a viable option in the case of low clustering coefficients, as evidenced by the results on PPI, Wiki-Vote, and caida.

CONCLUSIONS
In this work we have presented the Geometric Laplacian Eigenmap Embedding (GLEE), a geometric approach to graph embedding that exploits the intrinsic geometry of the Laplacian. Compared to other methods, we find that GLEE performs best when the underlying graph has a low clustering coefficient, while still performing comparably to other state-of-the-art methods when the clustering coefficient is high. We hypothesize that this is because the large eigenvalues of the Laplacian correspond to the small eigenvalues of the adjacency matrix and thus represent the structure of the graph at a micro level. Furthermore, we find that GLEE's performance increases as the embedding dimension increases, something we do not see in other methods. In contrast to techniques based on neural networks, which have many hyperparameters and costly training phases, GLEE has only one parameter besides the embedding dimension, namely the threshold θ, and we have provided three different ways of estimating it. Indeed, GLEE depends only on the SVD of the Laplacian matrix.
We attribute these desirable properties of GLEE to the fact that it departs from the traditional literature of graph embedding by replacing the "community aware" notion (similar nodes' embeddings must be similar) with the notion of directly encoding graph structure using the geometry of the embedding space. In all, we find that GLEE is a promising alternative for graph embedding due to its simplicity in both theoretical background and computational implementation, especially in the case of low clustering coefficient. By taking a direct geometric encoding of graph structure using the simplex geometry, GLEE covers the gap left open by the "community aware" assumption of other embedding techniques, which requires high clustering. Future lines of work will explore

A THRESHOLD ESTIMATORS
We present two other estimators of θ_opt to accompany the heuristic θ̂_c = −0.5 mentioned in Section 3.1.

A.1 Kernel Density Estimation
As can be seen in Figure 4, the problem of finding a value of θ that sufficiently separates the peaks corresponding to edges (around the peak centered at −1) and non-edges (around the peak centered at 0) can be stated in terms of density estimation. That is, given the histogram of the values of s_i · s_j for all i, j, we can approximate the density of this empirical distribution by some density function f_k. A good heuristic estimator of θ_opt is the value that minimizes f_k between the peaks near −1 and 0. For this purpose, we use Kernel Density Estimation over the distribution of s_i · s_j with a box kernel (a.k.a. "top hat" kernel) K and bandwidth h to define

    f_k(x) = (1 / (Nh)) Σ_{i<j} K((x − s_i · s_j) / h),

where N is the number of dot products. We then use gradient descent to find the minimum of f_k between the values of −1 and 0. We call this value θ̂_k. We have found experimentally that a value of h = 0.3 gives excellent results, achieving near-zero error in the reconstruction task (Figure 4, middle row).

A.2 Gaussian Mixture Models
Here we use a Gaussian Mixture Model (GMM) over the distribution of s_i · s_j. The model will find the two peaks near −1 and 0 and fit each to a Gaussian distribution. Once the densities of said Gaussians have been found, say f_1 and f_2, we define the estimator θ̂_g as the point at which the densities are equal (see Figure 4, bottom row).
However, we found that a direct application of this method yields poor results due to the sparsity of network data sets. High sparsity implies that the peak at 0 is orders of magnitude higher than the one at −1. Thus, the left peak will usually be hidden by the tail of the right one, so that the GMM cannot detect it. To solve this issue we take two steps. First, we use a Bayesian version of GMM that accepts priors for the Gaussian means and other parameters. This guides the GMM optimization algorithm to find the right peaks in the right places. Second, we sub-sample the distribution of dot products in order to minimize the difference between the peaks, and then correct for it after the fit. Concretely, put r = Σ_{i<j} 1{s_i · s_j < θ̂_c}. That is, r is the number of dot products less than the constant θ̂_c = −0.5. Instead of fitting the GMM to all the observed dot products, we fit it to the set of all r dot products less than θ̂_c plus a random sample of r dot products larger than θ̂_c. This temporarily fixes the class imbalance, which we recover after the model has been fit as follows. The GMM fit will yield a density for the sub-sample as f_g = w_1 f_1 + w_2 f_2, where f_i is the density of the i-th Gaussian and w_i are the mixture weights, for i = 1, 2. Since we sub-sampled the distribution, we will get w_1 ≈ w_2 ≈ 0.5, but we need the weights to reflect the original class imbalance. For this purpose, we define ŵ_1 = m̂ / (n choose 2) and ŵ_2 = 1 − ŵ_1, where m̂ is an estimate of the number of edges in the graph. (This can be estimated in a number of ways; for example, one may put m̂ = r, or m̂ = n log(n).) Finally, we define the estimator θ̂_g as the value that satisfies

    ŵ_1 f_1(θ̂_g) = ŵ_2 f_2(θ̂_g),    (16)

under the constraint that −1 < θ̂_g < 0. Since f_1 and f_2 are known Gaussian densities, Equation 16 can be solved analytically.
In this case, due to sparsity, the problem of optimizing the GMM is one of non-parametric density estimation with extreme class imbalance. We solve it by supplying priors to the optimization algorithm, as well as sub-sampling the distribution of dot products according to some of its known features (i.e., the fact that the peaks will be found near −1 and 0), and we account for the class imbalance by estimating graph sparsity separately. Finally, we define the estimator θ̂_g according to Equation 16. Algorithm 1 gives an overview of this procedure. For a comparison of the effectiveness of the three estimators θ̂_c, θ̂_k, θ̂_g, see Appendix B.
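The procedure above can be sketched end to end as follows. This is our own illustration on synthetic dot products, using scikit-learn's BayesianGaussianMixture for the Bayesian GMM and putting m̂ = r; equating the two weighted Gaussian densities in log space gives a quadratic in θ, which we solve directly:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
n = 200                                           # hypothetical number of nodes
# Synthetic dot products with the extreme imbalance of a sparse graph:
# 400 "edge" products near -1 and 19500 "non-edge" products near 0.
dots = np.concatenate([rng.normal(-1, 0.1, 400), rng.normal(0, 0.1, 19500)])

theta_c = -0.5
L = dots[dots < theta_c]                          # all r dot products below theta_c
r = len(L)
R = rng.choice(dots[dots >= theta_c], size=r, replace=False)

# Bayesian GMM on the balanced sub-sample; the mean prior nudges the
# search towards the region between the two peaks.
gmm = BayesianGaussianMixture(n_components=2, mean_prior=[-0.5],
                              random_state=0).fit(np.concatenate([L, R])[:, None])
order = np.argsort(gmm.means_.ravel())            # component 0 near -1, 1 near 0
mu = gmm.means_.ravel()[order]
var = gmm.covariances_.ravel()[order]

# Restore the original class imbalance with m_hat = r.
w1 = r / (n * (n - 1) / 2)
w2 = 1.0 - w1

# w1 * N(t; mu1, var1) = w2 * N(t; mu2, var2) becomes, after taking logs,
# a quadratic a*t^2 + b*t + c = 0; keep the root in (-1, 0).
a = 1 / (2 * var[1]) - 1 / (2 * var[0])
b = mu[0] / var[0] - mu[1] / var[1]
c = (mu[1] ** 2 / (2 * var[1]) - mu[0] ** 2 / (2 * var[0])
     + np.log(w1 / w2) + 0.5 * np.log(var[1] / var[0]))
roots = np.roots([a, b, c])
theta_g = next(t.real for t in roots if -1 < t.real < 0)
```

With well-separated peaks of roughly equal width, the reweighted crossing point lands slightly to the right of the midpoint of the two means, since ŵ_1 ≪ ŵ_2.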

B ESTIMATOR COMPARISON
In Section 3.1 and Appendix A we outlined three different schemes to estimate θ_opt, resulting in θ̂_c, θ̂_k, and θ̂_g. Which one is best? We test each of these estimators on three random graph models: Erdős–Rényi (ER) [13], Barabási–Albert (BA) [2], and Hyperbolic Graphs (HG) [24]. For each random graph with adjacency matrix A, we compute the Frobenius norm of the difference between A and the reconstructed adjacency matrix Â, using each of the three estimators. In Figure 5 we show our results. We see that θ̂_c and θ̂_k achieve similar performance across data sets, while θ̂_g outperforms the other two for ER at d = 512, though it has high variability in the other models. From these results we conclude that at low dimensions (d = 32) too much information has been lost, and thus there is no hope of learning a value of θ̂ that outperforms the heuristic θ̂_c = −0.5. However, at larger dimensions, the estimators θ̂_g and θ̂_k perform better, with different degrees of variability. We also conclude that no single heuristic for θ̂ is best for all types of graphs. In the rest of our experiments we use θ̂_k as our estimator for θ_opt. We highlight that even though θ̂_k is better than θ̂_c on some data sets, it may be costly to compute, while θ̂_c incurs no additional cost.

Algorithm 1 Computing the estimator θ̂_g
    L ← {s_i · s_j : s_i · s_j < θ̂_c}
    R ← random sample of size |L| of {s_i · s_j : s_i · s_j ≥ θ̂_c}
    w_1, w_2, f_1, f_2 ← fit a Bayesian GMM to L ∪ R
    ŵ_1 ← m̂ / (n choose 2)
    ŵ_2 ← 1 − ŵ_1
    θ̂_g ← solution of ŵ_1 f_1(θ) = ŵ_2 f_2(θ)
    return θ̂_g

Figure 5: Estimator comparison. We compute the three different estimators on three different random graph models: Erdős–Rényi (ER), Barabási–Albert (BA), and Hyperbolic Graphs (HG). All graphs have n = 10^3 nodes and average degree ⟨k⟩ ≈ 8. Hyperbolic graphs were generated with degree-distribution exponent γ = 2.3. We show the average of 20 experiments; error bars mark two standard deviations. Values are normalized to the range [0, 1].
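The reconstruction error used in this comparison can be reproduced with a small dense-matrix sketch (helper names are ours, and we use a full eigendecomposition in place of a truncated SVD): the embedding factors the Laplacian L = S Sᵀ from its d largest eigenpairs, so off-diagonal dot products s_i · s_j approximate L_ij, which is −1 for edges and 0 for non-edges, and thresholding at θ recovers Â.

```python
import numpy as np

def glee(A, d):
    """GLEE sketch: factor the Laplacian L = S S^T using its d largest
    eigenpairs, so that s_i . s_j approximates L_ij."""
    L = np.diag(A.sum(axis=1)) - A
    vals, vecs = np.linalg.eigh(L)              # eigenvalues ascending
    top = np.argsort(vals)[-d:]                 # keep the d largest
    # clip guards against tiny negative eigenvalues from round-off
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))

def reconstruction_error(A, S, theta):
    """Frobenius norm between A and the thresholded reconstruction
    A_hat[i, j] = 1 iff s_i . s_j < theta (diagonal excluded)."""
    gram = S @ S.T
    A_hat = (gram < theta).astype(float)
    np.fill_diagonal(A_hat, 0.0)
    return np.linalg.norm(A - A_hat)

# 10-node cycle: with a full-dimensional embedding (d = n) and the
# heuristic threshold -0.5, the reconstruction is exact.
A = np.zeros((10, 10))
for i in range(10):
    A[i, (i + 1) % 10] = A[(i + 1) % 10, i] = 1.0
err = reconstruction_error(A, glee(A, 10), -0.5)
```

At d = n the factorization is exact, so the error is zero; the experiments in Figure 5 probe how quickly it grows as d shrinks.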