Introduction
Manifold learning techniques such as PCA, Isomap, and Laplacian Eigenmaps have proven useful for dimensionality reduction, and as potential relief from the “curse of dimensionality”. But they do not “learn” a bona fide manifold, nor do they presume a statistical model for their input. Recent developments[^1] show that, provided the underlying manifold has “reasonable geometry” and sufficient data is available, an approximation of it can be recovered. Recovering a manifold from points can be formulated in terms of intrinsic or extrinsic distances; the intrinsic variant is the Geometric Whitney Problem[^2].
A related problem in the field of Random Geometric Graphs is that of recovering the distances (up to some error) between the vertices of a random geometric graph. This amounts to approximating the original metric space of the vertices, given only a random geometric graph that has “masked away” the exact distance information.
One immediate application of Random Geometric Graphs lies in network modelling: fitting an RGG model to an observed sub-graph of a tremendously large network in order to make inferences (such as the existence of Hamiltonian cycles[^3]). Random Geometric Graphs can readily be applied to modelling the internet, biological networks, and social networks. The core idea is that connection probability is based on some notion of vertex similarity.[^4]
A natural question is: “Do suitably general random graph models on points sampled from a manifold capture sufficient information to approximate the original metric spaces (both manifold and ambient), and thus reconstruct the manifold?”
The paper “Reconstructing the geometry of random geometric graphs”[^5] by Huang, Jiradilok, and Mossel informally addresses the question:
Given some points randomly sampled from some abstract space/surface, and only a vague/coarse notion of distance between a few pairs of these points, can we guess what the surface/space ‘looks like’?
In particular, the problem of “reconstructing the geometry” is specialized to “approximating metric spaces”.
Prior Art
The authors’ work connects the two fields of Random Geometric Graphs and Manifold Learning. Possibly the closest and most relevant prior work is that of Araya and DeCastro: reconstructing distances from a random geometric graph obtained from uniformly sampled points on a d-dimensional sphere (with d unknown) in the sparse regime, without a known “link” function[^6]. Other works have explored reconstructing distances when the sampling manifold is known to be a member of a 1-parameter family (e.g., unit cubes). The generalization to a broad class of manifolds, and the relaxation of constraints on the probability measure, are salient features of the authors’ work.
The work of Araya and DeCastro formalizes the notion of a “link” function in the form of graphons (with special reference to graphons that are univariate functions of the inner product), and considers relatively sparse graphs. The graphon formalization of Random Geometric Graphs is more general than the distance-probability-function definition employed by the authors, which is in turn more general than the classical Random Geometric Graph (threshold or dot-product based); the classical model can be seen as the special case where the graphon is an indicator function over distances / dot products. The classical variant poses a question about the distribution of distances among sampled points; in comparison, the current model has two “levels of randomness”: points are first sampled from the manifold according to a probability measure, and then, for every pair, a Bernoulli trial with success probability determined by the distance-probability function decides the edge.
The Random Geometric Graph Model
A function \(\rho: [0, \infty) \to [0, 1]\) is a distance-probability function over manifold \(M\)[^7] if:
- \(\rho\) is Lipschitz with constant \(L_p < \infty\)
- \(\exists l_p \in \mathbb{R}^{+}\) such that \(\forall\, 0 \le a \le b \le \text{diam}(M)\): \(\rho(a) - \rho(b) \ge l_p (b-a)\) (in particular, \(\rho\) is strictly decreasing on \([0, \text{diam}(M)]\))
Let \(X_1, X_2, ..., X_N \in \mathbb{R}^n\) be \(N\) points sampled independently from a d-dimensional manifold in \(\mathbb{R}^n\) according to a probability measure \(\mu\). Obtain a symmetric matrix of random values, \(U \in \mathcal{M}_{N}(\mathbb{R})\), such that \(U_{ij} \sim \text{Uniform}(0,1)\) independently for \(i < j\), with \(U_{ji} = U_{ij}\).
Let \(V = \{1, 2, ..., N\} \subset \mathbb{N}\), and define \(E = \{ (i, j) \in V \times V \mid i \neq j \land U_{ij} < \rho(\|X_i - X_j\|) \}\). The resulting undirected graph \(G = (V, E)\) is the random geometric graph under consideration.
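To make the model concrete, here is a minimal sampling sketch, assuming \(M\) is the unit sphere \(S^2 \subset \mathbb{R}^3\) with the uniform measure and a hypothetical linear \(\rho\) (both choices are illustrative, not from the paper):

```python
import numpy as np

def sample_rgg(N, rho, rng=np.random.default_rng(0)):
    """Sample G = (V, E) from the model above, with M = S^2 and mu uniform."""
    # Sample N points uniformly on S^2 by normalizing Gaussian vectors.
    X = rng.standard_normal((N, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    # Pairwise Euclidean (ambient) distances ||X_i - X_j||.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

    # Symmetric matrix of U_ij ~ Uniform(0, 1), independent for i < j.
    U = np.triu(rng.uniform(size=(N, N)), 1)
    U += U.T

    # Edge (i, j) present iff U_ij < rho(||X_i - X_j||), for i != j.
    A = (U < rho(D)) & ~np.eye(N, dtype=bool)
    return X, A

# A hypothetical distance-probability function: Lipschitz (L_p = 0.4) and
# strictly decreasing (l_p = 0.4) on [0, diam(S^2)] = [0, 2].
rho = lambda t: np.clip(0.9 - 0.4 * t, 0.0, 1.0)
X, A = sample_rgg(500, rho)
```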
Constraints
The authors impose fairly tame constraints.
- \(M\) is a d-dimensional compact, complete, connected, smooth manifold embedded in \(\mathbb{R}^n\), with its Riemannian metric induced from \(\mathbb{R}^n\).
- There exists a non-zero lower bound on the injectivity radius of \(M\).
- \(\kappa = \max_{p \in M}{\max_{u,v \in T_pM, \|u\|=\|v\|=1}{\langle S_p(u), v \rangle}} < \infty\), where \(S\) is the shape operator and \(\langle , \rangle\) denotes the inner product.
- \(\exists r_{M,0} \in \mathbb{R}^{+}\) such that \(\forall p \in M\) and \(r < r_{M,0}\), \(\mathcal{B}(p;r) \cap M\) is connected (i.e., locally 1-piece). The authors term this the repulsion property.
- Let \(\mu\) be the probability measure on \(M\), and define \(\mu_{\min}(r) = \min_{p \in M}{\mu(\mathcal{B}(p; r) \cap M)}\). Then there exists \(c_{\mu} \in \mathbb{R}^{+}\) such that \(\inf_{r \in (0, r_M]}{\mu_{\min}(r) r^{-d}} \ge c_{\mu}\).
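As a check that the last condition is satisfiable: if \(\mu\) is, for instance, the normalized Riemannian volume on \(M\), then for small \(r\) the volume of a metric ball in \(M\) is comparable to that of a Euclidean \(d\)-ball, so

\[\mu(\mathcal{B}(p;r) \cap M) = \frac{\text{vol}(\mathcal{B}(p;r) \cap M)}{\text{vol}(M)} \gtrsim \frac{\omega_d\, r^d}{\text{vol}(M)},\]

where \(\omega_d\) is the volume of the unit \(d\)-ball; the condition then holds with \(c_\mu\) proportional to \(\text{vol}(M)^{-1}\).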
The constraint on \(\kappa\) intuitively serves to provide “local bounds” on the behaviour of the manifold around a point. In particular, \(\kappa\) measures the maximum amount by which the tangent space can ‘turn’ around any point. One may visualize the manifold as locally “looking like” a “bent” disk.
The original formulation of the constraints does not explicitly include the lower bound on the injectivity radius, but it is nonetheless required to reconstruct the Riemannian manifold.[^8]
The injectivity radius constraint is stronger than the repulsion property constraint. Let \(r_{*} = \inf{\{r \in \mathbb{R}^{+} \mid \exists p \in M, \text{disconnected}(\mathcal{B}(p; r) \cap M)\}}\). Suppose \(A = \mathcal{B}(p, r_0) \cap M\) is disconnected for some \(p \in M, r_0 \in \mathbb{R}^{+}\). Since \(M\) is compact, there are finitely many components[^9] \(D_1, D_2, ..., D_l\) with \(\cup_{i=1}^{l}{D_i} = A\). Define \(d(D_i, D_j) = \inf{\{\epsilon \in \mathbb{R}^{+} \mid D_j \subseteq (D_i)_{\epsilon} \land D_i \subseteq (D_j)_{\epsilon}\}}\), where \((D)_{\epsilon}\) is the \(\epsilon\)-fattening of \(D\). Choose \(D, D'\) by \(\arg\min_{i, j}{d(D_i, D_j)}\), and then choose \(p, q\) by \(\arg\min_{p \in D, q \in D'}{d(p,q)}\). Let \(2 \tau_0 = d(p,q)\); then \((D)_{\tau_0} \cap (D')_{\tau_0} \neq \emptyset\). Thus, the injectivity radius \(\tau\) can be at most \(\tau_0 < \frac{r_*}{2}\), i.e., \(0 \leq \tau \leq \frac{1}{2}r_{*}\). If the repulsion property does not hold, then \(r_{*} = 0 \implies \tau = 0\).
Main Result
There exists \(c>0\) such that the following holds. Suppose \((M,\mu,\rho)\) satisfies the aforementioned assumptions. Then there exists a deterministic polynomial-in-\(N\) time algorithm that takes \(G=G(N,M,\mu,\rho)\) as input and outputs a weighted graph \(\Gamma\) on \([N]\) and a metric \(d_{euc}\) on \([N]\) such that, with probability \(1-N^{-\omega(1)}\):
- for every \(p \in M\), there exists \(v\) such that \[\|X_v-p\| \le d_{gd}(X_v,p) \le C(M,\mu,\rho)N^{-c/d};\]
- for every pair of vertices \(v,w\), \[\left|d_{gd}(X_v,X_w) - d_{\Gamma}(v,w) \right| \le C(M,\mu,\rho)N^{-c/d},\] where \(d_{gd}\) denotes the geodesic distance on \(M\) and \(d_{\Gamma}\) denotes the path metric on \(\Gamma\);
- for every pair of vertices \(v,w\), \[\big|\| X_v - X_w \| - d_{euc}(v,w) \big| \le C(M,\mu,\rho)N^{-c/d},\] where \(\| \cdot \|\) denotes the Euclidean norm.
Here, \(C(M,\mu,\rho) \ge 1\) is a constant which depends on \(M\), \(\mu\), and \(\rho\).
From the graph observer’s point of view, the following must be known (see the sketch after this list):
- An instance of the random graph \(G\) together with the description of how the random graph is constructed,
- The properties of the manifold \(M\): that it is compact, complete, connected, and smooth, and that it is embedded in the Euclidean space \(\mathbb{R}^n\),
- The dimension \(d \in \mathbb{Z}_{\ge 1}\) of the manifold \(M\),
- The function \(\rho:[0,\infty) \to [0,1]\),
- An upper bound \(D^0 > 0\), together with the information that \(\text{diam}(M) \le D^0\),
- An upper bound \(\kappa^0 > 0\), together with the information that \(\kappa \le \kappa^0\),
- A lower bound \(r_{M,0}^0 > 0\), together with the information that \(r_{M,0} \ge r_{M,0}^0\), and
- A constant \(C^0 > 0\), together with the information that \(\mu_{\min}(x) \ge C^0 \cdot x^d\), for every \(x \in [0,r_M]\).
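In code, the observer’s inputs might be packaged as follows (a sketch; the names are illustrative, not from the paper):

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class ObserverInput:
    """Everything the graph observer is assumed to know."""
    A: np.ndarray                      # adjacency matrix of one instance of G
    d: int                             # intrinsic dimension of M
    rho: Callable[[float], float]      # the distance-probability function
    diam_upper: float                  # D^0 with diam(M) <= D^0
    kappa_upper: float                 # kappa^0 with kappa <= kappa^0
    inj_lower: float                   # r^0_{M,0} with r_{M,0} >= r^0_{M,0}
    measure_const: float               # C^0 with mu_min(x) >= C^0 * x^d
```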
Proof Ideas
It suffices to obtain a \(\delta\)-net of \(\eta\)-clusters, i.e., a collection \(\{(u_\alpha, U_\alpha)\}_{\alpha}\) with \(u_\alpha \in V, U_\alpha \subset V\), such that \(\cup_{\alpha}{\mathcal{B}(X_{u_\alpha}, \delta)} \supseteq M\) and \(v \in U_{\alpha} \implies \|X_v - X_{u_\alpha}\| \leq \eta\). Clustering can determine distances: the distances from a vertex \(v\) to the points of a cluster \(U_\alpha\) all agree up to \(\sim \eta\), and the edges from \(v\) to the cluster are independent Bernoulli trials with success probability \(\approx \rho(\|X_v - X_{u_\alpha}\|)\), so the quantity \(s_{U_\alpha, v} = \rho^{-1}\bigl(\frac{|N(v) \cap U_\alpha|}{|U_\alpha|}\bigr) \approx \|X_v - X_{u_\alpha}\|\).
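A minimal sketch of this estimator, assuming the adjacency matrix A and the strictly decreasing rho from the earlier sketch (rho is inverted numerically by bisection):

```python
import numpy as np
from scipy.optimize import brentq

def estimate_distance(A, cluster, v, rho, d_max):
    """Estimate ||X_v - X_{u_alpha}|| from the edge density between
    vertex v and the eta-cluster with vertex set `cluster`."""
    # Fraction of the cluster adjacent to v: a Bernoulli-trial estimate
    # of rho(||X_v - X_{u_alpha}||).
    p_hat = A[v, cluster].mean()
    # Keep the estimate inside rho's range so a root exists.
    p_hat = np.clip(p_hat, rho(d_max), rho(0.0))
    # rho is strictly decreasing, so rho(t) = p_hat has a unique root.
    return brentq(lambda t: rho(t) - p_hat, 0.0, d_max)
```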
Furthermore, the cluster-net may be obtained recursively: given the first cluster, find an “almost orthogonal” set of vertices and corresponding clusters, which impose constraints on the selection of the next cluster at a distance \(\delta\) away.
Weighted graphs \(\Gamma_{r}\) can be recovered from the cluster-net by adding an edge \((u_\alpha, u_\beta)\) with weight \(s_{U_\alpha, u_\beta}\) whenever \(s_{U_\alpha, u_\beta} < r\). The path metric on \(\Gamma_{2\delta^{\frac{1}{3}}}\) approximates the geodesic distance, and the path metric on \(\Gamma_{\infty}\) approximates the Euclidean distance.
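Concretely, given the matrix of estimates \(s\) between cluster centers (assumed symmetric here), the path metric can be computed with a standard shortest-path routine; a sketch:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def path_metric(s, r):
    """Build Gamma_r from the estimates s[a, b] and return its path metric.

    An edge (u_alpha, u_beta) with weight s[a, b] is kept iff s[a, b] < r;
    scipy treats inf entries of a dense matrix as non-edges.
    """
    W = np.where(s < r, s, np.inf)
    np.fill_diagonal(W, 0.0)
    return shortest_path(W, method="D")  # Dijkstra over non-negative weights

# d_graph = path_metric(s, 2 * delta ** (1 / 3))   # ~ geodesic distances
# d_euc   = path_metric(s, np.inf)                 # ~ Euclidean distances
```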
Intuitively, clusters could be constructed by inspecting common neighbours: if two vertices have a large number of common neighbours, we suspect that they are proximal in the original metric space. However, naively applying this in all cases can lead to situations where points that are very far apart become indistinguishable.
Consider the 2-manifold of the unit disk \(D\) in \(\mathbb{R}^2\), parametrized in polar co-ordinates, with probability measure \(\frac{d\mu}{dA} = \frac{e^{-\alpha r} + \varepsilon}{\varepsilon \pi + \frac{2\pi}{\alpha^2}(1 - (\alpha + 1)e^{-\alpha})}\) for large \(\alpha \gg 0\) and \(\varepsilon \approx 0\). Suppose \(X_u, X_v \in D\) are diametrically opposite points on the circumference. Consider another \(X_{u'} \approx X_{u}\), i.e., another i.i.d. variable that takes the same value as, or a value very near to, \(X_u\). Although this is a somewhat rare event, in such a case merely grouping based on common neighbours cannot distinguish \(\|X_u - X_{u'}\|\) from \(\|X_u - X_v\|\), i.e., \(\#\text{CommonNeighbours}(u,u') \approx \#\text{CommonNeighbours}(u, v)\), simply because the majority of the common neighbours come from the central inner part of the disk.
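A quick Monte Carlo illustration of this failure mode (a sketch; the sampler, parameters, and \(\rho\) are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, eps, N = 20.0, 1e-3, 20_000

# Rejection-sample N points from the density proportional to
# e^{-alpha r} + eps on the unit disk.
pts = np.empty((0, 2))
while len(pts) < N:
    x = rng.uniform(-1, 1, size=(200_000, 2))
    x = x[np.linalg.norm(x, axis=1) <= 1.0]          # restrict to the disk
    r = np.linalg.norm(x, axis=1)
    keep = rng.uniform(size=len(x)) < (np.exp(-alpha * r) + eps) / (1.0 + eps)
    pts = np.vstack([pts, x[keep]])
pts = pts[:N]

rho = lambda t: np.clip(0.9 - 0.4 * t, 0.0, 1.0)     # hypothetical rho, as before

def common_neighbour_fraction(a, b):
    """Fraction of sampled points adjacent to both a and b."""
    adj_a = rng.uniform(size=N) < rho(np.linalg.norm(pts - a, axis=1))
    adj_b = rng.uniform(size=N) < rho(np.linalg.norm(pts - b, axis=1))
    return (adj_a & adj_b).mean()

u, u2, v = np.array([1.0, 0.0]), np.array([0.99, 0.0]), np.array([-1.0, 0.0])
# The two fractions come out close, despite ||u - u2|| << ||u - v||:
# most common neighbours lie near the centre of the disk.
print(common_neighbour_fraction(u, u2), common_neighbour_fraction(u, v))
```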
The intuitive idea of clustering based on common neighbours is not entirely wrong, however, as it is still useful in the extremal case. Define \(K: M \times M \to [0,1]\) by \(K(x,y) = \mathbb{E}_{Z \sim \mu}{[\rho(\|x-Z\|)\rho(\|y-Z\|)]}\) and \(K_{\infty} = \max_{x, y \in M}{K(x,y)}\). Applying the identity \(ab = \frac{1}{2}(a^2 + b^2 - (a-b)^2)\), write \(K(x,y) = \frac{K(x,x) + K(y,y)}{2} - \frac{1}{2}\int_{M}{(\rho(\|x-z\|) - \rho(\|y-z\|))^2 \,d\mu(z)}\).
Suppose \(\exists A(x,y) \subset M\) such that \(\forall z \in A(x,y)\), \(\big|\|z -x\| - \|z - y\|\big| \ge \alpha \min\{\|x-y\|, r_{M}\}\). Then, by the lower-Lipschitz property of \(\rho\):
\[\int_{A(x,y)}{(\rho(\|x-z\|) - \rho(\|y-z\|))^2 \,d\mu(z)} \ge \alpha^2 l_p^2\, \mu(A(x,y)) \min\{\|x-y\|^2, r_{M}^2\} =: 2c_1 \min\{\|x-y\|^2, r_{M}^2\}\]
It can be shown that one choice of \(A(x,y)\) is a ball of radius \(\frac{r_M}{4}\) within a neighbourhood of \(x\); the lower bound on \(\mu_{\min}\) then makes \(c_1\) independent of \(x\) and \(y\).
Thus: \[K(x,y) \le K_{\infty} - c_1 \min\{\|x-y\|^2, r_{M}^2\}\]
It can be shown, by applying Hoeffding’s inequality to each pair and taking a union bound over the \(O(N^2)\) pairs, that for \(W \subseteq V\) of size \(\Theta(N)\),
\[\mathbb{P}\left(\forall i, j \in W,\ i \neq j:\ \Big| \frac{1}{|W|}|N_W(i) \cap N_W(j)| - K(X_i,X_j) \Big| \le |W|^{-1/2+\zeta}\right) = 1 - N^{-\omega(1)};\]
here Hoeffding’s inequality applies because, conditioned on \(X_i\) and \(X_j\), the indicators \(\mathbf{1}[z \in N_W(i) \cap N_W(j)]\) for \(z \in W \setminus \{i,j\}\) are i.i.d. with mean \(K(X_i, X_j)\).
Thus the statistic \(k(i,j) = \frac{1}{|W|}|N_W(i) \cap N_W(j)|\), given sufficiently many points, is a stand-in for \(K(X_i, X_j)\) with reasonably small defect. With this, we can rank all pairs \(\{i, j\}\) according to \(k(i,j)\) and choose the top pairs whose defect from the maximum lies within some slack proportional to \(c_1\); a cluster can then be extracted from the subgraph induced by these vertices and pairs.
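A sketch of this extraction step, assuming the 0/1 adjacency matrix A from the earlier sketches (the slack factor is illustrative; the full algorithm chooses the witness set \(W\) and the slack more carefully):

```python
import numpy as np

def extract_cluster_candidates(A, c1):
    """Rank pairs by k(i, j) = |N_W(i) ∩ N_W(j)| / |W| and return the
    vertices appearing in pairs of near-maximal k."""
    Aw = A.astype(float)                 # here W = V; a subset also works
    k = (Aw @ Aw.T) / Aw.shape[0]        # k[i, j] = |N_W(i) ∩ N_W(j)| / |W|
    np.fill_diagonal(k, -np.inf)         # ignore the diagonal i == j
    # Keep pairs whose defect from the maximum is within a c1-proportional slack.
    near_max = np.argwhere(k >= k.max() - 0.5 * c1)
    return np.unique(near_max)
```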
Footnotes
[^1]: Fefferman et al., 2013.
[^2]: After H. Whitney.
[^3]: See Díaz et al., “Sharp Threshold for Hamiltonicity of Random Geometric Graphs”.
[^4]: Cf. the Erdős–Rényi graph and bond percolation.
[^5]: Available at: https://arxiv.org/abs/2402.09591
[^6]: E. Araya, Y. DeCastro, 2019.
[^7]: \(\rho\) must specify a graphon which “differentiates” between points in the manifold; otherwise the resulting geometric graphs will be indistinguishable under isometries of the original manifold.
[^8]: See “Geometric Whitney Problem”, Fefferman et al., 2019.
[^9]: Choose an open cover in which each set contains at most one component.