Armin Eftekhari, Tamir Bendory, Gongguo Tang, Stable super-resolution of images: theoretical study, Information and Inference: A Journal of the IMA, Volume 10, Issue 1, March 2021, Pages 161–193, https://doi.org/10.1093/imaiai/iaaa029
Abstract
We study the ubiquitous super-resolution problem, in which one aims at localizing positive point sources in an image, blurred by the point spread function of the imaging device. To recover the point sources, we propose to solve a convex feasibility program, which simply finds a non-negative Borel measure that agrees with the observations collected by the imaging device. In the absence of imaging noise, we show that solving this convex program uniquely retrieves the point sources, provided that the imaging device collects enough observations. This result holds true if the point spread function of the imaging device can be decomposed into horizontal and vertical components and if the translations of these components form a Chebyshev system, i.e., a system of continuous functions that loosely behave like algebraic polynomials. Building upon the recent results for one-dimensional signals, we prove that this super-resolution algorithm is stable, in the generalized Wasserstein metric, to model mismatch (i.e., when the image is not sparse) and to additive imaging noise. In particular, the recovery error depends on the noise level and how well the image can be approximated with well-separated point sources. As an example, we verify these claims for the important case of a Gaussian point spread function. The proofs rely on the construction of novel interpolating polynomials—which are the main technical contribution of this paper—and partially resolve the question raised in Schiebinger et al. (2017, Inf. Inference, 7, 1–30) about the extension of the standard machinery to higher dimensions.
1. Introduction
Consider an unknown number of point sources with unknown locations and amplitudes. An imaging mechanism provides us with a few noisy measurements, from which we wish to estimate the locations and amplitudes of these sources. Because of the finite resolution of any imaging device, poorly separated sources are indistinguishable without an appropriate localization technique that takes into account the sparse structure of the image.
This super-resolution problem of localizing point sources finds various applications in, for instance, astronomy [3], geophysics [4], chemistry, medicine, microscopy and neuroscience [5–11]. In this paper, we study the grid-free and non-negative super-resolution of two-dimensional (2-D) signals (i.e., images) in the presence of noise, extending the one-dimensional (1-D) results of [1].
Program (1.5) does not involve a grid on |${\mathbb{I}}^2$| and notably does not regularize |$z$| beyond non-negativity, thus radically deviating from the existing literature [2, 13, 14, 19, 20]. This paper establishes that in the noiseless setting |$\delta =0$|, solving Program (1.5) precisely recovers the true measure |$x$|, provided that |$x$| is a non-negative sparse measure on |${\mathbb{I}}^2$| and under certain conditions on the imaging apparatus |$\varPhi $|. In addition, when |$\delta>0$| and |$x$| is an arbitrary non-negative measure on |${\mathbb{I}}^2$|, solving Program (1.5) well approximates |$x$|. In particular, we establish that any non-negative measure supported on |${\mathbb{I}}^2$| that agrees with the observations |$y$| in the sense of Program (1.5) is near the true measure |$x$|.
This paper does not focus on the important question of how to numerically solve the infinite-dimensional Program (1.5) in practice. One straightforward approach would be to discretize the measure |$z$| on a fine uniform grid for |${\mathbb{I}}^2$|, thereby replacing Program (1.5) with a finite-dimensional convex feasibility program that can be solved with standard convex solvers. Moreover, a few recent papers have proposed algorithms to directly solve Program (1.5) [21–23], i.e., these algorithms can be used to solve Program (1.5) without discretization. A comprehensive numerical comparison between these alternatives is of great importance and we leave that to a future study. This paper instead aims to provide theoretical justifications for the success of Program (1.5), thereby arguing that imposing non-negativity is theoretically enough for successful super-resolution. In other words, under mild conditions, the imaging device acts as an injective map on sparse non-negative measures and we can stably find its inverse map.
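To make the straightforward discretization approach concrete, the sketch below replaces the measure |$z$| with non-negative weights on a uniform grid over |${\mathbb{I}}^2$| and fits the observations by non-negative least squares (feasibility with |$\delta ^{\prime}=0$| then amounts to finding a zero-residual non-negative solution). The Gaussian window, grid size, sampling points and source locations are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np
from scipy.optimize import nnls

# Illustrative parameters (not from the paper): M samples per axis,
# an N x N grid for the discretized measure, and a Gaussian window.
M, N, sigma = 9, 40, 0.1
samples = np.linspace(0, 1, M)                 # sampling points along one axis

def phi(t):
    # phi_m(t) = exp(-(t'_m - t)^2 / sigma^2), evaluated at all points t
    return np.exp(-((samples[:, None] - t[None, :]) ** 2) / sigma ** 2)

grid = np.linspace(0, 1, N)                    # 1-D grid for each axis
A = phi(grid)                                  # M x N measurement matrix, one axis

# Separable 2-D model: with Z the grid weights, y = A Z A^T, i.e.,
# y_vec = (A kron A) z_vec for row-major vectorization.
Phi = np.kron(A, A)                            # (M*M) x (N*N)

# Ground truth: two positive point sources snapped to the grid.
z_true = np.zeros(N * N)
z_true[10 * N + 30] = 1.0                      # source at (grid[10], grid[30])
z_true[25 * N + 5] = 0.5
y = Phi @ z_true                               # noise-free observations

# Discretized Program (1.5) with delta' = 0 as NNLS:
# minimize ||Phi z - y||_2 subject to z >= 0.
z_hat, residual = nnls(Phi, y)
print("residual:", residual)
```

Since the observations are consistent with a non-negative grid measure, the NNLS residual is numerically zero; the recovered weights need not coincide with `z_true` on a coarse grid, which is one motivation for the grid-free analysis pursued in this paper.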
This work relies heavily on a recent work [1], which established that grid-free and non-negative super-resolution in 1-D can be achieved by solving the 1-D version of Program (1.5). In doing so, it removed the regularization required in prior work and substantially simplified the existing results. However, extending [1] to two dimensions is far from trivial and requires a careful design of a new family of dual certificates, as will become clear in the next sections. Indeed, this work overcomes the technical obstacles noted in [2, Section 4] for extending the proof machinery to higher dimensions.
Before turning to the details, let us summarize the technical contributions of this paper. Section 2 presents these contributions in detail, while proofs are deferred to Section 4 and the appendices.
Sparse measures without noise. Suppose that the measure |$x$| consists of |$K$| positive impulses located in |${\mathbb{I}}^2$|. In the absence of noise (|$\delta =0$|), Proposition 2 below shows that solving Program (1.5) with |$\delta ^{\prime}=0$| successfully recovers |$x$| from the observations |$y\in \mathbb{R}^{M\times M}$|, provided that |$M\ge 2K+1$| and that |$\{\phi _m\}_{m=1}^M$| form a Chebyshev system on |${\mathbb{I}}$|. A Chebyshev system, or |$\mathscr{C}$|-system for short,1 is a collection of continuous functions that loosely behave like algebraic monomials; see Definition 1. The |$\mathscr{C}$|-system is a widely used concept in classical approximation theory [24–26] that also plays a pivotal role in some modern signal processing applications; see for instance [1, 2, 19]. In other words, Proposition 2 below establishes that the imaging operator |$\varPhi $| in (1.4) is injective on |$K$|-sparse non-negative measures on |${\mathbb{I}}^2$|, provided that |$\{\phi _m\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$| and |$M\ge 2K+1$|.
In contrast to earlier results, no minimum separation between the impulses is necessary, Program (1.5) does not contain any explicit regularization to promote sparsity, and lastly |$\{\phi _m\}_{m=1}^M$| need only be continuous. We note that Proposition 2 is a non-trivial extension of the 1-D result in [1] to images. Indeed, the key concept of |$\mathscr{C}$|-systems does not generalize to two or higher dimensions, and proving Proposition 2 requires a novel construction of dual certificates to overcome the technical obstacles anticipated in [2, Section 4].
Arbitrary measure with noise. More generally, consider an arbitrary non-negative measure |$x$| supported on |${\mathbb{I}}^2$|. As detailed later, given |$\varepsilon \in (0,1/2]$|, the measure |$x$| can always be approximated with a |$K$|-sparse and |$\varepsilon $|-separated non-negative measure, up to an error of |$R(x,K,\varepsilon )$| in the generalized Wasserstein metric, denoted throughout by |$d_{\textrm{GW}}$|. This is true even if |$x$| itself is not |$\varepsilon $|-separated or not atomic at all. We may think of |$R(x,K,\varepsilon )$| as the ‘model-mismatch’ of approximating |$x$| with a well-separated sparse measure, i.e., |$R(x,K,\varepsilon ) = d_{\textrm{GW}}(x,x_{K,\varepsilon })= \min d_{\textrm{GW}}(x,\chi )$|, where the minimum is taken over every non-negative, |$K$|-sparse and |$\varepsilon $|-separated measure |$\chi $|.
We remark that Theorem 11 applies to any non-negative measure |$x$|, without requiring any separation between the impulses in |$x$|. In fact, |$x$| might not be atomic at all. Of course, the recovery error |$d_{\textrm{GW}}(x,\widehat{x})$| does depend on how well |$x$| can be approximated with a well-separated sparse measure, which is reflected in the right-hand side of (1.6) and hidden in the factors |$c_1,c_2,c_3$| therein. As emphasized earlier, no regularization other than non-negativity is used and |$\{\phi _m\}_m$| need only be continuous.
As a concrete example of this general framework, we consider the case where |$\{\phi _m\}_m$| are translated copies of a Gaussian ‘window’ function. Building on the results from [1], we show in Section 2.3 that the conditions of both Proposition 2 and Theorem 11 are met in this important case. That is, solving Program (1.5) successfully and stably recovers an image that has undergone Gaussian blurring.
2. Main results
2.1 Sparse measure without noise
Definition 1 (|$\mathscr{C}$|-system). Real-valued and continuous functions |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on the interval |${\mathbb{I}}$|, provided that the determinant of the |$M\times M$| matrix |$[\phi _{m}(\tau _{k})]_{k,m=1}^M$| is positive for every (strictly) increasing sequence |$\{\tau _{k}\}_{k=1}^M\subset{\mathbb{I}}$|.
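The definition can be sanity-checked numerically for the classical example of monomials |$1,t,t^2,\ldots ,t^{M-1}$|, for which the matrix in Definition 1 is a Vandermonde matrix with determinant |$\prod _{k<l}(\tau _l-\tau _k)>0$| for any increasing nodes. The sketch below is our own illustration, not taken from the paper.

```python
import numpy as np

# Monomials 1, t, ..., t^{M-1} form a Chebyshev system on [0, 1]:
# [tau_k^{m}] is a Vandermonde matrix, whose determinant is the product
# of the pairwise differences of the (increasing) nodes, hence positive.
rng = np.random.default_rng(0)
M = 5
for _ in range(100):
    tau = np.sort(rng.uniform(0, 1, M))        # strictly increasing (a.s.)
    V = np.vander(tau, M, increasing=True)     # V[k, m] = tau_k ** m
    assert np.linalg.det(V) > 0
print("all determinants positive")
```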
Proved in Section 4.2, the following result states that solving Program (1.5) successfully recovers |$x$| from the noise-free image |$y$|, provided that the measurement functions form a |$\mathscr{C}$|-system.
Proposition 2 (Sparse measure without noise). Let |$x$| be a |$K$|-sparse non-negative measure supported on |$\operatorname{interior}({\mathbb{I}}^2)$|, specified by (2.1). Suppose that |$M\ge 2K+1$| and that the measurement functions |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on the interval |${\mathbb{I}}$|. Lastly, for |$\delta =0$|, consider the imaging operator |$\varPhi $| and the image |$y\in \mathbb{R}^{M\times M}$| in (1.3) and (1.4). Then, |$x$| is the unique solution of Program (1.5) with |$\delta ^{\prime}=0$|.

This figure visualizes the idea behind the proof of Proposition 2, explained in Remark 3. As an example, consider the non-negative measure |$x=a_1\delta _{\theta _1}+a_2\delta _{\theta _2}$|. The locations of two impulses at |$\theta _1=(t_1,s_1)\in{\mathbb{I}}^2$| and |$\theta _2=(t_2,s_2)\in{\mathbb{I}}^2$| are shown with black dots in Fig. 1(a). For Program (1.5) to successfully recover |$x$| from the image |$y$| in (2.2), we should construct a non-negative polynomial |$Q$| of the form of (2.3) that has zeros exactly at |$\theta _1$| and |$\theta _2$|. We do so by combining a number of univariate polynomials in |$t$| and |$s$|. More specifically, consider a non-negative polynomial |$q_{t_1}(t)=\sum _{m=1}^M b^{11}_m \phi _m(t)$| that vanishes only at |$t_1$|. Likewise, consider similar non-negative polynomials |$q_{t_2}(t)$|, |$q_{s_1}(s)$| and |$q_{s_2}(s)$|, which are zero only at |$t_2$|, |$s_1$| and |$s_2$|, respectively. Figure 1(c) shows the zero set of the polynomial |$q_{t_1}(t)q_{s_2}(s)$| as the union of blue and red lines. Similarly, Fig. 1(b) shows the zero set of the polynomial |$q_{t_2}(t)q_{s_1}(s)$|. Note that the intersection of these two zero sets is exactly |$\{\theta _1,\theta _2\}$|. That is, |$Q(\theta ) = q_{t_1}(t)q_{s_2}(s)+q_{t_2}(t)q_{s_1}(s)$| is a non-negative polynomial of the form in (2.3) that has zeros only at |$\{\theta _1,\theta _2\}$|, as desired. It now only remains to construct the univariate polynomials |$q_{t_1},q_{t_2},q_{s_1},q_{s_2}$| described above. When the measurement functions |$\{\phi _m\}_{m=1}^M$| form a |$\mathscr{C}$|-system and |$M\geq 2K+1$|, the existence of these univariate polynomials follows from standard results. We note that the construction of |$Q$| in this example differs slightly from the proof of Proposition 2, in order to simplify the presentation.
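The zero-set argument above can be checked numerically with a toy example. Below we use the simplest non-negative univariate factors |$q_a(u)=(u-a)^2$| in place of polynomials in the span of the measurement functions; both this choice and the impulse locations are our own illustration.

```python
import numpy as np

# Toy check of the zero-set construction: Q vanishes if and only if both
# products vanish, and the two zero sets (unions of grid lines) intersect
# exactly at the impulse locations theta_1 and theta_2.
t1, s1 = 0.3, 0.6   # theta_1
t2, s2 = 0.7, 0.2   # theta_2

def Q(t, s):
    # Q = q_{t1}(t) q_{s2}(s) + q_{t2}(t) q_{s1}(s), non-negative by construction
    return (t - t1) ** 2 * (s - s2) ** 2 + (t - t2) ** 2 * (s - s1) ** 2

grid = np.linspace(0, 1, 201)
T, S = np.meshgrid(grid, grid, indexing="ij")
vals = Q(T, S)
zeros = np.argwhere(vals < 1e-12)
# The only grid points where Q vanishes are theta_1 and theta_2 themselves.
print([(grid[i], grid[j]) for i, j in zeros])
```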
2.2 Arbitrary measure with noise
In this section, we present the main result of this paper. Theorem 11 below generalizes Proposition 2 to account for ① model mismatch, where |$x$| is not necessarily a well-separated sparse measure but might be close to one, and ② imaging noise (|$\delta \ge 0$|). That is, Theorem 11 below addresses the stability of Program (1.5) to model mismatch and its robustness against imaging noise. Some preparation is necessary before presenting the result.
2.2.1 Separation.
Unlike sparse and noise-free super-resolution in Proposition 2, a notion of separation will play a role in Theorem 11.
For example, for |$x$| in Fig. 1, we have |$\textrm{sep}(x)=\min (t_2-t_1,s_2-s_1,t_1,1-t_2,s_1,1-s_2)$|. We remark that our notion of separation is more restrictive than the one commonly used in the super-resolution literature [12, 28, 29], as it requires the point sources to be separated in both |$t$| and |$s$| directions by at least |$\nu $|.
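The two-impulse formula above suggests a simple computation. The sketch below generalizes it to |$K$| impulses as the minimum over consecutive gaps along each axis together with the margins to the boundary of |${\mathbb{I}}=[0,1]$|; this reading, and the helper name `sep`, are ours.

```python
import numpy as np

# Separation of an atomic measure on [0,1]^2, following the two-impulse
# formula in the text: impulses must be separated along BOTH axes, and
# also stay away from the boundary of the unit square.
def sep(points):
    t = np.sort([p[0] for p in points])
    s = np.sort([p[1] for p in points])
    gaps = np.concatenate([np.diff(t), np.diff(s)])     # axis-wise gaps
    edges = [t[0], 1 - t[-1], s[0], 1 - s[-1]]          # boundary margins
    return float(np.min(np.concatenate([gaps, edges])))

# Two impulses: sep = min(t2-t1, s2-s1, t1, 1-t2, s1, 1-s2).
print(sep([(0.2, 0.3), (0.6, 0.8)]))   # -> 0.2
```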
2.2.2 Generalized Wasserstein distance.
Compared with (2.5), the two new terms in (2.7) gauge the difference between the mass of |$x_1$| and the mass of |$x_2$|. Our choice of error metric |$d_{\textrm{GW}}$| is natural in the sense that any solution of Program (1.5) is itself a measure. However, note that controlling the error in the space of measures in our main result below does not immediately translate into controlling the error in the location of point sources, which may be estimated by, for example, applying Prony’s method [17] to a solution of Program (1.5).
For instance, when |$\varepsilon $| is infinitesimally small, the measure |$\delta _{1/2}+\delta _{1/2+\varepsilon }$| has a small distance in the Wasserstein metric from the measure |$2\delta _{1/2}$|, but it is indeed impossible to distinguish the two impulses in the presence of noise in general, unless we impose additional structure on the noise. Nevertheless, in the limit of vanishing noise, it is possible to directly control the error in the location of point sources and we refer the reader to [33–35] for the details.
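The example above is easy to reproduce numerically. The sketch below computes the (ordinary) Wasserstein distance between |$\delta _{1/2}+\delta _{1/2+\varepsilon }$| and |$2\delta _{1/2}$|; since the two measures have equal total mass, the mass-difference terms of |$d_{\textrm{GW}}$| vanish and this is an illustration only, not a general |$d_{\textrm{GW}}$| solver. Note that SciPy normalizes the weights to probability measures, so we rescale by the total mass.

```python
from scipy.stats import wasserstein_distance

# W1 between delta_{1/2} + delta_{1/2+eps} and 2*delta_{1/2}:
# scipy works with normalized weights, so multiply by the total mass 2.
for eps in [1e-1, 1e-2, 1e-3]:
    w = 2 * wasserstein_distance(
        [0.5, 0.5 + eps], [0.5], u_weights=[1, 1], v_weights=[2]
    )
    print(eps, w)   # distance equals eps: only one unit of mass moves by eps
```

As claimed, the distance vanishes with |$\varepsilon $|, even though the two measures have different numbers of impulses.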
2.2.3 Model mismatch.
Our main result, Theorem 11 below, bounds the recovery error |$d_{\textrm{GW}}(x,\widehat{x})$|, where |$\widehat{x}$| is a solution of Program (1.5). Note that, even though |$x$| is an arbitrary non-negative measure in this section, it can always be approximated with a well-separated sparse measure, up to some error with respect to the metric |$d_{\textrm{GW}}$|. This sparse measure will play a key role in Theorem 11.
In words, the residual |$R(x,K,\varepsilon )$| can be thought of as the mismatch in modeling |$x$| with a well-separated sparse measure. Indeed, note that the minimum in (2.8) is achieved: we can limit the search in (2.8) to the (bounded) set of |$K$|-sparse and |$\varepsilon $|-separated measures with TV norm bounded by |$\|x\|_{\textrm{TV}}$|. This set is also closed with respect to the weak topology induced by |$d_{\textrm{GW}}$|, see [30, Theorem 13], and thus compact. Lastly, the objective function of (2.8) is continuous with respect to |$d_{\textrm{GW}}$|, hence the claim.
2.2.4 Smoothness.
For Program (1.5) to succeed in the general settings of this section, we also impose additional requirements on the imaging apparatus in the next two paragraphs. We assume in this section that the imaging apparatus is smooth in the following sense.
It is often not difficult to verify the Lipschitz-continuity of |$\varPhi $| with respect to |$d_{\textrm{GW}}$|, as the following example demonstrates.
2.2.5 |$\mathscr{C}^*$|-system.
To study the stability of Program (1.5), we also need to modify the notion of |$\mathscr{C}$|-system in Definition 1. We begin with the definition of an admissible sequence, visualized in Fig. 2.

This figure illustrates an example of an admissible sequence; see Definition 8. For a fixed |$n$|, the red dots form the increasing sequence |$\{\tau _k^n\}^{M}_{k=0}$|. Note that the endpoints of the interval are included in the sequence, i.e., |$\tau _0^n = 0$| and |$\tau ^n_M=1$|. In view of Definition 8, as |$n\rightarrow \infty $|, the sequence |$\{\tau _k^n\}_{k=1}^{M-1}$| converges to three distinct points on the interior of the interval, shown with blue bars. Moreover, all limit points in |$(0,1)$| have an even multiplicity except for one, which has a multiplicity of exactly one.
Definition 8 (|$(K,\varepsilon )$|-admissible sequence). For a pair of integers |$K$| and |$M$| obeying |$M\ge 2K+1$|, we say that |$\{\{\tau _{k}^{n}\}_{k=0}^M\}_{n\ge 1}\subset{\mathbb{I}}$| is a |$(K,\varepsilon )$|-admissible sequence if
1. |$\tau _{0}^{n}=0$| and |$\tau _{M}^{n} = 1$| for every |$n$|, i.e., the endpoints of |${\mathbb{I}}=[0,1]$| are included in the increasing sequence |$\{\tau _k^n\}_{k=0}^M$|, for every |$n$|;
2. as |$n\rightarrow \infty $|, the increasing sequence |$\{\tau _{k}^{n}\}^{M-1}_{k=1}$| converges (element-wise) to an |$\varepsilon $|-separated finite subset of |${\mathbb{I}}$| with at most |$K$| distinct points, where every element has an even multiplicity, except one element that appears only once.2
Definition 9 (|$\mathscr{C}^*$|-system). For an integer |$K$| and an even integer |$M$| obeying |$M\ge 2K+2$|, real-valued functions |$\{\phi _m\}_{m=0}^M$| form a |$\mathscr{C}^*_{K,\varepsilon }$|-system on |${\mathbb{I}}$| if every |$(K,\varepsilon )$|-admissible sequence |$\{\{\tau _{k}^n\}_{k=0}^{M}\}_{n\ge 1}$| satisfies
1. the determinant of the |$(M+1)\times (M+1)$| matrix |$[\phi _{m}(\tau _{k}^n)]_{k,m=0}^M $| is positive for all sufficiently large |$n$|;
2. moreover, all minors along the |$\underline{l}$|th row of the matrix |$[\phi _{m}(\tau _{k}^n)]_{k,m=0}^M$| approach zero at the same rate as |$n\rightarrow \infty $|. Here, |$\underline{l}$| is the index of the element of the limit sequence that appears only once.3
① Note that a |$\mathscr{C}^*_{K,\varepsilon }$|-system on |${\mathbb{I}}$| is also a |$\mathscr{C}^*_{K^{\prime},\varepsilon }$|-system for every integer |$K^{\prime}\le K$|. Indeed, this claim follows from the observation that every |$(K^{\prime},\varepsilon )$|-admissible sequence is itself a |$(K,\varepsilon )$|-admissible sequence. ② Moreover, if |$\{\phi _m\}_{m=0}^M$| form a |$\mathscr{C}^*_{K,\varepsilon }$|-system on |${\mathbb{I}}$|, then so do the scaled functions |$\{c_m \phi _m\}_{m=0}^M$| for positive constants |$\{c_m\}_{m=0}^M$|.
We remark that Definition 9 only considers admissible sequences in order to ease the burden of verifying whether a family of functions forms a |$\mathscr{C}^*$|-system.
To summarize, the widely used notion of |$\mathscr{C}$|-system in Definition 1 plays a key role in the analysis of sparse inverse problems in the absence of noise, whereas |$\mathscr{C}^*$|-system above was introduced in [1] and tailored for the stability analysis of sparse inverse problems.4 It was established in [1] that translated copies of the Gaussian window |$e^{-t^2}$| indeed form a |$\mathscr{C}^*$|-system, under mild conditions reviewed in Section 2.3 below. We suspect this to also hold for many other measurement windows with sufficiently fast decay.5
2.2.6 Main result.
We are now ready to present the main result of this paper, which quantifies the performance of Program (1.5) in the general case where |$x$| is an arbitrary non-negative measure on |${\mathbb{I}}^2$| and in the presence of additive noise. Theorem 11, proved in Section 4.4, states that Program (1.5) approximately recovers |$x$| provided that certain |$\mathscr{C}$|- and |$\mathscr{C}^*$|-systems exist. As an example of this general result, Section 2.3 later specializes Theorem 11 for imaging under Gaussian blur.
Theorem 11 (Arbitrary measure with noise). Consider a non-negative measure |$x$| supported on |${\mathbb{I}}^2$|. Consider also a noise level |$\delta \ge 0$|, the measurement functions |$\{\phi _m\}_{m=1}^M$| and the image |$y\in \mathbb{R}^{M\times M}$|; see (1.3) and (1.4). We assume that the imaging apparatus is |$L$|-Lipschitz in the sense of (2.9).
For an integer |$K$| and |$\varepsilon \in (0,1/2]$|, let |$x_{K,\varepsilon }$| be a |$K$|-sparse and |$\varepsilon $|-separated non-negative measure on |${\mathbb{I}}^2$| that approximates |$x$| with the residual of |$R(x,K,\varepsilon )$|, in the sense of (2.8). In particular, let |$\varTheta =\{\theta _k\}_{k=1}^K =\{(t_k,s_k)\}_{k=1}^K\subset \operatorname{interior}({\mathbb{I}}^2)$| denote the support of |$x_{K,\varepsilon }$|, and set |$T=\{t_k\}_{k=1}^K$| and |$S=\{s_k\}_{k=1}^K$| for short.
The error bound in (2.13) holds if |$M\ge 2K+2$| and
1. |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|,
2. |$\{F_{T_\varOmega }\}\cup \{\phi _{m}\}_{m=1}^M$| and |$\{F_{S_\varOmega }\}\cup \{\phi _{m}\}_{m=1}^M$| both form |$\mathscr{C}^*_{K,\varepsilon }$|-systems on |${\mathbb{I}}$| for every |$\varOmega \subseteq [K]$|,
3. |$\{F_{t_k}^+\}\cup \{\phi _{m}\}_{m=1}^M$| and |$\{F_{t_k}^-\}\cup \{\phi _{m}\}_{m=1}^M$| both form |$\mathscr{C}^*_{K,\varepsilon }$|-systems on |${\mathbb{I}}$| for every |$k\in [K]$|,
4. |$\{F_{s_k}^+\}\cup \{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}^*_{K,\varepsilon }$|-system on |${\mathbb{I}}$| for every |$k\in [K]$|.
An example of the functions in Theorem 11 appears in Fig. 3(e), where the purple graph shows |$F_{T_\varOmega }$| for |$\varOmega = \{t_1\}$|, denoted in the figure by |$F_{t_1}$| for brevity. Theorem 11 for image super-resolution is unique in a number of ways. The differences with prior work are further discussed in Section 3 and also summarized here. First, Theorem 11 applies to arbitrary measures, not only atomic ones. In particular, for atomic measures, no minimum separation or limit on the density of impulses is imposed, in contrast to earlier results [2, 13, 14, 34].
This figure visualizes some of the principal ideas behind the proof of Theorem 11, explained in Remark 12. As an example, consider the non-negative measure |$x=a_1\delta _{\theta _1}+a_2 \delta _{\theta _2}$|. The locations of two impulses at |$\theta _1=(t_1,s_1)\in{\mathbb{I}}^2$| and |$\theta _2 = (t_2,s_2)\in{\mathbb{I}}^2$| are shown with white dots, and the black lines show the corresponding grid. For Program (1.5) to approximately recover |$x$| from the noisy image |$y$| given in (1.2), we need to construct a non-negative polynomial |$Q$| of the form in (2.3) that is zero at the impulse locations and large away from the impulses, to ensure stability. That is, we need |$Q(\theta _1)=Q(\theta _2)=0$| and also |$Q(\theta )\ge \overline{g}>0$| when |$\theta $| is far from both |$\theta _1,\theta _2$|, for a positive scalar |$\overline{g}$|. This lower bound for the polynomial |$Q$| is shown in Fig. 3(a) as a heat map, with warmer colors corresponding to larger values, i.e., blue corresponds to zero and green corresponds to |$\overline{g}$|. Let us first express this lower bound on |$Q$| in terms of univariate functions. To that end, consider a function |$F_{t_1}(t)$| that is zero near and equal to |$\sqrt{\overline{g}}$| away from |$t_1$|. Likewise, consider similar non-negative functions |$F_{t_2}(t),F_{s_1}(s),F_{s_2}(s)$| that are zero near and equal to |$\sqrt{\overline{g}}$| away from |$t_2,s_1,s_2$|, respectively. Figure 3(b) shows the heat map of |$F_{t_1}(t)F_{s_2}(s)$|, Fig. 3(c) shows the heat map of |$F_{t_2}(t)F_{s_1}(s)$| and lastly Fig. 3(d) shows the heat map of their sum, i.e., |$G(\theta ):= F_{t_1}(t)F_{s_2}(s)+ F_{t_2}(t)F_{s_1}(s)$|. Note that |$G$| is zero near and larger than |$\overline{g}$| away from the impulse locations |$\theta _1,\theta _2$|, as desired. 
It only remains to construct the univariate polynomials |$q_{t_1},q_{t_2},q_{s_1},q_{s_2}\in \textrm{span}(\phi _1,\cdots ,\phi _M)$| that satisfy the inequalities |$q_{t_1}\ge F_{t_1}, q_{t_2}\ge F_{t_2}, q_{s_1}\ge F_{s_1}, q_{s_2}\ge F_{s_2}$|, with equality at |$t_1,t_2,s_1,s_2$|, respectively. Under the assumptions of Theorem 11, the existence of these univariate polynomials follows from [1]. In this fashion, we finally obtain a non-negative polynomial |$Q(\theta ) = q_{t_1}(t)q_{s_2}(s)+q_{t_2}(t)q_{s_1}(s)$| that is zero at the impulse locations and larger than |$\overline{g}$| away from the impulses, as desired. For example, for the Gaussian window detailed in Section 2.3 with the standard deviation |$\sigma =0.2$|, Fig. 3(e) shows |$q_{t_1}(t)$| and Fig. 3(f) shows the heat map of the dual certificate |$Q(\theta )$|, both in logarithmic scale. Yet another polynomial |$Q^0$| is needed to control the recovery error near the impulses and thus complete the proof of Theorem 11; see Section 4.4.
Moreover, Theorem 11 addresses both noise and model mismatch in image super-resolution. Indeed, even in the 1-D case, stability was identified as a technical obstacle in earlier work [2]. In addition, the recovery error in Theorem 11 is quantified with a natural metric between measures, i.e., the generalized Wasserstein metric, in contrast to prior work; see for example [37] that separately studies the error near and away from the impulses. Lastly, the measurement functions |$\{\phi _m\}_m$| are required to be continuous rather than (several times) differentiable [2, 34]. All this is achieved without the need to explicitly regularize for sparsity in Program (1.5).
Note also that, in practice, we often have an upper bound for the noise level |$\delta $| and the model mismatch |$R(x,K,\varepsilon )$|, which would allow us to apply Theorem 11. This approach to quantifying stability against noise and model mismatch is common in model-based signal processing [38]. Several additional remarks are in order about Theorem 11.
Remark 12 (Proof technique). For Program (1.5) to successfully recover a sparse measure in the absence of noise, we constructed a non-negative polynomial |$Q(\theta )$|, within the span of the measurement functions, which vanished only at the impulse locations |$\varTheta =\{\theta _k\}_{k=1}^K=\{(t_k,s_k)\}_{k=1}^K$|; see the discussion after Proposition 2. For approximate recovery in the presence of model mismatch and noise, we need to construct a non-negative polynomial |$Q(\theta )$| that is bounded away from zero far from the impulse locations |$\varTheta $|, i.e., |$Q(\theta )\ge \overline{g}>0$| for every |$\theta $| far from |$\varTheta $|, where |$\overline{g}$| is a positive scalar. Letting |$T=\{t_k\}_{k=1}^K$| and |$S=\{s_k\}_{k=1}^K$| for short, the proof of Theorem 11 constructs |$Q$| by combining certain univariate polynomials, similar to the proof of Proposition 2, which was summarized earlier in Section 2.1 and illustrated in Fig. 1. Among these univariate polynomials, for example, the proof constructs a non-negative polynomial |$q_T$| such that |$q_T(t)\ge 1$| for every |$t$| far from |$T$|. As shown in [1], such a univariate polynomial |$q_T$| exists if |$\{F_T\}\cup \{\phi _m\}_{m=1}^M$| form a |$\mathscr{C}^*$|-system. In addition to |$Q$|, we also find it necessary to construct yet another non-negative polynomial |$Q^0$| to control the recovery error near the impulse locations and thus complete the proof of Theorem 11; see Section 4.4 for more details. Figure 3 illustrates some of the key ideas in the proof of Theorem 11.
The bound on the recovery error |$d_{\textrm{GW}}(x,\widehat{x})$| in (2.13) depends on the noise level |$\delta $| and on how well |$x$| can be approximated with a well-separated sparse measure. More specifically, for any |$\varepsilon \in (0,1/2]$|, |$x$| can be approximated with a |$K$|-sparse and |$\varepsilon $|-separated measure |$x_{K,\varepsilon }$|, with the residual of |$R(x,K,\varepsilon )$|; see (2.8). We might then think of a solution |$\widehat{x}$| of Program (1.5) as an estimate for |$x_{K,\varepsilon }$| and therefore an estimate for |$x$|, up to the residual |$R(x,K,\varepsilon )$|. Both the separation |$\varepsilon $| and the residual |$R(x,K,\varepsilon )$| appear on the right-hand side of the error bound (2.13).
In particular, when |$\delta =\varepsilon = R(x,K,\varepsilon )= 0$|, we again obtain Proposition 2 for recovery of a |$K$|-sparse non-negative measure in the absence of noise. Note that, given a noise level |$\delta $|, this work does not address the challenging problem of choosing the separation |$\varepsilon $| in order to minimize the right-hand side of (2.13). Intuitively, for large |$\delta $|, we must choose the separation |$\varepsilon $| large enough to maintain stability against the large noise level. In turn, a large |$\varepsilon $| leads to a large residual |$R(x,K,\varepsilon )$|; see (2.8). The correct balance between |$\varepsilon $| and |$R(x,K,\varepsilon )$| depends on the particular choice of the measurement functions |$\{\phi _m\}_{m}$| and is beyond the scope of this paper.
Theorem 11 applies to any non-negative measure |$x$|. In particular, when |$x$| is an atomic measure, Theorem 11 applies regardless of the separation between the impulses present in the measure |$x$|. However, it is crucial to note that the recovery error |$d_{\textrm{GW}}(x,\widehat{x})$| does indeed depend on the separation of |$x$|.
2.3 Example with Gaussian window
Suppose first that there is no imaging noise and, consequently, |$y_{m,n}=(g_2\star x)(\theta ^{\prime}_{m,n})$| is the |$(m,n)$|th pixel of the image, for every |$m,n\in [M]$|. The Gaussian windows |$\{\phi _m\}_{m=1}^M$|, specified in (2.16), form a |$\mathscr{C}$|-system on |${\mathbb{I}}$| for arbitrary |$T^{\prime}\subset{\mathbb{I}}$|; see for instance [24, Example 5]. Therefore, in view of Proposition 2, the measure |$x$| is the unique solution of Program (1.5) with |$\delta ^{\prime}=0$|, provided that |$M\ge 2K+1$|. This simple argument should be contrasted with the elaborate proofs of earlier 1-D results, for example Theorem 1.3 in [2].
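To make the separable measurement model concrete, the noiseless image can be simulated as follows. This is only an illustrative sketch: the sample grid, the Gaussian width |$\sigma $| and the source configuration are hypothetical choices, not the quantities specified in (2.16).

```python
import numpy as np

# Illustrative simulation of the noiseless separable Gaussian model:
# the (m, n)th pixel is y[m, n] = sum_k a_k phi_m(t_k) phi_n(s_k),
# i.e. the point spread function splits into horizontal and vertical parts.
sigma = 0.1                              # assumed Gaussian width
M = 9                                    # number of samples per axis
grid = np.linspace(0.0, 1.0, M)          # assumed sampling points t'_m

def phi(u, centers):
    # Translated Gaussian windows phi_m(u) = exp(-(u - t'_m)^2 / (2 sigma^2)).
    return np.exp(-(u[:, None] - centers[None, :])**2 / (2 * sigma**2))

amps = np.array([1.0, 0.5, 2.0])         # positive amplitudes a_k
t = np.array([0.2, 0.55, 0.8])           # horizontal locations t_k
s = np.array([0.3, 0.7, 0.45])           # vertical locations s_k

Phi_t = phi(t, grid)                     # K x M matrix with entries phi_m(t_k)
Phi_s = phi(s, grid)                     # K x M matrix with entries phi_n(s_k)
y = Phi_t.T @ np.diag(amps) @ Phi_s      # M x M noiseless image

assert y.shape == (M, M) and np.all(y >= 0)
```

The rank-one structure |$y=\sum _k a_k\, \phi (t_k)\phi (s_k)^{\top}$| is precisely the separability used throughout the analysis.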
In the presence of noise, i.e., when |$\delta> 0$|, [1, Lemma 23] establishes that all the families of functions in Theorem 11 are indeed |$\mathscr{C}^*_{K,\varepsilon }$|-systems on |${\mathbb{I}}$|, provided that the endpoints of |${\mathbb{I}}$| are included in the sampling points |$T^{\prime}$|. In fact, [1, Section 1.2] goes further and also evaluates the factors involved in the error bound for 1-D super-resolution, although arguably the result is suboptimal and there is room for improvement. In principle, those results could in turn be used to evaluate |$c_1,c_2,c_3$| in the error bound of Theorem 11, a direction which is not pursued here. The conclusion of this section is recorded below.
3. Related work
The current wave of super-resolution research using convex optimization began with the two seminal papers of Candès and Fernandez-Granda [12, 27]. In those papers, the authors showed that a convex program with a sparsity-promoting regularizer stably recovers a complex atomic measure from the low-end of its spectrum. This holds true if the minimal separation between any two spikes is inversely proportional to the maximal measured frequency, i.e., the ‘bandwidth’ of the sensing mechanism. Many papers extended this fundamental result to randomized models [29], support recovery analysis [37, 39–42], denoising schemes [43, 44], different geometries [45–49] and the incorporation of prior information [50]. Most of these works generalize readily to multidimensional signals. In addition, special attention was given to multidimensional signals in a variety of papers; see for instance [51, 52].
The separation condition above is unnecessary for non-negative measures, and this is the important regime on which this paper and most of this review focus. There are a number of works that study non-negative sparse super-resolution for atomic measures supported on a grid. In [13, 14], it was shown that for such 1-D or 2-D signals, stable reconstruction is possible without imposing a separation condition, but instead requiring a milder condition on the density of the impulses. In particular, the error grows exponentially fast as the density of the spikes increases. A similar result was derived for signals on the sphere [53].
In this paper, we focus on the grid-free setting [54, 55] in which the non-negative measure is not necessarily supported on a predefined grid. This is the most general regime and requires more advanced machinery and algorithms. In [2], it was shown that, in the absence of noise, a convex program with a TV regularizer can recover—without imposing any separation—an atomic measure on the real line. The same holds on other geometries as well [45, Section 5]. However, all these results come without stability guarantees, assume a differentiable point spread function and make use of a TV regularizer to promote sparsity. Our Proposition 2 and Theorem 11 address all these shortcomings and solve the sparse (grid-free) image super-resolution problem in its most general form. The leap from the 1-D results of [1] to 2-D requires new techniques since the key technical ingredient, i.e., |$\mathscr{C}$|-systems, does not naturally extend to higher dimensions.
Let us add that the low-noise regime for positive 1-D super-resolution was studied in [20]. There, it was shown that a convex program with a sparsity-promoting regularizer results in the same number of spikes as the original measure when the noise level is small. Furthermore, the solution converges to the underlying positive measure if the signal-to-noise ratio scales like |$\mathscr{O}(1/\textrm{sep}^2)$|, where |$\textrm{sep}$| is the minimal separation between adjacent spikes. In contrast to our work, the framework of [20] builds upon a smooth convolution kernel and uses a sparsity-promoting regularizer, rather than the feasibility problem considered in Program (1.5). In [33], it was shown that the 2-D version of the same program enjoys similar properties for a pair of spikes.
Going back to signed measures, another line of work is based on various generalizations of Prony’s method [56], which encodes the support of the measure as zeros of a designed polynomial. Such generalizations include methods such as MUSIC [57], Matrix Pencil [18] and ESPRIT [58], to name a few. In 1-D and in the absence of noise, these methods are guaranteed to achieve exact recovery for a complex measure without enforcing any separation. This is not true for convex programs, for which separation is a necessary condition; see [59]. For convex programs, separation is unnecessary only in the case of non-negative measures, like the model considered in this paper. The stability analysis of some of these methods, under a separation condition, is found in [60–65]. However, their extension to 2-D is not trivial and, accordingly, different methods were proposed [66–70]. To the best of our knowledge, the stability of these algorithms in two or higher dimensions is not understood. That being said, we do not claim that convex programs are numerically superior to the Prony-like techniques, and we leave a comprehensive numerical study for future research.
4. Theory
4.1 Notation
4.2 Proof of Proposition 2 (sparse measure without noise)
The following standard result is an immediate extension of [1, Lemma 9] and, roughly speaking, states that Program (1.5) is successful if a certain dual certificate |$Q$| exists.
Let |$x$| be a |$K$|-sparse non-negative atomic measure supported on |$\varTheta \subset \operatorname{interior}({\mathbb{I}}^2)$|; see (2.1). Then, |$x$| is the unique solution of Program (1.5) with |$\delta ^{\prime}=0$| if
• the |$M^{2}\times K$| matrix |$ \left [\phi _{m}(t_{k})\phi _{n}(s_{k})\right ]_{m,n,k=1}^{m=M,n=M,k=K} $| has full column rank, and
• there exist real coefficients |$\{b_{m,n}\}_{m,n=1}^M$| and a polynomial |$Q(t,s)=\sum _{m,n=1}^M b_{m,n}\phi _m(t)\phi _n(s)$| such that |$Q$| is non-negative on |$\operatorname{interior}({\mathbb{I}}^2)$| and vanishes only on |$\varTheta $|.
Let |$x$| be a |$K$|-sparse non-negative atomic measure supported on |$\operatorname{interior}({\mathbb{I}}^2)$|. For |$M\ge 2K+1$|, suppose that |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|. Then, the dual certificate |$Q$| prescribed in Lemma 15 exists.
Combining Lemmas 15 and 16 completes the proof of Proposition 2.
The technical challenge in the proof of Lemma 16 is that |$\mathscr{C}$|-systems do not generalize to two dimensions. Indeed, to prove our claim, we effectively reduce the construction of the dual polynomial |$Q(\theta )$| with |$\theta =(t,s)$| to the construction of a number of univariate polynomials in |$t$| and |$s$|. The key observation of the proof is the following. Suppose for simplicity that |$\phi _1 \equiv 1$|. Recall that |$\varTheta = \{\theta _k\}_{k=1}^K=\{(t_k,s_k)\}_{k=1}^K$| are the impulse locations, and let |$T=\{t_{k}\}_{k=1}^K$|, |$S=\{s_{k}\}_{k=1}^K$| for short, so that |$\varTheta \subseteq (T\times S)$|. Suppose that a univariate polynomial |$q_{T}(t)$| in the span of |$\{\phi _m(t)\}_{m=1}^M$| is non-negative on |${\mathbb{I}}$| and vanishes only on |$T$|. Similarly, consider a polynomial |$q_{S}(s)$| in the span of |$\{\phi _n(s)\}_{n=1}^M$| that is non-negative on |${\mathbb{I}}$| and vanishes only on |$S$|. Then, the polynomial |$Q(\theta )=q_{T}(t)+q_{S}(s)$| is non-negative on |${\mathbb{I}}^{2}$| and vanishes only on |$T\times S$|.
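This observation is easy to check numerically. In the sketch below, the squared-product polynomials are illustrative stand-ins (any non-negative functions vanishing exactly on |$T$| and |$S$| would serve), not the |$\mathscr{C}$|-system polynomials used in the actual proof, and the points themselves are arbitrary.

```python
import numpy as np

T = np.array([0.2, 0.5, 0.8])   # arbitrary illustrative points
S = np.array([0.3, 0.6, 0.9])

def q(u, roots):
    # Non-negative univariate stand-in vanishing exactly on `roots`.
    return float(np.prod((u - roots)**2))

def Q(t, s):
    # Sum of two univariate polynomials: non-negative on the square,
    # vanishing exactly on the product grid T x S.
    return q(t, T) + q(s, S)

assert all(Q(tk, sl) == 0.0 for tk in T for sl in S)   # zero on all of T x S
assert Q(0.4, 0.7) > 0                                  # positive off the grid
```

Note that |$Q$| vanishes on the whole grid |$T\times S$|, including the points |$(t_k,s_l)$| with |$k\ne l$|; removing these unwanted zeros is exactly the issue taken up next.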
In general |$\varTheta \subset (T\times S)$| and, consequently, the above |$Q$| will have unwanted zeros on |$(T\times S)\backslash \varTheta $|. This issue may be addressed by replacing the |$M^2\times K$| matrix in Lemma 15 with a larger matrix of size |$M^2\times K^2$|. Alternatively, we use in the proof a more nuanced argument to construct a different polynomial |$Q$| that vanishes exactly on |$\varTheta $|, without the unwanted zeros on |$(T\times S) \backslash \varTheta $|; see Appendix A for details. This more nuanced argument also lends itself naturally to noisy super-resolution, as we will see later.
4.3 Geometric intuition for Proposition 2
Proposition 2 states that the imaging operator |$\varPhi $| in (1.4) is injective on all |$K$|-sparse non-negative measures (such as |$x$|) provided that we take enough observations (|$M\ge 2K+1$|) and the measurement functions |$\{\phi _m\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|. Here, we provide some geometric intuition about the role of the dual certificate in the proof of Proposition 2.

This figure complements Section 4.3. Figure 4(a) shows the conic hull |$\mathscr{C}$| of the range of the imaging operator |$\varPhi :{\mathbb{I}}^2\rightarrow \mathbb{R}^{M\times M}$|. Note that the image |$y$| in (2.2) belongs to the cone |$\mathscr{C}$|; see (4.6). For Program (1.5) to successfully recover the true measure |$x$| from the image |$y$|, it suffices that |$\{\varPhi (\theta _k)\}_{k=1}^K$| form a |$K$|-dimensional exposed face of the cone |$\mathscr{C}$|, where |$\{\theta _k\}_{k=1}^K$| is the support of the measure |$x$|. This face is shown in green. In words, this condition is sufficient for |$x$| to be the unique solution of Program (1.5). Equivalently, it suffices to find a hyperplane, with normal vector |$b$|, that strictly supports |$\mathscr{C}$| on this face. This latter condition can be interpreted as finding a non-negative polynomial in the span of |$\{\phi _m(t)\phi _n(s)\}_{m,n=1}^M$| with zeros exactly on the support of the measure |$x$|.
4.4 Proof of Theorem 11 (arbitrary measure with noise)
In this section, we will prove the main result of this paper, i.e., Theorem 11. For an integer |$K$| and |$\varepsilon \in (0,1/2]$|, let |$x_{K,\varepsilon }$| be a |$K$|-sparse and |$\varepsilon $|-separated non-negative measure on |${\mathbb{I}}^2$| that approximates |$x$| with residual |$R(x,K,\varepsilon )$|, in the sense of (2.8). Let |$\varTheta =\{\theta _k\}_{k=1}^K=\{(t_k,s_k)\}_{k=1}^K \subset \operatorname{interior}({\mathbb{I}}^2)$| be the support of |$x_{K,\varepsilon }$|, and set |$T=\{t_k\}_{k=1}^K$| and |$S=\{s_k\}_{k=1}^K$| for short. Consider also the neighborhoods |$\{\theta _{k,\varepsilon }\}_{k=1}^K\subseteq{\mathbb{I}}^2$| and |$\varTheta _{\varepsilon }=\cup _{k=1}^K \theta _{k,\varepsilon }$| defined in (4.3). Let |$\{\theta _{k,\varepsilon }^{C}\}_{k=1}^K$| and |$\varTheta _{\varepsilon }^{C}$| denote the complements of these sets with respect to |${\mathbb{I}}^{2}$|.
4.4.1 Dual certificates.
Lemmas 18 and 19 below show that Program (1.5) stably recovers the atomic measure |$x_{K,\varepsilon }$| in the presence of noise, provided that certain dual certificates exist. The proofs, which appear in Appendices B and C, are standard and Lemmas 18 and 19 below are extensions of Lemmas 16 and 17, respectively, in [1]. In particular, Lemma 18 below controls the recovery error away from the support |$\varTheta $| of |$x_{K,\varepsilon }$|, whereas Lemma 19 controls the error near the support. The latter features an approximate dual certificate and shares some broad similarities with [72].
By combining Lemmas 18 and 19, the next result bounds the error |$d_{\textrm{GW}}(x_{K,\varepsilon },\widehat{x})$|. The proof is omitted as the steps are identical to those taken in [1, Lemma 18].
4.4.2 Existence of the dual certificates.
To construct the dual certificates required in Lemmas 18 and 19, some preparation is necessary. For notational convenience, throughout we model |$x_{K,\varepsilon }$| by (2.1), with |$x$| therein replaced with |$x_{K,\varepsilon }$|. Then, recall from Section 4.1 that |$\varTheta =\{\theta _{k}\}_{k=1}^K=\{(t_{k},s_{k})\}_{k=1}^K$| are the impulse locations of |$x_{K,\varepsilon }$|, and let |$T=\{t_{k}\}_{k=1}^K$| and |$S=\{s_{k}\}_{k=1}^K$| for short. In particular, note that |$\varTheta \subseteq T\times S$|.
For |$M\ge 2K+2$|, suppose that |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|. For every |$k\in [K]$|, suppose also that |$\{F_{t_k}^+\}\cup \{\phi _{m}\}_{m=1}^M$|, |$\{F_{t_k}^{-}\}\cup \{\phi _{m}\}_{m=1}^M$| and |$\{F_{s_k}^{+}\}\cup \{\phi _{m}\}_{m=1}^M$| are all |$\mathscr{C}^*_{K,\varepsilon }$|-systems on |${\mathbb{I}}$|; see (4.19). Then, the dual certificate |$Q^{0}$|, as specified in Lemma 19, exists with |$\alpha =\alpha (\varepsilon )= 1- 1/q_{\max }^{\pi _*}(\varepsilon )$|, where |$q_{\max }^{\pi _*}(\varepsilon )$| is defined in (E2) and (E7). In particular, |$ \alpha (0) = 0$|.
4.4.3 Completing the proof of Theorem 11
5. Perspective
In this paper, we have shown that a simple convex feasibility program is guaranteed to robustly recover a sparse (non-negative) image in the presence of model mismatch and additive noise, under certain conditions on the imaging apparatus. No sparsity-promoting regularizer or separation condition is needed, and the techniques used here are arguably simple and intuitive. In other words, we have described when the imaging apparatus acts as an injective map over all sparse images and when we can stably find its inverse. These results build upon and extend a recent paper [1] which focuses on 1-D signals. The extension to images, however, requires novel constructions of interpolating polynomials, called dual certificates. In practice, many super-resolution problems appear in even higher dimensions. While we believe that similar results hold in any dimension, this remains to be proven. Similarly, the super-resolution problem was studied in different non-Euclidean geometries for complex measures and under a separation condition [45–48, 73]. It would be interesting to examine whether our results, which are based on the properties of Chebyshev systems, extend to these non-trivial geometries and to manifolds in general.
Verifying the conditions on the window for stable recovery in Theorem 11 is rather laborious. As an example, we have shown that the Gaussian window, a ubiquitous model of convolution kernels, satisfies those conditions. It is important to identify other such admissible windows and, if possible, simplify the conditions on the window in Theorem 11. Another interesting research direction is deriving the optimal separation |$\varepsilon $| (as a function of the noise level |$\delta $|) that minimizes the right-hand side of the error bound in (2.13). Such a result would provide the tightest error bound for Program (1.5).
This work has focused solely on the theoretical performance of Program (1.5). It is especially important to understand, even numerically, the pros and cons of the different localization algorithms suggested in the literature. For instance, it would be interesting to investigate whether a sparsity-promoting regularizer, albeit not necessary for our analysis of non-negative measures, reduces the recovery error.
Acknowledgements
AE would like to thank Jared Tanner and Bogdan Toader for their insightful feedback. TB would like to thank Amit Singer for his support and Nir Sharon for helpful discussions on Chebyshev systems. We are deeply indebted to the anonymous reviewers of this work for their careful and detailed comments. We would also like to sincerely thank Jean-Baptiste Seby for spotting an error in the earlier version of this manuscript.
Funding
Alan Turing Institute (EP/N510129/1 to A.E.); Turing Seed Funding (SF019 to A.E.); National Science Foundation (CCF-1704204 to G.T.); Defense Advanced Research Projects Agency Lagrange Program (N660011824020 to G.T.).
Data Availability Statement
No new data were generated or analyzed in support of this review.
Footnotes
It is also common to use T-system as an abbreviation for Chebyshev system.
That is, every element is repeated an even number of times (|$2,4,\cdots $|) except one element that appears only once.
A non-negative sequence |$\{u_n\}_{n\ge 1}$| approaches zero at the rate |$n^{-p}$| if |$u_n=\varTheta (n^{-p})$|. See, for example, [36, p. 44].
Let us point out that we use the shorthand of |$\mathscr{C}^*$|-system here instead of T|$^*$|-system used in [1].
The definition of |$\mathscr{C}^*$|-system here is slightly different from that in [1] but the difference is inconsequential.
In the proof of [1, Proposition 19] and with the notation therein, |$F$| must be such that |$\{F\}\cup \{\phi _j\}_{j=1}^m$| form a |$\mathscr{C}^*$|-system, but is otherwise arbitrary. Our Proposition 24 thus follows from [1, Proposition 19], after (i) recalling the first property of |$\mathscr{C}^*$|-systems listed in Remark 10 and (ii) noting that the sum in the definition of |$\dot{q}$| in the proof of Proposition 19 is a continuous function of |$t$| because of the continuity of the functions |$\{\phi _j\}_{j=1}^m$|.
References
A. Proof of Lemma 16
Next, we recall [1, Lemma 15], originally proved in [24, Theorem 5.1], stated below for convenience.
Consider a set |$T^{\prime}\subset{\mathbb{I}}$| of size |$K^{\prime}$|. With |$M\ge 2K^{\prime}+1$|, suppose that |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|. Then, there exist coefficients |$\{b_m\}_{m=1}^M$| such that the polynomial |$q_{T^{\prime}}=\sum _{m=1}^M b_{m}\phi _{m}$| is non-negative on |${\mathbb{I}}$| and vanishes only on |$T^{\prime}$|.
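For the classical Chebyshev system of monomials |$\{1,t,\ldots ,t^{M-1}\}$|, the polynomial promised by this lemma can be written down explicitly: |$q_{T^{\prime}}(t)=\prod _k (t-t_k)^2$| has degree |$2K^{\prime}\le M-1$|, is non-negative and vanishes only on |$T^{\prime}$|. A quick numerical check, with an arbitrary illustrative choice of |$T^{\prime}$|:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# T' with K' = 3 points; the monomials {1, t, ..., t^6} (M = 2K' + 1 = 7)
# form a Chebyshev system on [0, 1], and q_{T'}(t) = prod_k (t - t_k)^2
# lies in their span.
Tprime = np.array([0.25, 0.5, 0.75])

def q(u):
    u = np.atleast_1d(np.asarray(u, dtype=float))
    return np.prod((u[:, None] - Tprime[None, :])**2, axis=1)

grid = np.linspace(0.0, 1.0, 1001)
assert np.all(q(grid) >= 0)          # non-negative on [0, 1]
assert np.all(q(Tprime) == 0.0)      # vanishes on T' ...
assert q(0.4)[0] > 0                 # ... and nowhere else

# q_{T'} has degree 2K' = 6, hence its coefficient vector against the
# monomial basis has exactly M = 7 entries:
assert P.polyfromroots(np.repeat(Tprime, 2)).size == 7
```

This makes the count |$M\ge 2K^{\prime}+1$| in the lemma concrete for the monomial case.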
On the other hand, suppose that |$\theta \in \varTheta ^{C}$|. The first possibility is that |$\theta =(t,s)\in T^{C}\times S^{C}\subseteq \varTheta ^{C}$|, i.e., |$t\in T^{C}$| and |$s\in S^{C}$|. For an arbitrary index set |$\varOmega \subseteq [K]$|, note that |$q_{T_{\varOmega }}(t)\cdot q_{S_{[K]\backslash \varOmega }}(s)>0$| by design. It follows from (A.1) that |$Q(\theta )=Q(t,s)>0$| when |$\theta \in T^{C}\times S^{C}$|. The second possibility is that |$\theta =(t_{k},s_{l})$| with |$k\ne l$| and |$k,l\in [K]$|. There always exists |$\varOmega _{0}\subset [K]$| such that |$k\in [K]\backslash \varOmega _{0}$| and |$l\in \varOmega _{0}$|. For such |$\varOmega _{0}$|, it holds that |$q_{T_{\varOmega _{0}}}(t_{k})\cdot q_{S_{[K]\backslash \varOmega _{0}}}(s_{l})>0$|. For instance, one can choose |$\varOmega _{0}=\{l\}$|, for which both |$q_{T_{\{l\}}}(t_{k})$| and |$q_{S_{[K]\backslash \{l\}}}(s_{l})$| are strictly positive. (The remaining mixed case, say |$\theta =(t_{k},s)$| with |$s\in S^{C}$|, follows in the same way by choosing any |$\varOmega _{0}$| with |$k\notin \varOmega _{0}$|.) Consequently, |$Q(\theta )>0$| by (A.1) when |$\theta \in \varTheta ^{C}\backslash (T^{C}\times S^{C})$|. In conclusion, |$Q$| is non-negative on |${\mathbb{I}}^{2}$| and vanishes only on |$\varTheta $|, as claimed. This completes the proof of Lemma 16.
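This case analysis can be verified numerically. The sketch below assumes that (A.1) defines |$Q(t,s)$| as the sum, over all index sets |$\varOmega \subseteq [K]$|, of the products |$q_{T_{\varOmega }}(t)\,q_{S_{[K]\backslash \varOmega }}(s)$|, and uses squared-product polynomials as stand-ins for the actual |$\mathscr{C}$|-system polynomials; the points themselves are arbitrary.

```python
import numpy as np
from itertools import chain, combinations

K = 3
t = np.array([0.2, 0.5, 0.8])   # T, so that Theta = {(t_k, s_k)}
s = np.array([0.3, 0.6, 0.9])   # S

def q(u, roots):
    # Non-negative stand-in vanishing exactly on `roots`; empty set -> 1.
    return float(np.prod((u - roots)**2)) if roots.size else 1.0

# All index sets Omega, i.e., subsets of {0, ..., K-1}.
subsets = list(chain.from_iterable(combinations(range(K), r)
                                   for r in range(K + 1)))

def Q(u, v):
    # Q(u, v) = sum over Omega of q_{T_Omega}(u) * q_{S_{[K]\Omega}}(v).
    return sum(q(u, t[list(w)]) * q(v, s[[k for k in range(K) if k not in w]])
               for w in subsets)

# Q vanishes on Theta ...
assert all(Q(t[k], s[k]) == 0.0 for k in range(K))
# ... but has no unwanted zeros at the grid points (t_k, s_l) with k != l:
assert all(Q(t[k], s[l]) > 0 for k in range(K) for l in range(K) if k != l)
```

Every term in the sum vanishes on |$\varTheta $|, while for each point of |$\varTheta ^{C}$| at least one term is strictly positive, mirroring the argument above.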
B. Proof of Lemma 18
C. Proof of Lemma 19
D. Proof of Proposition 21
The high-level strategy of the proof is again to construct the desired polynomial |$Q(\theta )$| in |$\theta =(t,s)$| by combining a number of univariate polynomials in |$t$| and |$s$|. Each of these univariate polynomials is built using a 1-D version of Proposition 21, which is summarized below for the convenience of the reader; see [1, Proposition 19].
Consider a finite set |$T^{\prime} \subset{\mathbb{I}}$| of size no larger than |$K$|. For |$M\ge 2K+2$|, suppose that |$\{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}$|-system on |${\mathbb{I}}$|. Consider also |$F^{\prime}:\mathbb{R}\rightarrow \mathbb{R}$| and suppose that |$\{F^{\prime}\}\cup \{\phi _{m}\}_{m=1}^M$| form a |$\mathscr{C}^*_{K,\varepsilon }$|-system on |${\mathbb{I}}$|. Then, there exist real coefficients |$\{b_{m}\}_{m=1}^M$| and a continuous polynomial |$q_{T^{\prime}}=\sum _{m=1}^M b_{m}\phi _{m}$| such that |$q_{T^{\prime}}\ge F^{\prime}$| with equality holding on |$T^{\prime}$|.
On the other hand, consider |$\theta \in \varTheta _{\varepsilon }^{C}$|, i.e., |$\theta $| does not belong to any of the neighborhoods |$\{\theta _{k,\varepsilon }\}_{k=1}^K$|. We consider two cases below.