Abstract

Motivated by the problem of robustness to deformations of the input for deep convolutional neural networks, we identify signal classes that are inherently stable to irregular deformations induced by distortion fields |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|, which we characterize in terms of a generalized modulus of continuity associated with the deformation operator. Resorting to ideas of harmonic and multiscale analysis, we prove that for signals in multiresolution approximation spaces |$U_{s}$| at scale |$s$|, stability in |$L^{2}$| holds in the regime |$\|\tau \|_{L^\infty }/s\ll 1$|, essentially as an effect of the uncertainty principle. Instability occurs when |$\|\tau \|_{L^\infty }/s\gg 1$|, and we provide a sharp upper bound for the asymptotic growth rate. The stability results are then extended to signals in the Besov space |$B^{d/2}_{2,1}$| tailored to the given multiresolution approximation. We also consider the case of more general time-frequency deformations. Finally, we provide stochastic versions of the aforementioned results, namely we study the issue of stability in mean when |$\tau (x)$| is modelled as a random field (not bounded, in general) with identically distributed variables |$|\tau (x)|$|, |$x\in{\mathbb{R}}^{d}$|.

1. Introduction

1.1 The problem of stability to deformations

In this note we consider a mathematical problem motivated by the theory and practice of machine learning, namely the robustness of the output of a neural network under modifications of the input datum. Let us briefly illustrate this issue by considering a function |$f\colon{\mathbb{R}}^{d} \to{\mathbb{R}}$|. Basic transformations to be taken into account include intensity perturbations, that is, |$\tilde{f}(x)=f(x)+h(x)$| for some |$h \colon{\mathbb{R}}^{d} \to{\mathbb{R}}$|, and signal deformations, namely |$\tilde{f}(x)= F_\tau f(x) := f(x-\tau (x))$| for some distortion field |$\tau \colon{\mathbb{R}}^{d} \to{\mathbb{R}}^{d}$|. We stress that the latter model encompasses natural transformations such as translations or rotations.

Regardless of the variety of the architectures, the network under our attention can be represented by a map |$\varPhi $| from |$L^{2}({\mathbb{R}}^{d})$| to some Banach space with norm |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \cdot \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }$|. In order to better appreciate the relevant phenomena, let us consider the classification setting where |$\varPhi $| acts as a feature extractor. A fair degree of stability of |$\varPhi $| to small transformations of the input signal is a naturally desirable property in several contexts. For example, consider the classic learning task of digit recognition from images of handwritten symbols, where the input signals suffer from both intra-class and inter-class variance, due for instance to differences in the position of the digit with respect to the background or in handwriting styles. As a rule of thumb, a small distortion of |$f$| into |$\tilde{f}$| is expected to produce a small discrepancy |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (\tilde{f})-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }$| at the level of features.

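The previous remarks thus lead us to require that |$\varPhi $| satisfy a Lipschitz regularity condition:

$${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (\tilde{f})-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }\le C\,\|\tilde{f}-f\|_{L^{2}},\qquad f,\tilde{f}\in L^{2}({\mathbb{R}}^{d}). \tag{1.1}$$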

The smallest constant |$C>0$| for which such an estimate holds will be denoted by |$\mathrm{Lip}(\varPhi )$|⁠. Moreover, in the particular case of a deformation |$\tilde{f}=F_\tau f$| of |$f$|⁠, it would be desirable for |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f)-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }$| to be small whenever |$\tau $| is small with respect to some distortion metric. We can distinguish at least two different angles on the matter:

  • In keeping with the spirit of geometric deep learning [5], structural stability guarantees are inferred from global and local invariance requirements that are a priori embedded in the design of the network. A prominent example in this connection is provided by the analysis of the scattering transform introduced in [18] (see also [6]): if |$\varPhi $| is a scattering transform with fixed wavelet filters, modulus nonlinearity and no pooling stages, it was proved in [18, Proposition 2.5] that |$\varPhi $| is a non-expansive transform (i.e. |$\text{Lip}(\varPhi )=1$|), and in [18, Theorem 2.12] that, for every |$\tau \in C^{2}({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| with |$\|\nabla \tau \|_{L^\infty }\leq 1/2$|,
    (1.2)
    where |$\|f\|_{\mathrm{scatt}}$| is a mixed |$\ell ^{1}(L^{2})$| scattering norm (which is finite for functions with a logarithmic Sobolev-type regularity), |$H \tau $| denotes the Hessian of |$\tau $| and |$2^{J}$| is the coarsest scale in the dyadic multiscale analysis associated with the network filters.
  • In the case where little information on the architecture of the network is available or exploitable, one can only assume that |$\varPhi $| satisfies a Lipschitz condition as in (1.1). In such cases, stability results for |$\varPhi $| can possibly be inherited from the intrinsic robustness to deformations of certain input signal classes. This amounts to determining a subset |$\mathcal{E}\subset L^{2}({\mathbb{R}}^{d})$| such that bounds for |$\|F_\tau f-f\|_{L^{2}}$| in terms of some complexity metric of |$\tau $| can be proved for every |$f \in \mathcal{E}$|. This is the essence of the decoupling method introduced in [15, 25, 26] to obtain stability results for generalized scattering networks by exploiting sensitivity estimates of the form |$\|F_\tau f-f\|_{L^{2}} \le C_{\mathcal{E}}\|\tau \|_{L^{\infty }}^{\alpha _{\mathcal{E}}} \|f\|_{L^{2}}$|, which are proved for several classes of interest (including Lipschitz, band-limited and cartoon functions) and deformations |$\tau \in C^{1}({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| with |$\|\nabla \tau \|_{L^\infty }$| sufficiently small¹.

A detailed comparison between Mallat’s scattering transform and generalized scattering networks would lead us too far. For our purposes, we just stress that in both cases the results are proved for regular (i.e. at least |$C^{1}$|) deformations. In the case of the scattering transform, one is ultimately confronted with the interplay between the multiscale architecture of the network and the regularity of the deformation. Consider the case where |$f$| is a band-pass function; roughly speaking, the condition |$\|\nabla \tau \|_{L^\infty }\leq 1/2$| guarantees that |$F_\tau f$| is still localized in frequency, essentially in the same band as |$f$|. Therefore a stability result as in (1.2) is reasonable (although highly non-trivial to prove), since the network separates scales by design.

On the other hand, the scope of the decoupling method goes beyond the analysis of generalized scattering transforms: the weak requirement that |$\varPhi $| be Lipschitz stable as in (1.1) allows us to encompass virtually any neural network for which detailed information on structural stability is simply not available. Actually, while most real-life neural networks are empirically observed to enjoy Lipschitz stability [22], assuming solely this condition on the feature extractor is a worst-case scenario, since other elusive forms of regularity are heuristically expected to occur as well, such as regularization and cancellation phenomena across hidden layers. In fact, the mathematical literature in this respect is quite limited (see e.g. [2, 27]) and the available provable bounds for |$\mathrm{Lip}(\varPhi )$| are usually quite pessimistic, as they do not exploit further structural information on the network.

Let us also highlight that, as observed in [18], the condition |$\|\nabla \tau \|_{L^\infty }\leq 1/2$| can be relaxed to |$\|\nabla \tau \|_{L^\infty }<1$|, but then the constant blows up as |$\|\nabla \tau \|_{L^\infty }\to 1$|. The same remark applies to the constants |$C_{\mathcal{E}}$| of the sensitivity bounds proved in [25] for band-limited functions and in [17] for functions in the Sobolev space² |$H^{1}({\mathbb{R}}^{d})$|. It is thus natural to wonder whether stability results can be derived if |$\|\nabla \tau \|_{L^\infty }\geq 1$| (in which case |$x\mapsto x-\tau (x)$| need no longer be invertible) or even for less regular deformations, such as discontinuous ones. Broadly speaking, irregular perturbations such as local pixel shuffling of an image have been exploited in sophisticated adversarial defence models such as pixel deflection [21]. They could also be used to model local distortion errors arising in signal encoding, where robustness of classification is naturally expected, as well as to compare contiguous frames of a video where pixels locally move in an irregular fashion (e.g. discontinuous optical flows, pose estimation).

1.2 Robustness to irregular deformations

The previous discussion suggests that the interplay between the deformation regularity and the network structure is a subtle issue. In fact, it turns out that, unless a network is purposefully designed to be stable to irregular deformations, stability results for |$\varPhi $| at this low-regularity level can only be obtained via the decoupling method, which shifts the robustness issue onto the input signal class. Indeed, in the context of irregular deformations, even for well-structured networks such as wavelet scattering networks, it may happen that |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f)-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }\approx \|F_\tau f-f\|_{L^{2}}$|.

To be more precise, let us illustrate two kinds of peculiar phenomena that could occur when dealing with irregular deformations—see also [20] for further details.

  1. Consider a band-pass function |$f$| oscillating at frequency |$1/s$| (⁠|$s>0$| being the scale); even if |$\|\tau \|_{L^\infty }$| is small, it may very well happen that the energy of |$F_\tau f$| is amplified by a factor |$(\|\tau \|_{L^\infty }/s)^{d/2}$|⁠; see Fig. 1. Hence, if |$\varPhi $| is any energy preserving map (⁠|$\|f\|_{L^{2}}\lesssim{\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }\lesssim \|f\|_{L^{2}}$|⁠) then it follows from the triangle inequality that |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f)-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }/\|f\|_{L^{2}}\gtrsim (\|\tau \|_{L^\infty }/s)^{d/2}$| when |$\|\tau \|_{L^\infty }$| is large compared with  |$s$|⁠.

  2. Let |$f$| be a band-pass function, as above, oscillating at frequency |$1/s$|⁠; even if |$\|\tau \|_{L^\infty }$| is small, when |$\|\tau \|_{L^\infty }$| is comparable with |$s$| it may happen that |$f$| and |$F_\tau f$| are localized in different dyadic frequency bands, see Fig. 2. In particular, if |$\varPhi $| is a wavelet scattering network, their energy will propagate along separate frequency paths and thus the error |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f)-\varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }^{2} \approx{\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }^{2}+{\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }^{2}$| will not be small if |$\varPhi $| is energy preserving.
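Both mechanisms can be reproduced numerically in dimension |$d=1$|. The Python sketch below uses only the standard library; the helper names `deform` and `l2_norm`, the tent profile and the square wave are our own illustrative choices, and a square wave with jumps stands in for the smoothed band-pass signal of Fig. 2. With |$\tau(x)=x$| for |$|x|<K$| (as in Fig. 1) the amplification factor for the tent is exactly |$\sqrt{3K/s}$|, i.e. of order |$(\|\tau\|_{L^\infty}/s)^{1/2}$|; with |$\tau=s\mathbb{1}_{\{f=-1\}}$| the square wave is turned into a constant.

```python
import math

def deform(f, tau):
    """Deformation operator F_tau f(x) = f(x - tau(x)), evaluated pointwise."""
    return lambda x: f(x - tau(x))

def l2_norm(g, a, b, n=100_000):
    """Midpoint-rule approximation of the L^2([a, b]) norm of g."""
    h = (b - a) / n
    return math.sqrt(h * sum(g(a + (k + 0.5) * h) ** 2 for k in range(n)))

# Phenomenon 1: energy amplification (d = 1).
s, K = 1.0, 4.0                                   # scale s and deformation size K > s
tent = lambda x: max(0.0, 1.0 - abs(x) / s)       # continuous, supported on [-s, s]
tau1 = lambda x: x if abs(x) < K else 0.0         # ||tau1||_inf = K
amp = l2_norm(deform(tent, tau1), -10, 10) / l2_norm(tent, -10, 10)
# Exact value: ||F f||^2 = 2K f(0)^2 and ||f||^2 = 2s/3, hence amp = sqrt(3K/s).
print(f"amplification {amp:.4f} vs sqrt(3K/s) = {math.sqrt(3 * K / s):.4f}")

# Phenomenon 2: a band-pass square wave becomes constant (low-pass).
square = lambda x: 1.0 if math.sin(math.pi * x / s) > 0 else -1.0
tau2 = lambda x: s if square(x) < 0 else 0.0      # tau = s * 1_{f = -1}
warped = deform(square, tau2)
grid = [(k + 0.5) * 1e-3 for k in range(10_000)]  # midpoints of [0, 10]
is_constant = all(warped(x) == 1.0 for x in grid)
rel_err = (l2_norm(lambda x: warped(x) - square(x), 0, 10, 10_000)
           / l2_norm(square, 0, 10, 10_000))
print(f"F_tau f constant: {is_constant}, relative error {rel_err:.4f}")
```

The relative error in the second case equals |$\sqrt{2}$| although |$\|\tau\|_{L^\infty}=s$|: the deformed signal has lost all of its band-pass content.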

Fig. 1.

A signal |$f$| supported on |$[-s,s]$| and its deformation |$F_\tau f$|, where |$\tau (x)=x$| for |$|x|<K$|, with |$K>s$|. The plateau level corresponds to the value |$f(0)$|. The operator |$F_\tau $| (with the choice of |$\tau $| specified above) performs a single-point sampling of |$f$|, hence it is not well defined on discontinuous signals.

Fig. 2.

A signal |$f$| localized in frequency where |$|\omega |\approx s^{-1}$|⁠. With the choice of the deformation |$\tau =s\mathbb{1}_{\{f=-1\}}$|⁠, the signal |$F_\tau f$| is low-pass (a similar example with |$f$| continuous is easily obtained by smoothing the steps).

These phenomena are evident sources of instability in the cases |$\|\tau \|_{L^\infty }/s\gg 1$| and |$\|\tau \|_{L^\infty }/s\approx 1$|, respectively. In passing, note that in order for |$F_\tau f$| to be well defined as an element of |$L^{2}({\mathbb{R}}^{d})$| for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|, independently of the representative of |$f$| in |$L^{2}({\mathbb{R}}^{d})$|, |$f$| must at least be continuous; see again Fig. 1 for a concrete reference.

We thus conclude that for irregular deformations one is forced to shift the robustness problem from the network architecture to the signal class. In keeping with the spirit of mathematical analysis, let us emphasize that proving bounds for |$\|F_\tau f-f\|_{L^{2}}$| in terms of the deformation size |$\|\tau \|_{L^\infty }$| and |$\|f\|_{L^{2}}$| for suitable signal classes can be thought of as a generalization of a typical problem of harmonic analysis, where the differentiability properties of certain function spaces are quantitatively measured in terms of the magnitude of some |$L^{p}$| modulus of continuity |$\omega _{p}[f](t) := \| f(x+t)-f(x) \|_{L^{p}_{x}}$|⁠, |$t \in{\mathbb{R}}^{d}$|⁠, as |$|t| \to 0$|—cf. for instance [23, Chapter V] for a classic reference on the topic. This approach allows one to fine tune the regularity scale of a signal in a very precise way.
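As a one-dimensional illustration of how the |$L^{2}$| modulus of continuity measures regularity, the sketch below (standard library only; the helper `modulus_l2` and both test profiles are our own choices) compares a Lipschitz tent, for which |$\omega_2[f](t)^2=2t^2-|t|^3$| when |$|t|\le 1$|, with the indicator of |$[-1,1]$|, for which |$\omega_2[f](t)^2=2|t|$| when |$|t|\le 2$|. The first vanishes linearly in |$t$|, the second only like |$|t|^{1/2}$|.

```python
import math

def modulus_l2(f, t, a=-3.0, b=3.0, n=60_000):
    """Midpoint-rule approximation of omega_2[f](t) = ||f(. + t) - f||_{L^2}, d = 1."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += (f(x + t) - f(x)) ** 2
    return math.sqrt(h * total)

tent = lambda x: max(0.0, 1.0 - abs(x))           # Lipschitz profile
box = lambda x: 1.0 if abs(x) <= 1.0 else 0.0     # profile with two jumps

for t in (0.1, 0.01):
    print(f"t = {t}: tent {modulus_l2(tent, t) / t:.4f} (exact {math.sqrt(2 - t):.4f}), "
          f"box {modulus_l2(box, t) / math.sqrt(t):.4f} (exact {math.sqrt(2):.4f})")
```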

The previous remarks motivate focusing on a family of spaces where a precise tuning of the scale is available, in order to elucidate the relationship with the deformation size. We resort again to ideas and tools of modern harmonic analysis, namely we consider multiresolution approximation spaces |$U_{s} \subset L^{2}({\mathbb{R}}^{d})$|⁠, |$s>0$| [19], with a Riesz basis given by a sequence of functions of the type |$\phi _{s,n}(x):= s^{-d/2}\phi ((x-ns)/s)$|⁠, |$n \in{\mathbb{Z}}^{d}$|⁠, where |$\phi $| is a fixed filter satisfying certain mild regularity and decay conditions (cf. Assumptions A, B and C in Section 5 below). Different choices of |$\phi $| result in diverse multiresolution approximations, including band-limited functions and polynomial splines of order |$n\geq 1$|—see the discussion in Example 1 below for more details. In general, the introduction of a fixed resolution scale is also natural as a mathematical model of a concrete signal capture system—cf. the general A/D and D/A conversion schemes in [19, Section 3.1.3], and also [3, 4] for a similar limited-resolution assumption in a discrete setting. The scale |$s$| (or rather |$s^{-1}$|⁠) can also be viewed as a rough measure of the complexity of the input signal, and the previous discussion suggests that the ratio |$\|\tau \|_{L^\infty }/s$| should appear in sensitivity bounds rather than just |$\|\tau \|_{L^\infty }$|⁠, which is also expected in order to have dimensionally consistent estimates.

1.3 Generalized moduli of continuity for multiresolution spaces

The core of our first result can be presented as follows. Under suitable assumptions on |$\phi $| there exists a constant |$C>0$| such that, for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠, |$s>0$|⁠,

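$$\|F_\tau f-f\|_{L^{2}}\le C\,\|f\|_{L^{2}}\times \begin{cases} \|\tau \|_{L^\infty }/s & \text{if } \|\tau \|_{L^\infty }/s\le 1,\\ (\|\tau \|_{L^\infty }/s)^{d/2} & \text{if } \|\tau \|_{L^\infty }/s>1,\end{cases}\qquad f\in U_{s}. \tag{1.3}$$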

Stability guarantees for any Lipschitz network |$\varPhi $| can thus be inferred from the fact that |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f) - \varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert } \le \mathrm{Lip}(\varPhi ) \|F_\tau f- f\|_{L^{2}}$|. We refer to Theorem 10 for precise statements. The estimate (1.3) for |$\|\tau \|_{L^\infty }/s \le 1$| recovers and extends the results proved in [25] for band-limited functions, now without any regularity assumption on the deformation. In Section 7 we show the sharpness of the estimate (1.3) in both regimes |$\|\tau \|_{L^\infty }/s \gg 1$| and |$\|\tau \|_{L^\infty }/s \ll 1$|.
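The role of the ratio |$\|\tau\|_{L^\infty}/s$| in the regime |$\|\tau\|_{L^\infty}/s\ll 1$| can be observed on a sampled example. The sketch below is a numerical illustration under our own assumptions, not an implementation of the proof: it takes the Shannon case |$\phi=\mathrm{sinc}$|, |$f=\phi_{s,0}=s^{-1/2}\mathrm{sinc}(\cdot/s)\in U_{s}$|, and a field |$\tau=\delta u$| with an independent random value |$u\in[-1,1]$| on each grid cell, hence discontinuous at the grid scale. For fixed |$s$| the relative error should grow linearly in |$\delta$|, and halving |$s$| should roughly double it; the constant |$C$| is not computed.

```python
import math
import random

def sinc(x):
    """Normalized sinc sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def relative_error(s, delta, u, xs):
    """Discrete ||F_tau f - f||_2 / ||f||_2 for f = s^(-1/2) sinc(./s) and tau = delta * u."""
    f = lambda x: sinc(x / s) / math.sqrt(s)
    num = sum((f(x - delta * ux) - f(x)) ** 2 for x, ux in zip(xs, u))
    den = sum(f(x) ** 2 for x in xs)
    return math.sqrt(num / den)              # the grid step cancels in the ratio

h, L = 0.002, 50.0
xs = [-L + (k + 0.5) * h for k in range(int(round(2 * L / h)))]
rng = random.Random(0)
u = [rng.uniform(-1.0, 1.0) for _ in xs]     # independent values per cell: irregular tau

e_small = relative_error(1.0, 0.002, u, xs)
e_big = relative_error(1.0, 0.02, u, xs)
e_half = relative_error(0.5, 0.02, u, xs)
print(e_big / e_small)   # close to 10: linear in ||tau||_inf for fixed s
print(e_half / e_big)    # close to 2: what matters is ||tau||_inf / s
```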

In short, whenever we have a Lipschitz bound, we have a stability result in the regime |$\|\tau \|_{L^\infty }/s \ll 1$|, which can be explained in heuristic terms as one of the manifold forms of the uncertainty principle (see below for further comments in this connection). Observe also that the rate of instability agrees with that of the previous discussion in item 1 of Section 1.2 when oscillations of small size compared with the size of the deformation (namely, if |$\|\tau \|_{L^\infty }/s \gg 1$|) are allowed.

Interestingly, for fixed |$f$|, we have in any case |${\left \vert \kern -0.25ex\left \vert \kern -0.25ex\left \vert \varPhi (F_\tau f) - \varPhi (f) \right \vert \kern -0.25ex\right \vert \kern -0.25ex\right \vert }=O(\|\tau \|_{L^\infty })$| as |$\|\tau \|_{L^\infty }\to 0$|, although this asymptotic estimate is not uniform with respect to |$s$|. In fact, in sharp contrast with (1.2), the factor |$1/s$| in front of |$\|\tau \|_{L^\infty }$| is associated with a feature of the input signal (i.e. the resolution of |$f$|), whereas the invariance resolution |$2^{-J}$| in (1.2) is a fixed quantity that depends on the architecture of the network. However, the example in Fig. 2 above shows that in the framework of irregular deformations, even for a fixed wavelet scattering network, we cannot hope for an estimate whose quality does not deteriorate when |$\|\tau \|_{L^\infty }$| becomes comparable with the size of the oscillations of |$f$|. We thus infer that, while the choice of wavelet filters is crucial in [18] to manufacture a transform that is Lipschitz stable to the action of small diffeomorphisms, robustness under small and irregular deformations obeys a more general rule, as anticipated above. In this connection, we refer the reader to the aforementioned paper [20], where instability results are proved for wavelet scattering networks and deformations at low regularity levels, namely for distortion fields |$\tau \in C^\alpha ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| with |$0\le \alpha <1$|.

The assumption that the input signal |$f$| belongs to |$U_{s}$| may be judged unrealistic in practice. Rather, we often deal with signals that can be well approximated in low-complexity spaces. For such signal classes we have again a stability result, which is briefly outlined here in low-dimensional settings for simplicity (we refer to Theorem 11 for a general and precise statement). Let |$V_{j}:= U_{2^{j}}$|, |$j\in \mathbb{Z}$|, be a multiresolution analysis of |$L^{2}({\mathbb{R}}^{d})$|. There exists a constant |$C>0$| such that, for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|,

for any |$f\in L^{2}({\mathbb{R}}^{d})$| such that |$\|f\|_{\dot{B}^{d/2}_{2,1}}<\infty $|, where |$\dot{B}^{d/2}_{2,1}$| denotes the homogeneous Besov space tailored to the given multiresolution analysis [19, Section 9.2.3]. This regularity level looks optimal in general: as already observed, |$f$| should be at least continuous, and |$B^{d/2}_{2,1}({\mathbb{R}}^{d})$| is the largest space in the scale of |$L^{2}$|-based Besov spaces that embeds into the continuous functions.

In Section 6 we prove estimates in the same spirit for more general time-frequency deformations of the type |$F_{\tau ,\omega } f(x) = e^{i\omega (x)}f(x-\tau (x))$|. Modulation deformations are relevant in the presence of spectral distortions of input signals. These deformations are approached here in a ‘perturbative’ way, that is, by reduction to the results already proved for the case |$\omega \equiv 0$|.

The main technical tools behind our results are the properties of certain spaces |$X^{p,q}_{r}$|⁠, tailored to the deformation scale |$r>0$|⁠. Such function spaces are usually referred to as Wiener amalgam spaces and were introduced by Feichtinger in the ’80s [10, 11]. As the name suggests, they are obtained by means of a norm that amalgamates a local summability of |$L^{p}$| type on balls of radius |$r$| with an |$L^{q}$| behaviour at infinity. They are of current use in harmonic analysis and PDEs, possibly under slightly different names and forms—see for instance [8, 24].

In Section 3 we collect the main properties of these spaces, while in Section 4 we focus on the space |$X^{\infty ,2}_{r}$| of locally bounded functions, uniformly at the scale |$r$|⁠, with |$L^{2}$| decay. This choice should not be intended as a mere technical workaround: in Proposition 5 we prove that this class is indeed the optimal choice when dealing with arbitrary bounded deformations, since for functions |$f \in X^{\infty ,2}_{r}\cap C({\mathbb{R}}^{d})$| we have the clear-cut characterization

$$\max_{\|\tau \|_{L^\infty }\le r}\|F_\tau f\|_{L^{2}}=\|f\|_{X^{\infty ,2}_{r}}.$$

Moreover, the local control offered by |$X^{p,q}_{r}$| can be effectively exploited to prove a crucial embedding (cf. Theorem 8), which can be heuristically regarded as a reverse Hölder-type inequality for signals in |$U_{s}$|, in the spirit of [24, Lemma 2.2], and as a novel form of the already mentioned uncertainty principle. Intuitively, if a function |$f$| is localized in a low-frequency ball of radius |$R^{-1}$| centred at the origin, then |$f$| is approximately constant on balls of radius |$R$|. As a result, deliberately ignoring the effect of the tails, its |$L^\infty $| norm on a ball of radius |$r<R$| can be roughly bounded by the |$L^{2}$| norm on the same ball (up to a factor |$(R/r)^{d/2}$|). Strictly speaking, amalgam spaces are needed to put these heuristic remarks on a rigorous footing, leading precisely to the reverse Hölder-type inequality stated in Theorem 8.
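The heuristic above, and the growth rate |$(\|\tau\|_{L^\infty}/s)^{d/2}$|, can be seen on the worst-case deformation energy |$\max_{\|\tau\|_{L^\infty}\le r}\|F_\tau f\|_{L^{2}}$|, which by Proposition 5 equals |$\|f\|_{X^{\infty,2}_{r}}$|. The sketch below reads this norm as the |$L^{2}$| norm of the local supremum |$x\mapsto\sup_{|y|\le r}|f(x-y)|$| (our interpretation of Definition 1) and computes it for |$f=\mathrm{sinc}$| at scale |$s=1$|. The ratio to |$\|f\|_{L^{2}}$| stays bounded for |$r\lesssim s$| and grows like |$(2r/s)^{1/2}$| for |$r\gg s$|, the local supremum being essentially the plateau of Fig. 1.

```python
import math

def sinc(x):
    """Normalized sinc sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def maximal_ratio(r, L=100.0, h=0.05):
    """||sup_{|y|<=r} |f(. - y)| ||_2 / ||f||_2 for f = sinc (scale s = 1), sampled on [-L, L]."""
    n = int(round(2 * L / h)) + 1
    vals = [abs(sinc(-L + k * h)) for k in range(n)]
    w = int(round(r / h))                                 # half-width of the window B_r
    # Brute-force sliding-window maximum: the local supremum at scale r.
    local_sup = [max(vals[max(0, k - w): k + w + 1]) for k in range(n)]
    return math.sqrt(sum(m * m for m in local_sup) / sum(v * v for v in vals))

ratios = {r: maximal_ratio(r) for r in (0.25, 1.0, 4.0, 16.0)}
for r, q in ratios.items():
    print(f"r/s = {r:5.2f}: ratio {q:.3f}, sqrt(2 r/s) = {math.sqrt(2 * r):.3f}")
```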

So far we have adopted a deterministic model for the deformation, namely a ball in |$L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| without any additional structure, and we have therefore provided stability guarantees in a worst-case scenario. In Section 8 we assume instead that |$\tau $| is a random field with identically distributed variables |$|\tau (x)|$|, |$x\in{\mathbb{R}}^{d}$|. We accordingly study the issue of stability in mean, providing stochastic versions of the above results. For example, we prove that

(1.4)

see Theorem 14 for the precise statement in any dimension, and for similar results when |$f$| belongs to limited-resolution spaces |$U_{s}$| as above. Here we write |$\mathbb{E}[|\tau |^{d}]$| for |$\mathbb{E}[|\tau (x)|^{d}]$|, the latter being in fact independent of |$x$|. We also emphasize that the field |$\tau $| is no longer assumed to be bounded.

2. Notation

The open ball of |${\mathbb{R}}^{d}$| with radius |$r>0$| centred at the origin is denoted by |$B_{r}$|.

We introduce a number of operators acting on |$f\colon{\mathbb{R}}^{d} \to{\mathbb{C}}$|⁠:

  • the dilation |$D_\lambda $| by |$\lambda \ne 0$|⁠: |$D_\lambda f(y) = f(\lambda y)$|⁠;

  • the translation |$T_{x}$| by |$x \in{\mathbb{R}}^{d}$|⁠: |$T_{x} f(y) = f(y-x)$|⁠;

  • the modulation |$M_\xi $| by |$\xi \in{\mathbb{R}}^{d}$|⁠: |$M_\xi f(y) = e^{i y \cdot \xi } f(y)$|⁠;

  • the reflection: |${\mathcal{I}} f(y) = f(-y)$|⁠;

  • the Fourier transform (whenever meaningful, e.g. if |$f \in L^{1}({\mathbb{R}}^{d})$|⁠), normalized here as

The space |$L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| contains all the measurable vector fields |$\tau \colon{\mathbb{R}}^{d} \to{\mathbb{R}}^{d}$| such that

$$\|\tau \|_{L^\infty }:=\operatorname{ess\,sup}_{x\in{\mathbb{R}}^{d}}|\tau (x)|<\infty .$$

We introduce the inhomogeneous magnitude |$\langle y \rangle $| of |$y \in{\mathbb{R}}^{d}$|⁠, that is, |$\langle y \rangle := (1+|y|^{2})^{1/2}$|⁠.

The symbol |$\mathbb{1}_{E}$| will be used to denote the characteristic function of a set |$E$|⁠.

While in the statements of the results we will keep track of absolute constants in the estimates, in the proofs we will heavily make use of the symbol |$X \lesssim Y$|⁠, meaning that the underlying inequality holds up to a universal positive constant factor, namely

$$X\lesssim Y\quad\Longleftrightarrow\quad X\le CY\ \text{ for some constant } C>0.$$

Moreover, |$X \asymp Y$| means that |$X$| and |$Y$| are equivalent quantities, that is, both |$X \lesssim Y$| and |$Y\lesssim X$| hold.

In the rest of the note all the derivatives are to be understood in the distribution sense, unless otherwise noted.

3. Multiscale Wiener amalgam spaces

The following family of function spaces will play a key role in the sequel.

 

Definition 1.
For |$1 \le p,q \le \infty $| and |$r>0$|⁠, we denote by |$X^{p,q}_{r}$| the space of all the complex-valued measurable functions in |${\mathbb{R}}^{d}$| such that
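$$\|f\|_{X^{p,q}_{r}}:=\Big(\int_{{\mathbb{R}}^{d}}\|f(x+\cdot )\|_{L^{p}(B_{r})}^{q}\,dx\Big)^{1/q}<\infty , \tag{3.1}$$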
with obvious modifications if |$q=\infty $|⁠. In the case where |$r=1$| we write |$X^{p,q}$| for |$X^{p,q}_{1}$|⁠.

Let us emphasize that |$X^{\infty ,1}$| coincides with the well-known Wiener space of harmonic analysis (cf. e.g. [13, Section 6.1]). More generally, |$X^{p,q}$| coincides with the Wiener amalgam space |$W(L^{p},L^{q})$| of functions with local regularity of |$L^{p}$| type and global decay of |$L^{q}$| type, first introduced by Feichtinger in the ’80s [10, 11]; recall that the latter is a Banach space endowed with the norm
$$\|f\|_{W(L^{p},L^{q})}:=\big\|\,x\mapsto \|f\,T_{x}\mathbb{1}_{Q}\|_{L^{p}}\big\|_{L^{q}},$$
where |$Q\subset{\mathbb{R}}^{d}$| is an arbitrary compact set with non-empty interior. In fact, different choices of |$Q$| yield equivalent norms; typical choices include |$Q=B_{1}$| and |$Q=[0,1]^{d}$|⁠. Moreover, the following equivalent discrete-type norm can be used to measure the amalgamated regularity:

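$$\|f\|_{W(L^{p},L^{q})}\asymp \Big(\sum_{k\in{\mathbb{Z}}^{d}}\|f\|_{L^{p}(k+[0,1]^{d})}^{q}\Big)^{1/q}. \tag{3.2}$$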

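We also highlight that |$X^{p,p}_{r}$| coincides with |$L^{p}({\mathbb{R}}^{d})$| as a set for any |$1 \le p \le \infty $|, but the norm is rescaled:
$$\|f\|_{X^{p,p}_{r}}=|B_{r}|^{1/p}\,\|f\|_{L^{p}},\qquad |B_{r}| \text{ being the volume of } B_{r}.$$
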
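A discrete one-dimensional version of the norm in Definition 1 is easy to write down and can be used to sanity-check the rescaling of |$X^{p,p}_{r}$|. The sketch below assumes that (3.1) is the mixed norm |$\big(\int\|f(x+\cdot)\|_{L^{p}(B_{r})}^{q}\,dx\big)^{1/q}$| (our reading of Definition 1); the function name `amalgam_norm`, the Gaussian test signal and the grid are illustrative. In |$d=1$| one has |$|B_{r}|=2r$|, so the ratios printed below should be close to |$2r$| and |$(2r)^{1/2}$|, up to an |$O(h)$| error due to the discrete window length |$(2w+1)h$|.

```python
import math

def amalgam_norm(vals, h, r, p, q):
    """Discrete 1-D analogue of ||f||_{X^{p,q}_r}.

    vals are samples of f on a uniform grid of step h. For each grid point x we take the
    L^p norm of f over the window [x - r, x + r] (the ball B_r centred at x when d = 1),
    then the L^q norm in x of these local quantities. p and q may be math.inf.
    """
    w = int(round(r / h))
    local = []
    for k in range(len(vals)):
        window = [abs(v) for v in vals[max(0, k - w): k + w + 1]]
        if p == math.inf:
            local.append(max(window))
        else:
            local.append((h * sum(a ** p for a in window)) ** (1 / p))
    if q == math.inf:
        return max(local)
    return (h * sum(a ** q for a in local)) ** (1 / q)

h, r = 0.01, 1.0
xs = [-10 + k * h for k in range(2001)]
gauss = [math.exp(-x * x) for x in xs]
l1 = h * sum(gauss)                                      # ~ sqrt(pi)
l2 = math.sqrt(h * sum(g * g for g in gauss))            # ~ (pi / 2) ** 0.25

ratio_11 = amalgam_norm(gauss, h, r, 1, 1) / l1          # ~ 2r = |B_r|
ratio_22 = amalgam_norm(gauss, h, r, 2, 2) / l2          # ~ (2r) ** (1/2)
sup_norm = amalgam_norm(gauss, h, r, math.inf, math.inf) # = sup |f| = 1
print(ratio_11, ratio_22, sup_norm)
```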
A similar change-of-scale property holds with respect to |$X^{p,q}$|⁠, in the sense of the following result.

 

Lemma 1.
For any |$1 \le p,q \le \infty $| and |$r>0$|⁠, we have that |$X^{p,q}_{r} = X^{p,q}$| as sets, and
$$\|f\|_{X^{p,q}_{r}}=r^{d/p+d/q}\,\|D_{r}f\|_{X^{p,q}}.$$
 

Proof.
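Let us consider the case |$p,q < \infty $| for conciseness, the other cases following easily. A straightforward computation, based on the changes of variables |$y=rz$| and |$x=rw$|, shows that
$$\begin{aligned}\|f\|_{X^{p,q}_{r}}^{q}&=\int_{{\mathbb{R}}^{d}}\Big(\int_{B_{r}}|f(x+y)|^{p}\,dy\Big)^{q/p}dx=r^{dq/p}\int_{{\mathbb{R}}^{d}}\Big(\int_{B_{1}}|f(x+rz)|^{p}\,dz\Big)^{q/p}dx\\&=r^{dq/p+d}\int_{{\mathbb{R}}^{d}}\Big(\int_{B_{1}}|D_{r}f(w+z)|^{p}\,dz\Big)^{q/p}dw=r^{dq/p+d}\,\|D_{r}f\|_{X^{p,q}}^{q},\end{aligned}$$
which is the claim.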

For future reference let us examine some properties of the spaces |$X^{p,q}_{r}$|⁠. First, we prove an embedding result that will be often used below.

 

Proposition 2.
For any |$1 \le p_{1},p_{2},q \le \infty $| with |$p_{1} \le p_{2}$|, and |$r>0$|, we have
$$\|f\|_{X^{p_{1},q}_{r}}\le C\,r^{d(1/p_{1}-1/p_{2})}\,\|f\|_{X^{p_{2},q}_{r}},$$
where the constant |$C>0$| depends only on |$d$|.

 

Proof.
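Fix |$x \in{\mathbb{R}}^{d}$| and consider the mapping |$h_{x} \colon y \mapsto |f(x+y)|$|. The standard Hölder inequality on the ball |$B_{r}$| yields, with |$\rho $| such that |$1/p_{1} = 1/p_{2} + 1/\rho $|,
$$\|h_{x}\|_{L^{p_{1}}(B_{r})}\le |B_{r}|^{1/\rho }\,\|h_{x}\|_{L^{p_{2}}(B_{r})}=C^{1/\rho }\,r^{d(1/p_{1}-1/p_{2})}\,\|h_{x}\|_{L^{p_{2}}(B_{r})},$$
where |$C$| is the volume of the |$d$|-ball with radius |$1$| (note that |$C^{1/\rho }\le \max \{1,C\}$|). The claim follows by taking the |$L^{q}$| norm with respect to |$x$|.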
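Both Lemma 1 and Proposition 2 can be checked on samples in |$d=1$|, under the same reading of (3.1) as a mixed norm over balls |$B_{r}$|, which gives |$\|f\|_{X^{p,q}_{r}}=r^{1/p+1/q}\|D_{r}f\|_{X^{p,q}}$| in dimension one. In the hypothetical sketch below, sampling |$D_{r}f$| with step |$h/r$| reuses exactly the sample points of |$f$|, so the scaling identity is reproduced up to rounding; for Proposition 2 the discrete Hölder inequality holds with the window length |$(2w+1)h$| in place of |$|B_{r}|=2r$|.

```python
import math

def amalgam_norm(vals, h, r, p, q):
    """Discrete 1-D ||f||_{X^{p,q}_r} for finite p, q (window [x-r, x+r], then L^q in x)."""
    w = int(round(r / h))
    local = [(h * sum(abs(v) ** p for v in vals[max(0, k - w): k + w + 1])) ** (1 / p)
             for k in range(len(vals))]
    return (h * sum(a ** q for a in local)) ** (1 / q)

def grid(step, L=12.0):
    """Uniform grid -L, -L + step, ..., L."""
    return [-L + k * step for k in range(int(round(2 * L / step)) + 1)]

f = lambda x: math.exp(-x * x) * (1 + math.cos(3 * x))   # hypothetical test signal
h = 0.01
samples = [f(x) for x in grid(h)]

# Lemma 1 with d = 1: ||f||_{X^{p,q}_r} = r^(1/p + 1/q) ||D_r f||_{X^{p,q}}, D_r f(y) = f(r y).
r, p, q = 2.0, 2.0, 1.0
lhs = amalgam_norm(samples, h, r, p, q)
# D_r f sampled with step h / r hits the same points of f as the left-hand side.
rhs = r ** (1 / p + 1 / q) * amalgam_norm([f(r * y) for y in grid(h / r)], h / r, 1.0, p, q)
print(f"Lemma 1: {lhs:.10f} vs {rhs:.10f}")

# Proposition 2 with p1 = 1 <= p2 = 2: Hoelder on each window of at most 2w + 1 points.
r2 = 0.5
w = int(round(r2 / h))
x1 = amalgam_norm(samples, h, r2, 1.0, 2.0)
x2 = amalgam_norm(samples, h, r2, 2.0, 2.0)
bound = ((2 * w + 1) * h) ** (1.0 - 0.5) * x2
print(f"Prop. 2: {x1:.6f} <= {bound:.6f}")
```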

In the following results we illustrate the behaviour of the spaces |$X^{p,q}_{r}$| under convolution and dilations. In fact, the case with |$r=1$| is covered by the standard theory of amalgam spaces (cf. [10, 16] and [7, Proposition 2.2] respectively), hence the result for |$r \ne 1$| follows by rescaling the norms in accordance with Lemma 1.

 

Proposition 3.
For any |$r>0$| and |$1 \le p_{1},p_{2},p,q_{1},q_{2},q \le \infty $| such that
we have
for a constant |$C>0$| that depends only on |$d$|⁠.

 

Proposition 4.
For any |$r,s>0$| and |$1 \le p,q \le \infty $| we have
for a constant |$C>0$| that depends only on |$d$|⁠.

4. |$L^\infty $| deformations and the space |$X^{\infty ,2}_{r}$|

Let us consider the class of deformation mappings |$F_\tau $| associated with distortion functions |$\tau \colon{\mathbb{R}}^{d} \to{\mathbb{R}}^{d}$| by setting

$$F_\tau f(x):=f(x-\tau (x)),\qquad x\in{\mathbb{R}}^{d},$$

where |$f\colon{\mathbb{R}}^{d} \to{\mathbb{C}}$|⁠.

We prove that the class |$X^{\infty ,2}_{r}$| is the optimal choice as far as sensitivity bounds for arbitrary bounded deformations are concerned. The second part of the following result can be regarded as a linearization of a maximal operator (cf. [12, Section 6.1.3]).

 

Proposition 5.
We have
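$$\|F_\tau f\|_{L^{2}}\le \|f\|_{X^{\infty ,2}_{r}}\qquad \text{whenever } \|\tau \|_{L^\infty }\le r, \tag{4.1}$$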
for every |$f \in X^{\infty ,2}_{r}\cap C({\mathbb{R}}^{d})$| and |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠.
More precisely, for every function |$f \in X^{\infty ,2}_{r}\cap C({\mathbb{R}}^{d})$|⁠, we have the characterization
$$\max_{\|\tau \|_{L^\infty }\le r}\|F_\tau f\|_{L^{2}}=\|f\|_{X^{\infty ,2}_{r}}. \tag{4.2}$$

 

Remark 1.

Note that the continuity assumption on |$f \in X^{\infty ,2}_{r}$| is essential in the statement; otherwise |$f(x-\tau (x))$| may not even be well defined in |$L^{2}$| (i.e. independent of the representative of |$f$|), as evidenced by the case |$\tau (x)=x$| for |$x \in B_{R}$| and small |$R>0$|. See also Fig. 1 in this connection.

 

Proof of Proposition  5.
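It is clear that, for almost every |$x\in{\mathbb{R}}^{d}$| (namely whenever |$|\tau (x)|\le \|\tau \|_{L^\infty }\le r$|),
$$|F_\tau f(x)|=|f(x-\tau (x))|\le \sup_{|y|\le r}|f(x-y)|,$$
and thus (4.1) follows after taking the |$L^{2}$| norm (the above supremum is the same as the essential supremum because |$f$| is continuous).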
As for (4.2), in view of (4.1) it is enough to prove that there exists |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| with |$\|\tau \|_{L^\infty }\le r$| such that
$$\|F_\tau f\|_{L^{2}}\ge \|f\|_{X^{\infty ,2}_{r}}.$$
To this aim, notice that if we can construct a measurable map |$\tau $| that assigns to every |$x \in{\mathbb{R}}^{d}$| a point |$y^{*} =\tau (x)\in \overline{B_{r}}$| at which the function |$\overline{B_{r}} \ni y \mapsto |f(x-y)|$| attains its maximum, then
$$|F_\tau f(x)|=\max_{y\in \overline{B_{r}}}|f(x-y)|,\qquad x\in{\mathbb{R}}^{d},$$
and the desired conclusion follows upon taking the |$L^{2}$| norm. The existence of such a measurable selector is a consequence of the measurable maximum theorem [1, Theorem 18.19] (in fact, an easier argument would give (4.2) with the supremum in place of the maximum, cf. [12, Section 6.1.3]).
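On a grid, no measurable selection theorem is needed: a pointwise argmax already produces an admissible field. The sketch below is only a discrete analogue of the argument, with a hypothetical continuous signal, shifts restricted to the grid points of |$[-r,r]$|, and |$\|f\|_{X^{\infty,2}_{r}}$| read as the |$L^{2}$| norm of the local maximum. By construction |$\|F_{\tau^{*}}f\|_{L^{2}}$| coincides with this discrete norm, and randomly chosen admissible fields do not exceed it.

```python
import math
import random

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

f = lambda x: sinc(x) * math.cos(2.0 * x)          # hypothetical continuous test signal
h, L, r = 0.05, 30.0, 3.0
xs = [-L + k * h for k in range(int(round(2 * L / h)) + 1)]
w = int(round(r / h))
shifts = [j * h for j in range(-w, w + 1)]         # grid points of [-r, r]

def l2(vals):
    return math.sqrt(h * sum(v * v for v in vals))

# Discrete selector: tau*(x) is a maximizer of y -> |f(x - y)| over the admissible shifts.
tau_star = [max(shifts, key=lambda y: abs(f(x - y))) for x in xs]
worst = l2([f(x - t) for x, t in zip(xs, tau_star)])
maximal = l2([max(abs(f(x - y)) for y in shifts) for x in xs])   # discrete ||f||_{X^{inf,2}_r}

# Any other admissible field (here: random grid shifts with |tau| <= r) does no better.
rng = random.Random(0)
trials = [l2([f(x - rng.choice(shifts)) for x in xs]) for _ in range(20)]
print(worst, maximal, max(trials))
```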

The following result provides a sensitivity bound for |$L^{2}$| functions that are locally (i.e. on every compact subset) Lipschitz continuous, uniformly at the deformation scale. It should be compared with the result in [17], valid for functions in the Sobolev space |$H^{1}({\mathbb{R}}^{d})$| and deformations |$\tau \in C^{1}({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| with |$\|\nabla \tau \|_{L^\infty }\leq 1/2$|⁠, hence regular.

 

Proposition 6.
There exists a constant |$C>0$| such that
(4.3)
for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| and every function |$f \in X^{\infty ,2}_{r}$| such that |$\| \nabla f \|_{X^{\infty ,2}_{r}} < \infty $|⁠.

Observe that the condition |$\| \nabla f \|_{X^{\infty ,2}_{r}} < \infty $| implies that |$\nabla f\in L^\infty _{\mathrm{loc}}({\mathbb{R}}^{d})$|, and therefore |$f$| is locally Lipschitz continuous after possibly being redefined on a set of measure zero (cf. [9, Theorem 4, page 294]); in particular, |$f$| is continuous. In what follows we will always identify |$f$| with its continuous version. Also, we set

(4.4)

 

Proof of Proposition  6.
For |$x\in{\mathbb{R}}^{d}$| and |$r>0$| let |$B(x,r)$| be the open ball in |${\mathbb{R}}^{d}$| of radius |$r$| and centre |$x$|. By the Poincaré inequality for a ball³ (cf. [9, Theorem 2, page 291]) we see that there exists a constant |$C>0$| such that, for every |$r>0$| and |$x\in{\mathbb{R}}^{d}$|,
(4.5)
Setting |$y=\tau (x)$| and |$r=\|\tau \|_{L^\infty }$| and taking the |$L^{2}$| norm leads to the desired conclusion.
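In dimension |$d=1$| the Poincaré step can be replaced by the mean value theorem, which gives |$|f(x-y)-f(x)|\le |y|\sup_{|z|\le |y|}|f'(x-z)|$| and hence a bound of the type (4.3) with constant 1, namely |$\|F_\tau f-f\|_{L^{2}}\le \|\tau\|_{L^\infty}\|f'\|_{X^{\infty,2}_{\|\tau\|_{L^\infty}}}$| (again reading |$\|\cdot\|_{X^{\infty,2}_{r}}$| as the |$L^{2}$| norm of the local supremum over |$B_{r}$|). The sketch below checks this numerically for a hypothetical Gaussian signal and a field |$\tau$| with independent random values in |$[-\delta,\delta]$|.

```python
import math
import random

f = lambda x: math.exp(-x * x)                  # hypothetical smooth test signal
df = lambda x: -2.0 * x * math.exp(-x * x)      # its derivative f'

h, L, delta = 0.01, 8.0, 0.5
xs = [-L + k * h for k in range(int(round(2 * L / h)) + 1)]
rng = random.Random(1)
tau = [rng.uniform(-delta, delta) for _ in xs]  # irregular field with ||tau||_inf <= delta

def l2(vals):
    return math.sqrt(h * sum(v * v for v in vals))

lhs = l2([f(x - t) - f(x) for x, t in zip(xs, tau)])          # ||F_tau f - f||_2
w = int(round(delta / h))
local_sup = [max(abs(df(x - j * h)) for j in range(-w, w + 1)) for x in xs]
rhs = delta * l2(local_sup)                                   # delta * ||f'||_{X^{inf,2}_delta}
print(f"{lhs:.5f} <= {rhs:.5f}")
```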

5. Multiresolution approximation spaces

Fix |$\phi \in L^{2}({\mathbb{R}}^{d})$| and recall [19] that the associated approximation space |$U_{s}$| at scale |$s>0$| is defined as follows:
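
$$U_{s}:=\overline{\operatorname{span}}\,\{\phi _{s,n}: n\in{\mathbb{Z}}^{d}\},\qquad \phi _{s,n}(x):=s^{-d/2}\phi \Big(\frac{x-ns}{s}\Big),$$

where the closure is taken in |$L^{2}({\mathbb{R}}^{d})$|.
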
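For the piecewise linear choice |$\phi(x)=\max\{0,1-|x|\}$| (the B-spline of degree |$n=1$| centred at |$0$| in Example 1 below), an element of |$U_{s}$| with finitely many non-zero coefficients is easy to synthesize. In the sketch below (names and coefficients are hypothetical) one can also observe that |$f(ms)=s^{-1/2}c_{m}$|, i.e. for this |$\phi$| the space |$U_{s}$| consists of continuous piecewise linear functions with nodes on |$s\mathbb{Z}$|.

```python
import math

def hat(x):
    """Linear B-spline max(0, 1 - |x|): the degree n = 1 spline of Example 1, centred at 0."""
    return max(0.0, 1.0 - abs(x))

def phi_sn(phi, s, n):
    """phi_{s,n}(x) = s^(-1/2) phi((x - n s) / s), the d = 1 case of the basis of U_s."""
    return lambda x: phi((x - n * s) / s) / math.sqrt(s)

def synthesize(coeffs, phi, s):
    """f = sum_n c_n phi_{s,n}, an element of U_s with finitely many non-zero coefficients."""
    atoms = [(c, phi_sn(phi, s, n)) for n, c in coeffs.items()]
    return lambda x: sum(c * atom(x) for c, atom in atoms)

s = 0.5
coeffs = {-2: 1.0, -1: -0.5, 0: 2.0, 1: 0.25, 3: -1.0}   # hypothetical coefficients c_n
f = synthesize(coeffs, hat, s)

# Node values recover the coefficients: f(m s) * s^(1/2) = c_m.
node_values = {m: f(m * s) * math.sqrt(s) for m in range(-4, 5)}
# Between nodes f is linear, e.g. at the midpoint of [0, s]:
mid_value = f(0.5 * s) * math.sqrt(s)                     # = (c_0 + c_1) / 2
print(node_values, mid_value)
```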
In the rest of the paper we are going to deal with the following assumptions on |$\phi $|⁠.

 

Assumption A.
There exist constants |$A, B>0$| such that
(5.1)
This is equivalent to assuming that |$\{ \phi _{s,n} \}$| is a Riesz basis for |$U_{s}$| (cf. [19, Theorem 3.4] in the case where |$d=1$|⁠, while the result for |$d>1$| follows by direct extension of the one-dimensional one).
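Assumption A can be checked explicitly for simple filters. The sketch below assumes, in the spirit of the one-dimensional criterion cited from [19], that the Riesz condition amounts to two-sided bounds on the periodization |$\sum_{k}|\hat{\phi}(\omega+2\pi k)|^{2}$|, and it uses the normalization |$\hat{f}(\omega)=\int f(x)e^{-i\omega x}\,dx$|; both the form of the bounds and the normalization are our assumptions. For the hat function |$\phi(x)=\max\{0,1-|x|\}$| the periodization equals |$(2+\cos\omega)/3$|, so one may take |$A=1/3$| and |$B=1$|.

```python
import math

def hat_ft(omega):
    """Fourier transform of max(0, 1 - |x|), assuming hat_f(w) = int f(x) exp(-i w x) dx."""
    if omega == 0:
        return 1.0
    return (math.sin(omega / 2) / (omega / 2)) ** 2

def periodization(omega, K=200):
    """Truncated periodization sum_{|k| <= K} |hat_phi(omega + 2 pi k)|^2."""
    return sum(hat_ft(omega + 2 * math.pi * k) ** 2 for k in range(-K, K + 1))

omegas = [2 * math.pi * j / 64 for j in range(64)]
values = [periodization(w) for w in omegas]
A_est, B_est = min(values), max(values)
# Closed form for the hat function: (2 + cos(omega)) / 3, hence A = 1/3 and B = 1.
max_dev = max(abs(v - (2 + math.cos(w)) / 3) for w, v in zip(omegas, values))
print(A_est, B_est, max_dev)
```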

We further assume one of the following regularity/decay conditions on |$\phi $|⁠.

 

Assumption B.

At least one of the following conditions holds.

  1. |$\phi $| belongs to the Wiener space:
    (5.2)
    in particular, |$\phi $| is locally bounded and has |$L^{1}$| decay.
  2. There exist |$\alpha> 1/2$| and |$B^{\prime}>0$| such that
    (5.3)
    where we introduced the weight function |$v(\omega ) = \langle \omega _{1} \rangle \cdots \langle \omega _{d} \rangle $|⁠, |$\omega \in{\mathbb{R}}^{d}$|⁠.
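Condition (5.2) can likewise be checked by hand. Assuming it means finiteness of the Wiener amalgam norm in the discrete form suggested by (3.2) with |$Q=[0,1]^{d}$|, namely |$\sum _{n}\|\phi \|_{L^\infty ([0,1]^{d}+n)}$|, the minimal sketch below approximates the partial sums over |$-N\le n<N$| in |$d=1$| by sampling each unit cell. For the box function |$\mathbb{1}_{[0,1)}$| the value stays equal to |$1$|, whereas for the normalized sinc function the local suprema decay only like |$1/(\pi |n|)$| and the partial sums grow roughly like |$(2/\pi )\log N$|, so (5.2) fails for it. Sampling on half-open cells is a simplification standing in for the essential supremum.

```python
import numpy as np

def wiener_partial_norm(phi, N, samples_per_cell=64):
    """Approximate sum_{n=-N}^{N-1} sup_{x in [n, n+1)} |phi(x)| by sampling each unit cell."""
    offsets = np.arange(samples_per_cell) / samples_per_cell   # points in [0, 1)
    cells = np.arange(-N, N)[:, None] + offsets[None, :]        # shape (2N, samples_per_cell)
    return np.abs(phi(cells)).max(axis=1).sum()

box = lambda x: ((x >= 0) & (x < 1)).astype(float)   # indicator of [0, 1)
sinc = lambda x: np.sinc(x)                           # sin(pi x)/(pi x)

box_norms = [wiener_partial_norm(box, N) for N in (10, 100, 1000)]    # stays equal to 1
sinc_norms = [wiener_partial_norm(sinc, N) for N in (10, 100, 1000)]  # keeps growing
```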

 

Assumption C.

At least one of the conditions (5.2) and (5.3) of Assumption B is satisfied for all |$\partial _{j} \phi $|⁠, |$j=1,\ldots ,d$|⁠, in place of |$\phi $|⁠.

 

Example 1.

This is a convenient place to present some examples of functions satisfying the assumptions. Generally speaking, (5.2) is satisfied by any function |$\phi \in L^\infty ({\mathbb{R}}^{d})$| with compact support, while the same condition on the Fourier side (i.e. |$\hat{\phi } \in L^\infty ({\mathbb{R}}^{d})$| with compact support) guarantees that (5.3) holds. To be more concrete, let us provide some standard examples in dimension |$d=1$|; Assumption A is satisfied in all cases (cf. [19, Section 3.1.3, pages 69–70]).

  • The choice |$\phi = \mathbb{1}_{[0,1]}$|⁠, leading to piecewise constant approximations (block sampling), is easily seen to satisfy (5.2) but not (5.3) for any |$\alpha>1/2$|⁠, nor Assumption C.

  • The normalized sinc function |$\phi (x) = \frac{\sin (\pi x)}{\pi x}$|⁠, corresponding to Shannon approximations (i.e. band-limited functions), satisfies (5.3) for every |$\alpha>0$|⁠, as well as Assumption C, but not (5.2).

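  • The B-spline |$\phi $| of degree |$n$|, obtained as the |$(n+1)$|-fold convolution of |$\mathbb{1}_{[0,1]}$| with itself and centred at |$0$| or |$1/2$|, can be characterized by its Fourier transform:
    |$\hat{\phi }(\omega ) = \left (\frac{\sin (\omega /2)}{\omega /2}\right )^{n+1} e^{-i\varepsilon \omega /2}$|, with |$\varepsilon =1$| if |$n$| is even and |$\varepsilon =0$| if |$n$| is odd.
    We see that if |$n\ge 1$| then both (5.2) and (5.3) are satisfied (the latter for |$1/2<\alpha < n+1/2$|), as well as Assumption C (the case |$n=0$| is covered by the previous case of |$\phi = \mathbb{1}_{[0,1]}$|).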
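For (5.3), a minimal sketch can be given under the assumption that it reads |$\sum _{k\in{\mathbb{Z}}^{d}}|\hat{\phi }(\omega +2\pi k)|^{2}\,v(\omega +2\pi k)^{2\alpha }\le B'$| for a.e. |$\omega $| (a reading consistent with item 1 of Remark 3), and taking |$\langle t\rangle =(1+t^{2})^{1/2}$|. In |$d=1$| the truncated sums below, evaluated at |$\omega =1$| with |$\alpha =3/4$|, grow like |$\sqrt{K}$| for the box function (|$n=0$|) but stabilize for the linear spline (|$n=1$|), matching the threshold |$\alpha <n+1/2$| stated above. Both the reading of (5.3) and the parameters are assumptions of this sketch.

```python
import numpy as np

def weighted_periodization(omega, n, alpha, K):
    """Truncated sum over |k| <= K of |phi_hat(omega + 2 pi k)|^2 <omega + 2 pi k>^(2 alpha)
    for the degree-n B-spline in d = 1, with <t> = (1 + t^2)^(1/2)."""
    xi = omega + 2 * np.pi * np.arange(-K, K + 1)
    phi_hat_sq = np.sinc(xi / (2 * np.pi)) ** (2 * (n + 1))   # |sin(xi/2)/(xi/2)|^(2(n+1))
    return np.sum(phi_hat_sq * (1 + xi ** 2) ** alpha)        # <xi>^(2 alpha) = (1 + xi^2)^alpha

omega, alpha = 1.0, 0.75
box = [weighted_periodization(omega, 0, alpha, K) for K in (100, 10_000)]     # grows like sqrt(K)
linear = [weighted_periodization(omega, 1, alpha, K) for K in (100, 10_000)]  # converges
```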

In Assumption B we introduced the weight function |$v$|⁠. Let us now define a companion Sobolev space, for |$\alpha \in \mathbb{R}$|⁠, |$\alpha \geq 0$|⁠:

Roughly speaking, |$H^\alpha _\otimes ({\mathbb{R}}^{d})$| consists of functions in |$L^{2}({\mathbb{R}}^{d})$| having at least |$\alpha $| (possibly fractional) derivatives in |$L^{2}({\mathbb{R}}^{d})$| along the directions of the coordinate axes. It is easy to see that this space contains the usual Sobolev space |$H^{d\alpha }({\mathbb{R}}^{d})$|, as well as tensor products |$\phi _{1}\otimes \cdots \otimes \phi _{d}$| with |$\phi _{j}\in H^\alpha (\mathbb{R})$|, |$j=1,\ldots , d$|.

 

Proposition 7.
If |$\alpha>1/2$| we have the embedding |$H^\alpha _\otimes ({\mathbb{R}}^{d})\hookrightarrow L^\infty ({\mathbb{R}}^{d})\cap C({\mathbb{R}}^{d})$|⁠, as well as

 

Proof.
The embedding in |$L^\infty $| follows at once from the chain of inequalities
and the fact that |$v^{-\alpha } \in L^{2}({\mathbb{R}}^{d})$| if |$\alpha> 1/2$|⁠. The embedding in |$C({\mathbb{R}}^{d})$| is then clear because the space of Schwartz functions is easily seen to be dense in |$H^\alpha _\otimes ({\mathbb{R}}^{d})$|⁠.
Concerning the embedding in |$X^{\infty ,2}$|⁠, let |$g \in C^\infty _{c}({\mathbb{R}}^{d})$|⁠, with |$g=1$| on |$B_{1}$|⁠. Then
where the last inequality is proved in [13, Proposition 11.3.1(c)].
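The threshold |$\alpha>1/2$| in Proposition 7 enters only through the integrability of |$v^{-2\alpha }$|. Taking |$\langle t\rangle =(1+t^{2})^{1/2}$| (an assumption on the bracket notation), the integral factorizes along the coordinates, and the Beta-function identity |$\int _{\mathbb{R}}(1+t^{2})^{-\alpha }\,dt=\sqrt{\pi }\,\Gamma (\alpha -1/2)/\Gamma (\alpha )$|, valid for |$\alpha>1/2$|, gives |$\|v^{-\alpha }\|_{L^{2}({\mathbb{R}}^{d})}^{2}$| in closed form. The minimal sketch below compares this with a midpoint rule; it illustrates the threshold and is not part of the proof.

```python
import math
import numpy as np

def weight_l2_norm_sq(alpha, d):
    """||v^{-alpha}||_{L^2(R^d)}^2 with v(w) = <w_1>...<w_d> and <t> = (1 + t^2)^(1/2).

    Each one-dimensional factor equals sqrt(pi) Gamma(alpha - 1/2) / Gamma(alpha),
    which is finite exactly when alpha > 1/2.
    """
    if alpha <= 0.5:
        return math.inf
    one_dim = math.sqrt(math.pi) * math.gamma(alpha - 0.5) / math.gamma(alpha)
    return one_dim ** d

def midpoint_check(alpha, L=1e3, num=200_000):
    """Midpoint-rule approximation of int_{-L}^{L} (1 + t^2)^(-alpha) dt in one dimension."""
    h = 2 * L / num
    t = -L + h * (np.arange(num) + 0.5)
    return np.sum((1 + t ** 2) ** (-alpha)) * h
```

For instance, |$\alpha =1$| gives |$\pi ^{d}$| and |$\alpha =2$| gives |$(\pi /2)^{d}$|, while every |$\alpha \le 1/2$| yields a divergent integral.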

We now establish a crucial reverse Hölder-type inequality for functions in |$U_{s}$|⁠.

 

Theorem 8.

Let |$\phi \in L^{2}({\mathbb{R}}^{d})$| be such that Assumption A is satisfied.

  1. If Assumption B holds then there exists |$C>0$| such that, for every |$r,s>0$|⁠,
    (5.4)
  2. If Assumption C holds then there exists |$C>0$| such that, for every |$r,s>0$|⁠,

 

Remark 2.
Let |$P_{U_{s}}$| be the orthogonal projection onto |$U_{s}$|. Since |$\| P_{U_{s}} \|_{L^{2} \to L^{2}} = 1$|, (5.4) is equivalent to

 

Proof of Theorem  8.
We begin with the proof of (5.4). Let |$\{ \tilde{\phi }_{s,n} \}_{n \in{\mathbb{Z}}^{d}}$| be the dual basis of |$\{ {\phi }_{s,n} \}_{n \in{\mathbb{Z}}^{d}}$|. If |$f\in U_{s}$| then
and by Lemma 1 we have
(5.6)
where in the last step we used Lemma 1 and Proposition 4.
Assume now (5.2), namely |$\phi \in X^{\infty ,1}$|⁠. Then the conclusion follows from (5.6) using the equivalent discrete-type norm in (3.2) (with |$Q=[0,1]^{d}$|⁠):
where we used that |$\ell ^{1} * \ell ^{2} \hookrightarrow \ell ^{2}$| and |$\| a_{n} \|_{\ell ^{2}} \lesssim \| f\|_{L^{2}}$|⁠.
Let us assume (5.3) instead. By (5.6), it is enough to show that
Using the embedding in Proposition 7 we obtain
where we set |$F(\omega ) := \sum _{n \in{\mathbb{Z}}^{d}} a_{n} e^{-in\omega }$| (a function that is |$2\pi $|-periodic in each variable and square integrable on |$[0,2\pi ]^{d}$|), and then used (5.3) and

The proof of (5.5) goes along the same lines after differentiation in the representation |$f = \sum _{n \in{\mathbb{Z}}^{d}} a_{n} \phi _{s,n}$|⁠; the details are left to the interested reader.
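The embedding |$\ell ^{1} * \ell ^{2} \hookrightarrow \ell ^{2}$| used above is Young's inequality for sequences, |$\|a*b\|_{\ell ^{2}}\le \|a\|_{\ell ^{1}}\|b\|_{\ell ^{2}}$|; in the proof the |$\ell ^{1}$| factor presumably comes from the local suprema of |$\phi $| controlled by (5.2), and the |$\ell ^{2}$| factor from the coefficients |$a_{n}$|. The minimal sketch below checks the inequality on random finitely supported sequences in one dimension; the random data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def young_ratio(len_a, len_b):
    """Return ||a * b||_2 / (||a||_1 ||b||_2) for random finitely supported sequences a, b."""
    a = rng.standard_normal(len_a)
    b = rng.standard_normal(len_b)
    conv = np.convolve(a, b)          # full discrete convolution (a * b)_m = sum_n a_n b_{m-n}
    return np.linalg.norm(conv, 2) / (np.linalg.norm(a, 1) * np.linalg.norm(b, 2))

ratios = [young_ratio(la, lb) for la in (1, 5, 50) for lb in (1, 10, 200)]
# Every ratio is at most 1; equality holds when a is a single (scaled) Dirac mass.
```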

 

Remark 3.

  1. It is easy to see that if |$\phi $| satisfies (5.3) then |$\phi \in H^\alpha _\otimes ({\mathbb{R}}^{d})$| (it is enough to integrate both sides of (5.3) over |$[0,2\pi ]^{d}$|); as a result, since |$\alpha>1/2$|, |$\phi $| is continuous by Proposition 7.

  2. If |$\phi \in L^{2}({\mathbb{R}}^{d})$| satisfies Assumption C then |$\phi $| has first-order partial derivatives locally in |$L^\infty $|⁠, hence |$\phi $| is locally Lipschitz, therefore continuous.

  3. If |$\phi \in L^{2}({\mathbb{R}}^{d})$| satisfies Assumptions A and B and is continuous, then |$U_{s} \hookrightarrow C({\mathbb{R}}^{d})$|, since the truncated sums |$\sum _{|n|\le N} a_{n} T_{sn}\phi $| are continuous and (5.4) shows that, for functions in |$U_{s}$|, convergence in |$L^{2}$| implies convergence in |$X^{\infty ,2}_{r} \hookrightarrow L^\infty ({\mathbb{R}}^{d})$|.

  4. If |$s \ll r$| then the occurrence of the factor |$r/s$| in (5.4) can be heuristically explained by the presence of highly oscillating functions in |$U_{s}$|⁠, which are not stable under deformations of ‘size’ |$r$|⁠.

We are ready to provide deformation sensitivity bounds for functions in |$U_{s}$|⁠.

 

Theorem 9.
Let |$\phi \in L^{2}({\mathbb{R}}^{d})\cap C({\mathbb{R}}^{d})$| satisfy Assumptions A and B. There exists a constant |$C>0$| such that, for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| and |$s>0$|⁠,
(5.7)

 

Proof.

The desired estimate follows by combining Proposition 5, which applies because the assumptions on |$\phi $| imply |$U_{s} \hookrightarrow C({\mathbb{R}}^{d})$| (cf. Remark 3), with Theorem 8 for |$r=\| \tau \|_{L^\infty }$|.

 

Theorem 10.
Let |$\phi \in L^{2}({\mathbb{R}}^{d}) $| be such that Assumptions A, B and C are satisfied. There exists a constant |$C>0$| such that
(5.8)
for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠, |$s>0$| and |$f \in U_{s}$|⁠.

 

Proof.
Let us first consider the case |$\|\tau \|_{L^\infty }/s \le 1$|. Combining Proposition 6 with Theorem 8 for |$r=\|\tau \|_{L^\infty }$|, we infer, for |$f\in U_{s}$|,
which is the claim.

The case |$\|\tau \|_{L^\infty }/s \ge 1$| follows from the triangle inequality |$\| F_\tau f - f \|_{L^{2}} \le \|F_\tau f \|_{L^{2}} + \| f \|_{L^{2}}$| combined with Theorem 9.

 

Remark 4.

More generally, Theorem 10 continues to hold if |$f$| is replaced on the left-hand side by |$P_{U_{s}} f$| with |$f \in L^{2}({\mathbb{R}}^{d})$|, cf. Remark 2. Moreover, in view of Example 1, Theorem 10 applies when |$U_{s}$| are approximation spaces of polynomial splines of degree |$n\geq 1$|, as well as of band-limited functions, which can be regarded as splines of infinite order.

We conclude this section by extending the above stability bounds to signal classes with minimal regularity. In addition to the assumptions of Theorem 10, we suppose that |$V_{j}:= U_{2^{j}}$|, |$j\in \mathbb{Z}$|, define a multiresolution approximation of |$L^{2}({\mathbb{R}}^{d})$|, so that |$V_{j+1}\subset V_{j}$|. Let |$W_{j+1}$| be the orthogonal complement of |$V_{j+1}$| in |$V_{j}$|, and let |$P_{W_{j}}$| denote the orthogonal projection onto |$W_{j}$|; for |$s\in \mathbb{R}$|, the corresponding homogeneous Besov norm [19, Section 9.2.3] is given by

(5.9)

 

Theorem 11.

Under the same assumptions as in Theorem 10, suppose in addition that |$V_{j}:= U_{2^{j}}$|, |$j\in \mathbb{Z}$|, define a multiresolution approximation of |$L^{2}({\mathbb{R}}^{d})$|.

There exists |$C>0$| such that for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$| and |$f\in L^{2}({\mathbb{R}}^{d})$| with |$\|f\|_{\dot{B}^{d/2}_{2,1}}<\infty $|⁠,
(5.10)
and
(5.11)

 

Proof.
We consider the decomposition
and apply (5.8) to each term, hence we obtain
(5.12)
which implies the desired result if |$d\geq 2$|⁠.
For |$d=1$| it is sufficient to continue the estimate in (5.12) using

 

Remark 5.

From the very definition (5.9) of the Besov norm, it follows that if |$d\geq 2$| and |$f\in L^{2}({\mathbb{R}}^{d})$| with |$\|f\|_{\dot{B}^{d/2}_{2,1}}<\infty $| then |$\|f\|_{\dot{B}^{1}_{2,1}}<\infty $|⁠.

Also, note that even in dimension 1 we have |$\|F_\tau f - f\|_{L^{2}}=O(\|\tau \|_{L^\infty })$| as |$\|\tau \|_{L^\infty }\to 0$| for every fixed |$f\in U_{s}$| and every |$s>0$|⁠, as a consequence of Theorem 10. However, this asymptotic estimate is not uniform in the ball |$\|f\|_{L^{2}}+\|f\|_{\dot{B}^{1/2}_{2,1}}\leq 1$|⁠, and the factor |$\|\tau \|^{1/2}_{L^\infty }$| in (5.11) is instead optimal when looking for uniform estimates; see the examples in Section 7 below. In dimension |$d\geq 2$| it follows easily from (5.10) that |$\|F_\tau f - f\|_{L^{2}}=O(\|\tau \|_{L^\infty })$| as |$\|\tau \|_{L^\infty }\to 0$| uniformly for |$f$| in the ball |$\|f\|_{L^{2}}+\|f\|_{\dot{B}^{d/2}_{2,1}}\leq 1$|⁠.

6. Frequency-modulated deformations

In this section we extend some results proved so far to the class of time-frequency deformation mappings |$F_{\tau ,\omega }$| associated with distortion functions |$\tau \in L^\infty ({\mathbb{R}}^{d}; {\mathbb{R}}^{d})$|⁠, |$\omega \in L^\infty ({\mathbb{R}}^{d};\mathbb{R})$| by setting

where |$f\colon{\mathbb{R}}^{d} \to{\mathbb{C}}$|. When one of the two distortions vanishes identically, we write |$F_{0,\omega }$| and |$F_{\tau ,0}$|, with obvious meaning.

While most of the results above can be stated and proved with minor modifications for general deformations |$F_{\tau ,\omega }$|, we prefer to offer here a different perspective that reduces matters to the results for |$F_\tau $| in a straightforward way. Indeed, note that |$F_{\tau ,\omega } = F_{0,\omega }F_{\tau ,0}$|, where |$F_{\tau ,0}$| coincides with the deformation |$F_\tau $| considered in the previous sections. Moreover, for every |$f \in L^{2}({\mathbb{R}}^{d})$| we have |$\| F_{\tau ,\omega } f \|_{L^{2}} = \|F_{\tau ,0} f \|_{L^{2}}$| for arbitrary measurable |$\omega $|, and

The second addend is already covered, while for the first one we have

As a result, the bounds in Propositions 5 and 6 generalize as follows.

 

Theorem 12.
We have
(6.1)
for every |$f \in X^{\infty ,2}_{r} \cap C({\mathbb{R}}^{d})$| and |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠, |$\omega \in L^\infty ({\mathbb{R}}^{d};\mathbb{R})$|⁠.
Moreover, there exists |$C>0$| such that
(6.2)
for every |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠, |$\omega \in L^\infty ({\mathbb{R}}^{d};\mathbb{R})$| and |$f \in X^{\infty ,2}_{r}$| with |$\| \nabla f \|_{X^{\infty ,2}_{r}}<\infty $|⁠.

Arguing as in the proof of Theorem 10 and using the bounds in Theorem 12 where appropriate, we obtain the following generalization.

 

Theorem 13.
Let |$\phi \in L^{2}({\mathbb{R}}^{d})$| be such that Assumptions A, B and C in Section 5 hold. There exists a constant |$C>0$| such that
(6.3)
for every |$s>0$|⁠, |$f \in U_{s}$| and |$\tau \in L^\infty ({\mathbb{R}}^{d};{\mathbb{R}}^{d})$|⁠, |$\omega \in L^\infty ({\mathbb{R}}^{d};\mathbb{R})$|⁠.

We remark that for band-limited functions, |$U_{s} = \mathrm{PW}_{R}$| with |$s=\pi /R$|, and in the relevant case |$R\|\tau \|_{L^\infty } \le 1$|, we recover the bounds proved in [25] without extra regularity conditions on |$\tau $| or |$\omega $|. Similarly, the Besov-space estimates of the previous section could be generalized.
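The reduction to |$F_\tau $| used in this section can be illustrated numerically. The minimal sketch below assumes the convention presumably shared with [25], |$F_{\tau ,\omega }f(x)=e^{2\pi i\omega (x)}f(x-\tau (x))$|; under it, the pointwise bound |$|e^{2\pi i\omega (x)}-1|\le 2\pi |\omega (x)|$| yields |$\|F_{\tau ,\omega }f-f\|_{L^{2}}\le 2\pi \|\omega \|_{L^\infty }\|F_{\tau }f\|_{L^{2}}+\|F_{\tau }f-f\|_{L^{2}}$|. The code checks on a grid, for discontinuous bounded |$\tau $| and |$\omega $| chosen arbitrarily, that the modulation preserves the |$L^{2}$| norm and that this splitting inequality holds.

```python
import numpy as np

def l2(vals, dx):
    """Riemann-sum approximation of the L^2(R) norm of sampled values."""
    return np.sqrt(np.sum(np.abs(vals) ** 2) * dx)

x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
f = lambda t: np.exp(-t ** 2) * np.cos(3.0 * t)

tau = 0.05 * np.sign(np.sin(5.0 * x))     # bounded, discontinuous spatial distortion
omega = 0.01 * np.sign(np.cos(7.0 * x))   # bounded, discontinuous frequency distortion

F_tau_f = f(x - tau)                                   # F_{tau,0} f
F_tau_omega_f = np.exp(2j * np.pi * omega) * F_tau_f   # F_{0,omega} F_{tau,0} f (assumed convention)

lhs = l2(F_tau_omega_f - f(x), dx)
rhs = 2 * np.pi * np.abs(omega).max() * l2(F_tau_f, dx) + l2(F_tau_f - f(x), dx)
```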

7. Sharpness of the estimates

We now study the problem of the sharpness of some estimates proved so far, focusing in particular on the case of band-limited functions.

For |$R>0$| consider the space of band-limited functions

We already commented in Example 1 that such a space of low-frequency functions can equivalently be described as a multiresolution approximation space; precisely, we have |$\mathrm{PW}_{R} = U_{s}$| with |$s=\pi / R$| after choosing the normalized low-pass sinc filter |$\phi =\phi _{0} \otimes \cdots \otimes \phi _{0}$| (|$d$| times), with |$\phi _{0}(t) = \pi ^{-1/2}\sin t/t$|, |$t \in{\mathbb{R}}$|, which satisfies Assumptions A, B and C.

Theorems 9 and 10 above thus cover the case of band-limited approximations. Precisely, (5.7) now reads

(7.1)

while (5.8) becomes

(7.2)

We claim that the exponent |$d/2$| appearing in the previous estimates is optimal. As for (7.1), it suffices to consider |$f_{R} \in \mathrm{PW}_{R}$| given by |$f_{R}=R^{d/2} D_{R} \phi $|, so that |$\| f_{R}\|_{L^{2}} = 1$| and |$\widehat{f_{R}} = (\pi /R)^{d/2} \mathbb{1}_{[-R,R]^{d}}$|. Now, for |$K>0$| set

so that |$\| \tau \|_{L^\infty } = K$|⁠. Then, for |$|x|\le K$| we have

and thus

By the triangle inequality we also deduce

(7.3)

which shows the sharpness of the exponent |$d/2$| in (7.2) as well.
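The growth rate behind (7.1) can be visualized in |$d=1$| with any field |$\tau $| of size |$\|\tau \|_{L^\infty }=K$| that freezes |$f_{R}$| at a single value on a large set. The minimal sketch below uses the hypothetical choice |$\tau (x)=x$| for |$|x|\le K$| and |$\tau (x)=0$| otherwise, which need not coincide with the field defined above: then |$F_\tau f_{R}=f_{R}(0)=(R/\pi )^{1/2}$| on |$[-K,K]$|, so |$\|F_\tau f_{R}\|_{L^{2}}^{2}$| lies between |$2RK/\pi $| and |$2RK/\pi +1$|, i.e. |$\|F_\tau f_{R}\|_{L^{2}}$| is of order |$(RK)^{1/2}=(RK)^{d/2}$| for |$RK\gg 1$|. Here |$f_{R}(x)=(R/\pi )^{1/2}\sin (Rx)/(Rx)$|, which has unit norm and |$\widehat{f_{R}}=(\pi /R)^{1/2}\mathbb{1}_{[-R,R]}$|.

```python
import numpy as np

def f_R(x, R):
    """d = 1 signal f_R(x) = (R/pi)^(1/2) sin(Rx)/(Rx), with unit L^2 norm."""
    return np.sqrt(R / np.pi) * np.sinc(R * x / np.pi)   # np.sinc(t) = sin(pi t)/(pi t)

def deformed_norm_sq(R, K, L=2_000.0, num=800_001):
    """Riemann-sum approximation of ||F_tau f_R||_2^2 for the hypothetical field
    tau(x) = x on |x| <= K and tau(x) = 0 elsewhere (so that ||tau||_inf = K)."""
    x = np.linspace(-L, L, num)
    dx = x[1] - x[0]
    tau = np.where(np.abs(x) <= K, x, 0.0)
    vals = f_R(x - tau, R)          # equals f_R(0) = (R/pi)^(1/2) on [-K, K]
    return np.sum(vals ** 2) * dx

R = 1.0
results = {K: deformed_norm_sq(R, K) for K in (10.0, 100.0)}
# Each value lies between 2RK/pi and 2RK/pi + 1, so ||F_tau f_R||_2 grows like (RK)^(1/2).
```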

Concerning the sharpness of the estimate (7.2) in the regime |$R\| \tau \|_{L^\infty }\ll 1$|, we see that if |$f=f_{R}$| as above and |$\tau (x)=(c,0,\ldots ,0)\in{\mathbb{R}}^{d}$| is constant, then for |$|c|R$| small enough we have
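For a constant shift |$\tau \equiv c$| in |$d=1$|, the error can be computed exactly by Plancherel's theorem (with |$\hat{f}(\omega )=\int f(x)e^{-ix\omega }\,dx$|, so that |$\|f\|_{L^{2}}^{2}=(2\pi )^{-1}\|\hat{f}\|_{L^{2}}^{2}$|): since |$\widehat{f_{R}}=(\pi /R)^{1/2}\mathbb{1}_{[-R,R]}$|, one finds |$\|f_{R}(\cdot -c)-f_{R}\|_{L^{2}}^{2}=2\bigl(1-\sin (cR)/(cR)\bigr)$|. This behaves like |$(cR)^{2}/3$| for |$|c|R\ll 1$|, so the error is linear in |$\|\tau \|_{L^\infty }$| in that regime, while it tends to |$2$| as |$|c|R\to \infty $|, consistent with the triangle-inequality bound. The minimal sketch below evaluates the closed form and checks it against a Riemann sum; the truncation interval and grid are arbitrary.

```python
import numpy as np

def shift_error_closed_form(c, R):
    """||f_R(. - c) - f_R||_2 for f_R(x) = (R/pi)^(1/2) sin(Rx)/(Rx), via Plancherel:
    the squared norm equals 2 (1 - sin(cR)/(cR))."""
    return np.sqrt(2.0 * (1.0 - np.sinc(c * R / np.pi)))

def shift_error_numeric(c, R, L=2_000.0, num=800_001):
    """Riemann-sum approximation of the same norm on [-L, L]."""
    x = np.linspace(-L, L, num)
    dx = x[1] - x[0]
    f_R = lambda t: np.sqrt(R / np.pi) * np.sinc(R * t / np.pi)
    return np.sqrt(np.sum((f_R(x - c) - f_R(x)) ** 2) * dx)

R = 1.0
small = shift_error_closed_form(0.01, R)    # close to cR/sqrt(3): linear in ||tau||_inf
large = shift_error_closed_form(1e6, R)     # close to sqrt(2): no decay for large shifts
numeric = shift_error_numeric(0.5, R)       # should agree with shift_error_closed_form(0.5, R)
```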

8. Random deformations

We now model the deformation |$\tau (x)$| as a measurable random field, i.e. |$\tau (x)=\tau (x,\omega )$| depends on an additional variable |$\omega \in \mathcal{U}$| (see Footnote 4), where the sample space |$\mathcal{U}$| is equipped with a probability measure |$\mathbb{P}$| and the function |$\tau (x,\omega )$| is jointly measurable (see for instance [14, Chapter 3] for further details).

It is easy to see that the results of the previous sections hold for almost every realization of |$\tau (x)$| if, e.g., |$\|\tau \|_{L^\infty }<\infty $|, where from now on this norm is understood as the essential supremum jointly in |$x,\omega $|. However, some results turn out to hold in a maximal sense (see Footnote 5). Precisely, an inspection of the proof of (4.1) shows that we have

(8.1)

and similarly (4.3) becomes

(8.2)

As a consequence, under the assumptions of Theorem 10 we have, for |$f\in U_{s}$|⁠,

(8.3)

while arguing as in the proof of Theorem 11 we get

(8.4)

and

(8.5)
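The maximal formulation mentioned in Footnote 5 rests on a pointwise fact: whenever |$|\tau (x)|\le r$|, one has |$|F_\tau f(x)-f(x)|\le \sup _{|y|\le r}|f(x-y)-f(x)|$| regardless of the realization of |$\tau $|, so any bound on the right-hand side applies uniformly to all such random fields. The minimal sketch below checks this domination on a grid for random fields with |$|\tau |\le r$|; restricting |$\tau $| to integer multiples of the grid step and padding the signal with its edge values are simplifications made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def local_max_deviation(f_vals, m):
    """Grid version of sup_{|y| <= r} |f(x - y) - f(x)| with r = m * dx; the signal is
    padded with its edge values so that all shifts stay in range."""
    n = f_vals.size
    padded = np.pad(f_vals, m, mode="edge")                  # padded[i + m] == f_vals[i]
    # Row for shift k holds f(x_i - k dx) for all i.
    shifted = np.stack([padded[m - k : m - k + n] for k in range(-m, m + 1)])
    return np.abs(shifted - f_vals[None, :]).max(axis=0)

x = np.linspace(-10.0, 10.0, 4001)
f_vals = np.exp(-x ** 2) * np.cos(4.0 * x)
m = 25                                                       # r = 25 * dx = 0.125
maximal = local_max_deviation(f_vals, m)

padded = np.pad(f_vals, m, mode="edge")
idx = np.arange(f_vals.size)
dominated = []
for _ in range(100):
    k = rng.integers(-m, m + 1, size=f_vals.size)            # random field with |tau(x_i)| <= r
    deformed = padded[m - k + idx]                           # F_tau f(x_i) = f(x_i - k_i dx)
    dominated.append(bool(np.all(np.abs(deformed - f_vals) <= maximal)))
```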

We are now ready to state our result concerning the stability in mean under random deformations.

 

Theorem 14.
Under Assumptions A, B and C of Section 5, there exists a constant |$C>0$| such that, for every |$s>0$| and |$f\in U_{s}$|,
(8.6)
and
(8.7)
for every measurable random function |$\tau $| such that the random variables |$|\tau (x)|$|⁠, |$x \in{\mathbb{R}}^{d}$|⁠, are identically distributed and the above moments are finite.
Moreover, if the spaces |$U_{2^{j}}$|⁠, |$j\in \mathbb{Z}$|⁠, define a multiresolution approximation of |$L^{2}({\mathbb{R}}^{d})$|⁠, for the same deformations |$\tau (x)$| and every |$f\in L^{2}({\mathbb{R}}^{d})$| with |$\|f\|_{\dot{B}^{d/2}_{2,1}} <\infty $| we have
(8.8)
and
(8.9)

For the sake of brevity, we wrote |$\mathbb{E}[|\tau |^{2}]$| in place of |$\mathbb{E}[|\tau (x)|^{2}]$|⁠, and similarly for the other moments, since the variables |$|\tau (x)|$|⁠, |$x\in{\mathbb{R}}^{d}$|⁠, are assumed to be identically distributed. However, observe that the field |$\tau (x)$| is not assumed to be bounded.

 

Proof of Theorem  14.
Let us prove (8.6) and (8.7) first. Let us set
Then we can write
Taking the expectation and setting |$p_{j}=\mathbb{P}(\{2^{j-1}<|\tau (x)|\leq 2^{j}\})$| (note that |$p_{j}$| is independent of |$x$|⁠) we get
Using the estimate (8.3) to bound each term, we obtain
We now observe that, for every |$x\in{\mathbb{R}}^{d}$|⁠,
and similarly
Hence we have proved the estimate
which gives (8.6) and (8.7).

Similar arguments lead to the proof of (8.8) and (8.9), now using (8.4) and (8.5).
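The dyadic bookkeeping in the proof rests on an elementary observation: on the event |$\{2^{j-1}<|\tau (x)|\le 2^{j}\}$| one has |$|\tau (x)|\le 2^{j}<2|\tau (x)|$|, hence |$\mathbb{E}[|\tau |]\le \sum _{j}2^{j}p_{j}\le 2\,\mathbb{E}[|\tau |]$| and |$\mathbb{E}[|\tau |^{2}]\le \sum _{j}4^{j}p_{j}\le 4\,\mathbb{E}[|\tau |^{2}]$|. The minimal sketch below verifies these bounds for the empirical distribution of samples drawn from a half-normal law, an unbounded example chosen arbitrarily; the function name and sampling setup are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def dyadic_moments(abs_tau):
    """For samples of |tau| > 0, return (sum_j 2^j p_j, sum_j 4^j p_j), where p_j is the
    empirical probability of the event {2^(j-1) < |tau| <= 2^j}."""
    j = np.ceil(np.log2(abs_tau))           # integer j with 2^(j-1) < |tau| <= 2^j
    return np.mean(2.0 ** j), np.mean(4.0 ** j)

samples = np.abs(rng.normal(scale=0.3, size=100_000))   # half-normal: unbounded support
samples = samples[samples > 0]
m1, m2 = dyadic_moments(samples)
```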

Acknowledgements

The authors wish to express their gratitude to Giovanni S. Alberti, Enrico Bibbona and Matteo Santacesaria for fruitful conversations on the topics of the manuscript, as well as for valuable comments on preliminary drafts.

S.I.T. was a member of the Machine Learning Genoa (MaLGa) Center, Università di Genova, when this study was performed. This material is based upon work supported by the Air Force Office of Scientific Research under award number FA8655-20-1-7027.

F.N. is a fellow of the Accademia delle Scienze di Torino, and a member of the Società Italiana di Scienze e Tecnologie Quantistiche (SISTEQ).

The authors are members of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).

Data availability statement

No new data were generated or analysed in support of this research.

Footnotes

1

Precisely, |$\|\nabla \tau \|_{L^\infty }\leq 1/(2d)$| in [25] and |$\| \nabla \tau \|_{L^\infty }\leq 1/2$| in [18]. This discrepancy is due to the definition |$\| \nabla \tau \|_{L^\infty }:= \| |\nabla \tau | \|_{L^\infty }$|, where |$|\nabla \tau |$| is the Frobenius norm of the matrix |$\nabla \tau (x)$| in [18] and the |$\ell ^\infty $| norm of its entries in [25].

2

Actually, the result in [17] is stated for functions in the Sobolev space |$H^{2}({\mathbb{R}}^{d})$|. An inspection of the proof and an easy density argument show that it holds for functions in the Sobolev space |$H^{1}({\mathbb{R}}^{d})$|, i.e. for functions |$f\in L^{2}({\mathbb{R}}^{d})$| such that |$\|\nabla f\|_{L^{2}}<\infty $|.

3

That is, |$\|f - \overline{ f}_{x,r}\|_{L^\infty (B(x,r))}\leq C r\|\nabla f\|_{L^\infty (B(x,r))}$|⁠, where |$\overline{ f}_{x,r}$| is the average of |$f$| over |$B(x,r)$|⁠. Since under our assumption |$f$| is continuous in |${\mathbb{R}}^{d}$|⁠, we can replace the |$L^\infty $| norm in the left-hand side by the supremum of |$|f|$|⁠, and then one obtains (4.5) from the triangle inequality (by adding and subtracting |$\overline{ f}_{x,r}$|⁠).

4

In this section we do not consider frequency-modulated deformations, nor do we use the notation |$\omega $| for the frequency, hence there is no risk of confusion with the notation of previous sections.

5

Actually, we could equivalently reformulate the main estimates of the previous sections as results for the maximal operators |$\sup _{|y|\leq r}|f(x-y)|$| and |$\sup _{|y|\leq r}|f(x-y)-f(x)|$|⁠. However, the above presentation in terms of their linearized versions |$F_\tau $| and |$F_\tau -I$| seems closer to the spirit of the intended applications.

References

1. Aliprantis, C. D. & Border, K. (2006) Infinite Dimensional Analysis. A Hitchhiker’s Guide. Berlin: Springer.

2. Balan, R., Singh, M. & Zou, D. (2018) Lipschitz properties for deep convolutional networks. Contemp. Math., 706, 129–151.

3. Bietti, A. & Mairal, J. (2017) Invariance and stability of deep convolutional representations. In: Advances in Neural Information Processing Systems (NIPS). Red Hook, NY, USA: Curran Associates Inc.

4. Bietti, A. & Mairal, J. (2019) Group invariance, stability to deformations, and complexity of deep convolutional representations. J. Mach. Learn. Res., 20, 1–49.

5. Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. (2025) Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges. To appear, MIT Press. arXiv:2104.13478.

6. Bruna, J. & Mallat, S. (2013) Invariant scattering convolution networks. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1872–1886.

7. Cordero, E. & Nicola, F. (2009) Sharpness of some properties of Wiener amalgam and modulation spaces. Bull. Austral. Math. Soc., 80, 105–116.

8. D’Ancona, P. & Nicola, F. (2016) Sharp |${L}^p$| estimates for Schrödinger groups. Rev. Mat. Iberoamericana, 32, 1019–1038.

9. Evans, L. C. (2010) Partial Differential Equations. Graduate Studies in Mathematics, 19, Second edn. Providence, Rhode Island: American Mathematical Society.

10. Feichtinger, H. G. (1983) Banach convolution algebras of Wiener type. In: Functions, Series, Operators, Vol. I, II (Budapest, 1980). Colloq. Math. Soc. János Bolyai, 35. Amsterdam: North-Holland, pp. 509–524.

11. Feichtinger, H. G. (1981) Banach spaces of distributions of Wiener’s type and interpolation. In: Functional Analysis and Approximation (Oberwolfach, 1980). Internat. Ser. Numer. Math., 60. Basel-Boston, Mass.: Birkhäuser, pp. 153–165.

12. Grafakos, L. (2014) Modern Fourier Analysis, Third edn. New York: Springer New York.

13. Gröchenig, K. (2001) Foundations of Time-Frequency Analysis. Applied and Numerical Harmonic Analysis. Boston, MA: Birkhäuser Boston, Inc.

14. Gihman, I. I. & Skorohod, A. V. (1980) The Theory of Stochastic Processes I. Translated from the Russian by Samuel Kotz. Corrected reprint of the first English edition. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], 210. New York, NY: Springer New York.

15. Grohs, P., Wiatowski, T. & Bölcskei, H. (2016) Deep convolutional neural networks on cartoon functions. In: 2016 IEEE International Symposium on Information Theory (ISIT), 1163–1167.

16. Heil, C. (2003) An introduction to weighted Wiener amalgams. In: Wavelets and their Applications (S. Thangavelu, M. Krishna & R. Radha eds). New Delhi: Allied Publishers, pp. 183–216.

17. Koller, M., Großmann, J., Monich, U. & Boche, H. (2018) Deformation stability of deep convolutional neural networks on Sobolev spaces. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 6872–6876.

18. Mallat, S. (2012) Group invariant scattering. Commun. Pure Appl. Math., 65, 1331–1398.

19. Mallat, S. (2009) A Wavelet Tour of Signal Processing. The Sparse Way. With contributions from Gabriel Peyré, Third edn. Amsterdam: Elsevier/Academic Press.

20. Nicola, F. & Trapasso, S. I. (2023) Stability of the scattering transform for deformations with minimal regularity. J. Math. Pures Appl. (9), 180, 122–150.

21. Prakash, A., Moran, N., Garber, S., DiLillo, A. & Storer, J. (2018) Deflecting adversarial attacks with pixel deflection. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8571–8580.

22. Scaman, K. & Virmaux, A. (2018) Lipschitz regularity of deep neural networks: analysis and efficient estimation. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS 2018). Red Hook, NY, USA: Curran Associates Inc.

23. Stein, E. M. (1970) Singular Integrals and Differentiability Properties of Functions. Princeton Mathematical Series, No. 30. Princeton, N.J.: Princeton University Press, xiv+290 pp.

24. Tao, T. (1999) Low regularity semi-linear wave equations. Commun. Partial Differ. Equations, 24, 599–629.

25. Wiatowski, T. & Bölcskei, H. (2018) A mathematical theory of deep convolutional neural networks for feature extraction. IEEE Trans. Inf. Theory, 64, 1845–1866.

26. Wiatowski, T. & Bölcskei, H. (2015) Deep convolutional neural networks based on semi-discrete frames. In: 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 1212–1216.

27. Zou, D., Balan, R. & Singh, M. (2020) On Lipschitz bounds of general convolutional neural networks. IEEE Trans. Inf. Theory, 66, 1738–1759.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)