Image reconstruction algorithms in radio interferometry: from handcrafted to learned regularization denoisers

We introduce a new class of iterative image reconstruction algorithms for radio interferometry, at the interface of convex optimization and deep learning, inspired by plug-and-play methods. The approach consists in learning a prior image model by training a deep neural network (DNN) as a denoiser, and substituting it for the handcrafted proximal regularization operator of an optimization algorithm. The proposed AIRI ("AI for Regularization in radio-interferometric Imaging") framework, for imaging complex intensity structure with diffuse and faint emission from visibility data, inherits the robustness and interpretability of optimization, and the learning power and speed of networks. Our approach relies on three steps. Firstly, we design a low dynamic range training database from optical intensity images. Secondly, we train a DNN denoiser at a noise level inferred from the signal-to-noise ratio of the data. We use training losses enhanced with a nonexpansiveness term ensuring algorithm convergence, and including on-the-fly database dynamic range enhancement via exponentiation. Thirdly, we plug the learned denoiser into the forward-backward optimization algorithm, resulting in a simple iterative structure alternating a denoising step with a gradient-descent data-fidelity step. We have validated AIRI against CLEAN, optimization algorithms of the SARA family, and a DNN trained to reconstruct the image directly from visibility data. Simulation results show that AIRI is competitive in imaging quality with SARA and its unconstrained forward-backward-based version uSARA, while providing significant acceleration. CLEAN remains faster but offers lower quality. The end-to-end DNN offers further acceleration, but with far lower quality than AIRI.


INTRODUCTION
Data acquisition by interferometry in radio astronomy relies on an array of antennas measuring an incomplete coverage of the spatial Fourier domain of the image of interest, yielding an ill-posed inverse problem towards image formation. Modern and upcoming radio telescopes are designed to bring unprecedented resolution and sensitivity. In this context, the algorithms to be deployed to solve the inverse imaging problem face an ever-increasing requirement to jointly deliver precision (i.e. high resolution and dynamic range), robustness (i.e. endowed with calibration and uncertainty quantification functionalities), and scalability (i.e. the capability to process sheer data volumes).
Most RI imaging pipelines utilized by astronomers are based on the CLEAN algorithm, originally proposed by Högbom (1974). Assuming a sparse sky model, this greedy algorithm iteratively removes the contribution of each point source in the dirty image (more generally known as the backprojected image, defined as the inverse Fourier transform of the data). Numerous extensions of CLEAN have been devised over the last fifty years (e.g. Schwarz 1978; Schwab & Cotton 1983; Bhatnagar & Cornwell 2004; Cornwell et al. 2008; Thompson et al. 2017). Albeit simple and computationally efficient, CLEAN-based methods often require extensive manual intervention to ensure their stability. Furthermore, CLEAN restored images often exhibit sub-optimal imaging quality. On the one hand, their resolution is limited to that of the observations due to the convolution with a smoothing beam. On the other hand, their sensitivity is limited to the noise level, through the addition of the residual image.
★ E-mail: m.terris@hw.ac.uk
Since their first inception by Wiaux et al. (2009), convex and nonconvex optimization algorithms underpinned by sparsity priors have emerged in radio interferometric (RI) imaging (Li et al. 2011; Dabbech et al. 2015; Garsden et al. 2015; Onose et al. 2016, 2017; Repetti et al. 2017; Pratley et al. 2018; Dabbech et al. 2018; Repetti et al. 2018, 2019; Birdi et al. 2020; Thouvenin et al. 2022a,b) and have led to a much higher image precision than that offered by advanced versions of CLEAN. In this framework, a so-called objective function is defined, typically as the sum of a data-fidelity term and a regularization term injecting a prior image model to compensate for data incompleteness. The image estimate is defined as the minimizer of this objective and is reached via provably convergent algorithms. The obtained solution can also be understood in a Bayesian framework as a maximum a posteriori (MAP) estimate with respect to a posterior distribution, the negative logarithm of which is the objective. Uncertainty quantification approaches fully powered by optimization algorithms have also been proposed (Repetti et al. 2018, 2019). The versatility of the framework has progressively imposed optimization algorithms as a cornerstone of current state-of-the-art techniques in RI imaging. We note that they have also proven their worth for image reconstruction in the related fields of optical interferometry (Thiébaut & Young 2017) and very long baseline interferometry (VLBI) (Akiyama et al. 2019, 2022). However, albeit robust and interpretable, optimization algorithms come at the expense of a significant increase of the computational cost over CLEAN, ultimately affecting their scalability to large data volumes.
Bayesian inference approaches have also been proposed to address the RI image formation problem (Cai et al. 2018; Arras et al. 2019), also for VLBI (Akiyama et al. 2022), naturally enabling uncertainty quantification, but remaining computationally very demanding for the data volumes expected from modern radio telescopes.
Recent works have contemplated the use of deep neural networks (DNNs) for end-to-end reconstruction or post-processing in RI imaging (Terris et al. 2019; Connor et al. 2022; Gheller & Vazza 2021). Despite promising results and tremendous scalability capabilities, these approaches have neither been thoroughly validated for high-resolution high-dynamic-range imaging of complex structure involving diffuse and faint emission, nor compared with CLEAN or state-of-the-art optimization approaches. Moreover, end-to-end DNNs are known to be subject to robustness issues (Goodfellow et al. 2015; Nguyen et al. 2015; Pang et al. 2018). Their exploitation raises questions, not only with regard to the interpretability of the obtained solution, specifically in a Bayesian context, but also their generalizability, i.e. their ability to provide accurate reconstruction quality under acquisition conditions unseen during training.
In this work, we propose a new RI image reconstruction approach based on the Plug-and-Play (PnP) framework, whereby a prior image model is learned by training a DNN as a denoiser, and substituted for the so-called proximal operator enforcing regularization at the heart of proximal optimization algorithms (Venkatakrishnan et al. 2013; Chan et al. 2016; Zhang et al. 2017, 2019). PnP algorithms were shown to deliver outstanding image reconstruction quality in applications such as image restoration (Zhang et al. 2021) and medical imaging (Ahmad et al. 2020). Importantly, as the DNN is trained as a denoiser on an image database to serve as a simple regularization operator, it is by construction applicable for any sensing procedure, de facto avoiding the generalizability issue affecting end-to-end approaches. Leveraging monotone operator theory, recent works have shown that these algorithms provably converge to well-characterized solutions under nonexpansiveness conditions on the denoiser, yielding similar interpretations to that of MAP estimators (Cohen et al. 2021; Pesquet et al. 2021). In summary, this versatile framework simultaneously inherits the robustness and interpretability of optimization approaches, and the learning power and scalability capabilities of DNNs.
Building on Pesquet et al. (2021), we propose a PnP framework dubbed AIRI, standing for "AI for Regularization in radio-interferometric Imaging". Focusing on monochromatic intensity imaging, our approach for developing a first AIRI algorithm capable of imaging complex structure with diffuse and faint emission can be summarized in three steps. The first step is to design, from publicly available optical images, a realistic but low dynamic range database of intensity images for supervised training. The second step is to train a DNN denoiser with a basic architecture ensuring positivity of the reconstructed images, at the signal-to-noise ratio of the data.
The training relies on a simple ℓ2 or ℓ1 loss enhanced with a firm-nonexpansiveness term ensuring algorithm convergence, and on an on-the-fly exponentiation procedure for dynamic range enhancement. The third step consists in plugging the DNN into the forward-backward (FB) optimization algorithm, resulting in a simple iterative structure alternating between a denoising step and a gradient-descent data-fidelity step.
We finally validate the resulting AIRI-ℓ2 and AIRI-ℓ1 algorithms, implemented in Matlab, on high-resolution high-dynamic-range simulations utilizing intensity images containing diffuse and faint emission across the field of view. Our test images are 3c353, Hercules A, Centaurus A, and Cygnus A, with size 512 × 512. Our benchmark algorithms are (i) the multi-scale CLEAN version implemented in the C++ WSClean software package (Offringa & Smirnov 2017), (ii) two optimization algorithms from the SARA family leveraging the handcrafted "average sparsity" proximal regularization operator: SARA itself (Onose et al. 2017) and its unconstrained FB-based version uSARA that we introduce here, and (iii) a UNet trained in an end-to-end fashion to reconstruct target images from dirty images. While SARA relies on an advanced primal-dual FB (PDFB) algorithm (Pesquet & Repetti 2015) enabling non-differentiable data-fidelity constraints, uSARA only differs from AIRI-ℓ2 and AIRI-ℓ1 by the use of the average sparsity proximal operator in lieu of a learned denoiser for regularization. It in fact corresponds to the standalone version of the imaging module of the joint calibration and imaging approach proposed in Repetti et al. (2017) and Dabbech et al. (2021). Results show that AIRI-ℓ2 and AIRI-ℓ1 are already competitive with SARA and uSARA in terms of imaging quality, while providing a significant acceleration. In a nutshell, AIRI does indeed inherit the robustness and interpretability of optimization approaches, and the learning power and speed of DNNs. The WSClean code remains significantly faster but offers lower reconstruction quality. The UNet offers further acceleration, but with far lower quality than AIRI. The remainder of the paper is organized as follows. In Section 2, we review imaging approaches from proximal optimization, recall SARA, and introduce its unconstrained version uSARA. In Section 3, we review PnP algorithms through the prism of proximal algorithms and introduce AIRI. In Section 4, we study the performance of AIRI-ℓ2 and AIRI-ℓ1 through extensive simulation in comparison with the benchmark algorithms. In Section 5, we draw our conclusions and discuss necessary future work.

RI imaging problem
Aperture synthesis in radio astronomy probes the sky by measuring an incomplete coverage of the spatial Fourier domain of the image of interest through an array of antennas. Focusing on monochromatic intensity imaging, assuming a narrow field of view, and in the absence of atmospheric and instrumental perturbations, each pair of antennas acquires a noisy Fourier component of the intensity image to be formed, called a visibility. The associated Fourier mode (also called (u,v)-point) is given by the projection of the corresponding baseline, expressed in units of the observation wavelength, onto the plane perpendicular to the line of sight (Thompson et al. 2017). A discrete formulation of the resulting linear RI image formation problem, aiming to restore a target intensity image x ∈ R^N from the measured complex visibilities y ∈ C^M, reads (Onose et al. 2016)

y = Φx + e,    (1)

where Φ = GFZ ∈ C^{M×N} is the measurement operator, G ∈ C^{M×K} is a sparse interpolation matrix, encoding the non-uniform Fourier transform, F ∈ C^{K×K} is the 2D Discrete Fourier Transform, Z ∈ R^{K×N} is a zero-padding operator, incorporating the correction for the convolution performed through the operator G, and e ∈ C^M is a realization of some i.i.d. Gaussian random noise, with zero mean and standard deviation τ > 0. We refer to Appendix A for considerations regarding more general noise distributions, and the inclusion of so-called direction-dependent effects (DDEs) in G.
We also note that backprojecting problem (1) into the image domain gives

Re{Φ†y} = Re{Φ†Φ}x + Re{Φ†e},    (2)

where (•)† denotes the complex conjugate transpose and Re{•} denotes the real part. Re{Φ†y} is known as the dirty image, Re{Φ†Φ} is the operator representing the convolution of x by the point spread function, also known as the dirty beam, and Re{Φ†e} is the noise backprojected in the image domain.
We finally emphasise that in the optical interferometry and VLBI contexts, imaging is often performed from closure quantities derived from the visibilities, bringing a nonlinear inverse problem affected by non-Gaussian noise (Akiyama et al. 2019, 2022). Imaging in the presence of DDEs or from closure quantities lies beyond the scope of the present work.
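To make the measurement model concrete, the forward and backprojection steps of (1) and (2) can be sketched as follows. This is a toy illustration only: the sparse de-gridding matrix G is replaced here by a plain binary Fourier sampling mask (an assumption of the sketch, not the paper's NUFFT-based operator), and all names (`forward`, `adjoint`, `dirty`) are illustrative.

```python
import numpy as np

# Toy sketch of the RI measurement model y = Phi x + e (eq. 1), with a
# simplified Phi in which the interpolation matrix G is replaced by a binary
# Fourier sampling mask standing in for the incomplete (u,v) coverage.
rng = np.random.default_rng(0)
N = 64                                   # image side; x lives in R^(N*N)
x = np.maximum(rng.standard_normal((N, N)), 0)  # nonnegative toy "sky"

pad = 2 * N                              # Z: zero-padding to an oversampled grid
mask = rng.random((pad, pad)) < 0.1      # sampled Fourier modes

def forward(img):
    """Phi = G F Z: zero-pad, FFT, then sample the covered Fourier modes."""
    padded = np.zeros((pad, pad), dtype=complex)
    padded[:N, :N] = img
    return np.fft.fft2(padded, norm="ortho")[mask]

def adjoint(vis):
    """Phi^dagger: embed visibilities on the grid, inverse FFT, crop."""
    grid = np.zeros((pad, pad), dtype=complex)
    grid[mask] = vis
    return np.fft.ifft2(grid, norm="ortho")[:N, :N]

tau = 1e-3                               # noise standard deviation
y = forward(x) + tau * (rng.standard_normal(mask.sum())
                        + 1j * rng.standard_normal(mask.sum())) / np.sqrt(2)
dirty = np.real(adjoint(y))              # dirty image Re{Phi^dagger y} (eq. 2)
```

With the unitary FFT convention, `adjoint` is the exact adjoint of `forward`, which is what makes the backprojected formulation (2) well defined.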

Proximal algorithms & FB
Leaving the nature of Φ aside, problems of the likes of (1) are ubiquitous in imaging sciences, and arise for instance in image restoration (Levin et al. 2009; Yang et al. 2010; Bredies & Holler 2020), hyperspectral imaging (Wang et al. 2015; Xie et al. 2019) and medical imaging (Gupta et al. 2018; Zbontar et al. 2018; Fessler 2020), to name a few. A widespread approach to solve such a problem is to reformulate it as a convex minimization problem

minimize_{x ∈ R^N} f(x) + λr(x),    (3)

where f(x) + λr(x) is called the objective function, f ∈ Γ0(R^N) is the term enforcing data fidelity, r ∈ Γ0(R^N) is the regularization term introduced to address the ill-posedness of (1) by enforcing a prior image model, and λ > 0 is a regularization parameter. The theory of optimization offers a myriad of algorithms to solve such minimization problems, ranging from the simple Forward-Backward (FB) algorithm (Bauschke & Combettes 2017), applicable if one of the two terms is differentiable, to ADMM (Boyd et al. 2011), or the Douglas-Rachford algorithm (Eckstein & Bertsekas 1992), applicable even if both f and r are non-differentiable, and more evolved structures, such as PDFB (Pesquet & Repetti 2015), that can handle multi-term objective functions with as many non-differentiable data-fidelity and regularization terms processed in parallel. All these algorithms are part of the same class of proximal algorithms, where differentiable and non-differentiable terms respectively translate in the algorithm into gradient and proximal operators.
Considering r ∈ Γ0(R^N), its proximal operator prox_r is defined as

prox_r(z) = argmin_{u ∈ R^N} { (1/2)‖z − u‖² + r(u) },    (4)

and can be interpreted as a generalization of a projection operator, or, interestingly, as a denoising operator. Indeed, problem (4) can be interpreted as the minimization problem to be solved for a simple denoising problem of the form z = u + w, where the data z result from adding some i.i.d. Gaussian noise w to the unknown u. From this perspective, the first term of the objective function in (4) would be the standard data-fidelity term, given the Gaussian nature of the noise, and r the regularization term. A proximal operator is a denoiser! The choice of f and r in (3) is of paramount importance as it influences the final solution to the minimization task (Mallat 1999; Selesnick et al. 2005; Bredies & Holler 2020). Interpreting the reconstructed image as a MAP estimate leads naturally to choosing f as the negative log-likelihood associated with a given statistical model of the noise (typically a squared ℓ2 norm for i.i.d. additive Gaussian noise), and r as the negative logarithm associated with a statistical prior image model (typically, an ℓ1 norm for a Laplace prior). However, optimization algorithms are not tied to this statistical interpretation. In the context of compressive sensing theory (Candès et al. 2006; Donoho 2006; Baraniuk 2007), where sparsity is the postulated signal model, the choice of the ℓ1 norm as a regularization term simply emanates from it being the closest convex relaxation of the ℓ0 norm that is the natural sparsity measure. ℓ1 regularization was shown to yield state-of-the-art image reconstruction methods beyond the context of compressive sensing. For instance, sparsity of the gradient is enforced as the ℓ1 norm of the magnitude of the image gradient, that is the so-called Total Variation (TV) semi-norm (Rudin et al. 1992; Bredies et al. 2010; Bredies & Holler 2020), whereas sparsity of the sought image in a sparsifying domain A ∈ R^{N×B} can be enforced by choosing r(x) = ‖A†x‖₁, where A can be a wavelet dictionary (Mallat 1999), an x-let transform (Candes & Demanet 2003; Do & Vetterli 2003), or a learned dictionary (Mairal et al. 2009), to name a few. Generally, the choice of the data-fidelity term is driven by the statistical nature of the measurement noise, while regularization terms are carefully handcrafted to meet the specificity of the applications, with sparsity models defining the state of the art. In summary, the MAP interpretation has its limitations, and in particular, setting the regularization parameter λ according to the proper normalization of the statistical models of which f and λr would be the negative logarithms is known to be suboptimal.
The FB algorithm is a proximal algorithm designed to solve problems of the form (3), where f, r ∈ Γ0(R^N) and f is differentiable. It is defined by the iterative sequence

x_{k+1} = prox_{γλr}( x_k − γ∇f(x_k) ),    (5)

which is proven to converge towards a minimizer of the objective function (3), provided that 0 < γ < 2/L, where L is the Lipschitz constant of ∇f (Bauschke & Combettes 2017). Accelerated versions of FB, leveraging inertial terms, preconditioning, and stochastic approaches, also exist. We also emphasize that FB can be used to solve problems involving nonconvex regularization terms r (Beck & Teboulle 2009; Attouch et al. 2013; Chouzenoux et al. 2014; Combettes & Pesquet 2015; Repetti & Wiaux 2021).
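As a concrete illustration of iteration (5), the following sketch runs FB on a small synthetic sparse-recovery problem, with f the squared residual and r the ℓ1 norm, whose proximal operator is component-wise soft-thresholding. The problem sizes and variable names are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal forward-backward sketch for min_x f(x) + lam*||x||_1 (eqs. 3 and 5),
# with f(x) = 0.5*||y - A x||^2. The prox of the l1 norm is component-wise
# soft-thresholding -- the "denoising" (backward) step of FB.
rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = 1.0   # sparse ground truth
y = A @ x_true

def soft(z, t):
    """Soft-thresholding: prox of t*||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

lam = 0.1
L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad f
gamma = 1.0 / L                      # stepsize, must satisfy 0 < gamma < 2/L
x = np.zeros(100)
for _ in range(500):
    grad = A.T @ (A @ x - y)                     # forward (gradient) step
    x = soft(x - gamma * grad, gamma * lam)      # backward (prox) step
```

The iterates monotonically decrease the objective and converge to a sparse minimizer; this is exactly the structure that uSARA and AIRI inherit, with richer regularization steps.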

SARA
The SARA family represents state-of-the-art optimization algorithms for RI imaging (Carrillo et al. 2012, 2014; Abdulaziz et al. 2016; Onose et al. 2016, 2017; Birdi et al. 2018; Pratley et al. 2018; Dabbech et al. 2018; Abdulaziz et al. 2019; Thouvenin et al. 2022a). SARA is formulated as a constrained minimization task of the form

minimize_{x ∈ R^N} r(x) subject to ‖y − Φx‖₂ ≤ ε,    (6)

(Carrillo et al. 2012). The function r defines the SARA prior, also called the "average sparsity" prior, which consists of a positivity constraint and a log-sum prior promoting average sparsity in an over-complete dictionary Ψ ∈ R^{N×B}. More specifically, the dictionary Ψ is defined as the concatenation of D = 9 orthogonal bases (the first eight Daubechies wavelets and the Dirac basis), re-normalized by √D = 3 to ensure that ΨΨ† = I, where I denotes the identity operator. The SARA prior explicitly reads

r(x) = λ Σ_{n=1}^{B} log( ρ + |(Ψ†x)_n| ) if x ∈ R^N_+, and r(x) = +∞ otherwise,    (7)

where (•)_n denotes the n-th coefficient of its argument vector, and R^N_+ denotes the N-dimensional real positive orthant. The parameter λ > 0 is a regularization parameter, while ρ > 0 is used to avoid reaching zero values in the argument of the logarithmic terms. Thouvenin et al. (2022a) suggest to set both parameters to an estimate of the standard deviation of the measurement noise in the wavelet domain. Minimizing (6) should typically involve the proximal operator prox_r, dubbed the "average sparsity" proximal operator. However, if the log-sum prior enforces a stronger sparsity of the solution than a typical ℓ1 prior, it is also nonconvex. To address the resulting nonconvex minimization task, a reweighting procedure is adopted, where a sequence of convex surrogate minimization problems is solved iteratively (Candes et al. 2008; Carrillo et al. 2012), each involving a weighted-ℓ1 prior

g(x) = λ‖WΨ†x‖₁ if x ∈ R^N_+, and g(x) = +∞ otherwise,    (8)

where W ∈ R^{B×B} is a diagonal weighting matrix that needs to be updated after each resolution of the surrogate problem. The full SARA algorithm for solving (6) can be summarized as: for k = 0, 1, . . .
x_{k+1} ∈ Argmin_{x ∈ R^N_+} ‖W_k Ψ†x‖₁ subject to ‖y − Φx‖₂ ≤ ε,  W_{k+1} = Diag( ρ / (ρ + |Ψ†x_{k+1}|) ),    (9)

with W_0 = I, where Diag(•) denotes the diagonal matrix containing its (vector) argument on the diagonal. Minimizing each of these problems will now involve the weighted-ℓ1 prior g via its proximal operator prox_g. One should acknowledge that the reweighting strategy also introduces significant complexity to the algorithm, since a sequence of convex minimization problems needs to be solved. Given the non-differentiability of the data-fidelity term in (6), the PDFB algorithm (Pesquet & Repetti 2015) is leveraged to solve each minimization task of the form (9). We note that, because Ψ† is over-complete and due to the presence of the positivity constraint in (8), the proximal operator prox_g does not admit a closed-form solution, thus a priori requiring a sub-iterative structure. However, the full splitting functionalities of PDFB enable the decomposition of g into functions admitting simple proximal operators (a component-wise projection on the real positive orthant and a component-wise thresholding operation). These proximal operators are handled in parallel by PDFB without the need for sub-iterations (Onose et al. 2016). Interestingly, the same functionalities enable handling large data volumes via parallel processing of data blocks (Onose et al. 2016), and large image sizes via parallel processing of image facets (Thouvenin et al. 2022a), providing significant scalability to SARA.
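The reweighting loop can be sketched generically as below, assuming the standard weight-update rule w = ρ/(ρ + |Ψ†x|) of Candes et al. (2008): coefficients that survived the previous round are penalised less, so the sequence of weighted-ℓ1 problems approximates the log-sum prior. `solve_weighted_l1` is a hypothetical placeholder for one full resolution of the inner surrogate problem (handled by PDFB in SARA).

```python
import numpy as np

# Generic reweighting loop behind the SARA/uSARA surrogate problems.
# `solve_weighted_l1(x, w)` is a placeholder for one inner minimisation with
# diagonal weights w; `Psi_t` applies the analysis operator Psi^dagger.
def reweighted_solve(solve_weighted_l1, Psi_t, x0, rho, n_reweights=5):
    x = x0
    w = np.ones_like(Psi_t(x0))                  # W_0 = I
    for _ in range(n_reweights):
        x = solve_weighted_l1(x, w)              # solve the weighted-l1 surrogate
        w = rho / (rho + np.abs(Psi_t(x)))       # update the diagonal weights
    return x
```

With this rule, large surviving coefficients receive weights close to 0 (little penalty), while coefficients at the noise floor keep weights close to 1.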

Unconstrained SARA (uSARA)
Building on the works of Repetti & Wiaux (2020, 2021), we propose to focus on the unconstrained formulation of the SARA problem, that we dub unconstrained SARA (uSARA), again looked at from the prism of monochromatic intensity RI imaging only. More precisely, we consider the problem (3), with a differentiable data-fidelity term f chosen as the mean squared error loss, corresponding (up to a multiplicative constant) to the negative log-likelihood of the data under the i.i.d. Gaussian random noise assumption, given by

f(x) = (1/2)‖y − Φx‖₂².    (10)

Adopting the prior model r described in (7), the resulting nonconvex minimization task writes

minimize_{x ∈ R^N} f(x) + r(x),    (11)

where λ > 0 is a regularization parameter. The nonconvexity of the prior r from (7) is handled through a similar reweighting procedure as in (9): for k = 0, 1, . . .
x_{k+1} ∈ Argmin_{x ∈ R^N_+} f(x) + λ‖W_k Ψ†x‖₁,  W_{k+1} = Diag( ρ / (ρ + |Ψ†x_{k+1}|) ),    (12)

with W_0 = I. At each reweighting iteration, problem (12) can be solved with a standard FB algorithm. The full reweighting procedure for solving the uSARA problem (11) is summarized in Algorithm 1, where ξ₁ > 0 is a relative variation convergence criterion, and where the gradient of f at x reads as

∇f(x) = Re{Φ†(Φx − y)},    (13)

and its Lipschitz constant is the spectral norm of Re{Φ†Φ}, i.e.

L = ‖Re{Φ†Φ}‖_S.    (14)
[Algorithm 1: Re-weighted FB algorithm for uSARA.]
We recall that the stepsize γ in (5) is to be upper-bounded by 2/L. As appears in Algorithm 1 solving (12) (steps 5-7), the weighted-ℓ1 prior g is involved via prox_g. The algorithmic structure does not exhibit the same full splitting functionalities as PDFB, and prox_g must be computed iteratively. This can typically be achieved via a so-called dual FB algorithm (Combettes & Pesquet 2011), detailed in Algorithm 2, where ξ₂ > 0 is a relative variation convergence criterion. This algorithm alternates between a projection on the real positive orthant (Step 3), denoted by Π_{R^N_+}, and the proximal operator of the dual of the weighted-ℓ1 prior, which involves a component-wise soft-thresholding operator (Step 4). Formally, Algorithms 1 and 2 lead to a triply sub-iterative structure: a proximal operator loop (Step 2), inside a weighted-ℓ1 loop (Step 5), inside a reweighting loop (Step 3). This suggests a very computationally expensive structure. Nonetheless, firstly, Repetti & Wiaux (2021) have shown that only a fixed number K of FB iterations is required (Step 2), corresponding to the approximate minimization of (12), while preserving the convergence of the overall algorithm to a minimizer of the nonconvex objective (11). This contrasts with SARA, where the sub-problems (9) need to be solved to convergence. The value of K can in fact be optimized to provide significant acceleration of Algorithm 1, effectively removing one iteration layer.
Secondly, the number of iterations in Algorithm 2 is moderate in practice when using an appropriate initialization strategy for the dual variable v₀, leading to a mild computational cost of prox_g. We conclude this section by underlining that uSARA corresponds to the imaging module of the joint calibration and imaging approach described in Repetti et al. (2017) and Dabbech et al. (2021).
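The structure of Algorithm 2 can be sketched as below, assuming the standard dual forward-backward update of Combettes & Pesquet (2011): the primal step is the projection onto the positive orthant, and the dual step is the proximal operator of the dual of λ‖W·‖₁, which reduces to a component-wise clipping (the dual form of soft-thresholding). `Psi`, `Psi_t`, the stepsize `mu`, and the fixed iteration count are assumptions of this sketch.

```python
import numpy as np

# Dual forward-backward sketch for prox of g(x) = i_{x>=0}(x) + lam*||W Psi^T x||_1,
# which has no closed form when Psi is over-complete. The stepsize mu should
# satisfy 0 < mu < 2 / ||Psi||_S^2 for convergence.
def prox_weighted_l1_pos(z, Psi, Psi_t, w, lam, mu, n_iter=200):
    v = np.zeros_like(Psi_t(z))                  # dual variable v_0
    for _ in range(n_iter):
        x = np.maximum(z - Psi(v), 0.0)          # projection onto the positive orthant
        u = v + mu * Psi_t(x)                    # dual forward (gradient) step
        v = np.clip(u, -lam * w, lam * w)        # prox of the dual of lam*||W .||_1
    return np.maximum(z - Psi(v), 0.0)           # primal recovery
```

With `Psi` the identity (a sanity check, not the over-complete SARA dictionary), this recovers the closed-form positive soft-thresholding operator.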

PnP-FB
When solving an inverse imaging problem from an optimization-theory viewpoint, one defines an objective function, of which the sought image would be a minimizer, to be obtained via an iterative algorithm. The PnP approach (Venkatakrishnan et al. 2013) looks at the problem directly through the lens of the algorithm. As discussed in Section 2.2, it follows from the definition of a proximal operator that it can be interpreted as a denoiser. Borrowing a proximal optimization algorithm, the PnP approach proposes to replace the proximal regularization operator with a more general denoiser. We note that this procedure can be applied to a wide class of algorithms, ranging from FB to PDFB. Specifically, assuming a minimization problem of the form (3) with a differentiable data-fidelity term f, recall that the FB algorithm reads as in (5), with the proximal operator (4). Its PnP counterpart (PnP-FB) then simply follows as

x_{k+1} = D( x_k − γ∇f(x_k) ),    (15)

where D is a denoising operator, i.e. an operator specifically designed to remove i.i.d. Gaussian random noise from an image. Denoisers are ubiquitous in imaging sciences, and a large variety of denoisers have been defined and studied in the signal processing literature, including Gaussian filters, BM3D (Dabov et al. 2007), and non-local means (NLM) (Buades et al. 2011), with, more recently, deep neural networks (DNNs) leading the state of the art (Zhang et al. 2017). It is therefore very tempting to plug powerful denoisers in lieu of proximal operators...
and play. In this context, we aim at taking advantage of the learning capabilities of DNNs in order to learn an appropriate regularization denoiser. Just as handcrafting the regularization term r (from which the proximal regularization operator prox_r results) is crucial to the reconstruction quality of a pure optimization approach, learning a powerful denoiser is a cornerstone for the reconstruction quality of PnP algorithms. Recent advances in DNNs for image denoising tasks have shown new state-of-the-art results over traditional denoisers (Zhang et al. 2017; Wang et al. 2018; Zhou et al. 2020). Simultaneously, PnP algorithms with DNNs as denoisers have become the new state of the art in image reconstruction (Zhang et al. 2021), significantly improving over their traditional optimization counterparts.
Yet, replacing the proximal operator with an off-the-shelf denoising operator is not inconsequential. In particular, a general non-proximal denoiser can a priori not be related to a regularization term in some overarching objective function. As a consequence, the characterization of the PnP solution (e.g. as a MAP estimator), and worse, the convergence properties of the iterative structure, are not ensured anymore, thus questioning the robustness and interpretability of PnP solutions. To overcome that issue, we follow the approach of Pesquet et al. (2021), allowing to enforce a so-called "firm nonexpansiveness" constraint on the denoiser D during training. This constraint ensures that D "contracts distances" and yields both the convergence of Algorithm 3 and the characterization of the limit point (see Appendix B for more details). The PnP-FB algorithm with established convergence guarantees is summarized in Algorithm 3, where ξ₃ > 0 is a relative variation convergence criterion. In the RI imaging case of interest, the PnP counterpart to uSARA follows from defining f as in (10), with the measurement operator Φ from (1). The gradient of f at x therefore reads as in (13), with Lipschitz constant given in (14).
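The PnP-FB structure, i.e. iteration (15) with a relative-variation stopping criterion, can be sketched generically as follows. Here `D` is any denoising callable (in AIRI, a trained, firmly nonexpansive DNN); the function and argument names are illustrative.

```python
import numpy as np

# Generic PnP forward-backward sketch (eq. 15): the proximal (backward) step of
# FB is replaced by a denoiser D. Convergence is guaranteed when D satisfies
# the firm-nonexpansiveness condition enforced during AIRI training.
def pnp_fb(grad_f, D, x0, gamma, n_iter=100, tol=1e-5):
    x = x0
    for _ in range(n_iter):
        x_next = D(x - gamma * grad_f(x))    # gradient step, then denoising step
        if np.linalg.norm(x_next - x) <= tol * np.linalg.norm(x_next):
            return x_next                    # relative-variation stopping criterion
        x = x_next
    return x
```

Note how the only difference with the FB iteration (5) is the replacement of `prox` by `D`: the data-fidelity machinery (gradient, stepsize bound γ < 2/L) is unchanged.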

Denoising i.i.d. Gaussian noise of specific standard deviation
Building from Pesquet et al. (2021), our AIRI denoiser D will be trained in a supervised manner to remove i.i.d. Gaussian random noise of specific standard deviation σ. In other words, D is set to tackle the denoising problem

z = u + σw,    (16)

where z denotes the noisy data input to the DNN, u is the unknown image corresponding to the target output of the DNN, and w ∼ N(0, I). As discussed in the next section, the input signal-to-noise ratio of the RI data provides a reliable handle on σ.
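The training setup of (16) amounts to generating pairs (z, u) on the fly. A minimal sketch is given below; the exponential mapping used for the optional dynamic-range enhancement is one plausible choice, not necessarily the paper's exact transform, and `make_training_pair` and the parameter `a` are hypothetical names.

```python
import numpy as np

# Generate a supervised denoising pair (z, u) with z = u + sigma*w, w ~ N(0, I)
# (eq. 16). If `a` is given, u in [0, 1] is first remapped by a monotone
# exponential transform (an illustrative stand-in for the paper's on-the-fly
# dynamic-range enhancement), which still maps 0 -> 0 and 1 -> 1.
def make_training_pair(u, sigma, a=None, rng=None):
    rng = rng or np.random.default_rng()
    if a is not None:
        u = (np.exp(a * u) - 1.0) / (np.exp(a) - 1.0)
    z = u + sigma * rng.standard_normal(u.shape)
    return z, u
```

The denoiser is then trained to map z back to u at the single noise level σ dictated by the heuristic of the next section.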

Equating training & target dynamic ranges
As already emphasized, AIRI will emerge from PnP-FB with f from (10) and Φ from (1). We propose to set the training noise level σ for D to an appropriate estimate of the standard deviation of some effective image-domain noise induced by the original i.i.d. Gaussian random noise of standard deviation τ on the Fourier data in (1). We propose below a procedure to estimate the standard deviation of this image-domain noise, which, for simplicity, we also approximate to be i.i.d. Gaussian random noise. Firstly, we resort to the image-domain formulation (2) of (1), normalized by the Lipschitz constant L in (14): Re{Φ†y}/L = Re{Φ†Φ}x/L + Re{Φ†e}/L. By construction, the chosen normalization ensures that the convolution operator Re{Φ†Φ}/L has unit spectral norm, so that the dirty image is at the same scale as the original image. The covariance matrix of the image-domain noise Re{Φ†e}/L thus writes τ²Re{Φ†Φ}/(2L²). Secondly, as proposed in Thouvenin et al. (2022a), we discard the correlation structure of the image-domain noise via the approximation Re{Φ†Φ} ≈ L I, leading to a noise covariance matrix approximation τ²I/(2L). As a result, the effective image-domain noise is assumed i.i.d. Gaussian, and the original standard deviation τ in the measurement domain is rescaled to τ/√(2L) in the transfer to the image domain. Equating the standard deviations of the training noise for D and the image-domain noise leads to the following heuristic:

σ = τ/√(2L).    (17)

We note that this heuristic is algorithm agnostic, i.e. a priori independent of the algorithmic structure in which the denoiser is plugged.
In this sense, while similar, it is simpler and more general than the one proposed in Pesquet et al. (2021), which was derived specifically by estimating σ around the fixed point of (15). As will be shown through simulations in Section 4, the proposed heuristic provides a very precise estimate of the optimal training noise level.
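Heuristic (17) is cheap to evaluate once the Lipschitz constant L = ‖Re{Φ†Φ}‖_S is known; a standard way to estimate L is power iteration on Re{Φ†Φ}. In the sketch below, `forward` and `adjoint` stand for Φ and Φ† and are assumptions of the sketch, as is the function name.

```python
import numpy as np

# Estimate L = ||Re{Phi^dagger Phi}||_S by power iteration, then return the
# training noise level of heuristic (17): sigma = tau / sqrt(2 L).
def training_noise_level(forward, adjoint, shape, tau, n_iter=50, seed=0):
    x = np.random.default_rng(seed).standard_normal(shape)
    for _ in range(n_iter):                          # power method iterations
        x = np.real(adjoint(forward(x)))
        x /= np.linalg.norm(x)
    L = np.linalg.norm(np.real(adjoint(forward(x)))) # spectral-norm estimate
    return tau / np.sqrt(2.0 * L)
```

For a toy operator Φ = cI, one has Re{Φ†Φ} = c²I, so L = c² and the returned value is τ/(c√2), matching the image-domain rescaling argument above.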
Assuming the validity of this heuristic, one must acknowledge a dependency of the training noise level σ on the measurement model, via τ and L, suggesting that a denoiser should be trained independently for each target reconstruction. Also, we consider a normalized database of images whose maximum intensity values are lower than or equal to 1 (see Section 3.3), so that the denoising performance cannot be ascertained on non-normalized images. These two generalizability questions can be solved by a simple rescaling of the original inverse problem (1), to simultaneously fit the normalization constraint (an upper bound on the peak intensity of the image being accessible from the dirty image) and rescale the noise level in the data to the training noise level of some already available denoiser. Dabbech et al. (2022) and Wilber et al. (2022a) build from these considerations and propose both a "single denoiser" approach and a more flexible "denoiser shelf" approach, where denoisers are respectively trained at a single predefined noise level or on a limited number of such noise levels, to which the target inverse problems can be rescaled at will. It also appears from the above rescaling considerations that the dimension of σ is that of an inverse input image-domain peak signal-to-noise ratio rather than an absolute noise level. As the AIRI denoiser consistently categorizes σ-level values as noise to be removed, the input image-domain signal-to-noise ratio also naturally sets the target output peak signal-to-noise ratio, or dynamic range, of the reconstruction. In summary, our heuristic simply stipulates that the training dynamic range should be set equal to the target reconstruction dynamic range.

uSARA regularization parameter
We further note the very similar roles of the noise level σ of the regularization denoiser in PnP-FB and the regularization parameter λ in the FB algorithm (5) solving (3). At the algorithmic level, on the one hand, D is a denoiser and we have proposed to set σ to the estimate (17) of the effective image-domain noise level, or more generally to the inverse target reconstruction dynamic range value. On the other hand, for the traditional FB algorithm, λ is involved in (5) via the proximal operator of the regularization term. Focusing on the specific uSARA instance described in Section 2.3.2, and summarized in Algorithms 1 and 2, λ typically acts as a soft-thresholding parameter, thresholding out small values in the wavelet domain defined by the average sparsity dictionary. Setting its value to the standard deviation of the effective wavelet-domain noise, which according to Thouvenin et al. (2022a) is the same as in the image domain given the unit spectral norm of the average sparsity dictionary Ψ, yields the heuristic (18). We readily note that, in contrast with the reasoning leading to (17), the above reasoning for λ is specific to the FB structure underpinning uSARA. We also recall the parameter η > 0 in the definition of the average sparsity prior in (7), used to avoid reaching zero values in the argument of the logarithmic terms. This parameter represents a floor level for the wavelet coefficients (Carrillo et al. 2012; Thouvenin et al. 2022a), and should naturally be set to the same effective wavelet-domain noise level estimate as λ: η = λ. We further note that a variation of the proposed heuristic for the wavelet-domain noise level arises from a slightly different approximation in the transfer of the noise level from the image domain to the wavelet domain. Instead of assuming noise level conservation due to the normalization of Ψ, one can analyze the noise in each of the 9 bases of the dictionary separately, discarding the noise correlation structure between each pair of bases. In this case, the orthonormality of each basis and its normalization by 3 suggests a correction factor 1/3 in the heuristic values: λ = η = σ/3, with σ the heuristic noise level in (17). As will be shown in Section 4, relation (18) provides a useful reference to set λ in uSARA. Even though the 1/3 correction will be shown to bring a closer-to-optimal value, it is still not as precise an estimate as (17) is for the training noise level of AIRI denoisers. We emphasize that a significant body of work has been dedicated to the question of setting the regularization parameter in problems of the form (3), with no simple and accurate solution to our knowledge (Donoho & Johnstone 1995; Luisier et al. 2007; Vidal et al. 2020). In this context, the fact that (18) provides an appropriate reference value for λ in uSARA is an interesting result. However, the fact that PnP-FB admits a very accurate heuristic to set the training noise level, avoiding the requirement for fine-tuning an arbitrary regularization parameter, represents a major advantage of AIRI over uSARA.

State-of-the-art and proposed approach
Supervised training of DNNs for imaging requires the definition of a database of images, which remains a main difficulty in translating deep learning methods to RI imaging due to the absence of groundtruths. The few works that have leveraged deep learning techniques for RI imaging so far resorted to various approaches. In Terris et al. (2019), we used a high dynamic range simulated radio map from Bonaldi & Braun (2018) of size 32000 × 32000 containing mainly elliptical Gaussian sources, broken down into a database of 1100 images of size 512 × 512. Other examples include Gheller & Vazza (2021). In this work, we are targeting high dynamic range imaging of complex structure with diffuse and faint emission across the field of view. The database to be designed for the training of the regularization denoiser thus needs to reflect this target. We propose (i) to source publicly available optical astronomy images containing a rich combination of compact and diffuse emission, (ii) to preprocess these images, aiming to remove existing noise and artefacts via a preliminary denoising procedure, and (iii) to exponentiate their intensity values to create high dynamic range images. The procedure results in the creation of a rich and clean training database of 2235 training images of size 512 × 512. More technical details are provided below.

Building a rich low dynamic range database
Firstly, we constituted a set of 32 large grayscale images (u_i^raw)_{1≤i≤32} from optical astronomical intensity images available online, containing complex structure with both compact and diffuse emission. Each image is normalized to a maximum intensity value equal to 1, and when an image contained colour channels, those were averaged. The typical size of each of these images is 3500 × 3500. The sample standard deviation of the residual noise in image regions with no signal, averaged over the 32 images, reads σ0 ≈ 1/64, leading to a typical dynamic range of 1/σ0 ≈ 64, orders of magnitude below the values of interest in modern RI observations.
Secondly, the images are preprocessed with the aim of removing the residual noise, which includes various compression artefacts, and would otherwise be learned as part of the image model during training. To that effect, for all 1 ≤ i ≤ 32, a simple denoising problem is formulated for the recovery of a clean image u_i from u_i^raw: u_i^raw = u_i + e_i, assuming i.i.d. Gaussian random noise e_i with standard deviation σ_i. The denoising procedure applied consists in solving the optimization problem underpinning SARA in (6), with Φ = I and y = u_i^raw, resulting in a denoised database (u_i^low)_{1≤i≤32}. We note that this procedure does not affect the (low) dynamic range of the images. In a nutshell, noise and artefacts are removed, but the floor signal level and maximum intensity remain essentially unchanged.
Finally, the cleaned images are split to create a rich and clean database (u_i^low)_{1≤i≤I} with I = 2235 images of size 512 × 512 containing both compact and diffuse emission. When the image dimensions are not multiples of 512, a symmetric padding is used before splitting. The resulting images exhibit a whole distribution of maximum intensity values bounded by 1, and a corresponding distribution of dynamic range values, with 1/σ0 ≈ 64 representing the "nominal" dynamic range value of the database.
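The splitting step can be sketched as below. This is a minimal NumPy illustration (the function name and the small tile size used for demonstration are our own):

```python
import numpy as np

def split_with_padding(img, tile):
    """Split a 2D image into non-overlapping tile x tile patches,
    symmetrically padding the borders first when the dimensions
    are not multiples of the tile size."""
    h, w = img.shape
    pad_h = (-h) % tile
    pad_w = (-w) % tile
    padded = np.pad(img, ((0, pad_h), (0, pad_w)), mode="symmetric")
    ph, pw = padded.shape
    return [padded[i:i + tile, j:j + tile]
            for i in range(0, ph, tile)
            for j in range(0, pw, tile)]

img = np.arange(35.0).reshape(5, 7)
patches = split_with_padding(img, tile=4)
# the 5 x 7 image is padded to 8 x 8, yielding 4 patches of size 4 x 4
```

Symmetric padding avoids introducing artificial hard edges at patch borders, which would otherwise be learned as structure.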

Enhancing database dynamic range to target
In order to simulate the high dynamic range of interest in RI imaging, we propose to exponentiate pixel-wise the intensity values of all groundtruth images of the low dynamic range database. The exponentiation parameter will be set to achieve a final nominal dynamic range of the same order as the training dynamic range, which according to (17) is also equivalent to the target dynamic range of the reconstruction. Technically, the low dynamic range images u_i^low are exponentiated pixel-wise as u_i = (a^{u_i^low} − 1)/a, for some parameter a ≫ 1. Recalling that the maximum span of the pixel values in any u_i^low is [σ0, 1], the resulting maximum span of the exponentiated images is approximately [a^{-1}(a^{σ0} − 1), 1]. The exponentiation thus preserves the database normalization with maximum intensity value across images bounded by 1, while providing control on the final nominal dynamic range a(a^{σ0} − 1)^{-1}. Imposing this value to be equal to the training dynamic range 1/σ leads to the following equation for the scaling factor a: a^{σ0} = 1 + σa. This equation can be solved numerically to estimate the exponentiation factor a as a function of σ0 and σ. In summary, the high dynamic range groundtruth database (u_i)_{1≤i≤I} results from the low dynamic range groundtruth database (u_i^low)_{1≤i≤I} via (19), with exponentiation parameter a set in (20).
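For illustration, the exponentiation parameter can be obtained by a simple bisection. The sketch below is under our reading of (19)-(20), namely the condition a^{σ0} = 1 + σa, and is not the reference implementation; with σ0 = 1/64 and σ = 10^-4 it returns a value of the order of 10^3, consistent with the value a = 10^3 used later in the paper.

```python
import math

def solve_exponentiation_factor(sigma0, sigma, lo=2.0, hi=1e6, iters=200):
    """Solve a**sigma0 = 1 + sigma*a for the nontrivial root a >> 1 by
    bisection. sigma0 is the database noise floor, sigma the training
    noise level (inverse target dynamic range)."""
    g = lambda a: a**sigma0 - 1.0 - sigma * a  # g > 0 at lo, g < 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a = solve_exponentiation_factor(sigma0=1 / 64, sigma=1e-4)

def exponentiate(u_low, a):
    """Pixel-wise dynamic range enhancement of (19): u = (a**u_low - 1) / a."""
    return (a**u_low - 1.0) / a

# the exponentiated database spans a dynamic range close to 1/sigma = 1e4
ratio = exponentiate(1.0, a) / exponentiate(1 / 64, a)
```

The bracket [2, 10^6] excludes the trivial root of the equation just above a = 1, so the bisection converges to the large root of interest.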
Figure 1 summarizes the preprocessing pipeline for the creation of the low and high dynamic range databases. Notice the diversity of images in the training samples displayed in Figure 1(c).

Generating noisy & groundtruth pairs on-the-fly
The supervised training approach considered relies on image pairs: the groundtruth images (u_i)_{1≤i≤I} corresponding to the target outputs of the DNN, and the noisy images (z_i)_{1≤i≤I} corresponding to the DNN inputs. Given our choice, described in Section 3.2.1, to build a denoiser tailored to tackle i.i.d. Gaussian random noise of specific standard deviation σ, the noisy high dynamic range images read z_i = u_i + σw_i, with w_i ∼ N(0, I) and σ given by (17). Importantly, as the high dynamic range database of noisy and groundtruth pairs depends on the target dynamic range through (19) and (21), one version of the database would need to be computed and stored for each dynamic range of interest, which is a highly impractical and unnecessary approach. Instead, z_i and u_i can simply be seen as functions of u_i^low and computed on the fly during training.
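The on-the-fly generation of a training pair then amounts to two cheap operations per sample, sketched here in NumPy (function name and the uniform toy input are our own; (19) and (21) refer to the exponentiation and noise models described above):

```python
import numpy as np

def make_training_pair(u_low, a, sigma, rng):
    """Generate a (noisy, groundtruth) pair on the fly from a low dynamic
    range image: exponentiate as in (19), then add i.i.d. Gaussian noise
    of standard deviation sigma as in (21)."""
    u = (a**u_low - 1.0) / a                           # high dynamic range groundtruth
    z = u + sigma * rng.standard_normal(u_low.shape)   # noisy DNN input
    return z, u

rng = np.random.default_rng(0)
u_low = rng.uniform(1 / 64, 1.0, size=(46, 46))        # toy low dynamic range patch
z, u = make_training_pair(u_low, a=1e3, sigma=1e-4, rng=rng)
```

Because only u_low is stored, retargeting the dynamic range is a matter of changing the scalars a and sigma, with no new database to generate.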

Training loss function
The denoiser is trained by minimizing the loss (22), with z_i and u_i given in (21) and (19) respectively, as functions of u_i^low computed on the fly during training. The second term in (22) is the firm nonexpansiveness regularization term, with β > 0 the associated regularization parameter, and ε > 0 a safety margin parameter. The first term L is the standard part of the loss function, for which practical choices involve the ℓ2 loss, i.e. L(·) = (1/2)‖·‖_2^2, and the ℓ1 loss, i.e. L(·) = ‖·‖_1; both will be studied in Section 4.
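The nonexpansiveness term can be illustrated on a toy linear denoiser. The sketch below estimates the spectral norm of the Jacobian of Q = 2D − I by a power method using finite-difference Jacobian-vector products (in practice this is done with automatic differentiation), and forms a hinge-type penalty. The exact penalty form in (22) may differ; β·max(‖J‖ − (1 − ε), 0) is an assumed variant for illustration only.

```python
import numpy as np

# toy linear "denoiser": D(x) = A x, so the Jacobian of Q = 2D - I is 2A - I
A = np.diag([0.02, 0.5, 0.98])
D = lambda x: A @ x
Q = lambda x: 2.0 * D(x) - x

def jacobian_spectral_norm(Q, x, n_iter=20, h=1e-6, seed=0):
    """Power method on the Jacobian of Q at x, using finite-difference
    Jacobian-vector products (Q(x + h v) - Q(x)) / h."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    est = 0.0
    for _ in range(n_iter):
        jv = (Q(x + h * v) - Q(x)) / h
        est = np.linalg.norm(jv)
        v = jv / est
    return est

x = np.zeros(3)
norm = jacobian_spectral_norm(Q, x)          # -> 0.96 for this toy A
beta, eps = 1e-9, 5e-2
penalty = beta * max(norm - (1.0 - eps), 0)  # positive: Q exceeds the 1 - eps margin
```

Keeping ‖J_Q‖ below 1 (up to the margin ε) is what makes the denoiser firmly nonexpansive, the property underpinning the convergence of the PnP-FB iterations.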

Test images
Our test image database consists of four radio images, namely 3c353, Hercules A, Centaurus A, and Cygnus A, all resized to a size of 512 × 512, shown in Figure 2. These images all have a peak value equal to 1 and dynamic ranges above 10^4. The simulations involving these images will target reconstruction dynamic ranges of the order of 10^4. We emphasise that this is orders of magnitude higher than typical values in the PnP literature (Zhang et al. 2017, 2021; Pesquet et al. 2021), and therefore significantly more challenging.
All simulations involving uSARA rely on a value for η set using (18) including the 1/3 correction factor, while the values of λ are fine-tuned around (18) to optimize the reconstruction quality. As for SARA, we rely on the prescriptions of Thouvenin et al. (2022a) for setting its hyperparameters. While all other methods use natural weighting of the data, CLEAN makes use of uniform weighting, which relies on the density of the Fourier sampling for enhanced resolution (see Section 2.1 and Appendix A). We underline that the CLEAN restored image is in units of Jansky per beam area, in contrast with the SARA-based methods, AIRI, and the UNet, where the associated estimated images are in units of Jansky per pixel. Therefore, in what follows, CLEAN images are rescaled by the area of the CLEAN beam, for visualization purposes. Finally, we note that we always show the dirty image computed with a uniform weighting scheme, and rescaled to the [0, 1] range for visualization purposes.
Regarding the end-to-end deep learning approach, we use a Deep Residual UNet from Zhang et al. (2021) and train it to reconstruct a target image in (1) from its dirty version in (2). We generate a database (u_i, Re{Φ_i† y_i})_{1≤i≤I}, where for every i, y_i represents the visibility vector computed from the groundtruth u_i for a measurement operator Φ_i = G_i F Z generated from a random pointing direction in the sky. The randomization ensures that the UNet training database contains a rich and diverse set of sampling patterns. We borrow the procedure detailed in Section 3.3.3 to enhance the nominal dynamic range of the groundtruth database (u_i)_{1≤i≤I}, and equate it to the target dynamic range of reconstruction. The UNet is then trained to minimize the ℓ1 loss between the groundtruth u_i and the reconstruction UNet(Re{Φ_i† y_i}). The UNet is trained for 800 epochs with the Adam algorithm.

Evaluation metrics
For AIRI-ℓ2, AIRI-ℓ1, uSARA, SARA, and the UNet, the image estimates x̄ are the reconstructed model images, while the image estimate considered for CLEAN is the restored image. The performance of all algorithms is assessed qualitatively via inspection of x̄, alongside the residual images, defined as r = κΦ†(y − Φx̄), where κ is a normalization factor. Given the high dynamic ranges of interest, the estimated images are shown in logarithmic scale, using the mapping rlog: x ↦ log10(10^3 x + 1)/3. The residual images are displayed in linear scale.
We also consider two quantitative evaluation metrics, namely the SNR and logSNR of the estimated images x̄ with respect to the groundtruth x. The standard SNR metric (in dB) is defined as SNR(x̄, x) = 20 log10(‖x‖/‖x̄ − x‖). It simply corresponds to a logarithmic version of the norm of the error image x̄ − x. The logSNR (in dB) is the SNR evaluated on a logarithmic version of the intensity images. This metric is introduced given the high dynamic ranges of interest, and assigns more relative weight to low intensities than the SNR. Our definition is logSNR(x̄, x) = SNR(rlog(x̄), rlog(x)).
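These metrics translate directly into code; a minimal NumPy version following the definitions above:

```python
import numpy as np

def rlog(x):
    """Logarithmic intensity mapping used for display and for logSNR."""
    return np.log10(1e3 * x + 1.0) / 3.0

def snr(x_hat, x):
    """SNR in dB: 20 log10(||x|| / ||x_hat - x||)."""
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x_hat - x))

def logsnr(x_hat, x):
    """SNR evaluated on the rlog-mapped images, emphasizing faint emission."""
    return snr(rlog(x_hat), rlog(x))

x = np.ones(16)
x_hat = x + 0.01  # uniform 1 per cent error
# snr(x_hat, x) -> 40 dB, since ||x|| / ||x_hat - x|| = 100
```

Note that rlog maps the unit peak to approximately 1, so logSNR compares images on a compressed scale where faint structure carries comparable weight to bright sources.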

DNN architecture & training particulars
Firstly, as highlighted already, the denoising losses that we use for L in our training loss (22) are chosen to be either the ℓ2 loss or the ℓ1 loss, giving rise to AIRI-ℓ2 and AIRI-ℓ1 respectively.
Secondly, we choose a basic DNN architecture for the denoiser D, in the form of a modified DnCNN (Zhang et al. 2017) where the batch normalization layers have been removed, as in Pesquet et al. (2021). In this work, however, a ReLU has been added as the last layer. This is a simple way to enforce positivity at the output of the denoiser, and therefore of the final reconstruction, which is necessary as the developed AIRI algorithm targets intensity (as opposed to polarization) imaging (see Figure 3).
Thirdly, the DNN is trained for 10^5 epochs with the Adam algorithm (Kingma & Ba 2015) and learning rate 10^-4, the learning rate being divided by 2 every 2 × 10^4 epochs. The training pairs (z_i, u_i) in (22) are of size 46 × 46, generated according to (21) and (19) from patches randomly extracted from the low dynamic range database of 512 × 512 images (u_i^low)_{1≤i≤I}, itself augmented with random rotations and zooms. The Jacobian regularization in (22) is computed with a power method, with the number of iterations increased to 10 in the later stages of the training to improve the precision of the computation. As observed in Pesquet et al. (2021), starting the optimization of the network from a pretrained state with β = 0 in (22) enhances the results significantly.
Finally, we have observed that performing, at each iteration of the PnP algorithm, random flips and 90-degree rotations of the image before applying the DNN, and inverting the transform after the denoising step, can significantly boost the reconstruction quality. We therefore adopt this strategy, inspired by Zhang et al. (2021).
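This randomized trick can be sketched as follows. Here `denoiser` is any image-to-image operator (a placeholder), and the use of the symmetries of the square is our reading of "random flips and 90-degree rotations":

```python
import numpy as np

def equivariant_denoise(denoiser, x, rng):
    """Apply a random square symmetry before the denoiser and invert it
    after, so that the denoiser sees a randomly re-oriented image at
    each PnP iteration."""
    k = int(rng.integers(4))      # number of 90-degree rotations
    flip = bool(rng.integers(2))  # horizontal flip or not
    t = np.rot90(x, k)
    if flip:
        t = np.fliplr(t)
    t = denoiser(t)
    if flip:                      # invert the transform, in reverse order
        t = np.fliplr(t)
    return np.rot90(t, -k)

rng = np.random.default_rng(3)
x = np.arange(16.0).reshape(4, 4)
identity = lambda im: im
# with an identity "denoiser", the wrapper must return x unchanged
y = equivariant_denoise(identity, x, rng)
```

Since the image prior is largely orientation-agnostic, averaging the denoiser's behaviour over orientations across iterations mitigates any directional bias learned during training.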

Simulation setup
The PnP approach suggests that the best denoisers should provide the best regularizers. In this first experiment, we compare the denoising capabilities of the AIRI-ℓ2 and AIRI-ℓ1 denoisers on one side, and the average sparsity proximal operator propelling uSARA and SARA on the other side. To this end, we focus on a denoising problem y = x + e, where y ∈ R^N is the data, x ∈ R^N is the groundtruth image, and e ∈ R^N is the realization of an i.i.d. Gaussian random noise with zero mean and standard deviation τ. In other words, (23) is a degenerate case of problem (1) with Φ = I. We consider the 3c353 image as groundtruth x. As the following experiments for the full validation of AIRI-ℓ2 and AIRI-ℓ1 operate at a target dynamic range around 10^4, we design this preliminary pure denoising experiment with a similarly challenging target dynamic range, and set the observation noise level to τ = 10^-4. Firstly, problem (23) can be solved using AIRI via Algorithm 3. As the image peaks at 1, no rescaling of the inverse problem in (1) is needed (see Section 3.2). In this simple case, the gradient in (13) reads ∇f(x) = x − y, with Lipschitz constant in (14) L = 1. Taking γ = 1/L = 1, Algorithm 3 trivially boils down to a simple noniterative application of D to the data, leading to the denoised image x̄ = D(y).
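The PnP-FB structure referred to as Algorithm 3 can be sketched generically as below. This is a simplified illustration (no convergence criterion, and a placeholder soft-thresholding "denoiser" standing in for the trained DNN); for Φ = I and γ = 1 the first iteration already returns D(y):

```python
import numpy as np

def pnp_fb(y, forward, adjoint, denoiser, gamma, n_iter):
    """Plug-and-play forward-backward: alternate a gradient step on the
    data-fidelity term f(x) = ||forward(x) - y||^2 / 2 with a denoising step."""
    x = adjoint(y)                      # simple initialization from the data
    for _ in range(n_iter):
        grad = adjoint(forward(x) - y)  # (forward) gradient of f at x
        x = denoiser(x - gamma * grad)  # (backward) denoising step
    return x

# degenerate denoising case: forward = adjoint = identity, gamma = 1
y = np.array([0.0, 0.5, 1.0])
soft = lambda x: np.sign(x) * np.maximum(np.abs(x) - 1e-4, 0)  # placeholder D
identity = lambda x: x
x_hat = pnp_fb(y, identity, identity, soft, gamma=1.0, n_iter=1)
# one iteration yields exactly D(y)
```

Replacing `identity` by a realistic measurement operator and its adjoint recovers the general iterative structure, with the per-iteration cost split between the data-fidelity gradient and the denoiser.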
The AIRI-ℓ2 and AIRI-ℓ1 denoisers were trained with a noise level σ = τ = 10^-4 as per (17), on the proposed synthetic training database, and following the procedure explained in Section 3.4. Following the procedure detailed in Section 3.3.3, the chosen exponentiation parameter for on-the-fly dynamic range enhancement of the database is a = 10^3. Furthermore, we set the regularization parameter in the training loss (22) to β = 10^-9 and β = 10^-5 for the AIRI-ℓ2 and AIRI-ℓ1 denoisers respectively. The safety margin parameter is set to ε = 5 × 10^-2. These values are chosen here as they are the ones ensuring the stability of Algorithm 3 in our subsequent experiments. Secondly, problem (23) can also be solved in a proximal approach with uSARA, setting Φ = I. Interestingly, in this case, the uSARA objective boils down to the objective defining the average sparsity proximal operator (see definition (4)) for the average sparsity prior in (7), with λ set following the reasoning around equation (18). Therefore, applying Algorithm 1 results in computing the denoised image by simply applying the average sparsity proximal operator with appropriate λ. We note, given ∇f(x) = x − y, L = 1, and the stepsize choice γ = 1/L = 1, that Algorithm 1 simplifies to the sequential application of the proximal operator to y, with only the weights being updated at each iteration. The algorithm was run with convergence criteria set to 6 × 10^-6 in Algorithm 1, and 10^-4 in Algorithm 2.
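For intuition on the proximal counterpart, the sketch below applies soft-thresholding to the detail coefficients of a single-level orthonormal Haar decomposition. This is only a one-basis, non-iterative stand-in for the full average sparsity proximal operator of SARA/uSARA (which involves nine bases, reweighting, and sub-iterations):

```python
import numpy as np

def haar2(x):
    """Single-level 2D orthonormal Haar transform (even image size assumed)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def ihaar2(ll, lh, hl, hh):
    """Inverse of haar2."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w))
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def soft(w, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def wavelet_denoise(y, lam):
    """Threshold the detail coefficients at level lam, the role played by
    the soft-thresholding parameter in uSARA."""
    ll, lh, hl, hh = haar2(y)
    return ihaar2(ll, soft(lh, lam), soft(hl, lam), soft(hh, lam))
```

The contrast with the learned denoiser is then direct: both operators map a noisy image to a cleaner one, but one encodes a handcrafted sparsity prior while the other encodes a prior learned from the training database.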

Experimental results
Denoising results are displayed in Figure 4. All three denoisers appear to be highly effective, both in terms of SNR and logSNR and in visual denoising quality. The SNR values are very comparable. In terms of logSNR, the AIRI-ℓ1 denoiser achieves better performance than the AIRI-ℓ2 denoiser, itself superior to the average sparsity proximal operator. Visually, the AIRI-ℓ1 denoiser also achieves the reconstruction with the best resolution and the least amount of artefacts (see zooms).
These results suggest that the AIRI-ℓ1 and AIRI-ℓ2 denoisers do encapsulate a superior prior model to the average sparsity model encapsulated in the corresponding proximal operator.

Simulation setup
Using our four 512 × 512 test images (see Figure 2), we study the impact of the noise level σ involved in the training of the AIRI-ℓ2 and AIRI-ℓ1 denoisers according to the procedure set in Section 3, with the aim of validating the heuristic (17). We also study the optimal value of the regularization parameter λ of uSARA around the heuristic (18). The performance of AIRI-ℓ2 and AIRI-ℓ1 in terms of reconstruction quality is compared to that of uSARA.
We simulate RI observations utilizing simulated uv-coverages of the radio telescope MeerKAT. The uv-patterns correspond to five randomly selected pointing directions, displayed in Figures 5 and 6c for different total observation durations Δ. The data size thus increases linearly with Δ. Data are simulated following the model described in (1), with input SNR set to iSNR = 30 dB. We finally note that the uv-patterns extend to the edge of the considered Fourier domain, as shown in Figures 5 and 6. In other words, we target mild, if any, super-resolution with respect to the nominal resolution of the observation, as set by the largest baseline.
Since our groundtruth images have peak values at 1, no rescaling of the inverse problem in (1) is needed for AIRI, and a common heuristic value is reached for σ in (17) and λ in (18). Obviously, σ and λ are functions of Φ, and therefore of both the pointing direction and the total duration of observation. However, the heuristic values are within a 25% variation range across pointing directions for each Δ. For simplicity, a single heuristic value, denoted σ_Δ, is considered for all five patterns associated with the same Δ, taken to be the average of the values resulting for each pointing direction. The two resulting values are, as announced, associated with a target dynamic range around 10^4: σ_4 = 1.4 × 10^-4 for Δ = 4 h, and σ_8 = 9.3 × 10^-5 for Δ = 8 h. The values probed for σ and λ are sampled at factors of √2 around the heuristic, and by steps of 2 thereafter. Following the study by Pesquet et al. (2021), we choose the stepsize γ = 1.98/L in Algorithms 1 and 3, which saturates the theoretical bound. The convergence criteria of Algorithms 1 and 3 are both set to 5 × 10^-6, with a maximum number of iterations of 6 × 10^3, and the convergence criterion of Algorithm 2 is set to 10^-5.
We train our AIRI-ℓ2 and AIRI-ℓ1 denoisers with optimized values β = 10^-9 and β = 10^-5 respectively in the training loss (22), with ε = 5 × 10^-2, and observe that this ensures the stability of Algorithm 3 despite a spectral norm of the Jacobian slightly above 1. Following the procedure detailed in Section 3.3.3, the exponentiation parameter for on-the-fly dynamic range enhancement of the database is set to a = 10^3.

Experimental results
Considering the image of 3c353 as groundtruth and the uv-pattern from Figure 5a (Δ = 4 h) for simulating the measurements, reconstructions for different values of σ in AIRI-ℓ2 and AIRI-ℓ1, and of λ in uSARA, are displayed in Figures 7 and 8. More precisely, model images, displayed in logarithmic scale, and their associated SNR and logSNR values are provided in Figure 7, and residual images are displayed in Figure 8 in linear scale. We firstly notice that σ and λ play a similar role in the reconstruction quality: the higher their value, the smoother the recovered image; the lower their value, the stronger the artefacts. Overall, reconstructions with uSARA exhibit some wavelet artefacts, particularly noticeable around bright compact sources overlaying faint extended emission. AIRI-ℓ2 reconstructions typically exhibit fewer artefacts, but appear to be generally smoother. AIRI-ℓ1 reconstructions contain even fewer artefacts, with an achieved resolution similar to uSARA. We conclude these observations by highlighting the low-amplitude residual images obtained with low values of σ and λ. However, such results are not a token of a good quality reconstruction. They instead reflect the over-fitting of the data in such settings.
Figure 9 shows the reconstruction SNR and logSNR graphs obtained when varying σ and λ for reconstructions with AIRI and uSARA respectively, for both total observation durations Δ = 4 h and Δ = 8 h. Each point on each of the four graphs gives the average value and 95% confidence interval over the 20 reconstructions arising when considering the four test images in Figure 2 and the five different pointing directions from Figures 5 and 6c.
Firstly, for both AIRI-ℓ2 and AIRI-ℓ1, we observe that the SNR and logSNR metrics clearly peak at the proposed heuristic (17) (shown as the dashed vertical black line) for both Δ = 4 h and Δ = 8 h, suggesting that equating the training dynamic range to the target dynamic range is an accurate procedure. This result is also confirmed by the visual analysis in Figure 7. For uSARA, the SNR and logSNR results and the visual analysis in Figure 7 suggest that the relation (18) is a useful reference, but rather provides an upper bound on the optimal value of the soft-thresholding parameter, which is roughly 6 to 11 times smaller when looked at from an SNR standpoint, or 3 to 6 times smaller from a logSNR perspective. In general, we find that the logSNR provides better coherence with the visual analysis, suggesting an optimal soft-thresholding value 3 to 6 times below the reference. Interestingly, the 1/3 correction factor brings a closer-to-optimal value, but does not resolve the discrepancy between the SNR and logSNR metrics. In contrast to AIRI-ℓ2 and AIRI-ℓ1, uSARA thus exhibits a regularization parameter whose value cannot be set both automatically and optimally. This represents a significant advantage of AIRI over uSARA.
Secondly, considering each method at its best, SNR and logSNR values for both Δ = 4 h and Δ = 8 h confirm a superior performance of AIRI-ℓ1 over both AIRI-ℓ2 and uSARA, the latter two achieving similar reconstruction qualities, with virtually the same logSNR, and a slight SNR advantage for AIRI-ℓ2.
Thirdly, these results are in line with the visual analysis when considering each method at its best (see Figures 7d, 7i, 7m, and Figures 8b, 8g, 8k). AIRI-ℓ2 and uSARA exhibit similar reconstruction qualities, with the former smoother but with fewer artefacts than the latter. AIRI-ℓ1 exhibits fewer artefacts than AIRI-ℓ2, while preserving the resolution offered by uSARA. AIRI-ℓ1 provides the best residuals; uSARA residuals clearly contain sources discarded by the model and relating to the artefacts, while AIRI-ℓ2's higher residuals correlate with its sub-optimal resolution.
Fourthly, we recall that AIRI-ℓ2, AIRI-ℓ1, and uSARA are underpinned by the same FB structure, and only differ by their choice of denoiser. From this perspective, the results of the current experiment are in line with those of the previous experiment, with a better regularization denoiser naturally leading to a better overall reconstruction.
Finally, we observe that the confidence intervals around the mean SNR and logSNR values in Figure 9 are rather small for both AIRI-ℓ2 and AIRI-ℓ1, and similar to those of uSARA. This is a first illustration that, in the PnP approach, there is no issue of generalizability to measurement conditions not considered during training since, by construction, the denoisers are blind to the measurement setting (with regard to the noise level, see the discussion around (17)), in contrast to end-to-end networks.

Simulation setup
In this section, we validate AIRI-ℓ2 and AIRI-ℓ1 more extensively, both in reconstruction quality and computational cost, across a wider range of observation durations, and in comparison with uSARA, SARA, CLEAN, as well as UNets trained in an end-to-end fashion. We consider the same four groundtruth images and experimental setup as for the previous experiment (see Section 4.3.1), with the same set of five pointing directions (see uv-coverages in Figures 5 and 6c). In order to further assess robustness to varying measurement conditions, the set of observation durations is enlarged to Δ ∈ {1 h, 2 h, 4 h, 8 h} (see Figure 6). We note that a different UNet is trained for each observation duration to alleviate the challenge of generalizability with respect to the sampling pattern. The exponentiation parameter for the UNet training database, resulting from the procedure in Section 3.3.3, is a = 10^3.
We recall that our groundtruth images have peak values at 1, yielding the same heuristic value for σ and λ. These values are still within a 25% variation range across pointing directions for all four Δ values, and a single heuristic value is considered for all five uv-patterns associated with the same Δ. The resulting four average values σ_Δ are: σ_1 = 2.9 × 10^-4, σ_2 = 2.2 × 10^-4, σ_4 = 1.4 × 10^-4, and σ_8 = 9.3 × 10^-5. Given the results of the previous experiment, the values of σ for AIRI-ℓ2 and AIRI-ℓ1 are taken exactly at the heuristic, while the value of λ for uSARA is fine-tuned manually to optimize the logSNR of each reconstruction, again resulting in soft-thresholding values ranging between 3 and 6 times below the heuristic, depending on Δ. Moreover, setting β = 10^-9 (resp. β = 10^-5) for the AIRI-ℓ2 (resp. AIRI-ℓ1) denoiser in (22), with ε = 5 × 10^-2, ensured the stability of Algorithm 3. Following the procedure detailed in Section 3.3.3, the exponentiation parameter for on-the-fly dynamic range enhancement of the database is a = 10^3.
At the level of Algorithms 1, 2, and 3, the stepsize γ, convergence criteria, and maximum number of iterations are set to the same values as in the previous experiment.
Finally, AIRI-ℓ2, AIRI-ℓ1, uSARA, and SARA are run in Matlab, while CLEAN is implemented in the highly optimized C++ WSClean package, and the UNets are implemented in Python using PyTorch. The uSARA, SARA, and CLEAN implementations utilize 10 CPU cores (Intel Xeon E5-2695, 2.1 GHz). The parallel PDFB algorithmic structure underpinning SARA (see Section 2.3.1) optimizes the distribution of the operations involved at each iteration across those 10 CPU cores: 1 for the data-fidelity term, and 1 for each of the 9 bases of the average sparsity dictionary Ψ (Onose et al. 2016). For uSARA, the average sparsity dictionary is parallelized via a faceting procedure using 9 facets (Průša 2012; Thouvenin et al. 2022a). AIRI-ℓ2 and AIRI-ℓ1 utilize 10 CPU cores and 1 GPU (NVIDIA Tesla V100), the latter for the application of the denoiser, as commonly done with DNNs in deep learning. The DNN can be run on CPU, but this slows down the reconstructions significantly, as will be evident from the experimental results. Implementations of uSARA, SARA, and CLEAN leveraging GPUs are not investigated here. The UNets are run utilizing 1 GPU (NVIDIA Tesla V100).

Reconstruction quality
Figure 10 shows the average reconstruction SNR and logSNR as a function of the observation duration Δ. Each point, on each of the two graphs, represents an average over the 20 reconstructions arising when considering the four test images in Figure 2 and the five pointing directions, while error bars give the 95% confidence intervals. As expected, the reconstruction SNR increases with Δ, as the data size increases with Δ. The results extend the conclusion of the previous experiment of a superior performance of AIRI-ℓ1 over both AIRI-ℓ2 and uSARA across Δ ∈ {1 h, 2 h, 4 h, 8 h}. In SNR, AIRI-ℓ1 and AIRI-ℓ2 are respectively around 3 dB and 2 dB above uSARA. In logSNR, AIRI-ℓ1 is between 1 dB and 2 dB above both AIRI-ℓ2 and uSARA. The comparison with SARA reveals that AIRI-ℓ1 is on par with SARA in SNR and slightly superior in logSNR across the range of measurement settings considered. As previously, we observe that the confidence intervals around the mean SNR and logSNR values for both AIRI-ℓ2 and AIRI-ℓ1 are similar to those of uSARA and SARA, reflecting the robustness of the proposed method. The reconstruction quality of both CLEAN and the UNets trained end-to-end is much inferior to that of the AIRI and SARA approaches, for both SNR and logSNR and all measurement conditions considered. Interestingly, notice that the UNets significantly improve over CLEAN in linear scale, but perform significantly worse in logarithmic scale, which confirms the difficulty of the UNets in recovering very faint emission. This phenomenon was also noticed during training.
We recall that SARA and uSARA leverage the same prior model, encapsulated in the average sparsity proximal operator. However, SARA relies on a constrained data-fidelity term (see objective (6)), while uSARA uses an unconstrained formulation enforcing data fidelity less abruptly, in the sense that it imposes a soft penalty rather than a hard constraint (see objective (11)). A by-product of our analysis is to confirm the superiority of the former over the latter.
Figure 11 shows the reconstructed Hercules A images for the considered methods and the uv-coverages shown in Figures 6a, 6b, and 6c. The visual analysis confirms the quantitative results from Figure 10. We acknowledge that the AIRI-ℓ2 and AIRI-ℓ1 algorithms can create artefacts by over-emphasising small structures (see the red zoom box in Figure 11o). SARA exhibits very similar quality to AIRI-ℓ1, but not always with the same resolution-to-artefacts tradeoff. The images suggest that the poor reconstruction of CLEAN stems from a strong offset in intensity values (see in particular the difference between the estimated higher intensity values in the upper rectangular zoom from Figures 11e, 11k, and 11q, and that of the groundtruth). Also note the important residual noise hiding low intensity features of the image, as well as remaining artefacts for Δ = 1 h and Δ = 2 h. This figure also illustrates clearly the strengths and shortcomings of the UNets trained in an end-to-end manner. The relatively good SNR and low logSNR, together with the visual results, confirm that the UNets manage to recover quite accurately the high intensity emission but struggle to recover faint emission. Visually, the results of the UNets present some hallucinated point-like sources (Muckley et al. 2021), which we believe is the consequence of our synthetic training database containing a large number of such sources.

[Figure 10 caption fragment: ... the five pointing directions from Figures 5 and 6c. For AIRI-ℓ2 and AIRI-ℓ1, σ is chosen at the heuristic. For uSARA, λ is fine-tuned manually to optimize the logSNR, resulting in values between 3 and 6 times below the heuristic (in line with Figure 9). Error bars show the 95% confidence interval.]

Computational cost and reconstruction times
The computational costs of the various approaches considered are reported in Figure 12. Each point on the graphs is an average over the 20 simulated observations built from the four different groundtruths from Figure 2 and the five different uv-patterns shown in Figures 5 and 6c. Error bars show the 95% confidence intervals.
Figure 12a displays the average time per iteration, independently for each of the forward and backward steps of both uSARA and AIRI, as a function of the observation duration. We recall once more that uSARA relies on the same algorithmic structure as AIRI-ℓ2 and AIRI-ℓ1. In both cases, the underpinning FB algorithm alternates between a (forward) gradient-descent step enforcing data fidelity and a (backward) regularization step enforcing a prior model, via the application of either the average sparsity proximal operator (uSARA, Algorithm 1) or a learned denoiser (AIRI, Algorithm 3). We note that, in such an algorithmic structure, the computational cost of the regularization step scales linearly with the image size, while that of the gradient-descent step scales linearly with the data size. Firstly, we observe from the figure that the computation time of the gradient step naturally scales linearly with the observation duration, which is indeed proportional to the data size. Secondly, we acknowledge that the computation of the proximal operator within uSARA is slow, owing to its sub-iterative nature. For the image and data sizes considered in our simulations, and given our CPU implementation, it always dominates the computation time of the gradient, on average per iteration of Algorithm 1. This conclusion holds despite the fact that we have carefully tuned the external parameters of Algorithm 1 and Algorithm 2 to minimize the reconstruction time (the stepsize in Algorithm 1, and the tolerance criterion and dual variable v0 in Algorithm 2). In that regard, we note that an efficient initialization for the dual variable v0 consists in setting it to the dual variable from the previous call of Algorithm 2. With such an initialization, the number of sub-iterations of Algorithm 2 progressively decreases with the iteration count in Algorithm 1.
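To make the shared forward-backward structure concrete, the following minimal sketch alternates the two steps on a toy inverse problem. A small random matrix stands in for the measurement operator, and a hypothetical soft-thresholding operator stands in for the regularization step (the uSARA proximal operator or the AIRI denoiser); none of these specific choices are from the paper.

```python
import numpy as np

def pnp_forward_backward(y, Phi, denoiser, gamma, n_iter=200):
    """FB structure shared by uSARA (Algorithm 1) and AIRI (Algorithm 3):
    alternate a gradient-descent data-fidelity (forward) step with a
    regularization (backward) step implemented by `denoiser`."""
    x = Phi.T @ y  # dirty-image-like initialization
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ x - y)    # forward step: data fidelity
        x = denoiser(x - gamma * grad)  # backward step: prior model
    return x

# Toy, ill-posed problem: 30 measurements of a 60-dimensional sparse signal.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((30, 60)) / np.sqrt(30)
x_true = np.zeros(60)
x_true[[3, 17, 42]] = [1.0, -0.5, 2.0]
y = Phi @ x_true

# Hypothetical regularizer: soft-thresholding (prox of a scaled l1 norm).
soft = lambda z, t=0.01: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
gamma = 1.0 / np.linalg.norm(Phi, 2) ** 2  # stepsize within 0 < gamma < 2/L
x_rec = pnp_forward_backward(y, Phi, soft, gamma)
```

Swapping `soft` for a learned denoiser, with no other change to the loop, is exactly the plug-and-play substitution the text describes.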
Thirdly, using a standard GPU implementation of the AIRI-ℓ2 and AIRI-ℓ1 denoisers, their application is two orders of magnitude faster than the computation of the proximal operator. A CPU implementation of the denoisers makes them even slower than the proximal operator in uSARA. For the image and data sizes at stake, the AIRI denoisers on GPU are also significantly faster than the computation of the gradient, on average per iteration of Algorithm 3.
Figure 12b displays the total reconstruction times for uSARA and AIRI as a function of the observation duration, in comparison with SARA and CLEAN. The AIRI-ℓ2 and AIRI-ℓ1 reconstruction times with the denoiser on GPU scale almost linearly with the observation duration, due to the domination of the gradient step. For uSARA, and for AIRI-ℓ2 and AIRI-ℓ1 with the denoiser on CPU, the dependency on observation duration is less severe, given the domination of the regularization step at each iteration. Overall, given the speed-up in the regularization step, the total reconstruction time of AIRI with the denoiser on GPU is significantly (5 to 10 times) lower than that of uSARA. AIRI-ℓ2 and AIRI-ℓ1 with the denoiser on CPU are much slower, in fact slightly slower than uSARA itself. SARA offers reconstruction times between those of AIRI and uSARA. CLEAN remains significantly faster than AIRI (4 to 15 times), but the fastest method of all is by far the UNet trained end-to-end. We underline that this is to be expected since not only were the UNets run on GPU, but reconstruction with a UNet requires only a single inference step, as opposed to the other methods, which are iterative.
To summarize, the application of an AIRI denoiser (on GPU) is orders of magnitude faster than that of the average sparsity proximal operator of uSARA (on CPU). Substituting the learned denoiser for the proximal operator not only provides superior imaging quality, but also significantly reduces the computational cost of the regularization step. For image and data sizes where the uSARA regularization step dominates the gradient-descent data-fidelity step, this cost reduction, obtained by substituting a DNN denoiser for the proximal operator, leads to a significant speed-up of the whole algorithm.

Figure 12. AIRI-ℓ2 and AIRI-ℓ1 exhibit identical behaviours; only the AIRI-ℓ2 results are reported and referred to as AIRI on the graphs. Each point is an average over the 20 simulated observations. Error bars show the (very small and virtually invisible) 95% confidence interval. For AIRI-ℓ2 and AIRI-ℓ1, all reconstruction times are reported for both GPU (solid red lines) and CPU (dashed red lines) implementations of D. Left: average time per iteration in seconds (s.) for the forward and backward operators involved in Algorithm 1 (gradient and proximal operator) and Algorithm 3 (gradient and denoiser D), as a function of the observation duration. Right: average total reconstruction times for AIRI-ℓ2, AIRI-ℓ1, uSARA, SARA, CLEAN, and the end-to-end UNets, as a function of the observation duration.

Conclusion for both AIRI & uSARA
In this work, we have proposed a new algorithmic framework for scalable precision RI imaging, dubbed AIRI. The approach consists in encapsulating a prior image model in a DNN denoiser, and plugging it in lieu of the proximal regularization operator of an optimization algorithm of the SARA family for image reconstruction. AIRI inherits the robustness and interpretability of optimization approaches and the learning power and speed of networks, while avoiding the generalizability limitations of end-to-end networks, whose training involves the details of the measurement model. More specifically, we have firstly designed a realistic low dynamic range training database from optical images. We have also established a method to train firmly nonexpansive denoisers at a noise level inferred from the target dynamic range of the reconstruction, with either an ℓ2 or ℓ1 loss, including a procedure for on-the-fly database dynamic range enhancement. Finally, the AIRI-ℓ2 and AIRI-ℓ1 algorithms resulting from plugging the denoisers into the FB algorithm were validated in simulation for the reconstruction of images involving complex structure with diffuse and faint emission across the field of view (at a dynamic range of 10^4). The benchmark algorithms included CLEAN, the state-of-the-art SARA algorithm, as well as its new unconstrained version uSARA, which differs from AIRI-ℓ2 and AIRI-ℓ1 only by the use of the average sparsity proximal regularization operator instead of a learned denoiser in the FB algorithm.
Our results show that the first incarnations of AIRI are already competitive with, if not superior to, the advanced optimization algorithms of the SARA family in imaging quality, well beyond the capabilities of CLEAN and of the UNets trained in an end-to-end fashion. In particular, while the AIRI denoisers are trained on the same database as the UNets, the PnP solutions do not exhibit hallucinated artefacts, which illustrates the robustness of AIRI compared to the end-to-end UNets. With regards to computation time, the AIRI approach was shown to provide a significant acceleration potential over uSARA and SARA, thanks to the inference speed of DNN denoisers over the average sparsity proximal operator, but remains significantly slower than CLEAN, itself orders of magnitude slower than the UNets. The results also confirm the validity of our heuristic for setting the training noise level, which consists in enforcing equality of the training and target dynamic ranges of the reconstruction, and the validity of the database exponentiation approach, which consists in enforcing equality of the final nominal dynamic range of the database and the target dynamic range of the reconstruction. This provides a significant advantage over uSARA, whose regularization parameter requires manual fine-tuning, although the fact that the corresponding heuristic (and its 1/3-corrected version) provides an appropriate reference upper bound on the optimal value is an interesting result in itself.

Current limitations & future work
In what follows, we discuss current limitations of the AIRI framework, and future work.This also includes considerations for further developments of uSARA and end-to-end DNNs.
Firstly, the modified DnCNN denoiser architecture used, the ℓ2 and ℓ1 losses considered, as well as the set of optical images used to build the training database, constitute the most basic choices leading to our first AIRI-ℓ2 and AIRI-ℓ1 incarnations of the framework. We anticipate that future work building a richer database from RI observations, and considering more advanced losses, such as adversarial losses (Wang et al. 2018), and more advanced network architectures, such as UNets (Hurault et al. 2022), can provide a quantum jump in imaging precision with AIRI. We also acknowledge that our approach to enforcing the firm nonexpansiveness of the denoiser is only approximate, and relies on fine-tuning the regularization weight in the training loss. Simpler and more precise approaches to ensuring firm nonexpansiveness should be investigated in the future (e.g. leveraging 1-Lipschitz layers). We note, however, that in the context of our experiments the proposed approach has, in practice, provided a robust way to ensure convergence of the resulting PnP algorithms.
Secondly, fully optimized and parallelized implementations of the AIRI algorithms, but also of SARA and uSARA, should be investigated, leveraging GPUs not only for the DNN denoisers, but also for the implementation of the linear (measurement and regularization) operators involved. This should lead to further acceleration of these approaches, and possibly help close the gap with CLEAN in terms of reconstruction time. We also note that accelerated versions of uSARA are achievable, in particular leveraging preconditioning strategies (Repetti & Wiaux 2021).
Thirdly, with regards to the details of the target imaging modality, we have investigated monochromatic intensity imaging on small fields of view only, with a mild, if any, super-resolution factor with respect to the nominal resolution of the observation, as set by the largest baseline. Extensions of the framework to further super-resolution functionality and to wideband polarization imaging should be contemplated. Also, a by-product of the present analysis is to confirm the superiority of SARA over uSARA, due to the constrained approach to data fidelity in SARA, versus the unconstrained approach underpinning uSARA. In this context, the superior results of AIRI over uSARA are entirely related to the capability of learned DNN denoisers to encapsulate a better prior model than the average sparsity model. On the one hand, a first possible enhancement of the AIRI approach would consist in developing a PnP version of SARA (as opposed to uSARA), simultaneously taking advantage of networks to learn prior models and of the constrained data-fidelity formulation. On the other hand, to date, the unconstrained formulation underpinning uSARA, AIRI-ℓ2, and AIRI-ℓ1 is a critical building block of the only optimization-based algorithm proposed for RI imaging capable of handling DDE calibration and imaging jointly (Repetti et al. 2017; Dabbech et al. 2021). This is a strong justification for the study of unconstrained versus constrained formulations of the imaging module.
Fourthly, the proposed end-to-end UNet implementation showed interesting results, albeit significantly outperformed by the proposed AIRI algorithms. A more thorough study should be undertaken, leveraging more advanced architectures, such as unfolded networks, which would enable incorporating knowledge of the measurement operator into the architecture.
Finally, the present work is simulation-based only. Dabbech et al. (2022), Wilber et al. (2022a), and Wilber et al. (2022b) validate AIRI and uSARA on real wide-field, high-resolution, high-dynamic range data from MeerKAT and ASKAP, relying on a highly parallelised implementation of the measurement operator tailored to the wide-field setting.

Several existing approaches to guaranteeing PnP convergence assume a nonexpansiveness constraint on the denoiser (Romano et al. 2017; Ryu et al. 2019; Terris et al. 2020; Cohen et al. 2021; Hertrich et al. 2021), but this constraint often comes with restrictive assumptions either on the algorithm, on the DNN architecture, or on the gradient of the data-fidelity term.
In Pesquet et al. (2021), casting the problem in the framework of monotone operator theory, we have shown that the convergence of a PnP algorithm can be ensured by introducing a well-chosen regularization term in the training loss of the denoiser D, enforcing its firm nonexpansiveness, without any limitation on the DNN architecture itself. Importantly, the approach also characterizes the limit of the algorithm as the solution of a monotone inclusion problem, a class more general than pure optimization problems, hence opening a path towards generalizing the usual Bayesian MAP interpretation of optimization solutions. We summarize those results in the remainder of this section.
Yet, the subdifferential is also defined for non-differentiable functions, such as the indicator function of the ℓ2 ball B(y, ε); we refer the reader to Bauschke & Combettes (2017) for more information. Problems such as (B1) belong to the class of so-called monotone inclusion problems and encompass unconstrained minimization problems such as uSARA (11). They can, however, take a more general flavour and write as

find x ∈ ℝ^N such that 0 ∈ ∇f(x) + A(x), (B2)

where A is an operator satisfying maximal monotonicity assumptions (Bauschke & Combettes 2017). It is important to stress that this class of operators includes, but is not restricted to, subdifferentials. Problem (B2) is thus in general not equivalent to a minimization problem. Such inclusion problems can be solved using a generalized version of the proximal FB algorithm (5),

(∀k ∈ ℕ) x_{k+1} = J_{γA}(x_k − γ∇f(x_k)), (B3)

where J_{γA} is the so-called resolvent of γA, yielding the convergence of the sequence (x_k)_{k∈ℕ} to a point x satisfying (B2). As previously, the stepsize γ must satisfy 0 < γ < 2/L, where L is the Lipschitz constant of ∇f. The resolvent operator can be seen as a generalized version of the proximal operator (4) (Bauschke & Combettes 2017). In fact, (5) is a special case of (B3): when A = ∂g for g ∈ Γ_0(ℝ^N), one has J_{γ∂g} = prox_{γg}. A key characterization of resolvent operators is the following: the operator J is the resolvent of a maximally monotone operator A if and only if it is firmly nonexpansive. Technically, by definition, an operator J is firmly nonexpansive if there exists an operator Q with Lipschitz constant smaller than or equal to 1 such that J = (I + Q)/2, where I is the identity operator. Given the identical structure of PnP-FB and (B3), it follows that D in (15) should be firmly nonexpansive for the convergence of the PnP algorithm to be ensured, with the PnP solutions characterized as solutions of a monotone inclusion problem (B2).
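As a concrete illustration of the resolvent characterization, the proximal operator of a scaled ℓ1 norm (soft-thresholding) is the resolvent of the maximally monotone operator A = ∂(t‖·‖1), and one can check numerically that it satisfies the firm nonexpansiveness inequality ⟨x − y, J(x) − J(y)⟩ ≥ ‖J(x) − J(y)‖². This is a minimal sketch, not code from the paper:

```python
import numpy as np

def soft(x, t):
    """prox of t*||.||_1, i.e. the resolvent of A = d(t*||.||_1)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(1)
# Firm nonexpansiveness: <x - y, J(x) - J(y)> >= ||J(x) - J(y)||^2 for all x, y.
for _ in range(1000):
    x, y = rng.standard_normal(8), rng.standard_normal(8)
    diff = soft(x, 0.3) - soft(y, 0.3)
    assert np.dot(x - y, diff) >= np.dot(diff, diff) - 1e-12
```

Any proximal operator passes this check, whereas an arbitrary DNN denoiser generally does not, which is precisely why the training of D must be constrained.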
Thus, in order to impose the firm nonexpansiveness of D, it is sufficient to impose that the Lipschitz constant of Q = 2D − I is at most 1, or equivalently, that the spectral norm of its Jacobian satisfies ‖∇Q(x)‖_S ≤ 1 for all x. Since most deep learning frameworks rely on Jacobian-vector products (Paszke et al. 2017; Balestriero & Baraniuk 2021; Maddox et al. 2021), one can, given some x ∈ ℝ^N, compute ‖∇Q(x)‖_S via the power method. Ideally, D would thus be trained as a denoiser with a standard training loss, under the constraint that ‖∇Q(x)‖_S ≤ 1 for all x. Such a constraint is not trivial to impose in practice and, as proposed in Pesquet et al. (2021), we relax it and introduce a regularization term in the training loss that penalizes networks that are not firmly nonexpansive.
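The power-method estimation of ‖∇Q(x)‖_S can be sketched as follows, using only Jacobian-vector and vector-Jacobian products as provided by autodiff frameworks. For checkability, a linear toy "denoiser" D(x) = Wx is used, so that ∇Q = 2W − I everywhere and the estimate can be compared against the exact top singular value; all names and the setup are illustrative, not from the paper:

```python
import numpy as np

def jacobian_spectral_norm(jvp, vjp, dim, n_iter=100, seed=0):
    """Estimate ||J||_S by power iteration on J^T J, accessing J only
    through Jacobian-vector products (jvp) and vector-Jacobian
    products (vjp), as autodiff frameworks expose them."""
    v = np.random.default_rng(seed).standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = jvp(v)          # J v
        v = vjp(u)          # J^T (J v)
        v /= np.linalg.norm(v)
    return np.linalg.norm(jvp(v))  # ||J v|| -> sigma_max

# Toy linear "denoiser" D(x) = W x, so Q = 2D - I has constant Jacobian 2W - I.
rng = np.random.default_rng(2)
W = 0.4 * rng.standard_normal((16, 16))
JQ = 2 * W - np.eye(16)
est = jacobian_spectral_norm(lambda v: JQ @ v, lambda u: JQ.T @ u, 16)
exact = np.linalg.svd(JQ, compute_uv=False)[0]
```

For a real DNN, the matrix products would be replaced by the framework's jvp/vjp calls evaluated at the chosen point x, with no need to materialize the Jacobian.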
Given a database of noisy/groundtruth image pairs (z_i, u_i)_{1≤i≤N}, and denoting by θ the learnable parameters of D, the training loss promoting the firm nonexpansiveness of D combines a standard loss function L with a Jacobian penalty involving a safety margin parameter ε > 0. As underlined in Pesquet et al. (2021), this penalty does not explicitly guarantee that the solution satisfies the constraint ‖∇Q(z)‖_S ≤ 1 on the database, where it is effectively evaluated, let alone for all z. Various countermeasures are however deployed, which have been demonstrated to be effective at ensuring PnP convergence in practice (see Pesquet et al. 2021). Firstly, the safety margin parameter ε > 0 is a means to strengthen the penalty. Secondly, the term ‖∇Q‖_S is not computed exactly at z_i, but instead at a point sampled uniformly at random on the segment [z_i, u_i], with the aim of exploring a richer set of images than those of the database only. Last but not least, the weight of the regularization term can, and should, be fine-tuned manually to optimize the balance between the penalty and the standard loss L.
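For concreteness, the regularized training loss described above can be written along the following lines. This is a reconstruction consistent with the surrounding description and with Pesquet et al. (2021); the exact normalization and the symbol β for the regularization weight are assumptions, not taken from the original text:

```latex
\min_{\theta}\;\frac{1}{N}\sum_{i=1}^{N}\Big[
  \mathcal{L}\big(\mathrm{D}_{\theta}(z_i),\,u_i\big)
  \;+\;\beta\,\max\!\big\{\,\|\nabla \mathrm{Q}_{\theta}(\tilde z_i)\|_{S},\;1-\varepsilon\,\big\}
\Big],
```

where $\tilde z_i$ is sampled uniformly at random on the segment $[z_i, u_i]$, $\beta > 0$ weights the nonexpansiveness penalty, and $\varepsilon > 0$ is the safety margin.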
We conclude this section by underlining that, contrary to other works (Ryu et al. 2019; Terris et al. 2020), the proposed approach does not require imposing layer-wise Lipschitz constraints or architectural restrictions on the DNN.
relied on 1000 radio images of size 2000 × 2000 simulated with ENZO (Bryan et al. 2014) and augmented by random rotations, resulting in the appearance of realistic extended emission in the field of view. Connor et al. (2022) generated a database of 900 synthetic images of size 2000 × 2000 containing elliptical Gaussian sources.
In this section, we introduce the training loss for the supervised training of the denoiser D. Building on Pesquet et al. (2021), and following technical arguments detailed in Appendix B, we regularize the training loss with an appropriate Jacobian term to enforce the firm nonexpansiveness of D, where Q = 2D − I, I denotes the identity operator, and θ denotes the learnable parameters of D.

Figure 1. Summary of the proposed strategy for building a training database of postprocessed radio groundtruths. We first gather a set of 32 astronomical (optical) images of high quality but low dynamic range: (a) shows a subset of four images. (b) illustrates the effect of preprocessing on a small part of the image from (a). Eventually, we split the images into 2235 images of size 512 × 512 to create the low dynamic range training database. Samples of our training database, after application of the exponentiation (19) with parameter set to 10^3, are shown in (c) in logarithmic scale.

Figure 7. Experiment 2 results: Impact of the training noise level (resp. regularization parameter) in imaging with AIRI (resp. uSARA) on simulated RI data using the 3c353 image as groundtruth and the uv-pattern from Figure 5a (Δ = 4 h), in comparison with the reference value suggested by the AIRI and uSARA heuristics. Top row: the groundtruth image 7a and the simulated dirty image 7b. Second row: estimated model images obtained with uSARA, with the regularization parameter increasing with the column index. Third (resp. fourth) row: estimated model images of AIRI-ℓ2 (resp. AIRI-ℓ1), with the training noise level increasing with the column index. Below each image we indicate the reconstruction metrics as (SNR, logSNR). Images are shown in logarithmic scale.

Figure 8. Experiment 2 results: Residual images associated with the reconstructions of Figure 7. Top row: residual images for the estimated model images obtained with uSARA, with the regularization parameter increasing with the column index. Second (resp. third) row: respective residual images for the estimated model images obtained with AIRI-ℓ2 (resp. AIRI-ℓ1), with the training noise level increasing with the column index. All images are shown in linear scale.
Figure 9. Experiment 2 results: Reconstruction metrics as a function of the training noise level in the case of AIRI-ℓ2 and AIRI-ℓ1, or the thresholding parameter in the case of uSARA. Top row: reconstruction SNR in the cases Δ = 4 h (a) and Δ = 8 h (b). Bottom row: reconstruction logSNR in the cases Δ = 4 h (c) and Δ = 8 h (d). Each point is an average over the 20 simulated observations built from the four different groundtruths from Figure 2 and the five different uv-patterns shown in Figures 5 and 6c. Error bars show the 95% confidence interval. On each graph, the black vertical dotted line indicates the common heuristic value (for Δ = 4 h and Δ = 8 h respectively). Parameter values are sampled within a factor √2 of the heuristic, and in steps of a factor 2 beyond.

Figure 10. Experiment 3 results: Reconstruction SNR (left) and logSNR (right) as a function of the observation duration for AIRI-ℓ2, AIRI-ℓ1, uSARA, SARA, CLEAN, and the end-to-end UNets. Each point is an average over the 20 simulated observations built from the four different groundtruths from Figure 2 and the five different uv-patterns shown in Figures 5 and 6c. For AIRI-ℓ2 and AIRI-ℓ1, the training noise level is chosen at the heuristic. For uSARA, the regularization parameter is fine-tuned manually to optimize the logSNR, resulting in values between 3 and 6 times below the heuristic (in line with Figure 9). Error bars show the 95% confidence interval.