A primal-dual data-driven method for computational optical imaging with a photonic lantern

Optical fibres are used to image in-vivo biological processes. In this context, high spatial resolution and stability to fibre movements are key to enable decision-making processes (e.g., for microendoscopy). Recently, a single-pixel imaging technique based on a multicore fibre photonic lantern has been designed, named computational optical imaging using a lantern (COIL). A proximal algorithm based on a sparsity prior, dubbed SARA-COIL, has been further proposed to solve the associated inverse problem, enabling image reconstructions for high-resolution COIL microendoscopy. In this work, we develop a data-driven approach for COIL. We replace the sparsity prior in the proximal algorithm by a learned denoiser, leading to a plug-and-play (PnP) algorithm. The resulting PnP method, based on a proximal primal-dual algorithm, solves the Morozov formulation of the inverse problem. We use recent results in learning theory to train a network with desirable Lipschitz properties, and we show that the resulting primal-dual PnP algorithm converges to a solution to a monotone inclusion problem. Our simulations highlight that the proposed data-driven approach improves the reconstruction quality over the variational SARA-COIL method on both simulated and real data.


Introduction
Optical fibres are used for imaging in-vivo biological processes, in particular for microendoscopy. To enable decision-making processes for in-vivo observations, the fibre must be stable to movements (e.g., bending), and must produce accurate images (with high spatial resolution). On the one hand, standard coherent fibre bundles can provide resolutions of a few microns, and can facilitate observation of disease processes at the cellular level when combined with fluorescent contrast agents [1,2]. Nevertheless, they are limited either in resolution or in stability. Alternatively, multimode fibres can potentially deliver an order of magnitude higher spatial resolution, but they often encounter calibration issues when the fibre bends. On the other hand, multicore fibres coupled with a photonic lantern (MCF-PL) have recently been developed, to enable high-resolution imaging and robustness to fibre movement [3,4] (see left part of Fig. 1). Distinct multimode light patterns are projected at the output of the lantern by individually exciting the single-mode MCF cores. Examples of patterns are given in Fig. 1 (right). Imaging through a MCF-PL leads to an ill-posed linear inverse problem, where the objective is to estimate an original unknown image from the photons detected by the single-pixel detector (see Background Section).
In [4], an iterative variational method (SARA-COIL) is proposed to estimate the original image from the MCF-PL measurements. It is based on the primal-dual Condat-Vũ iterations [5,6], for solving a sequence of constrained problems in a redundant wavelet domain. The authors show that it enables accurate reconstruction on simulated data. However, a drop in reconstruction quality was observed on real data. Further, SARA-COIL is based on a reweighting-ℓ1 scheme approximating a log-sum prior [7], coupled with multiple wavelet transforms, leading to an overall computationally expensive approach with weak convergence guarantees.
We aim to improve the reconstruction quality while ensuring the reliability of the solution. Motivated by the good performance of hybrid optimization methods involving neural networks (NNs) in computational imaging [8,9,10,11], we develop a new plug-and-play (PnP) algorithm based on primal-dual iterations. The proposed approach solves a constrained problem, using a learned denoising NN to replace the sparsity prior. We train the NN to satisfy desirable Lipschitz properties ensuring the stability of the provided solution [12]. In addition, we show that the limit point of the proposed algorithm is a zero of a monotone operator. The proposed approach outperforms SARA-COIL on both simulated and real data.

Main contributions.
The first main contribution of this work is to demonstrate the efficacy of a data-driven PnP algorithm for the recently developed COIL modality. The second contribution is to develop a PnP strategy for solving the Morozov formulation of the problem (i.e., by addressing the degradation model through a constraint), which requires a primal-dual formulation. Compared with the Tikhonov formulation, the known advantage of a Morozov approach is that it makes the choice of hyperparameters easier. The third contribution is to provide theoretical convergence guarantees for the proposed PnP primal-dual formulation. These guarantees are grounded in recent work by some of the authors [12], which allows the denoiser to be built as the learnt resolvent of a maximally monotone operator. Such a resolvent is a versatile tool which makes the class of considered denoisers quite flexible, while leading to a sound interpretation of the recovered image as a solution to a monotone inclusion problem. Although many interesting PnP approaches have been developed during the last few years (see e.g., [13] and references therein), we believe that the last two contributions are novel in the field.
We show on simulated and real data that the proposed PnP algorithm outperforms the state-of-the-art variational reweighted-ℓ1 approach.
Outline. The remainder of the article is organized as follows. In the Background section we describe the formal COIL inverse problem, and provide some background on PnP in the context of monotone inclusion problems. The proposed approach is presented in Section Proposed Primal-Dual PnP algorithm. In Section Experimental results we evaluate the performance of the proposed PnP approach on simulated and real data.

Multicore fibre with photonic lantern
Light patterns generated by the PL are projected onto an object (e.g., tissue) and light returned from the object (e.g., fluorescence) is detected by a single-pixel detector. We consider M multimode light patterns. The m-th pattern (for m ∈ {1, . . ., M}) produces a scalar measurement y_m ∈ R corresponding to the sum of the pixelwise multiplication of the pattern and the image of the object of interest. Formally, the observations y ∈ R^M are obtained as

y = Φx + w,    (2.1)

where x ∈ R^N is the unknown image (reshaped to a column vector), Φ ∈ R^{M×N} is the linear measurement operator, and w ∈ R^M is a realization of a random perturbation. Each row of Φ contains one pattern (as per Fig. 1), reshaped to a row vector. The distribution of the noise is not known exactly, but it is assumed to have bounded energy, i.e., ∥w∥_2 ⩽ ε for some ε > 0.
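A minimal numerical sketch of this forward model (all sizes and the noise bound below are illustrative placeholders, not the experimental values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M light patterns, N-pixel image
# (the real setup uses M = 121 or 1089 and N = 377*377).
M, side = 16, 8
N = side * side

# Each row of Phi holds one multimode light pattern, reshaped to a row vector.
Phi = rng.random((M, N))

# Unknown object image, reshaped to a column vector.
x = rng.random(N)

# Bounded-energy noise: draw w, then rescale so that ||w||_2 <= eps.
eps = 0.01
w = rng.standard_normal(M)
w *= eps / max(np.linalg.norm(w), eps)

# Single-pixel measurements: y_m is the inner product of pattern m with the image.
y = Phi @ x + w

assert y.shape == (M,)
assert np.linalg.norm(w) <= eps + 1e-12
```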

SARA-COIL methodology
The SARA-COIL method [4] defines the estimate as

x̂ ∈ argmin_{x∈R^N} f(Ψx) subject to ∥Φx − y∥_2 ⩽ ε,    (2.2)

where f ∘ Ψ is a regularization function used to promote sparsity in a transformed domain. In (2.2), Ψ ∈ R^{S×N} is the concatenation of the first eight Daubechies wavelet transforms and the Dirac basis, so S = 9N is the dimension of the sparsifying transform domain, and f: R^S → (−∞, +∞] is a log-sum penalization function [7,14,15] given by

f(u) = Σ_{s=1}^{S} log(α + |u_s|),

with α > 0. The authors in [4] propose to solve (2.2) using a reweighting-ℓ1 approach [7], combined with a primal-dual algorithm [5,6]. Formally, the primal-dual algorithm is used to sequentially solve a collection of problems of the form

minimize_{x∈R^N} ∥∆Ψx∥_1 subject to ∥Φx − y∥_2 ⩽ ε,    (2.3)

where ∆ ∈ R^{S×S} is a diagonal weight matrix whose diagonal elements are chosen according to the current estimate of x (see, e.g., [15,16] for more details). SARA-COIL showed good performance on simulated data, but with a significant drop on real data. In addition, this method suffers from the high computational cost mentioned in the introduction. Finally, although recent works provided theoretical guarantees for some reweighted-ℓ1 approaches solving log-sum problems [17,18,19,20], SARA-COIL does not satisfy the necessary conditions to ensure its convergence to a solution to (2.2).
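To make the reweighting mechanism concrete, here is a minimal sketch of a typical weight update in the spirit of [7] (not the exact SARA-COIL implementation; `alpha` and the toy coefficients are illustrative):

```python
import numpy as np

def update_weights(psi_x, alpha):
    """Reweighting-l1 weight update in the spirit of [7]:
    coefficients that are large in magnitude receive small weights,
    mimicking the gradient of the log-sum penalty log(alpha + |u|)."""
    return alpha / (alpha + np.abs(psi_x))

# Toy example: a nearly-sparse coefficient vector Psi @ x.
coeffs = np.array([0.0, 0.01, 1.0, 5.0])
w = update_weights(coeffs, alpha=0.01)

# Large coefficients -> small weights (weak penalization), and vice versa.
assert w[0] == 1.0
assert np.all(np.diff(w) < 0)  # weights decrease as |coefficient| grows
```

Each reweighted problem (2.3) then uses these weights on the diagonal of ∆, so that significant coefficients from the previous iterate are penalized less.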
In this work, we propose a different approach for solving the COIL inverse problem (2.1), combining optimization and deep learning, which does not require a reweighting procedure.

Learning MMOs
Recently, optimization-based approaches have been made more powerful by coupling them with NN models. In particular, unfolded and PnP methods are highly efficient for solving inverse imaging problems [8,10,21,22,23,9,11]. In this work we focus on PnP methods. In a nutshell, they consist of replacing some steps in an optimization algorithm by a NN. For image recovery, denoising NNs are often used to replace proximity operators related to the regularization (e.g., replacing the soft-thresholding operator associated with ℓ1 regularization).
In [12], the authors propose to use maximally monotone operator (MMO) theory to design PnP algorithms for solving monotone inclusion problems. The objective is to

find x̂ ∈ R^N such that 0 ∈ A(x̂) + B(x̂),    (2.4)

where A and B are MMOs, i.e., monotone operators (for every (x, u) and (y, v) in their graph, ⟨x − y | u − v⟩ ⩾ 0) whose graph is not properly contained in the graph of any other monotone operator. Such problems can be solved by algorithms grounded on MMO theory, including forward-backward (FB), Douglas-Rachford (DR), and primal-dual approaches [30,31,32,5,33,6]. For these algorithms, B can be handled through its resolvent operator J_B = (Id + B)^{−1} [30]. Depending on the scheme, the operator A is then handled either explicitly (e.g., FB algorithm), through its resolvent (e.g., DR algorithm), or using further splitting (e.g., primal-dual methods).
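As a concrete example of a resolvent, the soft-thresholding operator is the resolvent of the (scaled) subdifferential of the ℓ1 norm; the following sketch checks numerically that it is firmly nonexpansive, the key property that resolvents of MMOs satisfy:

```python
import numpy as np

def soft_threshold(v, gamma):
    """Resolvent J_B = (Id + B)^{-1} for B = gamma * subdifferential of ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

rng = np.random.default_rng(1)
gamma = 0.5

# Firm nonexpansiveness: ||J(u) - J(v)||^2 <= <J(u) - J(v), u - v> for all u, v.
for _ in range(100):
    u, v = rng.standard_normal(5), rng.standard_normal(5)
    d = soft_threshold(u, gamma) - soft_threshold(v, gamma)
    assert d @ d <= d @ (u - v) + 1e-12
```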
In [12], the authors showed that the resolvent J_B of a stationary MMO B can be approximated by a learned feedforward NN J. They hence paved the way to using algorithms grounded on MMO theory in a PnP fashion, with theoretical guarantees. The authors showed that if J is firmly nonexpansive (FNE), then any sequence (x_n)_{n∈N} generated by the resulting PnP algorithm converges to a limit point x̂. They also proposed a FB-PnP algorithm for solving (2.4), when A is the gradient of some convex and Lipschitz-differentiable data-fidelity function. They showed that for this FB-PnP algorithm, x̂ satisfies 0 ∈ A(x̂) + γ^{−1}B(x̂) for B = J^{−1} − Id and γ ∈ (0, +∞) being the step-size of the FB-PnP algorithm. To the best of our knowledge, this MMO/PnP perspective has not yet been used for solving (2.4) with a non-differentiable data-fidelity, or with iterative schemes other than FB.

Proposed Primal-Dual PnP algorithm
SARA-COIL solves a collection of problems of the form (2.3), which is a particular case of (2.4) with A = Φ⊤ ∘ N_{B2(y,ε)} ∘ Φ and B = ∂∥∆Ψ · ∥_1. In this work, we propose to learn a generic operator B, i.e., to

find x̂ ∈ R^N such that 0 ∈ Φ⊤N_{B2(y,ε)}(Φx̂) + B(x̂),    (3.2)

aiming to better regularize the solution, hence removing the need for reweighting. An efficient algorithm for solving Problem (3.2) is the primal-dual Condat-Vũ algorithm. The convergence proof of this algorithm is based on MMO theory, and it has the advantage of not requiring the operator Φ to be inverted.
In this work, similarly to [12], we propose to characterize B through its resolvent. Let J_θ be an operator parameterized by a vector θ. If J_θ = (Id + Q_θ)/2, where Q_θ is a 1-Lipschitz operator, then J_θ is the resolvent of a MMO B. By modelling Q_θ as a NN, the model parameter vector θ can thus be learnt to provide an optimal choice for the MMO regularizer. The resulting PnP method is given in Algorithm 1. Then, the following convergence result naturally follows from [5,6].

Theorem 3.1 Let (x_k)_{k∈N} be a sequence generated by Algorithm 1. Assume that τσ∥Φ∥² < 1, and that J_θ is chosen as described above. Let B be the MMO equal to J_θ^{−1} − Id. Assume that there exists at least one solution x̂ to the inclusion

0 ∈ Φ⊤N_{B2(y,ε)}(Φx̂) + τ^{−1}B(x̂).    (3.3)

Then (x_k)_{k∈N} converges to such a solution.
Unlike in the convergence results of the primal-dual algorithms presented in [5,6], the primal step-size τ acts not only on the convergence profile but also on the set of solutions of (3.3). In particular, this theorem shows that the regularizing operator appearing in (3.3) is equal to B up to the multiplicative factor 1/τ. A similar behaviour was observed in [12] for the FB-PnP algorithm, as discussed in the previous section.
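The iterations underlying Algorithm 1 can be sketched as follows; this is our reading of the Condat-Vũ scheme applied to (3.2), not the authors' code, and the trained denoiser J_θ is replaced here by a simple soft-thresholding surrogate, with all problem sizes illustrative:

```python
import numpy as np

def proj_l2ball(z, y, eps):
    """Projection onto the l2-ball B2(y, eps)."""
    d = z - y
    n = np.linalg.norm(d)
    return y + d * min(1.0, eps / n) if n > 0 else z

def pnp_primal_dual(y, Phi, denoise, eps, tau, sigma, n_iter):
    """Condat-Vu primal-dual iterations with a plugged-in denoiser
    replacing the primal resolvent (sketch of Algorithm 1)."""
    M, N = Phi.shape
    x, u = np.zeros(N), np.zeros(M)
    for _ in range(n_iter):
        # Primal step: denoiser in place of a proximity operator.
        x_new = denoise(x - tau * Phi.T @ u)
        # Dual step: Moreau identity gives the prox of the conjugate
        # of the ball indicator from the ball projection.
        v = u + sigma * Phi @ (2 * x_new - x)
        u = v - sigma * proj_l2ball(v / sigma, y, eps)
        x = x_new
    return x

# Toy run: sparse ground truth, noiseless measurements.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((30, 20)) / np.sqrt(30)
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
y = Phi @ x_true
L = np.linalg.norm(Phi, 2)
tau = sigma = 0.9 / L            # ensures tau * sigma * ||Phi||^2 < 1
denoise = lambda z: np.sign(z) * np.maximum(np.abs(z) - 0.01, 0.0)
x_hat = pnp_primal_dual(y, Phi, denoise, eps=1e-3, tau=tau, sigma=sigma, n_iter=2000)
assert np.linalg.norm(Phi @ x_hat - y) <= 0.05  # iterate near the constraint set
```

The step-size condition `tau * sigma * ||Phi||^2 < 1` matches the assumption of Theorem 3.1.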

Training
Similarly to [12], we train J_θ on a denoising task, to satisfy the desired 1-Lipschitz condition. Let (x_i, z_i)_{i∈I} be a set of pairs of ground-truth images (x_i)_{i∈I} and associated noisy images (z_i)_{i∈I}. For every i ∈ I, we have z_i = x_i + υw_i, where w_i ∈ R^N is a realization of an additive standard normal random variable, and υ > 0.
The vector of parameters θ of the NN is learned so as to

minimize_θ Σ_{i∈I} ( ∥J_θ(z_i) − x_i∥² + λ max{∥∇Q_θ(z_i)∥_S, 1 − δ} ).

The first term is the standard ℓ2 loss for training denoising NNs, while the second term is a Jacobian regularization introduced in [12] to ensure that Q_θ = 2J_θ − Id is 1-Lipschitz. For every i ∈ I, ∇Q_θ(z_i) denotes the Jacobian of Q_θ computed at z_i, and (λ, δ) ∈ (0, +∞)² are regularization parameters. In particular, λ balances the contribution of the ℓ2 loss with that of the Jacobian regularization, and δ is chosen to bound the norm away from 1. In our experiments, we set δ = 0.05, and manually choose λ as the smallest value ensuring the 1-Lipschitz condition on Q_θ. The spectral norm ∥∇Q_θ(z_i)∥_S is computed using a power method coupled with back-propagation. In [12], it was shown that, in practice, when λ is chosen large enough, ∥∇Q_θ(x)∥ < 1 for x belonging to a neighbourhood of the training dataset, which is a sufficient condition to ensure the convergence of Algorithm 1 (see Theorem 3.1).
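The power method for the Jacobian spectral norm can be illustrated as follows; in practice the Jacobian-vector products come from back-propagation through the network, whereas this sketch uses an explicit matrix standing in for ∇Q_θ:

```python
import numpy as np

def spectral_norm(jvp, vjp, dim, n_iter=50, seed=0):
    """Power method estimating the largest singular value of a linear map,
    given only Jacobian-vector products (jvp) and their adjoints (vjp),
    as provided by automatic differentiation in practice."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = vjp(jvp(v))            # one step of power iteration on J^T J
        v /= np.linalg.norm(v)
    return np.linalg.norm(jvp(v))  # norm of J applied to the top singular vector

# Sanity check on an explicit matrix standing in for the Jacobian of Q_theta.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
est = spectral_norm(lambda v: A @ v, lambda v: A.T @ v, dim=2)
assert abs(est - 3.0) < 1e-6
```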
We train three DnCNN networks [34] considering different noise levels υ ∈ {5, 10, 20}, on NVIDIA GeForce RTX 2080 Ti GPUs provided by [35]. We use the Adam optimizer [36] on 60% of the images of the ImageNet [37] validation dataset, converted to grayscale with values in [0, 255], with batch size 40, patch size 64 × 64, learning rate 5 × 10⁻⁵, and a scheduler reducing the learning rate by 10% when the validation loss stops improving. We further use 20% of the ImageNet validation dataset as a test set, and the remainder for validation during training.
Training results are summarized in Table 1, where the 10,000 images of our test dataset are used to check that max_x ∥∇Q_θ(x)∥_S ⩽ 1 and to evaluate the PSNR values for the denoising task. For each value of υ, we choose the smallest λ needed to ensure that this inequality holds.

Experimental setting
We use the same MCF-PL setting as in [4,38]. The PL was added at one end of ∼3 m of MCF with 121 single-mode cores in an 11 × 11 square array. Each core was individually excited using coherent 514 nm laser light, generating 11² = 121 different multimode patterns of light. To augment the total number of measurements M, we also consider a setting where the fibre was rotated 9 times by 40° around the optical axis, creating a total of 121 × 9 = 1089 patterns. The measurement operator Φ corresponds to the concatenation of the M = 121 or M = 1089 patterns, each of size N = 377 × 377.
These patterns are used for both simulated and experimental data. For simulated data, measurements are created according to Model (2.1), using the 1089 patterns and a noise level yielding an input SNR of 30 dB. For experimental data, measurements were acquired using the fibre and a single-pixel camera: the object was moved into the beam path, and the magnitude of the light transmitted through the object was recorded by a detector. We highlight that the MCF was intentionally moved and deformed significantly between pattern calibration and imaging experiments, to further demonstrate the stability of the PL approach (see [4] for more details). The field of view of all reconstructions using experimentally measured data is 0.9 mm × 0.9 mm in the object plane.
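For reference, here is a sketch of how measurements at a prescribed input SNR can be simulated; the 30 dB value is from the text, while the clean measurement vector is a random placeholder for Φx:

```python
import numpy as np

def add_noise_at_snr(y_clean, snr_db, rng):
    """Add white Gaussian noise w so that 20*log10(||y_clean|| / ||w||) = snr_db."""
    w = rng.standard_normal(y_clean.shape)
    w *= np.linalg.norm(y_clean) * 10 ** (-snr_db / 20) / np.linalg.norm(w)
    return y_clean + w, w

rng = np.random.default_rng(0)
y_clean = rng.random(1089)          # placeholder for Phi @ x (1089 patterns)
y, w = add_noise_at_snr(y_clean, 30.0, rng)

snr = 20 * np.log10(np.linalg.norm(y_clean) / np.linalg.norm(w))
assert abs(snr - 30.0) < 1e-9
# The l2-ball radius eps in the data-fidelity constraint can then be
# set to ||w||_2 (or slightly above it).
```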

Simulated data
We validate our primal-dual PnP algorithm (dubbed PnP-COIL) on simulated COIL data. To this aim, we generate 50 images with geometric patterns; examples are shown in Figure 2 (left column). Average PSNR and SSIM values (with associated standard deviations) are reported in Table 2. For these results, we fixed ε = 50 in Algorithm 1. Quantitative results are very similar for the three trained DnCNNs, showing that PnP-COIL is fairly stable with respect to the training noise level. Visual comparisons for three of these images are reported in Figure 2, showing the ground truth alongside reconstructions obtained with the three trained DnCNNs and with SARA-COIL [4]. We observe that PnP-COIL outperforms SARA-COIL on all examples, although the network has been trained on a very different dataset.
Further improvement could certainly be obtained by fine-tuning the network. Finally, we observed that PnP-COIL usually requires fewer iterations than SARA-COIL to reach convergence. The average reconstruction time for each method is also reported in the last two columns of Table 2, when the algorithms use a GPU or only a CPU, respectively. The CPU reconstruction time is similar for the four methods, although the case υ = 5 seems to be faster on average. GPU time can only be provided for PnP-COIL, and yields important accelerations.

Experimental data
We validate the proposed algorithm on real data, acquired as described in the Experimental setting section, using two images: cross and dots (Figure 3, left). The reconstructions obtained with SARA-COIL for cross are reported in Figure 3. Reconstructions using PnP-COIL are given in Figures 4-6. In each case, we show the results obtained with the NNs trained on different noise levels υ, for different radii ε of the ℓ2-ball in (3.2). In all cases, the reconstructed images become smoother as (υ, ε) increase. For the cross example, we observe that the proposed primal-dual PnP method leads to a more accurate reconstruction than SARA-COIL (see Figures 4 and 5). For the dots example, we see in Figure 6 that the proposed approach recovers the 4 dots in the image when M = 1089, while SARA-COIL barely recovers one of them.

Conclusion
In this work, we have introduced a new primal-dual PnP algorithm for solving monotone inclusion problems, in the context of computational optical imaging. The proposed approach can handle non-smooth data-fidelity terms involving a linear operator, e.g., an ellipsoidal constraint. We showed that the proposed PnP-COIL approach outperforms the state-of-the-art variational approach on both simulated and real COIL data.

Figure 1: (left) Schematic representation of the MCF-PL. (right) Two examples of multimode pattern images obtained through the MCF.

Here B2(y, ε) denotes the ℓ2-ball centered at y with radius ε, and Φ⊤ denotes the transpose of the matrix Φ. The normal cone N_S of a subset S of a Hilbert space H, equipped with an inner product ⟨· | ·⟩, is defined as N_S: x ↦ {u ∈ H | (∀y ∈ S) ⟨u | y − x⟩ ⩽ 0} if x ∈ S, and ∅ otherwise.

Figure 3: Experimental data: ground truth and reconstructions obtained with SARA-COIL considering M = 121 and M = 1089 patterns for the cross image (top), and M = 1089 patterns for the dots image (bottom).

Table 1: Training results on the denoising problem, obtained over the 10,000 images of the test dataset.