Real-time multi-view deconvolution

Summary: In light-sheet microscopy, overall image content and resolution are improved by acquiring and fusing multiple views of the sample from different directions. State-of-the-art multi-view (MV) deconvolution simultaneously fuses and deconvolves the images in 3D, but processing takes a multiple of the acquisition time and constitutes the bottleneck in the imaging pipeline. Here, we show that MV deconvolution in 3D can finally be achieved in real-time by processing cross-sectional planes individually on the massively parallel architecture of a graphics processing unit (GPU). Our approximation is valid in the typical case where the rotation axis lies in the imaging plane. Availability and implementation: Source code and binaries are available on GitHub (https://github.com/bene51/), native code under the repository ‘gpu_deconvolution’, Java wrappers implementing Fiji plugins under ‘SPIM_Reconstruction_Cuda’. Contact: bschmid@mpi-cbg.de or huisken@mpi-cbg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Preibisch et al. (2014) derived a number of optimizations to make traditional Richardson-Lucy multi-view deconvolution converge in fewer iterations. The implemented variants all use the following update formula, but replace X with different expressions:

\psi^{t+1} = \psi^{t} \prod_{v \in V} \left( \frac{\phi_v}{\psi^{t} \ast P_v} \ast X \right)

• Independent: X = P_v^{\ast}
• Efficient Bayesian
• Optimization I
• Optimization II

where \psi^{t} is the estimate at iteration t, \phi_v is the observed data of view v, P_v is the PSF of view v, P_v^{\ast} is the flipped PSF of view v and W_v is the set of all virtual distributions of view v. The expressions for X in the Efficient Bayesian variant and in Optimizations I and II involve the virtual views W_v; see Preibisch et al. (2014) for their derivation and exact form.
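Apart from the convolutions with P_v, which are computed separately, the update above decomposes into element-wise operations that map naturally onto CUDA kernels. The following sketch shows the two element-wise steps (dividing the observed view by the blurred estimate, and multiplying the estimate by the back-projected quotient); all names are hypothetical and the code is illustrative, not the plugin's actual implementation:

```
/* Quotient step: phi_v / (psi * P_v), computed per voxel.
 * A small epsilon guards against division by zero. */
__global__ void divide_kernel(const float *observed, const float *blurredEstimate,
                              float *quotient, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        quotient[i] = observed[i] / (blurredEstimate[i] + 1e-6f);
}

/* Update step: multiply the current estimate by the correction
 * factor obtained after convolving the quotient with X. */
__global__ void multiply_kernel(float *estimate, const float *correction, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        estimate[i] *= correction[i];
}
```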

Convergence and number of iterations
The optimizations derived in Preibisch et al. (2014) and listed above reduce the number of iterations the algorithm requires to converge. The convergence behavior of the different optimization variants was extensively studied in Preibisch et al. (2014) and applies likewise to our implementation. In practice, choosing the number of iterations is a trade-off between achieved quality and computation time. We therefore leave it to the user, who needs to make this decision based on the particular situation (e.g. if deconvolution is performed in real-time, a reduced number of iterations might be preferred for an increase in overall acquisition speed). To facilitate the decision, we provide a tool for interactively investigating different numbers of iterations on a single cross-section (see also the Fiji plugin manual).

CUDA workflow for plane-wise multi-view deconvolution
Our plane-wise multi-view deconvolution implementation uses multiple CUDA streams to overlap GPU computations with data transfer, such that not only copies to and from the GPU, but also loading and saving data from and to the hard drive, incur no additional cost. The implemented workflow is outlined below. All processing and CUDA calls are asynchronous, i.e. non-blocking. Synchronization is achieved by calls to cudaStreamSynchronize().
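A minimal sketch of such an overlapped loop is given below, assuming double-buffered device memory, pinned host buffers and two CUDA streams; launch_deconvolution stands in for the actual per-plane processing and is hypothetical:

```
/* Process plane z on the compute stream while plane z+1 is uploaded
 * on a second stream; the download of plane z is queued on the
 * compute stream behind its kernels. */
for (int z = 0; z < nPlanes; z++) {
    if (z + 1 < nPlanes)
        cudaMemcpyAsync(d_in[(z + 1) % 2], h_in[z + 1], planeBytes,
                        cudaMemcpyHostToDevice, uploadStream);

    launch_deconvolution(d_in[z % 2], d_out[z % 2], computeStream); /* hypothetical */

    cudaMemcpyAsync(h_out[z], d_out[z % 2], planeBytes,
                    cudaMemcpyDeviceToHost, computeStream);

    cudaStreamSynchronize(uploadStream);   /* next input is resident on the GPU */
    cudaStreamSynchronize(computeStream);  /* result of plane z is back on the host */
}
```

While the GPU is busy, the CPU can read the next plane from disk and write the previous result, which is what hides the I/O cost.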

Libraries and dependencies
To efficiently calculate the Richardson-Lucy iteration step, convolutions were replaced by multiplications in the Fourier domain. Fourier transformations were computed using the cuFFT library (https://developer.nvidia.com/cuFFT). Other arithmetic operations were implemented as custom CUDA kernel functions.
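As an illustration of this scheme, an FFT-based convolution of a single plane might look as follows. This is a sketch under assumed buffer layouts, not the plugin's actual code: pointwise_mul is a hypothetical kernel, the PSF spectrum is assumed to be precomputed, and the forward/inverse plans are assumed to be created once (e.g. with cufftPlan2d) and reused:

```
#include <cufft.h>

/* Complex multiplication in the frequency domain; cuFFT transforms are
 * unnormalized, so the 1/(w*h) factor is folded in here. */
__global__ void pointwise_mul(cufftComplex *a, const cufftComplex *b,
                              int n, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex v;
        v.x = (a[i].x * b[i].x - a[i].y * b[i].y) * scale;
        v.y = (a[i].x * b[i].y + a[i].y * b[i].x) * scale;
        a[i] = v;
    }
}

void fft_convolve(cufftHandle fwd, cufftHandle inv,
                  float *d_img,                 /* real plane, convolved in place */
                  cufftComplex *d_tmp,          /* frequency-domain work buffer   */
                  const cufftComplex *d_psfFFT, /* precomputed PSF spectrum       */
                  int w, int h)
{
    int nFreq = h * (w / 2 + 1);                /* size of the R2C output */
    cufftExecR2C(fwd, d_img, d_tmp);            /* forward transform      */
    pointwise_mul<<<(nFreq + 255) / 256, 256>>>(d_tmp, d_psfFFT, nFreq,
                                                1.0f / (w * h));
    cufftExecC2R(inv, d_tmp, d_img);            /* inverse transform      */
}
```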
The entire workflow was implemented in the C programming language, using the CUDA-specific extensions. The Fiji plugin was implemented in the Java programming language (Oracle Corporation). The C program was interfaced from Java using the Java Native Interface (JNI).
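Such a JNI bridge passes the raw pixel buffers from Java to the native CUDA code. The following sketch shows the shape of one such entry point; the class name, method name and native function are all hypothetical and only illustrate the mechanism:

```
#include <jni.h>

/* Hypothetical native method for a Java declaration like
 *   private static native void deconvolvePlane(float[] data, int w, int h);
 * GetPrimitiveArrayCritical exposes the Java array without a copy
 * where the JVM permits it. */
JNIEXPORT void JNICALL
Java_SPIMReconstructionCuda_deconvolvePlane(JNIEnv *env, jclass clazz,
                                            jfloatArray data,
                                            jint width, jint height)
{
    jfloat *pixels = (*env)->GetPrimitiveArrayCritical(env, data, NULL);
    /* deconvolve_plane_cuda(pixels, width, height);  hypothetical native call */
    (*env)->ReleasePrimitiveArrayCritical(env, data, pixels, 0);
}
```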
The deployed plugin contains, for each supported platform, the corresponding binary library, which is statically linked against the CUDA SDK. Additionally, the cuFFT library, which is required as a shared library, is bundled.
Execution requires an NVIDIA graphics card that supports CUDA.

SUPPLEMENTARY FIGURE 3: COMPARISON OF DECONVOLUTION RESULTS ASSUMING A TILTED ROTATION AXIS
Figure 3. Comparison of deconvolution assuming a tilted rotation axis. Simulated data were created as in Supplementary Fig. 2, but the rotation axis was tilted against the x/y plane by a number of angles. For each value, both views and the deconvolution results from the 3D deconvolution and our plane-wise implementation are shown, from the top (top row) and along the detection axis of view 1 (bottom row). Peak signal-to-noise ratios (PSNR) are given for both methods (in dB). Line profiles of the ground truth, the simulated data and the deconvolution results are shown in all three dimensions. Even if the rotation axis is tilted by 10 degrees, 3D deconvolution is well approximated by our plane-wise implementation. On our microscope, the rotation axis is usually tilted by less than 1 degree.

SUPPLEMENTARY FIGURE 5: COMPARISON OF DECONVOLUTION RESULTS USING DIFFERENT PSFS

Figure 5. Comparison of deconvolution results using different PSFs. Simulated data were created as in Supplementary Fig. 2, using Gaussian PSFs with a fixed axial standard deviation σz of eight pixels, as determined empirically on our microscope. Different values were used for the lateral standard deviation σxy. For each value, both views and the deconvolution results from the 3D deconvolution and our plane-wise implementation are shown, along the rotation axis (top row) and along the detection axis of view 1 (bottom row). Peak signal-to-noise ratios (PSNR) are given for both methods (in dB). Line profiles of the ground truth, the simulated data and the deconvolution results are shown in all three dimensions. While the results obtained by plane-wise and original 3D deconvolution are similar for small values of σxy below two pixels, they start to diverge for higher values. On our microscopes, σxy was typically between 1.5 and 1.8 pixels.