Fully Adaptive Bayesian Algorithm for Data Analysis, FABADA

The aim of this paper is to describe a novel non-parametric noise reduction technique from the point of view of Bayesian inference that may automatically improve the signal-to-noise ratio of one- and two-dimensional data, such as e.g. astronomical images and spectra. The algorithm iteratively evaluates possible smoothed versions of the data, the smooth models, obtaining an estimation of the underlying signal that is statistically compatible with the noisy measurements. Iterations stop based on the evidence and the $\chi^2$ statistic of the last smooth model, and we compute the expected value of the signal as a weighted average of the whole set of smooth models. In this paper, we explain the mathematical formalism and numerical implementation of the algorithm, and we evaluate its performance in terms of the peak signal to noise ratio, the structural similarity index, and the time payload, using a battery of real astronomical observations. Our Fully Adaptive Bayesian Algorithm for Data Analysis (FABADA) yields results that, without any parameter tuning, are comparable to standard image processing algorithms whose parameters have been optimized based on the true signal to be recovered, something that is impossible in a real application. State-of-the-art non-parametric methods, such as BM3D, offer slightly better performance at high signal-to-noise ratio, while our algorithm is significantly more accurate for extremely noisy data (higher than $20-40\%$ relative errors, a situation of particular interest in the field of astronomy). In this range, the standard deviation of the residuals obtained by our reconstruction may become more than an order of magnitude lower than that of the original measurements. The source code needed to reproduce all the results presented in this report, including the implementation of the method, is publicly available at https://github.com/PabloMSanAla/fabada


INTRODUCTION
The acquisition of any kind of experimental data is affected by several sources of statistical error, which ultimately translate into a random noise component in the measurements to be recorded. There are different types of noise depending on their physical origin, both related to electronic (thermal noise, fluctuations) and mechanical (non-perfect lenses, antennas, etc.) devices. In astronomy, for example, errors can be produced in the acquisition of the images due to defects in the optics of the telescopes and in the read-out process of the detector (typically a CCD) transforming the light captured by the telescope into an electrical signal. The noise introduced can sometimes be comparable to or even larger than the signal, and different image processing algorithms may be used to recover the information that is buried deep in the data.
Smoothing, where measurements are weighted at nearby spatial or temporal points (using different schemes to assign weights), is E-mail: pablom.sanala@gmail.com one of the most popular techniques to mitigate the effects of random noise (Savitzky & Golay 1964;Cleveland 1979). Nowadays there is a large number of smoothing algorithms, based on many different techniques (e.g., see Goyal et al. 2020, for a review) such as central moving average, data grouping/segmentation (Dabov et al. 2007), fitting smooth functions, different types of statistical analysis, partial differential equations, wavelength transformation filters, linear and nonlinear filtering, sparse models and nonlocal self-similarity models (Gu et al. 2014) and more recently artificial neural networks, see (Ilesanmi & Ilesanmi 2021;Zhang et al. 2017). Rest of these methods rely on some explicit or implicit assumptions about the true (noisefree) signal in order to separate it properly from the random noise. A common assumption is that the signal being retrieved varies gradually and that the data can be fit by a smooth function, see (Katkovnik et al. 2006).
Many techniques analyse the probability that the data correspond to a random Gaussian realisation of the model that attempts to describe the underlying signal plus random fluctuations of known amplitude (El Helou & Susstrunk 2020). In this work, we use Bayesian inference to evaluate and combine different candidate models that iteratively attempt to improve the quality of the fit to the data. This new Bayesian technique incorporates an automatic selection criterion based on the statistical properties of the residuals, and therefore it yields a fully non-parametric method. Although the motivation of our scheme is the application in the field of astronomy, our new algorithm, Fully Adaptive Bayesian Algorithm for Data Analysis FABADA is focused in a general way, and it is possible to generate a smooth model for any type of data. Several algorithms have been developed over the years for denoising data, and their ability to recover the underlying signal from an experimental data set with its corresponding errors has been evaluated and compared according to different standard metrics. Our method is fully described in Section 2, and we use a set of synthetic tests to compare it with other prescriptions in the literature. Details of the comparison procedure are provided in Section 3, and results are discussed in Section 4. Our main conclusions are briefly summarised in Section 5.

FABADA
The goal of our algorithm, FABADA is to estimate an unknown signal = ( ) at different locations ∈ specified by a 1D or 2D coordinate that belongs to the data domain. Its input are independent measurements, contaminated with random Gaussian white noise ( ) ∼ N (0, ( )), and the associated errors = ( ). The noisy observational data, : → R, have the form ( ) = ( ) + ( ), and FABADA returns an estimationˆ=ˆ( ) of the original signal ( ) that is statistically compatible with the measurements ( ).
The approximation that different measurements (pixels in an image, or wavelengths in a spectrum) are independent is crucial to our method: it is assumed that the noise values are fully independent random variables, and any correlation between adjacent measurements can only be attributed to the signal. This is not necessarily true in realistic astronomical observations, since both the measurement process and the arithmetic operations carried out by the data reduction pipeline may introduce a certain degree of correlation between adjacent errors. Although we think our assumption of fully independent noise is a good approximation in most cases of practical interest, it is worth noting that other approaches to account for spatial correlation and denoising, such as COmpressed Sensing (Farrens et al. 2020), andINLA (González-Gaitán et al. 2019), can be found in the literature.
In our method, we apply Bayes' theorem in an iteratively way to generate smooth models of the noisy measurements. Therefore, we must define a suitable likelihood function to evaluate these models to be tested, and specify a prior probability distribution for the signal to initiate the process. Our likelihood is based on the statistics of a Gaussian process, and we start from an improper, constant prior. Then, we evaluate different smooth versions of the posterior probabilities until a certain condition is reached, and we combine all the smooth models to produce the final estimationˆ( ), taking into account both their Bayesian evidence and the 2 of the residuals.

Iterative models
FABADA, as the name suggests, is a fully automatic algorithm that only takes the data set = ( ) and its associated errors = ( ) as input, = { , }, where the domain may have one or two dimensions. Within the field of astrophysics, may be, for example, the spectral energy distribution as a function of wavelength , or a broadband photometric image of an arbitrary region of the sky. This is indeed the kind of data that we will use to illustrate the performance of the algorithm, but the results can be easily generalized to any other type of empirical measurement in one or two dimensions affected by Gaussian random errors.
Under this assumption, the likelihood function L for a model ( ) of the underlying signal is given by (1) using the notation described in Table 1. Since we do not assume any previous knowledge about the signal, the prior probability distribution for our initial model 0 will be homogeneous in the range of all possible values, i.e. ( 0 ( )) = 1 for all , during the first iteration of the algorithm. According to Bayes' theorem, the posterior probability distribution is in this case the straightforward multivariate Gaussian centered on the empirical measurements . The expected value of 0 ( ) is thus 0 ( ) ≡ 0 ( ) = ( ), its variance 0 ( ) = 2 ( ) is determined by the corresponding errors, and the Bayesian evidence of the model E = ∫ ( ) L ( | , ) d reduces to unity. To create smoothed versions of this first model, we iteratively apply a central moving average filter to the expected values ( ) of the last iteration. We simply adopt avg = 3 in 1D (including the two adjacent measurements) and avg = 9 (a 3 × 3 square) in 2D. The basis of our method is to use information from neighboring points to update our priors regarding the correct value of ( ). For every iteration > 0, the prior probability distribution becomes a Gaussian centered on the smoothed expectation (4) of the posterior distribution from the previous iteration, using its local variance ( ) as a measure of our uncertainty. We stress that we are forsaking the strict Bayesian philosophy (the prior distribution should be, as its name indicates, totally independent from the data) on purely practical grounds. Once we accept this premise, one may compute the posterior probability distribution where and as well as the evidence As long as the model remains relatively close to the data (within an environment of the order of √︁ 2 ( ) + ( )), the evidence in its favour will be high, but if it departs significantly, the exponential term will indicate that we have reached the maximum smoothing that is statistically compatible with the data.

Stopping criteria
On the one hand, we evaluate the average evidence of model as and we ensure that we iterate until it reaches a maximum; E < E −1 . Since our priors are based on the data themselves, it is not surprising that the evidence increases rapidly during the first few iterations of the algorithm. However, our model uncertainties ( ) decrease monotonically, and the smooth models quickly become less compatible with the data than the first (self-tuned) estimates. Thus, the average evidence reaches its maximum at a very early stage, and then it slowly declines as the number of iterations increase.
In order to overcome the bias arising from the lack of independence of the priors (i.e. overfitting), we made use of the chi-square statistics and iterate until 2 > − 2 (the absolute maximum of the associated probability density function). Note that this condition is achieved when, on average, the model has departed by about one sigma from the observational measurements.
We always impose both criteria, so that the algorithm stops on the largest number of iterations. Usually this is set by the chi-square statistics, although the evidence-based condition may dominate at very high signal-to-noise ratio.

Model selection
Once the algorithm stops, after iterations, we combine all the models ( ) to generate our final estimationˆ( ) of the real signal ( ) as a weighted sum over the expected valueŝ where the adopted weights ( ) are an important ingredient of our algorithm. We explored several phenomenological prescriptions in an attempt to optimise the performance, stability, and reliability of the results. On the one hand, our tests show the convenience of taking into account the local evidence E ( ) calculated at each iteration for every location . This quantity adapts to the local structure of the data, giving more weight to the smoother models (larger number of iterations) in areas where the data are indeed smooth, while preserving the information when sharp edges are present.
On the other hand, we use the overall 2 statistic of each model to gauge its compatibility with the measurements , given the errors . The probability density function of the 2 of a large number of random variables can be approximated as a narrow Gaussian centered around − 2. Since our priors are biased because they are based on the data themselves, the maximum evidence occurs at values of 2 that are typically much lower than this value (i.e. they overfit the measurements). Using the probability density function directly would give minimal weight to any model that is not extremely close, 2 − 2, which we find tends to yield overly smoothed models. In order to avoid this problem, we use the actual value of 2 to give more weight to the smoother models, but not so much that the models favoured by the Bayesian evidence are almost completely ignored. We thus adopt as our final prescription for > 0. Of course, this expression is not valid for = 0, since 2 0 = 0 and the initial evidence is E 0 ( ) = 1 for every point. Note that, due to the use of an improper prior, this value is in general very different from E 1 ( ). To avoid these problems, we a posteriori set 2 0 = 2 1 based on the first iteration (i.e. the lowest of all smooth models) and use which is merely a reformulation of (9), assuming that our initial model based on the measurements should be, on average, about one sigma away from the true signal. Thus,

SYNTHETIC TESTS
We develop a battery of synthetic tests based on real astronomical data to assess the performance of the algorithm. More precisely, we apply FABADA to a set of astronomical spectra and images, with different levels of Gaussian random noise, and compare the quality of the reconstructed signal (in terms of the peak signal-to-noise ratio PSNR and the structure similarity index measure SSIM), as well as the execution time, with other methods available in the literature.

Other Algorithms
Over the last decades, lots of effort have been placed into the development of several applications that help in the analysis of digital images in different fields. Noise reduction is one of the basic problems in this context, and we attempt to provide a fair comparison of our algorithm with other methods that are representative of the current state of the art. Leaving aside the techniques based on some kind of training, such as e.g. neural networks (Ilesanmi & Ilesanmi 2021;El Helou & Susstrunk 2020;Zhang et al. 2017), we focus on a more traditional, statistical approach, closer in philosophy to the formalism propose here. It is important to stress that many of these standard methods involve a number of free parameters, and we optimise their values according to the metrics used to compare. Note that, of course, such optimisation is only possible in a synthetic tests, since the true signal in a real problem is unknown. Thus, our results represent an upper limit to the performance of parametric methods. We now briefly describe the main principles and free parameters, if any, of all the techniques that we have considered. A succinct summary is provided in Table 2.

Median filter
One of the classical non-linear digital filtering techniques, it is still often used to remove noise from an image or signal. The main idea of the median filter is to run through the data, replacing each point with the median of neighbouring entries. The number of neighbours used in the median is called the "window", which slides, entry by entry, over the entire signal. For each data point ( ), the region used to compute the median contains, for one dimensional data, ( − 1)/2 neighbours on each side, whereas for two dimensions it corresponds to a square of size centered in ( ). The optimisation procedure consisted on computing the estimationˆ( ) for different values of , starting with the smallest value of the window length, 0 = 3,  Table 2. List of all noise reduction methods used to compared with FABADA along their parameters and space implementations, one or two dimensions. The first six methods are standard parametric algorithms, while the last four are representative of state-of-the-art non-parametric methods. and increasing it according to +1 = + 2(1 + //5), where // denotes the integer division.

Savitzki-Golay filter (SGF)
As first noted by Savitzky and Golay in (Savitzky & Golay 1964), a smoothed version of the data may be obtained by fitting successive sub-sets of adjacent points with a low-degree polynomial using the method of least squares. When the data are equally spaced, the solution of the least squares (i.e., the coefficients of the polynomials) is analytical and independent of the data to be smoothed. Thus, for −1 2 ≤ ≤ − −1 2 , where the two free parameters of the method are the window length of the data region, i.e., the number of data points to be fitted, and the order of the polynomial.
are the ≥ Savitzky-Golay coefficients, andˆ, ( ) is the smoothed result of the filter at position . We vary the window length according to +1 = + 2(1 + //5) and ≤ 10 for the order of the polynomial.

LOWESS
A popular variant of the Savitzky-Golay method is the locally weighted scatterplot smoothing (LOWESS) (Cleveland 1979), where the regions to be fitted are not evenly spaced and the least squares procedure takes into account weighted values of the data, according to their distance from the point to be evaluated. This scheme involves computing the coefficients of the fitted polynomial each time, producing a less efficient algorithm. To compare with FABADA we use the implementation explained in (Cleveland 1979), which uses a linear fit and can only be used for one dimensional data. This implementation has only one parameter, which is the fraction of data points used to accomplish the linear regression at each point. For the optimisation we begin with the smallest fraction possible ( = 2/ ) and then it is incremented it by a factor of 1.5 until the best recovery is found.

Gaussian Filter (GF)
Another classical technique of noise reduction consists in filtering the high frequency components of the data using a Gaussian filter. The fast Fourier transform (FFT) is the most computationally efficient way to convert the data ( ) to the frequency domain˜( ). Once we have the spectrum of the image, defined as the amplitude of the FFT of the data, we can apply a low-pass Gaussian filter to discard the highest modes: where |˜( )| is the distance from the center (zero frequency component) in Fourier space, and 0 is the radius of the filter, equivalent to 0 = 2 / 0 in configuration space. This is again a one-parameter method, in which we select the radius 0 of the low-pass Gaussian filter by evaluating in twenty logarithmic steps between 0 = 1 and 0 = 630.

Wiener Filter
The Wiener filter minimizes the mean square error taking into account that the measurements are a random process where the statistical properties (in particular, the spectrum) of the noise are known. In this work we have used the implementation in the Scipy Signal Tools library (Virtanen et al. 2020;Lim 1990), where the output of the filter given the signal ( ) is given bŷ where and 2 are local estimates of the mean and variance within a window of size , and 2 = 2 ( ) is the the average variance across the data. We increment starting from 3 until the best recovery is found.

Total Variation filter (TV)
The total variation filter is a well known algorithm, first described by Rudin et al. (1992), that aims to minimise the image's total variation (TV) norm, defined as the square sum of the gradients of each pixel, in both directions. In this work, we use the implementation of Chambolle (2004) in the scikit-image library of Python that works either in one or two dimensions. This implementation has one parameter, the denoised weight , and we increase it from 10 −2 to 10 3 in logarithmically spaced steps until the maximum PSNR is found.

Wavelet denoising filter
The Wavelet denoising filter is an adaptive approach to wavelet soft thresholding where a unique threshold is estimated for each wavelet subband. In this work, we used the implementation in the scikit-image library (van der Walt et al. 2014), while the method is described in Chang et al. (2000). This algorithm is non-parametric, and no optimisation is necessary.

BM3D
We also include the Block-Matching and 3D filtering (Dabov et al. 2007) algorithm, which arguably represents the state of the art in the research field of image analysis (El Helou & Susstrunk 2020). A detailed account of this method, where image denoising is implemented as two-step process, can be found in (Dabov et al. 2007). For the first step, the noisy image is divided into equal-size square blocks. For each block, a 3D group is formed with similar regions (block-matching), and noise is attenuated by hard-thresholding the coefficients of a 3D transform. The filtered image is used to estimate the energy spectrum of the signal, and the process can be repeated a second time using a Wiener filtering instead of hard thresholding. The final smoothed result of the image is generated as a weighted average of the denoised blocks in their original positions. This is a non-parametric method, and no optimisation is necessary.

PySAP
Finally, the Python Sparse Data Analysis Package (PySAP; Farrens et al. 2020) is an open-source image processing software package developed for the COmpressed Sensing for Magnetic resonance Imaging and Cosmology (COSMIC) project. This package provides a set of flexible tools that can be applied to a variety of compressed sensing and image reconstruction problems. In particular, PySAP offers a denoising non-parametric automatic algorithm that is used in this work. The denoising algorithm uses the isotropic undecimated wavelet transform from the C++ package, Sparse2D, to decompose the noisy image, and a soft threshold is then applied with weights learned from the noisy image itself. This is a non-parametric method, and no optimisation is necessary.

Data Sample
All the methods explained in the previous section are applied to a set of test data in one and two dimensions (astrophysical spectra and monochromatic images, respectively) with different levels of Gaussian random noise.
An important aspect in the recovery of spectra is the conservation of their features, such as the Balmer break or emission and absorption lines, after noise reduction. For this purpose, we consider three different spectra (represented in Figure 1) that show these characteristics in different degrees. The first spectrum (left) is a Kurucz stellar atmosphere model (Castelli & Kurucz 2003) with an effective temperature = 11500 , metal abundance log = 0.1 and surface gravity log = 5.0, typical of an O/B type star, with a prominent Balmer break at ∼ 400 nm and several strong absorption lines. The spectrum of a supernova remnant, plotted on the middle panel, is a composite of 5 different observations (Blair et al. 2000) from the Faint Object Spectrograph (FOS) instrument of the Hubble Space Telescope (HST). This high-resolution spectrum (0.9 Å/pixel) is characterized by very prominent emission lines, useful for inferring different physical properties of these objects. The last spectrum (right) is taken from the TRDS Brown Atlas (Brown et al. 2014) which consist on a pair of interacting galaxies, Arp 256, in the constellation of Cetus, and it contains a combination of emission and absorption lines with a stellar continuum. The Kurucz and Arp 256 spectra can be found in the Synthetic Photometry SYNPHOT (Lim et al. 2015) Python package that simulates photometric data and spectra, observed or otherwise. The aim of using these different spectra is to obtain a good representation of the possible features that can appear in one-dimensional astrophysical data and see how the different algorithms perform in digging up spectral features out of a noisy signal.
For astronomical images, we consider eight different targets, displayed in Figure 2, that are intended to sample the wide range of features that may be encountered in the field, including planets, stars, diffuse nebulae, and galaxies, either alone or in potentially blended groups. Saturn is arguably the target that is most similar to the ordinary test images (e.g natural landscapes, human subjects) that are often used in the context of digital image processing. In addition, our sample includes two examples of nebulae (Crab and Bubble) dominated by the gaseous component, two with a more significant contribution of the stellar population (Eagle and Ghost nebulae), and a globular cluster full of stars with different brightness. There is also an image with a galaxy pair (NGC 4302 & NGC 4298) in which we can see two different orientations of the galaxies, as well as a galaxy cluster with a wide variety of morphologies and apparent sizes. All of these images have been taken from the Hubble Space Telescope gallery produced by NASA and the Space Telescope Science Institute (STScI). All of them have been compressed to 8-bit values, with a maximal dynamical range of 0 − 255 counts and 512 × 512 pixels size to lighten up the computational load. For simplicity, we have also normalized the astronomical spectra to 255 in order to have the same dynamical range and represent the noise in terms of this value for both dimensions.

Test Statistics
We apply different levels of Gaussian random noise ( ) with constant variance 2 to the real signal ( ): where ( ) = N (0, ), the element ∈ denotes independent measurements (spectrum wavelengths or image pixels), and we assume FABADA 7 that statistical errors are correctly characterised in the input data. Once ( ) is computed, a softened estimationˆ( ) of the real signal ( ) is carried out using the different algorithms explained above. In one dimension, noise levels vary from = 5 counts to = 95 counts, out of the 255 maximal value that sets the dynamical range of our data (i.e. of the order of ≈ 2 − 40% relative errors). We extend the values of to even higher values in two-dimensional data, specifically to 1024 counts (400%) to illustrate the challenging situation (not so seldom encountered in astronomy) that the signal is actually fainter than the background noise.
We evaluate the quality of the reconstruction in terms of the Peak Signal to Noise Ratio (PSNR) and the Structure Similarity Index Measure (SSIM) of the estimatorsˆ( ), following common practice in the signal processing literature.
By definition, the PSNR (usually expressed in decibels, dB) is related to the Mean Square Error (MSE) as PSNR(ˆ) = 10 · log 10 255 2 MSE , where 255 is the dynamical range in our data. In principle, a more faithful recovery of the underlying signal should yield smaller values of the MSE and higher values of the PSNR. The Structural Similarity Index Measure (SSIM) is another typical metric used in image restoration that takes into account properties such as luminance, contrast and structure. It is defined by the expression (Wang et al. 2004) SSIM(x, y) = 2 + 1 2 + 2 2 + 2 + 1 2 + 2 + 2 where and denote the two images being compared, and 2 are their mean and variance, and their covariance. The constants 1 and 2 are two variables to stabilize the division when the denominator approaches zero, and they are usually set to 1 = (0.01 ) 2 and 2 = (0.03 ) 2 , where is the dynamic range of the images. The SSIM metric can adopt values from 0 (absolute lack of correlation) to 1 (high structural similarity).
Another metric that we consider is the CPU time used to generate the estimation of the real data on a 2.40 GHz Intel i9-9980HK CPU along with 16Gb DDR4 2400 MHz RAM memory. Please note that this time corresponds to the final execution time for the given noise level in the Python implementation of the algorithms, once the optimal parameters have been found, but it does not include the time invested in the optimisation, which is considerably larger.

RESULTS
We now assess the ability of our algorithm to recover the underlying signal for the synthetic test cases described in Section 3.2. In order to facilitate the comparison with previous results reported in the literature, we use the Peak Signal-to-Noise Ratio (PSNR) defined in (21), which is just a measure of the Mean Squared Error (MSE), expressed in decibel (dB), as well as the Structural Similarity Index Measure (SSIM) defined in (22). All the results shown from the parametric methods are optimise using the PSNR metric (similar results are obtained if SSIM was used instead). Figures 3 and 4 show two examples of the results obtained by the different algorithms explained in Section 3.1: the median, Savitzky-Golay filter (SGF), Gaussian Filter (GF), Wiener filter, and locally weighted scatterplot smoothing (LOWESS) for one-dimensional spectra, and the median, SGF, GF, Wiener filter and block-matching (BM3D) filters for the images. Each row of both figures provides the recovered estimations of the signalˆ( ) for different noise levels, represented in each column.

Spectra Example
We represent in figure 3 the recoveries obtained for three random realizations with high, low, and extremely low signal-to-noise ratios (SNR) of the Arp 256 spectrum (see Figure 1). The PSNR achieved by each method is quoted within the corresponding panel along with the SSIM index.
At high SNR ( = 10, original PSNR= 28.13 dB, SSIM= 0.52), all algorithms display not only a similar performance, but actually converge to very similar solutions. The highest value of PSNR/SSIM is obtained by the optimised TV filter (33.69 dB / 0.84), a little better than the Wiener filter (33.52 dB / 0.82) and FABADA (33.15 dB / 0.82). The difference is lower than the difference between these three recoveries and any of the others. The improvement with respect to the originally high-quality data is necessarily modest in all cases, of the order of ≈ 3 dB, i.e. an increase of ∼ 50% in MSE. All algorithms are able to correctly trace the presence of the most prominent emission lines, as well as the strong Lyman-absorption line near the peak at the left end of the spectrum. Nevertheless, it is important to note that, while the TV filter provides the best recovery in terms of overall noise reduction, FABADA tends to preserve the true intensity of these features slightly better than any of the other algorithms.
This trend becomes more significant as the noise increases, and it is more difficult to discriminate significant spectral features from Gaussian random fluctuations. In the middle panels, where = 35, all models are able to reproduce the overall shape of the continuum. However, they fail to recover even the strongest absorption and emission lines, although hints of the brightest emission lines are still present in the Wavelet, LOWESS, Wiener, GF and TV filters. Only our prescription is able to provide a good description of these prominent features with this level of noise in the input data, albeit weaker absorption and emission lines are completely lost. This reflects on the values of the metrics, where FABADA and the Wavelet filter obtain the highest values (27.93 dB/0.77) and (27.95 dB/0.76) respectively; a difference of 10.68 dB, which is more than an order of magnitude of noise reduction with respect to the original measurements. However, the Wavelet filter seems to have a decreased resolution since it merges similar regions in wider bins.
If we now turn to the recovery of the noisiest spectrum ( = 95, on the right column), we see this behaviour of the wavelet filter more noticeable. Despite obtaining the highest values of the metrics, almost all the information enclosed in the spectrum is gone. In comparison, FABADA obtains the second highest value of PSNR, while GF and LOWESS obtain higher values of SSIM and close values of PSNR. While the results of FABADA and GF are actually similar, all the spectral information in LOWESS is gone. At these high levels of noise, the MSE and the SSIM metrics are rather inadequate to assess the quality of the reconstructed emission line spectra, because they incur in minimal penalty for failing to reproduce a handful of peaks that are barely statistically significant. The criteria implemented in FABADA are more conservative, and a lot of random fluctuations  Figure 1) by all the models explained in Section 3.1 (rows) for three different noise levels (from left to right columns). The real signal ( ), the noisy input data ( ), and the estimationˆ( ) are represented as red, grey and black lines, respectively. Numbers on each panel quote the PSNR and SSIM obtained by each method, to be compared with the noisy data (column headers). are kept (hence the slightly lower SSIM) together with the most significant remains of the actual signal. It is somewhat remarkable that FABADA manages to recover the brightest line even at this noise level, at variance with the other methods, while still obtaining high values of the noise reduction metrics. We stress once again that this does not necessarily imply a failure of the methods, but of the MSE as a goodness-of-fit indicator. On the other hand, it does highlight the robustness of FABADA in this respect, although the results reported here suggest that perhaps the MSE and SSIM is not the optimal metric to gauge the quality of the recovered solution or, more likely, that it should be complemented with another test statistic that quantifies information loss and/or gives more weight to informative features.

Image Example
Similar trends are observed in the results obtained for the 2D data. Figure 4 shows the recovery of the Bubble nebula image for the whole set of different algorithms, represented in each row, with four different noise levels ( = 10, 45, 95, 255) along the columns. The PSNR and SSIM values obtained for each are also shown in each panel, and a close-up of some structures is provided in the last four columns to illustrate whether the smoothing methods can reproduce their shape and edges. All models yield fairly similar reconstructions for the highest signal-to-noise case ( = 10). The best reconstruction in terms of the MSE and SSIM is provided by the TV method, which improves the PSNR from 28.15 dB to 36.98 dB, more than one order of magnitude of noise reduction in terms of the MSE, and from 0.5 to 0.91 for the SSIM index. The state-of-the-art BM3D obtains similar results with 36.94 dB and SSIM of 0.91. This is around 15% more than the recovery with FABADA (35.85 dB) in terms of PSNR and 10% (0.87) in terms of SSIM. Similar results are obtained with the Wiener filter (36.02 dB/0.89) and with PySAP (35.84 dB/0.89). The other classical filters (median, SGF, GF) obtain comparable results when their parameters are tuned to minimise the MSE, about ≈ 1 dB (∼ 25%) below FABADA's solution. Regardless of the MSE and SSIM statistics, the Wiener filter and PySAP virtually miss some of the filaments in the shape of the zoomed structure, while the other classical filters recover some of the shapes, although in a blurrier way than FABADA, TV and BM3D. Visually, the TV filter and BM3D reproduce the shape of the zoomed structure better, both in terms of sharpness and smoothness. In particular, several edges in FABADA seem to be a little bit more blurry compared to BM3D, together with a clearly visible salt and pepper noise component.
Similar behaviour is found when we increase the noise to = 45. FABADA yields results (30.82 dB/0.78) comparable to the best solutions obtained by BM3D (31.66 dB/0.82) and PySAP (31.03 dB/0.8). Despite its lower PSNR (30.81 dB), the GF obtains a higher value of SSIM (0.82) than FABADA. If we inspect the zoomed region, we see that PySAP is able to effectively filter the high-frequency noise with the cost of not recovering well the gradients and the structure. In contrast, BM3D is able to recover fairly well the gradients of the image, while FABADA and GF recover a blurrier image.  Once again, the advantages of our algorithm become more evident as the noise increases. In the middle range, with a noise level of = 95, one may see that FABADA's difference in MSE with respect to BM3D decrease to 0.37 dB, and it becomes better than PySAP by 0.7 dB, achieving higher values of SSIM in both cases. On the other hand, the GF reaches the highest values of MSE and SSIM, whereas the Wiener, TV, SGF and median methods display increasingly lower values than the other methods. Most of them are able to bring up large-scale gradients and structures, but they fail to recover the smallest filaments (as shown by the zoomed panels) due to the high level of noise. The most significant problem of FABADA is still the salt and pepper noise. The PySAP denoising filter, in turn, returns a composition of smoothed and high-contrast regions that hardly reproduces the true underlying gradients in the signal. The Wavelet filter again seems to merge regions with similar intensity, yielding a lower effective resolution in the final recovery. Finally, it is remarkable how the state-of-the-art BM3D method starts to add some artificial edges to the recovered image.
If we push the noise even further, as is often the case in practical astrophysical applications, the signal itself is comparable to or even lower than the statistical uncertainties. In our last test, the noise level is = 255, equal to the dynamic range of the original data. The GF obtains again the highest values of MSE (26.23 dB, 0.08 dB higher than FABADA's solution), almost an order of magnitude of noise reduction, and an SSIM value of 0.74, 0.02 point higher than FABADA. Despite the artificial lower resolution, the Wavelet filter is able to recover high values of the metrics (25.26 dB/ 0.7), which again gives us a hint that the MSE and SSIM metrics may not be the optimal way to quantify the recovery of this type of image. It is noteworthy to see that these three solutions are far above the rest, by more than ≈ 3 dB in PSNR and ≈ 0.3 in SSIM.
Upon visual inspection of the different reconstructions, including the zoomed region, one can see that almost every algorithm, except FABADA and GF, produces some artefacts in the images either at large or small scales. The GF, however, is very efficient at filtering highfrequency noise, but, by construction, highlights graininess at the characteristic scale of the filter. A similar effect is more evident in the PySAP, TV, SGF, and median filters. Although the wavelet filter is able to recover the large-scale structure of the image, the decrease of resolution is even more noticeable, and spurious futures associated with the wavelet basis appear in the reconstruction. The Wiener filter still mixes regions with very different levels of smoothing, while BM3D is extremely successful in eliminating the high-frequency 'grain', albeit the smooth areas of the real image are transformed into more staggered gradients. This feature might be inherent to the block-matching algorithm, whose aim is to classify similar square sections of the image into groups, thus resulting in patches with similar gradients and/or edges. The fully automatic method developed in this work, FABADA, seems to offer a reasonable compromise solution. At high levels of noise, it is able to reduce the MSE and increase the SSIM at the cost of significantly blurring the image, but the introduction of spurious patterns, apart from salt and pepper noise, is not as severe as in the other methods.

Entire database results
We have computed the PSNR, SSIM, and CPU time for all data samples, noise levels, and methods as a function of , averaged over ten different random realisations. The number of cases that a specific algorithm has recovered the best solution, as well as the average difference with respect to the optimal choice, are quoted in Table 3.

One dimension -Spectra
In one dimension ( Figure 5), FABADA behaves very similarly to the optimised Wiener and the Wavelet filters, particularly in terms of PSNR (top panel). In general, the best results regarding this metric are achieved with the latter methods; 18 out of the 57 (32%) test cases by the Wiener filter, followed by the Wavelet filter with 15 (26%) and FABADA, who obtained the best PSNR for 10 (18%) estimations, being especially successful in the low signal-to-noise regime. Then, the optimised GF obtained the best reconstruction in 9 (16%) cases, and the remaining 5 (10%) correspond to the optimised SGF (4) and the TV filter (1). Neither LOWESS nor the median filter obtained in any case the best values for PSNR. Just taking into account these results, one can see how FABADA performs as well as the best possible solutions of the standard methods typically used in astronomy. The average difference with respect to the highest PSNR achieved by any Figure 5. Peak Signal to Noise Ratios (PSNR), Similarity Structural Indexes (SSIM), and CPU times obtained for all the one-dimensional spectra samples and noise ranges considered in the comparison procedure. In the top set of figures is shown the PSNR in decibels (dB), in the middle is represented the SSIM, and in the bottom one the CPU time in seconds (s). Each figure of both groups is labelled with the reference name given at the top of the column. The dashed yellow line represents the PSNR and SSIM of the noisy data. Solid lines with filled symbols refer to the non-parametric methods. In 1D data, we used two automatic methods, the one presented in this work, FABADA, which is represented with a red solid square, and the Wavelet filter, with light green x. Dotted lines with unfilled symbols refer to the optimised methods. The LOWESS algorithm is represented with the blue diamond, the Wiener filter with the orange circle, the median filter with the green triangle, SGF with the brown hexagon, the GF with the purple cross, and the TV filter with the grey pentagon. algorithm is only 0.575 dB, a little closer than the Wavelet method 0.598 dB. This supports the idea that FABADA automatically achieves the limit of the standard methods when their parameters are tuned to the (unknown) optimal values.
In terms of the SSIM metric (middle panels), our algorithm obtains slightly worse results, being the best option only in 5 (9%) test cases, with an average distance with respect to the highest SSIM of 0.118. The optimised GF filter obtained 23 (40%) of the highest values and an average difference of 0.034. This result is due in part to the 132 spectra, where most of the information is concentrated in the emission lines. At high values of the noise level, almost all standard methods converge towards a horizontal line, where all the information is lost (see e.g. LOWESS in Figure 3). However, the SSIM (and, to some extent, PSNR) do not strongly penalise the misfitting of the emission lines when their statistical significance becomes low, which hints the limitations of these metrics to quantify the recovery of relevant information. Discarding the SN132D spectra, FABADA is seldom the optimal choice, but the average distance decreases to 0.048 in terms of SSIM, compared to 0.022 for the GF and 0.056 − 0.085 for the other algorithms.
As regards to the CPU time, it is easy to see that FABADA has a strong dependence on the level of noise, at variance with most classical methods, due to the increasing number of iterations required to fulfil our 2 stopping condition. This behaviour is also seen in the median, due to a similar increase in the optimal window size, whether other methods are less sensitive to the noise level. FABADA is, in general, significantly slower than the other algorithms (except LOWESS, at high signal-to-noise ratios), although it must be borne in mind that we have not taken into account the time consumed by the optimisation process of the standard algorithms, only the execution time once the optimal parameters have been found.

Two dimensions -Images
For the two-dimensional images ( Figure 6) BM3D reaches the highest PSNR for 60 out of our 112 (54%) test cases with different target Figure 6. Peak Signal to Noise Ratios (PSNR), Similarity Structural Indexes (SSIM), and CPU times obtained for all the two-dimensional image samples and noise ranges considered in the comparison procedure. In the top set of figures is shown the PSNR in decibels (dB), in the middle is represented the SSIM, and in the bottom one is the CPU time in seconds (s). Each panel is labelled with the reference image name given. The dashed yellow line represents the PSNR and SSIM of the noisy data. Solid lines with filled symbols refer to the non-parametric methods. In 2D data, there are four automatic methods, FABADA, which is represented with a red solid square, the block matching 3D filtering BM3D, with blue stars, the Wavelet filter with light green exes, and the PySAP with dark yellow diamonds. Dotted lines with unfilled symbols refer to the optimised methods. The Wiener filter is represented with the orange circle, the median filter with the green triangle, SGF with the brown hexagon, GF with the purple cross and the TV filter with the grey pentagon. and noise levels, followed by the optimised GF with 23 (21%), the Wavelet filter 17 (15%), FABADA with 6 (5%), TV filter with 5 (4%) and 1 (1%) that correspond to the optimised median filter. Neither SGF, PySAP, nor the Wiener filter ever obtained the highest values. Nevertheless, BM3D is on average 1.44 dB below the optimal choice, comparable to the 1.43 dB of FABADA, 1.50 dB of GF, and 1.77 dB of the Wavelet filter due to the significantly different trends observed at different noise levels.
In general, BM3D stands over the other methods, including FABADA, at high SNR ( 95 dB), in particular for the Saturn image. Its collaborative filter is particularly well suited for periodic data, or images with repetitive patterns, which are virtually absent in other test cases. The stars image would be a paradigmatic example, and the difference in this test is insignificant.
On the other hand, FABADA, the Wavelet filter, and GF predominate in the low-SNR regime ( 95 dB), where BM3D only achieves the best reconstruction in 10 out of the 56 (18%) test cases. The remaining 46 are achieved by GF with 22 (39%), the Wavelet filter with 17 (30%), FABADA with 6 (12%), and the median filter with 1 (1%). Furthermore, BM3D is on average 2.86 dB below the highest value, while FABADA, GF, and the Wavelet filter are only 0.57, 0.59, and 0.65 dB lower, respectively. Essentially, FABADA, the Wavelet filter, and the optimised GF achieve remarkably similar results, and they perform better than BM3D at low signal-to-noise ratios. Despite the high values obtained by the Wavelet filter, we see in our examples that small-scale details in the image are lost, and they become replaced by artificial structures arising from the wavelet basis. This is not seen either in FABADA or BM3D.
This behaviour is also seen in the middle panels, where the SSIM is represented. From the 54 (48%) of the highest values achieved by BM3D, only 8 (19%) are above > 95 counts, and the difference with respect to the highest value of SSIM increases with noise level from 0.1 to 0.2. Although in this metric FABADA only achieves the best recovery in 4 (5%) cases, 1 (7%) at high noise levels, it is only 0.1 apart, on average, from the best solution. The optimised GF obtains the highest solution for 23 (15%) cases and 25 (13%) in the low SNR regime. Additionally, the Wavelet filter produces better estimations than FABADA in 17 (15%) cases, and 15 (13%) in the low SNR regime. Despite this difference, GF and the Wavelet filter are still 0.11 and 0.12 below the best solution (0.07 and 0.081 at low SNR) respectively, again very close to the results obtained by our algorithm.
This change in the trend of the best solution is at the cost of CPU time. While FABADA seems to converge to its solution in an efficient way in the high SNR regime (consistently below the second), for high noise levels (low SNR) the time rises up to 30 s, as shown in the bottom panels. Once again, for the optimised models we only take into account the execution time with the fine-tuned parameters already known. The results are similar to the one-dimensional case, but the time scale increases with the problem's dimensionality. The Wavelet filter is the fastest algorithm, obtaining an average time of 0.02 seconds, while the PySAP denoising filter seems to be the slowest, with 18.24 seconds. FABADA also features a high execution time (8.08 seconds on average), whereas BM3D, GF, TV, the Wiener filter, and SGF yield 8.7, 0.87, 0.05 and 0.04 seconds, respectively. However, the performance of FABADA and the median filter vary significantly as a function of SNR. Combining these results with the above metrics for the quality of the reconstruction, we argue that our method provides a competitive alternative both at high SNR, where it achieves slightly less reliable results than the state-of-the-art algorithm BM3D at a fraction of the computational cost, as well as in the low SNR regime, where it provides a more faithful reconstruction after a significantly larger execution time.

CONCLUSIONS
In this work, we present the theory and implementation of a novel automatic algorithm for noise reduction: the Fully Adaptive Bayesian Algorithm for Data Analysis (FABADA). Our method iteratively evaluates progressively smoother models of the underlying signal and then combines them according to their Bayesian evidence and 2 statistic. The source code is publicly available at https://github.com/PabloMSanAla/fabada.
We compare FABADA with other methods that are representative of the current state of the art in image analysis and digital signal processing. For this comparison, we used the most typical metrics, the Peak-Signal-to-Noise-Ratio (PSNR), which is a measure of the Mean Square Error (MSE), the Structural Similarity Index (SSIM), and the CPU time. One important advantage of our method, shared by BM3D, PySAP, and the Wavelet filter, over classical algorithms is the absence of free parameters to be tuned by the user. Our results suggest that FABADA, the Wavelet filter and BM3D achieve values of PSNR or SSIM comparable to or better than the best possible solution attainable by the classical methods. Beyond the precise values of the global quantitative metrics, both FABADA and BM3D are quite successful in adapting to the structures present in the input data. Perhaps the most significant difference between them is that FABADA's priors assume that the signal is smooth, whereas BM3D uses blockmatching to look for repetitive patterns. This might be relevant when one must recover the height and shape of the spectral features in 1D or the gradients and boundaries in 2D. We argue that FABADA appears to offer a trade-off between noise reduction, increasing the metric values significantly, in a way that is statistically compatible with the data, keeping significant features without introducing considerable artifacts.
Regarding execution time, non-parametric methods are more expensive than the classical alternatives once the optimal values of their parameters are known, something that is of course impossible in practice. FABADA is faster than BM3D at high SNR, although it usually yields a poorer reconstruction, and the trend reverses for low SNR.

DATA AVAILABILITY
The Python implementation of the method developed in this work is publicly available at https://github.com/PabloMSanAla/ fabada. The images used in this work were taken from the Hubble Space Telescope gallery produced by NASA and the Space Telescope Science Institute (STScI). The spectra used in this work were taken from Castelli & Kurucz (2003); Blair et al. (2000); Brown et al. (2014). reference PROID2021010044, and from IAC project P/300724, financed by the Ministry of Science and Innovation, through the State Budget and by the Canary Islands Department of Economy, Knowledge and Employment, through the Regional Budget of the Autonomous Community. Yago Ascasibar acknowledges financial support from grant "Starbursts throughout the evolution of the Universe" (PID2019-107408GB-C42/AEI/10.13039/501100011033) from the AEI-MICINN, Spain.