Adaptive Semi-linear Inversion of Strong Gravitational Lens Imaging

We present a new pixelized method for the inversion of gravitationally lensed extended source images which we term adaptive semi-linear inversion (SLI). At the heart of the method is an h-means clustering algorithm which is used to derive a source plane pixelization that adapts to the lens model magnification. The distinguishing feature of adaptive SLI is that every pixelization is derived from a random initialization, ensuring that data discretization is performed in a completely different and unique way for every lens model parameter set. We compare standard SLI on a fixed source pixel grid with the new method and demonstrate the shortcomings of the former when modeling singular power law ellipsoid (SPLE) lens profiles. In particular, we demonstrate the superior reliability and efficiency of adaptive SLI which, by design, fixes the number of degrees of freedom (NDOF) of the optimization and thereby removes biases present with other methods that allow the NDOF to vary. In addition, we highlight the importance of data discretization in pixel-based inversion methods, showing that adaptive SLI averages over significant systematics that are present when a fixed source pixel grid is used. In the case of the SPLE lens profile, we show how the method successfully samples its highly degenerate posterior probability distribution function with a single non-linear search. The robustness of adaptive SLI provides a firm foundation for the development of a strong lens modeling pipeline, which will become necessary in the short-term future to cope with the increasing rate of discovery of new strong lens systems.


INTRODUCTION
Strong gravitational lensing has seen rapid progress over the past decade thanks to the advent of targeted searches for strongly lensed systems. Surveys such as SLACS , SL2S (Sonnenfeld et al. 2013a), SWELLS (Treu et al. 2011) and BELLS  have together found over a hundred strong galaxy-galaxy lenses. Of these observed systems, the source and lens galaxies span a range of redshifts, morphologies and environments and are thus beginning to bring unique insight to our understanding of galaxy structure and its evolution. With the number of observations set to significantly increase over the next decade, strong lensing will play an ever growing role in the foreseeable future of extra-galactic astronomy.
Accompanying this fast growing data set of strong E-mail:ppxjwn@nottingham.ac.uk lenses has been the development of a number of different methods for their modeling. These fall broadly into two categories depending on whether the source is modeled by a smooth parametric light profile or a discretized surface brightness distribution. Methods that fall within the former category tend to search over a fully non-linear parameter space spanned by both the lens and source parameters. Such methods have seen regular use in the literature, for example, in the analysis of both SLACS (Bolton et al. 2008a) and SL2S (Gavazzi et al. 2012) where the method has been used to confirm the lensing nature of many systems and determine their Einstein radii. The fast run time and ease of use creates a niche for these fully non-linear methods but they lack sufficient accuracy to perform more complex lens modeling and they break down with irregular source morphology. Furthermore, owing to the typically large and complicated non-linear parameter space, these methods can not guarantee that the global best fit has been reached.
Methods using discretized source surface brightness distributions, which we will refer to hereafter as 'pixelized methods', circumvent these shortcomings; reconstruction of the source using a pixel grid inherently accounts for the possibility of an irregular source morphology and makes calculation of the source light a linear problem (Warren & Dye 2003, WD03 hereafter). This ultimately leads to an improved accuracy in fitting to the observed lensed image which subsequently enables lens modeling of greater complexity. However, pixelized methods can be computationally more expensive to run, especially when a high resolution source grid is used, and typically involve a greater investment of time to set up. These time demands have resulted in a mixture of fully non-linear and pixelized methods finding use in the literature, rather than sole application of the more sophisticated pixelized methods.
A striking omission among strong lensing studies is a demonstration of the reliability of pixelized methods with lens models that are more complex than simple isothermal density profiles. The singular power-law ellipsoid (SPLE) is one such model which gives rise to a more complex parameter space. The SPLE, with a volume mass density of the form ρ ∝ r −α , where the power-law slope, α, is a free parameter, is shown to be an excellent representation of the overall density profile of early type galaxies (ETGs) ; Barnabe et al. 2009). Accordingly, the SPLE gives a significant improvement to the fitting of strong lensing data compared to a singular isothermal ellipsoid (SIE) profile.
SLACS, BELLS and SL2S have made great progress in measuring α for over 100 strong lenses Bolton et al. 2012;Sonnenfeld et al. 2013a). This work indicates that the total density profile of massive ETGs steepens with decreasing redshift, a measurement now being used to constrain galaxy formation models (Oguri, Rusu & Falco 2014;Dutton & Treu 2014). However, determination of α was not made by fitting a SPLE to the strong lensing data, but instead by combining the velocity dispersion of the lens with the Einstein mass calculated from a SIE lens profile. This gives two measurements of a galaxy's mass at two different radii, which is combined through either an empirical mass scaling (Bolton et al. 2008b) or kinematic analysis (Treu & Koopmans 2002;Sonnenfeld et al. 2013b). Whilst this approach has its advantages, for example that an average slope is measured over a relatively wide range of radii, there are also limitations such as assumptions regarding lens mass sphericity when solving the Jeans equations and the fact that the error on the velocity dispersion usually dominates the uncertainty in α ).
In this paper, we advocate the use of purely strong lens data, on the basis that the stronger constraints arising from a full exploitation of the information contained in lensed images offsets the lack of dynamical data (and dispenses with the need for noisy kinematics). We present a new implementation of the semi-linear method of WD03, which we refer to as 'adaptive semi-linear inversion' (adaptive SLI). We demonstrate several important improvements brought about by adaptive SLI over existing implementations. In particular, we concentrate on the application of adaptive SLI to the reconstruction of SPLE lens models and, crucially, we show how standard SLI introduces biases when measuring α. We also advocate the use of the adaptive SLI method in a strong lensing reconstruction pipeline for application to large datasets from both existing surveys, e.g.,SLACS, SL2S and SWELLS, and future surveys such as DES and LSST (Oguri & Marshall 2010).
Several other pixelized methods have been developed over the past decade which improve on the use of a regular Cartesian square grid. Dye & Warren (2005) split up square pixels in high magnification regions to obtain a square grid adapted to the magnification, however the method retains the biases we describe. Tagore & Keeton (2014) make a number of improvements to this method in their pixsrc program. Dye et al. (2014) perform lens modeling on multiwavelength observations simultaneously, a feature we have implemented in adaptive SLI but not used in the present work. Adaptive SLI has the most in common with the adaptive grid of Vegetti & Koopmans (2009). We will show, however, how our different approach to source plane pixelization removes biases which are still present with their adaptive scheme.
The paper is laid out as follows: Section 2 describes the adaptive SLI method. We first describe how source plane pixelization is performed, followed by the linear regularization scheme and lens mass optimization. Section 3 presents a thorough comparison of the square and adaptive SLI methods with a focus on the biases inherent to the SPLE lens model with a square grid. We demonstrate how adaptive SLI removes these biases. Finally, a summary is given in section 4.

METHODOLOGY
In this section we describe the adaptive SLI method. Section 2.1 outlines source plane pixelization and inversion without regularization. We introduce regularisation with a fixed lens model in Section 2.2. Section 2.3 then deals with our approach to optimization of the non-linear lens parameters before concluding with a discussion of practicalities in Section 2.4.

Adaptive Source Plane Pixelization and Inversion
For a fixed set of lens model parameters, the SLI method of WD03 solves the linear problem of determining the surface brightnesses of source plane pixels such that coaddition of their individual lensed images provides the best fit to the observed lensed image. The goodness of fit is quantified by a merit function, G, which in the non-regularized case is simply χ 2 . The solution vector, s, of source pixel surface brightnesses, si, is given by source plane co-ordinates (x, y)j of all J traced image pixels are fed into an h-means clustering algorithm (Hartigan & Wong 1979). This algorithm finds a set of h-cluster centers Ci, i = 1, I that minimize the statistic E given by (2) E is the sum of cluster 'energies', where a cluster energy is the quadrature sum of the distances of each of its associated coordinates (x, y)j to its center Ci.
The h-means algorithm first calculates an initial set of cluster centers dependent on the co-ordinates (x, y)j of all traced image pixels. This center initialization is randomized, such that a completely different set of centers will be calculated for a nearly identical set of co-ordinates. The algorithm then proceeds by alternating between two processes; (i) for the current set of cluster centers, it allocates all traced image pixels to their nearest cluster center. (ii) for this new set of cluster assignments, all cluster centers are recalculated. This continues until either no point is moved or 20 iterations are performed. The resulting pixelization of the source plane is then dependent on the spatial distribution of traced image pixels and therefore also the magnification of the lens model, unlike the Cartesian grid used in square SLI and in a considerably more flexible way than the adaptive scheme of Dye & Warren (2005).
Each source plane cluster is the equivalent of a source plane pixel in the sense that all pixels in the image which belong to it are assigned the same surface brightness. However, unlike previous adaptive schemes, the pixels are completely arbitrary in shape and are not forced to adhere to any prescribed geometric forms which may bias the lens reconstruction. Furthermore, minimisation of cluster energies given in equation (2) ensures that the clusters are contiguous and do not overlap in spatial extent with neighbouring clusters. To simplify plotting of reconstructed sources with this scheme, in this paper we approximate clusters as Voronoi cells whose centres are the cluster centres. We stress that this is purely for visualization and not a feature of the adaptive gridding scheme.
We also employ sub-gridding of the image plane, which splits each image pixel into a set of square sub-pixels. The centres of these sub-pixels are then traced to the source plane for the clustering algorithm and inversion, rather than the centres of the full pixels. This increases the workload 1 of the clustering algorithm but removes pixel aliasing effects from the overall inversion which as we discuss later, are problematic when recovering lens model parameters. Note that this scheme achieves direct sub-gridding of the image unlike the reverse approach of Treu & Koopmans (2004, see their Appendix B) who bilinearly interpolate the source plane for each full image pixel traced there. Throughout this paper, we divide image pixels into 4 × 4 sub-pixels except where otherwise stated.
In addition to sub-gridding, we also mask the observed image to remove background noise. The goodness of fit is then computed only for pixels within this mask. The removal 1 The increase in clustering workload results in an insignificant increase to the duration of one full iterative step when optimizing lens model parameters.
of background noise reduces the number of co-ordinates fed to the clustering algorithm. This ensures both greater efficiency and that a larger fraction of source plane pixels are dedicated to relevant regions in the image. We note that such tight masking can, in principle, result in additional extraneous images being masked out and thus incorrect solutions being deemed acceptable, although in practice this can be easily checked by computing final unmasked lens model images.

Source Plane Regularization
Regularization adds an additional linear term, GL, weighted by a scalar, λ, referred to as the regularization weight, to the merit function such that G = χ 2 + λGL. In essence, this acts like a prior by more heavily penalizing reconstructed sources which are less smooth. In this way, regularisation suppresses over-fitting to the image noise. The solution vector s of source pixel brightnesses is then where H is the regularization matrix which relates to the second derivative of GL as detailed by WD03. The regularity of a square grid makes regularisation very straightforward, but with an adaptive grid, this is less so. We opt to use a nearest-neighbour regularization scheme of the form where we use the h-cluster centers to determine the Nnn nearest neighbours for each cluster. We set Nnn = 3, although we find results are practically unchanged provided Nnn > 1. This scheme computes the difference in surface brightness between neighbouring source pixels, analogous to gradient regularization for a square grid. Different approaches to source plane pixelization have led to a variety of regularization schemes appearing in the literature. For example, the Voronoi grid of Vegetti & Koopmans (2009) uses a scheme analogous to curvature regularization whilst Tagore & Keeton (2014) present multiple schemes for their square grid adaptive mesh, including one which imposes a Sersic light profile on source reconstruction. In comparison, our scheme is more simplistic, although we find it is perfectly adequate for the test cases we investigate in this work. A feature of our approach is that although we regularize a given pixel with Nnn neighbours, it may happen to regularise < Nnn or > Nnn other pixels itself. Nevertheless, any bias that this might introduce is averaged away by the random nature of the source plane discretization as we show in Section 3.3.2.

Model Optimization
The SLI method was placed within a Bayesian framework by Suyu et al. (2006, S06 hereafter), allowing the regularization weight to be set automatically by the data. Adaptive SLI uses this Bayesian wrapper in the form derived by Dye et al.  Figure 2) using the true lens model except the velocity dispersion, σ, which is varied over ±0.5 kms −1 about the true value. Square SLI (dashed line) gives a smooth and relatively noiseless evidence. Adaptive SLI (solid lines) gives rise to a much noisier and fluctuating evidence, owing to its constantly changing data discretization. The evidence is noisy both with (grey line) and without (black line) sub-gridding, although sub-gridding acts to slightly reduce the noise.
(2008) and given by where is the Bayesian evidence which is maximized.
There are three levels of inference in our model optimization. At the first level the model is assumed true (i.e., mass model and λ are fixed) and we solve for the source surface brightnesses that best fit the observed image, as described in section 2.1. The second level finds the hyper parameter, λ, which maximizes the evidence for a given set of lens model parameters. This normalizes the posterior probability distribution in both lens and hyper parameters to give the most probable solution. Dye et al. (2008) set a second hyper parameter at this level, the 'splitting factor', which determines the magnification threshold beyond which source pixels are split into finer pixels. In principle, we could introduce an additional hyper parameter to our adaptive SLI to mimic this behaviour, for example, the source grid resolution. However, since this reduces computational efficiency and is degenerate with regularisation weight, we opt not to implement it in the present work. The third and final level of inference then maximizes to calculate the most probable lens model. This is a search of the posterior distribution of the lens parameters. We adopt the approximation of S06 by assuming that the probability distribution for λ is a delta function which permits direct comparison of evidence between models. In the present work, we use this three tier approach for model optimization in the case of both square and adaptive SLI. In some instances, when demonstrating the effects of fixing the regularization weight, we instead minimize the more basic merit function G = χ 2 + λGL. This is simply motivated by the fact that the other terms in the evidence, as given in equation (5), remain constant in this case.

Data discretization
Square SLI uses a fixed source plane grid, giving a nonlinear parameter space which is smooth over small scales. A small perturbation to the lens parameters gives only a small change in . This is because the source plane co-ordinates, (x, y)j, of traced image pixels remain nearly identical; the way the data is discretized, i.e., the allocations between image and source pixels fij, also remains nearly identical. Source and image reconstruction then proceeds essentially unchanged, leading to only a fractional change in .
With adaptive SLI, small perturbations to the lens parameters have a far more pronounced effect. Although the coordinates (x, y)j are nearly identical, their use in calculating the initial cluster centers is randomized. This means that even a tiny perturbation to these co-ordinates gives a different set of initial cluster centers. Given a different initialization, the clustering algorithm will then ultimately calculate a completely different set of final clusters. While source and image reconstruction will remain similar given the lens model only changes minimally, a relatively large change in is still possible. This is because for each lens model, adaptive SLI performs data discretization in a different and unique way, giving fij matrices that are potentially very different.
Adaptive SLI is seeded such that identical lens parameters derive the same set of clusters, F matrix and therefore . However, initialization is different for perturbations to the lens parameters of the order of just one part in 10 5 , a scale far smaller then that which model optimization probes. Therefore, in the context of a full non-linear search, the inversion of every lens model uses a set of clusters which always discretize the data in a different, unique and unrelated way.
The notion of data discretization being unrelated to previous discretizations turns out to be vitally important. This property is not present in previous methods since in these methods, pixelization is computed in a manner that is deterministic and/or smoothly varying with lens model parameters. A consequence of this for our adaptive scheme, however, is that the non-linear parameter space is very noisy, making determination of convergence and parameter marginalization challenging. This is illustrated in Figure 1. The figure shows the variation of evidence with velocity dispersion, σ, computed using the setup for simulated image 1 described later, keeping all other lens parameters fixed. We show this variation for square SLI and adaptive SLI with and without image sub-gridding. Square SLI gives rise to a relatively smooth evidence surface as σ is varied. However, in the adaptive SLI case, large jumps in occur even over the very small steps in σ of 1 ms −1 plotted, as a result of the unrelated data discretization with each step. Image sub-gridding slightly lessens the size of the jumps by largely removing pixel aliasing effects but the discretization effect remains present.

Optimization Practicalities
Our non-linear search must be suited to sampling this noisy and fluctuating parameter space. This is something traditional MCMC searches are not equipped to deal with, owing to their reliance on a walk up a relatively smooth like-lihood surface. Therefore, we instead use the MultiNest algorithm (Feroz & Hobson 2007;Feroz, Hobson & Bridges 2009), which implements the nested sampling Monte Carlo technique of Skiing (2006). The algorithm initially generates a set of live points within parameter space which probe the smoother large scales to map out the high evidence regions. The lowest evidence points are subsequently replaced iteratively, resulting in convergence towards high evidence regions where noise slowly becomes more prominent. We find that this is a very efficient way to handle the noise within our parameter space and with a sufficient number of live points, MultiNest accurately determines the most probable solutions. We use MuiltNest to perform model optimization with both square and adaptive SLI, the latter requiring comparatively more iterations to complete given its noisy parameter space.
A well recognised problem when performing model optimization of strong lens data is the existence of unwanted solutions which correspond to either an over or under estimated lens mass. In both cases, the reconstructed source resembles a demagnified version of the observed lensed image rather than a much more compact source at the correct solution. While regularization ensures that the evidence of these solutions is well below that of the global maximum, they occupy a large volume of parameter space which MultiNest can waste significant time exploring. We therefore apply a set of coupled priors which ensure that these incorrect solutions are not accessible to MultiNest. We determine these with a grid search over the lens parameters α, the axis ratio, q, and the velocity dispersion, σ, to identify the distinct and isolated regions where the incorrect solutions lie. Initial MultiNest sampling then uses randomized triplets of σ, q and α drawn from this region, with care taken to ensure the entire volume of this region is sampled and no viable solutions are trimmed or lost. After MultiNest has run for a while and achieved a specific accuracy, it switches to elliptical sampling mode where the current live points create an ellipsoidal sampling contour in parameter space and σ, q and α are then drawn instead from these contours.
Once sampling is complete, the set of all accepted points, including the current live points, then map out the evidence surface in parameter space. Each parameter is marginalized over in one dimension to calculate their posterior probability distribution, of which the median is computed to give final parameter estimates and their combination the most probable lens model. Errors presented correspond to 1σ confidence bounds, i.e., the 16th and 84th percentile of the posterior probability distribution.

COMPARISON OF SQUARE AND ADAPTIVE SLI
In this section, we demonstrate the advantages of the adaptive SLI method over conventional square SLI. In particular, we emphasise a variety of reconstruction biases inherent with square SLI which adaptive SLI eliminates. Our comparison makes use of the SPLE lens model which has a volume mass density profile of the form with a variable power-law index, α, and fixed normalisation ρ0r α 0 . We also use a SIE lens profile achieved by fixing α = 2. Deflection angles are computed using the formalism of Keeton (2002) which uses an equivalent velocity dispersion parameter, σ, for lens mass normalisation, relating to the Einstein radius, b, by σ = (bc 2 /4π)(Dos/D ls ), where Dos and D ls are the angular diameter distances from the observer to the source and from the lens to the source respectively. The SIE model has a total of five free parameters; the co-ordinates of the lens centre, (x, y), the velocity dispersion, σ, the axis ratio, q (semi-minor axis/semi-major axis), and lens rotation angle, φ, defined counter-clockwise from the y axis. The SPLE model has the additional sixth parameter, the density slope, α.
We use two synthetic lensed images in our comparison. These are generated using a SPLE lens model with parameters (x, y) = (0, 0), σ = 285 km/s, q = 0.8, φ = 45 • and α = 2. Both images have a pixel scale of 0.048" and are convolved with a Gaussian point spread function (PSF) with full width at half maximum (FWHM) of 0.13". A Gaussian noise map to mimic fixed read noise is added to both images, giving a total signal to total noise ratio of 116 and 104 in the masked regions of image 1 and 2 respectively. Sources are modeled as 2D symmetric Gaussians with FWHM = 0.071". The source of image 1 is centred exactly on the optic axis, giving an Einstein cross image, whereas in image 2, the source is positioned just above the top right cusp of the inner caustic, giving a standard cusp-caustic image. We place the source at a redshift of z = 3.0, the lens at a redshift of z = 0.3 and the cosmological parameters assumed are Ωm = 0.3, ΩΛ = 0.7 and h = 0.7. Figure 2 shows both simulated images, their masks and corresponding simulated sources.
It is likely that accurate modeling of these images is more challenging than the majority of real lensed images. As Lagattuta & Vegetti (2012) discuss, the typically more irregular distribution of light in a real source gives rise to a less degenerate lensed image which in turn allows tighter constraints on the lens modeling. Moreover, our S/N and image resolution are selected to be lower than that presently achieved in the highest quality strong lensing data sets. In addition, multi-wavelength observations of strong lenses are becoming commonplace and through their simultaneous analysis, offer lens modeling of even greater accuracy (e.g., Dye et al. 2014). Our results therefore offer a fairly conservative view of what can be achieved with higher quality and more comprehensive imaging but nevertheless provide a suitable basis upon which to make our comparison of inversion methods.

Source Plane Pixelization
The top row of figure 3 shows the source plane pixelization and reconstruction of image 1 using the input lens model for both square and adaptive SLI. There are clear disadvantages with the use of a fixed grid in square SLI: (i) The grid is set up prior to the inversion, meaning that both the grid size and position are not determined by the lens model and that it may align poorly with the intrinsic distribution of source light; (ii) The fixed pixel area results in a highly varying magnification between source pixels such that a sub-set of pixels will dominate the inversion and unevenly spread the uncertainty; (iii) Outer source pixels may not map to any image pixels and are then constrained solely by regularization. By contrast, adaptive SLI's pixelization matches the lens model magnification and completely removes the need to specify a source plane size or location. Furthermore, uncertainty between source pixels is more evenly spread and there are no source pixels constrained solely by regularization.
The bottom row of figure 3 shows two additional source reconstructions performed on the adaptive grid. Both use the same lens model as before apart from a tiny perturbation to the velocity dispersion of ±1 m/s (i.e., a fractional change of 3.5 × 10 −6 ). All three adaptive pixelizations are globally similar, as expected given the almost identical magnification pattern, but upon closer inspection, it is apparent that the source pixels have significantly different centres, sizes and shapes. We stress again that we use Voronoi cells to simplify plotting whereas the underlying source pixelisation is set by clusters of traced image pixels. The figure serves to illustrate the effect described in section 2.3.1, whereby adaptive SLI derives a completely different set of clusters for every lens model and therefore always discretizes the source plane data in a unique and unrelated way.

SIE Lens Parameter Estimation
Our next aim is to investigate how well the square and adaptive SLI methods can recover the parameters of the input lens model used to generate our simulated images. In this section, we consider the simpler case of the SIE model, i.e., we fix the density slope to α = 2 to match the input value.
In this case, the non-linear search has five free parameters which we limit with top hat priors to reduce the search volume of parameter space: x and y (each with priors −0.05 to 0.05 ), σ (prior 260 km/s to 305 km/s), q (prior 0.7 to 0.9) and φ (prior 40 • to 50 • ). For the square SLI, we set the source plane grid to a size of 0.5 × 0.5 with 20 × 20 . Top row -A comparison of square and adaptive SLI pixelizations for image 1 using the input lens model. Bottom rowtwo additional adaptive SLI pixelizations using the input lens model but with σ perturbed by +1 m/s (left) and -1 m/s (right). The reconstructed source for each inversion is overlaid. The square grid has a resolution of 20 × 20 pixels and size 0.5 × 0.5 . The adaptive grid uses a resolution of 300 pixels and an arbitrarily large size to contain all traced image pixels (but shown here at 0.64 × 0.64 for comparison). All four inversions use image subgridding of 4 × 4; each dot shows the traced location of each image sub-pixel. Note that source pixels are more organically shaped than the Voronoi cells which we display purely for clarity.

SPLE Lens Parameter Estimation
In this section, we apply square and adaptive SLI to our simulated images with the more general SPLE lens profile. We use the same inversion setup as in Section 3.2 and the same priors on the lens position and φ. We use a tophat prior on α over the range 1.75 to 2.25 and we calculate σ and q using randomized triplets as described in section 2.4.
Our initial run using square SLI finds α = 2.041 +0.026  . Top row -reconstruction of image 1 with square SLI using the most likely lens models given in table 2 with α fixed to 1.9, 2.0 and 2.1. Middle row -the corresponding reconstructed source obtained from square SLI (resolution 20 × 20, size 0.5 × 0.5 ; caustic overlaid). Bottom row -The same lens reconstruction but using adaptive SLI (200 source pixels). Both inversions use image sub-gridding of 4 × 4. The geometric scaling of the source plane with α is immediately clear. N i is the number of traced sub-gridded image pixels within the source plane. N Pix is the number of source pixels with a count above 1.5× the background noise, highlighted by a bold pixel edge.
results are inconsistent with the input lens model. As we discuss below, this failure is due to degeneracies within the SPLE lens profile and biases within the square SLI method itself.
In terms of degeneracies, the SPLE model has the well documented mass-slope degeneracy, whereby a more centrally concentrated mass distribution (i.e., higher α) and a lower overall lens mass normalisation produces a similar lensing effect (and vice versa). The net result of this is a geometric scaling of the source plane, such that the source expands for increasing α. This is similar to the fully degenerate mass-sheet transformation (MST), where a transformation of the lens mass alongside a geometric scaling of the source plane produces an identical set of observables. In the case of the SPLE lens, as Schneider & Sluse (2013) show, the degeneracy due to the MST is broken. Whilst this ensures that every lens model gives a unique solution, a strong degeneracy is still present and our lens parameterisation is such that this degeneracy is also tied in with the axis ratio parameter, q. We refer to this three-way degeneracy as the 'σ − q − α degeneracy' hereafter. To illustrate this degeneracy we fit a SPLE lens profile to image 1 using square SLI, with α fixed to 1. though square SLI has the sensitivity to find the maximum evidence, the presence of even minor systematic biases within this degeneracy will push the solution a long way from the true parameter set. Such biases arise due to the arbitrary manner in which the square source plane is gridded as well as features inherent to the method itself. In the following subsections, we describe these biases in detail and demonstrate how adaptive SLI removes them.

Bias 1: The number of traced image pixels
In our preceding square SLI example, we used a source plane size of 0.5 ×0.5 as a compromise between being sufficiently large to encompass the source light and small enough not to compromise the source plane resolution. A consequence of this is that a significant fraction of image pixels trace back to locations outside the source plane. Some of these image pixels which trace to just outside the edge of the source plane still constrain the source due to PSF convolution and sub-gridding, but many will not, and yet all image pixels contribute to the χ 2 term in equation (5). As Vegetti & Koopmans (2009) point out, as the lens model is iterated during optimization, the number of image pixels which trace to a point within the defined source plane area can vary. The resulting variation in the number of degrees of freedom (NDOF) causes problems for model inference if not taken into account.
To explore the effect of varying the NDOF, we use a fixed SPLE model and vary the number of image sub-pixels, Ni, that trace to within a circular source plane aperture centred on the reconstructed source. By varying the radius of this aperture and ignoring traced image pixels outside it, (i.e., we set a null image for all exterior source pixels in fij described in Section 2.1), we vary Ni. The number of source pixels are kept constant so as not to contribute to the changing NDOF; those pixels not within the aperture remain constrained by regularization. This mimics the effect of image pixels tracing outside a regular source plane but without the added complication of varying the parameterization of the problem.
We first reconstruct image 1 using the input lens model and investigate how λ and ln scale with Ni by changing the source plane aperture radius. The χ 2 term in the evidence is calculated in the same fixed image mask for each aperture radius. Figure 5 shows the range of mask radii we use.
The results plotted in figure 6 show that as Ni increases, the regularization weight (selected by maximizing the evi-  (4) is not used and λ is fixed. A reduction in N i results in a higher χ 2 and higher overall merit function χ 2 + λG L dence) also increases. However, the evidence falls with increasing Ni and thus lens optimizations which allow the number of image pixels that trace to the source plane to vary are biased towards lower magnification parameter sets. This is a manifestation of the σ − q − α degeneracy.
To understand this behaviour at a more fundamental level, we investigate this further in figure 7 by plotting the variation of two components of the evidence as expressed in equation (5). The first component is simply χ 2 , which as expected, reflects poorer image reconstruction by becoming larger as Ni is reduced. The second component we plot is the sum of the second, third and fourth terms in equation (5) which together account for the regularization dependence of the evidence. The figure shows that despite poorer image reconstruction, the overall behaviour of the evidence is dominated by the more rapidly varying regularization terms (note that the summation of both terms determines the behaviour of −2 ln since the last term in equation (5) is constant).
Since the evidence appears to cause a bias towards solutions with fewer traced image pixels, we repeat this analysis with a fixed value of λ and the non-Bayesian merit function G = χ 2 + λGL, originally advocated by WD03. The results are plotted in figure 8 which shows that, in exactly the same way that χ 2 in figure 7 behaves, the bias is now present in the opposite sense, whereby solutions favoured are those with the highest value of Ni. This is more intuitive given that image reconstruction should be more accurate when given every possible data point. This demonstrates that regardless of whether the evidence is used or not, lens modeling with a varying NDOF will be biased to solutions which either minimize or maximize Ni and accurate lens modeling with the SPLE profile therefore requires this to be fixed throughout model optimization.
In the case of the SIE profile, while this bias is still present, the σ − q − α degeneracy, which allows the source to geometrically scale, is not. Therefore, even though square SLI allows the NDOF to vary, only parameter sets near the true SIE model actually give accurate image reconstructions. ∆α Image 1 Image 2 Figure 9. α bias, ∆α = α calc − αtrue, plotted against source plane size for both images 1 and 2. All points are generated using a full MultiNest nonlinear search.
The variation of the χ 2 term in the evidence for the SIE profile dominates the variation of the regularisation terms.
In the case of the SPLE profile, the evidence coupled with the σ − q − α degeneracy tries to minimize Ni which corresponds to positively biasing α thus explaining the incorrect lens models initially found. Of course a cut-off will be reached when the expanded source covers the entire source plane and the increase in the contribution from the regularization terms in the evidence from reducing Ni is offest by the more rapidly increasing χ 2 term as source light is lost and the observed lensed image becomes poorly fit. One would therefore expect a relationship between the source plane size and the bias in α, ∆α = α calc − αtrue. Reverting back to the Bayesian merit function, a larger source plane allows solutions corresponding to higher α to be attained before the source expands outside the source plane boundary. Figure 9 confirms that ∆α does increase with increasing source plane size as expected. This continues until a turnover point when the source plane starts to become larger than the maximum extent of the source expansion allowed by the σ − q − α degeneracy. Just beyond the turnover point are intermediate values where Ni still varies but the source plane becomes large enough to lessen the bias. The turnover occurs at different source plane sizes between images 1 and 2 due to differences in the source position with respect to the image caustic and thus differences in Ni.
The requirement that the NDOF must be fixed during lens optimization is naturally satisfied by the adaptive SLI method since every source plane pixellization is derived directly from the magnification map and is independent of the source plane size which keeps Ni fixed. Other inversion methods found in the literature have also identified this problem (for example, Vegetti & Koopmans 2009;Suyu et al. 2013; Collett & Auger 2014), although the level of efficacy with which it has been managed is somewhat variable. Nevertheless, as we discuss below, there are other more pertinent biases which have not been fully appreciated and which are explicitly addressed by adaptive SLI.

Bias 2: Discreteness Biases
Following our discussion in the previous sub-section, we fix the NDOF hereafter when using square SLI, by increasing the source plane size to 0.7 × 0.7 . We also narrow the priors on the lens offsets to ±0.02 and on φ to ±2 • to reduce caustic movement and we use a prior on α of ±0.15 to limit the largest caustic size. These changes are applied to the inversion of both images 1 and 2 and we maintain a source plane resolution 20 × 20 pixels. As discussed, these changes to the inversion setup lead to a far less efficient and less robust inversion, demonstrating the disadvantage of using a fixed grid which is initialized independently of the lens model. Results presented using adaptive SLI retain the wider priors given at the beginning of this section, i.e., those used for the SIE profile and α between 1.75 and 2.25.
Despite these changes, SPLE modeling with square SLI continues to be both inaccurate and dependent on the way in which the source plane is pixelized. Similar findings have also been made by Suyu et al. (2013) who found that the dominant systematic uncertainty in their lens modeling is source plane resolution and Tagore & Keeton (2014) who showed that the results of a SIE lens model change if the setup of either the observed image (noise map, telescope pointing) or source plane (regularization scheme) is varied. Clearly, these systematic effects must be eradicated in order to robustly determine lens model parameters and their error distributions.
The importance of data discretization was, in fact, already highlighted in figure 1, where adaptive SLI's noisy parameter space is a direct consequence of changing just the source plane discretization. The figure shows that a change in the data discretization can give a relatively large change in the value of ln 10. The scale of this change is comparable to the scale over which solutions within the σ − q − α degeneracy vary and therefore selecting a single source plane discretization with square SLI will generally give rise to a significantly biased solution. We hereafter refer to this effect as 'discreteness bias'.
We test for discreteness biases by calculating lens models for images 1 and 2 with square SLI, using the setup described above to fix the NDOF. However, for each lens model, we shift the position of the square grid by a fractional pixel width. This gives a small change in the data discretization such that there is a minor shift in the overall allocation between image and source pixels for an identical lens model. For each image we perform modeling using nine different phase shifts, with sub-gridding off and with subgridding of 4 × 4. Every nonlinear search uses 200 active MultiNest points.
The results are shown in table 3. The third and fourth columns show the value of α obtained for each image without sub-gridding. Phase shifting of the square grid directly impacts the value of α estimated, with several of the values of α being significantly biased for both images. This demonstrates the effect of discreteness bias. For comparison, in the last row of the table we also show the results of adaptive SLI using 200 clusters, an arbitrarily large source plane and 300 active MultiNest points. As the table shows, adaptive SLI also calculates an incorrect lens model without sub-gridding because pixel aliasing effects in the image dominate the inversion. Like Tagore & Keeton (2014), we also find that changing the synthetic image noise realization changes the resulting set of most probable lens parameters obtained if image sub-gridding is not used.
The results of removing pixel aliasing by applying image sub-gridding are shown in the fifth and sixth columns of table 3. The values of α obtained from image 1 with square SLI are now consistent with the input lens model, although an element of scatter still indicates the effect of source plane discretization. However, α obtained with square SLI from image 2 continues to be significantly discrepant with the input value, with many values inconsistent at the 3σ confidence level. Applying sub-gridding with adaptive SLI returns accurate values of α for both images.
We represent these results graphically in figure 10, where the marginalized one-dimensional posterior distribution function (PDF) of the sum of all nine phase shifts for square SLI, marginalized over α, is plotted for both images with a thick black line. Alongside this, the PDF of each individual phase shift is also shown, scaled by 1 3 for clarity. The PDF given by adaptive SLI is shown with a dashed black line. The figure shows that although the summed PDF for image 1 is consistent with the input value of 2.0, in image 2, this is much less so. Since narrower lens model priors were introduced for square SLI in this section, the PDF for α for image 2 falls off more rapidly towards α = 2.15 which artificially lessens the inconsistency that would have otherwise been observed. Weighting the calculation of the summed PDF by evidence gives near-identical results owing to the small difference between the evidence values of each individual PDF.
Conversely, as table 3 and figure 10 show, adaptive SLI is accurate for both images. The fact that data discretization is different and unique for every trial lens model parameter set, regardless of the size of the change in parameters, underlies the reliability of the method (see appendix A for a more detailed discussion). In addition, figures 12 and 13 show the two-parameter adaptive SLI PDFs for images 1 and 2 respectively. In each of these figures, we also show the results of a higher noise run where the S/N was lowered to 70 in each image. The figures show the σ −q −α degeneracy previously discussed and also that parameter errors increase for poorer quality data. This effect is much less obvious with square SLI due to the inherent discreteness biases present. Figure 10 shows the PDF of individual square SLI runs are generally both narrower and more sharply peaked than the PDF given either by their summation or adaptive SLI. Discreteness biases lead to higher evidence values being calculated at the favoured lens model, resulting in significant error under estimation. While this can be alleviated by the summing of multiple inversions with differing methods of data discretization, as was done in Suyu et al. (2013) and figure 10, this still only provides an approximation of the errors and as shown for image 2 may still ultimately give a biased lens model. In appendix A we find error underestimation occurs if we fix adaptive SLI's initialization, thus reintroducing discreteness biases. Our general conclusion is that without fully accounting for all systematics associated with data discretization, a comprehensive estimation of all the lens models associated errors is not possible. Moreover, by accounting for these systematics adaptive SLI permits a more accurate calculation of the marginalized evidence improving the prospects for accurate model comparison.
In addition to the effect of lowering S/N, we also investigate the effect of changing source plane resolution. We use the same phase shifting methodology as previously but now use a higher source plane resolution of 36 × 36 pixels with square SLI. In this case, to improve efficiency, we calculate λ by maximizing (5) for each phase using the input lens model and then keep λ fixed to that value for the entire inversion. We then minimize the basic merit function G = χ 2 + λGL instead of the evidence. Of course this approach can not be used for real observations although we discuss strategies for improving efficiency with real data in section 3.3.3. Figure 11 shows the resulting PDFs obtained with the higher resolution reconstruction. The most striking feature is that they show a much broader PDF. This is a consequence of removing the Bayesian wrapper and fixing the regularisation weight. The variation in the PDFs for each source grid phase indicates that discreteness biases are still present, however, the presence of an additional bias discussed below in section 3.3.3 leads to greater consistency amongst their overall α estimation. A cut off at the upper α prior is also seen, showing the disadvantage of being forced to narrow priors to ensure the NDOF remains fixed. It is clear from this test that discretization biases remain present regardless of the change in source plane resolution.

Bias 3: Effective Source Resolution
The third and final bias is again relevant for the SPLE model and also more generally, those with a degeneracy that allows geometric source scaling. The degeneracy arises with any fixed source plane pixelization if the regularisation weight is not correctly optimized by finding the maximum evidence with each iteration, or if a fixed regularization weight is used.
To demonstrate this effect, we use square SLI with image 1 and a SPLE lens model fixed with the input set of parameters. We vary the source plane resolution in steps from 12 × 12 to 28 × 28 and reconstruct the source with each resolution, keeping the source plane size fixed. With a fixed source plane size and fixed lens model, the number of image pixels traced to the source plane also remains fixed and hence this test is not sensitive to bias 1 where Ni varies. We investigate how the figure of merit χ 2 + λGL varies with source plane resolution when λ is fixed at the optimal value for a 20 × 20 pixel source plane and then how the full evidence varies when λ is optimized for each resolution. Figure 14 shows the results. When λ is fixed, the figure of merit improves near-monotonically to higher source plane resolutions. When λ is optimized, the evidence remains more constant (modulo the variation due to data discretization effects previously discussed). This shows that fixing λ biases lens models with degeneracies which allow the source to geometrically scale on a fixed resolution grid. The reason for this is because the number of source pixels representing the significant source flux varies, thus mimicking a changing source resolution. We illustrate this in figure 4 for the SPLE model where we outline in bold those source plane pixels that have a flux that is a factor of 1.5 above the read noise. The total number of these pixels is labelled by NPix in the figure. NPix can be considered a measure of the effective source plane resolution for reconstruction. In this way, the figure shows that square SLI has a varying effective source Variation of ln and −(χ 2 + λG L )/2 with source plane resolution for square SLI. In the case of ln , λ is optimized for each source plane resolution by maximizing the evidence. For the figure of merit −(χ 2 + λG L )/2, λ is fixed at the optimal value for a source plane resolution of 20. The source plane resolution corresponds to the number of pixels along one edge of a square grid of pixels for a fixed source plane size of 0.7 × 0.7 . plane resolution with different SPLE lens model parameterizations within the σ − q − α degeneracy.
The expectation with the SPLE model is therefore that fixed source pixelizations with non-optimized regularization weights bias lens model parameters to high values of α where the effective source resolution is increased. This behaviour is clearly seen in figure 11 for fixed λ where the PDFs for each source grid phase shift unanimously agree on a value of α that is higher than the input value. We note that the behaviour is also seen to a lesser extent in figure 10 where λ is optimized and yet there remains a bias in the summed PDF for α. The fact that this is not seen with adaptive SLI indicates that the optimization of α with each lens model iteration when using a fixed grid does not completely remove the effect (and figure 14 gives a hint of this).
The bottom row of figure 4 shows that by adapting to the magnification, adaptive SLI retains a more equal effective source resolution for all lens models. While NPix still varies, it fluctuates about a mean value and ultimately the random nature of the pixelization between iterations of the lens model average away any resulting systematics, in the same manner in which discretization biases are removed. Source pixelization schemes suggested in the literature which adapt to the lens model by scaling the size of the source plane according to the caustic size will remove the effective resolution bias to first order, but it is possible that the bias is not completely removed. This is most likely to occur when the caustic is non-symmetric, for example, in the presence of external shear, however this is something we have not investigated in the present work.

Lens Model With Fixed Regularization
We have shown how adaptive SLI, when optimizing λ with each lens model iteration, removes a bias that occurs with the SPLE model when using a fixed source plane pixelization. What we have not considered is how well adaptive SLI copes if λ is not optimized with every lens model iteration.  Table 3. The values of α estimated using square SLI on a set of nine phase shifted grids. Each result corresponds to an individual nonlinear search with the NDOF fixed for every lens model. The third to sixth columns the source plane is size 0.7 × 0.7 and resolution 20 × 20 pixels. The first third and fourth columns are results for images 1 and 2 without image sub-gridding and the fifth and sixth apply to sub-gridding of 4 × 4. The seventh column corresponds to a high resolution, 36 × 36, square grid for image 2 with sub-gridding of 4 × 4 and fixed λ set by maximizing equation (5) and merit function G = χ 2 + λG L . The bottom two rows show the marginalized value of α when summed over all 9 phase shifts and the value given by adaptive SLI. The PDFs are plotted in figures 10 and 11.

N/A
A reduction in the rate of the number of λ optimizations per lens model iteration has the advantage of a significant increase in modelling efficiency, since this reduces the number of times the computationally expensive determinants in equation (5) must be evaluated.
To test the feasibility of adaptive SLI with such a reduced rate of λ optimization, we carry out a simple test where we use adaptive SLI with a fixed λ and the same inversion setup as the previous section. For both images 1 and 2 we perform lens modeling three times, each using a value of λ of 0.5, 1.0 and 1.5 times the optimal value found by maximizing the evidence.
The results are shown in table 4. All values of α are consistent with the input lens model and minimal scatter is seen between results using different λ. From a modelling efficiency point of view, this is very encouraging, demonstrating that a reduced rate of optimizing λ is feasible. An unexpected result is that the errors are lower than those obtained when optimizing λ with each lens model iteration. We believe that this is a result of the probability distribution function for λ not being the delta function that the Bayesian formalism in equation (5) assumes it to be, but we leave a more thorough investigation for future work.
Both Vegetti & Koopmans (2009);Collett & Auger (2014) employ this strategy, although given real data is used their initial lens model is estimated using a fixed λ corresponding to over regularization. This model then maximizes equation (5), giving a new λ which is held fixed to estimate the final lens model. This process therefore performs two non-linear searches with the Bayesian wrapper off, which is only used once to optimize λ after the initial run. While this strategy is clearly worthy of future investigation for adaptive SLI, its handling of data discretization means it performs lens modeling at comparatively lower source plane resolutions anyway, for which the optimization of λ for every set of lens parameters remains fast to compute and therefore viable. While this should be generally be preferred when  Table 4. Marginalized α for images 1 and 2 using the merit function G = χ 2 + λG L with adaptive SLI. λ is fixed to 0.5, 1.0 and 1.5 times the optimal value, λopt, set by maximizing the evidence.
possible, provided the aforementioned approximations are not dominant, a hybrid of both strategies may be best when the analysis of large data sets or high resolution images is considered. This will be of major consideration in the development of adaptive SLI into a streamlined strong lens analysis pipeline.

SUMMARY AND DISCUSSION
We have presented adaptive semi-linear inversion (SLI), a new method for the inversion of gravitationally lensed extended sources. The source plane pixelization is determined by clustering the coordinates of all traced image pixels with an h-means algorithm, deriving a source plane pixelization which matches the lens magnification for every lens model parameter set. The distinguishing feature of adaptive SLI is that it does this using a random initialization and therefore the discretization of source plane data is handled in a completely different, unique way for every lens model. The method then efficiently samples the underlying posterior distribution of degenerate lens models while naturally accounting for systematics which otherwise bias lens modeling. We  have demonstrated this unique feature using the specific example of a SPLE lens model.
In this paper, we have compared adaptive SLI with the standard semi-linear inversion method of WD03 which uses a fixed square grid of pixels to discretize the source plane. In this comparison, we have used two realistic simulated images to highlight the benefits of a source plane pixelization which adapts to the lens magnification. Our selection of the standard SLI method of WD03 for comparison with adap-tive SLI was for simplicity, but many of the consequences arising from use of a square grid apply to use of fixed source pixelizations generally.
We have discussed two key biases inherent to pixelized inversions and we have demonstrated how adaptive SLI removes them. These are: Adaptive SLI -α = 2.0188 +0.0550 −0.0503 Figure 11. As figure 10 but showing the PDF of α given by square SLI using 9 high resolution (36 × 36) phase shifted grids on image 2. The Bayesian wrapper of S06 is used to initially set λ but is then switched off for the inversion. Results correspond to the seventh column of table 3. The PDF using adaptive SLI is that given in the bottom panel of figure 10. The figure shows that a higher resolution source plane does not remove discretization biases. model iterations, due to image pixels tracing outside the source plane, try to either maximize or minimize the NDOF. Such extremes in the NDOF are achieved by lens model parameters which lie a significant distance from the correct parameters. Although a fixed NDOF can be ensured with any pixelized inversion method, this typically results in a less efficient and robust inversion as well as the requirement that priors on lens parameters are narrowed. The pixelizations calculated by adaptive SLI match the magnification pattern of the lens model, allowing the NDOF to be fixed without suffering the loss in performance of other methods.
(ii) The use of a fixed source plane pixelization, for example, but not limited to, a square grid of pixels, generally gives rise to a biased set of model parameters with the SPLE lens. We demonstrated this by phase shifting a square grid by fractional pixel widths and showing that each phase gives a significantly different and generally biased lens model. By deriving the pixelization of every set of lens model parameters in an always different and unique way, adaptive SLI naturally explores all systematics associated with data discretization and thus removes all associated biases. We investigated the use of adaptive SLI with a fixed, not random, initialization (see appendix A), in line with some existing inversion methods and found that the biases not only persisted but were in fact amplified compared to square SLI, although we note that the severity of the effect will vary significantly depending upon the exact implemention.
Through its removal of discretization biases, adaptive SLI gives accurate and robust sampling of the lens model posterior probability distribution function with just one nonlinear search, something we believe is not possible with existing methods. We demonstrated this by fitting the highly degenerate SPLE to our two simulated images. By design, adaptive SLI copes well with highly degenerate profiles like the SPLE and it therefore offers strong potential for the fitting of more sophisticated profiles, for which parameter degeneracies are even more challenging. For example, models which decompose the mass into a baryonic component and a dark matter component possess a strong degeneracy between the two.
Of course, several methods in the literature use a source pixelization that depends on the lens model. However, as discussed in section 2.3, none use the random initialization of adaptive SLI. In this way, they therefore calculate related pixelizations, restricting them, like square SLI, to a specific subset of possible data discretizations which will therefore not fully remove the discretization effects we have shown.
A future goal of adaptive SLI is its development into a strong lensing analysis pipeline. The number of detected strong lens systems is presently undergoing a period of acceleration which is set to continue in the foreseeable future with surveys such as DES, LSST and Euclid. Accordingly, robust lens modeling techniques which offer more standalone functionality, reduced user setup and higher efficiency are becoming increasingly sought after. As demonstrated in this work, the robustness of adaptive SLI gives the necessary strong foundation upon which a lens modeling pipeline can be built.
The biases outlined in this paper, most noticeably those related to data discretization, are just one example of the systematics which can dramatically affect the results of lens modeling. There are many more systematics which have not been fully investigated, not limited to issues such as PSF accuracy, the use of over-simplified parametric lens profiles, the impact of image quality and intrinsic source morphology. As strong lensing inversion methods grow in their maturity and begin to be used as standard across many different astronomical disciplines, it is imperative that such effects are thoroughly explored in the short-term future.

APPENDIX A: DATA DISCRETIZATION WITH FIXED INITIALIZATION
The use of random initialization in adaptive SLI ensures that data discretization is different and unique with each lens model parameter set. Here, we further investigate this important feature. In particular, we wish to test the consequences of fixing the initialization of source plane pixel cluster centres, since this should give rise to an overall smoother variation of evidence with lens parameters and therefore aid optimization.
Random initialization occurs because the initial centers from which the clustering algorithm proceeds are calculated randomly from the input spatial coordinates (the traced image pixels coordinates (x, y)j, j = 1, J; see section 2.1 for more details). Initialization is then easily fixed by ensuring these initial centers are always the same every time the clustering algorithm starts. The adaptive pixelization still continues as normal, but the key difference is that it does so in a deterministic way. Therefore, in this way, nearly identical lens models give rise to nearly identical pixelizations and the data discretization for similar lens models are now related and dependent on the lens model, as with other methods in the literature.
In terms of fixing the initialization, we define our fixed initial centers as those given by the final pixelization calculated by adaptive SLI for each of images 1 and 2. We wish to investigate the result of changing the intialization, therefore we also calculate a set of eight other random initializations corresponding to slight perturbations in the input lens model. In this way, we end up with nine different sets of cluster centres which are then fixed for each of the nine corresponding adaptive SLI runs. This set up is analogous to the phasing shifting of nine grids with square SLI. The inversion setup parameters (e.g. source resolutions, image subgridding, regularization) are the same as those used in the main paper.
We first ensure that a fixed initialization smooths non- Marginalized 2d PDF − Image 2 High Noise Figure 13. Two dimensional PDFs of σ, q and α given by adaptive SLI for image 2 (top) and a lower S/N version of image 2 (bottom). The one dimensional PDFs for these parameters are shown in the top right corner for each image.
linear parameter space by repeating the demonstration given in figure 1, whereby we fix every parameter to that of the input lens models except σ, which we vary over the range 284.5 km/s to 285.5 km/s. The results are given in figure A1 for three of the nine fixed initializations. As expected, the variation of evidence with σ is much smoother than adaptive SLI when initialization is randomized. The results of using square and randomized adaptive SLI are also plotted on figure A1 for comparison. This method acts to smooths parameter space on small scales and will result in a full nonlinear search using significantly fewer iterations to find the optimal lens model. However, the figure shows that a hint of the return of discreteness bias since all three evidence curves follow different shapes and peak at different values of σ.
We now perform a full non-linear search for every fixed initialization for images 1 and 2, as we did when phase shifting with square SLI. The PDFs are plotted in figure A2 in the same way as those showing square SLI phase shifts in figure 10. As the figure shows, discreteness biases are actually amplified, appearing more severe than those found by phase shifting square SLI. Indeed, while square SLI gave mostly accurate and consistent lens models for image 1, in this case they span a wider range of α with greater inconsistency, while image 2 remains inaccurate. Furthermore, individual PDFs appear more sharply peaked and narrower than the summed PDF of adaptive SLI. The underestimation of errors is present once again and demonstrates that without full consideration of the data discretization, biases in errors arise. The benefit of reducing the number of iterations required to find the best-fit lens model is clearly negated by the return of discreteness biases. The random initialization used in adaptive SLI is required to ensure that data is discretized in a unique and unrelated way for every lens model, such that the final model calculated is not biased and has realistic errors.
We expect that data discretization biases are present in other inversion methods within the literature, such as the adaptive square grids of Dye & Warren (2005); Tagore & Keeton (2014), rectangular grids of Collett & Auger (2014); Suyu et al. (2013) and adaptive Voronoi grid of Vegetti & Koopmans (2009); none use random initialization like adaptive SLI and thus source pixelizations are not unique nor unrelated between lens model iterations. However, the fact that square SLI gives rise to discretization biases which are Adaptive SLI -α = 2.0188 +0.0550 −0.0503 Figure A2. Posterior distribution function (PDF) of α obtained by adaptive SLI with fixed initialization of source pixel cluster centres for image 1 (top) and image 2 (bottom). The inversion was set up with 4 × 4 sub-gridding, an arbitrarily large source plane and 200 clusters. The range of α values show greater inconsistency and lower accuracy than those in figure 10, showing that fixed initialization in adaptive SLI reintroduces discreteness biases and amplifies them compared to square SLI. The thick black line shows the summed PDF of all fixed grids and the dashed line shows the accurate lens models calculated using adaptive SLI.
less prominent than fixing an adaptive grid serves to illustrate that the severity of their effect depends entirely on the pixelization scheme and is therefore hard to predict in specific cases. We conclude that the random initialization of adaptive SLI is a vital component to the method, ensuring it can fully explore all systematics associated with data discretization to ultimately give an unbiased and accurate lens model using just one non-linear search. Figure A1. Inversion of image 1 using the true lens model except the velocity dispersion, σ, which is varied over ±0.5 kms −1 about the true value. The figure is as figure 1 except here we show three adaptive inversions with different but fixed initial source pixelizations. The figure shows that the resulting variation of evidence with α is smoother when fixing the source pixelization compared to allowing it to vary with each lens model iteration (black line).