Automated galaxy-galaxy strong lens modelling: no lens left behind

The distribution of dark and luminous matter can be mapped around galaxies that gravitationally lens background objects into arcs or Einstein rings. New surveys will soon observe hundreds of thousands of galaxy lenses, and current, labour-intensive analysis methods will not scale up to this challenge. We instead develop a fully automatic, Bayesian method which we use to fit a sample of 59 lenses imaged by the Hubble Space Telescope in uniform conditions. We set out to \textit{leave no lens behind} and focus on the ways in which automated fits fail for a small handful of lenses, describing adjustments to the pipeline that allow us to infer accurate lens models. Our pipeline ultimately fits {\em all} 59 lenses in our sample; a high success rate is key, because catastrophic outliers would bias large samples with small statistical errors. Machine Learning techniques might further improve the two most difficult steps: subtracting foreground lens light and initialising a first, approximate lens model. After that, increasing model complexity is straightforward. We find a mean $\sim1\%$ precision on the measurement of the Einstein radius across the lens sample which {\em does not degrade with redshift} up to at least $z=0.7$ -- in stark contrast to other techniques used to study galaxy evolution, like stellar dynamics. Our \texttt{PyAutoLens} software is open source, and is also installed in the Science Data Centres of the ESA Euclid mission.


INTRODUCTION
Galaxy-scale strong lensing is the distortion of light rays from a background source into multiple images, by the gravitational field of a foreground galaxy along the same line of sight. From the apparent position, shape and flux of those multiple images, it is possible to infer both the intrinsic morphology of the background galaxy at magnified resolution, and the distribution of (all gravitating) mass in the foreground lens.
In combination with kinematic measurements, lensing methods have inferred the mean total density profile of massive elliptical galaxies and how that evolves with redshift (Gavazzi et al. 2007; Koopmans et al. 2009; Auger et al. 2010; Sonnenfeld et al. 2013a; Bolton et al. 2012), and put constraints on their dark matter content, stellar mass-to-light ratio, and inner structure (Sonnenfeld et al. 2012; Oldham & Auger 2018; Nightingale et al. 2019; Shu et al. 2015, 2016a). If the background source is variable and the mass model known, measurements of time delays between multiple images can constrain the value of the Hubble constant (Suyu et al. 2017; Wong et al. 2019). If the lens galaxy contains small substructures, they also perturb the multiple images, and provide a clean test of the nature of dark matter (Vegetti et al. 2010; Li et al. 2016, 2017; Hezaveh et al. 2016; Ritondale et al. 2019; Despali et al. 2019; Amorisco et al. 2022; He et al. 2021).

Contact e-mail: amy.etherington@durham.ac.uk
Currently, a couple of hundred strong lensing systems have been observed by dedicated surveys such as the Sloan Lens ACS (SLACS) (Bolton et al. 2006; Auger et al. 2010), BOSS Emission Line Lens (BELLS) (Brownstein et al. 2012), and Strong Lensing Legacy (SL2S) (Gavazzi et al. 2012) surveys, BELLS GALaxy-Lyα EmitteR sYstems (BELLS GALLERY) (Shu et al. 2016c,b), the SLACS Survey for the Masses (S4TM) Survey (Shu et al. 2017), LEnSed laeS in the EBOSS suRvey (LESSER) (Cao et al. 2020), and the Spectroscopic Identification of Lensing Objects survey (Talbot et al. 2018, 2021). During the next decade, a couple of hundred thousand strong lenses will be discovered by wide-field surveys including Euclid, LSST, and SKA (Collett 2015). Such large samples of strong lenses will contain rare 'golden' systems such as double or triple source plane systems (Collett & Auger 2014; Collett & Bacon 2016; Collett & Smith 2020), and unlock considerable scientific potential through vastly improved statistics (e.g. Birrer et al. 2020; Sonnenfeld & Cautun 2021; Cao et al. 2020; Orban De Xivry & Marshall 2009). To tackle the forthcoming thousand-fold increase in data volume, model inference must be automated, and made robust without human intervention.
Convolutional Neural Networks (CNNs) are a fast approach that has recently been shown to be successful at lens modelling. Hezaveh et al. (2017) and Levasseur et al. (2017) modelled nine lens systems observed by the Hubble Space Telescope (HST). However, this approach requires a large, varied, and unbiased training set of mock lenses to learn from; these requirements can be difficult to guarantee, which could be problematic for source galaxies with irregular morphologies. Using a different method, Shajib et al. (2021) used the DOLPHIN software to model 23 lenses from an initial sample of 50 SLACS lenses.
We use the PyAutoLens software (Nightingale & Dye 2015, hereafter N15;Nightingale et al. 2018, hereafter N18), an open-source Bayesian forward-modelling project designed specifically with automation in mind. We develop an automated data analysis pipeline that models the distribution of foreground light and mass as a sum of smooth analytic functions, and the background light as either another sum of analytic functions (e.g. Tessore et al. 2016), or as a pixellated image (Warren & Dye 2003;Suyu et al. 2006;Dye & Warren 2005;Vegetti & Koopmans 2009;Joseph et al. 2019;Galan et al. 2021). By fitting a mock sample of ∼ 500 lenses we further show that previous versions of PyAutoLens (like many lens fitting algorithms) underestimated the statistical uncertainty of lens model parameters. A major component of this is a discretization effect inherent to pixel-based source reconstructions -for which we provide a solution.
We apply our automated lens modelling pipeline to a uniform sample of 59 SLACS and BELLS GALLERY lenses that were observed with the Hubble Space Telescope. Our goal is to model every single lens and therefore leave no lens behind: if we were analysing ∼100,000 lenses, even a low rate of (unflagged) failures would require unfeasible amounts of human intervention, and would bias the increasingly tight statistical precision of subsequent scientific analysis. Our first, 'blind' analysis achieves a promising success rate of 85%. We then emphasise trying to understand why some lenses are not well fit, and improve our pipeline until they are. This mirrors the kind of methodology that will be possible with future large samples: a fairly fixed initial framework that is adapted after early results. In this paper we are trying to establish that first fixed framework.
With the full sample modelled, we investigate the accuracy to which the Einstein radius is recovered. Cao et al. (2020) recently demonstrated the robustness of the measurement by comparing the Einstein radii of power-law fits to mock lenses with complex mass distributions, inferred from SDSS-MaNGA stellar dynamics data, to their true values. They showed that the Einstein radius was recovered to 0.1% accuracy, taking into account both systematic and statistical sources of uncertainty. We examine how this compares to the statistical uncertainties we infer for the Einstein radii of the SLACS and GALLERY sample. Further, we compare to previous literature measurements (Bolton et al. 2008a;Shu et al. 2016b) to verify our results and quantify how the uncertainty varies due to different methods and assumptions. Our work therefore provides an outlook on the accuracy to which we can anticipate measuring the Einstein radius in upcoming large samples of tens of thousands of lenses.
This paper is structured as follows. In Section 2 we give a brief overview of lensing theory and provide the mass and light profile parameterisations we adopt. Section 3 describes the sample selection and data reduction procedure for the imaging of the SLACS and GALLERY samples. The method is then explained in detail in Section 4 and applied to a sample of mock data in Section 5 to investigate problems associated with pixelised source reconstructions. The results of applying the automated procedure to the SLACS and GALLERY samples are then presented in Section 6. Finally we discuss the implications for the future of automated analyses in Section 7 and summarise in Section 8. Throughout this work we assume a Planck 2015 cosmological model (Ade et al. 2016). The results of every fit to the SLACS and GALLERY datasets can be found at the following link: https://zenodo.org/record/6104823.

LENS MODELLING THEORY
The aim of this study is to investigate the practicalities of automated extended source modelling to infer the mass distributions of a large sample of lenses. We give a brief overview of relevant theory for this analysis in Section 2.1, and describe our choice of mass and light profile parameterisations in Sections 2.2 and 2.3, respectively.

Lensing theory
Strong lensing occurs in and around regions where the surface mass density of the lens Σ(R) exceeds the critical surface mass density for lensing,

$\Sigma_{\rm crit} = \frac{c^2}{4\pi G}\frac{D_{\rm s}}{D_{\rm l} D_{\rm ls}},$ (1)

where $D_{\rm l}$, $D_{\rm s}$, and $D_{\rm ls}$ are respectively the angular diameter distances to the lens, to the source, and from the lens to the source, $G$ is the gravitational constant, and $c$ is the speed of light. Hence, assuming a cosmological model it is possible to fix the 3D geometry of the lens system using the observed redshifts of the foreground lens and background source galaxies.

Figure 1. Lens subtracted data images (left) and their corresponding pixel-grid reconstructions (right) for the "Gold" sample of lens galaxies (see Section 6.1 for a description of our classification process). Lenses are in order of Right Ascension, with SLACS lenses appearing first, followed by GALLERY lenses. The full model fits for these lenses, plotted with an indication of scale, are available in Appendix C.

An extended distribution of
matter can be described by its convergence, a dimensionless, 2D projected surface mass density defined as

$\kappa(x, y) = \frac{\Sigma(x, y)}{\Sigma_{\rm crit}}.$ (2)

The lensing properties of a galaxy with κ(x, y) are characterised by the projected gravitational potential φ that satisfies the Poisson equation ∇²φ = 2κ. The lens galaxy deflects light rays from the source galaxy by an amount described by the deflection angle field, α = ∇φ. The goal of lens modelling, then, is to solve the lens equation,

$\boldsymbol{\beta} = \boldsymbol{\theta} - \boldsymbol{\alpha}(\boldsymbol{\theta}),$ (3)

which relates the observed positions θ = (θ1, θ2) of deflected light rays in the image plane to a source at position β = (β1, β2) in the source plane. Given a lensed image and (a model of) the distribution of foreground mass, one can invert Equation 3 to recover the distribution of light in the source plane. In Figure 1 the pixelised source-plane reconstructions of the lenses fitted in this work are shown next to their lensed data images.

Gravitational lensing magnifies the background source; the lens plane contains an (infinitely thin) curve of formally infinite magnification known as the tangential critical curve. Axisymmetric lenses have a circular tangential critical curve whose radius is the Einstein radius, REin. The mean surface mass density inside REin is equal to the critical surface mass density Σcrit of the lens (Equation (1)). The Einstein radius and enclosed Einstein mass are thus uniquely defined in the axisymmetric case, quantifying the size and efficiency of the lens. For asymmetric, irregular and realistic lenses, the definition of Einstein radius must be generalised. Several conventions are possible (see Meneghetti et al. 2013 for a good overview) but we choose to use the effective Einstein radius,

$R_{\rm Ein, eff} = \sqrt{\frac{A}{\pi}},$ (5)

where A is the area enclosed by the tangential critical curve. This definition is self-consistent across different mass density profiles, and clearly recovers the definition of REin in the case of a circular critical curve.
To calculate this in practice, we first obtain the set of points that defines the tangential critical curve contour from our lensing maps, using a marching squares algorithm, then compute the enclosed area A using Green's theorem,

$A = \frac{1}{2}\oint \left( x\,{\rm d}y - y\,{\rm d}x \right).$
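Given the contour vertices, the Green's theorem integral reduces to the shoelace formula. The following is an illustrative sketch (not the PyAutoLens implementation); the function name and the analytic elliptical contour are our own constructions, and in practice the contour would come from a marching-squares routine applied to the lensing maps:

```python
import numpy as np

def effective_einstein_radius(contour):
    """R_Ein,eff = sqrt(A / pi), with the enclosed area A computed via
    Green's theorem (the shoelace formula) over a closed contour.

    `contour` is an (N, 2) array of (x, y) points tracing the tangential
    critical curve, e.g. as output by a marching-squares routine."""
    x, y = contour[:, 0], contour[:, 1]
    # A = (1/2) |sum_i (x_i * dy_i - y_i * dx_i)| around the closed loop.
    area = 0.5 * abs(np.sum(x * np.roll(y, -1) - y * np.roll(x, -1)))
    return np.sqrt(area / np.pi)

# Demo: an elliptical "critical curve" with semi-axes 2.0 and 0.5 has
# area pi * 2.0 * 0.5, so R_Ein,eff = sqrt(2.0 * 0.5) = 1.0.
t = np.linspace(0.0, 2.0 * np.pi, 10_000, endpoint=False)
curve = np.column_stack([2.0 * np.cos(t), 0.5 * np.sin(t)])
print(effective_einstein_radius(curve))  # ~1.0
```

Because the area is computed from the contour rather than from a parametric fit, the same estimator applies unchanged to irregular, non-elliptical critical curves.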

Mass profile parameterisation
We model the distribution of mass in the lens galaxy as a Power-Law Ellipsoidal Mass Distribution (PLEMD), assuming that this is able to capture the combined mass distribution of both baryonic and dark matter. The convergence is

$\kappa(x, y) = \frac{3 - \gamma}{2}\left(\frac{b}{\sqrt{q x^2 + y^2 / q}}\right)^{\gamma - 1}$ (7)

(Suyu 2012), where γ is the logarithmic slope of the mass distribution in 3D, q > 0 is the projected minor-to-major axis ratio of the elliptical isodensity contours, and b > 0 is the angular scale length of the profile (referred to in some papers as the Einstein radius, but distinct from the more robust effective Einstein radius in Equation 5). The profile has additional free parameters for the central coordinates (xc, yc) and position angle φ, measured counterclockwise from the positive x-axis, and an external shear. When varying the ellipticity, we actually sample from and adjust the free parameters

$\epsilon_1 = \frac{1 - q}{1 + q}\sin 2\phi, \qquad \epsilon_2 = \frac{1 - q}{1 + q}\cos 2\phi,$ (8)

because these are defined continuously in −1 < εi < 1, eliminating the periodic boundaries associated with the angle φ and the discontinuity at q = 0. We similarly parameterise the external lensing shear as components γ1ext and γ2ext. The external shear magnitude γext and angle φext are recovered from these parameters by

$\gamma_{\rm ext} = \sqrt{\gamma_{\rm 1ext}^2 + \gamma_{\rm 2ext}^2}, \qquad \phi_{\rm ext} = \frac{1}{2}\arctan\left(\frac{\gamma_{\rm 2ext}}{\gamma_{\rm 1ext}}\right).$ (9)

The special case γ = 2 recovers the Singular Isothermal Ellipse (SIE) mass distribution, in which the steady-state motions of particles have a constant 1D velocity dispersion σSIE when projected along any line of sight. For this distribution of mass, the critical curve is the ellipse at κ = 1/2. With our definition of effective Einstein radius (Equation 5), this ellipse satisfies $R_{\rm Ein, eff} = \sqrt{q x^2 + y^2 / q}$, and the velocity dispersion is

$\sigma_{\rm SIE} = c\,\sqrt{\frac{R_{\rm Ein, eff}}{4\pi}\frac{D_{\rm s}}{D_{\rm ls}}}.$ (10)
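The mapping between (q, φ) and the elliptical components, and its inverse, can be sketched in a few lines. The function names below are illustrative, and the particular sin/cos ordering for ε1, ε2 is an assumption that should be checked against the modelling code in use:

```python
import numpy as np

def ellip_components(q, phi_deg):
    """(q, phi) -> (eps1, eps2), with e = (1 - q) / (1 + q).
    The convention eps1 = e sin(2 phi), eps2 = e cos(2 phi) is an
    assumption; check it against the modelling code in use."""
    e = (1.0 - q) / (1.0 + q)
    two_phi = 2.0 * np.radians(phi_deg)
    return e * np.sin(two_phi), e * np.cos(two_phi)

def axis_ratio_and_angle(eps1, eps2):
    """Inverse mapping: (eps1, eps2) -> (q, phi in degrees)."""
    e = np.hypot(eps1, eps2)
    q = (1.0 - e) / (1.0 + e)
    phi_deg = 0.5 * np.degrees(np.arctan2(eps1, eps2))
    return q, phi_deg

# Round trip: (eps1, eps2) has no periodic boundary, unlike phi itself.
eps1, eps2 = ellip_components(q=0.7, phi_deg=30.0)
print(axis_ratio_and_angle(eps1, eps2))  # -> (0.7, 30.0) up to rounding
```

Sampling in (ε1, ε2) rather than (q, φ) gives the non-linear search a continuous, boundary-free parameter space, which is the motivation stated in the text.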

Light profile parameterisation
We model the foreground galaxy's light distribution as the sum of two Sérsic profiles with different ellipticities but a common centre. This replicates the bulge and disc components that constitute an Early-type Galaxy (ETG) (Oh et al. 2017; Vika et al. 2014), and significantly increased the Bayesian evidence compared to a single Sérsic model in a precursor study of three SLACS galaxies (Nightingale et al. 2019). The Sérsic profile is

$I(R) = I_{\rm eff} \exp\left\{ -k_{\rm eff} \left[ \left(\frac{R}{R_{\rm eff}}\right)^{1/n} - 1 \right] \right\},$ (11)

where Ieff is the surface brightness at the effective radius Reff, defined here in the intermediate-axis normalisation, n is the Sérsic index, and keff is a normalisation constant related to n such that Reff encloses half of the total light from the model (Graham & Driver 2005). The axis ratio and position angle of each component are parameterised during the fitting process using elliptical components, as in Equation (8). Aside from the two components' common centre, all free parameters are fitted independently of each other to allow for more complex light distributions. For example, the flux ratio of the two Sérsics is unconstrained, and the profiles may be elongated by different amounts and rotationally offset from one another. We model the distribution of light in the source galaxy as either a single Sérsic profile or using a pixelated source reconstruction, depending on the phase of the automated procedure described in Section 4.3. The source galaxy is ultimately reconstructed on an adaptive Voronoi mesh, for which the procedure is described in detail in Section 4.2.
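As a quick numerical sketch of Equation (11), the snippet below evaluates a circular Sérsic profile. It uses the common asymptotic approximation keff ≈ 2n − 1/3 rather than the exact root of Γ(2n) = 2γ(2n, keff); the function name is our own:

```python
import numpy as np

def sersic(R, I_eff, R_eff, n):
    """Circular Sersic profile I(R). Uses the asymptotic approximation
    k_eff ~ 2n - 1/3 (Ciotti & Bertin 1999) rather than the exact root
    of Gamma(2n) = 2 * gamma(2n, k_eff)."""
    k_eff = 2.0 * n - 1.0 / 3.0
    return I_eff * np.exp(-k_eff * ((R / R_eff) ** (1.0 / n) - 1.0))

# By construction the profile passes through I_eff at R = R_eff,
# independent of the Sersic index n:
print(sersic(1.0, I_eff=2.0, R_eff=1.0, n=4.0))  # -> 2.0
```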

Lens sample selection
We analyse strong gravitational lenses around massive elliptical galaxies drawn from the SLACS (Bolton et al. 2008b) and BELLS GALLERY (Shu et al. 2016b) samples. The SLACS lenses were identified using SDSS spectroscopy to find higher-redshift emission lines after subtracting a principal component model of the foreground galaxy spectrum (Bolton et al. 2006). This technique was modified for the GALLERY survey to specifically select even higher redshift Lyα-emitting (LAE) source galaxies (Shu et al. 2016b). Spectroscopic redshifts of the lens and source are known, and follow-up high resolution imaging has been carried out for all systems.
To keep the data quality reasonably uniform (as it would be for a large future survey), we restrict the SLACS sample to the 43 lenses imaged to at least 1-orbit depth in the HST Advanced Camera for Surveys (ACS) F814W band. We add the 17 grade-A confirmed LAE lenses from GALLERY, all of which have been observed to 1-orbit depth in the HST Wide Field Camera 3 (WFC3) F606W band. Several systems have second or third foreground lenses of low mass. However, for this first attempt at automation in which we shall try to fit only a single main lens, we have not considered GALLERY lens J0918+4518, which has two equally bright lens galaxies. We end up with a set of 59 lenses.

Data reduction
HST imaging of both the SLACS and GALLERY samples was reduced using custom pipelines. The procedure for the SLACS sample is described in Bolton et al. (2008a) and produces images with 0.05″ pixels; the procedure for GALLERY is described in Brownstein et al. (2012) and Shu et al. (2016c), and produces images with 0.04″ pixels. The point spread function (PSF) was determined for both samples using the Tiny Tim software (Krist 1993). The aforementioned papers also describe an optional method to subtract the lens galaxy's light by fitting it with a b-spline. Our pipeline benefits from fitting the lens light simultaneously with its mass, so we shall generally not use the b-spline data. However, our pipeline struggles to automatically deblend the lens and source light of three systems, so we shall try the b-spline data there.

Overview
Our strong lens analysis is carried out using the software PyAutoLens, which is described in N18, building on the works of Warren & Dye (2003, hereafter WD03), Suyu et al. (2006, hereafter S06) and N15.
To fit a lens model to an image, PyAutoLens first assumes a parameterisation for the distribution of light and mass in the lens, and the distribution of light in the source, using the parametric profiles described in Sections 2.2 and 2.3. The parameterised intensity I of the lens light is evaluated at the centre of every image pixel, convolved with the instrumental PSF, and subtracted from the observed image. The mass model is then used to ray-trace image-pixels from their image-plane positions θ to source-plane positions β (via the lens Equation 3). The source analysis finally follows, which PyAutoLens performs using one of two approaches: (i) parametric profiles in the source-plane (e.g. the Sérsic profile) are used to simply evaluate I at every value of β; (ii) a pixelized source reconstruction is performed on an adaptive Voronoi mesh, where the values of β are used to pair image-pixels to the Voronoi source pixels which reconstruct the source (see WD03, S06, N15 and N18 for a full description of lensing analyses with pixelized source reconstructions).
The following link (https://github.com/Jammy2211/autolens_likelihood_function) contains Jupyter notebooks that provide a visual step-by-step guide to the PyAutoLens likelihood function used in this work. We have received feedback from readers of other papers using PyAutoLens (who are less familiar with strong lens modelling) that they were unclear on the exact procedure that translates a strong lens model to a likelihood value. The notebooks aim to clarify this and provide links to all previous literature describing the PyAutoLens likelihood function, alongside an explanation of the technical aspects of the linear algebra and Bayesian inference. We provide a brief description of the PyAutoLens likelihood function below, but we recommend these notebooks to the interested reader if anything is unclear.

Source Reconstruction
After subtracting the foreground lens emission and ray-tracing coordinates to the source-plane via the mass model, the source is reconstructed in the source-plane using an adaptive Voronoi mesh which accounts for irregular or asymmetric source morphologies (see Figure 1). Our results use the PyAutoLens pixelisation VoronoiBrightnessImage, which adapts the centres of the Voronoi pixels to the reconstructed source morphology, such that more resolution is dedicated to its brighter central regions (Nightingale et al. 2018). The PyAutoLens software is open source and available from https://github.com/Jammy2211/PyAutoLens.
The reconstruction computes the linear superposition of PSF-smeared source-pixel images which best fits the observed image. This uses the matrix $f_{ij}$, which maps the image of source pixel $i$ to image pixel $j$. Following the formalism of WD03, we define the data vector

$D_i = \sum_{j=1}^{J} f_{ij}(d_j - b_j)/\sigma_j^2$

and curvature matrix

$F_{ik} = \sum_{j=1}^{J} f_{ij} f_{kj}/\sigma_j^2,$

where $d_j$ are the observed image flux values with statistical uncertainties $\sigma_j$ and $b_j$ are the model lens light values. The source-pixel surface brightness values are given by $s = F^{-1} D$, solved via a linear inversion that minimizes

$\chi^2 = \sum_{j=1}^{J} \left[ \frac{\left( \sum_{i=1}^{I} s_i f_{ij} \right) - (d_j - b_j)}{\sigma_j} \right]^2.$

The term $\sum_{i=1}^{I} s_i f_{ij}$ maps the reconstructed source back to the image-plane for comparison with the observed data.
This matrix inversion is ill-posed; therefore, to avoid over-fitting noise, the solution is regularized using a linear regularization matrix H (see WD03). Regularization acts as a prior on the source reconstruction, penalizing solutions where the difference in reconstructed flux between neighbouring Voronoi source pixels is large. Our results use the PyAutoLens regularization scheme AdaptiveBrightness, which adapts the degree of smoothing to the reconstructed source's luminous emission (Nightingale et al. 2018). This has three hyper-parameters: the inner regularization coefficient, the outer regularization coefficient, and a parameter which controls how the outer and inner regions of the source plane are defined for regularization. The degree of smoothing is chosen objectively using the Bayesian formalism introduced by Suyu et al. (2006). The likelihood function used in this work is taken from Dye et al. (2008) and is given by

$2\ln\epsilon = -\chi^2 - s^{\rm T} H s - \ln\left[\det(F + H)\right] + \ln\left[\det H\right] - \sum_{j=1}^{J} \ln\left(2\pi\sigma_j^2\right),$

where the regularization coefficients are absorbed into H.
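The regularized inversion above amounts to solving a small linear system. The toy example below is our own construction: it uses an identity regularization matrix rather than the adaptive scheme described in the text, and a random mapping matrix standing in for the ray-traced, PSF-blurred one, purely to illustrate the WD03 algebra:

```python
import numpy as np

# Toy source inversion in the WD03 formalism. f[i, j] maps source pixel i
# to image pixel j (in reality built from ray tracing + PSF blurring);
# d is the lens-subtracted image and sigma its noise map.
rng = np.random.default_rng(0)
I, J = 4, 40                        # number of source / image pixels
f = rng.random((I, J))
s_true = np.array([1.0, 3.0, 2.0, 0.5])
sigma = np.full(J, 0.05)
d = s_true @ f + rng.normal(0.0, 0.05, J)

D = f @ (d / sigma**2)              # data vector D_i
F = (f / sigma**2) @ f.T            # curvature matrix F_ik
H = np.eye(I)                       # identity regularization (WD03 use
lam = 1e-3                          # gradient/curvature forms of H)

s = np.linalg.solve(F + lam * H, D) # regularized solution for the source
print(s)                            # close to s_true at this noise level
```

Increasing `lam` smooths the reconstruction towards the regularization prior; the Bayesian formalism cited in the text chooses that trade-off objectively rather than by hand.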

PyAutoLens
PyAutoLens is designed to approach lens modelling in a fully automated way (N18, Nightingale et al. 2021b). This uses a technique we term 'non-linear search chaining', which sequentially fits lens models of gradually increasing complexity. By initially fitting simpler lens models one can ensure that their corresponding non-linear parameter spaces are sampled in an efficient and robust manner that prevents local maxima being inferred. The resulting lens models then act as initialisation in subsequent model-fits which add more complexity to the lens model, guiding the non-linear search on where to look in parameter space for the highest-likelihood lens models and ensuring the global maximum model has the highest chance of being inferred. Non-linear search chaining is performed using the probabilistic programming language PyAutoFit (https://github.com/rhayes777/PyAutoFit), a spin-off project of PyAutoLens which generalises the statistical methods used to model strong lenses into a general-purpose statistics library.

Table 1. Composition of the pipelines that make up our uniform analysis. Where prior info is not passed from previous pipelines see Table A2 for the specific priors used on each model parameter.
To perform model-fitting PyAutoLens uses the nested sampling algorithm dynesty (Speagle 2020a), which obtains the posterior probability distributions of a given lens model's parameters. Nested sampling's ability to robustly sample low-dimensional (e.g. fewer than ∼30 parameters), complex parameter space distributions makes it well suited to lens modelling. We use dynesty's random walk sampling for every model-fit performed in this work, which we found gave a significant improvement over other sampling techniques owing to its better accounting of the covariance between lens model parameters. Since nested sampling starts by randomly sampling from the prior, the size and choice of priors directly impact the expected number of nested sampling iterations, alongside how likely it is that a local maximum is incorrectly inferred. As such, using more informative priors reduces the time needed to integrate over the posterior and guides sampling towards the highest-likelihood global maximum.
Non-linear search chaining allows us to construct informative priors from the results of one dynesty search and pass them to subsequent model-fits, thereby guiding them on where to sample parameter space. This uses a technique called prior passing (see N18), which sets the prior of each parameter to a Gaussian whose mean is that parameter's previously inferred median PDF (probability density function) value and whose width is a customisable value specific to every lens model and parameter. The specific order of prior passing used in this study is given in Table 1. The prior widths have been carefully chosen to ensure they are broad enough not to omit valid lens model solutions, but sufficiently narrow to ensure the lens model does not inadvertently infer local maxima. More quantitatively, the prior widths are typically greater than ∼10 times the errors we ultimately infer on each parameter, meaning the priors have negligible impact on the posterior (see Section 5).
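Prior passing can be summarised in a few lines. The helper below is purely illustrative and is not the PyAutoFit API: the previous search's posterior supplies the Gaussian mean, while the width is set by hand, much broader than the inferred error:

```python
import numpy as np

def passed_prior(samples, width):
    """Gaussian prior (mean, sigma) for the next search: mean from the
    previous posterior's median, width chosen by hand per parameter."""
    return float(np.median(samples)), width

# Mock posterior for, e.g., an Einstein radius: median ~1.62, error ~0.01.
posterior = np.random.default_rng(1).normal(1.62, 0.01, 5_000)

# Pass a prior ~10x broader than the inferred error, so it is informative
# enough to guide sampling but has negligible impact on the posterior.
mean, sigma = passed_prior(posterior, width=0.1)
print(mean, sigma)
```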

User Setup
In this work, we use the standardised Source Light and Mass (SLaM) pipelines that are available, and fully customisable, in PyAutoLens. From these, we construct a pipeline that chains together a total of 11 dynesty searches which we apply to every lens in our sample, which we describe in detail in Section 4.3.3. Before we run the SLaM pipelines a few brief pre-processing steps must be carried out; we describe those here, as well as our chosen pipeline settings.
We define a circular mask, centred on the lens galaxy, that defines the image pixels we fit to. For the SLACS and GALLERY lenses we use a standard radius of 3.5″ and 3.0″, respectively. This is increased to 4.0″ for the SLACS lenses J0912+0029 and J0216-0813, and for the GALLERY lens J0755+3445. All image pixels outside this mask are completely omitted from the analysis, meaning they are neither traced to the source plane nor included in the source reconstruction procedure.
We create scalable noise maps, unique to each lens, that define any regions inside the mask that we do not wish to fit (e.g. unrelated astronomical sources projected by chance along adjacent lines of sight). In these regions the image values are scaled to near zero and the noise-map values to large values such that the likelihood calculation effectively ignores them. Such areas of high flux would otherwise be indistinguishable from the source flux to the fitting procedure. We adopt this noise map approach, over the complete removal of such regions, since image-pixels are still traced to the source-plane and included in the source reconstruction procedure. This avoids creating discontinuities or 'holes' in the source pixelisation which can degrade the quality of the overall reconstruction. The maps are produced in a graphical user interface (GUI) available in PyAutoLens, designed to reduce the human time necessary for creating each unique map (∼1 minute per lens). We acknowledge this task is overly time-intensive when considering the incoming samples of tens of thousands of lenses and provide a discussion of possible routes to automation of this pre-processing step in Section 7.1.
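A minimal sketch of this noise scaling, with invented values: flagged pixels are set to zero flux and very large noise, so their chi-squared contribution vanishes while the pixels themselves remain available to the ray tracing and source pixelisation:

```python
import numpy as np

# Invented 4-pixel example: the third pixel contains flux from an
# unrelated line-of-sight object flagged via the GUI.
image = np.array([1.0, 2.0, 50.0, 1.5])
noise = np.array([0.1, 0.1, 0.1, 0.1])
flagged = np.array([False, False, True, False])

image_scaled = np.where(flagged, 0.0, image)   # flux scaled to zero
noise_scaled = np.where(flagged, 1e8, noise)   # noise inflated

# Any model residual in the flagged pixel is divided by the huge noise,
# so it contributes ~0 to chi^2 but is still ray-traced to the source.
model = np.array([1.0, 2.0, 0.0, 1.5])
chi2 = np.sum(((image_scaled - model) / noise_scaled) ** 2)
print(chi2)  # -> 0.0
```

Simply deleting the flagged pixels would instead punch 'holes' in the source pixelisation, which is why scaling is preferred in the text.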
Finally, we store an array containing the pixel coordinates of the peak surface brightness of each multiple image of the source galaxy, again selected by the user via a GUI. These coordinates are used to remove local maxima from the parameter spaces explored throughout the pipeline. In practice, this is done by discarding any models where the ray-traced points in the source plane are not within a positions threshold value of each other. This value is initially set to 0.7″. Both the threshold and the positions themselves are then iteratively updated throughout the SLaM pipeline, by solving the lens equation using the maximum likelihood mass model estimated in a previous fit. For each iteration, the value is set to three times the separation of the positions found after solving the lens equation, or a value of 0.2″, whichever is largest. This ensures that, as we progress from parametric to pixelised source reconstructions, we avoid the under- and over-magnified solutions that can be problematic for these methods (Maresca et al. 2020).
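The position-threshold check itself can be sketched as follows (illustrative names, not the PyAutoLens implementation): a mass model is kept only if its ray-traced multiple-image positions land within the threshold of one another in the source plane:

```python
import numpy as np

def max_separation(positions):
    """Largest pairwise distance among source-plane positions."""
    p = np.asarray(positions, dtype=float)
    diff = p[:, None, :] - p[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1)).max()

def passes_position_threshold(positions, threshold):
    """Keep a mass model only if the ray-traced multiple-image positions
    trace back to within `threshold` (arcsec) of one another."""
    return max_separation(positions) <= threshold

# Traced positions clustered within ~0.02": accepted at the initial 0.7".
traced = [(0.01, 0.00), (0.00, 0.02), (-0.01, 0.01)]
print(passes_position_threshold(traced, threshold=0.7))  # -> True
```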

Uniform Analysis
The uniform analysis ultimately aims to constrain the parameters describing the mass and light distributions. The lens galaxy's mass is parameterised as a PLEMD (Equation 7), while the lens light is modelled as a double Sérsic profile, i.e. the sum of two Sérsic profiles (Equation 11) with a common centre. The source galaxy's light distribution is reconstructed on an adaptive Brightness-based Pixelisation and Regularisation (BPR) grid. The uniform analysis is constructed from multiple pipelines that each focus on fitting a specific aspect of the lens model, which we describe below. For an overview of the composition of the overall method see Table 1. A scaled-down version of this pipeline was used by Cao et al. (2020) to model fifty simulated strong lenses.
We begin with the Source Parametric (SP) pipeline that fits the foreground lens galaxy's light profile, alongside a robust initialisation of less complex models for the mass distribution of the lens and light distribution of the source galaxy. The lens mass is modelled as an SIE (Equation 7 with γ = 2) plus external shear. The lens light is modelled by the sum of a Sérsic and Exponential (Equation 11 with n=1) profile. The source galaxy's light is described by a single Sérsic profile; this is key to the initialisation of the model using the SP pipeline, as it allows us to compute an initial estimate of the mass profile without dynesty getting stuck in a local maximum (as methods with a pixelised source frequently do; N18, Maresca et al. 2020).
The Source Inversion (SI) pipeline then refines the lens galaxy's mass distribution by modelling the source galaxy using an adaptive pixelisation. This allows more realistic morphologies of the source galaxy to be recognised, which in turn improves the model for the lens galaxy's mass. The pixelisation and its pixel-to-pixel regularisation are described by a set of hyper-parameters (see Section 4.2 for more details) that are fitted as free parameters; these are first initialised using a Magnification-based Pixelisation and Regularisation (MPR) grid. The source model from this fit is then used to inform the BPR pixelisation, which adapts to the surface brightness of the source galaxy, thereby reconstructing areas of high flux with higher resolution and lower regularisation relative to areas of low flux.

The Light Parametric (LP) pipeline re-fits the lens galaxy's light profile. This produces a more accurate estimate of the lens galaxy's light than previously, because the lensed source galaxy's light is now reconstructed using the Voronoi pixelisation, thereby reducing residuals which would otherwise impact the lens light model fit. The lens light model is now composed of two Sérsic profiles (the second component now has a free Sérsic index). This fit is performed using broad uninformative priors and thus does not use any information about the lens galaxy's light profile estimated by the previous pipelines.
Finally, the Mass Total (MT) pipeline extends the complexity of the model of the lens galaxy's mass to that of the PLEMD (Equation 7), whereby the slope of the density profile γ is now a free parameter of the model. A uniform prior between 1.5 and 3 is assumed on the slope. To ensure robust error inference on the parameters of our final model, the MT phase is extended by re-running the same model with a 'likelihood cap' applied (see Section 5 for details). The term 'Mass Total' is used to distinguish this pipeline from the 'Mass Light Dark' SLaM pipeline, which is not used in this work; instead of fitting a mass model that represents the total mass distribution, that pipeline separately models the light and dark matter (Nightingale et al. 2019).

Results Database
Upon completion of a uniform pipeline there are dynesty samples of 11 different model-fits, alongside additional metadata describing quantities such as each parameter's estimate and errors, and the PyAutoLens settings. Across our sample of 59 strong lenses this creates over 500 lens modelling results, necessitating tools to automate their processing and inspection. PyAutoFit outputs all modelling results to a queryable SQLite database (Hipp 2020) such that they can be easily loaded into a Jupyter notebook or Python script post-analysis. By adopting PyAutoFit, all PyAutoLens results support this SQLite database, which is the primary tool we use for analysing lens modelling results.
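As a rough illustration of this workflow, the snippet below builds and queries a small SQLite database with Python's standard sqlite3 module. The table and column names are purely illustrative and do not reflect PyAutoFit's actual schema; in practice one would connect to the `.sqlite` file the pipeline writes.

```python
import sqlite3

# Minimal sketch of querying lens-model results from an SQLite database.
# The schema below is illustrative only; PyAutoFit's actual layout differs.
conn = sqlite3.connect(":memory:")  # use the results .sqlite file in practice
conn.execute(
    "CREATE TABLE fits (lens_name TEXT, einstein_radius REAL, pipeline TEXT)"
)
conn.execute("INSERT INTO fits VALUES ('J0008-0004', 1.16, 'MT_1_ext')")

# parameterised query: select results from one phase of the pipeline
rows = conn.execute(
    "SELECT lens_name, einstein_radius FROM fits "
    "WHERE pipeline = ? ORDER BY lens_name",
    ("MT_1_ext",),
).fetchall()
for name, r_ein in rows:
    print(f"{name}: R_Ein = {r_ein:.2f} arcsec")
```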

DEALING WITH NOISE IN LIKELIHOOD EVALUATIONS
N15 demonstrated that pixelised source reconstructions can be subject to a discretization bias that ultimately leads to the underestimation of errors calculated by a typical non-linear search. This is a result of discrete jumps in the likelihood as the lens model parameters are smoothly varied, which hinders convergence and parameter marginalisation. N15 suggests this may be a common problem for methods that employ pixelised sources. Here, we investigate the effects of the bias further using a large sample of mock observations.

Mock data sample
We create 59 synthetic lenses similar to our SLACS and GALLERY lenses, which approximately resemble the real data but have known truths. The mass distribution of each synthetic lens is a PLEMD; we set the radius b and ellipticity parameters ε1 and ε2 to those of the SIE lens model measured in previous lensing analyses (see Table 5 of Bolton et al. 2008b and Table 2). Where the slope of the density profile is not available, we instead use the values inferred by preliminary runs of our own strong lensing-only analysis. The surface brightness of each source galaxy is simulated as an elliptical Sérsic, the parameters of which are set to those inferred during preliminary runs of our Source Parametric pipeline (see Section 4 for more detail)1. The redshifts of the lens and source are set to those known for the corresponding real strong lens. For every synthetic lens configuration, we create six mock observations, each with a different realisation of observational noise. To mimic the HST observations, the lensed image of the source is generated with a pixel scale of 0.05″ (SLACS) or 0.04″ (GALLERY) and convolved with the instrumental point spread function (PSF) modelled from the actual image of the strong lens we are simulating. The synthetic images include a flat sky background of 37.5 (SLACS) or 31.5 (GALLERY) electrons per second and six different realisations of Poisson noise. We choose not to simulate light from the lens galaxy, since this has the potential to introduce systematic effects that we are not interested in investigating with this sample (see Section 5). Across the resulting suite of 354 synthetic observations, the S/N of the brightest pixel in each image ranges from 4 to 68. Figure 2 compares a subset of simulated mock lenses with their real data counterparts.
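The noise-realisation step above can be sketched as follows. The image, sky level, and exposure time are stand-in values for illustration, not the actual simulation code: Poisson noise is drawn on the total counts (source plus sky), then the known flat sky level is subtracted back off.

```python
import numpy as np

# Sketch of generating six noise realisations of one mock lens image.
# `clean_image` stands in for the noiseless, PSF-convolved lensed source
# in electrons per second; the exposure time is illustrative.
sky_background = 37.5                   # e-/s, SLACS-like value
exposure_time = 2000.0                  # s, illustrative
clean_image = np.full((10, 10), 2.0)    # stand-in for the simulated image

realisations = []
for seed in range(6):                   # six realisations per lens
    rng = np.random.default_rng(seed)
    counts = (clean_image + sky_background) * exposure_time
    noisy_counts = rng.poisson(counts)  # Poisson noise on total counts
    # convert back to e-/s and subtract the (known) flat sky level
    realisations.append(noisy_counts / exposure_time - sky_background)
```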

The origin of discretization bias and error underestimation
First, we investigate how discretization bias manifests in PyAutoLens, whose source pixelisation differs in its implementation from N15 and N18. This is illustrated in Figure 3, which plots the variation of the log likelihood of a lens model when changing only the slope parameter γ of the mass distribution (fixing all other parameters to their true values).

Figure 2. We create a sample of mock lenses that closely resemble each of the 59 SLACS and GALLERY lenses in our observed data sample, which we use for testing for data discretization bias. We show eight of these mock images (right panel) alongside the real data image they were simulated to represent (left panel, with lens name). The mock images are simulated without light from the lens galaxy; as such, we compare to the data images where the lens galaxy's (double Sérsic) light profile has been subtracted.

Figure 3. Comparison of the log likelihood as a function of density profile slope when using a parametric source (pink curve) or Brightness-based Pixelisation and Regularisation (BPR) pixelisations to fit mock data. All model parameters other than the slope are fixed to their true values. The yellow line reveals the full level of noise in the likelihood due to the particulars of the source-plane pixelisation, by using a new random seed for the k-means algorithm that pixelates the source plane for every likelihood evaluation. The other three colours use fixed k-means seeds, as is done throughout the rest of this paper.

The parametric source model produces a smooth likelihood curve. The BPR pixelisation methods produce a higher likelihood,
but one that is subject to seemingly random noise. These 'spikes' in log likelihood occur over small ranges in the slope parameter; at least an order of magnitude smaller than the errors one infers for γ when fitting this lens with a parametric source. This confuses the nested sampler, which converges to positive spikes in likelihood occupying tiny volumes of the multi-dimensional parameter space, and thus significantly underestimates the total statistical uncertainty.
To perform a source reconstruction using a pixelised source, one must first define a procedure that determines the shape and locations of the source-plane pixels: its discretization. For example, in the case of PyAutoLens, one can alter the random seed that determines the centres of the Voronoi source pixels. This element of choice makes the likelihood ill-determined, as demonstrated in Figure 3 by the three different realisations of noise uncovered for the differently seeded grids (the only difference between the fits that produce the blue, orange, and purple likelihood surfaces is the choice of k-means seed that determines the source-pixel centres). If we instead pass a random k-means seed to each individual fit (the yellow curve in Figure 3), the full scale of the noise due to different source discretizations is revealed: likelihood evaluations of almost identical lens models can yield very different likelihood values when the source pixelisation changes. Sampling the parameter space when using a random k-means seed is therefore prohibitively slow, ultimately leading to the non-linear search becoming stuck and being unable to converge.
In fact, repeat likelihood evaluations of an identical lens model also yield different likelihood values if the source pixelisation's discretization changes. Figure 4 shows the result of doing exactly this, where log likelihood values are computed using an identical lens model 500 times (we use the best fit lens model parameters from our fitting procedure to do this), with each computation using only a different Voronoi mesh to reconstruct the source. The three different coloured histograms show the results of this procedure for three of the six noise realisation images of a lens, that arrive at three different best fit lens models. In all cases, the histograms of log likelihood values show that changes in log likelihood of order ∼ 50 are possible by just changing the source pixelisation. To perform parameter estimation, changes in log likelihood of order ∼ 10 define how precisely a parameter is estimated at ∼ 3σ confidence. Thus, if our log likelihoods can fluctuate by of order ∼ 50 in a seemingly arbitrary way, this will be detrimental to parameter and error estimation.
Why does the log likelihood vary when we change the source pixelisation? For a given lens model, there are certain source pixelisations where the centres of their Voronoi source pixels line up with the locations of the traced image-pixels mapped from the image data in a 'preferable' way. This alignment allows the source pixels to reconstruct the image data more accurately, in a way that is penalised less by regularisation (see S06). This produces what we consider an artificial 'boost' in likelihood. Conversely, other pixelisations have a less fortuitous alignment, such that their reconstruction of the image data is worse and they are more heavily penalised by regularisation, producing an artificial 'drop' in log likelihood. Figure 4 shows that the distribution of log likelihoods appears to be Gaussian, a property we will use when we put forward a solution to this problem.
We are now in a position to explain the spiky likelihood surface shown for the fixed-seed BPR pixelisations in Figure 3.

Figure 4. Histogram of log-likelihood values from re-fitting the best-fit model with a new k-means seed 500 times, while keeping the model parameters fixed. The dashed line is the fitted Gaussian curve to these values. The vertical line shows the maximum likelihood value of the best-fit parameters found without a likelihood cap, which is always boosted by noise to extremely high likelihood. For clarity we show three of the six distributions from different noise realisation images of the mock lens; the same behaviour is evident in the three distributions not shown here.

Let us first consider in more detail the BPR pixelisation implemented in PyAutoLens. To construct the source-pixel centres, the BPR pixelisation applies a weighted k-means algorithm in the image plane to determine a set of coordinates that are adapted to the lensed source's surface brightness. This k-means algorithm is seeded such that the same image-plane coordinates are inferred if the procedure (using the same inputs) is run multiple times (thus the completely random changes to the source pixelisation used to construct the histograms shown in Figure 4 cannot explain these likelihood spikes). These image-plane coordinates are then ray-traced via the mass model to the source plane and are used as the centres of the source pixels of the Voronoi mesh. To produce the blue, orange, and purple curves shown in Figure 3, the coordinates that construct the source pixelisation are therefore fixed in the image plane, but vary smoothly with the mass model in the source plane. The spiky likelihood surface can therefore be explained by how the continuous change in the positions of the source pixels generating the Voronoi pixelisation produces discrete changes in the Voronoi mesh (either creating new cells or changing the value of flux across cell boundaries; these changes may occur less frequently with interpolation of the source pixel grid). The reconstruction then receives boosts and drops in log likelihood because, for certain mass models (values of γ), the positions of the source pixels align more or less favourably with the data.
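A minimal, numpy-only sketch of a seeded, brightness-weighted k-means (not PyAutoLens's actual implementation) illustrates why fixing the seed makes the source-pixel centres deterministic for identical inputs:

```python
import numpy as np

# Illustrative sketch of the seeded, brightness-weighted k-means step:
# image-plane coordinates are clustered with weights proportional to the
# lensed source's surface brightness. Fixing `seed` makes the resulting
# pixel centres deterministic for identical inputs.
def weighted_kmeans_centres(coords, weights, n_pixels, seed, n_iter=20):
    rng = np.random.default_rng(seed)
    # initialise centres by drawing coordinates with brightness weighting
    idx = rng.choice(len(coords), size=n_pixels, replace=False,
                     p=weights / weights.sum())
    centres = coords[idx]
    for _ in range(n_iter):
        # assign each coordinate to its nearest centre
        d2 = ((coords[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(n_pixels):
            mask = labels == k
            if mask.any():  # move centre to the weighted mean of members
                w = weights[mask]
                centres[k] = (coords[mask] * w[:, None]).sum(axis=0) / w.sum()
    return centres
```

Running this twice with the same seed returns identical centres, mirroring how the fixed-seed fits in Figure 3 differ only through their seed choice.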

Testing for error underestimation in lens modelling
In the context of a full non-linear search which varies every lens model parameter, we expect that likelihood spikes due to this preferable alignment of the source pixelisation with the data will be present, negatively impacting our inference on each parameter's PDF. To investigate this, we fit the full sample of 354 mock images (see Section 5.1) with a uniform pipeline constructed from the SLaM pipelines in PyAutoLens. The pipeline is the equivalent of that described in Section 4.3.3, but created for fitting images without the lens galaxy's light distribution (see Appendix A1 for an overview of the pipeline). The pipeline thus infers the mass parameters of the lens galaxy described by a PLEMD, while reconstructing the source galaxy on a BPR pixelisation. We choose not to fit for an external shear (which is not present in the lens models of the simulated data) in order to simplify our investigation of likelihood boosts. Our main goal here is to determine whether the error estimates inferred by the non-linear search are underestimated as a result of the data discretization bias. Figure 5 shows the posterior PDFs obtained for individual runs of three lenses in our mock sample. For each lens, six realisations of the image data with different noise maps were simulated and fitted, corresponding to the six individual PDFs shown on each panel of Figure 5. Not only do the PDF contours rarely contain the true parameter (represented by the grey dashed lines), they also rarely overlap with each other. To verify this is due to data discretization bias, for each of the 354 synthetic images we now produce 500 new likelihood evaluations, fixing all lens and source model parameters to the best-fit values but randomising the k-means seed used to pixelate the source plane.
For 94.6% of these 177,000 calculations, the new likelihood is lower than the best-fit model likelihood, indicating that the likelihood values inferred by dynesty were systematically boosted relative to the majority of possible source pixelisations. Figure 4 shows this for three example cases, where the solid lines show the maximum log likelihood model inferred via dynesty compared to a histogram of these 500 models drawn using random k-means seeds.
The likelihood-boosted solutions inferred by dynesty occupy a tiny volume of parameter space, such that parameter marginalisation significantly underestimates the width of the posterior PDF. For each of the lens model parameters we calculate the percentage of the 354 model fits that recover the true parameter within their 1D marginalised 68.7%, 95%, and 99% credible regions (blue bars in Figure 6). On average over all lens model parameters, the truth is recovered only 30% and 50% of the time at the 68.7% and 95% credible regions, respectively; these coverage probabilities are significantly smaller than the percentage credible regions they were calculated for -- the reported uncertainties are too small.
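The coverage calculation described above can be sketched as follows, assuming 1D marginalised posterior samples are in hand for each mock fit. The function and variable names are illustrative:

```python
import numpy as np

# Sketch of the coverage test: for each mock fit, check whether the true
# parameter lies inside the 1D marginalised credible interval, then report
# the fraction of fits for which it does. A well-calibrated posterior
# should give coverage close to the nominal level.
def coverage(samples_per_fit, truths, level=0.687):
    lo, hi = 50 * (1 - level), 50 * (1 + level)  # e.g. 15.65th/84.35th pct
    hits = [
        np.percentile(s, lo) <= t <= np.percentile(s, hi)
        for s, t in zip(samples_per_fit, truths)
    ]
    return float(np.mean(hits))
```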

Likelihood Cap for improving sample statistics
We now investigate the efficacy of placing a 'log likelihood cap' on the non-linear search, where this cap is estimated in a way that seeks to smooth out likelihood spikes in parameter space. The cap is computed by taking the maximum likelihood lens model inferred by the MT 1 search in the SLaM pipeline and computing 500 likelihood evaluations using this model, each with a different k-means seed. This process produces the histograms shown in Figure 4, which are fitted with a Gaussian whose mean then acts as the log likelihood cap. We then repeat the final MT 1 search of the pipeline (with identical parameters, hyper-parameters, k-means seed, etc.), but any log likelihood evaluation now returns no more than this value. If a log likelihood is computed above this cap, it is rounded down to the cap's value before it is returned to dynesty; we note that this assumes that dynesty has not converged on a local maximum in MT 1.

Figure 5. For three typical synthetic lenses, the posterior PDF of model parameters inferred from mock observations. With a likelihood cap (yellow), these PDFs have sufficient width to include the true value (crossed lines). Without a likelihood cap, the PDFs from mock data with different realisations of observational noise (six other colours) are too narrow because of noise in the likelihood evaluations. Fitted parameters shown are the mass-density slope (γ), mass normalisation (b), and two components of ellipticity (ε 1 , ε 2 ); all other free parameters are marginalised over.

The yellow shaded contours in Figure 5 show the PDFs inferred by MT 1 ext using this log likelihood cap, which now appear larger and smoother, and do not have undesirable properties such as islands and discontinuities that are seen for the PDFs inferred without this cap.
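A minimal sketch of the cap computation and its application, assuming the 500 repeated log-likelihood evaluations are already in hand (for Gaussian-distributed values, the fitted Gaussian's mean is simply the sample mean):

```python
import numpy as np

# Sketch of the likelihood cap: fit a Gaussian to repeated log-likelihood
# evaluations of the best-fit model under different k-means seeds, take its
# mean as the cap, and round any higher evaluation down to the cap.
def compute_cap(log_likelihoods):
    # the maximum-likelihood Gaussian mean is the sample mean; the fitted
    # standard deviation is not needed for the cap itself
    return float(np.mean(log_likelihoods))

def capped_log_likelihood(log_likelihood, cap):
    # value returned to the nested sampler in place of the raw likelihood
    return min(log_likelihood, cap)
```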
When performed on our 354 synthetic images, the final parameter estimation now converges more consistently for different realisations of noise (for the sake of visual clarity, Figure 5 only shows one PDF, but all six PDFs now overlap for each dataset). The coverage probabilities for the 1D marginalised 68.7%, 95%, and 99% credible regions have increased significantly for all lens model parameters with the use of the likelihood cap (see Figure 6 for the comparison with and without the likelihood cap). On average, the true lens model parameters are recovered 61% and 80% of the time at the 68.7% and 95% credible regions, respectively. Although we do not obtain full coverage, this is a significant improvement in error estimation compared to not including the likelihood-capped phase. Furthermore, for each lens model parameter we compare to the mean of the best-fit values of the six noise realisations, and find that these are recovered 74% of the time at the 68.7% credible region on average over all parameters. This suggests that the likelihood cap is producing errors that are consistent with the uncertainty due to random noise in the image, and that our posteriors recover the true values slightly less frequently than hoped due to systematic biases in particular lens configurations that offset the inferred parameters from the truth.
Further testing is necessary to understand the systematics that result from the source discretization bias, as well as any systematic offsets in inferred lens parameters for particular lens configurations. This would require a larger set of mocks than was simulated for this study (see Section 7.4 for more discussion) and is beyond the scope of this work. At present, it appears that the likelihood cap is effective at improving the coverage probability of the 68.7% credible region (only 7% shy of achieving coverage for lens model parameters on average). Since the mock data were simulated to be representative of the observed data, we assume the same holds for the errors on the observed data when adopting the same approach.
As such, all errors quoted in this work are those at the 68.7% credible region of the PDFs inferred by the likelihood capped MT 1 ext phase.

Automation
We now inspect the results of our automated modelling procedure on the SLACS and BELLS GALLERY samples and quantify what fraction of lenses were fitted with a reliable lens model without human intervention. To facilitate this, we visually inspect every lens model, first after the SP pipeline and then again on completion of the uniform procedure. We label the final model of every lens in one of four categories:
• Gold (54/59): The fit represents a physically plausible model of the lens and source.
• Silver (4/59): The fit represents a physically plausible model of the lens and source. However, achieving this required changes to data pre-processing that may not be easy to automate (e.g. masking, lens light subtraction), and may degrade the quality of the inferred lens model.
• Bronze (1/59): The fit represents a physically plausible model of the lens (with the correct number of multiple images), but other features in the data (e.g. residuals from lens light subtraction) visibly degrade the quality of the source model.
• Failure (0/59): The fit produces a physically implausible lens model (e.g. with an incorrect number of multiple images).
After a first blind run, we find 9 galaxies outside the "Gold" sample. In 8/9 cases, they went wrong during the first SP pipeline. We determine what went wrong, describe simple interventions, and rerun the pipeline. Our interventions successfully move all of these lenses into the "Bronze", "Silver" or "Gold" categories. Through this process, we suggest ways to reduce the failure rate in analyses of future large samples of lenses. For future analysis of large lens samples, one can anticipate undergoing this process on a subset of lenses before modelling the full sample.
If a lens ends up in the "Gold", "Silver", or "Bronze" categories, we consider its effective Einstein Radius R Ein,eff to be measured accurately. If a lens is in the "Gold" or "Silver" categories, we also consider more detailed quantities of the mass model (e.g. the slope γ) to be reliable. Indeed, we shall find our best-fit models broadly consistent with those from previous literature, in Sections 7.2 and 7.3.

Fully automated success
We immediately place 50/59 lenses (85%) in the "Gold" sample after the first, blind run of our uniform pipeline. These models show low levels of residuals and physically plausible source galaxy morphologies. Best-fit model parameters are listed in Tables 2 (SLACS) and 3 (GALLERY), and reconstructions are shown in Appendix C.

Semi-automated success
Fits to 4/59 lens systems converge to models with the wrong number of lensed images. In all four cases, the fits incorrectly converge to a highly elliptical mass distribution early in the SP pipeline, and could not recover the better solution in the SI or subsequent pipelines. The model of J1451−0239 fits 4 images to what is (by eye) a 2 image system (Figure 7). Fits to J0237−0239 and J0856+2010 converge to single-image models, each missing a central counter-image that is close to the centre of the lens galaxy and therefore difficult to disentangle from the lens galaxy's light (Figure 8). The model of J0841+3824 is multiply imaged, but its very faint counter image is in the wrong location (Figure 9).
We fixed this by rerunning the pipeline for these lenses, but restricting the SP 2 phase to more circular mass models, via a uniform prior εi ∈[−0.2, 0.2], instead of a Gaussian with σ = 0.3. To better find the global maximum likelihood solution for lenses J0237−0239, J0841+3824, and J0856+2010, we also increased the number of dynesty live points to 600 from 200 in SP 2 (this was not necessary for J1451-0239, where a change has no consequences other than increased runtime). With these settings, the automated modelling procedure is a success and the models (also shown in Figure 1) are moved into the "Gold" sample.
These fits can be easily fixed by a more restrictive (or an all-round better) early initialisation. Our solution of forcing fairly circular models works well for early-type galaxy lenses, but would need to be rethought if the sample could include late-type galaxies with (edge-on) discs. Since spectroscopic lens detection techniques also identify the lens galaxy type, a different prior could be used for each.
For now, we conclude that the biggest challenge of scaling up lens modelling to large samples is fitting an initial, physically plausible lens model. Once a simple lens model is correctly initialised, nothing prevents subsequent convergence of increasingly complex distributions of source light and lens mass. We shall discuss this further in Section 7.1.

Success with human intervention
Fits to 3/59 lens systems converge to a model in which imperfect lens light subtraction has left a spurious, residual ring of lens light that becomes considered part of the source. This again happens during the early SP pipeline, after which the Sérsic model of the source is too large (Figure 10a). Subsequent pixelised source models also include the residual lens light. Unlike the previous failure modes, we could not find small changes to the automated pipeline that fix these model fits.
For lenses J1153+4612, J1016+3859, and J0959+4416, we instead use the b-spline subtracted data (Section 3.2). These versions pre-subtract the lens galaxy's light more cleanly than our double Sérsic fit. Even then, we mask small remaining residuals near the centres of J1153+4612 and J1016+3859. We finally refit all three lenses using the version of the pipeline (which was also used for the mock data) that does not fit the lens light. This results in successful models, as assessed by our visual inspection criteria (Figure 10b). Although we arrive at successful model fits, we categorise these lenses in the "Silver" sample, because the lens light was not fitted in a Bayesian manner.
The fit to 1/59 lens systems includes a counter-image that reproduces a residual knot of lens light emission instead of the adjacent but fainter true counter-image (Figure 11). It can be fixed by masking the knot of lens light and rerunning the pipeline. However, this process would be difficult to automate with monochromatic imaging, so we place J1416+5136 in the "Silver" sample.

Table 2. Best-fit physical parameters for SLACS lenses. These are derived quantities, obtained from the varied parameters of the lens mass model (Table B1). Lens light model parameters are presented in Table B4.

Table 3. Best-fit physical parameters for BELLS GALLERY lenses. These are derived quantities, obtained from the varied parameters of the lens mass model (Table B2). Lens light model parameters are presented in Table B3.

Figure 11. The lens requires rerunning with our own double Sérsic subtracted data using the without-lens-light pipeline, as well as a custom mask, to arrive at the successful "Silver" model fit.

Remaining problematic lens
The lens J1103+5322 is the only system that is unable to pass our visual inspection criteria on completion of the uniform pipeline. In the SP pipeline, the model captures the global lensed structure of the source; however, significant residuals are present in the fit. The lens light subtraction leaves a quadrupole-like feature in the centre of the subtracted image, as well as flux extending past the Einstein-ring feature. The SP pipeline is able to fit a model that fits solely the source light; however, continuation of the pipeline leads to a final model that reconstructs the lens light residual structure, which in Figure 12 can be seen to extend far beyond the emission from the source. This feature could impact the measurement of parameters which depend on the gradient of the flux in the lensed source, like the slope of the mass model. Replacing the data with the b-spline subtracted data resulted in similar residual lens light emission being reconstructed by the source galaxy. Nevertheless, we believe that this model estimates R Ein,eff accurately: our measurement is within 5% of previous literature measurements (see Section 7.2 for a discussion on the expected uncertainty between these methods). As a result, we place this lens in our "Bronze" sample.

Effect of the Likelihood Cap
In Section 5 we demonstrated the necessity of a likelihood-capped phase (MT 1 ext ) to increase the formal statistical errors inferred by the non-linear search such that they better recovered the true parameters on mock data. We now quantify the effect this phase has on the uncertainties inferred on real data (see Figure 13 for its effect on the density profile slope errors). On average, we find that this approach has increased the inferred non-linear search errors by a factor of ∼ 5, as assessed by the median of the individual factor increases for all mass model parameters. We quote the median increase to avoid bias from 5 lenses whose errors increase by a factor of over 1000 upon introduction of the log likelihood cap. On investigation, we found these lenses correspond to those with the largest difference between the likelihood inferred in MT 1 and the likelihood cap applied to MT 1 ext (defined as the mean of 500 repeated likelihood evaluations with the same mass model, but different data discretizations). Hence, these lenses are the ones that were in the most "likelihood-boosted" regions of parameter space and as a result significantly underestimated the error. In the most extreme example, J0755+3445, the error inferred on the slope parameter with a likelihood cap is 64453 times larger than that inferred without a cap (see Ritondale et al. 2019, for a discussion of this lens). This highlights the degree to which parameter uncertainties can be misestimated without consideration of the source discretization bias. Further quantification of the average errors inferred at the 68% credible region for each mass model parameter, with and without a likelihood cap, is given in Table 4.
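The choice of the median over the mean can be illustrated with a few synthetic error ratios that echo the scale of the values reported above (these numbers are illustrative, not the measured sample):

```python
import numpy as np

# Illustrative error-increase factors (capped error / uncapped error).
# The two huge values mimic the handful of lenses whose errors grow by
# factors of >1000; the median summary is insensitive to them.
ratios = np.array([2.0, 3.5, 4.8, 5.2, 6.0, 1500.0, 64453.0])
median_increase = float(np.median(ratios))  # robust summary
mean_increase = float(np.mean(ratios))      # dominated by the outliers
```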
Of all the mass model parameters, the likelihood cap has the largest effect on the density profile slope. The median factor increase in the error size before and after the cap is 24. The distribution of the 68% credible region errors with and without the cap are plotted in the right panel of Figure 13. Notably there are two extreme outliers in the distribution of errors inferred without a cap, that are the two largest errors inferred across both distributions. For the lenses J1016+3859 and J0959+4416, both of which were replaced with b-spline subtracted data as an intervention to achieve model fits, the error actually decreases when the likelihood cap is applied. Although the uncertainty on the slope measurement is in general, as expected, significantly increased in MT 1 ext relative to MT 1 , the distribution of slopes inferred does not change significantly (left panel of Figure 13). The mean increases from 2.08 to 2.12, and the standard deviation increases from 0.21 to 0.24.
We derive errors on the effective Einstein radius by calculating a posterior PDF from all possible effective Einstein radii given the accepted non-linear search samples and their weights. We find the inclusion of the likelihood cap increases the mean 68% credible region error on the effective Einstein radius from 0.3% to 1.1%, and does not affect the distribution of R Ein,eff we infer (see Figure 14). This suggests that, on average, the Einstein radius can be measured to ∼ 1% uncertainty, taking into account uncertainties due to the noise and source discretization. We note that this does not account for any systematic error that would result from discrepancies between the assumed mass model and the underlying mass distribution. However, although the mean uncertainty on R Ein,eff is low, two lenses (J0841+3824 and J1116+0915) have anomalously large uncertainties of 8.6% and 6.6% respectively. Hence, for some lens configurations it appears the Einstein radius cannot be determined with such certainty. This may be an indication that the underlying mass distribution for these lenses is more complex than the PLEMD that we assume in our model fits. This seems reasonable for these two lenses, since J0841+3824 is one of the few disky galaxies in the sample, with obvious extended spiral features in the data, and J1116+0915 contains a visible mass clump to the North of the lens that we do not fit for with our uniform approach.
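The error derivation described here can be sketched as a weighted credible interval over the accepted samples. The function below is an illustrative implementation, not PyAutoLens's code: `radii` are the effective Einstein radii of the accepted samples and `weights` their importance weights from the nested sampler.

```python
import numpy as np

# Sketch of a 68% credible interval from weighted nested-sampling output:
# sort the samples, build the weighted CDF, and read off the quantiles.
def weighted_credible_interval(radii, weights, level=0.68):
    order = np.argsort(radii)
    radii, weights = radii[order], weights[order]
    cdf = np.cumsum(weights) / np.sum(weights)
    lo = radii[np.searchsorted(cdf, 0.5 - level / 2)]
    hi = radii[np.searchsorted(cdf, 0.5 + level / 2)]
    return lo, hi
```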

What drives the precision of a lens model?
To investigate what properties of the lens or data (if any) drive the precision of the lens model, we measure correlation coefficients between the statistical uncertainty on the effective Einstein radius and observable properties of the lens galaxy: the Einstein radius itself, the ratio of the Einstein radius to the effective radius, the lens redshift, the velocity dispersion of the lens, and the peak S/N of the source (Figure 15). Linear fits show no clear trend with most of these parameters. The only non-negligible correlation (defined as a non-zero gradient with >3σ significance) is with the Einstein radius. The correlation remains when we repeat the linear fit removing the two uncertainties larger than 5% that could bias the relation, although the effect size reduces by over a third (Table 5).

Figure 15. Inferred percentage errors on the Einstein radii at the 68% credible region as a function of observable properties of the lens galaxy, and the S/N of the source. Parameters for linear fits to these data are given in Table 5.
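The trend test can be sketched with scipy's linregress on synthetic data containing a deliberate trend; the >3σ criterion compares the fitted gradient to its standard error. All numbers below are illustrative, not the measured sample values.

```python
import numpy as np
from scipy import stats

# Illustrative significance test for a linear trend between percentage
# Einstein-radius error and a lens property (synthetic data with a slope).
rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, 59)                 # e.g. Einstein radius (arcsec)
y = 0.5 + 0.8 * x + rng.normal(0.0, 0.2, 59)  # percentage error with a trend
fit = stats.linregress(x, y)
# non-negligible correlation: gradient inconsistent with zero at >3 sigma
significant = abs(fit.slope / fit.stderr) > 3.0
```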

DISCUSSION
7.1 Can we truly leave no lens behind?
The success of our uniform pipeline makes us optimistic for the future of automated strong lens analysis. We initially fitted 50/59 (85%) lenses in a blind run. We increased this to 54/59 (92%) "Gold" lenses after tweaking model priors, 58/59 (98%) "Gold" or "Silver" lenses with some pre-fitting and masking of lens light, and 59/59 (100%) including one successful model of the lens whose model of the source includes poorly-subtracted residuals of lens light. With just one pipeline, we have inferred parameters for 59/59 lenses that measure the lens galaxy's Einstein radius and other mass distribution parameters (of the power-law profile with an external shear we assume) that depend on only the first derivative of the potential of the lens galaxy. For 58/59 systems, we measure parameters describing their mass (including the parameters that depend on the gradient of the source flux such as γ). As well as this, we reconstruct a de-lensed image of the source galaxy, enabling study of its morphology. For 54/59 systems, we measure parameters describing their mass distribution and light distribution (as a double Sérsic profile) as well as reconstructing a de-lensed image of the source galaxy.

The most challenging step in automating lens modelling is in the initial estimation of a simple lens model (in this work, we use an SIE plus shear). Notably, once our early SP phases arrived at a successful fit to this model, the rest of our pipeline always ran to completion, successfully increasing the model complexity. We therefore recommend that effort to further improve automation should focus on 'lens model initialisation' and find ways to avoid or flag the problematic solutions that occur at early stages of the analysis. Provided that our sample of lenses is representative of the larger population of lenses that will be discovered by future surveys, this strategy will lead to a high success rate for even complex mass fits and reduce the need for visual inspection of the results.
An obvious starting point to improve lens model initialisation by PyAutoLens would be to further simplify the non-linear parameter space of the SP pipeline, for example by assuming models for the lens and source light with fewer parameters. Convolutional Neural Networks (CNNs) have also been suggested as a fast method for automated lens fitting (Hezaveh et al. 2017; Levasseur et al. 2017; Morningstar et al. 2018). They provide a particularly compelling solution to the problem of lens model initialisation. For example, Pearson et al. (2021) combined a CNN with PyAutoLens, using models from the CNN to initialise the mass model priors of a PyAutoLens model-fit. In the majority of cases tested on mock data, the authors argued that a combination of the two methods outperformed either method individually. Indeed, the strengths of a CNN (fast run-times, avoidance of unphysical solutions) complement the weaknesses of Bayesian inference approaches like PyAutoLens. It is conceivable that a CNN could replace PyAutoLens's initial lens model fits altogether, allowing the method to focus entirely on fitting more complex lens models with well characterised errors: a task better suited to PyAutoLens's fully Bayesian approach than to a CNN. At the least, a CNN might be able to identify and isolate which lenses will eventually make the gold sample, reducing manual intervention (Maresca et al. 2020). CNNs will also have an as-yet unquantified fraction of failures. If the lenses where a CNN fails are different from those where traditional model-fitting approaches fail, combining the two may be key to maximising the success rate of lens model initialisation.
The second major challenge for automated lens modelling is deblending the foreground lens light. Within our sample, PyAutoLens could not deblend the lens and source light in 3/59 systems, and visual inspection was required to recognise these bad fits. In these cases, we instead used b-spline fits created via a time-consuming manual process. This issue will be more prevalent in Euclid, owing to its lower spatial resolution than HST and to lens samples with smaller Einstein radii (Collett 2015), both of which move the source's light closer to that of the lens. Furthermore, our analysis included pre-processing steps that manually removed the light of foreground stars and interloper galaxies via a GUI, a task which is too time-intensive for an individual scientist to perform on larger samples of lenses.
We propose two directions for future work that could improve automatic deblending. First, there are CNN architectures dedicated to the tasks of image deblending and segmentation (these architectures do not attempt to estimate the lens model parameters). These have been applied successfully to galaxy catalogues (Burke et al. 2019; Hausen & Robertson 2020) and in studies of strong lenses (Rojas et al. 2021), with multi-wavelength imaging seen to improve deblending quality. Alternatively, this task seems well suited to citizen science (Küng et al. 2015; Marshall et al. 2016; More et al. 2016), whereby members of the public could use a GUI to mark out regions of the data they believe correspond to the lens, source and other objects. The desired output of either approach is a set of pixel-level masks describing where the source, lens and other objects are in the image data, which could be used for the automated removal or masking of contaminating light before lens modelling begins.
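As an illustration of the last point, given a pixel-level segmentation map (however it is produced), contaminating light can be "masked" before modelling by inflating the noise in the affected pixels, so that they carry essentially zero weight in a chi-squared likelihood without altering the data itself. This sketch uses hypothetical arrays and label conventions, not PyAutoLens's actual data structures:

```python
import numpy as np

# Hypothetical image, noise map, and integer segmentation map in which
# 0 = sky, 1 = lens, 2 = source and 3 = interloper objects.
rng = np.random.default_rng(42)
image = rng.normal(size=(100, 100))
noise_map = np.ones_like(image)
segmentation = np.zeros((100, 100), dtype=int)
segmentation[10:20, 10:20] = 3  # an interloper galaxy

# Inflate the noise of interloper pixels so they carry ~zero weight
# in a chi-squared likelihood, without altering the data itself.
masked_noise = noise_map.copy()
masked_noise[segmentation == 3] = 1e8

chi2_weight = 1.0 / masked_noise**2
assert chi2_weight[15, 15] < 1e-15  # interloper pixels are effectively ignored
assert chi2_weight[50, 50] == 1.0   # all other pixels are unchanged
```

The same segmentation map could equally drive pixel removal from the fitted region; noise inflation has the advantage of preserving the image geometry for PSF convolution.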

Einstein radius measurements and uncertainty
The statistical precision with which the Einstein radius can be measured is promising for many possible scientific studies. For example, the method proposed by Sonnenfeld & Cautun (2021) to constrain the population-level parameters of lens galaxies relies on being able to accurately measure the Einstein radii of the sample of galaxies. Previous studies have attempted to account for the very small formal statistical uncertainties on model parameters (in particular those inferred with parametric source methods) and associated systematic uncertainties by comparing the fractional difference of parameter estimates obtained with different approaches. Bolton et al. (2008b) and Sonnenfeld et al. (2013b) reported a typical expected systematic uncertainty on the Einstein radius of ∼2-3%. This value of uncertainty is often adopted over (or combined with) that determined from the non-linear search. Furthermore, given that the Einstein radius is expected to be model-independent, it is assumed that this amount of uncertainty accounts for differences in the assumed parameterisation of the mass model.

Figure 16. The Einstein radii measured by PyAutoLens (R_Ein^AL) are generally consistent with those measured by previous analyses of the SLACS (Bolton et al. 2008a) and GALLERY (Shu et al. 2016c) lenses (R_Ein^lit). This shows the fractional difference between new and old measurements, as a function of PyAutoLens axis ratio, q_AL.

Einstein radii compared to previous measurements
In a similar fashion to Bolton et al. (2008b) and Sonnenfeld et al. (2013a) we now compare our measurements of Einstein radii with those that exist in the literature (see Figure 16) and estimate the uncertainty introduced as a result of the different methods. The full SLACS and GALLERY samples have previously been modelled with SIE profiles to measure the Einstein radii for supplementing a dynamical analysis of the lenses (SIE models of SLACS by Bolton et al. 2008a and SIE or SIE+shear models of GALLERY by Shu et al. 2016c). In this comparison, therefore, not only are the lensing methods very different, but we have also assumed the more complex PLEMD plus external shear (PL+ext) mass distribution for the lens galaxy. Compared to these previous measurements, we find Einstein radii with root mean square (RMS) fractional difference of 7.4%. This is larger than the (empirically motivated) ∼ 2-3% uncertainty that is typically assumed.
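The RMS fractional difference quoted above can be computed directly from two sets of Einstein radius measurements. The sketch below uses illustrative placeholder values, not the measurements from this work:

```python
import numpy as np

# Hypothetical Einstein radii (arcsec) for the same lenses from two analyses;
# these values are illustrative, not the measurements of this work.
r_ein_new = np.array([1.02, 0.85, 1.30, 0.95])
r_ein_lit = np.array([1.00, 0.90, 1.25, 1.00])

# Fractional difference for each lens, relative to the literature value.
frac_diff = (r_ein_new - r_ein_lit) / r_ein_lit

# Root mean square of the fractional differences, quoted as a percentage.
rms_frac_diff = np.sqrt(np.mean(frac_diff**2)) * 100.0
print(f"RMS fractional difference: {rms_frac_diff:.1f}%")
```

Note that the RMS mixes any systematic offset between the methods with lens-to-lens scatter; splitting the two (mean offset plus scatter about it) can be more informative for diagnosing the origin of the disagreement.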
Several differences between the methods could lead to variation between their inferred Einstein radii. Bolton et al. (2008a) and Shu et al. (2016c) model the background source using either a single or multiple Sérsic ellipsoid components, and both choose different approaches to the lens light subtraction than the one we adopt. While Bolton et al. (2008b) and Sonnenfeld et al. (2013a) investigated differences like these, neither was concerned with differences in the assumed mass model. Indeed, for a subset of 14 lenses that were also analysed by Shajib et al. (2021) assuming a PL+ext model, the RMS fractional difference is only 1.6%; this reduced scatter may be a result of fitting the same mass model. However, this is not a concern if the Einstein radius is indeed model-independent. Cao et al. (2020) provide good evidence for this, showing that the assumption of the PL+ext model induces only a 0.05 ± 0.17% systematic error on the Einstein radius relative to complex "MaNGA" mock data.
Notably, though, we find that five of the six lenses whose R_Ein differs by over 10% in the SLACS and GALLERY samples are accompanied by extremely large values of the external shear magnitude (ranging from 0.16 to 0.39) when fitted with our PL+ext models. If these high-shear lenses are removed from the comparison, the RMS fractional difference drops to 4.2%. Cao et al. (2020) also demonstrated that asymmetry in complex mass distributions can lead to the inference of spurious external shears. On average, they inferred an external shear magnitude of 0.015, despite the mock data being generated without external shear. In this work, we infer an average shear magnitude of 0.096 for the SLACS and GALLERY lenses. These large shear values may be partly a result of the additional complexity in the asymmetry of real lenses. Cao et al. (2020) required the Multiple Gaussian Expansion components that represented the stellar mass to share a common axis ratio and position angle, which may not be a realistic representation of the angular structure of real lenses (Nightingale et al. 2019). Given that it is the lenses with high external shears that differ most from previous literature measurements of R_Ein, we speculate that the assumption of a different mass model (in particular the assumption of external shear) may drive the larger fractional uncertainty. This would imply that the Einstein radius is less model-independent than is often assumed. Further work to test this hypothesis would be valuable.

Statistical uncertainty on Einstein radii
We now consider the size of the errors we measure on the Einstein radius, based entirely on our own PL+ext models. Our likelihood cap method (Section 5) addressed the small formal statistical uncertainties on the mass model parameters and allows us to infer uncertainties that account for differences in possible noise realisations and the choice of data discretization. Moreover, since pixel-grid methods have the flexibility to reconstruct the source with as much complexity as the data needs, they are not subject to the overfitting that occurs in parametric source methods due to overly simplistic source assumptions. With this approach, we measure a mean uncertainty on the inferred Einstein radius of ∼1%, albeit with a wide range of outliers, and 2/59 lens configurations exceeding 5%. Adopting a uniform uncertainty could therefore be problematic for some statistical inferences.
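The cap itself can be thought of as a ceiling applied to the log-likelihood values returned to the sampler, so that spurious upward spikes no longer dominate the posterior. The sketch below is our generic illustration of that idea, with the toy likelihood and threshold value being hypothetical placeholders rather than PyAutoLens's actual implementation (Section 5 describes how the cap is determined in practice):

```python
import numpy as np

def capped_log_likelihood(log_likelihood_fn, cap):
    """Wrap a log-likelihood so that values above `cap` are clipped.

    Spikes above the cap (e.g. from chance alignments of source pixels
    that are arbitrarily less penalised by regularisation) no longer
    dominate the posterior, suppressing the discretization noise.
    """
    def wrapped(params):
        return min(log_likelihood_fn(params), cap)
    return wrapped

# Toy example: a 1D log-likelihood with occasional upward noise spikes.
rng = np.random.default_rng(1)
def noisy_log_like(x):
    spike = 5.0 if rng.random() < 0.1 else 0.0
    return -0.5 * x**2 + spike

capped = capped_log_likelihood(noisy_log_like, cap=0.0)
values = [capped(0.0) for _ in range(100)]
assert max(values) <= 0.0  # all spikes are clipped at the cap
```

Because clipping flattens the very top of the likelihood surface, the posterior broadens, which is why the capped fits report larger (and better-calibrated) uncertainties.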
For example, determining the population-level parameters of hundreds of thousands of lenses, as described by Sonnenfeld & Cautun (2021), would be sensitive to such outliers, which would in turn increase the number of lenses required to make such a constraint. Moreover, the coverage probabilities of the lens model parameters with a likelihood cap (see Figure 6) did not quite reach the expected level, potentially indicating under-confidence in the posterior. Under-confidence in the posterior could bias estimates of the population parameters, for example by overestimating the scatter of the population (Wagner-Carena et al. 2021). We discuss the importance of further testing the confidence of the individual posteriors in Section 7.4. For comparison, Cao et al. (2020) inferred an average statistical uncertainty of 0.01% on the Einstein radius when fitting mock data simulated using "MaNGA" galaxies, without the use of a likelihood cap. This order of magnitude difference from the uncertainties inferred in this study is likely a combination of the likelihood cap increasing the errors in this work and differences in the quality of the data. Cao et al. (2020)'s mock lensed sources are simulated with S/N of 50 and have visible extended arcs (or complete Einstein rings), whereas the lenses with the largest errors on R_Ein in this work often appear closer to point-like. Furthermore, their mocks do not include the lens galaxy's light, a component which we have shown in this study can hinder the lens model fitting procedure.

Figure 17. The statistical uncertainty on a galaxy's total mass, when measured from its effective Einstein radius, does not degrade with lens redshift out to z ≲ 0.7 (top panel). This is in stark comparison to most astrophysical observables. For example, the uncertainty on a galaxy's total mass when measured from stellar dynamics (velocity dispersion) increases for more distant galaxies because of cosmological dimming and beam smearing (bottom panel).
Based on the empirical relations we derived in Section 6.2.2, the precision with which one can measure the Einstein radius is remarkably independent of a number of data properties and galaxy observables. For example, it might be expected that a higher S/N source galaxy image would tighten the constraints; however, this does not appear to be the case for the Einstein radius measurement. This is encouraging for future surveys that will not achieve as high S/N as the HST data used in this study.
The only parameter we investigate that exhibits a statistically significant correlation with the measurement uncertainty on the Einstein radius is the Einstein radius itself. Measurements of the Einstein radius become less certain for small Einstein radii, and therefore for low mass galaxies. This could be relevant for surveys such as Euclid that are forecast to detect samples of lenses with smaller Einstein radii (typically ∼0.5″ according to Collett 2015). Interpolating from our empirical relationship, fitted to the sample excluding the two lenses with anomalously large uncertainty, a lens with this Einstein radius should be measurable to ∼2.1% precision. However, since the pixel scale and PSF of the Euclid VIS instrument are roughly twice that of HST, this should be considered a lower limit.

Implications for studies of galaxy evolution
Notably, there does not appear to be a correlation between the lens redshift and measurement uncertainty on the Einstein radius. This highlights the power of strong lensing as a tool for investigations into galaxy evolution. If the lensing measurements do not degrade with redshift, then inferences of how galaxy properties evolve will be well constrained even to high redshift. This is in contrast to e.g. stellar dynamics data, where cosmological dimming effects reduce the certainty of the stellar velocity dispersion (and therefore dynamical mass) of distant galaxies. The increase in fractional uncertainty of the velocity dispersion, σerr/σ, within our SLACS and GALLERY samples is shown in Figure 17. Within both samples σerr/σ increases with redshift (the difference in fibre apertures used for SLACS and GALLERY means direct comparison of their errors is not straightforward, albeit they still highlight that in general higher redshift galaxy measurements are lower S/N).
This creates an interesting dichotomy between strong lensing and other methods of studying galaxy evolution. In lensing, provided we are able to find the lenses at the highest redshifts (surveys such as Euclid and the Vera Rubin Observatory will observe lenses at redshifts up to ∼2; Collett 2015), we can anticipate measuring their properties as well as those of lenses at lower redshifts. Issues that plague comparisons between the properties of low and high redshift galaxies via a technique like stellar dynamics, for example beam smearing (Tiley et al. 2019), will therefore be less problematic. However, whilst comparing their properties may be more straightforward, strong lens samples will have complicated selection effects (Sonnenfeld 2022) that a carefully constructed dynamics sample can more easily mitigate. The reduced lensing efficiency of lower mass galaxies may also restrict high redshift samples to only the most massive galaxies, albeit this is a limitation for most observing techniques. A strength of lensing, therefore, is that it offers a different means of studying galaxy evolution, one that complements the strengths and weaknesses of other techniques.

Measurements of other lens model parameters
In addition to the total mass enclosed within the Einstein radius, strong lensing information also constrains quantities like gradients of the distribution of mass and the amount of external shear. This is captured in a model-dependent way via the parameters of our PL+ext mass model (see Sonnenfeld & Cautun 2021 for a model-independent expression of this information). We now compare our measurements of the radial density gradient γ and shear magnitude γ_ext to measurements made in previous, independent analyses of overlapping sets of lenses. Shajib et al. (2021) modelled 23 SLACS lenses, including 14 in our sample. Like us, they used a uniform pipeline that simultaneously modelled the distribution of mass and light, and they too described the lens galaxy's light as a double Sérsic profile whose centres are aligned. However, unlike us, they fixed the Sérsic indices to n = 1 and n = 4 (the exponential and de Vaucouleurs profiles respectively) and joined the axis ratios of the two profiles. A major difference between the two techniques lies in the source reconstruction; Shajib et al. (2021) reconstructed the source using a Sérsic profile and shapelet basis functions.
For all but one lens, Shajib et al. (2021)'s measurements of γ and γ_ext are consistent with ours (Figure 18). For the discrepant lens J2300+0022, PyAutoLens infers γ = 2.55 and γ_ext = 0.08, compared to Shajib et al. (2021)'s γ = 1.85 and γ_ext = 0.03. We believe this discrepancy could be a result of the different order of operations in a model fit. Shajib et al. (2021) initialise their lens model assuming γ_ext = 0.0 and relax this assumption once other components of the model are fit. In contrast, the first mass model we fit in our analysis assumes priors on the shear parameters that allow values up to γ_ext = 0.2. Indeed, for J2300+0022 our search yields a best-fit shear of γ_ext = 0.07. Discarding this lens, we find a mean difference of −0.07 ± 0.07 between the slopes inferred by the two methods, where the error is propagated from the standard errors on the means of the two samples. On average, PyAutoLens measures slightly shallower slopes than Shajib et al. (2021), although this is not a significant difference: the mean discrepancy for the sample is consistent with zero at the current uncertainty level. A larger sample of measurements may be able to discern whether the lensing technique introduces systematic differences in the density slope. We note that we measure a scatter of 0.17 between the slope measurements, suggesting there may be systematic uncertainty between the two methods.

Ritondale et al. (2019) modelled 17 GALLERY lenses, including 15 in our sample. Although they do not adopt a uniform analysis pipeline, their lens modelling technique more closely resembles ours, because they reconstruct the source galaxy using a pixelisation. On average, PyAutoLens measures a 0.13 ± 0.21 steeper density slope (Figure 19). The scatter in this difference is comparable to the average uncertainty that we infer for the GALLERY lenses (0.11), but an order of magnitude larger than the average uncertainty inferred by Ritondale et al. (2019) (left panel of Figure 20). In fact, the uncertainties inferred by Ritondale et al. (2019) more closely resemble those from PyAutoLens before we used a likelihood cap to avoid source discretization bias (Section 5). This suggests that discretization bias may also affect the pixelised-source method of Ritondale et al. (2019). Conversely, the uncertainties derived by Shajib et al. (2021), whose analytic approach to source reconstruction cannot be affected by discretization bias, are similar to ours with the likelihood cap (right panel of Figure 20). It is reassuring that independent analyses yield results that are consistent in many ways. However, the relatively small number of lens systems common to multiple analyses prevents a much more detailed comparison between codes or modelling assumptions. The inconsistencies in other aspects of the results highlight an urgent need for larger-scale tests.

Figure 20. The statistical uncertainties on measurements of the radial gradient of the total lens mass reported by PyAutoLens are similar to those found by Shajib et al. (2021) for SLACS lenses (left). However, the uncertainty reported by Ritondale et al. (2019) for GALLERY lenses (right) is an order of magnitude smaller. That method uses a pixelised source, and may be subject to the source discretization bias that we discuss in Section 5.
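The mean slope offsets and propagated errors quoted in these comparisons follow from standard error propagation: the uncertainty on a difference of two sample means is the quadrature sum of the two standard errors. A minimal sketch with illustrative (not real) slope values:

```python
import numpy as np

# Illustrative density slopes from two methods for the same lenses
# (placeholder values, not the measurements of this work).
gamma_a = np.array([2.10, 1.95, 2.05, 2.20, 1.90])
gamma_b = np.array([2.00, 2.05, 2.10, 2.15, 2.00])

diff = gamma_a - gamma_b
mean_diff = diff.mean()

# Standard errors on the means of the two samples, combined in quadrature.
sem_a = gamma_a.std(ddof=1) / np.sqrt(len(gamma_a))
sem_b = gamma_b.std(ddof=1) / np.sqrt(len(gamma_b))
err = np.hypot(sem_a, sem_b)

# Lens-to-lens scatter between the methods, as quoted in the text.
scatter = diff.std(ddof=1)
print(f"mean difference = {mean_diff:+.2f} ± {err:.2f}, scatter = {scatter:.2f}")
```

A mean difference consistent with zero but a scatter much larger than the per-lens uncertainties is exactly the signature, discussed above, of an unmodelled systematic between the two techniques.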

Large-scale tests of lens modelling
A vital but unintended outcome of this paper is a solution to, and a better understanding of, the source discretization bias that previously caused parameter uncertainties to be underestimated. This occurred in both synthetic and real lenses, as a result of noise in the likelihood evaluations of methods using a pixelised source reconstruction (due to particular alignments of source pixels being arbitrarily more or less penalised by regularisation). Our likelihood cap solution successfully reduced this noise and smoothed the posterior PDFs. It increased the size of our uncertainties such that they had roughly the expected level of coverage, and improved the recovery of all parameters in our synthetic data. Although the likelihood cap was determined empirically, the size of the inferred errors is inherently linked to the choice of cap, and a different choice could provide better coverage probabilities than the one we adopted. Further investigation is warranted to understand at a deeper level what causes these spikes in likelihood in pixelised source reconstructions, as improvements may be possible by changing the approach to pixelising or regularising the source plane. Our work shows the importance of testing strong lens modelling methods on larger samples than previously attempted. Even our mock sample, comprising 6 noise realisations of 59 lens configurations, yields insufficient statistics to determine whether the inferred central values and statistical uncertainties on mass model parameters are consistent with drawing each measurement from a normal distribution. Equally, whilst there is evidence for small systematic biases in the estimates of certain mass model parameters, we do not have enough unique lens configurations to determine the primary causes.
Given that we are just a few years away from modelling samples of tens of thousands of lenses, tests of strong lens modelling methodology on synthetic data with complex mass distributions (e.g. Mukherjee et al. 2018;Enzi et al. 2020;Cao et al. 2022;He et al. 2022) must now scale up to ensure that error estimates are robust and systematic biases understood.

Computational Costs
Every SLACS and GALLERY lens modelled in this work was analysed using a single 2x Intel Xeon Gold 5120 @ 2.20GHz CPU, on the Distributed Research Utilising Advanced Computing (DiRAC) Data-Centric System on the COSMA7 machine at Durham University. Run times depend primarily on the number of image pixels fitted after masking, which is fixed by the standard 3.5″ circular mask used to fit most lenses. The lower resolution of SLACS lenses (0.05″ pixel⁻¹) means they contain fewer image pixels than GALLERY lenses (0.04″ pixel⁻¹), and their fits were therefore faster. For SLACS lenses, the source parametric pipeline takes 10-24 hours, the source inversion pipeline 10-36 hours, the light pipeline 10-72 hours and the mass pipeline 6-48 hours. GALLERY lenses take longer on average: the source parametric pipeline takes 10-36 hours, the source inversion pipeline 10-48 hours, the light pipeline 12-144 hours and the mass pipeline 12-72 hours. The scatter in run times is due to many factors: lens galaxy S/N, source galaxy S/N, lens configuration, lens morphology, source morphology, etc.
Based on the longest GALLERY run times, the upper limit on the overall run time per lens is 300 CPU hours. For 100 000 strong lenses this would require 30 000 000 CPU hours over the 5-10 year lifetime of a survey like Euclid, or an upper limit of ∼6 000 000 CPU hours per year. In the recent DiRAC resource allocation call, this amount of CPU time corresponds to a 'small' project. We therefore anticipate that the analysis performed in this work will not be limited by CPU resources in the near future. Based on profiling of PyAutoLens, we anticipate the run time for a single lens will fall by a factor of four or more when fitting lower resolution wide-field imaging data (e.g. Euclid's resolution of 0.1″ pixel⁻¹). The 3.5″ circular masks assumed throughout this work are also unnecessarily large for many lens systems; reducing the mask size to 2.5″ speeds up the analysis by a factor of three or more.
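The survey-scale budget above is simple arithmetic; as a sanity check, using the 300 CPU-hour per-lens upper limit and the lower end of the survey lifetime:

```python
# Upper-limit CPU budget for a Euclid-scale lens sample, using the
# per-lens upper limit quoted in the text.
cpu_hours_per_lens = 300
n_lenses = 100_000
survey_lifetime_years = 5  # lower end of the 5-10 year range

total_cpu_hours = cpu_hours_per_lens * n_lenses      # 30 000 000 CPU hours
per_year = total_cpu_hours / survey_lifetime_years   # 6 000 000 CPU hours/yr
print(f"{total_cpu_hours:,} CPU hours total, {per_year:,.0f} per year")
```

Taking the 10-year lifetime instead halves the annual requirement, and the factor-of-four speed-up expected for lower resolution data would reduce it further still.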

SUMMARY
Tens of thousands of strong gravitational lenses will be imaged in the next few years, but current analysis techniques are labour-intensive. We use the open source software PyAutoLens to develop a fully automated, Bayesian analysis of all 59 strong galaxy-galaxy lenses that have been observed by the Hubble Space Telescope (HST) in uniform conditions. The results provide an optimistic outlook for the future of automated analysis: for 54/59 lenses (92%) we achieved successful model fits (determined via visual classification) with no human intervention. We illustrate why the other fits initially went wrong, present solutions that allowed us to infer accurate models for all 59/59 lenses (100%), and recommend steps necessary for analysing the larger incoming samples. Notably, the difficulties primarily arise at the beginning of the analysis, when attempting to determine an initial, approximate lens model, and often reflect confusion between light from the foreground lens and the background source. Once a simple model is initialised, our pipeline worked flawlessly to automatically fit a sequence of more complex models that measure more detailed properties of the lens galaxy. We therefore discuss how combining a Convolutional Neural Network with a Bayesian approach like PyAutoLens could increase the automation success rate whilst extracting maximum physical information from each strong lens.
We also use synthetic observations of ∼500 lenses to explain and solve a problem common to pixel-based strong lensing methods that causes the statistical uncertainty on model parameters to be underestimated. This is fundamentally due to noise in likelihood evaluations, caused by discretization effects in pixelised reconstructions of the source galaxy. We implemented an empirical correction that 'caps' the likelihood value to suppress this noise. This significantly improves the measurement of the synthetic lens parameters, and leads to error estimates on different noise realisations of identical datasets that are more consistent with one another. On the real data, we found this empirical correction gave a fivefold average increase in the inferred uncertainties on model parameters. Comparing to previous literature results, we suggest this bias may be leading to underestimated uncertainties in other studies that use similar methods. Given the incoming samples of tens of thousands of strong gravitational lenses, we believe more detailed study of such systematics on larger mock samples is key.
Accurately knowing the systematic uncertainty on measurements of Einstein radius (total galaxy mass) will become vitally important for large samples of lenses, which beat down statistical uncertainty. Previous studies often assume a constant uncertainty of 2-3%. We find substantial variation between lenses, with a mean of 1% and 57/59 lenses with < 3%, but 2/59 lenses with > 5%. Future analysis of large samples, where careful control of systematics is paramount, must therefore adopt more rigorous errors. Our Einstein radii measurements assumed only a single type of parametric mass model and we do not investigate the degree of uncertainty that results from making different mass model assumptions.
Notably, the uncertainty on our measurements of Einstein radii (and on the lens models in general) does not increase with redshift. That is, we learn as much about strong lenses at redshift ∼0.7 as about those at redshift ∼0.1. This is in stark contrast to other astrophysical probes of a galaxy's structure (e.g. dynamics, morphology), where cosmological dimming effects and beam smearing degrade measurements of distant galaxies. Nor does the uncertainty on Einstein radii depend strongly upon the signal-to-noise ratio of our data. This makes strong lensing a compelling technique to study galaxy evolution: once high redshift strong lenses are found, it should be straightforward to measure their properties. Of course, the technique has its own challenges, for example complicated selection effects, but it should nevertheless provide an invaluable tool for studies of galaxy evolution over the next decade.

DATA AVAILABILITY
Text files and images of every model-fit performed in this work are available at https://zenodo.org/record/6104823. Full dynesty chains of every fit are available upon request.

SOFTWARE CITATIONS
This work uses the following software packages:

APPENDIX A: PIPELINES

The pipelines that make up the uniform analysis for modelling a lensed image that does not contain the lens galaxy's light are presented in Table A1. This pipeline was used to analyse the mock data in this work. A variation on this analysis, which also includes external shear in the mass model, was used to fit the four lenses that required lens-subtracted data to arrive at successful model fits. The initial model-fit priors, and those used when we choose not to inform priors with prior passing, are given in Table A2.

APPENDIX B: INFERRED MODEL PARAMETERS
We present the best fit model parameters for all SLACS and GALLERY lenses. The PLEMD+ext mass model parameters are given in Tables B1 (SLACS) and B2 (GALLERY). The double Sérsic light model parameters for the Gold sample are presented in Tables B4 (SLACS) and B3 (GALLERY). We present the light parameters only for the "Gold" sample since the "Silver" and "Bronze" samples either do not fit the lens light or provide models we do not trust. All errors quoted are derived from the 68% credible region of the PDF output by dynesty.
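A 68% credible region of this kind can be computed from a nested sampler's weighted posterior samples via weighted quantiles. The sketch below is a generic illustration with placeholder samples and equal weights, not dynesty's own API:

```python
import numpy as np

def weighted_quantile(samples, weights, q):
    """Quantile q of a weighted 1D sample set (weights need not sum to 1)."""
    order = np.argsort(samples)
    samples, weights = samples[order], weights[order]
    cdf = np.cumsum(weights) / np.sum(weights)
    return np.interp(q, cdf, samples)

# Placeholder posterior samples and importance weights for one parameter
# (e.g. the Einstein radius in arcsec); real weights come from the sampler.
rng = np.random.default_rng(0)
samples = rng.normal(loc=1.2, scale=0.05, size=10_000)
weights = np.ones_like(samples)

# Median and 68% credible region from the 16th, 50th and 84th percentiles.
lo, med, hi = (weighted_quantile(samples, weights, q) for q in (0.16, 0.5, 0.84))
print(f"R_Ein = {med:.3f} (+{hi - med:.3f} / -{med - lo:.3f})")
```

For a Gaussian posterior the 16th-84th percentile interval matches the ±1σ interval, which is why asymmetric errors quoted this way reduce to the familiar symmetric form when the posterior is well-behaved.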

APPENDIX C: MODEL FITS
In this study we categorised the model fits into "Gold", "Silver", and "Bronze" depending on the quality of the model fit.
The "Gold" fits are presented in Figure C2 for SLACS lenses and Figure C1 for GALLERY lenses. The "Silver" lenses are presented in Figure C3 and the "Bronze" lens in Figure C4.

Table A1. Pipeline model components for the analysis which fits a lensed image that does not contain emission from the lens galaxy.

Table A2. The initial priors on every parameter of every light and mass profile fitted in this work. Column 1 gives the model component name. Column 2 gives the parameter. Column 3 gives the prior, where U(a, b) is a uniform prior between a and b, and N(µ, σ) is a Gaussian prior with mean µ and variance σ². Note that due to prior passing (see Section 4) the final priors used to fit a model, corresponding to the results given in this work, will be updated from the above values. The priors of every fit can be found at the following link https://zenodo.org/record/6104823.