Gravitational imaging through a triple source plane lens: revisiting the Λ CDM–defying dark subhalo in SDSSJ0946+1006

The ΛCDM paradigm successfully explains the large–scale structure of the Universe, but is less well constrained on sub–galactic scales. Gravitational lens modelling has been used to measure the imprints of dark substructures on lensed arcs, testing the small–scale predictions of ΛCDM. However, the methods required for these tests are subject to degeneracies among the lens mass model and the source light profile. We present a case study of the unique compound gravitational lens SDSSJ0946+1006, wherein a dark, massive substructure has been detected, whose reported high concentration would be unlikely in a ΛCDM universe. For the first time, we model the first two background sources in both I– and U–band HST imaging, as well as VLT– MUSE emission line data for the most distant source. We recover a lensing perturber at a 5 . 9 σ confidence level with mass log 10 ( M sub /M ⊙ ) = 9 . 2 +0 . 4 − 0 . 1 and concentration log 10 c = 2 . 4 +0 . 5 − 0 . 3 . The concentration is more consistent with CDM subhalos than previously reported, and the mass is compatible with that of a dwarf satellite galaxy whose flux is undetectable in the data at the location of the perturber. A wandering black hole with mass log 10 ( M BH /M ⊙ ) = 8 . 9 +0 . 2 − 0 . 1 is a viable alternative model. We systematically investigate alternative assumptions about the complexity of the mass distribution and source reconstruction; in all cases the subhalo is detected at around the ≥ 5 σ level. However, the detection significance can be altered substantially (up to 11 . 3 σ ) by alternative choices for the source regularisation scheme.


INTRODUCTION
The standard ΛCDM model of cosmology describes a dark energy (Λ) dominated universe whose mass comprises ∼ 85% Cold Dark Matter (CDM).In contrast to baryons, this is an exotic type of matter outside of the standard model of particle physics that interacts with electromagnetism very weakly if at all.Assuming that Dark Matter (DM) is a particle, no candidate has been directly observed in a laboratory yet (e.g.Roszkowski et al. 2018;Schumann 2019;Billard et al. 2022).
Nonetheless, CDM theory successfully describes observations of the Universe on ∼Mpc scales and above (see e.g Bullock & Boylan-Kolchin 2017), such as the hierarchical formation of large scale structure (Anderson et al. 2014;Hildebrandt et al. 2017) and the cosmic microwave background (Planck Collaboration et al. 2020).Whilst DM is needed on galactic scales to explain rotation curves (Rubin ⋆ Contact e-mail: daniel.ballard@port.ac.uk & Ford 1970; Rubin et al. 1978Rubin et al. , 1985)), it is entirely possible that the DM is not precisely that of the CDM paradigm; alternative models may be required to explain observed phenomena on smaller, sub-galactic scales (Diemand et al. 2007(Diemand et al. , 2008)).In this lower-mass regime, alternatives to CDM have been proposed to resolve apparent discrepancies between observations and simulations (e.g.Del Popolo & Le Delliou 2017), though many of these can also be explained by other means than the DM model (see e.g.Fairbairn 2022).
Alternative DM models make different predictions about the properties of individual halos as well as their populations.For example, higher thermal velocities in Warm Dark Matter (WDM, e.g.Schneider et al. 2012;Lovell et al. 2014) models lead to less concentrated halo mass profiles (e.g.Ludlow et al. 2016;Bose et al. 2017) and a suppression of small-mass halos (Lovell et al. 2014(Lovell et al. , 2021)).Deviations from CDM on sub-galactic scales or in dwarf galaxies can, however, be obscured by their tidal interactions with more massive luminous halos (e.g.Despali et al. 2022b;Moreno et al. 2022).
While classical "hot" DM models are ruled out by observations of the large-scale Universe (see e.g.Primack & Gross 2001), the small scale effects of WDM models are much harder to constrain.The formation of luminous galaxies typically requires a halo mass of around ≳ 5 × 10 9 M⊙ (Benitez-Llambay & Frenk 2020), thereby limiting the sample of directly observable satellite galaxies (Kim et al. 2018;Newton et al. 2021;Nadler et al. 2021).Instead we must rely on observations that are directly sensitive to the gravitational effects of the DM itself, such as strong gravitational lensing.This provides a direct probe of small-mass halos, since the lensing effects of galaxies and halos depend only on their mass, irrespective of their luminosity.
J0946 is worthy of further study for two reasons.First, its perturbing subhalo has both an unexpectedly high mass if it is truly a dark matter substructure and not a dwarf satellite assembly of stars (Vegetti et al. 2010b, hereafter V10) as well as an unexpectedly high concentration given its mass, making it a substantial outlier with respect to CDM simulations (Nelson et al. (2015); Minor et al. (2021) -hereafter M21).Second, J0946 is a compound lens system, with a lens at z l = 0.222 and three sources at zs1 = 0.609, zs2 = 2.035 and zs3 = 5.975 (Collett & Smith 2020, hereafter CS20).These four galaxies are henceforth referred to as the main deflector, s1, s2, and s3 respectively.
Previous gravitational imaging studies of J0946 have only considered the lensing of s1 as observed in the F814W band by the Hubble Space Telescope (HST).In this paper, we extend on previous work in two ways, modelling all three sources in both the near-infrared F814W and the ultraviolet F336W bands simultaneously.Modelling the compound lensing should improve the macro-model of the main deflector, since compound lens modelling is much less affected by degeneracies than the modelling of a single source plane system (see e.g.Schneider & Sluse 2014).Furthermore, one of the lensed images of s3 happens to fall close to the projected location of the reported dark subhalo, providing additional constraints on its properties.Modelling both HST bands simultaneously will allow us to disentangle source light complexity from mass model complexity, since lensing is achromatic whereas star-forming galaxies typically look very different in the ultraviolet and infrared.
This paper is structured as follows.In Section 2, we de-scribe the data, the geometry of the compound lensing in J0946 and our modelling methodology, and include a test of our sensitivity to a DM substructure.In Section 3, we present and discuss our results for a single source plane, and compare them to similar literature model setups.In Section 4, we present and discuss the results of our full triple source plane lens modelling.In Section 5, we then perform systematics tests on various model assumptions.Finally, we conclude our findings in Section 6.

Data
We model two HST observations: the 2096 s ACS image in F814W (I-band) from Gavazzi et al. (2008) and the 5772 s WFC3/UVIS observation in F336W (U-band) from Sonnenfeld et al. (2012).The I-band image allows us to compare with previous results in the literature, whilst adding the U-band probes clumpier emission in the source galaxies and gives excellent angular resolution.Though available in the HST archive, we neglect intermediate optical wavelength bands as these are unlikely to capture any qualitatively different structures; the same is true for the longest available wavelength band, WFC3/IR F160W, whose resolution is moreover poorer than the I-band image.Data in both of our modelled bands are shown in Figure 1, with the reported location of the substructure from V10 overlaid.The I-band image as analysed has been drizzled to 0.05 ′′ /pixel; the U-band image covers the same area but drizzled to 0.04 ′′ pixels.We use the same lens lightsubtracted I-band image as Collett & Auger (2014, hereafter CA14), but we do not subtract the lens light from the U-band image since it is negligible at this wavelength, at the location of the arcs.Prior to the lensing analysis, the physical coordinates of the U-band data were aligned to those of the I-band data, to correct for a small relative offset between the pipeline-processed images.With the optimised shifts (δx = 0.027 ′′ , δy = −0.023′′ ), this correction is smaller than a single pixel.
Figure 1 also shows the VLT-MUSE narrow-band image extracted in a 5 Å window around 8475 Å, capturing Lyman-α emission from the most distant lensed source.This image is not used explicitly in our lens modelling; we instead delens the centroid positions of the two s3 image features and their astrometric uncertainties, derived from fitting a Gaussian to each image.Since the MUSE data have lower angular resolution, the image registration relative to HST is more uncertain than for the HST U-band versus I-band image alignment.To account for this, we artificially blur the I-band image with the MUSE Point Spread Function (PSF) and align this with a simulated HST I-band image of the arcs constructed out of the appropriate wavelength slices of the MUSE data cube.The resultant alignment uncertainty is propagated into the uncertainty of the s3 image centroids.
We model image pixels within one of four manually masked regions in the HST imaging of J0946, shown in Figure 2. We avoid the computational challenge of modelling both sources simultaneously (CA14), by reconstructing the two sources and two bands as separate parts of the likelihood, which are simultaneously fit with the same mass  model.This is a reasonable approach, since the two rings do not overlap on the sky.

Ray Tracing
For strong gravitational lensing, the source plane position, β, of a photon is displaced from its observed, lensed, im-age plane position, θ, by the reduced deflection angle, α, according to the lens equation: (1) The deflection angle, α, of a lens is related to the lensing potential on its lens plane, ψ, such that where ψ depends on the 2D projected lens mass distribution, as well as the angular diameter distances between observer, lens and source.Equation 1 is for a system with one lens and one source plane, but can be generalised to give the compound lens equation: ηijαi−1(θi−1) for j > 0 . (3) Here we have adjusted our notation from Equation 1 to no longer distinguish between lens and source, since in a compound lensing system a single galaxy can be both.In Equation 3, θi generically denotes an angular position on a redshift plane, i, where i = 0 is the foreground-most lens plane and observed image plane; any i > 0 refers to the i th source (or further lens) plane behind it.For a lensing plane l, the extra parameter ηij describes the scaling of the reduced deflection angles from one source plane, i, to another, j, defined as a ratio of angular diameter distances: Throughout the multi-source plane lensing portions of this work, we define reduced deflection angles of a lens relative to light coming from the plane immediately behind the lens.This is not the convention of Schneider et al. (1992), who define all reduced deflection angles relative to light coming from the furthest plane.Our convention allows easier comparison between our work and other single and double source plane models of J0946.A detailed explanation of our chosen convention is available in Appendix A.

Lens Modelling
To model the data, we follow the semi-linear inversion approach of Warren & Dye (2003).We define a model for the lensing mass distribution, and for each realisation of the non-linear parameters of that model we linearly solve for the best-fitting source.

Non-linear Mass Model
We assume that the main deflector is described by an Elliptical Power Law (EPL) model with external shear.We consider two possible scenarios for evidence comparison: one with and one without a dark subhalo in the form of a truncated Navarro-Frenk-White (tNFW) profile.We refer to these two scenarios as our smooth and tNFW-perturbed models, respectively.Additionally, in our multi-source plane models in Sections 4 and 5, s1 and s2 behave as lenses as well as sources; we model their mass distributions as singular isothermal sphere (SIS) profiles.
The EPL profile has six parameters that behave nonlinearly in the model: The Einstein radius, ϑE, the logarithmic slope, γ, the axis ratio, q, the position angle, φ, and two centroid coordinates (x, y).An SIS is identical to an EPL with γ = 2 and zero ellipticity.The external shear has two non-linear parameters: the shear strength, Γ, and the shear angle, φΓ.
The tNFW profile is based upon the profile derived by Navarro et al. (1996), whose density, ρ, at radial distance, r, is related to a characteristic density, ρ0, by ρNFW(r) = ρ0 r rs (1 + r rs ) 2 . (5) As in M21, we do not assume a fixed mass-concentration relation for the substructure, and therefore model both its concentration, c, and virial mass, M200.The relation between the scale radius in Equation 5, rs, and c is given by: where r200 is considered the virial radius enclosing M200, though is strictly the radius enclosing an average density that is 200 times the critical density of the Universe.Following M21, M200 is formally defined under the assumption that the subhalo can be considered a field halo, which is then tidally stripped by its massive host.To account for this tidal stripping, we assume that this profile is truncated according to Baltz et al. (2009): We compute both the virial mass, M200, analogous to a nontruncated field halo, as well as the total mass of the substructure, M sub , which accounts for the effect of the truncation radius, rt.The latter is a finite quantity for the above choice of truncation.It is hence possible that M sub < M200.The free parameters of our tNFW profile are M200, c, rt, and centre position (x, y).Throughout this work we assume that the dark perturber is a subhalo at z = 0.222, the redshift of the main deflector.M21 also find a good fit to the data when the perturber is a line-of-sight halo between the observer and the lens plane, with the mass and concentration marginally decreased but still anomalously high.

Mass and concentration from simulations
Extrapolating the field halo mass-concentration relation of Shao et al. (2023)  ).We note, however, that the differences that we later quote between these results and our own depend on the assumed parameters describing baryonic physics in the IllustrisTNG and SIMBA models, i.e. feedback from supernovae and active galactic nuclei.

Reconstructing unlensed source brightness distributions
Since we do not know the morphology of a source a priori, we infer it simultaneously with the lens parameters from the data.It is clear from the clumpiness of the arcs that the sources must be intrinsically irregular.Therefore, we adopt a pixellated free-form reconstruction of the source light.Specifically, we evaluate source brightness values defined on an adaptive Voronoi mesh created from a subset of image plane pixels ray-traced onto each source plane.In this work, we cast back all the pixels that fall within the mask of a given source for the band in consideration.The advantage of such an adaptive mesh is that it allows for a higher resolution source at those locations where the magnification through the lens becomes the strongest.We follow Nightingale et al. (2021Nightingale et al. ( , 2022) ) and employ a Natural Neighbour Interpolation scheme to determine sub-pixel source brightness values (Sibson 1981).We choose this scheme because (i) it yields a smooth Likelihood function which makes sampling the non-linear parameters much easier, and (ii) it forces the gradient of the source to be continuous, which is particularly important for substructure identification.
To impose the astrophysical prior that sources require a certain degree of smoothness, we additionally introduce a regularisation strength parameter for each source.The brightness values at the vertices follow a Gaussian regularisation prior whose covariance matrix penalises the source brightness gradient or curvature (see Suyu et al. 2006, for details).Fiducially, we opt for gradient regularisation, in contrast to V10 who use curvature regularisation and M21 who reconstruct their source out of a summation of analytic light profiles.However, since we do not a priori know how smooth our source reconstructions should be, we leave the regularisation strengths for the reconstructions of s1 and s2 as free parameters to be inferred by the model directly from the data.The centroid position (x, y) of s3 is also fit for, but the unlensed light distribution of this source is not reconstructed.

Posterior and evidence calculation
For model comparison, we evaluate both the posterior of the non-linear parameters, ξ, and the evidence of our models with and without a substructure.The posterior, P(ξ|d), relates to the likelihood function, Ltot(ξ), and the prior of model parameters, P(ξ), according to: The full details of Ltot(ξ) are described in Appendix B. The Bayesian evidence, Z, is an integral of the likelihood multiplied by the prior, which normalizes the posterior, i.e.: We evaluate the posterior and this integral using the preconditioned Monte Carlo package pocoMC (Karamanis et al. 2022a).pocoMC generates posterior samples by following a Sequential Monte Carlo scheme combined with a Normalizing Flow, which preconditions the target distribution to remove correlations among its parameters (Karamanis et al. 2022b) 1 .Evidences are calculated using the bridge sampling method and consistent with those obtained from the nested sampling algorithm MultiNest (Feroz et al. 2009(Feroz et al. , 2019)).
When comparing two models, we report the N σ confidence level that one is preferred over the other, i.e. we assume that one of the considered models is true and map the model probability onto the N σ probability volume of a Gaussian distribution.

Checking the sensitivity of our method for detecting substructures
Claiming the detection or non-detection of a substructure requires knowledge of the sensitivity of the data (see e.g.Despali et al. 2022a).To demonstrate that we are, in principle, sensitive to a substructure within the data at the reported location, we create a mock data set based upon our best smooth reconstruction of the I-band image of s1 (see Section 3) and inject a tNFW profile with the parameters reported in M21. Figure 3 illustrates how the inclusion of the substructure affects the closest arc, including the effects of the PSF and observational noise.We then remodel this data assuming both a smooth and tNFW-perturbed model, finding that the latter is preferred with a difference in the logarithmic evidence of ∆ ln Z = 15.16 ± 0.03 assuming gradient regularization of the source (corresponding to a 5.2σ detection significance).Our posteriors are consistent within but increase the number of particles to up to 6000.We further set the maximum number of MCMC steps to 10000.We found that these values ensure convergence of the posterior, given the multi-modality of the likelihood.1σ of the input subhalo mass and concentration.This suggests that we should be able to detect a substructure with similar properties to M21.However, since we are reproducing an injected halo whose parameters are exactly known, a more rigorous sensitivity calculation would be required if we were searching for new subhalos in J0946.

SINGLE SOURCE PLANE MODEL RESULTS AND DISCUSSION
In this section, we present the results of our single source plane models for J0946 and compare them with those of previous studies.

I-band Model
Modelling the I-band data of the inner arcs alone provides the closest comparison with previous studies of J0946 (e.g.V10, M21).We can reconstruct the data to the noise level assuming our smooth (EPL+Shear) model.Between our smooth and tNFW-perturbed models, we find that the posterior distributions of the macro-model parameters agree within the ∼ 3σ level or closer (with the exception of the x coordinate of the centre of the lens).Posterior distributions for these parameters are shown in Figure 4, alongside the best-fit source reconstruction and normalised imageplane residuals, which demonstrate our ability to successfully model these arcs down to the noise level.
In this single plane, I-band only case, the data prefers the existence of a tNFW substructure (∆ ln Z = 7.23 ± 0.03) with 3.4σ confidence over the smooth model.Our macromodel parameters are within 4σ of those reported by V10.Such differences are most likely due to our prescription of our source model (gradient regularised, versus curvature regularised in V10) and our wider prior ranges on all parameters.However, it is also noteworthy that comparisons with V10 are non-trivial due to differences in the parameterisation of the EPL mass profile.It is possible that small disagreements found in our results are introduced by such differences in parameter definitions.
The differences in likelihood and evidence between smooth and tNFW-perturbed models are recorded in Table 1.All priors and posterior results are documented in Appendix C.
Regarding the mass and concentration of the substructure, we find log 10 (M200/M⊙) = 10.8 +1.3  −0.6 and log 10 c = 2.0 +0.3 −0.3 .Our results exceed all of the simulation values with a root-mean-squared difference of 2.7-3.6 σc, with σc being the standard deviation of our concentration posterior.Our result is less of an outlier than M21 finds, both because of the greater uncertainty on our inferred parameters and the lower median value of the concentration.The subhalo mass, log 10 (M sub /M⊙) = 10.0 +0.4 −0.3 , remains perplexing, however, given that such a massive object should host a detectable population of stars (V10).

Dual I-band and U-band Model
Simultaneously modelling the I-and U-band data for s1 necessitates one additional non-linear parameter (the regularisation strength of the U-band source galaxy) but adds much more data to constrain the lens model.Doing this, the tNFW-perturbed model is preferred over the smooth model with an evidence ratio ∆ ln Z = 14.34 ± 0.04, corresponding to a 5.0σ confidence detection.
The addition of the U-band yields different posteriors on our macro-model parameters.Comparing with the Iband only case, the mass profile slope for the smooth model is significantly shallower (γ = 1.92 +0.03 −0.02 versus 2.12 +0.03 −0.07 ).However, when the tNFW perturber is included, both our models prefer a super-isothermal slope (γ = 2.27 +0.05 −0.04 and 2.23 +0.02 −0.02 respectively).The differences in γ between smooth and tNFW-perturbed cases are likely caused by a source position transformation, whereby multiple image plane locations that correspond to a single source plane position are invariant under a change in lens model if the source is afforded the flexibility to move (Schneider & Sluse 2014).Our multi-plane modelling should not suffer from this effect, as the scalings required for the degeneracy are source redshift dependent.Since we have sources present at multiple very different redshifts, the source position transformation degeneracy is broken, except in extremely contrived scenarios where a transformation of the mass on s1 counterbalances the transformation of the primary lens (Schneider 2014).
Despite the significant shifts in the parameters of the macro-model, the substructure mass and concentration are still consistent with the I-band only result within 1σ.Deviations from the predicted mass-concentration relations are on the level of 2.8-3.7 σc.

TRIPLE SOURCE PLANE MODEL RESULTS AND DISCUSSION
In this section, we present the results from our triple source plane (henceforth 'fiducial') models, where we reconstruct s1 and s2 both in the I-and U-band simultaneously, whilst also delensing s3 by mapping its two images to a common source plane position, with and without a tNFW perturbation.We use the same mass profiles and priors for the foreground lens as in our single-plane modelling, but we add an SIS at the centre of the delensed position of s1, allowing for a small offset between the centroids of the mass and light.We similarly add an SIS at s2 but enforce zero offset between the centroids of its mass and light, since CS20 showed that this assumption has negligible impact on s3.
We find that we are able to simultaneously reproduce the I-and U-band arcs of s1 and s2, and delens s3.Our source reconstructions and residuals are shown in Figure 5.The positions of the third source are shown in Figure 6.The extra data afforded from the outer set of arcs give much tighter constraints on the macro-model.We find that the super-isothermal results of V10, M21, and our single plane tNFW-perturbed models, do a comparatively poorer job of reconstructing s2.With our fiducial models, a near isothermal result is favoured for both the smooth and tNFW-perturbed cases, where γ = 1.956 +0.009 −0.010 and 1.949 +0.011  −0.010 respectively.The similarities between the recovered slopes and the reconstructed sources (as shown in Fig- ure 7) are clear demonstrations that the source position transformation of Schneider & Sluse (2014) has been broken by our multi-plane modelling.The 1σ and 2σ posterior distribution contours for these models -as well as for the single plane dual I-band and U-band models -can be found in Appendix D.
We find that the existence of the tNFW perturbation is preferred with an evidence ratio ∆ ln Z = 19.64 ± 0.03 over the smooth model, corresponding to a 5.9σ detection.The −0.3 .We show 2D posterior distributions of M sub and c against a selection of macro-model parameters, for the fiducial tNFW-perturbed model result in Figure 8, wherein we observe a notable degeneracy between the Einstein radius of the main deflector and the mass of its substructure, since the total mass within the Einstein ring is well-constrained.Otherwise, there are no strong degeneracies.The 2D M sub -c posterior distribution for our fiducial result is shown separately on the upper panel of Figure 9, overlaid with the single source plane results.Our fiducial M200-c posterior appears on the bottom panel of Figure 9, which also shows the M200-c relation of Dutton & Macciò (2014).The shape of this posterior distribution is similar to the results of M21, though our σc is greater than theirs primarily because of our more flexible source model.We find that our results differ from Dutton & Macciò (2014) and the other aforementioned mass-concentration relations by 2.6-3.3 σc.2012), our virial mass implies a stellar mass M * ∼ 10 7.5 M⊙.For a plausible stellar massto-light ratio of ∼ 2M⊙/L⊙ (appropriate to a passive dwarf galaxy -see e.g.Martin et al. 2008), this corresponds to an absolute magnitude MI ≈ −15.4,typical of dwarf elliptical populations in nearby galaxy groups.At this luminosity, such objects have typical sizes ∼ 1kpc (Venhola et al. 2019).Introducing a simulated galaxy of these properties scaled to z = 0.222 into the I-band image, we find that although such a galaxy would be detectable in isolation, it could not be unambiguously distinguished from other flux components if located at the position of the subhalo.Since the associated galaxy could easily be a factor of two fainter, or be more diffuse, than assumed here, we should not expect to see an easily-identified luminous galaxy hosted by the lensing substructure.The subhalo we have detected is therefore not unusually "dark", and appears compatible with being a dwarf satellite galaxy of the main deflector.

SYSTEMATIC TESTS
In this section, we examine several model assumptions that systematically could have influenced our ability to detect and measure a DM substructure.We perform tests on the choice of source regularisation and explore the effects of additional mass model complexity and an alternative hypothesis for the perturber.We explore all of these systematics for the triple source plane (I-and U-band) case only.

Degeneracy with source morphology
One of the main systematic uncertainties is the degeneracy between the complexity of the mass and the source light distributions.While enforcing a smoother source could lead to a false positive detection of a lensing perturber, allowing too much freedom in the intrinsic structure of the source could lead to non-detections even in the presence of DM substructures.In our fiducial model, we chose a gradient regularization scheme for the source galaxies, which allows for smallscale source structure.Alternatively, we can suppress these small-scale source features by regularising over curvature.This is the regularisation choice of V10.In this case, the substructure is detected with much higher significance: ∆ ln Z = 67.00± 0.02, or 11.3σ.Such a detection claim would be overconfident in our analysis since the evidence actually prefers gradient regularisation at ∼20σ confidence.This result is true for both the smooth and perturbed models.

E
It is concerning that the significance of the detection changes hugely between the two regularisation schemes since neither is astrophysically motivated.It remains an open question whether alternative regularisation schemes or source reconstruction techniques could raise or lower the evidence for a substructure.We leave this exploration to future work.
The mass-concentration posterior for the substructure under the curvature regularisation scheme is shown in the centre panel of Figure 9. Whilst the detection significance has changed, the inferred subhalo parameters and their uncertainties have not changed significantly.The substructure would, therefore, remain a modest outlier given either regularization scheme.

Angular structure in the main deflector
Previous works have shown that lensing substructure inference can be sensitive to the flexibility of the main deflector mass model (see e.g.Nightingale et al. 2022;Minor et al. 2021).Therefore, we explore additional complexity in the foreground lens model by combining our EPL with the modes m of a multipole expansion: where φ = arctan (x/y) and 0 ≤ km ≤ 1 is the amplitude of the m th mode with phase φm2 .Such an expansion can account for boxiness or diskiness of the lens galaxy.As in M21, we model multipole terms m = 3 and m = 4.We therefore add four non-linear parameters to the model: k3, k4, φ3 and φ4.The best fit source reconstructions and normalised image plane residuals are plotted in Appendix E. Multipoles perform comparably well at reconstructing the data as the tNFW perturbation.In fact, a smooth model with added multipoles performs marginally better in reconstructing J0946 than a tNFW-perturbed model, with the data preferring the presence of multipoles over the presence of the tNFW profile with 1.5σ confidence.This is not solely due to there being fewer degrees of freedom in the multipoles case, since the best fit log-likelihood is also improved, with ∆ ln L = 3.74.The preference for non-zero multipole terms is unsurprising given detailed examination of the light profile, which reveals some disturbance in the shapes of the isophotes that can be absorbed by these extra parameters (Sonnenfeld et al. 2012).
Modelling the multipole terms and a tNFWperturbation simultaneously provides the best reconstruction, where the substructure is detected with 6.2σ confidence.The inferred substructure in this case is more massive, with log 10 (M200/M⊙) = 10.6 +1.1 −0.4 , but less concentrated, with log 10 (c) = 1.9 +0.4 −0.3 , than in our fiducial model.Differences to the compared mass-concentration relations go down to 2.0-2.9 σc.The M200-c posterior for this model is shown in the bottom panel of Figure 9.

Additional complexity on s1
Our fiducial model assumes a spherically symmetric mass distribution for s1, though its light profile is noticeably elliptical (see e.g. the top panels of Figure 5).We therefore perform a systematic test where we assign a Singular Isothermal Ellipsoid (SIE) to s1 rather than an SIS.This adds two parameters to our fiducial models: the axis ratio, q, and position angle, φ, of s1.Our test shows that a smooth model prefers the presence of ellipticity components on s1 over the presence of a substructure in the main deflector with 2.9σ confidence, where both scenarios have the same number of degrees of freedom.Modelling smooth and tNFW-perturbed models with an ellipsoidal s1 simultaneously yields a substructure of total mass log 10 (M sub /M⊙) = 9.20 +0.35 −0.21 , virial mass log 10 (M200/M⊙) = 10.04 +1.31 −0.52 and concentration log 10 c = 2.53 +0.59  −0.40 detected at the 4.8σ confidence level; this is a lower evidence substructure result than the tNFW perturbation with multipoles.The difference to the ΛCDM predictions of the mass-concentration relation remain at a level of 2.5-3.1 σc.

A wandering black hole?
Since the dark halo in M21 is hard to accommodate within ΛCDM and our results have only partially alleviated that tension, it is worth considering alternative hypotheses for the perturber in J0946.Given the anomalously high concentration, and the surprising lack of a galaxy hosted within the halo, we investigate whether the perturber could be a supermassive black hole (see e.g.Ricarte et al. 2021).
The non-zero multipoles of the lens mass and the disrupted morphology of the light profile of the lens galaxy are characteristics of a merger where the ejection of such a black hole may not be implausible, either through 3-body ejection (Hoffman & Loeb 2007) or gravitational radiation recoil (Campanelli et al. 2007).
To test this proposal, we fit a 3-source model with a EPL, external shear and a point mass at the main deflector redshift, and recover a point mass of log 10 (MBH/M⊙) = 8.94 +0.19  −0.08 .Given J0946 has a velocity dispersion of ∼280 km s −1 Gavazzi et al. (2008), the M -σ relation implies that there should be a black hole of a few times 10 9 M⊙ (Kormendy & Ho 2013) at the centre of the lens.Thus, the proposed "wandering" black hole would need to be of comparable mass to the expected central black hole.
The point mass-perturbed model is formally preferred over the equivalent tNFW-perturbed model at 2.7σ.This is not definitive evidence and does not account for any prior preference between the models.This result is also driven purely by Occam's razor: the point mass perturbed model has a slightly lower likelihood than the tNFW model but has fewer parameters.
As the right panel of Figure 6 shows, the s3 image positions are sensitive to the change in mass profile, and the MUSE data is better reproduced with a point mass perturber.The significance of this is marginal, given that in all three panels the predicted centroids are well within the brightest parts of the s3 images.A more sophisticated treatment of s3 with higher-resolution data would be necessary to discriminate between possible density profiles for the perturbation.

Alternative Lens Light Subtraction
Previous studies (e.g.Nightingale et al. 2022) have highlighted the importance of an accurate model of the light distribution of the foreground lens, since lens light residuals may lead to false positive dark substructure detections.
The bulk of this work used the lens light subtraction of CA14.That light subtraction followed the method of Auger et al. (2011).It assumes that the lensing mass is a singular isothermal ellipsoid with external shear, and both sources are single Sérsic profiles.The lens light was then described as the sum of 3 Sérsic profiles.The best fitting triple-Sérsic model was then subtracted to give our fiducial lens lightsubtracted image.
To quantify the potential source of systematic error on our substructure detection confidence coming from inaccuracies in lens light subtraction, we repeat our triple plane analysis from Section 4 after performing an alternative lens subtraction on the data, by way of a Multi-Gaussian Expansion (MGE) fitted to the lens light and subtracted from the original image.We simultaneously fit the lens light with a 15 component MGE and s1 with a pixelated grid lensed by an EPL plus external shear model.The MGEs were concentric, but their amplitude, scale radius, ellipticity and position angle were free parameters for each component.We then subtracted off the MGE decomposed light profile to leave us with an alternative lens light-subtracted image to analyse.
With this alternative lens light subtraction we find a preference for the substructure with ∆ ln Z = 11.84 ± 0.03, corresponding to a 4.9σ detection, this is lower than the 5.9σ found with the original lens light-subtracted image.This example shows that whilst different reasonable choices for modelling lens light can alter the confidence of a substructure detection, the impact is unlikely to fundamentally change conclusions about the presence, or absence, of a substructure in J0946.

Impact of the Point Spread Function
We also test the potential for systematic errors to be introduced by a poor choice of PSF.The PSF in all our modelling was a well-motivated choice, having come from a star within the instrument's field of view.To investigate the impact of a poor choice of PSF on the detection of the substructure, we rotate our PSF through 90 degrees.Repeating the triple plane analysis of Section 4, but with the rotated PSF we recover the substructure with ∆ ln Z = 12.65 ± 0.03, corresponding to a 4.7σ detection.Therefore, the substructure detection again remains robust to ∼ 1σ accuracy, even assuming a blurring operator which mismatches the instrument's optics.

CONCLUSIONS
In this paper, we have presented a gravitational imaging case study of the compound lens SDSSJ0946+1006.Our model is remarkably successful in its ability to simultaneously reproduce the images of two background sources in this system in both the HST I and U bands and the image positions of a third source observed by MUSE.
By including multiple sources in our analysis, we were able to lift many of the lens modelling degeneracies that are present for a single source plane lens, whilst modelling multiple passbands simultaneously enabled us to probe different source structures, and possibly different regions in the source plane, thus disentangling structure in the lens from structures in the source3 .By comparing the Bayesian evidence of a smooth halo model to that of a tNFW-perturbed model, we test the claims that a dark subhalo exists in J0946 (in agreement with e.g.V10, Nightingale et al. ( 2022)).Our model prefers the existence of a subhalo with an evidence ratio ∆ ln Z = 19.64 ± 0.03 over the smooth model, corresponding to a 5.9σ detection.
The virial mass of the halo is log 10 (M200/M⊙) = 10.3 +1.2 −0.6 , and its concentration is log 10 c = 2.4 +0.5 −0.3 , which is 2.6-3.3σchigher than predicted by simulations.This is a much weaker tension than reported in M21 due to the inclusion of more data, the use of wider priors, and our more flexible source model.Additionally, Nadler et al. (2023) recently showed that gravothermal core collapse seen in some Self-Interacting Dark Matter (SIDM) models (Despali et al. 2019) is a potential mechanism to produce the substructure reported by M21; our less concentrated result should therefore be even easier to accommodate in SIDM.
The stellar mass of the subhalo, M * ∼ 10 7.5 M⊙, implied by its virial mass indicates that any luminous component to the subhalo would not be possible to detect in the data, given its proximity to the upper arc image of s1 or possible blending with residual flux from the subtracted light profile of the lens.It is therefore unsurprising that the lensing perturber is dark, and we cannot confidently distinguish between it being a dwarf satellite galaxy or a DM substructure of the main deflector.
We can alternatively model the data with a black hole of log 10 (MBH/M⊙) = 8.94 +0.19  −0.08 , which is preferred over the truncated NFW profile at 2.7σ due to having fewer degrees of freedom.This scenario represents a supermassive black hole being ejected from the lens galaxy as a consequence of a merger event.For the M -σ relation to hold, our resultant wandering black hole has comparable mass to the black hole expected at the centre of the lens galaxy.
Our analysis confirms that the distant source s3 is especially sensitive to the properties of the lensing perturbation, but the results are currently limited by the relatively low angular resolution of the MUSE data.High-resolution imaging of this source would be extremely powerful to probe the profile of the dark substructure, but will require a substantial investment of telescope time.
We also tested changes to the shape of the mass distribution in the macro-model by fitting of third and fourth order multipoles, as well as fitting for the ellipticity of s1.
Whilst our macro-model has moved somewhat under these changes, our highest evidence model (with multipoles in the main deflector) yields ∼ 6σ preference for the presence of a substructure in J0946.Its substructure gives the best compatibility with CDM simulations that we have found, at 2.0σc.
We demonstrated that we are able to recover the subhalo with much higher confidence (11.3σ versus 5.9σ) when regularising over the curvature of the sources rather than the gradient of the sources.Curvature regularisation makes the sources intrinsically smoother whilst the addition of a dark substructure counteracts this by adding small-scale perturbations to the arcs.However, the Bayesian evidence vastly prefers our fiducial gradient regularisation scheme.
Ultimately, we conclude that precision lens modelling is challenging.Alongside cosmography, gravitational imaging is perhaps the hardest lens modelling challenge of all.Even with the luxuries afforded by a compound lens in its ability to suppress the mass-sheet degeneracy, there are nuances in how complexity is afforded to the lensing mass models, and the reconstruction of light profiles in background sources, that make it difficult to draw conclusions about small-scale structures with certainty.Much care needs to be taken over the choices of the background source model before embarking on detailed lens modelling.In reality, random draws from the priors of curvature or gradient regularised sources look nothing like astrophysical galaxies: ultimately neither regularisation scheme is physical; much more work is needed to understand how to reconstruct sources, and the need for evidence calculations will make this work computationally expensive.The potential payoff for this work is huge: with hundreds of thousands of lenses to be discovered in the next decade (Collett 2015), gravitational imaging should yet place stringent constraints on the small-scale validity of ΛCDM.

APPENDIX A: SCALING CONVENTION OF OUR EINSTEIN RADII
For multi-plane lensing, the Einstein radius of a lens is not a well defined quantity since it changes with the source redshift.The physical deflection angle is source redshift independent, but it is not a convenient quantity for strong lensing, since it would require angular diameter distances to appear throughout the lens equation.Schneider et al. (1992) adopt a convention for scaling deflection angles, such that the Einstein radius, ϑE, of any deflector is defined with respect to light originating on the final source plane.This produces the following, recursive multiplane lens equation, which relates θν , the angular position on plane ν, to the image plane position, θ1: where beta is defined as with s representing the most distant source.
As pointed out by CS20, Equation A1 is inflexible to the discovery of additional, more distant source planes.One option to circumvent this is to define Einstein radii acting on sources coming from infinity, but such sources are never observable so these Einstein radii would be highly degenerate with other lens model parameters.Instead, we derive a modified scaling relation that defines Einstein radii as acting on the source most immediately behind each lens.
For a lens at redshift z l , we compute the same projected mass density, Σ(θ), regardless of whether we measure the Einstein radius from a lensed source at redshift zi or zj.Recalling that Σ(θ) = κ(θ)Σcrit, we must therefore always satisfy the condition κ(θ, zi)Σcrit(z l , zi) = κ(θ, zj)Σcrit(z l , zj). (A3) We therefore define the scaling relation ) This is mathematically identical to the scale factor βµν in Schneider et al. (1992), though the subscripts are defined differently.Their scaling parameter, βµν , is defined such that when ν is the redshift of the furthest source, βµν = 1.Our equivalent condition, ηij = 1, is achieved when j = i.Physically, this corresponds to the source being on the plane most immediately behind the lens.Throughout this work, we opt to use ηij, such that the Einstein radius of the main deflector is quoted according to the Einstein ring produced by s1 observed in the image plane.The Einstein radius we quote for s1 would correspond to an Einstein ring being produced by s2 observed at s1, and so on.This is an intuitive convention as it preserves the value of Einstein radius for the main deflector from previous single source plane studies of J0946, which is approximately measurable by visual inspection given the physical scale of a pixel.
Equation 3 describes the ray tracing through multiple source planes using this new scaling parameter.Explicitly, the single, double and triple source plane cases become and

APPENDIX B: THE LIKELIHOOD FUNCTION OF OUR MODEL
We construct our likelihood similarly to other gravitational imaging methods (see e.g. Warren & Dye 2003;Vegetti & Koopmans 2009;Nightingale & Dye 2015).We assume that the data vector d is a linear function of the source brightness s and the instrumental noise n: The operator F (ξ) = M P L(ξ) is the product of the mask operator, M , the blurring operator describing the effects of the band-specific PSF, P , and the lensing operator mapping each source pixel to the corresponding positions of the lensed images according to the lens model parameters, L(ξ).This mapping follows the interpolation scheme described in Section 2.3.3.Furthermore, the source s and the noise n follow zero-mean Gaussian priors with the covariance matrices R and C, respectively.As a result, the source marginalised likelihood for one source becomes By choosing mutually exclusive masks for the two Einstein radii, we can decompose the likelihood into individual likelihoods.For source i in band j, the source-marginalised likelihood is given by operator with respect to source i and the point spread function of band j.The product of the above likelihoods provides the corresponding joint likelihood: To remove over-and under-focused solutions from our likelihood, we trace four of the brightest image plane pixels of s1 to their source plane, where they are expected to correspond to roughly the same location.Whenever a pixel from outside the first source mask falls within one standard deviation around the centre of these four points, we decrease our likelihood by a factor of 10 −10 .This modification prevents our posterior from containing very smooth but non-physical source reconstructions that resemble the rings of the original data rather than a realistic source galaxy, for example, the scenario where there is no lensing and the sources are intrinsically arcs.Such solutions would otherwise be allowed by our mass model and source model priors.
Finally, we include the image positions of the third source in our likelihood by adding an additional Gaussian likelihood term to Equation B4.This term punishes the chisquared difference between predicted image positions of the third source for given model parameters ξ and observed positions in the VLT-MUSE data, with the positions and uncertainties taken from CS20.

B1 Source interpolation and likelihood
The choice of interpolation scheme affects the stability and speed of our analyses.While linear interpolation schemes on Delaunay meshes have been popular in previous studies (e.g.V10), they tend to create discontinuities in the likelihoods of lens modelling.In contrast, natural neighbour interpolation gives rise to much smoother likelihoods.For example, Figure B1 illustrates this behaviour as a function for parameters close to a maximum a posteriori point.Although the parameter ranges plotted are smaller, the linear interpolation scheme using the adaptive Delaunay mesh shows an abundance of peaks and troughs.
While a single likelihood evaluation tends to be slower for the natural neighbour interpolation scheme, the discontinuities and abundance of local maxima in the linear interpolation require more evaluations overall.Furthermore, sampling algorithms may get stuck in local maxima and, as a result, may underestimate the uncertainty of various model parameters.The Bayesian evidence prefers the natural neighbour interpolation scheme for our fiducial model with ∆ ln L ≳ 10.

APPENDIX C: TABLES OF RESULTS
Here we show the assumed priors and inferred posteriors on all our model parameters for all our discussed results.Table C1 contains the results for the single plane cases (one band and two band) and triple plane fiducial result, all assuming gradient regularisation.Table C2 contains single and triple plane results similarly, but with curvature regularisation.Table C3 shows results from the systematic tests on mass model complexity: including multipoles in the foreground lens, adding ellipticity to s1, and modelling the perturber as a point mass instead of a subhalo.

APPENDIX D: MACRO-MODEL POSTERIORS
Here we present how the macro model posterior distributions behave for the smooth and tNFW-perturbed cases of our simultaneous I-and U-band single plane model (Figure D1) and our fiducial triple plane model (Figure D2), similarly to Figure 4.

APPENDIX E: SOURCES AND RESIDUALS FROM SYSTEMATIC TESTS
Here we present all of the best fit sources and normalised image plane residuals corresponding to the systematic tests in Section 5. Figure E1 shows the results of modelling with curvature-regularised sources; Figures E2, E3 and E4 show the results with multipoles in the foreground lens, an SIE on s1, and a point mass as the perturber, respectively.
Table C1.Chart showing all lens model results obtained using gradient-regularised source reconstructions.Posteriors are quoted as medians with 1σ error bars.The total mass of the substructure, M sub , is a derived parameter, obtained by integrating over the whole tNFW profile, and is shown here as well as the non-truncated virial mass, M 200 , which is sampled.Note that ∆x and ∆y indicate where a centroid coordinate is quoted relative to the mean s1-plane location of four bright conjugate points from the inner set of image plane arcs.

Figure 1 .
Figure 1.HST imaging of J0946 in the I-band (left) and U-band (middle), and continuum-subtracted VLT-MUSE narrow-band imaging (width 5 Å centred at 8475 Å) showing the Ly-α emission at z = 5.975 (right).The cyan cross represents the best fit location of the substructure in as reported in V10 (which is visually indistinguishable from the best fit location in M21).

Figure 2 .
Figure 2. The data pixels used in our modelling of s1 (magenta masked) and s2 (green masked) in I-band (top) and U-band (bottom) HST data.All other pixels are ignored.For illustrative purposes, the image contrast of s2 is enhanced and a central region of image pixels is removed.

Figure 3 .
Figure 3. Mock data for our sensitivity test, where panels (left to right) show the initial model image, a zoomed inset around the location of the reported substructure, the effect of blurring by the HST I-band PSF, and the addition of background noise akin to the original HST I-band data.The top row is created from a smooth model for the lens, whilst the bottom row has an injected tNFW subhalo with the parameters of M21 at the cyan cross.The bottom right panel is used as mock data to recover the injected substructure with ∼ 5σ confidence.

Figure 4 .
Figure4.1σ and 2σ contours of the posterior distribution for the EPL and external shear parameters for our model of s1 in I-band only, with (cyan) and without (orange) the addition of a tNFW substructure.Inset: best fit source reconstruction (left) and residuals between the data and best fit model in units of standard deviation (right).These panels correspond to the tNFW-perturbed models, but are visually indistinguishable to the best fit smooth model results.

Figure 5 .
Figure 5. Source plane reconstructions and normalised image plane residuals for our best fit smooth (left) and tNFW-perturbed (right) model, for (from top to bottom) s1 in I-band, s1 in U-band, s2 in I-band and s2 in U-band.

Figure 6 .
Figure6.The 1σ and 2σ astrometric uncertainties (black contours) on the two image plane positions from the MUSE data (background image) with our posterior of s3 centroids forward ray-traced through our posterior of lens models to give our predicted 1σ and 2σ uncertainties on the image plane positions of s3, for our smooth (orange), tNFW-perturbed (cyan) and point mass-perturbed (magenta) models.

Figure 7 .
Figure 7. Isophotes of the I-band s1 reconstruction given the best tNFW-perturbed and smooth results from (top) the single plane modelling and (bottom) triple plane modelling.The alignment of the two source reconstructions in the latter case is indicative of a broken mass-sheet degeneracy.

Figure 9 .
Figure 9.The 1σ and 2σ M sub -c posterior for our single and triple plane model fits utilising gradient regularisation (top), as well as for alternative source reconstruction and mass model assumptions (middle).The M 200 -c posterior for our highest evidence models from each of these two panels (fiducial and multipoles) are plotted against an M 200 -c relation for CDM halos from Dutton & Macciò (2014), with 1σ and 2σ uncertainty (bottom).

Figure B1 .
Figure B1.Slices through the logarithmic likelihood in (from left to right) the Einstein radius, ϑ E , and slope, γ, of the main deflector, and the virial mass, log 10 (M 200 /M ⊙ ), of the tNFW substructure, shown for natural neighbour and linear interpolation schemes.Both cases have their mean likelihood subtracted for easier comparison.While the linear interpolation scheme shows many local discrete jumps, the natural neighboring interpolation renders the likelihood smooth with the highest peak much easier to identify.

Figure D1 .
Figure D1.Equivalent to Figure 4, for our single source plane I-and U-band models.The source reconstruction and image plane residual panels correspond to the tNFW-perturbed models, in (top) the I-band and (bottom) the U-band.

Figure D2 .
Figure D2.Equivalent to figures 4 and D1, for our triple source plane I-and U-band models.Source reconstructions and image plane residuals for the smooth and tNFW-perturbed cases are shown in Figure 5.

Figure E1 .
Figure E1.Source plane reconstructions and normalised image plane residuals for our best fit smooth (left) and tNFW-perturbed (right) model, for (from top to bottom) s1 in I-band, s1 in U-band, s2 in I-band and s2 in U-band, where all sources are modelled with curvature regularisation.

Figure E3 .
Figure E3.Source plane reconstructions and normalised image plane residuals for our best fit smooth (left) and tNFW-perturbed (right) model, for (from top to bottom) s1 in I-band, s1 in U-band, s2 in I-band and s2 in U-band, where s1 is modelled as an SIE.

Figure E4 .
Figure E4.Source plane reconstructions and normalised image plane residuals for our best fit model, for (from top to bottom) s1 in I-band, s1 in U-band, s2 in I-band and s2 in U-band, where the perturber is modelled as a point mass.

Table 1 .
The differences in best fit log-likelihood ∆ ln L and logevidence ∆ ln Z, between smooth and tNFW-perturbed models, shown for our single source plane and triple source plane results.These differences are quoted relative to the smooth case, such that positive values indicate preference for the tNFW-perturbed model.In brackets are the corresponding confidences of the detections.
2D posterior distributions for the total mass, log 10 (M sub /M ⊙ ), and concentration, log 10 c, of the substructure, against a selection of other lens model parameters: (from left to right) the Einstein radius, ϑ E , power law slope, γ, axis ratio, q, and position angle, φ, of the main deflector, external shear strength, Γ, and Einstein radii of s1 and s2, ϑ E , respectively.