Time-delay cosmographic forecasts with strong lensing and JWST stellar kinematics

We present a joint strong lensing and stellar dynamical framework for future time-delay cosmography purposes. Based on a pixelated source reconstruction and the axisymmetric Jeans equations, we are capable of constraining cosmological distances and hence the current expansion rate of the Universe ($H_0$) to the few percent level per lens, when high signal-to-noise integral field unit (IFU) observations from the next generation of telescopes become available. For illustrating the power of this method, we mock up IFU stellar kinematic data of the prominent lens system RXJ1131-1231, given the specifications of the James Webb Space Telescope. Our analysis shows that the time-delay distance ($D_{\Delta t}$) can be constrained with 3.1% uncertainty at best, if future IFU stellar kinematics are included in the fit and if the set of candidate model parameterisations contains the true lens potential. These constraints would translate to a 3.2% precision measurement on $H_0$ in flat $\Lambda$CDM cosmology from the single lens RXJ1131-1231, and can be expected to yield an $H_0$ measure with<2.0% uncertainty, if similar gains in precision can be reached for two additional lens systems. Moreover, the angular diameter distance ($D_\mathrm{d}$) to RXJ1131-1231 can be constrained with 2.4% precision, providing two distance measurements from a single lens system, which is extremely powerful to further constrain the matter density ($\Omega_\mathrm{m}$). The measurement accuracy of $D_\mathrm{d}$, however, is highly sensitive to any systematics in the measurement of the stellar kinematics. For both distance measurements, we strongly advise to probe a large set of physically motivated lens potentials in the future, to minimise the systematic errors associated with the lens mass parameterisation.


INTRODUCTION
According to our standard cosmological model, we live in a flat, cold, dark matter and dark energy dominated Universe (ΛCDM). While little is yet known about the nature of dark matter and dark energy, our standard cosmological model -anchored mainly through measurements of anisotropies in the Cosmic Microwave Background (CMB) -has been well established and provides an accurate description of e.g. the large scale structure formation and distribution (Spergel et al. 2007) and the abundance of various elements in the Universe. As powerful as this model is, however, it has been constantly facing challenges. On small scales, the well-known E-mail: yildirim@mpa-garching.mpg.de "core-cusp" issue (Moore 1994;McGaugh & de Blok 1998) as well as the "Missing Satellites Problem" (Kauffmann et al. 1993;Klypin et al. 1999;Moore et al. 1999;Boylan-Kolchin et al. 2011) are stubbornly defying predictions from cosmological N -body simulations within the ΛCDM framework. Certainly, some of the disagreements can be attributed to observational effects (van den Bosch et al. 2000), underlying modelling assumptions (Evans et al. 2009) and our ignorance of the small-scale physics and baryonic feedback processes and interactions (Oh et al. 2011), but an agreement between observations and simulations is still distant. Similarly, discrepancies arise at large scales. After making assumptions for a handful of parameters, such as spatial flatness and a constant dark energy equation of state of −1 (corresponding to the cosmological constant Λ), the standard cosmolog-ical model provides stringent constraints for the expansion rate of the Universe H0 through observations of the CMB (Planck Collaboration et al. 2018), which appears to be at odds with local measurements based on Cepheids and Type Ia Supernovae (Riess et al. 2018a(Riess et al. ,b, 2019. Especially the latter has been of particular interest lately. Given the significant 4.4σ discrepancy between the most recent measurements from both the Planck Collaboration and the Cepheid distance ladder, this result is generally interpreted as corroborating evidence for a non-standard cosmological model. Relaxing our assumption about spatial flatness, a constant dark energy equation of state (i.e., not fixed to −1 that corresponds to the cosmological constant Λ) or increasing the number of relativistic species would allow us to reconcile both measurements. But, before such drastic conclusions are drawn, independent measurements of H0 should be carried out, if feasible, to test for possibly unknown systematics in any single method and to assess the need for physics beyond the standard model.
Time-delay cosmography (TDC; Refsdal 1964) provides a methodologically independent tool for measuring H0 to the percent level (see recent reviews by, e.g., Treu & Marshall 2016;Suyu et al. 2018). By means of a multiply imaged, time-variable background source and an accurate description of the foreground lens mass distribution, the time-delay distance (D∆t) can be inferred, which is inversely proportional to the Hubble-Lemaître constant H0. The technique has long been plagued by poor time-delay measurements, invalid assumptions about the lens mass profile and systematic errors. However, it has been demonstrated that exhaustive studies of lensed quasars with exquisite light curves allow the measurement of H0 for a single system with an accuracy of ∼ 7% . In addition, it was shown that TDC leads to tight constraints on other cosmological parameters, competing with those from contemporary Baryon Acoustic Peak studies, when each probe is combined with the CMB .
In light of the aforementioned discrepancy between the current best cosmological probes, TDC has gained momentum and the H0LiCOW 1 (H0 Lenses in COSMOGRAIL's Wellspring) program has been initiated, which aims at measuring H0 with better than 3.5% precision and accuracy . As part of these efforts, complementary data sets consisting of i) high-cadence and long-baseline monitoring of quasar light curves, mostly through COSMO-GRAIL 2 , ii) high-spatially resolved photometric observations of the foreground lenses, lensed background quasars and quasar hosts, iii) wide-field photometric and spectroscopic observations of the lens' environments and iv) stellar kinematic data of the foreground lenses have been obtained. Each of these ingredients are crucial to break the inherent modelling degeneracies, i.e. the mass-sheet degeneracy (MSD; Falco et al. 1985), in strong lensing studies and to reliably pin down the time-delay distance and hence the Hubble-Lemaître constant in a single system. As of now, H0LiCOW reports a 3.0% measurement of H0 in flat LCDM, 1 H 0 Lenses in COSMOGRAIL's Wellspring; http://www. h0licow.org/ 2 COSmological MOnitoring of GRAvItational Lenses; https:// cosmograil.epfl.ch based on a joint analysis of 4 strong lensing systems (Birrer et al. 2019;Bonvin et al. 2017;Wong et al. 2017;Suyu et al. 2014). Yet, it is noteworthy that an H0 measurement which is comparable in precision with the best available probes (i.e. ∼ 2%) would still require the combination of almost a dozen lenses (Shajib et al. 2018). Accordingly, a measurement with 1% precision -a value that is considered as being highly beneficial for any Stage III and IV cosmological study to further constrain the dark energy equation of state (Weinberg et al. 2013) -would not be available until 40 such measurements have been carried out with similar precision.
Given these numbers and forecasts, a truly competitive TDC probe would greatly benefit from a much improved accuracy and precision for each lens study. In particular, three sources of uncertainty have been identified as the biggest contributors to the total time-delay distance error budget, i) the time delays, ii) the mass along the line of sight (LOS), and iii) the lens mass parameterisation. Assuming that future time delays can be measured to the percent levelbased on long-baseline optical monitoring campaigns and new curve-shifting algorithms (Tewes et al. 2013a) -, any time-delay cosmological probe that aims at obtaining an accuracy and precision of 1-2 percent in the near future, will have to drastically improve their estimate of the external convergence (κext) associated with LOS mass distributions and lift the modelling degeneracy due to different lens mass parameterisations.
Interestingly, McCully et al. (2017) developed a new framework to model LOS mass distributions efficiently and quantified the environmental effects through realistic simulations of lens fields. By reconstructing the three-dimensional mass distribution of strong-lens sightlines, they obtain constraints on κext that are consistent with those from statistical approaches of combining galaxy number density observations with N-body simulations (Hilbert et al. 2007(Hilbert et al. , 2009Collett et al. 2013;Suyu et al. 2014), but with a 4 times narrower distribution which yields much stronger priors on κext (which affects H0 linearly). Progress in reducing the uncertainty in the lens mass parameterisation, on the other hand, has been moderate. Stellar kinematic data of the foreground lens are now commonly employed to break the MSD and to align the time-delay distance measurements when e.g. a Navarro-Frank-White (NFW; Navarro et al. 1996Navarro et al. , 1997 or power law profile are adopted for the lens mass model. But, follow-up kinematic observations from adaptive-optics (AO) assisted ground-based facilities struggle to go beyond a single aperture-averaged velocity dispersion measurement, due to the difficulty in separating the bright quasar light from the foreground lens galaxy and the faintness of the lens itself. As a consequence, the final precision on H0 is currently limited to ∼ 7% from a single lens system. Moreover, timedelay studies employing kinematic data assume a spherical mass model for recovering the aperture-averaged stellar velocity dispersion, whereas the strong lensing mass model is of elliptical or even triaxial in nature, and are thus not fully self-consistent. Whether this assumption introduces a bias in the inferred time-delay distances also remains to be seen.
To drastically improve the precision and accuracy of a single lens study, more flexible dynamical models along with both high-spatially resolved observations that map the 2D stellar kinematics in great detail and sufficient signal-to-noise (S/N) to reliably extract the kinematic moments across the entire field-of-view (FOV) are necessary. The next generation of telescopes -such as the James Webb Space Telescope (JWST ), the Thirty Meter Telescope (TMT ) and the European Extremely Large Telescope (E-ELT ) -will provide the required improvement in sensitivity and resolution. The aim of this paper is to present a fully self-consistent, physically motivated modelling machinery for TDC, that will be capable of exploiting this data set to its full potential. Given the specifications of JWST, we will create mock stellar kinematics and, based on a joint strong lensing & stellar dynamical analysis, forecast the cosmological constraints from future space-and ground-based telescope observations. The paper is organised as follows. In Section 2, we cover the strong lensing and stellar dynamical theory and formalism. Section 3 will be used to present the already available HST observations, time delays and mock future stellar kinematics of the prominent strong lens configuration RXJ1131−1231. We model the data of RXJ1131−1231 within a Bayesian framework in Section 4, show the probability density function (PDF) for its time-delay distance and lens distance (D d ) and discuss possible sources of uncertainty. The inference of the cosmological parameters is carried out in Section 5 and finally followed by a summary in Section 6.
Throughout this paper, we adopt a standard cosmological model with H0 = 82.5 km s −1 Mpc −1 , a matter density of Ωm = 0.27 and a dark energy density of ΩΛ = 0.73, where our particular choice for H0 = 82.5 km s −1 Mpc −1 is driven by the time-delay distance measurements of RXJ1131−1231 in Suyu et al. (2014).

THEORY
Our joint strong lensing and stellar dynamical modelling machinery relies on a pixelated source fitting algorithm and the solutions of the Jeans equations in axisymmetric lens geometry, which are embedded in a Bayesian framework (see e.g. Barnabè & Koopmans 2007;van de Ven et al. 2010, for similar approaches). For brevity, we refer the reader to Suyu et al. (2006Suyu et al. ( , 2013 and Cappellari (2008), which cover in detail the theory and application to real observational data. Here, we confine ourselves to a description of the main formalism.

Time-delay strong lensing
In any strong lens configuration with a time-variable background source, an excess time delay will be observed. Here, θ is the angular image position, β the corresponding source position, z d the lens redshift, D d , Ds & D ds the angular diameter distance to the lens, the source and between the lens and source respectively, and the Fermat potential. The difference in the light propagation time at image position θ with respect to the non-lensed case can therefore be attributed to the first and second term in the Fermat potential, which represent the geometric excess path length and the gravitational time-delay of the lens potential ψL( θ) respectively, and a combination of cosmological distances which are generally referred to as the time-delay distance With D∆t being inversely proportional to H0, Eq.1 can be rewritten as i.e., the excess time delay can be used as a cosmological probe, if the form of the lens potential is sufficiently well known. However, since t( θ, β) itself cannot be measured, we rely on relative time delays between multiple images i and j, with e.g. quadruply imaged systems naturally providing more constraints than doubly lensed sources. Given the observables θi and ∆tij, the lens potential ψL and the source position β need to be modelled accurately to infer H0. A major drawback of this inference, though, is the MSD 3 . For illustration purposes, we assume a transformation of the lens potential ψL of the form where λ, c0 and s are constant scalars and vectors respectively. Moreover, the projected matter density ρ2D is related to its gravitational potential via Poisson's equation where is the projected dimensionless surface mass density (SMD) and the critical SMD, that is used to discriminate between the weak (κ 1) and strong lensing regime (κ 1). According to Eqs. 6 and 7, any transformation ψ L,λ ( θ) of ψL( θ) will translate into a transformed source position and a dimensionless SMD of the form That is, as λ, c0 and s only change the position and scaling of the source, which by itself is not directly observable, the above transformation essentially implies that any transformation of the lens potential (and hence of the dimensionless lens SMD) will be compensated by a corresponding scaling in the source plane, leaving many observables invariant under the transformation. Unfortunately, however, D∆t does not belong to this set of invariants. Being highly susceptible to the MSD, D∆t will also be scaled as follows with D model ∆t being the model time-delay distance (without accounting for the mass-sheet-transformation parameter λ in Eq. 11), and any cosmological inference based on strong lensing alone is therefore fundamentally limited by our ignorance of λ.
Since lensing is sensitive to all mass along the LOS, including small and large scale structures in the projected vicinity which can contribute to the SMD at the lens location, the MSD is inherently linked to this external convergence (κext). In fact, the MSD stems from the degeneracy between κext and the normalisation of the lens potential (but see Schneider & Sluse 2013, for a critical discussion). Consequently, Eq. 12 reduces to where D∆t is the true time-delay distance to the specific sightline of the lens, after accounting for the MSD. In contrast to λ, though, κext has the benefit of not simply being an arbitrary scaling of the lens potential, but being observationally and/or numerically assessable via photometric and spectroscopic observations of the lens environment (Fassnacht et al. 2006;Momcheva et al. 2006;Rusu et al. 2017;Sluse et al. 2017) as well as ray-tracing methods through e.g. the Millennium Simulations (Hilbert et al. 2007(Hilbert et al. , 2009Greene et al. 2013;Collett et al. 2013) and weak lensing (Tihhonova et al. 2018). Whereas early attempts to quantify κext have been only moderately successful, yielding external convergences that can affect the final measurement of H0 by 5% and more, more recent studies indicate that the distribution and impact of κext can be drastically reduced when e.g. sightlines are not significantly overdense (Greene et al. 2013;Rusu et al. 2017) or individual lens fields are modelled (McCully et al. 2017). Nonetheless, other means are needed to effectively break the MSD, and to reliably measure D∆t. This is particularly evident from Eq. 4 and 7. Assuming, for instance, a simple power law profile for the 3D density distribution (i.e. ρ3D( r) ∝ r −γ ), the lens potential becomes ψL ∝ r 2−γ . Any uncertainty in the slope of the mass density will thus translate into an uncertainty on the inferred time-delay distance (D∆t ∝ 1 γ−1 ). To constrain γ, methods have been developed to make use of the spatially extended lensed images of the source galaxy, i.e. the Active Galactic Nuclei (AGN) host galaxy in the case of time-delay lenses (e.g. Warren & Dye 2003;Suyu et al. 2006;Dye et al. 2008;Birrer et al. 2015). In this work, we follow the work of Suyu et al. (2006) and Suyu et al. (2013) for the lens modelling by describing the AGN host galaxy surface brightness on a grid of pixels, and the lens mass distribution with parameterised profiles. In addition, stellar kinematics of the lens galaxy are employed, which provide an independent assessment of the lens potential at different radii, further breaking lens mass model degeneracies for constraining D∆t.

Axisymmetric Jeans modelling
The dynamical state of a system of particles is fully described by its distribution function (DF) f ( x, v) 0, with particle positions x and velocities v. In the case that these particles are collisionless, interact purely via gravitational forces and are embedded in background potential ψD that is smooth in time and space, the time evolution of the DF is subject to the Collisionless Boltzmann Equation (CBE) which basically postulates a conservation of the phase-space density. Yet, as the phase-space distribution is not accessible for objects beyond our Galaxy, where only bulk motions and positions of stars along specific LOSs are observed, the CBE is impractical for real observational purposes. In fact, any real world application of the CBE would require a more practical formalism, incorporating kinematic moments which are more easily measurable via line profile shifts and widths. This can be achieved by multiplying the CBE with powers of the velocity moment and subsequent integration over velocity space. Further, rewriting the CBE in terms of the cylindrical coordinate system (R, z, φ) and under the assumption of axial symmetry, we obtain the two Jeans equations (Jeans 1922) where ν( x) = f d 3 v is the zeroth velocity moment and tracer density of the gravitational potential ψD 4 . Given the four unknown second-order velocity moments v 2 i = 1 ν f v 2 i d 3 vi and vivj = vivjf d 3 v, the Jeans equations do not have a unique solution. In practice, assumptions about the shape and alignment of the velocity ellipsoid are made to simplify Eq. 15 and 16. These usually include the alignment of the velocity ellipsoid with the cylindrical coordinate system (i.e. vRvz = 0) and a flattening in the meridional plane (i.e. vz v R − 1 = βz), which yield the more commonly seen form These equations now link a mass and tracer density to three intrinsic second-order velocity moments which, in turn, can be used to obtain a projected second-order velocity moment along the LOS Here, x and y are the cartesian coordinates on the plane of the sky, z the coordinate along the LOS, i the inclination angle, µ the observed surface brightness (SB)in contrast to ν, which represents the (deprojected) intrinsic luminosity densitycos φ = x/R (where x and R 2 = x 2 + y 2 denote the intrinsic coordinate axis and cylindrical radius) and v and σ the observed velocity and velocity dispersion (with Vrms = √ v 2 + σ 2 ). The assumptions for the shape and alignment of the velocity ellipsoid have been found to be a good description of the internal dynamical structure of fast-rotating ) early-type galaxies (ETGs; Cappellari et al. 2007). Hence, the axisymmetric Jeans equations provide a decent fit to the observed kinematics and are usually in agreement with constraints that are obtained via more sophisticated orbit-based dynamical models. However, the Jeans equations do not make use of the higher order kinematic moments, which contain valuable information regarding the intrinsic shapes of galaxies. This is most prominent for massive, slow-rotating and pressure supported ETGs, where generally worse fits are obtained as the assumption of axial symmetry also breaks down (Li et al. 2018). In principle, the Jeans equations can be extended to triaxial systems, consisting of three equations and six second-order moments, but the set of solutions still contains unphysical DFs (f < 0) (van de Ven et al. 2003). We proceed with these caveats in mind and note that the validity of the axisymmetric Jeans equations needs to be evaluated on a case by case basis, especially for the purposes of precision cosmology, where we aim to constrain the mass profile and hence time-delay distances to the percent level.
When constructing axisymmetric Jeans models, the intrinsic luminosity density is obtained by deprojecting the observed SB distribution. For this, we make use of a Multi-Gaussian Expansion (MGE) (Monnet et al. 1992;Emsellem et al. 1994) 5 . In brief, the distribution is parameterised by a set of two-dimensional Gaussians, such that the SB can be written as where µ0 is the peak SB, σ the dispersion along the projected major axis and q the apparent flattening of each Gaussian. In an oblate axisymmetric case, the inclination i is the only free viewing angle required to perform the deprojection. The deprojection is not unique (Rybicki 1987), unless the galaxy is viewed edge-on, but konus densities which project to zero SB have been found to be of little effect for SB distributions of realistic (elliptical) galaxies (van den Bosch 1997). If a deprojectable (i.e. cos 2 i < q 2 min , with qmin being the axis ratio of the flattest Gaussian in the fit) inclination i has been chosen, the intrinsic luminosity density in cylindrical coordinates reads as where σj = σ j and qj = q 2 j −cos 2 i sin i now denote the intrinsic dispersion and flattening of the Gaussians. Simple (massfollows-light) models can easily be constructed by linking the tracer density ν to the mass density ρ via a mass-to-light ratio (M/L) for the individual Gaussians (Υj). The MGE is particularly handy here, as the gravitational potential can then be obtained by means of a simple, one-dimensional integral with and Q 2 j (u) = 1 − (1 − q 2 j ) u 2 .

Joint formalism & Bayesian Inference
We start our joint formalism with the strong lensing part. Given the image positions θ of the AGN and AGN host, a lens potential ψL will be adopted, which relates the AGN and AGN host source positions to those in the image plane via the lens equation where α( θ) is the scaled deflection angle. We describe the source intensity distribution on a grid of pixels with values s (vector with dimension Ns, the number of source pixels), which is related to the observed intensity value of the image plane d via Here, d is a vector with length N d (the number of image pixels) and n the noise in the data. f represents the lensing operator (a matrix of dimension N d × Ns), which contains information regarding the lens potential and observational effects -such as telescope point spread function (PSF) -and which is constrained by the extended image positions and intensities as well as the relative time delays between the individual AGN images. In general, lens potentials for which the deflection angles α( θ) (and hence the lensing operator f) can be obtained analytically and/or with only moderate numerical effort are adopted, and the goodness-of-fit for a particular model is defined as In the above equation, di represents the image pixel intensities, di,m the modelled image pixel intensities and σ d,i the corresponding pixel uncertainties. Whereas the first term in Eq. 26 represents the fit to the image intensity distribution, the second and third terms account for the χ 2 contribution from fitting in particular to the AGN positions (aj) and their relative time delays (∆t k ). A best-fitting model is usually quickly obtained by minimising the cost function in Eq. 26, when the parameters of interest are few. However, since we are interested in inferring credible confidence intervals for all parameters of interest, we perform an analysis within the framework of Bayesian statistics. For simplicity, let us assume that the strong lens configuration is well parameterised by a softened power law elliptical mass distribution (SPEMD; Barkana 1998) with ζ 2 = x 2 + y 2 /q 2 , where E is a normalisation factor, η the power law index, ζc the core radius and q the observed flattening with the x -axis being aligned with the projected galaxy major axis. According to Bayes' theorem, the posterior PDF for this lensing-only model -with the set of parameters τL = {E, η, ζc, q, ω} and data sets dL = {di, aj, ∆t k }is given by where the log likelihood corresponds to Eq. 26 (log PL ∝ −χ 2 L /2, after marginalising over the source intensity pixel parameters s; Suyu & Halkola 2010) and ω comprises a set of remaining variables, such as D model ∆t , κext and the position angle (PA) of the projected SPEMD on the plane of the sky (we measure counter-clockwise from the x -axis to the yaxis). Once (non-)informative priors for τL have been chosen, the marginalised posterior PDF for a parameter of interest can be obtained by integrating the joint (lensing-only) PDF over all nuisance parameters. Similarly, the dynamics-only posterior PDF can easily be obtained by means of the dynamics-only likelihood Note that τL = τD. Unlike the lensing data, the dynamics are insensitive to e.g. κext, while depending explicitly on additional parameters, such as βz and D d . In the case of a joint lensing & dynamics model, the joint prior is simply a union of both τLD = {τL, τD} and the lensing & dynamics posterior PDF reads as given the independence of both data sets. The exploration of the parameter space and adequate sampling of the joint lensing & dynamics posterior PDF is carried out by means of the affine-invariant ensemble sampler emcee (Foreman-Mackey et al. 2013). Starting points are obtained by first carrying out a pre-annealing process, in order to avoid low probability modes of the multi-parameter space, and initialising the walkers of the sampler such that they sample well the prior probability distribution. At each step, i.e. at each parameter combination that is probed by the walkers, the lensing likelihood in Eq. 26 will be evaluated. Secondary products of this evaluation are the dimensionless SMD (Eq. 8) and SB distribution, which are then transformed into a physical mass and luminosity density profile, before being parameterised by a MGE (Eq. 20 and 21) to allow for a straightforward, analytical calculation of the lens potential according to Eq. 22. Here, the critical SMD in Eq. 9 has to be expressed in terms of D∆t, D d and the lens redshift z d The SMD profile (as obtained from the lensing part) is multiplied with (1 − κext), in order to take into account any contribution from the external convergence (see Eq. 11). As a consequence, κext cancels out in the mass density profile that is used in Eq. 17 and 18, and we are thus insensitive to κext when using the kinematic data. Note, however, that the absolute scaling of the lens potential is fixed and the MSD broken when stellar kinematics are included, which provide an independent measurement of the lens potential. Moreover, as D d itself is insensitive to the external convergence along the LOS when inferred from kinematic and lensing observations (Jee et al. 2015, and also noted above), we have After deprojection and adoption of an anisotropy parameter βz, the likelihood function in Eq. 30 can finally be evaluated. In combination with τLD, this yields the joint lensing & dynamics posterior PDF in Eq. 31, where the marginalised distributions can then be visualised by histograms with the most probable model and the 1σ uncertainties being approximated by the median and 16th and 84th percentiles of the distribution.

Bayesian Information Criterion
A large, flexible and physically motivated set of light and mass parameterisations is generally utilised to accurately model the foreground lens. This approach is largely motivated by our ignorance of the true underlying gravitational potential, which is tightly linked to the excess time-delay (Eq. 5). Even if lens galaxies are found to be well approximated by power law density distributions (Koopmans et al. 2006), the inferred time-delay distances can differ significantly when compared to density distributions that follow more closely e.g. a NFW profile. The discrepancy in D∆t between different lens mass models can be partially alleviated by including stellar kinematic information. However, the vast amount of data from both lensing and future IFU stellar kinematics should enable us to perform a model selection, by means of the differences in their likelihood functions. To this end we will make use of the Bayesian Information Criterion (BIC) where k represents the number of model parameters, n the number of data points andL the corresponding model maximum likelihood. The BIC discerns between candidate models by penalising models of increased complexity (i.e. with higher degrees freedom).
In Eq. 34, the model maximum likelihood is given by PL(dL|τL), PD(dD|τD) or PLD(dLD|τLD), depending on which data set is fitted. Yet, owing to the MSD and the minuscule differences in the likelihood function of different mass parameterisations when modelling RXJ1131−1231 (Suyu et al. 2014), the lensing-only likelihood cannot be used to discriminate between models. Even in a joint model, the likelihood would easily be dominated by the large number of pixel intensities that are fitted in the lensing part, which is why we will use the differences in the goodness of fit of the IFU kinematics from our joint strong lensing & dynamics run, to perform a proper weighting of our models according to the BIC. We follow the approach of Birrer et al. (2019), where a weighting scheme with respect to the minimal BIC is defined as For a given model including the lens mass/light distribution, PSF, AGN light, and AGN host galaxy surface brightness, the BIC value of this model could be computed to rank it relative to other models. We are particularly interested in comparing the lens mass parametrisation, and thus the changes in BIC due to different mass parameterisation. However, the BIC depends on also other modelling choices/parameters, especially the number of surface brightness pixels used to describe the AGN host galaxy, which introduces an uncertainty on the BIC (see Suyu et al. 2013, where source pixelisation dominates the uncertainties in the BIC for a given form of lens mass parametrisation). Given finite computing resources and thus a finite number of source intensity grids that we could explore, we quantify the uncertainty on the BIC due to the source grid pixelisation effect by comparing the BIC values of a range of source grids and estimating the scatter σBIC. To account for the uncertainty in the BIC in weighting models, we follow Birrer et al. (2019) and convolve fBIC in Eq. 35 with a Gaussian of variance σ 2 BIC , obtaining the new weights as where Carrying out the convolution integral, we find an analytic expression as follows where the Erf and Erfc functions are defined as and

DATA
In this section, we present real and mock observations of RXJ1131−1231, which are used to infer future cosmological constraints by means of our joint strong lensing & stellar dynamical analysis. We focus on RXJ1131−1231, in particular, given that it i) is the brightest lens galaxy among the H0LiCOW base sample, ii) has the most precise time-delay measurements, with only 1.3% uncertainty in the longest time delay, and iii) has plenty of ancillary data, which make it the most promising candidate for future integral field unit (IFU) observations.

Imaging
HST Advanced Camera for Surveys (ACS) data of RXJ1131−1231 have been obtained as part of programme GO:9744 (PI:Kochanek). The data set comprises imaging in the F814W and F555W filters, where five exposures each have been taken with a total integration time of 1980s. For the analysis, we give preference to the F814W imaging, given the fact that the stellar mass-to-light ratios become a weaker function of the underlying stellar populations (Bell & de Jong 2001;Cole et al. 2001). Moreover, the F814W data shows a clearer separation between the AGN and the spatially extended Einstein ring, whereas the F555W filter is more difficult to model due to diffraction spikes extending into the lensed arcs (but see Birrer et al. (2016) for a joint modelling of both bands). The reduction and combination of the imaging data is performed via the standard Multidrizzle pipeline (Fruchter & Hook 2002), with charge transfer inefficiencies properly taken into account by empirically tracing back the charge-coupled device (CCD) detector readout mechanism and thus the initial charge distribution (Anderson & Bedin 2010). The images are sky subtracted, corrected for geometric and photometric distortions and cosmic ray cleaned, before drizzled onto a final science frame with 0.05 /pixel resolution. Flux uncertainties for each pixel are obtained by adding in quadrature Poisson noise from the source and background noise from the sky and detector readout.
The final science frame is displayed in Fig. 1, where the centrally located galaxy lenses the background AGN into a quadruple lens configuration (A, B, C & D). The background quasar host is a spiral galaxy (Claeskens et al. 2006), which forms the extended Einstein ring. Discovered by Sluse et al. (2003), spectroscopic measurements of the foreground lens and background source yield a redshift of z d = 0.295 and zs = 0.654 (Sluse et al. 2003(Sluse et al. , 2007, respectively. The foreground lens is further accompanied by a satellite galaxy (S), which is assumed to be a dwarf elliptical (Claeskens et al. 2006).

Time delays
Time-delay measurements of RXJ1131−1231 have been carried out by means of a dedicated optical monitoring campaign within the COSMOGRAIL program. Based on highcadence (3 days) long-baseline (9+ years and more than 700 epochs) observations with meter-class telescopes, and new curve-shifting techniques, Tewes et al. ∆tCB = −0.4 ± 1.5 days and ∆tDB = 91.4 ± 1.2 days, with systematic errors already taken into account in the uncertainty estimates. In general, the long-baseline measurements result in time delays with ∼ 3% precision (Tewes et al. 2013b;Liao et al. 2015;Bonvin et al. 2016), and microlensing shifts in the time-delays (Tie & Kochanek 2018) are found to be negligible, given the long time delay (Chen et al. 2018). Given the above measurements, RXJ1131−1231 is particularly suitable for TDC purposes. The background AGN is not only quadruply imaged, providing three independent time-delay constraints, but the longest time delay yields an uncertainty of only 1.3% and forms a comparably low floor for any time-delay distance measurement. While percent level precision of ∆t is a necessary condition for any TDC probe that aims to measure H0 to the percent level in a single lens study, it is not sufficient. Even though the timedelay of RXJ1131−1231 is the smallest contributor to the error budget in D∆t , the final precision of ∼ 7% is still substantial and mostly dominated by uncertainties in the LOS mass contribution and degeneracies in the lens mass model.

IFU stellar kinematics
Spectroscopic observations of RXJ1131−1231 with state-ofthe-art instruments have, so far, only been able to yield a single stellar velocity dispersion measurement , due to the faintness of the lens and difficulties in separating the bright quasar light from the galaxy. Yet, future observatories, such as JWST and E-ELT, will be capable of obtaining far more than a single aperture averaged measurement of the stellar kinematics, due to their improved sensitivity and resolution. In order to assess the improvements in constraining D∆t, when IFU data from the next generation of telescopes become available, we have mocked up stellar kinematics based on the specifications of JWST 's near-infrared spectrograph (NIRSpec). To this end, we have carried out a lensing-only fit to the imaging data and timedelays of RXJ1131−1231, with a source resolution of 64 × 64 pixels. The light model consists of four pseudo-isothermal elliptical profiles (PIEMDs), which are used to mimic a twocomponent Sérsic distribution (Dutton et al. 2011;Suyu et al. 2014). The mass model consists of a baryonic and non-baryonic component, where the former is obtained by multiplying the light profile with a constant stellar M/L and the latter is accounted for by a NFW halo. The best-fitting model of this fit is then employed to create a luminosity and mass density profile, and complemented by a minimal set of random dynamical parameters, to create a mock kinematic map of the second-order velocity moment according to Eq. 19. Whereas the mock time-delay distance D model ∆t is based on the best-fitting lensing-only model, the mock input lens distance D model d is obtained by adopting our standard cosmological model in Sec. 1 with z d = 0.295 and zs = 0.654.
To simulate observational effects and to assess maps of different quality, we spatially bin the map beforehand to a target S/N of 20, 40 and 60 respectively. Especially the latter two are deemed more than sufficient to reliably extract the kinematics across the entire FOV (Falcón- ). The binning is achieved by assuming a S/N in the central (brightest) pixel. Given the parameterised SB distribution, and assuming a Poisson noise dominated regime, the relative pixel intensities can then be translated into a relative 2D S/N map, which is then binned according to the above requirement (Fig. 2). The S/N in the central pixel is obtained by means of JWST 's exposure time calculator (ETC V1.3), where we aimed for a S/N that is both high enough to yield enough spatially resolved measurements and achievable with reasonable on-source integration times. Our final data sets are comprised of three different S/N estimates. A central S/N of 60 (100, 30), with a target S/N of 40 (60, 20) in each bin, which we denote as 60/40 (100/60 & 30/20) hereafter.
The mock kinematics cover a 3 ×3 FOV for JWST at 0.1 /pixel resolution (see Table 1). This mimics a single pointing with JWST, centred on the lens, where a small cycle dither pattern with subarcsecond shifts can be carried out to allow for identification and removal of cosmic rays and detector defects. The FOV contains ∼ 900 spaxels. In reality, however, the number of useful spaxels (and hence the final number of bins) will be smaller than the total number within the nominal FOV, due to contamination from B, D and S. These will be masked during the fitting of the stellar spectra and extraction of the stellar kinematics. The loss in spatial information, though, should be minimal given the small point spread function (PSF). Nonetheless, we will also probe a smaller FOV of 2 ×2 , to quantify the changes in our cosmological constraints when less data is available. This smaller FOV is a conservative assumption and results in a considerable loss of spatial information when compared to the nominal FOV of 3 ×3 , but can be considered as a worst case scenario, where we aim to predict the minimal gain in Figure 2. Left: SB distribution of RXJ1131−1231 at JWST NIR-Spec resolution, which has been transformed into a S/N map. The on-source integration time with NIRSpec has been tuned to achieve a S/N of 60 in the central spaxel (∼ 7h with the ETC V1.3). Given the relative intensity distribution from a parameterised fit to its SB profile, the S/N for all spaxels follows accordingly. Right: binned NIRSpec map of RXJ1131−1231 to a target S/N of 40 in each bin, to allow for a reliable measurement of the stellar velocity moments across the entire FOV. For simplification, we omitted the AGN images and satellite while mocking up the observations. Notes. The PSF size is the actual size we have used for mocking and modelling purposes, and roughly 2× larger than the diffraction limited PSF FWHM. The mocked up observations can be obtained with the respective filter combination, which is selected such that it covers our target Ca II triplet stellar absorption features, given RXJ1131−1231's redshift of z = 0.295. The high resolution grating with its resolving power corresponds to an instrumental velocity dispersion of ∼ 50 km s −1 , likely sufficient to yield reliable measurements of the LOS velocity distribution across the entire FOV.
our cosmological inference. Note that these mock maps are created to harness the full power of JWST 's IFU spectrograph and are in contrast to previous studies of spatially resolved, but still radially averaged profiles of the velocity dispersion (Shajib et al. 2018). The kinematic data is convolved with a PSF of size 2× the diffraction limit of JWST and has a FWHM of 0.08 . We add realistic errors to the mock kinematics, where the error in each bin is derived by drawing a random number from a Gaussian distribution with µ = 0 and σstat = (v 2 LOS,l ) 1/2 × 1 (S/N) l (where (v 2 LOS,l ) 1/2 is the mock (v 2 LOS ) 1/2 value at bin position l and (S/N) l its corresponding S/N) 6 . This standard deviation is employed as our true measurement error. Moreover, correlated and uncorrelated uncertainties of 2% each have been added on top of the random Gaussian noise, to account for observational errors that can arise due to e.g. stellar template mismatches. The former simulates a systematic floor in our mock data set , whereas the latter follows the approach described above by adding again random Gaussian noise with µ = 0 and σuncorr = (v 2 LOS,l ) 1/2 × 1 50 to all bins across the entire FOV. In summary, we have for each bin where δvstat = Gaussian[0,σstat], δvcorr = 0.02 (v 2 LOS ) 1/2 , and δvuncorr = Gaussian[0,σuncorr]. The IFU kinematic maps are shown in Fig. 3, where the last column depicts our final mock data, which includes all sources of uncertainty and is employed as our reference data set throughout our joint analysis.

ANALYSIS
We construct time-delay strong lensing and stellar dynamical models within the axisymmetric Jeans formalism, as described in Sec. 2, and make use of the high-resolution HST data, time-delays and mock IFU stellar kinematic maps in Sec. 3.

Setup
To reliably constrain cosmological distances to the percent level, we require flexible and accurate prescriptions of the underlying lens mass distribution. In our joint modelling, we therefore make use of observationally and theoretically motivated mass models. In the case of RXJ1131−1231, this includes a SPEMD (which accounts for both the dark and luminous mass) and a COMPOSITE mass model, consisting of parameterised fits to the baryonic and non-baryonic matter distribution. Here, the baryonic mass is parameterised via four PIEMDs, which are used to mimic a two-component Sérsic contribution, and a NFW dark halo, and is therefore identical in nature with the mass model that has been used to mock up the IFU kinematics in Sec. 3.3. Both mass models have been shown to provide excellent fits to the strong lensing data (Suyu et al. 2014), but an aperture averaged stellar velocity dispersion measurement was essential to bring both D∆t distributions into agreement. While the discrepancy could be resolved by including stellar kinematic data, the precision and accuracy is still limited, and this mass model degeneracy is the main contributor to the error budget, which we aim to constrain further by modelling the mock IFU data set.
For simplicity, we neglect the satellite when mocking up the IFU map (Sec. 3) as well as during the modelling of the strong lensing and stellar kinematic data. The satellite is small enough to result in a loss of only a few spaxels, when being masked during the extraction of the IFU kinematics. More importantly, though, the satellite has a negligible effect for our mass model and cosmological inference , as it contributes as little as 1% to the SMD at  . Mock IFU v 2 LOS maps of RXJ1131−1231, at JWST NIRSpec resolution, for a S/N configuration of 60/40. First: mock kinematics without any errors. The data is based on the best-fitting lensing-only model, complemented by a minimal set of random dynamical parameters. Second: mock kinematics with random Gaussian errors. The error in each bin depends on its S/N and mock (v 2 LOS ) 1/2 value. This constitutes the "IDEAL" data set, without systematic uncertainties in the kinematic measurements (due to e.g., stellar template mismatch). Third: correlated errors have been added to the noisy (v 2 LOS ) 1/2 map in the second column, which results in a systematic floor of 2%. Fourth: uncorrelated errors of 2% have been adopted and added to the map in the third column. This last column illustrates our final mock kinematics, which account for various source of (systematic) uncertainties and observational difficulties and which will be used throughout our analysis as our reference data set, labelled "FIDUCIAL". the lens location. Our analysis relies on two separate mass models, as described above. The modelling parameters for both mass parameterisations are presented in Table 2, along with the dynamical modelling parameters, which have also been used to mock up the kinematic data set. Note that the PIEMDs (i.e. the Sérsic profiles) are fixed during the fitting process. Fits to the SB distribution are carried out beforehand, and the SB distribution is translated into a SMD profile by means of a variable stellar M/L. Moreover, the PA and centroids of the dark and luminous matter distribution in the COMPOSITE mass model are fixed to the same value, to ensure that the projected SMD can be deprojected to an intrinsically axisymmetric mass distribution.
Our respective models include a total of 9 variable parameters with uniform priors for both the SPEMD and COMPOSITE mass models. While the COMPOSITE model has an additional variable parameter (rs) with a Gaussian prior, the two mass models (SPEMD and COMPOSITE) have effectively the same number of free parameters with uniform priors. This number is significantly smaller than the total number of variables in the final lens model of RXJ1131−1231 in Suyu et al. (2013). However, most of those variables have a negligible effect on the key cosmological parameters, such as D∆t and D d , and we therefore adopt those optimised variables as fixed values during the modelling process. The only exception with respect to the optimised values in Suyu et al. (2014) is the PA of the NFW halo and the SPEMD, which are now both aligned with the SB distribution. As axial symmetry is a necessary condition for the construction of the Jeans models, a vastly different PA would violate our underlying modelling assumption. Source resolutions of varying sizes, on the other hand, will be probed, given that the parameter constraints show significant shifts depending on the pixelisation scheme for the AGN host galaxy surface brightness. Besides the uncertainty due to different mass parameterisations, this systematic uncertainty is, in fact, one of the biggest sources of uncertainty for a given mass model, resulting in a distribution that can be 2 − 3× as wide as for a fixed source grid resolution . For each mass model, we will therefore examine six different source grid resolutions of 54×54, 56×56, 58×58, 60×60, 62×62 and 64 × 64 pixels. These source grid resolutions are usually sufficient to achieve an adequate χ 2 , while stabilising the final modelling constraints towards a common value. The source grid size is chosen such that it contains the entire source intensity distribution, with the outermost source grid pixels converging towards zero intensity values. Our final lensingonly models will then equally weight the constraints from models of different source grid resolutions and different lens mass parameterisations (i.e. COMPOSITE and SPEMD), by combining the individual Markov Chains. An equal weighting is applied in the lensing-only case, since we are incapable of differentiating between different models due to the MSD. In the case of a joint lensing & dynamics run, however, a weighting scheme according to the BIC will be applied, where the dynamical likelihood PD(dD|τD) will be utilised to perform a model selection.

Modelling
We visualise the results of our joint strong lensing & stellar dynamical models in Fig. 4 and 5, where we show the marginalised 1D PDFs for our main parameters of interest, i.e. D model ∆t and D model d with a S/N of 60/40. The top panels display the constraints from fits to the strong lensing and kinematic data with statistical noise only (hereafter IDEAL, corresponding to the second panel of Fig. 3), whereas the bottom panels show the results from fits to the IFU stellar kinematics including various sources of uncertainty (hereafter FIDUCIAL, corresponding to the fourth panel of Fig. 3). The blue shaded region displays the PDF for models with a COMPOSITE mass distribution, the red shaded region for models with a SPEMD and the grey shaded region the combined PDF from both distributions when the IFU stellar kinematics are included in the fit and a weighting according to the BIC is performed. Note that the blue and red shaded region in Fig. 4 is the PDF from lensing-only models. This is in contrast to Fig. 5, where these also include the stellar kinematics, as the lensing-only models are insensitive to D model d . Table 2. Model parameters and priors for our joint strong lensing & dynamical models, including the cosmological distances, the SPEMD, the COMPOSITE mass distribution and the dynamical variables. The mock IFU data set is based on the best-fitting COMPOSITE lensing-only model with a source resolution of 64 × 64 pixels and random values for the dynamical parameters. The mock cosmological distances are based on the best-fitting lensing-only model for D ∆t and assuming H 0 = 82.5 km s −1 Mpc −1 , Ωm = 0.27, Ω Λ = 0.73, z d = 0.295 and zs = 0.654 for determining D d .

Description
Parameters Both panels in Fig. 4 clearly show the discrepancy in the time-delay distance, when different mass parameterisations are employed to model the foreground lens, with a double peaked distribution. Since these two models yield a comparable goodness of fit (a manifestation of the MSD), both are equally plausible and an equal weighting is applied in the lensing-only case. The combined COMPOSITE and SPEMD PDF therefore results in a distribution of 1836 +98 −121 Mpc (here, the 50th and 16th/84th percentiles of the distribution rather than the mean and standard deviation of a single Gaussian). The discrepancy in the time-delay distance measurement can be resolved by including IFU stellar kinematics, where the combined and weighted distributions converge towards the common mock input value of 1823 Mpc. The 2D kinematics clearly distinguish between both mass models, where the systematic offset in v 2 LOS is utilised to significantly downweight the contribution from the SPEMD (Fig. 6). A Gaussian fit to the combined and weighted COMPOSITE and SPEMD PDF with stellar kinematics included (i.e. grey shaded region) yields [µI, σI] = [1793 Mpc, 39 Mpc] in the IDEAL case and [µF, σF] = [1800 Mpc, 38 Mpc] for our FIDUCIAL data set respectively. This time-delay distance constraint is a considerable improvement in precision (2.1%) when compared to the lensing-only models (6.0%), even when systematic uncertainties in the stellar kinematics are generously taken into account.
The improvement can be traced back to three effects, in particular, i) a smaller width of the PDF for individual mass parameterisations with different source resolutions, ii) a shift of the mean of the distribution towards the true input time-delay distance and iii) a drastic downweighting of models with a significantly worse goodness of fit. In Fig 7, we illustrate the first two effects by showing the PDFs from lensing-only and joint strong lensing and stellar dynamical models. In both cases, we adopted a COMPOSITE mass distribution and modelled with six different source resolu-tions. The joint fit to the IFU stellar kinematics considerably reduces the width of the combined PDF from different source resolutions, effectively erasing the low probability wings from lensing-only models in Fig. 4, while shifting the whole distribution towards the input time-delay distance.
A joint fit with stellar kinematics of even higher S/N almost perfectly recovers the input time-delay distance while reaching a precision of 1.7%. In contrast, lower S/N kinematics yield not only worse precision but also worse accuracy, where the time-delay distance is only recovered within 1.2σ. The differences in the constraints for models of different S/N can be attributed to the aforementioned three effects, which are less prominent when the quality of the IFU kinematics degrades.
When it comes to the constraints for the lens distance D model d , we observe a much tighter distribution for our reference S/N of 60/40. Fits to the data with statistical noise only recover remarkably well the input lens distance of 775 Mpc. A Gaussian fit to the combined and weighted PDF from models with a COMPOSITE and SPEMD yields [µI, σI] = [770 Mpc, 14 Mpc]. This is a 1.8% precision measurement for the secondary cosmological distance we aim to infer. Yet, the distribution is clearly biased towards lower distances for fits to our FIDUCIAL data set with [µF, σF] = [736 Mpc, 13 Mpc], which is a consequence of the correlated noise and systematic floor we have added to (v 2 LOS ) 1/2 to mock realistic observational errors. The bias is especially prominent for models with a COMPOSITE mass distribution, where the joint PDF across all source resolutions (with equal weighting) yields a distribution with [µ,σ] = [730 Mpc,14 Mpc]. Keep in mind, however, that the mock data has been created with a source resolution of 64×64 pixels, which is at the edge of the joint COMPOSITE PDF for both D∆t and D d (see e.g. Figure 7). Picking a mock source resolution, and hence an input lens distance, which is closer to the median of the joint PDF in the first place would have partially alleviated  , based on joint strong lensing and stellar dynamical models, for the IDEAL data with statistical noise only and a S/N of 60/40. The blue shaded region shows the PDF for strong lensing-only models and a COMPOSITE mass distribution, including six different source resolutions with equal weighting. The red shaded region shows the corresponding PDF for strong lensing-only models with a SPEMD. The grey shaded region shows the combined and BIC weighted constraints from the joint run, including both mass parameterisations and all source resolutions. Bottom: same as above, but for our FIDUCIAL kinematic data set, including all sources of uncertainty. The vertical dashed line denotes the best-fitting lensingonly value for the COMPOSITE mass model and a source resolution of 64 × 64 pixels, which was used as input to mock up the IFU kinematics.
this strong bias for our COMPOSITE mass models. Despite our mock source resolution, we observe a similar bias in the joint COMPOSITE and SPEMD PDF across all S/N. Nonetheless, we are able to recover the true lens distance from our joint COMPOSITE and SPEMD PDF within ∼ 3σ, even for our highest S/N pick (with correlated and uncorrelated systematics included), where the model uncertainties are the tightest.
The final modelling results are summarised in Table 3, along with our set of complementary models, which we have constructed to assess uncertainties related to modelling i) a smaller FOV, ii) a miscalculation of the PSF size and iii) a single aperture measurement. Especially the latter has been carried out for direct comparison with literature measure-  , based on joint strong lensing and stellar dynamical models, for the IDEAL data with statistical noise only and a S/N of 60/40. The blue shaded region shows the PDF for models with a COMPOSITE mass distribution, including six different source resolutions with equal weighting. The red shaded region shows the corresponding PDF for models with a SPEMD. The grey shaded region shows the combined and BIC weighted constraints from the joint run, including both mass parameterisations and all source resolutions. Bottom: same as above, but for our FIDUCIAL kinematic data set, including all sources of uncertainty. The vertical dashed line denotes the mock lens distance as obtained from the COMPOSITE mass model, a source resolution of 64 × 64 pixels and under the assumption of our chosen cosmology. All models include the stellar kinematic data, as the lensing-only models are insensitive to D model d alone.
ments from H0LiCOW, which are based on a single aperture velocity dispersion.

Sources of uncertainty
To understand the impact of various sources of uncertainty, we complement our FIDUCIAL ("Full FOV") models by fitting to slightly different data sets, which have been mocked up in the same manner as outlined in Sec. 3.3.

Field Of View
In a first test, we fit to a smaller 2 × 2 FOV. As mentioned in Sec. 3.3 and 4.1, we have omitted the satellite and The systematic v 2 LOS offset in the SPEMD models is used to perform an effective model selection according to the BIC, and results in a significant downweighting of its corresponding probabilities.f AGN images in the data construction and modelling phase. Clearly, both will result in a loss of spatial information, as the affected spaxels will have to be masked when measuring the kinematic moments. In addition to contamination from nearby sources, spatial information will also suffer due to the breakdown of a Poisson noise dominated regime in the remote regions. It is hard to quantify this loss beforehand, given that the final S/N in any given spaxel will be a complex function of the noise properties of the detector, but we try to mimic both effects by drastically reducing the FOV by almost 50%. Keep in mind, however, that this does not translate to a loss of 50% of spatial information. The final number of bins is still 56, compared to 78 for the nominal FOV, and a consequence of the low S/N spaxels beyond ∼ 1 being discarded, which are otherwise massively binned to reach the target S/N.
The modelling results of this run are presented in Table 3, where fits to the IDEAL (i.e. with statistical noise only) and FIDUCIAL (i.e. with correlated and uncorrelated systematics) data are taken into account for a S/N configuration of 60/40. As expected, the constraints for Mpc] suffer and we obtain a precision of 2.5%, when compared to our reference "Full FOV" models with the same S/N. While we achieve a comparable precision for D model d with this smaller FOV, it is noteworthy that the bias towards lower distances is even more pronounced [µ, σ]FIDUCIAL = [733 Mpc, 13 Mpc], where the true lens distance can now only be recovered within ∼ 3.2σ. These findings urge us to aim for the deepest and highest S/N observations, as any loss in quality and/or quantity of the IFU stellar kinematics quickly diminishes any potential gain in the cosmological distance measurements. , based on strong lensing-only models. The individual colours represent the PDFs for models with different source resolutions and the same COMPOSITE mass parameterisation for the foreground lens mass distribution. The mean and standard deviation of the combined PDFs (i.e. for all source resolutions and equal weighting) is given in the legend. Bottom: same as above, but for joint strong lensing and stellar dynamical models, where systematic errors in the measurement of the stellar kinematics are included. Information from IFU data helps in reducing the width of the PDF for a given mass parameterisation, getting rid of the low probability wings, and shifts the mean towards the mock input value (vertical dashed line) at a source resolution of 64 × 64 pixels.

Point Spread Function
For the creation of the mock IFU stellar kinematics as well as during the modelling phase, we convolve the predictions of the axisymmetric Jeans equations with a PSF of 0.08 FWHM size. This PSF is twice as large as the diffraction limit of JWST, but a reasonable choice considering past applications and recent simulations for the next generation of telescopes (Tecza 2011). In contrast to state-of-the-art AO assisted instruments, this remarkable angular separation is achieved by means of a significantly smaller, flux dominating PSF core. The PSF will, however, be undersampled, given NIRSpec's pixel size, and a dithering strategy is vital to achieve the nominal spatial resolution. Nonetheless, slight mismatches with the true PSF can be expected when measuring the PSF size from real observations, and we account Table 3. Cosmological distance constraints from strong lensingonly and joint strong lensing and stellar dynamical models. The first column indicates the model, mock error type and S/N of the IFU stellar kinematics. Our FIDUCIAL modelling results are summarised under "Full FOV". Results for models with a smaller 2 × 2 FOV, an overestimated PSF and a single aperture measurement are denoted respectively. The latter three have only been modelled for a S/N of 60/40, which we deem optimal for future cosmological studies with JWST. The second column shows the constraints for the model time-delay distance D model ∆t , when the combined PDF is fitted by a normal distribution with mean µ and standard deviation σ. In instances where the distribution is clearly bimodal, we quote the 50th and 16th/84th percentiles of the distribution. The third column shows the same constraints for the model lens distance D model d . The PDF in all cases but the gNFW is the combined and BIC weighted PDF from COMPOSITE and SPEMDs models, with 6 different source grid resolutions. In the lensing-only case, an equal weighting of all models and source resolutions is applied due to the MSD. for this mismatch by convolving the model predictions with a PSF that is roughly 10% larger (i.e. ∼ 0.09 ).
Our PSF mismatch modelling results are again summarised in Table 3. In light of the sub-pixel PSF size, the cosmological distance constraints are stable across both error assumptions (i.e. for statistical noise only and with correlated and uncorrelated errors on top), yielding almost identical precision on both D model ∆t and D model d . We therefore omit probing models of different S/N and note that our constraints are insensitive to minor deviations from the true PSF size.

Single aperture
Stellar kinematics are now commonly employed to break the inherent modelling degeneracies in strong lensing studies. Yet, due to the faintness of the lens and difficulties in separating the bright quasar light from the galaxy, even with state-of-the-art facilities, the data is confined to a single aperture measurement of the stellar velocity dispersion. Moreover, currently employed techniques utilising this kinematic information in strong lensing studies are usually not self-consistent or physically too simple to capture the true complexity of realistic lens galaxies. For instance, most implementations in strong lensing studies assume elliptical lens mass models, but model predictions for the stellar kinematics are based upon a spherically symmetric mass distribution. Similarly, simple assumptions for the velocity anisotropy profile are made (Osipkov 1979;Merritt 1985a,b).
In order to assess the impact of using a single aperture measurement on D model ∆t and D model d , under the aforementioned modelling limitations, we mock up stellar kinematics of RXJ1131−1231 within a 0.8 ×0.8 FOV at JWST resolution, according to the procedure outlined in Sec. 3.3. The kinematics are then luminosity weighted to simulate a single aperture measurement of v 2 LOS . In principle, we would have to split v 2 LOS into two first order moments (Satoh 1980), defining the contribution of its ordered vs. random motions, for a straightforward comparison with a stellar velocity dispersion measurement. For simplicity, however, we assume that this massive elliptical lens is dispersion dominated such that v 2 LOS ≈ σ 2 within our FOV. Contrary to all previous models, but in line with literature studies, we break the self-consistency and run the axisymmetric Jeans models in the spherical limit, by fixing the projected short-vs. long-axis ratio of the luminous and dark matter distribution to q = 0.99. For both the COMPOSITE and SPEMD, the models can easily recover the single aperture v 2 LOS measurement of 325±12 km s −1 , yielding very similar goodness of fit values across all source resolutions. Ergo, the BIC is not capable of discerning between the two different mass parameterisations, leaving the final distribution still double peaked with D model ∆t = 1837 +99 −121 Mpc for our FIDU-CIAL models (median and 16th & 84th percentiles). The consequences of anchoring the cosmological distance measurements in RXJ1131−1231 on a single aperture velocity dispersion are most noticeable for D model d , where the 1D PDF is only loosely constrained, implying a precision 11%. This is smaller than the 18% found in Jee et al. (submitted), based on spherical Jeans models that fit the literature stellar velocity dispersion of 323±20, but the difference is likely to be attributed to the smaller errors in our mock data. Even though the precision on the distance measurements degrades substantially with only a single aperture averaged second order velocity moment instead of a 2D kinematic map, the input distances are recovered well within 1σ, without obvious signs of bias despite the spherical symmetry assumption employed here.

Generalised NFW
In a last effort to quantify the systematic uncertainties in our models, we adopt a generalised NFW profile (gNFW), where the halo follows a density distribution according to with ρ0 = δc ρc being a product of the characteristic density δc and the critical density ρc = 3H 2 /8πG at the time of halo formation, halo scale radius rs and density slope γ. The use of a gNFW halo is physically motivated by dissipational cosmological simulations, where the dark halo reacts to an accumulation of the central baryonic component via contraction (Blumenthal et al. 1986). More importantly, though, a mass model with a halo of gNFW form will allow us to better understand the systematics associated with a mass model which is comparably close to the true lens mass distribution. This is particularly interesting, given our general ignorance of the true underlying mass distribution. To this end, we make use of a gNFW halo with a rather arbitrary halo slope of γ = 1.2, while adopting the same priors for the remaining variables. When employing the above density parameterisation for the dark halo, we obtain strong biases for both the time-delay distance and lens distance, where the lens distance again is highly susceptible to any systematics in the kinematic measurements. . This simple toy model convincingly demonstrates the paramount importance of adopting a wide range of plausible mass parameterisations for TDC purposes, as strong offsets from the true time-delay distance will be measured otherwise, even if high-quality IFU kinematics are included in the fit. Note, that in contrast to earlier models, we have not combined the gNFW and COMPOSITE results. Rather, we have simply applied a weighting according to the BIC within the set different source resolutions for the gNFW case, as our focus was to quantify the systematic changes in the cosmological distance constraints for small changes in the parameterisation of the underlying gravitational potential.

COSMOLOGICAL FORECAST & DISCUSSION
We describe the cosmological constraints in flat ΛCDM using the forecasted distance measurements from the last section for both the IDEAL and FIDUCIAL lensing and dynamics models with a S/N of 60/40 and "Full FOV". We also compare the constraints from using the joint D∆t-D d measurement, with that from using only the marginalised D∆t measurement.

Importance sampling with the forecasted distances
In order to obtain a cosmographic forecast, the first step is to get the posterior probability distribution of the cosmological distance measurements, accounting for systematic uncertainties. From the lensing and dynamical modelling detailed in the previous section, we have Markov chains containing the sampled D model ∆t and D model d parameters, for various models and set ups. For each data set, we weight the various models (with different lensing mass parametrisation and lensing source grid resolutions) using their BIC values following Eq. 38, where we have estimated σBIC through the scatter in BIC values from models that differ only in lensing source resolutions (given that the source resolutions have a dominant effect on the scatter). We then combine the weighted chains/models, and fit the marginalised D , we can relate this to constraints on the cosmological parameters in any background cosmology through importance sampling (e.g., Lewis & Bridle 2002;. As an illustration, we consider the constraints on H0 specifically for the flat ΛCDM cosmology, where we adopt uniform priors on H0 between [50,120] km s −1 Mpc −1 and on the matter density parameter Ωm between [0.05, 0.5]. We draw 10 7 samples in {H0, Ωm}, and compute the corresponding D∆t and D d values given the lens and source redshifts in flat ΛCDM. For each of these samples, we also draw a value of κext from the κext distribution, and scale the distances according to Eqs. 13 and 33 to obtain D model ∆t and D model d . We finally weight the sample by P (D model ∆t , D model d |dL, dD). From the distribution of the weighted samples, we obtain constraints on H0 and Ωm.

Forecasted H0 constraint in flat ΛCDM
We show in Fig. 8 the cosmographic constraints for the FIDUCIAL lensing and dynamical models with S/N of 60/40 (violet). The constraints with statistical errors only (blue) are comparable, though slightly shifted in D d towards the mock input value of 775 Mpc. After including all sources of uncertainty, we expect to achieve a measurement of H0 = 86.2 +2.1 −2.0 km s −1 Mpc −1 , with 2.4% precision (defined by the 50th, 16th and 84th percentile), by having highquality spatially-resolved kinematic data from JWST. The marginalised D∆t constraint is 1895 +44 −45 Mpc, which is of similar precision as H0. Compared to the 6.6% uncertainty in D∆t without spatially resolved kinematic data (Suyu et al.  . The blue shaded contours show the corresponding constraints for our models with statistical noise only (i.e. without correlated and uncorrelated systematic errors). The green shaded contours are obtained from our FIDUCIAL models with a S/N of 60/40 when κext is estimated from number counts along overdense lens LOSs (Suyu et al. 2014). Both H 0 and D ∆t are recovered incredibly well in our FIDUCIAL models, with a precision of 2.4% and 2.3% respectively. While we can quote a precision of 1.8% on D d , the recovered value is highly biased towards lower distances due to the systematic floor we have added to v 2 LOS , in order to mock real observational errors. 2014), we are reducing the systematic uncertainty by a factor of ∼ 3. Even when accounting for a more conservative κext distribution (Suyu et al. 2014), by ray tracing through overdense LOSs in the Millennium Simulations (Springel et al. 2005), a measurement of H0 = 86.1 +3.2 −2.5 km s −1 Mpc −1 , i.e. with 3.3% precision, is within reach, partly constrained by the assumption of flat ΛCDM that restricts the range of plausible D∆t and D d values.
With spatially resolved kinematics, we would also constrain D d to 735 +13 −13 , with 1.8% uncertainty. This measurement is substantially better than the ∼ 18% from the single-aperture average velocity dispersion measurement (Jee et al., submitted;Jee et al. 2015), yet biased towards lower values due to the systematic floor we have added to the FIDUCIAL mock kinematics. Nonetheless, we recover the mock input lens distance within ∼ 3σ. To see whether D d helps to further constrain H0, we repeat the cosmographic forecast above using the marginalised probability distribution of D∆t, i.e., P (D∆t|dL, dD, denv) = dD d P (D∆t, D d |dL, dD, denv). With only the marginalised D∆t, the constraint on H0 degrades slightly to 87.4 +2.2 −2.2 (i.e. 2.5% uncertainty) for the FIDUCIAL model. Note, however, that the marginalised D∆t measurement is not totally ignoring the distance information in D d , as it has been used to break the MSD in the first place. Moreover, the combination of two distance measurements from a single lens provides not only tight constraints on H0, but adds significant constraining power also for the matter density Ωm (Fig. 10).
Given the low redshift of the lens galaxy in RXJ1131−1231, H0 is primarily constrained by D∆t in this case, but the tight constraint on D d would provide substantial constraints on cosmological models beyond flat ΛCDM. For illustration, we show the cosmographic constraints for flat wCDM cosmology in Fig. 10, where we focus on our FIDUCIAL models with improved (McCully et al. 2017) and conservative (Suyu et al. 2014) κext distributions. We note that the prior range on D∆t and D d in the more general wCDM model is substantially broader compared to flat ΛCDM; the conservative κext distribution thus leads to a wider D∆t distribution, in comparison to the case of ΛCDM in Fig. 8. With w = −1.4 +0.6 −0.7 , the time-independent dark energy is only loosely constrained but, nonetheless, compa-rable to the combined constraints from 3 single lenses without 2D kinematic data. . Also, such a D d measurement would serve as a stringent anchor for the inverse distance ladder approach for inferring H0 (Jee et al., submitted).
Our forecasted constraints are essentially limited by the S/N of the spatially resolved kinematic maps for breaking lens model degeneracies. As shown in the previous section, higher S/N helps to discriminate between the different mass parameterisations better and hence provide tighter constraints on D∆t, as the differences in the goodness of fit (and thus in the BIC weighting) become more prominent. For the case of S/N of 60/40 of the FIDUCIAL model, the resulting H0 would have an uncertainty of 2.4%. While this S/N can be achieved with reasonable observation times (∼ 6h), higher quality data to constrain H0 further would be difficult to obtain with JWST given the long integration time needed. Future giant segmented mirror telescopes like the E-ELT and TMT, however, could achieve a S/N 100/60 within the same time, owing to their ∼ 5 − 6× larger aperture.

SUMMARY & OUTLOOK
In this paper, we presented a self-consistent joint strong lensing & stellar dynamical modelling machinery for TDC purposes, which employs a pixelated source reconstruction model and the solutions of the Jeans equations in axial symmetry. Our analysis is carried out within the framework of Bayesian statistics and suited, especially, for the study of strong lens configurations for which IFU stellar kinematic data will become available in the near future, by means of the next generation of ground-and space-based telescopes. To assess the performance of the machinery and the expected gain in the inference of cosmological distances and parameters, we mocked up IFU observations of the prominent lens system RXJ1131−1231, at JWST NIRSpec resolution. RXJ1131−1231 was a particularly natural choice for this study as it is the brightest known lens galaxy for which precise time-delay measurements are already available. The mock stellar kinematic map was based on the best-fitting lensing-only mass model, with a dark and luminous matter contribution, while making random assumptions about the orbital anisotropy and viewing orientation. The mock lens distance has been obtained by assuming a standard cosmological model with H0 = 82.5 km s −1 Mpc −1 , Ωm = 0.27, ΩΛ = 0.73 and a lens and source redshift of z d = 0.295 and zs = 0.654, respectively.
With this suite of data, consisting of deep HST imaging, precise time-delays and mock IFU stellar kinematics of various levels of quality and including various sources of uncertainty, we constructed joint strong lensing & stellar dynamical models. Our models relied on two different mass parameterisations (COMPOSITE and SPEMD), which have been shown to yield significantly different time-delay distances when lensing-only fits are carried out (Suyu et al. 2014). Given the vast amount of information from the spatially resolved kinematics, we utilised the systematic differences in the predicted second-order LOS velocities between the different models, to apply a model selection according to the The violet shaded contours show the 1, 2 and 3σ confidence intervals for our FIDUCIAL models with S/N of 60/40 (i.e. including correlated and uncorrelated systematic errors in the IFU stellar kinematics). The black points (lines) depict the mock input values after accounting for a κext distribution with mean 0.05 and standard deviation of 0.01 (McCully et al. 2017). The green shaded contours are obtained from our FIDUCIAL models with a S/N of 60/40 when κext is estimated from number counts along overdense lens LOSs (Suyu et al. 2014). Even if w is only loosely constrained, given that D ∆t is mainly sensitive to H 0 , the constraints from this single lens system yield similar precision as literature studies of 3 lenses without IFU kinematics and provides a promising avenue for the exploration of models beyond ΛCDM.
BIC. The main results of our study can be summarised as follows: • The models recover remarkably well our input timedelay distance D∆t ( 1σ), when high-quality IFU stellar kinematics (S/N 60/40) are available. This result is irrespective of the IFU stellar kinematic errors (i.e. assuming purely statistical errors or with correlated and uncorrelated systematics of 2% each included).
• The time-delay distance can only be recovered within 1.2σ or worse, when the S/N of the IFU kinematics degrades below that of our reference 60/40 model.
• The lens distance D d is recovered in all cases within 1σ, when purely statistical errors for the stellar kinematics are assumed. But, a strong offset from the mock lens distance is observed, when the stellar kinematics are systematically biased towards higher or lower values. In these instances, the true lens distance can only be recovered within 3σ or worse, depending on the systematic offset of the data. Controlling the systematics in the measurement of the stellar kinematics is therefore key for a reliable inference of D d .
• The aforementioned results are valid for a 2D map, that covers the LOS velocity distribution within a 3 ×3 FOV (e.g. JWST NIRSpec nominal FOV). Modelling a smaller 2 ×2 FOV, to account for loss of spatial information due to contamination from nearby objects, yields similar accuracy but is less precise (∆D model ∆t /D model ∆t = 2.5%). This highlights the importance of deep and high-quality IFU data, as the gain in the cosmological inference is easily diminished when less spatial information is available.
• Small mismatches with the true kinematic PSF size have a negligible impact on the final modelling constraints.
• A single aperture stellar velocity dispersion is not very effective in breaking the MSD in RXJ1131−1231, yielding a marginal improvement in precision for D model ∆t over lensingonly models. The constraints for D model d suffer the most, with a precision 11%.
• We achieve a 2.1% precision measurement on D model ∆t , for our FIDUCIAL models with a S/N of 60/40 and including various sources of uncertainty while mocking up the IFU stellar kinematics. This translates to a 2.3% and 2.4% precision measurement on D∆t and H0, respectively, in flat ΛCDM.
• A 1.8% precision measurement can be achieved for D d . Yet, this measurement is sensitive to the aforementioned systematics in the stellar kinematics, since the constraints are mainly anchored by the IFU data.
• The constraints for D∆t, D d and hence H0 improve by a factor of 3, when high-quality IFU stellar kinematics are incorporated in the fit. The improvement can be traced back to three effects in particular, i) a smaller width of the PDF for individual mass models with different source resolutions, ii) a shift of the mean of the distribution towards the true time-delay distance and iii) a drastic downweighting of models with a significantly worse goodness of fit, which is otherwise not feasible due to the MSD.
The increased flexibility of our models allows for a more realistic modelling approach, while circumventing many of the assumptions and limitations of literature time-delay studies. Yet, as the lensing cross section increases with mass, gravitational lenses are likely to be massive elliptical galaxies, which have grown through numerous violent minor and major merger encounters (Wellons et al. 2016). As a consequence, lens galaxies are neither spherical nor elliptical. In fact, recent studies strongly indicate that the most massive galaxies are triaxial (Li et al. 2018), and modelling within an axisymmetric framework might be equally inadequate. It is beyond the scope of this paper to quantify the systematic uncertainties that can be traced back to the violation of axial symmetry, but literature studies show that the reconstructed dynamical masses can be underestimated by as much as 50%, depending on the viewing orientation (Thomas et al. 2007). Taking into account the link between the gravitational potential and the excess time delays, we advise against a modelling within this framework if strong signatures of triaxiality, such as isophotal twists or kinematically decoupled components, are present.
The modelling machinery presented in this paper, along with high-quality IFU data from future space-and groundbased telescopes, provides a promising outlook for constraining cosmological parameters to the few percent level from axisymmetric lenses. Given our forecast for the single lens system RXJ1131−1231, an H0 measurement of 1.4% precision can be expected, if three such distance measurements are combined. This is an important boost in precision when compared to the combination of three such lens systems without 2D kinematic data ) and comparable to the current best cosmological probes. It is worth noting, however, that the stellar kinematics have been mocked up by means of a COMPOSITE mass model, which in turn was used to model the suite of data. In reality, though, the set of candidate models is unlikely to contain the true form of the lens potential. As the BIC only applies a relative weighting scheme between all available models, the final accuracy and precision of the cosmological inference heavily relies on an adequate description of the true lens potential. In fact, even the slightest deviations from the true lens potential, as demonstrated by our gNFW models, can result in a biased inference of H0, which is in agreement with Sonnenfeld (2018), where a simple power law model was found to be insufficient to provide an unbiased measurement of H0 in most cases. As a consequence, the study presented here can only be regarded as a best-case scenario, and we strongly encourage to probe a large set of flexible and physically motivated lens mass parameterisations with sufficient degrees of freedom in the radial density profile, to minimise the systematic errors associated with it.