Cosmology from Large Populations of Galaxy-Galaxy Strong Gravitational Lenses

We present a forecast analysis on the feasibility of measuring the cosmological parameters with a large number of galaxy-galaxy scale strong gravitational lensing systems. Future wide area surveys are expected to discover and measure the properties of more than 10 000 strong lensing systems. We develop a hierarchical model that can simultaneously constrain the lens population and cosmological parameters by combining Einstein radius measurements with stellar dynamical mass estimates for every lens. Marginalizing over the lens density profiles and stellar orbital anisotropies, we find that $w$ can be constrained to a precision of $0.11$ with 10 000 galaxy-galaxy lens systems, which would be better than any existing single-probe constraint. We test our method on 161 existing lenses, finding $w=-0.96\pm0.46$. We also show how to mitigate against the potential systematic of redshift evolution in the mean lens density profile of the population.


INTRODUCTION
The acceleration of the universe has been discovered through Type-Ia supernovae (Perlmutter et al. 1999;Riess et al. 1998), and has been measured by several observational probes, including Cosmic Microwave Background (CMB) Anisotropies, Baryon Acoustic Oscillations, Weak Gravitational Lensing, Galaxy Clustering, and Redshift Space Distortion (Mortonson et al. 2013).These observations concluded that under flat ΛCDM model, the dark energy density makes up 70% of the universe today, and has an equation of state of w ≈ −1.However, the exact nature of dark energy and dark matter still remains unknown.The 5 σ discrepancy of Hubble Constant between Planck's CMB measurement (Planck Collaboration et al. 2014, 2016, 2020) and local measurements of supernovae (Riess et al. 2011;Freedman et al. 2012;Riess et al. 2016) also suggest potential new physics beyond the ΛCDM model.
In addition to the above methods, galaxy scale strong gravitational lensing provides an independent probe to constrain the cosmological parameters.Strong gravitational lensing occurs when two galaxies align perfectly to our line of sight, such that the light from the background source galaxy will be distorted and magnified by the foreground lens galaxy, resulting in multiple sources or an Einstein ring in the observed image (Einstein 1936;Zwicky 1937a,b).The radius of the Einstein ring (Eintein radius), relative positions, flux ratios, and time delays between multiple images depend both on the gravitational potential of the lens galaxy and angular diameter distances between the observer, lens galaxy, and source galaxy.It is the dependence on the angular diameter distances that makes strong lensing sensitive to cosmological parameters.Figure 1 illustrates the sensitivity of the tian.li@port.ac.ukEinstein radius to the equation of state of dark energy.The most well-studied method of constraining cosmology with strong lensing is time-delay cosmography, which uses the temporal variation of a gravitationally lensed quasar or supernova to constrain the Hubble constant (Refsdal 1964;Birrer et al. 2022;Treu & Marshall 2016;Treu et al. 2022).Alternatively, the equation of state of dark energy, w, can be measured from systems with sources at multiple redshifts (Gavazzi et al. 2008): Collett & Auger (2014) used a single double source plane lens to infer w = −0.99+0.19  −0.22 .Aside from single lens analyses, statistical analyses of the ensemble of lens systems can also place bounds on cosmological parameters.Oguri et al. (2008), Chae et al. (2002), andChae (2007) studied lenses from Cosmic Lens All-sky Survey (CLASS, Myers et al. 2003) and The Sloan Digital Sky Survey Quasar Lens Search (SQLS, Inada et al. 2012).They measured w through comparing the empirical distribution of image separations in observed samples of lenses with theoretical models.Since the number of lens systems is low, the constraint on w using this method is weak (e.g., w = −1.1 ± 0.6 +0.3  −0.5 , Oguri et al. (2008)).The current cosmology analyses with strong lensing systems are limited by several factors.The primary issue is that the sample of known galaxy-scale lenses is only a few hundred systems, discovered in several surveys with heterogeneous selection functions.Once a sample of time-delay or compound lenses has been selected only a handful of suitable systems have adequate data for precision cosmography, e.g the latest TDCOSMO sample has only 6 time-delay lenses (Birrer et al. 2020).Accurate and efficient lens modelling is also a challenge.The mass distribution in the lens must be inferred to convert lensing observables into cosmological constraints.This challenge is compounded by the mass-sheet degeneracy, where different mass models can produce identical strong lensing observables but imply different cosmological parameters (Schneider & Sluse 2013).Future optical imaging surveys, including Euclid and LSST, are predicted to discover more than 10 6 galaxy-galaxy strong lensing systems (Collett 2015).Euclid will provide high-resolution images, such that lens modelling could plausibly be performed for every lens, without the need for additional imaging data.However, almost all strong lensing science requires accurate lens and source redshifts, and the 4MOST Strong Lensing Spectroscopic Legacy Survey (4SLSLS) is expected to obtain the spectrum of ≈ 10 000 strong lensing systems (Collett et al. 2023).
Combining lensing and stellar dynamics opens up a new statistical method to constrain cosmology.Gravitational lensing determines the mass inside Einstein Radius, and stellar kinematics determines the gravitational potential within which the stars are moving (Koopmans 2006).Assuming general relativity, the dynamic mass enclosed with the Einstein radius must equal the gravitational mass measured using the Einstein radius.
Thus the combination of lensing and dynamical observables are sensitive to the mass profile of the lens, the orbital properties of its stars, and the cosmological distances (Futamase & Yoshida 2001;Grillo et al. 2008).The challenge inherent to this method is that the mass profile of lenses and the orbital profiles of their stars are not well known.To produce cosmological constraints, either assumptions must be made about the lenses, or the cosmological parameters and lens properties must be inferred simultaneously.Biesiada et al. (2010) applied the lensing plus dynamics method on 20 lens systems.By assuming a SIS mass profile for all lens galaxies, they found ΩM = 0.27 ± 0.28, and w = −0.63 ± 0.45.Cao et al. (2015a) and Chen et al. (2019) improved on this by fitting ∼ 100 lenses with powerlaw density profiles.
In this paper, we investigate how well the combination of Euclid and 4MOST data for 10 000 lenses can constrain the cosmological parameters.We employ a Bayesian hierarchical model and simultaneously fit for the cosmological parameters and the ensemble properties of the lens galaxies, including the density profile slope and the stellar orbital anisotropy.It is hard to perform detailed modelling for 10 000 lens systems, so we assume that we can only use catalog-level data in our analysis.
The rest of the paper is organized as follows.In Section 2, we present the mass model of the lens galaxy and the equation that relates the galaxy's velocity dispersion to cosmological parameters.
We then introduce the properties of the mock data and discuss potential future surveys for data acquisition.In Section 3, we describe the hierarchical model used to simultaneously fit the lens galaxy properties and cosmology.In Section 4, we present the results obtained under different cosmological models and data measurement accuracies.The final section summarizes the main conclusions.In this paper, the fiducial cosmology for the mock data set is as follows: ΩM = 0.3, ΩΛ = 0.7, Ω k = 0, and w = −1.

THEORY
The mass enclosed within the Einstein ring can be related to Einstein radius through: where D l is the angular diameter distance of the lens galaxy, D ls is the angular diameter distance between lens and source, and Ds is that between observer and source.θE is the angular Einstein radius.M(θE) is the galaxy mass enclosed within the Einstein radius, and the angular diameter distance is : where sinn(x) = sin(x), x, or sinh(x) for open (Ω k < 0), flat (Ω k = 0), or closed (Ω k > 0) universes respectively, and E(z) is the normalised Hubble parameter: When combining lensing and dynamics, the cosmological model is not directly probed by the measurement of a single distance, but instead through a ratio of distances Ds D ls .However, to make this measurement we need a model that can connect the dynamical data with the lensing mass.Since lenses are typically elliptical galaxies (ETGs) with E/S0 morphologies (Oguri & Marshall 2010), we use a power profile for both total mass density and stellar luminosity profile (Koopmans 2006): where ρ(r) is the mass density (include dark matter) distribution function.ν(r) is the luminosity density of stars.β(r) is the anisotropy of the stellar velocity dispersion (stellar orbital anisotropy), where σ θ and σr are the tangential and radial velocity dispersion, respectively.β ranges from +1 to −∞, where β = 0 corresponds to the "isotropic" case, β = 1 corresponds to a galaxy with pure circular stellar movement, β = −∞ means that stars in the galaxy only have radial movement.In the scenario where γ = δ = 2 and β = 0, the mass model reduces to the Singular Isothermal Sphere (SIS) model, which is a commonly used approximation for the mass profiles of elliptical galaxies (e.g.Auger et al. (2010)).
Solving Equation 5, we find that for a fixed galaxy mass and surface brightness profile, a steeper density profile (larger γ) and a higher velocity anisotropy (larger β) lead to a higher stellar velocity dispersion.Figure 2 shows the value of Equation 6 as a function of γ and β, assuming δ = 2.173, which is the typical value of SLACS lenses.

Mock data
To build our mock sample of lenses, we use the simulated strong lensing population forecast to be observed by Euclid from Collett (2015).We remove all systems with a source redshift greater than 1.5 since the [OII] doublet is redshifted out of the 4MOST wavelength range at this redshift.The sample is built assuming lenses are uniformly distributed in co-moving volume, follow the observed velocity dispersion function of SDSS (Choi et al. 2007), and have an SIS density profile.The source properties use the LSST sky simulations, with redshifts and number counts matched to observations (Connolly et al. 2010).These assumptions produce a realistic lens population, but lack the complexities of a non-SIS density profile, or of stellar anisotropy.Therefore, we take only the lens galaxy redshift, source galaxy redshift, and the Einstein radius in the data set.We assign γ, β, and δ values to each lens galaxy to produce new velocity dispersions.
Observations shows that γ has a distribution of 2.078 ± 0.16 Einstein radius and redshifts are taken from Collett (2015), with source redshifts below zs < 1.5.The velocity dispersions are computed from the other parameters using equation 5.Among the above properties, δ, redshift, Einstein radius, and Velocity dispersion are treated as observables.The measurement error of δ and redshift are negligible.γ can also be measured through detailed lens modelling, but we treat it as a free parameter except in the end of Section 5. (Auger et al. 2010), δ has a distribution of 2.173 ± 0.085 (Chen et al. 2019), and β = 0.18 ± 0.13 (Schwab et al. 2010;Bolton et al. 2006).
We generate mock galaxies using these distributions.The true values of velocity dispersion are generated through equation ( 6).Then, we add random noise to these values to simulate measurement error.
The measurement error of Einstein radius is set as 0.01 arcsec, and the error on velocity dispersion of 4MOST is 10km/s.We neglect errors on δ and the redshifts since they can be measured with high accuracy.In our fiducial model, we assume that β and γ for each lens are unknown.The full setup of mock data is shown in Table 1.

HIERARCHICAL MODEL
Since we do not know the mass profile or orbital anisotropy of individual lenses apriori, and they are not easily measured without detailed lens modelling and integral field unit kinematics, we instead require a hierarchical model to connect the measured Einstein radius and the velocity dispersion of every lens with the underlying cosmological parameters of the Universe.In fact, even the population properties for lens density profiles and stellar anisotropies are not well known once the cosmology is allowed to be a free parameter.
For our population model, we assume that density profile slopes and stellar anisotropies follow a Gaussian distribution.Our hierarchical model must therefore fit for the ensemble mean density profile slope, γ , and scatter, σγ, the ensemble mean anisotropy β , and scatter, σ β , the individual slopes, γi, and anisotropies, βi, of each lens, and the underlying cosmological model parameters.Figure 3 illustrate the structure of our hierarchical model, with the observed Einstein radii and velocity dispersions linked to the model parameters through equation 5.

Sampling the parameters of our hierarchical model
Our hierarchical model has a large number of free parameters ( ∼ 3 per observed lens × 10 000 lenses), traditional MCMC methods are not sufficient for exploring such a posterior distribution.To that end, we make use of the computational frameworks initially developed for large machine learning models, specifically the NumPyro (Phan et al. 2019;Bingham et al. 2019) (Baydin et al. 2018), allowing for the computation of gradients of the likelihood function to be evaluated for all free parameters in the model.We estimate the posterior distribution through NumPyro's No-U-Turn-Sampler (NUTS).This is an HMC sampler that uses gradient information to find new draw proposals from the likelihood.Its main advantages over traditional MCMC methods are it produces chains with small auto-correlation values (e.g. each draw is independent), requires relatively few warm-up steps, and can draw samples from very high dimensional distributions without issue.
The priors of the model parameters are as follows: Cosmology : Where U stands for uniform distribution, and HN is half normal distribution.Observables like Einstein radius and velocity dispersion have a Gaussian prior centered on their observed values and have a standard deviation that comes from the observational uncertainty.For NumPyro's NUTS sampler we use "init_to_median" as the initialization strategy with target_accept_prob = 0.99.We run 10 chains with 1000 warm-up steps and 1500 sampling steps.Under this set-up, a typical HMC run takes less than 1 hour on an A100 GPU.The typical r for each fit parameter is smaller than 1.05 with no divergent samples, indicating that all the chains independently converged on the same values.

RESULTS
We apply our hierarchical approach to model up to 10 000 lens systems and fit three different cosmological models.Aside from the ΛCDM model, we fit the wCDM model, where the equation of state of dark energy is not fixed at -1.We also fit the owCDM model, which allows the universe to have positive or negative curvature.
Figure 4 shows the posterior distributions of the ΛCDM, wCDM, and owCDM models.The marginalized results are also shown in the figure.The parameters of lens populations are accurately recovered within 68% confidence level.Especially, the mean of γ is constrained with high precision (± 0.03 level).Figure 5 shows the offset between the recovered and input values of the nuisance parameters in our model, i.e. the γi and βi The standard deviation of the differences are both 0.12.This number is smaller than the intrinsic scatter of γ and β (0.16 and 0.13), and demonstrates that the model is accurately recovering these nuisance parameters as well as the population and cosmological parameters that we are primarily interested in.
Every model successfully recovers the input cosmological parameters.For wCDM, we find w = −0.98 ± 0.11.This is more precise than current individual cosmological probes: the Pantheon Type Ia supernovae data find w = −0.90± 0.14 (Brout et al. 2022a); baryon acoustic oscillations (BAO) from eBOSS give w = −0.69± 0.15 (Alam et al. 2021).However, our results are less constraining on w than the combination of both datasets and cosmic microwave background data from Planck, which yields w = -1.03± 0.03 (Planck Collaboration et al. 2020).
We find some strong covariances between the cosmological parameters and the lens population parameters of our hierarchical model.The most noticeable are covariances between β with ωM , Ω k and w.This degeneracy arises from the fact that when mass is held constant, varying β leads to different velocity dispersions and, consequently, requires different cosmological parameters.Similarly, the mean density profile slope, γ , is covariant with cosmology, since it governs the mapping from velocity dispersion to Einstein radius.

The impact of data quality on the cosmological constraining power of lens samples
The previous section was based on the assumption that 4MOST delivers 10 000 strong lenses with a velocity dispersion measured to a precision level of 10 km/s.In practice, it is difficult to accurately constrain the velocity dispersion of the lens galaxy in a spectroscopic survey.First, not all lens galaxies will be bright enough for their spectra to have sufficient signal-to-noise ratio (SNR) to measure with 4MOST (Iovino et al. 2023).Second, measuring velocity dispersions from spectra is intrinsically hard, since we do not perfectly know the correct choice of stellar templates for the spectral energy distributions of lens galaxies (Collett et al. 2018;Newman et al. 2017).Thirdly, continuum contamination from the source galaxy may be challenging to remove (but see Turner et al 2023, submitted) In this section, we explore how the constraining power for the wCDM cosmological model varies when we have different sample sizes or velocity dispersion measurement errors.The results are shown in the top panel of figure 6.As expected, the constraining power on w improves as we increase the number of samples or improve the velocity dispersion measurement.The w constraint versus sample size roughly follows σw = 10 × N −0.5 as shown in the bottom panel of figure 6.For 4MOST observations, there will be a trade-off between observing more lens systems with relatively low SNR (resulting in larger LOSVD errors) and obtaining longer exposure time for each target (resulting in higher SNR but lower sample size).We find that generally, decreasing the LOSVD error by a factor of two (improving w by a factor of ≈ 1.3-1.7) is comparably helpful for w as increasing the sample size by a factor of two.
As shown in figure 5, the individual βi are constrained comparably well to direct measurements of the circular velocity curve using HST (0.12 vs 0.05-0.4,Gerhard et al. 2001).The individual γi error (0.12) is about 5 times larger than the typical result from lens modeling (e.g., 2.11 ± 0.04 by Dye & Warren (2005), 1.96 ± 0.02 by Dye et al. (2008), and 2.08 ± 0.03 by Suyu et al. (2010)).This indicates that adding constraints on individual γi through lens modeling, instead of leaving them as nuisance parameters would substantially improve the cosmological constraining power.

Evolving Equation of State of Dark Energy
The equation of state of dark energy (w) is defined as the ratio of pressure over energy density.In many theoretical cosmological models, w evolves with redshift (e.g.Peebles & Ratra (1988); Caldwell (2002); Feng et al. (2005)).We test our method on an evolving w where w(z) = w0 + wa z 1+z (Caldwell et al. 1998).Since our mock data has fixed w = −1, the fiducial value this model is w0 = −1, wa = 0 (Chevallier & Polarski 2001;Linder 2003).
Our forecast constraints for 10 000 lenses are shown in Figure 7.The 1D marginalized posteriors give wa = 0.0 +0.8 −1.4 , w0 = −1.0+0.5 −0.4 .Since these parameters are strongly covariant, we use the figure of merit (FoM) from equation 6 in Mortonson et al. (2010) to quantify the overall dark energy constraining power: Where A95 is the Area enclosed within the 95% confidence contour in the w0 − wa plane.Larger FoM's therefore imply better dark energy constraining power.The FoM of our 10 000 lens result is 15.As a comparison, the 10-year LSST forecast for the combination of weak lensing and Large Scale Structure which has a FoM of 49 (The LSST Dark Energy Science Collaboration et al. 2018), for LSST 10-year supernovae the FoM is expected to be 32.We find that the galaxy-galaxy strong lensing constraints have very similar w0 − wa covariance as those forecast for weak lensing.This is to be expected since both methods constrain the distance ratio Ds/D ls .

THE IMPACT OF THE EVOLUTION OF LENS GALAXY POPULATION
The results of the previous section were based on mock data where γ and β follow Gaussian distributions that do not evolve with redshift.However, this assumption may not be true of the real Universe.(which is all the lenses in our sample), dissipationless processes like gas-poor mergers can flatten the density slopes (e.g., Hilz et al. (2012Hilz et al. ( , 2013))).Hydrodynamic simulations show that the mass-density slope of early-type galaxies (ETGs) becomes flatter at lower redshift (Johansson et al. 2012;Remus et al. 2013Remus et al. , 2017;;Wang et al. 2020).However, most strong lensing studies find no evidence for evolving density slopes with redshift (Koopmans 2006;Auger et   2017), or a slight decrease in density slopes with redshift (Sonnenfeld et al. 2013;Cao et al. 2015bCao et al. , 2016;;Holanda et al. 2017).This might suggest that strong lensing surveys are biased towards certain lens populations, or that there is no evolution at all at low redshift.Since the possibility of an evolving mass profile has not been ruled out, we explore the robustness of our method when applied to an evolving lens population.
According to the simulations of Remus et al. ( 2017), the evolving total mass profile can be approximated by a linear relation: γz = 0.21z + 2.03.This equation gives unreasonably steep profiles compared to observed lenses, so for our new mock data we use the evolution of Remus et al. ( 2017), but fix the mean and scatter to give a total population that has the same average slope and scatter as in Section 4. Thus we draw our density slopes from γz = 0.21z + 1.89 with a scatter of 0.155.
We first try to fit the redshift evolving lens dataset with the nonevolving hierarchical model used in Section 4. The resulting cosmology posteriors distributions for 10 000 lens systems are shown in blue in figure 8 (full result in appendix A).The population parameter of γ population is recovered, but the cosmology parameters and the stellar anisotropies are systematically incorrect.Thus, a more complex model is needed to deal with this scenario.
We can generalize our hierarchical model to account for population redshift evolution by fitting additional parameters that describe the evolution of the population.In order to keep our parameters as similar as possible to those in section 4, we parameterise our density profile redshift evolution as follows: where 0.47 is the mean lens redshift of our population, we set γ = 2 as the ensemble mean density profile slope and ∆γ = 0.21 as its linear evolution with redshift.
Unlike the model where we ignore the evolution of the population, we find that fitting for the evolution does reproduce the input lens galaxy population parameters (Figure A2).We recover the correct redshift density evolution: ∆γ = 0.195 +0.015 −0.017 .Using this method, we are able to constrain w to -1.1 ± 0.07 level (Figure 8), which is better than the ± 0.11 error that we found for wCDM without Alternatively, the population parameters of describing density profiles can be constrained on a lens-by-lens basis using detailed lens modelling of the arcs observed in each system, without the need to know the cosmological model.This is because the slope of the mass density determines the radial derivative of the deflection angles, and thus the radial width of the arc is sensitive to the density profile slope (Dye & Warren 2005;Suyu et al. 2010;Collett et al. 2018).With image quality comparable to that of HST and Euclid, one can constrain γ to a precision level of σγ ≈ 0.02 (Meng et al. 2015), although this neglects the impact of the mass-sheet degeneracy (Gorenstein et al. 1988;Saha 2000;Wucknitz 2002;Liesenborgs & De Rĳcke 2012;Schneider & Sluse 2013).Whilst lens modelling at scale is currently challenging, adding a precise prior on γi for each lens should significantly improve the constraints on cosmological parameters.If we assume that we are able to pre-determine the value of γ for each lens system with a precision level of 0.02, we can treat γ as an observable, similar to the Einstein radius.The measurement of "w" is greatly improved in this scenario: w = −1.01 ± 0.06 for a 10 000 lenses with no evolution of the population, or = −1.08 ± 0.07 for the population where ∆γ = 0.21.Respectively, the figure of merit improves to 28 and 64 for 10 000 lenses assuming a flat w0waCDM cosmology (see the bottom two panels of the figure A1).
Table 2.The relative change of the 68 percent confidence interval of w as a function of how standardizable lenses are.We assume with 10 000 lens systems, but change the intrinsic scatter of the lens population γ and β.

LARGER SCATTER ON LENS POPULATION
In this study, our mock lenses are created, assuming the distribution of σγ and σ β match those inferred from SLACS data (Auger et al. 2010) and a set of nearby elliptical galaxies (Gerhard et al. 2001).
This assumption is critical to our forecasts since σγ and σ β quantify how standardizable each lens is.If σγ or σ β are significantly larger in the real Universe then the power of this method to constrain cosmography will be greatly diminished.It should be noted that the lens systems were selected from ground-based sky surveys, which may introduce biases compared to space-based surveys like Euclid.Additionally, the nearby elliptical galaxies might not be a perfect representation of lens elliptical galaxies.For instance, Xu et al. (2017) analyzed elliptical galaxies in the Illustris simulation and found that the standard deviation of velocity anisotropy can reach 0.3, while in our work, we employed a value of 0.13.We generate mock galaxies with a wider range of γ (0.1 -0.3) and β (0.1 -0.3) to test the effectiveness of our method.Table 2 presents the constraints on w (σw) with 1,000 lens systems in a flat universe under different distributions of γ and β.The constraints deteriorate as the scatter in either γ or β increases, but even in the most pessimistic scenario the constraints degrade by a factor of 2.17.On the other hand, our forecasts may be pessimistic if lenses can be further standardized by understanding the physical properties that drive the scatter of γi and βi away from the population mean.

APPLICATION ON EXISTING DATA
As a simple example of our method, we apply it to existing data to evaluate how well our power-law mass model can describe real lens galaxies.Additionally, we can compare our results with previous research that used a similar method and the wCDM model (Cao et al. 2015b;Chen et al. 2019).We use a dataset of 161 lens systems selected by Chen et al. (2019) (see Table A1), which were obtained from surveys including LSD, SL2S, SLACS, and S4TM.They measured the luminosity density slope and calculated the equivalent fiber radius of the spectrum for each galaxy.As we lack the specific δ value for each individual galaxy, we treat δ as an unknown value and draw from its measured distribution: δ = N (2.173, 0.085).The average velocity dispersion error for this dataset is 22 km/s.
The resulting posterior distribution is shown in figure 9.In our results, we find that w = -0.90± 0.45, which is an even better measurement compared to the theoretical result shown in Figure 6.Regarding galaxy population properties, we obtain γ = 1.89 ± +0.05, σγ = 0.18 ± 0.03, β = −0.17± 0.16, and σ β = 0.21 ± 0.09 at a 68% confidence level.The predicted γ and β populations are both smaller than those reported in other studies.This is likely due to the fact that we lack accurate δ values.Additionally, these 161 lens systems were obtained from four different surveys, and the population (selection function) of lens systems from different surveys can vary in which case fitting the ensemble with a single lens population would be incorrect.Understanding the selection function between different surveys is crucial for accurately measuring the lens population because the future strong lensing sample is likely to be the combination of LSST and Euclid discoveries.

DISCUSSION AND CONCLUSION
In this work, we have constructed a hierarchical model of galaxygalaxy lenses and the underlying cosmological parameters.We have employed Hamiltonian Monte Carlo through JAX-based NumPyro modelling to efficiently perform the analysis.Our findings indicate that we can simultaneously constrain cosmological parameters and lens galaxy properties under different cosmological models, including ΛCDM, wCDM, and owCDM.With a sample size of 10 000 lens systems, our method should achieve a 68% confidence interval of ±0.11 on w.These levels of constraint are comparable to other cosmic probes such as the Cosmic Microwave Background (CMB), standard candles, weak lensing, and galaxy clustering.Furthermore, we have shown that the evolution of the lens population can also be simultaneously constrained.We also tested the ability of galaxygalaxy lenses to constrain evolving dark energy.Our forecast Figure of Merit is 15, which is within a factor of 3 of each individual probe forecast for the LSST 10-year cosmological constraints (Mortonson et al. 2010).These results rely only on measurements of the Einstein radius, redshifts, velocity dispersion, and luminosity profile of each lens.Additional constraints on the lens density profile from detailed lens modelling, would improve the cosmological constraining power by a factor of ∼2, however, our method can still work without this huge investment in detailed modelling.Additionally, we applied our model to 161 real lenses in Section 7, finding w = −0.90± 0.45.As we discuss below, this result is likely systematics dominated but it illustrates that the method does work on real data.
One of the major simplifications made in this study is that the mass and light models used may not accurately represent real galaxies.The power law light profile we employ is a simplification of the more general Sersic profile (Sérsic 1963), which can describe the light profile of most elliptical galaxies.The effect of the PSF is also not considered in our model, which will bring extra complexity to the velocity dispersion measurement.Furthermore, galaxies are made of dark and luminous matter, and whilst the 'bulge-halo conspiracy' gives total density profiles that are close to powerlaw, the absolute truth is undoubtedly more complex than we have assumed.Our powerlaw assumptions, make the mathematics of our problem analytic (Equation 5), but should not hugely change the final constraints, since the model is only relevant in mapping the aperture dynamical mass onto the mass within the Einstein radius.
In this study, we made the assumption that all galaxy properties follow Gaussian distributions, this might not be true in reality.Additionally, due to a lack of observational evidence, we assumed that there are no correlations between γ, β, and δ.However, in actuality, these properties are likely to have initial scaling relations.For example, Auger et al. (2010) found that a steeper mass density profile implies a higher central surface mass density.Furthermore, Cappellari et al. (2007) discovered that elliptical galaxies can be categorized as slow and fast rotators, and they exhibit different β populations.There are also some outliers that have very low β found by (Gerhard et al. 2001).In our results, the orbital velocity anisotropy of lens galaxies exhibits a strong degeneracy with cosmological parameters.Therefore, understanding these correlations will aid in better fitting the mass models of individual galaxies.Another potential observational bias arises from our assumption that elliptical galaxies have a constant β, whereas observations suggest that elliptical galaxies have varying β with radius.Specifically, the central region of typical massive elliptical galaxies tends to be isotropic or mildly radially anisotropic (Gerhard et al. 2001;Cappellari et al. 2007).In spectroscopic surveys, the fiber has a fixed radius, which means that the region from which we measure the velocity dispersion will have a different size depending on the half-light radius.Consequently, any observed evolution in the properties may simply be a side effect resulting from the evolution of the lens galaxy's angular size.Similarly, we have ignored the potential for redshift evolution in the population anisotropy, which may be expected from simulations (Xu et al. 2017).
The simulated lens population in this study was limited to a redshift cutoff of 1.5.In real observations, the lens population is subject to additional limitations.For instance, observing the lens galaxy and the source galaxy simultaneously in a spectroscopic survey might not be feasible for lens systems with a large Einstein radius.As a result, some low-redshift lens galaxies may be ruled out from the analysis.Additionally, faint lens galaxies can lead to large velocity dispersion measurement errors, making them less likely to be included in the analysis.This can exclude low-mass galaxies and consequently affect the γ population estimation.Conversely, luminous galaxies beyond the redshift cutoff might be included in the analysis, which can introduce further complexities.These observational limitations and selection effects should be taken into account when interpreting the results and considering the true lens galaxy population.
The method used in this study can also be valuable for the analysis of time-delay cosmography, such as gravitational lens quasars and supernovae.Similar to our approach, galaxy density profiles can be simultaneously inferred with the Hubble constant, as demonstrated in previous work (e.g., Birrer et al. (2020)).But their mass models are heavily prior dominated, the bias in γ will introduce a significant bias in H0 in LSST era where the number of lensed quasar/SNe are large (Collett & Cunnington 2016).The lens galaxy population parameters we obtained can serve as priors when modeling gravitational lens quasars or supernovae.However, it is important to note that normal lens systems may have different population parameters compared to lensed quasar or supernova systems.Conducting a detailed investigation into the selection function between these systems will be necessary to mitigate any biases.
Whilst this work is a simplified investigation of how to simultaneously constrain the astrophysics of lenses and the underlying cosmological parameters, it has shown that the potential constraining power is competitive to more established cosmological probes.These encouraging results motivate further work on modelling the selection function, fitting more general density and dynamical profiles, and gathering the required data.

Figure 1 .
Figure 1.The Einstein radius as a function of the source redshift, for different values of the equation of state of dark energy.All other parameters are fixed.We assume the redshift of lens galaxy is 0.2, and has an Einstein mass of 10 11 M .

Figure 2 .
Figure 2. The value of the function that links Einstein radius and velocity disperion as a function of the orbital anisotropy and lens density profile slope.It is given in equation 6 as a function of γ and β.We have fixed δ = 2.173.The white regions are unphysical as the luminosity-weighted mass diverges here.

Figure 3 .
Figure 3. Schematic of our hierarchical model.Solid blue boxes are parameters of our model, and solid black boxes are observables or predictions of our model.The black dashed box includes the hyperparameters of the model, the parent lens population parameters and the cosmological parameters.The green dashed box contains the properties of individual lens systems, including the nuisance parameters γ i and β i of individual lens galaxies, which are drawn from the population parent distribution.Modeled velocity dispersions and Einstein radii can be computed through combining the parameters of the individual lenses and cosmology.Comparing these modelled values to the observed values, a likelihood function can be built up, which we sample with a Hamiltonian Monte Carlo analysis.

Figure 4 .Figure 5 .
Figure 4. Posterior distributions and 1D marginalized posterior distributions of the cosmological parameters and the lens population parameters for 10 000 lens systems.Grey assumes an owCDM universe, red is for wCDM, and blue is for ΛCDM universe.γ and σγ are the population density profile slope mean and standard deviation respectively.β and σ β are the mean and standard deviation values of the population's velocity anisotropy.The contour shows the 68% and 95% confidence levels, while the grey dashed line represents the fiducial value which was used to generate our mock data.

Figure 6 .
Figure 6.Top: 1 σ uncertainty on the equation of state of dark energy as a function of numbers of lenses and observational velocity dispersion measurement uncertainty.Bottom: At a fixed LOSVD error of 10 km/s, the standard deviation of w as function of number of lenses.The black dots represents the model result compared against the solid curve of 10 times the square root of the number of lenses.

Figure 7 .Figure 8 .
Figure7.Constraints on a flat cosmological model with evolving w where w = w 0 + wa(1-a).The grey dashed line is the fiducial value of our mocks.The blue contour is the posteriors distribution of wa, w 0 with 10 000 strong lensing systems.The green and red lines show current results from Type Ia supernova and CMB+BAO, respectively(Brout et al. 2022b; Planck Collaboration et al.  2020).

Figure 9 .
Figure 9. Posterior distribution and 1D marginalized distribution of the cosmological and lens population parameters assuming a wCDM universe and 161 real lenses from Cao et al. (2015a) and Chen et al. (2019).

Figure A1 .Figure A2 .
Figure A1.Constraints on w 0 and wa for a redshift evolving population mean density profile slope.The top panel results for a model which fits for linear evolution of the mean population γ (FoM=17).The bottom panel results when individual γ i are measured through detailed lens modelling of every lens (FoM=28).

Table 1 .
The parameters of the mock strong lensing systems used in this work.