How to Break the Mass Sheet Degeneracy with the Lightcurves of Microlensed Type Ia Supernovae

The standardizable nature of gravitationally lensed Type Ia supernovae (glSNe Ia) makes them an attractive target for time delay cosmography, since a source with known luminosity breaks the mass sheet degeneracy. It is known that microlensing by stars in the lensing galaxy can add significant stochastic uncertainty to the unlensed luminosity which is often much larger than the intrinsic scatter of the Ia population. In this work, we show how the temporal microlensing variations as the supernova disc expands can be used to improve the standardization of glSNe Ia. We find that SNe are standardizable if they do not cross caustics as they expand. We estimate that this will be the case for $\approx$6 doubly imaged systems and $\approx$0.3 quadruply imaged systems per year in LSST. At the end of the ten year LSST survey, these systems should enable us to test for systematics in $H_0$ due to the mass sheet degeneracy at the $1.00^{+0.07}_{-0.06}$\% level, or $1.8\pm0.2$\% if we can only extract time delays from the third of systems with counter images brighter than $i=24$ mag.


INTRODUCTION
Gravitational lensing is a powerful probe for understanding astrophysics and cosmology. Lensing is of particular use for constraining the expansion history of the Universe, since it is a geometric probe of the angular diameter distances between observer, lens and source (Narayan 1991). When a time variable source is multiply imaged by a gravitational lens, the time delays between images are inversely proportional to the Hubble constant $H_0$ (Treu & Marshall 2016). Some measurements of lensed quasars (Wong et al. 2020) have found $H_0 = 73.3^{+1.7}_{-1.8}$ km s$^{-1}$ Mpc$^{-1}$, in agreement with local measurements from Type Ia supernovae (Riess et al. 2022) but at odds with CMB measurements from Planck (Planck Collaboration et al. 2020). However, these measurements depend on the assumed mass distribution of the lens. Relaxing some assumptions and including non-lensing constraints has also led to results which are consistent with those of Planck (Birrer et al. 2020). While gravitational lensing has the potential to shed some light on the existing tension between $H_0$ measurements, a thorough understanding of the systematics and degeneracies of lens modelling is necessary.
Historically, time delays have been measured from lensed quasars (Millon et al. 2020), though supernovae were originally proposed (Refsdal 1964). Lensed supernovae (glSNe) offer several benefits over quasars: they evolve over much shorter timescales, requiring months of monitoring as opposed to years, and they fade away, enabling follow-up observations of the lensed host. Type Ia glSNe are even better: they have well understood lightcurves, and are 'standardizable' candles, with a scatter of ≈ 0.15 mags (Richardson et al. 2014; Brout et al. 2022). The fact that Type Ia supernovae have a known intrinsic luminosity means that their magnification can be directly measured, helping to break lens modelling degeneracies.
★ E-mail: weisluke@alum.mit.edu
In order to infer the value of $H_0$ from a lensed time variable source, one needs to know the mass distribution of the lensing galaxy. The problem is that lensing observables alone cannot uniquely determine that mass distribution. A lensing model that predicts the positions, relative magnifications, and time delays between observed images can be rescaled with a sheet of mass in such a way that none of the observables change except for the time delays (Falco et al. 1985; Schneider & Sluse 2014). This mass sheet degeneracy (MSD) must be broken to extract the Hubble constant from time delay measurements, and it is one of the main systematic uncertainties in modern time delay cosmography (Birrer et al. 2020). The MSD is broken if the absolute magnification of the background source is known, which is possible with a lensed standard candle such as a glSN Ia.
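As a minimal numerical sketch of the degeneracy described above (an illustration only, not the lens modelling used in this work), the mass sheet transformation with parameter $\lambda$ rescales the convergence as $\kappa \to \lambda\kappa + (1-\lambda)$. Image positions and flux ratios are unchanged, while absolute magnifications scale as $\lambda^{-2}$ and both the predicted time delays and the inferred $H_0$ scale as $\lambda$. The function name and the numbers below are hypothetical.

```python
def mass_sheet_transform(kappa, mu, delta_t, h0, lam):
    """Apply a mass sheet transformation with parameter lam (hypothetical helper)."""
    return {
        "kappa": lam * kappa + (1.0 - lam),  # rescaled convergence plus uniform sheet
        "mu": mu / lam**2,                   # absolute magnification changes
        "delta_t": lam * delta_t,            # predicted time delay changes
        "h0": lam * h0,                      # H0 inferred from a fixed observed delay
    }

# Illustrative numbers only, not taken from any real system.
model = dict(kappa=0.4, mu=5.0, delta_t=30.0, h0=70.0)
shifted = mass_sheet_transform(lam=0.9, **model)

# A standard candle measures the absolute magnification, which pins down lam.
lam_recovered = (model["mu"] / shifted["mu"]) ** 0.5
```

Because only the absolute magnification and the time delays respond to $\lambda$, a source of known luminosity provides the extra observable needed to recover it.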
Unfortunately, there is one main barrier to using glSNe Ia to break the MSD: microlensing by stars. The image we observe is split into unresolvable microimages that are (de)magnified by the stars in the lensing galaxy (Young 1981; Paczynski 1986; Vernardos et al. 2024). We can only observe the total flux of these unresolved microimages, which can be (de)magnified relative to the macro-model prediction. We need to know the macrolensing magnification to break the MSD, but the presence of microlensing adds significant stochastic noise (Dobler & Keeton 2006). In the worst case scenario this can introduce up to a magnitude of scatter (Weisenbach et al. 2021), which would make microlensed glSNe Ia no longer standardizable candles (Yahalomi et al. 2017). Foxley-Marrable et al. (2018) were the first to examine whether some lensed images might suffer sufficiently small amounts of microlensing that they remain standardizable. By analyzing the microlensing of a uniform disk, they found that regions of low magnification and low stellar density have microlensing scatter comparable to the intrinsic scatter in the luminosity of an unlensed SN Ia. However, they did not use temporal information in their inference. As the supernova expands, it averages over different length scales of microlensing fluctuations. The glSN never becomes sufficiently large to completely average out to the macro-model magnification, but the shape of the lightcurve can teach us about the nature of the microimages and hence inform us about the amplitude of the absolute microlensing magnification.
In what follows, we discuss in Section 2 the theoretical reasoning behind why certain regions of microlensing parameter space are or are not standardizable. We discuss simulations for a simple model in Section 3. In Section 4 we discuss various methods and implementations for either selecting microlensing lightcurves that are standardizable or inferring a posterior on the microlensing magnification given an observed lightcurve. We extend from our single point to the entirety of useful microlensing parameter space in Section 5. In Section 6 we estimate the number of standardizable glSNe Ia expected to be discovered in the next decade from a forecasted LSST population, and we discuss some of the limitations of our work in Section 7. We provide our conclusions in Section 8.

STANDARDIZABLE REGIONS OF MICROLENSING PARAMETER SPACE
In this section, we investigate why the time evolution of a microlensed lightcurve should be sensitive to the absolute microlensing magnification, and hence inform us about which glSNe images are standardizable. This section focuses on the theoretical underpinnings of why it should be possible to select a subset of standardizable microlensed images, while Section 4 will show how to do so with realistic observables.

The importance of the number of microimages
Microlensing fluctuations are highly dependent on where the image forms relative to the lensing galaxy. There are three main parameters of importance: the lens macro-model convergence and shear at the macroimage location, $\kappa$ and $\gamma$, and the ratio of stellar to total matter density at the location of the image, $1 - s$ (where $s$ is the smooth matter fraction, which is predominantly the dark matter fraction but also includes any other smooth baryonic components). Foxley-Marrable et al. (2018) and Weisenbach et al. (2021) have already shown that microlensed systems with low macro-magnification and low stellar density are standardizable; see for example Figure 5 of Suyu et al. (2024). The physical reason that systems with low magnification and stellar density are standardizable is that the caustic network is sparse: the majority of the source plane is dominated by regions where the source does not lie within any microcaustics. Fundamentally, this is related to the idea that the mean number of microminima of the time delay surface (Wambsganss et al. 1992; Granot et al. 2003) for such systems is small. This is opposed to regions of parameter space with higher magnification or higher stellar density, where the caustic network is more dense (see, e.g., Figure 12 of Vernardos et al. 2024), raising the expected number of microimages. Regions of the source plane can be indexed by how many caustics a source lies within, which we will denote with $n = 1, 2, 3, \ldots$.
Figure 1 shows three microlensing systems: one that would be standardizable with the Foxley-Marrable et al. (2018) method, an intermediate scenario, and one that is completely unstandardizable. Each of the microlensing histograms is bimodal, and an examination of the magnification maps and caustic structures reveals why: the histogram can be decomposed into subhistograms for each value of $n$ (Rauch et al. 1992; Granot et al. 2003; Saha & Williams 2011). These subhistograms are offset from each other, since increasing $n$ by 1 means the creation of another pair of microimages, which introduces a jump in the minimum allowable magnification. The relative importance of the subhistograms is set by the density of the caustic network.
(i) For the standardizable system, the majority of the source plane consists of regions with $n = 1$, with a small fraction consisting of $n = 2$. The scatter is dominated by the $n = 1$ region, so the width of the microlensing histogram is small.
(ii) For the intermediate system, the relative fraction of $n = 1$ is lower and that of $n = 2$ is higher, making the bimodality more prominent and increasing the scatter.
(iii) The completely unstandardizable system has approximately equal fractions for $n = 1$ and $n = 2$. Since the subhistograms are offset, the total microlensing histogram is very broad.
The statement that a microlensed system is standardizable is really a statement that the likelihood of lying inside a microcaustic is low. If we could determine the value of $n$, many glSNe would be standardizable. Unfortunately, resolving the microimages is currently impossible. There is, however, a key piece of data that is observable: the lightcurve of the expanding supernova. The temporal variations as the supernova expands contain information about where it lies within the caustic network.

Constraining the number of microimages
Since supernovae expand over time, a source that lies deep in the microcaustic network must eventually grow to the point that it crosses a caustic. The exact timescales involved and the rate of change of the lightcurve depend on the expansion velocity, the sizes of the caustics, and the mean spacing of the caustics; the latter two depend on the mass of the microlenses, $\kappa$, $\gamma$, and $s$. Looking at Figure 1, it is fairly easy to convince oneself that sources starting in regions with $n \gtrsim 3$ will typically expand to cross a caustic more quickly than sources located elsewhere. Since caustics represent lines of extremely high magnification, caustic crossings induce substantial changes in the microlensed supernova lightcurve. There are three basic lightcurve scenarios possible as the source expands, which we illustrate in Figure 2:
(i) The source lies near and inside (on the more magnified side of) one of the microcaustics. The micro magnification rises as the expansion approaches the caustic and then falls when parts of the disc fall outside of the microcaustic.
(ii) The source lies near and outside (on the less magnified side of) one of the microcaustics. The micro magnification sharply rises as parts of the disk cross the caustic.
(iii) The source lies sufficiently far from the microcaustics for the timescale of interest. The micro magnification is broadly insensitive to the size of the SN disk.
There can certainly be more complicated behaviour, for example, if the source lies closer to a cusp or overlapping caustics, but these are much rarer.
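The three scenarios above can be reproduced qualitatively with a toy model. The sketch below is not the ray-shooting simulation used in this work: it assumes an isolated, straight fold caustic with the generic point-source scaling $\mu = \mu_0 + K/\sqrt{x}$ at distance $x$ inside the caustic and $\mu = \mu_0$ outside, and averages this over a uniform disc. The function name, `mu0`, `strength`, and the chosen starting positions are hypothetical.

```python
import math

def disc_magnification(x0, radius, mu0=1.0, strength=1.0, n=2000):
    """Mean magnification of a uniform disc near a straight fold caustic at x = 0.

    Points with x > 0 lie inside the caustic, where the assumed point-source
    magnification is mu0 + strength / sqrt(x); outside it is simply mu0.
    """
    total, weight = 0.0, 0.0
    for k in range(n):
        dx = -radius + (k + 0.5) * 2.0 * radius / n  # midpoint rule across the disc
        chord = 2.0 * math.sqrt(radius**2 - dx**2)   # width of the disc at this offset
        x = x0 + dx
        mu = mu0 + strength / math.sqrt(x) if x > 0 else mu0
        total += mu * chord
        weight += chord
    return total / weight

radii = [0.05 * i for i in range(1, 41)]  # radius grows linearly with time
inside = [disc_magnification(+0.5, r) for r in radii]   # scenario (i)
outside = [disc_magnification(-0.5, r) for r in radii]  # scenario (ii)
far = [disc_magnification(-10.0, r) for r in radii]     # scenario (iii)
```

In this toy setup the `inside` curve rises and then falls, the `outside` curve stays flat until the disc reaches the caustic and then rises, and the `far` curve remains flat throughout.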
The shape of the lightcurve provides information about where the source lies within the caustic network. Clearly, microlensing lightcurves that are flat originate far from caustics and so are not likely to be highly magnified. One would expect that this subpopulation will show much less micromagnification scatter, and hence be more standardizable.
We can quantify this by tracking whether or not a source placed within the microlensing map for the intermediate system ($\kappa = \gamma = 0.4$, $s = 0.75$) crosses a caustic in a given time period. A visualization of how this affects the microlensing magnification distribution is shown in Figure 3, and results are given in Table 1. The high magnification tail of the distribution comes from regions of the source plane that are either i) near the inside of fold caustics, ii) near the outside of cusps, or iii) deep inside the caustics. These regions are quickly ruled out by the absence of caustic crossings. The magnification distribution for non-crossing events narrows and becomes much more standardizable.
A problem with this approach is that configurations with dense caustic networks (high macro-magnification images, or images forming in regions of high stellar density) will have very few source positions that do not cross a caustic as the SN expands. This approach also does not work for saddlepoint images, as they have a low magnification tail which cannot be ruled out by the absence of a caustic crossing (see Appendix A). However, we expect intermediate systems to be standardizable under this approach and for it to be a substantial improvement upon previous analyses.

SIMULATIONS AND DATA
In this section, we discuss the assumptions used for our simulations, the simulated data that we will use throughout the paper, and sources of noise that are relevant for the lightcurves.

Assumptions
We will use a simple model for a microlensed supernova: we take the supernova to be a uniformly luminous disk whose radius expands at a constant rate. While more complicated supernova models such as those used by Goldstein et al. (2018) and Huber et al. (2019) may technically be more accurate, Mortonson et al. (2005) and Vernardos & Tsagkatakis (2019) have shown that microlensing fluctuations are most sensitive to the half-light radius of the source as opposed to its luminosity profile. We furthermore use the following assumptions throughout this work:
(i) Our lens is at a redshift $z_{\rm l} = 0.5$ and our source is at a redshift $z_{\rm s} = 0.8$, in a flat ΛCDM universe using the parameters of Planck Collaboration et al. (2020). Results depend only weakly on the redshifts.
(ii) The microlenses are all of mass 0.3 $M_\odot$, which determines the Einstein radius $\theta_E$. Conversions to other masses can be made using the fact that $\theta_E \propto \sqrt{M}$.
(iii) The supernova has a constant expansion rate of $10^4$ km/s (Pan 2020). The main purpose of this is to convert from size to approximate times where necessary.
(iv) The supernova peaks at 20 days in its rest frame. Days listed are days in the rest frame of the supernova unless otherwise specified.
(v) We will assume that we can separate the effects of microlensing from the intrinsic variations of the SN lightcurve, up to a constant uncertainty of 0.05 mag. This is comparable to the uncertainties in the model fitting of Type Ia SNe (e.g. SALT3, Kenworthy et al. 2021; see Figure B1), which will set a fundamental floor on our ability to separate out the time evolution of the microlensing.$^3$ Figure 2 shows some example noisy microlensing lightcurves.
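The conversion between size and time implied by assumptions (i) to (iii) can be checked with a short calculation. The sketch below is an illustration rather than the code used in this work: it assumes Planck-like values $H_0 = 67.4$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_m = 0.315$, uses a simple midpoint integration for the distances, and all names are hypothetical.

```python
import math

G, C = 6.674e-11, 2.998e8        # SI units
M_SUN, MPC = 1.989e30, 3.086e22  # kg, m
H0, OMEGA_M = 67.4, 0.315        # assumed flat LCDM parameters

def comoving_distance(z1, z2, steps=1000):
    """Line-of-sight comoving distance in Mpc between z1 and z2 (midpoint rule)."""
    dz = (z2 - z1) / steps
    integral = sum(
        dz / math.sqrt(OMEGA_M * (1 + z1 + (i + 0.5) * dz) ** 3 + 1 - OMEGA_M)
        for i in range(steps)
    )
    return (C / 1e3 / H0) * integral

z_l, z_s, mass = 0.5, 0.8, 0.3 * M_SUN
# Angular diameter distances in a flat universe
d_l = comoving_distance(0, z_l) / (1 + z_l)
d_s = comoving_distance(0, z_s) / (1 + z_s)
d_ls = comoving_distance(z_l, z_s) / (1 + z_s)

# Einstein radius projected onto the source plane, in metres
r_e = math.sqrt(4 * G * mass / C**2 * d_s * d_ls / d_l * MPC)
v_exp = 1.0e7  # 10^4 km/s expressed in m/s
days_per_r_e = r_e / v_exp / 86400.0
```

Under these assumptions the disc radius takes of order 200 rest-frame days to grow by one Einstein radius, so a length of one thousandth of an Einstein radius corresponds to a fraction of a day of expansion.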

Simulations
We generate 100,000 microlensed lightcurves for the intermediate system parameters ($\kappa = \gamma = 0.4$, $s = 0.75$) to serve as our dataset. We use the inverse ray shooting method (Kayser et al. 1986; Wambsganss 1990) to create magnification maps that are ≈ $10\,\theta_E \times 10\,\theta_E$ in size and ≈ 10,000 × 10,000 pixels, with a pixel scale of $0.001\,\theta_E$, or roughly a quarter of a day of supernova expansion. We use enough stars to capture the bulk of the magnification (Katz et al. 1986), and distribute them in a rectangular region to reduce computational complexity while still accounting for the average microlensing deflection (Zheng et al. 2022). We can fit 10 × 10 = 100 expanding uniform disks onto a single map with no overlap after ≈ 100 days, requiring 1000 maps. We generate the maps and perform the convolutions on GPUs. Since we are only interested in lightcurves from sources that do not overlap on the source plane (and are therefore uncorrelated), we do not need to convolve the entirety of the maps, greatly reducing the computation time needed. In addition to creating the lightcurves for each expanding source, we use a GPU version of Hans Witt's method (Witt 1990) to find the caustics of each star field in order to track whether the expanding disk crosses any caustics throughout the entirety of the lightcurve.
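The step from a magnification map to a lightcurve can be sketched as follows. This is a deliberately small, CPU-only illustration on a toy map and not the GPU convolution pipeline described above: it averages the map over discs of growing pixel radius around a fixed source position. The function name and the toy map (a uniform background with one bright stripe standing in for a caustic) are hypothetical.

```python
def disc_lightcurve(mag_map, centre, radii):
    """Mean magnification inside uniform discs of growing radius (in pixels)."""
    cy, cx = centre
    curve = []
    for r in radii:
        values = [
            mag_map[y][x]
            for y in range(max(0, cy - r), min(len(mag_map), cy + r + 1))
            for x in range(max(0, cx - r), min(len(mag_map[0]), cx + r + 1))
            if (y - cy) ** 2 + (x - cx) ** 2 <= r * r  # keep pixels inside the disc
        ]
        curve.append(sum(values) / len(values))
    return curve

# Toy map: magnification 2 everywhere, except a bright vertical stripe at x = 30.
toy_map = [[10.0 if x == 30 else 2.0 for x in range(64)] for y in range(64)]
curve = disc_lightcurve(toy_map, centre=(32, 20), radii=range(1, 16))
```

The resulting curve is flat while the disc is smaller than the distance to the stripe and rises once the disc overlaps it, mirroring the caustic-crossing behaviour tracked in the simulations.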

SELECTING STANDARDIZABLE LIGHTCURVES
In this section, we discuss how, in practice, to infer the posterior of the microlensing magnification given an observed lightcurve. We start by presenting two simple criteria for picking out lightcurves which do not cross caustics and discuss some of the difficulties that might arise due to noise when using these criteria. We then discuss how a bank of lightcurves can be used to estimate the amount of microlensing (de)magnification. Next, we examine how neural network regression can be used to predict the microlensing (de)magnification. Finally, we use a neural network to classify the lightcurves into two categories: did or did not cross a caustic.
Throughout this section, we assume glSNe lightcurves are observed from 5 days before peak up to 50 days after peak in the SN rest frame, with a 2 day cadence in the rest frame.

Simple criteria
We consider perhaps the simplest metric: measuring the standard deviation of our simulated lightcurves, $\sigma_{\rm lightcurve}$. We can then determine the fraction of simulated lightcurves that have $\sigma_{\rm lightcurve}$ less than some cutoff value, and what the microlensing scatter at peak is for those lightcurves. Results are shown in Figure 4. By selecting lightcurves which have small standard deviations, we select a subset with low scatter. A given amount of noise in the data sets a lower limit for the cutoff, however, limiting the utility of this metric. In practice this noise is likely to come from imperfect knowledge of the unlensed SN lightcurve, rather than observational noise. If the standard deviation can only be recovered with 0.05 mag precision the improvement is marginal: 70% of microlensing lightcurves will be consistent with flat and only the most extreme caustic crossings can be excluded. If a precision of 0.01 mag is achievable (which is likely the case if SALT3 mismatches correlate with time) then half of the lightcurves will be consistent with flat, and the standardizability for this half of the dataset improves to approximately 0.15 magnitudes.
Figure 4. … we can select a subset (fraction) of lightcurves that have $\sigma_{\rm lightcurve}$ less than some desired cutoff. Lowering the cutoff reduces the fraction of lightcurves selected and their scatter. The red "+" marks where the cutoff for $\sigma_{\rm lightcurve}$ = 0.05, while the red "x" marks where the cutoff for $\sigma_{\rm lightcurve}$ = 0.01.
$^3$ Unless it is possible to use the multiple microlensed images of the glSNe to improve upon classical Ia template fitting.
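The selection described in this subsection amounts to a one-line cut. The following sketch applies it to toy data, since the simulated lightcurves themselves are not reproduced here. The function name, the toy population (half flat curves, half steadily drifting curves), and its parameters are all hypothetical.

```python
import random
import statistics

def select_flat(lightcurves, peak_mags, cutoff):
    """Return (fraction kept, scatter of peak magnification) for curves with std < cutoff."""
    kept = [m for lc, m in zip(lightcurves, peak_mags) if statistics.pstdev(lc) < cutoff]
    fraction = len(kept) / len(lightcurves)
    scatter = statistics.pstdev(kept) if len(kept) > 1 else float("nan")
    return fraction, scatter

# Toy population: flat curves with a narrow magnification distribution,
# and drifting curves with a broad one. Values are illustrative only.
rng = random.Random(0)
lightcurves, peak_mags = [], []
for i in range(200):
    flat = i % 2 == 0
    peak = rng.gauss(0.5, 0.1) if flat else rng.gauss(0.0, 0.6)
    slope = 0.0 if flat else 0.02  # mag per epoch
    lightcurves.append([peak + slope * t + rng.gauss(0.0, 0.005) for t in range(28)])
    peak_mags.append(peak)

fraction, scatter = select_flat(lightcurves, peak_mags, cutoff=0.05)
```

Lowering `cutoff` trades sample size for scatter, which is the trade-off summarized in Figure 4.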

A bank of lightcurves
Since we are able to simulate microlensing curves, the statistically rigorous way to infer microlensing magnifications is to compare observations with similar simulated data (Kochanek 2004).
We assume that an observed lightcurve $D$ can be parameterized by a single value, the amount of microlensing (de)magnification at peak, $\mu$. We then have that

$$P(\mu|D) \propto P(D|\mu)\,P(\mu) \qquad (1)$$

There is additionally a nuisance parameter, the position $\mathbf{y}$ of the source within the microcaustics, that must be marginalized over. The marginalization is approximated as a sum over a finite number of source positions, i.e. a finite number of simulated lightcurves. The likelihood $P(D|\mu(\mathbf{y}))$ is given by a $\chi^2$ statistic of the difference between the data and the simulations. Thus, $P(\mu|D)$ is determined by summing up the finite collection (bank) of lightcurves, where each lightcurve in the bank is weighted by how well its shape matches the observed lightcurve. We separate our 100,000 lightcurves into two sets: 90,000 lightcurves with no noise to serve as perfect members of the bank, and 10,000 lightcurves with noise to serve as our mock observed data. The lightcurves are all shifted in magnitudes such that their means are each 0. This way, there is only relative knowledge of how their shapes evolve over time.
Figure 5. Inferred microlensing scatter at peak luminosity for our 10,000 mock observed lightcurves, versus the number of bank lightcurves that make up the inner 95% of their posteriors. The majority of mock observed lightcurves have scatter less than 0.2 mags; 33% fall in the yellow bin, with a scatter of ≈0.12 mags. However, some of the lightcurves are matched only by a handful of bank lightcurves and so their posteriors are likely spurious.
We use Equation 2 to calculate the microlensing magnification posterior for each of the 10,000 mock observed lightcurves. We then calculate the scatter of the posterior for each lightcurve. The majority of mock observed lightcurves have microlensing scatters less than 0.2 mags. This is roughly a factor of 2 improvement over the point source microlensing histogram, indicating that information about the lightcurve shape can narrow down the predicted microlensing (de)magnification. Figure 5 shows these scatters as a function of how many bank lightcurves are similar.
The majority of the bank and the dataset consists of lightcurves with no distinguishing features (flat), and these are easily standardized, as seen in Figure 5. In contrast, there are few lightcurves which have greater time variability, i.e. caustic crossings. The lack of similar curves in the bank makes the inferred posteriors unreliable: the approximation in Equation 2 only works if the bank is sufficiently well sampled. One could simulate more lightcurves to give a reliable posterior for everything (see, e.g., Kochanek 2004, where of order $10^8$ lightcurves are used), but since the bulk of the lightcurves are flat, and these are standardizable, we leave this for future work.
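The bank-weighting procedure of this subsection can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work: it assumes a Gaussian likelihood, so that each bank lightcurve receives a weight $\exp(-\chi^2/2)$, with a constant per-epoch uncertainty `sigma` and a bank drawn uniformly in source position. All function names and the three-member toy bank are hypothetical.

```python
import math

def bank_weights(observed, bank, sigma):
    """Normalized weight of each bank lightcurve given the observed shape."""
    def centred(lc):
        mean = sum(lc) / len(lc)
        return [v - mean for v in lc]  # only the shape is compared, not the level

    obs = centred(observed)
    chi2 = [
        sum((o - b) ** 2 for o, b in zip(obs, centred(lc))) / sigma**2
        for lc in bank
    ]
    best = min(chi2)  # subtract the minimum for numerical stability
    raw = [math.exp(-0.5 * (c - best)) for c in chi2]
    norm = sum(raw)
    return [w / norm for w in raw]

def posterior_mean_and_scatter(weights, bank_peak_mags):
    """Weighted mean and standard deviation of the peak (de)magnification."""
    mean = sum(w * m for w, m in zip(weights, bank_peak_mags))
    var = sum(w * (m - mean) ** 2 for w, m in zip(weights, bank_peak_mags))
    return mean, math.sqrt(var)

# Toy bank: two flat curves and one steadily drifting curve (illustrative values).
bank = [[0.5] * 10, [0.6] * 10, [-1.0 + 0.1 * t for t in range(10)]]
bank_peak_mags = [0.5, 0.6, -1.0]
weights = bank_weights([0.3] * 10, bank, sigma=0.05)
mean, scatter = posterior_mean_and_scatter(weights, bank_peak_mags)
```

A flat observed lightcurve is matched only by the flat bank members, so the posterior collapses onto their magnifications. With few matching bank members the weighted scatter becomes unreliable, which is the sampling issue noted above.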

Machine learning - regression
We train a neural network to predict the (de)magnification due to microlensing. The network is a fully connected network with 2 hidden layers: simple, but sufficient for our purposes. The size of the input layer is determined by the observation length and cadence of the lightcurve, while the two hidden layers each have half as many neurons as the input layer to avoid overfitting. We take our set of 100,000 noisy lightcurves and set aside 80,000 as training data, 10,000 as validation data, and 10,000 as test data. The label for each lightcurve in the training and validation sets is the amount of (de)magnification at peak supernova luminosity. Training is stopped when the loss (mean squared error) on the validation data stops decreasing. The network is then applied to the 10,000 test lightcurves.
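The architecture described above can be made concrete with a small sketch. It is an illustration under stated assumptions and not the trained network of this work: the number of epochs follows from observing between 5 days before and 50 days after peak at a 2 day cadence, the ReLU activation and the random initialization are assumptions (the activation is not specified here), and all names are hypothetical. No training is performed.

```python
import random

def layer_sizes(t_start=-5, t_end=50, cadence=2):
    """Input size from the observing window; hidden layers are half the input size."""
    n_in = len(range(t_start, t_end + 1, cadence))
    return [n_in, n_in // 2, n_in // 2, 1]

def init_network(sizes, seed=0):
    """Random weights and zero biases for each fully connected layer."""
    rng = random.Random(seed)
    return [
        ([[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(network, x):
    """Forward pass: ReLU on the hidden layers (an assumption), linear output."""
    for i, (weights, biases) in enumerate(network):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(network) - 1:
            x = [max(0.0, v) for v in x]
    return x[0]

sizes = layer_sizes()
network = init_network(sizes)
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in network)
prediction = forward(network, [0.0] * sizes[0])
```

With these choices the network has only a few hundred free parameters, far fewer than the number of training lightcurves.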
Figure 6 shows the predicted microlensing magnifications compared to the true values. The peak in the distribution at 0.5 mag is due to all of the flat lines. The neural network has learned i) the average magnification of the flat lines (which make up the bulk of the data), and ii) that it can minimize the error by assigning the mean value of the flat $n = 1$ lines to all of them. There are, however, flat lines that come from the $n = 2$ region, which therefore have an incorrect prediction. This is the reason for a slight tail in the distribution to the right of the peak, which cannot be removed with the simple point estimator of this network. More complicated networks that return a full posterior should be able to resolve this problem (e.g. Bayesian Neural Networks, Jospin et al. 2022).
The neural network made some progress on the remaining data, which show a roughly even amount of scatter around the predicted = truth line with widths of 0.2-0.25 magnitudes. This is slightly greater than the assumed intrinsic supernova scatter of 0.15 magnitudes. Furthermore, the number of lightcurves with bright predicted magnifications is small, making it somewhat difficult to be as confident in their scatter. While the small number of lightcurves with bright magnifications is due to the particular point we picked in microlensing parameter space, and could potentially be remedied with more simulations, the improvements to the flat lightcurves are in line with Table 1 and our expectations.

Machine learning - classification
We train a different neural network to classify the lightcurves into two categories: did it cross a caustic or not? We use the same simple network architecture as before, but instead of training the network to learn the underlying magnification at peak, each lightcurve in the training set is tagged 1 if it did not cross a caustic and 0 if it did. The output of the network is then no longer the magnification at peak, but the probability of a lightcurve belonging to either of the two categories. Figure 7 shows the ROC curve for the network when applied to the test data, along with the fraction of lightcurves selected as a function of the classification threshold and their associated scatter. At a classification threshold of 75% (roughly the threshold with the highest true positive rate and lowest false positive rate), slightly less than 60% of the lightcurves are inferred not to have crossed a caustic, and the scatter from those lightcurves is roughly 0.2 mags, consistent with the expectations from Table 1.
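How a classification threshold translates into a selected fraction and a point on the ROC curve can be sketched as follows. The example is illustrative only: the probabilities and labels are made-up toy values, not network outputs from this work, and the function name is hypothetical. Following the tagging above, the positive class is "did not cross a caustic".

```python
def threshold_summary(probabilities, crossed, threshold):
    """Selected fraction, true positive rate and false positive rate at a threshold.

    probabilities: predicted probability that a lightcurve did NOT cross a caustic.
    crossed: True if the lightcurve actually crossed a caustic.
    """
    selected = [p >= threshold for p in probabilities]
    true_pos = sum(s and not c for s, c in zip(selected, crossed))
    false_pos = sum(s and c for s, c in zip(selected, crossed))
    n_pos = sum(not c for c in crossed)
    n_neg = sum(crossed)
    return {
        "fraction_selected": sum(selected) / len(selected),
        "tpr": true_pos / n_pos,
        "fpr": false_pos / n_neg,
    }

# Toy inputs (hypothetical values).
probabilities = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.1]
crossed = [False, False, False, True, False, True, True, True]
summary = threshold_summary(probabilities, crossed, threshold=0.75)
```

Sweeping `threshold` from 0 to 1 and recording the true and false positive rates traces out a ROC curve of the kind shown in Figure 7.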

Remarks
We have shown that there are a variety of viable methods for reducing the scatter for the intermediate scenario microlensing parameters that we chose ($\kappa = \gamma = 0.4$, $s = 0.75$). Ultimately, each method gives approximately the same results, with the same underlying physical reason: flat microlensing lightcurves, which come from supernovae that do not cross a caustic as they expand over a long enough timescale, are more standardizable. The lightcurve bank and the regression neural network yield some further improvement for lightcurves that do cross a caustic, but since these events start off with much higher variance, the improvement will not have much impact on the inference of the Hubble constant from a population of glSNe. The simple method of using the standard deviation of the lightcurve is sufficient to quickly select the standardizable microlensing lightcurves.

COVERING MICROLENSING PARAMETER SPACE
Whilst the previous section focused on a particular set of parameters ($\kappa = \gamma = 0.4$, $s = 0.75$), we expect that other regions of microlensing parameter space with lower magnifications or lower stellar fractions can be similarly improved. We turn now to covering the rest of the useful parameter space. We opt to use the final neural network-based approach, where lightcurves are sorted into two categories.
Any triplet of model values ($\kappa$, $\gamma$, $s$) can be transformed into a doublet of ($\kappa = \gamma$, $s$) (Paczynski 1986; Kochanek 2004; Vernardos et al. 2014; Schechter et al. 2014), which is applicable for a singular isothermal elliptical potential. We therefore cover the space from $\kappa = \gamma = 0.25$ to $\kappa = \gamma = 0.5$ and from $s = 0$ to $s = 1$. For each point sampled in this space, we create 5,000 microlensing lightcurves using the same procedures discussed in Section 3.
We take 4000 lightcurves from every point sampled in the parameter space to create our training set and use 1000 lightcurves from each point as validation to avoid overfitting. This is done for two reasons: first, we would expect to get a fair number of flat lightcurves in the standardizable regions and non-flat lightcurves in the unstandardizable regions; second, by training the network on a sample of lightcurves that come from everywhere in the space, and which therefore show potential complexities from multiple caustic crossing events, we expect it to be more general and applicable.
Once the training is completed, we generate 5000 lightcurves at each point in parameter space to test the performance of the network. We calculate the scatter of those lightcurves which the network classifies as not crossing a caustic. Figure 8 shows the results. The neural network selects only a fraction of the lightcurves (shown in Figure C1). However, the standardizable region of parameter space is enlarged compared to considering just the point-source histogram (i.e. using no time-series information from the lightcurve, Foxley-Marrable et al. 2018). More systems with magnified counterimages are standardizable, and a large fraction of parameter space with demagnified counterimages is now standardizable as well.

STANDARDIZABLE LSST LENSED IA SUPERNOVAE
The discussion up to this point has focused on the theoretical improvements that could be made to standardizing microlensed Ia supernovae based on temporal information from their lightcurves. In this section, we discuss the practical difficulties of actually observing a glSN Ia which can be standardized. We then estimate the number of standardizable glSNe Ia to be discovered by LSST in the next decade. Finally, we examine how well we can constrain systematics in measurements of $H_0$ from the mass sheet degeneracy with a sample of standardizable glSNe Ia.
Given that the unstandardizable saddlepoint macroimages are the trailing images, standardizing lensed supernovae heavily relies on discovering the leading image (or two, if the system is a quad) before peak. For a doubly imaged system, there is only one chance to standardize it. For a quad, there can be a second chance, but only for the rare quads with two standardizable images discovered early enough.

Generating (micro-)lensed supernovae samples
We follow Sainz de Murieta et al. (2023) to generate a population of glSNe Ia and determine the fraction that will be observable, useful for time-delay cosmography, and standardizable. While Sainz de Murieta et al. (2023) focused on unresolved lightcurves, the methods can be used to create a catalogue of a large population of glSNe Ia observed in the $i$ band.
Since we need to measure the microlensing in each image, the images need to be resolvable; we limit ourselves to systems where the minimum image separation is 0.8″. In order to be a good candidate for time delay cosmography, the supernova images also need to have an appreciable (≳ 10 days) time delay, as time delays can be recovered to within a day or two (Pierel et al. 2021; Huber et al. 2022). We additionally impose that the system be discovered no later than 10 days before the peak of the first (second) image for doubles (quads) in the observer frame. This allows a long baseline to measure the evolution of the microlensing signal as the SN disk grows, as in Section 4.
We estimate the stellar fractions at the image locations by assuming an elliptical de Vaucouleurs profile for the light (Vernardos 2019) with effective radii calculated from the scaling relation in Hyde & Bernardi (2009). Following Dobler & Keeton (2006), we assume isothermal total density profiles and normalise the stellar component such that the maximum stellar fraction is 1. This will make our forecasts somewhat pessimistic, since observations prefer lower stellar fractions, although this depends on the stellar initial mass function (Auger et al. 2010).
We furthermore limit ourselves to systems where microlensing cannot demagnify the first image enough to be unobservable. This is done to ensure that any observed image would be a fair draw from the microlensing magnification distribution, rather than an unfair sampling from the brighter tail of the distribution. This would naively restrict us to redshifts where the unlensed supernova is brighter than the detection limit of the survey; when accounting for microlensing, the minimum magnification is $\mu_{\rm min} = (1 - \kappa s)^{-2}$ which, depending on the macro-model and the stellar fraction for the image, allows us to push to slightly farther redshifts. We use 24th magnitude at this minimum magnification as our detection limit, approximately the value appropriate for LSST.
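The detectability cut described above can be written as a short check. The sketch below is illustrative and not the selection code of this work: it takes the minimum magnification to be $(1 - \kappa s)^{-2}$, with $\kappa$ the convergence and $s$ the smooth matter fraction, and asks whether an image at that minimum magnification is still brighter than the 24th magnitude limit. The function names and example magnitudes are hypothetical.

```python
import math

def min_micro_magnification(kappa, s):
    """Minimum magnification assumed for an image with convergence kappa and smooth fraction s."""
    return (1.0 - kappa * s) ** -2

def always_detectable(unlensed_mag, kappa, s, limit=24.0):
    """True if the image stays brighter than the limit even at minimum magnification."""
    faintest = unlensed_mag - 2.5 * math.log10(min_micro_magnification(kappa, s))
    return faintest <= limit

# Example with the intermediate system parameters (kappa = 0.4, s = 0.75).
mu_min = min_micro_magnification(0.4, 0.75)
ok_near = always_detectable(24.5, 0.4, 0.75)  # unlensed source slightly fainter than the limit
ok_far = always_detectable(25.0, 0.4, 0.75)   # unlensed source half a magnitude fainter still
```

Because the minimum magnification exceeds unity in this example, a source somewhat fainter than the survey limit can still pass the cut, which is why slightly higher redshifts become accessible.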

Results
We can now estimate rates of standardizable glSNe and make forecasts for breaking the mass sheet degeneracy with these systems.

Rate estimates
We start with a population of glSNe from Sainz de Murieta et al. (2023) with a rate of 13 doubles (4 quads) per year which are discovered early enough for standardization to be possible. We show one realization from the catalogue of such systems in Figure 9. Of those, 8.3 (1.3) per year are cosmologically golden, with time delays > 10 days and minimum image separations of 0.8″. Of the golden systems, 6.0 doubles and 0.3 quads per year have standardizable images with flat lightcurves.

Constraints on 𝐻 0
The MSD is governed by a parameter $\lambda$. The propagation of magnification uncertainties onto $\lambda$ is given in Appendix D. If we assume the intrinsic supernova scatter of 0.15 mag dominates the uncertainty on the observed magnification and microlensing dominates the uncertainty on the model magnification, we can infer how each standardizable system will constrain the systematics from the mass sheet transformation. Combining the ≈60 expected cosmologically golden standardizable doubles, we find that they should constrain the population average $\langle\lambda\rangle$ to $1.00^{+0.07}_{-0.06}$% fractional uncertainty, and therefore detect systematics in $H_0$ from the MSD at the 1% level. We note that if the intrinsic supernova scatter is lower (0.1 mag), this changes to a $0.74^{+0.06}_{-0.05}$% fractional uncertainty on $\langle\lambda\rangle$ and $H_0$.
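As a rough order-of-magnitude check of these numbers, the sketch below propagates magnitude uncertainties onto $\lambda$ to first order and combines identical systems as $1/\sqrt{N}$. It assumes that $\lambda^2$ is a ratio of magnifications, that the observed and model uncertainties are independent, and that every system has the same microlensing scatter; the 0.08 mag value is a hypothetical stand-in, not a number from this work, and the function names are illustrative only.

```python
import math


def fractional_sigma_lambda(sigma_obs_mag, sigma_model_mag):
    """First-order fractional uncertainty on lambda for a single system.

    With lambda^2 a ratio of magnifications and Delta m = -2.5 log10(mu),
    sigma_lambda / lambda = (ln 10 / 5) * sqrt(sigma_obs^2 + sigma_model^2).
    """
    return math.log(10) / 5.0 * math.hypot(sigma_obs_mag, sigma_model_mag)


def combined_fractional_sigma(per_system, n_systems):
    """Uncertainty on the population mean for n identical, independent systems."""
    return per_system / math.sqrt(n_systems)


INTRINSIC_SCATTER_MAG = 0.15      # intrinsic SN Ia scatter quoted in the text
MICROLENSING_SCATTER_MAG = 0.08   # hypothetical representative value (assumption)
N_DOUBLES = 60                    # expected golden standardizable doubles

single = fractional_sigma_lambda(INTRINSIC_SCATTER_MAG, MICROLENSING_SCATTER_MAG)
population = combined_fractional_sigma(single, N_DOUBLES)
print(f"single system: {100 * single:.1f}%")
print(f"population of {N_DOUBLES}: {100 * population:.2f}%")
```

The full forecast uses the microlensing scatter of each simulated system rather than a single representative value, so this toy calculation should only be expected to land near the percent level.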

Comparison to previous works
Our results are more pessimistic than those of Foxley-Marrable et al. (2018).The main factor behind this is a decrease in the estimated number of systems to be discovered with LSST; compare, e.g., Goldstein et al. (2019) 2023) estimate slightly higher rates of detection for LSST, which in part come about due to considering alternative image detection methods that do not rely solely on magnification (see also, e.g., Wojtak et al. 2019).However, detection methods such as image multiplicity miss early time information which could be key for standardizing microlensed lightcurves.
Comparing our results to those ignoring the time evolution of the lightcurve (i.e., roughly following Foxley-Marrable et al. 2018, and using just the point source microlensing histograms), we find that ≈20% more of the cosmological golden systems will be standardizable.We were able to substantially increase the size of the standardizable region of microlensing parameter space in Figure 8, which gives a 30% improvement on the number of standardizable systems in the whole population.However, the majority of systems with long time delays tend to come from the regions that were already standardizable under the considerations of Foxley-Marrable et al. (2018).
A byproduct of our methods is a decrease in the scatter on the systems that were already standardizable, at the cost of losing a small fraction of lightcurves. However, we get similar constraints on $H_0$ for our simulated systems if we ignore information about the shape of the lightcurves: the rate of standardizable doubles decreases to 5.1 per year, which leads to a $1.11^{+0.09}_{-0.07}$% fractional uncertainty on $H_0$ from the MSD. This suggests that the macro-model parameters are the main driving factors in standardizability. This comes from the fact that, whilst we have tightened the microlensing magnification distributions for all macro-minima, the intrinsic scatter in the Ia population sets a floor on the utility of an individual system in breaking the MSD.

Observing the counter-images of standardizable images
Actually doing time-delay cosmography with these systems requires us to detect the trailing saddlepoint image(s). Figure 10 shows the distribution of counter-image magnitudes at peak, from the lens macro-model only. The distribution peaks at 24.5 mag in the $i$ band for the doubles, 24 mag for image 4 of the quads, and 22.5 mag for image 3 of the quads. However, this ignores the fact that saddlepoint images are more susceptible to microlensing demagnification (Schechter & Wambsganss 2002). Follow-up observations of the trailing image(s) will be difficult, but possible.
When considering the number of standardizable systems with bright counter-images, our lightcurve method represents a substantial improvement upon the previous histogram method of Foxley-Marrable et al. (2018). If we consider only the doubles which have a trailing image that peaks brighter than 25th mag, our yearly rate of standardizable doubles drops from 6.0 to 3.9, compared to the 3 per year expected when ignoring lightcurve shape. After ten years, we should have 39 systems and a $1.28^{+0.12}_{-0.09}$% fractional uncertainty on $H_0$ from the MSD. For counter-images brighter than 24th magnitude we expect 2 standardizable systems per year and a $1.8 \pm 0.2$% fractional uncertainty on $H_0$ from the MSD.
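The way these constraints degrade with sample size can be compared against a naive $1/\sqrt{N}$ scaling from the full sample of 60 doubles. The sketch below assumes that every system contributes equally, which is a simplification rather than the procedure used for the forecasts; the function name is hypothetical.

```python
import math


def scale_uncertainty(baseline_pct, baseline_n, n):
    """Rescale a fractional uncertainty assuming it goes as 1 / sqrt(N)."""
    return baseline_pct * math.sqrt(baseline_n / n)


BASELINE_PCT = 1.00   # quoted constraint from the full sample of doubles
BASELINE_N = 60       # ten years at 6.0 standardizable doubles per year

# Yearly rates quoted in the text for the two counter-image brightness cuts.
for label, rate_per_year in [("counter-image < 25 mag", 3.9),
                             ("counter-image < 24 mag", 2.0)]:
    n_systems = rate_per_year * 10
    scaled = scale_uncertainty(BASELINE_PCT, BASELINE_N, n_systems)
    print(f"{label}: {n_systems:.0f} systems, naive scaling {scaled:.2f}%")
```

Under this assumption the naive estimates fall slightly below the quoted 1.28% and 1.8%, which would be consistent with the retained systems not being a uniform subsample of the full population.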

LIMITATIONS
The detection of any microlensing signal is limited due to uncertainties in the supernova model.We have pessimistically assumed 0.05 mag uncertainty on each point in our lightcurves.Whilst this is comparable to the mismatch in SALT3, we have neglected temporal or chromatic correlations, both of which might be able to improve upon our assumed 0.05 mag uncertainty.
The standardizable lightcurves are typically demagnified from the macro-model prediction. Although in our study we limited ourselves to simulated systems where microlensing could never demagnify the first image below the detection threshold of LSST, we must keep in mind that this will not always be the case. Supernovae at higher redshifts ($z \gtrsim 1$) begin to suffer from a microlensing magnification bias in LSST, as microlensing can demagnify (relative to the macro-model) the first image of systems below the detection threshold.
We have assumed that the lens macro-model parameters for the images are perfectly known. In reality, uncertainties on $\kappa$, $\gamma$, and $s$ require properly marginalizing over a region of the parameter space in Figure 8, rather than taking the scatter at an individual point. If uncertainties on the macro-model parameters are not large, the slow variations of the scatter in the standardizable region suggest that there will be minimal changes to results.
While our simulated systems have values of $s$ calculated with simple assumptions, $s$ should properly be inferred from the mass to light ratio of the lensing galaxy. If we were to use a Salpeter or Chabrier IMF, the smooth matter fraction would increase (Foxley-Marrable et al. 2018), providing a small boost in the number of standardizable systems. A rough estimation of this effect by halving the stellar fractions used leads to little change in our results: 7.6 standardizable doubles per year and 0.4 quads, with the doubles constraining $H_0$ to $0.90^{+0.06}_{-0.05}$% fractional uncertainty from the MSD. Furthermore, there is an additional dependence of microlensing on the mass function of the lenses (Schechter et al. 2004) that is usually ignored. This dependence is difficult to manifest in microlensed quasars for physical reasons (Lewis & Gil-Merino 2006), but the implication for lensed supernovae has not been investigated.
While we considered the simple model of a uniform expanding disc, real supernovae will have a 2D intensity profile that must be convolved with the microlensing magnification pattern (Goldstein et al. 2018; Huber et al. 2019). We do not expect any changes to our conclusions, because the standardizable lightcurves have essentially constant microlensing magnification.

CONCLUSIONS
In this work, we have investigated how the temporal information from microlensing lightcurves can be used to reduce the uncertainty on the unlensed SN magnitude. We have shown that it is possible to select a subset of glSNe Ia images which can be used to break the mass sheet degeneracy. The main idea we have put forward is that, by selecting lightcurves of supernovae which do not cross any microcaustics as they expand, one can restrict the sample to regions of the source plane typically outside the microcaustics, which have lower amounts of scatter. Our method relies on having a long enough sequence of observations to rule out caustic crossings.
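The notion of a flat microlensing lightcurve can be illustrated with a simple chi-square test against a constant. This is a stand-in for illustration only and is not the neural network classifier used in this work. The sketch assumes the 2 day rest-frame cadence from 5 days before to 50 days after peak and the 0.05 mag uncertainty adopted in our simulations; the toy residuals, the threshold of 2, and the function names are hypothetical.

```python
SIGMA_MAG = 0.05                 # per-point uncertainty assumed in the text
EPOCHS = list(range(-5, 51, 2))  # rest-frame days relative to peak, 2 day cadence


def reduced_chi2_flat(residual_mags, sigma_mag):
    """Reduced chi-square of microlensing residuals about their mean."""
    n = len(residual_mags)
    mean = sum(residual_mags) / n
    chi2 = sum(((m - mean) / sigma_mag) ** 2 for m in residual_mags)
    return chi2 / (n - 1)


def looks_flat(residual_mags, sigma_mag, threshold=2.0):
    """Flag a lightcurve as flat if it is consistent with a constant offset."""
    return reduced_chi2_flat(residual_mags, sigma_mag) < threshold


# Toy residuals (microlensed minus template, in mag), not simulation output:
# a constant offset with noise-level wiggles, and a steady 0.01 mag/day drift.
flat = [0.05 if i % 2 == 0 else -0.05 for i in range(len(EPOCHS))]
drifting = [0.01 * (t - EPOCHS[0]) for t in EPOCHS]

print("flat lightcurve selected:", looks_flat(flat, SIGMA_MAG))
print("drifting lightcurve selected:", looks_flat(drifting, SIGMA_MAG))
```

A test of this kind only uses the overall variance of the residuals; a classifier trained on simulated lightcurves can additionally exploit the characteristic shapes of caustic crossings.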
Using a simulated sample population of lensed Type Ia supernovae with resolvable (> 0.8′′ separation) images and long (> 10 days) time delays, we estimate the number of detectable, standardizable glSNe Ia systems to be discovered by LSST in the next decade as ≈60 doubly imaged systems and ≈3 quadruply imaged systems. The doubles can constrain systematics in $H_0$ from the MSD at a $1.00^{+0.07}_{-0.06}$% precision level. While the first image for the majority of cosmologically useful doubles should be able to break the mass sheet degeneracy, there will be observational challenges in following up the second image to measure the time delay. This is mostly driven by the faintness of the (typically demagnified) counter-images, which have a median brightness of ≈25 mag before accounting for additional microlensing (de)magnification. If we are only able to follow up counter-images brighter than 24th magnitude, only a third of the double image systems are retained, and the fractional uncertainty on the breaking of the MSD degrades to $1.8 \pm 0.2$%.
Time delay measurements will already require high-quality observations up to and after peak supernova luminosity (Pierel & Rodney 2019; Huber et al. 2019). In that sense, no additional data should be required to determine whether the expanding supernova crosses a microcaustic or not. Given the observational cost of follow-up, it should be focused on systems where the first image is most likely to be standardizable. When the microlensing effect does not vary with time, we essentially know the intrinsic shape of the unlensed SN lightcurve. Focusing on standardizable systems will make it easier to infer time delays and, after 10 years of LSST, will allow us to test the impact of the MSD on $H_0$ at 1% precision.

APPENDIX C: FRACTION OF LIGHTCURVES SELECTED
Figure C1 shows the fraction of lightcurves which the neural network is 75% confident did not have any caustic crossings as a function of lens macro-model parameters.For the majority of standardizable glSNe Ia, at least 70% of the lightcurves should be useable (flat).

APPENDIX D: PROPAGATION OF UNCERTAINTIES
Scaling the mass model by the mass sheet parameter $\lambda$ and introducing a mass sheet of convergence $\kappa = 1 - \lambda$ causes the magnification to transform as $\mu \rightarrow \lambda^{-2}\mu$ (D1). Given the known intrinsic brightness of a supernova and its observed brightness, we can infer the magnification. By comparison with the magnification predicted by the model, we can constrain the value of $\lambda$. In the absence of microlensing, a single standardizable glSN Ia can constrain the value of $\lambda$ to within ≈ 5% fractional uncertainty (Mörtsell et al. 2020; Birrer et al. 2022).
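Using the fact that $\lambda$ is a ratio of model and observed magnifications, $\lambda^2 = \mu_{\rm model}/\mu_{\rm obs}$, first-order propagation of independent uncertainties gives $\sigma_\lambda/\lambda = (\ln 10/5)\sqrt{\sigma_{\Delta m_{\rm obs}}^2 + \sigma_{\Delta m_{\rm model}}^2}$, where $\sigma_{\Delta m_{\rm obs}}$ is the uncertainty on the observed magnification of the supernova and $\sigma_{\Delta m_{\rm model}}$ is the uncertainty on the magnification of the model, in magnitudes.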

Figure 1. Top: Microlensing histograms for systems that are (from left to right) standardizable, intermediate, and unstandardizable. The vertical dashed lines mark the inner 68% of the histogram. The subhistograms are for those regions completely outside the caustics ( = 1), inside one caustic ( = 2), or deeper in the caustic network ( ≥ 3). Middle: Sample caustic patterns that produce the microlensing histograms. Bottom: Zoom of the caustics for the indicated central square of the larger maps. The solid, dashed, and dotted circles denote the size of a fiducial supernova 0, 50, and 100 days after peak luminosity respectively.

Figure 2. Top: Sample microlensing lightcurves for a source that i) crosses a caustic inside to outside (blue), ii) crosses a caustic outside to inside (orange), or iii) does not cross a caustic (green), within the time period examined. Bottom: Sample noisy lightcurves, assuming a 2 day cadence in the source rest frame and 0.05 mag Gaussian noise.

Figure 3. Left: Microlensing magnification distributions for the intermediate scenario of Figure 1. The vertical dashed lines mark the inner 68% of the histograms. While the point source histogram is for the entire source plane, the subsequent histograms are only for those regions of the source plane where an expanding source will not cross a caustic by some given time. Right: Visualization of the ruled out regions in the source plane. The solid and dashed white circles in the magnification maps denote the size of a fiducial supernova at each time period.

Figure 4. Given the standard deviation $\sigma_{\rm lightcurve}$ of the lightcurve data points, we can select a subset (fraction) of lightcurves that have $\sigma_{\rm lightcurve}$ less than some desired cutoff. Lowering the cutoff reduces the fraction of lightcurves selected and their scatter. The red "+" marks the cutoff $\sigma_{\rm lightcurve} = 0.05$, while the red "x" marks the cutoff $\sigma_{\rm lightcurve} = 0.01$.

Figure 6. Predicted versus true microlensing (de)magnification from the regression neural network. The white solid and dashed lines mark the median and inner 68% of the distribution of true values for lightcurves within the same bin of predictions. The black dotted line marks where prediction = truth.

Figure 7. Top: ROC curve for the classification neural network. Bottom: Fraction of lightcurves selected as not crossing a caustic, and their scatter, as a function of classification threshold. The red plus symbol marks the 75% classification threshold in both figures.

Figure 8. Contour plot showing the scatter over microlensing parameter space for those lightcurves which the neural network predicts did not show any caustic crossings. The lightcurves used assume we observe every 2 days from 5 days before peak up to 50 days after peak (in the rest frame of the supernova) and have 0.05 mag noise on the data. The solid white line denotes the contour of 0.15 mag microlensing scatter for the lightcurves. The dashed white line denotes the contour of 0.15 mag scatter when considering only the point source histograms, i.e. no time-series information (Foxley-Marrable et al. 2018). The vertical black dotted line marks the boundary where the counter-image of a singular isothermal sphere is demagnified (left) or magnified (right).

Figure 9. One realisation from the catalogue of glSNe systems expected after a decade of LSST observations. The dashed and solid black lines denote the boundaries where systems are standardizable either via the microlensing magnification histogram or their lightcurves.

Figure 10. Distribution of counter-image Sloan $i$ band magnitudes under the lens macro-model only for the standardizable systems.

Figure A2. Microlensing magnification histogram for a faint macrosaddle showing the  = 0 subhistogram. Even when ruling out regions where an expanding supernova would cross a caustic, the low magnification tail of the histogram prevents the system from being standardizable.

Figure B1. Top: SALT3 supernova templates with errors due to the model covariances. Bottom: Just the errors as a function of supernova phase.

Figure C1. Contour plot showing the fraction of lightcurves which the neural network predicts did not show any caustic crossings. The lightcurves used assume we observe every 2 days from 5 days before peak up to 50 days after peak (in the rest frame of the supernova) and have 0.05 mag noise on the microlensing lightcurve. The solid white line denotes the contour of 0.15 mag microlensing scatter for the lightcurves. The vertical black dotted line marks the boundary where the counter-image of a singular isothermal sphere is demagnified (left) or magnified (right).

Table 1. Microlensing magnification scatter for an expanding source which has not crossed a caustic up to some time for the intermediate configuration ($\kappa = \gamma = 0.4$, $s = 0.75$). Values were found by convolving one large scale map. Fractions indicate what percentage of locations on the source plane have not crossed a caustic by the indicated time. The first row gives results for all of the source plane under consideration, while subsequent rows show the decomposition into regions with various values of  .