Substructure in the lens HE 0435-1223

We investigate the properties of dark matter substructure in the gravitational lens HE 0435-1223 (z_l=0.455) via its effects on the positions and flux ratios of the quadruply-imaged background quasar (z_s=1.689). We start with a smooth mass model, add individual, truncated isothermal clumps near the lensed images, and use the Bayesian evidence to compare the quality of different models. Compared with smooth models, models with at least one clump near image A are strongly favored. The mass of this clump within its Einstein radius is log(Mein/Msun) = 7.65 +0.87/-0.84. The Bayesian evidence provides weaker support for a second clump near image B, with log(Mein/Msun) = 6.55 +1.01/-1.51. We also examine models with a full population of substructure, and find the mass fraction in substructure at the Einstein radius to be f_sub>0.00077, assuming the total clump masses follow a mass function dN/dM proportional to M^(-1.9) over the range M = 10^7-10^10 Msun. Few-clump and population models produce similar Bayesian evidence values, so neither type of model is objectively favored.


INTRODUCTION
A tension has arisen between the cold dark matter (CDM) paradigm and certain astronomical observations. On the theory side, N-body simulations have reached consensus that galaxy-scale dark matter halos should contain many bound subhalos that follow a power-law mass function, dN/dM ∝ M α . Probing from ∼ 10 10 M down to ∼ 10 4 M , simulations such as Via Lactea ) and Aquarius (Springel et al. 2008) predict a mass function slope of α ≈ −1.9 and a fractional amount of substructure in the vicinity of 8-11% (depending to some extent on resolution).
Observationally, however, the prediction of dark matter substructure has not been confirmed. Various surveys have sought to characterise the abundance, masses, and spatial distribution of low-mass galaxies in the Local Group (e.g., Simon & Geha 2007;Kalirai et al. 2010). Before 2005, only the 11 most massive and luminous Milky Way satellites had been found (Mateo 1998). After 2005, the Sloan Digital Sky Survey (York et al. 2000) made it possible to detect extremely faint satellites (e.g., Willman et al. 2005;Irwin et al. 2007;Liu et al. 2008;Belokurov et al. 2009Belokurov et al. , 2010. These "ultra-faint" dwarfs, with absolute magnitudes as low as MV ∼ −2, have more than doubled the number of Milky Way satellites to ∼ 25 (for a current list, see Wadepuhl & Springel 2011). Yet, despite this dramatic leap forward, the number of satellites still falls severely short of the hundreds predicted by N-body simulations (Klypin et al. 1999;Moore et al. 1999).
A clear contributor to this disparity is the lack of a complete and thorough survey of the local volume. Indeed, while a huge improvement over previous surveys, the SDSS is limited in both sky coverage (∼1/5 of the sky) and depth (g < 22.2). Attempts to account for these limitations suggest that a volumetrically complete survey will find many more satellites and may eliminate the problem altogether (Tollerud et al. 2008). Those estimates depend, however, on extrapolations from the currently known population, and it is quite plausible that even a complete survey will not find all the predicted satellites. If so, any remaining discrepancy between theoretical predictions and observations will presumably be attributed to the intrinsic luminosities of low-mass dwarfs. Satellites with total mass 10 7 M can experience suppressed or even quenched star formation (e.g., Strigari et al. 2007;Macciò et al. 2010). Cosmic reionisation, UV photo-evaporation, ram pressure or tidal stripping, supernovae, and cosmic rays may all play a role in hampering the conditions for star formation (Gnedin 2000;Scannapieco et al. 2001;Strigari et al. 2007;Madau et al. 2008;Mashchenko et al. 2008;Macciò et al. 2010;Peñarrubia et al. 2010;Wadepuhl & Springel 2011). While the precise mechanisms are still debated, the plausibility of such arguments points to a large population of "dark dwarfs", whose luminosities are so low that they will elude traditional observational techniques.
Intriguingly, while local observations of satellite galaxies seem to fall short of CDM predictions, measurements in more distant galaxies exhibit the opposite conflict. Sensitive to mass alone, strong gravitational lensing provides a unique tool to detect low-mass subhalos in cosmologically distant galaxies, regardless of their luminosities (e.g., Dalal & Kochanek 2002;Vegetti et al. 2010b). On large angular scales (∼ 1 ), the bulk properties of multiply-imaged quasars are determined by the macroscopic mass distribution of the lens galaxy and its surrounding environment. Upon detailed inspection, however, the properties of the images may be perturbed by small-scale structure in the mass distribution (Mao & Schneider 1998;Chiba 2002;Metcalf & Madau 2001;Dalal & Kochanek 2002;Metcalf & Zhao 2002;Bradač et al. 2002;Koopmans et al. 2002;Chen et al. 2007; Keeton & Moustakas 2009;Keeton 2009). Thus, with the positions, flux ratios, and time delays of lensed images we may be able to measure the properties of small-scale structure in lens galaxies.
Currently, some of the best constraints on dark matter substructure (outside of the Local Group) come from the analysis of "anomalous" flux ratios in four-image gravitational lenses. Many lenses have flux ratios that violate universal relations predicted for smooth mass models . Performing a statistical analysis of seven lenses, Dalal & Kochanek (2002) found the mass fraction in substructure to be 0.006 < f sub < 0.07 (90% confidence) at the Einstein radii of the lenses. This stands in contrast to CDM predictions, which yield f sub ∼ 0.001-0.003 at similar projected radii (Mao et al. 2004;Amara et al. 2006;Macciò & Miranda 2006). In particular, Xu et al. (2010) recently found that N-body simulations predict f sub ∼ 0.002 at typical Einstein radii even when considering other sources of small-scale structure beyond dark matter substructure (e.g., globular clusters, stellar streams).
Observational constraints, therefore, seem at odds. Tallies of Milky Way satellites seem to indicate a dearth of substructure, while lensing points to a surplus. Confronting this on the lensing side, there is great interest in expanding both the list of observables and the sample of lenses used to probe substructure. For example, infrared observations of lenses have begun to increase the number of quasar lenses available for flux ratio studies (e.g., Chiba et al. 2005;MacLeod et al. 2009;Minezaki et al. 2009;Fadely & Keeton 2011). Image positions ) and time delays (Keeton & Moustakas 2009) can complement flux ratios by providing different sensitivity to substructure in quasar lenses. Also, Einstein ring images offer a new way to probe substructure in galaxy-galaxy strong lenses (Vegetti & Koopmans 2009b;Vegetti et al. 2010a,b). In particular, Vegetti et al. (2010b) recently used a Bayesian analysis to infer f sub = 0.0215 +0.0201 −0.0125 in the lens SDSS J0946+1006, assuming α = −1.9 ± 0.1 for a mass range from M total = 10 6.6 M to 10 9.6 M .
In this paper we investigate the properties of the four image gravitational lens HE 0435−1223 (hereafter HE0435), selected for its relatively bright (F 160W < 18.1) and well separated (2.4 ) images (see Fig. 1). Since its discovery (Wisotzki et al. 2002), HE0435 has been extensively studied using ground-and space-based observations. From the ground, optical spectroscopy provided early evidence for stellar microlensing and against significant differential dust extinction in the lens (Wisotzki et al. 2003). More recently, optical monitoring has quantified the intrinsic and microlensing variability, and also revealed the time delays between images Courbin et al. 2010). Hubble Space Telescope imaging provided photometric evidence that the lens lies in an overdense environment (Morgan et al. 2005), and pencil-beam redshift surveys have confirmed the presence of a group of galaxies surrounding the lens (Wong et al. 2011). A galaxy lying near the lens on the sky (labeled G22 by Morgan et al. 2005) seems to be important for reproducing the lensed images . Using the available data, including new nearinfrared photometry (Fadely & Keeton 2011), we examine the mass distribution of HE0435 and pay particular attention to any evidence for substructure. New in our analysis is the use of both individual-clump and population-based simulations of substructure, which allow us both to constrain the masses of clumps near the images and to connect them to the broader substructure population. Not included in our analysis are effects from small mass halos along the line of sight. While such structures may produce millelensing effects similar to subhalos within lens galaxies, their ultimate importance is still debated (e.g., Chen et al. 2003;Metcalf 2005). Where necessary we assume a flat cosmology with Ωm = 0.27 and H0 = 70.4 km s −1 Mpc −1 , which is similar to the mean WMAP+BAO+H0 values presented in Komatsu et al. (2011).

CONSTRAINTS
Out of the previous observations of HE0435 we must select the data we seek to fit. The chosen data should provide valuable constraints on the lens mass distribution, be practical to use, and permit a straightforward interpretation. The optimal astrometric data are the HST-derived centroids of the lensed images and the main lens galaxy, G1 , along with the position of the neighboring galaxy, G22 (Morgan et al. 2005). The redshift of G22 is not known (see Wong et al. 2011); we assume this galaxy lies at the same redshift as G1. The data are summarised in Table 1.  Kochanek et al. (2006). The R-band flux ratios reflect the mean and standard deviation from light curve monitoring, and include scatter from intrinsic and microlensing variability. The K and L -band photometry of the images are from Fadely & Keeton (2011). The data for the lens galaxy G1, and the F160W magnitude of the neighbor galaxy G22, are from Kochanek et al. (2006), while the remaining data for G22 are from Morgan et al. (2005).
More care must be given to the photometric data. Several datasets are available: Kochanek et al. (2006) present R-band monitoring, Mosquera et al. (2011) present photometry in one broad-band and six narrow-band filters spanning ∼3500-8100Å, Wisotzki et al. (2003) present optical integral field spectroscopy, and Fadely & Keeton (2011) present photometry in the near-infrared K and L bands. Figure 2 shows the main dependences on time and wavelength using the optical monitoring of Kochanek et al. (2006) and the NIR flux ratios of Fadely & Keeton (2011). Three key features are seen in these data. There is clear time variability in the R-band flux ratios. All three L -band flux ratios are consistent with the mean values of the R-band flux ratios. Two of the K-band flux ratios are likewise consistent with the other wavelengths, but the K-band value of the B/C flux ratio is a factor of ∼1.3 higher than the corresponding R-and L -band values. One other key result, from analysis of spectra by Wisotzki et al. (2003), is that there is no evidence of dust extinction in the lens galaxy.
The subtlety here is that the measured flux ratios may be affected by stellar microlensing, but we would prefer to omit microlensing from our analysis to the extent possible because it adds considerable computational expense and distracts from our focus on dark matter substructure. Thus, we need to understand whether it is possible to account for or even eliminate microlensing from the flux ratio constraints. One simple possibility is to broaden the errorbars so they encompass microlensing effects. We do that by computing the standard deviation across all epochs in the R-band light curves from Kochanek et al. (2006). This incorporates all microlensing and intrinsic variability that occurred during the 2-year span of the light curves.
It does not, however, account for microlensing effects with time scales longer than ∼2 yr. To take the analysis one step further, we consider the multi-wavelength structure of the source quasar. For a source redshift of zs = 1.689, the R-and K-band observations probe rest-frame UV and optical wavelengths that are dominated by thermal emission from the hot quasar accretion disk, which is small enough (∼ 10 15−17 cm; Morgan et al. 2010) to be sensitive to microlensing. By contrast, the L -band observations (rest frame 1.4 µm) should contain emission from both the accretion  Kochanek et al. (2006), while square and triangular points show single-epoch K-and L -band data from Fadely & Keeton (2011). Solid and dotted lines indicate the mean and 68% confidence ranges across all epochs in the R-band data.
disk and the surrounding dust torus (Rowan-Robinson 1995; Nenkova et al. 2008). The relative contributions of the two components are not known exactly, but the dust torus probably accounts for 20-80% of the flux (e.g., Wittkowski et al. 2004;Hönig et al. 2008). Since the dust torus should be large enough to be immune to microlensing, its contribution should cause the L flux ratios to have little or no variability from microlensing. We therefore interpret the similarity between the L -and R-band flux ratios as evidence that there is no significant long-term microlensing affecting the R-band light curves.
In other words, we can take either the L -band measurements or the R-band measurements (with broadened errorbars) as microlensing-free estimates of the flux ratios. In some sense the choice is not very important because the two sets of measurements are consistent with each other. In practice, it is easier to work with the R-band data because at these wavelengths the source is much smaller than the Einstein radius of mass clumps larger than ∼ 100 M , so we can effectively treat it as a simple point source in our study of lensing by dark matter substructure (cf. Dobler & Keeton 2006).
In the context of this analysis we still need to understand why the K-band measurement of the B/C flux ratio differs from the other measurements. From Figure 2, the B/C flux ratio must vary significantly with time and/or wavelength. Dark matter subhalos can be ruled out as the cause because the agreement between the L and R flux ratios suggests that there is little or no "chromatic millilensing" in HE0435 (see Dobler & Keeton 2006). Microlensing may be a viable explanation, though, if the Einstein radii of stars in the lens galaxy are comparable to or larger than the size of the K-band source. We examine this hypothesis carefully in Section 7.1, considering not only how microlensing might explain the K-band data but also whether it could have altered the L -band data as well. To jump ahead, we conclude that microlensing can indeed explain the K-band data without ruining our interpretation that the L -band data provide good estimates of the microlensing-free flux ratios.
In our primary modeling we elect not to use measured time delays as constraints. At the time of our analysis, Kochanek et al. (2006) had published time delays based on two seasons of monitoring, but it was not clear how well the quasar and microlensing variability had been disentangled. Indeed, Blackburne & Kochanek (2010) reported newer time delay estimates differed by 2-5σ from the previous results. After our analysis was complete, Courbin et al. (2010) presented new data for HE0435 including refined astrometry from deconvolution of HST images and time delays from four additional years of R-band monitoring. We compare the new time delays to our lens models in Section 7.2. One valuable by-product of the analysis by Courbin et al. (2010) is estimates of the R-band flux ratios after correcting for both microlensing and intrinsic variability in the source. We note that these "corrected" R-band flux ratios match within ∼ 1σ the mean R-band values used here.

METHODOLOGY
We use a Bayesian framework both to constrain model parameters and to assess the quality of different models. We aim to compute the posterior probability distribution where d is the data which constrain the parameters θ for model M . 1 We calculate the likelihood, L = P (d|θ, M ), from the χ 2 goodness-of-fit: L ∝ e −χ 2 /2 . Since we are only concerned with relative posterior probabilities, we ignore the proportionality constant and set L = e −χ 2 /2 . In most cases we take the prior distribution, P (θ|M ), to be uniform for the parameters listed in Table 3; one exception is discussed in Section 6.2.
The denominator in eqn.
(1) is the marginal likelihood of the model, also known as the Bayesian Evidence: In many astrophysical studies there is only one model being examined. In that case the Bayesian evidence can be ignored, since the normalisation of the posterior does not affect confidence intervals for marginalised parameters. If the evidence is not needed, an effective way to proceed is to sample from the posterior distribution using methods such as Monte Carlo Markov Chains (MCMC). The evidence becomes crucial, though, when comparing different models. Since the Bayesian evidence quantifies the overall probability of a particular model, it provides an objective way to compare models even if they have different numbers of parameters (MacKay 2003;Gelman et al. 2003). The ratio of the posterior probabilities for two models M1 and M2 is We assume equal prior probabilities for all models, P (M1) = P (M2), so the ratio of posterior probabilities is just the ratio of evidences, which is called the Bayes factor. While the principle is well established, the quantitative significance of Bayes factors is not completely clear cut, and various scales are employed to facilitate the interpretation. The most common choice is the Jeffreys' scale (Jeffreys 1961), which grades Bayes factors as shown in Table 2. In this work, we use the Jeffreys' scale as a guideline for judging our models, and we actually work with the differential log evidence, ∆ log 10 (Evidence) = log 10 (Bayes factor). The practical challenge lies in integrating over all model parameters to compute the Bayesian evidence. We perform the integration using the Nested Sampling algorithm (Skilling 2004(Skilling , 2006, which provides marginalised parameter ranges in addition to the evidence. As a computational tool, nested sampling has been used in a variety of astrophysical studies (e.g., Mukherjee et al. 2006;Humphrey et al. 2009), including gravitational lensing (e.g., Vegetti & Koopmans 2009a;Barnabè et al. 2009).
Roughly speaking, the idea of nested sampling is to execute many random draws from the parameter space according to the following scheme: at each step, the next point is drawn uniformly from the prior distribution but limited to the region where the likelihood increases. Various procedures for doing this constrained sampling have been introduced (Mukherjee et al. 2006;Shaw et al. 2007 Clump models Table 3. Model parameters and priors for our smooth and clump models discussed in Sections 4 and 5, respectively. We adopt uniform priors within the specified intervals, with the exception of e c,G22 and e s,G22 for certain models as discussed in Section 6.2. Note that ec = e cos 2θe and es = e sin 2θe are the quasi-Cartesian components of the ellipticity e = 1 − q. nested sampling without such a strict one-way progression (Brewer et al. 2009). The volumes enclosed by different isolikelihood surfaces may not be known exactly, but they can be estimated statistically, so the likelihood values and associated volumes can be combined to estimate the Bayesian evidence (for details, see Skilling 2004Skilling , 2006. Because nested sampling is intrinsically stochastic, there is some statistical uncertainty in the evidence value, but we can compute the uncertainty using the methods presented by Keeton (2011). In general we do not wish to place strong priors on our model parameters, so we have a large parameter volume to explore. To alleviate the computational burden, we adopt a two-step approach to sampling. First, we execute an MCMC sampling of the posterior using uniform priors defined in Table 3. Details of our MCMC algorithm, techniques, and convergence criteria are discussed in Section 3.4 of Fadely et al. (2010). We use the posterior from MCMC to construct narrower priors that encompass the 99.999% CL parameter ranges (Table 3, dotted lines Figure 1). Using the narrower priors for nested sampling reduces the amount of time spent in regions of extremely low likelihood (χ 2 > 10 6 ). Tests with multivariate Gaussian distributions indicate that truncating such low-likelihood regions of the parameter space does not significantly alter estimates of the Bayesian evidence.

SMOOTH MODELS
As a first step we examine how well a smooth mass distribution (without substructure) can fit the observed image positions and flux ratios for HE0435. Following Kochanek et al. (2006), we adopt a "minimal" lens model in which the main lens galaxy (G1) has an ellipsoidal mass distribution with a softened power law density profile, where s is the core radius, ξ = x 2 + y 2 /q 2 is the ellipse coordinate (in the major axis frame), and q 1 is the projected axis ratio. The power law index is defined such that the mass enclosed within radius R scales as M (R) ∝ R β , so an isothermal profile has β = 1, while a steeper (shallower) profile has β < 1 (β > 1). Note that for a pure power law profile, the corresponding 3D density profile is ρ ∝ r β−3 . The normalisation parameter b has dimensions of length, and for a pure power law b is directly proportional to the Einstein radius: REin = bF (β, q). The proportionality factor F has a complicated form in terms of special functions, but if we make a Taylor expansion in the ellipticity e = 1 − q we can write We vary all model parameters for the main lens, including β. Kochanek et al. (2006) found that HE0435 has a density profile that is shallower than isothermal, which leads to a rising rotation curve. Varying β is important if we are to account for a wide range of possible mass distributions. Our minimal model also includes the effects of the environment of HE0435. The neighboring galaxy G22 is close enough (only 4.4 from G1) that it needs to be included explicitly. As in previous studies, we assume that G22 lies at the same redshift as G1. Kochanek et al. (2006) modeled G22 as a singular isothermal sphere and found a best-fit Einstein radius of 0.22 , so the galaxy should provide negligible surface mass density at the location of HE0435, and an isothermal profile should be adequate. We do, however, generalise by letting G22 be elliptical and use a singular isothermal ellipsoid model. We also add an independent external shear to account for tidal effects from the group of galaxies surrounding the lens (Morgan et al. 2005;Wong et al. 2011).
We use an updated version of the public lensmodel code (Keeton 2001) both to find the best-fit smooth model and to run nested sampling. The resulting Bayesian evidence is reported in Table 4 below. The best-fit model has χ 2 = 24.6 for N dof = −1. Since the model is formally underconstrained, finding χ 2 = 0 indicates that the model is not sufficiently flexible-it lacks some key freedom. To diagnose the failure, we show in Figure 3 the distributions of flux ratios predicted by the smooth model, compared with the observed values. 2 The smooth model is unable to account for the observed A/C flux ratio at high confidence. This constitutes a clear "flux ratio anomaly" of the sort that has been seen in other lenses (e.g., Mao & Schneider 1998;Bradač et al. 2002;Metcalf & Zhao 2002;Keeton et al. 2003Keeton et al. , 2005 and interpreted as evidence for dark matter substructure (e.g., Metcalf & Madau 2001;Dalal & Kochanek 2002).

FEW-CLUMP MODELS
Motivated by the anomaly, we hypothesise that HE0435 contains substructure and examine how well we can test that hypothesis and constrain the properties of the substructure. In this section we consider whether it is possible to explain the data with just one or a few clumps that (presumably) lie near the lensed images. In Section 6 we study full populations of substructure.

Approach
We first need to reflect on where we might expect mass clump(s) to be. Since the observed A/C flux ratio is higher than predicted by smooth models, we need substructure to increase the predicted ratio, which means making image A brighter or C fainter. In HE0435, A and C are both positive-parity images. Substructure tends to make positiveparity images brighter and negative-parity images fainter (Schechter & Wambsganss 2002;Keeton 2003), so we expect to need a clump near image A. We do not place a clump near image C, because that would tend to make C brighter and exacerbate the flux ratio anomaly. (This is the reason we have chosen to compute flux ratios relative to image C.) We imagine that there might be additional clumps near images B and/or D. To be specific, we consider a model with one clump near image A, a model with two clumps near A and B, a different model with two clumps near A and D, and a model with three clumps near A, B, and D. For simplicity, we label these four models A, AB, AD, and ABD, respectively. The models clearly have different numbers of parameters, but the Bayesian analysis can provide an objective ranking of the models (through the evidence) along with constraints on the clump positions and masses (through parameter marginalisation).
We model the clumps with a spherical pseudo-Jaffe profile, which has a three-dimensional density ρ(r) ∝ 1/r 2 (a 2 + r 2 ) that translates into a two-dimensional surface mass density of the form where b clump sets the mass scale while a represents a truncation radius. The total mass for this model is M total = πΣcritb clump a. Pseudo-Jaffe clumps are efficient lenses because they have a steep, isothermal slope inside the trun- cation radius; clumps with shallower profiles (e.g., NFW) are less efficient lenses and could therefore lead to different model results. We select the pseudo-Jaffe model because it includes the effects of tidal truncation, and its role in earlier studies (e.g., Dalal & Kochanek 2002;Vegetti et al. 2010a) facilitates the comparison of previous results with our results using new methodology. We defer a systematic study of clump density profiles to follow-up work. Following Dalal & Kochanek (2002), we set a = bG1 b clump,max to account for tidal truncation of a pseudo-Jaffe profile by the parent halo in an approximate but reasonable way. Here bG1 is the average mass normalisation of G1 and b clump,max is the maximum mass scale parameter for the clump. For HE0435, this works out to be a = 0.367 . Table 3 provides a complete list of the parameters for our few-clump models. We vary all of the parameters using nested sampling to obtain both evidence values and parameter constraints. We then use the constrained clump parameters to compute the clump masses. We can compute the total mass of the pseudo-Jaffe model, but we expect the quantity that is more relevant for lensing (especially flux ratios) to be the mass within the Einstein radius. We quote both but give particular attention to the mass within the Einstein radius.

Results
We first add a single clump near image A to our macromodel. Table 4 gives the Bayesian evidence along with constraints on the clump parameters. Figure 4 shows that adding the clump does much to alleviate the discrepancy between the predicted and observed flux ratios: the predicted value is now A/C = 1.72 +0.11 −0.10 . In fact, the model is able to reproduce the data perfectly, with a best-fit value of χ 2 = 0. In some sense this is not surprising because the model is underconstrained with N dof = −4, but we saw before that being underconstrained does not guarantee a perfect fit. Appar- ently the clump provides some important freedom that was not present in the smooth, minimal model. Figure 5 shows constraints on the position of the clump near image A for different mass ranges. We find that the clump position and mass are degenerate in the sense that a more massive clump can lie farther from the image and still reproduce the observed flux ratio. This is familiar from previous studies (e.g., Dalal & Kochanek 2002;Keeton 2009), and not surprising because, heuristically, flux perturbations are driven by shear perturbations of the form δγ ∝ M/d 2 , where d is the distance of the clump from the image (see Mao & Schneider 1998). In principle, a star placed close to the image can produce the same magnification as a more massive clump placed farther away (provided the source is sufficiently small). Adding position constraints can break the degeneracy, though. Position perturbations are driven by deflection perturbations of the form δα ∝ M/d Keeton 2009). Thus, if both the position and flux are affected by a clump, the different scalings make it possible to constrain the clump mass. On the one hand, a very lowmass clump is simply unable to affect the image position, regardless of its location; on the other hand, a high-mass clump may disturb the image position too much, and may affect other images as well. In HE0435 these effects let us find bounds on the mass of clump A: log 10 (M A Ein ) = 7.65 +0.87 −0.84 (or log 10 (M A total ) = 9.31 +0.44 −0.42 ). 3 We conclude that the clump is well constrained by the combination of both flux and position data.
We next consider models with more than one clump near the images. We keep the clump near image A because it seems essential, but try placing additional clumps near B and/or D. Table 4 gives the evidence values and parameter constraints for the various models. Figure 4 shows that adding more clumps does not significantly alter the predicted A/C flux ratio. This, in turn, implies that additional clumps have little effect on the inferred properties of clump A (see Table 4).
We can assess the relative probabilities of the various models using the Bayesian evidence values in Table 4. The models with clumps all have evidence values that are at least three orders of magnitude greater than our minimal smooth model. Clearly, the data strongly prefer models with at least one clump near the lensed images. According to the Jeffreys' scale (Table 2), the case for at least one clump is decisive for the range of models considered here. Examining the evidences in detail, we see that model AB has the highest evidence, with a value that is 0.63 dex higher than for the single-clump model. Formally, this provides "substantial" evidence for a second clump near image B with mass log 10 (M B Ein ) = 6.55 +1.01 −1.51 (or log 10 (M B total ) = 8.76 +0.50 −0.77 ). Since the evidence values carry uncertainties of 0.16 dex, the case for clump B is intriguing but far from decisive.
It is striking to see that adding a clump near image D has essentially no effect on the Bayesian evidence: the evidences for the A and AD models are indistinguishable (given the uncertainties), and likewise for the AB and ABD models. Apparently the parameters associated with clump D do not significantly improve the models' ability to reproduce the data, so they produce little change to the evidence. This is an example of Occam's Razor in action.
It is interesting to study how the addition of clumps affects the parameters of the smooth mass distribution. Figure  6 shows joint posterior probability distributions for several key parameters, before and after adding a clump near image A. In general, adding the clump broadens the distributions, which is not surprising because of the increased flexibility afforded by the clump. For some parameters the posterior also develops structure indicative of covariances in the 14dimensional parameter space (see the right-hand panel of Fig. 6). Even so, the median values of the distributions are not significantly altered, typically shifting within the 68% confidence intervals of the no-clump model.
One notable parameter is the density slope of the main lens, β. Using the quasar image positions and (estimated) time delays, the Einstein ring image of the quasar host galaxy, and a prior on H0, Kochanek et al. (2006) found the slope to be shallower than isothermal, corresponding to β > 1.0 in our models. We obtain β = 1.19 +0.13 −0.13 and 1.19 +0.17 −0.15 for our models without and with clump A, respectively. Thus, we find evidence for a shallow density profile independent of time delay constraints, and that result is not affected by the presence of substructure.
The mass constraints we find on substructure in HE0435 are a first for quasar lenses. Previous work in the radio and mid-infrared has provided good evidence for substructure but has not necessarily yielded upper and lower bounds on clump masses (e.g., Chiba et al. 2005;Minezaki et al. 2009 ) have been able to place specific constraints on clump masses. In all of those cases, however, the substructure was linked to a luminous satellite whose position could be constrained from direct observations. Fixing the position breaks the positionmass degeneracy that is inherent in flux perturbations, and thus can lead to very good constraints on mass (σM ∼ 0.1-0.3 dex). Our results for HE0435 are novel in two ways. First, we have been able to constrain clump positions and masses from lens data alone, with no direct observations of the clump(s). This shows that it is possible to constrain clumps even if they are invisible. Second, the masses we have found for clumps A and B are among the smallest found in any lens system to date. 4 While we believe these conclusions to be new and interesting, we do offer one cautionary note. Our clump constraints have been derived in the context of a fairly simple macromodel. Adding more flexibility to the macromodel, such as higher-order multipole modes or pixellated potential perturbations (e.g., Evans & Witt 2003;Yoo et al. 2005Yoo et al. , 2006Blandford et al. 2001;Koopmans 2005), would presumably alter the inferred clump constraints (e.g., broaden the mass uncertainties). However, Congdon & Keeton (2005) found that such features alone cannot explain flux ratio anomalies, and Yoo et al. (2005Yoo et al. ( , 2006 found that elliptical symmetry seems to be a reasonable assumption for lens galaxies. We therefore expect that adding reasonable flexibility might weaken but not eliminate the evidence for substructure in HE0435.

SUBSTRUCTURE POPULATION MODELS
So far we have concentrated on a few individual clumps near the lensed images of HE0435. Those clumps are presumably not the only examples of substructure in the system, but rather special representatives of a larger population (special in that they lie near an image). In this section we aim to constrain a full substructure population of the sort predicted by CDM (e.g., Diemand et al. 2007;Springel et al. 2008).

Model
We assume the clump population is characterised by a power law mass function of the form dN/dM ∝ M α with α = −1.9 Springel et al. 2008) and fixed lower and upper total mass thresholds of M total = 10 7 M and 10 10 M , respectively. We assume the clump positions are drawn from a uniform spatial distribution out to 10 from the centre of the lens. While realistic substructure may not be spatially uniform (e.g., Zentner et al. 2005;Springel et al. 2008;Nierenberg et al. 2011), using a uniform distribution facilitates comparison with previous work (e.g., Dalal & Kochanek 2002). Moreover, the choice of spatial distribution should not be terribly important for our results, because we focus on observables (image positions and flux ratios) that are mainly sensitive to clumps in the vicinity of the Einstein radius (e.g., Rozo et al. 2006;Keeton 2009).
We characterise the abundance of substructure using the mean convergence, κs = Σs/Σcrit, where Σs is the mean  Table 4. Marginalised clump parameters and evidence values for models which add the specified clump(s) to our minimal, smooth model. We quote differential log evidence values relative to the smooth model to facilitate model comparison (see Section 3 and Table  2). Positions are given in arcsec. The clump mass is the mass within the Einstein radius, in units of h −1 70 M . surface mass density in substructure (averaged over many realisations), and Σcrit is the critical surface mass density for lensing. For lensing purposes it is convenient to work with the scaled clump mass, m = M total /Σcrit, which has units of angular area. If the number density of clumps per unit mass of dn/dm, then the substructure convergence is Before undertaking extensive simulations, we would like to see if we can use the clumps inferred so far to estimate the properties of the larger population. While such an estimate must be taken with a grain of salt, it may help guide our exploration of parameter space. In Appendix A we present a toy model for the probability distribution P (κs) based on the idea that a clump population should have one clump in a location that leads to a strong flux perturbation in image A. That analysis leads to an estimate of κs = 0.025 +0.074 −0.022 (95% CL). Therefore, we consider the values of κs = [0.00022, 0.00046, 0.001, 0.0022, 0.0046, 0.01, 0.022, 0.046, 0.10] in our simulations.
For technical reasons, the clumps used in our popula-tion models differ from those used in our few-clump models in two small ways. First, instead of a smoothly truncated pseudo-Jaffe profile we use a sharply truncated isothermal profile whose density is ρ ∝ r −2 inside the truncation and zero outside. Second, the truncation radius is not fixed but scales with clump mass such that the ratio of the truncation radius to the Einstein radius is fixed to a ratio of ≈60.
(When we use a wide range of clump masses, it seems more sensible to have the truncation radius scale with mass than to use a some fixed value.) While in detail these differences may influence the distributions of deflections and magnifications produced by clumps, they should not significantly affect our results.

Approach
The complete set of parameters for this analysis includes the smooth model parameters (denoted by θ), the substructure convergence (κs), and the positions and masses of individual clumps (denoted by c). The quantity we seek is the posterior distribution for κs after marginalising over θ and c, The likelihood L depends explicitly on the clump positions and masses and the smooth model parameters; κs enters only implicitly, through the number of clumps, so we have written it in the priors P (c|κs). The latter factor also includes the priors on the clump positions and masses described above. Finally, P (θ) indicates priors on the smooth model parameters (see Table 3), and Ztot is a normalisation factor. Formally the integrand in eqn. (8) may have hundreds or thousands of dimensions, depending on the number of clumps, so we cannot evaluate it directly. Instead, we use Monte Carlo techniques. Let cj denote one particular realisation of the clump population. Suppose we generate Nc realisations for a particular value of κs. Then heuristically we can let We can then write the marginalised posterior for κs as Here κs is implicit on the right-hand side because it determines the number of clumps. Let us decompose this expression a little bit further. We can think of the θ integral as the Bayesian evidence for the macromodel, given a particular clump realisation, so let us put We can then rewrite eqn. (10) as In other words, the marginalised posterior for κs is the average macromodel evidence over many clump realisations (up to an overall normalisation factor). In principle, we just need to use nested sampling to evaluate the θ integral for each of many realisations, and then take the average. There are two practical issues. First, there is some statistical uncertainty in the average due to having a finite number of realisations. We set Nc = 5000 in order to sample the distribution well enough to achieve a statistical uncertainty of ∼ 20% in the evidence marginalised over clump realisations (verified with jackknife estimates).
Second, in our current implementation it can take several hours or more to run the θ nested sampling for a given clump realisation, so it is impractical to do the full evidence calculation for all 9×5000 cases. Instead, we explore the possibility of using the minimum χ 2 value (which is much easier to determine) as a proxy for the evidence in each case. The minimum χ 2 provides a measure of the peak log likelihood, so it may be more or less indicative of the evidence depending on whether the width of the likelihood distribution is fairly regular or irregular. For a subset of clump realisations we find both the best χ 2 (by optimising across θ) and the full evidence (by integrating over θ), and we compare the two quantities in Figure 7.
Looking first at the case with κs = 0.001, we see distinct patterns in the points. At values χ 2 6.5, we find that χ 2 is a very poor predictor of evidence: realisations with similar χ 2 values can have evidence values that span many orders of magnitude. This presumably occurs because certain realisations can produce good fits only for a highly tuned set of macro parameters, meaning the likelihood distributions are narrow in θ and the evidences are small; whereas other realisations can have a much larger range of macro parameters that produce reasonable fits. (See Fig. 8 for more discussion.) Above χ 2 ∼ 6.5, there is a tighter relation between χ 2 and evidence. The evidence decreases with χ 2 up to χ 2 ∼ 21, at which point the evidence values become consistent with the evidence for the smooth, minimal model. The patterns generally persist as κs increases, although with more scatter. We attribute the scatter to having more clumps and thus a wider range of substructure perturbations, some of which make the model more consistent with the data but many of which go in the opposite direction.
In Figure 7 we also plot the mean and scatter in the evidence values for bins of χ 2 . We use these to create a "lookup" scheme to convert from χ 2 to evidence for subsequent realisations. Specifically, for each realisation we optimise across the macro parameters θ to find the best-fit χ 2 . We interpolate between bins to find the mean and scatter in the log evidence for that χ 2 value. We then draw from the appropriate log-normal distribution to assign an evidence value to this realisation. With this process it becomes feasible to complete 5000 realisations for each value of κs.
During the course of this analysis, we initially found that some clump realisations led to unreasonably large bestfit values for the ellipticity of the neighbor galaxy G22 (eG22 ∼ 0.9). In order to prevent this, we adopted a mild Gaussian prior of 0.0 ± 0.2 on the two quasi-Cartesian ellipticity components. While ad hoc, this prior is tight enough to prevent unrealistic values for the ellipticity and to stabilise the χ 2 values, yet broad enough to allow a large range of ellipticity. Figure 9 shows the cumulative distribution of χ 2 values for different values of κs. As κs increase the distribution of χ 2 values gets broader; in other words, with more substructure there is a higher chance that images will be perturbed and the model will move away from the smooth case. The scatter goes in both directions: some clump realisations make the fit better, while others make it worse. That is in the nature of stochastic substructure. Figure 10 shows the average evidence as a function of the substructure convergence. We find that models with κs 0.001 have evidence values that are some three orders of magnitude higher than models with little or no substructure. According to the Jeffreys' scale, this is additional strong evidence for substructure in HE0435. Moreover, we find that the ∆ log 10 (evidence) values for population models are similar to those for few-clump models (Table 4), indicating that the data are consistent with, but do not objectively favour, millilensing by a full population of clumps.

Results
We can translate the substructure convergence, κs, into Figure 7. Best-fit χ 2 values and evidences for a subset of our simulations (gray points). We actually plot the differential log evidence relative to the smooth, minimal model. The three panels show results for κs = 0.001, 0.01, and 0.10. In all cases, the best-fit χ 2 is a poor predictor of the evidence for χ 2 6.5. For small values of κs, the evidence is tightly correlated with χ 2 for χ 2 6.5, and it is consistent with the smooth model value for χ 2 21. As κs increases, so does the scatter in evidence values. Overplotted are the mean and scatter in ∆ log 10 (Evidence) for χ 2 bins, which are used to convert from χ 2 to evidence for additional realisations (see text). Figure 9. For a given κs, we generate many substructure realisations, find the best χ 2 for each one, and plot the cumulative distribution of the resulting χ 2 values. Different colours indicate different κs values ranging from 0.00022 (black) to 0.10 (light orange). The vertical dashed line marks the optimised χ 2 for our smooth, minimal model. As κs increases, the χ 2 distribution broadens because some substructure realisations improve the fit while others worsen it. a substructure mass fraction at the Einstein radius. Our power law macromodels have κ0 ≈ β/2 at the Einstein radius (see, e.g., Kochanek 2002), so the local substructure mass fraction at the Einstein radius is f sub = κs/κ0 ≈ 2κs/β. As noted in Section 5.2, we find β ≈ 1.19 with ∼14% uncertainty. Thus, given that we can decisively rule out values κs 0.00046, we conclude that f sub > 0.00077 in HE0435 at high confidence. At present we do not obtain an upper limit on f sub ; the Bayesian evidence remains high for values as large as f sub ≈ 0.20. We speculate that such high substructure mass fractions are permissible thanks to the flexibility and freedom available to our macro model.
Our lower bound f sub > 0.00077 is consistent with, but weaker than, other lensing-based measurements. Using a sample of seven radio quads, Dalal & Kochanek (2002) found 0.006 < f sub < 0.07 at 90% confidence. In the case of SDSS 0946+1006, Vegetti et al. (2010b) found f sub = 0.0215 +0.0205 −0.0125 (68% confidence), assuming α = −1.9 ± 0.1 for the slope of the substructure mass function and total mass thresholds M total = 10 6.6 M and 10 9.6 M . Both results point to values of f sub that are higher than the values found in N-body simulations (f sub ∼ 0.002-0.003; e.g., Diemand et al. 2007;Springel et al. 2008;Xu et al. 2010). It is striking that HE0435 provides such strong evidence for substructure yet permits f sub values that are low and fully Figure 8. Spatial distributions of clumps for two realisations with κs = 0.001 (left) and κs = 0.01 (right). Triangles mark the image positions, while circles indicate clumps with the circle size proportional to the clump Einstein radius. Each realisation shown here provides a reasonable fit to the data and has at least one clump near image A (with spatial locations similar to our few-clump models). Note, however, that the realisations in the top row have worse χ 2 values but much higher evidence values than the realisations in the bottom row. We conjecture that when there is a massive clump near the images (as in the bottom row) the smooth component is confined to a smaller region of parameter space, leading to a narrower posterior distribution and hence a lower evidence value. consistent with CDM predictions. This result might imply that HE0435 simply has less substructure than some other lenses. Alternatively, it might indicate that something about HE0435 makes it less effective than some other lenses at constraining substructure. It is important to remember that flux ratios are mainly sensitive to clumps near the images, so they directly probe a small fraction of the lens galaxy halo, whereas f sub describes the global substructure population. Regardless of whether lens galaxies have different amounts of substructure or simply different strengths of anomalies (due to the stochastic nature of substructure lensing), it is clear that future studies will need to examine ensembles of lenses (like the seven used by Dalal & Kochanek 2002 but ideally even more) to produce strong, robust constraints on the global properties of dark matter substructure in galaxies.
Since our models permit, and other lensing results imply, f sub values higher than CDM predictions, it is worth considering how such a discrepancy might arise and whether it presents a challenge to CDM. On the theory side, one possible concern is that the number of surviving subhalos in N-body simulations might be underestimated, due to the lack of baryons in most simulations. In general, baryons are expected to cause dark matter (sub)halos to contract and become more concentrated, and that may make subhalos less susceptible to tidal disruption (e.g., Dolag et al. 2009). On the other hand, however, baryons also make the parent halo more concentrated, and that might raise the rate of tidal disruption (e.g., Romano-Díaz et al. 2010). Other possible concerns lie on the lensing side. Currently, the number of lenses available for studying f sub is small (of order 10) and  (Table 2), these results provide decisive evidence for substructure in HE0435.
thus sensitive to both statistical uncertainties and selection effects. Lensing is generally biased toward more massive and concentrated galaxies, perhaps with preferential orientations along the line of sight (e.g., Rozo et al. 2007;Mandelbaum et al. 2009). Furthermore, lens galaxies tend to lie in overdense environments (e.g., Momcheva et al. 2006), and the environment may boost the abundance of substructure (Oguri 2005). More work needs to be done to understand whether selection effects in lensing propagate into a significant bias in f sub .
Beyond our quantitative constraints on f sub , there are two aspects of our analysis worth highlighting. First, we have examined both individual clump models and full substructure population models for a given system, and sought to connect them. This is a first for quasar lenses. For galaxygalaxy strong lenses, Vegetti et al. (2010b) have detected a clump via gravitational imaging, and used that to infer f sub by an analysis similar in concept to what we present in Appendix A.
Another key feature of our analysis is the method used for studying substructure populations. Here, the only comparable study is that of Dalal & Kochanek (2002). Due to the complexity and computational demand of the study, Dalal & Kochanek chose to linearly reoptimise the macromodel and then work with the best χ 2 for each substructure realisation. By contrast, we have fully marginalised the macromodel and worked with the Bayesian evidence. We have found that the best χ 2 value can be an unreliable tracer of the evidence, at least in the HE0435 system. If such behavior occurs in the systems and models studied by Dalal & Kochanek, it could affect the recovered f sub values. Additionally, Dalal & Kochanek assumed a uniform mass for their substructure population. Here, we have considered a more realistic population with a mass function like that seen in N-body simulations. Due to the computational demand of our simulations, we have so far only examined one value of α and one range of clump masses. Future work will explore the dependence of inferred f sub values on these parameters.

IMPLICATIONS
While we have focused on using image positions and flux ratios to probe substructure in HE0435, our models have additional implications that we explore in this section.

K-band flux ratios
Our substructure models are able to account for the observed R-band flux ratios and, due to their similarity, the L flux ratios as well. As noted in Section 2, however, the B/C flux ratio is a factor of 1.27 higher in K than in the other passbands. This anomaly is perplexing because the K-band emission (rest-frame 0.82 µm) presumably originates from the quasar accretion disk, so the K and R sources should both be small compared with the Einstein radii of dark matter clumps and should therefore experience similar lensing magnifications.
Since differential dust extinction is not likely in HE0435 (Wisotzki et al. 2003), we hypothesise that the K-band anomaly is caused by stellar microlensing that happened to be stronger at the time of the infrared observations than during the earlier optical monitoring by Kochanek et al. (2006). Under this hypothesis, the (nearly) contemporaneous L flux ratios were not anomalous because the L emission originates from a larger region that is less susceptible to microlensing. We test whether this hypothesis is reasonable by simulating microlensing near image B using the ray-shooting code of Wambsganss (1999).
We simulate microlensing in a box with side length L = 15REin, where REin is the Einstein radius for the mean star mass. This box is chosen to be large compared with the source, but small enough that we can apply a uniform convergence and shear across the box. We set the convergence and shear to the values predicted by our best-fit two-clump (AB) lens model: κ local = 0.694 and γ local = 0.486. To divide κ local into the contribution from stars (κ ) and a contribution that is smooth on the scale of the box (i.e., from dark matter), we use results from star+dark matter lens models for HE0435 by Kochanek et al. (2006). Those models yield κ = 0.05 log 10 (rch/kpc), where rc is the NFW scale radius in the models. For the range of values, 2.5 < rc < 20 , considered by Kochanek et al. (2006), the κ values lie between 0.05 and 0.10. We try both extremes.
For each value of κ we generate 100 random realisations of the stellar distribution, drawn from a mass function dN/dm ∝ m −1.3 over the range 0.01-1.5 M . Such a mass function agrees with measurements from the Galactic bulge (Gould 2000) and has been used in previous microlensing studies (Morgan et al. 2010;Poindexter & Kochanek 2010). With this mass function and the source/lens redshifts appropriate for HE0435, the Einstein radius for the mean stellar mass is log 10 (REin/ ) = −6.1.
For each realisation we use ray shooting to construct a magnification map with resolution L/1024. We then convolve the map with a uniform circular source whose size corresponds to the expected size of the K-band emission region. To estimate the source size, we start with the empirical results for I-band sources from Morgan et al. (2010), and then scale from I (rest-frame 0.26 µm) to K using the familiar relation R ∝ λ 4/3 from Shakura & Sunyaev (1973). This yields a K-band source size of log 10 (Rsrc/ ) = −6.1 +0.5 −0.7 , Figure 11. Probability distributions for the B/C flux ratio under the influence of microlensing. The three curves correspond to different source sizes: median size (dashed/blue), 68% CL high value (solid/black), and 68% CL low (dot-dashed/red), using the size distribution from Morgan et al. (2010). The vertical dotted line indicates the value needed to reproduce our K-band observations. We find that small source sizes 0.48 or 0.73 (in units of the Einstein radius of the mean star mass) can reproduce the observations well for κ * = 0.05 or 0.10, respectively.
or Rsrc/REin = 0.2-3.1 (68% CL). Already we see that the K-band source has a size that should make it sensitive to microlensing. Figure 11 shows results from the microlensing simulations. In general, it is not hard for microlensing to increase the magnification of image B by a factor of 1.27. To go into more detail, we consider what range of source sizes can yield a microlensing boost > 1.27 more than 16% of the time (corresponding to a two-sided 68% CL). This requires a source size of Rsrc/REin 0.48 (0.73) for κ = 0.05 (0.10). Such sizes are well within the estimated range for the K-band source (Rsrc/REin ∼ 0.2-3.1), so it seems quite plausible that microlensing can create the additional magnification necessary to explain the observed K-band flux ratio.
At the same time, we must consider whether microlensing would affect the L flux enough to disturb the agreement between the model and data. In a "worst case" scenario we might imagine that all the L emission originates from the accretion disk. In this case the L -band source size would need to be Rsrc/REin 1.0-1.3 to keep the predicted flux ratio consistent with the observed value. With the Shakura & Sunyaev scaling this would imply an associated K-band source size of Rsrc/REin 0.54-0.63, which is compatible with the range of values needed to reproduce the observed K-band flux ratio. Since it is likely that some of the L emission originates from the larger torus, which would be even less sensitive to microlensing, we infer that the observed L -band flux ratio is not inconsistent with significant microlensing in the K-band.
Given uncertainties in the source sizes, stellar density, stellar mass function, and measured flux ratios, we conclude that microlensing provides a plausible explanation for the observed B/C flux ratio in the K-band. Confirmation of this hypothesis will be possible with future K-band observations to quantify variability in the flux ratios.

Time delays
After our modeling was complete, Courbin et al. (2010) presented new data for HE0435 including a measurement of the velocity dispersion of the lens galaxy and new estimates of the time delays between the images derived from six years of photometric monitoring. The new time delays have values similar to those obtained by Kochanek et al. (2006) but errorbars that are a factor of ∼2.6 larger. The primary origin of the increased uncertainties lies in the analysis methods used by the two teams. In particular, Courbin et al. (2010) find a larger uncertainty in the arrival time of image A.
There is no obvious way to post-process our modeling results to apply the new time delay constraints in a rigorous fashion. Nevertheless, we can compare the distribution of time delays predicted by our models with the new measurements. This test is statistically meaningful because we did not place any constraints on time delays when fitting the models. Figure 12 shows the comparison. We find that models without substructure are somewhat discrepant from models that include a few clumps. Furthermore, the predictions from our few-clump models are in good agreement with the new time delays measurements. It is clear, though, that adding time delay constraints would further improve the models. As a check, we have also examined predicted time delays for our population models (not shown). These, too, are in good agreement with the new data, and may benefit even more from new constraints because populations of substructure tend to broaden time delay distributions (see Keeton & Moustakas 2009). The improvement would be particularly relevant for models with high f sub values, since time delays scatter with στ ∝ √ f sub . We plan to include the new time delays, as well as velocity dispersion constraints, in future work.

CONCLUSIONS
We have conducted a fully Bayesian analysis of the lens HE 0435−1223. Combining astrometry from HST with flux ratios from ground-based monitoring, we probe the mass distribution down to milli-arcsecond scales, and assess various models for substructure using the Bayesian evidence. We also examine multi-wavelength properties of the lens. We summarise our conclusions as follows: • The observed flux ratio of images A and C cannot be Figure 12. Cumulative probability distributions of the time delays predicted by three models: smooth (black), smooth + clump near A (blue), and smooth + clumps near A and B (red). The shaded grayscale indicates the newly-measured time delays from Courbin et al. (2010). We find that models with substructure generally predict time delays that agree with the data, even though the observed values were not used as constraints, whereas models without substructure do less well. Future studies will benefit from adding time delay constraints.
reproduced by macroscopic, smooth lens models. This failure cannot be due to microlensing or intrinsic variability of the background quasar, since monitoring has quantified such variations. Instead, we find that adding a single clump near image A whose mass within the Einstein radius is log 10 (M A Ein ) = 7.65 +0.87 −0.84 (in units of h −1 70 M ) can account for the data.
• We consider other sources of substructure by including additional clumps near other lensed images. Using the Bayesian evidence to compare the various possibilities, we find that a model with clumps near images A and B is most favoured. This model has an evidence that is 0.63 dex greater than our single clump model, and implies a mass for clump B of log 10 (M B Ein ) = 6.55 +1.01 −1.51 . However, the modest increase in the evidence, coupled with evidence uncertainties, makes the case for clump B less decisive than the case for clump A.
• We also examine a full ensemble of subhalos, using a mass function consistent with CDM predictions (dN/dM ∝ M −1.9 over the range 10 7 -10 10 M ) and varying the abundance of substructure. By examining the Bayesian evidence, we infer the mass fraction of substructure to be f sub > 0.00077 near the Einstein radius. Our measurement of f sub , unlike other lensing-based measurements, is fully consistent with that predicted by CDM simulations (f sub ∼ 0.002-0.003).
• As part of our substructure analysis, we find that optimising the macromodel for each realisation of substructure cannot necessarily substitute for marginalising the macromodel. In the Bayesian framework, full marginalisation is important.
• Near-infrared flux ratio measurements in the K and L passbands generally agree with those from optical monitoring. The lone exception is the K-band value of the flux ratio B/C. We show that stellar microlensing provides a plausible explanation for the K-band flux of image B, if the K-band source has size log 10 (Rsrc/ ) −6.24. Estimates based on accretion disk measurements by Morgan et al. (2010) suggest that such a size is reasonable. Future K-band observations can test the microlensing hypothesis.

ACKNOWLEDGMENTS
We thank Phil Marshall for a very helpful and insightful referee report, and Dominique Sluse for interesting discussions about HE0435. Work presented here received support from the US National Science Foundation through grant AST-0747311.

APPENDIX A: CONNECTING INDIVIDUAL CLUMPS TO THE POPULATION
In Section 6 we specify the form of the mass function and spatial distribution from which our substructure population is drawn. The free parameter in our analysis is the amount of substructure, characterised by κs. Here we present a toy model to extrapolate from our few-clump models (Section 5) and estimate κs.
For clumps near an image, let A(m) be the area of the "region of influence" for a clump with scaled mass m to produce a perturbation. Focusing on magnification perturbations, we have A(m) ∝ m since magnification perturbations are driven by shear perturbations of the form δγ ∝ m/d 2 where d is the distance of the clump from the image (see Section 5.2). We can set the proportionality constant using our few-clump models: if there is a clump with scaled mass mi at distance di from image i, then we put where for simplicity we let Ai be the geometric area πd 2 i . Under these assumptions, the mean number of clumps that lie within the region of influence for image i is where κs,i is the substructure convergence near image i (qv. eqn. 7). In our main analysis we assume a uniform substructure convergence, so κs,i = κs for all images, but this framework could be made more general. The probability that there are Ni clumps affecting the image is a Poisson distribution of the form Inspired by our few-clump models, we consider the probability that there is one clump affecting image A: Since λA depends on κs, we can reinterpret this as a probability distribution for κs. Specifically, in a Bayesian framework with flat priors, the posterior distribution for κs has the form P clump (κs|mA, AA) ∝ AA mA κs exp − AA mA κs (A5) where mA is the mass of the clump near image A, and AA = πd 2 A is the area defined by the distance dA of the clump from the image. This distribution has a peak at κs = mA/AA, a mean of κs = 2mA/AA, and a standard deviation of σκ s = √ 2mA/AA. In practice we account for uncertainties in mA and AA by integrating over the allowed range, where P (mA, AA) is the posterior probability for the parameter pair (mA, AA), and the sum runs over allowed pairs. Us-ing this "final" P clump (κs) curve we find the median value and 95% confidence range κs = 0.025 +0.074 −0.022 .