Estimating the dark matter halo mass of our Milky Way using dynamical tracers

The mass of the dark matter halo of the Milky Way can be estimated by fitting analytical models to the phase-space distribution of dynamical tracers. We test this approach using realistic mock stellar halos constructed from the Aquarius N-body simulations of dark matter halos in the $\Lambda$CDM cosmology. We extend the standard treatment to include a Navarro-Frenk-White (NFW) potential and use a maximum likelihood method to recover the parameters describing the simulated halos from the positions and velocities of their mock halo stars. We find that the estimate of halo mass is highly correlated with the estimate of halo concentration. The best-fit halo masses within the virial radius, $R_{200}$, are biased, ranging from a 40\% underestimate to a 5\% overestimate in the best case (when the tangential velocities of the tracers are included). There are several sources of bias. Deviations from dynamical equilibrium can potentially cause significant bias; deviations from spherical symmetry are relatively less important. Fits to stars at different galactocentric radii can give different mass estimates. By contrast, the model gives good constraints on the mass within the half-mass radius of tracers even when restricted to tracers within 60kpc. The recovered velocity anisotropies of tracers, $\beta$, are biased systematically, but this does not affect other parameters if tangential velocity data are used as constraints.


INTRODUCTION
Our Milky Way (MW) galaxy provides a wealth of information on the physics of galaxy formation and the nature of the dark matter. This information can, in principle, be unlocked from studies of the positions, velocities and chemistry of stars in the Galaxy, its satellites and globular clusters, which can be observed with high precision.
Many inferences derived from the properties of the Milky Way (MW) depend on the precision and accuracy with which the mass of its dark matter halo can be estimated. An example is the much-publicized "too big to fail" problem, the apparent lack of MW satellite galaxies with central densities as high as those of the most massive dark matter subhalos predicted by ΛCDM simulations of 'Milky Way mass' hosts (Boylan-Kolchin et al. 2011, 2012Ferrero et al. 2012). In these simulations the number of massive subhalos depends strongly on the assumed MW halo mass and the problem disappears if the MW halo mass is sufficiently small ( ∼ < 1 × 10 12 M⊙; Wang et al. 2012;).
There are many different ways of constraining the MW dark matter halo mass 1 . These include timing argument estimators (Kahn & Woltjer 1959) calibrated against N -body simulations (Li & White 2008); the kinematics of bright satellites (Sales et al. 2007b,a;Barber et al. 2014;, particularly Leo I ) and the Magellanic Clouds (Busha et al. 2011;González et al. 2013); the kinematics of stellar streams (Newberg et al. 2010), especially the Sagittarius stream (Law et al. 2005;Gibbons et al. 2014); measurements of the escape velocity using nearby high velocity stars, such as those from the RAVE survey (Smith et al. 2007;Piffl et al. 2014); and combinations of photometric and kinematic data such as Maser observations and terminal velocity curves (McMillan 2011).
Some authors have used large composite samples of objects assumed to be dynamical tracers in the halo, such as stars, globular clusters and planetary nebulae. For example, the halo circular velocity, Vcirc, may be inferred from the radial velocity dispersion of tracers, σr(r), using the spherical Jeans equation. Such methods require the tracer velocity anisotropy and density profiles to be known or assumed. Battaglia et al. (2005) made use of a few hundred stars and globular clusters from 20 to 120 kpc; Xue et al. (2008) used 2401 BHB stars from SDSS/DR6 ranging from 20 to 60 kpc; Gnedin et al. (2010) used BHB and RR Lyrae stars ranging from 25 to 80 kpc; and Watkins et al. (2010) used 26 satellites within 300 kpc. Most recently Kafle et al. (2012Kafle et al. ( , 2014 used a few thousand BHB stars extending to 60 kpc and K-giants beyond 100 kpc. Most measurements based on dynamical tracers involve assumptions about the tracer density profiles and velocity anisotropies. However, Wilkinson & Evans (1999) introduced a Bayesian likelihood analysis, based on fitting a model phase space distribution function to the observed distances and velocities of tracers. In their analysis the tracer density profile and velocity anisotropy can be considered as free parameters of the distribution function, to be constrained together with parameters of the host halo such as its mass and characteristic scalelength. The sample of stars used by Wilkinson & Evans (1999) was small and their best-fit host halo mass for a truncated flat rotation curve model was 1.9 +3.6 −1.7 × 10 12 M⊙ (see also Sakamoto et al. 2003). Most recently,  used a few thousand BHB stars from SDSS up to r ∼50 kpc. Fig. 1 summarizes the results of these studies. The x-axis is the measured MW halo mass. We have converted results to M200 by assuming an NFW density profile (Navarro et al. 1996a(Navarro et al. , 1997a and using the mean halo concentration relation of Duffy et al. (2008) in cases where a value for concentration is not given in the original study. The measurements are grouped by methodology, indicated by colours and labeled along the y-axis. We group those methods that use large samples of dynamical tracers into two sets: 1) those based on the radial velocity dispersion of the tracers and spherical Jeans equation to infer the circular velocity and underlying potential; 2) those based on fitting to model distribution functions, which attempt to constrain both halo mass and the velocity anisotropy of the tracers simultaneously. Errorbars correspond to those quoted by the original authors; we have converted 90% or 95% confidence intervals to 1σ errors assuming a Gaussian distribution. Wilkinson & Evans (1999), Sakamoto et al. (2003) and Watkins et al. (2009) included systematic errors in their measured masses, which makes their errors relatively large. Fig. 1 shows that existing measurements of the most likely MW halo mass differ by more than a factor of 2.5, even when similar methods are used, although apart from a few outliers, the estimates are statistically consistent.
Here we are particularly interested in methods such as that of Wilkinson & Evans (1999), which treat the spatial and dynamical properties of tracers as free parameters to be constrained under the assumption of theoretical phase space distribution functions. The primary aim of this paper is to test the model distribution functions used in this approach. We extend the distribution function proposed by Wilkinson & Evans (1999) to one based on the NFW potential, and model the radial profiles of tracers with a more general double power-law functional form. The model function is then fit to the phase space distribution of stars in realistic mock stellar halo catalogues constructed from the cosmological galactic halo simulations of the Aquarius project (Springel et al. 2008), to understand its reliability and possible violations to the underlying assumptions. Our results have implications that are not limited to the specific form   Measurements using similar methods and/or tracer populations are plotted with the same colour. Categories include the timing argument estimator (black); constraints from luminous MW satellites such as the Magellanic Clouds (magenta), Leo I(cyan), the orbits or radial velocity dispersion of other bright satellites (yellow) and their Vmax distribution (light green); modeling of the Sagittarius tidal stream (grey); high velocity stars from the RAVE survey (blue); combinations of maser observations and and the terminal velocity curve (pink); and (of most relevance to this work) dynamical modeling using large samples of dynamical tracers (red and green). Methods involving large samples of dynamical tracers are split into two categories, 1) those based on the radial velocity dispersion of tracers (green) and 2) those using model distribution function to constrain both halo properties and velocity anisotropies of tracers simultaneously (red). We have converted results to M 200 by assuming an NFW density profile and a common mass-concentration relation. 95% or 90% confidence intervals have been converted to a 1σ error by assuming a Gaussian error distribution.
of the distribution function that we test, but are applicable to the method itself. This paper is structured as follows. The mock stellar halo catalogues are introduced in Section 2. Detailed descriptions of the model distribution function and the maximum likelihood approach are provided in Section 3. Our results are presented in Section 4, with detailed discussions of reliability and systematics in Section 5 and Section 6. We conclude in Section 7. Throughout this paper we adopt the cosmology of the Aquarius simulation series (H0 = 73 km s −1 Mpc −1 , Ωm = 0.25, ΩΛ = 0.75 and n = 1).

MOCK STELLAR HALO CATALOGUE
We use mock stellar halo catalogues constructed from the Aquarius N-body simulation suite (Springel et al. 2008) with the particle tagging method described by Cooper et al. (2010), to which we refer the reader for further details. In this section we summarise the most important features of these catalogues.

The Aquarius simulations
The Aquarius halos come from dark matter N-body simulations in a standard ΛCDM cosmology. Cosmological parameters are those from the first year data of WMAP (Spergel et al. 2003). Our work uses the second highest resolution level of the Aquarius suite, which corresponds to a particle mass of ∼ 10 4 h −1 M⊙.
The simulation suite includes six dark matter halos with virial masses spanning the factor-of-two range of Milky Way observations discussed in the previous section. We have only used five out of the six halos for our analysis (labeled halo A to halo E according to the Aquarius convention). The halo we have not used (halo F) undergoes two major merger events at z < 0.6, and is thus an unlikely host for a MWlike disc galaxy. We list in Table 1 the host halo mass, M200, and other properties of the five halos, which are taken from Navarro et al. (2010).

The galaxy formation and evolution model
The Durham semi-analytical galaxy formation model, GAL-FORM, has been used to post-process the Aquarius simulations, predicting the evolution of galaxies embedded in dark matter halos. To construct the mock stellar halo catalogues used in this paper, the version described by Font et al. (2011) was adopted. This model has several minor differences from the model of Bower et al. (2006), such that the Font et al. (2011) model matches better the observed luminosity function, luminosity-metallicity relation and radial distribution of MW satellites. The main changes are a more self-consistent calculation of the effects of the photoionization background and a higher chemical yield in supernovae feedback.

Particle tagging
The GALFORM model predicts the amount of stellar mass present in each dark matter halo in the simulation at each output time, as well as properties of stellar populations such as their total metallicity. However, GALFORM does not provide detailed information about how these stars are distributed in galaxies. The particle tagging method of Cooper et al. (2010) is a way to determine the sixdimensional spatial and velocity distribution of stars from dark matter only simulations, by associating newly-formed stars with tightly bound dark matter particles.
At each simulation snapshot, each newly formed stellar population predicted by GALFORM is assigned to the 1% most bound dark matter particles in its host dark matter halo. Each "tagged" dark matter particle then represents a fraction of a single stellar population, the age and metallicity of which are also known from GALFORM. Traced forward to the present day, these tagged particles give predictions for the observed luminosity functions and structural properties of MW and M31 satellites that match well to observations. Recently, Cooper et al. (2013) have applied this technique to 10 1 10 2 r [kpc] 10 -4 10 -3 10 -2 10 -1 10 0 10 1 10 2 10 3 Halo E 10 1 10 2 r [kpc] Halo D 10 -6 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 10 1 10 2 10 3 stellar mass[M ⊙ ]/kpc 3 Halo C Halo B 10 -6 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 10 1 10 2 10 3 Halo A Figure 2. The radial density profiles of stellar mass (red points and lines with errors) in the mock stellar halo catalogue of the five Aquarius halos. The errors are in most cases of comparable size to the symbols and are almost invisible. Dashed black curves are double power-law fits to the red data points obtained from a χ 2 minimization. Green dashed curves are the best-fit density profiles from the maximum likelihood method.
large-scale cosmological simulations and have shown that it produces galactic surface brightness profiles that agree well with the outer regions of stacked galaxy profiles from SDSS.
Our study is based on tagged dark matter particles from accreted satellite galaxies. We ignore particles associated with in situ star formation in the central galaxy. Strictly, our results thus only apply in the case where most MW halo stars originate from accretion. This is supported by the data ofBell et al. (2008,2010) although other work suggests that a certain fraction of the halo stars are contributed by in-situ star formation, especially close to the central galaxy (r < 30 kpc) (see, e.g. Carollo et al. 2007Carollo et al. , 2010Zolotov et al. 2010;Helmi et al. 2011). Ignoring the possible in-situ component is thus a weakness of our mock stellar halo catalogue. Nevertheless, our mock halo stars enable us to test and constrain the theoretical distribution function and, in practice, most of our conclusions (see Sec. 4 and Sec. 5) do not depend on whether the MW halo stars formed in-situ or were brought in by accretion.

METHODOLOGY
In this section we discuss the theoretical context of our method for constraining dark matter halo properties using dynamical tracers and a maximum likelihood approach based on theoretical distribution functions. In Sec. 3.1, we describe how the phase-space distribution of the tracer population is modeled. Sec. 3.2 gives details about the explicit form of the distribution function. The likelihood function is introduced in Sec. 3.3. Finally, we describe how we weight tagged particles and how errors are estimated in Sec. 3.4. Our method follows that of Wilkinson & Evans (1999) but introduces significant modifications to the form of the dark matter halo potential and the assumed tracer density profile.

Phase-space distribution of Milky Way halo stars
The phase-space distribution function of tracers (e.g. stars) bound to a dark matter halo potential (binding energy E > 0) can be described by the Eddington formula (Eddington 1916). The simplest isotropic and spherically symmetric case is where the distribution function only depends on the binding energy per unit mass, E = Φ(r) − v 2 2 . Φ(r) and v 2 /2 are the underlying dark matter halo potential and kinetic energy per unit mass of tracers. The integral goes from the potential at the tracer boundary 2 to the binding energy of interest. Usually both the zero point of potential and tracer boundary, rmax,t, are chosen at infinity, and thus Φ(rmax,t) = 0.
In reality the velocity distribution of tracers may be anisotropic and depend both on energy and angular momentum, L. In the simplest case, the distribution function is assumed to be separable: where the energy part, f (E), is expressed as (Cuddeford 1991) f Here β is the velocity anisotropy parameter defined as with vr, v θ and v φ being the radial and two tangential components of the velocity. The integer, m, is chosen to make the integral converge and depends on the value of β. In our analysis the parameter range of β is −0.5 < β < 1 and m = 1. β > 0 represents radial orbits, while tangential orbits have β < 0. β = 0 corresponds to the isotropic velocity distribution.
In real observations, the tangential velocities of tracers are often unavailable. We thus test two different cases, in which i) only radial velocities are available and ii) both radial and tangential velocities are available. For case (i), the phase-space distribution in terms of radius, r, and radial velocity, vr, is given by the integral over tangential velocity, where C denotes a set of model parameters. With the Laplace transform, this can be written as where Er = Φ(r) − v 2 r /2. All factors of m cancel in the Laplace transform and hence Eqn. 6 does not depend on m. For case (ii), the distribution function is simply Eqn. 2, i.e.
There are two parameters in Eqn. 8 and Eqn. 9, the scalelength, rs, and the scaledensity, ρs, defined at r = rs. r max,h is the halo boundary. If the halo is infinite, the second term in Eqn. 8 vanishes. In most of our analysis, we will assume the NFW halo is infinite. We test different choices of halo boundary in the Appendix C.
To derive analytical expressions for Eqn. 6 and Eqn. 7, we need an analytical form for the tracer density profile, ρ(r). Fig. 2 shows the radial density profile of stellar mass (red points) in each of the five Aquarius halos. Error bars are obtained from 100 realizations of bootstrap resampling. In most of the cases, these profiles can be described well by a double power law (black dashed lines are double power-law fits that minimize χ 2 ). Significant deviations from a double power law are most obvious in the outskirts of the halos. For example, halo E has a prominent bump at r ∼ 100 kpc due to a tidal stream.
There are indications that the real MW has a twocomponent profile, with density falling off more rapidly beyond ∼ 25 kpc, whereas M31 has a smooth profile out to 100 kpc with no obvious break (e.g. Watkins et al. 2009;Deason et al. 2011;Sesar et al. 2011). Recently, Deason et al. (2014) report evidence for a very steep outer halo profile of the MW. If we believe that MW halo stars originate from the accretion of dwarf satellites, whether the profile is broken or unbroken depends on the details of accretion history (Deason et al. 2013;Lowing et al. 2014). There is an as yet unresolved debate over whether the stellar halo of the MW has an additional contribution from stars formed in situ, in which case a break in the profile may reflect the transition from in situ-dominated regions to accretion dominated regions.
As our mock halo stars (which are all accreted) and observed MW halo stars can be approximated by a double power-law profile, we adopt the following functional form to model tracer density profiles: This equation has three parameters: the inner slope, α, the outer slope, γ, and the transition radius, r0.
Previous studies have adopted a single power law to describe the density profile of MW halo stars beyond r ∼ 20 kpc (e.g. Xue et al. 2008;Gnedin et al. 2010;Wilkinson & Evans 1999). Our double power-law form naturally includes this possibility as a special case. We also note that Sakamoto et al. (2003) considered the case of "shadow" tracers with a radial distribution that shares the same functional form with the underlying dark matter. We emphasize that our mock halo stars are not "shadow" tracers; their radial distribution is significantly different from that of the dark matter.
Assuming these analytical expressions for Φ(r) and ρ(r), Eqn. 6 and Eqn. 7 can be written more explicitly as P (r, vr|ρs, rs, β, α, γ, r0 and P (r, vr, vt|ρs, rs, β, α, γ, r0) = Here, analogously to Wilkinson & Evans (1999), we have introduced a characteristic velocity, vs = rs √ 4πGρs. The binding energy, ǫ, angular momentum, l, potential, φ, and radius, R, have all been scaled by vs and rs and are thus dimensionless, as follows, As mentioned above, Rmax,t is the boundary of the tracer distribution and, for most of our analysis, we take Rmax,t = ∞. Note that Eqn. 11 and Eqn. 12 are deduced by assuming the tracer boundary, Rmax,t, is smaller or equal to the halo boundary, R max,h . In both Eqn. 11 and Eqn. 12 there are six model parameters.
The phase-space probability of a tracer at radius, r, whose radial and tangential velocities are vr and vt, can be derived from Eqn. 11 or Eqn 12. The lower limit of the integral is determined by solving where ǫ equals φ(R) − v 2 r /(2v 2 s ) when only the radial velocity is available, and ǫ equals φ(R) − v 2 r /(2v 2 s ) − v 2 t /(2v 2 s ) when tangential velocity is also available. The fact that the integral goes from Rinner to Rmax,t indicates that the phasespace distribution at radius r has a contribution from tracers currently residing at larger radii, whose radial excursion includes r.
Dynamical tracers, such as MW globular clusters, BHB stars and satellites, are subject to selection effects. For example, sample completeness is often a function of apparent magnitude (hence distance). If we assume that all selection effects can be described by a window function, then the probability of finding each tracer object, i, within the data window is given by the normalized phase-space density The integral in the denominator runs over the phase-space window. The likelihood function then has the following form: It can easily be shown that this likelihood function is equivalent to the extended likelihood function marginalized over the amplitude parameter of the phase-space density (e.g. Barlow 1990), which we are not interested in. For our mock MW halo star catalogue, we deliberately exclude stars in the innermost region of the halo. These stars have extremely high phase-space density and so make a dominant contribution to the total likelihood, strongly biasing the fit. We find that excluding all stars at r < 7 kpc removes this bias 3 . The window function in our analysis is then simply P = 0 at r < 7 kpc. In real observations, the window function can be much more complicated.
We seek parameters that maximize the value of the likelihood function defined in Eqn. 16 and Eqn. 17. In order to search the high-dimensional parameter space efficiently, we use the software IMINUIT, which is a python interface of the MINUIT function minimizer (James & Roos 1975).
There are six parameters in Eqn. 11 or Eqn. 12. To make best use of the likelihood method, we treat all six parameters as free. In previous work using this approach the three parameters of the spatial part of the tracer distribution are often fixed to their observed values. We have carried out tests and found that, as expected, three parameter models give results consistent with those using six parameters only when the choice of tracer density profile is close to the true distribution. We recommend that all six parameters should be left free if the observed sample size is large enough to avoid introducing unnecessary bias.
Another source of potential bias in the halo mass estimates of previous studies arises from the use of universal mean mass-concentration relations for dark matter halos. In CDM simulations, the relation between halo mass and concentration has very large scatter (e.g. Neto et al. 2007). Taking halo A as an example, if we use the mass concentration relation from Duffy et al. (2008), the estimated concentration would be around 5.7, which is almost three times smaller than the true value (see Table 1). This would result in an overestimate of halo mass by almost an order of magnitude, and the corresponding scalelength, rs, would be three times larger. The huge scatter in the mass-concentration relation can cause catastrophic problems unless we are fortunate enough that the host halo of the MW does in fact lie on the mean mass-concentration relation.

Weighting tagged particles
As described in Sec. 2.3, our mock catalogues are created by assigning stars from each single age stellar population to the 1% most bound dark matter particles in their host halo at the time of their formation. The total stellar mass of each population will obviously vary from one population to the next (according to our galaxy formation model), as will the number of dark matter particles actually tagged (according to the number of particles in the corresponding formation halo). The result is that stellar masses associated with individual dark matter particles range over several orders of magnitude. Particles tagged with larger stellar masses correspond to more stars, and thus in principle should carry more weight in the likelihood fit.
To reflect this we could simply reweight each particle according to its associated stellar mass, M * ,i. However, individual stars are not resolved: the phase-space coordinates of the underlying dark matter particles comprise the maximum amount of dynamical information available from the tagging technique. Therefore, we give each particle a weight (M * ,i/ΣiM * ,i)Ntags. This conserves the total particle number, Ntags, but re-distributes this among particles in proportion to the fraction of the total stellar mass they represent. In this way we maintain a meaningful error estimated from the likelihood function.
We also randomly divide stars into subsamples and apply our maximum likelihood analysis to each of these to estimate the effects of Poisson noise. To do so, we assign each weighted particle a new integer weight drawn from a Poisson distribution with mean equal to the weight given by the expression above. We repeat this procedure 10 times, so that we have 10 different subsamples. The expectation values of the total weight for all tagged particles in these subsamples are the same, so this approach can be regarded as analogous to bootstrap resampling. We find this procedure yields consistent error estimates with that obtained from the Hessian matrix of the likelihood surface.
We restrict our analysis to the 10% oldest tagged particles in the main halo. This is to reflect the fact that, in real observations, old halo stars such as blue horizontal branch (BHB) and RR Lyrae stars are most often used as dynamical tracers, because they are approximately standard candles. We also exclude stars bound to surviving subhalos.

Testing the method
Before fitting the model distribution function to our realistic mock stellar halo catalogues, we test the method with ideal samples of particles that obey Eqn. 12. We applied our maximum likelihood method to 750 sets of 1000 phase-space coordinates (r, vr and vt) each drawn randomly from the same distribution function of the form given by Eqn. (12). Fig. 3 shows a comparison between the input halo parameters and the recovered best-fit halo parameters. The x axis is the ratio between the best-fit and true-input halo mass, and the y axis the ratio between best-fit and true concentration. The red cross indicates the mean ratios averaged over all the 750 realizations, which is very close to unity (horizontal and vertical dashed lines).
The best-fit halo mass and concentration varies among realizations as a result of statistical fluctuations, as shown in  Figure 3. The ratio between input and best-fit halo masses (xaxis) versus the ratio between input and best-fit halo concentrations (y-axis). The red cross is the mean ratio over 750 different realizations, which is very close to 1 on both axes (horizontal and vertical dashed lines). Black solid contours mark the region in parameter plane enclosing 68.3% (1σ) and and 95.5% (2σ) of best fit parameters among the 750 realizations. Fig. 3. We note that the shape of these contours indicate a correlation between the recovered halo mass and concentration parameters. The correlation coefficient (i.e., normalized covariance) is -0.89, which implies a strong degeneracy in the model parameter. We will discuss this degeneracy further in Sec. 5. The above exercise reassures us that our method works with ideal tracers, so we can move on to apply it to the more realistic mock halo stars in our simulations.

RESULTS
In this section we investigate how well the true halo mass can be recovered by fitting Eqn. 11 to mock halo stars in cases where: (a) only radial velocities are available (Sec. 4.1) and (b) both radial and tangential velocities are available (Sec. 4.2). In both cases we model the underlying potential with infinite halo boundaries. We refer to parameters estimated with the maximum likelihood technique as best-fit (or measured) parameters, to be compared with the real (or true) parameters taken directly from the simulation. The total number of tagged particles we used in the five halos is shown in Table 2. These are of the order of 10 5 , one or two orders of magnitude larger than the tracer samples used by  or Kafle et al. (2012). This permits a robust test of the method free from the effects of sampling fluctuations. Future samples of observed tracers will grow with ongoing and upcoming surveys such as APOGEE (Ahn et al. 2013) and Gaia (Prusti 2012;Gilmore et al. 2012). Fig. 4 shows, as black points, the best-fit M200, c200 and β for our five halos in the case where only radial velocities are known. These best-fit parameters are given in Table 1   with the true halo or tracer properties (shaded in grey), which are plotted as red points in Fig. 4. Table 1 lists the true and best-fit values of the host halo mass (M200), halo concentration (c200), scalelength (rs), scaledensity (ρs) and virial radius (R200). Note only two of these parameters are independent. The best-fit M200 values are overestimates of the true values for halos B and D by 140% and 7%, and underestimates for halos A, C and E by 30%, 10% and 35% respectively. These biased estimates of halo mass and concentration are very significant compared with the small fitting errors.

Radial velocity only
The measured spatial parameters (α, γ and r0) agree well with the true values obtained from a double power-law fit to the stellar mass density, shown as black dashed lines in Table 1. best-fit and true model parameters for each of our five halos. The highlighted rows list the true values of model parameters and the subsequent two rows the corresponding best-fit values when using only radial velocities, vr, and using both radial and tangential velocities, vr + vt.   The agreement is especially good on scales smaller than or comparable to the transition radius r0. In the outskirts, differences become noticeable for halos B and C. This is due to the fact that there are fewer stars in these regions and the profiles have a significant contribution from coherent streams. As a result, the direct fitting of radial profiles returns parameters that are slightly different from those obtained from the likelihood technique because the latter also involves fitting to the velocity distribution. In addition, the direct fit is dependent on our choice of radial binning.
The velocity anisotropy, β, is poorly estimated. The model assumes β to have a single value at all radii. However, the true velocity anisotropy in the simulation does depend on radius: the blue solid curve with errors in Fig. 5 is the velocity anisotropy profile of stars in halo A as a function of distance from the halo centre. We also show the mean value of β (0.66 in Table 1) as the blue dashed line. The poor measurement of β is not simply due to radial averaging, because we can see that the estimate of β for halo A, 0.994, is significantly greater than the real value over the whole radial range probed. The black curve shows the radial profile of β for dark matter particles. There is an obvious offset between the velocity anisotropy profiles of stars (tagged particles) and all dark matter, which we will discuss in detail in Appendix A.

Radial plus tangential velocity
Best-fit parameters when tangential velocities are also included are shown as black points in Fig. 6 and in Table 1. Compared with the results when only radial velocities are used, we see a reduction in the overall bias of the best-fit parameters with respect to their true values. However, there are still significant discrepancies between best-fit and true parameters, compared with the small errors. M200 is underestimated for halos A, C, D and E by 40%, 20%, 20% and 15% respectively. For halo B there is a 5% overestimate. Although the measured host halo masses seem to be worse for halos A, C and D, compared to the case where only radial velocities were used, the agreement between measured and true halo concentrations in the same halos is significantly improved. The best-fit spatial parameters, on the other hand, converge to stable values that agree well with true values. Compared with Fig. 4, the measurements in Fig. 6 are much more clearly correlated with the true halo properties. In particular, the shape of the best-fit and true β curves are in good agreement, although there is approximately a constant offset between them. Tangential velocities are therefore essential for measuring tracer velocity anisotropy, but even with this information there can still be a systematic bias in  Figure 7. A phase-space contour plot (kinetic energy, K, versus angular momentum, L) of mock halo stars (green solid curves) and model predictions (red dashed curves). The left column is based on true halo parameters, true β and true spatial parameters of stars. The middle column is identical to the left column, except that β has been fixed to its best-fit value in Table 1 (when both radial and tangential velocities are used). Halo and tracer parameters in the right column have all been fixed to be the best-fit parameters with both radial and tangential velocities. Contours for the five halos are presented in different rows, as indicated by texts in the left column of each row.
the absolute value of β recovered by distribution function modeling. We return to this point in Appendix A.

Overall model performance
We have shown that the degree of bias between true and best-fit values resulting from our fitting procedure differs from halo to halo. In the current subsection we aim to show how well the model works in recovering the overall phase-space distributions of our mock halo stars. Fig. 7 shows phase-space contour plots for mock halo stars (green solid curves) and compares them with the predictions of our model (red dashed curves). We choose to make the contour plot in terms of two observable quantities: kinetic energy, K, and angular momentum, L. Each row shows a different halo. In each column, the choice of model parameters is different. In the leftmost column, we use the true values of all parameters. In the central column, we use the true values for all parameters except β, which is set to be the best-fit value in Table 1 obtained using both radial and tangential velocities. In the rightmost column, we set all parameters to their best-fit values (again using both radial and tangential velocities). Since the green solid curves show data from the simulation, they are identical for all three columns of a given row. The contour levels correspond to the mass density of the 10, 30, 60 and 90% densest cells.
The distribution functions defined by the true parame-ters (red dashed curves) are a poor description of the mock stars in the left hand column, especially for halos A, C and D where we see a significant over prediction of low angular momentum particles. Halo E is the exceptional case in which we find good agreement. The strongest disagreement in the other halos is, interestingly, mainly due to the biased measurement of β. In the central column, where we fix the value of β to its best-fit value (obtained using both velocity components) we see that the model predictions agree much better with the true phase-space distribution, although some discrepancies remain. For comparison, the right hand column of Fig. 7 shows that model predictions based on the best-fit parameters give an equally good match to the simulation data. Judging by eye alone, it would be hard to tell whether the middle column shows better or worse agreement than the right hand column. However, judging according to the likelihood ratios, the best-fit halo parameters are indeed a much better description of the data than the true halo parameters (≫ 3σ). This is also reflected in the small formal errors of the fit. Fig. 7 indicates that the model is able to recover the general phase-space distribution of the mock halo stars, although there are some subtle factors which significantly bias our best-fit parameter values relative to their true values. There are several possible sources of this bias:

SOURCES OF BIAS
• Both the underlying potential and the spatial distribution of tracers may not satisfy the spherical assumption.
• There are ambiguities in how to model the boundaries of halos.
• The true dark matter distribution may deviate from the NFW model.
• The velocity anisotropy (β) is not constant with radius as assumed in the model.
• Correlations among parameters make the model more sensitive to perturbations and, in some cases, a poor fit to one parameter will propagate to affect the others.
• Tracers may violate the assumption of dynamical equilibrium.
We have investigated each of these possibilities and found that violations of the spherical symmetry assumption and ambiguity in the treatment of halo boundaries are relatively unimportant; hence we describe their effects in Appendix B. We have investigated the density profiles of the Aquarius halos and found that halo A is not well fit by an NFW profile; instead its inner and outer density profiles are better described by two different NFW profiles of different mass and concentration. This might explain the systematic underestimation of M200. For the other halos, the NFW form is a good approximation and thus deviations from it are not the dominant source of bias.
In Fig. 6 we showed that the velocity anisotropy, β, varies strongly with radius. The best-fit value of β (which is assumed to be constant) turns out to show a significant bias. However, when we investigate correlations among parameters (in the following subsection) we find that, fortunately, this bias does not propagate to the other parameters of the model. We therefore defer discussion of the origin of the bias in β to Appendix A. In the following, we focus on correlations among model parameters and the dynamical state of tracers, which are the most important factors. Fig. 3 demonstrated a strong degeneracy between M200 and c200. From a modeling perspective, this is dangerous: there are multiple combinations of halo parameters that can give almost equally good fit to both the tracer density profile and velocity distribution. In this subsection we ask what causes this degeneracy and whether there are similar correlations among other parameters. In particular, we have seen that the model gives strongly biased estimates of the velocity anisotropy of stars, β. We want to check whether this bias propagates to the other parameters. To this end, Table 3 gives the normalized covariance matrix of the parameters recovered for halo B, using both radial and tangential velocities as constraints. The covariance matrix is qualitatively similar for the other halos. The degeneracy between M200 and c200 is very strong (covariance close to -1). To understand the origin of this degeneracy, we have explored the velocity distribution of tracers predicted by the model using different combinations of M200 and c200. We verify that, if M200 is increased, the predicted velocity distribution of stars in the centre of the halo extends to larger velocities, with a corresponding reduction in the probability of smaller velocities. A decrease in c200 can roughly compensate for this change in the velocity distribution. In terms of mass profiles, the degenerate parameters predict similar halo masses inside a certain characteristic radius. This radius is very close to the half-mass radius of the tracer population (more discussions will be presented in Han et al., in preparation). Conversely, the mass inside the characteristic radius changes rapidly for M200 and c200 along lines perpendicular to the degenerate contour. This suggests that the mass inside a fixed radius close to the half-mass radius of tracers is better constrained than the total halo mass.

Correlations among model parameters
In Fig. 8, we examine the halo mass profiles with the best-fit parameters, normalized by profiles with the true parameters. Except for halo B, which gives an acceptable result at all radii, the measurements are very close to the true value for r 0.2R200 but become biased at larger radii.
In contrast to the strong degeneracy between M200 and c200, the correlation between β and all the other parameters is very weak. This is fortunate, as it suggests the systematically biased estimate of β will not introduce further bias to the other parameters.
Correlations between halo parameters (M200 or c200) and tracer spatial parameters (α, γ and r0) are at the level of a few tens of percent. An increase in the tracer density . The best-fit total mass inside a fixed radius compared to the true mass inside that radius. Errors are obtained through error propagation from the covariance matrix of ρs and rs, with the correlated error between ρs and rs included. The black dashed line marks equality between the measured and true mass.
outer slope would cause an increase in the recovered halo mass and a corresponding decrease in halo concentration; conversely, an increase in the inner slope would cause a decrease in the halo mass and an increase in the concentration. As a result, uncertainties in the fit to the tracer density profile may further bias the best fit halo parameters. For example, the best-fit (green dashed) curve in the halo C panel of Fig. 2 agrees well with the true profile (red points) inside 170 kpc but is somewhat shallower at larger radii. If we fix the three spatial parameters in our fit to halo C to those given by a conventional reduced-χ 2 best-fit to the tracer density (dashed black curve in Fig. 2) the best-fit halo mass is boosted by about 10%. If the tracer density profiles deviate from the double power-law form, these correlations between halo parameters and spatial parameters would introduce further bias to the best-fit halo mass. Lastly, we note that correlations between the three spatial parameters are strong as well. This quantifies our earlier finding that, in the case of halo A, adding tangential velocities as constraints in the fit makes the outer slope of the tracer density profile shallower and the break radius smaller, but results in very little perceptible change in the overall profile shape. Hence, good fits to the tracer density profiles may not be unique. An increase in r0 can be roughly compensated by a corresponding increase of both α and γ.

Unrelaxed dynamical structures
The model distribution function used in our analysis assumes that the tracer population is in dynamical equilibrium and hence the phase-space density of tracers is conserved. Our mock halo stars are all accreted from satellite galaxies, with a range of accretion times. Some prominent phase-space structures, such as stellar streams, may therefore violate the assumption of dynamical equilibrium. In this section we ask how the presence of unrelaxed dynamical structures affects our results.
We expect the dynamical state of stars in our mock catalogue to depend on the infall redshift of their parent satellite, at least approximately (satellites on different orbits will have different rates of stellar stripping). We might expect to obtain an improved mass estimate if we use only stars from satellites that fell in earlier, because they have had more time to relax in the host potential. To test this, we rank halo stars according to their infall time 4 . We measure the host halo mass and concentration with samples defined by different cuts in infall time, corresponding to roughly the same fraction of stellar mass in each halo. The top and middle rows of Fig. 9 present these parameters as a function of the fraction of stars selected by each cut, for the five different halos. A small fraction corresponds to an earlier mean infall time, but also (obviously) to a smaller sample size. A fraction of 1 means all the mock stars have been included, hence the corresponding parameters are those listed in Table 1.
We see fluctuations in the measured halo properties with infall time, but no obvious trends. Using samples of stars with earlier mean infall times does not seem to reduce the bias between best-fit and true parameters. This may be because the dynamical state of tracers depend on many other factors, such as the orbit of their parent satellites 5 . Samples for which the measured halo masses increase produce a corresponding decrease in the measured concentrations, again reflecting the strong degeneracy between M200 and c200.
To gain more intuition regarding the dynamical state of halo tracers, Fig. 10 shows phase-space scatter diagrams for mock halo stars (radius, r, versus radial velocity, vr). Points are colour coded according to the infall time of their parent satellite, with black points corresponding to satellites falling in earliest and blue, magenta, red and yellow points to successively later infall times. Stars with earlier infall times are clearly more centrally concentrated. For points in Fig. 6 with decreasing fraction of stars that fell in earliest, the corresponding particles in Fig. 10 can be found by excluding yellow, red, magenta and blue points by sequence and looking at the remaining points.
Green curves in Fig. 10 are contours of constant angular momentum and binding energy. There are six contours in total, corresponding to three discrete values of binding energy and two discrete values of angular momentum: dashed lines have a higher angular momentum than solid lines. Smaller maximum radius indicates higher binding energy. It is thus straightforward to see that particles with higher binding energy have smaller velocities and are more likely to be found in the inner regions of the halo. Comparing the solid and dashed contours, we see that increasing angular momentum at fixed binding energy causes significant differences in the inner regions of the halo, while at larger radii the two sets of contours almost overlap.
We can see that points with the same colour trace these contours with some scatter, implying that stars whose parent satellites fall in at a particular epoch share similar orbits. This can be seen more clearly in the bottom right panel, which shows a scatter plot of binding energy versus angular momentum for stars in halo A. Points with the same colour occupy regions covering a narrow range of binding energy. The correlation between infall time and binding energy of subhalos has been studied by Rocha et al. (2012). Here we have shown that stars from stripped subhalos show a correlation between infall redshift and binding energy as well.
Although mock stars trace the green contours overall, we can still see some prominent structures. For example, there are two yellow spurs in the outskirts of halo D and one yellow spur in halo E. These correspond to particles that have only just been stripped from their parent satellites. These stars are far from equilibrium: their exclusion causes the rapid change in M200 and c200 in Fig. 9 between fractions of 1 and ∼ 0.7 in halos D and E.
To confirm that these unrelaxed phase-space structures can affect our results, we have repeated the above exercise excluding all stars whose parent satellites have not been en-tirely disrupted. Corresponding results are shown in the bottom row of Fig. 9, again ranking stars by their infall time. Measured halo masses are clearly affected by excluding stars whose parent satellites still survive. For halos A and C, we see some small fluctuations in the measured halo mass, but the systematic underestimate of the true halo mass still remains. The most dramatic changes occur for halos B, D and E. First, the point corresponding to a fraction of 1 for halos D and E show a significant increase in the recovered mass towards the true values, reinforcing our conclusion that unrelaxed structures are causing significant underestimates of M200 in these halos. In fact, most of the yellow dots in halo D panel of Fig. 10 are stars that have been stripped from satellites that still survive. After excluding these, the two highest-fraction points in the halo D panel are based on almost the same sample of stars. We also notice that fluctuations around the true value for the different fractions are reduced in the bottom row (for example, the two lowest fraction points in halo D).
The effects of excluding halo stars from surviving satellites are more ambiguous in halos B and E. Excluding stars that are considerably unrelaxed should better match the assumption of dynamical equilibrium, but these measurements are entangled with other uncertainties that might get worse. Our conclusion is thus for halo D (and perhaps E) the underestimates of their host halo masses when all particles are used are mainly due to unrelaxed dynamical structures; for the other halos, the effects of unrelaxed dynamical structures are not obvious. A more detailed study quantifying the dynamical state of tracers will be carried out in Han et al. (in preparation).

MODEL UNCERTAINTIES IN THE RADIAL AVERAGE AND IMPLICATIONS FOR REAL SURVEYS
We have seen in the previous section that our maximum likelihood technique recovers different halo mass from sets of tracers with different infall redshifts, or more fundamentally, different binding energies. The sense and magnitude of these differences show no obvious correlations with either quantity, however. Stars falling in earlier typically have high binding energy and are mostly concentrated in the central regions of the halo; since binding energy correlates with radius, we may also expect fluctuations in the recovered halo parameters when using samples of stars drawn from a particular radial range. In this section we investigate this radial dependence. This helps to understand the behaviour of the full model, which averages over all radii. Variations with radius are relevant to observational applications as well, because in practice tracers are often selected from relatively narrow radial ranges, and these ranges may be different for different tracers.
We assign stars to four bins of galactocentric radius: (7-20) kpc, (20-50) kpc, (50-100) kpc and >100 kpc. The model distribution function is then fit to stars in each bin separately. However, in each case the three spatial parame-ters of the tracer distribution are fixed to their best-fit values obtained from tracers over the entire radial range, otherwise we would end up with extremely poor extrapolations based on the local density slope. All the other parameters, M200, c200 and β, are left as free parameters. The window function, Eqn. 16, is modified appropriately for each bin. Fig. 11 shows the measured M200 as functions of the mean radius of each bin, normalized by the halo virial radius (R200). The value of M200 varies significantly with the tracer radius. For halos A, C and E, stars in the outermost (r >100 kpc) and innermost (r < 20 kpc) bins give underestimates, while stars at 20< r <100 kpc give significant overestimates. Similarly, for halos B and D, stars at r >100 kpc give underestimates, whereas stars at smaller radii give overestimates.
The velocity anisotropy of tracers, β, is a function of radius, whereas the model distribution function assumes a single value of β. To test whether the radial average of β may affect our estimates in the host halo mass, we repeated the analysis of Fig. 11 but fixed the value of β in each radial bin to the best-fit value obtained from the whole population. These measurements are almost identical to the measurements presented in Fig. 11, which confirms our result from Sec. 5.1 that the radial averaging of β does not cause further bias in the other parameters.
One feature in Fig. 11 is puzzling at first glance: the best-fit halo masses obtained from the four radial bins individually are all larger than the best-fit halo mass (M200 = 1.15) obtained using stars over the whole radial range. This seems odd, as we might expect that the best-fit M200 would be close to the average of the values estimated from the four radial subsamples. The situation is not that straightforward, however, because the likelihood surfaces from the subsamples are superimposed in two dimensional (M200 and c200 space when the full sample is used. Coupled with the strong degeneracy between the two parameters, the peak of the final likelihood surface is located around a region where the degeneracy lines from different subsamples intersect. In real observations, there is often a maximum radius of tracers corresponding to the instrumental flux limit. In the present literature this limit is typically much smaller than the expected halo virial radius. Beyond this maximum radius, extrapolations are required to fit the distributions of both the dark matter and tracers. We explore the implications of this directly in Fig. 12 by adopting several outer radial cuts (r < rcut) and reporting the estimated halo mass as a function of the cut radius normalized to the virial radius (rcut/R200). Unlike Fig. 11, the three spatial parameters are treated as unknown and left free to be constrained by the fit, in order to mimic real observations where the density profiles of tracers is taken directly from the available data.
The overall trends with rcut in Fig. 12 are very clear: the recovered halo mass is constant at large rcut, and turns up once rcut becomes small (about 40 per cent of R200). We checked the best-fit values of the tracer spatial parameters in each case, and found they do not vary much with the radial cut as long as rcut < 0.4R200. This is because the break radius of tracer density profiles in our mock catalogues are smaller than 0.4R200 for all the five halos, and so the extrapolation in tracer density is not severe. However, once rcut reduces below 0.4R200, the outer slope becomes essentially unconstrained. We believe the turn-up behaviour is due to respect to 'asymptotic' results from the same method using all stars in the halo. Furthermore, instead of being a sharp cut, the radial selection functions of real surveys are often complicated, with non-trivial incompleteness as function of radius and angular position. These selection effects may cause additional bias in the measured host halo mass.

CONCLUSIONS
Several authors have measured parameters of the host halo of our MW, in particular its total mass, by fitting specific forms of the distribution function to the observed distances and velocities of dynamical tracers such as old BHB and RR Lyrae stars, globular clusters and satellite galaxies (Wilkinson & Evans 1999;Sakamoto et al. 2003;. These models assume that the tracers are in dynamical equilibrium within the host potential. With the help of Jeans theorem, the distribution function of the tracers is further assumed to depend only on two integrals of motion, the binding energy, E, and the angular momentum, L. In the case of a separable function of E and L, the distribution function can be obtained through Eddington inversion of the tracer density profile. In this paper we have extended earlier analytical forms of the MW halo distribution function to the case of the NFW potential, which is of most relevance to CDM-based models. We generalized the radial distribution of tracers (halo stars) to a double power law, which is suggested by recent observational results and simulations. We used a maximum likelihood approach to fit this model distribution function to a realistic mock stellar halo catalogue of distances and radial velocities, constructed from the high resolution Aquarius N -body simulations using the particle tagging technique of Cooper et al. (2010). Our aim was to test the model performance and assumptions. We considered cases with and without additional tangential velocity data. Our conclusions are as follows: • The best-fit host halo virial masses and concentrations are biased from the true values, with the level of bias varying from halo to halo.
• Adding tangential velocity data substantially reduces this bias, but does not eliminate it. For example, for halo B the agreement between measured and true halo mass is very good (a 5% overestimate) if tangential velocities are used, but for halo A, a 40% underestimate persists even with this additional constraint. The inclusion of tangential velocities therefore is crucial for accurate measurements of both host halo and tracer properties, especially for the velocity anisotropies of the tracers.
• A strong negative correlation between the host halo mass and the halo concentration is found in our analysis. The two parameters are almost completely degenerate, with covariance very close to -1.
• The model gives a strongly biased measurement of the velocity anisotropies of stars.
• There are various sources that contribute to the biased estimates of halo properties. The two most important factors in our analysis are violation of the dynamical equilibrium assumption and correlations among different model parameters, especially the strong degeneracy between M200 and c200.
• In contrast to the significantly biased measurements of M200 or c200, the model gives good constraints on the total mass inside a fixed radius of about 0.2R200.
The strong degeneracy between halo mass and concentration arises because changes in the corresponding tracer velocity distribution due to the increase of one of these parameters can be roughly compensated by the other. This warns us that it is dangerous to constrain the total mass of the MW halo with this specific form of the distribution function. Small perturbations, for example from dynamically hot structures, may cause significant bias. Similar degeneracies between model parameters and the robustness of the best constrained mass within a fixed radius have been reported and discussed in previous work (e.g. Deg & Widrow 2014; Kafle et al. 2014;Wolf et al. 2010), although these models are quite different from ours. Further information needs to be incorporated into the model to break this strong degeneracy and give better constraints on halo properties.
In addition to the degeneracy between halo mass and concentration, relatively weak but still significant correlations exist between these halo parameters and the three parameters describing the spatial variation of tracer density. We found that a steeper inner slope gives a lower estimate of halo mass, while a steeper outer slope gives an higher estimate. If the true tracer density profile deviates from the double power-law form, the resulting bias will be propagated to the best-fit values of the halo parameters.
The correlations between velocity anisotropy, β, and all other parameters are very weak. This is fortunate, because we know that the model can give highly biased estimates of β for stars; this particular bias is not propagated to the other parameters.
The model distribution function requires tracers to be in dynamical equilibrium, with time-independent phase-space density. In reality, stars stripped from satellite galaxies can have highly correlated orbits that violate this assumption. We were able to test how well the assumption holds for our mock halo stars. Perhaps surprisingly, we do not find any systematic correlation of the recovered halo mass with the infall redshift of tracer subsamples. This suggests that the dynamical state of halo tracers depends on other factors, such as their orbits, and not only their infall time. Dynamical relaxation is nevertheless a factor: excluding stars stripped from surviving satellites improves the agreement between best-fit and true halo masses in two cases (halos D and E). This cut eliminates dynamically hot structures that can be identified by eye in these halos.
Beyond all these assumptions and uncertainties in the model itself, in real observations the maximum observable radius of dynamical tracers may be much smaller than the halo virial radius. We found tracer subsamples selected over different ranges of radius can give significantly different estimates of the host halo mass, even if the three parameters describing the density of tracers are fixed to be those derived from the whole tracer population. An outer radius limit results in biased measurements if it significantly smaller than the virial radius. For example, the recovered halo masses of halos A, B, C and D converge for outer radius limits larger than r ∼ 0.4R200 but give systematically larger masses for smaller radial limits. For one halo, E, this overestimation occurs for limits r 0.8R200. There are two reasons behind this radial dependence: stars at different radii have significantly different dynamical state and extrapolations to larger and smaller radii become less accurate when only a limited radial range is sampled.
We conclude that methods to estimate the mass of the Milky Way halo using the kind of distribution functions we have investigated here need to be used with extreme caution. This is particularly true when estimating the total virial mass. Restricting the estimate to the mass interior to r ∼ 0.2R200 is considerably more reliable. In any case, mock catalogues like those we have analyzed here (and made publicly available in Lowing et al. (2014)) are required to assess the reliability of any particular mass estimation method. In Fig. 6 we showed that there is a systematic bias between the best-fit and true value of β. We now show in the left panel of Fig. A1 the phase-space distribution of stars in halo A (top panel, binding energy, E, versus angular momentum, L) and the one dimensional angular momentum distribution at two fixed values of E (the second and third panels from the top, respectively) indicated by the horizontal dashed lines in the top panel. 6 E and L have been normalized so that they are dimensionless (see Eqn. 13). We only show results based on halo A; for the other halos our conclusions are the same.

ACKNOWLEDGMENTS
In the top panel we see that, for a fixed value of E, there is an upper limit to L which increases with decreasing E. This is the maximum allowed value of angular momentum, corresponding to circular orbits with zero radial velocity at fixed E. In the next two panels, we see that the angular momentum distributions at different values of E have similar features. They are both flat at small values of L and drop quickly when L approaches its upper limit. For comparison, we also plot two lines of the form F (E, L) ∝ L −2β , where β is the velocity anisotropy obtained from particles in the energy slice or the full sample. Neither model could give a satisfactory description of the empirical distribution. This implies that the physical interpretation of the power-law index in our distribution function as β is inaccurate. The true distribution function must be more complex.
We also notice that the best-fit value of β for halo A is 0.458 (Table 1), which predicts a power-law slope that is still a poor match to the empirical distributions in Fig. A1. If we fix the power-law slope in the model according to the true anisotropy of the full sample, this results in better agreement with tangential velocity distribution but a much poorer agreement with the radial velocity distribution. Afterall, our maximum likelihood approach is designed to fit the velocity and spatial distributions of stars, not the distributions of binding energy or angular momentum.
The β profile of dark matter particles have been studied in earlier works. For example, Wojtak et al. (2008Wojtak et al. ( , 2009 looked at the distribution functions of dark matter particles in halos of mass 10 14 to 10 15 M⊙. Although the details of their modeling and the mass range of halos are different from ours, their model distribution function can recover well the true β of dark matter particles in their simulation. We therefore examine the angular momentum distribution of dark matter particles in our simulations in the three right panels of Fig. A1. We find the mean velocity anisotropy for dark matter particles is about 0.3 (see black dashed line in Fig. 5). This agrees better with the shape of dark matter L distributions (green dashed lines), although there are still obvious discrepancies. Red dashed lines are predicted from the velocity anisotropy of dark matter particles in the particular binding energy range being probed. At E ∼ 10 −0.4 v 2 s , the agreement between the red dashed lines and the shape of the L distributions is quite poor, whereas at a lower binding energy (E ∼ 10 −1.1 v 2 s ), we see a better agreement. We have looked at many different choices of E in this regard, and found that for less bound dark matter particles, their velocity anisotropy correctly predicts the power-law slope of their L distribution. However, for dark matter particles that are more tightly bound, the velocity anisotropy is not as well correlated with the power-law slope of the L distribution. This is the same as the stellar case, although the discrepancy for dark matter particles is smaller.
Stars in the stellar halo are clearly a biased population of tracers with respect to dark matter particles in the simulation. Their orbits are more radial (Fig. 5) corresponding to a higher β. However, the difference in β is not because stars are more dynamically bound than dark matter particles: we have explicitly checked that, for a given fraction of the most bound dark matter particles, orbits are still more tangentially biased than stars with the same range of binding energy. The bias in β thus has more fundamental physical origin. First of all, in our model halo stars are all accreted from subhalos, while dark matter particles are added to the main halo by both clumpy and smooth accretion. We have calculated the velocity anisotropy of dark matter accreted from subhalos only, and found these particles are more radially biased than all the dark matter particles as a whole. This is probably because the clumps in which these particles are accreted (i.e. subhalos) have more radially biased orbits. Furthermore, halo stars in our analysis are tags placed on the most bound dark matter particles in progenitor subhalos, which have then been stripped and mixed into the main halo. Lowing et al. (2014) have found the halo stars are dominated by contributions from a few massive satellites. As the most bound parts of these satellites have been stripped into the halo, the satellites are more likely to have been on highly radial orbits, imparting a radial bias to halo stars. In contrast, dark matter particles enter the main halos in our simulations through quite different mechanisms, with both clumpy and smooth accretion ).

APPENDIX B: MODEL UNCERTAINTIES IN THE SPHERICAL ASSUMPTION
In our analysis and the majority of existing studies of using dynamical tracers to constrain the MW host halo mass, x -x y -y z -z all subsample 0.8  Figure B1. The host halo masses obtained by fitting to samples of stars drawn from different sky directions. Halos have been rotated based on their dark matter distribution. The x-axis and z-axis in the rotated system are chosen to be the major and minor principles of the dark matter halo. Measurements are repeated for six survey cones centered at ±x, ±y and ±z direction, with opening angle ± π 4 . Results based on all stars are plotted as the right most point.
both the tracer population and the underlying potential are modeled assuming spherical symmetry. However, we know dark matter halos in N-body simulations are triaxial (e.g. Jing & Suto 2002), and the spatial distribution of tracers is unlikely to be perfectly spherical either. It is thus necessary to investigate how the triaxial nature of the underlying dark matter potential and tracer populations affect our results.
To do this, we first rotate the five halos to a new Cartesian coordinate system defined by their principle axes. In this rotated system, the z-axis is aligned with the minor axis of the halo and the x-axis with major axis of the halo. We then repeat our analysis using six subsamples of tracers drawn from mock 'survey' cones pointing along each of the three axes from the origin at the centre of the halo, in the positive and negative directions. The opening angle of each cone is ± π 4 . Fig. B1 shows the recovered host halo masses for each of the six cones. Tangential velocities are included in the fit. For a direct comparison, we have also plotted results based on all tracers, as the right most point (from Table 1). There are significant variations in the results obtained from surveys along different directions, ranging from only a few percent (halo A) to as much as a factor of two (halo D and E). Halos D and E have the most obvious variations. We have explicitly checked that the significant overestimate along the positive y-axis of halo D is due to the existence of four relatively massive subhalos (M subhalo /M 200,host > 1%), while the large variation in halo E is due to one prominent stream (see the yellow dots in the bottom panel of Fig. 10 or Fig. 6 in Cooper et al. (2010)), which ranges from r ∼ 80 kpc (∼ 0.3R200) all the way to the virial radius.
These variations are, however, almost random and uncorrelated with the choice of any particular principle axis, and they change from halo to halo. Halo A has the smallest variation, with all results well below the true host halo mass (red dashed line). Although stronger variations are seen for halo C, all results are again well below the true host halo mass. Thus despite the fact that the variations between directions can be as large as a factor of two, this does not seem to be the dominant cause of the systematic differences between the best-fit and true halo mass in our analysis.

APPENDIX C: UNCERTAINTIES IN MODELING THE HALO BOUNDARIES
For all the analysis in the main text, we have been assuming the spatial extent of both NFW halos and tracers are infinite (r max,h = ∞ and rmax,t = ∞). It is, however, necessary to investigate whether the different choices of halo and tracer boundaries could affect our measured halo properties.
As we have mentioned in Sec. 3.2, in principle tracer boundaries (rmax,t) can be different from halo boundaries. Here for simplicity we assume r max,h = rmax,t. We tried four different choices of halo boundaries (r max,h ), ranging from twice to five times the halo virial radius. We avoid using boundaries at exactly the halo virial radius because our mock halo stars can extend beyond R200, while the mass distribution in the FoF group distribute continuously and extend further than R200. A sharp cut at R200 is thus not realistic.
The best-fit host halo masses and concentrations as functions of halo boundaries are presented in Fig. C1. The velocity anisotropy β almost does not change with the different choices of halo boundaries, and thus we do not show them. Dashed red lines are true values of halo masses and concentrations.
The measured halo masses increase with the decrease in halo boundaries, and the halo concentrations decrease accordingly, reflecting again the strong degeneracy between the two parameters. The choice of halo boundary that gives the best match between measured and true halo mass varies from halo to halo. For halo B, the best fit halo masses and concentrations almost do not change with the choice of halo boundaries when r max,h 3R200 and agree well with the true values. At r max,h = 2R200, the measured host halo mass gets significantly larger, indicating for halo B finite halo boundaries do not help to improve the fitting. For halo C, r max,h = 2R200 gives a very good match between the bestfit and true halo mass and concentration, demonstrating a finite halo boundary works better than infinite boundaries at least for halo C.
The estimated halo mass of halo A is closest to the true value when halo boundary is chosen at twice the virial radius. However, the estimated halo concentration at that boundary deviates significantly from the true concentration, suggesting the discrepancy between best-fit and true host halo mass of halo A could not be dominated by how boundaries are modeled. For halo D and E, we know already because of the existence of unrelaxed dynamical structures, the host halo masses are significantly underestimated, and thus the best match between measured and true halo parameters at twice (halo E) and four times (halo D) the virial radius demonstrates the entangling of different model uncertainties, which canceled with each other to give a good prediction.

APPENDIX D: DARK MATTER PARTICLES AS TRACERS
In Sec. A we show the velocity anisotropies of dark matter particles agree better with the distribution function model than those of stars. Furthermore, dark matter particles are more radially extended than stars and might probe better the underlying potential in outskirts. Thus we ask whether better constraints on the halo properties can be achieved by using dark matter particles as tracers. Obviously, it is not possible to directly observe the dynamics of dark matter, but asking this question helps to deepen our understanding of the model. The answer is, unfortunately, no. By using dark matter particles as tracers we end up with significant overestimates of the host halo mass, at least for our five Milky Way analog halos. These measurements are shown in Table D1, where we have used a randomly selected subsample of all dark matter particles in the halo FoF group (one particle out of every 5,000 in the simulation). We have explicitly checked that this conclusion does not change if we randomly select different subsamples of dark matter, remove dark matter particles in substructures or restrict them to be inside the halo virial radius. Both radial and tangential velocities have been used in this analysis.
To explore the reasons behind this, we present in Fig. D1 phase-space contour plots for dark matter particles and stars in the simulation and compare these with realizations drawn directly from the model distribution function. We only show plots based on halo A; for the other halos the conclusion is the same. Distributions of binding energy versus radial velocity, vr, tangential velocity, vt, and radius, r,  Figure D1. The phase-space density of dark matter particles or stars in halo A (green solid) and model predictions (red dashed). The contours mark the 10th, 30th, 60th and 90th percentiles of the 2D density distribution in parameter plane. We present contour plots of binding energy, E, versus radius, r, radial velocity, vr, and tangential velocity, vt. For simplicity we only use the magnitudes of vr and vt, so all quantities are positive. In deducing the binding energy, we use the analytical NFW potential model. Green contours in the left and middle columns are based on dark matter particles in the simulation, while in the right column we plot contours for stars. For the left, middle and right columns, true halo parameters, dynamical best-fit halo parameters from dark matter particles and true halo parameters are adopted in the potential model respectively. The lines are iso-density contours that contain the 10,30,60 and 90% densest cells. have been plotted separately, so that we are able to see how well the model prediction agrees with the true distribution of vr, vt and r for dark matter particles in the simulation. It is very clear to see that, with true halo parameters, the model predictions deviate significantly from the empirical distribution of r, vr and vt at low binding energy (left column). On the other hand, the best fit model agrees much better with the data (middle column). This improved agreement is caused by an overestimate of the host halo mass, leading to a deeper potential and increased binding energy for tracer particles. As a result, the sample becomes more dynamically bound and agrees better with the model.
Our conclusion is therefore that, although the model distribution function gives a better approximation in the velocity anisotropy of dark matter particles, the predicated phase-space distribution at the low binding energy end is very poor. By construction, the distribution function only describes closed systems. This not only requires that the tracer population should be bound and truncated, but also results in a sharp cut-off in binding energy that is a poor description of particle distribution in the low energy end. Compared with all dark matter particles, the star particles in our simulation are more dynamically bound and centrally concentrated (see the right column) and are thus not sensitive to the low binding energy tail of the dark matter distribution. Using stars rather than dark matter particles therefore insulates the fit from deficiencies of the distribution function model at the low binding energy tail.