The old nuclear star cluster in the Milky Way: dynamics, mass, statistical parallax, and black hole mass

We derive new constraints on the mass, rotation, orbit structure and statistical parallax of the Galactic old nuclear star cluster (NSC) and the mass of the supermassive black hole. We combine star counts and kinematic data from Fritz et al (2014), including 2'500 line-of-sight velocities and 10'000 proper motions. We show that the difference between the proper motion dispersions sigma_l and sigma_b cannot be explained by rotation, but is a consequence of the flattening of the NSC. We fit the surface density distribution of stars in the central 1000"by a spheroidal cluster with scale ~100"and a much larger nuclear disk component. We compute the two-integral distribution function f(E,Lz) for this density model, and add rotation self-consistently. We find that: (i) The orbit structure of the f(E,Lz) gives an excellent match to the observed velocity dispersion profiles as well as the proper motion and line-of-sight velocity histograms, including the double-peak in the v_l-histograms. (ii) This requires an axial ratio of q= 0.73+-0.04 for r<70"consistent with our determination from star counts. (iii) The NSC is approximately described by an isotropic rotator model. (iv) Using the corresponding Jeans equations to fit the proper motion and line-of-sight velocity dispersions, we obtain best estimates for the NSC mass, black hole mass, and distance M*(r<100")=(8.94+-0.31|stat+-0.9|syst)x10^6Msun, Mbh=(3.86+-0.14|stat+-0.4|syst)x10^6Msun, and R0=8.27+-0.09|stat+-0.1|syst kpc, where the systematic errors estimate additional uncertainties in the dynamical modeling. (v) The combination of the cluster dynamics with the S-star orbits around Sgr A* strongly reduces the degeneracy between black hole mass and Galactic centre distance present in previous S-star studies. A joint statistical analysis with the results of Gillessen et al (2009) gives Mbh=(4.23+-0.14)x10^6Msun and R0=8.33+-0.11kpc.


INTRODUCTION
Nuclear star clusters (NSC) are located at the centers of most late-type galaxies. They are more luminous than globular clusters, have masses of order ∼ 10 6 − 10 7 M⊙, have complex star formation histories, and obey scaling-relations with host galaxy properties as do central supermassive black holes (SMBH); see Böker (2010) for a review. Many host an AGN, i.e., a SMBH (Seth et al. 2008), and the ratio of NSC to SMBH mass varies widely (Kormendy & Ho 2013).
The NSC of the Milky Way is of exceptional interest ⋆ E-mail: sotiris@mpe.mpg.de, gerhard@mpe.mpg.de because of its proximity, about 8 kpc from Earth. It extends up to several hundred arcsecs from the center (Sgr A*) and its mass within 1 pc is ∼ 10 6 M⊙ with ∼ 50% uncertainty (Schödel et al. 2009;Genzel et al. 2010). There is strong evidence that the center of the NSC hosts a SMBH of several million solar masses. Estimates from stellar orbits show that the SMBH mass is M• = (4.31 ± 0.36) × 10 6 M⊙ (Schödel et al. 2002;Ghez et al. 2008;Gillessen et al. 2009). Due to its proximity, individual stars can be resolved and number counts can be derived; however, due to the strong interstellar extinction the stars can only be observed in the infrared. A large number of proper motions and line-of-sight velocities have been measured, and analyzed with spherical c 2002 RAS models to attempt to constrain the NSC dynamics and mass (Haller et al. 1996;Genzel et al. 1996Genzel et al. , 2000Trippe et al. 2008;Schödel et al. 2009;Fritz et al. 2014). The relaxation time of the NSC within 1 pc is tr ∼ 10 10 yr (Alexander 2005;Merritt 2013), indicating that the NSC is not fully relaxed and is likely to be evolving. One would expect from theoretical models that, if relaxed, the stellar density near the SMBH should be steeply-rising and form a Bahcall & Wolf (1976) cusp. In contrast, observations by Do et al. (2009); Buchholz et al. (2009); Bartko et al. (2010) show that the distribution of old stars near the SMBH appears to have a core. Understanding the nuclear star cluster dynamics may therefore give useful constraints on the mechanisms by which it formed and evolved (Merritt 2010).
In this work we construct axisymmetric Jeans and twointegral distribution function models based on stellar number counts, proper motions, and line-of-sight velocities. We describe the data briefly in Section 2; for more detail the reader is referred to the companion paper of Fritz et al. (2014). In Section 3 we carry out a preliminary study of the NSC dynamics using isotropic spherical models, in view of understanding the effect of rotation on the data. In Section 4 we describe our axisymmetric models and show that they describe the kinematic properties of the NSC exceptionally well. By applying a χ 2 minimization algorithm, we estimate the mass of the cluster, the SMBH mass, and the NSC distance. We discuss our results and summarize our conclusions in Section 5. The Appendix contains some details on our use of the Qian et al. (1995) algorithm to calculate the two-integral distribution function for the fitted density model.

DATASET
We first give a brief description of the data set used for our dynamical analysis. These data are taken from Fritz et al. (2014) and are thoroughly examined in that paper, which should be consulted for more details. The coordinate system used is a shifted Galactic coordinate system (l * , b * ) where Sgr A* is at the center and (l * , b * ) are parallel to Galactic coordinates (l, b). In the following we always refer to the shifted coordinates but will omit the asterisks for simplicity. The dataset consists of stellar number densities, proper motions and line-of-sight velocities. We use the stellar number density map rather than the surface brightness map because it is less sensitive to individual bright stars and non-uniform extinction.
The stellar number density distribution is constructed from NACO high-resolution images for R box < 20 ′′ , in a similar way as in Schödel et al. (2010), from HST WFC3/IR data for 20 ′′ < R box < 66 ′′ , and from VISTA-VVV data for 66 ′′ < R box < 1000 ′′ .
The kinematic data include proper motions for ∼10'000 stars obtained from AO assisted images. The proper motion stars are binned into 58 cells (Figure 1) according to distance from Sgr A* and the smallest angle to the galactic plane. This binning assumes that the NSC is symmetric with respect to the Galactic plane and with respect to the b-axis on the sky, consistent with axisymmetric dynamical modeling. The sizes of the bins are chosen such that all bins contain comparable numbers of stars.  We also use ∼2'500 radial velocities obtained from SIN-FONI integral field spectroscopy. The binning of the radial velocities is shown in Fig. 2. There are 46 rectangular outer bins as shown in Fig. 2 plus 6 rectangular rings around the center (not shown). Again the outer bins are chosen such that they contain similar numbers of stars. The distribution of radial velocity stars on the sky is different from the distribution of proper motion stars; hence the different binning. Finally we use about 200 maser velocities (at r > 100 ′′ , from Lindquist et al. 1992;Deguchi et al. 2004).

SPHERICAL MODELS OF THE NSC
In this section we study the NSC using the preliminary assumption that the NSC can be described by an isotropic distribution function (DF) depending only on energy. We use the DF to predict the kinematical data of the cluster. Later we add rotation self-consistently to the model. The advantages of using a distribution function instead of common Jeans modeling are that (i) we can always check if a DF is positive and therefore if the model is physical, and (ii) the DF provides us with all the moments of the system.
For the rest of the paper we use (r, θ, ϕ) for spherical and (R, ϕ, z) for cylindrical coordinates, with θ = 0 corresponding to the z-axis normal to the equatorial plane of the NSC. The first step is to model the surface density. We use the well-known one-parameter family of spherical γ-models (Dehnen 1993): where a is the scaling radius and M the total mass.The model behaves as ρ ∼ r −γ for r → 0 and ρ ∼ r −4 for r → ∞. Dehnen γ models are equivalent to the η-models of Tremaine et al. (1994) under the transformation γ = 3 − η. Special cases are the Jaffe (1983) and Hernquist (1990) models for γ = 2 and γ = 1 respectively. For γ = 3/2 the model approximates de Vaucouleurs R 1/4 law. In order to improve the fitting of the surface density we use a combination of two γ-models, i.e.
Mi ai The use of a two-component model will prove convenient later when we move to the axisymmetric case. The projected density is and can be expressed in terms of elementary functions for integer γ, or in terms of elliptic integrals for half-integer γ.
For arbitrary γ1 and γ2 the surface density can only be calculated numerically using equation (3). The surface density diverges for γ > 1 but is finite for γ < 1. The projected number density profile of the NSC obtained from the data of Fritz et al. (2014) (see Section 2) is shown in Figure 3. The inflection point at Rs ∼ 100 ′′ indicates that the NSC is embedded in a more extended, lower-density component. The surface density distribution can be approximated by a two-component model of the form of equation (2), where the six parameters (γ1, M1, a1, γ2, M2, a2) are fitted to the data subject to the following constraints: The slope of the inner component should be γ1 > 0.5 because isotropic models with a black hole and γ1 < 0.5 are unphysical (Tremaine et al. 1994), but it should be close to the limiting value of 0.5 to better approximate the observed core near the center (Buchholz et al. 2009). For the outer component γ2 ≪ 0.5 so that it is negligible in the inner part of the density profile. In addition M1 < M2 and a1 < a2. With these constraints we start with some initial values for the parameters and then iteratively minimize χ 2 . The reduced χ 2 resulting from this procedure is χ 2 /ν = 0.87 and the corresponding best-fit parameter values are: Here we provide only the ratio of masses instead of absolute values in model units since the shape of the model depends only on the ratio. The surface density of the final model is overplotted on the data in Figure 3. With the assumption of constant mass-to-light ratio and the addition of the black hole the potential (Φ = −Ψ) will be where M• is the mass of the black hole. Since we now know the potential and the density we can calculate the distribution function (DF) numerically using Eddington's formula, as a function of positive energy E = Ψ − 1 2 υ 2 , The 2nd term of the equation vanishes for reasonable behavior of the potential and the double derivative inside the integral can be calculated easily by using the transformation . Line-of-sight velocity dispersion σ los of the twocomponent spherical model with black hole, compared to the observed line-of-sight dispersions (black) and the proper motion dispersions in l (red) and b (green). The line-of-sight data includes the outer maser data, and for the proper motions a canonical distance of 8 kpc is assumed. Figure 4 shows the DF of the two components in their joint potential plus that of a black hole with mass ratio M•/(M1 + M2) = 1.4 × 10 −3 . The DF is positive for all energies. We can test the accuracy of the DF by retrieving the density using and comparing it with equation (2). Both agree to within 0.1%. The DF has the typical shape of models with a shallow cusp of γ < 3 2 . It decreases as a function of energy both in the neighborhood of the black hole and also for large energies. It has a maximum near the binding energy of the stellar potential well (Baes et al. 2005).
For a spherical isotropic model the velocity ellipsoid is a sphere of radius σ. The intrinsic dispersion σ can be calculated directly using The projected dispersion is then given by: In Figure 5 we see how our two-component model compares with the kinematical data using the values D = 8 kpc, M• = 4×10 6 M⊙ and M * (r < 100 ′′ ) = 5×10 6 M⊙. The good match of the data suggests that the assumption of constant massto-light ratio for the cluster is reasonable.

Adding self-consistent rotation to the spherical model
We describe here the effects of adding self-consistent rotation to the spherical model, but much of this also applies to the axisymmetric case which will be discussed in Section 4. We assume that the rotation axis of the NSC is aligned with the rotation axis of the Milky Way disk. We also use a cartesian coordinate system (x, y, z) where z is parallel to the axis of rotation as before, y is along the line of sight, and x is along the direction of negative longitude, with the center of the NSC located at the origin. The proper motion data are given in galactic longitude l and galactic latitude b angles, but because of the large distance to the center, we can assume that x l and z b. Whether a spherical system can rotate has been answered in Lynden Bell (1960). Here we give a brief review. Rotation in a spherical or axisymmetric system can be added self-consistently by reversing the sense of rotation of some of its stars. Doing so, the system will remain in equilibrium. This is equivalent with adding to the DF a part that is odd with respect to Lz. The addition of an odd part does not affect the density (or the mass) because the integral of the odd part over velocity space is zero. The most trivial way to add rotation to a system is reversing the sense of rotation of all of its counterrotating stars. This corresponds to adding f−(E, Lz) = sign(Lz)f (Maxwell's daemon, Lynden Bell 1960) to the initially non-rotating DF, and generates a system with the maximum allowable rotation. The general case of adding rotation to an initially spherical isotropic system can be written f (E, Lz) = (1 + g(Lz))f (E) where g(Lz) is an odd function with max |g(Lz)| < 1 to ensure positivity of the DF. We notice that the new distribution function is a two-integral DF. In this case the density of the system is still rotationally invariant but f− is not.
In Figure 5 we notice that the projected velocity dispersion in the l direction is larger than the dispersion in the b direction which was first found by Trippe et al. (2008). This is particularly apparent for distances larger than 10 ′′ . A heuristic attempt to explain this difference is made in Trippe et al. (2008) where the authors impose a rotation of the form υϕ(r, θ) along with their Jeans modeling, as a proxy for axisymmetric modeling. Here we show that for a self-consistent system the difference in the projected l and b dispersions cannot be explained by just adding rotation to the cluster.
The first part is zero because υxf+ is odd. The second part is different from zero; however when projecting υx along the line-of-sight the term dy υxdυx dυy dυzg(xυy − yυx)f+ is also zero because g(xυy − yυx) is an odd function of y and υy. Hence the projected mean velocity υx is zero. An alternative way to see this is by making a particle realization of the initial DF (e.g. Aarseth et al. 1974). Then we can add rotation by reversing the sign of Lz of a percentage of particles using some probability function which is equivalent to changing the signs of υx and υy of those particles. υ 2 x will not be affected by the sign change and the υ 2 x averaged over the line-of-sight will be zero because for each particle at the front of the system rotating in a specific direction there will be another particle at the rear of the system rotating in the opposite direction. In this work we do not use particle models to avoid fluctuations due to the limited number of particles near the center.
For the odd part of the DF we choose the two-parameter function from Qian et al. (1995). This is a modified version of Dejonghe (1986) which was based on maximum entropy arguments: where η = Lz/Lm(E), Lm(E) is the maximum allowable value of Lz at a given energy, and −1 < F < 1 and κ > 0 are free parameters. The parameter F works as a global adjustment of rotation while the parameter κ determines the contributions of stars with different Lz ratios. Specifically for small κ only stars with high Lz will contribute while large κ implies that all stars irrespective of their Lz contribute to rotation. For F=1 and κ ≫ 0, g(Lz) = sign(Lz) which corresponds to maximum rotation. As a specific example we chose the values F = -0.8 and κ = 2.5 for the parameters in equation (13). From the resulting distribution function f (E, Lz) we can calculate υϕ(R, z) in cylindrical coordinates using the equation To find the mean line-of-sight velocity versus galactic longitude l we have to project equation (14) to the sky plane

AXISYMMETRIC MODELING OF THE NSC
We have seen that spherical models cannot explain the difference between the velocity dispersions along the l and b directions. The number counts also show that the cluster is flattened; see Figure 7 and Fritz et al. (2014). Therefore we now continue with axisymmetric modeling of the nuclear cluster. The first step is to fit the surface density counts with an axisymmetric density model. The available surface density data extend up to 1000 ′′ in the l and b directions. For comparison, the proper motion data extend to ∼ 70 ′′ from the centre ( Figure 1). We generalize our spherical twocomponent γ-model to a spheroidal model given by where m 2 i = R 2 + z 2 /q 2 i is the spheroidal radius and the two new parameters q1,2 are the axial ratios (prolate > 1, oblate < 1) of the inner and outer component, respectively. Note that the method can be generalized to N components. The mass of a single component is given by 4πqi From Figure 7 we expect that the inner component will be more spherical than the outer component, although when the density profile gets flatter near the center it becomes more difficult to distinguish the axial ratio. In Figure 7 one also sees that the stellar surface density along the l direction is larger than along the b direction. Thus we assume that the NSC is an oblate system. To fit the model we first need to project the density and express it as a function of l and b.
The projected surface density as seen edge on is In general, to fit equation (17) to the data we would need to determine the eight parameters γ1,2, M1,2, a1,2, q1,2. However, we decided to fix a value for q2 because the second component is not very well confined in the 8-dimensional parameter space (i.e. there are several models each with different q2 and similar χ 2 ). We choose q2 = 1/3.6, close to the value found in Fritz et al. (2014). For similar reasons, we also fix the same values for γ1,2 as in the spherical case.
After fitting the remaining parameters we have: The reduced χ 2 that correspond to these parameter values is The second component is about 100 times more massive than the first, but also extends more than one order of magnitude further. Assuming constant mass-to-light ratio for the star cluster, we determine its potential using the relation from Qian et al. (1995), which is compatible with their contour integral method (i.e. it can be used for complex R 2 and z 2 ). The potential for a single component i is given by: i +u , and where Ψ0i is the central potential (for a review of the potential theory of ellipsoidal bodies consider Chandrasekhar (1969)). The total potential of the two-component model is

Axisymmetric Jeans modeling
Here we first continue with axisymmetric Jeans modeling. We will need a large number of models to determine the best values for the mass and distance of the NSC, and for the mass of the embedded black hole. We will use DFs for the detailed modeling in Section 4.3, but this is computationally expensive, and so a large parameter study with the DF approach is not currently feasible. In Section 4.3 we will show that a two-integral (2I) distribution function of the form f (E, L 2 z ) gives a very good representation to the histograms of proper motions and line-of-sight velocities for the nuclear star cluster in all bins. Therefore we can assume for our Jeans models that the system is isotropic in the meridional plane, υ 2 z = υ 2 R . From the tensor virial theorem (Binney & Tremaine 2008) we know that for 2I-models υ 2 Φ > υ 2 R in order to produce the flattening. In principle, for systems of the form f (E, Lz) it is possible to find recursive expressions for any moment of the distribution function (Magorrian & Binney 1994) if we know the potential and the density of the system. However, here we will confine ourselves to the second moments, since later we will recover the distribution function. By integrating the Jeans equations we get relations for the independent dispersions (Nagai & Miyamoto 1976): The potential and density are already known from the previous section. Once υ 2 z is found it can be used to calculate υ 2 ϕ . The intrinsic dispersions in l and b direction are given by the equations: where sin 2 θ = x 2 /R 2 and cos 2 θ = 1 − x 2 /R 2 . Projecting the previous equations along the line of sight we have: In order to define our model completely, we need to set the distance R0 and mass M * of the cluster and the black hole mass M•. To do this we apply a χ 2 minimization technique matching all three velocity dispersions in both sets of cells, using the following procedure. First we note that the inclusion of self-consistent rotation to the model will not affect its mass. This means that for the fitting we can use υ 2 los 1/2 for each cell of Figure 2. Similarly, since our model is axisymmetric we should match to the υ 2 l,b 1/2 for each proper motion cell; the υ l,b terms should be and indeed are negligible. Another way to see this is that since the system is axially symmetric, the integration of υ l,b along the line-ofsight should be zero because the integration would cancel out for positive and negative y.
With this in mind we proceed as follows, using the cluster's density parameters 1 as in equation (18). First we partition the 3d space (R0, M * , M•) into a grid with resolution 20 × 20 × 20. Then for each point of the grid we calculate the corresponding χ 2 using the velocity dispersions from all cells in Figs. 1 and 2, excluding the two cells at the largest radii (see Fig. 8). We compare the measured dispersions with the . Root mean square line-of-sight velocities compared with the best kinematical model, as a function of two-dimensional radius on the sky as in Fig. 8. In both plots the stellar mass of the NSC is 8.06 × 10 6 M ⊙ within m < 100 ′′ , the black hole mass is 3.88 × 10 6 M ⊙ , and the distance is 8.3 kpc (equation 27). All the maser data are included in the plot.
model values obtained from equations (23) for the centers of these cells. Then we interpolate between the χ 2 values on the grid and find the minimum of the interpolated function, i.e., the best values for (D, M * , M•). To determine statistical errors on these quantities, we first calculate the Hessian matrix from the curvature of χ 2 surface at the minimum, ∂χ 2 /∂pi∂pj. The statistical variances will be the diagonal elements of the inverted matrix. In order to calculate the . Note that σ b is slightly lower than σ los . The difference between σ b and σ l comes from the flattening of both the inner and outer components of the model. errors accurately we need at least a 3rd degree interpolation on the grid.
With this procedure we obtain a minimum reduced χ 2 /νJeans = 1.15 with νJeans = 161 degrees of freedom, for the values R0 = 8.20 kpc M * (m < 100 ′′ ) = 8.31 × 10 6 M⊙ M• = 3.50 × 10 6 M⊙, However, models with a more flattened inner density for the star cluster than given in equation (18) result in a better match to the velocity data. We have therefore repeated the fitting of the starcount density profile in Fig. 7, keeping γ1, γ2, and q2 fixed, setting q1 to predetermined values, and varying the remaining parameters. This results in a best kinematic model with q1 = 1/1.35 and The best reduced χ 2 that we obtain for the velocity dispersion profiles with these parameters is χ 2 /νJeans = 1.05 and corresponds to the values R0 = 8.30 kpc M * (m < 100 ′′ ) = 8.06 × 10 6 M⊙ M• = 3.88 × 10 6 M⊙.
Compared to the more spherical model, the cluster mass has decreased and the black hole mass has increased, being now more in accord with the value determined from the S star orbits around Sgr A * (Gillessen et al. 2009). The sum of both masses as well as the distance have changed only by 1%. We will consider the determination of these parameters in more detail in Section 4.2, as well as their errors. First, we now look at the comparison of this model with the velocity data. Figure 8 shows how the dispersions σ l and σ b averaged over angle for both models (the model fitting best the surface density data and the model fitting best the kinematic data) compare with the measured proper motion dispersions. Figure 9 shows how our best kinematic model, similarly averaged, compares with the line-of-sight mean square velocity data. The maser data are also included in the plot. It is seen that the model fits the data very well, in accordance with the χ 2 /νJeans = 1.05 per cell. Figure 10 shows how all three projected dispersions of the model compare. σ b is slightly lower than σ los due to projection effects. The fact that all three velocity dispersion profiles in Figs. 8, 9 are fitted well by the model suggests that the assumed semiisotropic dynamical structure is approximately correct.
The prediction shown in Fig. 8 is similar to Figure 11 of Trippe et al. (2008) but the interpretation is different. In the previous section we showed that the difference in projected dispersions cannot be explained by imposing rotation on the model. Here we demonstrated how the observational finding σ l > σ b can be quantitatively explained by flattened axisymmetric models of the NSC and the surrounding nuclear disk.

Distance to the Galactic Center, mass of the star cluster, and mass of the black hole
We now consider the determination of these parameters from the NSC data in more detail. Fig 11 shows the marginalized χ 2 -plot for the NSC model as given in equation (26), for pairs of two parameters (R0, M•), (M•, M * ), (R0, M * ), as obtained from fitting the Jeans dynamical model to the velocity dispersion profiles. The figure shows contour plots for constant χ 2 /νJeans with 1σ, 2σ, 3σ, 4σ, and 5σ in the three planes for the two-dimensional distribution of the respective parameters. We notice that the distance R0 has the smallest relative error. The best-fitting values for (R0, M•, M * ) are given in equation (27); these values are our best estimates based on the NSC data alone. We regard the underlying dynamical model based on equation (26) as our preferred model, be-cause the dynamical flattening of the inner component is largely determined by the ratio of σ l /σ b and the tensor virial theorem, and because the inner NSC flattening determined from starcounts depends sensitively on corrections for dust extinction of the WFC3/IR starcounts in the important region around 50 ′′ (Fritz et al. 2014, Fig. 25).
Statistical errors are determined from the Hessian matrix for this model. Systematic errors can arise from uncertainties in the NSC density structure, from deviations from the assumed axisymmetric two-integral dynamical structure, and from other sources, e.g., dust extinction within the cluster (see Section 5). We have already illustrated the effect of varying the cluster flattening on (R0, M•, M * ). We have also tested how variations of the cluster density structure (α2, q2, M2) beyond 500 ′′ impact on the best-fit parameters, and found that these effects are smaller. Based on all our Jeans models, we estimate the systematic uncertainties from the NSC density structure to be ∼ 0.1 kpc in R0 and ∼ 10% in M• and M * (m < 100 ′′ ). We will see in Section 4.3 below that the DF for the model fitting the cluster surface density best gives a clearly inferior representation of the velocity histograms than our kinematically preferred model, and also that the differences between both models appear larger than the residual differences between our preferred model and the observed histograms. We therefore take the forgoing systematic error estimates to include also other systematic modeling uncertainties, such as possible deviations from the two-integral dynamical structure, so that finally R0 = 8.30 ± 0.09|stat ± 0.1|syst kpc M * (m < 100 ′′ ) = (8.06 ± 0.31|stat ± 0.8|syst) × 10 6 M⊙ M• = (3.88 ± 0.14|stat ± 0.4|syst) × 10 6 M⊙.
We note several other systematic errors which are not easily quantifiable and so are not included in these estimates, such as biases in star counts versus integrated light distribution, inhomogeneous sampling of proper motions or line-of-sight velocities, extinction within the NSC, and the presence of an additional component of dark stellar remnants. Based on our preferred model, the mass of the star cluster within 100 ′′ converted into spherical coordinates is M * (r < 100 ′′ ) = (9.26 ± 0.32|stat ± 0.9|syst) × 10 6 M⊙. The model's mass within the innermost pc (25 ′′ ) is M * (m < 1pc) = 0.7 × 10 6 M⊙ in spheroidal radius, or M * (r < 1pc) = 0.9 × 10 6 M⊙ in spherical radius. The total mass of the inner component is M1 = 5.23×10 7 M⊙. Because most of this mass is beyond the radius where the inner component dominates the star counts, it is sensitively dependent on the assumed outer density profile of the inner component.
The distance and the black hole mass we found differ by 0.4% and 11%, respectively, from the values R0 = 8.33±0.17|stat ±0.31|syst kpc and M• = 4.31±0.36×10 6 M⊙ for R0 = 8.33 kpc, as determined by Gillessen et al. (2009) from stellar orbits around Sgr A * . Figure 12 shows the 1σ to 3σ contours of marginalized χ 2 for (R0, M•) jointly from stellar orbits (Gillessen et al. 2009), for the NSC model of this paper, and for the combined modeling of both data sets. The figure shows that both analyses are mutually consistent. When marginalized over M * and the respective other parameter, the combined modeling gives, for each parameter alone, R0 = 8.36 ± 0.11 kpc and M• = 4.26 ± 0.14 × 10 6 M⊙. We note that these errors for R0 and M• are both dominated by the distance error from the NSC modeling. Thus our estimated additional systematic error of 0.1 kpc for R0 in the NSC modeling translates to a similar additional error in the combined R0 measurement and, through the SMBH mass-distance relation given in Gillessen et al (2009), to an additional uncertainty ≃ 0.1 × 10 6 M⊙ in M•. We see that the combination of the NSC and S-star orbit data is a powerful means for decreasing the degeneracy between the SMBH mass and Galactic center distance in the S-star analysis.

Two-integral distribution function for the NSC.
Now we have seen the success of fitting the semi-isotropic Jeans models to all three velocity dispersion profiles of the NSC, and determined its mass and distance parameters, we proceed to calculate two-integral (2I) distribution functions. We use the contour integral method of Hunter & Qian (1993, HQ) and Qian et al. (1995). A 2I DF is the logical, next-simplest generalization of isotropic spherical models. Finding a positive DF will ensure that our model is physical. Other possible methods to determine f (E, Lz) include reconstructing the DF from moments (Magorrian 1995), using series expansions as in Dehnen & Gerhard (1994), or grid-based quadratic programming as in Kuijken (1995). We find the HQ method the most suitable since it is a straightforward generalization of Eddington's formula. The contour integral is given by: whereρ11(Ψ, R) = ∂ 2 ∂Ψ 2 ρ(Ψ, R). Equation (29) is remarkably similar to Eddington's formula. Like in the spherical case the DF is even in Lz. The integration for each (E, Lz) pair takes place on the complex plane of the potential ξ following a closed path (i.e. an ellipse) around the special value Ψenv. For more information on the implementation and for a minor improvement over the original method see Appendix A. We find that a resolution of (120 × 60) logarithmically placed cells in the (E, Lz) space is adequate to give us relative errors of the order of 10 −3 when comparing with the zeroth moment, i.e., the density, already known analytically, and with the second moments, i.e., the velocity dispersions from Jeans modeling.
The gravitational potential is already known from equations (19) and (20). For the parameters (cluster mass, black hole mass, distance) we use the values given in equation (27). Figure 13 shows the DF in (E, Lz) space. The shape resembles that of the spherical case (Fig. 4). The DF is a monotonically increasing function of η = Lz/Lz max(E) and declines for small and large energies. The DF contains information about all moments and therefore we can calculate the projected velocity profiles (i.e., velocity distributions, hereafter abbreviated VPs) in all directions. The normalized VP in the line-of-sight (los) direction is Using polar coordinates in the velocity space (υx, υz) → (υ ⊥ , ϕ) where υx = υ ⊥ cos ϕ and υz = υ ⊥ sin ϕ we find V P (υ los ; x, z) = 1 2Σ Following a similar path we can easily find the corresponding integrals for the VPs in the l and b directions. The typical shape of the VPs in the l and b directions within the area of interest (r < 100 ′′ ) is shown in Figure 14. We notice the characteristic two-peak shape of the VP along l that is caused by the near-circular orbits of the flattened system. Because the front and the back of the axisymmetric cluster contribute equally, the two peaks are mirror-symmetric, and adding rotation would not change their shapes.
The middle panels of Figure 15 and Figures B1 and B2 in Appendix B show how our best model (with parameters from equations (26) and (27)) predicts the observed velocity histograms for various combinations of cells. The reduced χ 2 for each histogram is also provided. The prediction is very good both for the VPs in υ l and υ b . Specifically, for the l proper motions our flattened cluster model predicts the two-peak structure of the data pointed out by several authors (Trippe et al. 2008;Schödel et al. 2009;Fritz et al. 2014). In order to calculate the VP from the model for each cell we averaged over the VP functions for the center of each cell weighted by the number of stars in each cell and normalized by the total number of stars in all the combined cells. Figure 15 compares two selected υ l -VPs for our two main models with the data. The left column shows how the observed velocity histograms (VHs) for corresponding cells compare to the model VPs for the less flattened model with parameters from equations (18) and (24), the middle column compares with the same VPs from our best kinematical model with parameters from equations (26) and (27). Clearly, the more flattened model with q1 = 1/1.35 fits the shape of the data much better than the more spherical model with q1 = 1/1.18, justifying its use in Section 4.2.
This model is based on an even DF in Lz and therefore does not yet have rotation. To include rotation, we will (in Section 4.4) add an odd part to the DF, but this will not change the even parts of the model's VPs. Therefore, we can already see whether the model is also a good match  Figure 14. Typical velocity distributions for l and b-velocities within the area of interest (r < 100 ′′ ). The red line shows the VPs in the b direction, the blue line in the l direction. The VPs along l show the characteristic two-peak-shape pointed out from the data by several authors (Schödel et al. 2007;Trippe et al. 2008;Fritz et al. 2014). to the observed los velocities by comparing it to the even parts of the observed los VHs. This greatly simplifies the problem since we can think of rotation as independent, and can therefore adjust it to the data as a final step. Figure  B3 shows how the even parts of the VHs from the los data compare with the VPs of the 2I model. Based on the reduced χ 2 , the model provides a very good match. Possible systematic deviations are within the errors. The los VHs are broader than those in the l direction because the los data contain information about rotation (the broader the even part of the symmetrized los VHs, the more rotation the system possesses, and in extreme cases they would show two peaks).  (18) and (24). Middle column: predicted VPs for our best kinematical model with parameters from equations (26) and (27). This more flattened model with q 1 = 1/1.35 fits the data much better than the rounder cluster model with q 1 = 1/1.18.

Adding rotation to the axisymmetric model:
is the NSC an isotropic rotator?
As in the spherical case, to model the rotation we add an odd part in Lz to the initial even part of the distribution function, so that the final DF takes the form f (E, Lz) = (1 + g(Lz))f (E, Lz). We use again equation (13); this adds two additional parameters (κ, F) to the DF. Equation (15) gives the mean los velocity vs galactic longitude. In order to constrain the parameters (κ, F) we fitted the mean los velocity from equation (15) to the los velocity data for all cells in Fig. 2. The best parameter values resulting from this 2D-fitting are κ = 2.8 ± 1.7, F = 0.85 ± 0.15 and χ 2 r = 1.25. Figure B4 shows that the VPs of this rotating model compare well with the observed los VHs.
An axisymmetric system with a DF of the form f (E, Lz) is an isotropic rotator when all three eigenvalues of the dispersion tensor are equal (Binney & Tremaine 2008) and therefore In order to calculate υϕ from equation (33) it is not necessary for the DF to be known since υ 2 ϕ and υ 2 R are already known from the Jeans equations (21). Figure 16 shows the fitted u los velocity from the DF against the isotropic rotator case calculated from equation (33), together with the mean los velocity data. The two curves agree well within 100 ′′ . Therefore according to our best model the NSC is close to an isotropic rotator, perhaps rotating slightly more slowly.

DISCUSSION & CONCLUSIONS
In this work we presented a dynamical analysis of the Milky Way's nuclear star cluster (NSC), based on ∼10'000 proper motions, ∼2'700 radial velocities, and new star counts from the companion paper of Fritz et al. (2014). We showed that an excellent representation of the kinematic data can be obtained by assuming a constant mass-to-light ratio for the cluster, and modeling its dynamics with axisymmetric two-integral distribution functions (2I-DFs), f (E, Lz). The DF modeling allows us to see whether the model is physical, i.e., whether the DF is positive, and to model the proper motion (PM) and line-of-sight (los) velocity histograms (VHs). One open question until now has been the nature of the double peaked VHs of the v l -velocities along Galactic longitude, and the bell-shaped VHs of v b along Galactic latitude, which cannot be fitted by Gaussians (Schödel et al. 2007). Our 2I DF approximation of the NSC gives an excellent prediction for the observed shapes of the v l -, v b , and v los -VHs. The models show that the double-peaked shape of the v l -VHs is a result of the flattening of the NSC, and suggest that the cluster's dynamical structure is close to an isotropic rotator. Because both PMs and los-velocities enter the dynamical models, we can use them also to constrain the distance to the GC, the mass of the NSC, and the mass of the Galactic centre black hole. To do this efficiently, we used the semiisotropic Jeans equations corresponding to 2I-DFs. In this section, we discuss these issues in more detail.

The dynamical structure of the NSC
The star count map derived in Fritz et al. (2014) suggests two components in the NSC density profile, separated by an inflection point at about ∼ 200 ′′ ∼ 8 pc (see Fig. 7 above).
To account for this we constructed a two-component dynamical model for the star counts in which the two components are described as independent γ-models. The inner, rounder component can be considered as the proper NSC, as in Fritz et al. (2014), while the outer, much more flattened component may represent the inner parts of the nuclear disk described in Launhardt et al. (2002). The scale radius of the inner component is ∼ 100 ′′ , close to the radius of influence of the SMBH, r h ∼ 90 ′′ (Alexander 2005). The profile flattens inside ∼ 20 ′′ to a possible core (Buchholz et al. 2009;Fritz et al. 2014) but the slope of the three-dimensional density profile for the inner component is not well-constrained. We took γ = 0.51 as γ = 0.5 is the lowest possible slope for an isotropic spherical model to have a positive DF.
The flattening for the inner component inferred from mainly HST star counts is small (q1 = 1/1.18); however, our Jeans dynamical modeling gives a larger value (q1 ≃ 1/1.35). The dynamical flattening is robust because it is largely determined by the ratio of σ b /σ l and the tensor virial theorem. Possible reasons for the difference between both values are a possible overcorrection of the star counts for extinction, and uncertainties in the contribution of foreground/background nuclear disk stars to the cluster kinematics (although our tests suggest that this is a small effect).
Assuming constant mass-to-light ratio for the NSC, we found that a 2I-DF model gives an excellent description of the proper motion and los velocity dispersions and VHs, in particular of the double-peaked distributions in the v lvelocities. This double-peaked structure is a direct consequence of the flattening of the star cluster; the detailed agreement of the model VPs with the observed histograms therefore confirms the larger value q1 = 1/1.35 for the inner cluster component which we had previously inferred from the velocity dispersions. We showed that for an axisymmetric model rotation cannot be seen directly in the proper motion VHs, but is apparent only in the los velocities. When  Figure 17. Average differential extinction of nuclear cluster stars plotted as a function of v l proper motion. The differential extinction is inferred from the difference in the color of a star to the median color of its 16 nearest neighbours, using the extinction law of Fritz et al. (2014), and correcting also for the weak color variation with magnitude.
a suitable odd part of the DF is added to include rotation, the 2I-DF model also gives a very good representation of the skewed los VHs. From the amplitude of the required rotation we showed that the NSC can be approximately described as an isotropic rotator model, perhaps rotating slightly slower than that outside 100". Individual VHs are generally fitted by this model within the statistical errors, but on closer examination the combined v l VHs show a slightly lower peak at negative velocities, as already apparent in the global histograms of Trippe et al. (2008);Schödel et al. (2009) . Fig. 17 suggests that differential extinction of order ∼ 0.12 mag within the cluster may be responsible for this small systematic effect, by causing some stars from the back of the cluster to fall out of the sample. The dependence of mean extinction on v l independently shows that the NSC must be rotating, which could otherwise only be inferred from the los velocities. In subsequent work, we will model the effect of extinction on the inferred dynamics of the NSC. This will then also allow us to estimate better how important deviations from the 2I-dynamical structure are, i.e., whether three-integral dynamical modeling would be worthwhile.

Mass of the NSC
The dynamical model results in an estimate of the mass of the cluster from our dataset. Our fiducial mass value is M * (m < 100 ′′ ) = (8.06 ± 0.31|stat ± 0.8|syst) × 10 6 M⊙ interior to a spheroidal major axis distance m = 100 ′′ . This corresponds to an enclosed mass within 3-dimensional radius r = 100 ′′ of M * (r < 100 ′′ ) = (9.26 ± 0.31|stat ± 0.9|syst) × 10 6 M⊙. The systematic modeling errors are estimated from the difference between our best kinematical model and the model that fits the surface density best, which has a different flattening for the inner component; see Section 4.2.
The fiducial mass M * (r < 100 ′′ ) for the best axisymmetric model is larger than that obtained with spherical models. The constant M/L spherical model with density parameters as in Section 3, for R0 = 8.3 kpc and the same black hole mass has M * (r < 100 ′′ ) = 6.6 × 10 6 M⊙.
There are two reasons for this difference: (i) At ∼ 50 ′′ where the model is well-fixed by kinematic data the black hole still contributes more than half of the interior mass. In this region, flattening the cluster at constant mass leaves σ l and σ los approximately constant, but decreases σ b to adjust to the shape. To fit the same observed data, the NSC mass must be increased. (ii) Because of the increasing flattening with radius, the average density of the axisymmetric model decreases faster than that of the spherical density fit; thus for the same observed velocity dispersion profiles a larger binding mass for the NSC is required. Figure 18 shows the enclosed mass within a spheroidal radius m as in equation (25), as well as the mass within spherical radius r. E.g., the mass within 1 pc is M * (r < 1pc) = 0.90 * 10 6 M⊙. This is compatible with the spherical modeling of Schödel et al. (2009) who gave a range of 0.6 − 1.7×10 6 M⊙, rescaled to R0 = 8.3 kpc, with the highest mass obtained for their isotropic, constant M/L model. According to Fig. 18, at r ≃ 30 ′′ = 1.2 pc the NSC contributes already ≃ 25% of the interior mass (≃ 45% at r ≃ 50 ′′ = 2 pc), and beyond r ≃ 100 ′ = 4 pc it clearly dominates.
An important point to note is that the cluster mass does not depend on the net rotation of the cluster but only on its flattening. This is because to add rotation self-consistently to the model we need to add an odd part to the DF which does not affect the density or the proper motion dispersions σ l and σ b .
Our NSC mass model can be described as a superposition of a moderately flattened nuclear cluster embedded in a highly flattened nuclear disk. Approximate local axis ratios for the combined density and for the total potential including the central black hole are shown in the upper panel of Fig. 18. Therefore, we can define the NSC proper as the inner component of this model, similar to Fritz et al. (2014).
Its total mass, M1 = 5.23×10 7 M⊙ (Section 4.2), is welldetermined within similar relative errors as M * (m < 100 ′′ ). However, identifying M1 with the total mass of the Galactic NSC at the center of the nuclear disk has considerable uncertainties: the scale radius of the inner component is ∼ 100 ′′ (equation 26), so that the mass of the inner component outside 2 scale radii (∼ 64% of the total) is not accurately determined, because of the uncertain density profile at large radii where the outer disk component dominates the surface density. If we assume that the outer density profile is not shallower than the γ-model, and that the mass within the half-mass radius is secure, we can estimate MNSC = (3.9 ± 1.3) × 10 7 M⊙.

Evolution of the NSC
After ∼ 10 half mass relaxation times t rh a dense nuclear star cluster will eventually evolve to form a Bahcall-Wolf cusp with slope γ = 7/4 (Merritt 2013); for rotating dense star clusters around black holes this was studied by Fiestas & Spurzem (2010). The minimum allowable inner slope for a spherical system with a black hole to have a positive DF is γ = 0.5. From the data it appears that the Galactic NSC instead has a core (Buchholz et al. 2009;Fritz et al. 2014), with the number density possibly even decreasing very close to the center (r < 0.3 pc). This is far from the expected Bahcall-Wolf cusp, indicating that the NSC is not fully relaxed. It is consistent with the relaxation time of the NSC being of order 10 Gyr everywhere in the cluster (Merritt 2013).
From Fig. 16 we see that the rotational properties of the Milky Way's NSC are close to those of an isotropic rotator. Fiestas et al. (2012) found that relaxation in rotating clusters causes a slow (∼ 3t rh ) evolution of the rotation profile. Kim et al. (2008) found that it also drives the velocity dispersions towards isotropy; in their initially already nearly isotropic models this happens in ∼ 4t rh . On a similar time-scale the cluster becomes rounder (Einsel & Spurzem 1999). Comparing with the NSC relaxation time suggests that these processes are too slow to greatly modify the dynamical structure of the NSC, and thus that its properties were probably largely set up at the time of its formation.
The rotation-supported structure of the NSC could be due to the rotation of the gas from which its stars formed, but it could also be explained if the NSC formed from merging of globular clusters. In the latter model, if the black hole is already present, the NSC density and rotation after completion of the merging phase reflects the distribution of disrupted material in the potential of the black-hole (e.g. Antonini et al. 2012). Subsequently, relaxation would lead to shrinking of the core by a factor of ∼ 2 in 10 Gyr towards a value similar to that observed (Merritt 2010). In the simulations of Antonini et al. (2012), the final relaxed model has an inner slope of γ = 0.45, not far from our models. Their cluster also evolved towards a more spherical shape, however, starting from a configuration with much less rotation and flattening than we inferred here for the present Milky Way NSC. Similar models with a net rotation in the initial distribution of globular clusters could lead to a final dynamical structure more similar to the Milky Way NSC.

Distance to the Galactic center
Using our large proper motions and los velocity datasets, we obtained a new estimate for the statistical parallax distance to the NSC. From matching our best dynamical model to the proper motion and los velocity dispersions within approximately |l|, |b| < 50 ′′ , we found R0 = 8.30 ± 0.09|stat ± 0.1|syst kpc. The statistical error is very small, reflecting the large number of fitted dispersion points. The systematic modeling error was estimated from the difference between models with different inner cluster flattening as discussed in Section 4.2.
Our new distance determination is much more accurate than that of Do et al. (2013) based on anisotropic spherical Jeans models of the NSC, R0 = 8.92 +0.58 −0.58 kpc, but is consistent within their large errors. We believe this is mostly due to the much larger radial range we modeled, which leaves less freedom in the dynamical structure of the model.
The new value for R0 is in the range R0 = 8.33 ± 0.35 kpc found by Gillessen et al. (2009) from analyzing stellar orbits around Sgr A * . A joint statistical analysis of the NSC data with the orbit results of Gillessen et al. (2009) gives a new best value and error R0 = 8.36 ± 0.11 kpc (Fig. 12, Section 4.2). Our estimated systematic error of 0.1 kpc for R0 in the NSC modeling translates to a similar additional uncertainty in this combined R0 measurement.
Measurements of R0 prior to 2010 were reviewed by Genzel et al. (2010). Their weighted average of direct measurements is R0 = 8.23±0.20±0.19 kpc, where the first error is the variance of the weighted mean and the second the unbiased weighted sample variance. Two accurate recent measurements give R0 = 8.33 ± 0.05|stat ± 0.14|syst kpc from RR Lyrae stars (Dekany et al. 2013) and R0 = 8.34 ± 0.14 kpc from fitting axially symmetric disk models to trigonometric parallaxes of star forming regions (Reid et al. 2014). These measurements are consistent with each other and with our distance value from the statistical parallax of the NSC, with or without including the results from stellar orbits around Sgr A * , and the total errors of all three measurements are similar, ∼ 2%.

Mass of the Galactic supermassive black hole
Given a dynamical model, it is possible to constrain the mass of the central black hole from 3D stellar kinematics of the NSC alone. With axisymmetric Jeans modeling we found M• = (3.88±0.14|stat ± 0.4|syst)×10 6 M⊙, where the systematic modeling error is estimated from the difference between models with different inner cluster flattening as discussed in Section 4.2. Within errors this result is in agreement with the black hole mass determined from stellar orbits around Sgr A * (Gillessen et al. 2009).
Our dataset for the NSC is the largest analyzed so far, and the axisymmetric dynamical model is the most accurate to date; it compares well with the various proper motion and line-of-sight velocity histograms. Nonetheless, future improvements may be possible if the uncertainties in the star density distribution and kinematics within 20" can be reduced, the effects of dust are incorporated, and possible deviations from the assumed 2I-axisymmetric dynamical structure are taken into account.
Several similar analyses have been previously made using spherical isotropic or anisotropic modeling. Trippe et al. (2008) used isotropic spherical Jeans modeling for proper motions and radial velocities in 1 ′′ < R < 100 ′′ ; their best estimate is M• ∼ 1.2 × 10 6 M⊙, much lower than the value found from stellar orbits. Schödel et al. (2009) constructed isotropic and anisotropic spherical broken power-law models, resulting in a black hole mass of M• = 3.6 +0.2 −0.4 × 10 6 M⊙. However, Fritz et al. (2014) find M• ∼ 2.27 ± 0.25 × 10 6 M⊙, also using a power-law tracer density. They argue that the main reason for the difference to Schödel et al. (2009) is because their velocity dispersion data for R > 15 ′′ are more accurate, and their sample is better cleaned for young stars in the central R < 2.5 ′′ . Assuming an isotropic spherical model with constant M/L, Fritz et al. (2014) find M• ∼ 4.35 ± 0.12 × 10 6 M⊙. Do et al. (2013) used 3D stellar kinematics within only the central 0.5 pc of the NSC. Applying spherical Jeans modeling, they obtained M• = 5.76 +1.76 −1.26 × 10 6 M⊙ which is consistent with that derived from stellar orbits inside 1 ′′ , within the large errors. However, in their modeling they used a very small density slope for the NSC, of γ = 0.05, which does not correspond to a positive DF for their quasi-isotropic model.
Based on this work and our own models in Section 4, the black hole mass inferred from NSC dynamics is larger for constant M/L models than for power law models, and it increases with the flattening of the cluster density distribution.
The conceptually best method to determine the black hole mass is from stellar orbits close to the black hole (Schödel et al. 2002;Ghez et al. 2008;Gillessen et al. 2009), as it requires only the assumption of Keplerian orbits and is therefore least susceptible to systematic errors. Gillessen et al. (2009) find that the largest uncertainty in the value obtained for M• is due to the uncertainty in R0, and that M• scales as M• ∝ R 2.19 0 . Therefore using our improved statistical parallax for the NSC also leads to a more accurate determination of the black hole mass. A joint statistical analysis of the axisymmetric NSC modeling to-gether with the orbit modeling of Gillessen et al. (2009) gives a new best value and error for the black hole mass, M• = (4.26 ± 0.14) × 10 6 M⊙ (see Fig. 12, Section 4.2). An additional systematic error of 0.1 kpc for R0 in the NSC modeling, through the BH mass-distance relation given in Gillessen et al (2009), translates to an additional uncertainty ≃ 0.1 × 10 6 M⊙ in M•.
Combining this result with the mass modeling of the NSC, we can give a revised value for the black hole influence radius r infl , using a common definition of r infl as the radius where the interior mass M (< r) of the NSC equals twice the black hole mass (Merritt 2013). Comparing the interior mass profile in Fig. 18 as determined by the dynamical measurement with M• = 4.26 × 10 6 M⊙, we obtain r infl ≃ 94 ′′ = 3.8 pc.
The Milky Way is one of some 10 galaxies for which both the masses of the black hole and of the NSC have been estimated (Kormendy & Ho 2013). From these it is known that the ratio of both masses varies widely. Based on the results above we estimate the Milky Way mass ratio M•/MNSC = 0.12 ± 0.04, with the error dominated by the uncertainty in the total NSC mass.

Conclusions
Our results can be summarized as follows: • The density distribution of old stars in the central 1000 ′′ in the Galactic center of the NSC can be well-approximated by superposition of spheroidal NSC with scale ∼ 100 ′′ and a much larger nuclear disk component.
• The difference between the proper motion dispersions σ l and σ b cannot be explained by rotation alone, but is a consequence of the flattening of the NSC. The dynamically inferred axial ratio for the inner component at ∼ 50 ′′ is 1/q1 ≃ 1.35.
• The orbit structure of an axisymmetric two-integral DF f (E, Lz) gives an excellent match to the observed doublepeak in the v l -proper motion velocity histograms, as well as to the shapes of the vertical v b -proper motion histograms. Our model also compares well with the symmetrized (even) line-of-sight velocity histograms.
• The rotation seen in the line-of-sight velocities can be modelled by adding an odd part of the DF, and this shows that the NSC is approximately described by an isotropic rotator model.
• Comparing proper motions and line-of-sight velocities through the model determines the NSC mass within 100 ′′ , the mass of the SMBH, and the distance to the NSC. From the star cluster data alone, we find M * (r < 100 ′′ ) = (9.26± 0.31|stat±0.9|syst)×10 6 M⊙, M• = (3.88±0.14|stat ±0.4|syst)× 10 6 M⊙, and R0 = 8.30 ± 0.09|stat ± 0.1|syst kpc, where the systematic errors estimate additional uncertainties in the dynamical modeling. The fiducial mass of the NSC is larger than in previous spherical models. The total mass of the NSC is significantly more uncertain due to the surrounding nuclear disk; we estimate MNSC = (3.9 ± 1.3) × 10 7 M⊙. The mass of the black hole so determined is consistent with results from stellar orbits around Sgr A * . The Galactic center distance agrees well with recent accurate determinations from RR Lyrae stars and masers in the Galactic disk, and has similarly small errors.
• Combining with the stellar orbit analysis of Gillessen et al. (2009), we find M• = (4.26 ± 0.14) × 10 6 M⊙ and R0 = 8.36±0.11 kpc. Because of the better constrained distance, the accuracy of the black hole mass is improved as well. Combining with the parameters of the cluster, the black hole radius of influence is 3.8 pc and the ratio of black hole to cluster mass is estimated to be 0.12±0.04.

APPENDIX A: TWO-INTEGRAL DISTRIBUTIONS FUNCTIONS
In this part we give implementation instructions for the 2I-DF algorithm of Hunter & Qian (1993, HQ). We will try to focus on the important parts of the algorithm and also on the tests that one has to make to ensure that the implementation works correctly. Our implementation is based on Qian et al. (1995) and made with Wolfram Mathematica. For the theory the reader should consider the original HQ paper.
We will focus on the even part of the DF and for the case where the potential at infinity, Ψ∞, is finite and therefore can be set to zero. First one partitions the (E, η) space where η ≡ Lz/Lz max(E) takes values in (0, 1). The goal of the HQ algorithm is to calculate the value of the DF on each of these points on a 2D grid and subsequently end up with a 3D grid where we can apply an interpolation to obtain the final smooth function f (E, Lz). The energy values on the 2D grid are placed logarithmically within an interval of interest [Emin, Emax] (higher Emax value is closer to the center) and the values of η are placed linearly between 0 and 1. Physically allowable E and Lz correspond to bound orbits in the potential Ψ and therefore E > 0. In addition at each energy there is a maximum physically allowed Lz corresponding to circular orbits with z = 0. This is given by the equations: where Rc is the radius of the circular orbit and the value Lz max ≡ Lz(Rc) is the maximum allowed value of Lz at a specific E. The Lz max(E) function can be found by solving the 1st equation for Rc and substituting in the second one therefore making a map E → Lz max. The value of the potential of a circular orbit with energy E is denoted by Ψenv(E) and can be found from Ψenv(E) = Ψ(R 2 c , 0) after solving the 1st of equation (A1) for Rc. The value Ψenv(E) is important for evaluation of f (E, Lz) and it is used in the contour of the complex integral.
To calculate the even part f+(E, Lz) of the DF for each point of the grid we have to apply the following complex contour integral on the complex ξ-plane using a suitable path: where the subscripts denotes the second partial derivative with respect to the first argument. A possible path for the contour is shown in figure A1. The loop starts at the point 0 on the lower side of the real ξ axis, crosses the real ξ axis at the point Ψenv(E) and ends at the upper side of real ξ axis. The parametrization of the path in general could be that of an ellipse: where h is the highest point of the ellipse. The value of h should not be too high because we want to avoid other singularities but not too low either to maintain the accuracy. We optimize our implementation by integrating along the upper part of the loop and multiply the real part of the result by 2 (this is because of the Schwarz reflection principle). In order to calculate the integrand of the integral we need the following transformation: ρ11(ξ, R 2 ) = ρ22(R 2 , z 2 ) [Ψ2(R 2 , z 2 )] 2 − ρ2(R 2 , z 2 )Ψ22(R 2 , z 2 ) [Ψ2(R 2 , z 2 )] 3 (A4) in which each subscript denotes a partial differentiation with respect to z 2 . This equation is analogous to equation (7) of the spherical case. In additionρ is the density considered as a function of ξ and R 2 as opposed to R 2 and z 2 . The integrand of the contour integral A2 depends only on θ angle for a given (E, Lz) pair. Therefore we need the maps R → ξ and z → ξ in order to find the value of the integrand for a specific θ. The first map is given by R 2 = 1 2 L 2 z /(ξ − E). The second is given by solving the equation ξ = Ψ L 2 z 2(ξ−E) , z 2 for z. It is very important that the solution of the previous equation corresponds to the correct branch in which the integrand attains its physically achieved values. In order to achieve that for each pair (E, Lz max) we start at the point ξ = Ψenv(E)(θ = 0) which belongs to the physical domain and we look for the unique real positive solution. For the next point of the contour we use as initial guess the value of z from the previous step that we already know that belongs to the correct branch. Using this method we can calculate the integrand in several values of θ then make an interpolation of the integrand and calculate the value of the DF using numerical integration. Figure A2 shows the shape of the DF for η = 0.5 for the potential we use in the fourth section of the paper for one value of h, using the aforementioned procedure. We notice that for large energies fluctuations of the DF appear. In order to solve this we can generalize the h value of the contour to h = h(E). The h(E) could be a simple step function that takes four or five different values. For our model the h(E) function is a decreasing function of E. This means that the minor axis of the ellipse should decrease as the E increases to avoid such fluctuations. In general we can write h = h(E, Lz) so that the contour depends both on E and Lz.
Once we implement the algorithm it is necessary to test it. Our first test is to check that the lower half of the integration path in figure A1 is the complex conjugate of the upper half. Probably the next most straightforward test is against the spherical case. It is possible to use the HQ algorithm to calculate a DF for spherical system. This DF should be equal to that obtained from Eddington's formula for the same parameters. After calculating our 2I-DF we compare its low-order moments with those of Jeans modeling. The 0th and 2nd moments of the DF (the 1st is 0 for the even part) are given from the integrals.
Comparison with the 0th moment (density) is straight forward since the density is analytically known from the start. The 1st moments should be 0 within the expected error. In our implementation the error between Jeans modeling and the DF is of the order of 10 −3 within the area of interest. An additional test would be to integrate the VPs over the velocity space. Since the VPs integrals are normalized with the surface density the integral of a VP over the whole velocity space should be 1 within the expected error. APPENDIX B: VELOCITY HISTOGRAMS FOR THE 2-I MODEL Figure B1. VHs and VPs in the l and b directions predicted by the 2I model in angular bins. The reduced χ 2 is also provided. The size of the bins is 0.6mas/yr (∼ 23.6 km/s) for the upper two plots and 0.5mas/yr (∼ 19.6 km/s) for the rest of the diagrams. The right column shows which cells have been used for the VHs and VPs. Figure B2. VHs and VPs in the l and b directions predicted by the 2I model in radial bins. The reduced χ 2 is also provided. The size of the bins is 0.5mas/yr (∼ 19.6 km/s) for the 1st and 4th column and 0.6mas/yr (∼ 23.6 km/s) for the rest of the diagrams. The right column shows which cells have been used for the VHs and VPs.  Figure B3. VHs for the symmetrized los data compared with the corresponding even VPs of the model. The reduced χ 2 is also provided. The size of the bins is 40km/s. For the upper left we use stars with 20 ′′ < |l| < 30 ′′ and |b| < 20 ′′ , for the upper right stars with 30 ′′ < |l| < 40 ′′ and |b| < 20 ′′ , for the bottom left 40 ′′ < |l| < 50 ′′ and |b| < 20 ′′ , and for the bottom right 50 ′′ < |l| < 70 ′′ and |b| < 20 ′′ .