Chemodynamical models of our Galaxy

A chemodynamical model of our galaxy is fitted to data from DR17 of the APOGEE survey supplemented with data from the StarHorse catalogue and gaia DR3. Dynamically, the model is defined by action-based distribution functions for dark matter and six stellar components plus a gas disc. The gravitational potential jointly generated by the model's components is used to examine the galaxy's chemical composition within action space. The observational data probably cover all parts of action space that are populated by stars. The overwhelming majority of stars have angular momentum J_\phi>0 implying that they were born in the Galactic disc. High-alpha stars dominate in a region that is sharply bounded by J_\phi \la J_\phi(solar). Chemically the model is defined by giving each stellar component a Gaussian distribution in ([Fe/H],[Mg/Fe]) space about a mean that is a linear function of the actions. The model's 47 dynamical parameters are chosen to maximise the likelihood of the data given the model in 72 three-dimensional velocity spaces while its 70 chemical parameters are similarly chosen in five-dimensional chemo-dynamical space. The circular speed falls steadily from 237\kms at R=4\kpc to 218\kms at R=20\kpc. Dark matter contributes half the radial force on the Sun and has local density 0.011\msun\pc^{-3}, there being 24.5\msun\pc^{-2} in dark matter and 26.5\msun\pc^{-2} in stars within 1.1\kpc of the plane.


INTRODUCTION
ESA's Gaia satellite provides locations and space velocities for tens of millions of stars (Gaia Collaboration et al. 2021Collaboration et al. , 2022)).In anticipation of the arrival of Gaia astrometry, several teams around the world have been accumulating the spectra of millions of stars at higher resolution than Gaia can achieve and using these spectra to derive the stars' chemical compositions, which are expected to yield insight into our Galaxy's history.The APOGEE survey (Majewski et al. 2017) is particularly powerful in this respect because, being based on the mid-infrared H band, it can probe the disc nearer the plane and over a wider radial range than other surveys, which are more strongly restricted by dust.
Since the middle of the 20th century we have known that the ages and chemical compositions of stars vary systematically with their locations and velocities (Roman 1950(Roman , 1999;;Eggen et al. 1962;Gilmore & Wyse 1998;Fuhrmann 2011).With the emergence of the theory of nucleosynthesis (Burbidge et al. 1957) and models of the chemical evolution of the ISM (Tinsley 1980;Pagel 1997) conviction grew that ⋆ E-mail: binney@physics.ox.ac.uk by studying the chemodynamical structure of our Galaxy we should be able to trace its history (Freeman & Bland-Hawthorn 2002).Over the last two decades efforts to realise this goal have taken two lines of attack.One line centres on simulations of galaxy formation that include gas, stars and dark matter, usually in some sort of cosmological context (e.g.Brook et al. 2004;Grand et al. 2017).Another line of attack models the Galaxy as a series of annuli within which stars form from gas that they simultaneously enrich (Matteucci & Francois 1989;Chiappini et al. 1997;Schönrich & Binney 2009a;Schönrich & McMillan 2017;Sharma et al. 2021;Chen et al. 2022).Strengths of the latter line of attack include the ability to fit models to the very detailed data now available for our Galaxy and to develop understanding of how specific physical processes, such as radial migration and the late arrival of type Ia supernovae manifest themselves in observational data.
The central premise of Schönrich & Binney (2009a, hereafter SB09) and much prior work is that all disc stars were born on nearly circular orbits in the plane from gas that is azimuthally well mixed, so its metallicity [Fe/H] and α-abundance [Mg/Fe] are functions of Galactocentric radius R and look-back time τ .Since the gross chemistry of stel-lar atmospheres evolves little, to a good approximation it then follows that the location ([Fe/H],[Mg/Fe]) of a star in the chemical plane is an (a priori unknown) function of its birth radius R b and age.SB09 modelled the functions [Fe/H](R b , τ ) and [Mg/Fe](R b , τ ) by adopting a radial profile of star formation and following the production of heavy elements and the dispersal of these elements by radial flows and winds.This effort lead to predictions for the number and chemistry of stars born at each radius and time.
Fluctuations in the Galaxy's gravitational potential cause the orbits of stars to drift (e.g.Binney & Lacey 1988).Sellwood & Binney (2002) divided this drift into (i) 'blurring', which is the drift away from circular orbits to more eccentric and inclined orbits, and (ii) 'churning', by which stars change their angular momenta without increasing their random velocities.By adjusting the intensity of blurring and churning, SB09 were able to match the distribution of solarneighbourhood stars in chemical space.Remarkably, the observed bimodality of the chemical distribution emerged naturally in a model in which the star-formation rate declined monotonically with time and radius.The bimodality was a consequence of the rapid decline in [Mg/Fe] about a gigayear after the start of star formation as type Ia supernova set in.
The methodology of SB09 has been extended in various directions.Chen et al. (2022) updated their work by (a) using recent nucleosynthetic yields, and (b) comparing the model predictions at 24 locations (R, |z|) in the Galaxy rather than just in the solar neighbourhood.This important latter step was made possible by DR14 of the APOGEE survey.Sharma et al. (2021) had previously fitted models to these data but instead of deriving chemical compositions from a model of star formation and nucleosynthetic yields, they specified a functional form for [Fe/H](R b , τ ) that contains parameters to be fitted to the data.They further assumed that [Mg/Fe] is a function of [Fe/H] and age that has a specified functional form, and fitted the form's parameters to the data.
Both Sharma et al. (2021) and Chen et al. (2022) followed SB09 in adopting a Schwarzschild DF f (ER, J φ , Ez) (e.g.Binney & Tremaine 2008, §4.4.3).Several powerful arguments favour use of DFs that are functions f (J) of the action integrals rather than the approximate energies ER and Ez (e.g.Binney & McMillan 2016), so Sanders & Binney (2015) reformulated SB09 in terms of action-based DFs.Specifically, they assumed that in the absence of churning, the DF of each coeval cohort of disc stars would have the form of the quasi-isothermal DF that was introduced by Binney & McMillan (2011).Blurring was represented by the radial and vertical velocity dispersion parameters of the DF being functions of age, and the cohort's current DF was obtained by convolution of this quasi-isothermal DF with a kernel that represented diffusion in J φ .Unfortunately at that time model-data comparisons were only possible in the solar neighbourhood, but the work of Sharma et al. (2021), in which a model was successfully fitted to the wide-ranging APOGEE data, is methodologically similar to Sanders & Binney (2015).
These studies approach the data with a clear preconception of our Galaxy's history.Our aim here is to be more datadriven: regardless of history, how are the Galaxy's stars distributed in 'chemodynamical space' -the five-dimensional space spanned by the actions, [Fe/H] and [Mg/Fe]?Logi-cally this question should precedes the question of why stars are distributed as they are.A map of this distribution would be an invaluable descriptor of our Galaxy that would transcend theories of galaxy formation.Binney & Vasiliev (2023, hereafter BV23) used the AGAMA software package (Vasiliev 2019) to fit to Gaia DR2 data a model in which both stars and dark matter were represented by DFs of the form f (J) and moved in the potential Φ(x) that they and interstellar gas jointly generate.Here we revise and extend this model: we revise it by modelling the Galaxy's bulge as a fat disc rather than a spheroid; we update it by fitting to data from APOGEE and the third rather than the second Gaia data release, and we extend it by assigning a chemical model to each of its stellar components.
In Section 2 we describe the stellar sample that we have analysed.Section 3 first examines how the means of [Mg/Fe] and [Fe/H] vary with location in action space and then studies the stellar density and mean values of the actions in four regions of action space within which particular stellar components are expected to dominate.Section 4 introduces a scheme for modelling the Galaxy's chemodynamical structure: Section 4.1 explains how we extend a model based on standard DFs f (J) to a chemodynamical model; Section 4.2 defines the functional forms we have assumed for f (J); Sections 4.3 to 4.6 specify the values of the model parameters from which data-fitting commenced, explain our strategy for dealing with the survey's selection function, and define the likelihood that the search maximises and the maximising procedure.Section 5 describes the model fitted to the data and discusses the quality of the fit it provides.Section 6 compares the scope of the present model to that of the widely used Besançon model, and Section 7 sums up and identifies next steps.Appendix A validates our procedure for maximising a likelihood.

THE DATA
From the 17th data release of the Sloan Digital Sky Survey (Abdurro'uf et al. 2022) we selected data for stars that have Gaia DR3 astrometry (Gaia Collaboration et al. 2021Collaboration et al. , 2022) ) and parameters from the StarHorse catalogue (Queiroz et al. 2018;Anders et al. 2022;Queiroz et al. 2023).We removed stars with a probability > 0.5 of belonging to a globular cluster according to Vasiliev & Baumgardt (2021), stars with the STAR WARN bit (7) of the ASPCAP FLAG set, and stars with a StarHorse distance with an uncertainty larger than 0.75 kpc.We further required the astrometric 'fidelity' parameter of Rybizki et al. (2022) to exceed 0.5.Finally, the sample was restricted to giants by requiring log g < 3.5 and T eff < 5500.We convert heliocentric data to galactocentric coordinates assuming the Sun's phase-space coordinates are (R, z) = (8.27,0.025) kpc (GRAVITY Collaboration et al. 2022) and (VR, Vz, V φ ) = (14, 7, 251) km s −1 (Schönrich 2012;Reid & Brunthaler 2020).
Fig. 1 shows the spatial distribution of the sample's stars projected onto the xy plane and xz planes in the upper and lower panels, respectively.The upper panel shows very clearly the rapid variation of the selection function inevitable in a pencil-beam spectroscopic survey.Also evident is the strong bias towards the Sun inevitable in a magnitudelimited survey.Notwithstanding these regrettable character-

ACTION-SPACE CHEMISTRY
We computed the giants' phase-space coordinates and from them computed the actions Jr, Jz and J φ in the gravitational potential of the self-consistent Galaxy model that is presented in Section 5 below.In this potential 217 863 stars are bound and 97 unbound; actions cannot be computed for unbound stars, so these stars were eliminated from the sample.

Mean values of [Fe/H] and [Mg/H]
The panels of Fig. 2 show the mean value of [Mg/Fe] in cells in action space, while Fig. 3 shows mean values of [Fe/H].The left columns show projections onto the (J φ , Jz) plane grouped by their values of Jr, while the right columns show projections onto the (J φ , Jr) plane grouped by values of Jz.In the left columns the values of Jr increase from the bottom panel upwards, while in the right columns the values of Jz increase upwards -the relevant range in Jr or Jz is shown at top-right of each panel.
Stars on orbits that are either nearly circular or in the plane contribute, respectively, to the bottom left or right pair of panels in each figure, while stars on orbits that are either highly eccentric or highly inclined contribute to the top pair of panels.
Fig. 4 shows the number of stars contributing to the bottom and top left panels of Figs. 2 and 3.These numbers are strongly influenced by APOGEE's selection function.In particular, the lower panel for low Jr shows a strong concentration around the Sun's location.The plots of mean [Mg/Fe] and [Fe/H] in Figs. 2 and 3 show no sign of this bias.
Note the extraordinarily wide coverage in J φ -at low Jz there are stars with J φ down to zero from a maximum value that exceeds twice the Sun's value of J φ (∼ 2000 kpc km s −1 ).This wide coverage of J φ at small Jr, Jz is possible because APOGEE extends to low Galactic latitude b and covers an unprecedentedly wide range in Galactic radius R, so it includes stars on near circular orbits at a wide range of radii.
In every panel of Figs. 2 to 4 the populated region has a sharp left edge that lies just to the left of the line J φ = 0.If stars were in fact strictly confined to J φ > 0, observational errors would still cause some stars to scatter to J φ < 0. Indeed, a small over-estimation of the distance s to a star at ℓ ≃ 0 and R ≪ R0 can move a star from the near to the far side of the Galactic centre without much effect on its apparent velocity, and thus reverse the sign of its measured J φ .The sharpness of the left boundaries of the populated regions in Figs. 2 to 4 attest to the accuracy of the StarHorse distances used to compute J.
The near total confinement of stars to J φ 0 is a clear indication that the overwhelming majority of stars were born in a disc within our Galaxy -stars accreted from other galaxies would end up on orbits with both signs of J φ with roughly equal probability.A significant part of the stellar halo is thought to comprise such accreted stars, and as a consequence the halo shows negligible net rotation.The stellar halo should be dominant at small J φ and significant Jr and/or Jz.Hence the sharpness of the boundary at J φ = 0 in all the panels of Figs. 2 to 4 implies that the stellar halo contributes rather few stars to the sample.It may well be that halo stars, being metal-poor and thus weak-lined, have trouble passing our spectroscopic quality cuts.Moreover, our requirement that a star's distance uncertainty should not exceed 0.75 kpc will have removed many distant halo stars.A better requirement might be upper limits on the uncertainties in ln R and ln |z| because what really matters is the fractional uncertainties in R and |z| not the absolute uncertainty in distance.
There are several remarkable features of Fig. 2: • Along the J φ axis of the bottom left panel of Fig. 2 there is a narrow orange region indicative of solar [Mg/Fe].This is the low-α disc.Above it a blue-shaded region of high [Mg/Fe] fills the interior of a U centred on J φ ∼ 1200 kpc km s −1 .To left and right as well as below, this region transitions sharply to yellow shades indicative of lower [Mg/Fe].The blue region is part of the high-α disc.
• In this bottom-left panel, to the right of J φ ≃ 2000 kpc km s −1 yellow shades extend to the highest populated values of Jz ∼ 150 kpc km s −1 .This phenomenon clearly shows that low-α stars, even ones on significantly inclined orbits, dominate at J φ 2000 kpc km s −1 .It implies that the high-α disc has a remarkably sharp outer edge at roughly R0.
• As one proceeds up the left column of Fig. 2 through samples of stars with increasing Jr, the orange/yellow region of the thin, low-α disc yields ground to the blue high-α region.This phenomenon indicates that the high-α disc extends down to Jz = 0; at low Jr it is overwhelmed by the low-α disc, but comes to the fore as Jr increases because its DF decreases less rapidly with increasing Jr.
• Similar trends are evident as one proceeds up the right column of Fig. 2 through samples of stars with increasing Jz, except that in the range 500 kpc km s −1 < J φ < 2000 km s −1 kpc the yellow shades of low-α stars recede more rapidly: in fact they have already disappeared from the sample with 50 kpc km s −1 < Jz < 100 kpc km s −1 .This phenomenon indicates that the main body of the low-α disc covers a wider range in Jr than in Jz, as is natural in a 'thin' disc.
• The V of green colours that pushes down at J φ ≃ 1500 kpc km s −1 in the bottom right panel of Fig. 2 suggests that within the low-α disc Jr has a minimum just interior to the Sun.
• Whereas in the bottom left panel of Fig. 2 brown shades are confined to a narrow band above the J φ axis, in the bottom-right panel at J φ > 2000 kpc km s −1 they extend to Jr 50 kpc km s −1 .This indicates that the outer low-α disc is radially hot.
• In the top two panels of the right column of Fig. 2, blue colours indicative of high-α extend right down to the J φ axis except at very low |J φ | and J φ > 2200 kpc km s −1 .This indicates that at Jz 100 kpc km s −1 the high-α disc dominates outside the bulge and the outer disc.Fig. 3 shows mean values of [Fe/H] in the format used to display [Mg/Fe] in Fig. 2. Blue shades now imply low metallicity, so the high-α disc with [Fe/H] ∼ −0.7 is coloured cyan, wile the metal-rich, inner low-α disc is coloured yellow.With this colour scheme very similar patterns are observed in Fig. 3 to those discussed above for Fig. 2: high-α often implies low-metallicity.The most striking difference occurs in the region J φ > 2000 kpc km s −1 : in Fig. 3 this region is largely green because the outer low-α disc is metal poor.However in the bottom right panel of Fig. 3 a tongue of yellow is evident at J φ ∼ 2000 kpc km s −1 that signals that the low-α disc becomes radially excited before it turns metalpoor.
The bottom-left panel of Fig. 3 (for Jr < 50 kpc km s −1 ) shows [Fe/H] > 0 is confined to small Jz except at small |J φ |.The bottom-right panel shows that at small Jz and |J φ |, [Fe/H] > 0 occurs predominantly at large Jr.These facts are consistent with the idea that metal-rich stars populate the kind of tightly bound, eccentric, low-inclination orbits that form the backbone of the bar.In the bottom right panel of Fig. 3 (for 0 < Jz < 50 kpc km s −1 ) in the range 1400 kpc km s −1 < J φ < 2200 kpc km s −1 a yellowbrown ridge reaches upwards: this is the signature of a vertically thin annulus of fairly metal-rich stars on eccentric orbits.
Figs. 2 and 3 bear comparison with Fig. 31 of Gaia Collaboration et al. (2023), which shows the distribution of more than 5 million stars in the (J φ , Jr) plane.The chemistry of these stars was determined from Gaia radial velocity spectra, and in four of its panels, cells are coloured by median [M/H] or median [α/Fe].The left panels plot over a wide range in (J φ , Jr) while the right panels zoom into the most densely populated part of the plane : 1000 kpc km s −1 < J φ < 3000 kpc km s −1 and Jr < 100 kpc km s −1 .One of the zoomed panels shows steep drops in [M/Fe] along two lines that slope up and to the left, intersecting the J φ axis at J φ ≃ 2000 and 2130 kpc km s −1 .One of these features is probably related to the drop apparent in the lower-right panels of Fig 3 but the correspondence is not clear.The bottom-right panel of the figure in the Gaia paper shows a decline in [α/Fe] around J φ = 2000 kpc km s −1 , but again the line of steepest descent slopes up and to the left rather than running straight up, and it isn't as sharp as that seen in Fig. 2 (and in Fig. 20 below).In this connection, note that the Gaia paper tracks [α/Fe] through lines of calcium rather than magnesium.To gain further insight we now explore distributions in the ([Fe/H],[Mg/Fe]) plane for the four regions of the (J φ , Jz) plane that are marked in Fig. 5 over a plot of the standard deviation in [Mg/Fe] in the slice of action space with smallest Jr.The standard deviation tends to be large where components with differing chemistry overlap.

The ([Fe/H], [Mg/Fe]) plane
The first of these regions, labelled A, lies along the Jz axis at Jz > 10 kpc km s −1 and J φ < 300 kpc km s −1 .Stars in this region are on highly inclined orbits, so they dominate the stellar density near the z axis.We call this the bulge/halo region.The second region, B, occupies the heart of the high-α region of action-space -Jz > 50 kpc km s −1 and 300 < J φ < 2000 kpc km s −1 .The third region, labelled C, lies just above the J φ axis at J φ > 300 kpc km s −1 .
Stars in this region are on nearly planar orbits, so will be predominantly thin-disc stars.The fourth region, D, at J φ > 2500 kpc km s −1 , is dominated by the outer, flaring disc.
The contours in Fig. 6 show the chemical structure of these four regions: the bulge/halo, high-α disc, thin-disc and outer disc regions are depicted by rows from top to bottom.In the left, centre and right columns the colours show mean values of J φ , Jz and Jr, respectively.It should be borne in mind that the stellar densities contoured, unlike the mean values plotted in Fig. 2, reflect the APOGEE selection function.Most significantly, stars near the Sun are over-represented.
• The top row of Fig. 6 shows that the bulge/halo region is indeed a superposition of two components, one centred on ([Fe/H],[Mg/Fe]) ≃ (−0.65, 0.35) and the other on ([Fe/H],[Mg/Fe]) ≃ (0.36, 0.03).The first population has a tail that extends to [Fe/H] ∼ −1 at slightly lower [Mg/Fe].It is probably a mixture of the high-α disc and the stellar halo with the former dominating.The metal-rich, low-α component is presumably the bulge.The mean values of J φ and Jr shown by the colours vary little with chemistry, although there is a marginal tendency for J φ to decrease with [Fe/H] and to be negative at [Fe/H]< −1.The central panel shows that Jz increases systematically with decreasing [Fe/H].That is, Jz is lowest in the bulge, highest in the halo and intermediate in the high-α disc.
• The second row in Fig. 6 shows the chemistry of the high-α region.The dominant component is centred on ([Fe/H],[Mg/Fe]) ≃ (−0.49, 0.33), with a tail sloping down to solar chemistry (0,0) and beyond.Within the dominant, high-α component, the left panel shows that any gradient in J φ is weak, although at [Fe/H]< −1 there is a slight tendency for J φ to decrease with [Fe/H].The central panel shows a systematic increase in Jz with decreasing [Fe/H] within the high-α component but no evidence of an analogous gradient in the low-α component.
• The thin-disc region shown in the third row of Fig. 6 comprises a dominant component centred on ([Fe/H],[Mg/Fe]) ≃ (−0.06, 0.036) that must be the main body of the low-α disc.On top we see a significant tail from the high-α disc.Thus the high-α disc makes a significant contribution to the thin-disc region.The right panel of the third row shows that in the dominant component, Jr increases with [Mg/Fe] -this is likely the signature of stochastic heating: stars with larger [Mg/Fe] are older and kinematically hotter.Another effect of age increasing with [Mg/Fe] is the widening spread in [Fe/H] with increasing [Mg/Fe]: older stars have migrated further with the consequence that in the over-represented solar-neighbourhood group an older population displays a wider spread in [Fe/H].The colours in the left panel of the third row indicate that the dominant component has a clear gradient in J φ , which must be a manifestation of the familiar metallicity gradient in the disc (e.g.Méndez-Delgado et al. 2022

, and references therin).
There is no sign of an analogous gradient in the high-α component.Surprisingly, the third row shows a tendency for Jz to increase with [Mg/Fe] -given that stars contributing to this row have by construction Jz < 10 kpc km s −1 , it is puzzling to observe significant variation of Jz .The signal is unmistakable nonetheless.Stars with exceptionally low Jz  Taken together the second and third rows confirm the presence of two populations that coexist at many points in action space.The low-α population dominates at low Jz and the high-α population dominates at high Jz.
The first and second rows show that stars metal-poorer than [Fe/H] ≃ −0.8 are only encountered at [Mg/Fe] 0.2.Moreover, these stars are confined to Jz 200 kpc km s −1 and Jr ≃ 100 kpc km s −1 .
• The bottom panels of Fig. 6 show the chemistry of the outer, flaring disc.It is almost entirely accounted for by a single population centred on ([Fe/H],[Mg/Fe]) ≃ (−0.4,0.1).This low-α population shows the gradient in J φ that we encountered in the inner low-α disc, so we have every reason to suppose that this population is just an extension of the dominant component in the third row (the low-α disc).Moreover, the central and right panels of the fourth row show the same trends in Jz and Jr we encountered in the third row, and attributed to increasing age and stochastic heating with [Mg/Fe].
Perhaps the most striking aspects of Fig. 6 are two 'dogs that didn't bark': (i) the absence of stars at low-metallicity and low-α in the top row (bulge/halo region) and (ii) the complete disappearance of the high-α population between the second and fourth rows (high-α and outer disc regions).Another notable result is the contrast between the strong gradients in J φ , Jz and Jr in the low-α disc and the extremely weak gradients in mean actions in the high-α component.
Fig. 7 reveals key differences in how low-and high-α stars are distributed in action space by plotting projections of the sample onto the (J φ , Jz) plane in the upper, and the (J φ , Jr) plane in the lower, pairs of panels.The absence of high-α stars at J φ > 3000 kpc km s −1 is striking, as is the extent to which the low-α stars extend to high Jz at both small and large J φ while being restricted to lower Jz at intermediate J φ .In Jr, the pattern of the low-α stars is very different: the populated region reaches highest in Jr at intermediate J φ even though stars with large |J φ − J φ⊙ | can reach the Sun, and thus boost their chances of entering the sample, only if they have large Jr.The wide spread in Jr at J φ ≃ J φ⊙ may be the result of resonant scattering by spirals, while the wide spread in Jz at large J φ might be a legacy of tidal interactions with objects in the dark halo.
It is worth noting how much the bottom panel of Fig. 7 differs from similar plots of stellar density in the (J φ , Jr) plane for stars near the Sun (e.g.Hunt et al. 2019;Trick et al. 2019): here we see no lines associated with resonances.
To see these features, the sample must be restricted to a narrow band in radius or distance from the Sun because (as Fig. 7 demonstrates) resonant orbits are not more-or less-populated than neighbouring non-resonant orbits.They nonetheless appear to be so if we impose a sufficient selection in angle variables θ because resonant terms in the Hamiltonian shuffle stars between the actions that are the axes of the plots.This motion is correlated with θ, so at some angles the density of stars has been enhanced at particular actions, while at other angles the density at these actions has been diminished.A sample that is strongly selected in radius or distance from the Sun is biased in both θr and θ φ and thus reveals the effects of perturbing terms in a way that the broadly selected sample of Fig. 7 does not.
The sample of stars with high-quality RVS spectra analysed by Gaia Collaboration et al. (2023) covers quite a wide range in R and distance from the Sun but it is much more concentrated around the Sun that the APOGEE sample.This may be why it shows ridges in the (J φ , Jr).As the Gaia paper remarks, it is not evident that its ridges are the same as those presented by Hunt et al. (2019) and Trick et al. (2019).The ridges of the Gaia paper are most evident in its plot of median values of zmax, the greatest distance a star moves from the plane.Fig. 8 is a plot of the mean value of Jz that ought to be closely related to a plot of mean values of zmax.Any ridges analogous to those in the Gaia paper are extremely indistinct.

MODELLING THE DATA
We now turn to the construction of a chemodynamical model of our Galaxy that reproduces as closely as possible the trends discovered in the last section.(1) An excellent way of summarising the chemodynamical structure of our Galaxy would be to decompose it into components that individually have simple EDFs.Specifically, we seek components that have analytic dynamical DFs f (J) that are extended to EDFs by multiplication by an analytic probability density P (c|J) that gives the probability that a star with actions J has the chemistry c.Any EDF F (c, J) can be written as a product f (J)P (c|J) -simply define f (J) ≡ d 2 c F (c, J) and P (c|J) ≡ F (c, J)/f (J) and is trivial to show that d 2 c P = 1 -but we want to define our components so P (c|J) is analytic.Non-negativity is a fundamental property of a probability density function (pdf), and multi-Gaussian expansions are widely used to approximate non-negative functions analytically.In this spirit we assume that P (c|J) can be approximated by a Gaussian distribution in c with mean and dispersion depending on J.
The general Gaussian two-dimensional probability density can be written where K is a 2 × 2 symmetric matrix and the subscripts imply dependence on J -in order to limit the number of parameters to determine, we assume that K is independent of J.A simple assumption is that cJ depends linearly on J: where c0 is a two-component object and C is a 2 × 3 matrix.Without loss of generality, we can choose the reference actions J0 = (0, 0, VcR0) to be similar to those of the Sun, with the implication that c0 becomes the mean chemistry of stars in the given component that are on solar-type orbits.

DFs of the components
We follow BV23 in modelling the stellar component of our Galaxy with six DFs f (J).Five of these are instances of a generalisation of the exponential DF introduced by Vasiliev (2019).The overall DF is a product of factors The function fr controls the breadth of the distribution in Jr and thus the velocity dispersions σR and σ φ near the equatorial plane.The function fz similarly controls the breadth of the distribution in Jz and hence controls both the thickness of the disc and the velocity dispersion σz.The other three factors control the disc's radial profile.On its own, the factor f φ generates a roughly exponentially declining surface density Σ(R) ≃ exp(−R/R d ).The factor fint tapers this profile towards the centre.At a theoretical level, this factor is motivated by the notion that the central portion of the disc has morphed into the bar/bulge leaving a depression at the centre of the surviving disc.At an empirical level, modelling the young disc through the observed distribution of  OB stars, Li & Binney (2022) concluded that an exponential that extends right to the centre predicts too many stars at small radii.The factor fext truncates the disc at some outer radius.This feature is motivated by the sharp transition from blue to yellow at J φ ≃ 2000 kpc km s −1 in Fig. 2, which suggests that the high-α disc has a sharp outer edge.These factors take the form where Here the action Jint determines the characteristic radius of the central depression and Dint is a parameter that determines the sharpness of its boundary.Similarly, the action Jext determines the disc's outer truncation radius and Dext sets the sharpness of the cutoff there.If Di is set to a negative value, fi = 1 is returned, ensuring that there is no central depression or radial truncation.
The functional forms adopted for fr and fz are essentially the same as those adopted by BV23: where Here J φ0 is the action that sets the disc's characteristic scale length and the action Ji0 sets the velocity dispersions σR and σz.The exponent pi controls the radial variation of these dispersions: the larger pi is, the faster the dispersions fall with radius.The action Jv , a surrogate for energy, is taken to be where Jv0 is a constant that controls the way the dispersions vary at small radii as discussed by BV23.The factor f φ has the form where with J d0 a constant that controls the way the surface density varies at small radii, as discussed by BV23.These formulae for fr, fz and f φ are the same as those in BV23 except that here Jv and J d depend on Jr and Jz in addition to J φ .Since disc stars typically have J φ ≫ Jr, Jz the additional dependence of Jv and J d is generally insignificant.It does, however, yield more plausible velocity distributions in the neighbourhood of V φ = 0. We follow BV23 in modelling the low-α disc as a superposition of three discs, presumed to be of increasing age and velocity dispersion: the young, middle-aged and old discs.Away from their centres, these discs are simple exponentials, but they may have central depressions in their surface density consistent with the bar/bulge having formed out of them.BV23 modelled the bulge by a spheroidal DF.Figs. 2, 3 and 7 indicate that this was a poor choice by suggesting that the bulge is almost entirely confined to J φ > 0, as is natural in a component that formed from a thin disc.Therefore we model the bulge as a truncated exponential, in which J φ0 is small and Jr0 and Jz0 are large.
The high-α disc is assumed to be a radially truncated exponential.The stellar halo is modelled by the same nonrotating spheroidal DF used by BV23.This probably provides a poor representation of the truth, but since the halo contributes very little to the APOGEE data, we defer improvement of its DF to future work.

Initial parameters
The chemodynamical model we are fitting has a large number of parameters.An automated search for a good model through a high-dimensional space is unlikely to succeed if started from a random location.Hence we started by handfitting the DFs to the dynamical data in the manner described by BV23.The resulting model differed from that of BV23 principally because the data employed extended from R = 1 kpc to 14 kpc rather than the narrower range R0 ± 3 kpc.
The starting parameters of the DFs were largely taken to be those determined by BV23.Experiment showed that central tapers in the surface densities of disc components tend to yield circular-speed curves that rise more slowly than the data imply, so in the final model only the young disc has a central taper.For the high-α disc, the sharp colour transition in Fig. 2  The other fresh choices required for the DFs were all the parameters of the bulge's DF.The bar/bulge extends out to R ≃ 3 kpc, which corresponds to Jext ≃ 800 kpc km s −1 , so Jext should be a value of this order; we started from Jext = 1000 kpc km s −1 and Dext = 200 kpc km s −1 .The bulge is a hot component, so we started from Jr0 = 100 kpc km s −1 and Jz0 = 50 kpc km s −1 .
Table 1 lists the initial values of the chemical parameters.On the left we have the tilt θ of the Gaussian ellipsoids with respect to the chemical axes 1 and the means x0, y0 and dispersions σx, σy of those Gaussians at the Sun's actionspace location.As we proceed from young disc to old and high-α, the mean value of [Fe/H] is presumed to decline while that of [Mg/Fe] is presumed to increase.The dispersions σx and σy increase along this sequence consistent with the evidence that stars diffuse from circular orbits as they age.The bulge is assumed to be metal-rich and α-normal, with a large dispersion σx.
The gradients of chemistry in action space are listed on the right of Table 1.Most values are set to zero.Exceptions are the coefficients C1,J φ that set the radial metallicity gradients in the thin disc and the stellar halo -these are set to a value similar to that, −0.31 ± 0.02 dex per Mpc km s −1 , that was inferred for d[Fe/H]/dJ φ by Spina et al. (2022) for open clusters.Other non-zero values are of C2,J r and C2,J z for the old disc; these imply that [Mg/Fe] increases with eccentricity and inclination, and a value for the high-α disc's C1,J z that implies strongly decreasing [Fe/H] with inclination.

Managing the APOGEE selection function
As remarked above, the contours in Fig. 6 depend on APOGEE's very complex, dust-dependent, selection function (SF) in addition to the EDF.Our approach to circumventing this problem is as follows.Our basic assumption is that the SF is independent of both velocity and chemistry but depends strongly on position.The independence of velocity is clear; less so is the independence of c since colour criteria were involved in the selection of stars.Nonetheless any dependence of the SF on c must be tiny by comparison with the dependence on x.
Since our model is axisymmetric and symmetric in z, velocity distributions are functions of (R, |z|) only.So we bin the real stars in the 72 bins in (R, |z|) space that are specified by Table 2 2 and determine the barycentre Xα of the stars in the αth bin -let Nα be the number of stars in this bin.Then, given a trial set of components and their EDFs, we determine the density ρiα = d 3 v, fi contributed by the ith component at Xα and let be n ∼ 5 times the fraction of real stars predicted to be contributed by the ith component.Now we create kiαNα mock stars by sampling velocity space at Xα under the velocity distribution specified by fi(J).Each chosen velocity Viα corresponds to an action Jiα, and we use this to choose a chemistry ciα by sampling the Gaussian P (c|Jiα).When this is done, each spatial bin contains nNα mock stars in addition to the Nα real stars.The mock stars, unlike the real stars, have known component memberships.BV23 should perhaps have stressed the novelty of their approach to mapping a galaxy's gravitational field.Competing methods, using the Jeans equations (e.g.Read & Steger 2017) or Schwarzschild modelling (e.g van den Bosch et al. 2008;Zhu et al. 2018), hinge on knowledge of the density ρ(x) of a stellar component.BV23 predict the densities of components, but they they use as an input only the density profile ρ(z) in the solar cylinder.The basic physical principle of the BV23 approach is most easily understood in the context of a one-dimensional problem.Let (x, v) be the system's phase-space coordinates and let Φ(0) = 0 with dΦ/dx 0, so x = 0 is the bottom of the potential well.Then at x = 0 stars of every energy E are present, and at x we find only stars with E Φ(x).Moreover, the velocity distribution in the vicinity of v = 0 at location X follows from the shape of the velocity distribution in the vicinity of v = 2Φ(X) at x = 0.In words: the potential relates velocity distributions at different locations, so it's logically possible to recover it from those velocity distributions without knowing the stellar density ρ(x).When the DF is isothermsl, f = F e −βE , the velocity distribution, n(v) = F e −βΦ(x) e −βv 2 /2 , is the same Gaussian at all locations and Φ cannot be recovered without knowing ρ(x).Fortunately, no stellar system can have an isothermal DF because the latter includes many unbound stars.

Defining the likelihood
Given a model potential Φ(x), a set of DFs and a model chemistry, we compute the log likelihood ln L of the data as follows.We distribute each bin's mock stars over a grid |z| 0 0.3 0.7 1 1.5 2 3 R 0.5 1.5 3 4 5 6 7 8 9 10 11.5 13 14.5 in velocity space by the cloud-in-cell algorithm (e.g.Binney & Tremaine 2008, Box 2.4).The velocity grid has n 3 g cells covering the cuboid with boundaries at ±VRmax, ±Vzmax and V φmin to V φmax , where VRmax = 2.5σR, Vzmax = 2.5σz, V φmin = −50 km s −1 and V φmax = V φ + 2.5σ phi .Here the grid size ng is proportional to the cube root of the number of real stars and σR etc are the standard deviations of the components of the velocities of the real stars: ng ranged up to 23 with average value 9.2.
The mass assigned to each grid cell is then divided by the number of mock stars in the relevant spatial bin.Then we use the cloud-in-cell algorithm to determine the mockstar density at the location of each real star and compute the contribution ln L α dyn of the αth spatial bin to the overall dynamical log likelihood as the sum over stars of the logarithms of these densities.In Appendix A we show that the maximum value of ln L α dyn under variation of the masses assigned to cells, subject to a fixed total mass on the grid, is achieved when the mass of mock stars in each cell coincides with the mass that would be obtained by distributing the real stars over the grid in the same way that the mock stars are distributed.
The dynamical log likelihood is the sum over spatial bins normalised by the number of real stars.This normalisation prevents the likelihood being dominated by grid cells that lie close to the Sun and are in consequence heavily populated: sparsely populated cells far from the Sun are powerful probes of the Galaxy's structure.The computation of the chemical contribution to the log likelihood is similar.We use the cloud-in-cell algorithm to distribute the mock stars over a five-dimensional grid ([Fe/H], [Mg/Fe], J φ , Jr, Jz).Then the mass assigned to each cell is divided by the number of mock stars, and ln L chem is computed as the mean of the logarithm of the grid density at the location of each real star.The final log likelihood is ln L = ln L dyn + ln L chem . ( An indication of how closely a model approximates the data can be obtained by distributing real rather than mock stars over the grids and then computing ln L as before by determining the grid density at the locations of the real stars.We call this the perfect-fit value of ln L. The perfect-fit and actual-fit values of ln L dyn for the chosen model are −6.460 and −6.573, while the corresponding values of ln L chem are −9.7737 and −9.7875.
We will see below that the algorithm chooses surprisingly large values for some dispersions σy and gradient coefficients Cij .We tested the robustness of thee choices by including in the quantity to be maximised a Gaussian prior term proportional to −σ 2 y and/or −C 2 ij .These priors had little effect.

A shortcut
The procedure just described assumes that the mock stars are drawn from the current DFs.It is expedient to be able to compute ln L from mock stars obtained by sampling a DF f0 that differs slightly from the current DF, f .To do this we weight each mock star by f (J)/f0(J), the ratio of the star's current probability density to the probability density when it was randomly drawn.The sum of these weights is the number of 'effective' mock stars and this is the number used to normalise the masses in cells after distributing mock stars on a grid.The need to re-sample was determined by monitoring the largest values of f (J)/f0(J).There is no need to re-sample after varying the chemical model.

Likelihood maximisation
We have used the Nelder-Mead algorithm encapsulated in the agama function findMinNdim to minimise − ln L. We alternated a few hundred steps in which the chemical model was fixed while the DFs were varied with a few hundred steps in which the DFs were fixed and the chemical model was varied.The code sought to maximise the sum of ln L dyn and ln L chem regardless of which parameters were being varied because ln L chem is useful when changing the DFs: it is sensitive to the relative masses of components and indicates whether a deficiency of stars at some phase-space location should be remedied by increasing the mass of the high-α disc or the old disc, for example.
When the DFs are varied without updating the potential Φ(x), the fit quality is invariant under multiplication of the masses mα of every component's DF by a common factor: with Φ fixed, the fit depends only on the relative masses of components.So the Nelder-Mead algorithm varied the DFs with the total stellar mass held constant.
After a few hundred Nelder-Mead adjustments of the DFs, the DF parameters were reviewed and potentially changed by hand before updating Φ to self-consistency.Then a new sample of mock stars was drawn.

RESULTS
Regardless of whether it is the DFs or the chemical model that is being adjusted, the Nelder-Mead algorithm yields a sequence of ln L values that increases rapidly at first and then more and more slowly.So one does not have the impression that it has located a global maximum.Hence the model presented here has the status of fitting the data about as well as any model we have tried rather than being a definitive model.The full curve in Figs 9 shows the circular-speed curve Vc(R) of the final potential; also shown are the contributions to Vc from the baryons (blue dotted curve) and dark matter (black dashed curve) along with several previous estimates of Vc(R) derived from tracers presumed to be on near-circular (Mróz et al. 2019;Ablimit et al. 2020), and a study of 23 000 red giants by Eilers et al. (2019).At R 4 kpc, where circular orbits are prohibited by the bar, the model curve deviates significantly from the triangular points from Wegg & Gerhard (2013), which describe the axisymmetric component of a non-axisymmetric potential.Clearly we cannot expect an axisymmetric model to reproduce the data for this region, but the extent of the conflict with earlier work suggests that our bulge is too massive and insufficiently compact.At R 4 kpc the model curve in Fig. 9 runs just above most of the points from previous studies.This is a consequence of taking V φ⊙ = 251 km s −1 for the Sun: when V φ⊙ is increased, the black (data) curves in the right columns of Figs. 13 to 15 move to the right and the potential has to be adjusted to make the red (model) curves follow them.Our value for V φ⊙ follows directly from the observations of Sgr A* and the assumption that this black hole can define the Galactic rest frame (Reid & Brunthaler 2020;GRAVITY Collaboration et al. 2022).Since the model predicts Vc(R0) = 234 km s −1 the Sun's peculiar velocity is V φ⊙ = 17 km s −1 .These values are consistent with the values Vc(R0) = 238 ± 9 km s −1 and V φ⊙ = 12 ± 9 km s −1 reported by Schönrich (2012).
The full curve in Fig. 10 shows the model density of stars as a function of |z| at the solar radius, while the points describe the observational estimates of this by Gilmore & Reid (1983) and Jurić et al. (2008), which can be moved ver-Figure 13.Velocity distributions within 24 spatial cells.The left column shows the distributions in V R , Vz and V φ in the cells that cover the equatorial plane and have Galactocentric radii R that increase from 1.1 kpc at the bottom to 13.5 kpc at the top.The right column shows the corresponding velocity distributions for the adjacent cells -these have barycentres at |z| ∼ 0.5 kpc.The numbers in the panels for Vz give the exact coordinates of the relevant barycentre.In each panel the velocity scale covers ±3σ i , where σ i is the standard deviation of the data histogram -the numbers along the bottom refer only to the bottom row of panels.
tically because they relate to star density rather than mass density.The agreement is satisfactory.The black dashed line shows the almost constant density of dark matter, while the blue curves show the density profiles of the young disc (full curve), the middle disc (dotted curve), the old disc (shortdashed curve) and the high-α disc (long-dashed curve).The red dotted curve shows the tiny contribution from the stellar halo.The dark halo makes the largest contribution to the density at |z| > 400 pc.
Comparison with Figs 14 and 15 in BV23 shows that the differences between the circular-speed curves of the two models are confined to the bar-bulge region R 5 kpcthe curve of the BV23 model rises much more steeply at R < 1 kpc.The BV23 discs are less massive, yield a shorter overall radial scale length R d , and have smaller scale heights -this is especially true of the old disc, for which the Jz0 parameter has jumped from 5 to 24 kpc km s −1 .At R > 10 kpc the old and high-α discs of BV23 have much lower σR and significantly lower σz, changes that are clearly mandated by the new data.The local densities of dark matter are almost identical in the two models.
The full blue curve in Fig. 11 shows the surface density of disc stars as a function of radius while the black dotted curve shows the surface density of all stars.At R 5 kpc the surface density falls nearly exponentially with scale length R d ≃ 3.6 kpc (marked by the red dashed curve).This is a significantly longer scale-length than was inferred by Robin et al. (2003)   The blue curves in the lower panel of Fig. 12 show the mean streaming velocities at z = 0 of the three thin-disc components and of the high-α disc.Asymmetric drift causes the streaming velocity to fall further and further below the circular speed as the velocity dispersions increase.The black dotted curve shows the rising mean streaming velocity in the bulge.
Tables 3 to 5 list the model's parameters, while Figs 13 to 18 show the resulting fits to data.These figures show that this model fits many aspects of the data very well while showing shortcomings in other aspects.

Fits to kinematics
Figs. 13 to 15 show, from left to right in each column, distributions of VR, Vz and V φ marginalised over the other two velocity components for the 72 spatial bins that were used to compute ln L. In the panels for VR and Vz the histograms cover ±3σ where σ is the standard deviation, while the histograms of V φ cover −50 km s −1 to V φ + 3σ.From bottom to top of each column the radius of the spatial bin increases, while different columns correspond to different values of |z|: the numbers at the bottom of each Vz histogram give the mean values of R and z of stars in the relevant bin.The black histograms show the APOGEE DR17 + Gaia DR3 data while the red histograms show the distributions of mock stars.
The largest discrepancies between data and model occur in V φ at small R and |z| (bottom right of each column in Fig. 13).These bins are dominated by the Galactic bar, so it is natural that our axisymmetric model fails to model the data well.More surprising is how well the model fits the VR and Vz distributions for these bins, and even provides passable fits to the V φ distributions above |z| = 0.7 kpc (lower right of the columns of Figs 14) and 15).
The model generally provides good fits to the V φ histograms at R 4 kpc.This result implies that the potential correctly gives the circular speed Vc(R) because, as BV23 explained, even a small mismatch between a model's circular speed and the true circular speed leads to an unmistakable displacement of the red and black curves in the V φ plots.If some V φ histograms for R > 4 kpc show discrepancies, it is because the model under-predicts stars with V φ ∼ 0.
In a few of the histograms for VR and Vz (see the top of Fig. 13) the red model curve fits the black data curve on one side much better than on the other.Since the red curves are by construction left-right symmetric (up to shot noise), such one-sided fits point to deviations from equilibrium in the data caused by spiral structure or tidal interactions, for example (e.g.McMillan et al. 2022;Khanna et al. 2023).

Parameters of the DFs
Tables 3 and 4 give the parameters of the fitted DFs.The middle disc is the most massive disc component: with 1.2 × 10 10 M⊙ it is 50 percent more massive than the old and highα discs and nearly three times as massive as the young disc.The bulge with 1.27 × 10 10 M⊙ is slightly more massive than the middle disc.The mass, ∼ 4 × 10 8 M⊙, of the stellar halo is gravitationally negligible.As discussed in Section 3, our sample may be biased against very metal-poor stars, so the mass we report for the stellar halo is likely an underestimate -for the halo within 100 kpc Deason et al. (2019) estimate luminosity L = 9.4 ± 2.4 × 10 8 L⊙ leading to mass M = 1.4 ± 0.4 × 10 9 M⊙ at least twice our value.
The young and middle discs have comparable scale actions J φ0 ∼ 1000 kpc km s −1 , while the old and high-α discs have much smaller scale actions, ∼ 500 and 400 kpc km s −1 , respectively.In the context of inside-out growth of galaxies, this result is to be expected.The parameter values Jint ∼ 190 kpc km s −1 and Dint ∼ 280 kpc km s −1 for the young disc imply that it does not differ strongly from a pure exponential.The high-α disc is quite sharply radially truncated near R0: Jext ≃ 2200, Dext ≃ 210 kpc km s −1 .
The scale actions Jr0 and Jz0 that control the in-plane and vertical velocity dispersions increase, as expected, along the sequence young disc, middle disc, old disc, high-α disc.The biggest jump in Jr0 occurs between the young and the middle disc, while the biggest jump in Jz0 occurs between the middle and old disc, so the disc with the largest velocity anisotropy is the middle disc.Such anisotropy is the signature of heating by spiral arms (e.g.Binney & Lacey 1988).
The value of Jr0 for the bulge is very similar to that of the high-α disc, while the bulge's value for Jz0 is only half that of the high-α disc.
The values of the parameter pr that controls the radial gradient of the in-plane velocity dispersions increases sys-tematically along the sequence young -high-α disc.This result implies that the radial gradient in σR steepens along this sequence.
The parameter pz that controls the radial gradient in σz does not vary systematically along the disc sequence.The middle disc has the most negative value of pz and therefore the weakest radial gradient, while the high-α disc has the steepest radial gradient.
The bulge DF's values J φ0 = 127 kpc km s −1 and Jext = 611 kpc km s −1 ensure that the bulge is compact.Its inplane dispersions are similar to those of the high-α disc (Jr0 = 122 kpc km s −1 ) but it has much smaller vertical dispersions and extent because Jz0 = 34 versus 64 kpc km s −1 for the high-α disc.For some reason pr = 0.82 has been set large and positive (causing σR to decline steeply with z) while pz = −0.13causes σz to decline much more slowly with R. Consequently the bulge is most anisotropic at small radii.The bulge mass, 1.27×10 10 M⊙ may be compared with the value 1.43 ± 0.18 × 10 10 M⊙ that Portail et al. ( 2015) estimated from made-to-measure modelling of data for redclump giants.
The total stellar mass is 4.60 × 10 10 M⊙, on the lower side of recent estimates.At R0 the stellar density is 0.036 M⊙ pc −3 in the plane falling to 0.0024 M⊙ pc −3 at z = 1.1 kpc.
At R0 the density of dark matter is 0.011 M⊙ pc −3 in the plane falling to 0.0095 M⊙ pc −3 at z = 1.1 kpc in agreement with most recent estimates (see Fig. 1  The actual surface density of stars and dark matter within 1.1 kpc of the plane is slightly lower than the naive estimate based on Kz because the gravitational field has a large, position dependent, radial component.It is and comprises 26.5 M⊙ pc −2 in stars, 24.7 M⊙ pc −2 in dark matter and 12.6 M⊙ pc −2 in gas.
In Table 4 the mass of the dark halo is given as 0.94 × 10 12 M⊙ but the great majority of this mass lies outside the volume for which we have data.The scale radius rs of the dark halo is set by the parameter J φ0 , which was arbitrarily set to 10 000 kpc km s −1 , so the volume modelled lies inside rs, where, in the absence of the stars, the dark-matter density would satisfy ρ ∼ r −2 .As explained by BV23, the actual density profile of the dark halo is more complex on account of the pull of the stars.

Fits to chemistry
Figs. 16 to 18 show predicted (left panels) and observed distributions in the ([Fe/H],[Mg/Fe]) plane within 30 bins in the (J φ , Jz) plane -the actions at the centre of the bin are given at the bottom of the bin's model panel.As one proceeds up each column, the mean value of Jz increases, so the bulge or thin disc dominates the bottom panels of each column.As one proceeds from column to column, the mean value of J φ increases, so the typical Galactocentric    Figure 18.The same as Fig. 16 but for J φ ∈ (1800, 2200) kpc km s −1 (left column) J φ ∈ (2300, 2700) kpc km s −1 (right column).
© 0000 RAS, MNRAS 000, 000-000 radius of stars increases from the left column of Fig. 16 to the right column of Fig. 18.Fig. 10 of Eilers et al. (2022) show similar data in a slightly different representation by plotting √ J 2 r + J 2 z vertically rather than Jz.In Fig. 16 the largest discrepancies between data and model occur around J φ = 0.This region of action space will be dominated by the stellar halo and the bulge and we have no confidence in the functional forms we have have adopted for their DFs, so it is not surprising that the model performs least well there.In fact it is encouraging that the model does capture the main features of the chemistry even there: in particular a strongly populated ridge that slopes from ([Fe/H],[Mg/Fe])≃ (0.4, 0.05) up towards (−0.08, 0.38), which has conentrations towards each end that vary in importance with Jz.
At J φ ∼ 500 kpc km s −1 and Jz < 50 kpc km s −1 (bottom of the right column of Fig. 16) the model underpopulates the principal ridge at high [Fe/H] and low [Mg/Fe].
Fig. 17, which covers 750 kpc km s −1 < J φ < 1750 kpc km s −1 , shows good agreement between model and data at all values of Jz except the largest: at the top of he left column the model struggles to reproduce a population of metal-rich, low-α stars.This issue with a deficit of metal-rich, low-α stars at high Jz is the only significant shortcoming of the fits shown in Fig. 18 for the largest angular momenta.Data for the stars that the model fails to provide is sparse and most liable to observational error, so it is not clear how significant the deficit in the model is.

Chemical parameters
Table 5 gives the values of the parameters of the chemical model that yields the fits shown in Figs.16 to 18, while each row of Fig. 19 displays the chemodynamical structure of the component listed along the figure's right-hand edge.Contours show the density of stars with a given chemistry that one obtains by sampling the DF without any selection function and then using the chemical model of Table 5 to assign a chemistry to each selected star.This plot differs from all our other plots in being representative of what the model claims is out there, rather than what APOGEE can see.In the left column each pixel is coloured by the mean of the J φ values of the stars in that pixel, while the centre and right columns are coloured by the mean values of Jz and Jr, respectively.The numbers in brackets in each panel show the values (in kpc km s −1 ) associated with red and blue.
The contours in the top row of Fig. 19 show that the main body of the bulge is modelled as a very metal-rich, roughly α-normal structure.From this there extends a tail of stars with progressively higher [Mg/Fe] and slightly lower [Fe/H].The ridge line is similar to the standard evolutionary trajectory of a population that has a high star-formation rate (e.g.Schönrich & Binney 2009b).The colours in the top row show that in the bulge chemistry barely depends on J φ but [Mg/H] increases with Jz and [Fe/H] increases with Jr.
The second row in Fig. 19 shows that the stellar halo is modelled as a metal-poor structure (x0 = −1.1,y0 = 0.28) widely scattered over the chemical plane (σx = 0.51, σy = 0.13).Its chemistry depends little on J φ but significantly on Jz and Jr in the sense of [Fe/H] increasing with Jz and [Mg/H] increasing with Jr -the reverse of the dependencies in found the bulge.The stellar halo is now believed to owe much to the 'Enceladus' merger event, which formed a mildly counter-rotating, relatively metal-rich, radially anisotropic and flattened component of the halo (Belokurov et al. 2018;Helmi et al. 2018;Myeong et al. 2018).According to this picture the mean value of Jr should increase with [Fe/H] while the mean value of Jz should decrease with [Fe/H].These are the trends seen in the bulge and not those Fig.19 implies for the halo.
The third row of Fig. 19 shows that the high-α disc covers a narrow range in chemistry, all at large [Mg/H] as we expect of a component that formed before type Ia supernovae became important.The narrowness of its footprint reflects a notably small dispersion σy = 0.0061, 24 times smaller than its dispersion in x.Table 5 shows that [Fe/H] depends only weakly on J φ and in the sense that metallicity increases withe J φ and therefore radius.Schönrich & McMillan (2017) discuss the origin of this 'inverse metallicity gradient'.[Fe/H] depends quite strongly on Jr and Jz but in opposing directions: increasing with Jr ad decreasing with Jz.
The bottom three rows of Fig. 19, for the thin-disc components, show distributions that slope in the same sense as the bulge and thick disc but more gradually so they do not reach such large values of [Mg/Fe].In these components the dominant dependence of chemistry on actions is upon J φ and the value C1,J φ = −0.363× 10 −3 dex/ kpc km s −1 given for the middle disc implies d[Fe/H]/dR ≃ −0.085 dex kpc −1 , consistent with the values −0.077 ± 0.013 to −0.059 ± 0.012 dex kpc −1 deduced by Méndez-Delgado et al. (2022) for interstellar N and O, respectively.
The young and middle discs extend to much lower [Fe/H] than the old disc, which is a priori surprising.The reason is that the these discs have lager scale actions J φ0 so they are predicted to extend to larger radii and higher values of J φ .Given the negative value of d[Fe/H]/dJ φ , the outer (largely unobserved) parts of these discs are predicted to be very metal-poor.It is likely that |d[Fe/H]/dJ φ | diminishes for J φ much larger than the solar value, with the consequence that [Fe/H] does not fall to the low values predicted under our assumption of constant d[Fe/H]/dJ φ .
The middle, old and high-α discs have C1,J r > 0 implying that metallicity increases with eccentricity.Radial migration, combined with radial gradients in metallicity and Jr can generate this correlation: at a given value of J φ there are stars that migrated outwards and inwards.On average, the former will have higher metallicities and radial actions than the latter, thus inducing a correlation between metallicity and Jr at given J φ .In the young disc, by contrast, [Fe/H] decreases with Jr.
The chemistry of the young disc depends strongly on Jr and very strongly on Jz.In part this may reflect the smaller range in actions within this cool component.Fig. 19 shows that [Mg/H] increases strongly with Jz and decreases with Jr.The young disc's value of σy = 0.0066 is small so significant values of [Mg/H] are reached when Jz is abnormally large.This fact accounts for the sharp lower boundary of the thin disc's footprint in the bottom panels of Fig. 19.
All thin-disc components have C2,J φ > 0, implying an outward increase in [Mg/Fe], while for the high-α discs C2,J φ < 0. Thus the signs of both C1,J φ and C2,J φ reverse as one passes to the high-α disc from the thin disc.Given that high [Mg/Fe] is the signature of a high star-formation rate, it is natural for C2,J φ to be negative in the absence of radial migration.
The discs' values of C2,J r decrease along the young -old sequence so [Mg/Fe] increases with eccentricity in the young disc and decreases with eccentricity in the old and high-α discs.C2,J z is positive but decreasing along the younghigh-α sequence.A picture in which [Mg/Fe] increases with age and stars are secularly scattered away from planar, circular orbits implies that [Mg/Fe] should increase with eccentricity and inclination.Table 5 shows that it does increase with inclination but in the older components it is flat or decreasing with eccentricity.
The left columns of Figs 2 and 3 show the mean values of observed [Mg/Fe] and [Fe/H] split into tranches of Jr.The right panels of Fig. 20 show the same data but regardless of Jr.The U-shaped boundary to the (blue) region dominated by the high-α disc is now spectacularly evident, and in the lower plot the metal-rich thin disc is seen to be bounded below by J φ ≃ 200 kpc km s −1 and above by J φ ≃ 2000 kpc km s −1 , interestingly coinciding with the right-hand boundary of high-α disc.
The left panels of Fig. 20 show the model's version of these plots.The principal features are captured but the match is far from perfect.Most notably, in the upper left plot the boundary of the U-shaped region is insufficiently sharp and [Mg/H] is too low at J φ 0. In the lower left plot [Fe/H] is too high at low Jz and J φ ≃ 0.

COMPARISON WITH THE BESANC ¸ON MODEL
A natural question is how the present model compares with the 'Besançon Galaxy model' (BGM), which has recently been updated by Robin et al. (2022) to self-consistency in the context of Gaia DR3 astrometry.Major differences include: (i) In the BGM the bulge, the stellar halo and the dark halo were represented by pre-defined density distributions rather than DFs.The BGM bulge is spherical while the two haloes are ellipsoidal; (ii) In the BGM the integrals used are (E, J φ , I3) rather than J; (iii) The BGM was fitted to sky-plane velocities (V ℓ , V b ) computed using inverse parallaxes rather than StarHorse distances, and line-of-sight velocities were not used; (iv) The BGM was fitted to (a) the (V ℓ , V b ) distributions of stars within 100 pc of the Sun viewed in 36 directions, and (b) the (V ℓ , V b ) and parallax distributions of more distant stars viewed along 26 lines of sight.Fits were made to the 0.1, 0.25, 0.5, 0.75 and 0.9 quantiles in V ℓ and V b rather than to densities in three-dimensional velocity space at 72 locations in (R, |z|).(v) Rather than ignoring age distributions as we have done, the BGM embodies them by being fitted to colour-magnitude diagrams at various locations; (vi) Modelling the parallax distributions required engagement with the Gaia scanning law, and adoption of both a dust model and a luminosity functions for each DF.(vi) The Galaxy's circular-speed curve was an input to the BGM model whereas here it follows from the fits to the kinematics.While there are significant differences in the data employed and how the model parameters were fitted, the two models have similar scope: both are axisymmetric, selfconsistent dynamical models from which mock samples can be drawn by specifying selection criteria.The present model does not come with pre-defined luminosity functions but the new version of agama provides for each DF to include a luminosity function appropriate to the component's age and chemistry.The new version defines classes for lines of sight and dust models, so magnitude-limited samples can be drawn for any line of sight.

CONCLUSIONS
The APOGEE spectroscopic survey combined with Gaia astrometry casts a brilliant light on the chemodynamical structure of our Galaxy.From DR17 of the Sloan Digital Sky Survey we selected just under 218 000 giant stars that are unlikely to belong to globular clusters and have reliable distances in the Bayesian StarHorse catalogue (Anders et al. 2022).The selected stars cover the radial range 0.5 < R < 14.5 kpc and lie within 4 kpc of the plane.
We have avoided engaging with the complex selection function of the APOGEE survey by examining the kinematic and chemical distributions within 72 spatial bins that extend from R = 1 kpc to R = 14 kpc at |z| < 4 kpc.
We have used these distributions to update and extend the self-consistent dynamical model of BV23.This model is defined by distribution functions f (J) for six stellar components and the dark halo, together with a prescribed surface density of gas.The gravitational potential that these eight ingredients jointly generate was computed iteratively and the resulting predictions for the kinematics of stars at 72 spatial locations were compared with the data.The functional forms adopted for the DFs differ from those used by BV23.A major difference is that here the bulge is assumed to be a fat disc rather than a spheroidal component.Another difference is that here the high-α disc is taken to be radially truncated.A minor difference is in the way the variables J d and Jv introduced by Vasiliev (2019) are defined.
Another difference with BV23 is in how the kinematic data are fitted: whereas BV23 fitted by hand histograms of VR, Vz and V φ marginalised over the other two components of velocity, we have used the Nelder-Mead algorithm to fit the predicted densities of stars in 72 three-dimensional velocity spaces.
In a major extension of BV23 we have added a chemical dimension to the model by assigning to each stellar DF a probability density in the chemical space c = ([Fe/H], [Mg/Fe]).These probability densities are Gaussians with mean values that are linear functions of J.
We examined the variation in action space of the mean values of [Mg/Fe] and [Fe/H].Plots of this distribution (Figs. 2 and 20) show a sharp boundary at J φ = 0 that is a testament to the accuracy of the StarHorse distances we have used.It also implies that within the (wide) spatial region covered by the data, few bulge stars counter-rotate, so the bulge should not be modelled as a spheroidal component but as a hot disc.In addition to the sharp boundary at J φ = 0, plots of [Mg/Fe] show a sharp transition at J φ ≃ 2000 kpc km s −1 that we interpret as radial truncation of the high-α disc.
Devising a mathematical formula for the density of objects in a five-dimensional space is challenging.We have broken the task into dynamical and chemical parts.We consider that the dynamical problem has been satisfactorily solved as regards disc stars, but less satisfactorily as regards the bulge and the spheroidal components.Given a DF, one has to quantify the density of its stars in chemical space, which the data show to be a function of actions: C1,J φ is the only gradient coefficient conventionally considered, but Table 5 shows that it is by no means the only significant coefficient, even bearing in mind that J φ spans a wider range than the other two actions.The simplest ansatz for a probability density is a Gaussian, but the Gaussian's properties must be continuous functions of the actions.We have implemented the simplest, non-trivial scheme, namely similar Gaussians centred on means that depend linearly on the actions.Even though this scheme is very basic, we ended up needing 70 parameters for the chemical pdf.
Alternatively, we could have followed Sanders & Binney (2015) and used a model inspired by the notion that disc stars diffuse from circular orbits.We were discouraged from pursuing this idea by unsuccessful attempts to model the APOGEE data published in Hayden et al. (2015) by the approach of Sanders & Binney (2015), but the proposal might be worth revisiting.A radically different scheme would be to represent the logarithm of the star density as a sum of basis functions, such as wavelets.A successful scheme would combine flexibility, a low parameter count and intrinsic nonnegativity.Given the very basic nature of our ansatz, the fits to data shown in Figs.16 to 18 may be counted a success.
We followed BV23 in representing the age distribution in the thin disc with three distinct discs, rather than following Binney (2012) and Sanders & Binney (2015) in using a single thin-disc DF that includes an age parameter.Given the limited success of the discrete approach, a return to continuous variation with age may be wise.The challenge is to model in a flexible yet parameter-poor way the dependence upon age of both the dynamical and the chemical parameters.
It is likely that the replacement of the halo and bulge DFs by better functional forms would make it possible to obtain better fits even without restructuring the chemical model.DFs that are not confined to one sign of J φ are subject to subtle constraints, as will be explained elsewhere (Binney in preparation).The parameters of the functional form used for the stellar and dark-matter DFs cannot be freely varied if unphysical kinematics are to be avoided, with the consequence that in the model neither component is as radially biased as it probably should be.The data imply that the bulge like the discs lies overwhelmingly at J φ > 0 but its DF should surely not vanish completely at J φ < 0, as it does in the model.This artificial vanishing of the bulge DF may be responsible for the poor match of the model's circularspeed curve to the points from Wegg & Gerhard (2013) in Fig. 9. Including in the modelling some observational bias against low-metallicity stars may also improve fits.
Our 'to do' list is as follows: (i) replace the quality cut on distances by cuts based on uncertainties in ln R and ln |z|, and possibly relax quality cuts on low-metallicity stars in order to reduce biases against halo stars; (ii) replace the DFs of dark and stellar haloes and explore more anisotropic DFs for these components; (iii) try to replace the three thin-disc DFs by a single DF that includes age-dependent dynamics and chemistry; (iv) add age data.
A model such as ours can be used to make innumerable predictions regarding orbits around the Galaxy and the density of stars of given chemistry at any location in phasespace; here we have shown only a tiny selection of such predictions.For example, it would be interesting to produce plots like Fig. 10 of Eilers et al. (2022) or to fit the functional forms of Lian et al. (2022) to the spatial structures of our low-and high-α discs and compare with the values Lian et al obtained.Using the agama package it is easy to make predictions in seconds by drawing from the model samples of stars with known component memberships.In addition to yielding mock observations, such samples can be used as initial conditions for an N-body simulation.Vasiliev (2019) showed that simulations started in this way form precisely equilibrium systems, and subtle departures from equilibrium can be explored by slightly shifting the phase-space coordinated returned by agama before advancing the N-body model in time.

Figure 1 .
Figure 1.The spatial distribution of the stellar sample: projections along the z axis (upper panel) and the y axis (lower panel).The colour scale is (base 10) logarithmic.

Figure 2 .
Figure 2. Mean values of [Mg/Fe] of stars as a function of position in action space.Values are shown for four bands in Jr (left column) and Jz as marked in the upper right of each panel.

Figure 3 .
Figure 3. Mean values of [Fe/H] of stars as a function of position in action space.Values are shown for four bands in Jr) (left column) and Jz as marked in the upper right of each panel.

Figure 4 .
Figure 4.The number of stars in each cell shown in the top and bottom panels of the left columns of Figs. 2 and 3.

Figure 5 .
Figure 5.The standard deviation in [Mg/Fe] in a slice of action space.The dotted lines mark the regions for which chemical pdfs are presented in Fig. 6.

Figs. 2
Figs. 2 and 3 show just mean values of [Mg/Fe] and [Fe/H].To gain further insight we now explore distributions in the ([Fe/H],[Mg/Fe]) plane for the four regions of the (J φ , Jz) plane that are marked in Fig.5over a plot of the standard deviation in [Mg/Fe] in the slice of action space with smallest Jr.The standard deviation tends to be large where components with differing chemistry overlap.The first of these regions, labelled A, lies along the Jz axis at Jz > 10 kpc km s −1 and J φ < 300 kpc km s −1 .Stars in this region are on highly inclined orbits, so they dominate the stellar density near the z axis.We call this the bulge/halo region.The second region, B, occupies the heart of the high-α region of action-space -Jz > 50 kpc km s −1 and 300 < J φ < 2000 kpc km s −1 .The third region, labelled C, lies just above the J φ axis at J φ > 300 kpc km s −1 .

Figure 6 .
Figure6.Contours show the chemistry of the four regions A-D marked by dotted lines in Fig.5.The top row is for the bulge/halo region A around the Jz axis.The second row is for the high-α region B. The third row is for the thin-disc region C just above the J φ axis, and the bottom row is for the outer, flaring low-α disc, D. The panels are coloured by mean J φ , Jz and Jr in the left, centre and right columns, respectively.Note that while the [Fe/H] axis covers the same range (−1.2, 0.5) in the lower three rows, in the top row it extends to metal-poorer stars: [Fe/H] = −1.7.
must lie very close to the plane, so they are either very close to the Sun or are significantly extincted.Could high extinction artificially lower their measured values of [Mg/Fe]?
4.1 EDF structureSanders & Binney (2015) introduced the concept of an extended distribution function (EDF), that is a density of stars in the five-dimensional space spanned by J and chemistry, c ≡ [Fe/H], [Mg/Fe] .

Figure 7 .
Figure7.The action-space distributions of low-and high-α stars.The colour scale gives the base-10 logarithm of the star density when projected onto the (J φ , Jz) plane in the upper panels and the (J φ , Jr) plane in the lower panels.In each pair the upper panel is for [Mg/Fe] > 0.2 and the lower panel is for [Mg/Fe] < 0.15.

Figure 8 .
Figure 8.The mean value of Jz within the (J φ , Jr) plane.

Figure 9 .
Figure9.Circular speed of the potential with prior estimates.

Figure 11 .
Figure 11.Full curve: surface density of disc stars.Dotted curve: surface density of all stars.Red dashed curve: exponentially falling surface density with scale length R d = 3.6 kpc.

Figure 12 .
Figure 12.Upper panel: contributions to the density at (R, 0).Lower panel: rotation rates of the components at (R, 0).
from the 2MASS survey.Recently, Robin et al. (2022) inferred a scale-length parameter Hρ = 2900 kpc for the DFs of the thin disc components, which is best compared to the values of J φ0 /Vc(R0) ≃ 4.27 kpc for the young and middle discs and 2.1 kpc for the old disc.Bovy et al. (2012) derived scale lengths for mono-abundance populations in APOGEE and found that these increased steadily from R d < 2 kpc to R d > 4 kpc as [Mg/Fe] decreases and [Fe/H] increases.The results in Table3are qualitatively consistent with that early work.The full curve in the upper panel of Fig.12shows as a function of radius the density of stars at z = 0, while the blue curves show the corresponding densities of the disc components.The contributions of the bulge and the stellar halo are shown by the black and red dotted curves.The density of the dark halo is shown by the black long-dashed curve.

Figure 16 .
Figure16.The chemical composition of stars that lie in 10 cells in action space.The left column is for cells with J φ ∈ (−200, 200) kpc km s −1 and Jz in intervals that increase from bottom to top; specifically (0, 20), (25, 55), (60, 90), (95, 125) and (130, 160) kpc km s −1 .The right column is for J φ ∈ (300, 700) kpc km s −1 and the same intervals in Jz.Within each column the right panels show the distribution of real stars, while the left panels show mock stars.

Figure 19 .
Figure 19.The components in chemical space.Each row shows the structure of the component listed on the right-hand edge.Contours show the density of stars of the given chemistry while colours show, from left to right, the mean values of J φ , Jz and Jr.The numbers in brackets give the values in kpc km s −1 corresponding to red and blue hues.The density increases by a factor 2 between adjacent contours.

Figure 20 .
Figure 20.Mean values of [Mg/Fe] (upper panels) and [Fe/H] (lower panels) in the (J φ , Jz) plane from the observations (right panels) and as predicted by the model (left panels).

Table 1 .
Initial values of the chemical pdfs (for fitted values see Table5below).The units of θ are degrees while x 0 , y 0 , σx, σy are given in dex.The values quoted for the gradient matrices C (eqn 3) are in dex per Mpc km s −1 .C 1,Jr ≡ C [Fe/H],Jr while C 2,Jr ≡ C [Mg/Fe],Jr , etc.

Table 3 .
Parameters of the disc DFs.Masses are in units of 10 10 M ⊙ and actions in kpc km s −1 .

Table 4 .
Parameters of the spheroidal DFs.Masses are in units of 10 10 M ⊙ and actions in kpc km s −1