Recovering the gravitational potential in a rotating frame: Deep Potential applied to a simulated barred galaxy

Stellar kinematics provide a window into the gravitational field, and therefore into the distribution of all mass, including dark matter. Deep Potential is a method for determining the gravitational potential from a snapshot of stellar positions in phase space, using mathematical tools borrowed from deep learning to model the distribution function and solve the Collisionless Boltzmann Equation. In this work, we extend the Deep Potential method to rotating systems, and then demonstrate that it can accurately recover the gravitational potential, density distribution and pattern speed of a simulated barred disc galaxy, using only a frozen snapshot of the stellar velocities. We demonstrate that we are able to recover the bar pattern speed to within 15% in our simulated galaxy using stars in a 4 kpc sub-volume centered on a Solar-like position, and to within 20% in a 2 kpc sub-volume. In addition, by subtracting the mock"observed"stellar density from the recovered total density, we are able to infer the radial profile of the dark matter density in our simulated galaxy. This extension of Deep Potential is an important step in allowing its application to the Milky Way, which has rotating features, such as a central bar and spiral arms, and may moreover provide a new method of determining the pattern speed of the Milky Way bar.


INTRODUCTION
One of the major goals of Milky Way dynamics is the recovery of the gravitational potential.This is because the gravitational potential is sourced by all forms of matter, both baryonic and dark.As far as is known at present, the dark component can only be mapped by its gravitational effects (direct or indirect).Recovering the gravitational potential is thus important for mapping the dark matter distribution in the Milky Way.Motivated by the ongoing Gaia mission, which is providing six-dimensional phase-space information for tens of millions of stars (Gaia Collaboration et al. 2016Collaboration et al. , 2023a)), we tackle this problem using a data-driven method using mathematical tools from deep learning, which we term "Deep Potential" (Green & Ting 2020;Green et al. 2023).
The dynamics of the trajectories of the stars in the Milky Way are dominated by gravitational forces.While we are able to measure the positions and velocities of the stars, stellar accelerations due to the Milky Way potential -typically on the order of 1 cm s −1 yr −1are exceedingly difficult to directly measure with present-day instruments (Silverwood & Easther 2019).Some systems do lend themselves to acceleration measurements, but they are either in the presence of strong accelerations or exotic systems such as pulsars or eclipsing binaries (Ghez et al. 2000;Chakrabarti et al. 2021;Phillips et al. 2021).
This means we are effectively limited to a single snapshot of the positions and velocities of the stars in the Milky Way.We describe ★ E-mail: taavet.kalda@gmail.comthis snapshot via its distribution function,  (ì , ì ), which gives the density of stars in phase space (position and velocity).However, unless additional assumptions are made about the nature of a gravitational system, any gravitational potential is consistent with any snapshot of the distribution function, as the gravitational potential only determines the evolution of the system.To connect the distribution function to the potential, one typically assumes the system to be in a steady state, in which the distribution function does not vary with time.In this work, for the first time, we weaken this assumption and require only that stationarity hold in some arbitrarily rotating frame.We achieve this by concurrently finding both the potential and the rotation parameters which render the distribution function most stationary.This is important because many physical systems of interest, such as the Milky Way, contain rotating features.The Milky Way harbors a central bar (e.g., Liszt & Burton 1980;Binney et al. 1991;Weinberg 1992;Binney et al. 1997;Blitz & Spergel 1991;Hammersley et al. 2000;Wegg & Gerhard 2013;Shen & Zheng 2020) and spiral arms (e.g., Oort et al. 1958;Georgelin & Georgelin 1976;Gerhard 2002;Churchwell et al. 2009;Reid et al. 2014), both of which break the standard, non-rotating stationarity assumption.The properties of the Milky Way bar are still not fully determined, with the two leading models being a slowly rotating long bar or a fast-rotating short bar (e.g., Clarke & Gerhard 2022).The method we present in this paper should be sensitive to the pattern speed of the Milky Way bar or local spiral features, and could help determine whether the bar has a slow or fast pattern speed.
Existing methods for modeling the dynamics typically rely on taking velocity moments of the Collisionless Boltzmann Equation viaJeans modeling (Binney & Tremaine 2008) or rely on simplified models for the distribution function (e.g., Schwarzschild 1979;Syer & Tremaine 1996;McMillan & Binney 2008;Bovy & Rix 2013;Magorrian 2014).This has been historically motivated by either the lack of or small quantities of available full six-dimensional phasespace data, but also by Milky Way by and large being axisymmetric or having other symmetries which lends itself to aforementioned modeling approaches.However, with the large quantitative and qualitative improvements brought by Gaia, capturing the full complexity of the stellar kinematic data -including non-axisymmetric structures -has become more feasible.Deep Potential goes beyond simple parametric models and borrows several techniques from the realm of deep learning.With the methodological improvements developed in this work, Deep Potential makes the following minimal assumptions about the underlying physics of the kinematic system: (i) The motions of stars are guided by a gravitational potential Φ(ì ).
(ii) We observe the phase-space coordinates of stars (the "kinematic tracers") that are statistically stationary in a rotating frame.
(iii) The overall matter density is non-negative everywhere: (ì ) ≥ 0. We can express this via gravitational potential using the Poisson equation as ( ì Notably, we do not assume that the gravitational potential is sourced by the observed kinematic tracers, as other matter components (e.g., dark matter) can contribute to the potential.
Our previous work on this method outlined its theoretical motivations and demonstrated its effectiveness on simpler toy models with observational errors and non-stationarities (Green & Ting 2020;Green et al. 2023).These toy models involved drawing particles from analytic models of stationary systems or evolving a set of tracer particles in a fixed background potential.In this work, we move further and demonstrate the method on a self-consistent -body simulation of a disc galaxy with a prominent central bar, and we further relax the assumption that the distribution function be stationary in the observed, "laboratory" frame of reference.Furthermore, recent work by Ghosh et al. (2023a) demonstrated that the presence of a prominent bar produces systematic biases in recovering the underlying distribution function and gravitational potential when using action-based dynamical modeling.
Similar methods to Deep Potential, using normalizing flows to represent the stellar distribution function and then determining the gravitational potential by assuming stationarity, have been developed by An et al. (2021), Naik et al. (2022), and Buckley et al. (2023a,b).Lim et al. (2023) recently applied normalizing-flow-based modeling to a sample of Milky Way stars in order to estimate the local dark matter density.The major qualitative addition made by our present work over previous methods is the relaxation of the stationarity assumption, to hold in an arbitrarily rotating frame.This is an important advance in order to accurately model the Milky Way, which harbors rotating features such as a central bar.
In this paper, we describe and expand on the methodology of Deep Potential (Section 2), outline the -body simulation and the selection of the dataset (Section 3), test the performance of the method (Sections 4 and 5) and discuss future prospects (Section 6).

METHOD
This work builds on the "Deep Potential" method, which is explained in Green & Ting (2020) and Green et al. (2023).In this work, we extend Deep Potential to allow for concurrent fitting of both the gravitational potential and the rotating frame in which the system appears most stationary.In a barred galaxy, for example, this could correspond to a frame rotating at the pattern speed of the bar.Here, we briefly review the key components of the Deep Potential method and then derive a generalized stationarity assumption in a rotating frame.
The first assumption of Deep Potential is that stars orbit in a background gravitational potential, Φ(ì ).The density of an ensemble of stars in six-dimensional phase space (position ì  and velocity ì ) is referred to as the distribution function,  (ì , ì ).The evolution of the distribution function is described by the Collisionless Boltzmann Equation: Our second assumption is that the distribution is stationary.In previous work, we assumed that stationarity holds in the laboratory frame (i.e., the frame in which the positions and velocities of the stars are measured, such as the barycenter of the Solar System):   / = 0.However, disc galaxies typically have rotating features, such as bars and spiral arms.In a barred galaxy, for example, it would be reasonable to assume that the galaxy would appear more stationary in a frame that co-rotates with the bar, instead of in an inertial frame.We therefore generalize the stationarity condition to a frame that is rotating with with angular speed ì Ω around an axis passing through a point ì  0 in space.We additionally allow the stationary frame to be move with constant velocity ì  0 relative to the laboratory frame.In the Milky Way, ì  0 and ì  0 could represent the location and velocity of the Galactic Center relative to the Solar System, and ì Ω (directed along the rotation axis of the Galaxy) could represent the pattern speed of either the central bar or of the spiral arms.Here, we work out the stationarity condition for the general case when ì  0 ≠, ì  0 ≠ ì 0 (for real observations, the location of the rotation axis and the velocity of the center of system are not always zero).In general, the parameters describing the stationary frame can either be fixed, or can be determined concurrently with the gravitational potential.Though we derive the stationarity condition in full generality, in the numerical experiments in this work, we will later fix ì  0 and ì  0 to zero.
In the following, we denote the partial derivative of the distribution function w.r.t.time in the rotating frame as (   / ) Ω .The generalized stationarity condition states that this partial derivative should equal zero.By translating the rotating-frame partial derivative to partial derivatives in the laboratory frame, we obtain our generalized stationarity condition in terms of laboratory-frame quantities: For a full derivation of this transformation, see Appendix A. Combining this with the Collisionless Boltzmann Equation, we arrive at If the distribution function is truly stationary, the gravitational potential can be uniquely determined by solving Eq. ( 5).Realistic physical systems will, however, not be completely stationary, and as such, there may not exist any potential which would render the system stationary (See An et al. 2021 andGreen et al. 2023 for discussion).In general, therefore, Deep Potential recovers the potential which minimizes some measure (to be discussed below) of the total non-stationarity in the system.Note that we do not assume that the gravitational potential is sourced by the observed stellar population alone.Accordingly, we do not impose the condition

Modeling the distribution function
In practice, when we observe stellar populations, we obtain a discrete sample of points in phase space, rather than a smooth distribution function  ( ì , ì ).In order to obtain gradients of the underlying distribution function, we require a continuous, differentiable object representing  ( ì , ì ).For this purpose, we use normalizing flows, which are a class of algorithms used for density estimation in unsupervised machine learning (for a review, see Kobyzev et al. 2019).A normalizing flow works by learning a set of invertible coordinate transformations that turn a simple distribution, usually a normal distribution, into a more complex one that fits the observed data.The complexity of the distributions it can capture is limited by the number of parameters describing the coordinate transformations.There are many approaches for constructing normalizing flows, and the field is in constant development.In this work, we opt to use FFJORD (Grathwohl et al. 2018), though the particular choice of normalizing flow method is not critical to the working of Deep Potential.The main drawback of normalizing flows is that most implementations assume the training data to be continuous everywhere.This, however, is not a significant problem, as most stellar systems exhibit the same property.
Given a sample of  stars with positions ì   and velocities ì   , we train a normalizing flow  (ì , ì ) using stochastic gradient descent to obtain the parameters of the flow that maximize the log-likelihood The loss is supplemented with Jacobian and kinetic regularization (Finlay et al. 2020), as detailed in Section 2.3.When doing subsequent analysis, we also multiply the output of the normalizing flow by .This is because normalizing flows are normalized to one, but we're interesting in the output being the number density of training data.The great advantage of using a normalizing flow is that our representation is both highly flexible and auto-differentiable.When implemented in a standard deep-learning framework, such as Ten-sorFlow (Abadi et al. 2015), PyTorch (Paszke et al. 2019) or JAX (Bradbury et al. 2018), it is possible to automatically differentiate the distribution function at arbitrary points in phase space, in order to obtain the terms   / ì  and   /ì  in the Collisionless Boltzmann Equation.

Modeling the gravitational potential
After learning the distribution function, we find the gravitational potential Φ( ì ) and angular rotation speed Ω that best satisfy the Collisionless Boltzmann Equation and generalized stationarity assumption given in Eq. (5).To parameterize the gravitational potential, we use a feed-forward neural network which takes as input a three-dimensional vector ì  and outputs a scalar, Φ.We concurrently train the parameters of the potential and Ω to minimize where L is the differential contribution to the loss of an individual point in phase space, given by The first term penalizes non-stationarity in a frame rotating with angular speed Ω while the second term penalizes negative mass densities.We first take the absolute value of (   / ) Ω , in order to penalize positive and negative changes in the phase-space density equally.The inverse hyperbolic sine function down-weights large values, while the constant  sets the level of non-stationarity at which our penalty transitions from being approximately linear to being approximately logarithmic.The loss is supplemented with ℓ 2 regularization based on the neural-network weights that describe Φ(ì ).The integral in Eq. ( 8) is computationally expensive to evaluate directly, but can be approximated by averaging L over  samples drawn from the distribution function, where  is a sufficiently large number: The constant / comes from the normalization of the distribution function, and can be omitted when implementing the loss function.

Implementation
In this paper, we implement Deep Potential in TensorFlow 2 (Abadi et al. 2015) and using TensorFlow Distributions (Dillon et al. 2017).All of our code is publicly available under a permissive license that allows reuse and modification with attribution, both in archived form at https://doi.org/10.5281/zenodo.8390759and in active development at https://github.com/gregreen/deep-potential.
To represent the distribution function, we use a chain of three FFJORD normalizing flows, each with eight or twenty densely connected hidden layers, depending on the system, of hidden_size neurons (in this paper, we use hidden_size = 128 or 256 for different systems) and a tanh activation function.For our base distribution, we use a multivariate Gaussian distribution with mean and variance along each dimension set to match the training data set.During training, we impose Jacobian and kinetic regularization with strength 5 × 10 −4 (Finlay et al. 2020), which penalizes overly complex flow models and tends to reduce training time.We train our flows using the rectified Adam optimizer (Liu et al. 2019), with a batch size of 2 13 (8096).We find that this relatively large batch size leads to faster convergence (in wall time) than more typical, smaller batch sizes.We begin the training with a "warm-up" phase that lasts 2048 steps, in which the learning rate linearly increases from 0 to 0.001.Thereafter, we use a constant learning rate.We decrease the learning rate by a factor of two whenever the training loss fails to decrease 0.01 below its previous minimum for 512 consecutive steps (this period is termed the "patience").We terminate the training when the loss drops below 10 −6 .
After training our normalizing flow, we draw  = 2 21 (∼ 2 million) phase-space coordinates, and calculate the gradients   / ì  and   /ì  at each point (using auto-differentiation), for use in learning the gravitational potential.We represent the gravitational potential using a feed-forward neural network with four densely connected hidden layers, each with 512 neurons and a tanh activation function (we eschew more commonly used activation functions, such as ReLU, which have discontinuous derivatives, as these may lead to unphysical potentials).The network takes a three-dimensional input (the position ì  in space), and produces a scalar output (the potential).No activation function is applied to the final scalar output.We add in an ℓ 2 loss on the potential network weights with strength 0.1/n_weights, where n_weights is the total number of weights in the network.We train the network using the rectified Adam optimizer, with batches of 2 15 (32 768) phase-space coordinates.We use a similar learning-rate scheme as before, with a warm-up phase lasting 2048 steps, an initial learning rate of 0.001, and a patience of 2048 steps.In the potential loss function (Eq.9), we set  = 1 × 10 5 ,  = 1, and  = 1, which affect the penalties on non-stationarity and negative gravitational mass densities.
When fitting both the distribution function and the gravitational potential, we reserve 25 % of our input data as a validation set.After each training step, we calculate the loss on a batch of validation data, in order to identify possible overfitting to the training data.Such overfitting would manifest itself as a significantly lower training loss than validation loss.In the experiments in this paper, no significant overfitting is observed-the difference in the likelihoods of the training and validation sets is typically less than 1%.
The choices made here are by no means set in stone, and can be altered without changing the overall Deep Potential framework.In particular, rapid advances have been made in research into normalizing flows over the preceding years.As more accurate and/or computationally efficient flows are developed, they can be used by Deep Potential.

FIDUCIAL 𝑁-BODY BAR MODEL
To demonstrate Deep Potential in a rotating frame, we make use of an -body simulation of a collisionless stellar disc that subsequently develops a strong bar in the central region.In Section 3.1, we describe the initial equilibrium set-up, and the basic structural parameters pertaining to the fiducial bar model.In Section 3.2, we explain some of the key bar properties which will be used later in this work.Table 1.Key structural parameters for the initial, equilibrium model of our simulated galaxy.This model is subsequently integrated in a self-consistent manner for 9 Gyr, during which time it develops a central bar.See Section 3 for details of the  -body simulation.
Finally, in Section 3.3, we specify the tracers that we use for training our normalizing flows.

Simulation set-up and equilibrium configuration
The details of the initial equilibrium model were previously explained in Ghosh et al. (2023b) (See rthick0.0there).Here, for the sake of completeness, we briefly mention the equilibrium set-up for our fiducial bar model.The equilibrium model consists of an axisymmetric stellar disc which is embedded in a live dark matter halo.The stellar disc is modeled with a Miyamoto-Nagai profile (Miyamoto & Nagai 1975) with a potential of the form where  d and ℎ z are the characteristic disc scale length and scale height, respectively, and  disc is the total mass of the stellar disc.
The dark matter halo is modeled with a Plummer sphere (Plummer 1911), with potential of the form where  H is the characteristic scale length, and  dm is the total halo mass.Here,  and  are the radius in the spherical and the cylindrical coordinates, respectively.In Table 1, we list the key values of the structural parameters for stellar disc as well as the dark matter halo.The total number of particles used to model each of these structural components is also listed in Table 1.
The initial conditions of the stellar disc are obtained using an iterative method (See Rodionov et al. 2009).For this model, we constrain the density profile of the stellar disc while allowing the velocity dispersions (both vertical and radial components) to adjust such that the system converges to an equilibrium solution (for details, see Fragkoudi et al. 2017;Ghosh et al. 2023b).The simulation is run using a TreeSPH code by Semelin & Combes (2002).A hierarchical tree method (Barnes & Hut 1986) with opening angle  = 0.7 is used for calculating the gravitational force, which includes terms up to quadrupole order in the multipole expansion.In addition, we use a Plummer potential for softening the gravitational forces with a softening length  = 150 pc.The equations of motion are integrated using a leapfrog algorithm (Press et al. 1986) with a fixed time step of Δ = 0.25 Myr.The model is evolved for a total time of 9 Gyr.

Properties of the stellar bar
To study the robustness of the Deep Potential in a rotating frame when applied to a barred galaxy, we choose three snapshots, namely at  = 2 Gyr, 4.25 Gyr, and 9 Gyr from the fiducial bar model.Fig. 1 shows the corresponding face-on density distribution of stellar particles at these three time steps.A visual inspection reveals that at all three time steps the model harbors a prominent stellar bar in the central region.
We quantify the strength of the bar by taking the Fourier decomposition (in azimuthal angle ) of the density in annuli (of cylindrical radius ).We then calculate the strength of the normalized  = 2 Fourier coefficient as a function of : where   denotes -th coefficient of the Fourier moment of the stellar density distribution,   is the mass of the  th particle (not to be confused with the Fourier coefficient), and   is the corresponding azimuthal angle.For each radius , the summation runs over all particles within the radial annulus [,  + Δ], with Δ = 0.5 kpc.Fig. 2 shows the corresponding radial profiles of the  = 2 Fourier coefficient at the three selected time steps.We see that at  = 2 Gyr, the bar is rapidly forming while at  = 4.25 Gyr the bar reaches its maximum strength (i.e., highest peak value of the  = 2 Fourier coefficient).By the end of the simulation run at  = 9 Gyr, the bar remains strong.For a detailed exposition of the temporal growth of the bar, the reader is referred to Ghosh et al. (2023b).
The remaining two properties of the bar that are relevant to this work are the extent of the bar,  bar , and the pattern speed of the bar, Ω bar .Following Ghosh & Di Matteo (2023), at time , we define  bar as the location where  2 / 0 drops to 70% of its peak value.The corresponding extent of  bar is indicated in Fig. 1 by a blue circle.We measure the bar pattern speed by fitting a straight line to the temporal variation of the phase-angle of the  = 2 Fourier mode.This method assumes that the bar rotates rigidly with a single pattern speed in that time-interval.The bar slows with time, with bar pattern speeds of 20.7 km s −1 kpc −1 , 12.15 km s −1 kpc −1 , and 8.1 km s −1 kpc −1 at 2, 4.25 and 9 Gyr, respectively.A rotating bar induces a number of resonances, namely, corotation (CR), the Inner Lindblad Resonance (ILR) and the Outer Lindblad Resonance (OLR).To determine the locations of these resonaces, we first need to compute the radial variation of the circular velocity (equivalently, the rotation curve).At time , the circular velocity  c is calculated with the asymmetric drift correction (?): Here,   is the azimuthal velocity, whereas   and   denote the radial and the azimuthal velocity dispersion, respectively.Using the rotation curve, we calculate the radial location of the CR, defined by Ω( =  CR ) = Ω bar .The corresponding  CR values are indicated in Fig. 1.We also provide the numerical values of  bar and  CR in Table 3.As the bar in our fiducial model slows down with time, the location of the CR is pushed farther out in the disc.

Dataset selection
In order to create samples of kinematic tracers for Deep Potential, we randomly select  = 2 19 (524 288) stellar particles at each time step, from inside a cylindrical volume of radius  = 16 kpc (centered at the origin of the system) and half-height (in ) of  = 2 kpc.At each time step, we orient the coordinate system such that the -axis is parallel to the angular momentum of the stellar particles and √  data , where  NF is the renormalized number of samples in the bin drawn from the normalizing flow and  data is the number of stellar particles in the same bin.As can be seen above, our normalizing flow captures all prominent features in the galactic stellar distribution, including the central bar and spiral arms.We thus obtain a smooth, differentiable representation of the galactic stellar population.
the origin corresponds to the peak density of the bar.Choosing the peak density as the origin is important, because there is a displacement in the order of 100 pc between the peak density and the center of mass of the stellar particles of the system.This is most likely caused by the system being in a transient state and having differential rotation between the central and outer regions.
Each stellar particle has a mass of 92 000 M ⊙ .To represent the composite particles as a collection of individual stars, we would need to upsample each composite particle by assuming that the density follows a Plummer sphere with  = 150 pc (this is caused by the softening length).This can however introduce artificial clumping that can manifest itself as spurious densities or accelerations inferred from the gravitational potential.With this in mind, we instead treat the stellar particles as individual stars, and do not perform any upsampling.
We also note that as the method currently stands, the training data in the chosen volume is assumed to have uniform completeness (i.e., each star has the same probability of being selected).If the completeness were not uniformly complete, the spatially varying selection function would introduce spurious gradients into the distribution function and hence invalidate the stationarity assumption.Uniform completeness is easy to guarantee for in the mock dataset, but with real observations, it is influenced by multiple factors such as crowding, dust, color and scanning patterns.Hence, for real Milky Way datasets, we expect modeling of the selection function to be critical.

Constructing the normalizing flow
When training the flows for the different time steps, three limitations of our FFJORD implementation must be addressed.The first challenge arises from the fact that the densities of the tracer particles vary by up to three orders of magnitude between the central and outer regions of the training volume.These large variations in density cause systematic biases in volumes of high density.To address this, we split the training volume into four concentric cylindrical shells, such that  As evidenced by the smoothness of the radial gradients, our stitching process, in which we fit the distribution function of cylindrical annuli separately and the join them together, does not introduce prominent boundary effects (i.e., discontinuities).the variation in the density of the tracer particles in each shell is lessened.This means training a separate normalizing flow for each shell.Each concentric shell is bounded by an inner surface with cylindrical radius  =  in , an outer surface with cylindrical radius  =  out , and two flat surfaces at  = ± = ±2 kpc.For details on the number of particles in each shell for the first time step, see Table 2 (the other two time steps are treated in a similar fashion).
The second limitation comes from the difficulty that FFJORD has in capturing discontinuities in the training data.When crossing the boundary of the training volume, the density of the tracer particles discontinuously drops from a finite value (inside the volume) to zero (outside the volume), causing the first and higher derivatives of the distribution function to be undefined.FFJORD outputs a continuous normalizing flow that attempts to model the discontinuity, but ends up introducing systematic biases in the boundary region.The third limitation comes from the known difficulty of FFJORD in capturing volumes that are not topologically equivalent to spheres (in this case, a cylindrical shell is topologically equivalent to a donut).These latter two limitations are addressed by adding additional "virtual" particles outside the boundary, such that the volume inside  <  in , || <  is filled with a roughly uniform density of particles and the distribution function drops smoothly to zero outside the cylinder defined by radius  out and half-height .We train a normalizing flow with the virtual particles included.Subsequently, to draw a sample from one of the shells, we reject the sample that fall outside the chosen volume.For more technical details about the virtual particles, see Appendix B.
In the end, we retrieve a sample of  = 2 21 = 2 097 152 points from the desired distribution function by drawing a proportionate number of samples from each of the four cylindrical shells separately and concatenating them together.The resulting sample is then used for training the potential.

Validation
The most straightforward diagnostic of the trained normalizing flows is a comparison of their predicted phase space density with the training data.In Fig. 3, we compare two-dimensional projections of our normalizing flow and the true stellar distribution function for one time step ( = 2 Gyr).As can be seen, there are no visible systematic biases in the recovered phase space density, even in regions of low and high density.We observe similar behavior for the other two time steps.
Since the N-body simulation consists of discrete particles, rather than a smooth phase-space density (even though an individual particle is spatially a Plummer sphere, it is a delta function in velocity space), it is not possible to directly compare the gradients   / ì  ,   /ì  of the trained flow with its corresponding true values.However, the stitching procedure provides a surprising diagnostic that proves to be useful.At the boundaries between two neighboring shells, both the densities and the gradients of the flows should match, as their combination must represent a smooth distribution function.We can use this as a test of how well the normalizing flow performs at the boundaries.This effect is clearest when plotting a face-on projection of   / in the mock galaxy, as in Fig. 4. We note that this diagnostic serves to supplement the two-dimensional projections of the phase space density, not replace it.Even when the phase space densities do not show obvious systematics, we find signs of discontinuities in   / .There are also cases in which the reverse is true.With our final normalizing flow implementation and stitching procedure, such discontinuities are not readily apparent.

GRAVITATIONAL POTENTIAL
After obtaining a total of  = 2 21 samples for each time step, including the gradients   / ì  and   /ì  , we train a potential neural network Φ(ì ) and rotation speed Ω (for each time step) that best minimizes the aforementioned loss function in Eq. 10.In general, we find training the potential to be more robust and straightforward than training the normalizing flows.In particular, given a sample of distribution function gradients, different choices for the structure and hyperparameters of the potential neural network yield very similar results.
Given the gravitational potential, it is possible to compare the predicted accelerations and densities (from the gradients and Laplacian of the potential, respectively) with the ground truth obtained from the simulation itself.Importantly, Φ(ì ) represents the contribution from all forms of matter, including dark matter, so the comparisons with the simulation must be performed accordingly.

Modeled rotation speed
The rotation speeds that Deep Potential captures at different time steps are listed in Table 3.These can be directly compared with the rotation speed of the bar (as measured from the  = 2 Fourier mode of stellar density in the galactic midplane; see Section 3.2), as the bar is the most significant rotating feature present in the system.Despite the large non-stationarities and secular evolution of the system, we capture the rotation speed to within ∼3 km s −1 kpc −1 , or ∼20 % at all time steps.Deep Potential thus recovers the bar rotation speed using only a frozen snapshot of the stellar kinematics.For the snapshot at  = 2 Gyr, we plot ground-truth accelerations in the  = 0 plane obtained from the simulation (left column), recovered accelerations inferred from the gravitational model (middle column), and a comparison between the two (right column).At each point in the plane, we normalize the accelerations by the modulus of the true acceleration vector, so that the plotted values are always of order unity.We only plot the  and  components of acceleration, as the -component at  = 0 is very close to zero, and out of the plane is similar to the density estimates from the gravitational potential.In the midplane, we recover the overall smooth pattern of accelerations over the disc, with small-scale fluctuations at the level of five percent.Table 3.For all three time steps of our simulated galaxy, we compare the rotation speed of the bar (Ω bar , calculated by considering the time-evolution of the  = 2 Fourier mode of stellar density, as outlined in Section 3.2) with the best-fit value for the rotation speed Ω of the frame shift which renders the system most stationary, according to Deep Potential.We also provide the values for the radius ( bar ) and corotation radius ( CR ) of the bar.In all three time steps, we recover the pattern speed of the bar to within ∼3 km s −1 kpc −1 , corresponding to a maximum fractional error of ∼20 %.

Comparison of the modeled accelerations
We obtain the accelerations predicted by the gravitational model by calculating the gradients of the model via auto-differentiation: ì  = − ì ∇Φ( ì ).We can compare this with the ground truth from the simulation by summing over the contributions of the particles in the -body simulation, both stellar and dark matter (and accounting for the softening length).Fig. 5 compares the prediction with the ground truth at time step  = 2 Gyr.In most of the galaxy, we recover accelerations to within ∼20 %.The largest discrepancies occur near the center of the galaxy, where the bar is strongest.However, the major axis of the bar is almost exactly reproduced, and can be immediately identified in the spatial pattern of   and   .

Comparison of the modeled density
We obtain model densities by taking the Laplacian of the modeled gravitational potential, once again by auto-differentiaton:  model = ∇ 2 Φ/(4).Because Φ(ì ) represents the overall gravitational potential, sourced by both stellar and dark matter particles, the density estimate also represents the total density.We can compare this with the ground truth from the simulation, by summing over the contributions of all the particles.Due to the softening length of the simulation, each particle represents a Plummer sphere density with scale length  = 150 pc.
The model vs. ground truth comparison for all time steps is shown in Fig. 6.We see that the major features of the simulated galaxy are reproduced, including the central bar and spiral arms.In general, the model performs worse in the presence of low densities (See the outer regions at  = 9 Gyr) and high density gradients, such as when moving along the semi-minor axis of the bar near the galactic center.We also observe ubiquitous small-scale fluctuations in the predicted density.Identifying the origin of these fluctuations is difficult, as the effects of the potential causes are difficult to disentangle.One possible cause could be the inherent non-stationarity of the data, stemming from the system being in a state of evolution.This effect could be compounded In the left column, we plot the total density at the three time steps ( = 2, 4.25, and 9 Gyr), calculated by summing the densities from the stellar and dark matter particles.The middle column shows the density implied by the gravitational potential that Deep Potential recovers from the stellar kinematics.In the right column, we plot the fractional difference between the recovered and the true density.At each time step, we recover the major features of the galaxy, including the central bar and spiral arms.Though the true density varies over several orders of magnitude over the plotted region of the galaxy, we generally recover the density to a few tens of percent, with the residuals taking the form of spatially oscillating (with scale ∼1 kpc) fluctuations about zero.We note that the density is most accurately recovered when the bar is weakest (at  = 2 and 4.25 Gyr), and that the largest density residuals tend to occur at small galactic radii, along the bar minor axis.
by the smoothing length of each individual particle manifesting itself as spurious densities over a characteristic scale that is comparable to the smoothing length ( = 150 pc).Finally, discrepancies in the accurate representation of the underlying distribution function by the normalizing flow could also contribute.Particularly, in the last two time steps, the hyperparameters for the normalizing flow are not tuned as thoroughly as for the first time step.
The effects from non-stationarity can be lessened by smoothing the density maps with some suitable kernel, as done in Buckley et al. (2023b), for example.This is useful when we are interested in the A comparison between the radial profiles for dark matter  dm (orange lines) and overall matter  tot (blue lines) predicted by the model (dashed lines) and the ground truth (solid lines, labeled "data") at time  = 2 Gyr.The model estimate for the dark matter density is obtained by subtracting the true stellar density from the overall model density (See Section 5.3 for discussion).The profiles are obtained by calculating the average densities in a volume |  | < 1 kpc,  < 16 kpc along , where  refers to the spherical radial distance.We accurately recover the radial profile of total (stellar plus dark-matter) density across the entire studied volume, which extends to 16 kpc.In the central, heavily baryon-dominated region of the galaxy, even small fractional errors in the recovered total density are sufficient to significantly alter the recovered dark-matter density.However, for  > 2 kpc, where the baryons are less dominant, we recover the dark-matter density profile to within a factor of two.
average density in a neighborhood of some particular point.However, we do not focus on smoothed densities in this work, as it does not provide significant additional insight into the performance of our rotating-frame framework.
By considering the modeled density and subtracting the groundtruth baryonic density, we can build an estimate for the dark matter density in the system.This approach is motivated by the fact that stellar density can be estimated from direct observations, while dark matter is not directly observable.For example, in systems such as the Milky Way, there are analytic models of different baryonic components (McKee et al. 2015).
We provide a model estimate of the radial profile of the dark matter halo at time step  = 2 Gyr in Fig. 7.The modeled density deviates from the ground truth in volumes where baryonic matter dominates over dark matter, and where the overall density is small.For  > 2 kpc, the dark matter profile is reconstructed to within a factor of two.

Quantifying non-stationarities
Real galaxies are never perfectly stationary.Even though we train the gravitational potential and Ω to minimize non-stationarities in the rotating frame, we do not expect them to be able perfectly render (   / ) Ω zero everywhere in phase space.This is also because the system is overconstrained: we are searching for a three-dimensional gravitational potential that renders the distribution function stationary at every point in six-dimensional phase space.4. For all three time steps of our simulated galaxy, a comparison between the true rotation speed of the bar (Ω bar ) and the rotation speed that renders the system most stationary, according to Deep Potential, in various sub-volumes imitating the Solar neighborhood.The sub-volumes are centered around a point at a distance of 8 kpc and an angle of ∼ 20 • with respect to the galactic bar.We provide values for populations with radius 2 kpc and 4 kpc, and for their mirrored versions on the opposite side of the galaxy (values in parentheses).The bar rotation speed is captured to within ∼ 20% and ∼ 15% for the 2 kpc and 4 kpc datasets respectively.
We can quantify non-stationarities by plotting the average (  ln  / ) Ω in different regions of phase space.(  ln  / ) Ω serves as a more useful measure than (   / ) Ω because it is more interpretable, corresponding to the inverse characteristic timescale during which the distribution function undergoes significant changes.Fig. 8 shows a comparison between the non-stationarities of a gravitational potential trained in a rotating frame and one that is trained in the laboratory frame.We can see that in this system, the rotating frame yields significantly better results.Notably, the nonrotating potential has clear imprints from the central bar that are not visible in the rotating potential.

Rotation speed recovery in a sub-volume imitating the Solar neighborhood
This work so far has focused on the validation of Deep Potential on the simulated galaxy as a whole.While this is important for the overall validation of the method within a rotating frame, we can also test how well the method recovers the bar rotation speed in a sub-volume which more closely resembles what we would observe from our position within the Milky Way.To this end, we select two spherical volumes of radius 2 kpc and 4 kpc, centered around a point at a distance of 8 kpc and an angle of ∼ 20 • with respect to the galactic bar, akin to how the Sun is positioned in the Milky Way (Gaia Collaboration et al. 2023b).Because of the two-fold symmetry of the galactic bar, we also produce mirrored datasets by reflecting the selected volumes with respect to the galactic center.We follow the same procedure for all time steps.One of the selected sub-volumes in the first time step is visualized in Fig. 9.At each time step, we train a normalizing flow and a gravitational potential, along with the rotation speed in the same fashion as outlined in the previous sections.Deep Potential recovers the rotation speed to within 20 % and 15 % for the 2 kpc and 4 kpc datasets, respectively, for all time steps, which is, notably, slightly better than its performance when trained on the entire volume of the galaxy.We hypothesize that this may be due to the fact that the our sub-volume avoids the galactic center, where Deep Potential has the greatest difficulty imposing stationarity (see the left panel of Fig. 8).In general, one can remain optimistic about the ability of Deep Potential to recover the Milky Way bar rotation speed from stars observed with a few kiloparsecs of the Sun.

CONCLUSIONS
In this paper, we have demonstrated the Deep Potential method in a self-consistent -body simulation of a barred galaxy at three differ- ent time steps over the course of the evolution of the system.We have methodologically extended the Deep Potential to impose stationarity in a rotating frame.We achieve this by selecting a population of stellar particles and simultaneously fitting a gravitational potential and a rotation speed that best renders the population stationary in the rotating frame.For our -body simulation, in a 16 kpc dataset encompassing the majority of the simulated galaxy, Deep Potential recovers the rotation speed of the galactic bar to within 20 %, accelerations in the system to within 20 % and densities to within 50 %.We also recover the radial dark matter profile in outer regions of the galaxy ( > 2 kpc).The main features of the galaxy, such as the spiral arms and the central bar, are successfully recovered, even in the presence of strong intrinsic non-stationarities in the galactic stellar population.The modeled densities, however, exhibit small scale fluctuations, which could be caused by the intrinsic non-stationarity of the data, the smoothing length of the particles, or discrepancies in the modeling of the underlying distribution function with a normalizing flow.We have additionally demonstrated that Deep Potential is capable of recovering the rotation speed in smaller sub-volumes imitating the Solar neighborhood: the rotation speed of the bar is captured to within 20 % and 15 % for sub-volumes of radius 2 kpc and 4 kpc respectively.
Working in an arbitrarily rotating frame is important for modeling real-life galaxies, which often have rotating features such as spiral arms or a central bar.The Milky Way serves as a good candidate for the next application of Deep Potential, with the availability of sixdimensional phase space info for tens of millions of stars from Gaia Data Release 3 (Gaia Collaboration et al. 2023a).In particular, Deep Potential may be able to determine the pattern speed of the Milky Way bar, as well as the radial dark-matter density profile.Indeed, there has been recent work on applying normalizing-flow-based modeling on the Milky Way.Lim et al. (2023) saw the first application on the Milky Way for inferring local dark matter densities.
There are also several other avenues for future development.One could extend the Deep Potential formalism to account for nonuniform observational completeness in the kinematic tracers and errors on measured phase-space locations.Further, if there is data available for additional dimensions that are relevant for selecting populations which share the same dynamical history, such as metallicity or alpha abundance, one could train normalizing flows incorporating those dimensions, and then enforce stationarity on the distribution function while conditioning on the extra dimensions.Finally, one can model systems where full six-dimensional phase-space data is not available by imposing symmetries on the system, such as ax- isymmetry.While this is not relevant for the Milky Way, it can be important for external galaxies or compact systems where some of the dimensions (such as distance) are missing.

Figure 1 .
Figure 1.Face-on distribution of all stellar particles, calculated at  = 2, 4.25, and 9 Gyr for our fiducial bar model.The dashed black curves denote the constant density contours.The blue circle in each panel denotes the extent of the bar ( bar ) whereas the magenta circle denotes the location of the corotation radius ( CR ).The galaxy harbors a prominent bar, which grows in radial extent from the earliest to the latest time step.

Figure 2 .
Figure2.A measure of the radial extent of the bar in our simulated galaxy at three time steps.At each time step, we measure the radial profile of the  = 2 Fourier coefficient of stellar density in cylindrical annuli (normalized by the  = 0 coefficient) using Eq. 13.A prominent peak in the radial profile of the  = 2 Fourier coefficient clearly demonstrates the presence of a central bar at all three time steps.The central bar extends radially over time, becoming increasingly prominent in the stellar mass distribution.

Figure 3 .
Figure 3.A demonstration of the performance of our normalizing flow model of the stellar phase-space distribution function in two projections.At time  = 2 Gyr, we plot 2D histograms of selected stellar particles (left column) and of 2 21 = 2097152 synthesized particles sampled from the trained normalizing flow (middle column), and a comparison between the two (right column).The top row shows a face-on view of the galaxy, while the lower panel shows one velocity-space projection (  vs.   ) of the galaxy.The density of the synthesized particles has been renormalized by an overall constant to match the density of the stellar particles.For each bin, we define the Poisson significance as being ( NF −  data )/√  data , where  NF is the renormalized number of samples in the bin drawn from the normalizing flow and  data is the number of stellar particles in the same bin.As can be seen above, our normalizing flow captures all prominent features in the galactic stellar distribution, including the central bar and spiral arms.We thus obtain a smooth, differentiable representation of the galactic stellar population.

Figure 4 .
Figure 4.A 2D projection of the radial gradients in stellar density at time  = 2 Gyr.We draw a sample of 2 21 particles, and take the median   / in  −  bins.We compress the color scale by applying the arcsinh function.As evidenced by the smoothness of the radial gradients, our stitching process, in which we fit the distribution function of cylindrical annuli separately and the join them together, does not introduce prominent boundary effects (i.e., discontinuities).

Figure 5 .
Figure 5.A comparison of the gravitational accelerations inferred by Deep Potential in the midplane of our simulated galaxy with the true accelerations.For the snapshot at  = 2 Gyr, we plot ground-truth accelerations in the  = 0 plane obtained from the simulation (left column), recovered accelerations inferred from the gravitational model (middle column), and a comparison between the two (right column).At each point in the plane, we normalize the accelerations by the modulus of the true acceleration vector, so that the plotted values are always of order unity.We only plot the  and  components of acceleration, as the -component at  = 0 is very close to zero, and out of the plane is similar to the density estimates from the gravitational potential.In the midplane, we recover the overall smooth pattern of accelerations over the disc, with small-scale fluctuations at the level of five percent.

Figure 6 .
Figure 6.Comparison of the true total mass density in the plane of our simulated galaxy with the density recovered by Deep Potential.In the left column, we plot the total density at the three time steps ( = 2, 4.25, and 9 Gyr), calculated by summing the densities from the stellar and dark matter particles.The middle column shows the density implied by the gravitational potential that Deep Potential recovers from the stellar kinematics.In the right column, we plot the fractional difference between the recovered and the true density.At each time step, we recover the major features of the galaxy, including the central bar and spiral arms.Though the true density varies over several orders of magnitude over the plotted region of the galaxy, we generally recover the density to a few tens of percent, with the residuals taking the form of spatially oscillating (with scale ∼1 kpc) fluctuations about zero.We note that the density is most accurately recovered when the bar is weakest (at  = 2 and 4.25 Gyr), and that the largest density residuals tend to occur at small galactic radii, along the bar minor axis.

)
Figure7.A comparison between the radial profiles for dark matter  dm (orange lines) and overall matter  tot (blue lines) predicted by the model (dashed lines) and the ground truth (solid lines, labeled "data") at time  = 2 Gyr.The model estimate for the dark matter density is obtained by subtracting the true stellar density from the overall model density (See Section 5.3 for discussion).The profiles are obtained by calculating the average densities in a volume |  | < 1 kpc,  < 16 kpc along , where  refers to the spherical radial distance.We accurately recover the radial profile of total (stellar plus dark-matter) density across the entire studied volume, which extends to 16 kpc.In the central, heavily baryon-dominated region of the galaxy, even small fractional errors in the recovered total density are sufficient to significantly alter the recovered dark-matter density.However, for  > 2 kpc, where the baryons are less dominant, we recover the dark-matter density profile to within a factor of two.

∂Figure 8 .
Figure 8.The level of inferred non-stationarity in the stellar population across the disc of our simulated galaxy, when applying Deep Potential either in a rotating frame (left panel) or in the non-rotating "laboratory" frame (right panel).For time step  = 2 Gyr, we calculate the non-stationarity,  ln  / , calculated from our modeled distribution function and recovered potential.In the left panel, we work in the rotating frame inferred by Deep Potential (with Ω = 17.5 km s −1 kpc −1 ) and use the corresponding recovered gravitational potential.In the right panel, we work in the laboratory frame, and use the gravitational potential inferred by Deep Potential for that frame.Both plots show the  −  plane with  = 0.For each bin in  − , the median value of  ln  / is calculated by averaging over the velocity dimensions, weighted by the distribution function. ln  / can be interpreted as the inverse timescale during which the distribution function undergoes significant changes.We overlay stellar isodensity contours, in order to indicate the location and extent of the galactic bar and spiral arms.When allowed to work in a rotating frame (left panel), Deep Potential finds a gravitational potential that renders the system nearly stationary.When constrained to work in a non-rotating frame (right panel), Deep Potential is unable to find a steady-state solution, and significant non-stationarities -particularly along the galactic bar -are present.This comparison demonstrates the improvement gained by allowing Deep Potential to work in rotating frames, particularly in systems such as barred galaxies.

Figure 9 .
Figure 9.A visualization of a sub-volume of the simulated galaxy at  = 2 Gyr, meant to imitate what one would see in the vicinity of the Sun in the Milky Way.The spherical sub-volume (denoted by a green dotted circle) is chosen to have a radius of 4 kpc, and is centered around a point in the midplane, at a distance of 8 kpc and angle of ∼ 20 • with respect to the major axis of the galactic bar (denoted by the diagonal blue dotted line), similar to how the Sun is positioned in the Milky Way.Using only stars in this sub-volume, Deep Potential recovers the galactic bar rotation speed to within 15 %.

Table 2 .
At each time step, the simulated galaxy is divided into spherical shells, which are separately modeled using normalizing flows and then stitched together.Above, we describe the shells used for partitioning the training volume at time step  = 2 Gyr.The radii  in and  out of the inner and outer cylindrical surfaces of each of the shells is given, along with the number of tracer particles  inside each of them. with_padding refers to the number of particles after padding the shells with additional "virtual particles."These virtual particles are added in order to transform the sharp inner and outer boundaries of the shells into a smooth roll-off.This eases the normalizing flow training.