On the functional form of the radial acceleration relation

We apply a new method for learning equations from data -- Exhaustive Symbolic Regression (ESR) -- to late-type galaxy dynamics as encapsulated in the radial acceleration relation (RAR). Relating the centripetal acceleration due to baryons, $g_\text{bar}$, to the total dynamical acceleration, $g_\text{obs}$, the RAR has been claimed to manifest a new law of nature due to its regularity and tightness, in agreement with Modified Newtonian Dynamics (MOND). Fits to this relation have been restricted by prior expectations to particular functional forms, while ESR affords an exhaustive and nearly prior-free search through functional parameter space to identify the equations optimally trading accuracy with simplicity. Working with the SPARC data, we find the best functions typically satisfy $g_\text{obs} \propto g_\text{bar}$ at high $g_\text{bar}$, although the coefficient of proportionality is not clearly unity and the deep-MOND limit $g_\text{obs} \propto \sqrt{g_\text{bar}}$ as $g_\text{bar} \to 0$ is little evident at all. By generating mock data according to MOND with or without the external field effect, we find that symbolic regression would not be expected to identify the generating function or reconstruct successfully the asymptotic slopes. We conclude that the limited dynamical range and significant uncertainties of the SPARC RAR preclude a definitive statement of its functional form, and hence that this data alone can neither demonstrate nor rule out law-like gravitational behaviour.


INTRODUCTION
Kinematic measurements of galaxies relate their visible and dynamical masses, affording constraints on the distribution of dark matter and/or the behaviour of gravity.These measurements are simplest to perform for late-type galaxies supported predominantly by rotation, as the enclosed dynamical mass may be calculated from the centripetal acceleration and the law of gravity.Such studies have revealed a striking correlation between the enclosed baryonic and total dynamical mass assuming Newtonian gravity, dubbed the mass discrepancy-acceleration (Sanders 1990;McGaugh 2004) or radial acceleration relation (RAR; Lelli et al. 2017).It has been claimed that the RAR indicates that at high accelerations the Newtonian dynamical mass follows the baryonic mass (indicating little dark matter and the validity of Newtonian mechanics), while as acceleration drops below a new constant of nature g0 ≈ 10 −10 m s −2 the dynamical mass increasingly exceeds the baryonic mass in a regular way.
One may attempt to understand these observations from either a dark matter or modified gravity perspective.In ΛCDM the difference between the dynamical and baryonic mass is due to the dark matter that makes up most of the mass of harry.desmond@port.ac.uk † deaglan.bartlett@physics.ox.ac.uk the galaxy.The RAR must therefore be explained by the relative distributions of dark and visible mass established by the process of galaxy formation.Interactionless cold dark matter is influenced only gravitationally by the baryonic mass so the emergence of the RAR must be somewhat fortuitous; it is not established directly by a baryon-dark matter coupling (although see Blanchet & Le Tiec 2008;Berezhiani & Khoury 2015;Famaey et al. 2018 for alternative ideas).In contrast, the modified gravity (or modified inertia) interpretation posits a breakdown of Newtonian mechanics at low acceleration so that the dynamical mass inferred by a Newtonian analysis is not the true dynamical mass of the galaxy.The prototypical instantiation of this idea is Modified Newtonian Dynamics (MOND ;Milgrom 1983a,c,b), in which the kinematic acceleration g obs follows the square root of the Newtonian acceleration g bar in the weak-field regime.This enables the total dynamical mass of the galaxy to remain equal to the baryonic mass across galaxies' rotation curves, eliminating the need for dark matter in them.The MOND paradigm attempts to dispense with dark matter entirely, and has cosmologically viable relativistic extensions (most recently Skordis & Złośnik 2021).It is reviewed in Famaey & McGaugh (2012) and Banik & Zhao (2022).
Central to the dark matter-modified gravity debate in the context of galaxy dynamics is the functional form of the RAR.This is because MOND makes a very specific predic-tion (absent the external field effect: g obs = g bar in the highacceleration "Newtonian regime" and g obs ∝ g 1/2 bar in the lowacceleration "deep-MOND regime") while dark matter could accommodate a range of possibilities depending on the effect of galaxy formation on halo density profiles, which remains highly uncertain (e.g.Duffy et al. 2010;Macciò et al. 2012;Grudić et al. 2020;Tenneti et al. 2018;Ludlow et al. 2017;Navarro et al. 2017;Keller & Wadsley 2017).The only potentially unambiguous prediction is that the RAR tends to g obs = Ωm/Ω b g bar at radii sufficiently large to encompass the cosmic baryon fraction, but it is unclear where or even if this occurs in galaxies.Thus, while the ΛCDM prediction for the full RAR can be tested only by applying potentially restrictive priors on galaxy formation effects (Di Cintio & Lelli 2016;Desmond 2017;Paranjape & Sheth 2021), a more direct route towards informing the dark matter-modified gravity debate is to test the MOND prediction, specifically the limiting behaviour at g g0 and g g0, the small intrinsic scatter and the lack of residual correlations.
Here we focus on the asymptotic behaviour.This can be assessed to some extent by fitting a functional form with free power-law slopes at both ends (Lelli et al. 2017), but this assumes that the slope tends to a constant at each end and restricts to a specific part of the functional parameter space for which this is the case.These are in question when assessing the accuracy of the MOND prescription.A fully satisfactory fit should therefore make no such assumptions, eliminating potential confirmation bias and testing without any priors the assertion that the RAR implies no dynamically relevant dark matter at high g and the deep-MOND limit at low g.We accomplish this here by means of a novel regression algorithm dubbed Exhaustive Symbolic Regression (ESR; Bartlett et al. 2022b), and hence assess the degree to which the RAR supports the tenets of MOND.Within the MOND paradigm, this method also enables optimisation of the "interpolating function" (IF) g obs = F(g bar ) between the two stipulated limits.
The structure of the paper is as follows.In Sec. 2 we describe the RAR data that we use, and in Sec. 3 our algorithm for generating functions and assessing their aptitude for describing the data.Sec. 4 presents the results.In Sec. 5 we discuss the broader ramifications, potential remaining uncertainties and ways in which the programme could be furthered in the future.Sec.6 concludes.Full details on ESR are given in the companion paper Bartlett et al. (2022b).Units not explicitly given are 10 −10 m s −2 , and all logarithms are natural.

OBSERVATIONAL DATA
We use the SPARC data set (Lelli et al. 2016),1 a compilation of 175 rotation curves from the literature combined with Spitzer 3.6µm photometry.We apply the same quality cuts as the RAR study of Lelli et al. (2017), removing galaxies with quality flag 3 (indicating large asymmetries, non-circular motions and/or offsets between stellar and HI distributions) and those with inclinations i < 30 deg, and points for which the quoted fractional uncertainty on the observed rotation veloc-ity is greater than 10 per cent.This leaves 2,696 points from 147 galaxies.

METHOD
We describe our method for generating and assessing trial functions in Sec.3.1, and our likelihood function in Sec.3.2.In Sec.3.3 we outline our criteria for assessing whether a function displays MOND-like behaviour.

Exhaustive Symbolic Regression
While algorithms for symbolic regression (SR)-the search for good functional descriptions of a dataset-are becoming mature, they remain fallible (La Cava et al. 2021).Unless the generating function of the data is known at the outset (in which case SR is not required), it is not possible to determine whether any SR algorithm has uncovered the best function.This motivated us to develop "Exhaustive Symbolic Regression" (ESR) which, given a set of basis functions, produces and evaluates every possible function up to a given complexity of equation, defined here as the number of nodes in its tree representation.This enables a brute-force solution to relatively simple problems and provides a touchstone for assessing the results of stochastic algorithms at higher complexity.As shown in detail in Bartlett et al. (2022b), stochastic searches regularly fail to the best functions at even moderate complexity ∼6, so that we would not be confidant of obtaining the functional form of the RAR through any algorithm besides ESR.
Presented in full in the companion paper, ESR has two main steps: i) generating, and optimising the parameters of, all functions up to a given complexity, and ii) ranking these functions using an information-theoretic metric combining accuracy and simplicity.For part i, the steps are: (1) Generate all possible trees containing a given number of nodes (equal to the complexity of functions considered).(2) Generate the complete set of such functions by decorating these trees with all permutations of the operators from the operator list specified in advance, utilising the constraints on the arity of the operator that can occupy a given node.(3) Simplify the functions and remove duplicates.Variants of the same function (e.g.x(x+θ0) and x 2 +θ0x) are however retained as these may have different model complexities (used in step ii, below).For each unique function the variant is retained that minimises this.(4) Determine the values of the free parameters appearing in the functions that maximise the likelihood of the data (see Sec. 3.2).( 5) Repeat for all complexities under consideration.
The only degrees of freedom in this procedure are the maximum complexity considered (here set at 9 as higher complexity is computationally prohibitive) and the set of operators of which the functions are composed.Here we choose: 2 • Nullary: g bar , θ • Unary: exp, sqrt, square, inv • Binary: +, −, * , /, pow where θ is a free parameter.We implicitly take the absolute value of the argument of any square root or power.
The result of this procedure is a list of all functions up to the maximum complexity (of which there are 2.24×10 7 ), along with the parameter values that maximise the likelihood of the RAR data.As in regular regression, using the maximum likelihood as the model selection criterion would favour overfitting, whereby a function fits the data near-perfectly but generalises or extrapolates poorly.SR therefore typically uses a two-objective optimisation, where the second objective is the "simplicity" of the function.In the absence of a metric for trading accuracy (the first objective) with complexity, optimal functions form a "Pareto front" where accuracy cannot be increased without reducing simplicity and vice versa.Simplicity has been defined analogously to model complexity (the number of nodes in the tree representation; e.g. in PySR; Cranmer et al. 2020), among others, but such definitions are typically arbitrary and thus compromise the objectivity of the regression results.
To remedy this, part ii of ESR implements the minimum description length principle (MDL; Rissanen 1978;Grünwald & Roos 2019;Grunwald 2007) as a model selection criterion, which has an information-theoretic motivation and provides a natural framework for making commensurable the two objectives.MDL states that functions are preferred to the extent that they compress the data, i.e. minimise the number of bits required to communicate the data with the aid of the function.We implement this with a two-step code in which the description length (also called codelength) is comprised of a component describing the function and a component describing the residuals of the data around the function's expectation.We use the Shannon-Fano coding scheme for the latter (Cover & Thomas 1991), and for the former include contributions both from the structure of the function (penalising those employing more operators) and from the free parameters (penalising more parameters, especially ones that must be specified to high precision to achieve a high likelihood).The overall codelength of the compressed data, L(D), is derived in sec.3 of Bartlett et al. (2022b): where L is the description length, D the dataset, H the hypothesis (i.e.function in question), L the likelihood, θ a free parameter of the function, k the number of nodes in the function's tree representation, n the number of unique operators involved, p the total number of free parameters, I the Fisher information matrix of the parameters and cj any constant natural numbers generated by simplifications.A hat denotes evaluation at the maximum-likelihood point.With all logarithms natural, this is the number of nats required to communicate the data with the aid of the function.L(D) supports a probabilistic interpretation over function space that generalises the likelihood: the relative probability of a function is exp(−L(D)) (Grunwald 2007).
The structure of the function alone determines the k log(n) term, but the remaining terms require the free parameters to be numerically optimised to maximise the likelihood (which we use interchangeably with minimising the loss). 3We now describe our choice of likelihood for the RAR data.

Loss function
As is typical (e.g.Lelli et al. 2017), we assume that g bar , g obs and their uncertainties are uncorrelated across the dataset.We further assume that the true g bar and g obs values, denoted g t bar and g t obs , generate the observed values with lognormal probability distributions centred at the true values with widths given by their uncertainties δg bar and δg obs .Following Lelli et al. (2017), we fix the mass-to-light ratios Υgas = 1.33,Υ disk = 0.5 and Υ bulge = 0.7 and assign them 10, 25 and 25 per cent uncertainties respectively, summing these in quadrature to estimate δV bar and hence δg bar (assuming no uncertainty in radial position).We likewise assume the uncertainties on distance D and inclination i to be statistical and hence sum their contributions in quadrature with the quoted statistical uncertainty on V obs according to Lelli et al. (2017, eq. 2) to estimate δg bar .
The likelihood of an observation given the function in question, f (g t bar ), is then: where This is derived by keeping the leading-order term in the Taylor expansion of log(f (g t bar )) around log(f (g bar )) and therefore assumes this to be small relative to the rate of change of f .An advantage of working in log(g bar ) − log(g obs ) space as opposed to g bar − g obs is that, the RAR being roughly a product of power-laws, this minimises the error due to the first-order approximation.We find this to be good for all of the best functions.The likelihood is then the product over all data points.We discuss in Sec.A the limitations of this likelihood model and how it could be improved.
Fig. 1 plots these functions on top of the RAR data for the best-fit values on the SPARC data (shown in the lower rows of Table 1 in Sec.4.1).The Simple and RAR IFs are distinguished from the Standard IF principally by a more gradual transition between the Newtonian and deep-MOND regimes, although the Standard IF also prefers a significantly higher value of g0.While the basic MOND framework is not committed to any particular IF, it is committed to Eq. 4 providing an optimal description of the data.Our assessment of the theory will therefore be based on the extent to which the best functions (those with lowest description length) conform to these limits: any function that does so, in addition to possessing a coefficient of proportionality of unity in g obs ∝ g bar at high g bar , may be considered a new MOND IF.(The low-g bar coefficient of proportionality is √ g0 but g0 is unknown a priori, so this does not supply an additional requirement.)Following Lelli et al. (2017), we will also consider a double power law fit: which has limiting logarithmic slopes of θ3 and θ2, and plot the best-fit in Fig. 1.We define s− ≡ limg bar →0 s and s+ ≡ limg bar →∞ s.
The MOND interpretation of the RAR is complicated by the possibility of the external field effect (EFE), a breakdown of the strong equivalence principle due to the nonlinear, acceleration-based modification to Newtonian mechanics (Milgrom 1983a).The EFE implies that otherwise identical galaxies in different external gravitational fields have different dynamics, which is a function of the external field strength gex relative to g0 and the internal field gin.In the quasi-Newtonian regime gin < gex < g0, Kepler's laws are recovered with dynamical masses scaled by gex/g0, while in the external field-dominated regime gin < g0 < gex, Newtonian mechanics are fully recovered (Famaey & McGaugh 2012).This steepens the RAR at low g bar .
The precise effect of the EFE is difficult to calculate in general because it depends both on the underlying MOND theory and on a galaxy's morphology and orientation with respect to the external field direction.The most sophisticated fitting functions to MOND simulations are currently to be found in Zonoozi et al. (2021) for QUMOND (Milgrom 2010) and Chae & Milgrom (2022) for AQUAL (Bekenstein & Milgrom 1984).Chae et al. ( 2022) tested these expectations by fitting the SPARC rotation curves for the average external field strength, finding this to be in good agreement with independent estimates based on the baryonic mass surrounding the  1.The dashed black line shows the one-to-one relation (Newtonian limit) and the cross in the lower right shows the average uncertainty size.
SPARC galaxies (Chae et al. 2021) for AQUAL, but less so for QUMOND.We will therefore use AQUAL to explore the effect of the EFE on the expected low-g bar slope and functional form more generally.Chae & Milgrom (2022) eq. 15 gives .
This allows for variable disk thickness and scale length, and is azimuthally averaged to reduce sensitivity to the orientation of the field relative to the disk axis.It recovers the Simple IF as eN ≡ gex/g0 → 0 and hence we refer to it as "Simple IF + EFE".Note that, while some form of the EFE is generically predicted by MOND, in modified inertia formulations it may be very different (e.g. a function of the entire past trajectory of an object) or negligible (Milgrom 2011).While there is evidence for the EFE in many systems (McGaugh & Milgrom 2013;Haghi et al. 2019;Chae et al. 2020b), in others it appears conspicuously absent (Hernandez et al. 2019;Freundlich et al. 2022).The black curve in Fig. 1 shows the best fit to the data using Eq. 6.

Mock data generation
To shed light on the significance of our results we apply ESR also to two stacks of mock data sets.We generate each mock data set using exactly the same number of points as the SPARC data, and with identical log g bar , δ log g bar and δ log g obs values, but with log g obs generated using a MON-Dian function.This assumes g t bar equal to the SPARC g bar (the maximum a priori estimate), log g bar for each mock realisation drawn from N (log g t bar , δ log g bar ), and log g obs from N (F(log g t bar ), δ log g obs ).To reduce the impact of noise in the mock data we apply ESR to a stack of 10 independent realisations. 4The only terms in the description length that depend on the dataset size are log(L) and the Fisher matrix I, both of which scale linearly.To make the results compatible with the real data we therefore divide these terms by 10.
The two mock data set stacks differ in the generating function F. For the first, we use the RAR IF with the best-fit value on the data g0 = 1.127 (see Sec. 4).This function is already known to describe the RAR well (Lelli et al. 2017) and has low enough complexity to be included (as a special case of a more general function, see below) in our function list.Since Eq. 4 is satisfied by construction in this case, evaluating it on the best functions from ESR will address the question of whether the dynamic range of the data is sufficiently high-and the uncertainties sufficiently low-to pick out unambiguously a correctly MONDian solution, as only in this case could one expect to obtain such behaviour for the real data were it generated by MOND.
The second stack is created using Eq. 6.We adopt g0 = 1.2 and gex = 1.2 × 10 −2 (eN = 0.01), corresponding roughly to maximal clustering of unobserved baryons (as expected in a MOND cosmology and maximising agreement with the rotation curve fits; Chae et al. 2021) and hence providing an upper bound on the impact of the EFE.This is similar to the value inferred in Chae (2022) and Chae et al. ( 2022) from fits to the SPARC rotation curves, and from our fit to the SPARC data in Table 1.5

SPARC data
We show in Table 1 the statistics of the best functions found by ESR on the SPARC data.We split the codelength of Eq. 1 into terms describing the residuals of the data around the functional expectation, the functional form and the parameter values as shown in the table footnotes.Below the horizontal line we give the results of the three MOND IFs, for which the free parameter corresponds to g0, the double power law (Eq.5) and the Simple IF + EFE (Eq.6).
) is the probability of the function given its description length, where the sum is over all functions up to complexity 9. (Note that these values would be changed by low-L(D) functions at higher complexity.)For reference, L(D) for the raw data (corresponding to the hypothesis log g obs = 0) is 53471, showing that significant compression is possible.
We find the best-fit g0 value for the RAR IF to be 1.13, somewhat lower than the 1.20 quoted by Lelli et al. (2017) although the data are the same.This is because Lelli et al. (2017) used scipy.odr to perform the optimisation rather than using the full first-order likelihood Eqs.2-3, and also did the analysis in the log(g obs ) − g bar rather than log(g bar ) − log(g obs ) plane (F.Lelli private communication).The double power law fit has much higher maximum likelihood than the MOND IFs, outweighing its increased codelength due to its four free parameters.Although the RAR IF has complexity  9 it is not explicitly produced by ESR due to the constant "1" appearing,6 a generalised form in which this is replaced by a free parameter appears at rank 17, with a probability 4 × 10 10 times lower than the top-ranked function (∆L(D) = 24.5).When θ0 = 0 the low-g bar logarithmic slope s− of this function is 1 rather than 1/2, so it does not function as a MOND IF.We refer to it as the "generalised RAR IF".
The best ESR functions are clearly superior to the MOND functions or double power law.While the best metric for this is L(D) (or equivalently P (f )), other statistics lead to the same conclusion.There are many functions more accurate (lower − log(L)) than even the double power law.Although the functions at rank 1-9 have more free parameters than the IFs this is more than compensated for by their greater accuracy: as an alternative metric, the Bayesian Information Criterion (BIC) of the rank 1 function is 108 lower than the Simple IF and even that of the rank 3 function with four free parameters is 94.5 lower, corresponding to a very strong preference.In agreement with Chae et al. (2020bChae et al. ( , 2021Chae et al. ( , 2022)), when fitting the Simple IF + EFE we find that eN > 0 is clearly preferred, and recovers a value around the large-scale structure expectation.Although this function is significantly more accurate than any IF on its own, more than compensating for its additional free parameter (∆BIC = −35 compared to the Simple IF), it has a poor description length due to the large functional contribution.The complexity is 59 using our current basis set of operators, although this would fall to 45 if tanh were explicitly included.In general, the great improvement in accuracy and simplicity of the ESR functions demonstrates the advantage of this method over guessing functions "by eye".
In the top panel of Fig. 2 we plot the best 20 functions on top of the SPARC data.The functions are colour-coded by P (f ), with darker colouring indicating functions favoured by MDL.The top six functions have discontinuities in s at g bar ≈ 0.02.We have checked that this is not due to outlying points, does not invalidate the first-order approximation of Eq. 2, and does not appear to map onto any local feature of the data.Instead it is likely due to the relative simplicity (i.e.complexity ≤ 9) of the functions considered, as we discuss further in Sec. 5.While such behaviour could be excluded by requiring no discontinuity in the derivative within the range of the data (or more generally weighting functions with an s-dependent prior), we see no principled reason to do so.The first function without a discontinuity is at rank 7, which has the Newtonian limit and s− = 0.52 at maximum likelihood.Although this is 9.3 σ from s− = 1/2, and hence this equation cannot function as a pure-MOND IF, the EFE leads to the expectation that s > 1/2 at low g bar as discussed in more detail below.Several of the remaining highly ranked functions have similar quasi-MONDian behaviour.
While some of the best functions have MONDian limits around g bar = 10 ±3 others do not, and in particular s− < 1/2 is common.To explore this further we calculate s− and s+ analytically for the top ten equations as a function of their free parameters, showing the results in Table B1.For each of the functions where s− and/or s+ depend on θ (i.e. are not fixed by the functional form alone) we perform a Markov Chain Monte Carlo (MCMC) inference using the numpyro sampler (Phan et al. 2019;Bingham et al. 2019) with broad flat priors to constrain the parameters and hence derive the posterior predictive distributions of the limiting slopes.Fig. 3 (left panel) shows the results for the top ten functions, using a dot to indicate a limit fixed by the functional form, a bar to show the 95 per cent confidence interval in cases where the slope depends on the parameters, and an arrowhead to indicate a limit outside of the plotting range (including ±∞).In many cases the 95% confidence interval is extremely narrow.
Although it is s− and s+ that directly relate to the MOND hypothesis, they require extrapolation far beyond the range of the data.To understand how the slopes behave near the limits of the SPARC data we also calculate s at the minimum, g bar, min = 8.32 × 10 −13 ms −2 , and maximum, g bar, max = 6.54 × 10 −9 ms −2 , measured baryonic accelerations.These are plotted in cyan and magenta respectively in Fig. 3.At g bar, max the logarithmic slope is 1 for almost all functions, as expected for the Newtonian limit as only at g bar → ∞ does s become 1.However, we find that the g bar,min slopes of the top five functions are not ∼ 1/2 but actually < 0 due to the aforementioned discontinuity.The remaining top functions have low-acceleration slopes typically slightly larger than 1/2.
These results show that the SPARC data do not unambiguously favour s− = 1/2 and s+ = 1.The requirement for an interpolating function to be MONDian is in fact even more stringent than this, since the coefficient of proportionality in the limiting high-g bar power-law relation must be unity, i.e. g obs = g bar .We find that among the functions in the top ten for which s+ = 1, four have such a coefficient (at rank 2, 4, 7, 10) while for the rank 1 function this is 0.84 ± 0.006 and at rank 8 it is 0.72 ± 0.01, where the uncertainties are obtained by fitting the functions with MCMC.At low g bar the coefficient in g obs ∝ g 1/2 bar should be √ g0 ≈ 1.The only function with s− = 1/2 (rank 6) has the coefficient 1.12 (close to the 1.10 expected from the canonical g0 = 1.2), while those further down with s gbar,min ≈ 1/2 have a coefficient of 1.The relative simplicity of the functions we consider here should preference a coefficient of 1, and hence again it is not clear to what extent the data may be said to be MONDian.The double power law has the limits g obs = 0.81 g 1.03 bar at high g bar and g obs = 1.57g 0.60 bar at low g bar .Next, we show in the left panel of Fig. 4 the separate Pareto fronts of description length and negative log-likelihood, with the second ("simplicity") objective measured by functional complexity.Unlike the Pareto fronts produced by traditional SR algorithms, those of ESR are guaranteed to be optimal.L(D) and − log(L) are minimised separately at each complexity, and have their minimum values over all complexity subtracted so that the globally best functions appear at 0. We show the MONDian functions and double power law as separate symbols, all of which we find to be strongly Paretodominated by the best ESR functions at lower complexity.Note that while the "knee" of the Pareto front (where L(D) or − log(L) turns over) would appear to be at complexity 6-7, there is a significant improvement in going from complexity 8 to 9.This cautions against automatically selecting functions at the knee (the default for example in PySR), and indicates that further improvement would likely be achievable by going beyond complexity 9.This is beyond the scope of the present work; we are content here to have discovered simple functional forms for the RAR surpassing any that have been considered heretofore.

Mock data
The above results suggest a Newtonian limit (g obs → g bar as g bar → ∞) is somewhat favoured by the data while a deep-MOND limit (g obs → √ g0 g bar as g bar → 0) is questionable.However, given the limited dynamical range and significant uncertainties of the data it is unclear to what extent we should expect to find these limits even if the generating function were MONDian.In addition, the EFE would imply s > 1/2 at low g bar .To investigate these issues we now apply ESR to the mock data of Sec.3.4.1. Top functions found by ESR applied to the SPARC data, ranked by total description length, compared to four MOND IFs and a double power law below the horizontal line.x ≡ g bar /10 −10 m s −2 .We include also the "generalised RAR IF" at rank 17, although this does not have a deep-MOND limit.The IFs and double power law are not produced explicitly by our implementation of ESR and hence their ranks are unknown, although they are clearly worse than the best ESR functions in both description length and likelihood.For the Simple, RAR and Standard IFs the parameter is g 0 , while for the "Simple IF + EFE" (Eq.6), the first is g 0 and the second e N .

RAR IF generating function
this data, but there are 15 functions with lower L(D), including several at lower complexity.This indicates that the characteristics of the SPARC data (dynamic range and uncertainties) are insufficient to pick out the true generating function: ESR prefers simpler functions which may achieve slightly higher likelihoods.Since the − log(L) term in L(D) becomes dominant at large dataset size, simply increasing the number of (mock) observations with otherwise identical properties would not be sufficient to push the generating function to the top of the list.For this dataset we find that the generalised RAR IF, appearing at rank 41, has higher L(D) than the RAR IF despite slightly higher likelihood, a success of MDL's penalisation of more complex functions.Note that the best-fit generalised RAR IF is x/(0.995− exp(− x/1.068)), somewhat offset from the ground-truth values {1, 1.127}.This results from a combination of the limited dataset size (introducing random noise) and a small bias in the maximumlikelihood estimator which we discuss further in Appendix A. At rank 27 we find a close cousin of the RAR IF in which the "1" is free and g0 is pinned to 1.This performs only slightly worse than the RAR IF itself because the true g0 is close to 1.
The highest ranked ESR function is better by ∆L(D) = 6.3 than the RAR IF.Although the relative probability between the best function and the best RAR-like function is smaller than in the observations (∼ 1600 compared to 4 × 10 10 ), it is interesting that the best RAR-like function appears further up the list in the real data (rank 17 vs 27).The double power law is disfavoured relative to the RAR IF and many ESR functions despite having the highest likelihood shown in the table, another success of the complexity penalisation.There are a few functions overall with lower − log(L), the lowest being −2048.0 at rank 51 ((θ0 + |θ1 + θ2/x| 1/2 ) −1 , L(D) = −2018.4).Also as expected, the Simple IF + EFE has eN snapped to 0 and hence behaves identically to the Simple IF, albeit with larger functional complexity.
The top 20 ESR functions for the RAR IF mock are overplotted on that data in the middle panel of Fig. 2. We find a slightly reduced spread in s at both the high-g bar and low-g bar ends compared to the real data, without any discontinuities within the data range.However there is still significant uncertainty beyond the range of the data.This is quantified in the middle panel of Fig. 3, where the top ten functions are all observed to have slopes of approximately 1/2 at g bar,min and 1 at g bar,max , although only in two cases is s− = 1/2: for the others it is lower.This indicates that constraining the slope to be near 1/2 at g bar,min is insufficient to conclude that s− takes a similar value, at least up to complexity 9.One would need to reduce the uncertainties, or, preferably, lower g bar,min .That this applies to a lesser extent at high-g bar is shown by the functions at rank 4 and 7 with s+ = ∞.
Adding to the conclusion that the mock data characteristics are insufficient to pick out a MONDian generating function, we find that the coefficient in g obs ∝ g bar at high g bar is only 1.12 0 ---2026.2139.9 3.9 -1882.4 2. As Table 1 but for the RAR IF mock data.We find the generalised RAR IF at rank 41, and a closely related function at rank 27 in which the free parameter appears in the other term in the denominator.For this dataset the RAR IF itself is superior to both of these modified forms, as would be expected given that it generated the data.
unity for one of the top-10 functions for which s+ = 1 (at rank 6).For all the others it is 0.64, with the exception of that at rank 1 where it is 0.63.These values have uncertainties ∼ 0.003 when constrained by MCMC, and vary by ∼ 0.02 over mock datasets differing only in the random seed.Thus the best functions almost always fail to recover the Newtonian limit even when it is the truth, presumably due to an insufficient g bar,max .The origin of 0.64 is unclear, but presumably results from the way the lower-g bar behaviour is filtered through the forms of the functions found to be optimal.In cases where s− = 1/2, the coefficient of proportionality is 1 (to be compared to √ g0 in MOND), and the double power law limits are g obs = 1.20 g 0.90 bar at high g bar and g obs = 1.30g 0.54 bar at low g bar .
The middle panel of Fig. 4 shows the Pareto front for these data.We find more smooth behaviour than for the real data, with the optimum solution achieved already by complexity 8, reflecting the relatively simple nature of the generating function.This shows that if the RAR IF was generating the real data (and our likelihood and mock data generation method were accurate), we would have achieved the L(D) minimum on those data too.However, the MONDian functions (including the RAR IF itself) and double power law are Pareto-dominated by the ESR results even on these mock data, showing that one should not expect to be able to re-cover unambiguously even this simplest of MOND generating functions.

Simple IF + EFE generating function
Analogous results for the mock data generated using the Simple IF with inclusion of the EFE are shown in Table 3 and the bottom/right panels of Figs.2-4.This dataset behaves more similarly to the real data in terms of the relative ordering of the IFs, double power law and ESR functions, including the generalised RAR IF.Indeed, the best-fit parameters of all three non-EFE IFs are identical to the SPARC data to two decimal places, while those of the generalised RAR IF and the high-g bar slope of the double power law are the same to one.Here the IFs provide a significantly worse compression of the data than the best ESR functions, and the double power law also performs relatively poorly due to the curvature at low g bar (see Fig. 1).There is again a small bias between the maximum-likelihood (1.19 and 8.56×10 −3 ) and true (1.2 and 1×10 −2 ) g0 and eN values for the Simple IF + EFE fit.Although this function has among the highest likelihoods achievable by ESR up to complexity 9, its functional complexity makes it a poor compression of its own SPARC-like mock data.This reinforces the conclusion that the characteristics of these data are insufficient to identify a MONDian  generating function: this one in particular would require far more data than the RAR IF to be favoured by MDL.
For the Simple IF + EFE mock all top-10 functions have s+ = 1 and s(g bar,max ) > 0.9.However, only six of them recover the true Newtonian limit g obs = g bar : the others find g obs = 0.71 g bar or g obs = 0.79 g bar , again with sub-percent uncertainty from MCMC.Thus under this model too one would not expect the Newtonian limit to be identified robustly.While s(g bar,min ) > 1/2, indicating the significant impact of the EFE, s− is typically 0 as opposed to 1 as expected from Eq. 6.Thus g bar,min is too high to constrain s− reliably, although this may also be a reflection of the relative simplicity of the functions we consider.The double power law limits are g obs = 0.96 g 0.98 bar at high g bar and g obs = 1.55 g 0.60 bar at low g bar .The Pareto front indicates this dataset to be somewhat more complex than the RAR IF mock, as L(D) continues to fall to complexity 9, although the smoothness shows it to be simpler than the real data.
All functions considered achieve considerably higher likelihood on the mock datasets than the real data, showing that the mocks are simpler.This could be because the data do not conform to the MOND expectation-one would expect any given g obs = f (g bar ) to be relatively inaccurate in a chaotic ΛCDM galaxy formation scenario-or because the model for scattering the mock data points is overly simplistic.This is discussed further in Sec. 5. Relatedly, the P (f ) values of the top functions are very closely spaced in the mock datasets, indicating that there is little to distinguish them.On the contrary, on the real data the integrated probability of all functions besides the top five is 10 −8 , suggesting that these functions perhaps ought not to be considered at all.The limiting slopes of the top ten equations as a function of the parameters can be found in Tables B2 and B3 for the RAR and EFE mocks respectively.

DISCUSSION
Our main conclusion is that the SPARC data are insufficient to determine robustly the limiting behaviour of the RAR, and hence cannot verify or refute the MOND hypothesis.This is reached by studying mock data generated by MOND; in particular, generating data according to the RAR IF, not only are we unable to identify that as the generating function, but, more seriously, we cannot reconstruct s− = 1/2.At the high-g bar end, the logarithmic slope of the Newtonian limit (s+ = 1) is typically well recovered, although the coefficient of proportionality in g obs ∝ g bar is not: in the RAR IF mock data this takes a values ∼ 0.64 far more often than 1.
Improving this situation requires increasing the dynamical range of the RAR.At the low-g bar end this may be achieved by studying ultra-diffuse galaxies (e.g.Freundlich et al. 2022), or local dwarf spheroidals (e.g. McGaugh & Wolf 2010;Mc-Gaugh & Milgrom 2013), some of which seem seem to indicate s− ≈ 0, as found by many of the best ESR functions (Lelli et al. 2017).Alternatively, one may attempt to probe the outer regions of galaxies including the Milky Way (e.g.Oman et al. 2020).Particularly promising for a large gain is to use stacked weak lensing to probe galaxy outskirts that would have insufficient signal-to-noise on an individualobject basis (Brouwer et al. 2021).This appears to indicate s− ≈ 1/2.Increasing g bar,max requires probing the central regions of high-mass ellipticals, as well as groups and clusters of galaxies (Chae et al. 2020a(Chae et al. , 2019;;Gopika & Desai 2021;Chan & Del Popolo 2020;Tian et al. 2020;Pradyumna & Desai 2021).Such data already exists and may readily be folded into our framework to increase its constraining power.A smaller information gain may be achieved by reducing the uncertainties in g bar and g obs : in the limit of no uncertainty any generating function will be assigned P (f ) = 1 by MDL.By generating mock data with different characteristics one could ascertain the requirements for various features of the functional form to be unambiguously determined.
It is likely that there exist functions at higher complexity superior to those of Tables 1-3, especially for the real data where L(D) drops significantly from complexity 8 to 9. While uncovering lower-L(D) functions at complexity >9 may update the optimal limiting behaviours of the functional form of the RAR, and hence its compatibility with MOND, it cannot compromise our discovery of simpler and more accurate functions than the IFs and double power law.Indeed, the fact that the (or at least a) knee of the Pareto front is reached around complexity 7 in the left panel of Fig. 4 shows that such functions already offer a powerful compression, and the commonalities between the top functions at complexity 9 suggests that similar features are likely to be present in more complex functions also.
Discovering such functions would likely be computationally prohibitive for the ESR algorithm, and thus a stochastic search (e.g. using a genetic algorithm) may be required.This search may be seeded by the ESR functions: the fact that many of the best-fitting functions have similar features (such as g obs ≈ 0.64 g bar as g bar → ∞ for the RAR IF mock) suggests these may be useful for higher-complexity functions also.Thus ESR may be used to validate the underlying assumption of stochastic searches-that there exist features of functions responsible for their fitness-and the identification of these features may be useful for tuning hyperparameters.It would also be possible to combine ESR with deterministic symbolic regression algorithms (e.g.Worm & Chiu 2013;Rivero et al. 2022;Kammerer et al. 2021) to search systematically the neighbourhood of good functions towards higher complexity.
Our best functions on the real data have a discontinuity in s around g bar = 0.02.This is likely due to the limited complexity of the equations we consider: a cusp is the simplest way of changing s sharply.It is probable that the optimal functions at higher complexity will have a smoothed form of this behaviour in which s does not become negative and may not tend to 0. We therefore doubt that the s− and s(g bar,min ) values of the best functions in Table 1 are robust.One could attempt to construct more complex functions inspired by the ESR results with similar but not discontinuous behaviour and calculate their − log(L) and L(D) separately, or feed them into a genetic algorithm as mentioned above.On the other hand, below complexity 9, there is only a single low description length function that is discontinuous, the third best function at complexity 8 (P (f ) = 4.2 × 10 −11 ).The best functions at lower complexity more frequently have s− = 1/2 and s+ = 1, although again they rarely satisfy g obs = g bar as g bar → ∞.For example, the top function at complexity 6-marking the (first) knee of the L(D) Pareto front in Fig. 4-is g obs = 0.70 g bar + √ g bar , exhibiting simi-lar high-g bar behaviour to the 1 st -and 8 th -ranked functions overall.
To generate the mock data we assumed that all g bar values are uncorrelated.While this is likely true between galaxies, it is not within a single galaxy because the uncertainty in g bar is dominated by the mass-to-light ratio, a global galaxy parameter in the simplest approximation.This may be seen from the data in Fig. 1, where lines of points (e.g.scattering low around g bar = 2 or high around g bar = 10) are all from the same galaxy.A more robust procedure may be to generate Υ values for each mock galaxy by randomly drawing from their priors, use this to transform g bar and then add any other, random sources of noise (e.g. from the uncertainty in 3.6 µm luminosity).By enhancing inter-galaxy variations this may increase the complexity of the mock datasets, moving their ESR results towards those of the SPARC data.Alternatively, one may fit each galaxy separately to assess compatibility of their individual RARs (analogously to Li et al. 2018 but not just for the RAR IF).The assumption of uncorrelated data points is also present in our likelihood, as discussed further in Appendix A. A complete analysis would infer D, i, L3.6 and Υ for each SPARC galaxy along with the parameters of the function being fitted.
We have assumed no intrinsic scatter in the RAR, such that all deviations from the hypothetical functional expectation must come from the observational uncertainties.While this is expected in MOND, in ΛCDM the complex process of galaxy formation would lead to a significant and parameterdependent effective intrinsic scatter (Desmond 2017).Even the EFE would introduce some scatter due to galaxy-bygalaxy variation in gex (Chae et al. 2021).It would be straightforward to add this (in some direction on the RAR plane) as an additional free parameter of all functions, which would alter the results.MDL naturally penalises the addition of this parameter, allowing one to determine whether it is justified for any given function.This would provide further evidence concerning the optimality of MOND by assessing the extent to which the data implies law-like modified gravity behaviour.
Our current implementation of MDL treats the parameter values as part of the model and chooses them to maximise the likelihood.An alternative would be to treat the hypothesis in Eq. 1 as the functional form alone, assigning codelengths and probabilities to functions regardless of their parameter values.In a Bayesian formulation this corresponds to marginalising over the parameters, and enables a simpler one-part coding scheme where the description length is simply the negative logarithm of the model evidence including any functional prior.An even higher-level approach would be to group functions into sets with specific properties, e.g.limiting behaviour.This would enable calculation of the posterior predictive distribution of any feature of the functional representation of the dataset, and hence enable model comparison at any level of generality.
The relative simplicity of the RAR and conformity to the Newtonian and deep-MOND limits are the key differences between the expectations of MOND and the more chaotic galaxy formation scenario of ΛCDM: it is only under the simpler scenario that one would expect to find a simple g obs = F(g bar ).While our results are therefore not particularly supportive of the MOND hypothesis, this is not to say either that the data could not plausibly have been generated by MOND or that it could plausibly have been generated under another hypothesis, as only MOND currently has sufficient predictivity for a test of this precision.We look to future SR studies with more data to establish the functional form of the RAR-if it exists-definitively.

CONCLUSION
The radial acceleration relation (RAR) has become central to debates about the mass discrepancy problem on astrophysical scales.Its tightness and regularity have been used to argue for a violation of Newtonian gravity in accordance with Modified Newtonian Dynamics (MOND), but the functions used to fit the data have been constructed to conform to this theory.As the first detailed application of the brand-new technique of Exhaustive Symbolic Regression, we rank objectively all simple functions in terms of their aptitude for describing the SPARC RAR.We employ the minimum description length principle to trade accuracy with simplicity and hence perform model selection, and calibrate our method on mock MOND data generated both with and without the external field effect (EFE).Our conclusions are as follows: • ESR discovers functions which are better descriptions, in both accuracy and simplicity and for both observed and simulated data, than MOND functions or a double power law.
• While the majority of best-fitting functions on the SPARC data recover g obs ∝ g bar at high accelerations, not all have a best-fit coefficient of proportionality near unity.Thus the Newtonian limit is not clearly evidenced.• The SPARC data do not prefer functions with the deep-MOND limit of g obs ∝ √ g bar as g bar → 0. Instead, we find that functions with g bar → const typically compress the data more efficiently, albeit with considerable uncertainty.
• SPARC-like mock data generated assuming the MONDian RAR interpolating function do not unambiguously recover that function.Moreover, many of the best functions for those mock data have g obs ≈ 0.64 g bar rather than g obs = g bar at high g bar , and most do not have a deep-MOND limit at all.• The EFE in AQUAL greatly increases the logarithmic slope of the best-fitting functions at the low-g bar end of the data, but does not appreciably impact the limiting slope at g bar → 0. Incorporating the EFE in the mock data produces more generally similar results to the real data, so our analysis (within the MOND paradigm) hints at it.• We conclude that the data have too small a dynamic range (and too large uncertainties) to unambiguously favour MOND even if it is in fact generating the data.The SPARC RAR alone, therefore, does not supporting that theory unambiguously.The best prospect for improving this situation is to increase the acceleration range of the data, e.g. using stacked weak lensing at low g bar and groups and clusters at high g bar .• Our results are a function of the maximum complexity of equation considered.Future symbolic regression algorithmsexhaustive or non-exhaustive-will reach the true description length minimum and hence uncover the optimal functional representation of the RAR and determine whether the relation implies novel law-like gravitational behaviour.
Exhaustive Symbolic Regression provides for the first time a guaranteed complete search through functional parameter space, making it the ideal tool to determine the analytic form of observed relations, extract physics from data theoryagnostically, and create fitting functions.We make the ESR and RAR codes, full function sets and the best 50 functions for each dataset we consider publicly available to facilitate future applications.

DATA AVAILABILITY
The code and data associated with ESR and its application to the RAR are released at and in Bartlett et al. (2022a).The SPARC data is available at http://astroweb.cwru.edu/SPARC.Other data may be shared on request to the corresponding authors.
Table B1.Functional forms and limiting slopes of the ten best functions found by ESR applied to the SPARC data.The functions in the second column give the fitted y = g obs /10 −10 m s −2 for input x = g bar /10 −10 m s −2 .The low-acceleration slope is lim x→0 + d log y/d log x (denoted s − in the text), and the high-acceleration slope is similarly defined but for x → ∞ (s + ).{θ i } are real parameters fitted to the data to maximise the likelihood (see Table 1).Slopes given without conditions are valid ∀ θ i .For comparison, the MOND prediction without the external field effect is s − = 1/2 and s + = 1.

Figure 1 .
Figure 1.The Simple, Standard and RAR IFs, a double power law, and Simple IF with a global external field strength in AQUAL, overlaid on the SPARC data (blue points).The parameters are set to their maximum-likelihood values shown in Table 1.The dashed black line shows the one-to-one relation (Newtonian limit) and the cross in the lower right shows the average uncertainty size.

Figure 2 .
Figure 2. The top 20 functions found by ESR overlaid on the SPARC data (blue points), colour-coded by their relative probability in the full function list.The top panel fits the SPARC data, the middle panel mock data generated by the RAR IF, and the bottom panel mock data generated by the Simple IF with universal external field strength gex = 1.2 × 10 −12 m s −2 .The mock datasets are 10 times larger than SPARC, although this is factored out in the description length calculation.

Figure 3 .
Figure 3.The logarithmic slopes s ≡ d log(g obs ) d log(g bar ) of the top 10 ESR functions on each dataset, for comparison with the low-and high-g bar MONDian expectations 1/2 and 1 respectively (blue and red vertical dashed lines).The blue and red points are the limiting slopes s − ≡ lim g bar →0 + s and s + ≡ limg bar →∞ s, while cyan and magenta indicate the slopes at the minimum and maximum g bar of the SPARC data (0.0083 and 65.4).In case a slope depends on a parameter value we show the 95% confidence interval as a bar (often very thin), obtained from an MCMC fit.Arrowheads indicate points or bars beyond the range of the plot.

Figure 4 .
Figure 4.The Pareto fronts identified by ESR for the SPARC, RAR IF mock and Simple IF + EFE mock datasets, for both log(L) (blue) and total description length L(D) (red).The quantities plotted have the minimum values subtracted so that the best results appear at 0. Also shown are the results of the RAR, Simple and Standard IFs, Simple IF + EFE and double power law fits.ESR significantly outperforms these "by eye" guesses, even for mock data generated from them.Short diagonal lines on the x-axis indicate breaks.In the left and right panels both red and blue points for the Standard IF at complexity 14 lie above the top of the plot.

Table 3 .
As Table 1 but for the Simple IF + EFE mock data.