Full spectrum fitting with photometry in pPXF: stellar population versus dynamical masses, non-parametric star formation history and metallicity for 3200 LEGA-C galaxies at redshift z~0.8

I introduce some improvements to the pPXF method, which measures the stellar and gas kinematics, star formation history (SFH) and chemical composition of galaxies. I describe the new optimization algorithm that pPXF uses and the changes I made to fit both spectra and photometry simultaneously. I apply the updated pPXF method to a sample of 3200 galaxies at redshift $0.6<z<1$ (stellar masses $M_\ast\gtrsim3\times10^{10}$ M$_\odot$), using spectroscopy from the LEGA-C survey (DR3) and 28-band photometry from two different sources. I compare the masses from new JAM dynamical models with the pPXF stellar population $M_\ast$ and show the latter are more reliable than previous estimates. I use three different stellar population synthesis (SPS) models in pPXF and both photometric sources. I confirm the main trend of the galaxies' global ages and metallicities $[M/H]$ with stellar velocity dispersion $\sigma_\ast$ (or central density), but I also find that $[M/H]$ depends on age at fixed $\sigma_\ast$. The SFHs reveal a sharp transition from star formation to quenching for galaxies with $\lg(\sigma_\ast/\mathrm{km\, s^{-1}})>2.3$, or average mass density within 1 kpc $\lg(\Sigma_1^{\rm JAM}/\mathrm{M_\odot\, kpc^{-2}})>9.9$, or with $[M/H]>-0.1$, or with Sérsic index $\lg n_{\rm Ser}>0.5$. However, the transition is smoother as a function of $M_\ast$. These results are consistent for two SPS models and both photometric sources, but they differ significantly from those of the third SPS model, which demonstrates the importance of comparing model assumptions. The pPXF software is available from https://pypi.org/project/ppxf/.


INTRODUCTION
The study of the stellar population of galaxies is an essential tool when trying to uncover how they assembled. For this reason, a vast number of papers have tried to infer the galaxies' star formation history (SFH) and chemical composition from observations. The earliest results were based on simple galaxy colours, as these were easier to obtain (see e.g. the lectures by Baade 1963). However, galaxy colours alone cannot strongly constrain both the galaxies' chemical composition and SFHs. Inferences from galaxy photometry alone are strongly affected, for example, by the age-metallicity degeneracy (e.g. Worthey 1994) as well as by the SFH-dust degeneracy (e.g. Silva et al. 1998; Devriendt et al. 1999; Pozzetti & Mannucci 2000). For this reason, most of our knowledge of both the star formation and chemical composition of galaxies has been obtained from the numerous absorption features in their spectra.
Stellar population synthesis (SPS) models based on empirical stellar spectra have been developed that can produce synthetic galaxy spectra at high resolution. Examples of this kind of model are the galaxev (Bruzual & Charlot 2003), the Vazdekis (Vazdekis et al. 2010, 2015), the fsps (Conroy et al. 2009; Conroy & Gunn 2010), the Maraston (Maraston 2005; Maraston & Strömbäck 2011; Maraston et al. 2020) and, most recently, the XSL models (Verro et al. 2022b). Most current SPS models complement empirical stellar spectra with fully synthetic ones like BaSeL (Westera et al. 2002) or MARCS (Gustafsson et al. 2008). This allows one to cover stages of stellar evolution not well sampled by observations and to extend the wavelength coverage, especially to the ultraviolet and infrared regions, which are poorly covered by observations. Fully synthetic SPS models were also developed, like bpass (Stanway & Eldridge 2018; Byrne et al. 2022) or the MARCS version of the Maraston models.
Initially, large spectroscopic studies of the stellar population in nearby galaxies were based on line indices of specific absorption features, generally using the Lick system (e.g. Worthey et al. 1994), but the availability of good-quality high-resolution SPS models (see review by Conroy 2013) motivated a shift towards full-spectrum fitting (see review by Walcher et al. 2011). Various template-fitting methods were developed for this task, like ppxf (Cappellari & Emsellem 2004; Cappellari 2017), starlight (Cid Fernandes et al. 2005), stecmap (Ocvirk et al. 2006), vespa (Tojeiro et al. 2007), fit3D (Sánchez et al. 2016; Lacerda et al. 2022) and firefly (Wilkinson et al. 2017). These methods and software were extensively used, e.g. to analyse the millions of spectra produced by integral-field spectroscopic surveys in the local Universe like ATLAS$^{\rm 3D}$, CALIFA (Sánchez et al. 2012), SAMI (Bryant et al. 2015) and MaNGA (Bundy et al. 2015).

Spectral fitting of high-redshift galaxies
At significant redshift (e.g. $z\gtrsim1$) the cosmological surface brightness dimming (e.g. Hogg 1999) makes good-quality spectra more difficult to obtain, as the contribution of the sky background starts to dominate. ppxf was used to measure the kinematics and stellar populations of significant samples of galaxies at redshift $z\approx1$ (e.g. Shetty & Cappellari 2015; Bezanson et al. 2018), but only of individual objects out to $z\approx2$ (e.g. van de Sande et al. 2013; Belli et al. 2014, 2017) and $z\approx3$ (e.g. Esdaile et al. 2021; Forrest et al. 2022). Most large studies of distant galaxies still had to rely on photometry alone.
The most essential parameters one wants to extract from high-redshift galaxies are their redshift and stellar mass (e.g. Muzzin et al. 2013; Weaver et al. 2022). Various template-fitting codes were developed to measure masses and redshifts from photometric observations in multiple bands (I ignore here methods based on machine learning; see Salvato et al. 2019 for a review). These include Hyperz (Bolzonella et al. 2000), bpz (Benítez 2000), LePhare (Arnouts et al. 2002), zebra (Feldmann et al. 2006) and eazy (Brammer et al. 2008). These methods are conceptually similar to the template-based spectral fitting ones used for nearby galaxies; however, they all adopt a Bayesian approach instead of a least-squares one. This makes the codes simpler and allows for the easy inclusion of priors on galaxy parameters or non-Gaussian uncertainties; e.g. one can assign a low probability to solutions where the galaxy has an unphysically large or small stellar mass.
Building on the photometric-redshift and full-spectrum fitting approaches, new software was later developed to fit spectra together with the photometry, while still retaining the same Bayesian approach of photometric-redshift codes. Examples of these are fast (Kriek et al. 2009), beagle (Chevallard & Charlot 2016), bagpipes (Carnall et al. 2018), the code described by Mendel et al. (2020) and prospector (Johnson et al. 2021b).
Contrary to what is sometimes stated, both least-squares (or maximum-likelihood) and Bayesian methods can return model posteriors when needed. The former use bootstrapping (e.g. Efron & Tibshirani 1994) or Monte Carlo approaches. In fact, bootstrapping can be seen as an efficient way to compute the Bayesian posterior with non-informative priors (e.g. Rubin 1981; Efron 2011). Although bootstrapping is less flexible than general Bayesian methods, in many realistic situations the uncertainties of model parameters are dominated by systematics in the data and by model assumptions (as I also find later), rather than by the details of the adopted statistical approach or the adopted priors.

This paper
In this paper, I proceed differently from most existing methods. Instead of adopting the standard Bayesian approach to fit photometry and spectra, I present an extension of my ppxf least-squares full-spectrum fitting method to simultaneously fit photometry. A key difference of this approach is that it can be a few orders of magnitude faster than Bayesian methods. Apart from algorithmic differences, my ppxf approach to fitting photometry with spectra is analogous to the extension of the starlight least-squares full-spectrum fitting method (López Fernández et al. 2016; Werle et al. 2019). Least-squares methods appear complementary to existing ones, as the extra speed allows for extra flexibility in the treatment of the stellar population, as shown later.
I illustrate the characteristics of the approach by fitting the VIMOS spectra and COSMOS photometry (Muzzin et al. 2013; Weaver et al. 2022) to study the joint SFH-metallicity distributions and the stellar population scaling relations of about 3200 galaxies from the LEGA-C survey in the redshift range $0.6<z<1$.
Readers interested in the ppxf techniques should keep reading: I discuss how to measure velocities in Section 2 and the ppxf updates in Section 3. Those only interested in the scientific results can instead skip the next two sections and go directly to the description of the data in Section 4 and the results in Section 7. In this work, I adopt a standard cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_m = 0.3$ and $\Omega_\Lambda = 0.7$.

MEASURING VELOCITY AND REDSHIFT
In Cappellari (2017, sec. 2) I reviewed general and important facts that one should know before using any full-spectrum fitting method, and ppxf in particular. Here I include only some updates, and I refer the reader extensively to my previous paper of this series to avoid duplicating material.

From measured velocity to observed redshift
The physical meaning of the velocity returned by ppxf, or any spectrum-fitting code, is often a source of confusion. As discussed in Cappellari (2017, sec. 2.3), the reason for this is that the fitted velocity $V_{\rm ppxf}$ has no direct physical meaning. Even the recession velocity itself, for a distant galaxy, is an ill-defined concept with a debated interpretation (e.g. Bunn & Hogg 2009). It should never be used for quantitative work. What is well-defined empirically is the redshift of a given spectrum:
$1 + z = \lambda_{\rm obs}/\lambda_{\rm emit}$, (1)
where $\lambda_{\rm obs}$ and $\lambda_{\rm emit}$ are the observed and rest-frame wavelengths of a given spectral feature. The key formula that is needed to convert the $V_{\rm ppxf}$ returned by ppxf into a redshift is (Cappellari 2017, eq. 8)
$V_{\rm ppxf} \equiv c \ln(1 + z)$, (2)
with $c$ the speed of light. This formula is exact by construction and it is the only one to use to attach a physical meaning to $V_{\rm ppxf}$.
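Equation (2) is straightforward to apply in code. The sketch below (function names are my own, not part of the ppxf API) converts between the velocity returned by ppxf and redshift:

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def ppxf_velocity_to_redshift(v_ppxf):
    """Invert V_ppxf = c*ln(1 + z) to obtain the redshift z."""
    return np.expm1(v_ppxf / C_KMS)

def redshift_to_ppxf_velocity(z):
    """V_ppxf = c*ln(1 + z): the exact relation of equation (2)."""
    return C_KMS * np.log1p(z)
```

Using `np.log1p`/`np.expm1` keeps the conversion accurate for small velocities, where $1 + z$ is close to unity.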

Separating peculiar velocities and cosmological redshift
When observing spectra of distant galaxies from a single aperture, redshift is all one can measure. However, when obtaining spatially-resolved observations of galaxies, e.g. using integral-field spectroscopy (see review by Cappellari 2016), one needs to separate the cosmological redshift $z_{\rm cosm}$, which only contains information on the galaxy distance, from the peculiar velocity $V_{\rm pec}$. The latter is the one which satisfies e.g. Newton's law of gravitation in a reference system that moves with the galaxy barycentre. It is the velocity that has to be used to construct dynamical models of the galaxy. In Cappellari (2017, sec. 2.4) I suggested using the standard way of separating peculiar and cosmological redshift. However, there is a simpler and formally even more accurate way. In fact, the conversion of $V_{\rm ppxf}$ into redshift is unnecessary (see Baldry 2018). One can directly obtain $V_{\rm pec}$ using the velocities returned by ppxf as follows
$V_{\rm pec}(x, y) = V_{\rm ppxf}(x, y) - V_{\rm ppxf}({\rm bary})$. (3)
Here $V_{\rm ppxf}(x, y)$ is the velocity returned by ppxf at the location $(x, y)$ on the sky, $V_{\rm ppxf}({\rm bary})$ is the velocity returned by ppxf for the galaxy (or cluster) barycentre and $V_{\rm pec}(x, y)$ is the peculiar velocity at location $(x, y)$. The latter is the only one with a clear physical meaning: it is the one to use in a dynamical model (e.g. Cappellari 2008), or to estimate the level of rotation in a galaxy (e.g. Emsellem et al. 2011). Importantly, equation (3) is always valid, regardless of whether the spectrum was de-redshifted to the rest frame or not before measuring $V_{\rm ppxf}$. Note that this formula only works because of the way ppxf defines the relation between velocity and redshift in equation (2) and cannot be used with alternative definitions (e.g. $V \equiv cz$).
As an example of a practical application of these formulas, let's assume I am fitting a single spectrum of a high-redshift galaxy for which I have an estimate of the redshift $z'$ (e.g. from photometry). It is generally convenient to de-redshift the spectrum by dividing each observed wavelength $\lambda_{\rm obs}$ by $(1 + z')$ to obtain an estimate of the rest-frame wavelength
$\lambda_{\rm rest} = \lambda_{\rm obs}/(1 + z')$. (4)
I then fit the spectrum with ppxf to obtain $V_{\rm ppxf}$. If the initial guess $z'$ was perfect, I would obtain $V_{\rm ppxf} = 0$, but in general I will measure $V_{\rm ppxf} \neq 0$, with uncertainty $\Delta V_{\rm ppxf}$. An improved estimate of the galaxy redshift $z$ and its uncertainty $\Delta z$ can be obtained using equation (2) and equation (3) as
$1 + z = (1 + z') \exp(V_{\rm ppxf}/c)$, with $\Delta z = (1 + z)\, \Delta V_{\rm ppxf}/c$. (5)
Identical results are obtained without first bringing the spectrum to the rest frame, by setting $z' = 0$ in equation (5). But one should remember to adjust the instrumental resolution as described in sec. 2.4 of Cappellari (2017).
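A minimal sketch of this de-redshifting workflow, with illustrative function names (not the ppxf interface):

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def deredshift(lam_obs, z_guess):
    """Rest-frame wavelength estimate: lam_rest = lam_obs / (1 + z')."""
    return np.asarray(lam_obs) / (1 + z_guess)

def improved_redshift(z_guess, v_ppxf, dv_ppxf):
    """Improved redshift after fitting the de-redshifted spectrum:
    (1 + z) = (1 + z') * exp(V_ppxf / c), with propagated uncertainty."""
    z = (1 + z_guess) * np.exp(v_ppxf / C_KMS) - 1
    dz = (1 + z) * dv_ppxf / C_KMS
    return z, dz
```

A fitted velocity of exactly zero leaves the initial guess unchanged, while a positive residual velocity raises the redshift estimate, as expected.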

UPDATES TO THE ppxf PACKAGE
I gave a detailed overview of the ppxf method in Cappellari (2017, sec. 3). I will not repeat that summary here; instead, I will refer the reader to specific sections of that paper, while trying to keep a consistent notation. However, I have substantially evolved ppxf since then, driven by the needs of my research and by requests from colleagues. I only describe here the ppxf features that have changed. The description corresponds to the current version 8.2 of the public ppxf Python package.

Well-sampled variable-convolution
As discussed in Cappellari (2017, sec. 2.2), when fitting stellar templates to galaxy spectra, one generally needs to first match their resolution to that of the galaxy observations by convolving the templates with a Gaussian whose dispersion $\sigma(\lambda)$ varies with wavelength $\lambda$. In a previous version of ppxf, I implemented this step as a direct summation of the spectrum, weighted by the Gaussian centred on every pixel (function ppxf_util.gaussian_filter1d in ppxf). Given that the Gaussian is typically nonzero only over $n_{\rm gau} \sim 10$ pixels, while the templates have $n_{\rm pix} \gtrsim 1000$ spectral pixels, I limited the summation to the nonzero pixels of the Gaussian kernel. In this way, the computation time of this summation scales as $\propto n_{\rm pix} \times n_{\rm gau}$, which is comparable to the $\propto n_{\rm pix} \ln(n_{\rm pix})$ scaling achievable with Fourier convolution using the Fast Fourier Transform (FFT, Cooley & Tukey 1965).
A limitation of performing a convolution as a summation is that, when the $\sigma$ of the Gaussian becomes comparable to the size of the sampled pixels, the convolution suffers from the same undersampling problems that motivated the use of an analytic Fourier Transform (FT) for the convolution in ppxf, as discussed in Cappellari (2017). For this reason, I have now implemented an alternative procedure, ppxf_util.varsmooth, which performs the variable convolution using the FFT and, like ppxf itself, adopts an analytic Fourier Transform of the kernel to avoid undersampling issues.
A natural idea for using the FFT to convolve a vector of values with a kernel of variable scale is to stretch the coordinate with the inverse of the scale, via interpolation, in such a way that the kernel has a constant scale in the new coordinate. This idea was discussed e.g. in 2014 on StackOverflow and implemented in 2016 by Janez Kos on GitHub in the procedure varconvolve. It was also discussed in Johnson et al. (2021b). The interpolation approach is also central to algorithms for the Non-Uniform FFT (NUFFT, e.g. Greengard & Lee 2004). Here I combine the interpolation approach with the use of an analytic Fourier Transform, as in Cappellari (2017), to produce Algorithm 1. The algorithm can also be used with a non-Gaussian kernel (e.g. Cappellari 2017, eq. 38), as long as only its scale changes with wavelength.
Algorithm 1: varsmooth, the well-sampled variable convolution. Given the coordinates x, the values y, the dispersions $\sigma$ and the output coordinates x_out, the algorithm converts $\sigma$ into pixels, optionally oversamples, and performs the convolution in the stretched coordinate (see the ppxf package for the full listing).

This algorithm works well in practice, but one should be aware of its theoretical limitations. In fact, most interpolation methods can be described as a convolution with a specific kernel (e.g. Getreuer 2011), and one may think it would be better to remove the effect of this extra convolution, as done e.g. in the NUFFT methods. However, this situation is different, as the spectra have noise and one would have to perform the interpolation in a Bayesian framework (e.g. MacKay 1992, 2003). One should also consider that the spectra to fit generally already include additional interpolation and resampling, which would have to be modelled for rigorous results. But all this is unlikely to affect scientific results and, for this reason, is beyond the scope of this paper.
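The coordinate-stretching idea can be illustrated with a short sketch. Unlike the actual varsmooth, which uses an analytic Fourier transform of the kernel, this simplified version applies a standard fixed-width Gaussian smoother on the stretched grid:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def varsmooth_sketch(x, y, sig):
    """Convolve y(x) with a Gaussian whose dispersion sig(x) varies with x.
    The coordinate is stretched by 1/sig so the kernel becomes constant."""
    # Stretched coordinate u(x) = integral of dx/sig(x) (trapezoidal rule):
    # one unit of u corresponds to one local sigma
    u = np.concatenate([[0], np.cumsum(np.diff(x) * 2 / (sig[1:] + sig[:-1]))])
    u_grid = np.linspace(u[0], u[-1], x.size)       # uniform grid in u
    y_stretch = np.interp(u_grid, u, y)             # resample onto it
    du = u_grid[1] - u_grid[0]
    y_smooth = gaussian_filter1d(y_stretch, 1 / du)  # sigma = 1 in u units
    return np.interp(u, u_grid, y_smooth)           # back to the original grid
```

For a constant sig this reduces to an ordinary fixed-width Gaussian convolution, which provides a simple check of the implementation.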

The problem
When fitting multiple kinematic components with ppxf, both for the stellar and gas emission components, it is often useful to be able to set constraints on some parameters as a function of other parameters. For example, when looking for spectra containing both narrow and broad gas emission lines to study Active Galactic Nuclei (AGN, e.g. Oh et al. 2015; Fu et al. 2023), to avoid the degeneracy of fitting two similar lines one may want to constrain the dispersion of the broad emission component, if present, to be significantly larger than that of the narrow one, $\sigma_{\rm broad} > \sigma_{\rm narrow} + \Delta\sigma$, or as a fractional difference, $\sigma_{\rm broad} > f \times \sigma_{\rm narrow}$ with $f > 1$. Or one may want to constrain the velocity of a possible broad emission component to differ by less than a certain value from that of the narrow one, $|V_{\rm broad} - V_{\rm narrow}| < \Delta V$. Efficiently setting this kind of constraint requires solving a constrained nonlinear optimization problem.
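Constraints of this kind are linear in the kinematic parameters and can therefore be written in matrix form as A·x ≤ b. A small sketch (the parameter ordering and thresholds here are illustrative, not ppxf's actual interface):

```python
import numpy as np

# Parameter vector x = [V_narrow, sig_narrow, V_broad, sig_broad] in km/s
dsig, dv = 300.0, 500.0  # illustrative thresholds

# Each inequality becomes one row of A_ineq @ x <= b_ineq:
A_ineq = np.array([[0, 1, 0, -1],    # sig_narrow - sig_broad <= -dsig
                   [-1, 0, 1, 0],    # V_broad - V_narrow    <=  dv
                   [1, 0, -1, 0]])   # V_narrow - V_broad    <=  dv
b_ineq = np.array([-dsig, dv, dv])

x_ok = np.array([100.0, 150.0, 120.0, 800.0])   # satisfies all constraints
x_bad = np.array([100.0, 800.0, 120.0, 150.0])  # "broad" narrower than "narrow"
```

Checking `A_ineq @ x <= b_ineq` element-wise tells the optimizer whether a trial point is feasible.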
For maximum computational efficiency and accuracy, one should exploit the special structure of the problem that ppxf has to solve (Cappellari 2017, sec. 3.4). In particular, the function to minimize $f(\mathbf{x})$ is a sum of squares and the typical constraints are linear. The problem to solve can be expressed as
minimize $f(\mathbf{x}) = \frac{1}{2} \|\mathbf{r}(\mathbf{x})\|^2$, subject to linear equality and inequality constraints on $\mathbf{x}$, (6)
where $\mathbf{r}(\mathbf{x})$ are the residuals from the fit, and $\mathbf{x}$ are the nonlinear parameters, like the kinematics of the different components. I have searched extensively for specialized software or an algorithm that I could easily use in ppxf to efficiently solve this specific problem, but did not find any. For this reason, I developed my own.
One of the most effective ways of solving nonlinear problems with general constraints is the sequential quadratic programming (SQP) method, where at every iteration the algorithm solves a constrained quadratic problem that approximates the function at the current location (e.g. Nocedal & Wright 2006, chap. 18).
In least-squares problems, one can approximate the function near the current point $\mathbf{x}_k$ as a second-order Taylor series, with $\mathbf{p} = \mathbf{x} - \mathbf{x}_k$, as follows (e.g. Nocedal & Wright 2006, sec. 10.2)
$f(\mathbf{x}) \approx \frac{1}{2} \|\mathbf{J}_k \cdot \mathbf{p} + \mathbf{r}_k\|^2$ (7a)
$= \frac{1}{2} \|\mathbf{r}_k\|^2 + \mathbf{p} \cdot (\mathbf{J}_k^T \cdot \mathbf{r}_k) + \frac{1}{2}\, \mathbf{p} \cdot (\mathbf{J}_k^T \cdot \mathbf{J}_k) \cdot \mathbf{p}$, (7b)
where $\mathbf{J}_k$ is the Jacobian, which can be computed by finite differences, $\nabla f(\mathbf{x}_k) = \mathbf{J}_k^T \cdot \mathbf{r}_k$ is the gradient and $\nabla^2 f(\mathbf{x}_k) \approx \mathbf{J}_k^T \cdot \mathbf{J}_k$ is the quasi-Newton approximation of the Hessian matrix, whose full form is the following, of which I ignore the second term (e.g. Nocedal & Wright 2006, eq. 10.5)
$\nabla^2 f(\mathbf{x}) = \mathbf{J}^T \cdot \mathbf{J} + \sum_j r_j \nabla^2 r_j$. (8)
It is a characteristic of least-squares problems that one can approximate the Hessian "for free" using $\mathbf{J}$, and this is the reason why it is important to adopt specialized methods for their solution. In the case of ppxf, the Hessian approximation is always especially good, even far from the solution, because the algorithm separates the linear and nonlinear optimizations (Cappellari 2017, sec. 3.3-3.4) and ensures that the residuals are minimized with respect to the linear parameters at every step. This tends to cancel out the second term in the Hessian of equation (8).
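On a toy least-squares problem, a finite-difference Jacobian provides both the gradient and the "free" quasi-Newton Hessian of equation (7b), from which a Gauss-Newton step follows (a generic sketch, not ppxf code):

```python
import numpy as np

T = np.linspace(0, 1, 20)
DATA = np.exp(0.5 * T) - 0.3 * T        # synthetic data with a=0.5, b=-0.3

def residuals(x):
    """Toy model: r(x) = exp(a*t) + b*t - data, with x = (a, b)."""
    return np.exp(x[0] * T) + x[1] * T - DATA

def jacobian(x, h=1e-7):
    """Forward finite-difference Jacobian of the residuals."""
    r0 = residuals(x)
    J = np.empty((r0.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (residuals(xp) - r0) / h
    return J

x = np.array([0.4, -0.2])
J, r = jacobian(x), residuals(x)
grad = J.T @ r                       # gradient of f = 0.5*||r||^2
hess_gn = J.T @ J                    # quasi-Newton Hessian, no 2nd derivatives
p = np.linalg.solve(hess_gn, -grad)  # one Gauss-Newton step
```

A single such step already reduces the sum of squares, without ever evaluating second derivatives of the residuals.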
In essence, a specialized SQP algorithm to solve equation (6) would consist of solving a sequence of quadratic sub-problems of the form
minimize over $\mathbf{p}$ $\frac{1}{2} \|\mathbf{J}_k \cdot \mathbf{p} + \mathbf{r}_k\|^2$, subject to the linear constraints of equation (6) applied to $\mathbf{x}_k + \mathbf{p}$. (9)
Algorithms for this type of problem are discussed e.g. by Fletcher (1987, sec. 11.2) or Gill et al. (1981, chapter 5). I implemented those ideas into a trust-region algorithm (Nocedal & Wright 2006, chap. 4), but I discovered that the approach is not sufficiently robust for my rather special situation. The difficulty of the optimization problem I have to solve consists of the fact that the Jacobian can sometimes be completely degenerate. A common situation where this happens is when ppxf is fitting the kinematics of emission lines or of multiple stellar kinematic components. In this situation, the weights associated with a given emission line or stellar component may become exactly zero, because the line or component is simply not present in a certain galaxy spectrum. In other cases, the signal-to-noise ratio $S/N$ may be too low to give any constraints on some parameters. In these cases, the gradient (column of $\mathbf{J}$) with respect to the parameters describing the kinematics of the missing component will be zero.
I tried using the Singular Value Decomposition (SVD, e.g. Press et al. 2007, sec. 15.4.2) during the iterations required to solve the quadratic sub-problem. But after extensive testing, e.g. during the development of the MaNGA Data Analysis Pipeline (Westfall et al. 2019), I was unable to find a robust criterion to decide which singular values must be edited and to determine the effective rank of my Jacobian.

The solution
One of the most widely used techniques to solve unconstrained least-squares optimization problems is the Levenberg-Marquardt (LM) method (Levenberg 1944; Marquardt 1963) and its state-of-the-art implementation in minpack (Moré 1978; Moré et al. 1980). The success of the LM method comes from the fact that it adaptively penalizes the matrix $\mathbf{J}$ defining the quadratic sub-problem, in such a way that it always prevents degeneracy. This also makes the LM method a robust trust-region algorithm, as discussed by Fletcher (1987, sec. 5.2) or Nocedal & Wright (2006, sec. 10.3). Press et al. (2007, sec. 15.5.2) provide a less technical description.
In a previous version (<6.5) of ppxf, I used the LM algorithm as modified in the mpfit implementation (Markwardt 2009), which included a very useful but non-optimal treatment of box constraints (i.e. upper/lower limits on the parameters). Comparable box-constrained least-squares methods exist in Scipy (Virtanen et al. 2020), as implemented in the trust-region reflective algorithm (method='trf'; Branch et al. 1999) and the dogleg algorithm (method='dogbox'; Voglis & Lagaris 2004; Nocedal & Wright 2006) in scipy.optimize.least_squares. These methods are also available in ppxf but cannot support linear constraints. After extensive experimentation with real-world cases, I implemented a novel hybrid between the SQP and LM methods, specialized for nonlinear least-squares with linear constraints (both equality and inequality). The algorithm consists of a trust-region quasi-Newton SQP method, with linear constraints, in which the matrix defining the quadratic sub-problem is penalized to avoid the risk of degeneracy, as in the LM method. I achieve this by replacing the quadratic sub-problem of equation (9) with the following (see Nocedal & Wright 2006, eq. 10.41)
minimize over $\mathbf{p}$ $\frac{1}{2} \left( \|\mathbf{J}_k \cdot \mathbf{p} + \mathbf{r}_k\|^2 + \lambda_k \|\mathbf{D}_k \cdot \mathbf{p}\|^2 \right)$, subject to the same linear constraints as equation (9), (10)
where $\mathbf{D}_k$ is a diagonal matrix, which makes the problem scale invariant. By default, the diagonal elements of $\mathbf{D}_k$ are initialized with the norms $\|\cdot\|$ of the columns of $\mathbf{J}$ and are updated during the iterations as suggested by Moré (1978, eq. 6.3). Close to the solution, when the quadratic model provides a good approximation of $f(\mathbf{x})$, $\lambda_k$ becomes small and equation (10) approximates equation (9). In this limit, the method behaves as an SQP method. When the quadratic approximation is inaccurate, $\lambda_k$ becomes large and the method behaves as a trust-region LM method.
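Ignoring the linear constraints, the penalized sub-problem of equation (10) is equivalent to an ordinary least-squares problem on an augmented matrix, which is how the sketch below solves it (an illustration of the penalty idea only, not the capfit implementation):

```python
import numpy as np

def penalized_step(J, r, lam, d):
    """Solve min_p ||J @ p + r||^2 + lam * ||diag(d) @ p||^2 by stacking
    sqrt(lam)*diag(d) below J, as in the Levenberg-Marquardt method."""
    n = J.shape[1]
    A = np.vstack([J, np.sqrt(lam) * np.diag(d)])
    b = np.concatenate([-r, np.zeros(n)])
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# A degenerate Jacobian (second column identically zero, e.g. a missing
# emission line): the penalty keeps the sub-problem well-posed and simply
# leaves the unconstrained parameter at zero.
J = np.array([[1.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
r = np.array([1.0, 2.0, 1.0])
p = penalized_step(J, r, lam=1e-3, d=np.array([1.0, 1.0]))
```

The zero column, which would make an unpenalized normal-equations solve singular, causes no difficulty here.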
My resulting algorithm is rather simple because I did not worry about the efficiency of the solution of the quadratic programming sub-problem. The latter generally dominates the complexity of other state-of-the-art algorithms, which devise approximate matrix updates to save computation time (e.g. see the description of LM in Moré 1978). I also did not try to deal with large-scale problems and sparse matrices, which also increase complexity and require specialized methods (e.g. Gill et al. 2005). Instead, I focused on the fitting of rather small nonlinear problems ($\lesssim 50$ variables) in which computing the function $f(\mathbf{x})$ involves creating a complex model, as in ppxf. In this rather common situation, the time to solve the small quadratic programming sub-problem becomes negligible compared to that of evaluating $f(\mathbf{x})$. The solution is given in Algorithm 2, which I implemented in the capfit procedure in the ppxf package.
Except for the fact that the quadratic sub-problem is penalized and linearly-constrained, the algorithm uses the standard trust-region framework (e.g. Nocedal & Wright 2006, algorithm 4.1, or Fletcher 1987). For the convergence criteria, I follow the description in Moré et al. (1980, sec. 2.3).

Solving the quadratic sub-problem
I implemented two procedures to solve the quadratic programming sub-problem of equation (10). In both cases, I avoid explicitly constructing the Hessian $\mathbf{J}^T \cdot \mathbf{J}$, as this would degrade the conditioning of the system. The first procedure (lsq_box) is specialized for the common situation where only box constraints are present. It solves min $\|\mathbf{A} \cdot \mathbf{x} - \mathbf{b}\|^2$ with ${\rm lb} \leq \mathbf{x} \leq {\rm ub}$. For this, I use the active-set method adopted in the non-negative least-squares (nnls) method (Lawson & Hanson 1995, algorithm 23.10), which was generalized to box constraints with the Bounded-Variables Least-Squares (bvls) procedure in the same book and in Stark & Parker (1995). My implementation closely follows Lawson & Hanson (1995), except for the important facts that (i) I allow for a starting guess and (ii) I include an initialization step (Algorithm 3) which generalizes to the box-constrained case the initialization loop in the fastnnls code by Andersson & Bro (2000).

Algorithm 2: CapFit, linearly-constrained nonlinear least-squares. At every iteration, the step p is obtained as the solution of equation (10); if the achieved reduction is sufficient, the step is accepted, the Jacobian is recomputed and the scaling matrix D is adjusted (see the ppxf package for the full listing).

In realistic ppxf problems, using my new lsq_box with hundreds of spectral templates produced a typical speedup of a factor of four compared to scipy.optimize.nnls, which is a wrapper to the Lawson & Hanson (1995) Fortran code. The procedure scipy.optimize.lsq_linear currently also does not support passing a starting guess.
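For comparison, SciPy exposes a BVLS-type active-set solver through scipy.optimize.lsq_linear, which can be used to check results on small box-constrained problems (this is the publicly available reference, not ppxf's lsq_box):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))           # well-conditioned tall matrix
x_true = np.array([0.0, 0.5, 1.0, 0.2, 0.0])
b = A @ x_true                          # noiseless data, feasible solution

# Solve min ||A @ x - b||^2 subject to 0 <= x <= 1 with the BVLS method
res = lsq_linear(A, b, bounds=(0, 1), method='bvls')
```

With noiseless data and a feasible true solution, the active-set iterations recover the generating coefficients exactly.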
Algorithm 3: initialization of lsq_box, the box-constrained least-squares solver, given x, A, b and the lower lb and upper ub bounds (see the ppxf package for the full listing).

The second procedure (lsq_lin) solves general linearly-constrained quadratic programming problems like equation (9) using a modified version of the standard active-set technique described by Nocedal & Wright (2006, algorithm 16.3). The approach consists of solving a sequence of equality-constrained linear least-squares problems, for which I follow Golub & Van Loan (2013, algorithm 6.2.2), allowing for degenerate matrices using SVD. If the initial guess is unfeasible, I find a feasible point using the linear programming procedure scipy.optimize.linprog with method='highs' (Huangfu & Hall 2017). Alternatively, the quadratic sub-problem can be solved within ppxf by minimizing the quadratic function in the form of equation (7b) using a general quadratic programming solver. In the current implementation, I use the interior-point solver solvers.coneqp from the cvxopt package by Andersen et al. (2011), which is much faster for large-scale problems. I have extensively tested the lsq_box and lsq_lin procedures described in this section by constructing batteries of tests using both exact analytic solutions and comparisons against cvxopt.
The capfit procedure has been the default nonlinear optimization algorithm in ppxf for about three years and has also been used as a general optimizer independently of ppxf. During this time, ppxf was used to fit millions of spectra from a variety of surveys (e.g. MaNGA, Bundy et al. 2015, and SAMI, Bryant et al. 2015). This allowed me to fix its handling of rare failures in degenerate situations, which are difficult to encounter in idealized examples. When applied to unconstrained nonlinear least-squares problems, capfit produces essentially the same iterates as the state-of-the-art LM implementations in minpack or mpfit, as expected. When used for box-constrained nonlinear least-squares problems, capfit is generally at least as efficient as the best algorithms in scipy.optimize.least_squares. However, capfit allows for the extra flexibility of using linear constraints, as well as for keeping variables tied to others or fixed.

Setting linear constraints on the template weights
In the previous section, I discussed the use of linear constraints on the kinematic parameters during the ppxf fit. Here I note that linear constraints can also be used during the linear-fitting procedure of Cappellari (2017, sec. 3.3). These were used, for example, to constrain the sum of the weights (e.g. the luminosity) of different template groups to constitute a certain fraction of the total light, e.g. to perform kinematic bulge/disc decompositions (e.g. Tabor et al. 2017, 2019; Oh et al. 2020) or to study the stellar population of different kinematic components (e.g. Shetty et al. 2020b).

Multi-dimensional regularization
Considering, without loss of generality, that the stellar population depends only on age, the fundamental equation used to model the spectrum of a composite stellar population is (e.g. Cid Fernandes et al. 2005; Ocvirk et al. 2006; Conroy 2013, sec. 2)
$L_{\rm mod}(\lambda) = \int_0^{t_{\rm univ}} {\rm SFR}(t)\, L_{\rm SSP}(\lambda, t, Z)\, {\rm d}t$, (11)
where ${\rm SFR}(t)$ is the star formation rate, $L_{\rm SSP}$ is a Single Stellar Population spectrum per unit mass, with age $t$ and metallicity $Z$, while $t_{\rm univ}$ is the age of the Universe at the redshift of the galaxy. This expression is generalized in ppxf to study the distribution of more parameters, like e.g. metallicity, $\alpha$ enhancement or IMF, in addition to the SFR. I pointed out in Cappellari (2017, sec. 3.5) that equation (11) is an inhomogeneous Fredholm equation of the first kind, with kernel $L_{\rm SSP}$, and the recovery of ${\rm SFR}(t)$ from the observed $L_{\rm mod}(\lambda)$ is a textbook example of an ill-conditioned inverse problem (e.g. Hansen 1998; Kabanikhin 2011; Press et al. 2007, sec. 19.0). This means that one cannot find a unique solution from real data without further assumptions.
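In practice, equation (11) is discretized and the model spectrum becomes a nonnegative linear combination of SSP template spectra. A toy version of this inversion, using random stand-in templates (not real SSP models) and SciPy's NNLS solver:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)
n_pix, n_ssp = 500, 20
templates = rng.random((n_pix, n_ssp))   # columns play the role of SSP spectra

w_true = np.zeros(n_ssp)
w_true[[3, 4, 5]] = [0.2, 0.6, 0.2]      # a single "burst" of star formation
spectrum = templates @ w_true            # noiseless composite spectrum

# Recover the weights with nonnegative least squares
w_fit, rnorm = nnls(templates, spectrum)
```

With noiseless data and well-conditioned templates the weights are recovered exactly; with realistic noise and near-degenerate SSP spectra the solution fragments into sharp spikes, which is precisely the ill-conditioning at issue.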
In Cappellari (2017, sec. 3.5) I discussed the implementation of linear regularization (e.g. Press et al. 2007, sec. 19.5) in ppxf to address this issue and study the stellar population of galaxies. I gave a formula for the second-order one-dimensional regularization. Given that there are alternative ways of generalizing a measure of smoothness of a function to dimensions larger than one (e.g. Brady & Horn 1983), I clarify here that the second-order regularization (reg_ord=2) in ppxf minimizes the total squared Laplacian $(\Delta w)^2$ of the weights distribution, while the first-order one (reg_ord=1) minimizes the total squared gradient $(\nabla w)^2$. Both operators are implemented by standard finite differences. Press et al. (2007, sec. 19.4.1) point out that, under some sensible conditions, the regularized solution has a simple Bayesian interpretation: it represents the most likely solution for the weights, given an adjustable prior on the amplitude of their fluctuations. However, the meaning of the fundamental degeneracy of the stellar population inversion, as well as of regularization, is best illustrated with an example.
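The two smoothness measures can be written directly with finite differences. The sketch below evaluates the quantities penalized by reg_ord=2 and reg_ord=1 on a 2D weights image (an illustration of the definitions, not ppxf's internal implementation):

```python
import numpy as np

def laplacian_penalty(w):
    """Total squared Laplacian of a 2D weights image (age x metallicity),
    using the 5-point finite-difference stencil on interior pixels."""
    lap = (w[:-2, 1:-1] + w[2:, 1:-1] + w[1:-1, :-2] + w[1:-1, 2:]
           - 4 * w[1:-1, 1:-1])
    return np.sum(lap**2)

def gradient_penalty(w):
    """Total squared gradient, via forward differences along both axes."""
    return np.sum(np.diff(w, axis=0)**2) + np.sum(np.diff(w, axis=1)**2)

# A linear ramp has zero Laplacian but a nonzero gradient, so the two
# orders of regularization favour different kinds of smooth solutions.
ramp = np.add.outer(np.arange(6.0), np.arange(5.0))
```

The ramp example makes the difference between the two orders concrete: second-order regularization does not penalize linear trends in the weights, while first-order regularization does.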
I used a grid of 25 logarithmically-spaced ages and 6 metallicities $[M/H]$ from the SPS models by Vazdekis et al. (2015) to construct a synthetic spectrum in which the distribution of the light contributed by each spectrum in the $V$ band follows a bivariate Gaussian distribution $\mathcal{N}({\rm age}, [M/H])$ with mean age $t_0 = 1$ Gyr, mean metallicity $[M/H]_0 = -0.3$ and a dispersion of 0.25 dex in both age and metallicity. I logarithmically sampled the spectrum at a velocity scale $\Delta V = c\, \Delta\ln\lambda = 50$ km s$^{-1}$ per spectral pixel.
I show the resulting spectrum in the top panel of Fig. 1 and the input light-weights distribution in the second panel. The third panel shows a single ppxf fit, which is characterized by discrete sharp peaks, as expected from the ill-conditioning of the inversion problem. The fourth panel shows the result of averaging the weights obtained by fitting with ppxf 100 Monte Carlo realizations, obtained by adding Gaussian noise to the same noiseless synthetic spectrum. Here, the average converges towards the true input distribution. Finally, in the bottom panel, I show the result of performing a single regularized ppxf fit (with reg_ord=2 and a typical regul=30). Here the distribution looks comparable to that of the average of multiple realizations. Regularization has its limitations: by construction, it is a trade-off between agreement with the data and smoothness (e.g. Press et al. 2007, fig. 19.4.1), which may introduce biases. In general, when one is obtaining results by averaging many spectra, it may be better not to use regularization, or to use only a minimal amount, to reduce possible biases, while allowing the differences in the noise between spectra to act as Monte Carlo realizations. But regularization is very useful when interpreting individual spectral fits, and even to reduce noise in the SPS models themselves, which may introduce spurious features in the solutions (as I find later).
One can use bootstrapping of the residuals, while repeating the ppxf fits multiple times, to obtain averages as well as uncertainties in the distribution of the weights as done e.g. by Kacharov et al. (2018, figs. 8-13). In this case, it is important to perform the initial ppxf fit, from which the residuals are extracted, using some regularization, to obtain a less noisy and more representative best-fitting spectrum. I achieved good results perturbing the residuals using the easy-to-use wild bootstrap method (Davidson & Flachaire 2008).
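The wild bootstrap perturbation itself is one line: each residual is multiplied by a random sign (Rademacher weights), which preserves the local noise amplitude even when the noise is heteroskedastic. The arrays below are stand-ins for a real best fit and its residuals:

```python
import numpy as np

rng = np.random.default_rng(0)
bestfit = np.ones(100)               # stand-in for the regularized best-fitting spectrum
residuals = rng.normal(0, 0.1, 100)  # stand-in for (data - bestfit)

# Wild bootstrap (Davidson & Flachaire 2008): flip the sign of each
# residual with probability 1/2, keeping its absolute value unchanged
signs = rng.choice([-1.0, 1.0], size=residuals.size)
perturbed = bestfit + residuals*signs
# Each perturbed spectrum would then be refitted with ppxf, and the weights
# averaged over many such realizations to estimate values and uncertainties.
```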

Global nonlinear fitting
In the most common situations, e.g. when fitting a single stellar kinematic component with emission lines, the spectral fitting problem has a single global minimum and the local optimization method of Section 3.2 is guaranteed to efficiently converge to it. However, in more complex situations, like when fitting multiple stellar or gas kinematic components, the fitting problem may present multiple minima and a local optimizer is not guaranteed to converge to the global minimum.
The standard way of dealing with multiple minima in ppxf is to perform the optimization of the variables in which the $\chi^2$ function is multi-modal outside of ppxf, while calling ppxf with those variables fixed, from inside a wrapper function. For example, when studying multiple kinematic components one may sample a grid of velocities and call ppxf with fixed velocities at every location (e.g. Mitzkus et al. 2017; Tabor et al. 2017; Bevacqua et al. 2022). If one is interested in the full posterior of certain parameters, and computation time is not an issue, one may call ppxf with those parameters fixed from within a Bayesian method like MultiNest (Feroz et al. 2009), emcee (Foreman-Mackey et al. 2013), AdaMet (Cappellari et al. 2013a) or dynesty (Speagle 2020), assuming the contribution of the non-fixed parameters to the posterior can be neglected.
In the current version of ppxf one can also perform the global optimization within ppxf. This is currently implemented using the function scipy.optimize.differential evolution, which uses the Differential Evolution algorithm by Storn & Price (1997). The Scipy function allows for linear constraints using the method by Lampinen (2002). To save computation time, by default, I do not run the global optimization step until convergence, but I use it as starting point for the usual CapFit procedure.
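A minimal example of the same Scipy machinery, applied to a toy bimodal $\chi^2$-like function of two velocities with a linear constraint enforcing an ordering between them (the objective and constraint values are purely illustrative):

```python
import numpy as np
from scipy.optimize import differential_evolution, LinearConstraint

# Toy bimodal objective standing in for a multi-modal chi2 in (V1, V2):
# it has minima at V1 = 100 and V1 = -150
def chi2(p):
    v1, v2 = p
    return ((v1 - 100)**2*(v1 + 150)**2)/1e6 + (v2 - 50)**2/1e2

# Linear constraint V1 - V2 >= 100 (e.g. forcing an ordering of components),
# handled by differential_evolution with the method of Lampinen (2002)
con = LinearConstraint([[1, -1]], lb=100, ub=np.inf)
res = differential_evolution(chi2, bounds=[(-500, 500), (-500, 500)],
                             constraints=(con,), seed=1)
```

In ppxf the analogous global step is, by default, not run to full convergence but only used to produce a good starting point for the usual CapFit local optimization.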
An example of a situation where using both the global optimization and the linear constraints options can be useful, and the corresponding ppxf fit, is shown in Fig. 2. The plots show the central spectrum of the active galaxy NGC 1386, extracted from the MUSE (Bacon et al. 2010) integral-field spectroscopic observations presented in Venturi et al. (2021). The emission line spectrum clearly requires at least three distinct kinematic components (see also Lena et al. 2015). The definition of the three kinematic components may appear ill-defined, due to the extensive blending of the lines. However, a meaningful decomposition can be obtained with some simple assumptions. Here I required the kinematics $(V, \sigma)$ of all five emission lines to be the same within each of the three kinematic components. I additionally required the dispersion of the broad component to be at least 200 km s$^{-1}$ larger than that of either of the two narrow components, namely $\sigma_{\rm broad} > \sigma_{\rm narrow,1} + 200$ km s$^{-1}$ and $\sigma_{\rm broad} > \sigma_{\rm narrow,2} + 200$ km s$^{-1}$. These are linear constraints that I enforced using the constr_kinem keyword in ppxf. I also fixed the ratios of the [OIII] and [NII] doublets to 1/3. Linear constraints in ppxf were used extensively to produce the recent catalogue of broad and multiple gas emission line components for the full MaNGA galaxy survey (Fu et al. 2023).

Figure 2. The orange line is the ppxf total best fit, the red line is the best-fitting stellar spectrum alone, while the magenta is for the gas emission alone, with individual components shown in blue. The fit residuals, arbitrarily offset, are shown with green diamonds. I only plot the region with the key emission lines, but I fitted the full optical spectrum, which is needed to constrain the underlying stellar contribution.
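Inequalities of this kind can be written in the matrix form $A_{\rm ineq}\, p \le b_{\rm ineq}$ accepted by the constr_kinem keyword. The parameter ordering and the units below are illustrative assumptions (in actual ppxf usage the kinematic parameters may need to be expressed in pixel/velscale units; see the constr_kinem documentation):

```python
import numpy as np

# Parameters of three gas kinematic components, in the illustrative order
# p = [V1, s1, V2, s2, V3, s3], with component 3 the broad one.
# In ppxf the constraint would be passed as
# constr_kinem = {"A_ineq": A_ineq, "b_ineq": b_ineq}.
A_ineq = np.array([[0, 1, 0, 0, 0, -1],    # s1 - s3 <= -200, i.e. s3 >= s1 + 200
                   [0, 0, 0, 1, 0, -1]])   # s2 - s3 <= -200, i.e. s3 >= s2 + 200
b_ineq = np.array([-200.0, -200.0])

p_ok  = np.array([0., 80., 10., 120., -50., 400.])   # broad sigma 400 km/s: allowed
p_bad = np.array([0., 80., 10., 120., -50., 250.])   # only 130 above narrow 2: rejected
```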

Fitting spectra and photometry
Adding photometry to a full-spectrum fitting method is similar to adding a few extra pixels to the fit, which represent the fluxes measured in some observed photometric bands. The main differences are (i) that the photometric fluxes are independent of the line-of-sight velocity distribution (LOSVD) $\mathcal{L}(v)$, unlike the spectroscopic ones, and (ii) that the photometry of a single galaxy is usually not enough to determine both the calibration errors and the template weights. This means that one cannot use polynomials as done for the spectroscopy.
I define a function describing an individual template spectrum $T_n$ (either stars or gas), convolved (*) with the LOSVD, which is allowed to be different for each of the templates, $\tilde{T}_n = T_n * \mathcal{L}_n$. With this notation, the model for the galaxy spectrum becomes $G_{\rm mod} = \sum_n w_n\, A_n(\lambda)\, \tilde{T}_n\, \mathcal{P}^{\rm mul} + \mathcal{P}^{\rm add} + \sum_k h_k S_k$ (13), where the $\mathcal{P}^{\rm mul}$ and $\mathcal{P}^{\rm add}$ are multiplicative and additive polynomials respectively (of Legendre or trigonometric type) and the $S_k$ are optional spectra of the sky. This model is similar to the one in Cappellari (2017, eq. 11), except that here each template spectrum can have a different attenuation function $A_n(\lambda)$. Moreover, both the attenuation and multiplicative polynomials can be used simultaneously, rather than being alternatives. This is especially useful when including photometry in the fit.
The reason for this modification is that, when we have photometric bands that span a large wavelength range, we can infer the attenuation from the photometry itself, which is not modelled with polynomials. At the same time, we can use multiplicative polynomials to correct small errors in the spectral flux calibration. However, if we do not have photometry, we cannot tell apart reddening and multiplicative polynomials, because reddening is a special case of polynomials.
The model for the photometric measurements, in linear units which allow for negative fluxes, not magnitudes, is given by the expression $F_{{\rm mod},b} = \sum_n w_n \langle A_n T_n \rangle_b$ (14), where $\langle\cdot\rangle_b$ represents the attenuated mean flux of the $n$-th template in the $b$-th photometric band with effective wavelength $\lambda^{\rm eff}_b$. Unlike the spectroscopic model of equation (13), the photometric model of equation (14) does not include polynomials or the sky spectrum.
In the common case of photon-counting or energy-integrating detectors, and assuming $f_\lambda$ fluxes (e.g. in units of erg cm$^{-2}$ s$^{-1}$ Å$^{-1}$), the mean flux is given by (e.g. Bessell & Murphy 2012, eq. A11) $\langle f_\lambda \rangle = \int f_\lambda(\lambda) S(\lambda) \lambda\, {\rm d}\lambda \big/ \int S(\lambda) \lambda\, {\rm d}\lambda$ (15), where $S(\lambda)$ is the system photon response function and the integral extends over the region where $S$ is nonzero. The definition of mean flux in equation (15) is the one used in the standard definition of magnitudes in the ultraviolet (e.g. for the GALEX spacecraft; Martin et al. 2005), in the optical (e.g. for the SDSS optical survey; York et al. 2000), and in the near-infrared (e.g. for the 2MASS survey; Skrutskie et al. 2006). We can exactly convert mean fluxes into $f_\nu$ units (for example, erg cm$^{-2}$ s$^{-1}$ Hz$^{-1}$) using $\langle f_\nu \rangle = \langle f_\lambda \rangle\, \lambda^2_{\rm pivot}/c$, where $c$ is the speed of light and $\lambda_{\rm pivot}$ the source-independent pivot wavelength defined as (e.g. Koornneef et al. 1986; Bessell & Murphy 2012) $\lambda^2_{\rm pivot} = \int S(\lambda) \lambda\, {\rm d}\lambda \big/ \int S(\lambda)\, {\rm d}\lambda/\lambda$. One could use different definitions of the observed mean fluxes by simply replacing equation (15).
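These definitions can be sketched numerically with a toy top-hat filter and a flat $f_\lambda$ spectrum (all values below are illustrative; a real calculation would use the tabulated system response of each band):

```python
import numpy as np

# Toy top-hat filter and a flat f_lambda spectrum (illustrative values)
lam = np.linspace(4000., 6000., 2001)                  # wavelength in Angstrom
dlam = lam[1] - lam[0]
integ = lambda y: y.sum()*dlam                         # quadrature on the uniform grid
S = np.where((lam > 4500) & (lam < 5500), 1.0, 0.0)    # photon response S(lam)
f_lam = np.full_like(lam, 2.0e-16)                     # erg/cm^2/s/A

# Photon-weighted mean flux of equation (15) (Bessell & Murphy 2012, eq. A11)
f_mean = integ(f_lam*S*lam)/integ(S*lam)

# Source-independent pivot wavelength and exact conversion to f_nu units
lam_piv = np.sqrt(integ(S*lam)/integ(S/lam))
c = 2.99792458e18                # speed of light in Angstrom/s
f_nu = f_mean*lam_piv**2/c       # erg/cm^2/s/Hz
```

For a flat $f_\lambda$ spectrum the mean flux equals the input flux exactly, which provides a quick sanity check of the quadrature.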
In the common situation in which the covariances between the spectroscopic or photometric measurements are not known, or ignored, the residuals r from the fit are as in Cappellari (2017, eq. 22), with the difference that the vector of residuals now includes both the spectroscopic and the photometric values. In other words, the total log-likelihood of a fit now becomes the sum of the spectroscopic and photometric ones, $\ln\mathcal{L}_{\rm total} = \ln\mathcal{L}_{\rm spec} + \ln\mathcal{L}_{\rm phot}$. Both the linear and nonlinear fit, the regularization and the possible treatment of covariances proceed unchanged as already described in Cappellari (2017, sec. 3.3-3.5). The only difference is one extra row in the matrix A, defined in Cappellari (2017, sec. 3.3), for every photometric measurement.

According to the mean value theorem for integration, for every $b$-th band and $n$-th template, there exists a wavelength $\lambda_{n,b}$ which satisfies exactly $\langle A_n T_n \rangle_b = A_n(\lambda_{n,b}) \langle T_n \rangle_b$ (20). When one has a good estimate of the galaxy redshift (e.g. from a previous photometric redshift), or when performing a grid search for the best-fitting redshift with ppxf, the redshift of the spectrum does not change much during each ppxf fit. This makes the quantities $\langle T_n \rangle_b$ essentially independent of $\mathcal{L}(v)$. If I rewrite equation (14) as $F_{{\rm mod},b} = \sum_n w_n\, A_n(\lambda_{n,b}) \langle T_n \rangle_b$ (21), I can precompute the $\langle T_n \rangle_b$ and $\lambda_{n,b}$ for all templates before the fit, using the initial redshift estimate. With this approach, adding photometry to a fit takes almost no extra time compared to fitting only the spectrum. I found that the flux-weighted effective wavelength (e.g. Bessell & Murphy 2012) well approximates the wavelength $\lambda_{n,b}$ defined by equation (20), for a range of attenuation parameters. In Fig. 3 I illustrate how accurately equation (20) is verified when approximating $\lambda_{n,b} \approx \lambda^{\rm eff}_{n,b}$. I used all the 28 photometric bands described in Section 4.2 and all 387 fsps SPS templates introduced in Section 4.3, which span extreme ranges of age and metallicity.
Figure 3. Comparison between the variation in the mean flux due to dust attenuation $\langle A_n(\lambda) T_n(\lambda) \rangle_b / \langle T_n(\lambda) \rangle_b$ (red circles) and the value of the attenuation curve at the rest-frame effective wavelength $A(\lambda^{\rm eff}_{n,b})$ (blue solid line), plotted versus the rest-frame effective wavelength $\lambda^{\rm eff}_{n,b}$ of each $b$-th filter and $n$-th template combination. This is computed for 28 filters at redshift $z = 0.8$ and a set of templates spanning extreme ages and metallicities as described in the text. The $\lambda^{\rm eff}_{n,b}$ relative variation and the corresponding variation of the attenuation at that wavelength for a given filter are only significant in the far-ultraviolet GALEX filters (indicated by the blue arrows). Even in that case, $A(\lambda^{\rm eff}_{n,b})$ accurately predicts the true integrated attenuation.

I adopt the attenuation function of equation (23) at the median redshift $z = 0.8$ of the LEGA-C sample. For every template-band combination, I compare the rigorous variation in mean flux due to the attenuation of each $n$-th template in the $b$-th band, $\langle A_n T_n \rangle_b / \langle T_n \rangle_b$, with the value of the attenuation curve at the effective wavelength, $A(\lambda^{\rm eff}_{n,b})$. The two quantities must agree when the photometric band is narrow or the attenuation is approximately constant within the band, as is generally the case at optical or near-infrared wavelengths. However, in the far ultraviolet, in the GALEX bands, $\lambda^{\rm eff}_{n,b}$ and the corresponding $A(\lambda^{\rm eff}_{n,b})$ vary significantly for different templates in the same band. Even so, as shown in Fig. 3, the approximation is still much better than our uncertainty on the attenuation curve itself. Moreover, for these large attenuations, generally little flux is detected in the far ultraviolet, making the observed uncertainties very large. For these reasons, in the analysis presented here, I use equation (21) to model the attenuation on the photometry. When higher accuracy is required, the full expression of equation (15) can be used.

Dust attenuation
As shown in equation (13) and equation (14), the new ppxf method allows each template to have a different attenuation curve. This feature can be used to vary the attenuation curve for specific groups of templates, based on the current understanding of dust attenuation in galaxies (see the review by Salim & Narayanan 2020). Three groups of attenuation curves are expected to be useful: (i) for very young stars (with ages $\lesssim 10^7$ yr), which are still embedded in their birth clouds (Charlot & Fall 2000; Granato et al. 2000); (ii) for the entire stellar population (both young and old), due to diffuse dust; and (iii) for the gas emission lines from star-forming regions.
In ppxf one can adopt a generic attenuation function, which can be different for different templates and can have an arbitrary number of parameters.
The parameters can have bounds or can be kept fixed. By default I currently implemented a four-parameter attenuation function in linear units, $A(\lambda) = A(\lambda;\, A_V, \delta, E_b, f_{\rm nodust})$, defined by the Drude bump $D(\lambda) = E_b (\lambda \Delta\lambda)^2 / [(\lambda^2 - \lambda_0^2)^2 + (\lambda \Delta\lambda)^2]$ (23a), the attenuation in magnitudes $A_{\rm mag}(\lambda) = (A_V/R_V) [k'(\lambda) + D(\lambda)] (\lambda/\lambda_V)^\delta$ (23b), and the linear factor $A(\lambda) = f_{\rm nodust} + (1 - f_{\rm nodust})\, 10^{-0.4 A_{\rm mag}(\lambda)}$ (23c). Here equation (23a) is the Lorentzian-like Drude function adopted by Noll et al. (2009) to describe the UV bump around $\lambda_0 = 0.2175$ μm, with width $\Delta\lambda = 0.035$ μm. Equation (23b) is the expression adopted by Kriek & Conroy (2013), which includes the attenuation curve $k'(\lambda)$ and $R_V = 4.05$ from Calzetti et al. (2000, eqs. 4 and 5), and allows for a variable UV slope $\delta$ around the pivot $V$-band wavelength $\lambda_V = 0.55$ μm. Optionally, one can make $E_b$ a function of $\delta$ (Kriek & Conroy 2013, eq. 3). Finally, equation (23c) allows one to specify the fraction $f_{\rm nodust}$ of the stellar population (for the given template) that is unattenuated, as suggested by Lower et al. (2022). The resulting $A(\lambda)$ is the factor to multiply the template at the given wavelength to model the attenuation effect.
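A possible numpy implementation of this attenuation law is sketched below. It follows the published Calzetti et al. (2000), Noll et al. (2009) and Kriek & Conroy (2013) expressions, but it is a stand-alone sketch, not the actual ppxf code:

```python
import numpy as np

def attenuation(lam_um, A_V, delta=0.0, E_b=0.0, f_nodust=0.0):
    """Sketch of a four-parameter attenuation: Calzetti (2000) curve with
    variable UV slope delta (Kriek & Conroy 2013), a Drude UV bump of
    amplitude E_b (Noll et al. 2009) and an unattenuated fraction f_nodust
    (Lower et al. 2022). Returns the linear factor multiplying the template."""
    lam = np.atleast_1d(np.asarray(lam_um, dtype=float))   # wavelength in micron
    R_V, lam_V = 4.05, 0.55
    # Calzetti et al. (2000) k'(lambda), eqs. (4)-(5)
    k = np.where(lam < 0.63,
                 2.659*(-2.156 + 1.509/lam - 0.198/lam**2 + 0.011/lam**3) + R_V,
                 2.659*(-1.857 + 1.040/lam) + R_V)
    # Drude profile for the 0.2175 micron UV bump (Noll et al. 2009)
    lam0, dlam = 0.2175, 0.035
    D = E_b*(lam*dlam)**2/((lam**2 - lam0**2)**2 + (lam*dlam)**2)
    A_mag = A_V/R_V*(k + D)*(lam/lam_V)**delta    # attenuation in magnitudes
    return f_nodust + (1 - f_nodust)*10**(-0.4*A_mag)
```

As a sanity check, in the pure-Calzetti case ($\delta = E_b = f_{\rm nodust} = 0$) the factor at the $V$ band is close to $10^{-0.4 A_V}$, and adding a bump ($E_b > 0$) increases the attenuation at 0.2175 μm.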

DATA AND SPS MODELS
In the rest of this paper, I present an analysis of the combined photometric and spectroscopic data, for a sample of about 3200 galaxies at $0.6 < z < 1$, making use of several of the new features of ppxf introduced in the first part of the paper.

Spectroscopy
I study a subset of the galaxy sample of the LEGA-C survey (van der Wel et al. 2016). It is an ESO/Very Large Telescope (VLT) public spectroscopic survey targeting galaxies in the redshift range $0.6 < z < 1$, selected based on their observed $K_s$-band luminosity in the UltraVISTA/COSMOS catalogue by Muzzin et al. (2013), with a small variation of the limit with redshift. In this work, I use the data from the LEGA-C third data release (DR3) presented in van der Wel et al. (2021). This selection results in a mass-complete sample of 3445 galaxies in DR3 (90% completeness above $\lg(M_\ast/{\rm M_\odot}) \gtrsim 10.3$). The selection, the completeness level, the characteristics of the sample and the data reduction are discussed extensively in van der Wel et al. (2016) and Straatman et al. (2018).
For my study, I focused only on the subsample of 3197 galaxies (including duplicates) in the DR3 catalogue with spectroscopic redshift $0.6 < z < 1$ and with measured stellar velocity dispersion $\sigma_\ast$. This sample has redshift $z$ = [0.67, 0.76, 0.93] and average $S/N$ = [7, 14, 26] per Å at the 16th ($-1\sigma$), 50th (median) and 84th ($+1\sigma$) percentiles. However, I verified that all my results are unchanged if I restrict the sample to the redshift range $0.7 < z < 0.9$.
The survey data consist of spectra observed with the VLT/VIMOS multi-object spectrograph (Le Fèvre et al. 2003), covering the wavelength range 0.63-0.88 μm with a spectral resolution $R \approx 3500$, equivalent to an instrumental dispersion $\sigma_{\rm inst} \approx 36$ km s$^{-1}$ (van der Wel et al. 2021). For the ppxf fits I logarithmically rebinned the spectra to a velocity scale $\Delta V = c\,\Delta\ln\lambda = \sigma_{\rm inst}$ (Cappellari 2017, eq. 8) to make sure the spectrum is Nyquist sampled.

Photometry
I use two large collections of photometric measurements for the LEGA-C galaxies. The first is the UltraVISTA/COSMOS catalogue by Muzzin et al. (2013). It includes PSF-matched photometry in 30 bands from 0.15 to 24 μm. The catalogue is based on the NIR imaging data from UltraVISTA (McCracken et al. 2012). The optical data consist of broad-band Subaru/SuprimeCam data ($g^+r^+i^+z^+$), as well as $u^*$ data from CFHT/MegaCam (Taniguchi et al. 2007; Capak et al. 2007). It also includes the 12 optical medium bands (IA427-IA827) from Subaru/SuprimeCam (Capak et al. 2007). Also included are the GALEX FUV and NUV channels (Martin et al. 2005), and the 3.6, 4.5, 5.8, 8.0 and 24 μm channels from Spitzer's IRAC+MIPS cameras (Sanders et al. 2007). The model predictions within the GALEX FUV band at $z \sim 0.8$ are quite uncertain and few photons are generally expected to escape at those wavelengths. However, I still include this band in the fit to verify that this is indeed the case in the data. Moreover, significant detections are observed for the most star-forming galaxies.
The second photometric catalogue is the COSMOS2020 by Weaver et al. (2022). Highlights of this catalogue, compared to the one by Muzzin et al. (2013), are much deeper Subaru Hyper Suprime-Cam broad-band photometric measurements (Aihara et al. 2019) and deeper UltraVISTA DR4 observations. The extra depth is not an important feature for my study, as the LEGA-C galaxies were all well-detected in Muzzin et al. (2013) by design. However, I use COSMOS2020 to assess the sensitivity of my results to the use of independent datasets. For this work, I adopt the catalogue produced with the farmer profile-fitting photometric extraction tool. For both photometric catalogues, I only included the (typically 28) bands whose transmission FWHM is fully contained in the wavelength range of the adopted stellar population templates (see later) at the redshift of each galaxy.
The current COSMOS2020 farmer catalogue does not have the two GALEX bands, so I added them to compare it more closely with the UltraVISTA/COSMOS catalogue. I used the following steps for each galaxy: (i) I selected the bands that were common between the COSMOS2020 and the UltraVISTA/COSMOS catalogues. (ii) I used equation (34) to perform a linear least-squares fit and find a normalization factor that matches the two photometries for that galaxy. (iii) I applied the same factor to scale the two GALEX bands and included them in the COSMOS2020 bands.
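Step (ii) admits a simple closed-form solution for a single scale factor. Equation (34) itself is not reproduced in this excerpt, so the inverse-variance weighting below is an assumption, and all fluxes are made-up illustrative values:

```python
import numpy as np

# Illustrative fluxes in the bands common to the two catalogues
f_muzzin = np.array([3.1, 5.0, 7.9, 12.2, 18.0])   # reference catalogue
f_c2020 = np.array([3.0, 4.9, 7.7, 12.0, 17.6])    # catalogue to be rescaled
sig = np.array([0.3, 0.4, 0.5, 0.8, 1.2])          # uncertainties (assumed)

# Weighted linear least squares for a single scale factor s, minimizing
# sum [(f_muzzin - s*f_c2020)/sig]^2; the closed-form solution is:
s = np.sum(f_muzzin*f_c2020/sig**2)/np.sum((f_c2020/sig)**2)

# The same factor would then rescale the two GALEX fluxes before
# appending them to the COSMOS2020 photometry of that galaxy.
```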

Stellar population synthesis models
I used three independent SPS models to assess the sensitivity of the results to some of the adopted model assumptions. I selected the models based on two criteria: (i) the ability to generate model spectra from the far UV at 0.1 μm to about 2 μm, to be able to constrain the decrease of the dust attenuation towards the NIR region of the galaxy spectra; and (ii) the inclusion of model spectra down to a young age of 1 Myr, to reproduce the many actively star-forming galaxies that are present in the sample. The age criterion forces me to exclude from this study the models by Vazdekis and Maraston, which I have extensively used in the past.
For all three models, I tried to select a consistent set of templates. In all cases, I adopted the same set of 43 ages logarithmically spaced by 0.1 dex from 1 Myr to 15.85 Gyr, defined as $\lg({\rm Age/yr}) = 6, 6.1, 6.2, \ldots, 10.2$ (25). The oldest age is 2-3× older than the age of the Universe at the redshift of the sample, which varies in my standard cosmology from 5.75-7.75 Gyr between $z$ = 1-0.6, but I did not truncate the models to physical ages, to check how well the data themselves can constrain the galaxy ages. For comparison, I additionally ran models where I constrained the maximum age in the fit to each galaxy to the age of the Universe at its redshift. I also excluded from all models the most extreme low metallicities $[M/H] < -2$. The rest of my setup for the three SPS models is as follows: (i) The fsps models allow one to compute SPS models for a specified set of parameters. I used the Python bindings (Johnson et al. 2021a) and the latest public v3.2 to compute a set of spectra with the above ages and 9 equally-spaced metallicities $[M/H]$ = [−1.75, −1.5, −1.25, −1, −0.75, −0.5, −0.25, 0, 0.25], for a total of 387 SPS templates. I adopted a Salpeter (1955) IMF with a lower/upper mass cut of 0.08 and 100 M$_\odot$ respectively, for consistency with bpass, and used the MIST isochrones (Choi et al. 2016). But I note that my results are virtually insensitive to the slope of the IMF at lower masses. I computed the SPS without including the effect of gas or dust and adopted default values for the other parameters. This returns SPS spectra computed using the MILES stellar library (Sánchez-Blázquez et al. 2006; Falcón-Barroso et al. 2011) for the optical region, which is the one I fit in the LEGA-C spectra.
(ii) The galaxev models provide a Fortran code (version 2020) which I used to produce a set of SPS spectra with the same ages as above and computed at the provided 5 metallicities $[M/H]$ = [−1.74, −0.73, −0.42, 0, 0.47], for a total of 215 SPS templates. This SPS model also uses the MILES library to generate spectra of the optical region. Also here I adopted a Salpeter IMF. The models use the Padova isochrones (Bertelli et al. 1994; Girardi et al. 2000; Marigo et al. 2008).
(iii) The bpass models v2.3 are provided as a set of files precomputed at a given set of metallicities. I adopted the 10 metallicities $[M/H]$ = [−1.3, −1, −0.8, −0.7, −0.5, −0.4, −0.3, 0, 0.2, 0.3], for a total of 430 SPS templates. The models are provided for a single IMF having a power-law slope of −2.35 (Salpeter slope) above 0.5 M$_\odot$ and a slope of −1.3 at lower masses. I used the version of the SPS for single stars, ignoring binaries, with $[\alpha/{\rm Fe}] = 0$, for consistency with the other two SPS models. These SPS models are fully synthetic. They use isochrones produced by a derivative of the Cambridge stars code (Eggleton 1971), as described by Eldridge et al. (2008).

DYNAMICAL MASSES FROM SERSIC PHOTOMETRY
The true masses of galaxies are not known, and for this reason one cannot directly assess the accuracy of the mass estimates from different stellar population codes. One way to test their accuracy is to compare them against good dynamical models. In this section I describe how I do this. I use mass-follows-light axisymmetric JAM dynamical models, which are as quick and simple to use as the usual virial estimator (e.g. Cappellari et al. 2006) and require the same data, but do not share its problems.

Figure 4. The exact $b(n_{\rm Ser})$ of equation (26) and the minimax approximation (red line) for the interval $0.2 < n_{\rm Ser} < 16$ of equation (28). I also show for comparison the approximation (blue line) by Ciotti & Bertin (1999), which is very accurate for $n_{\rm Ser} > 0.5$ but starts deviating rapidly at smaller $n_{\rm Ser}$. The inset shows the same comparison using a different scale for the $y$-axis.

Updated coefficient for the Sersic profile
I assume a galaxy with surface brightness described by a Sersic profile with elliptical isophotes of constant axial ratio $q_{\rm Ser}$, $\Sigma(m) = \Sigma_{\rm e} \exp\{-b(n_{\rm Ser})\,[(m/R_{\rm e}^{\rm Ser})^{1/n_{\rm Ser}} - 1]\}$ (26), where the elliptical radius is $m = (x^2 + y^2/q_{\rm Ser}^2)^{1/2}$, with $x$ aligned along the galaxy photometric projected major axis. The parameter $b(n_{\rm Ser})$ is defined by the requirement that $R_{\rm e}^{\rm Ser}$ represents the semi-major axis of the isophote containing half of the total luminosity of the Sersic model. This implies that $b(n_{\rm Ser})$ is the solution of $\Gamma(2 n_{\rm Ser}) = 2\,\Gamma(2 n_{\rm Ser}, b(n_{\rm Ser}))$ (Ciotti 1991), where $\Gamma(a, x)$ is the incomplete gamma function (Olver et al. 2010, eq. 8.2.2) and $\Gamma(a) = \Gamma(a, 0)$ is the complete one (Olver et al. 2010, eq. 5.2.1). A useful approximation for $b(n_{\rm Ser})$ was presented by Ciotti & Bertin (1999). This is very accurate for values of $n_{\rm Ser} \gtrsim 0.5$ but becomes rapidly inaccurate for smaller $n_{\rm Ser}$ values. The LEGA-C catalogue contains values of the Sersic index down to $n_{\rm Ser} = 0.2$, where I found that the Ciotti & Bertin (1999) approximation for $b(n_{\rm Ser})$ reaches an error of 11% (Fig. 4).
To overcome this limitation, I computed an alternative approximation for $b(n_{\rm Ser})$. I adopted the same number of terms and the same mathematical form, but I adjusted the coefficients to obtain the minimax solution, minimizing the maximum absolute relative error of $b(n_{\rm Ser})$ using nonlinear optimization over an interval including the most extreme Sersic indices existing in the literature. The resulting expression (equation 28) has a maximum absolute relative error of $5.6 \times 10^{-4}$ over the whole interval $0.2 \leq n_{\rm Ser} \leq 16$. The fact that the relative error reaches its maximum value, with alternating sign, at five values of $n_{\rm Ser}$ (Fig. 4) confirms that this is indeed the minimax solution for the adopted function and interval (e.g. Press et al. 2007, sec. 5.15).
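For reference, $b(n_{\rm Ser})$ can be computed to machine precision by solving the Ciotti (1991) condition numerically. The sketch below uses scipy's regularized upper incomplete gamma function; it is not the minimax fit of equation (28):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gammaincc   # regularized upper incomplete gamma

def b_sersic(n):
    """Solve Gamma(2n) = 2*Gamma(2n, b) for b, i.e. find the isophote
    enclosing half of the total luminosity of a Sersic profile (Ciotti 1991).
    Since gammaincc(a, x) = Gamma(a, x)/Gamma(a), the condition reads
    gammaincc(2n, b) = 1/2."""
    return brentq(lambda b: gammaincc(2*n, b) - 0.5, 1e-6, 100.)

# Compare with the Ciotti & Bertin (1999) asymptotic expansion
# b(n) ~ 2n - 1/3 + 4/(405 n) + 46/(25515 n^2) + ..., accurate for n >~ 0.5
for n in [1.0, 4.0, 8.0]:
    approx = 2*n - 1/3 + 4/(405*n) + 46/(25515*n**2)
    assert abs(b_sersic(n)/approx - 1) < 1e-3
```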

Jeans Anisotropic Models from Sersic photometry
I performed dynamical modelling of the full LEGA-C sample of 3197 galaxies with measured $\sigma_\ast$ and $0.6 < z < 1$ using the Jeans Anisotropic Modelling (JAM) method (Cappellari 2008, 2020). Dynamical modelling of LEGA-C galaxies with JAM was previously done for a subset of 797 galaxies by van Houdt et al. (2021) using a Bayesian approach, but I extended it to all galaxies with measured $\sigma_\ast$. A simpler alternative to dynamical models would be to use a virial estimation of the galaxy masses (e.g. Cappellari et al. 2006) based on the fitted parameters of the Sersic (1968) profiles provided in the LEGA-C DR3 catalogue (van der Wel et al. 2021). The virial estimates are also included in the DR3 catalogue.
The main advantage of virial estimators is that they are fast and easy. However, a common limitation is that they do not account for differences in the spectroscopic aperture and the instrumental pointspread function, which can be significant, especially in high-redshift observations like LEGA-C. Moreover, virial estimators do not allow one to explicitly assume an intrinsic shape or anisotropy for the galaxies under study.
Although virial estimators are still useful to study scaling relations (e.g. Cappellari et al. 2013a;Li et al. 2018;Zhu et al. 2023a), nowadays there is no reason to use them to compute galaxy masses. In fact, using e.g. the public JAM package, one can compute a more reliable dynamical mass for a galaxy approximated by a Sersic model with a similar time and effort as using the virial estimator, while allowing for intrinsic shape, anisotropy, aperture and PSF effects, without introducing unnecessary approximations.
To simplify this task of computing accurate dynamical masses of galaxies with available fitted Sersic parameters and $\sigma_\ast$, I developed a simple procedure, jam_axi_sersic_mass, which I have made publicly available in the updated version 7.2 of the JAM package. The procedure requires the following inputs from the user: (iv) the observed second moment $\sigma_\ast$ of the stellar line-of-sight velocity distribution (which includes both rotation and random motions) and its uncertainty, preferably measured at a similar wavelength as the imaging used to fit the Sersic model.
Given an angular diameter distance $D_{\rm A}$, the procedure then returns the dynamical mass $M_{\rm JAM}$ of the Sersic model and its formal uncertainty in a fraction of a second.
The procedure uses the mge_fit_1d routine within the MgeFit package (v5.0, https://pypi.org/project/mgefit; Cappellari 2002) to accurately fit the one-dimensional Sersic profile of equation (26) with a one-dimensional Multi-Gaussian Expansion (MGE). It then assumes a fixed observed axial ratio $q_{\rm Ser}$ for all MGE Gaussians and an arbitrary reference total mass $M_0$ for the model. It uses the jam_axi_proj procedure in the JamPy package (Cappellari 2008, 2020) to calculate a PSF-convolved prediction for the $V_{\rm rms}$, at a large set of discrete sky locations finely sampling the adopted spectroscopic aperture. The luminosity-weighted second moment inside the whole aperture is computed as $\sigma^2_{\rm mod} = \sum_k F_k V^2_{{\rm rms},k} \big/ \sum_k F_k$, where the summation extends to the pixels of flux $F_k$ inside the aperture. Given the general scaling $M \propto v^2$ between the total mass and velocities in a dynamical model, the dynamical mass of the Sersic model is then given by $M_{\rm JAM} = M_0\, (\sigma_\ast/\sigma_{\rm mod})^2$. The physical meaning of the dynamical mass $M_{\rm JAM}$, as derived from mass-follows-light dynamical models or virial estimators, is often a source of confusion. This is because $M_{\rm JAM}$ is neither a total stellar mass, nor a total mass of a galaxy which includes its dark halo. Moreover, the value of $M_{\rm JAM}$ is highly dependent on the extrapolated outer profile. For example, a Sersic model with $n_{\rm Ser} = 6$ contains 21% of its total light outside $4 R_{\rm e}$, which is about the maximum radius one can observe in typical photometry. This strong dependence on extrapolation prevents comparison of masses when accuracies better than about 20% are desired.
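With made-up numbers, the aperture-weighted second moment and the mass rescaling can be sketched as follows (in a real application the fluxes and $V_{\rm rms}$ predictions at each sampled location would come from the MGE model and jam_axi_proj; everything below is illustrative):

```python
import numpy as np

# Hypothetical model grid finely sampling the spectroscopic aperture:
# pixel fluxes F_k and PSF-convolved V_rms predictions, computed for an
# arbitrary reference mass M0 (all values are illustrative)
F = np.array([2.0, 1.5, 1.0, 0.7, 0.4])                 # pixel fluxes
Vrms_model = np.array([210., 205., 195., 185., 170.])   # km/s, for mass M0
M0 = 1e11                                               # reference mass, Msun

# Luminosity-weighted second moment inside the whole aperture
sig_model = np.sqrt(np.sum(F*Vrms_model**2)/np.sum(F))

# Scaling M propto v^2: rescale M0 to match the observed sigma_*
sig_obs = 180.0                                         # km/s (illustrative)
M_jam = M0*(sig_obs/sig_model)**2
```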
The quantity that both dynamical and stellar population models robustly measure is the mass-to-light ratio within the region covered by the spectroscopic (for the dynamics) or photometric (for the population) observations. Specifically, if one divides the Sersic dynamical mass $M_{\rm JAM}$ returned by the procedure by the analytic total luminosity of the same Sersic model (Ciotti 1991), the resulting mass-to-light ratio $(M/L)_{\rm JAM}$ provides a very accurate approximation of the average $M/L$ within a sphere of radius comparable to the size of the spectroscopic aperture. I used jam_axi_sersic_mass to compute $M_{\rm JAM}$ and $(M/L)_{\rm JAM}$ for all the LEGA-C galaxies in my subsample. I assumed a spectroscopic aperture of 1″×1″ and a characteristic PSF of 0.75″ FWHM from van Houdt et al. (2021). I adopted as the intrinsic axial ratio for all galaxies the mean value $q_{\rm intr} = 0.41$ of the Gaussian distribution inferred by van Houdt et al. (2021) by inverting the observed shape distribution of the LEGA-C sample. For the anisotropy, I used a typical value $\beta_z = 0.2$ expected for the assumed mean intrinsic shape (Cappellari 2016, fig. 9). I assumed a cylindrically-oriented (align='cyl') velocity ellipsoid for JAM (Cappellari 2008). The parameters of the Sersic profiles ($R_{\rm e}^{\rm Ser}$, $n_{\rm Ser}$, the observed axial ratio $q_{\rm Ser}$ and the total luminosity) come from the LEGA-C DR3 catalogue.
I also used the dynamical models to calculate the average projected density $\Sigma_1^{\rm JAM}$ within a circle of radius $R = 1$ kpc at the angular diameter distance of the galaxy. This quantity was shown to closely relate to galaxy quenching (Cheung et al. 2012; Fang et al. 2013), like $\sigma_\ast$. However, here instead of using the stellar mass as in previous studies, I used the dynamical mass from JAM. I obtained this value by circularising and analytically integrating the best-fitting MGE as described in equation (11) of Cappellari et al. (2013a).

SETUP FOR ppxf AND TESTS
In this study, fitting (twice) the VIMOS spectrum (≈ 2800 spectral pixels) and typically 28 photometric bands for a single galaxy with ppxf takes about 1 min. This compares with the "roughly 100 CPU hours" reported by Tacchella et al. (2022) in a similar state-of-the-art study using DEIMOS spectroscopy and the prospector Bayesian code (Johnson et al. 2021b). This is a computation time difference of nearly four orders of magnitude! Of course, the two methods perform quite different tasks and the large computational cost is a standard feature of Bayesian methods and not a weakness by itself. However, the execution time of a method affects the kind of tasks one can address and the variety of modelling choices one can explore as I outline in this section.

Non-parametric population model
A key feature made possible by least-squares methods is the ability to explore non-parametrically the joint distribution of SFH and chemical composition with high resolution. My setup uses 43 non-parametric age bins, and each age bin is allowed to have a different non-parametric metallicity (5-10 bins depending on the SPS code), for a total of up to 430 bins. This contrasts with the 10 non-parametric age bins and the single metallicity for the entire galaxy adopted by Tacchella et al. (2022). Crucially, even when using a non-parametric grid of a few hundred templates, the least-squares method guarantees global convergence to the most likely weights distribution. This is because the constrained quadratic-programming problem being solved (Cappellari 2017, eq. 27) is known to possess a unique global minimum (e.g. Nocedal & Wright 2006). This contrasts with Bayesian methods, for which Tacchella et al. (2022, sec. 3.2) reported that "the fits do not converge within a reasonable amount of time" with 14 non-parametric bins. The use of non-parametric models is important for a proper recovery of the stellar population in galaxies (Lower et al. 2020).
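The convexity argument can be illustrated with a toy non-negative least-squares fit. Scipy's nnls solves a simplified analogue of the constrained quadratic program solved by ppxf (without regularization, kinematics or extra linear constraints): being convex, it recovers the unique global solution even with many templates:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
# Toy template matrix: 300 "spectral pixels" x 50 non-parametric templates
A = rng.random((300, 50))
w_true = np.zeros(50)
w_true[[5, 20, 35]] = [0.5, 1.0, 0.2]   # only three templates contribute
galaxy = A @ w_true                      # noiseless synthetic "galaxy"

# min ||A w - galaxy||^2 subject to w >= 0: a convex quadratic program
# with a unique global minimum, recovered here to machine precision
w_fit, resid = nnls(A, galaxy)
```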

Polynomials
A least-squares method like ppxf allows for a quick exploration of different modelling assumptions. In the course of this study, I was able to easily test different options, each with all three SPS models, for all 3200 galaxies, to test how they affected the final results. To fit the spectrum, we can choose between additive or multiplicative polynomials (the photometry does not use any polynomials). Additive polynomials are specified by the ppxf keyword degree and can help with template mismatch, AGN modeling or sky subtraction errors. Multiplicative polynomials are specified by the ppxf keyword mdegree and can account for spectral flux calibration issues or reddening effects.
For my tests, I ran models with both the multiplicative and additive polynomial degrees ranging from mdegree=degree=-1 (i.e. using only attenuation and no polynomials) to degree 4, and found that the solution changes slightly without polynomials but quickly stabilizes as soon as one allows for a nonzero degree. Results were similar when using only additive or only multiplicative polynomials to adjust the spectral stellar continuum. This was a non-obvious result in the present analysis, given that the LEGA-C spectra were calibrated using SPS models, and this non-standard calibration may leave an imprint on the results. The polynomials should effectively remove any memory of possible inaccuracies in the spectral calibration. I adopted the ppxf keywords mdegree=2, degree=-1 for my standard setup.

Dust attenuation model
I truncated all three SPS template sets to 0.01 µm < λ < 5 µm (except for the bpass SPS models, which only extend to λ < 2 µm) to remove the influence of dust emission on the spectral shape (e.g. Conroy 2013, fig. 1). This is because modelling the dust emission from the energy balance of UV light reradiated to the IR requires several further assumptions and is not implemented in all modelling codes. Moreover, the currently available bands are included in the fitted range anyway. This situation is changing rapidly with the James Webb Space Telescope (JWST) and will soon be revisited.
For the full set of 3200 galaxies, I experimented with three different assumptions for the attenuation: (i) I adopted a single four-parameter attenuation curve for all stellar templates, as in equation (23); (ii) I reduced the attenuation curve to two parameters (A_V, δ) by assuming f_nodust = 0 and adopting the E_b-δ relation of Kriek & Conroy (2013, eq. 3); I still applied this attenuation to all stellar templates; (iii) I adopted the two-component attenuation model of Charlot & Fall (2000). For this, I used a birth-cloud attenuation that I applied only to the stellar templates younger than 10 Myr, while I used the same attenuation as in (ii) for the diffuse dust component affecting all stellar templates. In all three cases, I fitted a separate Calzetti et al. (2000) attenuation curve (equation 23 with δ = E_b = f_nogas = 0) for the gas emission-line templates. I found that options (i) and (ii) produce an insignificant difference in the final results, but in the first case there is more degeneracy in the attenuation parameters, which makes any trend between dust parameters less obvious. Option (iii) generates a similar result for the older and intermediate populations, as expected. However, when allowing the youngest population to have its own attenuation, this becomes completely degenerate with the amount of star formation in that same young component. As a result, one can obtain good fits with an unlikely large attenuation associated with an equally large star formation, but with complete degeneracy between the two parameters. This is the well-known SFR-dust degeneracy mentioned in Section 1. It can be broken by introducing extra assumptions on the dust geometry and reradiated UV fraction, combined with rest-frame IR data, which I do not have.
In conclusion, I adopted as my standard choice the two-parameter attenuation function of option (ii), which applies to the whole population, as done e.g. by Kriek & Conroy (2013). I enforced bounds on the parameters as −1 < δ < 0.4 and 0 < A_V < 4. As illustrated by Kriek & Conroy (2013, fig. 1), their two-parameter parametrization differs from both the Milky Way (Cardelli et al. 1989) and Calzetti et al. (2000) attenuation laws, but appears to better describe high-redshift galaxies. However, using the full four-parameter attenuation curve, which covers the Calzetti et al. (2000) curve and other curves from the Milky Way to the Large Magellanic Cloud (Gordon et al. 2003), does not alter my scientific conclusions.
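For illustration, a two-parameter attenuation curve in the spirit of Kriek & Conroy (2013) can be sketched as follows. It combines the published Calzetti et al. (2000) k(λ) with a power-law tilt of slope δ and a Drude UV bump whose strength follows E_b = 0.85 − 1.9δ. This is a schematic implementation for this paper's discussion, not the code inside ppxf, and the normalization conventions should be checked against the original papers:

```python
import numpy as np

def attenuation(lam_um, A_V, delta):
    """Attenuation A(lam) in mag for a Calzetti et al. (2000) curve tilted
    by a power law of slope delta, with a 2175 A bump whose amplitude
    follows the Kriek & Conroy (2013) relation E_b = 0.85 - 1.9*delta.
    lam_um: wavelength(s) in micron. Illustrative sketch only."""
    lam = np.atleast_1d(lam_um).astype(float)

    # Calzetti et al. (2000) k(lambda) with R_V = 4.05, two wavelength ranges.
    k = np.where(
        lam >= 0.63,
        2.659*(-1.857 + 1.040/lam) + 4.05,
        2.659*(-2.156 + 1.509/lam - 0.198/lam**2 + 0.011/lam**3) + 4.05)

    # Drude profile for the UV bump (centre 0.2175 um, width 0.035 um).
    E_b = 0.85 - 1.9*delta
    drude = E_b*(lam*0.035)**2/((lam**2 - 0.2175**2)**2 + (lam*0.035)**2)

    # Tilt around the V band (0.55 um) and normalize to A_V.
    return A_V*(k + drude)/4.05*(lam/0.55)**delta
```

By construction the curve returns approximately A_V at 0.55 µm, and a more negative δ produces a steeper (greyer-to-steeper) UV rise, which is the behaviour the fitted slope parameter controls.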

Matching photometry and spectra
I have a spectroscopic redshift for every galaxy from the LEGA-C catalogue. I only included photometric bands in the fit if their FWHM is fully enclosed by the template's wavelength coverage at that redshift. Before the ppxf fits, I pre-computed the filter-averaged quantities entering equation (21). I used the same photonic throughput file for all filters, produced for the eazy code (Brammer et al. 2008) and available online.
Spectroscopy was observed within 1″-wide slits, while photometric observations were either measured within a 2.″1 aperture (Muzzin et al. 2013) or are total magnitudes (Weaver et al. 2022). This means that a calibration is needed to match the flux levels of photometry and spectroscopy. It is important to note that in some cases the photometric fluxes may sample a different stellar population from that covered by the spectra. With this in mind, I assumed that spectroscopy and photometry originate from a single spectral energy distribution and applied a constant scaling factor to the spectrum f_spec(λ). This factor ensures that the synthetic photometry derived from the spectrum using the filter transmission curves matches the observed LEGA-C photometry in the bands covered by the spectroscopy. This step calibrates the overall normalization of the spectral flux using the photometry before starting the ppxf fit.
To match the VIMOS spectra to the photometry, before calling ppxf, for every galaxy I first computed synthetic photometric fluxes ⟨f_j⟩ from its VIMOS spectrum using equation (15), for the subset of photometric bands (typically 7) contained within the VIMOS wavelength range. I then multiplied the spectrum by the factor that minimizes the χ² between the synthetic and observed photometry, using only the few bands in common. This factor can be computed with the general analytic linear-fitting relation (e.g. Cappellari 2008, eq. 51), where the "data" vector d has elements d_j = f_j/Δf_j, the observed photometric fluxes f_j divided by their uncertainties Δf_j, and the "model" vector m has elements m_j = ⟨f_j⟩/Δf_j, the synthetic fluxes also divided by the data uncertainties.
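The scaling factor has a closed form. A minimal sketch, with d and m defined as in the text (the function name is hypothetical):

```python
import numpy as np

def scale_factor(f_obs, f_syn, df_obs):
    """Analytic chi2-minimizing scaling of the spectrum onto the photometry
    (the general linear-fitting relation of Cappellari 2008, eq. 51):
    with d_j = f_obs_j/df_j and m_j = f_syn_j/df_j, the best scaling is
    s = (d . m)/(m . m)."""
    d = np.asarray(f_obs, float)/np.asarray(df_obs, float)
    m = np.asarray(f_syn, float)/np.asarray(df_obs, float)
    return (d @ m)/(m @ m)
```

For instance, if the observed fluxes are exactly 2.5 times the synthetic ones, the function returns 2.5, and for noisy data it returns the value at which the χ² is stationary.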
Like Kriek & Conroy (2013), I did not use the catalogues' formal photometric uncertainties in any of my fits, neither when calculating the scaling factor nor during the ppxf fits. This is because the small error bars at longer wavelengths would have dominated the fits. Additionally, after many fits, it became apparent that systematic imperfections in the SPS model assumptions or in the data, rather than random noise, were the main source of uncertainty. Instead, I use fixed linear uncertainties for all photometric bands of a given galaxy, as explained in the next section.

Outliers removal
To remove outliers from the spectral fits, I follow a common practice (e.g. Westfall et al. 2019) that involves multiple ppxf fits and adjusting the uncertainties based on the fit residuals. My approach is designed for robustness as follows: (i) I perform an initial ppxf fit assuming a reasonable fixed uncertainty Δf_phot for all bands, equal to 3% of the maximum photometric flux f_phot for that galaxy. Similarly, for the spectral uncertainty Δf_spec(λ), I adopt a constant value of 10% of the median galaxy spectrum f_spec(λ). A constant uncertainty is a good approximation for the VIMOS spectra and reduces the noise in the fit, with respect to adopting a more accurate but noisy error spectrum (e.g. as given by the reduction pipeline). The best fit is not sensitive to the overall scaling of the uncertainties, which only affects the relative weight of the spectrum and photometry.
(ii) After the fit, I estimate the rms noise spectrum f_noise(λ) per pixel from the fit residuals, in a statistically robust way, by computing for every spectral pixel the interval containing 68% of the residuals within a moving window of 100 pixels.
(iii) I mask the pixels deviating by more than 3 f_noise(λ) from the best fit. I repeat the masking in a loop, while iteratively adjusting the normalization of the best-fitting spectrum using the non-masked pixels from equation (34), until the mask no longer changes.
(iv) I multiply Δf_spec(λ) by √(χ²_spec/N_spec), where N_spec is the number of non-masked spectral pixels and χ²_spec is computed from the spectrum alone, in such a way that, after the rescaling of the uncertainties, the resulting χ²_spec = N_spec.
(v) I apply the same constant rescaling to the photometric uncertainties to enforce χ²_phot = N_phot.
Here N_phot is the number of fitted photometric bands, and I compute χ²_phot from the photometry alone. (vi) After the spectral masking and the rescaling of the photometric Δf_phot and spectral Δf_spec(λ) uncertainties, I perform a second ppxf fit, from which I extract the final results.
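Steps (ii)-(iv) can be sketched with numpy alone. This is a single-pass illustration of the robust noise estimate, clipping and uncertainty rescaling (the procedure in the text iterates until the mask converges and also renormalizes the best-fitting spectrum); all function names are hypothetical:

```python
import numpy as np

def robust_noise(residuals, window=100):
    """Step (ii): rms noise per pixel from the half-width of the interval
    containing 68% of the residuals, inside a moving window."""
    n = residuals.size
    noise = np.empty(n)
    for j in range(n):
        lo, hi = max(0, j - window//2), min(n, j + window//2)
        # The 16th-84th percentile interval encloses 68% of the values.
        p16, p84 = np.percentile(residuals[lo:hi], [16, 84])
        noise[j] = (p84 - p16)/2
    return noise

def clip_and_rescale(data, model, window=100, nsigma=3):
    """Steps (iii)-(iv), single pass: mask pixels deviating by more than
    nsigma times the local noise, then choose a constant uncertainty that
    makes chi2 equal to the number of unmasked pixels."""
    resid = data - model
    noise = robust_noise(resid, window)
    good = np.abs(resid) < nsigma*noise
    # Constant uncertainty giving chi2 = N on the unmasked pixels.
    sigma = np.sqrt(np.mean(resid[good]**2))
    return good, sigma
```

On a synthetic spectrum with Gaussian residuals and a couple of strong sky-line-like spikes, the spikes are masked and the recovered constant uncertainty is close to the true noise level.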
The rescaled uncertainties are generally of the same order of magnitude as the formal ones provided by the pipelines. However, there can be significant relative differences between different photometric bands. I tested the full LEGA-C sample and found that none of the results in this paper depended on whether I used the formal or the rescaled uncertainties. However, the approach I adopted significantly reduced the number of cases where, after visual inspection, the formal best fit did not match the data well because unrealistically small formal uncertainties overemphasized certain photometric bands.

Gas model and kinematic constraints
For galaxies without Active Galactic Nuclei (AGN), the gas emission could be approximately predicted from the galaxy SFH and could be included in the models, with some extra assumptions, based on photoionization models like cloudy (Ferland et al. 1998, 2013). This feature is implemented in fsps and can be useful when fitting photometry alone, where the gas emission is poorly constrained by the data. However, in my case, I have many good-quality spectra in addition to photometry and I want to fit the gas more accurately than a model could predict. For this reason, I fit the gas emission lines in a model-independent way with ppxf. With ppxf one can fit many gas emission lines simultaneously with the stellar continuum. This is especially important when studying the stellar population of star-forming galaxies or AGNs, where key absorption lines like the Balmer series are filled by emission. However, when fitting gas lines in relatively low-S/N spectra, it is essential to set constraints on the parameters of the gas lines, to prevent possible degenerate situations. An example of a situation to avoid is when the spectrum has no gas emission and the Gaussian describing an emission line becomes so wide as to be degenerate with the shape of the stellar continuum. This is one of the types of situations for which I designed the linearly constrained algorithm of Section 3.2. For my fits to the LEGA-C spectra, after some experimentation focused on the few problematic fits, I found it sufficient to require the dispersion of the gas emission lines to be smaller than the stellar one, σ_gas < σ_*, and in addition I required the gas and mean stellar velocities to satisfy |V_gas − V_*| < 500 km s⁻¹. I enforced these requirements as linear constraints in ppxf (keyword constr_kinem). My strict constraint on the gas dispersion is not always satisfied in real galaxies and I would not recommend it when one is interested in the gas kinematics.
However, it appears to work well in eliminating spurious solutions for the stellar population alone from the present type of spectra.
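The two kinematic requirements can be written as linear inequalities. The sketch below builds the arrays in the A_ineq @ p ≤ b_ineq form accepted by the ppxf keyword constr_kinem; the parameter ordering [V_*, σ_*, V_gas, σ_gas] and the convention that the constraints are expressed in units of velscale are my reading of the ppxf documentation and should be verified against the installed version (the velscale value here is hypothetical):

```python
import numpy as np

# Kinematic parameter vector (one V, sigma pair per component):
#   p = [V_star, sigma_star, V_gas, sigma_gas]
# Constraints, written as A_ineq @ p <= b_ineq:
#   sigma_gas - sigma_star <= 0     (gas dispersion below the stellar one)
#   V_gas - V_star <= 500 km/s      (together with the next row this
#   V_star - V_gas <= 500 km/s       enforces |V_gas - V_star| <= 500)
velscale = 60.0  # km/s per pixel; hypothetical value for this sketch
A_ineq = np.array([[0, -1, 0, 1],
                   [-1, 0, 1, 0],
                   [1, 0, -1, 0]])
b_ineq = np.array([0.0, 500.0, 500.0])/velscale  # assumed units of velscale
constr_kinem = {"A_ineq": A_ineq, "b_ineq": b_ineq}

def satisfies(p_kms):
    """Check a trial solution p = [V*, sigma*, V_gas, sigma_gas] in km/s."""
    p = np.asarray(p_kms)/velscale
    return bool(np.all(A_ineq @ p <= b_ineq))
```

The dictionary would then be passed to ppxf via the constr_kinem keyword; the helper only illustrates which solutions the inequalities admit.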
The emission lines that I included in the ppxf fits are all the lines listed in Belfiore et al. (2019, table 1). In particular, those falling within the LEGA-C wavelength range for 0.6 < z < 1 are the Balmer series bluer than Hα, the [OII]λλ3726, 3729, [NeIII]λλ3868, 3869 and [OIII]λλ4959, 5007 doublets, and HeII λ4687. I force the kinematics of all the gas lines to be the same and I additionally fix the [OIII] doublet ratio to 1/3. I fit the Balmer series as a single gas template with the decrement for Case B recombination, for a temperature T = 10⁴ K and electron density n_e = 100 cm⁻³, from Storey & Hummer (1995). I allow the gas templates to have their own Calzetti et al. (2000) attenuation curve. Fixing the intrinsic ratios of the Balmer series allows me to provide a better extrapolation of the gas filling the weakest (higher-order) absorption lines of the series, even when the S/N of the spectrum is not high enough to constrain them.
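A tied Balmer series can be sketched as a single template made of Gaussians with fixed flux ratios, so that one free amplitude scales the whole series. The ratios below (relative to Hβ) are approximate textbook Case B values for T = 10⁴ K and n_e = 100 cm⁻³, used here for illustration rather than taken from the Storey & Hummer (1995) tables:

```python
import numpy as np

# Rest wavelengths (Angstrom) and approximate Case B fluxes relative to
# H-beta (illustrative values, not the Storey & Hummer 1995 tabulation).
balmer = {"Hdelta": (4101.76, 0.259),
          "Hgamma": (4340.47, 0.468),
          "Hbeta":  (4861.33, 1.000)}

def balmer_template(lam, sigma_kms):
    """Sum of unit-flux Gaussians scaled by the fixed Case B ratios, so a
    single amplitude in the fit moves the whole series together.
    lam: wavelength grid in Angstrom; sigma_kms: line dispersion in km/s."""
    c = 299792.458  # speed of light, km/s
    spec = np.zeros_like(lam, dtype=float)
    for lam0, flux in balmer.values():
        sig_ang = sigma_kms/c*lam0  # dispersion in Angstrom at this line
        gauss = np.exp(-0.5*((lam - lam0)/sig_ang)**2)
        spec += flux*gauss/(sig_ang*np.sqrt(2*np.pi))
    return spec
```

Integrating the template around each line recovers the imposed decrement, which is the behaviour that lets the stronger lines constrain the weak, higher-order ones.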
Note that, although I do not include theoretical gas emission predictions in the SPS models, I do include in the photometry the contribution of the emission lines that are spectroscopically constrained. In particular, when Balmer lines are present in the spectrum, the line fluxes of the Balmer series, and in particular of Hα, which is outside the LEGA-C wavelength range, are included in the photometric fit. However, I checked that this inclusion has a minimal effect on the final results.

Velocity dispersion matching
The LEGA-C data have an instrumental dispersion of σ_inst ≈ 36 km s⁻¹ (Section 4.1), ignoring possible variations within the rather small wavelength range. The MILES stellar templates used in both the fsps and galaxev models were observed with an instrumental resolution of Δλ ≈ 2.50 Å FWHM (Falcón-Barroso et al. 2011), equivalent to R = λ/Δλ ≈ 1600 at the typical wavelength λ ≈ 0.4 µm covered by LEGA-C. This corresponds to an instrumental dispersion σ_inst = c/(R√(4 ln 4)) ≈ 80 km s⁻¹.
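The resolution arithmetic above can be checked directly, assuming a Gaussian line-spread function:

```python
import numpy as np

c = 299792.458  # speed of light, km/s

# MILES resolution: FWHM = 2.50 A at lam = 4000 A -> R = lam/FWHM = 1600.
R = 4000.0/2.50

# A Gaussian has FWHM = sigma*sqrt(8 ln 2) = sigma*sqrt(4 ln 4), so the
# instrumental dispersion of the templates is:
sigma_inst_tem = c/(R*np.sqrt(4*np.log(4)))  # ~ 80 km/s

# With the LEGA-C instrumental dispersion of ~36 km/s, the smallest
# safely measurable stellar dispersion is |sigma_diff^2|^(1/2):
sigma_inst_gal = 36.0
sigma_min = np.sqrt(sigma_inst_tem**2 - sigma_inst_gal**2)  # ~ 71 km/s
```

This reproduces the ≈ 80 km s⁻¹ template dispersion and the ≈ 71 km s⁻¹ limit quoted in the text.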
Ideally, I would like to use SPS models based on stars with higher resolution than the galaxy spectra. However, assuming that the instrumental line-spread functions are approximately Gaussian, one can still use a template with a higher instrumental dispersion σ_inst,tem than the σ_inst,gal of the observed galaxy spectrum, as long as (σ²_inst,gal + σ²_*) > σ²_inst,tem, where σ_* is the real "astrophysical" dispersion of the galaxy stars. After the ppxf fit one can compute the corrected stellar dispersion with the standard expressions (Cappellari 2017, sec. 2). As σ²_diff ≡ σ²_inst,gal − σ²_inst,tem is in this case a negative quantity, I can model the dispersion of galaxies down to σ_* ≳ |σ²_diff|^(1/2) = 71 km s⁻¹. I compared my fitted dispersions σ_* with the values in the LEGA-C DR3 catalogue, which were measured with ppxf using higher-resolution synthetic templates as described in Bezanson et al. (2018). I found a good agreement assuming |σ²_diff|^(1/2) ≈ 48 km s⁻¹, which suggests possible inaccuracies in the quoted relative instrumental dispersion of the galaxies and the templates. Regardless of the reason for this discrepancy, only 40 of the 3197 galaxies in the catalogue with measured dispersion have σ_* < 48 km s⁻¹, likely due to measurement uncertainties. This implies that I can safely use the SPS models based on MILES to study the stellar population of LEGA-C galaxies.
Figure 5. Examples of ppxf fits to the LEGA-C galaxy spectra and 28-band photometry using the fsps models. The top three galaxies require a model with a single short star formation event, while the bottom three galaxies require multiple discrete star formation events. For each galaxy, the top panel shows the photometric measurements (blue error bars) and the best fit (green diamonds), while the golden line shows the underlying best-fitting template with included emission lines. The grey vertical band indicates the range where spectroscopy was also fitted. The second panel shows the observed spectrum (black line) and the best-fitting total spectrum (orange line). The best-fitting stellar spectrum alone is shown in red and the gas emission one in magenta. The residuals (arbitrarily offset) are indicated with green diamonds and the masked pixels with blue lines (and corresponding grey vertical bands). The last three panels show the distribution of the ppxf weights, indicating the bolometric luminosity L_bol of each stellar population of given age and metallicity. The weights are shown for (i) no regularization, (ii) regularization regul=10 and (iii) regul=100, as written in the plots.
Table 1. Columns (1), (2) and (3): ID, right ascension and declination (J2000, degrees) from the LEGA-C catalogue of van der Wel et al. (2021). Column (4): JAM dynamical masses from Section 5.2 in solar masses. For accurate quantitative use, one should divide these masses by the luminosities of the Sersic models in the LEGA-C catalogues to obtain the total (M/L)_JAM as described in Section 7.3. Column (5): dynamically-determined average mass density within a cylinder of radius R = 1 kpc along the line-of-sight, computed from the best-fitting JAM model. Column (6): stellar masses from ppxf using SPS templates from fsps. These masses include living stars and stellar remnants, but exclude gas lost during stellar evolution. I assume a Salpeter IMF with lower/upper mass cutoffs of 0.08 and 100 M⊙ respectively. Columns (7) and (8): lg Age and [M/H] weighted by the bolometric luminosity (Section 7.4). Columns (9) and (10): V-band attenuation A_V in mag and slope δ from equation (23), for the two-parameter attenuation described in Section 6.3. I show only the first ten rows of this table, while the full electronic table for 3197 galaxies (including duplicates) is available as Supporting Information from the MNRAS website.

RESULTS
In this section, I describe the results of my stellar population modelling with ppxf. I also compare masses from stellar population and galaxy dynamics. The key quantities used in this paper are given in Table 1.

Spectral fit examples
In this paper, I focus on observable galaxy trends rather than on comparisons with models of galaxy formation. For this reason, instead of converting the SFH recovered by ppxf into stellar masses formed in a given time interval, I will always show the fraction of bolometric luminosity contributed by each template, as a function of its age and metallicity [M/H]. More precisely, I integrate the luminosity from the template spectra only within the region 0.1 < λ < 3 µm covered by the data. This is to avoid the possibility of interpreting very young stars, which emit most of their luminosity at λ < 0.1 µm, as contributing significantly to my observables, even when their flux is not detected in the data, but simply extrapolated. I still indicate my luminosity as L_bol because, except for extremely young stars, it still represents a very good approximation to it. The advantage of using L_bol rather than the SFH is that one can get a direct sense of what the data actually show, without strongly nonlinear conversions into masses, due to the large M/L differences between different stellar populations. In fact, I would argue that comparisons with models of galaxy formation are generally more meaningful when the models, for which all quantities are known accurately, are converted into luminous observables, rather than trying to do the reverse by extracting the SFH in masses from the data.
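The restricted luminosity integral described above can be sketched as follows; lum_in_range is a hypothetical helper that integrates a template's luminosity density only over the fitted wavelength range, using the trapezoidal rule:

```python
import numpy as np

def lum_in_range(lam_um, flam, lam_min=0.1, lam_max=3.0):
    """Luminosity of a template integrated only over the wavelength range
    covered by the data, used as a proxy for the bolometric value.
    lam_um: sorted wavelengths in micron; flam: luminosity per micron."""
    w = (lam_um >= lam_min) & (lam_um <= lam_max)
    lam, f = lam_um[w], flam[w]
    # Trapezoidal rule on the selected range.
    return np.sum((f[1:] + f[:-1])/2*np.diff(lam))
```

Applying this to every template and normalizing by the total gives the luminosity fractions shown in the weight maps; for a flat unit spectrum the integral is simply the width of the fitted range, 2.9 µm.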
In the course of this study, I fitted the 3197 galaxies of my subsample (Section 4.1) with ppxf multiple times, with different levels of regularization or no regularization at all, to test the sensitivity of the results. In Fig. 5 I illustrate the effect of regularization on some high-S/N spectra. These figures, like Fig. 1, illustrate the ill-conditioning of the stellar population inversion, which prevents one from obtaining a unique solution, even from very good data. Nonetheless, the figure also illustrates the ability of the method to distinguish the striking difference between (i) galaxies that can only be described, even at high regularization (regul=100), by a single star formation event at a very localized lg Age (top three panels in Fig. 5) and (ii) galaxies that require multiple, well-separated star formation events to be described (bottom three panels in Fig. 5). The galaxies in the top panels are essentially described by a single SPS model, from 0.1 µm to 3 µm, for both spectra and photometry. This highlights the success of the SPS models in accurately predicting real galaxy spectra.
In Fig. 6 I show additional examples of ppxf fits to good-quality spectra, to give a sense of the variety of spectral morphologies and the corresponding variations in the L_bol weights distributions. In all these cases I used a high regularization (regul=100). Also here one can clearly see the striking difference between (i) the three galaxies in the top row, which can only be described as a single burst of star formation that happened at different times, and (ii) galaxies requiring multiple discrete star formation events. Star formation events appear to have a similar extent in ln Age, which would seem to imply that events in the past lasted longer than recent ones. This is more likely an artefact of our generally better ability to detect age differences between recent events.

Comparing ppxf stellar masses with other methods
The weights for different ages and metallicities produced by a fit with ppxf can be converted into stellar masses. In this section I compare the masses derived with ppxf against the stellar masses produced by other codes.
Footnote 15: A given value of the ppxf keyword regul roughly implies that neighbouring weights can differ by Δw ∼ 1/regul. As I normalize all galaxy spectra to the same average flux (average=1), setting a given regul value roughly corresponds to requiring a similar level of smoothness in the distribution of the weights.
Figure 6. More examples of ppxf fits to spectra and photometry. The meaning of the symbols is the same as in Fig. 5, but I show a single regularization (regul=100). The top three galaxies require a single star formation event, while the rest can only be modelled with multiple discrete star formation events.
For my comparisons I used the published stellar masses M_* for the LEGA-C galaxies from three stellar population codes: (i) LePhare (Arnouts et al. 2002), (ii) EAZY (Brammer et al. 2008) and (iii) Prospector (Johnson et al. 2021b). I extracted the stellar masses for the first two codes from the Farmer version of the COSMOS2020 catalogue (Weaver et al. 2022), while for the last code I used the values described in the LEGA-C DR3 paper (van der Wel et al. 2021, appendix B), as kindly provided by Arjen van der Wel. I tried to isolate the effect of the fitting methods from differences in the extrapolation of the galaxies' total luminosities. For this, with the M_* from the COSMOS2020 catalogues, I rescaled the masses by the difference in the total band luminosity between the catalogues.
Figure 7. Comparison of the ppxf stellar masses with those from Prospector (van der Wel et al. 2021) or from EAZY and LePhare (Weaver et al. 2022), for the galaxies that are common in both catalogues. The three panels show the comparison for each method. The red line is the one-to-one relation, and the black line is the best fit using LtsFit (Cappellari et al. 2013a). The x-symbols are the outliers identified by LtsFit, and the dotted lines are the selection limits. The colours show the luminosity-weighted mean stellar age ⟨lg Age⟩, smoothed by loess. The grey contours show the kernel density estimation of the galaxy distribution. I indicate the S/N and ⟨lg Age⟩ selection criteria, the number of selected galaxies, the rms scatter Δ, and an approximate relative 1σ error ε_* in M_* for each panel. I estimated the latter as ε_* ≡ Δ/√2 by assuming both M_* have the same uncertainty.
Figure 8. Same as Fig. 7 for a subsample with high spectral S/N > 10 and old ages ⟨lg Age⟩ > 9.5. As the age range is limited, the colours show here the loess-smoothed stellar velocity dispersion.
I smoothed the values with the two-dimensional Locally Weighted Regression (loess) method, as implemented in the loess package by Cappellari et al. (2013b), using the keyword rescale=True to equalize the axes of maximum/minimum variance before smoothing. I used a small smoothing parameter frac=0.1 in all plots of this paper. The loess-smoothed values are the two-dimensional equivalent of the average trend that is often shown in one-dimensional plots. The key difference is that the scatter cannot be easily shown in two dimensions together with the average trend. The scatter is better visualized using a different projection.
To estimate the scatter between two paired sets of measurements, while removing outliers, I used the LtsFit package described in Cappellari et al. (2013a, sec. 3.2), which combines the Least Trimmed Squares robust technique of Rousseeuw & Van Driessen (2006) with a least-squares fitting algorithm that allows for errors in all variables and intrinsic scatter. Instead of using a fixed σ-clipping criterion with the 'clip' keyword of the ltsfit procedure, I used an adaptive clipping that depends on the sample size. This is the value that would produce, on average, one outlier in a Gaussian distribution of the given sample size. It can be computed using the Scipy class scipy.stats.norm as clip=abs(norm.ppf(p/2)), with p = 1/N and N the sample size. For reference, N = 100 gives clip=2.58 (the default for ltsfit), N = 500 gives clip=3.09 and N = 3000 gives clip=3.59. The ltsfit procedure returns a robust estimate of the rms scatter Δ from the best-fitting relation. When the uncertainty of the two quantities I am comparing is the same, one can estimate it as ε_* = Δ/√2. In all my plots I rescaled the masses provided by all other methods to have the same median as the ppxf values, which I did not modify. This is the reason why all plots follow the one-to-one relation without any overall offset. This is done to remove the effect of differences in the assumed stellar IMF, gas loss or stellar remnants, whose investigation is outside the scope of this paper. I find that the observed scatter for all galaxies, when selected irrespective of their age or S/N, is consistent with a 1σ uncertainty in the stellar mass of about 30% for every method. This result is consistent across all six pairwise comparisons of the methods, with differences within the measurement uncertainties. However, the behaviour of the differences is markedly different as a function of the mean ages. The comparison of ppxf against Prospector (Fig. 7) shows that the scatter is smaller for older galaxies at a given mass, while the younger ones generally scatter symmetrically around the one-to-one relation. The exception is the outliers, which generally have lower masses in Prospector than in ppxf. The comparison of ppxf and LePhare is similar to that with Prospector, but with fewer low-mass outliers. However, there is no evidence for a tightening of the correlation for older models. The comparison of ppxf and EAZY, unlike the other two, shows a strong asymmetry as a function of age: older models tend to be less massive in EAZY than in ppxf, while younger models are more massive in EAZY. This asymmetry is reminiscent of the difference between Prospector and EAZY reported in Leja et al. (2019). In fact, the same age asymmetry is seen when comparing EAZY with either Prospector or LePhare.
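The adaptive clipping threshold described above can be computed directly with scipy.stats.norm:

```python
from scipy.stats import norm

def adaptive_clip(n_sample):
    """Sigma-clipping threshold expected to flag, on average, a single
    outlier in a pure Gaussian sample of the given size: the two-tailed
    Gaussian quantile for a probability p = 1/n_sample."""
    p = 1/n_sample
    return abs(norm.ppf(p/2))
```

This reproduces the reference values quoted in the text: 2.58 for N = 100, 3.09 for N = 500 and 3.59 for N = 3000.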
As suggested by Fig. 7, the scatter decreases dramatically (Fig. 8) if I compare ppxf and Prospector only for the galaxies with the oldest ages and the largest spectral S/N (which also implies the brightest photometry). Given the small age range, I coloured the galaxies by their σ_*. For this subset of galaxies, the inferred scatter of about 16% is half that of the general population, without significant trends, except again for some outliers where Prospector gives lower masses than ppxf.
When comparing stellar mass estimates of real galaxies, it is often difficult to assess the relative accuracy of different methods, because the true masses are unknown. In the next section, I address this issue, for a subsample of the LEGA-C sample, using mass determinations from stellar dynamics.

Comparing JAM dynamical with stellar population M/L
One of the sources of confusion in comparing galaxy masses from stellar populations and dynamical modelling is the ambiguity of the so-called 'dynamical mass' of a galaxy. This term does not refer to a well-defined physical quantity, because in the standard cosmological model the galaxy's total mass is largely composed of dark matter, which is difficult to constrain with the available kinematic data of limited radial coverage. The quantity that the dynamical models reliably measure is the total density profile within the spatial extent of the kinematic tracer. However, this density profile cannot be easily converted into a mass, because it depends on the choice of the integration volume and on the assumptions about the galaxy shape and orientation (Cappellari et al. 2013a, sec. 3.3.1). A more robust and convenient quantity for comparing population and dynamics is the total mass-to-light ratio (M/L)_JAM within the inner regions of a galaxy. This quantity has a weak dependence on the integration volume and galaxy inclination, and it should always be preferred for accurate comparisons.
In Section 5, I presented unbiased dynamical models of the stellar kinematics, based on the Sersic photometric models in the F814W/ACS band, for all the galaxies in the LEGA-C sample with available velocity dispersion. Previous studies of nearby galaxies using high-resolution integral-field stellar kinematics have demonstrated that this kind of model can reliably estimate the total dynamical mass-to-light ratios (M/L)_JAM in the central regions of galaxies with uncertainties of about 5% (Cappellari et al. 2006, 2013a; Shetty et al. 2020b; Zhu et al. 2023a). Importantly, these studies have also shown that precise and unbiased (M/L)_JAM can be obtained using models where the total mass distribution follows the luminous one. In fact, these mass-follows-light models are more robust and precise than those that explicitly separate the luminous and dark matter, when the main goal is to measure the total (M/L)_JAM (Cappellari et al. 2013a; Zhu et al. 2023b).
An additional complication is that the dynamics is sensitive to all mass components: stellar, gas, and dark matter, while the population only measures the stellar one. However, detailed nearby studies have shown that for the passive galaxy population, dark matter contributes only about ≈ 10% of the total mass within 1 R_e (Cappellari et al. 2013a; Zhu et al. 2023b), while the gas mass has an even smaller contribution (e.g. Young et al. 2011). This allows us to assume that the dynamical (M/L)_JAM accurately approximates the stellar one.
Figure 9. The black line shows the best linear fit using LtsFit, which strongly deviates from the one-to-one relation. The x-symbols mark the outliers identified by LtsFit, and the dotted lines mark the selection limits. The colours indicate the stellar velocity dispersion σ_*, smoothed by loess. The grey contours indicate a kernel density estimate of the galaxies. For each panel, I show the signal-to-noise ratio S/N and mean logarithmic age ⟨lg Age⟩ selection criteria, the number of selected galaxies N, the root-mean-square scatter Δ, and an approximate relative error ε_* in the stellar mass M_* for both methods, estimated as ε_* ≡ Δ/√2 by assuming both methods have the same uncertainty.
To compare with the dynamical (M/L)_JAM I need the same quantity from the stellar population, (M/L)_pop. Having the full spectra from the stellar population models, one can compute (M/L)_pop in any band. However, since I only have total stellar masses M_* from the Prospector LEGA-C catalogue, I divide M_* by the total luminosity in the SUBARU g-filter from the Muzzin et al. (2013) catalogue. This assumes that (M/L)_pop is constant over the full galaxy, which is likely a decent approximation for passive galaxies. Using photometry consistent with the mass derivation ensures no spurious differences in mass and luminosity extrapolation. However, differences between the g-band and F814W filters may introduce a small systematic offset in the M/L comparison; in any case, I am interested in relative uncertainties more than absolute offsets. The M/L is usually reported in solar units. For this, I assume a solar absolute magnitude M_⊙ = 5.11 mag in the AB system from Willmer (2018) and report M/L in units of M⊙/L⊙, given that F814W approximately corresponds to the rest-frame SDSS g-band filter at the median redshift z ≈ 0.8 of my sample. This normalization is a constant and does not affect the comparison. The rest-frame wavelength of the filter varies by up to ≈ 10% within the redshift range; however, this shift is the same for both the dynamical and population M/L and does not affect the scatter.
I compare the (M/L)_JAM from dynamical modelling and the (M/L)_pop from stellar population synthesis using a sample of old galaxies with high-quality spectra in Fig. 9. I only use the passive population for this comparison. I adjust the Prospector (M/L)_pop by adding 0.19 dex to match the JAM median. I also convert the ppxf (M/L)_pop from the Salpeter (1955) IMF (fig. 4). I do not change the ppxf values after this conversion.
The main findings from Fig. 9 are: (i) The ppxf $(M/L)_{\rm pop}$ values are more consistent with the $(M/L)_{\rm JAM}$ values than the Prospector ones. The scatter is 0.090 dex for ppxf and 0.117 dex for Prospector. This suggests that adding spectra to ppxf improves the mass estimates.
(ii) The ppxf $(M/L)_{\rm pop}$ comparison does not have the low-$(M/L)$ outliers that appear in the Prospector comparison, indicating more reliable $(M/L)_{\rm pop}$ or $M_\ast$ estimates in ppxf with spectra than in Prospector with photometry only.
(iii) Both ppxf and Prospector show a similar trend in the $(M/L)_{\rm pop}-(M/L)_{\rm JAM}$ relation, which clearly deviates from a one-to-one relation. The trend implies that the galaxies with higher $\sigma_\ast$ have more mass from dynamics than from population models at a fixed IMF. The variation is comparable to the mass difference between a Chabrier and a Salpeter IMF. This trend is consistent with previous studies that suggested a non-universal IMF based on the dynamics and population of nearby (Cappellari et al. 2012; Li et al. 2017; Shetty et al. 2020a) and distant galaxies (Shetty & Cappellari 2014). Whatever the origin of this trend, this comparison shows that it is robust across different samples, redshifts and methods.
From the cross-comparisons between the scatter observed when comparing different estimates of the stellar masses, one can infer the accuracy of each individual technique, assuming as an approximation that it is constant. In fact, if we define $\varepsilon_{\rm method}$ as the uncertainty of 'method', then the squared uncertainties between each pair of methods add linearly as $\Delta^2_{\rm A-B}=\varepsilon_{\rm A}^2+\varepsilon_{\rm B}^2$ (37), where the scatter $\Delta$ was measured in Fig. 9. Solving the system of equations (37) for each pair of methods gives the $1\sigma$ relative uncertainty of the three different methods on this dataset. This result shows that, at least for the limited case of the old population, where we can assume we know the 'true' stellar mass from galaxy dynamics, the inclusion of spectra in ppxf gives masses significantly more accurate than those using Prospector with photometry alone. This is encouraging, but of course, it should not be interpreted as ppxf being more accurate than Prospector, given that the latter could fit spectra as well and this would likely lead to comparable accuracy to ppxf. However, these extra comparisons are beyond the scope of this paper.
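The algebra behind this estimate is simple to verify numerically. The sketch below solves the three pairwise relations of equation (37) for the individual uncertainties; the numbers are invented placeholders for illustration, not the values measured in the paper:

```python
import numpy as np

def method_errors(d_ab, d_ac, d_bc):
    """Solve d_xy**2 = e_x**2 + e_y**2 for the individual 1-sigma
    uncertainties (e_a, e_b, e_c) of three methods A, B, C, given the
    rms scatters of their three pairwise cross-comparisons."""
    e_a2 = (d_ab**2 + d_ac**2 - d_bc**2) / 2.0
    e_b2 = (d_ab**2 + d_bc**2 - d_ac**2) / 2.0
    e_c2 = (d_ac**2 + d_bc**2 - d_ab**2) / 2.0
    return np.sqrt([e_a2, e_b2, e_c2])

# Round trip with invented uncertainties of 0.05, 0.07 and 0.10 dex:
e_true = np.array([0.05, 0.07, 0.10])
d_ab = np.hypot(e_true[0], e_true[1])
d_ac = np.hypot(e_true[0], e_true[2])
d_bc = np.hypot(e_true[1], e_true[2])
# method_errors(d_ab, d_ac, d_bc) recovers e_true
```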
In Fig. 10 I also show the comparison between the stellar dynamics and stellar population $M/L$ for the full set of galaxies with high $S/N$, regardless of their age. This plot cannot be used to infer the accuracy of the mass estimates. In fact, detailed modelling of the MaNGA survey has shown that younger galaxies contain significant fractions of gas and dark matter (Zhu et al. 2023a), making the mass estimate from the stellar population significantly lower than the dynamical one, as observed.

Stellar population scaling relations
As I am focusing on observable trends, I define luminosity-weighted population quantities, summed over the template weights, as $\langle x\rangle\equiv\sum_n w_n x_n/\sum_n w_n$, where $w_n$ is the light weight of the $n$-th template and $x_n$ is its $\lg{\rm Age}$ or $[M/H]$. In Fig. 11 I show the distribution of ages, metallicities and the Sersic (1968) index $n_{\rm Ser}$ on the $(M_{\rm JAM}, R_{\rm e}^{\rm maj})$ plane, where the dynamical mass $M_{\rm JAM}$ closely approximates the total stellar mass $M_\ast$, and the half-light radius $R_{\rm e}^{\rm maj}$ is the semi-major axis of the isophote containing half of the total light of the Sersic (1968) model (see e.g. Lu et al. 2023). I also show, to guide the eye, the local "zone of avoidance" at high densities (Cappellari et al. 2013b, eq. 4), which I scaled down by a factor of 1.6× in $R_{\rm e}^{\rm maj}$, roughly consistent with the general trends of decreasing galaxy sizes with redshift (e.g. van der Wel et al. 2014). The Sersic index $n_{\rm Ser}$ also approximately follows the distribution of ages and metallicity, in the sense that passive galaxies tend to have $\lg n_{\rm Ser}\gtrsim0.4$ or $n_{\rm Ser}\gtrsim2.5$ (red colour in the right panel of Fig. 11). This $n_{\rm Ser}$ value is the one sometimes adopted to separate early-type from late-type galaxies (e.g. Bell et al. 2003; Shen et al. 2003). In the local Universe, below the stellar mass $\lg(M_\ast/{\rm M}_\odot)\lesssim11.5$ the trend of $n_{\rm Ser}$ is due to a sequence of increasing bulge fraction, while above $\lg(M_\ast/{\rm M}_\odot)\gtrsim11.5$ is the region of slow rotators with cores (see review by Cappellari 2016, fig. 23).

Figure 11. The grey contours are a kernel density estimate of the galaxy distribution (using scipy.stats.gaussian_kde). Galaxy properties mainly follow lines of constant stellar velocity dispersion, indicated by the dashed grey lines for $\sigma_\ast=50$, 100, 200, 300, 400, 500 km s$^{-1}$ from left to right, while mass is not a good predictor of their stellar population. However, above a stellar mass $\lg(M_\ast/{\rm M}_\odot)\gtrsim11.5$ (blue vertical line), all galaxies are old (quenched), have a high metallicity and a large Sersic index. The thick red line is the "zone of avoidance" for nearby galaxies from Cappellari et al. (2013b), scaled down by a factor of 1.6× to account for redshift evolution.

Figure 12. Luminosity-weighted $\langle\lg{\rm Age}\rangle$ and $\langle[M/H]\rangle$ vs stellar velocity dispersion $\sigma_\ast$ for the LEGA-C galaxies. The top panels show the age distribution, coloured by loess-smoothed metallicity. The grey horizontal band is the Universe age range between $0.6<z<1$ and the red dashed line is the present age. The bottom panels show the metallicity distribution coloured by loess-smoothed age. In all panels, the grey contours are the kernel density estimate of the galaxies' distribution. From left to right I show results using the fsps, galaxev and bpass SPS models, all with regul=10 in ppxf. There is a clear bend of the (Age, $\sigma_\ast$) trend around $\lg(\sigma_\ast/{\rm km\,s^{-1}})\approx2.3$. At fixed $\sigma_\ast$ metallicity depends on age, with younger galaxies having lower $\langle[M/H]\rangle$.

Figure 13. The attenuation $A_V$ in mag is plotted against the UV slope $\delta$ and coloured by loess-smoothed luminosity-weighted ages. The grey contours are the kernel density estimate of the galaxies' distribution. At every $\delta$ the youngest galaxies have larger attenuation, while the largest $A_V$ are only observed around the Calzetti et al. (2000) attenuation curve, which has $\delta=0$. From left to right I show results using the fsps, galaxev and bpass SPS models. The first two are highly consistent, but the latter, while still qualitatively similar, is quantitatively very different.
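The luminosity-weighted averages defined above can be computed directly from the light weights assigned to the templates; a minimal sketch, where the toy grid and weights are invented for illustration:

```python
import numpy as np

def lw_average(weights, quantity):
    """Luminosity-weighted average of a template property (e.g. lg Age or
    [M/H]), given the light weights assigned to the templates."""
    w = np.asarray(weights, dtype=float)
    return np.sum(w * np.asarray(quantity, dtype=float)) / np.sum(w)

# Toy example: 10 Gyr and 1 Gyr populations with light fractions 0.7 and 0.3
lg_age = np.log10([10e9, 1e9])   # [10.0, 9.0]
w = [0.7, 0.3]
# lw_average(w, lg_age) gives <lg Age> = 9.7
```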
This result has a rather long history, both locally and at $z\sim1$ (Chauke et al. 2018, 2019; Beverage et al. 2021; Barone et al. 2022; Hamadouche et al. 2022; Tacchella et al. 2022) but had not been seen so cleanly at this redshift before LEGA-C. For nearby galaxies, Kauffmann et al. (2003) clearly noted that the galaxy population correlates better with the mass surface density $\Sigma$ than with $M_\ast$. It was later observed that $\Sigma$, or even better the virial predictor $\sigma_{\rm vir}\propto\sqrt{M_\ast/R_{\rm e}}$ of the stellar velocity dispersion, inferred from photometry alone, remains a better predictor of galaxy colours out to $z\approx3$ (Franx et al. 2008; Bell et al. 2012). However, it was still unclear at that time how accurately the photometric estimates were able to predict the actual stellar masses and the velocity dispersion of the stars. To address this issue I used masses from dynamical models, and $\sigma_\ast$ from good quality integral-field stellar kinematics, rather than photometric estimates. In Cappellari (2011) I clearly concluded that "$\sigma_\ast$ (not $\Sigma_{\rm e}$ or $M_\ast$) is the best predictor of galaxy properties" (see also Cappellari et al. 2013b). These early results were confirmed by several papers using larger samples and stellar kinematics of ever-increasing quality (e.g. Wake et al. 2012; McDermid et al. 2015; Scott et al. 2017; Li et al. 2018; Barone et al. 2018, 2020). In parallel, Cheung et al. (2012) and Fang et al. (2013) introduced the use of the central surface density $\Sigma_1$ from photometry, within a fixed radius of 1 kpc, to predict quenching. A review is given in Cappellari (2016, see fig. 22).
Given that in Fig. 11 the main stellar population trends follow $\sigma_\ast$, in Fig. 12 I show how the luminosity-weighted ages and metallicities depend on $\sigma_\ast$ in the LEGA-C sample. The trends resemble quite closely the local results from the best integral-field spectroscopy from both SAMI (Scott et al. 2017) and MaNGA (Li et al. 2018).
However, the top panels of Fig. 12 additionally illustrate the clear dependency between age and $[M/H]$ at fixed $\sigma_\ast$: the population of old galaxies at large $\sigma_\ast$ is characterized by a larger metallicity than their younger counterparts at the same $\sigma_\ast$. Very clear is the bend in the $(\sigma_\ast,{\rm Age})$ distribution around $\lg(\sigma_\ast/{\rm km\,s^{-1}})\approx2.3$ (also see Chauke et al. 2018). The results are very consistent between the fsps and galaxev SPS models. It is reassuring to see that the ridge of the age distribution in the top panels converges towards the age of the Universe at that redshift (grey horizontal band), while being slightly younger for galaxev vs fsps. I also ran models where I restricted the age of each galaxy to the Universe's age at its redshift, as generally done for local studies. All results were qualitatively similar, except for the obvious truncation and corresponding clustering of the age values at the maximum Universe age $\approx6.6$ Gyr at $z\approx0.8$, which is indicated by a grey band in Fig. 12. The bpass results are qualitatively in agreement but show substantial quantitative differences, especially in the $\sigma_\ast-[M/H]$ trend. Overall, this figure confirms the quality and consistency of these global results compared to local surveys. Fig. 13 shows the distribution of the two dust attenuation parameters $A_V$ and $\delta$ (Section 6.3) coloured by mean stellar age. One can see that at every UV slope $\delta$ the youngest galaxies have the strongest attenuation, except for the largest $A_V$. Moreover, the largest attenuations are only observed at large $\delta$, close to the Calzetti et al. (2000) slope $\delta=0$. Note, however, that there is a degeneracy between attenuation and continuum normalization near the upper limit of $\delta$. Results are extremely consistent for the fsps and galaxev SPS models, but again the bpass results look quite different, although they all qualitatively agree.
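The two-parameter attenuation discussed above can be illustrated with a Calzetti et al. (2000) curve whose slope is tilted by a power-law deviation $\delta$, in the style of Noll/Salim-type parametrizations ($\delta=0$ recovers the pure Calzetti curve). This is a generic sketch of that family of curves, not necessarily the exact expression implemented in ppxf:

```python
import numpy as np

R_V = 4.05  # Calzetti et al. (2000)

def k_calzetti(lam_um):
    """Calzetti et al. (2000) attenuation curve k(lambda), lambda in micron."""
    lam = np.asarray(lam_um, dtype=float)
    x = 1.0 / lam
    return np.where(
        lam >= 0.63,
        2.659 * (-1.857 + 1.040 * x) + R_V,
        2.659 * (-2.156 + 1.509 * x - 0.198 * x**2 + 0.011 * x**3) + R_V,
    )

def attenuation(lam_um, a_v, delta=0.0):
    """Attenuation A(lambda) in mag: Calzetti curve normalized to A_V in the
    V band, tilted by the UV-slope deviation delta (delta=0 is pure Calzetti)."""
    return a_v * (k_calzetti(lam_um) / R_V) * (np.asarray(lam_um) / 0.55) ** delta
```

By construction `attenuation(0.55, a_v)` is close to `a_v`, and a larger $\delta$ makes the curve greyer in the UV, which is what produces the degeneracy with the continuum normalization mentioned above.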

Non-parametric star formation histories
Fig. 14 shows the non-parametric star formation history of the galaxies in the LEGA-C sample as a function of key galaxy parameters. For this plot I sorted the quantity of interest (e.g. $\sigma_\ast$) and constructed 30 bins in that quantity, each containing the same number of about 100 galaxies, in such a way that different bins have the same level of shot noise. I show the dependency of the SFH, parametrized by the bolometric light $L_{\rm bol}$ fraction contributed in the spectrum by stellar populations of different ages, as a function of the following parameters:

Figure 14. In all panels, the colours represent the SFH recovered with ppxf (with regul=100) parametrized by the bolometric light $L_{\rm bol}$ fraction contributed by populations of different ages. The SFHs are shown as a function of key galaxy parameters: (1) the stellar velocity dispersion $\sigma_\ast$; (2) the dynamical mass $M_{\rm JAM}$, which well approximates $M_\ast$; (3) the dynamically-determined average density $\Sigma_1^{\rm JAM}$ inside a circle of radius 1 kpc centred on the galaxy; (4) the luminosity-weighted metallicity $\langle[M/H]\rangle$ and (5) the exponent $n_{\rm Ser}$ of a Sersic profile fitted to the galaxy photometry (see text for definitions). Average values are computed for equal bins in $\sigma_\ast$ (or the other parameters) of 100 galaxies each. Galaxies with the largest $\sigma_\ast$, $M_\ast$, $\langle[M/H]\rangle$ or $n_{\rm Ser}$ on average experienced their main star formation event long ago, but the typical age for the bulk of their star formation increases with $\sigma_\ast$ (or the other parameters), as indicated by the slanted black dashed wavy lines in the left and right panels. For lower $\sigma_\ast$, galaxies can form stars at any time until the present. The plots show a beautifully clear and sharp quenching boundary at $\lg(\sigma_\ast/{\rm km\,s^{-1}})\approx2.3$, or $\lg(\Sigma_1^{\rm JAM}/{\rm M}_\odot\,{\rm kpc}^{-2})\approx9.9$, or $\langle[M/H]\rangle\approx-0.2$ for fsps and $\langle[M/H]\rangle\approx0.0$ for galaxev, or $\lg n_{\rm Ser}\approx0.5$, as indicated by the vertical yellow dashed wavy lines. There is no sharp boundary as a function of galaxy mass, but the transition is gradual and roughly happens around $\lg(M_\ast/{\rm M}_\odot)\approx11.5$. Note the generally good agreement between the results from the fsps and galaxev SPS models. The results using the bpass models look problematic, with spurious structures at specific ages, which are most likely artefacts of the models.

(i) Stellar velocity dispersion: the plots show a clear trend of SFH with $\sigma_\ast$, as expected from the trends between $\sigma_\ast$ and age. What is new is the striking sharpness of the boundary between a regime $\lg(\sigma_\ast/{\rm km\,s^{-1}})\gtrsim2.3$ (or $\sigma_\ast\gtrsim200$ km s$^{-1}$), above which the spectra are dominated by a population nearly as old as the Universe at that redshift, without evidence for subsequent star formation events, and below which suddenly galaxies have star formation at any time until the present. Both the fsps and galaxev SPS models indicate that galaxies still form the bulk of their stars at old times, but this age increases with $\sigma_\ast$ by roughly a factor of 6-10 for a variation in $\sigma_\ast$ by a factor of 10. In the case of the galaxev models, the SFHs indicate ongoing star formation in the lowest $\sigma_\ast$ bins, while this is less so for the fsps models.
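The equal-count binning used for Fig. 14 can be sketched in a few lines; the galaxy sample below is synthetic, just to show the mechanics:

```python
import numpy as np

def equal_count_bins(values, n_bins=30):
    """Group objects into n_bins bins of equal count, sorted by `values`
    (e.g. sigma_star), so that every bin has the same level of shot noise.
    Returns a list of index arrays, one per bin."""
    order = np.argsort(values)
    return np.array_split(order, n_bins)

rng = np.random.default_rng(0)
sigma = rng.lognormal(mean=5.0, sigma=0.3, size=3000)  # toy sigma_star sample
bins = equal_count_bins(sigma, n_bins=30)
# 30 bins of 100 galaxies each, with monotonically increasing bin medians
```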
(ii) Density within 1 kpc: this panel shows the same trend as the previous one, but with different units. This is because $\Sigma_1^{\rm JAM}$ is closely related to $\sigma_\ast$ (see Fang et al. 2013), especially when using JAM dynamical masses instead of stellar population masses.
(iii) Galaxy mass: contrary to the dependency of SFH on $\sigma_\ast$, there is no sharp transition as a function of stellar mass, but rather a gradual trend. Only galaxies more massive than $\lg(M_\ast/{\rm M}_\odot)\gtrsim11.5$ are characterized by a single event of star formation at old times.
(iv) Galaxy metallicity: this panel shows that $[M/H]$ is as good as $\sigma_\ast$ at predicting the boundary between the region of fully quenched galaxies and those that can have multiple star formation events. Here it happens at $[M/H]\approx-0.2$ for the fsps and $[M/H]\approx0.0$ for the galaxev models, which are systematically shifted to larger values of metallicity.

Figure 15. Joint star-formation history (SFH) and metallicity distributions. Each panel shows the average distribution of weights recovered with ppxf (with regul=10) for 4 bins containing 800 LEGA-C galaxies sorted by their stellar velocity dispersion $\sigma_\ast$ (as indicated in the plot titles). The weights represent the bolometric luminosity $L_{\rm bol}$ contributed by populations of different ages and metallicities. The top row shows the results obtained with the fsps SPS models. The second row still uses fsps but adopts the COSMOS2020 (Weaver et al. 2022) instead of my default UltraVISTA photometric catalogue (Muzzin et al. 2013). The third row is derived using the galaxev SPS models and the bottom one with the bpass models. One can see that the fsps and galaxev SPS provide qualitatively consistent results even for some of the main "blobs" in the distributions. Using an alternative photometric catalogue has virtually no effect on the results. At large $\sigma_\ast$ galaxies on average quenched long ago and their stars have high metallicity. At progressively lower $\sigma_\ast$ the age of the bulk of the star formation decreases, while still being dominated by high-metallicity stars. However, the population is polluted by fresh accretion events of lower metallicity and a range of accretion times. As previously noted, the results using the bpass SPS models are significantly different and should not be trusted without further analysis.
(v) Sersic index: this panel shows the SFH as a function of the Sersic (1968) exponent $n_{\rm Ser}$. The boundary between fully quenched galaxies and galaxies that can have multiple star formation events happens here at $\lg n_{\rm Ser}\approx0.5$ or $n_{\rm Ser}\approx3.2$. Remarkably, this boundary is nearly as clean as that with $\sigma_\ast$.
In all panels, the results using the bpass SPS models are again quite different from the other two. They show significant structure at specific ages and a less clear quenching boundary. The structure seen using the bpass models is likely an artefact of the SPS rather than a real conspiracy in the star formation events.
The trends for different galaxy parameters, and the overall consistency between the four panels, for both the fsps and galaxev SPS models, can be understood by looking at Fig. 11 and noting that there is a region $\lg(M_\ast/{\rm M}_\odot)\gtrsim11.5$ above which all galaxies are quenched, have high metallicity and have a large Sersic index. Below that mass, galaxies follow a trend of increasing bulge fraction, which increases $\sigma_\ast$ and metallicity and makes galaxies more likely to quench. These results parallel those which have been extensively reported for local galaxies (see review by Cappellari 2016). What is new here is the clarity and sharpness of the empirical evidence of the boundary to quenching, and the fact that this can be detected so well at a time when the Universe was half its current age.
A rapid cessation of star formation for galaxies above a given critical value of $\sigma_\ast$, or of some other estimate of the central stellar density, varying with redshift, has often been invoked to explain the evolution of galaxy parameters over time (e.g. van Dokkum et al. 2015). An excellent review of the empirical evidence and models of a quenching boundary in galaxies is given in Chen et al. (2020, sec. 1). The physical mechanism for quenching is still under debate. Very briefly, one can group the main proposed theories into two broad classes: (i) "halo quenching", where the gas gets shock-heated when falling into the gravitational potential of massive dark halos (e.g. Dekel & Birnboim 2006); and (ii) "active galactic nucleus (AGN) feedback", where a jet from the supermassive black hole either ejects the gas from its host galaxy (e.g. Silk & Rees 1998) or prevents it from infalling (e.g. Bower et al. 2006; Croton et al. 2006). See the review by Somerville & Davé (2015). The panels in Fig. 14 provide a beautiful empirical confirmation of the theoretical assumptions that are made in many of those models. Fig. 15 presents the non-parametric joint luminosity distribution of the ages and metallicities of the stellar populations of galaxies in four different bins of $\sigma_\ast$. As in Fig. 14, also for this figure I sorted galaxies as a function of their $\sigma_\ast$ and constructed four groups, of about 800 galaxies each, to ensure all panels have the same level of shot noise. Like before, I compare all three SPS models (fsps, galaxev and bpass). In addition, in the second row of Fig. 15 I show the result when using the fsps model but adopting the photometric measurements from COSMOS2020 (Weaver et al. 2022) instead of the UltraVISTA catalogue (Muzzin et al. 2013). The first and second rows are barely distinguishable, which shows that any possible difference in the photometric calibration has a completely insignificant effect on the results.
The distributions from both fsps and galaxev are highly consistent, almost at the level of the individual "blobs", except for slightly younger ages and higher metallicities for the galaxev vs the fsps models.

Non-parametric joint SFH and metallicity distributions
The plots indicate that even galaxies with low $\sigma_\ast$ are still dominated by stars with high metallicity, but this is diluted by extra lower-metallicity populations acquired at different times. I should stress that the relatively smooth distributions in the maps are averages of many galaxies and should not be interpreted as the evolution of one individual galaxy, which is generally characterized by discrete star formation events. Moreover, not every feature of the maps is robust against variations in the data and SPS models.
The overall observed distribution could be interpreted in the context of the two phases of galaxy formation (e.g. Oser et al. 2010). According to this scenario, the formation of galaxies has a "two-phase" nature: a fast initial phase at $z\gtrsim2$ where "in situ" stars are created inside the galaxy from cold gas that falls in, and a longer phase since $z\lesssim3$ where "ex situ" already-formed stars are mainly acquired. In this phase, large systems increase their mass and radius by absorbing smaller stellar systems that were formed very early ($z\gtrsim3$) outside of the central galaxy's virial radius, or by smooth gas accretion from cosmological filaments (see Naab & Ostriker 2017, for a review).
Specifically, the old, high-metallicity component observed in Fig. 15 could be interpreted as the relic of the in-situ formation, which was quickly metal-enriched, while the lower metallicity would correspond to either acquired stars, previously formed in smaller stellar systems, or to star formation fed by accretion from low-metallicity cosmological filaments. The accreted component is only present below the critical "quenching boundary" of $\sigma_\ast\lesssim200$ km s$^{-1}$. Below that boundary, accretion can continue throughout the galaxies' evolution. The high-metallicity old peak is visible for all four subsets of $\sigma_\ast$, but its age decreases with decreasing $\sigma_\ast$. This age trend in the old-age peak is the same already pointed out in Fig. 14.
The LEGA-C spectra I analysed are not spatially resolved, but a similar analysis of spatially-resolved integral-field spectroscopic data from the MaNGA survey shows that, in low-$\sigma_\ast$ galaxies in the nearby Universe, the oldest, higher-metallicity component is associated with the galaxy bulge, while low-metallicity gas accretion happens in the disk (e.g. Lu et al. 2023).
As expected, the bpass models show again quite different results, with a markedly different metallicity distribution. As commented earlier, the results from this model should be treated with caution as they are likely dominated by spurious unknown effects in the models.
A caveat on these results on the metallicity distribution, which also affects other similar results on metallicity determinations from galaxy stellar spectra, is that the signature of metallicity variations becomes weaker at younger ages, where the $S/N$ of the data also generally decreases. This can introduce possible systematic effects on metallicity trends. To exclude the effect of $S/N$, I verified that all results remain unchanged if I restrict the analysis to the 873 galaxies with $S/N>20$ and even, at coarser resolution, for the subset of 126 galaxies with $S/N>40$. It would still be valuable to compare the reported metallicity trends, e.g. with those inferred from gas tracers from similar data.

SUMMARY
In the first half of this paper, I described some modifications to the ppxf method (Cappellari 2017), which is used to extract the stellar and gas kinematics, as well as the stellar population of galaxies. First, I described a novel constrained least-squares optimization algorithm that ppxf has been using for the past few years. Then I outlined the changes I made to ppxf to be able to fit photometric data together with the usual full-spectrum fitting. I also described some other minor changes.
In the second half of the paper, I presented an application of ppxf to the extraction of non-parametric star formation histories and metallicity distributions for a sample of 3200 galaxies at redshift $0.6<z<1$ with spectroscopy from the LEGA-C survey DR3 (van der Wel et al. 2021), and with 28-band photometric measurements covering from the far-ultraviolet (0.1 μm) to the near-infrared (3 μm) from either the UltraVISTA (Muzzin et al. 2013) or the COSMOS2020 catalogues (Weaver et al. 2022). I also constructed JAM dynamical models (Cappellari 2008, 2020) for all galaxies with measured stellar dispersion $\sigma_\ast$ and available Sersic profile fits to the photometry.
For this study, I used and compared three stellar population synthesis (SPS) models satisfying some criteria of age and wavelength coverage. This led to my selection of the fsps (Conroy et al. 2009; Conroy & Gunn 2010), galaxev (Bruzual & Charlot 2003) and bpass (Stanway & Eldridge 2018; Byrne et al. 2022) SPS models.
I compared the dynamical masses from JAM against the stellar masses from the different stellar-population fitting methods. I found that ppxf with photometry and spectra provides more accurate masses than the other methods with photometry alone, as one would have expected.
I found that ppxf on these data reveals a striking difference between galaxies that are consistent with only a single star formation event and those that require multiple bursts of star formation.
I constructed scaling relations for the global stellar population parameters and found a remarkable similarity, but even clearer trends, between these results at ≈ 0.8 and those from the latest spectroscopic surveys in the nearby Universe. This gives some confidence in the meaningfulness of the results and highlights the quality of the spectrophotometric data.
Finally, I explored the non-parametric star formation histories (SFH) and the joint SFH and metallicity $[M/H]$ distributions. I found that the data indicate, on average over many galaxies, a remarkably sharp quenching boundary for the cessation of star formation, at a stellar velocity dispersion $\lg(\sigma_\ast/{\rm km\,s^{-1}})\approx2.3$ ($\sigma_\ast\approx200$ km s$^{-1}$), or equivalently at an average mass density within 1 kpc $\lg(\Sigma_1^{\rm JAM}/{\rm M}_\odot\,{\rm kpc}^{-2})\gtrsim9.9$ ($\Sigma_1^{\rm JAM}\gtrsim7.9\times10^9$ M$_\odot$ kpc$^{-2}$), or at a metallicity $[M/H]\approx-0.1$ (with some variation dependent on the adopted SPS model), or at a Sersic (1968) index $\lg n_{\rm Ser}\approx0.5$ ($n_{\rm Ser}\approx3.2$). As expected, the transition is more gradual as a function of stellar mass. This abrupt quenching boundary has been invoked by several models of galaxy formation. These data provide some of the cleanest empirical evidence for it to date.
The joint age-metallicity distribution appears to support the two-phase scenario of galaxy evolution by revealing the relic of an old, quickly-formed, high-metallicity component and, below the quenching boundary $\sigma_\ast\lesssim200$ km s$^{-1}$, multiple events of lower-metallicity accretion. This paper only scratches the surface of what can be done with this dataset and with similar ones that are being acquired at comparable and higher redshifts. I have not explored e.g. obvious dependencies between SFH and stellar kinematics or environment (e.g. Cole et al. 2020; Sobral et al. 2022). Comparisons with galaxy formation models should be performed in the space of observables rather than using stellar masses, which are empirically more uncertain. A similar analysis at higher redshift can reveal the onset and variation of the quenching boundary, which is a key but still quite uncertain parameter in galaxy formation models. James Webb Space Telescope (JWST) data are ideal to extend this kind of study to higher redshift.