Surrogate modelling the Baryonic Universe I: The colour of star formation

The spectral energy distribution of a galaxy emerges from the complex interplay of many physical ingredients, including its star formation history (SFH), metallicity evolution, and dust properties. Using GALAXPY, a new galaxy spectral prediction tool, and SFHs predicted by the empirical model UNIVERSEMACHINE and the cosmological hydrodynamical simulation IllustrisTNG, we isolate the influence of SFH on optical and near-infrared colours from 3200 to 10800 \r{A} at z=0. By carrying out a principal component analysis, we show that physically-motivated SFH variations modify galaxy colours along a single direction in colour space: the SFH-direction. We find that the projection of a galaxy's present-day colours onto the SFH-direction is almost completely regulated by the fraction of stellar mass that the galaxy formed over the last billion years. Together with cosmic downsizing, this results in galaxies becoming redder as their host halo mass increases. We additionally study the change in galaxy colours due to variations in metallicity, dust attenuation, and nebular emission lines, finding that these properties vary broad-band colours along distinct directions in colour space relative to the SFH-direction. Finally, we show that the colours of low-redshift SDSS galaxies span an ellipsoid with significant extent along two independent dimensions, and that the SFH-direction is well-aligned with the major axis of this ellipsoid. Our analysis supports the conclusion that variations in star formation history are the dominant influence on present-day galaxy colours, and that the nature of this influence is strikingly simple.


INTRODUCTION
The spectral energy distribution (SED) of a galaxy encodes detailed information about the physics of galaxy formation, as star formation history (Heavens et al. 2000), metallicity evolution (Tojeiro et al. 2007), dust properties (Conroy et al. 2010), initial mass function ), nebular emission (Wilkins et al. 2013), as well as other galaxy properties, act in concert to shape observed SEDs. Due to these correlations, inferring the physical properties of individual galaxies is a long-standing challenge in astronomy (Kauffmann et al. 2003;Acquaviva et al. 2011;Conroy 2013;Chevallard & Charlot 2016). In this work, we study how the physical ingredients of galaxy formation manifest in photometric observations of galaxies, with a special focus on the influence of star formation history (SFH) on broad-band optical and near-infrared galaxy colours.
The relationship between SFH and colours has been studied previously in the literature using parametric approaches, (Iyer & Gawiser 2017;Carnall et al. 2019), non-parametric approaches Iyer et al. 2019), simulations (Nelson et al. 2018b), ★ E-mail: jchavesmontero@anl.gov and semi-analytic models (Pacifici et al. 2012). Here, we leverage the galaxy evolution model (Behroozi et al. 2019) and the IllustrisTNG cosmological hydrodynamical simulation (Pillepich et al. 2018a) as generators of SFHs that are wellmotivated both physically and observationally. Our aim is to identify the characteristic signature(s) that SFH imprints upon the statistical distribution of galaxy colours.
We start by studying the properties of SFHs drawn from and IllustrisTNG. Even though SFHs predicted by these models present different shapes and levels of burstiness, we find that these SFHs share a very important property: galaxies in more massive haloes reach the peak of their SFH earlier, present a larger stellar mass, and quench faster. This well-known effect is commonly known as downsizing (e.g., Cowie et al. 1996). There is plenty of observational evidence for downsizing (e.g., Fontanot et al. 2009, and references therein), which is a general prediction of structure formation theories, hydrodynamical simulations (e.g., Davé et al. 2016), and abundance matching (Conroy & Wechsler 2009).
After getting rid of unwanted correlations between the SFH and other galaxy features, we use , a new open-source python-based galaxy spectral prediction tool, and SFHs drawn from and IllustrisTNG to produce broad-band colours that isolate the influence of physically-motivated SFH variations. We then carry out a Principal Component Analysis (PCA) of these colours to study how physically motivated SFH variations, as well as changes in other galaxy properties, manifest in observations of broad-band optical and near-infrared colours. PCA has been widely used in the literature in closely related contexts, such as spectral classification of galaxies (Connolly et al. 1995) and quasars (Yip et al. 2004), stellar mass estimation (Chen et al. 2012), K-corrections (Blanton & Roweis 2007), and inferring spectra from optical imaging (Kalmbach & Connolly 2017). Finally, we seek to determine the galaxy properties with the strongest influence on galaxy colours by carrying out a PCA on the colours of a volume-limited sample of galaxies from the Sloan Digital Sky Survey (SDSS, York et al. 2000;Dawson et al. 2013). Our work complements the results shown in Pacifici et al. (2015), who studied the impact of various galaxy properties on colours.
This work is the first in a series building a new approach to simulation-based forward-modelling of the galaxy-halo connection: Surrogate modelling the Baryonic Universe (SBU). Our present scope is somewhat pedagogical; we demarcate the specific influence of star formation history on galaxy colours to improve our understanding of the degeneracies between model ingredients. One of the main conclusions of this work is that the colours of a galaxy primarily depend upon the fraction of stellar mass formed over the last billion years. Taking advantage of this important simplification, in a companion paper to the present work we build a computationally efficient surrogate model for the prediction of broad-band galaxy colours. These two papers set up the basis of SBU, a model that we will use to produce synthetic skies by populating cosmological simulations with galaxies. Forward-modelling galaxy catalogues is essential for ongoing experiments such as the Dark Energy Survey 1 (DES, Dark Energy Survey Collaboration 2016), the Hyper-Suprime Cam 2 (HSC, Miyazaki et al. 2012), and the Kilo-Degree Survey 3 (KiDS, de Jong et al. 2013), and near-future surveys like the Javalambre Physics of the Accelerating Universe Astrophysical Survey 4 (J-PAS, Benítez et al. 2014), the Large Synoptic Survey Telescope 5 (LSST, Ivezić et al. 2008;LSST Science Collaboration et al. 2009), the Spectro-Photometer for the History of the Universe, Epoch of Reionization and Ices Explorer 6 (SPHEREX, Doré et al. 2014), and the Wide Field Infrared Survey Telescope (WFIRST, Akeson et al. 2019). Our aim is to apply the SBU approach to mimic the measurements of these and other surveys, thereby enabling a precise characterisation of the impact of systematic uncertainties on cosmological parameters, as well as setting constraints on galaxy formation physics using large-scale structure data.
The paper is organised as follows. In §2 we first describe the main features of the newly developed model , and then we briefly present the most important characteristics of and IllustrisTNG. Using SFHs drawn from these two models, in §3 we study the connection between galaxies and dark matter haloes. Then, in §4 we study the influence of SFH on broadband SEDs, and in §5 we address the impact of initial mass function, metallicity, dust attenuation, and nebular emission on galaxy colours. We discuss our findings in the context of the existing literature in §7, and we conclude with a summary of our principal results in §8.
Throughout this paper we use Planck 2 015 cosmological parameters (Planck Collaboration et al. 2014): Ω m = 0.314, Ω Λ = 0.686, Ω b = 0.049, 8 = 0.83, ℎ = 0.67, and s = 0.96. Magnitudes of observed and simulated galaxies are in the AB system, while halo and stellar masses are in ℎ −1 and units, respectively. Simulated colours are computed by convolving spectra with LSST transmission curves 7 . For the sake of brevity, some figures in the main body of the paper display results for data only; for completeness, we include analogous results for IllustrisTNG in Appendix §A.

DATASETS AND MODELS
In this section, we first detail the main features of , a new publicly available galaxy spectral prediction tool. Then, we briefly introduce the data-driven model and the cosmological hydrodynamical simulation IllustrisTNG, from which we draw physically-motivated star formation histories.

GALAXPY
The spectral energy distribution of a galaxy depends on many ingredients, including its star formation history, metallicity, dust properties, and nebular emission. Due to complex correlations between these, modelling galaxy SEDs is a long-standing problem in astronomy. The most followed approach to model SEDs is Stellar Population Synthesis (SPS), which relies on stellar evolution theory to produce precise SEDs typically in the UV, optical, and nearinfrared range (for a review see Conroy 2013). The basic unit of SPS models is a Simple Stellar Population (SSP), i.e. a set of stars with the same metallicity created by an instantaneous star-forming event; the motivation behind this approach is that any stellar population can be written as a sum of SSPs. We can find many examples of SPS models in the literature such as (Bruzual & Charlot 2003), Maraston (2005), and ). In this section, we introduce a new open-source python-based galaxy spectral prediction tool, 8 , which uses for constructing SSPs, accounts for attenuation by dust following the prescriptions outlined in Charlot & Fall (2000), and models nebular emission lines as Gutkin et al. (2016). We proceed to briefly introduce the main properties of . The most common technique used by SPS codes to produce SEDs for SSPs is isochrone synthesis (Charlot & Bruzual A 1991;Bruzual & Charlot 1993), which relies on three main ingredients: stellar evolutionary tracks, spectral libraries, and initial mass functions (IMFs). The first accounts for the evolution of stars with the same mass, age, and metallicity; in we use Padova 1994 evolutionary tracks describing stars with metallicities = 0.0001, 0.0004, 0.004, 0.008, 0.02 ( ), and 0.05 (Alongi et al. 1993;Bressan et al. 1993;Fagotto et al. 1994a,b,c;Girardi et al. 1996). The second ingredient is responsible for producing SEDs using the output of stellar evolutionary tracks; our model relies on the widely used BaSeL 3.1 library (Westera et al. 2002). Finally, the IMF controls the distribution of stellar masses in a population of newborn stars. We will consider Salpeter (1955), Kroupa (2001), andChabrier (2003) IMFs.
To account for the attenuation of starlight by dust, in we follow the prescriptions outlined in Charlot & Fall (2000). This dust model attenuates the light coming from SSPs by a factor exp[ ( )] given by where ,y ( ,o ) controls the attenuation of light coming from stars younger (older) than 10 Myr, 0 = 5500 Å, and = −0.7. This approach is inspired by observations: optical depths measured from emission lines, which depend mostly on light emitted by young stars, are approximately two times larger than those determined using the underlying continuum (Calzetti et al. 1994), which is largely controlled by old stars. Note that this model neglects the absorption of ionizing photons by dust in star forming regions. Regarding attenuation by the intergalactic medium, wavelengths shortward of Lyman-are attenuated according to the recipes introduced in Fan et al. (2006) and Becker et al. (2015).
Nebular emission lines are modelled following the same approach as in Gutkin et al. (2016), which assumes that the nebular emission of an entire galaxy is proportional to its star formation rate. This model includes six free parameters that control emission line ratios: interstellar metallicity ISM , zero-age ionisation at the Strömgren radius , dust-to-metal mass ratio d , carbon-to-oxygen abundance ratio / , hydrogen gas density H , and high-mass cutoff of the IMF up . For simplicity, throughout this work we only consider variations in ISM and , as these are the two parameters that most strongly affect emission line ratios. The first regulates gas cooling (e.g., Spitzer 1978), which for ISM > 0.006 ( ISM < 0.006) is dominated by collisionally excited (fine-structure) transitions. The second controls the size of HII regions, which become more compact as grows, thereby increasing the ratio of high-to low-ionisation lines. We set the other parameters to the best-fitting values determined in Gutkin et al. (2016) using low redshift spectroscopic data from the SDSS survey: d = 0.3, / = ( / ) = 0.44, H = 100 cm −3 , and up = 100 . Once emission lines are produced, we also attenuate these according to Eq. 1. Note that nebular continuum emission is currently neglected in . Throughout most of this work we use the same configuration: Chabrier IMF, = 0.02 (solar metallicity), ,y = 1, ,o = 0.3, and no nebular emission lines. If emission lines are considered, the fiducial values of the two free emission line parameters are ISM = 0.001 and log 10 = −3.5.

SFH library
One of our most important aims is to study the influence of SFH variations on broad-band galaxy colours. In principle, the star formation history of a galaxy can be inferred from observations; however, even from high-resolution spectra most algorithms can only recover smooth histories following simple functional forms, or alternatively deliver results at a reduced number of control points (see §7). Consequently, SFHs estimated from observations either present a reduced temporal resolution or may be biased. To avoid these issues, we resort to physically motivated SFHs from simulations. The main drawback of this approach is that our predictions become model dependent; to mitigate the dependence of prescriptions of a certain galaxy formation model on the results, we will study SFHs from both the empirical model (Behroozi et al. 2019) and the cosmological hydrodynamical simulation IllustrisTNG (Marinacci et al. 2018;Naiman et al. 2018;Nelson et al. 2018a;Pillepich et al. 2018a;Springel et al. 2018). Note that these two models follow completely distinct approaches to simulate galaxies. We detail the main properties of and IllustrisTNG below.

UniverseMachine
is a data-driven model of galaxy evolution that maps SFHs onto dark matter (sub)halos empirically (Behroozi et al. 2019). Assuming that each (sub)halo contains a single galaxy, this model assigns to each galaxy a star formation rate (SFR) that depends on the maximum circular velocity ever attained by its (sub)halo, peak , and its redshift. Using Conditional Abundance Matching (Hearin et al. 2014), allows for a possible correlation between SFR and the relative change in the maximum circular velocity of a (sub)halo over the last dynamical time, Δ| dyn max , where the strength of this correlation is parameterized as a function of both halo mass and redshift. The stellar mass of each galaxy is determined by integrating the SFH of its main progenitor, while additionally accounting for mass gained through mergers or lost via passive evolution. As shown in Behroozi et al. (2019), the best-fitting model captures a wide range of statistics summarising the observed galaxy distribution, e.g. stellar mass functions, quenched fractions, and two-point galaxy clustering.
We generate the data used in the present work by running the publicly available code 9 on merger trees identified in the Bolshoi-Planck simulation with Rockstar and Consis-tentTrees (Behroozi et al. 2013a,b;Klypin et al. 2016;Rodríguez-Puebla et al. 2016). The Bolshoi-Planck simulation was carried out using the ART code (Kravtsov et al. 1997). This simulation evolved 2048 3 dark-matter particles of mass p = 1.35 × 10 8 on a simulation box of 250 ℎ −1 Mpc on a side under cosmological parameters closely matching Planck Collaboration et al. (2014). After running , we end up with 700 000 galaxies with SFHs tabulated at each of the 178 publicly available Bolshoi-Planck snapshots.
It has been long-known that the distribution of galaxy colours presents a strong bimodality (e.g., Strateva et al. 2001;Baldry et al. 2004), where galaxies with blue (red) colours typically present high (low) star formation rates. Many approaches have been proposed to classify galaxies into these two populations (e.g., Pozzetti et al. 2010); we will consider galaxies with specific SFRs above and below log 10 (SFR/ * [yr −1 ]) = −11 as star-forming and quenched, respectively. In this work, we will study star-forming and quenched galaxies separately.
Given that the evolution of satellite galaxies inside haloes is highly model dependent, we will restrict our analysis to central galaxies. This election assumes that the difference between central and satellite SEDs is primarily driven by differences in quenching times, and so we have adopted this simplifying assumption to reduce the dependence of our findings on the particularities of each galaxy formation model. We avoid selecting backsplash satellites -centrals that were satellites in the past -by imposing that all present-day centrals cannot have any progenitor classified as a satellite. We find that approximately 400 000 .08 -0.14 galaxies satisfy these criteria. From this sample, we select the 1 000 star-forming and quenched galaxies with host halo masses closest to log 10 ( peak [ℎ −1 M ]) = 11.5, 12, 12.5, and 13, respectively, where peak is the maximum mass that a (sub)halo has ever attained.

IllustrisTNG
IllustrisTNG is a suite of cosmological hydrodynamical simulations carried out using the moving-mesh code A (Springel 2010). Incorporating a comprehensive galaxy formation model with radiative gas cooling, star formation, galactic winds, and AGN feedback (Weinberger et al. 2017;Pillepich et al. 2018a), IllustrisTNG solves for the joint evolution of dark matter, gas, stars, and supermassive black holes from = 127 up to present time. We will use publicly available data from the largest hydrodynamical simulation of the suite, TNG300-1 (Nelson et al. 2018b), which evolved 2 500 3 gas tracers together with the same number of dark matter particles in a periodic box of 302.6 Mpc on a side under a Planck 2015 cosmology. The corresponding mass resolution is 5.9 and 1.1 × 10 7 for dark matter and gas, respectively. Publicly available galaxy properties are tabulated at 100 snapshots.
Following the same procedure as in §2.2.1, we select the 1 000 IllustrisTNG star-forming and quenched central galaxies with host halo masses closest to log 10 ( peak [ℎ −1 M ]) = 11.4, 11.8, 12.3, and 12.8, respectively. Note that almost all central galaxies in high mass haloes are quenched in this simulation; consequently, selecting the 1 000 star-forming galaxies with host halo mass closest to log 10 ( peak [ℎ −1 M ]) = 12.8 would result in selecting galaxies in haloes with a broad range of masses. Motivated by this, we do not consider star-forming galaxies in such haloes. Our IllustrisTNG library thus includes 7 000 SFHs.

THE DEPENDENCE OF SFH UPON HOST HALO MASS
Using the and IllustrisTNG SFH libraries introduced in §2.2, in this section we study the dependence of the SFH of star-forming and quenched central galaxies on the properties of their host haloes. As noted previously in the literature (Pillepich et al. 2018b;Behroozi et al. 2019), in both models the typical stellar mass of central galaxies increases monotonically with halo mass, and the *peak scaling relations exhibit good quantitative agreement with constraints from observations. For reference, in  Figure 1. Median SFH of galaxies in haloes of different masses. In the top and bottom panels we display the results for and IllustrisTNG, respectively. Galaxies in more massive haloes reach the peak of their SFH earlier and present a larger stellar mass, i.e. both models predict downsizing.
we show the first and second moments of the stellar mass distribution of galaxies in different halo mass bins.
Due to the dependence of * upon peak , we naturally expect a correlation between the SFH of a galaxy and its host halo mass. To study this relationship, in Fig. 1 we present the median SFH of galaxies in haloes of different masses, showing results for and IllustrisTNG in the top and bottom panels, respectively. Each line denotes the weighted median SFH of galaxies hosted by haloes of a certain mass, as stated in the legend. We use the fraction of star-forming and quenched galaxies in haloes of the selected mass to weight the SFH of star-forming and quenched galaxies, respectively. We show weighted medians so as not to overrepresent quenched (star-forming) galaxies at small (large) halo masses. In broad strokes, the SFH of galaxies in haloes of different masses look alike in and IllustrisTNG: star formation rate increases rapidly at early times, reaches a maximum sev- Median formation history of galaxies hosted by log 10 ( peak [ℎ −1 M ]) 12 halos. Blue and orange lines indicate median star formation and mass histories, respectively, while solid and dotted (dashed and dot-dashed) lines show the results for star-forming and quenched (IllustrisTNG) galaxies. Shaded areas denote the region between the 16th and 84th percentiles of the results for star-forming Illus-trisTNG galaxies. Star-forming and quenched galaxies present similar SFHs before 0.1; after this time, the SFH of quenched galaxies decreases more rapidly. eral billion years ago, and decreases thereafter. Interestingly, these general characteristics show a clear dependence upon peak . The overall normalisation of the SFH at its peak increases with peak ; this is sensible given that the stellar-to-halo mass relation is monotonic in both models. Furthermore, galaxies hosted by more massive haloes evolve faster, i.e. they reach the peak of their SFH at earlier times. This well-known effect is commonly known as downsizing (e.g., Cowie et al. 1996), and it results in high-mass galaxies hosting older stellar populations relative to low-mass galaxies (Juneau et al. 2005;Thomas et al. 2005;Pacifici et al. 2016).
Our criterion for classifying galaxies into star-forming and quenched only considers their specific SFR at observation time.
Given that these two populations present different observational properties, it is natural to address whether their entire SFH reflects these differences. In Fig. 2 we present the median star formation and stellar mass history of galaxies hosted by log 10 ( peak [ℎ −1 M ]) 12 haloes. Blue and orange lines indicate median star formation and mass histories, respectively, while solid and dotted (dashed and dot-dashed) lines show the results for star-forming and quenched galaxies as predicted by (IllustrisTNG). Shaded areas encapsulate the region between the 16th and 84th percentiles of the results for IllustrisTNG star-forming galaxies. As we can see, star-forming and quenched galaxies at = 0 show similar SFHs before 0.1, and, after this time, the SFH of quenched galaxies decreases more rapidly. Although the SFHs of both populations are noticeably different, the stellar mass of star-forming and quenched galaxies is approximately the same at all redshifts. This is consistent with the fact that galaxies in log 10 ( peak [ℎ −1 M ]) 12 haloes form less than 10% of their current stellar mass after 0.3, as the formation of stars peaks at higher redshift for these galaxies. Similar conclusions apply to galaxies hosted by more massive haloes, as their SFH peaks at even higher redshifts. Indeed, in Table 1 we show that the stellar masses of star-forming and quenched galaxies . The top and bottom panels present the results for starforming and quenched galaxies, respectively. Thick lines indicate median SFHs, thin lines the SFHs of randomly selected galaxies, and black stripes the periods during which galaxies would be considered quenched according to our criterion. The variety of SFHs for distinct samples is captured by twodimensional histograms. We find that another manifestation of downsizing is that galaxies in more massive haloes become quenched at earlier times. in and IllustrisTNG are compatible with one another to within one standard deviation for all halo masses studied.
To further study the dependence of SFH on halo mass for star-forming and quenched galaxies, in the top and bottom panels of Fig. 3 we separately display the SFHs of both populations as a function of host halo mass. This figure show the galaxy-halo connection as predicted by ; see Appendix A for an analogous plot for IllustrisTNG. Thick lines indicate median SFHs and thin lines the SFHs of randomly selected galaxies. Despite the smooth regularity of the median trends, we find that SFHs are very bursty, changing by orders of magnitude over short timescales. The typical amplitude of these variations is captured by the two-dimensional histograms on the background. We find Table 2. Properties of broad-band colour distributions that isolate the influence of SFH variations for star-forming and quenched galaxies in haloes of different masses. We use the same coding as in Table 1 that that SFHs predicted by and IllustrisTNG agree qualitatively; however, IllustrisTNG predicts a less (more) steeply decreasing SFH for star-forming (quenched) galaxies at late times, and that SFHs are more bursty in IllustrisTNG relative to . Black stripes denote the periods during which galaxies would be considered quenched according to our criterion. As we can see, in both models the lookback time at which a galaxy becomes quenched increases with its host halo mass.

INFLUENCE OF SFH ON BROAD-BAND GALAXY SED
In §3 we studied the dependence of SFH on host halo mass using and IllustrisTNG, finding that both models predict that galaxies in the more massive haloes form first, reach the peak of their SFH faster, and quench earlier, i.e. galaxies in more massive haloes are more evolved. Here, we address how these trends manifest as a dependence of broad-band galaxy SEDs upon host halo mass.

Influence of SFH on broad-band colours
In this section, we explore the influence of SFH on broad-band optical and near-infrared galaxy colours from 3 200 to 10 800 Å. In particular, we study the five colours that result from subtracting consecutive LSST bands: {u-g, g-r, r-i, i-z, z-Y}.
We use to isolate the impact of SFH variations on broad-band colours. Specifically, we hold fixed all variables such as metallicity and dust properties while generating a SED for each SFH in the and IllustrisTNG libraries. Any difference between the resulting colours thus arises from SFH variations. Although these simulated colours have not been fitted to photometric observations, we will show in §6 that they broadly represent the distribution of galaxy colours from observations, thereby serving our primary purpose of capturing how physically motivated SFH variations translate into observations of galaxies in colour space.
In Fig. 4 we show the simulated colours of galaxies hosted by haloes of different masses at = 0.03. In the left and right panels, we present the results for star-forming and quenched galaxies, respectively; see Appendix A for an equivalent figure for IllustrisTNG. At fixed peak , the scatter in colours arises from variations amongst SFHs. Even though this scatter is large, especially for star-forming galaxies, galaxies in low (high) mass halos are preferentially located closer to the bottom left (top right) of all panels. Thus, central galaxies in high mass haloes are in general redder than those in low mass haloes. We highlight that this reddening is purely due to differences in SFH, i.e. this is a direct consequence of downsizing.
For a quantitative characterisation of these colour distributions, we carry out a principal component analysis (PCA, Pearson 1901). PCA is an orthogonal transformation that decomposes a multivariate dataset into linearly uncorrelated variables called principal components (PC). This transformation is done in such a way that the first PC accounts for the largest possible amount of variability in the data, the second PC maximises the remaining variance under the constraint that it is orthogonal to the first PC, and so on. Therefore, PCs define an orthogonal basis in the multidimensional vector space of the data. We will refer to these vectors as eigenvectors. As succeeding components explain less and less variance, PCA can be used to reduce the dimensionality of a dataset.
PCA has been widely used in the literature in closely related contexts, such as spectral classification of galaxies (Connolly et al. 1995) and quasars (Yip et al. 2004), stellar mass estimation (Chen et al. 2012), K-corrections (Blanton & Roweis 2007), and inferring spectra from optical imaging (Kalmbach & Connolly 2017). In this work, the application of PCA to galaxy colours is motivated by the obvious correlation between broad-band colours shown in Fig. 4. In Table 2 we show the variance explained by the first PC of the simulated colours of and IllustrisTNG star-forming and quenched galaxies in different halo mass bins. Independently of population and host halo mass, we find that the first PC of each distribution explains more than 96% of the total variance in colours, implying that most of the information about SFH encoded in broad-band photometry can be captured by just a single linear combination of galaxy colours. This conclusion holds for both and IllustrisTNG, even though these two models have radically distinct implementations of the galaxy-halo connection and predict SFHs with different shapes and levels of burstiness.
The previous result suggests the existence of a single direction in colour space that corresponds to physically-motivated variations in star formation history. It is natural to consider whether this direction varies across different populations, halo masses, or SFH models. To address this question, we repeat the above PC decomposition, but for our dataset we use the entire collection of (or IllustrisTNG) galaxies, without separating them into quenched/star-forming subpopulations or into different halo mass bins. We find that the first PCs of both and IllustrisTNG simulated colours explain more than 98% of the data variance, and moreover that there is an angle of approximately 0.5 degrees between their first eigenvectors. In what follows, we refer to the orientation of the first PC constructed in this fashion as the SFH-direction.
We now compare the SFH-direction to the first eigenvector resulting from the PC decomposition of the colours of star-forming and quenched galaxies residing in halos of different masses. In Table 2 we present the angle between the SFH-direction and each of these first eigenvectors. We find that this angle is smaller than 5 degrees for all subsamples, and thus SFH variations modify colours in the same direction of the colour space for all populations and halo masses. We illustrate this result in Fig. 4; in this figure dashed lines indicate the SFH-direction, arrows point in the direction of the first eigenvector of each subpopulation, and the sizes of the vectors denote the standard deviation of colours projected onto this UniverseMachine, quenched galaxies Figure 4. Influence of SFH on broad-band galaxy colours at = 0.03. In the left and right panels, we present the results for star-forming and quenched galaxies, respectively. We produce these colours with , holding fixed all galaxy properties and using the star formation histories of galaxies; thus the range colours spanned by these distributions exclusively reflects the diversity in SFH predicted by . Contours and shaded areas enclose 75% of the distributions, arrows point in the direction of their first eigenvectors, arrow sizes indicate the standard deviation of their first PCs, and grey dashed lines denote the first eigenvector of the entire population of galaxies. Strikingly, for all galaxy populations, the grey lines run parallel to all first eigenvectors; in Appendix §A we show that the same result holds for SFH variations predicted by IllustrisTNG, demonstrating that physically-motivated SFH variations modify galaxy colours along a single direction in the five-dimensional colour space.
direction. In accord with the invariance of the angles in Table 2, all eigenvectors are well-aligned in all projections of the colour space. We therefore conclude that physically-motivated SFH variations result in galaxy colours moving along a single direction in the fivedimensional colour-colour space.
In Fig. 4 we can also see that the scatter in galaxy colours depends strongly on peak . We quantify this scatter by computing the volume of the minimum ellipsoid (VME) enclosing 68 % of each colour distribution. To do so, we start by projecting all distributions onto their eigenvectors. Then, we compute the VME using the following expression where = 5 is the number of colours, Γ refers to the Gamma function, Φ is the quantile function of a 2 distribution with degrees of freedom, = 0.68 indicates the percentage of the distribution that the ellipsoid encloses, and is half the difference between the 16th and 84th percentiles of projections onto the th PC.
In Table 2 we present the VME of star-forming and quenched galaxies hosted by halos of different masses. Overall, the VME of the colours of star-forming (quenched) galaxies increases (decreases) with peak in both and IllustrisTNG. In the next section, we proceed to explore the origin of these trends.

Physical origin of colour variations owing to SFH differences
After a star formation episode, a galaxy forms stars according to its initial mass function, and, if this episode is sufficiently strong, . Correlation between {g-r, r-i} colours and * ( = 1 Gyr), the fraction of stellar mass formed over the last billion years. Each point represents a galaxy at = 0.03. As expected, galaxies with larger values of * ( = 1 Gyr) are bluer. Irrespective of halo mass, the correlation between * ( = 1 Gyr) and galaxy colours is exceptionally tight, and exhibits little-to-no dependence upon galaxy assembly history prior to the last billion years.
newly formed O and B stars quickly dominate optical and nearinfrared wavelengths (e.g., Renzini & Buzzoni 1986;Charlot & Bruzual A 1991). However, O and B stars do not dominate these wavelengths for more than a few million years if no further forma- Dependence of the g-r colour on * ( = 1 Gyr) at = 0.03. Blue solid and orange dashed contours indicate the results for and IllustrisTNG, respectively, and the vertical dotted line indicates * ( = 1 Gyr) = 0.01. As we can see, there is a strong relation between * ( = 1 Gyr) and g-r: for * ( = 1 Gyr) > 0.01 galaxies get rapidly bluer as * ( = 1 Gyr) increases, while for * ( = 1 Gyr) < 0.01 galaxies practically present the same red colours. This result is also valid for other broad-band colours. Bottom panel. Distribution of * ( = 1 Gyr) for galaxies hosted by haloes of different masses. Error bars and shaded areas indicate 16th and 84th percentiles. As expected, the average value of * ( = 1 Gyr) decreases as peak increases. The top and bottom panels together provide a simple way to understand the tightness of the red sequence.
tion of stars takes place; this is because the time a star spends in the main sequence (MS) decreases rapidly with its mass. Massive, blue stars thus leave the main sequence quickly, and lower mass, redder stars progressively dominate short optical wavelengths. This picture is nonetheless more complicated for longer optical and nearinfrared wavelengths; this is the result of horizontal, asymptotic giant, and red giant branch stars contributing substantially to these wavelengths 0.1, 1, and 10 billion years after a star-forming episode (e.g., Charlot & Bruzual A 1991;Maraston 1998), respectively. These two processes explain why the optical and NIR colours of quenched galaxies become progressively redder.
The situation explained above is different for a galaxy presenting further formation of stars; this is because such galaxy would present a steady supply of massive, blue MS stars, which would dominate short optical wavelengths (e.g., Bruzual & Charlot 1993). For longer wavelengths, we face a more complicated scenario: for some billion years blue MS stars would dominate the light of this galaxy but progressively, as more MS stars evolve into the horizontal, asymptotic giant, and red giant branch, the light of this galaxy would get increasingly redder (e.g., Bruzual & Charlot 1993). In summary, the optical and near-infrared colours of a galaxy undergoing an intense star-forming episode become extremely blue because massive MS stars dominate these wavelengths, while those of a galaxy no experimenting any formation of stars for a few billion years get red due to post MS stars dominating these. This scenario is obviously more complex for galaxies with intermediate star formation, and it is metallicity dependent; as the metallicity of a galaxy increases, so it does the dominance of its light by MS stars (Maraston 2005, e.g.,). Based on this intuitive picture, we proceed to investigate why galaxies with different SFHs have distinct colours.
This aforementioned scenario motives exploration of the following proxy for galaxy colours: the fraction of stellar mass formed over the last years before observation time 0 , * ( ) ≡ * ( 0 )− * ( 0 − ) * ( 0 ) . In Fig. 5, we display the projection of the simulated colours produced in §4.1 for onto the g-r; r-i plane (see Appendix A for an analogous figure for IllustrisTNG). Symbols are colour-coded according to * ( = 1 Gyr) and indicate the results for individual galaxies. We find that the g-r and r-i colours present a very strong correlation with * ( = 1 Gyr) for starforming and quenched galaxies in haloes of different masses, where these colours get bluer as * ( = 1 Gyr) increases. Of course, this correlation is entirely expected based on the well-established physical model introduced above; our concern in the remainder of this section is a quantitative assessment of the explanatory power of * ( = 1 Gyr) when applied to the physically-motivated star formation histories predicted by IllustrisTNG and .
We further test the accuracy of * as a proxy for SFH variations in galaxy colours by computing the correlation between the projection of the colours shown in Fig. 5 onto the SFH-direction, exploring * ( ) for = 0.25, 0.5, 1, 1.5, and 2 Gyr. We find that the strength of this correlation at < 1 Gyr exceeds = 0.86 and 0.96 for and IllustrisTNG, respectively, with very little dependence upon . In contrast, this correlation weakens rapidly for timescales larger than 1 Gyr. This characteristic timescale reflects the time at which all stars more massive than 2 have left the main sequence. We note that this timescale is wavelength dependent; young (old) stars are much brighter on wavelengths shorter (longer) than those studied here (e.g., Maraston 1998), and thus our results only hold for optical and near-infrared rest-frame colours.
For a better understanding of the relation between galaxy colours and * ( = 1 Gyr), in the top panel of Fig. 6 we display simulated g-r colours as a function of * ( = 1 Gyr). Blue solid and orange dashed contours indicate the results for and IllustrisTNG, respectively, where the inner and outer contour enclose 60 and 90% of the corresponding colour distribution. Generally speaking, the g-r colour gets redder as * ( = 1 Gyr) decreases, a result that also applies to other optical colours. We can readily see that there is a strong dependence of g-r on * ( = 1 Gyr) for galaxies with * ( = 1 Gyr) 0.01: g-r increases by ∼ 0.4 mag between * ( = 1 Gyr) = 0.1 and 0.01. On the other hand, the dependence of g-r upon * ( = 1 Gyr) rapidly weakens for galaxies with * ( = 1 Gyr) < 0.01, so that all such galaxies have nearly the same red colours. This trend can again be understood in terms of the basic physical picture outlined above: galaxies with * ( = 1 Gyr) < 0.01 form essentially no new blue stars over the last billion years, and so their stellar light is dominated by old main sequence stars and asymptotic and red giant branch stars. Note that at fixed * ( = 1 Gyr) colours produced using and IllustrisTNG SFHs are very similar. This is remarkable, given the differences between SFHs predicted by both models. We further study this result in a companion paper (Chaves-Montero et al. in prep.).
Given the correlation between SFH and halo mass, we also expect a similar connection between * ( = 1 Gyr) and peak . In the bottom panel of Fig. 6, we show the distribution of * ( = 1 Gyr) for galaxies hosted by haloes of different masses. Blue and orange (green and red) colours indicate the results for star-forming and quenched (IllustrisTNG) galaxies, respectively. Symbols denote medians, while error bars show 16th and 84th percentiles. Overall, star-forming galaxies present a larger value of * ( = 1 Gyr) relative to quenched galaxies, and for both populations * ( = 1 Gyr) decreases as peak increases. These basic scaling relations with * ( = 1 Gyr) drive the trend of star-forming (quenched) galaxies having preferentially blue (red) colours, as well as the tendency of central galaxies to be redder in haloes of increasing mass. Note that the results are very similar for and IllustrisTNG.
We can gain further insight into the distribution of galaxy colours by consideration of how * ( = 1 Gyr) varies with halo mass. The collection of star-forming and quenched galaxies in haloes less massive than log 10 ( peak [ℎ −1 M ]) < 12.2 tend to have a broad range of values of * ( = 1 Gyr) that exceed 0.01. As shown in the top panel of Fig. 6, the g-r colour has a strong dependence upon * ( = 1 Gyr) in this regime, and so the wide range of values of * ( = 1 Gyr) > 0.01 naturally gives rise to the wide range of colours of galaxies in low-mass haloes. In §4.1, we also observed that the scatter in the colours of star-forming galaxies grows with peak ; this is consistent with the broadening of the * ( = 1 Gyr) distribution as peak increases for this population (see the blue and green curves in the bottom panel of Fig. 6). On the other hand, for quenched galaxies in massive haloes, the distribution of * ( = 1 Gyr) rapidly plummets below 0.01 with increasing halo mass; accordingly, quenched galaxies in massive halos have a much more restricted range of predominantly red colours. These results provide a simple way to understand the tightness of the red sequence.

Impact of SFH on observed magnitudes
The stellar mass of a galaxy is strongly correlated with its luminosity. This relation is especially tight in the near-infrared, as near solarmass stars dominate the near-infrared luminosity of a galaxy as well as its stellar mass (Gardner et al. 1997). Given that more massive galaxies live in more massive haloes, we also expect a strong correlation between near-infrared luminosity and peak . In the top panel of Fig. 7, we display the distribution of i-band magnitudes for galaxies in haloes of different masses at = 0.03 as predicted by . Solid and dashed lines denote the results for star-forming and quenched galaxies, respectively. As expected, galaxies in more massive haloes are in general brighter, albeit with significant scatter. Interestingly, we find that star-forming galaxies are slightly brighter than quenched galaxies at fixed halo mass. This trend is explained by a) star-forming galaxies having more blue stars relative to quenched galaxies, and b) the light-to-mass ratio of blue stars is larger than that of red stars. The corollary of this tendency is that at fixed * , the luminosity of a galaxy depends on its Distribution of i-band magnitudes for galaxies in haloes of different masses at = 0.03. We generate these luminosities using SFHs predicted by -. Solid and dashed lines present the results for star-forming and quenched galaxies, respectively. The luminosity of a galaxy increases with peak , and at fixed peak star-forming galaxies are slightly brighter than quenched galaxies. Bottom panel. Observed i-band magnitudes of individual galaxies as a function of their stellar masses. Symbols are colourcoded according to * ( = 1 Gyr). Even though luminosities and stellar masses are strongly correlated, this relation presents a small scatter that is mostly captured by * ( = 1 Gyr).
colours. This is a well-known result that has been used to improve the precision of stellar masses derived from photometric observations (e.g., Bell et al. 2003). The relation between luminosity and colours motivates the principal concern of this section, which is to study the dependence of galaxy luminosities upon * ( = 1 Gyr), our proxy for galaxy colours.
In the bottom panel of Fig. 7, we display the relation between observed i-band magnitudes and * for (in Appendix §A we present an analogous figure for IllustrisTNG). Symbols are colour-coded according to * ( = 1 Gyr) and indicate the results for individual galaxies. We find that the Spearman rank correlation coefficient between i-band magnitudes and stellar masses is -0.97. Even though this correlation is very strong, we can see that there is still some scatter between i-band magnitudes and stellar masses. Remarkably, this scatter is mostly captured by * ( = 1 Gyr): at fixed * the brightest (dimmest) galaxies are those with the largest (smallest) * ( = 1 Gyr). More quantitatively, we estimate the fraction of the scatter captured by * ( = 1 Gyr) by fitting two different Gaussian processes to the data; the first (second) considers * ( * and * ( = 1 Gyr)) as dependent variable(s) to predict i-band magnitudes. We find that the standard deviation of the residual between data and predictions is a factor of two smaller after considering * ( = 1 Gyr) in addition to * .
Assembling the basic trends covered in this and the previous section, we have now a full picture of the influence of SFH on broad-band galaxy SEDs: galaxies get brighter as their stellar mass increases and, at fixed * , galaxies with a larger value of * ( = 1 Gyr) present bluer colours and slightly larger luminosities. Furthermore, the fraction of quenched galaxies increases with host halo mass as a result of the increasing abundance of galaxies with small values of * ( = 1 Gyr) at high masses.

INFLUENCE OF GALAXY PROPERTIES ON BROAD-BAND COLOURS
In §4.1 we studied the influence of SFH on broad-band colours, finding that physically-motivated SFH variations modify galaxy colours along a single direction in colour space. In this section we address the impact of variations in IMF, metallicity, dust attenuation, and nebular emission line ratios on broad-band colours. We investigate the influence of galaxy properties on colours using a similar PCA-based approach as in §4.1. We will explore the following galaxy features: • Initial Mass Function. The fiducial configuration of 9.0 +0.8 −1.0 * Variance explained by the first PC of the indicated parameter. † Angle between the direction in colour space associated with variations in the indicated parameter and the SFH-direction. ‡ Impact of variations in the indicated parameter on the SFH-direction. assumes a Chabrier IMF. To study the dependence of galaxy colours on this property, we also explore Kroupa and Salpeter IMFs.
• Dust attenuation. We explore the influence of each of the two parameters controlling the dust attenuation model separately. We sample each of these two parameters using ten linearly-spaced values spanning the intervals ,y ∈ [0, 3] and ,o ∈ [0, 1].
To study the impact of a certain parameter on colours, we first select a single galaxy history from one of the SFH libraries introduced in §2.2. We then produce colours for every value of the parameter under consideration, while holding fixed all other parameters to their fiducial values. After that, we iterate over every SFH in the libraries for each target parameter. We thus end up with a new set of simulated colours for each value of the parameters studied, allowing us to isolate the influence of the corresponding physical effects on colours.
In Fig. 8 we show the impact of IMF, metallicity, dust attenuation, and nebular emission lines on galaxy colours in the {g-r, r-i} plane at = 0.03. Orange contour indicate the distribution of colours resulting from varying the parameter indicated by the title at the top of each panel; for reference, a blue shaded area displays the distribution of colours arising only from SFH variations. Both contours and shaded areas contain 90% of the distributions. Broadly speaking, when varying metallicity, dust attenuation, and nebular emission line ratios, the area of colour space spanned by galaxy colours is significantly expanded beyond the extent of that resulting from SFH variations alone, in accord with previous results (e.g., Pacifici et al. 2015). Fig. 8 suggests that variations in SFH have a qualitatively different influence on galaxy colours relative to other galaxy properties. To quantitatively compare the impact of the SFH and that of other galaxy properties, we first determine the direction along which each parameter modifies colours. Then, we proceed to determine the angle between each of these directions and the SFHdirection. A small (large) angle between the SFH-direction and a certain galaxy property indicates that variations in both this property and the SFH have a similar (different) impact on galaxy colors. To determine the direction associated with a particular parameter in colour space, we start by computing, for each SFH from any of the libraries introduced in §2.2, the first eigenvector of the distribution of colours resulting from varying the target parameter while holding all other parameters fixed to their fiducial values. We thus end up with a distribution of eigenvectors, one for each element of the and IllustrisTNG libraries. Then, we marginalise over SFH variations by taking the average of this collection of eigenvectors; we refer to the resulting vector as the direction associated with the parameter under consideration. It is worth noticing that the first PC of all galaxy properties but explains more than 97% of the variance; for , the variance explained by the 1st PC drops to 90% (see Table 3).
In Fig. 8, orange arrows denote the direction along which distinct galaxy properties modify colours, while blue arrows show the SFH-direction. In Table 3 we collate the angle between the SFHdirection and the direction associated with each parameter; we find that all these angles are larger or equal to 12 degrees. We thus conclude that changes in metallicity, dust attenuation, and nebular emission line ratios modify colours in distinct directions of the colour space relative to SFH variations.

Impact of galaxy properties on the SFH-direction
Before presenting the influence of each individual galaxy feature on broad-band colours, we first study whether the orientation of the SFH-direction depends on such properties. For the sake of definiteness, we begin by using the IMF as an example. We first select the colours generated for each SFH in our libraries using different values of the target parameter while holding the other parameters fixed to their fiducial values. We group these colours according to the value considered for the target parameter, in this particular case a Chabrier, Salpeter, and Kroupa IMF, ending up with three distinct sets of colours. We then compute the first eigenvector of each of these colour distributions. Note that, by definition, the first eigenvector associated with Chabrier colours is the SFHdirection. Finally, we compute the angles between the Salpeter and Kroupa eigenvectors and the SFH-direction. In Table 3 we show the median of these two angles, which we use to give an estimate for how much the orientation of the SFH-direction changes due to physically plausible variations in the IMF. We repeat the same process for each parameter, gathering the results in Table 3. In all cases, we find that the SFH-direction never varies by more than 9 degrees owing to variations in other galaxy features. Based on this exercise, we conclude that the orientation of the SFH-direction is very robust: no matter the properties of the galaxy, the formation of stars smoothly varies broad-band colours along the same direction in colour space.

Initial Mass Function
The initial mass function is a significant source of systematic uncertainty in galaxy evolution (for a recent review, see Hopkins 2018). The impact of IMF variations on the broad-band colours of single stellar populations is rather small, as SSP colours are dominated by stars with masses close to the main sequence turnoff (Figure 7, ). However, it is conceivable that the IMF may modify the colours of composite stellar populations due to the large range of turnoff masses contributing to the integrated light. To study the in-fluence of the IMF upon broad-band colours, we leverage physicallymotivated SFHs from and IllustrisTNG, and we use to generate colours assuming Chabrier, Kroupa, and Salpeter IMFs. We find that broad-band colours have very little dependence on the IMF for realistic composite stellar populations; the median difference between u-g and z-Y colours produced using a Chabrier and a Kroupa (Salpeter) IMF is 2.4 and 0.3 mmag (18.4 and 5.7 mmag), respectively. Consequently, the impact of IMF variations on broad-band colours is very weak. Nonetheless, these small differences shift colours in the same direction for all galaxies. Intuitively, Salpeter and Kroupa IMFs result in redder colours relative to a Chabrier IMF because the first two are characterised by a larger ratio of low-to-high mass stars than the latter.

Metallicity
It is well-known that the temperature and luminosity of main sequence stars are inversely proportional to their metallicity (e.g., Bruzual & Charlot 2003); galaxies therefore get redder as their average metallicity increases (e.g., Sanders et al. 2013, for results from observations). By analysing the colours generated using SFHs from and IllustrisTNG, we confirm that subsolar (super-solar) metallicities result in broad-band colours that are bluer (redder) than those corresponding to solar metallicities. Regarding the size of these variations, the median difference between u-g and z-Y colours produced using a solar and a sub-solar = 0.004 metallicity (super-solar = 0.05 metallicity) is -0.33 and -0.16 mag (0.07 and 0.09 mag), respectively, where positive and negative values indicate a shift to the red and blue. Furthermore, the direction of colour fluctuations owing to metallicity differences, i.e., the metallicity-direction, presents an angle of 35 degrees with respect to the SFH-direction. Taken together, these results show that metallicity variations have a strong effect on galaxy colours, especially on those with shorter wavelengths, and that changes in metallicity modify colours in a completely distinct direction in colour space relative to SFH variations.

Dust attenuation
Dust grains absorb UV and optical light, and then they re-radiate this energy in the infrared; consequently, galaxies get redder as their dust content increases. It is worth noticing that even though the attenuation of starlight is indeed directly proportional to the total amount of dust in a galaxy, the precise shape of the attenuation curve depends upon dust properties such as the distribution of grain sizes (e.g., Weingartner & Draine 2001). In we use the Charlot & Fall (2000) dust model, which includes one parameter controlling the attenuation of light from coming from stars younger than 10 Myr, ,y , and one for stars older than this age, ,o . Broadly, we find that the impact of each parameter is similar, making broad-band colours redder. The median difference between u-g and z-Y colours produced using ,y = 0 and 3 ( ,o = 0 and 1) is 0.32 and 0.06 mag (0.19 and 0.06 mag), respectively. Dust attenuation has thus a strong impact on broad-band colours, especially on those with shorter wavelengths.
As with the other parameters, we compute the direction in colour space associated with variations in ,y and ,o . We find that the ,y -and ,o -directions present an angle of 12 and 27 degrees with respect to the SFH-direction, respectively. Therefore, the SFH-and ,y -direction are rather similar, a manifestation of the well-known age-dust degeneracy (e.g., Papovich et al. 2001). On the other hand, the angle formed by the ,o -and SFH-directions is significantly larger, and so evidently the attenuation of light from old stars leaves a signature on broad-band colours that is discernible from the influence of star formation history.
Finally, we study the dependence of the direction associated with dust attenuation on the shape of the attenuation curve using the Calzetti (Calzetti et al. 2000) and Fitzpatrick (Fitzpatrick 1999) laws. In particular, we generate colours for each of these laws using ten linearly-spaced values for the total V-band attenuation spanning the interval ∈ [0, 3] mag. We find that the first eigenvector of each of these two attenuation laws captures more than 99.99% of the variance in the data and that the Calzetti (Fitzpatrick) eigenvector presents an angle of 20 (21) degrees with the SFH-direction. These results let us conclude that the impact of the shape of the attenuation curve on the direction along with dust attenuation modify colours is weak. In fact, the Calzetti and Fitzpatrick eigenvectors present an angle of just 8 and 6 degrees with the ,o -direction, respectively.

Nebular emission
The impact of nebular emission lines on broad-band colours is more important for galaxies with low metallicities, large SFRs, and at high redshift (e.g., Conroy 2013). The influence of nebular emission on any particular colour has a dramatic dependence on redshift because the filter in which a specific emission line falls changes as the central wavelength of such line increases with redshift. The introduction of nebular emission lines thus results in some broad-band colours getting redder while others bluer; this is in contrast to variations in IMF, metallicity, and dust attenuation, because changes in these other properties systematically shift all colours towards the blue or red. In what follows, we first study the impact of nebular emission lines on broad-band colours at = 0.03; we then address the effect of considering different emission line ratios.
In , the luminosity of nebular emission lines is proportional to the star formation rate of a galaxy; therefore, we expect the colours of star-forming galaxies to be more strongly influenced by nebular emission than those of quenched galaxies. Indeed, the median difference between u-g and g-r colours produced with and without considering emission lines for star-forming (quenched) galaxies is -44.7 and 22.1 mmag (-9.7 and 1.1 mmag), respectively, where we model emission lines using ISM = 0.001 and log 10 = −3.5. This general trend can be seen in the bottom-middle and bottomright panels of Fig. 8; galaxies with bluer (redder) colours are more (less) affected by nebular emission lines. On the other hand, the impact of nebular emission lines on broad-band colours is weaker relative to that of both metallicity and dust attenuation.
There are two parameters controlling nebular emission line ratios in , ISM and . In contrast to other parameters, a monotonic change any of these two parameters does not result in broad-band colours getting systematically bluer or redder. Consider, for example, how the colours of a galaxy change when excluding emission lines altogether versus including emission lines using ISM = 0.001 and log 10 = −1 or the same value of ISM and log 10 = −4. We find that for the first case the u-g and g-r colours shift by 19.4 and -13.2 mmag, respectively, while for the second case the variations are -35.6 and 15.1 mmag.
Some combinations of ISM and produce emission line ratios that are incompatible with observations, and thus the colour distributions shown in the bottom-middle and bottom-right panels of Fig. 8 Figure 9. Colours of SDSS galaxies and theoretical predictions for the distribution of colours that results from physically-motivated SFH variations. Shaded areas display the results for SDSS galaxies, while solid (dashed) contours show (IllustrisTNG) predictions. Both contours and shaded areas enclose 90% of the colour distributions. Brown, purple, and yellow arrows indicate the direction in colour space associated with changes in metallicity, dust attenuation, and nebular emission lines, respectively. The major axis of the SDSS ellipsoid is closely aligned with the major axes of the and IllustrisTNG ellipsoids, while it presents a significant angle with respect to the directions associated with variations in other galaxy properties. Taken together, these results suggest that the SFH is the most important driver of optical and near-infrared galaxy colours. that the angles formed by the ISM -and -directions and the SFHdirection are 22 and 55 degrees, respectively. Nebular emission lines thus vary galaxy colours in quite distinct directions in colour space relative to SFH variations. Finally, we note that nebular continuum emission is currently neglected in ; as a result, star-forming galaxies should present even bluer colours and the directions associated with nebular emission line parameters could be rotated.

ANALYSIS OF BROAD-BAND COLOURS FROM OBSERVATIONS
In §4, we used PCA to study the influence of SFH on broad-band colours, finding that physically-motivated SFH variations modify colours along a single direction in colour space, the SFH-direction. Then, in §5, we conducted a similar analysis of other galaxy features, finding that changes in the IMF, metallicity, dust attenuation, and nebular emission move galaxy colours along distinct directions in colour space relative to the SFH-direction. In this section, we seek to identify the galaxy feature(s) with the strongest influence on the colours of observed galaxies. To do so, we again employ the PCA-based methodology used in §4 and 5, only here we analyse the broad-band colours of a volume-limited sample of galaxies from the Sloan Digital Sky Survey (SDSS, York et al. 2000;Dawson et al. 2013). Throughout this section we only consider the four colours resulting from subtracting consecutive SDSS bands: {u-g, g-r, r-i, i-z}.
We proceed to analyse the broad-band colours of galaxies from the SDSS data release 10 (Ahn et al. 2014) with secure spectroscopic redshifts within the narrow interval 0.025 < < 0.035 and with stellar masses greater than log 10 ( * [ ]) = 9.5; the first and second criterion are introduced to minimise the impact of redshift evolution and to avoid completeness issues, respectively. We start by carrying out a PCA on the colours of the galaxies in this sample, finding that the first and second PCs explain 89.2 and 8.7% of the variance, respectively. The colours of SDSS galaxies thus span a multi-dimensional ellipsoid with significant extent along two independent dimensions.
In Fig. 9 we compare the distribution of broad-band colours of SDSS galaxies alongside theoretical expectations for the distributions that result from variations in SFH alone. Blue shaded areas display the distribution of SDSS colours, while orange solid (green dashed) contours present the distribution of simulated colours produced by variations in SFH predicted by (Illus-trisTNG) (see §4). In particular, the volume spanned by the orange and green contours is exclusively due to variations in SFH, as we hold all other galaxy properties fixed to their fiducial values. Both shaded areas and contours enclose 90% of the colour distributions.
As expected, the major axes of the and Il-lustrisTNG ellipsoids are closely aligned, supporting the thesis that the SFH-direction is universal. Additionally, the breadth of the orange and green ellipsoids is very similar, confirming that physicallymotivated SFH variations result in galaxy colours spanning a similar volume in colour space. Furthermore, the major axes of simulated and observed colours also appear to be well-aligned, suggesting that the most important galaxy property controlling broad-band colours is the star formation history. More quantitatively, we find that the angle between the SFH-direction and the first eigenvector of SDSS colours is very small, approximately 7 degrees, comparable to the small size of changes to the SFH-direction induced by variations in distinct galaxy properties (see §5.1). On the other hand, the angles between the first eigenvector of SDSS galaxies and the IMF-, -, V,y -, V,o -, 27,15,31,15, and 47 degrees, respectively. Taken together, these results suggest that broad-band optical and near-infrared galaxy colours are primarily driven by star formation history.
Despite the close correspondence of the SFH-direction with the first eigenvector of SDSS colours, in Fig. 9 we can also see that the ellipsoid of SDSS colours spans a much larger volume than that of and IllustrisTNG. Therefore, it is clear that SFH variations alone cannot completely account for the colours of observed galaxies. To help visualise this, in the bottom right panel of this figure we display the directions associated with variations in metallicity, dust attenuation, and nebular emission lines. We can see that these are indeed the directions needed to stretch the ellipsoids of simulated colours in order to cover the volume of colour space spanned by observed galaxies. Thus, even though SFH is the primary driver of galaxy colours, variations in metallicity, dust attenuation, and nebular emission lines are nonetheless required in order to account for the colours of observed galaxies.

DISCUSSION
Determining the physical characteristics of individual galaxies is one of the most common applications of SPS modelling, and there is a diversity of methodologies that have been deployed to infer SFHs in particular. Broadly speaking, all such analyses proceed by programmatically varying a SFH model together with other ingre-dients such as metallicity and dust attenuation. The observational predictions of each model are then compared to corresponding measurements of the galaxies of interest, enabling determination of the best-fit point (and sometimes confidence intervals) on the model parameter space. This approach to SFH inference has been explored using a wide variety of observations, including individual spectral features (Kauffmann et al. 2003;Chauke et al. 2018), broad-band photometry ), and colour-magnitude diagrams of resolved stellar populations (McQuinn et al. 2010) and unresolved stellar populations (Cook et al. 2019). See Conroy (2013) for a comprehensive review.
A natural way to classify the landscape of SFH models is to divide them into parametric and non-parametric approaches. In parametric models, SFHs are described by a family of functions controlled by a set of parameters that are explored during the analysis. This approach is computationally efficient and transparent, and it has been shown that even relatively simple models can capture most SFHs predicted by simulations with reasonably high fidelity (Simha et al. 2014;Diemer et al. 2017). Common choices for parameterized SFHs include single component -models (Schmidt 1959), rising SFHs aiming to explain the SED of high-redshift galaxies (Buat et al. 2008;Lee et al. 2009), log-normal SFHs motivated by measurements of cosmic star formation rate density (Gladders et al. 2013), and various two-epoch models with separate rising and declining phases at early and late times (Ciesla et al. 2017;Tinker 2017;Carnall et al. 2018). It is important to note that none of these functional forms can account for complex features such as episodic bursts of star formation, or rejuvenation following a sudden quenching event. Therefore, the simplicity of these models may bias SFH inference for galaxies with such features.
Alternatively, non-parametric models do not explicitly specify an analytical function for star formation history. Instead, they typically define SFHs based on SFRs computed at a finite set of control points. A wide range of such SFH models have been explored in the literature, including those based on piecewise constant functions (Cid Fernandes et al. 2005;Ocvirk et al. 2006) and models with time-stepping that is adaptive (Tojeiro et al. 2007;Iyer et al. 2019) and/or stochastically correlated (Caplar & Tacchella 2019). As expected, nonparametric models can predict a broader variety of features in the SFH relative to a family of simple analytical functions. However, this added flexibility usually comes at significant computational cost, and potentially encompasses freedom to capture features in SFH that are either not physical, not warranted by the available data, or both. After all, there exists a finite amount of SFH information contained in any observational dataset, and so one must be careful not to overfit limited data with a highly complex model. Even with high-resolution spectra in optical wavelengths, at most ∼ 8 episodes of star formation can be reliably recovered (Ocvirk et al. 2006). Despite photometric measurements through a few broad bands contain considerable constraining power on galaxy parameters (Pacifici et al. 2016), much information about the SFH is lost when observing a galaxy's SED through only a handful of broad-band filters. As a result, imaging data alone constrain even fewer SFH features relative to spectroscopic data .
To address the ill-conditioned nature of SFH inference with a flexible non-parametric model, it is typical to appeal to physically motivated priors to reduce the dimensionality of the problem. In one approach to applying such priors, galaxy catalogues generated by semi-analytic models or hydrodynamical simulations are treated as libraries from which SFHs are drawn (Finlator et al. 2007;Brammer et al. 2008;Pacifici et al. 2012). Synthetic galaxy catalogues can also be used as training data for a machine learning algorithm such as a Convolutional Neural Network (Lovell et al. 2019), or for a flexible basis function decomposition (Iyer & Gawiser 2017;Iyer et al. 2019). 10 Using a library of SFHs from the semi-analytic model developed in De Lucia & Blaizot (2007), Pacifici et al. (2015) studied how changes in numerous modelling ingredients propagate into variations in colours. Pacifici et al. (2015) showed that even the full panoply of SFHs predicted by the SAM is insufficient to reproduce the diversity of observed galaxy colours. One of the most important conclusions of the present work is that physically-motivated SFHs result in galaxy colours tightly localised along a single direction in colour space. Our identification of this "SFH-direction" reinforces the results presented in Pacifici et al. (2015), in addition to providing confirmation with SFHs drawn from two independent models. Moreover, we show that the SFH-direction is closely aligned with the first eigenvector of SDSS galaxies, suggesting that galaxy colours are mainly driven by star formation history.

CONCLUSIONS
The spectral energy distribution of a galaxy results from the complex interaction of many factors, including its star formation history, metallicity evolution, and dust properties. In this work, we take advantage of the newly developed galaxy spectral prediction tool to study the influence of distinct galaxy properties on broad-band optical and near-infrared colours from 3 200 to 10 800 Å at 0. Our main findings can be summarised as follows: • Using SFHs predicted by the empirical model and the cosmological hydrodynamical simulation Illus-trisTNG, we show that physically motivated SFH variations modify broad-band colours along a single direction in colour space, the SFH-direction. We find that the SFH-direction is universal, i.e. it is the same for galaxies in haloes of different masses, independently of whether these are star-forming or quenched galaxies, as well as for the two models studied. Furthermore, we show that variations in any other galaxy property have very little impact on the SFH-direction.
• We determine that the projection of optical and near-infrared galaxy colours onto the SFH-direction is mostly regulated by the fraction of stellar mass formed over the last billion years. Taken together in combination with the phenomenon of downsizing, i.e., the fact that galaxies in more massive haloes reach the peak of their SFH faster, have greater stellar mass, and quench earlier, this result accounts for the observation that galaxies get increasingly redder as their host haloes become more massive.
• We show that the broad-band colours of low redshift SDSS galaxies span a multidimensional ellipsoid with significant extent along two independent dimensions. The major axis of this ellipsoid is closely aligned with the SFH-direction, while it presents a significant angle with respect to the directions associated with variations in other galaxy properties. Considering together, these results suggest that the main driver of optical and near-infrared galaxy colours is the star formation history. Furthermore, we show that despite broadband colours are mostly controlled by SFH, changes in metallicity, dust attenuation, and nebular emission lines are also required to explain the diversity of galaxy colours from observations. 10 Such catalogs are also widely used to validate, rather than train, techniques for inferring physical galaxy properties, e.g. Laigle et al. (2019).
In a companion paper to the present work, we will exploit the simplicity of the influence of SFH upon galaxy colours to create a surrogate model for broad-band photometry. By leveraging highperformance implementations of machine-learning algorithms on GPU resources, the resulting emulator will create the capability to map approximate, but high-accuracy colours to survey-scale volumes of simulated galaxies at a tiny fraction of the time taken by traditional SPS codes. These two works will thus serve as the foundation of Surrogate modelling the Baryonic Universe (SBU), a new simulation-based method to forward-model the galaxy-halo connection. IllustrisTNG, star-forming galaxies  Figure A1. Same as Fig. 3 for SFHs drawn from IllustrisTNG. The main difference between SFHs predicted by and IllustrisTNG is that after the peak of SFR, the latter decrease more slowly (rapidly) for star-forming (quenched) galaxies.

APPENDIX A: ADDITIONAL FIGURES FOR ILLUSTRISTNG
For brevity, throughout the main body of this work we mostly included figures presenting results for . Here, we display analogous figures for IllustrisTNG.
In Fig. A1 we show SFHs predicted by IllustrisTNG for starforming and quenched galaxies (see Fig. 3 for ). We can clearly see that the SFHs predicted by these two models present different shapes. First, after cosmic star formation peaks at ∼ 1 − 3, the SFR of star-forming galaxies in decreases more rapidly relative to IllustrisTNG. Second, at fixed halo mass, galaxies in IllustrisTNG become quenched at earlier times relative to . Finally, we also find that SFHs drawn from exhibit substantially more burstiness relative to those from IllustrisTNG.
Although SFHs predicted by and Illus-trisTNG have different shapes and levels of burstiness, in both models the influence of SFH variations on colours is very similar. In Fig. A2 we display the distribution of galaxy colours resulting from the diversity of SFHs predicted by IllustrisTNG (see Fig. 4 for ). As in , SFH variations result in galaxy colours moving along a single direction in colour IllustrisTNG, star-forming galaxies  space. Because the modelling of SFH in these two models is predicated upon quite distinct assumptions, there is good reason for confidence that the universality of the SFH-direction is robust to systematic uncertainties in the true physical processes that regulate star formation.
In Fig. A3 we show the correlation between broad-band colours and * ( = 1 Gyr) at = 0.03 in IllustrisTNG (see Fig. 5 for -). We find that the projection of galaxy colours onto  Fig. 5 for colours produced using SFHs predicted by IllustrisTNG. We can see that * ( = 1 Gyr) is not only strongly correlated with colours, but also with IllustrisTNG colours.
the SFH-direction is strongly correlated with * ( = 1 Gyr) in both and IllustrisTNG. Again, given the differences between SFHs predicted by and IllustrisTNG, it is remarkable that * ( = 1 Gyr) captures so precisely the influence of SFH variations on galaxy colours. This paper has been typeset from a T E X/L A T E X file prepared by the author.