The PAU Survey: a new constraint on galaxy formation models using the observed colour redshift relation

We use the GALFORM semi-analytical galaxy formation model implemented in the Planck Millennium N-body simulation to build a mock galaxy catalogue on an observer's past lightcone. The mass resolution of this N-body simulation is almost an order of magnitude better than in previous simulations used for this purpose, allowing us to probe fainter galaxies and hence build a more complete mock catalogue at low redshifts. The high time cadence of the simulation outputs allows us to make improved calculations of galaxy properties and positions in the mock. We test the predictions of the mock against the Physics of the Accelerating Universe Survey, a narrow band imaging survey with highly accurate and precise photometric redshifts, which probes the galaxy population over a lookback time of 8 billion years. We compare the model against the observed number counts, redshift distribution and evolution of the observed colours and find good agreement; these statistics avoid the need for model-dependent processing of the observations. The model produces red and blue populations that have similar median colours to the observations. However, the bimodality of galaxy colours in the model is stronger than in the observations. This bimodality is reduced on including a simple model for errors in the GALFORM photometry. We examine how the model predictions for the observed galaxy colours change when perturbing key model parameters. This exercise shows that the median colours and relative abundance of red and blue galaxies provide constraints on the strength of the feedback driven by supernovae used in the model.


INTRODUCTION
In the effort to understand the physical processes that govern the formation and evolution of galaxies, mock galaxy catalogues have become an important tool for comparing theoretical models to observations. Wide-field galaxy redshift surveys are covering ever larger areas of the sky to increasing depths. A mock catalogue can be used to model the selection effects that are inherent to every galaxy survey. It therefore allows us to understand how these observational effects shape any measurements made from the survey, which in turn helps us to disentangle physical results from observational features.
Here, with the aim of using new observations to help constrain galaxy formation models, we build a replica of the Physics of the Accelerating Universe Survey (PAUS; Eriksen et al. 2019; Padilla et al. 2019; Serrano et al. 2023; Castander et al. in prep). Using a combination of the PAUS narrow band imaging and intermediate and broad band photometry, Eriksen et al. (2019) measured photometric redshifts for PAUS galaxies in the COSMOS field, estimating a scatter of $\sigma_{68}/(1+z) = 0.0037$ to $i_{\rm AB} = 22.5$. This is around an order of magnitude below the few per cent level that is typically obtained when using a handful of broad band filters (see also Eriksen et al. 2020; Alarcon et al. 2021; Cabayol et al. 2021; Soo et al. 2021; Cabayol et al. 2023; Navarro-Gironés et al. 2023).
Building a mock catalogue with realistic photometric redshift errors provides a way to understand the selection effects on measured statistics. We focus on two of the largest fields in PAUS, the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) W1 and W3 fields, which cover about 38 deg$^2$. Broad band imaging in the standard $u^*, g, r, i, z$ filter set is available for these fields from the CFHTLenS catalogues (Cuillandre et al. 2012; Erben et al. 2013), complementing the PAUS narrow band photometry. Despite the much improved precision of the photometric redshifts obtained using PAUS photometry, the associated errors along the line of sight remain an observational effect of concern. The rms error at $z \sim 0.3$ corresponds to a little over $10\,h^{-1}$ Mpc in comoving distance (Stothert et al. 2018). Also, around 17 per cent of the galaxies in the $i_{\rm AB} < 22.5$ sample with photometric redshifts have substantial errors in their estimated redshifts and are considered outliers (see Eq. C3 for the definition of a photometric redshift outlier, which is the one used by Eriksen et al. 2019). Such errors could alter the perceived evolution of a statistic by mixing galaxies with different properties between redshift bins: if the property evolves over a redshift range comparable to the photometric redshift errors, or if there are significant numbers of redshift outliers, the measured evolution of the statistic will be distorted. The mock catalogue allows us to investigate the impact of errors in photometry and, in turn, in photometric redshifts, on observed galaxy statistics.
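To see roughly where a line-of-sight error of this size comes from, a redshift error $\sigma_z$ can be converted into a comoving distance error via $\delta\chi \approx c\,\sigma_z/H(z)$. The Python sketch below assumes $\sigma_z = 0.0037\,(1+z)$ and a flat ΛCDM cosmology with $\Omega_{\rm m} = 0.307$; both are our illustrative choices, not the inputs used by Stothert et al. (2018).

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble_rate(z, omega_m=0.307, omega_l=0.693):
    """E(z) = H(z)/H0 for a flat LCDM model (radiation neglected)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)


def los_distance_error(z, sigma_scaled=0.0037, omega_m=0.307):
    """Comoving line-of-sight error in h^-1 Mpc for sigma_z = sigma_scaled * (1 + z).

    Uses d(chi) = c dz / H(z) with H0 = 100 h km/s/Mpc (assumed cosmology).
    """
    sigma_z = sigma_scaled * (1.0 + z)
    c_over_h0 = C_KM_S / 100.0  # Hubble distance in h^-1 Mpc
    return c_over_h0 * sigma_z / hubble_rate(z, omega_m, 1.0 - omega_m)


if __name__ == "__main__":
    print(f"{los_distance_error(0.3):.1f} h^-1 Mpc at z = 0.3")
```

Under these assumptions the error at $z = 0.3$ comes out at roughly $12\,h^{-1}$ Mpc, the same order as the value quoted above; the exact figure depends on the scatter and cosmology adopted.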
The PAU Survey complements and extends spectroscopic studies of galaxy evolution. PAUS is deeper than the Galaxy and Mass Assembly (GAMA) Survey (Driver et al. 2009). The deepest GAMA fields are limited to $r_{\rm AB} = 19.8$. For a typical galaxy colour difference of $\sim 0.4$ mag between the $r$ and $i$ bands (González et al. 2009), this corresponds roughly to $i_{\rm AB} = 20.2$, which is approximately two magnitudes shallower than the PAUS limit considered here of $i_{\rm AB} = 22.5$. (Note that the PAUS catalogue now extends to $i_{\rm AB} = 23$, but when this project was started the bulk of the available photometry was limited to $i_{\rm AB} = 22.5$.) The GAMA redshift distribution peaks at $z \sim 0.2$ with a tail that extends to $z \sim 0.5$. PAUS has the same depth as the VIMOS Public Extragalactic Redshift Survey (VIPERS; Guzzo et al. 2014; Scodeggio et al. 2018), which measured approximately 100 000 galaxy redshifts in the interval $0.5 < z < 1.2$ over 24 deg$^2$, around two-thirds of the combined solid angle of the W1 and W3 fields considered here. VIPERS used a colour preselection to target galaxies with $z \gtrsim 0.5$. As we will see, this is the peak of the redshift distribution for galaxies brighter than $i_{\rm AB} = 22.5$. The Deep Extragalactic VIsible Legacy Survey (DEVILS; Davies et al. 2018) is deeper than GAMA, with a higher completeness than surveys like VIPERS, but covers a small solid angle (6 deg$^2$) and contains 60 000 galaxies. PAUS samples the full range of galaxy redshifts to this magnitude limit, covering $0 < z < 1.2$, with about 584 000 galaxies in the W1 and W3 fields. Moreover, the galaxy selection in PAUS is genuinely magnitude limited. As we show in Section 3.1, the requirements placed on the shape or features of a galaxy spectral energy distribution in order to measure a photometric redshift are less demanding than those needed to extract a spectroscopic redshift. There is no requirement to detect individual spectral features in order to measure a redshift with a high degree of certainty, so there is no bias against objects with weak spectral breaks or emission/absorption lines. As part of their study of spectral features in PAUS galaxies, Renard et al. (2022) looked at the evolution of galaxy colour for a sample matched to the VIPERS survey mentioned above.
The redshift range covered by PAUS galaxies corresponds to a lookback time of around 8 billion years, or about two-thirds of cosmic history. Over this period a dramatic change took place in the global star formation rate (SFR) density (Madau & Dickinson 2014). The present-day SFR density is around one-tenth of the value at the peak, which occurred just above $z = 1$. Hierarchical models of galaxy formation have traditionally struggled to reproduce a drop in the global star formation activity of this size (e.g. Baugh et al. 2005; Lagos et al. 2018). The inference of the global SFR from observations is fraught with difficulties, such as accounting for the attenuation of starlight by dust, which is important at the short wavelengths that are most sensitive to recent star formation, and the 'correction' for galaxies that are too faint to be observed. Instead, we take the more direct approach of considering observed galaxy colours rather than extracting model-dependent quantities from observations. The optical colour that we use is less affected by dust attenuation than the UV fluxes used to deduce SFRs. We compare the predictions of galaxy formation models to the location and width of the red and blue clouds, and to the numbers of galaxies they contain.
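The lookback-time figures above can be checked with the closed-form age of a flat universe containing matter and a cosmological constant, $t(z) = \frac{2}{3H_0\sqrt{\Omega_\Lambda}}\,{\rm arcsinh}\!\left[\sqrt{\Omega_\Lambda/\Omega_{\rm m}}\,(1+z)^{-3/2}\right]$. The sketch below assumes Planck-like parameters ($h = 0.6777$, $\Omega_{\rm m} = 0.307$), which we chose for illustration, and neglects radiation.

```python
import math

# 1 / (1 km/s/Mpc) expressed in Gyr
GYR_TIMES_KM_S_PER_MPC = 977.79


def cosmic_time(z, h=0.6777, omega_m=0.307):
    """Age of a flat matter + Lambda universe at redshift z, in Gyr (radiation neglected)."""
    omega_l = 1.0 - omega_m
    hubble_time = GYR_TIMES_KM_S_PER_MPC / (100.0 * h)  # 1/H0 in Gyr
    x = math.sqrt(omega_l / omega_m) * (1.0 + z) ** -1.5
    return 2.0 * hubble_time / (3.0 * math.sqrt(omega_l)) * math.asinh(x)


def lookback_time(z, **cosmo):
    """Time elapsed between redshift z and today, in Gyr."""
    return cosmic_time(0.0, **cosmo) - cosmic_time(z, **cosmo)


if __name__ == "__main__":
    age = cosmic_time(0.0)
    for z in (0.5, 1.0, 1.2):
        frac = lookback_time(z) / age
        print(f"z = {z}: lookback {lookback_time(z):.2f} Gyr ({frac:.0%} of the current age)")
```

For $z = 1.2$, the upper end of the PAUS range, this gives a lookback time of roughly 8.7 Gyr, a little under two-thirds of the present age of about 13.8 Gyr.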
Optical galaxy colours are sensitive to the star formation activity in galaxies and to other intrinsic properties, such as the metallicity and overall age of the composite stellar population and the galaxy stellar mass (e.g. Daddi et al. 2007; Taylor et al. 2011; Conroy 2013; Robotham et al. 2020). Galaxy colours are also correlated with morphology (Strateva et al. 2001). Hence, by measuring galaxy colours we can in principle constrain some of the physical processes that change the star formation history of a galaxy and the chemical evolution of its stars. The relative importance of gas cooling and of heating by supernovae and AGN is expected to change over the time interval accessible through the PAUS data.
The traditional way to analyse galaxy surveys, particularly ones that cover a substantial baseline in redshift, is to estimate rest-frame luminosities for galaxies. This involves correcting for band-shifting effects: with increasing redshift, a filter in the observer frame samples progressively shorter wavelengths in the rest frame of the galaxy (Hogg et al. 2002; Kasparova et al. 2021). This correction depends on the shape of the galaxy's spectral energy distribution, which in turn depends on its star formation history, chemical evolution, stellar mass and dust content. Corrections may also be required for changes in the stellar populations over time, called evolutionary corrections in luminosity function studies (Loveday et al. 2015). To accomplish this, the survey may be split into a set of disjoint redshift shells to measure the evolution of the luminosity function; however, this results in many survey galaxies being removed from the analysis.
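The band-shifting effect can be made concrete: a filter at observed wavelength $\lambda_{\rm obs}$ samples the rest-frame wavelength $\lambda_{\rm obs}/(1+z)$. The Python sketch below applies this to a grid of narrow-band centres from 4550 to 8450 Å in 100 Å steps (our reading of the PAUS filter layout described in Section 2.4) and finds the redshift window in which a rest-frame feature, here a nominal 4000 Å break chosen purely as an example, falls inside the narrow-band coverage.

```python
# Assumed narrow-band centres in Angstrom: 40 filters, 4550-8450 in 100 A steps.
NB_CENTRES = [4550.0 + 100.0 * k for k in range(40)]


def rest_frame_wavelength(lambda_obs, z):
    """Rest-frame wavelength sampled by an observer-frame wavelength at redshift z."""
    return lambda_obs / (1.0 + z)


def feature_redshift_window(lambda_rest, lambda_min, lambda_max):
    """Redshift range over which a rest-frame feature lies inside [lambda_min, lambda_max]."""
    return lambda_min / lambda_rest - 1.0, lambda_max / lambda_rest - 1.0


if __name__ == "__main__":
    for z in (0.0, 0.5, 1.0):
        lo = rest_frame_wavelength(NB_CENTRES[0], z)
        hi = rest_frame_wavelength(NB_CENTRES[-1], z)
        print(f"z = {z}: narrow bands sample {lo:.0f}-{hi:.0f} A in the rest frame")
    z_lo, z_hi = feature_redshift_window(4000.0, 4500.0, 8500.0)
    print(f"4000 A break within coverage for {z_lo:.3f} < z < {z_hi:.3f}")
```

The same observer-frame colour therefore probes a different part of the rest-frame spectrum at each redshift, which is why a model must be band-shifted in the same way before its observer-frame colours are compared with the data.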
Here we take a simpler approach which uses all of the galaxies in a survey and tries to avoid any model dependent processing of the observations.We aim to compare the model predictions with actually observed quantities based on apparent magnitudes and redshift.In addition to basic statistics like the number counts and redshift distribution of galaxies, we also consider the evolution of the observed galaxy colours with redshift, exploiting the wide redshift baseline and homogeneous selection of PAUS.
To compare the evolution of observer-frame colours with theoretical models, it is necessary to include the sample selection and the band-shifting effects in the model predictions. We do this by building a mock catalogue on an observer's past lightcone by implementing a semi-analytical model of galaxy formation in an N-body simulation (Kitzbichler & White 2007a; Merson et al. 2013). This opens up a new set of tests of galaxy formation models: the overall galaxy number counts, the redshift distribution and the evolution of the observed colours; in the latter two cases, the statistics are measured for a specified magnitude limit. Hence, we extend the datasets typically used to calibrate galaxy formation models, such as the local luminosity function or stellar mass function, to include statistics that cover a range of redshifts and are relevant to ongoing surveys such as DESI (DESI Collaboration et al. 2016, 2022) and Euclid (Laureijs et al. 2011). We use the GALFORM galaxy formation model (Cole et al. 2000; Lacey et al. 2016) implemented in the Planck Millennium N-body simulation (Baugh et al. 2019). This extends the work of Stothert et al. (2018), as the N-body simulation used here has superior resolution in mass and time. This allows us to include fainter galaxies in the mock catalogue and to make more accurate predictions for galaxy positions and luminosities. Also, since Stothert et al. (2018), sufficient PAUS data have been collected to allow accurate measurements of the basic galaxy statistics. A similar exercise was carried out by Bravo et al. (2020), who compared observed colours from a lightcone mock catalogue built using the SHARK semi-analytical model of Lagos et al. (2019) with the GAMA survey; here we extend this comparison to higher redshift.
The remainder of the paper is laid out as follows: we first describe the theoretical framework used to build the PAUS mock in Sect. 2, then present our main analysis and results in Sect. 3. In § 3.4 we show how sensitive the model predictions are to the parameter choices. Finally, we present our conclusions in Sect. 4.

THEORETICAL MODEL AND OBSERVATIONAL DATASET
Here we describe the theoretical model, covering the galaxy formation model ( § 2.1), the N-body simulation in which it is implemented ( § 2.2), the construction of the lightcone mock catalogue ( § 2.3), before introducing the PAUS dataset in § 2.4.

Galaxy formation model
We use the GALFORM semi-analytical model of galaxy formation (Cole et al. 2000; Bower et al. 2006; Lacey et al. 2016). The model follows the key physical processes that shape the formation and evolution of galaxies in the cold dark matter cosmology (for reviews of these processes and semi-analytical models see Baugh 2006 and Benson 2010). The model tracks the transfer of mass and metals between different reservoirs of baryons, predicting the chemical evolution of the gas that is available to form stars and the full star formation history of galaxies. When implemented in an N-body simulation, the semi-analytical model also provides predictions for the spatial distribution of galaxies (Kauffmann et al. 1999; Benson et al. 2000).
The calibration of the model parameters is described in Lacey et al. (2016), who provide a list of the model parameters in their table 1. Mostly local observational data are used in the calibration, which historically has been performed by hand in a 'chi-by-eye' approach. Elliott et al. (2021) describe an automated and reproducible calibration that can perform an exhaustive search of a high-dimensional parameter space. Here we use the version of the model introduced by Gonzalez-Perez et al. (2014) (hereafter GP14), as recalibrated by Baugh et al. (2019) following its implementation in the P-Millennium N-body simulation (which is described in Sect. 2.2). This recalibration required changes to the values of two parameters: the velocity that sets the mass loading of winds driven by supernovae, and the strength of AGN feedback (the parameter in this case effectively determines the halo mass at which AGN heating is able to stop gas cooling). We note that Stothert et al. (2018) used the GP14 version of GALFORM; the model used here is essentially the same, apart from the two small changes to the parameter values outlined above (see Baugh et al. 2019 for more details).
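To make the first of these parameters concrete: in GALFORM the rate at which gas is ejected by supernova-driven winds is commonly written as $\dot M_{\rm eject} = \beta\,\psi$, where $\psi$ is the star formation rate and the mass loading is $\beta = (V_{\rm c}/V_{\rm SN})^{-\gamma_{\rm SN}}$, with $V_{\rm c}$ the circular velocity of the galaxy (Cole et al. 2000; Lacey et al. 2016). The sketch below evaluates this form; the parameter values are illustrative placeholders, not the GP14 or Baugh et al. (2019) values.

```python
def mass_loading(v_circ, v_sn, gamma_sn):
    """Supernova mass-loading factor beta = (V_c / V_SN)^(-gamma_SN).

    v_circ and v_sn are in km/s; beta is the mass ejected per unit mass of stars formed.
    """
    if v_circ <= 0 or v_sn <= 0:
        raise ValueError("velocities must be positive")
    return (v_circ / v_sn) ** (-gamma_sn)


def ejection_rate(sfr, v_circ, v_sn, gamma_sn):
    """Outflow rate (same units as sfr) driven by supernova feedback."""
    return mass_loading(v_circ, v_sn, gamma_sn) * sfr


if __name__ == "__main__":
    # Illustrative values only: a larger V_SN gives stronger feedback at fixed V_c.
    for v_sn in (300.0, 380.0):
        betas = [mass_loading(v, v_sn, 3.2) for v in (50.0, 100.0, 200.0)]
        print(v_sn, [f"{b:.1f}" for b in betas])
```

Because $\beta$ rises steeply towards low circular velocities, changing $V_{\rm SN}$ mainly alters the gas supply and star formation histories of low-mass galaxies, which is one plausible route by which galaxy colours can respond to the strength of supernova feedback.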

The Planck Millennium N-body simulation
The Planck Millennium N-body simulation is part of the 'Millennium' series of simulations of structure formation (Springel et al. 2005; Guo et al. 2013; see table 1 in Baugh et al. 2019 for a summary of the specifications of these runs and the cosmological parameters used). The Planck Millennium follows the evolution of the matter distribution in a volume of $5.12 \times 10^8$ Mpc$^3$, which is 1.43 times larger, after taking into account the differences in the assumed Hubble parameters, than the simulation described by Guo et al. (2013), which was used by Stothert et al. (2018) to build an earlier mock catalogue for PAUS.
The Planck Millennium uses over 128 billion particles ($5040^3$) to represent the matter distribution, which is over an order of magnitude more than was used in the earlier Millennium runs. This, along with the simulation volume, places the Planck Millennium at a resolution intermediate between that of the Millennium-I simulation of Springel et al. (2005) (hereafter MSI) and the Millennium-II run described by Boylan-Kolchin et al. (2009). The Planck Millennium has many more outputs than the MSI, with the halos and subhalos stored at 271 redshifts compared with the ∼60 outputs used in the MSI. Dark matter halo merger trees were constructed from the SUBFIND subhalos (Springel et al. 2001) using the DHALOS algorithm described in Jiang et al. (2014) (see also Merson et al. 2013). Halos that contain at least 20 particles were retained, corresponding to a halo mass resolution limit of $2.12 \times 10^9\,h^{-1}\,{\rm M}_\odot$.
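The volume ratio and halo mass limit quoted above can be reproduced with a few lines of arithmetic. The sketch assumes a P-Millennium box side of 542.16 $h^{-1}$ Mpc with $h = 0.6777$ and $\Omega_{\rm m} = 0.307$, and a Guo et al. (2013) box of 500 $h^{-1}$ Mpc with $h = 0.704$; these values are our recollection of the published specifications and should be checked against table 1 of Baugh et al. (2019).

```python
RHO_CRIT_H2 = 2.775e11  # critical density in h^2 Msun Mpc^-3 (= h^-1 Msun per (h^-1 Mpc)^3)


def box_volume_mpc3(side_h_mpc, h):
    """Comoving volume in Mpc^3 of a cube whose side is given in h^-1 Mpc."""
    return (side_h_mpc / h) ** 3


def particle_mass(side_h_mpc, n_side, omega_m):
    """Mass per particle in h^-1 Msun for n_side^3 particles in a box of side side_h_mpc."""
    return RHO_CRIT_H2 * omega_m * side_h_mpc ** 3 / n_side ** 3


pmill_volume = box_volume_mpc3(542.16, 0.6777)  # assumed P-Millennium box
guo_volume = box_volume_mpc3(500.0, 0.704)      # assumed Guo et al. (2013) box
m_part = particle_mass(542.16, 5040, 0.307)
m_halo_min = 20 * m_part                        # 20-particle halo limit

if __name__ == "__main__":
    print(f"particles: {5040 ** 3:.3e}")
    print(f"P-Millennium volume: {pmill_volume:.3e} Mpc^3")
    print(f"volume ratio: {pmill_volume / guo_volume:.2f}")
    print(f"halo mass limit: {m_halo_min:.3e} h^-1 Msun")
```

With these inputs the particle mass is roughly $1.06 \times 10^8\,h^{-1}\,{\rm M}_\odot$, so the 20-particle limit lands at about $2.1 \times 10^9\,h^{-1}\,{\rm M}_\odot$.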

Building a lightcone mock catalogue
The construction of a mock catalogue for a cosmological redshift survey can be accomplished in different ways, resulting in predictions with different accuracies, and which inform us to different extents about the physics behind galaxy formation.In principle, a simple approach would be to sample a population of galaxies randomly from an observed statistical distribution such as the luminosity function.However, this would lead to a catalogue with information limited to the property studied in the statistical distribution, ignoring any other properties and their relation with other observables.Moreover, the biggest limitation is that such a simplistic catalogue would not even be able to track the evolution of the galaxy population with redshift.
To build a more realistic catalogue we need to track the evolution of the dark matter structures and populate the dark matter halos with galaxies at different epochs.Here, we make use of the Planck Millennium N-body simulation described in the previous section.
To populate dark matter halos in the simulation with galaxies, we implemented the GALFORM semi-analytic model of galaxy formation on the merger histories of the dark matter halos extracted from the simulation.The combination of the Planck Millennium and GALFORM results in a physically motivated model which includes environmental effects associated with the merger histories of halos, and gives predictions for the spatial distribution of galaxies.GALFORM predicts the chemical evolution of the gas and stars in each galaxy, along with the size of the disc and bulge components and their star formation histories.The model outputs the mass-to-light ratios in a list of filters that are specified at run time.Along with the model for attenuation of stellar emission by dust described in Cole et al. (2000) (see also Lacey et al. 2016), this allows the model to predict the brightness or magnitude of the model galaxies in these bands.
GALFORM outputs the properties and positions of the galaxy population in the simulation box at a discrete set of redshift outputs.The lightcone is built by interpolating galaxy magnitudes and positions between the values at these discrete redshifts, using the redshift at which the galaxy crosses the observer's lightcone.Thanks to the high time resolution of the Planck Millennium outputs, the reliability of the interpolation process described below is increased compared to that in earlier Millennium simulations.
To build the PAUS mock we follow the procedure described in Merson et al. (2013). We first place an observer at some position inside the simulation box, and choose a line-of-sight direction for the mid-point of the survey, and a solid angle. Given the size of the simulation box, using this volume on its own we would only be able to probe redshifts out to $z \approx 0.19$. Hence, to cover the volume sampled by PAUS we need to replicate the simulation box in space using the periodic boundary conditions of the simulation. A galaxy crosses the past lightcone of the observer between two of the simulation output redshifts, or snapshots. The positions of the galaxy in the two snapshots are used to estimate its position at the lightcone crossing. Merson et al. (2013) applied different interpolation procedures for central and satellite galaxies. Central galaxies are assumed to be at the centre of mass of the host dark matter halo and hence track its motion between the snapshots; in this case, a simple linear interpolation is sufficient. Satellite galaxies, on the other hand, follow more complicated paths and can enter the observer's past lightcone either before or after their associated central. For this reason, a more sophisticated treatment is needed to compute the position of a satellite galaxy, taking into account its orbit around the central (see fig. 2 of Merson et al. 2013). Interpolating the galaxy positions in this way minimises artificial jumps in the correlation function measured from the lightcone.
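The $z \approx 0.19$ limit follows from asking at what redshift the line-of-sight comoving distance equals one box side, about 800 Mpc for the volume quoted in Sect. 2.2. That is the relevant scale if the observer sits at a corner and looks along an axis; this geometry, like the cosmological parameters below, is our assumption. The sketch integrates $\chi(z) = (c/H_0)\int_0^z dz'/E(z')$ with Simpson's rule and inverts it by bisection.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h=0.6777, omega_m=0.307, n=1000):
    """Line-of-sight comoving distance in Mpc for flat LCDM (Simpson's rule, n even)."""
    omega_l = 1.0 - omega_m

    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + omega_l)

    step = z / n
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * step)
    return C_KM_S / (100.0 * h) * total * step / 3.0


def redshift_at_distance(d_mpc, z_lo=0.0, z_hi=5.0, tol=1e-8):
    """Invert comoving_distance by bisection; it increases monotonically with z."""
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        if comoving_distance(z_mid) < d_mpc:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)


BOX_SIDE_MPC = (5.12e8) ** (1.0 / 3.0)  # ~800 Mpc, from the quoted volume

if __name__ == "__main__":
    print(f"one box side reaches z = {redshift_at_distance(BOX_SIDE_MPC):.3f}")
    print(f"box lengths to z = 1.2: {comoving_distance(1.2) / BOX_SIDE_MPC:.1f}")
```

Under the same assumptions, reaching $z \approx 1.2$ requires the box to be replicated roughly five times along the line of sight.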
Assigning properties to galaxies as they cross the observer's past lightcone using a simple interpolation between snapshots could lead to inaccuracies.The evolution of some properties, such as the SFR, is too complicated to be modelled by simple linear interpolation.Star formation can result from stochastic events, such as galaxy mergers and mass flows triggered by dynamically unstable discs, as well as smoother quiescent star formation in the galactic disc.For this reason, we follow Merson et al. (2013) and simply retain the galaxy properties from the higher redshift snapshot just above the redshift of lightcone crossing (as suggested by Kitzbichler & White 2007a).Given the higher frequency of simulation outputs in the Planck Millennium run, the errors associated with this treatment are smaller than in previous Millennium simulations.
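The following is a minimal sketch of the lightcone-crossing step for a central galaxy. It linearly interpolates both position and redshift between the two snapshots that bracket the crossing, and finds by bisection the point at which the galaxy's distance from the observer equals the comoving distance to the lightcone. As described above, it keeps the intrinsic properties from the higher-redshift snapshot. The interpolation variable, the cosmology and the function names are our illustrative choices, not those of Merson et al. (2013); satellites would need the orbit-based treatment described earlier instead.

```python
import math

C_KM_S = 299792.458


def comoving_distance(z, h=0.6777, omega_m=0.307, n=400):
    """Flat LCDM comoving distance in Mpc (Simpson's rule, n even)."""
    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)

    step = z / n
    total = inv_e(0.0) + inv_e(z) + sum(
        (4.0 if k % 2 else 2.0) * inv_e(k * step) for k in range(1, n))
    return C_KM_S / (100.0 * h) * total * step / 3.0


def lightcone_crossing(x_hi, x_lo, z_hi, z_lo, tol=1e-9):
    """Return (z_cross, position) for a central galaxy, or None if it does not cross.

    x_hi, x_lo: comoving positions in Mpc (observer at the origin) at snapshots
    z_hi > z_lo. The fraction f = 0 corresponds to z_hi and f = 1 to z_lo.
    """
    def state(f):
        z = z_hi + f * (z_lo - z_hi)
        pos = tuple(a + f * (b - a) for a, b in zip(x_hi, x_lo))
        return z, pos

    def offset(f):
        z, pos = state(f)
        return math.dist(pos, (0.0, 0.0, 0.0)) - comoving_distance(z)

    f_lo, f_hi = 0.0, 1.0
    g_lo = offset(f_lo)
    if g_lo * offset(f_hi) > 0.0:
        return None  # no crossing between these two snapshots
    while f_hi - f_lo > tol:
        f_mid = 0.5 * (f_lo + f_hi)
        g_mid = offset(f_mid)
        if g_lo * g_mid <= 0.0:
            f_hi = f_mid
        else:
            f_lo, g_lo = f_mid, g_mid
    return state(0.5 * (f_lo + f_hi))


def lightcone_entry(props_hi, x_hi, x_lo, z_hi, z_lo):
    """Attach crossing redshift and position to properties from the z_hi snapshot."""
    crossing = lightcone_crossing(x_hi, x_lo, z_hi, z_lo)
    if crossing is None:
        return None
    z_cross, pos = crossing
    return dict(props_hi, z_cross=z_cross, position=pos)
```

Keeping `props_hi` unchanged mirrors the choice above of not interpolating quantities such as the SFR, which can vary stochastically between snapshots.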
The one exception to this is the magnitude of the galaxy in the prespecified filters in the observer frame. The definition of the observer frame depends on redshift and so is slightly different at the two redshifts that straddle the lightcone crossing redshift. We perform a linear interpolation between these two versions of the observer-frame magnitudes to compute the observed magnitude at the redshift of lightcone crossing. In addition to the band shifting of the observer frame, we need to use the luminosity distance that corresponds to the lightcone crossing redshift to compute the apparent magnitude of the galaxy in the mock. This approach does not take into account any change in the spectral energy distribution of the galaxy between the higher redshift snapshot and the lightcone crossing redshift. However, the resulting colour-redshift relation is smooth and contains no trace of the locations of the simulation snapshots, as shown in Fig. A1 in Appendix A. Here we test the interpolation scheme further by estimating photometric redshifts for the mock galaxies (see Section 3.2) and by looking at the colour-redshift relation defined using colours obtained from the PAUS narrow band filters (Appendix A).
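The magnitude step can be sketched in the same spirit. For illustration, we assume that the snapshot outputs are observer-frame absolute magnitudes that already include the band-shifting terms. The crossing-redshift magnitude is then a linear interpolation in redshift between the two snapshots, and the apparent magnitude adds the distance modulus for the luminosity distance $D_{\rm L} = (1+z)\,\chi(z)$ of a flat universe. Both the interpolation variable and the magnitude convention are our assumptions, not a description of the GALFORM code.

```python
import math

C_KM_S = 299792.458


def comoving_distance(z, h=0.6777, omega_m=0.307, n=400):
    """Flat LCDM comoving distance in Mpc (Simpson's rule, n even)."""
    def inv_e(x):
        return 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)

    step = z / n
    total = inv_e(0.0) + inv_e(z) + sum(
        (4.0 if k % 2 else 2.0) * inv_e(k * step) for k in range(1, n))
    return C_KM_S / (100.0 * h) * total * step / 3.0


def interpolate_magnitude(mag_hi, mag_lo, z_hi, z_lo, z_cross):
    """Linear interpolation in redshift between the two snapshot magnitudes."""
    weight = (z_hi - z_cross) / (z_hi - z_lo)
    return mag_hi + weight * (mag_lo - mag_hi)


def distance_modulus(z):
    """5 log10(D_L / 10 pc) with D_L = (1 + z) * comoving distance (flat universe)."""
    d_l_pc = (1.0 + z) * comoving_distance(z) * 1.0e6
    return 5.0 * math.log10(d_l_pc / 10.0)


def apparent_magnitude(mag_hi, mag_lo, z_hi, z_lo, z_cross):
    """Interpolated observer-frame absolute magnitude plus the distance modulus at z_cross."""
    abs_mag = interpolate_magnitude(mag_hi, mag_lo, z_hi, z_lo, z_cross)
    return abs_mag + distance_modulus(z_cross)
```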
Using the methods set out above, we have built a mock catalogue for PAUS which covers approximately 100 deg$^2$, with a magnitude limit of $i_{\rm AB} = 24$. We used P-Millennium snapshots in the redshift range $0 < z < 2$. For some applications, we impose a magnitude limit of $i_{\rm AB} = 22.5$ on the mock. Some of the predictions we present include a simple model for errors in the photometry of GALFORM galaxies, which is set out in Appendix B. In this case, the magnitude limit is imposed after applying the perturbations to the raw magnitudes to account for the photometric errors.
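The order of operations matters: once errors are added, galaxies intrinsically fainter than the limit can scatter above it, and vice versa. The sketch below illustrates this with a purely hypothetical error model, a Gaussian scatter whose width grows as $10^{0.4(m - m_{\rm ref})}$, as it would for a constant flux error. It is not the model of Appendix B, and `sigma_ref` and `m_ref` are placeholder values.

```python
import random


def perturb_magnitudes(mags, sigma_ref=0.05, m_ref=22.5, seed=1):
    """Add Gaussian errors to magnitudes (hypothetical error model).

    The error is sigma_ref at m_ref and scales as 10^(0.4 (m - m_ref)),
    i.e. a constant flux error expressed in magnitudes (small-error limit).
    """
    rng = random.Random(seed)
    return [m + rng.gauss(0.0, sigma_ref * 10.0 ** (0.4 * (m - m_ref))) for m in mags]


def apply_magnitude_limit(mags, limit=22.5):
    """Keep objects brighter (numerically smaller) than the limit."""
    return [m for m in mags if m < limit]


if __name__ == "__main__":
    rng = random.Random(0)
    raw = [rng.uniform(18.0, 24.0) for _ in range(100000)]  # deep mock to m = 24
    clean = apply_magnitude_limit(raw)
    noisy = apply_magnitude_limit(perturb_magnitudes(raw))  # errors first, then the cut
    print(len(clean), len(noisy))
```

Applying the cut to a catalogue deeper than the limit, as done here with the $i_{\rm AB} = 24$ mock, is what allows objects to scatter in across the limit.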

The PAU Survey
We test the GALFORM lightcone against the Physics of the Accelerating Universe Survey (PAUS). PAUS was carried out using PAUCam (Padilla et al. 2019), a camera mounted on the William Herschel Telescope (WHT) in La Palma, Spain. PAUS is a novel imaging survey whose key feature is its set of 40 narrow-band filters of width 130 Å, covering the wavelength range from 4500 Å to 8500 Å and spaced by 100 Å (see fig. 1 in Renard et al. 2022). The 40 PAUS narrow bands overlap the wavelength range covered by the CFHTLenS $g$, $r$ and $i$ broadband filters (Erben et al. 2013), as shown in fig. 1 of Stothert et al. (2018) and Renard et al. (2022). The narrow bands are particularly important when estimating photometric redshifts. The precision that PAUS can achieve is intermediate between that typically achieved with a handful of broadband filters and that obtained with spectroscopy in a large-scale structure survey, in which the spectral resolution and exposure time are chosen to maximise the number of redshifts that can be measured. Eriksen et al. (2019) report a scatter of $\sigma_{68} \sim 0.0037$ in $\Delta z = (z_{\rm photo} - z_{\rm spec})/(1 + z_{\rm spec})$ when selecting the 'best' 50 per cent of the PAUS photometric redshifts in the COSMOS field limited at $i_{\rm AB} = 22.5$. PAUS observations are available for the CFHTLenS wide fields W1, W3 and W4, and for the W2 field, which corresponds to the Kilo Degree Survey (KiDS; Kuijken et al. 2019). For this study, we use the largest PAUS fields: W1, covering 13.71 deg$^2$, and W3, covering 24.27 deg$^2$ (a total of 37.98 deg$^2$). We use photometric redshifts estimated with the BCNZ2 code, following the approach of Eriksen et al. (2019). We note that improved estimates of the photometric redshifts in PAUS have also been produced in a series of papers (Eriksen et al. 2020; Alarcon et al. 2021; Cabayol et al. 2021; Navarro-Gironés et al. 2023).
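For reference, the $\sigma_{68}$ statistic is commonly defined as half the width of the interval containing the central 68 per cent of the $\Delta z$ values, i.e. half the difference between the 84.1 and 15.9 percentiles; we assume that definition here. The sketch also shows how a 'best 50 per cent' subsample can be selected by ranking objects on a per-object quality score. The score is a hypothetical stand-in for the quality flag of Eriksen et al. (2019), and lower values are taken to mean better redshifts.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def sigma68(z_photo, z_spec):
    """Half the central 68 per cent width of dz = (z_photo - z_spec) / (1 + z_spec)."""
    dz = sorted((zp - zs) / (1.0 + zs) for zp, zs in zip(z_photo, z_spec))
    return 0.5 * (percentile(dz, 84.135) - percentile(dz, 15.865))


def best_fraction(quality, fraction=0.5):
    """Indices of the objects with the lowest (best) hypothetical quality scores."""
    order = sorted(range(len(quality)), key=lambda i: quality[i])
    return order[: int(round(fraction * len(quality)))]


if __name__ == "__main__":
    rng = random.Random(42)
    z_spec = [rng.uniform(0.0, 1.2) for _ in range(20000)]
    z_photo = [zs + (1.0 + zs) * rng.gauss(0.0, 0.0037) for zs in z_spec]
    print(f"sigma68 = {sigma68(z_photo, z_spec):.4f}")
```

Because $\sigma_{68}$ is built from percentiles rather than a standard deviation, it is insensitive to the catastrophic outliers discussed in the Introduction, which is why outliers are usually quoted separately.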

RESULTS
We first describe some basic properties of the lightcone mock, such as its visual appearance, number counts and redshift distribution ( § 3.1), before describing the estimation of photometric redshifts for the mock galaxies, using a simple, approximate model for flux errors ( § 3.2) and then comparing the evolution of the observed colours with PAUS ( § 3.3).Finally, we assess the sensitivity of galaxy colours to the model parameters ( § 3.4).

Basic results: number counts and redshift distribution
In this section, we discuss the basic predictions of the simulated lightcone to show that they can reproduce the trends observed in the PAUS observations. One important feature of the lightcone is its magnitude limit. For some purposes, the magnitude limit of $i_{\rm AB} = 22.5$ is imposed on the magnitudes of mock galaxies without photometric errors. In other cases, the mock galaxy magnitudes are perturbed as described in Section 3.2 and Appendix B, and the magnitude limit is applied to a deeper catalogue to investigate the impact of photometric errors. The narrow band photometry has been computed using the transmission curves estimated by Casas et al. (2016) and Padilla et al. (2019) for the PAUCam optical system, and the broadband photometry has been computed from the transmission curves used in CFHTLenS (Erben et al. 2013).
The distribution of the mock galaxies on the sky for three representative redshift bins is shown in Fig. 1, where we have split the galaxies into red and blue populations according to their observed colour (see Eq. 2 and the associated discussion in Section 3.3). The spatial scale in these images is indicated by the bar, which shows a scale of 10 Mpc and allows us to compare the size of the structures at different redshifts. As shown in previous studies (e.g. Zehavi et al. 2011 using Sloan Digital Sky Survey galaxies), red galaxies tend to cluster more strongly than blue ones. This is driven by environmental effects, such as the quenching of gas cooling and star formation when galaxies fall into the potential well of a more massive host dark matter halo (for example due to ram pressure stripping or other processes that remove gas from galaxies, such as tidal interactions). In the first row of Fig. 1 ($0 < z < 0.07$), this effect is clearly visible, with structures traced out by red galaxies being sharply defined compared to the more 'diffuse' distribution of blue galaxies seen in the right panel. In the middle row of Fig. 1 ($0.50 < z < 0.51$), as we zoom out, a larger region of the cosmic web is visible. The difference in the contrast of the structures seen with red or blue galaxies is now less pronounced, but still present, with the structures traced by blue galaxies appearing somewhat less sharp than those mapped by the red galaxies. In the bottom row of Fig. 1, which shows the redshift slice $0.90 < z < 0.91$, we can see that although the total number of galaxies is lower than in the lower redshift bins, the relative numbers of red and blue galaxies are reversed (i.e. there are now more blue galaxies than red ones), due to the general increase in star formation activity with increasing redshift. Now that we have gained a visual impression of the galaxies in the lightcone, and have seen how different colour populations trace out structures, we are ready to perform more quantitative analyses.
Figure 2. Galaxy number counts predicted by GALFORM (thick green line) compared with the COSMOS counts of Capak et al. (2007) (red points) and the PAUS data in the W1 and W3 fields for different selections: the full photometric sample (orange line), which includes all objects observed in the narrow-band (NB) filter NB455 (these objects might not have a redshift estimate); objects with star_flag = 0 (blue line), which have been classified as galaxies by the CFHTLenS star-galaxy separation algorithm; objects with star_flag = 1 (violet line), which have been classified as stars; the total photo-z sample (pink line), comprising the galaxies that have a PAUS redshift estimate (they need to be observed in a large fraction of the NB filters); and the best-quality 50 per cent of the redshift sample (brown line), selected using the quality flag described in Eriksen et al. (2019).
The first simple characteristic measure of an optically selected galaxy sample is the number counts as a function of magnitude. We plot the $i$-band number counts in Fig. 2. The blue line in Fig. 2 represents an estimate of the observed galaxy number counts for PAUS in the W1 and W3 fields (which cover 13.71 deg$^2$ and 24.27 deg$^2$ respectively, giving a total of 37.98 deg$^2$). This is the area covered by PAUS observations with at least one measurement in the narrow band filter at 455 nm. This results in a more complete sample than the PAUS photo-z catalogue, because a photometric redshift can only be measured if the galaxy has been imaged in at least 30 of the 40 narrow band filters (as well as in the 5 CFHTLenS broad bands from the parent catalogue), a target that is not always met by the PAUCam imaging (Padilla et al. 2019). We also include the number counts of the subsample of galaxies with photometric redshifts (pink line). The photo-z catalogue covers 9.73 deg$^2$ and 20.37 deg$^2$ in W1 and W3 respectively, giving a total of 30.10 deg$^2$, which is 79 per cent of the photometric sample area. Importantly, the shape of the number counts is the same for the photometric (blue line) and the photometric redshift (pink line) catalogues, which means that we expect their statistical properties to be similar, modulo a simple sampling factor (the median ratio between the number counts of the sample with photometric redshifts and those of the full photometric sample), which we estimate to be about 0.897. It is common practice in photometric redshift studies to apply cuts on the quality of the redshift estimates to define a new subsample of the catalogue for a particular analysis. The number counts for the best 50 per cent of the photo-z sample are shown by the brown line in Fig. 2.
In this case, the shape of the number counts starts to depart from that of the photometric sample at magnitudes fainter than $i_{\rm AB} \sim 20$. This occurs because the fraction of objects with poorer quality factors increases towards fainter magnitudes. This is an important factor to consider when performing statistical tests, and the impact of this cut on galaxy colours is considered later on.
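The sampling factor is a simple median of per-bin ratios. The sketch below assumes that both sets of counts are already expressed per square degree and binned identically; the example counts in the demonstration are made up. The area fraction quoted above is computed alongside for comparison.

```python
import statistics


def sampling_factor(counts_photoz, counts_phot):
    """Median over magnitude bins of photo-z counts / photometric counts."""
    ratios = [a / b for a, b in zip(counts_photoz, counts_phot) if b > 0]
    if not ratios:
        raise ValueError("no populated bins")
    return statistics.median(ratios)


AREA_PHOTOZ = 9.73 + 20.37  # deg^2, W1 + W3 photo-z catalogue
AREA_PHOT = 13.71 + 24.27   # deg^2, W1 + W3 photometric sample

if __name__ == "__main__":
    # Made-up counts per deg^2 in four magnitude bins, for illustration only.
    print(sampling_factor([88.0, 180.0, 361.0, 700.0], [100.0, 200.0, 400.0, 790.0]))
    print(f"area fraction: {AREA_PHOTOZ / AREA_PHOT:.2f}")
```

Using the median rather than the mean keeps the factor robust to the noisy, sparsely populated bright bins.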
The blue curve in Fig. 2 is the best estimate of the galaxy number counts, after applying a simple cut to remove stars from the photometric catalogue. The raw uncorrected counts of all objects in the PAUS photometric catalogue are shown by the orange curve. The property star_flag, defined in the CFHTLenS catalogue, is used to remove stars. Objects with star_flag = 1, which are deemed to be stars, are shown by the violet curve. Note that the methodology used to assign the star_flag value changes at $i_{\rm AB} = 21$. At magnitudes brighter than this, the size of the image is compared to the size of the point spread function, with unresolved objects being classified as stars. At fainter magnitudes, an object has to be unresolved and a good fit to a stellar template to be labelled as a star (Erben et al. 2013). After removing stars in this way, the galaxy counts (blue curve) agree well with a previous estimate from the smaller COSMOS field (∼4 deg$^2$) by Capak et al. (2007) (red points; these counts extend to fainter magnitudes than shown in the plot). The number counts predicted by GALFORM, measured from the lightcone, are shown by the thick green line. These agree remarkably well with both the COSMOS and PAUS measurements, particularly in view of the fact that mainly local observations were used to calibrate the model.
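The two-regime classification described above can be expressed as a small decision function. The thresholds (`size_tolerance`, `chi2_max`) and the input quantities are hypothetical placeholders chosen for illustration; the actual CFHTLenS criteria are given by Erben et al. (2013).

```python
def classify_star(i_mag, size, psf_size, stellar_chi2,
                  size_tolerance=1.1, chi2_max=2.0, i_switch=21.0):
    """Return 1 for a star, 0 for a galaxy (hypothetical thresholds).

    Brighter than i_switch: unresolved objects (size close to the PSF) are stars.
    Fainter: an object must be unresolved AND well fitted by a stellar template.
    """
    unresolved = size <= size_tolerance * psf_size
    if i_mag < i_switch:
        return int(unresolved)
    return int(unresolved and stellar_chi2 <= chi2_max)
```

Galaxies are then the objects for which this returns 0, analogous to the star_flag = 0 selection used for the blue curve.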
As a further test of the GALFORM predictions for galaxy number counts, we compare with the target density of galactic sources in the Dark Energy Spectroscopic Instrument Bright Galaxy Survey (DESI BGS) input catalogue estimated by Ruiz-Macias et al. (2020, 2021) (see also Hahn et al. 2023). Ruiz-Macias et al. (2021) find an integrated surface density of sources to r_AB = 19.5 of 808 deg⁻². In the GALFORM mock we find 837 deg⁻² to the same depth, which agrees with the DESI value to within 5 per cent. For PAUS, combining the W1 and W3 fields, we obtain a surface density of 719 deg⁻², about 10 per cent lower than the DESI BGS value. However, the combined area of the W1 and W3 fields (for the photometric sample) is 37.98 deg², i.e. 400 times smaller than the imaging data used to obtain the DESI BGS estimate. The counts from the PAUS fields could therefore be affected by sample variance.
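The percentages above follow from simple arithmetic on the quoted surface densities. The sketch below reproduces them and adds one supporting step that is our own, not from the text: the Poisson (shot-noise) fractional error expected for the number of PAUS galaxies involved, ignoring clustering. Only the numbers 808, 837, 719 and 37.98 come from the text; the helper name is hypothetical.

```python
import math

# Integrated surface densities (deg^-2) to 19.5 AB quoted in the text.
desi_bgs = 808.0
galform = 837.0
paus = 719.0
paus_area_deg2 = 37.98  # combined W1 + W3 photometric area

def fractional_difference(value, reference):
    """(value - reference) / reference."""
    return (value - reference) / reference

print(f"GALFORM vs DESI BGS: {fractional_difference(galform, desi_bgs):+.1%}")
print(f"PAUS vs DESI BGS:    {fractional_difference(paus, desi_bgs):+.1%}")

# Shot-noise-only error on the PAUS count (no clustering contribution).
n_paus = paus * paus_area_deg2
print(f"Poisson fractional error: {1.0 / math.sqrt(n_paus):.1%}")
```

Pure Poisson noise on roughly 27 000 galaxies is well below 1 per cent, so shot noise alone cannot account for an offset of about 10 per cent. This is consistent with, though it does not prove, the suggestion that sample variance from clustering matters for fields of this size; other differences between the catalogues could also contribute.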
After the number counts, the next statistic that characterises the galaxy population is the redshift distribution, the number of galaxies per square degree as a function of redshift. We show the redshift distribution of galaxies to two flux limits in Fig. 3: a magnitude limit of 19.8 (AB) in the left panel, the depth of the deepest fields in the GAMA survey (Driver et al. 2011), and the substantially deeper PAUS limit of i_AB = 22.5 in the right panel.
The distribution of photometric redshifts in the combined W1 and W3 PAUS fields is shown by the red histograms in Fig. 3. These distributions are obtained by imposing the flux limit used in each panel, along with a selection on a star-galaxy separation parameter to reduce the contamination by stars (i.e. only retaining objects with star_flag = 0). The normalisation of the redshift distribution has been corrected for the offset between the number counts of objects in the photometric sample and in the photo-z sample (the sampling factor described above). The left panel of Fig. 3 also shows a fit to the observed redshift distribution from the GAMA survey made by Smith et al. (2017). This agrees well with the distribution of photometric redshifts from the W1 and W3 PAUS fields, which together correspond to about one-fifth of the total solid angle probed by GAMA. Note that in the right panel of Fig. 3, by construction, the photometric redshift code does not return redshifts above z = 1.1. It is also clear from this panel that there is a preference for photometric redshifts around z ∼ 0.75; this is a systematic effect in the redshift estimation that is being investigated by the PAUS team, rather than a feature due to large-scale structure. In contrast, the feature at z ∼ 0.15 is due to large-scale structure (see figure 13 of Navarro-Gironés et al. 2023). At low redshift the survey samples a smaller volume than at high redshift, so the redshift distribution is more susceptible to fluctuations due to features such as clusters.
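The normalisation described above (galaxies per square degree per unit redshift, corrected by the sampling factor) can be sketched as follows. This is an illustration under stated assumptions, not the PAUS pipeline: the function name is hypothetical, `area_deg2` is taken to be the area of the catalogue the redshifts come from, and the redshifts in the example are invented.

```python
import numpy as np

def redshift_distribution(z, area_deg2, bin_edges, sampling_factor=1.0):
    """Galaxies per deg^2 per unit redshift in each bin.

    Dividing by `sampling_factor` scales the photo-z subsample up to the
    amplitude expected for the full photometric sample.
    """
    counts, edges = np.histogram(z, bins=bin_edges)
    widths = np.diff(edges)
    return counts / (area_deg2 * widths * sampling_factor)

# Hypothetical example: 1000 evenly spread redshifts over 10 deg^2.
z = np.linspace(0.01, 0.99, 1000)
edges = np.linspace(0.0, 1.0, 11)
dndz_hist = redshift_distribution(z, area_deg2=10.0, bin_edges=edges, sampling_factor=0.9)
print(dndz_hist)
```

Summing the result times the bin widths returns the corrected surface density of the sample, which is a convenient check that the normalisation is consistent with the number counts.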
The green histograms in Fig. 3 show the corresponding redshift distributions predicted using the GALFORM lightcone. A simple fit to the lightcone redshift distribution is given by

$$\frac{dN}{dz} = A\, z^{2} \exp\left[-\left(\frac{z}{z_{\rm c}}\right)^{\beta}\right]$$

(Baugh & Efstathiou 1993). For the sample limited at 19.8 (left panel), we find best-fitting parameters A = 321 428, z_c = 0.18 and β = 1.7, while for the i_AB = 22.5 magnitude-limited sample (right panel), the best fit is given by A = 610 000, z_c = 0.4 and β = 1.6. The predicted redshift distributions agree well with the observed ones for both magnitude limits shown in Fig. 3.
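The fitting function has two convenient closed-form properties. Its integral over all redshifts is A z_c³ Γ(3/β)/β, which follows from substituting u = (z/z_c)^β. Its maximum lies where β (z/z_c)^β = 2. The sketch below evaluates the quoted best-fitting parameters and checks the closed-form integral numerically. It assumes that dN/dz is expressed per square degree per unit redshift, as for the histograms; the text does not state the units of A.

```python
import math
import numpy as np

def dndz(z, A, z_c, beta):
    """Fitted redshift distribution, dN/dz = A z^2 exp[-(z / z_c)^beta]."""
    z = np.asarray(z, dtype=float)
    return A * z**2 * np.exp(-((z / z_c) ** beta))

def integrated_density(A, z_c, beta):
    """Closed-form integral of dN/dz over 0 < z < infinity: A z_c^3 Gamma(3/beta) / beta."""
    return A * z_c**3 * math.gamma(3.0 / beta) / beta

def peak_redshift(z_c, beta):
    """Redshift of the maximum of dN/dz, where beta (z / z_c)^beta = 2."""
    return z_c * (2.0 / beta) ** (1.0 / beta)

def trapezoid(y, x):
    """Plain trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Best-fitting parameters quoted in the text, keyed by magnitude limit.
fits = {"19.8": (321_428, 0.18, 1.7), "22.5": (610_000, 0.4, 1.6)}
z_grid = np.linspace(0.0, 5.0, 200_001)

for limit, (A, z_c, beta) in fits.items():
    numeric = trapezoid(dndz(z_grid, A, z_c, beta), z_grid)
    print(f"limit {limit}: N = {integrated_density(A, z_c, beta):.0f} "
          f"(numerical {numeric:.0f}), peak at z = {peak_redshift(z_c, beta):.2f}")
```

For the deeper limit, the fit peaks at z ≈ 0.46, which is close to the peak of the observed distribution near z ∼ 0.5.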

Estimating photometric redshifts for the mock
One of the important aims of this work is to quantify how the observed colour distribution of galaxies evolves with redshift (see Section 3.3). To isolate physical trends from those introduced by observational errors, we need to estimate photometric redshifts for the model galaxies, which in turn requires a model for the observational errors in the photometry of the mock galaxies. We perturb the fluxes of the model galaxies to mimic the errors expected for the detection of a point source, given the magnitude limit of the PAUS observations in each band (see Table B1 in Appendix B; this appendix also discusses why we treat the galaxies as point sources). The errors are assumed to be Gaussian distributed in magnitude, with a variance set by the signal-to-noise ratio at the magnitude limit in each band, following the formalism of van den Busch et al. (2020) (see Appendix B for more details). The broad band (BB) flux limits are much deeper than those for the PAUS narrow band (NB) imaging (see Erben et al. 2013). The PAUS NB magnitude limits are 5σ limits for point sources (see Serrano et al. 2023 and Table B1).
The flux errors are computed for a subset of 44 700 galaxies from the mock catalogue limited to i_AB = 24, a much deeper sample than the one we aim to analyse. This ensures that galaxies intrinsically fainter than the analysis limit, which can scatter above it once errors are applied, are included. The sample is then cut back to i_AB = 22.5 after the magnitude errors have been applied, giving a final sample of 14 100 galaxies. The BCNz2 algorithm (Eriksen et al. 2019) is run on the perturbed model fluxes to estimate photometric redshifts, and we compare the scatter and fraction of outliers in the resulting photometric redshifts with those found for the observed galaxies.
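The perturb-then-cut procedure can be illustrated with a simple noise model. This is a sketch under explicit assumptions, not the exact van den Busch et al. (2020) expressions used in Appendix B. We assume the noise is background limited, so the signal-to-noise ratio scales with flux and equals 5 at the 5σ limit: SNR(m) = 5 × 10^(−0.4(m − m_lim)). The magnitude error is then σ_m ≈ (2.5/ln 10)/SNR. The function names, the limiting magnitude and the mock magnitudes are hypothetical.

```python
import numpy as np

def magnitude_error(mag, mag_lim, n_sigma=5.0):
    """1-sigma magnitude error for a background-limited point source.

    Assumption: SNR = n_sigma at mag_lim and SNR is proportional to flux, so
    SNR(m) = n_sigma * 10**(-0.4 * (m - mag_lim)); then sigma_m ~ 2.5 / ln(10) / SNR.
    """
    snr = n_sigma * 10.0 ** (-0.4 * (np.asarray(mag, dtype=float) - mag_lim))
    return 2.5 / np.log(10.0) / snr

def perturb_magnitudes(mag, mag_lim, rng, n_sigma=5.0):
    """Add Gaussian magnitude errors whose width is given by magnitude_error."""
    sigma = magnitude_error(mag, mag_lim, n_sigma)
    return mag + rng.normal(0.0, sigma)

rng = np.random.default_rng(42)
# Hypothetical deep mock sample and detection limit, for illustration only.
i_true = rng.uniform(18.0, 24.0, size=44_700)
I_LIM_5SIGMA = 24.5  # hypothetical 5-sigma limit in the selection band
i_obs = perturb_magnitudes(i_true, I_LIM_5SIGMA, rng)
selected = i_obs < 22.5  # cut back to the analysis depth only after adding errors
print(selected.sum(), "galaxies pass the cut")
```

Applying the cut to the perturbed rather than the true magnitudes is what allows faint galaxies to scatter into the sample, as in the observations. In a full treatment, the same perturbation would be applied to every broad and narrow band before running the photometric redshift code.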
Fig. 4 shows the results of this exercise. The left panel shows the estimated photometric redshift, z_photo, as a function of the true value, z_spec, which is the redshift taken from the lightcone, including the effect of peculiar motions. This is the equivalent of a spectroscopic redshift but with no measurement error. We quantify the scatter in the photometric redshifts in a similar way to Eriksen et al. (2019), using a centralised estimate, σ68, defined as

$$\sigma_{68} = \frac{1}{2}\left(P_{84} - P_{16}\right),$$

where P_84 and P_16 are the 84th and 16th percentiles, respectively, of the distribution of the photometric redshift relative errors, (z_photo − z_spec)/(1 + z_spec). This last quantity is plotted as a function of the estimated photo-z in the right panel of Fig. 4, and the values of σ68 are reported in the key of the same figure. The scatter found for the mock shares qualitative features with that inferred from the observations: it is of the same order of magnitude and increases with redshift. The observations used in the right panel of Fig. 4, labelled 'PAUS SPEC', are a match of the PAUS W1 and W3 fields with spectroscopic measurements from other surveys. Since these PAUS SPEC samples are not simple flux-limited catalogues, they are biased towards brighter magnitudes as a result of maximising the number of spectroscopic redshift matches. The scatter predicted in the photometric redshifts for the mocks is nevertheless in reasonable agreement with the observational estimate. The characteristics of the mock photometric redshifts are discussed further in Appendix C.
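The σ68 statistic is straightforward to compute from matched photometric and true redshifts. A minimal sketch in Python/NumPy; the function name and the synthetic test data are ours, with a Gaussian scatter of 0.01(1 + z) chosen purely for illustration:

```python
import numpy as np

def sigma68(z_photo, z_spec):
    """Half the 16th-84th percentile width of (z_photo - z_spec) / (1 + z_spec)."""
    z_photo = np.asarray(z_photo, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_photo - z_spec) / (1.0 + z_spec)
    p16, p84 = np.percentile(dz, [16.0, 84.0])
    return 0.5 * (p84 - p16)

# Synthetic check: Gaussian scatter of 0.01 (1 + z) should give sigma68 close to 0.01.
rng = np.random.default_rng(1)
z_spec = rng.uniform(0.1, 1.0, size=200_000)
z_photo = z_spec + 0.01 * (1.0 + z_spec) * rng.normal(size=z_spec.size)
print(f"sigma68 = {sigma68(z_photo, z_spec):.4f}")
```

Because σ68 is built from percentiles, it reduces to the standard deviation for Gaussian errors but is only weakly affected by a small fraction of catastrophic outliers. This is why the outlier fraction is quoted as a separate statistic. Computing σ68 in bins of z_photo gives the running scatter shown in the right panel of Fig. 4.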
In summary, the size of the scatter is comparable to that estimated for the observations.However, the fraction of outliers is somewhat lower in the mock than in PAUS.This is due in part to our treating all of the model galaxies as unresolved point sources; in practice, resolved galaxies will have larger photometric errors, which could lead to more photometric redshift outliers.Also, we do not include the contribution of emission lines to the NB flux.The improved emission line model implemented in GALFORM by Baugh et al. (2022) will be used in a forthcoming test of photometric redshift codes.
Finally, it is reassuring that in Fig. 4 we see no trace of any preferred values in the photometric redshifts recovered for the model galaxies. In particular, the redshifts of the original output snapshots of the N-body simulation are not apparent. This validates the method used to compute the observer-frame magnitudes in the model lightcone. Recall that the observer-frame magnitudes are computed at the simulation output redshifts on either side of the redshift at which the galaxy crosses the observer's past lightcone, and a linear interpolation is used to estimate the observer-frame magnitudes in the different bands at the lightcone crossing redshift (Merson et al. 2013). This point is investigated further in Appendix A.
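The interpolation step can be sketched for a single galaxy as follows. The text only states that the interpolation is linear. Interpolating linearly in redshift, rather than in another variable such as the expansion factor, is our assumption, and Merson et al. (2013) should be consulted for the actual choice. The function name and the numbers are hypothetical.

```python
import numpy as np

def interpolate_magnitude(z_cross, z_a, mag_a, z_b, mag_b):
    """Linearly interpolate observer-frame magnitudes to the crossing redshift.

    mag_a and mag_b (floats or per-band arrays) are the observer-frame
    magnitudes of the same galaxy computed at the bracketing snapshot
    redshifts z_a and z_b. Linear interpolation in redshift is an assumption.
    """
    if not (min(z_a, z_b) <= z_cross <= max(z_a, z_b)):
        raise ValueError("z_cross must lie between the two snapshot redshifts")
    weight = (z_cross - z_a) / (z_b - z_a)
    return (1.0 - weight) * np.asarray(mag_a) + weight * np.asarray(mag_b)

# Hypothetical galaxy: two bands evaluated at snapshots z = 0.2 and z = 0.4.
mags = interpolate_magnitude(0.3, 0.2, np.array([20.0, 19.0]), 0.4, np.array([21.0, 20.0]))
print(mags)
```

The error of a linear interpolation grows with the spacing between snapshots. Closely spaced outputs therefore keep the interpolated magnitudes smooth in redshift and avoid imprinting the snapshot redshifts on the photometric redshifts.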
The analysis in the subsequent subsections looks at the distribution of observed galaxy colours and their evolution with redshift.We will investigate the impact that errors in photometry and photometric redshift have on the GALFORM predictions.

Evolution of galaxy colours
Here, we study the evolution of the observer-frame g − r colour with redshift. To keep the results from the observational data as model independent as possible, we use observer-frame quantities, thereby avoiding the need to devise k-corrections to transform observed colours to the rest frame. Fig. 5 shows the distribution of galaxies with photometric redshifts in the observed g − r colour–redshift plane for the combined PAUS W1 and W3 fields (left) and the GALFORM model lightcone (right), in both cases to a magnitude limit of i_AB = 22.5. Note that for GALFORM, Fig. 5 shows the galaxy colours without photometric errors and uses the cosmological redshift (the effect of including photometric errors and using the estimated photometric redshift is shown in Fig. 6 and discussed later in the text). Focusing first on the left panel, which shows PAUS galaxies, the shading reveals two distinct populations of galaxies, the well-known red sequence and blue cloud. Motivated by this, we place a dividing line (Eqn. 2) to set the boundary between these populations: blue galaxies lie below this line and red galaxies above it. Whilst there is a clear peak in the counts of galaxies in both the red sequence and the blue cloud, a low-count bridge of galaxies with intermediate colours connects the two. This is the so-called 'green valley'. The minimum in the green valley is well defined and shifts to redder values of the observed g − r colour with increasing redshift, up to z ∼ 0.4. Beyond this redshift, the colour of the green valley does not change. The shape of the valley becomes more 'flat bottomed' at high redshift, with the blue and red peaks moving further apart. At the highest redshifts, the red peak becomes less distinct and is much weaker than the blue peak. Having split the population into two using this line, we can compute the median colours of the sub-populations on either side of the dividing line,
along with the respective inter-quartile ranges (shown by the coloured lines and bars). The uneven density variations along the redshift axis are due to large-scale structure in the W1 and W3 fields. The observed g − r colour evolves with redshift. Two main physical contributions to the shape of the galaxy spectral energy distribution affect this evolution: the attenuation of starlight by dust and the shape of the stellar continuum. The latter depends on the amount of ongoing star formation and the age of the composite stellar population. In the rest frame, the effective wavelength of the g-band is 4792.9 Å and that of the r-band is 6212.1 Å. The main spectral feature at these wavelengths, particularly once a modest redshift is applied to the source, is the 4000 Å break, produced by a combination of metal absorption lines spread over several hundred Ångströms, which are stronger in older stellar populations. PAUS images galaxies using narrow band filters that span the wavelength range from 4500 Å to 8500 Å. A rest-frame wavelength of 4000 Å is sampled by the g and r bands for redshifts in the range 0.16 < z < 0.36. The decline in the spectrum associated with the 4000 Å break actually starts around 4500 Å, close to the effective wavelength of the g-band. As redshift increases, the observer-frame g-band samples progressively shorter rest-frame wavelengths, moving towards the 4000 Å break (see Renard et al. 2022 for a discussion of this spectral feature). The observed g − r colour gets redder with increasing redshift, with a somewhat steeper gradient for red galaxies, which have deeper 4000 Å breaks. Note that star-forming galaxies also display a modest reddening of the stellar continuum around 4000 Å, albeit not as pronounced as in galaxies with older composite stellar populations. Hence the observer-frame g − r colour of star-forming galaxies in the blue cloud also gets redder with increasing redshift. At z = 0.3, the observer-frame r-band samples the rest-frame effective wavelength of the g-band, and the r filter starts to move down to shorter wavelengths than the break. At higher redshifts, the observer-frame g − r colours of the red sequence and blue cloud diverge, with both filters now sampling rest-frame wavelengths bluewards of the 4000 Å break.

Figure 5. 2D histogram of galaxy counts in the observed g − r colour vs redshift plane, for galaxies brighter than i_AB = 22.5. The left panel shows galaxies from the combined PAUS W1 and W3 fields. The white line is used to separate red and blue galaxies (see text for equation); this is the same criterion used to separate red and blue galaxies in Fig. 1. Stars have been removed by requiring the CFHTLenS property star_flag = 0. The lines with bars show the median colour and the 25th to 75th percentile range for the red and blue populations. The right panel shows the same plane for the model lightcone. As the model lightcone covers a roughly three times larger area than the observations, we have randomly sampled the model galaxies to match the total number of observed galaxies. To allow a direct comparison, both panels use the same colour bar; the most populated bins of the model lightcone are saturated, with counts above the limit of 600 galaxies per bin.
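The band-shifting argument rests on the relation λ_rest = λ_obs/(1 + z). The short sketch below uses the two effective wavelengths quoted above. It finds the redshift at which the observer-frame r-band samples the rest-frame effective wavelength of the g-band, and prints the rest-frame wavelengths probed by each filter at a few redshifts. Treating each broad-band filter as a single effective wavelength ignores its width, so this is only indicative.

```python
# Effective wavelengths quoted in the text (Angstrom).
LAMBDA_G = 4792.9
LAMBDA_R = 6212.1

def rest_frame_wavelength(lambda_obs, z):
    """Rest-frame wavelength sampled by an observer-frame filter at redshift z."""
    return lambda_obs / (1.0 + z)

# Redshift at which the observed r band samples the rest-frame g-band wavelength.
z_match = LAMBDA_R / LAMBDA_G - 1.0
print(f"z = {z_match:.3f}")  # prints z = 0.296

for z in (0.0, 0.3, 0.6, 0.9):
    g_rest = rest_frame_wavelength(LAMBDA_G, z)
    r_rest = rest_frame_wavelength(LAMBDA_R, z)
    print(f"z = {z:.1f}: g samples {g_rest:.0f} A, r samples {r_rest:.0f} A (rest frame)")
```

Comparing the printed rest-frame wavelengths with 4000 Å shows when each filter lies redwards or bluewards of the break, which is the origin of the features in the colour–redshift relation described above.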
The right panel of Fig. 5 shows the equivalent information for the model lightcone.As the model lightcone covers a much larger solid angle than the combined W1 and W3 fields, we have randomly sampled the model galaxies to match the total number of galaxies in the observed sample (583 992 galaxies).To ensure that the random sampling does not affect the results we tested three different random seeds and observed no difference in the resulting colour redshift distribution.In principle, using the same number of objects allows us to use the same colour scale for the density shading for the observations and the model.However, as the colour bimodality is noticeably tighter in the model, the white bins in the right panel are all saturated as the counts reach around a thousand per pixel, and the colour shading peaks at 600 galaxies per pixel.The larger solid angle of the model lightcone also means that large-scale structure has a smaller impact on the number of galaxies, so we see little evidence of any striping in redshift.The overall locus of galaxies in the red sequence and blue cloud in the model is similar to that seen in the observations, so we are able to use the same line to divide the model galaxies into red and blue subsamples.
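The random down-sampling of the lightcone, and the check that the result does not depend on the random seed, can be sketched as below. Only the observed sample size (583 992) is taken from the text. The size of the model sample is a hypothetical placeholder of roughly three times the observed number, and the function name is ours.

```python
import numpy as np

def subsample_to_match(n_model, n_target, seed):
    """Indices of a random subset of n_target model galaxies, drawn without replacement."""
    if n_target > n_model:
        raise ValueError("cannot draw more galaxies than the model contains")
    rng = np.random.default_rng(seed)
    return rng.choice(n_model, size=n_target, replace=False)

N_OBSERVED = 583_992    # galaxies in the combined W1 + W3 sample (from the text)
N_MODEL = 1_750_000     # hypothetical size of the model lightcone sample

# Three seeds, mirroring the robustness test described in the text.
subsets = [subsample_to_match(N_MODEL, N_OBSERVED, seed) for seed in (1, 2, 3)]
```

Drawing without replacement guarantees that no model galaxy is counted twice, so each subsample is a fair random selection of the lightcone. Comparing the binned colour–redshift counts obtained with the different index arrays is a direct test of seed dependence.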
To make a more quantitative comparison of the colour evolution between the observations and the model, we compute the median and interquartile range of the distribution of g − r colour in narrow redshift bins, considering the blue and red populations separately. As already noted from the relative tightness of the shaded regions in the colour–redshift plane in Fig. 5, the bimodality is stronger in the model colours than in the observed ones. This is confirmed by the narrower interquartile range of colours in the model compared with the observations. This behaviour of the model had already been noticed in previous comparisons (González et al. 2009; Manzoni et al. 2021).
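The running statistics described above amount to splitting galaxies with a dividing line and taking percentiles of the colour in each redshift bin for each population. A sketch in Python/NumPy follows. The dividing line coefficients below are placeholders, not the paper's Eqn. 2, and the synthetic colours exist only to exercise the code.

```python
import numpy as np

def running_percentiles(z, colour, is_red, z_edges, q=(25.0, 50.0, 75.0)):
    """Percentiles of colour in redshift bins, computed separately for red and blue galaxies.

    Returns a dict mapping "red"/"blue" to an array of shape (n_bins, len(q));
    bins with no galaxies are filled with NaN.
    """
    z = np.asarray(z, dtype=float)
    colour = np.asarray(colour, dtype=float)
    is_red = np.asarray(is_red, dtype=bool)
    bin_index = np.digitize(z, z_edges) - 1  # -1 or n_bins for values outside the edges
    n_bins = len(z_edges) - 1
    out = {}
    for name, mask in (("red", is_red), ("blue", ~is_red)):
        result = np.full((n_bins, len(q)), np.nan)
        for k in range(n_bins):
            sel = mask & (bin_index == k)
            if sel.any():
                result[k] = np.percentile(colour[sel], q)
        out[name] = result
    return out

def hypothetical_dividing_line(z):
    """Placeholder straight line in the colour-redshift plane; NOT the paper's Eqn. 2."""
    return 0.6 + 1.0 * np.asarray(z)

# Synthetic two-population data, for illustration only.
rng = np.random.default_rng(7)
n = 20_000
z = rng.uniform(0.1, 0.9, n)
red_truth = rng.random(n) < 0.3
colour = np.where(red_truth, 1.3 + 1.0 * z, 0.5 + 0.5 * z) + rng.normal(0.0, 0.05, n)
is_red = colour > hypothetical_dividing_line(z)
stats = running_percentiles(z, colour, is_red, np.linspace(0.1, 0.9, 9))
```

The middle column of each array gives the running median and the outer columns give the interquartile range. Narrower interquartile ranges in the model than in the data are the quantitative signature of the stronger bimodality noted above.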
The predicted width of the red and blue populations is strongly affected by the addition of photometric errors (see Appendix B for a description of the errors applied), as shown by the inter-quartile ranges plotted in Fig. 6. This figure compares the running medians and percentiles of the colour–redshift relation obtained using the unperturbed g − r colours predicted by the lightcone (red and blue lines) with those obtained using colours perturbed by adding the simulated errors described in Appendix B (pink and cyan dotted lines). The perturbed colours are plotted against the photometric redshift estimated by the photometric redshift code as in Section 3.2, while the unperturbed colours are plotted against the cosmological redshift output by GALFORM. This effect will be shown in plots of the colour distribution for different selections in redshift and apparent magnitude in the remainder of this section.

Figure 7. [...] In each case (lightcone, PAUS and VIPERS), red and blue galaxies have been split according to the white line in Fig. 5 and the median has been computed in the two populations of galaxies separately. Left: the running median computed for different apparent magnitude limits. Right: the running median computed for different quality cuts, using the photometric redshift quality parameter (see Eriksen et al. 2019) to identify the 50, 20 and 10 per cent best quality redshifts in the sample.
We make further comparisons between the evolution of the observer-frame colour distributions in the model lightcone and the observations in Fig. 7, again including the effects of photometric errors in the model colours and using the estimated photometric redshifts. For clarity, we drop the density shading in this plot and show only the median colour and inter-quartile range for different selections; the results for the model and the observations are plotted together in the same panel. The left panel of Fig. 7 extends the standard colour–redshift comparison made at the PAUS depth of i_AB = 22.5 in two directions. First, we consider a brighter magnitude cut of 19.8 (AB), which corresponds to the depth of the deepest fields in the GAMA survey. As expected, median colours can now only be plotted out to a lower redshift, z = 0.45, as there are very few galaxies at higher redshifts. The median colours in the model are insensitive to this change in magnitude limit, whereas the observations suggest that both red and blue galaxies get redder with the brighter apparent magnitude cut. Second, in the left panel of Fig. 7 we also compare the model with an alternative sample of higher redshift galaxies, the VIPERS spectroscopic sample (Scodeggio et al. 2018), which is limited to the same depth as PAUS (i_AB = 22.5). Colour pre-selection is used to identify VIPERS targets, which limits this survey to redshifts z ≳ 0.5 (see fig. 3 of Guzzo et al. 2014 for the colour–colour selection used to select high redshift target galaxies). The high redshift tail of the colour–redshift relation agrees well between VIPERS and PAUS, suggesting that this result is not sensitive to errors in the estimated photometric redshifts and that the colour pre-selection in VIPERS is effective. This comparison shows the usefulness of the PAUS measurements, which span a much wider redshift baseline than comparable spectroscopic surveys: these are either shallower and hence only cover the lower redshift half of the PAUS redshift range, as is the case for GAMA, or do not include low redshift galaxies, as in the case of VIPERS.
The right panel of Fig. 7 examines whether selecting higher quality photometric redshifts changes the appearance of the colour–redshift relation. Eriksen et al. (2019) and Alarcon et al. (2021) show that the quality factor can be used to define a subset of galaxies with fewer redshift mismatches or outliers, and a smaller scatter in the estimated redshift, than would be found in the full apparent magnitude limited sample. We want to rule out two effects: first, that the distribution of quality factors might differ between red and blue galaxies because of a dependence of photometric redshift accuracy on galaxy colour, and second, that changing the fraction of outlier redshifts could alter the appearance of the colour–redshift relation. In the right panel of Fig. 7, we plot the median colour for the entire sample and for subsamples comprising the best 50, 20 and 10 per cent of redshifts. Although the median colours agree within the 25th to 75th percentile range, we note a slight shift in the blue cloud medians to bluer colours when restricting the sample to better quality redshifts. The colours measured for the better quality photometric redshift samples seem to agree better with the lightcone predictions.
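Selecting "the best N per cent" of photometric redshifts is a ranking operation on the quality parameter. The sketch below keeps an exact fraction of objects after sorting. It assumes that lower values of the quality parameter indicate more reliable redshifts, which should be checked against Eriksen et al. (2019) before use. The function name and the test data are hypothetical.

```python
import numpy as np

def best_fraction_mask(quality, fraction):
    """Boolean mask keeping the given fraction of objects with the best quality values.

    Assumption: lower quality values mean more reliable photometric redshifts.
    Objects are ranked with a stable sort, so ties keep their input order.
    """
    quality = np.asarray(quality, dtype=float)
    order = np.argsort(quality, kind="stable")
    n_keep = int(round(fraction * quality.size))
    mask = np.zeros(quality.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask

# Hypothetical quality values for 1000 objects.
rng = np.random.default_rng(3)
quality = rng.permutation(1000).astype(float)
masks = {f: best_fraction_mask(quality, f) for f in (0.5, 0.2, 0.1)}
```

Each mask can then be combined with the red/blue split to recompute the running medians, as in the right panel of Fig. 7. Because the quality parameter correlates with apparent magnitude, such a cut also changes the shape of the number counts (the brown line in Fig. 2), so the subsamples are not simple random subsets of the full sample.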
Finally, we dig deeper into the evolution of galaxy colours by considering galaxies selected to lie in narrow ranges of apparent magnitude and redshift. In Fig. 8, we plot the distribution of observed g − r colours for both the GALFORM and the PAUS samples. We select a narrow apparent magnitude bin, 21.7 < i_AB < 22.0, to minimise the effect of mixing different galaxy populations, and study how this distribution changes between two redshift bins: a 'low redshift' bin spanning 0.1 < z < 0.3 and a 'high redshift' bin covering 0.4 < z < 0.7. As noted when commenting on Fig. 5, the bimodality of the colour distribution predicted by the GALFORM model before the application of any errors to the galaxy photometry (blue histogram) is more pronounced than that seen in the observations (green histogram). This is particularly clear in the high redshift bin. On including the simple model of photometric errors described in Appendix B, the bimodality in the GALFORM predictions that is prominent in the high redshift panels is greatly reduced (orange histogram). This brings the model into much better qualitative agreement with the observations. Reassuringly, the shape of the PAUS distribution does not change when selecting the best 50 per cent of photometric redshifts using a cut on the quality parameter (bottom panels of Fig. 8). The GALFORM predictions behave in the same way when the half of the sample with the best photometric redshifts is selected.

Figure 8. [...] The two rows show different cuts on the quality of the photometric redshifts (the unperturbed lightcone, blue histogram, is unchanged as it uses cosmological redshifts taken directly from the simulation). Specifically, the first row is for the full sample, while the second row retains the best 50 per cent of objects according to the quality parameter. Note that for GALFORM-PHOTO-Z, the photometric redshifts are used to select the sample plotted.

Sensitivity of galaxy colours to model parameters
In this section, we explore the sensitivity of the observed galaxy colours to variation of the GALFORM model parameters.In particular, we look to see if altering the value of a parameter modifies the number of objects in the red and blue populations, or indeed produces a shift in the median colours of these populations.We focus on a subset of the processes in the model for this exercise, as they are known to have a big effect on the intrinsic galaxy properties by altering the star formation activity and hence affecting their colours.These processes are: the strength of the supernova (SNe) driven winds, the timescale for SNe heated gas to be reincorporated into the hot halo, the efficiency of AGN suppression of gas cooling, and the timescale for quiescent star formation.
When a calibrated galaxy formation model is run with a perturbed value for one of its parameters, the predictions for the observations used to calibrate the model can change (see the plots illustrating the impact of changing a range of model parameters in Lacey et al. 2016). In principle, other model parameters might need to be adjusted to ensure that the variant model produces an acceptable match to the calibration data, for example, using the methodology [...] 2021). Here, we instead rescale the model galaxy luminosities to force agreement with the i-band luminosity function at z = 0 predicted by the fiducial model. We chose the i-band as this is the selection band for PAUS. The same rescaling is applied at all redshifts and to all bands. Hence, the rescaling does not change the model predictions for observer-frame colours, but does affect which galaxies are selected to be part of the i-band apparent magnitude limited sample. Note that although, as we shall see below, the shape and location of the red and blue peaks change in some cases, we have checked that the line separating galaxies into red and blue populations works equally well in all models, so Eqn. 2 is retained throughout. Four model parameters are changed in this exercise, one at a time, resulting in eight variant models. The parameter values are listed in Table 1: (i) the pivot velocity that controls the mass loading of SNe driven winds, V_SN (Eqn 10 in Lacey et al. 2016), with higher values resulting in larger mass ejection rates from more massive haloes; (ii) the timescale for gas heated by SNe to be reincorporated into the hot gas halo, which is inversely proportional to α_reheat (Eqn 11 in Lacey et al. 2016), with larger values giving shorter reincorporation times; (iii) the star formation efficiency factor, ν_SF (Eqn 7 of Lacey et al. 2016; the variants listed in Table 1 correspond to the full range suggested by observations of local star-forming galaxies, Blitz & Rosolowsky 2006); and (iv) the factor which determines the halo mass at which AGN heating starts to prevent the cooling of gas, α_cool (Eqn 12 in Lacey et al. 2016). Figs 9 to 12 show, in the top panel, the model predictions for the median observer-frame colours and, in the bottom panel, the change in the number density of galaxies in the red and blue populations, both as a function of redshift. Specifically, in each figure, the upper panel shows the observer-frame colours as a function of redshift, with the solid line showing the fiducial model and the dotted and dashed lines showing the predictions for the rescaled variants; dashed lines show the predictions for the lower value of the varied parameter and dotted lines those for the higher value. For reference, the black line shows the separation used to classify red and blue galaxies in the colour–redshift plane (see Eqn 2). The lower panel shows the logarithm of the ratio between the number of galaxies in the red or blue population, N, and the number of objects in the same population in the fiducial model, N_ref. We draw a horizontal line at log10(N/N_ref) = 0, which is where the lines would lie if the number of galaxies in each population were unchanged from the fiducial model.
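The text does not say how the luminosity rescaling is implemented. One natural way to force a variant model onto the fiducial z = 0 luminosity function is abundance matching: each variant galaxy is assigned the fiducial magnitude at the same cumulative number density. The resulting magnitude shift is then added to every band, so that colours are untouched. The sketch below shows this technique. It is not necessarily the authors' procedure, and all function names and test values are hypothetical.

```python
import numpy as np

def build_magnitude_mapping(mag_variant, mag_fiducial, volume_variant, volume_fiducial):
    """Map variant magnitudes to fiducial ones at equal cumulative number density.

    Inputs are z = 0 absolute magnitudes in the selection band. Returns a function
    converting any variant magnitude to its rescaled value; magnitudes outside the
    sampled range are clipped to the end points by np.interp.
    """
    mv = np.sort(np.asarray(mag_variant, dtype=float))   # brightest (most negative) first
    mf = np.sort(np.asarray(mag_fiducial, dtype=float))
    nv = np.arange(1, mv.size + 1) / volume_variant       # n(<M) for the variant
    nf = np.arange(1, mf.size + 1) / volume_fiducial      # n(<M) for the fiducial model

    def mapping(mag):
        n = np.interp(mag, mv, nv)        # cumulative density of the variant galaxy
        return np.interp(n, nf, mf)       # fiducial magnitude at the same density
    return mapping

def apply_rescaling(mags_by_band, mag_select, mapping):
    """Shift every band by the same amount, so observer-frame colours are unchanged.

    mag_select is the selection-band absolute magnitude used to build the mapping;
    the same shift is added to all (e.g. apparent) magnitudes in mags_by_band.
    """
    delta = mapping(mag_select) - mag_select
    return {band: np.asarray(m) + delta for band, m in mags_by_band.items()}

# Hypothetical test: a variant whose galaxies are all 0.5 mag fainter.
rng = np.random.default_rng(11)
mag_fid = rng.normal(-20.0, 1.0, 5000)
mag_var = mag_fid + 0.5
mapping = build_magnitude_mapping(mag_var, mag_fid, 1.0, 1.0)
g, r = mag_var + 0.8, mag_var + 0.3
rescaled = apply_rescaling({"g": g, "r": r}, mag_var, mapping)
```

Because a single shift per galaxy is added to all bands, g − r is preserved exactly, while the changed selection-band magnitudes alter which galaxies pass the apparent magnitude limit. This matches the behaviour described in the text.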
The largest change in the median colours is found on changing the strength of the SNe feedback, through the parameter V_SN, as shown in Fig. 9. The g − r colour shifts by more than the inter-quartile range of the model predictions when the SNe feedback is perturbed. As well as the shift in the median colours, this parameter change produces appreciable changes, of up to a factor of three, in the number of objects in the red and blue populations. Fig. 10, in contrast, shows that perturbing the star formation efficiency, ν_SF, results in only a small shift in the predicted median colours of red galaxies but a larger shift in those of blue galaxies. The number of galaxies in the red and blue populations changes by up to a factor of two. Fig. 11 shows that the median colours of the red and blue populations hardly change on perturbing α_reheat, and the change in the number of objects in these populations is modest. Finally, Fig. 12 shows that the median colours of the red and blue populations are fairly insensitive to the value of α_cool until z ∼ 1. The changes in number density for this parameter change are also small.

Figure 10. [...] With the same colour and line scheme as the upper panel, the bottom panel shows the log of the ratio between the number of galaxies in the variant model and the number of galaxies in the fiducial model for the population of interest (red or blue). Note that in this and subsequent plots (Figs. 11 and 12), the interquartile colour ranges for the variant models are similar to those for the fiducial model and so are not shown.

Figure 11. The impact on galaxy colours of changing α_reheat, the parameter that controls the timescale for gas heated by SNe to be reincorporated into the hot halo, so that it can be considered for cooling. The value in the fiducial model is α_reheat = 1.26: dotted lines show the results for α_reheat = 1.5 and dashed lines show α_reheat = 1.0. The top panel shows the median g − r colour versus redshift, along with the 25th to 75th percentile range. The bottom panel shows the log of the ratio between the number of galaxies in the variant model and the number of galaxies in the fiducial model for the population of interest (red or blue).

Figure 12. [...] The dotted lines show the results for α_cool = 0.9. With the same colour and line scheme as the upper panel, the bottom panel shows the log of the ratio between the number of galaxies in the variant model and the number of galaxies in the fiducial model for the population being studied (red or blue).
In conclusion, in our models the strength of the supernova feedback, as controlled by the parameter V_SN, is the physical process that has the largest effect on the location of the red and blue populations in the colour–redshift plane (see upper panel of Fig. 9). As a consequence, the numbers of red and blue galaxies change significantly (bottom panel of Fig. 9), since the suppression of star formation is directly related to colours. Nevertheless, the overall trend of the colour–redshift relation is preserved, making this test a good candidate for testing the accuracy of galaxy formation models.
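The bottom panels of Figs 9 to 12 show log10(N/N_ref) per redshift bin for each population. A minimal sketch, assuming that the variant and fiducial samples are drawn from the same lightcone solid angle, so that raw counts can be compared directly. The function name and the test values are hypothetical.

```python
import numpy as np

def log_number_ratio(z_variant, z_fiducial, z_edges):
    """log10(N / N_ref) per redshift bin for one population (red or blue).

    Assumes both samples cover the same solid angle. Bins that are empty in
    either model give -inf, inf or NaN, with warnings suppressed.
    """
    n, _ = np.histogram(z_variant, bins=z_edges)
    n_ref, _ = np.histogram(z_fiducial, bins=z_edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log10(n / n_ref)

# Hypothetical example: the variant has twice as many galaxies in every bin.
z_fid = np.array([0.15, 0.25, 0.35])
z_var = np.repeat(z_fid, 2)
edges = np.array([0.1, 0.2, 0.3, 0.4])
ratio = log_number_ratio(z_var, z_fid, edges)
print(ratio)
```

On this scale, the changes of up to a factor of three found for V_SN and up to a factor of two for ν_SF correspond to |log10(N/N_ref)| of about 0.48 and 0.30 respectively.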

CONCLUSIONS
We have presented a new observational test of galaxy formation models using a novel narrow band imaging survey, PAUS (Eriksen et al. 2019; Padilla et al. 2019). The narrow band imaging provides highly accurate photometric redshifts, which allow us to measure how galaxy properties evolve with redshift. The use of photometric redshifts removes any potential biases associated with the successful measurement of spectroscopic redshifts, and allows us to quantify the evolution of galaxy colours over an unprecedented baseline in redshift for a single survey with a homogeneous selection. We focus on observer-frame galaxy colours to minimise the model-dependent processing that needs to be applied to the data. Hence, we do not need to model the k-correction required to estimate a rest-frame magnitude from the observed photometry.
The PAUS sample used here is magnitude limited to i_AB = 22.5, with galaxy redshifts mainly distributed over 0 < z < 1.2 and a peak at about z ∼ 0.5 (see Fig. 3). Over this redshift range, a significant change in the global star formation rate per unit volume is observed (Madau & Dickinson 2014).
We focus on the observed g − r colour and its evolution with redshift. The observed colour distribution shows a clear division into red and blue populations (Fig. 5). The observed colours evolve strongly with redshift. This is driven mostly by the redshifting of the spectral energy distributions of the galaxies, which means that the filters sample different absorption features with increasing redshift. A secondary driver of the colour evolution is the change in intrinsic galaxy properties with redshift, such as the overall increase in the global SFR with increasing redshift.
Hence, to compare theoretical predictions with the observations, it is necessary to model the bandshifting effects on galaxy spectral energy distributions and to build a mock catalogue on an observer's past lightcone, rather than focusing on fixed redshift outputs (Baugh 2008). We do this by implementing the GALFORM semi-analytical model of galaxy formation in the P-Millennium N-body simulation, using one of the recalibrated models presented in Baugh et al. (2019). The construction of a lightcone mock catalogue is described in Merson et al. (2013). An earlier PAUS mock was made using this approach by Stothert et al. (2018), but with a different N-body simulation. The mass resolution of the P-Millennium N-body simulation is almost an order of magnitude better than that of the simulation available to Stothert et al. (2018), allowing intrinsically fainter galaxies to be included in the mock. This allows the mock to recover more of the expected galaxies, particularly at low redshift. Also, the P-Millennium has four times as many snapshots as the previous simulation, which means that galaxy positions and magnitudes are calculated more accurately than before: with more closely spaced outputs over the same redshift range, the interpolation of properties between outputs is more accurate.
The galaxy formation model used to build the mock is calibrated mostly against local observations. In particular, Baugh et al. (2019) focused on reproducing the optical $b_{\rm J}$-band luminosity function and the HI mass function when recalibrating the model parameters (the recalibration was necessary because of the change of cosmology in the P-Millennium, compared with earlier runs, and the improvement in the mass resolution). Hence, a useful first test of the model is whether it reproduces the number counts in the PAUS survey as a function of apparent magnitude and redshift.
The observed number counts are reproduced closely by the mock catalogue (Fig. 2). This exercise also showed the importance of a robust and accurate star-galaxy separation algorithm for making a reliable comparison of galaxy counts with the model. This is particularly relevant at bright apparent magnitudes, where stars make up a larger fraction of the total counts of objects. We also investigated whether the number counts of galaxies change when we restrict our attention to galaxies with an estimated photometric redshift. The reason for this test is that a photometric redshift is only estimated for galaxies observed in at least 30 of the 40 narrow band filters, and not all galaxies in the PAUS W1 and W3 fields meet this criterion. We therefore want to make sure that using only galaxies with an estimated photometric redshift does not introduce a bias. Restricting the sample to galaxies with a photometric redshift leaves the shape of the number counts unchanged. There is, however, a small reduction in amplitude, which can be taken into account by introducing a constant sampling factor for the fraction of missing objects. If we restrict attention to galaxies which, based on the quality parameter (see Eriksen et al. 2019), are inferred to have good photometric redshifts, the shape of the number counts changes, with the fraction of galaxies with high quality photometric redshifts varying strongly with apparent magnitude. This must be taken into account whenever the quality parameter is used to select galaxies with good photometric redshifts for a statistical analysis. We note that although the shape of the number counts is altered by retaining only those galaxies believed to have high quality photometric redshifts (compare the brown and blue lines in Fig. 2), the colour distribution is not altered (compare the top and bottom panels of Fig. 8, which is selected over a narrow range in apparent magnitude close to $i_{\rm AB} \sim 22$). This implies that the colour-magnitude relation is flat at faint apparent magnitudes. Finally, the model also closely matches the overall galaxy redshift distribution for the GAMA and PAUS apparent magnitude limits, a further test of its ability to reproduce the observations.
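The constant sampling-factor correction described above can be sketched as follows. This is a minimal toy illustration, assuming the objects without a photometric redshift are a magnitude-independent random subset; the function names, the 80 per cent retention rate, the uniform magnitude distribution and the survey area are all hypothetical, not PAUS values.

```python
import numpy as np


def differential_counts(mags, bin_edges, area_deg2):
    """Objects per unit magnitude per square degree in each magnitude bin."""
    counts, _ = np.histogram(mags, bins=bin_edges)
    return counts / (np.diff(bin_edges) * area_deg2)


def sampling_factor(n_with_photoz, n_total):
    """Fraction of galaxies that retain a photometric redshift estimate."""
    return n_with_photoz / n_total


# Hypothetical toy catalogue with magnitudes up to i_AB = 22.5.
rng = np.random.default_rng(42)
mags = rng.uniform(18.0, 22.5, size=100_000)
# Hypothetical: 80 per cent of objects have a photo-z, independent of magnitude.
has_photoz = rng.random(mags.size) < 0.8

edges = np.arange(18.0, 22.51, 0.5)
counts_all = differential_counts(mags, edges, area_deg2=10.0)
counts_photoz = differential_counts(mags[has_photoz], edges, area_deg2=10.0)

# A single constant factor restores the amplitude without changing the shape.
f_samp = sampling_factor(has_photoz.sum(), mags.size)
counts_restored = counts_photoz / f_samp
```

If the missing fraction depended on magnitude, as it does for the quality cut, no single constant could restore the shape of the counts.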
To test the model further, we use the clear separation between galaxies in the colour-redshift plane (Fig. 5) to divide galaxies into red and blue populations. This definition works well for both the PAUS observations and the GALFORM mock catalogue. Reassuringly, when we limit our attention to galaxies with high quality photometric redshifts in the observations, the colour distribution does not change, unlike the overall galaxy counts. The observer frame colour-redshift relation from a photometric redshift survey like PAUS therefore provides a statistically robust test of galaxy formation models. Qualitatively, the colour-redshift plane looks similar in the model and the observations, although the red and blue populations are more sharply defined in the model. This bimodality is greatly reduced if we include photometric errors in the model galaxy magnitudes at the level expected for point sources, which implies that our model for the photometric errors may overestimate the errors. There is good agreement between the model and the observations in the median colours (and interquartile ranges) of the red and blue galaxies as a function of redshift. PAUS probes the colour-redshift relation over a wide baseline in redshift (from $z = 0$ to $z = 1.2$) with a homogeneous selection.
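The red/blue split and the running medians with interquartile ranges can be sketched as below. This is a minimal illustration on a toy sample: the straight dividing line with coefficients `c0` and `c1` is a hypothetical placeholder for the line drawn in Fig. 5, whose exact form is not given here, and the toy colours are not PAUS or GALFORM values.

```python
import numpy as np


def split_red_blue(colour, z, c0, c1):
    """Classify galaxies as red if they lie above the line colour = c0 + c1 * z.

    c0 and c1 are hypothetical placeholders for the dividing line in Fig. 5.
    """
    is_red = colour > c0 + c1 * z
    return is_red, ~is_red


def running_median(colour, z, z_edges):
    """25th percentile, median and 75th percentile of colour in redshift bins."""
    stats = np.full((len(z_edges) - 1, 3), np.nan)
    for k, (lo, hi) in enumerate(zip(z_edges[:-1], z_edges[1:])):
        in_bin = (z >= lo) & (z < hi)
        if in_bin.any():
            stats[k] = np.percentile(colour[in_bin], [25, 50, 75])
    return stats  # columns: 25th percentile, median, 75th percentile


# Toy sample: two populations whose observed colours redden with redshift.
rng = np.random.default_rng(1)
z = rng.uniform(0.0, 1.2, size=20_000)
red_truth = rng.random(z.size) < 0.3
colour = np.where(red_truth, 1.0 + 0.8 * z, 0.3 + 0.5 * z) + rng.normal(0.0, 0.05, z.size)

is_red, is_blue = split_red_blue(colour, z, c0=0.65, c1=0.65)
z_edges = np.linspace(0.0, 1.2, 7)
red_stats = running_median(colour[is_red], z[is_red], z_edges)
blue_stats = running_median(colour[is_blue], z[is_blue], z_edges)
```

Computing the statistics separately for each population, rather than for the whole sample, keeps the medians from being dragged towards the gap between the two clouds.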
We also examine the distribution of the observed  −  colour for an apparent magnitude selected subset of the galaxies in redshift bins (Fig. 8). Again, the result is unchanged when only the 50 per cent of galaxies with the best quality redshifts are considered. The agreement between the model and the data is good at low redshifts. At high redshifts, the bimodality in colours is stronger in the model than in the observations when we use the unperturbed magnitudes output by the model. However, this discrepancy is greatly reduced once photometric errors are included in the GALFORM predictions.
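Why adding photometric errors weakens the colour bimodality can be illustrated with a toy calculation. The population means and widths and the 0.2 mag colour error below are hypothetical numbers chosen for illustration, not PAUS or GALFORM values, and the valley-to-peak ratio is just one simple, assumed measure of bimodality.

```python
import numpy as np


def valley_to_peak(colour, edges):
    """Ratio of the lowest histogram bin between the two peaks to the lower peak.

    Assumes one peak lies in each half of the colour range; smaller values
    indicate a more pronounced bimodality.
    """
    counts, _ = np.histogram(colour, bins=edges)
    half = len(counts) // 2
    i_blue = int(np.argmax(counts[:half]))
    i_red = half + int(np.argmax(counts[half:]))
    valley = counts[i_blue:i_red + 1].min()
    return valley / min(counts[i_blue], counts[i_red])


rng = np.random.default_rng(7)
# Toy intrinsic colours: equal-sized blue and red populations (illustrative values).
intrinsic = np.concatenate([rng.normal(0.5, 0.1, 50_000), rng.normal(1.3, 0.1, 50_000)])
# Hypothetical Gaussian colour error of 0.2 mag added to every galaxy.
perturbed = intrinsic + rng.normal(0.0, 0.2, intrinsic.size)

edges = np.linspace(0.0, 1.8, 37)
ratio_intrinsic = valley_to_peak(intrinsic, edges)
ratio_perturbed = valley_to_peak(perturbed, edges)
```

The errors broaden both populations, filling in the gap between them, so the valley rises relative to the peaks.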
Finally, we examine the sensitivity of the model predictions to perturbations in the values of several key model parameters. These changes can alter the median colours of the red and blue populations and the number of galaxies in each population. For most of the parameter changes we considered, the median colours were unchanged, with small changes in the number of galaxies in the red and blue clouds. The parameter that controls the mass loading of supernova-driven winds does produce a noticeable change in both the median colours and the number of galaxies in the red and blue populations, suggesting that the observed colours could be used as a further constraint on this model parameter. Of the values we tested, the fiducial value, $V_{\rm SN} = 380$ km s$^{-1}$, best reproduces the observations in the colour-redshift plane (see section 3.5.2 of Lacey et al. 2016 for a full discussion of this parameter).
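To make the role of $V_{\rm SN}$ concrete, the sketch below assumes the power-law mass loading used in recent GALFORM versions (e.g. Lacey et al. 2016), $\beta = (V_{\rm c}/V_{\rm SN})^{-\gamma_{\rm SN}}$, where $\beta$ is the ratio of the gas ejection rate to the star formation rate and $V_{\rm c}$ is the circular velocity of the galaxy. The slope $\gamma_{\rm SN} = 3.2$ and the 150 km s$^{-1}$ circular velocity are illustrative assumptions, not values taken from this paper.

```python
def mass_loading(v_circ_kms, v_sn_kms, gamma_sn=3.2):
    """Supernova mass loading beta = (V_c / V_SN)^(-gamma_SN).

    Assumed GALFORM-style power law; gamma_sn = 3.2 is an illustrative value.
    """
    return (v_circ_kms / v_sn_kms) ** (-gamma_sn)


# The three V_SN values of Fig. 9, for a hypothetical galaxy with V_c = 150 km/s.
betas = {v_sn: mass_loading(150.0, v_sn) for v_sn in (280.0, 380.0, 480.0)}
```

Under this assumed form, raising $V_{\rm SN}$ at fixed $V_{\rm c}$ increases $\beta$, so more gas is ejected per unit mass of stars formed, which is the sense in which $V_{\rm SN}$ sets the strength of the feedback.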
Although there is still some room for improvement in the accuracy of the GALFORM predictions for galaxy colours, GALFORM passes most of the tests presented here, and these tests appear to be a useful indicator of the accuracy of model predictions for future galaxy surveys over a key epoch in the history of galaxy evolution.

DATA AVAILABILITY
The data that support the findings of this study are stored at the Durham COSMA facilities (GALFORM mock) and at the Barcelona PIC facilities (PAUS observations). The GALFORM mock is available from the corresponding author, GM, or the Durham GALFORM team, while for the observations please contact the PAUS team through the website https://pausurvey.org/. In particular, the versions of the data used in this work are PAUS photometric production 972 for the W3 field and 979 for the W1 field.

APPENDIX A: INTERPOLATION SCHEME FOR APPARENT MAGNITUDES
In the GALFORM model, the observer frame magnitude in a given band is defined at the redshift of each of the output snapshots of the simulation. The apparent magnitude of the galaxy at the redshift of lightcone crossing is calculated by interpolating in redshift between the observed magnitudes at the snapshots on either side of the lightcone crossing redshift (Blaizot et al. 2005; Kitzbichler & White 2007b; Merson et al. 2013). Merson et al. (2013) showed that this scheme resulted in a smooth colour-redshift distribution, matching the general form observed.
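A minimal sketch of the interpolation step, assuming simple linear interpolation in redshift between the two bracketing snapshots (the cited papers may use a different weighting). The function name and the numbers in the example are hypothetical.

```python
def interpolate_magnitude(z_cross, z_lo, mag_lo, z_hi, mag_hi):
    """Linearly interpolate an observer-frame magnitude in redshift.

    z_lo and z_hi are the redshifts of the snapshots bracketing the
    lightcone-crossing redshift z_cross; mag_lo and mag_hi are the galaxy's
    observer-frame magnitudes in one band at those snapshots.
    """
    if not z_lo <= z_cross <= z_hi:
        raise ValueError("z_cross must lie between the bracketing snapshot redshifts")
    weight = (z_cross - z_lo) / (z_hi - z_lo)
    return mag_lo + weight * (mag_hi - mag_lo)


# Hypothetical galaxy crossing the lightcone at z = 0.55, between snapshots at z = 0.5 and 0.6.
mag = interpolate_magnitude(0.55, 0.5, 21.0, 0.6, 21.4)
```

A colour is obtained by interpolating each band separately and differencing. The more closely spaced the snapshots, the shorter the interval over which the linear approximation has to hold.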
In this section, we extend this test of the observer frame magnitude interpolation in two ways. First, we consider the colour-redshift relation for a colour defined using narrow band filters rather than broad band filters. The red points in Fig. A1 show the GALFORM galaxy colours, without any photometric errors, plotted against their lightcone redshift. The vertical green lines mark the redshifts of the simulation snapshots. There is no stepping or discreteness visible in the red points. Next, we investigate the photometric redshifts estimated using the GALFORM galaxy photometry. These results are shown by the blue points in Fig. A1. Again, the photometric redshift code shows no preference for returning the redshifts of the N-body simulation snapshots, which suggests that any errors introduced by the interpolation scheme are smaller than those resulting from the redshift estimation.
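One way to quantify this visual check is to compare the fraction of photometric redshifts falling within a small window around any snapshot redshift with the fraction expected if there were no preference. For simplicity, the sketch below assumes a uniform redshift distribution when computing the expectation. The snapshot list, window width and samples are hypothetical, not the P-Millennium outputs. A ratio close to 1 indicates no pile-up.

```python
import numpy as np


def snapshot_excess(z_photo, z_snap, half_width, z_min, z_max):
    """Observed / expected fraction of redshifts within +-half_width of a snapshot.

    The expectation assumes redshifts uniform on [z_min, z_max] and
    non-overlapping windows that lie fully inside that range.
    """
    z_photo = np.asarray(z_photo, dtype=float)
    z_snap = np.asarray(z_snap, dtype=float)
    nearest = np.min(np.abs(z_photo[:, None] - z_snap[None, :]), axis=1)
    observed = np.mean(nearest < half_width)
    expected = z_snap.size * 2.0 * half_width / (z_max - z_min)
    return observed / expected


rng = np.random.default_rng(3)
z_snap = np.arange(0.1, 1.15, 0.1)  # hypothetical snapshot redshifts
smooth = rng.uniform(0.0, 1.2, 100_000)
# A pathological sample in which half of the redshifts cluster on the snapshots.
clustered = np.concatenate([
    rng.uniform(0.0, 1.2, 50_000),
    rng.choice(z_snap, 50_000) + rng.normal(0.0, 0.001, 50_000),
])

excess_smooth = snapshot_excess(smooth, z_snap, 0.005, 0.0, 1.2)
excess_clustered = snapshot_excess(clustered, z_snap, 0.005, 0.0, 1.2)
```

For a real catalogue, the expectation should come from a smooth fit to the redshift distribution rather than from the uniform assumption used here.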

APPENDIX B: ADDING ERRORS TO THE MOCK GALAXY PHOTOMETRY
We follow the method set out in van den Busch et al. (2020) to add errors to the magnitudes predicted for galaxies in the mock catalogue, reflecting the observing strategy for PAUS. The errors are assumed to have a Gaussian distribution in magnitude. The perturbed magnitude in the band labelled by $j$, $m_j^{\rm obs}$, is obtained by adding a Gaussian distributed quantity, $\delta_j$, with zero mean and variance $\sigma_j^2$, to the true magnitude predicted by GALFORM, $m_j^{\rm true}$:
$$ m_j^{\rm obs} = m_j^{\rm true} + \delta_j . $$
The width of the Gaussian is set by the signal-to-noise ratio in band $j$, $(S/N)_j$, which depends on two factors: $a$, which depends on the size of the galaxy if it is resolved, and $n$, the signal-to-noise ratio of a point source at the magnitude limit. For an extended source, $a < 1$. Here we assume $a = 1$ for all sources and $n = 5$, which means that all galaxies are treated as point sources and are detected with $S/N = 5$ at the magnitude limit of the band in question. The magnitude limits in the broad band (BB) filters come from the CFHTLenS photometric catalogues for the W1 and W3 fields (Erben et al. 2013). The PAUS narrow band magnitude limits correspond to $5\sigma$ limits for a point source (see Table B1). The estimation of the NB errors is described in Serrano et al. (2023), and takes into account the Poisson error in the electron count from the CCDs and the sky noise in the aperture. Note that GALFORM predicts the sizes of the disk and bulge components of each galaxy, so in principle we could have applied a more accurate model for the photometric errors that took into account whether or not the galaxy is an extended source. However, the predicted sizes of disks and bulges are among the less accurate GALFORM predictions (see, for example, the galaxy size-luminosity plots in Lacey et al. 2016 and Elliott et al. 2021).
Hence, in the simple model for photometric errors presented here, we assume that all model galaxies are point sources.
The results in Fig. 4 suggest that this simple error model is accurate enough to produce photometric redshifts, and photometric redshift errors, that agree with the observations, as described in Appendix C.
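A minimal sketch of this error model for point sources ($a = 1$): the signal-to-noise ratio scales with flux and equals $n = 5$ at the magnitude limit, and a Gaussian magnitude error is then drawn. The conversion from signal-to-noise ratio to magnitude error, $\sigma = 2.5\log_{10}(1 + (S/N)^{-1})$, is a common convention that we assume here; it is not spelled out in the text above. The magnitude limit in the example is hypothetical. In practice each band would use its own limit (e.g. from Table B1 or the CFHTLenS catalogues).

```python
import numpy as np


def signal_to_noise(m_true, m_lim, n=5.0):
    """Point-source (a = 1) S/N: equal to n at the magnitude limit, scaling with flux."""
    return n * 10.0 ** (-0.4 * (np.asarray(m_true, dtype=float) - m_lim))


def magnitude_error(snr):
    """Assumed conversion from S/N to a 1-sigma magnitude error."""
    return 2.5 * np.log10(1.0 + 1.0 / snr)


def perturb_magnitudes(m_true, m_lim, rng, n=5.0):
    """Add zero-mean Gaussian noise with an S/N-dependent width to each magnitude."""
    sigma = magnitude_error(signal_to_noise(m_true, m_lim, n))
    return np.asarray(m_true, dtype=float) + rng.normal(0.0, sigma)


rng = np.random.default_rng(11)
m_lim = 22.5  # hypothetical limit for a single band
faint = perturb_magnitudes(np.full(200_000, 22.5), m_lim, rng)
bright = perturb_magnitudes(np.full(200_000, 20.0), m_lim, rng)
```

Galaxies 2.5 mag brighter than the limit have ten times the S/N, so their perturbed magnitudes scatter much less than those of galaxies at the limit.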

APPENDIX C: THE CHARACTERISTICS OF PHOTOMETRIC REDSHIFTS IN THE MOCK CATALOGUE
After assigning magnitude errors to the model galaxies as described above in Appendix B, we run the BCNz2 code developed by Eriksen et al. (2019) to estimate photometric redshifts for the mock catalogue.
Here we examine the resulting scatter in the estimated redshifts and the fraction of outliers, i.e. redshifts with catastrophic errors, and compare these with the results found for the observations. Following Brammer et al. (2008), a quality factor, $Q_z$, is calculated for each photometric redshift to quantify our confidence in its accuracy:
$$ Q_z = \frac{\chi^2}{N_{\rm f} - 3} \, \frac{z_{99} - z_{1}}{\rm ODDS} , $$
where $N_{\rm f}$ is the number of filters used to sample the spectral energy distribution (SED) of the galaxy, $\chi^2$ is the metric describing how well the template SED fits the observations, $z_{99}$ is the redshift below which 99 per cent of the redshift probability distribution lies and $z_{1}$ is the redshift below which 1 per cent of the probability density function lies. The ODDS quantity is defined as
$$ {\rm ODDS} = \int_{z_{\rm b} - \Delta z}^{z_{\rm b} + \Delta z} p(z) \, {\rm d}z , $$
where $p(z)$ is the redshift probability density function and $z_{\rm b}$ is the mode of $p(z)$. Note that $\Delta z = 0.0035$ is smaller than the value typically used for BB filters, and has been reduced to reflect the width of the PAUS NB filters. These choices are discussed at length in Eriksen et al. (2019). A galaxy with a good photometric redshift has a low $Q_z$ value, as this implies a low value of $\chi^2$ and a high value of ODDS (due to a sharply peaked, narrow $p(z)$). Calculating $Q_z$ for the model galaxies allows us to study the errors and metrics for different subsamples of galaxies, as is usually done for the data. The distribution of $Q_z$ values recovered from the mock catalogue is compared with that from the observations in Fig. C1 (the two panels show different levels of zoom). The distribution for the mock galaxies is remarkably close to that estimated for the PAUS galaxies, particularly for the subsample with spectroscopic matches (labelled PAUS SPEC).
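The quality factor can be sketched as below. This assumes the form $Q_z = \chi^2/(N_{\rm f}-3)\,(z_{99}-z_1)/{\rm ODDS}$ of Brammer et al. (2008), with ODDS integrated over $z_{\rm b} \pm \Delta z$, and a $p(z)$ tabulated on a fine, uniform redshift grid. Function names, the Gaussian test PDFs, $\chi^2$ and the filter count are hypothetical. BCNz2 computes these quantities internally, and its implementation may differ in detail.

```python
import numpy as np


def _cdf(z, pdf):
    """Normalised cumulative distribution of p(z) using the trapezoidal rule."""
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return cdf / cdf[-1]


def odds(z, pdf, dz=0.0035):
    """Probability contained within +-dz of the mode z_b of p(z)."""
    cdf = _cdf(z, pdf)
    z_b = z[np.argmax(pdf)]
    return float(np.interp(z_b + dz, z, cdf) - np.interp(z_b - dz, z, cdf))


def quality_factor(z, pdf, chi2, n_filters, dz=0.0035):
    """Q_z = chi2 / (N_f - 3) * (z_99 - z_1) / ODDS; lower is better."""
    cdf = _cdf(z, pdf)
    z_1 = z[np.searchsorted(cdf, 0.01)]
    z_99 = z[np.searchsorted(cdf, 0.99)]
    return chi2 / (n_filters - 3) * (z_99 - z_1) / odds(z, pdf, dz)


def best_fraction(q_values, fraction=0.5):
    """Boolean mask selecting the given fraction of objects with the lowest Q_z."""
    q_values = np.asarray(q_values, dtype=float)
    return q_values <= np.quantile(q_values, fraction)


z = np.linspace(0.0, 1.5, 15_001)
narrow = np.exp(-0.5 * ((z - 0.5) / 0.003) ** 2)  # sharply peaked p(z)
broad = np.exp(-0.5 * ((z - 0.5) / 0.05) ** 2)    # poorly constrained p(z)
q_narrow = quality_factor(z, narrow, chi2=40.0, n_filters=45)
q_broad = quality_factor(z, broad, chi2=40.0, n_filters=45)
```

With the same $\chi^2$, the broad $p(z)$ has both a wider 1 to 99 per cent interval and a smaller ODDS, so its $Q_z$ is much larger. `best_fraction(q, 0.5)` then mimics selecting the best 50 per cent of a sample.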
We next consider, in Fig. C2, the centralised estimate of the scatter, $\sigma_{68}$, as a function of magnitude, for different subsamples from the mock, defined using the $Q_z$ value. This plot can be compared with the upper panel of fig. 3 from Eriksen et al. (2020). The agreement with the PAUS SPEC sample is remarkably good (especially when selecting the best 50 per cent of the sample based on the $Q_z$ value), though we note that this sample is biased towards brighter galaxies than the magnitude-limited mock catalogue, because of the difficulty of obtaining spectra of very faint objects.
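For reference, a common definition of the centralised scatter, which we assume here, is half the width of the central 68 per cent interval of $(z_{\rm photo} - z_{\rm spec})/(1 + z_{\rm spec})$. The cumulative version shown in Fig. C2 uses all galaxies brighter than each magnitude. The following is a sketch with hypothetical names and a synthetic sample whose errors grow with magnitude.

```python
import numpy as np


def sigma68(z_photo, z_spec):
    """Half the 15.87-84.13 percentile range of (z_photo - z_spec) / (1 + z_spec)."""
    z_photo = np.asarray(z_photo, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_photo - z_spec) / (1.0 + z_spec)
    lo, hi = np.percentile(dz, [15.87, 84.13])
    return 0.5 * (hi - lo)


def cumulative_sigma68(mag, z_photo, z_spec, mag_limits):
    """sigma68 for all galaxies brighter than each magnitude limit."""
    mag, z_photo, z_spec = map(np.asarray, (mag, z_photo, z_spec))
    return np.array([sigma68(z_photo[mag < m], z_spec[mag < m]) for m in mag_limits])


rng = np.random.default_rng(5)
z_spec = rng.uniform(0.1, 1.0, 50_000)
mag = rng.uniform(18.0, 22.5, z_spec.size)
# Synthetic photo-z errors that grow with magnitude (illustrative only).
scale = 0.002 + 0.002 * (mag - 18.0)
z_photo = z_spec + (1.0 + z_spec) * rng.normal(0.0, scale)
s68 = cumulative_sigma68(mag, z_photo, z_spec, [19.0, 20.5, 22.5])
```

Because it is based on percentiles, $\sigma_{68}$ is insensitive to the catastrophic outliers, which are quantified separately.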
As well as the scatter, the performance of the photometric redshift estimation can be quantified using the fraction of outliers produced. Following Eriksen et al. (2019), we define the outlier fraction as the number of galaxies that satisfy the outlier criterion of that work, normalised by the total number of galaxies in the sample. The outlier fraction is shown in the lower panel of Fig. C2. The mock shows a similar trend to the PAUS data in the outlier fraction as a function of magnitude, but with somewhat lower overall values than in the observations. This holds both for the full sample (solid lines) and for the 50 per cent best quality redshifts according to the $Q_z$ criterion (dashed lines).
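The outlier fraction can be computed in the same spirit. The sketch below assumes a criterion of the form $|z_{\rm photo} - z_{\rm spec}|/(1 + z_{\rm spec}) > 0.02$. The threshold is an assumed value that should be checked against Eriksen et al. (2019) before use.

```python
import numpy as np


def outlier_fraction(z_photo, z_spec, threshold=0.02):
    """Fraction of galaxies with |z_photo - z_spec| / (1 + z_spec) above threshold."""
    z_photo = np.asarray(z_photo, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    rel_err = np.abs(z_photo - z_spec) / (1.0 + z_spec)
    return float(np.mean(rel_err > threshold))


def cumulative_outlier_fraction(mag, z_photo, z_spec, mag_limits, threshold=0.02):
    """Outlier fraction for all galaxies brighter than each magnitude limit."""
    mag, z_photo, z_spec = map(np.asarray, (mag, z_photo, z_spec))
    return np.array([
        outlier_fraction(z_photo[mag < m], z_spec[mag < m], threshold) for m in mag_limits
    ])


# Four hypothetical galaxies with relative errors of 0, ~0.007, ~0.063 and ~0.21.
frac = outlier_fraction([0.5, 0.5, 0.5, 0.5], [0.5, 0.51, 0.6, 0.9])
```

Here two of the four hypothetical galaxies exceed the assumed threshold, giving an outlier fraction of 0.5.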

Figure 1. Projected angular positions of galaxies in the lightcone mock catalogue (analogous to right ascension and declination, in degrees) in three redshift intervals (as labelled), separated into red (left column) and blue (right column) galaxies according to their observed  −  colour (see Fig. 5). The lightcone covers approximately 100 deg$^2$ and is magnitude limited to $i_{\rm AB} = 22.5$. The presence of two large clusters at low redshift (top panels) can affect the number counts. For reference, the thick black bar in each panel indicates a scale of 10 Mpc. The number of galaxies plotted is given in the top left of each panel.

Figure 2. Number counts in the $i$-band predicted from the GALFORM mock (thick green line), compared with the number counts from Capak et al. (2007) (red points) and the PAUS data in the W1 and W3 fields for different selections: the full photometric sample (orange line), which includes all objects observed in the narrow-band (NB) filter NB455 (these may not have a redshift estimate); objects with star_flag = 0 (blue line), which have been classified as galaxies by a CFHTLenS star-galaxy separation algorithm; objects with star_flag = 1 (violet line), which have been classified as stars; the total photo-z sample (pink line), i.e. the galaxies that have a PAUS redshift estimate (which requires observations in a large fraction of the NB filters); and the 50 per cent of the sample with the best quality redshifts (brown line) according to the quality flag $Q_z$, as described in Eriksen et al. (2019).

Figure 3. The redshift distribution of galaxies brighter than $r_{\rm AB} = 19.8$ (left) and $i_{\rm AB} = 22.5$ (right). In both cases the red histograms show the measurements from the PAUS W1 and W3 fields combined, after imposing the star_flag = 0 cut to reject stars. The amplitude of the red histograms has been boosted by dividing by the sampling rate factor stated in the legend, to account for the fraction of objects missing from the photometric redshift catalogue because they have fewer than 30 narrow band measurements. The green histograms show the lightcone redshift distributions, using the exact redshifts (i.e. the cosmological redshift plus the contribution of peculiar velocities predicted by the model) rather than the photometric redshifts discussed in Sec. 3.2. The blue curves show a simple parametric fit to the green histograms (see text). The orange curve in the left panel shows a fit to the redshift distribution measured from the GAMA survey by Smith et al. (2017).

Figure 4. Left: relation between the lightcone redshifts ($z_{\rm spec}$) and the photometric redshifts ($z_{\rm photo}$) obtained using the BCNz2 photo-z pipeline (Eriksen et al. 2019). The photometric redshifts are the result of running BCNz2 on the broad-band and narrow-band filters, with errors modelled using the prescription described in Appendix B. Right: relative error on the redshift, estimated from the difference between the photometric and spectroscopic (lightcone) redshifts. The red line shows the median error in redshift bins for the GALFORM mock, with error bars indicating the 16th to 84th percentile range. The blue and light blue lines show the same quantity for subsamples of PAUS W1 and W3, respectively, matched with spectroscopic measurements from other overlapping surveys (for details, see Navarro-Gironés et al. 2023). The scale on the $y$-axis and the centralised $\sigma_{68}$ values quoted in the legend have been multiplied by 1000, to facilitate comparison with the plots in Eriksen et al. (2019). In both panels, the points are from the mock and are coloured according to the density of points per pixel, from violet (low density) to yellow (high density).

Figure 6. Running medians of the observed  −  colour versus redshift for GALFORM galaxies, comparing the cases with (pink and cyan dotted lines) and without (red and blue lines) errors in the galaxy photometry and photometric redshifts.

Figure 7. Running medians of the observed  −  colour versus redshift. In each case (lightcone, PAUS and VIPERS), red and blue galaxies have been split according to the white line in Fig. 5, and the median has been computed separately for the two populations. Left: the running median computed for different apparent magnitude limits. Right: the running median computed for different quality cuts, using the property $Q_z$ (see Eriksen et al. 2019) to identify the 50, 20 and 10 per cent best quality redshifts in the sample.

Figure 8. Histograms of the observed  −  colour in different redshift bins (different columns) for a narrow bin in $i$-band magnitude ($21.7 < i_{\rm AB} < 22.0$), for the lightcone (unperturbed: blue histogram labelled GALFORM; perturbed: orange histogram labelled GALFORM-PHOTO-Z) and the PAUS W1 + W3 fields (green histogram). The two rows show different cuts on the quality of the photometric redshifts (the unperturbed lightcone, blue histogram, is unchanged as it uses cosmological redshifts taken directly from the simulation). Specifically, the first row is for the full sample, while the second row retains the best 50 per cent of objects according to the $Q_z$ criterion. We note that for GALFORM-PHOTO-Z the photometric redshifts are used to select the sample plotted.

Figure 9. The sensitivity of observed galaxy colours to the strength of SNe feedback, for samples limited to $i_{\rm AB} = 22.5$. In the default model, the scale velocity used in the mass loading of the SNe-driven wind is $V_{\rm SN} = 380$ km s$^{-1}$. The colour-redshift relation for this model is shown by the solid lines in the top panel. The bars indicate the 25th to 75th percentile spread of the colours for the red and blue galaxy populations separately (i.e. those which fall on either side of the black line). The variant models correspond to $V_{\rm SN} = 280$ km s$^{-1}$ (dashed) and $V_{\rm SN} = 480$ km s$^{-1}$ (dotted). The interquartile range is also shown for the variant models in the corresponding line style. The upper panel shows that these changes shift the median colours of the red and blue populations. The lower panel shows the logarithm of the number of objects in a given population (red or blue, as per the line colour) in each variant (line style), normalised by the number of objects in the same population in the fiducial model. The -band luminosity functions in the variants have been rescaled to match that in the fiducial model at $z = 0$, which affects the sample of galaxies plotted, but not their colours.

Figure 10. The effect on galaxy colours of changing the star formation efficiency parameter, $\nu_{\rm SF}$. The top panel shows the median  −  colour versus redshift, along with the 25th to 75th percentile range. The value in the fiducial model is $\nu_{\rm SF} = 0.5$. The dashed line shows the predictions with $\nu_{\rm SF} = 0.2$ and the dotted line shows $\nu_{\rm SF} = 1.7$. This range for $\nu_{\rm SF}$ is inferred from observations (see Blitz & Rosolowsky 2006). With the same colour and line scheme as the upper panel, the bottom panel shows the log of the ratio between the number of galaxies in the variant model and the number in the fiducial model for the population in question (red or blue). Note that in this and subsequent plots (Figs. 11 and 12), the interquartile colour ranges for the variant models are similar to those for the fiducial model and so are not shown.

Figure 12. The effect on galaxy colours of changing the parameter that governs the halo mass at which AGN heating starts to prevent the cooling of the gas, $\alpha_{\rm cool}$. The top panel shows the median  −  colour versus redshift, along with the 25th to 75th percentile range. The value in the fiducial model is $\alpha_{\rm cool} = 0.72$. The dashed lines show the model with $\alpha_{\rm cool} = 0.5$ and the dotted lines show the results for $\alpha_{\rm cool} = 0.9$. With the same colour and line scheme as the upper panel, the bottom panel shows the log of the ratio between the number of galaxies in the variant model and the number of galaxies in the fiducial model for the population being studied (red or blue).

Figure A1. A colour-redshift plot as a test of the interpolation scheme for observer frame magnitudes using narrow band filters. The narrow band filters used have central wavelengths of 6450 and 7450 Å, as labelled on the $y$-axis. The GALFORM magnitudes are plotted without errors for this purpose. The red points show the galaxies plotted using the redshift from the lightcone. The blue points are plotted at the estimated photometric redshifts. The vertical green lines show the redshifts of the outputs in the P-Millennium N-body simulation. There is no stepping or banding apparent in the observed galaxy colour, even when plotted using the estimated photometric redshift. Furthermore, there is no indication that the estimated photometric redshifts favour the snapshot redshifts.
van den Busch et al. (2020) model the signal-to-noise ratio as a function of the magnitude limit in band $j$, $m_j^{\rm lim}$, as: $(S/N)_j = 10^{-0.4(m_j^{\rm true} - m_j^{\rm lim})}$

Figure C1. Normalised distribution of the redshift quality factor $Q_z$ for three samples: the red histogram shows the GALFORM lightcone model, the blue histogram shows a spectroscopically matched PAUS subsample (PAUS SPEC) and the green histogram shows the full photometric PAUS sample used here (PAUS PHOTO). The lower panel is a zoomed-in version of the upper panel, showing the region of low $Q_z$, i.e. good photometric redshifts.

Figure C2. Upper panel: cumulative plot of $\sigma_{68}$ as a function of magnitude. Red lines show the scatter obtained from the GALFORM mock, while blue lines are from the spectroscopically matched PAUS subsample (PAUS SPEC). Dashed lines are for the best 50 per cent of the respective sample based on the $Q_z$ value. Lower panel: cumulative fraction of outliers as a function of magnitude. The red solid line shows the results for the GALFORM mock (the dashed line shows the outlier fraction for the 50 per cent highest quality redshifts based on the $Q_z$ value). Blue lines show the corresponding quantities for the spectroscopically matched PAUS sample (solid lines for the full sample and dashed lines for the best 50 per cent). Both panels can be compared with fig. 3 of Eriksen et al. (2019).

Table 1. The parameter values explored in the variant models. The first column gives the parameter name. The third column gives the fiducial value of the parameter, whereas the second and fourth columns give the low and high values considered, respectively.
ETH Zurich, Leiden University (via ERC StG ADULT-279396 and Netherlands Organisation for Scientific Research (NWO) Vici grant 639.043.512), University College London and from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 776247 EWC. The PAU data center is hosted by the Port d'Informació Científica (PIC), maintained through a collaboration of CIEMAT and IFAE, with additional support from Universitat Autònoma de Barcelona and European Regional Development Fund. We acknowledge the PIC services department team for their support and fruitful discussions. JYHS acknowledges financial support via the Fundamental Research Grant Scheme (FRGS) by the Malaysian Ministry of Higher Education with code FRGS/1/2023/STG07/USM/02/14. PR acknowledges the support of the Tsinghua Shui Mu Scholarship, the funding of the National Key R&D Program of China (grant no. 2018YFA0404503), the National Science Foundation of China (grant no. 12073014), the science research grants from the China Manned Space Project with No. CMS-CSST2021-A05, and the Tsinghua University Initiative Scientific Research Program (No. 20223080023). JGB acknowledges support from the Spanish Research Project PID2021-123012NB-C43 [MICINN-FEDER], and the Centro de Excelencia Severo Ochoa Program CEX2020-001007-S at IFT. HHo acknowledges support from the Netherlands Organisation for Scientific Research (NWO) through grant 639.043.512. HHi is supported by a DFG Heisenberg grant (Hi 1495/5-1), the DFG Collaborative Research Center SFB1491, as well as an ERC Consolidator Grant (No. 770935). JC acknowledges support from the Spanish Research Project PID2021-123012NA-C44 [MICINN-FEDER]. MS has been supported by the Polish National Agency for Academic Exchange (Bekker grant BPN/BEK/2021/1/00298/DEC/1), the European Union's Horizon 2020 Research and Innovation programme under the Maria Sklodowska-Curie grant agreement (No. 754510). EJG was partially supported by Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación and FEDER (PID2021-123012NA-C44). Finally, I thank the members of the GALFORM team around the world for their informal contributions, including Shaun Cole, Cedric Lacey, Piotr Oleskiewicz, Andrew Griffin, Alex Smith, Adarsh Kumar, Difu Shi, Sownak Bose, Andrew Benson, Will Cowley, Ed Elliot, Alex Merson and Violeta Gonzalez-Perez, the whole COSMA SUPPORT team, including Alan Lotts and Lydia Heck, and the whole PAUS collaboration, including Santi Serrano, Alex Alarcon and Andrea Pocino.

Table B1. The narrow band magnitude limits for the W3 field, for a point source detected at $5\sigma$. The first column gives the central wavelength of the filter in nm and the second column gives the magnitude limit for the band.