Dark Energy Survey Year 3 Results: Redshift Calibration of the MagLim Lens Sample from the combination of SOMPZ and clustering and its impact on Cosmology

We present an alternative calibration of the MagLim lens sample redshift distributions from the Dark Energy Survey (DES) first three years of data (Y3). The new calibration is based on a combination of a Self-Organising Maps based scheme and clustering redshifts to estimate redshift distributions and inherent uncertainties, which is expected to be more accurate than the original DES Y3 redshift calibration of the lens sample. We describe the methodology in detail, validate it on simulations, and discuss the main effects dominating our error budget. The new calibration is in fair agreement with the fiducial DES Y3 redshift distributions calibration, with only mild differences ($<3\sigma$) in the means and widths of the distributions. We study the impact of this new calibration on cosmological constraints, analysing DES Y3 galaxy clustering and galaxy-galaxy lensing measurements, assuming a $\Lambda$CDM cosmology. We obtain $\Omega_{\rm m} = 0.30\pm 0.04$, $\sigma_8 = 0.81\pm 0.07$ and $S_8 = 0.81\pm 0.04$, which implies a $\sim 0.4\sigma$ shift in the $\Omega_{\rm m}-S_8$ plane compared to the fiducial DES Y3 results, highlighting the importance of the redshift calibration of the lens sample in multi-probe cosmological analyses.


INTRODUCTION
The Dark Energy Survey (DES, Flaugher et al. 2015) is the largest photometric galaxy survey to date, spanning 5000 deg$^2$ of the southern hemisphere and having detected hundreds of millions of galaxies. Together with other ongoing and future galaxy surveys (e.g., Kilo-Degree Survey KiDS, Kuijken et al. 2015; Hyper Suprime-Cam HSC, Aihara et al. 2018; Vera Rubin Observatory Legacy Survey of Space and Time (LSST), LSST Science Collaboration et al. 2009; Euclid, Laureijs et al. 2011), DES can achieve competitive constraints on cosmological parameters by studying both the spatial distribution of the detected galaxies and the tiny distortions in their observed shapes due to gravitational lensing by the large scale structure of the Universe. For instance, the analysis of the first three years (Y3) of DES data (DES Collaboration 2022) placed tight constraints on cosmological parameters by combining three different two-point correlation function measurements (3x2pt) that involve galaxy positions and measured galaxy shapes. These measurements are:
(i) Cosmic shear, i.e. the 2-point correlation function of galaxy shapes; the DES Y3 measurements (Amon & Gruen et al. 2022; Secco & Samuroff et al. 2022) involve the angular correlation of $10^8$ galaxy shapes from the weak lensing sample (Gatti & Sheldon et al. 2021), divided into four tomographic bins. We refer to this as the "source" sample.
(ii) Galaxy clustering: the 2-point correlation function of the positions of bright galaxies (which we refer to as the "lens" sample) (Rodríguez-Monroy et al. 2022);
(iii) Galaxy-galaxy lensing: the cross-correlation function of galaxy shapes and the position of the galaxies of the lens sample (Prat et al. 2022).
The modelling of each of these correlation functions requires knowledge of the redshift distributions (hereafter n(z)) of the two samples (lens and source galaxies), which have to be estimated with great accuracy in order to avoid biased cosmological results (Huterer et al. 2006; Cunha et al. 2012; Benjamin et al. 2013; Huterer et al. 2013; Bonnett et al. 2016; Hildebrandt et al. 2017; Hoyle et al. 2018; Joudaki et al. 2020; Hildebrandt et al. 2021; Tessore & Harrison 2020). The optimal solution would be to rely on spectroscopic observations, which provide an accurate redshift measurement for each targeted galaxy. Unfortunately, it is not feasible to obtain such spectra for more than a small fraction of the science sample, due to the time and cost required by the observing campaign. Cosmological surveys like DES therefore have to base their redshift estimates on only a few noisy broad-band fluxes, which requires inventive methods to create robust and unbiased redshift calibration pipelines.
For the DES Y3 3x2pt analysis, two different lens samples were used. The first sample is defined by selecting luminous red galaxies through the RedMaGiC algorithm (Rozo et al. 2016), which retains galaxies with high quality photometric redshift, by fitting each galaxy to a red-sequence template. The galaxies passing the RedMaGiC selection have, however, a low number density, and the final sample comprises roughly 3,000,000 galaxies. The second sample slightly compromises on the redshift accuracy to the benefit of a larger number density. The MagLim sample (Porredon et al. 2021b) is a magnitude-limited sample with a number density more than 3 times greater than RedMaGiC, comprising roughly 10,000,000 galaxies. In the fiducial DES 3x2pt (DES Collaboration 2022) and 2x2pt analyses (Porredon et al. 2021a) that rely on the MagLim sample, the redshift distributions of the sample have been characterised using the machine learning photometric redshift code Directional Neighbourhood Fitting (DNF, De Vicente et al. 2016). In its current implementation, the DNF code provides per-galaxy redshift estimates using nearest neighbour techniques. The redshift distributions were then further calibrated using clustering redshift (hereafter WZ), which relies on cross-correlation measurements with spectroscopic samples (Cawthon et al. 2022). This calibration step also placed uncertainties on the redshift distribution estimates, which were modelled by "shifting" and "stretching" the redshift distributions.
This work presents an additional and more sophisticated calibration of the redshift distributions of the lens sample, and studies the impact of these new redshift distribution estimates on the cosmological constraints using DES Y3 galaxy clustering and galaxy-galaxy lensing measurements (2x2pt). In particular, we adopt an approach similar to the one used to characterise the redshift distributions of the DES Y3 weak lensing (WL) sample, presented in Myles & Alarcon et al. (2020) and Gatti & Giannini et al. (2022). This methodology also combines photometric and clustering constraints to produce redshift estimates, and it is more powerful than the fiducial redshift calibration adopted for the lenses for a number of reasons. The photometric information is used to produce redshift estimates using a self-organizing-map-based scheme (hereafter SOMPZ), which allows meticulous control over all the (known) potential sources of uncertainty affecting the estimates. The SOMPZ method works by leveraging the DES deep fields, which have deeper observations with additional photometric bands and overlap with the available many-band redshift surveys. It is possible to reproduce realistic selection functions in the deep fields by injecting galaxies into actual DES images using the sophisticated image simulation tool Balrog (Everett et al. 2022).
The SOMPZ method provides an ensemble of redshift distributions n(z) for a given galaxy sample, which captures the uncertainties in the redshift distributions at all orders (i.e., not only in the mean or width of the distributions). The clustering constraints are then incorporated through a rigorous joint likelihood framework where the clustering data is forward modelled as a function of the input n(z), and the specific WZ systematics are marginalized over. This scheme allows us to draw n(z) samples conditioned on both clustering and photometric measurements, improving the n(z) estimates by correctly weighting the information provided by each source. This combined approach has proven to be more robust than SOMPZ or WZ applied individually (Gatti & Giannini et al. 2022), as the combination exploits the complementarity of both methods and reduces the overall n(z) uncertainty.
The paper is organised as follows. In section 2 we introduce all the samples used in this work, both on data and simulations. Simulated samples are used to validate the methodology. Section 3 summarises the SOMPZ+WZ methodology adopted in this paper, also outlining the differences with the "standard" SOMPZ+WZ methodology used to model the DES Y3 source redshift distributions (Myles et al. 2020; Gatti et al. 2022). Section 4 is devoted to the characterisation of the method's uncertainties. Section 5 presents the redshift distributions of the MagLim sample produced using the techniques described in this work. Section 6 describes the impact of this new redshift calibration on cosmological parameter estimation and compares it to the "fiducial" constraints obtained using the DNF+WZ redshift calibration (Porredon et al. 2021a). In Appendix A we provide details on the construction of the MagLim sample in simulations. Appendix B complements the paper with a validation of the methodology in simulations. Appendix C lists the parameter values and the prior functions used in the cosmological inference; Appendix D discusses the impact of different redshift uncertainty marginalisation techniques on cosmological parameter estimation.
The samples used in this work are:
• the DES MagLim sample, used as lenses in the DES cosmological analysis. Characterising its redshift distribution is the main goal of this work;
• the DES deep field samples, which are observed in small fields by DES with deeper observations than the wide field ones and where information from additional photometric bands is available. Deep fields are a key element of the SOMPZ methodology;
• the DES Balrog sample; this sample consists of deep field galaxies injected by software into DES wide field images and is a key element of the SOMPZ methodology;
• the "redshift" samples, which are a collection of either spectroscopic or multi-band photometric samples collected by other surveys in the DES deep field region. The redshift samples are a key element in the SOMPZ methodology;
• BOSS/eBOSS spectroscopic galaxy catalogues; these are galaxies with spectroscopic redshift overlapping with the DES wide field footprint, used for the WZ measurement;
• the DES WL sample, used as sources in the DES cosmological analysis; we use the WL sample here when presenting the impact of MagLim SOMPZ redshift distributions on the cosmological analysis results.
All of these samples in data have also been reproduced in simulation for testing purposes.

DES Year 3 Data
DES (Flaugher et al. 2015) is a five broadband ($grizY$) photometric survey that mapped roughly 5000 deg$^2$ of the southern sky, using a 570 megapixel camera (DECam; Flaugher et al. 2015) mounted on the 4 meter Blanco telescope at the Cerro Tololo Inter-American Observatory (CTIO) in Chile. In this work we use data from the first three years (out of six) of observations (Y3), which were taken from August 2013 to February 2016. The DES Data Management (DESDM) team was in charge of processing the raw images (Sevilla et al. 2011; Morganson et al. 2018; Abbott et al. 2018); full details are provided in Sevilla-Noarbe et al. (2021) and Gatti & Sheldon et al. (2021). The main catalog upon which all the DES samples are built is the DES gold catalog, obtained using observations in the $griz$ bands. Objects belonging to the gold catalog have passed a number of selections aimed at removing objects in problematic regions of the sky or anomalous detections (e.g., objects with pixels affected by saturation or truncation issues). The gold catalog consists of 388 million objects (Sevilla-Noarbe et al. 2021). Each object comes with morphological and photometric measurements based on two different pipelines, the Multi-Object Fitting pipeline (MOF) and the Single-Object Fitting pipeline (SOF). The former performs a simultaneous multi-object, multi-epoch, multi-band fit to estimate morphology and photometric information; the latter does not perform the multi-object fit when it comes to crowded objects. The DES Y3 SOF implementation is faster and less prone to fit failures compared to the MOF pipeline, and it does not suffer from any significant loss in terms of accuracy (Sevilla-Noarbe et al. 2021).

MagLim sample
The main galaxy sample considered in this work is the MagLim sample. The MagLim sample is a subset of the DES gold catalog  1. The MagLim sample is used as the lens sample in the galaxy-galaxy lensing and galaxy clustering measurements of the DES Y3 2x2pt cosmological analysis (Porredon et al. 2021a).

DNF
The photo-$z$ code DNF (Directional Neighborhood Fitting) is used to define the MagLim selection and to define the MagLim tomographic bins. The DNF algorithm computes a point estimate $z_{\rm mean}$ of the redshift of each galaxy by performing a fit to a hyper-plane in color and magnitude space using up to 80 nearest neighbors taken from a reference sample made of spectroscopic galaxies with secure redshift information. For this purpose, a large number of spectroscopic catalogs collected by Gschwend et al. (2018) has been used, including spectra from SDSS DR14 (Abolfathi et al. 2018), OzDES (Lidman et al. 2020), VIPERS (Garilli et al. 2014), and from the PAU spectro-photometric catalog (Eriksen et al. 2019). The total number of spectra used for training is $\sim 10^5$. DNF also provides a redshift estimate $z_{\rm DNF}$ drawn from the redshift PDF for each individual galaxy, although only the quantity $z_{\rm mean}$ (used for the selection and for the binning) is of interest in this work.
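The idea of a hyper-plane fit over nearest neighbours can be illustrated with a minimal sketch. It is not the DNF code: it uses plain Euclidean distances and an unweighted least-squares fit, whereas DNF defines its own neighbourhood metric, and the function name `hyperplane_photoz` and the toy training set are hypothetical.

```python
import numpy as np


def hyperplane_photoz(train_x, train_z, target_x, n_neighbors=80):
    """Point redshift estimate from a local linear (hyper-plane) fit.

    train_x : (n_train, n_features) colors/magnitudes of the reference sample
    train_z : (n_train,) secure redshifts of the reference sample
    target_x: (n_features,) colors/magnitudes of the galaxy to estimate
    """
    train_x = np.asarray(train_x, dtype=float)
    train_z = np.asarray(train_z, dtype=float)
    target_x = np.asarray(target_x, dtype=float)
    k = min(n_neighbors, len(train_x))
    # Assumption: Euclidean distance in feature space selects the neighbours.
    dist2 = np.sum((train_x - target_x) ** 2, axis=1)
    nearest = np.argsort(dist2)[:k]
    # Fit z = a0 + a . x to the neighbours by linear least squares.
    design = np.column_stack([np.ones(k), train_x[nearest]])
    coeffs, *_ = np.linalg.lstsq(design, train_z[nearest], rcond=None)
    return float(coeffs[0] + coeffs[1:] @ target_x)


# Toy reference sample in which redshift is exactly linear in the features.
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(500, 3))
true_coeffs = np.array([0.2, -0.05, 0.3])
toy_z = 0.1 + toy_x @ true_coeffs
query = np.array([0.3, -0.2, 0.1])
z_est = hyperplane_photoz(toy_x, toy_z, query)
```

On this noiseless toy sample the fit recovers the input relation; with real photometry the neighbours would also provide an estimate of the scatter around the plane.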

Deep Fields sample
The deep fields catalog is a key element of the SOMPZ methodology. We provide here a few key details, but we refer the reader to Hartley et al. (2022) for extensive details and the characterisation of the sample. This work uses four different deep fields, i.e., E2, X3, C3 and COSMOS (COS), covering 3.32, 3.29, 1.94, and 1.38 square degrees, respectively. Each deep field has undergone a scrupulous masking procedure aimed at removing artefacts (e.g., cosmic rays, meteors, saturated pixels, etc.). Considering the final unmasked area overlapping with the UltraVISTA and VIDEO near-infrared (NIR) surveys (McCracken et al. 2012; Jarvis et al. 2013), which is needed to provide photometric information in additional bands, we are left with 5.2 square degrees of area for a total of 267,229 galaxies with measured $u, g, r, i, z, J, H, K_s$ photometry with limiting magnitudes 24.64, 25.57, 25.28, 24.66, 24.06, 24.02, 23.69, and 23.58. Note that deep field galaxies have deeper photometry, available in more bands, than the wide field galaxies; this is key for a good performance of the SOMPZ method as it reduces the color-redshift degeneracy.

Balrog sample
The Balrog sample is another key element of the SOMPZ methodology. It is used to relate galaxies with given deep photometry to observed galaxies with wide field photometry, which are noisier. To this aim we rely on Balrog (Suchyta et al. 2016), a software package that injects "fake" galaxies into real images. For this analysis, Balrog was used to inject deep field galaxies into the broader wide field footprint (Everett et al. 2022). After injecting galaxies into images, the output Balrog images are passed into the DES Y3 photometric pipeline and injected galaxies are detected equivalently to real galaxies, yielding multiple realisations of each injected galaxy. The Balrog sample spans ∼20% of the DES Y3 footprint. We further select injected galaxies using the MagLim selection. We then construct a matched catalog matching Balrog injected wide field MagLim galaxies with their deep field counterparts, for a total of 351,165 galaxies with both deep and wide photometric information. The resulting catalog is called the Balrog sample.

Redshift Samples
The redshift samples used for the SOMPZ section of the analysis consist of galaxies with secure redshift information (either spectroscopic or high quality multi-band photometric) observed in the deep fields. These samples are key to characterise the redshifts of the deep field sample and, in turn, to transfer the redshift information to the wide field MagLim sample.
We consider three separate redshift selections, similarly to what has been used in the source sample redshift characterisation (Myles & Alarcon et al. 2020):
• a collection of spectra from a number of different public and private spectroscopic samples, from the spectroscopic compilation by Gschwend et al. (2018). We have not restricted ourselves to a few selected surveys as in the case of the DES Y3 weak lensing sample (Myles & Alarcon et al. 2020), where only zCOSMOS (Lilly et al. 2009), C3R2 (Masters et al. 2017, 2019), VVDS (Le Fèvre et al. 2013), and VIPERS (Scodeggio et al. 2018) were considered, because, due to the bright nature of the MagLim sample, we would mostly select high signal-to-noise galaxies. Furthermore, using more spectra from different surveys allows us to simultaneously reduce the shot noise and improve the completeness of the sample, while minimising the impact of possible outliers;
• multi-band photo-$z$ galaxies from the COSMOS field; in particular, we used the COSMOS2015 30-band photometric redshift catalog (Laigle et al. 2016), which is equipped with narrow, intermediate and broad bands covering the IR, optical, and UV regions of the electromagnetic spectrum;
• redshifts from the PAUS+COSMOS 66-band photometric redshift catalog (Alarcon et al. 2021), which adds 40 narrow band filters from the PAU Survey.
We match these redshift catalogs to our deep field galaxies and keep only those that are selected at least once into our MagLim selection according to their Balrog injections. Due to the bright nature of the MagLim sample, the number of galaxies in our final redshift samples is greatly reduced: for the SPC sample, for example, the total number of unique galaxies drops from 118745 to 17718, a reduction of around 85%.
In some cases, the same galaxy might have redshift information from multiple surveys. Following Myles & Alarcon et al. (2020), we created three slightly different redshift samples, where in case of multiple information from different surveys we use as fiducial the redshift from a specific survey. The samples are:
• 1) SPC, where in case of multiple information available we first use the spectroscopic catalog (S), then PAUS+COSMOS (P), and finally COSMOS2015 (C);
• 2) PC, where we rank the PAUS+COSMOS catalog before COSMOS2015, and we do not include spectroscopic redshifts;
• 3) SC, where we rank the spectroscopic catalog before COSMOS2015, but we do not include the PAUS+COSMOS catalog.
Table 2 summarises the number of unique galaxies appearing in each of the three redshift samples, before and after performing the MagLim sample selection. The fiducial ensemble of redshift distributions is generated by marginalizing over all three of these redshift samples (SPC, PC, SC) with equal prior, which in practice is achieved by simply merging the n(z) samples produced from the three redshift samples, creating a three times larger pool of n(z). In this way we marginalise over potential uncertainties and biases in the different redshift catalogs (S, P and C).
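The ranking of catalogs and the equal-prior merging can be sketched as follows. The catalog labels S, P and C and the three sample names come from the text; the data layout (a dict of available redshifts per galaxy, and ensembles stored as arrays of shape `(n_realisations, n_z)`) and the function names are assumptions made for illustration.

```python
import numpy as np

# Ranking of the redshift catalogs in each redshift sample.
PRIORITY = {
    "SPC": ["S", "P", "C"],
    "PC": ["P", "C"],
    "SC": ["S", "C"],
}


def pick_redshift(available, sample):
    """Return the redshift from the highest-ranked catalog that has one.

    available: dict mapping a catalog label ('S', 'P' or 'C') to a redshift
    sample   : one of 'SPC', 'PC', 'SC'
    Returns None if the galaxy is not covered by any catalog of the sample.
    """
    for label in PRIORITY[sample]:
        if label in available:
            return available[label]
    return None


def merge_ensembles(ensembles):
    """Pool n(z) ensembles of equal size into a single, larger ensemble.

    Using the same number of realisations from each redshift sample gives
    the three samples equal prior weight in the pooled ensemble.
    """
    sizes = {e.shape[0] for e in ensembles}
    if len(sizes) != 1:
        raise ValueError("equal prior requires ensembles of equal size")
    return np.concatenate(ensembles, axis=0)


# Toy usage: one galaxy with three redshifts, three toy ensembles.
galaxy = {"S": 0.50, "P": 0.52, "C": 0.55}
z_spc = pick_redshift(galaxy, "SPC")
z_pc = pick_redshift(galaxy, "PC")
pooled = merge_ensembles([np.ones((10, 50)) for _ in range(3)])
```

If the three ensembles had different sizes, equal weighting would instead require resampling them to a common size before pooling.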

BOSS/eBOSS Galaxy catalogs
The BOSS/eBOSS galaxy catalog is our reference sample for the WZ measurement. It consists of a number of spectroscopic samples from the Sloan Digital Sky Survey (SDSS, Gunn et al. 2006; Eisenstein et al. 2011; Blanton et al. 2017), and combines SDSS galaxies from BOSS (Baryonic Oscillation Spectroscopic Survey, Smee et al. 2013; Dawson et al. 2013) 3, with the final sample consisting of 241,987 objects and covering an area ranging from 14 to 17% of the total DES footprint. We note that estimates of the magnification coefficients are not available for BOSS/eBOSS galaxies. For our fiducial analysis we set the magnification coefficients of the BOSS/eBOSS sample to zero. We are confident in this choice given the narrow shape of the MagLim tomographic bins, since the magnification is usually significant in the tails of the distribution when the clustering kernel due to selection effects is larger. We nonetheless verify in this work that our analysis is not very sensitive to the particular choice of the values of the magnification parameters (see Section 6.1.2).

Weak Lensing catalog
The DES Y3 WL sample is used in this work as the source sample in the galaxy-galaxy lensing measurement with the MagLim sample. The WL sample is created using the metacalibration pipeline (described and tested in Huff & Mandelbaum 2017 and Sheldon & Huff 2017, and applied to the Y3 data in Gatti & Sheldon et al. 2021) and it is a subset of the gold catalog. The metacalibration pipeline provides a per-galaxy self-calibrated shape measurement, which is free from shear and selection biases. An additional, small calibration based on image simulations (MacCrann et al. 2022) accounts for blending and detection biases. The final catalog consists of ∼ 100 million galaxies, spanning the full DES Y3 wide field footprint and with an effective number density of $n_{\rm eff} = 5.59$ gal arcmin$^{-2}$. The WL sample is divided into four tomographic bins using the SOMPZ method (Myles & Alarcon et al. 2020).

Simulated Galaxy catalogs
Our methodology is thoroughly validated using simulated catalogs. In particular, we use one realisation of the sets of the Buzzard N-body simulations (DeRose et al. 2022). All the catalogs we used in data have their simulated counterparts, although we adopted some reasonable simplifications when needed. We give here a brief summary of the Buzzard simulation and of the simulated catalog we had to create for this project, i.e., the simulated MagLim sample. The simulated BOSS/eBOSS catalog description is provided in Gatti & Giannini et al. (2022), whereas the simulated WL sample is described in DeRose et al. (2022).
Buzzard is a synthetic galaxy catalog built starting from N-body lightcones produced by L-GADGET2 (Springel 2005). Galaxies are incorporated in the dark matter lightcones using the ADDGALS algorithm (DeRose et al. 2019). Buzzard spans 10313 square degrees. The cosmological parameters chosen are $\Omega_{\rm m} = 0.286$, $\sigma_8 = 0.82$, $\Omega_{\rm b} = 0.047$, $n_{\rm s} = 0.96$, $h = 0.7$. The simulations are created starting from three lightcones with different resolutions and sizes ($1050^3$, $2600^3$ and $4000^3$ Mpc$^3\,h^{-3}$ boxes and $1400^3$, $2048^3$ and $2048^3$ particles), to accommodate the need of a larger box at high redshift. Halos are identified using the public code ROCKSTAR (Behroozi et al. 2013) and they are populated with galaxies using ADDGALS (DeRose et al. 2019), which provides positions, velocities, magnitudes, SEDs and ellipticities. Galaxies are assigned their properties based on the relation between redshift, $r$-band absolute magnitude, and large-scale density from a subhalo abundance matching model (Conroy et al. 2006; Lehmann et al. 2017) in higher resolution N-body simulations. SEDs are assigned to galaxies by imposing the matching with the SED-luminosity-density relationship measured in the SDSS data. SEDs are $k$-corrected and integrated over the DES filter bands to generate DES $grizY$ magnitudes. Ray-tracing is performed through the CALCLENS algorithm (Becker 2013) to introduce lensing effects, in order to provide weak-lensing shear, magnification and lensed galaxy positions for the lightcone outputs. CALCLENS is run on the sphere, masked with the DES Y3 footprint, using the HEALPix algorithm (Górski & Hivon 2011) and is accurate to ∼ 6.4 arcseconds.

Simulated MagLim sample
In order to define a simulated MagLim sample, the photo-$z$ code DNF has been run on a subset of the Buzzard simulations, restricted to i-band magnitudes $i < 23$, so as to reduce the running time without affecting the final result (note that the MagLim selection presents a cut at $i < 22.2$). The goal is to attain similar number density and color distributions as in data. We provide more detailed information on the adaptation of the sample selection to Buzzard in Appendix A.

Simulated Deep catalog
The simulated true fluxes from Buzzard are used as the deep measurements, but we further assign a realistic error by using the limiting flux for each mock deep band. We use the same uncertainties as in data; however, because the Buzzard simulation has a different zero point, these values are first converted to magnitudes using a zero point of 30 and then converted to flux uncertainties for a zero point of 22.5, which is the zero point of the Buzzard fluxes. We do not differentiate between fields, as it has been shown in Myles & Alarcon et al. (2020) that this has no impact on the simulated redshift distribution. The size of the sample is 968759 galaxies. We use the true redshift for the redshift sample and to compare our inferred redshift distributions to the true ones.
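The zero-point conversion described above can be written out explicitly. The sketch below assumes the standard relation m = ZP - 2.5 log10(f) between flux and magnitude; the function names are hypothetical.

```python
import numpy as np


def flux_to_mag(flux, zero_point):
    """Magnitude corresponding to a flux for a given zero point."""
    return zero_point - 2.5 * np.log10(flux)


def mag_to_flux(mag, zero_point):
    """Flux corresponding to a magnitude for a given zero point."""
    return 10.0 ** ((zero_point - mag) / 2.5)


def convert_flux_zero_point(flux, zp_in=30.0, zp_out=22.5):
    """Re-express a flux (or flux uncertainty) given for zero point zp_in
    in the flux units defined by zero point zp_out."""
    return mag_to_flux(flux_to_mag(flux, zp_in), zp_out)


# A flux uncertainty of 1000 for a zero point of 30 is a magnitude of 22.5,
# which corresponds to a flux of 1 for a zero point of 22.5.
sigma_data = 1000.0
sigma_buzzard = convert_flux_zero_point(sigma_data)
```

The two steps combine into a constant rescaling by 10^((22.5 - 30) / 2.5) = 10^-3, so the conversion changes the units of the uncertainties but not their relative size.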

Simulated Balrog catalog
We mimic the Balrog algorithm by randomly selecting positions over the full Y3 footprint and running the corresponding error model on the galaxies of the simulated deep catalog to obtain noisy versions, according to the exposure times of each location. The deep galaxies can be injected an arbitrary number of times, which we set to 10. Only the wide counterparts of the deep galaxies that pass the MagLim selection defined in the Buzzard simulation are then included in the sample, yielding a final number of 250193 galaxies.

REDSHIFT INFERENCE METHODOLOGY
We describe in this section the methodology adopted in this work to infer the redshift distributions of the lens sample. The methodology is similar to the one adopted for the weak lensing sample (Myles & Alarcon et al. 2020) and relies on two key techniques:
• photometric classification with Self-Organising Maps (SOM), known as the SOMPZ method (Buchs et al. 2019; Myles et al. 2020). The SOMPZ method takes advantage of the deeper photometry in 8 bands ($ugrizJHK_s$) available in the DES deep fields, where galaxies with high-quality redshifts can be accurately classified in the deep color space, to ensure small selection biases and well characterised redshift estimates and uncertainties for DES wide field galaxies;
• clustering-based or clustering redshift techniques (WZ), more established in cosmology (Newman 2008; Ménard et al. 2013). The calibration of the redshift distributions is based on the angular correlation with a reference sample with high-quality redshift estimates. This method is affected by systematic biases different from those of photometric methods, which makes the combination interesting and improves the robustness of our redshift estimates. For example, it does not require the spectroscopic sample used for calibration to be representative of the target sample. On the other hand, the galaxy bias evolution of the galaxy samples is involved, and magnification effects have to be taken into account.
These two techniques are combined to provide an estimate of the redshift distributions of the lens sample. Such a combination is powerful because it exploits the complementarity of the two methods, which are affected by two very different sets of biases and uncertainties. We provide the key ingredients of these two techniques in the following sections, followed by a description of how the two are combined.
We note that this method is an alternative to the one presented in Porredon et al. (2021a) and Cawthon et al. (2022), which provides redshift estimates combining photometric estimates from the photo-$z$ code DNF (De Vicente et al. 2016) and clustering constraints from Cawthon et al. (2022). We defer the comparison between the two methods to section 5.1.

SOMPZ Methodology
The SOMPZ methodology estimates wide field redshift distributions by exploiting a mapping between wide field galaxies and deep field galaxies with deeper and more precise photometry. Extracting the redshift information from deep, several band photometry in order to estimate the redshift of an observed wide field galaxy amounts to marginalizing over deep photometric information (Buchs et al. 2019). Let us consider the probability distribution function for the redshift of a galaxy $p(z)$; let us assume such a probability to be conditioned on observed wide field color-magnitude $\hat{\mathbf{x}}$ and covariance matrix $\hat{\Sigma}$. The probability can be written by marginalizing over deep photometric color $\mathbf{x}$ as follows:

$$p(z|\hat{\mathbf{x}}, \hat{\Sigma}) = \int {\rm d}\mathbf{x}\; p(z|\mathbf{x}, \hat{\mathbf{x}}, \hat{\Sigma})\, p(\mathbf{x}|\hat{\mathbf{x}}, \hat{\Sigma}).$$

The large dimensionality of this form prevents us from applying it to real situations. This problem can be circumvented by discretising the color space $\mathbf{x}$ and $(\hat{\mathbf{x}}, \hat{\Sigma})$ in cells $c$ and $\hat{c}$, each spanning a portion of the whole and representing a specific galaxy phenotype, respectively of the deep and wide field. The galaxy samples are arranged in cells/phenotypes using Self-Organizing Maps (SOM) (Kohonen 1982), which is an unsupervised machine learning technique used to produce a lower-dimensional representation of a complex data set, while preserving its core properties. The choice of the topology of the cells follows Buchs et al. (2019), where a two-dimensional representation of the color space was chosen as it ensures an immediate visualisation of the data not possible otherwise. Once we have compressed our data into a more manageable set of information, we can write the $p(z)$ for the group of galaxies living in a particular wide cell $\hat{c}$. Since the MagLim tomographic bins $\hat{b}$ are already defined, we construct one set of SOMs (one deep and one wide) for each bin. Assigning all galaxies belonging to a tomographic bin to a wide SOM is straightforward. In order to construct the deep SOM we have to use our Balrog sample, consisting of all detected and selected Balrog realisations of the galaxies in the wide field, each associated to its own "noiseless" replica in the deep sample. We can therefore assign to the deep SOM associated to a tomographic bin the galaxies whose Balrog wide replica is selected in that specific wide bin. We can then marginalize over deep field phenotypes $c$ as:

$$p(z|\hat{c}) = \sum_{c} p(z|c, \hat{c})\, p(c|\hat{c}).$$

At this point we want to marginalise over all wide cells $\hat{c}$ belonging to each tomographic bin. Again, we are computing $p(z|\hat{b})$ for each bin separately from different sets of SOMs:

$$p(z|\hat{b}) = \sum_{\hat{c} \in \hat{b}} \sum_{c} p(z|c, \hat{c})\, p(c|\hat{c})\, p(\hat{c}).$$

Unfortunately there are very few galaxies for each $(c, \hat{c})$ pair, and in many cases there are none. This makes the term $p(z|c, \hat{c})$ quite difficult to estimate. However, we can reasonably assume that the $p(z)$ for galaxies assigned to a given deep cell $c$ should not depend on the noisy wide photometry of that galaxy. Therefore we can relax the selection:

$$p(z|\hat{b}) \approx \sum_{\hat{c} \in \hat{b}} \sum_{c} p(z|c)\, p(c|\hat{c})\, p(\hat{c}).$$

We use this approximation for our fiducial result. We obtain each of the terms appearing in the approximation above (Eq. 3.1) by assigning galaxy samples to the SOM cells, as follows:
• $p(\hat{c})$ is computed collecting wide field galaxies from the MagLim sample into a wide field SOM (one per tomographic bin);
• $p(c|\hat{c})$ is computed from the deep/Balrog sample. It consists of all detected and selected Balrog replicas of the deep galaxies injected in the wide field. We can therefore arrange the deep/Balrog sample simultaneously into a wide and a deep SOM. We call this term the transfer function. We weight the deep field galaxies according to their detection rate measured from Balrog. An alternative to Balrog would be using a sub-section of the wide field and deep fields overlap, giving us both deep and wide photometry for a limited number of galaxies. However, the area of overlap is small and the particular observing conditions found in this area will not be representative of the overall observing conditions found in the Y3 footprint, as highlighted in Myles & Alarcon et al. (2020);
• $p(z|c)$ is computed from the redshift sample, which is a subset of the deep sample for which we have credible redshifts, 8-band deep photometry and, thanks to Balrog, also wide-field realisations.
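As an illustration of how the three terms combine, the sketch below evaluates the double sum over wide and deep cells for one tomographic bin, reading the three terms listed above as p(ĉ), p(c|ĉ) and p(z|c), so that p(z|b̂) ≈ Σ_ĉ Σ_c p(z|c) p(c|ĉ) p(ĉ). The array layout, the function name `sompz_nz` and the toy numbers are assumptions; a real analysis would first build these arrays from the SOM cell assignments.

```python
import numpy as np


def sompz_nz(p_z_given_c, p_c_given_chat, p_chat):
    """Combine the three SOMPZ terms into the n(z) of one tomographic bin.

    p_z_given_c    : (n_deep, n_z) redshift histogram of each deep cell,
                     each row normalised to sum to 1
    p_c_given_chat : (n_wide, n_deep) transfer function, rows sum to 1
    p_chat         : (n_wide,) fraction of the bin's galaxies in each
                     wide cell, sums to 1
    """
    p_z_given_c = np.asarray(p_z_given_c, dtype=float)
    p_c_given_chat = np.asarray(p_c_given_chat, dtype=float)
    p_chat = np.asarray(p_chat, dtype=float)
    # Sum over wide cells: p(c | bin) = sum_chat p(c | chat) p(chat).
    p_c = p_chat @ p_c_given_chat
    # Sum over deep cells: p(z | bin) = sum_c p(z | c) p(c | bin).
    return p_c @ p_z_given_c


# Toy example with 2 wide cells, 2 deep cells and 2 redshift bins.
toy_p_z_given_c = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_transfer = np.array([[0.5, 0.5], [0.0, 1.0]])
toy_p_chat = np.array([0.4, 0.6])
toy_nz = sompz_nz(toy_p_z_given_c, toy_transfer, toy_p_chat)
```

Writing the sums as two matrix products keeps the cost linear in the number of (wide cell, deep cell) pairs, and the output is normalised by construction whenever the three inputs are.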

SOM properties
As in Buchs et al. (2019) and Myles & Alarcon et al. (2020), we use square SOMs with $N$ cells on each side (for a total of $N \times N$ cells) and periodic boundaries, which makes the visualisation easier without compromising the efficiency. We parametrize the SOMs using luptitudes and lupticolors, following Buchs et al. (2019). Luptitudes are defined in Lupton et al. (1999) as an inverse hyperbolic sine transformation of fluxes:

$$\mu = \mu_0 - a \sinh^{-1}\left(\frac{f}{2b}\right), \qquad \mu_0 = m_0 - 2.5 \log_{10} b,$$

where $m$ are magnitudes (with $m_0$ the zeropoint), $f$ are fluxes, $a = 2.5 \log_{10} e$ and $b$ is a softening parameter that defines the scale at which luptitudes transition between logarithmic and linear behaviour. For the deep SOM we compute 7 lupticolors with respect to the $i$ band,

$$x_j = \mu_j - \mu_i, \qquad j = 1, \ldots, 7,$$

where the index $j$ runs over the deep bands $ugrzJHK$. We avoid using the $g$ band for the wide field galaxies, as observational systematics and chromatic effects are more evident in the $g$ band. With only two lupticolors available in the wide SOM, we decided to add the $i$-band luptitude, as Buchs et al. (2019) find empirically that the addition of the luptitude improves the training performance:

$$\hat{\mathbf{x}} = \left(\mu_i,\ \mu_r - \mu_i,\ \mu_z - \mu_i\right).$$

The resolutions of the SOMs are $32 \times 32$ cells for the wide SOM and $12 \times 12$ cells for the deep SOM. The reason for the fewer cells in the deep SOM lies in the MagLim selection: the bright magnitude-redshift cuts must also be applied to the wide component of the deep and redshift samples, and only the deep galaxies whose wide component is selected are included in the sample. This results in smaller deep and redshift samples covering a very small portion of the color space, compared to the weak lensing source sample (Myles & Alarcon et al. 2020). Reducing the number of cells also places more galaxies in each one. This is necessary in order to minimise the number of wide field galaxies assigned by the transfer function to a deep SOM cell with no redshift information. Keeping this fraction below 1% is crucial to ensure that we get a correctly estimated redshift distribution for our sample. We note that shot noise caused by a small number of redshifts in a deep cell can play a significant role in biasing the estimate. We therefore performed a test to identify the optimal SOM size which would minimise these issues. We first computed several estimates in the Buzzard simulations using different resolutions for the deep SOM. We then evaluated which setting produced the smallest shift in the mean redshift with respect to the true value.

As mentioned at the beginning of this section, SOMs must be trained before they can classify galaxies. After ensuring that the redshift samples and the MagLim sample span the same luptitude-lupticolor space (achieved using Balrog to obtain the wide photometry of the redshift samples), we decided to use the redshift sample for the deep SOM training. We instead use the MagLim sample itself to train the wide SOM.
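As an illustration of the SOM input features described above, the following is a minimal sketch of luptitudes and lupticolors, assuming the standard Lupton et al. (1999) definition $\mu = m_0 - 2.5\log_{10} b - a \sinh^{-1}(f/2b)$. The function names and the default zeropoint `m0=30.0` are illustrative assumptions, not the pipeline's actual interface.

```python
import numpy as np

# a = 2.5 log10(e), the constant of the asinh magnitude definition
A = 2.5 * np.log10(np.e)


def luptitude(flux, b, m0=30.0):
    """Asinh magnitude of a flux, with softening parameter b.

    m0 is an assumed magnitude zeropoint (hypothetical default).
    """
    flux = np.asarray(flux, dtype=float)
    mu0 = m0 - 2.5 * np.log10(b)
    return mu0 - A * np.arcsinh(flux / (2.0 * b))


def lupticolor(flux_band, flux_ref, b_band, b_ref, m0=30.0):
    """Difference of luptitudes between a band and a reference band."""
    return luptitude(flux_band, b_band, m0) - luptitude(flux_ref, b_ref, m0)
```

For fluxes much larger than $b$ the luptitude approaches the usual magnitude $m_0 - 2.5\log_{10} f$, while it stays finite for zero or negative fluxes, which is why it is convenient for noisy photometry.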

WZ
Clustering redshift is a widely used method (Newman 2008; Ménard et al. 2013; Davis et al. 2017; Morrison et al. 2017; Scottez et al. 2018; Johnson et al. 2017; Gatti & Vielzeuf et al. 2018; van den Busch et al. 2020; Hildebrandt et al. 2021; Cawthon et al. 2022; Gatti & Giannini et al. 2022) to infer or calibrate redshift distributions of galaxy samples. It relies on the assumption that the cross-correlation between two samples of objects is non-zero only if the distributions of objects overlap in physical space, due to their mutual gravitational influence.
Various implementations of the clustering redshift methodology differ in their details, but they all agree on one key aspect: the "target" sample (hereafter dubbed "unknown" sample), which has to be calibrated, has to be cross-correlated with a "reference" sample divided into thin redshift bins.The reference sample consists of either high-quality photometric or spectroscopic redshift galaxies, and has to spatially overlap with the unknown sample.
Assuming linear galaxy-matter bias, we can express the clustering $w_{\rm ur}$ between the unknown sample and each of the thin bins of the reference sample as a function of the separation angle $\theta$ between the unknown and reference sample:

$$w_{\rm ur}(\theta) = \int dz'\, n_{\rm u}(z')\, n_{\rm r}(z')\, b_{\rm u}(z')\, b_{\rm r}(z')\, w_{\rm DM}(\theta, z') + M(\theta),$$

where $n_{\rm r}$ and $n_{\rm u}$ are the redshift distributions of the reference and unknown sample, $b_{\rm r}$ and $b_{\rm u}$ are the galaxy-matter biases of both samples, $w_{\rm DM}$ is the clustering of dark matter and $M(\theta)$ denotes contributions due to magnification. Note that we are assuming the Limber approximation (Limber 1953), but this has been shown to have no impact on the results (McQuinn & White 2013).
In our methodology, we use a single estimated value from the cross-correlation signal for each thin redshift bin. In practice, we do this by measuring the correlation function as a function of angular separation and then averaging it with a weight function to produce the single estimate:

$$\bar{w}_{\rm ur} = \int_{\theta_{\min}}^{\theta_{\max}} d\theta\, W(\theta)\, w_{\rm ur}(\theta), \tag{10}$$

where $W(\theta) \propto \theta^{-1}$ is a weighting function (Gatti & Giannini et al. 2022). The integration limits in Eq. 10 are set to fixed physical scales (1.5 to 5 Mpc).
Since the $n_{\rm r}$ are binned in narrow bins, we can approximate the number density of the reference sample as a Dirac delta, and the revised expression becomes:

$$\bar{w}_{\rm ur}(z_i) = n_{\rm u}(z_i)\, b_{\rm u}(z_i)\, b_{\rm r}(z_i)\, \bar{w}_{\rm DM}(z_i) + \bar{M}(z_i).$$

The above equation relates the redshift distribution of the unknown sample to the measured clustering signal $\bar{w}_{\rm ur}$. The galaxy-matter bias of the reference sample can be estimated from the autocorrelation of the reference sample. Usually the galaxy-matter bias of the unknown sample cannot be inferred and is treated as a nuisance parameter. In this work, however, thanks to the relatively good redshifts provided by DNF for the MagLim sample, we also use the autocorrelation of the latter as a prior for $b_{\rm u}$ (see section 4.2). The other terms in the above equation are the clustering of dark matter $\bar{w}_{\rm DM}$, which can be estimated from theory and is not very sensitive to the cosmological parameters (Gatti & Giannini et al. 2022), and the magnification term, which is expected to have little impact (Gatti & Giannini et al. 2022) and can be estimated if magnification coefficients for the samples are provided.

The angular scales considered have been chosen to span the physical interval between 1.5 and 5.0 Mpc. These bounds, applied to data as well as simulations, are selected so that the upper bound is below the range used for the galaxy clustering cosmological analyses. This makes the WZ likelihoods essentially independent of the assumed cosmology and allows us to produce n(z) samples in an MCMC chain that runs independently of the cosmological ones. We perform the cross-correlations of MagLim with each of the 50 bins of width $\Delta z = 0.02$ of the BOSS/eBOSS catalog, which spans $0.1 < z < 1.1$ as previously mentioned. We also weight each galaxy of the MagLim sample by the clustering weights computed in Rodríguez-Monroy et al. (2022).
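We use the Davis & Peebles (1983) estimator for the cross-correlation signal,

$$w_{\rm ur}(\theta) = \frac{N_{\rm Rr}}{N_{\rm Dr}} \frac{D_{\rm u}D_{\rm r}(\theta)}{D_{\rm u}R_{\rm r}(\theta)} - 1,$$

where $D_{\rm u}D_{\rm r}(\theta)$ and $D_{\rm u}R_{\rm r}(\theta)$ represent data-data and data-random pairs. The pairs are normalized through $N_{\rm Dr}$ and $N_{\rm Rr}$, which are the total numbers of galaxies in the reference sample and in the reference random catalog. The correlation estimates were computed using treecorr.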
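The estimator and the scale averaging described in this section can be sketched as follows. This is only an illustration operating on pair counts that are assumed to be already available (the paper computes them with treecorr); the function names are hypothetical, and normalising the weight $W(\theta) \propto \theta^{-1}$ to unit integral is an assumption of this sketch.

```python
import numpy as np


def _trapz(y, x):
    """Trapezoidal integral of y(x) on a one-dimensional grid."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def davis_peebles(dd, dr, n_data_ref, n_rand_ref):
    """Davis & Peebles estimator: (N_Rr / N_Dr) * DD / DR - 1 per angular bin."""
    dd = np.asarray(dd, dtype=float)
    dr = np.asarray(dr, dtype=float)
    return (n_rand_ref / n_data_ref) * dd / dr - 1.0


def scale_average(theta, w_theta, power=-1.0):
    """Average w(theta) with weight W ~ theta**power (unit-integral weight)."""
    theta = np.asarray(theta, dtype=float)
    weight = theta ** power
    weight = weight / _trapz(weight, theta)
    return _trapz(weight * np.asarray(w_theta, dtype=float), theta)
```

In a real measurement `theta` would be the angular bins corresponding to 1.5 to 5 Mpc at the redshift of each thin reference bin, so the angular range changes from bin to bin.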

CHARACTERIZATION OF SOURCES OF UNCERTAINTY
In this section, we present the characterisation of the systematic uncertainties of our methodology. The dominant sources of uncertainty for the SOMPZ method are sample variance and shot noise.
In the clustering redshift method, the main uncertainty is caused by the lack of prior knowledge on the redshift evolution of the galaxy-matter bias of the MagLim sample. This is modelled by a flexible systematic function, informed by a measurement of the MagLim auto-correlation function in data. Other, minor sources of uncertainty are related to magnification effects and the approximation of linear bias (Gatti & Giannini et al. 2022). We provide further details on each source of uncertainty in the following subsections. A full catalog-to-cosmology validation of the method (in simulations) is then presented in Appendix B.

SOMPZ uncertainties
For the SOMPZ method we consider the following sources of uncertainty:
• sample variance of the deep fields: the main uncertainty, caused by the limited area of the deep fields. We model the effect of sample variance by means of the three step Dirichlet (3sDir) analytical model described in § 4.1.1;
• shot noise in the deep and redshift samples: this is induced by the limited number of galaxies available in the deep and redshift samples. We model the effect of shot noise by means of the 3sDir analytical model described in § 4.1.1;
• SOMPZ method uncertainty: this uncertainty stems from discretising the color space in the SOMPZ mapping. We estimate its impact on the SOMPZ estimates by replicating the SOMPZ method multiple times in simulations, and incorporate its effects by using Probability Integral Transforms (PITs) (§ 4.1.2);
• photometric calibration: related to uncertainties in the calibration of the deep fields zeropoint, it is accounted for in the SOMPZ estimates by means of PITs (§ 4.1.3);
• redshift sample biases: these biases stem from uncertainties and biases in the redshift estimates of the redshift samples. Their impact is accounted for in our methodology by marginalising over three different combinations of redshift samples (§ 4.1.4);
• transfer function: any bias induced by an erroneous estimation of the transfer function due to a size-limited Balrog sample; we anticipate this to be negligible following the results from Myles & Alarcon et al. (2020) (§ 4.1.5).
In the following sections we will proceed to describe in detail how we account for each of the items listed above.

Sample variance and shot noise (3sDir)
Sample variance is the dominant uncertainty affecting our SOMPZ estimates, and stems from the limited size and area coverage of the redshift and deep samples with respect to the whole wide field. The deep fields only cover $\sim 9\,{\rm deg}^2$, which means we could be learning the color/redshift relation from a non-representative sample of the sky due to fluctuations in the matter density field; moreover, the finite size of the redshift sample can introduce shot noise effects, preventing a correct sampling of the quantities required for the redshift inference.
Generally, the impact of sample variance can be evaluated by estimating the redshift distributions in simulations multiple times using different lines of sight for the deep fields (e.g. Hildebrandt et al. 2017; Hildebrandt et al. 2021; Hoyle et al. 2018; Buchs et al. 2019; Wright et al. 2020). Although we also performed a test where we evaluated the impact of sample variance using the Buzzard simulation, in our standard procedure we use the three step Dirichlet (3sDir) approach presented in Sánchez et al. (2020) and applied to the redshift calibration of the DES Year 3 source sample (Myles & Alarcon et al. 2020).
The 3sDir method consists of an analytical sample variance model predicting what the redshift-color distribution would be from the observed individual redshifts and galaxy phenotypes (colors) of galaxies coming from smaller deep fields. Using this model we can build an ensemble of redshift distribution realisations whose fluctuations realistically represent the effect of sample variance. During the cosmological inference, by sampling over these realisations, one can effectively marginalise over the effect of sample variance. Here we provide a short description of the 3sDir method, but we direct the reader to Myles & Alarcon et al. (2020) and Sánchez et al. (2020) for more details.

The 3sDir method assumes the probability $p(z, c)$ that galaxies belong to a redshift bin $z$ and color phenotype $c$ to be described by a probability histogram with coefficients $f_{zc}$ (with $\sum_{zc} f_{zc} = 1$ and $0 \leq f_{zc} \leq 1$). Under this assumption, the expected number counts of galaxies in a deep SOM cell given the coefficients $f_{zc}$ are described by a multinomial distribution; if we assume a Dirichlet function for the prior on $f_{zc}$, the posterior of $f_{zc}$ given the observed number counts will also be described by a Dirichlet function. Such a Dirichlet posterior can be used to draw samples and naturally accounts for the effect of shot noise in the data. The effect of sample variance can be introduced by tuning the width of the prior on $f_{zc}$, which does not change the expected value of $f_{zc}$ in the Dirichlet distribution, but does change its variance to simultaneously account for shot noise and sample variance. If all the galaxies belonging to the redshift sample were independently drawn, then a Dirichlet distribution parametrized by the redshift sample counts in each pair of redshift bin $z$ and phenotype $c$, $N_{zc}$, would fully characterize $f_{zc}$. However, one subtlety is that sample variance correlates with redshift; to increase the variance with the correct redshift dependence one can use the fact that two different phenotypes (deep SOM cells) overlapping in redshift are correlated due to the same underlying large-scale structure fluctuations. The 3sDir model assumes that phenotypes at the same redshift share the same sample variance, and therefore groups cells with similar redshifts in superphenotypes $T$. One can then express the $f_{zc}$ as:

$$f_{zc} = \sum_{T} f^{zT}_{c}\, f^{T}_{z}\, f^{T}.$$

The 3sDir method consists of drawing values of these three sets of coefficients with three Dirichlet functions. In this way, it is possible to include a redshift-dependent variance while conserving the expected value of $f_{zc}$.
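To make the Dirichlet-multinomial step concrete, the sketch below draws realisations of the coefficients $f_{zc}$ from a single Dirichlet posterior given the redshift sample counts $N_{zc}$. This is a deliberate simplification that captures shot noise only: it does not implement the superphenotype grouping or the three nested Dirichlet draws of the actual 3sDir model. The function name and the small `prior` pseudo-count are assumptions made for illustration.

```python
import numpy as np


def draw_fzc(counts, n_draws, prior=1e-3, seed=None):
    """Draw f_zc realisations from a Dirichlet posterior (shot noise only).

    counts : array of shape (n_z, n_c) with the counts N_zc
    prior  : assumed pseudo-count added to every (z, c) pair
    Returns an array of shape (n_draws, n_z, n_c); each draw sums to 1.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    alpha = counts.ravel() + prior
    draws = rng.dirichlet(alpha, size=n_draws)
    return draws.reshape((n_draws,) + counts.shape)


# Each realisation of n(z) follows by summing f_zc over phenotypes c.
example_counts = np.array([[50, 30], [15, 5]])
example_nz = draw_fzc(example_counts, n_draws=10, seed=0).sum(axis=2)
```

Broadening the distribution to include sample variance, as the text describes, would require replacing this single draw by the three-step construction of Sánchez et al. (2020).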
The validation of the 3sDir method has been carried out in Myles & Alarcon et al. (2020), where it was applied to the weak lensing source sample. The only difference in this work is that we perform the 3sDir estimation independently for each tomographic bin, due to their definition.
As reported in Table 4, this uncertainty is dominant, both on the mean and width values of the n(z) distributions, computed from the ensemble of realisations provided by the 3sDir method.

SOMPZ Method Uncertainty
The SOMPZ method relies on the discretisation of the color space spanned by our deep field sample, and this is an approximation that can lead to small biases or additional uncertainties. In order to estimate these, we compute our SOMPZ n(z) a large number of times in the Buzzard simulations. In order to factor out sample variance, each time we randomly select patches of the Buzzard footprint to construct the mock deep fields. In this way, by averaging over all the final n(z) realisations, we can produce an estimate of the n(z) only minimally biased by sample variance, and test the agreement with the true n(z). Due to the computational cost of the SOMPZ pipeline, we decided to produce 300 n(z) replicas. To perform this test, we assumed that the redshift sample would be limited to only one of our four fields, of the size of COSMOS.
We computed the mean redshift offset of the ensemble with respect to the true value, for each tomographic bin.As reported in Table 4, these values are smaller than the effect of sample variance.These values are incorporated into our final n(z) ensemble using the PIT method described in the following section, by additionally shifting each probability integral transform (used to correct for the zeropoint uncertainties) by a value drawn from a Gaussian centered at zero with standard deviation equal to the root-mean-square of the aforementioned mean offset values.

Deep Fields Photometric Calibration Uncertainty
Although the uncertainty in the photometry of each individual galaxy is implicitly accounted for in the SOM training, the uncertainty on the photometric calibration as a whole must be evaluated by testing how the measured n(z) are affected by changes in the photometric zeropoint in each band. This is relevant for the deep fields, where the relatively precise fluxes are key to constraining reliable n(z) in parts of parameter space that are not subject to selection biases. Ideally, this would be tested by rerunning the full analysis for an ensemble of perturbations of the photometric zeropoint according to the zeropoint uncertainty, but the computational requirements of the Balrog injection procedure make this infeasible. Instead, we produce an analogous ensemble of realizations in simulations, where the Balrog mock photometric survey is reduced to a computationally simpler procedure of adding Gaussian noise to true magnitudes. For each realization of this ensemble, we perturb all deep field magnitudes by a draw from a Gaussian whose width is determined by the photometric zeropoint uncertainty in the Y3 deep fields catalog in a specified band, as computed in Hartley et al. (2022). We then "inject" these perturbed deep field fluxes with a mock Balrog procedure to generate wide field realizations of the galaxies and measure the corresponding n(z). In this way we generate a full ensemble of n(z) realisations reflecting the uncertainty in our redshift calibration due to the photometric calibration.

We apply Probability Integral Transforms (PITs) as in Myles & Alarcon et al. (2020) to transfer the variation encoded in the ensemble from the simulated n(z) to our fiducial data result. Essentially, this process involves calculating the inverse cumulative distribution function (CDF) for each simulated realization $n_i(z)$ in the ensemble. The PIT is then obtained by computing the difference between the inverse CDF of each realization and the average inverse CDF of the entire ensemble. To apply these transformations to the data, the PIT value is added to the inverse CDF of the fiducial data n(z). The PIT resulting from a single draw of zeropoint offsets is determined and collectively applied to all tomographic bins. More details on this new implementation of the PIT can be found in Myles et al. (2023).

Table 4 (caption). The various components are computed as described in section 4 and, as they are not completely independent, it is not expected that they sum up to the total value. The values related to SOMPZ and SOMPZ+WZ refer to Figure 5, and include only the 3sDir uncertainty due to sample variance and shot noise (and the redshift samples uncertainty), because it was logistically not possible to add the SOMPZ method and the zeropoint sources of uncertainty before the combination with WZ. As a comparison, the "SOMPZ (with all unc)" includes all uncertainties. The final n(z) which has been used in the cosmological analysis is the bottom line.
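The PIT transfer described in this subsection can be sketched as below. This is a minimal illustration under stated assumptions: all n(z) are tabulated on a common grid and are strictly positive, the inverse CDF is evaluated at a fixed set of quantiles by linear interpolation, and the function names are hypothetical. It returns perturbed inverse CDFs of the fiducial distribution, one per simulated realisation, rather than re-binned histograms.

```python
import numpy as np


def inverse_cdf(z, nz, quantiles):
    """Redshifts at which the cumulative distribution of n(z) reaches each quantile."""
    cdf = np.cumsum(np.asarray(nz, dtype=float))
    cdf = cdf / cdf[-1]
    return np.interp(quantiles, cdf, z)


def pit_realisations(z, nz_fiducial, nz_simulated, n_quantiles=200):
    """Transfer the scatter of a simulated ensemble onto a fiducial n(z).

    Returns an array (n_realisations, n_quantiles) of perturbed inverse CDFs.
    """
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    inv_sims = np.array([inverse_cdf(z, nz, q) for nz in nz_simulated])
    # One transform per realisation: deviation from the ensemble-mean inverse CDF
    deltas = inv_sims - inv_sims.mean(axis=0)
    return inverse_cdf(z, nz_fiducial, q) + deltas
```

Because the quantiles are equally spaced, averaging each returned row gives an estimate of the mean redshift of the corresponding perturbed realisation.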

Redshift Sample uncertainty
As mentioned in Section 2.5, we chose three different catalogs from which to infer our redshift distributions: a collection of spectroscopic survey galaxies (Gschwend et al. 2018), PAU+COSMOS redshifts as in Alarcon et al. (2021), and COSMOS30 photometric redshifts (Laigle et al. 2016). We use more than one catalog because none of them is exempt from systematic uncertainties: each survey uses different photometry and different model assumptions, and can be affected systematically by selection effects, incorrect templates, photometric outliers, etc. Since there is a considerable overlap in the galaxies belonging to more than one of the redshift catalogs selected for this work, to account for the intrinsic biases we decided to build three samples which are combinations of the aforementioned catalogs.
We ranked the redshift catalogs differently for each sample: if a galaxy has information from multiple origins, we assign the redshift from the highest ranked catalog. The three redshift samples, SPC, PC and SC, are described in Section 2.5. For each of these, we run the complete pipeline, and the final set of realisations is constructed by taking an equal fraction (1/3) from each sample. Placing equal prior probability on each sample is equivalent to saying that we do not believe any of the samples is more likely to be correct. Note, however, that for galaxies for which we have information from only one catalog, we assume that information to be true, and this is a caveat of this approach.
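The ranking rule above ("assign the redshift from the highest ranked catalog") can be illustrated with a short sketch. The catalog labels, the use of NaN to mark galaxies absent from a catalog, and the function name are assumptions made for this example only.

```python
import numpy as np


def assign_by_rank(catalog_z, ranking):
    """Pick, for each galaxy, the redshift from the highest ranked catalog.

    catalog_z : dict mapping a catalog label to an array of redshifts,
                with NaN where the galaxy is not in that catalog
    ranking   : catalog labels ordered from highest to lowest rank
    """
    n_gal = len(next(iter(catalog_z.values())))
    z = np.full(n_gal, np.nan)
    for name in ranking:
        candidate = np.asarray(catalog_z[name], dtype=float)
        # Fill only galaxies not yet assigned by a higher ranked catalog
        fill = np.isnan(z) & ~np.isnan(candidate)
        z[fill] = candidate[fill]
    return z
```

Building several samples then amounts to calling the function with different `ranking` orders (and possibly omitting a catalog) on the same matched galaxy list.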

Transfer function uncertainty
One of the key points in this redshift calibration is the transfer function $p(c|\hat{c})$, the intermediate step necessary to assign redshifts from deep field galaxies to the whole wide field. If the transfer function is inaccurate, it can bias the final n(z) distributions regardless of how precise the color/redshift characterisation is in the deep SOM. $p(c|\hat{c})$ depends on the observing conditions at the location of each galaxy, which determine whether the galaxy is detected or not. Observing conditions vary across the wide field, but for our analysis we are interested in redshift distributions estimated across the whole footprint. Balrog injects the same deep galaxies in random wide tiles and, despite these covering only $\sim 20\%$ of the DES footprint, Myles & Alarcon et al. (2020) verified that Balrog adequately samples the observing conditions in the wide field. They bootstrapped the sample by injected position and recomputed 1000 different transfer functions. They concluded that the dispersion in the mean redshift of the final n(z) from repeating the analysis with a different transfer function each time was completely negligible. Here we repeated that test, since our deep field sample has fewer galaxies, which might affect the transfer function differently. We found that this is also negligible for our case, with variations in the n(z) mean $< 10^{-3}$ in each tomographic bin, and therefore decided not to propagate this uncertainty into the final n(z) estimate.

WZ Uncertainties
The WZ systematic uncertainties have been identified and characterised in detail for the WL sample in Gatti & Giannini et al. (2022). Namely, the systematic budget was found to be dominated by our lack of prior knowledge of the redshift evolution of the galaxy-matter bias of the unknown sample. This is also expected to be the case for the MagLim sample, although the amplitude of the effect might differ from the WL sample (ideally, since the MagLim redshift distributions are narrower, we might expect a smaller impact from systematics that vary slowly with redshift, like the galaxy-matter bias of the unknown sample). Similarly to Gatti & Giannini et al. (2022), we model our systematics by means of a flexible function, Sys(s), which mostly captures the redshift evolution of the galaxy-matter bias of the unknown sample. The Sys(s) function is parameterized by $\mathbf{s} = \{s_0, s_1, \ldots\}$, which we marginalize over, and is given by:

$$\log\left[{\rm Sys}(z_i; \mathbf{s})\right] = \sum_{k=0}^{M} \frac{\sqrt{2k+1}}{0.85}\, s_k\, P_k(u_i), \tag{14}$$

$$u_i \equiv 0.85\, \frac{z_i - 0.5\,(z_{\max} + z_{\min})}{(z_{\max} - z_{\min})/2}, \tag{15}$$

with $P_k(u_i)$ being the $k$-th Legendre polynomial and $M = 6$ the maximum order. In this work, we set the prior $p(\mathbf{s})$ to be a simple diagonal normal distribution, with the standard deviations $\{\sigma_0, \ldots, \sigma_M\}$ and means informed by the measured autocorrelation of the MagLim sample.
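A flexible systematic function built from Legendre polynomials can be evaluated as in the sketch below. The specific normalisation, $\log {\rm Sys} = \sum_k \sqrt{2k+1}\, s_k P_k(u) / 0.85$ with $u$ the redshift rescaled to $[-0.85, 0.85]$ over the interval $[z_{\min}, z_{\max}]$, is assumed here from the form used in Gatti & Giannini et al. (2022); the function name and arguments are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def sys_function(z, s, z_min, z_max):
    """Evaluate Sys(z; s) = exp(sum_k sqrt(2k+1)/0.85 * s_k * P_k(u)).

    z            : redshifts at which to evaluate the function
    s            : coefficients s_0 ... s_M
    z_min, z_max : redshift interval mapped to u in [-0.85, 0.85] (assumed)
    """
    z = np.asarray(z, dtype=float)
    u = 0.85 * (z - 0.5 * (z_max + z_min)) / (0.5 * (z_max - z_min))
    k = np.arange(len(s))
    coeffs = np.sqrt(2 * k + 1) / 0.85 * np.asarray(s, dtype=float)
    # legval evaluates the Legendre series sum_k coeffs[k] * P_k(u)
    return np.exp(legendre.legval(u, coeffs))
```

With all coefficients set to zero the function is identically one, so the prior means and widths on `s` directly control how far the modelled clustering signal may depart from the unmodified prediction.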
In Gatti & Giannini et al. (2022), such a systematic function was allowed to vary by the typical amplitude of the redshift evolution of the galaxy-matter bias of the WL sample measured in simulations. In practice, this was achieved by imposing a Gaussian prior $p(\mathbf{s})$ with zero mean on the coefficients $\mathbf{s}$ of the systematic function.
In the case of the MagLim sample, we can use a more informative prior $p(\mathbf{s})$ that uses the information we have from the data about the galaxy-matter bias evolution of the sample. In particular, we rely on the fact that the MagLim sample has good per-galaxy redshift estimates, which allows us to divide the sample into relatively small bins and measure the auto-correlation of such bins. This was not possible for the WL sample, due to the poor per-galaxy redshift accuracy.
To this aim, we use the DNF 1-point estimates $z_{\rm mean}$ to further divide the MagLim sample into bins of width $\Delta z = 0.02$, and we measure the auto-correlation of each bin. We note that the true width of each bin will be much larger than $\Delta z = 0.02$, as the DNF photo-$z$ are uncertain. Under the approximation of negligible redshift evolution of the galaxy-matter bias of the MagLim sample over each thin bin, the measured autocorrelation can be related to the galaxy-matter bias by knowing how broad the true n(z) distribution of each bin is (Gatti et al. 2018; Cawthon et al. 2022):

$$\bar{w}_{\rm uu}(z_i) = \int dz' \left[ b_{\rm u}(z')\, n_{{\rm u},i}(z') \right]^2 \bar{w}_{\rm DM}(z'), \tag{16}$$

where $n_{{\rm u},i}(z')$ is the true distribution of the thin bin MagLim sample. Such a quantity is estimated using the PDF estimate from DNF, $z_{\rm PDF}$.
From this measurement performed in data we can then retrieve the galaxy bias $b_{\rm u}(z)$ by inverting Eq. 16. We fit the Sys(s) function presented in Eq. 14 to the measured $b_{\rm u}(z)$ and obtain best-fit $\mathbf{s}$ values, which we show in Figure 4. These best-fit coefficients are then used as the mean value of the Gaussian prior $p(\mathbf{s})$. The best fitting Sys(s) function to the data is shown in the right panel of Fig. 4.
To estimate the width of the prior $p(\mathbf{s})$ we took a different approach. First, we estimate the bias evolution in simulations by dividing galaxies into thin redshift bins using: (i) the true redshifts from the simulation; and (ii) the photo-$z$ estimated from the DNF code. When dividing the galaxies with the photo-$z$ from DNF, we further correct the measured auto-correlation using Equation 16. These measurements are shown in the left panel of Figure 4. The discrepancy between the bias evolution measured from photo-$z$ (equivalent to the application with real data) and the bias evolution measured with true redshifts (equivalent to the truth) is a systematic bias. We use the sum in quadrature of this difference and the statistical uncertainty of the bias measurement as the prior width of $s_0$. For the higher order parameters we estimate the standard deviation of the prior by summing in quadrature the ratio between the two biases and the statistical uncertainty from the bias measurement in data. This allows us to best capture the RMS variations of the bias function itself. As can be seen in Figure 4, the 68% confidence interval spanned by the Sys(s) function brackets both the ideal and real world measurements. The values for the mean and width of the prior are displayed in Table 5. The widths of the prior on both the 0-th and higher order coefficients are much tighter than in Myles & Alarcon et al. (2020), where $\sigma_0 = 0.6$ and $\sigma_{1..4} = 0.15$. As already explained, the difference lies in the initial accuracy of the photo-$z$ estimates, which enables the measurement of the auto-correlation of the galaxy sample in thin redshift bins. For the weak lensing source sample such information was not available, and therefore a more conservative prior was deemed appropriate. In the MagLim case, instead, the greater accuracy of the photo-$z$ allows us to extract more information from the auto-correlation.

Last, we mention that an additional source of uncertainty for the WZ measurement is related to the impact of magnification. We do model magnification effects, but the accuracy of that model is limited by our knowledge of the magnification coefficients for the two samples. In particular, we do not have any prior knowledge of such coefficients for the BOSS/eBOSS sample. Those coefficients are set to 0 for our fiducial analysis (in contrast, estimates for the magnification coefficient of the MagLim sample are available). We expect magnification to have a small impact, based on tests performed in Gatti & Giannini et al. (2022), but we nonetheless test in the following section the impact of a non-zero magnification coefficient for the BOSS/eBOSS sample.

Combination of SOMPZ and WZ
In order to combine SOMPZ and WZ constraints, we follow Gatti & Giannini et al. (2022) and write the clustering likelihood by forward modelling the full clustering signal as a function of the SOMPZ redshift distribution estimates $n_{\rm pz}(z)$. Moreover, we include the systematic function Sys(s) introduced in the previous section, which describes the uncertainties on the WZ measurement, mostly driven by the lack of knowledge of $b_{\rm u}$ and its redshift dependence:

$$\hat{w}_{\rm ur}(z_i) = n_{\rm pz}(z_i)\, b_{\rm r}(z_i)\, \bar{w}_{\rm DM}(z_i)\, {\rm Sys}(z_i; \mathbf{s}) + M\left(\alpha_{\rm u}(z_i), \alpha_{\rm r}(z_i)\right).$$

In the above equation, the quantities $\alpha_{\rm u}(z_i)$ and $\alpha_{\rm r}(z_i)$ are the magnification coefficients for the unknown and reference samples. See Gatti & Giannini et al. (2022) for a full description of the magnification term $M$. The clustering of dark matter $\bar{w}_{\rm DM}(z_i)$ is estimated from theory assuming a fixed cosmology. We tested that varying the cosmology has a negligible impact on our methodology.
The likelihood of the WZ data conditioned on the target n(z) and all the systematic parameters reads:

$$\mathcal{L}\left[{\rm WZ}\,|\,n_{\rm pz}(z), \mathbf{s}, \mathbf{p}\right] \propto \exp\left[-\frac{1}{2}\left(\bar{w}_{\rm ur} - \hat{w}_{\rm ur}\right)^{T} \Sigma_w^{-1} \left(\bar{w}_{\rm ur} - \hat{w}_{\rm ur}\right)\right],$$

where $\Sigma_w$ is the clustering covariance, estimated through jackknife, and $\mathbf{p} = \{b_{\rm u}, \alpha_{\rm u}\}$. We implemented a Hamiltonian Monte Carlo (HMC) sampler that simultaneously samples the SOMPZ and WZ likelihoods. The HMC takes as input the SOM outputs of the sample variance estimation (described in § 4.1.1), and it selectively perturbs the number counts in the SOMs so as to produce realisations that are already more likely to match the clustering redshift data.
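The forward model and Gaussian likelihood of this subsection can be sketched as below. This is an illustration only, not the HMC implementation: it assumes a model of the form $\hat{w}_{\rm ur} = n_{\rm pz}\, b_{\rm r}\, \bar{w}_{\rm DM}\, {\rm Sys} + M$ evaluated per thin reference bin and a Gaussian likelihood with a fixed covariance; all function and argument names are hypothetical.

```python
import numpy as np


def wz_model(n_pz, b_r, w_dm, sys, magnification=0.0):
    """Predicted scale-averaged cross-correlation in each thin reference bin."""
    return np.asarray(n_pz, dtype=float) * b_r * w_dm * sys + magnification


def wz_log_likelihood(w_obs, w_model, cov):
    """Gaussian log-likelihood, up to an additive constant."""
    residual = np.asarray(w_obs, dtype=float) - np.asarray(w_model, dtype=float)
    # Solve the linear system instead of inverting the covariance explicitly
    return float(-0.5 * residual @ np.linalg.solve(np.asarray(cov, dtype=float), residual))
```

In a joint sampler this term would be added to the SOMPZ log-probability of each proposed n(z) realisation, with `sys` and `magnification` recomputed from the current nuisance parameters.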

RESULTS IN DATA
In this section, we present the final redshift distributions for the MagLim sample as obtained in data. We also compare the SOMPZ+WZ redshift distributions with the fiducial DNF+WZ estimates used for the same sample and adopted in the cosmological analysis presented in Porredon et al. (2021a). A complete validation of the method in simulations is presented in Appendix B.

We first compare in Figure 5 the redshift estimates obtained using the 3sDir method and the estimates obtained including the WZ information as described in section 4. Due to logistics, the combination of the two methods was performed before incorporating the SOMPZ and zeropoint errors. As here we are just displaying the effect of the combination, we show only how the 3sDir uncertainty from sample variance and shot noise (from the three redshift samples) varies once we add the information from WZ. The combination of the two methods results in stronger constraints on the shape of the n(z), thanks to the complementarity of the information provided by SOMPZ and WZ. In particular, the WZ signal strongly correlates across adjacent bins, excluding large portions of the possible n(z) shapes allowed by the SOMPZ likelihood alone, which are affected by sample variance fluctuations from the small calibration fields, and resulting in a smoother distribution.

Figure 5. 3sDir distributions before (lighter shades) and after the combination with clustering-z (solid shades), and after the combination with clustering-z but using a broader prior on the parameters of the galaxy-matter bias function Sys(s) (the same values of the width of the prior $p(\mathbf{s})$ that were used in Gatti et al. 2022). In the top row we have bins 1 and 4, in the middle row bins 2 and 5, and in the bottom row bins 3 and 6. The bands represent the $1\sigma$ error from the central value. Note how the combination with WZ tightens the constraint on the shape of the n(z).
The improvement in the uncertainty on the mean is more modest, but not null, as reported in Table 4. Usually, WZ data provide limited information on the mean redshift, especially compared to SOMPZ, as the systematic uncertainty on the galaxy bias evolution of the target sample is large and directly degenerate with the mean redshift, as is the case in Gatti & Giannini et al. (2022). However, in this work we have included a tighter prior on the Sys(s) function describing the galaxy bias evolution uncertainty by measuring it directly from the MagLim auto-correlation function. The addition of the WZ information has a modest impact on the values of the mean and width of the redshift distributions, at most at the $1\sigma$ level (see Table 4); this is somewhat expected, as the WZ and SOMPZ information are independent, but consistent with each other.

Comparison with DNF
We find it interesting to compare the final SOMPZ+WZ redshift distributions with the fiducial ones used for DES Y3, obtained using DNF photometric estimates and clustering constraints (hereafter DNF+WZ). Since the two sets of distributions have been obtained with two different methods, we also briefly discuss the major differences between the two pipelines. The DNF code presented in 2.2.1 produces per-galaxy redshift estimates; these are stacked to produce the redshift distributions for the lens samples. Then, following Cawthon et al. (2022), a clustering redshift measurement is performed, using BOSS/eBOSS galaxies as the reference sample, similarly to this work. The DNF n(z) are matched to the WZ-estimated n(z) through a chi-square fit; in particular, the DNF n(z) are allowed to shift and stretch to improve the $\chi^2$. The maximum-a-posteriori values of the shift and stretch and the related uncertainties obtained through this matching procedure are used as a prior for the DNF n(z) shift and stretch used in the cosmological inference. Despite the DNF+WZ and SOMPZ+WZ methods using the same photometric and clustering measurements, the methodologies differ in a number of aspects:
(i) SOMPZ vs DNF uncertainties: SOMPZ and DNF are both machine learning methods, but they are substantially different in spirit and implementation. DNF is a traditional supervised machine learning code where the likelihood (directional neighborhood) between wide field magnitudes/colors and redshift is learned by training on a subsample of galaxies with both reliable redshift information and measured wide field photometry. In SOMPZ, on the other hand, machine learning is only used in an unsupervised fashion (without knowledge of redshift), to group self-similar parts of wide field magnitude/color space together. Then, these groups (wide SOM cells) are probabilistically related using Bayes' theorem to the color-redshift relation measured empirically in the calibration deep fields, where much better information is available. The likelihood between each set of wide and deep field photometry is also measured empirically by injecting galaxies of the latter into images of the former. Furthermore, SOMPZ provides a comprehensive list of statistical as well as systematic uncertainties affecting the calibration samples, which are rigorously propagated through the n(z). DNF, on the other hand, only describes statistical uncertainties related to the residual differences between the closest training neighbors and the fitted hyperplane of the target galaxies.
(ii) Combination: The clustering information is combined with the photometric estimates in a substantially different way. In this work, SOMPZ and WZ are combined by sampling from the joint posterior using the HMC method; no approximation is made when combining the two likelihoods. When matching the DNF n(z) to the WZ measurements, on the other hand, it is implicitly assumed that the DNF n(z) estimates can only be biased at the level of their mean and width, and that inaccuracies in the higher order moments of the n(z) can be neglected (or do not affect the matching procedure with the WZ measurements). However, if the DNF and WZ n(z) estimates differ substantially beyond their first two moments, the matching might also bias the first and second moments (Gatti et al. 2018). Furthermore, in the fiducial method the DNF shape can only be modified by shifting and stretching it, and the shift and stretch parameters are therefore centered at the WZ values. This means that the photo-z priors for the cosmological inference only carry uncertainty from the WZ measurement, as this method does not propagate any systematic uncertainty related to the accuracy of DNF or to the quality of its training sample photometry. In comparison, SOMPZ+WZ properly combines the statistical power of SOMPZ and WZ, yielding a final uncertainty that reflects the information from both. Finally, the SOMPZ+WZ n(z) samples also capture the uncertainties in the higher moments of the redshift distributions, whereas the DNF+WZ uncertainties only concern the mean and width.
(iii) WZ distribution tails: The WZ measurements used to calibrate the DNF n(z) have clipped tails, since the measurements were performed in a restricted redshift window to avoid biases related to un-modelled magnification effects in the tails of the redshift distribution. In this work, on the other hand, when combining the clustering information with the SOMPZ estimates we use the WZ measurements over the full redshift range, since we also marginalise over magnification effects.
(iv) WZ galaxy-matter bias: The WZ measurements used in the DNF+WZ estimates are corrected for the redshift evolution of the galaxy-matter bias of the MagLim sample, computed from auto-correlation measurements following Eq. 16 (Cawthon et al. 2022). Since in this work we use the forward modelling approach described in Section 4.2, we do not correct directly for the bias; instead, we use the MagLim auto-correlations to determine the prior values of the parameters s of our Sys(s) function, and then marginalise over possible bias functions when sampling from the joint likelihood. We are therefore assuming an uncertainty on the galaxy-matter bias and validating the central value using SOMPZ data.
We must highlight that Cawthon et al. (2022) and Porredon et al. (2021a) performed several tests of the robustness of the DNF+WZ method. In particular, Cawthon et al. (2022) tested the performance of the clustering measurements in simulations, whereas Porredon et al. (2021a) verified that matching the DNF n(z) to the WZ measurements did not introduce biases in the cosmological constraints, and that modelling only the uncertainties in the mean and width of the distributions was sufficient for the DES Y3 cosmological analysis. These tests should cover the potential concerns raised in points (ii), (iii) and (iv) above for the DNF+WZ method. Having said this, any discrepancy between the SOMPZ+WZ n(z) and the DNF+WZ n(z) should boil down to the points listed above.
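As an illustration of the shift-and-stretch matching used in the DNF+WZ calibration, the following minimal sketch fits a shift and a stretch of a toy photometric n(z) to a toy clustering-redshift measurement by brute-force chi-square minimisation. The transformation n(z) → n((z − ⟨z⟩ − Δz)/s + ⟨z⟩)/s is the form assumed here; the Gaussian toy distributions, the error bars, the parameter grids and all function names are hypothetical and do not reproduce the actual DES Y3 pipeline.

```python
import numpy as np

def shift_stretch(z, nz, delta_z, stretch):
    """Return n(z) shifted by delta_z and stretched about its mean.

    Assumed form: n'(z) = n((z - <z> - delta_z) / stretch + <z>) / stretch.
    A uniform redshift grid is assumed.
    """
    z_mean = np.sum(z * nz) / np.sum(nz)
    z_src = (z - z_mean - delta_z) / stretch + z_mean
    # Evaluate the original n(z) at the mapped redshifts (zero outside the grid).
    return np.interp(z_src, z, nz, left=0.0, right=0.0) / stretch

def fit_shift_stretch(z, nz_pz, nz_wz, err_wz, shifts, stretches):
    """Brute-force chi-square grid search for the best shift and stretch."""
    best = (np.inf, None, None)
    for dz in shifts:
        for s in stretches:
            model = shift_stretch(z, nz_pz, dz, s)
            chi2 = np.sum(((nz_wz - model) / err_wz) ** 2)
            if chi2 < best[0]:
                best = (chi2, dz, s)
    return best

def gaussian(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

z = np.linspace(0.0, 1.5, 301)
nz_pz = gaussian(z, 0.50, 0.05)        # toy photometric estimate
nz_wz = gaussian(z, 0.52, 0.06)        # toy clustering-redshift "measurement"
err_wz = np.full_like(z, 0.1)          # toy uncorrelated errors

shifts = np.linspace(-0.05, 0.05, 21)  # grid step 0.005
stretches = np.linspace(0.8, 1.6, 41)  # grid step 0.02
chi2_min, best_shift, best_stretch = fit_shift_stretch(
    z, nz_pz, nz_wz, err_wz, shifts, stretches
)
```

In this toy setup the target is a Gaussian whose mean is higher by 0.02 and whose width is larger by a factor 1.2, so the grid search should select those values. A real analysis would use the full WZ covariance and a proper sampler rather than a grid.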
In Figure 6, the shapes and uncertainties of the two methodologies are compared before and after the inclusion of the WZ information, in the left and right panels respectively. Visually, the DNF+WZ n(z) look very similar to the SOMPZ+WZ ones, although some discrepancies can be noticed (e.g., in the second bin). We report in Table 6 the redshift means and widths of the two sets of distributions, and their agreement. The means and widths are also visually compared in Figure 7. The agreement is computed assuming the uncertainties of the two methods to be uncorrelated, which is likely not true; therefore, the reported agreements are optimistic. Computing the level of correlation between the two redshift estimates is not trivial. The DNF+WZ estimates and uncertainties are driven only by the WZ measurements in the range where WZ measurements are available and magnification effects are negligible; the tails of the distribution, on the other hand, are described by the DNF estimates. The SOMPZ+WZ estimates receive contributions from both SOMPZ and WZ; if the SOMPZ method were to completely drive our estimates, then the SOMPZ+WZ and the DNF+WZ estimates could be assumed to be independent. This is likely the case for the mean redshift estimates, as we have seen that WZ is not particularly constraining on the mean redshift (see Figure 7). The width estimates are driven more by the WZ measurements, and this might indicate that our tensions are underestimated, because we know that the two calibration methods share part of the WZ information. With this in mind, large tensions between the means/widths of the two methods might indicate either that the DNF+WZ uncertainties are underestimated, or that there are some real differences between the two methods (one or both are biased). The values reported in Table 6 do not point to dramatic differences between the two methods: the most extreme statistical distance is $2.7\sigma$ between the means of bin 2, and $2.3\sigma$ between the widths of bin 6.
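The agreement statistic discussed above sums the two uncertainties in quadrature. A minimal sketch of such a statistic follows, with an optional correlation coefficient added to show why positively correlated errors would make the uncorrelated distances lower limits. The function name, the `rho` parameter and the input numbers are hypothetical; the numbers are not taken from Table 6.

```python
import math

def statistical_distance(mean_a, err_a, mean_b, err_b, rho=0.0):
    """Distance between two estimates in units of their combined uncertainty.

    rho is an assumed correlation coefficient between the two errors;
    rho = 0 corresponds to summing the errors in quadrature.
    """
    var = err_a**2 + err_b**2 - 2.0 * rho * err_a * err_b
    return abs(mean_a - mean_b) / math.sqrt(var)

# Hypothetical mean redshifts and uncertainties for one tomographic bin.
uncorrelated = statistical_distance(0.310, 0.004, 0.300, 0.003)
correlated = statistical_distance(0.310, 0.004, 0.300, 0.003, rho=0.5)
```

With these inputs the uncorrelated distance is 2, and any positive correlation yields a larger value, which is the sense in which distances computed under the uncorrelated assumption are optimistic.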
From Table 6 we note that the SOMPZ+WZ uncertainties on the mean are larger than the DNF+WZ ones, while the uncertainties on the widths are comparable. This is because the uncertainties in the mean redshifts for the SOMPZ estimates are very sensitive to contributions from outliers at high redshift. The DNF+WZ mean redshift estimates (and uncertainties), on the other hand, are driven by the match with the WZ measurements with clipped tails, i.e., they do not take into account uncertainties in the tails, and are therefore smaller. The different modelling of the tails in the two methodologies is also responsible for the slightly higher mean redshifts of the SOMPZ+WZ estimates compared to the DNF+WZ estimates. If we restrict the comparison of the aforementioned quantities to redshift intervals that exclude the tails of the distributions, the match between SOMPZ+WZ and DNF+WZ improves (Figure 7). We further investigate the importance of the tails for the cosmological constraints in Appendix D1, finding that, although they are important, they do not drive the main difference between the SOMPZ+WZ and DNF+WZ constraints.

Galaxy-matter bias prior from WZ auto-correlation
We tested the impact on the ΛCDM cosmological parameters of using the same broad prior on the Sys(s) function describing the galaxy-matter bias as was done for the WL sample (Gatti et al. 2022). In this work we instead used more informative values computed from the clustering auto-correlation of the MagLim sample, the application of which is explained in more detail in Section 4.2. It is particularly interesting to look at the shape of the distributions, especially for bin 2. Figure 5 shows in grey the $1\sigma$ bands for the case where the auto-correlation is not used and a much broader prior is adopted. While in most bins the difference is not appreciable, and the grey bands are very similar to the solid bands, in bin 2 there is an evident difference. This suggests that using the auto-correlation information as a prior in the SOMPZ+WZ combination helps constrain the galaxy-matter bias in a way that would not otherwise have been possible with traditional methods. Figure 7 compares the mean redshifts and widths of the SOMPZ+WZ distributions obtained with the more informative prior from the auto-correlation against those obtained with the broad prior (labelled as "SOMPZ+WZ (broad prior)"). The means and widths are well compatible with the standard SOMPZ+WZ results, and for bins 2 and 3 they are slightly closer to the DNF+WZ results. Even in bin 2, where the shape of the n(z) is substantially different, the values of the mean and width do not differ greatly from the standard case, reinforcing the notion that mean and width alone are not sufficient to fully characterise the redshift distributions of a lens sample.

COSMOLOGICAL RESULTS
In this section, we show the constraints on cosmological and nuisance parameters obtained using the DES Y3 measurements for galaxy-galaxy lensing and galaxy clustering (Prat et al. 2022; Rodríguez-Monroy et al. 2022) (a.k.a. 2x2pt), and the n(z) from this paper. As in Porredon et al. (2021a), we also include in our analysis an additional likelihood constructed with the Shear Ratio (SR) measurements (Sánchez et al. 2022). This exploits the galaxy-galaxy lensing signal at small scales (< 6 Mpc/h) to provide further constraints on the redshift distributions and intrinsic alignment parameters. The ratio of the galaxy-galaxy lensing signals of each lens sample redshift bin, computed with respect to two source sample bins, results in a primarily geometric measurement, which has proven to be a powerful method for constraining systematics and nuisance parameters. This adds information, independent of SOMPZ and WZ, to the source redshift calibration.
We refer to these as lower limits: because the WZ measurement is very similar in the two cases, the uncertainties summed in quadrature are correlated, and we are therefore likely underestimating $\Delta\langle z \rangle$. (Right panel) Final n(z) realisations obtained from the combined SOMPZ and WZ methodology, compared to the fiducial DNF distribution for MagLim (grey bands) after shifting and stretching it to fit the WZ measurement. Since the shift and stretch values are marginalised over in the inference, the uncertainties of the grey bands are obtained by sampling over the allowed ranges of shift and stretch defined by the prior, and applying them to the DNF estimate. Note that, for a fairer comparison of the methods, the two remaining uncertainties (zeropoint and SOMPZ intrinsic) were applied to the SOMPZ ensemble, to include all the SOMPZ-related uncertainties. For both plots, in the top row we have bins 1 and 4, in the middle row bins 2 and 5, and in the bottom row bins 3 and 6.
The posterior distribution follows Bayes' theorem:
$$P(p \mid D, M) \propto \mathcal{L}(D \mid p, M)\, \Pi(p \mid M),$$
where $\Pi(p \mid M)$ is the prior distribution for all the parameters $p$ of the model $M$. For the cosmological inference we use the CosmoSIS pipeline (Zuntz et al. 2015), and we sample the parameter posteriors using the PolyChord sampler (Handley et al. 2015a,b). Our data vector $D = \{w(\theta), \gamma_t(\theta)\}$ is compared to theoretical predictions $T(p) = \{w(\theta, p), \gamma_t(\theta, p)\}$ in a Bayesian fashion, and the posterior of the parameters conditional on the data is evaluated by assuming a Gaussian likelihood for the data:
$$\ln \mathcal{L}(D \mid p, M) = -\frac{1}{2}\, [D - T(p)]^{\rm T}\, C^{-1}\, [D - T(p)],$$
where $C$ is the measurement covariance. In our analysis, we vary 5 (or 6) cosmological parameters assuming a ΛCDM (or wCDM) cosmology: $\Omega_{\rm m}$, $\sigma_8$, $n_{\rm s}$, $\Omega_{\rm b}$, $h_{100}$, and $w$ for the wCDM case. Moreover, we also marginalise over "astrophysical" nuisance parameters (describing intrinsic alignment effects and the galaxy-matter bias of the lens sample), and calibration parameters (redshift uncertainties, shear measurement uncertainties). In short, our setup (covariance, parameters varied, prior ranges, etc.) is the same as the one adopted in Porredon et al. (2021a), except for the n(z) and the redshift uncertainty priors of the lens sample, for which we adopt the ones obtained in this work, and other minor changes that we describe below. All modelling and analysis choices, together with the calculations of the theoretical two-point functions, are described in detail in Krause et al. (2021).
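A minimal sketch of a Gaussian log-likelihood of the form −½ (D − T)ᵀ C⁻¹ (D − T), and of the corresponding unnormalised log-posterior, is given below. It is only meant to make the structure of the inference concrete: the function names and the three-point toy data vector are hypothetical, and the actual analysis relies on CosmoSIS and PolyChord rather than on code like this.

```python
import numpy as np

def log_likelihood(data, theory, cov):
    """Gaussian log-likelihood, up to a parameter-independent constant."""
    resid = data - theory
    # Solve C x = r instead of inverting the covariance explicitly.
    chi2 = resid @ np.linalg.solve(cov, resid)
    return -0.5 * chi2

def log_posterior(data, theory, cov, log_prior):
    """Unnormalised log-posterior: log-likelihood plus log-prior."""
    return log_likelihood(data, theory, cov) + log_prior

# Toy three-point data vector with a diagonal covariance (hypothetical values).
data = np.array([1.0, 2.0, 3.0])
theory = np.array([1.1, 1.9, 3.2])
cov = np.diag([0.01, 0.01, 0.04])
lnL = log_likelihood(data, theory, cov)
```

Each toy residual equals one standard deviation, so the chi-square is 3 and the log-likelihood is −1.5.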
Our analyses were not "blinded", since this work took place after the "unblinding" of the DES Y3 3x2pt results. We did not perform any cosmological analysis until the redshift distributions were frozen; no changes to the redshift distributions (or to the priors on their uncertainties) were made after looking at the cosmological constraints. To ensure the robustness of our final estimates, we adopted a p-value criterion on the best-fitting models to our data vector. Following Porredon et al. (2021a), we required the goodness-of-fit p-value on unblinded data vectors to be larger than 1 per cent.
The goodness-of-fit has been computed using the Predictive Posterior Distribution (PPD, Doux et al. 2021) methodology, which was also adopted in the main DES Y3 3x2pt analysis. The PPD methodology derives a calibrated probability-to-exceed $p$; in the case of goodness-of-fit tests, this is achieved by drawing realisations of the data vector for parameters drawn from the posterior under study, which are then compared to the actual observations. The distance metric ($\chi^2$) is computed in data space and is then used to compute the p-value.
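The idea behind a posterior predictive goodness-of-fit test can be sketched as follows. This is a simplified, uncalibrated version and not the PPD implementation of Doux et al. (2021): for each theory vector evaluated at a posterior sample, a replicated data vector is drawn from the Gaussian likelihood, and the p-value is the fraction of replicas with a larger chi-square than the observed data. The function names, the toy covariance and the stand-in "posterior" theory vectors are all hypothetical.

```python
import numpy as np

def chi2(data, theory, cov_inv):
    resid = data - theory
    return resid @ cov_inv @ resid

def ppd_p_value(data_obs, theory_samples, cov, rng):
    """Fraction of replicated data vectors that fit worse than the observed one."""
    cov_inv = np.linalg.inv(cov)
    exceed = 0
    for theory in theory_samples:
        data_rep = rng.multivariate_normal(theory, cov)
        if chi2(data_rep, theory, cov_inv) > chi2(data_obs, theory, cov_inv):
            exceed += 1
    return exceed / len(theory_samples)

rng = np.random.default_rng(0)
n_data = 20
cov = 0.01 * np.eye(n_data)
truth = np.ones(n_data)
# Stand-in for theory vectors evaluated at posterior samples (hypothetical).
theory_samples = truth + 0.01 * rng.standard_normal((2000, n_data))
data_good = truth + 0.1 * rng.standard_normal(n_data)  # consistent with the model
data_bad = data_good + 0.5                              # strongly offset data
p_good = ppd_p_value(data_good, theory_samples, cov, rng)
p_bad = ppd_p_value(data_bad, theory_samples, cov, rng)
```

In a setup like this, the offset data vector would fail a 1 per cent threshold, since essentially no replica fits as badly as it does.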
Concerning the redshift uncertainties, which are the primary focus of this work, we followed the fiducial DES Y3 methodology: we parametrize the redshift uncertainties with two parameters for each tomographic bin, which modify a fiducial n(z) distribution with a shift of the mean and a stretch of the width. The fiducial n(z) is estimated by averaging the SOMPZ+WZ n(z) realisations. The Gaussian priors on the shift and stretch parameters are centered at the mean and width of the fiducial n(z), while the widths of the Gaussian priors are measured from the variance of the mean and width of the n(z) ensemble. This parametrization can be compared directly to the fiducial DES Y3 2x2pt analysis (Porredon et al. 2021a). In Appendix D we describe an alternative treatment of the redshift uncertainties, which marginalises over the full set of n(z) realisations provided by the SOMPZ+WZ method. In principle, this latter approach describes the redshift uncertainties of our method better. However, we find that the currently available techniques that marginalise over the full ensemble of realisations during the cosmological inference are prohibitively computationally expensive. Therefore we defer its application to future work.
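The compression of an n(z) ensemble into a fiducial distribution plus shift and stretch prior widths can be sketched as below. This is a minimal illustration under stated assumptions: the redshift grid is uniform, the stretch scatter is expressed relative to the fiducial width (a choice made here, not stated in the text), and the toy Gaussian ensemble and all names are hypothetical.

```python
import numpy as np

def mean_and_width(z, nz):
    """Mean and standard deviation of a (not necessarily normalised) n(z)."""
    norm = np.sum(nz)
    mean = np.sum(z * nz) / norm
    width = np.sqrt(np.sum((z - mean) ** 2 * nz) / norm)
    return mean, width

def shift_stretch_priors(z, nz_ensemble):
    """Fiducial n(z) plus Gaussian prior widths for the shift and the stretch."""
    nz_fid = nz_ensemble.mean(axis=0)
    stats = np.array([mean_and_width(z, nz) for nz in nz_ensemble])
    _, fid_width = mean_and_width(z, nz_fid)
    sigma_shift = stats[:, 0].std(ddof=1)
    # Assumption: the stretch is multiplicative, so its scatter is taken
    # relative to the width of the fiducial n(z).
    sigma_stretch = stats[:, 1].std(ddof=1) / fid_width
    return nz_fid, sigma_shift, sigma_stretch

# Toy ensemble: Gaussians with scattered means and widths (hypothetical).
rng = np.random.default_rng(1)
z = np.linspace(0.0, 1.0, 401)
means = 0.5 + 0.01 * rng.standard_normal(500)
widths = 0.05 * (1.0 + 0.05 * rng.standard_normal(500))
nz_ensemble = np.exp(
    -0.5 * ((z[None, :] - means[:, None]) / widths[:, None]) ** 2
) / widths[:, None]
nz_fid, sigma_shift, sigma_stretch = shift_stretch_priors(z, nz_ensemble)
```

For this toy ensemble the recovered prior widths should be close to the input scatter, roughly 0.01 on the shift and 5 per cent on the stretch.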
Besides the different n(z), we also ran a few analyses in which we marginalised over the magnification parameters of the lens sample with wide priors. This differs from Porredon et al. (2021a), where the magnification parameters were fixed.
For the fiducial DES Y3 2x2pt analysis, the p-value from the data-model $\chi^2$ using all six MagLim bins did not pass the 1 per cent criterion. After a series of tests, the consensus was that the two highest redshift tomographic bins were responsible for worsening the fit. Therefore the analysis in Porredon et al. (2021a) included only the first 4 MagLim bins. Here, we perform the analyses using all 6 bins of the MagLim sample, but also using only the first 4 bins, to verify whether the same applies to this work, which uses different redshift distributions.
In particular, we consider the following scenarios:
• ΛCDM (wCDM); 4 and 6 lens bins, fixed magnification. This is the fiducial analysis that mirrors the one presented in Porredon et al. (2021a). Five (six) cosmological parameters are varied: $\Omega_{\rm m}$, $\sigma_8$, $n_{\rm s}$, $\Omega_{\rm b}$, $h_{100}$ (and $w$ for the wCDM case). Intrinsic alignment, shear measurement and redshift uncertainty parameters (of both lenses and sources) and the galaxy-matter linear biases of the lenses are also marginalised over. The magnification coefficients of the lens sample, however, are fixed to the values estimated from Balrog (Everett et al. 2022). Uncertainties in the redshift distributions of the lens sample are modelled as a shift and a stretch of the distributions.
• ΛCDM (wCDM); 4 and 6 lens bins, free magnification. Same parameters as above, but the magnification parameters are marginalised over using Gaussian priors. This is an additional setup considered only after analysing the results from the aforementioned fixed magnification setup.
In what follows, we will also quote results in terms of the $S_8$ parameter, defined as $S_8 \equiv \sigma_8 (\Omega_{\rm m}/0.3)^{0.5}$. In Table 7 we summarise the best fit values of $S_8$, $\Omega_{\rm m}$, $\sigma_8$, $w$, and the computed PPD goodness-of-fit p-value for all the different analyses.
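As a short worked example of the definition $S_8 \equiv \sigma_8 (\Omega_{\rm m}/0.3)^{0.5}$, the snippet below evaluates it at the central values quoted in the abstract. The function name is hypothetical. Note that marginalised posterior means need not satisfy this relation exactly; here the two happen to coincide because the quoted $\Omega_{\rm m}$ equals the pivot value 0.3.

```python
import math

def s8(sigma8, omega_m):
    """S_8 = sigma_8 * (Omega_m / 0.3)^0.5."""
    return sigma8 * math.sqrt(omega_m / 0.3)

value = s8(0.81, 0.30)   # central values quoted in the abstract
lower = s8(0.81, 0.27)   # same sigma_8 with a 10 per cent lower Omega_m
```

At fixed $\sigma_8$, lowering $\Omega_{\rm m}$ by 10 per cent lowers $S_8$ by about 5 per cent, which illustrates how $S_8$ tracks the degeneracy direction between the two parameters.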

Fiducial results: 4 bins, fixed magnification and comparison with DNF results
The first cosmological constraints we analyse are the ones obtained assuming a ΛCDM cosmology, using 4 lens bins and fixed magnification parameters. The decision on which set of results would be quoted as "fiducial" for this work had to be made before conducting any cosmological analysis on data. We initially planned to only run the fiducial analyses with fixed magnification, as in Porredon et al. (2021b). The choice between 4 or 6 lens bins would depend on the p-value criterion: if the ΛCDM, 6 bins, fixed magnification scenario were to yield a p-value above the specified threshold, then we would favour that configuration. This analysis, though, did not fulfil our p-value criterion (p-value = 0.008, see Table 7), as was also the case for the analysis run with the same settings but using the fiducial redshift distributions from DNF; hence, we do not show those results here. We then chose as fiducial the ΛCDM, 4 bins, fixed magnification analysis, which is equivalent to the "fiducial" setup assumed in Porredon et al. (2021b) and also allows us to compare our results directly to the ones obtained using the DNF+WZ n(z).
The posterior on the cosmological parameters $\Omega_{\rm m}$ and $S_8$ is shown in the left panel of Fig. 8; the marginalised mean values of $S_8$, $\Omega_{\rm m}$, and $\sigma_8$, along with the 68% confidence intervals, are:
$$\Omega_{\rm m} = 0.30 \pm 0.04,$$
$$\sigma_8 = 0.81 \pm 0.07, \tag{22}$$
$$S_8 = 0.81 \pm 0.04.$$
The PPD goodness-of-fit test for this analysis results in a p-value of 0.029, well above our threshold (see also Table 7). In the left panel of Fig. 8 we also compare our results with the constraints obtained using the fiducial DNF+WZ n(z). The size of the posteriors is similar for the two cases, but the two posteriors are slightly shifted; the distance between the posteriors' peaks in the 2D $\Omega_{\rm m} - S_8$ plane is $\sim 0.4\sigma$. In DES Y3 we impose a $0.3\sigma$ threshold for differences in the $\Omega_{\rm m} - S_8$ plane induced by different analysis choices, as larger statistical distances would indicate the presence of unaccounted-for systematic uncertainties; these results would apparently violate this criterion. We note, however, that the (arbitrary) $0.3\sigma$ threshold adopted by DES refers to differences in the $\Omega_{\rm m} - S_8$ plane when noiseless theory data vectors are assumed. In the presence of noisy data vectors these differences can become larger, without invalidating our criterion. Having said this, a $\sim 0.4\sigma$ difference nonetheless shows the large impact that a different redshift calibration of the lens sample can have on the cosmological constraints. This is somewhat different from the results obtained for the source sample n(z) (Amon et al. 2022), where uncertainties in the redshift calibration had a negligible impact on the cosmological constraints.
In Section 4.2 we explained how, for the combination of the two methods, we marginalise over possible functional forms of the unknown galaxy-matter bias of the MagLim sample by means of the systematic function Sys(s) in our clustering model. The prior on the parameters s is inferred from the clustering auto-correlation. We tested the impact on the redshift distributions of using a broader prior (the same used in Myles & Alarcon et al. 2020) in Section 5. We have also tested the impact of using these n(z) for the cosmological inference, and found that there is no change in constraining power and no shift in $\Omega_{\rm m}$, but there is a shift in $S_8$ large enough to overlap with the fiducial results from DNF+WZ. It is therefore clear that the information carried by the auto-correlation is crucial in our cosmological analysis.

4 and 6 bins, free magnification
As supplementary analyses, we then proceed to relax the fixed priors on the magnification parameters for the lens sample. Instead of fixing them to the values estimated from Elvin-Poole et al. (2021) (as done in the previous section), we leave them as free parameters, using Gaussian priors. In short, Elvin-Poole et al. (2021) estimate the magnification parameters using Balrog, by injecting fake galaxies into the wide field with and without applying a small magnification; the difference between the number of galaxies passing the selection in the two cases is then used to estimate the magnification parameters of the sample. These parameters come with a small uncertainty, which is however ignored in the fiducial analysis, as the magnification parameters are fixed to the mean Balrog value. The central values and the uncertainties are reported in Table C1 in Appendix C. One of the main reasons the DES Y3 fiducial analysis did not vary the magnification parameters was merely computational, as 4 (or 6) additional parameters lengthen the parameter inference process. In principle there is no reason to doubt these estimates. Differences might nevertheless arise if the Balrog injections do not completely sample the full DES Y3 footprint, or if our injections are not fully representative of the DES sample we are analysing.
When varying these parameters in our analyses, we find that the p-value computed using PPD indicates a good fit of the model to the data not only for the 4 bins case, but also for the 6 bins case (see Table 7). Adding the last 2 lens bins significantly improves the constraining power on $\Omega_{\rm m}$, by 30% compared to the 4 bins case, whereas the constraints on $S_8$ are 20% tighter.

wCDM Results
We then proceed to analyse the results obtained with wCDM, for all four cases: 4 and 6 bins, fixed and free magnification, as described in the previous section. Parameter posteriors are shown in Fig. 9, whereas p-values and parameter constraints are reported in Table 7. All the reported p-values are above our $p = 0.01$ threshold.
In general, the 2x2pt constraints on $w$ are loose and affected by the prior ($-2 < w < -0.3$), but compatible with a ΛCDM scenario. With respect to the ΛCDM 4 bins case, freeing $w$ loosens the constraint on $S_8$ (both with fixed and with free magnification) by ∼ 30%, while leaving the constraint on $\Omega_{\rm m}$ unchanged. For the 6 bins case, we are unable to directly compare to the fixed magnification case, but with free magnification the constraint on $S_8$ is ∼ 25% looser, while, similarly to the 4 bins case, it is unchanged for $\Omega_{\rm m}$.
Passing from the 4 bins to the 6 bins configuration, besides tightening the constraints on $S_8$, also improves the constraints on $w$ (by ∼ 20%), although part of the improvement is due to the posterior partially hitting the prior edge.
Freeing the magnification parameters slightly shifts $w$ towards the upper edge of the prior ($w = -0.3$), and $S_8$ slightly towards higher values, due to a degeneracy between $w$, $S_8$, and the magnification parameters of the two highest lens bins, whose priors are fairly broad (see Table C1). Such a shift is not present in the case of 4 bins, as the Gaussian priors used for the first 4 magnification parameters are much tighter.
The "fiducial" posteriors have been obtained using the DNF+WZ redshift distributions, and they are compared to the ones obtained using the SOMPZ+WZ redshift distributions. Right panel: Posterior distributions of the cosmological parameters $\Omega_{\rm m}$ and $S_8$ for the ΛCDM analysis for three different cases: 1) 4 bins and fixed magnification parameters (the blue contours in the two plots share the same analysis choices); 2) 4 bins and marginalising over magnification parameters (in solid green); 3) 6 bins and marginalising over magnification parameters (in solid red). The 2D marginalised contours in both of these figures show the 68 per cent and 95 per cent confidence levels.
Table 7. Constraints on the cosmological parameters $\Omega_{\rm m}$, $w$, and $S_8$. For each parameter we report the mean of the posterior and the 68 per cent confidence interval. We also report the PPD goodness-of-fit p-value and the probability of the parameter difference (computed over the full parameter space) between the analyses considered in this work and Planck TTTEEE0 lowl lowE (Aghanim et al. 2020)

Statistical distance to Planck
We compute here the statistical distances between our cosmological constraints and the early Universe ones from the Planck satellite (Aghanim et al. 2020). To this aim, we used the algorithm presented in Raveri & Doux (2021), which estimates the probability of tension between parameters via Monte Carlo approximation. In particular, the distribution of the parameter differences between the two posteriors can be expressed as follows:
$$P(\Delta\theta) = \int_{V_p} P_A(\theta)\, P_B(\theta - \Delta\theta)\, d\theta,$$
where $V_p$ represents the prior volume, while $P_A$ and $P_B$ represent the two posterior parameter distributions under study. The probability of having a shift in the parameter space is then obtained by integrating this parameter shift density:
$$\Delta = \int_{P(\Delta\theta) > P(0)} P(\Delta\theta)\, d\Delta\theta. \tag{25}$$
This is the posterior mass above the constant probability contour for no shift, $\Delta\theta = 0$. The integration in Eq. (25) is performed via Monte Carlo techniques.
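A simplified Monte Carlo estimate of a parameter-shift probability is sketched below. It is not the Raveri & Doux (2021) implementation: the posteriors are assumed independent, samples are paired at random, and the density of the parameter differences is approximated as Gaussian, so that comparing densities reduces to comparing Mahalanobis distances. The function name and the two-parameter toy posteriors are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def shift_probability(samples_a, samples_b):
    """Monte Carlo estimate of the shift probability and its n-sigma equivalent.

    Assumes independent posteriors and a Gaussian approximation for the
    density of the parameter differences (a simplification).
    """
    n = min(len(samples_a), len(samples_b))
    diff = samples_a[:n] - samples_b[:n]      # samples of the parameter shift
    mean = diff.mean(axis=0)
    cov = np.atleast_2d(np.cov(diff, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    # For a Gaussian density, P(shift) > P(0) is equivalent to a smaller
    # Mahalanobis distance from the mean than that of the zero-shift point.
    centred = diff - mean
    d2 = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)
    d2_zero = mean @ cov_inv @ mean
    prob = float(np.mean(d2 < d2_zero))
    prob = min(prob, 1.0 - 1.0 / n)           # keep the n-sigma value finite
    n_sigma = NormalDist().inv_cdf(0.5 * (1.0 + prob))
    return prob, n_sigma

# Toy two-parameter posteriors (hypothetical means and widths).
rng = np.random.default_rng(42)
n_samples = 100_000
samples_a = rng.normal([0.30, 0.80], 0.03, size=(n_samples, 2))
samples_b = rng.normal([0.32, 0.83], 0.01, size=(n_samples, 2))
prob, n_sigma = shift_probability(samples_a, samples_b)
```

For these toy inputs the analytic expectation is a shift probability of about 0.48, i.e. well below one sigma. For non-Gaussian posteriors the Gaussian approximation would have to be replaced by a density estimate of the difference samples.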
The comparison between the results has been performed considering all the parameters shared by our analyses and Planck. The values are reported in the last column of Table 7; we find no sign of significant tension ($< 3\sigma$) in any of the analysis setups considered. In particular, we find good agreement for the ΛCDM 4 bins cases ($1.15\sigma$ and $1.11\sigma$ for fixed and free magnification, respectively); similarly, for wCDM with 4 bins we find $0.46\sigma$ for both fixed and free magnification. For the 6 bins cases the values are larger ($2.2\sigma$ to $2.4\sigma$), but still below the $3\sigma$ threshold.

CONCLUSIONS
In this paper, we presented an alternative calibration of the MagLim lens sample redshift distributions from the Dark Energy Survey (DES) first three years of data (Y3). This new method, which has already been applied to the DES Y3 weak lensing sample (Myles & Alarcon et al. 2020), is based on a combination of a Self-Organising Maps based scheme (SOMPZ) and clustering redshifts (WZ) to estimate redshift distributions and inherent uncertainties. The original redshift calibration of the MagLim sample (and the cosmological results obtained adopting that calibration) was presented in Porredon et al. (2021a), and was based on the photo-z code DNF (De Vicente et al. 2016) and WZ constraints (Cawthon et al. 2022). The methodology presented in this paper is meant to be more accurate than the original one. First, the SOMPZ method allows better control over all the potential sources of uncertainty affecting the estimates compared to DNF; second, the clustering constraints (WZ) are incorporated through a rigorous joint likelihood framework which allows us to draw n(z) samples conditioned on both clustering and photometric measurements, improving the n(z) estimates (e.g., the final "SOMPZ+WZ" n(z) have a smaller scatter, or uncertainty, compared to the SOMPZ ones, see Figure 5). We described in detail the methodology followed to produce the alternative MagLim n(z) based on the SOMPZ+WZ approach, together with a detailed report on the main systematics dominating our calibration error budget. Our redshift uncertainties, in particular, are dominated by the impact of sample variance on the SOMPZ estimate (due to the limited area spanned by the deep field sample used in the calibration) and by the effect of the redshift evolution of the galaxy-matter bias of the MagLim sample on the WZ constraints. We then compared our SOMPZ+WZ n(z) with the fiducial DNF+WZ n(z) estimates; the means and widths of the 6 MagLim tomographic bins show moderate statistical distances, with the largest deviation of $2.7\sigma$ in bin 2 (see Table 6). We also found the uncertainties on the mean of the redshift distributions of the SOMPZ+WZ method to be slightly larger than the ones of the DNF+WZ method, due to a more conservative calibration of the tails of the redshift distributions. On the other hand, we found the two methods to have a similar constraining power on the widths of the distributions.
We then proceeded to investigate the impact of our new redshift calibration on the cosmological constraints. In particular, we used the DES Y3 galaxy-galaxy lensing and galaxy clustering measurements (Prat et al. 2022; Rodríguez-Monroy et al. 2022) (a.k.a. 2x2pt), and the n(z) from this work, and compared to the results from Porredon et al. (2021a). In the "fiducial" configuration, which involves using the first 4 lens bins and assuming a ΛCDM cosmology, we obtained as marginalised mean values $\Omega_{\rm m} = 0.30 \pm 0.04$, $\sigma_8 = 0.81 \pm 0.07$ and $S_8 = 0.81 \pm 0.04$. We noted a $\sim 0.4\sigma$ shift in the $\Omega_{\rm m} - S_8$ plane compared to the Porredon et al. (2021a) results, but no change in terms of constraining power. The shift indicates that the redshift calibration of the lens sample plays a key role in the cosmological constraints from the 2x2pt analysis, contrary to the redshift calibration of the source sample (Amon et al. 2022). Subsequently, we explored different analysis setups; we tested the case where all the 6 MagLim redshift bins were included, a scenario where the magnification coefficients of the lens sample were marginalised over during the inference, and last, we assumed a wCDM cosmology. We found that the inclusion of the last two redshift bins of the MagLim sample helps improve the constraints on $\Omega_{\rm m}$ by ∼ 25%, and on $S_8$ by ∼ 20%.
We also compared our results to the cosmological constraints from Planck (Aghanim et al. 2020), finding no significant tension ($1.15\sigma$) when 4 lens bins were considered. We did find a statistical distance of $2.41\sigma$ in ΛCDM with free magnification coefficients when including in the analysis the two high redshift bins ($z > 0.85$), which were not included in the fiducial DES Y3 analysis (Porredon et al. 2021a).
As a final comment, despite the SOMPZ+WZ method's ability to produce n(z) samples capturing the redshift uncertainties of our estimates, we could not efficiently marginalise over these realisations during the cosmological inference, due to computational constraints. Our marginalisation strategy followed the one adopted in Porredon et al. (2021a): we adopted the mean of the SOMPZ+WZ samples as our fiducial n(z), and marginalised over a shift of the mean and a stretch of the width of the distribution, using as priors the variances of the means and widths of the SOMPZ+WZ n(z) samples. While this strategy was deemed sufficient for the current work, we plan to implement the full marginalisation scheme for subsequent analyses of the lens samples with DES Y6 data.

Number density
Bin  .Estimated () in four tomographic bins using a 12x12 cell deep SOM and 32x32 cell wide SOM trained on Buzzard simulations.In the top row we have bin 1 and 4, in the middle row bin 2 and 5, and in the bottom row bin 3 and 6.The Redshift sample used here has 100000 galaxies drawn from 1.38 deg 2 , such that after the MagLim selection it yields ∼ 15000 unique galaxies, which is the same order of magnitude as the redshift samples in data, see Table 2.The deep sample is drawn from three fields of size 3.32, 3.29, and 1.94 deg 2 , respectively from the Buzzard simulated sky catalog.The black dashed line marks the true value, the transparent bands are the 3sDir set of n(z) and the solid bands are the realisations once combined with clustering redshifts.We can appreciate the effect of the combined likelihood, resulting in distributions more constrained in terms of shape, and still consistent with the truth.ally, we fixed the source galaxies redshift distributions, to ensure any deviation from the true parameter values of the simulation would be caused by the lens n(z) alone.The mean values of  8 , Ω m (and ), with their respective 68% confidence intervals, are: • ΛCDM:  8 = 0.73 ± 0.18, Ω m = 0.31 ± 0.07; • CDM:  8 = 0.71 ± 0.18, Ω m = 0.30 ± 0.08,  = -1.3± 0.4.
For both analyses, the posterior distributions successfully recovered the input parameters (see Section 2), as displayed in Figure B2.

APPENDIX C: COSMOLOGICAL PARAMETERS
Table C1 lists all the cosmological parameters included in our fiducial analysis.

APPENDIX D: REDSHIFT UNCERTAINTIES SAMPLING STRATEGY
How redshift uncertainties are propagated in the cosmological analysis can have an impact on the final result. In this section we discuss different strategies to marginalise over the redshift uncertainties of our sample during the cosmological inference. Because we can rely on a full ensemble of n(z) shapes capturing our redshift uncertainties, we can compare three different sampling methods:
• Shift: we compress the realisations by computing their average, and marginalise over a shift of the mean;
• Shift and stretch: we compress the realisations by computing their average, and marginalise over both a shift of the mean and a stretch of the width;
• Full shape: we provide as input all the produced realisations and rank them by one of their properties using the Hyperrank method (Cordero et al. 2022), marginalising over the full shape of the distributions.
Using only shifts is the methodology usually adopted to model redshift uncertainties in weak lensing samples, as the weak lensing kernel is mostly sensitive to the mean of the redshift distributions. On the other hand, clustering and galaxy-galaxy lensing measurements are also very sensitive to the width of the lens redshift distributions; therefore, the shift and stretch approach is preferred. The full shape marginalisation is, in theory, more accurate, because it accounts for the uncertainties in the higher order moments of the distribution; however, depending on the science case, it might not have a large impact on the final constraints.

The full shape marginalisation is implemented via hyperrank (Cordero et al. 2022), an algorithm that orders the realisations of the ensemble according to a parameter, which facilitates the sampling and marginalisation over the n(z) ensemble within the cosmological likelihood Markov chains. Hyperrank was also implemented for the WL sources, although it had a negligible impact on the results; the quantity chosen for the ranking in that case was the mean. For the lens sample, we decided it would be more appropriate to rank the realisations by their 68% sigma rather than by their mean, and we verified that this indeed improved the performance of the sampling.

To test the different sampling strategies, we built a synthetic noiseless data vector based on theory predictions at fixed cosmology, and we used as n(z) the average of the realisations of the SOMPZ+WZ estimates in data. We then marginalised over redshift uncertainties using the three approaches described above. We performed this test using both 4 and 6 lens bins, although here we only show the posteriors obtained with 4 bins, as they are not qualitatively different from the ones with 6 bins. The results of this test are shown in Figure D1, where we show the posterior of  8, $\Omega_{\rm m}$ and, for the sake of simplicity, two out of the four galaxy-matter linear biases.
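The ranking idea can be illustrated with a deliberately simplified, one-dimensional sketch: realisations are ordered by a width statistic, and a continuous parameter h in [0, 1) then selects one of them. This is not the hyperrank code, which is not reproduced here; the half-width of the central 68% interval is taken as one possible reading of the "68% sigma", and all names and toy values are hypothetical.

```python
import numpy as np

def width_68(z, nz):
    """Half-width of the central 68% interval of n(z) (assumed statistic)."""
    cdf = np.cumsum(nz)
    cdf = cdf / cdf[-1]
    z16, z84 = np.interp([0.16, 0.84], cdf, z)
    return 0.5 * (z84 - z16)

def rank_realisations(z, ensemble, statistic=width_68):
    """Sort the realisations by increasing value of the chosen statistic."""
    values = np.array([statistic(z, nz) for nz in ensemble])
    order = np.argsort(values)
    return ensemble[order], values[order]

def pick_realisation(ranked, h):
    """Map a continuous parameter h in [0, 1) to one ranked realisation."""
    idx = min(int(h * len(ranked)), len(ranked) - 1)
    return ranked[idx]

# Toy ensemble: Gaussians with different widths (hypothetical values).
z = np.linspace(0.0, 1.0, 1001)
sigmas = [0.07, 0.05, 0.06]
ensemble = np.array([np.exp(-0.5 * ((z - 0.5) / s) ** 2) for s in sigmas])

ranked, widths = rank_realisations(z, ensemble)
narrowest = pick_realisation(ranked, 0.0)
broadest = pick_realisation(ranked, 0.999)
```

Because neighbouring values of h map to realisations with similar widths, a sampler exploring h moves smoothly through the ensemble, which is the property the ranking is meant to provide.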
Focusing on the shift and shift+stretch contours, one can notice that the width of the contour in the direction perpendicular to the degeneracy axis is larger for shift+stretch. This is related to the impact of the additional marginalisation over the width of the distributions. One caveat is that in our marginalisation scheme (as adopted in the main DES Y3 2x2pt analysis) we implicitly neglect correlations between the uncertainties in the means and widths of the distributions, which usually show a certain degree of correlation (from ∼ 10% to ∼ 30%, depending on the tomographic bin). Neglecting them might translate into a slight overestimation of the uncertainties on our constraints. When marginalising over the uncertainties using the hyperrank framework, on the other hand, such correlations are implicitly accounted for. Indeed, one can notice that the hyperrank posteriors are slightly tighter than the shift or shift+stretch posteriors.
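The degree of correlation between mean and width uncertainties can be estimated directly from an n(z) ensemble. The sketch below does this for a toy ensemble of Gaussian realisations whose means and widths are drawn with an input correlation of 0.3; the ensemble, the input values and the function name are hypothetical and serve only to illustrate the measurement.

```python
import numpy as np

def ensemble_moments(z, ensemble):
    """Mean and standard deviation of each realisation (uniform z grid assumed)."""
    weights = ensemble / ensemble.sum(axis=1, keepdims=True)
    means = (weights * z).sum(axis=1)
    variances = (weights * (z - means[:, None]) ** 2).sum(axis=1)
    return means, np.sqrt(variances)

# Toy ensemble: correlated draws of (mean, width) for Gaussian n(z).
rng = np.random.default_rng(0)
rho = 0.3                          # hypothetical input correlation
sig_mean, sig_width = 0.01, 0.003  # hypothetical scatter of mean and width
cov = [[sig_mean**2, rho * sig_mean * sig_width],
       [rho * sig_mean * sig_width, sig_width**2]]
draws = rng.multivariate_normal([0.30, 0.05], cov, size=2000)

z = np.linspace(0.0, 1.0, 1001)
ensemble = np.exp(-0.5 * ((z[None, :] - draws[:, :1]) / draws[:, 1:]) ** 2)

means, widths = ensemble_moments(z, ensemble)
corr = np.corrcoef(means, widths)[0, 1]
print(f"mean-width correlation across the ensemble: {corr:.2f}")
```

A shift+stretch scheme with independent priors on the two parameters corresponds to setting this correlation to zero.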
Unfortunately, we did not manage to successfully apply hyperrank to the data. When performing the cosmological analysis on data using hyperrank, we found significantly less smooth posteriors compared to our tests on simulations. A similar behaviour was also found when applying hyperrank to the DES Y3 source sample (Amon et al. 2022), and it has been interpreted as a consequence of a possibly larger degree of complexity of the redshift distributions of our data compared to simulations. We attempted both to artificially smooth our n(z) and to increase the number of samples from the SOMPZ+WZ method, without reaching a satisfactory level of smoothness. Due to the very high computational cost of running a cosmological chain using hyperrank, we could only test a few different levels of smoothing before deciding to abandon hyperrank for the present work and to choose shift+stretch as the photo-z uncertainty marginalisation methodology. For DES Y6, we plan to apply several tools that will speed up our cosmological inference, enabling more tests on hyperrank, which has great potential and whose implementation is a goal for the DES Y6 analysis.

D1 Cosmological constraints with clipped n(z) tails
Here we test whether the difference between the DNF+WZ and SOMPZ+WZ constraints (Fig. 8) was only due to the different treatment of redshift outliers and of the tails of the redshift distributions. We artificially removed the tails from the DNF+WZ and SOMPZ+WZ n(z) (i.e., we set the distributions to zero there), and repeated our cosmological analysis. We defined the tails as the regions outside the interval used to calibrate the DNF distribution with the WZ constraints in Porredon et al. (2021a). Results for the ΛCDM case, 4 bins and fixed magnification are shown in Fig. D2. By removing the tails, both posteriors are shifted, which means that the calibration of the tails of the redshift distribution is important for our cosmological analysis. Since the two posteriors are shifted but still do not overlap, we can assume that the differences in the bulk of the redshift distributions inferred by the two methods are also crucially driving the differences in the constraints seen in Fig. 8.
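The clipping operation itself is simple, as the following sketch shows for a toy n(z) made of a Gaussian bulk plus a small high-redshift outlier population. The interval limits, the toy distribution and the function name are hypothetical, and the renormalisation after clipping is an assumption, since the text does not state whether the clipped distributions were renormalised.

```python
import numpy as np

def clip_tails(z, nz, z_min, z_max, renormalise=True):
    """Set n(z) to zero outside [z_min, z_max] (uniform z grid assumed)."""
    inside = (z >= z_min) & (z <= z_max)
    clipped = np.where(inside, nz, 0.0)
    if renormalise:  # assumption: restore unit area after clipping
        dz = z[1] - z[0]
        clipped = clipped / (clipped.sum() * dz)
    return clipped

# Toy n(z): Gaussian bulk plus a small outlier population at high redshift.
z = np.linspace(0.0, 2.0, 2001)
dz = z[1] - z[0]
bulk = np.exp(-0.5 * ((z - 0.50) / 0.05) ** 2)
tail = 0.02 * np.exp(-0.5 * ((z - 1.20) / 0.10) ** 2)
nz = bulk + tail
nz /= nz.sum() * dz

nz_clipped = clip_tails(z, nz, z_min=0.3, z_max=0.7)

mean_before = (z * nz).sum() / nz.sum()
mean_after = (z * nz_clipped).sum() / nz_clipped.sum()
print(f"mean redshift: {mean_before:.3f} -> {mean_after:.3f}")
```

Even a percent-level outlier population moves the mean of the toy distribution noticeably, which illustrates why the treatment of the tails can shift the cosmological posteriors.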
This paper has been typeset from a TeX/LaTeX file prepared by the author.

Figure 1. Scheme illustrating the operation of Balrog: the practically noiseless deep-field galaxies are injected many times into real DES wide-field images; those dichotomous images are then processed through the fiducial DES detection pipeline, to construct a sample containing several noisy representations of the same deep galaxies.

Figure 2. Flowchart illustrating the MagLim redshift distributions calibration scheme. The two methodologies included in the analysis are SOMPZ and clustering redshifts. Inspired by the flowchart in Myles & Alarcon et al. 2020.

Figure 3. Uncertainty on the mean redshift represented by the number counts of the three redshift samples: SPC (prioritizes spectra, then PAU photo-z, then COSMOS30), PC (prioritizes PAU photo-z, then COSMOS30) and SC (prioritizes spectra, then COSMOS30). In red, the total uncertainty given by their combination.
Figure 5. 3sDir distributions before (lighter shades) and after the combination with clustering-z (solid shades), and after the combination with clustering-z but using a broader prior on the parameters of the galaxy-matter bias function Sys(s) (the same values of the width of the prior  (s) that were used in Gatti et al. 2022). In the top row we have bins 1 and 4, in the middle row bins 2 and 5, and in the bottom row bins 3 and 6. The bands represent the $1\sigma$ error from the central value. Note how the combination with WZ tightens the constraint on the shape of the n(z).

Figure 6. Left panel: Final n(z) realisations obtained from the SOMPZ methodology alone compared to the fiducial DNF distribution for MagLim (in black). Right panel: Final n(z) realisations obtained from both SOMPZ and WZ methodology compared to the fiducial DNF distribution for MagLim (grey bands) after shifting and stretching them to fit the WZ measurement. Since in the inference the shift and stretch values are marginalised over, the uncertainties of the grey bands are obtained by sampling over the allowed ranges of shift and stretch defined by the prior, and applied respectively to the DNF estimate. Note that for a fairer comparison of the methods, the two remaining uncertainties were applied to the SOMPZ ensemble (zeropoint and SOMPZ intrinsic), to include all the SOMPZ-related uncertainties. For both plots, in the top row we have bins 1 and 4, in the middle row bins 2 and 5, and in the bottom row bins 3 and 6.

Figure 7. Visual representation of the uncertainties on mean (above) and width (below) of the redshift distributions estimated using the SOMPZ (square markers) and DNF (round markers) methods, before and after including the WZ information, for each tomographic bin. Below the dashed line is the comparison of the values computed in the redshift range used for the $\chi^2$ fit of the DNF estimate with the smoothed WZ n(z).

Figure 9. Posterior distributions of the cosmological parameters $\Omega_{\rm m}$,  8 and $w$ for four different cases: 1) wCDM, 4 bins and fixed magnification parameters; 2) wCDM, 6 bins and fixed magnification parameters; 3) wCDM, 4 bins and free magnification parameters; 4) wCDM, 6 bins and free magnification parameters. The 2D marginalised contours in these figures show the 68 per cent and 95 per cent confidence levels. We note that the posteriors of $w$ for the 6 bins cases are partially affected by the prior edge ($w \in [-2, -0.33]$, Table C1); see text for more details.

Figure A1.Comparison of -band magnitudes and  − ,  −  colors of the 6 bins of the MagLim sample, between data (blue) and simulations, before (green) and after re-weighting (red).The re-weighting process has proven successful in yielding magnitude distributions that closely resemble those observed in the actual data.
Figure B1. Estimated n(z) in four tomographic bins using a 12x12 cell deep SOM and 32x32 cell wide SOM trained on Buzzard simulations. In the top row we have bins 1 and 4, in the middle row bins 2 and 5, and in the bottom row bins 3 and 6. The redshift sample used here has 100000 galaxies drawn from 1.38 deg$^2$, such that after the MagLim selection it yields ∼ 15000 unique galaxies, which is the same order of magnitude as the redshift samples in data (see Table 2). The deep sample is drawn from three fields of size 3.32, 3.29, and 1.94 deg$^2$, respectively, from the Buzzard simulated sky catalog. The black dashed line marks the true value, the transparent bands are the 3sDir set of n(z) and the solid bands are the realisations once combined with clustering redshifts. We can appreciate the effect of the combined likelihood, resulting in distributions more constrained in terms of shape, and still consistent with the truth.

Figure B2. Posterior distributions of the cosmological parameters $\Omega_{\rm m}$,  8, and $w$ for the ΛCDM and wCDM analyses. These have been run with 6 bins and fixed magnification parameters.

Figure D1. Posterior distributions of the cosmological parameters $\Omega_{\rm m}$,  8, and two out of four of the galaxy-matter biases (those of bins 2 and 4) for the ΛCDM analysis involving 4 bins and fixed magnification parameters. These analyses have been obtained assuming a theoretical data vector and adopting different marginalisation schemes on the redshift distribution of the lens sample.

Figure D2. Same as the left panel of Fig. 8, but with two additional posteriors overplotted representing the constraints obtained using the redshift distributions with "clipped" tails.

Table 1. MagLim sample. We have outlined for each tomographic bin the redshift range (selected using DNF $z_{\rm mean}$), the number of galaxies, the number density, and the magnification coefficient as measured in Elvin-Poole et al. 2021.

(Porredon et al. 2021b) and consists of bright galaxies selected with an ad-hoc selection that optimises the number density and the redshift accuracy of the sample (Porredon et al. 2021b). The MagLim sample spans the full DES Y3 wide field footprint, for a total of ∼ 4143 deg$^2$. SOF magnitudes in the  bands 1 are used for the selection and photometry. The selection is meant to be linear in redshift and magnitude, and reads

$$ i < 4\, z_{\rm mean} + 18, \qquad i > 17.5, \qquad (1) $$

where $i$ is the i-band SOF magnitude and $z_{\rm mean}$ is a per-object redshift estimate from the photo-z code DNF (De Vicente et al. 2016; see also next subsection). The sample is then further limited to the redshift range $0.2 < z_{\rm mean} < 1.05$. This leads to a sample that ranges from 18.8 <  mag < 22.2. The MagLim sample is divided into 6 tomographic bins using DNF $z_{\rm mean}$ and considering the following bin edges: [0.2, 0.4, 0.55, 0.7, 0.85, 0.95, 1.05], with a total of 10,716,506 galaxies, distributed across bins as summarised in Table 1.
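The selection in Eq. (1), the redshift range cut and the tomographic binning can be written compactly as in the sketch below. It is a minimal illustration on a handful of made-up objects, not the DES selection code: the function names and input values are hypothetical, and the treatment of objects falling exactly on a bin edge (half-open bins) is an assumption.

```python
import numpy as np

# Tomographic bin edges in DNF z_mean quoted in the text.
BIN_EDGES = np.array([0.2, 0.4, 0.55, 0.7, 0.85, 0.95, 1.05])

def maglim_selection(i_mag, z_mean):
    """Boolean mask for i < 4 z_mean + 18, i > 17.5 and 0.2 < z_mean < 1.05."""
    i_mag = np.asarray(i_mag, dtype=float)
    z_mean = np.asarray(z_mean, dtype=float)
    faint_cut = i_mag < 4.0 * z_mean + 18.0
    bright_cut = i_mag > 17.5
    z_cut = (z_mean > BIN_EDGES[0]) & (z_mean < BIN_EDGES[-1])
    return faint_cut & bright_cut & z_cut

def tomographic_bin(z_mean):
    """0-based tomographic bin index (half-open bins [lo, hi) assumed)."""
    return np.digitize(z_mean, BIN_EDGES) - 1

# Made-up objects: i-band magnitude and DNF z_mean.
i_mag = np.array([19.0, 17.0, 22.5, 20.0, 20.0])
z_mean = np.array([0.30, 0.50, 0.90, 0.60, 1.20])

selected = maglim_selection(i_mag, z_mean)
bins = tomographic_bin(z_mean[selected])
print(selected, bins)
```

The faint limit becomes fainter with increasing redshift, which is what makes the selection linear in redshift and magnitude.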

Table 2. Number of unique galaxies belonging to each of the three redshift catalogs (spectroscopic collection, COSMOS, and PAU) for each of the samples SPC (composed of galaxies from spectra, PAU, COSMOS, in this order), SC (spectra, COSMOS), PC (PAU, COSMOS). The sample selection for the MagLim sample applied to the corresponding Balrog injections greatly reduces the size of all samples. For more information, see Section 2.5.

Summary of values for systematic uncertainties and center values for mean (top panel) and width (bottom panel) for the n(z) distributions.

Left panel: galaxy-matter bias of Bin 1 of the MagLim sample (0.2 < z < 0.4) as estimated in simulation following the methodology outlined in Section 4.2. The green points are obtained by dividing the sample into thin bins using the true redshifts, while the orange ones are obtained by binning the sample using the DNF redshift estimates. The grey band encompasses the 68% confidence interval of the Sys(s) function. Right panel: galaxy-matter bias of Bin 1 of the MagLim sample (0.2 < z < 0.4) as measured from the data (orange points); the blue line shows the best-fitting Sys(s) function, and the grey band encompasses its 68% confidence interval.

Table 5. Means and widths of the Gaussian prior function  (s) appearing in Eq. 18.

Table 6. Values of mean and width of the SOMPZ+WZ final ensemble of distributions and the DNF estimate. The statistical difference Δ <> is computed by considering the uncertainties of both methods summed in quadrature.

Left panel: Posterior distributions of the cosmological parameters $\Omega_{\rm m}$ and  8 for the ΛCDM analysis involving 4 bins and fixed magnification parameters.

The fiducial results from this work are reported in bold in the first row, while the official, fiducial results of DES Y3 are reported in bold in the second to last row.
ogy Facilities Council of the United Kingdom, the Higher Education Funding Council for England, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the Kavli Institute of Cosmological Physics at the University of Chicago, the Center for Cosmology and Astro-Particle Physics at the Ohio State University, the Mitchell Institute for Fundamental Physics and Astronomy at Texas A&M University, Financiadora de Estudos e Projetos, Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Científico e Tecnológico and the Ministério da Ciência, Tecnologia e Inovação, the Deutsche Forschungsgemeinschaft and the Collaborating Institutions in the Dark Energy Survey.

Number densities of the MagLim sample in Buzzard as obtained with the fiducial MagLim selection, and with the one adapted for Buzzard.

SIMULATIONS: Summary of values for center values for mean (top panel) and width (bottom panel) for the n(z) distributions as measured in the Buzzard simulations.The values related to SOMPZ and SOMPZ+WZ refer to Figure B1.Note that the uncertainties quoted here only include sample variance and shot noise.

Table C1. The parameters and their priors used in the fiducial MagLim ΛCDM and wCDM analyses. The parameter $w$ is fixed to −1 in ΛCDM. Square brackets denote a flat prior, while parentheses denote a Gaussian prior of the form $\mathcal{N}(\mu, \sigma)$.

1 Department of Astronomy and Astrophysics, University of Chicago, Chicago, IL 60637, USA
2 Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637, USA
3 Institut de Física d'Altes Energies (IFAE), The Barcelona Institute of Science and Technology, Campus UAB, 08193 Bellaterra (Barcelona) Spain
4 Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA
5 Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
6 Center for Cosmology and Astro-Particle Physics, The Ohio State University, Columbus, OH 43210, USA
7 Department of Physics, The Ohio State University, Columbus, OH 43210,