Cosmic Shear Cosmology Beyond 2-Point Statistics: A Combined Peak Count and Correlation Function Analysis of DES-Y1

We constrain cosmological parameters from a joint cosmic shear analysis of peak-counts and the two-point shear correlation functions, as measured from the Dark Energy Survey (DES-Y1). We find the structure growth parameter $S_8\equiv \sigma_8\sqrt{\Omega_{\rm m}/0.3} = 0.766^{+0.033}_{-0.038}$, which at 4.8% precision, provides one of the tightest constraints on $S_8$ from the DES-Y1 weak lensing data. In our simulation-based method we determine the expected DES-Y1 peak-count signal for a range of cosmologies sampled in four $w$CDM parameters ($\Omega_{\rm m}$, $\sigma_8$, $h$, $w_0$). We also determine the joint covariance matrix with over 1000 realisations at our fiducial cosmology. With mock DES-Y1 data we calibrate the impact of photometric redshift and shear calibration uncertainty on the peak-count, marginalising over these uncertainties in our cosmological analysis. Using dedicated training samples we show that our measurements are unaffected by mass resolution limits in the simulation, and that our constraints are robust against uncertainty in the effect of baryon feedback. Accurate modelling for the impact of intrinsic alignments on the tomographic peak-count remains a challenge, currently limiting our exploitation of cross-correlated peak counts between high and low redshift bins. We demonstrate that once calibrated, a fully tomographic joint peak-count and correlation functions analysis has the potential to reach a 3% precision on $S_8$ for DES-Y1. Our methodology can be adopted to model any statistic that is sensitive to the non-Gaussian information encoded in the shear field. In order to accelerate the development of these beyond-two-point cosmic shear studies, our simulations are made available to the community upon request.


INTRODUCTION
Over the last decade, weak gravitational lensing has emerged as one of the most promising techniques to investigate the properties of our Universe on cosmic scales. Based on the analysis of small distortions between the shapes of millions of galaxies, weak lensing by large scale structures, or cosmic shear, can directly probe the total projected mass distribution between the observer and the source galaxies, as well as place tight constraints on a number of other cosmological parameters (for recent reviews of weak lensing as a cosmic probe, see Kilbinger 2015). Following the success of the Canada-France-Hawaii Telescope Lensing Survey (Heymans et al. 2012;Erben et al. 2013), a series of E-mail: joachim.harnois-deraps@ncl.ac.uk dedicated Stage-III weak lensing experiments, namely the Kilo Degree Survey 1 , the Dark Energy Survey 2 and the Hyper Suprime Camera Survey 3 , were launched and aimed at constraining properties of dark matter to within a few percent. These are now well advanced or have recently completed their data acquisition, and the community is preparing for the next generation of Stage IV experiments, notably the Rubin observatory 4 , and the Euclid 5 and Nancy Grace Roman 6 space telescopes. The central approach adopted by these surveys for constraining cosmology is based on two-point statistics -mostly either in the form of correlation functions (e.g. Kilbinger et al. 2013;Troxel et al. 2018;Hamana et al. 2020;Asgari et al. 2020a) or its Fourier equivalent, the power spectrum, estimated using pseudo-C (Hikage et al. 2019), band powers (Becker et al. 2016;van Uitert et al. 2018;Joachimi et al. 2020) and quadratic estimators (Köhlinger et al. 2017). By definition, these two-point functions can potentially capture all possible cosmological information contained in a linear, Gaussian density field, and are thus highly efficient at analysing large scale structure data. They have been thoroughly studied in terms of signal modelling (Kilbinger et al. 2017), measurement (Jarvis et al. 2004;Schneider et al. 2002;Alonso et al. 2019) and systematics (Mandelbaum 2018).
With the improved accuracy and precision provided by current and upcoming surveys, it becomes increasingly appealing to probe small angular scales, where the signal is the strongest. In doing so, the measurements are intrinsically affected by the non-Gaussian nature of the matter density field, and it is natural to seek analysis techniques that can extract the additional cosmological information that two-point functions fail to capture. A variety of alternative methods have been applied to lensing data with this in mind, including three-point functions (Fu et al. 2014), Minkowski functionals and lensing moments (Petri et al. 2015), peak count statistics (Liu et al. 2015b,a;Kacprzak et al. 2016;Martinet et al. 2018;Shan et al. 2018), density split statistics (Gruen et al. 2018), clipping of the shear field (Giblin et al. 2018), convolutional neural networks (Fluri et al. 2019) and neural data compression of lensing map summary statistics (Jeffrey et al. 2021). Other promising techniques are also being developed, notably the scattering transform (Cheng et al. 2020), persistent homology (Heydenreich et al. 2020a), lensing skewspectrum (Munshi et al. 2020), lensing minimas (Coulton et al. 2020) and moments of the lensing mass maps (van Waerbeke et al. 2013;Gatti et al. 2020).
While existing non-Gaussian data analyses revealed a constraining power comparable to that of the two-point functions, it is expected that the gain will drastically increase with the statistical precision of the data. For example, constraints on the sum of neutrino mass ( m ν ), on the matter density (Ω m ) and on the amplitude of the primordial power spectrum (A s ), in a tomographic peak count analysis of LSST, are forecasted to improve by 40%, 39%, and 36% respectively, compared to a power-spectrum analysis of the same data . Upcoming measurements of the dark energy equation of state (w 0 ) will also benefit from these methods, with a forecasted factor of three improvement expected on the precision when combining two-point functions with aperture mass map statistics (Martinet et al. 2020). Similar results are found in the context of a final Stage-III lensing experiment such as the (upcoming) DES-Y6 data release, where the combination of non-Gaussian statistics with the power spectrum method reduces the error on the parameter combination S 8 ≡ σ 8 √ Ω m /0.3 by about 25% compared to a two-point function (Zürcher et al. 2020), where σ 8 is the normalisation amplitude of the linear matter power spectrum.
In the absence of accurate theoretical predictions for the signal, the covariance, and the impact of systematics, non-Gaussian statistics must be carefully calibrated on numerical simulations specifically tailored to the data being analysed, which are generally expensive to run. Faster approximate methods exist (e.g. Izard et al. 2018), however they typically suffer from small scale inaccuracies exactly in the regime where the lensing signal is the strongest, introducing significant biases in the inferred cosmological parameters. Previous peak count analyses of the third KiDS data release (KiDS-450) (Martinet et al. 2018, M18 hereafter) and of the DES Science Verification data (Kacprzak et al. 2016, K16 hereafter) calibrated their signal on a suite of full N-body simulations spanning the [Ω m − σ 8 ] plane described in Dietrich & Hartlap (2010). The accuracy of this suite has however been later shown to be only ∼10% (Giblin et al. 2018). Significant improvements on the simu-lation side are therefore critical for the new generation of data analyses based on non-Gaussian statistics.
This paper aims to address this issue: we present a cosmological re-analysis of the DES-Y1 cosmic shear data (Abbott et al. 2018b), exploiting a novel simulation-based cosmology inference pipeline calibrated on state-of-the-art suites of N-body runs that are specifically designed to analyse current weak lensing data beyond two-point statistics. In this work, the incarnation of our pipeline is tailored for the peak count analysis of the DES-Y1 survey, however it is straightforward to extend it to alternative non-Gaussian probes. Our pipeline first calibrates the cosmological dependence of arbitrary non-Gaussian measurements with the cosmo-SLICS (Harnois-Déraps et al. 2019), a segment of the Scinet LIght-Cone Simulations suite that samples Ω m , σ 8 , w 0 and h (the Hubble reduced parameter). We next estimate the covariance from a suite of fully independent N-body runs extracted from the main SLICS sample 7 (Harnois-Déraps et al. 2018). We further use the cosmo-SLICS to generate systematics-infused control samples that we use to model the impact of photometric redshift and shear calibration uncertainty. We study the impact of galaxy intrinsic alignment with dedicated mock data in which the ellipticities of central galaxies are aligned (or not) with the shape of their host dark matter haloes, following the in-painting prescription of Heymans et al. (2006) (see also Joachimi et al. 2013b, for a more recent application). We finally use a suite of high-resolution simulations (SLICS-HR, presented in Harnois-Déraps & van Waerbeke 2015) to investigate the impact of mass-resolution on the non-Gaussian statistics, and full hydrodynamical simulation lightcones from the Magneticum Pathfinder 8 to assess the effect of baryon feedback. All of the above are fully integrated with the cosmoSIS cosmological inference pipeline (Zuntz et al. 2015) and therefore interfaces naturally with the two-point statistics likelihood, enabling joint analyses with the fiducial DES-Y1 cosmic shear correlation function measurements presented in Troxel et al. (2018, T18 hereafter), with the 3 × 2pts analysis presented in DES Collaboration et al. (2017), or any other analysis implemented within cosmoSIS.
The current document is structured as follow: In Sec. 2 we present the data and the simulation suites on which our pipeline is built; Sec. 3 describes the theoretical background, the weak lensing observables and the analysis methods. A detailed treatment of our systematic uncertainties is presented in Sec. 4, the results of our DES-Y1 data analysis are discussed in Sec. 5, and we conclude afterwards. The Appendices contain additional validation tests of our simulations and further details on our cosmological inference results.

DATA AND SIMULATIONS
We present in this section the data and simulations included in our analysis. We exploit multiple state-of-the-art simulation suites in order to conduct our cosmological analysis, including a Cosmology training set to model the response of our measurement to variations in cosmology, as well as a Covariance training set and multiple Systematics training sets. These DES-Y1-specific simulation products are created from four suites of simulations, which we describe after introducing the data.
The total computing cost of the SLICS, cosmo-SLICS and SLICS-HR are 12.3, 1.1 and 1.3 million cpu hours, respectively. They were produced on a system of IBM iDataPlex DX360M2 machines equipped with one or two Intel Xeon E5540 quad cores, running at 2.53GHz with 2GB of RAM per core. Every simulation was split into 64 mpi processes, each further parallelised with either four or eight openmp threads. Modern compilers and cpus would likely bring the total computing cost down if similar simulations had to be run again in the future. Tiling strategy adopted to pave the full DES-Y1 data (black) with flatsky 10 × 10 deg 2 simulations (red squares). The squares overlap owing to the sky curvature, hence we separate the data at the mean declination in the overlapping regions. In our pipeline, measurements are carried out in each tile separately, then combined at the level of summary statistics. Table 1. Survey properties. The effective number densities n eff [in gal/arcmin 2 ] and shape noise σ listed here assume the definition of Chang et al. (2013). The column 'Z B range' refers to the photometric selection that defines the four DES-Y1 tomographic bins, while the mean redshift in each bin is listed under z DIR .

DES-Y1 data
In this paper we present cosmological constraints obtained from a reanalysis of the public lensing catalogues of the Year-1 data release 9 of the Dark Energy Survey (Abbott et al. 2018b). These catalogues were obtained from the analysis of millions of galaxy images taken by the 570 megapixel DECam (Flaugher et al. 2015) on the Blanco telescope at the Cerro Tololo Inter-American Observatory, observed in the grizY bands. The specific selection criteria of the DES-Y1 cosmic shear data used in this paper exactly match those of the cosmic shear analysis presented in Troxel et al. (2018): they consist of 26 million galaxies that pass the flags select, Metacal and the redMaGiC filters (Zuntz et al. 2018), thereafter covering a total unmasked area of 1321 deg 2 , for an object density of 5.07 gal arcmin −2 . The footprint of the DES-Y1 data is presented in Fig. 1, which shows in black the galaxy positions from the selected sample. The galaxy shears in the DES-Y1 data are estimated by two independent methods, Metacalibration (Sheldon & Huff 2017) and IM3SHAPE (Zuntz et al. 2013) that were both fully implemented (see Zuntz et al. 2018, for details). While they provide consistent results, the former method has a larger acceptance rate of objects with good shape measurements, and thereby results in measurements with higher signalto-noise. Following Troxel et al. (2018), we also adopt the Metacalibration shear estimates in our analysis. This method provides a shear response measurement per galaxy, R γ , a 2×2 matrix that must be included to calibrate any measured statistics (we refer to Zuntz et al. 2018, for more details on this calibration technique in the context of shear twopoint correlation functions). Additionally, the galaxy selection itself can 9 des.ncsa.illinois.edu/releases/dr1 introduce a selection bias, which can be captured by a second 2 × 2 matrix, labelled R S in T18, which we choose not to include due to the small relative contribution. We compute from these matrices the shear response correction, defined as S = Tr(R γ )/2. As explained in T18, the method imposes a prior on an overall multiplicative shear correction of m ± σ m = 0.012 ± 0.023, which calibrates the galaxy ellipticities as The galaxy sample is further divided into four tomographic redshift bins based on the photometric redshift posterior estimated from griz flux measurements (Hoyle et al. 2018). The redshift distribution in these bins, n i (z), must then be estimated, and a number of methods are proposed to achieve this. The fiducial cosmic shear results presented in T18 are based on the Bayesian photometric redshift (bpz) methodology described in Benítez (2000), which are consistent with a n(z) estimated by resampling the COSMOS2015 field (Laigle et al. 2016) with objects of matched flux and size (Hoyle et al. 2018). However, the accuracy of these two methods has been questioned in Joudaki et al. (2020, J20 hereafter), where it is argued that even though both the bpz and COSMOS resampling estimates account for statistical uncertainty, residual systematics effects could significantly affect the inferred n(z) distributions.In particular, the COSMOS sample could be populated with outliers and/or an overall bias that would affect the calibration (e.g. figure 11 of Alarcon et al. 2020), and J20 proposes instead to calibrate with redshifts from matched spectroscopic catalogues 10 . The direct reweighted estimation method (Lima et al. 2008, DIR hereafter) was selected for the fiducial cosmic shear analysis of the third KiDS data release (Hildebrandt et al. , 2018, and for the DES-Y1 data re-analyses of J20 and Asgari et al. (2020b), where it is found that this calibration brings both DES-Y1 and KV450 results in excellent agreement, affecting the constraints on S 8 by only 0.8σ. It should be noted also that DIR has inherent systematic uncertainties that are hard to quantify. In particular, incomplete spectroscopy and colour pre-selection (Gruen & Brimioulle 2017) can potentially bias the DIR n(z). Despite these issues that can in principle be addressed by a pre-selection of sources via the self organising map technique (Wright et al. 2020), we choose to adopt this DIR methodology for simplicity and to be able to easily relate our findings to previous work. We use the same tomographic redshift distribution n i (z) and uncertainty about the mean redshift z i DIR as in J20 here. In this method, the uncertainties on the mean redshifts, σ i z , are estimated from a bootstrap resampling of the spectroscopic samples. The density, the mean redshifts and the shape noise of the galaxies in individual tomographic bins are presented in Table 1.

Cosmology training set
The training set is constructed from the cosmo-SLICS (Harnois-Déraps et al. 2019, HD19 hereafter), a suite of wCDM N-body simulations specifically designed for weak lensing data analysis targeting dark matter and dark energy. These simulations cover a wide range of values in (Ω m , σ 8 , h, w 0 ). They sample the parameter volume at 25+1 coordinates organised in a latin hypercube (25 wCDM plus one ΛCDM point), and further include a sample variance suppression technique, achieving a sub-percent to a few percent accuracy depending on the scales involved. This is comparable to the accuracy of many widely-used twopoint statistics models based on non-linear power spectra from Halofit (Takahashi et al. 2012) or from HMcode (Mead et al. 2015(Mead et al. , 2020. Table 2. Summary of key properties from the four simulations suites used in our pipeline. L box is the box side [in h −1 Mpc], n p is the number of particles evolved, N sim is the number of N-body runs, N LC is the number of light-cones in the full training set and N cosmo is the number of cosmology samples. The bottom section summarises the range in cosmological parameters that is covered by the cosmo-SLICS. The full training range is detailed in Table 2, which also influences our choice of priors when sampling the likelihood (see Sec. 3.6).
Each run evolved 1536 3 particles inside a 505h −1 Mpc co-moving volume with the public cubep 3 m N-body code (Harnois-Déraps et al. 2013), generating on-the-fly multiple two-dimensional projections of the density field. These flat-sky mass planes were subsequently arranged into past light-cones of 10 degrees on the side, from which lensing maps were extracted at a number of redshift planes (see Sec. 2.6.1). This process was repeated multiple times after the mass planes were randomly selected from a pool of six different projected subvolumes, then their origins were randomly shifted. In total, 50 pseudoindependent light-cones per cosmology are available for the generation of galaxy lensing catalogues (see HD19 for a complete description). In the end we include 10 light-cones per cosmology out of 50, after verifying that our results do not change when training on only five of them. Indeed, 1000 deg 2 is enough to reach convergence on our statistics, largely due to the sample suppression technique implemented in HD19.
Two of these models (cosmology-fid and -00, see HD19) are used to infuse photometric redshift and shear calibration uncertainty, which we describe in Sec. 4.1 and 4.2, respectively.

Covariance training set
Our covariance matrix is estimated from the SLICS (Harnois-Déraps et al. 2018, HD18 hereafter), a public simulation suite in which the cosmology is fixed for every N-body run, but the random phases in the initial conditions are varied, offering a unique opportunity to estimate the uncertainty associated with sampling variance. The volume and number of particles are the same as for the cosmo-SLICS, achieving a particle mass of 2.88 h −1 M (see the properties summary in Table 2). The light-cones are constructed in the same way as the cosmo-SLICS, except that in this case the mass sheets are sampled only once per Nbody run, generating 124 truly independent realisations. The accuracy of the SLICS has been quantified in Harnois-Déraps & van Waerbeke (2015) by comparing their matter power spectrum to that of the Cosmic Emulator (Heitmann et al. 2014), which match to within 2% up to k = 2.0 hMpc −1 ; smaller scales progressively depart from the emulator. The cosmo-SLICS have a similar resolution.

Systematics training set: mass resolution
Numerical simulations are inevitably limited by their intrinsic mass and force resolution, and it is critical to understand how these affect any measurements carried out on the simulated data. We employ for this purpose a series of 'high-resolution' runs, first introduced in Harnois-Déraps & van Waerbeke (2015) and labelled 'SLICS-HR' therein.
These consist of 5 independent N-body simulations similar to the main SLICS suite, but in which the force accuracy of cubep 3 m has been increased significantly such as to resolve smaller structures, even though the particle number is fixed. These have been shown to reproduce the Cosmic Emulator power spectrum to within 2% up to k = 10.0 h −1 Mpc, indicating that even those small scales are correctly captured by the simulations. The SLICS-HR are post-processed with a strategy similar to that adopted for the cosmo-SLICS, re-sampling the projected mass sheets in order to generate 10 pseudo-independent light-cones per run.

Systematics training set: baryon feedback
Another important systematic we investigate in this analysis is the impact of strong baryonic physics that modifies the clustering property of matter. As noted in multiple independent studies, AGN feedback has a particularly important effect on the matter power spectrum but is challenging to calibrate. Simulations often struggle to reproduce the correct baryon fraction in haloes of different masses, and these differences in turn cause major discrepancies in the clustering properties (see Chisari et al. 2018, for example). In this paper, we examine one of these models and inspect which parts of our peak count measurements are affected by baryons.
We used for this exercise a series of light-cones ray-traced from a subset of the Magneticum Pathfinder hydrodynamical simulations, which are designed to study the formation of cosmological structures in presence of baryonic physics and which were recently described in Castro et al. (2020). These are based on the smoothed particle hydrodynamics (SPH) code p-gadget3 (Springel 2005), in which a number of baryonic processes are implemented, including radiative cooling, star formation, supernovae, AGN and their associated feedback on the matter density field. The Magneticum reproduce a number of key observations such as statistical properties of the large-scale, inter-galactic and inter-cluster medium, but also central dark matter fractions and stellar mass size relations (see Hirschmann et al. 2014;Teklu et al. 2015;Castro et al. 2018Castro et al. , 2020, for more details). What is especially important in our case is that the total baryonic feedback on the matter field is comparable to that of the BAHAMAS cosmological hydrodynamical simulations (McCarthy et al. 2017), in particular in terms of the strength of the effect on the matter power spectrum. This derives from the similar baryon fractions produced by Magneticum and BAHAMAS, which are in reasonable agreement with observations. This validates the Magneticum as a good representation for the impact of baryon feedback, given the current uncertainty on the exact impact (see Sec. 4.3 for further discussion).
Among the various runs, we use a combination of the highresolution Run-2 (Hirschmann et al. 2014) and Run-2b (Ragagnin et al. 2017), which both co-evolve dark matter particles of mass 6.9 × 10 8 h −1 M and gas particles with mass 1.4×10 8 h −1 M in comoving volumes of side 352 and 640 h −1 Mpc, respectively; the smaller (larger) box is used at lower (higher) redshift, and the transition occurs at z = 0.31. The input cosmology is consistent with the SLICS but slightly different, with Ω m = 0.272, h = 0.704, Ω b = 0.0451, n s = 0.963 and σ 8 = 0.809. Both Run-2 and Run-2b also exist in pure gravity mode (i.e. dark matteronly) with otherwise identical initial conditions, allowing us to isolate the impact of the baryonic sector on our observables.
2.6 Simulation post-processing 2.6.1 Light-cones The simulation suites used in this paper all work under the flat sky approximation, which assumes that the maps are far enough from the observer so that cartesian axes can be used instead of angles and radial distances. At preselected redshifts z, the N-body/hydrodynamical codes assign the particles onto a three-dimensional grid, select a sub-volume to be projected with pre-determined co-moving thickness, and collapse the mass density along one of the axis. This procedure is repeated with different projection directions and sub-volumes, creating a collection of mass sheets at every redshift. These are next post-processed to generate a series of past light-cone mass maps, δ 2D (θ, z), each of 100 deg 2 , which are then used to generate convergence κ(θ, z s ) and shear γ(θ, z s ) maps at multiple source redshift planes, z s , (see HD18 and HD19 for full details), where γ = γ 1 + iγ 2 , the two components of the spin-2 shear field. From these, mock lensing quantities (κ, γ) can be computed for any galaxy position provided its (RA, Dec) coordinates and a redshift.

Assembling the simulated surveys
As for many non-Gaussian statistics, peak counts are highly sensitive to the noise properties of the data. As such the simulations need to reproduce exactly the position and shape noise of the real data, otherwise the calibration will be wrong. The solution, adopted in Liu et al. (2015a), K16 and M18 is to overlay data and simulated light-cones, and to construct mock surveys from the position and intrinsic shape of the former, and the convergence and shear of the latter.
Since the size of the full DES-Y1 footprint largely exceeds that of our individual light-cones, we connect the data and simulations with a 'mosaic' approach, where the DES-Y1 galaxy catalogues are divided into smaller 'tiles' 11 that all fit inside 100 deg 2 square areas. Each of these tiles are then overlaid with simulated light-cones from which lensing quantities are extracted. In total, 19 tiles are required to assemble the full footprint with our mosaic, which is shown in Fig. 1. Every simulated light-cone from the Cosmology, Covariance or Systematics training sets is therefore replicated 19 times and associated with a full realisation of the survey.
We emphasise that the simulated light-cones are discontinuous across tile boundaries, whereas data are not. To avoid significant calibration biases caused by this difference, no measurement whatsoever must extend over tile boundaries. Both data and mock data are separated in tiles at the catalogue level; these are then analysed individually, and the data vectors are combined at the end 12 .
Another subtle difference that needs to be taken into account is that the position coordinates (RA, Dec) and the galaxy ellipticities ( ) from the data are provided on the (Southern) curved sky, whereas all of our simulations assume a (X, Y) cartesian coordinate system. Since the physics are independent of our choice of coordinate system, and since we analyse every tile individually, we apply a coordinate transformation to centre every tile onto the equator, where both coordinate frames converge 13 . The weak lensing statistics of a given tile are unaffected by this rotation, a fundamental fact that we verify with two-point correlation functions in Sec. 3.1.
As easily noticed from looking at Fig. 1, some of the galaxies fall outside the tiles, which slightly affects the total number of galaxies in the sample. It is not ideal, but adding multiple simulated tiles for such a small fraction (1.9%) of the data is arguably not worth the effort. The number of objects listed in Table 1 reflects this final selection and amounts to a total of 25.5 million of galaxies. 11 These tiles are sometimes called 'patches' in the literature, e.g. in K16. 12 Note that the tiles are identical for all simulations (SLICS, cosmo-SLICS, SLICS-HR and Magneticum) since their light-cones all have the same opening angle. 13 In this process, we rotate both the celestial coordinate and the ellipticities of every galaxy in the tile, to account for the modified distance to the South pole in the new coordinate frame. The exact transformation uses the method presented in the Appendix B of Xia et al. (2020), which rotates pairs of galaxies from any orientation on the sky onto the equator, placing one member at the origin. In our case we instead map to the equator the straight line that bisects every tile. Every tile has its unique rotation vector, which we also use to displace the galaxies and to recompute their ellipticities ( 1/2 ) in this new coordinate frame.

Mock galaxy shapes and redshifts
As mentioned above, the position and the intrinsic ellipticities of individual galaxies in the simulated catalogues are taken from the observations. Redshifts are assigned to every object in a given tomographic bin 'i' by sampling randomly the n i (z) described in Sec. 2. Therefore, variations in survey depth are not included in our training sets. This induces a systematic difference with the data, but we expect that this has a minor effect on our cosmological measurement. Indeed, it was shown in Heydenreich et al. (2020b) that the impact of survey depth variability is subdominant for Stage-III surveys. At this stage, every galaxy has position and a redshift, which are used to extract the lensing quantities (κ, γ) from the simulation light-cones.
We finally include the intrinsic galaxy shapes and Metacal shear response correction in the simulations by randomly rotating the observed galaxy shapes such as to undo the cosmological correlations from the data, and we then combine the new ellipticity int with the simulated lensing signal as: Here g is the reduced shear, defined as g = γS /(1+κ), and the bold-font symbols g, γ, and int are again spin-2 complex quantities. We investigate in Sec. 4.4 the impact of the intrinsic alignment of galaxies, where int is no longer chosen at random and instead correlates with the shape of dark matter haloes.

THEORY AND METHODS
Since we validate our simulation suites with cosmic shear correlation functions and lensing power spectra, we begin this section with a review of the theoretical modelling and the measurement strategies related to these quantities. We next move to the primary focus of this paper and describe our peak count statistics pipeline, detailing our treatment of the data, our approach to modelling the signal and estimating the covariance matrix, and we finally describe our cosmological inference methods.

ξ ± statistics
Two-point correlation functions (2PCFs) are well studied and have a key advantage over other measurement techniques: as for all lensing two-point statistics, their modelling can be accurately related to the matter power spectrum, P(k, z), whose accuracy is reaching the percent level far in the non-linear regime when calibrated with N-body simulations at small scales, and in absence of baryonic physics (Heitmann et al. 2014;Euclid Collaboration: Knabenhans et al. 2019). From this P(k, z), the lensing power spectrum between tomographic bins 'i' and ' j' is computed in the Limber approximation as: where χ H is the co-moving distance to the horizon, and the lensing kernels q i are computed from the redshift distributions n(z) as: where c and H 0 are the speed of light and the Hubble parameter, respectively. The cosmic shear correlation functions ξ i j ± are computed from the C i j as: with J 0/4 (x) being Bessel functions of the first kind. Following T18, Eq. 4 is solved with the cosmological parameter estimation code cos-moSIS 14 (Zuntz et al. 2015), in which the matter power spectrum is calculated by the Halofit model of Takahashi et al. (2012). The ξ i j ± predictions for the DES-Y1 measurements are shown by the black lines in Fig. 2 for all pairs of tomographic bins, at the SLICS input cosmology.
The measurements of ξ i j ± from simulations and data are carried out with Treecorr (Jarvis et al. 2004), a fast parallel tree-code that computes shape correlations between pairs of galaxies 'a, b' separated by an angle ϑ as: In the above expression, the sums are over all galaxies 'a' in tomographic bin i and galaxies 'b' in tomographic bin j; i a,t and i a,× are the tangential and cross components of the ellipticity of galaxy a in the direction of galaxy b; W a/b are weights attributed to individual galaxies, which are set to unity in the Metacalibration shear inference method; S a/b are the 'shear response correction' per object mentioned in Sec. 2.1 and provided in the DES-Y1 catalogue; ∆ϑ ab is the binning operator, which is equal to unity if the angular separation between the two galaxies falls within the ϑ-bin, and zero otherwise. Our raw measurements are organised in 32 logarithmically-spaced ϑ-bins, in the range [0.5 − 475.5] arcmin, but not all angular scales are used in this work 15 .
We present in Fig. 2 our measurements of ξ i j ± on the DES-Y1 data, showing with the black solid points the measurement on the full footprint, and with the open blue triangles the measurements on the 19 data tiles described in Sec. 2.6.2, which are combined with a weighted mean using the Treecorr N pairs (ϑ) per tile as our weights. We see that the two results are similar, with differences that are everywhere at least twice as 14 bitbucket.org/joezuntz/cosmosis/wiki/Home 15 Note that T18 used 20 logarithmic bins in the range 2.5-250 arcmin. small as the statistical error measured from the covariance mocks (see below) and evenly scattered about the black points, validating our tiling method. We further verified that the difference on the inferred cosmology is negligible (see Sec. 5).
We also show, with the red squares in Fig. 2, the mean and expected 1σ error on the DES-Y1 data as estimated from the Covariance training set. The agreement between theory and simulations is excellent at all scales, and the slight differences are well under the statistical precision of the data. We can observe a slight loss of power at large angular scales in the ξ + statistics, a finite box effect that we forward model (see Appendix A1). For every simulated light-cone we generate a total of 10 realisations of the shape noise by rotating, as many times, every galaxy in the catalogue, and recomputing new observed ellipticities (with Eq. 1) and the correlation functions (Eq. 5). The red squares in Fig. 2, as well as their associated error bars, correspond to one of these realisations; we observe no significant change in the other nine realisations, and recover the error bars reported by T18 to within 5-15% over most angular scales, further demonstrating the robustness of our training set. We do not expect a perfect match due to the slightly different binning scheme.
Our cosmological analyses exclude the same angular scales as in T18, removing the elements of the data vector where T18 conclude that the uncertainty on the baryonic feedback and in the non-linear matter power spectrum is non-negligible. These scales are indicated by the vertical lines in Fig. 2.
The variation of ξ ± with cosmology are well captured by cosmo-SIS, and so are the responses to photometric redshift and shear calibration uncertainties (see Sec. 4). We therefore do not measure this statistic in the Cosmology nor the Systematics training sets, and use instead the public modules provided in the latest cosmoSIS release to calculate these.

Shear peak count statistics
As mentioned in the introduction, the peak count statistic is a powerful alternative method to extract cosmological information from weak lensing data. It consists of measuring the 'peak function', i.e. the number of lensing peaks as a function of their signal-to-noise, which is very sensitive to cosmology and robust to systematics (see Zürcher et al. 2020;Martinet et al. 2020, for recent comparisons with other lensing probes).
Our measurement technique closely follows that described in K16 and M18, which we review here. Peaks are identified from local maxima in the signal-to-noise maps of the mass within apertures (Schneider 1996), M ap (θ), searching for pixels with values higher than their 8 neighbours. This is one of many ways to estimate the projected mass map from galaxy lensing catalogues, and was chosen primarily for its local response to data masking. This is to be contrasted with e.g. the Fourier methods of Kaiser & Squires (1993) in which masking introduces a complicated mode-mixing matrix that can affect all scales. Other techniques such as Bayesian mass reconstruction (Price et al. 2020) or wavelets transforms (Leistedt et al. 2017) are also promising and merit to be explored in the future (see as well Gatti et al. 2020, and references therein).
From a lensing catalogue containing the position, ellipticity a and shear response correction S a per galaxy, we construct an aperture mass map on a grid by summing 16 over the tangential component of the ellipticities from galaxies surrounding every pixel at coordinate θ, weighted by an aperture filter Q. More precisely, we compute: where n gal (θ) is the galaxies density in the filter centred at θ, and θ a is the position of galaxy a. The tangential ellipticity with respect to the aperture centre is computed as In the above expression, x = θ/θ ap , where θ is the distance to the filter centre, and we adopt x c = 0.15 as in previous works, a choice that maximises the sensitivity of the signal to the massive haloes, which carry the majority of the cosmological information. The filter size of θ ap = 12.5 arcmin is adopted as in M18, however we also consider 9.0 and 15.0 arcmin, and report results for these where appropriate.
Hereafter, Eq. (6) defines the signal of our aperture mass map, which we compute at every pixel location. The variance about this map is calculated at every pixel location from: where again the sum runs over all galaxies in the filter. Note that the magnitude of the measured galaxy ellipticities that enters this equation must also be calibrated by the shear response correction (see the Appendix A of Asgari et al. 2020b), hence the term [ a S a ] 2 in the denominator. The signal-to-noise map from which peaks are identified, M(θ) ≡ S/N, is computed by taking the ratio between Eq. (6) and the square root of Eq. (8) at every pixel location, e.g.
16 In practice, we use a link-list to loop only over nearby galaxies. Peaks catalogues are first constructed from the galaxy catalogues separated in tomographic bins (which we label 1, 2, 3 & 4), and then from every combination of pairs of tomographic catalogues (which we label 1∪2, 1∪3, 1∪4 ... 3∪4). As detailed in Martinet et al. (2020), analysing these 'cross-tomographic' catalogues provides additional information that is not contained within the 'auto-tomographic' case. They went further and also included triplets (1∪2∪3, 1∪2∪4, 1∪3∪4...) and quadruplets (1∪2∪3∪4), showing that these also contained additional information, but this gain is not as significant in our case, where the noise levels are much higher.

Masking
Weak lensing data are taken inside a survey footprint, and parts of the images are removed in order to mask out satellite tracks, bright stars, saturated foreground galaxies, etc. The effect of data masking on the aperture mass map can be significant: the signal and the noise are coherently diluted in apertures that strongly overlap with masked regions, generating regions where M is overly smooth. Therefore the survey mask must be included in the simulations and in the estimator such as to avoid biasing the statistics.
If the masked pixels are known, this can be taken into account by avoiding pixels for which e.g. more than half of the filter overlaps with masked areas. Alternatively, one can examine the object density in the aperture filter, n gal (θ), and require that it exceeds a fixed threshold in order to down-weight or reject heavily masked apertures. In this method, pixels with little or no galaxies are treated as masked. We opted for the second method, setting the threshold to 1/π 2 gal/arcmin 2 after a few different trials, which directly identifies regions with very low galaxy counts. We further augment the masking selection with an apodisation step that flags as 'also masked' any pixel within a distance θ ap of a masked region found in the first step. Fig. 3 illustrates this procedure for one of the tiled catalogues, for an idealised noise-free case. For our fiducial choice of filter θ ap = 12.5 arcmin, we show in the upper two left panel the 'raw' M ap (θ) map (e.g. before masking, computed directly from Eq. 6) as well as N gal (θ). The masked regions are clearly visible in the latter but not so much in the former. A close inspection (top right panel) however reveals overly smooth features in M ap (θ), in regions where there are no galaxies (i.e. in the blue regions of the N gal (θ) map). The third left panel shows the masked regions constructed from our pipeline, which is finally applied on the raw aperture map, resulting in the masked map shown on the bottom panel. All choices of θ ap result in aperture maps that closely recover the true convergence (shown in the bottom right panel).
It is clear from Fig. 3 that the unmasked area of our final maps is affected by the aperture filter size. Indeed, larger filters can be blind to small features in the mask, while the survey edges are more severly excluded. This does not bias our cosmological inference since we apply the same filter to the data and the simulations, but it does slightly affect the signal-to-noise of our measurement, which increases with the area of the survey. The net unmasked area in our final maps are (1426,1408,1366,1327,1284) deg 2 for θ ap = (6.0, 9.0, 12.5, 15.0, 18.0)', respectively.

Peak function
Peaks found in the (masked) M maps are counted and binned as a function of their pixel value, thereby measuring the peak function N peaks (S/N). We use 12 bins covering the range 0 < S/N 4 in our main data vector, which was found in K16 and M18 to avoid scales where multiple systematics uncertainties such as the effects of baryon feedback and intrinsic alignments of galaxies become large (we extend this range to higher S/N values in some of our systematics investigations). 12 bins is also a good trade-off between our need to capture most cosmological information from N peaks (S/N), while keeping a small data vector for which the covariance matrix will be less noisy. A number of recent studies (M18, Davies et al. 2019;Coulton et al. 2020;Zürcher et al. 2020;Martinet et al. 2020;Davies et al. 2020) have shown that cosmological information is contained in peaks of negative S/N or in lensing voids, however, as noted in Appendix B of M18, the peaks with negative S/N strongly correlate with those of positive S/N value and only marginally improve the constraints from peak statistics in the case of Stage III surveys. We therefore focus only on the positive peaks in this DES-Y1 analysis.
We show in Fig. 4 the peak function measured from the Cosmology training set with θ ap = 12.5 arcmin, for all pair combinations of the four redshift bins and colour-coded as a function of the input S 8 . A pure noise case (N noise ), obtained from the average peak function after setting γ = 0 on 10 full survey realisations, has been subtracted to highlight the cosmological variations. Off-diagonal panels present the crosstomographic measurements. The colour gradient is clearly visible in all tomographic bins; more precisely, all cosmologies present an excess of large S/N peaks and a depletion of low S/N peaks compared to pure noise. This is caused by the gravitational lensing signal, which create peaks and troughs in the M ap map and smooths out the smallest peaks. Importantly, these differences are accentuated for high-S 8 cosmologies. Also shown with black squares are the measurements from the DES-Y1 data, with error bars estimated from the Covariance training set. These demonstrate that most of the constraining power comes from the autotomographic bins 3 and 4 and from the cross-tomographic bins. Some additional information is contained in the highest S/N peaks of the redshift bins 1 and 2, whereas the low S/N peaks of bin 2 mostly contribute noise.

Analysis pipeline
In this analysis we extend multiple aspects of the K16 and M18 methodologies. Here is a summary of these improvements: (i) We include a tomographic decomposition of the data, including the cross-redshift pairs inspired by the method presented in Martinet et al. (2020); (ii) Our Cosmology training set (see Sec. 2.2) now includes four parameters (Ω m , σ 8 , h and w 0 ), and it would be straightforward to increase that parameter list with additional training samples. Additionally, the cosmo-SLICS simulations are more accurate than those of Dietrich & Hartlap (2010), which were used in both K16 and M18: they resolve smaller scales, and suffer less from finite box effects, having a volume almost 8 times larger; (iii) We deploy a fast emulator (see Sec. 3.5) that can model the signal at arbitrary cosmologies within the parameter volume included in the training. In contrast with a likelihood interpolator, emulating the data vector directly allows us to combine the summary statistics with other measurement methods such as the two-point correlation functions, to better include systematic uncertainties, and to easily interface with most likelihood samplers; (iv) We generate a Covariance training set from a larger ensemble of independent survey realisations (see Sec. 2.3), and feed it into a novel hybrid internal resampling technique that improves the accuracy and precision of lensing covariance matrices estimated from the suite (see Sec. 3.4). Moreover, the covariance training set is shown to closely reproduce the published DES-Y1 cosmological constraints of T18 when analysed with two-point correlation functions (see Table 5), thereby validating both the simulations themselves and the covariance estimation pipeline. Our method is also compatible with joint-probe measurements; (v) We construct a series of dedicated Systematics training sets specifically tailored to our data, in which the most important cosmic shear-related systematics are infused. Specifically, we investigate the impact of photometric redshift uncertainty (Sec. 4.1), of multiplicative shear calibration uncertainty (Sec. 4.2), of baryonic feedback (Sec. 4.3), of possible intrinsic alignment of galaxies (Sec. 4.4), and of limits in the accuracy of the non-linear physics (Sec. 4.5). These tests allow us to flag the elements of our data vector that are affected, and in some case to model the impact. Following K16, we construct a linear response model to the photometric redshift and multiplicative shear calibration uncertainties, but calibrate our models on a sample of 10 deviations from the mean of the distribution (respectively ∆z i and ∆m i , with i = 1..10) as opposed to one; (vi) We implement the emulator, the covariance matrix and the linear systematic models within the cosmology inference code cosmoSIS (Zuntz et al. 2015), allowing us to carry out a joint likelihood analysis based on peak statistics and shear two-point correlation functions, while coherently marginalising over the nuisance parameters that affect both of these measurement methods.
With this new pipeline, we are fully equipped to investigate the impact of different measurement and modelling methods, of different systematics mitigation strategies, but also of analyses choices related to the likelihood sampling, such as the prior ranges, the specific combination of parameters to be sampled, or the manner in which maximum likelihoods and confidence intervals are reported. Indeed, it has been shown that these have a non-negligible impact on the final cosmological constraints, specifically in the context of weak lensing cosmic shear analyses (Joudaki et al. 2017;Chang et al. 2019;Joudaki et al. 2020;Asgari et al. 2020a;Joachimi et al. 2020) 17 . It turns out that these approaches do not make too much of an impact for current cosmic shear data. Moreover, it is not our primary goal here to optimise these choices, as we are rather interested in establishing our simulation-based inference method as being robust, accurate and flexible. We therefore opted for an overall analysis pipeline that maximally resembles that of the fiducial DES-Y1 cosmic shear data, and leave some additional tuning for future work.
Aside from a different choice of n(z) calibration (see Sec. 2.1), the key differences between our current pipeline and that of T18 are the impossibility of ours to vary and marginalise over the other cosmological parameters -the power spectrum tilt parameter n s , the baryon density Ω b and the sum of neutrino mass m ν . These would require more lightcone simulations such as the Mira-Titan (Heitmann et al. 2016) or the MassiveNuS (Liu et al. 2018), which are not folded in our training set at the moment, but form a natural extension to this work. Also missing is a cosmology-dependent model for the effect of intrinsic alignment, which could be necessary in future analyses.

Covariance matrix
The covariance matrix is a central ingredient to our cosmological inference as it describes the level of correlation between different elements of our data vector, and its inverse directly enters in the evaluation of the likelihood. In our analysis, it is estimated from the Covariance training set, which is based on 124 independent light-cones, each replicated onto the 19 survey tiles such as to fully cover the DES-Y1 footprint. For each of these survey realisations, we further generate 10 shape noise realisations by randomly rotating the ellipticity measurements from the data, which increases the number of pseudo-independent realisations to N sim = 1240 and is largely enough for the current analysis.
The next step consists in combining the measurements obtained in the 19 different tiles into a final measurement of the [ξ i j ± (ϑ); N i j peaks (S/N)] covariance. To achieve this, we mix the lightcones at the survey construction stage, such that for each of the 124 full survey assembly, the 19 tiles are extracted from 19 different light-cones selected at random. This mixing suppresses an unphysical large-scale mode-coupling caused by the replication, which otherwise results in an overall variance on ξ ± that is an order of magnitude too large. (Note that the ξ ± covariance block is identical to an alternative estimation based on computing the matrix for individual tiles, which are subsequently combined with an area-weighted average, but the N i j peaks (S/N) block in that latter case becomes inaccurate in this case so we reject this approach.) We repeat this for each of the 10 noise realisations and use the average matrix as our final estimate.
The net effect of averaging over multiple shape noise realisations is to significantly lower the noise in the matrix, especially over the terms where shape noise dominates. A similar technique is applied in M18, who also find a negligible impact on the cosmological results from KiDS-450 data whether they average over 5 or 20 noise realisations. Fig. 5 shows the resulting matrix, normalised to unity of the diagonal, e.g. r(x, y) = Cov(x, y)/ Cov(x, x) Cov(y, y). Whereas the ξ + block shows the highest level of correlation, the off-diagonal blocks are -0.5 0 0.5 1 Figure 5. This cross-correlation matrix highlights the correlations between the different elements of the data vector. From left to right, the first 10 blocks show the ξ + tomographic measurements, followed by the 10 ξ − blocks, while the last 10 blocks show the tomographic peak count. Not all elements are used in the analysis, see the main text for more details. mostly uncorrelated, which is promising for the prospect of learning additional information from the joint analysis.

Peak function emulator
The peak count statistics measured from the Cosmology training set (shown in Fig. 4 with the coloured histograms) is computed at 26 points in a wide 4-dimensional volume. From these we train a Gaussian Process Regression (GPR) emulator that can model the peak function given an input set of cosmological parameters [Ω m , S 8 , h, w 0 ] at any point within the training volume. Directly adapted from the public cosmo-SLICS emulator 18 described in Appendix A of HD19, we train our GPR emulator on the individual elements of the N peaks data vector, first optimising the hyper-parameters from an MCMC analysis that includes 200 training restarts, then 'freezing' the emulator once the best-fit solution has been found. As described in HD19, the training can also involve a PCA decomposition and a measurement error; we include the former but find that the modelling is more accurate without the latter.
We evaluate the accuracy of the emulator from a leave-one-out cross-validation test: the emulator is trained on all but one of the training nodes, then generates a prediction of the peak function at the removed cosmology, which is finally compared with the actual measurement. This test is performed for all nodes and provides an upper bound on the interpolation error, since in this case the distance between the evaluation point and all other training nodes is significantly larger than if all points had been present. Moreover, many of these points lie at 18 github.com/benjamingiblin/GPR Emulator/ the edge of the training volume, hence removing them for this test requires the emulator to extrapolate from the other points, which is significantly less accurate than the interpolation that is normally performed. As discussed in HD19, the node at the fiducial cosmology was added by hand close to the centre of the wCDM Latin hypercube, hence the cross-validation test performed at that single ΛCDM point is more representative of the actual emulator's accuracy.
The results from this accuracy test are presented in Fig. 6, again colour-coded with S 8 . We achieve sub-percent interpolation accuracy over data points with S/N < 3, and for all points when testing the ΛCDM model (shown in thick black). We observe in some other models a scatter of up to a few percent for peaks with S/N > 3, but this scatter over-estimates the true interpolation error for reasons explained above. In term of accuracy target for the model and other systematics effects, we generally aim for an impact on the cosmological inference that is less than 0.5σ stat ; all dominant effects are documented in Secs. 4 and 5.2. Generally speaking, most systematic effects that have a < 5% impact on a small number of elements in the data vector will satisfy this criteria. When compared to the statistical error on the DES-Y1 measurement, the GPR emulator's error is always subdominant (see the thick dashed lines in Fig. 6). We conclude from this that the accuracy of our model is high enough, given our current data, and that it should introduce no noticeable bias in the cosmological inference.

Likelihood
The GPR emulator is embedded within the cosmoSIS cosmological inference package, which allows us to evaluate the likelihood at any cosmology within the cosmo-SLICS training range, given measurements of ξ i j ± (ϑ) and N i j peaks (S/N) from the data, plus our joint covariance matrix. At the moment we can only provide predictions of the peak function for the θ ap values on which the GPR was trained, but in the future this could also be treated as a free parameter to be emulated, providing even more flexibility to the prediction code and optimisation avenues.
The predictions x at cosmology π are then compared with the data d using a multivariate t-distribution likelihood following Sellentin & Heavens (2016): This likelihood correctly takes into account the residual noise in the covariance matrix that stems from its sampling with a finite number of simulations, and reduces to the standard multivariate Gaussian likelihood when N sim → ∞. Since there are, at the very least, hundreds of peaks in each of our bins, adopting this likelihood is justified. For our first pipeline validation exercise, our choice of priors matches that of T18 in our two-point statistics-only analysis (see Table 3), allowing us to investigate the effect of replacing the fiducial (analytic) covariance matrix with our simulation-based matrix on the parameter inference. At the same time this serves to validate our simulations.
In our three fiducial analyses (2PCFs, peaks and joint), the priors reflect the parameter range probed by the cosmo-SLICS, and hence we assign a flat prior on Ω m , σ 8 , h and w 0 (summarised in Table 3). All other parameters (i.e. the baryon density, the tilt in the primordial power spectrum and the sum of neutrino masses) are kept fixed to Ω b = 0.0473, n s = 0.969 and m ν = 0.0eV, respectively. The lensing constraints on these are very weak at the moment, hence we do not expect that holding them fixed should significantly affect our results.
We finally include the same 10 nuisance parameters as in T18: a shear calibration ∆m i and a photometric redshift calibration ∆z i per tomographic bin, plus two parameters associated with the modelling of the intrinsic alignments (IA) in the non-linear alignment model (Bridle & King 2007, NLA). The latter two are not included in the peak count case for which we conduct instead a simulation-based assessment of the impact of IA. We sample the likelihood with the MultiNest sampler (Feroz et al. 2009), set with a tolerance parameter of 0.1 and an efficiency of 0.3. We refer the interested reader to T18 and Krause et al. (2017) for more details about the DES-Y1 cosmology inference pipeline. We validate our implementation with a series of likelihood sampling analyses where the 'data' is taken from the mean measurement extracted from the Covariance training set, for which the cosmology and the systematic biases are known. We detail these results in Appendix A, and compare the 2PCFs and the peaks wCDM performance on these mocks as well. We further validate our ΛCDM 2PCFs segment both against the T18 and J20 results in Sec. 5.1. It is worth mentioning here that Jeffrey & Abdalla (2019) have proposed a correction to the likelihood calculation when the model is inferred from noisy estimates, which we could have used the residual noise in our training sample had been judged too large, however this is not the case, with 1000 deg 2 of light-cone data per cosmology, times 10 noise realisation.

SYSTEMATICS
The likelihood modules within cosmoSIS are equipped with an infrastructure that allows us to define nuisance parameters and to marginalise over them. In particular, for the two-point correlation function sector, the photometric redshift errors ∆z i are included by shifting the tomographic redshift distributions, e.g. n i (z) → n i (z + ∆z i ), after which new predictions for ξ i j ± are computed from Eqs. (2 -4). Errors on the shear calibration, ∆m i , are included directly on the statistics as ξ i j ± → ξ i j ± (1 + ∆m i )(1 + ∆m j ). Finally, we include the two-parameters model of intrinsic alignments of galaxies that was used in T18. We keep the T18 priors on the IA and shear calibration nuisance parameters, but use the J20 priors for the redshift uncertainty in our fiducial analyses. These are all summarised in Table 3. Inaccuracies at small scales due to uncertainty in the non-linear physics and in the baryonic feedback are controlled with the angular scale cuts applied on ξ ± , rejecting elements of the data vector that vary by more than 2% in presence of these systematics. As pointed out in T18 (see their table 3), even with these stringent cuts, strong feedback mechanisms could shift the inferred S 8 value by more than 0.5σ.
We expand the existing cosmoSIS infrastructure to include systematics models of the peak statistics based on our simulation training sets. Specifically, we added modules to include the effect of photometric and shear calibration errors, which we parameterised by the same ∆z i and ∆m i nuisance parameters as for the 2PCFs, allowing us to marginalise coherently over these in a joint [ξ ± ; N peaks ] analysis. These two models are detailed in Secs. 4.1 and 4.2. All other sources of uncertainty are identified and controlled by removing S/N > 4 bins, which are identified from our Systematics training set as being contaminated beyond an acceptance threshold, or by verifying that they do not impact the best-fit cosmological parameters. The following sub-sections detail our treatment of these sources of systematic uncertainty for the peak count measurements; the reader hungry for results can skip ahead to Section 5.

Photometric redshifts
As there is no analytic prescription to model the effect of photometric redshift uncertainty on the peak function, we investigate its impact directly from the simulations: we infuse different shifts ∆z i in the galaxy distribution of the four tomographic bins (similar to the treatment in 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 0 0. 5   1   1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4   0 0.5 1 model-FID model-00 Figure 7. Derivative of the (log of the) peak function N peaks with respect to shifts in the mean of the galaxy redshift distribution, ∆z (upper) and in the mean shear calibration ∆m (lower). These derivatives are shown as a function of S/N bin, for every tomographic case and are computed from 10 deviation points (see main text for details). The cosmo-SLICS model-FID is in red, model-00 in black, and the error bars are obtained from the scatter over 10 realisations of the full survey. Other filter sizes yield slightly different derivatives, but exhibit a similar level of agreement between the two cosmological models. The FID derivatives are used in the cosmology pipeline to marginalise over these two nuisance parameters. the 2PCFs), generate new mock light-cones and galaxy catalogues, and compute the peak statistics from these. Our approach is similar to the linear model adopted by K16, which computes a two-point numerical derivative from simulations produced with a (single) shift in the mean n(z) by ∆z. Our model is slightly more sophisticated: we sample 10 ∆z i values in every S/N bin, drawing numbers randomly from Gaussian distributions with means of zero and standard deviations σ z given by J20 priors on the nuisance parameters (see Table 3). In the case of cross-tomographic bins, we use the mean between the two σ z values.
For every shift, we generate five light-cones and use these to cover the full survey (with the tiling strategy described in Sec. 2.6.2); we next measure N peaks (S/N) as a function of ∆z i , and fit a straight line through these 10 points in order to extract the numerical derivative ∂N peaks /∂∆z for every S/N and tomographic bin. The linear fit captures well the response to changes in ∆z i , reaching signal-to-noise ratios between 5 and 40 depending on the bin, except for the 6 th bin where the peak function and the derivatives are consistent with pure noise. We carry out this calculation for the different aperture filters investigated in this paper, but also at two different cosmological models 19 (model-FID and -00) in order to assess the stability of the derivative with respect to cosmology. The results are shown in the upper panels of Fig. 7.
We note that the results for the two cosmological models are in qualitative agreement, where the response of high (low) S/N peaks to an increased survey depth is positive (negative). This is caused by the fact that a greater depth increases the shear signal, which shifts the peak function towards higher S/N values. Some differences are observed towards the large S/N values. With model-00 being quite distant from model-FID -notably a 12% lower value of S 8 -we do not expect the derivatives to be identical, but the impact of this difference is highly suppressed by the tight priors on ∆z i j . Given the current statistical uncertainty of the measurements however, we therefore ignore this cosmology dependence, but this will need to be revisited in the future. Similarly to K16, we choose to model the redshift uncertainty by scaling the measurement in each of the S/N bins with this linear model prior to marginalising over ∆z in the likelihood inference. Namely, we compute N i j peaks (∆z i j ) = N i j peaks (∆z = 0) + ∂N i j peaks /∂∆z ∆z i j , using the derivative extracted from the model-FID cosmology, where ∆z i j = ∆z i + ∆z j /2.

Shear calibration
The uncertainty in the shear calibration is forward-modelled with a similar method, except that no additional ray-tracing is required. Instead we include a uniform (1 + ∆m i ) correction factor at the catalogue level, which multiplies every observed ellipticities in Eqs. (6) and (8). We next de-bias the peaks measurement with the original shear response S a but deliberately ignore these additional ∆m i factors, resulting in a net residual bias caused by an incorrect shear calibration, which is exactly what we wish to model. We repeat this process for 10 values of ∆m i sampled from the priors on the shear calibration errors (a Gaussian with width σ m = 0.023, see Table 3), we measure the peak function for each of these samples on the full survey, average over 5 light-cones, and fit a straight line to these points in every S/N bin and tomographic bin, allowing us to compute derivatives and model N i j peaks (∆m i j ) = N i j peaks (∆m = 0) + ∂N i j peaks /∂∆m ∆m i j .
The partial derivatives are calibrated this way for cosmological models-FID and -00 and reported in the lower panels of Fig. 7. We observe a global agreement between the two cosmologies, similar in shape to the effect of increasing the survey depth, with the amplitudes of the derivative being slightly larger towards the high S/N bins for model-FID, which is linked to its higher S 8 value. We ignore these differences in this work due to the small size of σ m relative to the statistical precision of the DES-Y1 data, and use solely the model-FID derivatives in the likelihood sampling. However, this could be easily addressed with our approach: the derivatives ∂N peaks /∂∆z and ∂N peaks /∂∆m could be estimated at our 26 cosmological nodes, from which we could train a Gaussian process emulator the same way we model our signal. This improved accuracy treatment of the derivatives will likely need to be included in future analyses.
We finally note that in contrast with the 2PCFs, where shifts in ∆m affect all scales equally, the derivative presented above exhibit a more complicated structure, caused by the fact that the m-calibration affects both the shear estimate and the noise in a non-trivial manner (see Eqs. 6 and 8).
We present in Appendix A2 a comparison between a cosmological inference in which ∆z i and ∆m i are allowed to vary, and one where these two are held fixed at zero, and notice that while no biases on the preferred parameters are observed, the uncertainty about S 8 almost doubles in the former case.

Baryon feedback
The uncertain impact of baryonic feedback on the peak count statistics has received an increasing degree of attention over the last few years (Osato et al. 2015;Weiss et al. 2019;Coulton et al. 2020). The current interpretation can be summarised as follows: radiative pressure from sustained stellar winds, combined with supernovae explosions and AGN activity combine to expel gas towards the outer regions of the haloes. These mechanisms are maximally efficient on medium-size (e.g. 10 14 M ) clusters (e.g. McCarthy et al. 2017), since light haloes generally do not host AGNs, while heavier haloes manage to keep the material inside due to their deeper gravitational wells. This redistribution of matter tends to reduce the number of high S/N peaks, and possibly augment that of smaller S/N values, however the exact size of this effect is highly uncertain and depends on the feedback model adopted. Just as for cosmic shear two-point correlation functions, its significance depends on the noise level of the data. We note that for less massive haloes, stellar feedback is also important, however this occurs at significantly smaller scales not probed by our filter. Also, radiative cooling at high redshift should produce more concentrated haloes and could enhance the lensing signal, acting in the opposite direction of the AGN feedback.
The impact on the lensing power-spectrum computed on a single redshift slice (taken to be z s = 0.97 here) of the Magneticum reaches 15% at high-, as seen in Fig. 8, an amplitude that is similar to those of the BAHAMAS simulations mentioned in Sec. 2.5, and which are consistent with the PCA constraints.
To investigate the relative impact of baryons on the peak function specifically for our analysis, we tile the full DES-Y1 survey with the Magneticum light-cones introduced in Sec. 2.5 either with the full hydrodynamical simulations or with the gravity-only solution, then evaluate and compare the peak functions. We repeat this process and average the results over the 10 pseudo-independent light-cones, each further sampled with 10 shape noise realisations. We also extend the data vector to higher S/N values in order to verify where the baryons start to play an important role. We additionally repeated the process in a nontomographic set-up, where the catalogues from the four redshift bins are merged before producing the aperture mass map and counting the peaks; this reduces the impact of the pure noise peaks to highlight subtle effects that occur in the underlying matter distribution.
The results are shown in Fig. 9 for all tomographic bins, as well as for the case where no tomography is applied. We see that the effect is generally under a percent for S/N < 4, and in the no-tomographic case reaches five percent for 4 < S/N < 5. The statistical precision is also reported on these plots, which shows that the impact of baryons is everywhere sub-dominant compared to the uncertainty on the GPR emulator. This reinforces our choice of selecting S/N 4 to ensure a measurement mostly free of uncertainty related to baryons, although this suggests that we could push the upper limit to higher S/N in the tomographic case without much contamination, and that modelling the effect could be relatively straight-forward. This follows a logic similar to Huang et al. (2020), where the impact of baryonic feedback on the DES-Y1 2PCFs is modelled and captured with a PCA decomposition, allowing them to include smaller angular scales in the data vector and increase the constraints.
We use the ratios presented in Fig. 9 to construct a multiplicative correction factor that is optionally applied to the data vector during the cosmology inference pipeline, from which we can estimate the impact of baryon feedback on the recovered best-fit parameters, as modelled with the Magneticum baryon model. We note that a similar approach is adopted in the context of a Stage-IV survey in Weiss et al. (2019)

Galaxy intrinsic alignment
The intrinsic alignment of galaxies is an astrophysical systematic signal that mimics weak lensing measurements, and arises from the fact that the intrinsic orientation of galaxies is not exactly random (see Kirk et al. 2015;Kiessling et al. 2015, for a review). Indeed, it has been shown in multiple hydrodynamical simulations that the formation of galaxies, and thus their final shape and alignment, is affected by their environment, notably by the neighbouring large scale structures (Chisari et al. 2015;Codis et al. 2015) and tidal fields (Catelan et al. 2001), and by a complex relation with their host haloes (Chisari et al. 2017). The observed galaxy shape is therefore a combination of the intrinsic (I) and the shear (G) term, which both contribute to the measured weak lensing signal.
Intrinsic alignments have been directly measured and constrained, notably in the COSMOS galaxies by Joachimi et al. (2013a), who detect the signal for early-type (e.g. red galaxies) but hardly any signal for late-type galaxies. The WiggleZ blue galaxies were also found to be consistent with no IA in Mandelbaum et al. (2011), while a significant IA signal was found in the BOSS LOWZ sample (Singh et al. 2015). Johnston et al. (2019) found similar results from the KiDS, SDSS and GAMA surveys, with a null detection from the blue galaxies and a 9σ detection for an IA signal for red galaxies, consistent with a value of A IA = 3.18 +0.47 −0.46 when interpreted in the NLA model. The IA signal has also been indirectly inferred from cosmic shear measurements, although with some dispersion in the results. For example, T18 finds a 2.5σ detection of the signal from the DES-Y1 data, with A IA = 1.3 +0.5 −0.6 , the KV450 analysis by Hildebrandt et al. (2018) found an A IA value consistent both with unity and with zero, while Hikage et al. −0.33 and 0.97 +0.29 −0.38 depending on the estimator. These A IA measurements are not expected to agree perfectly given the differences in the modelling of the IA signal, but also in the redshift, in the physical scales that are probed and in the selection of galaxies in these surveys. It is also worth pointing out that the IA nuisance parameter marginalisation is degenerate with the photo-z errors as well (see, for example, the discussion in Appendix C of Heymans et al. 2020; Wright et al. 2020). For our 2PCFs analyses, we use the same two-parameters NLA model as in T18, with priors on A IA and η listed in Table 3.
The impact of IA on peaks statistics has not been well studied in the literature so far, and a percent level calibration will require a significant level of development beyond what has been done so far. Nevertheless we present here a first step in this direction, with a measurement of the effect based on in-painting galaxies with intrinsic shapes determined by properties of the dark matter haloes contained inside the lensing light-cones. Again, the amplitude of the IA signal measured from 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 0.9 0.95 1 1.05 1 2 3 4 1 2 1 3 1 4 2 3 2 4 3 4 no-tomo peaks is not expected to be the same as for the two-point shear correlation function, largely because the physical scales and the number of galaxies involved in each estimator calculations are different. Within the NLA model of Bridle & King (2007) with A IA = 1.0 for example, intrinsic alignments can modify by up to 40% the ξ ± signal in the DES-Y1 bin 1, 20% in bin 2, but only about 5% in bins 3 and 4. Even if the effective A IA increases with redshift (see the discussion in Appendix A of Robertson et al. 2020), the lensing kernels of the GI and II terms are suppressed compared with the GG signal. Considering the IA model of Fortuna et al. (2020), we note that the IA effect is more significant at small physical scales, which are only well resolved at low redshift. For these reasons, we choose to only model the peaks IA signal in the low redshift bins, more precisely on bins 1, 1∪2 and 2. This likely leaves residual, unmodelled GI and II terms present in the bins 1∪3, 1∪4, 2∪3 and 2∪4. Within the NLA model the II term is completely sub-dominant in the cross-tomographic bins and can be neglected. We therefore estimate the impact of unmodelled GI contributions in an analysis variation in which we remove all cross-tomographic bins. As it will become clear in Sec. 5.2, this is currently a limiting factor in our data analysis, which we will seek to improve with a better IA model in the future.
Our IA model is inspired from the methods of Heymans et al. (2006) and Joachimi et al. (2013b), which assign an intrinsic ellipticity int to the galaxies based on the shape of their host dark matter halo (we summarise this method and detail our implementation in Appendix B). The model requires both light-cone haloes and in-pasted galaxies, two intensive post-processing steps that have not yet been completed on the cosmo-SLICS. It is therefore not possible at the moment to explore this in a cosmology-dependent manner. Instead, we use 26 light-cones from the KiDS-HOD galaxy sample described in HD18, which have these properties and have been downsampled to closely match the N(z) in the four DES-Y1 tomographic bins. These also cover 100 deg 2 each, and since we are only interested in the relative effect, we do not tile the full footprint and work instead directly on the light-cone galaxy samples. The effect of masking is hence not included, but it should equally impact the measurements with and without IA in these mocks.
For every galaxy, the model outputs int ; this quantity is then inserted in Eq. (1) alongside a randomly rotated version, from which we compute observed ellipticities with or without IA. We finally run our peak finder on these catalogues, count the peaks as for the other training sets, and compare the measurements in Fig. 10. We observe an important (10-15%) suppression of the number of peaks with S/N > 3, which clearly exceeds the statistical uncertainty in our measurement and therefore needs to be accounted for. Moreover, in the top two panels, peaks at all S/N values are suppressed by a few percent. We note that the results from Fig. 10 align well with those found in K16 (see their figure  C3), even though a direct comparison is impossible due to differences in the source distribution of the samples. When examined with 2PCFs measurements, we find that our IA prescription is bracketed by the NLA model with A IA ∈ [1.0 − 2.0], providing a consistent but slightly larger IA signal than that preferred by the data (see Appendix B).
We recognise that our simple IA model is unlikely to represent accurately the real physical effect, and at the moment has no free parameter, which means that we cannot yet marginalise over different IA strengths like we do for 2PCFs. Further development on the IA model will be required to achieve this in the future. However, we can get a sense of the impact of IA by infusing the relative effect observed in Fig.  10 in the model returned by the emulator, and record the deviation from the no-IA case on the best-fit parameters. It turns out that this results in a ∼0.1σ shift on the cosmological parameters, hence we do not include it in the fiducial peak count pipeline.

N-body force resolution
A large fraction of the weak lensing signal receives a contribution from small scales, which are difficult to model accurately. Even in simplified gravity-only scenarios, different methods and codes to estimate the amount of small-scale structures vary significantly for wave vectors larger than k = 1. function statistics, but these disagree sometimes at the 5-10% level, depending on the scales, redshifts and cosmological parameters. Recent efforts approach the 1% accuracy on the matter power spectrum (Euclid Collaboration: Knabenhans et al. 2019;Mead et al. 2020), at least for a subset of the cosmological parameter space. It is possible to protect the shear 2PCFs measurement against residual non-linear uncertainty by rejecting angular scales in ξ ± (ϑ) where differences between these models affect the cosmological inference beyond some threshold 20 . This is one of the main drivers, along with baryon feedback, for the choice of angular scales in the T18 2PCFs analysis (and hence ours).
For weak lensing probes beyond two-point statistics that are calibrated directly on simulations however, one must additionally understand the impact of small-scale smoothing caused by limits in the mass/force resolution of the N-body code. Specifically, higherresolution simulations better resolve the highly non-linear, small-scale structures that describe the concentrated inner regions of massive clusters, which are responsible for the high SNR peaks. Therefore, degrading the mass resolution directly affects both the high-k limit of the matter P(k) and the number of large lensing peaks, leading to a potential mis-calibration.
To assess this, we rely on the SLICS-HR simulations introduced in Sec. 2.4, in which the force resolution has been significantly augmented, thereby resolving scales almost an order of magnitude smaller. A comparison between the peak function of the SLICS-HR measured on 10 realisations of the full DES-Y1 footprint and that of our main Covariance training set is presented in Fig. 11; it reveals slight differences for the S/N peaks we are probing, but strong deviations are observed for peaks with S/N > 4. These objects are rare, which explains the increased noise in the ratio towards S/N = 5, but the trend is clear: there is an overall shortcoming of large S/N objects in the training set compared to the SLICS-HR, which justifies our choice of restricting the data vector to S/N 4. Upon closer examination, the average 20 There is a caveat to this argument, see the discussion in Asgari et al. (2020b) about the small k-scale power leaking into ξ ± (ϑ) to some level at all ϑ. size of the small deviations seen in that range correspond to no more than 20% of the statistical error, and are never higher than 0.5σ stat . If we wanted to include these peaks in a future analysis, we would possibly need a new generation of training sets (for cosmology, and possibly covariance) with an increased mass resolution. Just as for IA and baryons, we investigate the impact of mass resolution by extracting a correction factor from the black curves shown in Fig. 11, which we consequently apply to the signal during the cosmology inference.

Source-lens clustering
One of the key differences between the mock DES-Y1 simulations from our Cosmology training set and the DES-Y1 data is the presence of source-lens coupling. In real data, the source density is not homogeneous and in fact increases around foreground clusters. As explained in Appendix C of K16, this introduces a coupling between the peak positions and the amplitude of the measured shear relative to the expected shear -the sources that are associated with the cluster dilute the observed signal. Furthermore, these regions of high galaxy densities will have a larger blending rate, meaning that source galaxies behind clusters are more likely to be missed, while residual errors in the photometric redshift can wrongly assign cluster members to background sources. When combined, these effects can result in a mis-calibration between the data and the simulations, which can be corrected with a 'boost factor' (Mandelbaum et al. 2005).
Boost factors for peak statistics can be evaluated in different ways. Figure 14. Impact on the peak statistics of various sources of systematic uncertainty (IA, mass resolution, baryons and boost), presented as a ratio between the measurement on mocks with and without the effect. These are used as optional correction factor applied to the model in the cosmology inference, as described in Sec. 5.2. Also shown is the scatter in the GPR cross-validation test, as well as the statistical precision on the measurements.
K16 estimate the fractional over-crowding and over-blending rates in peaks of different S/N from the Balrog catalogue (Suchyta et al. 2016), a separate image simulation that matches the DES-SV n(z) and blending properties. These rates are computed as a function of distance to peaks centres, and a correction factor is used to correct the peak function found in their cosmological training set as a function of S/N. They found that by restricting their measurement to S/N < 4, the impact is minimal (a shift in S 8 of about 0.01) and could be neglected. Shan et al. (2018) instead use the Radovich et al. (2017) cluster catalogue that overlaps with the KiDS-450 survey, and evaluated the boost factor from the excess source density around these massive objects. They found that the contamination to the peak function reaches 27% for peaks with S/N = 5, however it caps at less than 6% for S/N < 4. In their analysis, this effect of source-coupling is about twice the size of that of their baryonic feedback model, and acting in the same direction, e.g. suppressing the number of high S/N peaks. If overlooked, this mis-calibration could lead to a best-fit inference with a S 8 that is too low.
We account for source-lens coupling by estimating the effect in the DES-Y1 data and recalibrating our measurements, leaving the simulations unchanged. We first extract w i j data (ϑ, S/N), the clustering of source galaxies along the line-of-sight of peaks identified in the data, as a function of their peak height, for each combination of auto-and cross-tomographic bins 'i j . These are next compared with a similar measurement carried on ten survey realisations sampled from the Cosmology training set at the fiducial cosmology, and the ratio of the two reveal clustering profiles in excess of random: which are shown in Fig. 12 for a sample of tomographic bins and S/N bins. The brackets refer to the mean measurement in the above expression, and the right term involving N data/sim peaks normalises the profiles. It is clear from this figure that the largest peaks are generally more severely affected by this boost factor correction, however the size of effect varies across redshift in a non-trivial manner. For example, the fourth tomographic bin is less affected than the second or the third. In absence of source-lens coupling these profiles would be flat. The excess of galaxies in these profiles are for the most part cluster members; their shapes are therefore not sheared by the foreground matter over-density and only dilute the lensing signal. We compensate for this by up-weighting the shear signal following the profile, which is most efficiently done by modifying the filter Q(r) in identified peaks as: and re-evaluate the peak height with Eq. (7). Fig. 13 shows the ratio between the corrected and original peak function in the four autotomographic bins (similar results are obtained with the cross-bins). The effect if generally small, however it approaches the 1σ level in some Table 3. Priors used in the likelihood sampling. The ranges for the four cosmological parameters are determined by the cosmo-SLICS simulations, the photometric redshift ranges and priors are taken from the DIR errors found in J20, while those of the shear calibration and intrinsic alignments are taken from T18. Gaussian priors are characterised by a mean and a standard deviation (µ, σ).

Parameter range prior
Cosmology isolated data elements. The boost factor is included in our fiducial analysis, and has been applied to the data points shown in Fig. 4, bypassing the need to forward-model the source-lens coupling at all cosmological points. Following the method used in the previous sections, we isolate its impact on our cosmological inference by optionally removing this correction factor. Fig. 14 summarises all the correction factors we can include in our cosmology inference, from IA, mass resolution, baryons and no-boost. Our fiducial analysis includes none of them, since their overall impact is relatively minor and many of these effects act in opposite directions.

Cosmology inference
We test our cosmology inference pipelines on mock data vectors taken to be the mean value from the Covariance training set, providing a measurement that is almost noise-free. We present our results in Appendix A, notably in Fig. A1. This exercise reveals a high degree of similarity in the constraints between the two-point functions, the peak statistics and the fiducial T18 analyses, despite major differences in our covariance matrix estimation techniques and in the observation data vector (DES-Y1 data vs SLICS, 2PCFs vs peaks). The best-fit cosmological parameters are also consistent with the input at the 1σ level, with no noticeable shift between the probes, and the sizes of all contours closely match that of the DES-Y1 analysis, two properties that respectively validate our cosmology calibration and our covariance matrix.

Others sources of uncertainty
Our method relies on a certain number of well-justified approximations, which could potentially contribute to the error budget in addition to the systematic effects described above. In this section we introduce these effects, and justify our choice to neglect them in this study.
Our simulated light-cones are constructed with mass planes of constant thickness set to 256.5 h −1 Mpc (see Sec. 2.6.1), a choice which has an effect on the reconstructed lensing planes compared to a construction made of hundreds of finer shells. This has been recently quantified in Zorrilla Matilla et al. (2020), where it is shown that the impact on peaks with S/N 4 is below the one percent level, regardless of the thickness and of the source redshift.
Correlations between the mass planes in our light-cones are explicitly suppressed by randomly shifting and rotating the mass sheets, which breaks the long line-of-sight correlations that exist in the data. It was shown in Takahashi et al. (2017, see their Appendix B) that this affects the projected power-spectrum, reducing the power at intermediate scales by a few percent on the sheets, however, the lensing kernels project an even larger volume and mixes these scales, which makes our measurements relatively immune to this.
The lensing plane construction has been carried out under the Born approximation (see HD18), whereby light bundles record the convergence and shear along straight lines and ignore the deflection angles in this calculation. It has been shown (Hilbert et al. 2020) that the difference between these methods induces variations smaller than 0.2% on the lensing power spectrum up to scales of = 2 × 10 4 . It is therefore reasonable to expect that Born approximation plays a minor role in the peak statistics as well, however Castro et al. (2018) find that the impact on the PDF of the lensing maps is of a few percent. We ignore this effect at the moment, but it will need to be revisited in the future.
Finite box effects are also known to plague the estimation of 2PCFs and of their covariance matrix (Harnois-Déraps & van Waerbeke 2015), being sensitive to Fourier modes larger than the simulation box. This has an impact on the ξ ± covariance matrix estimated from our Covariance training set, however it has a minor impact on the cosmological contour, as shown by the good match between our analysis on the mocks and that of T18. Furthermore, since the peak statistics measure quantities in local apertures, it is not sensitive to these large scales, and hence are protected against this. Similar conclusions can be drawn regarding the incomplete account of the super-sample covariance (Li et al. 2014, SSC hereafter) captured by our simulations: HD19 found that the SLICS light-cones capture more than 75% of this SSC term, yielding two-dimensional constraints on (Ω m , σ 8 , h, w 0 ) that match to better than 10% those of an analytical covariance matrix. Given the suppressed sensitivity of the peak statistics to these large-scale modes, we expect the residual missing SSC contribution to play a minor role on the full uncertainty, although this may need to be better quantified in the future.
It has been recently shown that the depth variability across a lensing survey can impact the cosmic signal and variance (Heydenreich et al. 2020b;Joachimi et al. 2020). This will need to be the subject of future investigations. Given the findings of Joachimi et al. (2020), we expect the impact of unmodelled variable depth to be negligible, given the statistical power of DES-Y1.

RESULTS
We present in this section the results from our cosmological inference analyses, beginning with the 2PCFs and the peak statistics pipelines. We next discuss our tests on the importance of various systematic effects, before introducing the results from our joint [ξ ± ; N peaks ] analysis. All quoted parameters constraints correspond to the best-fit value ± the 1σ region of the marginalised posterior. A s ∈ [0.5 − 5.0] × 10 −9 Analytic Stacked PDF J20 ln 10 10 A s ∈ [1.5 − 5.0] Analytic DIR Table 5. Cosmological pipeline comparison. The values used in the T18 comparison are taken from their Table 3, using a fixed neutrino mass density. Details on the different pipelines are summarised in Table 4. Most posteriors on h and w 0 are prior-limited, so no constraints are reported here. The same applies to Ω m in many cases. Tests on the mock data are presented in Appendix A.

2PCFs
We first report in Fig. 15 the constraint on ΛCDM parameters obtained from our 2PCFs measurement, over-plotted with those from T18 21 (red) and J20 (grey). Our constraints (in blue) assume the DES-DIR n(z), it uses our simulation-based covariance matrix, we marginalise over the 10 nuisance parameters listed in Table 3, but the cosmological sampling follows that of T18. Another difference: as described in Sec. 3.1, our ξ ± measurements are obtained from the weighted mean ξ ± obtained over the 19 tiles. As demonstrated in J20, the net effect of changing from the fiducial DES-Y1 n(z) to the DIR n(z) is to shift the amplitude of the modelled 2PCFs signal upwards, which translates into lower best-fit S 8 values. This can be seen by comparing the one-dimensional posteriors shown with the red and grey lines in the bottom right panel. When analysed this way, we obtain: S 2PCFs,Λ 8 = 0.761 +0.027 −0.027 . The priors and the parameter sampling in J20 are significantly different from T18 and are responsible for some of the differences between the blue and grey curves, notably the sharp edges in the h posterior, and the more elongated contour in the [σ 8 − Ω m ] plane. All parameter constraints are summarised in Table 5. Replacing the ξ ± data extracted from the tiles with those measured on the full footprint results in a negligible change, with S 8 = 0.762 ± 0.026, thereby validating our  Figure 15. Constraints on the ΛCDM cosmological parameters inferred from the 2PCFs, obtained from the DES-Y1 cosmology inference pipeline, our simulation-based covariance matrix and assuming the DIR n(z) (blue). These results are compared to the ΛCDM constraints from T18 (in red) based on the same modelling and likelihood sampling strategy, and to those of J20 (grey), which also use the DIR redshift distribution but adopt different modelling, prior ranges and likelihood sampling choices.  . Constraints on the wCDM cosmological parameters inferred from the 2PCFs with our fiducial pipeline (grey), from the T18 wCDM analysis (red) and from an intermediate pipeline, the DIR-wCDM, which uses the T18 parameter sampling and prior ranges on our tiled data with our simulation-based covariance matrix, assuming the DES-DIR n(z) (blue). Note that priors on h and w 0 are significantly tighter in our fiducial pipeline, and that our fiducial 2PCFs analysis hits the priors (shown with the dashed lines) on these two parameters. mosaic methodology. Additionally, the resemblance between our confidence interval and those of T18 (the fractional errors on S 8 and Ω m agree to within 0.002 and 0.005, respectively) demonstrates the accuracy of our simulation-based covariance matrix. The constraints on the nuisance parameters are mostly prior-dominated. We provide a more complete comparison between T18, J20 and our 2PCFs ΛCDM analyses in Appendix C.
We next compare in Fig. 16 the constraints on the four wCDM parameters inferred from our 2PCFs measurement (in grey), to the T18 wCDM results (in red). As explained previously, there are multiple difference between these two pipelines, which we can dissect here. We show (in blue) an intermediate pipeline, labelled the DIR-wCDM, which uses the T18 parameter sampling and prior ranges, but assume the DIR n(z), and uses our measurement on the tiled data and our simulation-based covariance matrix. Table 4 summarises the differences between these pipelines. By construction, differences between the blue and the grey contours are caused exclusively by the likelihood sampling strategy: the former uses the T18 priors and samples A s over a flat prior, whereas the latter uses those listed in Table 3 and samples σ 8 . In contrast, red and blue curves share the signal modelling as well as the parameter sampling, but differ in the n(z) (which shifts down the best S 8 and Ω m values, clearly visible in Fig. 16), and in the covariance matrix (which weights slightly differently the various elements of the compressed statistics and therefore affects the size and shape of the contours).
In our fiducial analysis the likelihood sampler hits the priors on h and w 0 , which are limited by the range of values probed by our Cosmology training set. We note, however, that this should have no impact on our analysis due to the low sensitivity of lensing to these particular parameters, and that we marginalise over these anyway. Consequently, we are unable to report constraints on h and w 0 in our 2PCFs pipeline with the current data 22 . The best-fit parameters are reported in Table 5, notably: S 2PCFs,w 8 = 0.753 +0.043 −0.043 , which is consistent with, but has a larger uncertainty than S T18,w 8 = 0.791 +0.031 −0.044 reported in T18. The overall precision on the matter density is similar to that of the ΛCDM analysis (we measure Ω m = 0.254 +0.033 −0.056 ), while the uncertainty on S 8 increases significantly, as expected from opening the parameter space. When adopting the DIR-wCDM pipeline, we obtain: −0.037 , which best inferred value aligns with our 2PCFs analysis, with error bars slightly tighter. The fact that the T18 constraints (4.7% on S 8 ) are tighter than these (5.3%), despite having larger uncertainty in their redshift distributions, indicates that the T18 priors and sampling strategy are informative about S 8 to some level, artificially decreasing the size of the reported error bars (see the discussion on informative priors in Joachimi et al. 2020).

Peaks
We report in Fig. 17 the results of our peak count analyses for models in which systematics are infused. All data presented from now on are obtained from the 12.5 arcmin filter; we investigated other choices of θ ap that yielded slightly less constraining results and hence we dropped them from the analysis.
Before examining differences between the various cases, we first  Figure 18. Constraints on the four wCDM cosmological parameters obtained from 2PCFs (grey), peaks (blue) and from the joint analysis (red). The dashed lines indicate the prior ranges on Ω m , h and w 0 , which are in most cases too narrow to provide meaningful constraints on these parameters (see main text for exceptions).
note that, for all of them, the inferred matter density values are unexpectedly high, with Ω m 0.4 at the 1σ level. This is in tension both with the 2PCFs measurements and with external probes such as the Planck measurement of Ω m = 0.3153 ± 0.0073 (Planck Collaboration et al. 2018, see also a quantitative assessment of these tensions in Appendix A3). In the case where the cross-redshift bins are excluded however, the tension is significantly reduced. Interestingly, this is also the only case where the GI part of the intrinsic alignment signal is suppressed, which suggests that unmodelled IA systematics could be artificially pushing the likelihood analysis towards high values of the matter density. We therefore adopt a conservative approach and remove the cross-redshift bins from our fiducial analysis. We still include them in one of our analysis variations, with the caveat that they might be contaminated by an unaccounted secondary signal. In order to provide the most accurate account of the other systematic effects, we include the cross-redshift bins in their measurements, shining light on their impact with the highest precision available. Starting with the IA, and recalling that our alignment model applies to tomographic bins 1, 1∪2 and 2 only with no free parameter, we include the systematic effect as an optional correction to the measured data vector (see Sec. 4.4 and Fig. 14). The cosmology inference results, shown by the black dashed lines in Fig. 17, indicate that our low-redshift IA model has a relatively mild impact, affecting the best-fit S 8 value by less than 0.1σ. Notably, we have: S peaks 8,IA = 0.735 +0.024 −0.032 , which is almost unchanged compared to the baseline analysis: S peaks 8,cross−tomo = 0.737 +0.027 −0.031 . We note that the best-fit Ω m increases by 0.02. This is consistent with the expectation that IA overall suppresses the lensing signal; therefore, given a measurement (2PCFs or peaks), the inferred S 8 and/or Ω m values increase with stronger IA model.
As mentioned earlier, the GI term, that we are currently unable to model accurately, is most likely to impact the cross-tomographic bin measurements. The fact that the best-fit S 8 shifts up by as much as 0.037 (and Ω m shifts down by 0.02) when removing the cross-tomographic bins suggests that unmodelled systematics affect these bins differently than the auto-tomographic bins. The GI contribution does exactly this, and a similar 1σ effect is detected in T18 (see their Fig. 9). It is therefore plausible that unmodelled IA could be leading to the observed preference for a high Ω m values when including the cross-tomographic bins, however we postpone a more robust analysis of IA to future work.
Adopting a similar strategy, we apply the baryon correction factor (see Sec. 4.3, and shown in Fig. 14), and re-run our cosmological inference pipeline. The impact is stronger than the IA, yielding: S peaks 8,baryons = 0.750 +0.026 −0.031 , a 0.5σ shift in S 8 visualised by the red dashed lines in Fig. 17. When correcting the signal for a possible bias due to limits in mass resolution of the Cosmology training sample, the values of S 8 is slightly reduced as expected, with: This demonstrates that our choice of S/N bins is unaffected by the known small-scales inaccuracies inherent to the N-body runs.
Removing the boost factor correction can generally push the inferred cosmology to models with lower clustering, however in our case, the effect is marginal. We find S peaks 8,no−boost = 0.736 +0.025 −0.032 , a negligible change with respect to the fiducial pipeline.
Overall, when including all redshift bins, we observe that the baryons and IA cause the largest shifts. Excluding the cross-redshift terms has the strongest impact, as it significantly offsets the best-fit value and degrades the constraining power. Nevertheless, the contours are fully consistent with the fiducial case, and this choice is necessary until an improved high-redshift IA model can be incorporated.
The fiducial constraints from the peak statistics are presented by the blue contours in Fig. 18 and compared to the 2PCFs (in grey, same as in Fig. 16). The marginal constraints on S 8 are: These values are in excellent agreement with those reported by J20, T18 and with our 2PCFs method, with marginal constraints on S 8 being consistent to within 1σ in all cases. More importantly, the peak count analysis improves this fractional precision by 10% compared to our fiducial 2PCFs pipeline, providing a 4.8% measurement on S 8 . Including the cross-tomographic bins results in S peaks 8 = 0.737 +0.027 −0.031 , which improves the constraining power to a ∼3.9% measurement (see the pipeline 'cross-tomo' in Table 5). Although these are excluded from the fiducial analysis due to unaccounted for systematics, this demonstrates that even with the current noise level and in presence of systematics, some 20% additional information on S 8 can be recovered by including these bins compared to the fiducial case, potentially bringing the gain to 30% over the 2PCFs analysis. This aligns with the findings of Martinet et al. (2020), which observe a 50% gain in a systematicsfree Stage-IV weak lensing survey setup, using five tomographic bins and including cross-terms for all possible combinations of slices (i.e. larger than pairs). We finally note that the posterior on Ω m overlaps with the upper prior edge within 2σ, while those on h and w 0 are completely prior-dominated, hence we do not report constraints on these parameters.

2PCFs+Peaks
The fact that both 2PCFs and peaks count methods individually prefer different values for the cosmological parameters is expected, given that they both probe slightly different physical scales, and react to the DES-Y1 shape noise in completely different ways. This is further supported by our measurements on the mean of the SLICS mocks presented in Appendix A, which correspond to noise-free data and show an excellent agreement in the best-fit values. We note that similar results have been observed in the literature already, notably when comparing the inferred cosmology from the 2PCFs and power spectra analyses of the HSC-Y1 data (Hamana et al. 2020;Hikage et al. 2019) and of the KiDS-1000 data (Asgari et al. 2020a), where it was shown that some fluctuations are expected and statistically consistent. We show with mock surveys in Appendix A3 that the observed difference of ∆Ω m = 0.2 is frequent, and that the tension between the two methods is low, enabling us to carry out a joint 2PCFs+peaks likelihood analysis. For consistency, our fiducial case again excludes the cross-tomographic redshift bins in the peak count data, and results in a best-fit value of: −0.038 , with contours presented in red in Fig. 18. The best-fit value in this case is consistent with both the 2PCFs and the peak statistics, with a ∼20% smaller fractional uncertainty on S 8 . The joint analysis achieves 4.6% precision, fast approaching the 3.8% precision achieved by the DES-Y1 3×2-pts analysis (Abbott et al. 2018a). We note that the same 20% increase in precision is reported in M18 when comparing peaks and 2PCFs. However, their treatment of the systematics is simpler in comparison to this study, which likely affects their real precision gain. The analysis variant in which the cross-tomographic terms are included yields a promising 3.2% measurement, with: S joint 8,cross−tomo = 0.743 +0.024 −0.024 , however a better modelling of systematics is required before we can exploit this configuration.
We repeated the full cosmology joint-probe inference with two other aperture filter sizes, and obtained similar constraints, which leads us to the conclusion that given the current level of noise in the data, the choice of aperture filter size does not make a large difference, at least for the scales we tested. It is also clear that under these conditions a multi-scale analysis would bring no additional information. M18 found a mild improvement when combining the 12.5 and the 15.0 filters, but that was in the absence of tomography. In that case, the noise levels are suppressed at the cost of losing the redshift information.
The dark energy equation of state can be constrained from the joint analysis, which has sufficient constraining power to not be priordominated 23 . We obtain: at 1σ, which is consistent with ΛCDM cosmology but not yet competitive with T18, who find w T18 0 = −0.92 +0.35 −0.40 . As shown in Martinet et al. (2020), the cross-tomographic data points will greatly help with this measurement once properly calibrated.
The overall goodness-of-fit is evaluated with the p-value, which can be interpreted as the probability that the model is not consistent with the data. This null hypothesis-rejection method depends notably on ν eff , the effective number of degrees of freedom. This quantity is often estimated from the difference between the number of data points and the number of free parameters in the model, however this does not exactly hold in the more accurate context where these model parameters are not totally free but sampled from priors that often restrict the search, and are highly degenerate. For example, the KiDS-1000 cosmic shear analysis presented in Asgari et al. (2020a) samples 12 parameters in their cosmological inference pipeline, but a careful study of the sampling reveals that the effective number of free parameter is closer to 4.5 ).
In the current analysis, the size of the data vector varies from 48 (peaks fiducial) to 305 (peaks + 2PCFs). The χ 2 for the three fiducial analysis pipelines, at their best-fit cosmologies, are respectively χ 2 peaks = 71, χ 2 2PCFs = 209 and χ 2 joint = 463. If, following Joachimi et al. (2020), we also use 4.5 free parameters, this leads to goodness-of-fit p-values of 0.005, 0.978 and 0.084, respectively. The p-values for the peak count analysis is low, indicating possible residual tensions that we could not completely identify nor resolve. We noted however from Fig. 4 that the data in first bin has a high level of scatter compared to the size of the error bars, which is partly responsible for dragging the p-value towards low values. When removing these points from the analysis, we obtain a p-value of 0.105, which is higher than our goodness threshold of 0.05. We decided to keep the bin data in our fiducial analysis pipeline, even though it carries more noise than signal.

CONCLUSIONS
We analyse the DES-Y1 cosmic shear data with a cosmology inference pipeline exclusively calibrated on suites of wCDM weak lensing numerical N-body simulations. Our method is general and can be directly used with many non-Gaussian probes, however we opted for the tomographic lensing peak count statistics. Our pipeline interfaces with the two-point functions via the public inference code cosmoSIS, which allows us to conduct a joint [ξ ± ; N peaks ] analysis. We model the peak statistics signal with the cosmo-SLICS, the covariance with the SLICS and investigate the key cosmic shear systematics either by infusing them in our training set, or by extracting their impact from tailored external mock data. Notably, the impact of baryons is assessed with the Magneticum hydrodynamical simulations, the effect of finite particle mass is quantified from high-resolution light-cones, while the impact of intrinsic alignment is investigated by infusing intrinsic shapes to mock galaxies following a physical model based on the shape of dark matter haloes. Source-lens clustering is also studied by comparing galaxy excess in peaks of different height, and found to have a negligible impact on our results given our peak selection criteria.
We validate our method on mock data vectors and against the DES-Y1 ξ ± analyses of T18 and J20, which we reproduce well given differences in our pipelines. We identify a residual unknown systematic in the cross-tomographic redshift bins of the peaks data, which appears to shift the inferred cosmological parameters towards high Ω m values. We identified the intrinsic alignment GI term as one possible and likely cause of this effect, and in absence of an accurate IA modelling, we decided to remove these redshift bins from our fiducial analysis.
We perform a joint likelihood analysis and set constraints on S 8 , finding a ∼20% gain in precision compared to the correlation function analysis. The joint analysis yields a 4.6% measurement, with S joint 8 = 0.766 +0.033 −0.038 , approaching the 3.8% measurement reported by the DES-Y1 3×2-pts analysis (Abbott et al. 2018a), and closely approaching the 2.9% S 8 measurement of Asgari et al. (2020a) with the fourth KiDS data release. We show that after an improved calibration of the cross-tomographic terms, the peak count method can achieve 3.9% on S 8 , and the joint analysis could reach 3.2%, one of the tightest measurements of this parameter so far. One possible caveat is that we include no IA modelling in our cross-tomographic analysis, and that its inclusion could possibly degrade our constraining power.
Our analysis pipeline builds from the infrastructure presented in M18, and is inspired by many aspects of the K16 data analysis. We have significantly upgraded the underlying simulation support, we include a better treatment of the photometric and shear calibration uncertainties, we improve the inference method with a full integration within a likelihood sampler, and we demonstrate the robustness of our results to uncertainty on baryonic feedback, to intrinsic alignments, to sourcelens clustering and to limitations from the finite mass resolution of the N-body runs, which are lacking in previous analyses. Additionally, this paper is the first tomographic peak counts data analysis, and we confirm that including cross-redshift bins reinforces the constraints, once secondary signals such as IA are properly calibrated.
For the peaks statistics to remain competitive with the 2PCFs, further development will be required in the modelling of baryon feedback. For example, a 20% improvement on S 8 is obtained from the DES-Y1 3x2pt data when modelling and marginalising over the impact of baryons, as it enables to include additional small-scale elements in the ξ ± data vector (Huang et al. 2020). A similar gain is observed in the cosmic shear-only DES-Y1 re-analysis by Asgari et al. (2020b), where clean small scales are included via the COSEBIs estimators. Equivalent procedures with peaks statistics must be investigated, which would allow us to push back the S/N 4 limit. The 'baryonification' method described in Schneider et al. (2019) is one possible avenue to achieve this, as well as the direct training on a variety of hydrodynamical lightcones, although the latter is a more expensive task and introduces several other uncertainties, e.g. how one changes the sub-grid physics for different cosmologies. The goal here would be to construct a response model in order to include the effect as a nuisance parameter in a cosmology inference analysis.
Intrinsic alignment of galaxies is the second topic where further development is critical, and improving the IA modelling is mandatory for future analyses. The model we adopt here is too simple, and we demonstrate that IA likely impacts the inferred cosmology at the same level as the baryons and potentially up to 1σ level, making it one of the primary limiting factor in our analysis. A better approach would involve a suite of dedicated training samples where the modelling and the levels of IA can be adjusted, mimicking the varying (A IA , η) parameters that control the amplitude of the secondary signal within the NLA model. One could also relate the simulated intrinsic galaxy shapes with the halo-model based approach of Fortuna et al. (2020) directly from the HOD prescription. Even better, these approaches could be linked, allowing us to marginalise coherently over a unique set of common astrophysical parameters.
All of the improvements presented here further close the gap between analytical and simulation-based cosmological inference tech-niques. While the former is preferred in cosmic shear analyses performed by the large weak lensing collaborations, the latter is required by most measurement methods based on non-Gaussian statistics, for which a theoretical model of the signal and of the covariance matrix is generally not available. Given that the performance of non-Gaussian estimators is now clearly shown to significantly exceed that of two-point correlation functions, and that this trend will intensify in future surveys Zürcher et al. 2020;Martinet et al. 2020), we strongly advocate for a co-development of both approaches.
Most of the work that has been put into the development of our method can now be directly exploited with most of the alternative statistics mentioned in the introduction: one only needs to measure the desired statistic on our training sets (Cosmology, Covariance and Systematics), re-train the GPR emulator and adjust the linear models that describe the photometric redshifts and the shear calibration. The rest of the infrastructure is already in place, and we intend to explore some of these alternative probes in the near future. For this reason we will make the mocks available upon request. We also intend to explore some extensions to the existing infrastructure, including for instance a new Neutrino training set derived from the MassiveNuS simulations (Liu et al. 2018) and designed to measure the sum of the neutrino mass. Parallel pipelines tailored for the analysis of the KiDS-1000 and/or the HSC surveys could also be constructed directly, and we hope to see these novel methods take an important role in the upcoming Stage-IV lensing analyses.  Figure A1. Two-dimensional constraints on the wCDM cosmological parameters (Ω m , w 0 , S 8 , h) from 2PCFs (blue) and peak count analyses (grey) of the mock DES-Y1 data corresponding to the mean SLICS values. The red contours show the fiducial T18 results wCDM for reference. The dashed lines indicate the input SLICS cosmology. The inferred S 8 value appears lower than the input value, but this is a plotting artefact; maximum a posterior (MAP) value is S MAP 8 = 0.793 and 0.785 for 2PCFs and peaks respectively, which are very close to the input of 0.813 (see main text for more details).
to those obtained by the wCDM analysis of T18 (blue vs. red), with a best fit value that is consistent with the input cosmology, albeit with a slightly lower S 8 value. This shift is caused by our parameter sampling, more precisely by the upper limit on w 0 . A closer look on the [w 0 − S 8 ] panel reveals that these two quantities are degenerate, and that the w 0 posterior is limited by the prior, especially on the upper bound. Had we access to higher values, the contours would extend further in the upperright direction, bringing centroid to higher S 8 values. This is further supported by the fact that the maximum a posterior (MAP) value is located at S MAP 8 = 0.793, which is very close to the input of 0.813. We next present in Fig. A2 the two-dimensional marginal constraints on the wCDM parameters (Ω m , h, σ 8 , S 8 , w 0 ) and on the 10 nuisance parameters used in this analysis, which are associated with photometric uncertainty (4 × ∆z i ), shear calibration (4 × ∆m i ) and intrinsic alignments of galaxies (A IA and η). The blue contours are obtained from our 2PCFs inference pipeline, executed on the mean 2PCFs extracted from the SLICS and using the SLICS covariance matrix. We observe that the constraints on the four cosmological parameters are well centred on the input cosmology, and the sizes of the contours are fully consistent with those of T18.
We also note that our pipeline accurately recovers the input (null) amplitude of the intrinsic alignment in the SLICS, which also shows a degeneracy with S 8 , whereas the data prefers values closer to A I = 1, as reported in T18 and seen in the red contours. The fact that the posterior of the η parameter is almost identical in the data and in the IA-free simulation indicates that the evidence for a redshift evolution in the data is not strong. In fact, it instead suggests that the measurement of this parameter is sensitive to something else, present in both simulations and data (see the discussion on degeneracies between IA parameters and photometric redshift nuisance parameters in Appendix C of Heymans et al. 2020). All other nuisance parameters are prior dominated, and we observe a clear difference between our nuisance priors and those of T18, especially for the photometric bias; the DIR method has a smaller error, which reflects here in smaller contours for all ∆z i .

A2 Inference from peaks
We next run our peak statistics wCDM inference pipeline on the same data vector as in Appendix A1, e.g. mean of the Covariance training set measurements. The mean peak function is presented by the black symbols in Fig. A3, showing the noise-free signal which lies well within the range covered by the Cosmology training set. Again, in this validation exercise, the priors on the nuisance parameters are centred on zero, and we marginalise over the 8 nuisance parameters (4 × ∆z i , 4 × ∆m i , the two IA parameters are not included in the peaks analysis). We next feed this data vector into our cosmoSIS pipeline, and confirm in Fig.  A1 that we recover the same cosmology as the 2PCFs analysis. Once again, the S 8 is slightly lower than expected, which can be explained by the asymmetric sampling of w 0 that causes a bias in the inferred contour plots. The maximum a posterior value is well within 1σ, with S MAP 8 = 0.785. We recover similar contours on Ω m and S 8 from peaks and 2PCFs. We finally repeated the exercise for the case where ∆z i and ∆m i are set to zero, and notice that the error on S 8 is reduced by almost a factor of two, which clearly indicates the importance of using accurate informative priors on these nuisance parameters. Results are summarised in Table 5.

A3 Towards a joint analysis
Peak statistics and 2PCFs are affected differently by the noise in the data, and even though they converge in the noise-free limit (as shown in Fig. A1), it is expected that a given noise realisation will scatter the best-fit cosmological parameters inferred by the two pipelines. We estimate the level of scatter by comparing the best-fit parameters obtained at the maximum likelihood returned by 50 survey realisations taken from the Covariance training set. Following the recommendations of Joachimi et al. (2020), we improve the accuracy of the solution by repeating the process multiple times by varying the starting points in the [σ 8 − Ω m ] plane and recording only the solution with the lowest χ 2 . In this exercise all nuisance parameters are held to zero, and we include the cross-redshift bins in the peak count data to exacerbate the effect.
We report on Fig. A4 the best-fit parameters for the 50 peaks analyses (in blue, including the cross-tomographic terms) and 2PCFs analyses (in orange), and further link the pairs associated with the same simulations. The inset shows the distribution of the difference in Ω m , which extends beyond 0.3. The value measured from the data is ∆Ω m = 0.21, which is indicated by the vertical dotted line. In this metric, the difference in matter density inferred by both probes is common and likely, given the noise levels present in the DES-Y1 data. Repeating the same exercise for S 8 reveals a smaller scatter, with ∆S 8 0.2; the difference of 0.2 observed in the data is therefore a rare event. Excluding the crossredshift bins significantly increases the width and the contours, which making the observed ∆S 8 = 0.3 more likely.

APPENDIX B: INFUSION OF INTRINSIC ALIGNMENT
In this paper we estimate the impact of IA on peak count statistics by infusing an alignment between mock galaxies and properties of their host dark matter haloes, based on the method described in Heymans et al. (2006) and Joachimi et al. (2013b). Since most of the simulated light-cones used in this paper do not use dark matter haloes to assign galaxy positions, we instead have to rely on a separate simulation suite. We use for this task the KiDS-HOD mock data described in Harnois-Déraps et al. (2018), in which dark matter haloes from the light-cones are populated with an HOD based on a conditional luminosity function, then sub-sampled to match the galaxy redshift distribution of the KiDS-450 data up to z = 1.5. Since these mocks are slightly denser than the DES-Y1 data, we further downsample them such as to closely match the DES tomographic bins. These DES-HOD mocks lack some of the Peaks wCDM T18 wCDM 2PCF wCDM Figure A2. Pipeline validation: Two-dimensional constraints on the wCDM cosmological parameters (Ω m , h, σ 8 , S 8 , w 0 ) and on the 10 nuisance parameters (4 × ∆z i , 4 × ∆m i , A IA and η). Mean and covariance are obtained from the SLICS 2PCFs (blue) and peaks (grey), and compared to the T18 constraints (red). The thin dashed lines indicate the input values in the simulations, which are well recovered by the blue and grey contours. Note that the T18 priors differ, most notably in h, ∆m i and ∆z i . very low redshift galaxies at z < 0.2 and all galaxies with z > 1.5, but these are all down-weighted by the lensing kernel, leaving the expected lensing signal almost unchanged. We match the mean redshift in each bins to within 0.04, and the galaxy density to within 0.07 gal/armin 2 , which is sufficient for this exercise. A number of physical models are presented in Joachimi et al. (2013b), and in this first study we opted for the simplest possible case: we assume that all central galaxies are early-type red galaxies, and that all satellites are late-type blue galaxy. This is of course inaccurate, but provides a good staring point upon which we can build and improve the model in future work.
The ellipticity of the central galaxies is known to correlate with the shape of the host dark matter halos, in a complicated way that depends on galaxy type, redshift, and possibly merger history (for a review see Kiessling et al. 2015). The galaxy catalogues that we construct from the DES-HOD mocks contain the inertia matrix of the host dark matter haloes, from which one can compute the eigenvalues and eigenvectors (ω µ , s µ ). The projected ellipses reconstructed from these are described by the symmetric tensor W (Joachimi et al. 2013a): where s ⊥,µ is the eigenvector projected along the line of sight, and the semi-major axes are given by 1/ω µ . Halo ellipticities h can be ob- Once these are determined, we opted for a 100% alignment between the halo and the central galaxy ellipticities. This is likely to slightly overestimate the effect, a deliberate choice that we make when developing a relatively conservative approach. The absence of scatter between the two, combined with the approximation that all centrals are early-type galaxies both act as to maximise the IA signal in our model. We also find that this model does not work well at high redshift, since the haloes are not fully relaxed, and their shapes are less well modelled. We therefore concentrate on the lower redshift bins only. We make a second important approximation by treating all satellite galaxies as blue, late-type, and assign them no intrinsic alignment, consistent with recent findings (Mandelbaum et al. 2011;Samuroff et al. 2019;Johnston et al. 2019). We could assign the halo ellipticities to the centrals directly, but doing so strongly biases the ellipticity distribution to lower values compared with the data. Instead we keep the ellipticities drawn from the Gaussian distribution, and rotate them until they align with h . Once this is done, we use Eq. (1) to shear these intrinsic ellipticities, and compute ξ 11 + , ξ 12 + and ξ 22 + with given either from the IA model described above or from the no-IA case. We additionally compute the same 2PCFs statistics from the pure intrinsic shapes int , thereby estimating the II term, as well as the combination int γ to compute the GI term. Results are presented in Fig. B1, and compared with the theoretical predictions with A IA = 1.0 and 2.0. We see in all three tomographic bin combinations that the mocks reproduce reasonably well the IA, noIA and GI models, however the II terms remains very noisy.
The IA model infused in these mocks is not accurate enough to be used for signal calibration, but is adequate for diagnostic tests such as those for which they are designed here. Future developments with more flexible options regarding early-types/late-types separation, inclusion of satellite alignment, and possibly different N-body runs, will be the subject of future work.

APPENDIX C: COMPARISON BETWEEN ΛCDM ANALYSES
Differences in signal modelling and in the likelihood sampling strategy are responsible for the broader range of accepted Ω m and σ 8 values in J20, compared to T18. Notably, they replaced the halofit predictions of the non-linear matter spectrum by the halo-based model HMcode (Mead et al. 2015). Furthermore, the sampling over the amplitude parameter A s is replaced by a sampling over ln 10 10 A s , the sum over the neutrino mass is fixed, and the ranges of priors are changed to that of Hildebrandt et al. (2018). These differences are responsible for a loss of precision on the S 8 constraints: while T18 finds S T18,Λ 8 = 0.792 +0.032 −0.021 , J20 reports S J20 8 = 0.763 +0.037 −0.031 , e.g. a ∼ 1σ shift compared to the original DES-Y1 results, and an error almost 30% larger. As explained in J20, the shift in S 8 is largely driven by differences in the n(z) estimations. This paper has been typeset from a T E X/ L A T E X file prepared by the author.