SPT-SZ MCMF: An extension of the SPT-SZ catalog over the DES region

We present an extension to a Sunyaev-Zel’dovich Effect (SZE) selected cluster catalog based on observations from the South Pole Telescope (SPT); this catalog extends to lower signal-to-noise than the previous SPT-SZ catalog and therefore includes lower mass clusters. Optically derived redshifts, centers, richnesses and morphological parameters together with catalog contamination and completeness statistics are extracted using the multi-component matched filter algorithm (MCMF) applied to the S/N > 4 SPT-SZ candidate list and the Dark Energy Survey (DES) photometric galaxy catalog. The main catalog contains 811 sources above S/N = 4, has 91% purity and is 95% complete with respect to the original SZE selection. It contains 50% more total clusters and twice as many clusters above $z = 0.8$ in comparison to the original SPT-SZ sample. The MCMF algorithm allows us to define subsamples of the desired purity with traceable impact on catalog completeness. As an example, we provide two subsamples with S/N > 4.25 and S/N > 4.5 for which the sample contamination and cleaning-induced incompleteness are both as low as the expected Poisson noise for samples of their size. The subsample with S/N > 4.5 has 98% purity and 96% completeness, and will be included in a combined SPT cluster and DES weak-lensing cosmological analysis. We measure the number of false detections in the SPT-SZ candidate list as a function of S/N, finding that it follows the trend expected for Gaussian noise, but with a lower amplitude than previous estimates from simulations.


INTRODUCTION
Within the past ten to twenty years, cluster catalogs based on the detection of the intracluster medium (ICM), either via its X-ray emission or via the Sunyaev-Zel'dovich Effect (SZE) signature, have grown from just tens of systems to thousands of systems (Lahav et al. 1989; Böhringer et al. 2000; Vanderlinde et al. 2010; Bleem et al. 2015, 2020; Finoguenov et al. 2020; Hilton et al. 2021; Klein et al. 2023) and will soon reach tens or even hundreds of thousands (Merloni et al. 2012; Raghunathan et al. 2022). These ICM-selected cluster catalogs require optical follow-up to assign cluster redshifts and typically to confirm that the ICM-selected cluster candidate is physically associated with a cluster of galaxies.
★ E-mail: matthias.klein@physik.lmu.de
With the availability of well-calibrated, large solid-angle optical surveys (e.g., SDSS, KIDS, DES, HSC-SSP, Legacy Surveys; Blanton et al. 2017; de Jong et al. 2013; Flaugher et al. 2015; Aihara et al. 2018; Dey et al. 2019) and mid-infrared surveys like that from the Wide-Field Infrared Survey Explorer (WISE, Wright et al. 2010), the confirmation and redshift assignment can be done systematically over large portions of the sky. In the past, the final cluster catalogs, especially those employed for cosmology, were often defined such that the confirmation and redshift assignment would have a negligible impact on the completeness of the original ICM candidate list (e.g., Mantz et al. 2010; Benson et al. 2013; Reichardt et al. 2013; Bocquet et al. 2015; de Haan et al. 2016). This approach is now reaching its limit, because the larger cluster samples needed for improved cosmological constraints require improved control over systematics, including even the impact of follow-up confirmation and redshift assignment on sample completeness. Moreover, using only information from the ICM-selected catalog to produce a clean sample will lead to significantly smaller samples than would be possible using additional information from the optical follow-up.
Examples of combining X-ray-selected cluster candidate catalogs and systematic optical follow-up include the confirmation of ROSAT selected clusters via DES (Klein et al. 2019), SDSS (Finoguenov et al. 2020) and the Legacy Survey DR10 (Klein et al. 2023). These efforts yielded thousands of new galaxy clusters extending to higher redshifts and with an angular density many times higher than previously selected ROSAT cluster samples that relied on individual cluster imaging and spectroscopy. The eFEDS X-ray survey (Brunner et al. 2022) carried out by eROSITA on the satellite Spektrum-Röntgen-Gamma (Predehl et al. 2021) has been analyzed with MCMF using HSC-SSP and Legacy Survey DR9 data, yielding a 94% pure sample with 477 confirmed clusters over 140 deg$^2$. A subset with 450 clusters was recently used for the first eROSITA-based cluster cosmology (Chiu et al. 2023). The use of MCMF-based cleaning in this study allowed us to increase the sample useful for the cosmological study by more than a factor of two compared to solely relying on X-ray data.
Similar systematic optical follow-up of SZE-selected cluster candidates has been carried out. The analysis of a set of SPTpol-ECS candidates was pursued with the redMaPPer algorithm (Rykoff et al. 2014) in targeted mode using DES data, supplemented with WISE and Pan-STARRS survey data and pointed Spitzer IR and PISCO observations (Bleem et al. 2020). The ACT cluster candidate list has also been systematically followed up using DES and other data (Hilton et al. 2021). Recently, a new low S/N SZE-selected candidate list from the Planck mission dataset has been followed up using the MCMF algorithm with DES data, resulting in the discovery of the highest redshift Planck selected systems to date as well as a tripling of the number of confirmed Planck clusters in the DES survey footprint (Hernández-Lang et al. 2023).
In the analysis presented here, we carry out a similar study, confirming an SPT-SZ candidate list that extends to lower S/N than was previously attempted in Bleem et al. (2015). We apply the MCMF algorithm to the SPT-SZ candidate list down to S/N = 4 using the DES and WISE datasets, cross-checking previously confirmed SPT-SZ clusters but also identifying many lower-mass, previously undiscovered galaxy clusters.
This paper is structured as follows. In Section 2 we describe the dataset used in this work, and in Section 3 we outline the cluster confirmation method. The SPT-SZ MCMF cluster catalog is presented in Section 4 and validated in Section 5. The conclusions are summarized in Section 6. Throughout this paper we adopt a flat ΛCDM cosmology with $\Omega_{\rm m} = 0.3$ and $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$.

DATA
In this paper we make use of the photometric catalog from DES observations obtained within the first three years of the survey (Y3) and the SPT-SZ cluster candidate list down to S/N = 4. For the high-z confirmation of cluster candidates, we further make use of mid-IR data from the WISE satellite (Wright et al. 2010; Mainzer et al. 2011) in the form of a matched catalog between DES and the unWISE catalog (Schlafly et al. 2019). The following subsections provide an overview of the datasets used.

The DES Y3A2 GOLD catalog
For the optical confirmation out to $z \approx 1.3$ we make use of the DES Y3A2 GOLD catalog, which is based on $g$, $r$, $i$ and $z$ band DECam (Flaugher et al. 2015) imaging data from DES between August 2013 and February 2016. Details on the data reduction and data quality are given elsewhere (Abbott et al. 2018; Morganson et al. 2018).
The DES Y3A2 GOLD catalog is a value-added version of the photometric catalog released in the public data release 1 (DR1; Abbott et al. 2018). The catalog covers approximately 5000 deg$^2$ in area with typically 3-5 exposures per band and reaches 95% completeness limits of 23.72, 23.34, 22.78 and 22.25 mag in the $g$, $r$, $i$ and $z$ bands, respectively. The catalog includes additional calibration steps, flags and types of photometry. In our work we make use of the multi-epoch, multi-band, multi-object fitting photometry "MOF", which is based on the ngmix code (Sheldon 2014) and fits a galaxy model to each single-epoch exposure and band at the same time, taking into account the different PSF shapes and sizes. Furthermore, it simultaneously fits neighbouring sources for improved deblending. In addition to MOF we also make use of single-object fitting (SOF) photometry, which is derived in a similar way but masks neighbouring sources rather than fitting them. Because SOF turned out to be more robust, while MOF provides better photometry in crowded regions, we use SOF photometry in cases where MOF has failed.
Out to a limiting magnitude of 22.2 mag we use the star-galaxy separator available in GOLD, which is an expanded version of that available in DES Y1A1 (Drlica-Wagner et al. 2018) and includes MOF/SOF-based extent information. For fainter magnitudes we do not apply a star-galaxy separation, in order to maximise sensitivity to small, high-redshift cluster galaxies. The resulting impact on cluster richness from residual contamination by stars in the galaxy sample is minimized by using a local background measurement, which works well in the limit that the residual stellar density near the cluster position is nearly constant.
In addition, we make use of mask flags to exclude regions around bright stars, and of the "top of the galaxy" calibration provided in the DES Y3A2 GOLD catalog, which includes SED-based de-reddening of sources to correct for interstellar dust.

The SPT-SZ cluster candidates with S/N>4
The SPT-SZ survey is based on observations with the SPT-SZ camera on the 10m South Pole Telescope (SPT; Carlstrom et al. 2011), which has a 1 degree diameter field-of-view and a resolution of ∼1 arcmin. The survey was conducted from 2007 to 2011, covering 2,500 deg$^2$ between 20h < RA < 7h and −65° < Dec < −40° in three frequency bands at 95, 150 and 220 GHz. The source detection via the thermal SZE is performed on the 95 and 150 GHz maps using a matched-filter approach (Bleem et al. 2015). The SPT-SZ cluster candidate list contains 1,518 sources with S/N > 4, of which 1,395 (92%) fall within unmasked areas of DES that are suitable for optical follow-up.

WISE
The WISE satellite is a mid-infrared telescope with a main mirror of 40 cm observing in four bands at 3.4 µm, 4.6 µm, 12 µm and 22 µm (W1, W2, W3, W4). The observing campaign can be divided into three phases. The first is the main phase, with sufficient cooling propellant to observe the full sky 1.5 times in all four bands. A second phase called NEOWISE was performed immediately after the main campaign and without cooling, completing the second full-sky observation in W1 and W2. A third phase of WISE observations (NEOWISE-R) started in September 2013 when WISE was recommissioned after more than two years of hibernation. Since then WISE has completed a full-sky survey every ∼6 months.
In this work we use the unWISE catalog (Meisner et al. 2019) that makes use of all WISE data until the end of the first year of the NEOWISE-R phase. It is based on the unblurred coadds of WISE imaging data (unWISE; Lang 2014) and includes improved source detection and deblending modeling for crowded regions. The catalog yields a gain of 0.7 mag in depth and contains twice the number of galaxies with respect to the AllWISE catalog (Cutri et al. 2013), which is based solely on the main and the NEOWISE phases of WISE observations.

CLUSTER CONFIRMATION METHOD
For cluster confirmation and redshift determination of the majority (>90%) of SPT-SZ cluster candidates we use the multi-component matched filter cluster confirmation tool (MCMF; see details in Klein et al. 2018, 2019) with DES photometric data. In Section 3.1 we summarise the method and describe some recent modifications.
From the previous SPT-SZ sample (Bleem et al. 2015) we expect a significant fraction (∼8%) to be at $z > 1$, where the DES imaging data need to be complemented with NIR or IR imaging. For that reason we develop a high-z cluster confirmation tool, following the MCMF concept but using a combination of DES and WISE (mid-IR) photometry data. This is described in Section 3.2. Finally, we review the optical morphological measures that we extract for the SPT-SZ MCMF clusters in Section 3.3.

MCMF
The MCMF algorithm has been designed for the confirmation and characterization of ICM-selected cluster candidates identified in large X-ray or SZE surveys. MCMF has been successfully applied to ROSAT X-ray sources over the DES footprint (MARD-Y3; Klein et al. 2019) and more recently, in combination with the Legacy Survey DR10 dataset (Dey et al. 2019), it has been used to create the all-sky optically-confirmed X-ray cluster catalog (RASS-MCMF; Klein et al. 2023). In addition, it was used for the optical follow-up of the first eROSITA-based galaxy cluster catalog over the early mission test field eFEDS (Klein et al. 2022). Beyond this, it has also been applied to new S/N > 3 Planck SZE-selected catalogs over the DES region (MADPSZ; Hernández-Lang et al. 2023). In working with these different datasets, improvements and extensions to the original method have been made. In these applications, the new MCMF-based catalogs significantly increased the number of clusters relative to those previously extracted from the same X-ray or SZE datasets and followed up with cluster-by-cluster imaging and spectroscopy. In addition to enlarging the samples, the MCMF method allows one to limit the contamination of the new samples.
The MCMF algorithm includes a red sequence technique (Gladders & Yee 2000; Rykoff et al. 2014) with redshift and magnitude dependent color filters in the $g-r$, $r-i$ and $i-z$ colors, a radial weighting (projected NFW profile centered at the ICM-selected candidate location) and a characteristic magnitude range to estimate redshifts and richnesses for candidates. From the cluster candidate list it makes use of the source position and an ICM-based mass proxy. The mass proxy is used to estimate the radius $r_{500}$ within which galaxies are counted to estimate cluster richness. In this work we make use of the SPT-SZ candidate S/N together with a calibration of the S/N-to-mass relation (Bocquet et al. 2019) to extract a cluster mass estimate for a range of hypothetical redshifts.
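The step from a mass estimate to an aperture can be sketched as follows. This is a minimal illustration, not the MCMF code: it assumes the standard overdensity definition, in which $M_{500}$ is the mass within the radius enclosing a mean density of 500 times the critical density at the candidate redshift, together with the flat ΛCDM parameters adopted in this paper. The function names and the example mass are hypothetical.

```python
import math

# Physical constants (SI) and unit conversions.
G = 6.674e-11          # gravitational constant, m^3 kg^-1 s^-2
MPC_IN_M = 3.0857e22   # metres per megaparsec
MSUN_IN_KG = 1.989e30  # kilograms per solar mass


def critical_density(z, h0=70.0, omega_m=0.3):
    """Critical density at redshift z in Msun / Mpc^3 (flat LCDM)."""
    h0_si = h0 * 1.0e3 / MPC_IN_M                       # H0 in s^-1
    e2 = omega_m * (1.0 + z) ** 3 + (1.0 - omega_m)     # E(z)^2 for a flat universe
    rho_si = 3.0 * h0_si ** 2 * e2 / (8.0 * math.pi * G)  # kg / m^3
    return rho_si * MPC_IN_M ** 3 / MSUN_IN_KG


def r500_mpc(m500_msun, z):
    """Radius (physical Mpc) enclosing 500 times the critical density."""
    rho_c = critical_density(z)
    return (3.0 * m500_msun / (4.0 * math.pi * 500.0 * rho_c)) ** (1.0 / 3.0)


# Hypothetical example: the same mass evaluated at several trial redshifts,
# mimicking the scan over a priori unknown candidate redshifts.
for z_trial in (0.2, 0.6, 1.0):
    print(z_trial, round(r500_mpc(3.0e14, z_trial), 3))
```

Because the critical density grows with redshift, the same mass corresponds to a smaller physical aperture at higher trial redshift, which is why the aperture has to be re-evaluated at every redshift step of the scan.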
For each cluster candidate, the color and radially weighted, background-subtracted richness $\lambda(z)$ within $r_{500}$ is calculated as a function of the (a priori unknown) candidate redshift. The peaks in $\lambda(z)$ are then identified and modeled with so-called "peak profiles" (see below). If present, multiple richness peaks (≤ 3) along the line of sight toward each candidate are recorded. Examples of peak profiles and their best fit to $\lambda(z)$ profiles of clusters are presented in previous MCMF analyses (e.g. Figs. 4 & A2 of Klein et al. 2019). These peaks with associated richnesses and redshifts are then collected and processed further as described in the following subsections.
Note that the peak profile models are built using renormalised stacks of individual $\lambda(z)$ profiles from clusters with spectroscopic redshift measurements (spec-z's). The clusters with spectroscopic redshifts do not need to be part of the sample that is being studied. What matters is that the redshift dependence of the SZE observable-derived estimate of $r_{500}$ is the same for the spec-z clusters and the candidates to be analysed. To ensure this, we assign to each spec-z cluster a value of the SZE observable S/N that is consistent with its mass and redshift and that can then be used as input to the MCMF pipeline.
To confirm clusters we characterize the likelihood that a given optical counterpart is a chance superposition rather than a physical counterpart to the ICM-selected cluster candidate. Doing so requires knowing the typical richness distribution of contaminants as a function of redshift within the survey region. The same exercise employed on the candidates is therefore repeated using random positions within the SPT-SZ footprint, using the same distribution of S/N and excluding regions containing SPT-SZ detections. These random positions provide the richness distribution of non-SZE detected structures (noise, projections, undetected clusters). The richness distributions from the random lines-of-sight and true clusters are redshift dependent, because they are impacted by the selection function, the evolution of the halo mass function and the noise in the richness estimate.
To be able to control the contamination of the final cluster sample, we calculate a quantity $f_{\rm cont}$. High values of $f_{\rm cont}$ indicate a higher probability that the candidate in question is a chance superposition rather than a real cluster. $f_{\rm cont}$ is calculated using the mean richness distributions along the random lines-of-sight $f_{\rm rand}(\lambda, z)$ and the richness distributions $f_{\rm obs}(\lambda, z)$ towards the candidates. That is, for each candidate $i$ we calculate the number of random lines-of-sight within a redshift bin with richness $\lambda \geq \lambda_i$ and divide by the number of SZE candidates within the same redshift bin with $\lambda \geq \lambda_i$. This ratio is then re-scaled according to the total number of SPT candidates and random lines-of-sight.
This $f_{\rm cont}$ parameter is calculated for each richness peak associated with a candidate. The peak showing the lowest value of $f_{\rm cont}$ is assigned as the best optical counterpart for the SPT-SZ candidate, because it is the most likely to be a real cluster.
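The counting procedure described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the MCMF implementation: the function name `f_cont`, the fixed redshift half-width `dz` used to define the redshift bin, and the toy catalogs are all hypothetical.

```python
def f_cont(lam_i, z_i, candidates, randoms, dz=0.05):
    """Estimate the chance-superposition indicator for one richness peak.

    candidates, randoms: lists of (richness, redshift) tuples for the SZE
    candidates and for the random lines of sight (hypothetical inputs).
    dz: assumed half-width of the redshift bin around z_i.
    """
    def count(sample):
        # Systems in the redshift bin that are at least as rich as the peak.
        return sum(1 for lam, z in sample
                   if abs(z - z_i) <= dz and lam >= lam_i)

    n_cand = count(candidates)
    n_rand = count(randoms)
    if n_cand == 0:
        raise ValueError("no candidate in this redshift bin reaches lam_i")
    # Re-scale by the total sample sizes so that both counts become rates.
    return (n_rand / len(randoms)) / (n_cand / len(candidates))


# Toy data: four candidates and eight random lines of sight.
candidates = [(50, 0.50), (30, 0.52), (5, 0.50), (40, 0.90)]
randoms = [(4, 0.50), (6, 0.50), (35, 0.50), (3, 0.90),
           (2, 0.50), (1, 0.50), (8, 0.51), (2, 0.48)]

# Two hypothetical richness peaks along one line of sight; the peak with
# the lowest value is kept as the best counterpart.
peaks = [(30, 0.52), (5, 0.50)]
values = [f_cont(lam, z, candidates, randoms) for lam, z in peaks]
best_peak = peaks[values.index(min(values))]
print(values, best_peak)
```

In this toy case the richer peak receives the lower value and is selected, mirroring the rule that the counterpart least consistent with a chance superposition is adopted.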
The cluster sample itself can then be defined as those candidates showing an $f_{\rm cont}$ below a certain threshold value $f_{\rm cont}^{\rm max}$. The threshold value corresponds to the fraction of the contamination in the initial candidate list that makes it into the final cluster catalog. The contamination of the resulting final cluster catalog would then be $f_{\rm SZE-cont} \times f_{\rm cont}^{\rm max}$, where the confirmed catalog contains all candidates with $f_{\rm cont}(\lambda_i, z_i) \leq f_{\rm cont}^{\rm max}$ and the initial contamination of the candidate catalog is $f_{\rm SZE-cont}$. As an example, if the input catalog is known to be 50% pure and an $f_{\rm cont}$ threshold value $f_{\rm cont}^{\rm max} = 0.2$ is employed, then the contamination fraction of the confirmed cluster catalog would be 0.5 × 0.2 = 0.1 or 10%.
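The arithmetic of this relation, and its inversion to find the threshold needed for a target contamination, can be written down in a few lines of Python. The function names are hypothetical, and the inversion is simply the algebraic rearrangement of the product stated above, not a procedure taken from the MCMF code.

```python
def final_contamination(f_sze_cont, f_cont_max):
    """Contamination of the confirmed catalog for a given threshold."""
    return f_sze_cont * f_cont_max


def threshold_for_target(f_sze_cont, target_contamination):
    """Threshold f_cont_max that yields a target final contamination."""
    return target_contamination / f_sze_cont


# Worked example from the text: a 50% pure input list and a threshold of 0.2.
print(final_contamination(0.5, 0.2))                 # about 0.1, i.e. 10%

# Main catalog values quoted in Section 4.1: 45% initial contamination
# and a threshold of 0.2 give about 9% final contamination.
print(final_contamination(0.45, 0.2))

# Hypothetical inversion: a 2% target for a candidate list with 15%
# initial contamination requires a threshold of roughly 0.13.
print(round(threshold_for_target(0.15, 0.02), 3))
```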
The version of MCMF applied here is, aside from the different mass proxy, largely the same as the version applied previously to two X-ray samples (Klein et al. 2019, 2022). We made a minor improvement to the estimate of the redshift uncertainty. Based on an analysis of mock data that include effects such as scatter in the photometric calibration, intrinsic and measurement scatter of cluster member galaxy colors, and structures along the line of sight, we find that the cluster photo-z scatter can be reasonably well described as $\sigma_z = s(z)/\sqrt{\lambda(z)}$. Here $s(z)$ is a scale factor as a function of cluster redshift that is calibrated empirically with spectroscopic redshifts. The photometric redshift uncertainties that we list are therefore redshift and richness dependent. A second improvement specific to this work is a second iteration on the estimate of the richness distributions along random lines-of-sight. The richness distribution along random lines-of-sight $f_{\rm rand}$ is supposed to resemble the expected richness distribution of contaminants as a function of redshift. Given the correlation between S/N, cluster mass and the likelihood of a source being a real cluster, the initial choice of using the same S/N distribution as the full candidate list causes the estimate of $f_{\rm cont}$ to be mildly biased high. To avoid this bias we use the S/N distribution of rejected systems ($f_{\rm cont} > 0.3$) as a proxy for the distribution of contaminants, select a subsample of randoms that follows this distribution, and remeasure $f_{\rm cont}$ for all candidates.
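One way to picture the second iteration is as a histogram-matching step: random lines of sight are re-drawn so that their S/N distribution follows that of the rejected candidates. The sketch below is only an illustration of that idea. The binning, the use of weighted resampling with replacement, and all names (`resample_randoms`, `edges`, `n_draw`) are assumptions and are not taken from the MCMF code.

```python
import random
from bisect import bisect_right


def resample_randoms(random_snr, rejected_snr, edges, n_draw, seed=0):
    """Draw indices of randoms so their S/N histogram follows the rejected one.

    random_snr: S/N values assigned to the random lines of sight.
    rejected_snr: S/N values of rejected candidates (proxy for contaminants).
    edges: ascending bin edges of the S/N histogram (assumed binning).
    """
    n_bins = len(edges) - 1

    def bin_of(x):
        k = bisect_right(edges, x) - 1
        return k if 0 <= k < n_bins else None   # None: outside the histogram

    # Target histogram from the rejected systems.
    target = [0] * n_bins
    for s in rejected_snr:
        k = bin_of(s)
        if k is not None:
            target[k] += 1

    # Histogram of the available randoms.
    bins = [bin_of(s) for s in random_snr]
    have = [0] * n_bins
    for k in bins:
        if k is not None:
            have[k] += 1

    # Weight each random by (target count) / (available count) in its bin.
    weights = [target[k] / have[k] if k is not None else 0.0 for k in bins]
    rng = random.Random(seed)
    return rng.choices(range(len(random_snr)), weights=weights, k=n_draw)


# Toy example: rejected systems populate only the lowest S/N bin, so only
# the first two randoms can be selected.
idx = resample_randoms([4.1, 4.2, 4.6, 4.7, 5.5], [4.1, 4.15, 4.3],
                       edges=[4.0, 4.5, 5.0, 6.0], n_draw=50)
print(sorted(set(idx)))
```

The selected subsample of randoms would then be passed through the same $f_{\rm cont}$ measurement as before.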

High-redshift extension using WISE
Besides the fact that passive galaxies become fainter with increasing redshift, the rest-frame wavelength range covered by the DES bands no longer brackets the 4000 Å break at redshifts $z \gtrsim 1$. Therefore, photometric redshifts become increasingly uncertain at these redshifts. For high-redshift cluster confirmation and photometric redshifts, it is therefore advantageous to move to redder bands such as the mid-IR regime covered by the Spitzer or WISE satellites. Data from both satellites were previously used for high-redshift cluster searches (Muzzin et al. 2009; Gonzalez et al. 2019) as well as for cluster confirmation (Bleem et al. 2015, 2020).
In our current analysis, we use the unWISE catalog (Schlafly et al. 2019), which additionally includes more recent W1 and W2 band WISE imaging data from the NEOWISE-R phase to create deep catalogs without PSF-scale smoothing of the data, and which includes improved modeling of crowded regions (Meisner et al. 2019). WISE data exist over the full sky, and therefore we match the unWISE and DES catalogs to allow for optical+IR photometry for all WISE sources. The optical to WISE galaxy colors (e.g., optical−W1) have a strong redshift dependence and are therefore well suited for obtaining high-quality cluster redshifts. The optical−W1 and W1−W2 color variation of passive galaxies with redshift is illustrated in Fig. 1 using DES measurements in the COSMOS field. One downside of WISE W1 is the large PSF (∼6"), which becomes a problem in dense regions such as the cores of galaxy clusters. In such dense regions, the separation of individual sources and the deblending of fluxes is a challenge. Here the improved modeling of crowded regions in unWISE compared to the previous AllWISE catalog becomes relevant.
Finally, we account for masks or missing data in the different surveys by deriving separate richnesses for cluster regions with different coverage (DES + W1 only, WISE W1 only, WISE W1, W2 & DES) and sum them for the final cluster richness estimate. With this approach we must track only the masked area in W1 imaging. The total richness in the high-redshift code is therefore given as

$$\lambda = \lambda_{\rm DES+w1+w2} + \lambda_{\rm DES+w1} + \lambda_{\rm w1+w2} + \lambda_{\rm w1}, \qquad (1)$$

where the individual richnesses are defined in the same manner as the standard MCMF richness (see Klein et al. 2018, 2019), with the color weights depending on the availability of the bands (two DES bands plus W1 and W2 for $\lambda_{\rm DES+w1+w2}$, two DES bands plus W1 for $\lambda_{\rm DES+w1}$, W1 and W2 for $\lambda_{\rm w1+w2}$, and no color weight for $\lambda_{\rm w1}$).
The high-z cluster confirmation code has been applied to all candidates over the redshift range $0.63 < z < 2.0$. Similar to the optically-based MCMF code, we perform runs along random lines-of-sight and calculate $f_{\rm cont}$ for potential counterparts to the SZE candidate. We make use of clusters with spectroscopic redshifts available for a subset of the sample to calibrate the WISE-based measurements. We further make use of the overlap in redshift between the optically-based MCMF and the high-z WISE-based run, $0.63 < z < 1.3$, to compare richnesses.
In Fig. 2 we show a comparison between the redshifts obtained with the high-z code and the redshifts provided for the previous catalog (Bocquet et al. 2019) for clusters with $f_{\rm cont} < 0.2$, showing reasonable agreement between the WISE-informed redshifts and the redshifts coming from dedicated optical, IR and spectroscopic follow-up. To avoid complicated modeling of the richness-observable and richness-mass relations, it is further favourable that both richnesses share an approximately similar scatter. For that reason we investigate the ratio of richness to the SZE-based mass estimate. Under the assumption that the richness-mass slope is approximately one and that there is no redshift evolution of the scaling relation, the width of the distribution of this ratio is a measure of the scatter of the $\lambda$-mass relation. In Fig. 3 we show this ratio for the DES-only measurements of the S/N > 4.25 subsample (see Table 1) and also for the high-z code measurements, here with the additional requirement that the cluster redshift must lie at $z > 0.63$. As visible in Fig. 3, the width and the mean of the distributions for the high-redshift code and the DES-only code appear very similar. Both exhibit some deviation from a normal distribution. Fitting a normal distribution to the close-to-normal part of the distribution above $\log(\lambda/M_{500}) = 1.2$ yields consistent mean ratios ($\mu_{\rm DES} = 1.43 \pm 0.02$, $\mu_{\rm WISE} = 1.42 \pm 0.01$) and standard deviations ($\sigma_{\rm DES} = 0.14 \pm 0.02$, $\sigma_{\rm WISE} = 0.13 \pm 0.01$), providing evidence that the two richness measurements exhibit similar relations to the SZE-based mass estimates. The cross-over in the ability to confirm cluster candidates lies in the $1 < z < 1.3$ regime, where the mid-IR selection of WISE starts to see more of the cluster population than is visible in the DES data.

Optical estimators of cluster dynamical state
Following our previous work on X-ray selected clusters from ROSAT and eROSITA (Klein et al. 2019, 2022), we provide estimators related to cluster morphology or dynamical state for the SPT-SZ MCMF clusters. Here we briefly describe the different estimators and refer the interested reader to our previous work for details (see also Wen & Han 2013). We provide six dedicated measurements related to the morphological appearance of the galaxy cluster in the optical data. Additionally, the offsets between the SZE and default optical centre, as well as between the SZE centre and a centre derived from fitting a 2D model to the galaxy distribution, are presented and can be used as measures of the cluster dynamical state. The 2D model centre is extracted while measuring the cluster morphology estimator $\delta$ (described in Klein et al. 2019, 2022; Wen & Han 2013). The estimator $\delta$ measures the normalised deviation from a smooth two-dimensional elliptical King model (King 1962) fitted to the smoothed galaxy density map of red sequence cluster galaxies. Besides the normalised deviation from the model, we also provide the ellipticity of the fitted model as a measure of cluster morphology. The ridge flatness $\beta$ compares the concentration of fitted one-dimensional King profiles along different angular wedges and is the ratio of the lowest concentration value to the average concentration. Low-mass clusters falling into a massive cluster will cause the radial galaxy density profile to flatten towards the merger direction, causing a low value of $\beta$. A further estimator is the asymmetry factor $\alpha$ (Wen & Han 2013), which measures the normalised average difference between pixel values in the galaxy density maps for pixels lying across from each other with respect to the cluster centre. All four estimators are correlated with one another; they are based on the same galaxy density maps and associated noise, and they are sensitive to similar features, primarily the asymmetry (Klein et al. 2022).
The last set of two estimators is derived by running SExtractor (Bertin & Arnouts 1996) on the passive galaxy density map. We use the resulting source list to identify nearby structures close to the main cluster and list the distance in units of $r_{500}$ as well as the ratio of the flux_auto measurements of the main and the second structure. The flux ratio can be thought of as a richness ratio, and it therefore serves as a proxy for the mass ratio between the two structures in question. The combination of both estimators makes it possible to select merger candidates or cluster pairs that exhibit a certain mass ratio.

CLUSTER CATALOG
We present the new SPT-SZ MCMF cluster sample in Section 4.1 and then discuss the sample contamination (Section 4.2) and completeness (Section 4.3). In Section 4.4 we discuss the impact of DES masking on the survey solid angle. Finally, we present the results of the cluster morphological or dynamical state estimators in Section 4.5.

Defining the SPT-SZ MCMF galaxy cluster sample
As detailed in Section 3.1, the MCMF $f_{\rm cont}$ measurements for each candidate provide a means of defining cluster samples with the desired contamination level. For the catalog we present here, we adopt an $f_{\rm cont}$ threshold $f_{\rm cont}^{\rm max} = 0.2$, which allows 20% of the original contamination present in the SPT-SZ candidate list to slip through into the confirmed cluster sample, which we call SPT-SZ MCMF. As we will show in detail in Section 4.2, we have measurements of the amount of contamination of the original SPT-SZ candidate catalog as a function of SPT-SZ detection signal-to-noise. For S/N > 4.0 we measure an original contamination $f_{\rm SZE-cont} = 45\%$. The final sample is therefore expected to have 0.2 × 0.45 = 9% contamination, and it contains 811 clusters. The $f_{\rm cont}$ selection threshold introduces incompleteness in the catalog at the level of 5%, because while the $f_{\rm cont}$ selection filters out contaminants it also removes some real, low-richness clusters. Details of this sample and other subsamples described below are shown in Table 1. In selecting the best suited cluster sample for a given science investigation, different sample criteria can be more or less important.
To guide the reader, we provide here two additional subsamples of the SPT-SZ MCMF cluster catalog by varying the $f_{\rm cont}^{\rm max}$ and S/N selections. The first of the two subsamples (S/N > 4.25 in Table 1) has an SZE selection threshold S/N > 4.25 and $f_{\rm cont}^{\rm max} = 0.125$. Our current understanding is that cluster subsamples with S/N > 4.25 are better suited for studies relying on well behaved SZE-based cluster masses (e.g., mass-observable scaling relations or cluster counts cosmology analyses). Furthermore, when modeling cluster counts to derive cosmological constraints, it simplifies the analysis if the subsample contamination is low enough that it does not require detailed modeling. A guideline here is that the contamination fraction is at or below the level of the Poisson noise associated with the full subsample. Given the sample sizes here, the target upper limit for the contamination ranges between 3% and 4%, and this is met for both of the subsamples presented. A similar target can be set for the completeness of the sample, relative to its original SZE selection. As we will show in Section 4.3, the particular choice of $f_{\rm cont}^{\rm max}$ used for this subsample meets the requirements for both purity and completeness. We also note that the incompleteness due to optical cleaning can be accounted for (see e.g. Grandis et al. 2020; Chiu et al. 2023).
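The Poisson guideline can be made concrete with a short sketch. For a sample of $N$ clusters the fractional Poisson noise is $1/\sqrt{N}$, so the 3% to 4% range quoted above corresponds to samples of roughly 600 to 1100 systems. The function names and the sample sizes used below are illustrative assumptions, not the Table 1 values.

```python
import math


def poisson_fraction(n_clusters):
    """Fractional Poisson noise, 1/sqrt(N), for a sample of N clusters."""
    return 1.0 / math.sqrt(n_clusters)


def meets_guideline(n_clusters, contamination):
    """True if the contamination is at or below the Poisson noise level."""
    return contamination <= poisson_fraction(n_clusters)


# Hypothetical sample sizes bracketing the 3% to 4% range.
for n in (625, 811, 1111):
    print(n, round(poisson_fraction(n), 4))

# The main catalog (811 clusters, 9% contamination) does not satisfy the
# guideline, which is the motivation for the cleaner subsamples.
print(meets_guideline(811, 0.09))
```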
The S/N > 4.5 subsample represents a more conservative selection in S/N and a looser cut of $f_{\rm cont}^{\rm max} = 0.13$, and it remains in the well-tested S/N regime. The contamination in the SZE candidate list is low, and the false detections predicted from simulations and those measured from observations are in good agreement. The optical selection plays an insignificant role, introducing 3.5% incompleteness through the $f_{\rm cont}$ selection while providing a high (98%) purity sample. The S/N > 4.5 subsample will further be part of the upcoming SPT cosmological analysis, which includes modeling of the impact of optical cleaning on the sample selection.
The redshift distribution of the SPT-SZ MCMF cluster sample is shown in Fig. 4, where it is compared to the previously released SPT-SZ catalog (Bleem et al. 2015) with updated redshifts from Bocquet et al. (2019) and the SPTpol Extended Cluster Survey catalog (SPTpol-ECS; Bleem et al. 2020). Of the 811 confirmed clusters in the SPT-SZ MCMF catalog, 91 are at $z > 1$. This is a substantial increase compared to the 516 clusters in the previous catalog, and it more than doubles the number of high-redshift clusters. The DES data cover only 92% of the SPT-SZ sources, and consequently we miss 34 confirmed clusters from Bleem et al. (2015), and would expect ∼69 clusters to lie outside the footprint. Adding the other published SPT-based SZE cluster catalogs, SPTpol-ECS (Bleem et al. 2020) and SPTpol 100d (Huang et al. 2020), and excluding duplicates, we obtain a combined catalog containing 1,343 clusters, exceeding the number of confirmed SZE clusters from the second Planck catalog of Sunyaev-Zeldovich sources (Planck Collaboration et al. 2016), but lying below the number of S/N > 4 candidates presented by the ACT collaboration (Hilton et al. 2021).

Initial contamination of SPT-SZ candidate lists
As discussed in Section 3.1, the expected contamination fraction of a sample selected using a particular f_cont threshold (i.e., clusters with f_cont < f_cont^max) is f_SZE-cont × f_cont^max, where f_SZE-cont is the initial contamination of the SZE-selected candidate list. The number of real clusters is therefore

$$N_{\rm real} = N_{\rm MCMF}(f_{\rm cont} < f_{\rm cont}^{\rm max})\,\left(1 - f_{\rm SZE-cont}\, f_{\rm cont}^{\rm max}\right),$$

where N_MCMF(f_cont < f_cont^max) is the number of systems in the MCMF confirmed catalog with f_cont values below f_cont^max. The ratio N_real/N_cand, where N_cand is the number of SPT-SZ candidates, should reach but not exceed the expected purity of the candidate list (1 − f_SZE-cont). Incorrectly estimating f_SZE-cont would lead to inconsistencies, such as finding (1) more real clusters than allowed or (2) falling numbers of real clusters at high f_cont^max. One illustrative example for an SPT-SZ sub-sample with ξ > 4.5 is shown in the top panel of Fig. 5. Here we show the behaviour of N_real/N_cand for five different values of the initial contamination fraction, from 9 to 21% in steps of 3%. The horizontal lines show the expected purity (1 − f_SZE-cont) for the curves with the same color. As can be seen, setting the initial contamination f_SZE-cont too high (lowest two curves in red and green) causes an overprediction of real clusters (lines with data points) relative to the number expected given the assumed contamination level (flat line of same color). This clear inconsistency excludes these high contamination levels. For the lower assumed contamination cases with f_SZE-cont ≤ 0.12, the curves with data points (black and magenta) continue rising over the full range of f_cont. This is a very unlikely scenario, given the expected richnesses of SPT clusters (λ > 20) and the richness levels probed at f_cont > 0.6 (λ ≈ 2). For the initial contamination level of f_SZE-cont = 0.15 (blue curve with data points) we find a stable solution where the fraction of real clusters converges to the expected purity (1 − f_SZE-cont) and then remains roughly constant above f_cont > 0.4. This is a clear indication that this ξ > 4.5 SPT-SZ candidate list has ≈15% contamination.

Figure 5. Top: The ratio N_real/N_cand for the ξ > 4.5 SZE-selected sample with different assumptions for the initial contamination. The best fit initial contamination of 15% is shown in blue. Bottom: Best fit results for five different SPT-SZ selection thresholds ξ = 5.0, 4.75, 4.5, 4.25 and 4.0, arranged from top to bottom, that indicate an initial purity of 97.5%, 95%, 85%, 69% and 55%, respectively. For each case the cyan point at f_cont = 0.8 shows an independent purity estimate from the mixture model method using the distribution of candidates in log10(10^14 λ/M_500). Both methods are in good agreement with each other for all thresholds in ξ.
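The consistency test described above can be sketched in a few lines. This is a minimal illustration, not the pipeline used for the catalog: it assumes that the number of real clusters is estimated as N_MCMF(f_cont < f_cont^max) × (1 − f_SZE-cont × f_cont^max), and it uses a hypothetical toy candidate list (85 clusters with very low f_cont plus 15 contaminants spread uniformly in f_cont). The function names and the tolerance are illustrative choices.

```python
def real_fraction_curve(f_cont_values, f_sze_cont, thresholds):
    """Return N_real / N_cand for each f_cont threshold.

    Assumes N_real = N_MCMF(f_cont < t) * (1 - f_sze_cont * t).
    """
    n_cand = len(f_cont_values)
    curve = []
    for t in thresholds:
        n_mcmf = sum(1 for f in f_cont_values if f < t)
        curve.append(n_mcmf * (1.0 - f_sze_cont * t) / n_cand)
    return curve


def overshoots(curve, f_sze_cont, tol=0.01):
    """True if the curve exceeds the expected purity 1 - f_sze_cont."""
    return max(curve) > (1.0 - f_sze_cont) + tol


def toy_candidates():
    """Hypothetical candidate list: 85 rich clusters plus 15 contaminants."""
    clusters = [0.01] * 85
    contaminants = [(i + 0.5) / 15 for i in range(15)]  # uniform in f_cont
    return clusters + contaminants


thresholds = [0.2, 0.4, 0.6, 0.8, 1.0]
for assumed in (0.09, 0.15, 0.21):
    curve = real_fraction_curve(toy_candidates(), assumed, thresholds)
    print(assumed, [round(c, 3) for c in curve], overshoots(curve, assumed))
```

In this toy setup an assumed contamination that is too high overshoots the expected purity, one that is too low produces a curve that keeps rising, and the true value (15%) gives a roughly flat curve, mirroring the qualitative behaviour described for the top panel of Fig. 5.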
In the lower panel of Fig. 5 we show the results for five SPT-SZ candidate lists with different thresholds in signal-to-noise ξ of 5, 4.75, 4.5, 4.25 and 4. In these cases we remeasure f_cont for each subsample using the appropriate signal-to-noise thresholds in the candidate and random sample. One can read off the purity of these samples to be 97.5% (magenta), 95% (red), 85% (blue), 70% (green) and 55% (black), respectively. As expected, going to lower SPT-SZ signal-to-noise decreases the purity of the initial candidate lists. But as we have previously emphasized, the MCMF algorithm enables the removal of a large fraction of the contamination and the delivery of an overall larger cluster sample. This enlarged sample extends to lower masses at all redshifts and therefore typically also extends to higher redshift.
The second and main method to derive the level of initial contamination for different SPT-SZ candidate lists makes use of the richness distributions of the candidates and along random lines of sight, and follows our previous work on X-ray selected clusters (Klein et al. 2022, 2023). Contrary to the first method it does not rely on f_cont selection or on the correct derivation of f_cont. To estimate the initial contamination, we model the distribution of candidates in log10(10^14 λ/M_500) space as a mixture of a contamination and a cluster model. The contamination model is directly derived from the measurements along random lines of sight as a histogram in log10(10^14 λ/M_500) that can be re-scaled to adapt to different amounts of contamination. As richness scales approximately linearly with mass, the cluster population in log10(10^14 λ/M_500) can simply be assumed to be normally distributed. The total model therefore consists of just four parameters: three for the normal distribution and one, the normalization, for the contaminant distribution. As an example, the observed distribution and the fitted model for the ξ > 4 candidate list are shown in Fig. 6. For the estimate of the contamination we are solely interested in the best-fitting contamination model, which is predominantly determined at log10(10^14 λ/M_500) < 1, where the cluster component plays no significant role.
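Because the contaminant normalization is set mainly by the low-richness end of the distribution, the idea can be illustrated with a simplified estimator. The sketch below is not the four-parameter mixture fit itself: it assumes that the cluster component is negligible below x = log10(10^14 λ/M_500) = 1 and simply rescales the random line-of-sight distribution to match the candidates there. All numbers in the toy sample (means, widths, sample sizes) are hypothetical.

```python
import random


def contamination_fraction(candidates, randoms, x_max=1.0):
    """Estimate the contaminated fraction of a candidate list.

    candidates, randoms: lists of x = log10(1e14 * lambda / M500).
    Assumes no real clusters lie below x_max (a simplification).
    """
    n_cand_low = sum(1 for x in candidates if x < x_max)
    n_rand_low = sum(1 for x in randoms if x < x_max)
    if n_rand_low == 0:
        raise ValueError("no random sightlines below x_max")
    # Fraction of the contaminant template that lies below x_max
    frac_rand_low = n_rand_low / len(randoms)
    # Number of contaminants needed to reproduce the low-x candidates
    n_contaminants = n_cand_low / frac_rand_low
    return min(n_contaminants / len(candidates), 1.0)


def toy_sample(seed=1, n_cont=450, n_clus=550, n_rand=5000):
    """Hypothetical data: contaminants follow the random-sightline shape."""
    rng = random.Random(seed)
    randoms = [rng.gauss(0.6, 0.3) for _ in range(n_rand)]
    contaminants = [rng.gauss(0.6, 0.3) for _ in range(n_cont)]
    clusters = [rng.gauss(1.6, 0.14) for _ in range(n_clus)]
    return contaminants + clusters, randoms


candidates, randoms = toy_sample()
estimate = contamination_fraction(candidates, randoms)
print(f"estimated contamination: {estimate:.2f} (toy input: 0.45)")
```

A full fit would additionally constrain the mean, width and normalization of the normal cluster component; the simplified version only recovers the quantity of interest here, the contaminant normalization.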
The resulting purity estimates of this sample together with results for the other candidate lists with different ξ thresholds are shown as cyan points in the lower panel of Fig. 5. The results of both methods are consistent. These two methods can be used to derive the level of initial contamination in a candidate list even in the case where contaminants and clusters are not clearly separated in, e.g., a space of λ versus redshift.

SPT-SZ MCMF incompleteness due to optical cleaning
As already mentioned, the MCMF algorithm for excluding contamination can also exclude real clusters. The impact of the f_cont-based selection, which is essentially a redshift dependent threshold in richness, can be modeled using the richness-mass relation (e.g., see Klein et al. 2022). In addition, a rough estimate of the overall completeness can be derived using the previously estimated initial contamination. The initial contamination defines the number of real clusters in the candidate list as well as the number of real clusters given the f_cont selection threshold. The difference between these two numbers reflects the impact on the completeness of the real cluster sample. The impact of f_cont-based cleaning is already clearly visible in Fig. 5 (see bottom panel) as the difference between the horizontal lines that show the fraction of the candidates that are real clusters (1 − f_SZE-cont) and the curves with data points that show the recovered fraction of real clusters as a function of the f_cont threshold employed.
In Fig. 7 we show the expected completeness of the f_cont selected sample with respect to the number of real clusters in the SZE selected sample versus the purity; this is shown for the same five SPT-SZ ξ thresholds examined previously in Fig. 5. Each curve is built by calculating the purity and completeness for a range of f_cont selection thresholds increasing from right to left. The purity is derived as 1 − f_SZE-cont × f_cont^max and the completeness is the fraction of real clusters that survive the f_cont selection. As one can see, the impact of the optical cleaning on the completeness remains mild (≤ 5%) for all SPT-SZ subsamples until one reaches a purity of 95% or above, after which the completeness drops precipitously. Moreover, the highest purity samples tend to be smaller. One must consider these impacts when selecting a sample for scientific analysis.
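The construction of one such purity versus completeness curve can be sketched as follows. This is an illustration under stated assumptions: purity is taken as 1 − f_SZE-cont × f_cont^max, the number of surviving real clusters is taken as the number of selected systems times that purity, and the candidate list is a hypothetical toy list in which a few real clusters have elevated f_cont.

```python
def purity_completeness(f_cont_values, f_sze_cont, thresholds):
    """Return (threshold, purity, completeness) for each f_cont threshold."""
    n_cand = len(f_cont_values)
    n_real_total = n_cand * (1.0 - f_sze_cont)  # real clusters among candidates
    rows = []
    for t in thresholds:
        n_selected = sum(1 for f in f_cont_values if f < t)
        purity = 1.0 - f_sze_cont * t
        # Real clusters surviving the cut, relative to all real clusters
        completeness = min(n_selected * purity / n_real_total, 1.0)
        rows.append((t, purity, completeness))
    return rows


def toy_f_cont():
    """Hypothetical list: 85 real clusters and 15 contaminants."""
    clusters = [0.01] * 80 + [0.27] * 3 + [0.55] * 2
    contaminants = [(i + 0.5) / 15 for i in range(15)]
    return clusters + contaminants


for t, purity, completeness in purity_completeness(
        toy_f_cont(), 0.15, [0.05, 0.2, 0.4]):
    print(f"f_cont < {t}: purity {purity:.3f}, completeness {completeness:.3f}")
```

Tightening the threshold raises the purity but removes the real clusters with higher f_cont, which is the trade-off traced by each curve in Fig. 7.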
As discussed in Section 4.1, the expected amount of sample Poisson noise (i.e., the quantity important for cluster count statistics) is simply 1/√N_clust, and corresponds to 3.5-5% for ξ thresholds of 4 to 5 in the SPT-SZ sample. This sets an upper limit for the target contamination such that contamination need not be explicitly modeled. While this translates into a completeness of 91% for the ξ > 4 sample, it results in >96% completeness for higher ξ threshold SPT-SZ subsamples, bringing the incompleteness below the level of Poisson noise of these sub-samples. This means that for sub-samples with ξ thresholds of 4.25 or higher we are able to construct cluster samples where contamination and incompleteness are both below the expected Poisson noise, which implies that these effects will have a sub-dominant impact. Even in the contrary case, the sample incompleteness can be straightforwardly accounted for by modeling the selection in ξ jointly with the requirement that the richness λ exceeds the threshold corresponding to f_cont. A sample selected according to both variables ξ and λ is then complete with respect to a model that accounts for the joint selection. This modelling approach has already been successfully applied in the cosmological analysis of a real cluster sample (Chiu et al. 2023). However, explicit modelling of contaminants in ICM-selected samples is currently still lacking. This explains the choice of a very clean (98% pure) selection for the ξ > 4.5 sub-sample used in the upcoming SPT-based cosmological study (Bocquet et al., in prep.). This study explicitly includes modelling of the incompleteness due to MCMF-based cleaning but relies on high purity to avoid modelling contamination.
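As a quick arithmetic check of the Poisson noise level 1/√N_clust, the snippet below evaluates it for the sample sizes quoted in the Conclusions (811, 640 and 480 clusters). Note that these are the sizes of the cleaned subsamples, with their additional f_cont and redshift cuts, so this is an illustration of the scaling rather than a reproduction of the exact numbers behind the 3.5-5% range.

```python
import math


def poisson_fraction(n_clusters):
    """Fractional Poisson noise on a count of n_clusters objects."""
    return 1.0 / math.sqrt(n_clusters)


# Sample sizes quoted in the Conclusions of this paper
for label, n in [("xi > 4", 811), ("xi > 4.25", 640), ("xi > 4.5", 480)]:
    print(f"{label}: N = {n}, 1/sqrt(N) = {100 * poisson_fraction(n):.1f}%")
```

For the 811-cluster sample this gives 3.5%, which matches the lower end of the range quoted above.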

Impact of DES masking on SPT-SZ MCMF survey solid angle
The previous section covers the impact of optical cluster confirmation on the completeness and purity of the final cluster catalog within the general DES footprint. One additional problem that arises with optical confirmation is that within the DES footprint there are areas with missing optical data due to, e.g., bright stars, poorly performing CCDs or even chip gaps. We track these regions by building a sky mask for the optical data. The impact of missing data on cluster confirmation depends on the location and size of the masked region with respect to the SPT-SZ candidate position and effective size r_500(z). Prior to confirmation the redshift, and therefore the corresponding cluster size r_500(z), is not well known; therefore, we use the sky masking fraction within a fixed angular distance around the candidate locations to characterize the importance of masking.
To estimate the impact of masking on cluster confirmation, we use the confirmation fraction as a function of the masking fraction. We explore three different aperture sizes with radii of one, two and three arcminutes to test the sensitivity to masking. As a baseline we use clusters not showing masking and derive confirmation fractions N(f_cont < 0.2)/N_cand of 0.588 ± 0.014, 0.593 ± 0.014 and 0.595 ± 0.015 for the three apertures. Looking into the confirmation fractions, we see that we would not have confirmed any candidate with mask fractions greater than 0.56 in the two or three arcminute apertures, and in total only one out of eight candidates with mask fractions greater than 0.5. We therefore redefine the minimum requirement for a source to be considered within the DES footprint: it must have at least one DES source within 1 arcminute and a mask fraction within two arcminutes below 0.5. This effectively reduces the footprint by 0.6%. There is no statistically significant impact visible on the confirmation fraction between mask fractions of zero and 0.5. Taking all sources in that masking range we find confirmation fractions of 0.52 ± 0.05, 0.52 ± 0.04 and 0.54 ± 0.03, consistent within two sigma with the baseline confirmation fractions. This residual effect can generally be accounted for by an overall re-scaling of the footprint area by 1.5%. We note, however, that this correction is at the level of one sigma given the uncertainty on the confirmation fraction of the unmasked clusters, so it is likely not necessary for most studies using this sample.
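The statement that the masked and unmasked confirmation fractions agree within two sigma can be checked directly from the quoted values. The sketch below assumes that the two measurements in each aperture are independent, so that their uncertainties add in quadrature; the function name is an illustrative choice.

```python
import math


def sigma_offset(p1, e1, p2, e2):
    """Difference of two measurements in units of their combined error."""
    return abs(p1 - p2) / math.hypot(e1, e2)


# (fraction, uncertainty) for the 1, 2 and 3 arcminute apertures
baseline = [(0.588, 0.014), (0.593, 0.014), (0.595, 0.015)]
masked = [(0.52, 0.05), (0.52, 0.04), (0.54, 0.03)]

for radius, (b, m) in enumerate(zip(baseline, masked), start=1):
    offset = sigma_offset(b[0], b[1], m[0], m[1])
    print(f"{radius} arcmin aperture: {offset:.2f} sigma")
```

All three apertures give offsets between one and two sigma under this assumption.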

Optical morphology and dynamical state
The morphology of a cluster can be an indicator of dynamical state, and so in principle the cluster morphology can be used to identify samples of clusters for the study of the dynamical evolution of clusters and cluster components. With the SPT-SZ MCMF catalog we include optical morphological estimators of dynamical state for all confirmed clusters; however, the quality of the measurements depends on richness and redshift. We therefore recommend restricting morphology analyses to the redshift range 0.1 < z < 0.9 and to richness λ > 40. Increasing the richness selection threshold will further improve the robustness of the estimates. In Fig. 8 we present comparisons among the four morphology estimators that are described in Section 3.3. As expected, the estimators are strongly correlated.
A preliminary comparison to X-ray morphological merger estimators that trigger on the skewness and ellipticity of the ICM distribution (e.g., Mohr et al. 1993; Nurgaliev et al. 2013, 2017) for a subset of clusters that have Chandra or XMM-Newton observations shows little correlation, underscoring that optical and X-ray merger indicators are sensitive to different stages of cluster mergers and are also affected differently by projection effects. A simple example for such a case of mismatching classifications is SPT-CL J0522-4818, shown in Fig. 9. Therefore, we expect that these optical morphological estimators could be useful in combination with the established X-ray techniques for the purpose of creating a sequence of clusters covering a broad range of dynamical state.

CATALOG VALIDATION
We validate SPT-SZ MCMF through comparison to several other catalogs in Section 5.1, carry out an examination of the contaminant distribution of the SPT-SZ candidate list in Section 5.2 and then carry out a modeling validation in Section 5.3 that employs parameter constraints from a cosmological analysis of the previous SPT-SZ sample.

Comparison of SPT-SZ MCMF to other cluster catalogs
We compare the new catalog to three previously published SZE selected cluster catalogs.

Previous SPT-SZ catalog
To check for consistency we compare our results to the previous release of the SPT-SZ catalog (Bleem et al. 2015), using the updated redshifts provided in Bocquet et al. (2019). The expected contamination of the SPT-SZ candidate list at ξ > 4.5 adopted in the previous study is 15%, and therefore the f_cont threshold of 0.2 would correspond to an expected contamination in the final catalog of 3%. We find 481 clusters that have redshifts in both catalogs, and all but 4 have f_cont < 0.2. In all cases the previously published redshift estimate is consistent with the redshift presented here. The number of unconfirmed systems (4), corresponding to ∼1% of the previously confirmed ξ > 4.5 sample, is consistent with the expected incompleteness due to MCMF based f_cont selection of 2% given in Fig. 7. Furthermore, some of the previously confirmed clusters could indeed be chance superpositions. We fail to confirm SPT-CL J0334-4645, the highest-redshift SPT cluster at z = 1.7, and one of the lowest-redshift clusters, SPT-CL J2313-4243 at z = 0.056. The latter is well identified in MCMF, but its richness is too low to meet the f_cont < 0.2 selection. In addition, we fail to confirm SPT-CL J0002-5557, which is listed to be at z = 1.15, whereas our high-z analysis places this cluster at z = 1.37 with an f_cont estimate of 0.45. The lack of red galaxies visible in the DES image indicates that this cluster must lie beyond the MCMF DES redshift reach of z ∼ 1.3. In WISE the cluster is visible as one compact red blob, which may be the reason for the relatively high f_cont, because counting cluster members for this compact cluster might have failed. The last cluster, SPT-CL J2005-5635 at z = 0.2, shows a low richness resulting in f_cont = 0.31. While the other three clusters do have matches in SZE or X-ray surveys, this cluster does not.
In Fig. 10 we show the redshifts derived from combining the MCMF outputs of the DES and the high-z runs, z_comb, with those published in Bocquet et al. (2019). As can be seen, there is good overall agreement for the majority of the 481 systems, but there are some outliers. The scatter between spectroscopic redshifts and MCMF redshifts is consistent with that found in our previous work using ROSAT selected clusters (Klein et al. 2019). There are four prominent (Δz > 5σ) outlier clusters. In these four cases, we find two counterparts along the line of sight, where the second ranked one is consistent with the previously published redshift. In all four cases the primary counterpart redshifts come from the DES-based run but are consistent with the counterpart from the WISE-based MCMF run, making it unlikely that we are observing a new failure mode in one of the MCMF runs. One possible explanation for these outliers is that the original SPT-SZ cluster-by-cluster follow-up may include shallow observations that are sufficient to reliably detect the lower redshift counterpart but miss the higher redshift, more significant counterpart. There is further indication that there might be a mild underestimation of the redshifts given in Bocquet et al. (2019) for z > 0.7.
We conclude from the comparison to the previous version of the SPT-SZ cluster catalog that there is consistency for ∼99% of the overlapping sample. The number of previous systems not making our selection threshold is consistent with our estimate of incompleteness introduced by the optical cleaning, and the most prominent outliers in terms of redshift can be explained as multiple optical systems along the line of sight, where the current analysis finds a more significant richness peak than that selected in the original SPT-SZ follow-up.

SPTpol 100d catalog
The comparison to the SPTpol 100d catalog (Huang et al. 2020) is especially interesting, because the deeper SPTpol data enable one to identify a larger number of purely SZE-selected clusters, which can then be compared to the MCMF defined catalog from the fully overlapping but shallower SPT-SZ survey data. This 100d candidate list consists of 89 candidate clusters with a detection S/N ξ > 4.6. The analysis of image simulations suggests that 81 ± 2 of the candidates are real clusters, which is consistent with the number of optical-IR confirmed systems.
Using a matching radius of 150 arcsec we find 37 matches between the 100d and the SPT-SZ candidate catalogs, with the largest separation being 81 arcsec. Given that contamination of the catalogs is mostly noise driven and the density of contaminants is estimated to be ≈0.08 deg⁻² for SPTpol and 0.3-0.4 deg⁻² for SPT-SZ, it is highly unlikely that we would find a chance match within this 150 arcsec search radius. Therefore, it is safe to assume that all matches correspond to real clusters.
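The size of this chance-match probability can be estimated with a short calculation. The sketch below counts the expected number of coincidences between noise-driven false detections in the two surveys. It assumes a flat-sky search area, uncorrelated noise between the surveys, and an overlap area of 100 deg² (an assumption taken from the survey name); the function name is illustrative.

```python
import math


def expected_chance_matches(n_sources, density_per_deg2, radius_arcsec):
    """Expected number of random coincidences within a search radius.

    n_sources: number of sources around which we search.
    density_per_deg2: surface density of unrelated sources to match against.
    """
    radius_deg = radius_arcsec / 3600.0
    search_area_deg2 = math.pi * radius_deg ** 2  # flat-sky approximation
    return n_sources * density_per_deg2 * search_area_deg2


overlap_area_deg2 = 100.0                  # assumed overlap area
n_sptpol_false = 0.08 * overlap_area_deg2  # SPTpol contaminant density x area
n_chance = expected_chance_matches(n_sptpol_false, 0.4, 150.0)
print(f"expected noise-noise coincidences: {n_chance:.3f}")
```

Under these assumptions the expected number of noise-noise coincidences is of order 0.02, far below one.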
Out of the 37 matches we find 36 with f_cont < 0.2 that are members of the SPT-SZ MCMF cluster catalog. The only cluster above that threshold is SPT-CL J0002-5557, which was discussed in the previous section. Moreover, missing one cluster out of 37 matches is consistent with the expectation of 2% incompleteness induced by the optical cleaning undertaken in building the SPT-SZ MCMF catalog. Additionally, we find three clusters with f_cont < 0.2 that were not previously confirmed (Huang et al. 2020) and one cluster with a disagreement in redshift. We discuss those four systems below.
SPT-CL J2331-5736 (Fig. 11) has a S/N ξ = 4.25 in SPT-SZ and 8.4 in SPTpol 100d and is the cluster with the highest redshift in the SPTpol sample with z = 1.38 ± 0.1. In Huang et al. (2020) it is noted that there is also a foreground cluster at z = 0.29. MCMF finds the low-z cluster to be at z = 0.2975 with f_cont = 0.005 and the high-z structure at z = 1.41 with f_cont = 0.16. Given the f_cont values, both richness peaks are considered reasonable counterparts in the SPT-SZ MCMF cluster sample. The low value of f_cont makes it highly unlikely that the low-z structure is a chance superposition near a high-z cluster. The richness of λ = 62 is consistent with the expectation from the scaling relation. On the other hand, the tentative BCG of the high-z cluster is very close to the peak of the SZE signal. A closer investigation reveals a bright radio source with a SUMSS flux of 147.6 mJy (peak, 179.6 total) at the cluster centre of the low-z cluster, which could cause the SZE signal of this cluster to be partially diluted and its centre to be shifted.
SPT-CL J2321-5419 (Fig. 12) has a S/N ξ = 5.26 in SPT-SZ and 4.68 in SPTpol 100d. This cluster was not confirmed in the previous SPTpol and SPT-SZ catalogs because of a bright star close to the SZE position. The MCMF analysis for this system indicates a redshift of 0.79 and f_cont = 0.07. The high-z code finds a consistent redshift but does not confirm this system, due to masking caused by the bright star.

SPT-CL J2357-5953 (Fig. 13), with S/N ξ = 4.13 in SPT-SZ and 4.66 in SPTpol, is unconfirmed in SPTpol 100d, but the MCMF analysis identifies a cluster with redshift z = 0.517 and f_cont = 0.02. Additionally, the MCMF analysis identifies a second structure at z = 1.11 with f_cont = 0.27. The peak of the SZE signal is approximately in the middle of the two optical structures, which are separated from each other by 100 arcsec. The relatively large separation between the SZE and optical structure positions may have contributed to this system not being confirmed until now. The low probability that two noise fluctuations in the two SZE surveys agree to within 29 arcsec makes it quite clear that the SZE detection itself is real. The large offset between the optical and the SZE centre could be either …

The last cluster is SPT-CL J0002-5214, with S/N ξ = 4.48 in SPT-SZ and 5.88 in SPTpol. This cluster is listed as a non-detection in SPTpol, but there is a note that there is a potential group at z = 0.44. Noteworthy here is that according to simulations there should not be any noise fluctuations this large in the SPTpol 100d sample. The analysis with MCMF identifies two structures: one at z = 0.41 with f_cont = 0.183 and a second one at z = 1.09 with f_cont = 0.198. The high-redshift structure is also independently confirmed by the high-z code with a redshift of z = 1.1 and f_cont = 0.009. Visual inspection of Fig. 14 shows the rather compact group at intermediate redshift (z ∼ 0.4), but the high-redshift structure is hard to identify by eye. In the DES color composite image there is no clear cluster core, but there are a large number of high-redshift passive galaxies scattered over a region of 1.6 Mpc diameter. This becomes even clearer when using a combination of DES and Spitzer imaging data. We therefore conclude that this system is likely a high-redshift cluster with a low optical concentration.
In addition to checking for matched sources as above, we also check for SPT-SZ sources with low f_cont that do not appear in the SPTpol 100d catalog. Because SPTpol 100d is substantially deeper, we do not expect many SPT-SZ confirmed clusters to be missed, but scatter in both S/N estimates and the applied selection thresholds does allow for some number of missed systems. In fact we find just one cluster in the overlapping footprints below f_cont = 0.2 that is not matched to a SPTpol 100d source. This source, SPT-CL J2342-5715, has a S/N ξ = 4.33 with f_cont = 0.07 and a redshift z = 0.83 (see Fig. 15). The DES optical image reveals a BCG that is only 33 arcsec away from the SZE peak, but the richness of the optical system, λ = 20.9, is relatively low. Within a distance of 1.9 arcminutes we identify a low-z foreground structure harbouring a SUMSS source with a flux of ∼60 mJy. Given the f_cont value, we can expect to have one contaminating source in the overlapping footprint. At the same time, given the scatter in S/N in both surveys, the adopted thresholds in S/N and the low S/N of this particular system, one could well find some clusters at ξ > 4 in SPT-SZ that are not detected in SPTpol 100d. To summarise, we find only one SPT-SZ confirmed system that does not appear in the SPTpol 100d catalog, and given the f_cont value this system could indeed be a chance superposition of an SPT-SZ noise fluctuation and an unassociated optical system.

ACT-DR5 cluster catalog
The ACT-DR5 cluster catalog (Hilton et al. 2021) is an SZE-selected cluster catalog built using ACT survey data. ACT has similar properties to SPT. The catalog contains 1,843 clusters over the full DES footprint with ACT S/N ≥ 4. Allowing for offsets of up to 150 arcsec, we find 415 matches with our SPT-SZ MCMF catalog, where the largest separation is 98 arcsec. Of those matches, 62 clusters have SPT-SZ S/N ξ < 4.5, and all of them show f_cont < 0.1, which indicates that these are very likely real clusters. Out of the full overlapping sample of 415, we find two clusters with f_cont > 0.3 and one with 0.2 < f_cont < 0.3; all of them are known SPT-SZ clusters with ξ > 5 and would have been considered as confirmed, given the f_cont settings tuned for the ξ > 5 sample. We also find three systems with different redshift estimates. Two of them have two similarly good optical counterparts in the MCMF based analysis, where it is the MCMF second ranked system that agrees with the ACT-DR5 redshift. The remaining cluster, SPT-CL J0619-5802, has only one clear MCMF counterpart at z = 0.523, in agreement with previous SPT-SZ work. The corresponding ACT cluster ACT-CL J0619.7-5802 is listed with a DES redMaPPer-based redshift of z = 0.391. Visual inspection supports the MCMF analysis with redshift z = 0.523.
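A positional cross-match of this kind can be sketched as a nearest-neighbour search within a maximum angular separation. The example below is a minimal illustration rather than the matching code used for this comparison: it uses the haversine formula for the great-circle separation, a brute-force loop over both catalogs, and hypothetical source names and coordinates.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def cross_match(cat_a, cat_b, max_sep_arcsec=150.0):
    """Match each (name, ra, dec) in cat_a to its nearest neighbour in cat_b."""
    matches = []
    for name_a, ra_a, dec_a in cat_a:
        best = None
        for name_b, ra_b, dec_b in cat_b:
            sep = separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= max_sep_arcsec and (best is None or sep < best[1]):
                best = (name_b, sep)
        if best is not None:
            matches.append((name_a, best[0], best[1]))
    return matches


# Hypothetical coordinates, for illustration only
cat_a = [("A1", 10.0, -50.0), ("A2", 20.0, -55.0)]
cat_b = [("B1", 10.0, -50.02), ("B2", 20.0, -55.1)]
for name_a, name_b, sep in cross_match(cat_a, cat_b):
    print(f"{name_a} <-> {name_b}: {sep:.1f} arcsec")
```

For catalogs with thousands of entries a tree-based or gridded search would be preferable to the double loop, but the selection logic stays the same.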

Distribution of contaminants in SPT-SZ candidate list
We can use the MCMF algorithm to estimate the number of contaminants as a function of ξ in the initial SPT-SZ candidate list. Because the SZE is a distinct, negative signal in the 90 and 150 GHz SPT-SZ bandpasses, the contamination in SZE-selected candidate catalogs is dominated by noise fluctuations. Because the noise is close to Gaussian, the number of false detections can be expected to follow that of a Gaussian noise field. The number of contaminants for the SPT-SZ catalog was estimated previously by running the SZE-based cluster finder on source-free simulations (Bleem et al. 2015). The cumulative number of contaminants as a function of S/N ξ is shown in Fig. 16 together with the best fit model for Gaussian noise (red line and blue dashed line, respectively). The Gaussian model describes the number of contaminants for ξ > 4 with two free parameters: 1) the standard deviation of the noise and 2) a normalisation parameter that is related to the ratio of the total survey solid angle to the effective solid angle of the filter functions used to detect clusters in the maps. This Gaussian model provides an excellent fit to the simulation results.
In the same figure we show the measured number of contaminants extracted using the MCMF-based contamination analysis described in Section 4.2. Interestingly, the shape follows closely that expected from the Gaussian noise model and the image simulations, but the normalization is lower. In comparison to the Gaussian model fit to the image simulation results, the MCMF-based estimate can be better matched if the standard deviation of the noise is reduced by 2.3%. A mild overestimation of the noise in the SPT-SZ data could therefore lead to the overestimation of the contaminants apparent in Fig. 16. We note here that this difference becomes insignificant at ξ > 5, the threshold of the sample used in previous SPT-SZ cosmological studies, but is large compared to the Poisson uncertainties at ξ ≤ 4.7.
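The strong sensitivity of the false-detection counts to the noise amplitude can be illustrated with a toy model. The sketch below assumes that the cumulative number of false detections above a threshold ξ_min follows a one-sided Gaussian tail, N(>ξ_min) = A × ½ erfc(ξ_min / (√2 σ)), with σ in units of the nominal noise level. This functional form and the function names are assumptions made for illustration; the exact parameterisation of the model fitted in Fig. 16 may differ.

```python
import math


def n_false(xi_min, sigma, norm):
    """Cumulative false detections above xi_min for Gaussian noise.

    sigma: noise standard deviation relative to the nominal value.
    norm: overall normalisation (effective number of independent cells).
    """
    return norm * 0.5 * math.erfc(xi_min / (math.sqrt(2.0) * sigma))


def suppression(xi_min, sigma_shift=-0.023):
    """Ratio of false detections after a fractional change of sigma."""
    shifted = n_false(xi_min, 1.0 + sigma_shift, 1.0)
    nominal = n_false(xi_min, 1.0, 1.0)
    return shifted / nominal


for xi_min in (4.0, 4.5, 5.0):
    print(f"xi > {xi_min}: counts scaled by {suppression(xi_min):.2f}")
```

In this toy form, a 2.3% reduction of the noise standard deviation lowers the number of false detections above ξ = 4 by roughly a third, and by a larger factor at higher thresholds.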
In summary, the distribution of contaminants is consistent with Gaussian noise, as expected, and therefore extremely sensitive to the amplitude of that noise. There is an offset in the number of contaminants predicted by the image simulations and inferred through the MCMF-based analysis that can be explained by a 2.3% change in the standard deviation of the noise. In the next section we model the cluster counts and find evidence that points to an overestimate of the contamination in the SPT-SZ candidate list from the image simulations.

SPT-SZ MCMF validation using cluster counts
Given the new SPT-SZ MCMF cluster sample (Table 1) together with constraints on the residual contamination (Section 4.2) and incompleteness due to optical cleaning (Section 4.3), we can obtain the cluster number counts as a function of the SPT-SZ S/N threshold (ξ_min), and compare them with the prediction using the results from the cosmological analysis of the previous SPT-SZ sample with ξ_min = 5 (Bocquet et al. 2019). Here of course we are mainly interested in the behavior of the new SPT-SZ MCMF clusters with ξ_min < 5.
The expected number of clusters from the MCMF-based mixture model method, as well as from subtracting the simulation-based number of expected false detections from the full list of candidates, is shown in Fig. 17 (left panel) alongside the predicted number of clusters. Note that the uncertainties shown for the predicted cluster counts represent the Poisson noise only and do not include the error budget due to uncertainties on cosmology and scaling relation parameters. The uncertainties for the optical method predominantly depend on the number of contaminants in the sample and therefore become small at high ξ_min. As can be seen in Fig. 17, the predicted number of clusters shown in magenta agrees with the observed number using the optical method at the 1σ level, and the behaviour at ξ_min < 5 appears to be a meaningful extension of the high ξ_min regime. By contrast, the number of clusters expected from using the simulation-based false detection estimate appears to decrease at lower ξ_min, supporting the picture of a mild overestimation of the noise level in the simulations. The difference between simulation-based and optical-based estimates becomes insignificant at ξ_min = 5, which was used in previous SPT-SZ-based cosmological studies.
Using the results from Bocquet et al. (2019), we can further compare the observed and the expected redshift distributions, which we present in the right panel of Fig. 17. Here we use the ξ > 4.25 subsample, which, according to Fig. 7, is 96% pure and 96.5% complete with respect to the initial SZE candidate selection. By design the f_cont-based selection aims to maintain a constant level of contamination as a function of redshift. Contamination therefore should not alter the shape of the redshift distribution of the sample. The red line in the right panel of Fig. 17 shows the predicted redshift distribution using the results from Bocquet et al. (2019), normalised to the same total number of clusters. The predicted and observed shapes of the redshift distributions agree remarkably well. Under the assumption that the contamination fraction is indeed constant over redshift, this suggests that the incompleteness introduced by the f_cont < 0.2 selection is not significantly impacting the redshift distribution either.

Figure 17. Left: Number of clusters as a function of ξ_min; the prediction based on Bocquet et al. (2019) is shown in magenta (the 68% confidence region includes Poisson noise only). Right: Redshift distribution of the ξ > 4.25 subsample from Table 1 (black) and predicted redshift distribution according to Bocquet et al. (2019) (red). The predicted counts in ξ space are consistent with the observations, indicating that the sample is an extension of the previous ξ > 5 sample. The agreement of the shape of the redshift distribution with the prediction suggests that the incompleteness introduced by optical cleaning is not particularly pronounced at any redshift.

CONCLUSIONS
In this paper, we present the SPT-SZ MCMF cluster catalog with candidates selected to have SPT-SZ S/N ξ > 4 that are then confirmed using the MCMF algorithm. This sample represents a ≈50% increase in size compared to the previous SPT-SZ catalog and contains 811 clusters with 9% contamination. Subsamples of this new catalog can be selected to have different characteristics (see Table 1). Considering an SPT-SZ S/N threshold ξ > 4.25 with stricter f_cont constraints in order to remove chance superpositions (f_cont < 0.125), we obtain 640 clusters with 96% purity. This subsample has a modest 3.5% incompleteness due to optical cleaning with the MCMF algorithm. This sample should meet the requirements for a cosmological analysis and corresponds to a factor of two increase compared to the previous SPT-SZ cluster catalog used for cosmological analysis (Bocquet et al. 2019).
We use information derived from our MCMF-based analysis to infer the level of the initial contamination in the SZE-selected sample above several S/N thresholds as well as the purity and completeness after optical confirmation. This information can be used to select the combination of purity, sample size and completeness best suited for a given science study. Studies less impacted by contamination or that suffer from small number statistics may choose larger but more contaminated subsamples, while studies sensitive to contamination may use cleaner but smaller subsamples. The measured initial contamination, expressed as the number of false detections above a S/N threshold, follows the shape expected for Gaussian noise. We find a systematic difference between our measurements and those predicted by simulations that could be explained if the noise assumed in the simulation was overestimated by a small amount (2.3%). Comparing the number of false detections with the number of candidates, we find further evidence that the simulation-based estimates overpredict the number of false detections, as the number of real systems (N_cand − N_false) above a S/N threshold appears to decrease when lowering the threshold.
A validation test consisting of the comparison of the S/N ξ and redshift z distributions of the new SPT-SZ MCMF sample to the predictions extrapolated from the previous cosmological analysis of the ξ > 5 subsample (Bocquet et al. 2019) shows good agreement. This gives us confidence that the new sample is well suited for an updated cosmological analysis that will be carried out in combination with the DES weak lensing dataset to constrain cluster masses (Bocquet et al., in prep.). The subsample anticipated for that study is the more conservative subsample that contains 480 clusters with SPT-SZ S/N ξ > 4.5 and z > 0.25 (see Table 1).
Combining the SPT-SZ sample with SPT-ECS (Bleem et al. 2020) and SPTpol 100d (Huang et al. 2020), the total number of confirmed SPT-selected clusters now rises to 1,343. This number will rise further with the soon-to-be-published sample from SPTpol 500d (Bleem et al., in prep.).

Figure 1. Passive galaxy colors versus redshift in the COSMOS field, including the W1−W2 color from the unWISE catalog (top), and the DES z minus WISE W1 color (bottom). The observed galaxy colors suggest that cluster redshift constraints can be obtained out to z ≈ 1.5 when adding WISE data to the DES data set.

Figure 2. Comparison of redshift estimates from the previous SPT-SZ catalog (Bocquet et al. 2019) and those derived with the WISE-based high-z code. Spectroscopic redshifts are shown in red. WISE-based redshifts show generally good agreement with spectroscopic redshifts over the full redshift range, although they are only used for clusters at z > 1 in this work.

Figure 3. Distribution of richness λ over SZE-based mass estimate M_500 for richness measurements from the DES-only MCMF code (black) and the WISE-based high-z code (red). Continuous curves show the best-fit normal distributions to values above log(λ/M_500) = 1.2, with best-fit standard deviations of σ = 0.14 (black) and σ = 0.13 (red).

Figure 4. Redshift distribution of the new SPT-SZ MCMF cluster sample containing 811 clusters with 9% contamination (red background) in comparison to the previous SPT-SZ catalog (blue) and the SPTpol Extended Cluster Survey (SPT-ECS) catalog (yellow).

Figure 6. Example of the empirical estimation of the initial contamination f_SZE−cont based on the distribution of candidates in log_10(10^14 λ/M_500). The model (green) describes the richness distribution of SPT-SZ candidates as a mixture of contaminants (blue) and clusters. In this composite model, the clusters are modeled as a Gaussian distribution, while the contaminant population is defined using measurements along random lines-of-sight.

Figure 7. The purity as a function of completeness is shown for five different values of the SPT-SZ selection threshold ξ. These curves are built for each sample by varying the MCMF-defined optical selection threshold in f_cont. By tuning the f_cont threshold to lower values, one can create a final catalog with an increased purity at the cost of introducing additional incompleteness.
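The purity side of this trade-off can be sketched numerically. The snippet below assumes the MCMF convention that a sample selected with f_cont < f_cont^max has a contamination equal to f_cont^max times the initial contamination f_SZE−cont of the candidate list. This relation, the function name and the input value are illustrative assumptions, not numbers from Table 1. The completeness side of the curve depends on the measured richness distributions and is not modeled here.

```python
def purity_after_cut(f_cont_max, f_sze_cont):
    """Expected purity of a sample selected with f_cont < f_cont_max.

    Assumes the final contamination is f_cont_max * f_sze_cont, where
    f_sze_cont is the initial contamination of the SZE candidate list.
    """
    if not (0.0 <= f_cont_max <= 1.0 and 0.0 <= f_sze_cont <= 1.0):
        raise ValueError("both fractions must lie in [0, 1]")
    return 1.0 - f_cont_max * f_sze_cont


if __name__ == "__main__":
    f_sze_cont = 0.4  # hypothetical initial contamination, for illustration only
    for f_cont_max in (1.0, 0.5, 0.2, 0.1, 0.05):
        purity = purity_after_cut(f_cont_max, f_sze_cont)
        print(f"f_cont_max={f_cont_max:.2f}  purity={purity:.3f}")
```

Under this assumption the purity rises linearly as the f_cont threshold is lowered, so the practical limit on cleaning is set by the incompleteness that stricter cuts introduce.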

Figure 8. Comparison of different optical morphology estimators described in Section 4.5 for SPT-SZ MCMF clusters (f_cont < 0.2) with λ > 25. The estimators probe different merging properties, but are well correlated.

Figure 9. Top: DES RGB image of the 9'×5.5' region around SPT-CL J0522-4818. Bottom: Smoothed Chandra X-ray image of the same region. The cluster is classified as one of the most unrelaxed systems in the optical while having a low X-ray-based disturbance estimate. The two clusters are likely in a pre-merger or early merger state, where the X-ray surface brightness distribution of the main system probed by the X-ray estimators is not yet affected by the merger process.

Figure 11. SPT-CL J2331-5736, the cluster with the highest redshift in SPTpol 100d. The top image shows a DES three-band color composite, and the bottom image shows a color composite of two DES bands and Spitzer ch1. The Spitzer image is taken from the SSDF (SPT Spitzer Deep Field). White contours show SPT-SZ S/N levels starting at 1 and increasing in steps of one. The green circle shows the location of a bright radio source detected in SUMSS. MCMF finds two counterparts: the high-z source visible only in the bottom image (f_cont = 0.16) and the low-z cluster close to the radio source (f_cont = 0.005).

Figure 12. DES three-band color composite image of SPT-CL J2321-5419. White contours show SPT-SZ S/N levels starting at 1 and increasing in steps of one. A bright star makes it difficult to identify the z = 0.79 cluster members around the star north of the SZE peak.

Figure 13. DES three-band color composite image of SPT-SZ-CL J2357-5953, an SPT-SZ to SPTpol 100d match that was not confirmed in SPTpol 100d. White contours show SPT-SZ S/N levels starting at 1 and increasing in steps of one. There are two structures, one at z = 0.517 and another at z = 1.11, with corresponding f_cont values of 0.02 and 0.18, that are visible to the southeast and northwest of the SZE peak.

Figure 14. SPT-CL J0002-5214, an SPT-SZ match to SPTpol 100d not confirmed in SPTpol 100d. The top image shows a DES three-band color composite, and the bottom image shows a color composite of two DES bands and Spitzer ch1. White contours show SPT-SZ S/N levels starting at 1 and increasing in steps of one. There are two counterparts: one at z = 0.41 with f_cont = 0.183 and another at z = 1.1 with f_cont = 0.009.

Figure 15. DES three-band color composite image of SPT-SZ-CL J2342-5715. With f_cont = 0.07 and redshift z = 0.83, it is the only f_cont < 0.2 source that does not have a match in SPTpol 100d within the overlapping footprint. White contours show SPT-SZ S/N levels starting at 1 and increasing in steps of one. The green circle shows the location of a bright radio source detected in SUMSS.

Figure 17. Left: Observed and predicted cluster counts above a given SZE selection threshold ξ_min. All candidates are shown in gray, candidates minus the predicted contamination from image simulations in red, and clusters expected from the mixture model method (see Figs. 5 and 6) in black. The predicted number of clusters according to Bocquet et al. (2019) is shown in magenta (the 68% confidence region includes only Poisson noise). Right: Redshift distribution of the ξ > 4.25 subsample from Table 1 (black) and the predicted redshift distribution according to Bocquet et al. (2019) (red). The predicted counts in ξ space are consistent with the observations, indicating that the sample is an extension of the previous ξ > 5 sample. The agreement of the shape of the redshift distribution with the prediction suggests that the incompleteness introduced by optical cleaning is not particularly pronounced at any redshift.

Table 1. Properties of the SPT-SZ MCMF cluster catalog along with three subsamples. The table shows the sample name, the selection criteria f_cont^max and ξ_min, the expected final sample purity, the completeness with respect to the SZE selection, the total number of confirmed clusters and the number of those above redshift z = 0.25.

Table 1
S/N threshold, purity, completeness, number of confirmed clusters and number of clusters at z > 0.25. Here the listing of clusters above z = 0.25 is of special relevance for cosmological analyses, which have typically excluded lower redshift systems due to the angular filtering in the SPT cluster selection (see, e.g., de Haan et al. 2016; Bocquet et al. 2019).
Therefore, it is crucial to know the contamination fraction in the initial candidate list. We use two methods to estimate f_SZE−cont for the different SPT-SZ candidate lists (i.e., with different SZE selection thresholds in ξ). The first follows our previous work in Hernández-Lang et al. (2023) and uses the fact that in f_cont < f_cont^max selected samples, the completeness should reach 100% for high values of f_cont^max ∼ 1. The number of expected real cand

Figure 5. Empirical estimation of the initial contamination f_SZE−cont in different SPT-SZ subsamples. Top: Example of the f_cont^max-based method (see discussion in Section 4. Cumulative number of contaminants in the SPT-SZ candidate catalog as a function of the SZE signal-to-noise threshold ξ_min extracted from image simulations (red) and measured from the SPT-SZ catalog using the MCMF-based mixture model (black line with uncertainties). The simulation-based as well as the MCMF-based estimates can be well described by Gaussian noise models (dashed blue and cyan lines). The higher number of contaminants in the simulations can be explained by a 2.3% overestimate of the Gaussian noise used in the image simulations. See discussion in Section 4.2.