The AGN fraction in high-redshift protocluster candidates selected by Planck and Herschel

A complete understanding of the mass assembly history of structures in the universe requires the study of the growth of galaxies and their supermassive black holes (SMBHs) as a function of their local environment over cosmic time. In this context, it is important to quantify the effects that the early stages of galaxy cluster development have on the growth of SMBHs. We used a sample of Herschel/SPIRE sources in $\sim$ 228 red and compact Planck-selected protocluster (PC) candidates to estimate the active galactic nuclei (AGN) fraction from a large sample of galaxies within these candidates. We estimated the AGN fraction by using the mid-infrared (mid-IR) photometry provided by the WISE/AllWISE data of $\sim650$ counterparts at high redshifts. We created an AllWISE mid-IR colour-colour selection using a clustering machine learning algorithm and two {\it WISE} colour cuts using the 3.4 $\mu m$ (W1), 4.6 $\mu m$ (W2) and 12 $\mu m$ (W3) passbands, to classify sources as AGN. We also compared the AGN fraction in PCs with that in the field to better understand the influence of the environment on galaxy development. We found an AGN fraction of $f_{AGN} = 0.113 \pm 0.03$ in PC candidates and an AGN fraction of $f_{AGN} = 0.095 \pm 0.013$ in the field. We also selected a subsample of `red' SPIRE sources with a higher overdensity significance, obtaining $f_{AGN} = 0.186 \pm 0.044$, versus $f_{AGN} = 0.037 \pm 0.010$ for `non-red' sources, consistent with higher AGN fractions for denser environments. We conclude that our results point towards a higher AGN fraction in PCs, similar to other studies.


INTRODUCTION
Galaxies in the universe are not randomly distributed in space; instead, they can be isolated (i.e. a field galaxy) or in gravitationally bound structures, such as groups or galaxy clusters (e.g. Oort 1983; Waldrop 1983).
This raises important questions about the differences between galaxies as a function of their environment and how their evolutionary paths vary over cosmic time from the first density fluctuations to local current structures. To answer these questions, it is necessary to study protoclusters (PCs) of galaxies, the progenitor structures of today's massive clusters, during the epoch of their formation (Muldrew et al. 2015; Overzier 2016; Muldrew et al. 2018). Consequently, there is a need to identify and characterise PCs at high redshift, at z ∼ 2−3 ('cosmic noon'), which corresponds to the time in cosmic history when the peak of the SFR density of the Universe occurs (Madau & Dickinson 2014; Förster Schreiber & Wuyts 2020).
★ E-mail: calgaticai@gmail.com
In order to provide a complete picture of galaxy evolution as large-scale structures assemble and develop, we must understand the simultaneous growth of galaxies and their supermassive black holes (SMBHs). The active growth of a SMBH in a galaxy is typically signalled during its most vigorous phases of mass accretion (active galactic nucleus; AGN). In this regard, there is evidence of a peak at z ∼ 2−3 for high-luminosity AGN (Hasinger et al. 2005; Fanidakis et al. 2012), cosmic BH accretion (e.g. Croom et al. 2009; Delvecchio et al. 2014) and the space density of quasars (e.g. Brown et al. 2006; Richards et al. 2006).
AGN activity in different environments has also been explored. Results include lower average AGN fractions in clusters at redshift z < 0.5 (Mishra & Dai 2020) when compared to the field, no dependence of the optical AGN activity on environment in blue galaxies (Miraghaei 2020), higher AGN fractions for massive galaxies than lower mass galaxies (Pimbblet et al. 2013), similar AGN fractions in clusters and the field at 0.5 < z < 0.9 (Klesman & Sarajedini 2012), and an increase of AGN fractions with redshift (Eastman et al. 2007). Nevertheless, it is still unclear whether or not the local environment of galaxies plays a significant role in the growth of galaxies and their SMBHs. To clarify this, a statistical study of the environment hosting AGN activity is required, and in particular, it is important to determine the occurrence of AGN in PCs and at different and higher redshifts.
Studies of high-redshift PCs concluded that they exhibit higher fractions of AGN and star-forming galaxies compared to the field, as opposed to overdensities at lower redshifts, which have lower fractions than the field (e.g. Overzier 2016, a review). For instance, AGN fractions measured in PCs range between 2 and 20 times higher than in the field (Lehmer et al. 2009, 2013; Digby-North et al. 2010; Krishnan et al. 2017). Also, Polletta et al. (2021) found similar results for the AGN fraction (= 13% ± 6%) in a PC at z = 2.16. In contrast, Macuga et al. (2019) found a PC at z = 2.53 with an AGN fraction ∼2 times lower than in the field, indicating a lack of clarity regarding the AGN activity in PCs. Further, all of these studies showing a larger AGN fraction are based on X-ray selected AGN (see Casey et al. 2014 for a review). This type of selection is biassed against highly obscured sources (Hickox & Alexander 2018; Hatcher et al. 2021, and references therein). Therefore, to provide a complete picture, other methods must be used to select AGN.
Comparing all of these studies is difficult, since they use different methods for selecting AGN or estimating the AGN contribution, different sensitivity limits, and different definitions of non-AGN host galaxies (see Padovani et al. 2017 for a review). Whether differences in AGN fractions are due to redshift evolution, observational biases of PCs selected in different halo masses or evolutionary states, variations in the general and systematic properties of PCs depending on how they are selected, or just individual PC-to-PC variations remains a crucial open question.
Large samples of PC candidates have been built using large photometric surveys that have mapped significant areas of the sky, and some effort has been made to characterise these kinds of environments (e.g., Chiang et al. 2013; Umehata et al. 2015; Lee et al. 2016; Shimakawa et al. 2018; Miller et al. 2019). Performing a larger census of galaxies, especially AGN, within PCs is crucial to understanding the physical processes involved and determining whether the environment of a forming galaxy cluster at high redshift can trigger or drive the growth of SMBHs in its member galaxy population.
The main goal of this study is to measure the AGN fraction in a large sample of PC candidates. We use the sample of 228 Planck-selected PC candidates presented in Planck Collaboration et al. (2015, hereafter Planck XXVII), which itself is drawn from a more general sample, the Planck list of high-redshift source candidates (PHZ, Planck Collaboration et al. 2016). This sample was followed up by Herschel/SPIRE, and it is biassed towards highly star-forming regions. We combined the Planck XXVII catalogue with data from the Wide-field Infrared Survey Explorer (WISE; Wright et al. 2010) AllWISE data release, which has mapped the whole sky. Using WISE sources allows us to use a mid-IR method that selects both obscured and unobscured AGN (Stern et al. 2012). For this, we built a classifier that includes both a clustering machine learning algorithm and W1-W2-W3 colour cuts. With the classification of our sources, we were able to estimate AGN fractions in both PC members and field galaxies.
This paper is organised as follows: in Section 2 we describe our Planck XXVII sample and its WISE counterparts, along with the control sample needed to construct our classifier; in Section 3 we present how we classify AGN sources with our classifier together with estimates of the method uncertainty; in Section 4 we present the measured AGN fractions and our comparison to previous results in the literature; in Section 5 we discuss our results; and in Section 6 we summarise our results and present our conclusions.

DATA & CATALOGUE
In this work, we use a catalogue of Planck colour-selected sources from Planck Collaboration et al. (2015; Planck XXVII), which corresponds to a catalogue of high-redshift protocluster candidates. This sample has follow-up observations with Herschel/SPIRE, and the sources detected at > 3σ in the Herschel/SPIRE 350 μm band will be referred to as "SPIRE sources". This sample is what we consider our main sample, and it is described in Section 2.1.1. To have higher resolution photometry than Herschel/SPIRE, we derive the AGN fraction using their WISE counterparts. The description of this sample is in Section 2.1.2. Also, to create a classification scheme that selects AGN, we compiled a control sample that includes catalogues of AGN (see Section 2.2.1) and non-AGN sources (see Section 2.2.2).

Planck XXVII
Our main sample consists of the Herschel/SPIRE follow-up observations of 228 Planck sources from Planck Collaboration et al. (2015). These fields, selected as cold sources of the cosmic infrared background (CIB) and from the Planck Catalogue of Compact Sources (PCCS), were chosen for follow-up because their rest-frame far-infrared colours show a peak in the frequency range 353-857 GHz, allowing the selection of ultra luminous infrared galaxies.
This sample is dominated by dusty far-infrared galaxies, with high star formation rates, suggesting the signatures of highly star-forming protoclusters at high redshift, some line-of-sight projections (Negrello et al. 2017), and strongly-lensed sources. Therefore, it is important to note that this study targets a specific population of galaxies in protoclusters, i.e. their most star-forming population.
For this study in particular, we have discarded the Herschel/SPIRE sources that are considered lensed (Cañameras et al. 2015; Dole H., private communication). After removing the lensed sources, we are left with 193 Planck sources.
Although this catalogue offers a good opportunity to study a large number of star-forming galaxies in protoclusters, it provides neither secure redshift measurements nor enough multiwavelength observations to derive redshift estimates such as photometric redshifts. However, we do have an idea of the redshift range for these sources.
First, since these sources are considered 'cold' sources of the cosmic infrared background (CIB), we know that they are at redshifts z > 1 because the CIB is considered a proxy of intense star formation at those redshifts (Planck Collaboration et al. 2014, 2015, and references therein).
More specifically, Planck observations show that these sources have spectral energy distributions (SEDs) that peak between 353 and 857 GHz, which corresponds to redshifted infrared galaxies at z ∼ 2−4 (Planck Collaboration et al. 2015).
Also, Planck Collaboration et al. (2015) followed the approach of Amblard et al. (2010), and found a suggested redshift range of z ∼ 1.5−3 with their Herschel colours and SEDs of modified blackbodies, with the redshift distribution of the SPIRE sources peaking at z = 2 or 1.3 for dust temperatures of T_d = 35 K or 25 K, respectively.
Planck Collaboration et al. (2015) separated the SPIRE sources into two regions, the 'in' region and the 'out' region. The 'in' region is defined by the 50% Planck intensity contour at 545 GHz, the map with the best signal-to-noise ratio (SNR), and has an approximate radius of ∼ 5 arcmin. Planck Collaboration et al. (2015) performed a statistical analysis of the number counts for these regions and compared them with two control samples, the HerMES 'level 5' Lockman-SWIRE field (Oliver et al. 2010) and the Herschel Lens Survey (HLS) cluster fields at z < 1 of Egami et al. (2010). The statistical analysis shows that 'in' regions exhibit a chromatic excess consistent with a population of high-redshift (z = 2−4) lensed candidates, that 'in' regions at both 350 and 500 μm have higher counts when compared to samples of the Lockman field and the z < 1 HLS cluster fields, that 'in' regions have an excess of SPIRE sources, and that 'out' regions have number counts consistent with the Lockman field and the HLS cluster fields, with a similar density to blind surveys. Thus, the analysis suggests that the 'in' and 'out' regions would be a good method for selecting PC member candidates and field sources, respectively. For instance, this approach is used by Lammers et al. (2022).

Figure 2. Purple and cyan dots show the sources that are 'in' and 'out' of the Planck 50% intensity region, respectively. The 'in' region displays the same spread in colours as the sources in the 'out' region. The over-plotted contours show the colour distribution for our control sample at the 1σ, 1.5σ and 2σ levels. The blue contours show the distribution for AGN sources, while the red contours show the non-AGN sources. Our control sample is used to train and test our classifier (see Section 3) and includes the sources described in Table 1 (see Section 2.2). We see that most of the SPIRE sources have colours in the same range as the colours from the control sample. The AGN distribution tends to be redder in the W1-W2 colour and bluer in the W2-W3 colour when compared to non-AGN galaxies.
In Figure 1 we show the WISE image of one of the PC candidates as an example. We show the W2 band image for the field PLCK_HZ_G086.1plus61.6, along with the 'in' and 'out' sources (pink and cyan circles, respectively). The Herschel 500 μm emission is also shown in yellow contours, showing the difference in resolution between Herschel and WISE. Also, we show the contour at 50% of the peak flux for the Planck image at 545 GHz, which separates the 'in' and 'out' regions.
We have thus decided to take advantage of the AGN diagnostic power provided by WISE to assess the presence of AGN activity in the Planck XXVII PC sample, by using the WISE counterparts of our SPIRE sources. Moreover, since we expect PC members to be bright and red sub-mm sources, we are reducing contamination from non-members by only selecting WISE sources associated with Herschel sources.
The SPIRE sources were cross-matched with the AllWISE data release (Wright et al. 2010; Mainzer et al. 2011), using the public database from the NASA/IPAC Infrared Science Archive (IRSA).
The match was done with the SPIRE 250 μm band and considered only the closest source as a counterpart (avoiding multiple counterparts) within a radius of 9′′, which is half the angular resolution of the SPIRE 250 μm band. This was a conservative choice to limit wrong associations. We also required that the WISE sources were photometrically unaffected by contamination or artefacts (cc_flags='0000') and that they were point-like (ext_flg=0), as expected for high-redshift sources. We use w#mpro Vega magnitudes (where # is the observing band 1, 2, 3 or 4), which is the appropriate magnitude for non-extended sources.
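The closest-counterpart match described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the query actually run through IRSA: the function names, the dictionary layout of the WISE rows, and the toy coordinates are hypothetical; only the 9′′ radius and the cc_flags/ext_flg conditions come from the text.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0

def closest_counterpart(spire_pos, wise_sources, radius_arcsec=9.0):
    """Return the closest clean, point-like WISE source within the radius, or None.

    wise_sources: iterable of dicts with keys ra, dec, cc_flags, ext_flg
    (a hypothetical layout chosen for this sketch).
    """
    best, best_sep = None, radius_arcsec
    for src in wise_sources:
        # Keep only sources free of artefacts and not flagged as extended
        if src["cc_flags"] != "0000" or src["ext_flg"] != 0:
            continue
        sep = angular_sep_arcsec(spire_pos[0], spire_pos[1], src["ra"], src["dec"])
        if sep <= best_sep:
            best, best_sep = src, sep
    return best

# Toy catalogue (invented positions)
wise = [
    {"id": "a", "ra": 150.0, "dec": 2.0010, "cc_flags": "0000", "ext_flg": 0},  # 3.6 arcsec away
    {"id": "b", "ra": 150.0, "dec": 2.0005, "cc_flags": "d000", "ext_flg": 0},  # closer, but flagged
    {"id": "c", "ra": 150.01, "dec": 2.0, "cc_flags": "0000", "ext_flg": 0},    # outside the radius
]
match = closest_counterpart((150.0, 2.0), wise)
```

Keeping only the single nearest source mirrors the choice of avoiding multiple counterparts per SPIRE source.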
After the cross-match, it was necessary to choose an SNR threshold for the WISE bands to obtain a trustworthy sample of counterparts. To decide on an SNR threshold, we derived the completeness level of the sample at different SNR values. To estimate the completeness of the sample, we searched all sources in AllWISE within a search radius of 20 arcmin from the Planck field centres, which is a few times larger than a typical Planck 'in' region, obtaining more than 1 million sources. Then, we compared the mean fluxes of the sample with different SNR limits for each band, with the AllWISE catalogue completeness. Details on how this completeness is computed are found in the AllWISE documentation. After this comparison, we adopted an SNR threshold of SNR ≥ 7, which corresponds to a completeness level of at least ∼ 45% for our AllWISE counterparts sample. We argue that a higher completeness level is not required, considering that our sources from the Planck XXVII catalogue are secure SPIRE detections. Also, this SNR threshold is consistent with a confident point source detection (Lonsdale et al. 2015).
Finally, from our 6904 SPIRE sources, we obtained 646 AllWISE counterparts. Out of the total AllWISE sample, 150 are considered PC members (or 'in' sources) and 496 are considered field galaxies (or 'out' sources). A WISE W1-W2 vs. W2-W3 colour-colour diagram of our AllWISE sources is shown in Figure 2. Here, we show both the PC members ('in') and field galaxies ('out') sources.

Control Sample
To train the AGN classifier, a control sample was compiled with a combination of catalogues for AGN and non-AGN sources with available WISE colours. Considering the suggested redshift ranges for the SPIRE sources (discussed in Section 2.1.1), and the redshift range of the confirmed structures from the sample, sources were selected between 1 ≤ z ≤ 3.
If the catalogue includes the WISE photometry, then the magnitudes and colours were retrieved from the catalogue itself. Otherwise, a cross-match with AllWISE was done, following the same procedure as for the SPIRE sources, but using a search radius of 6′′, which corresponds to half the best angular resolution of the WISE bands.

AGN sources
For the AGN subsample, we selected AGN sources from the Million Quasars (Milliquas) catalogue, version 7.2 (Flesch 2021), the AGNs in the MIR using AllWISE data (Secrest et al. 2015)  The Flesch (2021) catalogue corresponds to a compilation of ∼ 800,000 quasars up to 30 April 2021, and is the updated version of the Flesch (2015) catalogue. It includes different types of sources, and we only selected secure quasar objects.

Non-AGN galaxies
Non-AGN sources were selected from the catalogue of star-forming galaxies at z ∼ 1.6 in the FMOS-COSMOS survey from Kashino et al. (2019) and the catalogues from the Cosmic Assembly Near-IR Deep Extragalactic Legacy Survey (CANDELS; Grogin et al. 2011; Koekemoer et al. 2011). In particular, we used the GOODS-S CANDELS and UDS CANDELS stellar mass catalogues from Santini et al. (2015), the CANDELS-EGS stellar mass catalogue from Stefanon et al. (2017) and the CANDELS-COSMOS multiwavelength catalogue from Nayyeri et al. (2017).
The catalogue from Kashino et al. (2019) contains 5,484 objects observed over the COSMOS field, with ∼ 30% of them being within 1 ≤ z ≤ 3. AGN sources were discarded using catalogues of X-ray sources (Kashino et al. 2019). After cross-matching, only 516 sources with AllWISE photometry remained.
The CANDELS catalogues (Stefanon et al. 2017; Nayyeri et al. 2017; Santini et al. 2015) were chosen because they include an AGN flag, which allows the selection of non-AGN sources. This flag comes from SED fitting of multi-wavelength observations.

Balanced control sample
The resulting count of sources is a total of ∼ 21,000 AGN sources and 636 non-AGN. The number of AGN sources is much greater than the number of non-AGN sources, but a statistically well-balanced sample is necessary to train the classifier without biases.
Therefore, we reduced our AGN sample to match the number of non-AGN sources. For this, we randomly selected 636 AGN sources. This leaves 636 AGN and 636 non-AGN sources, for a total of 1272 sources, summarised in Table 1.
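The balancing step amounts to random undersampling of the larger class. A minimal sketch follows, assuming the catalogues are held as plain Python lists; the integer identifiers stand in for catalogue rows, and the function name and seed are hypothetical.

```python
import random

def undersample_to_match(majority, minority, seed=0):
    """Randomly draw, without replacement, as many majority rows as there are minority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority))

# Placeholder identifiers standing in for catalogue rows
agn_ids = list(range(21000))      # ~21,000 AGN in the control catalogues
non_agn_ids = list(range(636))    # 636 non-AGN sources

agn_balanced = undersample_to_match(agn_ids, non_agn_ids)
print(len(agn_balanced), len(non_agn_ids), len(agn_balanced) + len(non_agn_ids))
# prints: 636 636 1272
```

Fixing the seed makes the drawn subsample reproducible; the text does not say whether a fixed seed was used.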
The colour-colour diagram for our final control sample can be seen in the right panel of Figure 2. Here we distinguish between the AGN and non-AGN samples. At first glance, no strong separation between AGN and non-AGN can be seen. However, the AGN distribution tends to be redder in the W1-W2 colour and bluer in the W2-W3 colour when compared to non-AGN galaxies.

AGN CLASSIFICATION
We design a colour-colour selection criterion (i.e. a classifier) to sort our SPIRE sources into AGN or non-AGN galaxies. This is achieved by finding a way of separating both types of galaxies in a W1-W2-W3 colour-colour space. This classifier uses two main criteria. First, a K-means clustering machine learning algorithm (MacQueen 1967; Lloyd 1982) is applied to the WISE colour-colour diagram of known AGN and non-AGN sources (i.e. the control sample). After that, a mid-IR/WISE colour selection criterion is applied.
In this case, we use two colour cuts of W1-W2 and W2-W3 (see next subsection). The colour cut at W1-W2 > 0.94 is higher than the value used in other studies (e.g. 0.8 in Stern et al. 2012 and 0.5 in Blecha et al. 2018). This type of classifier is based on similar studies that were able to distinguish different types of galaxies, mostly at lower redshifts, in a colour-colour diagram of WISE W1-W2 and W2-W3 colours (Lake et al. 2012; Mingo et al. 2016; Jarrett et al. 2017).
After classifying the WISE counterparts of our SPIRE sources, we estimate the AGN fractions for both the PC members ('in' sources) and field galaxies ('out' sources). The AGN fraction uncertainty is estimated via a Monte Carlo approach.

Building the classifier
Before training our classifier, we subdivide our control sample into a training set, corresponding to 85% of the full sample, and a test set, corresponding to the remaining 15%. This resulted in 1,081 galaxies for the training set and 191 galaxies for the test set. The percentages that we used to make each sub-sample were chosen so as to have enough sources to train the classifier, and enough sources to evaluate its accuracy.
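The 85%/15% split can be reproduced with a simple shuffle. The sketch below is an illustration only: the function name and seed are hypothetical, and rounding the test-set size up is an assumption made because it reproduces the quoted sizes of 1,081 and 191.

```python
import math
import random

def split_train_test(sample, test_fraction=0.15, seed=0):
    """Shuffle a copy of the sample and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    # Assumption: the test-set size is rounded up to the next integer
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]

control_sample = list(range(1272))   # placeholder rows for the balanced control sample
train, test = split_train_test(control_sample)
print(len(train), len(test))
# prints: 1081 191
```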
The first part of the classifier consists of using a K-means algorithm from the Python package Scikit-learn (Pedregosa et al. 2011). K-means is an unsupervised machine-learning clustering algorithm. This algorithm subdivides the sample into clusters so that the sum of the squared distances between each data point and the centre of its cluster, in the W2-W3 vs W1-W2 colour-colour space, is minimised.
The K-means algorithm takes the number of clusters, K, as an input parameter, which determines how many clusters the data will be divided into. Considering that we want to distinguish between AGN and non-AGN, we set the K parameter as K = 2, thus dividing the data into two clusters.
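To make the clustering step concrete, here is a plain-Python version of Lloyd's algorithm with K = 2. It is a stand-in for the Scikit-learn implementation used in the paper, not a copy of it: the function, its `init` argument, and the six toy colour pairs are assumptions made for illustration.

```python
import random

def kmeans(points, k=2, init=None, n_iter=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2
                                        + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are final
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

# Toy (W2-W3, W1-W2) colours: three AGN-like and three star-forming-like points (invented)
colours = [(3.0, 1.3), (3.1, 1.2), (2.9, 1.4), (4.5, 0.5), (4.6, 0.4), (4.4, 0.6)]
centroids, labels = kmeans(colours, k=2, init=colours[:2])
print(labels)
# prints: [0, 0, 0, 1, 1, 1]
```

Because K-means is unsupervised, the cluster indices carry no physical meaning by themselves; which cluster is called "AGN" has to be decided afterwards from the known labels of the control sample.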
Once the algorithm finishes assigning every data point in the training set to a given cluster, each point gets flagged with either 1 or 0, which means that the source was selected as either an AGN or a non-AGN, respectively. The separation is given by W1-W2 = 1.53(W2-W3) - 4.80. Since running the k-means algorithm alone does not cleanly divide our sample, we added two colour cuts to the classifier. The colour cuts were defined as the mean minus 3σ of the W1-W2 AGN distribution, and as the mean plus 3σ of the W2-W3 colour from our control sample. This corresponds to colour cuts at W1-W2 > 0.94 and W2-W3 < 4.04 (see Figure 6). In summary, we consider a source to be an AGN if its W1-W2 and W2-W3 colours place it on the AGN side of the k-means separation and satisfy both colour cuts, W1-W2 > 0.94 and W2-W3 < 4.04.
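The combined selection (k-means separation plus the two colour cuts) can be written as a small function. The coefficients and cut values are those quoted in the text; the function name is hypothetical, and the side of the separation line assigned to AGN is an assumption, based on the statement that AGN tend to be redder in W1-W2 and bluer in W2-W3.

```python
def is_agn(w1_w2, w2_w3, slope=1.53, intercept=-4.80, w1w2_min=0.94, w2w3_max=4.04):
    """Return True if the WISE colours (Vega mag) pass the combined AGN selection."""
    # ASSUMPTION: AGN lie above the k-means line W1-W2 = 1.53 (W2-W3) - 4.80
    above_line = w1_w2 > slope * w2_w3 + intercept
    return above_line and w1_w2 > w1w2_min and w2_w3 < w2w3_max

print(is_agn(1.2, 3.0))  # red in W1-W2, blue in W2-W3
print(is_agn(0.5, 3.0))  # fails the W1-W2 cut
print(is_agn(1.0, 3.9))  # passes both cuts but lies below the k-means line
# prints: True, False, False (one per line)
```

With these values the separation line only removes sources in the narrow corner 3.75 < W2-W3 < 4.04, where it rises above the W1-W2 = 0.94 cut.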

Testing
We estimate the completeness, reliability and accuracy of the classifier using the test sample, with a size of 191 sources. We first verify the classification considering only the k-means clustering. This results in a classification of 97 true positives and 68 true negatives. Defining the completeness as the number of true positives divided by the sum of true positives and false negatives, we get a completeness of 98%. For the reliability, measured as the number of true positives divided by the sum of true positives and false positives, we get 80%. Lastly, for the accuracy, measured as the sum of true positives and true negatives divided by the total number of sources, we get 86%.
We then tested these parameters using the combined k-means algorithm with the colour cut criterion. This resulted in 83 true negatives and 96 true positives. This means that adding the colour cuts improves the accuracy of our classifier to 94%, with a 97% completeness and a 91% reliability. In Figure 4 we present the confusion matrices, summarising these values.
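The three figures of merit follow directly from the confusion-matrix counts. In the sketch below the true positives and true negatives are the quoted values, whereas the false positives and false negatives are not stated in the text: they are inferred here from the 191 test sources and the quoted percentages, and should be read as an assumption.

```python
def classifier_metrics(tp, tn, fp, fn):
    """Return (completeness, reliability, accuracy) from confusion-matrix counts."""
    completeness = tp / (tp + fn)            # also known as recall
    reliability = tp / (tp + fp)             # also known as precision
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return completeness, reliability, accuracy

# fp and fn are inferred (assumption), tp and tn are quoted in the text
kmeans_only = classifier_metrics(tp=97, tn=68, fp=24, fn=2)
combined = classifier_metrics(tp=96, tn=83, fp=9, fn=3)

for name, metrics in (("k-means only", kmeans_only), ("k-means + colour cuts", combined)):
    c, r, a = (round(100 * m) for m in metrics)
    print(f"{name}: completeness {c}%, reliability {r}%, accuracy {a}%")
# prints: k-means only: completeness 98%, reliability 80%, accuracy 86%
# prints: k-means + colour cuts: completeness 97%, reliability 91%, accuracy 94%
```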

Monte Carlo simulation
To estimate the uncertainty of our method we performed a 10,000-step Monte Carlo simulation. Each step simulates colours in the W1-W2 vs. W2-W3 space, to which we apply our classifier and then measure an AGN fraction. To construct our simulation, we interpolate the W1-W2 and W2-W3 colour distributions of the SPIRE sources. The SPIRE distributions in each colour and their interpolations are shown in Figure 3.
Each step simulates the data using random data points generated from the previously mentioned distributions. The W1-W2-W3 colours of one of the artificially generated distributions are shown in Figure 5. After that, each simulated data point gets a designation of AGN or non-AGN, using our classifier. Finally, the AGN fractions are measured. The Monte Carlo simulation returns a normal distribution of AGN fractions, whose standard deviation σ is the corresponding uncertainty.
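The Monte Carlo procedure can be sketched as follows. Several choices here are assumptions rather than details given in the text: the two colours are drawn independently, the "interpolation" is implemented as linear interpolation of the empirical quantile function, the classifier is a stand-in for the combined selection (with AGN assumed to lie above the k-means line), and the toy colours are invented. The example uses 500 steps instead of 10,000 only to keep it fast.

```python
import random
import statistics

def draw_colour(sorted_obs, rng):
    """Draw one colour by linearly interpolating the empirical quantile function."""
    u = rng.random() * (len(sorted_obs) - 1)
    i = int(u)
    return sorted_obs[i] + (u - i) * (sorted_obs[i + 1] - sorted_obs[i])

def is_agn(w1_w2, w2_w3):
    """Stand-in classifier: both colour cuts plus the k-means line (side assumed)."""
    return w1_w2 > 0.94 and w2_w3 < 4.04 and w1_w2 > 1.53 * w2_w3 - 4.80

def monte_carlo_agn_fraction(w1w2_obs, w2w3_obs, n_sources, n_steps=10_000, seed=0):
    """Return the mean and standard deviation of the simulated AGN fractions."""
    rng = random.Random(seed)
    w1w2_sorted, w2w3_sorted = sorted(w1w2_obs), sorted(w2w3_obs)
    fractions = []
    for _step in range(n_steps):
        # Simulate one mock catalogue and count the sources classified as AGN
        n_agn = sum(
            is_agn(draw_colour(w1w2_sorted, rng), draw_colour(w2w3_sorted, rng))
            for _src in range(n_sources)
        )
        fractions.append(n_agn / n_sources)
    return statistics.mean(fractions), statistics.stdev(fractions)

# Toy "observed" colours (invented numbers, not the SPIRE/AllWISE data)
toy_rng = random.Random(42)
toy_w1w2 = [toy_rng.gauss(0.5, 0.35) for _ in range(150)]
toy_w2w3 = [toy_rng.gauss(3.5, 0.5) for _ in range(150)]

f_mean, f_sigma = monte_carlo_agn_fraction(toy_w1w2, toy_w2w3, n_sources=150, n_steps=500)
print(f"f_AGN = {f_mean:.3f} +/- {f_sigma:.3f}")
```

The scatter of the simulated fractions depends on the number of sources per mock catalogue, which is why a smaller subsample such as the 150 'in' sources is expected to carry a larger σ than the 496 'out' sources.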

Classification of the SPIRE sources
We ran the classifier on our SPIRE sources and found the following outcome. Out of the full catalogue of 646 sources, we found that 64 were selected as AGN, while 582 were selected as non-AGN sources. In particular, we found that there are 17 AGN that correspond to members of PC candidates and 47 that correspond to sources outside the PC candidates, i.e. field galaxies. When it comes to non-AGN, we found 133 that are also PC members and 449 sources that correspond to non-PC members. These numbers are summarised in Table 2. The final classifier is represented in Figure 6. In particular, we show the W1-W2 vs W2-W3 colour-colour diagram for our SPIRE sources. The different symbols distinguish between the sources that the classifier selects as AGN or non-AGN. Also, filled and empty symbols differentiate member galaxies of PCs from field galaxies, respectively. We also over-plotted the (training) control sample as blue and red contours for AGN and non-AGN objects, respectively, to show the distribution of the galaxies we used to train our k-means method.

AGN fractions
After the classification of sources in our SPIRE sample, we proceeded to measure the AGN fraction in both the PC candidates and the field. The resulting AGN fraction for PCs is f_AGN = 0.113 ± 0.03 or 11 ± 3%. For the field, we found an AGN fraction of f_AGN = 0.095 ± 0.013 or 10 ± 1%.
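As a quick consistency check, the quoted fractions follow from the classification counts reported for the 'in' and 'out' sources (17 AGN and 133 non-AGN; 47 AGN and 449 non-AGN). The helper name is hypothetical.

```python
def agn_fraction(n_agn, n_non_agn):
    """Fraction of sources classified as AGN."""
    return n_agn / (n_agn + n_non_agn)

f_in = agn_fraction(17, 133)    # PC member candidates ('in' sources)
f_out = agn_fraction(47, 449)   # field galaxies ('out' sources)
print(f"{f_in:.3f} {f_out:.3f}")
# prints: 0.113 0.095
```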
The uncertainties on each AGN fraction come from Monte Carlo simulations. The Monte Carlo histograms are shown in the top panels of Figure A1 of Appendix A. We note that the AGN fractions measured by the Monte Carlo simulations are quite similar to the actual AGN fractions. This is a good indication that our simulated data are a good representation of the observations. For a better understanding of our results, we also measured the AGN fraction for 'red' SPIRE sources. The 'red' sources come from the selection of the reddest Herschel sources by Planck Collaboration et al. (2015), defined as S_350/S_250 > 0.7 and S_500/S_350 > 0.6, based on source density distributions. This sample of SPIRE red sources has a higher overdensity significance than the SPIRE sources (Planck Collaboration et al. 2015, see Figures 6 and 7), suggesting this method as another way of selecting PC members. Therefore, in this case, we measure AGN fractions for PC members and non-members, considering red SPIRE sources as the PC member candidates and the non-red sources as field galaxy sources.
The AGN fraction of red SPIRE sources is f_AGN = 0.186 ± 0.044 or 19 ± 4%. For the 'non-red' sources, the AGN fraction is f_AGN = 0.037 ± 0.010 or 4 ± 1%. The Monte Carlo histograms showing the estimated uncertainty are in the middle panels of Figure A1 of Appendix A. At first glance, if we consider the PC members as 'in' sources and field galaxies as 'out' sources, we find a higher AGN fraction in PC candidates, but the difference is not statistically significant considering the uncertainties of our estimates. However, if we consider the PC members as the 'red' sources and field galaxies as the 'non-red' sources, we find a clear increase of the AGN fraction in the PC candidates with respect to the field, by at least a factor of 3 (with 1σ uncertainty).
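The two comparisons can be made quantitative with a short calculation. This is a sketch under stated assumptions: the text does not say how significance was assessed, so the sketch assumes independent Gaussian uncertainties added in quadrature, and it reads "at least a factor of 3 (with 1σ uncertainty)" as the ratio obtained after shifting both fractions by 1σ towards each other. Function names are hypothetical.

```python
import math

def difference_in_sigma(f1, s1, f2, s2):
    """Difference of two fractions in units of their combined uncertainty (quadrature sum)."""
    return (f1 - f2) / math.hypot(s1, s2)

def min_ratio_at_1sigma(f1, s1, f2, s2):
    """Ratio f1/f2 after lowering f1 and raising f2 by 1 sigma each."""
    return (f1 - s1) / (f2 + s2)

in_vs_out = difference_in_sigma(0.113, 0.03, 0.095, 0.013)
red_vs_nonred = difference_in_sigma(0.186, 0.044, 0.037, 0.010)
ratio = min_ratio_at_1sigma(0.186, 0.044, 0.037, 0.010)

print(f"in vs out: {in_vs_out:.2f} sigma")
print(f"red vs non-red: {red_vs_nonred:.2f} sigma, ratio >= {ratio:.2f}")
# prints: in vs out: 0.55 sigma
# prints: red vs non-red: 3.30 sigma, ratio >= 3.02
```

Under these assumptions the 'in'/'out' difference is well below 1σ, while the 'red'/'non-red' difference exceeds 3σ, in line with the statements above.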
To compare these AGN fractions, we also measured the AGN fraction of the HerMES 'level 5' Lockman-SWIRE field (Oliver et al. 2010), which has a similar depth to our SPIRE sources (Planck Collaboration et al. 2015). We find that f_AGN = 0.075 ± 0.008 or 8 ± 1%. The Monte Carlo histogram showing the estimated uncertainty is in the bottom panel of Figure A1 of Appendix A. This AGN fraction is lower than the f_AGN. Figures 7 and 8 summarise these fractions.
Since we do not have the exact redshift for each source and we are only working with a suggested redshift range of 1 < z < 3, we plotted the fractions as a horizontal bar that extends through that redshift range. To compare our results, we added AGN fractions from Macuga et al. (2019, and references therein) at a redshift of z = 2.53. The figure also includes measurements for different PCs from Lehmer et al. (2009), Digby-North et al. (2010), Lehmer et al. (2013), Polletta et al. (2021) and Krishnan et al. (2017), at redshifts of z = 3.09, 2.3, 2.23, 2.16 and 1.6, respectively. We found similar values of f_AGN in PCs to those in Krishnan et al. (2017) and Lehmer et al. (2013), while the others seem lower. It is important to keep in mind that these studies only measured the AGN fraction within one PC, instead of a fraction within a large set of PC members, as in this study. It is also important to mention that these studies are based on different AGN selection approaches than this work, and it is therefore difficult to compare them directly. However, they still mostly find a higher number of AGN in PCs than in the field.

AGN selection
We expect that training our AGN classifier with a richer data set would return a higher classification accuracy and better statistical results, since here we were limited by a relatively small sample of star-forming galaxies at high redshift with WISE photometry. According to Stern et al. (2012) and references therein, one could choose a different colour cut in the range 0.7 ≤ W1−W2 ≤ 0.8, 'trading' completeness (bluer colour cut) for reliability (redder colour cut). However, our W1-W2 colour cut is higher than this range (W1-W2 > 0.94).
Keeping this in mind, plus the fact that our classifier has an accuracy of 94% (see Section 3.2 and Figure 4), we compared our classification method with the one presented in Assef et al. (2018), which also classifies AGN based on a colour condition. In particular, we compared the number of SPIRE sources selected as AGN following our criteria with the number selected following Assef et al. (2018). This was done by comparing our [...] positives for R90 (R75), while our method has an extra 6% of false positives, slightly surpassing the 90% reliability of Assef's method by 1% and reaching a completeness of 97%. In Figure 9 we show a comparison between the Assef et al. (2018) and our AGN selection criteria. In Table 3 we summarise this comparison. We conclude that our method and the method from Assef et al. (2018) are both useful and reliable methods to classify AGN. However, since the goal of this study is to measure AGN fractions, i.e. both the number of AGN and non-AGN are important, we argue that the most important characteristic of the classifier must be the completeness. In this case, our method is more appropriate because our completeness is 97%, while at the same time we reach a reliability of 91% and an accuracy of 94%, in contrast to the 17% completeness of Assef et al. (2018) for R90 or the 28% completeness for R75.

Figure 8 (x-axis: redshift z). AGN fractions (and 1σ significance), for the red SPIRE sources (blue), non-red SPIRE sources (red), SPIRE sources that are inside PCs (light blue), outside PCs (dark red) and the HerMES field (grey). Literature values from Polletta et al. (2021) and Macuga et al. (2019, and references therein) are added as references for cluster/PC (filled stars) and field (empty stars) galaxies. The arbitrary y-axis was chosen to better distinguish the difference in the AGN fractions, taking into account their significance. See also Figure 7.

AGN fractions and their implications in protoclusters
When considering the 'in' and 'out' sources as PC members and field sources, respectively, the AGN fraction that we find in PCs is not significantly higher than the fraction measured in the field, in contrast to what other studies have found. An important thing to keep in mind is that only a few PCs in our sample are confirmed (see Section 2.1.1). Therefore, we may have sources that are line-of-sight alignments, as suggested by Negrello et al. (2017), instead of members of the overdensities, contaminating our sample. We tested whether measuring the AGN fraction in a subsample with a higher overdensity significance (i.e. a subsample of red sources) resulted in a higher AGN fraction. We found an AGN fraction higher than in the field by at least a factor of 3 (with 1σ uncertainty). This could suggest that by selecting this redder sample we were in fact cleaning our sample and removing 'line-of-sight alignments', and we would be consistent with a higher AGN fraction in PCs.
Another possible explanation for finding an AGN excess in PCs that is not highly significant is that, as described in Section 2.1.1, we are using a sample of PC candidates made up of the most star-forming and dustiest members, instead of the full PC population, and the AGN population might not overlap with these.
Finally, the most likely explanation is that many PC members are too faint to be detected by WISE. Further, several PC AGN members may be detected only in the W1 and W2 bands, and not in W3. In that case, we would be looking only at the brightest AGN in the structures, which are rare. To test this, we checked how many members of the protocluster PHz G237.01+42.50 (G237) at z = 2.16 are detected by WISE. This PC has 31 spectroscopically confirmed members (Polletta et al. 2021). Using a crossmatch radius of 6.5″ (the W3 band resolution), we found WISE counterparts for 5 of the 31 sources. None of these counterparts is detected in the W3 band, i.e. they have, on average, SNR < 1 in W3. In other words, ∼16% of the members were detected, and only in the W1 and W2 bands. Similarly, we considered the protocluster MAGAZ3NE J095924+022537 at z = 3.37 (McConachie et al. 2022). Out of 22 spectroscopically confirmed members, we found 7 with counterparts within 10″; none were detected in the W3 band. Thus, ∼32% of the members were detected only in the W1 and W2 bands.
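The counting exercise above can be sketched as a nearest-neighbour test within a fixed radius. This is a minimal illustration, not the crossmatching tool used for this work: the function names and the coordinates in the example are hypothetical, and the separation is computed with the haversine formula using only the standard library.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation (arcsec) between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clip to 1 to protect asin against floating-point overshoot.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def detected_fraction(members, catalogue, radius_arcsec):
    """Fraction of members with at least one catalogue source within the radius."""
    n_matched = sum(
        any(angular_separation_arcsec(ra, dec, cra, cdec) <= radius_arcsec
            for cra, cdec in catalogue)
        for ra, dec in members
    )
    return n_matched / len(members)


# HYPOTHETICAL (RA, Dec) positions in degrees, for illustration only.
members = [(150.000, 2.000), (150.010, 2.000), (150.020, 2.000)]
catalogue = [(150.000, 2.001), (150.0105, 2.000)]

fraction = detected_fraction(members, catalogue, radius_arcsec=6.5)
print(f"{fraction:.2f} of the members have a counterpart within 6.5 arcsec")
```

With the G237 numbers quoted above, 5 matches out of 31 members correspond to a fraction of 5/31, i.e. about 0.16.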
Following this analysis, a diagnostic based only on the W1 and W2 bands may be considered for future work. In our case, using W3 became a disadvantage of the method, and other colours should perhaps be tested to find a better separation between AGN and star-forming galaxies without biasing the sample towards the most star-forming sources. Alternatively, a stacking analysis of the W3 signal could reveal sources that are too faint to be detected individually. Our analysis could also indicate that the small difference we found between the AGN fractions in the field and in PCs may be physically meaningful even though it is not statistically significant. Thus, even if we did not find a highly significant difference, we think our results still hint at a higher AGN activity in PCs.
One of the main limitations of this study is that we are using photometrically selected PC candidates, instead of spectroscopically confirmed structures, due to the paucity of confirmed PCs available. Having a large data set of spectroscopically confirmed overdensities at high redshift would make it possible to better understand the relationship between AGN fractions (and, therefore, the growth history of SMBHs in galaxies) and the evolutionary state of early dense environments.
Nevertheless, WISE-selected AGN appear to be good indicators of overdensities (Jones 2017), as do other AGN selections in general (e.g. Noirot et al. 2016, 2018). In addition, follow-up observations with Spitzer/IRAC for some of these PC candidates (Martinache et al. 2018) continue to support the idea that these sources, or at least a good fraction of them, are true members of PC overdensities.

CONCLUSIONS
We estimated the AGN fraction in the ∼228 protocluster candidates of Planck XXVII, selected by Planck and followed up by Herschel (Planck Collaboration et al. 2015), a representative sample of high-redshift PC candidate members. This sample provides the photometry for 7099 sources and allows us to compare the AGN fraction of galaxies inside the overdensities with that of field galaxies. We used the WISE counterparts of these sources since WISE provides higher-resolution photometry and the possibility of probing the stellar emission. This resulted in a catalogue of 646 counterparts.
To select the AGN in our sample, we constructed a classifier based on a mid-IR AllWISE colour-colour selection criterion. This is achieved by combining two colour cuts, W1-W2 > 0.94 and W2-W3 < 4.04, with a k-means clustering algorithm that separates the control sample following the W1-W2 = 1.53 (W2-W3) - 4.80 relation. The colour cuts correspond to the mean minus 3σ of the W1-W2 distribution and the mean plus 3σ of the W2-W3 distribution of the AGN in a control sample made up of AGN and non-AGN catalogues. Our control sample includes known AGN and non-AGN galaxies that were used to train our classifier.
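The three conditions can be combined in a few lines. The sketch below is an illustration rather than the code used for this work: the function and constant names are hypothetical, and it assumes that the AGN side of the k-means boundary is the one above the line (redder W1-W2 at fixed W2-W3), as suggested by the AGN distribution of the control sample.

```python
W1W2_MIN = 0.94    # AGN mean minus 3 sigma in W1-W2
W2W3_MAX = 4.04    # AGN mean plus 3 sigma in W2-W3
SLOPE = 1.53       # k-means boundary: W1-W2 = SLOPE * (W2-W3) + INTERCEPT
INTERCEPT = -4.80


def is_agn(w1w2: float, w2w3: float) -> bool:
    """Return True if a source with these WISE colours passes all three conditions."""
    passes_w1w2_cut = w1w2 > W1W2_MIN
    passes_w2w3_cut = w2w3 < W2W3_MAX
    # ASSUMPTION: AGN lie above the boundary line in the colour-colour plane.
    above_boundary = w1w2 > SLOPE * w2w3 + INTERCEPT
    return passes_w1w2_cut and passes_w2w3_cut and above_boundary


# Illustrative colours (not taken from the catalogue).
print(is_agn(1.2, 3.0))   # red in W1-W2, moderate W2-W3
print(is_agn(0.5, 3.0))   # fails the W1-W2 cut
print(is_agn(1.0, 3.9))   # passes both cuts but lies below the boundary
print(is_agn(1.5, 4.5))   # fails the W2-W3 cut
```

Under these assumptions the boundary line crosses W1-W2 = 0.94 at W2-W3 ≈ 3.75, so it only tightens the selection between that colour and the W2-W3 = 4.04 cut; at bluer W2-W3 the horizontal cut is the limiting condition.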
For further study of the AGN fraction in PCs, we also measured the AGN fraction in a 'redder' ($S_{350}/S_{250} > 0.7$ and $S_{500}/S_{350} > 0.6$) subsample of our SPIRE sources, which has a higher overdensity significance. In this case we consider the red sources as PC members and the non-red sources as field galaxies. We found an AGN fraction of $f_{AGN} = 0.186 \pm 0.044$ (19% ± 4%) for the red sources and $f_{AGN} = 0.037 \pm 0.010$ (4% ± 1%) for the non-red sources. Moreover, to assess our AGN fraction for the field sample, we also measured the AGN fraction in the Lockman-SWIRE field from HerMES. We found an AGN fraction of $f_{AGN} = 0.075 \pm 0.008$ (8% ± 1%).
In terms of AGN activity in PCs, we found that the AGN fraction is not significantly higher in PCs than in the field when considering the 'in' and 'out' sources as PC and field galaxies, respectively. For the field, we found that both our sample ('out') and the one from HerMES have a similar AGN fraction, suggesting that we have a representative field sample. However, we think that our results hint towards a higher SMBH activity in overdensities, especially since we found a larger difference in the AGN fraction between the red and non-red samples, a selection that tracks the overdensity significance of the sample.
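The contrast between the two comparisons can be quantified with a simple estimate. The sketch below is not the statistical test applied in this work: it assumes independent, Gaussian 1σ uncertainties that add in quadrature, and the function name is hypothetical. The 'in' and 'out' values are those quoted in the abstract.

```python
import math


def difference_significance(f1, sigma1, f2, sigma2):
    """Difference between two fractions in units of their combined 1-sigma error.

    ASSUMPTION: the two uncertainties are independent and Gaussian.
    """
    return (f1 - f2) / math.hypot(sigma1, sigma2)


# 'in' versus 'out' sources (values from the abstract).
z_in_out = difference_significance(0.113, 0.03, 0.095, 0.013)

# Red versus non-red SPIRE sources.
z_red = difference_significance(0.186, 0.044, 0.037, 0.010)

print(f"in vs out      : {z_in_out:.2f} sigma")
print(f"red vs non-red : {z_red:.2f} sigma")
```

Under these assumptions the 'in'/'out' difference is roughly 0.55σ, while the red/non-red difference is roughly 3.3σ, in line with the qualitative statements above.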
Our main conclusion is that it is complicated to assess the AGN and SMBH activity in overdensities, particularly at these high redshifts. We believe that a combined and complete multiwavelength study is necessary to better understand the role of the environment in the evolution of galaxies and their SMBHs. We expect that new observations from the James Webb Space Telescope will improve this kind of study by delivering deeper and higher-resolution data for galaxies and large-scale structures in the redshift interval considered in this work.
project of the Jet Propulsion Laboratory/California Institute of Technology. WISE and NEOWISE are funded by the National Aeronautics and Space Administration.

Figure 1. A 20 × 16 arcmin² WISE observation at 4.6 μm (W2 band) of one of our PC candidates, PLCK_HZ_G086.1plus61.6, shown as an example. The yellow contours show emission levels at 2σ and 3σ of the Herschel/SPIRE observation at 500 μm for the same field. The red contour corresponds to 50% of the peak flux of the respective Planck image at 545 GHz, which separates the 'in' and 'out' regions. WISE 'in' and 'out' sources are enclosed by magenta and cyan circles, respectively. The sources enclosed by a blue star were classified as AGN according to our method (see Section 3).

Figure 2. W1-W2 vs. W2-W3 colour-colour diagram of the WISE counterparts of the SPIRE sources. Purple and cyan dots show the sources that are 'in' and 'out' of the Planck 50% intensity region, respectively. The 'in' sources display the same spread in colours as the sources in the 'out' region. The over-plotted contours show the colour distribution of our control sample at the 1σ, 1.5σ and 2σ levels. The blue contours show the distribution of AGN sources, while the red contours show the non-AGN sources. Our control sample is used to train and test our classifier (see Section 3) and includes the sources described in Table 1 (see Section 2.2). Most of the SPIRE sources have colours in the same range as those of the control sample. The AGN distribution tends to be redder in the W1-W2 colour and bluer in the W2-W3 colour when compared to non-AGN galaxies.

Figure 3. Distributions of the WISE counterparts of the SPIRE sources in the W2-W3 (left panel) and W1-W2 (right panel) colours. Purple and cyan colours represent sources flagged as 'in' and 'out', respectively. The green and blue dashed curves show the interpolation to the 'in' and 'out' distributions. We used these interpolations to generate the simulated data for the Monte Carlo simulation (used to estimate the uncertainty of the classifier).

Figure 4. Confusion matrices of our classification test with a test sample of 191 sources. Each matrix shows the number of true positives (bottom right), false negatives (bottom left), false positives (top right) and true negatives (top left). Left panel: confusion matrix for the classifier without the colour cuts at W1-W2 > 0.94 and W2-W3 < 4.04. The accuracy of the classification is 86%, with 98% completeness and 80% reliability. Right panel: confusion matrix for the classifier with the colour cuts at W1-W2 > 0.94 and W2-W3 < 4.04. The accuracy of the classification using the colour cuts increases to 94%, with 97% completeness and 91% reliability.

Figure 6. Classification result for the SPIRE sources (green) in the W1-W2 vs. W2-W3 colour-colour diagram. To consider a source as an AGN, three conditions must be met. The source must be located: (1) above the black horizontal line, which corresponds to the 3σ threshold for AGN in the W1-W2 colour, (2) to the left of the black vertical line, which is the 3σ threshold for AGN in the W2-W3 colour, and (3) over the red background area of the colour-colour diagram, which corresponds to the AGN classification given by the k-means separation. Filled (empty) stars represent the sources inside (outside) the PCs that were classified as AGN. Filled (empty) circles are the sources inside (outside) the PCs classified as non-AGN. The blue and red contours show the 1σ, 1.5σ and 2σ levels of the AGN and SF/non-AGN sources in the training data set of our control sample, respectively. This shows that our main sample and our control sample cover a similar colour range. The upper panel shows the histograms of the W2-W3 colour for the SPIRE sources (green) and for the AGN (blue) and non-AGN (red) sources of the control sample, with the 3σ threshold shown as the solid black line. Similarly, the right-hand panel shows the histograms of those distributions for the W1-W2 colour, including the 3σ threshold shown as the solid black line. The coloured dashed lines show the fitted Gaussian model for each distribution.

Figure 9. Comparison of AGN selection criteria between this work and Assef et al. (2018). In the left (right) panel we show the colour-colour distribution of our control (SPIRE) sample (grey dots). The dashed black line shows our AGN selection criterion, while up- and down-pointing triangles correspond to sources selected as AGN following Assef et al. (2018) for R90 and R75, respectively. The AGN-selected sources are colour-coded by W2 magnitude. The majority of our AGN-selected data, 109% from the control sample, 84% from the full SPIRE sample, and 95% from the red SPIRE sources, were also selected as AGN following the criteria of Assef et al. (2018), principally for R90.

Table 1. Summary of our final control sample. For each catalogue, we show the type of galaxy selected, the number of sources and the corresponding reference.

Table 2. Classification result of the SPIRE WISE counterparts.
Figure 7. AGN fractions, $f_{AGN}$, for the red (blue), non-red (red), 'in' (cyan) and 'out' (dark red) SPIRE sources, and the HerMES field (grey) versus redshift. For easier visualisation, we show the 1σ significance of the AGN fractions as boxes in arbitrary redshift positions. Literature values from Polletta et al. (2021) (black star) and Macuga et al. (2019, and references therein; black circle, triangle, square and diamond) are added as reference for cluster/PC (filled black markers) and field (empty grey markers) galaxies. Here we show that the AGN fraction is, in general, greater in PCs than in the field. See also Figure 8.

Table 3. Ratio of AGN classifications following this work and Assef et al. (2018) and the true number of AGN. General comparison between the AGN classification from this work and the classification from Assef et al. (2018).