Galaxy clustering measurements out to redshift z$\sim$8 from Hubble Legacy Fields

We present a novel approach for measuring the two-point correlation function of galaxies in narrow pencil beam surveys with varying depths. Our methodology is utilized to expand high-redshift galaxy clustering investigations up to $z \sim 8$ by analyzing a comprehensive sample consisting of $N_g = 160$ Lyman break galaxy candidates obtained through optical and near-infrared photometric data within the CANDELS GOODS datasets from the Hubble Space Telescope Legacy Fields. For bright sources with $M_{UV}<-19.8$, we determine a galaxy bias of $b = 9.33\pm4.90$ at $\overline{z} = 7.7$ and a correlation length of $r_0 = 10.74\pm7.06$ $h^{-1}Mpc$. We obtain similar results for the XDF, with a galaxy bias measurement of $b = 8.26\pm3.41$ at the same redshift for a slightly fainter sample with a median luminosity of $M_{UV} = -18.4$. By comparing with dark-matter halo bias and employing abundance matching, we deduce a characteristic halo mass of $M_h \sim 10^{11.5} M_{\odot}$ and a duty cycle close to unity. To validate our approach for variable-depth datasets, we replicate the analysis in a region with near-uniform depth using a standard two-point correlation function estimator, yielding consistent outcomes. Our study not only provides a valuable tool for future utilization in JWST datasets but also suggests that the clustering of early galaxies continues to increase with redshift beyond $z \gtrsim 8$, potentially contributing to the existence of protocluster structures observed in early JWST imaging and spectroscopic surveys at $z \gtrsim 8$.


INTRODUCTION
Studying galaxy clustering, i.e. quantifying the non-uniform angular distribution of galaxies in the sky, is a fundamental tool to investigate how the structures we observe today formed and evolved during the course of the history of the Universe (e.g., Mo & White 1996).In particular, clustering measurements at high redshift offer insight to link the rapid build-up of the galaxy luminosity function (LF) to the assembly of dark-matter halos (e.g., Cooray 2005Cooray , 2006; Lee et al. 2006;Vale & Ostriker 2008).
A complementary approach to investigate this is through abundance matching (Vale & Ostriker 2004), this technique involves comparing the number of galaxies with luminosity exceeding a specified threshold   to the amount of halos with masses surpassing a given threshold  ℎ under two assumptions: each halo or sub-halo is hosting an individual galaxy and the luminosity and the halo mass functions are monotonically related (e.g., Martínez et al. 2002;Neyrinck et al. 2004).Through the combination of clustering measurements and abundance matching analysis, the galaxy duty cycle (  ) can be constrained (Martínez et al. 2002).Also referred as occupation fraction this parameter is defined as the fraction of UV-bright Lyman break galaxies (LBGs) that reside within a dark matter halo, and it is an important quantity to constrain average star formation efficiency within dark matter halos.
★ e-mail: ndalmasso@student.unimelb.edu.auOver the past two decades, clustering measurements have been performed out to  ≲ 7. Using data from Hubble Deep Field North, (Arnouts et al. 1999) quantified the redshift evolution of clustering between redshift 0 ≤  ≤ 4.5 for photometrically selected LBGs.The angular correlation function (ACF) derived showed a clear decrease in amplitude from  = 0 up to a redshift  ∼ 1 − 1.5 followed by a slow increase in amplitude at higher redshift, with the same trend found in the correlation length.These two behaviours translate into an increasing galaxy bias () at  ≳ 1.5 showing that a typical minimum halo mass of  ℎ ≳ 10 12 ℎ −1  ⊙ is needed for galaxies with ( ≃ 4) ≃ 3. Galaxy bias, denoted as , represents the systematic deviation or clustering of galaxies compared to the underlying matter distribution in the universe.It serves as an indicator of the non-uniform spatial distribution of galaxies, influenced by dark matter distribution and the large-scale structure of the universe.An increasing trend in galaxy bias implies stronger galaxy clustering and deviation from the matter distribution in larger cosmic structures.
Using LBGs candidates at 3.5 ≤  ≤ 5.5, Lee et al. (2006) investigated a more realistic model for the ACF with a function composed of two terms, a correlation between two halos on relatively large separations ( 2ℎ ()) and a one-halo term associated to presence of multiple galaxies in a single halo ( 1ℎ ()).The latter dominates the correlation signal at very small separations (comparable to the typical virial radius of a halo with  ℎ ≳ 10 12 M ⊙ ).
In their work, Wyithe et al. (2014) derived a theoretical estimate of   () ≃ 0.10 − 0.15 for  ≳ 6 using a combination of clustering and abundance matching analyses.This approach aimed to reconcile the observed discrepancies in total stellar mass and associated star formation rates.
This theoretical prediction has been subsequently tested by Barone-Nugent et al. (2014) with new galaxy clustering measurements out to  = 7.2 using LBGs from the GOODS/CANDELS and XDF Hubble legacy fields observations (Bouwens et al. 2015).Their analysis of the evolution of galaxy bias with redshift shows an increasing trend, reaching  = 9.7 +2.0 −2.5 for their sample of UV bright LBGs at redshift  = 7.2.This translate to a minimum dark matter halo mass of  ℎ ≳ 10 11.0  ⊙ .
Using a comparable sample of LBGs derived from the Hubble Legacy deep imaging, Harikane et al. (2016) conducted a study on the angular correlation function up to redshift  = 6.8, focusing on UV bright candidates (  < 27.6).Through their measurements, they deduced a range of acceptable halo masses for galaxies residing within the redshift interval of  ∼ 6 − 7 to be approximately  ℎ ∼ (1 − 20) • 10 11  ⊙ , in accordance with what had been found previously in the high redshift regime ( ∼ 5.8) (e.g., Hamana et al. 2004;Lee et al. 2006;Hildebrandt et al. 2009;Barone-Nugent et al. 2014).
In contrast to Harikane et al. (2016)'s redshift  = 5.8 findings of a galaxy bias of  = 4.7 +1.0 −1.3 , both Hatfield et al. (2018) and Qiu et al. (2018) report slightly higher galaxy bias values for galaxies with similar redshift and magnitude when estimating halo mass, measuring  = 6.0 ± 1.1 and  = 6.1 +1.6  −2.0 respectively.These differences may be due to computation method variations, but clustering trends with luminosity remain consistent.Both studies find significantly higher typical host halo mass and galaxy bias in their samples compared to lower luminosity ones.
In their subsequent study Harikane et al. (2022), the authors focus on measuring the rest-UV luminosity functions and angular correlation functions across a wide range of redshifts, specifically  ∼ 2 − 7. Their findings indicate an overall increasing trend in galaxy bias, compared to their previous work (Harikane et al. 2016).As a result of this increasing galaxy bias, there is a corresponding rise in the required host halo mass threshold, notably shifting for luminous UV galaxies towards values approximating  ℎ ≳ 10 11.37  ⊙ around redshift  ∼ 6, with this shift becoming more pronounced for even brighter galaxy candidates.
Further progress at higher redshift was precluded both by the limited sample sizes at  ∼ 8 and by the heterogeneous nature of surveys carried out in the last ten years, which often employed a "weddingcake" depth strategy (i.e.multiple layers of exposure times).This strategy is very efficient for characterising the luminosity function across a wide dynamic range (Bouwens et al. 2015), but it makes the measurement of the two-point correlation function (which can be used to contrain galaxy clustering) more challenging.The two-point correlation function encodes the excess probability of finding two objects, in our case galaxies, separated by a given distance.The excess is evaluated relative to a uniform distribution of sources (Landy & Szalay 1993).
With a heterogeneous survey including regions at different depths, special care must be placed to avoid introduction of systematic uncertainty and artificial clustering signal simply because some areas have an excess of galaxies due to deeper observations.In this paper we investigate the impact of non-homogeneous survey depth on the two point correlation function measurement and propose a method to conduct an accurate analysis by careful generation of the reference random sample for use in the two point correlation function estimator.Rather than simply generating a uniform random distribution over the geometry of the survey and taking into account a binary mask to model geometry and presence of foreground bright sources that mask faint high-redshift objects, we generate the reference random sample through a Monte Carlo simulation that places artificial sources with realistic spectral energy distributions and recovers them through the full photometric pipeline.This approach improves the two-point correlation function analysis well established techniques for accurate measurements of the galaxy luminosity function.
The primary objective of this paper is to showcase a novel approach through the analysis of data obtained from the Hubble Legacy Fields (Bouwens et al. 2021).This study has two main goals: firstly, to validate our methodology by comparing it with traditional twopoint correlation techniques using substantial lower redshift datasets.Secondly, to extend our measurements to high redshifts ( = 8) by combining surveys with varying depths, given the challenge of obtaining large high-redshift samples independently.
Our novel method should be ideal for applications to upcoming catalogs from the increasing number of JWST photometric surveys that are characterising the galaxy population at  > 8 (Paris et al. 2023;Hainline et al. 2023).
This paper is organized as follows.In Section 2 we describe the data selection for the LBGs candidates.Variables estimation and the theory behind the ACF is presented in Section 3. In Section 4 we address the different methods to generate the points catalog needed to proceed in the measurements.Our findings and numerical results are discussed in Section 5.In Section 6 we presents our summary and conclusions.

DATA SELECTION
This study utilises the most recent data released by the Hubble Space Telescope (HST), specifically the comprehensive collection known as the Hubble Legacy Fields1 .In our exploration of this extensively researched field, we have opted to make use of publicly available catalogs, such as the VizieR Online Data Catalog2 (Bouwens et al. 2021).
Our ultimate objective is to assemble a sizable galaxy dataset for conducting clustering analysis.To assess the influence of varying depth on source recovery and detect any potential systematic biases, we generated  2 images by utilizing data from the  105 ,  125 ,  140 , and  160 filters.These  2 images will be foundamental in the development of random point catalogs, as further detailed in the subsequent sections.
We initiated by creating a parent sample sourced from the VizieR Online Data Catalog, involving the selection of galaxies within the redshift range  ∈ [4.5, 8.5].We subsequently divided this primary sample into four redshift bins, each with a width of Δ = 1.0, with the mean redshift within each bin being =5. 1, 6.1, 6.8, and 7.7, respectively.To refine the analysis and identify sources in contiguous regions, we obtained the latest versions of the science and weight images from the Hubble Legacy Fields Project (HST Legacy field data, Bouwens et al. (2017); Whitaker et al. (2019)).Specifically, Version 2.0 was used for the GOODS-S field, while Version 2.5 was employed for the GOODS-N field, with both images having a scale of 60 milliarcseconds per pixel.
Subsequently, we converted the weight images into root mean square (rms) images.Considering that the GOODS-S field contains the XDF/HUDF patch, which involves much deeper observations that may introduce systematic uncertainties in the estimation of the correlation function beyond our ability to correct for them, we took an additional step to separate the XDF/HUDF region out.Furthermore, we excluded the HUDFP1 and HUDFP2 parallel ultra-deep regions within the GOODS-S field due to the absence of the required  435 filter for  ∼ 5 selection in the HUDFP1 region, as well as the presence of multiple bright stars in the HUDFP2 region.Additionally, the selection area in the GOODS-S field was trimmed to the common overlap area in the F125W, F140W, and F160W filters to ensure a uniform selection of sources.The XDF area is not included in the rms map of GOODS-S see right panels of Fig. 1 therefore it will be analysed separately.
The next step involved filtering the catalog by applying a magnitude cut of   < −19.8 in the two GOODS regions in order to obtain a magnitude-limited (uniform) sub-sample for conducting a consistent analysis across the entire survey area and redshift range.This magnitude limit was chosen to ensure source detection above the nominal 5 depth limit for all fields and redshifts.We decided to not apply any magnitude cut to the XDF field since the whole XDF region is internally uniform.Additionally, applying the same magnitude cut as the GOODS region would have resulted in the number of candidates that are too few to conduct a meaningful analysis.Therefore, our investigation encompasses all XDF sources, irrespective of their brightness.For reference, the average absolute magnitudes of the samples within the XDF area range from   = −18.6 to   = −18.1.Details of the data selection for each redshift bin considered in this study are summarized in Tab.1.
For a comprehensive description of the dataset (including the determination of galaxy luminosity functions), we refer the reader to Bouwens et al. (2021).

CLUSTERING ESTIMATION
First, we derive the observed ACF indicated as   (), that measures the excess probability of finding two objects -galaxies in our case -at an angular separation .We use the ACF estimator defined by Landy & Szalay (1993): where  () is the number of pairs of observed galaxies with angular separations ( ± ).() is the analogous quantity determined from a random catalog with the same geometry and selection properties as the observed catalog. () is the number of random-galaxy cross pairs.For the functional form of ACF, we assume a power law parameterization () =    − , where () is our estimator.The finite size of this survey leads to an underestimation of the measured quantities, which can be rectified by incorporating a constant known as the integral constraint (IC; see Peacock & Nicholson 1991): estimated as: where  true () is the intrinsic ACF and  obs () is the measurement within the survey area.For consistency with previous work (e.g., Lee et al. 2006;Overzier et al. 2006;Barone-Nugent et al. 2014), we fix  = 0.6 and use linear binning with 12".5 width with the maximum separation at  = 250".When the correlation slope  is fixed, the quantity /  only depends on the size and the shape of the survey area and can be determined from Eq.3.The dimensionless parameter   is the only variable that needs to be fit as: by maximizing the likelihood expressed as: where   ()  and   ()  represent the observed and modelled ACF measurements while   ()  the uncertainties in field .Errors in the ACF, for both fields, are estimated using bootstrap resampling (Ling et al. 1986).
After finding   by maximizing the likelihood Eq.5 we are able to invert Eq.6 in order to obtain the associated correlation length  0 .From the real-space correlation function  () we define the galaxy bias  through the ratio between the galaxy variance  8, and the linear matter fluctuation  8 () both in a sphere of comoving 8ℎ −1   radius: assuming  8 (0) = 0.828 from the cosmology model and the galaxy variance expressed as:

CLUSTERING ANALYSIS METHODS
To conduct measurements of galaxy clustering, it is necessary to generate a catalog of random points for the calculation of the observed angular correlation function (ACF) estimator   , as defined in  Eq.1.Previous studies adopted a simpler approach when generating the random points catalogs, mainly because they were based on surveys with relatively uniform depth.In such cases, the primary concern is modeling the survey geometry and accounting for obstructions caused by bright foreground sources.However, the GOODS-CANDELS survey exhibits nonuniform depth, as shown in Fig. 1.Consequently, in order to evaluate the pair count () for the random points, we investigate the impact of the random catalog generation on the ACF measurement.This investigation is motivated by the observations presented in Fig. 1, where the survey fields consist of sub-fields with varying individual depths.Moreover, distinct patterns are discernible in small regions, characterized by non-uniform depths (e.g., refer to the bottom right image illustrating the RMS map of GOODS-South in its bottom and top regions).All these factors contribute to the heterogeneity of the survey.

Number of sources
Since the survey's completeness depends on the depth of observations, there may be an excess or lack of sources in the observed catalog at specific locations and preferred angular separations associated with the patterns in the RMS maps.In this section, we introduce two approaches for clustering analysis in the presence of non-uniform-depth data and evaluate their respective strengths.Our analysis focuses on candidate galaxies selected through photometry in the CANDELS data over GOODS at an average redshift of  = 5.1 (Leethochawalit et al. 2022), in both cases the random points catalog will be of dimension   = 20  where   is the number of data in the analysis at a certain redshift value.
Specifically, we explore two different aspects that are fundamental for galaxy clustering measurements, the first one being the generation of the random points catalog.The second one is robustness of the analysis with respect to the use of different regions of the survey.

Uniform random points
This represent the standard approach of generating random point by sampling a uniform distribution throughout the area covered by the survey, in our case the GOODS field.
To take into account the survey geometry (and presence of bright foreground sources and/or regions with missing data), the uniform random catalog is filtered through a binary map (reject or accept) constructed considering all pixels in the image that have a wellbehaved RMS map value and are not part of the segmentation maps of (bright) sources.In practice, for faint and relatively rare galaxies at high redshift, one can use the full segmentation map without noticeable difference (Barone-Nugent et al. 2014).

Random points from source recovery map
This second method, which is introduced in this paper, aims to incorporate the varying depth of observations, including subtle features arising from mosaicing and dithering, as observed in the RMS map depicted in Fig. 1.In order to improve the fidelity of our source detection and completeness modeling in the random catalog, we qualitatively account for these data characteristics by implementing an artificial source injection and recovery process within the region covered by the ACF observations.The procedure utilizes the publicly available source injection and recovery code, GLACiAR2 (Leethochawalit et al. 2022), and employs a hit-and-miss Monte Carlo technique, assuming a model luminosity function for the LBG population.This code is typically employed to compute completeness in determining the luminosity function and, as such, realistically simulates all pertinent factors that influence the likelihood of identifying galaxies in observations, including non-uniform depth and photometric scatter in the Lyman break and/or photometric redshift selection algorithm.
For the GOODS field, we injected a total of   = 500 galaxies within each   and redshift bin.These galaxies possess Sérsic light profiles with  = 1, random inclinations and ellipticities, and a physical size of  = 1.25 kpc at  = 5.At other redshifts, the size of the injected galaxies was scaled as 1/(1 + ).They exhibit a powerlaw intrinsic spectral energy distribution (SED) with a Lyman break, where the power-law UV slope is randomly drawn from a Gaussian distribution with a mean of  UV = −2.2 and a standard deviation of  = 0.4.The magnitudes of the galaxies in each photometric band were determined based on their input spectral energy distributions.These artificial sources were then randomly injected into the science image using a uniform distribution of positions.The process of source extraction followed the same steps utilized for the real data sample.The magnitudes, redshifts, and positions of the recovered galaxies were recorded and employed for subsequent analysis.
Since the catalog of recovered artificial sources was generated assuming a flat input luminosity function Φ(), we performed a Monte Carlo hit-and-miss procedure to select the final sample of random points for clustering analysis.The probability function for selection was defined as: where Φ * = 10 −2.81 ,  * = −20.68,and  = −1.58were used for the Schechter luminosity function (Leethochawalit et al. 2022).

Spatial dependence of the clustering signal and random points methods comparison
Surveys covering a wide area may show variations in depth, resulting in a layered structure resembling a 'wedding cake', as illustrated in Fig. 1 for the CANDELS survey.Specifically, the GOODS fields consist of three distinct regions, each characterised by an approximately constant exposure time (i.e., map RMS values).
To study the impact of non-uniform exposure, we perform clustering analysis using both methods to generate the random catalog, as described in the previous sections, on the three separate regions: upper, middle and lower.The visual results are presented side by side in Fig. 2.
The top row of Fig. 2 illustrates the results obtained from analyzing the complete data sample using both random point catalog generation methods.It is evident that utilizing a uniform random point distribution in the autocorrelation function (ACF) measurement leads to an overestimation when compared to the best-fit power law.
This behavior becomes apparent in the top-left corner of Fig. 2 and manifests at large angular separations ( ∼ 100").It arises due to the introduction of artificial clustering signals induced by non-uniform depth variations across the survey area, with fluctuations of the order of one WFC3 field of view (i.e., ≳ 100").These variations can be visually correlated with the typical fluctuations in the RMS maps shown in Fig. 1.By employing a uniformly generated random point catalog, these variations are not taken into account.
In contrast, this issue is alleviated by adopting our proposed improved procedure for generating the random point catalog using a Monte Carlo recovery process (top-right of Fig. 2), although it does not entirely eliminate the artificial clustering signals.
Interestingly, employing the latter approach yields a larger uncertainty in the correlation length  0 compared to the uniform random catalog counterpart.The difference amounts to a factor of two, and we consider this estimation to be more representative of the true errors in the measurement.
Examining the analysis conducted on the individual sub-regions, we observe a consistent trend in terms of error estimations for  0 .In all cases, the estimations tend to be 1.5 − 2 larger compared to the results obtained from the uniformly generated sample.
By dividing the field into sub-regions of approximately uniform depth, we avoid significant overestimation of the ACF.This outcome arises from the fact that considering spatially confined regions leads to reduced variations in depth among the individual pointings superimposed in the mosaic.However, each region yields ACF and  0 estimations that differ from one another at a level surpassing the nominal associated error.

Angular correlation function and Bias
In Fig. 3 we present the ACF estimated from the candidate galaxies in the combined GOODS North and GOODS South catalog, while Fig. 4 displays the ACF estimated from the XDF region.In both figures, we provide the measurements and associated uncertainties for each separation bin, along with the best-fitting model (indicated by the blue line) and its associated parameters   and  0 reported as sub-plot legends.For comparison we present the results for all the redshift bins considered in the analysis  = 5.1,  = 6.1,  = 6.8 and  = 7.7 in separate panels.
As discussed in the data selection process, Sec.2, we are accounting for a cut in magnitude in both GOODS-N and GOODS-S considering only candidates with   < −19.8, while for XDF we did not consider any constraint in the magnitude.This means that the average absolute magnitudes of the samples are lower for this last field, ranging from   = −18.6 to   = −18.1 (without any apparent trend in redshift).The results of the ACF analysis for all fields are summarized in Tab. 2, where the left part reports the parameters obtained from the combined GOODS calatog and the right the results from the analysis on XDF.
First, we find that a power-law fit presents an overall good description of the data at all redshifts considered, including for our highest redshift sample at  = 7.7 in both Fig. 3 and Fig. 4 (albeit the uncertainty is larger).From lower to higher redshift, if we consider a fixed angular scale, the ACF is increasing.As a result the two clustering parameters   and  0 are increasing accordingly.Qualitatively, the XDF angular correlation function shape and redshift evolutionary trends are similar, but the increase in clustering strength is more pronounced for this small-area field.
Fig. 5 shows the evolution of the galaxy bias, obtained at the four different redshifts studied from the ACF, overlaid on dashed curves that correspond to the bias of dark-matter halos at constant mass (derived following Sheth & Tormen 1999).The comparison shows that the galaxy bias increases with redshift tracing approximately constant halo mass  ℎ ∼ 10 11.4 M ⊙ for our large-area magnitudelimited GOODS sample.The results we obtain are fully consistent with theory (e.g., Sheth & Tormen 1999) and with previous observational studies (e.g., Lee et al. 2006;Overzier et al. 2006;Barone-Nugent et al. 2014;Harikane et al. 2016;Hatfield et al. 2018;Qiu et al. 2018;Harikane et al. 2022).
At a redshift of  ∼ 6, our investigation of a luminous sample of LBGs unveils a galaxy bias of  = 6.0 ± 1.1, which closely resembles the outcomes reported by other groups in the same redhisft regime (e.g., Harikane et al. 2016;Hatfield et al. 2018).In their study, they determine a minimum halo mass of  ℎ ∼ 10 11.5  ⊙ using a simple abundance model, consistent with the predictions made in this work at the same redshift with respect to the halo mass function depicted in the left panel of Fig. 5. Furthermore, Qiu et al. (2018) presents the galaxy bias obtained at a redshift of  ∼ 7 through clustering measurements with a similar sample.The resulting value is  = 8.6 +2.2 −2.8 , which is consistent within the measurement uncertainties with the value found in our study at an average redshift of  = 6.8.
In the recent investigation conducted by Harikane et al. (2022), they provide a summary of their findings based on the angular correlation function of over  = 4 • 10 6 galaxy candidates at  ∼ 2−7.The far-right panel of the figure presents the results obtained for luminous ultraviolet (UV) galaxies at a redshift of  = 5.9.According to their analysis, the halo mass of these galaxies is estimated to be approximately  ℎ ∼ 10 11.05  ⊙ , with an uncertainty consistent with the findings of Barone-Nugent et al. ( 2014) at a similar redshift of  ∼ 6.
From the right panel of Fig. 5, we see that the XDF field shows a smaller bias compared to GOODS at all redshifts.This is expected because the sources are on average fainter as the data is deeper and no magnitude cut is applied to retain a sufficient number of galaxies for clustering analysis.However, in this case there is a strong redshift evolution in the inferred characteristic halo mass of the population, which at face value seems to evolve from  ℎ ∼ 10 8  ⊙ at  = 5.1 to  ℎ ∼ 10 11.5  ⊙ at  = 7.7.Given the small area of the field, we consider this trend most likely to originate from cosmic variance in the clustering measurement itself, which is not captured by the error bars in the figure (covering only statistical uncertainty).Incidentally this trend was also present in the XDF analysis of Barone-Nugent et al. (2014) and we have here extended it again from  ≲ 7 to  ≲ 8.
To provide a visual comparison with previous investigations on the topic, we present in Fig. 6 the comparison in between the correlation lengths  0 .Through a comprehensive analysis of low redshift data ( ≲ 7.0), our work exhibits a similar trend to that presented in other independent studies conducted on a comparable sample of LBGs that are bright.We find that the characteristic correlation length increases as we move towards higher redshifts.Our findings for each redshift bin align with previous redshift measurements reported in the literature (e.g., Adelberger et al. 2005;Lee et al. 2006;Barone-Nugent et al. 2014;Harikane et al. 2016;Hatfield et al. 2018;Qiu et al. 2018;Harikane et al. 2022), with consistency observed within the 1 uncertainty range.
An important novelty of our analysis is the extension of this trend to higher redshifts ( = 7.7), specifically towards the mid-point of the epoch of reionization.We identify a correlation length of  0 = 10.74 +7.06 −7.06 ℎ −1  , which aligns with the aforementioned trend.However, it is worth noting that the large error bars are primarily a consequence of the limited quantity of LBGs available in our sample at such high redshift.In the coming years, this limitation is expected to improve significantly thanks to the utilization of the James Webb Space Telescope (JWST) in new surveys (e.g., Paris et al. 2023;Hainline et al. 2023).
As demonstrated in Sec.4.2, our analysis reveals that the clustering parameters are not significantly overestimated when studying similardepth sub-regions.To ensure consistency, we extend our investigation to include the entire field while focusing on measurements obtained solely from the central region of both the GOODS-N and GOODS-S catalogs (see Fig. 1).In Fig. 7, we present the results, wherein the blue fitting curve represents the central region data, and we overlay the best-fit model obtained using the complete sample of galaxies labeled in green.
Tabulated in central column of Tab.2 are the results for both   and  0 , along with the associated galaxy bias, for all redshifts considered.Comparison of these measurements reveals consistency within the typical uncertainties of 1, which can also be visually observed in the left panel of Fig. 5.It is worth noting that the galaxy bias , when derived solely from the central region, marginally exceeds the value obtained from the overall analysis.This slight difference can be attributed to the fact that the central region represents the deepest portion among the three sub-regions, thus effectively capturing the general behavior exhibited by the entire field.

Duty Cycle
To study star formation efficiency, we need to link the dark matter halo mass function and the luminosity function.We can do this by introducing the duty cycle (  ), the fraction of dark matter halos hosting UV-bright LBGs.We can estimate the duty cycle by matching objects at the same comoving density (Vale & Ostriker 2004).Table 2. Galaxy clustering parameters obtained from ACF analysis with a fixed  = 0.6.For the combined GOODS analysis only the bright LBGs are considered, with   <-19.8, while for XDF we have not applied a magnitude cut for lack in candidates, see Tab.1.Note: the correlation length  0 is expressed in ℎ −1   units.
Figure 3. ACFs of LBGs in the GOODS catalog at four different redshifts.The results presented were obtained with random points generated from a recovery process Section 4.1.2and all the galaxies contained in the GOODS-N and GOODS-S region.The parameter   is a dimensionless coefficient obtained by fitting the ACF estimator Eq.1, is linked by Eq.6 to the correlation length  0 .Both these parameters present an increasing trend with redshift as excepted from theory and previous findings on this topic, we can see how the power law model well represent the ACF measurements.
This procedure is conducted by matching the number of galaxies with luminosity greater than a given   to the number of halos with mass greater than a minimum dark matter halo mass  ℎ assuming only a galaxy in each dark-matter halo (Martínez et al. 2002): where Φ(, ) if the LF at redshift  and ( ℎ , ) is the halo MF (Sheth & Tormen 1999).By taking the intersection between the bias inferred from abundance matching, using a Schechter LF: Φ * = 10 −0.44 , * = −20.19and  = −1.81 at  = 7.7 (Bouwens et al. 2015), and from clustering analysis, we derive the value of   .In Fig. 8 we present the duty cycle   evolution with the redshift.
We can see that at low redshift the duty cycle is contained below the unit value, while at  > 6.5 the duty cycle tends towards unity ( = 1), meaning that all the dark matter halos are hosting UV-bright LBGs with a mean magnitude of   = −19.8.We note that the duty cycle estimated from the XDF field favours lower values of  compared to the broader GOODS fields.This is a reflection of the increase in galaxy bias presented in the right-hand panel of Fig. 5.
The values found in this work are consistent with those of the XDF analysis conducted in Barone-Nugent et al. (2014), they present the bias measured for XDF, HUDF92 and HUDF91 range from  ∼ 2 to  ∼ 4 in between  ∼ 4 and  ∼ 6.These three results do not present any particular pattern, but show very large variations in individual measurements, supporting our interpretation of sample  variance being the most likely explanation of the XDF bias and duty cycle measurements.

SUMMARY AND PERSPECTIVES
The primary goal of this paper was to extend the investigation of galaxy clustering up to a redshift of  ∼ 8. To achieve the necessary critical number of galaxies for such a study, we combined multiple surveys of varying depth and introduced a novel method to address the systematic bias in correlation measurements within regions of varying depth.To achieve our goal, we utilized the most recent data from the Hubble Space Telescope (HST) Legacy Fields, focusing specifically on three distinct fields: GOODS-N, GOODS-S, and XDF.Our analysis covered a range of mean redshifts, namely  = 5.1,  = 6.1,  = 6.8, and  = 7.7.By exploring these fields comprehensively, we aimed to provide an exhaustive analysis that surpasses previous studies in this field.Here, we summarize the main aspects addressed and results obtained in this work: • In order to gain a better understanding of various approaches to the analysis of the angular correlation function (ACF), we conducted a preliminary study.This involved considering the different depths of the catalogs used and incorporating random points generated through a recovery process, as described in Sec.4.The purpose of this investigation was to explore the impact of these factors on the ACF analysis and assess their implications for our research.
• To apply our innovative ACF estimation procedure, we utilized a combined GOODS catalog as well as the XDF field.The corresponding ACF measurements are presented in Fig. 3 for the combined GOODS catalog and Fig. 4 for the XDF field.At a mean redshift of  = 7.7, we obtained novel measurements of the galaxy bias.For the combined GOODS catalog, the galaxy bias was determined to be  = 9.33 +4.90  −4.90 , while for the XDF field, the galaxy bias was found to be  = 8.26 +3.41  −3.41 .These measurements represent important contributions to our understanding of galaxy clustering at this high redshift.
• In order to validate our results, we conducted a thorough comparison with previous studies on the topic, and we found agreement between our findings and those from the literature, as demonstrated in Fig. 6.To further strengthen our analysis, we performed a supplementary validation by focusing on the central deep region of the GOODS catalog.Our results from this validation analysis were consistent with our main study, providing additional support for the robustness and reliability of our method.This validation analysis is presented in Fig. 7. Together, these comparisons and supplementary analyses reinforce the credibility and significance of our study.
• In Fig. 5, we have summarized all the ACF results and depicted them graphically alongside the trend of dark matter halo bias from the model by Sheth & Tormen (1999), considering different mass values.Notably, all three results at  = 7.7 consistently suggest a dark matter halo mass in the range of  ℎ = 10 11.4∼11.5  ⊙ .This finding indicates that as we move towards higher redshift, LBGs located within larger dark matter halos exhibit higher UV luminosity.Moreover, these results are consistent with previous studies on high-redshift galaxies (e.g., Adelberger et al. 2005;Lee et al. 2006;Barone-Nugent et al. 2014;Harikane et al. 2016;Hatfield et al. 2018;Qiu et al. 2018;Harikane et al. 2022).Our findings suggest that galaxies with greater mass and luminosity display higher levels of clustering compared to previous studies at similar redshifts.This implies that these galaxies are hosted by more massive dark matter halos, providing valuable insights into the relationship between galaxy properties and their underlying dark matter structures.
• We investigate the duty cycle of galaxies within dark matter halos using an abundance matching framework.We confirm the results from a previous study Barone-Nugent et al. (2014), where our measured duty cycle approaches unity above  > 6.5 (see Fig. 8).Furthermore, we extended the duty cycle trend to  ∼ 8 with our novel determination of the ACF at  = 7.7.
The extension of the clustering analysis to a redshift of  = 7.7 was conducted using a sample size of   = 160 objects derived from surveys with varying depths.This serves as a reference point for future forecasts of galaxy clustering studies at redshifts  ≳ 8 using the James Webb Space Telescope (JWST).With the current preliminary determinations of galaxy luminosity functions (e.g., Leethochawalit et al. 2022;Donnan et al. 2023) and ongoing photometric surveys (e.g., Paris et al. 2023;Hainline et al. 2023), it is anticipated that high-quality photometric samples will become available within the next 12-24 months.These advancements will enable the study of the connection between galaxies and their associated dark matter halos during the earliest stages of cosmic reionization.This prospect holds great promise for furthering our understanding of the formation and evolution of galaxies in the early Universe.

Figure 1 .
Figure 1.Left: GOODS-North maps.Right: GOODS-South maps.The top row shows the science images while the bottom has rms maps in logarithmic scale and divided in sections by red lines: Top, Central and Bottom to show the differences in depth due a wedding-cake depth strategy of the survey (see Grogin et al. 2011).

Figure 2 .
Figure 2. ACFs for LBGs at  = 5.1 with   < −19.5.On the left column we can see results obtained using a catalog of random points generated by a uniform distribution Sec.4.1.1,while on the right the analysis conducted with a random point recovery catalog Sec.4.1.2.From top to bottom we can see the results produced using data from: Overall, Top, Central and Bottom region of the survey in Fig.1.From this figure we can see that the central region is the one that better represent the analysis on the overall survey, this is because is the most deep as we can see in the bottom row of Fig.1.

Figure 4 .
Figure 4. Top: ACF for LBGs in GOODS-S deep field XDF catalog.The results presented were obtained with random points generated from a recovery process Section 4.1.2and the entire set of data from Tab.1 without applying a cut in magnitude.The parameter   is a dimensionless coefficient obtained by fitting the ACF estimator Eq.1, is linked by Eq.6 to the correlation length  0 .In this figure we can see the same increasing trend of the clustering parameters as observed in Fig.3, values are lower because of the smaller area of XDF and the reduced pool of available candidates.

Figure 5 .
Figure 5. Bias evolution as a function of redshift from 5.1 ≤ z ≤ 7.7.Dotted lines represents the dark matter halo bias from the Sheth & Tormen (1999) mass function.On the left panel the red dots are slightly shifted for better visualisation.

Figure 6 .
Figure 6.Comparison of the correlation length  0 with previous studies conducted on a sample of UV bright LBGs at similar redshifts (Lee et al. 2006; Overzier et al. 2006; Barone-Nugent et al. 2014; Harikane et al. 2016).The height of the boxes represents the corresponding error bars, while the thick horizontal line within each box represents the peak measurement of the respective study.Note: the width of the boxes does not convey any physical information; it is solely used for enhanced visualization purposes.

Figure 7 .
Figure7.ACF for LBGs in GOODS catalog.The results presented were obtained with random points generated from a recovery process Section 4.1.2and all the galaxies contained in the GOODS-N and GOODS-S central region (bottom row Fig.1).The blue fitting line return the dimensionless parameter   linked by Eq.6 to the correlation length  0 , while the green dashed line represent the fit obtained by considering the entire combined GOODS catalog.

Figure 8 .
Figure 8. Duty cycle   analysis conducted on the two different samples at the four redshift considered in this work.For both  = 6.8 and  = 7.7 we found the upper limit represented by   = 1.0.The results from XDF, in blue, are slightly shifted in redshift to the right for clarity.

Table 1 .
(Bouwens et al. 2021)urces used in this work for clustering measurements in three main fields (XDF, GOODS-S and GOODS-N).The sources are extracted from(Bouwens et al. 2021)as discussed in Section 2, and the table includes for each redshift bin the parent catalog source number and in bold between parentheses the objects selected for analysis.