Correcting correlation functions for redshift-dependent interloper contamination

The construction of catalogues of a particular type of galaxy can be complicated by interlopers contaminating the sample. In spectroscopic galaxy surveys this can be due to the misclassification of an emission line; for example in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) low redshift [OII] emitters may make up a few percent of the observed Ly α emitter (LAE) sample. The presence of contaminants affects the measured correlation functions and power spectra. Previous attempts to deal with this using the cross-correlation function have assumed sources at a fixed redshift, or not modelled evolution within the adopted redshift bins. However, in spectroscopic surveys like HETDEX, where the contamination fraction is likely to be redshift dependent, the observed clustering of misclassified sources will appear to evolve strongly due to projection effects, even if their true clustering does not. We present a practical method for accounting for the presence of contaminants with redshift-dependent contamination fractions and projected clustering. We show using mock catalogues that our method, unlike existing approaches, yields unbiased clustering measurements from the upcoming HETDEX survey in scenarios with redshift-dependent contamination fractions within the redshift bins used. We show our method returns auto-correlation functions with systematic biases much smaller than the statistical noise for samples with contamination fractions at least as high as 7 per cent. We also present and test a method for fitting for the redshift-dependent interloper fraction using the LAE-[OII] galaxy cross-correlation function, which gives less biased results than assuming a single interloper fraction for the whole sample.


INTRODUCTION
The measurement of a redshift from a galaxy spectrum is one of the most fundamental parts of a spectroscopic survey. This is usually achieved by relying on features in the spectra such as emission and absorption lines and the shape of the continuum. However, when only one emission line is detected it becomes impossible to unambiguously identify the rest frame emission line and return an accurate classification and redshift. This results in catalogues of galaxies which contain interlopers, i.e., misclassified sources at the wrong redshift. Interloper contamination is expected to be important in several major upcoming galaxy surveys (see e.g. Pullen et al. 2016). The focus of this paper is the ongoing Hobby-Eberly Telescope Dark Energy Experiment (HETDEX; Hill et al. 2008; Hill et al. in prep; Gebhardt et al. in prep), where, due to the spectrographs not resolving the [O ii] doublet, low redshift [O ii] emitters with rest-frame wavelength 3727 Å can be mistaken for high redshift Ly α emitters (LAEs) with rest-frame wavelength 1216 Å.
The impact of interlopers on the correlation function and power spectrum of a galaxy sample has been studied in the literature (e.g., Pullen et al. 2016; Leung et al. 2017; Grasshorn Gebhardt et al. 2019; Addison et al. 2019; Massara et al. 2020). It has been shown that the presence of interlopers in a sample changes the galaxies' correlation function and power spectrum. It is also understood that if the interlopers are unclustered then the main effect is simply a decrease in the overall clustering amplitude, from the addition of uncorrelated sources (see Appendix B.4 of Grasshorn Gebhardt et al. 2019). However, if the interlopers are clustered, then a signal from their correlation function is added into the sample. It has also been shown that these spurious clustering signals can cause biases in the inferred cosmological parameters (e.g. Pullen et al. 2016; Grasshorn Gebhardt et al. 2019; Addison et al. 2019).
In both Grasshorn Gebhardt et al. (2019) and Addison et al. (2019) methods are presented that include the effects of interlopers in the modelling of the galaxy power spectrum. These authors note that a cross-correlation signal between two intrinsically uncorrelated samples of galaxies can be created entirely by interloper contamination. They advocate using this observed cross-correlation signal to put constraints on the contamination fraction, in order to yield better measurements of cosmological parameters. An alternative approach to forward modelling techniques is to decontaminate the measurements by applying a transformation that changes the observed auto- and cross-correlation functions into the true underlying functions. A matrix to carry out this transformation, and its inverse, is given in Awan & Gawiser (2020). Their work deals with angular clustering measurements in redshift bins.
A related issue to interlopers in spectroscopic galaxy surveys is their impact in line intensity mapping experiments (e.g., Visbal & Loeb 2010; Gong et al. 2014, 2020; Lidz & Taylor 2016; Cheng et al. 2016, 2020). These studies differ from emission line surveys in that they target the light from unresolved populations of galaxies. However, it has also been noted that interlopers in intensity mapping experiments add an anisotropic signal to the power spectrum of the target population (e.g., Visbal & Loeb 2010; Gong et al. 2014; Lidz & Taylor 2016). In Gong et al. (2020) a method is presented that jointly fits the cosmology and properties of interloper lines in line intensity mapping experiments.
One scenario that has not been addressed by efforts to model the correlation function or power spectrum from spectroscopic emission line surveys is when the contamination fractions and the clustering of the contaminants show rapid evolution within the redshift bins used to define samples. Existing methods may work to an acceptable level with correlation functions that have a reasonable amount of evolution within the redshift bins considered, but in HETDEX the observed [O ii] clustering signal will evolve rapidly with redshift, due to projection effects (see e.g. Figure 2 of Grasshorn Gebhardt et al. 2019). The [O ii] contamination fraction will also be redshift dependent, due to the intrinsic redshift distribution of the emission lines and due to the wavelength dependence of the noise. Although Cheng et al. (2020) recently published a method of generating a 3D lightcone of the interlopers in an intensity mapping survey, their method relies on the interlopers having multiple emission lines. That will not usually be the case for HETDEX, as beyond z ∼ 0.13, the bulk of the [O ii] galaxy population will only have a single detectable emission line. Cheng et al. (2020) also focuses on producing a 3D map of the interloper density, not unbiased correlation function measurements from the target population.
In this paper we present a method to account for the redshift dependence of the contamination fractions in emission-line surveys by combining the decontamination methodology in the literature with the treatment of lightcone effects presented in Yamamoto & Suto (1999) and Suto et al. (2000). References to 'lightcone effects' in this paper specifically refer to effects from the redshift-dependent contamination and observed clustering. We test our method on simulations of the HETDEX survey, and demonstrate that our method to deal with the lightcone effects is an improvement over assuming fixed contamination fractions and clustering across a whole redshift bin. We also show that our new method is useful when using the cross-correlation function to gain unbiased constraints on the contamination fractions. We focus on HETDEX here, but the work we present gives insights into all surveys with contamination rates that depend on redshift.
The outline of the paper is as follows: in section 2 we introduce the HETDEX survey and our simulations of it; this section also includes a method of assigning source classification probabilities. In section 3 we present the methods used to measure and model the projected clustering. Then in section 4 we present the methodology of our decontamination. We show the results of our model in section 5, and in section 6 we use our new methodology to fit for the redshift-dependent contamination. We give our conclusions in section 7.

SIMULATIONS OF HETDEX
In this section we explain how we generate mock catalogues. We note that our work follows that of Chiang et al. (2013), who use an older version of the log-normal simulation code used here (Agrawal et al. 2017), and an older HETDEX design, to produce simulations of the HETDEX survey. We improve on that paper, first by adding [O ii] galaxies and source classifications following Leung et al. (2017), and then by adding in more realistic redshift dependent variations into the sensitivity and noise estimates.
We will begin by introducing HETDEX (section 2.1), then the following sections introduce the model of large-scale structure (section 2.2) and the approach we use to generate a density field with a given power spectrum (section 2.3). We also explain how we assign galaxy properties (sections 2.5 and 2.9), model observational effects (sections 2.4, 2.6 and 2.7) and assign the LAE probabilities (section 2.10) to generate samples of LAEs and [O ii] emitters.

The HETDEX survey
HETDEX is a program on the Hobby-Eberly Telescope at the McDonald Observatory, Texas (Hill et al. in prep; Gebhardt et al. in prep) to use LAEs to map out the large scale structure of the 1.9 < z < 3.5 Universe. The survey measures spectra from the sky using an array of up to 78 integral field units (IFUs; Hill et al. 2018; Hill et al. in prep); galaxies are not pre-selected but instead observations are taken blindly. Each IFU has a square footprint roughly 50 arcsec on a side, and neighbouring IFUs are separated by 100 arcsec. When observing, the gaps between the fibers are filled in by taking 3 dithered exposures. The dithering does not fill in the gaps between the IFUs, however, meaning areas of sky are sparsely sampled. It has been shown that such a sampling can be treated as surveying the whole area with a lower number of tracers (Chiang et al. 2013). We refer to the set of three dithers at one pointing as an 'exposure set', and use the term 'exposure set position' to refer to the right-ascension and declination of the pointing.
The survey sparsely samples two main fields: a roughly 390 deg² field in the northern hemisphere (the 'Spring' field) and a ∼150 deg² equatorial region (the 'Fall' field). Defining the area that is sparsely sampled is difficult due to the jagged edges of the HETDEX footprint, which are caused by the approximately octagonal boundary of IFUs in the focal plane. In the real survey, additional effects we do not model here, such as bright stars in the Milky Way and large foreground galaxies, create holes in the survey, further complicating the issue. Thus, the precise values for the survey areas depend on how survey edges are defined and the regions which are compromised by foreground sources.
The survey goal is to measure the clustering (e.g., correlation function or power spectrum) of the LAEs and use it to probe cosmology. The modest resolution of the spectrographs (mean resolving power R = λ/δλ ∼ 800) means the [O ii] doublet cannot be resolved, resulting in some [O ii] emitters being classified as LAEs (see Leung et al. 2017).
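The unresolved doublet follows directly from the numbers above: the observed separation of the [O ii] λλ3726, 3729 lines is a fixed fraction of the observed wavelength, which is smaller than 1/R. A rough illustrative check (the rest wavelengths are standard laboratory values, not taken from the text):

```python
# Rest-frame [O II] doublet wavelengths in Angstroms (standard values).
SEP_REST = 3728.82 - 3726.03   # doublet separation, ~2.8 A
LAM_REST = 3727.0

def doublet_resolved(z, resolving_power=800.0):
    """Compare the observed doublet separation with the size of a
    resolution element, delta_lambda = lambda_obs / R."""
    lam_obs = LAM_REST * (1.0 + z)
    sep_obs = SEP_REST * (1.0 + z)
    return sep_obs > lam_obs / resolving_power

# Both separation and wavelength scale as (1+z), so sep/lambda ~ 2.8/3727
# ~ 1/1335 is below 1/R = 1/800 at every redshift: never resolved.
print(doublet_resolved(0.3))   # False
```

The redshift cancels out, which is why an R ∼ 800 spectrograph cannot resolve the doublet anywhere in the HETDEX [O ii] range.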

Model of Cosmology & Large-Scale Structure
The simulations and our whole paper use the marginalised mean, flat ΛCDM cosmology from Planck Collaboration et al. (2020), but for simplicity we assume massless neutrinos (see Table 1 for the exact parameter values). The model of the power spectrum and bias used to generate the simulations is the same as that used for the analysis of the Baryon Oscillation Spectroscopic Survey (BOSS; Dawson et al. 2013) by Sánchez et al. (2017), and a full description of the model can be found there. Briefly, a linear power spectrum is generated at the mean pair redshift of the [O ii] (z = 0.3) and LAE (z = 2.5) samples using camb (Lewis et al. 2000). The modelling of the non-linear evolution of the power spectrum is based on a Galilean-invariant version of renormalised perturbation theory (Crocce & Scoccimarro 2006) dubbed gRPT, which will be presented in detail in Crocce et al. 2021, in prep. (see also the description in Eggemeier et al. 2020). The gRPT model offers a good description of the power spectrum down to scales of k ≈ 0.25 h Mpc⁻¹ for a survey like BOSS (Sánchez et al. 2017).
The bias model we use for the input power spectrum is from Chan et al. (2012), and it relates the galaxy overdensity δg to the matter overdensity δ using local bias parameters b1 and b2 and non-local bias parameters γ2 and γ3^- as given in Chan et al. (2012). The full expression of the power spectrum from this bias model is given in Appendix A of Sánchez et al. (2017), and a review of perturbative bias is given in Desjacques et al. (2018). To generate input power spectra for the mocks, the γ2 and γ3^- parameters are set following the local Lagrangian approximation (see Fry 1996; Catelan et al. 1998, 2000; Chan et al. 2012). The local bias parameters we use for the LAEs and [O ii] galaxies are b1 = 2.5 and b1 = 1.5 respectively. The LAE bias we adopt is consistent with the z ∼ 2.5 measurement of Khostovan et al. (2019), if we convert their power-law fits of the clustering to a bias via Quadri et al. (2007), which uses an expression from Peebles (1980). The [O ii] galaxy bias is chosen to be consistent with previous work on HETDEX contamination (Grasshorn Gebhardt et al. 2019). For the LAE second-order bias we use fitting functions of b1 versus b2 from Lazeyras et al. (2016), which they derive using the separate universe approach of Wagner et al. (2015). We use the Lazeyras et al. (2016) results at redshifts slightly higher than the maximum redshift they test, but they see no evidence of redshift dependence in their relations over the range they do test, 0 < z < 2. The fitting function yields b2 = 0.986 for the LAEs. We do not use the same fitting function for the [O ii] galaxies, as it gives a negative power spectrum at scales important to the simulation. This is likely due to an insufficient number of terms in the expansion; correcting this issue would require higher-order bias terms. We therefore set b2 = 0 for these galaxies, since it gives a reasonable power spectrum.
We do not model any dependence of the clustering of sources on luminosity or other galaxy properties as this should not impact our conclusions.

Log-normal Simulations
To generate mock catalogues with our desired power spectrum we use the log-normal simulation code presented in Agrawal et al. (2017). A full explanation of the generation procedure is given in that paper, but we include a brief summary here. The code uses an input power spectrum P_G(k) to generate a 3D Gaussian field on a grid in k-space, G(k). It also generates random phases for each grid point and then carries out a Fourier transform to generate G(x), a realisation of a Gaussian random field with the power spectrum P_G(k). It then transforms this field to yield a field with a log-normal distribution, δ(x). The input power spectrum P_G(k) is chosen in such a way that the resultant log-normal field has the desired power spectrum P(k). In this case our non-linear power spectrum is used for the matter density field, and our non-linear power spectrum with the added effects of bias is used for the galaxy density field. Each cell of the galaxy density field is randomly populated with galaxies. The number of galaxies assigned to a cell is drawn randomly from a Poisson distribution with a mean of n̄(1 + δ(x))V_cell, where n̄ is the number density of galaxies and V_cell is the cell's volume. The code also assigns a velocity to every cell using the linearised continuity equation in Fourier space on the simulated matter density field, using linear growth rates from camb (Lewis et al. 2000). Mock galaxies are then assigned the velocity of their cell. The cell size we use in the simulations is 2.2 h⁻¹ Mpc for our LAE mocks. For the mocks of the [O ii] galaxies, we use a cell size of 0.88 h⁻¹ Mpc; the smaller size compensates for the fact that [O ii] emitters are projected onto larger scales by their misclassification as LAEs. We expect resolution effects on scales smaller than about twice the cell size, and we label this scale on our plots.

[Table 1 caption: The values for the angular area of the survey are explained in section 2.4, and the prediction for the number of LAEs is explained in section 2.6. The volume given is for the LAE redshift range and for the total area that is covered with gaps and sparse observations (see Chiang et al. 2013). The number density assumes the total number of LAEs is spread uniformly over that volume. As we, unlike Planck Collaboration et al. (2020), assume massless neutrinos, we do not use their quoted σ8 value but instead compute it using Lewis et al. (2000). We also include σ12, the square root of the variance in 12 Mpc spheres (i.e. not using h units), as an alternative to the more standard σ8 (see the arguments in Sánchez 2020).]
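The generation steps described above can be sketched as follows. This is not the Agrawal et al. (2017) code: the grid size, box size, number density and power-law spectrum are placeholder values, and the conversion from the target P(k) to the Gaussian P_G(k) is simplified away.

```python
import numpy as np

rng = np.random.default_rng(42)
n, box = 32, 200.0           # grid cells per side, box size (placeholder units)
cell = box / n
nbar = 3e-3                  # mean galaxy number density (placeholder)

# 1) Gaussian random field G(x) with a toy power-law spectrum, standing in
#    for the P_G(k) derived from the target log-normal P(k).
k = 2.0 * np.pi * np.fft.fftfreq(n, d=cell)
kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
kk = np.sqrt(kx**2 + ky**2 + kz**2)
Pk = np.where(kk > 0, 1e3 * (kk / 0.1) ** -1.5, 0.0)

white = rng.normal(size=(n, n, n))
Gk = np.fft.fftn(white) * np.sqrt(Pk / cell**3)
G = np.fft.ifftn(Gk).real    # taking .real keeps a Gaussian field (sketch only)

# 2) Log-normal transform: delta >= -1 everywhere and <delta> ~ 0,
#    because the exp is debiased by the measured variance.
sigma2 = G.var()
delta = np.exp(G - sigma2 / 2.0) - 1.0

# 3) Poisson-sample galaxies cell by cell with mean nbar*(1+delta)*V_cell.
lam = np.clip(nbar * (1.0 + delta) * cell**3, 0.0, None)
counts = rng.poisson(lam)
print(counts.sum(), "mock galaxies; min delta =", delta.min())
```

The key property of the log-normal transform is visible in step 2: unlike a Gaussian field, δ(x) can never drop below −1, so cell densities are always physical before Poisson sampling.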

Adding an Observer and the Angular Selection Function
To convert the simulated galaxies to a catalogue, we place an observer at an appropriate position in simulation coordinates and compute the right ascension, declination and redshift to each mock galaxy from this observer's view point. The location of the observer, the simulation cube dimensions and the coordinate system are chosen in such a way to ensure the whole volume of a HETDEX field is contained within the simulation. We assume the two widely separated Spring and Fall fields are independent, and we also assume the density fields of LAEs and [O ii] galaxies are independent (as in Addison et al. 2019 and Grasshorn Gebhardt et al. 2019, we ignore the small, inferred correlations from gravitational lensing). We therefore simulate each population with separate log-normal simulations.
The line-of-sight (LOS) direction between the observer and every galaxy is computed, and each galaxy's velocity is projected onto the galaxy's LOS direction. These LOS velocities are used to apply offsets to the galaxies' 'observed' redshifts, in order to model redshift space distortions (RSD). In these mocks we do not consider additional effects from the virial motions of galaxies within groups and clusters or from Ly α radiative transfer (e.g. Byrohl et al. 2021; Gurung-López et al. 2019). Also note that although this modelling uses linear-theory-derived velocities, the resultant power spectrum in redshift space is subject to the non-linear aspects of RSD which arise from the transformation of mock galaxies from cosmological to observed redshifts (Agrawal et al. 2017).
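The mapping from cosmological to observed redshift can be sketched as below; the galaxy position, observer position and velocity are made-up values for illustration.

```python
import numpy as np

C_KMS = 299792.458  # speed of light in km/s

def observed_redshift(z_cos, vel, pos_gal, pos_obs):
    """Project a galaxy's velocity onto its line of sight and fold it into
    the cosmological redshift: 1 + z_obs = (1 + z_cos)(1 + v_los / c)."""
    los = pos_gal - pos_obs
    los = los / np.linalg.norm(los)     # unit LOS vector
    v_los = np.dot(vel, los)            # km/s, positive = receding
    return (1.0 + z_cos) * (1.0 + v_los / C_KMS) - 1.0

# A galaxy at z_cos = 2.5 receding at 300 km/s along the line of sight:
z_obs = observed_redshift(2.5, np.array([300.0, 0.0, 0.0]),
                          np.array([100.0, 0.0, 0.0]), np.zeros(3))
```

Note that the velocity perturbation is multiplied by (1 + z_cos), so the same peculiar velocity produces a larger redshift offset at high redshift.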
We apply the angular footprints of the HETDEX fields to the mock catalogues. The exposure set positions for the full survey are combined with the expected positions of the full 78 IFUs in the focal plane. Instead of using the actual mask for the data taken on the telescope we use idealized exposure set positions and assume a full focal plane from the start. We also assume 78 working units for this analysis, as there remains a goal to reach this number on the telescope. Having 74 working units is a more realistic expectation given the data taken at the time of writing (Gebhardt et al in prep). These small differences should not impact our conclusions on the decontamination. Figure 1 shows a mock catalogue with the angular selection function applied. The unusual shape of the Spring field is due to a decision (made in the first half of 2020) that the most efficient use of the telescope time is to extend the area rather than fill in missing regions from the originally planned footprint. This also explains the additional holes in the Spring footprint.
We use the masking software mangle from Swanson et al. (2008) and Hamilton & Tegmark (2004) to apply the survey footprint and also to generate a catalogue of random positions. These random positions are used to measure the clustering and we refer to them as the 'random catalogue' or 'randoms' hereafter. We also use mangle to compute the area of the sky covered by fibers: 55.6 deg² in the Spring field and 27.2 deg² in the Fall field, and make use of a wrapper to mangle called litemangle. Although the footprint of HETDEX is unlikely to have any influence on our ability to discriminate LAEs from [O ii] galaxies, it does influence the error estimates we use to assess the size of systematic biases.
These log-normal simulations are not true lightcone simulations like the ones used to probe contamination effects by Massara et al. (2020), or in the tomographic analysis of Awan & Gawiser (2020), since there is no evolution of the true power spectra along the redshift direction. The focus of this paper, however, is to determine how the misclassification of [O ii] emitters as LAEs produces a redshift dependent projection of the [O ii] galaxy density field, and how this redshift dependence, combined with redshift-dependent contamination fractions, affects clustering. These effects are included as we compute 'observed' redshifts to all of our sources from a simulated observer's point of view.

LAE and [O ii] Properties
To generate catalogues with realistic number densities and classification probabilities, we need to assign points in our mocks luminosities and equivalent widths (EWs). To assign LAE luminosities we use the Schechter function fits to the z = 2.1 and z = 3.1 measured luminosity functions from Gronwall et al. (2014). These Schechter functions are parameterised by the characteristic luminosity, L*, the faint end slope, α, and the number density coefficient, φ*. Similarly, we assume that the LAE EWs follow the exponential distributions found by Gronwall et al. (2014) at those two redshifts (see also equation 2 of Leung et al. 2017). The parameters for the [O ii] luminosity and EW functions come from measurements in four redshift bins between z = 0.1 and z = 0.5 by Ciardullo et al. (2013). At redshifts other than bin centers, we use the linearly interpolated or extrapolated values of all of the parameters. In Table 1 we list the relevant parameters mentioned in this paragraph explicitly. The choice of luminosity and equivalent width functions is made to match the previous work on HETDEX source classification by Leung et al. (2017). We also use the approach of Leung et al. (2017) to correct the measured luminosity functions for low EW LAEs (EW < 20 Å), which were removed when the luminosity functions were estimated. We do not model any relationship between EW and luminosity as this level of realism is not needed for our work.

Assigning Luminosities and the Radial Selection Function
To apply the radial selection function to our mock and random catalogues we begin by assigning luminosities to our mock galaxies. A minimum luminosity is computed for each redshift assuming flux limits much deeper than those of the survey: 6×10⁻¹⁸ erg/s/cm² for LAEs and 4×10⁻¹⁸ erg/s/cm² for [O ii] galaxies. A maximum luminosity is computed as a large multiple of the minimum value, Lmax = 6000 Lmin.
To test whether our choice of Lmax could affect the results, we tried larger values. To avoid having to run the full simulation pipeline, we adopted a faster approach in which we integrated products of the mean extinction, the luminosity function and our completeness model in redshift slices up to an even higher Lmax. The number of sources predicted by our mocks and this simple integration-based technique agree to high precision (< 1% difference). Between the two luminosity limits, random luminosities are drawn from our fiducial luminosity functions (Table 1). These luminosities are then translated to fluxes using the luminosity distance to the virtual observer. For the random catalogue, distances are randomly chosen in a way that is uniform in volume, and luminosities and fluxes are drawn that are consistent with that distance.
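Drawing luminosities between the two limits can be done by inverting a tabulated CDF of the Schechter function, as in this sketch (the α and L* values below are illustrative placeholders, not the fitted parameters of Table 1; φ* drops out of the normalised distribution):

```python
import numpy as np

def sample_schechter(n, lmin, lmax, lstar, alpha, rng):
    """Draw n luminosities from a Schechter function,
    phi(L) dL ~ (L/L*)^alpha exp(-L/L*) dL/L*, truncated to [lmin, lmax],
    by inverting a CDF tabulated on a log-spaced grid."""
    x = np.logspace(np.log10(lmin), np.log10(lmax), 4096) / lstar
    pdf = x**alpha * np.exp(-x)                       # un-normalised dN/dx
    cdf = np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(x))
    cdf = np.concatenate([[0.0], cdf]) / cdf[-1]      # normalise to [0, 1]
    u = rng.random(n)
    return np.interp(u, cdf, x) * lstar               # invert CDF

rng = np.random.default_rng(1)
lmin = 1e42                                 # erg/s, placeholder deep limit
lum = sample_schechter(100_000, lmin, 6000 * lmin, 10**42.7, -1.65, rng)
```

With a steep faint-end slope most draws land near Lmin, which is why the exact value of Lmax has little effect on the predicted source counts.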
Model flux limits are adopted using the 5σ detection limit given in the HETDEX science requirements (and also presented in Hill et al, in prep), and are based on a typical sky spectrum along with the project's expectations for image quality and the efficiency of the whole telescope, spectrograph and detector system. Achieving the number density of LAEs predicted by these flux limits is a target of HETDEX.
We divide the 5σ flux limit at each mock galaxy's observer-frame wavelength by 5, and use that value as the standard deviation of the Gaussian noise we add to the true flux of the emission line. The signal-to-noise ratio (SNR) of this noisy 'observed' emission line is computed and the line is classified as 'detected' in the simulation if its observed SNR exceeds 5. This results in sources being detected 50% of the time if their flux is exactly at the 5σ limit; the completeness that corresponds to other fluxes can be computed by integrating a Gaussian and determining the area above the SNR cut. Carrying out these mock observations is more resource intensive than simply applying the predicted n(z) to the mocks, but it does have the benefit of including Eddington bias in the emission line fluxes. These fluxes are used to estimate the LAE/[O ii] galaxy probabilities (see section 2.10).
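Under this noise model the completeness at a given true flux is a Gaussian integral, which can be written directly as a transcription of the procedure just described:

```python
import math

def completeness(flux, flux_5sigma):
    """Detection probability for a line of true flux `flux`, when Gaussian
    noise with sigma = flux_5sigma / 5 is added and sources whose observed
    flux exceeds flux_5sigma (i.e. SNR > 5) count as detected."""
    sigma = flux_5sigma / 5.0
    return 0.5 * math.erfc((flux_5sigma - flux) / (sigma * math.sqrt(2.0)))

# Exactly at the 5-sigma limit the completeness is 50%, as stated above:
print(completeness(1.0, 1.0))   # 0.5
```

Because the transition from 0 to 1 has finite width, sources just below the nominal limit still scatter into the sample, which is the origin of the Eddington bias mentioned above.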
In addition to cuts in simulated SNR, another part of the radial selection function of [O ii] emitters comes from their size. As explained by Leung et al. (2017), at z < 0.05 most [O ii] emitters will appear extended in imaging data and therefore easily distinguished from the LAE sample. We therefore do not include z < 0.05 [O ii] emitters in our simulations. The argument that imaging will be able to remove very nearby galaxies is also why we do not consider the impact of other even longer rest-frame wavelength potential contaminants such as [O iii] emitters. We further discuss the possible impact of other contaminants on cosmology in an upcoming paper (Farrow et al, in prep.).

On sky sensitivity variations and extinction
The sensitivity and the flux errors of HETDEX vary from exposure set to exposure set, IFU to IFU and even fiber to fiber. These variations are likely to change how well we can classify LAEs as a function of sky position in the real survey. We do not model on-sky variations in the sensitivity of the survey, as in this work we focus on effects along the line of sight, which are likely much more important, since the clustering of the contaminants evolves quickly due to projection effects (see section 3.2).
Although we do not model the aforementioned on-sky sensitivity variations, we do add some sky-position dependent effects as we attenuate the fluxes and mock spectra (see section 2.9) by Galactic extinction. We model the sky-position dependence of Galactic extinction via the python library of Green (2018), utilising the dust maps of Schlegel et al. (1998). To model the wavelength dependence we use the extinction library with the Fitzpatrick (1999) function and the parameters advocated in the Appendix of Schlafly et al. (2010). We confirm that our code can reproduce the extinction in the SDSS bands (Doi et al. 2010) predicted by Schlafly & Finkbeiner (2011) using Munari et al. (2005) stellar spectra to 2% accuracy. This is more than sufficient for our mock catalogues. We also add this extinction to the randoms, so it is accounted for when measuring the clustering. The Galactic dust reddening versus position from Schlegel et al. (1998) is indicated in Figure 1; the equatorial Fall field typically has more Galactic extinction than the Spring field.
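Applying the extinction to a flux is then a one-line operation; the A_λ value below is a made-up placeholder, not one computed from the dust maps:

```python
import numpy as np

def attenuate(flux, a_lambda):
    """Apply Galactic extinction of a_lambda magnitudes to a flux:
    f_obs = f_true * 10**(-0.4 * A_lambda)."""
    return flux * 10.0 ** (-0.4 * np.asarray(a_lambda))

# Placeholder: A_lambda = 0.155 mag dims a line flux by ~13 per cent.
f_obs = attenuate(1e-16, 0.155)
```

Since the attenuation depends on sky position, the same operation must be applied to the randoms' selection, which is why the extinction is also recorded for the random catalogue.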

Source density versus redshift
In the following sections we continue our explanation of the mocks with how we assign mock continuum values and use them to classify galaxies as LAE or [O ii]. Before this, let us consider the number density of the mock catalogues without contamination. Our model of the selection function predicts about a million LAEs and 600,000 z > 0.05 [O ii] emitters in the full HETDEX survey. Figure 2 shows the number density of detected emission line sources in one of our mock catalogues (solid lines) and in our random catalogue (dashed lines) versus redshift. The plots are computed using the full volumes of the two fields. The most prominent troughs in the number density of the randoms are not due to noise, but to the effect of sky lines, which propagate into the survey's sensitivity limit. The difference in the number density between the Spring and Fall fields is mostly caused by different sky filling factors (i.e. there are more gaps in the Spring field). This figure shows the effect of the complicated radial selection function on the detected number density.

[Figure 2 caption: The number density of LAEs in one of our mock catalogues (solid lines) and random catalogues (dashed lines, normalized to the total number of mock sources) in the two HETDEX fields. The structure in the randoms is caused by the complex, wavelength dependent flux limits. The number density is computed assuming the sparsely sampled on-sky area of the two fields; the Fall field has higher number density as the fill-factor of the area is larger.]

Mock LAE/[O ii] Galaxy Spectra
In order to model the separation of LAE and [O ii] emitters as accurately as possible, we generate mock spectra, which allow us to model the noise on the measured EWs more accurately.
To do this we follow the approach of Leung et al. (2017). Details are available in that paper, but summarising the method is helpful for future discussion. Equivalent widths are drawn from the distributions described in section 2.5, with scale lengths as given in Table 1. A spectral slope is assigned to the line emitters, based on (g − r) colours in SDSS filters (Doi et al. 2010) randomly selected from a distribution that resembles the real data (details in Leung et al. 2017). The line flux divided by the EW sets the amplitude of the mock spectra. Then absorption from the intergalactic medium is applied to the mock spectra using the prescription in Madau (1995), with code adapted from Leung et al. (2017) and Acquaviva et al. (2011). We apply broad band filters to the mock spectra to simulate the imaging surveys we intend to use to make estimates of the continuum flux density. In the Fall field we apply the r-band filter of the Dark Energy Survey (DES; Abbott et al. 2018). In the Spring field we have complete coverage with Hyper-Suprime Cam (HSC) data in the r-band, so we apply the HSC filter (Kawanomoto et al. 2018). We use the Python library speclite to supply the filter response functions. Noise is added to the mock magnitude measurements, using a rough estimate derived by dividing the 5σ flux limits of the SHELA survey by five. The 5σ sky-aperture magnitude limits of SHELA were determined by Wold et al. (2019); we take the mean of the four different fields in that work, r = 24.6, converting to flux via Oke & Gunn (1983). For simplicity, we use the noise based on SHELA for the whole survey, which in some areas is actually covered by DES or HSC. This simplification has some impact on the precision of the assigned probabilities, but should not affect the conclusions of our work. Also, early analysis suggests the HSC data is significantly deeper than the SHELA data, so in the Spring field this is a conservative approach.
The noisy magnitude measurements are combined with the noisy line flux measurements to make a noisy estimate of the equivalent width, EW obs .
A few subtleties are worth mentioning here. Firstly, although all of the noise we add is Gaussian, the distribution of EW obs can be realistically non-Gaussian due to taking the inverse of the noisy continuum estimates. Secondly, note we make the assumption in our mock EW observations that the continuum is flat across the r-band and the spectral range of HETDEX. In real data more sophisticated techniques could be used, but here we again decide to be conservative and make the simplest mock measurements from our spectra. Finally, note that for the broad bands we use, which are to the red of Ly α, applying IGM absorption makes no difference to the results, but we include it in the model for possible future work.
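The first subtlety can be demonstrated numerically: even with purely Gaussian noise on both the line and continuum fluxes, the ratio that defines EW obs is skewed and biased high (all flux values and noise levels below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
f_line_true, f_cont_true = 5.0, 1.0       # arbitrary flux units; EW_true = 5
f_line = f_line_true + rng.normal(0.0, 0.5, n)   # Gaussian line-flux noise
f_cont = f_cont_true + rng.normal(0.0, 0.2, n)   # Gaussian continuum noise
ew_obs = f_line / f_cont                          # noisy equivalent width

# Dividing by a noisy continuum skews the EW distribution: its mean is
# biased above the true value of 5 and its tail is asymmetric.
print(ew_obs.mean(), np.median(ew_obs))
```

The bias grows with the fractional continuum noise, so faint-continuum sources (exactly the high-EW candidates of interest) have the most non-Gaussian EW errors.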
Also following Leung et al. (2017), we add other emission lines to the mock [O ii] galaxy spectra, with line ratios taken from the literature (Anders & Fritze-v. Alvensleben 2003, and references therein). We also add appropriate Gaussian noise to these lines, following the same wavelength dependent noise prediction used for Ly α. These other emission lines can also be used to identify [O ii] emitters in the regions of redshift where they are within the spectral range of HETDEX. We use this method to generate 1000 realistic mock HETDEX catalogues.

A modified method to assign probabilities
To split the mocks into 'observed' LAE and [O ii] samples, we assign each mock source a probability of being an LAE, based on its 'observed' properties. To generate these probabilities, we reformulate the Bayesian method of separating the two classes that was presented in Leung et al. (2017). We begin by presenting a conceptually different way to formulate the problem, which results in a set of more easily evaluated equations. We use the same set of inputs as in Leung et al. (2017), except for the source colour, as it is unclear whether we will have deep multiband imaging over the whole HETDEX field. We then consider a small n-dimensional box in the parameter space of EW, flux, wavelength, and the flux of other non-[O ii]/LAE emission lines. Assuming the primary emission line can only be [O ii] λ3727 or Ly α, the probability of the source being an LAE is

P_LAE = N_LAE / (N_LAE + N_[O ii]),    (1)

where N_LAE and N_[O ii] represent the number of LAE and [O ii] emitters, respectively, in the box defined in the space of parameters used for the discrimination. We want this box to be a fixed size in observed coordinates; choosing a fractional interval of ±δ in observed flux (f), equivalent width (w) and wavelength (λ) corresponds to equivalent fractional intervals in the rest-frame quantities, with the flux converted to a luminosity via the luminosity distance, dL. We can now express the numbers N in terms of integrals over the luminosity function, Φ(L/L*, z) dL/L*, the equivalent width distribution, W(w, z), and a Gaussian, G(f_obs − f_exp, σ_line), with mean f_exp and dispersion σ_line. This last term expresses the difference between the noisy measured flux and the expected flux. The resulting expression is very similar to equation (19) of Leung et al. (2017), except here we do not normalise by the number density of the emission-line sources at the redshift under consideration. Moreover, Leung et al. (2017) chose a fixed finite value for δ; we instead let δ become infinitesimally small, which allows us to drop the integrals.
In the limit of an infinitesimally small box, the integrals reduce to the integrands evaluated at the observed parameter values. Then, using equation (1), substituting 1 + z with the ratio of the observed to the assumed rest-frame wavelength, and cancelling the 2δ and λ terms, we arrive at the expression for the LAE probability, equation (8), with the equivalent term for N_[O ii]. In these simulations we evaluate equation (8) using the true underlying input luminosity and equivalent width distributions, the input line ratios, and the cosmology used in the survey. We also use our mock observed measurements when computing the probabilities, which adds noise similar to real data. Future HETDEX papers will carry out more extensive tests and assessments of LAE classification approaches (Davis et al., in prep).
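Schematically, the classification then reduces to comparing the two hypothesis densities at the observed parameter values. A minimal sketch (all distributions and numbers are toy stand-ins, not the survey's actual luminosity or EW functions):

```python
import math

def gauss(x, sigma):
    """Normalised Gaussian density G(x, sigma) with zero mean."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def p_lae(phi_lae, w_lae, g_lae, phi_oii, w_oii, g_oii):
    """P(LAE) in the infinitesimal-box limit: each hypothesis contributes
    the product of a luminosity-function term Phi, an equivalent-width
    term W and a flux-likelihood term G, evaluated at the source's
    observed parameters under that hypothesis."""
    n_lae = phi_lae * w_lae * g_lae
    n_oii = phi_oii * w_oii * g_oii
    return n_lae / (n_lae + n_oii)

# Toy example: a high rest-frame EW strongly favours the LAE hypothesis
# even when [O ii] emitters are more numerous overall.
p = p_lae(phi_lae=1e-3, w_lae=0.8, g_lae=gauss(0.0, 1e-17),
          phi_oii=5e-3, w_oii=0.1, g_oii=gauss(0.0, 1e-17))
print(round(p, 3))
```

Because the box is infinitesimal, no integrals are needed: only products of the densities at the observed point.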
Our library to produce these probabilities, together with an implementation of the Leung et al. (2017) method, has been integrated into the rest of the HETDEX source classification code, and is also available online. The authors of Leung et al. (2017) provided us with their original code, which we use as a reference (and for some sections reproduce directly) in our implementation. This is also true for parts of the HETDEX simulation pipeline. Splitting the samples with a cut of PLAE > 0.5 yields an LAE sample with a contamination fraction of 1.3 per cent. This is actually better than the target LAE sample contamination fraction of 2 per cent, but our classifier is better than what is obtainable for real data, as it assumes we know the properties of the input LAE and [O ii] populations perfectly. To consider a pessimistic scenario we also split the samples using a less conservative cut of PLAE > 0.15, which produces a purer [O ii] sample (contamination fraction of 1.7 per cent), but a greater number of contaminants in the LAE sample (5.1 per cent). It might seem surprising that the PLAE > 0.15 cut still gives a relatively small fraction of contaminants, but it is important to realise that the PLAE values assigned to individual [O ii] emitters are skewed towards zero, as for most sources the classification is nearly unambiguous. In the rest of the paper we will refer to the sample defined by PLAE > 0.15 as the high contamination sample, and that defined by PLAE > 0.5 as the low contamination sample. These two samples bracket the expected 2 per cent contamination of HETDEX.
In order to create a random catalogue that correctly follows the redshift distribution of the data samples, we also compute LAE probabilities for the random catalogue and apply the same probability cuts. If we used random catalogues without contamination, the difference between the redshift distribution of the randoms and that inferred for the observed samples would cause a huge systematic bias.
The predicted sample purity is defined as the number of correctly classified sources in a sample divided by the total sample size.

The correlation functions of the mock catalogues are measured on a two-dimensional grid of the galaxy and/or random pair separation, s, and the cosine of the angle between the pair separation vector and the line of sight, µ. We use the estimator introduced by Landy & Szalay (1993), modified for cross-correlation functions by Blake et al. (2006),

ξ(s, µ) = [D1D2 − D1R2 − D2R1 + R1R2] / R1R2,

where D1, D2 are the pair counts involving the two data catalogues and R1, R2 the corresponding random catalogues (for an auto-correlation, D1 = D2 and R1 = R2). The 2D correlation functions are integrated in µ, weighted with the appropriate Legendre polynomials, to yield measurements of the first three even multipoles, ξ_ℓ(s), following the standard method (e.g. Sánchez et al. 2017). The covariance matrix is estimated from the measured multipoles also using the standard approach, i.e.,

C_ℓℓ′(s_a, s_b) = 1/(N_mk − 1) Σ_{i=1}^{N_mk} [ξ_ℓ^i(s_a) − ξ̄_ℓ(s_a)] [ξ_ℓ′^i(s_b) − ξ̄_ℓ′(s_b)],

where C_ℓℓ′(s_a, s_b) is the covariance between multipoles ℓ and ℓ′ for measurement bins s_a and s_b. The index i runs over the number of mock catalogues, N_mk = 1000. The quantities with bars, e.g., ξ̄_ℓ(s_b), are the mean values from all of the mock catalogues.
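The multipole and covariance estimation steps can be sketched with a minimal numpy implementation of the standard formulas (bin choices and test values are illustrative):

```python
import numpy as np
from numpy.polynomial import legendre

def multipoles(xi_smu, mu_edges, ells=(0, 2, 4)):
    """Legendre-weighted mu integral of a 2D correlation function.
    xi_smu has shape (n_s, n_mu), with mu binned on [0, 1];
    returns an array of shape (len(ells), n_s)."""
    mu = 0.5 * (mu_edges[:-1] + mu_edges[1:])
    dmu = np.diff(mu_edges)
    result = []
    for ell in ells:
        coeffs = np.zeros(ell + 1)
        coeffs[ell] = 1.0                        # select L_ell
        weight = (2 * ell + 1) * legendre.legval(mu, coeffs) * dmu
        result.append(xi_smu @ weight)
    return np.array(result)

def mock_covariance(xi_mocks):
    """Unbiased covariance of concatenated multipole vectors measured
    from N_mk mock catalogues; xi_mocks has shape (N_mk, n_bins)."""
    diff = xi_mocks - xi_mocks.mean(axis=0)
    return diff.T @ diff / (xi_mocks.shape[0] - 1)

# Sanity check: a mu-independent xi is a pure monopole (xi_2 = xi_4 = 0).
mu_edges = np.linspace(0.0, 1.0, 201)
xi_smu = np.full((3, 200), 2.0)
print(multipoles(xi_smu, mu_edges)[:, 0])
```

With µ binned on [0, 1], as for a survey where pair orientation is only defined up to a sign, only the even multipoles survive the integral.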
The simulated Fall and Spring fields have different average flux limits due to different values of the Galactic extinction. Normally, if the fields had significantly different average flux limits they would be biased differently and would need to be analysed separately. In our simulations, however, all the LAE sources have the same correlation function; we therefore combine the two fields by computing weighted sums of the multipoles and covariances following equations (8) and (9) of White et al. (2011).

Projected [O ii] clustering
The [O ii] contaminants in the LAE sample are assigned redshifts assuming the rest-frame wavelength of Ly α, and vice versa for the LAE contaminants in the [O ii] sample. The relation between the redshifts a source would have if its emission line were [O ii] λ3727 rather than Ly α is simply given by

1 + z_[O ii] = (1 + z_LAE) λ_Lyα / λ_[O ii],

where λ_Lyα = 1216 Å and λ_[O ii] = 3727 Å. As noted in Lidz & Taylor (2016), assigning the wrong redshift distorts the apparent clustering of the interlopers: the true separations parallel and transverse to the line of sight are related to the apparent ones by s_true_∥ = c_∥ s_∥ and s_true_⊥ = c_⊥ s_⊥, with

c_∥ = H(z_LAE)(1 + z_[O ii]) / [H(z_[O ii])(1 + z_LAE)],    c_⊥ = DM(z_[O ii]) / DM(z_LAE),

where H(z) is the Hubble parameter and DM(z) is the comoving angular diameter distance to z. This is given by DM(z) = (1 + z)DA(z), where DA(z) is the angular diameter distance. Given these distortion parameters, the correlation function of the projected [O ii] contaminants can be written as

ξ^proj_[O ii](s_⊥, s_∥; z_LAE) = ξ_[O ii](c_⊥ s_⊥, c_∥ s_∥),    (14)

where we do not explicitly show the dependence of the distortion parameters on z_LAE to shorten the notation (for the equivalent expression for the power spectrum see e.g., Pullen et al. 2016; Leung et al. 2017; Grasshorn Gebhardt et al. 2019). The equations describing the clustering of LAEs misclassified as [O ii] galaxies are the same but with the inverse of the distortion parameters, i.e. c_∥^{-1} and c_⊥^{-1}. As the distortion parameters are an approximation of a more complicated effect, in appendix A we test them against a brute-force approach. That appendix also presents an additional test of the methodology we present in section 4.2.
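The distortion parameters are straightforward to evaluate numerically. The sketch below assumes a flat ΛCDM cosmology with illustrative parameter values (not necessarily those adopted by the survey):

```python
import numpy as np

OM, H0 = 0.31, 67.7                  # illustrative flat LCDM parameters
C_KMS = 299792.458                   # speed of light, km/s
LAM_LYA, LAM_OII = 1215.67, 3727.0   # rest wavelengths, Angstrom

def hubble(z):
    """H(z) in km/s/Mpc for flat LCDM."""
    return H0 * np.sqrt(OM * (1.0 + z) ** 3 + (1.0 - OM))

def comoving_distance(z, n=4096):
    """Comoving (transverse) distance D_M(z) in Mpc; flat universe."""
    zz = np.linspace(0.0, z, n)
    return np.trapz(C_KMS / hubble(zz), zz)

def interloper_redshift(z_lae):
    """True [O II] redshift of a source misclassified as an LAE."""
    return (1.0 + z_lae) * LAM_LYA / LAM_OII - 1.0

def distortion_parameters(z_lae):
    """(c_par, c_perp) relating apparent (LAE-frame) separations to the
    true [O II]-frame separations of the interlopers."""
    z_oii = interloper_redshift(z_lae)
    c_par = hubble(z_lae) * (1.0 + z_oii) / (hubble(z_oii) * (1.0 + z_lae))
    c_perp = comoving_distance(z_oii) / comoving_distance(z_lae)
    return c_par, c_perp

# At z_LAE = 3.4 the interloper sits at z_[OII] ~ 0.44; the transverse
# distortion (c_perp far from unity) dominates over the radial one.
print(interloper_redshift(3.4), distortion_parameters(3.4))
```

As the text notes, c_⊥ moves toward unity at the high-redshift end of the survey, which weakens the projection distortion there.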
The redshift dependence of the distortions causes the clustering of the [O ii] contaminants to evolve with (Ly α based) redshift. To illustrate these effects we show in Figure 4 the predicted multipoles of misclassified [O ii] emitters at two different redshifts. The projection strongly modifies the amplitude and shape of the monopole and quadrupole, and we also note an increase in the hexadecapole. The dotted lines in Figure 4 show the predicted multipoles of z_[O ii] = 0.44 [O ii] emitters that are misclassified as z_LAE = 3.4 LAEs. We see similar trends to the lower redshift projection, but the amplitude of the distorted multipoles is lower. This decrease is driven by c_⊥ becoming closer to unity; the distortion transverse to the line of sight is much larger than the distortion in the parallel direction, c_∥ (e.g. Grasshorn Gebhardt et al. 2019). We will return to modelling these signals over a redshift range in section 4.2.
At this point we highlight the fact that we always use the true cosmology when computing the parameters for the projection. As highlighted by Addison et al. (2019), if we want to make predictions for the projected functions in the real data, we need to be aware of the additional uncertainty from not knowing the actual cosmology. We discuss this again at the end of the paper.

Simple decontamination ignoring redshift dependencies
As mentioned, in this paper we develop a new method to deal with the redshift dependence of the contamination. We start by slightly modifying equation (12) of Awan & Gawiser (2020) to use the multipoles of the two-dimensional correlation function instead of the angular clustering, giving

(ξ^obs_a, ξ^obs_b, ξ^obs_a×b)^T = D_s (ξ^true_a, ξ^true_a×b, ξ^true_b)^T.

The matrix contains contributions from the fractions of each type of galaxy that were correctly classified (i.e., the purity), labelled f_aa and f_bb, and the fractions that were misclassified, f_ab and f_ba. In Awan & Gawiser (2020) this matrix is given as

D_s = ( f_aa²       2 f_aa f_ab              f_ab²
        f_ba²       2 f_ba f_bb              f_bb²
        f_aa f_ba   f_aa f_bb + f_ab f_ba    f_ab f_bb ),

where the contamination fractions can be computed from the purity via f_ba = 1 − f_bb and f_ab = 1 − f_aa. To be more specific to the case of HETDEX, we relabel f_aa as f_LAE and f_bb as f_[O ii]. Also following Awan & Gawiser (2020), the decontaminated estimates of the auto- and cross-correlation functions can then be obtained by applying the matrix inverse to a vector of the observed functions, i.e.,

(ξ^est_a, ξ^est_a×b, ξ^est_b)^T = D_s^{-1} (ξ^obs_a, ξ^obs_b, ξ^obs_a×b)^T.    (18)

The decontaminated cross-correlation obtained this way is related, via equations (16) and (17), to the residual between the observed cross-correlation and that predicted from the contamination by a constant factor (equation 20). We will label this approach 'simple decontamination' to differentiate it from our new approach of 'lightcone decontamination'. We note that Awan & Gawiser (2020) developed this method for angular clustering in tomographic redshift bins. They do not claim that the method will work for our scenario, which has rapidly evolving projected [O ii] contamination within the redshift bins considered. However, we present it in its unmodified form as a demonstration of what might happen if one does not take additional steps to deal with this rapid evolution.
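Per separation bin, this style of decontamination is a small linear inversion. A minimal numpy sketch; the mixing matrix below encodes the standard combination of auto- and cross-correlations for given purities and should be read as illustrative of the Awan & Gawiser-style approach rather than a verbatim transcription of their equation:

```python
import numpy as np

def mixing_matrix(f_aa, f_bb):
    """D_s mapping the true vector (xi_aa, xi_ab, xi_bb) to the observed
    vector (xi_a, xi_b, xi_axb), for purities f_aa, f_bb and
    contamination fractions f_ab = 1 - f_aa, f_ba = 1 - f_bb."""
    f_ab, f_ba = 1.0 - f_aa, 1.0 - f_bb
    return np.array([
        [f_aa ** 2, 2.0 * f_aa * f_ab, f_ab ** 2],
        [f_ba ** 2, 2.0 * f_ba * f_bb, f_bb ** 2],
        [f_aa * f_ba, f_aa * f_bb + f_ab * f_ba, f_ab * f_bb],
    ])

def simple_decontaminate(xi_obs, f_aa, f_bb):
    """Invert the mixing bin by bin: xi_obs has shape (3, n_bins) with
    rows (auto_a, auto_b, cross); returns (auto_a, cross, auto_b)."""
    return np.linalg.solve(mixing_matrix(f_aa, f_bb), xi_obs)

# Round trip with a known 'true' clustering vector per separation bin.
xi_true = np.array([[1.0, 0.5], [0.0, 0.0], [4.0, 2.0]])  # (aa, ab, bb)
xi_obs = mixing_matrix(0.98, 0.96) @ xi_true
print(np.allclose(simple_decontaminate(xi_obs, 0.98, 0.96), xi_true))
```

Each row of the matrix sums to one, since every observed pair must involve some combination of the two true populations.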

Lightcone based decontamination
When we apply the matrix of Awan & Gawiser (2020) to HETDEX, we make the assumption that the observed clustering of the galaxies classified as [O ii] emitters is representative of the projected clustering of the [O ii] interlopers in the LAE sample, an assumption that breaks down when the projected clustering evolves within the redshift bin. The idea, then, is to use something like the decontamination matrix of Awan & Gawiser (2020), but instead of using the observed clustering of the [O ii] emitters, we apply a prediction for the clustering of contaminants that is consistent with the redshift dependence of the interloper number density, n_inter(z). To make a prediction for the expected interloper clustering in a redshift range, we refer to the work of Yamamoto & Suto (1999) and Suto et al. (2000). They approximate the correlation function between two galaxies as the correlation function at the mid-point between them (equation 19 of Yamamoto & Suto 1999). This results in a fairly intuitive expression that approximates the observed correlation function for galaxies in the redshift range z_min to z_max as an integral of the redshift-dependent correlation function weighted by the square of the number density as a function of redshift, i.e.

ξ̄(s) = ∫_{z_min}^{z_max} dz (dV_c/dz) n²(z) ξ(s; z) / ∫_{z_min}^{z_max} dz (dV_c/dz) n²(z),    (21)

where dV_c/dz is the comoving volume element.
This differs slightly from equation (18) of Suto et al. (2000) in that we use the observed number density of objects, not the true comoving number density in real space, so that the terms related to the selection function and the AP distortion are unneeded. Equation (21) also assumes that n(z) does not change much over the separations under consideration, s < 180 h −1 Mpc, and that the redshift evolution of ξ(s; z) is slow enough to be unimportant over those same scales. This is an approximation, as there are certainly redshifts over which the projected [O ii] clustering changes rapidly. But as we will see, the simplification works reasonably well for our simulations. For surveys whose properties differ from those of mock HETDEX, it would be prudent to test the technique with tailored simulations.
To continue, we define a function to carry out the lightcone (i.e., redshift) integral,

F(x, y) = ∫_{z_min}^{z_max} dz (dV_c/dz) x(z) y(z).    (22)

Given equation (22), and using equations (16) and (17) with equation (21), we get the following expression for the prediction of the observed auto-correlation function for LAEs with redshift z and purity f(z),

ξ^obs_LAE(s, µ) = [ F(f n_LAE, f n_LAE ξ^true_LAE) + 2 F(f n_LAE, (1 − f) n_LAE ξ^true,proj_LAE×[O ii]) + F((1 − f) n_LAE, (1 − f) n_LAE ξ^proj_[O ii]) ] / F(n_LAE, n_LAE).    (23)

The subscripts on n(z) indicate which observed sample's redshift versus volume number density should be used. To be closer to the numerical implementation we replaced the multipoles of equation (16) with the 2D correlation function; this makes no practical difference, as the decontamination has no µ dependence, so the order of decontamination and conversion of the measurements to multipoles is unimportant. The number densities are for the total "observed" samples with contaminants. Here we use the fact that the (LAE) redshift distribution of the [O ii] interlopers is given by n^inter_[O ii](z) = (1 − f(z)) n_LAE(z). We explicitly include the redshift dependence of the purity parameters, to differentiate them from their redshift independent versions: f_LAE and f_[O ii]. For brevity we also drop the LAE subscript from the redshift dependent purity parameter in this section. Since the integral of the number density gives the total number of objects, the redshift dependent and independent types of purity parameter are related by the lightcone integral, i.e.

f_LAE = F(f, n_LAE) / F(1, n_LAE)

for the LAE sample, and the equivalent for the [O ii] sample. Samples of emission line galaxies with nearby rest-frame wavelengths, such as Hβ and [O iii], will have a non-zero cross-correlation due to large-scale structure, and even widely separated samples like our LAE and [O ii] galaxies will have a slightly non-zero cross-correlation due to cosmic magnification of the background galaxies by the foreground galaxies. As mentioned, our simulations do not include such magnification, as it is a very small signal. We therefore now assume that the true cross-correlation between the [O ii] and LAE samples is zero. This limits the method to scenarios, like HETDEX, where the contaminants are not correlated with the main sample. However, future work on surveys with sample-contaminant cross-correlations could still use equation (23) in an approach that tries to forward-model the relevant auto- and cross-correlations.
Given that the cross-correlation is zero, to find our estimate for ξ^true_LAE(s, µ, z) we now make the assumption that the LAE clustering does not evolve with redshift, which allows us to take it out of the lightcone integral; we are left with

ξ^obs_LAE(s, µ) = ξ^est_LAE(s, µ) F(f n_LAE, f n_LAE) / F(n_LAE, n_LAE) + F((1 − f) n_LAE, (1 − f) n_LAE ξ^proj_[O ii]) / F(n_LAE, n_LAE).

The integrals over redshift are then all carried out numerically and an estimate of the true correlation function, ξ^est_LAE, can be made. Even when the assumption that ξ^true_LAE(s, µ, z) does not evolve over the whole survey is unreasonable, this approach can be utilised to estimate the observed correlation function in bins of redshift over which that evolution is expected to be small enough that this average provides a meaningful observable. Additionally, we discuss plans to relax this assumption in section 7.
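Numerically, the lightcone integrals are simple quadratures over a redshift grid. The sketch below forward-models the equation-(23)-style mixing on a toy grid (constant true LAE clustering, zero cross term) and then inverts it; all grids and amplitudes are made up for illustration:

```python
import numpy as np

def F(z, dvdz, x, y):
    """Lightcone integral F(x, y) = int dz (dV/dz) x(z) y(z)."""
    return np.trapz(dvdz * x * y, z)

def estimate_true_lae(xi_obs, xi_proj_oii, z, dvdz, n_lae, f):
    """Invert the lightcone mixing, assuming the true LAE clustering is
    constant across the bin and the LAE x [O II] cross term is zero:
      xi_obs = xi_est * F(f n, f n)/F(n, n)
               + F((1-f) n, (1-f) n xi_proj)/F(n, n).
    xi_proj_oii has shape (n_z, n_bins); xi_obs has shape (n_bins,)."""
    norm = F(z, dvdz, n_lae, n_lae)
    pure = F(z, dvdz, f * n_lae, f * n_lae)
    contam = np.trapz((dvdz * ((1.0 - f) * n_lae) ** 2)[:, None]
                      * xi_proj_oii, z, axis=0)
    return (xi_obs - contam / norm) * norm / pure

# Toy forward model, then recovery.
z = np.linspace(2.22, 3.5, 100)
dvdz = np.full_like(z, 1.0)                   # toy volume element
n_lae = np.full_like(z, 3e-4)                 # toy number density
f = np.linspace(0.90, 0.96, z.size)           # toy z-dependent purity
xi_true = np.array([1.0, 0.5, 0.2])           # constant LAE clustering
xi_proj = np.outer(1.0 + z, [4.0, 2.0, 1.0])  # evolving [O II] projection

contam = np.trapz((dvdz * ((1.0 - f) * n_lae) ** 2)[:, None] * xi_proj,
                  z, axis=0)
xi_obs = (xi_true * F(z, dvdz, f * n_lae, f * n_lae) + contam) \
         / F(z, dvdz, n_lae, n_lae)
print(np.allclose(estimate_true_lae(xi_obs, xi_proj, z, dvdz, n_lae, f),
                  xi_true))
```

The same quadrature scheme must be used in the forward and inverse directions, so the recovery is exact by construction; with real measurements only the forward direction is exact and the recovery inherits the approximations discussed above.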
Given our estimate of ξ^est_LAE(s, µ), we can use the cross-correlation term of equations (16) and (17) to predict the observed cross-correlation measured with LAE redshifts as follows:

ξ^obs_LAE×[O ii](s, µ) = [ F(f n_LAE, n^proj_[O ii] ξ^true,proj_LAE×[O ii]) + F((1 − f) n_LAE, n^proj_[O ii] ξ^proj_[O ii]) ] / F(n_LAE, n^proj_[O ii]),

where we have restored the cross-correlation term, and n^proj_[O ii](z) is the redshift distribution of [O ii] emitters projected into LAE redshifts. This latter quantity can easily be measured by computing redshifts assuming Ly α for the galaxies in the [O ii] sample. We cannot estimate the redshift dependent cross-correlation by simply rearranging this expression, but if the cross-correlation is expected to be non-zero a forward modelling approach could be used. Here we forward-model the expected cross-correlation signal, using the fact that ξ^true,proj_LAE×[O ii](s, µ, z) = 0 in HETDEX, which gives

ξ^pred,obs_LAE×[O ii](s, µ) = F((1 − f) n_LAE, n^proj_[O ii] ξ^proj_[O ii]) / F(n_LAE, n^proj_[O ii]).    (27)

This equation is the lightcone version of equation (19).

RESULTS
The differences between the multipoles of the auto-correlation function for the (de)contaminated and the pure cases (measured from the corresponding pure LAE catalogues) for the full HETDEX redshift range (1.9 < z < 3.5) are given in the left column of Figure 5. The upper and lower panels in Figure 5 give the two PLAE cuts under consideration. The points are the mean of the 1000 mocks. Each measurement has been divided by the statistical error expected for the HETDEX survey, i.e., the square-root of the diagonal of the covariance matrix derived from the mocks. As we have 1000 mock catalogues, the errors on our mean measurement are much smaller than the statistical error on a single HETDEX mock. In the following, when we refer to σ, we specifically mean the statistical error on a single realisation. The right column shows, for the same redshift range, the residual difference between the observed cross-correlations of the [O ii] and LAE samples and the cross-correlation predicted from the contamination, from equation (19) for the simple method and equation (27) for the lightcone method. The differences are divided by the statistical errors. Recall from equation (20) that for the simple method the decontaminated cross-correlation is related to the residual we plot by a constant factor. As we divide by the statistical error, this means the residuals we plot for the simple decontamination method also give the statistical significance (ignoring the off-diagonal terms of the covariance) of the spurious cross-correlation signal that remains in the simple decontaminated multipoles. We chose to frame the discussion of the simple method in terms of the residuals in order to make comparisons to the lightcone approach easier.
In the following subsections we study the impact of contamination on the raw (section 5.1), the simple decontaminated (section 5.2) and lightcone decontaminated (section 5.3) multipoles. The final subsection considers the scenario where the redshift range with no modelled [O ii] emitters (z < 0.05) is cut out of the catalogue (section 5.4).

Raw clustering multipoles
It is clear from the dotted lines in Figure 5, which show the results from the raw measurements of the contaminated mocks, that interloper contamination modifies the correlation signals. This has been previously seen or predicted many times in the literature (e.g. Leung et al. 2017; Awan & Gawiser 2020; Grasshorn Gebhardt et al. 2019; Addison et al. 2019; Massara et al. 2020). In the low contamination LAE sample, the offset in the auto-correlation is generally < 1σ but reaches a maximum difference of ∼ 2.5σ at scales down to twice the cell size of the log-normal simulations (indicated by the vertical dashed line). In the high contamination sample, where 5.1 per cent of the LAE catalogue are [O ii] galaxies, many of the auto-correlation function hexadecapole measurements are systematically high by up to 2σ at separations > 20 h −1 Mpc. At smaller scales, where the effect of the contamination appears greater, the raw auto-correlation function multipole measurements can be biased by several σ.
The signal of contamination in the cross-correlation function is even stronger. We can see from the right column of Figure 5 that we expect the signal to be clearly detected in the HETDEX survey, even at low levels of contamination. The cross-correlation function, which is zero for pure samples, shows a positive monopole and hexadecapole, and a negative quadrupole. The negative quadrupole implies that the two-dimensional cross-correlation function appears elongated transverse to the line of sight. This could be explained by the fact that the cross-correlation is dominated by the strong clustering signal of [O ii] galaxies (Grasshorn Gebhardt et al. 2019), which, as mentioned, appears elongated due to projection effects.

Simple Decontaminated measurements
In this section, we use equation (18) to decontaminate the galaxy samples while ignoring redshift dependencies in the interloper fraction. We take the required contamination and purity factors, i.e. the components of Ds, from the numbers of LAEs and [O ii] emitters in the mock catalogues averaged over the realisations; in real data, some procedure will be needed to measure the contamination and purity. We cover this scenario in section 6.
The dashed lines in Figure 5, labelled 'simple decontaminated', show the results. In the low contamination LAE sample (1.3 per cent [O ii] galaxies), the simple method results in all of the multipoles having no significant systematic bias (< 1.0σ) all the way down to the resolution limit of the catalogue. The decontaminated measurements are a modest improvement over the raw measurements. In the high LAE sample contamination case (5.1 per cent [O ii] emitters) the decontamination improves the monopole, but at small scales the monopole still has a bias approaching ∼ 2σ. Additionally, the hexadecapole is biased up to ∼ 1.5σ low after the decontamination, a bias in the opposite direction from what was seen in the raw data.
Subtracting the predicted contribution to the cross-correlation from contamination decreases the observed cross-correlation. However, the correction is too small, and very significant (> 5σ over a range of scales) cross-correlations still remain in the high contamination case. The low contamination sample also shows residual cross-correlation signals that increase relative to the statistical errors as smaller separations are considered, producing spurious signals as large as ∼ 5σ. As mentioned, the residuals we plot are related to the full decontaminated cross-correlation from the simple method via a constant factor (equation 20), meaning they also show the significance of spurious cross-correlations left after the full decontamination process. Although a strong cross-correlation signal would not be expected to directly affect cosmological parameters derived from HETDEX, a non-zero cross-correlation after simple decontamination does suggest a failure of the modelling, which can cause indirect effects. For example, the methods presented in Grasshorn Gebhardt et al. (2019) and Addison et al. (2019) use the cross-correlation signal to determine the purity and contamination of the LAE and [O ii] samples. If the lightcone effects are ignored, the inferred values of the contamination will become artificially high in order to force the decontaminated cross-correlation toward zero. This in turn would impact the contamination and purity values used to decontaminate the auto-correlation, causing additional bias in its measurement. We will demonstrate this in section 6.

Lightcone decontamination
In this section we apply our new method of decontaminating the samples while accounting for lightcone/redshift effects. As with the simple method, we start by assuming perfect knowledge of the purity and contamination, i.e. f(z) and f_[O ii](z), and use the redshift dependent contamination fraction measured from the random catalogues (i.e., Figure 3). The results are shown as the solid lines in Figure 5.
In the low contamination LAE sample auto-correlation function we see little meaningful difference when compared to the simple method: both return clustering multipoles with very little evidence of systematic bias. In the high contamination sample, on the other hand, the lightcone-based decontamination does a better job of correcting the multipoles. Down to a scale of 20 h −1 Mpc the new method returns measurements with a bias of less than ∼ 0.25σ.
The differences between the lightcone model predicted and measured cross-correlation functions show an even greater improvement over the simple method. In both the low and high contamination scenarios, the new method accounts for the spurious cross-correlation signal leaving less than ∼ 1σ residuals at all scales greater than twice the cell size of the LAE simulation box.
The improvements seen in our approach support the idea that the residual, biased signals seen when using simple decontamination come from applying it to clustering measurements without accounting for the significant redshift evolution of the projected clustering and the contamination fraction within the redshift bin.

Restricted redshift range
The redshift range studied so far, 1.9 < z < 3.5, includes a volume in which we expect there to be no [O ii] emitters. This is strictly true at redshifts 1.90 < zLAE < 2.06, since the corresponding [O ii] redshifts would be negative; in practice our mocks contain no [O ii] emitters until zLAE = 2.22, as we do not model [O ii] emitters at z < 0.05. Removing the redshift range over which the LAE sample is pure means the contamination fraction for LAEs increases to 6.8 per cent for the high contamination sample (defined by PLAE > 0.15) and 1.8 per cent for the low contamination case. In Figure 6 we show results for the redshift range 2.22 < z < 3.5. As before, the plots with auto-correlations show the difference between the (de)contaminated measurements and those from the corresponding pure catalogues with the same redshift range. In the new case, the contaminated cross-correlation function looks much the same as for the full redshift range, while the simple decontamination procedure, which ignores the redshift dependence of the purity within the redshift bin, works better than for the full z range. Nonetheless, it can be seen from Figure 6 that even when cutting out the redshifts with the most dramatic changes in purity and contamination, not accounting for lightcone effects can still cause biases. In the lower contamination case, the simple decontamination leaves a ∼ 2σ biased cross-correlation monopole at separations s > 20 h −1 Mpc. This bias increases to ∼ 5σ for the higher contamination case. The auto-correlation also displays significant biases in the high contamination case if redshift effects are ignored. The hexadecapole shows a systematic bias of up to ∼ 1σ even at large scales, while the monopole shows a ∼ 2σ bias at the resolution limit of the simulation.
In contrast, as for the full redshift range analysis, the new lightcone based decontamination method returns very close to the true auto-correlation function down to twice the cell size of the LAE box, and also accurately predicts the cross-correlation, with only tiny, insignificant residuals, over the same range of scales.

Fitting model and technique
The work so far has assumed we have perfect knowledge of the contamination. We now attempt to fit for the contamination by minimising the residual cross-correlation function, which, as mentioned previously, should be zero for the case of perfect decontamination, as the true LAE and [O ii] samples are uncorrelated. We do this for both the simple method and our lightcone-based approach to decontamination. We minimise the residual difference between the observed cross-correlation multipoles and the predicted cross-correlation multipoles evaluated using equations (19) and (27) for the simple and lightcone methods respectively,

χ² = (ξ^obs_LAE×[O ii] − ξ^pred,obs_LAE×[O ii])^T C^{-1} (ξ^obs_LAE×[O ii] − ξ^pred,obs_LAE×[O ii]),

where ξ_LAE×[O ii] is a vector of all the multipole measurements, with superscripts indicating the observed ('obs') and predicted ('pred, obs') cross-correlation functions due to contamination, and C is the covariance matrix of the 'observed' cross-correlation functions that we measure from the mock realisations. The data to which we fit our model is the mean of the 1000 measured cross-correlation functions, but we use the covariance matrix, C, appropriate for a single realisation. This means our results will have the reported uncertainty appropriate for a single HETDEX realisation, but they will be centred much closer to the best model description than is statistically likely for real data.
In the lightcone method, we use a two-parameter model for the redshift dependence of the purity of each sample: we fit the purity at the low and high edges of the redshift bin, f(z_low) and f(z_high), for both the LAEs and the [O ii] galaxies, corresponding to four parameters in total. We linearly interpolate between the two values to obtain the purity at intermediate redshifts. This is motivated by considering the simplest model that could fit the redshift dependent contamination fractions shown in Figure 3.
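The two-parameter purity model and the χ² can be written compactly. A sketch (the edge redshifts follow the restricted range quoted in the text; everything else is illustrative):

```python
import numpy as np

def purity_model(z, f_low, f_high, z_low=2.22, z_high=3.5):
    """Linear interpolation of the purity between the bin edges."""
    return np.interp(z, [z_low, z_high], [f_low, f_high])

def chi2(xi_obs, xi_pred, cov):
    """Gaussian chi^2 of the residual cross-correlation multipoles."""
    r = xi_obs - xi_pred
    return float(r @ np.linalg.solve(cov, r))

# The purity at the bin centre is the average of the two edge values.
print(round(purity_model(0.5 * (2.22 + 3.5), 0.90, 1.00), 6))
# With an identity covariance, chi^2 is just the squared residual norm.
print(chi2(np.array([1.0, 2.0]), np.zeros(2), np.eye(2)))
```

In a real fit the two purity functions would feed into the lightcone prediction of equation (27) before the χ² is evaluated.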
We need models of the LAE and [O ii] clustering to make predictions for the cross-correlation for given contamination parameters. One approach could be to use the measured LAE and [O ii] correlation functions, decontaminated using the model whose χ² is being evaluated. Since our aim is to present a proof of concept, and highlight the sensitivity of these statistics to the redshift dependent f(z), we avoid this extra complication by using the true [O ii] and LAE correlation functions, taken as the mean measurement of the 1000 pure mocks of the Fall field. For the [O ii] correlation function in the simple method we use the mean correlation function measured from 1000 pure [O ii] catalogues using LAE redshifts.
If one did estimate auto-correlation functions from the data, then their errors would have to be accounted for in the fitting routine. For the simple method one could use equation A16 of Awan & Gawiser (2020), which is an expression for the covariance of the decontaminated auto- and cross-correlation functions that propagates the errors on the observed auto- and cross-correlation functions. For the lightcone method, we argue we can avoid these issues in future work by including models of the LAE and [O ii] correlation functions in the fitting (see more discussion in section 7). As we only use the covariance of the cross-correlation function in our fits, the uncertainties in our parameter estimates correspond to the unrealistically optimistic scenario of perfect measurements or knowledge of the LAE and [O ii] auto-correlation functions.
We fit the sample in the restricted redshift range 2.22 < zLAE < 3.50, since we assume all lower redshift [O ii] emitters can be correctly classified using imaging. We note that even in the restricted redshift range there is an interval that in practice has 100 per cent LAE sample purity, where many of the other emission lines of [O ii] galaxies also lie in the HETDEX spectral range. Because of this, and the presence of other sharp features in the purity versus redshift relation, our simple straight-line model for the contamination is not an ideal model of f(z). Nevertheless, we shall see it does a reasonable job of describing the behaviour of contaminants.
To explore the posterior of the contamination model parameters we use the software package cobaya (Torrado & Lewis 2021), which uses the MCMC sampler of Lewis & Bridle (2002) and Lewis (2013). We use a flat prior between 0.6 < f < 1.0 for the purity parameters, and compute the likelihood as L = exp(−χ²/2). We ran eight chains for each fit, setting the convergence criterion in cobaya to require a value of the Gelman-Rubin statistic (Gelman & Rubin 1992; Brooks & Gelman 1998), modified as described in Lewis (2013), of R − 1 = 0.005 (the cobaya parameter rminus1_stop). The smallest separation bin we use is 15.0 < s [h −1 Mpc] < 20.0. Even though we assume perfect knowledge of the LAE and [O ii] auto-correlation functions, we avoid using smaller scales in the fit. This mimics a scenario where a model of the auto-correlation function might not work sufficiently well on small, more non-linear scales. The MCMC contour plots, confidence intervals and best-fitting parameters were derived using the getdist Python package (Lewis 2019).

Figure 8. The 68% and 95% MCMC contours derived by fitting the cross-correlation function of the mock catalogues using a two parameter model for the purity of the LAE and [O ii] samples. These results are from our lightcone decontamination method. The data are for the high contamination PLAE > 0.15 sample, which has an average contamination rate of 7%. The normalized, marginalized 1D likelihoods are also shown. A redshift dependence of the contamination of the LAEs is clearly detected.

We quote the best-fitting parameters, together with intervals derived from the highest-likelihood 68 per cent of weighted chain values. In section 6.4 we explain why we do this rather than quoting the marginalised mean and standard deviation. These values can be compared to the total purity and contamination of the whole sample.
The LAE purity shows a slight bias with respect to the true purity of the sample (f = 0.982), while the [O ii] purity agrees with the true value of f [O ii] = 0.956 (these true values are exact to the given number of significant figures). Note that, since we fit to the mean of the mock measurements, the detection of this bias in the LAE purity is more significant than its size. In this case the LAE purity measurement is biased low with respect to the truth because the true purity values leave spurious cross-correlation signals (see Figure 6), so a better χ 2 is obtained with biased parameters that return a smaller cross-correlation signal.

Results of the fit using the lightcone method
We now turn our attention to the new lightcone decontamination model, which uses two parameters to describe the redshift dependence of the sample purity. The results of our MCMC fitting of the cross-correlation function for the high contamination sample are given in Figure 8. We can see a degeneracy between the low- and high-redshift purity limits: lower purity at high redshift can be partly compensated by higher purity at low redshift. This degeneracy is not complete, however, and more contamination at higher redshifts is favoured. The [O ii] emitters, on the other hand, are consistent with 100 per cent purity, and there is no significant detection of any redshift dependence of their contamination.
The MCMC chains of the low contamination sample also show the contamination of the LAEs is detected. In this case,

Constraints on the purity
We now wish to visualize our constraints in a plot of purity versus redshift. The flat priors on the purity at the ends of the redshift range result in non-flat priors on the derived purity at intermediate redshifts. For example, at the centre of the redshift range a contamination fraction in the middle of the prior range is favoured, as there are more allowed combinations of f (z low ) and f (z high ) that produce a purity line crossing that point. In addition, we have the complication that the expected purity is very close to the edge of our priors, and those priors cannot be expanded without including nonsensical values of purity (i.e., > 1.0). Therefore, to minimise the effects of the priors when plotting the purity constraints of Figure 3, we do not plot the weighted mean and standard deviation of the chains. Instead we show the best-fitting parameters as a dotted line, and, as a shaded region, the maximum and minimum values of purity within the 68 per cent of weighted chain values with the highest likelihoods. We compute these quantities using the likelihood functions in getdist (Lewis 2019).
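The envelope construction described above can be sketched as follows. The array names, the toy chain, and the straight-line purity model endpoints are illustrative assumptions; getdist's own likelihood-ranking routines differ in detail:

```python
import numpy as np

def purity_of_z(f_low, f_high, z, z_low=2.22, z_high=3.5):
    """Straight-line purity model between the two redshift endpoints."""
    return f_low + (f_high - f_low) * (z - z_low) / (z_high - z_low)

def highest_likelihood_band(samples, weights, loglikes, z_grid, frac=0.68):
    """Min/max purity envelope over the fraction `frac` of weighted chain
    samples with the highest likelihoods.

    samples : (n, 2) array of (f_low, f_high) chain positions
    weights : (n,) chain weights
    loglikes: (n,) log-likelihood at each chain position
    """
    order = np.argsort(loglikes)[::-1]               # best likelihood first
    cum = np.cumsum(weights[order]) / weights.sum()  # weighted cumulative frac
    keep = order[cum <= frac]
    f = purity_of_z(samples[keep, 0, None], samples[keep, 1, None], z_grid)
    return f.min(axis=0), f.max(axis=0)

# Toy chain mimicking the fit: purity endpoints near 0.99 and 0.95
rng = np.random.default_rng(1)
n = 20000
samples = np.column_stack([rng.normal(0.99, 0.01, n).clip(0.6, 1.0),
                           rng.normal(0.95, 0.02, n).clip(0.6, 1.0)])
loglikes = -((samples[:, 0] - 0.99)**2 / 2e-4 + (samples[:, 1] - 0.95)**2 / 8e-4)
weights = np.ones(n)
z = np.linspace(2.22, 3.5, 50)
lo, hi = highest_likelihood_band(samples, weights, loglikes, z)
print(np.all(lo <= hi))  # → True
```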
We can see in Figure 3 that, for both of the P LAE cuts, and for both the LAE and [O ii] galaxy samples, the best-fitting parameters qualitatively reproduce the slope and amplitude that would be expected for a straight-line fit to the more complex behaviour of the true purity. The true purity as a function of redshift is nearly always within the minimum-maximum range defined by the 68 per cent of chain positions with the highest likelihood. We do, however, see that the true purity lies slightly outside this region at the two redshift extremes. This is reasonable, as at those positions the true purity deviates most from a straight line. It is possible the fits could be improved by adding parameters to the model to mimic the sharp drops in purity. However, we will see in section 6.5 that this detailed modelling is unnecessary for decontaminating the clustering measurements in the scenarios considered here. It is also unclear whether the data are really constraining enough to warrant additional model parameters.
In general, we see that the LAE purity is better constrained than the [O ii] purity. A possible cause of this is our choice of measuring the cross-correlation function using LAE redshifts for both samples. This mapping between observed wavelength and redshift means that the LAE contamination in the [O ii] sample has a correlation function that is fixed with redshift, which allows us to remove it from the integral in equation (27). Since the LAE clustering term is independent of redshift, the sensitivity to the redshift dependence of purity in equation (27) then enters only through f [O ii] (z). Note this approach is not guaranteed to give a strong constraint on the [O ii] purity, as the LAE sample has an intrinsically weaker correlation function, which may become weaker still when projected to lower redshifts. Since we are mainly interested in the contamination of the LAE correlation function, and f (z) is the only purity term appearing in equation (17), we leave further study of this to future work.

Using the fitted contamination
In section 5 we assumed perfect knowledge of the purity as a function of redshift. Here we use the purity fitted from the cross-correlation function, to see if it can give unbiased, decontaminated measurements of the auto-correlation functions. We decontaminate the measurements from the two fields separately using the estimated parameters from the combined field, and then combine them after the decontamination (following, e.g., White et al. 2011). To test whether combining before or after decontamination makes a difference in our mocks, we tried combining the two fields before decontaminating for one of the scenarios, specifically the high contamination sample with the simple decontamination method. We found results that agree closely, confirming that the order of combining and decontaminating is unimportant for our mocks. In the future, further tests could establish whether this also holds for the real HETDEX data.
In Figure 9 we show the results of using the best-fitting contamination parameters. We can see that these parameters return much smaller residual cross-correlation signals than the true parameters for the simple approach that ignores redshift effects. This is because the fit is able to absorb the extra cross-correlation signal with artificially low purity values. However, the LAE auto-correlation function shows the danger in this: the incorrectly inferred purity values result in even more biased measurements than when using the true values. The monopole from the high-contamination sample can have a ∼ 2σ bias even at larger scales (s > 20 h −1 Mpc), and the bias gets worse still at small scales. The low-contamination sample monopole shows a ∼ 1σ bias at large scales, increasing to ∼ 2σ at small scales.
In contrast, our new lightcone method of accounting for the redshift dependence of contamination works well for both the auto- and cross-correlation functions, yielding auto-correlation measurements with only a ∼ 0.3σ bias at large scales (s > 20 h −1 Mpc) and biases smaller than the statistical error down to twice the cell size of the simulation box. The results from the fitted contamination are slightly worse than those from using the true purity versus redshift, but this is to be expected given the limitations of the model parameterisation and the added uncertainties from the fit. These results show this method could potentially be used both to fit for contamination and to correct clustering measurements.
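The lightcone correction can be sketched as below, under the paper's stated assumptions that the true LAE clustering does not evolve over the bin and that there is no true LAE-[O ii] cross-correlation, so the observed signal is a purity-weighted lightcone average, xi_obs(s) = A·xi_LAE(s) + C(s). The function and array names, and the toy power-law inputs, are illustrative, not the HETDEX pipeline:

```python
import numpy as np

def trapz0(y, x):
    """Trapezoidal integration along the first axis (NumPy-version stable)."""
    y = np.asarray(y)
    dx = np.diff(x)
    if y.ndim == 2:
        dx = dx[:, None]
    return 0.5 * ((y[1:] + y[:-1]) * dx).sum(axis=0)

def lightcone_decontaminate(xi_obs, z, n_z, f_z, xi_oii_proj):
    """Recover the (assumed non-evolving) LAE auto-correlation from a
    contaminated measurement, given the purity f(z), the number density n(z)
    and the projected [O ii] clustering at each redshift. With no true
    cross-correlation, xi_obs(s) = A*xi_lae(s) + C(s), where A and C(s) are
    n(z)^2-weighted lightcone averages."""
    w = n_z**2
    norm = trapz0(w, z)
    a = trapz0(w * f_z**2, z) / norm
    c = trapz0(w[:, None] * (1 - f_z[:, None])**2 * xi_oii_proj, z) / norm
    return (xi_obs - c) / a

# Round-trip check on synthetic inputs: contaminate, then decontaminate
z = np.linspace(2.22, 3.5, 200)
n_z = np.exp(-(z - 2.8)**2 / 0.1)                    # toy selection function
f_z = 0.99 - 0.05 * (z - z[0]) / (z[-1] - z[0])      # purity falling with z
s = np.linspace(15.0, 100.0, 30)
xi_lae = (s / 5.0)**-1.8                             # toy power-law clustering
xi_oii = (s / 8.0)**-1.8 * (1 + 0.3 * (z[:, None] - 2.8))
w = n_z**2
a_true = trapz0(w * f_z**2, z) / trapz0(w, z)
c_true = trapz0(w[:, None] * (1 - f_z[:, None])**2 * xi_oii, z) / trapz0(w, z)
xi_obs = a_true * xi_lae + c_true
xi_rec = lightcone_decontaminate(xi_obs, z, n_z, f_z, xi_oii)
print(np.allclose(xi_rec, xi_lae))  # → True
```

The simple decontamination method corresponds to replacing the lightcone averages A and C(s) with single values f² and (1 − f)²·xi_oii(s) for the whole bin, which is what induces the biases discussed above when f(z) and the projected clustering evolve.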
It is important to note that in Figure 9 we have the benefit of a best-fitting purity measured from our large number of mock catalogues. In the real data, there will be some statistical error associated with the best-fit purity parameters that would have to be propagated into the final results.
It should also be noted that an alternative to deriving f (z) as presented here is to use the simple decontamination method in narrow redshift bins, within which the contamination and projection parameters are roughly constant. This could have benefits, such as avoiding the potential loss of information from integrating over redshift, and not needing a model of the LAE or [O ii] correlation function. On the other hand, using narrow redshift bins would make the individual measurements noisier. We leave a comparative study of these approaches for later work. We also highlight that, regardless of how f (z) is derived, the lightcone formalism allows one to optimize the size of the redshift bins for the cosmology measurements without being restricted by the requirement of avoiding projected interloper clustering evolution.

CONCLUSIONS
We generated 1000 mock catalogues of the HETDEX survey, which include clustering, redshift-space distortions, redshift-dependent noise and a realistic selection function. We used a reformulated version of the probabilistic classifier of Leung et al. (2017) to generate catalogues of LAEs with realistic, redshift-dependent [O ii] galaxy contamination, and considered two scenarios which bracket the expectations for HETDEX: low contamination (1-2 per cent) and high contamination (5-7 per cent). These catalogues were used to explore the impact of the redshift dependence of the contamination fraction, and of the correlation function of the contaminants, on the observed correlation functions.
The mock catalogues show that existing methods of decontamination such as Awan & Gawiser (2020), and other methods which do not account for redshift evolution of the interlopers within a redshift bin, should not be directly applied to clustering measurements from a survey such as HETDEX, unless the analysis is restricted to redshift bins narrow enough for the evolution effects to be unimportant, which is unlikely to be ideal for the HETDEX cosmological analysis. This is because in HETDEX both the [O ii] galaxy contamination fraction and the projected [O ii] clustering vary with redshift. Although such methods may be effective enough in the low contamination cases we consider, biases appear when the contamination is larger (misclassified fractions of 5-7 per cent). In the auto-correlation function these biases can be larger than the statistical error of HETDEX. Moreover, the biases in the cross-correlation signal are even stronger: a spurious signal of more than 5σ relative to the expected statistical noise remains after subtracting the expected cross-correlation from contamination. A biased, decontaminated cross-correlation function is not a problem for HETDEX in itself. However, we have also shown that the inability of the simple decontamination method to correctly account for the spurious cross-correlation signal results in biases in the contamination fractions inferred from fits to the observed, raw cross-correlation functions. If these incorrect contamination fractions are used to decontaminate the auto-correlation function, additional biases will be propagated into the measurements used to fit cosmological parameters.
We present a method to account for these redshift effects that can be applied when there is no true cross-correlation between the pure samples of the target galaxies and the contaminants. Our method combines the literature approach to decontamination with the models of correlation functions integrated along a lightcone given in Yamamoto & Suto (1999) and Suto et al. (2000). This methodology is needed in scenarios like HETDEX, where the correlation function of the [O ii] interlopers evolves rapidly due to projection effects. Accounting for the lightcone effects gives a much better model of the cross-correlation, and it also produces decontaminated auto-correlation functions that agree with the pure measurements to an accuracy much smaller than the statistical noise down to 20 h −1 Mpc. Although we formulate this work for the correlation function, our findings should also apply to the power spectrum.
The work on this topic is not complete. The method we have developed is an improvement over existing methods, but we still have to assume that the true clustering of LAEs and [O ii] galaxies does not evolve with redshift. Allowing for evolving LAE and [O ii] correlation functions is, however, possible in the framework we present, and such an evolution could be constrained with auto-correlation function measurements from the data. We also note once more that we always compute the projected clustering using distortion parameters that assume the true cosmology is known. As advocated by Addison et al. (2019), future work could consider the impact of this limitation.
Although our decontamination method itself relies only on interpolating over the observed correlation functions and the assumption of a fiducial cosmology for computing the distortion parameters, our method of fitting the contamination assumes perfect knowledge of the LAE and [O ii] auto-correlation functions. One way to make our experiments with fitting the contamination applicable to real data would be to develop an approach that simultaneously fits models of the cosmology, the bias and the contamination parameters to both the cross- and auto-correlation functions. The distortion parameters could also be modified to be consistent with the different cosmologies during the fit, solving a further issue.
Simultaneously fitting contamination, galaxy bias, and cosmology is advocated in e.g., Addison et al. (2019) and Grasshorn Gebhardt et al. (2019). While their approaches assumed single contamination values for the whole sample, we would suggest including our new redshift dependent modelling of contamination. This is especially true if the contamination fraction of HETDEX is closer to 5 per cent than 1 per cent, and if there are significant differences in the sample purity at different redshifts. Before applying the method to real data, a modelling pipeline which includes redshift-dependent contamination should be tested on more realistic simulations than our log-normal mocks. These simulations should include possible evolution in the true galaxy correlation function and a better modelling of the clustering and redshift space distortions on non-linear scales.
To summarize, this paper highlights the importance of the redshift dependence of the contamination, presents a method to model these effects, and shows that such effects should be considered when decontaminating clustering measurements from surveys with redshift dependent contamination within the adopted redshift bins. These effects are particularly important when using the cross-correlation function to constrain contamination.
age resources that have contributed to the research results reported within this paper. URL: http://www.tacc.utexas.edu This research made use of NASA's Astrophysics Data System Bibliographic Services.

DATA AVAILABILITY
The HETDEX data is currently proprietary, but public releases are planned for the future. The authors will respond to reasonable requests for access to the simulation data used in this paper, so long as no unreleased proprietary data is involved.

APPENDIX A: TESTS OF THE PROJECTION
In this section we further test our modelling of the lightcone projection and our use of the distortion parameters c⊥ and c∥. To do this we measure the projected clustering of pure samples of [O ii] emitters in 1000 of our Fall field mock catalogues; that is, we measure the multipoles of a pure [O ii] catalogue assuming Ly α redshifts for everything. To assess how well equation (21) models the redshift-dependent distortion of the [O ii] density field we compute a prediction of the projected correlation function for the redshift interval 2.22 < z < 2.5 via
\[
\bar{\xi}^{\rm proj}(s,\mu) = \frac{\int_{2.22}^{2.5} \left[ n^{\rm pure}_{[\rm O\,II]}(z) \right]^{2} \xi^{\rm proj}_{[\rm O\,II]}(s,\mu,z)\,{\rm d}z}{\int_{2.22}^{2.5} \left[ n^{\rm pure}_{[\rm O\,II]}(z) \right]^{2} {\rm d}z},
\]
where n pure [O ii] (z) is the number density, as a function of LAE redshift, of the projected, pure [O ii] sample. We then compare the multipoles of the result to the multipoles measured from the mock catalogues, ξ mock (s).
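The distortion parameters for the HETDEX confusion scenario can be sketched as follows, under the standard definitions used in the interloper literature (e.g., Visbal & Loeb 2010; Pullen et al. 2016): the transverse parameter is a ratio of comoving distances, and the line-of-sight parameter follows from the two interpretations of the same observed wavelength interval. The cosmology here is illustrative, not the paper's fiducial choice, and sign/direction conventions for c∥, c⊥ vary between papers:

```python
import numpy as np

OM, H0, C_KMS = 0.31, 67.6, 299792.458     # illustrative flat LCDM cosmology
LAM_LYA, LAM_OII = 1215.67, 3727.0         # rest-frame wavelengths [Angstrom]

def hubble(z):
    """H(z) in km/s/Mpc for flat LCDM."""
    return H0 * np.sqrt(OM * (1 + z)**3 + 1 - OM)

def comoving_distance(z, n=4096):
    """Comoving distance in Mpc by trapezoidal integration of c/H(z)."""
    zz = np.linspace(0.0, z, n)
    y = C_KMS / hubble(zz)
    return float(0.5 * np.sum((y[1:] + y[:-1]) * np.diff(zz)))

def distortion_parameters(z_lae):
    """c_perp, c_par for an [O ii] emitter misassigned the LAE redshift z_lae
    corresponding to the same observed wavelength."""
    z_oii = (1 + z_lae) * LAM_LYA / LAM_OII - 1
    c_perp = comoving_distance(z_lae) / comoving_distance(z_oii)
    c_par = hubble(z_oii) * (1 + z_lae) / (hubble(z_lae) * (1 + z_oii))
    return c_perp, c_par

c_perp, c_par = distortion_parameters(2.5)
# With these definitions apparent transverse separations are strongly
# stretched (c_perp > 1), while apparent line-of-sight separations are
# mildly compressed (c_par < 1)
print(c_perp > 1.0, c_par < 1.0)  # → True True
```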
The difference between the mock and predicted multipoles of the projected [O ii] galaxy correlation function is given as dashed lines in Figure A1; the measurements are divided by the statistical error expected for one HETDEX Fall-field realisation. We see only very small (∼ 0.2σ) differences relative to the statistical errors down to s = 20 h −1 Mpc, which gives further confirmation of our use of the Yamamoto & Suto (1999) and Suto et al. (2000) approach to account for the redshift dependence of the projection parameters. On smaller scales the differences increase, becoming approximately the same size as the statistical error around s = 10 h −1 Mpc. It is unclear why the modelling of the projection does not work perfectly down to the smallest scales, but for the purposes of decontaminating the LAE signal the projected [O ii] clustering is down-weighted by the square of the LAE sample contamination fraction, so this small bias should not affect our results on LAE clustering decontamination. In section 6, where we fit the cross-correlation, we restrict ourselves to larger separations, which will further mitigate any possible effect.
In this paper we use a model of ξ proj [O ii] (s, µ, z) that relies on distortion parameters, i.e., equation (13). These distortion parameters are, however, an approximate model of the true projection, as they do not account for the different redshifts of the two galaxies in a pair. We therefore also compute a mapping between the s and µ coordinates using a brute-force method. In 800 uniform bins of LAE redshift between 2.06 < z < 3.5 (i.e., the range where a Ly α emitter has an observed wavelength greater than 3727 Å) we generate pairs of galaxies in Cartesian coordinates with given values for the true separation s, µ and compute an observed right ascension and declination for a virtual observer. The pairs we generate always lie in a plane and use a fixed line-of-sight. We then recompute the Cartesian coordinates from our 'observed' coordinates but assume [O ii] redshifts, and recompute the line-of-sight to the galaxy pair. By measuring the r, µ of this projected pair we generate a look-up table between true and projected coordinates in 400 bins of r and µ. We interpolate over this 3D look-up table to provide an alternative coordinate mapping for our model of the projected [O ii] clustering.
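The geometry of one step of this brute-force mapping can be sketched as below: a pair with a given (s, µ) at LAE redshifts is 'observed', reassigned the [O ii] redshifts implied by the same observed wavelengths, and re-measured. The cosmology, the 2D (in-plane) geometry, and all function names are illustrative assumptions, not the paper's actual implementation (which tabulates 800 redshift bins and 400 bins in r and µ):

```python
import numpy as np

OM, H0, C_KMS = 0.31, 67.6, 299792.458     # illustrative flat LCDM cosmology
LAM_LYA, LAM_OII = 1215.67, 3727.0         # rest-frame wavelengths [Angstrom]

# Distance-redshift look-up tables built by trapezoidal integration of c/H
z_grid = np.linspace(0.0, 4.0, 8001)
hz = H0 * np.sqrt(OM * (1 + z_grid)**3 + 1 - OM)
y = C_KMS / hz
chi_grid = np.concatenate([[0.0],
                           np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(z_grid))])

def chi_of_z(z):
    return np.interp(z, z_grid, chi_grid)

def z_of_chi(chi):
    return np.interp(chi, chi_grid, z_grid)

def project_pair(s, mu, z_lae_mid):
    """Map a pair with separation (s, mu) at LAE redshift z_lae_mid to its
    (s, mu) when both galaxies are reassigned the [O ii] redshifts implied
    by the same observed wavelengths. The pair lies in a plane, with the
    line of sight taken through the pair midpoint."""
    chi_mid = chi_of_z(z_lae_mid)
    dpar, dperp = 0.5 * s * mu, 0.5 * s * np.sqrt(1.0 - mu**2)
    pair = np.array([[chi_mid - dpar, -dperp], [chi_mid + dpar, dperp]])
    # 'Observe' each galaxy: sky angle and LAE redshift from its distance
    theta = np.arctan2(pair[:, 1], pair[:, 0])
    z_lae = z_of_chi(np.linalg.norm(pair, axis=1))
    # Reinterpret the same observed wavelengths as [O ii] emission
    z_oii = (1 + z_lae) * LAM_LYA / LAM_OII - 1
    chi_new = chi_of_z(z_oii)
    proj = chi_new[:, None] * np.column_stack([np.cos(theta), np.sin(theta)])
    dvec = proj[1] - proj[0]
    s_proj = np.linalg.norm(dvec)
    los = proj.mean(axis=0)                 # recomputed line of sight
    mu_proj = abs(dvec @ los) / (s_proj * np.linalg.norm(los))
    return s_proj, mu_proj

s_p, mu_p = project_pair(20.0, 1.0, 2.5)
print(round(mu_p, 6))  # → 1.0 (a purely radial pair stays radial)
```

Tabulating `project_pair` over grids of s, µ and redshift would give a look-up table analogous to the one described in the text.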
The dotted lines in Figure A1 show the difference of the mock multipoles and the projected multipoles using the brute-force look-up coordinates. There is extremely close agreement between the predictions using the distortion parameters and the brute-force look-up table. This confirms that the distortion parameters c∥, c⊥, which are advocated by several other authors (e.g., Visbal & Loeb 2010; Gong et al. 2014; Lidz & Taylor 2016; Pullen et al. 2016; Leung et al. 2017; Grasshorn Gebhardt et al. 2019), are an excellent model of the true distortion for the HETDEX LAE/[O ii] confusion scenario.

Figure A1. The difference between the mean multipoles measured from 1000 Fall field mock catalogues of [O ii] emitters, analysed using redshifts that assume the [O ii] emitters are actually LAEs, and our prediction of the projected [O ii] galaxy clustering. Dashed lines use our standard approach with distortion parameters and an integral along the redshift range. The dotted lines replace the distortion parameters with the brute-force approach detailed in the text. The two approaches give nearly indistinguishable results. The plot is split into two panels to allow a larger dynamic range; in the right panel only every fourth data point is marked with a symbol for visual clarity.

This paper has been typeset from a TeX/LaTeX file prepared by the author.