The informativeness of [C II] line-intensity mapping as a probe of the H I content and metallicity of galaxies at the end of reionization

Line-intensity mapping (LIM) experiments coming online now will survey fluctuations in aggregate emission in the [C II] ionized carbon line from galaxies at the end of reionization. Experimental progress must be matched by theoretical reassessments of approaches to modelling and the information content of the signal. We present a new model for the halo-[C II] connection, building upon results from the FIRE simulations suggesting that gas mass and metallicity most directly determine [C II] luminosity. Applying our new model to an ensemble of peak-patch halo lightcones, we generate new predictions for the [C II] LIM signal at $z\gtrsim6$. We expect a baseline 4000-hour LIM survey from the CCAT facility to have the fundamental sensitivity to detect the [C II] power spectrum at a significance of $5\sigma$ at $z\sim6$, with an extended or successor Stage 2 experiment improving significance to $48\sigma$ at $z\sim6$ and achieving $11\sigma$ at $z\sim7.5$. Cross-correlation through stacking, simulated against a mock narrow-band Lyman-break galaxy survey, would yield a strong detection of the radial profile of cosmological [C II] emission surrounding star-forming galaxies. We also analyse the role of a few of our model's parameters through the pointwise relative entropy (PRE) of the distribution of [C II] intensities. While the PRE signature of different model parameters can become degenerate or diminished after factoring in observational distortions, various parameters do imprint themselves differently on the one-point statistics of the intrinsic signal. Further work can pave the way to access this information and distinguish different sources of non-Gaussianity in the [C II] LIM observation.


INTRODUCTION
As the studies of galaxy formation and cosmology advance, galaxies at high redshift and in particular at the end of the epoch of reionization (EoR) are as important as ever.From the perspective of cosmology, understanding how the Universe exits the EoR relies on understanding the physical conditions, abundances, and cosmic structure of the first stars and galaxies that carry out the reionization process.From the perspective of galaxy formation, to understand the nature of these first galaxies and proto-galaxies is to understand the starting point of cosmic star-formation history, with star-formation activity continuing to rise all the way through the 'cosmic noon' of  ∼ 2-3 (see, e.g., the review of Madau & Dickinson 2014).The EoR thus sits at a key intersection of cosmology and extragalactic astrophysics, between structure formation and galaxy formation.
★ E-mail: phorlaville24@ubishops.ca Studying the population of galaxies from the EoR is fundamental to understanding the Universe both as we know it today and as it was almost fourteen billion years ago.
Although targeted or pencil-beam surveys are already detecting individual objects from the EoR, conventional spectroscopic galaxy surveys will struggle to probe EoR galaxies at the efficiency required for this population to act as a true large-scale cosmological probe.Thus, a full understanding of the EoR demands alternative ways to measure the clustering of star formation and gas at high redshift.
Intensity mapping (IM), or line intensity mapping (LIM), is a technique analysing low-to-medium resolution spatial-spectral observations to probe a three-dimensional cosmological volume for the signature of large-scale structure as traced by spectral lines associated with specific atomic or molecular species.The strength of such analyses resides in their potential to probe large cosmological volume efficiently while measuring the contribution of all galaxies, including those impractical for standard galaxy surveys to detect Visualization of correlated LIM observables derived from applying our state-of-the-art understanding of the galaxy-halo connection to one of the cosmological simulations used in this work, and convolved with realistic instrumental responses.The [C ii] signal (top panels), tracing diffuse neutral gas in the ISM, is simulated using the model put forward in this paper, and convolved with a Gaussian angular beam of width ≈ 50 ′′ and a frequency-space profile of width ∼ 3 GHz, corresponding to the near-future capabilities of the CCAT facility.The CO signal (middle panels), tracing molecular gas, is simulated using a model from prior literature (Li et al. 2016) and convolved with an angular beam of width ∼ 4 ′ , corresponding to a 10 metre dish as used by the COMAP Pathfinder (Cleary et al. 2022).The 21 cm signal as shown neglects contributions outside of halos -although the intergalactic medium should not dominate the signal, given the high ionized fraction expected at  ∼ 6 -and is convolved with a beam of width ∼ 10 ′ , corresponding to an interferometer with ∼ 500 m baselines.
While atomic hydrogen will be most abundant particularly at early cosmic epochs, the past decade has seen studies of LIM concepts leveraging other spectral lines associated with dust-obscured star-formation activity to complement IM surveys of the 21 cm atomic hydrogen line (see Figure 1).This includes the rotational transitions of carbon monoxide (CO) observable at centimetre wavelengths, one of the most abundant luminous tracers of molecular gas and thus the subject of significant work in LIM modelling (Righi et al. 2008;Visbal & Loeb 2010;Lidz et al. 2011;Pullen et al. 2013;Li et al. 2016;Sun et al. 2019;Moradinezhad Dizgah & Keating 2019;Yang et al. 2021Yang et al. , 2022) ) and experiment (Keating et al. 2016(Keating et al. , 2020;;Keenan et al. 2022;Cleary et al. 2022).
At shorter wavelengths, the 157.7 micron [C ii] line from ionized carbon (with rest-frame frequency  rest = 1900.5GHz) becomes one of the most promising tracers of high-redshift galaxies.
In our local volume, [C ii] is a key far-infrared (FIR) cooling line that makes up as much as 1% of the bolometric FIR luminosity in local gas-rich and highly luminous galaxies (Stacey et al. 1991).Given mounting observations with the Atacama Large Millimetre/submillimetre Array (ALMA) of individual [C ii] lines from  ∼ 4-8 (see, e.g.: Pentericci et al. 2016;Knudsen et al. 2016;Smit et al. 2018;Yan et al. 2020;Fujimoto et al. 2021), the existence of an appreciable population of [C ii] emitters in the late EoR appears certain.The brightness of the cosmological [C ii] emission at this relatively metal-poor epoch remains uncertain, however, and has been the subject of a substantive body of prior modelling work (e.g.: Gong et al. 2012;Uzgil et al. 2014;Silva et al. 2015;Yue et al. 2015;Serra et al. 2016;Dumitru et al. 2019;Chung et al. 2020;Karoumpis et al. 2022).
At time of writing, a range of EoR [C ii] pathfinders have begun construction or even operation, including CONCERTO (Monfardini et al. 2022;Fasano et al. 2022), the Tomographic Ionizedcarbon Mapping Experiment (TIME; Crites et al. 2014;Sun et al. 2021;Crites 2022), and the Epoch of Reionization Spectrometer (EoR-Spec) instrument on the Cerro Chajnantor Atacama Telescope (CCAT) facility (CCAT-Prime Collaboration et al. 2023;Nikola et al. 2022).With all these projects in active development and/or analysis phases, and early science results anticipated within this decade, this work aims to push the modelling of the [C ii] LIM signal forward in two important ways: first by reassessing the nature of the underlying halo-[C ii] connection, and second by assessing the information content of the one-point statistics of the [C ii] intensity fluctuations to be observed.
The model of this work distinguishes itself from many previous models by mediating the halo-[C ii] connection not solely through star-formation rate (SFR), but through the neutral atomic hydrogen (H i) mass and gas-phase metallicity in the ISM of the galaxies in dark matter halos, based on insights from the Feedback In Realistic Environments (FIRE) project examined by Liang et al. (2023).In formulating this halo model, we reproduce certain trends like the evolution of [C ii] luminosity against FIR luminosity, and arrive at a model consistent with current observational constraints on highredshift [C ii] abundances.
Applying our halo model to an ensemble of approximate cosmological simulations, we aim to answer the following questions: • What is the detectability of [C ii] emission in near-and farfuture mm-wave LIM surveys, in either auto-or cross-correlation?
• How do different model parameters affect the voxel intensity distribution of the line-intensity fluctuations, as represented by the differential relative entropy?
• How do observational effects like the instrument response and thermal noise affect the observability of these distinct signatures in relative entropy?
The paper is structured as follows.After a summary of context around the halo-[C ii] connection and one-point statistics in Section 2, we present our new [C ii] model and simulation methods in Section 3. We then show our results and forecasts in Section 4, and conclude with forward-looking discussion in Section 5.

A FIRE-tested formula: moving away from [C ii] as a tracer of star formation
The SFR is a highly attractive lever arm for [C ii] modelling at high redshift.With [C ii] being a dominant FIR cooling line mostly unaffected by dust extinction, [C ii] emission is a natural mechanism to balance heating from newly forming stars.Equally, [C ii] emission is attractive as a proxy for SFR at high redshifts, where a signpost for obscured star-formation activity is needed for a full understanding of cosmic star-formation history.However, previous work has already cast significant doubt that the connection between [C ii] luminosity and SFR is reliably universal (Herrera-Camus et al. 2015;Vallini et al. 2015;Croxall et al. 2017), as other factors like metallicity appear to modulate the observed relation.A full review of such work is well beyond the scope of this paper, and we refer the reader to Liang et al. (2023) for further details.The upshot, however, is that such doubts motivate a re-examination of the nature of the halo-[C ii] connection based on a physically motivated picture of [C ii] emission.Such work is particularly important if we are to leverage synergies between mmwave LIM and 21 cm tomography at high redshifts (Padmanabhan 2023).
To this end, we make use of insights from the work of Liang et al. (2023), who conducted a comprehensive analysis of the relationship between the [C ii] luminosity and star formation rate (SFR) of simulated star-forming galaxies at  ∼ 0-8 generated by the Feedback In Realistic Environments (FIRE) project.The mock galaxy sample includes 507 FIREBox galaxies (including 91 at  ≥ 6), which reside not in zoom-in boxes but in a (15ℎ −1 cMpc) 3 cosmological box with prescriptions for hydrodynamics and feedback (Feldmann et al. 2023), as well as 10 MassiveFIRE zoom-in simulations (6 of which have snapshots at  ≥ 6; see references in Liang et al. 2023).Through post-processing of these simulations to derive the interstellar radiation field and the ionization and excitation states of gas clouds, Liang et al. (2023) derived [C ii] luminosities for the FIRE galaxies studied, as well as other properties like the SFR and the bolometric IR luminosity  IR .
Not only do the FIRE simulations reproduce the empirically observed overall correlations between the [C ii] luminosity, SFR, and  IR , but Liang et al. (2023) also reproduce the empirically observed evolution of the [C ii]-SFR or [C ii]- IR relation with both redshift and  IR .Of interest is the finding that dense molecular regions contribute negligibly to [C ii] emission in the FIRE simulations, which would imply that the link between [C ii] luminosity and SFR is truly not through directly tracing the cool star-forming molecular gas (as is the case for CO, for instance).Rather, the [C ii] emission primarily coincides with the neutral H i regions between the ionized and molecular gas, where ionized carbon is appreciably present and the gas density is sufficiently high to encourage cooling through [C ii] line emission.
The ultimate result of Liang et al. (2023) was a physical picture of [C ii] emission where four factors modulate the scaling between [C ii] and SFR: the gas depletion timescale, the gas-phase metallicity, the gas density, and the fraction of gas in ionized and neutral regions.This work will leverage this proposed formula to move beyond [C ii] as simply a proxy for SFR, and build a new halo model instead relating [C ii] luminosity to the expected metallicity and neutral hydrogen mass in each halo, using this model to build new forecasts for upcoming [C ii] LIM experiments as will take place on the CCAT facility.

The importance of one-point statistics in summarising a highly non-Gaussian signal
In traditional cosmological contexts like studies of the cosmic microwave background, we are used to the unreasonable effectiveness of Gaussian random fields as a description of cosmological fields.What reflects this is the common use of the power spectrum as the primary summary statistic of observations of density fluctuations, including LIM.However, structure formation is very much a non-linear process resulting in non-Gaussian structures.In LIM contexts, Breysse et al. (2017) first explicitly proposed the use of the probability distribution function of observed voxel intensities -i.e., the voxel intensity distribution (VID) -as a way to capture the highly non-Gaussian information present in line-intensity fluctuations, without relying on higher-order statistics whose estimation and covariance pose significant challenges to detection and interpretation.Work continues on understanding the VID and extending one-point statistics in LIM contexts, on both theoretical and numerical fronts (Breysse et al. 2019;Ihle et al. 2019;Sato-Polito & Bernal 2022;Breysse et al. 2023;Chung et al. 2023).
It is however the logarithm of the VID, rather than the VID itself, that is most closely related to the information content of the LIM observation.Lee et al. (2023) put forward a relevant insight in this context, using one-point statistics to quantify the non-Gaussian information injected into the cosmic infrared background (CIB) by gravitational lensing.In particular, that work put forward quantities related to the relative entropy -or equivalently to the Kullback-Leibler divergence (KLD; Kullback & Leibler 1951) -between the intensity probability distributions of the lensed and unlensed CIB.
It is indeed natural to consider the information content in the VID -i.e., the probability density function (PDF) () -as the relative entropy from a model PDF () to the actual ():

𝑃(𝐼) 𝑄(𝐼)
. (1) However, Lee et al. (2023) considered it equally interesting to examine the integrand, the pointwise relative entropy (PRE) per intensity bin (dubbed 'weighted relative entropy' in that work).We will consider 'per intensity bin' (or 'per log-intensity bin' in some cases) to be implicit and often refer simply to (We will often switch between using  rel as shorthand for either  rel / as defined above or  rel / (log ); the usage will be clear based on, e.g., the binning of the discrete histogram approximating the underlying continuous ().)Note that many works (e.g., MacKay 2003) define 'relative entropy' as the negative of our definition in Equation 1, as that is equal to the KLD.However, our definition naturally leads to the PRE from  to  being the Shannon information for  minus that for the 'starting' , weighted by .
Using the PRE -the log-space difference of one distribution against another, weighted by the true distribution -provides a fuller description of the effect of a given variable on the intensity PDF.The integrated quantity of relative entropy will quantify the net statistical distance by which that variable pushes the PDF away from an alternative expectation, but the pointwise relative entropy can provide full templates of how the difference arises at different ranges of intensity.Thus distinguishing different modes of distributional distortion would disentangle a range of astrophysical and cosmological signatures that can imprint non-Gaussianities in the observable Universe.
Towards the conclusion of this work, rather than explicitly forecasting the detectability of these signatures, we will use the PRE to consider the information content in the VID related to a number of astrophysical model parameters.This in turn will guide future work that will more explicitly probe the detectability of the relevant signatures as well as signatures of primordial non-Gaussianity in the clustering and abundances of [C ii] emitters.

Line-luminosity model: a novel halo-[C ii] connection
As we reviewed in Section 2, through the analysis of the relationship between SFR and [C ii] emission in ∼ 500 simulated galaxies, Liang et al. (2023) derived a linear scaling relationship for the ratio between SFR and the global [C ii] luminosity  [C ii] of a galaxy: where  [C ii] is the fraction of the total galactic gas mass contained in H i and H ii regions,  dep ≡  gas /SFR is the galaxy's gas depletion time scale (equal to the total gas mass divided by the SFR), and  gas and  gas are the average gas-phase metallicity and gas particle number density, respectively. 1By the end of this section, we will be able to transform this relation into a model of [C ii] luminosity as a function of halo mass and redshift.We first establish H i mass as a primary intermediary property in Section 3.1.1,before adding dependence on metallicity in Section 3.1.2and a stochastic step in Section 3.1.3.

𝐿 [C ii] as a function of H i mass
By multiplying both sides of Equation 3 by the SFR, we recover a proportionality relationship between  [C ii] and  gas : For now, we neglect the dependence on metallicity and gas density, and consider the proportionality between  [C ii] and  [C ii]  gas .The latter quantity effectively ends up being the gas mass contained in H i and H ii regions.The original motivation for this being the relevant quantity in the work of Liang et al. (2023) is that it is entirely acceptable to ignore the contribution of molecular (H 2 ) regions to [C ii] emission, as most of the carbon there is in a neutral rather than ionized state.In particular, however, simulations show that it is chiefly the H i regions that dominate the determination of the [C ii] luminosity.This is because, as Liang et al. (2023) discuss, even the [C ii] emission from H ii regions -which often contribute more to the total [C ii] luminosity than H i regions -technically originates from the interface of the H ii gas with the H i-rich regions.
Based on the above considerations, we approximate the H i gas as the main driver of  [C ii]  gas , such that the [C ii] luminosity scales linearly with the H i gas mass  HI , with some proportionality coefficient  [C ii] : 1 Both average quantities are [C ii] brightness-weighted averages.Specifically, the 'average' gas density corresponds to the value at the location of the median [C ii] luminosity within the galaxy, and the 'average' metallicity is the same but is very similar to the mass-weighted gas metallicity.Villaescusa-Navarro et al. (2018) examine halos and galaxies in cosmological simulations to provide a relationship between the total halo mass  h and  HI for  h ∼ 10 9 -10 13  ⊙ , featuring an exponential cutoff at lower halo masses and a power law at higher halo masses: where  min is the cutoff mass (essentially in units of  ⊙ ),  0,HI is the overall normalization and  HI is the power law index at high halo mass.Villaescusa-Navarro et al. (2018) determine values for these parameters at  ∈ {0, 1, 2, 3, 4, 5}.While we are looking into halos of  ∼ 6-8, the relation evolves little at the highest redshifts where Villaescusa-Navarro et al. ( 2018) perform their fitting.Based on the  = 5 parameter values, we use  HI = 0.74, and scale the other parameters down to 70% of their values at  = 5, so that  0,HI = 1.9 × 10 9  ⊙ and  min = 2.0 × 10 10 .This downscaling is an approximate extrapolation of the redshift evolution of these parameters at  ≳ 4.
Recall that our goal is to express the [C ii] luminosity of a halo as a function of its mass  h .Since we have  HI ( h ), the missing ingredient is the coefficient  [C ii] in Equation 5. We calibrate this proportionality at a SFR of SFR * ∼ 1 M ⊙ yr −1 , not uncommon for FIREBox simulated galaxies at  ∼ 6.
Throughout this work we obtain averages of certain properties as a function of halo mass and redshift based on the model of Behroozi et al. (2013a,b), which connects halo mass to stellar mass and SFR through empirical models matching outputs from cosmological simulations to observational constraints.Here we interpolate across outputs from this model to find the halo mass  * h corresponding to SFR * at  ∼ 6, and subsequently the corresponding H i mass  * HI ≡  HI ( * h ) based on Equation 6.We find  * HI ∼ 2 × 10 9  ⊙ .Per Liang et al. (2023), meaning that we should have  [C ii] ( * h ) ∼ 10 7  ⊙ .This sets the value of  [C ii] = 0.005, meaning that when considering only the proportionality of [C ii] luminosity with H i mass, the relation between halo mass and [C ii] luminosity is with  HI ( h ) given by Equation 6.

𝐿 [C ii] as a function of metallicity
The gas-phase metallicity  determines the amount of carbon in neutral and ionized gas available to emit in [C ii] in the first place, and thus is fundamental to our model.The proportionality we have derived above effectively assumes a constant global gas-phase metallicity across all halos, but this is far from realistic as we expect significant evolution of metallicity on average with both halo mass and redshift.We would in fact like to calculate the metallicity  expected in each halo and use it as an additional variable that modulates the halo-[C ii] connection, so that It is common to relate the metallicity of galaxies to their stellar masses  * and SFR, to the point where this relation is considered 'fundamental' (at least out to some moderately high redshift) and dubbed the fundamental metallicity relation (FMR) and studied using a range of galaxy selections and parameterizations (Mannucci et al. 2010;Lara-López et al. 2010;Hunt et al. 2012;Cresci et al. 2019;Curti et al. 2020).We use the model of Behroozi et al. (2013b) again, using their fitting formula for average stellar mass ⟨ * ⟩ ( h , ) as a function of halo mass and redshift, and interpolating their outputs for ⟨SFR⟩ ( h , ).
Note in particular that the stellar mass-halo mass (SMHM) relation parameterization is where  1 is the characteristic halo mass,  is a fitting factor and  () is defined as where  is the power law index at low mass,   is a fitting index and  is a fitting factor.Most of the fitting parameters include some redshift dependence, including with best-fitting values of  0 = −1.412and  1 = 0.731.We refer the reader to Behroozi et al. (2013b) for further details on the other parameters.
To find metallicities, Heintz et al. (2021) use the FMR parameterization of Curti et al. (2020), and we do the same: where  0 (SFR) = 10  0 × SFR  1 , with best-fit parameters Z0 = 8.779,  0 = 10.11, 1 = 0.56,  = 0.31 and  = 2.1.Here, the metallicity Z = 12 + log (O/H) is not in units of solar metallicity.Curti et al. (2020) note that the solar oxygen abundance corresponds to Z = 8.69, meaning that All that remains is to recalibrate the proportionality between [C ii] luminosity and  HI , as we did while building the first version of our model.For the halo mass  * h corresponding to SFR * ∼ 1  ⊙ yr −1 at  ∼ 6, we should expect a metallicity of  * / ⊙ ∼ 0.3 based on the relations outlined above.One can already see that To be more precise we have with  HI ( h ) still given by Equation 6but the expected metallicity value  ( h , ) now given by Equations 13 and 14 combined with ⟨SFR⟩ ( h , ) and ⟨ * ⟩ ( h , ) based on Behroozi et al. (2013b).
Additionally, we assume a minimum halo mass of  h = 10 10  ⊙ required for [C ii] emission, with the implication being that we do not expect lower mass halos to host carbon and/or diffuse gas in sufficient abundance for there to be appreciable cooling through [C ii] emission (cf., e.g., Table 2 of Chung et al. 2020 and discussion in their Section 3.1).

Adding log-normal scatter to metallicity
A halo model cannot capture all of the baryonic physics interior to each halo driving the [C ii] emission, but we can approximate such physics as a stochastic process affecting the [C ii] luminosity.
In this case, we assume that the carbon abundances are most likely to be driven away from the expected value, and randomly scatter the metallicity  ( h , ) by a log-normal distribution with standard deviation   = 0.4 (in units of dex), which in turn will scatter the [C ii] luminosity with the same distribution since  [C ii] ∝ .We choose the fiducial value for   based on previous work by Vizgan et al. (2022), which showed that the log-linear correlation between [C ii] luminosity and H i mass exhibits a log-space root-mean-square scatter of 0.39 dex.
To add scatter to metallicity, we compute a positive scaling factor   randomly drawn from a log-normal distribution with the following PDF: where  =   ln( 10) and  = − 2 /2, with  chosen to preserve the linear mean value of  for a given halo mass and redshift.
Then, from the expected metallicity value  ( h , ) given halo mass and redshift, we may obtain a randomized metallicity simply by multiplying by   ∼ (  ;   ).Therefore, when 'painting' [C ii] luminosities onto dark matter halos in a cosmological simulation, we calculate  [C ii] from the H i mass and the randomized metallicity    found for each halo:

Noise model: fundamental sensitivities of Stage 1 and 2 LIM experiments
Along with the signal, we generate forecasts of observational experiments.To do so, we convolve the pure signal by a Gaussian approximation to the main beam, and add Gaussian noise to the result.This work only aims to examine the fundamental sensitivity that [C ii] LIM will be able to achieve in the near-to medium-term future, and so we omit considerations of either non-Gaussian sky noise (which will be the subject of significant near-future work from the first generation of [C ii] experiments) or interloper emission (rejection of which is already the subject of considerable theoretical efforts -see, e.g., Breysse et al. 2015;Lidz & Taylor 2016;Cheng et al. 2016;Sun et al. 2018;Cheng et al. 2020;Chen & Pullen 2022).
To add Gaussian noise to our signal, we obtain the sensitivity per map voxel.We refer the reader to other works like Appendix B of Chung et al. (2020) and references therein for further details on noise-related quantities, but for mm-wave observations, the convention is to express sensitivities in terms of the noise-equivalent intensity (NEI): where NEFD is the noise equivalent flux density (the noise level per instrumental beam) and Ω beam is the solid angle in the sky covered by the beam.Approximating the beam as having a Gaussian profile with full width at half maximum of  FWHM , the beam solid angle is given by Given the NEI, the actual noise level per voxel is given by where  feeds is the number of spectroscopic feeds, and  vox is the average observing time per volume element (or voxel) of the survey volume.For a spectrometer with simultaneous coverage of its observing band, this would simply be the observing time per sky pixel, but a scanning spectrometer will incur a penalty based on the number of steps required to cover the entire observing band.We will centre our observational forecasts on the CCAT facility, which will deliver the largest field of view out of all currently funded EoR [C ii] experimental projects.Based on the most recent specifications at  ∼ 280 GHz (CCAT-Prime Collaboration et al. 2023), the two EoR-Spec instrument modules proposed for Prime-Cam target an angular resolution of  FWHM = 48 arcseconds, a frequency resolution of   = 2.8 GHz (or /  ≈ 100), and NEFD values of 64, 81, and 120 mJy s 1/2 per beam in first-, second-, and third-quartile weather (respectively) at a source elevation of 45 • .
Like the CCAT-Prime Collaboration et al. ( 2023) work, we assume that the survey spans two fields of 4 deg 2 , and that EoR-Spec requires 42 steps to cover the full observing band.While the original CCAT-Prime Collaboration et al. (2023) forecasts assume 6912 beams and 80% yield (which would imply ≈ 5530 active detectors) and a weighted average NEI across the varied weather conditions, we assume 5040 active detectors on the EoR-Spec focal plane and a NEFD per beam of 72.5 mJy s 1/2 (a simple arithmetic mean of the first-and second-quartile NEFD values).These two differences end up mostly cancelling each other, as we assume slightly fewer active detectors but slightly better NEFD per beam, and we end up with sensitivity figures within 10% of those obtained by CCAT-Prime Collaboration et al. (2023).
We consider two experimental scenarios.
• The second scenario is a future Stage 2 successor LIM experiment, with 4.8 × 10 6 spectrometer-hours per field (versus the 2000 × 120 = 2.4 × 10 5 spectrometer-hours achieved with the baseline CCAT survey), or  vox  feeds = 38 hours.Advancements in onchip spectrometers and readout technologies should relieve some of the efficiency limitations of the current generation of spectrometers and enable 10 6 -10 7 spectrometer-hour surveys to deploy within the next decade (Karkare et al. 2022).The CCAT facility is fundamentally capable of hosting a Stage 2 mm-wave LIM experiment of this calibre, as Prime-Cam occupies only the central ∼ 40% of the total field of view delivered by the telescope, and future CCAT instruments could deploy many more spectrometer modules.We do not necessarily assume these modules will be scanning spectrometers like the EoR-Spec modules, but assume that the same NEFD is fundamentally achievable since the values we assume are already weather-limited.

Generating cosmological line-intensity mock data cubes
Having pulled out our halo model for [C ii] from the FIRE simulation analysis of Liang et al. (2023), and defined experimental parameters for near-future [C ii] surveys, it only remains to simulate observations of the [C ii] signal.We use a suite of approximate cosmological simulations to generate catalogues of halos spanning the survey volume, and then generate 3D cubes of simulated observations by 'painting' luminosities onto these halos.

Cosmology to halos: the peak-patch method
The peak-patch (or mass peak-patch) method is a model for identifying halos in the initial dark matter fluctuation field.Moving beyond the original peaks theory of Bardeen et al. (1986) and the excursion set method of Bond et al. (1991), the peak-patch method is a convolutional method, identifying mass peaks in the initial (Lagrangian) density field after processing through a hierarchy of smoothing filters at different scales, and finding the final properties of peaks as collapsed halos by solving for homogeneous ellipsoidal collapse and using second-order Lagrangian perturbation theory to resolve flow dynamics.First expounded and validated in the work of Bond & Myers (1996a,b,c), the peak-patch method has recently returned to work following re-implementation and re-validation by Stein et al. (2019).The peak-patch method generates large ensembles of independent cosmological realizations with much greater ease than conventional N-body methods, making it suitable for signal forecasts as well as covariance estimation for novel statistics.This is particularly true in the context of LIM (Ihle et al. 2019;Chung et al. 2023), which demand large lightcone runs resolving relatively lowmass halos, since the peak-patch method can intrinsically evolve the initial mass peaks to a continuous range of final redshifts instead of having to stitch together snapshots at discrete time steps.
We use a suite of peak-patch simulations generated to probe the possible use of novel statistics to apply to CO LIM at EoR redshifts and previously detailed in a number of papers, including Chung et al. (2023).These simulations were designed to push to lower masses by a factor of several compared to the peak-patch simulations generated at  ∼ 3 for Ihle et al. (2019), and so have a size of (960 Mpc) 3 and a resolution of 5640 3 cells.In principle, this should correspond to a minimum resolvable halo mass of 6.4 × 10 9  ⊙ ; after adjusting masses to match abundances predicted by the halo mass function (HMF) of Tinker et al. (2008) with empirical high-redshift corrections from Behroozi et al. (2013b), we find our simulations should be complete down to  h ≈ 1.2 × 10 10  ⊙ , well matched to our assumed minimum emitter halo mass of 10 10  ⊙ .(Note that such adjustments are not strictly necessary particularly at lower  than those considered here; Stein et al. (2019) showed that peak-patch simulations agree out of the box with the Tinker et al. ( 2008) HMF within 10% for  h ≳ 10 11  ⊙ and  ≤ 2.) The simulations are lightcone runs with the centre of the box placed 8668 Mpc away from the observer; although the minimum requirement for CO LIM simulations was to cover  = 5.8-7.9, the lightcones actually cover a slightly wider interval of  = 5.6-8.2.The box size then corresponds to a sky area of 6 • × 6 • at the highest redshifts accessed.As we only need to simulate patches of 2 • × 2 • , it is theoretically possible to slice each simulation into nine semi-independent sub-volumes for a total of 2430 lightcones instead of 270.However, the present work does not focus on aspects like covariances of statistics, and 270 lightcones is already a sufficient ensemble to reliably identify average expectations and variances.We instead favour using wholly independent volumes and use the central 4 deg 2 of each lightcone.

Halos to maps: mocking [C ii] clustering
To consider how the [C ii] line-intensity field responds to the presence of our simulated halos, we use modified versions of limlam_mocker2 and lim3 codes that interface with each other to paint the halo-[C ii] 'response function' of Equation 17 onto our peak-patch lightcones, as well as appropriately handle convolution of the beam profile and addition of noise.The code is more generally applicable to other atomic and molecular line species, with the only prerequisite being an appropriate halo model for line emission.
The code uses the peak-patch simulation outputs -which catalogue each halo's mass, Eulerian positions, comoving distances and redshift -as well as a function to relate halo properties to line luminosities (in our case the  [C ii] ( h , ) function as described in Section 3.1) and the observational parameters described in Section 3.2, including the survey extent in angular and frequency coordinates, the NEFD,  FWHM ,  feeds ,   , and the total survey time (from which the code derives  vox ).
For all halos in the simulation that fall within the specified observational volume, we compute the [C ii] luminosity of each halo, but any mass-luminosity function, for any atomic or molecular line, can be prescribed.We then bin all halos and thus their luminosities into discrete voxels based on the pixelization of the sky and the frequency channelization.The conversion of the total luminosity  [C ii],vox in each voxel to the voxel intensity  [C ii] is straightforward: where The observing frequencies encompass [C ii] emission from  = 5.6-6.6; at the central frequency we obtain  vox ≈ 22 Mpc 3 and /[4 rest  ()] ≈ 7.2 × 10 −4 Jy sr −1 Mpc 3  −1 ⊙ .For limited purposes, we also produce intensity cubes at other redshifts, including at  obs = (226 ± 14) GHz using otherwise similar parameters.
We consider the raw [C ii] cube in some cases, but in most cases we will consider the cube after convolution with the angular Gaussian profile approximating the CCAT beam, and with the addition of Gaussian noise as described in Section 3.2.

Sanity checks: luminosity functions and [C ii] deficits
Before moving on to the primary results of our simulations, we show the results of a couple of sanity checks of the model, first against other  ∼ 6 [C ii] luminosity functions in the literature and then against the so-called [C ii] deficit.The latter is not so prominent in the range of luminosities primarily relevant to our work, but there is nonetheless weak downward evolution of  [C ii] / IR with increasing  IR .As our peak-patch lightcones have halo masses adjusted to match the Tinker et al. (2008)-Behroozi et al. (2013b) form of the HMF, it is sufficient to use that HMF model to generate randomly drawn masses given some large cosmological volume, convert these masses to [C ii] luminosities (with randomly drawn metallicity values), and obtain the resulting luminosity function.We show this in Figure 2 alongside selected prior simulations and observations.
Our model output compares favourably to the prediction of Popping et al. (2016), which uses a semi-analytic model to predict line luminosity functions at  = 0-6; that work noted its difficulty in reproducing bright CO and [C ii] emitters at high redshifts, a difficulty which we do not share to the same extent.We also predict similar abundances of  [C ii] ∼ 10 8 -10 9  ⊙ objects as the model of Lagache et al. (2018) We also show a similar level of consistency (i.e., within an order of magnitude) with observational constraints, which is remarkable given that we have not explicitly fit to them in great detail beyond the specific pivot point of  [C ii] ∼ 10 7  ⊙ (which is in fact quite far from where most observational constraints lie).
• The ASPECS LP survey was unable to go beyond upper limits on the luminosity function at  ∼ 6 (Decarli et al. 2020); our simulated source abundances lie safely below the upper limits.
• Our simulated luminosity function lies systematically below the measurements of ALPINE (Yan et al. 2020), an ALMA survey of UV-selected galaxies.The target selection of ALPINE may be a source of bias, but the discrepancy against our model is certainly not as radical as against that of Popping et al. (2016) (which posited a similar explanation for inconsistency with a lower limit at  ∼ 4).
• The REBELS survey has also surveyed [C ii] emitters at  ∼ 7, but has not publicly released explicit luminosity function constraints; in its place, we compare the fitting of Padmanabhan (2023) to the work in preparation against our model, and find consistency within a factor of 2-3 for  [C ii] ≳ 10 8  ⊙ .For comparison, publicly discussed REBELS [C ii] emitters (e.g., in Ferrara et al. 2022) have luminosities of 10 8.6  ⊙ .
We also compare against constraints on the cumulative number density of [C ii] emitters above certain detection thresholds.
• An alternative ASPECS LP analysis by Uzgil et al. (2021) derived upper limits on source abundances down to luminosities of ∼ 10 8  ⊙ ; we show their upper limit for sources at  = 6-7 and demonstrate consistency with this limit.Although the original work of Uzgil et al. (2021) shows a number of other limits, the qualitative comparison against their other limits is the same.
• In addition, two surveys of lensing clusters have each yielded a single detection of a [C ii] emitter at  ∼ 6. ALCS (Fujimoto et al. 2021) mapped strongly lensed regions across 33 different clusters, covering a total of 88 arcmin 2 ; DUALZ (Fujimoto et al. 2023) specifically targets Abell 2744, in both a wide 24 arcmin 2 survey and a deep 4 arcmin 2 survey.As in the comparison against ALPINE data, our predictions lie systematically on the lower side of these limits.However, we caution for both ALCS and DUALZ that these constraints not only rely on a single detection in small effective survey areas of ∼ 10 0 -10 2 arcmin 2 , but also rely strongly on the mass model of the lensing cluster, both in the determination of the intrinsic luminosities that were or should have been detectable, and in the determination of the effective comoving volume surveyed.
For the ( [C ii] / IR )- IR relation, we look to the simulated FIRE galaxies as a comparison point, rather than observational trends. 4Real-life resolved sources at  > 5 are mostly exceptionally bright sources with  IR ≳ 10 12  ⊙ (and typically  [C ii] ≳ 10 9  ⊙ ), whereas it is almost entirely the complementary population that is most abundant and will dominate the large-scale clustering component of the [C ii] LIM signal.
To obtain the IR luminosity for our halos, we use the same IR-SFR conversion as Liang et al. (2023): This is the relation of Kennicutt (1998), recalibrated for a Kroupa (2002) initial mass function.
Recall that we formulated our final model in steps, and initially considered in Section 3.1.1a model with only explicit dependence on H i mass (implying either fixed metallicity across all halos or lack of any metallicity-dependence).We show the ( [C ii] / IR )- IR relation for this model in the upper panel of Figure 3.At high  IR , Liang et al. (2023) note that  dep ∼  gas /SFR drives the socalled [C ii] deficit as there is less gas available to cool in [C ii] relative to the amount of star-forming activity.In the regime of  IR ≲ 10 12  ⊙ , however, the predicted evolution in  [C ii] / IR with  IR is far too steep when attempting to explain [C ii] emission with gas mass alone, and in particular over-predicts [C ii] emission in low-mass objects relative to the FIRE simulations.
The lower panel of Figure 3 shows the ( [C ii] / IR )- IR relation for our final model, using both H i mass and metallicity to inform the [C ii] luminosity.As Liang et al. (2023) note, metallicity is clearly key in explaining a different kind of [C ii] deficit, which 4 We use results from Liang et al. (2023) where the prescribed cloud scale differs from that of their final paper, but the effects are only of order unity.17(lower panel) including metallicity.We show 68% and 95% intervals (unfilled rectangles, with more opaque bins being more abundant in our simulations) and the mean trend (dashed curves) in log-space luminosity bins of width ≈ 0.25 dex.Plotted alongside these are calculated values for simulated FIRE galaxies from similar redshifts.
is the decline in  [C ii] / IR with increasing redshift at fixed  IR .In general, less IR-luminous galaxies are less metallic and less abundant in carbon, which counteracts the increased relative availability of diffuse gas.The softened evolution of   3, the dependence on metallicity over-compensates against the declining abundance of gas in general, and leads to an overprediction of [C ii].Inversely, metallicity-dependence suppresses our  ≈ 8 predictions at low and high  IR relative to FIRE predictions.A crude parameterization of  [C ii] or a re-evaluation of the FMR at  ≳ 6 would allow the model to capture the behaviour of the brightest and highest-redshift sources, but this task is beyond the scope of the present work as  IR ≳ 10 12  ⊙ sources will be too rare to significantly affect the [C ii] LIM signal and as a precise replication of the FIRE outputs is not the primary focus of this work.

Line-intensity maps and power spectra
As previously outlined, we generate 270 lightcones of simulated [C ii] emission at  ≳ 6.For most of this work, we will be focusing in particular on the part of the signal originating from  = 5.6-6.6 (corresponding to  obs ∈ (250, 290) GHz).We plot the intensity map for a wider interval spanning  ∈ (5.8, 7.6) in Figure 4, which shows the [C ii] emission tracing the large-scale clustering of our simulated dark matter halos, but with an intensity that falls off rather rapidly beyond  ∼ 7.
No actual experiment will perfectly recover the signal as shown in Figure 4.As previously stated, the primary focus of LIM forecasts is the power spectrum, encapsulating the variance of the [C ii] lineintensity fluctuations at different scales, which should be detectable above noise.The power spectrum also lets us calculate a Wiener filter given the expected power spectrum and the actual noisy observation, thus constructing a minimum-variance linear estimator of the original [C ii] line-intensity map.We consider implementation details for the power spectrum and Wiener filter calculations in Appendix A, and focus here on the results.
Figure 5 shows the average power spectrum across all 270 lightcones, compared to both previous models (Silva et al. 2015;Serra et al. 2016;Dumitru et al. 2019;Chung et al. 2020;Karoumpis et al. 2022;Kannan et al. 2022;Sun et al. 2023) and sensitivities expected from future [C ii] LIM experiments (see Chung et al. 2020 for calculation details).At large scales, our predictions are similar to those of previous models if relatively pessimistic.However, our model is decidedly more pessimistic at small scales, indicating that we tend to predict significantly less shot noise compared to previous models, which is to say we do not predict a substantial population of rare bright [C ii] emitters.In this sense our predictions tie closest to those of Kannan et al. (2022) and Sun et al. (2023).
Greater stochasticity in [C ii] luminosity than the fiducial expectation could account for some of this difference, but not easily.Figure 6 shows the [C ii] power spectrum for different amounts of log-normal scatter in metallicity ranging from 0.0 to 1.0 (in units of dex), where the fiducial value is   = 0.4 dex.Higher scatter in metallicity introduces more random small scale fluctuations of the signal, and as a result, the clustering component of the power spectrum becomes subdominant to shot noise at lower  for higher   .
For an arbitrarily high level of scatter, the signal becomes entirely dominated by shot noise on all scales.However, the values required are extreme, with   ≳ 0.7 dex required to achieve a level of shot noise relative to clustering similar to other model predictions.(As for why such values would be extreme relative to the fiducial   = 0.4 dex, consider that a change in the standard deviation of a Gaussian variable from  = 0.4 to  = 0.8 would decrease the correlation coefficient against another variable by a factor of 2, all other covariances being held equal.) In fact, the low level of shot noise arises in large part as a fundamental feature of our halo model: by its very motivation our model prescribes non-linearly suppressed [C ii] emission from more massive objects relative to their SFR, mimicking the so-called [C ii] deficit.By contrast, the other models use either a linear relation or mildly non-linear power law to obtain [C ii] luminosity from either SFR or IR luminosity.A key exception is the prediction of Sun et al. (2023), which uses the LIMFAST code (Mas-Ribas et al. 2023) to generate a model of [C ii] emission motivated by similar physical variables as in our model, albeit simulated at the molecular cloud level rather than at the dark matter halo level.Another exception exists in one of the models of Karoumpis et al. (2022), which works from the results of Vallini et al. (2015) to account for the effect of metallicity on the [C ii] luminosity.But metallicity accounts for the redshift evolution of the [C ii]-SFR relation given the metal-poor nature of early galaxies, while the primary driver of the [C ii] deficit at high luminosities is the gas mass.We also see the reluctance of our model to prescribe very bright objects in the luminosity function shown in Figure 2, especially in comparison to the model of Lagache et al. (2018).All this suggests that the shape of the power spectrum and the relative balance of the clustering and shot noise components may be a key discriminator between competing physical pictures of [C ii] emission.Although such pictures must include a wide range of processes that impact the halo-[C ii] connection at both low and high masses, the shape of the power spectrum may yet test different explanations of the [C ii] deficit.
Prospects for detecting the power spectrum to this end are promising.When summed in quadrature across all  ≲ 1 Mpc −1 , the total signal-to-noise ratio for the  ∼ 6 [C ii] power spectrum is ≈ 5 for the baseline CCAT-DSS experiment.Although a detection at  ∼ 7.5 would be out of reach for CCAT-DSS, our Stage 2 LIM experimental parameters would be sufficient to achieve a ≈ 11  (Silva et al. 2015;Serra et al. 2016;Dumitru et al. 2019;Chung et al. 2020;Karoumpis et al. 2022;Kannan et al. 2022;Sun et al. 2023).Sensitivity forecasts for all surveys assume a mode count and noise level corresponding to the specifications outlined in either this work or in a previous work (Chung et al. 2020)  Average Power Spectrum Average power spectrum of the [C ii] pure signal over our 270 halo lightcones.The dashed line indicates the mean and the shaded region indicates the 68% interval about the median of the distribution the 270 lightcones' power spectra.Here, we display the average power spectrum for values of metallicity scatter index   ranging from 0.0 to 1.0, which shows the gradual dominance of the shot noise across more scales with higher   .
overall detection of the  ∼ 7.5 [C ii] power spectrum, as well as an improved ≈ 48 detection at  ∼ 6.The improvement would allow measurement of the shape of the [C ii] power spectrum beyond a simple initial detection, probing the contribution of populations of faint and bright galaxies to the total signal.
A Wiener-filtered line intensity map provides a more intuitive illustration of what this detectability of the power spectrum means in terms of sensitivity to [C ii] intensity fluctuations at different scales.
We process the raw line intensities from an example realization, as shown in the upper panel of Figure 7, through a Wiener filter given the Stage 2 forecast, ending up with the result in the lower panel of the same figure demonstrating recovery of prominent large-scale features (again, see Appendix A for more information on the Wiener filter processing).

Stacking [C ii] on optical galaxies
We perform a mock stacking analysis to consider recoverability of the average [C ii] signal associated with locations of individual objects.Specifically, we assume that the halos in our simulations contain not only [C ii]-emitting gas, but also Lyman-break galaxy (LBG) populations that can be detected through photometric dropout (as the Lyman break in the continuum flux falls between two photometric filters), and simulate stacks on a hypothetical narrowband LBG survey that can access thousands of objects over the sky area of a single 2 deg 2 [C ii] LIM field.
We assume drop-out identification of LBGs at  ∈ (5.9, 6.1), and assign LBG counts to each halo in this redshift range.Harikane et al. (2016) consider a halo occupation distribution model for a range of LBG populations detected in Hubble deep imagery and large-area Subaru/Hyper Suprime-Cam (HSC) data, including a  ≈ 5.9 LBG population detected as -dropouts with threshold restframe UV aperture magnitude of 28.4, or absolute UV magnitude of  UV < −19.1.Although this limiting magnitude is mostly driven by the Hubble data, which only span hundreds of square arcminutes, we assume that future ultra-deep photometric data will be capable of reaching this limit over areas on the scale of square degrees.This is not unreasonable based on the original HSC Subaru Strategic Program specifications (Aihara et al. 2018), which specified a depth of  ≃ 28 for the HSC-UltraDeep (3.5 deg 2 ) layer; as of the third data release (Aihara et al. 2022), the HSC data had reached a depth of  ≃ 26.9 in the Deep layer, with the UltraDeep layer being ≃ 0.8 mag deeper.
We briefly recap the halo occupation parameterization of Harikane et al. (2016) parameterization here with minor changes to notation.The average number of LBGs in each halo is given as an assumed star-formation duty cycle  duty = 0.6 times the number of both central and satellite LBGs when active: The average central LBG count ⟨ cen ⟩ is given by a sigmoid which reaches 0.5 at some characteristic mass  min,LBG and has some width  log  : where for the  ∼ 6 LBG population, Harikane et al. (2016) obtain log ( min / ⊙ ) = 11.03 as the best-fitting value.The number of satellite LBGs is then this scaled by a power law: After obtaining ⟨⟩ ( h ), we make a Poissonian draw to determine the 'actual' number of LBGs hosted in each halo.This results in a typical count of (4.3 ± 0.4) × 10 3 LBGs at  ∈ (5.9, 6.1) in each 2 deg 2 lightcone.The corresponding comoving number density of ∼ 5×10 −4 Mpc −3 is lower than that found by Harikane et al. (2016) in data, but consistent with our own recalculation at  = 5.9 using the HMF and the prescription for ⟨⟩ ( h ), suggesting our results are likely consistent with their fitting to the data (modulo cosmology differences).
We generate the actual mock stack on LBGs by cutting out a thumbnail from the [C ii] map centred on the host halo location, 50 pixels on each side (≈ 17 ′ × 17 ′ ), and then averaging the thumbnails weighted by the LBG count of all host halos.
We first show in Figure 8 the radial profile of the intensity of the stack away from the centre, as recovered from both noiseless simulations and realistic forecasts for [C ii] observations with CCAT-DSS and the Stage 2 concept.The profile appears to have two components, namely a more compact component localised near the stack centre corresponding to the [C ii] emission from the host halo, and a more diffuse component extending out to larger angular distances corresponding to emission from the surrounding large-scale structure.We find a slight upwards bias in the CCAT-DSS forecast for the second component at the level of 10%, but otherwise the fundamental sensitivity of CCAT-DSS already allows for strong recovery of the radial profile of [C ii] emission around  ∼ 6 LBGs.With Stage 2 sensitivities the variance in the recovered profile is comparable to the corresponding variance for the noiseless stack, suggesting that instrumental noise in the [C ii] observation is not the dominant source of uncertainty in the radial profile in that scenario.We consider a phenomenological model of the radial profile as the sum of Gaussian and power-law profiles, with  here being the radial distance in pixels (recalling that 1 px covers 20 ′′ along each angular dimension): Table 1 shows the recovered values of all five free parameters { D ,  0 ,  D ,  G , } from both the noiseless stack and the average expected stack from the CCAT-DSS and Stage 2 LIM forecasts, given respective errors.Note that while we naïvely expect  = 2 px 2 based on the beam profile being a Gaussian with scale of 1 px, the stacking procedure effectively convolves this with the pixel window function during the stack to widen the actual profile.Otherwise the recovered parameter values are generally consistent with what we see in Figure 8, with the slight bias in the CCAT forecast for the large-scale component reflected in the lower recovered value of  D .
For illustrative purposes, we also show a single realization of the stacking exercise in Figure 9, including radial profiles recovered from that particular realization re-projected into a thumbnail.Note however that the Stage 2 LIM sensitivities allow very strong recovery of the [C ii] signal correlated with LBG locations without the need to isotropize the signal by averaging in radial bins.In fact, although recovery of the radial profile is already very good with CCAT-DSS sensitivities, an isotropized analysis even with Stage 2 sensitivities will miss out on the finer details of clustering of the [C ii] signal, as shown by the structure evident in the difference map between the recovered isotropized thumbnail and the true noiseless stack.Future work should explore the possibility of improving the information content of LIM stacks through anisotropic techniques like oriented stacking (see, e.g., Lokken et al. 2022Lokken et al. , 2023)).

Relative entropy in one-point statistics
Returning to summary statistics of the [C ii] observation on its own, we now consider the ability of one-point statistics to distinguish the non-Gaussian signature of different parameters.
Recall from Section 2 our definition of PRE per intensity bin: (30) We may consider how this PRE varies with  (or equivalently its displacement Δ away from  0 ).Locally, this is simply given by the derivative of the PRE: or, taking Δ → 0 while noting that (;  =  0 ) = (), Therefore, the differential PRE incurred by a change in a given model parameter is simply the derivative of the log probability with respect to that parameter, weighted by the fiducial probability.We numerically estimate the PRE differential with respect to a number of parameters that we encounter in Section 3.1, namely: •  [C ii] , the overall normalization of  [C ii] ( h , ) in Equation 17, with fiducial value of 0.024; •   , the metallicity log-normal scatter in units of dex, with fiducial value of 0.4 dex; •  HI , the power law index of the  HI ( h ) relation, with fiducial value of 0.74; • log  min , the exponential cutoff mass scale in the  HI ( h ) relation, with fiducial value of 10.3; •  0 , controlling the power law index of the SMHM relation of Behroozi et al. (2013b), with fiducial value of -1.412; • and , a power law index in the FMR parameterization of Curti et al. (2020), with fiducial value of 0.31.
First, we consider the PRE in the noiseless signal simulation, before the introduction of either the angular beam or Gaussian noise.The upper panel of Figure 10 shows the resulting PRE derivatives (per log-intensity bin) with respect to the six chosen parameters.Due to the significantly smaller fiducial value of  [C ii] relative to those of the other parameters, we appropriately re-scale the PRE derivatives against the other parameters for easy comparison.
Since the PRE derivative is of the same sign as the derivative of the log-VID, the effect of increasing a given parameter  is to displace voxel counts towards 'higher entropy' from bins of lower  ( rel )/ to bins of higher  ( rel )/.For instance, by increasing  [C ii] , we uniformly increase the brightness of the entire [C ii] intensity signal, thus pushing voxels from lower to higher intensities.For our fiducial model, this displacement appears to happen across a cross-over point of ∼ 10 2 Jy sr −1 .
Unfortunately, the effect of many parameters is largely similar to that of  [C ii] , and their PRE derivatives appear mostly the same as  ( rel )/ [C ii] after a rescaling and/or sign flip.Decreasing any one of  min ,  0 , or  increases the brightness of the signal in a way that cannot be distinguished from increasing  [C ii] , at least not through the informational divergence in the one-point statistics of the map.In Figure 10, we have specifically chosen rescalings that accentuate this degeneracy.(Appendix B shows the same derivatives with less drastic rescalings and against the natural log of the parameter value; the qualitative conclusions remain the same.) We do however find some parameters whose effect appears distinct from that of  However, much of this becomes insignificant against the effect of introducing the angular beam and Gaussian noise to the observation, which results in the PRE derivatives (per linear intensity bin) shown in the lower panel of Figure 10.The PRE derivative with respect to  HI is now indistinguishable in shape from that with respect to  [C ii] , while the PRE derivative with respect to   still appears distinct but is significantly reduced in amplitude, suggesting its significantly reduced informational impact on the VID.

Discussion point 1: observational distortions
It is clear that both smoothing of the signal by the beam and contamination from thermal noise cause significant loss and distortions in the informational sensitivity of the VID to our model parameters.To better understand how this occurs, we further vary the beam width  FWHM and noise scale  n , and examine the resulting changes in the PRE derivative.
We summarise our findings in Figure 11.First, we consider the effect of the beam between the outer left panel and the inner left column of panels.Broadly speaking, the effect of convolving the field with a beam of finite size is a coarse pooling.This pooling dilutes the intensity in the brightest voxels across its neighbours, suppressing the heavy right tail that would originally be present in the VID and compressing the effect of each parameter to a narrower range of intensities.The effect is more complex at lower intensities, and so the distortion of the PRE derivative ultimately depends at some level on the clustering of the signal and the effect thereon of each model parameter.
Next, moving to the inner right column of panels of Figure 11, we consider the effect of introducing Gaussian noise, considering a few intermediate scenarios between the noiseless case and the Stage 2 forecast with  n = 3.2 × 10 3 Jy sr −1 .Inevitably, the PDF of the total intensity is the convolution of the individual signal and noise intensity PDFs, so introducing Gaussian noise means smoothing out the signal PDF by a Gaussian profile of size  n .Given that we were plotting the PRE per log-intensity bin across a range of 1-10 4 Jy sr −1 , the apparent result of convolution with the noise PDF is unsurprising.This is in fact the point where, as we move to considering the PRE derivative per linear intensity bin, the distinct signatures of some parameters homogenize, and overall the PRE derivatives clearly decrease significantly in magnitude as noise spreads the actual information content of the model parameter across , the  HI (  h ) power law index  HI , and the metallicity log-normal scatter   (in units of dex).Inner left column of panels: The same derivatives for all three parameters (from top to bottom panels), but with the introduction of a Gaussian beam, shown for several widths of the beam smoothing the signal and suppressing intensities.Inner right column of panels: The derivatives of  rel (now evaluated in linear intensity space) for the same three parameters (again from top to bottom panels), but with the fiducial Gaussian beam width of  FWHM = 48 ′′ , and with varying noise levels smoothing the actual intensity distribution.Rightmost panel: The final pointwise relative entropy derivatives after accounting for both beam smoothing and Gaussian noise at the level of our Stage 2 experimental scenario.Note that as with the leftmost panel, these derivatives are also scaled up as indicated in the legend, and in particular the derivative with respect to   is scaled up by a further factor of 6 compared to in the leftmost panel.more intensity bins.A lower value of  n ≈ 600 Jy sr −1 -effectively corresponding to a Stage 4 LIM survey -better preserves the distinct nature of some PRE derivatives (e.g., against   ), but the noise in a Stage 2 LIM experiment ends up having a more drastic effect.By a combination of effective averaging of the signal and smoothing of the actual VID, the beam and Gaussian noise in our survey simulations tend to Gaussianize the observation and decrease the amount of information-theoretical divergence imparted to the VID by the change in any given model parameter.

Discussion point 2: Paths forward for optimal information recovery from the VID
While observational distortions do wipe away a significant amount of non-Gaussianity, it is possible to consider paths to extract the residual non-Gaussian signatures from the distorted VID.Detailed work on this front is beyond the scope of the present paper, but we will briefly discuss a few possible avenues.Appropriate filtering or processing of the observational data could conceivably improve detectability of the distinct signatures of different parameters.For instance, the relative entropy analysis of Lee et al. (2023) used not the full simulated CIB map, but maps filtered in different bands of harmonic multipoles.Similarly, Fourier or wavelet analyses could reduce the impact of noise.
However, it seems difficult to extract an information-theoretical free lunch from the noisy LIM observation, at least on its own.Certainly the simplest noise-reducing filter, a rolling average, will ultimately tend to Gaussianize the signal even as it reduces the noise, which will be undesirable for certain signatures of skew or kurtosis.More advanced statistical methods could conceivably be better matched to the specific problem and perform better at isolating signal from noise in a non-linear, non-Gaussian fashion, but would likely require large amounts of training and thus be at least implicitly constrained by prior expectations and other external information.
One of the best empirical paths forward likely involves explicitly introducing external information and probing one-point statistics of cross-correlations or cross-correlations of one-point or higherorder statistics.Although the isotropic stacks of Section 4.2 do not necessarily reveal the clustering around the external source catalogue very prominently on casual inspection, an oriented stack that emphasises the filaments of the cosmic web may itself have onepoint statistics that reveal non-Gaussian signatures of certain model variations.Similarly, the one-point statistics of a line-intensity map reconstruction through a non-linear extension of conventional techniques like linear covariance-based filtering (Chung 2023) could also access non-Gaussianities.
Another technique being developed for LIM applications is to deconvolve noise from signal through operations on the characteristic functions dual to the VID or counts-in-cells (Breysse et al. 2019(Breysse et al. , 2023;;Chung et al. 2023).For reference, Figure 12 shows the spectral densities of the PRE derivatives.Purely as a qualitative heuristic, the spectral densities suggest that at smaller scales (i.e., higher val- ues of the Fourier dual of the histogrammed intensity), the PRE does contain some small amount of information content imprinted in different ways by the distinct parameters of our model.Therefore, by leveraging the mutual one-point statistical information between a [C ii] LIM survey and a correlated observation, Fourier-space deconvolved distribution estimation (DDE; Breysse et al. 2023) can potentially access discriminatory information content by rejecting disjoint noise and systematics.
Finally, we note that accessing the information content of lineintensity maps is subject to further observational distortions that we have not accounted for -e.g., scale-dependent sky noise, removal of interloper emission, and other mapmaking pipeline artefactsmany of which will involve non-linear operations.Tailored simulations of specific experimental scenarios using large ensembles of numerical simulations of the kind we have considered here will thus continue to be critical to evaluation of information content, covariance estimation, and tests of inference pipelines.

CONCLUSIONS
By now we have answered all of the questions we put forward in the Introduction to this paper: • What is the detectability of [C ii] emission in near-and farfuture mm-wave LIM surveys, in either auto-or cross-correlation?We expect a confident detection of the [C ii] power spectrum at  ∼ 6 with CCAT, followed by at  ∼ 7.5 with a Stage 2 LIM experiment.Stacking against a drop-out LBG survey can yield strong detections of the [C ii] signal near star-forming galaxies, as demonstrated with mock source densities corresponding to survey depths similar to those already achieved in existing surveys over multiple square degrees of sky.
• How do different model parameters affect the voxel intensity distribution of the line-intensity fluctuations, as represented by the differential PRE?The PRE derivatives of many parameters become apparently degenerate with that of a simple uniform scaling of the signal brightness, but some parameters like the metallicity scatter   provide a distinct non-Gaussian signature that redistributes probability from middling intensities to both lower and higher intensities, or vice versa.
• How do observational effects like the instrument response and thermal noise affect the observability of these distinct signatures in relative entropy?By Gaussianizing the signal and smoothing out the VID, resolution effects and thermal noise both blur the signature in the log-VID, suppressing the observable PRE differential significantly and/or rendering all parameters degenerate with the overall brightness of the [C ii] fluctuations.
We only stand at the beginning of the larger endeavour of [C ii] LIM, which will continue to require significant logistical and theoretical effort for optimal analysis and interpretation.Future work should investigate techniques like oriented stacking or the Fourierspace DDE -or indeed entirely novel statistical methods -leveraging external cross-correlation or mutual information to isolate signal from interlopers and noise.As we stated towards the end of the previous section, such methods may even be key to accessing sufficient information content to discriminate between different kinds of astrophysical and even cosmological non-Gaussian signatures.This includes signatures of primordial intermittent non-Gaussianities propagating through large-scale structure and galaxy formation, which we will examine in future work through the lens of [C ii] LIM among other probes (Carlson et al., in prep.).In turn, these techniques and non-Gaussianities will require further study through rigorous simulation of both signal and noise.
In this aspect, a full understanding of observational effects will be important, but equally important is a full understanding of the [C ii] signal, or at least as full an understanding as one can get at this juncture.This work represents the state of the art in terms of a halo model grounded on physical insights from small-scale simulations, but even detailed simulations of individual galaxies are only as good as their inputs.Testing such models with [C ii] LIM will be desirable in future; this may demand the use not of manual tailored analyses of a simulation set (as in Liang et al. 2023)  LL acknowledges financial support from the Swiss National Science Foundation (grant number P2ZHP2_199729) and the University of Toronto Faculty of Arts and Science.
Finally, we thank the anonymous referee that reviewed the manuscript and provided useful comments.
This research made use of Astropy,6 a community-developed core Python package for astronomy (Astropy Collaboration et al. 2013Collaboration et al. , 2018Collaboration et al. , 2022)), as well as the NumPy (Harris et al. 2020) and SciPy (Virtanen et al. 2020) packages.This research also made use of NASA's Astrophysics Data System Bibliographic Services.
Figure 1.Visualization of correlated LIM observables derived from applying our state-of-the-art understanding of the galaxy-halo connection to one of the cosmological simulations used in this work, and convolved with realistic instrumental responses.The [C ii] signal (top panels), tracing diffuse neutral gas in the ISM, is simulated using the model put forward in this paper, and convolved with a Gaussian angular beam of width ≈ 50 ′′ and a frequency-space profile of width ∼ 3 GHz, corresponding to the near-future capabilities of the CCAT facility.The CO signal (middle panels), tracing molecular gas, is simulated using a model from prior literature(Li et al. 2016) and convolved with an angular beam of width ∼ 4 ′ , corresponding to a 10 metre dish as used by the COMAP Pathfinder(Cleary et al. 2022).The 21 cm signal as shown neglects contributions outside of halos -although the intergalactic medium should not dominate the signal, given the high ionized fraction expected at  ∼ 6 -and is convolved with a beam of width ∼ 10 ′ , corresponding to an interferometer with ∼ 500 m baselines.
The first scenario is a CCAT-like experiment, specifically one conforming to specifications of the proposed baseline CCAT deep spectroscopic survey (CCAT-DSS) of CCAT-Prime Collaboration et al. (2023) with 2000 hours of observing time per 4 deg 2 field.Assuming a pixelization of the survey volume with a pixel size of  FWHM / √ vox is the comoving volume of each voxel, and the prefactor /[4 rest  ()] depends on the speed of light , the rest-frame line emission frequency  rest , and the Hubble parameter  () at the emission redshift.From our peak-patch lightcone simulations, we produce [C ii] intensity cubes of size   = 2 deg along each angular coordinate -i.e., right ascension (RA) and declination -and  obs = 270 ± 20 GHz in frequency space.Based on the CCAT beam width of  FWHM = 48 ′′ or (1/75) deg, we require a pixel size equal to the scale of the corresponding Gaussian profile,  FWHM / √ 8 ln 2, yielding (  / FWHM ) √ 8 ln 2 = 353 pixels on each transverse side.Along the line of sight in frequency space, we have a bandwidth of 40 GHz and a spectral resolution /  ≈ 100 or   = 2.8 GHz, which makes for 15 frequency channels.
, although considerably fewer bright objects and somewhat more faint objects.(The comparison of Béthermin et al. (2022) in turn shows that the model of Chung et al. (2020) predicts a luminosity function largely similar to that of Lagache et al. (2018), as does their own model when using the Lagache et al. (2018) SFR-[C ii] relation.)

Figure 3 .
Figure 3. Evolution of  [C ii] / IR with  IR with the incomplete model at the end of Section 3.1.1(upper panel), dependent only on H i mass, and the full model described by Equation17(lower panel) including metallicity.We show 68% and 95% intervals (unfilled rectangles, with more opaque bins being more abundant in our simulations) and the mean trend (dashed curves) in log-space luminosity bins of width ≈ 0.25 dex.Plotted alongside these are calculated values for simulated FIRE galaxies from similar redshifts.

Figure 4 .
Figure 4. Slices taken from two [C ii] intensity cubes' simulations.One cube was generated with high frequency resolution (   = 0.01 GHz) to illustrate the evolution of the [C ii] signal along the frequency axis, while the other uses our base spectral resolution of   = 2.8 GHz to show a mock observation of [C ii].Left panel: slice of the   = 0.01 GHz cube at ΔRA ∼ 0 deg with the frequency and redshift range indicated along the horizontal axes.Notice how the [C ii] power diminishes from lower to higher redshift.Right panel: angular slice from the   = 2.8 GHz cube at  ∼ 6 or  obs ∼ 270 GHz.

Figure 5 .
Figure5.Models and sensitivities for the [C ii] intensity power spectrum at  ∼ 6 (left panel) and  ∼ 7.5 (right panel).Models are drawn either from simulations used in this work or from previous literature(Silva et al. 2015;Serra et al. 2016;Dumitru et al. 2019;Chung et al. 2020;Karoumpis et al. 2022;Kannan et al. 2022;Sun et al. 2023).Sensitivity forecasts for all surveys assume a mode count and noise level corresponding to the specifications outlined in either this work or in a previous work(Chung et al. 2020) and wavenumber bins of width Δ = 0.035 Mpc −1 .
Figure5.Models and sensitivities for the [C ii] intensity power spectrum at  ∼ 6 (left panel) and  ∼ 7.5 (right panel).Models are drawn either from simulations used in this work or from previous literature(Silva et al. 2015;Serra et al. 2016;Dumitru et al. 2019;Chung et al. 2020;Karoumpis et al. 2022;Kannan et al. 2022;Sun et al. 2023).Sensitivity forecasts for all surveys assume a mode count and noise level corresponding to the specifications outlined in either this work or in a previous work(Chung et al. 2020) and wavenumber bins of width Δ = 0.035 Mpc −1 .

Figure 7 .
Figure 7. Slices of a sample realization of input and recovered signal maps, both at  obs ∼ 270 GHz or  ∼ 6. Upper panel: Map of the [C ii] noiseless signal.Here, the map shown is convolved with a beam of 50 arcseconds to mock the CCAT beam width.Lower panel: Map of a simulated Wienerfiltered observation given parameters for a Stage 2 LIM survey.

Figure 8 .
Figure 8. Radial average intensity profile of a simulated  ∼ 6 LBG stack for pure signal (both with and without a 48" beam), a mock CCAT-DSS observation, and a mock Stage 2 LIM observation.The 'noiseless stack (no beam)' and CCAT recovered profiles are slightly offset from their actual distance coordinates to show them more clearly against the other profiles.

)
Consider the 'true' () to be our fiducial [C ii] model, and consider the alternate () to be a version of the model with a change to some parameter .The PRE due to finding an observation with the fiducial value  0 of the parameter instead of the alternate value of  0 + Δ is  rel (Δ) = −() ln () (;  =  0 + Δ) .

Figure 9 .
Figure 9. Simulated stacks on ∼ 4000 LBG locations from  ∈ (5.9, 6.1), including recovery for the CCAT and Stage 2 experimental scenarios from a single realization of a 2 deg 2 field.Upper left: The stack of the pure signal on the LBG locations, with only the Gaussian beam affecting the signal and no noise.Upper middle: The best-fitting radial profile of the form described in Equation 28 recovered from stacking on a mock CCAT-DSS observation, re-projected into the stack thumbnail.Upper right: Difference map between the recovered CCAT-DSS model thumbnail and the underlying true noiseless stack.Lower left: The result of stacking on a mock Stage 2 observation.Lower middle: Same as the upper middle panel, but with Stage 2 sensitivities.Lower right: Same as the upper right panel but with Stage 2 sensitivities, demonstrating considerably reduced bias overall compared to the CCAT forecast in the upper right panel, but clearly showing some structure missed as a result of the isotropic nature of the simulated analysis.

Figure 10 .
Figure 10.PRE derivatives with respect to a number of parameters (outlined in the main text), re-scaled as indicated in the legend, in either noiseless simulations of the [C ii] signal (upper panel) or simulations of a Stage 2 LIM observation of [C ii] (lower panel).

Figure 11 .
Figure 11.Distortion of entropy derivatives due to observational effects.Leftmost panel: Derivatives of pointwise relative entropy  rel (evaluated in log intensity space) with respect to three parameters (scaled up as indicated in the legend): the  [C ii] proportionality constant  [C ii], the  HI (  h ) power law index  HI , and the metallicity log-normal scatter   (in units of dex).Inner left column of panels: The same derivatives for all three parameters (from top to bottom panels), but with the introduction of a Gaussian beam, shown for several widths of the beam smoothing the signal and suppressing intensities.Inner right column of panels: The derivatives of  rel (now evaluated in linear intensity space) for the same three parameters (again from top to bottom panels), but with the fiducial Gaussian beam width of  FWHM = 48 ′′ , and with varying noise levels smoothing the actual intensity distribution.Rightmost panel: The final pointwise relative entropy derivatives after accounting for both beam smoothing and Gaussian noise at the level of our Stage 2 experimental scenario.Note that as with the leftmost panel, these derivatives are also scaled up as indicated in the legend, and in particular the derivative with respect to   is scaled up by a further factor of 6 compared to in the leftmost panel.

Figure 12 .
Figure 12.Power spectrum of the PRE derivatives for a simulated Stage 2 LIM survey (as shown in the lower panel of Figure 10), representing the amplitude of the Fourier transform.The shaded area corresponds to values of the Fourier dual of voxel intensity that exceed /(2.355n ), where the noise of the Stage 2 LIM survey will likely dominate the variance of the characteristic function and derived quantities.
but of outputs of statistical methods like emulation or unsupervised dimensionality reduction in order to generate predictions for cosmological [C ii] fluctuations from a wide range of galaxy-scale models.Doubtless our understanding of the halo-[C ii] connection will continue to evolve with further observations of high-redshift [C ii], though for now constraints on [C ii] at  ≳ 6 rely on serendipitous detections and specific selections.It may happen that the first detections of cosmological [C ii] through LIM, in either auto-or cross-correlation, will arrive at constraints free from such selection effects to inform further future LIM experiments.PH acknowledges support from CITA over the course of this research, including through the CITA Summer Undergraduate Research Fellowship and an NSERC Undergraduate Student Research Award.Thanks to Patrick Breysse and Clara Chung, whose work with JRB under the 2020 Summer Undergraduate Research Programme in astronomy and astrophysics at the University of Toronto acts as the foundation of the present work.DTC is supported by a CITA/Dunlap Institute postdoctoral fellowship.The Dunlap Institute is funded through an endowment established by the David Dunlap family and the University of Toronto.The University of Toronto operates on the traditional land of the Huron-Wendat, the Seneca, and most recently, the Mississaugas of the Credit River.DTC also acknowledges support through the Vincent and Beatrice Tremaine Postdoctoral Fellowship at CITA.
[C ii] / IR with  IR is far more consistent with the simulated FIRE galaxy population.While the deficit is softened, we still predict a strong break in the ( [C ii] / IR )- IR relation at  IR ≳ 10 10  ⊙ away from the naïve expectation of constant  [C ii] / IR .It is possible that at the very highest IR luminosities shown in Figure

Table 1 .
Recovered radial profile fitting parameters from stacks of noiseless, CCAT-DSS, and Stage 2 LIM mocks of a [C ii] observation on a  ∼ 6 LBG sample.Recall that 1 px in the map corresponds to an angular span of 20 ′′ .
[C ii].For example, increasing  HI makes the  HI ( h ) relation steeper about the pivot point of  h / ⊙ =  min , thus increasing H i mass for halos with  h / ⊙ ≳  min but decreasing it for halos with  h / ⊙ ≲  min .This results in voxels with middling intensities becoming either brighter or dimmer, depending on the halos they contain.Similarly, increasing   can either increase or decrease the metallicity, pushing voxels to both brighter and dimmer intensities.5Thecharacteristic intensity scales and effect sizes associated with these variables are different, meaning that  ( rel )/ HI and  ( rel )/  appear quite distinct from each other as well as from  ( rel )/ [C ii] .