The Difference PDF of 21-cm Fluctuations: A Powerful Statistical Tool for Probing Cosmic Reionization

A new generation of radio telescopes are currently being built with the goal of tracing the cosmic distribution of atomic hydrogen at redshifts 6-15 through its 21-cm line. The observations will probe the large-scale brightness fluctuations sourced by ionization fluctuations during cosmic reionization. Since detailed maps will be difficult to extract due to noise and foreground emission, efforts have focused on a statistical detection of the 21-cm fluctuations. During cosmic reionization, these fluctuations are highly non-Gaussian and thus more information can be extracted than just the one-dimensional function that is usually considered, i.e., the correlation function. We calculate a two-dimensional function that if measured observationally would allow a more thorough investigation of the properties of the underlying ionizing sources. This function is the probability distribution function (PDF) of the difference in the 21-cm brightness temperature between two points, as a function of the separation between the points. While the standard correlation function is determined by a complicated mixture of contributions from density and ionization fluctuations, we show that the difference PDF holds the key to separately measuring the statistical properties of the ionized regions.

iments (McQuinn et al. 2006;Bowman, Morales & Hewitt 2007). The power spectrum is the natural statistic at very high redshifts, as it contains all the available statistical information as long as Gaussian primordial density fluctuations drive the 21-cm fluctuations. However, during reionization the hydrogen distribution is a highly non-linear function of the distribution of the underlying ionizing sources. This follows most simply from the fact that the H I fraction is constrained to vary between 0 and 1, and this range is fully covered in any scenario driven by stars, in which the IGM is sharply divided between H I and H II regions. The resulting non-Gaussianity (Bharadwaj & Pandey 2005) raises the possibility of using complementary statistics to measuring additional information that is not directly derivable from the power spectrum (Saiyad-Ali, Bharadwaj & Pandey 2006).
Numerical simulations have recently begun to reach the large scales (of the order of 100 Mpc) needed to capture the evolution of the IGM during reionization (Mellema et al. 2006;Zahn et al. 2007). These simulations account accurately for gravitational evolution on a wide range of scales but still crudely for gas dynamics, star formation and the radiative transfer of ionizing photons. Analytically, Furlanetto, Zaldarriaga & Hernquist (2004) used the statistics of a random walk with a linear barrier to model the H II bubble size distribution during the reionization epoch. Schematic approximations were developed for the two-point correlation function (Furlanetto et al. 2004;McQuinn et al. 2005), but recently Barkana (2007) developed an accurate, self-consistent analytical expression for the full two-point distribution within the Furlanetto et al. (2004) model, and in particular used it to calculate the 21-cm correlation function.
Noting the expected non-Gaussianity and the importance of additional statistics, Furlanetto et al. (2004) also calculated the one-point probability distribution function (PDF) of the 21-cm brightness temperature T b at a point. The PDF has begun to be explored in numerical simulations as well (Ciardi & Madau 2003;Mellema et al. 2006). Some of the additional information available in the PDF can be captured by the skewness (Wyithe & Morales 2007) or bispectrum (Saiyad-Ali et al. 2006) statistics. Both the correlation function and the PDF are functions of a single variable (at each redshift): the twopoint correlation function is a function of separation, and the PDF is a function of T b . It is possible to create a two-dimensional function by calculating the one-point PDF as a function of smoothing scale (or pixel size), but this quantity is difficult to interpret since it is not simply related to the 21-cm correlation function or to the ionization statistics.
In this paper we consider a two-dimensional function that generalizes both the one-point PDF and the correlation function and yields additional information beyond those statistics. In particular, the variance of this new statistic is simply related to the 21-cm correlation function that is usually considered. This function is the PDF of the difference T b ≡ T b,1 − T b,2 of the 21-cm brightness temperatures T b at two points. We present in the next section our analytical model for predicting the difference PDF; its precise relation to the two-point correlation function is presented in Section 2.3. We present in Section 3.1 illustrative predictions of the difference PDF, argue in Section 3.2 that it is observationally feasible to probe it, and emphasize in Section 3.3 that it can be used to separately measure ionization correlations. We summarize our conclusions in Section 4.

M O D E L
Analytical approaches to galaxy formation and reionization are based on the mathematical problem of random walks with barriers. The statistics of a single random walk can be used to calculate various one-point distributions; in particular, the statistics of a random walk with a linear barrier can be used to calculate the distribution of ionized bubble sizes during reionization (Furlanetto et al. 2004). However, to calculate the correlation function and other two-point distributions requires us to solve for the simultaneous evolution of two correlated random walks at two different points. Scannapieco & Barkana (2002) found an approximate but quite accurate analytical solution in the case of constant barriers and used it to calculate the joint, bivariate mass function of haloes forming at two redshifts; Scannapieco & Thacker (2005) showed that this solution describes well the two-point correlation function of haloes in numerical simulations, particularly when expressed in Lagrangian coordinates (i.e. in terms of the initial comoving halo separation). Barkana (2007) generalized the two-point solution to the case of linear barriers, and applied it to calculate the correlation function of cosmological 21cm fluctuations during reionization. In this section we first review the basic set-up of the two-barrier problem in the context of reionization. We then briefly summarize the solution of Barkana (2007), except that we generalize it slightly to the case of measurements in the presence of additional smoothing (e.g. due to a limited instrumental resolution). We show how to apply this solution to calculate the difference PDF of 21-cm fluctuations.

Reionization: basic set-up
The basic approach for using random walks with barriers in cosmology follows Bond et al. (1991), who used it to rederive and extend the halo formation model of Press & Schechter (1974). In this approach we work with the linear overdensity field δ(x, z) ≡ ρ(x, z)/ρ(z)−1, where x is a comoving position in space, z is the cosmological redshift andρ is the mean value of the mass density ρ. In the linear regime, the overdensity grows in proportion to the linear growth factor D(z) (defined relative to z = 0). This fact is used in order to extrapolate the linear density field to the present time, i.e. the initial density field at high redshift is extrapolated to the present by multiplication by the relative growth factor. We adopt this view, and throughout this paper quantities such as δ and the power spectrum P(k) refer to their values linearly extrapolated to the present. In each application there is in addition a barrier that signifies the critical value which the linearly extrapolated δ must reach in order to achieve some physical milestone on some scale. In this work the milestone corresponds to having a sufficient number of galaxies within some region in order to fully reionize that same region.
At a given z, we consider the smoothed density in a region around a fixed point A in space. We begin by averaging over a large scale or, equivalently, by including only small comoving wavenumbers k. We then average over smaller scales (i.e. include larger k) until we find the largest scale on which the averaged overdensity is higher than the barrier; in the application to reionization, we then assume that the point A belongs to an H II bubble of this size. Mathematically, if the initial density field is a Gaussian random field and the smoothing is done using sharp k-space filters, then the value of the smoothed δ undergoes a random walk as the cut-off value of k is increased. Instead of using k, we adopt the (linearly extrapolated) variance S of density fluctuations as the independent variable. While the solutions are derived in reference to sharp k-space smoothing, we follow the traditional extended Press-Schechter approach and substitute real-space quantities in the final formulae. In particular, S is calculated as the variance of the appropriate mass M enclosed in a spatial sphere of comoving radius r.
We apply mathematical random-walk statistics to the distribution of H II regions during reionization using the model of Furlanetto et al. (2004). According to this model, a given point A is contained within a bubble of size given by the largest surrounding spherical region that contains enough ionizing sources to fully reionize itself. If we ignore recombinations, then the ionized fraction in a region is given by ζ f coll , where f coll is the collapse fraction (i.e. the gas fraction in galactic haloes) and ζ is the overall efficiency factor, which is the number of ionizing photons that escape from galactic haloes per hydrogen atom (or ion) contained in these haloes. This simple version of the model remains approximately valid even with recombinations if the number of recombinations per hydrogen atom in the IGM is roughly uniform; in this case, the resulting reduction of the ionized fraction by a constant factor can be incorporated into the value of ζ .
In the extended Press-Schechter model (Bond et al. 1991), in a region containing a mass corresponding to variance S, where S min is the variance corresponding to the minimum mass of a halo that hosts a galaxy, δ is the mean density fluctuation in the given region, and δ c (z) is the critical density for halo collapse at z. While this describes fluctuations in f coll well, the cosmic mean collapse fraction (and thus the overall evolution of reionization with redshift) is better described by the halo mass function of Sheth & Tormen (1999) [with the updated parameters suggested by Sheth & Tormen (2002)]. We thus use the latter mean mass function and adjust f coll in different regions in proportion to the extended Press-Schechter formula; Barkana & Loeb (2004) suggested this hybrid prescription and showed that it fits a broad range of simulation results. With these assumptions, the exact ionized fraction x i in a region is given bȳ wheref ST andf PS are the cosmic mean collapse fractions according to the Sheth-Tormen and Press-Schechter models, respectively. The resulting condition x i = 1 for having an ionized bubble of a given size, written as a condition for δ versus S, is of the same form as in Furlanetto et al. (2004), at a given redshift, and thus (as they showed) yields a linear barrier to a good approximation (see also Furlanetto, McQuinn & Hernquist 2006a). We write the effective linear barrier in a general notation: where with erfc −1 denoting the inverse function of erfc. For consistency, in what follows we use a modified formula for the ionized fraction, replacing equation (2) with the expression that corresponds to the linear approximation of the barrier: This replacement ensures that the ionized fraction varies from 0 to 1 as δ goes from −∞ up to the barrier. We also denote the neutral fraction x n = 1 − x i and, in particular, The approximation of the linear barrier is quite accurate as long as the maximum S that we consider is much smaller than S min , which is the case in the applications of the model in this paper, where the maximum S is set by the resolution of the upcoming experiments (see the next subsection). Because of some approximations in this model, the total ionized fraction as given by the model (see equation 15 in the next subsection) comes out slightly different from the direct result for the mean global ionized fraction, x i = ζf ST in terms of the cosmic mean collapse fraction (note that the model of Furlanetto et al. (2004) suffers from a similar difficulty). To deal with this, we adopt the direct values of x i versus redshift, and adjust ζ within the model to an effective value of ζ at each redshift that gives a model value of x i that equals the desired one. This typically only requires an adjustment by a few per cent or less.

The 21-cm one-point PDF
Before considering two-point functions, we first calculate the 21-cm PDF around one point by following Furlanetto et al. (2004), except that we obtain the ionized fraction from equation (5) for consistency with the barrier, and we also apply a non-linear correction to the density. We denote the PDF itself by P T b , and the cumulative probability distribution (CPD) During cosmic reionization, we assume that there are sufficient radiation backgrounds of X-rays and of Lyα photons so that the cosmic gas has been heated to well above the cosmic microwave background (CMB) temperature and the 21-cm level occupations have come into equilibrium with the gas temperature. In this case, the observed 21-cm brightness temperature relative to the CMB is independent of the spin temperature and, for our assumed cosmological parameters, is given by (Madau, Meiksin & Rees 1997) with = x n [1 + δ L (z)], where x n is the neutral hydrogen fraction and δ L (z) ≡ D(z)δ is the linear overdensity at z (as opposed to δ which denotes the density linearly extrapolated to redshift 0). Under these conditions, the 21-cm fluctuations are thus determined by fluctuations in .
In the model, x n is determined by the halo abundance, which is in turn determined by the statistics of the linear density field, and thus x n naturally falls within the correct range of 0-1. However, also depends on the actual density, and the linear density δ L (z) can take on unphysical values below −1. While a full non-linear model would be difficult to solve, within the context of the model where statistics are averaged over spherical regions, we can make a simple, approximate correction in order to get reasonable values for the actual, non-linear density δ NL (z). Mo & White (1996) developed such an approximate formula for δ L as a function of δ NL , based on spherical collapse (of overdensities) or spherical expansion (of voids). In particular, they incorporated the asymptotic limits of δ NL → −1 and δ NL → ∞ as well as the correct behaviour near δ NL = 0. Their formula is valid for the Einstein-de Sitter universe or more generally at high redshift (when the dynamical effect of the cosmological constant is negligible). We similarly develop and use here an accurate approximation for the inverse function, where δ E c = 1.686 47 is the critical collapse overdensity in an Einstein-de Sitter cosmology. Thus, the expression we use for is If we denote by P the one-point PDF of at a point, then this is related to the 21-cm one-point PDF defined above by Also, the corresponding CPDs are equal. We also assume that the PDF is considered on a resolution scale r res (corresponding to a variance S res ), i.e. the density and ionization states are averaged on the scale r res around the point being considered. In other words, we are really considering the density and ionization distributions in a region of size r res centred at a point.
We must now consider the separate contributions to the PDF of from two possible cases. First, if the point lies within a fully ionized region, then is identically zero, so this case contributes a δ Dfunction (Dirac delta function) at 0, containing the total probability that the region is ionized. This probability is given by the quantity F >,lin (ν, μ, S res ) in equation (15) in Barkana (2007), as first derived by McQuinn et al. (2005). The second case is if the point lies within a region that is still partially neutral. In this case, is given by equation (9), where for each value of δ we use x n =x n (z, S min , δ, S res ) from equation (6). We then distribute the total probability of this case into various values of using the conservation of probability, i.e. for each possible value of δ, the probability Q lin (ν, μ, δ, S res ) dδ [equation (14) in Barkana (2007), again in agreement with McQuinn et al. (2005)] contributes to P ( ) d at the value of that corresponds to δ. As noted by Furlanetto et al. (2004), there is a maximum value of , max , since → 0 in the two extreme limits, both when the density goes to zero (due to the density term in ) and when it goes up to the barrier (due to full ionization). Thus, two values of δ contribute to each value of , and P ( ) is singular at max .

The 21-cm difference PDF
We now consider two separate points, with the same assumptions as in the previous subsection. We wish to consider the PDF of the difference T b ≡ T b,1 − T b,2 of the 21-cm brightness temperatures T b at two points (or, in fact, averaged over regions centred at each of the two points). We denote the PDF itself by P T b , and the CPD If we denote by P the PDF of the difference ≡ | 1 − 2 | between the values of in the two regions, then this is related to the 21-cm difference PDF by Also, the corresponding CPDs are equal. We therefore wish to determine the PDF of the difference as a function of the comoving distance d between two points being considered at redshift z. As before, we also assume that the PDF is considered on a resolution scale r res , i.e. we are really considering the joint density and ionization distributions of two regions of size r res centred at two points separated by a distance d. The model of Barkana (2007) provides the probability that either or both of these regions lie completely within H II bubbles and, when the regions are not fully ionized, the model provides the correlated distributions of their average overdensities.
We must now consider the separate contributions to the PDF of from three possible cases. First, if both points are within fully ionized regions, then is identically zero, so this case contributes a δ D -function at 0, containing the total probability that both regions are ionized. This probability is given by the quantity F > (ν, ν, μ, μ, S res , S res , ξ res (d)) in equation (40) in Barkana (2007), where ξ is the effective real-space cross-correlation between the densities of the two regions [see section 4.1 of Barkana (2007)]: where W(x) is the Fourier transform of a spherical top-hat window function. The second case is where one of the regions is fully ionized, so, e.g. we assume that region 2 is in an H II bubble while region 1 is not, and then double the contribution in order to include the symmetric, opposite situation. In this case, = where for each value of δ 1 we use x n 1 =x n (z, S min , δ 1 , S res ) from equation (6). We then distribute the total probability of this case into various values of using the conservation of probability, i.e. for each possible value of δ 1 , the probability f [δ| ] (ν, ν, μ, μ, δ 1 , S res , S res , ξ res (d)) d δ 1 [equation (43) in Barkana (2007)] contributes to P ( ) d at the value of that corresponds to δ 1 . The final case is where both regions are not fully ionized. In this case, = | 1 − 2 | is a function of δ 1 and δ 2 , and the conservation of probability turns Q(ν, ν, μ, μ, δ 1 , δ 2 , S res , S res , ξ res (d)) dδ 1 dδ 2 [equation (36) in Barkana (2007)] into P ( ) d at the appropriate value of . The variance of the 21-cm PDF (for two points separated by a distance d) is where the ordinary 21-cm correlation function isξ T b,2 , and theξ notation denotes averaging on the resolution scale r res . The variance can be calculated from the PDF using one of these expressions: where we integrated by parts to get the second expression.
To calculate the correlation function of T b (or, equivalently, of ) without first calculating the PDF, we can calculate various expectation values using the Barkana (2007) solution, once again generalized to include a resolution/smoothing length. First, the mean ionized fraction in the model is where F >,lin and Q lin are given in section 3 of Barkana (2007), while the mean at a point is For two points separated by a distance d, which is a generalization of equation (49) of Barkana (2007) and reduces to that equation in the limit S res → S min . Also, in the limit d → 0 equation (17)

The full 21-cm difference PDF
In this subsection we use our model to predict the 21-cm difference PDF, plot it in full as a two-dimensional function of the separation d and the brightness temperature difference T b of the two points, and explore its dependence on a number of the input parameters (including the redshift and the resolution scale). In the following subsection we then show that important information can be extracted using gross features of the PDF that are insensitive to its detailed shape; Figure 1. The one-point PDF and CPD of the 21-cm brightness temperature (panels on the left-hand side), and the two-point difference PDF and CPD (panels on the right-hand side). We consider redshift z = 8.26, when the mean global ionized fraction is x i = 0.5. In our calculations we assume a constant ionization efficiency in all haloes in which atomic cooling occurs (i.e. in all haloes above a circular velocity V c = 16.5 km s −1 ), set so that reionization ends at z = 6.5. The assumed resolution (i.e. the diameter of each region) is 3 arcmin in all panels. Note that in each case, the PDF has an additional δ D -function (not shown) that brings the value of the corresponding CPD to unity at T b = 0 (or T b = 0). Also, the PDF becomes singular at the maximum value of T b (or T b ). For the two-point distributions, the full PDF or CPD (solid curve) is the sum of contributions from both regions being neutral (short-dashed curve) and from one of them being neutral (longdashed curve); the comoving separation between the two centres is set equal to 10 Mpc.
in particular, this information can be used to cleanly separate out and measure statistics of the ionization field that otherwise would be mixed in and convolved with the density field within the usually considered two-point correlation function. Throughout this section, we illustrate our predictions in a CDM universe that includes dark matter, baryons, radiation, and a cosmological constant. We assume cosmological parameters that match the three-year Wilkinson Microwave Anisotropy Probe (WMAP) data together with weak lensing observations (Spergel et al. 2007), namely m = 0.299, = 0.701, b = 0.0478, h = 0.687, n = 0.953 and σ 8 = 0.826. Fig. 1 shows an example of the 21-cm one-point PDF and CPD and the two-point 21-cm difference PDF and CPD. For the one-point function, there are two separate contributions; the case of a partly neutral region is shown as a function of T b , while the case where the region is ionized contributes an additional δ D -function to the PDF, or equivalently a step function to the CPD. In the CPD the size of this step function can be easily read off as the additional value needed to bring it up to unity at T b = 0. Similarly, for the difference PDF and CPD there are three separate contributions; two of them -the cases of both regions being partly neutral or just one of them -are shown as functions of T b , while the third case -with both regions fully ionized -contributes an additional δ D -function to the PDF. Again, in the CPD the size of the step function equals the additional value needed to bring the CPD up to unity at T b = 0. Another advantage of the one-point or two-point CPDs, pointed out by Furlanetto et al. We show the value of as given by equation (9) (solid curves) and of the mean neutral fraction x n in the region as given by equation (6) (dashed curves). We consider redshifts when the mean global ionized fraction is x i = 0.1, 0.3, 0.5, 0.7 or 0.9 (from right-to left-hand side). Our calculations assume a constant ionization efficiency in all haloes in which atomic cooling occurs, set so that reionization ends at z = 6.5. The assumed resolution (i.e. the diameter of the region) is 1 or 5 arcmin as indicated. Note that the range where δ L < −1 is physically sensible since it corresponds to a δ NL > −1.
(2004) in the one-point case, is that the PDF becomes singular (in the model) at the maximum value of T b while the CPD does not diverge. Thus, for these two reasons, we henceforth prefer to plot the CPD instead of the PDF.
The model predicts characteristic shapes for the PDF and CPD during reionization. In particular, the one-point function cuts off at some maximum value of T b (which corresponds in this case to max = 0.74), and has most of the probability near the cut-off. The reason for this behaviour is shown explicitly in Fig. 2. Except very early in reionization, a small (or even somewhat negative) value of overdensity suffices in order to significantly ionize a region. In particular, if we increase the overdensity of a region in the model, eventually we reach a large-enough overdensity at which the region fully reionizes itself and drops to zero. Thus it is not possible to have arbitrarily large values of . In practice, reaches a maximum at δ L (z) values near zero. Two factors then ensure that most of the probability of values is located around this maximum value. First, the function versus δ L (z) is flat near its maximum (particularly in the later stages of reionization), and secondly, the probability distribution of δ L (z) is centred at values of δ L (z) near zero. Note that we assume a Gaussian probability distribution for δ L (z), although the density is weakly non-linear and its distribution will thus be slightly modified. For instance, the standard deviation of δ L (z) in these examples is ∼0.3 for a 1-arcmin resolution, and half that for a 5-arcmin resolution. Since reionization is driven by the distribution of haloes, and the halo number density is strongly coupled to the mean density in each region, we expect the functional form of versus δ L (z) to be fairly robust. This means that the shape of the PDF will also be fairly robust even if the probability distribution of density becomes slightly non-Gaussian.
Returning to Fig. 1, we see that in the plotted case, the contribution of the one-neutral-one-ionized case to the difference PDF is similar (though not identical) in shape to the one-point PDF, and in particular is centred near the maximum value of T b . This results from the fact that in this case, the value of T b is simply equal to T b for the region that is not fully ionized. Note that in general the maximum value of T b for the difference PDF is equal to the maximum value of T b for the one-point PDF, since is always non-negative. The both-neutral contribution to the difference PDF is quite different, since in each of the two regions tends toward max , so the most likely difference between the two values is zero. This tendency is further strengthened by the correlation between the densities in the two regions. The outcome of all of this is a difference PDF that has two peaks, with a valley at intermediate values of T b . As a result, the 21-cm difference CPD first declines at small values of T b (where it is dominated by the both-neutral case), then flattens at larger values, and finally cuts off sharply at the maximum value of T b .
The bottom panels of Fig. 3 again show the full CPD (with the non-linear correction of the density), but also compare it to the cumulative of a Gaussian (i.e. an error function) with the same variance. The CPD shape described above is clearly very different from the error function. As illustrated by the top panels, the nonlinear correction that we have applied to the density is important in order to ensure that is always positive and that there is a sharp cut-off at a maximum value of T b (while a calculation with linear densities leads to the unphysical result of having negative values of when the density fluctuation is more negative than −1). Other Figure 3. The CPD of the 21-cm brightness temperature difference between two spherical regions. We consider redshift z = 8.26, when the mean global ionized fraction is x i = 0.5. We compare the full CPD (solid curve), which is the sum of contributions from both regions being neutral (short-dashed curve) and from one of them being neutral (long-dashed curve), to the CPD of a Gaussian with the same variance as the total CPD (dotted curve). The comoving separation between the two centres is 10 Mpc, and the assumed resolution is 1 or 5 arcmin, as indicated. In each case, we show the purely linear calculation (top panels) or the full calculation with a non-linear correction of the density (bottom panels). Note that in each panel, the CPD has an additional step function (not shown) that brings its value to unity at T b = 0. In our calculations we assume a constant ionization efficiency in all haloes in which atomic cooling occurs (i.e. in all haloes above a circular velocity V c = 16.5 km s −1 ), set so that reionization ends at z = 6.5.

Figure 4.
The CPD of the 21-cm brightness temperature difference between two spherical regions. We consider redshifts when the mean global ionized fraction is x i = 0.1 (dashed curve) or x i = 0.3, 0.5, 0.7 or 0.9 (solid curves, from right-to left-hand side where the CPD is small). The comoving separation between the two centres is d = 10 or 30 Mpc, and the assumed resolution is 1 or 5 arcmin, as indicated. In the bottom left-hand panel, we also compare the CPD curves to CPDs of Gaussians with the same variance (dotted curves, in order from right-to left-hand side: x i = 0.5, 0.3, 0.7, 0.1 and 0.9). Note that the CPD has an additional step function that brings its value to unity at T b = 0. In our calculations we assume a constant ionization efficiency in all haloes in which atomic cooling occurs, set so that reionization ends at z = 6.5. than this cut-off, the non-linear correction modifies the shape of the CPD only slightly, with the correction having a smaller effect in the case where the resolution angle is larger (and where fluctuations on the corresponding scale are more linear). Thus, while a non-linear correction of the density is required to ensure a physical result (with a non-negative density), we do not expect our results to depend strongly on the precise form of non-linear correction that we have used. Fig. 4 shows the time evolution of the CPD during reionization, considered at two different values of the separation d. Throughout the parameter range considered, the CPD clearly has the same characteristic shape as noted above (although for x i = 0.1 in the top panels, the flat portions occur at lower values of the CPD than are included in the plot). In the bottom left-hand panel, we show an example of two cases ( x i = 0.3 and 0.5) which have nearly identical variances (i.e. the corresponding Gaussian CPD curves are nearly indistinguishable), and yet the actual 21-cm difference CPDs differ substantially in these two cases. The figure illustrates how the CPD evolves during reionization, declining with time at low T b values (since the probability associated with both regions being fully ionized increases), and cutting off at a lower value of T b (since the overdensity needed for full reionization of a given region declines as reionization progresses globally -see Fig. 2).
The information on spatial correlations contained within the CPD is illustrated more clearly in Fig. 5. In most of the cases shown, the d = 30 and 100 Mpc curves are nearly indistinguishable, since even at a 30 Mpc separation the two regions are nearly independent. The CPD drops rapidly as the separation is decreased, with the probability becoming concentrated near T b = 0 once the two regions Figure 5. The CPD of the 21-cm brightness temperature difference between two spherical regions. We consider redshifts when the mean global ionized fraction is x i = 0.25, 0.5 or 0.75, as indicated. We assume a resolution of 1 arcmin (solid curves) or 5 arcmin (dashed curves), at a comoving separation d = 100, 30, 10, 3 and 1 Mpc (from right-to left-hand side in each set of curves). Note that the CPD has an additional step function that brings its value to unity at T b = 0. In our calculations we assume a constant ionization efficiency in all haloes in which atomic cooling occurs, set so that reionization ends at z = 6.5. become highly correlated. The decline with separation, which occurs at d = 1-10 Mpc early in reionization ( x i = 0.25) but over a broader range of d = 1-30 Mpc later on ( x i = 0.75), indicates the relative importance of bubble and density correlations on various scales.
A full measurement of the CPD would yield a two-dimensional function of d and T b at each redshift. This full function is illustrated with a contour plot in Fig. 6. Regions that contain much of the probability -i.e. where the CPD changes rapidly and there are large spaces between consecutive contours -indicate both the characteristic scale d of correlations and a corresponding characteristic value of T b (which is related to the correlated distribution of densities in the two separated regions).

Observational considerations
The results shown in the previous section indicate that the full 21-cm difference PDF would be a great tool to study theoretically and to observe. However, before proceeding we briefly consider the feasibility of measuring this quantity observationally.
Measurements of the cosmological 21-cm signal must overcome foregrounds that are brighter than the signal by orders of magnitude. These foregrounds, which include Galactic synchrotron and freefree emission as well as extragalactic radio sources, are expected to be spectrally smooth, in contrast to the signal during cosmic reionization. This difference in character is expected to allow the foregrounds to be cleaned effectively in 21-cm maps (e.g. McQuinn et al. 2006;Furlanetto et al. 2006b;Bowman et al. 2007). Thus, the major difficulty is expected to be the noise in the telescope, which is mainly determined by the high brightness temperature of the sky (which is dominated by the Galactic emission). This noise has a standard deviation given by the radiometer equation (e.g. Furlanetto Figure 6. Contour plot of the CPD of the 21-cm brightness temperature difference between two spherical regions. Same data as represented in the previous figure, but in a contour plot which emphasizes the fact that a twodimensional data set, C T b as a function of T b and d, is available to measure at each redshift. We again consider x i = 0.25, 0.5 or 0.75, as indicated. Contours go from a value C T b = 1 (short-dashed curve) to C T b = 0.01 (long-dashed curve), passing through C T b = 0.1 (dotted curve). Equal logarithmic spacing of the values of the CPD is used within each decade, with finer spacing at the highest decade of CPD values (13 contours total in each panel). We show the case of a 1-arcmin resolution, and again assume a constant ionization efficiency in all haloes in which atomic cooling occurs, set so that reionization ends at z = 6.5. et al. 2006b), and is expected to have a Gaussian distribution, although since the signal during reionization is highly non-Gaussian, it is sufficient for the noise PDF to be separately measurable and it need not be precisely Gaussian. Assuming night-time observations in high-Galactic-latitude, relatively radio-quiet portions of the sky, the expected noise per pixel for LOFAR is σ N ∼ 24 mK at z = 8.26, assuming a 1000-h integration with an effective area of 4 × 10 4 m 2 and an angular resolution of 3 arcmin (as well as an equal line-of-sight resolution, corresponding to a frequency resolution ν ∼ 0.5 MHz) (e.g. Furlanetto et al. 2006b).
Since each pixel comes with random noise, the observed onepoint PDF is the theoretical one convolved with a Gaussian with a 24 mK standard deviation. Since the difference PDF receives a contribution from the difference between the noise of two pixels, the observed difference PDF is the theoretical one convolved with a Gaussian with a 34 mK (= √ 2×24 mK) standard deviation. In Fig. 7 we show the observed one-point and difference PDF, corresponding to the convolution of the functions in Fig. 1 with measurement noise. The noise is clearly large and has a major effect on both the one-point and difference PDF, turning each of them into something resembling a single Gaussian function. However, they are each composed of two or three separate, nearly Gaussian contributions, which are significantly separated in their means. This separation is equal to the maximum value of T b (or T b ), 17.3 mK in this case, which is comparable to the standard deviation of the noise and thus should be observable.
Quantitatively, we can estimate the observational feasibility of telling the difference between the actual PDF and a single Gaussian. We compare each of the PDFs shown in Fig. 7 to a Gaussian  Fig. 1 for a 3-arcmin resolution at z = 8.26 are here shown convolved with the telescope noise expected in observations with LOFAR (see text). The convolved one-point PDF (solid curve) is the sum of the convolutions of the PDF arising from partially neutral pixels (dashed curve) and of the δ D -function at T b = 0 arising from fully ionized pixels (dotted curve). The convolved difference PDF (solid curve) is the sum of the convolutions of the PDF arising from both pixels being partially neutral (short-dashed curve) or from exactly one of them being partially neutral (long-dashed curve) and of the δ D -function at T b = 0 arising from both pixels being fully ionized (dotted curve). Note that we use the LOFAR parameters for this illustration, although the MWA seems to offer even better prospects for the difference PDF, due to its larger field of view and thus larger number of pixels (though larger noise per pixel). of equal mean and variance (and similarly normalized to unity). Integrating the absolute value of the difference between the actual curve and a single Gaussian, we find that this difference covers an overall dimensionless area of ∼1 per cent in this case. This suggests that if ∼10 4 data points are observed for the PDF then its detailed shape will be reconstructed to better than 1 per cent, allowing us to probe the difference between the PDF shape and the Gaussian. This is rather promising, since with a field of view of ∼10 • , LOFAR will have ∼10 5 pixels per field. Indeed, this feasibility estimate is overly pessimistic, since if the noise is separately known to have a standard deviation of 24 mK, the single-Gaussian standard deviation of 25 mK (which would be measured to better than 0.1 mK with 10 5 pixel) would further help to reconstruct the PDF. For the difference PDF, the number of pixel pairs will be ∼10 10 , to be divided among different bins of pixel separation. Note that the MWA is expected to have ∼3 times larger noise per pixel but about 10 times more pixels per field than LOFAR (e.g. Hansen, Oh & Furlanetto, in preparation;Furlanetto et al. 2006b), which will be particularly advantageous for measuring the difference PDF.
Even if the shape of the observed PDF is measured relatively well, it will still be difficult to reconstruct the detailed shape of the underlying, theoretical PDF. This is because the convolution with the broad Gaussian noise washes out much of the detailed shape of the PDF. Thus, in the following section we study what information can be extracted just from the gross features of the PDF, which seem likely to be measurable from the upcoming observations. Indeed, preliminary studies using a detailed maximum likelihood analysis indicate that the cosmic mean ionized fraction at each redshift can be reconstructed from the observed one-point PDF (Hansen et al., in preparation). Since the number of points available for measuring the difference PDF is much larger (as all pixel pairs contribute), we expect this to be significantly easier than for the one-point PDF.

Using the difference PDF to separately measure ionization statistics
Further study may show that the detailed shape of the difference PDF can be used as a sensitive probe of the underlying astrophysics (such as properties of the population of ionizing sources). However, given the limited signal-to-noise ratio expected to be available in observational measurements of the difference PDF (see the previous subsection), it is important to consider robust ways to extract significant information even from rough measurements of the onepoint PDF and the two-point, difference PDF. In this subsection we show that even if the gross features of the PDF were measured, they already would reveal important information that is not directly available from measurements of the correlation function alone.
Another advantage of this approach is that it is more robust with respect to the theory. The analytical model we consider in this paper is approximate and neglects some non-linear corrections and other physical effects, so the precise PDF shape would change somewhat in more complete models or numerical simulations. However, we expect our model to correctly capture the gross features of the PDF, which likely constitute more robust predictions.
We first consider the one-point PDF. Two gross quantities can be simply extracted from it: the probability P(x i = 1) that a region of the resolution size is fully ionized, and the value of max . The first quantity can be extracted by measuring the size of the step function of the CPD, or (equivalently) by subtracting from unity the value of the CPD just above T b = 0; alternatively, this value equals the integrated area under the PDF curve, not including the δ D -function at zero. Thus, an observational measurement depends on separating the two contributions centred on different values of T b (see also the previous subsection). Such a measurement will also yield the value of max ; this value can be derived from the maximum value of T b using equation (7). Note that while a fully realistic PDF may not feature such a total sudden drop at a maximum value of T b , there should nonetheless be a definite, sharp drop due to the strong dependence of ionization on density through variations in the halo number density. Note also that in simulations, while a region cannot be truly fully ionized (because of the presence of very highdensity gas), the interpretation of P(x i = 1) in this case is where the bubbles within the region have fully overlapped and all the gas is highly ionized except for some gas at δ 1 (which at high redshift generally makes up only a small volume and mass fraction). Indeed, in simulations by Mellema et al. (2006) the PDF as a function of T b has a fairly flat portion (though not always increasing) followed by a rather sharp cut-off. We note that the shape of such statistics as measured in simulations has not yet been physically justified or been subjected to numerical convergence tests. But it is generically expected that the two quantities P(x i = 1) and max should clearly feature in the PDF. If models or simulations can reliably establish at least the approximate form of the PDF, then this would make it easier to measure these two quantities. Now consider the difference PDF (see Fig. 1). One gross feature is obviously a cut-off that can also be used to measure the same value of max . The difference PDF can also be used to extract three other interrelated quantities, each as a function of the separation d: the probability P(x i 1 = 1, x i 2 = 1) of joint full ionization; the probability 2P(x i 1 = 1, x i 2 < 1) of full ionization of only one of the two regions (with the factor of 2 accounting for the symmetry of selecting which region is the ionized one); and the probability P(x i 1 < 1, x i 2 < 1) that neither region is fully ionized. The first probability can be extracted from the size of the step function of the CPD at T b = 0, while the other two can be extracted from the areas under the two peaks that are fairly well separated in the PDF during reionization (see Fig. 1), or more accurately by modelling the two separate contributions to the PDF at T b > 0. In reality, the three probabilities need not be measured separately, since they can easily be shown to be closely related to each other; in fact, P(x i 1 = 1, x i 2 = 1) together with the one-point quantity P(x i 1 = 1) can be used to express the other two two-point probabilities: and These relations should make it much easier to extract this information from even approximate measurements of the difference PDF.
Thus, from the one-point and two-point PDF, we can measure two independent probabilities that depend directly only on ionization statistics, not mixed in with the value of the density. These are P(x i = 1) (a single quantity at each redshift) and P(x i 1 = 1, x i 2 = 1) (a function of d at each redshift). Even a rough measurement of the PDFs may yield a reasonable estimate of these gross quantities. Note also that in the limit of infinitely good resolution (i.e. a very small resolution scale), P(x i = 1) becomes the cosmic mean ionized fraction, and P(x i 1 = 1, x i 2 = 1) becomes the ionization correlation function (after subtracting the square of the mean ionized fraction). In addition, the value of max yields an interesting piece of information on the dependence of ionization on the density. All of these quantities are separate from the correlation function, which yields just one function of d that is a complicated convolution of fluctuations in density and ionization. Fig. 8 shows predictions of our model for the two quantities P(x i = 1) and max that are available from the one-point PDF. P(x i = 1) can be used as a rough estimate for the cosmic mean x i , although this works better with high resolution (1 arcmin) and late in reionization. The robustness of theoretical predictions of the relation between P(x i = 1) and x i can be investigated with further models and simulations. Even for a fixed end-of-reionization redshift and when expressed as functions of x i , both P(x i = 1) and max depend significantly on the characteristic halo mass of ionizing sources. Fig. 9 shows the three interrelated probabilities obtainable from the difference PDF, compared to the expectation value 1 2 . The standard 21-cm two-point correlation function does not actually yield 1 2 but rather subtracts off 2 , which is the value of 1 2 at d → ∞. Thus, in the figures the information available from the two-point correlation function is not the absolute plotted values of 1 2 , but just the values relative to the large-scale asymptotic value. Therefore, the ability to measure the two-point correlation function and extract useful information from it depends on the total change in 1 2 from small to large scales. This total change is smaller than the overall change with scale of most of the curves that show the ionization probabilities. The probability P(x i 1 = 1, x i 2 = 1) goes from P(x i 1 = 1) at d → 0 to [P(x i 1 = 1)] 2 at d → ∞, while P(x i 1 < 1, x i 2 < 1) goes from 1 − P( The largest change is seen in the probability that one Assuming parameters for which the universe fully reionizes at z = 6.5, we consider stars forming (with a constant ionization efficiency) in all haloes above V c = 16.5 km s −1 (corresponds to efficient atomic cooling) or only in haloes above V c = 45 km s −1 (corresponds to strong feedback in low-mass haloes, e.g. due to photoheating or supernovae). In each panel, we show P(x i = 1) for V c = 16.5 km s −1 (solid curve) or V c = 45 km s −1 (dot-dashed curve), compared to the cosmic mean x i (straight dotted line). We also show max for V c = 16.5 km s −1 (short-dashed curve) or V c = 45 km s −1 (long-dashed curve).

Figure 9.
Gross observables available from the 21-cm two-point difference PDF, shown versus the separation of the two points. Assuming parameters for which the universe fully reionizes at z = 6.5, we consider stars forming with a constant ionization efficiency in all haloes above V c = 16.5 km s −1 . For the resolution we assume a value of 1 arcmin. We consider at various stages during reionization ( x i = 0.1, 0.3, 0.5, 0.7 or 0.9), the standard quantity 1 2 from the 21-cm two-point correlation function (dotted curves, from top to bottom) as well as the three interrelated gross quantities from the difference PDF (which together add up to unity): P(x i 1 = 1, x i 2 = 1) (solid curves, from bottom to top), 2 P(x i 1 = 1, x i 2 < 1) (long-dashed curves; at d = 100 Mpc, x i = 0.1, 0.9, 0.3, 0.5 and 0.7, from bottom to top), and P(x i 1 < 1, x i 2 < 1) (short-dashed curves, from top to bottom).

Figure 10.
Derivatives of the 21-cm two-point quantities. We show at various stages during reionization |d/d log (d)| of P(x i 1 = 1, x i 2 = 1) (solid curves) and of 1 2 (dotted curves). For each quantity, we consider various stages during reionization ( x i = 0.1, 0.3, 0.5, 0.7 and 0.9, in order from left-to right-hand side at the peak of the curve).
region is ionized and the other is not; this varies from zero at small The characteristic scales of the bubble correlations can be read off as the scales where the various quantities in Fig. 9 change most rapidly as a function of d. In order to focus on this important feature, we plot the derivatives with respect to log (d) in Fig. 10, just for 1 2 and for P(x i 1 = 1, x i 2 = 1) (since the other two probabilities can be derived from this quantity). In general, both P(x i 1 = 1, x i 2 = 1) and 1 2 show roughly the same dominant scales, but P(x i 1 = 1, x i 2 = 1) shows a greater variation with scale during the central stages of reionization at which the variation is maximized. Fig. 11 shows similar trends in the case of reionization by more massive and highly biased haloes, except that here the characteristic scale grows to even Figure 11. Same as Fig. 10, except that we assume stars form with a constant ionization efficiency in all haloes above V c = 45 km s −1 . Figure 12. Same as Fig. 10, but for the case of a 5-arcmin resolution. In this case, the x i = 0.1 case cannot be seen in P(x i 1 = 1, x i 2 = 1) (which is essentially zero in this case, at all d). Here, the order of increasing peak heights (not positions) is x i = 0.1, 0.3, 0.9, 0.5 and 0.7, for 1 2 , and x i = 0.3, 0.5, 0.9 and 0.7 for P(x i 1 = 1, x i 2 = 1).
larger values by the end of reionization; early on in reionization, the characteristic scale is approximately the same as in the previous case of less massive haloes, but the magnitudes of the scale derivatives are larger in the case of the more massive haloes (corresponding to stronger ionization and 21-cm fluctuations). Fig. 12 demonstrates the importance of achieving high resolution in the upcoming experiments. In the scenario considered here, a cosmic mean ionization fraction of one half occurs at z = 8.26, at which the resolution of 1 arcmin corresponds to a radius of 1.3 comoving Mpc (com Mpc), and 5 arcmin to 6.7 com Mpc. The typical correlation scale grows with time and overtakes the 5 arcmin scale only late in reionization ( x i ∼ 0.7), so until then the peaks of the curves in Fig. 12 indicate roughly the resolution scale (rather than the desired correlation scale). Observations should therefore achieve a resolution of a few arcminutes or better in order to make it possible to measure the evolution of the dominant correlation scale throughout most of the reionization era.

C O N C L U S I O N S
We have presented and studied a new statistic for analysing 21-cm fluctuations, namely the PDF of the difference of the 21-cm brightness temperatures of two regions, as a function of the separation between their centres. This two-dimensional statistic generalizes two one-dimensional functions, the one-point PDF and the two-point correlation function; the latter is simply related to the variance of the difference PDF (equation 13).
We have predicted the difference PDF based on the correlated two-point distribution of density and ionization (Barkana 2007), generalized to the case of a finite observed resolution, and including a non-linear correction for the density (equation 8). The model predicts a characteristic shape during reionization for the PDF (or its cumulative form, the CPD) that can be understood from the various contributions to it depicted in Fig. 1. The PDF contains information on the distribution of the density within neutral regions and on the dominant spatial scales of bubble and density correlations (Figs 4 and 5). The full PDF is a function of separation and of T b at each redshift (Fig. 6).
While the usual correlation function is determined by a complicated mixture of contributions from density and ionization fluctuations, we have shown that the difference PDF (together with the onepoint PDF) holds the key to separately measuring statistics of the ionization distribution. In particular, even an approximate measurement of the PDFs (which is observationally feasible: Section 3.2) can generically be used to measure the ionization probability of a resolution-sized region, and the joint ionization probability of two such regions as a function of their separation. Within our model, the joint ionization probability shows the same characteristic correlation scale as the two-point 21-cm correlation function but has more power (Figs 10 and 11); this is because the contributions of density and ionization to 21-cm fluctuations are anticorrelated (a higher density implies less neutral gas), which reduces the 21-cm fluctuations relative to ionization fluctuations.
If quasars contribute significantly to reionization by producing large bubbles even early in reionization, or if quasars or supernovae emit significant X-ray photons (which have a long mean free path), then the density and ionization fluctuations will not be as simply related as we have assumed. In this case, it will be much more important to measure the ionization statistics separately since it will be difficult to extract this information from the 21-cm statistics. Also in this case the relation between the one-point and difference PDFs and the 21-cm correlation function should be significantly different from the case we have considered. The full quantitative details of such a scenario go beyond the scope of this paper and merit a separate study.