High-resolution imaging of the cosmic mass distribution from gravitational lensing of pre-galactic H I

Low-frequency radio observations of neutral hydrogen during and before the epoch of cosmic re-ionization will provide ∼1000 quasi-independent source planes, each of precisely known redshift, if a resolution of ∼1 arcmin or better can be attained. These planes can be used to reconstruct the projected mass distribution of foreground material. Structure in these source planes is linear and Gaussian at high redshift (30 < z < 300) but is non-linear and non-Gaussian during re-ionization. At both epochs, significant power is expected down to subarcsecond scales. We demonstrate that this structure can, in principle, be used to make mass images with a formal signal-to-noise ratio (S/N) per pixel exceeding 10, even for pixels as small as an arcsecond. With an ideal telescope, both resolution and S/N can exceed those of even the most optimistic idealized mass maps from galaxy lensing by more than an order of magnitude. Individual dark haloes similar in mass to that of the Milky Way could be imaged with high S/N out to z ∼ 10. Even with a much less ambitious telescope, a wide-area survey of 21-cm lensing would provide very sensitive constraints on cosmological parameters, in particular on dark energy. These are up to 20 times tighter than the constraints obtainable from comparably sized, very deep surveys of galaxy lensing, although the best constraints come from combining data of the two types. Any radio telescope capable of mapping the 21-cm brightness temperature with good frequency resolution (∼0.05 MHz) over a band of width ≳ 10 MHz should be able to make mass maps of high quality. The planned Square Kilometre Array may be able to map the mass with moderate S/N down to arcminute scales, depending on the re-ionization history of the Universe and the ability to subtract foreground sources.


INTRODUCTION
Dark matter appears to be the dominant component of all structures larger than individual galaxies. In the standard paradigm its gravitational effects drive the linear growth and the subsequent non-linear collapse of the fluctuations detected at z ∼ 1000 in the cosmic microwave background (CMB). Our inability to 'see' the dark matter, and so to image its distribution, has prevented a definitive observational verification of this paradigm. Simulations of structure formation predict all galaxies and galaxy clusters to sit within extended dark haloes with regular and well-specified structural properties, but it has proved difficult to test these predictions convincingly. As first demonstrated by Kaiser & Squires (1993), the distortion of the images of distant objects caused by gravitational lensing can be used to reconstruct an image of the foreground mass distribution. All successful applications so far have used distant galaxies as the sources. The resolution and signal-to-noise ratio (S/N) of the resulting maps are fundamentally limited by the abundance and intrinsic ellipticity of these sources. Even with deep satellite data the effective density of usable galaxies does not exceed about 100 arcmin⁻². For a map with 1 arcmin pixels this corresponds to an S/N per pixel (the ratio of the rms expected physical fluctuation to the rms noise fluctuation) of about 0.75; for 10 arcmin pixels this ratio is about 2.5. As a result, only the centres of the most massive galaxy clusters can be detected at high S/N in galaxy-based mass maps. In this paper we show that much higher resolution and effective S/N can, in principle, be achieved by using high-redshift neutral hydrogen as the source, rather than galaxies.
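The shape-noise arithmetic behind these numbers can be reproduced with a short script. This is an illustrative sketch, not part of the analysis: the intrinsic ellipticity dispersion σ_ε = 0.3 is an assumed, typical value, and the 'signal' values shown are simply those implied by the quoted S/N figures.

```python
import math

def shear_noise_per_pixel(n_gal_per_arcmin2, pixel_area_arcmin2, sigma_eps=0.3):
    """rms shear noise in one pixel from galaxy shape noise:
    sigma_eps / sqrt(number of galaxies in the pixel).
    sigma_eps = 0.3 is an assumed intrinsic-ellipticity dispersion."""
    n_gal = n_gal_per_arcmin2 * pixel_area_arcmin2
    return sigma_eps / math.sqrt(n_gal)

# 100 galaxies per arcmin^2, as quoted for deep satellite data.
noise_1am = shear_noise_per_pixel(100.0, 1.0)      # 1 arcmin pixel -> 0.03
noise_10am = shear_noise_per_pixel(100.0, 100.0)   # 10 arcmin pixel -> 0.003

# rms signal implied by the quoted S/N per pixel (0.75 and 2.5):
signal_1am = 0.75 * noise_1am     # ~0.0225
signal_10am = 2.5 * noise_10am    # ~0.0075
```

The noise falls as the inverse of the pixel side length, so larger pixels trade resolution for S/N, which is the limitation the 21-cm method is intended to overcome.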
There has been a great deal of interest in the possibility of observing the hyperfine transition of hydrogen (the 21-cm line) from the intergalactic (or pre-galactic) medium at high redshift (see Furlanetto, Oh & Briggs 2006 for an extensive review). There are two essentially disjoint epochs from which 21-cm radiation should be observable. At a redshift of z ∼ 300, neutral hydrogen (H I) became thermally decoupled from the CMB, and the gas kinetic temperature then fell below the CMB temperature, T_r, because of their different adiabatic cooling laws. For a while the spin temperature, T_s, remained coupled to the kinetic temperature by atomic collisions, but at z ∼ 30 the collision rate became so low that the spin temperature decoupled from the kinetic temperature and returned to equilibrium with the CMB. During the period 30 < z < 300, T_s was below T_r and there was net absorption of CMB photons through the 21-cm line. The observable quantity is the brightness temperature, T_b = (T_s − T_r)(1 − e^{−τ}) ≈ (T_s − T_r)τ, which depends on the optical depth, τ, which is in turn proportional to the density of H I. A map of T_b on the sky and in frequency would thus be a 3D map of the H I density, which is directly proportional to the mass density at these redshifts. The physics during this epoch of 21-cm absorption is simple, and predictions within the standard cosmogony are straightforward and robust.
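The optically thin approximation for T_b can be checked numerically. This is a minimal sketch; the temperatures and optical depth below are illustrative values, not predictions.

```python
import math

def t_b(t_s, t_r, tau):
    """Exact 21-cm brightness temperature relative to the CMB."""
    return (t_s - t_r) * (1.0 - math.exp(-tau))

def t_b_thin(t_s, t_r, tau):
    """Optically thin limit, T_b ~ (T_s - T_r) * tau, valid for tau << 1."""
    return (t_s - t_r) * tau

# Illustrative values for the absorption epoch (T_s below T_r), in kelvin;
# tau = 0.01 is an assumed small optical depth.
t_s, t_r, tau = 20.0, 60.0, 0.01
exact = t_b(t_s, t_r, tau)     # negative: net absorption against the CMB
approx = t_b_thin(t_s, t_r, tau)
```

For τ = 0.01 the two expressions agree to about half a per cent, and both are negative, i.e. the line appears in absorption, as described above.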
The second epoch with observable 21-cm effects is considerably less well characterized. It is known that almost all intergalactic hydrogen at z < 6.5 is ionized, and it is believed that radiation from the first generation of stars and/or quasars caused this re-ionization between z ∼ 6.5 and 30. The latest CMB constraints give 8.5 < z_reion < 22 at 68 per cent confidence (Spergel et al. 2006). A variety of mechanisms will transfer energy from X-ray and/or Lyα radiation to the H I gas during re-ionization, thereby raising T_s above the CMB temperature and making the 21-cm line visible in emission. After re-ionization is complete, too little H I is left to be observable. The mean free path for X-rays through the neutral intergalactic medium at z < 15 can exceed the Hubble length, so the spin temperature for much of the H I could have been raised uniformly before significant re-ionization occurred. Lyman-continuum radiation is expected to produce ionized bubbles that expand until they overlap; re-ionization finally completes as the last interbubble clumps are evaporated. During this period, Lyα radiation passes freely through ionized regions but is resonantly scattered in neutral regions, thereby raising their spin temperature and producing 21-cm emission. How rapid and inhomogeneous this process was is highly uncertain and is likely to remain so until it is directly measured. It is also possible that shock heating of H I gas during the collapse of pre-galactic objects could raise T_s enough for 21-cm emission to be visible before re-ionization begins (Kuhlen, Madau & Montgomery 2006).
Gravitational lensing distorts our image of the 21-cm emission and absorption by moving the angular positions of points on the sky while keeping the associated surface brightness (and thus brightness temperature) unchanged. For this reason a smooth background radiation field is unaffected by lensing. The observed map of brightness temperature thus reflects both the intrinsic structure of the fluctuations and the lensing distortions. To separate the two, we use the fact that in a given direction the intrinsic structure of maps at sufficiently separated frequencies (hence redshifts) will be statistically uncorrelated, while the foreground lensing distribution will be the same. Below we show how the maps can be combined so as to average out the intrinsic temperature fluctuations while preserving the lensing signal. In essence, the gradients of brightness temperature maps at a set of sufficiently well-spaced frequencies are independently and isotropically distributed in the absence of lensing, but display a coherence which is a direct measure of the lensing-induced shear when the foreground mass distribution is taken into account.
Gravitational lensing of pre-galactic 21-cm signals has previously been considered by several authors. In particular, the techniques developed by Hu (2001) for detecting lensing in the CMB have been extended to three dimensions (angle on the sky + redshift of source) and applied to high-redshift 21-cm emission. In retrospect, we find that the Fourier-space version of the method presented here is related to that method (see Appendices B and C for details) and that our method is related to one developed by Seljak & Zaldarriaga (1999) for detecting lensing in the CMB. Cooray (2004) (see also Sigurdson & Cooray 2005) had already discussed applying the original 2D Hu (2001) method to the 21-cm absorption epoch, but this misses the main advantage offered by the radio technique, namely the large number of available quasi-independent source planes. Finally, Pen (2004) discussed measuring gravitational lensing effects in the 21-cm emission by looking for anisotropic effects on the second-order statistics of the brightness fluctuations. This does not estimate the gravitational shear directly by comparing maps at different frequencies in the same direction, and so is much less sensitive than the approaches suggested here.
The 21-cm emission/absorption has two major advantages over the CMB as a background source for lensing studies. First, since lensing conserves surface brightness, it can only redistribute structure that already exists in the source, and the CMB has very little structure on the angular scales where lensing is significant (≲ 1 arcmin), so its lensing effects are very weak. Second, the CMB provides only one temperature field on the sky, while the 21-cm emission/absorption provides many, all of which are lensed by the same foreground mass distribution. Although the CMB comes from higher redshift, this is a relatively minor advantage, since most of the structure detected by lensing is at much smaller redshift than either source.
Our paper is organized as follows. In Section 2 an estimator for the gravitational shear is derived and in Section 3 the noise in that estimator is discussed and quantified using a particular model for correlations in the 21-cm brightness temperature. The expected lensing signal and the size of objects that could be detected are calculated in Section 4. The prospects for measuring cosmological parameters with 21-cm lensing are discussed in Section 5. The observational prospects given currently planned telescope designs are discussed in Section 6. In the appendices several technical issues are addressed and alternative methods for measuring the lensing signal are described.

AN ESTIMATOR FOR THE GRAVITATIONAL SHEAR
The observed deviation in the brightness temperature of the 21-cm emission at a redshift z (or, equivalently, frequency ν) and a point on the sky, θ, will be denoted T(θ, ν). We seek to construct a statistic from this temperature that, when summed over frequency bands, preserves the lensing signal while smoothing out the fluctuations in T(θ, ν). A statistic will have these properties if it has them when averaged over an ensemble of temperature fields at fixed ν while keeping the lensing contribution fixed. All statistics that are first order in T(θ, ν) vanish under this averaging because of isotropy.
We will now show that it is possible to isolate the lensing contribution in the second-order statistics of the gradient of the temperature field, ∇T(θ, ν). The small-angle, or 'flat-sky', approximation will be used throughout this paper and is well justified for the angular scales that are considered. The observed temperature at a point on the sky, θ, is the source temperature at θ′ = θ + α(θ, ν) plus noise, where θ′ is the position on the source plane (what the position would be in the absence of lensing) and α(θ, ν) is the position shift caused by lensing (hereafter the deflection). Thus the observed gradient of the temperature will be

∇_i T(θ, ν) = [δ_ij + α_ij(θ, ν)] ∇_j T̃(θ′, ν) + N_i(θ, ν),

where T̃(θ, ν) is the real, unlensed brightness temperature, N(θ, ν) is the noise in the measured gradient and α_ij ≡ ∂α_i/∂θ_j is the distortion matrix. Repeated indices are summed over. The square of the magnitude of the observed gradient follows by contracting this expression with itself (the ν variables are left out for brevity). The source emission, the deflection and the noise are all statistically independent, so we can consider them separately. Averaging over the source gives

⟨∇_i T̃ ∇_j T̃⟩ = [σ²_∇(ν)/2] δ_ij,

where this defines σ²_∇(ν) and δ_ij is the Kronecker delta. The distortion matrix α_ij can be decomposed into quantities that are commonly used in lensing: the convergence κ, the shear γ and a rotation parameter β. Using this decomposition, the product α_ik(θ, ν) α_jk(θ, ν) corresponds to an explicit matrix in κ, γ and β. The rotation term β comes from coupling between different lens planes and is second order in the surface density. It is expected to be very small in nearly all cases, so we will neglect it in what follows, although its inclusion would be straightforward. Because of isotropy and the requirement that the usual angular size distance be correct on average, we have [α_ij(θ, ν)] = 0, where [·] denotes an average over direction on the sky, and where γ² = γ_1² + γ_2². The second equality follows from the deflection field being a potential field (i.e. assuming β = 0).
We now construct three second-order quantities from the observed temperature gradient: (∇_1T)² − (∇_2T)², 2 ∇_1T ∇_2T and |∇T|², where the indices on the gradient symbols refer to the two axes of the chosen orthogonal coordinate system. For a given direction (and hence deflection field), the averages of these are given in equations (11)-(13), where σ²_N(ν) is the average of |N(θ)|² over random realizations of the noise. If the noise is isotropic it will drop out of both (11) and (12). This can be seen by expressing the noise vector in terms of its magnitude and polar angle and then requiring that the direction be random. However, in general the noise may not be isotropic, so we retain these terms. To lowest order, the first terms in the averages (11) and (12) are proportional to the gravitational shear, and (13) is related to the convergence.
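The logic of these quadratic combinations can be illustrated with a Monte Carlo sketch: draw isotropic Gaussian gradients, distort them with the matrix δ_ij + α_ij, and recover the input parameters from the averages. The input values, sample size and the convention that the trace of α_ij equals 2κ (with β = 0) are assumptions of this illustration, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed input lensing parameters; beta = 0, trace(alpha) = 2*kappa.
kappa, gamma1, gamma2 = 0.03, 0.05, -0.02
alpha = np.array([[kappa + gamma1, gamma2],
                  [gamma2, kappa - gamma1]])

# Isotropic unlensed gradients with <g_i g_j> = (sigma^2 / 2) delta_ij.
sigma = 1.0
g = rng.normal(0.0, sigma / np.sqrt(2.0), size=(400000, 2))

# Lensed gradients: grad T_obs = (I + alpha) grad T (alpha is symmetric here).
G = g @ (np.eye(2) + alpha).T

q1 = G[:, 0]**2 - G[:, 1]**2   # average -> 2 sigma^2 gamma1 (to first order)
q2 = 2.0 * G[:, 0] * G[:, 1]   # average -> 2 sigma^2 gamma2
q3 = G[:, 0]**2 + G[:, 1]**2   # average -> sigma^2 (1 + 2 kappa)

gamma1_hat = q1.mean() / (2.0 * sigma**2)
gamma2_hat = q2.mean() / (2.0 * sigma**2)
kappa_hat = (q3.mean() / sigma**2 - 1.0) / 2.0
```

The recovered values agree with the inputs up to Monte Carlo noise and second-order terms in κ and γ, mirroring the 'to lowest order' caveat above.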
The lensing signal from a single redshift slice will be dominated by noise, so we wish to add up frequency channels to reduce it. The convergence and shear are slowly varying functions of ν at the high redshifts we are considering, so for now we will assume γ(θ, ν) to be independent of ν within the frequency band being used. This suggests estimating the shear at a point on the sky through the weighted sum over frequency channels given in (14). The weights, ω_ν, are normalized so that the mean values are given by (15) and (16); the weights themselves will be determined in the next section. Except along exceptional lines of sight (through the very centres of galaxies and galaxy clusters), κ(θ) is much smaller than 1. As we show explicitly below, the variance σ²_κ is thus small, and (14) in effect provides an unbiased map of γ(θ) all the way back to the beginning of structure formation. We will sometimes refer to γ̂_i(θ) as the shear estimators, even though the third component is an estimator for the convergence, and γ̂_3(θ) will sometimes be written as κ̂(θ).

Instrumental, foreground and irreducible noise
There will be a number of sources of noise in the estimators (14). In particular, there will be noise from the instrumentation, from terrestrial interference and from incomplete subtraction of Galactic and extragalactic foreground emission. This noise is encapsulated in the N(θ, ν) vector field; we will refer to these sources of noise collectively as foreground noise. It is expected that foreground emission can be removed to high accuracy by using the fact that it varies slowly with frequency, whereas the 21-cm emission/absorption signal (and particularly the angular gradient of this signal) decorrelates for even small separations along the line of sight (see Zaldarriaga, Furlanetto & Hernquist 2004; Santos, Cooray & Knox 2005). The removal process could, however, leave noise with correlations in both frequency and position on the sky. For currently planned generations of instruments, this residual is expected to be as small as or smaller than the purely instrumental or thermal noise. The lensing signal is also coherent in frequency, but foreground subtraction will not affect it, because lensing is multiplicative while the foregrounds are additive; see equation (1). Lensing does not cause correlations between frequency channels; it causes spatial correlations within a frequency channel that are the same as in the other channels.
In addition to foreground noise there is noise from the randomness of the ∇T(θ, ν) field itself. Clearly, this cannot be reduced by any improvement in technology or foreground subtraction, so we will refer to it as the irreducible noise. It depends only on the intrinsic correlations in the 21-cm signals and on the range of frequency, or redshift, over which the signals are mapped. We will find that for any telescope which is able to map the 21-cm signals, the total noise in the shear estimate will automatically be near the irreducible value. For this reason it is both a lower limit and a good benchmark.
We must also differentiate between the noise per pixel and the noise in the averageγ (θ) over a patch of sky which is larger than the pixel size. By pixel we mean the smallest resolvable region of the sky as set by the telescope. If the angular correlations in the noise drop off more rapidly than the correlations in the shear, then the S/N will be maximal on an angular scale that is larger than the pixel size. We will refer to a region of sky over whichγ (θ) is averaged as a patch in the shear map. A patch could be a square region, a circular aperture, a Gaussian smoothing window or any other localized window.
The variances in the magnitude of our shear estimators, (14), are given by integrals over the patch, where W(θ; δ) is the window function defining the patch (normalized to unity when integrated over θ) and δ is its characteristic angular scale. The noise in the magnitude of the shear per pixel (i.e. in the original unsmoothed detection) is σ̃²_γ(0), while the noise in the isotropic estimator is σ̃²_γ3(δ). Note that the tildes are used to differentiate between the noise in an estimator, σ̃_γ, and the variance in the signal, σ_γ. With some assumptions the correlation functions can be simplified. In (20) we used the fact that rotational invariance requires the noise in both components of γ(θ) to be the same, so we choose to find the variance of the simpler component, γ̂_2(ν), and double it to account for the other component. To get (22) from (21) we have assumed that each component of ∇T(ν) is normally distributed, so that the fourth moment can be reduced to second moments. In addition, we assume that the temperature field is isotropic, so that there is no cross-correlation between the components. The same assumptions are used in expression (23).
Replacing the observed gradient in (22) with the true temperature gradient plus noise gives the results (25) and (26), in which A(ν, ν′) collects the squared correlations of the gradient and noise fields. The correlation function ξ_N(ν, ν′, θ) is defined in the same way as ξ_∇. The pixel and frequency response functions are included in T̃(θ). The optimal weights, ω_ν, can be calculated numerically, but a very good analytic approximation can be found by assuming that they vary slowly over the frequency range within which T̃(ν) is correlated [A(ν, ν′) ≈ A(ν, ν)]. In this case ω_ν′ ≈ ω_ν in (25). Minimizing this subject to the constraint (15) gives inverse-variance weights, ω_ν ∝ 1/A(ν, ν). Substituting these back into (25) gives the noise (30). If the noise in the brightness temperature map is small compared to the fluctuations in the temperature itself, ξ_N(ν, θ) ≪ ξ_∇(ν, θ), which is a minimal requirement for mapping the brightness temperature, then the foreground noise will drop out of (26) and the irreducible noise limit will be reached. To approach this noise level it is not necessary to eliminate all foreground noise. Thus a telescope designed to map the brightness temperature will naturally achieve a noise level in γ̂(θ) that is close to the irreducible value. The correlations in frequency might be set by the bandwidth or by the intrinsic correlations in the brightness temperature.
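The inverse-variance weighting that results from this minimization can be sketched for a toy set of uncorrelated channels. The per-channel variances below are made-up numbers standing in for A(ν, ν); the point is only that the normalized inverse-variance weights beat uniform weighting.

```python
import numpy as np

# Assumed per-channel variances A(nu, nu) for 8 uncorrelated frequency
# channels (illustrative values only).
var = np.array([1.0, 1.2, 0.8, 2.0, 1.5, 0.9, 1.1, 3.0])

# Inverse-variance weights, normalized to sum to one (the constraint (15)).
w_opt = (1.0 / var) / np.sum(1.0 / var)
noise_opt = np.sum(w_opt**2 * var)      # equals 1 / sum(1/var)

# Uniform weights for comparison.
w_uni = np.full(var.size, 1.0 / var.size)
noise_uni = np.sum(w_uni**2 * var)
```

For uncorrelated channels the optimal combined variance is 1/Σ(1/A), which is always at or below the uniformly weighted value; the correlated case in (25) generalizes this.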
It is often convenient to express the lensing noise (30) in terms of the (cross-)power spectra of the brightness temperature, C_T(ν, ν′), and of the noise in the temperature, C_N(ν, ν′). This can be done by Fourier transforming the temperature. For a further discussion of calculations in Fourier space see Appendices B and C.
To determine the possible capabilities of 21-cm lensing experiments, we now investigate the optimal case in which the irreducible limit is reached with a bandwidth that is much smaller than the intrinsic correlation length of the temperature. In the limit of infinitely narrow bandwidths the sums in equation (30) can be converted into integrals. The function A(ν, ν′, θ) defines a volume in frequency and angle within which the structure or noise is too strongly correlated to contribute 'independent' information to the shear measurement. A very useful approximation to this volume can be found by calculating its characteristic length in frequency, δν_∇(ν), at θ = 0 and its characteristic angular area, Ω_∇(ν), at ν′ = ν; for the temperature gradient alone these are given by (33) and (34). Analogous correlation lengths can be defined for the noise term and for the cross-term. Note that these correlation lengths are defined with the correlation function squared (temperature to the fourth power), which makes them significantly smaller than the usual correlation lengths defined with the first power of the correlation function. When the patch size is near Ω_∇(ν) there will be only a few quasi-independent areas per patch. To account for this it is convenient to define the quantity N_∇(ν; δ), which is essentially the area of a correlated region divided by the area of the patch. Two limiting cases are instructive: for a very small patch N_∇(ν; δ) → 1, and for a patch much larger than the intrinsic correlation length N_∇(ν; δ) ∝ Ω_∇(ν)/δ² [for a Gaussian patch]. One would like the data to be collected in frequency channels with a width smaller than δν_∇(ν); otherwise the irreducible noise will be increased. Instrumental design and foreground noise actually result in an optimal bandwidth near δν_∇(ν), as will be shown in Section 6.
Using the above definitions in (30), a simple approximation for the irreducible noise is found (equation 36). When the patch size is very close to the pixel size the complete integrals in (26) must be carried out to obtain an accurate result, but for the purposes of this section this is not necessary. The limit for small δ is the noise per pixel. It is easy to see from (36) and (35) that the square of the irreducible noise is essentially one over twice the number of correlated volumes in a patch.
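The counting argument can be made concrete with a rough sketch. All inputs below are illustrative assumptions (in particular the 0.2 MHz frequency correlation length and the number of independent areas per patch); they are chosen only to show how a band of tens of MHz yields per-cent-level noise.

```python
import math

def irreducible_noise(bandwidth_mhz, dnu_mhz, patch_area, corr_area):
    """sigma_gamma ~ 1 / sqrt(2 N), where N is the number of correlated
    frequency-angle volumes in the patch (in the spirit of equation 36)."""
    n_freq = bandwidth_mhz / dnu_mhz          # independent frequency slices
    n_ang = max(patch_area / corr_area, 1.0)  # independent areas per patch
    return 1.0 / math.sqrt(2.0 * n_freq * n_ang)

# Illustrative: a ~60 MHz band (roughly z = 10-20 for the 21-cm line), an
# assumed frequency correlation length of 0.2 MHz, and a patch four times
# larger than the angular correlation area.
sigma = irreducible_noise(60.0, 0.2, 4.0, 1.0)   # ~0.02, i.e. ~2 per cent
```

Doubling the usable bandwidth (or quadrupling the patch area) reduces the noise by √2, which is the scaling behind the redshift-range dependence seen in Fig. 2.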
The above estimates assume that σ²_∇(ν), the variance in the intrinsic temperature gradient within a frequency channel, can be measured exactly, so that the estimators γ̂_i can be normalized properly. This is normally a good approximation, as we now show. The variance in the gradient can be found by averaging over position on the sky in the entire surveyed region. Using (2) and dropping all terms higher than second order in κ and γ gives the results (38)-(40), where the averages run over the area of the surveyed region. It has been assumed that σ²_N is determined by independent means to an accuracy much better than this. The correlations in the lensing convergence are relatively small [σ²_κ, ξ_κ(θ) ≪ 1], so the terms containing them on lines (38) and (39) can be safely ignored. The last term in (40) expresses the uncertainty in the mismatch between the average κ (or γ) over the survey region and the true average.
By comparing (38)-(40) with equation (26), one can see that the normalization error is small, where Ω_κ is the area of sky over which the foreground convergence is correlated. (Note that Ω_κ is defined with one power of ξ_κ(θ), instead of two as for Ω_∇.) Both these correlated areas could effectively be as small as the pixel if sparse pointings are used for normalizing γ̂(θ). If a survey covers just a few independent regions and is capable of mapping the shear (so that σ̃_γ ≪ σ_κ), then the noise in the normalization of the shear map will be small, and σ̃_γ(δ), as obtained above, can be taken as the noise in the shear estimate.

Correlations in the 21-cm emission
The irreducible noise in the shear map depends critically on the number of statistically independent regions of 21-cm emission/absorption along a single line of sight. At the redshifts where the 21-cm brightness temperature is significant the density of the Universe was dominated by ordinary matter, so the comoving length between two redshifts is well approximated by the flat, matter-dominated formula

l_co = (2c/H_0) Ω_m^{−1/2} [(1 + z_1)^{−1/2} − (1 + z_2)^{−1/2}].

Between redshifts 10 and 100, for example, l_co = 2200 h⁻¹ Mpc for Ω_m = 0.3, or 96.5 h⁻¹ Mpc in proper distance. Roughly speaking, the correlation length (33) is ∼0.1-1 Mpc (comoving) for a pixel size of 0.5 arcmin in radius or smaller, so we expect of order 1000 independent samples between these redshifts. A more detailed calculation must take into account the precise form of the correlations in the brightness temperature. The irreducible noise is independent of the normalization of the correlation function ξ_∇(ν, ν′, θ) and thus will depend only on the shape of the 3D correlation function or power spectrum. During the early epoch of 21-cm absorption the brightness temperature will be correlated in the same way as the dark matter (Loeb & Zaldarriaga 2004). During re-ionization the correlations could be very different. One expectation is that 'bubbles' of ionized gas will form and expand until they merge. The size of the bubbles depends on the abundance and spatial distribution of sources of ionizing radiation; active galactic nuclei produce larger bubbles and stars smaller bubbles. These bubbles may or may not be smaller than the pixel: a 1 arcmin pixel has a comoving width of 1.9 h⁻¹ Mpc at z = 10. In what follows we make the assumption that the brightness temperature is proportional to the dark matter density even during re-ionization. We consider this conservative, because modifications to the power spectrum during re-ionization are more likely to shorten the correlation length (and so to reduce the noise) than to increase it.
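The quoted distances follow directly from the matter-dominated flat-universe formula, l_co = (2c/H_0) Ω_m^{−1/2} [(1 + z_1)^{−1/2} − (1 + z_2)^{−1/2}], and from its proper-distance analogue. A short check, using c/H_0 = 2997.9 h⁻¹ Mpc:

```python
import math

C_H0 = 2997.92  # Hubble length c/H0 in h^-1 Mpc

def l_comoving(z1, z2, omega_m=0.3):
    """Comoving distance between z1 and z2 for a flat, matter-dominated
    universe: (2c/H0) Omega_m^-1/2 [(1+z1)^-1/2 - (1+z2)^-1/2]."""
    return (2.0 * C_H0 / math.sqrt(omega_m)) * (
        (1.0 + z1) ** -0.5 - (1.0 + z2) ** -0.5)

def l_proper(z1, z2, omega_m=0.3):
    """Proper distance over the same interval, obtained by integrating
    c dz / [(1+z) H(z)] in the matter-dominated limit:
    (2/3)(c/H0) Omega_m^-1/2 [(1+z1)^-3/2 - (1+z2)^-3/2]."""
    return (2.0 / 3.0) * (C_H0 / math.sqrt(omega_m)) * (
        (1.0 + z1) ** -1.5 - (1.0 + z2) ** -1.5)

l_co = l_comoving(10.0, 100.0)   # ~2200 h^-1 Mpc
l_pr = l_proper(10.0, 100.0)     # ~96.5 h^-1 Mpc
```

With an assumed comoving correlation length of a few h⁻¹ Mpc along the line of sight, l_co/(a few Mpc) indeed gives of order 1000 independent samples.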
The contribution of ionized bubbles will increase the correlations on scales larger than the characteristic bubble size and suppress them somewhat on scales smaller. There have been a number of attempts to model the fluctuations in the brightness temperature during re-ionization (Furlanetto, Zaldarriaga & Hernquist 2004;Wyithe & Loeb 2004). We tried the simple model of Santos et al. (2003) and find that it produces very little difference in the irreducible noise for δθ = 0.5 arcmin because the bubbles are significantly smaller than the pixel sizes. However, in the absence of either a complete theory of re-ionization or direct observations, the form of the temperature correlations remains a significant source of uncertainty in what follows, especially for small pixel sizes.
The brightness temperature in direction θ and at frequency ν is given by a line-of-sight integral of the 21-cm signal against the response of the telescope, where q_ν(r) is the response function of the telescope expressed as a function of distance instead of frequency, and r(ν) is the comoving distance to the redshift from which the 21-cm line is observed at frequency ν. Since peculiar velocities change the observed frequencies of the 21-cm line, r(ν) is not actually the radial distance, but rather the redshift expressed as a distance. Using this, we can find the correlation function between the gradient of the temperature at different redshifts. This can be done in spherical coordinates, but it comes out much more simply in the small-angle approximation. The bandwidth will initially be treated as infinitely narrow, q_ν(r) = δ(r(ν) − r) (see Appendices B and C for a treatment of finite bandwidths). The result, expressed as an integral over Fourier space, involves Δr(ν, ν′) = r(ν) − r(ν′), r̄(ν, ν′) = [r(ν) + r(ν′)]/2 and P_21(k, ν), the 3D power spectrum of the 21-cm brightness temperature. It has been assumed that the power spectrum, P_21(k, ν), does not change significantly over the range in ν where there are significant correlations. The linear redshift distortions (Kaiser 1987) are responsible for the β term, with β = Ω_m(z)^0.6/b(z) to a very good approximation, where b(z) is the bias between the matter and the T_21 fluctuations and Ω_m(z) is the density of matter in units of the critical density at that time. Here we have assumed that P_21(k) = b² P_matter(k) and is thus proportional to the matter power spectrum. In the calculations that follow we take b = 1, as expected at least during the early epoch of 21-cm absorption. With this result and with an assumed pixel profile, the frequency correlation length (33), the angular correlation area (34) and the irreducible noise (36) can all be calculated.
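Since Ω_m(z) → 1 at these redshifts, β → 1 for b = 1, and the angle-averaged Kaiser (1987) enhancement of the power spectrum, 1 + 2β/3 + β²/5, tends to 28/15. A minimal check, assuming a flat universe with illustrative density parameters:

```python
def omega_m_z(z, omega_m0=0.3, omega_l0=0.7):
    """Matter density parameter at redshift z for a flat universe."""
    e2 = omega_m0 * (1.0 + z) ** 3 + omega_l0
    return omega_m0 * (1.0 + z) ** 3 / e2

def kaiser_boost(z, bias=1.0):
    """Angle-averaged power-spectrum enhancement from linear redshift
    distortions: 1 + 2*beta/3 + beta^2/5, with beta = Omega_m(z)^0.6 / b."""
    beta = omega_m_z(z) ** 0.6 / bias
    return 1.0 + 2.0 * beta / 3.0 + beta ** 2 / 5.0

boost_z30 = kaiser_boost(30.0)   # very close to 28/15 ~ 1.867
boost_z0 = kaiser_boost(0.0)     # smaller, since Omega_m(0) < 1
```

This is why the β term cannot be neglected: at these redshifts it boosts the observed 21-cm power by nearly a factor of 2 relative to the real-space power spectrum.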
The non-linear evolution of the power spectrum is of some significance for the smaller pixel widths considered here. To account for this we use the Peacock & Dodds (1996) method to convert the linear power spectrum to a non-linear one. Fig. 1 shows δν_∇(z) as a function of redshift for a circular Gaussian pixel with various radii δθ. The decrease in δν_∇(z) with increasing redshift is largely the result of a fixed comoving distance corresponding to a smaller frequency interval at higher redshift. The correlation length also increases with increasing pixel size, but it is always less than 0.4 MHz, even when the pixel is 5 arcmin in radius.
The irreducible noise per pixel, σ̃²_γ(0), is shown in Fig. 2 for a few pixel sizes and ranges in redshift. It can be seen that smaller pixel sizes give smaller irreducible noise per pixel. It is not yet clear over what range in z the 21-cm emission/absorption will be detectable; this depends on the history of re-ionization, on the subtraction of foregrounds and on telescope sensitivity. If the re-ionization epoch lasts from z ∼ 10 to 20 and this whole redshift range can be observed, then the expected irreducible noise is 2 per cent for a δθ = 0.5 arcmin pixel and 0.6 per cent for a 3-arcsec pixel. The early epoch of 21-cm absorption lasts from z ∼ 30 to 300. If this whole range could be observed, then we expect σ̃_γ(0) = 1.7 and 0.6 per cent for the same pixel sizes. It is possible that both epochs of emission/absorption will someday be observable, reducing the noise still further.
The angular correlation function at fixed frequency is shown in Fig. 3 for several different pixel sizes. To a good approximation, the correlation function scales with the pixel radius δθ through a universal profile f.

Figure 1. The frequency correlation length (33) as a function of redshift. The power spectrum of 21-cm emission is taken to be the same as that of the dark matter, including linear velocity distortions and non-linear structure formation. The dot-dashed curve is for a Gaussian pixel of radius δθ = 5 arcmin, the dotted curve is for 1 arcmin, the solid curve is for 0.5 arcmin, the dot-dot-dot-dashed curve is for 0.1 arcmin and the dashed curve is for 0.05 arcmin (3 arcsec). In the δθ = 0.05 arcmin case the decrease in the correlation length at small redshifts is caused by non-linear structure formation. This effect is present in the other cases, but to a lesser extent.

Figure 2.
The expected irreducible noise in the shear measurement per pixel. This is a plot of expression (36) with δ = 0, assuming a ΛCDM dark matter power spectrum for the 21-cm brightness temperature. The dot-dashed curves are for a pixel radius of δθ = 5 arcmin, the dotted curves are for 1 arcmin, the solid curves are for 0.5 arcmin and the dashed curves are for 0.05 arcmin (3 arcsec). The abscissa is the upper limit of the redshift range used in the measurement, z_2. For each pixel size the five curves are for different lower redshift limits; from left- to right-hand side (or bottom to top) these are z_1 = 6.5, 12, 22, 40 and 71.

Here f is a constant. The somewhat awkward normalization is chosen so that 2π ∫ x f(x) dx = 1, and f is unity to within a few per cent if the brightness temperature follows the cold dark matter (CDM) density field. We retain f as a fudge factor which could differ significantly from unity if the brightness temperature is not distributed like mass. This approximate scaling is a result of the power spectrum being almost scale-free on the relevant scales. It is a very useful approximation with important consequences, because it means that the smaller the pixel, the lower the irreducible noise for a fixed area on the sky. The scaling can be understood by considering the limiting case where the temperature is a Poisson process with correlations only on scales much smaller than the pixel, so that the pixelization dominates the observed correlations. In this case f = π/4. This limiting case is also shown in Fig. 3. The angular correlation is very nearly frequency independent because the comoving angular size distance is a slow function of redshift at these high redshifts, and because the power spectrum of temperature fluctuations does not change shape during linear evolution. There is some dependence on ν for the smallest pixel radius (δθ = 0.05 arcmin), reflecting non-linear structure formation effects on these small scales (∼100 kpc) at the lower redshifts.
The correlation area of ξ_∇ is a simple function of δθ: to a very good approximation it is 4f δθ². Note that we use the radius of the Gaussian to characterize the pixel rather than its full width at half-maximum (FWHM).
Because of this simple scaling of correlated area with pixel size, a simple expression for the irreducible noise per patch can be found; to connect the two asymptotes, formulae (35) and (36) must be used. The Δθ⁻² scaling on large scales just reflects the fact that correlations in the temperature gradient are negligible on scales significantly larger than the pixel. The pre-factor might be different if brightness temperature turns out not to be distributed like dark matter density. If the brightness temperature correlations have strongly non-power-law behaviour on the relevant scales, then f will show some dependence on the pixel size. For example, if the temperature distribution were smooth on small scales, then making the pixel smaller would provide no further information and the noise would not continue to decrease with pixel size. As mentioned before, the correlation length might also be smaller during re-ionization, in which case σ_γ(Δν) might also be smaller. This effect must be minor, however, since the correlation length cannot be smaller than in the completely pixel-dominated Poisson case and, as Fig. 3 shows, this is only slightly smaller than in our standard model. In Appendix C we present an alternative derivation of the noise in Fourier (visibility) space that agrees very well with the one given here.

Signal-to-noise ratio estimates
We now need to determine whether there will be enough signal on the appropriate angular scales to produce a high-fidelity map of the shear. This requires the noise to be significantly lower than 'typical' values of the shear. We quantify the latter by calculating the rms value of the shear along random lines of sight.
The distortion matrix introduced in Section 2 can be written in terms of derivatives of the Newtonian potential, φ(x), along the light path. To a good approximation the unperturbed light path can be used (the first Born approximation). Here r is the radial coordinate distance, D(r, r′) is the angular size distance between two coordinate distances and W(θ; δθ) is still the angular window on the sky. Sometimes the distances to the source redshift, to the lens redshift and between them will be abbreviated as D_s, D_l and D_ls, respectively. The coordinate vector perpendicular to the line of sight is x_⊥. The lensing convergence, κ, is a weighted line-of-sight integral of the fractional density fluctuation δ(x). To relate the variance in κ to the power spectrum of matter fluctuations it is easiest to use the Fourier-space Limber's equation (Kaiser 1992) and then to transform back to angular space. For a geometrically flat universe the result follows with E(z) = [Ω_m(1 + z)³ + Ω_Λ]^(1/2), where P_δ(k, z) is the 3D power spectrum of matter fluctuations at redshift z and W̃(ℓ; δθ) is the window in Fourier space. The first equality follows from the shear being a homogeneous potential field to first order. The window will be taken to be Gaussian to conform with our results in Section 3.2. Fig. 4 shows σ_κ(z) as a function of source redshift for windows of different widths. The expected fluctuations in κ are at the several per cent level for redshifts between 10 and 300 (4-6 per cent for a 1 arcmin pixel, 7.5-11 per cent for a 3-arcsec pixel). Reducing the pixel size can increase the signal substantially. Comparing this figure with Fig. 2 shows that, for a pixel size of 1 arcmin or smaller and a moderate redshift range, the irreducible noise per pixel is less than half the expected signal. Fig. 5 shows which redshifts contribute most to σ_κ²(z) for source redshifts of 1, 10 and 100.
It can be seen that structures above z = 2 contribute significantly in both 21-cm cases, whereas structure around z ∼ 0.5 dominates in the galaxy-lensing case. If the shear could be measured accurately using signals from both epochs of 21-cm emission/absorption, one could expect to isolate the contribution from structure at z ∼ 10, since this contributes significantly to the signal for source redshift 100. In these calculations the non-linear power spectrum was modelled using the Peacock & Dodds (1996) method with a normalization of σ₈ = 0.75. Note that, especially for the smaller pixels, the distribution of κ is strongly non-Gaussian, and the variance plotted in Fig. 4 is substantially larger than a typical fluctuation because of the long tail to high κ values (see Hilbert et al. 2007). Fig. 6 shows σ_|γ|(Δθ) and σ_κ(Δθ) as functions of angular scale for observations averaged over patches larger than the pixel. The fluctuations in shear drop off relatively slowly with increasing angular scale, while at scales much larger than the pixel size σ_κ(Δθ) ∝ Δθ⁻¹. As a result, even if the noise per pixel is comparable to σ_κ, the shear can still be mapped with high S/N on scales larger than the pixel. With small noise per pixel, the surface density averaged over large scales can be measured with high precision.
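The Limber-equation estimate of σ_κ described above can be explored numerically. The Python sketch below evaluates the integral for a single source plane with a Gaussian pixel window, but substitutes a toy power-law matter spectrum with arbitrary amplitude and a crude 1/(1+z) growth factor in place of the Peacock & Dodds spectrum used in the paper; all function names and spectrum parameters are our own illustrative assumptions, so only relative trends are meaningful.

```python
import numpy as np

C_KM = 299792.458            # speed of light [km/s]
H0, OM, OL = 70.0, 0.3, 0.7  # toy flat LambdaCDM background

def trapz(y, x):
    """Simple trapezoidal rule (avoids version-specific numpy helpers)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def E(z):
    return np.sqrt(OM * (1.0 + z)**3 + OL)

def comoving_distance(z, n=2048):
    zg = np.linspace(0.0, z, n)
    return trapz(C_KM / (H0 * E(zg)), zg)        # [Mpc]

def sigma_kappa(z_s, pix_rad_arcmin, amp=1.0, slope=-2.2):
    """rms convergence in a Gaussian pixel for a single source plane at z_s,
    via Limber's equation with a toy power-law spectrum P = amp * k**slope
    and a rough 1/(1+z) growth factor (illustrative assumptions only)."""
    dtheta = pix_rad_arcmin * np.pi / (180.0 * 60.0)       # pixel radius [rad]
    z = np.linspace(1e-3, z_s, 512)
    chi = np.array([comoving_distance(zz) for zz in z])    # [Mpc]
    chi_s = chi[-1]
    # single-plane lensing efficiency W(chi) for a flat universe
    W = 1.5 * OM * (H0 / C_KM)**2 * (1.0 + z) * chi * (chi_s - chi) / chi_s
    growth2 = 1.0 / (1.0 + z)**2
    ell = np.logspace(1.0, 5.0, 400)
    C_l = np.array([trapz(W**2 / chi**2 * amp * (l / chi)**slope * growth2, chi)
                    for l in ell])
    # integrate C(l) against the Gaussian pixel window
    return np.sqrt(trapz(ell * C_l * np.exp(-(ell * dtheta)**2) / (2.0 * np.pi), ell))
```

Even with these crude ingredients the toy model reproduces the trends of Fig. 4: σ_κ grows with source redshift and with decreasing pixel radius.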

Detection of individual objects
We have shown that 1σ fluctuations in the convergence could be detected with modest to good S/N (depending on pixel size) by a 21-cm experiment. Another interesting question is what kinds of object would be individually visible in a 21-cm shear map. To answer this, consider a circle of radius θ on the sky centred on a collapsed clump or halo. The average tangential shear on this circle is determined by M(θ), the projected mass within the circle, and the average tangential shear within a disc follows from it. For haloes with an NFW profile (Navarro, Frenk & White 1997) we find, as a function of virial mass and halo redshift, the radius at which the S/N for the average tangential shear is maximized. The central density and the scale-size are set according to the NFW prescription. If the disc is smaller than the pixel, the tangential component of the shear will not be identifiable. Thus, although such haloes might cause a significant feature in the shear map, we do not consider a halo detected unless the S/N is above a 1 or 2σ threshold within a circle with radius at least as large as the pixel. The resulting halo mass detection limit is plotted in Fig. 7. With a 3-arcsec pixel this threshold is below 10¹² M_⊙ almost all the way out to the redshift of the 21-cm emission/absorption, and below 2 × 10¹¹ M_⊙ at z < 1. This is smaller than the mass of the Milky Way halo today. Note that these are virial masses, not the masses enclosed within the circle; the latter is the directly detected mass and can be significantly smaller. Taking the average tangential shear over a disc is not the best method for detecting haloes: one could do somewhat better by assuming a model for their radial profiles and deriving an optimal weighting function (Schneider 1996). Here, however, we restrict ourselves to the question of what objects would be clearly visible in a shear map without further special processing.
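The detection criterion above can be made concrete with a small numerical sketch. The Python below projects a spherical NFW profile to obtain the mean tangential shear on a circle, γ̄_t(R) = [Σ̄(<R) − Σ(R)]/Σ_cr; the concentration, the 200ρ_cr overdensity definition and the lensing distances are illustrative assumptions of ours, not the paper's detailed prescription.

```python
import numpy as np

G = 4.3009e-9       # Newton's constant [Mpc (km/s)^2 / M_sun]
C_KM = 299792.458   # speed of light [km/s]
H0 = 70.0
RHO_CR = 3.0 * H0**2 / (8.0 * np.pi * G)   # critical density today [M_sun/Mpc^3]

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def nfw_params(m_vir, conc=10.0):
    """NFW scale radius [Mpc] and density [M_sun/Mpc^3] for an assumed
    concentration and a 200*rho_cr virial overdensity (illustrative choices)."""
    r_vir = (3.0 * m_vir / (4.0 * np.pi * 200.0 * RHO_CR))**(1.0 / 3.0)
    rho_s = (200.0 / 3.0) * RHO_CR * conc**3 / (np.log(1.0 + conc) - conc / (1.0 + conc))
    return r_vir / conc, rho_s

def sigma_projected(R, r_s, rho_s, n=4000):
    """Surface density Sigma(R) by integrating rho_NFW along the line of sight."""
    l = np.linspace(0.0, 50.0 * r_s, n)
    r = np.sqrt(R**2 + l**2)
    rho = rho_s / ((r / r_s) * (1.0 + r / r_s)**2)
    return 2.0 * trapz(rho, l)

def mean_tangential_shear(R, m_vir, D_l, D_s, D_ls):
    """gamma_t averaged on a circle of radius R [Mpc]:
    (mean Sigma inside R minus Sigma(R)) / Sigma_crit."""
    r_s, rho_s = nfw_params(m_vir)
    sigma_cr = C_KM**2 * D_s / (4.0 * np.pi * G * D_l * D_ls)
    Rp = np.linspace(1e-4, R, 600)
    sig = np.array([sigma_projected(x, r_s, rho_s) for x in Rp])
    sigma_bar = trapz(2.0 * np.pi * Rp * sig, Rp) / (np.pi * R**2)
    return (sigma_bar - sigma_projected(R, r_s, rho_s)) / sigma_cr
```

For a 10¹² M_⊙ halo the shear signal peaks near the scale radius and falls off outside it, which is why the pixel size sets the smallest detectable virial mass.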
For comparison we calculate a similar mass limit for idealized future galaxy-lensing surveys. In this case the noise in the average shear within a patch of radius θ is σ² = σ_ε²/(π θ² n_g) (half this for the tangential component alone), where n_g is the angular number density of background galaxies and σ_ε is the rms intrinsic ellipticity of those galaxies; we use the standard estimate σ_ε = 0.3. The shear strength depends on the redshift distribution of background galaxies with usable ellipticities. Here we model the redshift distribution as dn_g/dz ∝ z² e^(−(z/z₀)^1.5), where z₀ is set by the desired median redshift. The shear (52) must then be averaged over the portion of this distribution at higher redshift than the lens plane. Halo detection limits calculated in this way are also shown in Fig. 7.
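The patch-noise expression above is simple enough to encode directly; a minimal sketch with our own function name:

```python
import math

def shear_noise_per_patch(theta_arcmin, n_g_per_arcmin2, sigma_eps=0.3):
    """rms shear noise in a circular patch of radius theta:
    sigma^2 = sigma_eps^2 / (pi * theta^2 * n_g), i.e. sigma_eps / sqrt(N_gal)."""
    n_gal = math.pi * theta_arcmin**2 * n_g_per_arcmin2  # expected galaxies in patch
    return sigma_eps / math.sqrt(n_gal)
```

For σ_ε = 0.3 and n_g = 100 arcmin⁻², a patch of radius 1 arcmin contains ∼314 galaxies and gives σ ≈ 0.017, which is why only the most massive structures stand out in galaxy-based mass maps.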
A very deep space-based galaxy-lensing survey might be competitive with a ∼1 arcmin pixel 21-cm lensing survey for detecting haloes at z < 1. The proposed SNAP satellite is expected to survey ∼2 per cent of the sky with a galaxy density of n_g ≈ 100 arcmin⁻² and a median redshift z ∼ 1.23. The DUNE satellite proposes surveying ∼50 per cent of the sky with n_g ≈ 35 arcmin⁻² and a median redshift of z ∼ 0.9. Several proposed ground-based surveys (LSST, PanSTARRS, VISTA) would cover areas comparable to DUNE at similar depth. These are the two cases shown in Fig. 7. Clearly, higher redshifts will be accessible with 21-cm lensing. With a small pixel size, 21-cm lensing could detect all Milky Way mass haloes in the Universe! Based on the Sheth & Tormen (2002) halo mass function, for the same sky coverage ∼600 times more objects could be identified by such a survey than in a space-based galaxy shear map with n_g = 100 arcmin⁻², and ∼3500 times more than in a ground-based galaxy shear survey with n_g = 30 arcmin⁻². Mass maps of galaxy clusters could be made with arcsecond resolution and high S/N, instead of the arcminute resolution and relatively low S/N possible using galaxy lensing. Galaxy halo studies, which now require stacking thousands of galaxies to measure a single average shear profile, could be carried out on individual galaxies.

E S T I M AT I N G C O S M O L O G I C A L PA R A M E T E R S F RO M T H E L E N S I N G P OW E R S P E C T RU M
As we have shown, high-resolution, high-S/N shear maps could be made using 21-cm lensing. These maps will contain a wealth of information which can be used not only to learn about structure formation, but also to estimate cosmological parameters. We will make a preliminary foray into this latter topic in order to compare the power of 21-cm lensing to that of galaxy lensing. A useful study of the capability of planned galaxy-lensing surveys for cosmological parameter estimation has recently been published by Amara & Refregier (2007) and we will adopt their survey parameters in the following in order to facilitate comparison between the two techniques.
Figs 4 and 5 show clearly that the strength of gravitational lensing depends on source redshift. This suggests that additional information may be extracted by comparing shear maps derived from sources at different redshifts: either multiple 21-cm source planes, or multiple galaxy source planes, or a combination of the two. Such weak-lensing tomography has already been proposed for galaxy-lensing surveys as a method to measure the evolution of structure and thereby to constrain the nature of dark energy (Hu & Tegmark 1999; Hu 2002; Hu & Okamoto 2002; Heavens 2003; Castro, Heavens & Kitching 2005). In this context 21-cm lensing has the potential advantages of superior S/N, higher source redshift and better angular resolution. On the other hand, most models of dark energy affect structure formation and the cosmic expansion rate primarily at z ≲ 1, where galaxy-lensing tomography is most sensitive. As we show below, a combination of galaxy and 21-cm lensing appears likely to constrain dark energy parameters most effectively.
For the purposes of cosmological parameter estimation it is convenient to work in spherical harmonic or Fourier space. The cross-correlation between the harmonic modes of two shear maps, corresponding to source planes at redshifts z₁ and z₂, can be derived from equation (49) and is directly related to the power spectrum of density fluctuations; ℓ and ℓ′ are the multipole indices in the two maps. Using (53), the shear (cross-)power spectrum is trivially converted into the convergence (cross-)power spectrum. The observed shear power spectrum of a lensing map contains a contribution from the irreducible noise, but this term is absent from the cross-correlation between maps for different source redshifts, since the noise fields in the two source planes are then independent. The power spectrum of the irreducible noise can be found from the analysis of Section 3, where the function f(x) is defined by equation (45) and the text following it. An alternative approach, calculating this noise directly in visibility space, is demonstrated in Appendix C. The observed power spectrum including only the irreducible noise will be

C^ij_κ(ℓ) ≈ P^ij_κ(ℓ) e^(−2δθ²ℓ²) + π σ_γ²(0) δθ² [1 + ((ℓδθ)⁴/8) e^(−δθ²ℓ²/2)] δ^ij   (resolution-limited case).
As can be seen in Fig. 3, the angular correlation function ξ_∇(θ) has an angular scale similar to the pixel. As a result f̃(ℓδθ) is close to unity for ℓ ≲ 1/δθ and decreases rapidly for larger ℓ. We will restrict ourselves to modes larger than the pixel (i.e. ℓδθ ≲ 1), in which case both e^(−2δθ²ℓ²) and f̃(ℓδθ) drop out of C^ij_κ(ℓ). Expression (61) shows the result for the pixel-dominated Poisson case discussed in Section 3.2. The shear power spectrum from galaxy lensing has the same form except that there is no pixel. It is often assumed that the noise in this case is dominated by the intrinsic ellipticities of galaxies, in which case the noise power spectrum is N_κ(ℓ) = σ_ε²/n_g (Kaiser 1992). In practice, errors in the photometric redshifts of the source galaxies are often important, but here we assume an ideal survey in which these are not significant.
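The statement above, that the irreducible noise drops out of the cross-spectrum between different source planes, can be illustrated with a toy Monte Carlo: two maps share a common signal mode but carry independent noise, so their cross-product averages to the signal power alone while each auto-power retains the noise bias. This is a schematic check, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(42)
n_modes = 200_000
signal = rng.normal(0.0, 1.0, n_modes)   # common lensing signal, unit power
noise1 = rng.normal(0.0, 0.7, n_modes)   # irreducible noise, source plane 1
noise2 = rng.normal(0.0, 0.7, n_modes)   # independent noise, source plane 2

map1 = signal + noise1
map2 = signal + noise2
auto_power = np.mean(map1 * map1)    # biased: signal power plus noise power
cross_power = np.mean(map1 * map2)   # unbiased estimate of the signal power
```

Here the auto-power converges to 1 + 0.7² = 1.49 while the cross-power converges to the signal power of 1, mirroring the absence of the πσ_γ²(0)δθ² term in the i ≠ j spectra.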
So far no assumption of Gaussian statistics in the shear field has been made in this section. Although our estimator γ̂(θ) is not Gaussian, we have shown that the correlation length of the irreducible noise should be close to the pixel size. The multipole moments on scales larger than the pixel are then sums of many independent variables and, by the central limit theorem, are expected to be approximately normally distributed. The shear map itself will also have substantial non-Gaussianity caused by non-linear structure, even for Gaussian initial density fluctuations. However, on scales larger than individual dark haloes the shear map is expected to be close to Gaussian because of the contributions from many independent structures along the long line of sight (Takada & Jain 2004).
For a Gaussian shear map the likelihood function factorizes by mode, making the analysis much simpler. The Fisher matrix in this case is given by formula (62) in Appendix A, where the indices a and b refer to parameters p_a and p_b and f_sky is the fraction of the sky covered (Hu & Tegmark 1999). The f_sky factor can be interpreted as the result of limited resolution in visibility space because of the finite size of the radio telescope's pixel. It is assumed here that the coverage of the u-v plane is complete between ℓ_min and ℓ_max down to the resolution of the telescope. The smallest-scale mode, ℓ_max, is chosen so that the Gaussian assumption remains approximately valid. The minimum-variance unbiased estimator of p_a then has statistical uncertainty σ²(p_a) = (F⁻¹)_aa, so this quantity indicates how well the parameter p_a can be constrained. Directly from (62) one finds the accuracy to which the power spectrum of fluctuations in κ can be determined using only one epoch of 21-cm lensing. This formula holds on scales between that of the survey area, where windowing effects cause the noise to increase sharply, and that of the pixel. Fig. 8 shows the uncertainty in the κ power spectrum given by this formula. Not shown in the figure is the ℓ-space resolution, which is Δℓ ∼ f_sky^(−1/2). The errors in the power spectrum for multipoles separated by less than Δℓ will be correlated (see Appendix A for details). The cosmic variance (or sample variance for a partial-sky survey) is likely to dominate the uncertainty on all linear scales. This illustrates a fundamental limitation of measuring cosmological parameters from convergence power spectra and cross-power spectra: decreasing the instrumental and/or irreducible noise provides no further information about the ensemble power spectrum of κ, although it does provide more information about the particular realization that we live in.
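The per-mode structure of this Fisher sum for a Gaussian field can be sketched for a single amplitude parameter A in C_ℓ = A S_ℓ + N_ℓ, for which F = f_sky Σ_ℓ ((2ℓ+1)/2)(S_ℓ/C_ℓ)²; the spectra below are invented placeholders, not the paper's convergence spectra.

```python
import numpy as np

def fisher_amplitude(ells, S, N, A=1.0, f_sky=1.0):
    """Gaussian-field Fisher information for an amplitude parameter A in
    C_l = A*S_l + N_l:  F = f_sky * sum_l (2l+1)/2 * (S_l / C_l)**2."""
    C = A * S + N
    return f_sky * np.sum((2.0 * ells + 1.0) / 2.0 * (S / C) ** 2)

ells = np.arange(10, 10001, dtype=float)
S = 1e-9 * (ells / 100.0) ** -1.0                  # toy power spectrum (arbitrary units)
F0 = fisher_amplitude(ells, S, np.zeros_like(S))   # cosmic-variance-limited case
F1 = fisher_amplitude(ells, S, S)                  # noise power equal to signal power
```

In the noise-free case every mode carries its full cosmic-variance weight, so σ(A) = F0^(−1/2) = [2/Σ(2ℓ+1)]^(1/2); setting the noise equal to the signal reduces each mode's weight by (1/2)² and therefore doubles σ(A), illustrating why pushing the noise far below cosmic variance buys nothing for parameter estimation.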
Modes with ℓ < 10⁴ will be cosmic variance limited if σ_γ(0)δθ < 0.017 arcmin for sources at z = 10. For modes with ℓ < 10³ the same is true if σ_γ(0)δθ < 0.12 arcmin. When estimating cosmological parameters there is no reason to decrease the noise below these values, as long as the Gaussian assumption holds and one is interested only in these modes. Nevertheless, this does not mean that the cosmic variance limit on the 3D power spectrum has been reached. More information can be gained by splitting up the source redshift range. This increases the noise for each sub-range, but accesses the additional tomographic information that is averaged out when the full redshift range is used to make a single shear map.

Figure 8. The fractional error in the convergence power spectrum for a full-sky survey due to irreducible noise and cosmic variance. The solid and dashed curves are for 21-cm lensing with sources at z = 100 and 10, respectively, assuming a 0.5 arcmin pixel radius and σ_γ²(0) = 0.03. The two dot-dashed curves are for galaxy-lensing surveys with median source redshift z = 1 and with 35 (upper) and 100 (lower) galaxies arcmin⁻². The dotted curve, just visible in the lower right-hand corner but otherwise covered by the solid curve, is the cosmic variance limit. For a smaller survey these curves scale with the fraction of sky covered as 1/√f_sky for modes smaller than the surveyed region.
To proceed we must choose a cosmological parameter space to explore, a fiducial model to perturb around, and observational parameters for a set of representative surveys. For simplicity and for ease of comparison we follow the galaxy-survey parameters chosen by Amara & Refregier (2007). In the current standard paradigm, the apparently accelerating expansion of the present universe is driven by dark energy, a near-uniform and dominant component of the cosmic energy density with effective equation of state p = wρ, where w < −1/3 (Riess et al. 2004; Astier et al. 2006; Spergel et al. 2006). Dark energy modifies the lensing signal due to cosmic structure in two ways. First, it affects the angular size distance to a given redshift; here Ω_DE denotes the dark energy density today in units of the critical density, and w is assumed to be constant. The second effect of dark energy results from its influence on the linear evolution of density fluctuations. In addition to Ω_m, Ω_DE and w, we include in our cosmological parameter set the logarithmic slope or spectral index of the primordial power spectrum, n_s, the baryon density, Ω_b, and the normalization of the power spectrum on large scales, A, which is proportional to σ₈². The baryon oscillations in the power spectrum are not calculated, so Ω_b affects only the overall shape. Note that we do not restrict ourselves to flat cosmologies, but we do fix the Hubble constant at H₀ = 70 km s⁻¹ Mpc⁻¹, assuming this to be externally determined.
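The two effects just described are straightforward to compute for constant w. The sketch below (our own parameter names and step sizes) evaluates E(z), the comoving distance and the linear growth factor by integrating the standard growth equation, assuming the Friedmann form E(z)² = Ω_m(1+z)³ + Ω_k(1+z)² + Ω_DE(1+z)^(3(1+w)).

```python
import numpy as np

C_KM = 299792.458   # speed of light [km/s]

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def E(z, om=0.3, ode=0.7, w=-1.0):
    """Dimensionless Hubble rate for a constant equation of state w."""
    ok = 1.0 - om - ode
    return np.sqrt(om * (1 + z)**3 + ok * (1 + z)**2 + ode * (1 + z)**(3 * (1 + w)))

def comoving_distance(z, om=0.3, ode=0.7, w=-1.0, H0=70.0, n=4096):
    zg = np.linspace(0.0, z, n)
    return trapz(C_KM / (H0 * E(zg, om, ode, w)), zg)   # [Mpc]

def growth_factor(a_end, om=0.3, ode=0.7, w=-1.0, n=20000):
    """Linear growth D(a) from D'' + (3/a + dlnE/da) D' = 1.5 om D / (a^5 E^2),
    integrated with a second-order Taylor step from EdS initial conditions D = a."""
    a = np.linspace(1e-3, a_end, n)
    da = a[1] - a[0]
    Ea = E(1.0 / a - 1.0, om, ode, w)
    dlnE = np.gradient(np.log(Ea), a)
    D, Dp = a[0], 1.0
    for i in range(n - 1):
        Dpp = -(3.0 / a[i] + dlnE[i]) * Dp + 1.5 * om * D / (a[i]**5 * Ea[i]**2)
        D += Dp * da + 0.5 * Dpp * da**2
        Dp += Dpp * da
    return D
```

As a sanity check, for Ω_m = 1 the code recovers D ∝ a and χ(z) = (2c/H₀)[1 − (1+z)^(−1/2)], while a Λ-dominated model shows the late-time growth suppression on which the dark energy constraints rely.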
Figs 9 and 10 show predicted error ellipses for various pairs of our set of six cosmological parameters and for various combinations of idealized 21-cm and galaxy-lensing surveys. Whenever calculations are done for 21-cm lensing at a particular redshift, the convergence is treated as if it were constant over the redshift range used in estimating it. For each plot we have marginalized over the remaining four parameters of our model set. In Table 1 we give the corresponding 1σ uncertainties on individual parameters after marginalizing over the other five dimensions of our parameter space. The galaxy redshift distributions assumed here are the same as described at the end of Section 4.2. When the galaxies are binned into several redshift intervals, we define these so as to obtain an equal number of galaxies in each bin. We also assume the full sky to be surveyed in all cases; for partial sky coverage the uncertainties increase approximately by a factor of f_sky^(−0.5). Apart from fixing the Hubble constant, no additional constraints from other observations are included.
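The distinction between conditional and marginalized errors drives many of the numbers that follow: for correlated parameters, √((F⁻¹)_aa) always exceeds 1/√(F_aa). A two-parameter toy Fisher matrix (numbers invented) makes this explicit:

```python
import numpy as np

# Toy 2x2 Fisher matrix with correlated parameters (invented numbers).
F = np.array([[4.0, 2.0],
              [2.0, 4.0]])

conditional = 1.0 / np.sqrt(F[0, 0])             # other parameter held fixed
marginalized = np.sqrt(np.linalg.inv(F)[0, 0])   # other parameter marginalized over
```

Here the conditional error is 0.5 while the marginalized error is 1/√3 ≈ 0.577: degeneracies inflate the marginalized uncertainty, which is why combining data sets that break degeneracies tightens the constraints so strongly.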
In agreement with Amara & Refregier (2007) we find that an ambitious galaxy-lensing survey could determine Ω_DE and n_s with an accuracy of about 0.01, Ω_b with an accuracy of about 0.004, Ω_m with an accuracy of about 0.0025 and w with an accuracy of about 0.03. An ideal survey going to a depth corresponding to 100 source galaxies arcmin⁻² over the whole sky (requiring about 30 yr with the specifications of the SNAP satellite) would reduce these uncertainties by about a factor of 2. Surveys covering only a fraction f_sky of the sky would have uncertainties increased approximately in proportion to f_sky^(−1/2). While these numbers are impressive, shear maps derived from 21 cm alone can provide considerably tighter constraints. All-sky maps derived from the signal around z = 10 and 15 will be limited by cosmic variance if their resolution and noise properties satisfy σ_γ(0)δθ < 0.017 arcmin, and will then determine Ω_m to an accuracy of 4 × 10⁻⁴ and Ω_DE to an accuracy of 3.5 × 10⁻⁴, with a correspondingly tight constraint on w.

Figure 9. The dashed blue ellipses are for a deeper survey with n_g = 100 arcmin⁻², σ_ε = 0.3 and a median redshift of z = 1.23. It would take the SNAP satellite roughly 30 yr to complete a full-sky survey at this depth. In all cases the galaxies are divided into three redshift bins as described in the text. The solid green ellipses are for shear maps from 21 cm alone at redshifts z = 10 and 15. The dashed green ellipses are for the 'optimal' 21-cm case with shear maps constructed for z = 10, 30 and 100. For these calculations we assume pixel radius δθ = 0.05 arcmin and noise level σ_γ(0) = 0.02, but the results are valid as long as σ_γ(0)δθ ≲ 0.017 arcmin and δθ < 0.5 arcmin, because in this case cosmic variance dominates the noise for all the ℓ values used. Modes ℓ = 10-10⁴ were used in deriving these constraints. The solid red ellipses are for the shallower galaxy survey combined with a 21-cm lensing survey at z = 10 and 15.
Finally, the solid black ellipses show the 'optimum' combination of the deeper galaxy-lensing survey with 21-cm shear maps for z = 10, 30 and 100. Fig. 10 shows blow-ups of these plots so that the inner regions can be seen better. The line types are summarized in Table 1. In this combination Ω_DE, A, n_s and Ω_b are constrained about as well as by the 21 cm alone, while w is constrained almost six times better and Ω_m four times better. Constraints on the dark energy parameters are improved by including the galaxy lensing because dark energy primarily affects structure evolution at z < 1. On the other hand, galaxy lensing alone gives comparatively poor constraints on these parameters unless a prior constraint on Ω_m is included. For parameters that affect only the matter power spectrum (e.g. n_s), 21-cm lensing has a larger comparative advantage. Of course, it is not a question of one or the other: clearly it is worth doing both galaxy and 21-cm lensing surveys, to maximize the information gained and to spread the risk from unanticipated systematics.
It should be emphasized that this analysis does not exhaust the potential for constraining cosmological parameters using 21 cm or galaxy lensing. The dark energy model used here is overly simplified and may be unrealistic; some more physically based models imply appreciable effects at redshifts well beyond unity and so may be particularly well constrained by 21-cm surveys (Doran, Schwindt & Wetterich 2001; Caldwell et al. 2003). Other data sets, notably CMB observations and supernova surveys, constrain cosmological parameters in different ways than gravitational lensing, and will be much improved by the time surveys of the type discussed in this section are completed. Combining results from all these sources will give stringent tests for the presence of systematics and will provide tighter and more robust final constraints if overall consistency is found. Our knowledge of many cosmological parameters is limited by degeneracies which are drastically reduced when different types of observation are combined in this way.

O B S E RVAT I O N S
So far we have considered idealized observations where the irreducible noise dominates and the bandwidth is smaller than the intrinsic correlation length of the brightness temperature. This will be the best any experiment can do and, as we showed, will be reached when the noise in the temperature map is small compared to the temperature fluctuations in each frequency channel. This irreducible noise depends only on the shape of the temperature correlation function. Realistic observations, at least in the near future, will have foreground noise levels that are comparable to or larger than the intrinsic fluctuations in the brightness temperature. In this case the noise in the lensing map will depend more sensitively on the parameters of the telescope and on the level and statistical properties of the brightness temperature fluctuations. We now discuss these factors in more detail.
The observations will be carried out with radio interferometers, and thus in visibility space. As a result, when calculating the performance of telescopes it is easier to work in Fourier space. For this section we adopt the formalism of Appendix C for convenience. Equations (C7) and (C10) give the noise in the κ estimate as a function of the power spectrum of foreground noise, C^N_ℓ(ν), and the power spectrum of the brightness temperature, C_ℓ(ν).
The noise in each visibility will have a thermal component and a component from imperfect foreground subtraction. We will model only the thermal component. If the telescopes in the array are uniformly distributed on the ground, the average integration time for each baseline will be the same and the power spectrum of the noise takes a standard form (Morales 2005; McQuinn et al. 2006), where T_sys is the system temperature, Δν is the bandwidth, t₀ is the total observation time, D_tel is the diameter of the array and ℓ_max(λ) = 2πD_tel/λ is the highest multipole measurable by the array, set by the longest baselines. f_cover, the covering fraction, is the total collecting area of the telescopes divided by π(D_tel/2)². Other telescope configurations are possible which would make the noise unequally distributed in ℓ, but we will consider only this uniform configuration here. The Mileura Widefield Array (MWA) Low Frequency Demonstrator will operate in the 80-300 MHz range with D_tel ≈ 1.5 km and f_cover ∼ 0.1. For the Low-Frequency Array (LOFAR) the core array will have f_cover ∼ 0.016 and D_tel ∼ 2 km. LOFAR's extended baselines, out to 350 km and possibly beyond, are not expected to be useful for high-redshift 21-cm observations because of the small f_cover of the extended array, although they will be used in foreground subtraction. It is anticipated that LOFAR will be able to detect 21-cm emission out to a redshift of z ≈ 11.5, but sensitivity limitations will make mapping very difficult. Plans for the Square Kilometre Array (SKA) have not been finalized, but it is expected to have f_cover ∼ 0.02 out to a diameter of ∼6 km (ℓ_max ∼ 10⁴) and sparse coverage extending out to 1000-3000 km. The lowest frequency currently anticipated is ∼100 MHz, which corresponds to z ∼ 13. It is anticipated that the core will be able to map the 21-cm emission with a resolution of δθ = Δθ/2 ∼ 0.5 arcmin. For reference, what we call the pixel width is given by Δθ = 2δθ ∼ π/ℓ_max, or 1.08 × 10⁴/ℓ_max arcmin.
A resolution of 1 arcmin (FWHM) corresponds to baselines of 7.9 and 73 km at redshifts of 10 and 100, respectively. For our calculations we will concentrate on an SKA-like array with D_tel = 6 km and a redshift range out to z = 13, since the smaller planned telescopes will not be capable of mapping mass at high fidelity.
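These geometric relations are easy to check numerically. The helper names below are ours, and the sketch assumes the simple diffraction estimate θ ≈ λ/b for a baseline b:

```python
import math

LAMBDA_21 = 0.21106  # rest-frame wavelength of the 21-cm line [m]

def wavelength(z):
    """Observed wavelength of redshifted 21-cm emission [m]."""
    return LAMBDA_21 * (1.0 + z)

def baseline_for_fwhm(z, fwhm_arcmin):
    """Baseline [km] whose diffraction scale lambda/b equals the given FWHM."""
    theta = fwhm_arcmin * math.pi / (180.0 * 60.0)   # [rad]
    return wavelength(z) / theta / 1000.0

def ell_max(z, d_tel_km):
    """Highest multipole probed by an array of diameter d_tel: 2*pi*D_tel/lambda."""
    return 2.0 * math.pi * d_tel_km * 1000.0 / wavelength(z)
```

This reproduces baselines of ≈8 and ≈73 km for 1 arcmin FWHM at z = 10 and 100, close to the values quoted above, and gives ℓ_max ≈ 1.3 × 10⁴ for a 6-km array at z = 13 (ν ≈ 101 MHz), consistent with the SKA core figures in the text.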
The fluctuations in the brightness temperature depend on the spin temperature, the ionization state and the density of H I (Field 1959; Madau, Meiksin & Rees 1997). As is commonly done, we will assume that the spin temperature is much greater than the CMB temperature. This leaves fluctuations in the neutral fraction, x_H, and in the baryon density, δ_b = (ρ_b − ρ̄_b)/ρ̄_b, as the sources of fluctuations. We make the simplifying assumption that x_H = 1 until the universe is very rapidly and uniformly re-ionized. Realistically, the re-ionization process will be inhomogeneous and may extend over a significant redshift range. This will increase C_ℓ(ν), perhaps by a factor of 10, on scales larger than the characteristic size of the ionized bubbles and thus might be expected to reduce the noise in κ̂ significantly. However, we have derived the noise in the lensing map by approximating the fourth-order statistics of δT_b as they would be for a Gaussian random field. If this remains a good approximation, the lensing noise will indeed be reduced. This is uncertain, however, since during re-ionization the field will clearly not be Gaussian, especially when the neutral fraction is low. A definitive resolution of these uncertainties will not be available until the observations are made. Here we model the fluctuations in the baryons in the same way as in Section 3.2, with linear structure formation and redshift distortions. Fig. 11 shows the S/N, defined as the ratio of the expected signal σ_κ(Δθ) to the noise σ_κ̂(Δθ, Δν), for our SKA-like telescope. Under the assumptions made here, the telescope should be able to make images (2σ) of the dark matter on 1.3-2.5 arcmin scales in 90 d (f_cover = 0.018-0.025). These values are not far from the optimal ones, and increasing the telescope's covering fraction or resolution would markedly improve upon them.
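The brightness-temperature dependence just described is often summarized, in the T_s ≫ T_CMB limit adopted here, by the commonly quoted approximation δT_b ≈ 23 x_H (1 + δ_b)(Ω_b h²/0.02)[(0.15/Ω_m h²)(1 + z)/10]^(1/2) mK (e.g. Zaldarriaga, Furlanetto & Hernquist 2004). A direct transcription, with our own parameter names:

```python
import math

def delta_t_b_mk(z, x_h=1.0, delta_b=0.0, omega_b_h2=0.02, omega_m_h2=0.15):
    """21-cm brightness temperature contrast [mK] in the commonly used
    high-spin-temperature approximation; the 23 mK prefactor and the pivot
    values are the standard literature choices, not values from this paper."""
    return (23.0 * x_h * (1.0 + delta_b) * (omega_b_h2 / 0.02)
            * math.sqrt((0.15 / omega_m_h2) * (1.0 + z) / 10.0))
```

With the pivot parameter values and mean density this gives exactly 23 mK at z = 9; the weak (1 + z)^(1/2) scaling keeps the signal at the tens-of-mK level across the whole redshift range considered, far below the foregrounds discussed below.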
Unlike the irreducible-noise-only case shown in Fig. 6, when thermal noise is added the noise in κ̂ does not approach an asymptotic value at small Δθ. This is because the noise increases very rapidly near the maximum resolution of the telescope, through a combination of effects. First, the intrinsic temperature power spectrum declines at ℓ ≳ 1000. C_ℓ(ν) is also suppressed by a factor of ∼1/Δν when ℓ > D(z)/δr_ν, where δr_ν is the physical width corresponding to the bandwidth. In addition, for a fixed baseline the highest resolution is attained over only a limited range of frequency, which limits the number of redshift bins. With the parameters adopted, the cross-correlation of the temperature between frequency channels never becomes important, because the noise generally dominates when Δν is small and is assumed to have no cross-correlation between channels.
As can be seen from Fig. 11, there is an optimal bandwidth for measuring lensing. At large bandwidths the number of independent frequency bins is limited; at small bandwidths the S/N goes down because C^N_ℓ rises faster than C_ℓ with decreasing Δν for scales ℓ < D(z)/δr_ν. This optimal bandwidth is ∼0.05 MHz for our examples. If there is more structure on small scales, for example when there are ionized bubbles, the optimal bandwidth will decrease. (Telescope web sites: MWA, http://www.haystack.mit.edu/ast/arrays/mwa/; LOFAR, www.lofar.org; SKA, www.skatelescope.org/.)
The optimal bandwidth for lensing is generally smaller than the optimal bandwidth for measuring the brightness temperature itself, as can also be seen in Fig. 11. At the optimal bandwidth the lensing map can have good fidelity even though the temperature map is noise dominated on the same scale. This somewhat counterintuitive situation reflects the fact that it is better to have more independent redshift slices at low S/N than to image the temperature in fewer channels. With a wider bandwidth the temperature can be imaged on the same angular scale as the mass distribution. This suggests that it may be advantageous to use several bandwidths simultaneously.
There are many additional challenges to observing 21-cm radiation from high redshift. The Galactic synchrotron foreground is about four orders of magnitude brighter than the 21-cm signal at ∼180 MHz and rises with decreasing frequency as ν^−2.6. Both this emission and extragalactic foreground sources can, however, be cleaned from the data because they are much smoother in frequency (and, for the Galactic foreground, also in position on the sky) than the 21-cm signal itself. At large frequency separations foreground emission may also decorrelate. Generally, foregrounds pose no more of a problem for mass mapping than for direct mapping of the 21 cm itself. Rapid increases in foreground emission and in the refractive index of the ionosphere with decreasing frequency make observations at higher redshift progressively more difficult. The high-redshift 21-cm absorption (z ≳ 30) will be very difficult to observe, and there are no mature plans to do so at this time. The ionosphere is opaque below ∼10 MHz, or z ≳ 150, so in principle all lower redshifts are accessible from the ground. In practice, the large, time-dependent index of refraction will make it difficult to go below ∼60 MHz without major advances in telescope technology. The ultimate high-redshift 21-cm telescope would be located on the far side of the Moon, where the absence of terrestrial interference and of an ionosphere would allow access to higher redshifts; however, the large collecting area required would make this both technically challenging and expensive.
Much will depend on future instrument design and on the as yet unknown characteristics of the 21-cm absorption/emission, particularly around the epoch of re-ionization. Despite these uncertainties, the planned specifications of the SKA may enable it to make high-fidelity maps of the matter distribution, and if enough area can be surveyed, very good statistical information should be accessible. Realistic upgrades to its collecting area and array size would greatly improve its ability to make mass maps.

C O N C L U S I O N
We have shown that when low-frequency radio telescopes become sufficiently powerful to map the signal from high-redshift 21-cm emission/absorption within a bandwidth of ∼0.05 MHz, the data will necessarily be good enough to map the gravitational shear due to foreground matter. Increasing the resolution of the telescope reduces the intrinsic noise in the shear map, both because the number of statistically independent redshift slices increases and because the number of independent patches on the sky increases. As a result, 21-cm lensing offers the potential of producing high-resolution, high-S/N images of the cosmic mass distribution. Such images would be of enormous value for the study of cosmology and galaxy formation.
For the specific problem of estimating cosmological parameters, the requirements on resolution and redshift range are not particularly demanding, but survey area is of great importance. Even for a full-sky survey with a pixel of radius δθ = 1 arcmin (2 arcmin FWHM) and 10 per cent noise per pixel, the shear power spectrum would be cosmic variance dominated up to ℓ ∼ 10^3. The cosmic variance limit is probably achievable up to ℓ ∼ 10^4 with an array ∼5 km in diameter and a covering factor of several per cent. Cross-correlating several redshift slices with each other and with galaxy-lensing surveys over a significant portion of the sky would begin a new era of very high-precision cosmology.
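The claim of cosmic variance domination can be made concrete with the standard Gaussian error estimate for a power spectrum, ΔC_ℓ/C_ℓ = sqrt(2/((2ℓ+1) f_sky)) (1 + N_ℓ/C_ℓ). A minimal sketch, with an assumed noise-to-signal ratio standing in for the survey details:

```python
import numpy as np

def frac_error(ell, f_sky, noise_to_signal):
    """Fractional Gaussian error on C_ell for a single multipole:
    sqrt(2 / ((2*ell + 1) * f_sky)) * (1 + N_ell / C_ell)."""
    return np.sqrt(2.0 / ((2 * ell + 1) * f_sky)) * (1.0 + noise_to_signal)

ells = np.array([100, 1000, 10000])
# Full-sky survey; an assumed noise-to-signal of 0.1 on these scales (illustrative)
errs = frac_error(ells, f_sky=1.0, noise_to_signal=0.1)
```

When N_ℓ ≪ C_ℓ the error is set almost entirely by the (2ℓ+1) f_sky mode count, i.e. by cosmic variance; binning in ℓ reduces it by a further factor of 1/sqrt(Δℓ).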
The study of structure formation would benefit particularly from higher-resolution observations, however. If a resolution of ∼6 arcsec (FWHM) could be achieved, every halo more massive than that of the Milky Way would be clearly visible back to z ∼ 10. Even with a resolution of ∼1 arcmin (FWHM), all haloes with masses ≳ 2 × 10^13 M⊙ should be individually detected. Connecting these mass maps to images of emission at other wavelengths would provide a tremendous wealth of information about the evolution of structure and the formation of galaxies.

AC K N OW L E D G M E N T S
RBM would like to thank B. Ciardi, P. Madau and H. Sandvik for very useful discussions. We would also like to thank U. Seljak and O. Zahn for very useful comments.

A P P E N D I X A : F I S H E R M AT R I X F O R V I S I B I L I T I E S
The visibility power spectrum can be related to the spherical harmonic power spectrum through the flat-sky correspondence ℓ ≃ 2πu, where the second approximation involved is very good for ℓ ≳ 60. If only a single visibility were measured, the Fisher matrix would take the standard single-mode form (see Tegmark et al. 1997, for a review of likelihood methods in astronomy). Visibilities within ∼σ_u of each other will be correlated, but an estimate of the total Fisher matrix can be made by assuming one independent measurement per correlated region. This gives the formula (62) of Section 5 that is used to calculate the cosmological parameter constraints. The f_sky factor can be seen to arise from the correlations, i.e. the finite resolution, in visibility space. A more sophisticated treatment would allow for partially correlated visibilities within σ_u of each other, which would reduce the noise further.

The Fisher matrix is an estimate of the inverse of the expected covariance matrix of the model parameters at the maximum-likelihood solution. Thus (F^−1)_aa is an estimate of the variance of parameter a after marginalizing over the other parameters. The formalism is outlined here in terms of the temperature, but it is equally valid for the lensing shear or convergence.

The individual antennas in future radio telescope arrays are expected to be of order a few wavelengths in size or smaller, as in the case of dipole antennas. In this case the primary beam covers almost the whole hemisphere. However, subtracting interference and handling the huge data rate will probably require synthesizing a much smaller beam. In addition, the subtraction of Galactic foregrounds will probably not be possible in some regions near the Galactic plane. For these reasons the sky fraction, f_sky, and the shape of the observed fields will be limited for a single pointing of the telescope beam. The sky fraction, and thus the ℓ-space resolution, can be increased with multiple pointings or mosaicking.
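The role of (F^−1)_aa as a marginalized error estimate can be illustrated with a toy two-parameter Fisher matrix; the numbers below are invented purely for illustration.

```python
import numpy as np

# Toy Fisher matrix for two correlated parameters (illustrative values only)
F = np.array([[100.0, 60.0],
              [60.0,  50.0]])

F_inv = np.linalg.inv(F)
sigma_marg = np.sqrt(np.diag(F_inv))    # errors after marginalizing over the other parameter
sigma_cond = 1.0 / np.sqrt(np.diag(F))  # errors with the other parameter held fixed
```

Marginalized errors are always at least as large as conditional ones; it is the off-diagonal (degeneracy) terms that combining 21-cm lensing with galaxy lensing helps to break.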

A P P E N D I X B : L E N S I N G I N F O U R I E R S PAC E
The lensed temperature on the sky is, to first order, T̃(θ, ν) = T(θ, ν) + ∇φ(θ) · ∇T(θ, ν), where T(θ, ν) is the temperature before lensing and φ(θ) is the lensing potential, defined so that the deflection angle is α(θ) = ∇φ(θ). Its Fourier transform is

T̃(ℓ, ν) = T(ℓ, ν) − ∫ [d²ℓ′/(2π)²] ℓ′ · (ℓ − ℓ′) φ(ℓ − ℓ′) T(ℓ′, ν),

and, as a result, to first order

⟨T̃(ℓ, ν) T̃(L − ℓ, ν)⟩ = [ℓ · L C_ℓ + (L − ℓ) · L C_|L−ℓ|] φ(L),   (B3)

where L ≠ 0 and the average is over realizations of the temperature field while the lensing potential is kept fixed (Hu & Okamoto 2002). Note that if the noise is homogeneous it drops out of this equation, so the C_ℓ values are the power spectrum of the brightness-temperature fluctuations alone.

The observed temperature is always binned into frequency channels or bands and smoothed by the telescope's beam. Equation (B4) expresses this observed temperature, T̄(ℓ, ν), as the lensed temperature convolved with the Fourier-space beam Ã(ℓ) and integrated along the line of sight against q_ν(r, ν), the response function of the band centred on ν; T̃_21 is the three-dimensional Fourier transform of the brightness temperature. In equation (B7) the response function is taken to be a boxcar with sharp edges at ν − Δν/2 and ν + Δν/2. In equation (B8), the fact that the universe is matter dominated at the time of the 21-cm emission/absorption is used to express the radial distances in terms of frequency.

As a result of beam smearing, relation (B3) is replaced by the sum of two terms, equations (B9) and (B10). The first is

(2π)² ∫ d²ℓ′ Ã(ℓ′) Ã*(ℓ′ + L) C_ℓ′,   (B9)

which involves only the correlations between temperature modes before lensing and beam smearing; the second (equation B10) contains the lensing signal. It is assumed that D(ν) does not change significantly between frequencies that are significantly correlated. A more rigorous derivation of equation (B11) would work in spherical harmonic space, as has been done by Zaldarriaga et al. (2004), but for the scales of importance here the difference is very small and equation (B11) is considerably easier to evaluate.

Two effects make equations (B9) and (B10) different from the Hu & Okamoto (2002) result (equation B3). The first term (equation B9) represents an aliasing effect caused by the finite size of the beam. It produces a false signal on scales approaching the size of the beam or of the surveyed region, L ≲ 2πσ_u, that will need to be subtracted. The second term (equation B10) is a kind of smoothing of the lensing potential over a scale of ∼2πσ_u. In the limit of a very narrow beam, i.e. a large area in angle, relation (B3) is recovered, except with frequency-binned power spectra. Thus the observations really measure a lensing potential that has been smoothed in Fourier space in a rather complicated fashion.
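The first-order expansion at the start of this appendix, T̃ = T + ∇φ · ∇T, can be checked numerically for a smooth field and a weak deflection. The analytic forms of T and φ below are illustrative assumptions, not the spectra used in this paper; for them, the residual of the expansion should be of order the square of the deflection amplitude.

```python
import numpy as np

n = 128
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")

eps = 1e-3                                   # weak-lensing amplitude (toy value)
T = np.sin(X) * np.cos(2 * Y)                # unlensed temperature (toy field)
dTdx = np.cos(X) * np.cos(2 * Y)             # analytic gradient of T
dTdy = -2 * np.sin(X) * np.sin(2 * Y)

# Deflection angle alpha = grad(phi) for phi = eps * cos(x + y)
ax = -eps * np.sin(X + Y)
ay = -eps * np.sin(X + Y)

T_exact = np.sin(X + ax) * np.cos(2 * (Y + ay))   # exact remapping T(theta + alpha)
T_first = T + ax * dTdx + ay * dTdy               # first-order expansion

err_first = np.max(np.abs(T_exact - T_first))     # residual, O(eps^2)
size_corr = np.max(np.abs(T_exact - T))           # lensing correction, O(eps)
```

The residual of the first-order expansion is smaller than the lensing correction itself by a factor of order eps, which is why the quadratic-estimator formalism works in the weak-deflection regime.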

A P P E N D I X C : C O N V E R G E N C E E S T I M ATO R S
In the main text of this paper we used real-space estimators for the shear and convergence, γ̂_i(θ). We consider this the most intuitive and instructive approach. In the weak-lensing limit the shear map can be converted to a convergence map because both are related to a single lensing potential by differential operators. This is commonly done for galaxy-lensing surveys (see Bartelmann & Schneider 2001, for a review). The most straightforward method is to Fourier transform the shear maps, multiply by ℓ-dependent factors and then transform back to a convergence map (Kaiser & Squires 1993). Averaging this with the γ̂_3 map would produce a convergence map with less noise than the γ̂_3 map alone. However, the reduction in noise will not be as great as in the galaxy-lensing case, because in the 21-cm case the Fourier modes of γ̂_1 and γ̂_2 are correlated, unlike in the galaxy case. A more practical approach from a technical point of view is probably to go directly from visibility space to a convergence map in real space.
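The Fourier-space conversion mentioned above (Kaiser & Squires 1993) can be sketched on a periodic grid: κ(ℓ) = [(ℓ_1² − ℓ_2²) γ_1(ℓ) + 2 ℓ_1 ℓ_2 γ_2(ℓ)] / ℓ². Here we generate the two shear components from a known test convergence and then invert them; the grid size and test map are arbitrary illustrative choices, not a simulation of 21-cm data.

```python
import numpy as np

n = 64
k = 2 * np.pi * np.fft.fftfreq(n)
l1, l2 = np.meshgrid(k, k, indexing="ij")
l_sq = l1**2 + l2**2
l_sq[0, 0] = 1.0            # avoid division by zero; the l = 0 mode is unconstrained

# A smooth, zero-mean test convergence map on a periodic patch
X, Y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
kappa = np.exp(-((X - n / 2) ** 2 + (Y - n / 2) ** 2) / 50.0)
kappa -= kappa.mean()

# Forward relations: shear from convergence in Fourier space
kappa_l = np.fft.fft2(kappa)
gamma1 = np.fft.ifft2((l1**2 - l2**2) / l_sq * kappa_l).real
gamma2 = np.fft.ifft2(2 * l1 * l2 / l_sq * kappa_l).real

# Kaiser & Squires inversion: convergence recovered from the two shear maps
g1_l, g2_l = np.fft.fft2(gamma1), np.fft.fft2(gamma2)
kappa_rec = np.fft.ifft2(((l1**2 - l2**2) * g1_l + 2 * l1 * l2 * g2_l) / l_sq).real
```

Because (ℓ_1² − ℓ_2²)² + (2 ℓ_1 ℓ_2)² = ℓ⁴, the round trip is exact for every mode except ℓ = 0, which carries the unconstrained mass-sheet degree of freedom.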
Many convergence estimators in visibility or Fourier space are possible. Our real-space estimators can be Fourier transformed to make a set of estimators for the Fourier modes of shear and convergence, but the Fourier estimator of Hu & Okamoto (2002) has the advantage of having the lowest noise level for a single frequency bin if the temperature distribution is Gaussian and the beam is infinitely large (in angle). Zahn & Zaldarriaga (2006) find an estimator in both angular Fourier space and frequency Fourier space which is optimal under the added assumption that the frequency Fourier modes are statistically independent and Gaussian distributed. The statistical independence of these modes will break down because of binning in frequency and, to a lesser extent, because of the finite range in frequency. For this reason it is difficult to determine how bandwidth will affect the noise in their estimator. Instead we choose to use the Hu & Okamoto (2002) estimator for each frequency band and then weight the bands.
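The per-band weighting mentioned in the last sentence is naturally an inverse-variance weighting, which minimizes the variance of the combined estimate when the bands are independent. A minimal sketch, with hypothetical band values and noise variances:

```python
import numpy as np

def combine_bands(estimates, variances):
    """Inverse-variance weighted combination of per-band estimates.

    estimates, variances: per-band estimator values and their noise variances.
    Returns the combined estimate and its variance.
    """
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.sum(w * np.asarray(estimates, dtype=float)) / np.sum(w)
    var = 1.0 / np.sum(w)
    return est, var

# Three hypothetical frequency bands measuring the same shear mode
est, var = combine_bands([0.02, 0.03, 0.025], [1e-4, 4e-4, 2e-4])
```

The combined variance, 1/Σ(1/σ_i²), is always smaller than that of the best single band, which is why even noisy bands contribute usefully to the final map.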
We consider a second-order estimator for the shear or convergence where, as in the main text, γ̂_1,2 denote the estimators for the two components of shear and γ̂_3 the estimator for the convergence. In this case the estimators are of the form