Constraining our Universe with X-ray and optical cluster data

We have used recent X-ray and optical data to impose constraints on the cosmology and on cluster scaling relations. Generically, two kinds of hypotheses define our model. First, we consider that the cluster population is well described by the standard Press–Schechter (PS) formalism; secondly, these clusters are assumed to follow scaling relations with mass: temperature–mass (T–M) and X-ray luminosity–mass (L_x–M). In contrast with many other authors, we do not assume specific scaling relations to model cluster properties, such as the usual T–M virial relation or an observational determination of the L_x–T relation. Instead, we consider general scaling relations with unconstrained parameters. With the previous model (PS plus scalings) we fit our free parameters to several X-ray and optical data sets, with the advantage over preceding works that we consider all the data sets at the same time. This prevents us from being inconsistent with some of the available observations. Among other interesting conclusions, we find that only low-density universes are compatible with all the data considered and that the degeneracy between Ω_m and σ_8 is broken. We also obtain interesting limits on the parameters characterizing the scaling relations.


INTRODUCTION
In recent years, the quality and quantity of new data sets coming from several X-ray missions have allowed a more precise study of the properties of galaxy clusters. These data, together with optical data sets, have allowed many authors to compare the predictions of different models with observations. The standard approach is to simulate the data for a given parameter-dependent model and then, by using an estimator (likelihood, χ², etc.), to look for the best-fitting model, that is, the parameter combination that best fits the data.
In this process several assumptions are usually made. The most common concerns the cluster population: normally, it is assumed that the cluster population is well described by the Press–Schechter (PS) formalism (Press & Schechter 1974). This approach is supported by N-body numerical simulations, which do show good agreement with the PS parametrization (Efstathiou et al. 1988; White, Efstathiou & Frenk 1993; Lacey & Cole 1994; Borgani et al. 1999).
Another assumption usually made is the scaling of the temperature of the cluster with its mass, the T–M relation, which is taken as the virial relation (Eke, Cole & Frenk 1996). A T–M relation is necessary, for instance, to build the temperature function of clusters (see Section 3). However, it is not clear to what extent the virial assumption holds for clusters, especially for those at high redshift. Several works show that the relation between mass and temperature has an exponent close to or equal to the virial exponent, M ∝ T^{3/2} (Evrard, Metzler & Navarro 1996; Horner, Mushotzky & Scharf 1999; Neumann & Arnaud 1999). However, the isothermal β-model and X-ray surface-brightness deprojection masses follow a steeper M ∝ T^{1.8–2.0} scaling (Horner et al. 1999).
There are other scaling relations that are not well understood, in the sense that they depend on the data used to build them and also on the method used to fit the data. A good example is the luminosity–temperature relation (L_x–T). From the literature one can find scaling relations ranging from L_x ∝ T^{2.6} (Markevitch 1998) to L_x ∝ T^{3.3} (David et al. 1993), the most common being L_x ∝ T^{2.9} (White, Jones & Forman 1997; Arnaud & Evrard 1999; Reichart, Castander & Nichol 1999). More and better data will be needed to resolve this discrepancy in the exponent. Fabian et al. (1994) noted that the scatter is mostly caused by clusters with strong cooling flows; see also White et al. (1997) for a good discussion of the effect of cooling flows. The method used to fit the L_x–T data can also explain part of this scatter: conventional least-squares regression analysis assumes the abscissae have zero error. This problem is overcome, for instance, by the use of an algorithm that takes into account errors in both dimensions of the data (White et al. 1997).
At present, X-ray observations are the best available data to study clusters. The amount of available X-ray data is increasing fast, and in the near future larger data sets will be available. The strong X-ray emission from the hot gas in the intracluster medium makes X-ray surveys an ideal way to detect clusters of galaxies. New catalogues of clusters, with the advantage of being X-ray selected, have been published in recent years, and new ones are in preparation.
Clusters have been used to impose constraints on cosmology in several papers (Oukbir & Blanchard 1992; Luppino & Gioia 1995; Donahue 1996; Eke et al. 1996; Kitayama & Suto 1997; Oukbir & Blanchard 1997; Mathiesen & Evrard 1998; Donahue & Voit 1999; and many others). Clusters are the largest gravitationally bound objects in the Universe and represent the final stage of the peaks in the primordial matter distribution. Their distribution in mass–redshift (M, z) space is the fingerprint of those primordial fluctuations. The cluster abundance and its evolution constitute an essential cosmological test. Modelling the mass function depends only on cosmological parameters and not on any cluster scaling relation such as T–M or L_x–T, thereby allowing a determination of the cosmological parameters independently of any assumption concerning the scaling relations. For this reason many authors have tried to determine the cluster mass distribution as a function of redshift, the mass function (Bahcall & Cen 1993; Biviano et al. 1993; Girardi et al. 1998). These authors found many difficulties in attempting a direct determination of the mass function. Basically, the problem is that the mass estimators are usually based on different assumptions (spherical symmetry, virialization, hydrostatic equilibrium). Lensing determinations work well, but the number of clusters with masses determined by this technique is too small to build a mass function from them.
An improvement is to compare the models with the data using other X-ray-derived functions (luminosity, flux, temperature). The advantage of using X-ray data is that the determination of the luminosity, flux or temperature of a cluster is in general less affected by systematic errors than the usual mass determination based on radial velocities of galaxies.
In this paper we want to extract some information concerning clusters and cosmological parameters from cluster data. Our aim is to find a model (PS plus scalings) which fits different observational data. This model will be realistic in the sense that it describes present observations (mass, temperature, X-ray luminosity and flux functions). This work follows many others but with two main differences. First, in our model we will allow a large number of free parameters (nine) instead of the one- or two-free-parameter models usually assumed. This will prevent us from making incorrect assumptions about the T–M or L_x–M scalings which could affect the final conclusion. Our second difference is that we will consider different data sets simultaneously. This is an important point: as we will show in Section 2, some models with a good fit to certain data sets are nevertheless inconsistent with others.
The structure of the paper is the following. In Section 2, we describe the different data sets that will be used in the fits, and in Section 3 we describe the model used to fit these data. In Section 4, we search for the best model fitting the different data sets and discuss the best model estimator. In Section 5, we discuss the main results and compare them with previous works. Finally, Section 6 includes the main conclusions of this paper and some implications for future X-ray and cosmic microwave background (CMB) experiments. Throughout this paper we assume H_0 = 100 h km s^-1 Mpc^-1. Although we work in h units, this assumption should be taken into account when comparing with other results.
The first one is the mass function given in Bahcall & Cen (1993), which is built from a compilation of optical data of nearby clusters (z < 0.1). These data have several uncertainties, owing mainly to the poor precision in the determination of cluster masses. They estimated the masses through the richness and velocity dispersion of the clusters. More sophisticated methods, such as lensing estimates, would be preferable in order to achieve a good mass function, but unfortunately the number of clusters with masses estimated from gravitational lensing is too small. It is important to bear in mind that masses in Bahcall & Cen (1993) were obtained from proportionality laws between cluster richness and mass, or velocity dispersion and mass. Therefore, these mass estimates should be considered as inferred masses and not as direct measurements. There are other, more recent determinations of the mass function (Girardi et al. 1998), but they suffer from the same problems. From the theoretical point of view, the mass function has the advantage of depending only on the cosmological parameters and not on the parameters of the T–M or L_x–M relations. Therefore, the mass function is very useful to constrain the cosmological parameters.
We would like to point out that the original mass function given in Bahcall & Cen (1993) is a cumulative mass function. We have computed the differential mass function from it by taking the difference between consecutive bins; the corresponding error bars are built from the original ones by adding them in quadrature. It is also important to note that in Bahcall & Cen (1993) masses are estimated within a radius of 1.5 h^-1 Mpc, whereas our masses are estimated within the virial radius. As a first approximation we will consider the mass within a sphere of 1.5 h^-1 Mpc centred on the cluster and the virial mass to be equivalent. This is justified because the virial radius is well approximated by r_v ≈ 1.3 M_15^{1/3} (1+z)^-1 h^-1 Mpc (with M_15 the mass in units of 10^15 h^-1 M_⊙), which, for typical clusters, is of the order of 1 h^-1 Mpc. Clusters with masses M < 1.5 × 10^15 h^-1 M_⊙ will have a virial radius below 1.5 h^-1 Mpc. In our model we have considered a cluster density profile truncated beyond the virial radius; therefore, these clusters will have the same masses at larger radii (1.5 h^-1 Mpc). Some problems could arise with very massive clusters, M > 1.5 × 10^15 h^-1 M_⊙, but these are very rare and the correction factor will in any case be small. In order to account for evolutionary effects in the mass function, we have also considered another data set: the evolution of the mass function for massive clusters (Bahcall & Fan 1998). In this data set the error bars are large, but the data are good enough to constrain the cosmology further. Bahcall & Fan have demonstrated that combining the two data sets can impose strong constraints on Ω_m and σ_8.
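The cumulative-to-differential conversion described above can be sketched numerically. This is a minimal illustration with fabricated bin values, not the actual Bahcall & Cen (1993) data; the function name and units are our own.

```python
import numpy as np

def cumulative_to_differential(mass_edges, n_cum, n_cum_err):
    """Convert a cumulative mass function N(>M), sampled at bin edges,
    into a differential one, dN/dM, by differencing consecutive bins.
    The errors of the two bins involved are added in quadrature."""
    mass_edges = np.asarray(mass_edges, dtype=float)
    n_cum = np.asarray(n_cum, dtype=float)
    n_cum_err = np.asarray(n_cum_err, dtype=float)
    dm = np.diff(mass_edges)                  # bin widths
    dn_dm = -np.diff(n_cum) / dm              # N(>M) decreases with M
    err = np.sqrt(n_cum_err[:-1]**2 + n_cum_err[1:]**2) / dm
    m_centres = 0.5 * (mass_edges[:-1] + mass_edges[1:])
    return m_centres, dn_dm, err

# fabricated cumulative values: masses in 1e14 h^-1 Msun, densities in h^3 Mpc^-3
m_edges = [2.0, 4.0, 8.0]
n_cum = [1e-5, 2e-6, 1e-7]
n_err = [3e-6, 8e-7, 6e-8]
centres, dndm, errs = cumulative_to_differential(m_edges, n_cum, n_err)
```

Differencing amplifies the relative errors, which is why the quadrature sum of the two cumulative error bars is attached to each differential point.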
Obviously, the best models found by Bahcall & Fan should be compatible with other data sets, but we have shown that this is not true in general. If we take models with a good fit to both the mass function and the evolution of the mass function, we find that only a few of those models also fit other data sets well (for example the luminosity function; see Fig. 1). This is the main reason why we decided to work with several presently available cluster data sets at the same time, looking for the model that simultaneously fits the different data sets best. The additional data sets come from X-ray observations. Several X-ray cluster catalogues have been published recently (Rosati et al. 1995; Burke et al. 1997; Collins et al. 1997; Scharf et al. 1997; Ebeling et al. 1998; Vikhlinin et al. 1998; de Grandi et al. 1999; Voges et al. 1999; Romer et al. 2000, and references therein). Some of these catalogues are deeper in flux than others, they have different sky coverages, and the techniques used to detect the clusters also differ (wavelets, Vikhlinin et al. 1998; Voronoi tessellation and percolation, Ebeling & Wiedenmann 1993), but they show a remarkable agreement in their results. Particularly remarkable is the good agreement in the luminosity function among all these works, showing that the estimation of the luminosity function is a robust indicator of the cluster population; this function will be very useful in the process of fitting our model. For the luminosity function we have used the estimation of Ebeling et al. (1997). This luminosity function is built from a ROSAT 90 per cent flux-complete sample of ~200 bright clusters (brightest cluster sample, BCS) in the northern hemisphere at high galactic latitudes (|b| ≥ 20°), with measured redshifts z ≤ 0.3 and fluxes higher than 4.4 × 10^-12 erg cm^-2 s^-1 in the 0.1–2.4 keV band. Different determinations of the luminosity function have been given in the literature (Rosati et al.
1998; Vikhlinin et al. 1998), all of them compatible with that of Ebeling et al. (1997). We would like to point out that this curve is given for an Einstein–de Sitter universe with q_0 = 0.5. In order to build the luminosity function it is necessary to assume a cosmological model for the computation of the luminosity distance and the comoving volume. We have checked the effect of changing Ω_m in this function and found it negligible when dealing with redshifts below 0.3, as in this case. For higher-redshift data the effect is still small, as can be appreciated from fig. 1 of Bahcall & Fan (1998), where the authors show the data for three different models.

[Figure 1. An example of a bad model. The fit is good in the case of the mass and temperature functions, but this model does not reproduce the other two curves. The model takes typical parameter values commonly used in the literature (see Section 3.2 and Section 5 for an explanation of these parameters and a discussion of their values).]
Furthermore, there are other functions that can be used as a test of our model. In particular, the flux function is relatively well established (there is only a small scatter among the estimates of different authors). The main difference between the flux and luminosity functions lies in the redshift and cosmological model assumed. The flux function is a direct measure in the sense that it does not contain any information about the distance (redshift plus cosmological model) from which the cluster is emitting. In contrast, the luminosity function does contain this additional information. Both functions are obviously connected through the assumed model.
For the flux function we used that given by Rosati et al. (1998) for low-flux clusters, and for the bright part of the curve we used the function of de Grandi et al. (1999). The sample of Rosati et al. (1998) (ROSAT Deep Cluster Survey, RDCS) covers the redshift range 0.05–0.8 and is a complete flux-limited subsample of 70 galaxy clusters, representing the brightest half of the full sample, which have been spectroscopically identified down to the flux limit 4 × 10^-14 erg cm^-2 s^-1 in the 0.5–2.0 keV band. The sky coverage of the RDCS is small (48 deg²), while the sample of de Grandi et al. has a larger sky coverage (8235 deg²) but a higher limiting flux (~3.5 × 10^-12 erg cm^-2 s^-1 in the 0.5–2.0 keV band) and is therefore shallower (z ≤ 0.3) than the RDCS. The final curve we have used to constrain our model is the temperature function. Henry & Arnaud (1991) compiled a temperature function from a sample of 25 nearby clusters. Their sample is X-ray selected and comes from Lahav et al. (1989), subject to the additional restrictions that the flux in the 2–10 keV band must be ≥ 3 × 10^-11 erg cm^-2 s^-1 and the galactic latitude |b^II| ≥ 20° (see Piccinotti et al. 1982). The sample is more than 90 per cent complete, and redshifts range between z = 0.0036 and 0.09.
The temperature function of Henry & Arnaud (1991) is known to suffer from some errors (Eke et al. 1996; Markevitch 1998; Henry 2000), but, as mentioned in Eke et al. (1998) and Henry (2000), these errors largely cancel. The temperature function is usually presented in integral form; a determination of the differential temperature function requires binning the data and averaging over the objects in each bin. This procedure introduces some arbitrariness that the integral form avoids. However, because our method is based on χ² quantities, we need the temperature function in differential form. The arbitrariness of this binned function could be reduced significantly by increasing the number of clusters with measured temperatures, but there are few clusters whose temperatures are known precisely, and consequently the differential temperature function is poorly determined. In order to check the validity of the Henry & Arnaud (1991) temperature function against more recent data, we computed a binned version of the temperature function using the data of Henry (2000). Our estimate of the differential temperature function was in good agreement, within the error bars, with that of Henry & Arnaud (1991). Owing to this agreement, and to the large error bars of this function, our results will not depend significantly on the choice of temperature function.
Although the temperature function is affected by large error bars, its use is justified because, unlike the luminosity or flux functions, only the T–M relation is needed to build it. To compute the theoretical luminosity and flux functions from the PS formalism, both the L_x–M and T–M relations are needed: the first to obtain the bolometric luminosities from the masses, and the second to obtain the luminosities in the observed band. Hence, the temperature function is less affected by the uncertainties in the cluster scaling relations than the luminosity and flux functions. Recent determinations of the temperature function can be found in Blanchard et al. (2000) and Henry (2000); they are compatible with that of Henry & Arnaud (1991).

The information about the redshift and sky coverages, the limiting flux, and the energy band in which luminosities and fluxes are given is needed in order to correctly simulate the data following the characteristics of the observations. The total number of clusters, and thereby the error bars, will depend on the redshift and sky coverages and also on the limiting fluxes. The shape of the functions will depend on the limiting flux because, on lowering the limiting flux, less massive and more distant clusters will be selected. Energy-band and K-corrections must also be included in order to correct to bolometric luminosity. Finally, the cluster number densities are based on the computation of V_max, the maximum volume within which the cluster could lie and still remain in the sample; these volumes therefore depend on the limiting flux [see Page & Carrera (2000) for a good discussion of the 1/V method].
All of these observational features will be considered to perform a bias test using Monte Carlo simulations of the models in Section 4.
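As an illustration of the V_max construction mentioned above, the following sketch computes a 1/V_max density for a flux-limited sample. It assumes an Einstein–de Sitter distance–redshift relation, works in arbitrary but consistent luminosity and flux units, and every name and sample value is hypothetical rather than taken from the surveys discussed here.

```python
import numpy as np

C_H0 = 2997.9  # Hubble distance c/H0 in h^-1 Mpc

def comoving_distance(z):
    """Einstein-de Sitter comoving distance, d_C = 2(c/H0)[1 - 1/sqrt(1+z)]."""
    return 2.0 * C_H0 * (1.0 - 1.0/np.sqrt(1.0 + z))

def z_max_for_flux(lum, flux_lim, z_hi=5.0):
    """Largest redshift at which a cluster of luminosity `lum` stays above
    the survey flux limit: solve L / (4 pi d_L^2) = S_lim by bisection."""
    lo, hi = 1e-6, z_hi
    for _ in range(60):
        mid = 0.5*(lo + hi)
        d_l = (1.0 + mid)*comoving_distance(mid)   # luminosity distance
        flux = lum/(4.0*np.pi*d_l**2)
        lo, hi = (mid, hi) if flux > flux_lim else (lo, mid)
    return lo

def one_over_vmax(lums, flux_lim, z_survey, sky_fraction):
    """Sum of 1/V_max over the sample: each cluster contributes the inverse
    of the maximum comoving volume in which it would remain in the sample,
    capped by the survey redshift limit and scaled by the sky coverage."""
    density = 0.0
    for lum in lums:
        z_m = min(z_max_for_flux(lum, flux_lim), z_survey)
        v_max = sky_fraction*(4.0/3.0)*np.pi*comoving_distance(z_m)**3
        density += 1.0/v_max
    return density
```

The cap at the survey redshift limit and the sky-coverage factor are what make V_max survey-specific, which is why each catalogue's limiting flux and solid angle must enter the simulation separately.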
These data sets are not completely independent: some clusters are common to the different catalogues, and in principle one should consider the dependence between the data, although it can be shown that this dependence is not very significant. Luminosities and fluxes can be treated as independent because computing the luminosity from the flux requires the redshift; since the redshift is an independent variable with respect to the flux, the luminosity should also be considered independent of the flux. The temperature is another independent quantity, so we do not expect correlations between this data set and the others. However, there is a clear correlation between the first data point in the evolution of the mass function and one point of the local mass function: the information given by the comoving number density of clusters N(M > 8.0 × 10^14 h^-1 M_⊙) at z = 0 is contained in both data sets. Apart from this, we consider the rest of our data points to be independent.
The situation is different with the theoretical curves. The model will introduce some correlations among the curves, as we will see in the next section.

The Press–Schechter formalism
As in previous works, the starting point of our model is the mass function, which contains information about how many clusters there are at a given redshift and how massive they are. We adopt the standard Press–Schechter formalism (Press & Schechter 1974), which has been shown to be very consistent with N-body simulations (Lacey & Cole 1994; Borgani et al. 1999).
In this formalism the cluster number density per unit mass, as a function of mass and redshift, is given by

  dN(M, z)/(dV dM) = sqrt(2/π) (ρ̄/M) [δ_co(z)/σ_M²] |dσ_M/dM| exp[−δ_co(z)²/(2σ_M²)],

where ρ̄ is the present-day average matter density, ρ̄ = Ω_m × 2.7755 × 10^11 h² M_⊙ Mpc^-3, and δ_co(z) is the linear-theory overdensity, extrapolated to the present time, of a uniform spherical fluctuation that collapsed at redshift z. For an Ω_m = 1 model we have used δ_co(z) = 1.6865(1+z), and for Ω_m < 1 we take δ_co(z) = [D(0)/D(z)] δ_c(z), where D(z) is the linear growth factor (Peebles 1980); the corresponding expressions for an open Λ = 0 model and for a flat ΛCDM model can be found in Kitayama & Suto (1996) and Mathiesen & Evrard (1998). σ_M is the rms of the density fluctuation at the mass scale M, which is related to the power spectrum of density fluctuations P(k) through

  σ_M² = (1/2π²) ∫ P(k) W²(kR) k² dk,

where the window function W(kR) is introduced in order to select the volume from which the object of mass M will form. We have used the standard top-hat window, whose Fourier transform is W(kR) = 3[sin(kR) − (kR)cos(kR)]/(kR)³. R is the comoving scale corresponding to the mass M, the relation between the two being M = (4/3)π ρ̄ R³. For the power spectrum we have used the parametrization

  P(k) = A k^n T²(k).

The amplitude A is computed from the expression for σ_M above by taking the mass corresponding to R = 8 h^-1 Mpc, so that σ_M equals σ_8 at that scale. n is the primordial power spectrum index, which we fix to the Harrison–Zel'dovich case n = 1 according to determinations from CMB data (COBE-DMR, Bennett et al. 1996; MAXIMA, Balbi et al. 2000). Finally, T(k) is the transfer function, for which we have used the fit given in Bardeen et al. (1986) for an adiabatic CDM model:

  T(q) = [ln(1 + 2.34q)/(2.34q)] [1 + 3.89q + (16.1q)² + (5.46q)³ + (6.71q)⁴]^{-1/4},

where q = k/(Γ h Mpc^-1), with Γ the shape parameter of the power spectrum. For a CDM model with negligible Ω_b, Γ ≈ Ω_m h. We have considered the following as an additional constraint in our calculations.
Although all our data sets and quantities are h-independent (everything is in h units), we have considered only those models for which the ratio Γ/Ω_m lies between the conservative limits 0.5 < h < 0.75, thus avoiding CDM models which could be inconsistent with recent determinations of h.
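A minimal numerical sketch of the formalism above: the top-hat σ_M built from the BBKS transfer function, normalized to σ_8, feeding the Ω_m = 1 Press–Schechter mass function. The grid sizes, integration scheme and function names are our own choices for illustration, not the authors'.

```python
import numpy as np

RHO_CRIT = 2.7755e11  # critical density in h^2 Msun Mpc^-3

def bbks_transfer(k, gamma):
    """BBKS (Bardeen et al. 1986) adiabatic CDM transfer function; k in h Mpc^-1."""
    q = k / gamma
    return (np.log(1.0 + 2.34*q)/(2.34*q)
            * (1.0 + 3.89*q + (16.1*q)**2 + (5.46*q)**3 + (6.71*q)**4)**-0.25)

def sigma_R(R, gamma, n=1.0, amp=1.0):
    """rms top-hat density fluctuation at scale R (h^-1 Mpc) for P(k) = A k^n T(k)^2."""
    k = np.logspace(-4.0, 2.0, 4000)       # h Mpc^-1, uniform in log k
    w = 3.0*(np.sin(k*R) - k*R*np.cos(k*R))/(k*R)**3
    integrand = amp * k**n * bbks_transfer(k, gamma)**2 * w**2 * k**2/(2.0*np.pi**2)
    dlnk = np.log(k[1]/k[0])
    return np.sqrt(np.sum(integrand*k)*dlnk)   # int dk f = int dlnk k f

def ps_mass_function(M, omega_m, sigma8, gamma, z=0.0):
    """Press-Schechter dN/(dV dM), using the Omega_m = 1 threshold
    delta_co(z) = 1.6865 (1+z); M in h^-1 Msun, volumes in h^-3 Mpc^3."""
    M = np.atleast_1d(np.asarray(M, dtype=float))
    rho_bar = omega_m*RHO_CRIT
    amp = (sigma8/sigma_R(8.0, gamma))**2          # normalize to sigma_8
    radius = lambda m: (3.0*m/(4.0*np.pi*rho_bar))**(1.0/3.0)
    sig = np.array([sigma_R(radius(m), gamma, amp=amp) for m in M])
    dM = 1e-3*M
    sig_p = np.array([sigma_R(radius(m + d), gamma, amp=amp) for m, d in zip(M, dM)])
    dsig_dm = (sig_p - sig)/dM                     # numerical derivative
    delta_c = 1.6865*(1.0 + z)
    return (np.sqrt(2.0/np.pi)*(rho_bar/M)*(delta_c/sig**2)
            * np.abs(dsig_dm)*np.exp(-delta_c**2/(2.0*sig**2)))
```

The normalization step mirrors the text: the amplitude A is fixed by demanding σ_M = σ_8 at R = 8 h^-1 Mpc, after which the mass function depends only on (Ω_m, σ_8, Γ) and redshift.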
In the previous formalism there are two main variables: the mass and the redshift of the cluster. Therefore, the Press–Schechter mass function, which predicts the density of clusters expected at a given redshift and mass, can be considered (after normalizing by the total number) as the probability distribution of clusters in the mass–redshift (M–z) space. The cosmological parameters in this formalism are basically three: the density of the universe Ω_m, the amplitude of the power spectrum, which we parametrize in terms of σ_8, and the shape parameter of the power spectrum Γ.
We can compare this model with real observations of the mass function and by doing this we can obtain some information concerning these three parameters. This has been done in several works (Bahcall & Cen 1993;Girardi et al. 1998) and the conclusions are very interesting. These works have shown for instance that low-density universes are more compatible with the observed mass function.
However, there are some problems with these works. First, the quality of the data is not very good, mainly because most of the masses have been estimated using radial velocities of cluster galaxies. Secondly, the mass functions are built only for nearby clusters and do not contain any information about the cluster abundance at high redshift. There are some attempts to estimate the evolution of the mass function with redshift and, although the error bars are very large, one can obtain very interesting constraints on the cosmological parameters using this evolution (Bahcall & Fan 1998). This indicates that accurate information on the cluster abundance at high redshift would be a very powerful way to constrain the cosmology. Unfortunately, the mass function of clusters at high redshift is not well determined yet, but there are other functions that can be used in addition to the mass function. Recent X-ray experiments (Einstein, ASCA, ROSAT) have determined the temperature, luminosity and flux for several hundred clusters, some of them at medium and high redshift (up to z = 0.8 in the RDCS). This information can be used to build new functions, similar to the mass function, based on the temperature, luminosity and flux of the clusters. For instance, the expected temperature function up to a given redshift, dN/dT (which can be compared with the corresponding observational temperature function), is given by the integral over the redshift interval of

  dN(T, z)/(dV(z) dT) = [dN(M, z)/(dV(z) dM)] (dM/dT),

where dN(M, z)/(dV(z) dM) is the Press–Schechter mass function. In order to build this function we need to calculate the derivative dM/dT, and hence a T–M relation is required. Usually the virial relation T ∝ M^{2/3}(1+z) is assumed, though, as discussed below, we will introduce free parameters to describe this relation.
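The change of variables from mass to temperature can be sketched as follows. We assume a virial-like scaling T = T0 · M15^a · (1+z)^c with defaults a = 2/3, c = 1; the power-law mass function used at the end is purely a toy for illustration.

```python
import numpy as np

def temperature_function(mass_function, T, T0=1.0e8, a=2.0/3.0, c=1.0, z=0.0):
    """Turn a mass function dN/dM (with M in units of 1e15 h^-1 Msun) into a
    temperature function dN/dT via T = T0 * M15^a * (1+z)^c."""
    T = np.asarray(T, dtype=float)
    m15 = (T/(T0*(1.0 + z)**c))**(1.0/a)   # invert the T-M relation
    dm_dt = m15/(a*T)                      # Jacobian dM/dT for a power law
    return mass_function(m15)*dm_dt

# purely illustrative power-law mass function, not a fitted one
toy_mf = lambda m15: 1e-5*m15**-2
dndT = temperature_function(toy_mf, np.array([2e7, 5e7, 1e8]))  # T in kelvin
```

The Jacobian dM/dT = M/(aT) is where the T–M exponent enters; a different choice of a or c reshapes and renormalizes the predicted temperature function even for a fixed mass function, which is exactly why these parameters are left free in the fit.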
To build the X-ray luminosity and flux functions we proceed in the same way, but in this case we need the relation between the mass and the X-ray luminosity of the cluster, the L_x–M relation. There have been few attempts to determine the L_x–M relation observationally, but the situation is different for the L_x–T relation (David et al. 1993; Markevitch 1999; Reichart et al. 1999). These works show a scaling L_x ∝ T^{2.6–3.3}. The exponent depends on whether or not clusters with cooling flows are considered, being higher when clusters with cooling flows enter the analysis. Another contribution to the scatter is that different statistical methods have been used to analyse the data (White et al. 1997).
Using the T–M relation and the L_x–T scaling it is possible to build an L_x–M relation, which can be used to construct the luminosity and flux functions.

Cluster scaling relations
Starting from the Press–Schechter mass function plus the T–M and L_x–M relations, the idea of this work is therefore to build the mass function itself and the remaining curves: the temperature, X-ray luminosity and flux functions. We will compare these curves with the corresponding observational data sets and, by varying our model parameters, look for the best model simultaneously compatible with all the different data sets.
So all we need to know are the T–M and L_x–M relations.
For the T–M relation, the most common model comes from the virial theorem plus the spherical collapse model and the assumption of an isothermal gas distribution (Eke et al. 1996), which yields T ∝ M^{2/3}(1+z). The shortcomings of this relation are well known (Eke et al. 1996; Kitayama & Suto 1996; Viana & Liddle 1996; Voit & Donahue 1998). Basically, the problem is that this assumption holds only for virialized objects. In the case of clusters this is more or less true at low redshift, where the equilibrium conditions required by the virial theorem are achieved, but we do not know what happens at high redshift. Similar problems affect the redshift evolution of this relation. As discussed in Voit & Donahue (1998), the consequences of using an inaccurate T–M relation can be quite significant. For these reasons, we will consider this relation as unconstrained and adopt the following T–M relation, with no prior assumption about the parameters:

  T = T_0 M_15^a (1+z)^c,

where M_15 is the cluster mass in units of 10^15 h^-1 M_⊙. For the L_x–M relation the situation is similar: it is not well established, and we prefer to allow it to be a free-parameter relation,

  L_x^Bol = L_0 M_15^b (1+z)^f.

Since T_gas is in kelvin, L_x^Bol in h^-2 erg s^-1 and the mass in units of 10^15 h^-1 M_⊙, additional factors h^a and h^b must be introduced in T_0 and L_0, respectively, in order to make our results h-independent.
From the previous L_x–M relation it is possible to build the S_x–M relation by simply considering

  S_x = L_x/[4π d_L(z)²],

with d_L(z) the luminosity distance. In this formalism the L_x–T relation takes the form

  L_x = L_0 (T/T_0)^γ (1+z)^δ,

where γ = b/a is the familiar exponent of the L_x–T relation and δ = f − cb/a. Within this framework we have a total of nine free parameters: σ_8, Γ, Ω_m, T_0, a, c, L_0, b and f (equivalently, we can use γ = b/a instead of b, and δ = f − cγ instead of f). We have also considered the two situations of flat Λ = 1 − Ω_m models (ΛCDM) and open Λ = 0 models (OCDM). There are some experimental determinations of the parameters in T–M and L_x–M. For instance, many works have shown that a is compatible with the predicted virial value a = 2/3 (Neumann & Arnaud 1999), but scaling exponents a ~ 0.5 are also possible (Horner et al. 1999; Nevalainen, Markevitch & Forman 2000). The normalization of the T–M scaling has been determined by many authors, who find typical values of T_0 ~ 1.0 × 10^8 K (Horner et al. 1999). Not much work has been done on the determination of the redshift exponent c, because the data and redshift coverage are too poor to fit this exponent, but what is usually found is that it too is compatible with the virial prediction, c ≈ 1 (Neumann & Arnaud 1999). For the L_x–M relation the scatter in the data is too large (large error bars in mass), but the situation improves when the L_x–T relation is considered instead; in the latter case the scatter in the correlation is reduced. Typical values for the parameters in these relations are L_0 ~ 1.0 × 10^45 h^-2 erg s^-1, γ ~ 2.9 (Arnaud & Evrard 1999) and δ ~ 0 (Borgani et al. 1999; Reichart et al. 1999; Fairley et al. 2000), although the uncertainty in this last parameter is large. From the relation between δ and f it is easy to infer that f ≈ 3 is expected when c = 1 and γ ~ 3. The model in Fig. 1 was chosen according to these typical values. From L_x–T and T–M it is easy to infer the parameters in L_x–M, and vice versa.
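The mapping between the (a, c, b, f) scaling parameters and the L_x–T parameters (γ, δ) quoted above is a two-line computation; the sketch below simply encodes γ = b/a and δ = f − cγ, with the sample values chosen to match the virial-like case discussed in the text.

```python
def lx_t_from_scalings(a, c, b, f):
    """Derive the L_x-T exponents (gamma, delta) from the T-M exponents (a, c)
    and the L_x-M exponents (b, f): gamma = b/a, delta = f - c*b/a."""
    gamma = b/a
    delta = f - c*gamma
    return gamma, delta

# virial-like illustrative values: a = 2/3, c = 1 (T-M); b = 2, f = 3 (L_x-M)
gamma, delta = lx_t_from_scalings(2.0/3.0, 1.0, 2.0, 3.0)
# gives gamma = 3 and delta = 0, consistent with the near-zero redshift
# evolution (delta ~ 0) noted in the text for c = 1 and gamma ~ 3
```

Running the inverse direction (fixing γ and δ from an observed L_x–T fit and solving for b and f given a and c) works the same way, which is the "vice versa" mentioned above.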
In our fit, we have allowed the parameters to take different values around these observational and theoretical predictions.
We are now ready to build the five theoretical curves, dN(M)/dM, dN(M, z)/dM, dN(L_x)/dL_x, dN(S_x)/dS_x and dN(T)/dT, and to look for the best model by comparing these curves with the data.
Similar analyses have been presented in previous works. However, we would like to stress again that in those works either some parameters are fixed (in T–M or L_x–M) or only one data set is used (e.g. dN(M)/dM, dN(T)/dT, etc.). In Mathiesen & Evrard (1998), the authors combined a free-parameter L_x–M relation and two data sets (dN(L)/dL and dN(S)/dS) in order to say something about the evolution of the L_x–T relation. However, they fixed the T–M relation and did not combine the results coming from the two different data sets. Similar work was done in Borgani et al. (1999), where the authors used as observables the flux number counts, the redshift distribution and the X-ray luminosity function over a large redshift baseline (z < 0.8) of the RDCS in order to constrain cosmological models. In that paper, no a priori assumption is made on the L_x–M relation, except for its amplitude, which is fixed by the authors; in addition, the T–M relation is fixed to the usual spherical collapse plus virial theorem plus isothermal gas distribution model.
In Bridle et al. (1999) they have combined the X-ray cluster temperature function (Henry & Arnaud 1991;Henry 2000) with CMB data and the IRAS 1.2-Jy galaxy redshift survey, but they have assumed a fixed T-M relation. This latter point can affect the final result.
Up to now, no previous work has combined such a large number of data sets as the five we have used without including any assumptions concerning the normalization or specific scalings of the temperature or X-ray luminosity.
As we mentioned at the end of the previous section, the model we have assumed will introduce some correlations between the five theoretical curves. Just by looking at equation (7), it is clear that the temperature function is correlated with the mass function (and equivalently for the luminosity and flux functions). This point should be taken into account when fitting the data.

S T A T I S T I C A L  A N A L Y S I S  A N D  R E S U L T S
In order to fit the five data sets we must decide which estimator to use. Because we assume that there are scaling relations between mass and temperature (T-M) and between mass and luminosity in the X-ray band (L_x-M), there must be some correlations among the five simulated data sets. Therefore, we should start by considering an estimator, such as the standard likelihood estimator, which takes all the correlations into account through the correlation matrix M. In our case the model depends on nine free parameters, and if we consider a grid of, say, five values per parameter, then we should compute the correlation matrix for 5^9 ~ 2 x 10^6 different models. This process would take many years. A faster technique would require a search method that avoids exploring the whole parameter space. That could be an option if we were interested only in the best model, but we also want to know the error bars, or in other words the probability distribution of the parameters. To do that we need to know the probability on a given regular grid.
To simplify the problem, the simplest approach is to consider the standard joint chi-squared as our estimator, chi^2_joint = sum_i chi^2_i, where chi^2_i represents the corresponding ordinary chi-squared for each of the five data sets and we assume that the correlation matrix is, in this case, diagonal.
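A minimal sketch of this diagonal approximation (function and variable names are illustrative, not the paper's code):

```python
import numpy as np

def chi2_single(data, model, sigma):
    """Ordinary chi-squared of one data set with independent errors."""
    data, model, sigma = map(np.asarray, (data, model, sigma))
    return float(np.sum(((data - model) / sigma) ** 2))

def chi2_joint(datasets):
    """Sum of the ordinary chi-squared values of all data sets,
    neglecting the cross-correlations between the curves."""
    return sum(chi2_single(d, m, s) for d, m, s in datasets)
```

The simplification is exactly the one noted below: summing the per-curve chi-squared values amounts to dropping the off-diagonal terms of the full correlation matrix.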
By doing this, we know that we are neglecting the correlations between the curves and that there will be some bias in our estimation. For this reason, we want to check other, more elaborate estimators.
We have considered, as a second estimator of the best model, one based on Bayesian theory (Lahav et al.): chi^2_L = sum_i N_i ln chi^2_i. In this estimator, chi^2_i is again the ordinary chi-squared for each data set and N_i represents the number of data points in data set i. Based on a Bayesian approach with the choice of non-informative uniform priors on the log, those authors have shown that this estimator is appropriate for the case when different data sets are combined, as in our case. The factor N_i plays the role of a weight: larger data sets are considered more reliable for the parameter determination.
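A sketch of this weighted combination (the exact expression is the paper's equation 14; the form below, chi^2_L = sum_i N_i ln chi^2_i, is the standard hyperparameter combination and reproduces the weighting behaviour described above; names are illustrative):

```python
import math

def chi2_L(chi2_values, n_points):
    """Hyperparameter-style estimator: each data set's chi-squared enters
    weighted by its number of points, so larger data sets carry more
    weight in the parameter determination."""
    return sum(n * math.log(c2) for c2, n in zip(chi2_values, n_points))
```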
We have checked both estimators by performing a bias test. In this test we simulated the five data sets for a concrete model, with the corresponding error bars computed in the same way as for the real data. The input model was selected according to the criterion that it be as close as possible to the data (for instance, the model which minimizes chi^2_joint). In the simulations we took into account all the characteristics of the data, that is, sky coverage, limiting flux, maximum redshift, etc. Then we compared each of these realizations of the assumed model with the models previously computed in the grid, and for each realization we obtained the best-fitting model to the simulated data using both estimators. In Fig. 2 we plot the number of times each parameter was considered part of the best model by the first and second estimators. The dot represents the input model. As can be seen from the histograms, the second estimator, chi^2_L, works a bit better than the standard chi^2_joint. There is still some bias, but the agreement between the input model and the recovered peak of the distribution is very good.

Figure 2. The histogram represents the number of times each parameter was considered as part of the best model by the standard chi^2_joint (dotted) and chi^2_L (full). The black dot represents the input model.
We can obtain some interesting information from these plots. The dispersion of the histograms indicates how sensitive the estimator is to each parameter. For instance, the cosmological parameters are well constrained. This is not the case for the redshift exponent c. We fixed this parameter to the virial value c = 1 because our method is not sensitive to that exponent. When changing this exponent the simulated curves did not change appreciably, showing the almost null dependence of the simulated curves on this parameter. There is an explanation for this. This exponent appears only in the T-M relation, as its redshift exponent. This relation is needed to construct the temperature function, and those data only go up to redshift ~0.1. Therefore, it is not surprising that we cannot obtain any significant result concerning the redshift dependence with these data. The T-M relation also appears in the calculation of the X-ray luminosity in a given band, so the exponent c would, in principle, be important when we simulate clusters at high z to compare the flux function with the data of the RDCS, since these data reach z ~ 0.8. The flux in the band used by Rosati et al. (1998) is calculated from the luminosity in that band (see equation 11), and L_band is computed from L_Bol in the following way: L_band = L_Bol f_band, where f_band includes the band and K-corrections and is usually well approximated by the integral of the frequency dependence of the bremsstrahlung emission, i.e. the integral of exp(-E/k_B T) dE between E_min and E_max. E_min and E_max are the energy limits of the band, and T is the cluster temperature. The redshift dependence of f_band is concentrated in the K-correction, and there is also a weak dependence on the redshift exponent of the T-M relation. This dependence is too weak to impose any constraint on this exponent, even when we use data at medium-high redshift such as the fluxes of clusters at z ~ 0.8 (Rosati et al. 1998).
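The band factor just described has a simple closed form, since the integral of exp(-E/k_B T) is analytic. A sketch, with the (1+z) shift of the band standing in for the K-correction (an illustrative assumption, not the paper's exact implementation; normalization omitted):

```python
import math

def f_band(e_min, e_max, kT, z=0.0):
    """Unnormalized band factor: integral of exp(-E/kT) over the
    rest-frame band. Energies and kT in the same units (e.g. keV).
    The (1+z) shift is a stand-in for the K-correction."""
    a = e_min * (1.0 + z)   # observed band mapped to rest-frame energies
    b = e_max * (1.0 + z)
    return kT * (math.exp(-a / kT) - math.exp(-b / kT))
```

Because the cluster temperature enters only through the slowly varying exponential, the redshift dependence of f_band (and hence the sensitivity to c) is weak, as the text notes.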
This explains why, with these data, we cannot say much about the exponent c. We decided to fix this parameter to the standard value c = 1, thereby reducing the dimension of our parameter space from nine parameters to eight. However, this parameter should be considered a free parameter when dealing with future data, for which the redshift coverage will increase significantly. Another result from the bias test is that there is some bias in the parameter alpha (the scaling exponent of the T-M relation). The bias is about 0.05 or more towards higher values of alpha. We will come back to this point later. A similar bias is found in L_0 (about 0.03, towards higher values). The bias is not too large considering the small bin interval, but it must nevertheless be taken into account.
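A toy version of such a bias test, reduced to a single amplitude parameter on a coarse grid (all names and values are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# The "model" is one amplitude A; the grid explores a few candidate values.
# We simulate noisy realizations of a fiducial A and record which grid value
# each realization prefers (here with plain chi-squared), as in a bias test.
grid = np.array([0.6, 0.7, 0.8, 0.9, 1.0])   # candidate parameter values
A_true, sigma, n_pts = 0.8, 0.05, 20

counts = np.zeros(len(grid), dtype=int)
for _ in range(500):
    data = A_true + rng.normal(0.0, sigma, n_pts)        # one realization
    chi2 = [np.sum((data - a) ** 2) / sigma ** 2 for a in grid]
    counts[int(np.argmin(chi2))] += 1                    # best grid model

# An unbiased estimator makes `counts` peak at the input value A_true.
```

In the paper's full test the same idea is applied to the five curves with their real sky coverage, limiting flux and error bars, and the histogram of recovered models is compared against the input model.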
Apart from these parameters, the second estimator seems to be a good indicator of the best model. The next step is to compute the probability distribution in our nine-dimensional parameter space (eight after fixing c) using the second estimator. We have used a grid with about 2 x 10^6 different models in the two cases of flat LCDM and OCDM, and for each of them we have computed P_L (equation 14).
In Fig. 3 we plot the best model compared with four of the data curves used in the fit. It is important at this point to compare Figs 1 and 3. The two cases differ only slightly in the cluster scaling relations, but the differences between the models are significant, especially in the case of the luminosity and flux functions. This shows the sensitivity of the models to the cluster scaling relations. Small changes in the parameters of these scalings can produce a completely different function if all the changes push the function in the same direction. The best models listed in Table 1 are an example of fine tuning between the parameters: a change in one parameter must be compensated by changes in other parameters in order to keep the model compatible with the data, and only a small region of the parameter space is allowed. This also explains why the temperature function does not change significantly. While the luminosity and flux functions require both scaling relations (T-M and L_x-M), the temperature function requires only the T-M relation, thereby reducing the number of parameters involved and consequently the change in the temperature function when the whole set of parameters is varied.
In Fig. 4 the best model is compared with the fifth curve. There is good agreement between our best-fitting model and all the data sets except the fifth one, where the model predicts lower comoving number densities at high z than observed (only two clusters in the z ~ 0.54 bin and one in the z ~ 0.8 bin). However, one should bear in mind that the fifth curve contains only three data points, and these data points have large error bars; therefore the weight of the fifth curve in the estimator of Lahav et al. (see equation 15) is low compared with the weight of the other data sets. On the other hand, the dN(M, z)/dM curve is useful in the sense that including it in the analysis helps to break the degeneracy between sigma_8 and Omega (as we will show in the next section).

Table 1. Best LCDM and OCDM models (c fixed to 1.0). Error bars represent the projection of the contour at the 68 per cent confidence level of the eight-dimensional probability on to each of the parameters. Limits marked with (*) must be considered as lower limits because the parameter was not explored above that limit.
Obviously, this point suggests the need to obtain better quality data in the evolution of the mass function in order to make these data a decisive discriminator between the models.

D I S C U S S I O N
We have computed the marginalized probability of the parameters in order to see how well constrained those parameters are. In Fig. 5 we show the power of the method to constrain the cosmology; even the amplitudes of the T-M and L_x-M relations are well constrained. As anticipated by the bias test, it is clear that we cannot say much about the exponents alpha and gamma, except that high values are favoured.
Virial theory predicts alpha = 2/3, which is compatible (at 68 per cent) with our fitted values given in Table 1. However, models with alpha = 0.8 work better than virial models, and higher values could possibly work even better. (We did not check this possibility because we wanted to remain within values of the parameters not far from the expected ones.) In Nevalainen et al. (2000), the authors found alpha ~ 0.55, which is inconsistent with the self-similar (virial) prediction. They argue that a possible explanation for this discrepancy is preheating of the intracluster gas by supernova-driven galactic winds before the clusters collapse, as proposed by, for example, David et al. (1991), Evrard & Henry (1991), Kaiser (1991) and Loewenstein & Mushotzky (1996). If supernovae release a similar amount of energy per unit gas mass in hot and cool clusters, the coolest clusters would be affected more significantly than the hottest ones. This increase in their temperature would change the slope of the T-M relation towards low alpha values. In the data sets we have considered we have bright clusters, with temperatures typically T > 3 keV. At such high temperatures the previous effect should not be relevant, and hence the slope of the T-M relation should approach the self-similar value of 2/3 (see fig. 2 in Nevalainen et al. 2000). This can explain why our results are more compatible with the virial prediction than with those empirical relations where cool clusters are included in the fit.
A possible source of systematic errors in our best-fitting values (including alpha) may lie in the data themselves. The data sets used in this work suffer from several systematics which can affect the best-fitting parameters in the T-M relation. In our method, the best-fitting T-M relation is obtained from a global fit of the model to all the data. If those data sets change in some way, then the best-fitting model should change as well. In the mass function, masses are defined inside a fixed radius; a different choice of this fixed radius could produce a different estimate of the cluster mass function. In the X-ray flux and luminosity functions, the inferred fluxes and luminosities depend on the assumed cluster profile used to extrapolate the observed surface brightness profile (Vikhlinin et al. 1998). If masses, fluxes or luminosities are underestimated or overestimated, then we should expect some differences in the best-fitting parameters, and in particular in alpha. These systematics will be reduced with future determinations of these quantities (M, T, L_x). Cluster mass estimates can clearly be improved using the lensing technique. On the other hand, on-going X-ray missions (CHANDRA, XMM-Newton) will be able to determine the cluster surface brightness profile at larger radii and with higher quality for a significant number of clusters. Furthermore, from the bias test we know that there is some bias in the peak of the distribution of the alpha parameter, so the high value alpha = 0.8, compared with the virial one, may arise in part from the bias in our estimator. However, our estimate of alpha is compatible (given the error bars) with the virial exponent. It could also be that hot clusters really behave in this way, showing a tendency towards high alpha exponents. In order to distinguish between the two possibilities, more and better quality data are needed.
The second exponent, gamma, also points to high values. In this case we know, from the bias test, that this exponent is degenerate. This, together with the error bars found, can very well accommodate an exponent gamma ~ 2.9, which is the most frequent value obtained in the literature when fitting the L_x-T relation directly. However, the direct estimate of L_x-T suffers from large scatter, and depending on the kind and number of clusters considered the results are quite different, so high values for gamma should not be ruled out yet.
For instance, in Borgani et al. (1999) they found 3 < gamma < 4 when fitting a phenomenological L_x-T relation plus PS to the local X-ray luminosity function.
Concerning the redshift exponent phi, we have a bit more information than the null information we obtained for c. This is not surprising, because the L_x-M relation appears in the calculation of dN(L)/dL and dN(S)/dS, where the data lie in z in [0, 0.3] and z in [0, 0.8], respectively, and these redshift intervals are much deeper than that of the dN(T)/dT data. Although the best value differs for the two cosmologies considered, the value phi ~ 3 is allowed in both cases. Experimentally, there is no determination of the phi parameter. What the different authors assume when they try to fit the L_x-M relation to real data is that there is no redshift dependence in this relation; that is, they simply fit the relation L_x = L_0 M^beta. However, we have shown in Section 2 that the unobserved phi parameter can be related to the redshift exponent of the L_x-T relation (equation 12), delta = phi - c gamma, and using this relation we can infer the value of phi. Typical values for delta found in the literature are delta ~ 0 (Fairley et al. 2000). In Borgani et al. (1999), the authors have shown that the L_x-T relation is compatible with no evolution. This result is also consistent with that of Mushotzky & Scharf (1997), who compared a sample of ASCA temperatures at z > 0.14 with the low-redshift sample of David et al. (1993) and found that data out to z ~ 0.4 are consistent with no evolution in L_x-T. Now, if we assume c = 1 (from virial models) and gamma ~ 3 (from the empirical L_x-T relation), then phi = delta + c gamma ~ 3. So we can conclude that phi ~ 3 is compatible with the virial assumption and also with gamma ~ 3.
For a comparison of our results with a recent determination of the L_x-T relation see, for instance, Fairley et al. (2000). It is remarkable that in that paper the authors find gamma = 3.15, very close to our preferred value. They also found an amplitude of the L_x-T relation C = (6.04 +/- 1.47) x 10^42 erg s^-1. This value should be compared with the amplitude L_0 in our L_x-M relation, L_0 ~ 1.0 x 10^45 h^beta h^-2 erg s^-1, which corresponds to an amplitude in L_x-T (see equation 12) of L_0 / T_0^gamma = 6.25 x 10^42 erg s^-1 [for gamma = 3, T_0 = 1.0 x 10^8 h^alpha K and taking h = 0.5, which is the value used in Fairley et al. (2000)]. The normalization obtained here for the T-M relation is higher than those obtained from simulations or pure cluster modelling (spherical symmetry, virialization, hydrostatic equilibrium). This is not surprising, as these kinds of modelling do not include some physical processes relevant to cluster formation and evolution. Our results should instead be compared with observational determinations of this relation, such as those in Horner et al. (1999), where values for the T-M normalization compatible with our estimate were found (see table 1 in Horner et al. 1999).
It is important to point out that not all the parameter combinations inside the error bars in Table 1 correspond to models which are simultaneously compatible with all the data sets. As we have shown in Fig. 1, the model with parameters sigma_8 = 0.8, Gamma = 0.2, Omega_m = 0.3 (Lambda = 0), T_0 = 1.0 x 10^8 K, alpha = 2/3, c = 1.0, L_0 = 1.0 x 10^45 h^beta h^-2 erg s^-1, gamma = 2.9, phi = 3.0 is an example of a 'bad' model, in the sense that it does not fit all the data sets. One should also notice that, although these values are inside the error bars given in the table, since those are projected limits not all possible combinations are allowed at the 68 per cent confidence level. Therefore, when choosing a model it is important to bear in mind the correlations among the parameters.
The method is really powerful in the determination of the cosmological parameters. We made a consistent fit to five different data sets and obtained strong constraints on the cosmological parameters. Independently of Lambda, only low-density universes are compatible with the different data sets. The amplitude of the power spectrum is also well constrained. Its value is consistent with, for instance, the value obtained by Bridle et al. (1999), who combined cluster, CMB and IRAS data using the same Lahav et al. estimator and obtained sigma_8 ~ 0.75 and Omega_m ~ 0.35.
We have computed the marginalized probability in the (sigma_8, Omega_m) plane in order to look for the well-known sigma_8-Omega_m correlation (Eke et al. 1996; Carlberg et al. 1997; Henry 1997; Kitayama & Suto 1997; Bahcall & Fan 1998; Borgani et al. 1999; Bridle et al. 1999). Of the five data sets, the function dN(M, z)/dM shows a tendency to favour low-density models (Omega <= 0.2), whereas the others seem to favour slightly higher values of Omega. Although our grid is coarse (intervals of 0.1 in sigma_8 and Omega_m), we have seen that, by combining the five data sets, there is a clear peak at the cell (sigma_8 = 0.8, Omega_m = 0.3) in both LCDM and OCDM models. Approximately 50 per cent of the marginalized probability volume is enclosed in that 0.1 x 0.1 cell (see Fig. 6).
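Marginalizing the gridded probability onto the (sigma_8, Omega_m) plane can be sketched as follows, assuming the grid's first two axes index sigma_8 and Omega_m (an illustrative axis layout, not the paper's code):

```python
import numpy as np

def marginalize_plane(P):
    """Collapse a gridded probability onto its first two axes:
    sum over all other parameter axes, then renormalize so the
    resulting plane integrates to 1."""
    plane = P.sum(axis=tuple(range(2, P.ndim)))
    return plane / plane.sum()
```

The fraction of probability in the peak cell is then simply `plane.max()`, which is how a statement like "~50 per cent of the marginalized probability in one 0.1 x 0.1 cell" can be read off the grid.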
This shows that the degeneracy between these two parameters can be broken by combining different data sets.
Of the five data sets considered in this work, the evolution of the cluster population with redshift (Bahcall & Fan 1998) is, in principle, the most sensitive to changes in the cosmological parameters. However, that data set suffers from large error bars owing to the small number of clusters present in the high-redshift bins. We made an additional test to check the weight of this data set in our fit: we recomputed the marginalized probability in Omega-sigma_8, excluding the Bahcall & Fan (1998) data set from the fit. The result is very similar to that shown in Fig. 6. This demonstrates that, with only the low-redshift data sets, it is possible to break the degeneracies present when each individual data set is analysed separately.
The fit to the flat LCDM model was slightly better than that to the open model, in the sense that the best-fitting LCDM model had a smaller chi^2_L (76.1 compared with 76.8). In order to compare the two cases more realistically we performed the following statistical test. Using 500 simulations of the OCDM model, for each of them we obtained the best model given by the chi^2_L estimator applied to both situations (LCDM and OCDM). The result was that 197 of the 500 OCDM simulations had a smaller chi^2_L in the flat case, while in the remaining 303 the open case was preferred. This demonstrates that the small difference in chi^2_L is not significant: the two cases cannot be distinguished by this method.
Obviously, the constraints given here will improve when new, high-quality data become available (CHANDRA and XMM-Newton). The method proposed should be very useful for constraining the cosmology with the upcoming new data.

C O N C L U S I O N S
In this work, we have shown that our method, which combines different data sets for the cluster population, is a powerful tool to constrain both the cosmology and cluster scaling relations.
Our method is robust in the sense that neither assumptions concerning the cosmology nor specific cluster scaling relations are made a priori.
Despite the correlations in the theoretical curves, we have shown that with simple estimators (such as the standard chi^2_joint and the Bayesian estimator of Lahav et al.) it is possible to fit the data without any significant bias.
The main conclusions of this paper are the following. Regarding the cosmology, we have shown that only low-density (flat and open) models are compatible with the data sets considered in this paper. The marginalized probability in the (sigma_8, Omega_m) plane shows a clear peak at the position (sigma_8 = 0.8, Omega_m = 0.3) in both LCDM and OCDM models. This is a very interesting conclusion, because previous works (Eke et al. 1996; Kitayama & Suto 1997; Bahcall & Fan 1998; Borgani et al. 1999; Bridle et al. 1999) show a degeneracy in these two parameters. This degeneracy is broken when considering the five data sets used in this paper. It is important to remark that in Bridle et al. (1999) the authors combine cluster abundance, CMB and IRAS data and find values for (sigma_8, Omega_m) very close to our best-fitting model. It is also worth noting that this result is compatible with the recent determination of the Omega_m parameter obtained by the BOOMERANG team (De Bernardis et al. 2000; Lange et al. 2000) and MAXIMA (Balbi et al. 2000; Hanany et al. 2000). The third cosmological parameter, Gamma, is consistent with the value obtained from the fit of the power spectrum of galaxies assuming CDM (Peacock & Dodds 1994; Viana & Liddle 1996).

Regarding the parameters obtained for the cluster scaling relations, they are consistent with empirical determinations of such scalings. However, we find a tendency towards high values of the alpha exponent, which could contradict recent determinations of this exponent (Nevalainen et al. 2000). Nevertheless, as mentioned in the discussion, we know that there is a bias in our estimation of alpha. Therefore, our estimate is compatible (within the error bars and the bias) with the virial exponent alpha = 2/3.
Additional data coming from high redshift clusters (CHANDRA, XMM-Newton and PLANCK) will improve this result.
Particularly interesting is the work that can be done with future CMB surveys. The PLANCK satellite will explore the whole sky at different frequencies (from 30 to 800 GHz) and with resolutions between 5 and 30 arcmin. At these frequencies and with those resolutions we have shown (Diego et al. 2001) that many clusters are expected to be observed at high redshift (z > 2) through the Sunyaev-Zel'dovich effect (SZE; see Fig. 7). PLANCK is expected to detect those clusters with S_mm > 30 mJy. The information these clusters will provide will be decisive in definitively excluding many models. As shown, for instance, in Barbosa et al. (1996), Aghanim et al. (1997) and Diego et al. (2001), the SZE can be considered a clear probe of the cosmological parameters. In particular, from the previous discussion we concluded that we are not able to discriminate between LCDM and OCDM models. However, from Fig. 7 it is evident that through the SZE it could be possible to distinguish between these two models at a very high confidence level.

A C K N O W L E D G M E N T S
We would like to thank Piero Rosati for kindly providing his data for the differential flux function, and Nabila Aghanim for useful comments. This work has been supported by the Comisión Conjunta Hispano-Norteamericana de Cooperación Científica y Tecnológica ref. 98138, Spanish DGESIC project no. PB98-0531-C02-01, FEDER project no. 1FD97-1769-C04-01 and the EEC project INTAS-OPEN-97-1992. JMD acknowledges support from a Spanish MEC fellowship, FP96-20194004. Finally, JMD, EMG, JLS and LK thank the CfPA and the Berkeley Astronomy Department for their hospitality during several stays in 1999.