Constraining the intergalactic medium at $z\approx$ 9.1 using LOFAR Epoch of Reionization observations

We derive constraints on the thermal and ionization states of the intergalactic medium (IGM) at redshift $\approx$ 9.1 using new upper limits on the 21-cm power spectrum measured by the LOFAR radio-telescope and a prior on the ionized fraction at that redshift estimated from recent cosmic microwave background (CMB) observations. We have used results from the reionization simulation code GRIZZLY and a Bayesian inference framework to constrain the parameters which describe the physical state of the IGM. We find that, if the gas heating remains negligible, an IGM with ionized fraction $\gtrsim 0.13$ and a distribution of the ionized regions with a characteristic size $\gtrsim 8 ~h^{-1}$ comoving megaparsec (Mpc) and a full width at the half maximum (FWHM) $\gtrsim 16 ~h^{-1}$ Mpc is ruled out. For an IGM with a uniform spin temperature $T_{\rm S} \gtrsim 3$ K, no constraints on the ionized component can be computed. If the large-scale fluctuations of the signal are driven by spin temperature fluctuations, an IGM with a volume fraction $\lesssim 0.34$ of heated regions with a temperature larger than CMB, average gas temperature 7-160 K and a distribution of the heated regions with characteristic size 3.5-70 $h^{-1}$ Mpc and FWHM of $\lesssim 110$ $h^{-1}$ Mpc is ruled out. These constraints are within the 95 per cent credible intervals. With more stringent future upper limits from LOFAR at multiple redshifts, the constraints will become tighter and will exclude an increasingly large region of the parameter space.


INTRODUCTION
The Epoch of Reionization (EoR) is one of the least understood chapters in the history of our Universe. The formation of the first luminous sources initiated the transition of the cold and neutral intergalactic medium (IGM) into a hot and ionized state. This transition had a significant impact on the later stages of structure formation through various feedback mechanisms (see e.g. Ciardi & Ferrara 2005 for a review). Although we know that reionization took place, very few facts about it are known with certainty (see e.g. Morales & Wyithe 2010;Pritchard & Loeb 2012;Zaroubi 2013;Barkana 2016, for reviews).
Theoretical models suggest that ionizing ultra-violet (UV) photons from the first sources created localized ionized regions, which over time grew in size, started to overlap and, as an increasing number of sources formed, led to a complete reionization of the IGM. Observations of high-redshift (z 6) quasar absorption spectra suggest that complete reionization was reached around redshift 6 (e.g. Fan et al. 2006;Mortlock et al. 2011;Venemans et al. 2015;Bañados et al. 2018). On the other hand, the measurement of the Thomson optical depth from the observation of Cosmic Microwave Background (CMB) (Planck Collaboration et al. 2018) suggests that the probable period of this event lies at redshift 10 (Choudhury & Ferrara 2006;Mitra et al. 2011Mitra et al. , 2012. However, the details of the reionization process such as the exact timing of the EoR, the morphology of the H I distribution in the IGM and the properties of early sources, are still poorly known. The redshifted 21-cm signal from neutral hydrogen in the IGM is the most promising probe of the EoR, as it has the ability to reveal many of the unknown facts about this epoch. Inspired by its potential, several radio telescopes such as the Low Frequency Array (LOFAR) 1 (van Haarlem et al. 2013;Patil et al. 2017), the Precision Array for Probing the Epoch of Reionization (PAPER) 2 Kolopanis et al. 2019), the Murchison Widefield Array (MWA) 3 (Bowman et al. 2013; Barry et al. 2019) and the Hydrogen Epoch of Reionization Array (HERA) 4 (DeBoer et al. 2017) have invested considerable amounts of observing time to detect this signal. Due to their limited sensitivity, these radio interferometers aim to measure the statistical fluctuations of the signal. The planned Square Kilometre Array (SKA) 5 will in addition be able to produce actual tomographic images of the distribution of the signal on the sky (Mellema et al. 2015;Ghara et al. 2017). Beside these large radio interferometers, single antenna experiments such as EDGES (Bowman & Rogers 2010), EDGES2 (Monsalve et al. 2017;Bowman et al. 2018), SARAS (Patra et al. 2015), SARAS2 (Singh et al. 2017), BigHorns (Sokolowski et al. 2015), SciHi (Voytek et al. 2014) and LEDA (Price et al. 2018) are being used to attempt a detection of the sky-averaged 21-cm signal and its evolution with redshift.
In spite of all these efforts, so far no undisputed detection of the 21-cm signal from the EoR has been made. The main reason for this is that the signal is several orders of magnitude weaker than the galactic and extra-galactic foregrounds at these frequencies (see e.g., Shaver et al. 1999;Jelić et al. 2008). Moreover, the signal low amplitude implies long integration times are required to exceed the instrumental noise. Although there exist accurate methods to sub-tract (Harker et al. 2009;Bonaldi & Brown 2015;Chapman et al. 2013Chapman et al. , 2016Mertens et al. 2018), suppress (Datta et al. 2007;Majumdar et al. 2012;Ghara et al. 2016) or avoid (Datta et al. 2010;Liu et al. 2014) the foregrounds, these only work if the sky signal has been measured with high fidelity over the time of observation. This then requires exquisite calibration of the system as any left-over artefacts from strong sources will make a measurement impossible (Barry et al. 2016;Patil et al. 2017). This implies calibrating the many hardware components of the telescope (see e.g., Kern et al. 2019) while a further complication is added by the presence of the temporally and spatially varying ionosphere (see e.g., Mevius et al. 2016).
Recently, Bowman et al. (2018) have claimed a detection of the sky-averaged 21-cm signal at z ≈ 17 in observations with the EDGES2 low-band antenna. These results are debated (e.g. in Hills et al. 2018;Draine & Miralda-Escudé 2018;Singh & Subrahmanyan 2019;Bradley et al. 2019), but if true would challenge our theoretical understanding of the early universe as explanations for its strength require either a previously unknown cooling mechanism (see e.g., Tashiro et al. 2014;Barkana 2018;Fialkov et al. 2018;Muñoz & Loeb 2018) or a radio background other than the CMB (Feng & Holder 2018;. Other attempts have to date only provided upper limits on the expected signal. While global signal experiments probe the average brightness temperature, experiments with radio interferometers constrain the power spectrum of the expected 21-cm signal. Observations with the GMRT (Paciga et al. 2013) provided the very first upper limit, which was a 2-sigma value of (248) 2 mK 2 for k = 0.50 h Mpc −1 at z = 8.6. Later PAPER and MWA produced additional upper limits Ali et al. 2015; Barry et al. 2019). Note that the PAPER collaboration initially reported a strong upper bound (Beardsley et al. 2016) which was later revised to a weaker upper bound (Cheng et al. 2018;Kolopanis et al. 2019). The first LOFAR upper limit on the power spectrum of the signal obtained from one night observation was (79.6) 2 mK 2 at k = 0.053 h Mpc −1 and a redshift between 9.6 and 10.6 (Patil et al. 2017). Recently, upper limits were provided for even higher redshifts. Gehlot et al. (2019) placed upper limits on the power spectrum in the redshift range z = 19.8 − 25.2 using observations with the LOFAR-Low Band Antenna array and Eastwood et al. (2019) placed upper limits at z ≈ 18.4 using observations with the Owens Valley Radio Observatory Long Wavelength Array (OVRO-LWA) 6 . Mertens et al. (2020) have provided the second LOFAR upper limit on the 21-cm power spectrum at redshift ≈ 9.1 using 10 nights of observations. At k = 0.1 h Mpc −1 the 2 − σ upper limit is (106.65) 2 mK 2 , a factor of ≈ 8 improvement at the same k− scale over the value obtained from 1 night of observations (Patil et al. 2017) and the best upper limit so far on the large-scale power spectrum at redshift 9. The results give upper limits for a range of k values but only at one redshift, z ≈ 9.1. In this paper we explore scenarios for the EoR that can be ruled out by these new upper limits. As they are about an order of magnitude higher than the most popular theoretical predictions, we can expect that only extreme models will be ruled out. Similar analyses was performed by Pober et al. (2015) and Greig et al. (2016) for the earlier upper limits from PAPER which were reported in Ali et al. (2015).
Extracting astrophysical and cosmological information from 21-cm observations is not straightforward as, in addition to the cosmological dependence, the characteristics of the expected signal depend crucially on specific properties of the early sources and their redshift evolution. While UV photons from such sources are mostly absorbed during ionization of H I in surrounding regions, X-ray photons, due to their longer mean free path, penetrate further into the neutral gas and increase its temperature (see e.g., Madau et al. 1997;Zaroubi & Silk 2005;Zaroubi et al. 2007). At the same time, Lyα photons from the same sources determine the coupling strength of the H I spin temperature with the gas temperature. In view of this complexity, an exploration of many theoretical models of the expected 21-cm signal is necessary to interpret the results from radio observations. Such signal prediction algorithms are often combined with a Bayesian inference framework, such as the Monte Carlo Markov Chains (MCMC), to explore and constrain the reionization parameters (e.g. Greig & Mesinger 2015;Park et al. 2019;Cohen et al. 2019). This is the approach we will use here, relying on the GRIZZLY code (Ghara et al. 2015a(Ghara et al. , 2018 to generate the reionization scenarios and models for the 21-cm signal. Since the codes which are used to generate the 21-cm signal use source parameters as input, the results from such Bayesian inference frameworks typically constrain these source parameters. However, it should be realised that the 21-cm observations themselves characterise the state of the IGM and do not contain any direct information about the source properties. It is perfectly possible that many different source models lead to the same 21-cm signal, especially if one only has information from a single redshift, as is the case for the latest LOFAR upper limits. As explained in more detail below, we will therefore have a strong focus on IGM parameters such as the average ionized fraction, average spin temperature, volume fraction of "heated region" (i.e, partially ionized regions with gas temperature larger than the CMB temperature) and size distributions of ionized and heated regions. We give much less weight to the source parameters, which however are still needed by the models to generate the 21-cm signals.
Our paper is structured as follows. In section 2, we describe the basic methodology to prepare the Bayesian framework used to interpret the observations. We present our results in section 3 and discuss them from the point of view of other observations in section 4, before concluding in section 5. The cosmological parameters as used in this study are Ω m = 0.27, Ω Λ = 0.73, Ω B = 0.044, h = 0.7, consistent with the Wilkinson Microwave Anisotropy Probe (WMAP) results (Hinshaw et al. 2013). These are the same as used in the N-body simulation used in this paper. Within the error bars, these are consistent with Planck 2015 results (Planck Collaboration et al. 2016) which are used in Mertens et al. (2020). Note that all distances and scales used in this study are in comoving coordinates.

METHODOLOGY
Here we introduce the approach employed to generate the 21-cm signal and compare it with the observational upper limit.

Cosmological simulations
We use the GRIZZLY code (Ghara et al. 2015a(Ghara et al. , 2018 to generate brightness temperature maps at redshift ≈ 9.1. This algorithm requires cosmological density fields and halo catalogues as input. These are retrieved from results of the PRACE 7 project PRACE4LOFAR, which was run specifically for the purpose of providing several cosmological simulations for the interpretation of LOFAR data. Here we use the largest volume, of length 500 h −1 comoving megaparsec (Mpc) (see e.g, Giri et al. 2019b,a). This corresponds to a field-of-view of 4.27 • ×4.27 • at redshift ≈ 9.1 which is comparable to the LOFAR primary beam of ≈ 4 • . The cosmological N-body simulation was run using CUBEP 3 M (Harnois-Déraps et al. 2013) with 6912 3 particles and a mass resolution of 4.05 × 10 7 M . Halos were identified on the fly with a spherical overdensity halo finder (Watson et al. 2013), and only those with masses 10 9 M and higher, i.e. resolved with at least ≈ 25 particles, were used. More details on the PRACE4LOFAR simulations can be found in Dixon et al. (2016).
The GRIZZLY simulations are run on gridded versions of the density fields from which the halos have been removed as they are not part of the IGM and their effect is captured by the assumptions of the source model through the photon escape fraction. We use 300 3 grids for the results in this paper. The smallest k-scale which can be probed with this resolution is ≈ 1.9 h Mpc −1 (corresponding to scale ≈ 3.3 h −1 Mpc). The smallest scale probed in Mertens et al. (2020), which is ≈ 0.4 h Mpc −1 (corresponding to scale ≈ 15 h −1 Mpc), remains within the Nyquist limit of our simulation and free from the aliasing effect (Mao et al. 2012).
2.2 Modelling the 21-cm signal using GRIZZLY The differential brightness temperature, δT b , of the 21-cm signal can be expressed as (see e.g, Madau et al. 1997;Furlanetto et al. 2006b), where the quantities x HI , δ B and T γ (z) = 2.73 (1 + z) K denote the neutral hydrogen fraction, baryonic density contrast and the CMB temperature, respectively, each at position x and redshift z. T S represents the spin temperature of hydrogen in the IGM. In this paper, we will consider the dimensionless power spectrum of the brightness temperature, i.e., ∆ 2 (k) = k 3 P(k)/2π 2 . The spherically averaged power spectrum P(k) can be expressed as δ The GRIZZLY algorithm is based on a one-dimensional radiative transfer scheme and is an independent implementation of the BEARS algorithm described by Thomas & Zaroubi (2008); Thomas et al. (2009); Thomas & Zaroubi (2011) and Krause et al. (2018). It approximates the transfer of photons by assuming that the effect from individual sources is isotropic and can therefore be precalculated as radial profiles around each source. The algorithm corrects for overlap by ensuring that the total ionized volume of the region created by multiple sources is the correct one. This approach makes the code very fast, a requirement necessary for parameter studies such as the one we perform here. Ghara et al. (2018) presented a detailed comparison between the performance of this code and the full three-dimensional radiative transfer code C 2 RAY (Mellema et al. 2006b). We found that although GRIZZLY employs a range of approximations, its results agree with those of the full radiative transfer quite well, while being at least 10 5 times faster. In Appendix A, we give a brief outline of this code, while we refer the reader to the original papers (Ghara et al. 2015a(Ghara et al. , 2018 for a more detailed and complete description of the algorithm. Note that we have not included redshift space distortions while evaluating the power spectrum for different model parameters, as their impact remains rather small during the EoR, when ionization fluctuations dominate the power spectrum of δT b (Jensen et al. 2013;Majumdar et al. 2016;Ghara et al. 2015a). The upper limits from Mertens et al. (2020) at scales k = 0.075 and 0.1 h Mpc −1 are ∆ 2 = (58.97) 2 mK 2 and (95.21) 2 mK 2 respectively (see also Table 1). Before proceeding, it should be realized that these values are rather high compared to the power spectrum at redshift ≈ 9.1 predicted by various standard reionization scenarios, such as in Mellema et al. (2006a), Iliev et al. (2007), Greig & Mesinger (2015), Ghara et al. (2015b), Hassan et al. (2016), Bolgar et al. (2018) and Ross et al. (2019). For example, the predicted power spectra at z ≈ 9 at k = 0.1 h Mpc −1 are found to be 10 3 mK 2 . Models which can be excluded by these upper limits therefore have to be quite extreme.
As the lowest upper limit is for the largest scales, our aim is to identify scenarios which produce large amplitudes for the largescale fluctuations. Spatial fluctuations in the 21-cm signal can only be caused by spatial fluctuations in x HI , δ B or T S (see Eq. 1). Previous studies have shown that the fluctuations in δ B are small on the scales measured by LOFAR (e.g. Peebles 1993). We therefore consider two different scenarios to identify models with either large x HI and/or T S fluctuations. In the first scenario, we assume a uniform T S , so that the large-scale fluctuations of the signal are mostly driven by fluctuations in x HI . In the second scenario, we relax the uniform T S assumption and allow sources of heating to create local regions of high T S . In this case the large-scale fluctuations are predominantly sourced by fluctuations in T S . These two scenarios will be discussed in detail later in Section 3.1 and 3.2.
To calculate the evolution of the IGM for these scenarios, GRIZZLY needs to characterize the source properties with a range of parameters. The following are used in our study (also listed in Table 2).
• Ionization efficiency (ζ): the rate of ionizing photons per unit stellar mass escaping from a halo is given by N i = ζ × 2.85 × 10 45 s −1 M −1 . This value corresponding to ζ = 1 derives from the model galaxy spectrum employed when calculating x HII and T K , which has been produced with the publicly available code PE-GASE2 8 (Fioc & Rocca-Volmerange 1997). Note that the emission rate of the ionizing photons is assumed to be proportional to the halo mass. We refer the reader to Ghara et al. (2015a) for more details. We calculate the stellar mass of a halo using where f is the star formation efficiency, fixed at 0.02 (Behroozi & Silk 2015;Sun & Furlanetto 2016). The parameter ζ combines all the degeneracies from various quantities related to the star formation rate and the emission rate of ionizing photons from the sources, as well as their escape fraction into the IGM. The case ζ = 1 corresponds to a star formation efficiency of 2 per cent and an escape efficiency of 100 per cent, but also to a star formation efficiency of 20 per cent and an escape efficiency of 10 per cent. We vary ζ in both scenarios considered in this paper.
• Minimum mass of the UV emitting halos (M min ): In the above parametrization of the ionizing efficiency the number of ionizing photons escaping from a halo depends linearly on its mass. However, below a certain minimum mass radiative and mechanical feedback can severely reduce the star formation efficiency (see e.g., Hasegawa & Semelin 2013;Dawoodbhoy et al. 2018). We model this by introducing M min as the minimum mass of halos from which ionizing photons escape into the IGM. As with ζ, this parameter represents different physical processes, not only feedback but for example also very low escape fractions from lower-mass halos (see e.g., Gnedin et al. 2008;Sharma et al. 2016). Due to the mass resolution of our N-body simulation (see Sect. 2.1) the lowest value for M min is 10 9 M . Although halos of lower masses could contribute, as we will see below, the LOFAR results are not able to constrain such low values. In general, one expects star formation in halos with mass 10 9 M to be suppressed due to radiative feedback (see e.g., Wise et al. 2014;Dixon et al. 2016). Note that we do not employ radiative feedback in this study as M min remains 10 9 M for the scenarios considered here. We vary M min in the uniform T S scenario while fix it to 10 9 M in the non-uniform T S model.
• Minimum mass of X-ray emitting halo (M min,X ): In addition to the stellar contributions, GRIZZLY can also include heating and ionization from X-ray sources such as quasars, high-mass X-ray binaries, etc. As not all star hosting halos are necessarily substantial X-ray sources, we use the minimum mass of dark matter halos that contain X-ray sources as a separate parameter. This allows us to include scenarios in which the X-ray source population deviates from the population of galaxies. We consider and vary this parameter only in the non-uniform T S model.
• X-ray heating efficiency ( f X ) and spectral index (α): The spectrum of an X-ray source at energy E is modelled as a powerlaw, i.e. I q (E) = A q E −α , where α is the spectral index. The normalization constant A q is determined such that the X-ray luminosity per stellar mass is 3.4 × 10 34 f X erg s −1 M −1 , where f X is a parameter. This implies a rate of X-ray photons per unit stellar mass emitted from a halo N x = f X × 8.47 × 10 43 s −1 M −1 . The value of N x for f X = 1 is ∼ two orders of magnitude larger than the measurements of high-mass X-ray binaries in local star forming galaxies in 0.5-8 keV band (Mineo et al. 2012). We assume that the UV band spans the range 13.6-100 eV, while the X-ray band goes from 100 eV to 10 keV. We vary f X while we keep α fixed at 1.2 (fiducial) or 0.3 in the non-uniform T S model.  Table 3. An overview of the IGM parameters considered in this paper. Except for T S in the case of the uniform T S model, all of these are derived from the simulation results. We explore a range [-12:1] for 1 − T γ /T S . The last column refers to the models in which such a quantity is considered.

IGM Parameters Description
Corresponding Models x HII Volume averaged ionized fraction Uniform and non-uniform T S models T K Volume averaged gas temperature in the partially ionized IGM with x HII < 0.5 Non-uniform T S model

Derived IGM parameters
As mentioned above, although GRIZZLY uses astrophysical source parameters to generate brightness temperature maps, the main goal of this work is to infer the IGM properties at z ≈ 9.1 from the new LOFAR upper limit. At this epoch, the IGM is expected to consist of H II regions embedded in a (partially) neutral medium.
The signal from such gas is in emission (δT b > 0), in absorption (δT b < 0) or zero depending on its spin temperature T S . In addition to δ B , two major sources of the spatial fluctuations of the signal are fluctuations in ionized fraction x HII and spin temperature T S . If the signal is dominated by x HII fluctuations, the maximum power spectrum obtained from a model depends not only on the volume averaged ionized fraction (x HII ) and spin temperature, but also on the size distribution of the H II bubbles (e.g. Furlanetto et al. 2006a). We will therefore study the latter by characterizing the probability distribution function (PDF) of the sizes of H II regions with R HII peak and ∆R HII FWHM , which represent the size at which the PDF has a maximum and the full width at half maximum (FWHM), respectively. Figure 1 shows an example of such a distribution.
Similarly, in the presence of spin temperature fluctuations, the power spectrum of the 21-cm signal also depends on the size distribution of the heated regions (i.e. regions with T K > T γ ) besides the average gas temperature T K , fraction of volume occupied by the heated regions f heat and mass averaged brightness temperature (δT b ). Similarly to the PDF of H II regions, we will characterise the size distribution of the heated regions adopting the parameters R heat peak and ∆R heat FWHM . There exists no unique way to characterize the size distribution of a complex three-dimensional structure such as the distribution of H II /heated regions in the IGM. We refer the reader to Friedrich et al. (2011), Lin et al. (2016 and Giri et al. (2018a) for an overview of the various methods that can be used. In this work we will use a Monte Carlo based approach, namely the mean free path (MFP) method, first proposed by Mesinger & Furlanetto (2007). In the MFP algorithm, we randomly select a point inside the region of interest (e.g. H II regions) and shoot a ray in a random direction until it reaches the boundary of the region. The length of the ray is recorded. When this process is repeated numerous times, the PDF of the recorded lengths provides the PDF of the regions of interest. Here we use 10 7 rays shot on the fly during the GRIZZLY simulation.
A list of parameters used to characterise the IGM is given below (also in Table 3): • Volume averaged ionized fraction (x HII ).
• Volume averaged gas temperature in the partially ionized IGM with x HII < 0.5 (T K ).
• Uniform spin temperature of the IGM (T S ). Note that 1-(T γ /T S ) form will be used rather than T S .
• Volume fraction of heated regions with T K > T γ ( f heat ).
• R HII peak (R heat peak ): Size of the H II (heated) regions at which the probability distribution function (PDF) of the sizes peaks.
• ∆R HII FWHM (∆R heat FWHM ): full width at half maximum of the PDF of the sizes of the H II (heated) regions.
Note that we do not model the signal directly in terms of these IGM parameters, these are rather derived quantities from the simulations.

GRIZZLY emulator
Although GRIZZLY is fast and efficient, for parameter estimation with a Bayesian inference framework where hundreds of thousands of models may be needed, the use of GRIZZLY can become computationally too expensive. We therefore adopt an alternative approach. First, we emulate the power spectra derived from GRIZZLY simulations using the machine learning algorithm known as Gaussian Process Regression (GPR; Rasmussen & Williams 2006). The power spectrum emulator is used to interpolate within the parameter space and evaluate the power spectrum for parameter values which have not been simulated. For a description on how to emulate EoR simulations with GPR, we refer the reader to Kern et al. (2017) and Jennings et al. (2019). We use the GPR module provided in the python package SCIKIT-LEARN (Pedregosa et al. 2011). We determine the values for the hyper-parameters for GPR using cross validation, a process which prevents over-fitting of the model (e.g. Cawley & Talbot 2010;Hastie et al. 2009). We have used 10-fold cross validation (Kohavi 1995) to construct the emulators.
Given a set of parameters as described in the previous section, we have configured GRIZZLY to generate the spherically averaged power spectrum for the k-bins of the LOFAR data. However, as we will see later, not all the data points from the upper limit of the power spectrum are useful for this analysis. We therefore only use power spectrum amplitudes at scales k 0.15 h Mpc −1 to build up our emulator, more specifically at k = 0.075, 0.1 and 0.13 h Mpc −1 . We quantify the accuracy of the emulators with their mean squared error (MSE) 9 . In order to test the accuracy we calculate the MSE for the testing set. The testing set is independent of the data set used for emulation. The MSE of the emulators for predicting the 21-cm power spectrum are found to be less than 10 per cent.
We combine this emulator with the MCMC module available in the EMCEE python package (Foreman-Mackey et al. 2013) to explore the parameter space of different scenarios. As we are interested in the IGM parameters, we also construct emulators for mapping the source parameters to the IGM parametes. The MSE of these emulators are less than 5 per cent.

Bayesian inference framework
As described in the previous section, we combine the GRIZZLY emulator with an MCMC algorithm to explore the parameter space for different scenarios and to constrain them using the observed upper limits. The probability of any parameter value θ, i.e. the posterior p(θ|x), given some observation x, is defined by Bayes' theorem as, where p(θ) represents the prior on the parameter values. The quantity p(x|θ), also known as the likelihood L, gives the probability of any observation given certain parameters. It should be kept in mind that the likelihood can not be defined by the formal χ 2 method as the observed power spectrum is an upper limit only. Therefore, here we define the likelihood as follows. Let us denote the observed power spectrum ∆ 2 o (k i ) by ∆ 2 21 (k i ) ± ∆ 2 21,err (k i ), while the model power spectrum estimated using the emulator for a set of parameters θ is denoted by ∆ 2 m (k i , θ). Two major sources of uncertainty on the modelled large-scale power spectrum are: (i) error from the emulators themselves, (ii) sample variance which increases at larger scales. The combined error remains 10 per cent for the scales considered in this study. Thus, we assume a 10 per cent modelling error, . This error is always larger than the sampling error from the simulation. The total variance σ 2 = ∆ 4 21,err + ∆ 4 m,err includes the errors from the observation and simulations. For an upper limit, we define the likelihood L(θ) for a model with parameters θ as (see Appendix B for the derivation) This expression results in the following key behaviour: • If the model power spectrum, ∆ 2 m (k i , θ), as estimated using the emulator for a set of parameters θ is larger than the observed power spectrum, ∆ 2 21 (k i ) + σ(k i ), within at least one k-bin k i , L(θ) approaches 0 and that model is ruled out by the upper limit. • for all k-bins, L(θ) approaches 1, and that model is consistent with the upper limit.
• In case the above two conditions do not hold, the likelihood estimated from Eq. 3 remains between 0 and 1. 9 The MSE of the emulator is defined as (e.g. Jennings et al. 2019), where <> represents the mean estimate. The quantities Q true and Q emulated are true and emulated values respectively. In this work, we aim to find models that are excluded by the measured upper limit. Thus, we use L ex (θ) = 1 − L(θ) in the MCMC analysis as the likelihood of a set of parameters θ to be excluded by the upper limit. 10 In addition to this, we use a prior on the ionized fraction estimated from the measured Thomson scattering optical depth in Planck Collaboration et al. (2018). As we do not have any prior information about the redshift evolution of the average ionized fraction, x HII , we estimate the maximum value which is possible at redshift ≈ 9.1 as follows. If we assume that x HII increases or remains constant with time, a Thomson scattering optical depth τ = 0.054±0.007 translates into a maximum ionized fraction 0.57±0.24 at redshift ≈ 9.1. Here, we thus use x HII,max (z = 9.1) = 0.81 as the maximum possible value for x HII at redshift 9.1. This corresponds to a scenario in which the universe is neutral at z > 9.1, has a constant ionized fraction in the range 9.1 > z > 6 and reionization ends suddenly at redshift 6. Note that this is unlikely to be a realistic scenario, as x HII is expected to gradually increase to 1 with time. While studies such as Pober et al. (2015), Greig et al. (2016), and Monsalve et al. (2019) consider model dependent reionization histories and estimate x HII by comparing the estimated τ for the models with the measured τ from the CMB observations, here we use x HII,max (z = 9.1) = 0.81 as a model-independent conservative upper limit of the ionized fraction at z ≈ 9.1. 10 Note that, following the same calculation shown in Appendix B, one can also directly estimate the likelihood of set of parameters θ to be excluded .

RESULTS
Now we apply the parameter estimation framework described in the previous section to the upper limits from Mertens et al. (2020) (also given in Table 1). As described before, we will discuss two scenarios. While the first one considers ionized patches in a uniform T S IGM, the second one also includes T S fluctuations. We present our results in the following sections.

Ionized patches and a uniform T S
In this section we focus on the scenario in which the large-scale modes are caused by the presence of ionized regions, and assume a uniform spin temperature with a value T S (see e.g. Ali et al. 2015 andPober et al. 2015 for previous papers adopting a uniform T S model). These ionized regions are expected to be photo-heated to a temperature T K ≈ 10 4 K and emit no signal as x HI ≈ 0. Here, T S represents the spin temperature of the neutral part of hydrogen in the IGM. The sizes and spatial distribution of the ionized regions are determined by the astrophysical parameters ζ and M min . Therefore this model has three parameters ζ, M min and 1 − T γ /T S which we will explore. We further assume the existence of a uniform Lyα background which fully couples T S to the kinetic temperature T K , and thus a uniform T S implies a uniform T K for the neutral IGM. The lowest value of T K is obtained in the complete absence of heating processes, when adiabatic cooling due to cosmological expansion gives T K = 2.1 K at z ≈ 9.1 for our choice of cosmological parameters (calculated using CMBFAST; Zaldarriaga & Seljak 2000) 11 .  obtain a uniform distribution, this heating will have to be driven by very hard rather than soft X-ray photons (see e.g., Pacucci et al. 2014;Fialkov et al. 2014). Since the spin temperature appears in the differential brightness temperature expression (Eq. 1) as 1−T γ /T S , we use this, rather than T S , as a parameter in our study. For T S T γ , 1 − T γ /T S approaches 1, while for the lowest value represent the radius at which the probability distribution of the sizes of the ionized regions has maximum amplitude and the full width at half maximum of that distribution respectively. We use the mean free path method to estimate the bubble size distribution. The region in white in the bottom right of the panels corresponds to fully ionized IGM at redshift ≈ 9.1. The solid curves show the contours of ∆ 2 = (58.97) 2 mk 2 for 1 − T γ /T S = -12. For a deterministic observation, the region enclosed by the solid contour will be excluded. The dash-dotted line shows the contour for x HII = 0.81.
of T S = 2.1 K, 1 − T γ /T S ≈ −12. We therefore explore the range [-12,1]. Since we assume that 1 − T γ /T S is constant, the power spectrum scales by (1−T γ /T S ) 2 at all wavenumbers. Therefore, we train our GPR emulator only to generate power spectra for different combinations of ζ and M min while keeping 1 − T γ /T S = 1. For ζ and M min we select the ranges [10 −2 − 10 2.5 ] and [10 9 − 10 12 M ] respectively. The total number of GRIZZLY models used for training the emulator is 8556.
We first illustrate the outcome of this set of GRIZZLY models in Figure 2. The left panel shows a 2D slice from the brightness temperature cube for the case ζ = 50, M min = 3 × 10 10 M and 1 − T γ /T S = −12. This combination of parameters produces an ionization map with large H II bubbles with characteristic size larger than several tens of Mpc and a volume averaged ionized fraction of 0.55. The corresponding power spectrum is plotted as a thick dashed curve in the right panel of Figure 2, together with the other 8555 models from the training set. All these curves assume the minimal value of 1 − T γ /T S = −12.
The red points in the right panel of Figure 2 denote the current LOFAR upper limits on ∆ 2 with 1 − σ error bars. Clearly, some of the models have a power spectrum amplitude larger than the upper limits at the larger scales. These results also show that scales with k 0.15 h Mpc −1 do not significantly constrain the models, which is why, as mentioned in Sect. 2.4, we only use the lowest three k values to build the emulator and to calculate the likelihood in the MCMC framework. 1−T γ /T S = −12 and -9 respectively. The solid curves in both panels represent the contours corresponding to the upper limit at this scale, i.e. ∆ 2 = (58.97) 2 mK 2 . One can easily see that a significant part of the parameter space can be ruled out by this upper limit alone for 1 − T γ /T S = −12. However, the volume of parameter space which can be excluded rapidly shrinks for higher values of 1 − T γ /T S , and almost no constraints can be set for 1 − T γ /T S −8. Figure 3 also shows that the section of parameter space covering ζ 10 and 10 9.8 M M min 10 11 M which produces a highly ionized IGM is disfavoured. In fact, the excluded parameter space remains close to the parameter space which completes reionization by redshift 9.1, which corresponds to the region in white at the bottom right corner of both panels.

GRIZZLY and IGM parameters
In the left panel of Figure 4, we plot the average ionized fraction x HII as a function of the ζ and M min values we have explored. One can easily see that the models with the largest amplitude of the large-scale power spectrum correspond to an ionized fraction ≈ 0.5. This is expected as at this stage of the reionization process the typical dimension of the bubbles becomes comparable to the size of the scale of interest (see e.g., Ghara et al. 2015a). As the ionized fraction approaches 1, the power spectrum decreases and becomes negligible at the end of the reionization process due to the paucity of neutral hydrogen. One can see that for 1 − T γ /T S = −12 the excluded parameter space corresponds to average ionized fractions 0.2. It is interesting to note that, coincidentally, in this scenario the parameter space excluded by the LOFAR upper limit shares the same boundary at x HII ≈ 0.81 with the parameter space excluded by the CMB Thomson scattering optical depth constraint on the maximum possible value of ionized fraction at redshift 9.1 (dashed-dotted line, see Section 2.5).
The right panel of Figure 4 shows the value of the average brightness temperature δT b as a function of ζ and M min for 1 − T γ /T S = −12. δT b falls between ≈ −300 mK (fully neutral) Figure 6. Constraints on the three GRIZZLY parameters of the uniform T S scenario (see Section 3.1) from the MCMC analysis using the LOFAR upper limit for z ≈ 9.1. The color-bar shows the probability that models are ruled out. The solid and dashed curves show the 68 and 95 per cent credible intervals of the ruled out models. The diagonal panels show the marginalized probability distribution by which each parameter value as used in the MCMC analysis is ruled out. Table 4. Constraints from the MCMC analysis on the IGM parameters of the uniform T S scenario at z ≈ 9.1. Note that our analysis excludes the parameter space that satisfies all the conditions given in this  Figure 7. Similar to Figure 6, but this shows constraints on the IGM parameters at z ≈ 9.1 in the uniform T S scenario. The color-bar shows the probability that models are ruled out. The solid and dashed curves corresponds to the 68 and 95 per cent credible intervals of the ruled out models. The marginalized probability distributions of the IGM parameters are shown in the diagonal panels. and zero (fully ionized). We find that the excluded parameter space is concentrated around δT b −250 mK. This is due to the fact that the average ionized fraction remains 0.2 for the excluded parameter space for the case of the lowest spin temperature. Figure 5 shows the dependencies of R HII peak and ∆R HII FWHM on ζ and M min . Clearly, the most probable size of the bubbles and the FWHM increase with increasing ζ and decreasing M min , as the average ionized fraction increases (also see Giri et al. 2018a,b). As expected, the parameter space which is excluded has preferentially a large characteristic size of the ionized regions. More specifically, the part of the parameter space which can be ruled out for 1 − T γ /T S = −12 shows R HII peak 10 h −1 Mpc and ∆R HII FWHM 30 h −1 Mpc. To test whether the results for the derived IGM quantities could be sensitive to the choices for the source model, we also explored a source model in which the ionizing emissivity depends non-linearly on the halo mass. As can be seen in Appendix C, changing the source model affects the constraints on the source pa-rameters but reproduces the same constraints on the derived IGM parameters, illustrating that these constitute the more robust results of our study.

MCMC results
Up to this point we have only explored the implications of the LO-FAR upper limits using the results from GRIZZLY for slices through selected parameter spaces. In this section we employ our parameter estimation framework which includes the emulator results and an MCMC algorithm. The aim is to explore the full parameter space and find the probability that the models are ruled out by the current upper limit from LOFAR. We use 20 walkers and 10 6 steps for this MCMC analysis. We checked the convergences of the MCMC chains using the integrated auto-correlation time as suggested in Goodman & Weare (2010) and find that the chains are well converged for this number of steps.
The likelihood used for this MCMC analysis is given by Eq  . Spherically averaged power spectra of the 21-cm signal from z ≈ 9.1 at scale 0.075 h Mpc −1 as a function of M min,X and f X . These power spectra are generated using an emulator which is trained with 1495 models from GRIZZLY. This plot corresponds to ζ = 0.1, M min = 10 9 M and α = 1.2. The solid contour represents the upper limit constraint from LOFAR at scale 0.075 h Mpc −1 , i.e. ∆ 2 = (58.97) 2 mK 2 . For a deterministic observation, the region enclosed by the solid contour will be excluded.
3. In addition, we use a flat prior on x HII (z = 9.1) ≤ 0.81. Figure  6 shows the posterior distribution of the parameters ζ, M min and 1 − T γ /T S , with the solid and dashed curves indicating the 68 per cent and 95 per cent credible intervals 12 of the excluded models 12 We estimate the credible intervals of our posterior distributions by the within the range of parameters considered here. As expected, and as already suggested by Figure 3, higher values of ζ ( 10) and lower values of M min (in particular 10 9.8 M M min 10 11 M ) are more likely to be excluded as they result in higher ionization and thus a large-scale power spectrum more likely to exceed the observed one. Similarly, a colder IGM is more likely to be ruled out than a hotter IGM, as the former increases the signal strength.
We use a separate emulator to estimate the IGM parameters x HII , R HII peak and ∆R HII FWHM for this scenario from the same set of GRIZZLY source parameters as used in our MCMC framework. This emulator is constructed using the same method as described in Section 2.4. Figure 7 shows the posterior distribution of the IGM parameters. The constraints on the excluded IGM parameters are also listed in Table 4. Clearly, an IGM with x HII ≈ 0.13 − 0.74, (1 − T γ /T S ) −8.5, and H II bubble distribution characterised by R HII peak ≈ 8−58 h −1 Mpc and ∆R HII FWHM ≈ 16−185 h −1 Mpc is ruled out within 95 per cent credible intervals. This part of the parameter space corresponds to −250 mK δT b −55 mK. Note that the excluded parameter space requires satisfying all of the abovequoted conditions. These results are in agreement with our findings in section 3.1.1. However, it is also clear from Figures 6 and 7 that tighter constraints on the power spectrum are required to put any bounds on source and IGM parameters with this analysis if the IGM is not very cold.

Spin temperature fluctuations
In this section we relax the uniform T S assumption and consider the scenario in which X-ray sources cause partial ionization and heating of the IGM. However, we will not vary all five GRIZZLY approach based on computing the highest density interval (see e.g. Hyndman 1996). Figure 10. Average gas temperature for regions with ionized fraction less than 0.5 (left panel), volume fraction of the heated regions (middle panel), and average brightness temperature at redshift 9 as a function of M min,X and f X . Here ζ = 0.1. The solid curve represents the contour corresponding to ∆ 2 = (58.97) 2 mK 2 at scale 0.075 h Mpc −1 which is the LOFAR upper limit on the spherically averaged power spectrum. For a deterministic observation, the region enclosed by the solid contours will be excluded. Figure 11. Distribution of R heat peak (left panel) and ∆R heat FWHM (right panel) at z ≈ 9.1 as a function of the parameters f X and M min,X for ζ = 0.1. R heat peak and ∆R heat FWHM represent the radius at which the probability distribution of the sizes of the heated regions (i.e, regions with T K > T γ ) has maximum amplitude and the full width at half maximum of that distribution, respectively. We use the mean free path method to estimate the size distribution of the heated regions. The region in white in the bottom right of the panels corresponds to IGM fully heated above the CMB temperature. The solid curve represents the contour corresponding to ∆ 2 = (58.97) 2 mK 2 which is the LOFAR upper limit on the power spectrum at scale 0.075 h Mpc −1 . For a deterministic observation, the region enclosed by the solid contours will be excluded. parameters ζ, M min , M min,X , f X and α (see Section 2.2). Instead, we fix the values of M min and α and only retain the remaining three parameters. This choice is motivated by a preliminary study suggesting that the LOFAR upper limits provide very weak constraints on M min and α.
We set M min = 10 9 M , i.e. the lowest dark matter halo mass provided by our N-body results (Section 2.1). This means that, unlike the previous scenario, all halos contribute to the ionization of the neutral hydrogen in the IGM. All the halos also emit Lyα pho-tons, building a strong Lyα background. We thus assume that the Lyα coupling is saturated (in other word, T S = T K ) in this scenario. The value of the X-ray spectral index α is uncertain and dependent on the properties of the X-ray sources. For X-ray sources such as quasars and mini-quasars the spectrum can be very steep, with α 1 (Vignali et al. 2003;Gallerani et al. 2017;Martocchia et al. 2017), while for high-mass X-ray binaries the observed spectral index can be as small as α ≈ 0.2 (Mineo et al. 2012;Islam et al. Figure 12. Constraints on the three parameters of our second scenario with non-uniform T S fluctuations models presented in section 3.2 from the MCMC analysis using the LOFAR upper limit at z ≈ 9.1. The color-bar shows the probability that models are ruled out. The solid and dashed curves show the 68 and 95 per cent credible intervals of the ruled out models. The diagonal panels show the marginalized probability distribution for the parameters used in the MCMC analysis in this scenario. 2019). In this study, we assume α = 1.2. Below we discuss the effect of different α on our results.
The remaining three parameters constitute our parameter space. For ζ we keep the same range used in the previous scenario, while we vary M min,X between 10 9 and 10 12 M , and f X between 0.1 and 10. As we will see below, this choice covers the regime that is interesting from the point of view of the current LOFAR upper limits. As the run time of the simulations with spin-temperature fluctuations is much longer than in the previous scenario, we initially cover the parameter space with a coarser grid. We then visually identify the part of the parameter space that provides a large amplitude of the large-scale power spectrum and fine sample only that region to increase the accuracy of the emulator. We thus end up using only 1495 power spectra generated using GRIZZLY to train our GPR emulator for this scenario.
In Figure 8 we show a slice from the brightness temperature map corresponding to the scenario with ζ = 0.1, M min,X = 3 × 10 11 M and f X = 2. The average ionized fraction remains ≈ 0.01 due to the small value of ζ. The average volume fraction of heated regions of this map is also small (≈ 0.1) as M min,X is large, and thus only a few of the massive halos contribute to the heating. While in the previous scenario the patchiness of the signal was due to the ionized regions only, now it is also sourced by the heated regions around the sources.
The thick dashed curve in the right panel of Figure 8 refers to the power spectrum of the δT b map shown in the left panel, together with the 1494 other power spectra used to build the three-parameter emulator of ∆ 2 for this scenario. Similar to the previous case, we find that the large-scale power spectra of some of the extreme models are larger than the LOFAR upper limits, which are shown by the red data points and their limits in the right panel of the figure. Figure 13. Constraints on the IGM parameters of the non-uniform T S model presented in section 3.2 from the MCMC analysis using the LOFAR upper limit at z ≈ 9.1. The color-bar shows the probability that models are ruled out. The solid and dashed curves show the 68 and 95 per cent credible intervals of the ruled out models. The diagonal panels show the marginalized probability distribution for each of the IGM parameters considered in this scenario. Figure 9 shows the power spectrum at scale k = 0.075 h Mpc −1 in the 2D parameter space of f X and M min,X . Note that unlike Figure  3, where the power spectra were derived from the GRIZZLY simulations, here they are evaluated directly with the emulator. In this plot we fix ζ = 0.1, which ensures a small average ionized fraction at z = 9.1 (x HII = 0.01). Clearly, the power spectrum remains the lowest and constant for a combination of a high value of f X and a low value of M min,X . In this case, heating of the partially ionized gas in the IGM due to X-rays becomes very efficient, raising the gas temperature above the CMB (T K T γ ) and rendering δT b independent of the spin temperature. On the other hand, the heating of the gas in the IGM remains inefficient for a combination of small f X and high M min,X . As expected, the power spectrum for such models (top left corner of the Figure) is larger than the power spectrum for the heated IGM (bottom right corner of the Figure).

GRIZZLY and IGM parameters
One can see that the spin temperature fluctuations are effi-cient around the diagonal of the parameter space, starting from small values of f X and M min,X . Specifically, the combination of high M min,X and f X enhances the large-scale power spectrum. In this combination, the heated/emission regions around the rare Xray emitting sources remain isolated in the background absorption signal (e.g., see the left panel of Figure 8). Also, the partial ionization and heating of the IGM far away from the X-ray emitting sources remain small for a high value of M min,X . We also plot the contour corresponding to the LOFAR current upper limit at scale k = 0.075 h Mpc −1 , i.e. (58.97) 2 mK 2 . Clearly, some parts of the parameter space with the combination of high M min,X ( 10 11 M ) and f X ( 1) are ruled out with high confidence.
Next, we will consider the global parameters of this scenario. Note that to estimate the IGM parameters we use an emulator different from the one used for the source parameters. In Figure 10 we show the average temperature (T K ) of regions with ionized fraction smaller than 0.5, the volume fraction of heated regions f heat , and the average brightness temperature δT b . As expected, T K re-mains small for a combination of high M min,X and low f X , which also keeps f heat low. In this case the average signal remains in absorption, as shown in the right panel of the figure. On the other hand, T K is high for the opposite case of a low M min,X and a high f X , for which f heat approaches 1 and δT b becomes positive. The parameter space excluded by the LOFAR upper limit at scale k = 0.075 h Mpc −1 is shown by the solid curves in all panels. It corresponds to 10 K T K 100 K, f heat 0.3 and -200 mK δT b −100 mK.
In this scenario, the size distribution of the heated regions is more relevant than the size distribution of the ionized regions. Similarly to the size distribution of the ionized regions considered in the previous scenario, here we analyze the size distribution of the heated regions, characterising the PDF with two parameters, namely R heat peak and ∆R heat FWHM , which represent the size of the heated regions at which the PDF becomes maximum and full width of the half maximum of the PDF, respectively. Figure 11 shows the distribution of R heat peak and ∆R heat FWHM , suggesting that the characteristic size of the heated regions increases with increasing f X . The white regions represent an IGM fully heated above the CMB temperature. The parameter space in the range R heat peak ≈ 5 − 20 h −1 Mpc and ∆R heat FWHM ≈ 10 − 30 h −1 Mpc is the one excluded by the LO-FAR upper limit at z = 9.1. Note that the excluded parameter space requires satisfying all of the above-mentioned conditions.

MCMC results
Next, we explore the three-dimensional parameter space, i.e. ζ, M min,X and f X , using MCMC to find models that are ruled out by the current LOFAR upper limit. Similar to our previous scenario, we have used 20 walkers and 10 6 steps for the MCMC analysis and checked the convergence of the chains. Note that we use a flat prior on x HII (z = 9.1) ≤ 0.81. The outcome of the analysis is summarized in Figure 12. Clearly, a high emissivity of X-ray photons ( f X 0.3) with a large M min,X ( 10 10 M ) is the most likely to be excluded within the 68 per cent credible intervals by LOFAR alone. This combination of parameter values results in large heated regions around rare massive halos embedded in a cold IGM. On the other hand, the combination of large f X and a small M min,X causes more uniform heating and thus it reaches more easily the T S T γ condition where the power spectrum remains lower than the measured one. Similarly, a very small value of f X yields almost no heating and coincides with the scenario discussed in the previous section. In such models, a larger value of ζ is more likely to be ruled out as we have also seen in the previous scenario. Therefore we see a second ruled out region in the parameter space shown in Figure 12.
Next, we will constrain the IGM parameters of this nonuniform T S scenario, and show the posterior distribution of the IGM parameters in Figure 13. These results are also listed in Table 5. Clearly, two regimes of the parameter space are likely to be excluded. The first one has large H II regions in a poorly heated IGM, which is the configuration already discussed in the previous section. In this case, least likely values of the IGM parameters are: 0.5 x HII 0.6, T K 3.55 K with f heat ≈ 0. The second part of the parameter space which is likely to be excluded corresponds to large heated regions with: x HII 0.08,  Figure 13.
Up to this point we have only considered α = 1.2. A less steep SED with a smaller value of α contains a smaller number of soft X-ray photons and a larger number of hard X-ray photons. Thus, the heating due to an X-ray spectrum with smaller α is less patchy than that from a steeper spectrum (see e.g., Pacucci et al. 2014;Das et al. 2017;Islam et al. 2019), resulting in a smaller amplitude of the large-scale power spectrum of the signal. We have verified that for α = 0.3 the results are similar to those obtained with α = 1.2, except that the contour of the excluded region (see Figure 9) shrinks towards higher M min,X values and it shifts slightly towards higher values of f X .

DISCUSSION
We have considered two extreme scenarios, one in which fluctuations at large scales are driven by large ionized regions in a uniform spin temperature IGM, and the other in which they are driven by large heated regions in a non-uniform spin temperature IGM. One question that naturally arises is whether there exist other models capable of exceeding the LOFAR upper limits which are not covered by the two scenarios we have explored. As fluctuations in the 21-cm signal are induced by ionization and/or spin temperature fluctuations, it seems hard to come up with alternative scenarios which can be excluded without invoking non-standard physics.
A second question is whether the extreme cases considered are in any way realistic or whether they are already excluded by other observations. We have limited ourselves to deriving constraints from the LOFAR upper limits at z = 9.1 and have not added information from other redshifts, apart from a very conservative upper limit on the ionized fraction based on the Thomson scattering optical depth derived from the Planck results. This has been a conscious choice as using data from multiple redshifts requires assumptions about the evolution of the source properties which, given the small constraining power of the LOFAR upper limits, does not seem justified. However, it is still possible to apply a minimal check on the models that we find to be excluded by the LOFAR upper limits.
We first consider the scenario in which the excluded models require a fairly large value for x HII . The results from Mitra et al. (2015) show that the combined constraints from Planck and z > 6 quasar spectra imply that x HII 0.4 at z = 9.1. Although this limits the constraining power of the LOFAR upper limits, the latter is still unique in excluding some models, as we have found cases with x HII ≈ 0.3 and 1 − T γ /T S = −12 which violate them (see Figure 4). Monsalve et al. (2017) presented phenomenological constraints on the evolution of the global 21-cm signal derived from EDGES high-band observations. These constraints are mostly about changes in the signal and are therefore very different from the single z = 9.1 upper limits used for our results. However, our approach does produce values for the global signal (see the right hand panels in Figures 4 and 10), with excluded models lying in the range -250 K δT b −55 mK. These can be compared to the values in Figure 9 of Monsalve et al. (2017), where the authors show that for a minimum value of -200 mK, the ∆z for the full width half maximum of the entire absorption feature has to be above ≈ 5. They also show that this lower limit is inconsistent with an end of reionization at z ≈ 6. At face value this implies that the models excluded by the new LOFAR upper limits on the 21-cm power spectrum are also excluded by the EDGES high-band constraints on the evolution of the global signal. However, it should be kept in mind that the EDGES constraints are based on an assumed Gaussian profile Table 5. Constraints from the MCMC analysis on the IGM parameters of the non-uniform T S scenario at z ≈ 9.1. Note that our analysis excludes the parameter space that satisfies all the conditions given in this This comparison to previous results shows that the new LO-FAR upper limits exclude rather extreme models which were already unlikely in view of other observational constraints. However, it is important to point out that the LOFAR observations are of a very different character and thus contribute a new and independently obtained piece of the reionization puzzle. As we obtain more stringent upper limits and additional redshift points, the constraints will improve and start to rule out increasingly large regions of the parameter space.

SUMMARY & CONCLUSIONS
In this paper, we have used the new LOFAR upper limit on the dimensionless spherically averaged power spectrum of the 21-cm signal from redshift ≈ 9.1 (Mertens et al. 2020) and investigated which reionization scenarios can be ruled out by it. The upper limits as obtained from 10 nights of observations are (58.97) 2 mK 2 and (95.21) 2 mK 2 at scales k = 0.075 and 0.1 h Mpc −1 , respectively. As these numbers are much larger than the amplitude of the power spectrum expected for standard reionization histories, we mainly focused on the extreme models that produce such high values for the large-scale power spectrum. However, our study also covers the usual range of the parameter space.
With the code GRIZZLY we generated power spectra for thousands of models for different combinations of parameters namely, ionization efficiency (ζ), minimum mass of the UV emitting halos (M min ), minimum mass of X-ray emitting halos (M min,X ) and Xray heating efficiency ( f X ). On the basis of these results we build emulators for different scenarios based on Gaussian process regression that map source parameters to power spectra. These emulators combined with an MCMC framework are then used to constrain the source parameters at z ≈ 9.1 using the observed upper limits. We also build emulators that map source parameters to IGM parameters, which are used to put constraints on the IGM parameters. We considered two extreme scenarios in which large-scale fluctuations of the signal are driven by (i) ionized regions embedded in an IGM with a uniform spin temperature, (ii) spin temperature fluctuations.
As the 21-cm observations themselves characterise the state of the IGM, a major focus of this study is to constrain the thermal and ionization state of the IGM at z ≈ 9.1 using these upper limits. We study the state of the IGM in terms of parameters such as the average ionization fraction (x HII ), average gas temperature of the partially ionized IGM (T K ), (1-T γ /T S ), mass averaged brightness temperature (δT b ), volume fraction of the heated region ( f heat ), size of the H II (heated) regions at which the PDF of the sizes peaks R HII peak (R heat peak ) and the FWHM of the PDFs ∆R HII FWHM (∆R heat FWHM ). The results of our study can be summarized as follows.
• In the uniform T S scenario, we found that the models which can be ruled out by the upper limit have a high UV photon emission rate. More specifically, the model with the coldest possible IGM, i.e. T S 2.1 K, requires an emission rate 2.85 × 10 46 s −1 M −1 , which is 10 times larger than that predicted by population synthesis codes. At the same time, those models require a suppression of ionizing photons from halos with mass 10 9.8 M .
• A high emissivity of the UV photons renders the gas in the IGM largely ionized at the target redshift, so that ionized fractions x HII 0.13 are excluded within a 95 per cent credible interval. At the same time, the H II bubbles required have to be few in number and large in size. The characteristic size of the H II bubbles needs to be, R HII peak 8 h −1 Mpc, with a FWHM of the probability distribution of the size distribution larger than 16 h −1 Mpc. This keeps the average brightness temperature of the excluded models −250 mK. The size of the parameter space which can be excluded depends crucially on the value of T S , as it decreases with increasing T S and no constraints can be set for T S 3 K.
• For the scenario where the large-scale fluctuations of the signal are driven by spin temperature fluctuations, we found that the models ruled out are those in which regions with temperature larger than CMB cover a volume fraction 0.34 and at the same time are large with a characteristic size in the range 3.5 − 70 h −1 Mpc and a size distribution with a FWHM of 110 h −1 Mpc. The average gas temperature of the partially ionized regions for these excluded models is 7-160 K, while the average brightness temperature lies in between -234 mK and -65 mK. The heated regions required for these excluded models are large in size and few in number at the same time. This implies that scenarios in which the heating is driven by fewer X-ray emitting sources hosted by the rare massive halos (M min,X 10 10 M ) with a high emissivity of X-ray photons (X-ray luminosity 10 34 erg s −1 M −1 ) are more likely to be ruled out by the current upper limit.
As the current upper limits on the 21-cm power spectrum are rather large and restricted to one redshift, the constraints on the IGM and source parameters that can be obtained are not yet very strong. However, they do illustrate the potential of this type of observations to characterise the state of the IGM and from this the properties of early sources in a redshift range which has not been yet well explored. We expect LOFAR to produce more stringent upper limits on the power spectrum both through analysing more of the available data (also at other redshifts) and improving the meth-ods to deal with systematic effects. Combining these with other observables, such as the global 21-cm signal and observations of high-z galaxies using the present and next generation of ground based and space telescopes such as the James Webb Space Telescope, the European Extremely Large Telescope and the Atacama Large Millimetre Array, will give us a much deeper understanding of this crucial period in the history of the Universe.

APPENDIX A: GRIZZLY
Here we briefly describe the one-dimensional radiative transfer method used in the code GRIZZLY to simulate the redshifted 21-cm signal from the EoR. We refer the reader to Ghara et al. (2015aGhara et al. ( , 2018 for a more detailed description of the method. While the basic approach of GRIZZLY mainly follows the BEARS algorithm (Thomas & Zaroubi 2008;Thomas et al. 2009;Thomas & Zaroubi 2011), it differs slightly from the original method. Both codes avoid solving the one-dimensional radiative transfer equations on the fly and, instead, they use previously generated 1D ionization profiles to simulate an ionization field. Below, we briefly describe the steps used in GRIZZLY: • First, we generate a large number of 1D profiles of ionized fraction and kinetic temperature for different combinations of source parameter values. The parameters used for this are ionization efficiency, the ratio of X-ray and UV luminosities, X-ray spectral index, over-density of the uniform background IGM and redshift. In this study, we assume that the age of the source is 10 Myr. For a given cosmology, these profiles need to be generated only once.
• Next, we determine the size of the H II regions in all the 1D profiles and create a list of their radii for different parameter values. These are defined as the distance from the center of the source at which the ionized fraction drops to 0.5.
• Given a halo with a certain mass and position, we first determine the corresponding UV luminosity. From this, we determine the size of the H II region around that halo using the density field and the list of radii as generated in the previous step. This is done iteratively as follows. We start with a small value of the radius, estimate the spherically averaged over-density contained within it and look for the same combination of radius and over-density in the precompiled list. If this is not found, we change the initial choice of the radius and continue the iteration until a match is obtained. This step is repeated for all halos. The corresponding ionization profiles are used to generate the ionization field.
• When individual H II regions overlap, we estimate the number of photons in excess and distribute them around the surface of the overlapping regions so that all the photons are used for ionization.
• We then generate the kinetic temperature field from the ionization field and a correlation of the ionized fraction and the gas temperature (for details, see Ghara et al. 2015a).
Given the value of the uniform T S , the δT b maps can be generated using the ionization maps and density field for our first scenario. For our second scenario, which assumes T S = T K , we use the ionization, density and temperature maps to generate the δT b maps following Eq. 1.

APPENDIX B: LIKELIHOOD FOR UPPER LIMIT OBSERVATIONS
Using Bayes theorem, we can write the posterior of our model parameters θ for simulating the model power spectrum ∆ 2 m (k, θ) given the observed power spectrum ∆ 2 o (k) as follows, where the first and second term in the right hand side of the equation are the likelihood L(θ|∆ 2 o (k)) and prior, respectively. If ∆ 2 o (k) is a deterministic, a scenario is ruled out when the modelled power spectrum ∆ 2 m (k i ) is above ∆ 2 o (k i ) in any one wavenumber-bin k i . We can write L(θ|∆ 2 o (k i )) as a Heaviside function H (∆ 2 o (k i ) − ∆ 2 m (k i )). However, the ∆ 2 o (k i ) is probabilistic with mean of ∆ 2 21 (k i ) and standard deviation of ∆ 2 21,err (k i ). Therefore we need to draw a ∆ 2 a (k i ) from a normal distribution N ∆ 2 21 (k i ), ∆ 2 21,err (k i ) and calculate the probability of our model. Here ∆ 2 a (k i ) is a nuisance parameter over which we can marginalise to get the L(θ|∆ 2 o (k i )). Therefore L(θ|∆ 2 o (k i )) can be written as follows, The value for p(∆ 2 a (k i )|θ) is a Heaviside function H (∆ 2 a (k i ) − ∆ 2 m (k i )), while p(∆ 2 o (k i )|∆ 2 a (k i )) is defined by a N ∆ 2 21 (k i ), ∆ 2 21,err (k i ) . Putting these functions into equation B2, we get L(θ|∆ 2 o (k i )) = 1 √ 2π∆ 2 21,err