Possible evidence for a large-scale enhancement in the Lyman-$\alpha$ forest power spectrum at redshift $\mathbf{\textit{z}\geq 4}$

Inhomogeneous reionization enhances the 1D Lyman-$\alpha$ forest power spectrum on large scales at redshifts $z\geq4$. This is due to coherent fluctuations in the ionized hydrogen fraction that arise from large-scale variations in the post-reionization gas temperature, which fade as the gas cools. It is therefore possible to use these relic fluctuations to constrain inhomogeneous reionization with the power spectrum at wavenumbers $\log_{10}(k/{\rm km^{-1}\,s})\lesssim -1.5$. We use the Sherwood-Relics suite of hybrid radiation hydrodynamical simulations to perform a first analysis of new Lyman-$\alpha$ forest power spectrum measurements at $4.0\leq z \leq 4.6$. These data extend to wavenumbers $\log_{10}(k/{\rm km^{-1}\,s})\simeq -3$, with a relative uncertainty of $10$--$20$ per cent in each wavenumber bin. Our analysis returns a $2.7\sigma$ preference for an enhancement in the Lyman-$\alpha$ forest power spectrum at large scales, in excess of that expected for a spatially uniform ultraviolet background. This large-scale enhancement could be a signature of inhomogeneous reionization, although the statistical precision of these data is not yet sufficient for obtaining a robust detection of the relic post-reionization fluctuations. We show that future power spectrum measurements with relative uncertainties of $\lesssim 2.5$ per cent should provide unambiguous evidence for an enhancement in the power spectrum on large scales.


INTRODUCTION
Intergalactic neutral hydrogen along the line of sight toward highredshift quasars leaves an observable spectral signature in the form of Ly absorption lines. These features are cumulatively referred to as the Ly forest (see e.g. Rauch 1998;McQuinn 2016), and their observable properties are linked to the physical conditions in the distribution of intergalactic matter on scales of ∼ 0.5-50 comoving Mpc. The factors that determine the observable properties of the Ly forest absorbers -and hence the physical properties of the intergalactic medium (IGM) -may be grouped into two broad categories: those of cosmological origin, such as the nature of dark matter and the shape of the matter power spectrum (Croft et al. 2002;Seljak et al. 2006;Iršič et al. 2017a;Garzilli et al. 2019;Villasenor et al. 2022), and those of astrophysical origin, such as feedback processes (Theuns et al. 2002b;Viel et al. 2013b;Gurvich et al. 2017;Chabanier et al. 2019) and the ionization and thermal state of the IGM (Schaye et al. 2000;Bolton et al. 2005;Boera et al. 2014;Hiss et al. 2018;Gaikwad et al. 2021). Canonically, the ionization ★ E-mail: margherita.molaro@nottingham.ac.uk † E-mail: james.bolton@nottingham.ac.uk state of the IGM is determined by the UV photons emitted by stars and active galactic nuclei. As a result, intergalactic Ly absorption is also a key probe of the final stages of the reionization era at > 5 (Fan et al. 2006;Becker et al. 2015;Eilers et al. 2018;Bosman et al. 2022) A widely used statistic for characterising the Ly forest is the 1D power spectrum of the transmitted flux (McDonald et al. 2000;Palanque-Delabrouille et al. 2013;Iršič et al. 2017b;Walther et al. 2018;Boera et al. 2019;Karaçaylı et al. 2022). When studying the IGM approaching the reionization era, the 1D power spectrum is useful in several different ways. First, the power spectrum amplitude is sensitive to the average Ly forest transmission, and hence also the IGM neutral hydrogen fraction (Mishra & Gnedin 2022). Second, the shape of the power spectrum on small scales (i.e. for wavenumbers log 10 ( /km −1 s) −1.5) depends on the IGM thermal history, through the shape of the Doppler broadening kernel and role that gas pressure plays in setting the physical extent of Ly absorbers (Nasir et al. 2016). Finally, on large scales (log 10 ( /km −1 s) −1.5), the power spectrum is sensitive to spatial fluctuations in the thermal and ionization state of the IGM (Cen et al. 2009). These fluctuationsassociated with the inhomogeneous heating of the IGM during reionization -will linger for some time after reionization has completed due to the long cooling timescale in the low density IGM (Theuns et al. 2002a;Hui & Haiman 2003). The consequence is that observable relics of the reionization era will be imprinted in the Ly forest power spectrum at redshift > 4.
In this context, Molaro et al. (2022) (hereafter M22) used Sherwood-Relics -a set of hybrid radiation hydrodynamical simuluations of the IGM during reionization (Puchwein et al. 2022) -to study the effect of patchy reionization on the 1D Ly forest power spectrum. In agreement with a number of other studies (Cen et al. 2009;Keating et al. 2018;D'Aloisio et al. 2018;Oñorbe et al. 2019;Wu et al. 2019;Montero-Camacho & Mao 2020), M22 found that remnant patches of hot, highly ionised, low density hydrogen left over following reionization produce large-scale variations in the Ly forest transmission at 4.2 < < 5. These enhance the 1D power spectrum of the transmitted flux by 10-50 per cent on large scales, log 10 ( /km −1 s) −1.5, with this effect being largest close to the end point of reionization. However, M22 also demonstrated that these fluctuations will have a limited effect on the recovery of thermal parameters from existing measurements of the Ly forest power spectrum at > 4 (e.g. Boera et al. 2019). One reason for this is that the measurements of Boera et al. (2019) do not include the larger scales at log 10 ( /km −1 s) −2.2 where the additional power is expected to be most significant (see e.g. Fig. 6 in M22). Surveys such as the Dark Energy Spectropscopic Instrument (DESI) survey (Vargas-Magana et al. 2019) and the William Herschel Telescope Enhanced Area Velocity Explorer-Quasi-stellar Object (WEAVE-QSO) survey (Pieri et al. 2016) will, however, extend to such large scales with lower resolution spectra, and will measure the 1D Ly forest power spectrum to a precision of a few per cent. Inhomogeneous reionization effects at > 4 will therefore be an important astrophysical "nuisance" parameter that must be included within the forward modelling frameworks used to constrain the underlying matter power spectrum from these observations. Conversely, the presence of additional large scale power in the Ly forest data can also provide valuable information on the end stages of inhomogeneous reionization, as this may be consistent with relic fluctuations in the ionization and thermal state of the IGM (e.g. D' Aloisio et al. 2018).
In this work we will build on M22 by investigating the possibility of detecting inhomogeneous reionization using the 1D Ly forest power spectrum on large scales. For this purpose we will make use of the recent 1D power spectrum measurements presented by Karaçaylı et al. (2022) using data from the Keck Observatory Database of Ionized Absorption toward Quasars (KODIAQ) (O'Meara et al. 2017), the Spectral Quasar Absorption Database (SQUAD) (Murphy et al. 2019) and the XQ-100 survey (López et al. 2016). Importantly, these data extend to larger scales, log 10 ( /km −1 s) −3, compared to Boera et al. (2019), although the precision is still rather modest, with relative uncertainties in the power spectrum of 10-20 per cent. As we shall demonstrate, however, this is nevertheless sufficient for providing evidence for enhanced large scale power in the data. A related earlier result was presented by D' Aloisio et al. (2018), who performed an analysis of the 1D power spectrum at slightly higher redshift, = 5.2-5.6, using the sample of 21 quasars presented by McGreer et al. (2015). By comparing these data with a seminumerical model for the patchy ionization of the IGM, D' Aloisio et al. (2018) found tentative evidence (∼ 2 ) for an enhancement in large scale power at log 10 ( /km −1 s) −3.
We furthermore introduce two important updates to the Bayesian inference framework used in M22. First, we improve the accuracy and speed of our 1D Ly forest power spectrum emulator, by replacing the linear interpolation approach we had previously used to build our model grid (e.g. Viel & Haehnelt 2006) with a neural network.
Second, we introduce a more flexible, model independent approach for including the patchy ionization correction quantified by M22 within our Bayesian parameter estimation framework. This paper is structured as follows. In Section 2 we briefly redescribe the simulations used in M22 and introduce the new machinelearning generated power spectrum emulator that will be adopted in this work. In Section 3 we consider the effect of patchy reionization on the Ly power spectrum, and introduce a new parameterisation that allows the power spectrum template presented by M22 to be implemented within our Markov Chain Monte Carlo analysis in a more general, model independent way. In Section 4, we apply our framework to the data presented by Karaçaylı et al. (2022) and discuss the possible evidence for an enhancement in power on large scales. We furthermore consider the possibility of applying our analysis to future, higher precision data before concluding in Section 5. We assume a flat ΛCDM cosmology throughout this work, with Ω Λ = 0.692, Ω m = 0.308, Ω b = 0.0482, 8 = 0.829, s = 0.961, ℎ = 0.678 (Planck Collaboration et al. 2014), and a primordial helium fraction by mass of p = 0.24 (Hsyu et al. 2020).

Numerical simulations
In this work -with the exception of the neural network described in Section 2.2 -we use the same Bayesian inference and simulation set-up adopted in M22. This is based on a Monte Carlo Markov Chain (MCMC) sampler combined with a Metropolis Hastings algorithm, as introduced by Iršič et al. (2017a). The backbone of our Ly forest power spectrum emulator consists of 12 hydrodynamical simulations that form part of the Sherwood-Relics simulation suite (Puchwein et al. 2022). Each model follows a 20ℎ −1 cMpc cosmological volume with 2 × 1024 3 particles using variations of the Puchwein et al. (2019) spatially homogeneous UV background synthesis model (see Table 1 in M22). For completeness we briefly recapitulate some key information about the simulations here, but refer the reader to M22 for further details. The IGM thermal history in each of the 12 simulations is characterised by three parameters which, as in M22, will be used to described the IGM thermal state: the cumulative energy per unit mass deposited into gas at the mean background density, 0 , the temperature at mean density, 0 , and , the index in the power-law relation describing the temperature-density relation, where = 0 [ / ] −1 (Hui & Gnedin 1997;McQuinn 2016), where / is the local gas overdensity. The 12 simulations, each characterised by a different 0 value, are then post-processed to finely sample the 0 -parameterspace: gas particles in the simulations are translated and rotated in the temperature-density plane to obtain different combinations of these parameters (see Boera et al. (2019), Gaikwad et al. (2020), and M22 for more details).
The resulting Ly optical depths are also rescaled in post processing to give different values for the effective optical depth, eff , where eff = −ln and is the mean Ly forest transmission. In each redshift bin, the parameter grid uniformly spans 0 between 5 000 K and 14 000 K in steps of 1 000 K, between 0.9 to 1.8 in steps of 0.1, and eff from 0.3 to 1.7 times the observed effective optical depth eff,obs (Viel et al. 2013a) in steps of 0.1 eff,obs for each redshift bin considered. This leads to 10 × 10 × 15 × 12 = 18 000 post-processed sets of Ly forest spectra in the parameter grid for each redshift considered. The 1D Ly forest power spectrum is then calculated from Figure 1. Schematic illustrating the neural network used to predict a Ly forest power spectrum (ˆ) sampled with bins in wavenumber, , given four input astrophysical parameters ( eff , 0 , , 0 ). In addition to the input layer with four nodes for the input parameters, the network has three hidden layers (blue, green and orange) with 60 nodes each, and an output layer (yellow) with nodes, one for each -bin. W and b are the weight and bias matrices respectively, is the ReLU activation function (see text for details), a is the value of the node, and s is an intermediate value used in the calculation of a. The upper index in the square bracket refers to the layer (0 to 4, where 0 is the input layer and 4 the output layer), while the lower indices refer to the node. In the case of the weights, which "connect" two layers, the first lower index refers to the node in the first layer, and the second to the second layer. Equations are provided only for the top node of each layer. each set of spectra assuming a constant log 10 ( /km −1 s) step-size of 0.1, following Boera et al. (2019) and M22 (although we will adopt a different binning strategy when analysing observational data in Section 4). We refer to this grid of simulated power spectra as G homog , because it relies on simulations performed with a spatially homogeneous ultraviolet (UV) background.
Finally, as discussed in M22, for each redshift considered the 18 000 realisations of the power spectrum in G homog can be modified using a template that includes the effect of inhomogeneous reionization on the large scale power at log 10 ( /km −1 s) ≤ −1.5. This template uses the data compiled in Table 2 and eq. (3) of M22 (see also Section 3.2 later). We refer to the resulting grid of modified power spectra as G patchy , because these now also include the expected (model-dependent) effect of patchy reionization on large scales. Furthermore, for all mock realisations of the power spectrum the off-diagonal elements in the covariance matrix were obtained from the 80ℎ −1 cMpc, homogeneous-UVB Sherwood-Relics simulation with 2048 3 gas particles described in Puchwein et al. (2022). A larger box size was chosen for constructing the covariance in this work to limit any artificial correlations at log 10 ( /km −1 s) < −2.3.

Neural network interpolator
How to interpolate between different simulations to obtain a theory prediction at any point in parameter space is an issue common to all MCMC-based Bayesian inference studies of the Ly forest. Methods currently in use are either based on linear interpolation techniques (Viel et al. 2013a;Iršič et al. 2017a;Yèche et al. 2017;Palanque-Delabrouille et al. 2020), as were adopted in M22, or on alternative techniques such as Latin hypercube sampling and Gaussian process interpolation (e.g. Bird et al. 2019;Pedersen et al. 2021;Fernandez et al. 2022).
In this work, we propose an approach based on machine learn-ing techniques. We train a supervised neural network to generate the 1D power spectrum from the grid of available simulations with very high levels of accuracy, making it a reliable emulator to use in parameter recovery from forthcoming high-precision data. While a neural network trained on our set of simulations may not be immediately generalisable to simulations used in other studies, similar neural networks could very easily be trained using independent parameter grids, and transfer-learning could be invoked to avoid having to retrain the networks completely from scratch (Pan & Yang 2009). The machine learning model that we adopt here is an artificial neural network -sometimes known as a feed-forward neural network -which consists of a series of mathematical operations applied on the input parameters (see Fig. 1). Such operations are dependent on the choice of an "activation function" which we denote by , and the particular sequence in which the operations are applied. This sequence is best described by combinations of nodes organised in layers, linked so that every node in a given layer becomes the input to every node in the following layer. This gives rise to the "neural network" which lends its name to this method.
The output of node in layer , denoted by a [ ] , is given by where the weight, W, and bias, b, are matrices whose values are iteratively constrained during the training process, and where denotes the transpose of the matrix. A neural network without an activation function is essentially a linear regression model, with the activation function adding non-linearity. The number of nodes in each layer and the total number of layers considered are free parameters that can be modified to improve the network's performance (Hornik et al. 1989).
In this work, the neural network will be used to obtain predictions of the Ly forest power spectrum at parameter values in between those already included in our simulation grid. In our case, therefore, the input parameters a 0 are the four astrophysical parameters that our u 0 (eV m −1 p ) D 5% mock + G patchy + NN D 5% mock + G patchy + Lin The one and two dimensional posterior distributions for eff , 0 , and 0 at = 5 from our analysis of a mock Ly forest power spectrum drawn from the RT-late simulation. We assume 5 per cent relative uncertainties on the mock data for -2.8 ≤ log 10 [ ( /(km −1 s) ] ≤ −0.7. Results are displayed for the linear interpolator (blue contours) or the neural-network interpolator (orange contours) using the simulation grid G patchy . The purple stars show the true parameters used in the mock data. The bin at = 5 has the poorest match between the neural network and linear interpolator, which nonetheless is very good. All parameters are recovered to within 1-2 .
Bayesian inference MCMC method seeks to constrain, that is a 0 = { eff , 0 , , 0 }, where eff was converted into = e − eff prior to training. The output of the network will be the values of the power spectrum at a discrete number, , of bins in wavenumber, . As parameter recovery is performed independently using power spectrum data in separate redshift bins, we always consider independent neural networks trained, validated, and tested on 1D power spectra at the redshift under consideration. We furthermore independently train networks on both G homog and G patchy .
In each redshift bin, the data for the training, validation, and testing of the neural network are the 18 000 realisations of the power spectrum obtained from the simulations described in Section 2.1. In each case, a randomly selected 90 per cent of these power spectra are used to train the neural network, while 10 per cent are saved for validation. The loss function used for training is the mean squared error function, such that for power spectrum in the training sample, whereˆ, is the neural network prediction for the power spectrum, and , is the true value of the power spectrum in -bin for training sample . After experimenting with different activation functions we settled on the Rectified Linear Unit (ReLU) (Fukushima 1975), and chose a neural structure of [60,60,60]. Training was performed with batching (with batch size 12) to computationally optimise the process (LeCun et al. 2012).
The precision of our neural network was tested using k-fold cross validation (Stone 1974) as follows: the complete set of simulated realisations available in each redshift bin (18 000 power spectra) was first shuffled, and then divided into 10 sets of equal size. The initial randomization ensures that the thermal parameters of the power spectra making up each set are randomly chosen. One of the sets was then held back and the neural network was trained on the remaining 9 sets.
The neural network was then tested on the held-back set, resulting in a distribution of neural-network-predicted/true values for the 1 800 power spectra in the set not used in training. This process was then iteratively repeated for all the ten sets, resulting in ten independent distributions of neural-network-predicted/true values.
In Fig. 2 (left panel), we show the average distribution of the ten neural-network-predicted/true distributions independently obtained for neural networks trained on G patchy . The contour plots show (in blue) the average 68 and 95 per cent confidence intervals, while the blue solid line shows the average of the median of each distribution. The recovery error in the log 10 [( /(km −1 s)] < -1.2 range is around 0.5 per cent at the 68 per cent confidence level, increasing up to ∼1 per cent at largest wavenumbers considered here (log 10 [( /(km −1 s)] = -0.6), representing a very high level of precision. Furthermore, we find the uncertainty in the recovery to be well approximated by a Gaussian distribution in all -bins considered.
In the right panel of Fig. 2, we compare the one and two dimensional posterior distributions for eff , 0 , and 0 at = 5 from our analysis of a mock Ly forest power spectrum drawn from the RT-late simulation described in M22. We assume 5 per cent relative uncertainties on the mock data for -2.8 ≤ log 10 [( /(km −1 s)] ≤ −0.7. The results are obtained using either a linear interpolator (blue contours) or the neural-network interpolator introduced in this work (orange contours) for our G patchy grid of models. The purple stars show the true parameters of the mock data, showing that we recover the input parameters within 1-2 . We see that the choice of interpolator makes T 0 /10 4 (K) Figure 3. The one and two dimensional posterior distributions for eff , 0 , , and 0 from an analysis of mock 1D power spectrum data drawn from the M22 RT-late simulation at = 5. The mocks assume a 5 per cent relative error on the power spectrum. The purple stars and vertical dashed lines correspond to the true parameter values in the simulation. The results are obtained using either our original grid of homogeneous UVB models (G homog , blue contours) or that same grid including the M22 template for the effect of inhomogeneous reionization on the power spectrum (G patchy , orange contours). The left panel is for mock data spanning wavenumbers −2.2 ≤ log 10 [ ( /(km −1 s) ] ≤ −0.7, whereas the right panel is for mock data extending to smaller wavenumbers/larger scales, −2.8 ≤ log 10 [ ( /(km −1 s) ] ≤ −0.7. Note how the parameter recovery is more sensitive to the effect of inhomogeneous reionization when including data on larger scales.
little difference to the parameter recovery. However, along with the significantly improved control of interpolation uncertainties, the neural network interpolator reduces the computational requirements of the MCMC sampler. Using the same computational hardware, we find the neural network requires ∼ 1.5 per cent of the CPU time required by the linear interpolator for an equal number of steps. We will leverage the speed of this method in Section 4.4, where we perform an analysis of multiple realisations to assess the variance in our parameter recovery.

Revisiting M22 using the 1D power spectrum on large scales
We now revisit the analysis of the 1D power spectrum presented by M22. This earlier work concluded that the effect of inhomogeneous reionization on the shape of the power spectrum at 4.2 < < 5 should not strongly bias existing constraints on IGM thermal parameters obtained from an analysis of high resolution Ly forest data (see also Wu et al. 2019). This conclusion was based on assuming a relative uncertainty of 10 per cent on power spectrum measurements (i.e. similar or smaller than the level of precision currently achieved for the high resolution data presented by Boera et al. (2019)). On the other hand, M22 noted that, for data with relative uncertainties at the 5 per cent level, patchy reionization effects that change the shape of the power spectrum at small scales, log 10 ( /km −1 s) > −1.5 will introduce a modest (∼ 1 ) shift in the recovery of IGM thermal parameters. M22 found this was primarily driven by divergent peculiar velocity gradients and variations in the thermal broadening kernel that alter the shape of the power spectrum on small scales.
However, the power spectrum analysis presented by M22 only considered wavenumbers log 10 [ /(km −1 s)] ≥ −2.2, matching the measurement range presented by Boera et al. (2019). As discussed earlier, this excludes the larger scales where the enhancement in the power spectrum due to relic fluctuations in the neutral hydrogen fraction from reionization are expected to be largest. In this context, surveys such as DESI (Vargas-Magana et al. 2019) and WEAVE-QSO (Pieri et al. 2016) will not only achieve higher levels of precision, but will also probe larger physical scales in the redshift range 2 ≤ ≤ 4.5, where enhancements to the 1D power spectrum arising from patchy reionization are expected to be most prominent. Here, we therefore quantify how patchy reionization will impact on the recovery of IGM parameters by applying the M22 analysis to smaller wavenumbers/larger scales where the patchy correction is largest.
In Fig. 3, we use our parameter estimation framework to consider parameter recovery from mock data drawn from the RT-late simulation used in M22 (see also Puchwein et al. 2022). This simulation follows inhomogeneous reionization ending at redshift R = 5.3, where we choose to define the end point of reionization as the redshift when the volume-averaged H I fraction first falls below HI = 10 −3 in the model. We assume relative uncertainties of 5 per cent on the power spectrum and consider two different wavenumber ranges: −2.2 ≤ log 10 ( /km −1 s) ≤ −0.7 in the left panel (i.e. the range considered by M22 and Boera et al. (2019)), and a range that extends to larger scales, −2.8 ≤ log 10 ( /km −1 s) ≤ −0.7, in the right panel.
The panels in Fig. 3 show the one and two dimensional posterior distributions recovered for eff , 0 , and 0 in a single redshift bin at = 5. The results are obtained using our neural network   A p D 5% mock + G patchy D 5% mock + G homog + A p Figure 5. As for the right panel of Fig. 3, but now showing the results for G homog + p instead of G homog , where the former now uses the parameterisation for the patchy reionization template introduced in Eq. (4) (see text in Section 3.2 for further details). These two approaches lead to similar parameter constraints with overlapping contours. This indicates our updated G homog + p approach remains accurate, while simultaneously affording a greater degree flexibility and model independence to our analysis. power spectrum emulator, for either the grid of homogeneous UV background models used in M22 (G homog , blue contours), or that same grid including a template that captures the effect of inhomogeneous reionization on the power spectrum, based on eq. (3) in M22 (G patchy , orange contours). The purple stars and vertical dashed lines correspond to the true parameter values used in the simulation. First, as already noted in M22 (see their fig. 10), for wavenumbers −2.2 ≤ log 10 ( /km −1 s) ≤ −0.7 patchy reionization introduces a modest ∼ 1 shift between the true and recovered parameters (i.e. the blue contours and purple stars in the left panel of Fig. 3) when assuming a uniform UV background (G homog ). However, when including larger scales in the right panel, the systematic bias in the parameter recovery is much larger and is at the level of ∼ 6 for the total 2 . The largest shift is for eff ; this is unsurprising, given that eff sets the amplitude of the power spectrum on the scales where the patchy reionization effects are largest. There is also a large shift in the temperature-density relation, however this is entirely due to the very strong degeneracy between eff and . We also observe that the simulation parameters are generally well recovered (within ∼ 1-2 ) when applying our patchy reionization template to the grid of homogeneous UV background models (G patchy ), even when extending our analysis to larger scales. This is expected and serves as a useful consistency test, given that these mock data were drawn from one of the simulations used in M22 to obtain the patchy reionization template.
Parameter recovery is therefore much more sensitive to the effect of inhomogeneous reionization when including data on larger scales. As has been noted elsewhere (e.g. Cen et al. 2009;Keating et al. 2018;D'Aloisio et al. 2018;Oñorbe et al. 2019; Montero-Camacho & Mao 2020), this raises the interesting possibility that -given a significant detection of enhanced large scale power -the end stages of reionization may be constrained with precision measurements of the 1D Ly forest power spectrum on scales log 10 ( /km −1 s) −3 at > 4.

A generalised approach to modelling the enhanced large scale power from patchy reionization
Before turning to consider observational data, we additionally modify the framework first introduced by M22 to provide a more flexible, model independent approach for quantifying the amount of excess large scale power in data. The motivation for this is two-fold. First, it avoids using a parmeterisation that is directly tied to the redshift evolution of the reionization models used in M22. Second, it allows for a general parametrisation of the excess of large-scale power that can capture a larger variety of models. In the same manner as cosmological parameter data compression (e.g. McDonald et al. 2000;Pedersen et al. 2022), the interpretation of the measured excess large scale power can then be mapped back onto physical models. We implement this updated approach by continuing to use the template provided in Table 2 of M22, where the shape of the correction to the power spectrum for patchy reionization -obtained from radiative transfer simulations -is given by the quantity ( , ). Here where RT ( , ) ( homog ( , )) represent the 1D flux power spectrum from simulations that include (ignore) the effect of inhomogeneous reionization on the Ly forest, but otherwise have the same volume-averaged reionization history (see Puchwein et al. 2022, for further details). However, instead of assuming a model dependent redshift evolution for this template that is tied to the adopted reionization history (i.e. eq. (3) in M22), we now allow ( , ) to vary freely across each redshift bin within our analysis. We implement this by introducing the free parameter p with a flat prior in the range [0, 1]. This parameter controls the amplitude of the imprint of patchy reionization on the 1D flux power spectrum. This is achieved by relating it to the redshift, , at which the correction is linearly interpolated from the data in Table 2 of M22 by = 3.45 + 2.55 p , for 0 ≤ p ≤ 1.
This linear mapping has been obtained using the patchy reionization template ( , ) at 4.1 ≤ ≤ 5.4 in table 2 of M22, and linearly extrapolating to a lower and upper redshift limit of = 3.45 and = 6, respectively. The lower redshift is chosen such that the extrapolated template correction is negligible on large scales, which we find occurs at = 3.45 (i.e. for p = 0). The extrapolation similarly gives p = 1 by = 6. In practice, however, the exact parameterisation makes little difference to our results, as our main aim is to establish if p ≠ 0 is preferred in observational data. This is illustrated further in Fig. 4, where the coloured curves show the data presented in Table 2 of M22, and the dashed (dot-dashed) curves correspond to p = 0 ( p = 1) within our new parameterisation. We will refer to this approach as G homog + p within our analysis framework. In Fig. 5 we perform a brief consistency test of this approach, by once again examining the one and two dimensional posterior distributions obtained from the mock data used in Fig. 3. Recall the mock data are drawn from the RT-late simulation in M22 at = 5, and assume a 5 per cent relative error on the power spectrum measurement. The orange contours are identical to those shown in Fig. 3 (G patchy ), while the blue contours show the results for the generalised G homog + p approach. The purple stars and vertical lines, as usual, represent the true parameters in the mock data. Note that G patchy corresponds to a fixed value of p (as obtained by comparing Eq. (4) to eq. (3) of M22), such that ( , ) is assumed to exactly match the shape expected for a model with reionization ending at R = 5.3, as is appropriate for the model from which the mock data was produced. We observe that the two sets of contours overlap and p is recovered within 1 , indicating this approach maintains the accuracy of our parameter recovery, while also affording a greater degree of generality.

Observational data
We now apply our updated analysis framework to the observational measurements of the 1D Ly forest power spectrum at 4 ≤ ≤ 4.6 presented recently by Karaçaylı et al. (2022). The Karaçaylı et al. resolution on the smallest scales, and because we are interested in isolating the effect of inhomogeneous reionization on large scales, we further restrict our analysis by limiting the wavenumber range to < 0.1 km −1 s. This leaves 15 -bins in each of the 4 redshift bins, for a total of 60 data points.
By virtue of combining several data sets, these measurements boast one of the highest precision measurements to date of the flux power spectrum using high resolution Lyman-forest data. A typical uncertainty on the flux power spectrum at = 4.2 (4.6) is at the level of ∼ 10 − 15 per cent (20 − 30 per cent), ranging from intermediate to large scales. This is similar to the less sparsely sampled measurements of Boera et al. (2019), who use bin sizes Δ = 0.4 and report uncertainty levels on the flux power spectrum of ∼ 10 − 20 per cent (8 − 10 per cent) at = 4.2 (4.6). Furthermore, the Karaçaylı et al. (2022) power spectrum analysis is performed jointly on the quasar spectra from all three samples using the optimal quadratic estimator (Karaçaylı et al. 2020). An advantage over the more common fast Fourier transform estimators found in the literature (e.g. Viel et al. 2013a;Iršič et al. 2017b;Walther et al. 2019) lies in better control of the masked regions and therefore better recovery of the large-scale modes. This is critical for our analysis, as the effect of inhomogeneous reionization on the power spectrum is greatest on the largest scales probed by the data. Note also that although the flux power spectrum measurements from large spectroscopic surveys such as the extended Baryon Oscillation Spectroscopic Survey (eBOSS) reach a higher statistical precision (e.g. Chabanier et al. 2019), the typical spectral resolution prevents these surveys from measuring small-scale power. At large scales, the IGM thermal parameters are highly degenerate with the additional enhancement of power from reionization fluctuations, and adding small-scale information can significantly help in breaking this degeneracy (e.g. Viel et al. 2013a;Iršič et al. 2017a;Yèche et al. 2017). Furthermore, different systematic uncertainties can impact on the power spectrum measurements from low and high resolution data (see Section 4.3), such that a joint analysis with eBOSS data would be more complicated. For these reasons, we limit our investigation to the high-resolution data of Karaçaylı et al. (2022) and leave a joint analysis with low-resolution data to future work.

Data analysis
In order to compare our models to the flux power spectrum measurements of Karaçaylı et al. (2022), we use the same strategy and grid of simulations described in Section 2 and in M22. However, we make two minor but important modifications. First, because we are comparing our models to observational data, we must apply a mass resolution correction to all of our simulated models, which have been performed in 20ℎ −1 cMpc boxes with a gas particle mass of gas = 9.97 × 10 4 ℎ −1 (see Table 1 in M22). The resolution correction is obtained using a grid of simulations that have parameters matched to those used in M22, but with a mass resolution that is improved by a factor of 8. 1 This correction is largest at small scales, and is at most 12 per cent at = 0.1 km −1 s ( = 5), but is very small (< 1 per cent) on the large scales where the effect of patchy reionization is largest. Secondly, we now train the neural network on post-processed power spectra that use the same -binning as the Karaçaylı et al. (2022) data. We have verified that the accuracy and  Figure 6. The one and two dimensional posterior distributions for eff , 0 , , 0 and p from the analysis of the Karaçaylı et al. (2022) data using G homog (orange contours) and G homog + p (blue contours). The shaded regions show the 68 and 95 per cent confidence intervals for the joint distribution. For the latter, we also show in green the contours obtained when including the 0 − 0 prior described in Section 4. The four panels are for the four Karaçaylı et al. (2022) redshift bins analysed in this study, namely = 4.0, 4.2, 4.4, 4.6.
precision of the trained neural network remains at the same level demonstrated in Fig. 2 earlier. In Fig. 6 we show the one and two dimensional posterior distributions obtained from the analysis of the Karaçaylı et al. (2022) data excluding (G homog , orange contours) or including (G homog + p , blue contours) our parameterisation for the effect of inhomogeneous reionization on large scales. For the latter, we also consider a third case where a non-flat 0 − 0 prior is added to the analysis. We add this because the posterior on 0 − 0 is wide and some regions of parameter space (e.g. where 0 is very large and 0 is very low) are -3 -2.5 -2 -1.5 -1 log I [k/(km -1 s)] Figure 7. The best-fit F ( ) models obtained from our MCMC analysis of the Karaçaylı et al. (2022) data for the three cases displayed in Fig. 6: G homog shown by the orange dashed curves, G homog + p shown by the solid blue curves, and G homog + p + 0 − 0 prior shown by the dotted-dashed green curves. The data points with error bars show the Karaçaylı et al. (2022) power spectrum measurements in the four redshift bins considered here, = 4.0, 4.2, 4.4, 4.6. The best-fit 2 /d.o.f. values for the three cases are 55.4/44, 41.6/40, and 43.3/40 respectively, corresponding to p-values of = 0.12, 0.40 and 0.33. The data therefore exhibit a preference for an enhancement in the large scale power relative to models that assume a spatially homogeneous UV background.
unphysical. Hence, instead of allowing 0 and 0 to vary freely, we apply a prior that encompasses a physically plausible region within the 0 − 0 plane. We base this on the thermal histories used in our hydrodynamical simulations, corresponding to a region bounded by 6 ≤ ( 0 /eV m −1 p ) ( 0 /10 4 K) −1.7 ≤ 12 and 0.5 ≤ 0 /10 4 K ≤ 1.5. This prior enforces a tighter correlation between 0 − 0 that is almost perpendicular to the degeneracy axis between 0 and 0 found in the data. We note, however, that the 0 − 0 prior only affects parameter recovery from the power spectrum at the smallest scales, ∼ 0.1 km −1 s −1 , that are most sensitive to pressure smoothing and the thermal broadening kernel (Nasir et al. 2016); there is little effect on the recovery of p as a consequence. This result further suggests that the patchy reionization information contained in the p parameter is sensitive to the flux power spectrum on large scales only.
Finally, in Fig. 7, we compare the best-fit power spectrum models obtained from this analysis to the Karaçaylı et al. (2022) data in each redshift bin. Note that the goodness-of-fit varies across individual redshift bins, with the poorest at = 4.6 where -although the error bars are generally larger -there is some tension between the models and the change of the amplitude of the observed power spectrum from small to large scales. Note also that adjacent redshift bins are correlated, such that performing a simple "chi-by-eye" can be misleading. The joint fit across all four redshift bins is reasonable, with best-fit 2 /d.o.f. values of 55.4/44, 41.6/40, and 43.3/40 for the G homog , G homog + p , and G homog + p + 0 − 0 prior cases respectively, corresponding to p-values of = 0.12, 0.40 and 0.33. The improved 2 /d.o.f. values for the G homog + p cases with or without the 0 − 0 prior indicate that introducing the parameter p in the MCMC analysis -which introduces a boost to the large scale 1D power spectrum and a small suppression at small scales (see Fig. 4 and M22) -leads to a better fit to the observational data. There is a preference for p being required by the data at a significance of 2.7 (i.e. for an enhancement in the 1D power spectrum at large scales, log( / km −1 s) ∼ −3, relative to Ly forest models that assume a homogeneous UV background at > 4). This is in qualitative agreement with the independent analysis presented by D'Aloisio et al. (2018), although the formal significance we derive is higher. This may hint at the presence of relic fluctuations in the ionization and thermal state of the IGM following the completion of reionization (e.g. Cen et al. 2009;Keating et al. 2018;Oñorbe et al. 2019;Wu et al. 2019;Montero-Camacho & Mao 2020;Puchwein et al. 2022).

Alternative explanations for enhanced large scale power
Although the preference for non-zero p may be a signature of patchy reionization, other factors may be contributing to the boost in the 1D power spectrum at large physical scales. We briefly discuss some other possibilities here.
First, the enhanced flux power spectrum at large scales could have a non-astrophysical origin. Estimators used for the statistical analysis of fluctuations in the transmitted Ly flux require modelling of the intrinsic quasar continuum (e.g Francis et al. 1992;Suzuki et al. 2005;Ďurovčíková et al. 2020;Bosman et al. 2021). The modelling of this intrinsic quasar property is a complex procedure, and can leave residual contamination in the transmitted flux fluctuations. This can be due to the diversity in quasar continua when employing statistical methods to reconstruct the continuum, or due to absorption that can remove a large fraction of the intrinsic flux. The second of the two is especially of note for high-redshift quasars, where the larger IGM neutral fraction causes on average stronger absorption. A third possibility is linked to the echelle spectrographs commonly used in high-resolution spectroscopy. Observations made at higher echelle orders produce distortions in the spectrum over each order that are subsequently corrected for during flux calibration. This procedure can leave residual fluctuations imprinted on the absorption features. As has already been pointed out elsewhere (e.g. McDonald et al. 2005;Iršič et al. 2017b;Karaçaylı et al. 2022), any of the above mentioned sources of quasar continuum fluctuations could lead to enhanced power on larger scales, and this enhancement will be similar to the effect of patchy reionization. With this specifically in mind, Karaçaylı et al. (2022) marginalized over the parameters of their quasar continuum model in order to mitigate any scale-dependence from variations in the continuum. The preference for an enhancement in the large scale power we find in this work is therefore obtained already assuming a reasonable range of variation in the continuum. The marginalization procedure used by Karaçaylı et al. (2022), however, significantly increases the uncertainty on the power spectrum on large scales, and therefore effectively limits the increase in statistical power gained from combining different data sets.
There can also be other astrophysical effects that contribute to the shape of the large-scale power spectrum. Outflows driven by active galactic nuclei can suppress power on large scales, although this effect only becomes important toward lower redshifts, 2.5, where the volume filling factor of the outflows is larger and the Ly forest is sensitive to higher density gas (Viel et al. 2013b;Chabanier et al. 2020). At higher redshifts, enhancements to the power spectrum on large scales are more likely to arise from high column density systems, such as damped Ly absorbers and super Lyman limit systems with column densities HI 10 19 cm −2 . These systems have large Lorentzian damping wings that correlate on scales of order of ∼ 1000 s −1 km or ∼ 10 ℎ −1 Mpc. Measurements of the Ly forest flux power spectrum therefore typically compile a catalogue of such systems, and the affected regions are subsequently masked (Iršič et al. 2017b;Walther et al. 2019;Boera et al. 2019), but at the cost of a loss of information and hence statistical power Karaçaylı et al. 2022). However, due to incompleteness in the catalogues, particularly in low signalto-noise data, residual effects from damping wings can remain. This effect was studied in detail by Rogers et al. (2018) (and see also Mc-Donald et al. 2005;Font-Ribera & Miralda-Escudé 2012), where it was found that damped systems introduce a large-scale enhancement in the power spectrum that is similar to that expected from patchy reionization. A more complex analysis would therefore seek to further marginalize over the parameters of a model for contamination by damped absorbers.
The onset of inhomogeneous He II reionisation at redshifts 4 (e.g. Worseck et al. 2016) could also potentially impact on the Ly forest power spectrum, although the effect on the 1D power spectrum at large scales is minimal (McQuinn et al. 2011). The largest effect is instead expected at small scales due to increased line widths associated with He II photo-heating (La Plante et al. 2018;Upton Sanderbeck & Bird 2020). Furthermore the bulk of the additional He II photo-heating should occur at lower redshifts than those we consider here, particularly if He II reionization does not fully complete until ∼ 3.
Lastly, in this work we have assumed a fixed ΛCDM cosmology, but the enhancement of large-scale power by reionization also has important consequences for the inference of cosmological parameters from the Ly forest power spectrum. This will be relevant for parameters that derive most of the constraining power from large scales. For example, parameters that change the amplitude of matter fluctuations, such as 8 and the sum of the neutrino masses, Σ , could be biased high if the impact of reionization on large scales is not accounted for. Higher precision measurements of the flux power spectrum at large scales could alleviate this tension, as the change in the shape of power spectrum will be different when varying either 8 or the reionization model. However, such reasoning is more complicated for the value of the spectral index, s , and the running of the spectral index, s . A combination of a lower value for s and a lower mean Ly forest transmission can mimic the shape of the reionization signal when averaged over too narrow a range of wavenumbers. Such an analysis therefore runs the risk of s being biased low, or instead trading a low s value for a non-zero running of the spectral index. Further study of these effects will be required when inferring cosmological parameters from the Ly forest power spectrum at > 4.

Implications for future constraints on inhomogeneous reionization
We now briefly turn to consider how future, high precision measurements of the Ly forest power spectrum on large scales may be used detect the signature of inhomogeneous reionization. We achieve this by constructing mock realisations of the power spectrum using the RT-late simulation described in M22. We then vary the size of the relative uncertainty on the mock realisations, and apply our parameter inference framework to obtain p . This enables us to assess the precision that the true underlying value of p used in the simulation -which we refer to as true p -is recovered from the mock data. In Fig. 8, we obtain a distribution for the recovered best fit p parameter after performing our MCMC analysis assuming different relative uncertainties on the power spectrum measurements on 500 mock realisations. We show the median (solid curves) and the 68 per cent confidence intervals (shaded regions) for p / true p in three different redshift bins for different relative uncertainties on the power spectrum. We start from a 20 per cent relative uncertainty -comparable to that in Karaçaylı et al. (2022) -and extend this to higher precision cases that may be within reach of future surveys such as WEAVE-QSO and DESI. Note the recovered 68 per cent scatter is smaller in the higher redshift bins because the large scale enhance-  ment in the 1D power spectrum due to patchy reionization is more prominent approaching the end-point of reionization. Fig. 8 highlights how the precision of the true p recovery significantly improves as the relative uncertainty on the mock data decreases. For future, high-precision observations of the 1D Ly power spectrum, this opens the possibility of using the power spectrum on large scales to unambiguously recover the signature of postreionisation fluctuations in the IGM thermal state. We expect that this will be most useful when combined with a joint analysis of the transmitted flux distribution (e.g. Bosman et al. 2022) to obtain self-consistent constraints on the timing of reionization.

CONCLUSIONS
Inhomogeneous reionization impacts on the shape of the 1D Ly forest power spectrum at redshifts ≥ 4. This is because of coherent fluctuations in the ionized hydrogen fraction on large scales (see also Cen et al. 2009;Keating et al. 2018 In this work, we have used the Sherwood-Relics simulations (see Puchwein et al. 2022, for an overview) to assess the possibility of detecting the relic signature of inhomogeneous reionization at 4.0 ≤ ≤ 4.6 using measurements of the 1D power spectrum at wavenumbers log 10 ( /km −1 s) ≤ −2.2 (Karaçaylı et al. 2022), corresponding to larger scales than those considered in recent analyses of the intergalactic medium (IGM) thermal state (Boera et al. 2019;Molaro et al. 2022, although see also D'Aloisio et al. (2018) for earlier work using data at higher redshift). We update the analysis framework introduced by Molaro et al. (2022) by introducing a neural-network-based interpolator in our Bayesian inference analysis. This approach improves the control of interpolation uncertainties in our analysis (typically < 1 per cent), and is more computationally efficient than our previous linear interpolation technique. We also use a more general, model independent approach for parameterising the power spectrum template for inhomogeneous reionization first presented by Molaro et al. (2022). Our main results are as follows: • As already discussed by Molaro et al. (2022), we find that if considering mock realisations of the Ly forest power spectrum at wavenumbers −2.2 ≤ log 10 ( /km −1 s) ≤ −0.7 with 5 per cent relative uncertainties, patchy reionization introduces a modest ∼ 1 shift between the true and recovered IGM parameters if (incorrectly) assuming a spatially uniform UV background. However, when including larger scales with wavenumbers log 10 ( /km −1 s) −3, the bias in the parameter recovery is much larger and is at the level of ∼ 6 . This demonstrates that IGM parameter recovery is significantly more sensitive to inhomogeneous reionization when including power spectrum data on large scales.
• We perform a first analysis of the Ly forest power spectrum measurements presented recently by Karaçaylı et al. (2022), which are based on the combined observational data from the Keck Observatory Database of Ionized Absorption toward Quasars (KODIAQ) (O'Meara et al. 2017), the Spectral Quasar Absorption Database (SQUAD) (Murphy et al. 2019) and the XQ-100 survey (López et al. 2016). These data cover four redshift bins at = 4.0, 4.2, 4.4, and 4.6, and extend to large scales, log 10 ( /km −1 s) −3 and have typical relative uncertainties of 10-20 per cent in each wavenumber bin. We find a preference at the 2.7 level for an enhancement in the Karaçaylı et al. (2022) power spectrum at large scales relative to models that assume a spatially uniform background. This additional power may be due to patchy reionization, although we caution that systematic effects (e.g. variations in the shape of the continuum placement, or the damping wings of high column density systems) could still contribute.
• The precision of the Karaçaylı et al. (2022) power spectrum data at large scales is insufficient for recovering the signature of inhomogeneous reionization at high significance. However, forthcoming surveys such as the Dark Energy Spectropscopic Instrument (DESI) survey (Vargas-Magana et al. 2019) and William Herschel Telescope Enhanced Area Velocity Explorer QSO (WEAVE-QSO) survey (Pieri et al. 2016) will also extend to large scales at 2 ≤ ≤ 4.5, and will measure the 1D Ly forest power spectrum to a precision of a few per cent. Any enhancement in the power spectrum on large scales should then be recoverable. We anticipate that combining the 1D power spectrum on large scales in a joint analysis with the transmitted flux distribution (e.g. Bosman et al. 2022) should provide a powerful constraint on the timing of reionization.
DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility. The equipment was funded by BEIS capital funding via STFC capital grants ST/P002293/1 and ST/R002371/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure. MM and JSB are supported by STFC consolidated grant ST/T000171/1. VI is supported by the Kavli foundation. MGH acknowledges support from the UKRI STFC (grant numbers ST/N000927/1 and ST/S000623/1). LCK was supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 885990. We thank Volker Springel for making P-GADGET-3 available. We also thank Dominique Aubert for sharing the ATON code. We acknowledge use of the keras2c library in this work (Conlin et al. 2021). For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising from this submission.

DATA AVAILABILITY
All data and analysis code used in this work are available from the first author on reasonable request.