Photometric Redshift Estimation for Gamma-Ray Bursts from the Early Universe

Future detection of high-redshift gamma-ray bursts (GRBs) will be an important tool for studying the early Universe. Fast and accurate redshift estimation for detected GRBs is key for encouraging rapid follow-up observations by ground- and space-based telescopes. Low-redshift dusty interlopers pose the biggest challenge for GRB redshift estimation using broad photometric bands, as their high extinction can mimic a high-redshift GRB. To assess false alarms of high-redshift GRB photometric measurements, we simulate and fit a variety of GRBs using phozzy, a simulation code developed to estimate GRB photometric redshifts, and test the ability to distinguish between high- and low-redshift GRBs when using simultaneously observed photometric bands. We run the code with the wavelength bands and instrument parameters for the Photo-z Infrared Telescope (PIRT), an instrument designed for the Gamow mission concept. We explore various distributions of host galaxy extinction as a function of redshift, and their effect on the completeness and purity of a high-redshift GRB search with the PIRT. We find that for assumptions based on current observations, the completeness and purity range from $\sim 82$ to $88\%$ and from $\sim 84$ to $>99\%$, respectively. For the priors optimized to reduce false positives, only $\sim 0.6\%$ of low-redshift GRBs will be mistaken as a high-redshift one, corresponding to $\sim 1$ false alarm per 500 detected GRBs.


INTRODUCTION
Gamma-ray bursts (GRBs) are the most electromagnetically luminous events in the Universe.They are divided into two classes (Mazets et al. 1981;Kouveliotou et al. 1993): short GRBs are thought to arise from compact object mergers (Eichler et al. 1989;Narayan et al. 1992), while long GRBs result from the core collapse of massive stars (Woosley 1993).While both types of GRB are exceedingly bright, long GRBs are the most luminous (Hjorth et al. 2003), with the brightest having isotropic-equivalent luminosities in excess of 10 54 ergs/s (Frederiks et al. 2013), allowing them to potentially be ★ E-mail: hfausey@gwu.edu† deceased detected out to redshifts as high as ∼ 20 (Lamb & Reichart 2000).
The Epoch of Reionization is theorized to have ended around red-shift  ∼ 6, or 1 billion years after the Big Bang (Totani et al. 2006).GRBs at redshifts higher than  ∼ 6 will facilitate the study of star formation and chemical evolution at earlier stages in the evolution of the Universe.Fewer than 10 GRBs with  > 6 have been found to date (Tanvir et al. 2009;Cucchiara et al. 2011;Salvaterra 2015;Tanvir et al. 2018).While there are multiple missions dedicated to GRB detection and science, such as the Neil Gehrels Swift Observatory (Swift; Gehrels et al. 2004), the Fermi Gamma-ray Space Telescope (Fermi; Atwood et al. 2009), and KONUS-Wind (Aptekar et al. 1995), these missions are not optimized for detecting highredshift GRBs.The community has clearly recognized this absence, as a legion of high-z GRB missions have been proposed over more than a decade: Xenia (Kouveliotou et al. 2008), Joint Astrophysics Nascent Universe Survey (JANUS; Burrows et al. 2010), Energetic X-ray Imaging Survey Telescope (EXIST; Grindlay et al. 2010), Origin (Piro et al. 2011), High-z gamma-ray bursts for unraveling the dark ages mission (HiZ-GUNDAM; Yonetoku et al. 2014), Gamow Explorer (Gamow; White et al. 2021), Transient High-Energy Sky and Early Universe Surveyor (THESEUS; Amati et al. 2021), and Space Variable Objects Monitor (SVOM; Atteia et al. 2022).So far, only SVOM has been selected to become a mission, but it will be crucial to launch missions and instruments designed to detect high-z GRBs if we are to fully study the environments and evolution of the early Universe using GRBs as probes.
A major component of any future high-z GRB mission will be to alert the community for rapid follow-up observations of high-z GRBs.Fast and accurate on-board redshift estimation will be vital to their success.There are two ways to estimate redshift, using photometry and spectroscopy.Photometric redshift estimation relies on Lyman- (Ly) blanketing, a sharp loss of flux due to a multitude of absorption lines from neutral hydrogen throughout the IGM (Madau et al. 1996;Madau 1995).Spectroscopic redshift estimates are more accurate but require a significantly longer observing time, as they rely on measuring the flux through many small wavelength bins.Photometry uses broad bands, so it is faster than spectroscopy but less accurate for redshift estimation.Because of its speed and technical constraints, photometry is more practical for rapidly alerting the community to potential high-z GRBs.However, photometry is susceptible to misidentifying a high-extinction GRB as a high-redshift one (Curran et al. 2008).Extinction can cause a seemingly steep loss in flux when using broad photometric bands, as it has a larger effect at shorter, bluer wavelengths (Fitzpatrick 1989;Cardelli et al. 1989;Pei 1992;Schady et al. 2012).For high-extinction GRBs where this effect is more pronounced, it can be difficult to distinguish it from a high-z GRB.Finding as large a number of true high-z GRBs as possible (completeness), while minimizing the number of low-z high-extinction GRBs causing false alarms (purity), will establish the potential success of a high-z GRB mission.
Here we present an in-depth examination into our ability to correctly identify high-z GRBs using phozzy, a photometric redshift estimation code capable of simulating and fitting photometric measurements, and estimating instrument performance for future high-z GRB missions.The code can be applied to any instrument that makes use of any number of simultaneously observed photometric bands for redshift estimation.As an example, we use the channels proposed for the Photo-z Infrared Telescope (PIRT; Seiffert et al. 2021;White et al. 2021) onboard Gamow, a high-z GRB mission proposed to the 2021 NASA MIDEX call (White et al. 2021).The PIRT is an instrument designed to identify high-z GRBs within 100 seconds and send an alert within 1000 seconds of a GRB trigger (White et al. 2021).We use the PIRT to estimate how accurately we can differentiate between true high-z GRBs, and low-z high-extinction GRBs.
In Section 2, we will discuss details about the code, including the models, simulated GRB generation, and fitting used for the simulations.We present the results of the simulations in Section 3, including the key metrics used for estimating instrument performance.In Sections 4 and 5, we consider the implications of the results for future GRB missions and summarize our findings.

METHODOLOGY AND MODELLING CODE
In this section, we discuss the motivation and details behind our spectral model and the fitting method.We then explain the choices of distributions for both the input parameters of the simulated GRBs and priors for the fitting.Finally, we describe the structure of the code and how it functions.

Model
We assume that the optical to near-infrared (NIR) regime of a GRB afterglow spectrum can be modelled by a single power-law function of flux versus frequency or wavelength (Sari et al. 1998).Two major effects are applied to the spectrum: host galaxy extinction and intergalactic attenuation.
Host galaxy extinction is the absorption and scattering of light due to gas and dust along the line of sight in the GRB host galaxy (Pei 1992;Klose et al. 2000;Zafar et al. 2011;Greiner et al. 2011;Covino et al. 2013;Bolmer et al. 2018).Extinction is most prominent in the ultra-violet (UV) and optical regimes, and primarily affects shorter, bluer wavelengths (Fitzpatrick 1989;Cardelli et al. 1989;Pei 1992).For GRB host galaxies, there are typically three templates used: the Small Magellanic Cloud (SMC), Large Magellanic Cloud (LMC) and Milky-Way extinction models developed by Pei (1992).Our code includes all three models, but we chose to use the SMC model for the simulations presented in this paper because it is most consistent with observations for GRB host galaxies, which tend not to have a bump at 2175 Å (e.g., Schady et al. 2012).The 2175 Å bump was originally attributed to graphite and silicate grains (Mathis et al. 1977;Draine & Lee 1984;Pei 1992;Schady et al. 2012), or a mix of carbonaceous grains with polycyclic aromatic hydrocarbons (PAHs) (Li & Draine 2001;Weingartner & Draine 2001;Draine & Li 2007;Fischera & Dopita 2008).However, recent work suggest that PAHs alone may be the cause (Shivaei et al. 2022;Hensley & Draine 2023;Lin et al. 2023).
Intergalactic attenuation is applied to the extincted spectrum using a model developed by Meiksin (2006), which is based on work by Madau (1995).Intergalactic attenuation is the photoelectric absorption and resonant scattering by hydrogen gas in the intergalactic medium (Madau 1995;Meiksin 2006).It is dominated by the Lyman- (Ly) line, which has a wavelength of   = 1216 Å, corresponding to the transition between the first and second energy levels of the hydrogen atom (Madau 1995;Madau et al. 1996;Meiksin 2006).This results in a Ly forest -a region of spectra 'blanketed' by a multitude of Ly (and other higher order Lyman series) absorption lines due to intervening hydrogen (Madau 1995;Madau et al. 1996).The red edge of the Ly forest features a steep drop-off in spectra that occurs at a wavelength of   (1 + ), so it is heavily dependent on the redshift (Madau 1995;Madau et al. 1996).This Ly drop-off is a key marker for estimating the redshift of extra-galactic objects (Steidel & Hamilton 1992;Steidel et al. 1996;Madau et al. 1996;Krühler et al. 2011), and is the main feature used by phozzy to determine the photometric redshift of a GRB.
Overall the model is controlled by 4 parameters: the flux normalization , the spectral index , the redshift , and the extinction  − .The afterglow spectral flux is given by  and , while  − and  influence the spectral shape through effects related to extinction and redshift, respectively.

Fitting with MCMC
The code runs a Markov Chain Monte Carlo (MCMC) fitting method using emcee, a python package developed by Foreman-Mackey et al. (2013).The MCMC method is a stochastic process that estimates parameters using posterior distributions (Spade 2020).The posteriors are determined by a likelihood function based on the fit to the data and priors, probability distributions for the parameters based on a priori information (Spade 2020).
The phozzy package uses a Bayesian likelihood function where L is the likelihood, and  2 is defined as

𝑖
where   and   are the measured and fit average fluxes across each photometric band , respectively, and   are the uncertainties of the measurements.The priors used in the Bayesian analysis are outlined in Section 2.3.1.In the photo-z code, a set of 50 walkers are given a 250-step 'burn-in' phase allowing them to settle into a preferred, stable region of parameter space before starting its 500step 'production' phase from which the posterior distributions are created (Spade 2020).Once the runs are complete, the code takes the final positions of all 50 walkers for further analysis.
Even though there are only 4 parameters, the parameter space is complex due to effects from host galaxy extinction and intergalactic attenuation.Furthermore, the number of photometric bands is typically not much larger than the number of free parameters.This makes fitting prone to getting stuck in local minima instead of finding a global minimum.An MCMC fitting method is helpful for complex parameter spaces because it is stochastic and less likely to get stuck in a local minimum.With an MCMC method we can take the posterior distributions and the final positions of the walkers to identify multiple minima in parameter space if they exist.

Parameter Priors and Inputs
MCMC fitting methods allow for the input of a set of priors, or probability distributions for the parameters that can inform the posteriors.These simulations also require distributions for the input parameters that will be used to simulate the photometric band fluxes.Here we describe the parameter distributions used for both priors and input.

Priors
For the flux normalization  and redshift , the code includes a simple positive uniform prior.We considered including a redshift prior based on an expected redshift distribution of GRBs the would have been observed with Gamow (based on Ghirlanda et al. 2015Ghirlanda et al. , 2021;;Ghirlanda & Salvaterra 2022), but found that it resulted in a large portion of high-z GRBs being confused as low-z GRBs.We instead use this expected distribution for the GRB input redshifts (see Section 2.3.2) The spectral index is given a Gaussian prior centered on  = 0.7 with a standard deviation of 0.2, which is based on a large sample of observed optical afterglow spectral indices (Li et al. 2015).
The code includes multiple priors for the host galaxy extinction  − , which are explored in the simulations presented in this paper: • 'none' -extinction is assumed to be 0 and is not fit as a free parameter; • 'basic' -a single exponential prior based on results from Covino et al. (2013); • 'evolving' -a more complex exponential prior that combines results from Covino et al. (2013), Bolmer et al. (2018) and Greiner et al. (2011), in which the exponential constant changes depending on the redshift.
In their study of Swift GRBs, Covino et al. (2013) found that most GRBs have a relatively low host galaxy extinction, with a quickly diminishing number of GRBs at higher extinctions.We model our 'basic' prior off of this (Covino et al. 2013) data by fitting their extinction distribution with an exponential function.
A later study performed by Bolmer et al. (2018) found that higher redshift GRBs have significantly lower extinction compared to those of lower-redshift GRBs, with GRBs with 2 <  < 4 all having  V < 3, and GRBs with  > 4 having  V < 0.5.The same  V thresholds appear when combining the GRBs examined by Bolmer et al. (2018) and Covino et al. (2013) with an additional GRB host galaxy extinction study done by Greiner et al. (2011).To account for this relationship between redshift and extinction, we also created an 'evolving' extinction prior for which we fit the extinction distributions for the  < 2, 2 <  < 4, and  > 4 data sets with an exponential function, and use the different exponents for their corresponding redshift ranges (See Table 1).
We also include the option for 2 different sets of upper limits on the 'evolving' extinction prior, based on the  V thresholds for different redshift ranges in the Bolmer et al. (2018), Covino et al. (2013), andGreiner et al. (2011) GRB data (See Table 1).

Inputs
When selecting the input parameters for each simulated GRB, the code can pull from any of the priors for each respective parameter, but with two additions: a log-normal redshift distribution and a Gaussian flux distribution with a mean flux determined by the input redshift.
We include a log-normal input redshift distribution ( = 0.8,  = 0.55) based on an expected redshift distribution for GRBs observed by Gamow and based on previous work found in Ghirlanda & Salvaterra (2022); Ghirlanda et al. (2021Ghirlanda et al. ( , 2015)).We chose not to include this distribution as a prior because it negatively impacted the fitting methods' ability to identify high-z GRBs because the distribution is concentrated around low redshifts, and increases the likelihood that it would identify high-z GRBs as low-z GRBs.However, as an input distribution it can estimate how accurate the photo-z measurements will be for a general population of GRBs.False positives due to low-z dusty interlopers are a great concern for a high-z GRB mission, so it is vital to ensure that low-redshift GRBs are correctly categorized.This is especially important since the vast majority of GRBs will likely have  < 5, so even a small percentage of false positives could result in more false positives than true high-z detections.
For the input flux we use a Gaussian prior for the log of the flux in Jy ( = 6.18,  = 2.65) based on the expected brightness distribution for GRBs at redshift 10 (Kann et al., in preparation).The flux is then adjusted for the redshift of the GRB by using the ratio of luminosity distances (Weinberg 1972) to determine what the Table 1.Table of exponents for each exponential prior of the host galaxy extinction, including the various exponents for different redshift ranges in the 'evolving' prior.Note that the 'evolving' prior exponents get steeper in higher redshifts ranges due to the increasingly quick drop-off in  V at higher redshifts.The table also shows the different  V and  − upper limits included in the code, based on data from Bolmer et al. (2018), Covino et al. (2013), andGreiner et al. (2011).
Prior type   − exponent upper limit 1 upper limit 2 0 -2 6.9 6 2.05 6 2.05 evolving 2 -4 12.6 3 1.02 3 1.02 evolving > 4 36.2 1 0.34 0.5 0.17 brightness would be for a similar GRB at a different redshift.Note that we do not account for the lower intrinsic luminosities for lowerredshift GRBs (Petrosian et al. 2015;Pescalli et al. 2016;Lloyd-Ronning et al. 2019;Banerjee & Guetta 2022).This flux distribution may also be optimistic for Gamow, since it may detect more faint GRBs with dimmer afterglows (Kann et al., in preparation).If this is the case, this distribution may underestimate the rate at which the PIRT will observe high-redshift GRBs with low fluxes, which tend to have less certain photometric redshift estimations.However, we do not use this distribution as a prior, so it has a limited impact on the fitting itself.As an input distribution, we chose to rely on the added uncertainty in the band measurements to envelop any discrepancies.

Data & Code Structure
The code first generates parameters for the desired number of GRBs.These sets of parameters are randomly generated by pulling from the input distributions specified by the user (see Section 2.3.2), to create a set of simulated GRB spectra using the model laid out in Section 2.1.From these spectra, the flux measurements for each of the specified photometric bands are determined by finding the average integrated flux across the entire band, and then perturbed according to the assumed statistical uncertainty added in quadrature to the estimated instrumental noise.If the measured flux is below the given detection limit for all photometric bands, the corresponding set of parameters, GRB spectrum, and band measurements are recreated to ensure that all simulated GRBs would be considered detections.
The perturbed fluxes are fit using the MCMC fitting method described in Section 2.2 using the priors selected by the user (see Section 2.3.1).The final fits, posteriors, and positions of all walkers in parameter space for each of the fit are saved for further analysis.The full code structure is given in Figure 1.
For the example simulations, we input the 5 simultaneously observed optical-NIR bands proposed for the PIRT, which have wavelength ranges of 0.50 − 0.64, 0.64 − 0.87, 0.87 − 1.2, 1.2 − 1.7, and 1.7 − 2.4 .These bands were part of the final design reported in the Gamow Explorer's NASA MIDEX proposal, and are updated versions of the ranges from Seiffert et al. (2021) and White et al. (2021).These band edges correspond to the wavelengths of the Ly breaks for redshifts of 3.1, 4.3, 6.2, 8.9, 13.0, and 18.7, respectively.
phozzy allows for a large amount of customization.It includes inputs for the instrument parameters, such as band edges, statistical uncertainties, the 1 instrument noise value, and an  detection limit, where  is how many multiples above the 1 instrument noise level a measurement must be to be considered a detection in that band.It also allows the user to indicate the desired host galaxy extinction model, and the parameter input and prior distributions.The code can be used for instruments with bands that are not observed simultaneously, but the user must account for the additional effects and uncertainty that arise when interpolating flux measurements (see Section 4).For each of the PIRT simulations we create and fit sets of 500 GRBs, and assume a statistical uncertainty of 5% and 3 Jy of 1 instrument noise.The results for the PIRT simulations are presented in Section 3.

RESULTS
Here we present the results of the phozzy simulations for the Gamow PIRT instrument, using 3 metrics to assess the instrument performance for categorizing high-vs.low-redshift GRBs, and retrieving the redshift: • completeness -for GRBs above redshift 5, how many are correctly identified as high-redshift ( > 5) GRBs; • purity -for GRBs below redshift 5, how many are correctly identified as low-redshift ( < 5) GRBs; • accuracy -for GRBs above redshift 5, how often does the fitting method return a redshift within 10% and 20% of the input redshift.
The accuracy metric determines how well the fits retrieve the true redshift, while completeness determines the likelihood of a true positive (as opposed to a false negative) for high-redshift GRBs, and purity gives the likelihood of a true negative (as opposed to a false positive) for low-redshift GRBs (see Figure 2).We estimate these metrics by running fits on 500 randomly generated GRB spectra with a variety of input distributions and priors.For these simulations we define high-redshift as  > 5, but the code allows this threshold to be changed by the user.
For these simulations we draw from two different redshift distributions for varying purposes: a uniform redshift distribution for which all redshifts between 0 and 20 are equally likely, and an expected redshift distribution for Gamow that was created using previous work found in Ghirlanda & Salvaterra (2022); Ghirlanda et al. (2021Ghirlanda et al. ( , 2015)).
The uniform redshift distribution is mainly used for measuring completeness, while the measured redshift distribution is mainly used for measuring purity.To estimate completeness, we need to establish the instrument performance at all redshifts.The measured redshift distribution has very few high-z GRBs, and those that do occur are concentrated towards the lower end between  = 5 − 6, so it supplies very little information about the instrument performance if, for example, a  = 10 GRB were to be observed.The uniform distribution is evenly distributed across all redshifts, so it truly tests the instrument performance for all redshifts.For estimating purity, we do need to take the measured redshift distribution into account.In this case, we need to know how many misidentified low-z GRBs there are, which implies using a distribution similar to what we expect to observe.For estimating accuracy, we use and present the results for both distributions.
For both of these redshift distributions we run a variety of configurations for input and fitting prior extinction distributions: • Input: no extinction, Fitting: no extinction -acts as a baseline and makes sure the code is working properly; from here on, this will be denoted as the 'no extinction baseline'.
• Input: no extinction, Fitting: basic extinction prior -acts as a baseline for how well the instrument would do in the best case scenario where there is no host galaxy extinction for any GRBs; this shows how adding extinction as a free parameter affects the ability to retrieve the redshift, and will be denoted as the 'extinction prior baseline'.
• Input: basic extinction, Fitting: basic extinction prior -shows how well the instrument performs for an expected distribution of GRBs when using the basic fitting prior.
• Input: 'evolving' extinction, Fitting: 'evolving' extinction prior -shows how well the instrument performs for an expected distribution of GRBs when using the 'evolving' fitting prior.
• Input: 'evolving' extinction with upper limit 1, Fitting; 'evolving' extinction with upper limit 1 -shows how results change when adding a lightly constrained upper limit on host galaxy extinction when using the 'evolving' extinction prior.
• Input: 'evolving' extinction with upper limit 2, Fitting: 'evolving' extinction with upper limit 2 -shows how results change when adding a more constrained upper limit on host galaxy extinction when using the 'evolving' extinction prior.
While the runs with no extinction for the inputs and/or priors form a baseline, the others determine which set of priors will be best for correctly identifying low-and high-redshift GRBs for the Gamow PIRT instrument specifications (observing bands, instrument noise, etc.).The results for all runs are summarized in Table 2, and displayed using density plots showing the input versus retrieved redshift for all runs in Appendix A. The results and implications for the completeness, purity, and accuracy are detailed in the following subsections.

Completeness
Completeness is a key metric for estimating the instrument performance at high redshifts by determining how many high-z GRBs will be correctly identified as such and how many will be missed.In Table 2, we show the simulation results for both a uniform and an expected redshift distribution, but to estimate completeness we use the uniform distribution.
For the 'no extinction baseline' run, the completeness is unsurprisingly high at 98.1% since the high-z GRBs cannot be mistaken for high-extinction GRBs in this case.Of the ∼ 2% of GRBs that are mistaken as low-redshift, all had redshifts between  = 5 -5.5, and were mistaken as GRBs with redshifts between  = 4.5 -5, so they were only missed due to their proximity to the high-redshift threshold.When performing the 'extinction prior baseline' run, the completeness drops significantly to 85.4%.Including extinction in the fitting method increases the chances of confusing high-z GRBs with low-z dusty interlopers, especially when there are only one or two filters with non-zero fluxes (see Figure 3).
When including extinction as both an input and fitting parameter,  A fit where all 50 walkers retrieve a redshift within 10% of the input redshift.Bottom: A fit where the walkers find varying solutions.Both the high-z and low-z high-extinction outputs adequately fit the data, giving a mixed result for the GRB redshift.
the 'evolving' extinction distribution with upper limit 1 (see Table 1) performed the best with a completeness of 88.0%.However, for all simulations that use extinction for both the inputs and fit priors, the completeness is relatively similar, between ∼ 82−88%, so the choice of distribution seems to have a limited effect on the completeness.Only taking GRBs with 5 <  < 13 has little impact on the completeness result.The method does seem to be prone to missing high-z GRBs when using the 'evolving' extinction prior.For example, when using the 'evolving' prior without upper limits, GRBs with  > 13 account for 43% of all missed high-z GRBs, while  > 8.9 account for 80% of all misidentified high-z GRBs (See Figure A4:Left).GRBs with  > 8.9 have their Ly dropoff occur in one of the two reddest photometric bands, where it is more difficult to distinguish between high-z and high-extinction GRBs (see Figure 3).
When examining the completeness between 5 <  < 13, the completeness for the 'evolving' extinction prior increases from 81.7% to 83.0%.However, for GRBs with 5 <  < 8.9, where most highz GRBs are expected to be, the completeness increases to 93.0%.A large portion of the remaining miss-identified high-z GRBs are those near the high-z threshold.Of the missed high-z GRBs with 5 <  < 13, as many as ∼ 10% fall between 5 -5.5 when using the 'evolving' extinction with upper limit 1 (see Figure A5:Left).

Purity
The purity metric is used to determine the instrument's ability to correctly identify low-redshift GRBs.It estimates the risk for false positives by mistaking low-redshift, high-extinction GRBs for highredshift ones.It is crucial to reduce the number of false positives, because if high-z detections are regularly false alarms, this will have a large impact on the use of follow-up resources.For estimating purity we focus on the expected redshift distribution because it evaluates the false positive rate for a general population of GRBs.
The baseline fits both have very high purity results of 99.7% and > 99.99% for the 'no extinction baseline' and 'extinction prior baseline' simulations, respectively; so when extinction is not present in the GRB spectra, very few low-z GRBs are mistaken for high-z ones.Additionally, for both of the baseline runs, all of the GRBs mistaken for high redshift had input redshifts between 4.5-5 and output redshifts between 5-5.5, so these lapses in purity are caused by GRBs with redshifts near our high-redshift threshold.Top: An example of a fit using the 'basic' extinction distribution for the inputs and priors.This simulated GRB has a high redshift and high extinction, which makes it difficult to determine the redshift accurately.High-redshift solutions are in red, and low-redshift solutions are in blue.Bottom: A similar fit using the 'evolving' extinction distribution.The walkers are less inclined to combine high  and high  − values for fitting, even though it would accurately fit the low flux in the bluer bands.This improves the redshift retrieval performance.
When using the basic extinction distribution for both the inputs and the fitting, there is a drop in purity down to 94.0%.However, when switching to the 'evolving' extinction prior, the purity jumps back up to 99.4%, which is comparable to the baseline runs.The improvement when switching from the 'basic' to the 'evolving' prior is likely due to the more realistic simulated GRB spectra and the additional information about extinction as a function of redshift.When using the 'basic' prior, there are more GRBs that have both a high redshift and a high extinction.These kinds of simulated GRBs do not match up with observation based on studies done by Bolmer et al. (2018), Covino et al. (2013), andGreiner et al. (2011), and have spectra that make it difficult to tease out the redshift.The walkers are able to use both high redshift and high extinction for fitting because they do not have a prior that prevents them from settling on this unlikely solution.Using the 'evolving' extinction prior eliminates these kinds of GRBs and fits, which results in an increase in purity (see Figure 4).We note that the implementation of upper limits on  − result in a decrease in purity, down to 96.1% or even 83.8% for the most constrained upper limit.This will be further discussed in Section 4.2.

Accuracy
The accuracy metric determines how well the code can estimate the actual redshift of a GRB.We explore the accuracy from two different perspectives.First we examine the accuracy from a 'Universe perspective', i.e., for GRBs with input redshifts of  > 5, how many have an accurate measured redshift.This shows how well the instrument can estimate the actual redshift.Secondly, we examine the accuracy from an 'observer perspective', i.e., for GRBs with measured redshifts of  > 5, how many have accurately retrieved the input redshift.This tells us how many of the GRBs identified as having a high redshift will have accurate redshift measurements.
For the 'no extinction baseline', when looking at the accuracy from the Universe perspective and GRBs with redshifts above 5, we retrieve a redshift within 10% of the input redshift in 78.2% of the cases for a uniform redshift distribution, and in 90.4% of the cases for an expected redshift distribution.The large accuracy difference between the uniform and expected redshift distributions can be attributed to the much lower abundance of  > 12 GRBs in the expected redshift distribution.The method struggles at redshifts above  ∼ 12 because either only the reddest filter has a non-zero flux (for  > 13) or the flux in the second reddest band is low enough that it is easily mistaken as a non-zero flux measurement (for 12 <  < 13).In these cases, constraining the fit parameters becomes very difficult (see Figure 5).When looking only at the redshift retrieval between  of 5−12 for the uniform distribution, the accuracy increases to 93.5%, which is comparable to the result for the expected distribution.From an observer perspective, the accuracy is comparable to the accuracy from a Universe perspective for all accuracy metrics in the case of the 'no extinction baseline' (see Table 3).
When using the 'extinction prior baseline', the accuracy from the Universe perspective for the uniform redshift distribution drops to 58.8%.This sharp loss of accuracy can be attributed to the increase in the complexity of the model.When adding extinction as a free parameter, at least 3 filters with non-zero flux measurements are required to constrain the flux, redshift and extinction.The fitting method has a much harder time constraining the redshift for GRBs with  > 8.9 where the Ly dropoff falls in one of the two reddest filters and only 1-2 non-zero fluxes are present (see Figure A2:Left).
Table 3. Results from all simulations, listing redshift (z) and  − inputs and priors, as well as the accuracy from a Universe perspective ('Universe Acc.': for GRBs with an input redshift above 5, or between 5 and 12, how many have a retrieved redshift within 10% or 20% of the input redshift), and the accuracy from an observer perspective ('Observer Acc.': for GRBs with a retrieved redshift above 5, or between 5 and 12, how many are within 10% or 20% of the input redshift).Baseline runs are italicized.For the expected distribution, there is also a drop in accuracy, but it is not nearly as severe as the loss of accuracy for the uniform redshift distribution: from 90.4% to 83.4%.This is because the expected redshift distribution has far fewer  > 8.9 GRBs than the uniform distribution, and therefore is less impacted by the addition of extinction as a fitting parameter.The accuracy from an observer perspective also decreases, but not nearly as much as the accuracy from a Universe perspective.For example, for GRBs with measured redshifts of  > 5, the code retrieves the redshift within 10% of the input redshift 68.8% of the time, compared to 58.8% of the time from a Universe perspective.When extinction is included as both an input and fitting parameter using the 'basic' extinction distribution, the accuracy continues to decrease from both a Universe and observer perspective.This is due to high-extinction GRB spectra now present in the sample, which have low relative flux values in the bluer bands and are thus easy to mistake for a high-redshift GRB.When using the 'evolving' extinction distribution, the instrument's accuracies become comparable to those of the 'extinction prior baseline'.Of the different 'evolving' extinction distributions, the performance of the one with no upper limits on  − , and the one using upper limit 1 for  − (see Table 1), are comparable to that of the 'extinction prior baseline' simulation for nearly every accuracy measurement, regardless of input redshift distribution.This is promising because the instrument is expected to perform as well as it would in the best case scenario where every detected GRB has virtually no extinction.
The run using the 'evolving' extinction prior with upper limit 1 tends to do slightly better from the Universe perspective, as for GRBs with 5 <  < 12 it retrieves a redshift within 10% of the input redshift in 85.4% of cases, compared to 79.3% of cases for the 'evolving' extinction prior without upper limits.However, using the 'evolving' extinction prior with no upper limits outperforms all other runs from an observational perspective.For example, for GRBs with a measured redshift 5 <  < 12, the measured redshift is within 10% of the input redshift 92.1% of the time when using the 'evolving' extinction without upper limit, as opposed to 91.3% of the time when using the 'evolving' extinction with upper limit 1.This pattern holds when examining the accuracy metric as a function of redshift (see Figures 6 and 7).For the 'evolving' extinction prior without upper limits, the accuracy from a Universe perspective gradually decreases with higher redshifts, but from an observer perspective it has a high Fraction with accurate redshifts 20% accuracy 10% accuracy Figure 6.Accuracy as a function of redshift when using the 'evolving' extinction prior without upper limits.The black dashed lines represent the redshifts where the Ly dropoff aligns with the filter edges, and the grey dashed lines show redshift in increments of 1 and the fraction of GRBs with accurate redshifts in increments of 0.2.Top: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (Universe perspective).Bottom: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (observer perspective). .Accuracy as a function of redshift when using the 'evolving' extinction prior with upper limit 1.The black dashed lines represent the redshifts where the Ly dropoff aligns with the filter edges, and the grey dashed lines show redshift in increments of 1 and the fraction of GRBs with accurate redshifts in increments of 0.2.Top: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (Universe perspective).Bottom: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (observer perspective).accuracy out to  ∼ 12. Conversely, the 'evolving' extinction prior with upper limit 1 has a more reliable accuracy at higher redshifts from a universe perspective, but has a less reliable accuracy from an observational perspective.This means that while the 'evolving' extinction with upper limit 1 has a better accuracy for high-z GRBs overall, for GRBs that are successfully identified as high-redshift the 'evolving' extinction prior without upper limits is more reliable.For full accuracy results see Table 3.

DISCUSSION
In the previous section, we presented a variety of statistics for completeness, purity, and accuracy when making different assumptions about the redshift and host galaxy extinction distributions.We will now discuss the implications of these results for determining which extinction prior leads to the best performance of the Gamow PIRT, the drawbacks of implementing  − upper limits, the limitations of the instrument at low redshifts, and the advantages of simultaneous observations in different photometric bands.

Optimal 𝐸 𝐵−𝑉 prior
There is a natural trade-off between completeness and purity.We have shown that adding upper limits on the host galaxy extinction increases the completeness while at the same time negatively affecting the purity.When looking at percentage changes, it is not obvious what the optimal choice of extinction model is.However, it is important to note that there is a much larger population of low-z GRBs, so a small change in percentage can have a noticeable impact on the number of false positives.We must be mindful of this when balancing our ability to maximize the number of correctly identified high-z GRBs while minimizing false positives.Here we use Swift's detection rate to estimate the number of high-z triggers and false alarms.
Since its launch in 2004, Swift has detected ∼ 100 GRBs per year (Gehrels et al. 2007).If we assume the same rate of detection for a mission like Gamow, based on the measured redshift distribution (Ghirlanda et al. 2015(Ghirlanda et al. , 2021;;Ghirlanda & Salvaterra 2022), approximately 7.3 of these GRBs would have  > 5, while 92.7 GRBs would have  < 5.When using the 'evolving' extinction prior with no upper limit, we find a completeness of 81.7% and a purity of 99.4%.This translates to ∼ 6 correctly identified high-z GRBs per year, and ∼ 0.2 misidentified low-redshift GRB per year, or ∼ 1 over the course of a 5-year mission.
Conversely, when using the 'evolving' extinction prior with upper limit 1, we find a completeness of 88.0% and a purity of 96.1%, which translates to ∼ 6.4 correctly identified high-z GRBs and ∼ 3.6 misidentified low-z GRBs per year.The small decrease in purity when using upper limit 1 results in a detrimental increase in false alarms, as now 36% of reported high redshift GRBs would turn out to be low-z dusty interlopers.For this reason, we would recommend to use a 'evolving' extinction prior without upper limits for both simulating performance beforehand and fitting photometric data during the instrument's operation.

Extremes of parameter space
For all simulations there is a loss of accurate redshift retrieval below  ∼ 3.This result is expected as the Ly break does not fall in any of the Gamow PIRT photometric bands for GRBs with  ≤ 3.1, so there are no features to distinguish between GRBs with redshifts below this threshold.We are not concerned about the loss of accuracy for low-z GRBs as long as we are capable of accurately identifying them as having a low redshift.The main goal for a high-z GRB mission is to minimize the rate of false positives caused by low-z dusty interlopers, so maximizing the purity is far more important than accurate redshift retrieval for  < 3 GRBs.
For the 'evolving' extinction prior, we find that the more restrictive the upper limits on  − , the worse the purity becomes (See Table 2).This degradation of purity is also visible in the input versus output redshift density plots.See Figures A4:Right, A5:Right and A6:Right for comparison of results from the 'evolving' extinction runs with no upper limit, upper limit 1, and upper limit 2, respectively.This occurs because the upper limits restrict the walkers to regions of parameter space that they would otherwise have access to at a different redshift.For walkers with a high-redshift guess at the start of the burn-in phase, they are restricted to fitting the fluxes with a lower  − , which may inadvertently push them towards a higher redshift if the bluer bands have relatively low flux values.While this is not an issue when fitting high-z GRBs, it is problematic for walkers fitting a low-z high-extinction GRB if they start in this region of parameter space.These upper limits are an example of a prior that, while adding more  information about the parameters, can impact the walkers' ability to move through parameter space.

Performance for low flux GRBs
When examining the completeness, purity, and accuracy statistics for GRBs with lower flux values, the purity and accuracy tend to decrease, while the completeness remains relatively stable.For example, when using the 'evolving' extinction prior with no upper limits, the percentage of retrieved redshifts within 10% of the input redshift for GRBs with  > 5 drops to 45.3% when only looking at those with fluxes less than 75 Jy in the reddest photometric band, compared to 79.3% for all GRBs with  > 5.The accuracy also decreases at low flux from an observer perspective.When looking at GRBs with  > 5 and fluxes in the reddest band less than 75 Jy, only 53.7% of retrieved redshifts are within 10% of the input redshift for GRBs, compared to 91.9% for all GRB with  > 5 (see Figure 8).
The purity also drops significantly from 99.4% for all GRBs to 81.2% for GRBs with fluxes less than 100 Jy, and to 75.1% for GRBs with fluxes less than 75 Jy in the reddest band.However, the completeness remains largely unaffected regardless of GRB flux, as it hovers between 82-88% when looking at GRBs with fluxes less than 75 Jy up to GRBs with fluxes less than 300 Jy in the reddest band (see Figure 9).This is also consistent with the behavior of the completeness when using different extinction priors.
While the overall purity is very high and the rate of false positives is low, GRBs with lower flux values are more likely to be confused as having a high redshift.For this reason, it may be important to flag low-flux bursts as being higher risk targets for follow-up observations of future high-z GRB missions.

Simultaneous bands
In recent years, multiple instruments have been designed to use dichroics and beam splitters to take simultaneous photometric band measurements, such as GROND (Greiner et al. 2008), the PIRT on the Gamow explorer (Seiffert et al. 2021;White et al. 2021), and SCORPIO, an imaging and spectroscopy instrument in development for the Gemini South telescope (Robberto et al. 2020  for fast evolving transients such as GRBs, because it eliminates the uncertainty that arises when correcting for rapid fading of the afterglow: even a small change in time can result in a large change in flux, which can only be corrected for by interpolation and extrapolation.Even careful correction can introduce error in the photometric band fluxes, as it still relies on accurate measurements of temporal indices to do the correction.This is particularly important for the fast-evolving light curves of GRBs.There is also no guarantee that the light curve will have a steady decay between observations, due to for instance flares (Greiner et al. 2009;Gao 2009) or rebrightening episodes (Kann et al. 2018;Bersier et al. 2003).It is important to minimize any extra sources of error or uncertainty for high-z GRB missions where accurate photometric estimation is crucial, and any offsets or uncertainty in the bands relative fluxes could have an impact on redshift estimation.Even a small change in the purity can have a drastic impact on false alarm rates for a high-z GRB mission (see Section 4.1), so it is imperative to reduce the likelihood of confusion by eliminating as many sources of error as possible.The issue of interpolation could be partly mitigated with dense temporal sampling, but simultaneous photometric bands will be the optimal solution for future high-redshift GRB missions.

CONCLUSIONS
GRBs are valuable probes of the high-redshift Universe due to their high luminosities and simple power-law spectra.They are ideal for tracking the chemical evolution of the Universe, studying early star formation, and constraining the end of the Epoch of Reionization.Multiple missions including optical-NIR photometric instruments have been proposed for finding and studying high-redshift GRBs.Host galaxy extinction poses a challenge for constraining GRB redshifts, as low-z high-extinction GRBs can mimic high-z GRBs when using broad photometric bands.It is imperative that future high-z GRB missions are capable of rapidly and reliably identifying the redshift of detected GRBs with as few false alarms as possible to encourage community follow-up.
phozzy is a photo-z simulations and fitting code that can be used by future high-z GRB missions for testing their ability to accurately retrieve the redshift of an expected GRB population.Using it, we have tested the capabilities of the Gamow PIRT when using different extinction priors, and found that using the 'evolving'  − distribution gave the best results.We find that Gamow PIRT would have a ∼ 92% accuracy from an observer perspective, a completeness of ∼ 82%, and would only mistake ∼ 0.6% of low-redshift GRBs as having a high ( > 5) redshift.This translates to ∼ 1 false alarm per 500 GRBs detected.We have also shown that we can increase the completeness to ∼ 88% by imposing constraints on the  − at high redshifts, but this would have a significantly negative effect on the purity.The latter would decrease such that ∼ 1/3 of high-redshift GRB alerts would in fact be a low-z dusty interloper.We note that the completeness for the 'evolving'  − distribution increases to well above 90% for GRBs with 5 <  < 9. Finally, we discuss that the use of simultaneous measurements in different photometric bands can help remove unnecessary sources of error and improve the chances of retrieving GRB redshifts.

Figure 1 .
Figure 1.The structure of phozzy.Diagram shows the different modules and in which other modules of the code they are used.The main module phozzy uses the create_data, mcmc, and analysis modules to first create a number of simulated GRB photometric band measurements, then perform fits on each set, and finally, compile all results and determine the completeness, purity, and accuracy metrics for the results.

Figure 2 .
Figure 2. Regions defined as true positive, false positive, true negative, and false negative based on the input and output redshift results.For these simulations, we consider  = 5 the threshold between low and high redshift GRBs (white lines).

Figure 3 .
Figure 3. Example fits to simulated spectra, with the black lines showing the input GRB spectra, the black points showing simulated fluxes with 1 uncertainties, and the blue lines indicating the final positions of the 50 walkers.High-redshift solutions are in red, and low-redshift solutions are in blue.Top:A fit where all 50 walkers retrieve a redshift within 10% of the input redshift.Bottom: A fit where the walkers find varying solutions.Both the high-z and low-z high-extinction outputs adequately fit the data, giving a mixed result for the GRB redshift.

Figure 4 .
Figure 4. Example fits to simulated spectra, with the black lines showing the input GRB spectra, the black points showing simulated fluxes with 1 uncertainties, and the blue lines indicating the final positions of the 50 walkers.Top:An example of a fit using the 'basic' extinction distribution for the inputs and priors.This simulated GRB has a high redshift and high extinction, which makes it difficult to determine the redshift accurately.High-redshift solutions are in red, and low-redshift solutions are in blue.Bottom: A similar fit using the 'evolving' extinction distribution.The walkers are less inclined to combine high  and high  − values for fitting, even though it would accurately fit the low flux in the bluer bands.This improves the redshift retrieval performance.

Figure 5 .
Figure 5.An example of a fit to a simulated  ∼ 12.4 GRB from the 'no extinction baseline' run, with the black lines showing the input GRB spectra, the black points showing simulated fluxes with 1 uncertainties, and the blue lines indicating the final positions of the 50 walkers.For GRBs where the Ly dropoff occurs in or near the reddest wavelength band (z > 12), it becomes very difficult to constrain the redshift.
Figure7.Accuracy as a function of redshift when using the 'evolving' extinction prior with upper limit 1.The black dashed lines represent the redshifts where the Ly dropoff aligns with the filter edges, and the grey dashed lines show redshift in increments of 1 and the fraction of GRBs with accurate redshifts in increments of 0.2.Top: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (Universe perspective).Bottom: The fraction of retrieved redshifts within 10% and 20% of the input redshift for   > 5 (observer perspective).

Figure 8 .
Figure 8. Accuracy as a function of flux for both a Universe and an observer perspective.The accuracy is the rate of retrieval within 10% of the relevant redshift.The flux ceiling is the maximum flux in the reddest photometric band of the GRBs considered.

Figure 9 .
Figure 9. Completeness and purity as a function of flux.The flux ceiling is the maximum flux in the reddest photometric band of the GRBs considered.The purity decreases at lower flux values, while the completeness is fairly constant.

Table 2 .
Results from all simulations, listing redshift (z) and host galaxy extinction ( − ) inputs and priors, as well as the completeness (for GRBs above redshift 5, how many are correctly identified as high-z GRBs) and purity (for GRBs below redshift 5, how many are correctly identified as low-z GRBs).Baseline runs are italicized, and the most relevant statistic in terms of completeness and purity are in bold.