Reconstruction of luminosity function from flux-limited samples

The properties of the progenitors of gamma-ray bursts (GRBs) and of their environment are encoded in their luminosity function and cosmic formation rate. They are usually recovered from a flux-limited sample based on Lynden-Bell's $c^{-}$ method. However, this method is based on the assumption that the luminosity is independent of the redshift. Observationally, if correlated, people use nonparametric $\tau$ statistical method to remove this correlation through the transformation, $L^{\prime}=L/g(z)$, where $z$ is the burst redshift, and $g(z)=(1+z)^{k}$ parameterizes the underlying luminosity evolution. However, the application of this method to different observations could result in very different luminosity functions. By the means of Monte Carlo simulation, in this paper, we demonstrate that the origin of an observed correlation, measured by the $\tau$ statistical method, is a complex combination of multiple factors when the underlying data are correlated. Thus, in this case, it is difficult to unbiasedly reconstruct the underlying population distribution from a truncated sample, unless the detailed information of the intrinsic correlation is accurately known in advance. In addition, we argue that an intrinsic correlation between luminosity function and formation rate is unlikely eliminated by a misconfigured transformation, and the $g(z)$, derived from a truncated sample with the $\tau$ statistical method, does not necessarily represent its underlying luminosity evolution.


INTRODUCTION
For any astronomical source, there are two key properties that characterises the population: (a) their cosmic formation rate, representing the number of sources per unit comoving volume and time as a function of redshift; (b) their luminosity function, which represents the relative fraction of sources in a given luminosity range per unit volume.The statistical problem at hand is the determination of the two properties from flux-limited samples.Lynden-Bell (1971) applied a novel method to study the luminosity function and density evolution from a flux-limit quasar sample, which is called Lynden-Bell's  − method.This method is based on the assumption that the luminosity is independent of the redshift.To overcome this shortcoming, Efron & Petrosian (1992) generalized Lynden-Bell's idea and developed a non-parametric test statistic for independence, which is called non-parametric  statistical method.These methods have been widely used to estimate the intrinsic luminosity function and cosmic formation rate of astronomical sources, such as galaxies (Kirshner et al. 1978;Peterson et al. 1986;Loh & Spillar 1986), GRBs (Yonetoku et al. 2004;Lloyd-Ronning et al. 2002;Yonetoku et al. 2014;Kocevski & Liang 2006), and quasars (Maloney & Petrosian 1999;Singal et al. 2011).
Before applying Lynden-Bell's  − method, one must first de-★ E-mail: luruĳing@gxu.edu.cntermine whether L and z are correlated or not.Traditionally, if correlated, many authors (e.g., Lloyd & Petrosian (1999); Maloney & Petrosian (1999); Lloyd-Ronning et al. (2002); Yonetoku et al. (2004); Yu et al. (2015)) parameterized it as the transformation,  ′ = /(), where () = (1 + )  , the power-law redshift dependence is always adopted to parameterize the luminosity evolution.Once a function () is found, one could remove the correlation and yield an uncorrelated data set { ′ , }, then their distributions could be estimated by using Lynden-Bell's  − method.By using this method to derive the luminosity function and the formation rate of GRBs, Yu et al. (2015) found that an unexpectedly low-redshift excess in the formation rate of GRBs, compared to the star formation rate (SFR).Whereas, following the same approach, other authors (Pescalli et al. 2016;Tsvetkova et al. 2017) did not.
More recently, Bryant et al. (2021) had re-analysed several previous works (Yu et al. 2015;Pescalli et al. 2016;Tsvetkova et al. 2017;Lloyd-Ronning et al. 2019) and investigated the origin of the evolution of the luminosity/energetics of GRBs with redshift based on the same approach, and found that the effects of the detection threshold have been likely severely underestimated.Then they argued that the observed correlations are artefacts of the individually chosen detection thresholds of the various gamma-ray detectors, and that an inappropriate use of this statistical method could lead to biased scientific discoveries.
So our questions arise: When one applies a truncation function to an intrinsically correlated population, what factors would impact on the distribution of the  statistic?In the case, does the transformation decouple the intrinsic correlation between luminosity and redshift of astronomical sources?
In this paper, we will investigate the issues in detail by performing Monte Carlo simulations.An introduction to Lynden-Bell's  − method and the nonparametric  statistical method is given in Section 2. To investigate the factors affecting the  statistic, in Section 3, we first apply the  statistical method to a toy model, where the intrinsic correlations between two random variables are known.Then, in Section 4, applying the same approach to the realistic example in an astronomical context, i.e., the luminosity function and the formation rate of GRBs, we explore whether a correlated population distribution could be unbiasedly reconstructed by the transformation.Finally, in Section 5, we present our conclusions and discussions.

LYNDEN-BELL'S 𝐶 − METHOD AND NONPARAMETRIC TEST METHOD
Following the description about Lynden-Bell's  − method in (Ivezić et al. 2020) (see Figures of (4.8) and (4.9) in their Book), Here we give a summary description for the test statistic as follows.
Suppose  and  are the two random variables (RVs), if they are uncorrelated, the bivariate joint density of (, ) can be represented as ℎ(, ) =  ()(). (1) Assuming that pairs (, ) are observable only if they satisfy the truncation function (Efron & Petrosian 1992), here () is a monotonic function of .In an astronomical context,  can be considered as redshift,  as absolute magnitude, and the truncation function as magnitude limit, as shown in Figure 4.10 in Ivezić et al. (2020).
In the following analysis, we will adopt a symmetric function (Ivezić et al. (2020) (seen in Fig. (1)) as where  low is a constant.To test for independence when the data is truncated, firstly, for ith object, one can define an associated set as where  is the size of the observable sample.This is the largest -limited and -limited data subset for ith object, with  i elements.This region is shown in Fig.
(1) as a black solid rectangle.Secondly, sorting the set  i by  j , then the number of objects with  <  i in set  i is defined as  j , the rank for ith object.If X and Y are independent (Null hypothesis  0 ), then  j is uniformly distributed between 1 to  i .The Efron-Petrosian test statistic  is then, where  i = (1 +  i )/2,  i = ( i − 1) 2 /12 are the expected mean and the variance of  i , respectively.This is a specialized version of Kendell's  statistic.The  statistic has mean 0 and variance 1 under  0 .As pointed out by Efron & Petrosian (1992), the  statistic will approximately follow as a standard normal distribution,  (0, 1), even for  as small as 10 under  0 .Following classical statistical inference, one can accept  0 if  ≤ 1.645, and reject  0 otherwise.The rejection probability of the test for independence would be approximately 0.10.Once accepting  0 , the cumulative distributions for the two random variables are defined as (Ivezić et al. 2020) and Then, where it is assumed that  i are sorted ( 1 ≤   ≤  n ).Analogously, if  k is the number of objects in an associated set defined as This region is also shown in Fig.
(1) as a blue dashed rectangle.Then where it is also assumed that  j are sorted ( 1 ≤  k ≤  n ).

CASE OF THE BIVARIATE NORMAL DISTRIBUTION
A bivariate distribution is a statistical method used to show the probability of occurrence of two random variables.In this section, we first use Monte Carlo simulation to test the  statistic based on a truncated bivariate normal distribution when the relations between two random variables are known (similar to the toy model in Ivezić et al. ( 2020)), and next, we further investigate whether one can apply the  statistical method to decouple a correlation between the physical quantities of a source, such as luminosity and redshift, generated by effects of evolution and the truncation or bias introduced by the flux limit.
RVs  and  have a joint normal distribution, (,  ) ∼  (, Σ).First step, we investigate this issue based on the assumption that there is none correlation between RVs  and  .We define a joint probability density function (PDF) from a 2 truncated normal distribution based on the truncMVN python package1 , with the parameters of (  ,   )=(0.67,0.33),(  ,   )=(0.33,0.33),and Σ , = 0, and  and  are truncated in the range of [0,1].Then we generate a random sample with the size of  = 10 6 objects from the population distribution with the emcee sampler2 .Shown in Fig.
(2) are the corner and the resultant one-cumulative distributions of the sample.We obtain the sample Pearson correlation coefficient,  ≃ 0. To study the effect of selection (or truncated) function on the  statistic, we apply three different selection functions with  low = 0.7, 0.5, 0.3 to the sample, respectively, and then obtain three truncated data sets.As seen in the upper panel of Fig.
(2), the larger the  low , the smaller the proportion of truncated samples in the population.
For every truncated data set, we create 10 5 pseudo samples and each sample contains 10 2 observable objects3 .Then we calculate the distribution of the  statistic based on Eq.( 5) for these pseudo samples, and compare it to a standard normal distribution by using a KS-test.The results are shown in Fig. (3).The chance probabilities of the three tests are 0.999, 0.968 and 0.999, respectively, which indicates that the  statistic follows well a standard normal distribution for all the three cases.Thus, with Monte Carlo simulations, for the first time, we confirm the conclusions proposed by Efron & Petrosian (1992), i.e., the  statistic always follow a standard normal distribution for any truncated data as long as  0 does hold.
At the same time, with these pseudo truncated samples, we also derive the one-dimensional cumulative distributions for the two random variables based on Eqs. of ( 8) and ( 9), and compare them to their corresponding population distributions by the KS-test, respectively.The comparison results are shown in Fig. ( 4), which indicates that their corresponding population distributions could be unbiasedly recovered.Note that here we normalize the sample distribution to its corresponding population distribution for comparison.Further investigations show that the reconstructed cumulative distributions are not sensitive to the truncated sample size we adopted.
In conclusion, the  statistic is indeed a robust test statistic for independence of the truncated data.Once one accepts  0 (i.e.,  and  are truly independent), their population distribution could be unbiasedly reconstructed from truncated data with Eqs. of ( 8) and ( 9), irrespective of adopted selection functions.The fact shows that an observed correlation does unlikely come from a truncation effect as long as the  0 does hold.Now we wonder whether the  statistic still obeys a standard normal distribution when the two random variables,  and  , are correlated.In that case, whether one would still unbiasedly recover the underlying population distribution from truncated data based on Eqs. of ( 8) and ( 9) or not?
With the truncMVN python package and following the same sampling method above, we can also create some pseudo samples from the population distributions with the different values of  and Σ , , as adopted in Fig. (2).As pointed out by some authors (Kan & Robotti 2017;Galarza et al. 2020), the moment of the truncated variable depends on the mean and covariance matrix of a bivariate population distribution, and one can find the explicit expression for low order moments of the truncated multivariate normal distribution in their papers.In this paper, we calculate the Pearson correlation coefficient () of the pseudo sample with numpy python library4 .Finally, by sampling from different populations with different values of  and Σ , , we obtain four samples with Pearson's =0.15,0.32, 0.50, and 0.73, respectively.
To answer the questions mentioned above, we also apply the same three truncation functions above to the four samples, respectively, and obtain three truncated data sets for every sample.Then, with the same methods as done in Figs. of ( 3) and ( 4), we calculate the distributions of the  statistic and their corresponding one-dimensional cumulative distributions for RVs  and  , respectively.Meanwhile, the influence of observable sample size on the  statistic is also explored.Finally, the results of these analyses are shown in Fig. (5).
Evidently, unlike the case of an uncorrelated bivariate distribution, we can draw the following conclusions from Fig. ( 5): In all cases we have explored, the  statistic no longer follow a standard normal distribution, but a normal distribution, defined as  ( τ,  2  ) (where τ and   are the average and standard variance of the distribution, respectively).The τ changes with the Pearson's  of population, observable sample size ( Tr ), as well as different selection functions.But the   is less sensitive to them, with   ∼ 1 in all cases (The   is shown as the error bar of the data point in the figure), which means that the origin of an observed correlation, measured by the  statistical method, is a complex combination of the three factors.
Further investigations show that, in that case, the underlying population distribution can not unbiasedly reconstructed from the truncated data based on Eqs. of ( 8) and ( 9).

CASE OF THE LUMINOSITY FUNCTION AND THE FORMATION RATE OF GRBS
Now, let's turn our attention to the realistic example in an astronomical context.It is often the case in the analysis of astronomical data that one is faced with reconstructing the joint bivariate distribution from truncated data.And the luminosity function and cosmic formation rate of GRBs (or other astronomical object) (, ) is one such set of bivariate data.For simplicity, it is often assumed that such a bivariate distribution is separable in the following form, where () is the GRB Formation Rate and () is GRB's LF.
The GRB formation rate is usually assumed to trace the cosmic star formation rate (SFR).Here, we assume that the rate is purely proportional to the SFR, and parameterize it as (Hopkins & Beacom 2006;Li 2008 Cumulative distribution Illustration of a realization of the truncated bivariate normal distribution with the parameters of (  ,   ) = (0.67, 0.33), (  ,   ) = (0.33, 0.33), and  = 0. Upper panel: the corner of a sample from the parent distribution with  = 10 6 , in which the contours enclose the regions with different colors that contain 68%, 95% and 99% of the probability, the red dashed lines from upper to lower represent the truncated function defined in Eq. ( 3) with  low = 0.3, 0.5, 0.7, respectively, and the red plus marks the center of the PDF.Lower panel: comparison of the one-dimensional cumulative distributions (solid points) of the sample to their corresponding population distributions.
For the GRB's LF, we adopt the Schechter function (Schechter 1976) as follows (12) where  represents the power-law parameter for the faint-end and  * is the characteristic luminosity, while  * serves as the normalisation constant.Due to the typically large span of the luminosities, here we use logarithmic units in the Schechter function, written as follows, (log ) = ln 10 * 10 ( +1) (log −log  * ) exp −10 log −log  * .
In the luminosity-redshift plane, the truncated function is the luminosity limit (The red dashed line in Fig. 6), given as where  L () and  min are the luminosity distance at redshift  and flux-limit, respectively.Associated sets  i and  k for ith GRB( i ,  i ) can be defined as, and respectively (seen Fig. 6).
With the same method done in sec.
(3), we also draw a pseudo sample with  = 10 6 from the joint probability function (Equation 10) with the emcee sampler.With Eqs. of ( 8) and ( 9), we could also reconstruct their population distributions well from the observable data of the pseudo sample truncated by the flux limit at 1 × 10 −8  −2  −1 .The results are shown in Fig. (6).Further investigations show that, when different values of the flux limit are adopted, the population distributions could always be unbiasedly recovered from their corresponding truncated data, indicating that the non-parametric  statistical method can be applied to unbiasedly recover the underlying population distributions from truncated data regardless of adopted detection thresholds.The same conclusion could be arrived in the contex of an uncorrelated bivariate normal distribution.
Unfortunately, here's the fact that the luminosities of GRBs are strongly correlated with their redshifts is a common feature (Qin et al. 2010;Deng et al. 2016;Yu et al. 2015;Petrosian et al. 2015;Pescalli et al. 2016;Lloyd-Ronning et al. 2019).In this case, one can not directly apply Lynden-Bell's  − method to reconstruct their underlying parent distribution.
The popular method to eliminate the correlation (e.g., Lloyd & Petrosian (1999); Maloney & Petrosian (1999)   Next, we produce a truncated data set by the flux limit at 1 × 10 −7  −2  −1 (The data above the red line in the upper panel of Fig. ( 7) are observable).Based on the truncated data, we create 10 5 pseudo samples, and each pseudo sample contains 10 2 observable GRBs, the same as done in section 3.For every observable sample, we make the reverse transformation of the intrinsic luminosity evolution  =   / int (), and calculate the best  at  = 0, defining it as  best .Next, with Eqs. of ( 8) and ( 9), we could calculate their corresponding luminosity and redshift distributions from these uncorrelated data sets { i ,  i }.We fit a Gaussian function to the distribution of the  best , giving kbest = 2.51 ± 0.39.The result is also shown in the lower panel of Fig. (7).As the reconstructed luminosity and redshift distributions are similar to those shown in the lower panel of Fig. ( 6), we do not plot them repeately.Further investigations show that, the distribution of the  best and the reconstructed luminosity and redshift distribution are less sensitive to both the adopted detector threshold and observable sample size.These results show that, if the detailed information of the luminosity evolution is accurately known, we could remove the effect of the evolution by making the reverse transformation of the intrinsic luminosity evolution, then unbiasedly reconstruct their underlying parent distribution by Lynden-Bell's  − method.Some authors (Yu et al. 2015;Pescalli et al. 2016) came to similar conclusion.
However, this is not the case when the detailed information of intrinsic luminosity evolutions is not known.As shown in Fig ( 8) is an instance of the case, in which, we assume that the intrinsic luminosity evolves with redshift by the law of   =  int (), where  int = (3 + ) 2.5 , is taken to parameterize its intrinsic luminosity evolution.Then, a bias reverse transformation function, such as, () = (1 + )  , is adopted (The same as that usually adopted by some authors in the astronomical context) to reconstruct their underlying parent population as done in Fig ( 7).We find that the  best at  = 0 also obeys a Gaussian distribution.Then a Gaussian function is employed to fit.This yields a value of kbest = 1.50±0.36,which differs significantly from its intrinsic evolution index.In this case, although its redshift distribution can be unbiasedly recovered, the underlying luminosity distribution can not.Certainly, in this instance, if the reverse transformation function, () = (3 + )  , is adopted, their underlying parent population could also unbiasedly recovered.
The fact turns out that, in the reconstruction of an intrinsic luminosity function, if using a misconfigured transformation function, one does not unbiasedly recover its underlying parent population, though such a transformation does produce an uncorrelated truncated sample.

Figure 1 .
Figure 1.Illustration for the definition of a truncated data set, and for the associated subset used by Lynden-Bell  − method.The truncated function is defined by Eq. (3).The sample is limited by  <  max and  <  max (  ) (light-shaded area).Associated sets  i and  k are shown by the black solid rectangle and blue dashed rectangle, respectively.Noted this figure is adapted from Fig. (4.8) in Ivezić et al. (2020) ) () = 0.0157 + 0.118 1 + (/3.23) 4.66 .

Figure 3 .
Figure 3.Comparison of the distributions of the  statistic derived from the three truncated data (See the text for details) to the standard normal distribution (solid line).The chance probabilities of  -tests are also presented in their corresponding legends.

Figure 4 .
Figure 4. Demonstrations of the performances of the reconstructed distributions from different truncated data, compared to their corresponding population distributions.In each panel, the red dashed lines and the blue dash-dotted lines represent the population distributions of  and  , respectively.The truncated functions with different values of  low are also marked in their corresponding panels, respectively.The chance probabilities of  tests between the mean distributions of those truncated data and population distributions are also presented in their corresponding legends.For every random variable,  and  , in each panel, we only plot 50 samples (gray solid lines) chosed randomly from all truncated samples.

Figure 5 .
Figure 5. Demonstrations of the factors impacting on the  statistic.Upper panel: the τ is a function of the observable sample size ( Tr ) coming from different populations with different values of  (as marked in the legend), but with the same selection function with  low = 0.5.Lower panel: the τ is a function of the Pearson's coefficient  of population, and the observable samples with the same of  Tr = 2 × 10 3 come from the selection functions with different values of  low (as marked in the legend).For all cases, the value of   is marked by the error bar of every data point.