A meta-analysis of distance measurements to M87

We obtain the median, arithmetic mean, and weighted mean-based central estimates for the distance to M87 using all the measurements collated in De Grijs et al. (2020). We then reconstruct the error distribution of the residuals for the combined measurements, and also after splitting them according to the tracers used. We then check for consistency with a Gaussian distribution and with other symmetric distributions, namely the Cauchy, Laplacian, and Student's-$t$ distributions. We find that when we analyze the combined data, the residuals about the weighted mean-based estimates show poor agreement with the Gaussian distribution, indicating that there are unaccounted systematic errors in some of the measurements. Therefore, the median-based estimate for the distance to M87 is the most robust. This median-based distance modulus to M87 is $31.08 \pm 0.09$ mag and $31.07 \pm 0.09$ mag, with and without the measurements categorized as "averages", respectively. This estimate agrees with the corresponding value obtained in De Grijs et al. (2020) to within $1\sigma$.


I. INTRODUCTION
The Virgo cluster and its giant elliptical galaxy M87 are an important anchor for distance estimates to more distant astronomical objects, such as the Fornax and Coma clusters. Therefore, De Grijs et al. [1] (D20 hereafter) carried out extensive data mining of all distance measurements to the M87/Virgo cluster and compiled a database of 213 distances. D20 grouped these measurements into five categories, depending on the method used. They obtained a distance modulus of $(m - M) = 31.03 \pm 0.14$ mag, corresponding to a distance of $16.07 \pm 1.03$ Mpc. This central estimate was obtained using the weighted mean.
Given the important astrophysical and cosmological implications of the distance to the Virgo cluster, from the Hubble constant [25] and the imaging of the black hole in M87 [26] to estimating the distances to the Fornax and Coma clusters, it is paramount to obtain a more robust estimate of the distance to M87. For this purpose, we revisit the issue of checking for non-Gaussianity of the error residuals using the measurements compiled in D20. The manuscript is structured as follows. The dataset used for our analysis is described in Sec. II. Our analysis procedure is described in Sec. III. Our results can be found in Sec. IV. Our conclusions are described in Sec. V.

II. DISTANCE MEASUREMENTS TO M87/VIRGO CLUSTER
We briefly review the data used for this analysis. More details can be found in D20 and references therein. D20 perused the NASA/ADS database (up to Sept 3, 2019) using the keyword 'M87' and obtained 213 independent distance estimates, ranging from Hubble's 1929 measurement [27] to Hartke's 2017 analysis [28]. Only those measurements associated with the M87 subcluster or centered around M87 were used. Their final catalog consists of 213 measurements, out of which 173 have error bars. These have been collated at http://astro-expat.info/Data/pubbias.html. These measurements have been divided into 15 tracers. Out of these, one set of tracers consists of "Averages", which is a collation of 21 papers, with each paper containing averages of heterogeneous measurements of different types. Another category is called "Other methods", which consists of 15 independent measurements without any proper classification. These range from unspecified methods to techniques that are independent of any distance ladder and based purely on physical principles, such as the Sunyaev-Zeldovich observation of the Virgo cluster from Planck [29]. Among these 15 tracers, eight have more than 10 measurements. Finally, we note that D20 has tabulated the distance measurements in terms of the distance modulus, which is measured in AB magnitudes. The published distances, wherever applicable, have been homogenized to conform with the distance modulus to the LMC, $(m - M)_{\rm LMC} = 18.49$ mag [30]. One can trivially convert the measurements of the distance moduli into physical units of distance. We should note that we are using the true distance moduli, i.e. the unreddened distance moduli. The recommended best-fit value of the distance to M87 obtained in D20 using the weighted mean is given by $(m - M) = 31.03 \pm 0.14$ mag [1].
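The conversion from distance modulus to physical distance mentioned above can be illustrated with a short sketch. It assumes the standard definition $(m - M) = 5\log_{10}(d/10\,{\rm pc})$ and first-order error propagation; the function name `modulus_to_mpc` is hypothetical and chosen only for this example.

```python
import math

def modulus_to_mpc(mu, sigma_mu=0.0):
    """Convert a true distance modulus (mag) to a distance in Mpc.

    Assumes mu = 5 log10(d / 10 pc). The uncertainty is propagated to
    first order: sigma_d = d * ln(10) / 5 * sigma_mu.
    """
    d_pc = 10.0 ** (mu / 5.0 + 1.0)   # distance in parsec
    d_mpc = d_pc / 1.0e6              # distance in megaparsec
    sigma_d = d_mpc * math.log(10.0) / 5.0 * sigma_mu
    return d_mpc, sigma_d

# D20 weighted-mean distance modulus quoted in the text.
d, sigma_d = modulus_to_mpc(31.03, 0.14)
print(f"{d:.2f} +/- {sigma_d:.2f} Mpc")
```

For the D20 value this gives a distance close to the quoted 16.07 Mpc with an uncertainty of about 1 Mpc; a symmetric first-order error is only an approximation, since the mapping from modulus to distance is nonlinear.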

III. ANALYSIS
The first step in assessing the Gaussianity of the errors in a dataset is to obtain the central estimate. For this purpose, we use all the measurements collated in D20. We obtained the central estimate using the median, the weighted mean, and the arithmetic mean. The median estimate $(m - M)_{\rm med}$ corresponds to the 50th percentile value. The standard deviation of the median depends upon the distribution from which it is sampled. Multiple methods have been proposed to estimate the sample variance of the median [31][32][33]. Although in our previous works we have used the prescription in [2] to get the error estimate, here we use the following equation to get the uncertainty in the median estimate [34]:
$$\sigma_{\rm med} = \sqrt{\frac{\pi}{2}}\,\frac{\sigma}{\sqrt{N}} \approx 1.253\,\frac{\sigma}{\sqrt{N}}, \tag{1}$$
where $N$ is the number of data points and $\sigma$ is the sample standard deviation. Note, however, that this expression for $\sigma_{\rm med}$ is mainly valid for Gaussian distributions, as opposed to the method proposed in [2].
The weighted mean $(m - M)_{\rm wm}$ of the observed distance modulus measurements $(m - M)_i$ is given by [35]:
$$(m - M)_{\rm wm} = \frac{\sum_{i=1}^{N} (m - M)_i/\sigma_i^2}{\sum_{i=1}^{N} 1/\sigma_i^2}, \tag{2}$$
where $\sigma_i$ denotes the total error in each measurement. The error in the weighted mean is given by:
$$\sigma_{\rm wm}^2 = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}. \tag{3}$$
The arithmetic mean central estimate $(m - M)_{\rm m}$ is given by:
$$(m - M)_{\rm m} = \frac{1}{N}\sum_{i=1}^{N} (m - M)_i, \tag{4}$$
with the standard deviation given by:
$$\sigma_{\rm m} = \frac{\sigma}{\sqrt{N}}, \tag{5}$$
where $\sigma$ is the sample standard deviation of the $(m - M)_i$. For any central estimate based on the median or arithmetic mean, we include all the tabulated measurements, irrespective of whether they are provided with error bars. For the weighted mean, we include only the measurements that have uncertainties. Although in principle one could also restrict the median- or arithmetic mean-based analysis to the measurements that have error estimates, we decided to use the full dataset for computations that do not need the uncertainty estimates, for increased statistics. From the measurements in Table I, the weighted mean estimate is found to be $(m - M)_{\rm wm} = 31.11 \pm 0.008$, whereas the median estimate is equal to $31.08 \pm 0.09$, and the arithmetic mean is equal to $30.97 \pm 0.07$. We also estimated the same after excluding the measurements tagged as "averages". These values can be found in Table I. Therefore, the results are consistent with each other to within $1\sigma$. The central estimates are also consistent with the measurements in D20 to within $1\sigma$.
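The three central estimates described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inverse-variance weighted mean, a standard error of $\sigma/\sqrt{N}$ for the arithmetic mean, the Gaussian approximation $\sigma_{\rm med} \approx 1.253\,\sigma/\sqrt{N}$ for the median, and the $N-1$ convention for the sample standard deviation. The function name `central_estimates` and the toy numbers are hypothetical and are not taken from the D20 catalog.

```python
import math
import statistics

def central_estimates(mu, sigma):
    """Return (value, uncertainty) for the median, arithmetic mean, weighted mean.

    mu    : list of distance moduli (mag)
    sigma : list of 1-sigma errors, with None where no error bar is quoted
    """
    n = len(mu)
    s = statistics.stdev(mu)  # sample standard deviation (N - 1 convention, an assumption)

    # Median and arithmetic mean use every measurement, with or without error bars.
    median = (statistics.median(mu), math.sqrt(math.pi / 2.0) * s / math.sqrt(n))
    mean = (statistics.fmean(mu), s / math.sqrt(n))

    # The weighted mean only uses measurements that come with error bars.
    with_err = [(m, e) for m, e in zip(mu, sigma) if e is not None]
    weights = [1.0 / e**2 for _, e in with_err]
    wsum = sum(weights)
    wmean = sum(w * m for w, (m, _) in zip(weights, with_err)) / wsum
    weighted = (wmean, 1.0 / math.sqrt(wsum))

    return {"median": median, "mean": mean, "weighted": weighted}

# Toy numbers for illustration only; these are NOT taken from the D20 catalog.
mu = [31.0, 31.2, 30.9, 31.1, 30.8]
sigma = [0.1, 0.2, None, 0.1, 0.2]
for name, (value, err) in central_estimates(mu, sigma).items():
    print(f"{name:>8s}: {value:.3f} +/- {err:.3f}")
```

Note that the weighted-mean uncertainty depends only on the quoted error bars and not on the scatter of the measurements, which is why it can be much smaller than the other two when the quoted errors are small.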
We also obtained the arithmetic mean, weighted mean, and median for each of the different sets of measurements grouped according to the tracers. These results can be found in Table II. A graphical summary of the same can be found in Fig. 1. We find that the measurements based on Hubble's law have the largest error bars and are also discrepant with respect to the other measurements. They are also inconsistent with the D20 estimate at about 2.7σ (arithmetic mean) to 3.8σ (median estimate).
We now check for the Gaussianity of the residuals using the combined dataset as well as using the measurements grouped according to the tracer used. For the latter, we test for Gaussianity only if the number of independent measurements for a given tracer is greater than 10. Such an analysis will guide us in choosing the most robust central estimate.
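Statements such as "consistent to within 1σ" can be quantified by expressing the difference between two estimates in units of their combined uncertainty. The sketch below assumes that the two uncertainties are independent and can be added in quadrature, which is only approximate here because our estimates and the D20 value are derived from the same catalog; the text does not state how the quoted significances were computed, and the function name `tension` is hypothetical.

```python
import math

def tension(a, sigma_a, b, sigma_b):
    """Difference between two estimates in units of their combined 1-sigma error.

    Assumes independent errors added in quadrature (an assumption of this sketch).
    """
    return abs(a - b) / math.hypot(sigma_a, sigma_b)

# Values quoted in the text (distance moduli in mag).
d20 = (31.03, 0.14)
ours = {
    "weighted mean": (31.11, 0.008),
    "median": (31.08, 0.09),
    "arithmetic mean": (30.97, 0.07),
}
for name, (value, err) in ours.items():
    print(f"{name:>15s} vs D20: {tension(value, err, *d20):.2f} sigma")
```

Under this assumption, each of the three full-dataset estimates lies within 1σ of the D20 value.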

A. Error Residuals
After obtaining the central estimate for the distance modulus to M87, $(m - M)_{\rm CE}$, using each of the aforementioned methods, we calculate the residual error as follows [12,18]:
$$N_{\sigma_i} = \frac{(m - M)_i - (m - M)_{\rm CE}}{\sqrt{\sigma_i^2 + \sigma_{\rm CE}^2}}, \tag{6}$$
where $\sigma_{\rm CE}$ denotes the error in the central estimate for each of the different methods, and $\sigma_i$ is the error in the individual measurements. As in Refs. [12,14,18], we denote the error distributions for the median $(m - M)_{\rm med}$, the arithmetic mean $(m - M)_{\rm m}$, and the weighted mean $(m - M)_{\rm wm}$ calculated from Eq. 6 by $N^{\rm med}_{\sigma_i}$, $N^{\rm m}_{\sigma_i}$, and $N^{\rm wm+}_{\sigma_i}$, respectively. When the central estimate is obtained from the weighted mean, one should take into account the correlations, and the modified version of the error distribution, which accounts for these correlations, becomes [18]:
$$N^{\rm wm-}_{\sigma_i} = \frac{(m - M)_i - (m - M)_{\rm wm}}{\sqrt{\sigma_i^2 - \sigma_{\rm wm}^2}}. \tag{7}$$
Therefore, the only difference between $N^{\rm wm-}_{\sigma_i}$ and $N^{\rm wm+}_{\sigma_i}$ is that the latter does not include correlations. Each of the above sets of $|N_\sigma|$ histograms is then symmetrized around zero. We then fit the symmetrized $|N_\sigma|$ histograms to a Gaussian distribution as well as to other symmetric distributions, namely the Cauchy, Laplacian, and Student's t distributions, to test the efficacy of each of these distributions. We briefly recap the different distributions used to fit the data.
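The residual calculation and the symmetrization step can be sketched as follows. The sketch assumes the form commonly used in the cited literature, $N_{\sigma_i} = [(m-M)_i - (m-M)_{\rm CE}]/\sqrt{\sigma_i^2 \pm \sigma_{\rm CE}^2}$, with the minus sign applying to the weighted mean when correlations are accounted for. The function names are hypothetical, and the sketch only handles measurements that have error bars; how measurements without error bars enter the residuals is not specified here.

```python
import math

def n_sigma(mu, sigma, ce, sigma_ce, correlated=False):
    """Residuals of each measurement about a central estimate, in units of sigma.

    correlated=False : denominator sqrt(sigma_i^2 + sigma_ce^2)
    correlated=True  : denominator sqrt(sigma_i^2 - sigma_ce^2), intended for
                       the weighted mean, where sigma_ce < sigma_i always holds
                       when more than one measurement is combined
    """
    sign = -1.0 if correlated else 1.0
    return [(m - ce) / math.sqrt(e**2 + sign * sigma_ce**2)
            for m, e in zip(mu, sigma)]

def symmetrize(residuals):
    """Mirror |N| about zero so that the sample is symmetric by construction."""
    absolute = [abs(r) for r in residuals]
    return sorted([-a for a in absolute] + absolute)

# Toy numbers for illustration only; these are NOT taken from the D20 catalog.
mu, sigma = [31.0, 31.2], [0.1, 0.2]
weights = [1.0 / e**2 for e in sigma]
wm = sum(w * m for w, m in zip(weights, mu)) / sum(weights)
sigma_wm = 1.0 / math.sqrt(sum(weights))

print(n_sigma(mu, sigma, wm, sigma_wm))                   # without correlations
print(n_sigma(mu, sigma, wm, sigma_wm, correlated=True))  # with correlations
print(symmetrize(n_sigma(mu, sigma, wm, sigma_wm)))
```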
The Gaussian distribution has a mean of zero and a standard deviation of unity:
$$P(N) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{|N|^2}{2}\right). \tag{8}$$
The second distribution we consider is the Laplacian distribution:
$$P(N) = \frac{1}{2} \exp\left(-|N|\right). \tag{9}$$
The third distribution that we use is the Cauchy or Lorentzian distribution. It can be described by:
$$P(N) = \frac{1}{\pi}\,\frac{1}{1 + |N|^2}. \tag{10}$$
Finally, the last distribution considered is the Student's-t distribution, which is characterized by a parameter $n$ (the "degrees of freedom") and is given by [34]:
$$P(N) = \frac{\Gamma[(n+1)/2]}{\sqrt{\pi n}\,\Gamma(n/2)} \left(1 + \frac{|N|^2}{n}\right)^{-(n+1)/2}. \tag{11}$$
For $n = 1$, the Student's-t distribution reduces to the Cauchy distribution, and it is the same as the Gaussian distribution for $n = \infty$. Similar to [20], we find the optimum value of $n$ in the range from 2 to 2000. We also did a fit to each of the above distributions after rescaling $N$ by $N/S$, where $S$ is an arbitrary scale factor ranging from 0.001 to 2.5, using steps of size 0.01. In order to test the efficacy of each of the above distributions in describing the residuals, we use the one-sample unbinned Kolmogorov-Smirnov (K-S) test [34]. The K-S test uses the D-statistic, which measures the maximum distance between two cumulative distributions. The K-S test is agnostic to the distribution against which it is being tested, and is independent of the size of the sample. Furthermore, one can easily obtain the p-value based on the D-statistic [34]. Therefore, the one-sample K-S test can be used to test the goodness of fit.
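A minimal sketch of the fitting procedure described above, using `scipy.stats.kstest` against the four zero-centered, unit-scale parent distributions. Several details are assumptions of this sketch rather than statements about our pipeline: the K-S test is applied to the mirrored sample (which doubles its nominal size), the best scale factor is taken to be the one that maximizes the p-value, and the names `PARENTS`, `ks_pvalue`, and `best_scale` are hypothetical.

```python
import numpy as np
from scipy import stats

# Parent distributions with zero location and unit scale.
# The Student's-t entry takes the number of degrees of freedom as input.
PARENTS = {
    "Gaussian": lambda dof: stats.norm(),
    "Laplacian": lambda dof: stats.laplace(),
    "Cauchy": lambda dof: stats.cauchy(),
    "Student-t": lambda dof: stats.t(df=dof),
}

def ks_pvalue(n_sigma, name, scale=1.0, dof=2):
    """One-sample K-S p-value of the symmetrized, rescaled residuals."""
    x = np.abs(np.asarray(n_sigma, dtype=float)) / scale
    sym = np.concatenate([-x, x])  # symmetrize around zero
    return stats.kstest(sym, PARENTS[name](dof).cdf).pvalue

def best_scale(n_sigma, name, scales=None, dof=2):
    """Scan the scale factor S and return (S, p) with the largest p-value."""
    if scales is None:
        scales = np.arange(0.001, 2.5, 0.01)  # range and step quoted in the text
    pvals = [ks_pvalue(n_sigma, name, scale=s, dof=dof) for s in scales]
    i = int(np.argmax(pvals))
    return float(scales[i]), float(pvals[i])

# Synthetic residuals whose errors are underestimated by a factor of two.
rng = np.random.default_rng(0)
residuals = 2.0 * rng.standard_normal(200)

print("Gaussian, S = 1:", ks_pvalue(residuals, "Gaussian"))
print("Gaussian, best S:", best_scale(residuals, "Gaussian"))
```

In this synthetic example the Gaussian fit is poor at $S = 1$ and becomes acceptable for a scale factor near 2, which mirrors how a best-fit $S$ well above unity can be read as a sign of underestimated errors. The same scan can be repeated over the Student's-t degrees of freedom by varying `dof`.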
The two distributions used as input to the one-sample K-S test are the error residual histograms and the parent PDF to which they are compared.

IV. RESULTS
We now present our results for the fits to $N_\sigma$ for the combined dataset as well as separately using each of the tracers.
• All measurements: Our results for the goodness of fit to all four distributions using all the tracers are summarized in Table III. The corresponding results for all tracers except for the ones classified as "averages" can be found in Table IV. For the data with averages (cf. Table III), we find that for all four estimates, the Gaussian distribution is a very poor fit, with p-values close to or less than $10^{-7}$. The Gaussian distribution is a good fit for the median estimate (with a p-value of 0.6) only if the scale factor differs substantially from unity (2.3). For a scale factor of one, only the Cauchy distribution shows a very good fit for the median estimate. If we exclude the measurements tagged as "averages", the results are comparable, as can be seen from Table IV. Hence, we conclude that the distance modulus measurements show evidence for non-Gaussianity in the residuals when we analyze all the measurements. Therefore, if a single central estimate is to be reported, the median value is the most robust, since it is not affected by non-Gaussianity of the errors [14].
• Color-magnitude/Luminosity relation: The summary statistics for the data obtained using color-magnitude/luminosity relation measurements can be found in Table V. We find that all four estimates show evidence for Gaussianity for a scale factor of unity (with p-values greater than 0.7). This shows that there is no evidence for systematic errors when using the color-magnitude/luminosity relation as a tracer. However, other distributions show comparable or larger p-values for all the central estimates.
• Faber-Jackson relation: The corresponding results when obtaining the distance modulus using the Faber-Jackson relation can be found in Table VI. We find that the Gaussian distribution provides a marginal fit for a scale factor of unity for all the central estimates, with p-values only slightly greater than 0.05. The Cauchy distribution provides the best fit, with p-values close to one. The Gaussian distribution is a good fit to the residuals only for scale factors between 2.5 and 2.8.
• Globular Cluster Luminosity Function: The corresponding results when obtaining the distance modulus using the globular cluster luminosity function can be found in Table VII. We find that for the median central estimate, the symmetrized $N_\sigma$ is consistent with the Gaussian distribution. However, for the arithmetic and weighted means, the Gaussian distribution is not a good fit, with p-values only slightly greater than 0.05. The estimates based on the median and arithmetic mean include one outlier measurement, whose distance modulus is given by $m - M = 20.9$ [36]. Since this measurement has no error bars provided, it was excluded from the weighted mean-based estimate, which explains why it mainly affects the p-value for the arithmetic mean.
• Planetary Nebula Luminosity Function: The results using the planetary nebula luminosity function can be found in Table VIII. We find that the Gaussian distribution provides a very good fit for all estimates, with p-values > 0.05. However, for all the central estimates, the Laplacian distribution provides the best fit, with a p-value higher than that of the Gaussian distribution.
• Surface Brightness Variations: The results using surface brightness variations can be found in Table IX. We find that the Gaussian distribution is a good fit to $N_\sigma$ for all four central estimates, with p-values > 0.3. This shows that there are no systematic errors in the distance estimates to M87 using surface brightness variations. However, the Student's-t distribution provides a better fit than the Gaussian distribution for all the central estimates.
• Supernovae: The corresponding results using supernovae as distance indicators to M87 can be found in Table X. We find that the Gaussian distribution is a very good fit to $N_\sigma$ for all the central estimates. However, for the median estimate and the weighted mean (without correlations), the Laplacian distribution provides a better fit than the Gaussian distribution, whereas the two are comparable for the weighted mean-based estimate that accounts for correlations.
• Tully-Fisher relation: The corresponding results using Tully-Fisher-based distances to M87 can be found in Table XI. We find that the Gaussian distribution is not a good fit, with p-values equal to 0.01 for the weighted mean and for the arithmetic mean. We get a good fit to the Gaussian distribution only with scale factors > 2 for all the central estimates. For the median and weighted mean, only the Student's-t distribution provides a p-value > 0.05. Therefore, the measurements based on the Tully-Fisher relation contain systematic errors.
• Other Methods: The results of the Gaussianity tests using an assortment of other methods can be found in Table XII. Here, the residuals about the median and arithmetic mean (which do not use the error bars) are well fit by the Gaussian distribution, whereas those about the weighted means are not. However, even for the median and arithmetic mean, the other three distributions (Laplacian, Cauchy, and Student's-t) provide a better fit than the Gaussian distribution.
• Averages: The results of the Gaussianity tests for the measurements tagged as "averages" can be found in Table XIII. We find that the residuals using all the central estimates are not consistent with Gaussian distributions (with p-values < 0.05). However, this is not surprising, since these data themselves consist of averages obtained using the different methods. Only the Cauchy distribution provides a good fit to the underlying residuals.

V. CONCLUSIONS
Recently, D20 did an extensive data mining of the literature to compile all the distance measurements to M87 using the Galactic center, LMC, and M31 as distance anchors. They also classified all measurements into 15 distinct tracers, of which eight contained more than 10 measurements. We carried out an extensive meta-analysis of all these measurements along the same lines as our previous works [10,11,20], which follow in spirit similar work done by Ratra et al. [12,14,18] (and references therein). The main goal was to characterize the Gaussianity of the error residuals of these measurements, when using the full dataset as well as after classifying them according to the type of tracer used. Any evidence for non-Gaussianity in the residuals would point to systematic errors in these measurements [2]. Therefore, our work complements the extensive analysis carried out in D20.
For this purpose, we calculated the central estimate using the weighted mean (with and without correlations), the arithmetic mean, and the median. The median estimate does not make use of the individual measurement errors. This was done for the full dataset and also after classifying the measurements according to the type of tracer used, as long as each tracer contained more than 10 measurements. These results can be found in Table I and Table II, respectively. We then fit the residuals to four distributions, viz. the Gaussian, Laplace, Cauchy, and Student's t distributions, using the one-sample K-S test. These results can be found in Tables III, IV, V, VI, VII, VIII, IX, X, XI, XII, and XIII.
Our conclusions are as follows:
• The central estimates that we obtained using all three methods agree with the estimates in D20 to within 1σ.
• If we look at the measurements after classifying them according to tracers, all the measurements are consistent with each other, except for those based on Hubble's law, which are discrepant at the 3-4σ level.
• When we consider the full dataset, the residuals using the weighted mean are a poor fit to the Gaussian distribution. Therefore, the median estimate that we obtain ($31.08 \pm 0.09$) should be used as the central estimate.
• We find that after splitting the data according to the tracers, the measurements based on the Tully-Fisher relation and those tagged as "Averages" show a poor fit to the Gaussian distribution for all the central estimates. A good fit to the Gaussian distribution is only obtained for scale factors between 2.5 and 3.8. This indicates that these measurements contain unaccounted-for systematic errors.
• The residuals using the measurements based on the Faber-Jackson relation are only marginally consistent with the Gaussian distribution (for all estimates), with p-values between 0.05 and 0.1.
• For globular cluster luminosity function-based measurements as well as those classified as "Other", only the residuals using the median estimate show a good fit to the Gaussian distribution. All other estimates have a poor fit to the Gaussian distribution.
• For all other measurements classified according to tracers, the residuals are consistent with a Gaussian distribution. However, other distributions such as Laplace or Cauchy also provide an equally good or better fit to the residuals.
Note added: After this work was submitted, we were informed that another work on similar lines was under preparation, and has been submitted for publication at the time of writing [37].This work focuses on using the median estimates to estimate the systematic errors in the distance measurements, whereas the emphasis in our work was on testing the Gaussianity of the error residuals.

TABLE I:
Central estimates and 1σ error bars for Messier 87 distance measurements (mag). For the median and arithmetic mean, we have used all measurements, including those without error bars, whereas for the weighted mean we used only the ones with error bars.

TABLE II:
Central estimates and 1σ error bars for Messier 87 distance measurements (mag), using the different individual tracers. For the median and arithmetic mean, we have used all measurements, including those without error bars, whereas for the weighted mean we used only the ones with error bars.

TABLE III:
Probabilities from the K-S test for various distributions using all measurements of M87 (including those tagged as "averages"). We have used 213 data points for the median and arithmetic mean, and 173 for the weighted mean.

TABLE IV:
Probabilities from the K-S test for various distributions using all measurements of M87 except those tagged as "averages". We have used 190 data points for the median and arithmetic mean, and 153 for the weighted mean. All variables have the same meaning as in Table III.

TABLE V:
Probabilities from the K-S test for various distributions using the color-magnitude/luminosity relation measurement data of M87. We have used 11 data points for the median and arithmetic mean, and seven for the weighted mean. All variables have the same meaning as in Table III.

TABLE VI:
Probabilities from the K-S test for various distributions using the Faber-Jackson relation measurement data of M87. We have used 11 data points for the median and arithmetic mean, and 10 for the weighted mean. All variables have the same meaning as in Table III.

TABLE VII:
Probabilities from the K-S test for various distributions using the Globular Cluster Luminosity Function (GCLF) measurement data of M87. We have used 32 data points for the median and arithmetic mean, and 23 for the weighted mean. All variables have the same meaning as in Table III.

TABLE VIII:
Probabilities from the K-S test for various distributions using the Planetary Nebula Luminosity Function (PNLF) measurement data of M87. We have used 12 data points for the median and arithmetic mean, and 10 for the weighted mean. All variables have the same meaning as in Table III. Columns: Distribution, S, p, n.

TABLE IX:
Probabilities from the K-S test for various distributions using the Surface Brightness Variations (SBF) measurement data of M87. We have used 18 data points for the median and arithmetic mean, and 17 for the weighted mean. All variables have the same meaning as in Table III. Columns: Distribution, S, p, n.

TABLE XII:
Probabilities from the K-S test for various distributions using the measurements classified as "Other methods". We have used 15 data points for the median and arithmetic mean, and 13 for the weighted mean. All variables have the same meaning as in Table III. Columns: Distribution, S, p, n.