Estimating Photometric Redshift from Mock Flux for CSST Survey by using Weighted Random Forest

Accurate estimation of photometric redshifts (photo-$z$) is crucial in studies of both galaxy evolution and cosmology using current and future large sky surveys. In this study, we employ Random Forest (RF), a machine learning algorithm, to estimate photo-$z$ and investigate the systematic uncertainties affecting the results. Using galaxy flux and color as input features, we construct a mapping between input features and redshift by using a training set of simulated data, generated from the Hubble Space Telescope Advanced Camera for Surveys (HST-ACS) and the COSMOS catalogue, with the expected instrumental effects of the planned China Space Station Telescope (CSST). To improve the accuracy and confidence of the predictions, we incorporate inverse-variance weighting and perturb the catalog using the input feature errors. Our results show that weighted RF can achieve a photo-$z$ accuracy of $\rm \sigma_{NMAD}=0.025$ and an outlier fraction of $\rm \eta=2.045\%$, significantly better than the values of $\rm \sigma_{NMAD}=0.043$ and $\rm \eta=6.45\%$ obtained by the widely used Easy and Accurate Zphot from Yale (EAZY) software, which uses a template-fitting method. Furthermore, we have calculated the importance of each input feature for different redshift ranges and found that the most important input features reflect the approximate position of the break features in galaxy spectra, demonstrating the algorithm's ability to extract physical information from data. Additionally, we have established confidence indices and error bars for each predicted value based on the shape of the redshift probability distribution function, suggesting that selecting sources with high confidence can further reduce the outlier fraction.

Additionally, photo-$z$ approaches have simple and uniform selection functions, extend to fainter flux limits and larger angular scales, and thus probe much larger cosmic volumes (Hildebrandt et al. 2010).
There are numerous photometric surveys underway or set to begin in the near future. These include the Sloan Digital Sky Survey (SDSS; Fukugita et al. 1996; York et al. 2000), the Dark Energy Survey (DES; Dark Energy Survey Collaboration et al. 2016; Abbott et al. 2021), the Large Synoptic Survey Telescope (LSST; Ivezić et al. 2019; LSST Science Collaboration et al. 2009), the Euclid Space Telescope (Laureijs et al. 2011), the Wide Field Infrared Survey Telescope (WFIRST; Spergel et al. 2015), and more. These surveys will detect billions of galaxies over a large redshift range through spectroscopic or photometric imaging, making it possible to accurately measure the dynamical evolution of the universe.
Currently, photo-$z$ computation methods can be broadly divided into two categories, template-fitting algorithms and empirical training algorithms, and many works have compared the two (e.g. Hildebrandt et al. 2010; Abdalla et al. 2011; Sánchez et al. 2014; Beck et al. 2017; Euclid Collaboration: Desprez et al. 2020). Template-fitting algorithms (e.g. Benítez 2000; Bolzonella et al. 2000; Csabai et al. 2003; Ilbert et al. 2006; Feldmann et al. 2006; Assef et al. 2010) use either empirical (e.g. Coleman et al. 1980; Assef et al. 2010) or synthetic spectral templates (e.g. Bruzual & Charlot 2003; Maraston 2005; Conroy et al. 2009; Eldridge & Stanway 2009) to estimate photo-$z$. These techniques find the best match between the observed magnitudes or colors and the synthetic magnitudes or colors from the templates, sampled across the expected redshift range of the photometric observations. Empirical training methods use a spectroscopic training dataset to calibrate an algorithm that can then be quickly applied to new photometric observations. Initially, the training set was used to map a polynomial function between the colors or other photometric observables and the redshifts (e.g. Connolly et al. 1995; Brunner et al. 1997; Oyaizu et al. 2008; Geach 2012; Way & Klose 2012; Sadeh et al. 2016; Tanaka et al. 2018). More recently, this process has been extended to machine learning algorithms such as Random Forest.
Previous works have utilized prediction trees in photo-$z$ calculation. For example, Carliles et al. (2010) used the RF package in R to predict photo-$z$ and its error. They tested their approach on a subset of the SDSS Data Release 6 (Adelman-McCarthy et al. 2008) catalog with colors as input features and found that the RF method is very suitable for photo-$z$ prediction, producing results comparable to other machine learning methods. Carrasco Kind & Brunner (2013) built upon these findings and developed a new parallel machine learning photo-$z$ Python code called TPZ. TPZ employs both classification and regression trees in RF to calculate the probability density function (PDF). This approach utilizes extra information encoded within the measurement errors, generates ancillary information describing the spectroscopic training sample, and provides better control of the uncertainties. The authors tested their code on galaxy samples drawn from the SDSS main galaxy sample and from the DEEP2 survey, obtaining excellent results in each case. Fotopoulou & Paltani (2018) combined three RFs and template-fitting methods to identify stars and to estimate redshifts for all galaxy populations, including active galactic nuclei (AGN) and quasi-stellar objects (QSO). They applied their code to the near-infrared VISTA public surveys, matching them with optical photometry from CFHTLS, KiDS, and SDSS, mid-infrared photometry from WISE, and ultraviolet photometry from the Galaxy Evolution Explorer (GALEX). Their analysis demonstrated that their methods enhance photometric redshift accuracy for both normal galaxies and AGN without the need for extra X-ray information. Zhou et al. (2019) assessed their cross-matched catalogue, combining photometry from the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) and $Y$-band photometry from the Subaru Suprime camera, using a machine learning photometric redshift algorithm based upon RF regression. Their investigation revealed that using corrected aperture photometry resulted in a notable enhancement in photo-$z$ accuracy when compared to the original SExtractor catalogues from CFHTLS and Subaru. Mucesh et al. (2021) employed the DES catalog, incorporating redshift and stellar mass data from COSMOS2015, to assess the effectiveness of their Python package (GALPRO), based on RF. Their results revealed that their approach surpasses template fitting across all their predefined performance metrics. Li et al. (2023) conducted a comparative analysis of three machine learning techniques, namely CATBOOST, Multi-Layer Perceptron, and RF, on cross-matched datasets involving the DESI Legacy Imaging Surveys DR9 galaxy catalog, as well as the LAMOST DR7, GAMA DR3, and WiggleZ galaxy catalogs. Although their findings indicate that RF is not the optimal method for redshift estimation in their study, it exhibited a more favorable bias compared to CATBOOST.
Given its simplicity and robust performance in photometric redshift estimation, RF stands out as a valuable tool for showcasing feature importance, offering useful insights for future survey strategies. However, the original RF algorithm proved ineffective when applied directly. In response, we made enhancements to the algorithm, leading to a more effective methodology for achieving better results. It is worth noting that other powerful algorithms, such as CATBOOST, are also available. In this work, we utilize RF to estimate the accuracy of photo-$z$ in the optical survey of the CSST, a 2-meter diameter space telescope expected to launch around 2024 that will share the orbit of the China Manned Space Station (Zhan 2011; Cao et al. 2018; Gong et al. 2019).
The CSST is planned to observe about 17,500 deg$^2$ in approximately 10 years, covering the optical and NIR bands from $\sim 250$ nm to $\sim 1000$ nm. The 5$\sigma$ limiting magnitude for a point source can reach $\sim 26$ AB mag in the $g$, $r$, and $i$ bands, and is approximately 24.5-25.5 for the other bands. In Figure 1, the real transmissions, including detector quantum efficiency, of the seven CSST photometric filters are shown. The primary scientific goals of the CSST involve exploring the evolution of large-scale structure (LSS), the properties of dark matter and dark energy, and galaxy formation and evolution, among others; hence, photo-$z$ measurements are necessary.
In a recent study, Zhou et al. (2022a) explored the accuracy of photo-$z$ methods for the CSST using four neural networks: Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), Hybrid, and Hybrid transfer. These networks were used to extract photo-$z$ information from mock CSST flux and image data, achieving a photo-$z$ accuracy of approximately 0.02 and an outlier fraction of around 0.9%.
As part of a series of studies on CSST photo-$z$ estimation, we test the use of the RF algorithm. We generate mock photometric training data from the COSMOS galaxy catalog (Scoville et al. 2007; Ilbert et al. 2009), incorporating CSST instrument effects. Our aim is to assess the accuracy of this method and evaluate the credibility of the prediction results. We also attempt to improve the prediction process by using the errors of the input features as fitting weights. These tests will inform the development of strategies for estimating photo-$z$ with RF on real CSST data in the future.
The paper is structured as follows: Section 2 describes the generation of the mock flux data for training, while Section 3 introduces the EAZY and RF algorithms for the redshift estimation problem. In Section 4, we apply the RF algorithm and the EAZY code to the mock data and present the results. Finally, we summarize the study in Section 5.

MOCK DATA
In this section, we briefly introduce how the mock data are generated. Further details about the mock process can be found in Zhou et al. (2021). The mock data should have properties, such as the redshift and magnitude distributions and the galaxy types, similar to those expected for the CSST survey. To achieve a high degree of realism in simulating galaxy images for the CSST photometric survey, we employ mock image generation techniques rooted in observations of the COSMOS field taken with the Advanced Camera for Surveys of the Hubble Space Telescope (HST-ACS), incorporating CSST instrumental effects. The mock galaxy fluxes are measured from these images by aperture photometry. The HST-ACS survey covers an area of approximately 2 deg$^2$ in the F814W band, which has a spatial resolution similar to that of the CSST, with an 80% energy concentration radius of $R_{80} \sim 0.15''$ (Cao et al. 2018; Gong et al. 2019; Koekemoer et al. 2007; Massey et al. 2010; Bohlin 2016). The COSMOS HST-ACS F814W survey also has notably low background noise, anticipated to be approximately 1/3 of that in the CSST survey. This makes it a solid basis for simulating CSST galaxy images.
We first select the central $0.85 \times 0.85$ deg$^2$ area of this survey, containing $\sim 192{,}000$ galaxies, to obtain high-quality images. We then rescale the pixel size of the COSMOS HST-ACS F814W survey from $0.03''$ to $0.075''$ to match the CSST pixel size. Next, we extract a square stamp image for each galaxy from the survey area, with the galaxy at the center of the stamp. The stamp's dimensions are 15 times the galaxy's semi-major axis, resulting in varying sizes for the galaxy stamp images. The galaxy's semi-major axis and other pertinent morphological details can be found in the COSMOS weak lensing source catalog (Leauthaud et al. 2007). In addition, we mask all sources in the stamp image with a signal-to-noise ratio (SNR) greater than 3, except for the central galaxy, and replace them with CSST background noise.
Here we use the COSMOS2015 catalog (Laigle et al. 2016) to match galaxies in the HST-ACS F814W survey; it contains about 220,000 galaxies with measurements of galaxy redshift, magnitude, size, dust extinction, best-fit spectral energy distribution (SED), and so on (Scoville et al. 2007; Ilbert et al. 2009; Cao et al. 2018; Gong et al. 2019). The COSMOS photo-$z$s have been computed using more than 30 bands spanning a wide range of the electromagnetic spectrum. Laigle et al. (2016) verified some of the photo-$z$ estimates in the COSMOS2015 catalog by comparing them with several spectroscopic survey samples. The accuracy of the photo-$z$s and the characteristics of the spec-$z$ samples can be found in Tables 4 and 5, as well as Figures 11 and 12, of Laigle et al. (2016). Given that the photo-$z$s in this catalog have demonstrated precision and accuracy, we consider these estimates reliable and adopt them as the true redshift (hereafter $z_{\rm true}$) for training the RF. It should be noted, however, that this choice may introduce some bias or error into the present results. In the future, the same method can be applied with the spectroscopic sample obtained in the CSST survey. In this work, 31 SED templates from the LePhare code (Arnouts et al. 1999; Ilbert et al. 2006) were reproduced to fit the galaxy fluxes. We extend the wavelength range of the templates from $\sim 900$ Å down to $\sim 90$ Å using the BC03 method (Bruzual & Charlot 2003), because the CSST has large wavelength coverage between the NUV and NIR bands. When fitting the fluxes with the SED templates, we include five dust extinction laws, derived from sources such as the Milky Way (Allen 1976; Seaton 1979), the Large Magellanic Cloud (Fitzpatrick 1986), the Small Magellanic Cloud (Prevot et al. 1984; Bouchet et al. 1985), and a starburst galaxy (Calzetti et al. 2000), as well as various emission lines such as Ly$\alpha$, H$\alpha$, H$\beta$, [OII], and [OIII]. For the IGM absorption, we use the attenuation laws computed by Madau (1995).
Then we convolve the obtained SEDs with the CSST total transmission curves (Figure 1) to calculate the theoretical flux data observed by the CSST. The theoretical flux, in electron counting rate, of a band $i$ can be estimated as
$$C_i = A \int f(\lambda)\,\frac{\lambda}{hc}\,\tau_i(\lambda)\,{\rm d}\lambda,$$
where $h$ and $c$ are the Planck constant and the speed of light, respectively, $A$ is the effective telescope aperture area, $f(\lambda)$ is the SED, and $\tau_i(\lambda) = T_i(\lambda)\,Q(\lambda)\,M(\lambda)$ is the total system throughput, in which $T_i(\lambda)$, $Q(\lambda)$, and $M(\lambda)$ are the intrinsic filter transmission, the detector quantum efficiency, and the total mirror efficiency, respectively. We can then calculate the theoretical flux, in electron counts, of band $i$ as
$$N_i = C_i\, t_{\rm exp}\, N_{\rm exp},$$
where $t_{\rm exp} = 150$ s is the exposure time and $N_{\rm exp}$ is the number of exposures, which is 4 for the NUV and $y$ bands and 2 for the other bands.
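As a minimal numerical sketch of this band-flux calculation (the function names, the hand-rolled trapezoidal helper, and the toy SED/throughput are our own illustrative assumptions, not CSST code):

```python
import numpy as np

# Physical constants (SI units)
H_PLANCK = 6.62607015e-34   # Planck constant [J s]
C_LIGHT = 2.99792458e8      # speed of light [m s^-1]

def _trapezoid(y, x):
    """Simple trapezoidal integration (avoids NumPy-version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def band_count_rate(wave_m, sed, throughput, area_m2):
    """Electron counting rate of a band:
    C_i = A * Int[ f(lambda) * lambda / (h c) * tau_i(lambda) dlambda ],
    converting an energy flux density into a photon (electron) rate."""
    photons = sed * wave_m / (H_PLANCK * C_LIGHT)       # photon flux density
    return area_m2 * _trapezoid(photons * throughput, wave_m)

def band_counts(count_rate, t_exp=150.0, n_exp=2):
    """Total electron counts N_i = C_i * t_exp * N_exp."""
    return count_rate * t_exp * n_exp
```

Doubling the number of exposures doubles the accumulated counts, which is the behaviour encoded in the second equation above.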
After that, the CSST instrumental effects and background noise are added to the image. The additional noise in band $i$ can be expressed as
$$N_{{\rm add},i} = \sqrt{N_{{\rm bkg,th},i}^2 - N_{{\rm img},i}^2},$$
where $N_{{\rm img},i}$ is the background noise of the rescaled image and $N_{{\rm bkg,th},i}$ is the theoretical CSST background noise per pixel, which can be calculated by
$$N_{{\rm bkg,th},i} = \sqrt{\left(N_{{\rm sky},i} + N_{\rm dark}\right) t_{\rm exp} N_{\rm exp} + N_{\rm exp} R_n^2},$$
where $N_{\rm dark} = 0.02\,{\rm e^-\,s^{-1}\,pix^{-1}}$ is the dark current, $R_n = 5\,{\rm e^-\,pix^{-1}}$ is the read-out noise, and $N_{{\rm sky},i}$ is the sky background in units of ${\rm e^-\,s^{-1}\,pix^{-1}}$, which is given by the integral of $I_{\rm sky}(\lambda)$, the surface brightness intensity of the sky background, over the system throughput and the pixel area. The flux error of a galaxy in band $i$ is then estimated as
$$\sigma_i = \sqrt{N_{{\rm obs},i} + N_{\rm pix}\, N_{{\rm bkg,th},i}^2},$$
where $N_{{\rm obs},i}$ is the observed electron counts and $N_{\rm pix}$ is the number of pixels covered by a galaxy for the CSST. In this way, we have constructed a mock data sample for CSST photometric observations.
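The noise bookkeeping described here can be sketched numerically. Note that the formulas below follow our reading of the (partially garbled) text, and all function and parameter names are our own:

```python
import numpy as np

def theoretical_background(n_sky, n_dark=0.02, read_noise=5.0,
                           t_exp=150.0, n_exp=2):
    """Theoretical background noise per pixel (electrons):
    N_bkg,th = sqrt((N_sky + N_dark) * t_exp * N_exp + N_exp * R_n^2)."""
    return np.sqrt((n_sky + n_dark) * t_exp * n_exp + n_exp * read_noise**2)

def additional_noise(n_bkg_th, n_img):
    """Noise added so the rescaled image reaches the CSST background level:
    N_add = sqrt(N_bkg,th^2 - N_img^2); valid while N_img < N_bkg,th,
    which holds here since the COSMOS background is ~1/3 of the CSST one."""
    return np.sqrt(n_bkg_th**2 - n_img**2)

def flux_error(n_obs, n_pix, n_bkg_th):
    """Photometric error combining source Poisson noise and background:
    sigma = sqrt(N_obs + N_pix * N_bkg,th^2)."""
    return np.sqrt(n_obs + n_pix * n_bkg_th**2)
```

For example, with a sky-plus-dark rate of $0.22\,{\rm e^-\,s^{-1}\,pix^{-1}}$ over two 150 s exposures, the per-pixel background noise evaluates to $\sqrt{66 + 50} \approx 10.8\,{\rm e^-}$.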
Figure 2 displays the redshift distribution of the mock data. The peak of the distribution is around $z = 0.6-1$, and the distribution covers the redshift range from 0 to 5. Figure 3 presents measured flux examples from mock images for the seven CSST photometric bands at $z = 0.31$, 1.6, and 2.52, accompanied by the corresponding SEDs for comparison. The figure clearly shows that the mock flux data accurately capture the features of the galaxy SEDs.

EAZY approach
EAZY, as described in Brammer et al. (2008), is photometric redshift software that employs a template-fitting approach. The fundamental procedure is the minimization of the $\chi^2$ statistic, comparing SED templates with the observed fluxes over a grid of redshifts:
$$\chi^2_z = \sum_{i=1}^{N_{\rm filt}} \frac{\left(\sum_j \alpha_j T_{z,j,i} - F_i\right)^2}{\sigma_i^2},$$
where $N_{\rm filt}$ is the number of filters, $T_{z,j,i}$ is the synthetic flux of template $j$ in filter $i$ for redshift $z$, $F_i$ is the observed flux in filter $i$, and $\sigma_i$ is the uncertainty in $F_i$. EAZY then finds the best-fitting coefficients $\alpha_j$ of all combined templates. Among template-fitting methods, EAZY is one of the most widely used codes; for example, it has been adopted by Yang et al. (2014), and Euclid Collaboration: Desprez et al. (2020) showed that, when run in identical configurations, template-fitting methods provide nearly identical results. The differences observed in the results are not due to differences in the performance of the template-fitting methods, but rather to variations in their configurations. Hence, we have selected EAZY as a representative template-fitting approach for a comparative analysis with our RF algorithm.
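The $\chi^2$ minimization can be illustrated with a stripped-down, single-template version, in which the best-fit amplitude has the closed form $\alpha = \sum_i T_i F_i/\sigma_i^2 \,/\, \sum_i T_i^2/\sigma_i^2$. This is only a sketch of the idea (EAZY itself fits non-negative combinations of several templates), and the names below are our own:

```python
import numpy as np

def chi2_at_z(template_flux, obs_flux, obs_err):
    """Chi-square for one template at one redshift grid point, with the
    best-fit amplitude alpha solved analytically (linear least squares)."""
    w = 1.0 / obs_err**2
    alpha = np.sum(template_flux * obs_flux * w) / np.sum(template_flux**2 * w)
    return np.sum(w * (alpha * template_flux - obs_flux) ** 2)

def best_redshift(template_grid, z_grid, obs_flux, obs_err):
    """Pick the redshift grid point minimising chi-square.
    template_grid[k] holds the synthetic band fluxes at z_grid[k]."""
    chi2 = np.array([chi2_at_z(t, obs_flux, obs_err) for t in template_grid])
    return z_grid[np.argmin(chi2)], chi2
```

If one grid entry matches the observed fluxes up to a constant factor, its $\chi^2$ vanishes and that redshift is selected.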
For EAZY, we employ the default EAZY_v1.2_dusty templates, which include 8 templates and have been widely used. Additionally, we incorporate the $r$-band apparent magnitude prior $p(z|m_r)$, which represents the redshift distribution of galaxies with apparent magnitude $m_r$. The parameter Z_STEP_TYPE is set to 0, indicating an even division of the redshift grid. All other parameters are kept at their default settings. The input bands are the seven CSST bands described above.

Random forest algorithm
Random Forest (Breiman 2001) is an ensemble learning algorithm that consists of multiple regression trees trained on bootstrap samples (Caruana et al. 2008). In this work, we use the RF class from the scikit-learn library (https://scikit-learn.org/stable/; Pedregosa et al. 2011). Our goal is to train independent prediction trees on the mock data to accurately estimate photo-$z$, and to use the resulting probability density function to characterize the error and derive an estimate for each galaxy. We also calculate other useful information, such as feature importance and redshift confidence, following the method proposed in Carrasco Kind & Brunner (2013). The details of this algorithm and the training process are given in Appendix A.
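A minimal sketch of this setup with scikit-learn, using an entirely synthetic stand-in for the mock catalogue (the toy features and target below are illustrative assumptions, not the CSST data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-in for the mock catalogue: 13 input features
# (7 fluxes + 6 colours) and a nonlinear redshift-like target.
X = rng.uniform(0.0, 1.0, size=(2000, 13))
z_true = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + 0.1 * rng.normal(size=2000)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X[:1000], z_true[:1000])

# Per-tree predictions for one test galaxy: the spread across trees is
# what the redshift PDF is later built from.
x_test = X[1000:1001]
tree_preds = np.array([tree.predict(x_test)[0] for tree in rf.estimators_])
```

The forest prediction is the mean of the individual tree predictions; keeping the full set of per-tree values is what enables the PDF construction discussed below.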

Practical details of the procedure
In future analyses of weak gravitational lensing surveys, the CSST will rely on photo-$z$ data. To ensure the accuracy of the photo-$z$ information, we select a high-quality sample from the mock data with a signal-to-noise ratio (SNR) larger than 10 in either the $g$ or $i$ band. The resulting subsample contains 44,991 sources. In future work, other potentially better selection criteria, such as size or shape measurement quality, will be taken into consideration.
Figure 4 depicts a simplified workflow of the algorithm. First, the sample is randomly divided into training and testing data with a tentative ratio of 1:1. The input features consist of the fluxes of the seven bands and six "colors" derived from the flux ratios ${\rm flux}_i/{\rm flux}_j$ of adjacent bands. Our mock data contain no missing values, but values may occasionally be negative, as happens when sources are undetected in certain bands. These features are rescaled to the same level using the standard scaler for normalization:
$$x' = \frac{x - \mu}{\sigma},$$
where $x$ is the value of each feature, $\mu$ is the mean value of the data, and $\sigma$ is the standard deviation of all the data. We then develop a method based on the RF algorithm to fully utilize the observational error information. Instead of using the observational errors directly as input features, we first compute the inverse variance as the sample weight during the fitting process,
$$w = \left[\sum_i \left(\frac{\sigma_i}{f_i}\right)^2\right]^{-1},$$
where $\sigma_i$ is the error of the input feature in band $i$ and $f_i$ is the corresponding feature. The sample weight for each source is shown in Figure 5. From the figure, we observe that the weights are concentrated and decrease as redshift increases.
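The scaling and weighting steps can be sketched as follows. The inverse-variance weight built from relative feature errors is our reading of the text, and the toy fluxes and target are hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_src, n_feat = 500, 7
flux = rng.uniform(1.0, 10.0, size=(n_src, n_feat))           # toy band fluxes
flux_err = 0.05 * flux * rng.uniform(0.5, 2.0, size=(n_src, n_feat))
z_true = flux[:, 0] / flux[:, 1]                              # toy target

# Standard-scale the features: x' = (x - mu) / sigma, per feature.
X = StandardScaler().fit_transform(flux)

# Hypothetical inverse-variance sample weight per source, built from the
# relative feature errors: w = 1 / sum_i (sigma_i / f_i)^2.
w = 1.0 / np.sum((flux_err / flux) ** 2, axis=1)

rf = RandomForestRegressor(n_estimators=50, random_state=0)
rf.fit(X, z_true, sample_weight=w)
```

Passing the weights through `sample_weight` makes noisier sources contribute less to the split decisions of every tree, which is the intent of the weighting scheme described above.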
The RF algorithm generates prediction results from all trees, which can be used to construct the probability density function (PDF) of the redshift through statistical analysis. Carrasco Kind & Brunner (2014) and Rau et al. (2015) have explored various methods for generating PDFs with RF, such as the Gaussian mixture model (GMM) and non-parametric techniques like kernel density estimation (KDE) (Rosenblatt 1956; Parzen 1962), among others. Due to its simplicity, we opt for KDE to construct our PDFs. KDE operates by placing a kernel, typically a smooth, bell-shaped function, at each data point and summing these kernels to produce a smooth, continuous approximation of the probability density. As a non-parametric method, it estimates $\hat{p}(z)$ for the PDF $p(z)$ using a set of $N$ samples $z_i$. In the 1D case, the KDE takes the form
$$\hat{p}(z) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{z - z_i}{h}\right),$$
where $K$ is the Gaussian kernel, whose width is set by the standard deviation of the $N$ samples, and $h$ is the bandwidth. For this purpose, we employ the scipy.stats.gaussian_kde class (https://scipy.org/), retaining all parameters at their default settings. It is worth noting that our PDF is not a conventional PDF; rather, it represents the distribution of predictions. This PDF is denoted as $p(z|{\rm tree})$, representing the probability distribution of the redshift. Typically, a redshift prediction can be either the mean value or the mode of the PDF. In this study, we use both for comparison. Our findings indicate that, by selecting the peak redshift, the outlier fraction in the test data can be reduced from about 4% to 2.3%. Therefore, we choose the peak value as the final prediction. In addition, the input feature importance can be obtained after constructing the RF.
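The mean-versus-peak choice can be sketched with `scipy.stats.gaussian_kde` on a toy, bimodal set of per-tree predictions (the numbers below are illustrative assumptions):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Toy per-tree predictions for one galaxy: a dominant solution near
# z = 0.8 plus a secondary cluster near z = 2.1.
tree_preds = np.concatenate([
    np.random.default_rng(2).normal(0.8, 0.03, 80),
    np.random.default_rng(3).normal(2.1, 0.05, 20),
])

kde = gaussian_kde(tree_preds)          # p(z|tree), Gaussian kernels
z_grid = np.linspace(0.0, 5.0, 2001)
pdf = kde(z_grid)

z_mean = np.mean(tree_preds)            # mean of the prediction distribution
z_peak = z_grid[np.argmax(pdf)]         # mode of the PDF
```

For a bimodal distribution like this one, the mean falls between the two clusters while the peak stays on the dominant solution, which illustrates why selecting the peak reduces the outlier fraction.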
Next, we use a Monte Carlo simulation, taking the input feature errors as $1\sigma$ deviations, to expand the training set. This process generates 100 new mimic catalogs. Using the above RF algorithm, we make redshift predictions for each catalog. These predictions are used to create a new PDF, denoted $p(z|{\rm forest})$, which takes the observational errors into account.
After that, we construct the final PDF, $P(z) \propto p(z|{\rm tree})\, p(z|{\rm forest})$, and predict the redshift as the value at which its peak is located. Here the term $p(z|{\rm forest})$ can be regarded as a straightforward model-based weighting.
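One plausible reading of the two-PDF combination can be sketched end-to-end for a single source. Everything here (the toy training set, the perturbation of the test input rather than the training catalog, and all names) is an illustrative assumption:

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)

# Toy training set and forest (stand-ins for the mock catalogue and RF).
X = rng.uniform(0.0, 1.0, size=(1000, 5))
z = 3.0 * X[:, 0] + 0.05 * rng.normal(size=1000)
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, z)

x_obs = np.array([0.5, 0.2, 0.7, 0.1, 0.9])
x_err = np.full(5, 0.02)        # hypothetical 1-sigma feature errors

# p(z|tree): spread of per-tree predictions for the unperturbed input.
p_tree = gaussian_kde([t.predict(x_obs[None, :])[0] for t in rf.estimators_])

# p(z|forest): forest predictions for 100 Monte Carlo copies of the input.
mc = x_obs + x_err * rng.normal(size=(100, 5))
p_forest = gaussian_kde(rf.predict(mc))

# Final PDF P(z) ~ p(z|tree) * p(z|forest); prediction = peak location.
z_grid = np.linspace(0.0, 3.0, 1501)
final_pdf = p_tree(z_grid) * p_forest(z_grid)
z_phot = z_grid[np.argmax(final_pdf)]
```

Multiplying the two densities suppresses redshift solutions that are supported by only one of the two sources of scatter (tree-to-tree variance versus observational error).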
In addition, we define outlier sources according to
$$\frac{|\Delta z|}{1 + z_{\rm true}} > 0.15,$$
and we use the outlier fraction $\eta$ and the normalized median absolute deviation (NMAD; Brammer et al. 2008),
$$\sigma_{\rm NMAD} = 1.48 \times {\rm median}\!\left(\left|\frac{\Delta z - {\rm median}(\Delta z)}{1 + z_{\rm true}}\right|\right),$$
to judge the accuracy, where $\Delta z = z_{\rm phot} - z_{\rm true}$, and $z_{\rm true}$ and $z_{\rm phot}$ are the real and predicted redshifts, respectively. This step mainly evaluates the performance of the algorithm in redshift prediction. Our final step is to establish additional indicators, following Carrasco Kind & Brunner (2013), to evaluate the performance of our model predictions. Additionally, we investigate the effect of changing the ratio of training to testing data to determine the optimal amount of training data.
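These two metrics are straightforward to compute; a sketch, assuming the standard 0.15 outlier threshold:

```python
import numpy as np

def outlier_fraction(z_phot, z_true, threshold=0.15):
    """Fraction of sources with |dz| / (1 + z_true) > threshold."""
    dz = np.abs(z_phot - z_true) / (1.0 + z_true)
    return float(np.mean(dz > threshold))

def sigma_nmad(z_phot, z_true):
    """Normalized median absolute deviation (Brammer et al. 2008):
    sigma_NMAD = 1.48 * median(|dz - median(dz)| / (1 + z_true))."""
    dz = z_phot - z_true
    return 1.48 * float(np.median(np.abs(dz - np.median(dz)) / (1.0 + z_true)))
```

Because $\sigma_{\rm NMAD}$ is median-based, a handful of catastrophic outliers barely moves it, which is exactly why the outlier fraction is reported alongside it.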
We also conducted separate tests to measure the impact of forest size and other fitting parameters on performance. The number of regression trees in the forest is an essential hyperparameter that can affect the algorithm's performance. Generally, increasing the forest's size can improve the accuracy of the model, as it increases the diversity of the trees and reduces the risk of overfitting. However, there is a point of diminishing returns, beyond which further increasing the forest's size does not significantly improve performance but does increase the computational cost. In Figure 6(a), we demonstrate how the mean squared error (MSE) of the result is influenced by the forest's size. The MSE on the training data improves significantly while the forest is small, but beyond 50 trees it changes little until it stops decreasing. On the other hand, the time required grows proportionally with the forest size. As shown in Figure 6(b), if we take the time consumed by the model with 100 trees as the standard duration, a model with 200 trees costs twice as much time while the MSE changes little. Therefore, it is not necessary to build too many trees.
Figure 6(c) illustrates the role of another parameter, the minimum number of samples required at a leaf node, known as "min_samples_leaf". It determines the minimum number of samples that must be present in a leaf node of a regression tree before the splitting process is halted. Setting a higher value for min_samples_leaf leads to a simpler tree with fewer nodes, reducing the risk of overfitting the training data. It can also make the regression tree more interpretable by reducing the number of splits, resulting in a more intuitive regression process. Such a modification is called "pruning". However, if we cut too many nodes, the model risks underfitting. The result in Figure 6(c) is based on a forest size of 100. Other parameters, such as "max_depth" and "min_impurity_decrease", play a similar role. By adjusting these parameters, it is possible to control the size, complexity, and generalization ability of the regression trees in the RF. In general, the optimal values of these parameters depend on the specific problem being addressed and the desired trade-off between accuracy and model complexity. In this work, we find that setting them to their default values is sufficient.
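This kind of hyperparameter scan is easy to reproduce on a toy problem (the synthetic data and the choice of grid points below are our own, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(0.0, 1.0, size=(800, 5))
y = np.sin(4.0 * X[:, 0]) + 0.1 * rng.normal(size=800)
X_tr, y_tr = X[:600], y[:600]
X_te, y_te = X[600:], y[600:]

# Scan the forest size and record the held-out MSE for each setting;
# the same loop can scan min_samples_leaf, max_depth, etc.
mse = {}
for n_trees in (5, 50, 200):
    rf = RandomForestRegressor(n_estimators=n_trees,
                               random_state=0).fit(X_tr, y_tr)
    mse[n_trees] = mean_squared_error(y_te, rf.predict(X_te))
```

Plotting `mse` against the scanned values reproduces the qualitative behaviour of Figure 6(a): steep improvement for small forests, then a plateau where extra trees only cost time.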

RESULTS
The results in Figure 7 demonstrate that the outlier fraction and $\sigma_{\rm NMAD}$ for the testing data are 2.045% and 0.025, respectively, similar to the values of 1.823% and 0.021 for the training data. This indicates that the model does not suffer from overfitting. Moreover, our method outperforms EAZY in the specific configuration described above, which yields an outlier fraction of 6.45% and a $\sigma_{\rm NMAD}$ of 0.043. Although some studies, such as Euclid Collaboration: Desprez et al. (2020), suggest that certain template-fitting methods are superior to the RF method, for our CSST photometric simulated sample the RF method outperforms EAZY in terms of both the outlier fraction and $\sigma_{\rm NMAD}$. A comparison between the RF method and other template-fitting methods will be conducted in further research.
In Figure 8, the top panel shows the outlier fraction in different redshift bins, while the bottom panel displays the box plot of $\Delta z/(1+z_{\rm true})$ in the same redshift bins. From the top panel, we observe that, in general, the outlier fraction increases with redshift. It is smallest in the range $0.5 < z_{\rm true} < 1.5$, mainly because of the larger number of training samples in this range, as shown in Figure 2. In the higher redshift range $3.5 < z_{\rm true} < 4$, there is an extreme increase in the outlier fraction, which could be mainly caused by fewer training samples and larger observed flux errors. Note that there is a peak at $1.5 < z_{\rm true} < 2$, which is due to the observable wavelength limits of the filter bands, as shown in Figure 1. Between $1.5 < z_{\rm true} < 2$, the Balmer break shifts out of the observation bands while the Lyman break has only just entered them. As we can see, even without any physical model assumptions, the result still exhibits a degeneracy problem similar to that of the template-fitting method (Brammer et al. 2008). The box plot of $\Delta z/(1+z_{\rm true})$ below indicates that the mean (triangle) and median (middle line in the box) prediction values decrease with redshift. This trend can be explained by the algorithm's inability to predict values beyond the range of the training sample: it is therefore natural for the algorithm to overestimate at low redshifts and underestimate at high redshifts. The results presented above were obtained using a training-testing ratio of 1:1. To test whether increasing this ratio would improve the photo-$z$ estimation, we tried different training-testing ratios, as shown in Figure 9. Applying the same process, we obtained the outlier fraction and $\sigma_{\rm NMAD}$ for the testing data. We observed that increasing the training data can lead to better predictions, although the improvement is very small. After the ratio exceeds 2:1, the improvement reaches a limit. This suggests that the amount of training data is sufficient, and further increasing the ratio is not meaningful. Moreover, the RF algorithm can still perform well even with a small amount of training data, such as a ratio of $\sim 0.25:1$.

Error distribution
To examine the distribution of the difference between the predicted and true values in our model, we calculate the standardized error distribution $\Delta z/\sigma_z$, where $\sigma_z$ is the standard deviation of the individual tree prediction errors for each source; it is presented as circles in the upper panel of Figure 10. An unbiased model should yield a Gaussian distribution with zero mean and unit variance, represented by the curve in the plot. In general, the curve and histogram are well matched, although there is a slight concentration near zero. Upon calculating the skew and kurtosis of the distribution, we found them to be 0.01 and 1.65, respectively, indicating that the data are almost symmetrical and unbiased, closely approximating a mesokurtic distribution.
We then compute the percentage of standardized true errors that fall within the level-$\alpha$ critical values for a given $\alpha$, following Carliles et al. (2010). We plot circles for $\alpha$ values of 0.32, 0.1, 0.05, and 0.01, and the line represents the expected percentage. For instance, if $\alpha = 0.32$, the area under the lower or upper tail of a standard Gaussian distribution is $\alpha/2 = 0.16$, and we expect the percentage between the two tails to be 68%. We find that approximately 70.2% of the error distribution falls within these critical values. While the fit is not perfect, the results are close to the expected values within acceptable bounds.
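The coverage check amounts to comparing an empirical fraction against $1-\alpha$; a sketch (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def coverage_within_alpha(std_errors, alpha):
    """Fraction of standardised errors inside the level-alpha critical
    values of a standard Gaussian; the expectation is 1 - alpha
    (e.g. 68% for alpha = 0.32)."""
    z_crit = norm.ppf(1.0 - alpha / 2.0)   # e.g. ~0.994 for alpha = 0.32
    return float(np.mean(np.abs(std_errors) < z_crit))
```

Applied to truly Gaussian standardised errors, the function returns values close to $1-\alpha$; deviations such as the 70.2% found here signal a mildly non-Gaussian error distribution.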

Probability distribution function
A classical PDF obtained by EAZY (Brammer et al. 2008) is calculated from
$$p(z|C, m_0) \propto L(z|C)\, p(z|m_0),$$
where $L(z|C) \propto \exp\left(-\chi^2_z/2\right)$ is the likelihood computed from the template fits over the given redshift grid $z$ and the observed colors $C$, with $\chi^2_z$ defined in Equation 7, and $p(z|m_0)$ is the prior probability that a specific galaxy with magnitude $m_0$ lies at redshift $z$. In this work we construct the EAZY PDF using the prior calculated from the $r$ band. The prior probability adopted by EAZY has the form
$$p(z|m_0) \propto z^{\gamma_t} \exp\!\left[-\left(\frac{z}{z_{0,t}(m_0)}\right)^{\gamma_t}\right],$$
where $\gamma_t$ and $z_{0,t}$ are the fitted parameters for the redshift distributions in each magnitude bin $m_0$. Figure 11 displays some examples. Here we notice that, even when the PDF is concentrated, it still has some probability of predicting inaccurately.
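The likelihood-times-prior construction can be sketched on a redshift grid (the function names and the toy $\chi^2$ curve are our own, and the prior is grid-normalised rather than analytically normalised):

```python
import numpy as np

def eazy_prior(z, gamma, z0):
    """Magnitude prior of the form p(z|m0) ~ z^gamma * exp(-(z/z0)^gamma),
    with (gamma, z0) fitted per apparent-magnitude bin; grid-normalised."""
    p = z**gamma * np.exp(-((z / z0) ** gamma))
    return p / np.sum(p)

def posterior(z_grid, chi2, gamma, z0):
    """p(z|C, m0) ~ L(z|C) * p(z|m0), with L ~ exp(-chi2/2)."""
    like = np.exp(-0.5 * (chi2 - chi2.min()))   # shift chi2 for stability
    post = like * eazy_prior(z_grid, gamma, z0)
    return post / np.sum(post)
```

A sharply peaked likelihood dominates the product, while for flat or multimodal likelihoods the prior tilts the posterior toward the redshifts typical of that magnitude bin.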
It is important to clarify that this comparison does not imply the existence of a single superior model. In reality, there is a multitude of methods for estimating photo-$z$ PDFs, yielding divergent outcomes, and a consensus on a preferred approach has yet to be established. Recent efforts have been made to evaluate the performance of various PDF estimation methods (Schmidt et al. 2020).
Once the PDF is established, a new index, $z_{\rm Conf}$, can be designed to represent the confidence level of the predicted redshift (Carrasco Kind & Brunner 2013). $z_{\rm Conf}$ is defined as the integral of the PDF within the range $z_{\rm mean} \pm \sigma$, where $z_{\rm mean}$ is the mean value of the PDF. When $z_{\rm mean} < 1$, $\sigma = 0.3$, and when $z_{\rm mean} \geq 1$, $\sigma = 0.1 \times (1 + z_{\rm mean})$. Figure 11 shows the $z_{\rm Conf}$ values for each PDF, where the grey area represents the value of $z_{\rm Conf}$.
In these examples, the $z_{\rm Conf}$ around the mean value of each PDF is measured. Typically, $z_{\rm Conf}$ is high for galaxies with unimodal or concentrated PDFs, whose peaks are more likely to lie near the real redshift, while $z_{\rm Conf}$ is low for galaxies with bimodal or dispersed PDFs, indicating that $z_{\rm Conf}$ can provide a reasonable confidence level for the redshift estimate. It should be noted that a low $z_{\rm Conf}$ does not necessarily mean a poor prediction; sometimes, the peak value of such a PDF still corresponds to the true redshift. Therefore, the index $z_{\rm Conf}$ only characterizes the shape of the PDF rather than the accuracy. Nevertheless, photo-$z$ samples can be further refined by selecting only PDFs with $z_{\rm Conf}$ above a certain threshold. Table 1 shows the proportion of galaxies retained after different $z_{\rm Conf}$ cuts in RF and EAZY for comparison. The data depletion is smoother for the RF method, indicating that the RF algorithm generates a greater number of unimodal PDFs. Furthermore, the RF algorithm demonstrates superior accuracy under all cutoff conditions. Nonetheless, a small number of galaxies with high $z_{\rm Conf}$ may still be assigned a wrong redshift, and, similarly, a small number of sources with low $z_{\rm Conf}$ may be assigned the correct redshift.
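The confidence index reduces to integrating the normalised PDF over a window around its mean; a sketch, assuming a uniform redshift grid and the window widths quoted above (the function name is ours):

```python
import numpy as np

def z_conf(z_grid, pdf):
    """Integral of the normalised PDF within z_mean +/- sigma, in the
    spirit of Carrasco Kind & Brunner (2013); sigma = 0.3 for
    z_mean < 1, else 0.1 * (1 + z_mean), following the text."""
    dz = z_grid[1] - z_grid[0]          # uniform grid assumed
    pdf = pdf / (pdf.sum() * dz)        # normalise to unit area
    z_mean = np.sum(z_grid * pdf) * dz
    sigma = 0.3 if z_mean < 1.0 else 0.1 * (1.0 + z_mean)
    mask = np.abs(z_grid - z_mean) <= sigma
    return float(pdf[mask].sum() * dz)
```

A narrow unimodal PDF puts nearly all of its area inside the window, giving $z_{\rm Conf}$ close to 1, while a broad or bimodal PDF leaks area outside it.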
In addition to the confidence index, an error bar is attached to each predicted redshift. Based on the PDF of each galaxy, the redshift positions of the 16th and 84th percentiles of the cumulative probability distribution are taken as the lower and upper limits of the error bar, as shown in Figure 7. As with the confidence index, most outliers have large error bars. However, there is a small number of galaxies for which the error bar is very small but the redshift is still incorrectly estimated; this situation may be due to insufficient input information.
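A minimal sketch of this percentile-based error bar, assuming a PDF tabulated on a uniform redshift grid (the toy Gaussian stands in for a real RF output):

```python
import numpy as np

def pdf_percentile(z_grid, pdf, q):
    """Redshift at the q-th percentile of the cumulative PDF (linear interpolation)."""
    dz = z_grid[1] - z_grid[0]
    cdf = np.cumsum(pdf) * dz
    cdf /= cdf[-1]                                # force the CDF to end at exactly 1
    return np.interp(q / 100.0, cdf, z_grid)

z = np.linspace(0.0, 5.0, 2001)
pdf = np.exp(-0.5 * ((z - 1.5) / 0.1) ** 2)       # toy unimodal PDF
z_lo = pdf_percentile(z, pdf, 16)                 # lower limit of the error bar
z_hi = pdf_percentile(z, pdf, 84)                 # upper limit of the error bar
```

For this Gaussian example the limits land roughly one standard deviation on either side of the peak, as expected for the 16th/84th percentiles.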

Feature importance
In addition to the PDF, feature importance can provide helpful information. It helps us understand the training data better, determine the most effective feature combination, and test whether it is possible to reduce the number of input features. Different methods for assessing feature importance can yield varying results because of differing measurement approaches, reliance on features tailored to a specific prediction, and the impact of modeling techniques (Saarela & Jauhiainen 2021). Disparities may also arise from how well each method captures mutual influences between features. Furthermore, using different data subsets or adjusting hyperparameters can introduce fluctuations in the importance values. In the following analysis, we therefore test several methods of calculating feature importance.

Model-dependent feature importance
RF-based feature importance is a component of the output generated by RF models. These importances are calculated from the mean and standard deviation of the accumulated impurity decrease (as described in Equation A2) within each tree, commonly referred to as Gini importance. Gini importance measures the impact that each feature has on the predictive performance of the RF model. As shown in the upper panel of Figure 12, the feature importances differ before and after adding sample weights to the model. This is expected because the weight parameter affects the random selection probability of features when building a regression tree. The three most important features of the weighted model are two colors together with the r-band flux. These features are important because, together, they effectively constrain the shape of the SED for the galaxies in our sample. The importance of the r band is also reflected in the EAZY code, which typically uses r-band data to construct the prior function. From the upper panel of Figure 12 we can also see that input features related to the y band, such as the y-band flux and the colors involving it, do not exhibit significant importance in predicting photometric redshifts, primarily because the wavelength range of the y band is completely encompassed by the z band.
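For reference, Gini (impurity-based) importances are read directly from a fitted forest. The sketch below uses synthetic stand-ins for the flux/color features rather than the actual CSST catalog, and assumes scikit-learn's RandomForestRegressor, whose feature_importances_ attribute implements this impurity-decrease measure; the sample_weight argument is how the weighting enters:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 4))                    # toy stand-ins for flux/color features
z_true = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.normal(size=n)  # feature 0 dominates
w = 1.0 / (0.1 + rng.random(n))                # illustrative inverse-variance-like weights

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X, z_true, sample_weight=w)             # weights enter the impurity computation

importances = rf.feature_importances_          # Gini importances, normalized to sum to 1
```

Because feature 0 carries most of the signal by construction, it should receive the largest importance.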
The input feature importance was further investigated by dividing the training samples into multiple bins based on redshift and calculating the importance of three representative colors in each bin. As shown in the lower panel of Figure 12, in the redshift range z ∈ (0, 1] the most important color is the one that straddles the Balmer (4000 Å) break. In the z ∈ (2, 3] range, a bluer color becomes the most important feature, as the Balmer break has shifted out of the observed wavelength range and the Lyman break has entered the NUV or u band. Similarly, in the z ∈ (3, 4] range the most important color is the one tracing the Lyman break, for the same reason. Similar findings have been reported in previous studies, indicating that the importance of different colors varies with redshift (Euclid Collaboration: Humphrey et al. 2023; Mucesh et al. 2021; Li et al. 2023; Zhou et al. 2019).
Adding more input features does not necessarily lead to higher prediction accuracy. In fact, adding too many features can cause overfitting and degrade the model's generalization ability, so it is crucial to select the most relevant features and to process and filter them effectively. To test this, we deleted the two least important and the most important features, respectively, and evaluated the performance of our model in each case. The results, summarized in Table 2, show that deleting the two least important features improved accuracy and sped up the running time of the code: with fewer candidate features at each node split and less data held in memory when building the regression trees, cache access becomes faster. In contrast, deleting the most important feature always had a significant impact on the results, since this feature carries the most information needed to generate an accurate photo-z.

Model-independent feature importance

One such method is permutation feature importance, depicted in Figure 13(a), where we computed the importance values for the input features in our training set. Permutation feature importance measures the reduction in a model's performance score when the values of a single feature are randomly shuffled. The importance value $i_j$ for feature $j$ is defined as

$i_j = s - \frac{1}{K}\sum_{k=1}^{K} s_{k,j}$,

where $s$ is the reference model score and $s_{k,j}$ denotes the score for repetition $k$ out of a total of $K$ shuffles. By disrupting the relationship between the feature and the target variable, any decrease in the model score indicates how strongly the model depends on that feature. When features are highly correlated, permuting one of them has limited impact on the model's performance, since the model can obtain similar information from a correlated feature.
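The permutation procedure is straightforward to implement by hand. The sketch below computes i_j = s − (1/K) Σ_k s_{k,j} with the R² score as s, using toy data and a scikit-learn regressor (only the first feature carries signal by construction):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def permutation_importance_manual(model, X, y, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    s = model.score(X, y)                      # reference score s (here R^2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # break the feature-target relationship
            scores.append(model.score(Xp, y))  # s_{k,j}
        imp[j] = s - np.mean(scores)           # i_j = s - (1/K) sum_k s_{k,j}
    return imp

rng = np.random.default_rng(1)
X = rng.normal(size=(1500, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=1500)   # only feature 0 matters
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
imp = permutation_importance_manual(rf, X, y)
```

scikit-learn ships the same procedure as sklearn.inspection.permutation_importance.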
Based on the analysis in Figure 13(a), the top four features identified through permutation feature importance align closely with the Gini importances calculated in the RF model without sample weighting. We can also observe that two of the band fluxes are individually among the least important features, yet the color formed from their combination carries greater importance than either of them alone.
Furthermore, we calculate Spearman's rank correlation coefficient (Zar 2005) between all features, as illustrated in Figure 13(b). We then transform the correlation coefficient matrix into a distance matrix by computing 1 − |coefficient|; this transformation reduces the distance between features with higher correlation, which suits the subsequent hierarchical clustering. We employ Ward's method (Ward 1963) for clustering, and the clustering results are presented in Figure 13(c).
Finally, we tested our model by selecting specific clustering thresholds and retaining only one feature from each cluster. The performance of the model with these selected features is summarized in Table 3, which provides insight into how the chosen features affect the model's performance.
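The correlation-clustering-and-selection pipeline can be sketched end to end with SciPy. The features below are toy stand-ins (two nearly duplicated, one independent), and the 0.5 cut on the dendrogram is an illustrative threshold, not the value used in this work:

```python
import numpy as np
from scipy.stats import spearmanr
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
n = 1000
f1 = rng.normal(size=n)
f2 = f1 + 0.01 * rng.normal(size=n)     # nearly a duplicate of f1
f3 = rng.normal(size=n)                 # independent feature
X = np.column_stack([f1, f2, f3])

corr = spearmanr(X)[0]                  # Spearman rank-correlation matrix
dist = 1.0 - np.abs(corr)               # highly correlated features -> small distance
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="ward")  # Ward hierarchical clustering
labels = fcluster(Z, t=0.5, criterion="distance")           # cut the dendrogram

# Retain one representative feature per cluster
keep = sorted(np.flatnonzero(labels == c)[0] for c in np.unique(labels))
```

Here the two near-duplicate features fall into one cluster, so only one of them survives the selection.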

SUMMARY
In this work, we have used a weighted RF algorithm to predict the photo-z of the CSST photometric survey. To simulate CSST photometric observations, we use HST-ACS observations and the COSMOS catalog while taking the CSST instrumental effects into account. After obtaining the mock galaxy images in seven bands, fluxes and observational errors are measured using aperture photometry. We select sources with a signal-to-noise ratio greater than 10 in the g or i band as the training and testing sample for photo-z prediction. Our model takes the fluxes and colors of galaxies as input features and incorporates the inverse-variance weight of the samples. During training, we also use Monte Carlo simulation to expand the training set, using the input feature errors as standard deviations. The findings indicate that the weighted RF model can achieve a photo-z accuracy of σ_NMAD = 0.025 and an outlier fraction of η = 2.045%, considerably better than the values of σ_NMAD = 0.043 and η = 6.45% obtained by the EAZY code, which uses the template-fitting approach. Although our results do not surpass the prediction accuracy of Zhou et al. (2022b) based on a CNN, our model only requires the flux data of galaxies as input (colors can be derived from the fluxes) and does not need to ingest massive galaxy images as a CNN does, making the model prediction efficient. Additionally, the RF model naturally provides the probability density distribution of the predicted redshift and outputs the importance of each input feature. Our model therefore offers a practical methodological choice for estimating photo-z from CSST data.
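A minimal sketch of the two training-set ingredients summarized above, namely the inverse-variance sample weights and the Monte Carlo expansion that perturbs each flux by a Gaussian with its measurement error. The catalog size, error model, and number of mock realizations below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy catalog: 500 galaxies with fluxes and errors in 7 bands (illustrative values).
n_gal, n_band = 500, 7
flux = np.abs(rng.normal(100.0, 20.0, size=(n_gal, n_band)))
flux_err = 0.05 * flux

def augment_catalog(flux, flux_err, n_mock=3, rng=rng):
    """Expand the training set: each mock draws every flux from N(flux, flux_err)."""
    mocks = [rng.normal(flux, flux_err) for _ in range(n_mock)]
    return np.vstack([flux] + mocks)

def inverse_variance_weight(flux_err):
    """One weight per galaxy: inverse of its summed flux variance, normalized."""
    w = 1.0 / np.sum(flux_err ** 2, axis=1)
    return w / w.sum()

X_aug = augment_catalog(flux, flux_err)       # original catalog plus 3 mock copies
w = inverse_variance_weight(flux_err)         # passed to the RF as sample_weight
```

The augmented catalog is four times the original size (the original rows plus three perturbed realizations), and the weights favour galaxies with smaller photometric errors.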
Our research has found that the weighted RF model can accurately derive the photo-z of a sample and can also establish confidence indices and error bars for each predicted value based on the derived redshift PDF. Selecting high-confidence sources can further reduce the proportion of outliers and improve prediction accuracy. Additionally, the model can determine the importance of each input feature in different redshift ranges. The results suggest that the most critical input feature typically reflects the approximate position of the spectral break in galaxies, indicating the model's ability to extract physical features. Moreover, our research also indicates that increasing the number of input features does not necessarily result in higher prediction accuracy; only by optimizing the selection of input features can the prediction accuracy of the model be improved.
Further research has revealed that, despite the impressive results of the model, we are still unable to accurately measure the photo-zs of certain galaxies, regardless of how we adjust the model parameters or increase the number of training cycles. These galaxies are primarily concentrated in the low redshift range z ∼ 0−0.4 and the high redshift range z ∼ 2−3. We believe there are three reasons for this: firstly, there are relatively few training samples in these two redshift ranges, as can be seen from Figure 2; secondly, the photometric accuracy of high-redshift samples is relatively low due to their lower flux, which often results in larger observational errors; thirdly, our model is unable to accurately distinguish between the Balmer break and Lyman break features in a galaxy's SED within the wavelength range observed by CSST, leading to overestimated photo-zs for some low-redshift galaxies and underestimated photo-zs for some high-redshift galaxies. Similar conclusions have been reached in several previous studies (Euclid Collaboration: Humphrey et al. 2023; Mucesh et al. 2021; Li et al. 2023; Zhou et al. 2019). Thus, the performance of our model depends not only on the choice of model parameters and training cycles, but also on the quality of the training samples and the wavelength range covered by the observations.

We would also like to extend our sincere gratitude to the anonymous reviewers for their meticulous review and valuable feedback on this manuscript. Their thoughtful comments and suggestions have greatly enriched the quality and depth of this work. We are truly appreciative of the time and expertise they dedicated to this review process.

Figure 1 .
Figure 1. The solid curves represent the real transmissions of the seven CSST photometric bands; the effect of detector quantum efficiency is included. Details of the transmission parameters can be found in Cao et al. (2018), and l_pix is the pixel size in arcseconds. The sky background count rates are evaluated from the measurements of the earthshine and zodiacal light for the 'average' sky background case given in Ubeda (2011), and are found to be 0.0023, 0.0018, 0.142, 0.187, 0.187, 0.118 and 0.035 e− s−1 pix−1 for the NUV, u, g, r, i, z and y bands, respectively. The theoretical background noises σ_bkg,th are then calculated as 10.65, 7.84, 9.93, 10.59, 10.59, 9.56 and 11.53 e− for these bands. The additional noise σ_add is obtained by subtracting the rescaled image noise σ_img, as shown in Equation 3, and is added to the pixels of the stamp images by sampling from a Gaussian distribution with mean 0 and standard deviation σ_add in each band. After obtaining the final mock images, the aperture photometry method is used to measure the flux. First, several bands are stacked to create detection images for high-SNR sources. Then we measure the Kron radius (Kron 1980) along the galaxy major and minor axes to find an elliptical aperture with size 1 × R_Kron. Finally, the flux and its error are obtained from the electron number and its error measured within the aperture, from which the SNR in each band is calculated.

Figure 2 .
Figure 2. Galaxy redshift distribution of the mock sample from the COSMOS catalog. The sources have been selected with SNR greater than 10 in the g or i band. The distribution ranges from 0 to ∼5, with a peak located around 0.6−1.

Figure 3 .
Figure 3. The fluxes of galaxy samples measured by the aperture photometry method in seven bands. The SEDs of the corresponding galaxies are shown as solid curves for comparison.

Figure 4 .
Figure 4. A simplified workflow of the whole algorithm. After preprocessing the sample with true redshifts, we obtain the input features and their errors. Monte Carlo simulation is then used to expand the catalog. By combining the PDFs from the trees trained on the original catalog and from the forests trained on the mock catalogs, we obtain the final PDF, from which the final photo-z and error bar are calculated.

Figure 5 .
Figure 5. The normalized sample weight of each source (grey points) at different redshifts. Most of the sources share similar weights. The density contour is shown with black lines. The distributions of redshifts and sample weights are plotted in the top and right panels, respectively.

Figure 7 .
Figure 7. Error bars of RF predictions for the testing data. The 16th and 84th percentiles of the cumulative PDF are taken as the lower and upper limits of the error bars. Each point is the predicted value, which corresponds to a peak of the PDF.

Figure 8 .
Figure 8. Upper panels show the outlier fractions in different z_true bins; bottom panels show box plots of Δz in the same bins. The triangle marks the mean value of all data in the current bin. The box represents the middle 50% of the data, with the bottom edge indicating the 25th percentile (Q1) and the top edge the 75th percentile (Q3). The line inside the box represents the median (50th percentile). The whiskers extend from the box by 1.5× the inter-quartile range (IQR).

Figure 9 .
Figure 9. A comparison of different training-set sizes in RF. The solid line is the outlier fraction. The numbers of training and testing data are shown by the grey and shaded bars, respectively.

Figure 10 .
Figure 10. The upper panel shows a standard normal distribution (curve) and the distribution of the standardized error (z_phot − z_true)/σ_z for the test data (circles). The lower panel shows the percentage of errors (circles) falling within the level-α critical values; the straight line represents the percentage expected within level-α. The horizontal axis is 1 − α (Carliles et al. 2010).

Figure 11 .
Figure 11. PDF examples obtained by RF (first row) and EAZY (second row). The sub-figures in each column correspond to the same source; the red solid line is the true redshift, the dashed line is the predicted value, and the dot-dashed line is the mean value. The grey area under the RF PDF is the confidence index: the larger it is, the more concentrated the PDF is around z_mean.

Figure 12 .
Figure 12. Feature importance of the training data. The upper panel shows the importance over the whole data set; the lower panel shows how the importance of three features changes with redshift.

Figure 13 .
Figure 13. (a) Permutation importance of the training set with sample weights. (b) Spearman's rank correlation coefficients of the input features. (c) Hierarchical clustering dendrogram of the input features using Ward's method; the distance is calculated as 1 − |coefficient|. The y-axis represents the similarity between features or clusters, with smaller values indicating higher correlation.
employed EAZY to derive photo-zs for the Hawaii-Hubble Deep Field-North (H-HDF-N) survey catalog, and Chen et al. (2018) estimated photo-zs for the X-ray point-source catalogue within the XMM-Large Scale Structure (XMM-LSS) survey region using EAZY. Both studies demonstrated strong performance of EAZY. Additionally, Euclid Collaboration: Desprez et al. (

Table 1 .
Different confidence-index cuts applied to the testing data for RF and EAZY.

Table 2 .
A comparison of the accuracy of photo-z prediction using different feature combinations.

Table 3 .
The accuracy of our model's photometric redshift prediction when selecting specific thresholds and retaining only one feature from each cluster.