ABSTRACT

The increasing size and complexity of the data provided by both ongoing and planned galaxy surveys greatly contribute to our understanding of galaxy evolution. Deep learning methods are particularly well suited to handling such complex and massive data. We train a convolutional neural network (CNN) to simultaneously predict the stellar populations of galaxies, namely age, metallicity, colour excess E(B − V), and central velocity dispersion (VD), using spectra with redshift ≤ 0.3 from the Sloan Digital Sky Survey. To our knowledge, this is the first time these four galaxy properties have been derived from spectra with deep learning. The testing results show that our CNN predictions of galaxy properties are in good agreement with the values from the traditional stellar population synthesis method, with small scatter (0.11 dex for age and metallicity, 0.018 mag for E(B − V), and 31 km s−1 for VD). In terms of computational time, our method is more than 10 times faster than the traditional method. We further evaluate the performance of our CNN prediction model using spectra with different signal-to-noise ratios (S/Ns), redshifts, and spectral classes. We find that our model generally performs well, although the errors vary slightly with S/N, redshift, and spectral class. Our well-trained CNN model and related code are publicly available at https://github.com/sddzwll/CNNforStellarp.

1 INTRODUCTION

A stellar population is a collection of a large number of stars that share similarities in age, metallicity, and kinematics. The study of stellar populations in galaxies provides insights into the formation and evolution of galaxies. A galaxy's spectrum, which records information about the age and metallicity distribution of its stars as well as the stellar kinematics and dust attenuation, is often used to derive its stellar populations.

The typical method for inferring stellar populations from spectra is stellar population synthesis. Its basic idea is to fit the observed spectrum with template spectra (Bruzual & Charlot 2003; Maraston 2005; Vazdekis et al. 2010; Maraston & Strömbäck 2011; Vazdekis et al. 2016) that combine libraries of evolutionary tracks, stellar spectra, initial mass functions, and star formation and chemical histories. Over the past two decades, full spectrum fitting methods have been widely used to extract the stellar populations and kinematics of galaxies, e.g. PPXF (Cappellari & Emsellem 2004; Cappellari 2017, 2023), STARLIGHT (Cid Fernandes et al. 2005), STECKMAP (Ocvirk et al. 2006), VESPA (Tojeiro et al. 2007), FIREFLY (Wilkinson et al. 2015, 2017), BEAGLE (Chevallard & Charlot 2016), and FADO (Gomes & Papaderos 2017). These methods use the full spectral information during the χ2 minimization between an observed spectrum and the template spectra. The outputs are a best-fitting combination of templates and the corresponding set of template weights. The finer the parameter grids of the templates, the more accurate the fitting results. However, full spectrum fitting is computationally intensive and time-consuming (Ocvirk et al. 2006; Wang et al. 2022), and the computational cost increases as the parameter space and the number of templates grow. This can limit the applicability of full spectrum fitting to large data sets or to cases where real-time analysis is required.

Deep learning is a technology with a strong ability to learn features, particularly in computer vision and speech recognition. Deep learning algorithms, such as the convolutional neural network (CNN), are designed to handle complex and large-scale problems by utilizing multiple layers of interconnected nodes. The network adopts weight sharing to reduce the number of weights, making it easier to optimize and to learn intricate patterns and relationships from vast amounts of data. Deep learning is widely used in astronomy, providing new approaches to problems in the era of big data (Tao et al. 2020; Huertas-Company & Lanusse 2023; Smith & Geach 2023). Among the most typical applications are source classification and detection, such as morphological classification (Dieleman, Willett & Dambre 2015; Huertas-Company et al. 2015; Aniyan & Thorat 2017; Vega-Ferrero et al. 2021) and gravitational wave detection (George & Huerta 2018; Wang et al. 2020). There are also regression-type applications of deep learning that estimate galaxy properties such as photometric redshifts (Hoyle 2016; D'Isanto & Polsterer 2018; Hong et al. 2023). Deep learning studies of stellar populations or star formation histories mostly employ photometric quantities (images, magnitudes, colours, etc.) to derive properties such as stellar mass, stellar age, stellar or gas metallicity, and star formation rate (Surana et al. 2020; Buck & Wolf 2021; Liew-Cain et al. 2021; Euclid Collaboration 2023). However, research on deriving stellar populations from spectra with deep learning is scarce. Moreover, deep learning estimates of dust attenuation and central velocity dispersion (VD) from spectra remain rare. The method commonly used to derive these properties from spectra is stellar population synthesis, which, as mentioned above, has a high time complexity.

In this paper, we apply a convolutional neural network to predict the stellar populations of galaxies, including age, metallicity, colour excess E(B − V), and central VD, where E(B − V) quantifies the amount of dust attenuation. These four galaxy properties are predicted simultaneously by our trained CNN model. The data we use are spectra from the Seventh Data Release (DR7) of the Sloan Digital Sky Survey (SDSS). In the deep learning process, we use the full spectral information of the galaxy spectra to retain as much information as possible. Note that the input spectra are observed spectra that are not shifted to the rest wavelength frame. Our method therefore does not require prior knowledge of redshift, whereas stellar population synthesis methods need the redshift in advance in order to shift the spectrum from the observed frame to the rest frame.

The structure of this paper is as follows. In Section 2, we introduce the data selection, data labelling, and preprocessing. Section 3 presents our deep learning algorithm in detail, including the CNN architecture, the setting of hyperparameters, and the training phase. In Section 4, we test the accuracy and execution time of our CNN prediction model, and further evaluate its performance on data at different signal-to-noise ratios (S/Ns), redshifts, and spectral classes. A summary is given in Section 5.

2 DATA

Deep learning for regression analysis is a supervised method that determines the quantitative relationship between features and targets. In order to apply deep learning algorithms to the stellar populations of SDSS galaxy spectra, we need to construct a regression model by training on a large number of labelled galaxy spectra. In this section, we describe the data selection and how the labels of our data sample are obtained.

2.1 Data selection

We use galaxy spectra from SDSS DR7 as our training and testing data. Each SDSS spectrum covers 3800–9200 Å at a spectral resolution of R ∼ 2000. There are about 930 000 galaxy spectra in SDSS DR7. We restrict the data sample according to the following criteria:

  • 0.002 ≤ z ≤ 0.3 and zWarning = 0. The redshift cut ensures that the main spectral lines useful for estimating stellar populations, such as Hβ, Mg, Na, and Hα, lie within the observed wavelength range. zWarning = 0 indicates a reliable spectroscopic redshift.

  • S/N in the r band (hereafter S/Nr) ≥ 5. This criterion removes very low-quality data.

The resulting sample contains about 800 000 spectra, which we call the main sample. To ease the computational load of training, we randomly select 100 000 spectra from the main sample as our learning sample. The distributions of redshift, S/Nr, and Petrosian r-band magnitude (petroMag_r) of the main sample and our learning sample are shown in Fig. 1. The distributions of our learning sample agree with those of the main sample. Most values of redshift, S/Nr, and petroMag_r in the two samples lie in [0.05, 0.2], [10, 30], and [16, 18], respectively.
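
For concreteness, the cuts and the random subsampling can be expressed as in the following sketch with pandas. The catalogue column names (z, zWarning, snr_r) are illustrative placeholders rather than the actual SDSS CasJobs column names.

```python
import pandas as pd

cat = pd.read_csv("sdss_dr7_galaxies.csv")     # hypothetical catalogue dump

mask = (cat["z"].between(0.002, 0.3)           # keeps Hbeta--Halpha in range
        & (cat["zWarning"] == 0)               # reliable spectroscopic redshift
        & (cat["snr_r"] >= 5))                 # drop very low-quality spectra
main_sample = cat[mask]                        # ~800 000 spectra

# Random subsample of 100 000 spectra used for deep learning
learning_sample = main_sample.sample(n=100_000, random_state=0)
```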

Figure 1. Distributions of redshift, S/Nr, and petroMag_r for the main sample and our learning sample. The spectra of the learning sample are selected randomly from the main sample, and the distributions of the two samples are essentially the same.

2.2 Labelling data

We use the classic full spectrum fitting method PPXF (Cappellari & Emsellem 2004; Cappellari 2017, 2023) to obtain the age, metallicity, E(B − V), and VD that serve as the true values (labels) of our sample.

PPXF derives the kinematics and stellar population of a galaxy by fitting its observed spectrum with a linear combination of simple stellar populations (SSPs). The main settings of the PPXF fitting are as follows:

  • Templates. We choose 36 SSPs from Vazdekis et al. (2010), spanning nine ages (0.06, 0.12, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 15 Gyr) and four metallicities (−1.71, −0.71, 0, 0.22 in log(Z/Z⊙)). These templates are re-binned to the same velocity scale as the SDSS galaxy spectra and convolved with a Gaussian corresponding to the quadratic difference between the instrumental resolutions of SDSS and the templates.

  • Continuum processing. There are two ways to treat the continuum during the fit. One is to fit the continuum with a multiplicative polynomial, which makes the fit insensitive to dust attenuation. The other is to preserve the continuum shape so that the dust attenuation can be estimated from it. In this work, we adopt the second way: we use the continuum shape, rather than a multiplicative polynomial, to derive the dust attenuation of a spectrum, adopting the attenuation curve of Calzetti et al. (2000). The PPXF output that quantifies the amount of dust attenuation is the colour excess E(B − V). Note that the E(B − V) obtained from PPXF measures the total dust attenuation, because we do not correct the input spectra for Galactic extinction. Therefore, using this E(B − V) as a label, we predict the total amount of dust attenuation.

  • Regularization. Regularization yields a smoother distribution of the template weights. In this work, we focus on the weighted mean age and metallicity of a galaxy rather than on smooth weights, so we do not apply regularization, i.e. we set the keyword REGUL = 0 in PPXF.

Using PPXF with the above settings, the SDSS spectra in our sample are fitted with the templates. We thus obtain the light-weighted average age and metallicity, E(B − V), and VD, which are taken as the stellar population labels of the galaxies in our sample for deep learning.
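
As an illustration, a minimal sketch of this labelling step with the ppxf Python package is given below. The variable names (flux, lam_range, templates, noise, ssp_ages, ssp_metals) are placeholders for quantities prepared as described above, and the keyword choices reflect the settings listed here rather than the authors' exact script; consult the pPXF documentation before relying on them.

```python
import numpy as np
from ppxf.ppxf import ppxf
from ppxf import ppxf_util as util

# flux: one rest-frame spectrum; lam_range = [lam_min, lam_max] in Angstrom;
# templates: (n_pix_temp, 36) log-rebinned SSPs; noise: error spectrum on the
# same log grid; ssp_ages (Gyr) and ssp_metals ([M/H]) flatten the 9 x 4 grid.
flux_log, ln_lam, velscale = util.log_rebin(lam_range, flux)
start = [0.0, 150.0]                  # initial guess for (V, sigma) in km/s

pp = ppxf(templates, flux_log, noise, velscale, start,
          moments=2,                  # fit velocity and velocity dispersion
          degree=-1, mdegree=-1,      # no additive/multiplicative polynomials,
                                      # so the continuum shape is preserved
          lam=np.exp(ln_lam),         # wavelengths, needed for the reddening fit
          reddening=0.1,              # fit E(B-V) with the Calzetti (2000) curve
          regul=0)                    # no regularization (REGUL = 0)

w = pp.weights / pp.weights.sum()                 # light weights of the 36 SSPs
log_age = np.sum(w * np.log10(ssp_ages * 1e9))    # light-weighted log age (yr)
mh = np.sum(w * ssp_metals)                       # light-weighted [M/H]
ebv, vd = pp.reddening, pp.sol[1]                 # E(B-V) label and VD label
```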

Note that we remove spectra whose VD is lower than 50 km s−1, i.e. about 2/3 of the SDSS instrumental dispersion (70 km s−1); 6 per cent of the sample is filtered out in this way. To keep the total sample size unchanged, we supplement the sample with additional spectra from SDSS DR7 that satisfy the selection criteria above. The distributions of age (in log yr), metallicity ([M/H]), E(B − V), and VD of our final labelled sample are displayed in Fig. 2. The median values of the four properties are 9.66 in log (yr), −0.22 dex, 0.1 mag, and 150 km s−1, respectively.

Figure 2. Distributions of the labels in our deep learning method. The four panels, from left to right, show age in log (yr), metallicity ([M/H]), E(B − V), and VD.

2.3 Data preprocessing

There are two steps to preprocess the spectra before deep learning. First, we normalize each spectrum by dividing it by its flux around 5000 Å, to avoid the influence of different flux scales. Secondly, we resample the spectra with an interval of 1 Å over the wavelength range [3800 Å, 8900 Å], so that all spectra have the same dimension; after resampling, each spectrum is a 5100-dimensional vector. Note that the input spectra of our model are not de-redshifted to the rest frame. If we shifted the observed spectra to their rest frames, they would cover different wavelength ranges; to maintain uniformity, the spectra would then have to be truncated to a common range smaller than the original one, which would discard information from some spectra. To avoid this, all spectra in our method remain in the same observed wavelength frame. Moreover, stellar population synthesis methods must shift observed spectra to the rest frame beforehand, so the redshift uncertainty directly affects their stellar population measurements, whereas our method does not de-redshift the spectra and is thus free from this effect.
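
A minimal sketch of these two steps is shown below, assuming each spectrum comes as observed-frame wavelength and flux arrays; the exact width of the normalization window around 5000 Å is our assumption.

```python
import numpy as np

def preprocess(wave, flux):
    """Normalize around 5000 A and resample to a uniform 1 A grid (3800-8900 A).

    `wave` and `flux` are the observed-frame wavelength and flux arrays of one
    SDSS spectrum; returns a 5100-dimensional input vector.
    """
    # Step 1: normalize by the median flux in a window around 5000 A
    window = (wave > 4950) & (wave < 5050)   # illustrative window width
    flux = flux / np.median(flux[window])

    # Step 2: linear resampling on to a common 1 A grid of 5100 pixels
    grid = np.arange(3800, 8900)
    return np.interp(grid, wave, flux)
```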

3 METHOD

The CNN is a deep learning algorithm that can extract useful features from nonlinear, complex data. In this paper, we use a CNN for regression prediction of stellar populations. Since regression is a supervised learning task, we train the CNN to learn the relationship between the input data (spectra) and the desired outputs (age, metallicity, E(B − V), and VD). The CNN architecture and training phase are detailed in Sections 3.1 and 3.2, respectively. Our code is built on the Keras library (Chollet et al. 2018).

3.1 Network architecture

The CNN architecture is very flexible in terms of the number of layers, the number of neurons per layer, kernel sizes, etc. After several tests and comparisons, we settled on the architecture summarized in Table 1.

  • Input and output. The inputs to our CNN are the spectra with 5100 pixels pre-processed in Section 2.3, together with the true values of the four physical properties: age, metallicity, E(B − V), and VD. Note that, before feeding the four properties to the network, we scale each property individually to a given range, e.g. between 0 and 1; this ensures that each property contributes equally to the modelling. Specifically, we apply min-max scaling: we subtract from each value of a property its minimum value and then divide by the difference between its maximum and minimum values.

    The output of our CNN has four neurons representing the predicted values of age, metallicity, E(B − V), and VD. Since the model outputs are min-max scaled, we must convert them back to their original scales, which is known as inverse scaling.

  • Layers. Because the input data are spectra, the convolutions and poolings of the CNN are one-dimensional (Conv1D and MaxPooling1D). The model contains eight convolution layers, four max pooling layers, and two fully connected (dense) layers. Taking the pooling layers as boundaries, the model consists of four blocks, each with a fixed number of channels. Because both the convolution layers and the fully connected layers carry weight coefficients, they are referred to as weight layers; our CNN can thus be regarded as having 8 + 2 = 10 weight layers. (A sketch of this architecture and of the label scaling is given after this list.)

    Convolution layers are denoted Conv1D-xxx in Table 1, where xxx is the number of channels. The number of channels in the convolution layers increases as 16, 32, 64, and 128; this increase allows more information to be extracted. The kernel size of all convolution layers is 3 and the stride is 1. In all max pooling layers, the pooling window is 2.

  • Loss function. We apply the mean squared error (MSE) as our loss function, defined in equation (1):

    $$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\rm pred} - y_i^{\rm true}\right)^{2}, \quad (1)$$

    where $N$ is the number of samples, and $y_i^{\rm pred}$ and $y_i^{\rm true}$ are the predicted and true values, respectively. The goal of the training process is to minimize this loss function, which helps the model learn and make better predictions.
  • Activation function. In all convolution layers, we use the Rectified Linear Unit (ReLU) as the activation function. ReLU is simple yet effective and has gained popularity in deep learning. It introduces non-linearity to the neurons, allowing them to approximate non-linear functions and making the network better suited to estimating our non-linear physical properties.

  • Optimizer. The Adaptive Moment Estimation (Adam) optimizer is used to train our CNN. It combines the advantages of adaptive learning rate methods and momentum methods. Choosing an appropriate learning rate is crucial for successful training; we try four learning rates (0.01, 0.001, 0.0001, and 0.00001) and finally select 0.0001 based on comparison experiments.
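
The following Keras sketch reproduces the block structure of Table 1, together with the min-max scaling of the labels. The placement and strength of the L2 regularization (introduced in Section 3.2) and the activation of the first dense layer are our assumptions, not details given in the text.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def minmax_scale(y, y_min, y_max):
    """Scale each property to [0, 1]; invert with y_s * (y_max - y_min) + y_min."""
    return (y - y_min) / (y_max - y_min)

def build_cnn(n_pix=5100, n_out=4, l2=1e-4):       # l2 strength is an assumption
    model = keras.Sequential([keras.Input(shape=(n_pix, 1))])
    for n_ch in (16, 32, 64, 128):                 # four blocks, two Conv1D each
        for _ in range(2):
            model.add(layers.Conv1D(n_ch, kernel_size=3, strides=1,
                                    activation="relu",
                                    kernel_regularizer=regularizers.l2(l2)))
        model.add(layers.MaxPooling1D(pool_size=2))  # halves the length
    model.add(layers.Flatten())                      # (315, 128) -> 40 320
    model.add(layers.Dense(5184, activation="relu"))
    model.add(layers.Dense(n_out))                   # scaled age, [M/H], E(B-V), VD
    return model
```

With valid (unpadded) convolutions, this construction reproduces the output shapes and parameter counts of Table 1, e.g. the first Conv1D-16 layer has 3 × 1 × 16 + 16 = 64 parameters and the first dense layer 40 320 × 5184 + 5184 = 209 024 064.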

Table 1. CNN architecture for predicting stellar populations based on SDSS galaxy spectra.

Layer (type)     Output shape    Parameters
Input layer      (5100)          0
Conv1D-16        (5098, 16)      64
Conv1D-16        (5096, 16)      784
MaxPooling1D     (2548, 16)      0
Conv1D-32        (2546, 32)      1568
Conv1D-32        (2544, 32)      3104
MaxPooling1D     (1272, 32)      0
Conv1D-64        (1270, 64)      6208
Conv1D-64        (1268, 64)      12 352
MaxPooling1D     (634, 64)       0
Conv1D-128       (632, 128)      24 704
Conv1D-128       (630, 128)      49 280
MaxPooling1D     (315, 128)      0
Flatten          (40 320)        0
Dense            (5184)          209 024 064
Dense            (4)             20 740
Total params                     209 142 868

Note. The table lists the output shape and the number of trainable parameters of each layer.


3.2 Network training

After the data selection and pre-processing of Section 2, we have 100 000 spectra, each with four labels, for training and testing the CNN model. The data sample is divided into training, validation, and test sets in the proportion 8:1:1. The training set is used to train the model, with the goal of minimizing the difference between predicted outputs and actual values. The validation set is used during training to evaluate the model and determine the optimal hyperparameter values. The test set is used to assess the performance and generalization ability of the trained model.

We train the CNN model with the above architecture using the training and validation sets. Specifically, the spectra and their true physical properties are fed to the CNN in batches of 32. In each epoch, the CNN updates its parameters (weights) to minimize the loss function; we train for 100 epochs. To improve the generalization ability of the model and prevent overfitting, we introduce L2 regularization, which adds to the loss function a penalty term that encourages the weights to be small. The training and validation losses as a function of epoch are shown in Fig. 3. Both decrease steadily to a low value, and the gap between them is small, indicating that the model generalizes well to both the training data and unseen validation data.
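
A sketch of this training setup follows, reusing build_cnn from Section 3.1. Here X is the array of preprocessed spectra and Y the min-max-scaled labels, and the random permutation shown is one simple way to realize the 8:1:1 split.

```python
import numpy as np
from tensorflow import keras

model = build_cnn()
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-4),
              loss="mse")                        # the MSE loss of equation (1)

# 8:1:1 split of the 100 000 labelled spectra
idx = np.random.permutation(len(X))
n_tr, n_va = int(0.8 * len(X)), int(0.9 * len(X))
tr, va = idx[:n_tr], idx[n_tr:n_va]              # last 10 per cent kept for testing

history = model.fit(X[tr], Y[tr],
                    validation_data=(X[va], Y[va]),
                    batch_size=32, epochs=100)   # loss curves as in Fig. 3
```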

Figure 3. Training and validation loss curves of the CNN as a function of epoch. Both losses decrease until reaching a steady state as the epoch increases.

4 RESULTS

After training our CNN model using the training and validation sets, we evaluate its performance on the test set of 10 000 galaxies in this section. The performance of the model is primarily quantified by the standard deviation of the differences between the predicted and true values.

4.1 Accuracy and execution time of the CNN model

The comparison between the stellar population values predicted by our trained model and their true values is shown in Fig. 4. From left to right and top to bottom, the four panels show age, metallicity ([M/H]), E(B − V), and VD. In each panel, the inset shows the distribution of the differences (ypred − ytrue) between predicted and true values. The mean (μ) and standard deviation (σ) of the differences are given in the upper left corner, and the blue line is the identity line.

Figure 4. Comparison of age in log (yr), metallicity ([M/H]), E(B − V), and VD between our predicted values and the true ones. The mean (μ) and standard deviation (σ) of the differences are indicated in the upper left corner, and the blue line is the one-to-one line. The distribution of the residuals (ypred − ytrue) is shown in the inset of each panel. Points beyond 3σ are clipped.

We can see from Fig. 4 that the model predictions of the stellar populations are quite good, with relatively small scatter. The model reproduces the age and metallicity with a scatter of 0.11 dex; the scatter of E(B − V) is 0.018 mag and that of VD is 31 km s−1. This indicates that our CNN regression model yields results consistent with the traditional stellar population synthesis method for age, metallicity, E(B − V), and VD.

In addition, we measure the execution time of both methods on the same hardware. In this comparative experiment, we estimate the age, metallicity, E(B − V), and VD of 100 000 galaxies with our deep learning method and with the traditional stellar population synthesis method (PPXF). The result shows that PPXF requires 20 h to calculate the physical properties, whereas it takes just 2 h to train our model and predict the properties for the same number of galaxies. Moreover, once our model is trained, the time needed for subsequent parameter estimation is negligible. Therefore, our method has a significantly lower time complexity than the stellar population synthesis method, by a factor of at least 10.

In conclusion, our CNN model demonstrates good performance in predicting stellar population parameters, and greatly reduces the time required compared to traditional methods.

4.2 Evaluating the CNN model at different S/Ns

The S/N of the input spectra may affect the prediction performance of the model. By examining the dispersion of the differences between the predicted and true galaxy properties in test subsets of various S/Ns, we can see how the model's accuracy is affected by different levels of noise. Specifically, we divide the test set into six bins of S/Nr: [5, 10), [10, 15), [15, 20), [20, 25), [25, 30), and [30, –), and predict the four properties in each subset with the trained model. The dispersions of the differences between the predicted and true values of age, metallicity, E(B − V), and VD as a function of S/Nr are shown in Fig. 5. The scatter of the differences decreases as S/Nr increases. At S/Nr = 5, the scatters of the four properties reach their maxima (0.16 dex for age and metallicity, 0.03 mag for E(B − V), and 50 km s−1 for VD). These results demonstrate that the CNN model retains relatively good prediction performance across S/N levels.
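
A minimal sketch of this per-bin evaluation is given below; snr_r, y_pred, and y_true are placeholder arrays holding the test-set S/Nr values and the predicted and true values of one property.

```python
import numpy as np

def scatter_by_bin(x, diff, edges):
    """Standard deviation of the residuals `diff` in bins of `x` (e.g. S/N_r)."""
    return np.array([np.std(diff[(x >= lo) & (x < hi)])
                     for lo, hi in zip(edges[:-1], edges[1:])])

edges = [5, 10, 15, 20, 25, 30, np.inf]          # the six S/N_r bins of Fig. 5
sigma = scatter_by_bin(snr_r, y_pred - y_true, edges)
```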

Figure 5. The scatter of the differences (Δy = ypred − ytrue) of age, metallicity, E(B − V), and VD as a function of S/Nr. In each panel, the point and bar of the error bars represent, respectively, the median and scatter of σ(Δy) in the different S/Nr subsets.

4.3 Evaluating the CNN model at different redshifts

Since the inputs to our CNN model are observed spectra that are not de-redshifted, how does redshift affect the model predictions? We divide the redshifts of the test set into seven subsets: [0, 0.06), [0.06, 0.08), [0.08, 0.10), [0.10, 0.12), [0.12, 0.15), [0.15, 0.18), and [0.18, 0.30). Fig. 6 shows the scatter of the differences in age, metallicity, E(B − V), and VD as a function of redshift. In general, the model predicts well at all redshifts, with maximum deviations of 0.13 dex for age and metallicity, 0.02 mag for E(B − V), and 40 km s−1 for VD. The prediction errors are slightly larger for galaxies at the lowest and highest redshifts, which could be attributed to the limited amount of training data in these ranges (samples with z < 0.04 account for 6 per cent and those with z > 0.2 for 8 per cent).

Figure 6. The scatter of the differences (Δy) of age, metallicity, E(B − V), and VD as a function of redshift. In each panel, the point and bar of the error bars represent, respectively, the median and scatter of σ(Δy) in the different redshift subsets.

4.4 Evaluating the CNN model at different types of galaxies

Based on their spectral lines, galaxies fall into several spectral classes: passive galaxies without emission features, star-forming galaxies, composite galaxies, and AGN; the last three are also known as emission-line galaxies. Different types of galaxies have different physical properties and characteristics, so how well does our model predict for each type? We test the trained model's prediction performance for the different types as follows. We cross-match our data with the MPA–JHU DR7 catalogue (Kauffmann et al. 2003; Brinchmann et al. 2004) to obtain the spectral classes of the galaxies. The classification, flagged as BPTCLASS in the MPA–JHU catalogue, is based on the Baldwin, Phillips & Terlevich (1981, hereafter BPT) diagram using the methodology described in Brinchmann et al. (2004). We adopt four classes for our spectra: passive galaxies (BPTCLASS = −1), star-forming galaxies (BPTCLASS = 1, 2), composite galaxies (BPTCLASS = 3), and AGN (BPTCLASS = 4, 5). Fig. 7 presents the differences between the predictions of our model and the true values for these four types of galaxies.
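
The class grouping amounts to a simple mapping of BPTCLASS codes. The sketch below assumes the test-set residuals and BPTCLASS flags are collected in a pandas DataFrame; the residual column names (d_age, d_mh, d_ebv, d_vd) are hypothetical.

```python
import pandas as pd

# Group MPA-JHU BPTCLASS codes into the four classes used here
BPT_TO_CLASS = {-1: "passive", 1: "star-forming", 2: "star-forming",
                3: "composite", 4: "AGN", 5: "AGN"}

df["class"] = df["BPTCLASS"].map(BPT_TO_CLASS)
# Mean and standard deviation of the residuals per class (cf. Fig. 7)
stats = df.groupby("class")[["d_age", "d_mh", "d_ebv", "d_vd"]].agg(["mean", "std"])
```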

Figure 7. The differences (Δy) of age, metallicity, E(B − V), and VD for the four classes of galaxies: passive, star-forming (SF), composite, and AGN. In each panel, the point and bar of the error bars represent, respectively, the mean and standard deviation of Δy. The black dashed line marks Δy = 0.

From Fig. 7, it can be seen that the model predictions for the different types of galaxies are generally consistent with the true values, with small uncertainties. For age, the prediction error is smallest for passive galaxies (0.087 dex) and largest for star-forming galaxies (0.131 dex), with the other two types in between. For metallicity, the model likewise has the smallest error for passive galaxies (0.082 dex) and the largest for star-forming galaxies (0.133 dex). For E(B − V), the errors for passive and star-forming galaxies are 0.012 and 0.022 mag, respectively. Finally, for VD, the error for star-forming galaxies is larger (33 km s−1), whereas the errors for the other three classes are similar, around 30 km s−1. These results show that our model performs best for passive galaxies and worst for star-forming galaxies. The larger errors for star-forming galaxies may arise because they have relatively low continuum levels and strong emission lines, which affect the continuum and absorption-line features that are crucial for stellar population estimation.

5 SUMMARY

In this study, we train a CNN to infer the mean age and metallicity, colour excess E(B − V), and central VD from galaxy spectra with SDSS-like resolution. The input feature is a 5100-dimensional spectrum in the observed wavelength frame, not de-redshifted to the rest frame. To our knowledge, this is the first time these four galaxy properties have been predicted simultaneously from observed-frame spectra with deep learning. The model can estimate age, metallicity, E(B − V), and VD of galaxies without requiring prior knowledge of their redshifts. Comparison experiments show that our CNN model derives galaxy properties well and greatly reduces the time required compared with traditional methods. In detail, the results are as follows:

  • Our CNN model predictions of the properties agree with the traditional spectral fitting method within small scatter: 0.11 dex for age and metallicity, 0.018 mag for E(B − V), and 31 km s−1 for VD. In terms of computation time, our method is more than 10 times faster than the traditional method.

  • Across different S/N levels, the CNN model retains relatively good prediction performance even at low S/N. At S/Nr = 5, the scatters are 0.16 dex for age and metallicity, 0.03 mag for E(B − V), and 50 km s−1 for VD.

  • Across different redshifts, the prediction errors are slightly larger for galaxies at the lowest and highest redshifts: up to 0.13 dex for age and metallicity, 0.02 mag for E(B − V), and 40 km s−1 for VD.

  • Across different spectral classes, our model performs best for passive galaxies and worst for star-forming galaxies. The errors of the model predictions of age, metallicity, E(B − V), and VD are 0.087 dex, 0.082 dex, 0.012 mag, and 30 km s−1 for passive galaxies, and 0.131 dex, 0.133 dex, 0.022 mag, and 33 km s−1 for star-forming galaxies.

The well-trained CNN model and related code are publicly available at https://github.com/sddzwll/CNNforStellarp. One can input an observed spectrum with SDSS-like resolution, and the model immediately outputs the predicted values of the four parameters (light-weighted average age and metallicity, E(B − V), and VD). Note that our model is applicable to spectra in the low-redshift Universe (redshift ≤ 0.3). The next step is to expand the redshift coverage of the training data so that the model can predict the stellar populations of higher redshift galaxies.
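
For illustration, applying the released model to a new spectrum could look like the following sketch. The model file name and the stored scaling constants (y_min, y_max) are placeholders; the actual usage is documented in the repository. preprocess is the function sketched in Section 2.3.

```python
import numpy as np
from tensorflow import keras

model = keras.models.load_model("cnn_stellarp.h5")   # hypothetical file name

x = preprocess(wave_obs, flux_obs)                   # Section 2.3: 5100 pixels
y_scaled = model.predict(x.reshape(1, -1, 1))[0]

# Inverse min-max scaling with the training-set minima/maxima
age_logyr, mh, ebv, vd = y_scaled * (y_max - y_min) + y_min
```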

ACKNOWLEDGEMENTS

This work was supported by the National Natural Science Foundation of China (grant Nos. 11903008, 12273075, and U1931106) and Shandong Provincial Natural Science Foundation of China (grant No. ZR2019YQ03).

DATA AVAILABILITY

The data underlying this work are from the Sloan Digital Sky Survey. The catalogue of galaxies is available at http://skyserver.sdss.org/CasJobs/, and the spectroscopic data were downloaded from https://dr12.sdss.org/optical/spectrum/search.

REFERENCES

Aniyan A. K., Thorat K., 2017, ApJS, 230, 20
Baldwin J. A., Phillips M. M., Terlevich R., 1981, PASP, 93, 5 (BPT)
Brinchmann J., Charlot S., White S. D. M., Tremonti C., Kauffmann G., Heckman T., Brinkmann J., 2004, MNRAS, 351, 1151
Bruzual G., Charlot S., 2003, MNRAS, 344, 1000
Buck T., Wolf S., 2021, preprint
Calzetti D., Armus L., Bohlin R. C., Kinney A. L., Koornneef J., Storchi-Bergmann T., 2000, ApJ, 533, 682
Cappellari M., 2017, MNRAS, 466, 798
Cappellari M., 2023, MNRAS, 526, 3273
Cappellari M., Emsellem E., 2004, PASP, 116, 138
Chevallard J., Charlot S., 2016, MNRAS, 462, 1415
Chollet F. et al., 2018, Astrophysics Source Code Library, record ascl:1806.022
Cid Fernandes R., Mateus A., Sodré L., Stasińska G., Gomes J. M., 2005, MNRAS, 358, 363
D'Isanto A., Polsterer K. L., 2018, A&A, 609, A111
Dieleman S., Willett K. W., Dambre J., 2015, MNRAS, 450, 1441
Euclid Collaboration, 2023, MNRAS, 520, 3529
George D., Huerta E. A., 2018, Phys. Lett. B, 778, 64
Gomes J. M., Papaderos P., 2017, A&A, 603, A63
Hong S., Zou Z., Luo A. L., Kong X., Yang W., Chen Y., 2023, MNRAS, 518, 5049
Hoyle B., 2016, Astron. Comput., 16, 34
Huertas-Company M., Lanusse F., 2023, PASA, 40, e001
Huertas-Company M. et al., 2015, ApJS, 221, 8
Kauffmann G. et al., 2003, MNRAS, 341, 33
Liew-Cain C. L., Kawata D., Sánchez-Blázquez P., Ferreras I., Symeonidis M., 2021, MNRAS, 502, 1355
Maraston C., 2005, MNRAS, 362, 799
Maraston C., Strömbäck G., 2011, MNRAS, 418, 2785
Ocvirk P., Pichon C., Lançon A., Thiébaut E., 2006, MNRAS, 365, 46
Smith M. J., Geach J. E., 2023, R. Soc. Open Sci., 10, 221454
Surana S., Wadadekar Y., Bait O., Bhosale H., 2020, MNRAS, 493, 4808
Tao Y.-h. et al., 2020, Prog. Astron., 38, 168
Tojeiro R., Heavens A. F., Jimenez R., Panter B., 2007, MNRAS, 381, 1252
Vazdekis A., Sánchez-Blázquez P., Falcón-Barroso J., Cenarro A. J., Beasley M. A., Cardiel N., Gorgas J., Peletier R. F., 2010, MNRAS, 404, 1639
Vazdekis A., Koleva M., Ricciardelli E., Röck B., Falcón-Barroso J., 2016, MNRAS, 463, 3409
Vega-Ferrero J. et al., 2021, MNRAS, 506, 1927
Wang H., Wu S., Cao Z., Liu X., Zhu J.-Y., 2020, Phys. Rev. D, 101, 104003
Wang L.-L. et al., 2022, ApJS, 258, 9
Wilkinson D. M. et al., 2015, MNRAS, 449, 328
Wilkinson D. M., Maraston C., Goddard D., Thomas D., Parikh T., 2017, MNRAS, 472, 4297

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.