Simulating emission line galaxies for the next generation of large-scale structure surveys

We investigate emission line galaxies across cosmic time by combining the modified L-Galaxies semi-analytical galaxy formation model with the JiuTian cosmological simulation. We improve the tidal disruption model of satellite galaxies in L-Galaxies to address the time dependence problem. We utilise the public code CLOUDY to compute emission line ratios for a grid of HII region models. The emission line models assume the same initial mass function as that used to generate the spectral energy distribution of semi-analytical galaxies, ensuring a coherent treatment for modelling the full galaxy spectrum. By incorporating these emission line ratios with galaxy properties, we reproduce observed luminosity functions for H$\alpha$, H$\beta$, [OII], and [OIII] in the local Universe and at high redshifts. We also find good agreement between model predictions and observations for auto-correlation and cross-correlation functions of [OII]-selected galaxies, as well as their luminosity dependence. The bias of emission line galaxies depends on both luminosity and redshift. At lower redshifts, it remains constant with increasing luminosity up to around $\sim 10^{42.5}\rm \, erg\,s^{-1}$ and then rises steeply for higher luminosities. The transition luminosity increases with redshift and becomes insignificant above $z$=1.5. Generally, galaxy bias shows an increasing trend with redshift. However, for luminous galaxies, the bias is higher at low redshifts, as the strong luminosity dependence observed at low redshifts diminishes at higher redshifts. We provide a fitting formula for the bias of emission line galaxies as a function of luminosity and redshift, which can be utilised for large-scale structure studies with future galaxy surveys.


INTRODUCTION
In the past few decades, large-scale surveys have propelled revolutionary developments in astronomy.Various surveys, such as the Sloan Digital Sky Survey (SDSS, York et al. 2000;Gunn et al. 2006), the Dark Energy Survey (DES, Dark Energy Survey Collaboration et al. 2016), and the Hyper Suprime-Cam Subaru Strategic Program (HSC-SSP, Aihara et al. 2018;Kawanomoto et al. 2018) have significantly enhanced our understanding of the universe and the underlying theories of galaxy formation.These surveys have contributed to the solidification of the ΛCDM cosmological framework, precise measurements of the expansion rate of the universe, and the provision of extensive data on the large-scale structure of the ★ E-mail: guoqi@nao.cas.cnuniverse, including the mapping of millions of galaxies.In addressing evolving scientific challenges, currently ongoing and upcoming large-scale next-generation surveys, such as the Chinese Space Station Telescope (CSST, Zhan 2011), the Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al. 2022), Euclid (Laureĳs et al. 2011), the Large Synoptic Survey Telescope (LSST, Ivezić et al. 2019), the Nancy Grace Roman Space Telescope (Roman, Dressler et al. 2012;Spergel et al. 2013), and the Subaru Prime Focus Spectrograph (PFS, Takada et al. 2014;Tamura et al. 2016) aim to achieve substantial breakthroughs in fundamental issues such as the origin and evolution of dark matter and dark energy, as well as the origin and evolution of galaxies and black holes.These state-of-art instruments and surveys have lower detection limits and better spatial resolution, thereby extending observational data towards fainter objects and higher redshifts.
Numerical simulations (Kauffmann et al. 1999;Springel et al. 2001Springel et al. , 2005;;Guo et al. 2011;Henriques et al. 2015;Schaye et al. 2015;Pillepich et al. 2018) have been successful in investigating contemporary galaxy formation and cosmology.These simulations have shown the ability to reproduce various observed galaxy characteristics across different periods in cosmic history.Their capability in predicting, interpreting, and optimizing observational outcomes has rendered them valuable tools for comprehending the processes underlying galaxy formation and the progression of large-scale structures.Hydrodynamic cosmological simulations such as EAGLE (Schaye et al. 2015) and IllustrisTNG (Pillepich et al. 2018) have shown enhanced capacity in studying baryonic physics processes, especially gas dynamics.Recently, Hirschmann et al. (2023) computed optical and NUV emission lines originating from various sources such as star clusters, narrow-line regions of AGN, post-asymptotic giant branch stars, and fast radiative shocks for galaxies in the IllustrisTNG simulation up to redshift 7 via post-processing, providing valuable predictions for JWST.Similarly, Osato & Okumura (2023) constructed mock H and [OII] catalogues based on IllustrisTNG to investigate the clustering of emission line galaxies.However, such approaches are hindered by substantial CPU time requirements and computational constraints, precluding the simultaneous achievement of both high precision and large simulation volumes.
By combining physically motivated recipes of galaxy formation with N-body cosmological dark matter simulations, semi-analytical models (e.g.Kauffmann et al. 1999;Cole et al. 2000;Guo et al. 2011;Lacey et al. 2016;Lagos et al. 2018) provide a cost-effective alternative.This approach allows simultaneous consideration of high precision and large simulation volumes, as compared to hydro simulations.Izquierdo-Villalba et al. (2019) combined the semi-analytical model L-Galaxies (Henriques et al. 2015) with the emission line model from Orsi et al. (2014) to construct a light cone for the J-PLUS survey.Baugh et al. (2022) applied the pre-computed grid of emission line luminosity released by Gutkin et al. (2016) within the semi-analytical galaxy formation code GALFORM (Lacey et al. 2016) to reproduce the observed locus of star-forming galaxies on standard line ratio diagnostic diagrams.Favole et al. (2020) applied the emission line model described by Orsi et al. (2014) to three different semi-analytical models: SAG (Cora et al. 2018), SAGE (Croton et al. 2016), and GALACTICUS (Benson 2012) and concluded that utilising average star formation rates is a feasible method to generate [OII] luminosity functions.However, these works used different stellar population synthesis (SPS) models for computing stellar components and HII regions, introducing additional inconsistencies in the final results.
In this study, we implement the state-of-art semi-analytic model L-Galaxies (Henriques et al. 2015) onto the merger trees extracted Table 1.Details about our simulation suite.The first column shows the name of the simulation or merger tree set; the second column shows the corresponding boxsize; the third column shows the particle number; the fourth column shows the dark matter particle mass; the fifth column shows total snapshots.from a large-box-size, high-resolution N-body dark matter simulation to produce a galaxy catalogue for upcoming large-scale surveys.
We improve the satellite disruption model in L-Galaxies to address a theoretical issue with varying time resolutions.We record the complete star formation history (SFH) for each individual galaxy, enabling the computation of photometric magnitudes for any given filters as post-processes.We combine a grid of HII region models with the public radiation transfer code CLOUDY to derive emission line ratios using the same SPS model employed for the stellar components, ensuring the self-consistency of our predictions.This guarantees consistent treatment between the stellar SED and the emission line luminosities.The grid of HII regions cover a wider parameter space compared to many previous work.By combining this with the semi-analytical galaxy output, we calculate the luminosities of the 13 most frequently utilised NUV and optical emission lines.This paper is organised as follows.Sec. 2 provides an overview of the N-body dark matter simulations Jiutian-1G and Mini-Hyper and the details of our semi-analytic model and emission line models.Sec. 3 presents our model predictions for various galaxy properties, while Sec. 4 shows the properties of emission line galaxies.We conclude with a summary and discussion in Sec. 5.

DATA AND METHOD
In this section, we give a brief description about the dark matter merger trees, semi-analytic models, and emission line models.Two sets of N-body cosmological simulations are adopted, a large simulation, JiuTian-1G, and four sets of merger trees exacted from a small run with different numbers of snapshots.Details are listed in Table 1.Then we modify the L-Galaxies model (Henriques et al. 2015) and apply it on the Jiutian-1G dark matter simulation.We then utilise a publicly available radiation transfer code to determine the luminosity of emission lines by implementing a photoionisation model surrounding the star formation regions.

Dark matter simulation
JiuTian Simulations comprise a series of cosmological N-body simulations, ranging in box size and resolution.The JiuTian-1G (hereafter JT1G) simulation is a large dark matter only -body simulation within the framework of Λ cold dark matter (ΛCDM) cosmology designed for next-generation surveys.Utilising the L-Gadget3 code (Springel 2005), the JT1G simulation tracks 6144 3 dark matter particles within a cubic simulation box with a side length  box = 1Gpc/h.This box length is twice as large as the previous Millennium Simulation (MS, Springel et al. 2005), resulting in an eight-fold increase in volume compared to MS.The particle mass is 3.72 × 10 8 M ⊙ /h, almost three times smaller than the original MS.The JT1G simulation stores 128 snapshots ranging from redshift 127 to 0, with an average time gap of approximately 100 Myr.This time resolution is chosen for weak lensing studies and is twice as large as MS with 64 snapshots.We adopt the cosmological parameters from Planck Collaboration et al. (2020) as follows:  8 = 0.8102,  0 = 67.66kms−1 Mpc −1 , Ω Λ = 0.6889, Ω m = 0.3111, Ω b = 0.0490(  b = 0.1575).Following Springel et al. (2005), dark matter halos and subhalos are identified using the friends-of-friends (FOF) and SUBFIND (Springel et al. 2001) algorithms.Additionally, to establish the merger trees, the subhalos are linked with their unique descendants employing the B-Tree code.Details about JT1G are referred to Han et al. in prep and Li et al. in prep.Although previous works have extensively examined the particle mass resolution (see Crain & van de Voort 2023, for a review), limited attention has been given to time resolution.Benson et al. (2012) concluded that 128 snapshots are necessary for the GALACTICUS semianalytical model (Benson 2012) to achieve convergence within a 5% level in stellar mass.We conduct four sets of merger trees with different temporal resolutions from a smaller simulation, Mini-Hyper, to assess the time convergence.The parent simulation has a box size of 125 Mpc/h and a particle mass similar to JT1G, 3.674 × 10 8 M ⊙ /h.The total particle number is 768 3 which is stored in 513 distinct snapshots.We employ slightly different cosmological parameters compared to JT1G: Four merger trees are then constructed accordingly with varying time intervals using the B-Tree code with different "SnapSkip-Fac".This approach yields four sets of merger trees with comparable merger tree structures but differing numbers of snapshots.Table 1 shows the parameters for all our simulations.In practice, we fix the first and last snapshots across all simulations.Then we select every second snapshot to construct a simulation with 257 snapshots.Further skipping every other snapshot results in a simulation with 129 snapshots.Following the same methodology, simulations with 65/33 snapshots were constructed, with the number of snapshots decreasing by a factor of 2 each time.Therefore, we obtain four sets of merger trees with similar tree structures, wherein the time intervals between two snapshots increase by a factor of 2 each time.It is worth noting that these four sets of merger trees have the same dark matter halo properties at common redshifts.Fig. 1 depicts how we create merger trees with different snapshots.

Semi-Analytical Model: L-Galaxies
We use the version of the L-Galaxies code as described in Henriques et al. (2015) (hereafter H15) and make modifications to solve the time convergence problem.H15 includes physical prescriptions for various baryonic processes, such as shock heating, gas cooling, star formation, supernova feedback, formation and growth of supermassive black holes (SMBH), AGN feedback, metal enrichment and etc.Details about the physical recipes and parameters can be found in their supplementary material.We have modified the satellite disruption procedure to address the issue of time convergence and have readjusted the parameters to replicate the abundance of SMBH in the local Universe.

Time convergence problem
We utilise the original H15 code to examine whether similar galaxy properties can be acquired on dark matter merger trees with vary- ing time intervals.The black lines in the first row of Fig. 2 show the statistical properties of galaxies at z∼0 predicted by the original H15 model, including the galaxy stellar mass function (SMF), galaxy abundance as a function of star formation rate (SFRF) and the supermassive black hole mass function (BHMF).Distinct line styles represent results from merger trees with different time intervals: solid lines for Mini-Hyper-33, dashed lines for Mini-Hyper-65, dotted lines for Mini-Hyper-129, and dotted-dashed lines for Mini-Hyper-257 merger trees.Surprisingly, we observe substantial differences in galaxy properties across simulations.The left panel reveals that simulations with smaller time intervals tend to have more massive galaxies.The difference is remarkably large, reaching several orders of magnitude at  * > 10 11 M ⊙ between Mini-Hyper-33 and Mini-Hyper-257.Larger offsets in SFRF and BHMF are evident, as the Mini-Hyper-257 showcases significantly higher numbers of highly star-forming galaxies and considerably less SMBH compared to the other simulations.These substantial differences with different time intervals strongly suggest a challenge in time convergence within the H15 code.

Relevant physical processes
Further investigation into the H15 code reveals that the primary cause of the time convergence problem is linked to the fate of orphan satellite galaxies, which lost their subhalos due to physical processes or numerical effects.In H15, following the disruption of its subhalo, a merging clock is set simultaneously based on an estimated time ( friction ) for the orphan to spiral into the central galaxy due to dynamical friction.An orphan galaxy will ultimately either undergo disruption or merge.It could suffer tidal disruption on its way spiralling into the centre, relying on the competition between self-gravity and the tidal force from the main dark matter halo.In practice, a comparison is made between the baryonic (cold gas and stellar mass) density within the half-mass radius and the dark matter  density of the main halo at the assumed pericentre ( peri ) of the orphan's orbit.
If the tidal force exceeds the bounding gravity, prior to Δ =  friction where Δ is the time since merger clock is set, the orphan galaxy will be completely disrupted and no longer undergo merging.Their SMBH, gas, and stars are all distributed in the halo of the central galaxy.SMBH in the central galaxy will not grow in this scenario.Conversely, if tidal force never exceeds the bounding gravity (until Δ =  friction ), the orphan galaxy will merge into the central galaxy at Δ =  friction .During galaxy mergers, the SMBH in the central galaxy could grow by swallowing the SMBH from the satellite galaxies that are being merged and by experiencing strong gas accretion, which is triggered by the merger process.
A residual time-dependent treatment was adopted in the original H15 model, where the merger recipe and the satellite disruption recipe are called at different times.H15 divides the time between two adjacent snapshots into 20 sub-steps and calls the "merger" recipe in each sub-step, while the "disruption" recipe is only called at the end of each snap gap.This prioritises the occurrence of merger, while the disruption is delayed until the final sub-step, regardless of meeting the disruption criterion earlier.Larger intervals of time between two consecutive snapshots increase the chances of mergers occurring.As a result, there are fewer disrupted orphan galaxies, and the SMBH become more massive.The increased mass of SMBH in turn leads to more effective active galactic nucleus (AGN) feedback, resulting in smaller central galaxies.We conducted experiments to preserve time resolution in the substep by adjusting the number of substeps in different Mini-Hyper simulations, but this did not solve the time resolution problem.

Modifications to the disruption model
To address this time convergence problem, we make modifications to the disruption model in the code.We replace the assumed pericentre distance with the actual distance of orphan satellite galaxies from the central galaxy  orphan , and call the disruption function at each substep.We track the most bound particle, which was defined at the point just before the orphan galaxy lost its substructure.By multiplying its distance from the central galaxy  mostbound by a factor that accounts for the impact of dynamical friction, we can estimate the distance of the orphan galaxy  orphan as follows.
We introduce a minimum distance at which disruption could occur, set as the scale radius of the gas disk in the central galaxy,  gas .The choice of  gas as the minimum distance is justified by the notion that if an orphan galaxy has reached the region of  gas , it should be considered as having entered the region of central galaxies and is undergoing a strong interaction.In such situations, it is more appropriate to treat the event as a merger rather than a disruption.
We implement the modified model to the Mini-Hyper simulations with different snapshots without making any changes to the parameters from the initial H15 version.The stellar mass function, star formation rate function, and black hole mass function of the modified model are shown as the red lines in the first row of Fig. 2, indicating good agreement among different time resolutions.The quantification of the differences with respect to the Mini-Hyper-257 simulation is presented in the second row and shows that the differences in SMF and SFRF are within 5% in most cases.At higher values, it could suffer from the limited sample size.The variation in BHMF seems larger, especially at the low mass and at the high mass end.This suggests that the growth of SMBH could be more sensitive to the time interval between consecutive snapshots.Furthermore, varying numbers of snapshots could also result in difference within the generated trees.For example, Wang et al. (2016) showed that using more snapshots typically results in shorter branches for most tree builders.This is because more linking errors may occur when processing more snapshots, as tree builders are prone to resolution or flip-flop problems.Han et al. (2018) also showed that the SUBFIND and DTree combination could produce a substantial amount of fragmented branches, which in turn impacts the properties of resulting galaxies.

Parameter adjustment
It has been noticed that the mass of SMBH is lower by an order of magnitude compared to current observations (e.g.Yang et al. 2019) at z = 0 (see also the upper panel in Fig. 9, the black line is the result of H15, the purple symbols are observational data).To determine the parameters of our new model, we incorporate the black hole mass function at z = 0 as constraint, in addition to the observed galaxy stellar mass function at z=0, 1, 2, and 3, and the passive fraction at z = 0.4.The observational data we use in this study are listed in Table A1.In accordance with Henriques et al. (2013), we establish a representative subset of subhalo merger trees and employ the MCMC method as described in Henriques et al. (2009), to thoroughly explore the multidimensional parameter space with the updated disruption model in L-Galaxies.In short, we divide the haloes into I halo mass bins with a width of 0.5 dex, and the galaxies into J stellar mass bins with a same width.We randomly select   haloes from a total   halos in the th halo mass bin.The number   is determined by a set , where    is the average number of galaxies in the th stellar mass bin for haloes in the th halo mass bin, Φ  is the total number of galaxies in the th stellar mass bin. is the uncertainty of the stellar mass function, which we set to be  < 0.05.Our final representative subsample is ∼ 1/512 of the whole box.Details about the sample construction is available in Henriques et al. (2013), APPENDIX B.
For comparison with the default H15 parameters, the best-fit parameters are enumerated in Table 2. Our modified code maintains compatibility with the default H15 parameters.The best-fit parameters are then applied to the full volumes of the JT1G simulation to generate the SAM galaxy catalogue.

Mass-resolution convergence test
We have evaluated the mass resolution effect by applying the same SAM models (our modified model) and parameters (the best-fitting parameters for JT1G) to the merger trees extracted from the re-scaled Millennium-II simulation (Boylan-Kolchin et al. 2009;Henriques et al. 2015).The MSII simulation has a mass resolution of 7.69 × 10 6 M ⊙ /h, which is ∼ 50 times higher than JT1G, although the volume is smaller by 1000 (see Henriques et al. 2015, and reference therein for more details about the re-scaled MSII).MSII has been shown to resolve the smallest halo capable of hosting a detectable galaxy (e.g.Guo et al. 2011).Fig. B1 illustrates that the stellar mass of the galaxy converges at 10 9 M ⊙ between JT1G and MSII within a 10% from z=0 -3.This difference increases to 30% at 10 8 M ⊙ , partly attributed to slight differences in the cosmological parameters and possibly stemming from distinctions in their initial conditions (Li et al. in prep).At high masses, the disparity between JT1G and MSII is mainly due to varying mass resolutions.The mass resolution in MRII is about 50 times higher than in JT1G, allowing it to resolve more small halos that can later merge into larger systems.These mergers lead to larger supermassive black holes (SMBHs) in MRII, resulting in more effective AGN feedback that suppresses star formation and ultimately leads to a smaller stellar mass.The difference in stellar mass in massive halos is not significant, typically around 0.15 dex.However, due to the steep slope at the high mass end of the SMF, this slight variation in stellar mass can result in a notable difference in the stellar mass function at high masses.Along with mass resolution, cosmic variance stemming from the smaller volume of MSII could also be a factor.Similar comparisons are performed on the abundance of the galaxy as a function of SFR in Fig. B2.It shows a very good convergence between the JT1G simulation and MSII at z =0.At higher redshifts, the convergence is somehow larger, but all within 10%.

Emission line model
In this section, we first briefly describe the process of generating the galaxy spectral energy distribution (SED).Then, we explain in detail how to use the expected SED as the input ionising spectrum to calculate the luminosities of emission lines.In practice, we employ the radiative transfer code CLOUDY to calculate the relative strength of emission lines on a grid of HII region models.Meanwhile, we adopt empirical relations to establish connections between the general properties of galaxies and the parameters that describe the ionisation regions.According to the general properties of a given galaxy, the luminosity of each emission line is derived by performing interpolation within a pre-computed grid of line luminosities.During this process, we utilise the same Stellar Population Synthesis model and initial mass function as those employed in calculating the galaxy SED, ensuring the self-consistency of galaxy SED and emission line calculation.

Galaxy spectral energy distributions
Stellar Population Synthesis (SPS) models serve as essential tools in astrophysics, enabling the generation of synthetic SEDs with detailed information about the star formation history (SFH), metallicity, and initial mass function (IMF).These models provide valuable insights into the formation and evolution of galaxies.In this work, our default SPS model is based on the Bruzual & Charlot (2003) framework, with a Chabrier IMF (Chabrier 2003).
The entire SFH of both the disk and bulge components is stored in 22 distinct bins, as detailed by Shamshiri et al. (2015).Consequently, the SED can be produced using any desired SPS model and IMF as post-processes.

Grid of emission line models
We use the c17.04 release of the photoionisation code CLOUDY (Ferland et al. 2017), a widely used tool, to address radiation transfer in photoionised regions.To solve the radiation transfer equation, we specify the most important input physical properties of the gas cloud and spectrum of ionising sources, including the intensity and spectrum of the ionising photons, gas geometry, gas metallicity , and hydrogen density  H .
We adopt the BC03 (Bruzual & Charlot 2003) SPS model with a Chabrier IMF (Chabrier 2003) to calculate the intensity and spectrum of ionising photons.These SPS model and IMF are also used in calculating the galaxy SED.It is noteworthy that previous studies did not consistently employ the same stellar population models for computing the galaxy SED and for the photoionisation modelling of nebular emission.For example, Baugh et al. (2022) used the M05 model (Maraston 2005) to calculate galaxy stellar SED, yet adopted the emission line models based on BC03 in the HII regions.Izquierdo-Villalba et al. (2019) adopted different stellar synthesis models to calculate the emission line grid (Levesque et al. 2010) and the stellar SED (Bruzual & Charlot 2003) when generating the emission line galaxy mock catalogue for J-PLUS.This difference results in a lack of self-consistency between the two distinct components of the combined galaxy spectrum (i.e.stellar and emission).In contrast, our work ensures internal consistency by employing the same SPS  (Chabrier) for both the stellar component and the nebular model within galaxies.
For the gas geometry, we follow Byler et al. (2017) to assume a spherical shell geometry with the ionising source located in the centre, and to adopt an inner radius  inner at 10 19 cm.
In contrast to Byler et al. (2017), who assumes a constant gas density, we take into account the gas density within various ranges.In practice, we calculate the emission line ratios for a grid of HII region models by sampling U, , and  H as follows: log 10 U : −4.0, −3.5, −3.0, −2.5, −2.0, −1.5, −1.0 log 10 / ⊙ : −2.0, −1.5, −1.0, −0.6, −0.4,−0.3, −0.2, − 0.1, 0.0, 0.1, 0.2 log 10  H : 1, 2, 3, 4 (cm −3 ) where U is the ionisation parameter, a dimensionless parameter defined as the ratio of hydrogen ionising photons to the total hydrogen density: where   is the volume density of the ionizing photons. is the metallicity of cold gas, and  H is the hydrogen density.The output is a grid of line ratios relative to the H luminosity with different parameters.Using linear interpolation in log 10 U, log 10 / ⊙ and log 10  H , we extract line ratios for each star-forming galaxy.If the values for U, , and  H fall outside the range of the grid, we use the closest limiting values from the grid to align with the prediction of the catalogue.Fig. 3 shows the classic BPT diagram (Baldwin et al. 1981) of the pre-computed grid of emission line luminosities.The x-axis depicts [NII]6584/H, and the y-axis depicts [OIII]5007/H.The gray grid with blue symbols shows the grid with a hydrogen density of  H = 10cm −3 , varying the metallicity  and the ionisation parameter U. Additionally, we present the grid of emission line ratios from Byler et al. ( 2017) as black lines with red symbols.We employ a hydrogen density, metallicity, and ionisation parameter grid akin to that utilised in Byler et al. (2017), albeit with a different spectrum of ionising photons.Furthermore, we use the Chabrier IMF, while they employ the Kroupa IMF (Kroupa 2001).Our BPT diagram bears a resemblance to theirs, with a slight offset observed at higher metallicity.The [OIII]5007/H ratio exhibits a monotonically increasing trend with increasing U, while the [NII]6584/H ratio demonstrates a monotonically increasing pattern with increasing .We notice that as  becomes sufficiently large, an overlapping within the grid itself becomes evident, reflecting the degeneracy of  and U.

H𝛼 luminosity
CLOUDY provides line ratios relative to the H luminosity, which is closely related to the star formation rate.For a nebula that is absolutely optically thick for ionising photons and optically thin for redward photons (case B, Osterbrock & Ferland 2006), we can theoretically derive the relation between the intensity of a specific hydrogen recombination line and the ionising photon rate by quantum mechanics.The relation between the luminosity of H and the ionising photon rate is expressed as: where  eff H is the effective recombination coefficient at H, and  B is the case B recombination coefficient. H is the total number of ionised photons emitted per second, which could be related to the star formation rate assuming the Chabrier IMF (Chabrier 2003): The H luminosity can then be expressed as a function of SFR (Falcón-Barroso & Knapen 2013):

Emission line luminosity for semi-analytical galaxies
With the grid of emission lines, the final step in predicting the emission lines for semi-analytical galaxies is to establish a connection between the general properties of the galaxies and the parameters that determine the emission lines.These parameters include gas metallicity, hydrogen density, and ionisation parameters.
The gas metallicity is obtained directly from the semi-analytical catalogue.Here we use gas metallicity  cold for the whole galaxy.Calculations including individual heavy elements will be performed in further work.
We adopt the empirical relations based on local observations (Kashino & Inoue 2019) to link the hydrogen density  H to the stellar mass and specific star formation rate as follows, log 10  H cm −3 = 2.066 + 0.310 log 10 ( * / M ⊙ ) − 10.0 where sSFR is the specific star formation and  * is the stellar mass.
Both can be obtained directly from the semi-analytical catalogue.
The joint effect of the ionising spectrum and its intensity can be described by the ionisation parameter, U. We adopt the empirical relations based on local observations (Kashino & Inoue 2019) to link the ionization parameter to the specific star formation rate, gas metallicity and hydrogen density as follows, log 10 U = −2.316− 0.36 0.69 + log 10 ( cold / ⊙ ) − 0.292 log 10  H /cm −3 + 0.428 log 10 sSFR/yr −1 + 9 .
(8) So far, we have obtained the input parameters for the corresponding galaxy.By interpolating within a pre-computed grid of line luminosities, as explained in the previous section, one can determine the luminosity of every emission line for each model galaxy.
The procedure for generating emission lines is generally similar to previous works (Orsi et al. 2014;Izquierdo-Villalba et al. 2019;Favole et al. 2020) but varies in details.Our pre-calculated emission line models rely on gas density, metallicity, and ionization parameter.These previous works assumed a fixed  H = 100cm −3 , with the photoionization parameter depends on gas metallicity.Consequently, their emission line models are solely influenced by metallicity.Furthermore, we employ the same BC03 SPS model and Chabrier IMF for determining both stellar SED and emission line luminosities, thus ensuring internal consistency within the model.Orsi et al. (2014), Izquierdo-Villalba et al. (2019), andFavole et al. (2020) utilize the Salpeter IMF for computing emission line ratios but switch to the Kroupa IMF for calculating the H luminosity.Baugh et al. (2022) introduced a distinct SPS model for stellar SED (M05) and nebular regions (BC03).The main difference between our approaches and Merson et al. (2018) lies in the use of different links between galaxy properties and the input parameters for CLOUDY.For example, they uses an average gas density, the total gas mass divided by the cubic radius, whereas we implemented a empirical scaling relation associating the SFR and stellar mass with typical gas density in the SFR vicinity.

Dust extinction
The extinction induced by dust significantly influences the observed spectra of galaxies, absorbing UV/optical photons and re-emitting them at longer wavelengths.Consequently, galaxies rich in dust tend  to have red colours even if they have a high SFR.In this work, we follow the dust model in H15, considering both the extinction from diffuse ISM (Devriendt et al. 1999) and from star-forming molecular clouds (Charlot & Fall 2000).The optical depth of dust, as a function of wavelength, is independently computed for each component; and a slab geometry is assumed to compute the total extinction of the relevant populations.
Firstly, we calculate the extinction caused by the diffuse ISM.The wavelength dependent optical depth of galaxy disks is assumed as follows: where is the extinction curve for solar metallicity taken from Mathis et al. (1983), (1 + ) −1 represents the redshift dependence,  gas is the metallicity of cold gas,  = 1.35 for  < 2000 and  = 1.6 for  > 2000.⟨  ⟩ is the mean column density of hydrogen: where  gas ,d is the scale-length of the cold gas disk, and  = 3.36.
It should be noted that in previous work (Guo et al. 2011;Henriques et al. 2015), although they claimed to adopt  = 1.68, the actual factor used in their code was approximately  ∼ 3.36.Therefore, we use 3.36 instead of 1.68 in our calculations.
Another source contributing to the extinction is the molecular cloud around young stars.Following Charlot & Fall (2000), we assume that only young stars born within the last 10Myr will suffer from such effects.The optical depth is calculated as follows: where  is given by a Gaussian distribution with a mean of 0.3 and a standard deviation of 0.2, truncated at the boundaries of 0.1 and 1.The final extinction in magnitude for each component is given by: where  is the inclination angle between the angular momentum of the disk and the z-direction of the simulation box, and   is the optical depth of the corresponding component.Young stars (age less than 10Myr) suffer from both extinction components, while older stars are affected by diffuse ISM only.In the case of emission lines, we only consider the extinction from molecular clouds, as we only calculate the emission of star-forming regions.
In the following sections, all results are calculated using the new model and parameter settings applied to the merger trees from JT1G, unless otherwise specified.

GENERAL GALAXY PROPERTIES
This section presents a comprehensive comparison of various galaxy properties between our model predictions and observational data.We categorise the comparison into two classes: one for properties utilised as constraints in our MCMC parameter adjustment, and properties directly related; and the other for properties that are not utilised as constraints.(Brinchmann et al. 2004;Salim et al. 2007), black lines depict H15 results and red lines are the results from our new model.

Galaxy stellar mass functions
Fig. 4 shows the galaxy stellar mass functions span redshifts from 0 to 3. The red lines depict our model results, while the black lines represent the results of MS using default H15 model.The purple symbols with error bars are the observational data points used in our MCMC procedures that were produced by combining various observational studies in an attempt to estimate systematic uncertainties in the constraints.Further details can be found in Appendix A of H15.Our predicted SMF aligns successfully with observational data across the entire stellar mass range, spanning from local to high redshift.Notably, at very low masses, our results exhibit a slightly steeper slope compared to observations.Both the new set of parameters and the higher resolution of the JT1G dark matter simulation compared to MS, where H15 was conducted, could potentially contribute to the observed differences.Our results outperform H15 for stellar masses beyond the knee of the SMF.This is because we have a much larger volume, eight times larger to be exact, which enables us to include more galaxies with high masses.

Star formation rate
The star formation rate (SFR) represents a fundamental statistical measure within galaxy formation theory.In this work, a passive galaxy is defined as having an sSFR less than 10 −11 yr −1 , where sSFR = SFR/ * denotes the specific star formation rate.We utilise the passive fraction at z = 0.4 as a constraint when adjusting the parameters.Fig. 5 displays the probability distribution function (PDF) of sSFR for different stellar mass bins at redshift 0. In accordance with H15, we assign a random Gaussian sSFR centred at log(sSFR) = −0.3log( * ) − 8.6 with a dispersion of 0.5 for model galaxies with log(sSFR) ≤ −12yr −1 .The shaded regions represent results from SDSS DR7 (Brinchmann et al. 2004;Salim et al. 2007), while the black lines are results from H15.Our new model predicts an sSFR distribution similar to H15 across the entire stellar mass range and shows a higher peak in sSFR for the subset of star-forming galaxies at low masses, which is more consistent with the SDSS data compared to H15.
In Fig. 6, we show the SFRF at redshift 0, with various observational data points (Mauch & Sadler 2007;Robotham & Driver 2011;Patel et al. 2013;Gruppioni et al. 2015;Marchetti et al. 2016;Zhao et al. 2020).It is noteworthy that there exists a significant discrepancy between different observational datasets, with variations of up to three orders of magnitude at the high SFR end.Our model results fall within the range covered by these observations, emphasising the need for more accurate measurements to better constrain theoretical models.

Red and blue galaxies
sSFR is closely correlated with galaxy colours, which are widely employed to distinguish the star formation states of galaxies.Therefore, we further study the stellar mass functions of red galaxies and blue galaxies at z = 0 and 0.4.We use the same colour cut  as H15 for segregating galaxies into red and blue by:  −  = 1.85 − 0.075 × tanh((M r + 18.07)/1.09).The upper and bottom panels in Fig. 7 show the SMF of red and blue galaxies, and the left and right columns present the findings at redshifts 0 and 0.4, respectively.Both our model results and those of H15 successfully reproduce the overall shape of the observations (Bell et al. 2003;Baldry et al. 2004;Muzzin et al. 2013;Ilbert et al. 2013).However, our model has slightly more red galaxies at the low mass end at z=0.Furthermore, the SMF of red galaxies experiences a noticeable decline at  * ∼ 10 10 M ⊙ for H15 at z = 0, which is absent in our model.
The discrepancy between model predictions and observations becomes more pronounced when expressed in terms of the red fraction as a function of stellar mass (see Fig. 8).The upper and bottom panels illustrate the red fraction as a function of stellar mass at redshift 0 and 0.4, respectively, with observational data points sourced from Bell et al. (2003); Baldry et al. (2004); Muzzin et al. (2013) and Ilbert et al. (2013).At redshift 0, our model's results align marginally with the observations.We overestimate the proportion of red galaxies at the lower mass end and underestimate the red fraction at intermediate masses.H15 performs better at low masses but deviates more from the observations at  * ∼ 10 10 M ⊙ .At redshift 0.4, both our model and H15 are in line with the observations, although at intermediate masses, our model exhibits slightly fewer passive galaxies compared to H15.These findings underscore the need for further investigation into the quenching mechanisms, particularly around the knee of the stellar mass function.

Supermassive black holes
It has been noted that H15 underestimates the mass of SMBH by approximately an order of magnitude (Wang et al. 2018;Yang et al. 2019), suggesting that in the default H15 model the black hole growth rate is too slow.As illustrated in the upper panel of Fig. 9, our model agrees well with the observed BHMF taken from Shankar et al. (2009), which is used in the MCMC procedures, while H15 significantly underestimates the BHMF by an order.The new parameter set significantly enhances the growth rate during mergers, while also reducing the efficiency of AGN feedback.This adjustment allows the SMBH to attain larger masses, suppressing star formation activities in massive galaxies.Modifications in these aspects were essential to achieve a more precise representation of the observed BHMF.
The bottom panel of Fig. 9 illustrates the SMBH mass versus bulge mass relation.Observational data is obtained from Häring & Rix (2004), and the pink shaded region represents the relation from Kormendy & Ho (2013).Our new model predicts larger SMBH masses at a given bulge mass than H15 and aligns more closely with the observations.This alignment is a direct consequence of our enhanced growth rate during mergers.Some earlier studies also focus on improving the modeling of SMBH growth in L-Galaxies.Izquierdo-Villalba et al. (2020) introduces a time delay for Eddington accretion, promotes galaxy mergers, and incorporates additional pathways for SMBH growth, like disc instabilities.Spinoso et al. (2023) improves the modelling of SMBH seeds through various formation channels.Izquierdo-Villalba et al. (2024) constructs a new framework by including the super-Eddington accretion events.All these advancements have resulted in better alignment of SMBH in L-Galaxies with observational data.

Galaxy vs. dark halo relations
The relationship between galaxy and dark halo properties represents one of the most fundamental connections in the field of galaxy for-    (Moster et al. 2018;Behroozi et al. 2019).The solid and dashed lines represent the median and 16%(84%) values.
The maximum velocity and virial masses were determined at  = 0 for central galaxies and at the last infall time for satellite galaxies.
mation.Fig. 10 illustrates these scaling relations by highlighting the correlation between galaxy stellar mass and both the maximum halo velocity and the maximum virial mass.The maximum velocity and virial masses were determined at z = 0 for central galaxies, while for satellite galaxies, they were determined at the last infall time.In general, both of these scaling relations demonstrate good agreement with the H15 models, particularly in terms of their slopes.With regard to the correlation between stellar mass ( * ) and virial mass ( vir ), the newly proposed model showcases somewhat enhanced stellar masses for JT1G galaxies in comparison to the H15 models.This suggests a higher galaxy formation efficiency in the new model.It is noteworthy to mention that despite the minor variations, both model forecasts agree with direct measurements obtained using local data (Ristea et al. 2024) and abundance matching methodologies (Moster et al. 2018;Behroozi et al. 2019).

Gas metallicity
Fig. 11 depicts the relationship between gas metallicity and stellar mass.The metallicity of the gas ( gas ) is determined by the ratio of the metal mass in the cold gas to the mass of the cold gas.Then it is converted to oxygen abundance using the formula 12 + log 10 (O/H) gas = 8.69 + log 10 ( gas / ⊙ ).It is evident that our newly developed model exhibits somehow higher gas metallicity compared to H15 across a wide range of masses.This characteristic enables it to be more in line with the observations (Tremonti et al. 2004) made at high masses.

Evolution of SFR
In Fig. 12, we compare the model-predicted star formation rate density (SFRD) as a function of redshift to various observational results (Sanders et al. 2003;Takeuchi et al. 2003;Schiminovich et al. 2005;Wyder et al. 2005;Dahlen et al. 2007;Reddy & Steidel 2009;Karim et al. 2011;Robotham & Driver 2011;Magnelli et al. 2011;Bouwens et al. 2012;Cucciati et al. 2012;Gruppioni et al. 2013;Magnelli et al. 2013;Schenker et al. 2013) from redshift 0 to 6.The gray line represents the best-fitting result from Madau & Dickinson (2014).The observed SFRD peaks at a redshift of 2 gradually, which diminishes in magnitude when moving towards either higher or lower redshift values.The results of our simulation generally reproduce this trend, with specific values falling within the observational constraints.This suggests that our model effectively traces the evolutionary history of the SFR in the universe.More stars are formed in the new model on JT1G compared to H15 on MS below redshift 4.

Galaxy correlation functions
The study of galaxy correlation functions encompasses the analysis of the spatial distribution of galaxies, which in turn provides valuable information on the distribution of matter.We use the Landy Various observational data points are included (Sanders et al. 2003;Takeuchi et al. 2003;Schiminovich et al. 2005;Wyder et al. 2005;Dahlen et al. 2007;Reddy & Steidel 2009;Karim et al. 2011;Robotham & Driver 2011;Magnelli et al. 2011;Bouwens et al. 2012;Cucciati et al. 2012;Gruppioni et al. 2013;Magnelli et al. 2013;Schenker et al. 2013).The gray line represents the best fitting result from Madau & Dickinson (2014). subsamples: where  p is the distance along the projected direction and   is distance along the line-of-sight.    ,     ,     , and     are the normalised pair counts for data x-data y, data x-random y, data y-random x and random x-random y.Here x, and y indicate different subsamples.When x=y it becomes an auto-correlation function.The real-space projected correlation function

𝑥 𝑦
p  p is calculated by integrating     p ,   along the line-of-sight (Davis & Peebles 1983): where  ,max = 40 Mpc/h.We adopt the same  p and   values as in Li et al. (2006) to facilitate a direct comparison of our results with theirs.Specifically, we employ 28  p bins spanning from 0.1-50 Mpc/h with equal logarithmic intervals and 40   bins covering from 0-40 Mpc/h with equal linear intervals.The Corrfunc code (Sinha & Garrison 2020) is utilised for the calculation of the correlation function.
Given that galaxy clustering can be strongly influenced by their mass, we have categorised our model galaxies into various bins based on their stellar masses.Fig. 13 illustrates the comparison between the projected autocorrelation functions of our simulated galaxies and those observed in SDSS DR7.Each panel within the figure represents a distinct stellar mass range, shown in the upper right corner, with the black lines representing the results for the entire galaxy sample.Notably, the results for red and blue galaxies are depicted by red and blue lines, respectively.The symbols accompanied by error bars in the figure signify the measurements obtained from Li et al. (2006).
Our catalogue effectively reproduces the projected autocorrelation  function across the majority of stellar mass ranges while also capturing its dependence on colour.However, certain discrepancies between the simulation and the observations are evident.Specifically, at larger radii, we observe deviations for galaxies smaller than 10 10 M ⊙ .This could potentially be attributed to the limited volume occupied by these faint galaxies, thereby experiencing a notable cosmic variance effect.Furthermore, we note differences between the predictions of our model and the observations at smaller radii for galaxies exceeding 10 10.5 M ⊙ , indicating a significant proportion of satellite galaxies.By segregating red galaxies from blue galaxies, we find that the more strongly clustering of red galaxies contributes more prominently to the overall excess observed at smaller radii.
In summary, our refined model, incorporating adjusted parameters, excels in reproducing a wide range of observational properties: the stellar mass functions from local to redshift 3, the bimodal colour distributions of galaxies, the black hole mass functions, as well as other galaxy-halo relations and the clustering of galaxies, etc.

PROPERTIES OF VARIOUS EMISSION LINES
In this section, we first demonstrate that our galaxy catalogue meets the requirement for the next generation of large-scale surveys.We then present the model predictions of emission lines in comparison with observations.In Sec.4.2 we show the luminosity function of H, [OII], [OIII], and [OIII] + H, comparing our model results with a set of observations from local to high redshift.In Sec.4.3, we further compare the projected correlation function of  [OII] -selected samples with observations.Finally, in Sec.4.4, we show the model predicted bias of galaxies as a function of luminosity for different emission lines.

Luminosity functions for the next generation of large-scale surveys
Predictions for the luminosity function (LF) of various surveys can provide more direct forecasts for future observations.We convolve the filter functions of various surveys with the simulated galaxy SED to obtain the observed-frame LF in different bands in Fig. 14.It includes the luminosity functions in the Euclid  , , and  bands, in the LSST , , , , , and  bands, and in the CSST , , , , , , and  bands across redshifts from 0 to 3. All magnitudes account for both the two components of dust extinction mentioned in Sec.2.4 and include the contributions from various emission lines.The bottom x-axis displays apparent magnitude, while the top x-axis represents absolute magnitude.The two vertical dashed gray lines on the figures signify the detection limits of the CSST main survey (apparent magnitude of 26.5) and the deep field survey (apparent magnitude of 28).The transition point in the luminosity function, where the number density starts to decrease towards the dimmer end, serves as an approximate indicator of completeness.It happens at about 26.5 magnitudes at z = 0.24, and around 28 magnitudes at z = 0.5.Our simulation is deemed complete for the main survey above a redshift of 0.3 and the deep field survey above a redshift of 0.5.Similar completeness is expected for other surveys.We note that most tracers are above  ∼ 0.5.Meanwhile, the large box size of JT1G (1 Gpc/h) makes it suitable for studying large-scale structures.Therefore, our results offer a reasonably comprehensive prediction for the outcomes of the next generation of large-scale surveys.

Luminosity Function of Various Emission Lines
Fig. 15 shows the luminosity function of H, [OII], [OIII], and [OIII]+H from redshift 0 to 3. The model predictions incorporate dust attenuation from young birth cloud as mentioned in Sec.2.4 to align with direct observational symbols obtained from various sources.For clarity, we selected four representative redshifts to display for each line, ensuring the inclusion of corresponding observational data.Each observational point is presented in panels corresponding to the nearest redshift values.The first row in Fig. 15 presents the LF of H at z ∼ 0, 0.5, 1, and 2. Observational data points are collected from Ly et al. (2007); Gilbank et al. (2010); Drake et al. (2013); Sobral et al. (2013Sobral et al. ( , 2015)); Hayashi et al. (2018); Khostovan et al. (2020); Hayashi et al. (2020).Our model effectively reproduces the observed H LF across a wide luminosity spectrum, ranging from 10 39 erg • s −1 to 10 43 erg • s −1 , covering redshifts from 0 to 2. Since the luminosity of H  is directly related to the SFR by Eq. 5, the successful reproduction of the H  LF suggests that our model can accurately emulate the overall SFRF from redshift 0 to 2.
The second row in Fig. 15 2020); Cedrés et al. (2021).The overall agreement between observations and simulation is satisfactory, although there is a slight overestimation of the LF at the knee at low redshift.It is worth noting that limitations arise from the survey volume, which only ranges from tens of thousands to several hundred thousand Mpc 3 .Due to this constraint, the survey is susceptible to cosmic variance at these luminosities.
In summary, our study successfully reproduces the observed luminosity function for a set of emission lines from the local universe to redshift 3.This achievement suggests that both our semi-analytic galaxy model and the emission line model align with the actual universe.

Clustering of ELG
We investigate the clustering of emission line galaxies by calculating their projected auto-and cross-correlation functions for various subsamples.Following the methodology outlined in Gao et al. (2022), we partition our simulated galaxies at redshift 0.76 into distinct subsamples based on their [OII] luminosity and stellar mass.In practice, we create four samples based on [OII] luminosity, denoted as 0, 1, 2, and 3.For the cross-correlation functions, we generate four samples based on their stellar masses, denoted as 0, 1, 2, and 3.Detailed information is provided in Table 4.
(2022), derived from the VIMOS Public Extragalactic Redshift Survey (VIPERS, Scodeggio et al. 2018).In each panel, the black line represents the auto-correlation function of the specified [OII] luminosity sample.Our model successfully replicates the auto-correlation functions of galaxies within different [OII] luminosity bins across a broad range of radii, including one-halo and two-halo terms.The cross-correlation functions between the subsamples selected based on [OII] luminosity and subsamples selected based on stellar mass are depicted by the cyan/yellow/green/purple lines.It should be noted that these cross-correlation lines have been appropriately shifted by a factor of 2  , where n corresponds to the specific designation of the sample, M0, M1, M2, M3. Results from different stellar mass selected samples are presented using different colours as denoted in the bottom left corner of each panel.Overall, our model predictions exhibit an excellent agreement with observations for all subsamples.
The only exception appears for the projected cross-correlation functions between M3 and L3, where the model prediction is higher at small scales compared to observations, implying a stronger star formation close to massive central galaxies.

Bias of different emission line tracers
Emission line galaxies are one of the main targets for the next generation of large-scale surveys.It is essential to understand their bias relative to the matter distribution.Galaxy bias is often used in various cosmological probes, including baryonic acoustic oscillation, redshift distortion, etc.The bias is defined using : where   is the 3D correlation function of the galaxy sample, and   is the correlation function of the total matter.A value of  = 1 signifies that the particular galaxy sample traces the distribution of the total matter, while when b is greater or less than 1, it means that the particular galaxy sample is more or less strongly clustered than the total mass.
Here we calculate the bias of emission line galaxies as a function of luminosity and redshift.We split the simulated ELGs into bins of 0.2 dex within the luminosity range from 10 41 to 10 44 erg • s −1 .We focus on four of the most luminous lines, H, H, [OII], and [OIII].
Figure 16.Real-space cross (auto) correlation functions for subsamples L0, L1, L2, and L3 at redshift 0.76 presented in different panels.Observed data points with error bars are from Gao et al. (2022), derived from VIPERS at redshift 0.5 -0.8.In each panel, the black line represents the autocorrelation function of the specified [OII] luminosity subsample.Cyan/yellow/green/purple lines show the cross-correlation functions between the given [OII] luminosity-selected subsamples and stellar mass-selected subsamples M0/M1/M2/M3.These cross-correlation lines are shifted by a factor of 2 n , where n varies with colour (n=1(cyan), n=2(yellow), n=3(green), n=4(purple)) for clear illustration.
Bins containing fewer than 1000 galaxies are excluded.We compute the average bias over the range 20 Mpc/h to 60 Mpc/h where the bias is relatively constant and denote it as  gm .Fig. 17 presents  gm as a function of luminosity at different redshift 0 to 3 for H, H, [OII], and [OIII] selected galaxies.The errors are Poisson errors.Fig. 17 shows that for H selected galaxies at z<1 the bias exhibits a flat slope with luminosity for less luminous objects (< 10 42.5 erg • s −1 ) and undergoes a sharp increase at high luminosity.The luminosity transition increases as redshifts increase and disappears at z>1, where the bias shows a more gradual increase with luminosity across the entire range of luminosity.At low luminosities, the bias monotonically increases with redshifts.However, at high luminosities, the bias decreases with redshifts up to z ∼ 1 and then increases with redshift above z∼1.This is due to the sharp increase in luminosity at low redshifts which disappears at high redshifts.Similar features are observed in galaxies selected based on other emission lines, although the exact redshift and luminosity dependence differ among different line selections.For instance, the transition luminosity is considerably smaller in galaxies selected based on [OIII] emission line compared to H selected galaxies.
The variation in line selections can be understood as different line emissions corresponding to different physical conditions.Lines with the same luminosity may be hosted by galaxies with different properties.For example, for H ∼ 10 43 erg • s −1 , the typical galaxy stellar mass is 10 11.43 M ⊙ , SFR is 51.26 M ⊙ yr −1 , and halo mass is 10 13.49 M ⊙ .In contrast, for [OIII] with the same luminosity, the typical galaxy stellar mass is 10 11.02 M ⊙ , SFR is 65.81 M ⊙ yr −1 , and halo mass is 10 13.06 M ⊙ .
Detailed comparison among different emission lines is presented in Fig. 18.At redshift 0, it is evident that for H and [OII] selected galaxies, there is almost no dependence on emission line luminosity up to 10 42.5 erg • s −1 .The typical halo mass is 10 12.6 M ⊙ for those with luminosity around the transition.However, at higher luminosities, the bias increases rapidly, reaching ∼ 4 at 10 43 erg • s −1 , corresponding to a typical halo mass of 10 13.5 M ⊙ .At z=0, [OII] selected galaxies always have a similar bias compared to H selected galaxies.H selected galaxies exhibit a comparable bias to H selected and [OII] selected galaxies with luminosities below 10 42 erg • s −1 .The corresponding halo mass at the transition point is 10 12.55 M ⊙ , which is also similar to the halo mass of the H selected and [OII] selected galaxies.Beyond this luminosity threshold, the H selected galaxies experience a rapid increase in bias and surpass the bias of the H selected and [OII] selected galaxies, reaching a bias of approximately 2 at 10 42.5 erg • s −1 .The transition luminosity happens at even smaller luminosity for [OIII] selected galaxies.Surprisingly, we find that for galaxies with the least luminous [OIII] emissions, the bias is well below 1, also much lower than H, H and [OII] selected galaxies of the same luminosity.The corresponding typical stellar mass is 10 10.24 M ⊙ , SFR is 3.24 M ⊙ yr −1 , and halo mass is 10 11.88 M ⊙ .Such relations are also observed at other redshifts.
The relative bias of [OIII] selected galaxies compared to other line-selected galaxies varies with luminosities and redshifts.At high luminosities, H and [OII] selected galaxies have a lower bias than [OIII] selected galaxies at all redshift of interest.Conversely, at low luminosities, the [OIII] selected galaxies have a lower bias below z = 1.The transition luminosity increases with increasing redshifts.At higher redshifts, [OII] selected galaxies have a similar bias compared to [OIII] selected galaxies.
For less luminous ELGs, especially at low redshifts, their bias dependence on luminosity is weak, posing challenges in modelling their spatial distribution using halo occupation models and abundance matching techniques.To quantify the relationship between  gm and luminosity at different redshift, we employ a combined power-law and exponential function to fit  gm (): where Φ L ,  * ,  and  are free parameters.We set a minimum value of log( * ) = 40 since we do not use data from lower luminosity bins.
The fitting results are presented in Fig. 17 and Fig. 18 by solid lines.Our fitting formula consistently reproduces the observed bias correlation at all luminosities across all redshifts.The only exception appears at the highest luminosity end for some redshifts, where the sample size is relatively too small.The fitting parameters as a function of redshift are listed in Table 5.
In summary, in combination with the SAM and emission line models, we successfully reproduce most of the observed properties of emission line galaxies properties, including their luminosity functions, correlation functions, and their evolution with redshifts.At high redshifts, their bias increases with both redshift and luminosity, while at low redshift, the bias is a decreasing function of redshift, attributed to the stronger dependence on luminosity towards lower redshifts.Across all redshifts, the luminosity dependence of galaxy bias is weak below 10 42 erg • s −1

CONCLUSIONS
Our investigation focuses on generating a simulated galaxy catalogue for next-generation surveys, especially to include emission line galaxies in a self-consistent way.We solve the time convergence issue in the widely used semi-analytic model L-Galaxies and employ it in the JiuTian-1G simulation.Furthermore, we compute the luminosity of various emission lines, enabling predictions about emission line galaxies.We further study the clustering and bias of different emission lines and provide a fitting formula for bias as a function of luminosity and redshift.Our catalogue successfully reproduces various observational properties.The main conclusions of our study can be summarized as follows: (i) We observe a significant convergence problem in the L-Galaxies model presented by Henriques et al. (2015) when applied to dark matter merger trees with varying time intervals.This issue stems from the disruption model being exclusively implemented at the end of each snapshot gap.Therefore, merger trees with fewer snapshots tend to experience reduced disruption and an increased number of mergers.As mergers predominantly contribute to the growth of SMBH, a higher frequency of mergers results in more massive SMBHs, leading to more efficient AGN feedback and, consequently, less massive galaxies.By modifying the disruption model in L-Galaxies, We successfully achieve excellent convergence in simulations with merger trees of varying time intervals.
(ii) Our adapted model is applied to the JiuTian-1G simulation, and the corresponding parameters were readjusted.Our catalogue successfully reproduces numerous statistical observational properties and accurately captures the clustering patterns of diverse galaxy samples.In particular, it has been able to replicate the SMBH mass function, which was underestimated by H15 by an order of magnitude.
(iii) We demonstrate that in combination with the high resolution large boxsize JT1G simulation, L-Galaxies has successfully generated a galaxy catalogue that fulfils most of the requirements of next-generation large-scale surveys.
(iv) We compute the luminosity of 13 commonly used NUV and optical emission lines.Our model effectively reproduces the observed luminosity function of H, H, [OII], and [OIII].Additionally, the projected correlation of [OII] ELGs shows good agreement with observations.
(v) We further explore the bias of emission line galaxies as a function of luminosity and redshift.The dependence varies with luminosity ranges and redshifts.We observe that at low redshift, the bias of galaxies with low luminosity shows insensitivity to luminosity, while it increases rapidly at the high luminosity end.At high redshift, bias gradually increases with luminosity.Above z=1, galaxy bias increases monotonically with redshift, while below z=1, such a  (vi) We offer fitting formulas that capture the dependence of bias on both luminosity and redshifts.
In conclusion, our adapted model successfully replicates various galaxy observational properties, and the predictions from our emission line model align well with observations.The bias has a complex dependence on luminosity and redshift, which varies with luminosity range and redshift range.
Due to the limitation of storage, we only present photometric magnitude from several surveys and the luminosity of 13 emission lines.Additional photometric magnitude, emission lines, and full SED are available upon request.prehensive and resolution-independent coverage for galaxies with stellar masses above 10 9 M ⊙ and SFR exceeding 1 M ⊙ yr −1 , up to redshift 3.

APPENDIX C: SEDS OF TWO TYPICAL GALAXIES
In general, we can calculate the galaxy SED with emission lines for each galaxy in the catalogue.Here, the SEDs of a typical starforming galaxy and a passive galaxy are presented in Fig. C1 using the method described in Secion 2.3.We include the 13 emission lines in the SED for star-forming galaxies.It illustrates the declining feature with increasing wavelength for the star-forming galaxy, the D4000 break and UV-upturn for the passive galaxy.

Figure 1 .
Figure 1.An example of constructing merger trees with varying time intervals: using the whole 257 snapshots, we create the Mini-Hyper-257, represented by the blue lines.By skipping half of the snapshots, we construct the Mini-Hyper-129, shown by the red lines.Continuing this pattern, we skip half of the Mini-Hyper-129 snapshots to generate the Mini-Hyper-65 (green lines).We employ a similar methodology to generate merger trees with fewer snapshots, Mini-Hyper-33(purple lines).

Figure 2 .
Figure 2. Galaxy properties predicted by the original H15 model and modified model in Mini-Hyper simulations with different snapshots.The black lines represent the H15 results, while the red lines represent results from our modified model.Solid lines correspond to Mini-Hyper-33, dashed lines to Mini-Hyper-65, dotted lines to Mini-Hyper-129, and dotted-dashed lines to Mini-Hyper-257.Left panel: Stellar Mass Function (SMF); middle panel: Star Formation Rate Function (SFRF); right panel: Black Hole Mass Function (BHMF).The bottom row quantifies the differences normalised to the Mini-Hyper-257 simulation.

Figure 3 .
Figure 3.The classic BPT diagram illustrating the pre-computed grid of emission line luminosities.The x-axis represents [NII]6584/H, while the y-axis represents [OIII]5007/H.The gray grid with blue symbols denote the grid corresponding to a hydrogen density of  H = 10cm −3 , with variations in metallicity  and ionisation parameter U. Additionally, the black lines with red symbols depict the emission line ratio grid from Byler et al. (2017).

Figure 4 .
Figure 4. Galaxy Stellar Mass Function (SMF) spanning redshift 0 to 3. The red lines represent the results of our model, while the black lines represent the results of MR from H15. Purple symbols with error bars denote observational data points used in our MCMC procedures, combining various studies to estimate systematic uncertainties (see Appendix A of H15 for details).

Figure 5 .
Figure 5. Probability Distribution Function (PDF) of specific star formation rate for different stellar mass bins at redshift 0. The stellar mass ranges are shown in the upper-left corner of each panel.Gray shaded regions represent SDSS DR7 results(Brinchmann et al. 2004;Salim et al. 2007), black lines depict H15 results and red lines are the results from our new model.

Figure 9 .
Figure 9. Upper panel: Black Hole Mass Function at redshift 0. Purple dots represent the observed results of Shankar et al. (2009).Lower panel: Black Hole mass versus bulge mass relation.Observational data points from Häring & Rix (2004) (purple dots) and the relation from Kormendy & Ho (2013) (pink shaded region) are shown for comparison.The solid and dashed lines represent the median and 16%(84%) values.

Figure 10 .
Figure 10.Upper panel: the maximum velocity -stellar mass relation, compared with local results from SDSS-MaNGA(Ristea et al. 2024).The solid and dashed lines represent the median and 16%(84%) values.Bottom panel: the virial mass -stellar mass relation, compared with results from abundance matching methods(Moster et al. 2018;Behroozi et al. 2019).The solid and dashed lines represent the median and 16%(84%) values.

Figure 11 .
Figure11.The relationship between metallicity and stellar mass.The solid and dashed lines represent the median and 16%(84%) values.The gas metallicity ( gas ) is determined by the ratio of the metal mass in cold gas to the mass of cold gas and converted to oxygen abundance using the formula 12 + log 10 (O/H) gas = 8.69 + log 10 ( gas / ⊙ ).The purple line shows the result fromTremonti et al. (2004).

Figure 13 .
Figure13.Projected auto-correlation functions of simulated galaxies compared with observations in SDSS DR7.Different panels represent different stellar mass ranges, with black lines depicting results for the entire galaxy sample.The stellar mass ranges are shown in the upper right corner of each panel.Red and blue lines represent results for red and blue galaxies, respectively.Symbols with error bars denote measurements fromLi et al. (2006).

Figure 14 .
Figure14.Luminosity functions (LF) for various surveys in observed-frame across redshifts from 0 to 3. The panels show the LF for the  , , and  bands of EUCLID, and the , , , , , and  bands of LSST, and the , , , , , , and  bands of CSST.All magnitudes account for dust extinction mentioned in Sec.2.4.The bottom x-axis displays apparent magnitude, while the luminosity distance is computed from the corresponding redshift.The top x-axis represents absolute magnitude.The two vertical dashed gray lines signify the detection limits of the CSST main survey (apparent magnitude of 26.5) and the deep field survey (apparent magnitude of 28).
Fig.15shows the luminosity function of H, [OII],[OIII], and [OIII]+H from redshift 0 to 3. The model predictions incorporate dust attenuation from young birth cloud as mentioned in Sec.2.4 to align with direct observational symbols obtained from various sources.For clarity, we selected four representative redshifts to display for each line, ensuring the inclusion of corresponding observational data.Each observational point is presented in panels corresponding to the nearest redshift values.The first row in Fig.15presents the LF of H at z ∼ 0, 0.5, 1, and 2. Observational data points are collected fromLy et al. (2007);Gilbank et al. (2010);Drake et al. (2013);Sobral et al. (2013Sobral et al. ( , 2015));Hayashi et al. (2018);Khostovan et al. (2020);Hayashi et al. (2020).Our model effectively reproduces the observed H LF across a wide luminosity spectrum, ranging from 10 39 erg • s −1 to 10 43 erg • s −1 , covering redshifts from 0 to 2. Since the luminosity of H  is directly related to the SFR by Eq. 5, the successful reproduction of the H  LF suggests that our model can accurately emulate the overall SFRF from redshift 0 to 2.The second row in Fig.15displays the LF of [OII] at z ∼ 0.5, 1, 1.5, and 2. The [OII] luminosity is computed as the sum of the [OII] doublet.Observed data points are collected from Ly et al. (2007); Zhu et al. (2009); Gilbank et al. (2010); Ciardullo et al. (2013); Drake et al. (2013); Sobral et al. (2015); Comparat et al. (2015); Khostovan et al. (2015); Hayashi et al. (2018); Khostovan et al. (2020); Hayashi et al. (2020); Cedrés et al. (2021).The overall agreement between observations and simulation is satisfactory, although there is a slight overestimation of the LF at the knee at low redshift.It is worth noting that limitations arise from the survey volume, which only ranges from tens of thousands to several hundred thousand Mpc 3 .Due to this constraint, the survey is susceptible to cosmic variance at these luminosities.The third row in Fig.15illustrates the LF of the [OIII] doublet at z ∼ 0.5, 0.76, 1, and 1.5.Observational data points are collected fromLy et al. (2007);Drake et al. (2013);Hayashi et al. (2018);Khostovan et al. (2020);Hayashi et al. (2020).The last row shows the luminosity function of [OIII]5007+H at z ∼ 0.75, 1.5, 2, and 3, with observational points collected fromSobral et al. (2015);Khostovan et al. (2015Khostovan et al. ( , 2020)).Our model predictions demonstrate consistency with the observed [OIII] doublet and [OIII]5007+H LFs within a wide range of luminosity and redshift.In summary, our study successfully reproduces the observed luminosity function for a set of emission lines from the local universe to redshift 3.This achievement suggests that both our semi-analytic galaxy model and the emission line model align with the actual universe.

Figure 15 .
Figure 15.Luminosity function of H  , [OII],[OIII], and [OIII] + H from redshift 0 to 3. The model predictions include dust attenuation from young birth cloud to match direct observational symbols.Each row corresponds to a different emission line, while each panel in the row represents a specific redshift.The first row displays the H luminosity function at redshifts of approximately 0, 0.5, 1, and 2. The second row presents the [OII] doublet luminosity function at redshifts of approximately 0.5, 1, 1.5, and 2. The third row exhibits the [OIII] doublet luminosity function at redshifts around 0.5, 0.76, 1, and 1.5.The last row illustrates the luminosity function of[OIII]5007 + H at redshifts of approximately 0.75, 1.5, 2, and 3. Observational data points are taken from various studies(Ly et al. 2007;Gilbank et al. 2010;Drake et al. 2013;Sobral et al. 2013Sobral et al. , 2015;;Zhu et al. 2009;Ciardullo et al. 2013;Comparat et al. 2015;Khostovan et al. 2015;Hayashi et al. 2018;Khostovan et al. 2020;Hayashi et al. 2020;Cedrés et al. 2021).Despite the general good agreement, there are slight discrepancies, particularly at the low-redshift knee of the [OII] luminosity function.

Figure 17 .
Figure 17.Mean galaxy-matter bias ( gm ) as a function of luminosity from redshift 0 to 3 for H, H, [OII], and [OIII] selected galaxies.Different panels represent different emission lines.Solid lines represent the fitting results using Equation 16.

Figure 18 .
Figure 18.Mean galaxy-matter bias ( gm ) as a function of luminosity for H, H, [OII], and [OIII] from redshift 0 to 3. Different panels represent different redshifts.Solid lines represent the fitting results using Equation 16.

Figure B1 .
Figure B1.Performance of our model in predicting the stellar mass function (SMF) from redshift 0 to 3 on both the JT1G and MSII simulations.The red curves represent the JT1G simulation, while the black curves represent the MSII simulation.The upper row depicts the SMF, and the lower row illustrates the ratio of JT1G to MSII.The two horizontal dashed lines in the upper panels represent one galaxy and ten galaxies in MSII, respectively.

Figure C1 .
Figure C1.Upper panel: Full SED of a typical star-forming galaxy in our catalogue with emission lines.The stellar mass is 10 10.88 M ⊙ and the star formation rate is 51.70 M ⊙ yr −1 .Bottom panel: Full SED of a typical quenched galaxy in our catalogue.The stellar mass is 10 10.77 M ⊙ and the star formation rate is 0 M ⊙ yr −1 .

Table 2 .
Henriques et al. (2015)arameter estimation.The best-fit values of parameters are compared with the values published inHenriques et al. (2015).

Table 3 .
Details of the 13 emission lines provided by our study.It is noteworthy that we use the same IMF in this  (H)−SFR relation as employed in both the stellar component and the photoionization model, which ensures the overall consistency of our model.When paired with the line-ratio produced by CLOUDY as outlined in Sec.2.3.2,we are able to calculate the luminosity of any specific emission line at   .=  (H) ×    , U, ,  H ,(6)where    , , ,  H is the CLOUDY predicted ratio of the desired emission line at wavelength   to H with a given set of U,  and  H .We include the 13 most widely emission lines in NUV and optical ranges in the final catalogue 1 , as listed in Table3.Additional emission lines can be provided upon request.

Table 4 .
Details of the  [OII] -selected sample and stellar mass-selected sample.
. Each panel is for one particular [OII] luminosity bin.Observed data points with error bars are obtained from Gao et al.