Signatures of Hierarchical Mergers in Black Hole Spin and Mass distribution

Recent gravitational wave (GW) observations by LIGO/Virgo show evidence for hierarchical mergers, where the merging BHs are the remnants of previous BH merger events. These events may carry important clues about the astrophysical host environments of the GW sources. In this paper, we present the distributions of the effective spin parameter ($\chi_\mathrm{eff}$), the precession spin parameter ($\chi_\mathrm{p}$), and the chirp mass ($m_\mathrm{chirp}$) expected in hierarchical mergers. Under a wide range of assumptions, hierarchical mergers produce (i) a monotonic increase of the average of the typical total spin for merging binaries, which we characterize with ${\bar \chi}_\mathrm{typ}\equiv \overline{(\chi_\mathrm{eff}^2+\chi_\mathrm{p}^2)^{1/2}}$, up to roughly the maximum $m_\mathrm{chirp}$ among first-generation (1g) BHs, and (ii) a plateau at ${\bar \chi}_\mathrm{typ}\sim 0.6$ at higher $m_\mathrm{chirp}$. We suggest that the maximum mass and typical spin magnitudes for 1g BHs can be estimated from ${\bar \chi}_\mathrm{typ}$ as a function of $m_\mathrm{chirp}$. The GW data observed in LIGO/Virgo O1--O3a prefers an increase in ${\bar \chi}_\mathrm{typ}$ at low $m_\mathrm{chirp}$, which is consistent with the growth of the BH spin magnitude by hierarchical mergers, at $\sim 2 \sigma$ confidence. A Bayesian analysis suggests that 1g BHs have the maximum mass of $\sim 15$--$30\,M_\odot$ if the majority of mergers are of high-generation BHs (not among 1g-1g BHs), which is consistent with mergers in active galactic nucleus disks and/or nuclear star clusters, while if mergers mainly originate from globular clusters, 1g BHs are favored to have non-zero spin magnitudes of $\sim 0.3$. We also forecast that signatures for hierarchical mergers in the ${\bar \chi}_\mathrm{typ}$ distribution can be confidently recovered once the number of GW events increases to $\gtrsim O(100)$.


INTRODUCTION
Recent detections of gravitational waves (GWs) by LIGO (Aasi et al. 2015) and Virgo (Acernese et al. 2015) have shown evidence for a high rate of black hole (BH)-BH and neutron star (NS)-NS mergers in the Universe (Abbott et al. 2019;Venumadhav et al. 2019;Abbott et al. 2020b). However, proposed astrophysical pathways to mergers remain highly debated. Indeed there are currently a large number of such possible pathways, with widely different environments and physical processes. A possible list of these currently includes isolated binary evolution (e.g. Dominik et al. 2012;Kinugawa et al. 2014;Belczynski et al. 2016;Spera et al. 2019) accompanied by mass transfer (Pavlovskii et al. 2017;Inayoshi et al. 2017;van den Heuvel et al. 2017), common envelope ejection (e.g. Paczynski 1976;Ivanova et al. 2013), envelope expansion (Tagawa et al. 2018), chemically homogeneous evolution in a tidally distorted binary Marchant et al. 2016  , and interaction in active phases of galactic nucleus (AGN) disks (e.g. Bartos et al. 2017;Stone et al. 2017;McKernan et al. 2018;Tagawa et al. 2020b).
In AGN disks, hierarchical mergers are predicted to be frequent due to the high escape velocity and efficient binary formation and evolution facilitated by gaseous (Yang et al. 2019;McKernan et al. 2020b) and stellar interactions (Tagawa et al. 2020b). Yang et al. (2019) and McKernan et al. (2020b,a) identified the expected mass ratio and spin distribution of hierarchical mergers in hypothetical migration traps (MTs) of AGN disks, defined to be regions where objects accumulate rapidly as they interact with the accretion disks analogously to planetary migration. 1 Tagawa et al. (2020aTagawa et al. ( , 2021b showed that hierarchical mergers take place in AGN disks without MTs and derived the corresponding mass and spin distributions self-consistently. In the latter models (e.g. Tagawa et al. 2020a), the mass and spin distributions of merging BHs are significantly different compared to those in the former models. This is mainly due to binary-single interactions which take place frequently at large orbital radii where the gas density is very low and gas effects drive the binaries toward merger more slowly and allow ample time for such binary-single interactions.
Several authors have investigated the properties of GWs associated with hierarchical mergers (Gerosa & Berti 2017;Yang et al. 2019;Kimball et al. 2020a;Doctor et al. 2020). Gerosa & Berti (2017) estimated the fraction of future detected sources contributed by hierarchical mergers under the assumption that first-generation (1g) BHs have a flat spin distribution and binary components are drawn independently. Fishbach et al. (2017) estimated the required number of events to detect hierarchical mergers using the distribution of the BH spin magnitudes. Doctor et al. (2020) constructed a toy model to obtain the properties of hierarchical mergers from the distribution of subpopulations for BHs under various assumptions for coagulation and depletion in the population and constrained parameters using LIGO/Virgo O1-O2 data. Kimball et al. (2020a) examined whether the observed events in the same catalog are compatible with hierarchical mergers particularly in GCs. These models found no evidence for a high rate of hierarchical mergers in this early catalog. More recently, by analyzing the ensemble of events detected during LIGO/Virgo's O1-O3a observing runs, Kimball et al. (2020b) and Tiwari & Fairhurst (2020) found preference for at least one, but probably multiple hierarchical mergers in the detected 1 Note that the orbital radii where this takes place were derived by assuming Type-I migration (Bellovary et al. 2016), but these assumptions may be inconsistent for BHs embedded in AGN disks as gaps may be opened in the accretion disks (see e.g. Eqs. 45-46 Kocsis et al. 2011). Also, Pan & Yang (2021) found that the traps can disappear if radiation pressure is correctly accounted for.
sample. The conclusion of Kimball et al. (2020b) strongly depends on the assumed escape velocity in the host environment, with higher escape velocities favoring a larger number of hierarchical mergers.
In this paper, we focus on distributions of the effective and precession spin parameters (χ eff and χ p ) and the chirp mass (m chirp ), and predict characteristic features in them expected from hierarchical mergers. We use m chirp as this variable is most precisely determined by GW observations, and χ eff and χ p as these characterize the BH spin magnitudes in a binary. Here, χ eff and χ p are defined as and χ p = max a 1 sinθ 1 , q 4q + 3 4 + 3q a 2 sinθ 2 (2) (Hannam et al. 2014;Schmidt et al. 2015), where m 1 and m 2 are the masses, a 1 and a 2 are the spin magnitudes, θ 1 and θ 2 are the angles between the orbital angular momentum directions and the BH spins of the binary components, q ≡ m 2 /m 1 ≤ 1 is the mass ratio, and m chirp ≡ (m 1 m 2 ) 3/5 (m 1 + m 2 ) −1/5 . We identify and characterize features expected in hierarchical mergers using mock GW data, and find that intrinsic properties (maximum mass and typical spin magnitude) of 1g BHs can be constrained by recovering the features, which enables us to distinguish astrophysical models. By analyzing the GW data obtained in LIGO/Virgo O1-O3a, we investigate whether such features are consistent with observed GW data, and identify the astrophysical population models most consistent with the data. Finally, using mock GW data, we estimate how well parameters characterizing the spin distribution can be recovered in future catalogs depending on the number of events. The paper is organized as follows. In § 2, we describe our method to construct mock GW data and detect signatures for hierarchical mergers. We present our main results in § 3, and give our conclusions in § 4.
2. METHOD 2.1. Overview We introduce a mock dataset generated by a simple N -body toy model ( § 2.2), which allows us to explore hierarchical mergers more generically ( § 3.1). To identify features in the distributions representative of hierarchical mergers, we use a simple analytic model characterizing the spin distribution profile ( § 2.3.1), and apply it to the observed GW data ( § 3.2.1) and the N -body toy model ( § 3.3). Furthermore, to assess how well model predictions match the observed GW data, we also use a Bayes factor to assess relative likelihoods of models (including the N -body toy model and a physical model for mergers in AGN disks adopted from our simulations in Tagawa et al. 2021b, § 3.2.2).
In the analyses, we mostly use χ typ ≡ (χ 2 eff + χ 2 p ) 1/2 as it characterizes the spin magnitudes of BHs in binaries, and it is easily calculated from the quantities χ eff and χ p taken from LIGO Scientific Collaboration & Virgo Collaboration (2020) and LIGO Scientific Collaboration & Virgo Collaboration (2021). However one should be aware of the following properties of χ typ . First, unlike χ eff , χ p is not conserved up to 2PN (e.g. Gerosa et al. 2020a), suffering additional uncertainties due to its modulation. Second, due to the geometry, the contribution of χ p is on average larger than χ eff by a factor of ∼ 3 in cases of isotropic BH spins (Eq. 26 in the Appendix). Third, χ p is often unconstrained in the LIGO/Virgo events (e.g. Fig. 8).

Constructing mock GW data
To understand and analyze the distributions of χ eff , χ p , and m chirp typically expected in hierarchical mergers, we employ mock GW data.
2.2.1. Overall procedure We construct mock data by following the methodology of Doctor et al. (2020): 1 Sample N 1g BHs from 1g population as described in § 2.2.2. We set N 1g = 10 6 to ensure a sufficient number for detectable mergers. We call this sample S.
2 Choose ωN ng pairs from S by weighing the pairing probability Γ ( § 2.2.2), where ω is the fraction of BHs that merge at each step, and N ng is the number of BHs in the sample S (N ng = N 1g in the first iteration).
3 Compute the remnant mass and spin, and the kick velocity for merging pairs assuming random directions for BH spins, where we use the method described in Tagawa et al. (2020a). Update the sample S by removing BHs that have merged, and adding merger remnants if the kick velocity is smaller than the escape velocity (v esc ).
4 Repeat steps 2-3 for N s steps.
5 Determine the fraction of detectable mergers by assessing whether signal-to-noise ratio (SNR) of mergers exceeds the detection criteria ( § 2.2.3). Randomly choose N obs observed mergers from the detectable merging pairs. Add observational errors following § 2.2.3, and construct a mock GW dataset.
By changing the underlying parameters of the merging binaries in mock GW data (λ 0 ; presented in the next section), we can construct various χ eff , χ p and m chirp distributions expected in hierarchical mergers. For example, N s and ω influences the fraction of hierarchical mergers (∝∼ ω Ns ), while N s specifies the maximum generation and mass of BHs.

First generation BHs and pairing
We assume that the masses of 1g BHs are drawn from the power-law distribution as where α is the power-law slope, m min and m max are the minimum and maximum masses. We set the dimensionless spin magnitude for 1g BHs to where U [−1 : 1] represent uniform distribution randomly chosen from -1 to 1, and a ave and a uni are parameters characterising initial spins of 1g BHs. We assume that the spin magnitude for 1g BHs does not depend on the masses of 1g BHs. This assumption may be justified for single BHs, for which slow rotation is motivated by theoretical considerations (Fuller & Ma 2019 To draw merging pairs, we simply assume that the interaction rate depends on the binary masses with a form as employed in Doctor et al. (2020). This parameterization enables us to mimic the effects that massive and equal-mass binaries are easy to merge in plausible models due to exchanges at binary-single interactions, mass segregation in clusters, interaction with ambient gas, mass transfer, or common-envelope evolution (e.g. O'Leary et al. 2016;Rodriguez et al. 2019;Tagawa et al. 2021b;Olejak et al. 2020).
Using the model described above and adding observational errors ( § 2.2.3), we can construct a mock observational dataset.
The parameter set characterizing a mock dataset is λ 0 = {α, m min , m max , a ave , a uni , γ t , γ q , ω, N s , v esc , N obs }. The fiducial choice of λ 0 is described in § 2.2.4 and Table 1.

Mock observational errors
To construct mock GW data, we need to put observational errors on observables. The true values of observables θ are produced through the procedures in § 2.2.1 and § 2.2.2 assuming a set of the population parameters λ 0 . To incorporate observational errors to the mock data, we refer to the prescription in Fishbach & Holz (2020). We assume that the binary is detected if the SNR of the signal in a single detector exceeds 8. We set the typical SNR, ρ 0 , of a binary with parameters m chirp , χ eff , and the luminosity distance d L to where we fix m chirp,8 = 10 M ⊙ and d L,8 = 1 Gpc (see eq. 26 in Fishbach et al. 2018). This scaling approximates the amplitude of a GW signal, m chirp,8 and d L,8 are chosen to roughly match the typical values detected by LIGO at design sensitivity (Chen et al. 2017), and the dependence on χ eff roughly reproduces results in The LIGO Scientific Collaboration et al. (2019). We calculate d L from z assuming ΛCDM cosmology as stated above. The true SNR depends on the angular factor Θ, and is given by Θ plays the combined role of the sky location, inclination, and polarization on the measured GW amplitude. We tune the width of the distribution to control the uncertainty of the measured signal strength, which in turn controls the uncertainty on the measured luminosity distance. We simply set Θ to a log-normal distribution with following Fishbach et al. (2018).
From the true parameters ρ, m chirp (1 + z), z, χ eff and Θ, we assume that the four parameters, the SNR (ρ obs ), the chirp mass (m chirp,obs ), χ eff,obs , and χ p,obs , are given with errors as below. We assume that the fractional uncertainty on the detector-frame chirp mass is that on the SNR is following Fishbach & Holz (2020), and that on χ eff and χ p is, respectively, and which roughly match typical observational error magnitudes in Abbott et al. (2019) and Abbott et al. (2020b). We assume that the observed median valuesm chirp,obs , ρ obs ,χ eff,obs , andχ p,obs , respectively, from a normal distribution centered on the true values m chirp (1+z), ρ, χ eff , and χ p with the standard deviation σ m chirp , σ ρ , σ χ eff , and σ χp . We further assume that the posterior distributions of m chirp , ρ, χ eff , and χ p including errors for GW data in the i th event are, respectively, calculated by drawing from a normal distribution centered onm chirp,obs ,ρ obs , χ eff,obs , andχ p,obs with the standard deviation σ m chirp , σ ρ , σ χ eff , and σ χp . An observed value of z is calculated from d L derived by incorporating the observed values to Eq. (7) and the relation between z and d L so that Eq. (7) is valid for derived z. Table 1 lists the parameter values adopted in the fiducial model. Referring to Fuller & Ma (2019), we set small BH spin magnitudes for 1g BHs as a ave = a uni = 0. The power-law slope in the mass function for 1g BHs is given as α = 1. Assuming mergers in (active phase of) NSCs, where hierarchical mergers are probably most frequent, we set m max = 20 M ⊙ as NSCs are mainly metal rich (e.g. Do et al. 2018;Schödel et al. 2020), v esc = 1000 km/s typically expected for merging sites of binaries (Tagawa et al. 2020b), γ t = 2 and γ q = 2 as high-and equal-mass BHs are easier to merge in dynamical environments, and ω = 0.1 and N s = 4 to reproduce frequent hierarchical mergers ( Table 2, Tagawa et al. 2021b).

Reconstruction of the spin distribution
Here, we present a way to detect features for hierarchical mergers that possibly appear in the distribution of spins and masses.

Model characterizing the spin distribution
Given the universal trends of hierarchical mergers in the averaged spin magnitude as a function of masses for merging binaries ( § 3.1.1), we investigate how well such trends can be reconstructed using a finite number of events. To do this, we replace the procedure above with a simple parametric analytic toy model, directly describing the distribution of the three variables (θ = {χ eff , χ p , m chirp }) in terms of a set of the parameters (λ) as where N (x 0 |x 1 , x 2 ) represents the probability to return x 0 for the normal distribution with the mean x 1 and the standard deviation x 2 , T [−1, 1] means to truncate the normal distribution to the range [−1, 1] and normalize N so that the integral of N in this range is 1, and We use χ typ since it roughly represents the spin magnitudes of BHs in a binary. Hence, this model has five parameters λ = {a µ , b µ , a σ , b σ , m crit } characterizing the χ typ profile as a function of m chirp . The functional form of the model (eq. 14) is motivated by the prediction that hierarchical mergers favor a plateau in the distribution ofχ typ vs. m chirp at high m chirp as the BH spin magnitudes roughly converge to a constant value of ∼ 0.7 as a result of mergers with isotropic spin directions, whilē χ typ roughly linearly approaches the value at the plateau from lower m chirp according to Figs. 1 and 7. We simply adopt the same functional form for σ χ with ν χ . Since the BHs formed from mergers typically have spins dominated by the orbital angular momentum of their progenitor binary (i.e. ∼ 0.7), the dispersion in the χ typ distribution is expected to converge to a constant beyond m crit , producing a plateau. This motivates the functional form of Eq. (16) to describe the relation between the spins and mass for hierarchical mergers.
The model parameters, λ, are estimated from GW data through a Bayesian analysis, whose details are described in the next section.

Bayesian analysis
To derive the posterior distribution of λ from a dataset {d i }, p(λ|{d i }), we use the Bayesian formalism as follows. Here, d i encodes the measurable parameters (θ) and also includes their random noise in the i th event. Bayes' rule gives where p({d i }|λ) is the likelihood to obtain {d i } for λ, π(λ) is the prior probability for the model parameters λ, and the evidence p({d i }) is the integral of the numerator over all λ.
We assume that each GW detection is independent, so that The probability of making observation i is where the normalization factor A(λ) is given by is the detection probability for a given set of parameters, and "threshold" denotes that the event d is detectable when d is above the threshold. To reduce computational costs, we assume that A(λ) is constant. This assumption does not affect our results as A(λ) varies by less than a factor of 1.1 if the spin directions of BHs are assumed to be isotropic, meaning that the variation of A(λ) per each steps in the Monte Carlo method ( § 2.3.3) is negligible. This is because the detection probability is influenced only by χ eff by changing λ (see Eqs. 7 and 14), and the reduction and enhancement of the detectable volume for mergers with negative and positive χ eff are mostly cancelled out. The likelihood p(d i |θ) can be rewritten in terms of the posterior probability density function (PDF) p(θ|d i ) that is estimated in the analysis assuming prior π(θ) as The posterior PDF p(θ|d i ) has information on errors, and it is often discretely sampled with S i samples from the posterior, { j θ (i) }, for j ∈ [1, S i ]. Because the samples are drawn according to the posterior, the parameter space volume associated with each sample is inversely proportional to the local PDF, d j θ (i) ∝ [p( j θ (i) |d (i) )] −1 , which allows us to replace the integral with a discrete sum (e.g. Mandel et al. 2019;Vitale et al. 2020). Overall, the posterior distribution of λ is given as where we factor out the evidence factors p({d i }) and since it is independent of λ and does not affect the relative values of the posterior p(λ|{d i }). We use a flat prior distribution for π(λ). We set π(θ) ∝ d 2 L (z) following the standard priors used in the LIGO/Virgo analysis of individual events (Veitch et al. 2015). We assume flat priors on χ p and χ eff . Note that this is different from the LIGO/Virgo analysis which used uniform priors for the component spin magnitudes and they are appropriately transformed to priors for χ p and χ eff . We set S i = 3N obs so that we can take into account uncertainties whose probability is in the order of ∼ 1/N obs .

Markov chain Monte Carlo methods
We calculate the posterior distribution (Eq. 23) using Markov chain Monte Carlo (MCMC) methods. We track one chain for 10 7 steps, set the first half to a burn-in period, check convergence by verifying that values for parameters after the burn-in period are oscillating around a constant average and dispersion. We adopt Metropolis-Hastings algorithm (e.g. Hastings 1970), and set a proposal distribution to the normal distribution with the values at each step as the means and the standard deviations for a µ , b µ , a σ , b σ , and m crit to be 0.0001 M −1 ⊙ , 0.01, 0.0001 M −1 ⊙ , 0.01, and 1.0 M ⊙ , respectively. The standard deviations of the proposal distribution are roughly given by the typical standard deviations of the posterior distribution divided by ∼ 4 as this setting works well for convergence. We do not pose thinning to a posterior distribution as the autocorrelation for each variable between adjacent steps is already as small as 10 −5 . We restrict m crit in the ranges from m min to the maximum m chirp among observed events.

RESULTS
In § 3.1, we investigate characteristic features in hierarchical mergers, using our flexible tool ( § 2.2) to generate mock GW datasets for a large range of input parameter combinations. In § 3.2, we analyze GW data observed in LIGO/Virgo O1-O3a. We first derive signatures and properties of hierarchical mergers ( § 3.2.1), using the simple fitting formula for spin vs. chirp mass ( § 2.3.1). We then assess ( § 3.2.2) how well the predictions in our mock GW catalogs and in our physical AGN disk models (Tagawa et al. 2021b) in fact match these observed GW data. Finally, in § 3.3, we analyze mock GW data, and investigate how well the signatures of hierarchical models, described by the simple fitting formulae Table 1 Fiducial values of our model parameters.

Parameter
Fiducial value The number of observed events N obs = 1000 Frequency of mergers for high-mass binaries γt = 2 Frequency of mergers for equal-mass binaries γq = 2 The spin magnitudes for 1g BHs aave = 0, a uni = 0 Maximum and minimum masses for 1g BHs mmax = 20 M ⊙ , m min = 5 M ⊙ Power-law exponent in the mass function for 1g BHs α = 1 Fraction of BHs that merges at each step ω = 0.1 Number of merger steps Ns = 4 Escape velocity of systems hosting BHs vesc = 1000 km/s The parameter for correlation between the steps and the redshift wz = ∞ (no correlation)  (Table 2). We use N obs = 10 3 detectable mergers. In panels (b)-(c), the profiles for model M1 are presented by gray lines. Bars correspond to 1σ credible intervals.

Table 2
Properties of hierarchical mergers in our models. The first and second columns indicate the model number and its variation from the fiducial model (Table 1). The third and fourth columns show the fraction of high-g mergers among all and detectable mergers, respectively. The fifth column shows the maximum chirp mass (m chirp,max ) among N obs = 10 3 detectable mergers. The sixth and seventh columns show the average and the standard deviation of χp among all merging pairs. We first show the parameter dependence of theχ typ profile as a function of m chirp using mock GW events, in which hierarchical mergers are assumed to be frequent. In Table 2, we list the model varieties we have investigated. These include the fiducial model (M1), and 12 different varieties (models M2-M13). We examine different choices of the initial spin magnitudes (models M5-M8) and the maximum mass of 1g BHs (model M9), the fraction of hierarchical mergers (models M10-M13), and the several parameter sets mimicking different populations (models M2-M4, Table 3). We also investigate a variety of additional models in the appendix (models M14-M28, Table 5). Fig. 1 shows the profiles for models M1-M13 (Table 2). For models in which hierarchical mergers are frequent (panels (b) and (c) of Fig. 1 and Fig. 7), there are universal trends for hierarchical mergers in theχ typ profiles: (i) increase (or decrease) ofχ typ to ∼ 0.6 at low m chirp . (ii) plateau ofχ typ with ∼ 0.6 at high m chirp . Thus, the profile is roughly characterized by two lines if hierarchical mergers are frequent, mergers originate mostly from one population, and the typical spin magnitude for 1g BHs does not depend on their masses. The profile ofχ typ strongly depends on a ave , a uni , and m max (Fig. 1 b), while it is less affected by the other parameters (see Fig. 7).
The typical value ofχ typ ∼ 0.6 at the plateau can be understood as follows. When masses and spin magnitudes between the primary and secondary BHs are similar (m 1 ∼ m 2 and a 1 ∼ a 2 ∼ a 0 ) and the directions of BH spins are isotropic, the typical magnitude of massweighted BH spins is where . . . represents an average over the number of samples. If we approximatē Since merger remnants typically have spin magnitudes of a 0 ∼ 0.7 (Buonanno et al. 2008),χ typ ∼ 0.6 for mergers among high-g BHs, which is roughly consistent with the value at the plateau ( Figs. 1 and 7). Note that when q ≪ 1, |a w | ∼ a 0 and so the average value is slightly enhanced toχ typ ∼ 0.93a 0 . As m max increases, the bending point between the two lines increases (gray and cyan lines in Fig. 1 b). This is Table 3 Adopted parameter values for several populations. The differences with respect to the fiducial model (Table 1) because m max determines the critical mass above which all merging BHs are of high generations with high spins of ∼ 0.7. As the bending point is not influenced by the other parameters, the maximum mass of 1g BHs can be estimated from the bending point of theχ typ profile. Note that since the bending points of the χ p and χ eff profiles are similar in shape to that of the χ typ profile for mergers with isotropic BH spins (Fig. 3 a), either χ typ , χ p or χ eff can constrain the maximum mass of 1g BHs if the profiles are reconstructed well. Additionally, a ave and a uni influenceχ typ at the smallest values of m chirp (Fig. 1 b). This suggests that typical spin magnitudes of 1g BHs can be presumed by spins at small m chirp . However, note thatχ typ at small m chirp is also influenced by the observational errors on χ p and χ eff . Due to the smaller errors on |χ eff | compared to χ p , |χ eff | may constrain the typical spin values of 1g BHs more precisely using a number of events (green and orange lines in Fig. 3 a). Note thatχ p > |χ eff | when the BH spins are isotropic due to their definition. In model M7, the average and the dispersion of the spin magnitude for 1g BHs are set to be roughly the same as for the merger remnants. In such cases, the signatures of hierarchical mergers cannot be identified from the spin distributions (brown line in Fig. 1 b). Also, for models in which the typical spin magnitude for 1g BHs are close to ∼ 0.7 (e.g. models M5 and M8), a large number of events are needed to detect the hierarchical merger signatures.
In Fig. 1 (c), we can see how the features for hierarchical mergers in the χ typ profile are influenced by the fraction of hierarchical mergers for N obs = 50. The plateau at high m chirp is seen for N s = 3 (orange), while the rise of χ typ to ∼ 0.6 at low m chirp is seen for N s = 2 with ω ≥ 0.05 (green and brown). These suggest that with N obs = 50 the plateau and the rise of χ typ to ∼ 0.6 can be confirmed when the detection fraction of mergers of high-g BHs roughly exceeds ∼ 0.5 and ∼ 0.15, respectively (models M10, M12, Table 2).
To summarize, the profile ofχ typ is mostly affected only by a ave , a uni , and m max , while the other parameters may affect the maximum m chirp or the frequency of highg mergers (Tables 2 and 5 mergers in AGN disks outside of MTs (Tagawa et al. 2021b) or NSCs. The χ typ profile is similar but the m chirp distribution is different between the fiducial model and physically motivated models derived in Tagawa et al. (2021b). The former is because the profile is characterized by the few parameters (m max , a uni , a ave ) as found in § 3.1, while the latter is because the m chirp distributions are affected by how BHs pair with other BHs and merge in AGN disks.
In this section, we additionally consider the spin distributions for mergers typically expected in several environments, including GCs, FBs, and MTs of AGN disks. Values of the parameters adopted to mimic these populations are listed in Table 3. Figs. 2 and 3, and panel (a) in Fig. 1 present the distributions and the profiles of the spin parameters (χ typ , χ p , and χ eff ) as functions of m chirp for these populations. Fig. 4 is the same as Fig. 3, but mergers are contributed by a mixture of two populations. Some contribution from multiple populations to the observed events is also favored by the analysis in Zevin et al. (2020b).
For mergers in GCs, we set lower escape velocity v esc = 30 km/s, N s = 2 and ω = 0.03 to reproduce the detection fraction of hierarchical mergers of ∼ 10-20%, which is predicted by theoretical studies (e.g. O'Leary et al. 2016;Rodriguez et al. 2019, Table 2). We chose higher m max = 45 M ⊙ as GCs are composed of metalpoor stars (e.g. Peng et al. 2006;Leaman et al. 2013;Brodie et al. 2014); other parameters are the same as those for AGN disks. Note that m max in metal-poor environments is uncertain due to uncertainties on the reaction rate of carbon burning (Farmer et al. 2019) and the enhancement of the helium core mass by rotational mixing (Chatzopoulos & Wheeler 2012;Yoon et al. 2012;Vink et al. 2021).
Due to higher m max ,χ typ continues to increase until higher m chirp (panel b in Fig. 3, see also Rodriguez et al. 2018) compared to the fiducial model (panel a). Also, 90 percentile regions are distributed around χ eff ∼ 0 and χ p ∼ 0 (Fig. 2 b) as a large fraction of mergers are among 1g BHs. Thus, the distribution ofχ typ at low m chirp is clearly different between mergers in AGN disks and GCs, mainly due to the difference of m max and the fraction of mergers among high-g BHs. If mergers are comparably contributed both by GCs and AGN disks, steep increase ofχ typ against m chirp appears twice (panel a in Fig. 4). Thus, mixture of these populations can be discriminated by analyzing the spin distribution. Note that the intermediate line between the two increases in theχ typ profile is roughly characterized by the ratio of mergers from AGN disks and GCs. Hence, the contribution from multiple populations would be distinguishable by analyzing the profile by using a number of GW events.
For mergers among FBs, we set N s = 1 and m max = 45 M ⊙ . Although BH spin distributions are highly uncertain, we refer to Bavera et al. (2019) who proposed that χ eff is high at low m chirp of 10 − 20 M ⊙ as low-mass progenitors have enough time to be tidally spun up. We  assume that a uni follows BH spins are assumed to be always aligned with the orbital angular momentum of binaries, although we do not always expect spins to be aligned (e.g. Kalogera 2000; Rodriguez et al. 2016b). In such a setting, |χ eff | decreases as m chirp increases (panels c of Figs. 2 and 3). Also, non-zero χ p is due to assumed observational errors (orange line in Fig. 3 c). The profile expected for the binary evolution channel is significantly different from those expected for the other channels. If mergers arise comparably from FBs and GCs, |χ eff | exceedsχ p at low m chirp (panel c of Fig. 4). As contribution from mergers in FBs enhances |χ eff | relative toχ p at low m chirp , we could constrain the contribution from FBs using the ratio of |χ eff | toχ p . Observed events so far suggest that |χ eff | is typically lower than χ p at low m chirp (panel e of Fig. 3), implying that the contribution to the observed mergers from FBs is minor, unless adopted spins for 1g BHs need significant revisions.
For mergers in MTs, we assume that parameters are the same as in the fiducial model (Table 1), while BH spins are always aligned with the orbital angular momentum of the binaries. Such alignment is expected for binaries in MTs where randomization of the binary orbital angular momentum directions by binary-single interactions is inefficient due to rapid hardening and merger caused by gas dynamical friction (unlike in gaps formed further out in the disk, where these interactions were found to be very important by Tagawa et al. 2020a), and so the BH spins are aligned with circumbinary disks due to the Bardeeen-Petterson effect (Bardeen & Petterson 1975), and circumbinary disks are aligned with the binaries due to viscous torque (e.g. Moody et al. 2019). Here, we assume that the orbital angular momentum directions  of binaries are the same as that of the AGN disk referring to Lubow et al. (1999), which is different from the assumption (anti-alignment with 50%) adopted in Yang et al. (2019). In this model, the χ p and |χ eff | distributions are significantly different from those in the other models (panels d of Figs. 2 and 3). The value of χ eff at high m chirp is typically high, while χ p is low. When mergers originate comparably in MTs and GCs, |χ eff | significantly exceedsχ p in a wide range of m chirp (Fig. 4 d).
As |χ eff | is typically lower thanχ p in the observed events in all m chirp bins (Fig. 4 e), the contribution from MTs to the detected mergers is probably minor.
3.2. Application to LIGO/Virgo O1-O3a data 3.2.1. Reconstruction of spin profiles We analyze the GW data observed in LIGO/Virgo O1-O3a reported by Abbott et al. (2019) and Abbott et al. (2020b). Although χ p and χ typ suffer large uncertainties (e.g. Fig. 8 To confirm the features in the χ-profiles due to hierarchical mergers, we reconstruct theχ typ profile from the observed GW data in the way described in § 2.3. We discretize the posteriors for m chirp , χ eff , and χ p with 20, 40, and 20 bins in the ranges from the minimum to the maximum of posteriors for m chirp,i , from -1 to 1, and from 0 to 1, respectively. Note that the prior and posterior distributions for some events are similar to each other, which means that χ p is less constrained by the waveforms. To exclude events in which χ p are not well estimated, we only use events in which the Kullback-Leibler (KL) divergence between prior and posterior samples evaluated using heuristic estimates of χ p (D KL ) exceeds a critical value of D KL,cri = 0, 0.05, 0.1, 0.15, or 0.2. We consider that χ p for events with non-zero D KL is statistically useful to understand the spin distribution. We use the events with m 2 ≥ 5 M ⊙ provided in LIGO Scientific Collaboration & Virgo Collaboration (2020)  are 44, 28, 20, 12, and 7, respectively. We present 1σ errors on the estimated parameters below unless stated otherwise. The reconstructedχ typ profiles for D KL,cri = 0, 0.05, 0.1, and 0.15 are, respectively, presented by orange lines in panels (a)-(d) of Fig. 5, and the posterior distributions and correlations of the reconstructed parameters for D KL,cri = 0 are presented in Fig. 10 in the Appendix. For D KL,cri = 0, 0.05, 0.1, 0.15, and 0.20, respectively, χ typ at the plateau is b µ = 0.51 +0.14 −0.07 , 0.55 +0.14 −0.08 , 0.55 +0.
To understand the influence of GW190521, which seems to have a large impact on spin distributions due to its large mass and χ p , we repeated our analysis excluding this event. In this case, for D KL,cri = 0, 0.05, and 0.1, respectively, b µ = 0.50 +0.15 −0.07 , 0.52 +0.14 −0.08 , and 0.59 +0.24 −0.14 , m crit = 36 +19 −14 M ⊙ , 25 +23 −11 M ⊙ , and 37 +19 −17 M ⊙ , and a µ = 8 +4 −4 × 10 −3 M −1 ⊙ , 12 +10 −6 × 10 −3 M −1 ⊙ , and 12 +4 −5 ×10 −3 M −1 ⊙ , while for D KL,cri = 0.15 and 0.2, the parameters are not well determined due to the small number of events. For D LK,crit ≤ 0.1, the evaluated values of the parameters are similar with and without GW190521. The positive value of the slope (a µ ), i.e., the increase ofχ typ at low m chirp is confirmed with 2σ confidence, which is a tell-tale sign of frequent hierarchical mergers. Also, according to the analysis in § 3.1, the detection of the rise ofχ typ at low m chirp with N obs = 50 roughly requires that the detection fraction of mergers of high-g BHs exceeds ∼ 0.15. As the number of events is smaller than 50 (e.g. N obs = 28 for D KL,cri = 0.05), the highg detection fraction would be even higher than ∼ 0.15. Thus, hierarchical mergers are preferred from the analysis. Note that accretion can also produce a positive correlation, but |χ eff | > χ p is predicted in such cases, similarly to mergers in MTs (panel d of Fig. 3). As |χ eff | < χ p is predicted by GW observations (panel e of Fig. 3), accretion is disfavored as a process enhancing the BH spin magnitudes.
For D KL,cri = 0.05, 0.1, 0.15, and 0.20 (panels b, c, and d of Fig. 5), the value ofχ typ at the plateau (b µ ∼ 0.6) is consistent with that expected from hierarchical merg-ers (∼ 0.6), which possibly supports frequent hierarchical mergers with the high-g detection fraction to be 0.5 ( § 3.1.1). On the other hand, for D KL,cri = 0, b µ ∼ 0.5, which is somewhat lower than the expected value of 0.6. This is presumably because χ p values for events with D KL ≤ 0.05 are not well constrained and just reflect assumed priors. Also, note that events with high χ p might tend to be missed as the waveform for large χ p (Apostolatos et al. 1994;Kidder 1995;Pratten et al. 2020) or spin (Kesden et al. 2010;Gerosa et al. 2019) mergers often accompany strong amplitude modulation, reducing SNRs.
The critical chirp mass at the bending point of thē χ typ profile (m crit ) is related to the maximum mass of 1g BHs (Fig. 1 f). The analysis loosely constrains the parameter to m crit ∼ 15-50 M ⊙ , from which we discuss in § 3.3 that the maximum mass of 1g BHs is estimated to be ∼ 20-60 M ⊙ . However, it needs a caution that m crit is restricted from 5 M ⊙ to the maximum chirp mass among the event (∼ 67 M ⊙ ) in this analysis, which may artificially produce the bending point and the plateau. To confidently confirm the plateau, m crit needs to be precisely constrained compared to the allowed range for m crit of 5-67 M ⊙ , which would require further events (see also § 3.3).

Bayes factors on spins and mass distributions
In the previous section we focus on theχ typ profile, while here we use the distributions of χ eff , χ p , and m chirp and discuss the preferred values for underlying parameters λ 0 .
To assess the relative likelihood to produce each event in different models, we calculate the Bayes factors between pairs of models, where P (d i |A) is the likelihood of obtaining data d i observed in the GW event i from model A, and P (m chirp , χ eff , χ p |A) is the probability distribution of m chirp , χ eff , and χ p in model A. We calculate the three dimensional likelihood P (d i |m chirp , χ eff , χ p ) for the events. We calculate the Bayes factors for events with D KL ≥ D KL,crit = 0, 0.05, 0.1, 0.15, and 0.2. We consider D KL,crit = 0.05 as the fiducial value, and mostly discuss the Bayes factors for D KL,crit = 0.05 below. Note that the events with positive Bayes factors for D KL,crit = 0.1, 0.15, or 0.2 always have positive Bayes factors also for D KL,crit = 0.05 somewhat incidentally.
To calculate P (m chirp , χ eff , χ p |A), we first count mergers in 30 × 30 × 30 uniform bins in χ eff , m chirp , and χ p for model A. The maximum and minimum values of m chirp for the bins are set to 100 and 5 M ⊙ , respectively. In this section, we generate 1000 mergers for each model. To include error distributions for the variables (m chirp , χ eff , χ p ) to P (m chirp , χ eff , χ p |A), we sample 10 different realizations for each merger event predicted by the model. To reduce the statistical fluctuation in the distribution of χ eff , m chirp , and χ p due to the finite number of mergers in our models, we perform a kernel-density estimate for the distribution using Gaussian kernels whose bandwidth is chosen to satisfy the Scott's Rule (Scott 1992). We calculate P (d i |m chirp , χ eff , χ p ) by means of 300 samples generated according to the observed posterior distributions as used in the previous section.
For reference, we also calculate the Bayes factors for the two parameters, m chirp and χ eff , using the 44 events used in the analysis with D KL ≥ 0 in the previous section. Table 4 lists the Bayes factors for some models relative to the fiducial model (= B, Table 1). The Bayes factors suggest that, compared to the m chirp , χ eff , and χ p distributions typically expected for mergers in FBs and MTs (Table 3), the observed distribution is much more consistent with those in AGN disks. This is because high |χ eff | and low χ p expected for mergers either in FBs or MTs (panels c and d in Figs. 2 and 3) are incompatible with the observed distribution of |χ eff | < χ p (Fig. 3 e).
For mergers in GCs, the models with small spin magnitudes for 1g BHs are less favored. This is presumably because infrequent hierarchical mergers ( 20%) in GCs are difficult to explain typically high values of χ p if 1g BHs have low spin magnitudes. On the other hand, for a ave = 0.3 and α = 2, the Bayes factor for D KL,crit = 0.05 is as high as ∼ 10 0.4 . Thus, if mergers originate from GCs, 1g BHs are favored to have high spin magnitudes and follow a bottom heavy initial mass function.
For mergers in AGN disks or NSCs, the models with non-zero values for initial BH spins (a ave = 0.3) as well as a high value for σ χp (∼ 0.3-0.4) have high Bayes factors of 10 0.5 and 10 0.05 -10 0.4 for D KL,crit = 0.05, respectively. This is because non-zeroχ typ at low m chirp in the observed distribution (Fig. 8) can be explained by adjusting these variables (Fig. 1). Also, large values for α, which effectively shift the χ typ and m chirp distribution toward lower m chirp , and accordingly raisesχ typ at low m chirp (e.g. Fig. 7 e). This is presumably the reason why the model with α(= 2) has a high Bayes factor of 10 0.8 at m max ∼ 25 M ⊙ compared to the models with α = 1 (K A,B 1).
Preferred values for m max are probably as low as ∼ 15-30 M ⊙ if the typical spin magnitude for 1g BHs is low. For α = 1, in the models with N s = 3, 4, and 5, respectively, m max = 25-30 M ⊙ , m max = 20-25 M ⊙ , and m max = 15-20 M ⊙ is preferred. The difference in prefer- Table 4 The parameters of model A which are different from each population model (shown in Table 3) and the logarithm of their Bayes factor K A,B relative to the fiducial model ("B"). The Bayes factors for the three parameters with D KL,cri = 0, 0.05, 0.1, 0.15, and 0.2, and that for the two parameters are presented from the second to seventh columns. We highlight the models with positive Bayes factors in the five rightmost columns in boldface.
Parameters log 10 K A,B in 3D log 10 K A,B in 2D D KL,cri = 0 D KL,cri = 0.05 D KL,cri = 0.1 D KL,cri = 0.15 D KL,cri = 0. -3.6 -6.1 -7.9 -3.9 -3.6 -0.95 Ns = 2, mmax = 60 M ⊙ -14 -13 -12 -6.9 -5.0 -9.5 Ns = 3 -4.1 -3.0 -1. We also compare the properties inferred from GW observations with those predicted for mergers in AGN disks, which are calculated from one-dimensional N -body simulations, combined with a semi-analytical model used in Tagawa et al. (2021b). We adopt the fiducial model in Tagawa et al. (2021b), while we investigate several variations in which the initial BH masses are multiplied by f m1g =1, 1.33, 1.66, 2, and 3 so that m max = 15, 20, 25, 30, and 45 M ⊙ , respectively. Since 1g BH masses are 5-15 M ⊙ in the fiducial model, the minimum BH mass is given by 5f m1g M ⊙ , in which the minimum chirp mass is ∼ 8.7f m1g M ⊙ . To eliminate a reduction of the likelihood due to the lack of 1g BHs in the low mass ranges, we here calculate Bayes factors only using events with m chirp ≥ 8.7f m1g M ⊙ . The errors on m chirp , χ eff , and χ p are simply given by the normal distribution with the standard deviation of 0.08 m chirp , 0.12, and 0.2, respectively. The Bayes factors are listed in the bottom five rows in Table 4, which indicate that m max ∼ 30 M ⊙ (f m1g = 2) is preferred. Thus, the properties predicted for AGN disk-assisted mergers are likely to be consistent with the observed properties of the GW events.
Here, events with high Bayes factors for D KL,crit = 0.05 tend to have high Bayes factors for the two dimensional likelihood (bold number in the third and rightmost columns of Table 4). We consider that this fact would further support the preferred models discussed above.
Overall, our analyses suggest that m max = 15-30 M ⊙ with a high fraction of hierarchical mergers, or high spin magnitudes of ∼ 0.3 for 1g BHs is favored. The former may support mergers in NSCs including AGN disks, while the latter may be consistent with those in GCs. Further events would be required to assess these possibilities in more detail.
We also discuss the spin distribution suggested in The LIGO Scientific Collaboration et al. (2020c). First, we compare the average and the standard deviation of χ p predicted by models and those estimated from LIGO/Virgo O1-O3a data. By analyzing the observed GW data, The LIGO Scientific Collaboration et al. (2020c) estimated that the average and the standard deviation of χ p are 0.21 +0.15 −0.14 and 0.09 +0.21 −0.07 , respectively, assuming a truncated mass model. These values are consistent with models in which hierarchical mergers are frequent such as models M1, M9-M11 (Table 2), M18-M21, M25-M28 (Table 5). Also, the average and the standard deviation χ p for the model of GC with a ave = 0.3 is 0.25 and 0.062, respectively, which are also consistent with the values estimated from the observed data. This fact further supports our claim that frequent hierarchical mergers or high spin magnitudes of ∼ 0.3 for 1g BHs is favored. Here, note that the dependence of the spins on masses expected from hierarchical mergers is taken into account in our analysis, which would be a critical difference from that in The LIGO Scientific Collaboration et al. (2020c).
Next, we discuss the fraction of mergers with positive and negative χ eff . The LIGO Scientific Collaboration et al. (2020c) analyzed the GW data observed in LIGO/Virgo O1-O3a, and estimated that 0.67 +0.16 −0.16 (the 90% credible intervals) and 0.27 +0.17 −0.15 of mergers have χ eff > 0.01 and χ eff < −0.01, respectively. In the fiducial model (Table 1), the fraction of mergers with χ eff > 0.01 is 0.54 and that for χ eff < −0.01 is 0.41. The larger fraction for positive χ eff compared to that for negative one in the model is due to the assumed dependence of ρ 0 on χ eff in Eq. (7). The fraction of mergers with negative χ eff in the model is somewhat higher than that estimated in The LIGO Scientific Collaboration et al. (2020c). Such difference may be due to large uncertainties for the estimated fraction, while it may suggest that the dependence of ρ 0 on χ eff is stronger than that adopted in Eq. (7), or BH spins are moderately aligned toward the binary angular momentum directions due to interactions with gas, tidal synchronization, or alignment of spins for progenitor stars.

Reconstruction of the spin profile from mock GW data
We investigate how well theχ typ profile can be reconstructed from mock GW data ( § 2.2.3) for different values of N obs by performing the MCMC method as described in § 2.3. Fig. 6 showsχ typ as a function of m chirp for N obs = 44 (black) and N obs = 100 (orange) for the model with the fiducial setting (Table 1) but m max = 30 M ⊙ and α = 2, which is preferred from observed GW events ( § 3.2.1 and § 3.2.2).
As the parameter estimate tends to be biased in small N obs , we additionally perform 10 models for N obs = 44 with same settings with independent realizations of the initial condition. By averaging the estimated parameters for eleven models,χ typ at the plateau is b µ = 0.63 +0.07 −0.05 with the standard deviation σ(b µ ) = 0.04, the critical chirp mass is m crit = 24 +6 −4 M ⊙ with σ(m crit ) = 5 M ⊙ , and the slope ofχ typ in m chirp < m crit is a µ = 25 +9 −8 × 10 −3 M −1 ⊙ with σ(b µ ) = 8 × 10 −3 M −1 ⊙ . As these uncertainties on the reconstructed parameters from the GW mock data are similar to those derived from the observed GW data in § 3.2.1, we conclude that the GW mock data are a useful tool to understand how well the spin profile can be reconstructed.
The critical chirp mass is estimated to be m crit = 25 +6 −3 M ⊙ for N obs = 100, and m crit = 24.9 +0.9 −0.8 M ⊙ for N obs = 1000. Here, the estimated value of m crit is lower than m max by ∼ 20% mostly because m chirp = (m 1 + m 2 )[q(1 + q) −2 ] 3/5 0.87m max . As the analysis on the observed GW events in § 3.2.1 derives m crit ∼ 15-50 M ⊙ , m max ∼ 20-60 M ⊙ is roughly inferred according to the relation of m max ∼ 1.2 m crit .
The average spin parameterχ typ at m chirp = m min is related to the typical spin magnitude of 1g BHs (e.g. Fig. 1).χ typ at m chirp = 5 M ⊙ is 0.20 +0.14 −0.17 for N obs = 44, 0.27 +0.11 −0.13 for N obs = 100, and 0.23 +0.03 −0.02 for N obs = 1000. These values derived from the model with a ave = a uni = 0 are similar to the value (∼ 0.3± ∼ 0.1) derived from the observed GW data ( § 3.2.1), suggesting that the typical spin magnitude of 1g BHs inferred from the observed GW events is still consistent with ∼ 0.
For N obs = 100,χ typ at the plateau is b µ = 0.62 +0.04 −0.03 , which is similar to the expected value for hierarchical mergers (∼ 0.6, § 3.1.1). Also, the mass at the bending point is well constrained with N obs = 100 as mentioned above. Thus, with N obs ≥ 100, parameters characterising properties of hierarchical mergers, e.g. a value ofχ typ and m crit at the plateau, are more precisely constrained.
Finally, to investigate whether the bending point is robustly verified, we also fit the distribution by a straight line, i.e. assuming m crit → ∞ in Eq. (14), and calculate the Bayes factor of the model with broken lines (Eq. 14) compared to the model with a single line (m crit → ∞), where we set the likelihood function to Eq. (14) with the fitted parameters. For N obs = 44, 100, and 1000, the logarithm of the Bayes factor is 1.5, 2.1, and 24, respectively. If we adopt the Akaike information criterion (Akaike 1974), the model with the broken lines is preferred by a factor of ∼ 10 1.6 for N obs = 100, and the preference increases as N obs increases. In the analysis using the observed data in § 3.2.1, although we assumed the existence of the plateau, the Bayes factors using the observed events (with D KL,cri = 0, 0.05, 0.1, and 0.15) are in the range of 10 −0.2 -10 0.2 , suggesting that the existence of the plateau is uncertain. Our analysis suggests that as the number of GW events increases to O(100), the existence of the plateau can be confirmed with high significance.

SUMMARY AND CONCLUSIONS
In this paper we have investigated characteristic distributions of χ eff , χ p , χ typ = (χ 2 p + χ 2 eff ) 1/2 , and m chirp expected from hierarchical mergers among stellar-mass BHs. We then used a toy model to derive the profile of the average of χ typ as a function of m chirp for the events observed by LIGO/Virgo O1-O3a. We also investigated how well predictions in different models match observed spin and mass distributions by using Bayes factors. Finally, we estimate how well the χ typ profile can be reconstructed using mock GW data expected in hierarchical mergers. Our main results are summarized as follows: 1. If hierarchical mergers are frequent, and the spin distribution of first-generation (1g) BHs does not strongly depend on their mass, theχ typ profile as a function of m chirp is characterized by a monotonic increase ofχ typ with m chirp up to the maximum chirp mass among 1g BHs, and reaches a plateau ofχ typ with ∼ 0.6 at higher m chirp (Fig. 1). With ∼ 50 events, the plateau and the rise of χ typ to 0.6 can be confirmed if the detection fraction of mergers of high-g BHs roughly exceeds ∼ 0.5 and ∼ 0.15, respectively.
2. The maximum mass for 1g BHs can be estimated by constraining the transition point between the two regimes in theχ typ profile. Also, the typical spin magnitude for 1g BHs is constrained fromχ typ at around minimum m chirp among GW events.
3. Theχ typ profile reconstructed from the LIGO/Virgo O1-O3a data prefers an increase inχ typ at m chirp 15-50 M ⊙ with ∼ 2σ confidence (Fig. 5) 5. By using observed data of more than ∼ 100 events in the future, we will be able to recover parameters characterizing theχ typ distribution (e.g. the existence of the plateau and the value ofχ typ at the plateau b µ ) more precisely.

DATA AVAILABILITY
The data underlying this article will be shared on reasonable request to the corresponding author.  (Table 2). We use N obs = 10 3 detectable mergers for models M1, M14-M28, while N obs = 50 and 10 4 for models M14 and M15, respectively.
Here, we investigate the effect that mergers at larger iteration steps tend to occur at lower redshift because finite time needs to elapse between each generation and high-g mergers thus would take place after a significant delay compared to low-g mergers. To take this delay into account, we modify the redshift distribution of merging BHs as where t L (z) is the look-back time, we set the average to µ t = t typ Ns−Ni+1 Ns and the standard deviation to σ t = t typ w t , N i is the number of steps that the i th merger is created, t typ is the typical look-back time that mergers began to occur, which is set to 10 Gyr, and w z is the parameter determining the strength of correlation between N i and the time that mergers occur. A lower value of w z makes mergers with high N i occur at a lower z, and the fiducial model (Eq. 5) corresponds to w z = ∞. The dependence of theχ typ profile on w z is shown in panel (f), suggesting that the correlation between the redshift and the generations of BHs has a negligible impact on the profile.

OBSERVED SPIN DISTRIBUTION
We presents the observed distributions of χ p , χ eff , and χ typ as a function of m chirp in Fig. 8. Also, Fig. 9 compares the m chirp , χ eff and χ p distributions observed by LIGO/Virgo O1-O3a and those predicted by the model for the fiducial settings (Table 1) but m max = 30 M ⊙ and α = 2, which is assessed to high Bayes factors for both D KL,crit = 0.05 and the two parameters (Table 4). We can see that the observed distribution for these variables (blue and orange points) roughly follows the 90 and 99 percentile regions (black and gray lines) predicted by the model.

POSTERIOR DISTRIBUTIONS FOR SPIN PARAMETERS
We present the posterior distributions of the parameters characterizing the spin profile for the GW events with D KL ≥ 0 ( § 3.2.1) in Fig. 10.