Massive Black Hole Binary Mergers in Dynamical Galactic Environments

Gravitational Waves (GW) have now been detected from stellar-mass black hole binaries, and the first observations of GW from Massive Black Hole (MBH) Binaries are expected within the next decade. Pulsar Timing Arrays (PTA), which can measure the years long periods of GW from MBHB, have excluded many standard predictions for the amplitude of a stochastic GW Background (GWB). We use coevolved populations of MBH and galaxies from hydrodynamic, cosmological simulations ('Illustris') to calculate a predicted GWB. The most advanced predictions so far have included binary hardening mechanisms from individual environmental processes. We present the first calculation including all of the environmental mechanisms expected to be involved: dynamical friction, stellar 'loss-cone' scattering, and viscous drag from a circumbinary disk. We find that MBH binary lifetimes are generally multiple gigayears, and only a fraction coalesce by redshift zero. For a variety of parameters, we find all GWB amplitudes to be below the most stringent PTA upper limit of $A_{\textrm{yr}^{-1}} \approx 10^{-15}$. Our fairly conservative fiducial model predicts an amplitude of $A_{\textrm{yr}^{-1}} \approx 0.4\times 10^{-15}$---less than a factor of three below the current limit. At lower frequencies, we find $A_{0.1\,\textrm{yr}^{-1}} \approx 1.5\times 10^{-15}$ with spectral indices between $-0.4$ and $-0.6$---significantly flatter than the canonical value of $-2/3$ due to purely GW-driven evolution. Typical MBHB driving the GWB signal come from redshifts around $0.3$, with total masses of a few times $10^9\,M_\odot$, and in host galaxies with very large stellar masses. Even without GWB detections, our results can be connected to observations of dual AGN to constrain binary evolution.


INTRODUCTION
Massive Black Holes (MBH) occupy at least the majority of massive galaxies (e.g. Soltan 1982;Kormendy & Richstone 1995;Magorrian et al. 1998) which are also known to merge with each other as part of their typical lifecycles (e.g. Lacey & Cole 1993;Lotz et al. 2011;Rodriguez-Gomez et al. 2015). This presents two possibilities for the MBH of host galaxies which merge: either they also merge, or they persist as multiples in the resulting remnant galaxies. Naively, one might expect that BH must undergo mergers for them to grow-as for halos and to some degree galaxies in the fundamentally hierarchical Lambda-Cold Dark Matter model (e.g. White & Frenk 1991). On the contrary, the linear growth of black holes (i.e. at most doublings of mass) is known to be woefully inadequate to form the massive quasars observed at high redshifts (e.g. Fan et al. 2006), which require exponential growth from Edding-E-mail:lkelley@cfa.harvard.edu ton 1 (or super-Eddington) accretion (e.g. Haiman 2013). Assuming that the energy fueling Active Galactic Nuclei (AGN) comes from accretion onto compact objects, the integrated luminosity over redshifts requires a present day mass density comparable to that of observed MBH (Soltan 1982). Coalescences of MBH are then neither sufficient nor necessary to match their observed properties (e.g. Small & Blandford 1992).
In the last decade the search for multi-MBH systems have yielded many with kiloparsec-scale separations ('Dual AGN', e.g. Comerford et al. 2012)-although they seem to represent only a fraction of the population (Koss et al. 2012)-and even some triple systems (e.g. Deane et al. 2014). At separations of kiloparsecs, 'Dual MBH' are far from the 'hard' binary phase. A 'hard' binary is one in which the binding energy is larger than the typical 1 The Eddington accretion rate can be related to the Eddington luminosity asṀ Edd = L Edd /ε rad c 2 , for a radiative efficiency ε rad , i.e. hardening binaries. Similar SPH studies have affirmed and extended these results to more general environments and MBHB configurations (e.g. Dotti et al. 2007;Cuadra et al. 2009). While modern simulations continue to provide invaluable insights, and exciting steps towards simulating MBHB over broad physical scales are underway (e.g. Khan et al. 2016), neither hydrodynamic nor purely N-Body simulations are close to simulating the entire merger process in its full complexity 3 .
With our entrance into the era of Gravitational Wave (GW) astronomy (LIGO 2016a), we are presented with the prospect of observing compact objects outside of both the electromagnetic spectrum and numerical simulations. Direct detection experiments for gravitational waves are based on precisely measuring deviations in path length (via light-travel times). While ground-based detectors like the Laser Interferometer Gravitational-Wave Observatory (LIGO; LIGO 2016b) use the interference of light between two orthogonal, kilometer scale laser arms, Pulsar Timing Arrays (PTA, Foster & Backer 1990) use the kiloparsec scale separations between earth and galactic pulsars (Detweiler 1979). PTA are sensitive to GW at periods between the total observational baseline and the cadence between observations. These frequencies, roughly 0.1 − 10 yr −1 , are much lower than LIGO-corresponding to steady orbits of MBHB with total masses between ≈ 10 6 − 10 10 M , at separations of ≈ 10 −3 − 10 −1 pc (i.e. 1 − 10 6 R s 4 ). The parameter space is shown in Fig. A1.
Binaries produce GW which increase in amplitude and frequency as the orbit hardens, up to the 'chirp' when the binary coalesces. MBHB chirps will be at frequencies below the LIGO band, but above that of PTA. Future space-based interferometers (e.g. eLISA; The eLISA Consortium et al. 2013) will bridge the divide and observe not only the coalescence of MBHB, but also years of their final inspiral. The event rate of nearby, hard MBHB which could be observed as individual 'continuous wave' sources is expected to be quite low, and likely the first GW detections from PTA will be of a stochastic GW Background (GWB) of unresolved sources (Rosado et al. 2015).
The shape of the GWB spectrum was calculated numerically more than two decades ago (Rajagopal & Romani 1995), but Phinney (2001) showed that the characteristic GWB spectrum can be calculated analytically by considering the total energy emitted as gravitational waves, integrated over redshift. For a complete and pedagogical derivation of the GWB spectrum see, e.g., Sesana et al. (2008). The 'characteristic-strain', h c ( f ), can be calculated for a finite number of sources, in some comoving volume V c (e.g. a computational box), as, Equation 1 is the simplest way to calculate a GW background spectrum, requiring just a distribution of merger chirp-masses and redshifts. This type of relation is often written as, which has become typical for GWB predictions, and usually normalized to f 0 = 1 yr −1 (with some A yr −1 ). The prediction of a GWB with spectral slope of −2/3 is quite general, but does assume purely gravitational-wave driven hardening which produces a purely power-law evolution in frequency. The lack of high and low frequency cutoffs is fortuitously accurate at the frequencies observable through PTA, which are well populated by astrophysical MBHB systems. Deviations from pure power-law behavior within this band, however, are not only possible but expected-the degree of which, determined by how significant non-GW effects are, is currently of great interest. Many predictions have been made for the normalization of the GWB based on extensions to the method of Phinney (2001). The standard methodology is using Semi-Analytic Models 5 (SAM) of galaxy evolution, with prescribed MBHB mergers to calculate a GWB amplitude. Two of the earliest examples are Wyithe & Loeb (2003)-who use analytic mass functions (Press & Schechter 1974) with observed merger rates (Lacey & Cole 1993), and Jaffe & Backer (2003)-who use observationally derived galaxy mass functions, pair fractions, and merger time-scales. These studies find amplitudes of log A yr −1 = −14.3 and log A yr −1 = −16, respectively which remain as upper and lower bounds to most predictions since then. Monte Carlo realizations of hierarchical cosmologies (Sesana et al. 2004) exploring varieties of MBHB formation channels (e.g. Sesana et al. 2008;Sesana 2013b;Roebber et al. 2016) have been extremely fruitful in populating and understanding the parameter space, finding GWB amplitudes generally consistent with log A yr −1 ≈ −15 ± 1. Sesana et al. (2016) find that accounting for bias in MBH-Host scaling relations moves SAM predictions towards the lower end of this range at log A yr −1 = −15.4.
More extensive models exploring deviations from the purely power-law GWB have also been explored. For example, at higher frequencies ( 1 yr −1 ) from a finite numbers of sources ), or at lower frequencies due to eccentric binary evolution (e.g. Sesana 2010). Recently, much work has focused on the 'environmental effects' outlined by BBR80. Kocsis & Sesana (2011) incorporate viscous drag from a circumbinary gaseous disk (Haiman et al. 2009, hereafter HKM09) on top of halos and mergers from the dark-matter only Millennium simulations (Springel et al. 2005), with MBH added in post-processing. They find a fairly low amplitude GWB, log A yr −1 ≈ −16 ± 0.5, with a flattening spectrum below ∼ 1 yr −1 . Ravi et al. (2014) explore eccentric binary evolution in an always effectively refilled (i.e. full) LC using the Millennium simulation with the SAM of Guo et al. (2011). They find log A yr −1 ≈ −15 ± 0.5 with an turnover in the GWB below ∼ 10 −2 yr −1 and significant attenuation up to ∼ 10 −1 yr −1 . Recently, both McWilliams et al. (2014) and Kulier et al. (2015) have implemented explicit dynamical-friction formalisms along with recent MBH-Host scaling relations (McConnell & Ma 2013) applied to halo mass functions from Press-Schechter and the Millenium simulations respectively. McWilliams et al. (2014) find log A yr −1 ≈ 14.4 ± 0.3, and Kulier et al. (2015) log A yr −1 ≈ 14.7 ± 0.1, with both highlighting the non-negligible fraction of binaries stalled at kiloparsec-scale separations. Almost all previous studies had assumed that all MBHB merge effectively.
These predictions are summarized in Table 1. While far from exhaustive, we believe they are a representative sample, with specific attention to recent work on environmental effects. The amplitudes of the predicted backgrounds are distributed fairly consis-tently around A yr −1 ≈ 10 −15 . Assuming observational baselines of about 10 yr, pulsar TOA accuracies of at least tens of microseconds are required to constrain or observe a GWB with this amplitude (see, e.g. Blandford et al. 1984;Rajagopal & Romani 1995). Finding more millisecond pulsars with very small intrinsic timing noise are key to improving GWB upper-limits, while increasing the total number (and angular distribution) of pulsars will be instrumental for detections (Taylor et al. 2016).
There are currently three ongoing PTA groups, the North-American Nanohertz Observatory for Gravitational-waves (NANOGrav, The NANOGrav Collaboration et al. 2015), the European PTA (EPTA, Desvignes et al. 2016), and the Parkes PTA (PPTA, Manchester et al. 2013). Additionally, the International PTA (IPTA, Hobbs et al. 2010) aims to combine the data sets from each individual project, and has recently produced their first public data release (Verbiest et al. 2016). Table 2 summarizes the current upper limits from each PTA. These are the 2-σ upper bounds, based on both extrapolation to A yr −1 along with that of the specific frequency with the strongest constraint assuming a −2/3 spectral index. Overall, the lowest bound is from the PPTA, at A yr −1 < 10 −15 , or in terms of the fractional closure density, Ω GW ( f = 0.2 yr −1 ) < 2.3 × 10 −10 (Shannon et al. 2015).
Every existing prediction has been made with the use of SAM-mostly in the construction of the galaxy population, but also in how black holes are added onto those galaxies. SAM are extremely effective in efficiently creating large populations based on observational relations. Higher-order, less observationally constrained parameters can have systemic biases however, for example galaxy merger rates (see, e.g. Hopkins et al. 2010)-which are obviously critical to understanding MBHB evolution. Recently, Rodriguez-Gomez et al. (2015) have shown that merger rates from the cosmological, hydrodynamic Illustris simulations (Vogelsberger et al. 2014b) show excellent agreement with observations, while differing (at times substantially) from many canonical SAM.
In this paper, we use results from the Illustris ) simulations (discussed in §2) to make predictions for the rates at which MBH form binaries, and evolve to coalescence. Illustris provides the MBH population along with their self-consistently derived parent galaxies and associated stellar, gaseous, and darkmatter components. This is the first time that a hydrodynamic galaxy population, with fully co-evolved MBH have been used to calculate a GWB spectrum. We use these as the starting point for post-processed models of the unresolved merger dynamics themselves, including all of the underlying hardening mechanisms: dynamical friction, loss-cone stellar scattering, viscous drag, and GW evolution-again for the first time ( §3) in a MBHB population calculation. From these data, we make predictions for plausible GW backgrounds observable by PTA, focusing on the effects of different particular mechanisms on the resulting spectrum such that future detections and upper-limits can be used to constrain the physical merger process ( §4).
In addition to the GWB, understanding the population of MBHB is also important for future space-based GW observatories (e.g. eLISA The eLISA Consortium et al. 2013). Solid predictions for binary timescales at different separations will also be instrumental in interpreting observations of dual-and binary-AGN, in addition to offset and 'kicked' BHs (Blecha et al. 2016). Finally, MBHB could play a significant role in triggering stellar Tidal Disruption Events (TDE; e.g. Ivanov et al. 2005;Chen et al. 2009), and explaining the distribution of observed TDE host galaxies. Values are given both at the standard normalization of f = 1 yr −1 in addition to the frequency and amplitude of the strongest constraint (when given).

THE ILLUSTRIS SIMULATIONS
Illustris are a suite of cosmological, hydrodynamic simulations which have accurately reproduced both large-scale statistics of thousands of galaxies at the same time as the detailed internal structures of ellipticals and spirals (Vogelsberger et al. 2014b). Illustris-hereafter referring to Illustris-1, the highest resolution of three runs-is a cosmological box of 106.5 Mpc on a side, with 1820 3 each gas cells and dark-matter particles. The simulations use the moving, unstructured-mesh hydrodynamic code AREPO (Springel 2010), with superposed SPH particles (e.g. Springel et al. 2005) representing stars (roughly 1.3 × 10 6 M mass resolution, 700 pc gravitational softening length), dark-matter (DM; 6.3 × 10 6 M , 1.4 kpc), and MBH (seeded at M ≈ 10 5 M ) and allowed to accrete and evolve dynamically. Stars form and evolve, feeding back and enriching their local environments, over the course of the simulation which is initialized at redshift z = 137 and evolved until z = 0 at which point there are over 3 × 10 8 star particles. For a comprehensive presentation of the galaxy formation models (e.g. cooling, inter-stellar medium, stellar evolution, chemical enrichment) see the papers: Vogelsberger et al. (2013) and Torrey et al. (2014). For detailed descriptions of the general results of the Illustris simulations, and comparisons of their properties with the observed universe, see e.g., Vogelsberger et al. (2014a), Genel et al. (2014), and Sijacki et al. (2015). Finally, the data for the Illus-tris simulations, and auxiliary files containing the black hole data 1 used for this analysis, have been made publicly available online (www.illustris-project.org; Nelson et al. 2015).

The Black Hole Merger Population
Black holes are implemented as massive, collisionless 'sink' particles seeded into sufficiency massive halos. Specifically, halos with a total mass above 7.1 × 10 10 M , identified using an on the fly Friends-Of-Friends (FOF) algorithm, which don't already have a MBH are given one with a seed mass, M seed = 1.42×10 5 M (Sijacki et al. 2007). The highest density gas cell in the halo is converted into the BH particle. The BH mass is tracked as an internal quantity, while the particle overall retains a dynamical mass initially equal to the total mass of its predecessor gas cell (Vogelsberger et al. 2013). The internal BH mass grows by Eddington-limited, Bondi-Hoyle accretion from its parent gas cell (i.e. the total dynamical mass remains the same). Once the excess mass of the parent is depleted, mass is accreted from nearby gas particles-increasing both the dynamical mass of the sink particle, and the internal BH-mass quantity.
BH sink particles typically have masses comparable to (or within a few orders of magnitude of) that of the nearby stellar and DM particles. Freely evolving BH particles would then scatter around their host halo, instead of dynamically settling to their center-as is the case physically. To resolve this issue, BH particles in Illustris are repositioned to the potential minima of their host halos. For this reason, their parametric velocities are not physically meaningful. Black hole "mergers" occur in the simulation whenever two MBH particles come within a particle smoothinglength of one another-typically on the order of a kiloparsec. This project aims to fill in the merger process unresolved in Illustris. In our model, an Illustris "merger" corresponds to the formation of a MBH binary system, which we then evolve. To avoid confusion, we try to use the term 'coalescence' to refer to the point at which such a binary would actually collide, given arbitrary resolution.
Over the course of the Illustris run, 135 'snapshots' were produced, each of which include internal parameters of all simulation particles. Additional black-hole-specific output was also recorded at every time-step, providing much higher time resolution for black hole accretion rates 2 , local gas densities and most notably, merger events. The entire set of mergers-a time and pair of BH massesconstitutes our initial population of MBH binaries.
The distribution of BH masses is peaked at the lowest masses. Many of these black holes are short-lived: their small, usually satellite, host of-matter halos often quickly merge with a nearby neighbor-producing a BH 'merger' event. Additionally, in some cases, the identification of a particular matter over-density as a halo by the FOF halo-finder, while transient, may be sufficiently massive to trigger the creation of a new MBH seed particle. This seed can then quickly merge with the MBH in a nearby massive halo. Due to the significant uncertainties in our understanding of MBH and MBH-seed formation, it is unclear if and when these processes are physical. For this reason we implement a mass cut on merger events, to ensure that each component BH has M • > 10 6 M ≈ 10 M seed . Whether or not these 'fast-mergers' are non-physical, the mass cut is effective at excluding them from our analysis. The entire Illustris simulation has 23708 MBH merger events; applying the mass cut excludes 11291 (48%) of those, leaving 12417. We have run configurations without this mass cut, and the effects on the GWB are always negligible.
There is very small population of MBH 'merger' events which occur during close encounters (but not true mergers) of two hosthalos. During the encounter, the halo-finder might associate the two constituent halos as one, causing the MBHs to merge spuriously due to the repositioning algorithm. These forced mergers are rare and, we believe, have no noticeable effect on the overall population of thousands of mergers. They are certainly negligible in the overall MBH-Halo statistical correlations which are well reproduced in the Illustris simulations (Sijacki et al. 2015).

Merger Host Galaxies
To identify the environments which produce the dynamical friction, stellar scattering, and viscous drag which we are interested in, we identify the host galaxies of each MBH involved in the merger in the snapshot preceding it, in addition to the single galaxy which contains the 'binary' (at this point a single, remnant MBH) in the snapshot immediately following the merger event. 644 mergers (3%) are excluded because they don't have an associated galaxy before or after the merger. To ensure that each host galaxy is sufficiently well resolved (especially important for calculating density profiles), we require that a sufficient number of each particle type constitute the galaxy. Following Blecha et al. (2016) we use a fiducial cut of 80 and 300 star and DM particles respectively, and additionally require 80 gas cells. This excludes 54 of the remaining binaries. We emphasize here that this remnant host galaxy, as it is in the snapshot following the Illustris 'merger' event, forms the environment in which we model the MBHB merger process. In the future, we plan to upgrade our implementation to take full advantage of the dynamically evolving merger environment: including information from both galaxies as they merge, and the evolving remnant galaxy once it forms. 10 0 10 1 10 2 10 3 10 4 10 5 10 6 Radius [pc] 10 0 10 1 10 2 10 3 10 4 10 5 Counts 10 2 10 3 Radius [pc] 10 -1 10 0 10 1 Power-Law Index = −0. 05 Gas 10 2 10 3 Radius [pc] 10 -1 10 0 Power-Law Index = −0. 33 DM 10 2 10 3 Radius [pc] 10 -1 10 0 10 1 Power-Law Index = −0. 44 Stars None 10 -13 10 -11 10 -9 10 -7 10 -5 10 -3 10 -1 10 1 10 3 10 5 10 7 Density [M /pc 3 ] Gas DM Stars Figure 1. Density profiles from a sample Illustris MBHB host galaxy. Binned densities for each particle type are shown (upper-left) along with the number of particles/cells in each (bottom-left). Semi-transparent points are bins with less than four particles-the number required for consideration in calculating fits. Zoom-ins are also shown separately for each particle type (right), with the eight inner-most bins with at least four particles shown in the shaded region. Those bins were used to calculate fits, which are overplotted. The resulting power-law indices used to extrapolate inwards are also shown. Any galaxy without enough (eight) bins is excluded from our sample, in addition to the MBHB it contains. This was the case for roughly 1% of our initial population, almost entirely containing MBH very near our BH mass threshold (10 6 M ).
We construct spherically averaged, radial density profiles for each host galaxy and each particle/cell type (star, DM, gas). Because the particle smoothing lengths are larger than the MBHB separations of interest, we extrapolate the galaxy density profiles based on fits to the inner regions. Our fits use the innermost eight radial bins that have at least four particles in them. Out of the valid binaries, 347 (1%) are excluded because fits could not be constructedgenerally because the particles aren't distributed over the required eight bins. Successful fits typical use ∼ 100 particles, with gas cell sizes ∼ 10 2 pc and SPH smoothing lengths for stars and DM ∼ 10 3 pc.
Density profiles for a sample Illustris MBHB host galaxy are shown in Fig. 1. The left panels show the binned density profiles for each particle type (top), and the number of particles in each bin (bottom). The semi-transparent points are those with less than the requisite four particles in them. The right panels show zoom-ins for each particle type, where the shaded regions indicate the eight bins used for calculating fits. The resulting interpolants are overplotted, with the power-law index indicated. While the four particles per bin, and eight inner bins generally provide for robust fits, we impose a maximum power-law index of −0.1-i.e. that densities are at least gently increasing, and a minimum index of −3-to ensure that the mass enclosed is convergent. Using these densities we calculate all additional galaxy profiles required for the hardening prescriptions ( §3), e.g. velocities, binding energies, etc. When calculating profiles using our fiducial parameters, 2286 binaries (10%) are excluded when calculating the distribution functions ( §3.2), usually due to significant nonmonotonicities in the radial density profile which are incompatible with the model assumptions. Overall, after all selection cuts, 9270/23708 (39%) of the initial Illustris 'merger' events are analyzed in our simulations. Figure 2 shows the properties of MBHB passing our selection cuts, grouped my mass ratio. Mass ratio (upper-left panel) is strongly anti-correlated with total mass (upper-right) due to  Figure 2. Properties of the Massive Black Hole Binaries from the Illustris population passing our selection cuts. After selecting for MBH masses M > 10 6 M , and requiring the binary host galaxies to have sufficiently well resolved density profiles, 9270 of 23708 (39%) systems remain. Distributions of mass ratio, total mass, initial binary separation (determined by MBH particle smoothing lengths), and formation redshifts (determined as the time at which particles come within a smoothing length of one-another) are shown. The different lines (colors) correspond to different mass ratios which are strongly anti-correlated with total mass. both selection effects (e.g. at total masses just above the minimum mass, the mass ratio must be near unity), and astrophysical ones (e.g. the most massive MBH, in large, central galaxies tend to merge more often with the lower mass MBH in small satellite galaxies). Binary separations (lower-left) are set by the smoothing length of MBH particles in Illustris. Once two MBH particles come within a smoothing length of one another, Illustris considers them a 'merger' event-which corresponds to the 'formation' (lower-right) of a binary in the simulations of this study.

BINARY HARDENING MODELS
Black-hole encounters from Illustris determine the initial conditions for the binary population which are then evolved in our merger simulations. Throughout the 'hardening' process, where the binaries slowly coalesce over millions to billions of years, we assume uniformly circular orbits. In our models, we use information from the MBHB host galaxies to implement four distinct hardening mechanisms (Begelman et al. 1980, hereafter BBR80): dynamical friction (DF), stellar scattering in the 'loss-cone' regime (LC) 1 , viscous drag (VD) from a circumbinary gaseous-disk, and gravitational-wave radiation (GW).

Dynamical Friction
Dynamical friction is the integrated effect of many weak and longrange scattering events, on a gravitating object moving with a relative velocity through a massive background. The velocity differential causes an asymmetry which allows energy to be transferred from the motion of the massive object to a kinetic thermalization of the background population. In the case of galaxy mergers, dynamical friction is the primary mechanism of dissipating the initial 10 10 10 11 10 12 10 13 Galaxy Mass [M ] 10 0 10 1 10 2 Galaxy Radius [kpc] 10 10 10 11 10 12 10 13 Galaxy Mass [M ] 10 0 10 1 10 2 10 3 Count 10 0 10 1 10 2 10 3 Count 10 0 10 1 10 2 Galaxy Radius [kpc] 10 0 10 1 10 2 10 3 Dynamical Time [Myr] 10 0 10 1 10 2 10 3 10 4 Number Figure 3. Stellar half-mass radii (R ,1/2 ) and total mass within 2·R ,1/2 for all Illustris remnant host galaxies. Values for each galaxy are colored by their dynamical times, calculated using Eq. 7, which are used for the 'enhanced' DF masses. The histogram in the upper right shows the distribution of dynamical times. Galaxy masses and radii are peaked at about 10 11 M and 10 kpc respectively, with corresponding dynamical times around 100 Myr orbital energy to facilitate coalescence of the galaxies, generally on timescales comparable to the local dynamical time (∼ 10 8 yr). BH present in the parent galaxies will tend to 'sink' towards each other in the same manner (BBR80) due to the background of stars, gas and dark-matter (DM). For a detailed review of dynamical friction in MBH systems, see Antonini & Merritt (2012). The change in velocity of a massive object due to a single encounter with a background particle at a fixed relative velocity v and impact parameter b is derived (e.g. Binney & Tremaine 1987) following the treatment of Chandrasekhar (1942Chandrasekhar ( , 1943 by averaging the encounters over all possible angles to find, where the characteristic (or 'minimum') impact parameter b 0 ≡ G(M + m)/v 2 , for a primary object of mass M, in a background of bodies with masses m. The net deceleration on a primary mass is then found by integrating over distributions of stellar velocity (assumed to be isotropic and Maxwellian) and impact parameters (out to some maximum effective distance b max ) which yields, for a background of mass density ρ. The impact parameters are usually replaced with a constant-the 'Coulomb Logarithm', ln Λ ≡ ln bmax b 0 ≈ 1 2 ln 1 + Λ 2 , such that, In the implementation of Eq. 5, we use spherically averaged density profiles from the Illustris, remnant host galaxies. Modeling a 'bare', secondary MBH moving under DF through these remnants would clearly drastically underestimate the effective mass-which, at early times is the MBH secondary in addition to its host galaxy. Over time, the secondary galaxy will be stripped by tidal forces and drag, eventually leaving behind the secondary MBH with only a dense core of stars and gas directly within its sphere of influence. We model this mass 'enhancement' by assuming that the effective DF mass is initially the sum of the MBH mass (M 2 ), and that of its host galaxy (M 2,host ), decreasing as a power-law over a dynamical time τ dyn , until only the MBH mass is left, i.e., We calculate the dynamical time using a mass and radius from the remnant host galaxy. Specifically, we use twice the stellar half-mass radius 2R ,1/2 , and define M 2,host as the total mass within that radius, i.e., The galaxy properties and derived dynamical times for all MBHB host galaxies we consider are shown in Fig. 3. Galaxy masses and radii are peaked at about 10 11 M and 10 kpc respectively, with corresponding dynamical times around 100 Myr. For comparison, we also perform simulations using a fixed dynamical time of 1 Gyr for all galaxies, i.e., less-efficient stripping of the secondary galaxy.

Impact Parameters and Explicit Calculations
We have explored calculating the Coulomb logarithm explicitly, following BBR80 for the maximum impact parameter such that, This, effective maximum impact parameter is a function of binary separation 2 r-to account for the varying population of stars available for scattering and varying effectiveness of encounters. Eq. 8 also depends on the characteristic stellar radius R s ('r c ' in BBR80), radius at which the binary becomes gravitationally bound, R b = [M/ (Nm )] 1/3 R s , and radius at which the binary becomes 'hard', R h ≡ (R b /R s ) 3 R s . Not only is this formalism complex, but it often produces unphysical results. For example, with this prescription the 'maximum' impact parameter not infrequently becomes less than the 'minimum', or larger than the distances which interact in the characteristic timescales. After imposing a minimum impact parameter ratio of b max /b 0 10 (i.e. ln Λ 2.3), the results we obtained are generally consistent with using a constant coulomb-logarithm, with negligible effects on the resulting merger rates and GW background. We have also implemented an explicit integration over stellar distribution functions (see: §3.2), and found the results to again be entirely consistent with Eq. 5 which is both computationally faster and numerically smoother. We believe the explicit impact parameter calculation is only valuable as a heuristic, and instead we use ln Λ = 15, consistent with detailed calculations (e.g. Antonini & Merritt 2012). Similarly, in the results we present, we take the local stellar density as that given by spherically symmetric radial density profiles around the galactic center instead of first determining, then marginalizing over, the stellar distribution functions.  . Dynamical Friction (DF) hardening timescales for 50% of our MBHB around the median, under a variety of implementations. Cases in which a 'bare' secondary MBH migrates through the remnant host galaxy is compared to ones where the effective mass is enhanced to the secondary's host galaxy, decreasing as a power-law over the course of a dynamical time ('Enh'; see Eq. 6). The dynamical time is calculated using twice the 'Stellar' half-mass radius (see Eq. 7) shown in blue, or using a fixed 1 'Gyr' timescale shown in green. Allowing gaseous DF to continue below the attenuation radius R lc ('Gas Continues': darker regions, dotted lines) is compared to cutting off the gas along with stars and DM ('Gas Cutoff': lighter regions, dashed lines). F stall is the fraction of mergers with mass ratio µ > 0.1, remaining at separations r > 10 2 pc, at redshift zero.

Applicable Regimes
There is a critical separation at which the back-reaction of the decelerating MBH notably modifies the stellar distribution, and the dynamical friction formalism is no longer appropriate. Beyond this radius, the finite number of stars in the accessible region of parameter space to interact with the MBH(B)-the 'loss-cone', (LC; see, Merritt 2013)-must be considered explicitly, discussed more thoroughly in §3.2. The 'loss-cone' radius can be approximated as (BBR80), Stars and DM are effectively collisionless, so they can only refill the LC on a slow, diffusive scattering timescale. Gas, on the other hand, is viscous and supported thermally and by turbulent motion which can equilibrate it on shorter timescales. In our fiducial model, we assume that for separations r < R lc the dynamical friction due to stars and DM is attenuated to low values, but that of gas continues down to smaller separations. We set the inner edge of gaseous DF based on the formation of a (circumbinary) accretion disk on small scales (discussed in §3.3). The attenuation prescription given by BBR80 increases the dynamical friction timescales by a factor, where N = 1 M r 0 4πr 2 ρ dr , is the number of stars available to interact with the binary. For all intents and purposes this negates the effectiveness of DF for r R lc , such that without other hardening mechanisms (which become important at smaller scales), no MBHB would coalesce within a Hubble time.

DF Hardening Rates
The resulting hardening timescales, τ h = a/ (da/dt), for our different DF implementations are shown in Fig. 4. We show evolution for 'bare' MBH secondaries (red), in addition to effective masses enhanced ('enh') by the secondary's host galaxy for a dynamical time calculated using twice the 'stellar' half-mass radii (blue), or with a fixed 'Gyr' time scale (green). In each case we also compare between letting gas-DF continue below R lc ('Gas Continues': darker colors, dotted lines), versus attenuating gas along with stars and DM ('Gas Cutoff': lighter colors, dashed lines). As a metric of the varying outcomes, we calculate the fraction of 'stalled' majormergers F stall , defined as the number of major mergers (mass-ratios µ ≡ M 2 /M 1 > 0.1; which are 6040 out of the full sample of 9270, i.e. ∼ 65%) remaining at separations larger than 100 pc at redshift zero, divided by the total number of major-mergers. Attenuation of the DF begins below 100 pc, so F stall are unaffected by whether the gas DF is also cutoff.
For r 100 pc, where the density of stars and especially dark matter dominate that of gas, the hardening rates are the same with or without a separate treatment of gas. Hardening differs significantly however, between the 'bare' and enhanced models-with the latter hardening more than an order of magnitude faster at the largest separations 3 (∼ 10 4 pc). After a dynamical time, the 'stellar' enhancement runs out and the hardening rate approaches that of a bare MBH secondary by ∼ 10 3 pc. Still, the enhanced mass over this time leads to a decrease in the fraction of stalled binaries from ∼ 63% to ∼ 37%. When the mass enhancement persists for a gigayear-about a factor of ten longer than typical dynamical times-a large fraction of MBHB are able to reach parsecscale separations before tidal stripping becomes complete, leading to only ∼ 7% of major-mergers stalling at large separations. The particular fraction of stalled systems is fairly sensitive to the total mass and mass ratio cutoff, which we return to in §4.3.
Previous studies (e.g. Ravi et al. 2014) have assumed that DF is very effective at bringing MBHB into the dynamically 'hard' regime (i.e. instantly in their models), after which stellar interactions must be calculated explicitly to model the remaining evolution. For comparison, in our results we also include a 'Force-Hard' model in which we assume that all binaries reach the hard regime (r = R h ) over the course of a dynamically time 4 .
It has long been suggested that MBHB could stall at kiloparsec-scale separations (e.g. Yu 2002), but only recently has this effect been incorporated into population hardening models and studied specifically (e.g. McWilliams et al. 2014;Kulier et al. 2015). By better understanding the timescales over which stalled systems could be observable, we can use observed dual-AGN to constrain the hardening process and the event rates of MBHB encounters.

Stellar Loss-Cone Scattering
The population of stars that are able to interact (scatter) with the MBHB are said to occupy the 'loss-cone' (LC, Merritt 2013), sonamed because it describes a conical region in parameter space.
When stars are scattered out of the LC faster than they can be replenished, the LC becomes depleted and the dynamical friction description, which considers a relatively static background, becomes inconsistent. One approach to compensate for this is to add an 'attenuation' factor, as described in the previous section ( §3.1). Physically, a steady state must be dynamically realized in which stars are diffused into the outer edges of the LC via two-body relaxation at the same rate at which stars are scattering out by the central MBH(B). The largest uncertainty in the MBHB merger process is likely understanding the nature of this equilibrium state, and how it is affected by realistic galaxy-merger environments.
The loss-cone has been extensively explored in both the context of MBH binary hardening, as well as in tidal-disruption events (TDE). The two cases are almost identical, differing primarily in that for TDE calculations, only impact parameters small enough to cause disruptions are of interest-while for binary hardening, weaker scattering events are still able to extract energy from the MBH or MBHB. For binary hardening, there is an additional ambiguity in two subtly distinct regimes: first, where stars scatter with individual BHs, decelerating them analogously to the case of dynamical friction (but requiring the LC population to be considered explicitly). Second, for a truly bound binary, stars can interact with the combined system-in a three-body scattering-and extract energy from the binary pair together. In our prescriptions we do not distinguish between these cases, considering them to be a spectrum of the same phenomenon instead.
We use the model for LC scattering given by Magorrian & Tremaine (1999), corresponding to a single central object in a spherical (isotropic) background of stars. We adapt this prescription simply by modifying the radius of interaction to be appropriate for scattering with a binary instead of being tidally disrupted by a single MBH. This implementation is presented in pedagogical detail in Appendix §C. Scattering rates are calculated corresponding to both a 'full' LC (Eq. C6), one in which it is assumed that the parameter space of stars is replenished as fast as it is scattered; in addition to a 'steady-state' LC (Eq. C7), in which diffusive twobody scattering sets the rate at which stars are available to interact with the binary.
The interaction rates (fluxes) of stars scattering against all MBHB in our sample are shown in the upper panel of Fig. 5 for both full (red, Eq. C6) and equilibrium (blue, Eq. C7) loss-cone configurations. The interaction rates for full LC tend to be about six orders of magnitude higher than equilibrium configurations. The resulting binary hardening timescales are shown in the lower panel of Fig. 5-reaching four orders of magnitude above and below a Hubble time. Clearly, whether the LC is in the relatively low equilibrium state or is more effectively refilled has huge consequences for the number of binaries which are able to coalesce within a Hubble time.
Many factors exist which may contribute to quickly refilling the loss cone. In general, any form of asymmetry in the potential will act as an additional perturber-increasing the thermalization of stellar orbits. The presence of a MBHB is premised on there having been a recent galaxy merger-implying that significant asymmetries and aspherical morphologies may exist. Even ignoring galaxy mergers, galaxies themselves are triaxial (e.g Illingworth 1977;Leach 1981), many have bars (e.g. Sellwood & Wilkinson 1993), and in star forming galaxies there are likely large, dense molecular clouds (e.g. Young & Scoville 1991) which could act as perturbers. Finally, because binary lifetimes tend to be on the order of the Hubble time while galaxies typically undergo numerous merger events (e.g. Rodriguez-Gomez et al. 2015), subsequent merger events can 10 -9 10 -7 10 -5 10 -3 10 -1 10 1 10 3 Full Loss-Cone Steady-State Loss-Cone 10 -4 10 -3 10 -2 10 -1 10 0 10 1 10 2 Binary Separation [pc] 10 5 10 7 10 9 10 11 10 13 10 15

Loss-Cone Hardening
Timescale [yr] Interval: 68% 95% Figure 5. Scattering rates and hardening timescales for full (red) and steady-state (blue) loss-cones. The bands represent 68% and 95% of the population around the median. The difference between the two extremes of LC states is a stark six orders of magnitude, illustrating how strong of an effect the LC can have on MBHB mergers. We use a simple, single parameter prescription to describe the state of the LC: the fraction, in log-space, between steady-state and full, F refill (see, Eq. 11).
lead to triple MBH systems (see: §4.4) which could be very effective at stirring the stellar distribution. While there is some evidence that for galaxy-merger remnants the hardening rate can be nearly that of a 'full' LC (e.g. Khan et al. 2011), the community seems to be far from a consensus (e.g. Vasiliev et al. 2014), and a purely numerical solution to the LC problem is currently still unfeasible.
In the future, we plan on incorporating the effects of triaxiality and tertiary MBH to explore self-consistent LC refilling. In our current models, we introduce an arbitrary dimensionless parameterthe logarithmic 'refilling fraction' F refill (in practice, but not requisitely, between [0.0, 1.0])-to logarithmically interpolate between the fluxes of steady-state (F eq lc ) and full LC (F full lc ), i.e.,

Viscous Hardening by a Circumbinary Disk
The density of gas accreting onto MBH can increase significantly at separations near the accretion or Bondi radius, R b ≡ GM • /c 2 s , where the sound-crossing time is comparable to the dynamical time. The nature of accretion flows onto MBH near and within the Bondi-radius are highly uncertain, as observations of these regions are currently rarely resolved (e.g. Wong et al. 2011). If a high density, circumbinary disk is able to form, the viscous drag (VD) can be a significant contribution to hardening the binary at separations just beyond the GW-dominated regime (BBR80, Gould & Rix 2000;Escala et al. 2005b). Galaxy mergers are effective at driving significant masses of gas to the central regions of post-merger galaxies (Barnes & Hernquist 1992), enhancing this possibility.
We implement a prescription for VD due to a circumbinary accretion disk following Haiman et al. (2009, hereafter HKM09) based on the classic thin-disk solution of Shakura & Sunyaev (1973), broken down into three, physically distinct regions (Shapiro & Teukolsky 1986). These regions are based on the dominant pressure (radiation vs. thermal) and opacity (Thomson vs. free-free) contributions, such that the regions are defined as, 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 Eddington Ratio . Accretion rates at the time of binary formation for all MBHB in our analysis. Values are measured as a fraction of the Eddington accretion rate,Ṁ Edd ≡ L Edd /ε rad c 2 , where we use ε rad = 0.1. Recall that in Illustris MBH merge when they come within a smoothing length of one anothercorresponding to the formation of a binary in our models. The accretion rates from Illustris are those of the resulting remnant MBH, and are limited to Eddington ratios of unity. The upper panel shows the distribution of accretion rates (bars) and cumulative number (line), which are strongly biased towards near-Eddington values. The lower panel shows the cumulative distribution of accretion rates above each value (note the different x-axis scaling). The median Eddington ratio of 0.75 is overplotted (grey, dashed line). 1) r < r 12 , radiation pressure and Thomson-scattering opacity dominated; 2) r 12 < r < r 23 , thermal pressure and Thomson-scattering opacity dominated; 3) r 23 < r, thermal pressure and free-free opacity dominated.
Recall that in Illustris, 'mergers' occur when MBH particles come nearer than a particle smoothing length, after which the MBH are combined into a single, remnant MBH. We track these remnant particles and use their accretion rates (Ṁ) to calibrate the circumbinary disk's gas density. The distribution of Eddington ratios (Ṁ/Ṁ Edd ) for these remnants, at the time of their formation, are presented in Fig. 6, showing a clear bias towards near-Eddington accretion rates. MBH remnants tend to have enhanced accretion rates for a few gigayear after merger, and have higher average accretion than general BHs (Blecha et al. 2016). In Illustris, MBH accretion (and thus growth in mass) is always limited to the Eddington accretion rate. We introduce a dimensionless parameter f Edd to modulate those accretion rates, i.e.,Ṁ = Min Ṁ ill , f EddṀEdd .
Otherwise, in the formalism of HKM09, we use their fiducial parameter values 5 and assume an α-disk (i.e. the viscosity depends on total pressure, not just thermal pressure as in a so-called β-disk). The outer disk boundary is determined by instability due to selfgravity-measured as some factor times the radius, r Q , at which the Toomre parameter reaches unity, i.e. R SG = λ sg r Q . In our fiducial model, λ sg = 1, and variations in this parameter have little effect on the overall population of binaries. After marginalizing over all systems, changes to the different viscous-disk parameters tend to be largely degenerate: shifting the distribution of hardening timescales and the GWB amplitude in similar manners.

-4
10 -3 10 -2 10 -1 10 0 10 1 Radius [pc]  VD hardening timescales tend to decrease with decreasing binary separations. They thus tend to be dominated by Region-3at larger separations. For near-Eddington accretion rates, however, Region-2 and especially Region-3 tend to be self-gravity unstable, fragmenting the disk and eliminating VD altogether. For this reason, when high accretion rate systems have dynamically important disks, they tend to be in Region-1. In these cases, Region-1 extends to large enough radii such that for most masses of interest, GW emission will only become significant well within that region of the disk. Lower accretion rate systems are stable out to much larger radii, allowing many binaries to stably evolve through Region-2 and Region-3. These regions also cutoff at smaller separations, meaning that GW emission can become significant outside of Region-1.
Decreased disk densities mean less drag, but at the same time sufficiently high densities lead to instability, making the connection between accretion rate and VD-effectiveness non-monotonic. This is enhanced by gaseous DF, with an inner cutoff radius determined by the SG radius (see §3.1.2). In other words, gaseous DF is allowed to continue down to smaller radii when the outer disk regions become SG unstable. We impose an additional, absolute upper-limit to the SG instability radius of R SG,Max = 10 pc, i.e. R SG = Min λ sg r Q , R SG,Max , to keep the outer-edge of disks physically reasonable. Fig. 7 shows the fraction of Illustris binaries in the different regions of the disk, for our fiducial model. Only a fraction of MBHB spend time in Region-2 and Region-3 disks, and even that is only for a small region of log-radius space. While almost all systems do enter Region-1 by about 10 3 R s , GW hardening has, in general, also become significant by these same scales.
In addition to the spatially-distinct disk regions, different types of migration occur depending on whether the disk or the secondary-MBH is dynamically dominant (analogous to the distinction between 'planet-dominated' and 'disk-dominated', Type II migration in planetary disks-see, HKM09). If disk-dominated, the system hardens on the viscous timescale τ v , whereas if the secondary is dominant-as is the typical case in our simulations, the timescale is slowed by a factor related to the degree of secondary-dominance.
The resulting hardening timescales due to VD from a circumbinary disk are shown in Fig. 8. The more massive binaries 10 -7 10 -6 10 -5 10 -4 10 -3 10 -2 10 -1 10 0 10 1 Binary Separation [pc] 10 -4 Interval: 50.0% Figure 8. Hardening timescales due to circumbinary disk drag for binaries grouped by total mass and mass ratio. Light and heavy MBHB are separated by total masses below and above 10 9 M respectively; and minor and major based on mass ratios (µ) below and above 1/4. The median and surrounding 50% intervals for all MBHB systems are shown in grey, showing that 'light' systems dominate the bulk of the binary population. Heavy, and especially heavy-major, systems tend to harden orders of magnitude faster than lighter ones. The light population (especially major) exhibits nonmonotonicities at intermediate separations (∼ 10 −3 -10 −1 pc) indicative of changes between disk regions. Heavy systems (especially minor), on the other hand, show much smoother hardening rates consistent with moving through primarily Region-1.
('heavy', M > 10 9 M ) have orders of magnitude shorter VD hardening times, but are quite rare. The overall trend (grey, crosshatched) follows the less massive ('light') systems. The changes in slope of the 'light' populations (especially 'Major', µ ≡ M 2 /M 1 > 1/4) at separations larger than 10 −3 pc are due to transitions in disk regions. The 'heavy' systems tend to have SG unstable Regions 2 & 3, and thus harden more smoothly, predominantly due to Region-1. Fig. 9 compares median hardening rates in simulations including VD (solid) with those without a disk (dashed). In the former case, the purely VD hardening rates are also shown (dotted)-with the maximum disk cutoff R SG,Max , clearly apparent at 10 pc. Different dynamical friction prescriptions are shown, with mass enhancements over dynamical times calculated using the 'stellar' method in blue, and fixed 1 Gyr timescales in red (see §3.1). The upper panel shows a moderately refilled loss cone (F refill = 0.6), while in the lower panel the LC is always full (F refill = 1.0). The effects of VD are clearly apparent below a few 10 −2 pc in all models, and up to R SG,Max = 10 pc when F refill = 0.6. In the F refill = 1.0 case, LC hardening dominates to much smaller separations, making the VD effects minimal for the overall hardening rates. This is echoed in the changes in coalescing fractions 6 between the VD and No-VD cases: for F refill = 0.6, VD increases F coal by ∼ 10%, while for F refill = 1.0, it is only increased by ∼ 2%.
The circumbinary disk is SG unstable for many systems, and thus the median hardening rates including VD are often intermediate between the purely VD timescales and simulations without VD at all (e.g., seen between ∼ 5 × 10 −3 -1 pc in the upper panel of Fig. 9). At smaller separations ( 10 −3 pc), where LC is almost always subdominant to VD and/or GW hardening, the VD decreases the median hardening timescales by between a factor of . Median Hardening timescales with and without drag from a circumbinary Viscous Disk (VD) for different hardening models. Different LC refilling parameters (F coal = 0.6, upper; and F coal = 1.0, lower) are compared against DF models ('Enh-Stellar', blue; 'Enh-Gyr', red). Each line type shows a different hardening rate: viscous drag only (dotted), and cumulative hardening rates with VD (solid) and without VD (dashed). With F refill = 0.6, VD effects are apparent up to the disk cutoff, R SG,Max = 10 pc, whereas for F refill = 1.0, LC scattering dominates down to ∼ 10 −2 pc. Similarly, the presence of a circumbinary disk has a much more pronounced effect on the fraction of high mass-ratio (µ > 0.1) systems which coalesce by redshift zero (F coal ), which are indicated in the legends. At very low separations the cumulative (with-VD) hardening rate is very nearly the purely VD rate, showing its importance down to very small scales. At intermediate separations the 'cumulative, VD' rate is intermediate between the 'cumulative, No-VD' and the 'VD only' model, showing that the disk is only present in some fraction of systems at those scales. a few, and an order of magnitude. Notably, Fig. 9 shows that the scaling of hardening rate with separation below about 10 −3 pc is very similar between that of GW (which is dominant in the No-VD case) and VD. At these small scales, 80% of our binaries are in disk Region-1 (see Fig. 7), which has a viscous hardening rate, τ v,1 ∝ r 7/2 (HKM09, Eq. 21a), compared to a very similar scaling for GW, τ gw ∝ r 4 (see §3.4). Thus, even when VD dominates hardening into the mpc-scale regime, we don't expect the GWB spectrum from binaries in Region-1 to deviate significantly from the canonical −2/3 power-law.
Differences in median hardening timescales, solely due to viscous drag, are compared for a variety of VD parameters in Fig. 10. A simulation with our fiducial disk parameters is shown in dashedblack, and each color shows variations in a different parameter. Decreasing the viscosity of the disk (α, green) amounts to a proportional increase in the hardening timescale, and decrease in the coalescing fractions (F coal ). Decreasing the maximum disk radius (R SG,Max , red) decreases the overall effectiveness of VD, but because gaseous DF continues in its place, the coalescing fraction remains unchanged. While R SG,Max changes the maximum disk radius, λ sg changes the radii at which Region-2 & Region-3 become SG-unstable directly (i.e. even well within the maximum cutoff radius). Increasing λ sg by a factor of four (blue) significantly in-10 -4 10 -3 10 -2 10 -1 10 0 10 1 Binary Separation [pc]  The radius at which the disk becomes Self-Gravity (SG) unstable is R SG = Min λ sg r Q , R SG,Max , where r Q is the radius at which the Toomre parameter reaches unity. λ sg scales the SG-unstable radius, while R SG,Max is a maximum cutoff radius. α is the standard disk viscosity parameter, and f Edd limits the maximum accretion rate, i.e.Ṁ = Min Ṁ ill , f EddṀEdd . While this variety of VD parameters produces hardening rates varying by two orders of magnitude, the resulting changes to the coalescing fraction F coal is fairly moderate as VD is often subdominant to LC scattering at larger radii and to GW emission at smaller radii. Effects on F coal can be counterintuitive, for example decreasing the accretion rate (purple line) increases the median hardening timescale, but increases the coalescing fraction because the disk becomes SG-stable for a larger fraction of binaries. Each model uses F refill = 0.6, and the 'Enh-Stellar' DF.
creases the number of MBHB with SG-stable Region-2 7 between ∼ 10 −2 & 10 −1 pc, increasing the overall coalescing fraction. Decreasing the accretion rates ( f Edd , purple), and thus disk densities, increases the hardening timescales similarly to changing the viscosity (green). At the same time, significantly more systems have stable outer disks. This has the effect of increasing coalescing fractions noticeably, despite the increased median hardening timescales. In addition to increased outer-disk stability, the transition between disk regions are also inwards. A large number of MBHB at small separations ( 10 −3 pc) remain in disk Region-2 instead of transitioning to Region-1. This softens the scaling of hardening rate with separation to, τ v,2 ∝ r 7/5 (HKM09, Eq. 21b), which differs much more significantly from purely GW-driven evolution.

Gravitational-Wave Emission
Gravitational wave radiation will always be the dominant dissipation mechanism at the smallest binary separations-within hundreds to thousands of Schwarzschild radii. GW hardening depends only on the constituent masses (M 1 & M 2 ) of the MBHB, their separation, and the system's eccentricity. The hardening rate can be expressed as (Peters 1964), 1 + 73 24 e 2 + 37 96 e 4 1 − e 2 7/2 , where a is the semi-major axis of the binary and e is the eccentricity. In our treatment we assume that the eccentricities of all MBHB are uniformly zero, in which case, Eq. 12 can easily be integrated to find the time to merge, for a total mass M = M 1 + M 2 , mass ratio µ ≡ M 2 /M 1 , initial separation a 0 and critical separation R crit . In practice, we assume that the GW signal from binaries terminates at the Inner-most Stable Circular Orbit (ISCO), at which point the binary 'coalesces'; i.e. R crit = R isco (J = 0.0) = 3R s . For an equal mass binary, with median Illustris MBH masses 8 of about 10 7 M , the binary needs to come to a separation of ∼ 0.01 pc (∼ 10 4 R s ), to merge within a Hubble time. Characteristic timescales and separations for (purely) GW-driven inspirals across total mass and mass ratio parameter space are plotted in Fig. A2 of Appendix A. While the absolute most-massive MBHB can merge purely from GW emission starting from a parsec, the bulk of physical systems, at 10 6 − 10 8 M , must be driven by environmental effects to separations on the order of 10 −3 − 10 −2 pc (∼ 500 − 5000 R s ) to coalesce by redshift zero.

RESULTS
The hardening timescales for all Massive Black Hole Binaries (MBHB) are plotted against binary separation in Fig. 11, broken down by hardening mechanism. This is a representative model with a moderate loss-cone (LC) refilling fraction F refill = 0.6 (see §3.2), using the 'Enh-Stellar' DF (see §3.1). This is the fiducial 8 after typical selection cuts, described in §2.

Binary Lifetimes
Characteristic hardening timescales are often many 100 Myr, and MBHB typically need to cross eight or nine orders of magnitude of separation before coalescing. The resulting lifetimes of MBHB can thus easily reach a Hubble time. Fig. 12 shows binary lifetimes (upper panels) and the fraction of systems which coalesce by z = 0 (lower panels) for our fiducial model. Systems are binned by total mass and mass ratio, with the number of systems in each bin indicated. The plotted lifetimes are median values for each bin, with the overall distribution shown in the upper-right-most panel. Grey values are outside of the range of binned medians, and the cumulative distribution is given by the dashed line. The lifetime distribution peaks near the median value of 29 Gyr, with only ∼ 7% of lifetimes at less than 1 Gyr. About 20% of all MBHB in our sample coalesce before redshift zero. Systems involving the lowest mass black holes 1 (i.e. down and left) tend towards much longer lifetimes. Overall, lifetimes and coalescing fractions are only mildly correlated with total mass or mass ratio, when marginalizing over the other. For systems with total masses M > 10 8 M , the coalescing fraction increases to 23%, and for mass ratios µ > 0.2, only slightly higher to 26%. In general, examining slightly different total-mass or mass-ratio cutoffs has only minor effects on lifetimes and coalescing fractions.
There is a strong trend towards shorter lifetimes for simultaneously high total masses and mass ratios (i.e. up and right), where median lifetimes are only a few gigayear. Considering both µ > 0.2 and at the same time M > 10 8 M , coalescing fractions reach 45%. The handful of MBHB which coalesce after 1 Myr (∼ 0.3%) tend to involve MBH in over-massive galaxies (i.e. galaxy masses larger than expected from MBH-host scalings) with especially concentrated stellar and/or gas distributions. There are a handful of high mass ratio, and highest total mass MBHB (M ∼ 10 10 M ) systems showing a noticeable decrease in coalescing fraction. These systems form at low redshifts and don't have time to coalesce despite relatively short lifetimes.
Lifetimes and coalescing fractions for an always full LC (F refill = 1.0) are shown in Fig. A4, in Appendix A. The median value of the lifetime distribution shifts down to ∼ 8 Gyr, with ∼ 24% under 1 Gyr. The coalescing fractions increase similarly, and systems which are either high mass ratio (µ 0.2) or high total mass (M 10 8 M ) generally coalesce by redshift zero. Specifically, the coalescing fractions are 50% and 61% for all systems and those with µ > 0.2 respectively. Considering only M > 10 8 M , fractions increase to 54% & 99%.
Cumulative distributions of MBHB lifetimes are compared in Fig. 13 for a variety of LC refilling factors (colors) and our three primary DF prescriptions (panels; see §3.1). The first two panels correspond to prescriptions where the effective masses used in the DF calculation are the sum of the secondary MBH mass and the mass of its host galaxy. To model stripping of the secondary galaxy during the merger process, the effective mass decreases as a power law to the 'bare' MBH mass after a dynamical time. The 'Enh-Stellar' model (upper) calculates the dynamical time at twice the stellar half-mass radius, and the mass there enclosed. The 'Enh-Gyr' model (middle), on the other hand, uses a fixed 1Gyr timescale-almost a factor of ten longer than the median 'stellar'calculated value. Finally, the 'Force-Hard' model (lower), uses the 'bare' secondary MBH mass, but the binary is forced to the hard binary regime (generally 1 -10 pc) over the course of a dynamical time (calculated in the 'stellar' manner). Each color of line indicates a different LC refilling fraction, from always full (F refill = 1.0; blue) to the steady state (F refill = 0.0; orange). The fraction of high mass-ratio (µ > 0.1) systems which coalesce by redshift zero (F coal ) are also indicated in the legends.
The high mass-ratio coalescing fractions tend to vary by al- most a factor of four depending on the LC state, while the varying DF prescriptions have less than factor of two effect. Median lifetimes change considerably, however, even between DF models, for example with F refill = 1.0, the median lifetime for the 'Enh-Stellar' model is 7.7 Gyr, while that of 'Enh-Gyr' is only about 0.42 Gyr. Apparently, with a full LC, DF at large scales tends to be the limiting factor for most systems. While the highest overall F coal occurs for 'Force-Hard' & F refill = 1.0, it takes almost an order of magnitude longer for the first ∼ 10% of systems to coalesce than in either of the 'Enh' models. The effects of DF on the lifetimes of the first systems to merge are fairly insensitive to the LC state. There are thus cases where DF can be effective at driving some systems to coalesce very rapidly. At the same time, for the bulk of systems, after hardening past kiloparsec scales the remaining lifetime can be quite substantial. For F refill 0.6, neither the precise LC refilling fraction nor the DF model make much of a difference after the first 10 -30% of systems coalesce. In these cases, the most massive systems with high-mass ratios coalesce fairly rapidly regardless, but the smaller more extreme mass-ratio systems take many Hubble times to merge. Figure 14 shows the distribution of formation (black) and coalescence (colored) redshifts resulting from a variety of binary evo-10 -3 10 -2 10 -1 10 0 10 1 Redshift . Distribution of MBH binary formation (black) and coalescence (colored) redshifts for different DF models (panels) and two LC refilling parameters: always full (F refill = 1.0; blue) and our fiducial, moderately refilled (F refill = 0.6; green) value. The median redshift for each distribution is also plotted (dashed), with the corresponding look-back time indicated. The minimum delay time between medians of formation and coalescence is 1 Gyr, but up to 4.5 Gyr for our fiducial LC state and DF model ('Enh-Stellar').
lution models. Each panel shows a different DF prescription, and two LC refilling parameters are shown: always full, F refill = 1.0 (blue), and our fiducial, moderately refilled value of F refill = 0.6 (green). A handful of events have been cutoff at low redshifts (z < 10 −3 ) where the finite volume of the Illustris simulations and cosmic variance becomes important. Median redshifts for each distribution are overplotted (dashed), along with their corresponding look-back times. The median formation redshift for our MBHB is z = 1.25 (look-back time of ∼ 8.7 Gyr). For a full LC and the stronger DF models, 'Enh-Gyr' and 'Force-Hard', the median coalescence redshifts are delayed to z ∼ 1.0 and z ∼ 0.9 respectivelyi.e. by about a gigayear. For our more modest, fiducial DF prescription, 'Enh-Stellar', even the full LC case still delays the median coalescence redshift to z ∼ 0.6, about 3 Gyr after the peak of MBHB formations. If the LC is only moderately refilled, the median redshifts are much lower: between z ∼ 0.4 -0.6.

The Gravitational Wave Background (GWB)
In §1 we have outlined the theoretical background for the existence of a stochastic GWB, and introduced the formalism for calculating 10 -3 10 -2 10 -1 10 0 10 1 GW Frequency (Observed) [ 10 -10 10 -9 10 -8 10 -7 GW Frequency (Observed) [s −1 ] Figure 15. Stochastic Gravitational Wave Background spectrum produced by Illustris MBH binaries, assuming purely power-law evolution with all systems efficiently reaching the GW regime. Power-law predictions from the literature (described in §1) are presented for comparison, along with the most recent PTA upper limits. The power-law spectrum resulting from the Illustris simulations is very consistent with previous results, and about 30% below the most stringent observational upper limits. pure power-law spectra. Fig. 15 shows the purely power-law spectrum derived from Illustris MBH binaries, assuming that all systems (passing our selection cuts outlined in §2) reach the GW dominated regime and evolve purely due to GW emission. Other representative power-law predictions (see §1) and recent pulsar timing array (PTA) upper limits are included for comparison. The Illustris prediction is completely consistent with the existing literature and about 30% below the most recent PTA upper limits. These consistencies validate the Illustris MBHB population, and the prescriptions for the growth and evolution of individual MBH.
Almost all of the details of binary evolution are obscured in purely power-law predictions (i.e. Eq. ??). In particular, they imply that all MBHB instantly reach the separations corresponding to the frequencies of interest, and evolve purely due to GW-emission. In reality, we've shown that the delay time distribution can be significant at fractions of a Hubble time. This has the important consequence that not all MBHB coalesce (or even reach the PTA band) before redshift zero. At the same time, the environmental effects (e.g. LC scattering) which are required to bring MBH binaries to the relevant orbital frequencies also decrease the time they emit in each band, attenuating the GW signal.

Full GWB Calculation Formalism
The GWB can be calculated more explicitly by decomposing the expression for GW energy radiated per logarithmic frequency interval, where the right-hand-side terms are the GW power radiated and the time spent in each frequency band. The latter term can be further rewritten using Kepler's law as, 10 -3 10 -2 10 -1 10 0 10 1 GW Frequency (Observed) [ A yr −1 = 7. 1 × 10 −16 A yr −1 = 3. 7 × 10 −16 Full All MBHB, Power-Law Power-Law + 7. 2 Gyr Parkes: Shannon+2015 10 -10 10 -9 10 -8 10 -7 GW Frequency (Observed) [s −1 ] Figure 16. Stochastic Gravitational Wave Background calculated from Illustris MBH binaries. The 'Full' calculation, shown in red, includes environmental effects from dynamical friction ('Enh-Stellar'), stellar scattering (F refill = 0.6), and a viscous circumbinary disk. Purely power-law models are also shown, for all Illustris MBHB (blue, dashed) and only the MBHB which coalesce by redshift zero after being delayed for 7.2 Gyr (purple, dotted). The GWB strain amplitudes at the standard frequency of 1 yr −1 are given, showing that a complete model of MBHB evolution leads to a ∼ 50% decrease of the signal. The most stringent observational upper limits are also shown.
where 'a' is the semi-major axis of the binary. In this expression, we can identify the binary 'hardening time' 2 , τ h ≡ a/ (da/dt r ).
For reference, the binary separations corresponding to each GW frequency are shown in A1. While the GW power radiated is determined solely by the binary configuration (chirp mass and orbital frequency), the hardening time is determined by both GW emission and the sum of all environmental hardening effects. For more generalized binary evolution we can write, This can be used to reformulate the GWB spectrum calculation 3 (Eq. ??) as, or for discrete sources, Additional hardening mechanisms will decrease the hardening timescale, i.e. τ h /τ gw 1, decreasing the GWB. The purely powerlaw expression in Eq. 1 (and the Illustris spectrum in Fig. 15) thus represents an upper-limit to the GWB amplitude. While non-GW mechanisms are required to bring MBH binaries close enough to effectively emit gravitational waves, they also attenuate the amplitude of the GW background.

Fiducial Model Predictions
The stochastic Gravitational Wave Background (GWB) resulting from our fiducial model is presented in Fig. 16. The 'Full' cal-culation (red, solid) uses Eq. 18, including the effects of DF, LC scattering, and VD in addition to GW emission. This is compared to a purely power-law model (blue, dashed), calculated with Eq. 1 and assuming that all Illustris MBHB reach the PTA-band rapidly, and evolve solely due to GW-emission. The amplitudes at 1 yr −1 are indicated, showing that the full hardening calculation with an amplitude of A yr −1 ≈ 3.7 × 10 −16 amounts to an almost 50% decrease from the naive, power-law estimate of A yr −1 ≈ 7.1 × 10 −16 . The amplitude of the full GWB calculation can be matched at 1 yr −1 using the power-law model by introducing a uniform delay time of ∼ 7.2 Gyr-such that the systems which formed within a look-back time of 7.2 Gyr don't coalesce or reach the relevant frequency ranges. This is shown in Fig. 16 (purple, dotted) as a heuristic comparison. At frequencies of the PTA band (∼ 0.1 yr −1 ) and higher, our full calculation very nearly matches the A yr −1 ∝ f −2/3 power-law. A significant flattening of the spectrum is apparent at and below a few 10 −2 yr −1 , where environmental effects (e.g. LCscattering) significantly increase the rate at which MBHB move through a given frequency band, decreasing τ h and thus attenuating the amplitude of the GWB. The particular location and strength of the spectral flattening (or turnover) depends on the details of the DF and especially LC models.

GWB Variations with Dynamical Friction and Loss-Cone
Model Parameters Figure 17 compares the GWB spectrum for different DF prescriptions (panels) and LC refilling fractions (line colors). The naive, power-law model is shown as the dashed line for comparison, along with the most stringent PTA upper limit. Effects from variations in the DF prescription are strongly subdominant to changes in the LC state. The spectral shape is determined almost entirely, and at times sensitively, to F refill . For F refill < 0.8, the spectrum flattens at low frequencies, whereas for higher values it becomes a turnover. Even then, the location of the peak amplitude of the spectrum changes by more than a factor of two between F refill = 0.8 and F refill = 1.0. The cutoff seen in the full LC case (F refill = 1.0) is very similar to that found by Sesana (2013a) (with ours ∼ 5 times lower amplitude), who show that in the scattering-dominated regime the GWB turns into a h c ∝ f spectrum. McWilliams et al. (2014) also find a spectral cutoff, but at an order of magnitude higher frequency and amplitude. Unlike the results of Ravi et al. (2014), the cutoffs in our predicted GWB spectra are always at lower frequencies than will be reached by PTA in the next decade or so, likely because we assume zero eccentricity in the binary evolution. In the near future we hope to present results expanded to include eccentric evolution, in addition to exploring 'deterministic' or 'continuous' GW sources-i.e. sources individually resolvable by future PTA observations.
For each DF case in Fig. 17, the GWB spectrum is almost identical between F refill = 0.0, 0.2 & 0.4, with very little change in the coalescing fractions. This is consistent with changes in the distribution of lifetimes from varying DF and LC parameters. Looking at f = 1 yr −1 , there is a sudden jump in amplitude with F refill = 0.6, and a modest increase in the coalescing fraction. Between F refill = 0.6 and F refill = 0.8, on the other hand, there tends to be a more modest increase in GWB amplitude, but a roughly factor of two increase in F coal . This contrast arrises from the changing population of MBHB which are brought to coalescence from each marginal change in refilling fraction. For an increase in F refill , the additional MBHB which are then able to coalesce tend to be the most massive of those which were previously persisting. Those, more massive systems, then have a larger effect on the GWB Figure 18 shows the strain at 1 yr −1 (A yr −1 ) versus coalescing fraction for the same set of DF and LC models. The colors again show different F refill , and now symbols are used for different DF prescriptions. The GWB amplitude is strongly correlated with coalescing fraction, but plateaus once roughly 50% of high mass-ratio MBHB are coalescing. Different DF parameters have little effect on A yr −1 but more noticeably affect F coal , in both cases this is especially true at higher F refill . The inset panels show, independently, how A yr −1 and F coal scale with F refill and DF model, reinforcing the previous points. In general, as F refill increases, lower total-mass MBHB systems are able to reach the PTA band, contribute to the GWB and coalesce effectively. At F refill ≈ 0.5, the large increase in A yr −1 is driven by massive MBH coming to coalescence, while at F refill ≈ 0.7, a large number of MBH at lower masses are driven together, significantly increasing the coalescing fraction, but only marginally increasing A yr −1 .
At the higher frequencies just discussed the GWB strain increases monotonically with F refill and coalescing fraction. This is intuitive as increasing effectiveness of the LC means more MBHB are  Figure 18. Dependence of GWB strain amplitude and binary coalescing fraction on LC refilling parameter and DF model. The GW strain is measures at the canonical f = 1 yr −1 , and the coalescing fraction is defined using the population of high mass-ratio µ > 0.1 systems. Each symbol represents a different DF model, and each color a different LC refilling parameter. A yr −1 tends to increase monotonically with F coal , but plateaus above F coal 0.5. The insets show how each A yr −1 and F coal change with F refill . The strongest changes in F refill and A yr −1 occur at slightly different values of the refilling fraction. This is because for an increase in F refill , the additional MBHB which are then able to coalesce tend to be the most massive of those which were previously persisting. At F refill ≈ 0.5 there is a significant change A yr −1 , due to more massive and stronger GW emitting MBHB merger at that point. At F refill ≈ 0.7, F coal changes significantly due to less massive MBHB then being able to merger, and them constituting a larger portion of the binary population. able to reach the GW-regime and then coalesce. Fig. 17 shows that this trend is not the case at lower frequencies (i.e. f 10 −1 yr −1 )where the highest F refill show a decrease in the GWB amplitude. This can be seen more clearly in Fig. A5, which shows the GWB amplitude at f = 10 −2 yr −1 versus coalescing fraction. The trend is generally the same-strain increasing with F refill -until F refill = 1.0 at which point the GWB amplitude drops significantly. At these low frequencies, LC stellar scattering is effective enough to significantly attenuate the GWB amplitude. This reflects a fundamental tradeoff in the realization of environmental effects: on one side bringing more MBHB into a given frequency bands, at the same time as driving their evolution rapidly through it, and attenuating the GW signal.

Effects of Circumbinary Viscous Drag on the GWB
The effects of viscous drag (VD) from a circumbinary disk are more subtle than those of DF and LC. Fig. 19 compares the GWB from our fiducial model (black, dashed) with a variety of VD parameter modifications (colored lines) and to a simulation with VD turned off (grey, dotted). All of these models use the 'Enh-Stellar' DF, but because there is virtually no overlap in between the VD and DF regimes, the results are very similar. The same VD parameters are explored as in the hardening rates shown in Fig. 9: modifying the self-gravity (SG) instability radius (λ sg ), the maximum SG radius (R SG,Max ), the maximum accretion rate ( f Edd ), and the alpha viscosity parameter (α). The upper panel shows a moderately refilled LC (F refill = 0.6), while in the lower panel the LC is always full (F refill = 1.0). The inset panels show the ratio of GWB strain from each model to that of a 'VD: Off' (i.e. no disk) model.
The overall shape of the GWB spectrum and the location of Strain Ratio Figure 19. Gravitational wave background from varying viscous drag (VD) parameters. Simulations with a variety of VD models are compared, with our fiducial model in dashed black, a model with no-VD in dotted grey, and each color of line showing changes to a different parameter. For comparison, the power-law model using all Illustris MBHB is also shown. The parameters modified are: the self-gravity (SG) instability radius (λ sg ), the maximum SG radius (R SG,Max ), the maximum accretion rate ( f Edd ), and the alpha viscosity parameter (α). The hardening rates for each of these models are shown in Fig. 10. The upper and lower panels show simulations for different LC refilling fractions. The inset panels show the ratio of GWB amplitude from each model to the 'VD: Off' case, as a function of GW frequency. Different VD parameters change A yr −1 by 10 -40%, and the spectral slope at 1 yr −1 by up to ∼ 10%. the spectral turnover is again determined almost entirely by the LC. The circumbinary disk does affect an additional 10 -40% amplitude modulation, tending to increase the amplitude at low frequencies ( 10 −2 yr −1 ) and decrease it at higher frequencies ( 10 −1 yr −1 ). This reflects the same tradeoff between bringing more MBHB into each frequency band, versus driving them more rapidly through them. In the moderately (completely) refilled LC case, our fiducial VD model amounts to a ∼ 20% (∼ 30%) decrease in A yr −1 = h c ( f = 1 yr −1 ) and similarly at h c ( f = 10 −1 yr −1 ).
At frequencies near the PTA band, the relationship between F coal and the GWB amplitude can be non-monotonic for VD variations, like with variations to the LC at low frequencies. For example, a comparison of Figures 9 & 19 shows that the α = 0.1 (green) model has the lowest fraction of high mass-ratio coalescences (with F coal = 0.22, versus F coal = 0.24 for the fiducial model, and F coal = 0.31 for the f Edd = 0.1 case) but an intermediate A yr −1 .
One striking feature of the GWB strain ratios is the clear variations in spectral index, even at high frequencies. This is especially . Binary hardening timescales versus GW frequency, by mechanism, for our fiducial model (F refill = 0.6, DF: 'Enh-Stellar'). Lines and bands show the median and 50% intervals for individual mechanisms (colors), along with the total hardening rate (grey, hatched). The inset panels show the fraction of binaries dominated by each mechanism, again versus frequency. The upper panel shows all MBHB in our sample, while the lower panel includes only systems with total mass above 3×10 9 M , roughly where the bulk of the GWB amplitude comes from. Only for the high mass systems do the majority become dominated by GW emission at high frequencies, with VD still contributing substantially to the overall hardening timescales. true for the always full LC, where the slope of the GWB can deviate by almost 10% from the canonical −2/3 power-law. The diskless model (grey, dotted) deviates by about 4% (3%) for F refill = 0.6 (F refill = 1.0) at f = 1 yr −1 , due to a combination of residual LC scattering effects and some binaries stopping emitting after coalesce at varying critical frequencies. In our fiducial model (black, dashed), the deviations are more significant at 6% (8%). As different parameters make VD hardening more important at this frequency, the GWB amplitude decreases, and the spectral index tends to flatten. Our fiducial VD model tends to have among the strongest spectral deviations. Towards lower frequencies, where PTA are heading, the turnover in the GWB spectrum becomes more significant, especially if the LC is effectively refilled. At f = 10 −1 yr −1 , for example, our fiducial model ('Enh-Stellar', F refill = 0.6) gives a spectral index of about −0.6, while for F refill = 1.0 it becomes slightly flatter than −0.4. A summary of GWB amplitudes and spectral indices are presented in Table B1, for a variety of configurations.
For a given binary system, GW radiation will always dominate at some sufficiently small separation (high frequency) where the circumbinary disk dynamically decouples from the hardening MBHB. This does not necessarily mean, however, that after considering a full ensemble of MBHB systems, with a variety of masses, that there is any frequency band with a spectral slope identical to the purely GW-driven case (h c ∝ f −2/3 ). Hardening rates as a function of GW frequency are shown in Fig. 20. The upper panel includes all MBHB in our sample 4 , for which we see that VD remains dominant well above the PTA frequency band. The high total-mass systems (M > 3 × 10 9 M )-which contribute the bulk of the GWB signal-are shown in the lower panel. These binaries tend to be driven in roughly equal amounts by VD and GW hardening at the frequencies where PTA detections should be forthcoming. Figure 20 (and Fig. 11) show that the typical hardening rates for VD are very similar to that of GW radiation. Indeed, as discussed in §3.3, the inner-most disk region has hardening times τ v,1 ∝ r 7/2 , while that of purely gravitational wave emission is τ gw ∝ r 4 . Hardening rates for farther-out disk regions tend to deviate more strongly from that of purely GW evolution, which could become more important for lower density disks.
The MBH accretion rates, which set the density of the circumbinary disks in our models, are perhaps one of the more uncertain aspects of the Illustris simulations, given that the accretion disk scale is well below the resolution limit and must therefore rely on a sub-grid prescription. Additionally, out of all possible configurations, the fiducial disk parameters we adopt tend to produce fairly strong effects on the GWB. If, for example, a β-disk model is more accurate, or the α-viscosity should be lower, the effects in the PTA band will be more moderate (see, e.g., Kocsis & Sesana 2011). None the less, we consistently see GWB spectral indices between −0.6 and −0.65 at 1 yr −1 , for a wide variety of model parameters. While these 10% deviations may be entirely unobservable in PTA observations (especially after taking stochastic variations into account; e.g. Sesana et al. 2008), it may need to be considered when using priors or match-filtering for detecting a GWB. More stringent observational constraints on specifically post galaxy-merger AGN activity could be used to better calibrate the VD model.

The Populations of MBHB
For the first time, we have used cosmological, hydrodynamic models which self-consistently evolve dark matter, gas, stars and MBH, to more precisely probe the connection between MBHB mergers and their environments. Previous calculations (see §1) of the GWB using SAM prescribe MBH onto their galaxies based on scaling relations. The MBH population in Illustris, on the other hand, coevolves with, and shapes, its environment. These data are then much better suited to analyze the details of MBHB and GW source populations, and their hosts. Figure 21 shows the distribution of properties for sources contributing to the GWB, from top to bottom: total mass (M), mass ratio (µ), and redshift (z). In the left column, these properties are weighted by squared-strain 5 for each source, and the resulting one-, two-, and three-sigma contours are shown as a function of GW frequency. The right column shows the cumulative distribution over the same source properties, weighted by A 2 yr −1 (solid), compared to the unweighted distribution of all sources contributing at f = 1 yr −1 . Strain-weighted sources tend to be at higher mass-ratios and much higher masses. While the fraction of all binaries rises fairly smoothly with total masses between 10 7 and 10 9 M (dashed, 4 Recall that we select only MBH with masses M > 10 6 M . 5 As seen in Eq. 18, binaries contribute to the strain spectrum in quadrature. shows, from top to bottom, the MBHB total mass, mass ratio, and redshift, weighted by each system's contribution to the GWB amplitude. Contours represent one-, two-, and three-sigma intervals. The Right column shows cumulative distributions, at a frequency of 1 yr −1 , for the same parameters. The solid line weights by contribution to the GWB amplitude (A 2 yr −1 ) and the dashed line is the distribution of the number of sources contributing at 1 yr −1 . The median values by GWB-contribution are roughly constant over GW frequency, at M ≈ 4 × 10 9 M , µ ≈ 0.3, and z ≈ 0.3 for this, our fiducial model. The overall distribution of sources moves noticeably to include lower masses, mass ratios, and redshifts at higher GW frequencies.
The contribution from redshifts above z ≈ 0.4 drops sharply, with 1% of the GWB signal coming from z > 1.0, while still ∼ 20% of all binaries emit there. black line; top-right panel), 90% of the GWB is contributed (solid, black line) by binaries with total mass 10 9 M -simply showing the strong dependence of the GW strain on the total system mass.
The core contribution over all three parameters tends to remain fairly constant over GW frequency, with median values around M ≈ 4 × 10 9 M , µ ≈ 0.3, and z ≈ 0.3. The tails of the distribution drop to noticeably lower values when moving to higher frequencies. This is especially pronounced in the redshift distribution, where at frequencies of a few times 10 −3 yr −1 virtually all GWBweighted sources come from z > 10 −2 , while at f = 1 yr −1 , almost 10% are below that redshift. While ∼ 20% of binaries that reach f = 1 yr −1 come from redshift above z = 1, they only contribute ∼ 0.5% of the GWB amplitude. Lower redshift and higher mass-ratio systems do contribute somewhat disproportionately to the GWB amplitude, but their distributions are altogether fairly consistent with the overall population. The presence of a nonnegligible fraction of low redshift sources motivates the need to explore populations of MBHB in the local universe which could be resolvable as individual 'stochastic' sources, or contribute to angular anisotropies in the GW sky. An analysis of our results in this context is currently underway, and the results will be presented in a future study.
As we move into the forthcoming era of PTA detections it will be increasingly important to use self-consistent hydrodynamic models to better understand the coupling of the MBH populations . The properties of host galaxies for the population of MBHB contributing to the GWB. From top to bottom, rows show the stellar (halfmass) radius, stellar mass (within twice the stellar half-mass radius), and subhalo mass (mass of all particles associated with the galaxy). The left column shows these parameters weighted by their resident MBHB's contribution to the GWB as a function of frequency. The right column shows the cumulative distribution at f = 1 yr −1 , both for contribution to GWB amplitude (solid) and by overall number (dashed). The GWB comes from MBHB predominantly in galaxies which are over-sized and significantly over-massive-especially in stars.
to their host-galaxies and merger environments. The Illustris hostgalaxy properties of our MBHB, at the time of binary formation, are presented in Fig. 22. We show stellar radius, stellar mass and 'subhalo mass', and each of these properties 6 is strongly biased towards higher values when weighting by GW strain. In particular, the median, strain-weighted subhalo and stellar masses are each more than an order of magnitude larger than the median of the host-galaxy population by number. The bias is exceedingly strong for stellar mass, where ∼ 90% of the GWB amplitude is contributed by only ∼ 20% of MBHB host galaxies. Following the galaxies that host MBHB to observe their parameters at the times they contribute significantly to the GW spectrum will be important for any future multi-messenger observations using PTA or predicting and deciphering anisotropies in the GWB (Taylor & Gair 2013;Mingarelli et al. 2013;Taylor et al. 2015). Better understanding host galaxy properties as they evolve in time could also be useful in understanding whether 'offset' AGN (those distinctly separated from the morphological or mass-weighted center of their galaxies) are due to binarity (i.e. a recent, or perhaps not so recent, merger) or possible due to post-coalesce GW 'kicks' (e.g. Blecha et al. 2016). 10 -3 10 -2 10 -1 10 0 Force-Hard 10 -4 10 -3 10 -2 10 -1 10 0 Mass Ratio Figure 23. Fraction of binaries which persist at redshift zero as a function of total mass (left column) and mass ratio (right column). Three different DF models are compared, from top to bottom: 'Enh-Stellar', 'Enh-Gyr', and 'Force-Hard'; and in each case, LC refilling parameters of F refill = 0.6 (green) and F refill = 1.0 (blue) are compared. Different line patterns show binaries with different separations: all separations (r 0.0, dotted), r 1 pc (dot-dashed), and r 10 2 pc (dashed). Note that in each panel the r 10 2 pc distributions are indistinguishable between LC parameters as the LC only takes effect at and below about 10 2 pc. The fraction of persisting systems is very strongly dependent on both DF and LC model. For 'Enh-Stellar', the most conservative DF case, a large fraction of systems remain in the DF regime (r ∼ 10 2 pc), before LC scattering can have a significant effect. 'Force-Hard', on the other hand, represents an approximately optimal DF at large scales, and shows a corresponding dearth of wide separation systems. Observations of the true fraction of systems at these separations could strongly constrain the efficiency of these hardening mechanisms.
Of great observational interest is the presence (e.g. Comerford et al. 2015), or perhaps conspicuous absence (e.g Burke-Spolaor 2011), of dual and binary AGN. The observational biases towards finding or systematically excluding MBH binaries with electromagnetic observations are extremely complex. None the less, understanding the characteristic residence times of binaries at different physical separations, the types of host galaxies they occupy, and the probability they will be observable (e.g. via the amount of gas available to power AGN activity) is crucial to backing out the underlying population, and placing empirical constraints on models of MBHB inspiral. A systematic study of this topic using these data is currently underway (Kelley et al. in prep.).
The fraction of MBHB which persist (i.e. remain uncoalesced) at redshift zero are shown in Fig. 23 as a function of total mass (left column) and mass ratio (right column). Three different separation criteria are shown in each panel: r > 0.0 (i.e. any persisting MBHB; dark, dotted), r > 1 pc (medium, dot-dashed), and r > 10 2 pc (light, dashed). Each row corresponds to a different DF model, and line colors vary by LC refilling fractions. In general, persisting fractions fall rapidly with increasing total mass and moderately with increasing mass ratio, until nearly equal-mass systems where the persisting fractions plummet.
The specific persisting fraction depends quite sensitively on both F refill and DF model. The 'Enh-Stellar' model has by far the most persisting systems, and relatively slight variance with either separation criteria or F refill . For our fiducial model with F refill = 0.6, 80% of all binaries persist, with only weak trends with either total mass or mass ratio: 77% with M > 10 8 M , and 74% with µ > 0.2. For systems fulfilling both requirements, the persisting fraction drops more noticeably to 55%.
The r > 10 2 pc population in particular, is almost solely determined by DF, as the other hardening mechanisms take effect only at smaller scales. At these large separations, persisting fractions for our fiducial model are 46% (all), 45% (M > 10 8 M ), and 33% (µ > 0.2), but for both high total-mass and mass-ratio, the widelyseparated persisting fraction drops dramatically to only 1%. If the DF is more effective, as in the 'Enh-Gyr' model, these fractions decrease significantly to 11% (all), 15% (M > 10 8 M ), 6% (µ > 0.2), and 1% (M > 10 8 M & µ > 0.2). A summary of persisting fractions at both r > 10 2 pc and r > 1 pc, for mass combinations and DF & LC models are presented in Table B1.

MBH Triples
The long characteristic lifetimes we see in our MBHB populations, and the (at times) substantial number of systems which remain at large separations, immediately begs the question of how often a third MBH (i.e. second galaxy merger) could become dynamically relevant. For F refill = 1.0, the median lifetimes of our MBHB tend to be comparable to the median time between binary formation events, and for F refill = 0.6, they are almost an order of magnitude longer. After selection cuts (see §2), 37% of our MBH binaries have subsequent 'merger' events (i.e. a second 'merger' is recorded by Illustris involving an MBH meeting our selection criteria). In our implementation, each of those binary systems are evolved completely independently, even if parts of their evolution are occurring simultaneously 7 . With this caveat, we can still consider, very simplistically, in how many systems the second binary overtakes the first as they harden. Out of the binaries with subsequent events, 83% (31% of all binaries) are overtaken, 76% (28% overall) of those before redshift zero, and 42% (16%) with z > 0.0 and mass ratio µ > 0.2.
The tendency for subsequent binaries to cross in our simulations likely reflects systems' ability to increase noticeably in total mass over the course of the merger process. We emphasize that this is a very simplistic and preliminary investigation. If, for example, MBH remnants tend to receive significant 'kicks' after merger, the resulting fractions could change significantly. None the less, the apparent commonality of candidate multiples suggests that the role of triples should be investigated more thoroughly.
It is unclear how such triple systems should be treated, even in a simple semi-analytic manner (see, however, Bonetti et al. 2016). The conventional wisdom of triple system dynamics is that the lowest mass object will be ejected, while the more massive pair become bound in a binary (e.g. Hills 1975). Such 'exchange' inter-actions are motivated primarily from stochastic scattering events, like those which may occur between stars in dense stellar environments. In these cases, the system can be viewed as nearly dissipationless, and their initial encounter is effectively stochastic. It is our premise, however, that the environments and dynamics of MBH multiples are heavily dissipational. For example, consider an initial pair of MBH which encounter at kpc scales, on a hyperbolic orbit. If the system quickly circularizes, and hardens to scales of 1 − 100 pc, then a third MBH which encounters the system-again at kpc scales-may similarly settle into an outer, roughly-circular orbit forming a hierarchical system. In such a situation, secular instead of scattering dynamics, such as the Kozai-Lidov mechanism (Lidov 1962;Kozai 1962) or resonant migration may be more appropriate than traditional three-body scatterings. In this case, the outer MBH in the triple system may accelerate the hardening of the inner binary, driving it to coalescence (e.g. Blaes et al. 2002). This may be less likely in gas-rich environments which could effectively damp eccentric evolution, but here gas-driven inspiral will likely cause rapid coalescence in any case.
MBH triples forming hierarchically with low to moderate eccentricities may evolve in a resonant fashion. If, on the other hand, environmental effects sufficiently enhance (or preserve initially high) eccentricities of MBHB, then the resulting highly radial orbits may strongly intersect. In that case, a more stochasticscattering-like regime may indeed still be appropriate. Numerous studies have suggested that environmental effects can indeed enhance MBHB eccentricity (e.g. Quinlan 1996;Sesana 2010;Ravi et al. 2014). If the interaction between MBH triples (or even higherorder multiples) is indeed most similar to scattering, then the simplest prescription of removing the lowest-mass BH, with or without some additional hardening of the more massive pair, may still be appropriate (e.g. Hoffman & Loeb 2007). An ejected MBH which may later fall back to the galactic center, while of great observational interest in and of itself, is likely less important for GW emission per se.
The observation of a triple-AGN system could provide insight into the type of system they form (i.e. hierarchical vs. scattering), and their lifetimes. Additionally, MBH ejected by three-body interactions could be observable as offset AGN, and possible confused with binary MBH, or ones 'recoiling' from previous coalescences (e.g. Blecha et al. 2011). We have assumed that recoiling systems do not significantly affect our populations, effectively assuming that kicks are small-which is expected for spin-aligned MBHB. This is motivated by studies which have shown that gravitational torques from circumbinary disks, such as those we consider, can be effective at aligning spins on timescales significantly shorter than a viscous time (e.g. Bogdanović et al. 2007;Dotti et al. 2010;Miller & Krolik 2013).
The MBH populations from the Illustris simulations are well suited for this problem, as they accurately follow the histories and large-scale environments of MBHB systems and host galaxies. As we are currently working on implementing eccentric evolution into our simulations, we plan to explore multi-MBH systems in more detail. This framework will also allow for the treatment of kicked MBH resulting from random spin orientations, if for example the spins of a substantial fraction of MBHB occur in gas-poor environments in which they may not be aligned.
For the first time, we have used the results of self-consistent, hydrodynamic cosmological simulations, with a co-evolved population of Massive Black Holes (MBH) to calculate the plausible stochastic Gravitational-Wave Background (GWB) soon to be detectable by Pulsar Timing Arrays (PTA). We have also presented the first simultaneous, numerical treatment of all classes of MBH Binary (MBHB) hardening mechanisms, discussing the effects of each: dynamical friction, stellar (loss-cone) scattering, gas drag from a viscous circumbinary disk, and gravitational wave emission.
The most advanced previous studies have included only individual environmental effects, for example, calculating dynamical friction (DF) timescales to determine which systems will contribute to the GWB (McWilliams et al. 2014), or attenuating the GWB spectrum due to loss-cone (LC) stellar-scattering (Ravi et al. 2014). We explicitly integrate each of almost ten thousand MBH binaries, from galactic scales to coalescence, using self-consistently derived, realistic galaxy environments and MBH accretion rates. We thoroughly explore a broad parameter space for each hardening mechanism to determine the effects on the MBHB merger process, the lifetimes of systems, and the resulting GWB spectrum they produce.
The resulting lifetimes of MBHB that coalesce by redshift zero are usually gigayears, while that of low total-mass and extreme mass-ratio systems typically extend well above a Hubble time. In our fiducial model, with a modest DF prescription ('Enh-Stellar') and moderately refilled LC (F refill = 0.6), the median lifetime of MBHB with total masses M > 10 8 M is 17 Gyr, with 23% coalescing before redshift zero. Massive systems that also have high mass ratios, µ > 0.2, merge much more effectively, with a median lifetime of 6.9 Gyr and 45% coalescing at z > 0. Increasing the effectiveness of the LC drastically decreases system merger times. For an always full LC (F refill = 1.0), the lifetime of massive systems decreases to 4.9 Gyr and 0.35 Gyr for systems with M > 10 8 M and all mass ratios, and those with µ > 0.2 respectively. The coalescing fractions in these cases doubles to 54% and 99%. A summary of lifetimes and coalescing fractions for different models is presented in Table B1.
The growing number of dual-MBH candidates (e.g. Deane et al. 2014;Comerford et al. 2015) presents the opportunity to constrain binary lifetimes and coalescing fractions observationally. For most of our models, only about 1% of MBHB with total masses M > 10 8 M and mass µ > 0.2 remain at separations r > 10 2 pc at redshift zero. At smaller separations, r > 1 pc, the fractions are dependent on model parameters, but in general between 1 -40%. Tabulated persisting fractions are included in Table B1 for a variety of models and situations. Observational constraints on these fractions can narrow down the relevant parameter space of hardening physics. Accurate predictions for dual-MBH observations must fold in AGN activity fractions and duty cycles, and their correlations with binary merger lifetimes. A comprehensive study of dual-AGN observability predicted by our models, over redshift and different observational parameters, is currently underway.
In addition to measuring the fraction of MBH in associations (e.g. dual AGN) as a function of separation, the redshift distribution of dual-MBH can be useful in understanding their evolution. The Illustris simulations, for example, give a median MBHB formation redshift 8 of z ≈ 1.25. Depending on the parameters of the hardening models, the median coalescing redshift can be anywhere between z ≈ 0.4 -1.0, with z ≈ 0.6 suggested by our fiducial model.
Without electromagnetic observations, GWB detections and upper limits can also be used to inform our understanding of MBH evolution (e.g. Sampson et al. 2015). Even if the fraction of systems which coalesce is quite low, the most massive and high mass-ratio systems, which produce the strongest GW, are difficult to keep from merging. In a simulation with the weakest hardening rates ('Enh-Stellar'; F refill = 0.0) only ∼ 12% of all binaries coalesce by redshift zero, but the GWB amplitude at f = 1 yr −1 is still 0.2×10 −15 -only about a factor of five below the most recent upper limits 9 . In our fiducial model, we use a moderate LC refilling rate (F refill = 0.6) which increases the number of MBHB contributing to the GWB at 1 yr −1 , producing an amplitude of A yr −1 ≈ 0.4 × 10 −15 . Increasing the effectiveness of DF and/or LC scattering tends to increase the amplitude further. Our fiducial model also includes fairly strong viscous drag (VD) from circumbinary disks, which decreases the time MBHB spend emitting in each frequency band, and thus attenuating the GWB. This effect tends to be more subtle, producing GWB attenuation of about 15%. In general, for a fairly broad range of parameters, our simulations yield GWB amplitudes between ∼ 0.3 − 0.6 × 10 −15 . A GWB amplitude of A yr −1 ≈ 0.4 × 10 −15 is less than a factor of three below current detection limits-a parameter space which will likely be probed by PTA within the next decade.
The most stringent PTA upper limits of A yr −1 10 −15 (Shannon et al. 2015) have already excluded a broad swath of previous predictions. Many of those models assume that binary hardening is very effective, with all MBHB quickly reaching the PTA band and emitting an unattenuated signal-i.e. evolving purely due to GW-emission, without additional environmental hardening effects. Following the same procedure, to calculate an upper-limit to the GWB based on our population of MBHB, we find a GWB amplitude of A yr −1 ≈ 0.7 × 10 −15 -slightly below the PTA limit. The Illustris simulation volume is very large for a hydrodynamic simulation, but it lacks the very-rare, most massive MBH in the universe ( 10 10 M ) which could slightly increase our predicted GWB amplitude-although, likely a correction on the order of ∼ 10% 10 (Sesana, private communication). None the less, our upper limit suggests that the current lack of PTA detections shouldn't be interpreted as a missing signal.
Our upper-limit value of A yr −1 ≈ 0.7 × 10 −15 falls just within the lower end of some recent studies (e.g. Ravi et al. 2014;Roebber et al. 2016), but is generally lower than much of the previous literature (see e.g. Table 1, and Fig. 2 of Shannon et al. 2015). Likely, this is at-least partly because the MBH merger rates derived from Illustris are based directly on simulated galaxy-galaxy merger rates. The bulk of existing calculations have either used inferences from (dark matter only) halo-halo mergers which may have systematic issues (see, e.g., Rodriguez-Gomez et al. 2015), or observations of galaxy merger rates which have uncertain timescales. This upper-limit is based on optimistic, GW-only evolution. In our fiducial model, the signal is lower by ∼ 50% due primarily to the moderately refilled LC, and mildly due to VD attenuation.
Variations in the rate at which the stellar LC is refilled has the strongest effect on the shape and amplitude of the GWB spectrum in our simulations, especially at low frequencies. PTA observations are moving towards these frequencies, as the duration of their timing baselines increase. Unlike at higher frequencies where scattering increases the number of MBHB contributing to the GWB, at 10 −1 yr −1 , for example, effective LC refilling leads to attenuation of the GWB from accelerated binary hardening. Here, our spectra tend to lie at amplitudes between 1.5 -2.5 × 10 −15 , with spectral indices between about −0.4 and −0.6-a significant deviation from the canonical −2/3 power-law. At frequencies lower still ( f 10 −2 yr −1 ), the effective LC scattering produces a strong turnover in the GWB spectrum. A summary of GWB amplitudes and spectral indices is presented in Table B1 for both f = 1 yr −1 and 10 −1 yr −1 , and numerous hardening models.
In our fiducial simulation, we find that the median contribution to the GWB comes from binaries at a redshift of z ≈ 0.3, with total masses M ≈ 10 9 M , and mass ratios µ ≈ 0.3. The co-evolved population of MBH and galaxies in Illustris allows us to also examine typical host-galaxy properties for the first time. Galaxies containing MBHB contributing strongly to the GWB are noticeably larger and more massive galaxies. The median stellar mass of galaxies, weighted by GWB contribution, is about 3 × 10 11 M -more than an order of magnitude larger than the median stellar mass for all galaxies.
Based on the merger trees and binary lifetimes produced from our simulations, we have also shown that the presence of higherorder MBH multiples could be an non-negligible aspect of MBH evolution. The simplest examination suggests that triples could be important in about 30% of MBHB in our simulations. In future work, we will explore these triple systems in more detail, as well as the effects of nonzero eccentricity and post-merger recoils. We also hope to implement more self-consistent LC refilling, and more comprehensive tracking of the changing galactic environment.
In summary, • MBH binary lifetimes tend to be multiple Gyr, even for massive systems. While massive and high mass ratio systems are likely rare at very large separations, observations of dual MBH at r > 1 pc can be used to constrain the merger physics.
• We find that the lack of PTA detections so far is entirely consistent with our MBH population, and does not require environmental effects. At the same time, our most conservative models yield a GWB amplitude of A yr −1 = 0.2 × 10 −15 . While incorporating non-zero eccentricities may further suppress our GWB predictions, our simulations suggest that if PTA limits improve by a factor of 3-4 and no detection is made, our understanding of galaxy and MBH evolution may require revision.
• The median redshift and total mass of MBHB sources contributing to the GWB are z ≈ 0.3 and a few 10 9 M , while the median coalescence time of all systems tends towards z ≈ 0.6. Observations of the redshift distribution and host galaxy properties of dual-MBH can be informative for our understanding of binary evolution.
• Our simulations suggest that up to 30% of binaries could in-volve the presence of a third MBH. The role of MBH triples is currently unclear, but should be explored and included in future simulations.

Enh-Stellar
Enh-Gyr Force-Hard Figure A5. Dependence of GWB strain amplitude and binary coalescing fraction on LC refilling parameter and DF model. The GW strain is measures at very low frequencies, f = 10 −2 yr −1 , and the coalescing fraction is defined using the population of high mass-ratio µ > 0.1 systems. Each symbol represents a different DF model, and each color a different LC refilling parameter. A yr −1 tends to increase monotonically with F coal , until the highest F refill . At F refill ≈ 1.0, LC stellar scattering is effective enough at these low frequencies to significantly attenuate the GWB amplitude.  Table B1. Summary of quantitative results for the gravitational wave background (GWB), and MBH binary lifetimes & coalescing/persisting fractions. Results are shown for the three dynamical friction (DF) models described in the text: 'Enh-Stellar', 'Enh-Gyr' & 'Force-Hard'. For our fiducial DF model ('Enh-Stellar'; the most conservative case), results are shown for both a case with no circumbinary, viscous disks ('VD: None'), and with a disk using our standard parameters ('VD: Fiducial'). Two different loss-cone (LC) scattering states are also compared, in which the LC is always full (F refill = 1.0) and our fiducial case of moderately refilled (F refill = 0.6). Amplitudes and spectral indices are presented at both f = 1 yr −1 and f = 0.1 yr −1 . For lifetime and fractional statistics, binaries are split into four subsets: 'All' binaries included in our analysis, and those with mass ratios µ > 0.2, total masses M > 10 8 M , and both criteria (µ > 0.2, M > 10 8 M ). The lifetimes shown are median values of systems in each the 'Full' subset, and only those which coalesce by redshift zero ('Coal')-the number (and fraction) of such systems are given in the 'Coalescing Fraction' column. Finally, the fraction of systems which remain uncoalesced at separations r > 1 pc and r > 10 2 pc are shown in the 'Persisting Fraction' column. Results for a simulation using models with all fiducial parameters are shown in bold.

APPENDIX C: STELLAR, LOSS-CONE (LC) SCATTERING CALCULATIONS
The rate at which stars can refill the loss cone is governed by the 'relaxation time' (τ rel ). Following Binney & Tremaine (1987), consider a system of N masses m, with number density n, and characteristic velocities v. The relaxation time can be written as, where ln Λ is again the Coulomb Logarithm, and τ cross ≡ r/v is the crossing-time. τ rel represents the characteristic time required to randomize a particle's velocity via scatterings, i.e., Eq. C1 can be used to define the diffusion coefficient D v 2 as, τ rel ≈ v 2 /D v 2 . If t/τ rel 1, then two-body encounters (and relaxation) haven't been important.
Consider a distribution function (or phase-space density) f = f ( x, v), such that the number of stars in a small spatial-volume d 3 x and velocity-space volume d 3 v is given as f ( x, v) d 3 x d 3 v. In a spherical system in which there are the conserved energy E and angular momentum L, the six independent position and velocity variables can be reduced to these four independent, conserved quantities via the Jeans theorem. Furthermore, if the system is perfectly spherically symmetric-which we assume in our analysis, then the three independent angular momentum components can be replaced with the angular momentum magnitude, i.e. f = f (E, L).
If we define a relative potential and relative energy, Ψ ≡ −Φ + Φ 0 , and, E ≡ −E + Φ 0 = Ψ − 1 2 v 2 , then we can calculate the number density as 1 , Inverting this relationship, the distribution function can be calculated from an isotropic density profile using, We have found the latter form of (C3) to be much simpler and more reliable to implement.
We follow the discussion and prescription for loss-cone scattering given by Magorrian & Tremaine (1999), corresponding to a single central object in a spherical (isotropic) background of stars. We adapt this prescription simply by modifying the radius of interaction to be appropriate for scattering with a binary instead of being tidally disrupted by a single MBH. A more extensive discussion of loss-cone dynamics-explicitly considering MBH binary systems and asphericity-can be found in Merritt (2013).
Stars with a pericenter distance smaller than some critical radius R crit will interact with the binary. For a fixed energy (E) orbit, there is then a critical angular momentum, J lc (E) = R crit (2 [E − Ψ(R crit )]) 1/2 ≈ R crit (2 [−Ψ(R crit )]) 1/2 , which defines the loss-cone (Frank & Rees 1976;Lightman & Shapiro 1977). In general, the number of stars with energy and angular momentum in the range dE and dJ 2 around E and J 2 can be calculated as, N(E, J 2 ) dE dJ 2 = 4π 2 f (E, J 2 ) · P(E, J 2 ) dE dJ 2 , where P(E, J 2 ) is the stellar orbital period. For an isotropic stellar distribution f (E, J 2 ) = f (E), and P(E, J 2 ) ≈ P(E). The total number of stars can be calculated as, For the number of stars in the loss-cone N lc (E), this uses the LC an-gular momentum, J 2 i (E) = J 2 lc (E); and for all stars N(E), the circular (and thus maximum) angular momentum, J 2 i (E) = J 2 c (E). When we initially calculate the distribution function, we use the stellar density profile from Illustris galaxies which have recently hosted a MBH 'merger' event (see: §2.2). We assume that the resulting distribution function f (E) is (so-far) unperturbed by the MBH binary, i.e. it does not take into account stars already lost (scattered). The resulting N lc (E) from Eq. C5 then corresponds to the number of stars in the 'full' loss-cone specifically.
Stars in the LC are consumed on their orbital timescale τ orb = P(E). The rate of flux of stars to within R crit is then, coming almost entirely from within the central objects sphere of influence R infl , defined as M(r < R infl ) ≈ M • . Refilling of the loss-cone 1 Φ 0 is arbitrary, but Φ 0 ≡ E(r → ∞) may be convenient.
occurs on the characteristic relaxation timescale τ rel . From Eq. C1, it is clear that τ orb /τ rel ≈ τ cross /τ rel 1, i.e. the loss-cone is drained significantly faster than it is refilled-and the loss-cone will, in general, be far from 'full'.
To calculate the steady-state flux of the loss-cone, the Fokker-Planck equation must be solved with a fixed (unperturbed) background stellar distribution at the outer edge of the LC and no stars surviving within the scattering region at the inner-edge. A full derivation can be found in Magorrian & Tremaine (1999), which yields a equilibrium flux of stars, where the angular momentum diffusion parameter µ ≡ 2r 2 D v 2 /J 2 c , and, describes the effective refilling radius depending on which refilling regime ('pin-hole' or 'diffusive', see Fig. 1 of Lightman & Shapiro 1977) is relevant, for a refilling parameter q(E) ≡ P(E) µ(E)/R crit (E). Equations C6 & C7 give the full and steady-state LC fluxes, which are interpolated between using a logarithmic, 'refilling fraction' (Eq. 11) which then determines the hardening rate of each binary in our simulations.