BILBY in space: Bayesian inference for transient gravitational-wave signals observed with LISA

The Laser Interferometer Space Antenna (LISA) is scheduled to launch in the mid 2030s, and is expected to observe gravitational-wave candidates from massive black-hole binary mergers, extreme mass-ratio inspirals, and more. Accurately inferring the source properties from the observed gravitational-wave signals is crucial to maximise the scientific return of the LISA mission. BILBY, the user-friendly Bayesian inference library, is regularly used for performing gravitational-wave inference on data from existing ground-based gravitational-wave detectors. Given that Bayesian inference with LISA involves additional subtleties and complexities beyond its ground-based counterpart, in this work we modify BILBY to perform parameter estimation with LISA. We show that full nested sampling can be performed to accurately infer the properties of LISA sources from transient gravitational-wave signals in a) zero-noise and b) idealized instrumental noise. By focusing on massive black-hole binary mergers, we demonstrate that higher order multipole waveform models can be used to analyse a year's worth of simulated LISA data, and discuss the computational cost and performance of full nested sampling compared with techniques for optimising likelihood calculations, such as the heterodyned likelihood.


INTRODUCTION
Gravitational-wave (GW) signals are regularly observed with existing ground-based GW detectors (Aasi et al. 2015; Acernese et al. 2014; Akutsu et al. 2021). By focusing on the Hertz to kiloHertz range of the GW spectrum, ground-based GW detectors have observed ∼ 100 signals from neutron star and stellar-mass black hole coalescences (Abbott et al. 2021a; Nitz et al. 2023; Venumadhav et al. 2020; Zackay et al. 2019b,a; Mehta et al. 2023), with many more expected during the ongoing fourth GW observing run. To probe lower frequencies, GW detectors must transition to space, as unavoidable terrestrial noise means that ground-based GW detectors will never be sensitive to the milliHertz frequency range (Danzmann 1996).
The Laser Interferometer Space Antenna (LISA) is designed to explore the milliHertz region of the GW spectrum, and is scheduled to launch in the mid 2030s (Amaro-Seoane et al. 2017). This frequency range is rich in sources, with massive black hole binaries (MBHBs) (Klein et al. 2016), extreme mass ratio inspirals (Glampedakis et al. 2002; Babak et al. 2007, 2017) and galactic white dwarf binaries (Nelemans et al. 2001) all expected. Inferring the properties of these systems from the observed gravitational-wave signals will, for example, provide stringent tests of general relativity in the strong-field regime, help constrain the Hubble constant, and probe the effect of dark matter on GW propagation (Amaro-Seoane et al. 2013). Unfortunately, inferring the properties of LISA sources involves additional subtleties and complexities beyond its ground-based counterpart. For instance, unlike existing ground-based GW detectors, which can be assumed static throughout the observed GW signal, the LISA observatory is continuously orbiting the Sun. This affects, for example, waveform model generation, as well as the commonly used assumption that the noise is stationary and uncorrelated. Accurately inferring the properties of LISA sources is crucial to maximize the scientific output of the LISA mission.

⋆ charlie.hoy@port.ac.uk
bilby, the user-friendly Bayesian inference library, is designed to infer the source properties from the observed GW signal (Ashton et al. 2019; Romero-Shaw et al. 2020). bilby has been extensively used to analyse data from ground-based GW detectors, including individual GW observations (e.g. Abbott et al. 2021a), population properties using hierarchical Bayesian inference (e.g. Abbott et al. 2023; Saleem et al. 2022; Talbot et al. 2019), tests of general relativity (e.g. Abbott et al. 2021b) and more (e.g. You et al. 2021; Powell & Müller 2022; Hübner et al. 2020; Dietrich et al. 2020; Hoy 2022). Given that bilby is regularly used, with a proven track record for producing reliable results, it is natural to extend bilby to perform Bayesian inference on LISA sources. It also has the advantage of trivially interfacing with parallel-bilby (Smith et al. 2020), which reduces wall-time through massively parallel Bayesian inference, as well as with numerous methods for reducing overall cost (e.g. Cornish 2010; Zackay et al. 2018; Cornish 2021; Morisaki et al. 2023; Morisaki 2021; Williams et al. 2021; Krishna et al. 2023).
In this paper, we outline our developments to bilby to perform Bayesian inference on (simulated) LISA data. We demonstrate that full Bayesian inference can be performed on the three time-delay-interferometry (TDI) LISA observables (Tinto et al. 2002) for a range of waveform models. By focusing on MBHB mergers, we show that bilby accurately infers the source properties in zero-noise, as well as in idealized instrumental noise. We compare the standard likelihood commonly used in GW astronomy against optimised likelihoods designed to reduce the computational cost (Cornish 2010; Zackay et al. 2018; Cornish 2021; Krishna et al. 2023). We show that although the latter reduce the CPU time by a factor of ∼ 10^2, a small bias is observed. We conclude by showing that bilby can perform Bayesian inference on LISA data with higher order multipole waveform models, and discuss how the inclusion of higher order multipoles can help break degeneracies in the parameter space. Although the work presented here focuses on MBHBs, our implementation is generic and can in principle perform Bayesian inference for any transient GW signal in LISA data; the only additional requirement is a GW model for the more exotic systems.
We note that there already exist methods to perform Bayesian inference on LISA sources: balrog, a software package under development for LISA data analysis (see e.g. Klein et al. 2022; Pratten et al. 2023), pycbcinference (Weaving et al. 2023), numerous algorithms that employ simplified Fisher-matrix approximations (see e.g. Vecchio 2004; Berti et al. 2005; Lang & Hughes 2006; Arun et al. 2009) and more (see e.g. Cornish 2022; Katz 2022a; Littenberg & Cornish 2023). However, our implementation is generic, includes additional waveform models defined in LALSimulation (LIGO Scientific Collaboration 2018), has the ability to interface with a diverse range of samplers, and builds upon the modularity and ease of use of bilby, meaning that it is accessible to beginners.
The paper is organised as follows: in Section 2 we review Bayesian inference, describe the likelihoods typically used in gravitational-wave astronomy, and introduce the bilby package. In Section 2.1 we outline our modifications to bilby to enable Bayesian inference on LISA data. In Section 3 we validate our changes and confirm the effectiveness of bilby for analysing LISA data. We infer the properties of MBHBs in zero-noise in Section 3.1, and in idealised instrumental noise in Section 3.2. Results of the analyses presented in this paper are provided as accompaniments to this paper.

BAYESIAN INFERENCE
Bayesian inference is the process of estimating the parameters, θ, of a model, m, given the observed data d. These properties are represented by the model-dependent posterior probability density function (PDF), which is calculated through Bayes' theorem,

p(θ|d, m) = p(θ|m) p(d|θ, m) / Z. (1)

Here, p(θ|m) is the probability that the model has parameters θ given the model m, otherwise known as the prior, p(d|θ, m) is the probability of observing the data d given the model parameters θ and model m, otherwise known as the likelihood, and Z is the probability of observing the data given the model m, otherwise known as the evidence.
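As a toy illustration of Eq. (1), entirely separate from the analyses in this paper, the posterior for a single parameter can be evaluated on a grid, with the evidence obtained by numerically integrating the prior-weighted likelihood:

```python
import numpy as np

# Toy problem: infer the mean of Gaussian data with known unit variance,
# using a grid rather than a stochastic sampler (illustrative only)
rng = np.random.default_rng(42)
data = rng.normal(loc=2.0, scale=1.0, size=100)

theta = np.linspace(-5.0, 10.0, 2001)        # grid over the parameter
dtheta = theta[1] - theta[0]
prior = np.full_like(theta, 1.0 / 15.0)      # flat prior over [-5, 10]

# Gaussian log-likelihood of the data for each grid point
log_like = np.array([-0.5 * np.sum((data - t) ** 2) for t in theta])
like = np.exp(log_like - log_like.max())     # rescaled for numerical safety

# Evidence (up to the rescaling constant) and normalised posterior
Z = np.sum(prior * like) * dtheta
posterior = prior * like / Z
```

The posterior integrates to one and peaks at the sample mean; for realistic GW models the parameter space is far too large for a grid, which is why stochastic samplers are used instead.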
In GW astronomy, the noise in the GW detector is assumed stationary and Gaussian, and the Whittle likelihood (Whittle 1951) is often used (Romano & Cornish 2017; Thrane & Talbot 2019). Here a parameterised GW model, m, is evaluated for a set of parameters, θ, and compared with the GW strain data, d. For a network of GW detectors, a Whittle likelihood is assumed for each detector, and the full likelihood is obtained by taking the product of the single-detector likelihoods (see e.g. Veitch et al. 2015; Ashton et al. 2019, for details). This inherently assumes that the noise is uncorrelated between detectors. Of course, the GW observed in different detectors will have small differences owing to their relative positions on Earth or in space. An additional step is therefore required to calculate the expected GW as seen in each of the detectors in the network.
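As a minimal sketch (assuming frequency-domain data, template and one-sided PSD on a common frequency grid with spacing `df`; this is not bilby's actual implementation), the network Whittle log-likelihood, up to a normalisation constant, can be written as:

```python
import numpy as np

def whittle_log_likelihood(data, template, psd, df):
    """Whittle log-likelihood, up to a constant, for frequency-domain
    strain `data`, model `template` and one-sided PSD `psd`."""
    residual = data - template
    return -2.0 * df * np.sum(np.abs(residual) ** 2 / psd)

def network_log_likelihood(data_dict, template_dict, psd_dict, df):
    # Uncorrelated noise in each detector: the network likelihood is the
    # product of single-detector likelihoods, i.e. the sum of the logs
    return sum(
        whittle_log_likelihood(data_dict[ifo], template_dict[ifo],
                               psd_dict[ifo], df)
        for ifo in data_dict
    )
```

When the template matches the data exactly the residual vanishes and the log-likelihood reaches its maximum of zero under this normalisation.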
As the Whittle likelihood is typically computationally expensive to evaluate, the heterodyned likelihood was introduced (Cornish 2010; Zackay et al. 2018; Cornish 2021). By assuming that the model evaluated at two high likelihood points is similar, and hence that the ratio is a smoothly varying function, summary data for a fiducial point can be pre-computed, and likelihoods for surrounding points can be evaluated in a fraction of the time. It has been shown that the heterodyned likelihood can reduce the computational cost of GW Bayesian inference by a factor of 10^4 compared to evaluating the Whittle likelihood (Zackay et al. 2018). A downside of the heterodyned likelihood is that a fiducial point of high likelihood must be known a priori.

Numerous packages have been developed to stochastically sample the unknown posterior PDF for GW signals (e.g. Veitch et al. 2015; Ashton et al. 2019; Biwer et al. 2019). Most notably, LALInference is a Bayesian inference library that employs custom-built nested sampling and MCMC algorithms, and was the main workhorse of the LIGO-Virgo-KAGRA (LVK) collaboration for many years (Veitch et al. 2015). bilby was later introduced as an alternative tool, which prioritised modularity and ease of accessibility (Ashton et al. 2019). Unlike LALInference, bilby interfaces with existing 'off-the-shelf' samplers (e.g. Foreman-Mackey et al. 2013; Handley et al. 2015; Speagle 2020; Williams et al. 2021; Ashton & Talbot 2021). For the case of the nested sampler dynesty (Speagle 2020), bilby incorporates additional modifications to enhance its performance for GW problems. Since the third LVK GW observing run, bilby has primarily been used to analyse data from ground-based GW detectors (Abbott et al. 2021b, 2023).
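To make the heterodyning idea concrete, the following is a simplified single-channel sketch of the relative binning scheme of Zackay et al. (2018), assuming the ratio r(f) = h(f)/h0(f) to a fiducial waveform h0 is approximately linear within each frequency bin. The bin layout and all names are illustrative, not bilby's implementation, and the usual quadratic-in-bin correction is dropped:

```python
import numpy as np

def summary_data(d, h0, psd, freqs, edges):
    """Pre-compute per-bin summary data against the fiducial waveform h0."""
    df = freqs[1] - freqs[0]
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (freqs >= lo) & (freqs < hi)
        f, fmid = freqs[sel], 0.5 * (lo + hi)
        w = 4.0 * df / psd[sel]
        out.append((
            np.sum(w * d[sel] * np.conj(h0[sel])),               # A0
            np.sum(w * d[sel] * np.conj(h0[sel]) * (f - fmid)),  # A1
            np.sum(w * np.abs(h0[sel]) ** 2),                    # B0
            np.sum(w * np.abs(h0[sel]) ** 2 * (f - fmid)),       # B1
        ))
    return [np.array(x) for x in zip(*out)]

def heterodyned_log_likelihood(h_edges, h0_edges, summary, edges):
    """log L (up to a constant) using a linear approximation to the
    smooth ratio r(f) = h(f) / h0(f) in each bin."""
    A0, A1, B0, B1 = summary
    r = h_edges / h0_edges
    r0 = 0.5 * (r[1:] + r[:-1])                       # ratio at bin centre
    r1 = (r[1:] - r[:-1]) / (edges[1:] - edges[:-1])  # slope in each bin
    d_h = np.sum(A0 * np.conj(r0) + A1 * np.conj(r1)).real
    h_h = np.sum(B0 * np.abs(r0) ** 2
                 + 2.0 * B1 * (r0 * np.conj(r1)).real)
    return d_h - 0.5 * h_h
```

Once the summary data are stored, each new likelihood evaluation only requires the waveform at the bin edges rather than on the full frequency grid, which is the origin of the quoted speed-up.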

bilby and modifications for LISA
Although bilby has a proven track record of analysing data from ground-based GW detectors, several additional features are required in order to infer the properties of LISA sources. These include implementing the LISA observatory into the bilby.gw.detector module, described in Section 2.1.1, and incorporating additional models for calculating GWs as seen by LISA, see Section 2.1.2.

The LISA observatory
LISA will consist of three spacecraft, each containing a free-falling test mass. Lasers will be relayed between the spacecraft, and the distance between the free-falling test masses will be monitored over time. A major challenge is that the laser frequency noise will be several orders of magnitude larger than any change in the arm length due to expected GW signals (Amaro-Seoane et al. 2017). For this reason, time-delay interferometry (TDI) was introduced as an additional post-processing step. TDI is a technique that combines data from the three spacecraft into virtual interferometric observables, in which the laser frequency noise is suppressed by several orders of magnitude, improving the sensitivity of LISA to GW signals. Numerous TDI observables have been proposed with different underlying assumptions. The second-generation TDI observables, X, Y and Z, allow for linearly changing arm lengths, making them the most applicable for realistic LISA scenarios (Tinto & Armstrong 1999; Armstrong et al. 1999; Estabrook et al. 2000; Vallisneri 2005; Tinto & Dhurandhar 2021; Tinto et al. 2023). However, the second-generation TDI observables are correlated in their noise properties. For this reason, a linear transformation is applied to the X, Y and Z observables, forming a set of approximately uncorrelated TDI observables: A, E and T (Prince et al. 2002).
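In the conventions of Prince et al. (2002), this linear transformation can be sketched as below (sign and normalisation conventions vary between references, so this is illustrative). The toy demonstration gives X, Y and Z a shared noise component, mimicking the common laser links, and shows that it is strongly suppressed in the cross-correlation of A and E:

```python
import numpy as np

def xyz_to_aet(X, Y, Z):
    """Map the correlated X, Y, Z TDI observables to the approximately
    noise-orthogonal A, E, T combinations (Prince et al. 2002)."""
    A = (Z - X) / np.sqrt(2.0)
    E = (X - 2.0 * Y + Z) / np.sqrt(6.0)
    T = (X + Y + Z) / np.sqrt(3.0)
    return A, E, T

# Toy demonstration: a noise component common to all three observables
rng = np.random.default_rng(0)
common = rng.normal(size=200_000)
X, Y, Z = (common + rng.normal(size=200_000) for _ in range(3))
A, E, T = xyz_to_aet(X, Y, Z)

corr_xy = np.corrcoef(X, Y)[0, 1]   # strongly correlated
corr_ae = np.corrcoef(A, E)[0, 1]   # consistent with zero
```

The common component cancels exactly in A and E, leaving it confined to the T channel; in reality the decorrelation of A, E and T is only approximate, as noted above.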
We provide the LISA class to handle the LISA observatory. We treat LISA as a network of distinct virtual interferometers, each corresponding to a separate TDI observable. Although our implementation is generic to any TDI observables, allowing for possible improvements beyond second-generation TDI, the likelihood for a network of virtual interferometers assumes the noise is uncorrelated in each detector. This paper therefore uses the A, E and T channels.

Alternative GW models
In order to evaluate the GW likelihood, a parameterised model must be evaluated for a given set of parameters, and compared with the GW strain data. Since it is sensible to perform Bayesian inference on the TDI observables, see Section 2.1.1, our parameterised GW model must calculate the expected GW in each of these channels.
An additional complication is that the GWs from expected LISA sources will likely remain observable for weeks, if not months. During this time, the LISA observatory will continuously change its orientation with respect to the source, as a result of its orbit around the Sun. This implies that when calculating the expected GW as seen by LISA, we must apply a time- and frequency-dependent transformation, T(f, t), to the source-frame GW (Marsat et al. 2021). This is in contrast to ground-based GW detectors, which are assumed to be static for the duration of the GW signal.
The lisa_binary_black_hole function returns a dictionary of expected GWs observed in each TDI observable for a provided set of parameters θ. Since we argued in Section 2.1.1 that the A, E and T channels are the most logical for calculating the likelihood, we currently only consider these TDI observables. Of course, this can be extended in future work. Our implementation calculates GWs for parameters provided in the solar system barycenter (SSB) or LISA frame. When the LISA frame is used, transformations are applied to map the parameters into the SSB frame. This is required in order to calculate T(f, t), which depends on the source location and polarization defined in the SSB frame (Katz 2022a).
We interface with the BBHx (Katz et al. 2020; Katz 2021, 2022b) and LALSimulation (LIGO Scientific Collaboration 2018) libraries. BBHx is an open-source software package that generates the GW polarizations, calculates the time- and frequency-dependent transformation, T(f, t), and returns the expected GWs in the A, E and T channels. Currently BBHx implements the IMRPhenomD (Khan et al. 2016; Husa et al. 2016) and IMRPhenomHM (London et al. 2018) waveform models; both models assume black hole spins aligned with the orbital angular momentum, with the former considering only the leading order quadrupole contribution, while the latter allows for additional higher-order multipole content. The user can call the BBHx models through bilby by passing a waveform_approximant of BBHx_IMRPhenomD or BBHx_IMRPhenomHM respectively. In order to reduce computational cost, interpolation can be enabled in the BBHx waveform models by specifying the waveform argument direct=False. Here, a reduced set of frequencies is used to evaluate the waveform, controlled with the waveform argument length, and interpolation is used to provide the waveform at the desired frequency points (Katz et al. 2020; Katz 2021, 2022b); by default length = 1024. In this work, we do not use interpolation when using the BBHx waveform models, in order to reduce the possibility of interpolation errors. If the user would like to use additional models beyond those included in BBHx, we use the LALSimulation library to generate GW polarizations, and use BBHx functions to project the GW polarizations into the A, E and T channels. This allows us to interface with e.g. the cutting-edge binary black hole model IMRPhenomXPHM (Pratten et al. 2021).
To validate our LALSimulation implementation, we randomly drew 10^5 MBHBs with chirp masses 1 < M [10^6 M⊙] < 5.0, mass ratios 0.25 < q < 1.0, spins 0 < χ < 0.99 and luminosity distances 1 < dL [Gpc] < 100 (all other parameters randomly chosen from an unconstrained boundary), and calculated the expected GW in the A TDI channel when using the BBHx IMRPhenomD implementation, and the LALSimulation IMRPhenomD implementation. We quantify the difference between GWs by the mismatch, which is defined as,

M(a, b) = 1 - max_{ϕ,t} (a|b) / √((a|a) (b|b)), (2)

where (a|b) = 4 Re ∫ ã(f) b̃*(f) / S(f) df, a and b are the GWs calculated using BBHx and LALSimulation respectively, S is the one-sided power spectral density (PSD), and the maximisation is over the phase ϕ and time t. The mismatch ranges between 0 and 1, where a mismatch of 0 implies the two GWs are identical. As shown in Figure 1, we find a median mismatch of 4.0 × 10^-6, with a large tail extending to smaller mismatches. Interestingly, we find that in general larger mismatches are correlated with smaller chirp masses, where the signal durations are longer. Ideally all mismatches would be close to numerical precision, but we point out that the average mismatch remains two orders of magnitude lower than the mismatch between IMRPhenomD and numerical solutions to general relativity (Khan et al. 2016). It is well known that the mismatch can be mapped to an approximate signal-to-noise ratio (SNR) at which two signals are indistinguishable at 90% confidence (Baird et al. 2013). Assuming eleven degrees of freedom (for example the two masses and spins of both black holes, sky location, luminosity distance, inclination, phase, polarization and merger time), we find that on average, the two approaches are indistinguishable for SNRs less than ∼ 1400, and 70% of signals are indistinguishable for SNRs less than ∼ 1000. Therefore, although there may be convention differences between the two approaches, which become more evident at larger durations, these are unlikely to cause significant differences when performing Bayesian inference on most MBHB systems. We are therefore confident that our LALSimulation implementation is reasonable.
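This mapping can be sketched as follows, using the criterion of Baird et al. (2013) that two waveforms separated by mismatch M are indistinguishable at confidence p when 2 ρ² M < χ²_k(p), with k the number of degrees of freedom and ρ the SNR:

```python
from scipy.stats import chi2

def indistinguishable_snr(mismatch, n_params=11, confidence=0.90):
    """Largest SNR at which two waveforms separated by `mismatch` remain
    indistinguishable at `confidence`, assuming `n_params` degrees of
    freedom (Baird et al. 2013)."""
    return (chi2.ppf(confidence, df=n_params) / (2.0 * mismatch)) ** 0.5

# Median BBHx vs LALSimulation IMRPhenomD mismatch found in this work;
# the resulting SNR limit is of order 10^3
snr_limit = indistinguishable_snr(4.0e-6)
```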

RESULTS
To test our developments to bilby to perform Bayesian inference on LISA data, we analyse MBHB signals. We choose signals that are similar to those expected when LISA is online. For cases where the true values are known, systematic biases can be studied.
We perform Bayesian inference with the dynesty nested sampler (Speagle 2020), and compare the computational cost and performance of evaluating the full Whittle likelihood with the heterodyned likelihood. Since evaluating the full Whittle likelihood is computationally expensive, we perform Bayesian inference with parallel-bilby. We use standard bilby when evaluating the heterodyned likelihood, and set the true values as the fiducial point. Of course, in reality the true values are unknown, meaning that this is an idealised case. For all analyses we use 1000 live points, the bilby-implemented rwalk sampling algorithm with an average of 60 accepted steps per MCMC, and evaluate the likelihood for frequencies 10^-4 Hz ⩽ f ⩽ 10^-1 Hz. Sampling is performed in the LISA frame, as it is the most natural for LISA data analysis tasks, and we coherently analyse the A, E and T channels. All analyses use uninformative and wide priors, and we sample in the standard parameters: chirp mass, mass ratio, spin magnitudes, inclination angle, phase, ecliptic latitude, ecliptic longitude, polarization and merger time. We reduce the computational cost by analytically marginalizing over the luminosity distance of the source, and reconstructing the posterior in post-processing; see Thrane & Talbot (2019) for details.
To compare posterior distributions obtained with the heterodyned and full Whittle likelihoods, we use the Jensen-Shannon divergence (JSD) (Lin 1991). The JSD ranges between 0 bits and 1 bit, where a JSD of 0 bits (1 bit) implies statistically identical (distinct) distributions. The JSD has been regularly used in GW astronomy for comparing posterior distributions (see e.g. Abbott et al. 2019, 2021c). A general rule of thumb is that a JSD < 0.05 bits implies that the distributions are in good agreement (Abbott et al. 2019).
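A minimal sketch of estimating the JSD in bits from two sets of one-dimensional posterior samples (via binned histograms; not the exact estimator used in the works cited above):

```python
import numpy as np
from scipy.spatial import distance

def jsd_bits(samples_p, samples_q, bins=100):
    """Jensen-Shannon divergence, in bits, between two sets of posterior
    samples, estimated from histograms over a common range."""
    lo = min(samples_p.min(), samples_q.min())
    hi = max(samples_p.max(), samples_q.max())
    p, _ = np.histogram(samples_p, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(samples_q, bins=bins, range=(lo, hi), density=True)
    # scipy returns the JS *distance*, the square root of the divergence
    return distance.jensenshannon(p, q, base=2) ** 2

rng = np.random.default_rng(0)
# identical distributions give ~0 bits; well-separated ones approach 1 bit
same = jsd_bits(rng.normal(size=100_000), rng.normal(size=100_000))
shifted = jsd_bits(rng.normal(size=100_000),
                   rng.normal(3.0, 1.0, size=100_000))
```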
We analyse a zero-noise injection in Section 3.1, and an injection in mock LISA data with idealised Gaussian instrumental noise in Section 3.2. We obtained mock LISA data through the Sangria dataset (Le Jeune & Babak 2022) released as part of the LISA data challenge (LDC). The LDC contains multiple challenges that will eventually cover all of the complexities associated with LISA data analysis. For this paper, we only consider the Sangria dataset as a proof of principle, and leave further challenges to future work. The PSDs used in this work were generated from the Sangria dataset using gwpy (Macleod et al. 2021) and the Welch method. Unlike Weaving et al. (2023), we assume Galactic binaries form part of the noise, and therefore we include them in our PSD generation. By constructing PSDs in this way we are inherently removing the time-dependence, both from the foreground Galactic binaries and from LISA's orbit around the Sun. This is therefore an idealised case, but we note that it will unlikely cause a significant difference for MBHBs, where the SNRs are expected to be large, and rapidly accumulating within the final day before merger (Marsat et al. 2021). This assumption will need to be relaxed for signals of longer duration. We note that the foreground Galactic binary problem may be navigated by using a Global Fit approach (see e.g. Littenberg & Cornish 2023), and the PSD may be able to include a time-frequency dependence by using the methods outlined in e.g. Cornish (2020).
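The Welch method used for the PSDs above averages periodograms of overlapping data segments to reduce the variance of the estimate; in this work it was applied through gwpy, but the same approach can be sketched with scipy alone (white-noise stand-in data, illustrative cadence only):

```python
import numpy as np
from scipy.signal import welch

# Simulate a stretch of data sampled every 5 seconds, as in this work
fs = 1.0 / 5.0
rng = np.random.default_rng(1)
data = rng.normal(scale=2.0, size=2**20)    # white noise stand-in

# Average periodograms of overlapping segments to reduce variance
freqs, psd = welch(data, fs=fs, nperseg=2**14)

# For white noise of standard deviation sigma, the one-sided PSD is flat
# at 2 * sigma**2 / fs
expected = 2.0 * 2.0**2 / fs
```

Longer segments (`nperseg`) improve the frequency resolution at the cost of fewer averages, and hence a noisier estimate; for real LISA data the Galactic foreground would appear as excess power in the relevant bands.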

Zero-noise Injections
We first simulate a GW produced by a MBHB with total mass M = m1 + m2 = 3.1 × 10^6 M⊙, mass ratio q = m2/m1 = 0.74, primary spin χ1 = 0.51 and secondary spin χ2 = 0.14. We assume the source has spins aligned with the orbital angular momentum and is located at an ecliptic latitude β = 2.8 and ecliptic longitude λ = 2.3, viewed at an inclination angle of ι = 1.3 rad and luminosity distance dL = 17 Gpc. The phase and polarization were chosen to be ϕ = 4.0 and ψ = 1.1 respectively, and the merger time was set to be tc = 6.7 × 10^6 s. The injection is defined in the LISA frame, and has SNR ∼ 550; the parameters were chosen to be similar to the zero-noise injection performed in Weaving et al. (2023). We generate the simulated signal with BBHx_IMRPhenomD, inject it into a month's worth of zero-noise data sampled at 5 second intervals, and perform Bayesian inference with the BBHx_IMRPhenomD model for consistency.
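For reference, the injected total mass and mass ratio correspond to the following component and chirp masses, the chirp mass being one of the parameters we sample in:

```python
# Convert the injected total mass and mass ratio to component masses and
# the chirp mass (all in units of 10^6 solar masses)
M_total = 3.1
q = 0.74                 # m2 / m1 <= 1

m1 = M_total / (1.0 + q)
m2 = q * m1
chirp_mass = (m1 * m2) ** 0.6 / M_total ** 0.2   # ~1.33e6 solar masses
```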
In Figure 2, we compare the marginalized posterior distributions for the intrinsic parameters (total mass, mass ratio, primary spin and secondary spin) with the true values. In general we see that the full Whittle and heterodyned likelihoods recover the true values within the 90% confidence interval. Although all Jensen-Shannon divergences are less than 0.05 bits, we see that the Whittle likelihood results more accurately recover the true values, with a small shift relative to the heterodyned likelihood in all cases.
When considering extrinsic parameters (merger time, sky location, luminosity distance, inclination, phase, polarization), we expect to observe a bias in the sky location owing to well known degeneracies; for aligned-spin leading order quadrupole models, we expect to infer one of eight possibilities (Marsat et al. 2021). Both likelihoods find support at the true ecliptic latitude, but only the heterodyned likelihood identifies the true ecliptic longitude mode, while the full Whittle likelihood favours the opposite sky mode. We see that both likelihoods recover a quad-modal polarization, with the full Whittle posterior shifted by π/2 compared to the heterodyned posterior; this is consistent with favouring the opposite sky mode. Both likelihoods accurately recover the merger time and phase. Interestingly, only the Whittle likelihood shows a bias in the inclination angle and luminosity distance. Since we are using a model with spins aligned with the orbital angular momentum, and ignoring the effects of higher-order multipole content, we are unable to break the well known distance-inclination degeneracy (see e.g. Marsat et al. 2021). It is therefore expected to see a bias in both the distance and inclination angle, especially at such large SNRs. Given that the heterodyned likelihood uses the true values as the fiducial point, it is not surprising that it more accurately recovers the distance and inclination angle of the source.
We found that, although both analyses obtained ∼ 5,500 posterior samples and evaluated the likelihood ∼ 200 million times, the analysis that took advantage of the heterodyned likelihood was ∼ 10^2 times faster and required ∼ 10 times less memory than the full Whittle likelihood analysis; the full Whittle likelihood analysis completed in O(10^4) CPU hours and required ∼ 200 GB of memory. We note, however, that the heterodyned likelihood requires prior knowledge of a high likelihood point, whilst the Whittle likelihood is completely general. To navigate this, previous works either used the true injected values, or performed template-based matched-filter searches of the data prior to performing Bayesian inference and used the best fit template as the high likelihood point. We note that in some cases, the latter can lead to relatively poor fitting templates (Weaving et al. 2023). Standard implementations of the heterodyned likelihood may also not be appropriate for models that include the effects of higher-order multipoles (Leslie et al. 2021). The full Whittle likelihood may therefore be more appropriate for analyses where a high likelihood fiducial point cannot be confidently found, or when models that include higher order effects are required. We leave an investigation that studies the impact of fiducial points, as well as potential algorithms for using the heterodyned posteriors as an initial guess for massively restricting the prior volume for full Whittle analyses, to future work.

Figure 3. Top: The reconstructed GW signal, obtained by coherently analysing the A, E and T TDI channels using the aligned-spin limit of IMRPhenomXPHM, and the full Whittle likelihood. We shift the reconstructed GW signal and data by the inferred merger time as defined in the LISA frame, tc. The reconstructed GW signal is plotted as a band representing the 90% credible region. Bottom: Comparison between simulated Gaussian noise which has been coloured by the power spectral density (black), and the residual after subtracting the reconstructed GW signal from the time-domain Sangria blind dataset (grey). For simplicity, we only show the A TDI channel in both panels.

Sangria Blind Dataset
We next analyse the Sangria Blind Dataset (Le Jeune & Babak 2022) with IMRPhenomXPHM (Pratten et al. 2021). IMRPhenomXPHM is the cutting-edge binary black hole model, which includes spin-induced orbital precession (Apostolatos et al. 1994) and higher order multipoles. In this analysis, we restrict black hole spins to be aligned with the orbital angular momentum, meaning that we ignore the effects of spin-induced orbital precession. Of course, we could have alternatively used IMRPhenomXHM (García-Quirós et al. 2020), which also includes higher order multipole contributions and similarly neglects the effects of spin-induced orbital precession. However, we chose to use the more general IMRPhenomXPHM model, as it is more likely to be used in future analyses, for instance when investigating the impact of including misaligned black hole spins for breaking possible degeneracies. This dataset contains idealized Gaussian instrumental noise and simulated waveforms from Galactic white dwarf binaries and MBHBs with parameters derived from astrophysical models. The level of instrumental noise, the number of sources, and their parameters are not disclosed. Since the heterodyned likelihood requires a fiducial point, and the true injected parameters are unknown, we only use the full Whittle likelihood. We analyse a year's worth of data, sampled at 5 second intervals, and generate PSDs using all available data. The PSDs differ from those used in Section 3.1, but we similarly assume that Galactic binaries form part of the noise, and therefore we include them in our PSD generation.
As with the current bilby implementations, we require an estimate for the merger time of the binary in order to define a suitable prior distribution for the merger time. This is typically provided by matched-filter search pipelines. However, given that signal 0 can clearly be seen in the Sangria blind dataset, see Figure 3, it is trivial to estimate a suitable prior distribution that encompasses the merger. In future analyses of blind data, we would base our prior distributions on the best matching template from a LISA search pipeline, see e.g. Weaving et al. (2023).
We see remarkable agreement between the reconstructed GW obtained with IMRPhenomXPHM and the data, see Figure 3. When comparing the marginalized posterior distributions with those obtained by Weaving et al. (2023) with BBHx_IMRPhenomD (see Figs. 9 and 10 in Weaving et al. (2023)), we see that our analysis a) more tightly constrains the intrinsic source properties, and b) obtains smoother (more Gaussian-like) posterior distributions. This is expected, as significant SNR in higher order multipoles, or indeed the measured absence of such SNR, can help break degeneracies (see e.g. Marsat et al. 2021; Pratten et al. 2023; Gong et al. 2023). We can directly calculate the SNRs in higher order multipoles given the inferred posterior with simple-pe (Fairhurst et al. 2023). For this signal, we infer the SNRs in the (ℓ, m) = (3, 3) and (4, 4) multipoles to be < 1 (∼ 0.6 and ∼ 0.3 respectively). The near-zero SNRs are consistent with the expectations from Gaussian noise (Mills & Fairhurst 2021), and therefore we find no significant evidence for higher order multipole content in this signal. This is consistent with the fact that the GW signals in the Sangria Blind Dataset were constructed with IMRPhenomD (Le Jeune & Babak 2022).
When subtracting the reconstructed GW from the data, the residual is comparable with simulated Gaussian noise coloured by the PSD, see Figure 3. This implies that the MBHB signal has been appropriately subtracted from the data, meaning that the residual can be used for further Bayesian inference analyses as part of a Global Fit style approach (see e.g. Littenberg & Cornish 2023). We note that we do not expect perfect agreement between the residual and the simulated Gaussian noise, since the latter is dependent on the specific noise realisation chosen.
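A simple quantitative version of this residual check, sketched below under one common discrete-Fourier convention (a frequency-domain Gaussian noise draw with real and imaginary parts of variance S/(4Δf), so the whitened residual has unit variance per component; the function and its normalisation are illustrative only):

```python
import numpy as np

def whiten(residual_fd, psd, df):
    """Whiten a frequency-domain residual by the one-sided PSD; if the
    residual is consistent with Gaussian noise, the real and imaginary
    parts of the output each have unit variance."""
    return residual_fd * np.sqrt(4.0 * df / psd)

# Toy residual: coloured Gaussian noise drawn directly from a made-up PSD
rng = np.random.default_rng(3)
n, df = 100_000, 1.0e-5
psd = 1.0e-3 * (1.0 + np.linspace(0.0, 9.0, n))   # arbitrary colouring
scale = np.sqrt(psd / (4.0 * df))
residual = rng.normal(scale=scale) + 1j * rng.normal(scale=scale)
white = whiten(residual, psd, df)
```

A residual containing leftover signal power would show up as an excess variance (or localised excursions) in the whitened series.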
The IMRPhenomXPHM analysis completed in O(10^5) CPU hours. Owing to the larger data duration considered, compared with the zero-noise injection presented in Section 3.1, more memory was required, ∼ 300 GB. For comparison, a typical analysis of 8 seconds worth of ground-based GW detector data sampled at 1/4096 second intervals completes in O(10^3) CPU hours (Colleoni et al. 2021; Ramos-Buades et al. 2023). The increase in computational cost is primarily driven by the waveform evaluation time and the long data durations considered. There are numerous methods available for reducing the computational cost without taking advantage of the heterodyned likelihood. For example, reduced order quadrature models (Canizares et al. 2015; Qi & Raymond 2021; Morisaki et al. 2023), adaptive frequency resolutions (Vinciguerra et al. 2017; Morisaki 2021) and other techniques (Williams et al. 2021; Lee et al. 2022; Islam et al. 2022; Wong et al. 2023; Pathak et al. 2022; Tiwari et al. 2023) have all been used to good effect in the past. Given that most of these methods are already implemented in the bilby infrastructure, it is trivial to take advantage of them in future analyses. As discussed throughout this work, the SNR from MBHBs rapidly accumulates within the final day before merger (Marsat et al. 2021). This implies that long data durations may not be necessary in order to accurately infer the properties of MBHBs. This will inherently reduce both the CPU and memory costs.
Although the scalability of parallel-bilby has been investigated in detail in Smith et al. (2020), we provide a further analysis specific to the LISA context. We analysed the Sangria Blind Dataset with the aligned-spin limit of IMRPhenomXPHM using 128, 256, 512, and 768 CPUs. Owing to computational cost, we stopped the analyses after 12 hours of wall time and compared the sampler's progress. In general, we find that more CPUs increased a) the number of times the Whittle likelihood was called, b) the recorded iteration, and c) the number of proposals used. Specifically, when comparing the 768 and 128 CPU analyses, we find that the former called the likelihood ∼ 34 times more than the latter, and was on the ∼ 5000th iteration compared to the ∼ 3500th. This suggests that running with more CPUs will reduce the overall wall time of the analysis.

DISCUSSION
Inferring the properties of LISA sources is crucial for maximising the scientific output of the LISA mission. Although numerous algorithms exist for performing Bayesian inference, few are applicable to GW astronomy, and even fewer are capable of analysing LISA data. In this work, we introduce a modified version of bilby, the user-friendly Bayesian inference library, to allow users to easily analyse LISA data with a suite of waveform models. We use bilby as our starting point as it is already extensively used for analysing data from ground-based GW detectors, and has a proven track record of producing reliable results.
By focusing on MBHBs, we show that bilby can perform Bayesian inference on LISA data and is capable of accurately inferring the source properties in zero noise as well as idealized instrumental noise. We compare the performance of the standard likelihood used in GW astronomy with an optimised likelihood designed to reduce the computational cost: the heterodyned likelihood. Although we find that analyses which use the full likelihood infer the source properties more accurately, analyses using the heterodyned likelihood are ∼ 10^2 times faster. We note, however, that the heterodyned likelihood depends on knowing a point of high likelihood a priori, and standard implementations may not be suitable for models with higher order multipole content. The full Whittle likelihood may therefore be preferred in many cases. Finally, we demonstrate that higher order multipole waveform models can be used to analyse LISA data. We highlight that the computational cost can likely be reduced further by using techniques other than the heterodyned likelihood that are already implemented in the bilby infrastructure.
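The Whittle likelihood discussed here has a compact frequency-domain form: up to a normalisation constant, ln L = -2 Δf Σ_f |d(f) - h(f)|² / S(f) for each TDI channel. A minimal sketch, using a toy frequency grid, a stand-in power-law PSD, and a stand-in signal (none of which are the LISA noise model or waveforms of this work):

```python
import numpy as np

# Toy one-channel setup: frequency grid, stand-in PSD, stand-in signal
df = 1e-5
f = np.arange(1, 10_000) * df
psd = 1e-40 * (1 + (3e-3 / f) ** 4)

def whittle_lnl(data, template):
    """Whittle log-likelihood (up to a constant): -2 df sum |d - h|^2 / S."""
    return -2 * df * np.sum(np.abs(data - template) ** 2 / psd)

# Simulated data: toy signal plus PSD-coloured frequency-domain Gaussian noise
rng = np.random.default_rng(0)
h_true = 1e-19 * f ** (-7 / 6) * np.exp(2j * np.pi * f * 1e4)
sigma = np.sqrt(psd / (4 * df))
noise = sigma * (rng.normal(size=f.size) + 1j * rng.normal(size=f.size))
data = h_true + noise
```

For a coherent multi-channel analysis, the log-likelihoods of the A, E and T channels are simply summed, each with its own PSD.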
Our current implementation includes waveform models applicable to MBHBs. As we have maintained the modularity of bilby, we anticipate future extensions of this work to include alternative waveform models (Afshordi et al. 2023).
Numerous packages have been developed to stochastically sample the unknown posterior PDF for GW signals (e.g. Veitch et al. 2015; Ashton et al. 2019; Biwer et al. 2019).
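The stochastic-sampling idea underlying these packages can be shown in miniature. The sketch below uses a toy one-parameter problem and a bare Metropolis-Hastings chain; the packages above use far more sophisticated samplers (nested sampling, parallel-tempered MCMC), and all numbers here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy inference problem: recover the mean of noisy data
truth = 2.0
data = truth + rng.normal(0, 0.5, size=200)

def ln_posterior(mu):
    """Gaussian likelihood with a flat prior on mu in (-10, 10)."""
    if not -10 < mu < 10:
        return -np.inf
    return -0.5 * np.sum((data - mu) ** 2 / 0.5 ** 2)

# Minimal Metropolis-Hastings chain
n_steps, step = 20_000, 0.1
chain = np.empty(n_steps)
x, lnp = 0.0, ln_posterior(0.0)
for i in range(n_steps):
    y = x + step * rng.normal()        # symmetric random-walk proposal
    lnq = ln_posterior(y)
    if np.log(rng.uniform()) < lnq - lnp:
        x, lnp = y, lnq                # accept the proposed point
    chain[i] = x

samples = chain[5_000:]                # discard burn-in
```

The retained samples are draws from the posterior: their mean approaches the data mean and their spread approaches the analytic posterior width 0.5/√200 ≈ 0.035.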

Figure 1. Mismatches between 10^5 GW signals calculated using the BBHx and LALSimulation implementations of IMRPhenomD, when projected onto the A TDI channel.
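A mismatch of the kind shown in Figure 1 is one minus the noise-weighted, normalised overlap between two waveforms. The sketch below uses a stand-in grid, PSD and waveforms, and computes the overlap at fixed time and phase; mismatch studies commonly also minimise over relative time and phase shifts, which this sketch omits:

```python
import numpy as np

# Stand-in grid and PSD; not the PSD or waveforms used in the paper
df = 1e-5
f = np.arange(1, 10_000) * df
psd = 1e-40 * (1 + (3e-3 / f) ** 4)

def inner(a, b):
    """Noise-weighted inner product <a, b> = 4 df Re sum(a conj(b) / S)."""
    return 4 * df * np.real(np.sum(a * np.conj(b) / psd))

def mismatch(h1, h2):
    """1 minus the normalised overlap, at fixed time and phase."""
    return 1 - inner(h1, h2) / np.sqrt(inner(h1, h1) * inner(h2, h2))

h = f ** (-7 / 6) * np.exp(2j * np.pi * f * 1e3)
h_pert = h * np.exp(1j * 1e-3)         # small overall phase offset
```

For the constant phase offset above the mismatch reduces analytically to 1 - cos(10^-3) ≈ 5 × 10^-7, giving a sense of how small phase differences map onto the mismatch values plotted.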

Figure 2. Inferred posterior distribution for the total mass M, mass ratio q, primary spin χ1 and secondary spin χ2 when analysing a zero-noise injection with the BBHx_IMRPhenomD model. Results were obtained when analysing the A, E and T TDI channels. The dotted and solid lines show the results when using the heterodyned and full Whittle likelihoods respectively. The grey cross hairs show the true values, and the 2-dimensional contours show the 90% confidence interval.

Figure 3. Top: Comparison between the reconstructed GW signal (black) and the time-domain Sangria blind dataset (grey). The reconstructed GW signal was obtained by coherently analysing the A, E and T TDI channels using the aligned-spin limit of IMRPhenomXPHM and the full Whittle likelihood. We shift the reconstructed GW signal and data by the inferred merger time tc, as defined in the LISA frame. The reconstructed GW signal is plotted as a band representing the 90% credible region. Bottom: Comparison between simulated Gaussian noise coloured by the power spectral density (black), and the residual after subtracting the reconstructed GW signal from the time-domain Sangria blind dataset (grey). For simplicity, we only show the A TDI channel in both panels.