The HARPS search for southern extra-solar planets – XLV. Two Neptune mass planets orbiting HD 13808: a study of stellar activity modelling’s impact on planet detection

We present a comprehensive analysis of 10 yr of HARPS radial velocities (RVs) of the K2V dwarf star HD 13808, which has previously been reported to host two unconﬁrmed planet candidates. We use the state-of-the-art nested sampling algorithm P OLY C HORD to compare a wide variety of stellar activity models, including simple models exploiting linear correlations between RVs and stellar activity indicators, harmonic models for the activity signals, and a more sophisticated Gaussian process regression model. We show that the use of overly simplistic stellar activity models that are not well-motivated physically can lead to spurious ‘detections’ of planetary signals that are almost certainly not real. We also reveal some difﬁculties inherent in parameter and model inference in cases where multiple planetary signals may be present. Our study thus underlines the importance both of exploring a variety of competing models and of understanding the limitations and precision settings of one’s sampling algorithm. We also show that at least in the case of HD 13808, we always arrive at consistent conclusions about two particular signals present in the RV, regardless of the stellar activity model we adopt; these two signals correspond to the previously reported though unconﬁrmed planet candidate signals. Given the robustness and precision with which we can characterize these two signals, we deem them secure planet detections. In particular, we ﬁnd two planets orbiting HD 13808 at distances of 0.11, 0.26 au with periods of 14.2, 53.8 d, and minimum masses of 11, 10 M ⊕ .


INTRODUCTION
The Radial Velocity (RV) method has been an important and productive tool for discovering exoplanets ever since it led to the discovery of the first exoplanet orbiting a Sun-like star (Mayor & Queloz 1995).However, the search for small Earth-and Neptunelike planets orbiting Sun-like stars is very challenging.They give rise to relatively small radial velocity signatures of order 1 m s −1 or less, whilst stellar magnetic activity can induce RV signals that can mimic planetary ones, with amplitudes of order many meters per second (e.g Queloz et al. 2001).
Thus, with the increasing precision of RV measurements facilitated by upcoming extreme-precision Doppler spectrographs such as ESPRESSO (Pepe et al. 2014), EXPRES (Jurgenson et al. 2016), HARPS3 (Thompson et al. 2016), HIRES (Pasquini et al. 2008) and NEID (Schwab et al. 2016), the characterization of the influence from host stars' activity on the RV measurements becomes ever more important.Hence there is a strong interest in developing tools to disentangle stellar and planetary signals in RV data.
A variety of methods exists that attempt to 'correct' or model stellar activity signals in RVs, and thus reduce the possibility of false-positive planet detections.Models describing this include prewhitening and red-noise models (e.g Hatzes et al. 2010;Feroz & Hobson 2013), as well as studying stellar activity indicators such as the bisector inverse slope (BIS) and Full Width Half Maximum (FWHM) of the cross-correlation function (CCF) between a target star and a template spectrum, or measurements of chromospheric activity in the target star, such as the log  HK index (e.gBoisse et al. 2009;Queloz et al. 2009;Dumusque et al. 2011b).The presence of periodic signals in indicators usually suggest an activity-induced signal rather than a planetary one, since a genuine planet would induce a periodic Doppler shift in all spectral lines, but would not produce the same periodic variations in the stellar activity indicators.It is straightforward to use linear correlations (e.g.Queloz et al. 2001) and to include harmonic models which use the rotational period of the star and its harmonics (e.g Dumusque et al. 2012) to describe stellar activity induced RV variations.
Other, more computationally expensive methods include modelling the stellar surface features directly (e.g Boisse et al. 2012) and predicting activity-related RV variability with stellar activity indicators using the  method and Gaussian processes (GPs) as described in e.g.Aigrain et al. (2012) and Rajpaul et al. (2015); such approaches have already successfully been used to identify false-positive planetary detections as e.g. in Haywood et al. (2014b) or Rajpaul et al. (2016).
To quantify the quality of different models for describing observed RV data, Bayesian model comparison has proven to be a robust approach (e.g Feroz & Hobson 2013;Faria et al. 2016;Hall et al. 2018), also in combination with GP modelling (Faria et al. 2020).However, a significant challenge inherent in this method is very high computing costs when dealing with high-dimensional problems, since computing Bayesian model evidences entails integrating over all model parameter posteriors; a related challenge is degeneracy inherent in combinations of Keplerian and stellar activity models (see Nelson et al. 2020).Nevertheless, it is widely accepted that Bayesian model comparison is a theoretically-sound approach to answering questions about competing physical models, 1 and that such computational burdens are therefore a price worth paying (Goodman 1999).
This paper presents a comprehensive analysis of a series of 246 spectroscopic measurements on HD 13808 carried out with HARPS from 2003 to 2014.This star belongs to a sample of bright stars selected for their low level of RV "jitter" to maximize the sensitivity of the survey to low-mass exoplanets.Series of stellar spectrum of HD 13808 with a signal to noise (SN) ratio large enough to reach the photon noise RV measurement error of order 50 cm s −1 are available.In addition, the exposure time was at least 15 minutes to minimize the effect of acoustic mode stellar oscillations (e.g.Dumusque et al. 2011a;Chaplin et al. 2019).Early in the survey of HD 13808, small amplitude RV variations were detected, suggesting a combination of time variable signal due to a multi-planet systems and moderate stellar activity.We present a comprehensive analysis of HARPS RVs on that star using Bayesian inference to combine various stellar activity models with a set of independent Keplerian orbit solutions.
With this paper we aim to examine the importance of comparing systematically a number of competing physical models, since the usefulness of Bayesian model comparison is limited by the quality of the models considered (favouring one inappropriate model over another does not imply that either model is in any sense 'correct'). 1 The same can not be said for various other commonly-used approaches to model selection, such as residual minimization, Bayesian information criterion testing, etc. (Gelman et al. 2013) We show that if an inadequate stellar activity model is considered, one may be led astray and end up with spurious conclusions about the number of planets present in one's system.Models we consider in our analysis include: linear correlations between RV data and stellar activity indicators; sinusoidal models; and joint GP regression of RV measurements and activity.Moreover, we present a thorough test of the nested sampling algorithm P C , a state-of-theart sampler, and discuss its limitations and the importance of the precision settings in exoplanet applications.The HD 13808 system is used as a test case for this study as (i) the host star shows stellar activity variability, and (ii) it has been suggested in the literature that the system hosts at least two planet candidates, though the existence of these planets was never securely confirmed (see Mayor et al. 2011 andGillon et al. 2017; at the time of writing this paper, the NASA Exoplanet Archive2 lists HD 13808 as having no confirmed planets).With the analysis in this paper the status of these candidates elevates to confirmed planets.
This paper is structured as follows.We describe the HARPS observations in Section 2, followed by the modelling of Keplerian orbits in Section 3. We introduce the stellar activity models applied in this work in Section 4 and outline model comparison using Bayesian inference and P C in Section 5. Afterwards, we present the results of our analysis of HD 13808 RVs in Section 6 and conclude with our discussion in Section 7.

HARPS OBSERVATIONS OF HD 13808
Spectra of HD 13808 were measured with HARPS, a fiber-fed spectrograph installed in a vacuum vessel, mounted on the 3.6m ESO telescope of La Silla in Chile (Mayor et al. 2003).Observations were made using the most precise observing mode which utilizes a simultaneous calibration by the ThAr calibration lamp.In this configuration HARPS achieves a long term RV precision better than 1 m s −1 allowing us to detect small stellar RV variations of this order (Lovis et al. 2008).
The data obtained by HARPS are automatically processed on site by a data reduction software -the HARPS DRS -that extracts the spectra, calibrates it and eventually computes a cross-correlation function with a stellar template.The stellar radial-velocity is measured from the CCF as well as other parameters like the FWHM or the BIS Queloz et al. (2001); Santos et al. (2002).In addition for each HARPS spectra, the value of the Calcium S activity index is estimated from the chromospheric re-emission in the Ca II H and K lines and converted in standard log  HK index.Since the release of the first version of the HARPS DRS in 2003, a series of successive versions with improved algorithms have been developed, leading to a steady gain in the RV precision.Details about the historical changes in the HARPS DRS algorithms may be found in following series of papers: Baranne et al. (1996); Pepe et al. (2002); Lovis & Pepe (2007); Mayor et al. (2009a,b).Measurements used in this paper were reprocessed with version 3.5 of the HARPS DRS 3A description of the HARPS data used in this study is given in Table 1 while HD 13808's stellar parameters are summarised in Table 2. HD 13808 has been observed by HARPS 246 times over a span of more than 10 years, achieving consistent quality of mean( RV ) = 0.76 m s −1 .

MODELLING RADIAL VELOCITIES
To model the observed RV of a star at time   with   planets, we sum the planets' individual Keplerian terms, neglecting planetplanet interactions; following Feroz et al. (2011) and the formalism given in Balan & Lahav (2009) we arrive at: where   is the systemic velocity,   the RV semi-amplitude,  ,  , the true anomaly,   the orbital eccentricity, and   the argument of periastron of the  th planet, respectively.In addition, the introduction of the mean longitude   of the  th planet is necessary as part of the computation of  ,  which requires   and the period of the planet   as well.A planet's semi-amplitude   is related to its mass   , period and orbital eccentricity   , as well as to the mass of the parent star   and the inclination of the system  relative to the observer: In summary, we have five free parameters per planet (, , , , and ) plus a white-noise 'jitter' term  + RV which is added to the observational error in quadrature (see Section 5.2) and any terms for describing the systemic velocity,  -typically a constant offset plus a linear or possibly quadratic polynomial term.For the analysis in this paper, a quadratic polynomial term was used for all models.Additionally, our models for describing stellar activity contributions to RVs contain anything from one to sixteen free parameters; these models are described in detail in Section 4. The uninformative priors for our five free parameters per planet are listed in Table 3, as well as the priors for the polynomial terms and for the additive white-noise term.Note that the lower and upper boundary for planet periods was set to 5 and 100 d for computational efficiency.This choice was supported, in the first instance, by the fact that the Lomb-Scargle power spectrum of RVs revealed no significant periodicities above about 60 d (see Fig. 3).Moreover, models allowing planets with periods longer than 100 d or shorter than 5 d always had lower Bayesian evidences than models without such planets (when using a GP stellar activity model -cf.Section 6), and the inferred RV semi-amplitudes of these short-or long-period planets was always consistent with zero. 4he priors used for stellar activity model parameters were generally also chosen to be uninformative, and are discussed in Section 4 and summarized in Table 4.

MODELLING STELLAR ACTIVITY
In this section we introduce the multiple ways of modelling stellar activity applied in this work.Simple models like linear dependencies are considered, as well as harmonic modelling of the rotation period of the star, combining BIS and RV measurements in a simultaneous fit.We also discuss more sophisticated approaches to modelling activity-induced RV variations, namely the  approach and simultaneous GP regression over multiple activity indicators.

Linear activity model
A linear relation between RVs and BIS was considered as seen in equation ( 3), where RV Kepler (t) are the RV signatures of the planets at time  as described previously in equation (1); BIS(t) represents the BIS measurements taken at time  and  describes the free parameter for the linear relation.A quadratic polynomial term including a constant offset is represented by .
RV total () = RV Kepler () + BIS() +  . (3) This method has already been successful in identifying false where  represents the linear factor and log  HK () the log  HK measurements at time .The prior distribution for  can be found in Table 4.

Harmonic activity model
Our second activity model entailed fitting a sinusoid to both the RVs and BIS time series, enforcing an identical period but allowing different phases and amplitudes between the two.We assume that this captures solar spots and other stellar features which are sensitive to the rotation of the star.In addition, it accounts for a likely time shift which is a problem when considering linear correlations as mentioned in Section 4.1.
Equation ( 5) and ( 6) show the two models with the harmonics of the rotation period simultaneously fit to the RV and BIS measurements, respectively.The parameter  rot is the putative rotation period of the star fit to both models, while the other parameters   ,  RV and   ,  BIS and  BIS describe the sinusoidal signal of the  th and  th harmonic for the RV() and BIS() model respectively, with  ℎ being the number of harmonics.As before, the parameter  represents a quadratic polynomial contribution.
The prior distribution for  is as in Table 3; the priors for ,  BIS and  rot are displayed in Table 4.Note that the prior on the rotation period  rot was chosen to cover a narrow range based on the following: (i) the rotation period for HD 13808 was estimated before to be ∼ 40 d based on the average of log  HK measurements by Lovis et al. ( 2011); (ii) signals likely corresponding to the harmonics of the rotation period are detected in the periodograms of the stellar activity indicators, see Section 6.1 (Fig. 3).Further motivation was provided by the fact that (iii) when considering zero or 1-planet models, the rotation period of the star became locked on to the period of one of the two planets, and that (iv) during preliminary runs, the MAP value for the rotation period never went above 40 d.By limiting the period to below 42 d we reduced the computational burden of our modelling.

Long-term magnetic activity model
In a similar fashion to the harmonic activity model, the 'magn.cycle' model fits a sinusoid to two data sets, this time to the RVs and the log  HK time series, enforcing an identical period but allowing different phases and amplitudes.We assume that this captures stellar activity corresponding to a long-term magnetic activity cycle.Equation ( 7) and ( 8) show the two models with the cycle period  magn simultaneously fit to the RV and log  HK measurements, respectively.The parameters  magn,RV ,  magn,RV and  magn , magn and  magn describe the RV() and log R HK () model respectively;  represents a constant offset and a quadratic polynomial term.
The prior distribution for  is as in Table 3; the priors for ,  magn and  magn are displayed in Table 4.The upper and lower prior limits for  magn were determined by a broad peak around 1000-4000 d in the periodogram of log  HK as well as by the obvious periodic signal in the log  HK measurements (see Fig. 2).

𝐹𝐹 method
The FWHM measurements were used for applying the  method which was introduced by Aigrain et al. (2012) as a method for relating the photometric brightness and RV variations of a star.It uses the flux of the star Ψ() and its derivative Ψ() as an indicator for spot coverage and predicts RV variations.These variations include the RV perturbation ΔRV rot (t) due to the presence of spots on the rotating photosphere and their effect of the suppression of convective blueshift ΔRV conv (t).Aigrain et al. (2012) describes these with the following two equations where  represents the drop in flux produced by a spot at the centre of the stellar disc,  * is the stellar radius,   is the difference between the convective blueshift in the unspotted photosphere and that within the magnetized area and  is the ratio of this area to the spot surface.
The total RV variation ΔRV activity created by stellar activity is then the sum of both terms: The parameter  is approximated as with Φ min being the minimum observed flux.

It has been argued that CCF FWHM and log 𝑅
HK measurements should both behave, to first order, as the convective blueshift suppression term in the  method, which in turn is a close proxy for the integrated active region coverage on the visible stellar hemisphere (Rajpaul et al. 2015); indeed, it has been shown in practice that the CCF FWHM can be a good tracer of photometric flux (e.g.Suárez Mascareño et al. 2020).Thus, as we did not have photometric measurements of HD 13808 available, we chose to interpret the FWHM as a proxy for the flux Ψ().The derivative of the 'flux' was computed numerically by modelling the FWHM with a GP using the algorithm by Foreman-Mackey et al. (2017).In total, there were three free parameters in this stellar activity model,   ,  * and Ψ 0 -their respective priors appear in Table 4.

GP regression model
We used the GP framework developed by Rajpaul et al. (2015), hereafter R15, to model RVs simultaneously with log  HK and BIS observations.In short, this framework assumes that all observed stellar activity signals are generated by some underlying latent function  () and its derivatives; this function, which is not observed directly, is modelled with a Gaussian process (Rasmussen & Williams 2006;Roberts et al. 2013).
Following R15, activity variability in the RV, log  HK and BIS time series can be modelled as: log  HK =    (), and ( 14) respectively.The coefficients   ,   ,   ,   and   are free parameters relating the individual observations to the unobserved Gaussian process  ().In R15's framework,  () itself can be loosely interpreted as representing the projected area of the visible stellar disc covered in active regions at a given time; the GP describing  () is assumed to have zero mean and covariance matrix K, where As in R15, we adopt the following quasi-periodic covariance kernel function: where  GP is the period of the quasi-periodic activity signal,  p is the inverse harmonic complexity of the signal (such that signals become sinusoidal for large values of  p , and show increasing complexity/harmonic content for small values of  p ), and  e is the time scale over which activity signals evolve.This quasi-periodic covariance kernel has been widely used to model stellar activity signals in both photometry and RVs (e.g.Haywood et al. 2014a;Rajpaul et al. 2015;Grunblatt et al. 2015;Bonfils et al. 2018).The full expressions for the covariance between the three different observables modelled are given in R15.We expect, in principle at least, that this sophisticated GPbased approach to modelling RVs jointly with activity indicators should enable more reliable planet characterization, for several reasons.Firstly, by modelling multiple activity-sensitive time series simultaneously (e.g.log  HK , BIS, FWHM, etc., or some subset of these), more information can be gleaned on activity signals in RVs, compared to exploiting only simple correlations between RVs and (typically) one of these time series.Additionally, the framework uses GP draws and derivatives thereof as basis functions for modelling available time series, rather than e.g.sinusoids or other simple parametric models, the inappropriate use of which could easily lead to the introduction of correlated signals into model residuals.The GP basis functions could in principle take any form, although in the GP framework their properties are constrained to some extent by the data itself, and from reasonable prior assumptions about the quasi-periodic nature of stellar activity signals.The GP framework also incorporates the  formalism directly as a special case; the former approach may be thought of as a generalization of the latter.

Bayesian model comparison
As we shall use Bayesian inference to evaluate the relative posterior probabilities of different models, we summarise briefly here the relevant formalism.Firstly, Bayes' Theorem, given in equation ( 17 or simply For model selection, one can compare two models  1 and  2 , given data , by computing the ratio of their respective posterior probabilities; this ratio is also known as the Bayes factor :

Note that
Pr(  1 ) Pr(  2 ) is the relative a priori probability between the two models, which is usually set to one.
To decide whether the evidence difference is significant to favor one model over the other, we make use of the Jeffreys scale as given in Table 5, where the logarithmic scale of the evidence difference is used.This simplifies equation ( 19) to:

Likelihood function
It is commonly assumed that an additive white Gaussian noise (AWGN) model is sufficient for describing observational noise, as in probability theory the central limit theorem states that the sum of independent random variables tends towards a Gaussian distribution, even if the original ones are not normally distributed (Fischer 2011).In this case, the likelihood L (Θ) of parameters Θ can be written as: where (  ; Θ) describes the model's predicted RV for parameters Θ, while   describes the RV observed at time   , with corresponding error estimate   which contains the observational error and the additive noise 'jitter' term  + RV added in quadrature.The logarithmic likelihood, which is more convenient to work with, is computed as: Note that the above formalism applies to all of our models except for the GP model, which explicitly generalises the AWGN model by allowing for red (correlated) noise.In this case, the log likelihood may be computed via the more general expression where K is the matrix defining the covariance between all pairs of observations (see R15), and r is a vector of residuals with the  th element given by (  ; Θ) −   .Note that in the special case where K is a diagonal matrix with the  th element given by  2  , i.e.where the noise is assumed to be white, equation 23 reduces to equation 22.
As the Keplerian model requires the Kepler equation to be solved, we made use of the efficient CORDIC-like method introduced by Zechmeister (2018) where double precision is obtained within 55 iterations.

Avoiding unstable Keplerian orbits
While computing likelihoods we checked if each pair of planets would be stable following the criterion introduced by Gladman (1993): Δ > 2 √ 3   (, ) where Δ =   −   is the difference between the semi-major axis of the -th and -th planet and   (, ) the planets' mutual Hill radius.This has been used before e.g. by Malavolta et al. (2017).
If the parameters drawn resulted in a planet system which was considered unstable, the likelihood was set to zero, i.e.  0 = 0; in our case, as we worked with log likelihoods, this was approximated numerically by ln  0 = −1 × 10 30 .

P C
We used P C as it is an state-of-the-art nested sampling algorithm, designed to work with very high dimensional parameter spaces (Handley et al. 2015).A short discussion comparing it with MultiNest (Feroz et al. 2009) can be found in Hall et al. (2018).It was developed using C++ and Fortran and can also be called as a Python package.
In order to test the reliability of our algorithm with P C we used two sets of simulated data.One data set contained random uncorrelated Gaussian noise (with a standard deviation of 1m s −1 ) without any planetary signals, while the other one consisted of two planets on elliptical orbits with the same Gaussian noise.Both data sets also included an offset of a few m s −1 and linear trend of 10 −5 m s −1 per day.Models of 0-3 planet signals with long term trends in the form of second-order polynomials were fitted and their evidences compared.Both elliptical and circular orbital solutions were computed.We note here that in general, the evidence uncertainty reported by P C is an underestimate, and it is better practice to estimate it via the range of scatter in the evidence across multiple runs (Higson et al. 2018;Nelson et al. 2020); we used the latter approach throughout our analyses.(It is probable that this behaviour would also be ameliorated by increasing the number of live points beyond the computational resources available at the time of writing.) For the first data set without planetary signals, the algorithm successfully favoured the no-planet model consisting of a secondorder polynomial.Every other model computed showed lower evidences, with at least ln R ≈ 2 for the zero-planet model vs. one sinusoidal signal or ln R ≈ 3 for the zero-planet model vs. one elliptic orbital signal; evidences decreased with the number of signals fitted.
Similarly, the two planets with eccentric orbits in the second test were also recovered: the preferred model featured two planets, with the second best model containing three planets with eccentric orbits, albeit with a significantly lower evidence than the two-planet model, and thus rejected.The typical rms of the log evidences was of order ∼ 3 for circular orbital fits and of order ∼ 1 for eccentric orbital fits.
Over multiple runs with complex models, our tests showed that the default stopping criterion of P C was not precise enough 5 for these sort of exoplanet applications.This resulted in inconsistencies where the sampling in some runs missed transitions to areas of higher likelihood (see Fig.  and robust runs, the convergence criterion was lowered to values between 10 −5 to 10 −12 for runs with a large number of planets and complex stellar activity models.As an historical aside, nested sampling was in fact invented (Skilling 2006) in order to solve precisely these kind of phase transition problems.

ANALYSIS AND RESULTS
The extent of stellar activity observed from HD 13808 was investigated by looking at the activity indicators BIS, FWHM and log  HK .None of them showed an obvious linear correlation with the RV measurements.Note that this can occur e.g. when there is a relation but with a simple time lag, as demonstrated with the case of the Sun  2019).However, the log  HK measurements do show a notable cycle from a low of −5.10 to a high of −4.70, as displayed in Fig. 2. A cycle with similar shape and period is also seen in the BIS and FWHM time series, although with smaller amplitude.We thus decided to extend our analysis by splitting the full RV data into two subsets based on the median(log  HK ) = −4.90;low-and high-activity subsets then corresponded to observations with log  HK < −4.90 and log  HK > −4.90, respectively.Table 6 summarises the statistics of the full data set and those two subsets.Note that although we call one subset 'high activity,' HD 13808 is still classified as an inactive star, as an 'active' classification would usually require an average of log  HK > −4.75 (Henry et al. 1996).This log  HK cycle shows great similarities to the one of the Sun with its recent maxima and minima being at log  HK −4.83 and log  HK −4.96; the extrapolated log  HK value for the Maunder minimum period is log  HK −5.10 (Mamajek & Hillenbrand 2008).

Periodogram Analysis
To identify strong periodicities in our time series we computed the Bayesian generalized Lomb-Scargle (BGLS) periodogram (Lomb 1976;Scargle 1982;Zechmeister & Kürster 2009;Mortier et al. 2015) of the two separated data subsets for the RV, the stellar activity indicators BIS, FWHM and log  HK and a constant function (CF) with identical time stamps.
The resulting BGLS periodograms are shown in Fig. 3  The BGLS periodograms of the high-and low-activity subsets of the data suggest that the RV measurements are affected by stellar activity: note, in particular, the excess power around the putative stellar rotation period and the first two harmonics in the high-activity subsets of the BIS and log  HK time series.It is also evident that the stellar activity indicators differ quite strongly when the star is slightly more active; the difference between the mean log  HK value of the two subsets is Δ log  HK = 0.12.

Model Comparison
The different models we considered are summarised in Table 7, along with a short description of each.The model including only circular orbital solutions is called 'Circular', while the elliptical orbital solutions are referred to as 'Kepler'.A '+' indicates that the 'Kepler' solution was complemented with a specific stellar activity model, such as linear dependency on the BIS ('Kepler + linear BIS').This was done equivalently for a subset of models applied to the low and high-activity subsets of the data.

Results
The relative Bayesian evidences for each model with different number of Keplerians are shown in Table 8.For ease of comparison, we set the log evidence for models with two Keplerians to zero.Thus positive values are more favoured and negative ones less favoured than the 2-planet models, with significance to be interpreted according to Jeffreys' scale (Table 5).Table 8 also includes the rms and median values for the RV residuals computed using the maximum a posteriori (MAP) values of the various 2-planet models.
Note that the absolute evidences of different models cannot be meaningfully compared in a global sense as the evidence values depend, among other things, on the data fitted; hence an -Keplerian Gaussian process low activity Gaussian process high activity model that fits only RVs cannot e.g.be meaningfully compared to another -Keplerian model that fits RV simultaneously with a BIS or log  HK time series.In passing we note, however, that it was possible to compare the absolute evidence values of the parametric model 'Kepler + BIS  rot 3 rd harm.+ magn.cycle' with the GP model, as these fitted the same three time series; the 2-planet GP model achieved a log evidence an order of magnitude higher than the former model (−584 ± 3 vs.−4810 ± 6).This overwhelming difference in Bayesian evidences reflects the fact that the GP model achieved a superior fit to the observational data (RVs, log  HK , and BIS series jointly), and moreover that high-quality fits under the GP model were localised to a comparatively small volume of prior space.

Spurious third and fourth planets
Table 8 shows that many of the simple parametric models -though not the GP model -favoured the 'detection' of three or even four planets.However, we have strong empirical reasons to reject models with more than two Keplerian components.
First, we found that the periods of the third Keplerian signal (and also the fourth, where applicable), tended to change with every run, while the periods of ∼ 14 d and ∼ 54 d appeared as MAP values in every run, for all models with two or more Keplerians.This is demonstrated in Table 9, where we list the periods of the additional Keplerian periods found by every model.The most common periods for an 'extra' Keplerian were around 12 and 19 d -as it turns out, these periods out are also strongly visible in the high-activity periodogram of the log  HK and BIS time series (see Fig. 3).As mentioned in the BGLS periodogram analysis, we interpret these periods as harmonics of the stellar rotation period.The fact that these periodicities manifested in Keplerian terms in all but the most complex activity models reflects the inadequacy of the simpler models at capturing all the activity variability -neither the 12 d nor the 19 d Keplerian periodicities showed up under the GP model.
Second, it also turned out that these supposed planetary signals described by the Keplerians either did not show up or their semi-amplitudes decrease to below 1 m s −1 when considering only the low-activity subset of observations.Conversely, the semiamplitudes of the supposed third planets occurring at ∼ 12 d or ∼ 19 d increased significantly when only the high activity data set was modelled compared to when using the full data set (see Table 9).
Perhaps most significantly, our GP model -which we regarded a priori as being by far the most realistic of our activity modelsdecisively favoured a 2-planet interpretation of our full set of observations.The GP modelling even favoured a 2-planet interpretation when considering only the high-activity subset of observations, and resulted in planet properties consistent with those derived from the low-activity subset or the full data set (see Table 10).This was not true for the simpler parametric models; the  method equivocated between 2-and 3-planet solutions, and was not reliably able to detect the ∼ 54 d periodicity.
Our GP activity model, though relatively complex in its own right, did a comparatively good job of fitting variability in RVs unrelated to planets, when compared to most of the other activity models (see residuals in Table 8).This is despite our requirement that the GP fit activity-related variability in RVs simultaneously with BIS and log  HK time series, which ensured that the GP would be very unlikely to try to fit planetary variability.Presumably, then, under the GP modelling there was little residual activity-induced variability that could be explained using a third or fourth Keplerian term; hence these more complex models were rejected.
Regarding the parametric activity models (i.e.not the GP model), the addition of extra Keplerian terms beyond the 2-planet model in many cases did enable a non-trivial fraction of extra activity-related variability to be fitted.This may have been the case, for example, because the amplitudes, phases and periods of the different terms in the harmonic models were by definition fixed, whereas the quasi-periodic GP model is flexible enough to fit both low and high activity variability, possibly with changing phases and arising from a combination of rotation periods (as might be the case due e.g. to differential rotation), rather than a single 'master' rotation period (Rajpaul 2017).Thus the non-GP activity models in many cases favoured solutions with three or four Keplerian terms.
The upshot, then, is that one's conclusions about the number of planetary signals present in an RV time series is extremely sensitive to whether one has modelled other, non-planetary (e.g.activityinduced) variability present in the RVs.

Comments on computational costs
Given the high dimensionality of the joint stellar activity plus planetary models we considered (over 30 parameters for some of the 4-planet plus activity models), accurate Bayesian evidence computation required a large number of posterior samples to be drawnin the case of 4-planet models, several hundred million posterior samples were typically required, even with a state-of-the-art sampler such as P C .This did not prove too burdensome for the parametric activity models; a typical run for a 4-planet model with ∼ 6 additional stellar activity modelling parameters took 30 − 50 hours on 10 cores with 2.9 GHz clock speed each.
Evaluating a single GP likelihood, however, required among other things inversion of a 738 × 738-element covariance matrix; considering that we computed Bayesian evidences for models containing between 0 and 4 planets, and repeated these evidence calculations several times, we ended up implicitly inverting large covariance matrices many billions of times. 6The upshot was that evaluating our GP models required of order fifty thousand CPU core hours on a high-performance computing platform.(Attempts to speed up computation by reducing P C 's precision criterion led to unacceptably large scatter in computed evidences.)Accurate evaluation of the Bayesian evidences of these GP models would simply not have been feasible on a desktop or even a small computing cluster.
At face value this might seem to be a serious shortcoming of the GP model.However, we note that Bayesian evidence computation is computationally challenging in general (Nelson et al. 2020), and it would seem that you 'get what you pay for': the parametric activity models are certainly far cheaper to evaluate, though as we argued in Section 6.2.2, their over-simplicity inevitably leads to the detection of spurious planets.Such issues are avoided with the GP model.

Planet characteristics
The posterior distributions for the inferred semi-amplitude , period  and eccentricity  of the planets in all of our 2-planet models are summarised in Table 10.Despite the wide array of models considered, the planetary parameters show a remarkable degree of 6 While a number of techniques for speeding up GP regression do exist, e.g.Foreman-Mackey et al. (2017), we are not aware of any that are well-suited to cases where covariances need to be formulated over multiple inputs and outputs simultaneously, e.g.RVs jointly with activity indicators.consistency, with the credible intervals for their semi-amplitudes and eccentricities agreeing within 1 across almost all models; the same holds true when considering only the low-activity subset of observations.Differences between orbital periods inferred under different models are never greater than a fraction of a per cent.This provide additional evidence that the detected signals are indeed planetary and robust.It is worth noting that the GP model favours a marginally larger MAP semi-amplitude   , and a marginally larger MAP semi-amplitude   , than the typical parametric models.However, the values favoured by the GP model are bracketed both above and below by other parametric models, and so are in no sense extreme; concern that a GP might 'absorb' some of a planetary signal is clearly unfounded in this case. 7Under the parametric activity models,   tends to decrease in the low-activity data subset compared to the high-activity subset, suggesting that the outer Keplerian (with period broadly similar to the likely stellar rotation period) is actually being used to absorb some RV variability the too-simplistic activity models cannot account for.  appears comparatively in-7 After all, the framework from R15 is designed so that the GP component of the model is only able to explain variability simultaneously present in RVs and activity-sensitive time series; therefore, except in extremely pathological cases, there should be little risk of the GP wrongly fitting a planetary signal.
sensitive to stellar activity levels, which may reflect the fact that   is about 2.5 times shorter than the star's putative rotation period, such that little constructive or destructive interference between the Keplerian and activity signal occurs.
Putting aside the fine-grained differences between the GP and parametric activity models, Table 13 shows striking and remarkable consistency between planet parameters across almost all the models we considered, with well-constrained periods, and semi-amplitudes inconsistent with zero at > 10 levels.On this basis we regard our modelling to represent the secure RV detection of two planets orbiting HD 13808.By way of contrast, it is worth noting that even transiting planets with tightly constrained periods and orbital phases may sometimes prove difficult to characterize, with inferred semi-amplitudes diverging wildly depending on which model is used: Kepler-10 c is an excellent example (Fressin et al. 2011;Dumusque et al. 2014;Weiss et al. 2016;Rajpaul et al. 2017).We refer, hereafter, to the planets we have detected as HD 13808 b and HD 13808 c with periods of ∼ 14 and ∼ 54 d, respectively.

Stellar activity characteristics
The stellar rotation period of 38.99 ± 0.47 d inferred under the GP model is broadly bracketed by the ∼ 32-41 d rotation periods inferred by our various parametric models (Table 11), and agrees well with the ∼ 40 d period estimated by Lovis et al. (2011).
The combined RV semi-amplitude ascribed to activity under the GP model (taking into account both the convective and rotational terms in the model, i.e.   and   ) is a little under 1 m s −1 ; meanwhile, the fitted RV jitter term under the GP model is of order 2 m s −1 .By contrast, the 'Kepler + BIS  rot 3 rd harm.' model ascribes only around 30 cm s −1 of RV variability to activity, while absorbing nearly 6 m s −1 of variability as white-noise jitter (see Fig. 5).These differences are not surprising, given that our quasiperiodic GP function draws do not need to have constant amplitudes and phases, need not be strictly periodic, etc.: consequently, the GP can fit more activity-related variability than the more inflexible parametric model, which must instead absorb the stellar red noise via a large jitter term.
The GP model also allows us to constrain the evolution timescale of active regions: the inferred value of  e = 91±10 d (Table 12) suggests an active region lifetime (or spot evolution time scale) of about two rotation periods, which is significantly longer than is observed for active regions on the Sun (Bradshaw & Hartigan 2014).The hyper-parameter  p is harder to interpret physically, though the value  p ∼ 1 suggests that the activity signals being modelled by the GP are only moderately more complex than a sinusoidal model, having on average something between three or four inflection points per period, compared with exactly two for a sine wave (Rajpaul 2017).
In addition to the stellar activity linked to the rotation period of the star, we are able to characterise the long-term magnetic cycle of HD 13808.The two models considering this cycle 'Kepler + magn.cycle' and 'Kepler + BIS  rot 3 rd harm.+ magn.cycle' estimate these variations to have a period of 3682 ± 14 d and 3681 ± 19 d with RV semi-amplitudes of 1.61 +0.26  −0.33 m s −1 and 1.77 +0.37 −0.28 m s −1 , respectively.The estimate for the RV semi-amplitude of the magnetic cycle in the first model may be affected by the lack of the short-period activity, though the MAP values for both period and semi-amplitude are consistent within their 1 credible intervals.

Conclusion
In summary, our favoured activity model is the GP model; we might consider the 'Kepler + BIS  rot 3 rd harm.' (or arguably the  ) model to be the best of the parametric activity models, even though it falls short of the GP.Both the GP and the 'Kepler + BIS  rot 3 rd harm.' models favour a two-planet solution over a one-planet solution; they both strongly reject solutions containing more than two Keplerians, supporting our non-planetary interpretation for any 'extra' Keplerians.In addition, they both show amongst the lowest residual RV scatters (Table 8).However, the GP model is the only one to reject under all circumstances solutions containing more than two Keplerians, and to reliably detect the second planetary signal even when considering the high-activity subset of observations.Despite its other merits, the 'Kepler high activity + BIS  rot 3 rd harm.' model has the unusual disadvantage of having locked on to what appears to be a spurious or at least unusually short stellar rotation period, unlike the GP model and some of the other parametric activity models (see Table 11).
The MAP values and their corresponding planetary characteristics for our two favoured models are summarised in Table 13.The inferred minimum masses for HD 13808 b and HD 13808 c are ∼ 11  ⊕ and ∼ 10  ⊕ (model dependent), and their orbital semi-major axes around the host star are ∼ 0.11 AU and 0.26 AU, respectively.As Neptune has a mass of ∼ 17  ⊕ , this leads us to the conclusion that both planets are likely warm Neptunes.
The phase folded RVs for both planets and both models are displayed in Fig. 4, and the corner plot corresponding to the posteriors of both models is presented in Fig. 5.The comparison of the posteriors of both favoured models shows that -despite taking very different approaches to modelling stellar activity -the planetary parameters do agree within their 1 credible intervals.It is also interesting to note how narrow (and likely wildly over-optimistic) the posterior distribution for the stellar rotation period inferred by the harmonic model is.
To demonstrate the goodness-of-fit, the RV predictions of our two favoured models in comparison to the data are shown in two parts of the data set in Fig. 6, one being in the high-and one in the low-activity subsets of observations.

DISCUSSION
We presented a comprehensive study of the HD 13808 RV data provided by the HARPS spectrograph.In particular, we tested and compared multiple approaches to stellar activity modelling.
We found that Bayesian evidence comparison is, by itself, not sufficient for deciding on the number of planets present: when stellar activity is not modelled adequately, extra 'planets' might be favoured to account for residual variability.Only our GP model favoured two planets (both of which we believe to be extremely secure detections) over a higher number of planets under all circumstances we considered, including modelling only a high-activity subset of observations; the parameters for the additional 'planets' suggested by other models were highly variable and model-sensitive, and/or corresponded to likely activity variability (cf.periods in Table 9, particularly for the high-activity subset).However, some of our more complex stellar activity models such as the multi-harmonic and  stellar activity models did seem to describe the activity variability better than more simplistic parametric models, based both on the decrease in residual RV rms scatter and on the decrease in evidence for a three-planet solution vs. the two-planet solution.In short, model comparison using Bayesian evidence is only as good as the models considered.
Reassuringly, when modelling the full set of observations, the RV signals of HD 13808 b and HD 13808 c were without exception detected at a > 10 level with every activity model we considered, with the planets' characteristics remarkably consistent across all models.Taking all the observations into account, our two favoured models found the two planets HD 13808 b and HD 13808 c to orbit their host star with periods of ∼ 14.2 and 53.8 d at distances of ∼ 0.11 and 0.26 AU, with minimum masses of 11 and 10  ⊕ , respectively.The planetary parameters inferred from the low-activity observations alone are also mostly within agreement with those derived from the full data set.
The influence of stellar activity on the planets' characteristics was evident, however, when considering the parameters yielded by the models applied to the high-activity data subset where the log  HK value of the star was > −4.90.For example, even under a single model ('Kepler high activity + BIS  rot 3 rd harm.'), the MAP RV semi-amplitude of HD 13808 b decreases from 3.71 m s −1 to 2.19 m s −1 when moving from the low-to high-activity subset of observations, although these two semi-amplitudes are inconsistent at only a ∼ 2 level.Using the GP -the only model reliably to detect HD 13808 c in the high-activity subset of observations -the MAP RV semi-amplitude of HD 13808 c decreases from 1.78 m s −1 to 1.04 m s −1 , although in this case at least the semi-amplitudes do remain consistent within ∼ 1.
To conclude, even though in our case the planetary parameter values were relatively insensitive to the choice of stellar activity model when analysing the full data set (246 measurements), cau-Table 13.MAP values and ±1 credible intervals of the planetary parameters of the two planets in the HD 13808 system for the two favoured models: 'Gaussian process' and 'Kepler + BIS  rot 3 rd harm.'  tion would be essential on smaller data sets, or indeed when trying to characterise weaker planetary signals.In particular, the changes in the inferred planet parameters when considering only the highactivity subset of our data underscores the importance of stellar activity modelling even for a host star nominally classified as 'inactive'.

Figure 1 .
Figure 1.A typical run with P C. The bottom graph shows the logarithmic likelihood over the course of the run, with log  corresponding to the prior volume i.e. at the start of the run log  is at its highest value.Two phase transitions are clearly visible as 'knees' in the logL-logX curve.On the top, the live-point distributions of two parameters (  and√  cos(   )) within a given likelihood contour (indicated by the red vertical lines) are shown, demonstrating the transitions between different phases.Note that if the precision criterion is not set low enough, one runs the risk of stopping nested sampling at too early a log  value (i.e. at one of the lower knees), and this is indeed what we observed in preliminary tests.

Figure 2 .
Figure 2. Measurement of the log  HK activity index on the series of spectra of HD 13808.The black dashed line represents the median value of −4.90 which was used as the condition for separating the data set into two parts.
where peaks in the RV periodograms are visible at ∼ 15 and ∼ 55 d.Sets of strong peaks are visible in the BGLS periodograms of the activity indicators log  HK , BIS and FWHM between 30 − 40 d and a few between 25 − 30 d.Strong peaks are evident at ∼ 19 and ∼ 32 d in the log  HK high activity periodogram, with peaks at ∼ 12, ∼ 19 and ∼ 34 d in the BIS periodogram.The FWHM periodograms show peaks in the region of ∼ 35 d.

Figure 3 .
Figure 3. BGLS periodograms of, from top to bottom, the RV, BIS, FWHM and log  HK measurements and of a constant function (CF) i.e. a dataset with the same timestamps and RV errors but assuming constant RV.The colours red and black indicate the high activity and low activity phases.The dark grey shaded box represents the range of the putative rotation period with its second (grey) and third harmonic (light grey).

Figure 4 .
Figure 4. Phase folded RVs of HD 13808 for the planets HD 13808 b (left) and HD 13808 c (right) i.e. the leftover RV when the other planetary signal, the stellar activity model and the polynomial offset is subtracted.The black dots are the averaged RV values computed over bins of 36 degrees in mean longitude and the error bars correspond to the rms of the mean, while the gray points are the observed RV values with their error.The red line represents the calculated curve of the planet with the MAP planetary parameter values of each model.

Figure 6 .
Figure 6.Model prediction (red) with posterior predictive uncertainty (shaded blue) and their residuals for 'Kepler + BIS  rot 3 rd harm.' and 'Gaussian process' models; RV measurements and corresponding uncertainties appear in black.Two representative subsets of observations are shown, one from the high-activity (left) and low-activity (right) phases of the star.

Table 1 .
Description of the HARPS RV data of HD 13808.

Table 3 .
Priors for the Keplerian, RV trend and additive white noise ('jitter') parameters used in all of our models and analyses.* We also require for the periods to be sorted i.e.   <   etc. ** We also require that the corresponding eccentricity e < 1. *** This prior applies to all polynomial terms.

Table 4 .
Priors for the parameters of the various stellar activity models described in Section 4. *   is the radius of the star and   the estimated error.+  is the observed standard deviation of the FWHM.

Table 5 .
Jeffreys' scale, as introduced by Jeffreys (1983), for interpreting differences in Bayesian evidences.This scale interprets the strength of evidence favouring one model over another.

Table 6 .
An overview of the HARPS RV data subsets of HD 13808 referred to as low and high activity when log  HK is < −4.90 and > −4.90, respectively.

Table 7 .
HD 13808: Overview of the models used for the study of the HD 13808.rot 1 st harm.Kepler + simultaneous fit of 1 st harmonic of  rot to the BIS and RVs; Section 4.2 + BIS  rot 2 nd harm.Kepler + simultaneous fit of 1 st and 2 nd harmonic of  rot to the BIS and RVs; Section 4.2 + BIS  rot 3 rd harm.Kepler + simultaneous fit of 1 st , 2 nd and 3 rd harmonic of  rot to the BIS and RVs; Section 4.2

Table 8 .
HD 13808: Relative Bayesian evidences for different number of planets for each model and their scatter -as well as the residual RV (O−C) rms and median absolute deviation (MAD) -for the 2-planet model (or 1-planet model, if favoured) computed using the respective MAP values (see Table10).The model with two planetary signals for each model type was set to be zero.The errors correspond to the standard error on the mean evidence across multiple runs or to the highest individual run error provided by P C , whichever of the two is greater.

Table 9 .
HD 13808: Periods for a putative third planet found in the various runs by the different models, their corresponding semi-amplitudes and the uncertainty (per Table8) in the model evidences.Values in bold occur in at least 2 of the runs.

Table 10 .
HD 13808: Comparison of the orbital parameters semi-amplitude  , period  and eccentricity  for the two planets HD 13808 b and HD 13808 c for each model.The parameters for the 2-planet models of the run with the highest evidence from all runs are shown here.Note that not all models favour the 2-planet case.

Table 11 .
HD 13808: Rotation periods determined by the different models.

Table 12 .
MAP values and ±1 credible intervals for the parameters of the activity-related parameters of the GP model.