## Abstract

We perform a detailed analysis of the latest HARPS and Keck radial velocity data for the planet-hosting red dwarf GJ 581, which attracted a lot of attention in recent time. We show that these data contain important correlated noise component (‘red noise’) with the correlation time-scale of the order of 10 d. This red noise imposes a lot of misleading effects while we work in the traditional white-noise model. To eliminate these misleading effects, we propose a maximum-likelihood algorithm equipped by an extended model of the noise structure. We treat the red noise as a Gaussian random process with an exponentially decaying correlation function.

Using this method we prove that (i) planets *b* and *c* do exist in this system, since they can be independently detected in the HARPS and Keck data, and regardless of the assumed noise models; (ii) planet *e* can also be confirmed independently by both the data sets, although to reveal it in the Keck data it is mandatory to take the red noise into account; (iii) the recently announced putative planets *f* and *g* are likely just illusions of the red noise; (iv) the reality of the planet candidate GJ 581 *d* is questionable, because it cannot be detected from the Keck data, and its statistical significance in the HARPS data (as well as in the combined data set) drops to a marginal level of ∼2σ, when the red noise is taken into account.

Therefore, the current data for GJ 581 really support the existence of no more than four (or maybe even only three) orbiting exoplanets. The planet candidate GJ 581 *d* requests serious observational verification.

## INTRODUCTION

The multi-planet extrasolar system hosted by the red dwarf GJ 581 has attracted a lot of interest in the past few years. The concise history of planet detections for this system is as follows. The first planet *b*, having an orbital period of 5.37 d and a minimum mass of ∼ 15 M_{⊕}, was reported by Bonfils et al. (2005). The two subsequent super-Earths *c* (with the orbital period of 12.9 d and the minimum mass of ∼ 5 M_{⊕}) and *d* (with the originally reported orbital period of 82 d later corrected to 67 d and the current minimum mass estimate of 6 M_{⊕}) were discovered by Udry et al. (2007). Further, Mayor et al. (2009) reported the detection of the smallest exoplanet known at that time, GJ 581 *e*, orbiting the host star each 3.15 d and having the minimum mass of only approximately 2 M_{⊕}. All these discoveries were done on the basis of the radial velocity (RV) data obtained with the famous HARPS spectrograph.

Later, the Keck planet-search team got involved. Vogt et al. (2010) performed an analysis of the combined HARPS and Keck measurements and claimed the detection of two more planets in the system, *f* and *g*, orbiting the host star each 433 and 36.6 d (and having minimum masses of ∼7 and ∼ 3 M_{⊕}). The last planet *g* is remarkable because it appears to reside in the middle of the predicted habitable zone for this star. However, the reality of these two planets represents a subject of serious debates in the recent time. Gregory (2011) remained uncertain about the existence of these planets, based on his very detailed Bayesian analysis of the joint (Mayor et al. 2009) and (Vogt et al. 2010) data sets. Tadeu dos Santos et al. (2012) basically agreed with this conclusion, finding from the same combined data set that the detection confidence probabilities for these two planets are 96 per cent for planet *g* and 98 per cent for planet *f*. These values are too high to be just neglected, but they are simultaneously too low to claim a robust detection. Forveille et al. (2011) claim in a recent preprint that newer HARPS data do not support the existence of any planets beyond the four-planet model. Finally, in a very recent paper (Vogt, Butler & Haghighipour 2012) the authors assert, on the basis of the HARPS data from Forveille et al. (2011), that with the false-alarm probability of ∼4 per cent an extra 32-d planet should exist in this system, beyond the four-planet model.

Summarizing these investigations, we must admit that the reality of the last detected planets is rather controversial. This uncertainty probably comes from some mysterious interference between the HARPS and Keck data. Indeed, it follows from e.g. Gregory (2011) and Tadeu dos Santos et al. (2012) that Keck data alone do not allow us to detect more than only *two* planets *b* and *c*; all other planets seem to fall beyond the detection power of this time series. Newest HARPS data alone allow for the robust detection of *four* planets (from *b* to *e*) and do not really support the existence of the planets *f* and *g*. However, some additional variations can be still detected when both the data sets are joined, and it is rather uncomfortable just to ignore them.

In this paper, we present an attempt to find a solution of this mystery. Our main idea comes from our previous work (Baluev 2011), where we analysed available RV data for another planet-hosting red dwarf GJ 876, and found that these data contain significant correlated noise component, also called as ‘red noise’. Traditional statistical methods assume that the measurement errors are statistically independent, implying that their frequency power spectrum is flat (thus the noise is ‘white’). As we have shown in Baluev (2011), both HARPS and Keck RV measurements of GJ 876 demonstrate non-white power spectra with a clearly visible excess at longer periods, and this non-whiteness is statistically significant. We found that a similar picture is often seen in the periodograms of the GJ 581 data (see e.g. a lot of periodograms plotted by Tadeu dos Santos et al. 2012). All this motivated us to investigate how the red noise could affect the derived orbital configuration of this system.

The structure of this paper is as follows. In Section 2 we discuss the common undesired effects that the red noise might impose, and demonstrate how it reveals itself in the GJ 581 RV data. In Section 3 we present a maximum-likelihood algorithm that can perform a reduction of the red noise, based on its full modelling. In Section 4 we perform a detailed analysis of the latest RV data for GJ 581 taken from Vogt et al. (2010) and Forveille et al. (2011). We show how in the particular case of GJ 581 the red noise creates fake RV variations, as well as hides the true ones. We also give two best-fitting orbital solutions for this system that take the red noise into account. In Section 5, we discuss the reality of the putative fifth and sixth planets, based on our RV data analysis. In Section 6, we discuss what global consequences the red noise implies for the past exoplanetary data-analysis works and what it requests from the future ones.

## RED NOISE AS A MISLEADING AGENT

The routinely used methods of astrostatistics are designed to deal with the data containing uncorrelated noise. Such noise is also called white because its frequency spectrum is flat: its periodograms demonstrate approximately the same mean level when averaged over different frequency segments of the same length.

In practice, however, the white-noise approximation may be poor. In particular, the noise in photometric observations of exoplanetary transits is routinely red (Pont, Zucker & Queloz 2006). In the RV planet searches, it is also known that the RV noise is not necessarily white because it may demonstrate smaller level when averaged over larger time-scales (e.g. O’Toole, Tinney & Jones 2008). However, for the RV case this issue basically appears as rather dark stuff with no routinely working practical solution known so far.

Potential impact of the red noise on the results of the data analysis may be huge. The correlated data usually carry smaller amount of information, as if their number was smaller. Therefore, when our data contain correlated noise, various statistical uncertainties are typically larger than we obtain based on the traditional white-noise models. It is the first effect imposed by the red noise. Another, possibly even more important effect, appears due to the non-uniform frequency spectrum of the red noise. Basically, the red noise is able to generate fake periodicities that can be mistakenly ‘detected’ by the white-noise algorithms. This is demonstrated in Fig. 1, where the simulated example of a correlated noise looks like a bit noisy mixture of some illusive periodic signals. Moreover, these fake periodicities may obscure real variations, keeping them undetected until some data-analysis tool that is aware of the noise correlateness is applied.

In the case of GJ 581, the white-noise model of the RV data is definitely inadequate. As we can see from Fig. 2, the periodograms of the residual noise that remains after elimination of the compound RV signal of four planets demonstrate clear excess of the power at low frequencies (≲0.1 d^{−1}), a symmetric excess around 1 d period (emerging due to a strong diurnal aliasing), and a depression in the middle of the segment. It is important that this power excess does not concentrate in any well-defined discrete peaks; instead it is spread smoothly in a continuous frequency band. The both periodograms for the HARPS and Keck data demonstrate a similar smoothed shape, although the positions of individual high peaks have little in common (meaning that all relevant periodicities are not real). We may note that this picture looks very similar to the one we have already seen for the GJ 876 case (Baluev 2011).

The RV noise correlation can be also revealed in the time domain. This is easy to do for the HARPS data thanks to the following favouring factors.

- (i)
All HARPS measurements were done at almost the same sidereal time, implying that the time separation between two arbitrary data points is usually very close to an integer number of sidereal days.

- (ii)
There are a lot (up to ∼100) of HARPS RV pairs that have a small time separation of only one or a few sidereal days.

We run the following procedure. Given some integer *n* from 1 to 30, we collect all pairs of the HARPS measurements that are separated by *n* sidereal days. For each such pair, we evaluate the difference of the four-planet RV residuals corresponding to the observations involved in this pair. Finally, for each group of the RV pairs with a given time separation we evaluate the sample variance of this RV difference. How this variance can help us to detect red noise? Assuming that the variance of the residual RV noise has some constant value of σ^{2} (which is not too far from the truth) and its autocorrelation function is *R*(Δ*t*), the variance of the above-mentioned RV difference should be 2σ^{2}[1 − *R*(Δ*t*)]. Therefore, the graph of this variance should basically represent a rescaled upside-down view of the noise correlation function. We show this plot in Fig. 3. We can see a clear growing trend before the time separation of 10 d, and a saturation beyond this point. Basically, the HARPS RV measurements have better relative precision at short time-scales of up to ∼10 d, while at longer time separations they show larger random scatter.

Unfortunately, the distribution of the Keck data points is not regular enough, and application of a similar procedure to the Keck data set was not informative.

In this section, we limit ourselves by demonstration only, leaving the rigorous determination of the red noise significance for further sections. However, it is already clear that we may obtain trustable results concerning the GJ 581 planetary system only if we utilize some method of data analysis that can properly deal with correlated noise.

## MAXIMUM-LIKELIHOOD REDUCTION OF THE RED NOISE

The method that we propose for the analysis of the data polluted by correlated noise represents a generalization of the maximum-likelihood approach described in Baluev (2009), which was already used in Baluev (2011). The main idea of the method is to construct some suitable correlational model of the noise in the RV data and then, based on this model and on the Keplerian model of the RV curve, apply the maximum-likelihood algorithm to estimate the parameters of both the models and the relevant goodness of fit.

Thus, we should first choose some realistic and simultaneously simple model of such correlated noise. We assume that this noise is a Gaussian random process with some known correlation function. This means that the full vector of our *N* RV measurements *x*_{i}, taken at the timings *t*_{i}, should follow a multivariate Gaussian distribution. The mean of this distribution is equal to the RV curve model $$\mu (t_i,{\bf \theta} )$$, and the relevant variance–covariance matrix is $$\bf{V}(\boldsymbol p)$$, where the vectors $${\bf \theta}$$ and $$\boldsymbol p$$ contain some free parameters to be estimated from the data. The corresponding log-likelihood function may be expressed as

Maximizing equation (1) over $${\bf \theta}$$ and $$\boldsymbol p$$, we obtain the best-fitting estimations of these parameters (the point where the maximum is attained). The value of the maximum likelihood itself may further be used, for instance, in the likelihood ratio test comparing two different data models.

As we explain in Baluev (2009), in practice it might be useful to replace the true likelihood function (1) by a modified version

*N*is not so large.

Note that the generalized model (1) differs from the one used in Baluev (2009) in the matrix $$\bf{V}$$, which is no longer diagonal. However, the general theory of maximum-likelihood estimations is basically the same for both the cases. For example, to find the covariance matrix of the maximum-likelihood estimations, we should first calculate the quadratic Taylor approximation of the function $$\ln \mathcal {L}({\bf \xi} )$$. From equation (1), we can easily derive the relevant gradient

Considering together equations (1)–(4), we can write down the following quadratic approximation:

Expansion (5) allows us to approximate the point $${\bf \xi} ^*$$, where the maximum is achieved, as $${\bf \xi} ^* \simeq \hat{{\bf \xi} }+ \bf{F}^{-1} \boldsymbol g$$. Since the relation $$\mathop {\rm Var}\nolimits \boldsymbol g = \bf{F}$$ holds true, the variance–covariance matrix of our estimations $${\bf \xi} ^*$$ has the same asymptotic representation for large *N* as in the uncorrelated case:

The numerical non-linear maximization of equation (1) or (2) can be performed by means of the Levenberg–Marquardt algorithm. Note that the simplified widespread version of this algorithm that minimizes a sum-of-squares function (as implemented e.g. in the minpack package) is unsuitable here, because it relies on certain relationships between the gradient and Hessian matrix, which are invalid in our case. We need a general variant of the Levenberg–Marquardt algorithm (e.g. Bard 1974) that can maximize an arbitrary non-linear target function and can deal separately with the gradient and the Hessian. Note that it is handy to approximate the Hessian as $$-\bf{F}$$ due to expansion (5). It is useful because $$\bf{F}$$ is always positive definite. Besides, a lot of care is needed to optimize the calculational performance, since formulae (3)–(7) require very computation-greedy operations with large matrices and vectors. We give some tips concerning this issue in Appendix A.

We only need to detail the last thing, namely the model of the noise covariance matrix $$\bf{V}$$. In this paper, we consider three main noise models.

- (i)
*White-noise model.*The matrix $$\bf{V}$$ is diagonal; the diagonal elements represent the total variances of individual RV measurements and are equal to the sum of the instrumental part (the square of the stated measurement uncertainty) and the RV jitter. The RV jitter is different for different instruments. This model was considered in Baluev (2009). - (ii)
*Shared red-noise model.*In addition to the white-noise components, we add to $$\bf{V}$$ the red-noise covariance matrix $$\sigma _{\rm red}^2 \bf{R}(\tau )$$, where the elements of $$\bf{R}$$ are defined via some guessed noise correlation function ρ(*x*) as*R*_{ij}(τ) = ρ[(*t*_{j}−*t*_{i})/τ]. We chose ρ(*x*) = e^{− |x|}. This noise model infers that the red noise belongs to the star, while the spectrographs generate only the white noise. - (iii)
*Separated red-noise model.*It is similar to the previous case, but the parameters σ_{red}and τ are different for different instruments (HARPS and Keck). The cross-correlation between HARPS and Keck measurements is set to zero. This model infers that the red noise belongs to the spectrographs, and not to the star.

## GJ 581 data analysis

### Preliminary investigation

The main goal of this subsection is to estimate the validity of various statistical methods in the case of GJ 581 RV data analysis. We have two data sets at our disposal: the 240 HARPS and 121 Keck/HIRES RV measurements published in Forveille et al. (2011) and Vogt et al. (2010), respectively. First of all, we provide four-planet white-noise fit in Table 1 that was obtained by means of the likelihood function maximization as described in Baluev (2008b, 2009, 2011).

Planetary parameters | ||||
---|---|---|---|---|

Planet b | Planet c | Planet d | Planet e | |

P (d) | 5.368 585(79) | 12.9175(19) | 66.616(79) | 3.149 22(18) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.58(16) | 3.26(16) | 1.95(17) | 1.79(16) |

e | 0.021(13) | 0.053(48) | 0.259(83) | 0.164(89) |

ω (°) | 334(35) | 145(53) | 342(18) | 156(31) |

λ (°) | 142.93(76) | 106.5(3.0) | 144.9(5.2) | 63.3(5.4) |

M sin i (M_{⊕}) | 15.78(20) | 5.48(27) | 5.65(49) | 1.88(17) |

a (AU) | 0.040 611 87(40) | 0.072 9244(70) | 0.217 68(17) | 0.028 4573(11) |

Data series and common fit parameters | ||||

HARPS | Keck | |||

c_{0} (m s^{−1}) | −9205.96(13) | 1.08(27) | ||

σ_{white} (m s^{−1}) | 1.50(11) | 2.45(23) | ||

rms (m s^{−1}) | 1.96 | 2.82 | ||

$$\tilde{l}$$ (m s^{−1}) | 2.25 |

Planetary parameters | ||||
---|---|---|---|---|

Planet b | Planet c | Planet d | Planet e | |

P (d) | 5.368 585(79) | 12.9175(19) | 66.616(79) | 3.149 22(18) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.58(16) | 3.26(16) | 1.95(17) | 1.79(16) |

e | 0.021(13) | 0.053(48) | 0.259(83) | 0.164(89) |

ω (°) | 334(35) | 145(53) | 342(18) | 156(31) |

λ (°) | 142.93(76) | 106.5(3.0) | 144.9(5.2) | 63.3(5.4) |

M sin i (M_{⊕}) | 15.78(20) | 5.48(27) | 5.65(49) | 1.88(17) |

a (AU) | 0.040 611 87(40) | 0.072 9244(70) | 0.217 68(17) | 0.028 4573(11) |

Data series and common fit parameters | ||||

HARPS | Keck | |||

c_{0} (m s^{−1}) | −9205.96(13) | 1.08(27) | ||

σ_{white} (m s^{−1}) | 1.50(11) | 2.45(23) | ||

rms (m s^{−1}) | 1.96 | 2.82 | ||

$$\tilde{l}$$ (m s^{−1}) | 2.25 |

The *M* sin *i* and *a* values were derived assuming the mass of the star *M*_{☆} = 0.31 M_{⊙}, which was used e.g. by Forveille et al. (2011). The uncertainty of *M*_{☆} was not included in the uncertainties of the derived values. The mean longitudes λ refer to the epoch JD 245 4500. The goodness of fit $$\tilde{l}$$ was derived from the maximum of $$\ln \tilde{\mathcal {L}}$$ as explained in Baluev (2009).

The maximum-likelihood approach infers a set of well-known classical theoretical results and methods concerning the maximum-likelihood estimations that were established under a condition that the number of observations tends to infinity. We will call them collectively as Asymptotic Maximum-Likelihood Estimation Theory (hereafter AMLET). A fraction of them is described in Baluev (2009), bearing in mind an application to the exoplanetary RV curve fitting task.

Note that AMLET tools are sometimes called as frequentist ones, especially in the works employing the Bayesian analysis, where such opposing highlights the advantages of the Bayesian methods. Such terminology actually hides a misconception: AMLET represents, basically, a common limit to which both frequentist and Bayesian approaches converge, when the number of observations tends to infinity. Therefore, it would be incorrect to equate AMLET and the general frequentist approach in the statistics, since most of the classical AMLET propositions can be easily reinterpreted from the Bayesian point of view, while the genuine frequentist methods in their general form (e.g. Lehman 1959) are more complicated and theoretically justified than AMLET.

It is often argued that AMLET tools are often not applicable to the exoplanetary RV data analysis, especially for multi-planet systems which involve very complicated non-linear RV signal models. However, this is rarely verified with concrete practical cases. In this paper, we undertook an attempt to rigorously assess the applicability of AMLET tools to the case of GJ 581. Since this research involves a vast amount of various numerical simulations and rather boring statistical stuff, which is not related directly to the GJ 581 system itself, we do not describe these results in the main body of the paper. An interested reader may find the detailed discussion in Appendix C. Here we only provide a short summary of our investigation.

- (i)
In the case of the GJ 581 RV data, the parametric confidence regions and false-alarm probabilities, obtained using AMLET, work well for the white-noise and shared red-noise four-planet models, but are unsuitable for the separated red-noise model. This indicates that the latter model is overparametrized and must be used with caution.

- (ii)
When analysing the HARPS and Keck data independently from each other, we may use AMLET for the HARPS time series, but not for the Keck one. Actually, the Keck data set is the main thing that makes AMLET unusable with the separated red-noise RV model.

- (iii)
We should avoid using the bootstrap simulation (Section B2) for any of our models, because it works in an unexpected and misleading manner when a parametrized noise is involved. However, we may safely use the usual Monte Carlo simulation (Section B1) instead, since the RV data show absolutely no hints of any non-Gaussianity, which is the main fear of people preferring the bootstrap.

^{1} - (iv)
In most of the cases, we have no need for complicated and computationally greedy techniques like the Bayesian analysis or the genuine frequentist methods. For the white-noise and shared red-noise model, we can just use AMLET with no fear. For the separated red-noise model AMLET is poor, but with this model we obtain little serious results that would need a deep verification.

### Assessing the significance of the red noise

Let us first assess rigorously the significance of the noise non-whiteness. We can do this using the method described in Baluev (2011). Using a variation of the Monte Carlo algorithm B1 from Appendix B, we generated a bunch of 1000 simulated residual periodograms assuming the white model of the noise and the four-planet model of the RV curve. Thus, these periodograms are evaluated using exactly the same algorithm as in Fig. 2, but based on simulated uncorrelated data. Each simulated periodogram is then smoothed, also exactly as in Fig. 2. Based on this set we can derive the distribution of single values of the smoothed periodograms (for an arbitrary frequency), and also the distribution of the associated max/min ratio, which characterizes the degree of non-whiteness of the simulated spectrum. It is important that this procedure does not require us to make any assumptions about the red-noise correlation function.

The results are shown in Fig. 4. We can see that among 1000 Monte Carlo trials none could reproduce the same large max/min ratios that we observed for the smoothed periodogram of the real data. Therefore, the non-whiteness in the RV noise of these real data has very high significance (>99.9 per cent). For comparison, we also plot the periodograms of the real data on the basis of the red-noise models, utilizing the algorithm of Section 3. We can see that these frequency spectra are already consistent with the white-noise statistical limits, possibly except for the case of the Keck periodogram with shared red-noise model, where the residual non-whiteness has the significance of 2.6σ. Therefore, this model may be incapable of complete elimination of the red noise, probably because the red noise has somewhat different characteristics between the HARPS and Keck data sets. We think, however, that this shared red-noise model suits our practical needs at best, since the residual frequency spectrum non-whiteness is anyway a few times smaller than it was for the original white-noise model. The separated red-noise noise model can do apparently more impressive reduction of the correlated noise, but, as we have discussed above, this model is statistically poor.

We must emphasize that all significance levels shown in Fig. 4, including the one of 2.6σ for the Keck shared red-noise periodogram, were derived from white-noise simulations. To be fully honest, we ought to evaluate these levels assuming a matching noise model for each, but it appeared too computationally demanding for the red-noise cases. We expect that the correct significances for red-noise periodograms may be somewhat smaller because such periodograms usually showed systematically higher significance levels. It may even appear that the mentioned residual Keck non-whiteness for the shared red-noise case is eventually insignificant. However, this does not alter our conclusion that the white-noise model is inadequate; the relevant significance is based on the correct (matching) noise model and is well above the 3σ level, both for the HARPS and Keck data.

### Detailed analysis of the RV data

#### HARPS data alone

Let us start from the analysis of 240 HARPS RV measurements. In Fig. 5 we show a series of the residual periodograms, starting from the two-planet base model (planets *b* and *c*). In the case of the white-noise model, we are able to subsequently extract all four planets from these periodograms. We can see that all four peaks show very high significance. However, in the last residual periodogram, corresponding to the case when all four peaks are already extracted, we can see a typical red noise picture: an amorphous set of peaks at the periods longer than ∼10 d, a diurnal alias of this frequency band close to 1 d period and a depression in the middle part of the period range.

We can see that our maximum-likelihood algorithm suppressed the effect of the red noise, as expected. However, together with the red noise, our procedure dramatically suppressed the planet *d* peak at 67 d. This is not very surprising on itself, since this peak is in the range where the red noise is ruling. However, the final significance of this peak becomes marginal – only 1.8σ – making us rather sceptical about the reality of this planet.

Speaking shortly, although we cannot claim that the planet *d* RV signature is insignificant, we must admit that its detection is not robust and requires a serious verification. The relevant RV variation may be caused by correlated RV noise in the data, and does not necessarily reflect a Doppler wobble induced by a real planet.

#### Keck data alone

Let us now deal with 121 Keck/HIRES RV measurements in the similar manner. The relevant periodograms are shown in Fig. 6. First, we can see that now we cannot detect more than two planets *b* and *c*, if we use the traditional white-noise model. This conclusion is in agreement with previous studies (Gregory 2011; Tadeu dos Santos et al. 2012), but it is still rather disappointing, because the Keck RV precision is pretty competitive in comparison with the HARPS one. Secondly, in the Keck data we again see clear hints of the red noise, which created a fake periodicity at approximately 27 d. In this case, our red-noise removing algorithm does its job even better than anyone could expect; it did not just killed all fake red-noise peaks, but it also reveals the 3.1-d peak belonging to the planet *e*! This proves that our algorithm does not just suppresses the apparent RV variations, lifting up the detection thresholds. It is working in a much more intelligent manner; in certain frequency ranges it may basically *improve* the effective RV precision, revealing the true periodicities that the red noise tries to hide.

The period of the newly discovered variation in the Keck data is in excellent agreement with the planet *e* period obtained from the HARPS data. Such coincidence is hardly casual. However, what is its rigorous significance? The answer to this question is not obvious because we cannot use AMLET for the Keck data set alone. Monte Carlo simulations (algorithm B1) suggest that the significance associated with this peak of the Keck periodogram is only 1.2σ. Therefore, if we tried to *detect* this planet from the Keck data alone, with absolutely no reference to the HARPS data, we would have to admit that this peak is statistically insignificant. Basically, it is a luck that no other comparable peak appeared in the top right-hand periodogram of Fig. 6.

However, we need just to *confirm* the planet *e* existence on the basis of the Keck data set, not to *detect* it anew. This places much more mild limits. We have no need to simulate the periodogram in its whole period range as shown in Fig. 6. We already know the probable planet *e* parameters from the HARPS data with good precision, including e.g. its orbital period. Now we only need to confirm that RV noise could not generate so large peak as we can see in the Keck data just *in a narrow vicinity* around this known period. It is not a big deal if we find a noisy peak at some faraway frequency, where the real planet *e* definitely cannot reside. Such *confirmational* significance will be much larger than the *detectional* one because the probability for the RV noise to occasionally generate a large peak inside a narrow frequency segment is much smaller than inside a wide one. This becomes obvious if we look at the periodogram's false-alarm probability approximation from Baluev (2008a):

*W*. For the whole range of periods from 1 d to infinity we have

*W*≈ 3500 (for the Keck data set taken alone), but when dealing with confirmational false-alarm probabilities, we need to consider

*W*∼ 1 at most, since the ±1σ uncertainty range of the expected planet

*e*period corresponds to

*W*∼ 0.1. Therefore, the confirmational false-alarm probability might be a few thousand times smaller than the detectional one.

However, we must remember that the red-noise model infers strong non-linearity when used with Keck data alone, as we have already discussed above. We cannot use any asymptotic methods in this case, and our numerical simulations must be more intricate than the plain Monte Carlo scheme B1. We can no longer rely on a single simulation series based on a single vector of nominal ‘true’ model parameters, as we have done before. Instead, we should honour some representative parametric domain. Since the true values may be anywhere in this domain, we must generate many distinct simulation series according to the Monte Carlo scheme from Section B1, each time assuming different vector for the mock ‘true’ parameters. The detailed step-by-step description of this algorithm is given in Section B3.

During this calculation, we first generated a sequence of the trial ‘true’ sets of parameters, assuming the two-planet red-noise Keck RV data model (the first-level simulation). This first-level simulation is not intended to have big statistical meaning, we need it just to obtain a set of points covering some more or less wide domain around the best-fitting two-planet solution. After that, for each of the generated parametric vectors, we run the plain Monte Carlo simulation (algorithm B1, 500 000 random trials in each simulation). In each random trial of this second-level simulation we generate an artificial Keck data set using the model ‘two-planet RV variation + correlated RV noise’ (without planet *e*). Then for each such data set we evaluate the relevant residual periodogram exactly in the same way as in the top right-hand panel of Fig. 6, where we used the real Keck data. For each such periodogram, we find the maximum in the narrow period range 3.145-3.153 d (*W* ≈ 3, centred at the nominal planet *e* period). Based on such Monte Carlo sequence, we count the fraction of simulated periodograms that demonstrated the same or larger maximum peak as the one that we have seen for the real data. This fraction represents the desired confirmational false-alarm probability of the planet *e*, as inferred by the adopted ‘true’ two-planet model. These false-alarm probabilities can be further transformed to the normal (‘*n*–σ’) significance levels that we use throughout this paper.

We plot the results of these simulations in Fig. 7. From this graph, we can see that the simulated confirmational significance practically does not depend on the adopted parameters of the base two-planet model, even when these parameters deviate from the nominal estimation by more than 2σ. Actually, most of the scatter around the nominal level is likely due to the statistical uncertainty of the second-level Monte Carlo. If we generated more trials for each point in Fig. 7, this scatter would probably shrink further. Basically, this figure does not reveal any real dependence on the true parameters (at least in the parametric domain that we were able to fill in the first-level Monte Carlo). The rigorous frequentist significance level is given by the minimum ordinate among all simulated points, while the nominal level corresponds to the point located at zero abscissa. These values do not differ much and can be rounded to 3σ both. This means that Keck data can robustly confirm the existence of the planet GJ 581 *e* RV signal.

Our algorithm does not reveal any hint of the planet *d* in the Keck data. The corresponding residual periodogram calculated after extraction of the three planets *b*, *c* and *e* looks like a perfect white noise with no peak attracting any attention. Maybe this planet *d* does not actually exist, and the variation that we have seen in the HARPS data is some systematic effect or just some random fluctuation? Unfortunately, Keck data alone cannot supply an independent answer to this question. Note that there is a pretty large difference in the significance of the planet *e*, as inferred by the HARPS and Keck data. After extrapolation of this difference to the planet *d*, we realize that currently available Keck data are just unable to reveal it, no matter exists it or not.

#### Combined data set

Now let us proceed to the joint analysis of the HARPS and Keck data. We consider three noise models that we have already introduced: the white-noise model, the shared red-noise model and the separated red-noise model.

We show a series of the relevant periodograms in Fig. 8. We can see that while the planets *b*, *c* and *e* can be robustly extracted from these data, the planet *d* still remains rather controversial, because its significance drops to only ∼2.1σ or even below, if a red-noise model is used. We feel that such significance level is not enough to claim a robust detection of an exoplanet because this significance is model-dependent. The planet candidate GJ 581 *d* should be probably reclassified as a controversial one.

Finally, we present two fits of the GJ 581 planetary system, obtained using the shared red-noise model. In the first fit (Table 2) we provide a three-planet configuration with planets *b*, *c* and *e*, while the second one (Table 3) also involves planet *d*.

Planetary parameters | |||
---|---|---|---|

Planet b | Planet c | Planet e | |

P (d) | 5.368 589(68) | 12.9186(21) | 3.149 05(16) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.65(13) | 3.20(17) | 1.69(13) |

e | 0.022(10) | 0.040(44) | 0.195(73) |

ω (°) | 38(25) | 122(61) | 38(24) |

λ (°) | 142.89(64) | 102.5(3.4) | 62.2(4.6) |

M sin i (M_{⊕}) | 15.86(16) | 5.38(28) | 1.77(13) |

a (AU) | 0.040 611 89(34) | 0.072 9286(80) | 0.028 456 21(95) |

Data series and common fit parameters | |||

HARPS | Keck | ||

c_{0} (m s^{−1}) | −9206.01(28) | 0.86(34) | |

σ_{white} (m s^{−1}) | 0.33(40) | 1.18(28) | |

σ_{red} (m s^{−1}) | 2.05(19) | ||

τ_{red} (d) | 11.0(3.4) | ||

rms (m s^{−1}) | 2.43 | 2.96 | |

$$\tilde{l}$$ (m s^{−1}) | 2.10 |

Planetary parameters | |||
---|---|---|---|

Planet b | Planet c | Planet e | |

P (d) | 5.368 589(68) | 12.9186(21) | 3.149 05(16) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.65(13) | 3.20(17) | 1.69(13) |

e | 0.022(10) | 0.040(44) | 0.195(73) |

ω (°) | 38(25) | 122(61) | 38(24) |

λ (°) | 142.89(64) | 102.5(3.4) | 62.2(4.6) |

M sin i (M_{⊕}) | 15.86(16) | 5.38(28) | 1.77(13) |

a (AU) | 0.040 611 89(34) | 0.072 9286(80) | 0.028 456 21(95) |

Data series and common fit parameters | |||

HARPS | Keck | ||

c_{0} (m s^{−1}) | −9206.01(28) | 0.86(34) | |

σ_{white} (m s^{−1}) | 0.33(40) | 1.18(28) | |

σ_{red} (m s^{−1}) | 2.05(19) | ||

τ_{red} (d) | 11.0(3.4) | ||

rms (m s^{−1}) | 2.43 | 2.96 | |

$$\tilde{l}$$ (m s^{−1}) | 2.10 |

Notes are the same as given in Table 1.

Planetary parameters | ||||
---|---|---|---|---|

Planet b | Planet c | Planet d | Planet e | |

P (d) | 5.368 603(66) | 12.9198(20) | 66.56(12) | 3.149 05(15) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.62(13) | 3.18(16) | 1.81(28) | 1.73(13) |

e | 0.022(10) | 0.039(43) | 0.28(11) | 0.167(71) |

ω (°) | 32(26) | 138(62) | 329(26) | 41(26) |

λ (°) | 143.06(63) | 105.4(3.2) | 143.0(8.8) | 62.5(4.5) |

M sin i (M_{⊕}) | 15.83(16) | 5.35(26) | 5.247(80) | 1.81(13) |

a (AU) | 0.040 611 96(33) | 0.072 9331(75) | 0.217 54(25) | 0.028 456 22(93) |

Data series and common fit parameters | ||||

HARPS | Keck | |||

c_{0} (m s^{−1}) | −9205.96(22) | 0.79(31) | ||

σ_{white} (m s^{−1}) | 0.50(27) | 1.24(26) | ||

σ_{red} (m s^{−1}) | 1.65(16) | |||

τ_{red} (d) | 9.3(3.1) | |||

rms (m s^{−1}) | 1.99 | 2.79 | ||

$$\tilde{l}$$ (m s^{−1}) | 2.01 |

Planetary parameters | ||||
---|---|---|---|---|

Planet b | Planet c | Planet d | Planet e | |

P (d) | 5.368 603(66) | 12.9198(20) | 66.56(12) | 3.149 05(15) |

$$\tilde{K}=K\sqrt{1-e^2}$$ (m s^{−1}) | 12.62(13) | 3.18(16) | 1.81(28) | 1.73(13) |

e | 0.022(10) | 0.039(43) | 0.28(11) | 0.167(71) |

ω (°) | 32(26) | 138(62) | 329(26) | 41(26) |

λ (°) | 143.06(63) | 105.4(3.2) | 143.0(8.8) | 62.5(4.5) |

M sin i (M_{⊕}) | 15.83(16) | 5.35(26) | 5.247(80) | 1.81(13) |

a (AU) | 0.040 611 96(33) | 0.072 9331(75) | 0.217 54(25) | 0.028 456 22(93) |

Data series and common fit parameters | ||||

HARPS | Keck | |||

c_{0} (m s^{−1}) | −9205.96(22) | 0.79(31) | ||

σ_{white} (m s^{−1}) | 0.50(27) | 1.24(26) | ||

σ_{red} (m s^{−1}) | 1.65(16) | |||

τ_{red} (d) | 9.3(3.1) | |||

rms (m s^{−1}) | 1.99 | 2.79 | ||

$$\tilde{l}$$ (m s^{−1}) | 2.01 |

Notes are the same as given in Table 1.

## Reality of the putative fifth and sixth planets

Various authors have already raised a lot of doubts concerning the existence of the planets GJ 581 *f* and GJ 581 *g*, since their announcement in Vogt et al. (2010). Basically, it appears that their parameters may change significantly with each data update and, besides, they do not demonstrate enough resistance with respect to various rather subjective choices: methods of data analysis, model details, etc. For example, even for the traditional white-noise model, we do not see in our work any hints of these planets, as they were originally reported in Vogt et al. (2010). The four-planet residual periodograms plotted here separately for the HARPS, Keck and joint data sets demonstrate different patterns of individual marginally significant maximum peaks, although they all reveal a similar average power excess at the periods longer than 10 d.

Our advanced analysis suggests that the red-noise models leave no room for any RV variations beyond the four-planet models: neither in the HARPS, nor in the Keck, nor in the combined data set. Our red-noise models just absorbed all RV signals interpreted previously as the hints of the planets *f* and *g*. However, maybe it is just a question of interpretation? There are a lot of reasons explaining why the measurements taken at different epoch may be statistically correlated with each other. The genuine RV noise caused by some astrophysical activity of the star, for instance, is a likely explanation, but other sources are still possible. For example, a sinusoidal periodic oscillation would produce the same periodic contribution in the compound correlation function of the data. This might produce a picture very similar to what we see e.g. in Fig. 3 until we extend this graph to larger time lags (which appears practically impossible because the necessary regularity of the RV data is lost at large time separations). Then why this red noise cannot be solely induced by some extra planets? Indeed, when working only in the time domain (correlation functions) it may appear practically impossible to disentangle the red noise from deterministic long- and moderate-term variations. However, in our work we rely on the frequency domain (power spectra), where this task is not that hard.

Of course, it remains theoretically possible that this red noise represents just a mixture of many *true* periodic variations. Actually, the same logic can be applied to the usual white noise equally well. For practice, this interpretation really changes nothing; we still unable to model these hypothetical variations separately and we have to find some compound model for them. It is the case where we should just apply the Occam's razor. The really meaningful question is: Can we efficiently eliminate the red noise by means of only a few periodic harmonics? We find that in the case of GJ 581 the answer is no. We started to honestly select the highest peaks from the last white-noise residual periodogram in Fig. 8 and subsequently remove the relevant periodicities from the residuals, one by one, each time plotting a new residual periodogram. We stopped after two such extra periodicities because we did not achieve any impressive progress (the power excesses for periods more than 10 d and around the diurnal period were still there), and the residual periodogram contained already *three* moderate peaks at the periods of a few tens of days. They had the same marginal formal significance as the two previous ones. Clearly, only the red-noise models allow us to purge out all these chameleonic peaks at once.

Based on our investigation, we conclude that the putative planets GJ 581 *f* and GJ 581 *g* likely do not exist, and the relevant RV signatures belong to the correlated noise. We do not say that we reject the existence of absolutely any planet in this system beyond the four known ones. However, even if some more planets exist in this system, they are not detectable in the present data, and they are unrelated to the peaks that we can see in the periodograms plotted with the use of only the white-noise model. The current RV data really support the existence of no more than four planets orbiting GJ 581.

## CONCLUSIONS AND DISCUSSION

Although this work was focused on the concrete exoplanetary system of GJ 581, we believe that our results may have much more general meaning. The cases of GJ 876 discussed in Baluev (2011) and of GJ 581 are not unique, and it seems that there are more examples of planet-hosting stars demonstrating clear signs of the correlated noise in their publicly available RV data. Actually, we believe that the red RV noise might be a rather common phenomenon.

This imposes bad as well as good consequences. The bad thing is that we have to use more complicated and computationally slow methods of the analysis. Without that any analysis of such data cannot be reliable. Unfortunately, the functional shape of the correlation function has not yet been investigated well, so we have to make some rather voluntaristic guesses about it. Moreover, we need to accumulate rather large RV time series before the noise correlation parameters become fittable. In the case of GJ 581, for instance, the size of the HARPS data set is large enough, while the Keck data (half of the HARPS data in number) are not so good in this concern.

The good thing is that the method of the red-noise modelling does not just suppress the *phantomic* RV variations together with anything else on its way; it is capable to reveal *true* variations that were hidden beyond the fog of correlated noise. This means that our approach allows to increase, basically, the effective precision of the RV measurements, at least in the short-period domain. This offers a way to partly overcome the barrier set by the intrinsic RV jitter of the star, at least for some stars and in certain frequency ranges. In particular, we believe that our method may decrease exoplanetary detection threshold for active and/or subgiant stars, where the RV jitter contribution dominates in the total error budget, making it impossible to obtain the RV precision of better than 10–100 m s^{−1}.

It is not yet fully clear, what is the source of the RV noise correlation. It may be caused by some long-living spots or other details on the star's visible surface, or may be a result of aggregation of various instrumental effects unrelated to the star itself. Our statistical analysis cannot offer a definite answer to this question. We believe that in general both reasons may be responsible. However, in the particular cases of GJ 581 and GJ 876 the first interpretation seems more likely, because in both these cases the red RV noise was detected in *two* independent data sets (HARPS and Keck).

In view of this topic, we cannot leave aside a very recent report on the detection of a ‘hot Earth’ orbiting α Cen B (Dumusque et al. 2012). This discovery would not be possible without careful elimination of various effects of stellar activity polluting the periodograms at low frequencies. Basically, this team also applied some method of red-noise modelling, based on its correlation with a spectral activity index. Although this method is different from our approach (we are just unable to use the approach of Dumusque et al. 2012 because all what we have is the public RVs of GJ 581 and not its raw spectra), we may note that the situation with the planet of α Cen B looks quite similar to the case of GJ 581 *e*. Anyway, the work by Dumusque et al. (2012) puts more emphasis of our conclusion that the future of the Doppler planet searches lies in the careful treatment of the measured RV noise.

What concerns the particular case of the GJ 581 planetary system, we were able to obtain several important results. First, we have shown that its RV data do not really support the existence of any extra planet beyond the four-planet model. All apparent periodicities in these data, that were previously interpreted as extra planets, are illusions caused by RV noise correlations. Moreover, even the planet GJ 581 *d* becomes doubtful. Its significance in the HARPS data does not even reach 2σ, and in the Keck data it is not detectable at all. In the combined data set it may reach a more honourable level of 2.1σ but this level is still model-dependent. We admit that this planet is not rejected in our work and still remains rather probable, but despite this fact we insist that it should be reclassified as a controversial one, until more data (preferably independent of HARPS) confirm it. The 2σ significance level is not enough to claim a robust planet detection, if it was not confirmed by independent observations. In contrast, we were able to robustly confirm the planet GJ 581 *e*. This planet can be revealed in the HARPS and Keck RV data independently, although to find its signal in the Keck time series it is mandatory to use a red-noise data model. The confirmation significance of this planet in the Keck data is ∼3σ, although the same detection significance is only ∼1σ.

At last, we would like to discuss briefly the results concerning the GJ 581 planetary system presented by Tuomi & Jenkins (2012) in a recent preprint emerged during refereeing of our paper. In this work, the authors also used an autocorrelated noise model with exponentially decaying correlation, similarly to our work. Their main conclusion differing from ours is that using the Bayesian approach they can confirm the existence of the fourth planet, *d*, with larger statistical evidence. We must admit that this did not dissolve our criticism concerning this planet. We are of the opinion that such a difference between our conclusions was induced by different subjective prior distributions adopted in Tuomi & Jenkins (2012), rather than on the objective information hidden in the RV data. Tuomi & Jenkins (2012) assumed the prior probability density function for planetary orbital periods as ∝ 1/*P*, meaning a flat distribution in ln *P*. It is easy to show that the periodogram approach used in our paper (which basically belongs to the family of AMLET tools) implicitly assumes a prior which is roughly flat in the *frequency*, implying the period distribution law ∝1/*P*^{2}. Clearly, even if we leave behind the scene the discussion of what prior is better, the first prior dramatically shifts any data analysis in favour of longer period signals, so there is no surprise that Tuomi & Jenkins (2012) obtained more significance for GJ 581 *d* than we did. This difference is mostly subjective, however; we could reach roughly the same effect simply by dealing with renormalized periodograms, multiplying them by the value of the period in the abscissa.

What concerns the question which prior is better, the answer depends on the purpose of the analysis and other conditions. If our goal was to analyse a large array of data sets for many targets and to maximize the outcome of this massive analysis, then we would use the prior 1/*P*, because we know that the period distribution of real exoplanets is much more uniform in the log-period/log-frequency scale than in the linear-frequency one. However, here we deal with a planetary system that has high individual importance. For an individual data set, the prior 1/*P* may induce some uncomfortable side effects. In particular, it favours to the selection of longer period aliases instead of the true periods. A practical example is provided by the two-planet system of HD 208487. The Bayesian analysis assigns a 909 d period for the second planet of this system (Gregory 2007), but in the usual periodogram of the residuals there are actually *two* main peaks at 27 and ∼1000 d (Wright et al. 2007). These two peaks can be interpreted as monthly aliases of each other, and offer almost the same best-fitting residuals rms. However, the 1/*P* prior used in Gregory (2007) just suppressed the shorter period peak, allowing it to slip away from the view. In the case of GJ 581, we must be even more careful with any pumping of long-period signals because in such a manner we can easily pump up some red noise variation that our model was somehow unable to efficiently eliminate.

Anyway, returning to the planet GJ 581 *d*, it is clear that its significance is not very impressive with the present RV data and, in addition, this significance is highly model-dependent. Under such circumstances, we prefer to follow the general principle of the frequentist approach in the statistics, which means that the selection between all realistic solutions must be done on the worst case basis. GJ 581 *d* needs to be confirmed by some independent non-HARPS data.

This work was supported by the Russian Academy of Sciences research programme ‘Non-stationary phenomena in the objects of the Universe’ and by the Russian Foundation for Basic Research, project No. 12-02-31119. I would like to thank Professor P. C. Gregory for providing an insightful review of this manuscript.

## REFERENCES

### APPENDIX A: Some tips on the maximum-likelihood algorithm

#### On the inverse of the noise covariance matrix

Possibly the fastest way to invert a real symmetric positive-definite matrix like $$\bf{V}$$ is to use the famous Cholesky decomposition $$\bf{V} = \bf{L} \bf{L}^{\rm T}$$, where $$\bf{L}$$ is a lower triangular matrix. It requires approximately *N*^{3}/6 floating-point multiplications. Moreover, having the matrix $$\bf{L}$$ at our disposal, we usually do not need to evaluate the inverse $$\bf{V}^{-1}$$ at all, because in reality we usually need to evaluate only the matrix–vector combinations like $$\bf{L}^{-1} \boldsymbol x$$ or $$\boldsymbol x^{\rm T} \bf{L}^{-1}$$, which obviously can be obtained using the forward or back substitution. Moreover, these operations require practically the same CPU time as the direct multiplication by the pre-calculated inverse $$\bf{V}^{-1}$$ would do.

However, there is a single occurrence where the evaluation via direct matrix inversion seems the fastest way possible. It is the expression $$\mathop {\rm Tr}\nolimits (\bf{V}^{-1} \mathrm{\partial} \bf{V}/\mathrm{\partial} p_i)$$ in equation (3). It seems that this task requires ∼*N*^{3} floating-point operations (FLOPs) anyway. However, since this expression must be evaluated for many parameters *p*_{i}, it is faster to pre-calculate $$\bf{V}^{-1}$$ based on the Cholesky decomposition (this inversion requires *N*^{3}/3 floating-point multiplications) and then to evaluate the necessary matrix trace directly. Note that the trace of a matrix product $$\mathop {\rm Tr}\nolimits \bf{A} \bf{B}$$ is equal just to the scalar product of the matrices involved, ∑_{i, j}*A*_{ij}*B*_{ij}, and thus requires only ∼*N*^{2} operations.

#### Avoiding matrix multiplications

Let us consider the calculation of $$F_{p_i p_j}$$ in equation (7). Even if we have pre-calculated the inverse $$\bf{V}^{-1}$$ or use some decomposition of $$\bf{V}$$ that makes its inversion easy, expression (7) involves a few matrix multiplications of very large (*N* × *N*) matrices, which require $$\mathcal {O}(N^3)$$ FLOPs. This is unsatisfactory and motivates us to find another representation for $$F_{p_i p_j}$$ that could be evaluated more quickly. Using the general identity $$\boldsymbol x^{\rm T} \bf{A} \boldsymbol x = \mathop {\rm Tr}\nolimits (\bf{A} \boldsymbol x \boldsymbol x^{\rm T})$$ and the relation $$\mathbb {E}(\boldsymbol r \boldsymbol r^{\rm T}) = \bf{V}$$ (the equality is exact because we should take the mathematical expectation at the true values of the parameters), we can transform the first of the expressions (4) as follows:

Since we use the Fisher matrix just as a handy approximation of the Hessian with the same relative error of $$\mathcal {O}(1/\sqrt{N})$$, the approximation in equation (A2) is no worse. However, equation (A2) can be evaluated without the use of matrix multiplications. Already having the Cholesky decomposition $$\bf{V} = \bf{L} \bf{L}^{\rm T}$$, we can easily perform the first matrix–vector multiplication $$\bf{V}^{-1} \boldsymbol r = (\bf{L}^{-1})^{\rm T} \bf{L}^{-1} \boldsymbol r$$ (only ∼*N*^{2} FLOPs). After that we need to perform yet a few matrix–vector multiplications to form $$\mathrm{\partial} \bf{V}/\mathrm{\partial} p_i \bf{V}^{-1} \boldsymbol r$$ for all *i* (also ∼*N*^{2} FLOPs). Then we need to multiply these vectors by $$(\bf{L}^{-1})^{\rm T}$$ from the left-hand side (again ∼*N*^{2} FLOPs) and evaluate the pairwise scalar products of the resulting vectors to obtain $$F_{p_i p_j}$$ for all *i* and *j* (only ∼*N* FLOPs). This optimized procedure requires only $$\mathcal {O}(N^2)$$ FLOPs instead of the original $$\mathcal {O}(N^3)$$ FLOPs.

#### Profit from the matrices sparseness

Since the noise correlation time-scale that we are dealing with is about 10 d, while the total time span has the order of 10^{3} d, most of the elements in the matrix $$\bf{V}$$ are close to zero. Therefore, it is highly desirable to set small off-diagonal elements to zero exactly, and apply some algorithm of Cholesky decomposition and/or inversion tuned for sparse matrices. It is important that the first thing must be done in a smooth manner: we cannot just abruptly set all correlations below some small level to zero, since for the sake of smooth work of our Levenberg–Marquardt algorithm, we need to have continuously varying gradient and Hessian of the likelihood function. We reach this goal by means of the following smooth replacement in the argument of the noise correlation function ρ^{′}(*x*) = ρ(*x*^{′}(*x*)), where *x*^{′} = *x*/[1 − (*x*/*x*_{0})^{2}] and *x*_{0} is such that ρ(*x*_{0}) is equal to some small value, e.g. 0.01. For *x* > *x*_{0} we set ρ^{′}(*x*) ≡ 0. After such modification, most of the elements in $$\bf{V}$$ become exact zeros. Interestingly, after that we noted a remarkable speed-up of various linear algebra calculations, even with no use of any specialized sparse-matrix algorithms. This indicates, probably, that modern CPUs execute various floating-point commands faster when one of the arguments is zero. With the use of algorithms tuned for sparse matrices, the performance increases even more dramatically. We unfortunately cannot give any reference or recommendation of any relevant software package, since the algorithms that we have used in this paper are developed by us.

### APPENDIX B: Monte Carlo simulation schemes used in the paper

#### Plain Monte Carlo assuming Gaussian noise

- (i)
First of all, select some mock ‘true’ values of the model parameters somewhere in the region of interest. We may select, for example, the nominal ones given in Table 1 for the white-noise model or analogous best-fitting values for the red-noise models, although such choice is not mandatory.

- (ii)
Given the chosen vector of ‘true’ parameters, evaluate the ‘true’ RV values and the compound RV errors variances (and also correlations for red-noise models).

- (iii)
Construct a simulated RV data set by means of adding to the evaluated RV curve the simulated Gaussian errors, generated on the basis of previously evaluated uncertainties and correlations.

- (iv)
Based on simulated data set, evaluate the value of the likelihood function at the true parameter values from step 1, and the maximum value of the likelihood function for this trial. Based on these two values, evaluate the modified likelihood ratio statistic $$\tilde{Z}$$ for this trial, which is defined in Baluev (2009).

- (v)
Save the newly generated value of $$\tilde{Z}$$, as well as the set of the simulated best-fitting parameters (when necessary), and return to step 3, if the desired number of trials has not been accumulated yet.

#### Bootstrap simulation

- (i)
Evaluate the best-fitting model and the resulting RV residual.

- (ii)
Apply random shuffling procedure separately to the HARPS and Keck sets of the residuals.

- (iii)
Evaluate the statistic $$\tilde{Z}$$ and best-fitting parameters in the same manner as in the plain Monte Carlo simulation.

- (iv)
Save the resulting value of $$\tilde{Z}$$ and parameters and return to step 2.

Note that the bootstrap simulation is meaningful only when it is used with a white-noise RV model, because random shuffling of the residuals basically destroys any correlational structure of the RV noise, which a red-noise model tries to deal with.

#### Genuinely frequentist Monte Carlo simulation

- (i)
Select an

*i*th trial point in the space of model parameters $${\bf \xi}$$ (or residing inside some given parametric domain). - (ii)
Run the algorithm B1 assuming that true parameters correspond to the selected point.

- (iii)
Save the simulated distribution $$P_i(\tilde{Z})$$ of the test statistic of interest ($$\tilde{Z}$$ in our case) and return to step 1.

- (iv)
When a sufficiently dense coverage of the mentioned in step 1 parametric domain is reached, evaluate the function $$P(\tilde{Z})=\min P_i(\tilde{Z})$$.

After that, the rigorous frequentist false-alarm probability associated with an *observed* value $$\tilde{Z}_*$$ (that was obtained using exactly the same models that were used during the simulation) can be calculated as $$1-P(\tilde{Z}_*)$$. This is a worst case assumption method, in other words. Note that if we would stand on the Bayesian ground, we would evaluate, basically, some weighted average of $$P_i(\tilde{Z})$$ instead of the minimum, and this would force us to assume some prior distribution of the parameters. Obviously, in the frequentist approach we need only to circle a parametric domain, since any prior density inside this domain does not play any role when we find the minimum.

### APPENDIX C: AMLET applicability to the GJ 581 case

Let us first freshen in brief a few practical things that AMLET includes.

- (i)
The asymptotically unbiased estimations of the model parameters are provided by the position of the maximum of the likelihood function.

- (ii)
Asymptotically, these estimations follow the multivariate Gaussian distribution with the covariance matrix expressed (again asymptotically) as the inverse of the Fisher information matrix, defined in equation (6).

- (iii)
To test some simple ‘null’ model against a more complicated alternative one (which encompasses the null hypothesis as a partial case), we need to construct the relevant likelihood ratio statistic, and evaluate the false-alarm probability associated with the null hypothesis rejection. The latter false-alarm probability can be found from the known asymptotic χ

^{2}distribution of the likelihood ratio logarithm. - (iv)
Consequently from the previous point, the multi-dimensional confidence regions for some set of model parameters are outlined as level curves (or level surfaces) of the likelihood function, considering it after maximization over the rest of free parameters. The value of the likelihood ratio corresponding to the global maximum and a given level curve yields the confidence probability of this level curve (again assuming the asymptotic χ

^{2}distribution for the logarithm of this ratio).

When the fit model is linear or well linearizable, AMLET is accurate already for relatively small number of observations. When the model non-linearity increases, the critical number of observations, after which AMLET becomes applicable, appears impractically large, so that for practical numbers AMLET offers poor precision. Since in our case of GJ 581 we deal with rather complicated non-linear model of the RV data, we would like to find out, which AMLET proposition we can be used safely under our concrete circumstances?

We may note that the AMLET proposition listed above demonstrate different resistance with respect to a change of variables (re-parametrization of the original model). For example, assume we have some model parameter *x*, and we make a replacement $$x \longmapsto y=1/x$$, treating the old primary fit parameter *x* as only a derived one. Even if the distribution of the estimation of *x* was exactly Gaussian, the analogous distribution of *y* may appear completely non-Gaussian. Therefore, rather formal action like a non-linear change of variables, which did not really alter the functional structure of our original model, was able to invalidate some of the AMLET propositions. However, some other propositions, namely those dealing with only maxima of the likelihood function, remained intact. Indeed, the maximum value of a function is invariable with respect to any change of the independent variables (at least if this change is a one-to-one mapping), so the quantities like the likelihood ratio statistic are invariable with respect to such re-parametrization. This phenomenon is sometimes called as exogenous and endogenous non-linearity. The exogenous non-linearity does not belong to the physics of the original task, and depends on human-controllable things like, for instance, the choice of the system of free parameters, time reference point, coordinate system, etc. The endogenous non-linearity represents an immanent property of the task and it cannot be eliminated by any such trick.

The endogenous part of non-linearity is the only thing that we really need to take care of, since anything beyond it is, basically, just a result of carelessly chosen parametrization. Since in this paper we mainly deal with the likelihood ratio test and its descendants, we need to check how precisely the asymptotic χ^{2} distribution can approximate the real distribution of the relevant likelihood ratio statistic. We consider three models to verify: the four-planet white-noise model, the four-planet shared red-noise model and the four-planet separated red-noise model. For each of these models, we perform the Monte Carlo simulation sequence B1 of Appendix B.

For the white-noise model, we also perform two bootstrap simulation series, which is a popular tool for exoplanetary RV data analysis works (e.g. Marcy et al. 2005) due to its resistance to possible non-Gaussian errors in the data. The first simulation is done according to the scheme B2 in Appendix B, while in the second bootstrap simulation we applied the same algorithm to a simulated data set with purely Gaussian noise (rather than to the real RV data set).

The results of these simulations are shown in Fig. C1. For the white-noise model, we find that the simulated distribution of $$\tilde{Z}$$ is in amazingly perfect agreement with the asymptotic χ^{2} approximation. For the shared red-noise model the agreement is still good. And only for the separated red-noise model AMLET offers poor precision.

It is rather unexpected that in the white-noise case bootstrap simulation disagrees both with the Monte Carlo and with the χ^{2} distribution. This disagreement does not indicate that the RV noise in the real data is non-Gaussian. Vice versa, two bootstrap curves lie very close to each other, meaning that real RV measurement errors are indistinguishable from Gaussian noise. This means that the bootstrap itself is basically an inadequate tool for our tasks.

Therefore, why bootstrap did not work in Fig. C1 as we expected? We find that the reason is contained in the RV jitter parameters. Investigating Fig. C2, we can see that while the usual Monte Carlo simulations are again in good agreement with AMLET confidence contours for the HARPS and Keck jitter, the results of the bootstrap are definitely wrong: they are biased and locked in an inadequately narrow region of the plane. The reason of this behaviour is clear: while we shuffle the best-fitting residuals, their scatter remains constant, and since the RV jitter is derived mainly from this scatter, such shuffling keeps both jitter estimations almost constant. This means that bootstrap cannot be applied to data models involving some parameters of the noise.

Now let us investigate the non-linearity of our RV models in a bit more depth. We consider 2D confidence regions for a few pairs of model parameters. For this goal, we select three pairs of parameters that demonstrate the largest mutual correlations, since such parameters are usually affected by stronger non-linearity effects (Baluev 2008b). As it turned out, all these pairs involve the mean longitude λ and the pericentre argument ω of one of the planets. The relevant asymptotic 2D confidence contours, constructed on the basis of our statistic $$\tilde{Z}$$, are shown in Fig. C3. Clearly, they have little common with ellipses that we would see for a well-linearizable model. However, this non-linearity has only exogenous nature and is caused, obviously, by small planetary eccentricities, which make the parameter ω poorly determinable. Under such circumstance we should better consider, instead of the pairs (λ, ω) the pairs (λ, *e* cos ω) and (λ, *e* sin ω), since the parameters *e* cos ω and *e* sin ω are more adequate than *e* and ω, when dealing with almost circular orbits.

Our conclusion, that the apparent non-linearity of these parameters is only exogenous, is confirmed by numerical simulations, which are in good agreement with the asymptotic confidence contours. The agreement is equally good for the bootstrap and for the pure Monte Carlo methods, which generate practically identical sets of points. This also confirms our previous conclusion that the RV data for GJ 581 do not show any detectable non-Gaussianity.

And the final question, why the separated red-noise model demonstrates so large deviation from AMLET’s χ^{2} distribution in Fig. C1? What is the source of this endogenous (and thus more important) non-linearity? Obviously, the reason is hidden in the RV noise model because we have already established that the RV curve model may produce only negligible endogenous non-linearity. To investigate this question in more depth, we performed the same Monte Carlo simulation treating the HARPS and Keck data entirely independently. We again assumed the separated red-noise model for these data sets. What concerns the RV curve model, for the HARPS data we adopted the four-planet one, while for the Keck data we considered the three-planet and two-planet models (with and without planet *e*, and with no planet *d*). After that we simulated the distribution of the statistic $$\tilde{Z}$$, and compared it with the relevant asymptotic χ^{2} distribution, the same as in Fig. C1. We obtained that for the HARPS data set the agreement is as good as for the shared red-noise model, while the Keck data set demonstrates bad things (Fig. C4).

Therefore, the main source making the separated red-noise model statistically poor is the Keck data set, which can provide only rather ill estimations of the RV noise parameters, when it is used without HARPS data.