Investigating stellar activity through eight years of Sun-as-a-star observations

Stellar magnetic activity induces both distortions and Doppler-shifts in the absorption line profiles of Sun-like stars. Those effects produce apparent radial velocity (RV) signals which greatly hamper the search for potentially habitable, Earth-like planets. In this work, we investigate these distortions in the Sun using cross-correlation functions (CCFs), derived from intensive monitoring with the high-precision spectrograph HARPS-N. We show that the RV signal arising from line-shape variations on time-scales associated with the Sun’s rotation and activity cycle can be robustly extracted from the data, reducing the RV dispersion by half. Once these have been corrected, activity-induced Doppler-shifts remain, that are modulated at the solar rotation period, and that are most effectively modelled in the time domain, using Gaussian Processes (GPs). Planet signatures are still best retrieved with multi-dimensonal GPs, when activity is jointly modelled from the raw RVs and indicators of the line width or of the Ca II H & K emission. After GP modelling, the residual RVs exhibit a dispersion of 0.6-0.8 m s − 1 , likely to be dominated by signals induced by super-granulation. Finally, we find that the statistical properties of the RVs evolve significantly over time, and that this evolution is primarily driven by sunspots, which control the smoothness of the signal. Such evolution, which reduces the sensitivity to long-period planet signatures, is no longer seen in the activity-induced Doppler-shifts, which is promising for long term RV monitoring surveys such as the Terra Hunting Experiment or the PLATO follow-up campaign.


INTRODUCTION
Doppler spectroscopy is one of the bedrocks of past, present and future exoplanetary science.As of 2024, it remains the preferred technique for confirming transiting planet candidates and the most prolific method for detecting non-transiting planets.It provides one of the most fundamental parameters of an exoplanet, its mass, crucial to constrain its inner composition (Mordasini et al. 2012) and to characterise its atmosphere (Batalha et al. 2019).Since the detection of 51 Pegasi b by Mayor & Queloz (1995), the radial velocity (RV) accuracy of optical high-resolution échelle spectrographs has dramatically increased, to the point where planets with RV semi-amplitudes below 1 m s −1 can be reliably detected (e.g.Faria et al. 2022;John et al. 2023).New-generation spectrographs like ESPRESSO (Pepe et al. 2021), EXPRES (Jurgenson et al. 2016), HARPS-3 (Thompson et al. 2016), NEID (Schwab et al. 2018) and KPF (Gibson et al. ★ E-mail: baptiste.klein@physics.ox.ac.uk 2016) have been designed to provide RVs with sub 0.3-m s −1 precision, suggesting that a detection of an Earth-like planet around a Sun analog may be possible in the coming decade.
The intrinsic variability of stars is currently the main limitation to the detection of low-mass and long-period planets with Doppler spectroscopy (see Fischer et al. 2016;Crass et al. 2021;Meunier 2021).Various processes related to photospheric flows (e.g.granulation, super-granulation, meridional circulation; Dumusque et al. 2011;Meunier et al. 2015;Cegla et al. 2018;Meunier & Lagrange 2019, 2020) and magnetic activity (e.g., active regions, flares, magnetic cycles; Saar & Donahue 1997;Desort et al. 2007;Meunier et al. 2010;Lovis et al. 2011;Gomes da Silva et al. 2012) distort stellar line profiles, giving rise to RV signals that hamper the search for planet signatures.The accurate modelling of these signals has become an extremely active area of research, with promising stateof-the-art methods currently under investigation (see Zhao et al. 2022, for a review).The mathematically tractable and flexible framework of Gaussian processes (Aigrain & Foreman-Mackey 2022) is now widely used to model the signals induced by stellar activity on the RVs and other activity indicators (e.g.Haywood et al. 2014;Rajpaul et al. 2015;Delisle et al. 2022;Barragán et al. 2022b).On the other hand, more and more innovative methods aim to correct line distortions directly from the cross-correlation functions (e.g.Collier Cameron et al. 2021;Klein et al. 2022;de Beurs et al. 2022) or from the spectra (e.g.Jones et al. 2017;Dumusque 2018;Rajpaul et al. 2020;Lienhard et al. 2022;Cretignier et al. 2022).Yet, none of the methods is currently able to filter stellar activity RV signals significantly below the symbolic barrier of ∼1 m s −1 (after averaging over signals induced by P-mode oscillations and granulation), and remain poorly sensitive to low-amplitude (≲0.5 m s −1 ) long-period (≳100 d) planet signals.
The Sun is our best laboratory to better understand stellar activity signals in high-resolution spectra.It is the only star whose surface can be resolved at high resolution and for which planet-induced RV variations can be integrally removed.Since 2015, the Sun has been intensively monitored with the High Accuracy Radial-velocity Planet Searcher in the Northern hemisphere (HARPS-N; Cosentino et al. 2012), making it possible to investigate the full contributions of active regions on high-resolution spectra (Milbourne et al. 2019;Thompson et al. 2020;Haywood et al. 2022;Lienhard et al. 2023).These highcadence high-precision solar spectra are ideally suited to assess the ability of a given method to filter stellar variability (Dumusque et al. 2021, Dumusque et al. in prep.).Additionally, continuous monitoring of the solar surface with, for example, the Helioseismic and Magnetic Imager (HMI) instrument onboard the Solar Dynamics Observatory (SDO; Pesnell et al. 2012) makes it possible to better understand the physical processes driving the observed disc-integrated RVs (e.g.Milbourne et al. 2019;Haywood et al. 2022, Rescigno et al., submitted).
In this study, we investigate how the HARPS-N solar activity RV signals can be modelled using information in both wavelength and time domains.In Sec. 2, we present our framework to separate a time series of cross-correlation functions (CCFs) into two components whose temporal variations are caused either by pure Doppler shifts or by shape-distortions.After describing the HARPS-N solar data set in Sec. 3, we investigate, in Sec. 4, the effects of filtering the shape-driven distortions on the solar RVs, and assess the planet detectability in the activity-filtered RVs through a large number of planet injection-recovery tests.In Sec. 5, we couple this framework with Gaussian processes and, notably, study the effect of combining newly-extracted shape-driven activity indicators with the HARPS-N solar RVs in a multi-dimensional Gaussian process framework.Finally, in Sec.6, we discuss the implications of our results in terms of activity modelling and planet search in the light of forthcoming long-term missions.

METHOD
Our approach builds on the Doppler-constrained principal component analysis (PCA) framework of Jones et al. (2017) and on the SCALPELS formalism introduced by Collier Cameron et al. (2021), borrowing elements from both to separate, as simply as possible, the components of the RV variations that can be explained by a pure Doppler shift from the rest.Using the same designations as Collier Cameron et al. (2021), we refer to these two components as the shift-and shapedriven velocities, respectively.

Framework
We start from CCFs produced by, for example, the HARPS-N data reduction software (see Sec. 3.1).We consider a time series of  CCFs spanning  velocity bins  = ( 1 ,...,  ) and the associated RV timeseries v obs (typically taken from the reduction pipeline).Our framework requires the computation of a non-noisy and activity-low reference CCF, C R , obtained by computing the inverse-variance-weighted mean of all the CCFs in the stellar rest frame 1 .We use the simple analytic framework of Jones et al. (2017) to decompose the time series of reference-subtracted CCFs, C, into a pure Doppler component, C D , and residuals, C S , whose variations are driven by distortions in the profile shape.Assuming that the Doppler shifts are small compared to the line width, we approximate the Doppler component by the first-order term of the Taylor expansion of C with respect to the CCF velocity bins,  (see Bouchy et al. 2001), matching to the spectrum's resolution.Therefore, noting D = [ R ′ ( 1 ) , ...,  R ′ (  )] ⊺ , the first derivative of the reference CCF with respect to the velocity bins, we have The time series of residuals, C S , contains all the shape distortions of the input CCFs, but are, in principle, insensitive to Doppler shifts. (2) We extract a RV time-series, v ∥ (called shape-driven RVs), by fitting a Gaussian function to each CCF of C ∥ + C R .Finally, we build a Doppler-driven (hereafter called shift-driven) RV time-series, v ⊥ , by subtracting v ∥ from the initial RV time-series v obs .
The number  of principal components used to build U must be chosen with caution.Basis vectors marginally contributing to the variance of the input CCF time-series are primarily composed of white noise.Thus, including them in the projection (Eq.2) will increase the dispersion of the input RVs and interfere with potential planetary signals.In practice, we adapted the method described in the Section 3 of Klein et al. (2024) to choose the number of components used in the projection.We independently generate about 10 matrices, matching C S in shape, but containing only white noise, drawn from the formal uncertainties on the CCF.We then apply SVD to each of these matrices and use the highest eigenvalue S max of all matrices as a proxy for noise-dominated components.When we apply SVD to C S , Table 1.Description of the parameters used to generate time series of CCFs with SOAP2.In each case, we give the number of spots (N spot ) and faculae (N fac ), as well as their corresponding stellar rotational phase  r and latitudes (Lat).The size of the active regions are set to 0.18% (faculae) and 0.1% (spot) with respect to the area of the visible hemisphere, in order to produce a RV signal of about 1 m s −1 .The last column gives the number N pl of pure Doppler signals injected in the CCFs.
components with eigenvalues greater than S max are assumed to enclose a significant fraction of correlated noise (e.g.activity-induced shape distortions) and are used in the basis definition.Conversely, components with eigenvalues smaller than S max are assumed to be dominated by white noise and discarded.Note that, in practice, the selected principal components used in the basis are consistent with the "elbow" heuristic method and account for an explained variance score typically greater than 95%.Assuming that U is noise free, all RV variations induced by distortions of the input CCFs are enclosed in v ∥ .Therefore, in principle, v ⊥ will be composed of planet signatures, Doppler shifts of unknown origin (e.g.residual stellar activity or instrument systematics) and white noise.As a word of caution, we stress that our theoretical framework is limited to spots and faculae, which do affect the shape of the CCFs.Other processes like super-granulation or meridional circulation may induce Doppler shifts without distorting the CCFs (see Meunier 2021, and references therein).

Application to synthetic data
Before applying the framework introduced in Sec.2.1 to real solar data, we conduct some simple tests on synthetic CCFs computed using the SOAP2 online tool (Dumusque et al. 2014;Boisse et al. 2012).This code allows the user to simulate the effects of spots (i.e., pure brightness inhomogeneities) and faculae (via the inhibition of the convective blueshift) on the absorption line profile.We consider three different configurations of active regions, listed in Tab. 1. Cases (1) and ( 2) correspond to a single spot and a single facula, at the equator of the star.For case (3), we consider a mix of four active regions, two spots and two faculae, of equal latitude (30 • ) and evenly spread in stellar longitude.Following Berdyugina (2005) and Meunier et al. (2010), the temperature of the spots is set to ∼650 K below the Sun's effective temperature.The size of the spots (resp.faculae) is set to 1% (resp.0.18%) of the visible stellar hemisphere, so that the semiamplitude of the corresponding RV signal is about 1 m s −1 , which is the typical order of magnitude of activity-induced RV variations for the Sun (e.g.Meunier et al. 2010).Finally, in case (4), we consider the case of a pure Doppler signal in absence of stellar activity, in order to double check that our method does not affect planetary signatures.The CCFs are generated by interpolating a non-spotted line profile computed with SOAP2, and shifting the entire profile according to the RV signature induced by a 1-m s −1 sine-wave with a period equal to the stellar rotation period to the data (note that no stellar activity is considered in this case).In all cases, we generate a time series of 200 CCFs evenly spanning one stellar rotation cycle, and add a white noise of 10 −5 with respect to the normalised continuum, which is of the same order of magnitude as the photon noise in HARPS-N solar CCFs.Note that a adding white noise significantly larger than the planet-induced Doppler shift is crucial in case (4).Since the Taylor expansion of Eq. 1 is only an approximation of the first derivative of the CCFs with respect to the wavelength, a small fraction of the planet signature will still be present in the residuals of Taylor expansion (i.e. the matrix C S in Sec.2.1).If the amplitude of this contribution is significantly weaker than the noise, it will naturally be discarded during the dimension reduction process (i.e. the SVD).For example, a 1-m s −1 Doppler signal induces residuals of about 10 −8 in C S , about 3-to-4 orders of magnitude lower than the typical white noise in HARPS-N spectra.A way to account for these variations, described in Collier Cameron et al. ( 2021), Wilson et al. (2022) and John et al. (2022), is to perform the dimensional reduction and search for planet-induced Doppler signatures simultaneously, which we apply in Sec.4.2.Note however that the planet signatures considered in this study are small enough not to affect significantly the shape-driven components of the Taylor expansion.
We then apply the framework of Sec.2.1 to the simulated data.We use the unspotted line profile generated by SOAP2 as reference (C R in Sec.2.1).A total of 3, 4 and 4 principal components is used to contruct the basis U in case (1), ( 2) and (3), respectively.As for case (4), eigenvectors are found to be dominated by white noise and are therefore discarded, which is expected as no shape distortion was introduced in the data set.In this last case, adding components in the matrix U introduces white noise in both v ∥ and v ⊥ , but the amplitude of the planet signature is only affected starting from  20 (∼3% of the explained variance in a data set dominated by white noise).The time series of noise-free reference-subtracted CCFs are shown in Fig. 1 for the cases listed in Tab. 1.The shape-and shift-driven RVs (i.e.resp.v ∥ and v ⊥ in Sec.2.1) extracted in each case are shown in Fig. 2. Unsurprisingly, the Doppler component is well recovered in case (4).We also find that RV signals induced by active regions are well filtered, with a suppression of 86, 94 and 78% of the stellar activity signals in cases (1), ( 2) and (3), respectively.However, even in the ideal case considered in this section, v ⊥ still exhibits activityinduced fluctuations of up to 0.1 m s −1 RMS in case (3), significantly larger than the level of white noise (∼0.01 m s −1 RMS).This is due to the fact that Doppler shifts and shape distortions are not purely orthogonal in the Taylor expansion and, therefore, cannot be entirely separated in Eq. 1.This can be understood by the fact that the first derivative of the CCF has no reason to be orthogonal to higher-order derivatives (especially with odd-order derivatives).By analysing the periodicities in v ∥ and v ⊥ for cases (1) and (2), we find that, in the case of a single spot, the shift-driven RVs, v ⊥ , still exhibits modulations at harmonics of the star's rotation period, whereas, in the case of a single facula, v ⊥ does not exhibit any explicit sign of periodicity.Finally, we note that including more SVD components in the basis projection affects only marginally the recovered RV signals and that none of the additional principal components exhibit significant periodic modulation.

Solar observations and data selection
Since 2015, disc-integrated solar spectra were collected at a 5min cadence with the stabilised cross-dispersed échelle spectrograph HARPS-N (Cosentino et al. 2012;Dumusque et al. 2015;Phillips et al. 2016;Collier Cameron et al. 2019;Dumusque et al. 2021), mounted at the 3.58-m Telescopio Nazionale Galileo (TNG) at Roque de Los Muchachos observatory (La Palma, Spain).These observations leverage the high-resolving power (R = 115 000) and the large spectral coverage in the  band (from 383 to 690 nm) of the instrument to achieve RV uncertainties lower than 0.5 m s −1 .Our input data set was collected between July 2015 and September 2023 and processed with the ESPRESSO data-reduction software (DRS v.2.3.5;Pepe et al. 2021), adapted to the HARPS-N solar telescope in Dumusque et al. (2021) and Dumusque et al., submitted.Since the Sun's surface is resolved, any partial obstruction of the solar disk by, for example, clouds in Earth's atmosphere, will likely induce substantial distortions in the CCFs, and, thus, large spurious RV deviations.In order to flag cloud-contaminated spectra, Collier Cameron et al. ( 2019) perform a daily linear fit to the apparent solar magnitude as a function of airmass (corresponding to the expected extinction law in optical observation conditions).The magnitude is given, as in Collier Cameron et al. (2019), by  = −5 log 10 S/N 60 , where S/N 60 is the signal-to-noise ratio (S/N) in the 60 th spectral order (corresponding to echelle order 98 which has a central wavelength of 625 nm).The deviation of each data point to the best-fitting extinction law is estimated through a Bayesian mixture model (Hogg et al. 2010), assigning a probability  that the spectrum is not contaminated by clouds.Following the recommendations of Dumusque et al. (2021) andCollier Cameron et al. (2021), we only consider spectra with  > 0.95 in the following analysis and airmass observations lower than 2.25.After rejecting such spectra, our initial data set of 147 741 solar spectra is reduced down to 104 573 (70.8%).The median S/N in the 60 th spectral order is 355 and the median RV uncertainty per spectrum is 0.28 m s −1 .The fraction of retained spectra are in agreement with Al Moulla et al. (2023), where in their Fig . A.2 it is shown that for most days there are only about 0-2 outliers per day among the dozens of daily HARPS-N solar spectra.By keeping only the spectra least affected by clouds, we are able to compute daily stacked spectra (see the following section) that are minimally affected by differential extinction.

Spectra post-processing
The analysis of solar activity with HARPS-N is made complex since Sun-as-a-star observations are different from real stellar observations.For instance, since the Earth is turning around the Sun and since the Earth is not perfectly aligned with the solar equator, a substantial change in the Sun's projected rotational velocity ( sin ) is observed, which introduces 1-year and half-a-year signals in the times series of the full-width at half maximum (FWHM) of the CCFs (Collier Cameron et al. 2019).Furthermore, the HARPS-N cryostat was changed on the 4 th October 2021, which also introduced an offset in several CCF moments since the focus of the instrument was realigned, slightly improving the resolution.Those instrumental interventions are expected to have RV contributions smaller than 0.5 m s −1 since no clear offset is observed by eye in either the solar RVs or standard stars RVs, but likely affected the spectra in some way.
In order to remove those signals and better isolate stellar activity contributions, we post-processed the S1D spectra time-series with YARARA (Cretignier et al. 2021).Note that we did not use any time-domain empirical corrections developed for the atmospheric extinction or  sin  (Collier Cameron et al. 2019;Dumusque et al. 2021), since those corrections were developed for the time-domain and their ability to perform in the wavelength domain is unclear.However, as shown below, YARARA itself is able to correct for them.YARARA is a post-processing methodology that aims to split the different contaminations coming from the instruments, such as ghosts, stitchings, interference patterns, ThAr bleeding, point spread function (PSF) defocus, from the tellurics and from the stellar activity directly in the spectra.
The main steps of the pipeline are to (i) daily stack the S1D spectra in the heliocentric rest-frame, (ii) normalise the continuum of the spectra with RASSINE (Cretignier et al. 2020b) and (iii) correct the continuum-normalised spectra with YARARA (Cretignier et al. 2021(Cretignier et al. , 2023)).The dataset after daily stacking the spectra and removing anomalous residuals among them2 consists of 1880 daily-stacked spectra.Note that we only use the first version of YARARA (hereafter "YV1"), which performs systematics correction in the wavelength domain as described in Cretignier et al. (2021).The pipeline was slightly upgrade on HARPS-N to better correct the cryostat change (see Appendix A).The final RVs were obtained from CCFs with a line list tailored for the Sun as described in Cretignier et al. (2020a), formally described as "Custom" in Bourrier et al. (2021) and "Kit-Cat" line list in Cretignier (2022).The CCF's FWHM, bisector velocity span, (V s ; Hatzes 1996;Queloz et al. 2001), and equivalent width (EW; product of FWHM and contrast), as well as the Mount Wilson S index (S HK ; Noyes et al. 1984) are also computed from the processed spectra.The CCF EW (i.e. the area of the CCF) is a known tracer of stellar temperature and metallicity (Malavolta et al. 2017).As the Sun's metallicity is expected to remain roughly constant during our observations, the CCF EW is a good tracer of global temperature changes in the solar photosphere (e.g.magnetic intensification and/or spot-induced flux variations Collier Cameron et al. 2019).On the other hand, V s , defined as the RV difference beween the center of a Gaussian fit to the whole CCF and the center of a parabola fit on the core of the CCF (i.e.within ±2.3 km s −1 ), is highly sensitive to velocity suppressions on the solar disk (mostly due to faculae).
Because YARARA is working with spectra interpolated on a 0.01-Å wavelength grid, the CCFs are slightly oversampled to velocity bins of 530 m s −1 , compared to the 820 m s −1 of the usual DRS (Dumusque et al. 2021).This is not critical, but it means that two consecutive velocity bins are not fully independent and share some covariance.Flux uncertainties on the CCF are artificially boosted by the square-root of the oversampling factor to cancel the oversampling effect.The natural output of the YARARA post-processing (spectra, CCFs or time-series) is usually already corrected from the stellar activity component.Since the purpose of this work is to study the solar activity signals, we introduced back the correction related to activity by YARARA to form an "YVA" dataset in order to follow the terminology introduced in Dalal et al. (prep).The time-series obtained with the YVA dataset are shown in Fig. 3.
Table 2. RMS of the time series of v obs , v ⊥ (shift-driven RVs) and v ∥ (shapedriven RVs), over different periods of time.In each case, the first four principal components are used to build the time series of shape-driven CCFs, C ∥ , from which v ∥ is extracted.

Name
Period

Planet-free case
We apply the framework defined in Sec.2.1 to the time series of daily-binned solar CCFs.Using the mean line profile as reference profile C R , we compute the time series of reference-subtracted CCFs (see the first panel of Fig. 4).From Eq. 1, we derive a time series of shift-driven (i.e.Doppler component in Sec.2.1) and shape-driven CCFs, shown in the middle and bottom panels of Fig. 4. As expected, the Doppler component only contains information on the first derivative of the CCFs, as evidenced by the fact that, at each epoch, the CCF variations are symmetrical with respect the center of the line (i.e.zero-velocity point).On the other hand, shape-driven residuals exhibit a more complex structure, non symmetrical with respect to the line center, suggesting that it probes higher-order derivatives of the CCF variations.We also double checked that the width and asymmetry of the shift-driven profiles remain constant over time.
We then apply SVD to the shape-driven CCFs and extract a basis of orthogonal vectors tracing the main non-Doppler-induced line profile variations.Applying the procedure described in Sec 2, we use the first four principal components in our reconstruction.As shown in Fig. 5, the first eigenvector  1 correlates strongly with v obs and usual activity indicators.In particular, we obtain a weighted Pearson correlation coefficient of 0.90 between  1 and S HK , which suggests that the variations of these two quantities are driven by similar processes (see Sec. 5 for a more detailed comparison).Higherorder SVD components tend to exhibit weaker correlations with CCF indicators, with the exception of  4 , which exhibits a relatively strong anti-correlation with the EW of the CCF, indicating that this indicator could be a good proxy of the photospheric temperature (see Collier Cameron et al. 2019).
We project the input CCFs onto the basis defined by components  1 to  4 , and derived shape-and shift-driven RVs (resp.v ∥ and v ⊥ ) using the method described in Sec.2.1.Both time-series are shown in Fig. 6 and their root-mean-square (RMS) deviations are given in Tab. 2. Starting from a dispersion of 2.1 m s −1 in the input data, the shape-and shift-driven RVs exhibit RMS dispersions of 1.8 and 1.1 m s −1 , respectively3 .In order to assess the effectiveness of the method for different levels of solar activity, we divide our input data set in the three seasons shown in Fig. 3. From 2015 to 2018 (Season 1), the Sun reaches the end of its activity cycle but still exhibits clear signs of activity.The Sun enters then a long period of activity minimum (Season 2, 2018(Season 2, -2021.8.8) which lasts until the end of 2021.With the start of Cycle 25 (Season 3, 2021(Season 3, .8-2024)), the Sun exhibits activity signals with significantly larger amplitude than in Season 1.Note that this division is motivated by two factors.Firstly, the cycleinduced evolution of the solar activity, evidenced, for example, by the amplitude of the fluctuations of the disk-integrated magnetic flux density (lower than 0.5 G in Season 2, according to SDO/HMI data extracted with SOLASTER; see Ervin et al. 2022) allows to to separate three different activity regimes.Secondly, important instrument maintenance operations, namely the change of Fabry-Pérot interferometer, between Seasons 1 and 2, and the refurbishment of the CCD camera, between Seasons 2 and 3, justify the definition of the three seasons adopted in this study.From Tab. 2, we note that our activityfiltering framework performs best when the Sun is more active, with a 44% reduction in RV RMS in Season 3, compared to only 7% in Season 2 (solar minimum).We also note that, in Season 1, our results are similar to those obtained with SCALPELS, on a similar data set (Collier Cameron et al. 2021).
As shown in Fig. 6, the shape-driven RVs seem to enclose most of the long-term RV variations, primarily driven by the magnetic cycle.This is further evidenced in the generalised Lomb-Scargle (GLS; Zechmeister & Kürster 2009) periodograms of the RV time series (see bottom curve on the top panel of Fig. 7), which are dominated by peaks at periods larger than ∼100 d.We note also a significant power near the Earth orbital period, also present in the periodogram of component  2 , attributable to residuals of the effects of Earth's orbital eccentricity and solar obliquity on the FWHM of the CCF, as described in Collier Cameron et al. (2019).These effects will not be observed in other stars.We also note that the power at the Sun's rotation period and harmonics remains strong in the shiftdriven RVs, despite a fraction of it being transferred to the shapedriven RVs.Surprisingly, components  3 and  4 exhibit a significant modulation of the Sun's rotation period and first harmonics and, thus, could potentially act as good proxies of quasi-periodic stellar activity signals (see Sec 5), although no strong correlation with CCF activity indicators is observed.

Effects on planet recovery
In order to assess the ability of our activity-filtering procedure to preserve planet RV signatures, we create 1 000 data sets from the solar HARPS-N observations, each containing a single planetary signal  directly injected in the CCF time-series with the same temporal sampling.As we assume the planet orbit to be circular, the corresponding RV signature,  p , as a function of the time, , is given by where the reference time,  0 , is set to BJD = 2 457 223.49054 (i.e., the time of our first observation), and where K inj , P orb and  p are respectively the semi-amplitude, orbital period and orbital phase of the injected signature.These three parameters are randomly drawn using log-uniform laws (for K inj and P orb ) and uniform laws (for  p ), as described in Tab. 3.
To inject the planet signature, we interpolate each CCF in the solar rest frame using a centered square-exponential Gaussian Process (GP; Rasmussen & Williams 2006; Aigrain & Foreman-Mackey 2022) with covariance function  between each pair of velocity bins   and   given by where   , is the uncertainty on the CCF flux associated with pixel , and  stands for the Kronecker delta.The two free hyperparameters of Eq. 4, namely the GP amplitude  and correlation length  e , are estimated by maximising the likelihood of the data, computed using the GEORGE python module (Ambikasaran et al. 2015).As a sanity check, we measured the RV and FWHM of each planet-injected CCF by fitting a Gaussian profile to it.The resulting RV/FWHM time-series differ by no more than 0.1  from their DRS-provided counterparts, confirming that this interpolation procedure only marginally affects the shape and position of the line profile.As a word of caution, we note that injecting the planet signature at the CCF level rather than at the spectrum level assumes that the Doppler shift is integrally preserved in the cross-correlation process.This assumption is motivated by recent studies demonstrating that reliable planet parameters were extracted from the CCF (e.g.John et al. 2023;de Beurs et al. 2024).
Our procedure to retrieve the injected planet signatures is similar of that used in Collier Cameron et al. ( 2021) and Wilson et al. (2022).For each data set, we jointly apply the dimensional reduction of Sec.2.1 and fit for a planet Doppler motion with Eq. 3. As detailed in the Appendix A of Wilson et al. (2022), projecting the observed RVs v obs onto the matrix P =  −U • U ⊺ , where  is the identity matrix and U the basis defined in Sec.2.1 (using four principal components), is equivalent to subtracting v ∥ from v obs .The likelihood L of the data given the model parameters  is then given by where  p is the planet RV model given by Eq. 3 and  is the number of data points.The covariance matrix  is assumed diagonal, with , where   is the formal RV uncertainty on the -th data point and   is a free parameter of the model to absorb RV variations that are not captured by Eq. 3 or the formal RV uncertainties (e.g.activity and granulation signals, instrumental stability).We assume that the planet orbital period and phase are known, leaving the planet RV semi-amplitude and the jitter term   as the only free parameters of the model.Their posterior probablity given the data is estimated, in the Bayesian framework, using the affine-invariant Markov Chain Monte Carlo (MCMC) process emcee (5 000 iterations of 100 chains; Foreman-Mackey et al. 2013).The best-fitting parameters and 1-uncertainties are estimated from the median and from the 16 th and 84 th percentiles of the chain after removing a burn-in period significantly longer than the auto-correlation time of the chain (about 100 iterations).As a reference, we also fit Eq. 3 to the input RV time-series, v obs , before filtering the shape-induced contributions.
The distribution of recovered RV semi-amplitudes, k est , is shown as a function of K inj in the top panel of Fig. 8 for K inj <1 m s −1 .In most cases, the recovered values of the planet RV semi-amplitude match their injected counterpart, with a slope of 1.03 ± 0.1.As shown in the bottom panel of Fig. 8, devivations from the identity are observed when the planet orbital period lies close to the Sun's rotation period or to the Earth orbital period (and first harmonic), which roughly corresponds to the location of the peaks in the periodogram of v ⊥ (see Fig. 7).No particular trend is observed between k est and the injected planet RV semi-amplitude or orbital phase.
Let us assume, conservatively, that a planet is detected if (i) k est differs from K inj by less than 1, and (ii) k est differs from 0 by at least 3.Using this criterion, about 45} of planets with RV semi-amplitudes lower than 1.0 m s −1 are detected from the activity-filtered RVs.This detection rate slowly decreases until K inj ≈ 0.2 m s −1 , where 29% of the injected planets are still recovered, and drops near zero for lower values of K inj .The planet detection rate decreases roughly linearly with the planet orbital period, and is about 30% and (resp.20%) for planet with orbital periods greater than 100 d (resp.300 d).The fraction of detected planets from v obs and v ⊥ is shown in the (K inj , P orb ) space in Fig. B1.We find that, on average, our sensitivity to planet signatures increases by about 20% from v obs to v ⊥ , and that this increase is the most spectacular for Earth-mass planets with orbital periods larger than 100 d, where the detection rate skyrockets by ∼50%.However, Earth-mass planets with orbital periods larger than 300 d remain mostly undetected in both time series, which is in line with the predictions of Meunier et al. (2023) 4 .
In order to assess how the sampling strategy affects our activityfiltering framework, we perform the planet injection-recovery for different numbers of randomly-selected data points, between 250 and 1750.For each sampling, we repeat the injection-recovery of the same 1 000 planet signals (using the same sampling for all planets).Fig. 9 shows the mean error K err on the RV semi-amplitude recovered with and without activity filtering.On average, K err is reduced by a factor ∼2 between v obs and v ⊥ .In both cases, the evolution of the detection rate with the number of points is well described by a square-root law (with reduced  2 of 1.0 and 1.1 for v obs and v ⊥ ; see Fig. 9).This suggests that the precision on k est is mostly driven by the number of data points used in the fit, despite the presence of super-granulation, activity residuals and instrumental systematics.

GAUSSIAN PROCESS REGRESSION
Our activity-filtering framework provides a new time series of RVs, v ⊥ , less dispersed than the input RVs, v obs .In the last section, we demonstrated that planet signatures in v obs would be largely preserved in v ⊥ .However, while the activity-filtering algorithm turns out to be efficient at filtering long-term activity effects, it leaves a significant rotationally-modulated component in v ⊥ , which also needs to be filtered out in order to improve the detectability of small planet signatures.Gaussian Process regression has become widely used to model quasi-periodic signals (Haywood et al. 2014;Rajpaul et al. 2015;Barragán et al. 2022b;Nicholson & Aigrain 2022;Aigrain & Foreman-Mackey 2022).In this section, we investigate how GPs can be used to model the stellar activity contributions in v obs and v ⊥ .In Sec.5.1, we first investigate how the statistical properties of

Modelling of usual activity proxies
We start by independently modelling the HARPS-N RV time series, v obs , as well as the time series of usual activity proxies (FWHM, V s , S HK ), using a one-dimensional (1D) GP with quasi-periodic covariance kernel  between each pair of epochs   and   given by where  GP is the GP period, which corresponds to the stellar rotation period (Angus et al. 2018;Nicholson & Aigrain 2022),  p is the inverse harmonic complexity and  e is the GP evolution time scale (already defined in Eq. 4).Each time series is modelled by six parameters: three hyper-parameters in the covariance matrix (Eq.6), the GP amplitude , a constant offset, and one additional uncorrelated jitter term  j , added in quadrature to the diagonal of the covariance matrix, to absorb any variations not accounted for by the GP.The parameter space is explored using the Bayesian Markov chain Monte Carlo (MCMC) sampler defined in Barragán et al. (2019), with uninformative priors for the parameters.To ensure that the MCMC process has converged, pyaneti iteratively runs 250 independent chains of 5 000 steps until the chains converge, which is assessed using the statistics of Gelman & Rubin (1992).As detailed in the Section 2.5 of Barragán et al. (2019), the code compares the variances between chains and within chains and uses the scaling factor R introduced in Gelman et al. (2004) to assess convergence (typically with R < 1.02).We then run a last MCMC process with 5 000 steps  of 250 independent chains, corresponding to 125 000 independent samples for each parameter, from which we estimate the best-fitting value with 1 uncertainties.
Since the HARPS-N solar data set covers more than half of the Sun's activity cycle, it is likely that the statistical properties (i.e.evolution time scale, harmonic complexity, period) of the activity RV signal vary over the course of the observations.We therefore independently analyse each of the three seasons defined in Tab. 2 (i.e.2015-2018, 2018-2021, 2021-2023).The best-fitting GP hyper-parameters for each season are given in Tab. 4, and their associated 1D posterior distributions are shown in Fig. 10.Significant variations are observed from one season to the next.The RV GP period increases from ∼26.4 d, in the 2015-2018 season, to ∼27.4 d after 2022 (i.e.∼2 increase).This is in line with expectations as active regions are found at higher latitudes in the start of the activity cycle and migrate near the equator at the end of the cycle (see Hathaway 2010, for a review of solar activity cycle).The difference in rotation periods then simply reflects the latitudinal dif-ferential rotation of the Sun (e.g.Howard & Harvey 1970;Snodgrass & Ulrich 1990;Beck 2000).For example, the average latitudes of sunspots computed from the Solar Optical Observing Network of the US Air Force (USAF), with the help of the US National Oceanic and Atmospheric Administration (NOAA), is about 12 • in 2015-2018 and 20 • in 2022-20245 , corresponding to a difference of about 1d in rotation period (using the latitudinal differential rotation profile of Snodgrass & Ulrich 1990), consistent with our estimates in Tab. 4.
We also note that the RV GP evolution time scales ( e ) are almost 2 larger (i.e.∼3 d) past 2018.Disk-resolved solar observations found larger active regions in the 2022-2024 period than in 2015-2018 5 , which is consistent with our observations as the active region evolution time scale roughly scales with the surface area of the features (e.g.Leighton 1964;Wang et al. 1989;Foukal 1998).The higher harmonic complexity (i.e.lower  p ) in the 2015-2018 season can be explained by the fact that equatorial active regions perturbs the wings of the profile more than higher-latitude regions and thus have an higher impact on the flux derivative which induce a sharper RV signature.As expected,  p is about twice as large for indicators of the photospheric flux (FWHM, S HK ) than for indicators of the first temporal derivative of the flux, like RV V s (e.g.Aigrain et al. 2012;Klein & Donati 2019;Nicholson & Aigrain 2022).The evolution of the GP covariance kernel over time, and its effect on the modelling of RV time series, is discussed in Sec 6.1.

Modelling activity-filtered RVs with Gaussian Processes
Using the 1D GP framework described in Sec.5.1, we model the time series of v ⊥ , v ∥ , as well as  1 to  4 .Except  2 , for which the MCMC process does not converge, all time series exhibit a quasiperiodic modulation well captured by the GP.For these time series, the best-fitting GP hyper-parameters are given in Tab. 4 and the posterior distributions of the fit parameters are shown for v ∥ and v ⊥ in Fig. 11.
As already intuited from Fig. 6, we find that the shape-driven RVs, v ∥ , are systematically smoother (i.e. with larger values of  p ) than v obs , which suggests that the activity signal within v ∥ is mostly driven by variations of the photospheric flux (typically induced by faculae and plages).This is further evidenced by the fact that v ∥ tends, in general, to evolve on longer time scales than v obs .On the other hand, the shift-driven RVs v ⊥ , exhibit a higher harmonic complexity and evolve on time scales similar to v obs , which suggests that the stellar activity signal in v ⊥ is sensitive to the variations of the first derivative of the photospheric flux (e.g.induced by spots).As expected, the bestfitting GP amplitudes are consistent with the RMS of the different time series given in Tab. 2.
In all three seasons, a value of 0.25-0.3m s −1 is obtained for jitter term in v ∥ , much lower than its counterpart in v obs .This is expected as, by definition, v ∥ contains only a small fraction of the white noise of HARPS RV and, therefore, the estimate of  j reflects more what the GP cannot model than the real dispersion of white noise in v ∥ .On the other hand, jitter terms of 0.5 to 0.6 m s −1 are found for v obs and v ⊥6 .In each case, the jitter term is considerably larger than the photon noise of ∼ 0.1 m s −1 , on the daily-binned RVs, which suggests that a significant fraction of the RV variation can be explained neither by the GP nor by formal uncertainties.In our daily-binned data set, both oscillation-and granulation-driven RV variations are expected to be averaged out due to their short evolution time scales (e.g.Dumusque et al. 2011;Chaplin et al. 2019).In contrast, super-granulation, which evolves on time scale of days, will be only partially reduced by our binning process.Recently, Al Moulla et al. (2023) and Lakeland et al. (2024) measured RV RMS contributions of ∼0.7 m s −1 and ∼0.9 m s −1 for super-granulation in the HARPS-N solar data, consistent with the simulations of Meunier et al. (2015).Therefore, the dispersion budget in the RV residuals is likely a mix of super-granulation and long-term stability (∼0.4 m s −1 for HARPS-N).
The posterior densities of the hyper-parameters of the GP fit to  1 ,  3 and  4 are shown in Fig. C4, and their best-fitting values are given in Tab. 4. As  1 is by far the dominant component of the SVD to the shape-driven CCFs, it follows a very similar behaviour to v ∥ and to S HK (as expected from Fig. 5).On the other hand,  3 and  4 exhibit similar statistical properties as v ⊥ , which suggests that they could be good proxies of the stellar activity signal in the latter time series.This is investigated in the next section.

Multi-dimensional GP analysis
In the practical case of the search for planet signatures in observed RVs, the shape-driven RVs, v ∥ could potentially be used as planetindependent proxies to mitigate stellar activity signals.The multidimensional GP framework of Rajpaul et al. (2015) is one of the most robust ways to simultaneously model RVs and activity proxies, thereby boosting the number of constraints on the stellar activity parameters.On the other hand, the shift-driven RVs, v ⊥ , are less dispersed than v obs , which could boost the sensitivity to low-ampitudes planet signatures.Furthermore, we have shown in Sec.5.2 that the residual quasi-periodic activity signals in v ⊥ would be well modelled by a GP with quasi-periodic kernel.In this section, we explore how v obs and v ⊥ perform in the search for long-period low-amplitude planet signatures.

Method
In our model, we assume that the activity signal in the RVs and indicators follow a FF'-like relation (Aigrain et al. 2012;Rajpaul et al. 2015).Each time series  is expressed as a linear combination of a latent variable  (typically the square of the photospheric flux) and its first temporal derivative , such that, at time , where the amplitudes   ,   and   are free parameters of the model.The latent variable  is modelled as a GP with the quasiperiodic covariance kernel defined in Eq. 6.
We inject the RV signature of a single planet on a circular orbit to the HARPS-N solar CCFs.We assume an orbital period of 100 d and a RV semi-amplitude of 0.4 m s −1 , which corresponds to a planet mass of 2.9 M ⊕ .In order to limit bias, three different orbital phases ( p in Eq. 3) of 0.0, 0.3 and 0.7 are considered.We apply the framework of Sec.2.1 to generate time series of shift-and shape-driven RVs, from the input RVs.We then use the multi-dimensional GP framework to model the activity signal within (1) v obs and FWHM, (2) v obs and V s , (3) v obs and S HK , (4) v obs and v ∥ .These indicators exhibit relatively similar evolution time scales and period to v obs , and are therefore promising candidates for multi-dimensional GP modelling.Differences in  p are naturally accounted for in the framework of Barragán et al. (2022a).
Since we have demonstrated that planet signatures are mostly preserved in v ⊥ , the planet search can as well be performed directly from  this time series7 .We therefore consider four additional cases, namely (5) v ⊥ and FWHM, (6) v ⊥ and V s , (7) v ⊥ and S HK , and (8) v ⊥ and v ∥ .As a reference, we also model the stellar activity RV signals in v obs (case R 0 ) and in v ⊥ (case R ⊥ ) using uni-dimensional GPs with QP and SE kernels, respectively.In all cases, the GP is jointly modelled with a planet RV signature of fixed orbital period and phase, to simulate the RV monitoring of a transiting planet (e.g.unveiled by the PLATO space mission).Note also that, in all cases, the same planetary signal has been injected to the data, but the GP model differs from one case to the next.As shown in Sec.5.1, the statistical properties of the time series vary significantly from one season to the next.We therefore choose to perform the planet injection-recovery tests independently on each of the three seasons defined in Tab. 2.
For each season, our input data sets contain the same amount of data points (460), spread evenly over a ∼2-yr period.The parameter space is sampled using the procedure described in Sec.5.1.

Results
The best estimates of the planet RV semi-amplitude and of the jitter term  j are given in Tab. 5 and shown in Fig. 12.The recovered values of k est are consistent with the injected one in all cases (note that, in all cases, the same planetary signal has been injected to the data), but the precision of the estimate varies significantly from one case to the next.Unsurprisingly, the 1D GP modelling of the HARPS-N RVs performs the least well, especially when the Sun is more active, in which case only a marginal ∼2  planet detection can be claimed.This is due to the fact that the long-term activity evolution and the planet signature cannot be easily separated with a simple 1D GP, which leads to larger error bars on k est .
We find that multi-dimensional GPs give, in most cases, much smaller error bars on k est .In particular, cases (1) and ( 3), where the FWHM of the CCF and the S index are used as activity proxies of v obs , perform best in all three seasons.These two indicators capture well the activity signatures sensitive to the photospheric flux, and thereby complement well the RV signals, sensitive to variations of the flux and its first derivative.In these two cases, the multi-dimensional GP leverages all the constraints available, which results in more precise estimates of k est .This also leads to higher values of  j , since the GP is now less flexible (due to the increased number of constraints), and less likely to absorb non-activity-induced variations.
We also note that cases (2) and ( 4), where v obs is combined with V s and v ∥ , perform well when the Sun is more active but yield significantly larger error bars on k est during solar minimum.During this period, there are fewer and smaller active regions than during more active epochs.It is therefore more likely that the statistical properties of the stellar activity RV signal vary significantly from one solar rotation to the next, and are thereby more difficult to model with a quasi-periodic GP (see Sec. 6.1).During solar minimum, neither V s nor v ∥ bring any additional constraint on the quasi-periodic activity RV signal and, therefore, the multi-dimensional GP modelling leads to results similar as the 1D GP modelling of v obs .
More precise values of k est are obtained in case R ⊥ than in case R 0 , which just reflects the fact that activity has been partly filtered from v ⊥ .However, these values are significantly less precise than those obtained in the multi-dimensional GP framework, even when the computation of v ⊥ is bypassed (i.e.cases 1 to 4).Cases where multidimensional GPs are used with v ⊥ (i.e.cases 5 to 8) tend to perform similar as their v obs counterpart (cases 1 to 4).In addition, we note that none of the SVD components,  1 ,  3 and  4 , outperforms v ∥ in the multi-dimensional GP framework.In particular, despite their periodic modulation,  3 and  4 do not improve the RV modelling compared to case R 0 .

DISCUSSION AND CONCLUSIONS
In this paper, we analysed the activity-induced distortions in the absorption lines of the Solar spectrum, intensively monitored with HARPS-N over the last 8 years.From the DRS-processed systematiccorrected CCFs, we constructed a dimensionally-reduced basis, which allowed us to separate the observed RVs (v obs ) into complementary time series, namely v ⊥ (i.e.shift-driven RVs) and v ∥ (shape-driven RVs).In the first one, the variations are largely dominated by Doppler shifts of the entire CCF, induced, for example, by planets or granulation at the solar surface.In the second one, the variations are only driven by CCF distortions and, thereby, probe the effects of active regions whilst being independent of planet signatures.When we apply this framework to the HARPS-N solar spectra, the RMS of the observed RVs goes from 2.05 to 1.06 m s −1 , hence a decrease of about 50%.We find that planet signatures are mostly preserved in v ⊥ , and that the efficiency of the activity filtering is not affected by the temporal sampling.However, as shown in Fig. 7, v ⊥ exhibits significant rotation-induced variations, suggesting that the shift-driven RVs are still affected by stellar activity.

Evolution of the GP covariance structure
In order to better understand the nature of the rotational modulation within v obs , v ⊥ and v ∥ , all time series were modelled using a GP with quasi-periodic covariance kernel.As shown in Fig. 13, the structure of the covariance matrix of the time series evolves significantly over time.The RV variations are smoother and more slowly evolving at the start of Cycle 25 than at the end of Cycle 24, which reflects the fact that the Sun's active regions are larger and at higher-latitude in the start of the magnetic cycle.We found that the quasi-periodic RV activity signals driven by flux variations as well as the long term cycle evolution are well captured by v ∥ .On the other hand, v ⊥ is still plagued by high-complexity quasi-periodic activity signals, likely reflecting variations in the first temporal derivative of the photospheric flux.
To visualise how the temporal evolution of the GP covariance kernel affects our GP fit, we model the full HARPS-N RV time series with a single 1D GP, using the quasi-periodic covariance kernel of Eq. 6.The best-fitting covariance kernel, shown in dashed black line in Fig. 13, differs significantly from the kernels obtained individually for the different seasons.The signal is now found smoother ( p ≈ 0.4) and more slowly evolving ( e ≈ 26 d).On the other hand, the uncorrelated jitter term  j is now ∼1.8 m s −1 , almost three times greater than on individual seasons.As explained in Sec.5.3, we expect  j to increase, since we have tripled the number of constraints in the fit.On the other hand, the values of  j obtained with the multi-dimensional GP in Tab. 5 are about 0.8-1.0m s −1 , significantly smaller than 1.8 m s −1 .Moreover, such a high value can be explained neither by super-granulation or long-term instrument stability.We therefore conclude that, as things stand, quasi-periodic stellar activity RV signals are better modelled by GPs on chunks during which cycle-driven variations remain small.
If we now model the full time series of shift-driven RVs, v ⊥ , with a single 1D GP, the best-fitting hyper-parameters are fully consistent with those obtained on individual seasons, despite the variations of P GP between 2015 and 2024.The best-fitting evolution time scale and GP period lie in between the estimates listed in Tab. 4, and a model with high harmonic complexity ( p = 0.23 ± 0.01) is preferred.The value of  j (0.62 ± 0.02 m s −1 ) lies in between the values obtained on individual seasons (between 0.57 and 0.72 m s −1 ), which indicates that our fit is as good on the whole data set as it is on individual seasons.This can be explained by the fact that most of the long-term activity variations (e.g.induced by the magnetic cycle), are well captured by v ∥ and, thus, well filtered out from v ⊥ .Therefore, we expect the GP covariance matrix to exhibit fewer variations in the case of v ⊥ than in the case of v obs .
To investigate what drives the evolution of the quasi-periodic covariance kernel in v obs , we divide the HARPS-N RVs into 270-d chunks (i.e. 10 rotation periods), using a moving window with a step of 100 days.After discarding chunks containing large gaps in the data, we model each subset of data using a GP with the quasiperiodic kernel given in Eq. 6.The best-fitting hyper-parameters are shown in Fig. C5.The variation of the GP amplitude, which is well described by a parabola, is unsurprisingly a good tracer of the solar magnetic cycle.We do not observe significant variations either in the time scale of the evolution of the GP or in the period, due to the relatively large uncertainties over these two parameters (about 4 d and 1 d for  e and P GP , respectively).On the other hand, inverse harmonic complexity,  p , varies significantly from one chunk to the next.When discarding the values obtained during solar minimum, often plagued with large uncertainties due to the weak activity signal, we find a positive correlation with the GP amplitude (Pearson correlation coefficient of 0.7).
In order to better understand the origin of this correlation, we compare in Fig. 14 the values of  p to the whole sunspot area observed by USAF/NOAA 7 , averaged over each chunk outside the solar minimum, and to the spot filling factor derived from SDO HMI obser-  vations (using the method of Milbourne et al. 2019;Haywood et al. 2022;Ervin et al. 2022;Lakeland et al. 2024).Both quantities appear to strongly correlate with  p (Pearson correlation coefficients of 0.94 and 0.82; see Fig. 14).When the spot filling factor is large, sunspots are likely to be more evenly distributed in longitude at the stellar surface, which results in a smoother RV signal (i.e.greater values of  p ). Conversely, for small filling factors, sunspots are more likely to be longitudinally isolated and create sharper RV variations.We also find that  p exhbits a weaker correlation with the filling factor of faculae and plages (Pearson correlation coefficient of ∼0.6).This suggests that long-term variations in  p are due more to sunspots than to the magnetic cycle, known to be best described by the filling factor of faculae/plages over the course of our observations (Cretignier et al. 2024).The inhibition of convective blueshift in faculae induces an RV contribution that evolves, to the first order, as the square of the disk-integrated photospheric flux (Aigrain et al. 2012).Therefore, in absence of spots, the RV activity signals should not vary less smoothly than the FWHM or the S index.On the other hand, spot-induced activity signals, which vary as the derivative of the photospheric flux, are characterised by a significantly higher harmonic complexity.Therefore, despite the fact that faculae dominate the stellar activity RV budget, spots remain the main driver of the smoothness of the signal, probed by  p .Despite the fact that this result has been intuited in the literature (e.g.Nicholson & Aigrain 2022), it is the first time that it is observed on data.
In contrast to v obs , no coherent variations in the statistical properties of v ⊥ are observed.The GP amplitude remains roughly constant over time, and the correlation between  p and the spot filling factor (or whole spot area) is now much weaker (Pearson correlation coefficients of 0.55 and 0.39; see Fig. 14) is observed.This confirms that, unlike v obs , the quasi-periodic activity signal in v ⊥ can be modelled by a single GP on time scales of several years (potentially over the full cycle).

Impact of systematics
The analysis presented in this study was conducted on CCFs computed from spectra processed by YV1, as described in Sec.3.2.Most of the corrected systematics are expected to impact the shape of the CCFs and, therefore, the activity-filtering framework described in Sec.2.1.To quantify this impact, we repeat the analysis presented in Sec.4.1 with the non-YARARA-processed spectra.The spectra selected in Sec.3.1 are normalised and aligned using the procedure described in Sec.3.2, and CCFs are computed using the "Kit-Cat" line list of Cretignier et al. (2022).Following the method described in Sec.2.1, we extract the time series of v ∥ and v ⊥ , using the first 4 components in the basis projection.
We find that, whereas the RMS of v obs remains pretty similar in the post-processed counterpart, the shift-driven RVs v ⊥ (resp.shapedriven RVs v ∥ ) are now significantly more (resp.less) dispersed, with a dispersion of 1.36 m s −1 RMS (resp.1.52 m s −1 RMS).When working independently on the three seasons defined in Tab. 2, we only see a noticeable decrease in RV RMS in Season 3. Component  1 still correlates well with the FWHM of the CCF, but both time series are dominated by 0.5-and 1-yr periodic modulations, induced by the Earth orbital obliquity and eccentricity (Collier Cameron et al. 2019).In contrast, no more correlation between  1 and v obs , V s and S HK is observed.Moderate correlations (Pearson correlation coefficients up to ∼0.6) are observed between these quantities and higher-order components, which reflects the fact that information has been diffused among the SVD components, due to systematics contributing as much, if not more, to the CCF variations.
When the RV time series are modelled with a quasi-periodic 1D GP, we find that v ∥ is significantly smoother and slowly-evolving than its YV1-processed counterpart.The fit is now poorer, with a residual RMS twice as large.This is due to the fact that v ∥ is plagued with systematics (e.g.due the 0.5-yr and 1-yr variations in FWHM Collier Cameron et al. 2019), which affects the GP modelling.On the other hand, whereas no significant changes in the statistical properties of v ⊥ are observed, the values of  j , the uncorrelated jitter term, have increased to ∼1 m s −1 , which means that the fit is poorer.We conclude that our framework to correct for shape-driven RV variations works best when systematics have been corrected beforehand, through a dedicated post-processing technique.The latter helps to better isolate stellar activity signals in the spectra, which can therefore be more accurately modelled.

Activity filtering and planet recovery
By jointly modelling the stellar activity RV signal in v obs with different activity proxies, either extracted from the input spectra (e.g.CCF FWHM, S HK ) or from our framework (i.e.v ∥ ), we demonstrated that long-period planets with small RV semi-amplitudes could be reliably detected with a multi-dimensional GP framework (see Tab. 5).This experiment also confirms that modelling the reduced RVs with a one-dimensional GP will likely yield imprecise parameters for longperiod planets, whereas multi-dimensional GPs, by increasing the number of constraints on the fitted parameters, will, in most cases, provide a more robust treatment of long-term variations.
Further investigations are also needed to leverage all the information available in the wavelength space (CCFs8 , spectra) to extract planet signatures in an optimal way and increase the constraints on their parameters (following the works of, e.g., Dumusque 2018;Rajpaul et al. 2020;Collier Cameron et al. 2021;Al Moulla et al. 2022;de Beurs et al. 2022;Cretignier et al. 2022).In particular, methods like Doppler Imaging (Kochukhov 2016;Luger et al. 2021;Asensio Ramos et al. 2022;Klein et al. 2022), which model the full absorption line profile (or spectrum), thereby bypassing the computation of RVs, might become robust complements to usual activity modelling techniques.Finally, long-term variations of the activity properties over the cycle (as evidenced in Fig. 13) are still hard to reproduce by current state-of-the-art modelling tools like GPs.Defining more physically-driven GP kernels (e.g.Hara & Delisle 2023) or allowing some hyper-parameters to vary with time are potential avenues for solving this problem.
Yet, even if the best-case scenarios, the RV residuals of the GP fit exhibit RMSs greater than ∼0.5 m s −1 , significantly larger than the formal RV uncertainties of ∼0.1m s −1 for the daily-binned data.This high dispersion is most likely attributable to the long-term instrument stability and to the solar super-granulation.This phenomenon, whose origin remains unclear (see Rincon & Rieutord 2018, for a review), is expected to induce RV signals on the same order of magnitude as our residuals (Meunier et al. 2015;Al Moulla et al. 2023).Moreover, as we do not expect granulation signals to have obvious effects on the shape of the spectral lines, the activity-filtering method presented in Sec.2.1 would not correct for them.Similarly, our GP model is not expected to affect super-granulation signals, as it is designed to account for rotationally-modulated activity signals which evolve on significantly longer time scales than super-granulation.Averaging out the RV effect of super-granulation using dedicated sampling strategies, as generally done for oscillations-and granulation-induced RV signals (Dumusque et al. 2011), is not a straightforward option here, as it will require the star to be intensively observed on time scales of days.As indicated in Meunier & Lagrange (2019), better understanding the origin of super-granulation, and developing physicallyor data-driven methods to model its RV contributions (O'Sullivan & Aigrain 2024, see), will be an important step to detect Earth-mass planets as part, for example, of PLATO ground-based monitoring or dedicated RV surveys like the Terra Hunting Experiment (Hall et al. 2018).As things currently stand, super-granulation signals have to be treated as white noise, which is emphasized by the fact that, in Sec.4.2, the precision of the recovered planet semi-amplitudes improves as the squared root of the number of measurements.If this result is confirmed by dedicated studies, it would mean that RV monitoring missions should aim at maximising the number of measurements on each target (ensuring these are spaced at least 1-2 d apart so that the super-granulation signal is uncorrelated), even at the cost of monitoring fewer stars overall.HD4628 in Stalport et al. (2023) for the solar observations in Fig. A1 and Fig. A2 with blue and green CCFs respectively.
The first PCA component of blue and green CCFs reveal the clear signature of the change of v sin i and of the cyostat change around BJD=2,459,491.The second PCA component in the blue contains extra component such as the warm-up at BJD=2,458,629, while the green CCFs contains a long trend with the cryostat offset.This long trend may be correlated with the long decrease in SNR due to ageing of the solar dome.The third PCA component is mainly an antisymmetric variation (therefore not used) and clearly show the stellar activity power at Prot/2.

APPENDIX B: PLANET DETECTION MAPS
Fig. B1 shows the detection rates of the planets injected in the HARPS-N solar CCFs without and with activity filtering (see Sec. 4.2), as a function of the planet RV semi-amplitude K inj and orbital period P orb .Planets considered to be detected if the recovered RV semi-amplitude differs by less than 1 from K inj and differs from 0 by at least 3.As outlined in Sec.4.2, the activity-filtering framework increases the sensitivity to planets with low semi-amplitude (typically lower than 0.4 m s −1 ) and larger orbital period (typically greater than ∼100 d).We note a very light decrease of the detection rate of short-period higher-amplitude planet signatures after activity filtering.This can be understood as uncertainties on the RV semi-amplitude are significantly smaller when estimated from the activity-filtered RVs than from the raw RVs (see Fig. 9).Note that the best estimates of the RV semi-amplitudes of the planets missed within ths (K inj ,P orb ) space still lie within 2 of the injected RV semi-amplitude.

APPENDIX C: GAUSSIAN PROCESS MODELLING
In this appendix, we give additional details of the GP modelling of spectroscopic activity indicators, namely the HARPS-N solar RVs (v obs ), the time series of v ⊥ and v ∥ , computed in Sec.4.1, and usual activity proxies (i.e.FWHM, V s and S HK ).In Fig. C1, Fig. C2 and C3, we show the quasi-periodic 1D GP fit to the different time series in Season 1 (2015-2018), 2 (2018-2021.8) and 3 (2021.8-2024),respectively.The posterior densities of the hyper-parameters of the 1D GP fit to the time series of eigenvectors  1 ,  3 and  4 , extracted in Sec.4.1, are shown for all three seasons in Fig. C4.We  C3), when the Sun is more active.In this regime, rotationally-modulated signals are present in all activity proxies except v ⊥ , where they have been filtered out.These activityinduced fluctuations appear significantly faster-evolving at the end of cycle 24 (Season 1, Fig. C1), which is consistent the lower GP evolution time scale and inverse harmonic complexity of this season (i.e. e and  p from Tab. 4).The signal is the most complex over the solar minimum (Season 2, Fig. C2).In this case, the GP fit is controlled by the few activity-induced fluctuations, visible notably in the end of the season, which explains why the best-fit GP parameters in Tab. 4 are similar to those obtained in Season 3. Outside these activity-dominated regions, we note that the FWHM and S HK index are mostly flat, whereas the RVs still exhibit fluctuations of about 1 m s −1 peak-to-peak, most likely due to an interplay between activity residuals, granulation and instrument stability.
Following the method introduced by Collier Cameron et al. (2021), we use singular-value decomposition (SVD) to build a set of  independent vectors, { 1 , ...,   }, from C S .Each of these vectors is a time series of  points associated with a score that scales with its contribution to the variance of C S .In what follows, the basis vectors are sorted by decreasing score value, so that  1 and   contribute the most and the least to the variance of C S , respectively.Finally, we project our reference-subtracted CCFs C onto the basis U = ( 1 , ...,   ), where each column  is equal to vector   .Using the same notations as Collier Cameron et al. (2021), we call C ∥ the projection of C on U, such that

Figure 1 .
Figure 1.Dynamical spectrum of the different time series of CCFs generated with SOAP2 using the parameters listed in Tab. 1.Each panel depicts the relative flux of the synthetic CCFs (after subtraction of a reference line profile, computed in absence of activity), in percent, as a function of the rotational phase of the star (X-axis) and line velocity (Y-axis).

Figure 2 .
Figure 2. Time series of raw RVs (black solid lines), shift-driven RVs (v ⊥ ; blue dashed lines) and shape-driven RVs (v ∥ ; yellow dotted lines), as a function of the rotational phase of the star, for the simulations described in Tab. 1.

Figure 3 .
Figure 3. YVA solar dataset.From top to bottom: time series of HARPS-N solar RVs, FWHM, velocity span and equivalent width of the HARPS-N solar CCFs, and S index.The vertical dashed lines delimitate the three seasons defined in Tab. 2.

Figure 4 .
Figure 4. Time series of reference-subtracted Solar CCFs (top panel), Doppler components computed using Eq. 1 (C D , middle panel), and shape-driven residuals (C S , bottom panel).

Figure 5 .
Figure 5. First four SVD components ( 1 to  4 ) plot against the median-subtracted HARPS-N solar RVs, the FWHM, bisector velocity span and EW of the CCFs, and the S HK time series.The individual points are colour-coded by date of observation, bluer (redder) points corresponding to more recent (older) observations.In the bottom right of each panel, we indicate the Pearson correlation coefficient of the two time series (weigthed by the uncertainties on the X axis).Note that all correlation coefficients are associated with -values lower than 0.05, and are thus considered to be significant.

Figure 6 .
Figure 6.Time series of HARPS-N solar RVs (top curve), shift-driven RVs (v ⊥ ; curve in the middle) and shape-driven RVs (v ∥ ; bottom curve).The typical formal error bar on each RV point is about 0.16 m s −1 .The vertical dashed lines divide the data into the three periods shown in Fig. 3.

Figure 7 .
Figure 7. Top panel: GLS periodogram of the HARPS-N solar RVs (top line), the shift-driven RVs (v ⊥ , middle line) and shape-driven RVs (v ∥ , bottom line).Bottom panels: GLS periodogram of the first four principal components,  1 to  4 , of the shape-driven CCFs.From right to left, the vertical orange dashed and magenta dotted lines indicate the Solar rotation period (and first two harmonics) and the Earth orbital period (and first harmonic).The periodograms were computed using the astropy python module(Astropy Collaboration  et al. 2013, 2018, 2022).

Table 3 .
Probability laws used to generate the semi-amplitude, orbital period and orbital phase of the planet RV signatures injected in the HARPS-N Solar CCFs (see Sec. 4.2).U and log U stand respectively for the Uniform and log-Uniform probability laws.The last two columns give the minimum and maximum values used for each parameter.

Figure 8 .
Figure 8. Top: Planet RV semi-amplitude, k est , recovered from the activityfiltered RVs (v ⊥ , dark blue dots) and from the raw RVs (v obs , light gray dots), as a function of the RV semi-amplitude K inj of the planets injected in Sec 4.2.The magenta dotted and dashed lines indicate the identity and the best-fitting straight line, respectively.Bottom: Difference between k est and K inj as a function of the orbital period P orb of the injected planet.The vertical orange dashed and magenta dotted lines indicate the Solar rotation period (and first harmonic) and the Earth orbital period (and first harmonic).

Figure 9 .
Figure 9. Mean error  err on the planet RV semi-amplitude estimated from the activity-filtered RVs (v ⊥ , dark blue dots) and from the raw RVs (v obs , light blue open circles) as a function of the number of data points N pts .For each case, the dashed line indicates the the best-fitting square-root law (reduced  2 of 1.0 and 1.1 for v obs and v ⊥ , respectively).

v
obs and usual stellar activity indicators compare to each other and evolve over time.In sec.5.2, we assess the ability of GPs to model the quasi-periodic activity signals in v ⊥ , and in  1 to  4 .In Sec.5.3, we finally investigate how a joint multi-dimensional GP-based activity filtering and planet search algorithm will perform for different indicators and input RV signals.All GP modellings in this section were performed using the open-source software pyaneti (Barragán et al. 2019, 2022a) with non-informative priors for all parameters.

Figure 10 .
Figure10.One-dimensional posterior densities of the hyper-parameters of the GP fit to the HARPS-N solar RVs (filled grey histograms), and to the time series of FWHM (thin blue lines), bisector velocity span (gold lines) and S HK (thick red lines), for the three seasons listed in Tab. 2. Columns 1, 2 and 3 show the posterior densities of the GP evolution time scale ( e ), inverse harmonic complexity ( p ) and period (P GP ).

Figure 11 .
Figure 11.Uni-dimensional posterior densities of the jitter term, GP amplitude and GP hyper-parameters of the fit to the HARPS-N solar RVs (filled gray distributions), the shape-driven RVs v ∥ (thin gold lines), and the shift-driven RVs v ⊥ (thick blue lines).

Figure 12 .
Figure 12.Best estimates of the planet RV semi-amplitude (k est , top panel) and jitter term ( j , bottom panel) for the different cases listed in Tab. 5.The orange dots, light blue filled circles and red triangles indicate the results obtained for the 2015-2018, 2018-2022 and 2022-2024 season, respectively.The dashed horizontal line on the top panel indicates the semi-amplitude of the planet RV signature injected in the data.

Figure 13 .
Figure 13.Evolution of the covariance kernel of the stellar activity RV signal, computed using Eq.6 with the best-fitting hyperparameters of the GP modelling.The thick dashed line is obtained by modelling all RVs together and the vertical dotted line indicate the average synodic rotation period of the Sun (27.2753 d).

Figure 14 .
Figure14.GP inverse harmonic complexity extracted from the HARPS-N RVs (v obs , dark blue stars) as a function of the whole sunspot area (in millionths of solar hemisphere; left-hand panel), and of the SDO spot filling factor (right-hand panel).In each panel, we give the Pearson correlation coefficient , and show the best-fitting straight line (orange dashed lines).For comparison, we also show the inverse harmonic complexities extracted from the shift-driven RVs (v ⊥ ) in gray dots, with Pearson correlation coefficients of 0.55 and 0.39 with the whole spot area and SDO spot filling factor, respectively.

Figure A1 .
Figure A1.PCA decomposition of the transformed CCFs residuals time-series.The figure is identical to the Fig.C.2 obtained for HD4628 in Stalport et al. (2023) excepted that instead of a CCF obtained with all the stellar lines, we represented here the "blue CCF" ( < 4877 Å) analysis.Left: PCA components obtained for the CCF residuals (see main text).The component decorrelate by the  HK is also shown in black.The vertical dotted line split the symmetric (left) from antisymmetric (right) profiles deformations.The parameter of symmetry  is shown on the left of the profiles and is in bold for significant symmetric profile ( > 3).Right : Score of the PCA components and corresponding periodograms in days.The time-series coefficient used to correct for the change of the PSF are given by the symmetric components deformation (blue and orange components).

Figure A2 .
Figure A2.Same as Fig.A1for the green CCF (4877 Å<  < 5856 Å).The jump corresponding for the warm-up of the detector is not visible around BJD=58,629 on contrary to the blue CCF.This indicates that warm-up signatures are mainly visible in the blue part, coherent with the higher contrast of the ghosts.

Figure A3 .
Figure A3.Same as Fig.A1 for the red CCF ( > 5856 Å).The jump corresponding for the warm-up of the detector is not visible, neither the cryostat change.

Figure B1 .
Figure B1.Planet detectability maps, as a function of the RV semi-amplitude and orbital period, recovered from the HARPS-N solar RVs without activity filtering (left-hand panel) and using the activity-filtering framework of Sec.2.1.In each panel, the color code depicts the planet detection rate (darkblue / yellow indicating detection rates close to 0 / 100%).The numbers within each cell indicate the detection rate in percent.

Figure C1 .
Figure C1.Best-fitting GP prediction to different time series of activity indicators extracted from HARPS-N Solar spectra, during Season 1 (i.e.2015 to 2018, see Tab. 2).In each panel, the data points with formal 1 error bars are shown in black dots, and the best-fitting GP prediction (resp.1 error bands) is indicated by the magenta solid line (resp.shaded bands).

Figure C4 .
Figure C4.Uni-dimensional posterior distributions of the GP evolution time scale (left-hand column), inverse-harmonic complexity (middle column) and period (right-hand column) obtained by modelling the HARPS-N solar RVs (filled grey histograms), v ⊥ (filled yellow histograms), and  1 ,  3 and  4 (thin, medium thick and thick gray lines).

Figure C5 .
Figure C5.Distribution of the best-fitting GP hyperprameters obtained by modelling 270-d chunks of HARPS-N solar RV.The dotted blue line in the top panel shows the best-fitting parabola to the GP amplitudes.The three seasons defined in Tab. 2 (i.e.end of Cycle 24, solar minimum and start of Cycle 25) are delimited by the vertical dashed lines.

Table 4 .
Best-fitting evolution time scale ( e ), inverse harmonic complexity ( p ) and period (P GP ) of the one-dimension GP fit to the different time series analysed in this study.