## Abstract

The number of main-sequence stars for which we can observe solar-like oscillations is expected to increase considerably with the short-cadence high-precision photometric observations from the NASA Kepler satellite. Because of this increase in the number of stars, automated tools are needed to analyse these data in a reasonable amount of time. In the framework of the asteroFLAG consortium, we present an automated pipeline which extracts frequencies and other parameters of solar-like oscillations in main-sequence and subgiant stars. The pipeline uses only the time series data as input and does not require any other input information. Tests on 353 artificial stars reveal that we can obtain accurate frequencies and oscillation parameters for about three quarters of the stars. We conclude that our methods are well suited for the analysis of main-sequence stars, which show mainly p-mode oscillations.

## 1 INTRODUCTION

Stars with subsurface convection zones, like the Sun, display acoustic oscillations. The stochastic excitation mechanism limits the amplitudes of the oscillations to intrinsically weak values. However, it gives rise to a rich spectrum of oscillations. The excited pressure (p) modes probe different interior volumes, with the radial and other low angular-degree modes probing as deeply as the core. This differential penetration of the oscillations allows the internal structure and dynamics to be inferred as a function of depth. Seismic studies of the Sun have indeed proven to be very powerful in inferring its internal structure.

The fact that the solar-like oscillations have such small amplitudes has made observations of these oscillations in stars other than the Sun very challenging. Over the past few years, asteroseismic observations of main-sequence stars up to evolved red giant stars have been made using Doppler velocity measurements from ground-based spectrographs, e.g. ELODIE (Baranne et al. 1996), CORALIE, HARPS (Queloz et al. 2001), UCLES and UVES (D'Odorico 2000), and photometric space-based instruments such as WIRE (Buzasi 2000), MOST (Matthews et al. 2000) and CoRoT (Baglin et al. 2006). This has led to detections of solar-like oscillations in more than 10 main-sequence stars and of the order of 1000 red giants. For a recent, but pre-CoRoT, review of these results see Bedding & Kjeldsen (2008). CoRoT results for main-sequence stars are presented by, for example, Michel et al. (2008), Appourchaux et al. (2008), García et al. (2009), while De Ridder et al. (2009) and Hekker et al. (2009) present first CoRoT results for red giants.

The NASA Kepler satellite was launched successfully into an Earth-trailing orbit on 2009 March 7. The satellite contains a Schmidt telescope with a 0.95-m aperture and a 105 deg^{2} field of view, equipped with a highly sensitive photometer with a spectral bandpass from 400 to 850 nm. It is designed to continuously and simultaneously monitor 100 000 stars brighter than 14 mag. Kepler will be pointed towards the constellations Cygnus and Lyra during the entire mission, which has a nominal length of 3.5 years. For most stars, data will be integrated over 30 min, while for approximately 512 stars at a time, data with a 1-min cadence will be obtained. Although the driving goal for the development of Kepler is to observe transiting Earth-like exoplanets, these observations are very well suited for asteroseismology, including low-amplitude solar-like oscillations. In turn, asteroseismology will contribute to the exoplanet investigations by determining radii of the planet-hosting stars, which are needed to extract the planetary radii. The radii of stars exhibiting solar-like oscillations can be obtained using the difference in frequency between modes with consecutive radial orders, i.e. the large separation (Δν). This is a measure of the sound travel time across the star, which depends on the density of the star. See Stello et al. (2009b) for an overview of current approaches to obtain radii. The asteroseismic potential of Kepler is described in more detail by Christensen-Dalsgaard et al. (2008).

Apart from the determination of radii, the detection of solar-like oscillations in stars at different epochs along stellar evolutionary life cycles offers the prospect to test theories of stellar evolution and stellar dynamos for many stars. The input data for probing stellar interiors are the mode parameters. Accurate mode parameters are a vital prerequisite for robust, accurate inference on the fundamental stellar parameters.

We expect in total of the order of 1000 solar-like stars to be observed by Kepler in short cadence. Short-cadence oscillation data are needed to observe solar-like oscillations in main-sequence stars and subgiants as these occur at frequencies of the order of a few hundred up to several thousand microHertz. The short-cadence (1-min) data have a Nyquist frequency of ∼8300 μHz, while the Nyquist frequency of the long-cadence (30-min) data is ∼275 μHz.
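As a quick check of the quoted Nyquist frequencies, the short sketch below evaluates 1/(2Δt) for nominal cadences of exactly 1 and 30 min. The function name is hypothetical, and the exact Kepler integration times are not given here, so the results should only be expected to match the quoted values approximately.

```python
def nyquist_microhertz(cadence_seconds: float) -> float:
    """Nyquist frequency, in microhertz, of evenly sampled data."""
    return 1.0e6 / (2.0 * cadence_seconds)


# Nominal cadences (assumed to be exactly 1 min and 30 min).
short_cadence = nyquist_microhertz(60.0)
long_cadence = nyquist_microhertz(30.0 * 60.0)
print(f"short cadence: {short_cadence:.0f} muHz, long cadence: {long_cadence:.0f} muHz")
```

Only the short-cadence Nyquist frequency lies above the few hundred to several thousand microhertz range in which main-sequence and subgiant oscillations are expected.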

In preparation for the Kepler mission, the asteroFLAG consortium has developed automated tools to analyse solar-like oscillations in main-sequence stars (e.g. Huber et al. 2009; Mathur et al. 2009; Mosser & Appourchaux 2009), and tested these tools extensively (see e.g. Chaplin et al. 2008; Stello et al. 2009b). Automated tools are needed to cope with the large number of stars we expect to be observed with Kepler. Here, we present an automated pipeline, built to determine oscillation parameters of solar-like oscillations in main-sequence stars and subgiants, which we describe in Section 3 (some mathematical details are deferred to Appendix A). We compare the results (Section 4) of the automated analysis with the input parameters used for simulations of realistic artificial data of a few hundred stars, prepared as if they were observed with Kepler in short cadence (Section 2).

## 2 SIMULATED DATA

The simulated time series are based on stellar parameters available in the Kepler Input Catalogue (KIC) (Brown et al. 2005) for the main-sequence and subgiant commissioning and survey targets. For these simulations, stellar parameters are randomly chosen within the expected formal and systematic errors around their KIC values and used as inputs to the model grid prepared for the Aarhus Kepler pipeline (Quirion et al. in preparation). The resulting parameters and model frequencies have then been used for the simulations. Rotation effects, granulation, activity and white noise corresponding to the brightness of the target have been added. Also, the lifetimes of the oscillation modes have been varied. All time series are 30 days long with a 1-min cadence.

The stellar models were generated with the Aarhus stellar evolution code (Christensen-Dalsgaard 2008b) using the OPAL equation of state (Iglesias & Rogers 1996), the Grevesse & Noels (1993) solar mixture, and the OPAL and Alexander & Ferguson (1994) opacity tables. The frequencies of the p modes were calculated using the adiabatic pulsation code ADIPLS (Christensen-Dalsgaard 2008a). The time series are generated using a combination of the asteroFLAG and Aarhus simulators (Stello et al. 2004; Chaplin et al. 2008).

## 3 METHODOLOGY

We developed an automated pipeline to obtain the following (oscillation) parameters of main-sequence and subgiant solar-like oscillators from Fourier spectra of the time series observations:

1. frequency range of the oscillations,
2. frequency at which maximum oscillation power occurs,
3. parametrization of the background of the entire Fourier spectrum,
4. average large frequency separation between consecutive radial orders,
5. maximum mode amplitude and amplitude envelope of the oscillations,
6. linewidth (lifetime) at the frequency of maximum power of the oscillations,
7. individual frequencies.

Our methods to determine these parameters are described below, with some mathematical details and error calculations in Appendix A.

We stress here that this pipeline only uses the time series data as input and does not require any other information.

### 3.1 Frequency range of the oscillations

We are looking for high-order, low-degree, solar-like (p-mode) oscillations, the frequencies (ν_{n,ℓ}) of which we expect to follow approximately the asymptotic relation (Tassoul 1980). In the present study, we use the following version of the relation:

$$\nu_{n,\ell} \approx \Delta\nu\left(n + \frac{\ell}{2} + \epsilon\right) - \ell(\ell + 1)D, \tag{1}$$

where *n* is the radial order and ℓ the angular degree. Δν is the large separation, which is sensitive to the sound travel time across the star, and ε is a constant sensitive to the surface layers. *D* is related to the small separation (δν_{02}) between adjacent modes with ℓ = 0 and 2 by the expression δν_{02} ≈ 6*D*. Because we deal with photometric data, it is unlikely that we can observe ℓ = 3 modes due to cancellation effects. We know from theoretical models and observations in the Sun and other stars that Δν, δν_{02} and ε depend slightly on frequency and angular degree. Because the changes in Δν are usually relatively small, we consider Δν constant to a first approximation and search for a frequency range in which the power spectrum has peaks at near-equidistant intervals.

From 200 μHz up to the Nyquist frequency, the power spectrum is divided into windows of variable width (*w*) depending on the location of the central frequency of the window (ν_{central}), with consecutive values of ν_{central} separated by *w*/4. The frequency ν_{central} is used as a proxy of ν_{max} and is therefore expected to scale with the acoustic cut-off frequency. Hence, *w* is defined as *w* = (ν_{central}/ν_{max⊙}) *w*_{⊙}, with ν_{max⊙} = 3100 μHz, i.e. the central frequency of the oscillations in the Sun, and *w*_{⊙} = 2000 μHz, the expected width of the frequency interval over which we may find oscillations in the Sun, were the Sun to be observed as a bright star with Kepler.
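The window layout described above can be sketched as follows. The solar values are those quoted in the text; the function names, the choice of 200 μHz as the first window centre and the use of the current window width to step to the next centre are assumptions made for illustration.

```python
NU_MAX_SUN = 3100.0  # muHz, central frequency of the solar oscillations (from the text)
W_SUN = 2000.0       # muHz, expected width of the solar oscillation range (from the text)


def window_width(nu_central: float) -> float:
    """Window width w = (nu_central / nu_max_sun) * w_sun, in muHz."""
    return nu_central / NU_MAX_SUN * W_SUN


def window_centres(nu_start: float = 200.0, nu_nyquist: float = 8333.0) -> list:
    """Central frequencies separated by w/4, from nu_start up to the Nyquist frequency.

    Assumption: each step uses the width of the window being left.
    """
    centres = []
    nu = nu_start
    while nu <= nu_nyquist:
        centres.append(nu)
        nu += window_width(nu) / 4.0
    return centres


centres = window_centres()
print(len(centres), "windows; first widths:", [round(window_width(c), 1) for c in centres[:3]])
```

Because the width scales with the central frequency, the windows are spaced geometrically, so only a few dozen are needed to cover the whole spectrum.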

To find equidistant frequency peaks, we compute the power spectrum of the power spectrum (PS⊗PS), which is equivalent to the autocorrelation of the time series, in each frequency window (see Fig. 1 for an example of a PS⊗PS). Subsequently, we check for the presence of features at predicted values of Δν/2, Δν/4 and Δν/6. The predicted value of Δν is obtained from Δν∼ν^{0.77}_{central} (see Hekker et al. 2009; Stello et al. 2009a), and we allowed for a 30 per cent deviation from the predicted value. When the probability of the presence of these three features being due to noise is less than 0.2 per cent, we interpret this as oscillations being present in the considered window. All windows in which we find equidistant frequency peaks are selected as part of the frequency range of the oscillations. For details on the computation of the probability see Appendix A.
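A minimal sketch of the PS⊗PS step is given below, using NumPy. It computes the power spectrum of one power-spectrum window and looks for the strongest feature within 30 per cent of an expected spacing. The normalisation of the Δν ∼ ν^{0.77}_{central} scaling to solar values (Δν_⊙ = 134.9 μHz) is an assumption, all function names are hypothetical, and the noise-probability test of Appendix A is not reproduced.

```python
import numpy as np

NU_MAX_SUN = 3100.0  # muHz, solar value quoted in the text
DNU_SUN = 134.9      # muHz, assumed solar large separation


def predicted_dnu(nu_central):
    """Assumed solar normalisation of the dnu ~ nu_central**0.77 scaling."""
    return DNU_SUN * (nu_central / NU_MAX_SUN) ** 0.77


def ps_of_ps(power, df):
    """Power spectrum of a power spectrum sampled at frequency step df.

    Returns (spacing, pspow); spacing[i] is the peak spacing, in the
    units of df, to which element i of pspow is sensitive.
    """
    resid = power - power.mean()        # remove the zero-frequency term
    amp = np.fft.rfft(resid)
    pspow = np.abs(amp[1:]) ** 2
    k = np.arange(1, amp.size)
    return power.size * df / k, pspow


def feature_near(power, df, expected, tolerance=0.30):
    """Spacing of the strongest PS(x)PS feature within tolerance of expected."""
    spacing, pspow = ps_of_ps(power, df)
    mask = np.abs(spacing - expected) <= tolerance * expected
    if not mask.any():
        return None
    return spacing[mask][np.argmax(pspow[mask])]


# Toy window: flat noise floor plus a comb of peaks every 50 muHz,
# i.e. a dnu/2 pattern for dnu = 100 muHz.
df = 0.5
power = np.ones(4000)
power[::100] = 10.0
found = feature_near(power, df, expected=50.0)
print(found)
```

In practice the same search would be repeated at the predicted Δν/4 and Δν/6, and the three features accepted only if their joint probability of arising from noise is small enough.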

In subgiants, for which we expect oscillations in the frequency range 100–1000 μHz, the assumption of regularly spaced frequencies may no longer be valid due to the presence of g modes and mixed modes. The increased luminosity of these more extended stars results in higher mode amplitudes (*A*). This is because *A*∼ (*L*/*M*)^{s}, with *L* and *M* the stellar luminosity and mass, respectively. The value of exponent *s* is of the order of 1, but this is still debated in the literature (e.g. see Kjeldsen & Bedding 1995; Samadi et al. 2005). As a result of the increased mode amplitudes we expect oscillations with a good signal-to-noise ratio, and therefore the presence of prominent peaks in the power spectrum. Therefore, in case we did not find equidistant frequency peaks, we fit a background signal including granulation, activity and white noise to the power spectrum and check whether there is a significant power excess with respect to this fit in the frequency range 100–1000 μHz (see Fig. 6 for examples of such fits). If this is the case, the frequency range of this power excess is taken to be the oscillation frequency range. We note here that before applying this fitting to real data one has to check for possible observational artifacts in the data, which can possibly have a similar signal in the power spectrum.

In case we do not detect any interval with equidistantly spaced frequencies, or significant power excess due to oscillations, we search once more for oscillation frequencies separated by Δν, but now we do not assume Δν to be constant with frequency. To account for this frequency dependency, we stretch (or compress) the frequency axis of the power spectrum slightly. This stretching is performed in such a way as to produce an equidistant pattern of peaks on the stretched, as opposed to the original, frequency axis. The PS⊗PS of the stretched power spectrum will therefore show a stronger (more prominent) signature of the large spacing than the PS⊗PS of the original spectrum.

The stretching depends on the value of Δν and on the frequency range of the oscillations. We assume a maximum of 10 per cent change in Δν over the oscillation frequency range. The maximum stretch (*s*_{max}) is therefore

[…]

[…] (ν_{stretch}) as follows:

[…]

where ν_{c} denotes the central frequency of the considered frequency range, which will usually be the frequency at which maximum power occurs, i.e. ν_{max}. Furthermore, *j* is an integer which may have both positive and negative values; negative stretching effectively means compressing the power spectrum. To find the optimum stretch value, we search for the value of *j* for which the probability that the features in the PS⊗PS are due to noise is lowest.

### 3.2 Background signal

A background signal (*bg*) consisting of granulation, activity and white noise is fitted to a binned power spectrum, in which we compute the average power over independent bins. The frequency range of the oscillations is excluded. Granulation and activity are represented by power laws, from which we obtain the time-scales (τ_{gran} and τ_{act}) and powers (*p*_{gran} and *p*_{act}) of both phenomena (see equation 4). The granulation exponent *a* is left as a free parameter, while the activity exponent is fixed to 2. Fixing the activity exponent is justified by the fact that we assume an exponential decay of the activity over time. In the original Harvey model (Harvey 1985), the granulation is modelled with three exponentially decaying power laws. For the present data, we can only fit one power law for the granulation, due to the limited resolution and the input in the simulations; we therefore do not assume exponential decay for the granulation, i.e. we do not fix its exponent at 2. Note that the background in the simulated data on which we tested our pipeline comprised two power laws. Should we find that more than two power laws are needed in real data, it will be straightforward to modify the code accordingly. In addition to the power laws, we add an offset *b* which contains mostly white noise. However, in cases where no oscillations can be detected some oscillation signal might also be present in this offset. The final form of the background model used is

[…]

For the input parameters, we chose for *p*_{act} and *p*_{gran} the maximum power of the binned power spectrum and 0.001 times this value, respectively. Furthermore, the inputs for τ_{act} and τ_{gran} were 100 000 and 1000 s, while the input value for *b* was the mean power at high frequencies outside the oscillation range. As a first estimate, we chose *a* to be equal to 2. To obtain the optimal fit, we vary the input parameters slightly. We randomly select one of the fitting parameters and multiply this by

[…]

[…] χ^{2} is used as the best-fitting background fit.

In a few cases, the fitting with two power laws does not work properly. This is because only one decaying profile is visible in the data, either due to the presence of the oscillations at the same frequency as the hump of the second decaying profile or due to too low a signal-to-noise ratio, i.e. high white noise or low signal, or a combination of the two. In these instances, we fit only for one power law and the offset *b*. This does not provide us with an optimal fit at low frequencies (below ∼10 μHz), and the parameters cannot be used to infer properties of granulation and activity. However, at higher frequencies (>100 μHz), where the oscillations reside, the single power-law fit provides a reasonable estimate of the background. The standard deviations of the fitting parameters are used as errors.
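For illustration, a background of the kind described in this section can be written as two Harvey-like components plus an offset. The functional form below, with the activity exponent fixed to 2 and no factor of 2π multiplying ντ, is an assumption and not necessarily the exact form of equation (4). The starting values follow the text, and both function names are hypothetical.

```python
import numpy as np


def background(nu, p_act, tau_act, p_gran, tau_gran, a, b):
    """Assumed Harvey-like background: activity + granulation + offset.

    nu is in Hz and the time-scales are in seconds, so nu * tau is dimensionless.
    """
    activity = p_act / (1.0 + (nu * tau_act) ** 2)       # exponent fixed to 2
    granulation = p_gran / (1.0 + (nu * tau_gran) ** a)  # free exponent a
    return activity + granulation + b


def initial_guess(binned_power, high_freq_mask):
    """Starting values as described in the text."""
    p_max = float(np.max(binned_power))
    return {
        "p_act": p_max,             # maximum power of the binned spectrum
        "p_gran": 1.0e-3 * p_max,   # 0.001 times that value
        "tau_act": 1.0e5,           # seconds
        "tau_gran": 1.0e3,          # seconds
        "a": 2.0,                   # first estimate of the granulation exponent
        "b": float(np.mean(binned_power[high_freq_mask])),  # high-frequency mean
    }


nu = np.logspace(-6, -2, 5)  # Hz
print(background(nu, 1.0, 1.0e5, 0.5, 1.0e3, 2.0, 0.1))
```

Dropping the activity term from this function gives the single power-law fallback used when only one decaying profile is visible in the data.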

### 3.3 Average large separation

For the estimation of the large separation (Δν), we compute the PS⊗PS in the frequency range of the oscillations. Here, we take into account that Δν depends on frequency and compute the PS⊗PS of a power spectrum with a stretched frequency axis. Determining Δν from the stretched power spectrum provides a more reliable measure of Δν and, if required, an estimate of the gradient of the large spacing with *n*, δΔν/δ*n*, can be made. For more details on the stretching, see the last paragraph of Section 3.1. The derivation of the gradient of the large separation is presented in Appendix A. In the PS⊗PS (see Fig. 1), we determine the position of the Δν/2 and Δν/4 features. The centroids and uncertainties of these features are computed in two ways. In the first method, we determine the power-weighted centroids of the features in the PS⊗PS, and their errors are computed as the standard deviation of grouped data (see equation A6 in Appendix A). In the second method, we compute the Bayesian posterior probability of the points in the PS⊗PS, using the same equations as for the individual frequencies, i.e. equations (8)–(10), which are discussed in Section 3.6 (see also Appourchaux, Samadi & Dupret 2009; Broomhall et al. 2009). Using these probabilities, we compute the posterior-weighted centroid of the feature. The interval in which the probability of the feature not being due to noise is higher than 68.27 per cent, i.e. 1σ in a Gaussian distribution, is used as the uncertainty interval. Finally, for both methods, we determine Δν by computing a weighted average of the values implied by the Δν/2 and Δν/4 features.

We also compute Δν from an autocorrelation of the full oscillation frequency range and from an autocorrelation using the individual oscillation frequencies determined with the Bayesian approach only (again, see Section 3.6). A Gaussian is fitted to the feature at Δν in the respective autocorrelations, and the width of the Gaussian is used as an estimate of the error.
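The first of the centroid methods above can be sketched as follows. Equation (A6) is not reproduced in this section, so the power-weighted standard deviation used here is a stand-in for the grouped-data error, and the inverse-variance weighting of the two features is likewise an assumption. Function names are hypothetical.

```python
import numpy as np


def weighted_centroid(x, weights):
    """Weighted centroid of a feature and its weighted standard deviation."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    centre = np.sum(w * x) / np.sum(w)
    spread = np.sqrt(np.sum(w * (x - centre) ** 2) / np.sum(w))
    return centre, spread


def combine_large_separation(half, half_err, quarter, quarter_err):
    """Inverse-variance weighted mean of 2*(dnu/2) and 4*(dnu/4)."""
    estimates = np.array([2.0 * half, 4.0 * quarter])
    errors = np.array([2.0 * half_err, 4.0 * quarter_err])
    w = 1.0 / errors ** 2
    dnu = np.sum(w * estimates) / np.sum(w)
    return dnu, np.sqrt(1.0 / np.sum(w))


# Toy feature in the PS(x)PS around dnu/2 = 50 muHz.
centre, spread = weighted_centroid([49.0, 50.0, 51.0], [1.0, 2.0, 1.0])
dnu, dnu_err = combine_large_separation(50.0, 1.0, 25.0, 0.5)
print(centre, spread, dnu, dnu_err)
```

Replacing the power weights by posterior probabilities gives the posterior-weighted centroid of the second method.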

### 3.4 Maximum mode amplitude, amplitude envelope and frequency of maximum amplitude

Our next package provides estimates of the maximum mode amplitude, and the mode amplitude envelope as a function of frequency. Our results are scaled to be equivalent radial-mode amplitudes.

In the first method (hereafter Method I), we begin by subtracting the background fit from the power spectrum. The resulting residual power spectrum is averaged over the range occupied by the modes using a boxcar filter of width 3Δν. Next, we multiply this averaged residual spectrum by the large frequency spacing Δν, and finally we divide by a constant factor, *c*, to allow for the effective number of modes in each slice Δν of the spectrum. The value of *c* is chosen so that the above procedure gives observational estimates, as a function of frequency, of the power envelope for radial modes. It is computed assuming the presence of four frequencies in each Δν interval with ℓ = 0, 1, 2, 3, with relative power per mode of 1.0, 1.5, 0.5 and 0.03, respectively. Hence, *c* is the total power we expect in a Δν interval relative to that of a radial mode, i.e. 3.03. There is a slight dependence of *c* on limb darkening and thus on *T*_{eff} and log *g*, but the values change only by a few per cent and we ignore those changes here.

The highest value of the power envelope is an estimate of the maximum mode power. The mode amplitude envelope and the maximum mode amplitude are given by the square root of the power envelope and square root of the maximum power, respectively. The frequency at which the maximum mode power occurs is ν_{max}.
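The steps of Method I can be sketched as below, assuming an evenly sampled power spectrum in power per unit frequency and a background already evaluated on the same grid. The function name, the zero-padded treatment of the edges by `numpy.convolve` and the clipping of negative residual power before taking the square root are choices made for this illustration only.

```python
import numpy as np

C_MODES = 1.0 + 1.5 + 0.5 + 0.03  # relative power of l = 0, 1, 2, 3 per dnu interval


def amplitude_envelope(freq, power, bg, dnu):
    """Radial-mode amplitude envelope, its maximum and the frequency of that maximum."""
    df = freq[1] - freq[0]
    resid = power - bg                             # remove the fitted background
    n_box = max(1, int(round(3.0 * dnu / df)))     # boxcar of width 3 * dnu
    kernel = np.ones(n_box) / n_box
    smooth = np.convolve(resid, kernel, mode="same")
    power_env = smooth * dnu / C_MODES             # power per radial mode
    amp_env = np.sqrt(np.clip(power_env, 0.0, None))
    i_max = int(np.argmax(power_env))
    return amp_env, amp_env[i_max], freq[i_max]


# Toy spectrum: uniform excess of 3.03 power units per muHz above the background.
freq = np.arange(0.0, 2000.0, 1.0)
bg = np.ones_like(freq)
power = bg + 3.03
amp_env, amp_max, nu_at_max = amplitude_envelope(freq, power, bg, dnu=100.0)
print(amp_max)
```

In the toy case the excess integrates to 303 power units per Δν, which corresponds to 100 power units per radial mode.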

We choose to average the spectrum using a boxcar filter as opposed to the Gaussian filter (of width 4Δν) adopted by Kjeldsen et al. (2008). This is because it allows us to estimate very straightforwardly uncertainties for the amplitudes, here in independent frequency ranges of width 3Δν. For example, we estimate the uncertainty of the maximum mode power as the standard deviation of the powers in each bin in the frequency range 3Δν that contributes to the estimated maximum power. We may then calculate independent averages in ranges on either side of the maximum, to give an estimated power envelope with uncertainties. Uncertainties for the amplitudes follow by remembering that fractional errors on the mode amplitudes are equal to half those on the mode powers.

Our decision to average over 3Δν was to some extent determined by the following obvious compromise. The narrower the range, the more we avoid smoothing out potentially interesting features of the amplitude envelope, while the wider the range, the less subject we are to fluctuations due to the stochastic nature of the modes. But there is also another important factor, which argues for adopting a wider range, that is to suppress biases in the estimated maximum mode amplitudes when the signal-to-noise ratio is quite low.

The frequency at which maximum oscillation power occurs (ν_{max}) is computed as the weighted mean frequency of the oscillation power with the error computed from the standard deviation of grouped data (see equation A6).

In the second method (hereafter Method II), we fit a Gaussian to the binned oscillation power, where the binning is performed over intervals of 2Δν. The height of the Gaussian fit is then converted to amplitude per radial mode by multiplying by Δν/*c*, as per the other approach. We use the standard deviation of the fit parameters to compute the errors. The centre of the Gaussian fit is ν_{max}.

### 3.5 Linewidth of most prominent modes

We seek a straightforward and robust method for determining from the power spectrum the linewidth shown by the most prominent modes.

Our method relies on the fact that the height in the power spectrum of a solar-like (i.e. damped) mode peak depends not only on the total power of the mode but also (crucial to our method here) on the linewidth (or equivalently the damping time) of the mode and the intrinsic resolution in frequency of the spectrum. The height, *H*, in units of power per hertz is well described in both the resolved and unresolved regimes by (Fletcher et al. 2006; Chaplin et al. 2009a)

$$H = \frac{2A^{2}T}{\pi T\Delta + 2}, \tag{6}$$

where *A*^{2} is the total power of the mode, Δ is the full width at half-maximum linewidth of the mode peak and *T* is the effective length of the observations. We may re-express equation (6) in terms of the intrinsic (or natural) resolution in frequency δ = 1/*T*. Substitution and the subsequent re-arrangement of the equation then give the following:

$$\frac{A^{2}}{H(\delta)} = \frac{\pi}{2}\Delta + \delta, \tag{7}$$

which is the form required to explain our method. For a range of values of δ, we estimate the ratio *A*^{2}/*H*(δ) of the most prominent radial mode in the spectrum (as explained in the next paragraph). A plot of *A*^{2}/*H*(δ) versus δ then yields data following a linear relationship. We fit a straight line to the data, and the intercept on the ordinate in principle provides an estimate of the linewidth, Δ. Evaluation of the spectrum at different δ is achieved by averaging the spectrum of the full time series over different numbers of bins *M* (thereby degrading the intrinsic resolution as required). If *T* is taken to be the effective length of the time series, this means that δ = *M*/*T*.

We estimate the ratio *A*^{2}/*H*(δ) as follows. We already have an estimate of *A*^{2} courtesy of the mode amplitude package in Section 3.4. To estimate the heights *H*(δ) in each *M*-bin-averaged spectrum we simply take the highest power spectral density in the range Δν/2 about ν_{max}. These estimates are only a proxy of the true, underlying *H*(δ), which means that to correctly estimate linewidths Δ, we must apply an empirical correction to the results. We found from simulations that a linear correction with both an offset and slope of 0.4 applied to the raw estimates of the linewidth is sufficient for this purpose.
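The straight-line fit can be sketched as follows. The sketch assumes the height relation *H* = 2*A*^{2}*T*/(π*T*Δ + 2), for which *A*^{2}/*H*(δ) = (π/2)Δ + δ, so that the intercept of the fit equals πΔ/2. The function name is hypothetical, and the empirical correction described above is not applied.

```python
import numpy as np


def linewidth_from_heights(delta, power_over_height):
    """Raw linewidth from a straight-line fit of A^2 / H(delta) against delta.

    Assumes A^2 / H = (pi / 2) * linewidth + delta, so the intercept of the
    fit is pi * linewidth / 2 and the slope should be close to 1.
    """
    slope, intercept = np.polyfit(delta, power_over_height, 1)
    return 2.0 * intercept / np.pi, slope


# Noise-free toy data for a mode with linewidth 1.5 muHz, evaluated at
# resolutions delta = M / T for several bin counts M.
true_width = 1.5
delta = np.array([0.1, 0.2, 0.4, 0.8])  # muHz
ratio = 0.5 * np.pi * true_width + delta
width, slope = linewidth_from_heights(delta, ratio)
print(width, slope)
```

With real data the heights are only proxies, which is why a slope differing from unity and a biased intercept are to be expected before the empirical correction.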

Our simple proxy of *H*(δ) may sometimes have been estimated from the most prominent ℓ= 1 mode (depending on the inclination of the star), when it is the height of the most prominent ℓ= 0 mode that we require. In our first version of the pipeline, we accept this potential uncertainty and note that its main effect will be to add some additional scatter to the results. Any bias is taken care of by the empirical correction above.

### 3.6 Individual frequencies

For the determination of individual frequencies, we used a Bayesian approach adapted from Broomhall et al. (2009) and references therein. We want to test whether the power at each frequency in the power spectrum could be the result of a component of a stochastically excited mode (*H*_{1} hypothesis) or due to noise (*H*_{0} hypothesis).

As explained by Appourchaux et al. (2009), we aim to compute the posterior probability of *H*_{0} [*p*(*H*_{0}|*x*)] given the observed data *x*, i.e.

[…]

[…] χ^{2} two degrees of freedom (d.o.f.) statistics of the power spectrum, the probability of observing *x* given that there is only noise, i.e. the probability of observing *x* if the *H*_{0} hypothesis is true, [*p*(*x*|*H*_{0})] is

$$p(x|H_{0}) = e^{-x}, \tag{9}$$

where *x* is the observed power divided by the background.

For the alternative hypothesis *H*_{1}, i.e. the probability of observing *x* given that there is signal, we assume that we do not know a priori the mode height *H*. Therefore, we assume that the height can be taken from a uniform distribution between 0 and *H*_{s} (see Appendix A for details on the determination of *H*_{s}). We can then compute the probability of observing *x* if the *H*_{1} hypothesis is true [*p*(*x*|*H*_{1})] as follows (Appourchaux et al. 2009):

$$p(x|H_{1}) = \frac{1}{H_{s}}\int_{0}^{H_{s}} \frac{1}{1 + H'}\, e^{-x/(1 + H')}\, {\rm d}H'. \tag{10}$$

[…] (ν_{final}) were computed using the parameter estimation

[…]

For the integration range, we use the frequency range for which we found that the posterior probability to find signal was larger than 68.27 per cent, i.e. 1σ in a Gaussian distribution. We also used this interval as the estimated error. We tested this error estimation by performing 1000 Monte Carlo simulations of one single stochastically excited mode with flat background noise. For each simulation, we computed the final frequency and its error. Then we expressed the offset between the computed frequency and input frequency in terms of its error. We see that for 77 per cent of the tests the offset is within 3σ and for 89 per cent of the tests it is within 5σ.
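The hypothesis test described in this section can be illustrated with the sketch below. It assumes χ^{2} two d.o.f. statistics, so that *p*(*x*|*H*_{0}) = e^{−*x*}, marginalises the mode height over a uniform prior between 0 and *H*_{s} for *H*_{1}, and combines the two with Bayes' theorem. The equal prior weight of 0.5 for the two hypotheses, the trapezoidal integration and the function names are assumptions; the choice of *H*_{s} (Appendix A) is not reproduced.

```python
import numpy as np


def likelihood_h0(x):
    """p(x | H0) for power normalised by the background (chi^2, 2 d.o.f.)."""
    return np.exp(-x)


def likelihood_h1(x, h_s, n_steps=2000):
    """p(x | H1) with the mode height uniform between 0 and h_s (trapezoidal rule)."""
    h = np.linspace(0.0, h_s, n_steps + 1)
    integrand = np.exp(-x / (1.0 + h)) / (1.0 + h)
    step = h[1] - h[0]
    integral = step * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return integral / h_s


def posterior_h0(x, h_s, prior_h0=0.5):
    """Posterior probability that the power x is due to noise only."""
    ratio = likelihood_h1(x, h_s) / likelihood_h0(x)
    return 1.0 / (1.0 + (1.0 - prior_h0) / prior_h0 * ratio)


for x in (0.0, 5.0, 20.0):
    print(x, posterior_h0(x, h_s=10.0))
```

A frequency bin would be attributed to signal when this posterior probability of *H*_{0} falls below the adopted threshold.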

In cases where more than three significant oscillation frequencies could be detected, we used these individual frequencies to compute the large separation from the autocorrelation of the frequencies.

#### 3.6.1 Small separations

We have investigated for how many stars we might be able to find the small separation (δν_{02}), i.e. for how many stars we could see more than two ridges in the échelle diagram. This was the case for less than 10 per cent of the stars. This low number is most likely caused in part by the fact that we set the threshold posterior probability at only 0.5 per cent, which reduces the false alarm rate, but also the number of identified frequencies. Therefore, for most main-sequence stars only two ridges are present in the échelle diagram. Because of the low percentage of stars for which we might be able to identify the small separation with the strict thresholds currently applied, we do not include such computation in the automated pipeline presented here. In a further analysis using peak-bagging techniques, δν_{02} will be obtained. These results will be presented by Fletcher et al. (in preparation).

## 4 RESULTS

We are able to detect oscillations in 260 out of the 353 artificial stars, i.e. about 74 per cent. For these stars, we estimated the oscillation parameters and background as described in the previous section, which we then compared with the input values used to create the artificial data.

### 4.1 Oscillation parameters

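In Figs 2 and 3, we compare our results for ν_{max} and Δν with the values computed from the input mass (*M*), radius (*R*) and effective temperature (*T*_{eff}), using the scaling relations (Kjeldsen & Bedding 1995):

$$\nu_{\max} = \frac{M/M_{\odot}}{(R/R_{\odot})^{2}\sqrt{T_{\rm eff}/T_{\rm eff,\odot}}}\,\nu_{\max\odot},$$

$$\Delta\nu = \sqrt{\frac{M/M_{\odot}}{(R/R_{\odot})^{3}}}\,\Delta\nu_{\odot}.$$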

The values obtained for both ν_{max} (Fig. 2) and Δν (Fig. 3) are in good agreement with the input values, for each of the implemented methods, although a slight overestimation is present in the determined ν_{max} values from Method I at higher frequencies where the height of the oscillations decreases (Chaplin et al. 2009b). For Method I, 54 per cent of our ν_{max} values agree within uncertainties with the input values, while this percentage increases to 97 per cent within three times the computed uncertainties. For Method II, we find that 37 per cent of our ν_{max} values agree with the input values within their uncertainties, which increases to 91 per cent agreement within three times the uncertainties. In 96 per cent of the stars for which we could detect oscillations, we found ν_{max} with both methods.
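For reference, the Kjeldsen & Bedding (1995) scaling relations used to compute the input ν_{max} and Δν can be evaluated as in the sketch below. The solar reference values are assumptions: ν_{max⊙} = 3100 μHz is the value quoted in Section 3.1, while Δν_⊙ = 134.9 μHz and *T*_{eff,⊙} = 5777 K are commonly adopted values that may differ from those used for the simulations.

```python
import math

NU_MAX_SUN = 3100.0  # muHz, value quoted in Section 3.1
DNU_SUN = 134.9      # muHz, assumed solar large separation
TEFF_SUN = 5777.0    # K, assumed solar effective temperature


def scaling_relations(mass, radius, teff):
    """nu_max and dnu (muHz) from mass and radius in solar units and teff in K."""
    nu_max = NU_MAX_SUN * mass / (radius ** 2 * math.sqrt(teff / TEFF_SUN))
    dnu = DNU_SUN * math.sqrt(mass / radius ** 3)
    return nu_max, dnu


# A subgiant-like example: solar mass and temperature, twice the solar radius.
print(scaling_relations(1.0, 2.0, 5777.0))
```

Doubling the radius at fixed mass and temperature lowers ν_{max} by a factor of 4 and Δν by a factor of 2√2, which is why subgiants oscillate at much lower frequencies than main-sequence stars.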

For Δν we found that 93 per cent, 90 per cent, 86 per cent and 97 per cent of our values agree with the input values within 5 per cent, for results computed with the weighted mean of the features in the PS⊗PS, the Bayesian probabilities in the PS⊗PS, autocorrelation of full oscillation frequency range and autocorrelation of individual frequencies, respectively. The uncertainties for Δν computed with the Bayesian probabilities in the PS⊗PS are larger and seem more realistic than for all other methods. For this method, 94 per cent of our values agree with the input within three times the uncertainties, while this is only 44 per cent, 20 per cent and 7 per cent for the other methods, i.e. weighted mean of the features in the PS⊗PS, autocorrelation of full oscillation frequency range and autocorrelation of individual frequencies, respectively. For the first three methods, we find Δν in all stars for which we detect oscillations. For Δν computed with the individual frequencies, we have results for 60 per cent of the stars. Due to the better error estimate, the results of the Bayesian probabilities in the PS⊗PS are most reliable. Despite their underestimated errors, the Δν values computed with the other methods are in more than 90 per cent of the cases compatible (within three times the uncertainties) with the Bayesian values.

In Fig. 4, our results for the maximum amplitude per radial oscillation mode are shown as a function of the input maximum amplitude per radial mode. For comparison, we also computed estimates of the maximum amplitude using the method of Kjeldsen et al. (2008). The results are consistent with the one-to-one relation, and for each method ∼90 per cent of our amplitude values are consistent with the input values within three times the computed uncertainties. Also, we computed the width of the frequency peaks in the power spectrum. The results are shown in Fig. 5. The input values of the linewidth follow the relation Δ∼*T*^{4}_{eff}, and we see in Fig. 5 that for stars brighter than 9 mag (black dots) our results for Δ are qualitatively in agreement with this relation. Good signal-to-noise ratio is required for this method, and for fainter stars the scatter in the results becomes considerably larger.

Four examples of background fits to power spectra with oscillations at different frequencies are shown in Fig. 6, all of which have a χ^{2} of the order of 1. We also compare in Fig. 7 our fitted values of the granulation parameters *p*_{gran} and τ_{gran} with the artificial input values. Here, we see that the results follow the one-to-one relation with the input value, but with non-negligible scatter.

Note that for stars where a detection was made, we were not always able to determine the oscillation parameters with all methods described in Section 3. For each parameter, we have at least one method that produces a result for all stars with detected oscillations, but for some methods we have results for fewer stars, down to 60 per cent of the stars with detected oscillations. The quoted percentages are always computed for stars with a result for the considered method.

### 4.2 (Non-)detections of oscillations

Next, we investigate empirically which parameters are of importance for the detection of oscillations in the data. First, we consider the apparent magnitude distribution of the stars with and without detected oscillations, see Fig. 8. As expected, the percentage of stars for which we can detect solar-like oscillations decreases for fainter stars.

We fitted the background for all stars, independent of whether we did or did not detect any oscillations. From a comparison of the distribution of the fitted parameters, we find that for stars in which we could not detect any oscillations the offset (*b* in equation 4) is on average larger than for stars in which we could detect oscillations, while the exponent *a* is typically lower. These distributions are shown in Fig. 9. These results are not unexpected since for stars for which we could not detect oscillations, the offset contains both noise and signal, while for stars with detected oscillations the offset mainly consists of noise. The latter can be seen in the bottom panel of Fig. 9, where we plot the offset as a function of the input ν_{max}. We indeed see that for stars with input ν_{max} > 3000 μHz for which we did not detect oscillations, the offset is higher than for stars for which we could detect oscillations in this frequency range. The exponent *a* influences the slope of the granulation in a log–log plot of a power spectrum. An increase in the offset will decrease the slope and therefore the exponent *a*. From these distributions, it might be possible to obtain upper limits on some oscillation parameters. We consider such an investigation beyond the scope of this paper.

## 5 DISCUSSION AND CONCLUSIONS

With the methods described in Section 3, we could detect solar-like oscillations and their parameters in 260 out of 353 artificial main-sequence stars and subgiants, and individual frequencies in 154 stars (the latter are not discussed further here). In general, we have tried to be very cautious to reduce the number of false detections. Special care is taken in the identification of the oscillation frequency interval, as an incorrectly identified frequency range will imply misidentifications for all oscillation parameters and the background fitting. Furthermore, parameters such as ν_{max}, Δν and amplitude per radial mode are determined with two or more (independent) methods.

The input values for ν_{max} and Δν are reproduced by our analyses for the majority of stars. Also, for the amplitudes per radial mode, our values are in agreement with the input values.

The widths and thus the lifetimes of the modes can be determined with reasonable accuracy for Kepler stars brighter than 9 mag. Our method does not contain detailed fitting, nor does it take into account that lifetimes vary for modes of different degrees. Nevertheless, for the bright stars we do find values consistent with the input relation Δ∼*T*^{4}_{eff} (Chaplin et al. 2009b).

An accurate determination of the value of the background at the oscillation frequencies is important for two reasons. First, the background level is taken into account in the extraction of oscillation parameters such as the amplitude per radial mode, and, secondly, it provides information on the power and time-scales of atmospheric parameters. From the fact that we can obtain oscillation parameters which are consistent with the input parameters, we can on the one hand infer that our background estimate in the oscillation frequency range is accurate enough to determine oscillation parameters. On the other hand, we still see scatter in the determined time-scale and power of the granulation around the input values. The scatter in the granulation parameters does not mean per se that the background level in the frequency interval is uncertain, as we fit a function with six parameters, but the parameters should be treated with caution when using them for further investigations of activity and granulation.

In terms of our sensitivity to detect oscillations, we found clear evidence that the percentage of stars for which we can detect oscillations decreases for fainter stars. Also, we found evidence that it is harder to detect solar-like oscillations in cooler (*T*_{eff} < 5500 K) main-sequence stars than in the hotter (*T*_{eff} > 5500 K) ones. This is because in the simulations (as in real data) the pulsation amplitudes scale with luminosity, with a weaker dependence on temperature.

In conclusion, the analysis tools compiled into the pipeline presented here proved able to identify oscillations for a large fraction of artificial main-sequence stars. For the majority of these stars, we determine oscillation parameters within 3σ of the input values. Such pipelines will be important for performing asteroseismic analyses of the many stars we expect to become available from Kepler.

We are grateful to the International Space Science Institute for support provided by a workshop programme award. This work has also been supported by the European Helio- and Asteroseismology Network, a major international collaboration funded by the European Commission's Sixth Framework Programme. SH, WJC, AMB, YPE, STF and RN acknowledge the support by Science and Technology Facilities Council.

## Appendix

### APPENDIX A: METHODOLOGY EXTENSION

Here, we provide additional (mathematical) information on the methodology described in Section 3, including determination of errors.

**Probability of peaks in PS⊗PS** The probability of the presence of equidistant frequency peaks in the PS⊗PS being due to noise is computed as follows. We compute the probability (*p*_{1}) that a random variable from a six d.o.f. χ^{2} distribution is larger than the average height of the three peaks at Δν/2, Δν/4 and Δν/6 in the PS⊗PS. Each of these peaks has two d.o.f., hence the six d.o.f.

Next, we compute the probability, *P*, of this occurring by chance at least once over the full *N* bins of the PS⊗PS. We must also take account of the fact that in practice we oversample the PS⊗PS by a factor of 10, so not all bins are independent. The probability that the features are *not* due to noise is then just 1 − *P*.

A similar procedure is used to compute the probability of only one peak in the PS⊗PS, such as we use in the computation of the large separation. In these cases, *p*_{1} is computed as the probability that a random variable from a two d.o.f. χ^{2} distribution is larger than the height of the considered peak in the PS⊗PS.
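The two steps of this calculation can be illustrated with the sketch below. It rests on two assumptions that are not taken from the text: the peak statistic is expressed in units in which it follows a standard χ^{2} distribution, and the oversampling is accounted for by treating *N*/10 bins as independent, so that the chance probability takes the standard form 1 − (1 − *p*_{1})^{*N*/10}. The function names are hypothetical.

```python
import math


def chi2_sf_even_dof(x, dof):
    """P(X > x) for a chi-squared variable with an even number of d.o.f."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed form for even dof: exp(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def chance_probability(p1, n_bins, oversampling=10):
    """Probability of at least one chance occurrence over the spectrum.

    ASSUMPTION: n_bins / oversampling bins are treated as independent.
    """
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie in [0, 1]")
    if p1 == 1.0:
        return 1.0
    n_independent = n_bins / oversampling
    # 1 - (1 - p1)^n, written with log1p/expm1 to stay accurate for tiny p1
    return -math.expm1(n_independent * math.log1p(-p1))


# Three-peak case (six d.o.f.) and single-peak case (two d.o.f.), made-up heights.
p1_three_peaks = chi2_sf_even_dof(40.0, 6)
p1_single_peak = chi2_sf_even_dof(30.0, 2)
P = chance_probability(p1_three_peaks, n_bins=50000)
prob_not_noise = 1.0 - P
```

Only even numbers of d.o.f. occur here, so the closed-form survival function avoids any dependency outside the standard library.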

**Gradient of the large spacing** Although results are not discussed in this paper, we can estimate δΔν/δ*n* from the stretching as follows. First, we introduce Δν_{0}, the average large spacing over the frequency or *n* range of interest. We then estimate δΔν/δν by differentiating equation (3): the second term on the right-hand side is differentiated by substitution and, after including the differential of the first term on the right-hand side, we obtain Δν as a function of frequency. Finally, we find the change in Δν as a function of *n*. The best-fitting *c* = *js*_{max} therefore provides a direct estimate of δΔν/δ*n*.

**Standard deviation of grouped data** The standard deviation of grouped data is used to compute errors in Δν computed from the power-weighted centroids in the PS⊗PS, where we interpret each feature in the PS⊗PS as composed of a number of bins with a certain height (*f*) and midpoint (*x*). The same approach is applied to ν_{max} from the frequency-binned oscillation power as computed in Method I for the amplitudes per radial mode. Here, the total power and central frequency of each bin are interpreted as *f* and *x*, respectively.
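As an illustration, the standard deviation of grouped data can be computed as in the sketch below, with the bin heights *f* acting as weights on the bin midpoints *x*. The normalization by the sum of the heights (the population form) is an assumption; the text above does not fix this choice, and the function name is hypothetical.

```python
import math


def grouped_mean_std(heights, midpoints):
    """Weighted mean and standard deviation of grouped data.

    heights   -- bin heights f (e.g. power in each bin), used as weights
    midpoints -- bin midpoints x
    ASSUMPTION: the variance is normalized by sum(f) (population form).
    """
    if len(heights) != len(midpoints):
        raise ValueError("heights and midpoints must have the same length")
    total = sum(heights)
    if total <= 0:
        raise ValueError("sum of heights must be positive")
    mean = sum(f * x for f, x in zip(heights, midpoints)) / total
    variance = sum(f * (x - mean) ** 2 for f, x in zip(heights, midpoints)) / total
    return mean, math.sqrt(variance)


# Made-up feature: three bins centred on 10 with most of the power in the middle.
centroid, sigma = grouped_mean_std([1.0, 2.0, 1.0], [9.0, 10.0, 11.0])
```

The first returned value is the power-weighted centroid, so the same call yields both the estimate and its grouped-data scatter.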

**Bayesian signal hypothesis** For the computation of the probability of observing *x* if the *H*_{1} hypothesis is true, i.e. *p*(*x*|*H*_{1}) (equation 10), we need to integrate over an interval between 0 and *H*_{s}. To determine *H*_{s}, we have smoothed the power spectrum over *S* microhertz. *H*_{s} then equals the maximum height of the smoothed spectrum minus the mean background noise level. To determine the optimum *S*, we performed Monte Carlo simulations and determined the false detection rates. Spectra were generated using the asteroFLAG code (Chaplin et al. 2008) for 1000 different stars with random inclinations. We then used different values of *S* to determine *H*_{s} and from the obtained candidate frequencies, we also determine the ratio of the number of candidates that are actually modes to the total number of mode candidates. The higher this ratio the lower the proportion of false detections. We also determined the total number of modes that could be detected in each case to ensure that the method was producing a reasonable number of mode candidates. Note that a mode was counted as a detection if it lay within two linewidths of the input frequency and if the posterior probability was less than 0.005. We repeated the simulations for modes with different widths including 0.3, 1.0, 1.7, 2.4, 3.1 and 3.8 times solar widths for oscillations in the range 2000–4500 μHz. The optimum value of *S* was found to be ∼10 μHz, even for the smallest linewidth. Similar simulations were performed for stars whose oscillations lay in the range 200–450 μHz. It was found that for frequencies in this oscillation range, it was more appropriate to smooth over a narrower *S* to determine *H*_{s}. If the oscillation frequencies are <450 μHz, we determined *H*_{s} by smoothing over *S* = 1 μHz.
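The determination of *H*_{s} can be sketched as follows. The text does not specify the smoothing kernel, so a boxcar (running mean) is assumed here; the frequency grid is assumed to be uniform, and the mean background level is assumed to be supplied by a separate background fit. All function and argument names are hypothetical.

```python
def boxcar_smooth(power, width_bins):
    """Running mean over width_bins bins; the window is truncated at the edges."""
    n = len(power)
    half = width_bins // 2
    prefix = [0.0]
    for p in power:
        prefix.append(prefix[-1] + p)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


def signal_height(freq_uhz, power, background, smooth_uhz):
    """H_s: maximum of the smoothed spectrum minus the mean background level.

    ASSUMPTIONS: uniform frequency grid in microhertz, boxcar kernel of
    width smooth_uhz (S), background level provided by the caller.
    """
    df = freq_uhz[1] - freq_uhz[0]
    width = max(1, round(smooth_uhz / df))
    if width % 2 == 0:
        width += 1  # keep the window symmetric about each bin
    return max(boxcar_smooth(power, width)) - background


# Toy spectrum with 1 microhertz resolution and a single narrow peak.
freq = [0.0, 1.0, 2.0, 3.0, 4.0]
power = [1.0, 1.0, 7.0, 1.0, 1.0]
h_s = signal_height(freq, power, background=1.0, smooth_uhz=3.0)
```

In this picture, *S* would be set to about 10 μHz for oscillations in the 2000–4500 μHz range and to 1 μHz for oscillation frequencies below 450 μHz, following the simulations described above.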