On the free-precession candidate PSR B1828-11: Evidence for increasing deformation

We observe that the periodic variations in spin-down rate and beam-width of the radio pulsar PSR B1828-11 are getting faster. In the context of a free precession model, this corresponds to a decrease in the precession period $P_{\mathrm{fp}}$. We investigate how a precession model can account for such a decrease in $P_{\mathrm{fp}}$, in terms of an increase over time in the absolute biaxial deformation ($|\epsilon_{\mathrm{p}}|{\sim}10^{-8}$) of this pulsar. We perform a Bayesian model comparison against the 'base' precession model (with constant $\epsilon_{\mathrm{p}}$) developed in Ashton et al (2016), and we obtain decisive odds in favour of a time-varying deformation. We study two types of time-variation: (i) a linear drift with a posterior estimate of $\dot{\epsilon}_{\mathrm{p}}{\sim}10^{-18}\,\mathrm{s}^{-1}$ and odds of $10^{75}$ compared to the base-model, and (ii) $N$ discrete positive jumps in $\epsilon_{\mathrm{p}}$ with very similar odds to the linear $\epsilon_{\mathrm{p}}$-drift model. The physical mechanism explaining this behaviour is unclear, but the observation could provide a crucial probe of the interior physics of neutron stars. We also place an upper bound on the rate at which the precessional motion is damped, and translate this into a bound on a dissipative mutual friction-type coupling between the star's crust and core.


INTRODUCTION
The ∼500 day periodicity observed in the timing properties and pulse profile of PSR B1828-11 provides a unique opportunity to test neutron star physics. The first model, proposed by Bailes et al. (1993), consisted of a system of planets orbiting the pulsar. This model later lost favour, after Stairs et al. (2000) observed correlated modulation in the timing properties and beam-shape (the ratio of the heights of two fitted integrated pulse profiles). As such, a planetary model would require at least two orbiting planets with orbital frequencies that differ by a factor of 2 (see for example Beaugé et al. (2003)), while both interact with the magnetosphere over distances comparable to the Earth's orbit.
Instead, Stairs et al. (2000) proposed that the star was undergoing free precession, corresponding to a star that is deformed, with its spin-vector and angular momentum vectors misaligned. Subsequent modelling by Jones & Andersson (2001), Link & Epstein (2001) and Akgün et al. (2006) refined the precessional description, examining how the precessional motion served to amplify the modulations in spin-E-mail: gregory.ashton@ligo.org down rate, providing some quantitative detail to the precessional interpretation.
The existence of long period free precession has implications for the interaction between the superfluid, superconducting and 'normal' parts of the star. As shown by Shaham (1977), a pinned superfluid, as typically invoked to explain pulsar glitches, would result in a rather short free precession period, so that the observed long period can be used to place upper limits on the amount of pinned vorticity in PSR B1828-11; see Jones & Andersson (2001), Link & Epstein (2001) and Link & Cutler (2002). Furthermore, the interaction between neutron vortices and magnetic flux tubes in the stellar core is likely to be highly dissipative, which led to Link (2003) drawing the interesting conclusion that the persistence of the free precession required that neutron superfluidity and proton type II superconductivity coexist nowhere in the star, or else that the superconductivity is of type I. Additionally, Wasserman (2003) has argued that a sufficiently strong magnetic deformation of the stellar structure might force the star to undergo free precession. The issue of whether or not PSR B1828-11 really is precessing is therefore very important, in terms of its microphysical implications.
Motivated by the existence of periodic nulling pulsars (such as PSR B1931+24 (Kramer et al. 2006)), Lyne et al. (2010) posited an alternative explanation for the modulations seen in PSR B1828-11. Namely, that the system is undergoing magnetospheric switching. In this model, the magnetosphere abruptly 'state changes on a fast time scale, but can then be stable for many months or years before undergoing another fast change' (Lyne et al. 2010). This cycle periodically repeats according to some clock and produces correlated changes in the timing properties and pulse profile due to changes in the electromagnetic torque and flow of charged particles. However, to explain the double-peaked spin-down rate of PSR B1828-11, the model requires a complicated switching pattern such as that proposed by Perera et al. (2015).
In addition to the long timescale modulations, PSR B1828-11 is also known to undergo short timescale (over periods of a few hours) switching in its beam-shape, first demonstrated in Stairs et al. (2003), and illustrated further by Lyne (2013). In the context of magnetospheric switching, the natural explanation is that, rather than remaining in a single state for a prolonged period of time, the magnetosphere undergoes a random process of flickering between two states.
However, the magnetospheric switching model does not provide an explanation of why the modulations should be quasi-periodic. To remedy this, Jones (2012) proposed a model in which magnetospheric switching did indeed take place, but precession provided the necessary clock mechanism, with the energies available to accelerate particles in the magnetosphere being a function of the precessional phase. If there exists some critical energy threshold in the magnetosphere, the precession model could then lead to sharp magnetospheric transitions, with the magnetosphere being more likely to be in a given state at some precessional phases than others. More generally, Cordes (2013) has argued that a component of pulsar timing noise can be attributed to pulsars making random transitions between two or more states, with a periodic bias active in some, producing the observed quasi-periodicities.
It should also be noted that Akgün et al. (2006) have argued that short timescale variations do not preclude the pure precession model (i.e. precession without any magnetospheric switching) as a patchy emission region can also produce short term variations in the beam-shape.
In an attempt to shed further light on the problem, in Ashton et al. (2016) (hereafter referred to as Paper I) we performed a Bayesian model comparison using the Lyne et al. (2010) spin-down rate and beam-width data (W10, the width of the pulse at 10% of the maximum) for PSR B1828-11. We compared a switching model to a precession model (neglecting the short term flickering data and focusing only on the long term evolution), and found odds of 10 2.7±0.5 ('modest evidence') in favour of the precession model.
In this paper we will study what further inferences can be made based on simple some generalisations of the precession model. We use the same data set (spanning 5280 days between MJD 49710 and MJD 54980) as in Paper I, which was kindly provided by Andrew Lyne and originally published in Lyne et al. (2010). Specifically, we will look to see if there is any evidence for time evolution in the amplitude of the precession, as measured by the 'wobble angle' (see section 4 below), or for evolution in the modulation period of the variations in spin-down and beam-width. That the amplitude of the precession might evolve is natural, as one would expect dissipative processes within the star to damp the precession (Sedrakian et al. 1999). That the modulation period might change is less natural, but, as we describe in section 2, the data clearly favour such an interpretation, so this needs to be included on the model. The structure of this paper is as follows. In section 2 we provide a model-independent demonstration that the modulation period of the spin-down rate of PSR B1828-11 is decreasing. In section 3 we describe our Bayesian methodology. In section 4 we describe our 'base model' that other models will be compared to. In section 5 and 6 we describe extensions of our base model where the wobble angle and deformation, respectively, are allowed to vary (linearly) in time, while in section 7 we allow both to parameters to vary. In section 8 we consider a model where the deformation evolves by a series of discrete jumps, rather than varying continuously. In section 9 and 10 we provide some astrophysical interpretation of our results, and conclude in section 11 with some discussion of implications of our work, and other possible lines of attack.
In a separate paper ) we discuss consistency requirements between the free precession model of PSR B1828-11 explored here and the glitch that this pulsar underwent in 2009 (Espinoza et al. (2011) and www.jb.man. ac.uk/~pulsar/glitches/gTable.html).

MODEL-INDEPENDENT EVIDENCE FOR A DECREASING MODULATION PERIOD
The modulation period of PSR B1828-11 has so far been assumed constant. However, we now show in a modelindependent way that the period of the spin-down-rate modulations in PSR B1828-11 is getting shorter. Let us define ∆ν as the spin-down rate residual : the result of removing a first-order polynomial from the spin-down rate (which can be seen in Figure 1 of Paper I). This discards information on the average spin-down rate and the secondorder spin-down rateν leaving only the periodic modulations. To calculate the period of modulations, we will apply a Lomb-Scargle periodogram to the spin-down rate residual, which estimates the spectrum of periods by a least-squares fit of sinusoids (in particular, we use the scipy  implementation of the Townsend (2010) algorithm). In Figure 1A we show the resulting estimated spectrum for the entire data period, which agrees with the equivalent result presented in the additional material of Lyne et al. (2010). Two dominant modes are present in the spectrum: a major mode at ∼ 500 days and a minor mode at ∼ 250 days.
To study how this spectrum varies with time, we apply the periodogram in a sliding window across the spin-down rate residual data. Because the data is unevenly sampled, it is not possible to use a fixed window size, but the average window size is 2058 days with a standard deviation of 31 days. This duration is sufficiently long to always include several modulation cycles, but short enough to detect variations over the total data span. To visualise the result, in Figure 1B we stack the periodograms together and plot the spectral density as a function of the mid-point of each time window. This figure shows that the modulation period P mod appears to be decreasing over time. Taking the major mode from the first and last sliding window we find that over a time span of 3200 days the modulation period decreased from 505 to 470 days, corresponding to a rate of change ofṖ mod ≈ −0.01 s/s. We note that this estimate is inherently imprecise due to the fact that the Lomb-Scargle method is fitting a constant period sinusoid to data which is best described by a sinusoid with changing period. Nevertheless, it does provide a rough estimate. To underline the significance of this observedṖ mod , we found the bestfit for a phenomological fixed-period sinusoidal model -two sinusoids at P mod and P mod /2 with independent amplitudes and a relative phase -to the spin-down rate residual. We then generated 10 4 realisations of central Gaussian noise with a standard deviation of 4.3 × 10 −16 s −2 (based on the standard-deviation of the residual after removing the bestfit sinusoidal model). Adding the best-fit signal to each noise realisation, we apply our Lomb-Scargle process to calculate the change in period (due purely to the noise fluctuations) and find that the maximum |Ṗ mod | < 10 −7 . This illustrates that the observedṖ mod ∼ −0.01 for PSR B1828-11 is highly unlikely to be due to Gaussian noise fluctuations alone.
This shortening of the modulation period provides a new observational feature that needs to be accommodated by any model trying to describe this data. For example in the planetary hypothesis this would require that the two planets maintain orbital resonance while inspiralling. For the magnetospheric switching model proposed by Perera et al. (2015) and further studied in Paper I, it is unclear how this could be incorporated, given the purely phenomenological nature of this model. In the future it would interesting to understand this observation in the context of other models; in this work we explore how this feature is accommodated within the precession model of Paper I.

DATA ANALYSIS METHODOLOGY
In Paper I we performed a Bayesian model comparison between precession (with non-circular beam geometry) and magnetospheric switching for the observed long-term variations in spin-down rate and beam-width of PSR B1828-11. Because of the purely phenomenological nature of the switching model, no physical priors on its parameters were readily available and we therefore resorted to a two-step approach: first we performed parameter-estimation for both models on the spin-down data alone, by using wide flat priors for both models. Then we used the resulting posteriors as priors for a model comparison on the beam-width data. This yielded odds of 10 2.7±0.5 in favour of the precession model.
In this work, we focus on physical generalisations of the precession model and compare these to the 'base' precession model. The competing generalised precession models share the parameters of the base-model, but extend them with additional physical parameters that are allowed to be nonzero. The base-model priors can be thought of as effectively expressing certainty for these additional parameters to vanish exactly, while the generalised models relax this restriction and instead use plausible nonzero priors for them. This allows us to directly perform model comparison between base and generalised models on the full data set comprising both spin-down and beam-width data.
We define the data D as N observedν i values and M observed W j 10 values. We denote as σν and σW 10 the (assumed Gaussian) noise level for each type of observation. The likelihood for the data (see Section 2 of Paper I) given by model M with model parameters λ is then where ϑ = [λ, σν , σW 10 ] is the full set of parameters. To approximate the posterior density of these parameters, we use the Foreman-Mackey et al. (2013) implementation of the affine-invariant parallel-tempered MCMC sampler (Goodman & Weare 2010); the exact methodology is described in Appendix A of Paper I. We then use thermodynamic integration (Goggans & Chi 2004) to estimate the marginal likelihood of a given model (see Section 4 of Paper I) and hence the odds-ratio between models setting the prior ratio to unity. We use the posterior odds between models to quantify how much, if at all, each extension improves the power of the model to describe the data, compared to the base-model. This depends on both the improvement to fit the data as well as on the respective prior volume of the extension parameters, which provides an effective 'Occam factor' against the extension.

THE PRECESSION BASE-MODEL
We begin by introducing our base-model, the precession model based on the treatment given in Paper I. It is against this which the extended models will be compared.

Defining the base-model
We consider a biaxial star, spinning down by electromagnetic torque from the magnetic dipole m, which forms an angle χ with the symmetry axis of the star. Following Jones & Andersson (2001), we define θ as the wobble angle between the symmetry axis and the angular momentum vector. Precession produces modulations with period 1 P fp in the rotation of the magnetic axis. As a result, the spin-down rate and beam-width are modulated on the free precession period.
Combining precession with a generalisation of the vacuum dipole torque and allowing for an arbitrary braking index n, we show in Appendix A that the spin-down rate, in the small-θ limit, is given bẏ where [ν0,ν0] are the fixed frequency derivatives defined at a reference time t ref and ψ is one the three Euler angles describing the orientation of the star (see for example Landau & Lifshitz (1969)). We note that Equation (2) is equivalent to the results of Jones & Andersson (2001) and Link & Epstein (2001), although these previous works fixed the braking index to n=3. If the spin-down age is much longer than the precession period P fp , we have that in which we have implicitly defined the precession period as where ν(t) is the instantaneous spin-frequency at time t, and where ∆I d is the stellar deformation caused by elastic/magnetic strains, while Iprec is that part of the star that participates in the free precession. We can expect Icrust < Iprec < I * ; see Jones & Andersson (2001) for details. Formally, the spin frequency ν(t) is the integral of Equation (2). However, the sinusoidal variations due to precession will average to zero over an integer number of cycles. Therefore, we will neglect the residual modulations, which will have a negligible effect on the precession period, and approximate the spin frequency in Equation (4) by where ν0 is the fixed frequency of the star at t ref . We will define t ref at the epoch given in the ATNF (Manchester et al. 2.46887171470 ± 7 × 10 −11 Hż ν 0 −3.658728 × 10 −13 ± 5 × 10 −19 Hz/s ν 0 8.72 × 10 −25 ± 9 × 10 −27 Hz/s 2 τage = −ν 0 /ν 0 1.07 × 10 5 yrs n =ν 0 ν 0 /ν 2 0 16.08 ± 0.7 Distance 3.58 kpc 2005) entry for PSR B1828-11. This reference time, the frequency and its derivatives, and other useful quantities are listed in Table 1.
The pulse beam-width W10 is defined as the width of the pulse at 10% of the observed peak intensity. This beamwidth depends on the motion of the dipole m, how the intensity of emission varies across the beam, and on the relative position of the observer and the beam. The angle Θ between the dipole m and the angular momentum J can be expressed as which describes the polar motion of m in the inertial frame (Bisnovatyi-Kogan et al. 1990;Jones & Andersson 2001). Let ι denote the angle between the observing direction and J, and so the latitudinal separation between observer and beam is given simply by ∆Θ(t) = Θ(t) − ι.
In Paper I we first considered an emission model where the intensity of the emitted radiation is circularly symmetric around the dipole m with a radial Gaussian fall-off. However, this simple model is unable to account for the observed variations in W10, and we therefore extended the model to allow for the longitudinal width of the Gaussian describing the intensity to depend on the latitude ∆Θ(t) of the cut made through the beam; this was found to produce good agreement with observations (similar conclusions have previously been obtained by Link & Epstein (2001)). This results in a beam-width expression of the form where ρ 0 2 is the width of the Gaussian intensity at ∆Θ = 0 and ρ 2 describes the variation in intensity with ∆Θ; see Paper I. Our formulation of the base-model is now complete: Equation (2) is the base spin-down model and Equation (8) is the base beam-width model. This formulation of the base precession model differs from that used in Paper I in two ways. First, in Paper I, P fp was a constant model parameter. But in Equation (4), we now express the precession period P fp in terms of the fundamental model parameters: the instantaneous spin-frequency ν(t), wobble angle θ, and the deformation p. While this change of parameterisation provides a more complete description (in that it includes the time evolution of P fp with ν(t)), it was found to produce no significant change in the fit. Second, the sign of the first term of Equation (3) was positive in Eqn. (16) of Paper I, but is now negative; this change amounts to a redefinition of P fp which was done such that for an oblate star, p and P fp are both positive, while for a prolate star both these quantities are formally negative. As the spin-down rate and beam-width of the precession model (Equation (2) and Equation (8) respectively) are invariant to this change of sign (modulo addition of π to ψ0), the redefinition of P fp makes no substantial difference to the model. The base-model and all extensions considered in this work are subject to two symmetries which are important when interpreting our results. First, as a consequence of the invariant nature of the spin-down rate and beam-width to the sign of p, the data cannot fix the overall sign of p. We restrict this symmetry by choosing p > 0 in the prior, but we note that solutions where p → − p are equally valid. Second, it was noted by Arzamasskiy et al. (2015) that the spin-down rate in the precession model is symmetric under the substitution θ ↔ χ (we discuss how this can be derived for Equation (2) in Appendix A); in our model, this is also true for the beam-width. For both the spin-down and beamwidth models, this is fundamentally due to the symmetry of χ and θ in Equation (7). In our analysis, we consider only the 'large χ' model (as defined by Arzamasskiy et al. (2015)) and restrict this symmetry in the derivation by assuming that θ 1 and in the choice of prior. But, rederiving the equations with χ 1 instead results in Equation (2) with θ ↔ χ. Therefore, all models and parameter estimation considered in this work can equally be applied to the 'small χ' model by interchanging χ and θ. These symmetries may be important to consider when relating the model extension to physical theories.

Applying the base-model to the data
The base-model consists of the spin-down and beam-width predictions given in Equation (2) and Equation (8). Before applying these to the data, we first define our priors. Since we will use the same priors for these parameters when considering the extended models in the following sections, their prior volume won't have an impact on the model-comparison odds.
The full set of priors are listed in Table 2 and we now describe our choices in detail. For the spin frequency and frequency derivatives we apply astrophysical priors based on data from the ATNF database (which is listed in Table 1). Specifically, we use normal distributions with mean and standard-deviation given by the ATNF values. For the deformation p we use the absolute value of a normal distribution as prior, ensuring our gauge choice of p ≥ 0. The normal distribution has zero mean, and a standard deviation of 10 −8 , the approximate known value of p (Paper I). For the angles θ and χ we restrict their domain to solutions where the wobble angle θ is small while the magnetic inclination χ is close to orthogonal (the 'large χ' model, for more details see Appendix A)). The beam-width parameters (ρ 0 2 and ρ 2 ) use priors from Paper I, which were chosen to give a range of beam-widths up to 10% of the period and allow for some non-circularity. Finally the phase is given a uniform prior over its domain and we use uniform priors for σν and σW 10 from a crude estimate of the data.
We run MCMC simulations applying the base-model to the data under these priors and check that they converge and properly sample the posterior. In Figure 2 we show the spin-down and beam-width data together with the maxi- Table 2. Prior distributions and a posterior distribution summary for the base-model parameters.

Prior
Posterior median ± s.d. Units Comparison between the base-model (solid line) using maximum-posterior parameter estimates (MPE) and the observed spin-down and beam-width data (black dots). The shaded region indicates 1 σ of the estimated noise level in the spin-down and beam-width data, respectively. mum posterior estimate (MPE) solution of the model, i.e. using the parameters with the highest posterior probability. The samples from the converged MCMC chains are used to estimate the posterior distributions, which we find to be Gaussian-like, and which we summarise in the second column of Table 2 by their median and standard deviation. Compared to Paper I this base-model already contains one model extension: allowing for variation in P fp due to ν(t), as seen in Equation (4). However, this does not make any appreciable difference to the result in that there is no noticeable difference between the two panels of Figure 2 and Figure 7B and Figure 11B of Paper I. Furthermore, this extension does not explain the observed changing modulation period discussed in section 2. In order to see this quantitatively, we expand Equation (4) to first order as Sinceν0 < 0 this produces an increasing precession period, which over the observation span produces a fractional change in precession period of ∼ 7×10 −5 . Hence, the effect of the spin-down is too small and of the wrong sign to explain the observations of section 2. From Equation (4) we see that there are two further possible ways that P fp can evolve: either the wobble angle θ or the deformation p must evolve (or both). In the following sections, we will consider these possibilities in turn and evaluate the improvements in the power of the respective model to describe the data by computing their odds compared to the base-model.

SECULAR EVOLUTION OF THE WOBBLE ANGLE: THEθ-MODEL
There are two reasons for allowing a secular evolution of the precession wobble angle. Firstly, from Equation (4) we see that such an evolution could potentially drive a change in the precession period explaining the results of section 2. However, simple estimates show that the required rate of variation in θ is much too large to be consistent with the observations; we give such arguments in subsection 5.1 below. Secondly, and perhaps more fundamentally, in the precessional interpretation, dissipative processes are expected to exist and should damp the wobble angle, which would provide insights into the crust-core coupling (see for example Sedrakian et al. (1999) and Levin & D'Angelo (2004)). We model this in the simplest way by assuming that θ changes linearly in time as The base-model spin-down rate of Equation (2) was derived under the assumption that θ is constant. However, when rederiving this expression with an evolving θ according to Equation (10), we find that (to first order) the expression remains valid with the simple substitution θ → θ(t).
5.1 Can a changing θ explain the observed decrease in precession period?
Using the following simple argument, we can see that a nonzeroθ cannot consistently explain the observed decrease in precession period ofṖ fp ≈ −0.01 s/s found in section 2.
Taking the time-derivative of Equation (4) with θ = θ(t) (and dropping a negligible contribution P fp /τage ∼ 10 −5 s/s toṖ fp ) we can estimate the requiredθ aṡ where we used the base-model posterior estimates from Table 2 for θ and for P fp (these values are derived assuming thatθ = 0, however, as shown later in Table 3 they are consistent with those found when this assumption is relaxed). Similarly, with the estimate of Equation (11) the predicted relative change in the spin-down modulation amplitude from Equation (2) over the observation period of T ≈ 5000 days would amount tȯ θT θ ≈ −46.8 . This level of change in θ is inconsistent with the observed spin-down variations, which are well described by a model with an approximately constant θ (e.g. see Figure 2). We can therefore conclude that changes in θ are unable to explain the decrease in modulation period. Fundamentally, this stems from the fact that the dependence of the modulation amplitude on θ is ∝ θ, while the dependence of P fp is ∝ 1 + θ 2 /2 for θ 1.

Applying theθ-model to the data
We choose a weakly-informative prior for the additional model parameterθ: a central normal distribution with standard-deviation of 2.2 × 10 −10 rad/s, which is the value one would get ifθ ∼ 2θ/T , so effectively this allows θ to change by twice its magnitude over the observation time T . Using such a wide prior allows us to be confident that the posterior upper limit onθ will be informed by the data and not the result of an overly-constrained prior. The resulting posteriors for θ andθ are shown in Figure 3 and the posteriors for all model parameters are summarised in Table 3 alongside the priors (which are identical to those of the base-model).
The θ posterior is found to be essentially unchanged with respect to the base-model. Theθ posterior shows a substantial amount of 'shrinkage' compared to its prior range, but is fully consistent withθ = 0 and therefore provides no evidence that θ is actually changing. Nevertheless, we can use this to place constraints on the timescale of θ-changes by defining τ θ ≡ |θ/θ| and using the samples from Figure 3 to estimate the posterior distribution for τ θ , which is shown in Figure 4. This figure shows that there is little support for variation timescales below ∼ 100 years (confirming that the required timescale for τ θ to explain the changing modulation Table 3. Prior distributions and a posterior distribution summary for theθ-model parameters.

Prior
Posterior median ± s.d. Units period given in Equation (12) is too short). The distribution has a long tail, allowing for much longer timescales. The median of the distribution is 307.7 years and we can place a 95% credible lower limit of τ θ > 114.3 years. The odds between theθ-model compared to the base-model are found as 10 −1.70±1.39 , i.e. weak evidence against this extension. This shows the effect of the built-in Bayesian 'Occam factor': the extension of allowingθ = 0 (which can only improve the fit to the data) does not provide a sufficient improvement in likelihood compared to the increase in prior volume.

SECULAR EVOLUTION OF THE DEFORMATION: THE˙ p-MODEL
After ruling out variations in ν and θ in the previous sections as the cause for the observed level ofṖ fp , we see from Equation (4) that this leaves only variations in the deformation p as a possible explanation. In this section and section 8, we consider two distinct types of time-evolution in p: firstly the˙ p-model, a slow continuous change (approximated by the linear term) in p, and then the ∆ p-model, a series of distinct 'jumps' in p. These are just two possible phenomological models which are not founded in any physical theory, instead they are chosen simply to model two distinctive behaviours.

Defining the˙ p-model
We consider the simplest continuously changing deformation model by including a linear term (which also describes a larger class of sufficiently-slow continuous change in p): We will discuss some potential physical mechanisms for such a secular change in section 10. Allowing for a time-varying p(t) in Equation (4) and assuming this accounts for the majority of the change in P fp , we obtaiṅ where we have defined the characteristic timescale τ for the rate of change in p. Given that P fp is decreasing with time (c.f. section 2), for p > 0 this implies˙ p > 0, while for p < 0 this would correspond to˙ p < 0. As previously mentioned, we are unable to determine the sign of p from our current precession model, but in either case the magnitude of the deformation has to be increasing, i.e. d| p|/dt > 0, in order to account for the observed decrease in P fp .
From Equation (15) we can estimate the required˙ p for the observedṖ fp ≈ −0.01 s/s as found in section 2, which yields˙ p ≈ 2 × 10 −18 s −1 . We use this as the scale for a central Gaussian prior on˙ p aṡ where we restrict ourselves to positive values in accordance to our gauge choice of p > 0. This prior is weakly informed by the data, but we could equally well consider a less-informed choice of, say, allowing p to double in size over the observation timescale T = 5000 days, which would yield a prior scale of˙ p ∼ 2×10 −17 s −1 . This is only a factor of 10 wider compared to Equation (16), and would be expected to reduce the odds by about one order of magnitude at most via the larger 'Occam factor' (i.e. prior volume). Re-running the analysis with the wider prior confirms this, as we obtain odds that are reduced by a factor of ∼ 5 compared to using Equation (16), while yielding essentially unchanged posteriors.

Applying the˙ p-model to the data
The estimated posteriors distribution for selected model parameters are plotted in Figure 5 and the entire set is summarised in Table 4 along with their prior distributions. Comparing this to the base-model, two features are notable: the posterior mean of p is fractionally smaller, and˙ p has a posterior mean quite different from its prior, with a positive mean and essentially zero probability of˙ p = 0. Since˙ p > 0, the deformation is growing with time as expected from the observation that P fp is decreasing. As pointed out earlier, we recall that due to the degeneracy of the spin-down rate and beam-width with respect to the sign of p, this should therefore generally be interpreted as |˙ p| > 0.
In Figure 6 we plot the MPE spin-down and beam-width functions given by the model together with the observed data. Comparing this to Figure 2 it is evident that the model extension of Equation (14), allowing for evolution of the precession period via˙ p, noticeably improves the description of the data compared to the base-model. This improvement is confirmed by the odds between the˙ p-model and the basemodel which are found as 10 73.65±0.97 , i.e. decisive evidence  Table 4. Prior distributions and a posterior distribution summary for the˙ p-model parameters.
To understand how the two data sources contribute to the total odds, we repeat the analysis on the two data sets independently and find that the odds for the spin-down data are 10 49.35±1.44 while the odds for the beam-width data are 10 23.46±1.83 such that the individual odds approximately sum to the combined odds. One would expect the odds to sum up this way if the posteriors (when conditioned on each data set individually) are consistent; we show this is the case in Appendix B. The independent odds show that each data set separately strongly favours the˙ p-model, with the (clearly much cleaner) spin-down data providing stronger evidence than the beam-width data.
The large numerical values of the odds we obtain are related to the fact that for a Gaussian-noise model the log-odds scale linearly with the number of data points. For the spindown data set, which consisted of 257 data points, the average log-odds contributed by each point is 49.61/257 ≈ 0.19, or a factor of 10 0.19 ≈ 1.6 per data point to the odds itself. For the beam-width data, the corresponding numbers are 23.42/756 ≈ 0.03, or a factor of 10 0.03 ≈ 1.07 increase in odds per data point. This illustrates that it is the combination of many data points, each of which (on average) only modestly favours the˙ p-model, that leads to the large overall odds.
The timescale of the inferred increase in deformation is seen to be quite short: from the MCMC samples we calculate the median and standard deviation of the corresponding timescale to be τ ≡ ṗ p = 213 ± 10 years. (17)

SECULAR EVOLUTION OF WOBBLE ANGLE AND DEFORMATION: THE {θ,˙ p}-MODEL
In section 5 we showed that variations of θ cannot be responsible for the observed changing modulation period P fp . In the precession model considered here, the only plausible explanation for the decreasing P fp comes from allowing for an increasing deformation | p|. However, physically it is still quite plausible for the wobble angle θ to change over time, and at the minimum this allows us to set limits on the rate of change of θ, which has potentially interesting implications for the crust-core coupling. In this section, we will therefore consider a combined extension allowing for both θ and p to undergo linear secular evolution. This will allow us to set more stringent and realistic limits on the allowedθ rates than those provided in section 5.  In order to extend the base-model with both Equation (10) and Equation (14), we simply use the same formulations and priors as those given in section 5 and section 6. Figure 7 shows the posteriors obtained for the deformation p, the wobble angle θ, and their time-derivatives, and Table 5 summarises the posteriors found for all the model parameters. We note that the posterior forθ has again a slightly negative mean, but a narrower width than in thė θ-model shown in Figure 3. While the evolution in θ and p cannot be strictly separated, the evolution of the deformation p accounts mostly for the time varying modulation period, while the evolution of the wobble angle θ primarily probes the variation in amplitude. Figure 8 shows the resulting posterior for the timescale of θ-evolution, τ θ = |θ/θ|. We see that the tighter posterior onθ shifts the probability of τ θ to larger values than those seen in Figure 4, favouring slower rates of change of θ.
We can place a 95% credible lower limit of τ θ > 170.9 years and the distribution has a median value of 450.2 years. In this combined model, τ = 213±10 years (the timescale remains unchanged from the˙ p-model considered in section 6).
We obtain the odds in favour of the {θ,˙ p}-model compared to the base-model as 10 72.45±0.96 , i.e. slightly less than for the˙ p-model. We see that, similarly to the case of thė θ-model, the introduction ofθ does not produce a significantenough improvement in the fit compared to the increase in prior volume.

DISCRETE JUMPS IN DEFORMATION: THE ∆ p-MODEL
The success of the˙ p-model of section 6 indicates that a time-dependent p(t) provides a significant improvement over the base-model. In this section, we explore an alternative to the slow secular change by modelling the timevariation as a set of discrete jumps in p.

Defining the ∆ p-model
In this model extension, we allow p to undergo N distinct positive jumps. For each jump j ∈ [1, N ] at time tj, we define two dimensionless parameters: the fractional observation time at which the jump occurs, Rj ≡ (tj − t0)/T obs ∈ [0, 1], where t0 is the start-time and T obs is the total observation time, and the fractional (positive) variation in p at the jump, ∆j ≡ ∆ p,j / p,0 ∈ [0, ∞). In this way, the time evolution of p(t) can be written as where H(t) is the Heaviside step function.

Applying the ∆ p-model to the data
We assign a uniform prior distribution over the total observation span for Rj, the time of the jumps, with Rj < Rj+1 ∀ j. For the jump sizes ∆j we will use a prior consistent with the˙ p-model (see section 6), specifically a zero-mean Gaussian for˙ p with standard-deviation of 2 × 10 −18 s −1 . Distributing an equivalent total change in p on N discrete jumps, this gives an approximate scale of where we have substituted p and˙ p for the prior standarddeviation used in the˙ p-model. We use this to set the scale for a Gaussian prior on the fractional jump size as ∆j ∼ |N (0, 0.1/N )|.
To speed up the fitting process we have modified the original MCMC fitting process described in Appendix A of Ashton et al. (2016). Specifically, it was found that when fitting for the jump parameters, the MCMC chains took a long time to find the base-model best estimates for the spin-down parameters ν0,ν0, andν0 and the angles χ and θ. Therefore, instead of initialising the chains from the prior, for the parameters shared with the base-model we initialise them from the base-model posterior. This modification does not change our final estimates, provided that the burn-in period is sufficiently long to allow them to evolve from this point and explore all areas of the parameter space. For several values of N , we tested that evolving from the prior produced the same results, but the computation took longer to converge.
The number of jumps N can itself be thought of as a model parameter: ideally we would fit N as part of the MCMC sampling. However, to do this one must use a reversible-jump MCMC algorithm which can vary the number of model dimensions. This is not currently implemented in the software used in this analysis. Instead we have opted for a crude, but sufficient method in which we fit the model for different values of N individually and then use the respective odds to compare them. For each increase in N , the number of steps required to reach convergence increases. In Figure 9 we show the odds of the N -jump model compared to the base-model as a function of the number of jumps N . We see that up to N ∼ 6 the odds increase, then reach a plateau and start to marginally decrease for N = 10. In Figure 10 we present a stacked plot showing the posteriors on the jump times R for all jumps, for the different N -jump models. For ease of reading the plot, each jump is normalised so that the area under the N = 1 line is 1, under the N = 2 model the area is 2, etc.
The positions R at which the jumps occur appear consistent between different N -jump models. Moreover, the posteriors for each jump are multimodal, each having a unique 'fingerprint', which also appear consistent between models. This would not necessarily be expected if the best fit was quite agnostic about the exact jump times and simply dis- tributed N jumps randomly over the observation period. We also see a consistent progression play out as the number of allowed steps N is increased: up to N = 6 each increase in N finds a new jump site, but from N ≥ 7 the new jump sites are not so well defined. However, we cannot rule out the possibility that the MCMC chains did not successfully converged for some of these models. The data does not seem to strongly favour a particular number of jumps above N ≥ 6. Therefore, for illustrative purposes we will use N = 6 as our posterior estimate for N . While this model does not have the largest odds-ratio (as shown in Figure 9), the difference to the N=7 model, which does have the largest odds-ratio, is much smaller than the error bars. Moreover, this model captures all of the essential features of the discrete jumps as seen in Figure 10.

8.3
The N = 6 ∆ p-model Figure 11 shows the posterior for the six relative jump sizes ∆j which have typical sizes of order ∆j ∼ 0.01. We provide a summary of the priors and posteriors for all the model parameters in Table 6. Then, in Figure 12 we show the MPE fits to the spin-down and beam-width data; we indicate the jump times with vertical lines. By eye, the fit shows a similar level of improvement compared to the base-model Figure 2 as that observed in Figure 6, which is consistent with the similar odds of 10 73.53±2.79 relative to the base-model. As such we cannot distinguish between the two types of evolving deformation (continuous evolution verses discrete jumps). Figure 11. Posterior probability distributions for the six relative jump-size parameters ∆ j in the ∆ p-model. Table 6. Prior distributions and a posterior distribution summary for the N = 6 ∆ p-model parameters.

INTERPRETING THE UPPER LIMIT ONθ
Dissipative processes internal to the star may damp the wobble motion, leading to a decrease in θ. Looking at the posterior onθ shown in Figure 7, we see that, while the peak of the probability distribution lies at a valueθ < 0, the peak is nevertheless close toθ = 0, so there is no clear evidence for any evolution in the wobble angle over the duration of these observations. Slightly more informatively, in Figure 8 we plotted the posterior on the timescale τ θ = |θ/θ|. Even though this analysis finds no evidence for a secular variation in the wobble angle, we can use these results to put a lower bound on the timescale on which τ θ evolves, i.e. we can place a 95% credible interval that τ θ > 170.9 years.
Mutual friction, a dissipative coupling of neutron vortices and the charged component of the star, is the leading candidate for damping precession. The effect of mutual fric- tion on precession was examined by Sedrakian et al. (1999) and Glampedakis et al. (2008Glampedakis et al. ( , 2009. The strength of the interaction can be parameterised by a dimensionless quantity R, a measure of the relative strength of the mutual friction force to the Magnus force. In the limit of large R, the vortices become pinned to the crust, and a very fast precession frequency is obtained, in contradiction with the observations. The free precession observation instead requires the weak drag limit, R 1, to apply. The damping time can be shown to be given by where ISF denotes the moments of inertia of the core superfluid (see Sedrakian et al. (1999) and Appendix A of Glampedakis et al. (2009)). Strictly, R is a locally-defined quantity, i.e. a function of density, but this dependence is 'averaged-out' in the rigid-body dynamics analysis through which the above equation is obtained.
Given that the value of p is known from our posterior estimate, we can, as described in Glampedakis et al. (2009), convert our lower bound on τ θ to a 95% credible upper bound on R assuming that τ θ = τMF: Again as noted in Glampedakis et al. (2009), this can be combined with a lower bound on R that comes from analysis of the Christmas 1988 glitch in the Vela pulsar, where the relevant coupling time can be shown to be given by τMF = 1/(4πνR)Iprec/ISF. From the analysis of the Vela glitch by Abney et al. (1996), if we set Iprec/ISF = 0.1, we obtain 30 seconds as the upper limit on the crust-core coupling timescale, leading to a lower bound R 2.4 × 10 −5 .
The upper limit given here is an improvement by about one order of magnitude on that given by Glampedakis et al. (2009). A number of authors have attempted first-principles microphysical calculations of this parameter, appropriate for a neutron superfluid core (Alpar et al. 1984;Alpar & Sauls 1988;Andersson et al. 2006). Taking Equation (64) of Andersson et al. (2006), and setting the density 10 14 g cm −3 , and the proton density fraction to 0.1, one obtains a range for R ≈ 9.7 × 10 −5 -3.18 × 10 −4 as one varies the proton effective mass over the interval 0.5-0.7 times the bare mass. Clearly, there is a reasonable level of convergence between the shrinking observation range in R reported above and microscopic estimates.

INTERPRETING THE EVOLVING DEFORMATION
The rather rapid observed decrease in the free precession period is not easy to explain within the precessional model. We have shown above that it corresponds to an increase in the deformation parameter p of Equation (5). Re-writing this slightly, we see that we can interpret our observation as an increase on the deformation ∆I d /I * , and/or a decrease in the fraction of the star that participates in the free precession, Iprec/I * .
The total variation must correspond to a timescale of ≈ 213 years, a rather short timescale for a ∼ 10 5 year old neutron star. It is difficult to motivate a variation in Iprec/I * on this sort of timescale. One possible mechanism for producing a decrease in this quantity would be if the core superfluid does not contribute to Iprec. Then, if the star is currently cooling through the density-dependent normal matter-superfluid matter transition, the amount of core superfluid matter will be gradually increasing, with a corresponding decrease in the amount of core normal matter, hence, by our current assumption, decreasing Iprec. Such a mechanism has been used by Ho & Andersson (2012) to explain the n < 3 braking indices in some young pulsars. However, it is difficult to countenance such a mechanism applying here. PSR B1828-11 is a relatively old pulsar, and probably cooled through the normal fluid/superfluid transition when it was much younger. Also, its observed braking index is n ≈ 16 (see Table Table 1), so does not have n < 3 as would be expected if the electromagnetic spin-down torque were acting on a progressively smaller fraction of the stellar moment of inertia. Also, in the model of Ho & Andersson (2012), the newly created superfluid is required to pin to the crust, something which would result in a much more rapid rise in the free precession frequency via the gyroscopic effect of a pinned superfluid in a rotating star (Shaham 1977)-see the discussion below.
The alternative possibility is that the deformation ∆I d /I * is steadily increasing. The deformation itself may be supported by elastic and/or magnetic strains. In the case of elastic strains, it is very difficult to understand why the deformation should increase with time. Elastic strains can be expected to be steadily reduced by plastic flow (and possibly by occasional crustquakes), which would lead to a decreasing deformation.
In the case of magnetically-sustained deformations, it is again puzzling that the deformation should increase with time, as magnetic fields can be expected to decay, although the interplay of Ohmic decay, Hall drift and ambipolar diffusion processes can lead to a complicated evolution, with the (local) field strength increasing in some places. Nevertheless, the required evolution timescale ∼ 200 years is short compared to the timescales expected for these processes (see e.g. Goldreich & Reisenegger (1992)).
Note that if the exterior magnetic field also evolves on this time scale, then we should be able to measure it from the braking index. That is we allow B=B(t) in the usual vacuum dipole braking law (Shapiro & Teukolsky 1983) and solve for the derived braking index, giving . (24) This is much larger than the measured value of n ≈ 16 (see Table 1). So we can exclude models where the exterior field evolves in tandem with the internal one, but it remains unclear if the internal field could vary on such a timescale.
The possibility of the star containing a pinned superfluid component adds an additional strand to this story. As shown by Shaham (1977), a pinned superfluid has a profound effect on the precession frequency, adding a term proportional to IPSF, the amount of pinned superfluid: valid for small wobble angle and with the pinning directed along the symmetry axis of the biaxial star. Assuming that the quantity ∆I d /Iprec is positive (or else negligible), this immediately translates into the bound IPSF/Iprec 10 −8 for PSR B1828-11, much less than the value expected on the basis of microphysical considerations and superfluid glitch theory (Jones & Andersson 2001;Link & Epstein 2001). A possible explanation for this has been advanced by Link & Cutler (2002), who argued that the precessional motion itself might cause most/all of the pinning to break. This has motivated most models of PSR B1828-11 assuming that IPSF is exactly zero. However, as noted above, a small amount of pinning is allowed. This suggests an alternative mechanism to explain the evolving precession period: the previously broken pinning may be gradually reestablishing itself, with the amount of pinned superfluid increasing steadily over the last ∼ 200 years. Indeed, we can estimate the timescale ∆tre-pin for the gradual re-pinning to re-establish a reservoir of pinned superfluid of moment of inertia ∆tre-pin. From Equation (25) we haveİPSF = Iprec˙ p, so ∆tre-pin = ∆Ire-piṅ IPSF = 2.13 × 10 8 yr ∆Ire-pin/I * 10 −2 implying that such unpinning events have to be rare in the pulsar population, as PSR B1828-11 will not build up a typically sized pinned superfluid reservoir (at the few percent level) for a long time to come.
The ideas discussed here (evolving strain and pinned superfluidity) are all relevant to the physics of pulsar glitches. In fact, PSR B1828-11 was observed to glitch in 2009: see Espinoza et al. (2011) and www.jb.man.ac.uk/~pulsar/ glitches/gTable.html). The interplay between the modelling of the free precession and the glitch is an interesting topic in its own right. We have explored the consistency requirements between the free precession interpretation of the observed quasi-periodicities and glitches in a separate publication , which exposes significant tensions between the small wobble angle free precession model considered here and standard models of pulsar glitches.

DISCUSSION & OUTLOOK
In this work, we have extended the free precession model of Ashton et al. (2016) to allow for both the wobble angle θ and the deformation ∆I d /Iprec of PSR B1828-11 to evolve in time. The generalisation to allow for θ to vary was extremely natural, as dissipative processes internal to the star are expected to affect the wobble angle, causing it to decay in oblate stars (∆I d > 0), and grow in prolate ones (∆I d < 0; (Cutler 2002)). That the deformation ∆I d /Iprec should vary in time is less obvious. However, we first showed, in a completely model independent way (i.e. independently of the cause of the quasi-periodic oscillation in spin-down rate) that the ∼ 500 day modulation period was getting shorter; this necessitated the allowance for a time-varying deformation in our precession model. We in fact found no evidence for a variation in the wobble angle, with the inclusion of this new effect not producing a significant improvement in our ability to fit the data. We therefore proceeded to set an upper limit on the timescale on which it varied, τ θ 171 years. We translated this into an upper bound on the strength of the mutual frictional parameter R 1.2 × 10 −4 , describing the strength of the coupling between the crust and core, improving on previously published results by approximately one order of magnitude. When combined with a lower limit on the strength of this coupling, as deduced previously by analysis of the Vela 1988 glitch, this parameter is confined to the interval 2.4 × 10 −5 R 1.2 × 10 −4 , a rather narrow range, but consistent with microscopic calculation.
In terms of the evolving deformation, we explored two phenomenological ways to model this: either as a smooth secular evolution of the deformation or as N discrete jumps in the deformation. We find that both of these models produce a substantial improvement in the fit when compared to the base-model-decisive evidence that, in the context of precession, the magnitude of the star's deformation is growing; this can be seen in Table 7 where we list the odds-ratios for all models extensions considered in this work. For the discrete jumps model discussed in section 8, we found 6 or more jumps seemed to produce the best fit and used the N = 6 model to illustrate our results.
The odds-ratio between the˙ p-model and the N=6 ∆ pmodel is 10 0.11±2.87 , so we find no evidence to favour one of these two evolution models over the other. For both models an approximately equivalent informative prior was used, but when the odds-ratio is marginal the prior can have a substantial effect. We therefore cannot state without further investigation which of the two model extensions is preferred with certainty and without unfounded bias from the prior. It would be useful to propose substantive physical models which have well defined priors; this would allow a more thorough statement to be made. We discussed the possible physical cause of the evolution in the deformation. We mentioned elastic, magnetic and pinned superfluid interpretations, and pointed out some difficulties with all of these. PSR B1828-11 underwent a glitch in 2009 (Espinoza et al. 2011). In a separate publication, we discuss consistency requirements between the precession model described here and the glitch, folding in the evolving precession period into our discussion .
In interpreting this changing deformation, it may be important to note that while in this analysis we fitted the 'small-χ' model (as defined by Arzamasskiy et al. (2015)), our analysis can equally be applied to the 'large-χ' model by interchange of the θ and χ labels at the parameter estimation stage. This is shown in Appendix A and is due to the symmetry in θ and χ in the spin-down and beam-width models. The two solutions correspond to quite different physical scenarios which may result in fundamental differences in their interpretation.
The findings presented in this work provide a new way to probe neutron star physics. It remains to be understood what is the true cause of the changing deformation and whether this happens as a smooth secular evolution or as a number of discreet jumps. Moreover, it would be interesting to know if alternative models to precession can better model this behaviour.
It can be shown that deriving this expression, but making the assumption χ 1 in Equation (A2) and throughout (rather than θ 1) is equivalent to the transformation θ ↔ χ in Equation (A6). This symmetry was discussed by Arzamasskiy et al. (2015) and fundamentally results from the symmetry of θ and χ in Equation (7). Because the same symmetry also exists in our beam-width model (Equation (8)), the large-χ solutions presented in this work, can equally be interpreted as small-χ solutions by interchanging θ and χ.

APPENDIX B: CONSISTENCY OF POSTERIOR ESTIMATES IN THE˙ p-MODEL
For the base and˙ p-model, we investigated the behaviour when conditioned on each data set (spin-down and beamwidth) individually in addition to the combined results presented in section 6 and found that both data sets independently support the˙ p-model over the base-model. In Figure B1 we plot the posteriors for the˙ p-model parameters that are common to both the spin-down and beam-width parts of the model, excluding the frequency and spin-down parameters which are dominated in all cases by the astrophysical prior.
This figure demonstrates that the analysis performed on the two individual data sets independently arrive at reasonably consistent posterior distributions for these shared model parameters, with non-negligible overlap between the posteriors.
For the two angles θ and χ the beam-width data does little to constrain the posteriors, with the results even railing against the prior boundaries. Widening the prior (when conditioning on the beam-width) solves this issue, but the posteriors remain uninformative. Comparing with the analysis of the combined data set, we see that the combined posteriors are either a compromise of the individual posteriors, when they are both informative, as is the case for p, p, and ψ0, or they are dominated by the more informative spin-down data, as is the case for θ and χ. As such, when using a combined data set, there is no "tension" (i.e. the two data sets preferring different solutions) and so their log-odds sum approximately to the log-odds of the combined data set. This paper has been typeset from a T E X/L A T E X file prepared by the author. Figure B1. Selected posterior distributions in the˙ p-model as conditioned on the spin-down and beam-width data individually and the two combined. Note that the θ and χ posteriors conditioned on the beam-width data have been scaled by a factor of 10 so that they are visible on the same scale as the strongly peaked spin-down and combined results.