The Andromeda gamma-ray excess background systematics of the millisecond pulsars and dark matter interpretations

.


INTRODUCTION
Due to its proximity and mass, the center of the Milky Way (MW) is expected to be the brightest source of dark matter (DM) annihilation in the sky (e.g., Bertone et al. 2005;Charles et al. 2016).However, our view of the MW halo is obscured by large amounts of uncertain interstellar material.It is thus vital to carry out complementary searches for DM emission in regions with differing astrophysical uncertainties.
The Andromeda galaxy (M31), located at a somewhat low Galactic latitude (, ) = (121.17• , −21.57• ), suffers lower foreground extinction than the MW center.Moreover, due to the large inclination angle (∼77.5 • ) of the plane of M31 with the line of sight (Tamm et al. 2012), we can observe the M31's halo almost completely unobstructed by its stellar and gaseous disk.A wealth of recent observations serve to elucidate key differences between the MW and ★ f.zimmer@uva.nl† o.a.maciasramirez@uva.nlM31, which are thought to be due to their different accretion histories (Kormendy 2013).M31 has a stellar mass that is similar or possibly even larger than that of the MW (e.g., Watkins et al. 2010;Diaz et al. 2014).Furthermore, the M31's bulge is a factor of 5 to 6 times more massive than the compact bulge/bar system in the MW (Tamm et al. 2012;Licquia & Newman 2015), its supermassive black hole is about 50 times more massive than Sgr A ★ (Bland-Hawthorn & Gerhard 2016), and the star formation rate (SFR) in M31 is a factor of about 10 times lower than in the MW (Ford et al. 2013).Altogether, M31 constitutes not only an excellent target for searches of DM emission (e.g., Lisanti et al. 2018), but also a unique stepping stone in our efforts to understand the high-energy astrophysics of spiral galaxies.
Interestingly, there has been a recent discovery of an excess in gamma-rays coming from the inner region of M31 (Ackermann et al. 2017a), which appears to have similar characteristics to those of the longstanding so-called Galactic Center Excess (GCE) in the MW (e.g., Hooper & Goodenough 2011;Abazajian & Kaplinghat 2012;Gordon & Macias 2013;Macias & Gordon 2014;Daylan et al. 2016;Calore et al. 2015;Ajello et al. 2016;Ackermann et al. 2017b).In particular, using almost seven years of Fermi-LAT data, Ackermann et al. (2017a) detected extended diffuse gamma-ray emission from the inner ∼ 0.4 • (or ∼ 5 kpc) of the M31's halo, at the 4 statistical level.Its spatial morphology was found to be compatible with a uniform/Gaussian disk, and its spectrum was not well constrainedeither a simple power-law or a bump-like spectrum could give an acceptable fit to the data.Importantly, they obtained that M31's emission is not correlated with the distribution of interstellar gas nor regions of star formation activity, which are both mostly located in a large ring-like structure at ∼ 10 kpc from its center.However, gamma-ray emission from the gaseous disk was not strongly ruled out and might even be present up to the 50% level of the measured flux (Ackermann et al. 2017a).In addition, a later in-depth large field study (Karwin et al. 2019) of the outer halo of M31 found evidence for emission within the ∼ 120 − 200 kpc (or ∼ 8 • − 12 • ) of its center.
Since the M31 signal is only an "excess" with respect to our current understanding of the diffuse gamma-ray background, it is important to investigate the impact that diffuse mismodeling has on the characteristics of the signal.Very recent studies (e.g., Armand & Calore 2021; Di Mauro et al. 2019) have used either SkyFACT (Storm et al. 2017), or alternative diffuse emission models built with GAL-PROP (Strong & Moskalenko 1998) for the evaluation of these uncertainties.In the present study, we perform a reanalysis of this data with a novel approach; we construct tailor-made foreground gas map models for the M31 region which are expected to be less affected by biases.Karwin et al. (2021) highlighted that the foreground interstellar gas maps used in the identification of the M31 excess might contain a fraction of the interstellar gas that should belong to the M31 galaxy.Here, we excise M31 from the foreground interstellar gas models, followed by using various inpainting techniques (e.g. the methods "Nearest-Neighbour" and "SMILE") for reconstruction or processing all the information of the image by a neural network ("Deep-Prior" method) to inpaint the excised region.In this way we restore the actual spatial distribution of the interstellar gas in the foreground as best as possible.The resulting alternative maps are then employed in our reassessment of the characteristics of the M31 excess.
Studies of the GCE (Macias et al. 2018;Bartels et al. 2018;Macias et al. 2019;Coleman et al. 2020;Abazajian et al. 2020;Pohl et al. 2022) have demonstrated that the two most compelling explanations (astrophysical or DM emission) can be distinguished based on their spatial morphologies.Indeed, such studies have consistently found that the morphology of the GCE traces the stars of the inner Galaxy better than it traces the distribution expected for DM (however, see Di Mauro (2021) for opposite conclusions).Motivated by the very promising morphological results obtained in the GCE, here we attempt to separate the stellar from the DM hypotheses using a traditional fitting regression technique.For this, we construct empirical models of the stellar distribution in M31 using data from the PA AS survey, which has been mapping out the environment of M31 for more than a decade.We construct stellar maps for the disk based on this data and using the stellar contamination model of Martin et al. (2013), which effectively removes the stars residing in the Milky Way.Since, unfortunately, this data is unreliable for the inner most regions of M31, we use instead data from WISE (Wright et al. 2010) to construct the bulge component of our stellar templates.
The paper is outlined as follows.In Sec. 2 we describe the data used in this work.In Sec. 3 we present the methods used to construct the diffuse emission components and the various models used in our analysis.In Sec. 4 we present the results of the gamma-ray analyses and discuss their implications in Sec. 5.

GAMMA-RAY OBSERVATIONS
In this work we use data from the Fermi Large Area Telescope (Fermi-LAT), covering 10 years of observations (see Table 1).We use exactly the same data selection cuts assumed in the construction of the 4FGL-DR2 catalog (Abdollahi et al. 2020;Ballet et al. 2020) to avoid potential biases induced by new point sources (not present in the 4FGL-DR2 catalog) which would likely appear in our region of interest (ROI) if we were using a longer observation time.
We restrict ourselves to data from a 14 by 14 degree region in the sky, centered on the SIMBAD coordinates of M31, i.e. (, ) = (121.17• , −21.57• ), and spanning an energy range of 500 MeV to 100 GeV.We are most interested in the spatial morphology of the signal, since we want to determine in which part of the M31 galaxy the signal is strongest.We therefore chose a lower limit of 500 MeV to preserve as much angular information as possible, without too much of a loss of photon statistics.Previous studies have gone lower to 300 MeV (Armand & Calore 2021) or even as low as 100 MeV (Di Mauro et al. 2019).These studies have used the SOURCE event class data, while we restrict ourselves to the CLEAN event class data.This latter dataset benefits from a lower background rate than the SOURCE event class.
We make use of the Fermi ScienceTools version 2.0.0 software package to further reduce the raw data.Since the Pass 8 data release, it is possible to select data based on the reliability of the reconstructed direction based on the instrument's point spread function (PSF).We want to use photons from the PSF3 quartile only (the most reliable in reconstructed direction), which would benefit our morphological analysis the most.However, due to our bin-by-bin approach, higher energy bins lack statistical power if we do not include other PSF quartiles causing reconstructed spectra to fluctuate.Therefore, we restricted our analysis to the PSF3 quartile only for the first three energy bins, where the counts were high enough, and we used all PSF quartiles combined for the rest of the bins.
All the technical details of the selection filters applied to the data by the Fermitools are summarized in Table 1.For more information regarding the criteria, we refer the reader to the official Fermi website1 .

METHODS
In this section we describe the construction of tailor-made Galactic diffuse emission models for our ROI.One of the biggest challenges in creating an appropriate foreground model for the M31 region, is that it is very difficult to disentangle the foreground hydrogen gas that belongs to the MW from the actual gas in M31.The standard approach followed by the Fermi team consists of removing M31 from the hydrogen gas maps, based on cuts in the (, ,   ) data space with   being the local standard of rest velocity.After it  (2016a).Although this is a reasonable analysis choice, it is expected to introduce a bias in the gamma-ray properties of M31.In particular, it has been pointed out (Karwin et al. 2019) that the hydrogen maps used in the official foreground Fermi Galactic diffuse emission map might be holding some hydrogen gas which belongs to M31.In this work, we aim to construct more suitable Galactic foreground maps by using novel inpainting algorithms based on neural networks and deep learning to construct M31-free Galactic foreground templates.We also evaluate the uncertainties associated to this method by rerunning our gamma-ray pipeline with all the resulting alternative foreground maps.

Standard Fermi Galactic Diffuse Emission Model
One of the models we considered in this work is the so-called gll_iem_v07 model.This is the standard Galactic diffuse emission model recommended for analyses of point-like and small-sized extended sources, constructed by the Fermi collaboration-henceforth simply called the Fermi background.
The Fermi background is made up of a linear combination of templates, each one responsible for capturing the contribution of a physical mechanism to the total gamma-ray emission.Some of the largest contributions come from the decay of energetic neutral pions, and electron bremsstrahlung radiation.Since these two components are spatially correlated with gas, they were phenomenologically modeled using atomic and molecular hydrogen maps constructed from radio observations (Acero et al. 2016a).
Another big contribution to the Fermi background comes from the inverse-Compton scattering process.This component is very difficult to model correctly as it depends on many factors, which include solving the cosmic-ray transport equation.The Fermi team included inverse-Compton emission templates generated with the numerical code GALPROP (Strong & Moskalenko 1998).
The procedure followed by the Fermi team was then to fit these components-in addition to other empirical templates accounting for positive and negative residuals in the data-to the observed all-sky gamma-ray data using a template fitting approach.Once the best-fit fluxes for each template were determined, they combined all these in one compactified energy-dependent spatial template.This constitutes the gll_iem_v07 model.

Gas-correlated gamma-ray Component
As in the Fermi background, the gas-correlated emission is modeled by assuming that the gamma-ray intensity is proportional to the interstellar gas column density-mainly atomic (H) and molecular (H2) hydrogen.We use the high resolution H4PI survey (HI4PI Collaboration et al. 2016) as in the Fermi background as our first option.As a second option we use hydrodynamical gas and dust maps, i.e., H and H2 gas column density templates developed in Macias et al. (2018), which were already used to show that the GCE can be explained by an unresolved MSP population rather than annihilating DM (Macias et al. 2019;Abazajian et al. 2020).These templates were originally designed for analyses related to the MW Galactic center and for this purpose split into four galactocentric rings.Only the rings 3 and 4 of the H component contribute to the M31 region, whereas none do for the H2 component.
Similarly to the Fermi collaboration, we also excise M31 from the gas maps where present, but use more sophisticated inpainting algorithms to fill in the regions of missing data.Two methods for inpainting are employed.The first one makes use of deep convolutional neural networks to restore the image.It was developed in Puglisi & Bai (2020), where they successfully recovered the necessary statistical properties the image had before the reconstruction process.The authors tested it in the context of synchrotron and dust polarized emission, which represent the Galactic contamination in cosmic microwave background measurements.Their code was made publicly available as the Python Inpainter for Cosmological and AStrophysical SOurces (PICASSO) package.We successfully employed two methods of their code for our purposes: The "Nearest-Neighbour" algorithm based on Bucher et al. (2016) and the "Deep-Prior" method based on Ulyanov et al. (2020).
The other inpainting algorithm we use is from Li (2014), which we call the SMILE inpainting algorithm.It is based on solving the Laplace equation for each missing pixel by using the values of the surrounding pixels2 .Such an approach has been successfully used by Macias et al. (2019) to inpaint masked point-like sources near the center of the Milky Way.
Visual representations of the inpainting procedures when using either SMILE or PICASSO are shown in Fig. 1 and Fig. 2, respectively.The first column of panels in each figure show the original map, where M31 is present.The second column depicts the extension of the masked region, which we based on cuts described in Ackermann et al. (2017a), which were shown to effectively remove the majority of the disk of M31 from the data.The region was then inpainted with the aforementioned tools and we chose to use these algorithms over the one previously used by the Fermi collaboration due to their simpler implementation and to assess the impact on the properties of the M31 excess due to systematic uncertainties in the gamma-ray background model.These inpainted, M31-free versions can be seen in the last columns.In both figures, the lower row corresponds to the H4PI map and the upper row to the atomic hydrogen column density map used in Macias et al. (2018).Note that the latter was derived from the 21 cm LAB survey (Kalberla et al. 2005), which has much lower angular resolution than the newer H4PI survey.

Inverse-Compton Component
The second largest contribution to the diffuse gamma-ray sky comes from the inverse-Compton radiation produced by the photon upscattering by energetic electrons.All of the models for this inverse-Compton component we employ in this work are generated with the Galactic cosmic-ray propagation code GALPROP (Strong & Moskalenko 1998).
We used the three models from Ackermann et al. (2015) labeled Foreground Model A, B and C.This study was interested in investigating how their fit results were affected by the specific type of foreground model chosen.These three models are thought to encompass a very wide range of systematic uncertainties associated with the inverse-Compton component.Model A uses a distribution of cosmicray sources based on Lorimer et al. (2006) and standard choices for the propagation parameter setup.For Model B they include an additional population of sources for electrons near the Galactic center, which made this model a better fit according to their analysis.Also, Model B predicts an enhanced gamma-ray emission at high latitudes.This makes it highly relevant for out study as our ROI is relatively far from the plane.The diffusion and reacceleration parameters of Model C vary with Galactocentric radius throughout the Galaxy, as opposed to the constant diffusion coefficient assumed in the two other models.
Additionally, we use the inverse-Compton model of Abazajian et al. (2020), which is split into six galactocentric rings.Only their local and outermost rings contribute to the M31 sky region and are used in our analyses.
By combining templates for these two major components and different inpainting algorithms we arrive at a total of 2 (H1 components) × 3 (Inpainting Methods) × 4 (Inverse-Compton Models) = 24 combinations.By testing models over this kind of variety of background models, we can classify the obtained significances of detection against the systematic uncertainties that come with the modelling of the Galactic diffuse gamma-ray emission.In addition to the gas-correlated and inverse-Compton templates, we use the same components as in the Fermi background for the isotropic gammaray background, emission from the sun and moon, and the same 4FGL-DR2 source catalog.

Phenomenological M31 Models
All the phenomenological models we used are integrated in Fermitools.We tested whether the signal was point-like or spatially extended.Specifically, we used disks of different radii and uniform brightness or a Gaussian disk of different variance.In most of the studies related to this phenomenon, investigators have included similar templates in their analyses, from the discovery of the excess (Ackermann et al. 2017a) to both the previously mentioned studies closely related to our work (Di Mauro et al. 2019;Armand & Calore 2021).The purpose of these simple constructions is to test whether the data prefers an extension beyond a point-like emission, but these extended templates are likely unphysical in the sense that real emission will fail to be as symmetric and clean as they are.Arguably, better suited models for this type of analysis are based on multi-wavelength observations from space telescopes and ground detectors.

Interstellar Gas and Dust Models of M31
One of the difficulties in astronomical studies like ours, where we want to isolate specific astrophysical gamma-ray sources, is the interference of emission from gas and dust clouds.They can either be part of the Andromeda system or reside inside the Milky Way halo, extending across the line of sight between us and M31.There has even been evidence for hydrogen structures extending between the two galaxies in interstellar space (Lockman et al. 2012).
There have been studies dedicated to mapping out the gas and dust contents of the Andromeda galaxy.A deep wide-field H imaging survey has been done in Braun et al. (2009) with the Westerbork Synthesis Radio Telescope.We also include two different versions of this hydrogen map labeled by BraunV2 and BraunV3, which include corrections for opacity effects.An intensive study about Andromeda's dust was done in Draine et al. (2014) using observations from the Spitzer Space Telescope and Herschel Space Observatory.We use these three maps3 to account for unmodeled residual gas and dust emission belonging to M31.

Construction of the Stellar Density Templates
One of the most important undertakings of this work is to test whether the data prefer an explanation involving MSPs or DM.To that end, we constructed stellar density maps consisting of old red giants (from now on simply called stellar maps), which we use as proxy for the spatial distribution of MSPs given that they are both old stellar populations.
We will begin by describing the construction of our stellar maps, for which we adapted the procedure of Martin et al. (2013).The data used to construct our stellar maps stems from the Pan-Andromeda Archaeological Survey (PAndAS) (Ibata et al. 2014;McConnachie et al. 2018).The data is publicly available at the PAndAS archive 4 .
First, we selected all stars within a 150 kpc radius from the PAndAS survey, centered on the SIMBAD coordinates of the center of M31.These stars are then de-reddened to correct for extinction by using the  ( −) map from Schlegel et al. (1998) and the corrected color magnitudes are then obtained with (1) The observed magnitudes as stated in the PAndAS survey, are  and  and their de-reddened equivalent have the 0 subscript, which we omit from now on.In the color magnitude space of all these stars, we selected only a sub-sample corresponding to stars which have color and magnitude as expected from old, low-metallicity stars.This "selection box", as depicted in Fig. 3, is based on isochrones of a stellar population with a certain age and metallicity.For our purpose, we generated5 isochrones for six different age populations (3,5,7,9,11 and 13 Gyr), each having five different metallicities, i.e. -[Fe/H] ratios (0.5, 1.0, 1.5, 2.0 and 2.5).They were adjusted for the M31 distance modulus based on Conn et al. (2011Conn et al. ( , 2012)).
The data of the PAndAS survey has holes in some places due to gaps in the coverage, saturated bright stars, or instrumental failures.We reconstructed this missing data by using our SMILE inpainting algorithm.
Since we are ultimately interested in the old stars, which are part of the Andromeda system, the stars residing in the halo of the Milky Way are therefore contaminants and have to be removed.The density of these contamination stars, at a specific color and magnitude ( −, ), is modelled as an exponential increase towards the Galactic plane and center of the Milky Way as where the coordinates (,  ) are the equatorial coordinates (, ), projected on a plane tangential to the M31 centered celestial sphere.The functions ,  and 6 are constructed empirically from regions near M31, where contamination of the Milky Way foreground stars is low.For more details regarding these functions see Martin et al. (2013).A visualization of the individual steps of the construction for these maps can be seen in Fig. 4.This procedure gives us a stellar map for each of our six selected age populations ranging from 3 to 13 Gyr.These are now subtracted from each other in such a way that we get 5 stellar maps containing 3-5, 5-7, 7-11 and 11-13 Gyr population groups.
Unfortunately the data of this survey is not reliable in the bulge area of M31.This is mostly due to the telescope not being able to resolve individual stars in this bright region of the sky.We therefore follow the common procedure (see e.g.Martin et al. (2014)) of masking the bulge area in these maps and instead use data from the Wide-field Infrared Survey Explorer (WISE) (Wright et al. 2010) at wavelengths of 3.4 (W1) and 4.6 (W2) microns7 , which was shown to be best suited for tracing stellar light rather than dust (e.g.Ness & Lang (2016)).
The dimension of this mask is based on a study of the structural parameters of M31 (Courteau et al. 2011), which contains estimates of the extension of the bulge based on its luminosity profile (see their Fig.9).We show a close-up of this masked region together with the stellar density profile along the major and minor axis of M31 in Fig. 5.Note that using the raw W1 and W2 templates away from the bulge is not well justified given that these are expected to be heavily contaminated by dust/gas emission from the disk.In contrast to the MW center, we observe the bulge of M31 almost completely unobstructed by its own gaseous disk, due to the large inclination angle of M31.This further supports the assumption that the infrared W1/W2 data in the bulge is mostly produced by the stars.
One of the key differences in this work compared to previous works, is the use of empirical data over smooth analytical functions for our stellar templates.This guarantees a more sensitive comparison of the MSP to the DM hypotheses in our later analysis.A different approach based on using Einasto density profiles to model the bulge and disk of M31 has been used in e.g.Armand & Calore (2021).We compare their bulge Einasto profile to the radial density profile of our stellar map for W1 and W2 in the top and bottom row of Fig. 6, respectively.The middle and right panel show the normalized density along the major and minor axis, seen as black dotted lines in the left panel, respectively.Additionally, we have added a squared NFW density profile, representing our DM template, demonstrating that it is more concentrated to the innermost regions of M31 compared to the other two profiles, which impacts the results of our morphological analysis.
The combination of these masked stellar maps (each containing a different age group) with either a W1 or W2 bulge acts as our final model for the MSPs hypothesis.

Dark Matter Model
Our DM template has to account for both the M31 and MW DM halos, since our line-of-sight extends through both of them.We model the radial density profiles of both halos with the symmetric NFW profile (Navarro et al. 1996) given by (3) For the density  0 and the scale radius   for M31 and the MW we use the parameters from Karwin et al. (2019).We begin by expressing the radius from the center of either Andromeda or our Galaxy in terms of the line of sight variable  and the Galactic longitude and latitude coordinates  and  (centered on M31's SIMBAD coordinates in case of the M31 dark matter halo) with the law of cosines as To obtain the value for one pixel of our template, the DM density is squared8 and integrated along the line of sight as (5) The total value of each pixel   , also referred to as the J-factor, is then the sum of the contributions of both DM halos.This is repeated for each pixel to obtain the final template.Figure 7 shows the J-factor profile as a function of distance from the M31's centre.It is clear that the J-factor contribution from the MW halo becomes dominant for angular distances 1 • away from the centroid of M31 (Karwin et al. 2021).We also display (vertical lines) the size of our ROI in comparison with the extent of the observed gamma-ray excess, which demonstrates the adequacy of our analysis region.Furthermore, we compare our resulting radial J-factor profile with the one obtained by Karwin et al. (2021) (c.f.their Fig. 6 and our Fig.7) and find our values to be consistent with theirs.

Fitting Procedure
Since the flux ratios of the components included in the Fermi background model were obtained from an all-sky fit, these do not necessarily represent well the flux ratios in a small patch of the sky, such as the 14 by 14 degree region around M31 we have in this work.Another important issue is that this compactified map does not permit a rigorous evaluation of the systematic uncertainties associated to the Galactic diffuse emission model.In this work, we investigate the impact of each component making up the Galactic diffuse emission.For this, we fit all components of our alternative foreground emission model (FEM), as summarized in Table 2, individually using a bin-by-bin analysis procedure.We run a separate maximum likelihood fit in each of these bins, where we use a simple power-law (/ =  0  − ) to model the spectra of every source in our ROI.We allowed the normalization  0 of each source to vary, such that they have freedom to absorb any potential excess that is due to Galactic foreground emission.However, the spectral slope  was set to 2 to stay agnostic to the overall unknown spectral shape.The fits were performed with the PyLikelihood tool, where a total of 79 point sources were present in our ROI.The obtained likelihoods were used to compute the statistical significance of the various hypotheses in  units through equations A1 and A4.

RESULTS
The contents of the previous sections can be summarized as follows.
There are certain challenges we face when dealing with the complex nature of gamma-ray emission from M31.The main challenge is how to disentangle the foreground emission due to the MW from the emission in M31.We do so by constructing tailor-made emission templates for our ROI with the use of state-of-the-art image restoration techniques.The alternative M31-free foreground templates are used in this section to estimate the effects that systematic uncertainties on this component have on the properties of the M31 excess.
In summary, the foreground components comprising our baseline (also referred to as standard) model for the fits consist of: the Fermi diffuse emission model described in sect.3.1, the extragalactic diffuse emission model iso_P8R3_CLEAN_V3_PSF3_v1 for the first three energy bins and iso_P8R3_CLEAN_V3_v1 for the rest, together with the 79 point sources in our region of interest from the 4FGL-DR2 source catalog.Our alternative foreground model consist of either the HI4PI survey or the galactocentric HI and H2 ring templates for the gas-correlated emission, and the models labeled A,B or C for the inverse-Compton emission.The components modeling the extragalactic emission and -ray point sources are the same as in the baseline model.A list of all the templates considered in this work is shown in Table 2.  Table 2: Summary of individual components comprising our baseline (also referred to as standard) and alternative foreground emission model used in our analysis.Each model component is described in section 3.

Phenomenological Models
We started our analysis runs with the phenomenological models (see Sect. 3.3).We are interested in comparing our findings to previous work on this matter, since we are using the CLEAN class data compared to the more commonly used SOURCE class data, which might influence the results of a morphological analysis significantly.
We tested two cases where the center of the templates is either fixed or free.For the latter we moved the templates through a grid of locations.First, we moved it through a low resolution grid with a stepsize of 0.1 degrees in (RA,DEC) space.We then followed up with a high resolution grid with a stepsize of 0.01 degrees in (RA,DEC) space starting from the best-fit spatial position of the low resolution analysis.This procedure did not improve the fit much and yielded an almost identical value to the templates with a fixed center, all lying in the range of 5.42 to 5.46.All the results are summarized in Table 3.To visualize the extension of these freed models we superimpose them on a significance map of the M31 signal in Fig. 8a.This significance map was obtained by moving a putative point-like source through a spatial coordinate grid and calculating the TS value at that point.The model used to test for the likelihood did not include a model for M31.The resulting map traces the morphology of the gamma-ray emission and is solely meant for visualizing the extent of the emission given our data.
A quick but important check is to see if there is anything we missed to model in our ROI, such as a point-like source we neglected to include in our background model.To see this, we again generate a significance map but this time we include M31 as modelled in the 4FGL-DR2 catalogue.We do not find any significant unmodelled excess in our ROI as can be seen in Fig. 8b.

Physically-Motivated Models
For our more physically-motivated models (such as e.g.our stellar templates based on empirical observations) we first establish a significance ranking, for which we test each of our templates individually.We present the results of these tests in the upper section of Table 4.We found that our gas and dust templates all yielded similar results in the range from about 2.7 to 3.0 , showing that the M31 excess is not obviously correlated with emission from gas or dust.We also fitted a 150 MHz radio template, motivated by a study investigating the contribution of MSPs to this radio frequency (Sudoh et al. 2021), but could not obtain a significantly enough detection for that template with only ∼1.7 .Also, the signal does not seem to come from the disk alone, as our stellar disk templates all have rather low significances between ∼2.2 and ∼2.4.Most importantly, our results support previous works, finding the signal to be spatially correlated with the bulge of M31.In this ranking, our bulge templates even slightly outrank the DM template, with ∼5.4 and ∼5 respectively.
Since our stellar map is a two-component model comprised of a stellar disk component incorporating a certain red giant age population together with a component for the bulge based on WISE data, we show the significances for all 10 possible combinations of them.The results are displayed in the lower section of Table 4.We find that all combinations have a very similar significance of around 4.6 .Naturally, we expect the MSPs to be present both in the disk and the bulge of M31.However, due to the weak nature of the excess and the lack of statistics, there is no strong support for both the stellar disk and bulge template together over the bulge template alone.However, we expect this to change with more future data and future -ray mis-sions.We analyze the spectrum of the excess using this combination of templates rather than the bulge alone, since they differ by less than one  and the lack of statistics in the higher energy bins led to fluctuating results, when using the bulge template alone.Using the best-fit "disk+bulge" model, we display the residual photon counts map, i.e. the difference of the data to the model, in Fig. 9.In the left panel, we only fit the Fermi foreground model to the data, whereas in the right panel we added the disk+bulge model.The inclusion of this model has the effect of smoothing out the residuals, especially in the center of the ROI, where the M31 excess is located.

Spectrum of the stellar disk plus bulge model
We also investigate the spectrum of our best-fit stellar disk + bulge model, i.e. the model for the MSP population in the disk and bulge of M31.The spectrum together with previous results of the Fermi collaboration is shown in Fig. 10.The flux values are comparable to the study first documenting the Andromeda excess Ackermann et al. (2017a) and to one of the most recent studies Armand & Calore (2021) ensuring the validity of our results.
To accomplish our final goal, which is obtaining the alternative background induced systematic uncertainties for this best-fit model, we adopt the procedure of Acero et al. (2016b).The authors of this study have developed a method to calculate these uncertainties, in which they employ alternative interstellar emission models in addition to their standard (STD) model.Using the notation of Ackermann et al. (2018), the systematic error is then calculated as In this equation,  STD is the value for which we want to find the systematic error of, i.e. the flux value when using the standard background model.The flux values and their statistical errors when using the alternative background models are   and   respectively.In our case, we used the Fermi background as our standard model and the 24 FEMs as our alternatives.The resulting systematic uncertainties are shown as the green error bars in Fig. 10.For our case the statistical errors dominate over the systematic uncertainties.The reason for these large errors in our case as opposed to the much smaller errors in the recovered spectrum in the original study of the M31 excess (Ackermann et al. 2017a) is the following.In that study, the individual normalizations of the point sources in the bin-by-bin analysis were kept close to the values obtained in their broadband (i.e. over the whole energy range) analysis.We stayed agnostic to the overall shape of the spectrum by only performing a bin-by-bin analysis and keeping the normalizations of all sources in our ROI completely free, since we wanted to understand how this would effect the results.It turns out that the statistics are too low to make a definite statement about this.It is interesting however, that the excess survives all parts of our analysis with our rigorous and agnostic treatment of the data.

Implications for the DM interpretation
We have tested how strong the DM hypothesis is and if it can survive the inclusion of our more physically-motivated models.We do this with a template-nesting approach similar to the one used in Di Mauro et al. (2019).Even if we assume that parts of the emission are due to annihilating DM, there is still the emission of astrophysical nature, which has to be accounted for.We therefore successively include templates for emission by MSPs in the disk and bulge of M31, as    well as a template for gas-and dust-correlated emission into the FEM.The chosen templates are based on the significance ranking established in section 4.1.2and summarized in Table 4.
The resulting changes in the TS value of the DM hypothesis can be seen in Fig. 11.If we only use the Fermi FEM and no additional components are included, the TS value reaches a ∼5  significance, but quickly falls off significantly (∼1 ) as we account for the astrophysical emission from putative MSPs in the bulge and disk of M31 as well as emission from gas and dust.In summary, there is no support for DM related emission once the stellar templates are included.

Luminosity
Using the spectrum of the best-fit model, we calculate the -ray luminosity of M31 as follows.We calculate the flux for each bin by assuming a power law anchored to  0 = 1000 MeV as where we have expressed the normalisation constant  0 in terms of the luminosity , the distance to M31  M31 , and the photon energy flux above 500 MeV  500 , which is obtained with with the limits of the whole energy range considered in this work of  min = 500 MeV and  max = 10 5 MeV.We can then optimize the  2 quantity where  bin i ,mod are the flux values of the model for each bin and  bin i ,err its associated statistical errors.The optimization was performed with the algorithm (Nelder & Mead 1965), available in the iminuit package 9 .With this we obtain a total luminosity for M31 of a luminosity for the stellar bulge of and a 95% C.L. upper limit for the stellar disk of Note that the disk luminosity is only an upper limit since it was not significantly detected while, on the other hand, a bulge signal is detected at better than 5 confidence.
In order to gauge the -ray emissivity per unit mass of the M31 stellar structures we note that the total mass of the galaxy (Tamm et al. 2012) with the remainder in the bulge (+ halo, using the notation of Tamm et al. 2012).A recent, more tightly-constrained determination (Blaña Díaz et al. 2018) for the stellar mass of the M31 bulge (assuming an NFW form for its DM distribution) is that it is From these measurements we infer the following central values (upper limit in the case of the disk) for the -ray emissivity per unit stellar mass

Millisecond Pulsar Interpretation
Here we consider whether MSPs belonging to M31 can supply its detected -ray signal.The stellar population of M31 is predominantly old with its bulge and disk populations of similar mean age (Tamm et al. 2012).On the basis of the data presented by Williams et al. (2017), we determine a mass-weighted mean stellar age of 10-11 Gyr across the entire galaxy.At such an age, the recent binary population synthesis modelling by Gautam et al. (2021) suggests that the MSP population of M31 numbers around 10 6 , and the spin-down power liberated by magnetic braking of these pulsars is ∼ 5×10 28 erg/s ; this is easily sufficient to sustain the observed -ray luminosity 10 .Despite the energetics being easily met, two question marks hang over any putative identification of the M31 -rays with an MSP signal: i) the rather hard spectrum shown in the SED plot (Figure 10) does not resemble the classical ∼ 2 GeV bump spectrum of -ray MSPs, pulsars, or, indeed, the GCE; and ii) despite the strong detection of the bulge, the disk signal is only marginally detected.Given, as already noted, these stellar populations are characterised by similar mean ages, why does the disk not produce an easily-detectable -ray signal from its MSPs given its stellar mass actually exceeds that of the bulge?
Of course, we should grant at the outset that, formally, neither of these points is a show-stopper given the scale of present uncertainties/errors in our analysis (i.e., the large error bars on the -ray SED we have measured would accommodate a bump-like signal and the 95% upper limit on the disk -ray efficiency is clearly compatible with the measured efficiency of the bulge).Nevertheless, in anticipation of tighter constraints, we point out here that there is a natural scenario involving MSPs that would accommodate both a hard SED from the M31 bulge and a small, perhaps undetectable, signal from the disk.This, namely, is that the overall bulge signal is dominated not by the aggregate, 'prompt' curvature radiation from the magnetospheres of all the individual (albeit unresolved) MSPs, but rather by the inverse-Compton up-scattering of ambient photons by the cosmic ray electrons and positrons launched by these MSPs into the bulge ISM (cf.Petrović et al. 2015;Ackermann et al. 2017a;Song et al. 2021).Such emission can easily reproduce the observed hard spectrum up to 10s of GeV (e.g., Gautam et al. 2021).A natural explanation, then, of why we do not detect (or only marginally detect) such IC emission from the disk is that, in this environment, the ratio   / ISRF of the magnetic field to the local interstellar radiation field energy densities is significantly smaller than in the bulge (see below).Indeed, consistent with this, the high concentration of stars in the M31 bulge guarantees that the ISRF should reach a peak in this region.
To put numbers to this, on the basis of the stellar luminosity we 11 calculate a total radiation field energy density in the M31 bulge of 12 eV cm −3 and a 0.7 eV cm −3 in the disk at the approximate radius of the star forming ring (10 kpc).For the magnetic field, according to radio continuum studies (and making the equipartition assumption) (Hoernes et al. 1998;Gießübel & Beck 2014), the field amplitude in M31 reaches a peak at a radius of 800 − 1000 pc into the bulge; the measured value here, 19 ± 3 G, thus defines an upper limit to the ISM magnetic field in the bulge region from where the measured -ray signal emerges.On the other hand, the magnetic field amplitude at ∼ 10 kpc radius coincident with the star-forming ring is "3-4 times" (Hoernes et al. 1998)  which implies a transition from IC loss dominance in the bulge to synchrotron loss dominance in the disk, consistent with our scenario (albeit poorly constrained given the uncertainties).An additional, practical consideration is that a postulated -ray signal related to the disk stars would be distributed over a significantly larger solid angle than the bulge signal and may be soaked up in one or other of the extended foreground templates.An obvious extension to our analysis here would be to introduce a simultaneous spectromorphological treatment of the non-thermal radio data covering M31 (e.g., Tabatabaei et al. 2013) in addition to the -ray data; such is beyond the scope of this paper.Overall, our results are thus consistent with previous findings that the M31 signal, despite the hard spectrum (assuming the central values of the data points are close to correct) and despite the weak or even (consistent with) null detection of the massive stellar disk, is completely consistent with a dominantly MSP origin.

DISCUSSION AND CONCLUSIONS
In this work, we have performed a re-analysis of 10 years of Fermi-LAT data in the GeV energy range from the M31 region.We have employed novel image reconstruction techniques to derive a multitude of alternative foreground models, giving us a robust framework for the evaluation of the systematic uncertainties in the Galactic diffuse gamma-ray emission models.We have concentrated on the morphology of the emission of M31 by constructing empirical templates, which best represent the stellar population in the bulge and disk of M31.We have opted for this approach instead of using analytical functions like the Einasto profile to avoid possible morphological degeneracies with the DM distribution -as it is popularly modelled by density distributions similar to Einasto profiles.
The main results from our analyses, which involved the use of multiple foreground models, can be summarized as follows: • In this work we used Fermi-LAT photon events with improved directional reconstruction at low energies (PSF3 for the first three energy bins).One of our main goals was to have a better handle on the spatial morphology of the signal, whereas at low energies the angular resolution of the LAT instrument worsens.This implied a trade off between angular resolution and statistical power of the recovered signal for M31.Additionally, we varied the fluxes of all sources included in our region of interest.These two factors combined implied larger (statistical) error bars for the energy spectrum of M31, compared to previous studies.Based on this spectrum we could determine a -ray luminosity of  ,M31 = (1.6 ± 0.7) • 10 38 erg/s, which lies within the range of values compatible with the results in e.g., Ackermann et al. (2017a).
• The M31 excess was robustly detected despite the large set of alternative foreground models used in our analysis.By testing models for the signal (e.g., stellar templates) with different approaches of modelling the -ray foreground emission, we could estimate the systematic uncertainties related to the modeling of these complex diffuse emissions.We conclude that it is unlikely that the M31 excess is caused by statistical fluctuations of the Galactic foreground.We also found that that this excess emission appears localized within the bulge and disk regions of M31.
• We have tested stellar templates specifically constructed for this region of the sky to model the emission of a putative population of MSPs in the bulge and disk of M31.We have found these to be as strong as the widely used phenomenological models -a circular disk template with a uniform or Gaussian brightness profile -both sitting at the 5.4  level.With future observations, we expect that more accurate stellar templates will be able to outperform the more simple phenomenological models.
• Our findings do not support the star formation scenario, in which the regions rich in gas and dust contribute to the gamma-ray emission, since we do not detect the Spitzer and Herschel templates.As we also do not detect any of the hydrogen maps (Braun,BraunV2 and BraunV3) individually, the scenario where the main contribution comes from hadronic-only processes is unlikely as well.These findings are consistent with previous works (e.g.Ackermann et al. (2017a)) and can be linked to the properties of M31, specifically to the fact that the star formation rate of M31 was found to be about 10 times smaller compared to that of the MW (Ford et al. 2013), which can lead to a decreased contribution to the emission from the disk.
• We have tested whether the DM hypothesis survives the inclusion of the stellar templates.By nesting the stellar and DM templates during the fit, we find that the significance of the DM template drops to the 1  level.We therefore come to a similar conclusion as in related studies using this methodology (e.g. in Di Mauro et al. (2019); Armand & Calore (2021) for M31 and e.g. in Macias et al. (2019) for the MW): There is no significant leftover -ray emission which can be attributed to annihilating DM in the center of M31, when maps of stellar mass are included in the fit.
• We used the results of the binary population synthesis modelling by Gautam et al. (2021) to determine whether the MSPs theory could explain the observed luminosity per stellar mass, hard spectrum, and high stellar bulge-to-disk flux ratio.We found that (i) the observed energetics are easily met if M31 hosts about 10 6 unresolved pulsars with an average spin-down luminosity of ∼ 5 × 10 28 erg/s , and (ii) since the electrons injected by the MSPs lose energy more efficiently in the bulge than in the disk through IC, it is reasonable to conclude that both the hard spectrum and high bulge-to-disk ratio could be explained by an MSPs IC emission scenario.A potential detection of a high-energy tail (at TeV energies) in the M31 spectrum would provide strong support for such a scenario.In future work, we will perform a sensitivity analysis similar to those by e.g., Macias et al. (2021); Song et al. (2019) to investigate the capabilities of the upcoming Cherenkov Telescope Array to a tentative high-energy tail from MSPs in M31.

Figure 1 .Figure 2 .
Figure 1.Visualization of the inpainting process when using the SMILE algorithm.From left to right, the three columns show the original hydrogen template, the masked region and the inpainted final version.The upper and lower rows show the H component of Macias et al. (2018) (as described in the text) and the H 4PI map, respectively.

Figure 3 .
Figure 3. Color-magnitude diagram of all stars in a 150 kpc radius of M31 from the PAndAS survey with a 0.02 × 0.02 magnitude pixel size.The selection box boundaries are the same as in Martin et al. (2013), where the upper and right box limit is determined by the tip of the red giant branch (the highest reaching isochrone) and the right most reaching isochrone (the reddest star) respectively.

Figure 4 .Figure 5 .
Figure 4. Visualisation of the three stages of the stellar map construction.The coordinates ( ,  ) are the equatorial coordinates ( , ), projected on a plane tangential to the M31 centered celestial sphere.Left: All stars in our line of sight, where some are part of the Milky Way system.Middle: The model for the stars belonging to the Milky Way system, which are contaminating the sample we are interested in, i.e. all stars of the M31 system.Right: Residual stars after subtracting the contaminant stars, resulting in stars mostly belonging to Andromeda.

Figure 6 .
Figure 6.Radial density profile comparison of our W1 and W2 bulge templates along the major and minor axis (the dotted black lines in the left panel) with the bulge Einasto profile used in Armand & Calore (2021) and the squared NFW profile.All the panels are shown in arbitrary units (a.u).

Figure 7 .
Figure 7. Radial profiles of the J-factor of our DM template for both the M31 and MW halo component, consistent with those of Karwin et al. (2021).
Best-fit position of the phenomenological models for M31, where we scanned over a grid of spatial locations to find the best-fit position.The significance map generated by moving a putative point source through a grid of locations when including the source catalog model for M31.There is no significant unmodelled excess that could be interpreted as a point-like source.

Figure 9 .
Figure 9. Residual maps when we only fit the Fermi foreground model (left panel) to the data and if we include our best-fit bulge+disk model (right panel).The maps have been smoothed with a Gaussian kernel covering 1 • , approximately corresponding to the PSF of Fermi-LAT at lower energies.

Figure 10 .
Figure10.Spectrum of the fit with M31 modelled with the stellar disk (3-5 Gyr) + W1 bulge template, representing a putative MSP population.The red and green error bars represent the statistical and systematical errors, respectively.The flux values when using a uniform disk template of a previous analysis from(Ackermann et al. 2017a) are shown in black.

Figure 11 .
Figure11.The decrease in significance of detection of the DM template, when we one-by-one include a stellar disk, a bulge, a gas and a radio template.

Table 1 :
Fermi-LAT data selection criteria used for our analysis (applied to the data with the Fermi ScienceTools).
Elad et al. (2005)they reconstruct the foreground hydrogen maps based on the techniques described inElad et al. (2005); Acero et al.

Table 3 :
Significances for our phenomenological models for M31.The number of degrees of freedom of each template, i.e. the number of parameters left free to vary in the fit ( .. .), corresponds to 9.

Table 4 :
Significances for our physically motivated models and the two-component stellar maps.The TS and  values seperated by the slash symbol refer to the Templates seperated by the slash symbol in the same order.E.g. the right TS and  value in the last row corresponds to the Stellar Disk (11-13 Gyr) + W2 Bulge template.