A Probabilistic Method of Background Removal for High Energy Astrophysics Data

We present a new statistical method for constructing background-subtracted measurements from event list data gathered by X-ray and gamma ray observatories. This method was initially developed specifically to construct images that account for the high background fraction and low overall count rates observed in survey data from the Mikhail Pavlinsky ART-XC telescope aboard the Spectrum-Röntgen-Gamma (SRG) mission, although the mathematical underpinnings are valid for data taken with other imaging missions and analysis applications. This method fully accounts for the expected Poisson fluctuations in both the sky photon and non X-ray background count rates in a manner that does not result in unphysical negative counts. We derive the formulae for arbitrary confidence intervals for the source counts and show that our new measurement converges exactly to the standard background subtraction calculation in the high signal limit. Utilizing these results, we discuss several variants of images designed to optimize different science goals for both pointed and slewing telescopes. Using realistic simulated data of a galaxy cluster as observed by ART-XC we show that our method provides a more significant and robust detection of the cluster emission as compared to a standard background subtraction. We also demonstrate its advantages using real observations of a point source from the ART-XC telescope. These calculations may have widespread applications for a number of source classes observed with high energy telescopes.

Several well-known challenges complicate a simple subtraction of the background in the X-ray and gamma ray regimes. The first, and in some ways the most obvious, is that the expected background counts can often exceed what is observed due to routine Poisson fluctuations. In this case the net counts are in fact negative, which is unphysical. These negative pixels/channels can be ignored or modified to be non-negative, albeit at the cost of losing information about the source. A related concern is that the large Poisson fluctuations expected for observations with only a few counts can lead to regions where noise can be misinterpreted as signal. Both of these concerns can be further compounded by the fact that high energy telescopes generally do not point at a single sky position during an observation: the telescope is usually dithering around or slewing across a particular sky position during the nominal exposure, complicating the mapping between detector coordinates (where many of the non X-ray background, NXB, events are often generated) and sky coordinates (where the background of interest usually needs to be measured). One final challenge with a simple subtraction, from a statistics perspective, is that while the sum of two Poisson-distributed random variables is itself Poisson distributed, the difference is not.¹
More sophisticated and rigorous methods of accounting for the background around sources often require the use of a model for the source in question. For example, the survey for sources in the Chandra Deep Field South (e.g. Luo et al. 2017, and references therein) compares counts within a model of the Point Spread Function (PSF) to those within a larger source-free region to calculate the significance of emission at the positions of candidate sources. Another method for identifying sources involves convolving images generated from event lists with a filter whose properties incorporate assumptions about the telescope PSF and background, such as the analysis described in Pavlinsky et al. (2021a). Such methods are less frequently utilized for extended and diffuse sources, as implementing source models for extended emission in such methods is more uncertain and cumbersome.
The Mikhail Pavlinsky ART-XC (hereafter ART-XC) telescope on board the Spectrum-Röntgen-Gamma (SRG) mission also operates in a background-limited regime. The seven telescopes comprising this instrument are currently performing the first high spatial resolution (∼50″) all-sky survey in the 4−30 keV bandpass. The rapid motion of ART-XC across the sky results in event lists with only short exposure times at each sky position, with only a handful of photons expected to arrive with each pass. Compared to the other X-ray astronomy missions described above, background events in the ART-XC detectors constitute a significantly higher fraction of the total events and cannot be easily ignored or neglected. Removing background counts from these data is clearly subject to the limitations mentioned in the paragraphs above.
In order to address these challenges and produce additional meaningful measurements of the sky with the ART-XC event list data, we present a new statistical method for removing background events from X-ray images and estimating the sky photon count rates in detector coordinates. Our new method is very similar to a simple background subtraction but is not subject to the most serious drawbacks and limitations of that method. It is therefore especially well suited for analyzing diffuse and unresolved sources at low signal-to-noise in a model-independent fashion. The fundamental assumption behind this method is that, given the data available for each event, it is impossible to definitively identify any individual event as a background event or a photon from an astrophysical source (hereafter a sky photon). We can therefore only assign a probability as to whether it originates from the background or the sky. Although this methodology was developed specifically with ART-XC event lists (where the total counts per detector pixel are generally ∼ 30 or fewer and dominated by background events) in mind, the mathematical underpinnings of this method are valid for nearly all high energy astrophysics missions that observe data as discrete photon events. This methodology may have many useful potential applications for other X-ray and gamma ray telescope data currently limited by low signal and high background count rates. This paper is organized as follows: Section 2 discusses the mathematical foundations of how to estimate the sky photon counts in a given detector region, and Section 3 describes where and when these methods provide advantages over past practices. Section 4 describes some tests using Monte Carlo simulations to characterize the bias of our new estimator and compare its performance to a standard background subtraction calculation.
Section 5 provides more expansive details as to how these calculations can be used to produce images for both pointed and survey observations, and Section 6 shows images produced using these new calculations on both real and simulated ART-XC data. Section 7 provides concluding remarks.

ESTIMATING THE SKY PHOTON RATE
Our process hinges on the following assumptions about the X-ray observation event list; unless otherwise noted, all variables are defined for an individual region on the detector or sky, which may be as small as an individual detector pixel: (i) The event list has already undergone a full battery of cleaning and processing steps. These steps may include restrictions on the energy band, event times, data quality flags, or region of the detector. While this is not strictly necessary from the standpoint of the mathematics we are about to describe, this method is not designed to further refine the parameter space of event properties.
(ii) The background events and sky photons in the event list are drawn from independent Poisson processes with expectation values of μ_B and μ_S, respectively. These two variables are in units of integrated counts in that detector region over the integrated exposure. While the calculations described below do not make any assumptions about the relative values of these two expectation values, in practice we are most concerned with the regime where μ_B > μ_S and μ_S ≳ 0.1.
(iii) An estimate of μ_B for the event list and detector region in question can be derived a priori, either from these same data or from a different data set. Precisely how this calculation is made, what its value is, and which data it uses will depend strongly on the telescope in question.
As a concrete example, the expected background counts for a given science observation might be estimated using "blank-sky" observations subject to the same quality cuts.
(iv) For the region in question, we observe a total of N counts, of which B originate from the background and S are sky photons. The exact values of B and S cannot be directly measured, but they are obviously both non-negative integers that sum to N. Our final estimate will need to account for all possible partitions of the observed counts into background and sky photon components (i.e. B taking all values from 0 to N).
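The generative model implied by assumptions (ii) and (iv) can be sketched in a few lines; the expectation values below are placeholders for illustration, not ART-XC specifics:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

mu_B = 5.0   # expected background counts in the region (assumed known a priori)
mu_S = 2.0   # expected sky photon counts (the quantity we wish to estimate)

# The observed total N is the sum of two independent Poisson draws;
# the individual split (B, S) is not observable -- only N is.
B = rng.poisson(mu_B)
S = rng.poisson(mu_S)
N = B + S
```

Any estimator built on these assumptions only ever sees N and the a priori value of μ_B.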
The value of scientific interest is μ_S, the expectation value for the number of events in this particular region that originate from sky photons. Using the variables, notation, and assumptions described above, a simple background subtraction calculation would estimate μ_S as

μ_S ≈ N − μ_B.    (1)

When considering the regime where μ_S ∼ μ_B and N ∼ O(10), the result of Equation 1 has a high probability of being negative due to the expected Poisson fluctuations of the background (i.e. there is a high probability of observing fewer counts than the expectation value of the background). Furthermore, the Poisson fluctuations of both variables can greatly skew this estimate of the expected sky counts when μ_S ∼ μ_B. These are the regimes that this calculation is designed to address. Our treatment of our two Poisson processes will be Bayesian: rather than assume that the observed number count N is a randomly distributed variable about "known" values of μ_S and μ_B, we will instead keep the expected background counts fixed while calculating the probabilities of μ_S given that N counts were observed. The fundamental formula at the heart of this entire calculation is the canonical Poisson probability mass function. For a known value of μ, we denote the probability of observing k counts as

P(k | μ) = μ^k e^{−μ} / k!.    (2)

THE SUM OF TWO POISSON DISTRIBUTIONS -THE BAYESIAN PERSPECTIVE
Because of the assumptions we are making and the relative mathematical simplicity of the Poisson distribution, we can use Bayesian statistics to construct a posterior probability distribution that assumes the broadest and flattest possible prior² on our only "unknown" parameter μ_S. Assuming that the prior distribution of μ_S is a constant across the entire feasible domain, i.e. π(μ_S) = const ∀ μ_S ∈ (0, ∞), may at first glance seem excessively broad. As we will now demonstrate, however, an exact analytic formula for an arbitrary central posterior density interval can be calculated under these assumptions. Since the sum of two Poisson distributed variables is itself a Poisson distributed variable, we can begin by defining the likelihood function for this model, L(N | μ_S, μ_B), as follows:

L(N | μ_S, μ_B) = (μ_S + μ_B)^N e^{−(μ_S + μ_B)} / N!,    (3)

which is effectively identical to the probability mass function in Equation 2. By requiring the integral of this function over μ_S to be equal to unity, we determine the posterior PDF as

P(μ_S | N, μ_B) = (μ_S + μ_B)^N e^{−(μ_S + μ_B)} / Γ(N + 1, μ_B),    (4)

where Γ(N + 1, μ_B) is the incomplete upper gamma function, replacing the N! term in the denominator in order to normalize the integral over μ_S instead of the sum over N. A simple derivative calculation shows that this posterior PDF is maximized at either μ_S = N − μ_B whenever this difference is non-negative or μ_S = 0 whenever N − μ_B < 0. We also note that, in the absence of any background (i.e. μ_B = 0), the denominator would be exactly equal to N!, leaving Equation 3 unchanged. The expectation value of μ_S for this posterior PDF is given as

⟨μ_S⟩ = ∫₀^∞ μ_S P(μ_S | N, μ_B) dμ_S,    (5)

and can also be calculated exactly as

⟨μ_S⟩ = Γ(N + 2, μ_B) / Γ(N + 1, μ_B) − μ_B.    (6)

We can also calculate the cumulative distribution function F(μ_S) as

F(μ_S) = ∫₀^{μ_S} P(μ′ | N, μ_B) dμ′.    (7)

The CDF has an exact analytic form, and can be written as

F(μ_S) = [γ(N + 1, μ_S + μ_B) − γ(N + 1, μ_B)] / Γ(N + 1, μ_B),    (8)

where γ is the incomplete lower gamma function. Using this form, the median value of this posterior PDF can be determined by solving the equation F(μ_S) = 1/2 numerically. Similarly, this analytic formula also enables a central posterior density interval to be determined.
The upper and lower bounds of an interval that contains a fraction f ∈ (0, 1) of the posterior PDF can be determined by solving F(μ_S) = (1 ± f)/2. As a concrete example, the lower and upper bounds of the 68% (1σ) posterior density interval of μ_S can be calculated by setting the right hand side of Equation 8 equal to 0.16 and 0.84, respectively.
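The CDF in Equation 8 and the corresponding central posterior density interval can be evaluated with standard special functions. In the following sketch (function names are our own), SciPy's gammainc/gammaincc are the regularized incomplete gamma functions; their common normalization by Γ(N+1) cancels in the ratio:

```python
import numpy as np
from scipy.special import gammainc, gammaincc
from scipy.optimize import brentq

def posterior_cdf(mu_s, n, mu_b):
    """Posterior CDF of mu_S given N = n observed counts and expected
    background mu_b: F = [gamma(n+1, mu_s+mu_b) - gamma(n+1, mu_b)] / Gamma(n+1, mu_b)."""
    num = gammainc(n + 1, mu_s + mu_b) - gammainc(n + 1, mu_b)
    den = gammaincc(n + 1, mu_b)  # underflows when mu_b >> n (see text)
    return num / den

def central_interval(n, mu_b, frac=0.68):
    """Central posterior density interval containing `frac` of the posterior,
    found by root-finding F(mu_s) = (1 -/+ frac)/2."""
    lo_q, hi_q = (1 - frac) / 2, (1 + frac) / 2
    upper = n + mu_b + 20 * np.sqrt(n + mu_b + 1)  # generous bracket
    lo = brentq(lambda m: posterior_cdf(m, n, mu_b) - lo_q, 0.0, upper)
    hi = brentq(lambda m: posterior_cdf(m, n, mu_b) - hi_q, 0.0, upper)
    return lo, hi
```

The underflow of the denominator when μ_B greatly exceeds N is the numerical limitation on interval-finding discussed in the regimes-of-applicability section below.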

A SUMMARY STATISTIC FOR THE POSTERIOR DISTRIBUTION
The results of the previous section suggest a few summary statistics that might serve as well-motivated point estimates of the posterior PDF given N and μ_B. However, we have identified another summary statistic of μ_S (which we denote as μ_S★) that has better asymptotic behavior and lower bias than any of the estimators derived from the posterior PDF discussed thus far. In this section we will construct this point estimator and consider it in the context of the posterior PDF results of the previous section.
To construct μ_S★, we first consider an individual micro-state of the data where B events originate from the background and S = N − B events originate from the sky. For this micro-state, we can calculate the posterior PDF for μ_S given S observed sky photons and a uniform prior over the domain of μ_S ∈ (0, ∞), P(μ_S | S), as the canonical Poisson function

P(μ_S | S) = μ_S^S e^{−μ_S} / S!,    (9)

where we are able to simplify significantly as compared to Equation 4. For this particular micro-state, we can determine a single-point estimate of μ_S (denoted as μ̄_S(S)) by calculating the value of μ_S for which this posterior PDF is maximized. Solving this elementary calculation results in a simple and intuitive relationship between μ̄_S and S,

μ̄_S(S) = S,    (10)

that is trivial to calculate for all values of S. We can also calculate the probability of observing B background events given the expected background counts μ_B,

P(B | μ_B) = μ_B^B e^{−μ_B} / B!.    (11)

Some example values of μ̄_S and the associated micro-state probabilities are shown in Figure 1. Our final measurement of the expected sky photon counts in this region, μ_S★, is then calculated as a weighted average of the values of μ̄_S over all possible values of B, where the weights themselves are the probabilities of observing the background counts in that micro-state:

μ_S★ = (1/W) Σ_{B=0}^{N} μ̄_S(N − B) P(B | μ_B),    (12)

with the variable W being the sum of all of the weights,

W = Σ_{B=0}^{N} P(B | μ_B).    (13)

This weighted average takes into account all possible realizations as to how the observed counts in the region can be distributed into sky photons and background counts. It is also by construction positive definite irrespective of the values of N and μ_B. As we show in Appendix A, this probabilistic summary statistic also converges exactly to simple background subtraction in the regime where N ≫ μ_B. We finally use μ_S★ to calculate the probability that each individual event originates from a sky photon, p_S, as

p_S = μ_S★ / N.    (14)

Similarly, the probability of a single event being associated with a background event, p_B, is estimated as

p_B = 1 − μ_S★ / N.    (15)

We emphasize that these two probabilities clearly sum to unity.
For data at very high signal-to-noise, where N ≫ μ_B, one can simply utilize the result of simple background subtraction (N − μ_B) in place of μ_S★, which can help remove the background events from images and other data products (see §6 for more details).
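The weighted average defining μ_S★ is only a short sum in practice. A minimal sketch (function and variable names are ours, not from an official ART-XC pipeline):

```python
import numpy as np
from scipy.stats import poisson

def mu_star(n, mu_b):
    """Weighted-average estimator of the sky counts: for each micro-state with
    b background events, the point estimate of mu_S is (n - b); the weights
    are the Poisson probabilities P(b | mu_b), truncated at b = n."""
    b = np.arange(n + 1)
    w = poisson.pmf(b, mu_b)            # P(b | mu_b)
    return np.sum((n - b) * w) / np.sum(w)

def p_sky(n, mu_b):
    """Per-event probability of being a sky photon (assumes n >= 1)."""
    return mu_star(n, mu_b) / n
```

For N = 1 the sum collapses to μ_S★ = 1/(1 + μ_B), and for N ≫ μ_B the truncation of the weights becomes irrelevant and the estimator approaches N − μ_B.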
Three examples of the posterior PDF P(μ_S | N, μ_B) and the corresponding 95% posterior density intervals (f = 0.95) for different values of N and μ_B are shown in Figure 2. These figures show a few properties of these confidence intervals that hold generally, mainly that the intervals are highly asymmetric around our derived value of μ_S★. This asymmetry arises directly from the asymmetry in the underlying Poisson-like distribution. Our estimator is also systematically lower than the mean and median estimators of μ_S.
A plot showing how the values of our three summary statistics and the 68% posterior density interval depend on μ_B for a total of N = 25 events is shown in Figure 3. One particularly convenient property of our new estimator μ_S★ is that it converges exactly to simple subtraction when the observed counts are much higher than the expected background (see the Appendix for a proof of this), while the median and mean values are systematically higher. As the expected background increases (while keeping the total number of observed counts unchanged), our estimator μ_S★ smoothly converges to zero as expected, although it remains positive for all values of μ_B. This behavior reflects the fact that the total observed counts become increasingly more likely to be a fluctuation in the background than evidence of a second independent source of events.

Regimes of Applicability
While mathematically valid for any number of counts drawn from a Poisson distribution, this method should not completely supersede simpler background subtraction methods. When there are sufficient sky photon events, simple background subtraction and the Gaussian approximations are sufficiently accurate. At the same time, the probabilistic treatment can be more time consuming to calculate at higher signal-to-noise. Unlike a simple subtraction calculation, a sum over all integers up to the total counts in that region is required to estimate the sky photon counts using this probabilistic method. In other words, a region with N events requires at least a factor of N more arithmetic operations than standard background subtraction. For event lists that have ∼ O(100) counts per region, this can become significantly slower than a simpler, more traditional background subtraction calculation, with negligible gains in accuracy.
Considering the numerical stability and accuracy of the calculation, the sum over all states is straightforward to calculate accurately for all values of N and μ_B. When the expected number of background counts greatly exceeds the total number of observed counts (μ_B ≳ 2N), however, the ability to find roots of the CDF as calculated in Equation 8 can be limited by the numerical precision to which the incomplete gamma functions can be computed. It may not be possible to numerically determine a posterior density interval in these cases. This is also the situation where the observations are a low-probability realization of the expected background in an absolute sense, so we encourage any users who find themselves in this situation to review their background models carefully.
One final detail that was implicitly assumed for the calculations in §2 is that the expected background counts in the region in question (μ_B) is a value of order unity or larger. For situations where the expected background counts in the test region are significantly smaller than unity, the probability of any individual event being associated with the background becomes negligible and practically every event is treated as being from sky photons. Given the realities of total count rates for X-ray and gamma ray telescopes, those who wish to apply these calculations to real data should bin the event lists either in regions of detector space or time intervals in order to have an expected background count of O(1), or at least such that there is a non-negligible probability of detecting at least one background event. Using nominal ART-XC values as a quantitative example of why such binning is necessary, we assume a background count rate of ∼ 10⁻⁴ cts s⁻¹ pix⁻¹ for a detector with 1800 pixels. If the data are broken up into 1 s time intervals at the native detector resolution (i.e. with μ_B = 10⁻⁴), we would infer that practically every event is associated with a sky photon (for N = 1 observed counts in a pixel, μ_S★ = 0.9999). The need for such binning is not unique to our new calculation: simple background subtraction results in an identical value. In fact, given the design of this calculation, bins an order of magnitude smaller than what may be needed for Gaussian approximations to hold can be used. It is beyond the scope of this paper to discuss the best practices for binning up the data for a general observatory, however, given that these choices will strongly depend both on the mission-specific details of the telescope as well as the science questions of interest.

Figure 3 caption: The estimate of μ_S★ is by construction positive definite and converges to zero as the expected background becomes much larger than the observed counts. The median and mean values show similar behavior but tend to be higher than μ_S★ and N − μ_B by approximately 1 count.
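The binning argument above can be reproduced with a few lines of arithmetic; the numbers are the nominal illustrative values quoted in the text, and for N = 1 the weighted average reduces analytically to 1/(1 + μ_B):

```python
# Nominal background rate and pixel count used in the text (illustrative only)
rate = 1e-4      # background cts/s/pix
n_pix = 1800     # detector pixels

# 1 s slice, single pixel: mu_B = 1e-4, far below unity
mu_b_pixel = rate * 1.0

# For N = 1 observed count, mu_S* = [1*P(0|mu_B) + 0*P(1|mu_B)] / [P(0|mu_B) + P(1|mu_B)]
#                                 = 1 / (1 + mu_B)  ~ 0.9999
mu_star_single = 1.0 / (1.0 + mu_b_pixel)

# Integration time needed for mu_B ~ 1 over the full detector
t_needed = 1.0 / (rate * n_pix)   # a few seconds
```

In other words, at the native pixel and 1 s resolution the calculation (like simple subtraction) assigns essentially every event to the sky; integrating over the full detector for several seconds brings μ_B up to order unity.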

Bias
One of the desirable properties of a simple subtraction calculation is that it is an unbiased statistical estimator of (see the Appendix for a simple proof of this claim), at least for situations where the difference is not restricted to non-negative values. Imposing the physical restriction that our estimator of must be non-negative will inevitably introduce some bias into our estimation. In order to investigate the bias for our new summary statistic ★ , we utilize Monte Carlo simulations to estimate the expected bias compared to − . We also show, as a comparison, the equivalent bias calculations for the other two summary statistics presented earlier in this paper.
We have run 25,000 Monte Carlo simulations of Poisson processes with randomly determined ground-truth values of μ_S and μ_B, each between 0.05 and 50. This range of parameter values is based on the arguments made in the previous section, since this calculation does not offer sizable advantages in either accuracy or computation speed compared to the simple subtraction method when the total counts exceed ∼ 100. For each realization of these two parameter values, we generate observed counts using Poisson distributions and calculate all four summary statistics. We calculate the average bias with respect to the true simulation input value of μ_S. The results of this exercise can be found in Figure 4. As expected, all three of our new summary statistics are found to be biased, especially at the lowest values of μ_S. Of the three, only our new estimator μ_S★ converges to zero bias at high values of μ_S. This numerical convergence in the average bias is unsurprising given the mathematical convergence between the two estimators in this regime. The median value of μ_S and the mean value ⟨μ_S⟩ both converge to higher, non-zero bias values of ∼ 0.6 − 1.2 counts for all values of μ_S investigated here.
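A compact sketch of this kind of Monte Carlo comparison is given below (seed and sample size are arbitrary, and the estimator is re-implemented locally). Because μ_S★ never falls below N − μ_B, its average bias must sit above that of simple subtraction, which is unbiased:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)

def mu_star(n, mu_b):
    """Weighted-average sky-count estimator (sum over micro-states)."""
    b = np.arange(n + 1)
    w = poisson.pmf(b, mu_b)
    return np.sum((n - b) * w) / np.sum(w)

n_sim = 20000
mu_s_true = rng.uniform(0.05, 50, n_sim)
mu_b_true = rng.uniform(0.05, 50, n_sim)
n_obs = rng.poisson(mu_s_true) + rng.poisson(mu_b_true)

est = np.array([mu_star(n, mb) for n, mb in zip(n_obs, mu_b_true)])
bias_star = np.mean(est - mu_s_true)             # positive, shrinks as mu_S grows
bias_sub = np.mean(n_obs - mu_b_true - mu_s_true)  # consistent with zero
```

Binning the bias by the true μ_S (rather than averaging over the whole range, as here) reproduces the trends shown in Figure 4.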

Central Tendency
The performance of our new estimator cannot be judged solely on the basis of its bias. Another dimension along which a particular summary statistic can be tested is whether or not its value is representative of the underlying distribution. To investigate this, we calculated the average value of the posterior CDF (see Equation 7) at each of the four estimators as a function of the ground-truth value of μ_S used as simulation input. By construction, our median estimator will have an average CDF value of 0.5. The results, shown in Figure 5, demonstrate that our new estimator has a nearly constant average CDF value of ∼ 0.46. The average CDF for the simple subtraction calculation³ rapidly decreases at low values of μ_S, showing that while this estimator is unbiased, that lack of bias comes at a serious cost with regard to how representative the estimate is of the posterior given the observations. The mean value has an average CDF value that is slightly higher than 0.5, which increases as μ_S decreases. In summary, our new probabilistic calculation has many "desirable" characteristics: it is positive definite, converges to zero sky counts as the expected background greatly exceeds the total number of observed counts, and converges to simple subtraction in the high counts limit. It also tends to be more representative of the underlying posterior PDF than simple subtraction. We therefore conclude that the μ_S★ estimator is superior to simple subtraction at low counts without introducing any adverse systematic effects at higher count rates.

CREATING IMAGES
Our new calculations offer the ability to assign, for each event in a typical X-ray or gamma ray event list file, the probability of that event being associated with either a sky photon or the NXB. This enables background events to be removed or down-weighted from images even in complex situations where the telescope is in continuous motion during the observation. In this section we discuss how to create meaningful images using these calculations for both pointed observations and survey data. There are several different variants of a standard image, each well suited for some tasks and limited for others.

Random Removal of Likely Background Events
The background events from an event list can be removed statistically by assigning to each event its associated value of p_S. A second number between 0 and 1 is randomly drawn from a uniform distribution. If this random number is less than the value of p_S, then the event is kept and included in the final image; otherwise it is discarded. For a survey telescope like ART-XC, the surviving events are mapped to sky coordinates using the spacecraft's attitude at the event's arrival time and divided by the vignetting factor of the telescope at the event's detector coordinates. The total image of vignetting-corrected events can then be divided by an exposure map that describes the dwell time of the telescope at each position to create a fully exposure-corrected image. This method of background subtraction offers an exposure-corrected image that is "flat", in the sense that the large-scale spatial distribution of counts is corrected for the expected behavior of the optics/detectors and pointing direction. It is limited in that it is susceptible to hot pixels that survive this removal process. For similar reasons, the resultant image may also have reduced signal-to-noise for strong sources, as events originating from "real" sources can possibly be removed from the image.
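The accept/reject step is a one-liner in vectorized form. In this sketch the event positions and p_S values are synthetic placeholders standing in for a real event list:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical event list: sky coordinates and per-event sky probabilities p_S
n_events = 1000
x = rng.uniform(0, 100, n_events)       # placeholder sky pixel coordinates
y = rng.uniform(0, 100, n_events)
p_s = rng.uniform(0.2, 0.9, n_events)   # p_S assigned to each event

# Keep an event iff a uniform draw falls below its p_S; discard otherwise
keep = rng.uniform(0, 1, n_events) < p_s
x_kept, y_kept = x[keep], y[keep]
```

The surviving (x_kept, y_kept) events would then be vignetting- and exposure-corrected as described above.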

A Fractional Photon Map
A different variant of the background subtracted image is to assign the probability p_S, as calculated above, to each event's sky position. In essence, we assign to each event a fraction of a photon, the precise value of which is given by the probability of the event originating from a sky photon. These fractional photons can then be corrected for vignetting before being mapped to their sky positions. This version has far more pixels with non-zero values than the statistically subtracted version described above, as no events are removed outright. Because the background events are still nominally present, the spatial distribution of image pixel values will not trace the vignetting function of the optics. This is, however, one method of producing an image that has fewer noisy pixels and better preserves the flux of candidate sources. It may be well suited for calculating the significance of any source flux at positions of interest.
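A fractional photon map amounts to a weighted 2D histogram of event positions, with weight p_S divided by the vignetting factor. The inputs below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(3)

n_events = 500
x = rng.uniform(0, 10, n_events)        # placeholder sky coordinates
y = rng.uniform(0, 10, n_events)
p_s = rng.uniform(0.1, 0.9, n_events)   # per-event sky probability
vig = rng.uniform(0.5, 1.0, n_events)   # vignetting factor at each event's detector position

# Each event contributes p_S / vignetting "fractional photons" to its sky pixel
weights = p_s / vig
img, _, _ = np.histogram2d(x, y, bins=20, range=[[0, 10], [0, 10]],
                           weights=weights)
```

No event is discarded; the total "flux" in the image equals the sum of the fractional weights.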

A False Probability Map
For some applications, an image of the p_B values (i.e. the probability of a particular event being associated with a background event) for each sky pixel may be of interest. This can be an especially useful quantity for source detection, as it can easily be turned into a false probability map. For missions with multiple telescopes of similar performance such as ART-XC, the false probability value across all telescopes, P_false, can be determined as a simple product,

P_false = Π_i p_B,i ,

where p_B,i is the value of p_B for the i-th instrument or telescope⁴. Bright sources would appear as minima in the false probability map, so for visualization/source detection purposes it may be more appropriate to present an image of 1 − P_false. Such a map can readily identify regions with anomalously high sky photon counts, but given that the image pixel values are not proportional to the event counts, it cannot be corrected with the exposure map.
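The per-pixel combination across telescopes is a plain product; the seven p_B values below are hypothetical:

```python
import numpy as np

# Hypothetical p_B values for the same sky pixel from seven telescopes
p_b = np.array([0.6, 0.8, 0.5, 0.9, 0.7, 0.4, 0.75])

p_false = np.prod(p_b)            # probability that all events are background
detection_value = 1.0 - p_false   # high values flag candidate sources
```

For large numbers of events or telescopes the product can underflow, in which case summing log(p_B) values is the numerically safer equivalent.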

EXAMPLES USING ART-XC DATA
The data from the recently launched ART-XC telescope are one example where such a probabilistic method of background subtraction is essential for producing meaningful images. Before discussing the nature of the data, we will summarize a few hardware and operational details for the ART-XC mission that motivate our analysis procedure. More details regarding the ART-XC detector hardware can be found in Pavlinsky et al. (2021b).

Telescope Summary
The ART-XC telescope system is composed of seven separate mirror module assemblies (MMAs) and detector systems, known as the Units of Röntgen Detectors (URDs). Each URD is a CdTe detector with two perpendicular arrays of 48 Au/Pt electrode strips, resulting in a total of 48 × 48 = 2,304 detector "pixels" (Pavlinsky et al. 2018a) with an angular size of ∼ 45″. Laboratory tests of the MMAs with high resolution CCDs have measured the on-axis half-power diameter (HPD) for each MMA to be ∼ 30−40″ (Pavlinsky et al. 2021b), a value that is further degraded when convolved with the larger pixel size of the flight URDs.
The integrated background count rate in the 4 − 12 keV band for these detectors is estimated as ∼ 1.6 × 10 −4 cts s −1 pix −1 . This average value across all detector pixels was determined using pointed observations of a blank region of the sky (Pavlinsky et al. 2021a). Pixel and telescope-dependent structure in the background is clearly observed, with individual pixel values varying by as much as a factor of ∼ 2 around this nominal value. We therefore expect ∼ 0.2 cts s −1 across all of the active detector pixels for each telescope.
The data we are considering in this work are from ART-XC's all-sky survey, which covers the entire sky every six months. The telescope's motion during the survey corresponds to 1.5 degrees per minute, or 90 arcseconds (two detector pixels) per second. From any particular source on the sky, we should therefore anticipate the ART-XC data having only a few counts during a pass, dispersed over many detector pixels. A sizable fraction of the total counts in these detector pixels will originate from the background. These values should make it clear that a simple subtraction of the background in either sky or detector coordinates will be severely limited in this regime, and the background is best accounted for on an event-by-event basis.

Tests Using Simulated ART-XC Data
We will first test the quality of our background subtraction using simulated ART-XC event lists where the positions and fluxes of sources as well as the input background spectrum are known a priori. The ART-XC data is simulated using SIXTE (Dauser et al. 2019), which uses the ART-XC telescope specifications described in Pavlinsky et al. (2018b). The simulation also makes use of the latest ART-XC calibration files, including the vignetting functions, PSF, and the auxiliary response function⁵. The simulation makes use of the real ART-XC attitude pointings near the ≈ 200 deg² North Ecliptic Pole (hereafter NEP) region from the first 6-month epoch of the ART-XC all sky survey, modified only to remove the gaps in coverage times inherent to the real ART-XC observations. Images for all of these simulations are created by calculating a gnomonic projection of the celestial coordinates (right ascension α and declination δ) centered at the J2000 position (α, δ) = (266.53846°, 66°) with an image pixel size of 15″. Simulations were only run for one of the seven telescopes.
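The gnomonic (tangent-plane) projection used for these images can be written down directly from the standard spherical trigonometry; the function below is our own sketch, not the SIXTE or ART-XC pipeline implementation:

```python
import numpy as np

def gnomonic(ra, dec, ra0, dec0):
    """Gnomonic projection of (ra, dec) about the tangent point (ra0, dec0),
    all in degrees; returns standard coordinates (xi, eta) in degrees."""
    ra, dec, ra0, dec0 = map(np.radians, (ra, dec, ra0, dec0))
    cos_c = (np.sin(dec0) * np.sin(dec)
             + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0))
    xi = np.cos(dec) * np.sin(ra - ra0) / cos_c
    eta = (np.cos(dec0) * np.sin(dec)
           - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0)) / cos_c
    return np.degrees(xi), np.degrees(eta)

# Image pixel index for a 15-arcsec grid centered on the NEP field center
xi, eta = gnomonic(266.6, 66.01, 266.53846, 66.0)
pix = 15.0 / 3600.0                     # 15 arcsec in degrees
i, j = int(round(xi / pix)), int(round(eta / pix))
```

The tangent point maps to (0, 0) by construction, and small offsets map nearly linearly to standard coordinates.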
We choose to simulate an extended source whose morphology is based on the Chandra image of the galaxy cluster Abell 2146. The purpose of this set of simulations is to examine the performance of the background subtraction algorithm rather than to mimic a specific X-ray source. We assume the cluster spectrum is a simple isothermal model with Galactic absorption (an absorbed thermal plasma model in XSPEC). The Galactic absorption column density is fixed to 1 × 10²¹ cm⁻², and the cluster temperature is fixed at kT = 6 keV. The metallicity is fixed to Z = 0.3 Z⊙ and the redshift to z = 0.1. The normalization of this thermal model was adjusted such that our simulated cluster has a 4 − 30 keV flux of 1 × 10⁻¹¹ erg cm⁻² s⁻¹.
We have run simulations of this cluster both with and without an injected background spectrum. The background spectrum, shown in Figure 6, was derived from ≈ 622 ks of real ART-XC observations of sky regions without known X-ray sources. A third simulation with this same input background spectrum but no cluster emission was also run. A total of 358 and 159 events were detected within a 3′ radius of the cluster center for the cluster+background and background-only simulations, respectively. From these values, we should expect any source flux measurement that accurately accounts for the background to contain approximately ∼ 200 net counts. A larger 27′ radius region located far away from the cluster had a total of 831 and 794 counts in the cluster+background and background-only simulations, suggesting the two event lists have background count rates that agree at the ∼ 5% level.
For the cluster+background simulation, we apply our background subtraction algorithm to the output event list in 5 × 5 pixel square regions of the sky (1.25′ × 1.25′). Because of the continuous motion of the telescope and its frequent passes through the NEP during the survey (∼ 6 times per day), the exposure map (and consequently the background counts map) is extremely non-uniform in sky coordinates. Our expectation value for the background counts in each of these 5 × 5 pixel regions is determined using the identical region of the sky in the background-only simulation. A single value of S★ was determined for each of these regions, which is used to assign an equal source probability P to every event contained therein. We used those values of P to construct new images of the cluster: one using the random removal of background method and the other projecting fractional photons back onto the sky.
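The two image variants can be sketched as follows; this is a minimal illustration assuming the per-event source probabilities P have already been computed from the regional estimates, and all names are our own:

```python
import random

def build_images(events, p_src, shape, seed=0):
    """Build the two background-suppressed images described in the text.
    `events` is a list of (ix, iy) sky-pixel indices and `p_src[i]` is
    the (assumed precomputed) probability that event i is a sky photon.
    Returns (random_removal_image, fractional_photon_image)."""
    rng = random.Random(seed)
    nx, ny = shape
    img_rand = [[0.0] * ny for _ in range(nx)]  # random-removal image
    img_frac = [[0.0] * ny for _ in range(nx)]  # fractional-photon image
    for (ix, iy), p in zip(events, p_src):
        img_frac[ix][iy] += p        # every event contributes weight p
        if rng.random() < p:         # keep the event with probability p
            img_rand[ix][iy] += 1.0
    return img_rand, img_frac
```

Both variants suppress the same expected background on average; the fractional image does so deterministically, while the random-removal image retains integer counts per pixel.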
Images of the cluster with and without our probabilistic subtraction calculation applied are shown in Figure 7. For comparative purposes, we have also produced an image of the cluster with the background image simply subtracted. Using simple aperture photometry within this 3′ radius, the random removal and fractional photon images have signal-to-noise ratios of ∼ 10.78 and ∼ 11.1, respectively, as opposed to ∼ 8.8 without any background subtraction. For simple subtraction, the nominal signal-to-noise ratio is ∼ 14.9, but this value is artificially inflated by the presence of unphysical negative pixels biasing the residual background low. Simply clipping all of the negative image pixel values to 0 results in a signal-to-noise ratio nearly identical to the case without any background subtraction.
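The aperture photometry above can be reproduced schematically under one common signal-to-noise convention (an assumption on our part; the paper's exact definition may differ): net counts divided by the Poisson noise of all counts in the aperture.

```python
import math

def aperture_snr(total_counts, expected_bkg_counts):
    """Net counts divided by the Poisson noise of everything in the
    aperture; one common convention, assumed here for illustration."""
    net = total_counts - expected_bkg_counts
    return net / math.sqrt(total_counts)

# Counts within the 3-arcmin cluster aperture from the simulations above:
print(aperture_snr(358, 159))  # roughly 10.5
```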
Figure 7. Left: The image made from the simulated cluster + background event list; no background subtraction was applied to this image. Middle: The "clean" image, made by subtracting the background image derived from a simulation without cluster emission from the image derived from the cluster + background simulation. Middle-right: The image made from the simulated cluster + background event list with likely background events removed randomly from the image. Far-right: The image made from the simulated cluster + background event list with all events down-weighted by the value of P. All three images with background treatment used identical background data.

As a final test of the robustness and accuracy of our probabilistic subtraction algorithm, we show the resultant surface brightness profiles
for the five images in Figure 8. From these surface brightness profiles it is clear that both the random removal and fractional photon images do not greatly affect the extended structure of the cluster on average and give positive definite values at all radii, unlike the simple subtraction. Furthermore, the absence of any over-subtraction in our new methods enables the surface brightness of the cluster to be resolved at larger radii than is possible with the simple subtraction curve. Comparing the random removal and fractional photon surface brightness profiles to the profile without any background subtraction at distances of 5′, we find that the average "background" surface brightness is reduced by a factor of ∼ 5.

Tests Using Real ART-XC Data
After demonstrating the viability of this background subtraction calculation with simulated data of a known source and background spectrum, we repeat the test using real ART-XC observations. The input event lists, one for each of the seven telescopes, comprise six months of integrated exposure throughout a 6.92° × 3° rectangle centered at a position (α, δ) = (273.46154°, 66.0°) near the NEP. Final images are produced by summing the counts across all seven telescopes and dividing by the summed vignetting-corrected exposure map. The estimated background is determined using the results of the "blank-sky" observations taken by ART-XC (Pavlinsky et al. 2021a), renormalized to match the 30–70 keV count rate observed in each of the constituent daily event list files that, when combined, produce the integrated event list. The mirrors have zero effective area at these energies, meaning that all events observed at these energies are associated with the background. Similar to what was done for the simulated data above, we project the expected detector background counts onto the corresponding sky positions using the instantaneous pointing of the telescope at every second during this six-month exposure. Our probabilistic background subtraction is subsequently performed using the background maps and observed data in 1.25′ × 1.25′ square regions on the sky.

Figure 8. Surface brightness profiles for the five images. Uncertainties have been included on the cluster-only surface brightness profile. The markers for each of these curves have been randomly offset from each other by up to ±3 exclusively for presentation purposes. Both the random removal and fractional photon methods enable the cluster emission to be resolved out to larger distances than images with or without simple background subtraction.
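The blank-sky renormalization step described above can be sketched as follows; this is a simplified illustration with hypothetical names, whereas the real pipeline works per daily event list and per telescope:

```python
def rescale_blank_sky(blank_map, blank_rate_30_70, observed_rate_30_70):
    """Rescale a blank-sky background map by the ratio of 30-70 keV count
    rates.  At these energies the mirrors have zero effective area, so
    every such event traces the instrumental background level."""
    scale = observed_rate_30_70 / blank_rate_30_70
    return [[pix * scale for pix in row] for row in blank_map]
```

Each daily event list receives its own scale factor before the expected background counts are projected onto the sky using the attitude history.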
We demonstrate the effectiveness of this algorithm in two ways in Figure 9. The first is to show the distribution of pixel values across the full tile after applying the various methods of subtracting the background. The second is a zoomed-in image surrounding a real source detected in the ART-XC first-year survey catalog: the star HD 170527. We perform an aperture photometry exercise similar to that performed for the simulated cluster emission, after renormalizing the exposure-corrected image (units of cts s^−1) by multiplying the data near the star by the average value of the exposure map (units of s) in that same region. With no background subtracted or removed, the signal-to-noise ratio for this source is 5.15. The signal-to-noise ratios for simple subtraction, random removal, and fractional photons are 5.06, 5.51, and 5.06, respectively. The gains in signal-to-noise for this case are not as significant as for an extended source, but there is nevertheless no evidence that our new methods of suppressing the background adversely affect the image statistics. For fainter sources at lower signal-to-noise, even these modest improvements in significance can greatly improve the robustness and reliability of any candidate sources.

DISCUSSION
The method of accounting for background events presented here, while originally developed and designed to produce exposure-corrected images from the ART-XC survey data, can be applied to data from any "photon counting" observatory. The particulars of the implementation for each mission will vary only in how the background event rate is estimated in a detector region, what that rate is, and how the telescope converts between detector and sky coordinate systems.
The treatment of background events described in this work has several properties that compare favorably to a simpler background subtraction. First and foremost, the estimated sky photon count rates are ensured to be positive definite even when the expected number of background events in a pixel exceeds the total number of observed counts. This positive definite behavior is both more physically sensible and aesthetically pleasing when an output image is created. In particular, no artificial cuts or restrictions on the image data are needed to address negative values in the image pixels. Importantly, tests with simulated data show that this weighted average provides an unbiased estimate of the true net counts and converges to the intuitive value of zero when the expected background counts become much larger than the total number of counts observed.
The mathematical framework described here may also have applications to other sources of "background" beyond the non-photon background for which this algorithm was specifically developed. While we have discussed this algorithm primarily in terms of separating background events originating from charged particles and electronic noise from sky photons in ART-XC data, any background in an event list that can be estimated independently can, at least in principle, be accounted for using these same calculations. As one example, the background term B could instead correspond to the quiescent flux of a flaring source, with S corresponding to the flux of flares over a specific time period. In such a case, the flux of the flare can be robustly estimated while accounting for the expected Poisson fluctuations in the quiescent count rate in each individual time bin. We caution that, for any application of this algorithm, the interpretation must account for the inherently asymmetric and non-Gaussian distributions of derived parameter values.

Figure 9. Imaging results derived from ART-XC observations of a 6.92° × 3° tile region near the North Ecliptic Pole. The images from each of the seven telescopes are individually corrected for background and then divided by the sum of the seven corrected exposure maps. Top: The distributions of pixel values across the entire tile using simple background subtraction (in blue) and the random removal method described in this work (in red). The dashed vertical line denotes zero counts. Bottom: Zoomed-in images around one of the sources in the ART-XC first-year source catalog, the star HD 170527. The scaling and color bars are identical for all four images, and the white color denotes pixels with negative values. The blue circle denotes a 2′ circle around the optical position of this star.
From left to right, the four sub-panels show: the image of the source plus background; the simply subtracted image using the re-scaled blank-sky data projected onto the sky using the spacecraft attitude history for this region; our new method of random removal of likely background events; and the image created by down-weighting each event by its value of P. The random removal and fractional photon methods use the exact same background sky map as the simple subtraction method.
One caveat for this method is that it does not explicitly account for uncertainties or outright errors in the expected background counts per pixel, B. Because these calculations are designed to operate in the regime where the total counts per pixel are subject to large statistical fluctuations, it is plausible that the uncertainties in the expected background counts per pixel are small relative to the other uncertainties in the calculation. We nevertheless encourage any users who create images using these formulae to test the results using a realistic distribution of background count estimates in order to quantify their impact on the final images.
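The suggested test can be sketched as a simple Monte Carlo: perturb the expected background B by an assumed Gaussian uncertainty and examine the scatter in the resulting net-counts estimate. The sketch below uses plain N − B for concreteness; the same loop applies to any estimator.

```python
import random
import statistics

def background_sensitivity(estimator, N, B, sigma_B, ntrial=2000, seed=1):
    """Propagate an assumed Gaussian uncertainty sigma_B on the expected
    background B into any net-counts estimator(N, B).  Returns the mean
    and scatter of the estimator over the trials."""
    rng = random.Random(seed)
    vals = [estimator(N, max(rng.gauss(B, sigma_B), 0.0))
            for _ in range(ntrial)]
    return statistics.mean(vals), statistics.stdev(vals)

# Example with simple subtraction: N = 12 counts, B = 9 +/- 1 expected.
mean, scatter = background_sensitivity(lambda n, b: n - b, 12, 9.0, 1.0)
```

If the resulting scatter is small compared with the Poisson fluctuations of the counts themselves, the background uncertainty can safely be neglected.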
While the resultant images may be useful for identifying candidate sources that may not be apparent without background subtraction, this algorithm is not explicitly designed for such a purpose. The calculation presented here does not utilize crucial telescope-specific information, such as the point spread function (PSF), that is essential to properly quantifying the significance of a detection. In that sense, this algorithm is better suited for imaging extended, diffuse sources much larger than the PSF. Our tests using ART-XC data of a simulated galaxy cluster and a real star confirm that the signal-to-noise gains from our improved background treatment are more significant for extended emission than for point sources. Any candidate source detected in any image (including those produced by the new calculations discussed in this work) must be subject to rigorous follow-up analysis that calculates its overall significance above the expected background counts within a region. We emphasize that most rigorous calculations of source significance, including those utilizing ART-XC data (e.g., Pavlinsky et al. 2021a), do not remove the background events at all but rather model the background simultaneously with any candidate source counts.
We caution that the practical details of applying this calculation to an X-ray or gamma ray telescope's data will depend strongly on the details of the telescope, even though the underlying statistical principles do not. Any user hoping to take advantage of this new calculation will need to ensure they are properly estimating and modeling the background structure (which may include its dependence on time, energy, detector coordinates, etc.) to sufficient accuracy. The choice of estimating the net counts in discrete regions of the sky in our ART-XC examples is largely driven by the rich structure in the exposure map, which arises from the continuous motion of the spacecraft throughout the survey tile, and by the fact that each pass through the tile may have a different NXB count rate. This method of applying our new background treatment algorithms may apply to other telescopes "out of the box", but other means of binning the data may be required. Whatever method is ultimately developed for a specific mission, it must account for all of the unique aspects of the respective detector hardware and operations.
Recent work (Wilkins et al. 2020; Grant et al. 2020) using simulations of the upcoming ATHENA mission has shown that machine learning methods may be able to independently estimate the probability that an individual event is associated with the background. The methods presented here can be considered complementary to those, in the sense that our framework can be used in situations where such high-fidelity simulations of X-ray photon interactions with telescope hardware are not available or feasible. This statistical method could also be used to help support or refute the event-specific probabilities determined by the machine learning algorithms. The methods for producing images discussed in §6, derived from the event-specific probabilities, can be readily utilized in the same manner regardless of how that probability was determined.

APPENDIX A: CONVERGENCE OF S★ TO SIMPLE SUBTRACTION IN THE HIGH SIGNAL-TO-NOISE LIMIT
We will now demonstrate that our new estimator S★ is identical to the simple subtraction calculation (N − B) in the limit where N ≫ B. The summary statistic described in Equation 13 can be expressed exactly and analytically in terms of upper incomplete gamma functions. Here Γ(x) is the gamma function, the real-valued extension of the factorial function, with Γ(n + 1) = n! when n is an integer. When N ≫ B we can approximate the upper incomplete gamma functions by their respective complete gamma functions (in other words, Γ(a, B) ≈ Γ(a) when a ≫ B). Distributing the leading exponential factor over the two terms in the square brackets and canceling the gamma functions in the numerators and denominators in this approximation limit, the first term converges to zero quickly enough to be negligible, since Γ(N + 2) = (N + 1)!. The only terms that survive in this regime lead to the desired result of S★ ≈ N − B.
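This convergence can be illustrated numerically with the flat-prior posterior mean of a Poisson source rate with known background, a standard construction that shares the limiting behavior described above (it is not the paper's exact Equation 13):

```python
import math

def s_star_flat_prior(N, B):
    """Posterior mean of the source counts S given N observed counts and
    expected background B, for N ~ Poisson(S + B) with a flat prior on
    S >= 0.  Closed form: S* = Gamma(N + 2, B) / Gamma(N + 1, B) - B,
    where Gamma(a, x) is the upper incomplete gamma function.  For
    integer N this reduces to a ratio of cumulative Poisson sums."""
    term, cdf_n = math.exp(-B), 0.0
    for k in range(N + 1):
        cdf_n += term                 # running sum of e^-B B^k / k!
        term *= B / (k + 1)
    cdf_np1 = cdf_n + term            # adds the k = N + 1 term
    return (N + 1) * cdf_np1 / cdf_n - B

print(s_star_flat_prior(1000, 10.0))  # -> 991.0, close to N - B = 990
print(s_star_flat_prior(2, 10.0))     # positive (~1.2) even though B > N
```

The second call shows the same positivity property discussed in the main text: even when the expected background exceeds the observed counts, the estimate remains above zero.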

APPENDIX B: BIAS OF SIMPLE SUBTRACTION
We wish to show that the simple subtraction discussed in this paper (N − B) is an unbiased estimator of S. Using the operator E(X) to denote the expectation value of a variable X, this is equivalent to showing that E(N − B) = S. We note that since N is a random variable that follows a Poisson distribution given by the sum of our sky and background terms, its expectation value is E(N) = S + B. Since the expectation value operation is linear and the expectation value of a distribution parameter is the parameter itself, we can then rewrite our expectation value as

E(N − B) = E(N) − E(B) = S + B − B = S,

thereby proving that the simple subtraction calculation is unbiased.

This paper has been typeset from a TeX/LaTeX file prepared by the author.