R. J. Lyon, B. W. Stappers, S. Cooper, J. M. Brooke, J. D. Knowles, Fifty years of pulsar candidate selection: from simple filters to a new principled real-time classification approach, Monthly Notices of the Royal Astronomical Society, Volume 459, Issue 1, 11 June 2016, Pages 1104–1123, https://doi.org/10.1093/mnras/stw656
Abstract
Improving survey specifications are causing an exponential rise in pulsar candidate numbers and data volumes. We study the candidate filters used to mitigate these problems during the past 50 years. We find that some existing methods, such as applying constraints on the total number of candidates collected per observation, may have detrimental effects on the success of pulsar searches. Those methods immune to such effects are found to be ill-equipped to deal with the problems associated with increasing data volumes and candidate numbers, motivating the development of new approaches. We therefore present a new method designed for online operation. It selects promising candidates using a purpose-built tree-based machine learning classifier, the Gaussian Hellinger Very Fast Decision Tree, and a new set of features for describing candidates. The features have been chosen so as to (i) maximize the separation between candidates arising from noise and those of probable astrophysical origin, and (ii) be as survey-independent as possible. Using these features, our new approach can process millions of candidates in seconds (∼1 million every 15 s), with high levels of pulsar recall (90 per cent+). This technique is therefore applicable to the large volumes of data expected to be produced by the Square Kilometre Array. Use of this approach has assisted in the discovery of 20 new pulsars in data obtained during the Low-Frequency Array Tied-Array All-Sky Survey.
1 INTRODUCTION
The search techniques used to isolate the radio emission of pulsars are designed to find periodic broad-band signals exhibiting signs of dispersion caused by travel through the interstellar medium (ISM). Signals meeting these criteria are recorded as a collection of diagnostic plots and summary statistics, in preparation for analysis. Together these plots and statistics are referred to as a pulsar ‘candidate’, a possible detection of a new pulsar. Each candidate must be inspected, either by an automated method or by a human expert, to determine its authenticity. Those of likely pulsar origin are highlighted for further analysis, and possibly allocated telescope time for confirmation observations. The remainder are typically ignored. The process of deciding which candidates are worth investigating has become known as candidate ‘selection’. It is an important step in the search for pulsars, since it allows telescope time to be prioritized for those detections likely to yield a discovery. Until recently (the early 2000s), candidate selection was a predominantly manual task. However, advances in telescope receiver design, and in the capabilities of supporting computational infrastructure, have significantly increased the number of candidates produced by modern pulsar surveys (Stovall, Lorimer & Lynch 2013). Manual approaches therefore became impractical, giving rise to what has become known as the ‘candidate selection problem’. In response, numerous graphical and automated selection methods were developed (Johnston et al. 1992; Edwards et al. 2001; Manchester et al. 2001; Keith et al. 2009; Navarro, Anderson & Freire 2003), designed to filter candidates in bulk. The filtering procedures used ranged in complexity from a simple signal-to-noise ratio (S/N) cut through to more complex functions (Lee et al. 2013). In either case, automated approaches enabled large numbers of candidates to be selected quickly and reproducibly.
Despite these advances, the increasing number of candidates produced by contemporary pulsar surveys tends to necessitate a pass of manual selection over the candidates selected by software. Many have therefore turned to machine learning (ML) methods to build ‘intelligent’ filters (Eatough et al. 2010; Bates et al. 2012; Morello et al. 2014; Zhu et al. 2014), capable of reducing the dependence on human input. This has achieved some success. However, these methods are often developed for a specific pulsar survey search pipeline, making them unsuitable for use with other surveys without modification. As a consequence, new selection mechanisms are often designed and implemented per survey. As more methods continue to emerge, it becomes increasingly unclear which of them best address the candidate selection problem, and under what circumstances. It is also unclear which are best equipped to cope with the trend for increasing candidate numbers, the overwhelming majority of which arise from noise. Existing approaches are not explicitly designed to mitigate noise; rather, they are designed to isolate periodic detections. This does not achieve the same effect as explicitly mitigating noise. For example, isolating periodic candidates as potential pulsars does not necessarily mitigate the impact of periodic noise. Thus, these techniques may become less effective over time, as noise becomes responsible for an increasing proportion of all candidates detected.
Existing ‘intelligent’ approaches are also ill-equipped to deal with the data processing paradigm shift soon to be brought about by next-generation radio telescopes. These instruments will produce more data than can be stored; survey data processing, including candidate selection, will therefore have to be done online in real time (or close to it). In the real-time scenario, it is prohibitively expensive to retain all data collected (see Section 4.3.1). It therefore becomes important to identify, and prioritize for storage, data potentially containing discoveries. Otherwise such data could be discarded and discoveries missed. Thus, new techniques are required (Keane et al. 2014) to ensure preparedness for this processing challenge.
In this paper we describe a new candidate selection approach, designed for online operation, that mitigates the impact of increasing candidate numbers arising from noise. We develop our arguments for producing such a technique in progressive stages. In Section 2, we describe the candidate generation process. We show that improvements in pulsar survey technical specifications have led to increased candidate output, and infer a trend for exponential growth in candidate numbers, which we show to be dominated by noise. We also demonstrate why restricting candidate output based on simple S/N cuts runs the risk of omitting legitimate pulsar signals. The trend in candidate numbers and the ineffectiveness of S/N filters allow us to identify what we describe as a ‘crisis’ in candidate selection. In Section 3, we review the different candidate selection mechanisms employed during the past 50 years, looking for potential solutions to the issues raised in Section 2. In Section 4, we discuss the methods identified in this review. We identify how all will be challenged by the transition to online processing required by telescopes such as the Square Kilometre Array (SKA), motivating the development of new approaches. In addition, we critique the existing features used to describe pulsar candidates, which are fed as inputs to the ML methods employed by many to automate the selection process. In Section 5, we present our own set of eight candidate features, which overcome some of these deficiencies. Derived from statistical considerations and information theory, these features were chosen to maximize the separation between candidates arising from noise and those of non-noise origin. In Section 6, we describe our new data stream classification algorithm for online candidate selection, which uses these features. Section 6 also presents classification results that demonstrate the utility of the new approach and its high level of pulsar recall.
Finally, in Section 7 we summarize the paper, and comment on how the use of our method has helped to find 20 new pulsars during the Low-Frequency Array (LOFAR) Tied-Array All-Sky Survey (LOTAAS), though discovery details will be published elsewhere.
2 CANDIDATE GENERATION
Since the adoption of the fast Fourier transform (FFT) (Burns & Clark 1969; Taylor, Dura & Huguenin 1969; Hulse & Taylor 1974), the general pulsar search procedure has remained relatively unchanged. Signals focused at the receiver of a radio telescope observing at a central frequency fc (MHz), with bandwidth B (MHz), are sampled and recorded at a pre-determined rate, at intervals of tsamp (μs) chosen to maximize sensitivity to the class of signals being searched for. The data are subsequently split into nchans frequency channels, each of width Δv (kHz). An individual channel contains stot samples of the signal, taken at intervals of tsamp over an observation of length tobs seconds, such that |$s_{\rm tot} = \frac{t_{\rm obs}}{t_{\rm samp}}$|. Each unique observation is therefore representable as an nchans × stot matrix |$\boldsymbol M$|.
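As a minimal sketch of this bookkeeping, the snippet below computes stot and the shape of |$\boldsymbol M$| from the Table 2 specifications of two surveys (PMPS and the Swinburne Intermediate Latitude survey). The `SurveySpec` class and its field names are hypothetical, introduced only for illustration; the one step the formula leaves implicit is converting tsamp from μs to s before dividing.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SurveySpec:
    """Hypothetical container for a few Table 2 columns (not from any pipeline)."""
    name: str
    n_chans: int
    t_samp_us: float  # sampling interval t_samp, in microseconds
    t_obs_s: float    # observation length t_obs, in seconds

    @property
    def s_tot(self) -> int:
        """Samples per channel, s_tot = t_obs / t_samp (t_samp converted to seconds)."""
        return round(self.t_obs_s / (self.t_samp_us * 1e-6))

    @property
    def matrix_shape(self) -> Tuple[int, int]:
        """Shape of the n_chans x s_tot observation matrix M."""
        return (self.n_chans, self.s_tot)


# Values copied from the PMPS and Swinburne Int. Lat. rows of Table 2.
pmps = SurveySpec("PMPS", n_chans=96, t_samp_us=250, t_obs_s=2100)
swin = SurveySpec("Swinburne Int. Lat.", n_chans=96, t_samp_us=125, t_obs_s=265)

print(pmps.matrix_shape)  # (96, 8400000)
print(swin.matrix_shape)  # (96, 2120000)
```

At one value per sample, a single PMPS observation therefore corresponds to roughly 8 × 10⁸ entries in |$\boldsymbol M$| before any processing begins.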
A pulsar search involves a number of procedural steps applied to the data in |$\boldsymbol M$|. The principal steps are similar for all searches; however, the order in which they are undertaken can vary, as can their precise implementation. In general, the first step involves radio frequency interference (RFI) excision, via the removal of channels (rows of the matrix) corresponding to known interference frequencies (Keith et al. 2010). Subsequently, ‘clipping’ (Hogden et al. 2012) may be applied to the data, which aims to reduce the impact of strong interference. This is achieved by setting to zero (or to the local mean) those samples in a given column of |$\boldsymbol M$| whose intensities exceed some pre-determined threshold (e.g. an intensity 2σ above the mean). Once these initial steps are complete, processing enters a computationally expensive phase known as de-dispersion.
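The two cleaning steps above can be sketched with NumPy as follows. This is an illustrative sketch, not the procedure of Keith et al. (2010) or Hogden et al. (2012): the function names, the choice of channel 0 as a "known" RFI channel and the toy data are all hypothetical. Clipping is applied per column (i.e. per time sample across channels) with a threshold of mean + 2σ, as in the example threshold in the text.

```python
import numpy as np


def excise_channels(data, bad_channels):
    """Remove rows of M (frequency channels) at known interference frequencies."""
    return np.delete(data, bad_channels, axis=0)


def clip_columns(data, n_sigma=2.0, fill="mean"):
    """Clip samples more than n_sigma above the mean of their column (time sample).

    fill="mean" replaces clipped samples with the column mean; any other value
    sets them to zero. Note that the outlier itself inflates the column mean and
    standard deviation used here.
    """
    mean = data.mean(axis=0, keepdims=True)
    std = data.std(axis=0, keepdims=True)
    outliers = data > mean + n_sigma * std
    replacement = mean if fill == "mean" else 0.0
    return np.where(outliers, replacement, data)


# Hypothetical 8-channel x 5-sample block containing one bright, narrow RFI spike.
M = np.ones((8, 5))
M[3, 2] = 100.0
M = excise_channels(M, [0])          # assume channel 0 is a known RFI frequency
cleaned = clip_columns(M, n_sigma=2.0)
print(cleaned.shape, cleaned.max() < 100.0)
```

Because the mean and σ here include the spike, a single outlier can only be detected if the column has enough channels; a robust statistic such as the median is a common alternative, but that is a design choice outside the text above.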
Dispersion by free electrons in the ISM causes a frequency-dependent delay in radio emission as it propagates. This delay temporally smears legitimate pulsar emission (Lorimer & Kramer 2006), reducing the S/N of the pulses. The amount of dispersive smearing a signal receives is proportional to a quantity called the dispersion measure (DM; Lorimer & Kramer 2006), which represents the free electron column density between an observer and a pulsar, integrated along the line of sight. The degree to which the signal of an unknown pulsar is dispersed cannot be known a priori (e.g. Keith et al. 2010; Levin 2012), so several dispersion measure tests, or ‘DM trials’, must be conducted to determine this value. Correcting for dispersion at the appropriate DM mitigates the dispersive smearing, thereby increasing the S/N of a signal (Lorimer & Kramer 2006). For a single trial, each frequency channel (row in |$\boldsymbol M$|) is shifted by an appropriate delay before each time bin is integrated in frequency. This produces one de-dispersed time series for each DM trial value.
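The per-trial shift-and-sum described above can be sketched as follows. This is a simplified, assumed implementation of incoherent de-dispersion using the standard cold-plasma delay Δt = k_DM × DM × (ν⁻² − ν_ref⁻²), with k_DM ≈ 4.149 × 10³ MHz² pc⁻¹ cm³ s and ν_ref taken as the highest channel frequency. The function names, the toy observing set-up, the use of each series' peak as a crude stand-in for S/N and the circular shift via `np.roll` are choices made for this illustration only.

```python
import numpy as np

# Cold-plasma dispersion constant in MHz^2 pc^-1 cm^3 s (~4.15 ms per unit DM at 1 GHz).
K_DM = 4.148808e3


def dispersion_delay(dm, freqs_mhz, ref_freq_mhz):
    """Arrival delay (s) at freqs_mhz relative to ref_freq_mhz, for DM in pc cm^-3."""
    return K_DM * dm * (np.asarray(freqs_mhz) ** -2.0 - ref_freq_mhz ** -2.0)


def channel_shifts(dm, freqs_mhz, t_samp_s):
    """Integer sample shift of each channel relative to the highest frequency."""
    freqs = np.asarray(freqs_mhz, dtype=float)
    delays = dispersion_delay(dm, freqs, freqs.max())
    return np.round(delays / t_samp_s).astype(int)


def dedisperse(data, freqs_mhz, t_samp_s, dm):
    """One DM trial: shift each row of M by its delay, then sum over frequency."""
    shifts = channel_shifts(dm, freqs_mhz, t_samp_s)
    series = np.zeros(data.shape[1])
    for row, shift in zip(data, shifts):
        # np.roll wraps samples around the end of the series (a simplification).
        series += np.roll(row, -shift)
    return series


def best_dm(data, freqs_mhz, t_samp_s, dm_trials):
    """Return the trial DM whose de-dispersed series has the highest peak."""
    return max(dm_trials, key=lambda dm: dedisperse(data, freqs_mhz, t_samp_s, dm).max())


# Hypothetical set-up: 16 channels between 1200 and 1500 MHz, 1 ms sampling.
freqs = np.linspace(1200.0, 1500.0, 16)
t_samp = 1e-3
data = np.zeros((16, 2000))
for i, s in enumerate(channel_shifts(100.0, freqs, t_samp)):
    data[i, 100 + s] = 1.0  # inject one pulse dispersed at DM = 100 pc cm^-3

print(best_dm(data, freqs, t_samp, [0.0, 50.0, 100.0, 150.0]))  # 100.0
```

Only the trial matching the injected DM realigns all 16 channels, so its series peaks at 16 while the other trials leave the pulse spread across many samples.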
Periodic signals in de-dispersed time series data can be found using Fourier analysis. This is known as a periodicity search (Lorimer & Kramer 2006). In a periodicity search, the first step after performing the FFT usually involves filtering the data to remove strong spectral features known as ‘birdies’ (Manchester et al. 2001; Hessels et al. 2007). These may be caused by periodic or quasi-periodic interference. Harmonic summing techniques are subsequently applied, which add the amplitudes of harmonically related frequencies to their corresponding fundamentals. This step is necessary because, in the Fourier domain, the power from a narrow pulse is distributed between its fundamental frequency and its harmonics (Lorimer & Kramer 2006). Thus, for weaker pulsars the fundamental may not rise above the detection threshold, but the harmonic sum generally will. Periodic detections with large Fourier amplitudes after summing (above the noise background or a threshold level) are then considered to be ‘suspect’ periods.
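A toy version of harmonic summing is sketched below. It assumes an incoherent sum of spectral amplitudes at bins k, 2k, up to nk for every trial bin k, with n = 4 harmonics chosen arbitrarily, applied to a noise-free, hypothetical train of narrow Gaussian pulses. Real pipelines also normalize the spectrum, remove birdies and typically search several harmonic counts; this sketch omits those steps.

```python
import numpy as np


def harmonic_sum(amplitudes, n_harmonics):
    """Sum spectral amplitudes at bins k, 2k, ..., n_harmonics*k for each trial bin k.

    Bin 0 (the DC term) is left at zero so that it cannot dominate the search.
    """
    n = len(amplitudes)
    summed = np.zeros_like(amplitudes)
    for k in range(1, n):
        harmonics = np.arange(1, n_harmonics + 1) * k
        harmonics = harmonics[harmonics < n]  # drop harmonics beyond the Nyquist bin
        summed[k] = amplitudes[harmonics].sum()
    return summed


# Hypothetical time series: a narrow Gaussian pulse repeating every 64 samples.
n_samples, period, width = 4096, 64, 2.0
phase = np.arange(n_samples) % period
series = np.exp(-0.5 * ((phase - period / 2) / width) ** 2)

amplitudes = np.abs(np.fft.rfft(series))
summed = harmonic_sum(amplitudes, n_harmonics=4)
fundamental_bin = int(np.argmax(summed))
print(fundamental_bin, n_samples / fundamental_bin)  # 64 64.0
```

The narrow pulse spreads its power over bins 64, 128, 192 and beyond; summing those bins back onto bin 64 makes the fundamental stand out more strongly than any single harmonic would.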
A further process known as sifting (e.g. Stovall et al. 2013) is then applied to the collected suspects; it removes duplicate detections of the same signal at slightly different DMs, along with their related harmonics. A large number of suspects survive the sifting process. Diagnostic plots and summary statistics are computed for each remaining suspect, forming candidates, which are stored for further analysis. The basic candidate consists of a small collection of characteristic variables. These include the S/N, DM, period, pulse width and the integrated pulse profile. The latter is an array of continuous variables that describes a longitude-resolved version of the signal, averaged in both time and frequency. More detailed candidates also contain data describing how the signal persists throughout the time and frequency domains (Eatough et al. 2010). This can be seen in plots (A) and (B) in Fig. 1. Here persistence in frequency (A) is represented by a two-dimensional matrix showing pulse profiles integrated in time, for a set of averaged frequency channels (i.e. not at full frequency resolution). Persistence through time (B) is represented by a two-dimensional matrix showing the pulse profile, integrated across similarly averaged frequency channels, as a function of time.
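The idea of sifting can be illustrated under simplifying assumptions: suspects are processed in descending S/N order, and a suspect is discarded if its period matches, or is close to an integer multiple or sub-multiple of, the period of a suspect already kept. The `Suspect` class, the relative tolerance of 10⁻³, and the decision to ignore DM and fractional harmonic ratios (e.g. 3/2) are hypothetical choices for this sketch, not the procedure of Stovall et al. (2013).

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    """Hypothetical minimal record of a periodicity-search detection."""
    period_s: float
    dm: float
    snr: float


def harmonically_related(p1, p2, tol=1e-3):
    """True if the longer period is (close to) an integer multiple of the shorter."""
    ratio = max(p1, p2) / min(p1, p2)
    n = round(ratio)
    return abs(ratio - n) < tol * n


def sift(suspects, tol=1e-3):
    """Keep only the highest-S/N suspect from each group of duplicates and harmonics."""
    kept = []
    for s in sorted(suspects, key=lambda s: s.snr, reverse=True):
        if not any(harmonically_related(s.period_s, k.period_s, tol) for k in kept):
            kept.append(s)
    return kept


suspects = [
    Suspect(0.5, 10.0, 20.0),     # strongest detection of a 0.5 s signal
    Suspect(0.5001, 11.0, 15.0),  # same signal at a slightly different DM
    Suspect(1.0, 10.0, 8.0),      # period twice that of the 0.5 s signal
    Suspect(0.25, 10.0, 7.0),     # its second harmonic
    Suspect(0.37, 50.0, 9.0),     # an unrelated signal
]
kept = sift(suspects)
print([s.period_s for s in kept])  # [0.5, 0.37]
```

Real sifting code usually also compares DM and tolerates fractional harmonic ratios, but the greedy keep-the-strongest structure is the essential idea.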

Figure 1. An annotated example candidate summarizing the detection of PSR J1706−6118. The candidate was obtained during processing of HTRU data by Thornton (2013).
2.1 Modelling candidate numbers
Candidate numbers are anecdotally understood to be increasing steadily over time. Here we provide historical evidence supporting this view, obtained by reviewing most of the large-scale pulsar surveys conducted since the initial pulsar discovery by Hewish et al. (1968). The surveys studied are listed in Tables 2 and 3. This information has also been made available via an interactive online resource found at www.jb.man.ac.uk/pulsar/surveys.html.
Candidate numbers reported in the literature are summarized in Table 1, providing empirical evidence for rising candidate numbers. The rise is understood to be the result of expanding survey technical specifications (Stovall et al. 2013) during the period covered by Tables 2 and 3. Finer frequency resolution, longer dwell times and acceleration searches (Eatough et al. 2013) have significantly increased the candidate yield (Lyon 2015). However, at present there is no accepted method for quantifying the effects of improving survey specifications on candidate numbers. It is therefore difficult to understand precisely how candidate numbers are changing, and what the S/N distribution of candidates should look like in practice. Such knowledge is needed if we are to design candidate selection approaches that are robust to error, and to plan survey storage requirements accurately. Although it is difficult to capture all the steps involved in pulsar data analysis, we describe here a model, linked to the number of DM trials undertaken per observation, that can be used as a proxy for estimating candidate numbers.
Survey | Year | Candidates | Candidates deg−2 |
---|---|---|---|
2nd Molonglo Survey (Manchester et al. 1978) | 1977 | 2500 | ∼0.1 |
Phase II survey (Stokes et al. 1986) | 1983 | 5405 | ∼1 |
Parkes 20 cm survey (Johnston et al. 1992) | 1988 | ∼150 000 | ∼188 |
Parkes Southern Pulsar Survey (Manchester et al. 1996) | 1991 | 40 000 | ∼2 |
Parkes Multibeam Pulsar Survey (Manchester et al. 2001) | 1997 | 8 000 000 | ∼5161 |
Swinburne Int. Lat. Survey (Edwards et al. 2001) | 1998 | >200 000 | ∼168a |
Arecibo P-Alfa all configurations (Cordes et al. 2006; Lazarus 2012; P-Alfa Consortium 2015) | 2004 | >5 000 000 | ∼16 361a |
6.5 GHz Multibeam Survey (Bates et al. 2011; Bates 2011) | 2006 | 3 500 000 | ∼77 778b |
GBNCC survey (Stovall et al. 2014) | 2009 | >1 200 000 | ∼89a |
Southern HTRU (Keith et al. 2010) | 2010 | 55 434 300 | ∼1705 |
Northern HTRU (Barr et al. 2013; Ng 2012) | 2010 | >80 000 000 | ∼2890a |
LOTAAS (Cooper, private communication) | 2013 | 39 000 000 | ∼2000 |
Technical specifications of pulsar surveys conducted between 1968 and 1999. Here Fc (MHz) is the central observing frequency, B (MHz) is the bandwidth, Δv (kHz) is the channel width (to 3 d.p.), nchans indicates the number of frequency channels, tsamp (μs) is the sampling interval (to 3 d.p.), tobs (s) is the length of the observation (to 1 d.p.) and DM trials is the number of dispersion measure trials. Values that could not be found in the literature are indicated with ‘?’. The omission of a survey should be treated as an oversight as opposed to a judgement on its significance.
Survey | Year | Fc (MHz) | B (MHz) | Δv (kHz) | nchans | tsamp (μs) | tobs (s) | DM trials |
---|---|---|---|---|---|---|---|---|
1st Molonglo Survey (Large et al. 1968) | 1968 | 408 | 4 | 2000 | 2 | 5000 | 15 | ? |
Search at low Galactic Lat. (Davies, Large & Pickwick 1970) | 1969 | 408 | 4 | ? | ? | 50 000 | 819 | ? |
Arecibo Survey 1 (Hulse & Taylor 1974) | 197? | 430 | 8 | 250 | 32 | 5600 | 198 | 64 |
Jodrell Survey A (Davies, Lyne & Seiradakis 1977) | 1972 | 408 | 4 | 2000 | 2 | 40 000 | 660 | ? |
2nd Molonglo Survey (Manchester et al. 1978) | 1977 | 408 | 4 | 800 | 4 | 20 000 | 44.7 | ? |
Green Bank Northern hemisphere Survey (Damashek et al. 1978, 1982) | 1977 | 400 | 16 | 2000 | 8 | 16 700 | 144 | 8 |
Princeton-NRAO Survey (Dewey et al. 1985) | 1982–83 | 390 | 16 | 2000 | 8 | 5556 | 138 | 8 |
Green Bank short-period (Stokes et al. 1985) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Jodrell Survey B (Clifton & Lyne 1986) | 1983–84 | 1400 | 40 | 5000 | 8 | 2000 | 540 | ? |
Arecibo survey 2 (a) – Phase II Princeton-NRAO (Stokes et al. 1986) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Arecibo survey 2 (b) (Stokes et al. 1986) | 1984–85 | 430 | 0.96 | 60 | 16 | 300 | 39 | ? |
Jodrell Survey C (Biggs & Lyne 1992) | 1985–87 | 610/925/928/1420 | 4/8/32 | 125/500/1000 | 32 | 300 | 79 | 39 |
Parkes Globular Cluster Survey (20 cm) (Manchester et al. 1990a,b)a | 1989–90 | 1491 | 80/320 | 1000/5000 | 80/64 | 300 | 3000 | 100 |
Parkes Globular Cluster Survey (50 cm) (Manchester et al. 1990a,b)a | 1988–90 | 640 | 32 | 250 | 128 | 300 | 3000/4500 | 100 |
Arecibo Survey 3 (Nice, Fruchter & Taylor 1995) | 198? | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | 256 |
Parkes 20-cm Survey (I) (Johnston et al. 1992) | 1988 | 1434 | 800 | 1000 | 80 | 300 | 78.6 | 100 |
Parkes 20-cm Survey (II) (Johnston et al. 1992) | 1988 | 1520 | 320 | 5000 | 64 | 1200 | 157.3 | 100 |
Arecibo 430 MHz Intermediate Galactic Latitude Survey (Navarro et al. 2003) | 1989–91 | 430 | 10 | 78.125 | 128 | 506.625 | 66.4 | 163 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H1) (Foster et al. 1995) | 1990 | 430 | 10 | 250 | 128 | 506 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H2) (Foster et al. 1995) | 1991 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H3) (Foster et al. 1995) | 1992 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H4) (Foster et al. 1995) | 1993 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H5) (Foster et al. 1995) | 1994–95 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
Arecibo Survey 4 Phase I (Nice, Taylor & Fruchter 1993) | 1991 | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | ? |
Arecibo Survey 4 Phase II (Camilo, Nice & Taylor 1993) | 1992 | 429 | 8 | 250 | 64 | 250 | 40 | 192 |
Parkes Southern (Manchester et al. 1996) | 1991–93 | 436 | 32 | 1250 | 256 | 300 | 157.3 | 738 |
Green Bank fast pulsar survey (Sayer, Nice & Taylor 1997) | 1994–96 | 370 | 40 | 78.125 | 512 | 256 | 134 | 512 |
PMPS (Manchester et al. 2001) | 1997 | 1374 | 288 | 3000 | 96 | 250 | 2100 | 325 |
Swinburne Int. Lat. survey (Edwards et al. 2001) | 1998–99 | 1374 | 288 | 3000 | 96 | 125 | 265 | 375 |
Note. a More than one configuration was used during the survey.
Technical specifications of pulsar surveys conducted between 1968–1999. Here Fc (MHz) is the central observing frequency, B (MHz) is the bandwidth, Δv (kHz) is the channel width (to 3.d.p), nchans indicates the number of frequency channels, tsamp(μs) is the sample frequency (to 3.d.p), and tobs(s) the length of the observation (to 1.d.p). Values that could not be found in the literature are indicated with ‘?’. The omission of a survey should be treated as an oversight as opposed to a judgement on its significance.
Survey . | Year . | Fc (MHz) . | B (MHz) . | Δv (kHz) . | nchans . | tsamp(μs) . | tobs(s) . | DM trials . |
---|---|---|---|---|---|---|---|---|
1st Molonglo Survey (Large et al. 1968) | 1968 | 408 | 4 | 2000 | 2 | 5000 | 15 | ? |
Search at low Galactic Lat. (Davies, Large & Pickwick 1970) | 1969 | 408 | 4 | ? | ? | 50 000 | 819 | ? |
Arecibo Survey 1 (Hulse & Taylor 1974) | 197? | 430 | 8 | 250 | 32 | 5600 | 198 | 64 |
Jodrell Survey A (Davies, Lyne & Seiradakis 1977) | 1972 | 408 | 4 | 2000 | 2 | 40 000 | 660 | ? |
2nd Molonglo Survey (Manchester et al. 1978) | 1977 | 408 | 4 | 800 | 4 | 20 000 | 44.7 | ? |
Green Bank Northern hemisphere Survey (Damashek et al. 1978, 1982) | 1977 | 400 | 16 | 2000 | 8 | 16 700 | 144 | 8 |
Princeton-NRAO Survey (Dewey et al. 1985) | 1982–83 | 390 | 16 | 2000 | 8 | 5556 | 138 | 8 |
Green Bank short-period (Stokes et al. 1985) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Jodrell Survey B (Clifton & Lyne 1986) | 1983–84 | 1400 | 40 | 5000 | 8 | 2000 | 540 | ? |
Arecibo survey 2 (a) – Phase II Princeton-NRAO (Stokes et al. 1986) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Arecibo survey 2 (b) (Stokes et al. 1986) | 1984–85 | 430 | 0.96 | 60 | 16 | 300 | 39 | ? |
Jodrell Survey C Biggs & Lyne (1992) | 1985–87 | 610/925/928/1420 | 4/8/32 | 125/500/1000 | 32 | 300 | 79 | 39 |
Parkes Globular Cluster Survey (20 cm) (Manchester et al. 1990a,b)a | 1989–90 | 1491 | 80/320 | 1000/5000 | 80/64 | 300 | 3000 | 100 |
Parkes Globular Cluster Survey (50 cm) (Manchester et al. 1990a,b)a | 1988–90 | 640 | 32 | 250 | 128 | 300 | 3000/4500 | 100 |
Arecibo Survey 3 (Nice, Fruchter & Taylor 1995) | 198? | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | 256 |
Parkes 20-cm Survey (I) (Johnston et al. 1992) | 1988 | 1434 | 800 | 1000 | 80 | 300 | 78.6 | 100 |
Parkes 20-cm Survey (II) (Johnston et al. 1992) | 1988 | 1520 | 320 | 5000 | 64 | 1200 | 157.3 | 100 |
Arecibo 430 MHz Intermediate Galactic Latitude Survey (Navarro et al. 2003) | 1989–91 | 430 | 10 | 78.125 | 128 | 506.625 | 66.4 | 163 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H1) (Foster et al. 1995) | 1990 | 430 | 10 | 250 | 128 | 506 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H2) (Foster et al. 1995) | 1991 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H3) (Foster et al. 1995) | 1992 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H4) (Foster et al. 1995) | 1993 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H5) (Foster et al. 1995) | 1994–95 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
Arecibo Survey 4 Phase I (Nice, Taylor & Fruchter 1993) | 1991 | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | ? |
Arecibo Survey 4 Phase II (Camilo, Nice & Taylor 1993) | 1992 | 429 | 8 | 250 | 64 | 250 | 40 | 192 |
Parkes Southern (Manchester et al. 1996) | 1991–93 | 436 | 32 | 1250 | 256 | 300 | 157.3 | 738 |
Green Bank fast pulsar survey (Sayer, Nice & Taylor 1997) | 1994–96 | 370 | 40 | 78.125 | 512 | 256 | 134 | 512 |
PMPS (Manchester et al. 2001) | 1997 | 1374 | 288 | 3000 | 96 | 250 | 2100 | 325 |
Swinburne Int. Lat. survey (Edwards et al. 2001) | 1998–99 | 1374 | 288 | 3000 | 96 | 125 | 265 | 375 |
Survey | Year | Fc (MHz) | B (MHz) | Δv (kHz) | nchans | tsamp (μs) | tobs (s) | DM trials |
---|---|---|---|---|---|---|---|---|
1st Molonglo Survey (Large et al. 1968) | 1968 | 408 | 4 | 2000 | 2 | 5000 | 15 | ? |
Search at low Galactic Lat. (Davies, Large & Pickwick 1970) | 1969 | 408 | 4 | ? | ? | 50 000 | 819 | ? |
Arecibo Survey 1 (Hulse & Taylor 1974) | 197? | 430 | 8 | 250 | 32 | 5600 | 198 | 64 |
Jodrell Survey A (Davies, Lyne & Seiradakis 1977) | 1972 | 408 | 4 | 2000 | 2 | 40 000 | 660 | ? |
2nd Molonglo Survey (Manchester et al. 1978) | 1977 | 408 | 4 | 800 | 4 | 20 000 | 44.7 | ? |
Green Bank Northern hemisphere Survey (Damashek et al. 1978, 1982) | 1977 | 400 | 16 | 2000 | 8 | 16 700 | 144 | 8 |
Princeton-NRAO Survey (Dewey et al. 1985) | 1982–83 | 390 | 16 | 2000 | 8 | 5556 | 138 | 8 |
Green Bank short-period (Stokes et al. 1985) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Jodrell Survey B (Clifton & Lyne 1986) | 1983–84 | 1400 | 40 | 5000 | 8 | 2000 | 540 | ? |
Arecibo survey 2 (a) – Phase II Princeton-NRAO (Stokes et al. 1986) | 1983 | 390 | 8 | 250 | 32 | 2000 | 132 | ? |
Arecibo survey 2 (b) (Stokes et al. 1986) | 1984–85 | 430 | 0.96 | 60 | 16 | 300 | 39 | ? |
Jodrell Survey C (Biggs & Lyne 1992) | 1985–87 | 610/925/928/1420 | 4/8/32 | 125/500/1000 | 32 | 300 | 79 | 39 |
Parkes Globular Cluster Survey (20 cm) (Manchester et al. 1990a,b)ᵃ | 1989–90 | 1491 | 80/320 | 1000/5000 | 80/64 | 300 | 3000 | 100 |
Parkes Globular Cluster Survey (50 cm) (Manchester et al. 1990a,b)ᵃ | 1988–90 | 640 | 32 | 250 | 128 | 300 | 3000/4500 | 100 |
Arecibo Survey 3 (Nice, Fruchter & Taylor 1995) | 198? | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | 256 |
Parkes 20-cm Survey (I) (Johnston et al. 1992) | 1988 | 1434 | 800 | 1000 | 80 | 300 | 78.6 | 100 |
Parkes 20-cm Survey (II) (Johnston et al. 1992) | 1988 | 1520 | 320 | 5000 | 64 | 1200 | 157.3 | 100 |
Arecibo 430 MHz Intermediate Galactic Latitude Survey (Navarro et al. 2003) | 1989–91 | 430 | 10 | 78.125 | 128 | 506.625 | 66.4 | 163 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H1) (Foster et al. 1995) | 1990 | 430 | 10 | 250 | 128 | 506 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H2) (Foster et al. 1995) | 1991 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H3) (Foster et al. 1995) | 1992 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H4) (Foster et al. 1995) | 1993 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
High Galactic Latitude Pulsar Survey of the Arecibo Sky (H5) (Foster et al. 1995) | 1994–95 | 430 | 8 | 250 | 32 | 250 | 40 | 64 |
Arecibo Survey 4 Phase I (Nice, Taylor & Fruchter 1993) | 1991 | 430 | 10 | 78.125 | 128 | 516.625 | 67.7 | ? |
Arecibo Survey 4 Phase II (Camilo, Nice & Taylor 1993) | 1992 | 429 | 8 | 250 | 64 | 250 | 40 | 192 |
Parkes Southern (Manchester et al. 1996) | 1991–93 | 436 | 32 | 1250 | 256 | 300 | 157.3 | 738 |
Green Bank fast pulsar survey (Sayer, Nice & Taylor 1997) | 1994–96 | 370 | 40 | 78.125 | 512 | 256 | 134 | 512 |
PMPS (Manchester et al. 2001) | 1997 | 1374 | 288 | 3000 | 96 | 250 | 2100 | 325 |
Swinburne Int. Lat. survey (Edwards et al. 2001) | 1998–99 | 1374 | 288 | 3000 | 96 | 125 | 265 | 375 |
Note. ᵃMore than one configuration used during the survey.
Technical specifications of pulsar surveys conducted between 2000 and the present, and projected specifications for instruments under development. X-ray pulsar searches undertaken during this period (Abdo et al. 2009; Ransom et al. 2011) are omitted. Here Fc (MHz) is the central observing frequency, B (MHz) is the bandwidth, Δv (kHz) is the channel width (to 3 d.p.), nchans is the number of frequency channels, tsamp (μs) is the sampling interval (to 3 d.p.), tobs (s) is the length of the observation (to 1 d.p.), and DM trials is the number of dispersion measure trials. Values that could not be found in the literature are indicated with ‘?’. The omission of a survey should be treated as an oversight rather than a judgement on its significance.
Survey | Year | Fc (MHz) | B (MHz) | Δv (kHz) | nchans | tsamp (μs) | tobs (s) | DM trials |
---|---|---|---|---|---|---|---|---|
Parkes high-lat multibeam (Burgay et al. 2006) | 2000–03 | 1374 | 288 | 3000 | 96 | 125 | 265 | ? |
Survey of the Magellanic Clouds (Manchester et al. 2006) | 2000–01 | 1374 | 288 | 3000 | 96 | 1000 | 8400 | 228 |
1.4 GHz Arecibo Survey (DM < 100) (Hessels et al. 2007) | 2001–02 | 1175 | 100 | 390.625 | 256 | 64 | 7200 | ? |
1.4 GHz Arecibo Survey (DM > 100) (Hessels et al. 2007) | 2001–02 | 1475 | 100 | 195.313 | 512 | 128 | 7200 | ? |
Large Area Survey for Radio Pulsars (Jacoby et al. 2009) | 2001–02 | 1374 | 288 | 3000 | 96 | 125 | 256 | 375 |
EGRET 56 Pulsar survey (Crawford et al. 2006) | 2002–03 | 1374 | 288 | 3000 | 96 | 125 | 2100 | 150 |
EGRET error box survey (Champion, McLaughlin & Lorimer 2005) | 2003 | 327 | 25 | 48.828 | 512 | 125 | 260 | 392 |
AO327 Pilot (Deneva et al. 2013) | 2003 | 327 | 25 | 48.828 | 512 | 256 | 60 | 6358 |
The Perseus Arm Pulsar Survey (Burgay et al. 2013)ᵃ | 2004–09 | 1374 | 288 | 3000 | 96 | 125 | 2100 | 183/325 |
The 8gr8 Cygnus Survey (Rubio-Herrera et al. 2007; Janssen et al. 2009) | 2004 | 328 | 10 | 19.531 | 512 | 819.2 | 6872 | 488 |
Parkes deep northern Galactic Plane (Lorimer, Camilo & McLaughlin 2013) | 2004–05 | 1374 | 288 | 3000 | 96 | 125 | 4200 | 496 |
P-ALFA Survey (initial) (WAPP) (Cordes et al. 2006; Deneva et al. 2009) | 2004 | 1420 | 100 | 390.625 | 256 | 64 | 134 | 96 |
P-ALFA Survey (anticipated) (WAPP) (Cordes et al. 2006; Deneva et al. 2009)ᵃ | 2004–10 | 1420 | 300 | 390.625 | 1024 | 64 | 134 | 96/1272 |
6.5 GHz Multibeam Pulsar Survey (Bates et al. 2011) | 2006–07 | 6591 | 576 | 3000 | 192 | 125 | 1055 | 286 |
Green Bank 350 MHz Drift Scan (Boyles et al. 2013) | 2007 | 350 | 50 | 24.414 | 2048 | 81.92 | 140 | ? |
GBT350 (Spigot) (Deneva et al. 2013) | 2007 | 350 | 50 | 24.414 | 2048 | 82 | 140 | ? |
P-ALFA Survey (MOCK) (Spitler et al. 2014; Deneva et al. 2009; Lazarus 2012)ᵃ | 2009–14 | 1375 | 322.6 | 336.042 | 960 | 65.5 | 120/300 | 5016 |
GBNCC (GUPPI) (Deneva et al. 2013; Stovall et al. 2014)ᵃ | 2009–14 | 350 | 100 | 24.414 | 4096 | 82 | 120 | 17 352/26 532 |
Southern HTRU (LOW) (Keith et al. 2010) | 2010–12 | 1352 | 340 | 390.625 | 870 | 64 | 4300 | ? |
Southern HTRU (MED) (Keith et al. 2010) | 2010–12 | 1352 | 340 | 390.625 | 870 | 64 | 540 | 1436 |
Southern HTRU (HIGH) (Keith et al. 2010) | 2010–12 | 1352 | 340 | 390.625 | 870 | 64 | 270 | 8000 |
AO327 (MOCK) (Deneva et al. 2013) | 2010 | 327 | 57 | 55.664 | 1024 | 125 | 60 | 6358 |
LPPS (Coenen et al. 2014) | 2010 | 142 | 6.8 | 12.143 | 560 | 655 | 3420 | 3487 |
LOTAS (Coenen et al. 2014)ᵃ | 2010–11 | 135 | 48 | 12.295 | 3904 | 1300 | 1020 | 16 845/18 100 |
Northern HTRU (LOW) (Barr et al. 2013; Ng 2012)ᵃ | 2010–14 | 1360 | 240 | 585.9 | 410 | 54.61 | 1500 | 406/3240 |
Northern HTRU (MED) (Barr et al. 2013; Ng 2012)ᵃ | 2010–14 | 1360 | 240 | 585.9 | 410 | 54.61 | 180 | 406/3240 |
Northern HTRU (HIGH) (Barr et al. 2013; Ng 2012)ᵃ | 2010–14 | 1360 | 240 | 585.9 | 410 | 54.61 | 90 | 406/3240 |
SPAN512 (Desvignes et al. 2012) | 2012 | 1486 | 512 | 500 | 1024 | 64 | 1080 | ? |
LOTAAS (Lofar Working Group 2013; Cooper 2014) | 2013 | 135 | 95 | 12.207 | 2592 | 491.52 | 3600 | 7000 |
AO327 (PUPPI) (Deneva et al. 2013) | 2014 | 327 | 69 | 24.503 | 2816 | 82 | 60 | 6358 |
SUPERB (Barr 2014; Keane et al., in preparation) | 2014 | 1374 | 340 | 332.031 | 1024 | 32 | 540 | 1448 |
GMRT High Resolution Southern Sky Survey (MID) (Bhattacharyya 2014; Bhattacharyya et al. 2016) | 2014 | 322 | 32 | 15.625 | 2048 | 60 | 1200 | 6000 |
GMRT High Resolution Southern Sky Survey (HIGH) (Bhattacharyya 2014; Bhattacharyya et al. 2016) | 2014 | 322 | 32 | 31.25 | 1024 | 30 | 720 | 6000 |
FASTᵇ (Smits et al. 2009b) | 2016 | 1315 | 400 | 42.105 | 9500 | 100 | 600 | ? |
SKAᵇ (Configuration A) (Smits et al. 2009a) | 2020–22 | 1250 | 500 | 50 | 9500 | 64 | 1800 | ? |
SKAᵇ (Configuration B) (Smits et al. 2009a) | 2020–22 | 650 | 300 | 50 | 9500 | 64 | 1800 | ? |
Note. ᵃMore than one configuration used during the survey.
ᵇProjected future survey; configuration specifics subject to change.
2.1.1 Approximate model of candidate numbers
Selection begins in the spectral S/N regime, as described in Section 2. Here each suspect period, together with its spectral S/N, is found through a Fourier analysis of a de-dispersed time series. However, our knowledge of the S/N distribution of spectral suspects is incomplete, since they may arise from any of (i) variations in Galactic background noise, (ii) RFI, (iii) instrument noise, or (iv) legitimate phenomena. To overcome this, we model only the most significant source of candidates: Gaussian-distributed background noise. Empirical evidence suggests that most candidates originate from background noise. Our analysis of High Time Resolution Universe Survey (HTRU) data (Thornton 2013) supports this view, which is also held by others (Lee et al. 2013; Morello et al. 2014). It is also logically consistent: if most candidates arose from legitimate phenomena, discovery would be trivial, whereas if most arose from RFI this would be concerning, since telescopes used for surveys are situated in low-RFI environments. It therefore seems reasonable to conclude that candidate populations are noise dominated.
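To make the noise-only model concrete, the following is a minimal Python sketch, assuming that the spectral S/N of each noise suspect is an independent draw from a standard normal distribution. The parameter `n_trials` is hypothetical: it stands in for the total number of independent trials a survey performs and is not a quantity defined in this section. The sketch computes the 1 − CDF tail probability and the expected number of noise suspects with S/N between nσ and nσ^max; it illustrates the idea rather than reproducing the model equations referred to here.

```python
import math


def gaussian_sf(x: float) -> float:
    """Survival function (1 - CDF) of a standard normal distribution at x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def expected_noise_candidates(n_trials: float, n_sigma: float,
                              n_sigma_max: float = 100.0) -> float:
    """Expected number of pure-noise suspects with n_sigma < S/N <= n_sigma_max.

    Assumes each of `n_trials` independent trials yields one standard-normal
    S/N value (a simplifying, hypothetical assumption for illustration).
    """
    if n_sigma_max <= n_sigma:
        raise ValueError("n_sigma_max must exceed n_sigma")
    tail_probability = gaussian_sf(n_sigma) - gaussian_sf(n_sigma_max)
    return n_trials * tail_probability


if __name__ == "__main__":
    # With a fixed 7-sigma limit, noise suspects grow linearly with trial count.
    for n_trials in (1e12, 1e13, 1e14):
        print(f"{n_trials:.0e} trials -> "
              f"{expected_noise_candidates(n_trials, 7.0):.1f} noise suspects")
```

Because the tail probability at a fixed nσ is constant, any growth in the number of trials (more beams, channels, DM trials or pointings) passes straight through to the number of noise suspects.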

Diagram of 1 − CDF of equation (1), showing the relationship between nσ and constant cuts. This illustrates their impact on the number of noise candidates making it through to the candidate selection stage.

Candidate numbers predicted by equation (4) (using $n_{\sigma} = 7$ and $n_{\sigma}^{\max} = 100$), as a function of the total number of survey pointings for a single-beam receiver. Coloured dashed lines indicate the total number of candidates returned when a conservative C = 100 cut is applied; the corresponding solid lines indicate the total returned when the cut is discarded. The solid lines are truncated so that they begin where C = 100, to avoid overlapping lines complicating the plot.
There are two strategies available for dealing with the implied rise in noise candidates. The first is to increase the lower S/N limit nσ in equation (2). This effectively implements an S/N cut-off, as used by many to filter in the spectral domain (Foster et al. 1995; Hessels et al. 2007; Burgay et al. 2013; Thornton 2013) and the folded domain (Damashek, Taylor & Hulse 1978; Manchester et al. 1978; Stokes et al. 1986; Manchester et al. 2001; Burgay et al. 2013). However, to reduce candidate numbers appreciably, this cut-off would in practice have to be set high enough to reject weaker detections of interest (i.e. weaker pulsars, see Section 4.2.1). The second option is to impose a smaller constant cut-off C on the number of candidates collected per observation or beam, as also done by many (Edwards et al. 2001; Jacoby et al. 2009; Bates et al. 2012; Thornton 2013) and accounted for in our model. Fig. 2 shows these two methods to be fundamentally the same. Imposing a fixed limit C on the output of equation (2) can only be achieved by increasing the lower limit nσ of the integral, since the integrand is fixed by equation (1). This corresponds to setting a high S/N cut-off. Either approach therefore impacts our ability to detect legitimate pulsar signals. This is particularly true of a top-C cut, since noise alone appears capable of filling it, even before the influence of RFI or legitimate phenomena is taken into account. Taking d to the limit increases the certainty that noise will dominate a candidate cut, and reduces the likelihood of weak legitimate signals making it through to analysis. We now turn our attention to how these issues may be addressed.
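The equivalence argued above can be checked numerically under simplifying assumptions: independent standard-normal noise and a hypothetical count `n_trials` of independent trials (neither the function names nor the trial counts come from the surveys discussed here). The sketch finds, by bisection, the S/N threshold at which the expected number of noise suspects equals a top-C cut. Once more than C noise suspects are expected above the nominal nσ, the cut acts like a raised S/N threshold.

```python
import math


def gaussian_sf(x: float) -> float:
    """Survival function (1 - CDF) of a standard normal distribution at x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def equivalent_sigma_cut(n_trials: float, c: float,
                         lo: float = 0.0, hi: float = 40.0,
                         iterations: int = 200) -> float:
    """S/N threshold at which the expected number of noise suspects equals c.

    gaussian_sf is monotonically decreasing, so bisection on [lo, hi] converges
    to the unique x with n_trials * gaussian_sf(x) == c (for 0 < c < n_trials / 2).
    """
    target = c / n_trials
    if not 0.0 < target < gaussian_sf(lo):
        raise ValueError("c must be positive and below n_trials * gaussian_sf(lo)")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gaussian_sf(mid) > target:
            lo = mid  # more than c noise suspects would survive: move threshold up
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n_sigma, c = 7.0, 100
    for n_trials in (1e13, 1e15, 1e17):
        noise_above = n_trials * gaussian_sf(n_sigma)
        # The cut only bites once it is stricter than the nominal n_sigma limit.
        effective = max(n_sigma, equivalent_sigma_cut(n_trials, c))
        print(f"{n_trials:.0e} trials: {noise_above:.0f} noise suspects above "
              f"{n_sigma} sigma; a top-{c} cut acts like S/N > {effective:.2f}")
```

Under these assumptions, a top-100 cut leaves the effective threshold at 7σ only while fewer than 100 noise suspects exceed it. Beyond that point, the cut behaves like a higher S/N limit that rises as the trial count grows.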
3 CANDIDATE SELECTION METHODS
3.1 Manual selection
During the earliest surveys, manual selection involved the inspection of analogue pen chart records for periodic signals (Large, Vaughan & Wielebinski 1968; Manchester et al. 1978). This process was subsequently replaced by digital data inspection, with the adoption of early computer systems. From then on, manual selection involved the inspection ‘by eye’ of digitally produced diagnostic plots describing each candidate. Those found exhibiting pulsar-like characteristics were recorded for analysis, whilst the remainder were ignored (though retained on disc for possible re-analysis).
During the initial period of digitization, pulsar surveys produced very few candidates compared with modern searches. The second Molonglo survey, conducted during the 1970s, produced only 2500 candidates in total (Manchester et al. 1978). These yielded 224 pulsar detections (Manchester et al. 2005), a hit rate of almost 9 per cent.³ Thus, during this period manual selection was entirely practical. Soon after, however, increasing candidate numbers began to cause problems. The first mention of this within the literature (to the best of our knowledge) was made by Clifton & Lyne (1986) regarding Jodrell Survey B. The number of candidates produced during this survey necessitated extensive manual selection on the basis of pulse profile appearance and S/N. Although such heuristic judgements were not new, their explicit mention with respect to candidate selection indicated that a shift in procedure had occurred. Whereas before it had been possible to evaluate most, if not all, candidates by eye, it now became necessary to expedite the process using heuristics. Contemporary surveys reacting to similar issues imposed high S/N cut-offs to limit candidate numbers directly. The Arecibo Phase II survey used an 8σ S/N cut, so only ∼5405 candidates required manual inspection (Stokes et al. 1986).
The use of heuristics and S/N cuts proved insufficient to deal with the growth in candidate numbers. Additional processing steps, such as improved sifting, were applied in response, and these became increasingly important during this period. However, as these measures are applied high up the processing pipeline (close to the final data products), their capacity to reduce candidate numbers was limited. Consequently, attempts were made to remove spurious candidates automatically lower down the pipeline, with the aim of preventing them from ever reaching human eyes. During the Parkes 20-cm survey, two software tools were devised by Johnston et al. (1992) to achieve this. Together these encapsulated and optimized the general search procedure discussed in Section 2. The software (‘mspfind’ and another unnamed tool) was explicitly designed to reduce the quantity of spurious candidates while maintaining sensitivity to millisecond pulsars (MSPs). Only candidates with S/N > 8 were allowed through the pipeline to manual inspection. It is unclear how many candidates required manual inspection, though the number was fewer than 150 000 (Johnston et al. 1992). During the same period, a similar software tool known as the Caltech Pulsar Package (Deich 1994) was developed for the Arecibo 430 MHz Intermediate Galactic Latitude Survey (Navarro et al. 2003). These represent some of the earliest efforts to systematize the search process in a reproducible way.
3.2 Summary interfaces
The success achieved via low-level filtering and sifting continued to be undermined by ever-increasing candidate numbers brought about by technological advances. By the late 1990s, manual selection was therefore becoming increasingly infeasible. This spawned many graphical tools, designed to summarize and filter candidates for speedy and concise evaluation. The first of these, runview (Burgay et al. 2006), was created to analyse data output by the Parkes Multibeam Pulsar Survey (PMPS; Manchester et al. 2001). During the Swinburne Intermediate-latitude survey, Edwards et al. (2001) devised a similar graphical tool that included distributional information on candidate parameters. A later reprocessing of PMPS data for binary pulsars and MSPs spawned the development of a more sophisticated graphical tool for candidate viewing called reaper. reaper used a dynamic, customizable plot (Faulkner et al. 2004) that enabled heuristic judgements of candidate origin to be made using multiple variables. The use of reaper led to the discovery of 128 previously unidentified pulsars in PMPS data. This corresponds to ∼15.4 per cent of the known pulsars in PMPS data, given that 833 have now been identified (Lorimer et al. 2015).
Following the success of reaper, an updated version of the tool called jreaper was developed by Keith et al. (2009). It incorporated algorithms that assigned numerical scores to candidates based on their parameters, permitting candidates to be ranked. By ignoring candidates with low rankings, the amount of visual inspection required was reduced. When applied to PMPS data, jreaper led to the discovery of a further 28 new pulsars (Keith et al. 2009), corresponding to ∼3.4 per cent of known PMPS pulsars. Thus by 2009, summary interfaces had helped find ∼18.7 per cent of all PMPS pulsars, illustrating the usefulness of graphical approaches. More recently, web-based candidate viewing systems incorporating similar scoring mechanisms have appeared (Cordes et al. 2006; Deneva et al. 2009, 2013). One such tool, The Pulsar Search Collaboratory (Rosen et al. 2010),⁴ also incorporates human scoring via the input of high school students. Students taking part in the programme have discovered several new pulsars (Rosen et al. 2013). These include PSR J1930−1852, a pulsar in a double neutron star system (Swiggum et al. 2015).
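As a check on the percentages quoted above, each follows from the 833 known PMPS pulsars (Lorimer et al. 2015). The combined figure is computed from the summed counts, which is why it is 18.7 rather than the 18.8 obtained by adding the two rounded percentages:

```latex
\frac{128}{833} \approx 0.154, \qquad
\frac{28}{833} \approx 0.034, \qquad
\frac{128 + 28}{833} = \frac{156}{833} \approx 0.187 .
```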
3.3 Semi-automated ranking approaches
Semi-automated selection approaches have recently begun to emerge. Amongst the most popular are those employing ranking mechanisms to prioritize promising candidates for human attention. The most notable of these is the peace system developed by Lee et al. (2013). peace describes each candidate via six numerical features, combined linearly to form a candidate score. Ranked candidates are then analysed via graphical viewing tools by students in the Arecibo Remote Command Centre Programme. To date, peace has been used during the Green Bank Northern Celestial Cap Survey (GBNCC; Stovall et al. 2014) and the Northern High Time Resolution Universe Survey (HTRU north; Ng 2012; Barr et al. 2013). Periodic and single-pulse candidates obtained during the AO327 survey (Deneva et al. 2013) were similarly ranked using an algorithm based on peace. Over 50 participants (of varying expertise) from four universities were then invited to view the AO327 candidates via a web-based interface.
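The six peace features and their weights are not reproduced here. The following Python sketch therefore uses hypothetical feature values and weights purely to show the mechanics of a linear score followed by a ranking: each candidate's score is a weighted sum of its features, and candidates are sorted so that the highest-scoring ones reach human viewers first.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    name: str
    features: Sequence[float]  # hypothetical, pre-normalized feature values


def linear_score(features: Sequence[float], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Weighted linear combination of features (the general form of a linear score)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * f for w, f in zip(weights, features))


def rank_candidates(candidates: Sequence[Candidate], weights: Sequence[float],
                    bias: float = 0.0) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least promising."""
    scored = [(c.name, linear_score(c.features, weights, bias)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical weights for six features; these are NOT the published peace weights.
    weights = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]
    candidates = [
        Candidate("cand_A", [0.9, 0.8, 0.7, 0.9, 0.6, 0.8]),
        Candidate("cand_B", [0.1, 0.3, 0.2, 0.1, 0.4, 0.2]),
        Candidate("cand_C", [0.6, 0.5, 0.9, 0.4, 0.7, 0.5]),
    ]
    for name, score in rank_candidates(candidates, weights):
        print(f"{name}: {score:.3f}")
```

A linear score is cheap to compute and easy to interpret, but it can only separate classes along a single weighted direction in feature space.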
3.4 Automated ‘Intelligent’ selection
Intelligent selection techniques are gaining widespread adoption. Their ‘intelligence’ derives from statistical learning theory, more generally known as machine learning (ML), and in particular from the branch of ML known as statistical classification. The aim of classification is to build functions that accurately map a set of input data points to a set of class labels. For pulsar search, this means mapping each candidate to its correct label (pulsar or non-pulsar). This is known as candidate classification, a form of supervised learning (Mitchell 1997; Duda, Hart & Stork 2000; Bishop 2006). If $S = \{X_1, \ldots, X_n\}$ represents the set of all candidate data, then $X_i$ is an individual candidate represented by variables known as features. Features describe the characteristics of the candidate such that $X_i = \{X_i^1, \ldots, X_i^m\}$, where each feature $X_i^j \in \mathbb{R}$ for $j = 1, \ldots, m$. The label $y$ associated with each candidate may take multiple possible values such that $y \in Y = \{y_1, \ldots, y_k\}$ (e.g. MSP, RFI, noise etc.). However, since the goal here is to separate pulsar and non-pulsar candidates, we consider the binary labels $y \in Y = \{-1, 1\}$, where $y_1 = -1$ denotes non-pulsar (synonymous with negative) and $y_2 = 1$ denotes pulsar (synonymous with positive).
To build accurate classification systems, it is desirable to use features that separate the classes under consideration. This is illustrated in Fig. 4. An ML function ‘learns’ to separate candidates described using features from a labelled training set $T$, which contains pairs such that $T = \{(X_1, y_1), \ldots, (X_n, y_n)\}$. The goal of classification is to induce a mapping function between candidates and labels, based on the data in $T$, that minimizes generalization error on test examples (Kohavi & John 1997). The derived function can then be used to label new, unseen candidates.
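The train-then-generalize loop described above can be sketched with the simplest linear classifier, a perceptron. It is swapped in here for illustration only: it is neither one of the MLP-based systems reviewed below nor the classifier proposed in this paper. Candidates are feature vectors $X_i \in \mathbb{R}^m$, labels are $y \in \{-1, 1\}$, the mapping is induced from a training set $T$, and generalization error is estimated as the misclassification rate on held-out test examples. The two-feature data in the usage example are synthetic.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector X_i, label y_i in {-1, 1})


def train_perceptron(training_set: List[Example], epochs: int = 50,
                     learning_rate: float = 1.0) -> Tuple[List[float], float]:
    """Induce a linear mapping f(X) = sign(w.X + b) from the labelled set T."""
    m = len(training_set[0][0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in training_set:
            activation = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or on the boundary): update
                w = [wj + learning_rate * y * xj for wj, xj in zip(w, x)]
                b += learning_rate * y
                mistakes += 1
        if mistakes == 0:  # every example in T is correctly separated
            break
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Label a new candidate: 1 = pulsar (positive), -1 = non-pulsar (negative)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


def error_rate(w: Sequence[float], b: float, test_set: List[Example]) -> float:
    """Fraction of misclassified test examples: an estimate of generalization error."""
    wrong = sum(1 for x, y in test_set if predict(w, b, x) != y)
    return wrong / len(test_set)


if __name__ == "__main__":
    # Synthetic, well-separated two-feature data for illustration only.
    train = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.95], 1),
             ([0.1, 0.2], -1), ([0.2, 0.1], -1), ([0.3, 0.25], -1)]
    test = [([0.85, 0.75], 1), ([0.15, 0.3], -1)]
    w, b = train_perceptron(train)
    print("weights:", w, "bias:", b, "test error:", error_rate(w, b, test))
```

A perceptron is only guaranteed to converge when the classes are linearly separable, which is one reason why highly separable features matter so much in practice.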

Example of the varying separability of features, from highly separable in (a) to poorly separable in (b).
The first application of ML approaches to candidate selection was accomplished by Eatough et al. (2010). In this work, each candidate was reduced to a set of 12 numerical feature values inspired by the scoring system first adopted in jreaper. A predictive model based on a multilayered perceptron (MLP), a form of artificial neural network (Haykin 1999; Bishop 2006), was then constructed. Using this model, a sample of PMPS data was re-analysed and a new pulsar discovered (Eatough 2009). Neural network classifiers based on the MLP architecture were also developed to run on data gathered during the HTRU survey. Bates et al. (2012) modified the earlier approach by describing candidates using 10 further numerical features (22 in total). The same features were used to train neural network classifiers applied to HTRU medium-latitude data by Thornton (2013). More recently, the spinn system developed by Morello et al. (2014) drew on developments from the field of computer science to optimize neural network performance on a set of six features. spinn is currently being applied as part of the Survey for Pulsars and Extragalactic Radio Bursts (SUPERB; Barr 2014; Keane et al., in preparation).
Convolutional neural networks (CNN; Bengio 2009) achieved prominence through their high accuracy on difficult learning problems such as speech and image recognition, and they have also been adapted for candidate selection. The Pulsar Image-based Classification System (pics) developed by Zhu et al. (2014) uses the CNN and other types of ML classifier to perform image classification on candidate plots. pics is technically the most sophisticated approach available, and it appears to possess high accuracy. However, this comes at the expense of high computational cost, particularly with respect to runtime complexity.
4 DISCUSSION
4.1 Critique of manual selection
Manual selection has retained a vital role in pulsar search (Keith et al. 2010), as demonstrated by its use during recent surveys (Bates et al. 2011; Boyles et al. 2013; Coenen et al. 2014). The strongest argument in favour of manual selection is its presumed accuracy (see e.g. Eatough 2009; Morello et al. 2014). However, to the best of our knowledge, no study of the accuracy of expert selection has been conducted. Although one would intuitively expect manual accuracy to be high, studies in other domains indicate otherwise. Most famously, studies in medicine and finance (Meehl 1954; Barber & Odean 2000) suggest that expert decision making is flawed due to unconscious biases. Indeed, manual selection is already known to be a subjective and error-prone process (Eatough 2009; Eatough et al. 2010). In any case, it is infeasible to continue using manual approaches given the rise in candidate numbers predicted in Section 2.1 and also anticipated by others (Keane et al. 2014). Thus, irrespective of the true accuracy of manual selection, it must be supplanted to keep pace with increasing data capture rates and candidate numbers.
4.2 Critique of automated approaches
ML approaches are becoming increasingly important for automating decision-making processes in finance (Chandola, Banerjee & Kumar 2009), medicine (Markou & Singh 2003; Chandola et al. 2009), safety-critical systems (Markou & Singh 2003; Hodge & Austin 2004; Chandola et al. 2009) and astronomy (Ball & Brunner 2009; Borne 2009; Way et al. 2012). Given the widespread adoption of ML, the continued application of manual selection raises a fundamental question: why has a transition to completely automated selection not yet occurred? Specific barriers to adoption may be responsible, such as the expertise required to implement and use ML methods effectively. Where this barrier is overcome, the approaches that emerge are typically survey- and search-specific.
A further problem is the limited public availability of pulsar-specific code and data. Thus, to adopt ML approaches, new systems generally need to be built from scratch. ML approaches also have to be ‘trained’ on data acquired by the same pipeline they will be deployed upon5. If training data are not shared, they have to be collected before a survey begins, and the cost of doing so may be a further barrier to adoption. Perhaps more simply, existing automated approaches may not yet be accurate enough to be trusted completely. If so, the cause is unlikely to be the choice of ML system (e.g. neural network, probabilistic classifier, or any other). The methods described in Section 3.4 employ well-studied ML techniques that have proven effective for a variety of problems. Drops in performance are more likely to be due to deficiencies in (i) the features describing candidates, and (ii) the data used to train learning algorithms. In the following section, we present evidence suggesting that existing candidate features may well be sub-optimal.
4.2.1 Sub-optimal candidate features
Candidate features can be categorized as being either fundamental to, or derived from, candidate data. Derived features extract new information on the assumption that it will possess some utility, whilst fundamental features do not. For instance, the S/N or period of a candidate can be considered fundamental. A good example of a derived feature is the χ2 value of a sine curve fit to the pulse profile, as used by Bates et al. (2012). Curve fitting used in this manner expresses an underlying hypothesis: in this case, Bates et al. (2012) suppose a good χ2 fit to be indicative of sinusoidal RFI. Whilst the reasoning is sound, such a feature represents an untested hypothesis which may or may not hold true.
The majority of existing features are derived (see Eatough et al. 2010; Bates et al. 2012; Thornton 2013; Morello et al. 2014), and are based upon the heuristics used when selecting candidates manually. As manual selection is imperfect, we cannot rule out the possibility of having designed features, and thereby automated methods, which make the same mistakes as ourselves. Some features in use have been found to introduce unwanted and unexpected biases against particular types of pulsar candidate (Bates et al. 2012; Morello et al. 2014). Fundamental features are not necessarily better. For example, the folded or spectral S/N is often used both as a primitive filter and as a feature for learning. As noise candidates with folded S/Ns of 6σ are common (Nice et al. 1995), an S/N cut at this level allows large numbers of likely noise-originating candidates to be rejected. However, as noted by Bates et al. (2012), such cuts are helpful only if one assumes that all low-S/N candidates are attributable to noise. In practice, the application of cuts has prevented the detection of weaker pulsar signals, as warned in Section 2.1. PSR J0812−3910 went unseen in High Latitude survey data (Burgay et al. 2006), as its spectral S/N was below the survey's threshold for folding. Similarly, PSR J0818−3049 went undetected during the same survey, as its folded S/N was below the cut applied prior to manual selection. Moreover, there is no agreed-upon S/N cut level for any stage in the search pipeline. Domain experience usually plays a role in determining the level, but this role is rarely specified and is difficult to quantify. Levels used include 6σ (Damashek et al. 1978; Thornton 2013), 6.3σ (Manchester et al. 1978), 7σ (Foster et al. 1995; Hessels et al. 2007), 7.5σ (Manchester et al. 1996), 8σ (Stokes et al. 1986; Johnston et al. 1992; Edwards et al. 2001; Manchester et al. 2001; Burgay et al. 2006, 2013), 8.5σ (Nice et al. 1995), 9σ (Jacoby et al. 2009; Bates et al. 2011), and finally 9.5σ (Jacoby et al. 2009).
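The cost of a fixed cut can be checked directly on labelled data. The minimal sketch below counts how many known pulsars a given S/N threshold would discard. The candidate list and the helper names (`apply_snr_cut`, `pulsars_lost`) are hypothetical. Only the threshold levels are taken from the list above.

```python
from typing import List, Tuple

# Each candidate is a hypothetical (folded S/N in sigma, label) pair, label in {-1, +1}.
Labelled = Tuple[float, int]


def apply_snr_cut(candidates: List[Labelled], threshold: float) -> Tuple[List[Labelled], List[Labelled]]:
    """Split candidates into those kept (S/N >= threshold) and those rejected."""
    kept = [c for c in candidates if c[0] >= threshold]
    rejected = [c for c in candidates if c[0] < threshold]
    return kept, rejected


def pulsars_lost(candidates: List[Labelled], threshold: float) -> int:
    """Count known pulsars (label +1) that the cut would discard."""
    _, rejected = apply_snr_cut(candidates, threshold)
    return sum(1 for _, label in rejected if label == 1)


# Invented sample: two noise candidates near 6 sigma and one weak pulsar at 7.1 sigma.
sample = [(5.5, -1), (6.2, -1), (7.1, 1), (8.4, -1), (9.8, 1), (25.0, 1)]
for level in (6.0, 8.0, 9.5):  # a few of the cut levels listed above
    kept, rejected = apply_snr_cut(sample, level)
    print(level, len(rejected), pulsars_lost(sample, level))
```

Raising the cut removes more noise but, in this invented sample, also discards the weak pulsar. This is the trade-off described above.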
A further problem with many existing features is that they are implementation dependent. They are described using concepts that can be expressed in various ways mathematically (S/N used by Bates et al. 2011; Thornton 2013; Lee et al. 2013; Morello et al. 2014), are subject to interpretation without precise definition (pulse width used by Bates et al. 2011; Lee et al. 2013; Thornton 2013; Morello et al. 2014), or implicitly use external algorithms which go undefined (e.g. curve fitting employed by Bates et al. 2011; Thornton 2013). It is therefore difficult to build upon the work of others, as features and reported results are not reproducible. Thus direct comparisons between features are rare (Morello et al. 2014) and impractical.
4.2.2 Feature evaluation issues
The techniques most often used to evaluate features are inadequate for determining how well they separate pulsar and non-pulsar candidates. The most common form of evaluation is undertaken in two steps. The first determines the presence of linear correlations between features and class labels (Bates et al. 2011). The second compares the performance of different classifiers built using the features (Bates et al. 2011; Lee et al. 2013; Morello et al. 2014), which is the standard ‘wrapper’ method (Kohavi & John 1997; Guyon & Elisseeff 2003). This two-step evaluation treats strong linear correlations and accurate classification performance as characteristic of ‘good’ feature sets. However, it fails to consider the presence of useful non-linear correlations in the data. Finally, using classifier outputs to assess feature performance is known to give misleading results (Brown et al. 2012), as performance will vary according to the classifier used.
In order to build robust, shareable features tolerant to bias, it is necessary to adopt standard procedures that facilitate reproducibility and independent evaluation within the pulsar search community. Morello et al. (2014) began this process by sharing a fully labelled data set and by providing a clear set of design principles used when creating their features. Here we make similar recommendations, which we closely followed when designing and evaluating the new feature set described in Section 5. It is recommended that features:
minimize biases and selection effects (Morello et al. 2014),
be survey independent for data interoperability,
be implementation independent, with concise mathematical definitions allowing for reproducibility,
be evaluated using a statistical framework that enables comparison and reproducibility,
guard against high dimensionality (Morello et al. 2014),
be accompanied by public feature generation code, to facilitate co-operation and feature improvement,
be supplied in a standard data format,
be evaluated on multiple data sets to ensure robustness.
4.3 Future processing challenges
The shift to online processing has already occurred in other domains in response to similar data pressures (ATLAS Collaboration 2008). Indeed, closer to home, some pulsar and fast transient searches are already being undertaken with real-time processing pipelines (Thompson et al. 2011; Ait-Allal et al. 2012; Barr 2014; van Heerden et al. 2014). Real-time searches for fast radio bursts (Lorimer et al. 2007; Keane et al. 2012; Thornton et al. 2013) are also becoming increasingly common (Karastergiou et al. 2015; Law et al. 2015; Petroff et al. 2015). We return to these concerns in Section 6.
5 NEW CANDIDATE FEATURES
The model introduced in Section 2.1 implies that candidate numbers are rising exponentially and are increasingly dominated by noise. We aim to address these problems by finding candidate features that maximize the separation between noise and non-noise candidates, thereby reducing the impact of the largest contributor to high candidate numbers. We also seek to minimize the number of features we use, to avoid the problems associated with the ‘curse of dimensionality’ (Hughes 1968), which reduces classification performance. In total, we extracted eight new features for this purpose from two components of the typical pulsar candidate, following the recommendations of Section 4.2.2. These features are defined in full in Table 4.
The eight features derived from the integrated pulse profile P = {p1, …, pn}, and the DM-SNR curve D = {d1, …, dn}. For both P and D, all pi and |$d_{\rm i} \in \mathbb {N}$| for i = 1, …, n.
Feature | Description | Definition |
---|---|---|
Prof.μ | Mean of the integrated profile P. | |$\displaystyle \frac{1}{n}\sum _{i=1}^{\rm n} p_{i}$| |
Prof.σ | Standard deviation of the integrated profile P. | |$\displaystyle \sqrt{\frac{\sum _{i=1}^{\rm n}(p_{i}-\bar{P})^{\rm 2}}{n-1}}$| |
Prof.k | Excess kurtosis of the integrated profile P. | |$\displaystyle \frac{\frac{1}{n}(\sum _{i=1}^{n}(p_{i}-\bar{P})^{\rm 4})}{(\frac{1}{n}(\sum _{i=1}^{\rm n}(p_{i}-\bar{P})^{\rm 2}))^{\rm 2}}-3$| |
Prof.s | Skewness of the integrated profile P. | |$\displaystyle \frac{\frac{1}{n}\sum _{i=1}^{\rm n}(p_{i}-\bar{P})^{\rm 3}}{\big (\sqrt{\frac{1}{n}\sum _{i=1}^{\rm n}(p_{i}-\bar{P})^{\rm 2}}\big )^3}$| |
DMμ | Mean of the DM-SNR curve D. | |$\displaystyle \frac{1}{n}\sum _{i=1}^{\rm n} d_{i}$| |
DMσ | Standard deviation of the DM-SNR curve D. | |$\displaystyle \sqrt{\frac{\sum _{i=1}^{n}(d_{i}-\bar{D})^{\rm 2}}{n-1}}$| |
DMk | Excess kurtosis of the DM-SNR curve D. | |$\displaystyle \frac{\frac{1}{n}(\sum _{i=1}^{n}(d_{i}-\bar{D})^{\rm 4})}{(\frac{1}{n}(\sum _{i=1}^{\rm n}(d_{i}-\bar{D})^{\rm 2}))^{\rm 2}}-3$| |
DMs | Skewness of the DM-SNR curve D. | |$\displaystyle \frac{\frac{1}{n}\sum _{i=1}^{\rm n}(d_{i}-\bar{D})^{\rm 3}}{\big (\sqrt{\frac{1}{n}\sum _{i=1}^{\rm n}(d_{i}-\bar{D})^{\rm 2}}\big )^3}$| |
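The Table 4 definitions translate directly into code. The sketch below is an illustrative implementation, not the pulsar feature lab itself, and the function and dictionary key names are hypothetical. Like the table, it uses the sample (n − 1) standard deviation but 1/n moments for skewness and excess kurtosis. It rejects constant inputs, for which those two statistics are undefined.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def _four_stats(values: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (mean, std, excess kurtosis, skewness) as defined in Table 4."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two bins")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    ss = sum(d * d for d in dev)
    if ss == 0:
        raise ValueError("constant input: kurtosis and skewness are undefined")
    m2 = ss / n                          # (1/n) sum of squared deviations
    m3 = sum(d ** 3 for d in dev) / n    # (1/n) sum of cubed deviations
    m4 = sum(d ** 4 for d in dev) / n    # (1/n) sum of fourth-power deviations
    std = sqrt(ss / (n - 1))             # sample standard deviation (n - 1)
    kurt = m4 / m2 ** 2 - 3.0            # excess kurtosis
    skew = m3 / m2 ** 1.5                # skewness
    return mean, std, kurt, skew


def candidate_features(profile: Sequence[float], dm_curve: Sequence[float]) -> Dict[str, float]:
    """Compute the eight features from the integrated profile P and DM-SNR curve D."""
    p_mu, p_sigma, p_k, p_s = _four_stats(profile)
    d_mu, d_sigma, d_k, d_s = _four_stats(dm_curve)
    return {
        "Prof.mu": p_mu, "Prof.sigma": p_sigma, "Prof.k": p_k, "Prof.s": p_s,
        "DM.mu": d_mu, "DM.sigma": d_sigma, "DM.k": d_k, "DM.s": d_s,
    }


# Invented 8-bin profile with one sharp pulse, and an invented DM-SNR curve.
features = candidate_features([0, 0, 0, 10, 0, 0, 0, 0], [1, 2, 4, 8, 4, 2, 1, 1])
for name, value in features.items():
    print(f"{name:10s} {value:8.3f}")
```

The narrow, single-spike profile yields large positive skewness and kurtosis. This helps explain why Prof.k and Prof.s correlate positively with the pulsar class in the tests that follow.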
The first four are simple statistics obtained from the integrated pulse profile (folded profile). The remaining four are similarly obtained from the DM-SNR curve shown in plot (E) in Fig. 1. These features are fundamental to the data, are dissociated from any specific hypothesis, and are few in number. Likewise, they possess no intrinsic biases, except perhaps for resolution, i.e. the number of profile and DM-curve bins used to describe a candidate. The chosen features are also survey- and implementation-independent, provided that, for candidates output by different surveys, the integrated profile and DM-SNR curve data have the same numerical range and the same ‘natural’ DM window7.
These features were selected by returning to first principles with respect to feature design. Incorporating knowledge of the increasing trend in candidate numbers predicted in Section 2.1, we evaluated potential features according to how well each separated noise and non-noise candidates. Starting with simple lower order statistics as possible features (mean, mode, median etc.), we assessed the ability of each to reject noise statistically via a three-stage process. Higher order statistics and the derived features described by Thornton (2013) were then added to the pool of possible features and evaluated similarly. Those achieving the best separation, and the best classification results when used together with ML classifiers (see Section 6.3), were then selected for use. Thus these features were chosen with no preconceived notions of their suitability or expressiveness; rather, they were chosen on a statistical basis to avoid introducing bias.

7 The ‘natural’ DM window is defined as the range of DMs around the DM that gives the highest spectral detection significance for the candidate. The limits of this range are defined by the change in DM that corresponds to a time delay across the frequency band equivalent to the candidate's initial detection period.
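The natural DM window can be illustrated with a worked example. Its limits are set by the change in DM whose delay across the band equals the candidate's period. The cold-plasma dispersion delay between f_lo and f_hi is approximately 4.148808 ms × DM × (f_lo^−2 − f_hi^−2), with frequencies in GHz and DM in cm−3 pc. Setting this delay equal to the period P gives the offset ΔDM. The sketch assumes a symmetric window DM_best ± ΔDM, clipped at zero. The helper names and the example band and period are invented.

```python
from typing import Tuple

K_DM_MS = 4.148808  # dispersion constant, ms GHz^2 cm^3 pc^-1


def dispersion_delay_ms(dm: float, f_lo_ghz: float, f_hi_ghz: float) -> float:
    """Delay (ms) between the bottom and top of the band for a given DM (cm^-3 pc)."""
    if not 0 < f_lo_ghz < f_hi_ghz:
        raise ValueError("require 0 < f_lo < f_hi")
    return K_DM_MS * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)


def natural_dm_window(dm_best: float, period_ms: float,
                      f_lo_ghz: float, f_hi_ghz: float) -> Tuple[float, float]:
    """Hypothetical helper: DMs within one period's worth of band delay of dm_best.

    Assumes a symmetric window, clipped at DM = 0.
    """
    delta_dm = period_ms / dispersion_delay_ms(1.0, f_lo_ghz, f_hi_ghz)
    return max(0.0, dm_best - delta_dm), dm_best + delta_dm


# Invented example: a 1.4 s candidate at DM 50 observed over a 119-151 MHz band.
print(natural_dm_window(50.0, 1400.0, 0.119, 0.151))
```

Because ΔDM scales with the period, long-period candidates have wide windows, and low observing frequencies give narrow ones. This is why the window must match across surveys for the DM features to be comparable.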
5.1 Feature evaluation
There are three primary considerations when evaluating new features. A feature must (i) be useful for discriminating between the various classes of candidate, (ii) maximize the separation between them, and (iii) perform well in practice when used in conjunction with a classification system. Three separate evaluation procedures have therefore been applied to the features listed in Table 4. The first two forms of evaluation are presented in the section that follows, whilst classification performance is described in Section 6.3, to allow for a comparison between standard classifiers and our stream algorithm described in Section 6. As features in themselves are without meaning unless obtained from data, we first describe the data sets used during our analysis, before presenting details of the evaluation.
5.1.1 Data
Three separate data sets were used to test the discriminating capabilities of our features. These are summarized in Table 5. The first data set (HTRU 1) was produced by Morello et al. (2014). It is the first labelled8 candidate data set made publicly available. It consists of 1196 pulsar and 89 995 non-pulsar candidates, in pulsar hunter xml files (.phcx files). These candidates were generated from a re-processing of HTRU Medium Latitude data, using the GPU-based search pipeline peasoup (Barr et al., in preparation). The pipeline searched for pulsar signals with DMs from 0 to 400 cm−3 pc, and also performed an acceleration search between −50 and +50 m s−2. The HTRU 1 candidate sample possesses varied spin periods, duty cycles, and S/Ns.
Data set | Examples | Non-pulsars | Pulsars |
---|---|---|---|
HTRU 1 | 91 192 | 89 995 | 1196 |
HTRU 2 | 17 898 | 16 259 | 1639 |
LOTAAS 1 | 5053 | 4987 | 66 |
In addition, two further data sets were used during this work. The first (HTRU 2) is made available for analysis.9 It comprises 1639 pulsar and 16 259 non-pulsar candidates. These were obtained during an analysis of HTRU Medium Latitude data by Thornton (2013), using a search pipeline that searched DMs between 0 and 2000 cm−3 pc. The pipeline produced over 11 million candidates in total. Of these, 1610 pulsar and 2592 non-pulsar candidates were manually labelled by Bates et al. (2012) and Thornton (2013). These were combined with an additional 13 696 candidates, sampled uniformly from the same data set according to observational session and month. These additional candidates were manually inspected and assigned their correct labels. Together, the two sets of labelled candidates form HTRU 2. It contains 725 of the 1108 known pulsars in the survey region (Levin 2012), along with re-detections and harmonics. HTRU 2 also contains noise, along with strong and weak forms of RFI. The third and final candidate data set (LOTAAS 1) was obtained during the LOTAAS survey (Lofar Working Group 2013; Cooper 2014) and is currently private. It consists of 66 pulsar and 4987 non-pulsar candidates. Feature data were extracted from these data sets using a new custom-written python tool, the pulsar feature lab. This tool is made available for use.10
5.1.2 General separability
The discriminating capabilities of the new features when applied to HTRU 1, are summarized in Fig. 6 via standard box and whisker plots. For each feature there are two distinct box plots. A coloured box plot representing the feature distribution of known pulsars, and a plain black box plot showing the feature distribution of non-pulsars. As the features have numerical ranges which differ significantly, feature data were scaled to within the range [0, 1] prior to plotting. This enables a separability comparison on the same scale. For each individual feature, the median value of the negative distribution was also subtracted. Thus the plots are centred around the non-pulsar median, allowing differences between pulsar and non-pulsar distributions to be seen more clearly.
Box plots (median and IQR) showing the linear separability of our new features. Feature data were extracted from 90 000 labelled pulsar candidates produced by Morello et al. (2014), via the pulsar feature lab. There are two box plots per feature. The coloured boxes describe the feature distribution for known pulsars, where corresponding coloured dots represent extreme outliers. Those box plots in black describe the RFI/noise distribution. Note that the data of each feature was scaled to the interval [0, 1], before the median of the RFI/noise distribution was subtracted to centre the non-pulsar plots on zero.
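For each feature, the scaling used for Fig. 6 can be sketched as a min-max rescale to [0, 1] followed by subtraction of the non-pulsar median. This is a minimal illustration with invented data and a hypothetical function name. It is not the plotting code used for the figure.

```python
from statistics import median
from typing import List, Sequence


def scale_and_centre(values: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Min-max scale one feature to [0, 1], then subtract the non-pulsar median.

    Labels use the text's convention: -1 = non-pulsar, +1 = pulsar.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("feature is constant; cannot scale to [0, 1]")
    scaled = [(v - lo) / (hi - lo) for v in values]
    negatives = [s for s, y in zip(scaled, labels) if y == -1]
    centre = median(negatives)  # median of the RFI/noise distribution
    return [s - centre for s in scaled]


# Invented feature values for three non-pulsars and one pulsar.
print(scale_and_centre([0.0, 2.0, 4.0, 10.0], [-1, -1, -1, 1]))
```

After this transform the non-pulsar distribution of every feature is centred on zero. Any offset of the pulsar box from zero can then be compared across features on a common scale.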
The visualization shows a reasonable amount of separation between the pulsar and non-pulsar feature distributions. This is initial evidence for the usefulness of these features11, but only on a visual level. We therefore applied a two-tailed Student's t-test to the feature data, to determine whether the means of the pulsar and non-pulsar distributions were significantly different. A rejection of the null hypothesis (no significant difference) would provide statistical evidence for the separability indicated in the box plots. For all data sets, there was a statistically significant difference between the pulsar and non-pulsar distributions at α = 0.01. A non-parametric Wilcoxon signed-rank test (Wilcoxon 1945) was also undertaken, with no difference in results. This suggested that the features were worthy of further, more rigorous investigation. The next step involved determining the extent of any linear correlation between the features and the target class variable.
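For reference, the pooled-variance Student's t statistic can be computed with the Python standard library alone. The sketch below is a stand-in that rests on one assumption: its two-tailed p-value uses a normal approximation to the t distribution, which is adequate only for large samples. `scipy.stats.ttest_ind` provides the exact test. The sample values are invented and kept small for brevity.

```python
from math import sqrt
from statistics import NormalDist, mean, variance
from typing import Sequence, Tuple


def students_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sample Student's t statistic (pooled variance) and an approximate p-value.

    The two-tailed p-value uses a normal approximation to the t distribution,
    which is only reasonable when both samples are large.
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two values")
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


pulsar_feature = [6.1, 7.4, 5.9, 8.2, 6.8]       # invented values
non_pulsar_feature = [1.2, 0.8, 1.5, 0.9, 1.1]   # invented values
t_stat, p_value = students_t_test(pulsar_feature, non_pulsar_feature)
print(f"t = {t_stat:.2f}, p ~ {p_value:.2g}, reject H0 at alpha=0.01: {p_value < 0.01}")
```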
5.1.3 Correlation tests
The point–biserial correlation coefficient for each feature on the three test data sets.
Feature | HTRU 1 | HTRU 2 | LOTAAS 1 | Avg. rpb |
---|---|---|---|---|
Prof.μ | −0.310 | −0.673 | −0.508 | −0.512 |
Prof.σ | −0.084 | −0.364 | −0.337 | −0.266 |
Prof.k | 0.545 | 0.792 | 0.774 | 0.719 |
Prof.s | 0.601 | 0.710 | 0.762 | 0.697 |
DMμ | −0.174 | 0.401 | 0.275 | 0.175 |
DMσ | 0.059 | 0.492 | 0.282 | 0.287 |
DMk | 0.178 | −0.391 | 0.426 | 0.074 |
DMs | 0.190 | −0.230 | −0.211 | −0.096 |
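The point–biserial coefficient is Pearson's correlation between a continuous feature and a binary label: r_pb = (M+ − M−)/s_n × sqrt(pq). Here M± are the class means, s_n is the population standard deviation of the feature, and p and q are the class proportions. The text does not state how the Avg. column was formed. For seven of the eight rows, it matches averaging in Fisher z-space (tanh of the mean of atanh r) rather than a plain arithmetic mean; this is our inference. The sketch below implements both operations under that assumption, with hypothetical function names.

```python
from math import atanh, sqrt, tanh
from statistics import mean, pstdev
from typing import Sequence


def point_biserial(values: Sequence[float], labels: Sequence[int]) -> float:
    """Point-biserial correlation between a feature and binary labels (-1/+1).

    Equivalent to Pearson's r computed against the label column.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    n = len(values)
    p, q = len(pos) / n, len(neg) / n
    # pstdev is the population (1/n) standard deviation of the whole feature.
    return (mean(pos) - mean(neg)) / pstdev(values) * sqrt(p * q)


def fisher_z_average(rs: Sequence[float]) -> float:
    """Average correlations in Fisher z-space, then transform back."""
    return tanh(sum(atanh(r) for r in rs) / len(rs))


print(round(fisher_z_average([0.545, 0.792, 0.774]), 3))  # Prof.k row: 0.719
```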
5.1.4 Information theoretic analysis
Information theory uses the standard rules of probability to learn more about features and their interactions. Features which at first appear information-poor may, when combined with one or more other features, impart new and meaningful knowledge (Guyon & Elisseeff 2003). Applying this theory to candidate features enables their comparison, evaluation, and selection within an established framework for the first time.
Information theory describes each feature Xj in terms of entropy. Entropy is a fundamental measure of information, borrowed from thermodynamics by Shannon & Weaver (1949), that quantifies the uncertainty present in the distribution of Xj.
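For completeness, the standard definitions assumed here are given below, with X^j a discretized feature and Y the class label. We assume base-2 logarithms (bits). This is consistent with Table 7, where some entropies (e.g. 2.354) exceed ln 10 ≈ 2.303, the maximum possible in nats for the 10-bin discretization described below.

```latex
H(X^{\rm j}) = -\sum_{x \in X^{\rm j}} P(x)\,\log_2 P(x)

H(X^{\rm j} \mid Y) = -\sum_{y \in Y} P(y) \sum_{x \in X^{\rm j}} P(x \mid y)\,\log_2 P(x \mid y)

I(X^{\rm j}; Y) = H(X^{\rm j}) - H(X^{\rm j} \mid Y)
```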
The mutual information (MI) metric helps identify relevant features by ranking them according to how much each reduces uncertainty about the class label. It is one of the most common filter methods (Kohavi & John 1997; Guyon & Elisseeff 2003; Brown et al. 2012) used for feature selection (Brown 2009). The entropy and MI of our features are listed in Table 7, ranked according to their mean MI content, where higher MI is desirable. To produce this table, feature data were discretized, for reasons set out by Guyon & Elisseeff (2003), enabling use with the information-theoretic feast14 and mitoolbox15 toolkits developed by Brown et al. (2012). The data were discretized into 10 equal-width bins using the filters within the weka data mining tool.16 Simple binning was chosen ahead of more advanced discretization procedures based on minimum description length (Fayyad & Irani 1993), to simplify feature comparisons.
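The discretize-then-measure procedure can be sketched without weka or the feast and mitoolbox toolkits. The code below bins a feature into 10 equal-width bins over its observed range, then estimates entropy and MI from counts. It rests on stated assumptions: plug-in probability estimates, and the maximum value clamped into the top bin. Edge handling may differ from weka's filter, and the function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Map each value to one of n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), equivalent to H(X) - H(X|Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


feature = [0.1, 0.4, 0.35, 0.9, 0.95, 0.8]   # invented feature values
labels = [-1, -1, -1, 1, 1, 1]
binned = equal_width_bins(feature)
print(binned, round(entropy(binned), 3), round(mutual_information(binned, labels), 3))
```

Ranking features by `mutual_information(binned, labels)` reproduces the kind of ordering shown in Table 7. Each feature is scored independently of the others.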
The entropy H(Xj), and mutual information I(Xj; Y) of each feature. Features are ranked according to their mutual information content with respect to the class label Y. Higher mutual information is desirable.
Feature | HTRU 1 H(Xj) | HTRU 1 I(Xj; Y) | HTRU 2 H(Xj) | HTRU 2 I(Xj; Y) | LOTAAS 1 H(Xj) | LOTAAS 1 I(Xj; Y) | Avg. H(Xj) | Avg. I(Xj; Y) |
---|---|---|---|---|---|---|---|---|
Prof.k | 1.062 | 0.073 | 1.549 | 0.311 | 0.948 | 0.088 | 1.186 | 0.157 |
Prof.μ | 1.993 | 0.065 | 2.338 | 0.269 | 1.986 | 0.085 | 2.106 | 0.139 |
Prof.s | 0.545 | 0.063 | 0.523 | 0.245 | 0.114 | 0.074 | 0.394 | 0.127 |
DMk | 1.293 | 0.021 | 2.295 | 0.146 | 1.842 | 0.083 | 1.810 | 0.083 |
Prof.σ | 2.011 | 0.007 | 1.972 | 0.115 | 2.354 | 0.061 | 2.112 | 0.061 |
DMσ | 2.231 | 0.004 | 2.205 | 0.171 | 0.013 | 0.006 | 1.483 | 0.060 |
DMμ | 1.950 | 0.028 | 0.835 | 0.114 | 0.015 | 0.008 | 0.933 | 0.050 |
DMs | 0.138 | 0.013 | 1.320 | 0.041 | 2.243 | 0.045 | 1.233 | 0.033 |
The JMI rank of each feature. Features are ordered by their average JMI rank across the three test data sets; a lower rank is better.
Feature | HTRU 1 | HTRU 2 | LOTAAS 1 | Avg. rank |
---|---|---|---|---|
Prof.k | 1 | 1 | 1 | 1 |
Prof.μ | 3 | 3 | 3 | 3 |
DMσ | 2 | 2 | 8 | 4 |
Prof.s | 4 | 4 | 6 | 4.7 |
DMk | 6 | 6 | 2 | 4.7 |
Prof.σ | 7 | 5 | 5 | 5.7 |
DMμ | 5 | 7 | 7 | 6.3 |
DMs | 8 | 8 | 4 | 6.7 |
6 STREAM CLASSIFICATION
Data streams are quasi-infinite sequences of information, which are temporally ordered and indeterminable in size (Gaber, Zaslavsky & Krishnaswamy 2005; Lyon et al. 2013, 2014). Data streams are produced by many modern computer systems (Gaber et al. 2005), and are likely to arise from the increasing volumes of data output by modern radio telescopes, especially the SKA. However, many of the effective supervised ML techniques used for candidate selection do not work with streams (Lyon et al. 2014). Adapting existing methods for use with streams is challenging, and remains an active goal of data mining research (Yang & Wu 2006; Gaber, Zaslavsky & Krishnaswamy 2007). Until that goal is realized, new stream-ready selection approaches are required.
6.1 Unsuitability of existing approaches
Supervised ML methods induce classification models from labelled training sets (Mitchell 1997; Bishop 2006). Provided these sets are large, contain representative examples of both the rare and the majority class, and are independent and identically distributed (i.i.d.) with respect to the data being classified (Bishop 2006), good classification performance can be expected. However, the notion of a training set does not exist within a data stream. Instead, there are two general processing models used for learning.
Batch processing model: at time step i, a batch |$b_i$| of n unlabelled instances arrives and is classified using some model trained on batches |$b_1$| to |$b_{i-1}$|. At time i + 1, labels arrive for batch |$b_i$|, along with a new batch of unlabelled instances |$b_{i+1}$| to be classified.
Incremental processing model: a single data instance |$X_i$| arrives at time step i and is classified using some model trained on instances |$X_1$| to |$X_{i-1}$|. At time i + 1, a label arrives for |$X_i$|, along with a new unlabelled instance |$X_{i+1}$| to be classified.
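The two processing models differ only in how many instances are classified before their labels are revealed. The following Python sketch makes this explicit with a single classify-then-learn loop. Setting `batch_size=1` gives the incremental model and `batch_size=n` gives the batch model. The `NearestMeanLearner` class and its `predict`/`learn` interface are hypothetical illustrations. They are not part of the GH-VFDT or of any library.

```python
from typing import Dict, Iterable, List, Tuple


class NearestMeanLearner:
    """Hypothetical toy online learner: one running mean of a single feature per class.

    It predicts the class whose mean is closest to x. It only illustrates the
    predict/learn interface assumed below and is not the GH-VFDT.
    """

    def __init__(self) -> None:
        self.sums: Dict[int, float] = {}
        self.counts: Dict[int, int] = {}

    def predict(self, x: float) -> int:
        if not self.counts:
            return 0  # no labels seen yet: default to the 'non-pulsar' class (0)
        return min(self.counts, key=lambda k: abs(x - self.sums[k] / self.counts[k]))

    def learn(self, x: float, y: int) -> None:
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1


def process_stream(stream: Iterable[Tuple[float, int]],
                   learner: NearestMeanLearner,
                   batch_size: int = 1) -> List[int]:
    """Classify-then-learn loop over (x, y) pairs.

    batch_size=1 reproduces the incremental model; batch_size=n the batch model.
    A label y is only passed to the learner after its instance has been classified.
    """
    predictions: List[int] = []
    batch: List[Tuple[float, int]] = []

    def flush() -> None:
        # Time step i: classify the whole batch with the model trained on earlier data.
        predictions.extend([learner.predict(x) for x, _ in batch])
        # Time step i + 1: the batch's labels arrive and the model is updated.
        for x, y in batch:
            learner.learn(x, y)
        batch.clear()

    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            flush()
    if batch:  # a final, partially filled batch is handled the same way
        flush()
    return predictions


stream = [(0.1, 0), (5.0, 1), (0.2, 0), (4.8, 1), (0.0, 0), (5.1, 1)]
print(process_stream(stream, NearestMeanLearner(), batch_size=1))  # [0, 0, 0, 1, 0, 1]
print(process_stream(stream, NearestMeanLearner(), batch_size=6))  # [0, 0, 0, 0, 0, 0]
```

Each prediction is made before its label is used for learning. As a result, the same loop also yields an honest "test-then-train" estimate of performance on the stream. With one large batch, nothing is learned until the whole batch has been classified.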
In both models, learning proceeds continually as labelled data become available. This allows for adaptive learning. Standard supervised classifiers simply cannot be trained in this way. Even if they could, the CPU and memory costs of their training phases would make them impractical for streams (Gaber 2012). This was recognized by Zhu et al. (2014) with respect to their PICS system.17
Given these problems, how should candidate selection be addressed in streams? One may consider training an existing supervised candidate classifier offline, and then applying it to a candidate stream. This is a plausible approach, provided the classifier processes each example before the next one arrives. For this to be viable, the classifier must also be trained with data that is i.i.d. with respect to the data in the stream. However, data streams are known to exhibit distributional shifts over varying time periods. For example, a changing RFI environment can exhibit shifts over short (minutes/hours) and/or long (days/weeks/years) time-scales. In either case the shifts violate the i.i.d. assumption, a phenomenon known as ‘concept drift’ (Widmer & Kubat 1996; Gaber et al. 2005). To mitigate the impact of drift, adaptive algorithms able to learn from distributional changes are required, as pre-existing training data no longer characterizes the post-drift data distribution (Lyon 2015). Such algorithms must be capable of efficiently and completely reconstructing their internal learning models after each significant distributional shift. Standard supervised learning models are ‘static’, i.e. they remain unchanged once learned. A static classifier applied to streaming data subject to drift will exhibit a significant deterioration in classification performance over time (Aggarwal et al. 2004). This makes standard supervised learning unsuitable for data streams. In the next section we describe our new ‘intelligent’ data stream classifier, which overcomes these deficiencies.
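The sketch below shows why a static classifier deteriorates under concept drift. It simulates a deterministic one-dimensional stream in which the non-pulsar (noise) class shifts part-way through, loosely mimicking a change in the RFI environment. All numbers are invented.

- **Static model:** a nearest-class-mean classifier fitted once to pre-drift data and then frozen.
- **Adaptive model:** the same classifier with exponentially weighted class means, so old data is gradually forgotten.

This forgetting scheme is a simple stand-in chosen for brevity. It is not the mechanism used by the GH-VFDT.

```python
from typing import Dict, List, Tuple

N_BEFORE, N_AFTER = 200, 200  # hypothetical number of candidates before/after the drift
ALPHA = 0.1                   # hypothetical forgetting rate of the adaptive model


def make_stream() -> List[Tuple[float, int]]:
    """Deterministic toy stream of (feature, label) pairs; 1 = pulsar, 0 = noise.

    Every tenth candidate is a pulsar with a feature value near 4. Noise sits
    near 0 until N_BEFORE, then jumps to near 8 (an invented drift).
    """
    stream = []
    for t in range(N_BEFORE + N_AFTER):
        jitter = ((t * 7) % 5 - 2) * 0.1  # deterministic wobble in [-0.2, 0.2]
        if t % 10 == 0:
            stream.append((4.0 + jitter, 1))
        else:
            centre = 0.0 if t < N_BEFORE else 8.0
            stream.append((centre + jitter, 0))
    return stream


def nearest(means: Dict[int, float], x: float) -> int:
    """Nearest-class-mean decision rule."""
    return min(means, key=lambda k: abs(x - means[k]))


stream = make_stream()
before, after = stream[:N_BEFORE], stream[N_BEFORE:]

# Static model: class means fitted once, offline, on pre-drift data, then frozen.
static_means = {}
for k in (0, 1):
    xs = [x for x, y in before if y == k]
    static_means[k] = sum(xs) / len(xs)
static_acc = sum(nearest(static_means, x) == y for x, y in after) / N_AFTER

# Adaptive model: exponentially weighted class means, updated test-then-train.
adaptive_means: Dict[int, float] = {}
adaptive_hits = 0
for t, (x, y) in enumerate(stream):
    pred = nearest(adaptive_means, x) if adaptive_means else 0
    if t >= N_BEFORE:
        adaptive_hits += int(pred == y)
    if y in adaptive_means:
        adaptive_means[y] += ALPHA * (x - adaptive_means[y])  # old data decays geometrically
    else:
        adaptive_means[y] = x
adaptive_acc = adaptive_hits / N_AFTER

print(f"post-drift accuracy: static={static_acc:.2f}, adaptive={adaptive_acc:.2f}")
```

After the drift, the frozen model labels essentially every noise candidate as a pulsar. The adaptive model, by contrast, recovers after a handful of errors.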
6.2 Gaussian Hellinger Very Fast Decision Tree
The Gaussian Hellinger Very Fast Decision Tree (GH-VFDT) is an incremental stream classifier developed specifically for the candidate selection problem (Lyon et al. 2014). It is a tree-based algorithm derived from the Very Fast Decision Tree (VFDT) developed by Hulten, Spencer & Domingos (2001). It is designed to maximize classification performance on candidate data streams, which are heavily imbalanced in favour of the non-pulsar class. It is the first candidate selection algorithm designed to mitigate the imbalanced learning problem (He & Garcia 2009; Lyon et al. 2013, 2014), which is known to degrade classification performance, particularly on the minority class, when one class of examples (i.e. non-pulsar) dominates the other. The algorithm uses tree learning (Mitchell 1997) to achieve this: the data are partitioned using feature split point tests (see Figs 7 and 8) that aim to maximize the separation of pulsar and non-pulsar candidates. This involves first choosing the variable that acts as the best class separator, and then finding a numerical threshold ‘test point’ for that variable that maximizes class separability.
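The sketch below illustrates the second step above, choosing a numerical threshold for a feature.

- **Binning:** the description of Algorithm 1 states that the data are discretized into 10 equal-width bins and a binary split point is chosen.
- **Scoring (assumed):** each candidate threshold is scored by the Hellinger distance between the class-conditional distributions of the two branches, as in Hellinger distance decision trees. This rule may differ from the GH-VFDT's internals.
- **Routing:** values below the threshold go to the left branch, as in Fig. 8.

```python
import math
from typing import List, Sequence, Tuple


def equal_width_thresholds(values: Sequence[float], n_bins: int = 10) -> List[float]:
    """Interior edges of n_bins equal-width bins spanning the observed range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * b for b in range(1, n_bins)]


def hellinger_split_score(values: Sequence[float], labels: Sequence[int],
                          threshold: float) -> float:
    """Hellinger distance between the pulsar (1) and non-pulsar (0) branch
    distributions of a binary split; values < threshold go left (assumed rule)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # with only one class present, no split can separate classes
    left_pos = sum(1 for v, y in zip(values, labels) if v < threshold and y == 1)
    left_neg = sum(1 for v, y in zip(values, labels) if v < threshold and y == 0)
    total = 0.0
    for pos, neg in ((left_pos, left_neg), (n_pos - left_pos, n_neg - left_neg)):
        total += (math.sqrt(pos / n_pos) - math.sqrt(neg / n_neg)) ** 2
    return math.sqrt(total)


def best_binary_split(values: Sequence[float], labels: Sequence[int],
                      n_bins: int = 10) -> Tuple[float, float]:
    """Return (threshold, score) for the best of the n_bins - 1 candidate thresholds."""
    candidates = equal_width_thresholds(values, n_bins)
    return max(((t, hellinger_split_score(values, labels, t)) for t in candidates),
               key=lambda ts: ts[1])


values = [0, 1, 2, 3, 4, 5, 8, 9, 10]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1]
print(best_binary_split(values, labels))  # (6.0, 1.4142135623730951)
```

A perfect split reaches the maximum score of √2. Ties are resolved in favour of the first (lowest) threshold.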

An overview of how a streaming decision tree partitions the data space to derive a classification. Each candidate is passed down the tree, and tested at each node it reaches including the root. Each node test outcome determines which branch the candidate continues down, until it reaches a leaf at the bottom of the tree. The tree shown here assigns the class labels A, B, and C to examples reaching the leaf nodes.

An overview of how a decision tree partitions the data space using binary split point ‘tests’ at each node. The best feature variable at each node is first determined, then an optimal numerical split point threshold chosen. Candidates with feature values below the threshold are passed down the left-hand branch of the tree, and possibly subjected to further split tests. Similarly for candidates with feature values above the threshold, except these are passed down the right-hand branch. Eventually candidates reach the leaf nodes, where they are assigned class labels.
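The routing described in the two figure captions above can be written as a short loop. In this sketch the `Node` class, the feature indices and the thresholds are hypothetical. The tree only mirrors the shape of the three-leaf example with labels A, B and C.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A node in a binary decision tree; leaves carry a class label."""
    feature: int = -1                # index of the feature tested here (unused at leaves)
    threshold: float = 0.0           # split point for that feature
    left: Optional["Node"] = None    # branch taken when the value is below the threshold
    right: Optional["Node"] = None   # branch taken otherwise
    label: Optional[str] = None      # class label if this node is a leaf


def classify(node: Node, x: Sequence[float]) -> str:
    """Pass a candidate's feature vector down the tree until it reaches a leaf."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label


# Hypothetical three-leaf tree: features and thresholds are made up.
tree = Node(feature=0, threshold=0.5,
            left=Node(label="A"),
            right=Node(feature=1, threshold=2.0,
                       left=Node(label="B"),
                       right=Node(label="C")))

for candidate in ([0.2, 9.9], [0.7, 1.0], [0.7, 3.0]):
    print(candidate, "->", classify(tree, candidate))
```

Classifying a candidate costs one comparison per level of the tree. This is one reason tree learners suit high-rate streams.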

A complete outline of the GH-VFDT is given in Algorithm 1. On line 7, the tree statistics used to compute the Hellinger distance are updated; in particular, the running mean and standard deviation maintained at each leaf for feature j and class k. The call to |$\mathit{getBest}(\mathit{dist}, X_{i}^{j})$| returns the best and second-best features found at a leaf, chosen as those that maximize the Hellinger distance via an iterative process. On line 18, tree split points are first generated and evaluated. Here the data are discretized using 10 equal-width bins, and a binary split point is chosen.
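The following is a minimal sketch of the per-leaf bookkeeping described above. It assumes two classes and Gaussian class-conditional feature distributions.

- **Running statistics:** means and standard deviations are maintained with Welford's online algorithm.
- **Feature scores:** each feature is scored with the closed-form Hellinger distance between the two class Gaussians.
- **Best features:** `get_best` returns the top two features.
- **Split test:** the Hoeffding-bound test comes from the original VFDT, which compares the best and second-best scores. Whether the GH-VFDT applies it with range R = 1 and the confidence value used here is an assumption.

All class and function names are illustrative and are not taken from the published implementation.

```python
import math
from typing import Sequence, Tuple


class RunningStats:
    """Welford's online mean/variance for one (feature, class) pair at a leaf."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def gaussian_hellinger(mu1: float, s1: float, mu2: float, s2: float) -> float:
    """Closed-form Hellinger distance (in [0, 1]) between N(mu1, s1^2) and N(mu2, s2^2)."""
    var_sum = s1 * s1 + s2 * s2
    if var_sum == 0.0:
        return 0.0 if mu1 == mu2 else 1.0
    h2 = 1.0 - math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((mu1 - mu2) ** 2) / (4.0 * var_sum))
    return math.sqrt(max(h2, 0.0))  # clamp tiny negative rounding errors


class Leaf:
    """Per-leaf statistics for a two-class (0 = non-pulsar, 1 = pulsar) problem."""

    def __init__(self, n_features: int) -> None:
        self.stats = [(RunningStats(), RunningStats()) for _ in range(n_features)]

    def update(self, x: Sequence[float], y: int) -> None:
        for j, value in enumerate(x):
            self.stats[j][y].update(value)

    def get_best(self) -> Tuple[Tuple[float, int], Tuple[float, int]]:
        """Return (distance, feature index) for the best and second-best features."""
        scored = [(gaussian_hellinger(neg.mean, neg.std, pos.mean, pos.std), j)
                  for j, (neg, pos) in enumerate(self.stats)]
        scored.sort(reverse=True)
        return scored[0], scored[1]


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """VFDT's Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


leaf = Leaf(n_features=3)
data = [([0.0, 1.0, 0.0], 0), ([0.2, 1.2, 0.5], 0), ([-0.2, 0.8, 0.2], 0),
        ([5.0, 1.0, 1.5], 1), ([5.2, 1.2, 2.0], 1), ([4.8, 0.8, 1.0], 1)]
for x, y in data:
    leaf.update(x, y)
best, second = leaf.get_best()
gap = best[0] - second[0]
# Split only once the observed gap exceeds the bound for the examples seen so far.
print(best[1], second[1], gap > hoeffding_bound(1.0, 1e-7, n=len(data)))  # 0 2 False
```

In this toy leaf, feature 0 separates the classes almost perfectly, but six examples are too few for the bound. The same gap would justify a split after about 10 000 examples.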
This approach has already been shown to significantly improve recall rates for pulsar data, above the levels achieved by established stream classifiers. When applied to a data stream containing 10 000 non-pulsar candidates for every legitimate pulsar (HTRU data obtained by Thornton (2013)), it raised the recall rate from 30 to 86 per cent (Lyon et al. 2014). This was achieved using candidate data described using the features designed by Bates et al. (2012) and Thornton (2013). A full implementation of the algorithm can be found online for public use.18
6.3 Classification performance
Existing features and algorithms have been evaluated predominantly in terms of classification accuracy. Such an analysis treats candidate selection as a binary classification problem, whereby candidates arising from pulsars are considered positive (+), and those from non-pulsars negative (−). There are then four possible outcomes for an individual classification decision. These outcomes are summarized in Table 9 and are evaluated using standard metrics such as those outlined in Table 10. The goal of classification is to minimize the number of false positives whilst maximizing the number of true positives. Features in this domain are most often chosen according to how well they maximize classifier recall (the fraction of legitimate pulsar candidates correctly classified) and specificity (the fraction of non-pulsar candidates correctly classified).19 Classifiers with high recall and specificity exhibit high accuracy, which is often interpreted to mean that the underlying features are good discriminators.
Standard evaluation metrics for classifier performance. True Positives (TP) are those candidates correctly classified as pulsars. True Negatives (TN) are those correctly classified as not pulsars. False Positives (FP) are those incorrectly classified as pulsars; False Negatives (FN) are those incorrectly classified as not pulsars. All metrics produce values in the range [0, 1].
Statistic | Description | Definition |
---|---|---|
Accuracy | Measure of overall classification accuracy. | |$\frac{{\rm TP}+{\rm TN}}{{\rm TP}+{\rm FP}+{\rm FN}+{\rm TN}}$| |
False positive rate (FPR) | Fraction of negative instances incorrectly labelled positive. | |$\frac{{\rm FP}}{{\rm FP}+{\rm TN}}$| |
G-Mean | Imbalanced data metric: the geometric mean of positive accuracy (recall) and negative accuracy (specificity). | |$\sqrt{\frac{{\rm TP}}{{\rm TP}+{\rm FN}}\times \frac{{\rm TN}}{{\rm TN}+{\rm FP}}}$| |
Precision | Fraction of retrieved instances that are positive. | |$\frac{{\rm TP}}{{\rm TP}+{\rm FP}}$| |
Recall | Fraction of positive instances that are retrieved. | |$\frac{{\rm TP}}{{\rm TP}+{\rm FN}}$| |
F-Score | Harmonic mean of precision and recall. | |$2\times \frac{{\rm precision}\times {\rm recall}}{{\rm precision}+{\rm recall}}$| |
Specificity | Fraction of negatives correctly identified as such. | |$\frac{{\rm TN}}{{\rm FP}+{\rm TN}}$| |
Actual \ Predicted | − | + |
---|---|---|
− | True negative (TN) | False positive (FP) |
+ | False negative (FN) | True positive (TP) |
This form of evaluation enables approaches to be tested quickly, with readily interpretable results. However, using classifier performance as a proxy for feature separability tests the classification system as much as the features under investigation (Brown et al. 2012). The choice of classifier can influence the outcome of the evaluation, giving misleading results. Evaluation metrics themselves can also be misleading. Pulsar data sets are imbalanced with respect to the total number of pulsar and non-pulsar candidates within them (Lyon et al. 2013, 2014). Thus for data sets consisting almost entirely of non-pulsar examples, high accuracy can often be achieved simply by classifying every candidate as non-pulsar, even though no pulsar is recovered. In these situations accuracy is an unhelpful metric.
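The accuracy pitfall can be seen with a few lines of arithmetic. The sketch below computes the metrics from the standard evaluation table for two hypothetical classifiers. The invented test set has 10 000 non-pulsars per pulsar, the imbalance quoted above for the HTRU stream.

- **Trivial classifier:** labels every candidate as non-pulsar.
- **Useful classifier:** recovers 9 of 10 pulsars at the cost of 1 000 false positives.

Reporting undefined ratios (zero denominators) as 0 is our own convention.

```python
import math
from typing import Dict


def metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard evaluation metrics from confusion-matrix counts.
    Zero denominators are reported as 0.0 (an assumed convention)."""
    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "fpr": ratio(fp, fp + tn),
        "g_mean": math.sqrt(recall * specificity),  # geometric mean of per-class accuracies
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "specificity": specificity,
    }


# Hypothetical test set: 10 pulsars and 100 000 non-pulsars.
always_negative = metrics(tp=0, fp=0, fn=10, tn=100_000)
useful = metrics(tp=9, fp=1_000, fn=1, tn=99_000)

for name, m in (("always non-pulsar", always_negative), ("useful", useful)):
    print(f"{name:>17}: accuracy={m['accuracy']:.4f} "
          f"recall={m['recall']:.2f} g_mean={m['g_mean']:.3f}")
```

The trivial classifier has the higher accuracy (about 0.9999 against about 0.990), but zero recall and zero G-mean. The useful classifier, by contrast, has a G-mean of about 0.94.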
To overcome these possible sources of inaccuracy when evaluating the GH-VFDT, we make use of the G-mean metric (He & Garcia 2009). This is the geometric mean of positive and negative accuracy (recall and specificity). Because each of these rates is computed within a single class, the measure is insensitive to the distribution of pulsar and non-pulsar examples in test data sets. Additionally, we employ multiple classifiers in our evaluation which differ greatly in terms of their internal learning models. This reveals a more general view of feature performance in practice. It is also useful for evaluating the performance of the GH-VFDT with respect to standard static supervised classifiers, which are at an advantage in such tests. Here we make use of four standard classifiers found in the WEKA tool: the decision tree algorithm C4.5 (Quinlan 1993), the multilayer perceptron (MLP) neural network (Haykin 1999), the simple probabilistic Naïve Bayes classifier (NB; Bishop 2006), and the standard linear soft-margin support vector machine (SVM; Cortes & Vapnik 1995).
6.3.1 GH-VFDT classification evaluation procedure
Feature data were extracted from the data sets listed in Table 5, and then independently sampled 500 times. Each sample was split into test and training sets. For HTRU 1 and 2, sampled training sets consisted of 200 positive and 200 negative examples, with the remaining examples making up the test sets. LOTAAS 1 training sets contained 33 positive and 200 negative examples, with the remaining examples similarly making up the test sets. Each classifier (five in total) was then trained on, and made to classify, each independent sample; there were therefore 3 × 500 × 5 = 7500 tests in total. The performance of each algorithm per data set was then averaged to summarize overall performance. To evaluate classifier performance results, one-factor analysis of variance tests were performed, where the algorithm used was the factor. Tukey's Honestly Significant Difference test (Tukey 1949) was then applied to determine whether differences in results were statistically significant at α = 0.01. The full results are shown in Table 11.
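The resampling protocol can be sketched as below. It only shows how balanced training sets are drawn without replacement and how the remaining examples form the test set. Several details are assumptions or are left out:

- The size of the label vector is made up for illustration.
- The seed is arbitrary.
- Training the five classifiers, four of which came from WEKA in the original study, is not reproduced.
- The ANOVA and Tukey tests are not reproduced either.

```python
import random
from typing import List, Sequence, Tuple


def sample_split(labels: Sequence[int], n_pos: int, n_neg: int,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_pos positive and n_neg negative indices for training; the rest form the test set."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    train = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)  # sampling without replacement
    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return train, test


def resample(labels: Sequence[int], n_pos: int, n_neg: int,
             repeats: int = 500, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Independent train/test splits, one per repetition, from a seeded generator."""
    rng = random.Random(seed)
    return [sample_split(labels, n_pos, n_neg, rng) for _ in range(repeats)]


# Hypothetical label vector (counts made up for illustration): 66 pulsars, 5 000 non-pulsars.
labels = [1] * 66 + [0] * 5_000
splits = resample(labels, n_pos=33, n_neg=200)  # LOTAAS 1-style training set sizes
train, test = splits[0]
print(len(splits), len(train), len(test))  # 500 233 4833

n_datasets, n_samples, n_classifiers = 3, 500, 5
print(n_datasets * n_samples * n_classifiers)  # 7500
```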
Results obtained on the three test data sets. Bold type indicates the best performance observed. Results with an asterisk indicate no statistically significant difference at the α = 0.01 level.
Data set | Algorithm | G-Mean | F-Score | Recall | Precision | Specificity | FPR | Accuracy |
---|---|---|---|---|---|---|---|---|
HTRU 1 | C4.5 | 0.962* | 0.839* | 0.961 | 0.748 | 0.962 | 0.038 | 0.962 |
HTRU 1 | MLP | 0.976 | 0.891 | 0.976 | 0.820 | 0.975 | 0.025* | 0.975 |
HTRU 1 | NB | 0.925 | 0.837* | 0.877 | 0.801 | 0.975 | 0.025* | 0.965 |
HTRU 1 | SVM | 0.967 | 0.922 | 0.947 | 0.898 | 0.988 | 0.012 | 0.984 |
HTRU 1 | GH-VFDT | 0.961* | 0.941 | 0.928 | 0.955 | 0.995 | 0.005 | 0.988 |
HTRU 2 | C4.5 | 0.926 | 0.740 | 0.904 | 0.635* | 0.949* | 0.051* | 0.946* |
HTRU 2 | MLP | 0.931 | 0.752 | 0.913 | 0.650* | 0.950* | 0.050* | 0.947* |
HTRU 2 | NB | 0.902 | 0.692 | 0.863 | 0.579 | 0.943 | 0.057 | 0.937 |
HTRU 2 | SVM | 0.919 | 0.789 | 0.871 | 0.723 | 0.969 | 0.031 | 0.961 |
HTRU 2 | GH-VFDT | 0.907 | 0.862 | 0.829 | 0.899 | 0.992 | 0.008 | 0.978 |
LOTAAS 1 | C4.5 | 0.969 | 0.623 | 0.948 | 0.494 | 0.991 | 0.009 | 0.990 |
LOTAAS 1 | MLP | 0.988 | 0.846* | 0.979 | 0.753 | 0.998 | 0.002 | 0.997* |
LOTAAS 1 | NB | 0.977 | 0.782 | 0.959 | 0.673 | 0.996 | 0.004 | 0.996 |
LOTAAS 1 | SVM | 0.949 | 0.932 | 0.901 | 0.966 | 0.999* | 0.001* | 0.999 |
LOTAAS 1 | GH-VFDT | 0.888 | 0.830* | 0.789 | 0.875 | 0.999* | 0.001* | 0.998* |
These results indicate that it is possible to achieve high levels of classifier performance using the features described in Section 5. What is more, the classification results are consistent across all three data sets. Recall rates on all three test data sets are high, with 98 per cent recall achieved by the MLP on HTRU 1 and LOTAAS 1 data. High levels of accuracy were observed throughout testing, and G-mean scores on HTRU 1 were particularly high. The algorithms also exhibited high levels of specificity and generally low false positive rates. The exception is the 6 per cent false positive rate exhibited by the NB classifier on HTRU 2 data. This outcome is unremarkable for NB, the simplest classifier tested, as the HTRU 2 data set is populated with noise and borderline candidates. Thus we suggest that these represent the first survey-independent features developed for the candidate selection problem.
The results also show that the GH-VFDT algorithm consistently matched or outperformed the static classifiers in terms of both specificity and false positive rate (on LOTAAS 1 it ties with the SVM). This is a highly desirable outcome for a stream classifier, since assigning positive labels too often will return an unmanageable number of candidates. The classifier does not achieve this result by always predicting ‘non-pulsar’: it is precise, achieving the best precision on two out of the three data sets. G-mean and recall rates were also high for the GH-VFDT, the latter reaching 92.8 per cent on HTRU 1 data. The recall rates are lower on the remaining two data sets. However, it is worth noting that these data sets are considerably smaller than HTRU 1. This is important, since the performance of the GH-VFDT (and of other stream algorithms) improves as more examples are observed. The lower levels of recall on HTRU 2 and LOTAAS 1 are therefore to be expected given the smaller data set sizes. In terms of the usefulness of this algorithm for SKA data streams, the GH-VFDT consistently has a false positive rate below 1 per cent. This greatly reduces the quantity of candidates to be analysed. The GH-VFDT also classifies candidates rapidly: it classified candidates at a rate of ∼70 000 per second using a single 2.2 GHz quad-core mobile CPU (Intel Core i7-2720QM processor) when applied to a larger sample of HTRU 2 data consisting of 11 million examples. The statistics of the pulsars incorrectly classified by the new methods will be discussed in a future paper.
7 SUMMARY
This paper has described the pulsar candidate selection process, and contextualized its almost 50-year history. During this time, candidate selection procedures have had to adapt continually to the demands of increased data capture rates and rising candidate numbers, which has proven difficult. We have contributed a new solution to these problems by demonstrating eight new features useful for separating pulsar and non-pulsar candidates, and by developing a candidate classification algorithm designed to meet the data processing challenges of the future. Together these enable a high fraction of legitimate pulsar candidates to be extracted from test data, with recall rates reaching almost 98 per cent. When applied to data streams, the combination of these features and our algorithm enables over 90 per cent of legitimate pulsar candidates to be recovered. The corresponding false positive return rate is less than half a per cent. Thus, together, these can be used to significantly reduce the problems associated with high candidate numbers that make pulsar discovery difficult, and go some way towards mitigating the selection problems posed by next-generation radio telescopes such as the SKA. The combination of these features and our classification algorithm has already proven useful, aiding in the discovery of 20 new pulsars in data collected during the LOTAAS (Cooper 2014). Details of these discoveries will be provided elsewhere, demonstrating the utility of our contributions in practice.
The features described in this paper are amongst the most rigorously tested in this domain. However, whilst we advocate their use on statistical grounds, we do not demonstrate their superiority to other features. Future work will consider how these compare to those used previously, and determine whether combining them with those already in use is worthwhile. Thus, for the time being, it is advisable to construct as large a set of features as possible, and to use the tools described herein to select feature sets statistically.
This work was supported by grant EP/I028099/1 from the UK Engineering and Physical Sciences Research Council (EPSRC). HTRU 2 data were obtained by the High Time Resolution Universe Collaboration using the Parkes Observatory, funded by the Commonwealth of Australia and managed by the CSIRO. LOFAR data were obtained with the help of the DRAGNET team, supported by ERC Starting Grant 337062 (PI Hessels). We would also like to thank Konstantinos Sechidis for some insightful discussions with respect to information theoretic feature selection, Dan Thornton for initially processing HTRU 2 data, and our reviewer for their helpful feedback.
A candidate obtained by folding a de-dispersed time series at a specific suspect period.
Empirically observed in HTRU survey data.
The hit rate of the recent southern HTRU medium latitude search was much lower, at around 0.01 per cent (Lyon 2015).
The data an algorithm ‘learns’ from must possess the same distribution as the data it will be applied to, otherwise its performance will be poor.
This is defined as the range of DMs around the DM that yields the highest spectral detection for a candidate. The limits of this range are defined by a change in the DM that corresponds to a time delay across the frequency band equivalent to the initial detection period of a candidate.
Containing correctly labelled pulsar and non-pulsar candidates.
Similar levels of separability were observed when the same plot was produced for both the HTRU 2 and LOTAAS 1 data sets.
Max entropy for a feature with n possible values is given by |$\log _2 n$|.
Also known as information gain, or a specific case of the Kullback–Leibler divergence (MacKay 2002).
Zhu et al. (2014) indicated efforts are under way to rectify this.
The approaches in Section 3.4 evaluate in this manner.
REFERENCES