LOFAR 150-MHz observations of the Bo\"otes field: Catalogue and Source Counts

We present the first wide area (19 deg$^2$), deep ($\approx120-150$ {\mu}Jy beam$^{-1}$), high resolution ($5.6 \times 7.4$ arcsec) LOFAR High Band Antenna image of the Bo\"otes field made at 130-169 MHz. This image is at least an order of magnitude deeper and 3-5 times higher in angular resolution than previously achieved for this field at low frequencies. The observations and data reduction, which includes full direction-dependent calibration, are described here. We present a radio source catalogue containing 6276 sources detected over an area of $19$\,deg$^2$, with a peak flux density threshold of $5\sigma$. As the first thorough test of the facet calibration strategy, introduced by van Weeren et al., we investigate the flux and positional accuracy of the catalogue. We present differential source counts that reach an order of magnitude deeper in flux density than previously achieved at these low frequencies, and show flattening at 150 MHz flux densities below 10 mJy associated with the rise of the low flux density star-forming galaxies and radio-quiet AGN.


I N T RO D U C T I O N
The Low Frequency Array (LOFAR) is a new generation radio telescope operating at 10-240 MHz (van Haarlem et al. 2013). Its large instantaneous field of view, combined with multibeaming capabilities, high-spatial resolution, and a large fractional bandwidth make LOFAR an ideal instrument for carrying out large surveys of the sky, which will have long-lasting legacy value. As such, 'Surveys' is one of the six LOFAR Key Science Projects (KSP). The science goals of the Surveys KSP are broad, covering aspects from the formation and evolution of large-scale structure of the Universe; the physics of the origin, evolution and end-stages of radio sources; the magnetic field and interstellar medium in nearby galaxies and multiwavelength data sets available (for details see Röttgering et al. 2010).
Several low-frequency surveys have been performed in the past, such as the Cambridge surveys 3C, 4C, 6C, and 7C at 159, 178, 151, and 151 MHz, respectively (Edge et al. 1959;Bennett 1962;Pilkington & Scott 1965;Gower, Scott & Wills 1967;Hales, Baldwin & Warner 1988;Hales et al. 2007), the UTR-2 sky survey between 10 and 25 MHz (Braude et al. 2002), and the VLSS at 74 MHz (Cohen et al. 2007;Lane et al. 2014). The GMRT significantly improved low-frequency imaging, particularly in terms of sensitivity and angular resolution, and several GMRT surveys have now been performed at 150 MHz (e.g. Ishwara-Chandra & Marathe 2007;Sirothia et al. 2009;Ishwara-Chandra et al. 2010;Intema et al. 2011), including in particular a full-sky survey (Intema et al. 2016), and further surveys at 325 MHz (e.g Mauch et al. 2013) and 610 MHz (e.g Garn et al. 2007Garn et al. , 2008a. Recently, the Murchison Widefield Array (MWA; Lonsdale et al. 2009;Tingay et al. 2013), operating at 72-231 MHz, has yielded the GaLactic and Extragalactic All-sky MWA survey (GLEAM; Wayth et al. 2015). GLEAM covers the entire Southern sky (δ < 25 • ) with a noise level of a few mJy beam −1 and angular resolution of a few arcminutes. The Multifrequency Snapshot Sky Survey (MSSS; Heald et al. 2015) is an initial LOFAR survey at low resolution and a few mJy beam −1 that is complementary to GLEAM covering the Northern sky. However, for extragalactic science in particular, the high-resolution LOFAR surveys will provide a significant advantage in both image resolution and sensitivity.
Advanced calibration and processing techniques are needed to obtain deep high-fidelity images at low radio frequencies. In particular, direction-dependent effects (DDEs) caused by the ionosphere and imperfect knowledge of the station beam shapes need to be corrected for. van Weeren et al. (2016, herafter vW16) have recently presented a new scheme for calibrating the DDEs and imaging LO-FAR data that combines elements from existing direction-dependent calibration methods such as SPAM (Intema et al. 2009) and SAGE-CAL (Yatawatta et al. 2013;Kazemi et al. 2011). The Boötes field observations presented here serve as a test bed for this calibration strategy, which allows us to produce science quality images at the required Tier-1 survey depth.
Here, we report on the first LOFAR Cycle 2 High Band Antenna (HBA) observations of the Boötes field. The Boötes field is one of the Tier-3 Survey fields, and the aim is to eventually survey this field to the extreme rms depth of 12 μJy beam −1 (1σ ) at 150 MHz. The Boötes field has been extensively studied at higher radio frequencies and in other parts of the electromagnetic spectrum. Radio observations have been carried out at 153 MHz with the GMRT, both as a single deep 10 deg 2 pointing (Intema et al. 2011) and as a seven-pointing 30 deg 2 mosaic (Williams, Intema & Röttgering 2013). Further, observations include those at 325 MHz with the VLA (Croft et al. 2008;Coppejans et al. 2015) and deep, 28 μJy beam −1 rms, 1.4-GHz observations with WSRT (de Vries et al. 2002). The field has also been observed with the LOFAR Low Band Antennae (LBA) at 62 MHz (van Weeren et al. 2014).
The Boötes field is one of the largest of the well-characterized extragalactic deep fields and was originally targeted as part of the NOAO Deep Wide Field Survey (NDWFS; Jannuzi, Dey & NDWFS Team 1999) covering ≈9 deg 2 in the optical (B W , R, and I) and near infrared (K) bands. There is a wealth of ancillary data available for this field, including X-ray (Murray et al. 2005;Kenter et al. 2005), UV (GALEX; Martin et al. 2003), and mid-infrared (Eisenhardt et al. 2004). The AGN and Galaxy Evolution Survey (AGES) has provided redshifts for 23 745 galaxies and AGN across 7.7 deg 2 of the Boötes field (Kochanek et al. 2012). This rich multiwavelength data set, combined with the new low-frequency radio data presented here, will be important for determining the evolution of black hole accretion over cosmic time. The outline of this paper is as follows. In Section 2, we describe the LOFAR observations covering the NOAO Boötes field. In Section 3, we describe the data reduction techniques employed to achieve the deepest possible images. Our data reduction relies on the 'Facet' calibration scheme (vW16), which corrects for directiondependent ionospheric phase corruption as well as LOFAR beam amplitude corruption. In Section 4, we present the final image and describe the source-detection method and the compilation of the source catalogue. This section also includes an analysis of the quality of the catalogue. The spectral index distribution and differential source counts are presented in Section 5. Finally, Section 6 summarizes and concludes this work. Throughout this paper, the spectral index, α, is defined as S ν ∝ ν α , where S is the source flux density, and ν is the observing frequency. We assume a spectral index of −0.8 unless otherwise stated.

O B S E RVAT I O N S
The Boötes field was observed on 2014 August 10 with the LO-FAR HBA stations. An overview of the observations is given in Table 1. By default, all four correlation products were recorded with the frequency band divided into 195.3125 kHz-wide subbands (SBs). Each SB was further divided into 64 channels. The integration time used was 1 s in order to facilitate the removal of radio frequency interference (RFI) at high time resolution. The maximum number of SBs for the system in 8-bit mode is 488, and the chosen strategy was to use 366 for the Boötes field giving a total bandwidth of 72 MHz between 112 and 181 MHz. The remaining 122 SBs were used to observe the nearby bright calibrator source, 3C 294, located 5.2 • away, with a simultaneous station beam, with SBs semiregularly spread between 112 and 181 MHz, avoiding SBs with known strong RFI -the exact frequency coverage is available through the LOFAR Long Term Archive 1 (LTA). The main observations were preceded and succeeded by 10 min observations of the primary flux calibrators 3C 196 and 3C 295, respectively, with identical SB setup to the Boötes observation, i.e. 366 SBs (72-MHz bandwidth) between 112 and 181 MHz. For the observations 14, Dutch remote and 24 Dutch core stations were used. This setup results in baselines that range between 40 m and 120 km. The uv coverage for the Boötes field observation is displayed in Fig. 1. The 'HBA_DUAL_INNER' configuration was employed. In this configuration, the core stations are each split into two substations (48 total), and only the inner ≈30.8 m of the remote stations (which have a total diameter of 41 m) are used to obtain similar station beam sizes to the core stations (which have a diameter of 30.8 m). The resulting half-power beam width (HPBW) is ≈4.2 •2 at 150 MHz.

DATA R E D U C T I O N
In this section, we describe the calibration method that was used to obtain the required deep high-fidelity high-resolution images. The data reduction and calibration consists of two stages: a nondirectional and a directional part to correct DDEs caused by the ionosphere and imperfect station beam models. The non-directional part includes the following steps: (i) initial flagging and removal of RFI; (ii) solving for the calibrator complex gains, including 'clock-TEC (Total Electron Content) separation', and transfer of the amplitudes, median clock offsets, and the XX-YY phase offset from calibrator to the target field; (iii) removal of bright off-axis sources; (v) amplitude and phase (self-)calibration of the target field at medium (20-30 arcsec) resolution This is then followed by a scheme to correct for DDEs in order to reach near thermal-noise-limited images using the full resolution offered by the longest 'Dutch-LOFAR' baselines of about 120 km. All calibration steps are performed with the BLACKBOARD SELFCAL (BBS) software system (Pandey et al. 2009), and other data handling steps were undertaken with the LOFAR Default Pre-Processing Pipeline (DPPP). These steps are explained in more detail below. The direction-dependent calibration scheme is described in full by vW16.
We used the full frequency coverage of the 10 min 3C 196 observation as the primary calibrator observation to derive the timeindependent instrumental calibration including amplitudes, median clock offsets and the XX-YY phase offset. We did not use the simultaneous, but sparse, frequency coverage on the calibrator 3C 294, nor did we use the second calibrator, 3C 295. For the Boötes field, we selected 200 out of the total 366 observed SBs (55 per cent) covering the frequency range 130-169 MHz for further processing. The main limitation to the number of SBs processed was computational time; the main data reduction was carried out on a node with 20 virtual cores on ASTRON's CEP3 cluster 3 and most of the parallelization in the reduction is performed over 10-SB blocks, so that the full 'facet' reduction could be achieved in a reasonable time.

Flagging and RFI removal
The initial pre-processing of the data was carried out using the Radio Observatory pipeline and consisted of RFI excision (using AOFlagger; Offringa et al. 2010;Offringa, van de Gronde & Roerdink 2012), flagging the noisy first channel and last three channels of each SB, and averaging in time and frequency to 2 s and eight channels per SB. The data were stored at this resolution in the LOFAR LTA at this point. One core station (CS007) and one substation of another core station (CS501HBA1) were flagged entirely due to malfunction (failure to record data or low gains). CS013 was also flagged entirely due its different design (the dipoles are rotated by 45 • ).

Calibration transfer from Primary Calibrator 3C 196
Using BBS, we obtained parallel hand (XX and YY) gain solutions for 3C 196, on time-scales of 2 s for each frequency channel independently. In this step, we also solved simultaneously for 'Rotation Angle' per station per channel to remove the effects of differential Faraday Rotation from the parallel hand amplitudes. The solutions were computed with the LOFAR station beam applied to separate the beam effects from the gain solutions. We used a second-order spectral model for 3C 196 consisting of four point sources separated by 3-6 arcsec each with a spectral index and curvature term (Pandey, private communication). The total flux density of this model differs from the SH12 value by a factor of 1.074 ± 0.024. We used these calibration solutions to determine the direction-independent and time-invariant instrumental calibrations, including amplitude calibration, correction of clock delays between the remote and core stations, and an offset between the XX and YY phases.
The remote LOFAR stations each have their own GPS-corrected rubidium clocks, which are not perfectly synchronized with the single clock that is used for all the core stations. The offsets between the remote station clocks and the core can be of the order of 100 ns, which is large enough to cause strong phase delays within a single SB for the remote-remote and core-remote baselines. However, this offset appears to be fairly constant and similar for observations separated by days to weeks, and can thus safely be assumed to be time-invariant. In addition to these clock offsets, the individual clocks can drift within ±15 ns over the course of a few tens of minutes, before being reset to the offset values (i.e. the drifts do not accumulate over the course of several hours). The primary calibrator phase solutions were used to calculate the clock offsets to be applied to the target observation (we do not correct for the 15 ns clock drifts because the effects they induce are not time-invariant). We used the 'Clock-TEC separation' method, described in detail by vW16, which uses the frequency-dependent phase information across the full frequency range to separate the direction-independent clock errors from the direction-dependent ionospheric effects. The clock phase errors, or delays, vary linearly with frequency (phase ∝ δt × ν, where δt is the clock difference), while the ionospheric phases vary inversely with frequency (phase ∝ dTEC × ν −1 , where dTEC is the differential TEC). Fitting was performed on a solution interval time-scale of 5 s and smoothed with a running median filter with a local window size of 15 s. The clock offsets for the remote stations in our observation were between −90 and 80 ns with respect to the core. The ionospheric conditions during the 10 min calibrator observation were good, showing relatively smooth variations in the differential TEC of ≈0.2 × 10 16 m −2 .
For a few stations, we found small but constant offsets between the XX and YY phases. We determined these offsets by taking the median phase difference between the XX and YY phases, during the 3C 196 observation for each station.
The amplitudes were inspected for outliers and smoothed in the frequency axis with a running median filter with a window size of 3 SBs (≈0.6 MHz), and a single median value in the time axis of the 10 min observation.
These calculated median clock offsets, XX-YY phase offsets and amplitude values, were transferred from the 10 min observation of 3C 196 to the target-field data. The resulting target-field visibilities have amplitudes in Jy and are free of time-invariant clock offsets and XX-YY phase offsets.

Removal of bright off-axis sources
A few radio sources are sufficiently bright to contribute flux through the sidelobes of the station beams, the amplitudes of which are strongly modulated in frequency, time and baseline as they move in and out of the station-beam sidelobes. To remove these effects, we simply predicted the visibilities of the brightest of these 'A-team' sources (Cyg A, Cas A, Vir A, and Tau A) with the station beam applied in BBS, and flagged all times, frequencies and baselines where the contributed apparent flux density from these sources exceeded 5 Jy. The 5 Jy limit was set, based on visual inspection of the predicted visibility amplitudes and experience with this and other LOFAR HBA data sets. It was found that the contributed flux was 5 Jy in the majority of the frequency-time-baseline stamps, and exceeding several tens of Jy in only a few per cent of the frequency-time-baseline stamps. The amount of data flagged in this step was typically 2-5 per cent per SB.

Averaging
The data were then averaged in time to a more manageable 8 s and two channels per SB (98-kHz channelwidth). The main limit on the time resolution is set by the requirement to avoid decorrelation due to rapid ionospheric phase variations. Even at this time and frequency resolution, we expect some smearing at large radial distances from the field centre due to time and bandwidth smearing, of the order of 7 and 10 per cent, respectively, at the half-power point of the primary beam (2.1 • from the pointing centre) and 13 and 15 per cent, respectively, at 2.5 • at 150 MHz. The individual corrected SBs were combined in groups of ten, providing data sets of ≈2-MHz bandwidth and 20 channels, each of which was ≈30 GB in size.

Self-calibration of target
We used a single 10-SB block at 148-150 MHz on which to perform direction-independent self-calibration. This block was largely free of RFI and lies at the peak of the response of the HBA tiles. The 2-MHz bandwidth means that the signal-to-noise ratio within the 10-SB block was high enough to obtain coherent solutions in each timestep, while the bandwidth was not so high as to experience decorrelation due to ionospheric effects. We started with a model derived from the 30 deg 2 GMRT image of the Boötes field at 153 MHz (Williams et al. 2013). The resolution of this initial sky model is 25 × 25 arcsec -the native resolution of the GMRT image -and it includes all sources in the GMRT image greater than 20 mJy (≈6σ ). The brightest 10 sources in the GMRT model were replaced by Gaussian components taken from the ≈5 arcsec resolution FIRST catalogue (Becker, White & Helfand 1995), with their total GMRT 150-MHz flux density and flux density ratios taken from FIRST. While this was not strictly necessary, given the resolution of the imaging in this self-calibration step, it was found that it did improve the calibration. This may be due to the fact that the brightest source in the GMRT model is only barely resolved, while in FIRST it consists of four components.
The self-calibration was performed with two iterations with phase-only solutions followed by two iterations with amplitude and phase solutions. The solution interval in all iterations was 8 s, and we obtained a single solution in frequency, neglecting phase changes within the 2-MHz bandwidth. In each calibration step, the station beam model was applied in the 'solve' step in BBS, i.e. the model visibilities were corrupted by the station beams before the gain solutions were derived. In this way, the solutions do not contain the beam terms. The solutions were smoothed using a running median filter 30 s in width to remove outliers. The data were corrected for the gain solutions, as well as the station beam in the phase centre. Imaging was carried out using AWIMAGER (Tasse et al. 2013), which accounts for the non-coplanar nature of the array and performs a proper beam correction across the field of view. Each imaging step in the self-calibration cycle was carried out with the same parameters: a field of view of 6.4 • , a resolution of ≈22 arcsec imposed by the combination of a maximum w term of 10 kλ, an outer uv-limit of 10 kλ, and Briggs (1995) robust weighting (robust=0.25). This weighting results in a slightly lower resolution, with less emphasis on the calibration artefacts. The imaging was always performed in two stages: first without a mask and then with a mask. The CLEAN masks for each image were generated automatically using the Python Blob Detection and Source Measurement (PyBDSM) software (Mohan & Rafferty 2015), with rms_box = (boxsize, stepsize) =(85, 30) pixels, to create a 3σ The resolution is 23 × 20 arcsec. The grey-scale shows the flux density from −1.5σ to 6σ , where σ = 1 mJy beam −1 is the approximate rms noise in the central part of the image. Calibration artefacts are clearly visible around the brightest sources as only direction-independent self-calibration has been performed. island threshold map from an initial image made without a mask. From the resulting images, we created a new sky model, again using PYBDSM [with rms_box=(150, 40) pixels, thresh_pix=5σ and thresh_isl=3σ ], and included all Gaussians in the model. In each additional iteration (first phase-only self-calibration, and then two iterations of phase and amplitude self-calibration), solutions were obtained relative to the original uncalibrated data. We made no attempt to improve the resolution in each self-calibration cycle, as the DDEs become significantly worse at higher resolution. A part of the final self-calibrated image for the 10-SB block at 149 MHz is shown in Fig. 2. The noise level achieved is ≈1 mJy beam −1 . While deeper images at a similar resolution can be made with more bandwidth, this is sufficient for the purpose of initial calibration prior to the direction-dependent calibration.
All the 10-SB blocks were then corrected for the station beam in the phase centre before further processing. We then used the final image of the self-calibration cycle at 148-150 MHz to make a Gaussian input sky model with PYBDSM to (self)-calibrate the remaining 10-SB blocks. All the components in this model were assumed to have a spectral index of −0.8. For all the bands, we performed a single phase and amplitude calibration against this model on 8 s so-lution intervals. The amplitude self-calibration calibration is done here to clean up some artefacts around the brightest, most dominant sources. The purpose of this step is to provide the best possible sky model before direction-dependent calibration. These solutions are not applied in the direction-dependent calibration; thus, if any frequency dependent amplitude errors are introduced, they are not fixed in the data.
After calibration, images of each band were made with the Common Astronomy Software Applications (CASA; McMullin et al. 2007) version 4.2.1 using W-projection (Cornwell, Golap & Bhatnagar 2008 to handle the non-coplanar effects; CASA does not allow for the full beam correction across the field of view, but it does allow a much larger field to be imaged than with AWIMAGER. Because the beam correction was not performed in imaging, the resulting images are apparent sky images. The imaging was carried out in each 10-SB block in two iterations. The first image was made at 'medium resolution' using a uv-cut of 7 kλ and Briggs weighting (robust = −0.25) to limit the resolution to ≈29 arcsec, with 7.5 arcsec pixels; the field of view imaged was ≈10 • . Next, in each 10-SB data set, we subtracted the CLEAN components with these calibration gain solutions applied and reimaged the subtracted data using a 'low resolution' of ≈2 arcmin with a uv-cut of 2 kλ and Briggs weighting (robust = −0.25), 25 arcsec pixels, and a field of view of ≈30 • . This field of view extends to the second sidelobes of the station beams, allowing us to image and subtract these sources. The low-resolution image also picks up extended low-surface-brightness emission in the field not cleaned in the medium-resolution image. Both medium-and low-resolution images were created with CLEAN masks automatically generated from images made without CLEAN masks using PYBDSM with rms_box =(50, 12) and rms_box =(60, 12), respectively. The CLEAN components in the low-resolution image were then subtracted in the same way as the medium-resolution components and a combined list of CLEAN components for the medium-and low-resolution images was created.
The resulting products are 20 sets of 2-MHz residual data sets between 130 and 169 MHz -i.e. with all sources out to the second sidelobes subtracted using the gain solutions, but with the residual data itself not corrected for the gain solutions (in this way any errors in the self-calibration solutions can be corrected later). In addition, there are 20 corresponding CLEAN-component apparent sky models of the sources that were subtracted. These products serve as the input for the direction-dependent calibration scheme, which is described in the following section.

Directional calibration -'Facet' scheme
Significant artefacts remain in the self-calibrated images, even at 20 arcsec resolution, and the rms noise of 1-3 mJy beam −1 is a factor of 3-5 higher than what is expected with these imaging parameters. Both of these issues result from the DDEs of the station beams and ionosphere. The variation in noise as a function of frequency is largely due to the variation in sensitivity of the HBA tiles and to variations in RFI across the LOFAR frequency band. To correct for the artefacts, improve the noise and make high-resolution images, we follow the direction-dependent calibration, or 'facet', scheme, of vW16, in which calibration is performed iteratively in discrete directions and imaging is carried out within mutually exclusive facets around each calibration direction. The key concern is keeping the number of degrees of freedom in the calibration small with respect to the number of measured visibilities, in order to facilitate solving for DDEs in tens of directions. The 'facet' scheme has the following underlying assumptions.
(i) The only calibration errors are a result of ionosphere and beam errors.
(ii) The station beams vary slowly with time and frequency. (iii) Differential Faraday rotation is negligible, so that XX and YY phases are affected identically by the ionosphere.
(iv) The phase frequency dependence is phase ∝ v −1 as a result of ionosphere only. And (v) DDEs vary slowly across the field of view.
For a more detailed description and discussion of the underlying problems and assumptions see vW16 and references therein. The following subsections describe our implementation of the 'facet' scheme on the Boötes field data.

Facetting the sky
Initially, we identified 28 calibration directions or groups consisting of single bright sources or closely (a few arcminutes) separated sources with a combined total flux density 0.3 Jy. The bright source positions or centres of the groups define the facet directions that were used to tile the sky using Voronoi tessellation, ensuring that each point on the sky lies within the facet of the nearest calibrator source. The assumption, here, is that the calibration solutions for the calibrator group applies to the full facet. The typical facet size is a few tens of arcminutes in diameter. Around and beyond the HPBW of the station beam, the assumption that DDEs, in particular the beam, vary slowly across the field of view becomes less valid, and smaller facets are required to capture the changes. We found that two of the facets, initially defined, were too large and showed worsening calibration artefacts away from the calibrator source. These two facets were subdivided into smaller facets, each having new calibrator groups. The final set of facets is shown in Fig. 3. These facets are described by CASA regions and image masks. The image masks are constructed from the regions such that they are centred on the calibrator group for that direction and, when regridded to a single image centred on the pointing centre, will not overlap with any other facet. The following steps were then performed for each direction sequentially, starting with the brightest calibrator sources.

Directional self-calibration
The CLEAN components of all the sources within the calibrator group were added back in each 10-SB data set (with the directionindependent calibration solutions with which they had been initially subtracted). These data sets were then phase-rotated to the direction of the calibration group and averaged in frequency (but not in time) to one channel per 2-MHz data set. The frequency averaging greatly speeds up calibration without any bandwidth-smearing effects in the small (a few arcminutes) calibrator group images.
A self-calibration cycle with four iterations was then performed. The imaging at each iteration was carried out using CASA with the full 130-169-MHz bandwidth and multifrequency synthesis (MFS) CLEAN with a second-order frequency term (i.e. including spectral index and curvature, nterms=2 Conway, Cornwell & Wilkinson 1990), with the automated masking described in Section 3.1.5. The masks made in this way exclude any negative bowls around sources, and can, based on visual inspection of the images and masks, including CASA regions specified by the user to account for extended sources with complicated sidelobes. The imaging resulted in a CLEAN-component model with both flux and spectral index. Multiscale CLEAN (MS-MFS; Rau & Cornwell 2011;Cornwell 2008) was used only in the case of a few complex extended sources. We used a Briggs robust parameter of −0.25, a pixel size of 1.5 arcsec, and imposed a uv minimum of 80 λ to achieve a resolution of ∼5.6 × 7.4 arcsec. Using a more negative robust weighting allows for higher resolution images after the direction-dependant effects have been accounted for.
In the first two self-calibration cycles, we solved for a single Stokes I phase-offset and TEC term per station in groups of five 10-SB data sets, i.e. within 10-MHz bands. Across the full 40-MHz bandwidth, this gave a total of eight parameters (in four frequency bands, a phase and TEC solution) per station per solution interval. The solutions were computed on time-scales of 8 s where possible, but this was increased to 16 s for fainter calibration groups and even to 24 s for the faintest calibration groups, where the signal-to-noise was lower.
In the third and fourth iterations of self-calibration, we solved first for the short time-scale phase-offset and TEC terms, pre-applied these 'fast phase' solutions and subsequently solved for phases and amplitudes, i.e. XX and YY complex gains, independently per 10-SB data set on time-scales between 5 and 30 min, depending on the flux density of the calibrator group. This additional 'slow gain' calibration yielded an additional four parameters per 10-SB block and primarily takes out the slowly varying complex station beams. As expected, the phase component of these solutions is small because the fast phase component has been taken out. In the final self-calibration cycle, the amplitudes were normalized to unity across the full frequency range to prevent changes to the flux scale. The normalization corrections were typically smaller than a few per cent. Solutions were obtained for every time step, and outliers were removed by filtering. The final solutions were filtered in time twice by passing them through a sliding window median filter of width 10 solution intervals where outliers greater than 10 (first pass) and four(second pass) times 1.4826 times the median distance from the median were replaced with the median value.
Example good and bad solutions for a small sample of stations are shown in Figs 4 and 5 for a single direction. We visually inspected these solutions and images at each self-calibration step for each direction. While the bad phase solutions do appear almost decorrelated, there is still a clear improvement in the image quality for these directions. We accept the bad solutions, since we require solutions for each facet direction in order to fully cover the field of view. Fig. 6 shows example solutions for all the directions as a snapshot in time. We note that there is consistency in both the phase and amplitude solutions for the discrete directions across the field of view, where the solutions for each direction have been obtained independently. This shows that the solutions do make physical sense. We have not yet attempted to spatially filter or smooth the solutions, which would reduce outliers such as the one clearly visible in the amplitudes for station RS407HBA in Fig. 6. Indeed, an additional improvement in this method may be to interpolate the solutions between calibrator sources. When viewed as a movie in time, trends can be seen 'moving across' the field of view, in particular in the fast phases, which is consistent with ionospheric phase disturbances propagating through the field of view. The significant improvement made in the self-calibration cycles is demonstrated by the calibrator images shown in Fig. 7. It is clear that both the phase and amplitude calibration are currently required: the phase distortions resulting from ionospheric effects will always have to be corrected for, but future improvements in the LOFAR beam models may eliminate the need for the amplitude calibration.
The total number of parameters solved for across all 34 facets is about 0.9 million, ∼250 times less than the ∼220 million visibilities. This is well within the requirement that the number of degrees  of freedom be significantly less than the available visibilities. For comparison, solving for all eight Jones terms on short time-scales in 34 directions would result in over 20 times as many parameters.

Facet imaging and subtraction
After obtaining the direction-dependent calibration solutions for the given direction, the remaining sources within the facet were added back (with the direction-independent solutions with which they had been subtracted). Assuming that the solutions for the calibrator group apply to the full facet, we applied those solutions to the facet data at the original two channels per SB resolution, which allows the 1/ν dependence of the TEC term to be applied on a channel-tochannel basis. At this point, the corrected facet data were averaged five times in frequency and three times in time (to 0.5 MHz per channel and 24 s) to avoid excessive bandwidth and time smearing within the facet image. Each facet was then imaged with MS-MFS CLEAN with nterms=2 over the full 130-169-MHz bandwidth, with a Briggs robust parameter of −0.25, 1.5 arcsec pixels and a uv minimum of 80 λ. Note that since all facets have different phase centres, their uv coverage differs slightly and so the restoring beams are slightly different. The facet masks were used to CLEAN only within the facet. Sources outside the facet boundary appear only as residuals because they were not added back in the uv data. As in Section 3.1.5, we do not use AWIMAGER mainly due to its limitations in imaging beyond the HPBW of the station beam.
Each facet image provided an updated sky model that was then subtracted from the full-resolution data with the corresponding direction-dependent solutions, thereby, improving the residual data to which the subsequent facets were added. This process (from Section 3.2.2) was repeated until all facets had been calibrated and imaged. The order in which the facets were handled is determined by the severity of the calibration artefacts in the direction-independent images, which roughly corresponds to the brightness of the calibration groups, so that the directions with the worst artefacts were corrected first and did not influence later directions.
After all the facets were calibrated in this way, we reimaged all the facets without doing any additional frequency or time averaging (i.e. leaving the data at 0.1 MHz per channel and 8 s). This step removes the artefacts present in the given facet resulting from bright sources in neighbouring facets, which had only been calibrated after the given facet. It reduces the rms by a few per cent. This was achieved by adding back the facet sky model to the residual data, applying the relevant directional-dependent solutions, and imaging with the same parameters (using the full 130-169-MHz bandwidth with nterms=2, robust=−0.25, a pixel-size of 1.5 arcsec, and a uv minimum of 80 λ). At this point, we applied a common restoring beam of 5.6 × 7.4 arcsec to all the facets; this resolution was chosen as the smallest beam containing all the fitted restoring beams from the individual facets. The facet templates were constructed such that, when regridded to a single image centred on the pointing centre, the facet images do not overlap by a single pixel. Thus, a single 'mosaic' image was constructed by regridding the facet images, replacing the pixels outside the facet boundary with zeroes and summing the images. In this way, there are no clear facet boundaries in the final image. Only four sources lie on the facet boundaries -i.e. they have CLEAN components in two facets. For two of these sources that are found in the GMRT catalogue, their total flux is within the errors of the GMRT fluxes, so we infer that the total flux of sources on the boundaries is conserved.
The resulting products are (i) high-resolution images of the facets, combined in a single image ('mosaic') covering the full field of view, (ii) short-time-scale phase corrections for the variation of the ionosphere in each facet direction, and (iii) long-time-scale phase and amplitude corrections for the station beams in each facet direction.

'Primary beam' correction
CASA was used to image the individual facets, so the images are in 'apparent' flux units, with the station beam taken out only in the phase centre. In order to measure real sky fluxes, we performed a primary beam correction to take into account the LOFAR station beam response across the field of view. We used AWIMAGER to produce an average primary beam map from all the 20 10-SB data sets, using the same Briggs weighting (robust=−0.25), but with much larger pixels (5.5 arcsec), and imaging out to where the average station beam power drops to 20 per cent (∼6.2 • in diameter). We corrected the combined 'mosaic' image by dividing by the regridded AWIMAGER average primary beam map. 4 We imposed a 'primary beam' cut where the average primary beam power drops below 40 per cent, 5 at an approximate radius of 2.44 • , which results in an image covering a total area of ≈19 deg 2 .

F I NA L I M AG E A N D C ATA L O G U E
The combined 'mosaic' image at 5.6 × 7.4 arcsec resolution is shown in Fig. 8. The central rms noise level is relatively smooth and 125 μJy beam −1 , and 50 per cent of the map is at a noise level below 180 μJy beam −1 (see also Fig. 9). There is a small amount of 'striping' (see e.g. the Northern part of the image) indicative of some low-level residual RFI. This is localized and at the level of <2σ , but should be addressed before deeper images are made. A small portion of the image covering the inner 0.25 deg 2 is shown in Fig. 10 to illustrate the resolution and quality of the map. There remain some phase artefacts around the brightest sources (see for example the source in the lower right of the image in Fig. 10), which have not been entirely removed during the facet calibration. While these are localized around the bright sources and have little impact on the majority of the map, they do affect some nearby sources. Fig. 11 shows a comparison between the 153-MHz GMRT image, the LOFAR 148-150-MHz direction-independent self-calibration image (see Section 3.1.5) and the final 130-169-MHz directiondependent calibrated image for three arbitrary positions. This serves to illustrate the significant improvement in both noise and resolution achieved with the new LOFAR observations over the existing   GMRT data, which has an rms noise level of ≈3 mJy beam −1 and resolution of 25 arcsec.

Source detection and characterization
We compiled a source catalogue using PYBDSM to detect and characterize sources. We ran PYBDSM on the final 'mosaic' image, using the pre-beam-corrected image as the detection image and the primary beam-corrected image as the extraction image. The rms map was determined with a sliding box rms_box =(160, 50) pixels (i.e. a box size of 160 pixels every 50 pixels), with a smaller box rms_box =(60, 15) pixels in the regions around bright sources (defined as having peaks exceeding 150σ , where σ is the sigmaclipped rms across the entire field). Using a smaller box near bright sources accounts for the increase in local rms as a result of calibration aretefacts. For source extraction, we used thresh_pix=5σ and thresh_isl=3σ (i.e. the limit at which flux is included in the source for fitting). Fig. 9 illustrates the variation in rms noise thus determined across the combined 'mosaic' image.
PYBDSM fits each island with one or more Gaussians, which are subsequently grouped into sources. We used the group_tol parameter with a value of 10.0 to allow larger sources to be formed. This parameter controls how Gaussians within the same island are grouped into sources; the value we chose is a compromise between selecting all Gaussians in a single island as a single source, thus, merging too many distinct nearby sources, and selecting them as separate sources, thus, separating the radio lobes belonging to the same radio source. Sources are classified as 'S' for single sources and 'M' for multiple Gaussian sources. PYBDSM reports the fitted Gaussian parameters as well as the deconvolved sizes, computed assuming the image restoring beam. Uncertainties on the fitted parameters are computed following Condon (1997). The total number of sources detected by PYBDSM in all the facets is 6349 comprised of 10 771 Gaussian components of which 3010 were singlecomponent sources. We allowed PYBDSM to include sources that were poorly fitted by Gaussians; these 197 sources are included in the catalogue with the integrated flux density being the total flux density within the source island and flagged as having poor Gaussian fits ('Flag_badfit') and have no associated errors. Additionally, based on visual inspection, 71 sources were flagged as artefacts near bright sources, or detections on the edge of the image, or otherwise bad ('Flag_artefact', 'Flag_edge', 'Flag_bad'). These flags are included in the final catalogue presented in Section 4.7.

Resolved sources
In the absence of noise, resolved sources can easily be identified, based on the ratio of the integrated flux density to the peak flux density, S int /S peak > 1. However, since the uncertainties on S int and For 20 logarithmic bins in signal-to-noise ratio, the black points show the threshold below which 99 per cent of the sources lie in that bin. The red line shows a fit to this upper envelope. Right: the measured ratio of integrated to (smearing-corrected) peak flux density as a function of signal-to-noise ratio for all sources in the catalogue. The red line shows the fitted 99 per cent envelope.
S peak are correlated, the S int /S peak distribution is skewed, particularly at low signal to noise. We note that the scatter that we observe is large and skewed towards values >1. This is in some part due to the effects of bandwidth-and time smearing, which both reduce the peak flux densities of sources as a function of distance from the phase centre. Given the averaged channel, time and imaging resolution, we estimate (using the equations given by Bridle & Schwab 1989, for the reduction in peak flux) the combined effect of bandwidthand time smearing due to the averaging in the full data set to be of the order of 74 per cent (i.e. measured peak flux densities at 74 per cent of the expected value) at the edge of our field at 2.4 • . Our estimate for the correction factor for each source due to the combined effect is included in the source catalogue (at each source position).
The effect of noise on the total-to-peak flux density ratios as a function of signal to noise can be determined by running complete simulations in which artificial sources are injected into the real data and imaged and detected in the same way as the observed data. To determine an upper envelope of this distribution, we performed a Monte Carlo simulation in the image plane in which we generated 20 random fields containing ∼20000 randomly positioned point sources with peak flux densities between 0.1σ and 20σ , where σ was taken to be 110 μJy beam −1 . The source flux densities were drawn randomly from the real source count distribution, using a power-law slope. Sources were injected in the residual mosaic map produced after source detection with PYBDSM. Source detection was performed in the same manner described in Section 4.1, thus only ∼5000 sources in each field satisfy the detection criterion of peak flux density >5σ . The S int /S peak distribution produced from the Monte Carlo simulation is plotted in the left-hand panel of Fig. 12, after removing artefacts and false sources detected in the residual mosaic map. To determine the 99 per cent envelope, a curve was fit to the 99th percentile of 20 logarithmic bins across signal-to-noise ratio. The fitted envelope is characterized by by a function of the form 1.06 + 74.6/SNR 2.02 .
The flux density ratio as a function of signal-to-noise for our catalogued sources is shown in the right-hand panel of Fig. 12, where we have used the smearing-corrected peak flux densities. We therefore consider the 1748 sources above the fitted envelope from the point source simulation as resolved. These are flagged as resolved in the final catalogue presented in Section 4.7. Note that not all the PYBDSM sources with multiple Gaussian components are resolved by this criterion. Conversely, not all the single-component sources are unresolved.
Additionally, by a visual inspection of the images, we have identified 54 large extended sources that appear as separate sources within the PYBDSM catalogue because the emission from the lobes of these giant radio galaxies is not contiguous. These sources span sizes (largest angular size, LAS) of ≈20-250 arcsec. We have merged these components into single sources in the final catalogue, by taking the flux-weighted mean positions, summing their total flux densities, and retaining the maximum peak flux density value. These merged sources are flagged in the catalogue ('Flag_merged'). All but one of these sources meet the envelope criterion above and so are also classified as 'resolved'. All the sources with LAS>45 arcsec (including these merged sources and those already identified by PYBDSM), are presented in Fig. A1 in the Appendix.
Many diffuse extended sources are also clearly visible in the facet images. These sources are not detected by PYBDSM as their peak flux densities fall below the detection threshold. We identified four very clear large diffuse sources -see Fig. A2 in the Appendix. A full study of diffuse emission is deferred to a subsequent paper, as this will require reimaging of the facets with optimized parameters.

Flux density uncertainties
In this section, we evaluate the uncertainties in the measured LOFAR flux densities. We make corrections to the catalogue we present in Section 4.7, to account for systematic effects.

Systematic errors
Given the uncertainties in the low-frequency flux density scale (e.g. SH12) and the LOFAR station beam models, we may expect some systematic errors in the measured LOFAR flux densities. In order to determine any systematic offsets and place the final catalogue on to the SH12 flux scale, we have compared the LOFAR flux densities to those of the GMRT image of the Boötes field at 153 MHz (Williams et al. 2013). The veracity of the GMRT flux densities was evaluated both by comparing flux densities measured in the overlap areas of the seven individual pointings and through comparison with NVSS (the NRAO VLA Sky Survey; Condon et al. 1998). The assumed uncertainty on the GMRT flux density scale is 20 per cent. For the comparison, we selected only sources detected at high signal Figure 13. Map of the measured ratios of integrated flux densities for high signal-to-noise, small and isolated LOFAR sources with respect to the GMRT. The colourscale shows the flux density ratio.
to noise (S peak /σ > 10) in both maps. We have further limited the selection to isolated LOFAR sources, i.e. those with no other LOFAR source within 25 arcsec (the size of the GMRT beam), to ensure one-to-one matches. Finally, we consider only small LOFAR sources, with measured sizes of less than 50 arcsec, to rule out resolution effects, i.e. sources being resolved out by the poorer short baseline coverage of the GMRT. This yielded a sample of 420 objects. Limiting the selection only to those unresolved by LOFAR (and hence the GMRT) would give only 129 sources, so for the purposes of robust statistics we opt to select the larger small source sample over unresolved sources only. Before the comparison, the GMRT flux densities were multiplied by 1.078 to place them on the SH12 flux scale, based on the calibration model used. For this subsample of sources, we determined the ratio of integrated flux densities between LOFAR and the GMRT f S = S LOFAR /S GMRT . We measured a median ratio of f S = 1.10 with a standard deviation of σ f S = 0.20.
The flux density ratio showed no significant variation with distance from the phase centre. However, we noticed a small variation with position on the sky, plotted in Fig. 13. The variation is such that the LOFAR flux densities are slightly higher towards the north-north-west (upper right of the map) and lower towards the south-south-east (lower left of the map). We note that the trend is consistent across different facets so is a result of some global effect and not a result of discrete calibration in facets. Moreover, the trend is similar, but noisier, if we consider LOFAR sources extracted from the self-calibrated-only image from the single 10-SB block at 148-150 MHz. This further suggests that the flux density offset is not introduced by the direction dependent calibration procedure.
The flux density ratio is plotted as a function of the position angle between the source and the phase centre in Fig. 14, to which we fitted a sinusoid of the form f S = 1.11 + 0.10sin (101 • + φ) . 6 The median ratio was f S = 1.00 with a standard deviation of σ f S = 0.17 after applying this sinusoid function to correct the LOFAR flux densities. This correction assumes that the GMRT flux densities are 'correct' and, in particular, have no variation on the sky. We emphasize that after making this correction to the LOFAR flux densities they are on the SH12 flux scale. To validate this correction, we performed a similar comparison with sources in the deep 1.4-GHz WSRT catalogue of the Boötes Field (de Vries et al. 2002), again considering only the isolated, small and bright sources. Scaling the higher frequency flux densities to 150 MHz with a spectral index of −0.8, we recovered a similar, albeit noisier, trend. Given the large uncertainty in spectral index scaling, we neglect the flux scale differences of only 2-3 per cent between our LOFAR flux densities and the WSRT catalogue, which was tied to the NVSS scale (Baars et al. 1977). Moreover, in this case we are only investigating spatial variations, not any absolute scaling errors. This suggests that it is likely not to be GMRT pointing errors causing systematic trends in the GMRT flux densities. We speculate that the observed trend in the flux density errors may be the result of an incorrect primary beam correction, which itself may be caused by errors in the LOFAR station beam model used in AWIMAGER, or a map projection error.

Flux scale accuracy
To investigate the overall reliability of the flux scale, we have compared 189 small, isolated within 1 arcmin, and high signal-to-noise sources that are detected at higher frequencies both in NVSS at 1.4 GHz and WENSS at 325 MHz (the Westerbork Northern Sky Survey; Rengelink et al. 1997). For these sources, we computed the spectral index between the two higher frequencies and predicted the LOFAR flux density. The WENSS fluxes were first scaled by a factor of 0.9 to put them on the same flux scale (SH12). The predicted to observed flux density ratio is plotted in Fig. 15, for this sample. We calculate a mean flux density ratio of 1.01 ± 0.10 with standard deviation of σ = 0.20. We thus conclude that the corrected LOFAR flux densities are consistent with being on the SH12 flux scale. Moreover, the scatter includes source-to-source variations in spectral shape and uncertainties in the spectral index fits, so we can conclude that the flux scale is accurate to better than 20 per cent. Five outliers, at greater than 4σ , were excluded from the statistics. Based on inspection of the spectra of these outliers, considering also VLSS (at 74 MHz) and LOFAR LBA (at 62 MHz) detections and upper limits, they are consistent with having turnovers below ≈200 MHz or inverted spectra. Figure 15. Comparison of LOFR measured integrated flux densities with those predicted from WENSS and NVSS at higher frequencies. The mean ratio of 1.01 ± 0.10 shows that the LOFAR flux densities are on the same flux scale. Five outliers, which are consistent with having spectra which turn over or are inverted, are excluded.

CLEAN bias
The cleaning process can redistribute flux from real sources on to noise peaks, resulting in a 'CLEAN bias' (Becker et al. 1995;Condon et al. 1998). The effect is worse for observations with poor uv coverage due to increased sidelobe levels. Despite our good uv coverage and CLEAN masking, we have checked for the presence of CLEAN bias in our images. We injected point sources with flux densities drawn from the distribution of real sources at random positions into the residual uv data for a representative sample of facets. The simulated data were then imaged and cleaned in exactly the same manner as the real data, including automated CLEAN masking. We compared the measured peak flux density with the input flux density and measure a difference consistent with zero. We therefore conclude that there is no significant CLEAN bias.

Flux density uncertainty
The uncertainties of the flux density measurements themselves has been investigated by through a jackknife test, whereby, we have split the visibility data into four time segments of 2 h each. Due to the computational expense, this was only done for a representative sample of seven facets. The uv data for the subsamples were then imaged and sources extracted in a way identical to the original data. Fig. 16 shows a comparison of the measured total flux densities in the split images compared to the deep images as a function of the signal-to-noise ratio of each source in the split image. From this, we determined the signal-to-noise-dependent standard deviation of 0.66/SNR 0.52 .

Astrometric uncertainties
It is possible that errors in the phase calibration can introduce uncertainties in the source positions. Here, we evaluate any systematic offsets in the measured source positions and determine their uncertainties. We used the 1.4-GHz FIRST catalogue (Becker et al. 1995) to assess the LOFAR positional accuracy. The uncertainty on the FIRST positions is given by σ = s(1/SNR + 0.05) arcsec, where s is the source size. FIRST counterparts were identified within 2 arcsec of our LOFAR sources with high signal-to-noise ratios, i.e. S peak /σ local > 10. This yielded 968 matches, of which 313 were fitted with a single Gaussian in the LOFAR image, making them more likely to be point sources. For the high signal-to-noise point source sample, we measured a small mean offset in right ascension of α = α LOFAR − α FIRST = −0.037 ± 0.001 arcsec, with rms σ α = 0.44 arcsec and a somewhat larger mean offset in declination of δ = δ LOFAR − δ FIRST = −0.301 ± 0.001 arcsec with rms σ α = 0.59 arcsec. The offset is negligible, and we note that it is within the size of the LOFAR image pixels (1.5 arcsec). However, closer inspection of the offsets showed that the offset in declination varied systematically across the full 5 • field of view, and to a lesser extent for the offsets in right ascension. A correction for this offset has been made by fitting a plane, = aα + bδ + c, where is in arcsec, to the offset values and applying the fitted offsets to all sources in the catalogue. While this could be expressed as a rotation and therefore a sinusoidal correction could be made, as we have done for the flux densities, we find that the positional offsets are asymmetric with respect to the pointing centre. The fitted planes were α = −0.10(α − 218 • ) + 0.02(δ − 34.5 • ) − 0.01 and δ = 0.13(α − 218 • ) + 0.29(δ + 34.5 • ) − 0.34. These offsets and functional corrections are shown in Fig. 17. After applying this correction, we measured α = α LOFAR − α FIRST = −0.018 ± 0.002 arcsec with rms deviation σ α = 0.42 arcsec, and δ = δ LOFAR − δ FIRST = −0.008 ± 0.001 arcsec, with σ δ = 0.31 arcsec. We note that since the initial phase calibration of the LOFAR data was performed against the GMRT model sky; these positional offsets may originate from the GMRT model. We did evaluate the GMRT positions in a similar way and found no significant variations, but this may be due to the much larger positional uncertainty and the 25 arcsec synthesized beam of the GMRT map. We also consider LOFAR sources extracted from the self-calibrated-only image from the single 10-SB block at 148-150 MHz compared to FIRST and find a similar trend. This suggests that the position offsets are not introduced by the direction dependent calibration procedure. We hypothesize that a map projection or mosaic construction error could be the cause of these offsets, but note that it is a small correction.
The scatter in the offsets between the LOFAR and FIRST positions is a combination of noise-independent calibration errors, , in both the LOFAR and FIRST data as well as a noise-dependent error, σ , from position determination via Gaussian-fitting: To separate the noise-dependent and -independent uncertainties, we select from the above sample only the FIRST sources with position errors of less than 0.6 arcsec and measure an rms scatter of (σ α , σ δ ) LOFAR = (0.37, 0.35 arcsec) between the corrected LOFAR and FIRST source positions for this very high signal-to-noise subsample of 89 sources. From Becker et al. (1995), the FIRST calibration errors are ( α , δ ) FIRST = (0.050.05 arcsec). The noise-dependent fit errors, for both the LOFAR and FIRST, can be assumed to be small so we determine that the LOFAR calibration errors are ( α , δ ) LOFAR = (0.37, 0.35 arcsec). This scatter may contain a small contribution resulting from any spectral variation between 150 and 1400 MHz on scales smaller than the resolution of the surveys (≈5 arcsec).

Completeness
To quantify the completeness of the catalogue, we performed another Monte Carlo simulation in which we added simulated sources to the residual image (cf. Section 4.2). However, in this case, ∼10 per cent of the artificial sources inserted into the noise map were extended sources -Gaussians with FWHM larger than the beam size. This allows for a better estimate of the completeness in terms of integrated flux densities. The completeness of a catalogue represents the probability that all sources above a given flux density are detected. We have estimated this by plotting the fraction of detected sources in our simulation as a function of integrated flux density (left-hand panel of Fig. 18), i.e. the fraction of input sources that have a catalogued flux density using the same detection parameters. This detection fraction is largely driven by the variation in rms across the image, or visibility area. The number of detected sources as a fraction of sources that could be detected, accounting for the visibility area, is also shown in the figure. The completeness at a given flux density is determined by integrating the detected fraction upwards from a given flux density limit and is plotted as a function of integrated flux density in the right-hand panel of Fig. 18. This shows the completeness of the full catalogue. We thus estimate that the catalogue is 85 per cent complete above a peak flux density of 1 mJy.

Reliability
The reliability of a source catalogue indicates the probability that all sources above a given flux density are real. It is defined as the fraction of all detected sources in the survey area above a certain total flux density limit that are real sources and are not accidental detections of background features or noise. To estimate the reliability, we extracted sources from the inverted residual mosaic image, assuming that negative image background features are statistically the same as positive ones. We grouped the detected negative 'sources' by total flux density into 20 logarithmic flux density bins and compare these to the binned results of the regular (positive) source extraction as described in Section 4.1. For convenience, we define the real number of sources to be the number of positive sources minus the number of negative sources. The left-hand panel of Fig. 19 shows the false detection rate, FDR, determined from the number ratio of negative sources over positive sources per flux density bin. The peak around 2 mJy is explained by the fact that the detection efficiency drops off below this flux density as shown in Section 4.5. The right-hand panel shows the integrated reliability curve, R = 1 − FDR (> S int ), determined from the number ratio of real sources over positive sources above the total flux density limits that define the lower edges of the flux density bins. Errors are calculated based on Possionian errors on the number of sources per flux density bin. For a 1 mJy total flux density threshold, the reliability is 85 per cent.

Source catalogue
The final catalogue consists of 6 272 sources with flux densities between 0.4 mJy and 5 Jy and is available as a table in FITS format as part of the online version of this article. The catalogue is also available from the CDS. The astrometry in the catalogue has  been corrected for the systematic offset described in Section 4.3.2. Both the integrated and peak flux densities in the catalogue have been corrected for the systematic offset (Section 4.3.1). Resolved sources are identified as described in Section 4.2. Errors given in the catalogue are the nominal fit errors. All flux densities and rms values are on the SH12 flux density scale. A sample of the catalogue is shown in Table 2. Not included in this sample in Table 2, the catalogue also contains a number of simple flags, based on visual inspection, where non-zero values indicate: Column (14) -Flag_badfit, a bad Gaussian fit and so no parameters derived from the Gaussian fit; Column (15) -Flag_edge, a source on the edge of the mosaic such that some of the flux is missing; Column (16) -Flag_bad, a source has been successfully fit with Gaussians but visual inspection indicates a likely poor fit or other problem Column (17) -Flag_artefact, a source is identified as an artefact (this flag has two values -a '1' indicating a likely artefact and a '2' indicating an almost certain artefact); and Column (18) -Flag_merged, a large source whose source components have been manually merged into single sources.

R E S U LT S
In this section, we report two results based on the LOFAR catalogue: the spectral indices between 150 and 1400 MHz, and the 150 MHz faint source counts. Further analysis of these data will be presented in future publications.

Spectral index distributions
We use the deep WSRT 1.4-GHz data covering the Boötes Field (de Vries et al. 2002) to calculate spectral indices between 150 MHz and 1.4 GHz, α 1400 150 , the distribution of which is shown in Fig. 20. The WSRT map has a resolution of 13 × 27 arcsec, so some sources appear as separate sources in the LOFAR map but are identified as single sources in the WSRT image. To exclude erroneous spectral indices derived for such sources, we limit this selection to sources that are not identified as extended in either the LOFAR or WSRT catalogues and that do not have multiple matches within a 30 arcsec search radius.
Using LOFAR sources with flux densities greater than 2 mJy, we find a median spectral index between 1400 and 150 MHz of −0.79 ± 0.01 and scatter of σ = 0.30, which is consistent with previously (1) -IAU Source name.
(2) and (3) -flux-weighted right ascension (RA) and uncertainty. (4) and (5) (Sirothia et al. 2009), and−0.85 (Ishwara-Chandra &Marathe 2007). A detailed spectral index analysis using the other available radio data is deferred to later works. Analysis of the in-band LOFAR spectral indicies is also deferred to later work after the current LOFAR gain transfer problems have been solved.

Source counts
We used the LOFAR catalogue to compute the 150 MHz source counts down to ≈1 mJy. This is at least an order of magnitude deeper than previously studied at these low frequencies (e.g. McGilchrist et al. 1990;Intema et al. 2011;Williams et al. 2013). The source counts are computed using the integrated flux densities, but sources are detected based on their measured peak flux density over the local noise level. Thus, the completeness of the source counts depends both on the variation of the noise in the image and on the relation between integrated and peak flux densities. The latter is dependant both on systematic effects (e.g. smearing) and the intrinsic relation between integrated and peak flux densities of radio sources due to their intrinsic sizes. In the following paragraphs, we discuss these effects and how we correct them in deriving the source counts.

Visibility area
Due to the large variation in rms across the single pointing image (see Fig. 9), sources of different flux densities are not uniformly detected across the image, i.e. faint sources can only be detected in a smaller area in the inner part of the image. Moreover, smearing causes a reduction in peak flux density while conserving the integrated flux density, and the amount of smearing depends on the distance to the phase centre. We have noted already the effect of bandwidth-and time-smearing (see Section 4.2) and use the equations given by Bridle & Schwab (1989) to calculate an approximate correction to the peak flux density of each source based on its position in the map. The maximal correction is at most S meas peak ≈ 0.74S corr peak , so sources with corrected peak flux densities >6.7σ will have effective measured peak flux densities above the 5σ detection threshold. We therefore select only sources based on this threshold for deriving the source counts. To correct for the varying rms, we weight each source by the reciprocal of the area in which it can be detected, its visibility area, (e.g. Windhorst et al. 1985), based on its smearing-corrected peak flux density value. This also accounts for the varying detection area within a given flux density bin.

Completeness and reliability
We consider also a correction for the completeness of the catalogue (see Section 4.5). As the visibility area is considered separately, we use the red curves in Fig. 18 to determine a correction factor to account for the fraction of sources missed in each flux density bin. Additionally, we make a correction for the reliability by applying the FDR derived in Section 4.6, which acts in the opposite direction to the completeness correction.

Systematic effects
Another effect that could potentially influence both the peak and integrated flux densities is CLEAN bias, which could bias both downwards at the lowest flux densities, thus, leading to low source counts. However, we have shown (Section 4.3 that this is negligible because the use of masks in the imaging and good uv coverage. In general, noise can scatter sources into adjacent bins, again most noticeably at low flux densities. A positive bias is introduced by the enhancement of weak sources by random noise peaks (Eddington bias;Eddington 1913). Both of these effects could be quantified by simulations, but our source counts are not corrected for them, due to the computational expense of running the full required simulation.

Resolution bias
A resolved source of a given integrated flux density will be missed by the peak-flux-density selection more easily than a point source of the same integrated flux density. This incompleteness is called the resolution bias and to make a correction for it requires some knowledge of the true angular size distribution of radio sources. We have estimated a correction for the resolution bias following Prandoni et al. (2001). First, we calculate the approximate maximum size θ max a source could have for a given integrated flux density before it drops below the peak flux detection threshold. Fig. 21 shows the angular size of the LOFAR sources. We use the relation where b min and b max are the synthesized beam axes, and θ min and θ max are the source sizes, to estimate the maximum size a source of a given integrated flux density can have before dropping below the peak flux detection threshold. Given this θ max , we estimate the fraction of sources with angular sizes larger than this limit using the assumed true angular size distribution proposed by Windhorst, Mathis & Neuschaefer (1990): h (> θ max ) = exp [ − ln 2(θ max /θ med ) 0.62 ] with θ med = 2S 0.30 1.4 GHz arcsec (with S in mJy; we have scaled the 1.4-GHz flux densities to 150 MHz with a spectral index of −0.8). We have also calculated the correction using θ med = 2 arcsec for sources with S 1.4 GHz < 1 mJy (see Windhorst et al. 1993;Richards 2000). The resolution bias correction c = 1/[1 − h (> θ max )] is plotted in Fig. 22 for the two different assumed distributions. In correcting the source counts, we use an average of the two functions. We use the uncertainty in the forms of θ med and in θ max to estimate the uncertainty in the resolution bias correction. We further include a overall 10 per cent uncertainty following Windhorst et al. (1990). While we have used the extrapolated Windhorst et al. (1990) size distribution from 1.4 GHz to correct the source counts presented here, we note that the observed size distribution (see Appendix B) suggests that the low-frequency emission is more extended. Thus, the real resolution bias correction factor is likely to be somewhat larger, particularly in the lowest flux density bins, and may explain the turndown in source counts (see Fig. 23). A full study of the true low-frequency angular size distribution of radio sources is beyond the scope of this paper.

Complex sources
The source counts need to be corrected for multicomponent sources, i.e. cases where the radio-lobes are detected as two separate sources. The flux densities of physically related components should be summed together, instead of counted as separate sources. We use the method described in White et al. (2012) and Magliocchetti et al. (1998)  where S is in mJy and θ is in arcsec. Approximately 460 sources in the sample used for calculating the source counts, or 8.5 per cent, are considered to be a part of double or multiple sources for the source count calculation.

The low-frequency mJy source counts
The Euclidean-normalized differential source counts are shown in Fig. 23. Uncertainties on the final normalized source counts are propagated from the errors on the correction factors and the Poisson errors (Gehrels 1986) on the raw counts per bin. The source counts are tabulated in Table 3. Model source counts have been derived by Wilman et al. (2008) for the 151 and 610-MHz source populations predicted from the extrapolated radio luminosity functions of different radio sources in a CDM framework. We show the source counts for both AGN and star-forming (SF) galaxies on Fig. 23. The Wilman et al. (2008) model catalogue has been corrected with their recommended default post-processing, which effectively reduces the source count slightly at low flux densities. At low flux densities, it is likely that the Wilman et al. (2008) counts slightly overestimate the true counts due to double counting of hybrid AGN-SF galaxies. These models are based on low-frequency data at higher flux density limits and higher frequency data so some deviations are not unexpected; however, our observed counts do follow their model quite well. Mauch et al. (2013) suggest that the spectral curvature term used in the Wilman et al. (2008) Bridle et al. 1972;White et al. 1997;Ciliegi et al. 1999;Gruppioni et al. 1999;Richards 2000;Hopkins et al. 2003;Fomalont et al. 2006;Bondi et al. 2008;Kellermann et al. 2008;Owen & Morrison 2008;Seymour et al. 2008). This is a representative comparison and not an exhaustive list of available source counts. In particular, there are even deeper models of higher frequency counts using statistical methods (e.g. Vernstrom et al. 2014;Zwart, Santos & Jarvis 2015). The source counts are scaled assuming a spectral index of −0.8. The blue lines at the low flux density end show how the higher frequency counts would scale if a flatter spectral index of −0.5 was used, i.e. the blue line drawn through the tail of the source counts would lie at lower 150-MHz fluxes (left in the plot) and at lower normalized count values (down in the plot) mostly due to the S 5/2 term in the normalized counts. We note that the flattening of the source counts at 5-7 mJy, associated with the growing population of SF galaxies and faint radio-quiet AGN at lower flux densities (see e.g. Jarvis   (1) the flux density bins.
(2) the central flux density of the bin.

2015)
, is clear and is the same as that seen at the higher frequencies.
The further drop in the lowest flux density bins may be the result of some unaccounted for incompleteness in our sample or different resolution bias correction (see Section 5.2.4 and Appendix B).

C O N C L U S I O N
We have presented LOFAR HBA observations of the Boötes field made as part of the LOFAR Surveys Key Science Project. These are the first wide area (covering 19 deg 2 ), deep (reaching ≈120 − 150 μJy beam −1 ), high-resolution (5.6 × 7.4 arcsec) images of one of the extragalactic deep fields made at 130-169 MHz. These observations are at least an order of magnitude deeper and 3-5 times higher in resolution than previously obtained at these frequencies.
We have used a new calibration and imaging method to correct for the corrupting effects of the ionosphere and LOFAR digital beams. The radio source catalogue presented here contains 6276 sources detected with peak flux densities exceeding 5σ . We have quantified the positional and flux density accuracy of the LOFAR sources and used the source catalogue to derive spectral indices between 150 and 1400 MHz, finding a median spectral index of −0.79 ± 0.01. Finally, we have presented the deepest differential source counts at these low frequencies. These source counts follow quite well the model predictions of Wilman et al. (2008) and show the flattening at a few mJy as a result of the increasing contribution of SF galaxies.  (Taylor 2005).
LOFAR, the Low-Frequency Array designed and constructed by ASTRON, has facilities in several countries, that are owned by various parties (each with their own funding sources), and that are collectively operated by the International LOFAR Telescope (ILT) foundation under a joint scientific policy. The Open University is incorporated by Royal Charter (RC 000391), an exempt charity in England & Wales and a charity registered in Scotland (SC 038302). The Open University is authorized and regulated by the Financial Conduct Authority. Figure A1. Postage stamps of extended sources identified visually, showing all the sources with approximate LAS >45 arcsec. These include some Giant Radio Galaxies. The grey-scale shows the flux density from −3σ local to 30σ local , where σ local is the local rms noise. The scalebar in each image is 1 arcmin. Figure A2. Postage stamps of large diffuse sources identified by eye. The grey-scale shows the flux density from −3σ local to 15σ local , where σ local is the local rms noise. The scalebar in each image is 1 arcmin.

A P P E N D I X B : S O U R C E S I Z E D I S T R I B U T I O N
We have investigated the extent to which the extrapolation of the Windhorst et al. (1990) size distribution might be valid at low frequencies. This is relevant to the resolution bias correction to the source counts, described in Section 5.2.4. We have done this by comparing the true (deconvolved) angular size distribution of the 150-MHz LOFAR sources to the Windhorst et al. (1990) distribution, given by h(> θ max ) = exp[− ln 2(θ max /θ med ) 0.62 ], with the median size, in arcsec, as a function of flux density of θ med = 2S 0.30 1.4 GHz , where S is in mJy, and we have scaled the 1.4-GHz flux densities to 150 MHz with a spectral index of −0.8. The observed and extrapolated size distributions are shown in Fig. B1 for four flux density bins. The low-frequency emission appears to be more extended, which would suggest that the actual resolution bias correction should be somewhat larger. Future LOFAR Survey results, in particular, at high resolution using very long baseline interferometry with the LOFAR international stations, will allow for more detailed studies of the true angular size distribution of radio sources. Figure B1. The observed, deconvolved, source size distribution (in blue) in four flux density intervals. The magenta curves show the Windhorst et al. (1990) size distribution for the upper and lower bounds of the flux density bin. In each the vertical dotted lines are b maj (on the left), and the approximate maximum size a source can have before it drops below the peak-flux detection threshold (on the right). The catalogue will be incomplete for sources larger than the right line.