The VLBA CANDELS GOODS-North Survey. I - Survey Design, Processing, Data Products, and Source Counts

The past decade has seen significant advances in wide-field cm-wave very long baseline interferometry (VLBI), which is timely given the wide-area, synoptic survey-driven strategy of major facilities across the electromagnetic spectrum. While wide-field VLBI poses significant post-processing challenges that can severely curtail its potential scientific yield, many developments in the km-scale connected-element interferometer sphere are directly applicable to addressing these. Here we present the design, processing, data products, and source counts from a deep (11 𝜇 Jy beam − 1 ), quasi-uniform sensitivity, contiguous wide-field (160 arcmin 2 ) 1.6 GHz VLBI survey of the CANDELS GOODS-North field. This is one of the best-studied extragalactic fields at milli-arcsecond resolution and, therefore, is well-suited as a comparative study for our Tera-pixel VLBI image. The derived VLBI source counts show consistency with those measured in the COSMOS field, which broadly traces the AGN population detected in arcsecond-scale radio surveys. However, there is a distinctive flattening in the 𝑆 1 . 4GHz ∼ 100-500 𝜇 Jy flux density range, which suggests a transition in the population of compact faint radio sources, qualitatively consistent with the excess source counts at 15 GHz that is argued to be an unmodelled population of radio cores. This survey approach will assist in deriving robust VLBI source counts and broadening the discovery space for future wide-field VLBI surveys, including VLBI with the Square Kilometre Array, which will include new large field-of-view antennas on the African continent at ≳ 1000 km baselines. In addition, it may be useful in the design of both monitoring and/or rapidly triggered VLBI transient programmes.


INTRODUCTION
This decade sees a slew of cm-wavelength, arcsec-scale radio surveys with wide areal coverage (≳1 deg 2 ).The primary science objectives thereof include measuring the cosmic star formation rate history; constraining the population of low-luminosity and Compton-thick active galactic nuclei (AGN); improved statistical understanding of jet triggering and mechanical feedback; the relation to neutral hydrogen and molecular gas reservoirs in galaxies and large-scale struc-★ E-mail: roger.deane@wits.ac.za ture; image-plane transient and variability searches; as well as a range of cosmological applications through weak lensing and multiwavelength cross-correlation experiments enabled by wide and deep surveys at these angular scales (e.g.Schinnerer et al. 2010;Norris et al. 2011;Condon et al. 2012;Heywood et al. 2016;Jarvis et al. 2016;Smolčić et al. 2017;Murphy et al. 2017;Heywood et al. 2020;Muxlow et al. 2020;Chowdhury et al. 2022;Heywood et al. 2022;Hurley-Walker et al. 2022;Best et al. 2023, and references therein).These arcsecond-scale interferometric surveys at cm-wavelengths cover a wide range in depth and area on facilities including MeerKAT, the upgraded Karl G. Jansky Very Large Ar-ray, the Australian SKA Pathfinder Telescope, the Giant Metrewave Radio Telescope, the Murchison Widefield Array, and the International LOFAR Telescope.This survey-driven paradigm is motivated by statistical requirements of the above-mentioned scientific objectives; however, they typically lack higher angular resolution counterparts to address several long-standing questions faced in galaxy and AGN evolution, including the relative radio emission contributions from star formation and AGN-related activity for spatially unresolved, low-luminosity sources, as well as the discovery potential for scientifically-rich individual sources (e.g.binary supermassive black holes, gravitational lenses).In this pursuit, Very Long Baseline Interferometry (VLBI) can play a pivotal role as it spatially filters high brightness temperature emission and isolates radio cores and compact jets, hence constraining the compact AGN contribution.The enormous strides made over the past decade in VLBI survey area and depth make this a compelling approach, in concert with multi-wavelength programmes, to survey supermassive black hole accretion in the Universe if processing constraints can be overcome.
Since the first pioneering steps in wide-field VLBI (Garrett et al. 1996(Garrett et al. , 1999(Garrett et al. , 2001;;Lenc et al. 2008;Chi et al. 2013) it has been clear that the post-processing was an impracticably expensive computational task to perform using typical approaches, even for moderate fractions (≪ 0.1) of the available field-of-view.The problem stems from the required time and frequency resolution required to avoid significant sensitivity losses due to time and bandwidth smearing (e.g.Thompson et al. 2017).The required resolution to achieve this scales with baseline length, meaning that VLBI arrays require orders of magnitude higher time and frequency resolution to process the full field-of-view when compared to a km-scale interferometer.This results in many orders of magnitude larger data rates and storage requirements despite significantly lower source sky densities at a given flux density threshold.This has not hampered progress in VLBI since its first half-century has been almost exclusively focused on pointed observations of single objects at the centre of the narrow processed field-of-view.The VLBI visibility data are thus heavily averaged in both time and frequency since the processed and imaging field-of-view is typically restricted to only regions of at most a few arcseconds away from the pointing centres or phase centres.Given the low cm-VLBI sky source density at a ≫ 1 mJy sensitivity level, it was relatively seldom that there would be multiple detectable sources in the field before major bandwidth upgrades, so processed fields-of-view were typically limited to ≲ 1 arcsec 2 , rather than attempting to image the available field-of-view on the order of ∼0.1-1 deg 2 (with notable exceptions, of course, e.g.Garrett et al. 1996;Chi et al. 2013).
In order to image a non-negligible fraction of the available fieldof-view while minimizing point-source-sensitivity loss, the required time and frequency resolution leads to large processing demands, which are not scalable to wide-area surveys (≫ 1 deg 2 ) with the current archiving and correlator capacity of VLBI network operators.A transformational step in wide-field VLBI capability was the multiphase centre correlation technique (Morgan et al. 2011), which was implemented in the DiFX software correlator (Deller et al. 2011) and the SFXC correlator (Keimpema et al. 2015).This approach saves orders of magnitude in data volume over full-field imaging and has the added computational benefit of parallelised data processing streams.This has enabled wide-field VLBI observations of sources with a flux density of a few tens of Jy with relative ease and enabling more efficient wide-field VLBI surveys (e.g.Middelberg et al. 2011Middelberg et al. , 2013;;Deller & Middelberg 2014;Herrera Ruiz et al. 2017;Radcliffe et al. 2018;Petrov 2021).Contemporary wide-field VLBI surveys therefore have two possible strategies: (i) record the data at sufficiently high time and frequency resolution in order to image the entire region of interest, or (ii) image a number of considerably smaller sub-regions centred on a catalogue of phase centres, the positions of which are typically selected from known arcsec-scale radio or multi-wavelength detections.
Using the latter approach, recent wide-field VLBI surveys have demonstrated that VLBI is an integral and unique tool in the statistical study of AGN activity over cosmic time.The current state-ofthe-art for deep VLBI extragalactic fields is the COSMOS VLBA Survey (Herrera Ruiz et al. 2017) which employed the multi-phase centre technique to detect 468 VLA-detected sources down to an average noise rms  ∼ 10 Jy beam −1 in the 2 deg 2 COSMOS field.This was followed up with a ∼3× deeper, narrower tier which included the Greenbank Telescope and identified 35 sources below the 'VLBA-only' sensitivity, enabling the deepest VLBI source counts constraints to date.Of particular interest and utility in these surveys is a more detailed understanding of how radio source counts at milliarcsecond scales compared to the more readily available, higher brightness temperature sensitivity arcsec-scale radio surveys.This is a key step toward a statistical determination of what dominates the contribution of radio flux for different source populations.Therefore, there is a strong motivation for improved statistical power in the number of VLBI detections at ≲ 10 Jy flux density levels in legacy multi-wavelength extragalactic fields, enabling robust host galaxy characterisation and an enriched astrophysical analysis.
In this paper, we present a wide-field VLBI survey that aims to build on the technical progress described above and enhance our statistical understanding of compact radio sources.In Section 2 and Section 3, we outline the motivation and design of a quasi-uniform VLBI survey of an extragalactic legacy field.In Section 4, 5, and 6, we describe the technical details of the observations, calibration, imaging and source-finding techniques required to achieve this.Section 7 presents the cross-calibration catalogue and images.In Section 8, we derive the differential radio source counts from this uniform area survey and compare with them the COSMOS field VLBI source counts (Herrera Ruiz et al. 2018), alongside arcsec-scale radio surveys (e.g.Smolčić et al. 2017;Hale et al. 2023).
The data products from this survey will be used to carry out several analyses presented later in this paper series.These include a detailed comparison with other radio surveys, host galaxy properties, and analysis of the origin of the radio emission in Njeri et al. (in press, Paper II hereafter); a 12-epoch VLBI transient and variability search over >2 months; as well as leveraging the uniform sensitivity and VLBA's homogeneity to carry out a systematic study of statistical self-calibration schemes (e.g.Middelberg et al. 2013;Radcliffe et al. 2016).

SURVEY SCIENCE AND TECHNICAL DRIVERS
Here, we outline the primary motivations for carrying out a quasiuniform sensitivity survey over a deep, multi-wavelength extragalactic legacy field, as compared to the more traditional approach of selected phase centres on known radio detections from arcsec-scale interferometers.As discussed, different aspects of these will be presented in independent papers in the series, however, we briefly summarize them below.
(ii) The measurement of differential VLBI source counts at ∼10 Jy beam −1 sensitivity over a well-defined area for direct comparison with arcsec-scale radio surveys to better understand the relevant source populations.
(iii) Statistical approaches to the scientific analysis enabled by the deep multi-wavelength coverage, including high-resolution imaging from the Hubble Space Telescope (HST), Atacama Large Millimeter/submillimeter Array (ALMA), and the James Webb Space Telescope (JWST) in the future (e.g.Lindroos et al. 2016;Inami et al. 2020).
(iv) A systematic study of statistical self-calibration (i.e.multisource self-calibration, Middelberg et al. 2013;Radcliffe et al. 2016) using a homogeneous VLBI array, comparing the tradeoff between using a large number of marginally-detected sources with a small number of higher signal-to-noise detections.
(v) Scientifically useful lower limits on the compact, highbrightness radio emission within all galaxies catalogued within the selected extragalactic field, which can assist multi-wavelength AGN classification if sufficiently deep (e.g.Whittam et al. 2022).
Maximising the above required careful selection of the target extragalactic legacy field for this quasi-uniform VLBI survey experiment, which naturally has an AGN and galaxy evolution focus.A key requirement was the need for deep, high-resolution multi-wavelength coverage from radio through X-ray.Furthermore, previous VLBI observations were desirable as a comparison of known sources identified and characterized using the traditional VLBI approach Chi et al. (2013).We selected the CANDELS GOODS-North field as optimal for the above purpose.In addition, the field's Declination of +62 deg results in a favourable uv-coverage with the VLBA, resulting in a point spread function with comparatively low sidelobes and, therefore, relatively high imaging fidelity.We describe some of the key science drivers in surveying this field and extensions to it below.These are addressed in the current paper and subsequent papers in the series.

Comparison of low-luminosity AGN source counts from milli-arcsecond to arcsecond scales:
The unambiguous identification of radio-quiet, low-luminosity AGN remains a challenge with arcsec-resolution radio surveys, particularly as they reach Jy-level sensitivity (see, e.g.Padovani 2016, and references therein).These increasingly sensitive and wide area surveys show broad consistency but also some deviations from model predictions (e.g.Wilman et al. 2008;Bonaldi et al. 2019).For example, Whittam et al. (2013Whittam et al. ( , 2017Whittam et al. ( , 2020) ) use 15 GHz radio observations to show that there are a population of faint radio sources which are not included in simulated source counts (e.g.Wilman et al. 2008).They argue that these sources are the cores of low-luminosity, compact radio galaxies (potentially faint Fanaroff-Riley I (FRI) sources, Fanaroff & Riley 1974), which are not accounted for in models of the faint radio sky.VLBI observations are the ideal tool for probing these compact AGN, which are poorly understood.Furthermore, Hale et al. (2023) derive MeerKAT source counts at ∼ 15 − 100 Jy flux densities that are larger than model predictions; however, the relative split between AGN and star formation remains unclear.The role of low luminosity AGN and their compact radio emission properties is important to discern to understand its role in galaxy evolution, and larger VLBI-detected samples at low flux densities clearly provide a unique perspective (e.g.Herrera Ruiz et al. 2016).Improved statistical power of VLBI source counts at the  1.4GHz ≲ 100 Jy level over a wider range of extragalactic legacy fields will provide unique insights and constraints on the relative distribution of AGN and star-formation powered radio sources at low radio luminosity.

Host galaxy morphologies of VLBI-selected AGN:
The CANDELS programme is an HST near-infrared through ultraviolet legacy survey (Grogin et al. 2011;Koekemoer et al. 2011) that studied the evolution of black holes and galaxies between  = 1.5 − 8 and revealed a number of host galaxy properties of X-ray-selected AGN at intermediate to high redshift (e.g.Kocevski et al. 2012).The VLBA central pointing is based on the deeper, high-fidelity imaging of the CANDELS chip positions, enabling detailed analyses of VLBI-selected AGN host morphology and comparison with X-ray selection (and lower resolution radio), thereby probing the question of compact jet-triggering as a function of environment, addressing from a VLBI perspective, seemingly conflicting results on whether or not (1) major mergers play the dominant role in triggering AGN activity at higher redshifts (e.g.Hewlett et al. 2017;Marian et al. 2019) as outlined in the classical Sanders et al. (1988) scenario, where the merger-induced loss of angular momentum leads to black hole accretion (e.g.Hopkins et al. 2006); and (2) whether AGN hosts at  ∼ 1 − 3 are predominantly disk-like in morphology (e.g.Schawinski et al. 2011), and comparison with VLBI-selected local AGN (e.g.Kaviraj et al. 2015).The CANDELS fields have unparalleled optical/infrared image quality that enables the robust morphological modelling of the host galaxies required for these lines of study.
Probing the population of obscured AGN: Despite their significant cosmological importance, obscured quasars remain an elusive population in multi-wavelength surveys.Observations appear to confirm Silk & Rees (1998) and Fabian (1999) predictions that the Compton-thick AGN space density increases significantly towards higher redshift (Gilli et al. 2001;La Franca et al. 2005;Hopkins et al. 2006;Treister et al. 2009;Ueda et al. 2014;Gilli et al. 2022).While sophisticated selection techniques to select obscured quasars exist (e.g.Martínez-Sansigre et al. 2005), the dust insensitive, high-brightness temperature filter that VLBI observations provide makes this an important, complementary contribution towards obscured AGN identification.This is supported by Delvecchio et al. (2017), Radcliffe et al. (2021), andWhittam et al. (2022), who all show evidence that no single classification technique can reliably identify all VLBI sources in extragalactic fields as AGN, making these important detections to understand with greater statistical power.
The search for binary/dual and recoiling AGN: From a theoretical standpoint, we expect binary supermassive black holes to be common in the Universe (Begelman et al. 1980;Colpi & Dotti 2011).However, our observations at present do not agree with this forecasted ubiquity (Burke-Spolaor 2011; Koss et al. 2012;Comerford et al. 2013;Colpi 2014;Deane et al. 2015;De Rosa et al. 2019).This is a crucial disparity to reconcile as dual/binary supermassive black holes are predicted to play a significant role in galaxy evolution (e.g.Merritt & Milosavljević 2005;Van Wassenhove et al. 2012;Mayer 2013), for which observations show evidence, although spatial resolution can limit the ability to decouple this from the galaxy merger process in general (e.g.Komossa et al. 2003;Comerford et al. 2013;Ellison et al. 2013).Furthermore, binary supermassive black holes are expected to dominate the recently detected stochastic gravitational wave background at nanoHz frequencies (e.g.Sesana et al. 2008;Shannon et al. 2015;Burke-Spolaor et al. 2019;Agazie et al. 2023a;Antoniadis et al. 2023a;Reardon et al. 2023), with a poorly constrained dampening factor that gaseous environments and orbital eccentricity expected to play though their modification of the binary in-spiral rate (e.g.Ravi et al. 2014;Agazie et al. 2023b;Antoniadis et al. 2023b).VLBI has been shown to be an excellent method to discover binary/dual AGN (e.g. even in the classical radio galaxy, Cygnus A, Perley et al. 2017); as well as wide-field surveys (e.g.Herrera Ruiz et al. 2017;Njeri et al. 2023).Offset, potentially recoiling AGN are also expected during the merger process, with several candidates identified, simulations developed, and large-scale searches underway (e.g.Civano et al. 2010;Blecha et al. 2016;Hwang et al. 2020).The angular resolution of VLBI provides a unique perspective, and further, the environments are expected to be gas-rich and dust-obscured (e.g.Satyapal et al. 2017), adding impetus on high angular resolution radio observations in this multi-wavelength, multi-messenger field of astrophysics.Serendipitous search As outlined earlier, wide-field VLBI has demonstrated its ability to discover rare, astrophysically important objects (e.g.gravitational lenses, binary supermassive black holes; Herrera Ruiz et al. 2017;Spingola et al. 2019).As radio surveys increasingly improve the ability to probe the dynamic radio sky towards the SKA era (e.g.Bignall et al. 2015;Fender et al. 2015;Mooley et al. 2016;Radcliffe et al. 2019;Sarbadhicary et al. 2021), advances in wide-VLBI approaches offer a unique discovery technique, opening up the milli-arcsecond scale parameter space over increasingly wider areas with higher sensitivity.

SURVEY TECHNICAL DESIGN
Several scheduling, processing, and software considerations had to be made in the design of this survey, which we describe in this section.The primary objectives were to cover the CANDELS GOODS-North area of 160 arcmin 2 with contiguous, relatively uniform sensitivity while still retaining practical correlator and archive resource requests, as well as feasible processing requirements to carry out the imaging.A similar approach was taken to generating a large area (77 arcmin 2 ), quasi-uniform sensitivity map of the nearby face-on spiral galaxy M 51 to search for supernovae, X-ray binaries, and other transients (Rampadarath et al. 2015).As discussed in Section 1, if contiguous quasi-uniform sensitivity is the objective, then there are two possible strategies: either one must record the data at sufficient time and frequency resolution to image the entire field of interest; or configure multiple phase centres on a regular grid with a spacing based on time and bandwidth smearing considerations.Our selection of the latter approach requires dramatically lower instantaneous computational cost and available random-access memory than the former (particularly for imaging), however, it is still significantly more computationally expensive than the targeted approach of only placing phase centres on known arcsec-scale radio sources and imaging a small (few arcsec 2 ) region at those locations.
At the time of the proposal, our assessment was that image sizes of approximately 64,000×64,000 (64k hereafter) would be near the practical limit of the compute resources we had at our disposal.This imaging consideration set the angular area that would be imaged by each phase centre to approximately 64×64 arcsec 2 .To limit time and frequency smearing losses to the ≲20 per cent level, the correlator dump time and channel width were set to 2 seconds and 250 kHz, respectively.
The phase centres were positioned to follow a standard hexagonal mosaic pattern used for radio surveys with multiple pointings (see Figure 1).However, instead of arranging pointings that lie at the half-power point of their neighbour in Right Ascension to critically sample the sky, we position our phase centres using the constraint that no part of the field should have smearing sensitivity losses larger than the selected ∼20 per cent level for ∼4000 km baselines.This results in a configuration with a 'triangle' of adjacent phase centres that intersect at a radius of ∼35 arcsec.Note that phase centres are not jointly imaged since they are simply phase-rotated versions of one another and stem from the same original electric field measurements captured by the telescope, therefore, the noise is not independent.The locations were originally selected based on the planned HST chip positions, but as can be seen in Figure 1, there is imperfect coverage, depending on which HST filter is considered.
The configuration described required a total of 205 phase centres, which lay within another design constraint; the VLBA correlator limits on the total data output rate, which in turn limits the total number of phase centres for a given observational setup.The total data set size of all 205 phase centres is approximately 4 Terabytes, in fitsidi format.
Our approach of a regular grid of phase centres also has the disadvantage of a known source potentially being located in a region with higher noise rms than the full array sensitivity that a co-located phase centre would offer.Another disadvantage, or at least potential additional complexity to the data processing, is the primary beam correction (particularly if imaging is performed near or beyond the half power radius), which is described in Section 5.1.Despite these drawbacks, we choose to explore this approach with an emphasis on the possible variable/transient discovery parameter space offered by the quasi-uniform sensitivity, as well as our ignorance of the location of sources that may cross the detection threshold once multi-source selfcalibration is applied in future work.Quasi-uniform sensitivity may become especially important in deriving robust source counts with the expected additional detections from multi-source self-calibration.Given the flattening of the source counts near the detection threshold (described in Section 8), these new detections could be numerous and so the quasi-uniform sensitivity is an advantage.
A total of 24 hr of observing time was split into twelve approximately 2 hr schedule blocks to ease schedulability, detailed in Table 1.Each of these was chosen with suitably chosen starting hour angles in order to improve the combined uv-coverage (see Figure 3), however, not all had the full complement of 10 antennas participating (see Figure 2).
For each ∼2-hr schedule block, J0927+390, was observed for five minutes as a fringe finder (i.e.solved for delay and delay rate errors).The observations were made using the standard phase referencing mode with J1234+619 used as the complex gain calibrator, which is approximately 24. ′ 7 from the target field pointing centre.This calibrator was observed for one minute every five minutes.In total, the on-source integration time on the GOODS-North field was approximately 15.2 hours.This gives an expected thermal noise of 10 Jy beam −1 , assuming that 20 per cent of data was lost to radio frequency interference (RFI).

CALIBRATION
Calibration of these data was conducted entirely using the Common Astronomy Software Applications (CASA) software (McMullin et al. 2007;CASA Team et al. 2022).This software package now has all the tools necessary to calibrate VLBI data from the raw correlator data to science-ready images (Janssen et al. 2019;van Bemmel et al. 2019;CASA Team et al. 2022;van Bemmel et al. 2022).The multiple phase centre correlation method only includes the calibrator sources in one of the output data sets, not all.Any antenna-based calibration derived for this data set containing all calibrators is applied to all other phase centres to carry out direction-independent calibration.Here, we outline the calibration steps performed, noting the relevant CASA tasks in parentheses.
The data from each individual epoch was converted from fitsidi format to a CASA-compatible measurement set (importfitsidi) and concatenated together so that all of the separate epochs could be    A-E) that the twelve ∼2-hr schedule blocks were split into (see Table 1).
calibrated together (concat).Next, a priori calibration was derived.This includes corrections for the errors in the sampler thresholds (accor), the conversion of system temperature measurement ( sys ) into a CASA-compatible calibration table -to permit accurate flux scaling (gencal) -and the derivation of the VLBA gain curves (gencal).RFI was excised using the AOFlagger software (Offringa et al. 2012).Approximately 5 per cent of the channels at the edges of the spectral windows were removed, and auto-correlations were flagged (flagdata).Instrumental delays (the delay induced by differing the electronic paths from receiver to disk/correlator across the bandwidth) for each epoch were derived using a 2-minute solution interval on the bright source J0927+390 (fringefit).The application of these solutions removes phase discontinuities between the spectral windows.The delay rates were set to zero to avoid interpolation errors in time when these solutions are applied.We then derived normalised bandpass corrections using J0927+390 (bandpass).Next, time-variable delays, phase and delay rates were derived for each scan on the complex gain calibrator J1234+619 (fringefit).As the phase jumps between spectral windows were removed earlier, the spectral windows can now be combined to increase the signal-tonoise ratio (SNR) when deriving solutions.A small proportion of solutions (∼ 5 per cent) failed, which we attribute to the phase calibrator being relatively weak ( 1.6GHz ∼ 18 mJy).To minimize the loss of valid data, these flagged data were recovered by linearly interpolating between the nearest good solutions.
These solutions were applied to the phase calibrator source (applycal), and the phase calibrator was imaged (tclean).It was found that the phase calibrator is marginally resolved, as is clearly seen in both the visibility and image domains (see Figure 4), therefore, the delay, rates and phase corrections were refined using a model of the source.These solutions improved the signal-to-noise ratio of the phase calibrator image by ∼10 per cent.Self-calibration was then conducted on the phase calibrator to refine the amplitude and phase solutions.Typical VLBI observations at GHz frequencies will correct for the dispersive delays caused by the ionosphere.However, this correction is currently not available in CASA, so phase self-calibration was performed without combining the spectral windows together.This allowed us to approximate the ionospheric dispersive delays across the bandwidth using a step-wise approximation.Finally, amplitude self-calibration was performed to correct for  sys fluctuations and variable antenna gains over the course of the observations.
With the direction-independent, antenna-based calibration products derived, these were applied to the target fields in all 205 data sets.It is worth noting that each phase centre data set was individually flagged using AOFlagger, rather than the flags being transferred from a single phase centre.This is due to the different levels of RFI decorrelation at different phase centre coordinates, requiring each to be flagged individually for optimum RFI excision.This means that the flags for each data set are unique and thus need to be flagged individually (J.Morgan / M. Argo private communication; Morgan et al. 2013).

Primary beam correction
As demonstrated in Middelberg et al. (2013), the primary beam of the VLBA is very well approximated as an Airy disk with a diameter,  = 25.48 m.The Airy disk is simply the Fraunhofer diffraction pattern of a uniformly illuminated dish and is given by: where  1 () is the Bessel function of order one,  is the observing wavelength, and  is the radial distance from the pointing centre.The primary beam is normalised to the maximum response, so  0 = 1, and we assume that the primary beam is radially symmetric.As discussed, in typical wide-field VLBI observations, the phase centres are pre-selected (based on previous low-resolution radio observations) so that sources of interest are located at the centre of each phase centre.The subsequent primary beam corrections derived are only correct for the centre of the phase centre (these are often corrected in the -plane; e.g.see Middelberg et al. 2013;Cao et al. 2014;Radcliffe et al. 2018).However, in this case, the phase centres are not pre-selected based on known radio sources; therefore, the sources of interest are unlikely to be located near the phase centre.This means that the primary beam needs to be corrected across the whole image, or rather, wherever a source appears in that image.Since the VLBA is a homogeneous array, this primary beam correction can be conducted in the image plane, in contrast to heterogenous arrays, where the differing primary beam shapes can cause baselinebased amplitude errors, which must be accounted for in the -plane.In this survey, we calculate the primary beam response for each candidate detection in our catalogue using its location and Equation 1.We note that this approach does not account for the so-called 'beam squint' of the VLBA, caused by the offset between the beams for the two polarisations.Previous wide-field VLBI surveys have corrected for this using a frequency-dependent, per-antenna, visibility-based based technique (e.g.Middelberg et al. 2013;Herrera Ruiz et al. 2017).We opt not to apply this as the vast majority of our images are within the 80 per cent power point of the primary beam response; the significant added computational expense; as well as the low SNR of the majority of the sources, meaning this correction would likely be sub-dominant in comparison with the statistical uncertainties.
The achieved primary beam corrected sensitivities are shown in Figure 1.Given that our phase centres are relatively near the pointing centre (within the ∼80 per cent power point) and our relatively small fractional bandwidth (∼20 per cent), we assume an effective frequency of 1.6 GHz for all phase centres and VLBI detections.

IMAGING STRATEGY AND SOURCE-FINDING ALGORITHM
As described in Section 3, we choose per-phase centre image dimensions based on several survey design, hardware and software considerations that impact data processing performance.We use WSClean to image each phase centre, using a pixel size of 1 milliarcsecond and pixel dimensions of 64,000×64,000.These 64k images are not deconvolved due to the significant additional computational resources required to do so and the negligible benefit this provides for our field.There are no sources that are sufficiently bright to significantly impact the image dynamic range beyond a few arcsec, and therefore, the search for candidate sources.
For computing efficiency and calculation of the local rms, the 64k images are subdivided into 64 × 64 sub-images (each 1000 × 1000 pixels), referred to as 1k sub-images hereafter.We compute the maximum SNR in each 1k sub-image, defining the local rms as the standard deviation of the entire sub-image.The maximum SNR is computed as the maximum pixel value divided by the local rms.We do this for the native angular resolution 1k sub-images, as well as three derived images, which are convolutions of the original 1k sub-images with 2D Gaussian kernels with FWHM of 5, 10, and 20 mas.This approach improves the detection probability of extended sources, as is sometimes employed in more traditional source-finders.Ideally, we would rather employ a uv-taper to generate these smoothed images; however, for data processing and hard disk storage practicalities, we employ the image-domain smoothing approach.
We employ a stratified limiting SNR threshold to identify candidate sources.In the first, simplest case, we include in our catalogue all VLBI peaks with SNR > 7 in any one of the original or smoothed 1k sub-images.For VLBI peaks with 5.5 < SNR < 7, we only include those with a multi-wavelength source within 0.5 arcsec of the VLBI peak.This multi-wavelength cross-matching is performed using source positions from Chandra X-ray (Alexander et al. 2003); Spitzer infrared (Ashby et al. 2015), 3D-HST (Skelton et al. 2014) and VLA 1.4 GHz (Morrison et al. 2010).
In total, we find 24 candidate sources using these selection criteria, the majority of which are selected via the multi-wavelength cross-matching technique.These sources are distributed throughout the survey footprint as seen in Figure 5.As is detailed in Paper II (Njeri et al., in press), each of these 24 sources are common with catalogues from the EVN, VLA, and e-MERLIN, in addition to multi-wavelength counterparts.
Since several sources are spatially resolved, with morphological features of interest, we deconvolve sources once they have been identified in the 64k images, using much smaller 128 × 128 pixel images that are centred on candidate detections using on-the-fly phase rotation.These are Cleaned to a depth of ∼ 1 of the noise level, using both natural and uniform weighting schemes, using a circular mask centred on the peak and with a radius of twice the beam FWHM, down to a noise threshold of 4.Two-dimensional Gaussians are used to model the emission using Casa's imfit task.These deconvolved images are shown in Figure 6, and the resultant catalogue in Table 2 lists the integrated flux density from this procedure using the naturally weighted images.Note that because these are deconvolved, the SNR is marginally enhanced compared to the dirty 64k images within which these candidates are first identified.Similarly, they are unsmoothed, so they may also appear below the SNR threshold, which is crossed if convolved with one of the three Gaussian kernels used within the source-finding procedure described earlier.Our aim here is to provide a repeatable method of candidate source selection for more detailed scrutiny using multi-wavelength data and future multi-source self-calibration results.An analysis of the astrometric registration accuracy of the VLBA detections is carried out in Paper II, which finds the astrometry to be consistent with previous Chi et al. ( 2013) and Radcliffe et al. ( 2018) EVN 1.6 GHz results within the uncertainties.

Statistical Considerations
In total, this survey generates approximately 0.5 Terapixels of imaging.Therefore, we need to pay careful attention to avoiding spurious detections, given the large number of pixels.As outlined in Morgan et al. (2013), under the assumption of a Gaussian noise distribution, with an approximately constant noise rms across the image area considered, the cumulative probability distribution function of the image pixel brightness values is described by, where  is the pixel brightness value and  is the noise rms.Equation 2 and our Gaussianity assumptions imply that our imposed 7 SNR threshold would result in a total of  spur ≲ 0.01 false-positive VLBI peaks in our maps, given the ∼ 1×10 10 independent VLBI resolution elements (defined by restoring beam dimensions) in what is effectively a 0.5 Terapixel image.This low rate would, therefore, also allow for deviations from a Gaussian noise distribution, for which we find no evidence in our analysis of the wide-field images.Therefore, we are confident that all VLBI peaks above the 7 threshold are bona fide sources, and indeed, each of them has a clear multi-wavelength counterpart, just as the 5.5 − 7 detections do (since this is a requirement for their inclusion).
Examining the measured brightness of all 24 detections as a function of their distance from both the nearest phase centres and the VLBA pointing centres reveals no obvious trends (Figure 7), providing some qualitative support that there are no strong biases immediately apparent due to the survey strategy and its practical implementation.This will be further tested with larger source counts in upcoming, wider-field, higher-sensitivity VLBI surveys.
The false-positive rate is considerably higher for the 5.5 threshold, which we expect to result in  spur ∼ 94 spurious peaks within our entire imaging area (i.e.∼0.5 per 64k image).Attempts to lower the threshold significantly below 7 must incorporate additional information to remove false positives, which we do using multiwavelength catalogues, as previously described.Here, the probability that a  > 5.5 noise peak lies within 0.5 arcsec of a catalogued multi-wavelength source is assumed to be, where  s is the cross-matched multi-wavelength catalogue source density,  cross is the cross matching radius, and Ω is the VLBI imaging area.The source densities for the Spitzer infrared, Chandra X-ray, HST optical/near-infrared, and VLA surveys range from  s ∼ 10 3−6 deg −2 , meaning that even for the highest source density catalogues used, the probability of cross-matching a VLBI noise peak above 5.5 with a multi-wavelength source over the 160 arcmin 2 area is less than 2 per cent, a probability comparable to the 7 'VLBIonly' threshold, which we deem acceptable for generating a robust cross-calibration in this first paper of the series.We do not take the image-plane smoothing into account when computing the above statistics, however, we do not see any evidence that this leads to spurious detections, as detailed in Paper II. Naturally, one could argue that the 7 'VLBI-only' and 5.5 'multi-wavelength' thresholds applied here are somewhat arbitrary, apart from the manual investigation of a range of values we performed and the qualitative assessment thereof.This topic is worthy of a detailed systematic study, which is an enormous computational task and is beyond the scope of this survey overview paper.We explore this topic in a future paper in this series, with several motivations, including the use of low-SNR candidate detections to perform multisource self-calibration.In principle, including additional sources will improve the quality of the self-calibrated gain solutions and, hence, the image sensitivity and fidelity.Two of the questions this future work will explore are (i) the optimal thresholds in this process and (ii) the resultant false-positive rates.

SURVEY DATA PRODUCTS
Here, we describe the four primary survey data products and how they are used in companion papers.
(i) There are 205 × 64k total intensity dirty images used in candidate source identification.Each image is approximately 17 GB in size (FITS format) and serves as a reference for comparison with the statistical calibration ensembles to be carried out in a future paper.These images will also be used as a comparison for potential future transient source searches in this field.
(ii) A catalogue of candidate sources, referred to as the crosscalibration catalogue.This is a master catalogue from which several derivative catalogues are drawn in Paper II, incorporating a range of comparisons with other radio surveys, as well as multi-wavelength comparisons (Njeri et al., in press).The source-finding approach used to generate this is described in Section 6.The full catalogue is listed in Table 2, which includes the apparent and primary-beam corrected integrated flux density.All multi-wavelength cross-matching and intrinsic source parameter descriptions are detailed in Paper II.
(iii) Cleaned narrow-field total intensity images of each candidate source at a range of robust values, with the primary beam correction applied, following the method described in Section 6.These can be used for more detailed individual analysis of each source, its location within and the morphology of its host galaxy.
(iv) To carry out a transient/variability search, we generated a 64k image for each of the 12 observing epochs, resulting in 12 × 205 = 2460 total intensity dirty single-epoch images, each with the same 64k dimensions and a typical rms of  epoch ∼ 38 Jy beam −1 .This computationally intensive task required the use of WSClean's Image Domain Gridder (IDG) in combination with an NVIDA A40 Graphics Processing Unit (GPU) to increase the processing speed.The analysis of these will be reported in a future paper.

SOURCE COUNTS
In Section 6, we present the 24 detections made in this survey over 160 arcmin 2 .While this is a small sample size that provides relatively poor constraints on the inferred source sky density, the unique feature of this VLBI survey is the quasi-uniform sensitivity over a welldefined area within a well-studied extragalactic legacy field.This is the aspiration goal of future radio facilities that will have baseline lengths ranging from a few tens of metres out to trans-continental scales.In this section, we examine what this first extragalactic survey of its kind is able to contribute to our constraints of mas-scale radio source populations, with a detailed analysis of radio and host galaxy properties presented in Paper II.
Source counts have been used for many decades to better understand the radio sky (e.g.Ryle 1958;Condon 1984).Contemporary applications typically show the differential source counts, d/d  , as a function of source flux density,   .At GHz frequencies, this is generally applied to arcsec-scale resolution source counts in total intensity (e.g.Owen & Morrison 2008;de Zotti et al. 2010;Smolčić et al. 2017;Matthews et al. 2021) and in polarized intensity (e.g.Hales et al. 2014).Differential source counts were derived by Herrera Ruiz et al. (2018) using the VLBA COSMOS survey.They argue that the close proximity of the VLBA 1.4 GHz counts to the VLA 3 GHz source counts (Smolčić et al. 2017) was a sign of consistency, implying that most of the lower luminosity radio AGN were accounted for in the VLBA COSMOS survey in the ∼ 0.1−1 mJy beam −1 range.In Fig. 8, we show the derived source counts for the VLBA CANDELS GOODS-North Survey.A point that Herrera Ruiz et al. (2018) stress is that their Euclidean-normalized VLBI differential source counts are lower limits on the true counts at larger (i.e.arcsec) scales, which do not filter out low-brightness temperature emission.However, this is not strictly correct as the fraction of VLBI-detected sources in a given arcsec-scale radio flux density bin is not constant at all flux densities, as shown in Deller & Middelberg (2014).So, while it is true that VLBI source counts will typically be lower than arcsec-scale counts at a given flux density, we should be careful not to treat VLBI and arsec-scale differential source counts in a given flux density bin as part of the same population, which the 'upper limit' terminology may incorrectly be interpreted as.
In Figure 8, we show the Euclidean-normalized source counts for the COSMOS field for MeerKAT MIGHTEE 1.28 GHz (Jarvis  [-3,3,5,7,9,11,13].See Table 2 for source-specific values and coordinates.A representative synthesized beam geometry is shown in the bottom-left panel in blue, which has dimensions 9 × 6 mas at a position angle of 9.3 deg.Note that some of the spatially resolved sources only rise above our defined detection threshold when smoothed with a Gaussian kernel; however, they are shown at their native (natural weighting) angular resolution here.See Section 6 for further details.et al. 2016;Heywood et al. 2022;Hale et al. 2023) broken into two sub-populations, as well as the VLA 3 GHz counts (Smolčić et al. 2017), all scaled to 1.4 GHz assuming a spectral index of  = −0.7,where   ∝   .In addition, we show the VLBA (+GBT) mas-scale 1.4 GHz source counts presented in Herrera Ruiz et al. (2018), as well as our 1.6 GHz VLBA GOODS-North source counts derived from just 24 detections.While the cosmic variance uncertainty is considerable for an area of 160 arcmin 2 (see Heywood et al. 2013), we are motivated to show this comparison for two primary reasons.First, COSMOS is the extragalactic field with the largest number of VLBI-scale sub-mJy source counts by an order of magnitude.Second, this field includes manual classification of AGN and starforming galaxies (SFG), allowing direct comparison with these two populations with the compact radio source subsample.
There are two primary comparisons we wish to highlight from Figure 8.The first is the VLBI-only comparison of the VLBA COSMOS and VLBA CANDELS GOODS-North differential source counts.This reveals a consistent profile and similar drop-off at the faint end where completeness is expected to be similar, given this is the same instrument and with comparable angular resolution, observing frequency, and image rms of both surveys (all within ∼15 per cent).Second, both sets of VLBI counts follow the AGN-classified profile relatively well, hinting at these two approaches tracing out similar populations.This suggests that a significant fraction of the AGN detected in arcsecond-scale radio surveys in the range ∼0.1-1 mJy have a compact core with sufficiently high brightness temperature to be detected with milliarcsecond resolution.This supports the suggestion in Whittam et al. (2017) that the cores of faint radio galaxies are more dominant than previously thought, or at least more dominant than assumed in simulations of the radio source population.
There are a wide range of potential physical reasons for this observed flattening of the differential source counts of compact radio sources.First, a subset of these may be younger radio jets that have not yet had sufficient time to produce larger extended jets, perhaps still cocooned within the denser ISM of star-forming galaxies.Alternatively, a subset of these sources may belong to a class that fails to produce large-scale jets during their lifetime due to a plethora of reasons, including short duty cycles, weaker jet collimation, and slower jet speed, both of which are potentially linked to lower black hole spins.The latter could be related to the merger history of the respective host galaxies and/or their progenitor supermassive black holes (e.g.Volonteri et al. 2005;Reynolds 2021).Further progress on understanding the dominant physical reasons will require large samples enabled by wider surveys at ≲ 10  Jy beam −1 depth.Multi-band VLBI imaging will also likely provide an important additional perspective.
Figure 8 demonstrates that while VLBI source counts are still in their infancy, they clearly have the potential to bring a unique perspective to the relative composition of AGN and SFG in the important transition flux density range of ∼ 0.1 − 1 mJy beam −1 .What is critical to this are well-defined survey areas, as provided in the VLBA CANDELS GOODS-North Survey.Furthermore, we clearly require a wide range of fields with excellent multi-wavelength coverage and preferably with AGN/SFG classification in hand to address cosmic variance and begin to explore the impact of environment as well as the clustering properties of the lower luminosity VLBI-selected radio sources.This statistical comparison of the source counts between VLBA COSMOS and VLBA CANDELS GOODS-North Survey is a first step, which is followed by a detailed source-by-source comparison of the VLBA, EVN, e-MERLIN, and VLA GOODS-North surveys, which is presented in Paper II of this series, and upcoming, wider-area VLBI surveys at similar depths in other extragalactic legacy fields.

SUMMARY
In this paper, the first in a series, we describe the design, data processing, primary data products, and derived differential source counts of a wide-field, quasi-uniform sensitivity VLBA survey of GOODS-North field.The survey area is 160 arcmin 2 and reaches a depth of  ∼ 11 Jy beam −1 at the pointing centre.The survey serves as a technical demonstration of an alternative approach to wide-field VLBI surveys, placing phase centres on a uniform hexagonal grid rather than on known sources pre-selected by previous arcsec-scale radio or multi-wavelength surveys.We use this approach to generate what would collectively be a ∼0.5 Terapixel image, made up of 205 'sub-images', each with 64,000×64,000 pixel dimensions, where each pixel is 1 mas 2 .We employ a novel approach to candidate source identification, incorporating smoothing kernels and multi-wavelength cross-matching to derive what we refer to as the cross-calibration catalogue, comprising of 24 sources.The crossmatching and host galaxy analysis is performed in Paper II (Njeri et al., in press), along with a detailed description of the radio properties across a spatial dynamic range of ≳ 10 4 .Paper II also performs a comparison of the EVN GOODS-North field (Chi et al. 2013;Radcliffe et al. 2018) with that of the VLBA, which is the first comparison of its kind for this depth and sample size that the authors are aware of.The survey design also enables a detailed analysis of the performance of statistical calibration (i.e.Multi-Source Self-Calibration, Middelberg et al. 2013;Radcliffe et al. 2016) and the relevant tradeoffs in the catalogue size versus the lower limit of the flux density, a study detailed in a future paper in this series.The survey was carried out in 12 individual epochs of comparable duration and sensitivity, enabling a transient/variability search through the 205 × 12 single-epoch 64k images for sources of interest.
We derive the VLBI differential source counts for this uniform sensitivity field and show that these are consistent with previous VLBI source counts in the COSMOS field.These broadly trace the AGN population detected in arcsecond-scale radio surveys, with one important deviation: there is a distinct flattening of the source counts in the ∼100-500 Jy range.This could suggest a transition in the population of compact radio sources as the host galaxies transition into the starforming population.The physical reasons for the flattening of VLBI source counts are speculative at this point and include the impact of both the denser ISM and lower black hole spin, which could be related to galaxy/black hole merger history.Increased statistical power and multi-wavelength information will be required to test possible explanations.Multi-band VLBI imaging will likely provide an important additional perspective.
The scientific and technical demonstrations of this survey may serve as useful inputs to the design and execution of future large-area surveys at milli-arcsecond resolution, including those planned future wide field-of-view African VLBI stations (Gaylard et al. 2011;Godfrey et al. 2012;Agudo 2015;Paragi et al. 2015), which will greatly enhance VLBI access to southern hemisphere extragalactic legacy fields, in concert with the European VLBI Network and Australian Long-Baseline Array, and the Square Kilometre Array mid-frequency array.Furthermore, the technical approach outlined here may be useful in the design of transient surveys and time-critical rapid follow-up VLBI observations or for specific transient classes that can be identified on timescales comparable to the multi-month observational period of this survey.

Figure 1 .
Figure1.VLBA phase centres (small diameter colour circles) over-plotted on the Hubble Space Telescope F 606W mosaic of the CANDELS GOODS-North field(Grogin et al. 2011;Koekemoer et al. 2011).Each phase centre has a radius of 35 arcsec, the approximate angular distance at which the combined time and bandwidth smearing point source sensitivity loss is at the ∼20 per cent level.The red central cross shows the VLBA pointing centre, while the dashed and solid red circles show the 80 and 50 per cent primary beam response contours, respectively.Each phase centre is colourized by the corresponding image's achieved noise rms.

Figure 2 .
Figure 2. Dates that each of the twelve survey schedule blocks were carried out on the VLBA in the filler mode.The maximum baseline length and number of participating antennas for each schedule block are indicated.

Figure 3 .
Figure 3. VLBA uvcoverage of the GOODS-North field, with colours indicating the five scheduling variations (A-E) that the twelve ∼2-hr schedule blocks were split into (see Table1).

Figure 4 .
Figure 4. Amplitude versus uv-distance (left), and naturally-weighted Clean image of J1234+619 (right), the complex gain calibrator used in the VLBA CANDELS GOODS-North survey.J1234+619 has a peak brightness of 17.83 mJy beam −1 and an integrated flux density of 18.84 mJy.The source is marginally resolved but still above 10 mJy level at > 8000 km baselines.The contours are drawn at   = ±0.25 mJy beam −1 and increase by factors of two.The map rms is 0.22 mJy beam −1 .The restoring beam is shown as a black ellipse at the bottom left of the map, with dimensions of 9 × 5.9 mas and a position angle of 9.3 deg east of north.

Figure 5 .
Figure 5. Left: Location of 24 VLBA detections presented in the cross-calibration catalogue overplotted on the HST F1606W map.The red central cross shows the VLBA pointing centre, while the dashed and solid red circles show the 80 and 50 per cent primary beam response contours, respectively.The right panel demonstrates how each of the 205 phase centres is imaged with a size of 64k × 64k pixels.A further zoom-in is shown to illustrate one of the VLBI detections, which is located within with an HST-imaged host galaxy showing a prominent bulge and disk.

Figure 6 .
Figure6.Montage of the 24 detections that make up the cross-calibration catalogue.Each image has an extent of 64 × 64 mas 2 .The contours are drawn at the local rms of the map multiplied by factors of[-3,3,5,7,9,11,13]. See Table2for source-specific values and coordinates.A representative synthesized beam geometry is shown in the bottom-left panel in blue, which has dimensions 9 × 6 mas at a position angle of 9.3 deg.Note that some of the spatially resolved sources only rise above our defined detection threshold when smoothed with a Gaussian kernel; however, they are shown at their native (natural weighting) angular resolution here.See Section 6 for further details.

Figure 7 .
Figure 7. Maximum brightness,  peak 1.6 , of each source as a function of the distance from the phase centre, Δ PC , of the image within which the source was identified.The colours show the angular distance between each source and the VLBA pointing centre, Δ PTG .While this may be in the small number statistics regime, there is no indication of biases in the maximum brightness measured based on source location relative to the phase centre or pointing centre.

Table 1 .
Summary of the VLBA CANDELS GOODS-North observing epochs.

Table 2 .
VLBA CANDELS GOODS-North cross-calibration catalogue.Values are derived from the naturally-weighted images.Distance from the nearest phase centre (arcsec).† † Distance from the VLBA pointing centre (arcmin).‡ Primary beam response at the location of the source.