HELP: The Herschel Extragalactic Legacy Project

We present the Herschel Extragalactic Legacy Project (HELP). This project collates, curates, homogenises, and creates derived data products for most of the premium multi-wavelength extragalactic data sets. The sky boundaries for the first data release cover 1270 deg2 defined by the Herschel SPIRE extragalactic survey fields; notably the Herschel Multi-tiered Extragalactic Survey (HerMES) and the Herschel Atlas survey (H-ATLAS). Here, we describe the motivation and principal elements in the design of the project. Guiding principles are transparent or"open"methodologies with care for reproducibility and identification of provenance. A key element of the design focuses around the homogenisation of calibration, meta data and the provision of information required to define the selection of the data for statistical analysis. We apply probabilistic methods that extract information directly from the images at long wavelengths, exploiting the prior information available at shorter wavelengths and providing full posterior distributions rather than maximum likelihood estimates and associated uncertainties as in traditional catalogues. With this project definition paper we provide full access to the first data release of HELP; Data Release 1 (DR1), including a monolithic map of the largest SPIRE extragalactic field at 385 deg2 and 18 million measurements of PACS and SPIRE fluxes. We also provide tools to access and analyse the full HELP database. This new data set includes far-infrared photometry, photometric redshifts, and derived physical properties estimated from modelling the spectral energy distributions.


INTRODUCTION
A fundamental requirement for rigorous testing of any theories of galaxy formation and evolution is a complete statistical audit or census of the stellar content and star-formation rates of galaxies in the Universe at different times and as a function of the mass of the dark matter halos that host them.
This audit requires many elements. We need un-biased maps of large volumes of the Universe made with telescopes that probe different wavelengths at which different physical processes of interest manifest themselves. We need catalogues of the galaxies contained within these maps with photometry estimated uniformly from field-to-field, from telescope-to-telescope, and from wavelength-to-wavelength. We need to understand the probability of a galaxy of given properties appearing in our data sets. We need the machinery to bring together these various data sets and calculate the "value-added" physical data of primary interest, e.g. the distances, stellar masses, star-formation rates, active galactic nuclei (AGN) fractions and the intrinsic number densities of the different galaxy populations.
For decades many teams have been undertaking ambitious coordinated multi-wavelength programmes to study large volumes of the distant Universe. These surveys are becoming sufficiently complete that we are now able to undertake the necessary homogenising and adding value, and thus provide the first representative and comprehensive census of the galaxy populations in the Universe.
ESA's Herschel (Pilbratt et al. 2010) mission has a unique role in these studies, probing the obscured starformation activity, which at high redshifts forms about 80% of all star formation. The Herschel extragalactic surveys were a major goal of Herschel and occupied around 10% of the Herschel mission.
The Herschel Spectral and Photometric Imaging Receiver (SPIRE) instrument is sufficiently sensitive that the images can detect nearly all of the emission making up the Cosmic Infrared Background Radiation (CIRB) (Duivenvoorden et al. 2020), which itself makes up roughly half of the total background radiation from galaxies. However, the large beam size means that the objects that can be clearly seen as individual sources only make up about 15% of the CIRB. The Herschel Photoconductor Array Camera and Spectrometer (PACS) instrument complements the SPIRE observations with bands centered at 100 µm and 160 µm but at lower depths than SPIRE.
A particular focus of HELP is to employ new methods to learn from our large statistically meaningful samples. This requires harnessing the ancillary data and the Herschel data to unlock the full information from the Herschel images and then make that available as a legacy to the community.
The science possible with the Herschel data will be significantly enhanced with ongoing optical, NIR and radio surveys. The VISTA near-infrared surveys detect the radiation from the old stellar population in galaxies, which accounts for most of the stellar mass, while the radio surveys being carried out over the next few years with LOFAR, MeerKAT and ASKAP detect radiation associated with the young stellar population and with radio-loud AGN.
The challenge for astronomers wishing to exploit these rich data sets is to collate the data, understand the selection effects and derive physical parameters. Collation of multiwavelength data has been undertaken for very deep surveys over small areas (less than few deg 2 ) in particular COSMOS (Scoville et al. 2007;Ilbert et al. 2013;Laigle et al. 2016) and ASTRODEEP (Merlin et al. 2016;Castellano et al. 2016) and for wide nearby surveys (over 200-1000 deg 2 ) especially SDSS (Blanton et al. 2017) and GAMA (Driver et al. 2009(Driver et al. , 2011. However, due to size of the data and complexity arising from the variety of observatories used, little concerted effort has been made to assemble the deep surveys over 10-1000 deg 2 . These surveys are particularly important as they are large enough to probe representative ranges of environments and to provide large statistical samples to fully explore the range of galaxy phenomena in detail including rare and transitory phenomena. Dealing with this complexity and volume of data is not trivial. It requires cross-matching hundreds of millions of objects observed at different bands, identifying spurious sources in a robust and reliable manner and this needs to be done consistently across all fields with varying depths and bands. Dealing with such volumes of data is also memory intensive and requires huge compute power to process the resulting far infrared (FIR) photometry, photometric redshifts, and SED fitting.
This paper presents the Herschel Extragalactic Legacy Project (HELP) Data Release 1 (DR1) and details the pipelines and methods used to tackle the fore-mentioned challenges of complexity and volume size inherent to collating large, deep heterogeneous survey data. This paper follows specific HELP papers detailing specific parts of the project (e.g. Hurley et al. 2017;Duncan et al. 2018a;Duncan et al. 2018b;Ma lek et al. 2018;Shirley et al. 2019) and science results using data from DR1 (e.g. Scudder et al. 2016;Duivenvoorden et al. 2016;Lo Faro et al. 2017;Pearson et al. 2017bPearson et al. , 2018Buat et al. 2018;Scudder et al. 2018;Donevski et al. 2020;Duivenvoorden et al. 2020;Mountrichas et al. 2021). In Section 2 we define the HELP fields. In Section 3 we describe the overall HELP strategy. In Section 4 we describe the specific work-flow for DR1. In Section 5 we present some statistics of the data release. In Section 6 discuss the uses of this data set and conclude.

THE HELP FIELDS
Many extragalactic surveys from different observatories and at different wavelengths have been coordinated in their planning and execution. Each survey had different motivations and factors constraining their choice of field locations, sizes, and thus their individual footprint on the sky. The superset of all survey footprints would be large and include many areas with only a few data sets. The primary motivation for HELP is the Herschel coverage so DR1 is limited to the main wide area extragalactic Herschel surveys.
Given that there is no imminent successor to Herschel the data from that mission provides a legacy benchmark. Within the Herschel observatory the SPIRE instrument (Griffin et al. 2010) mapped larger areas than the PACS instrument (Poglitsch et al. 2010). We thus define the boundaries of the project on the basis of the extragalactic surveys carried out with SPIRE. The specific Herschel OB-SIDS chosen to define the project are listed in Appendix A.
The footprint of these observations is conveniently captured in HEALPix Multi Order Coverage maps, MOCs (Fernique et al. 2019) which are provided online 1 .
Some basic properties of the fields are tabulated in Table 1 and the footprints are illustrated on a map of the Galactic dust from Planck (Planck Collaboration et al. 2014) in Figure 1. As expected from the requirement of the infrared surveys and alignment with other multi-wavelength surveys, we can see that the HELP fields: avoid the emission of dust from our Galaxy; are distributed in right ascension; have some concentration at the celestial equator; and include fields near both ecliptic poles.
The Herschel Multi-tiered Extragalactic Survey (Her-MES, Oliver et al. 2012) is a major survey conducted by the Herschel mission (Pilbratt et al. 2010) using the SPIRE (Griffin et al. 2010) and PACS (Poglitsch et al. 2010) instruments covering 380 deg. 2 . A number of important Herschel surveys are contained within the footprint of the SPIRE data in HerMES, notably the PACS evolutionary Probe (PEP, Lutz et al. 2011). The largest and shallowest of the HerMES SPIRE tiers is the HerMES Large Mode Survey, HeLMS, which adjoins the 70 deg 2 HerS survey (Viero et al. 2014b) to form the largest contiguous extragalactic SPIRE field, which we refer to as the Herschel Stripe 82 field. The largest SPIRE footprint comes from the Herschel Astrophysical Terahertz Large Area Survey' (H-ATLAS, Eales et al. 2010) which comprises 660 deg. 2 (Smith et al. 2017). Additional SPIRE coverage comes from: the Herschel -AKARI NEP Deep Survey (Pearson et al. 2017a); and the SPIRE coverage of South Pole Telescope deep field (SSDF, Holder et al. 2013); and the SPIRE calibration field in the North Ecliptical Pole.
The multi-wavelength data available in these fields is extremely rich. This is important scientifically through providing the key for basic properties of the objects such as their redshift and probing different physical emission processes. The wealth of data is partly because the choice of these fields by the Herschel teams was motivated by existing surveys. In addition, new surveys have been carried out through coordination between survey teams and an appreciation of the value of the accumulated data in these fields has encouraged independent surveys. There are also many very large area surveys that overlap with these fields by accident. A primary goal of HELP is to collate these data sets together. The number of overlapping surveys is continually expanding, so the current collation can only be a snapshot.

HELP STRATEGY
The area, depth and wavelength coverage of the data in the surveys within the HELP fields have enormous potential for addressing important scientific questions, particularly addressing the questions of galaxy evolution. The volume of the Universe probed is phenomenal allowing studies of rare or transitory phenomena. This volume also provides large samples of galaxies that can be divided into meaningful subsamples to test galaxy formation scenarios in more detail. The variety of areas and depth allows probes of faint and distant galaxies and enables comparison between distant and 1 http://hedam.lam.fr/HELP/dataproducts/dmu2/ nearby samples, i.e. to study galaxy evolution. The volume also provides a complete sampling of the range of galaxy environments. The wealth of multi-wavelength data allows for study of the different emission from all the important physical processes e.g. stellar mass, star formation, active galactic nuclei and provides the basic information like positions and distances.
The challenge to realise this potential is that the information from different survey teams, from different wavelengths, from different facilities, and from different fields is not curated. This means that astronomers will tend to use a limited subset of the available data and also that basic analysis is unnecessarily repeated by many researchers.
The HELP strategy is to curate these data sets so that they can be used in their entirety by the whole astronomical community with the minimum of specialist knowledge and to add value to these data to enable more efficient and extensive scientific exploitation.
HELP is designed to create a framework for wide-area multiwavelength studies that can be continuously updated with new observations. The scope for Data Release 1, DR1, is to curate object catalogues and photometry at near-IR and optical wavelengths that have been provided by the survey teams from images at mid to far-IR wavelengths alongside spectroscopic redshifts. HELP also provides tools to access the original imaging for manual inspection of interesting sources identified in the final catalogues based on their far infrared flux or physical properties.
The most fundamental element of the curation is by providing homogeneous data products with consistent measurements, units, and data formats. We provide comprehensive meta data describing the data, using Virtual Observatory (VO) standards. In particular we provide the user with access to the original references and data from the survey teams, providing written descriptions of all the data in addition to machine readable files with links to papers, summaries of coverage and descriptions of instruments, including definitions of bands and links to transmission curves.
A key type of meta data for undertaking statistical studies of galaxy evolution is the selection function. The selection function is the probability of an object being detected and included in a given sample as a function of the galaxy properties. Determining the form of these functions is a major challenge for collated surveys. We therefore need to develop tools to reverse engineer the selection function and protocols to provide those to users. We provide the following for capturing selections functions at increasing levels of sophistication: Binary coverage maps: These contain the basic information of where, on the sky, data exists. We choose Multi-Order Coverage maps, MOCs (Fernique et al. 2019) to capture this.
Depth maps: These capture a simple, scalar, estimate of the depth of data at any sky position in a given band. We use HEALPix order 10 cells to provide a map of depths. This order can be changed and is chosen to be a compromise between attaining large enough samples to accurately measure depth and attaining a usefully high resolution.
Completeness maps: These capture the probability that an object of a given intrinsic flux at a particular sky position would be included in a catalogue. These are derived  from the depth maps and separately provided for photometric redshift availability.

Open Science
The project has been implemented using open science frameworks with the following general principles: • All code is publicly available through a version controlled git repository.
• Production code is embedded in extensively annotated Jupyter Notebooks with integrated diagnostic plots.
• Every version of each data product is associated with the git commit code for any code used at the time of production These key principles enable rerunning of any section of the pipeline in order to facilitate both verification and extension of the work by external researchers. By using Jupyter notebooks to document all the processing on GitHub, all the information about data quality is readily available and the code can be rerun with future additional survey data.

Tools
The HELP philosophy is that astronomers can easily carry out their scientific investigations without a high degree of instrument or survey specific expertise. We have defined some specific scientific use-cases which should be achievable at the end of the project. Our target is that these recipes could be used by a postgraduate student to produce meaningful scientific results. Our intention is that all scientific results from the team are easily reproduced using these tools. Some of these tools are database operations. Our database is VO enabled with ADQL interfaces. Some tools are traditional client/server interfaces. Other tools are developed to provide containers (e.g. Docker) that the user can download and run on their own CPU resources. We supply extensive examples and documentation to aid the uptake of all the tools developed and presented here.

THE DR1 WORKFLOW
In this section we describe the HELP workflow, outlining the key data analysis steps, the decisions taken and the outputs resulting from the workflow. Figure 2 is a visual representation of the workflow which we summarise below, with additional details for each stage in the subsections that follow.
First, we create the master list of astronomical sources and collate photometry measurements for these sources at all wavelengths between 0.36 − 4.5 µm. Part of the photometry collation process involves determining the highest quality measurements available in a given field and wavelength region. In order for subsequent data processing to work effectively, there should be high quality photometry across a wide spread of wavelengths. This stage also allows us to investigate the depths available in a given area for a given band. Some of the fields in the HELP area have deeper surveys available and wider wavelength coverage than others. After the production of the master list which includes all the compiled spectroscopic redshifts the catalogue is used to calculate photometric redshifts as described in section 4.6. These are required for spectral energy distribution (SED) modelling. The next stage is to produce the prior list which is required for XID+ forced FIR photometry. The forced photometry performed by XID+ takes the prior list as a hard positional prior for objects that are most likely to be detectable in the FIR based on the optical to NIR photometry available. The exact selection of the prior list is defined in section 4.3. Objects with fitted FIR fluxes are then fed through to the final stage where SED modelling is used to calculate galaxy properties.
The final merged catalogue contains all the objects from the master list and any subsequent quantities added by the HELP pipeline. The final catalogue can thus be broadly grouped into three hierarchical categories: • The master list: Objects detected in an optical or NIR survey.
• The prior list: Objects included in the XID+ list of prior positions with FIR fluxes in any of the MIPS, PACS, and SPIRE bands available • The A list: Objects selected for SED modelling with XID+ detections, a photometric redshift, an SED model, and physical properties estimated.
We will now describe the details of each stage of the pipeline.

Mid and far-infrared images
This is the first time that all Herschel extragalactic blank field survey images are presented together in an homogeneous form. We also provide the Spitzer Multiband Image Photometer (MIPS, Rieke et al. 2004) 24 µm band images that are also used for computing forced photometry as part of the general pipeline. There are a total of severn mid or far-infrared (FIR) imaging bands presented here and used to compute forced photometry for far-infrared fluxes. These are the MIPS 24 µm band, the PACS 70 µm, 100 µm bands and 160 µm, and the SPIRE 250 µm, 350 µm, and 500 µm bands.

Spitzer MIPS 24 µm images
The MIPS images are from two different data sources depending on the field. The Spitzer Enhanced Imaging Products (SEIP) is a collection of Super Mosaics of Spitzer MIPS data. They are not presented in contiguous form but as individual sometimes overlapping images as originally provided in NASA/IPAC Infrared Science Archive 2 . We do not mosaic them here because each image may reuse data and taking account of this requires decision which reduce the general applicability of the data sets. The Spitzer Legacy Program was motivated by a desire to enable major science observing projects early in the Spitzer mission. Starting with 6 projects, the Legacy program was expanded to include 20 extragalactic projects over the cryogenic lifespan of Spitzer. The data products produced by the Legacy projects are monolithic images covering the full observed field and are used when available.
As the data come from different Spitzer projects, the point spread function (PSF) for MIPS is highly variable across fields, therefore we compute the MIPS PSF for each field independently. The process for computing the PSFs is described in section 4.3.1.

PACS images
PACS observations are available for a sub-set of the SPIRE area. These observations were sometimes taken in parallel with SPIRE observations (typically larger fields, due to the offset between detectors), while some fields were taken with PACS alone. We have re-processed all the PACS data, to create an optimised set of images across all fields. The timeline data are initially processed using the Herschel Data Processing System (HIPE, Ott 2010), and a basic 'PhotProject' image created per observation. We correct the individual  Figure 2. Overview of the full workflow. All objects in the original master list make it through to the final list regardless of which added value derived quantities are available. A given astronomical object from an input optical or near-infrared survey can be traced through the pipeline using this high level schematic. For each of the XID+ runs and final CIGALE SED run there are further criteria applied to each object that are not shown here for simplicity but are described in the relevant sections.
. observations for any shift in astrometry by stacking the images on the position of WISE sources. The measured RA and Dec shift are applied to the timelines and the data exported using the UniHipe plugin. The final map for the field is created using Unimap (Piazzo et al. 2012(Piazzo et al. , 2015aPiazzo et al. 2015bPiazzo et al. , 2016aPiazzo 2017) combining all available ovservations. The PSF for PACS is made using the same procedure as for MIPS.

SPIRE images
Each of these projects had different processing pipelines but had similar procedures for producing images from the instrument timelines described in Oliver et al. (2012); Chapin et al. (2011);Levenson et al. (2010); Viero et al. (2013Viero et al. ( , 2014a; Smith et al. (2017). We have compared images produced by the H-ATLAS and HerMES pipelines using the same input data and found no significant differences. The images presented and used here are also all homogenised to the same units and storage format. The SPIRE images presented here have a number of layers containing the homogenised image, the error image, and nebulised image. The nebulisation process removes large scale structure caused by cirrus with the method presented in Smith et al. (2017) using the Nebu-liser algorithm developed by the Cambridge Astronomical Survey Unit 3 .

The master list
The HELP master list contains optical, near-and midinfrared (e.g. Spitzer IRAC) catalogues. It includes every source with a measurement in any band. A positional cross match is then used to combine the various wavelengths. Sources are flagged to indicate data coverage to discriminate lack of detection from lack of data. Full details of the cross-match criteria and mis-association fractions are given in Shirley et al. (2019). While cross match radii are determined for each input catalogue depending on positional errors it is typically around 0.4 arcseconds. We also provide a table of the original catalogue IDs and the original catalogues. This means that where additional useful information is included in the input catalogue, it can be quickly recovered using the table of cross identifiers. All this data is provided in a simple and well documented structure to facilitate independent validation and external use. Full details on the . RGB representation of the Herschel Stripe 82 and XMM-LSS field, with 250µm , 350µm and 500µm represented by blue, green and red respectively. This is the largest contiguous extragalactic region observed by Herschel. The maximum scale of the field from the East to West tips is 50 • and the separation from edge-to-edge (following the zig-zag, roughly North-to-South) is 11 • . The inset indicates the location of this region on an all-sky equatorial projection. The total area of this field is 385 deg 2 . Readily apparent is the strong cirrus structure throughout the map, including a "seagull" like shape in the centre. The data comes from three different observations (XMM-LSS, HELMS from HerMES Oliver et al. 2012 and HerS). This image was built for HELP from the processed SPIRE time-lines using the HerMES SMAP processing.
production of the master list are presented in Shirley et al. (2019) and through the code itself on Github.
The master list is central to the HELP pipeline and data products. As the master list progresses through the pipeline, some fraction of objects satisfy the criteria required for the next additional processing stage. For instance, if there are not sufficient optical photometry points then the photometric redshift (photo-z) calculation is likely to fail and the object will not have a photo-z, and so cannot be used for SED fitting. Likewise, when XID+ is run, some objects will have low signal to noise so will not be 'detected' and will not be used for SED fitting. We aim to model each of these selection effects so that the full selection function can be understood as accurately as possible. Shirley et al. (2019) provides depth maps in order to model the detection of objects in the original catalogues. Here we provide additional depth maps for both the photo=z catalogues, which can significantly affect selection at the margins, and for the new photometry presented here.

Star masks and artefacts
Astronomical catalogues contains spurious artefacts resulting from instrument noise and dependent on the extraction methods. A key source of artefacts is bright stars, where the wings of the PSF or scattered light raise the background and spurious signals exceed the object detection threshold. We therefore developed a star masking pipeline which highlights regions likely to suffer from these artefacts. Our approach is to look at the excess number density of catalogued sources as a function of distance from bright stars.
For our bright star list we select all Gaia (Lindegren et al. 2016;Brown et al. 2016) stars with g < 16 mag. The reference band used on a given field is determined by which has most impact the prior list for XID+. In IRAC regions this is the IRAC 3.6 µm band, in other regions it is the deepest K band. Within magnitude bins we then determine an effective exclusion radius, r50, at which the excess number density (above the background level) drops to fifty percent of its peak. We choose fifty percent because the decline is steep and taking the mid point is a robust measure of its location. We then fit a linear relation: This function defines the radius of a circle around each Gaia star, within which all objects are excluded from the prior list and which should be excluded from all statistical analysis. This function typically reduces to zero for all objects below 14 mag. We fit the parameters A and B based on the magnitude bins of size 0.5 and generate star masks for each field and target band independently. The final star masks are provided in the DS9 and MOC formats.

XID+: the probabilistic deblender for confusion dominated images
For many Herschel fields, in addition to the SPIRE images we also have Spitzer MIPS 24 µm and Herschel PACS 100 and 160 µm images that cover the mid to far-infrared part of the electromagnetic spectrum. However, due to the relatively large beam size of the these images compared to the galaxy density (≈ 30 per SPIRE beam for optical sources with B < 28), multiple galaxies can be located within the same instrument beam. This is referred to as the problem of source confusion.
To obtain accurate photometry from these infrared images, accounting for source confusion is essential. One way to solve the problem is to use prior information to accurately distribute the flux in the images to the underlying astronomical objects. For example, if we know the location of a galaxy to a reasonable tolerance (e.g. from an optical image where resolution is much smaller than the Herschel beam), we may expect a galaxy to be found in the MIPS, PACS and SPIRE images at the same location. Typically the position of known objects have errors significantly less than the FIR point spread function such that we assume the positions are known precisely.
As part of HELP, we have developed XID+ (Hurley et al. 2017) which uses a probabilistic Bayesian approach that provides a framework in which to include prior information and obtain the full posterior probability distribution on flux estimates. Obtaining the full posterior probability distribution is particularly important for getting accurate uncertainties on source flux. A given XID+ model is described as: where dj is the model of the map pixel j, Pij is the Point Response Function (PRF) for source i in pixel j, fi is the flux density for source i and two independent noise terms for instrumental and confusion noise: N (0, Σinst) and N (B, Σ conf ) respectively. Rather than find just the flux values that maximises the likelihood, XID+ maps out the entire posterior, p(f |d), which can be defined as: where p(d|f ) is the likelihood, the probability of the data given the flux densities, and p(f ) is the prior probability distribution on the fluxes. The method is fully described in Hurley et al. (2017).

HELP XID+ pipeline
HELP uses XID+ to carry out forced photometry on the Spitzer MIPS and the Herschel PACS and SPIRE images to produce catalogue fluxes for the HELP database. Our prior source list for these images are constructed using two different pipelines, which we describe in detail in the following paragraphs. For flux priors, we use uninformative flux priors (i.e. uniform flux prior bounded with reasonable limits derived locally from the image) to enable an unrestricted range of analysis with the HELP data products. More informative prior information would be preferable for specific science projects and is a powerful approach to extract more information out of the data (e.g. Pearson et al. 2017bPearson et al. , 2018, however their use must be fully understood and taken into account, such that they are more suited for bespoke projects than for a data product. If not then apparent results might reflect the ancilliary data more than the far-infrared maps directly. In the next paragraphs we describe the steps followed to run XID+ across the HELP fields. Our list of prior sources is constructed from the master list. Fitting all the sources in the master list to the source confused infrared images results in fluxes that are degenerate without using more informative flux priors. We therefore have to limit the number of sources that go into our prior source list to those that are most likely to be detectable in the images. This approach fits the Bayesian philosophy of model building; build a simple model, fit to the data, evaluate, and finally improve the model where necessary. The prior source list is an integral part of our model for the infrared images. Limiting the number of sources to those that are most likely to be detected simplifies the model and the Bayesian P value maps described later, which provide a data product to carry out model evaluation. They identify where additional sources are needed to model the images. We depict the two ways we have constructed the prior source lists in Figure 2, one for fields where there is Spitzer and another for when there is no Spitzer coverage. For fields covered by Spitzer, we use sources detected in any of the Spitzer IRAC bands as they are known to be a good tracer for the Spitzer MIPS images (Rodighiero et al. 2006). To remove any possible artefacts in the IRAC catalogues, we impose an additional constraint that sources must also have a detection in either the optical or NIR wavelength range. Sources that meet this criteria, constitute the XID+ prior list for MIPS images.
Once we have the output from XID+ on the Spitzer MIPS images, we use our definition of detection level to select sources to be used for the XID+ prior list for the Herschel PACS and SPIRE images, which are fit independently (we do not use PACS XID+ detections as a prior for SPIRE given that the PACS data tends to be shallower than SPIRE). Detection is determined by the MIPS flux level where the Gaussian approximation to uncertainties is valid. Below a certain flux level, the map uncertainty is too large to be able to constrain the source flux and the flux posterior probability distribution for sources becomes skewed as the uniform flux prior dominates over the likelihood. This point is determined by manual inspection and given in Table 3.
For areas that have not been observed by Spitzer IRAC, we compute a total dust luminosity (L dust = 8−1000µm B λ dλ) for each object using the CIGALE code, as described in Section 4.7. We used the relationship between the ratio L dust f 250 and redshift, based on data from COSMOS field where we have both L dust and f250. We apply this relationship to the sources for which we have dust luminosity and redshift predictions to estimate f250. Sources that have a predicted f250 > 5mJy are added to the prior source list and XID+ is run on the Herschel PACS and SPIRE images. This flux cut was chosen after running XID+ on a small region within Herschel Stripe 82, using a range of flux cuts (e.g. using different prior lists), and comparing the Bayesian P -value maps described in Section 4.3 to check whether the cuts applied to the prior source lists provided a good fit to the map. This was to manually check whether bright sources were being missed at a given predicted L dust cutoff value.
For a number of fields, the dust luminosity relationship varies slightly due to an early bug in the prediction. Upon investigation, this has the affect of missing out 17 percent of the lowest flux sources that would otherwise have been included, whilst including 48 percent of sources that otherwise would have been removed from the source list. In this data release this will have the effect of introducing a selection effect and increasing the effective flux cut in predicted 250 µm flux. This will be propagated in to the effective forced photometry depth maps we provide and so will be automatically accounted for by modelling the selection using these maps.
As described in Section 4.2, star masks are used to define regions where bright stars cause large numbers of artefacts and spurious sources, resulting in these regions having no prior list sources. As XID+ is used for source de-blending rather than source detecting, it is not appropriate to ap-ply XID+ to areas of the map where you have no prior knowledge of sources. We therefore exclude pixels from the area defined by the star masks from XID+ fitting using the Multi-Order Coverage maps (MOCs) built from the star masks.
XID+ uses a Bayesian framework and so the flux parameters require a prior probability distribution. We use non-informative, uniform distributions with sensible limits. For Spitzer MIPS images, the upper and lower 24 micron flux limits are based on the longest wavelength IRAC flux available. For a lower limit we take fIRAC/500 and for upper limit fIRAC × 500. For Herschel SPIRE and PACS, we set the flux prior lower limit to zero and source specific upper limit equal to the local (as defined by the PRF) maximum pixel value plus the absolute value of the prior mean for background plus two times the standard deviation of the background prior. This combination of maximum pixel value alongside value and width of the background prior gives a conservative but not extreme upper limit on the flux.
An important part of the model is the Point Response Function (PRF). The PRF is the convolution of the point spread function and the transfer through to a pixel response function via the detection and map building process. This fully maps the contribution a point source makes to each pixel.
In Herschel SPIRE images the PSF is assumed to be a Gaussian, with full width half-maximum (FWHM) of 18.15, 25.15, and 36.3 arcsec for 250, 350, and 500 µm, respectively (Griffin et al. 2010). This is convolved with the pixel space to produce the PRF. In the case of both Spitzer MIPS and Herschel PACS the PRF is calculated by stacking the flux of point-like sources from the astrometry corrected AllWISE catalogues, referenced into the GAIA reference frame. Morphological outliers (with a reduced χ 2 > 4 in the AllWISE profile fit) were excluded before stacking.
The PRF obtained in the stacking will not be as high signal-to-noise as the instrumental PSF and will not track the extended wings of the PDF. To get the correct normalisation, we match the curve-of-growth of the instrumental PSF to that determined from our PRF.
Having defined the PRF, we use it to populate the pointing matrix, which describes how much each source contributes to each pixel in the map. It is calculated by taking the PRF for each band, centring it on the position for each source and carrying out a nearest neighbour interpolation to establish the contribution each source makes to each pixel in the map.
Running XID+ on the full images simultaneously is computationally unfeasible. We therefore divide the image into equal areas using the Hierarchical Equal Area isoLatitude Pixelization of a sphere (HEALPix). The resolution of the pixels are determined by the HEALPix level, with optimum order for Spitzer MIPS and Herschel PACS set at 11, and 9 for Herschel SPIRE, which correspond to ≈ 1.718 arcmin, and ≈ 6.871 arcmin respectively. When we fit each tile, the perimeter is extended by one HEALPix pixel with two levels higher resolution (i.e. level 13 for MIPS/PACS, level 11 for SPIRE with a resolution ≈ 25.77 arcsec and 1.718 arcmin respectively), so any sources that could contribute within the HEALPix pixel of interest are taken into account.
As with other MCMC fitting, we need to run chains long enough to ensure we converge locally and with multiple chains to ensure we have found a global minimum. We use the default number of chains, four, and discard the first half of each chain as 'warm up' or 'burn in'. In order to assess the convergence of each parameter we use the same diagnostics, R and n effective used in Hurley et al. (2017) and described in Gelman et al. (2013).

HELP XID+ data products
One of the key strengths of XID+ is that it maps out the posterior rather than just the maximum likelihood estimate. For individual objects of interest the full posterior can be used to verify the quality of the fit. This also allows using the joint posterior probability distribution of two correlated sources, getting the full uncertainty on the fluxes. The full posterior is stored in a '.pkl' file for each HEALPix tile. This data can be provided on request.
The posterior distribution also allows us to perform a probabilistic check of the XID+ fit. When examining goodness of fits, the traditional method is to look at the residuals. i.e. (data − model)/σ. Because we have many samples from the posterior, we can create a distribution of model images, that cover all the possible images XID+ generates from the posterior parameter values. Having a distribution of model images that we can compare to the original data provides a more robust check than if we were to use one, best fit model map coming from likelihood. Using a distribution of models drawn from the posterior, and comparing to the original data is called a posterior predictive check (Gelman et al. 1996).
For XID+, our approach to posterior predictive checks is to compare the observed flux of a pixel to the distribution realised in the model images. By calculating the fraction of model realisations that are above the observed value, we obtain the Bayesian P -value. A P -value of ∼ 0.5 means the model is consistent with the data. Values close to 0 shows that there is too much flux in the model compared to the map, whereas values close to 1 indicates there is flux in the map that our model cannot explain. We convert these probabilities to a typical 'σ' level. Figure 4 illustrates this process, which is repeated for every pixel in the map to produce Bayesian P -value maps for each band and field.
By using the full statistical power of the posterior probability distribution, these maps are more robust and less noisy than a traditional residual image and can be used to identify where the XID+ model assumptions break down. Examples of where XID+ might provide a bad fit are extended sources, which will appear in the Bayesian P value maps as a negative centre and positive rings. Another more interesting example would be missing sources: i.e sources in the map that are not in the prior list which will appear in the Bayesian P -value maps as positive peaks. Obtaining a catalogue of sources from the Bayesian P -value maps can be added to the prior source list for XID+ to obtain updated catalogues or to find interesting objects which are drop outs in the lower wavelength images.
For the final catalogues we summarise the posterior flux distribution in the form of 16th, 50th, and 84th percentile This is equivalent to mean, and mean ±σ if the posterior distribution is Gaussian. These values are used for SED fitting and most science cases. We use the skewness level of the cat- alogue to determine a detection level, as described in Hurley et al. (2017) and flag sources that are below this level. We also carry out model checking to identify whether the local (defined by the PRF) area of the map for each source is a good fit. To quantify this check, we define a Bayesian P -value residual statistic as follows; from the XID+ posterior probability samples, we use the same model images as our Bayesian P -value maps and calculate weighted residuals for the local pixels for every sample. We then calculate what percentage of our model images have a χ 2 statistic greater than we would expect. This value is a probability, with zero indicating our inferred model will always provide a fit deemed good given the uncertainties, and 1 indicating our inferred model would always provide a fit deemed poor given the uncertainties. We provide this Bayesian P value residual statistic for each source and each band. Sources with a value > 0.5 are flagged as unreliable.
The final product consists of a catalogue with the flux percentiles, the median background, and the convergence statistics, Bayesian P -value residual statistic and a flag for sources that are below detection level or have a high Bayesian P value residual statistic. In addition to the final products, the XID+ example notebooks also provides some visualization tools to conduct a comprehensive analysis of the posterior. It is possible to create posterior replicated images and animations, create the marginalised posterior plots or reproduce the Bayesian P -value maps. There are examples available in the XID+ user guide.

Blind catalogues
An important additional step for providing a legacy data set, suitable for community exploitation is to construct a catalogue of objects detected in the SPIRE images without reference to any other data and with fluxes extracted at the SPIRE wavelengths (a 'blind' catalogue). These catalogues give a perspective of the sub-mm sky unaffected by any prior prejudice. Again, the most significant challenge is the large SPIRE beam, leading to source confusion (e.g. Nguyen et al. 2010) which requires careful de-blending and the resultant catalogues of sources do not necessarily correspond one-toone to individual galaxies. To enable statistical studies there are a number of key metrics required: positional and flux biases and accuracy; completeness and reliability. Similar catalogues and metrics have been produced and made public for the other HerMES fields (Smith et al. 2012;Wang et al. 2014), H-ATLAS fields (Maddox et al. 2018), and for all SPIRE fields in the Herschel SPIRE Point source catalogue (ESA 2017). We have produced new blind catalogues for all the HELP fields using a similar method to Chapin et al. (2011), and described below:

Peak finding in Match Filtered images
The blind sources are identified by searching for peaks in the matched filtered (MF) images, as they are optimised for identifying sources in source-confused images (Chapin et al. 2011). We require peaks to have a flux density above the 85 per cent completeness level for each SPIRE band, where completeness is quantified as 1 − N spurious N real as a function of flux density. N real is the number of identified peaks in a given flux bin and Nspurious are the number of identified peaks in the negative version of a map (we assume the noise in the map is symmetrical about zero, thereby identifying peaks in the negative map will quantify the number of peaks that are from random fluctuations in the noise, as a function of flux).

Determining accurate positions
Having found the peaks for each band independently, we determine accurate centres for each source by determining the best-fit flux density for the three SPIRE bands using inverse variance weighting and for sub-pixel steps around each peak. For each sub-pixel position, we calculate the Pearson correlation coefficient between best fit and map. The position with largest correlation is taken as the new position. We combine the three resulting SPIRE catalogues by removing duplicates at 350µm and 500µm using a nearest neighbour matching algorithm with 12 arcsec and 18 arcsec radius, respectively, adopting the position in the shortest wavelength available for each merged source.

Determining fluxes with XID+
Having obtained accurate positions and removed duplicates, XID+ is applied to the standard image map 4 , using the merged, blind source matched filtered catalogue as the prior source list. The resulting XID+ catalogue, as described in section 4.3 is merged with the blind source matched filtered catalogue (so as to preserve information about the sources used as priors for XID+) to produce the HELP blind source catalogue.

Spectroscopic redshifts
As part of the HELP project we have collected spectroscopic redshifts from 101 different origins across all HELP fields, and created one merged catalogue for each field. Merging catalogues can be problematic due to the varying degrees of information and data quality available. Here we briefly describe the process used to make the matched catalogues.

Spectroscopic redshift catalogue homogenisation
The first step is to homogenise the individual redshift catalogues into a 'standard format', where we extract the RA, Dec, redshift, and if available whether the spectra classify the object as a QSO or AGN. We also assign a 'Quality Flag' (Q) to each spectra, using the information provided by the individual survey. For HELP we adopted the same Q definition as used by the 2dF survey , and GAMA team (Hopkins et al. 2013), where Q characterises confidence level on a five point scale. In reality, assigning an exact Q particularly for small surveys where reliability information is not given is difficult, therefore if a survey does not give an estimated reliability and claims the redshift is 'good' or 'reliable' it is normally assigned to Q = 3. If available we also record the exact reliability given for the individual survey, as many large surveys have reliabilities higher than 90% but not high enough to be assigned Q = 4; this enables the user to set a individual reliability threshold (for example >95%). For science studies we recommend using any redshifts with Q ≥ 3. We also search the individual catalogue to ensure there are no duplicate entries by checking for no self-matches within 0.4 . Any duplicates were manually investigated and the best redshift kept following the procedure outlined below.
Before the catalogues for each field are merged, each catalogue is given a unique binary identifier (i.e., 1, 2, 4, 8, etc...). This means if the same object is observed by multiple surveys the source identifier numbers are added together, and the new source identifier can be used to see each catalogue that provided the corresponding redshift. For example a redshift with a source identifier of 11 would have been observed by surveys 1, 2 and 8 (but not 4). All individual redshift catalogues and their identifier are listed in Table C2.

Merging of spectroscopic redshift catalogues
To merge the individual catalogues, we match each catalogue sequentially by using STILTS (Taylor 2006a) to perform a sky match in a radius of 1-2 and checking up to 5 , where the radius is chosen to give the optimum matches. By performing a manual check of objects close to the matching radius the code can be modified so that any above/below the matching threshold should be merged/split.
For any redshifts that have been found to be a match between the merged catalogue and the new individual catalogue the redshift with the higher Q flag is kept. For the case where both Q flags are ≥ 3 a check is performed to see if the redshifts have a ∆z < 0.01 and < 5% (for lower redshifts). If the two reliable redshifts disagree, a manual choice is made to decide the best redshift based on available information (i.e., if multiple sources, quality of data etc...). The fraction of conflicts between reliable redshifts is very small, for example in the SGP field we have 16 conflicts out of 47 213 redshifts. For redshifts from PRIMUS ) due its lower resolution we increased the matching to ∆z < 0.03 and used the higher-resolution spectra values. For any merged galaxies the QSO/AGN flag is switched on if any of the QSO/AGN flags is set.
In total for HELP we collected 891 317 spectroscopic redshifts, of which 713 660 are considered reliable (Q ≥ 3), and of these 621 407 are unique sources. Table 2 gives the number of reliable spectroscopic redshifts available for each field.

Photometric redshifts
For the majority of HELP fields, where extensive multiwavelength photometry is available, photo-zs have been calculated and are presented here for the first time. For 8 of the small HELP fields, where the best available optical photometry is provided only by all-sky photometric surveys and therefore is not improved by combining multiple surveys, we make use of photo-zs presented in the literature Zou et al. (2019). These account for under 1% of HELP DR1 photo-zs.

HELP Derived Photometric Redshifts
Photo-zs for the prime HELP optical datasets are estimated based on the method presented by Duncan et al. (2018a) and Duncan et al. (2018b). The approach combines multiple templates and machine-learning estimates to produce a hybrid consensus photo-z estimate with accurately calibrated uncertainties. This method is only possible on fields with sufficient spectroscopic redshift samples as described below. We refer the reader to the original papers for a detailed discussion on the motivation and implementation. Below we summarise the implementation of this method for HELP.
As in Duncan et al. (2018a), three different templatebased estimations are calculated using the EAZY software (Brammer et al. 2008) with three different template sets: one set of stellar-only templates, the EAZY default library (Brammer et al. 2008), and two sets including both stellar and AGN/QSO contributions, the XMM-COSMOS templates (Salvato et al. 2008(Salvato et al. , 2011 and the Atlas of Galaxy SEDs (Brown et al. 2014). The individual template fitting results are optimized using zero-point offsets calculated from the spectroscopic redshift sample 5 and the posterior redshift predictions calibrated such that they accurately represent the uncertainties in the estimates. When sufficient spectroscopic training sets are available for a given field, additional Gaussian process photo-z estimates (GPz; Almosallam et al. 2016a,b) are produced by training on one or more subsets of the master list photometry. The multiple individual photo-z estimates are then combined following the Hierarchical Bayesian combination method first presented in (Dahlen et al. 2013), incorporating the additional improvements outlined in Duncan et al. (2018a); Duncan et al. (2018b).
A key step in the Hierarchical Bayesian photo-z method outlined in Duncan et al. (2018b) is the separate treatment of priors and uncertainty calibration for known AGN. For HELP, we identified known AGN as follows: • Optical AGN : We identify known optical AGN through cross-matching of the master list with the Million Quasar Catalogue compilation of optical AGN (Flesch 2015). Sources which have been spectroscopically classified as AGN are also flagged (see 4.5).
• X-ray AGN: In HELP fields that have been targeted by deep pointed X-ray surveys, we make use of any publicly available X-ray catalogues and associated optical IDs to identify known X-ray AGN in the master list. Outside of the publicly available deep X-ray surveys, we make use of the Second Rosat all-sky survey (2RXS; Boller et al. 2016) and the XMM-Newton slew survey (XMMSL2) 6 all-sky surveys. X-ray sources were matched to their HELP master list optical counterparts using the published AllWISE cross-matches of Salvato et al. (2018). AGN and star-forming (or stellar) X-ray source populations are identified based on the colour criteria presented in Salvato et al. (2018): where [W 1] is the AllWISE W1 magnitude in Vega magnitudes and F 0.5−2keV the 2RXS or XMMSL2 flux in units of erg −1 s −1 cm −2 .
• Infrared AGN: When Spitzer photometry is available for a given field, IR AGN are also identified using the updated IR colour criteria presented in Donley et al. (2012).
Based on these criteria, master list sources classified as AGN are flagged and processed following the AGN (Duncan et al. 2018a;Duncan et al. 2018b). We note that by design these selections are not intended to provide complete samples of the AGN population within the HELP master list.
For each HELP field we provide documentation outlining the specific master list and spectroscopic redshift compilation versions that were used for the photo-z estimation, calibration and validation. We also document the precise set of optical filters included in the template fitting along with the lists of filter combinations used for GPz machinelearning estimates (where available).
The photo-z estimates produced by HELP are provided in two ways. Firstly, we provide the full calibrated photo-z posterior for all sources for which the Hierarchical Bayesian procedure outlined above could be performed (as well as working examples of how to extract and use this information). Secondly, we provide summary values for the posteriors in a format suitable for catalogues and single-value based quality statistics. We follow the approach outlined in Duncan et al. (2019), which aims to provide an accurate representation of the redshift posteriors, regardless of whether the posterior is uni-or multi-modal (as is often the case for photo-zs). In summary, for each calibrated redshift posterior the primary (and secondary if present) peak above the 80% highest probability density (HPD) credible interval (CI) is identified based on the redshifts at which the redshift posterior, P (z), crosses this threshold. For each peak, the median redshift within the boundaries of the 80% HPD CI is calculated to produce our point-estimate of the photo-z (hereafter 6 https://www.cosmos.esa.int/web/xmm-newton/xmmsl2-ug z 1,median or z 2,median ). To present a measure of redshift uncertainty within the HELP catalogues we also then present the lower and upper boundaries of the 80% HPD CI peaks (i.e. where the P (z) crosses the threshold). In the HELP catalogues, database and the subsequent analysis (e.g. Section 4.7), photo-zs values are taken to be z 1,median .

Literature Photometric redshifts
For the fields AKARI-NEP, AKARI-SEP, ELAIS-N2, HDF-N, SA13, SPIRE-NEP, xFLS, and XMM-13hr we use the photometric redshifts presented in Zou et al. (2019) based on Legacy Survey grz fluxes and Wise W1, and W2. These are fields without additional data sets to those presented there and therefore where recalculating them was of little additional benefit. These are matched into the master list using a positional cross match with a radius of 0.4 arcsec. These redshifts are subject to a cut of r < 23 mag. After processing they also impose a redshift cut of z < 1 and stellar mass cut 8.4 < log(M * ) < 11.9. These cuts impose limits on studies that can be conducted with these areas but lead to a well defined selection function. This redshift selection function can be automatically handled using the photometric redhsift depth maps.

Photo-z Validation
Due to the range in photometric data quality and spectroscopic training samples available in each field, there is significant variation in the photo-z quality across the HELP footprint. Fig. 5 presents a qualitative illustration of the accuracy of photo-z in fields that demonstrate the dynamic range in parameter space probed by HELP; the deep but small COSMOS field and the Herschel Stripe 82 field that spans over 360 deg 2 . In both fields, we limit the sample to sources with reliable spectroscopic redshifts and detections in at least the optical and NIR regimes. With additional selection criteria (such as magnitude selection and redshift limits), samples with reliable and precise photo-z can be easily produced for each field. To facilitate this, as part of the photo-z validation steps we generate a number of diagnostic and illustrative plots to enable assessment of the photo-z quality within each field. However, we note that given the limited availability of spectroscopic data in some fields (and the overall variation in spectroscopic coverage), it is not possible to provide homogenous and complete assessment across the full HELP photo-z sample.

Photo-z Selection Functions
Following the HELP goals and philosophy, we have also endeavoured to provide informative data products and tools for understanding and accounting for both the explicit and implicit photo-z selection functions.
Given the in-homegeneity across both the full HELP sky and across each individual survey field, quantifying the spatially varying selection function is critical. However, due to the complicated nature of the optical selection function within deep fields that typically have very heterogeneous depth and filter coverage the selection will be highly multidimensional. Additionally, as the exact selection function corresponding to a given sample are user and science-case dependent (i.e. depending on required redshift or photometric quality criteria), a novel and flexible approach is required. Building upon the HEALPix based optical depth maps produced in the production of the optical master list (Shirley et al. 2019), we provide a set of tools to calculate photo-z completeness across a given HELP field as a function of any desired master list magnitude within the field. Specifically, these tools allow simple calculation of the area within a field where desired photo-z completeness is met (given the measured photometric quality as a function of magnitude). Or alternatively, the same tools can be used to calculate the magnitude selection at every position in the field that is required to meet a desired photo-z selection criteria -accounting for variable photo-z quality across a field due to heterogeneous photometric coverage. Full details of the motivation and method will be presented in Duncan et al. (in prep.). As part of HELP Data Release 1, we provide working example notebooks for both the generation and exploitation of these HEALPix based photo-z selection functions.

Future HELP Photo-z Plans
The photo-z estimation methodology applied to majority of HELP fields and sources (i.e. Duncan et al. 2018a;Duncan et al. 2018b) was developed with the aim of providing the optimum photo-z estimate given the available data, regardless of source type. However, the requirement for template fitting (and machine learning), hierarchical Bayesian combination and detailed posterior. This difficulty is particularly acute in the largest and most complicated datasets (e.g. Herschel Stripe 82) where significant high-performance computing resources were required for the runs.
Going forward, we will therefore aim to move to a more scaleable and automated approach that can exploit the infrastructure provided by HELP. The ingestion of the homogenised optical datasets into the HELP virtual observatory now opens up possibilities for optimised machinelearning based photo-zs that combine all spectroscopic redshifts available for a given combination of optical to near-IR photometry, regardless of field. By combining this unified resource with, for example, updated GPz algorithm that naturally allow for redshift predictions in the case of missing data (Almosallam et al. in prep), it will be possible to provide comparable high quality photo-z estimates in a more scalable manner. Additionally, on-demand computation within the database could be provided following the approach presented in Beck et al. (2017).

Physical Modelling
Physical modelling of master list sources with both a redshift estimate and FIR data was carried out using Code Investigating GALaxy Emission 7 (CIGALE, Boquien et al. 2019). We refer to Burgarella et al. (2005) CIGALE conserves the energy balance between dust absorption in the UV to near infrared domain and emission in the mid and far IR when generating SEDs. The stellar emission is constructed from composite stellar populations from simple stellar populations (SSP) combined with flexible star formation histories (SFH). Attenuation curves are then applied to estimate the fraction of energy from stars and gas absorbed and re-emitted by the dust using a dust emission template. CIGALE is a highly flexible code, containing multiple different modules for SSP, SFH, dust attenuation, dust emission and AGN component for users to apply or they can add their own modules.
For the HELP project we selected low resolution Bruzual & Charlot (2003) single stellar population models, assuming a Chabrier (2003) initial mass function. SFHs are chosen to have the so-called delayed τ parametrisation (∝ t e −t/τ ) form, with additional bursts to allow the SFH to model starburst populations on top of older (potentially quiescent) populations. The SFH is defined as: with t0 being the age of onset of the second episode of star formation. We perform SED-fitting runs with Charlot & Fall (2000) model as the dust attenuation recipe. The Charlot & Fall (2000) law assumed that stars are formed in interstellar birth clouds (BC), and after 10 7 yr, young stars disrupt their "nursery" and migrate into the ambient inter stellar medium (ISM). Both regions, BC and ISM, are characterised by a different power law -one representing dust attenuation in the BC and the second in the ISM for older stars. As a result, the emission from young stellar population is attenuated first in the BC and then it goes through the dust in the ISM. Stars older than 10 7 yr are attenuated only in the ISM. Different power slopes for BC and ISM can be used, but here we chose to keep both power law slopes of the attenuation fixed at -0.7, as in the original Charlot & Fall (2000) work.
To model the IR SEDs of the HELP galaxies, we use a dust emission model where the majority of the dust is heated by a radiation field from the diffuse interstellar medium, while a much smaller fraction of dust is illuminated by the starlight. AGNs can substantially contribute to the mid-IR emission. To improve the accuracy of derived galaxy properties and because we have data coverage from optical to FIR, we add an Active Galactic Nuclei (AGN) component to the SED modelling, using the dusty torus models of Fritz et al. (2006).
Based on the five components a large grid of models are fitted to the data. The number of parameter values depends on the properties of the field containing between 50 and 100 million individual models. The physical properties published as a part of HELP DR1; dust luminosity (L dust ), stellar mass (Mstar), and SFR, are built from the probability distribution function and for each parameter, the likelihood-weighted mean and standard deviations are calculated. These measurements have already been used for science purposes providing validation of the method against the literature (Ocran et al. 2021).
For each fitted galaxy we calculate two values of the reduced χ 2 for the best-fit template and photometric measurements: OPT χ 2 (for wavelengths lower than 8 µm restframe) and IR χ 2 (for wavelengths larger than 8 µm restframe). Using these two χ 2 values enables identification of SED-fitting failures or peculiar objects (see Ma lek et al. 2018). An example CIGALE fit is presented in Fig. 6.
CIGALE is run for all HELP galaxies with at least two optical and at least two near-IR detections. Additionally, to select the sample for which physical properties are estimated, we keep only sources with at least 2 far-IR measurements (signal to noise ratio ≥2). As the redshift is essential for the physical modelling process we used the photo-zs (as presented in Section 4.6). As SED fitting codes are sensitive to having one wavelength region overly weighted due to the presence of multiple measurements on a single passband. Constraining power from multiple measurements at similar passbands will dominate over other bands. When multiple measurements are available in similar passbands we take the deepest only as determined by the signal to noise ratio. On average, based on the selection described above, we estimated physical properties of 1.7 million galaxies or 1% off all objects in the master list. Table 2 shows the summary of the catalogues in each field, including the number of sources for which Mstar, L dust and SFR were estimated. To make HELP's SED fitting easily reproducible, we published a dedicated version of CIGALE, cigalon 8 which contains the modules and parameters used to fit HELP's galaxies. This version can be found on the main CIGALE page.
As it was shown in Ma lek et al. (2018), CIGALE's implementation of the energy balance enables predictions of L dust for standard IR galaxies, which preserve energy budget, based on the UV to near infrared data only. Here we would like to stress that CIGALE cannot estimate monochromatic fluxes with reliable uncertainties, but only the total value of L dust . We used the predictions as priors for the IR extraction pipeline XID+. A similar method was used by Pearson et al. (2018) for the star forming galaxies beyond the Herschel confusion limit. To predict the L dust for galaxies without IR measurements, we run cigalon using the same parameters and methodology as described above but without the AGN module as without mid-IR data we are not able to constrain a reliable AGN component.

Database structure and access
HELP data is distributed through the Herschel Database in Marseille 9 (HeDaM) in addition to the Virtual Observatory at susseX 10 (VOX). The former allows access to all raw data for reprocessing or direct handling. The latter permits fast queries over the full HELP area to generate samples for scientific analysis. Data accessed by code on GitHub can be found via its relative link on HeDaM such that the user can download the entire database and perform a full rerun. We also include meta data files in the Git repository with links to the corresponding data files.

HeDaM catalogues and images
On HeDaM each data product is organised by field. For each field there is a final catalogue containing all the information from the optical fluxes to the infrared, the redshifts and the physical parameters derived with SED fitting. The HELP Herschel SPIRE and PACS images are also present in addition to the blind sources associated with each. The database is designed to run across the entire HELP sky. Many scientific users will be interested in an individual field of interest. We have therefore provided overviews of each field to help a new user become familiar with the data presented there.
HeDaM also provides everything to reprocess the data exactly as described here. This facilitates rerunning the HELP work while changing some parameters or, for instance, adding a new optical catalogue to the process. As we have emphasised throughout, all our code is available on GitHub as Python modules and Jupyter notebooks. For storage reasons, the data files are not included in the Github repository. HeDaM contains a file storage with the exact same structure but with the data files within. For many science cases the final merged catalogues with summary information on every galaxy will be sufficient.

Virtual Observatory
The Virtual Observatory at susseX (VOX) is a virtual observatory server built using the German Astrophysical Virtual Observatory (GAVO) DaCHS software: Data Center Helper Suite 11 (Demleitner et al. 2014;Demleitner 2018). VOX contains both the images and the catalogue data.
Images are available through the Simple Image Access Protocol (SIAP). In particular, VOX makes is possible to get image cutouts at a given position.
The catalogue data is gathered into a single table across all the coverage that users can query using the Table Access Protocol (TAP) with compliant software like TOPCAT (Taylor 2005), STILTS (Taylor 2006b) or PyVO 12 . This allows users to make sophisticated queries or to remotely crossmatch their catalogues with HELP data.
The total catalogue containing all photometry measurements across all fields has around 500 columns where each source may only have flux information in a small subset of these bands. We therefore also provide a 'best' photometry catalogue which contains the lowest error measurement in each ugrizyJHKKs band in addition to the far infrared fluxes. This reduces the number of columns to around 50.
Due to the reduction in size it is also possible to index every column allowing fast queries. If the user requires the full photometry measurement they can then join their selection to the main table. For people working on a specific field they might prefer to download the full catalogue table on the field from HeDaM. VOX is particularly helpful when looking at samples scattered across a large area where it is unfeasible to download the full catalogue to perform cross matching.

RESULTS
In this section we summarise the quality and sensitivities of the DR1 catalogue products. Table 2 gives an overview of the numbers of processed objects and areas associated with each field. All fields have been fully processed through the HELP pipeline. Each field has different features which determine the depths and quality of forced far infrared fluxes and calculated physical parameters. Our aim is to allow these features to be modelled automatically across the whole HELP area as much as possible when users are constructing samples. Depending on the scientific question at hand, the desired sample properties can range from complete but heterogeneous samples over large areas/multiple fields to homogeneous samples within individual fields (and any permutation in between). Constructing the selection functions associated with these varying samples can be facilitated using the depth maps described and presented in Shirley et al. (2019) and here with the additional far infrared bands. Figure 7 shows the differential number counts in the Herschel bands PACS 100, PACS 160, SPIRE 250, SPIRE 350, and SPIRE 500. Figure 8 shows the relative fractions and numbers of each type of object in the final catalogue. Together these figures give an overview of the galaxy numbers as they pass through the pipeline.

Summary of master list and depths
The master list number counts as a function of the optical and near infrared band magnitude on each field are summarised in Shirley et al. (2019). We also show the basic catalogue statistics here in Table 2. Defining depth for the forced photometry is dependent on the various factors contributing to the selection function. Here we take a similar approach to the depth maps described in Shirley et al. (2019) and compute mean errors for objects with signal to noise above 2. This gives a metric analogous to the traditional notion of depth for optical and NIR detection images and allows us to compare limits on the faintness of objects across the full wavelength range. Nevertheless the effective far infrared flux depth is in turn dependent on the depths of all the bands contributing to the determination of the prior list. Figure 12 shows the cumulative depths of MIPS, PACS and SPIRE coverage. Figure 13 shows an overview of depths for each band compared to a typical HELP ULIRG galaxy SED. Figures 9 to 11 show areas available to a given depth or deeper as a function of various optical to FIR band depths to illustrate how band depth are correlated with each other. These figures show the complexity of selection effects and its high dimensionality, often being dependent on over five band depths in addition to the shape of the SED. The additional data products presented here facilitate selection func-  Figure 7. The differential number counts in Herschel bands on each field. Some of the small fields suffer from small number statistics. The normalisation factors for each field are computed based on the area over which there are any XID+ priors. The size of this area is dependent on what XID+ prior used for each field, and can cover less than half of the field depending (e.g. IRAC based priors).  tion modelling. The method used to model selection effects must be targeted to a given science goal since a given physical property of interest will be impacted by different selection effects depending on how it is correlated with the relevant detection bands and requirements made by each processing stage.

Summary of XID+ catalogues
The number of objects that have been deblended with XID+ (for MIPS, PACS and or SPIRE) in each field can be seen in Table 2. We provide the number counts in the PACS and SPIRE bands produced by the forced photometry presented here on each field in Figure 7. These number counts are jointly determined by the depth of the images and the prior list. These numbers go beyond the confusion limit that determine the number counts for blind source extracted cata- logues. The flux cuts for each field, described in section 4.3.1 can be found in Table 3.

Summary of blind catalogues
The number of blind SPIRE objects in each field can be seen in Table 2. In total there are 1.9 million blind sources across the HELP fields. We have compared these objects to previous blind catalogues (Smith et al. 2012;Wang et al. 2014;ESA 2017) and find over 90% overlap between them. The previous blind catalogues from HerMES and HATLAS have been used to identify numerous high star-forming high redshift galaxies (e.g. Riechers 2013; Asboth et al. 2015) and candidates for follow-up programs (Duivenvoorden et al. 2018). The HELP blind catalogues, across all 1270 deg. 2 are a data product for which many more of the rare, highly star forming, high redshift candidates can be identified. They are also a useful comparison with the forced XID+ catalogues in  Figure 13. The distribution of area on the sky to a given depth, νF ν, shown via a violin plot, for each broad band type (taking the deepest specific band available in a given HEALPix cell). Optical and NIR depths are 5σ depths in a 2 arcsec aperture. MIPS, PACS, and SPIRE depths are 5σ depths for the XID+ forced photometry values. The colour of each area is determined by the total area that data for that band is available. We also plot a HELP spectral energy distribution for a ULIRG galaxy with star formation rate of 200 M /yr and a stellar mass of 10 10 M at various redshifts.
understanding the multiplicity of SPIRE sources (Scudder et al. 2016) and the impact of the prior catalogues used for forced photometry.

Summary of CIGALE Physical Properties
The main physical properties were estimated for all galaxies with known redshift (spectroscopic or photometric), with at least two detections in the optical range, two detections in NIR range and at least two FIR measurements (PACS and/or SPIRE) with SNR>2. We call this the A list. The CIGALE code was used to estimate dust luminosity, star formation rate, stellar mass and the AGN contribution to dust luminosity. Alongside the physical properties obtained from the full UV to far infrared SED fitting, three different values of χ 2 s are provided: reduced χ 2 r , quantifying the global quality of the SED fit for each galaxy, and OP T χ 2 and IR χ 2 , described in section 4.7 (detailed description can be found in Ma lek et al. 2018). Those quantities can help to identify possible interesting sources or modelling failures. Figure 14 shows the distribution of estimated physical properties: dust luminosity, star formation rate, and stellar mass as a function of redshift. These all-sky samples are drawn from vastly different areas of sky in terms of depths and areas. To illustrate this variation in dynamic range. we overlay contours between the wide and deep Herschel Stripe 82 field and the narrow deep field Boötes in Figure 14. Those figures show the uniqueness of HELP in providing data for extremely bright IR sources at high redshift, and normal star forming galaxies at low redshift. Around 90% of HELP galaxies with SED fits have dust luminosity larger than 10 11 L . The majority of IR galaxies belong to Lumi-nous Infra Red Galaxies (75%), 15% are classified as Ultra Luminous Infra Red Galaxies. Using our SED fitting procedure, we found more than 3 500 (0.2%) IR galaxies with dust luminosity larger than 10 13 L -those extraordinarily bright and active objects are still rare and not well understood (e.g. Rowan-Robinson 2000;Farrah et al. 2002;Wang, L. et al. 2020). The most extreme IR sources are also very active in star formation processes, especially at high redshift. This selection effect can also be seen in fig. 14a.

DISCUSSION AND CONCLUSION
Understanding galaxy formation and evolution requires measurements from many facilities to trace the different physical processes such as star formation and AGN activity in galaxies over cosmic time. These data need to cover large areas of the sky to obtain the large samples to characterise the population properties. These data sets need to be homogenous and well understood to enable the statistical studies required.
Construction of such a dataset has been the main goal of the HELP project. The corresponding data release documented in this paper describes HELP's first attempt at solving the challenges in the collation and merging of large, disparate, complex multi-wavelength datasets. Focusing on the Herschel fields for our first data release, DR1, HELP collates and homogenous optical and near-infrared catalogues over 1270 deg. 2 and applies novel methods to provide reliable photometry from the inherently source-confused far infrared images. Making as much use as possible of the Herschel data remains an important consideration, given this   Figure 14. Overview of dust luminosity, star formation rate and stellar mass estimated using CIGALE SED fitting tool as a function of redshift across the full HELP sky. As an example we overlay contours between the wide Herschel Stripe 82 field and the narrow deep field Boötes. Star formation rates are scaled to dust luminosity using the median ratio. data captures most of the CIRB over a wavelength range for which no new instrument is planned for the foreseeable future.

Enabling Science
The ultimate validation of the HELP data will be through its utilisation for new scientific analysis.
The collation of data over many fields enables scientific investigations with larger statistical samples. Duivenvoorden et al. (2020) used the collated master list and Herschel SPIRE images to investigate how the contribution to the CIRB varies with galaxies selected at different wavelengths.
The novel methods of extracting information from low resolution far infrared data also enables new investigations. Scudder et al. (2016Scudder et al. ( , 2018 used the full posterior probability distribution provided by XID+ to show how the multiplicity of Herschel SPIRE sources changes with flux. The large-area means that rare objects are included, and the wide variety of data enables these to be discovered. An example can be found in the hyper luminous obscured quasar recently discovered at z ∼ 4 (Efstathiou et al. 2021).
The far infrared data, uniquely available in the DR1 fields, permits studies of massive dusty star-forming galaxies (DSFGs, e.g. Weiß et al. 2013;Casey et al. 2014). These galaxies play an important role in understanding galaxy evolution because the dust-obscured star formation activity becomes more important at higher redshift (Donevski et al. 2020). They are also crucial to understand how massive galaxies assembled (Hamed et al. 2021). Furthermore, a large sample of infrared galaxies spanning a wide redshift range allow us to study the effect of dust attenuation on the physical properties of galaxies (Lo Faro et al. 2017;Buat et al. 2018Buat et al. , 2019 The HELP data products cover many significant extragalactic survey fields, they are a valuable resource for existing surveys such as the Spitzer Extragalactic Representative Volume Survey (SERVS) and the DeepDrill survey (Mauduit et al. 2012). Ongoing and future surveys will also be able to exploit and build on the HELP data products. The LOFAR deep field team have used the HELP products directly (e.g. Gloudemans, A.J. et al. 2020;Smith, D. J. B. et al. 2020;Wang, L. et al. 2020) and together with HELP team, exploring the far-infrared radio correlation McCheyne (2021). The MeerKAT International GHz Tiered Extragalactic Exploration (MIGHTEE, (Jarvis et al. 2016)), is another ongoing radio survey with fields that overlaps with HELP and so products like the deblended FIR XID+ fluxes will be a valuable resource e.g. in deeper exploration of the far infraredradio correlation.
The wide spectrum of physical properties can be also used to design or simulate upcoming large surveys such as the Legacy Survey of Space and Time (LSST) (Riccio 2021). The selected LSST Deep Drilling fields (ELAIS S1, XMM-LSS, Extended Chandra Deep Field-South and COSMOS) are also covered by HELP and so data products these fields could provide immediate inputs to planning and direct contributions to early science.
The large area and depth of HELP can enable more robust analysis of the fundamental statistical relations. However such analyses must take into account the associated selection functions. Modelling the full selection functions for physical properties is now the focus of HELP research. Modelling how fluxes propagate through to redshifts and other physical properties. We have aimed to provide the necessary information throughout the HELP pipeline to make this more tractable. Campos Varillas (2021) will develop empirical methods to estimate the HELP selection function in order to probe the bright end of the stellar mass function of galaxies in Herschel Stripe 82.

Future data releases and how to contribute
HELP's open approach means decisions made during the collation and production of DR1 are transparent and can be reproduced by other teams. It also provides a pathway for any astronomers to contribute and add to the HELP dataset in the future. As optical, infrared and radio surveys produce new datasets, these will need to be combined with legacy surveys. The methods, pipelines and tools described in this paper provide a platform to enable their ingestion.
As further data sets are incorporated it will also be possible to improve the prior list and the subsequent photometry and the SED fitting. The open source nature of the HELP pipelines and tools means these improvements can be done by anyone and are not dependent on the current HELP team. For DR1 we decided to generate the most versatile data products, with a non-informative prior, but specific science may benefit from using the full optical to mid infrared photometry and SED fitting to use more informative flux priors.
For HELP to continue being a valuable resource there will need to be a transition from a centrally managed project to becoming a distributed community endeavour. Using version control systems such as GitHub which have enabled many collaborators to contribute to open source projects will be one way in which such a community can be fostered. We are working with other survey teams and training them in the use of the HELP pipelines and tools. We welcome new teams interested in combining their observational data, or value-added related datasets (e.g. alternative physical parameter catalogues, galaxy cluster catalogues etc) to contribute and integrate their data.

Summary
We present the HELP project which collates extragalactic surveys from the optical to Herschel far infrared bands. This includes an open source software pipeline as well as the resultant data products. This first data release, DR1, can be used to study an unprecedented wide area of Herschel sky. Some key highlights of this new data set are: • We have collated data for 170 million objects from optical to far infrared over 1270 deg 2 of the prime extragalactic fields, with boundaries defined by the mapping of the Herschel Space observatory.
• We present far-infrared photometry for 18 million objects in an optical to mid infrared selected prior list chosen to be tightly correlated with far infrared bright objects. We calculate a posterior distribution on the flux for all objects using Bayesian inference.
• We publish the main physical properties; stellar mass, dust luminosity and star formation rate based on spectral energy distribution modelling with the CIGALE code. This is done for all galaxies with at least two detections in each wavelength region for a total of 1.7 million objects, being 1% of the total HELP catalogue.
• The new catalogue is presented alongside an array of other data products, including newly homogenised images, supplementary catalogues and extensive tools for accessing and analysing these new data sets.

AUTHOR CONTRIBUTIONS
R. Shirley made the largest contribution to the data products, analysis and led the writing of text in this paper and is thus identified as first author. Oliver defined the project, secured the funding and led the team in the development and execution of the project and is identified in the senior author (last) position. K. Duncan, P.D. Hurley, K. Ma lek, M.C. Campos-Varillas, Y. Roehlly and M.W.L. Smith contributed very significantly and equally to the production of the data products, the analysis and the writing. The other authors all contributed significantly to either the design of the project, some data products, validation, analysis or writing of the paper. and for SPIRE here: https://github.com/H-E-L-P/dmu_products/blob/ master/dmu19/dmu19_HELP-SPIRE-maps/spire_obsids. csv All products available through the HELP www pages herschel.sussex.ac.uk.

APPENDIX B: DATABASE STRUCTURE AND ACCESS
The data presented here is all publicly available including all input data sets that were used to produce it. All the code used for the processing is also publicly available on GitHub here: https://github.com/H-E-L-P/dmu_products/ The raw data as fits tables and images is available at the Herschel database at Marseille here: http://hedam.lam.fr/HELP/dataproducts/ Those large files can be inconvenient so we also supply a Virtual Observatory server for automated querying of the full data set here: https://herschel-vos.phys.sussex.ac.uk/ All the code used to produce the figures presented in this paper is available here: https://github.com/H-E-L-P/dmu_products/tree/ master/dmu31/dmu31_Examples In order to access the flat files you will need to navigate the structure of Data Management Units products defined in Table C1. At each stage of the pipeline the working files are saved with the final merged catalogues stored in DMU32. We have produced and continue to work on extensive documentation to aid navigating the database. Starting at the front page of the GitHub repository should allow the reader to locate detailed descriptions of each section of the database. Alternatively we also provide per field summaries in addition to imaging of each field in order to inspect individual areas or objects.

APPENDIX C: SPECTROSCOPIC REDSHIFT SOURCES
Spectroscopic redshifts are compiled from numerous sources. Table C2 gives all the relevant references and can be used to find the source for a given object in the masterlist.   16 Chincarini et al. (1984) 32 Dressler et al. (1999) 64 Loveday et al. (1996) 128 Maza et al. (1995) 256 Tully et al. (2008) 1 AGES: The AGN and Galaxy Evolution Survey (Kochanek et al. 2012