Super-resolving Herschel imaging: a proof of concept using Deep Neural Networks

Wide-field sub-millimetre surveys have driven many major advances in galaxy evolution in the past decade, but without extensive follow-up observations the coarse angular resolution of these surveys limits their scientific exploitation. This has driven the development of various analytical deconvolution methods. Over the past half-decade, Generative Adversarial Networks have been used to attempt deconvolutions on optical data. Here we present an autoencoder with a novel loss function to overcome this problem in the sub-millimetre wavelength range. This approach is successfully demonstrated on Herschel SPIRE 500$\mu$m COSMOS data, with the super-resolving target being the JCMT SCUBA-2 450$\mu$m observations of the same field. We reproduce the JCMT SCUBA-2 images with high fidelity using this autoencoder, quantified through the point-source fluxes and positions, the completeness and the purity.


INTRODUCTION
All astronomical imaging has an intrinsic angular resolution limit, whether due to seeing, diffraction, instrumental effects or (in the case of an interferometer) the longest available baselines. In single-dish diffraction-limited imaging, there is formally no signal on Fourier scales smaller than the diffraction limit. This is because Fraunhofer diffraction is mathematically equivalent$^1$ to a Fourier transform, so the large-scale boundary of the telescope aperture also implies there is no image information smaller than some angular scale. For interferometers, the Fourier plane has incomplete coverage, especially approaching the smallest angular scales.
However, there are often strong scientific drivers for improving angular resolution. Among many advantages, higher angular resolution affords the possibility of more reliable multi-wavelength cross-identifications (e.g. Franco et al. 2018; Dudzevičiūtė et al. 2020), improved deblending of nearby sources (e.g. Hodge et al. 2013; Simpson et al. 2015), and fainter fundamental confusion limits. For example, Geach et al. (2013) used the better angular resolution of the James Clerk Maxwell Telescope (JCMT) SCUBA-2 450 µm data compared to the Herschel SPIRE instrument to probe sub-millimetre (sub-mm) number counts with fluxes below 20 mJy where source confusion becomes problematic in Herschel SPIRE data (Oliver et al. 2010; Valiante et al. 2016). Further, Geach et al. (2013) also resolved a larger part of the Cosmic Infrared Background than that possible using Herschel SPIRE. There has therefore been a great deal of interest in developing algorithms for recovering or estimating some of the missing Fourier data on smaller angular scales (see Starck et al. (2002) and references therein for a detailed discussion), including approaches that exploit abundant multi-wavelength data where that exists (e.g. Hurley et al. 2017; Jin et al. 2018).
★ E-mail: lynge.lauritsen@open.ac.uk
$^1$ The Fourier transform of the aperture gives the amplitude pattern of a point source, e.g. a 1D top-hat aperture yields a sinc function. Incident energy is proportional to amplitude squared, e.g. a top-hat aperture yields sinc$^2$. For a 2D circular aperture this is sinc$^2(|r|)$, i.e. an Airy function.
One domain where angular resolution gains are particularly advantageous is sub-mm astronomy. Wide-field extragalactic surveys have proved transformative for e.g. nearby galaxies (e.g. Clark et al. 2018), galaxy evolution (e.g. Lutz 2014; Hayward et al. 2013; Geach et al. 2017) and strong gravitational lensing (e.g. Negrello et al. 2010). Sub-mm galaxies can also be used to trace possible protoclusters through overdensities (Ma et al. 2015; Lewis et al. 2018; Greenslade et al. 2018). Much of this progress has been driven by surveys with the SPIRE instrument (Griffin et al. 2010) on the ESA Herschel$^2$ mission (Pilbratt et al. 2010), but at moderately high redshifts (e.g. $z \gtrsim 4$) the detections tend to be dominated by the longest wavelength band (500 µm) where the diffraction-limited point spread function (PSF) has a full width half maximum (FWHM) of 36.6". Higher resolution mapping is possible with ground-based facilities such as SCUBA-2 (Holland et al. 2013) and the Atacama Large Millimeter Array (ALMA), but the mapping efficiencies are far lower and it is not feasible to map the entire Herschel SPIRE extragalactic survey fields with sub-mm ground-based facilities to comparable depths. Furthermore, the abundant multi-wavelength data available for multi-wavelength prior based deconvolution work in the deeper Herschel fields (e.g. Oliver et al. 2012) does not exist at equivalent depths for all wider-area Herschel surveys (e.g. Eales et al. 2010).
$^2$ Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
In the past half-decade the use of machine learning, and in particular Convolutional Neural Networks (CNNs), has gained popularity as a potential solution to image deconvolution (Schawinski et al. 2017; Jia et al. 2021; Moriwaki et al. 2021). These works all use a Generative Adversarial Network (GAN) for their CNN-based image restoration. A GAN consists of two neural networks, called the generator and the discriminator. The generator is trained to generate an image that looks "realistic" (according to some relevant quantitative metric), while the discriminator tries to determine whether a given image is real or generated (Goodfellow et al. 2014). As they are trained concurrently with competing objectives, the performance of the generator will depend on the detailed characteristics of the discriminator. This paper presents an alternative approach using an autoencoder$^3$ with a specially designed loss function. Similar networks have been used to enhance and remove noise from astronomical images at other wavelengths (e.g. Vojtekova et al. 2020).
Architecturally, an autoencoder contains an encoding CNN which extracts a relatively small number of scalar-valued features from input images. The values of these features are referred to collectively as an embedding of the input image. A second, decoding CNN is then used to generate an image with the same dimensions as the input images, using only the information encoded by the embedding. The objective of the decoding network is to produce an image that closely matches a given target image that is associated with the corresponding input. During training, the encoding network learns to construct an embedding which optimally represents the features of the input image that are required for the decoding network to generate a close match to the corresponding target (Goodfellow et al. 2016). This paper will show that a simple autoencoder network can be used to super-resolve Herschel SPIRE data, and achieve angular resolution comparable to that of JCMT. This super-resolved data can then be used to determine the sky locations and fluxes of sub-mm galaxies previously observed only at lower resolution.
In §2 the two training sets are discussed; §3 describes the network architecture and the loss function; §4 presents the network performance on both observed and simulated data; and finally §5 and §6 discuss and summarise the network performance.

TRAINING DATA
The autoencoder presented in this paper is a supervised machine learning algorithm. Supervised learning requires the use of a training dataset with known truth values. Two separate training sets were used to train the network presented in this paper: (i) a simulated training set, made using images generated by a modified version of the Empirical Galaxy Generator (EGG) software (Schreiber et al. 2017) as both the target and input images, and (ii) an observational training set, using the JCMT SCUBA-2 450 µm maps from the STUDIES project (Wang et al. 2017) as target examples, with the Herschel SPIRE maps for the COSMOS field as input images (Levenson et al. 2010; Oliver et al. 2012; Viero et al. 2013). Table 1 shows the FWHM, confusion limit and pre-interpolation pixel scales of the instruments whose data are used in this paper.
$^3$ The architecture of an autoencoder is similar to that of many GANs Schaw-

Simulated Data
There are very few large astronomical fields that have been surveyed by both Herschel SPIRE and JCMT SCUBA-2. Accordingly, simulations must be used to create a large representative dataset of images to train the network. The simulated dataset was generated using a version of the Empirical Galaxy Generator (EGG) software (see Schreiber et al. (2017) for an in-depth discussion on the workings of EGG) that was modified to avoid simulating galaxies with negligible infrared fluxes, using an empirically determined bolometric luminosity threshold imposed within the code. This modification altered the number counts outside the far-infrared (FIR) range, but reproduced realistic number counts in the FIR range at a lower computational cost. We verified that it made no discernible difference to the output images, while saving considerable computation time. EGG is designed to generate a mock survey catalogue with realistic multi-wavelength galaxy number counts, using an empirical calibration, and with realistic galaxy clustering. To reduce the number of simulated galaxies, dependent on the depth of the simulated image, the original EGG code uses either a stellar mass cutoff on simulated galaxies, or a UVJ-diagram based selection criterion designed around optical galaxies. The modification in this paper uses the estimated star formation rate (SFR) to calculate a bolometric infrared luminosity using the same empirically calibrated formulas already used in the EGG code. EGG uses the SkyMaker (Bertin & Fouqué 2010) code to generate survey images from the mock catalogues. The EGG software was used to create a training data set representing the redshift range $0.1 \leq z \leq 6$. The EGG code generated a number of co-spatial 20 deg$^2$ images for the 4 bands used in this paper: three Herschel SPIRE bands and one JCMT SCUBA-2 band. Each set of 4 images was cut into non-overlapping subregions covering 424 × 424 arcsec$^2$, providing a total of 2373 images to train on. 10% of the generated images were reserved for use as a test set.

Observational Data
A smaller training sample was derived from the small area of overlap within the COSMOS field between the JCMT SCUBA-2 STUDIES large program and the Herschel SPIRE maps. This area was divided into 144 overlapping images offset from each other in RA and Dec in steps of 12 arcsec. The 12 arcsec steps correspond to the pixel scale of the Herschel SPIRE 500 µm images and are therefore the smallest possible increment consistent with the lowest resolution images. Each set of images was then flipped and/or rotated to augment the data, producing 8 images in total at each shifted position. This procedure resulted in an overall dataset containing 1152 images (144 × 8). The 144 non-flipped, non-rotated original images were removed from the training set to be used as a test set, leaving 1008 images for training and ensuring that the images used for testing differed as much as possible from the training set.
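The augmentation bookkeeping can be illustrated with a short sketch. It assumes, since the text does not list the exact transformations, that the 8 images per position are the 8 symmetries of a square (four rotations, each with and without a mirror flip). The function `dihedral_augment` and the toy `cutout` are hypothetical names used only for illustration.

```python
import numpy as np

def dihedral_augment(image):
    """Return the 8 flip/rotation variants of a 2-D image (identity first)."""
    variants = []
    for k in range(4):                       # rotations by 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # mirrored copy of each rotation
    return variants

# Hypothetical stand-in for one cutout; real inputs would be the image cutouts.
cutout = np.arange(16, dtype=float).reshape(4, 4)
variants = dihedral_augment(cutout)

n_positions = 144                            # shifted positions in the overlap area
n_total = n_positions * len(variants)        # all augmented images
n_train = n_total - n_positions              # originals held out as the test set
print(len(variants), n_total, n_train)
```

With 144 positions this gives 1152 augmented images, of which 1008 remain for training once the 144 untransformed originals are held out.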

NETWORK ARCHITECTURE
The generator network of the autoencoder is based on that used by the GalaxyGAN code (Schawinski et al. 2017). The CNN presented in this paper differs from GalaxyGAN and other previous works in two significant ways: (i) the use of a more computationally expensive loss function that is better designed to extract the individual features of interest, and (ii) no discriminator network is used.
The network processes the two training sets independently, in succession. Each epoch begins by training on the entire simulated training set, before training on the observed data set three times in succession. The aim was to have the network learn the key structural features of the sub-mm images from the simulated data before using the observed data to fine-tune the network to handle any small differences between observed and simulated images. Each training set was randomised before each run through the data. Because the flux distributions of the simulated and observed data differ, the observed data were renormalised before training to ensure a flux distribution comparable to that of the simulations.
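A minimal sketch of this per-epoch schedule is given below. The function name `epoch_schedule`, the use of Python's `random` module and the toy sample labels are illustrative assumptions rather than the training code used for this work, and the renormalisation of the observed data is omitted.

```python
import random

def epoch_schedule(sim_samples, obs_samples, obs_passes=3, seed=0):
    """Order of training samples for one epoch: one shuffled pass over the
    simulated set, then `obs_passes` independently shuffled passes over the
    observed set."""
    rng = random.Random(seed)
    schedule = []
    sim = list(sim_samples)
    rng.shuffle(sim)                       # randomise before each run through the data
    schedule += [("sim", s) for s in sim]
    for _ in range(obs_passes):
        obs = list(obs_samples)
        rng.shuffle(obs)                   # fresh shuffle for every observed pass
        schedule += [("obs", o) for o in obs]
    return schedule

# Toy example: 5 simulated and 2 observed samples (hypothetical sizes).
schedule = epoch_schedule(range(5), range(2))
print(len(schedule))
```

In this toy case each epoch visits 5 simulated samples followed by 3 × 2 observed samples.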

Autoencoder
The architecture presented in this paper uses a U-net configuration (Ronneberger et al. 2015). The outputs from each convolutional layer in the encoder network are concatenated with the inputs of their corresponding layer in the decoder network. This helps to prevent the overall network output from diverging substantially from its input. The network takes as its input the three Herschel SPIRE band images (250 µm, 350 µm and 500 µm) and is trained to produce an output image that closely matches a target image consistent with the single JCMT SCUBA-2 450 µm band. All activation functions in the CNN are LeakyReLU:
$$ f(x) = \begin{cases} x & x \geq 0 \\ \alpha x & x < 0 \end{cases} \qquad (1) $$
where $\alpha$ is a small positive slope, except for the final layer where a sigmoid function is used:
$$ f(x) = \frac{1}{1 + e^{-x}} \qquad (2) $$
LeakyReLU was chosen as the activation function over the ReLU function, as the zero-gradient nature of the ReLU function at $x < 0$ can cause "dead neurons" in the network. The sigmoid function in the final layer ensures a well constrained output range with continuous coverage. Batch normalisation is included after each convolutional layer to regularise their outputs, which enhances the overall stability of the network and its predictive performance on unseen input data (Ioffe & Szegedy 2015). For similar reasons, dropout layers are used to randomly disable training of 50% of the kernel weights in the first three layers of the decoder network (Srivastava et al. 2014).
The architecture of the CNN is described in table 2 and a schematic is shown in fig. 1. Using this architecture requires that the pixel dimensions of the input and output images match. However, the Herschel SPIRE 250 µm, 350 µm, and 500 µm image pixel scales are 6", 8.33", and 12" respectively, while the JCMT SCUBA-2 images have a pixel scale of 1". Accordingly, since the input and output images represent equal areas on the sky, a 2-D linear interpolation routine from the SciPy Python package (Virtanen et al. 2020) was used to resample the input Herschel images onto the finer pixel grid.
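The resampling step can be sketched as follows. The text does not state which SciPy routine was used, so this example assumes `scipy.interpolate.RegularGridInterpolator` with linear interpolation; the common-corner grid alignment, the helper name `resample_linear` and the toy ramp image are likewise assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def resample_linear(image, pix_in, n_out=424, pix_out=1.0):
    """Linearly interpolate `image` (pixel scale `pix_in`, arcsec) onto an
    n_out x n_out grid of `pix_out`-arcsec pixels sharing the same corner."""
    ny, nx = image.shape
    y_in = (np.arange(ny) + 0.5) * pix_in         # input pixel centres [arcsec]
    x_in = (np.arange(nx) + 0.5) * pix_in
    interp = RegularGridInterpolator(
        (y_in, x_in), image, method="linear",
        bounds_error=False, fill_value=None)      # None => extrapolate at the edges
    centres = (np.arange(n_out) + 0.5) * pix_out  # output pixel centres [arcsec]
    yy, xx = np.meshgrid(centres, centres, indexing="ij")
    return interp(np.stack([yy, xx], axis=-1))

# Toy 500 µm-like input: 36 pixels of 12" whose value equals the x position in arcsec.
ramp = np.tile((np.arange(36) + 0.5) * 12.0, (36, 1))
resampled = resample_linear(ramp, pix_in=12.0)
print(resampled.shape)
```

Because linear interpolation reproduces a linear ramp exactly, the resampled toy image again equals the x position of each output pixel centre, which gives a quick check of the grid alignment.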

Loss Function
The common approach for deconvolution when designing the loss function for GANs (e.g. Schawinski et al. 2017; Moriwaki et al. 2021) is to combine the loss $\mathcal{L}_{\rm disc}$ from a discriminator network with a simple $L_1$-loss between the encoder-decoder network output $I_{\rm predicted}$ and a target image $I_{\rm true}$.
The approach in this paper is different. While the CNN presented retains the $L_1$-loss as part of the overall loss function, it is not the main component. The main goal of the CNN in this paper is to super-resolve the sub-mm telescope PSF. To achieve this, a novel, custom loss function was designed to better target the data features of interest. In particular, this multifaceted loss function focuses on the differences between the fluxes of any point sources that are identified in corresponding pairs of generated and target images.
The loss computation uses the photutils Python package (Bradley et al. 2020) to identify point sources within the target or generated images and extract fluxes from circular apertures with 10 arcsec radius, centred on the identified source locations.
The loss is computed by comparing the fluxes extracted from the generated and target images, but the details of the computation are different when training on the simulated and observed training sets.

Training on simulated data
When the network is training on simulated data, the locations of sources with signal-to-noise ratios of S/N > 3 are derived using the target image only. Fluxes are extracted from both the target and generated images using apertures corresponding to the target image locations.
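The loss is computed as the sum over all apertures of the absolute difference between the extracted flux in the target and generated image,
$$ \mathcal{L}_{\rm sim} = \sum_{i=1}^{N_{\rm target}} \left| F^{\rm target}_i - F^{\rm generated}_i \right| $$
where $N_{\rm target}$ is the number of point sources that are identified in the target image, $F^{\rm target}_i$ is the flux extracted from the $i$th aperture in the target image and $F^{\rm generated}_i$ is the flux extracted from the $i$th aperture in the generated image.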
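The aperture-flux term for simulated data can be sketched in plain NumPy. This is an illustration under stated assumptions rather than the training code: source positions are taken as given (the photutils detection step is not reproduced), the aperture is approximated by the pixels whose centres fall inside the radius, pixels are 1 arcsec so that the 10 arcsec radius equals 10 pixels, and the names `aperture_flux` and `simulated_aperture_loss` are hypothetical.

```python
import numpy as np

def aperture_flux(image, x0, y0, radius=10.0):
    """Sum the pixels whose centres lie within `radius` pixels of (x0, y0)."""
    yy, xx = np.indices(image.shape)
    inside = (xx - x0) ** 2 + (yy - y0) ** 2 <= radius ** 2
    return float(image[inside].sum())

def simulated_aperture_loss(target, generated, target_positions, radius=10.0):
    """Sum over target-detected sources of |F_target - F_generated|."""
    return sum(
        abs(aperture_flux(target, x, y, radius)
            - aperture_flux(generated, x, y, radius))
        for x, y in target_positions)

# Toy images with 1" pixels: one source of flux 5 reconstructed with flux 3.
target = np.zeros((64, 64))
target[20, 30] = 5.0
generated = np.zeros((64, 64))
generated[20, 30] = 3.0
loss = simulated_aperture_loss(target, generated, [(30, 20)])  # (x, y) position
print(loss)
```

A perfect reconstruction gives zero for this term, and any flux deficit or excess inside a target aperture contributes its absolute value.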

Training on observed data
When the network is training on observed data, the locations of sources are derived for both the target and generated images. For the target image the source identification criterion remains S/N > 3, but for the generated image this threshold is relaxed and all sources with S/N > 1 are identified. The disparity in S/N detection limits originates in the different purposes of the loss function in the two cases. When detecting sources in the real data, the purpose is to replicate the aperture flux of real galaxies. This requires a reasonable lower limit on the sources whose reconstruction is attempted. The opposite holds true when detecting sources in the generated data. In this case the aim is to identify spurious flux anomalies that do not correspond to actual sources, which requires a lower S/N threshold. Four sets of fluxes are then extracted, from both the target and generated images using both sets of apertures: fluxes are extracted from the target images using apertures from the target and generated images, and vice versa.
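The loss is computed as
$$ \mathcal{L}_{\rm obs} = \sum_{i=1}^{N_{\rm target}} \left| F^{\rm target}_i - F^{\rm generated}_i \right| + \sum_{j=1}^{N_{\rm generated}} \left| F^{\rm target}_j - F^{\rm generated}_j \right| $$
where $F^{\rm target}$ and $F^{\rm generated}$ are the aperture fluxes extracted from the target and generated images, the index $i$ runs over apertures centred on sources identified in the target image, the index $j$ runs over apertures centred on sources identified in the generated image, and $N_{\rm generated}$ is the number of point sources that are identified in the generated image. The second term explicitly penalises spurious features that appear in the generated image.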
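The two-term structure can be illustrated with the four sets of extracted fluxes as inputs. The sketch below assumes that the two terms are simply added with equal weight, which the text does not confirm; the function name `observed_aperture_loss` and the toy fluxes are hypothetical.

```python
import numpy as np

def observed_aperture_loss(tgt_at_tgt, gen_at_tgt, tgt_at_gen, gen_at_gen):
    """Two-term aperture loss from the four sets of extracted fluxes.

    *_at_tgt: fluxes in apertures centred on sources found in the target image.
    *_at_gen: fluxes in apertures centred on sources found in the generated image.
    """
    matched = np.abs(np.asarray(tgt_at_tgt, float)
                     - np.asarray(gen_at_tgt, float)).sum()
    spurious = np.abs(np.asarray(tgt_at_gen, float)
                      - np.asarray(gen_at_gen, float)).sum()
    return float(matched + spurious)     # equal weighting is an assumption

# Toy fluxes (mJy): two real sources, plus one spurious feature that is only
# detected in the generated image (third entry of the *_at_gen lists).
loss = observed_aperture_loss(
    tgt_at_tgt=[10.0, 5.0], gen_at_tgt=[9.0, 5.0],
    tgt_at_gen=[10.0, 5.0, 0.0], gen_at_gen=[9.0, 5.0, 4.0])
print(loss)
```

In the toy case the spurious feature has no counterpart flux in the target image, so it enters only through the second sum.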

Common loss components
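In addition to those based on the aperture flux differences, three common components also contribute to the training loss functions for both the observational and simulated training datasets. The first is the reduced mean of the absolute per-pixel difference between the target and generated images,
$$ L_1 = \frac{1}{N_{\rm pix}} \sum_{k=1}^{N_{\rm pix}} \left| I_{{\rm true},k} - I_{{\rm predicted},k} \right| $$
where $I_{{\rm true},k}$ and $I_{{\rm predicted},k}$ are the values of pixel $k$ in the target and generated images and $N_{\rm pix}$ is the number of pixels in either of the images. This component ensures that the loss includes some influence from the bulk of image pixels outside extracted apertures.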
The second common loss component is the absolute difference between the mean pixel fluxes of the generated and target images. This loss component is designed to encourage the generated image to have an integrated flux similar to that of the target image.
Finally, the loss includes the absolute difference between the median pixel fluxes of the generated and target images. This component is intended to produce generated images that have a similar distribution of pixel intensities to the target images. Since the majority of image pixels are noise or background dominated, this tends to result in generated images with similar background properties to the targets.
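The three common components can be sketched as below. The relative weights with which they enter the total loss are not given here, so the hypothetical helper `common_loss_terms` simply returns them separately; the toy images are chosen so that the mean term vanishes while the median term does not.

```python
import numpy as np

def common_loss_terms(target, generated):
    """Return the per-pixel L1 term, the mean term and the median term."""
    l1 = float(np.mean(np.abs(target - generated)))           # mean |difference|
    mean_term = float(abs(target.mean() - generated.mean()))  # integrated flux
    median_term = float(abs(np.median(target) - np.median(generated)))  # background
    return l1, mean_term, median_term

# Toy target: empty background with one bright pixel.
target = np.zeros((4, 4))
target[1, 2] = 8.0
# Toy generated image: the same total flux smeared uniformly over all pixels.
generated = np.full((4, 4), 0.5)

l1, mean_term, median_term = common_loss_terms(target, generated)
print(l1, mean_term, median_term)
```

The smeared image conserves the integrated flux, so only the per-pixel and median terms register the difference, which is the complementary behaviour described above.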

RESULTS
The CNN presented here was trained and tested using both a pure simulation data set and a data set combining simulated and observed data. Fig. 2 demonstrates the performance on the pure simulation data set, and Fig. 3 shows the performance of the network on the combined simulated and observed data. It is clear that the target images for the simulated data contain more discernible sources than the real JCMT images do. This is likely due to the reduced noise in the simulated data. Figs. 2 and 3 show the Herschel bands as they are provided to the network, post-normalisation. The network has no effective way of recreating the noise inherent in real observations. The median and $L_1$-loss components of the loss function should drive the network to represent the mean noise level as a quasi-uniform background flux, but the spatially varying nature of the real data noise will not be reproduced. The noise in the real background-subtracted JCMT image is distributed around zero. This drives the background level generated by the CNN to be very close to zero, but it can never be negative because the sigmoid activation of the output layer does not allow negative values. The enforced non-negativity of the super-resolved image pixel values also means that the distribution of background noise is highly non-Gaussian, and the significance of any point sources in the super-resolved image relative to the background level cannot be interpreted in a standard Gaussian framework. The right-hand panel of Fig. 4 shows an Eddington-like bias in the reconstructed fluxes at the faint end (Eddington 1913), caused by pre-selecting faint features in the reconstruction. The matching with the JCMT observations uses a lower threshold for features.
Fig. 5 shows the astrometric error on the predicted locations of all extracted sources detected in the super-resolved image. The natural intuition from single-dish observations is that the positional uncertainty of a point source should be approximately $0.6\,\theta_{\rm FWHM}/({\rm S/N})$, where $\theta_{\rm FWHM}$ is the beam full-width half maximum and S/N is the signal-to-noise ratio of that source (e.g. Ivison et al. 2007). However, in this case, the reconstructed map is not a single-dish observation, even though it resembles one. The astrometric uncertainty is a non-trivial product of the map reconstruction, and therefore it must be determined directly from the comparison with the truth data. A further complicating factor is that the pixel scales of the original Herschel SPIRE images are 6", 8.33" and 12", and that of the JCMT images is 1", while the network uses images of size 424 × 424 pixels. As 424 is not divisible by any of the Herschel SPIRE pixel scales, this will cause minor differences in the exact astrometric alignment of the images fed to the network, causing small astrometric errors. Furthermore, the misalignment of the pixels in the Herschel SPIRE data, due to the individual pixel scales not all being integer multiples of one another, might cause additional issues. However, we find that the astrometric accuracy is often better than the pixel scale of the Herschel SPIRE 500 µm images, which is the closest equivalent image to the JCMT SCUBA-2 images.
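For orientation, the single-dish rule of thumb quoted above can be evaluated directly. The helper name `positional_uncertainty` and the example S/N of 5 are illustrative assumptions; the beam width is the SPIRE 500 µm value of 36.6 quoted in the introduction, taken here to be in arcsec.

```python
def positional_uncertainty(fwhm_arcsec, snr):
    """Rule-of-thumb single-dish positional uncertainty, 0.6 * FWHM / (S/N)."""
    if snr <= 0:
        raise ValueError("S/N must be positive")
    return 0.6 * fwhm_arcsec / snr

# SPIRE 500 µm beam FWHM (36.6 arcsec) at an example S/N of 5.
sigma_pos = positional_uncertainty(36.6, 5.0)
print(round(sigma_pos, 2))
```

For these inputs the rule of thumb gives roughly 4.4 arcsec. As noted above, it need not apply to the reconstructed maps, whose astrometric uncertainty has to be measured against the truth data.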
Observed flux is a key characteristic of observed galaxies. A significant portion of the loss function was designed to target the recovery of this observable. Fig. 4 shows the relationship between the super-resolved fluxes and the target fluxes of all sources brighter than 10 times the background flux RMS in the super-resolved image, for all the test image pairs. To compute an absolute flux calibration for the super-resolved point-source fluxes, the JCMT SCUBA-2 450 µm and super-resolved images are both convolved with a 2D Gaussian with a FWHM of 36.3 arcsec. A mask is generated to isolate the brightest pixels in the Herschel 500 µm image and the pixel fluxes at the unmasked locations are compared with corresponding pixel fluxes in the two convolved images. Two linear scaling relations are found. While the network does seem to slightly underestimate the calibrated flux for the simulated sources, the results for observational data show good promise. It is worth noting that, due to the substantial overlap between the sky areas covered by the individual test images, many of the extracted fluxes correspond to the same sub-mm galaxy, seen in a different image. Thus the source distribution might not be entirely representative. For bright sources identified in both the simulated and observational datasets, an approximately 1:1 correlation between the calibrated, super-resolved fluxes and their counterparts in the target images is evident, albeit with some scatter. This correlation implies that the fluxes of bright sources can be reliably extracted from super-resolved images. Note that the custom loss function is designed to recover the total flux within a 10" aperture. The pull is defined as $p = |x - \mu|/\sigma$, with $x$ being the expected value, $\mu$ being the mean value of the bin, and $\sigma$ being the standard deviation. The pull has been calculated for the reconstructed source fluxes in table 3, where it is shown that the pull for sources between 9 and 24 mJy, with only one exception, varies between 0.11 and 0.65. A stacking of the reconstructed sources reveals that the reconstruction has a PSF profile very similar to that of the target data (see fig. 6). This is achieved with only the $L_1$-loss part of the loss function trying to replicate the PSF shape.
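The pull statistic can be computed per flux bin as in the following sketch. It assumes that the standard deviation is the population standard deviation of the reconstructed fluxes in the bin; the helper name `pull` and the toy fluxes are hypothetical and are not taken from table 3.

```python
import numpy as np

def pull(expected, values):
    """|expected - mean(values)| / std(values) for one flux bin."""
    values = np.asarray(values, dtype=float)
    return float(abs(expected - values.mean()) / values.std())

# Toy bin: four reconstructed fluxes (mJy) with an expected value of 10.5 mJy.
bin_fluxes = [9.0, 10.0, 11.0, 10.0]
p = pull(expected=10.5, values=bin_fluxes)
print(round(p, 3))
```

A pull below unity means that the bin mean lies within one standard deviation of the expected value.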
The completeness (also known as recall) and purity (also known as reliability) of the reconstructed sources are shown in fig. 7. Completeness is evaluated considering a set of "real" sources with S/N ≥ 5 in the JCMT SCUBA-2 450 µm STUDIES survey maps. A real source is counted as a true positive if a source is detected in the generated maps within 10" of it, and as a false negative otherwise.
On the other hand, we evaluate purity by considering the set of all "potential" sources that are detected in the generated maps. Potential sources that fall within 10" of a real source are deemed to be true positives and all other potential sources are counted as false positives. The completeness is > 95% for sources brighter than 15 mJy, and above 60% at 10 mJy. The purity does not drop below 87% at any point. Note that our reconstruction is remarkably complete even below the formal 500 µm blank-field confusion limit for Herschel SPIRE (table 1).
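The two definitions can be sketched with a simple positional cross-match. The example assumes flat-sky positions given in arcsec and a 10 arcsec matching radius; the function names `match_within` and `completeness_purity` and the toy catalogues are hypothetical, and no flux binning is attempted.

```python
import numpy as np

def match_within(a, b, radius):
    """For each position in `a`, True if any position in `b` lies within `radius`."""
    a = np.asarray(a, dtype=float).reshape(-1, 2)
    b = np.asarray(b, dtype=float).reshape(-1, 2)
    if len(b) == 0:
        return np.zeros(len(a), dtype=bool)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return (dist <= radius).any(axis=1)

def completeness_purity(real_xy, detected_xy, radius=10.0):
    """Fraction of real sources recovered, and fraction of detections that are real."""
    recovered = match_within(real_xy, detected_xy, radius)  # true positives vs false negatives
    genuine = match_within(detected_xy, real_xy, radius)    # true positives vs false positives
    return float(recovered.mean()), float(genuine.mean())

# Toy catalogues (arcsec): four real sources, three detections in the generated map.
real_xy = [(0, 0), (100, 0), (200, 0), (300, 0)]
detected_xy = [(3, 4), (100, 8), (200, 30)]
completeness, purity = completeness_purity(real_xy, detected_xy)
print(completeness, round(purity, 3))
```

In the toy case two of the four real sources have a detection within 10 arcsec, and two of the three detections lie within 10 arcsec of a real source.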

DISCUSSION
While many comparable neural networks (e.g. Schawinski et al. 2017; Jia et al. 2021; Moriwaki et al. 2021) use the output of a discriminator network as part of their loss functions, this paper adopts a different approach. Recall that the training objective of a discriminator component in a GAN is to effectively distinguish between images that have been artificially generated or processed and images that are genuine or pristine. However, in order to make this distinction, it may rely on features of the images that a human interpreter might consider unimportant. In this paper, the most important objective from a human perspective is for the neural network to recover the locations and fluxes of the genuine point sources in the target data. However, from the perspective of a CNN it may be that the data sets used in this paper (see Figs. 2 and 3) differ most significantly in their noise characteristics. It is therefore possible that a discriminator network would realise its training objective more effectively by focusing its attention on the fine details of the image noise, and disregarding point source properties like astrometry and flux. Conversely, by using a hand-engineered loss function, the network presented in this paper can be forced to focus on the image features that are most critical for the overall objective of super-resolving low resolution images.
Fig. 5 plots the offset distances and position angles between the locations of identifiable point sources in the super-resolved Herschel SPIRE data and the nearest sub-mm galaxy location in the corresponding JCMT SCUBA-2 imaging. Overall reconstruction accuracy is excellent, with a purity above 87% at all reconstructed source flux densities, and completeness above 95% at target source flux densities above 15 mJy. Nonetheless, some small offsets between the reconstructed and target source positions are apparent. These offsets are likely caused by the different pixel scales of the different Herschel SPIRE bands and the JCMT data. These pixel scales are not exact multiples of each other, so pixels from the different image bands intercept flux from different parts of the sky, and may encode information about different subsets of the true source distribution. Even after interpolation, the sources which fall close to the edge of a pixel in the lower resolution bands have inherently uncertain positions, which is likely reflected in the CNN output. Further, the uncertain alignment of the 350 µm band in particular might cause problems. The 12" and 6" pixel scales of the 500 µm and 250 µm bands are integer multiples of one another, while the 8.33" pixel scale of the 350 µm band is not, which might cause some uncertainty in source location when the images are shifted during data augmentation. Finally, the redder sources might have higher astrometric uncertainty as they are less well represented in the higher resolution Herschel bands. While further work might reduce this astrometric offset, Fig. 5 shows that the astrometric precision tends to be better than the 12" pixel scale of the 500 µm Herschel SPIRE band.
Following this successful proof of concept, there are several obvious next steps. These go beyond this initial analysis, and at least some of these will be presented in future papers.
Firstly, this deconvolution algorithm will be applied to all the Herschel SPIRE extragalactic survey data sets. For deeper fields with richer multi-wavelength complementary data, the deconvolution can be compared to other approaches that use this supplementary data as a prior (e.g. Hurley et al. 2017; Serjeant 2019).
Secondly, there are enhancements that can be made to the simulations, such as incorporating Galactic cirrus. Furthermore, Dunne et al. (2020) find that foreground large-scale structure can statistically magnify the background sub-mm source counts, so one improvement to the simulations would be to incorporate optical/near-infrared imaging and the effects of weak lensing. The deconvolutions would then be able to make use of the three SPIRE bands and the optical/near-infrared data. In the present analysis, the statistical clustering properties of sub-mm galaxies are implicitly (and non-trivially) used to reconstruct the missing Fourier modes on scales smaller than the point spread function (section 1), so simulating a wider range of clustered multi-wavelength training data should improve the deconvolution. Strong gravitational lensing could also be included (e.g. Negrello et al. 2010), in which case the network could also encode multi-wavelength information, such as the presence of a foreground elliptical or cluster to signpost possible strong lensing. Extending the simulations and neural net training to a wider multi-wavelength domain has the potential in principle to implicitly incorporate more information than explicit multi-wavelength priors, albeit at a cost of less direct interpretability.
Thirdly, the loss function can be tailored to suit particular science goals. The present analysis represents a particular balance between source completeness, source reliability, flux reproducibility and astrometric accuracy, but other choices are possible. There is no reason to suppose that a single "best" deconvolution to suit all purposes is possible even in principle. Indeed, the balances between angular resolution, point-source sensitivity and large-scale features are usually explicit and deliberate choices in astronomical image processing, driven in each case by the particular science goals (e.g. Briggs 1995; Serjeant et al. 2003; Smith et al. 2019; Danieli et al. 2020). One could imagine optimising the loss function not just for completeness or reliability or some balance thereof, but instead to reproduce the sub-mm galaxy source counts, or make the best estimate of the two-point correlation function of sub-mm galaxies, or reliably detect faint ultra-red sub-mm galaxies.

CONCLUSIONS
This paper has shown that it is possible to super-resolve Herschel SPIRE data using CNNs. An autoencoder architecture was chosen, and a novel loss function was engineered to better replicate the image features of interest. Both astrometry and source flux can be reconstructed using this method, with some uncertainty. The performance, particularly on source flux, is expected to improve with a larger, more varied training set of observed data, reducing the need for simulated data in the training phase. More realistic simulated data might also achieve this goal.

Figure 1. Schematic of the autoencoder used in this work. The yellow boxes represent convolutional layers. The purple boxes represent de-convolutional layers. The green boxes represent combined batch normalisation and LeakyReLU activation function layers. Blue boxes represent a sequence of batch normalisation, dropout and LeakyReLU activation layers. The red box represents the sigmoid activation function in the final layer, and the grey boxes are the input/output images.
In the aperture loss terms, $F^{\rm target}_i$ is the flux extracted from the $i$th aperture in the target image and $F^{\rm generated}_i$ is the flux extracted from the $i$th aperture in the generated image.

Figure 2. The performance of the CNN on simulated COSMOS data from Herschel SPIRE. Top left is the deconvolved image, top right is the actual JCMT SCUBA-2 450 µm image, and the bottom row shows the Herschel SPIRE images. The Herschel SPIRE images shown here are processed identically to the network inputs. They are 2-D linearly interpolated, and linearly normalised to have pixel values between zero and unity. The simulated JCMT 450 µm image shows no discernible noise, illustrating the depth afforded by the high S/N achievable in simulations.

Figure 3. The performance of the CNN on real COSMOS data from Herschel SPIRE. Top left is the deconvolved image, top right is the actual JCMT SCUBA-2 450 µm image, and the bottom row shows the Herschel SPIRE images. The Herschel SPIRE images shown here are processed identically to the network inputs. They are 2-D linearly interpolated, and linearly normalised to have pixel values between zero and unity. The JCMT 450 µm image shows the noise inherent in real observations, while the super-resolved image shows that the autoencoder can reconstruct the JCMT 450 µm image without the evident noise contribution.
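The input processing described in the captions of Figures 2 and 3 (2-D linear interpolation followed by linear normalisation to the range zero to unity) can be sketched as follows. This is an illustration under stated assumptions, not the code used in this work: the function names and the integer upsampling `factor` are hypothetical, and the interpolation grid is assumed to keep the corner pixels fixed.

```python
import numpy as np

def upsample_bilinear(image, factor):
    """Resample a 2-D image onto a grid `factor` times finer.

    Linear interpolation is applied along x and then along y, which is
    equivalent to bilinear interpolation on a regular grid.
    Assumption: the corner pixels of input and output coincide.
    """
    ny, nx = image.shape
    y_new = np.linspace(0, ny - 1, ny * factor)
    x_new = np.linspace(0, nx - 1, nx * factor)
    # Interpolate every row along x.
    rows = np.array([np.interp(x_new, np.arange(nx), row) for row in image])
    # Interpolate every column of the result along y.
    out = np.array([np.interp(y_new, np.arange(ny), col) for col in rows.T]).T
    return out

def normalise_unit(image):
    """Linearly rescale pixel values to lie between zero and unity."""
    lo, hi = np.nanmin(image), np.nanmax(image)
    if hi == lo:
        # A constant image carries no contrast; return zeros.
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)

coarse = np.array([[0.0, 1.0], [2.0, 3.0]])
network_input = normalise_unit(upsample_bilinear(coarse, 2))
```

The normalisation is applied after the interpolation so that the final pixel range is exactly zero to unity regardless of the resampling.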

Figure 4. Left panel: Comparison between fluxes of point sources that are detected in super-resolved simulated images and fluxes extracted from spatially coincident point sources in simulated high resolution target images. Only sources with fluxes exceeding 10 times the background RMS in the super-resolved image are considered. Note that the distribution of background noise is highly non-Gaussian and the significance of any point sources in the super-resolved image, relative to the background level, cannot be interpreted in a standard Gaussian framework. The flux is calculated using aperture photometry within a circular aperture of 10" centred on the source locations. The grey histogram shows the number of sub-mm galaxies detected in each bin of target versus super-resolved flux space. The red points and errors show the mean super-resolved aperture flux within each target flux bin and its associated standard deviation, respectively. The bins are defined to ensure equal numbers of galaxies in each bin, which results in the faintest and brightest bins covering a large logarithmic range. The red data points are located at the bin centres in logarithmic flux space. For clarity, the axes on the left panel are linear below 1 mJy and logarithmic above this value. Right panel: Same as left-hand panel, but comparing observed high resolution JCMT SCUBA-2 450 µm images with super-resolved observed Herschel SPIRE counterparts. Note the Eddington bias in the faint fluxes, caused by pre-selection on faint features in the reconstruction (see text).
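The equal-number binning used for the red points in Figure 4 can be sketched as below. This is a hedged illustration rather than the plotting code used for the figure: the function name and arguments are hypothetical, the bin centre is taken as the geometric mean of the faintest and brightest target flux in each bin, and strictly positive target fluxes are assumed so that the logarithmic centre is defined.

```python
import numpy as np

def equal_count_bins(target_flux, sr_flux, n_bins):
    """Bin sources by target flux with (nearly) equal numbers per bin.

    Returns one row per bin: (log-space bin centre, mean super-resolved
    flux, standard deviation of the super-resolved flux).
    Assumption: all target fluxes are positive.
    """
    target_flux = np.asarray(target_flux, dtype=float)
    sr_flux = np.asarray(sr_flux, dtype=float)
    order = np.argsort(target_flux)
    rows = []
    for idx in np.array_split(order, n_bins):
        t = target_flux[idx]
        # Centre in logarithmic flux space = geometric mean of the bin limits.
        centre = np.sqrt(t.min() * t.max())
        rows.append((centre, sr_flux[idx].mean(), sr_flux[idx].std()))
    return np.array(rows)

binned = equal_count_bins([1.0, 2.0, 4.0, 8.0], [1.0, 3.0, 5.0, 7.0], n_bins=2)
```

Because the bins hold equal numbers of sources rather than equal flux intervals, sparsely populated flux ranges produce wide bins, as noted in the caption.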

Figure 5. Angular distance versus position angle for offsets between sources extracted from the super-resolved images and their nearest counterpart point sources in the coincident SCUBA-2 imaging. Position angles are defined in degrees anti-clockwise from West. Grey markers correspond to individual objects that are identified in the super-resolved images. Contours showing the 68th, 95th and 99th percentiles are overlaid. Overall, the astrometric accuracy is excellent, with 99% of the super-resolved objects having offsets below 15 arcseconds. There is some evidence for clustering of offsets to the Northwest.
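The offsets plotted in Figure 5 can be computed along the following lines. This is a minimal sketch under assumptions that are not stated in the caption: a small-angle tangent-plane approximation, the usual sky orientation with North up and East to the left (so that West points to the right and anti-clockwise from West runs towards North), and a hypothetical function name with coordinates in degrees.

```python
import numpy as np

def offset_polar(ra_sr, dec_sr, ra_ref, dec_ref):
    """Offset of a super-resolved source from its reference counterpart.

    Inputs are in degrees. Returns (separation in arcseconds, position
    angle in degrees anti-clockwise from West).
    Assumptions: small offsets, North up and East left.
    """
    # RA increases towards the East; scale by cos(dec) for true angle.
    d_east = (ra_sr - ra_ref) * np.cos(np.radians(dec_ref)) * 3600.0
    d_north = (dec_sr - dec_ref) * 3600.0
    d_west = -d_east
    separation = np.hypot(d_west, d_north)
    angle = np.degrees(np.arctan2(d_north, d_west)) % 360.0
    return separation, angle

# A source displaced one arcsecond due North of its counterpart.
sep_north, pa_north = offset_polar(150.0, 1.0 / 3600.0, 150.0, 0.0)
```

In this convention an offset to the Northwest corresponds to a position angle of about 45 degrees.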

Figure 6. The stacked PSF of the real JCMT SCUBA-2 450 µm sources (blue), and the super-resolved sources (orange). The fluctuations in the real sources originate from the small sample of real images that can be used for training and validation, which causes a large overlap in sky area and repeats the same source multiple times. The smoothness of the super-resolved PSF is achieved by the final activation layer suppressing the noise in the generated images.

Figure 7. Left panel: Completeness of recreated sources within 10" of detected SNR ≥ 5 sources in the JCMT SCUBA-2 450 µm STUDIES survey maps. Right panel: Purity of sources detected in the reconstructed image as compared to sources with SNR ≥ 5 in the JCMT SCUBA-2 450 µm STUDIES survey. In both cases, the horizontal axis shows the minimum flux density threshold under consideration. The shaded areas show the extent of the ±1σ binomial uncertainties.
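The completeness and purity curves of Figure 7 are both fractions of matched sources above a flux density threshold, and can be sketched with a single helper. This is an illustration only: the function name and inputs are hypothetical, the cross-matching within 10" is assumed to have been done already, and the uncertainty uses the normal-approximation binomial standard error, which may differ from the interval used for the figure.

```python
import numpy as np

def matched_fraction(matched, flux, threshold):
    """Fraction of sources above `threshold` that have a counterpart.

    matched : boolean flag per source (True if a counterpart was found).
    flux    : flux density per source, same units as `threshold`.
    For completeness, pass the target catalogue and its match flags;
    for purity, pass the reconstructed catalogue and its match flags.
    Returns (fraction, binomial standard error).
    """
    matched = np.asarray(matched, dtype=bool)
    flux = np.asarray(flux, dtype=float)
    selected = flux >= threshold
    n = selected.sum()
    if n == 0:
        # No sources above the threshold: the fraction is undefined.
        return np.nan, np.nan
    p = matched[selected].mean()
    # Assumption: normal-approximation binomial uncertainty.
    return p, np.sqrt(p * (1.0 - p) / n)

frac, err = matched_fraction([True, True, False, True],
                             [1.0, 2.0, 3.0, 4.0], threshold=2.0)
```

Evaluating the helper over a grid of thresholds gives the curve and its shaded band for either panel.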
the Python package Matplotlib (Hunter 2007). SS and HD were supported in part by ESCAPE - The European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures, which in turn received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement no. 824064. SS and LL thank the Science and Technology Facilities Council for support under grants ST/P000584/1 and ST/T506321/1 respectively.

Table 2. Autoencoder network architecture. Layers 1-8 comprise the encoder, while layers 9-16 comprise the decoder. The output of the encoder network is a 2 × 2 × 512 element tensor embedding of the input image, which is used as the input to the decoder network. The outputs from each convolutional layer in the encoder network are concatenated with the inputs of their corresponding layer in the decoder network. All layers use convolutional kernels of 4 × 4 pixel extent in the width and height dimensions. The output of layer 8 encodes the embedding for this autoencoder network.

Table 3. The pull calculated for the reconstructed sources shown in the right-hand panel of Fig. 4.