The Herschel census of infrared SEDs through cosmic time

Using Herschel data from the deepest SPIRE and PACS surveys (HerMES and PEP) in COSMOS and GOODS (N+S), we examine the dust properties of IR-luminous (L_IR>10^10 L_sun) galaxies at 0.1<z<2 and determine how these evolve with cosmic time. The unique angle of this work is the rigorous analysis of survey selection effects, making this the first study of the star-formation-dominated, IR-luminous population within a framework almost entirely free of selection biases. We find that IR-luminous galaxies have SEDs with broad far-IR peaks characterised by cool/extended dust emission and average dust temperatures in the 25-45K range. Hot (T>45K) SEDs and cold (T<25K), cirrus-dominated SEDs are rare, with most sources being within the range occupied by warm starbursts such as M82 and cool spirals such as M51. We observe a luminosity-temperature (L-T) relation, where the average dust temperature of log[L_IR/L_sun]=12.5 galaxies is about 10K higher than that of their log[L_IR/L_sun]=10.5 counterparts. However, although the increased dust heating in more luminous systems is the driving factor behind the L-T relation, the increase in dust mass and/or starburst size with luminosity plays a dominant role in shaping it. Our results show that the dust conditions in IR-luminous sources evolve with cosmic time: at high redshift, dust temperatures are on average up to 10K lower than what is measured locally. This is manifested as a flattening of the L-T relation, suggesting that (U)LIRGs in the early Universe are typically characterised by a more extended dust distribution and/or higher dust masses than local equivalent sources. Interestingly, the evolution in dust temperature is luminosity dependent, with the fraction of LIRGs with T<35K showing a 2-fold increase from z~0 to z~2, whereas that of ULIRGs with T<35K shows a 6-fold increase.

IR-luminous galaxies are the ideal laboratories for studies of galaxy formation and evolution through chemical enrichment, star-formation, black hole accretion and stellar mass build-up. They hide an immensely active interstel-lar medium (ISM; e.g. Lutz et al. 1998;Farrah et al. 2003;Narayanan et al. 2005;Sturm et al. 2010) and are the ultimate stellar nurseries, with star-formation rates (SFRs) up to a few thousand times higher than Milky Way (MW)-type galaxies (e.g. Kennicutt 1998;Egami et al. 2004;Choi et al. 2006;Rieke et al. 2009). In addition, they are amongst the most massive galaxies in the Universe (e.g. Dye et al. 2008;Micha lowski et al. 2010) and often their morphologies show signs of interactions and mergers (e.g. Sanders & Mirabel 1996;Farrah et al. 2001;Moustakas et al. 2004;Kartaltepe et al. 2010b). Finally, they frequently harbour an active galactic nucleus (AGN), which is commonly considered a key player in the evolution of the system (e.g. Genzel et al. 1998;Ptak et al. 2003;Alexander et al. 2005;Page et al. 2012).
Until recently our view of the IR-luminous galaxy population at high redshift has been based on data from the space observatories ISO and Spitzer, as well as ground-based submm/mm facilities such as JCMT/SCUBA, APEX/LABOCA, IRAM/MAMBO and SMA/AzTEC. Although huge advances have been made in our understanding of the nature and evolution of these sources, it has been challenging to reconcile the data from space observatories with comparable ground-based IR/mm datasets. In recent years it has become increasingly apparent that, besides strong evolution in IR galaxy number density (e.g. Le Floc'h et al. 2005;Huynh et al. 2007;Magnelli et al. 2009, Berta et al. 2010, the physical properties of IR galaxies might also evolve with time, with the rate of evolution potentially changing as a function of luminosity (Seymour et al. 2010). Studies of the local Universe showed that ultraluminous infrared galaxies (ULIRGs) are characterised by warm average dust temperatures (e.g. Soifer et al. 1984; Klaas et al. 1997;Clements, Dunne & Eales 2010), strong silicate ab-sorption and PAH emission features in their mid- IR continuum (e.g Brandl et al. 2006;Armus et al. 2007), as well as compact starburst sizes (e.g. Condon et al. 1991;Soifer et al. 2001). However, with the onset of submm/mm facilities which probed the early Universe (z >2), such as SCUBA in the late 1990s, a different picture emerged. Many IRluminous galaxies at high redshift were found to be less compact than their local counterparts (e.g. Tacconi et al. 2006;Iono et al. 2009;Rujopakarn et al. 2011), exhibiting stronger PAH emission (Farrah et al. 2007;Valiante et al. 2007) and a greater abundance of cold dust (Kovács et al. 2006;Pope et al. 2006;Coppin et al. 2008). It was later shown that these differences in the measured dust properties were partly due to selection effects and partly due to evolution (Symeonidis et al. 2009;2011a). Moreover, the exploitation of long wavelength data from ISO (Rowan- Robinson et al. 2005) and Spitzer (Symeonidis et al. 2007;, enabled the discovery of IR-luminous galaxies at <1, with a spectrum of properties which overlapped with both the local population and the SCUBA-detected, z >2, sources, providing the missing link between the two (Symeonidis et al. 2009).
The launch of the Herschel Space Observatory 1 (Pilbratt et al. 2010) has opened a new window in infrared astronomy, as it is the only facility to date and for the foreseeable future to perfectly span the wavelength range in which most of the Universe's obscured radiation emerges (70-500 µm), uncovering unprecedented numbers of dustenshrouded galaxies over a sizeable fraction of cosmic time. The large dynamical range of the PACS (Poglitsch et al. 2010) and SPIRE (Griffin et al. 2010) instruments, have enabled spectral energy distributions (SEDs) to be compiled for a large range of objects, both for AGN (e.g. Hatziminaoglou et al. 2010;Seymour et al. 2011) and star-forming galaxies (e.g. Rowan-Robinson et al. 2010). Recent studies of the properties of the IR-luminous galaxy population using Herschel data provide an excellent showcase of the capabilities of this observatory (some examples from the multitude of Herschel papers on this topic: Amblard et al. 2010;Rowan-Robinson et al. 2010;Magnelli et al. 2010;, Magdis et al. 2010Gruppioni et al. 2010;Eales et al. 2010b;Hwang et al. 2010;Elbaz et al. 2010;Rodighiero et al. 2010;Ivison et al. 2010;Buat et al. 2010;Dye et al. 2010;Berta et al. 2010;Dunne et al. 2011;Symeonidis et al. 2011b;Kartaltepe et al. 2012). Results from these studies carried out during the Science Demonstration Phase (SDP) of the largest extragalactic surveys, HerMES (Herschel multi-tiered extragalactic survey; Oliver et al. 2012), PEP (PACS Evolutionary Probe; Lutz et al. 2011) and H-ATLAS (Herschel Astrophysical Terahertz Large Area Survey; Eales et al. 2010a), confirmed previous findings on the diversity of IR-luminous galaxy properties at high redshift, as well as the existence of high-redshift sources with no local equivalents, moving us closer in understanding the complex nature of the infrared galaxy population up to z ∼3.
In this paper we report a comprehensive study of the SEDs and dust temperatures of IR-luminous galaxies up to z ∼2, using the deepest available Herschel /PACS and SPIRE data acquired as part of the PEP and HerMES consortia in the COSMOS and GOODS (N & S) fields. A key aspect of our work is the attempt to eradicate survey selection effects, an issue which has plagued previous attempts to canvas the range of infrared SEDs (see Symeonidis et al. 2011a). Thus for the first time we are able to examine the aggregate properties (infrared luminosity, dust temperature, SED shape) of IR galaxies within an almost entirely bias-free framework. The paper is laid out as follows: the introduction is followed by a section on the sample selection (section 2) and SED measurements (section 3). In section 4 we discuss how we deal with AGN, in order to obtain a sample which is star-formation dominated in the infrared. Section 5 is devoted to treatment of selection effects, enabling us to assemble a complete sample of IR galaxies. In section 6 we present our results and discuss them in section 7. Finally our summary and conclusions are presented in section 8. Throughout we employ a concordance ΛCDM cosmology of H0=70 km s −1 Mpc −1 , ΩM=1-ΩΛ=0.3.

Initial selection
The starting point for this work are data from Herschel, covering three extragalactic fields: the Great Observatories Origins Deep Survey (GOODS)-North and South (Giavalisco et al. 2002) and the Cosmic Evolution Survey (COSMOS) field (Scoville et al. 2007). We use PACS 100 and 160 µm and SPIRE 250, 350 and 500 µm images, acquired as part of PEP and HerMES respectively. Source extraction from the PACS and SPIRE images 2 is performed on the IRAC-3.6 µm positions of the f24 30 µJy GOODS (N and S) sources and f24 60 µJy COSMOS sources, as described in Magnelli et al. (2009) andRoseboom et al. (2010;. This method of source extraction on prior positions is widely used and enables identifications of secure counterparts over the whole SED. In this case however, its significant advantage, lies in its ability to effectively deal with source blending in the Herschel bands, particularly for SPIRE where the beam is large (18.1, 24.9 and 36.6 arcsec FWHM at 250, 350 and 500 µm respectively; Nguyen et al. 2010). By using prior information to identify galaxies in the Herschel images, we are able to extract 'clean' photometry for each galaxy, even for those which appear blended in the PACS and SPIRE bands. For information on the GOODS Spitzer /MIPS 24 µm data see Magnelli et al. (2009) and for information on the COS-MOS Spitzer /MIPS 24 µm data see Sanders et al. (2007); Le Floc'h et al. (2009). The 3 σ sensitivity limits of the PACS 100 and 160 µm catalogues respectively are 5 and 10 mJy for COSMOS, 3 and 6 mJy for GOODS-N and 1 and 2 mJy for GOODS-S. A 3 σ detection in SPIRE using prior positions and the cross-identification method of Roseboom et al. (2010) is approximately 8, 11 and 13 mJy at 250, 350 and 500 µm in all fields. In the case of the PACS bands σ is only Table 1. Table showing the detection statistics of the Herschel sample used in this work. The first and second columns show the field and number of 24 µm sources with f 24 >30 µJy and f 24 >60 µJy for GOODS (N & S) and COSMOS respectively, whose positions are used as priors for source extraction in the Herschel bands. Column 3 shows the total number of 'isolated' 24 µm sources defined as having no companion within 8 ′′ ; the percentage in brackets is calculated out of the number in column 2. Column 4 shows the fraction detected at 100 and 160 µm, whereas column 5 shows the fraction detected at 160 and 250 µm. The final column shows what fraction of the 24 µm population is detected when the two criteria are used in disjunction, i.e. [100 and 160 µm] OR [160 and 250 µm]. This is the criterion used to select the initial Herschel sample (section 2.1). The fractions shown in columns 4, 5 and 6 are out of the number of sources in column 3. As also mentioned in section 2.1, the 3 σ sensitivity limits of the PACS 100 and 160 µm catalogues respectively are 5 and 10 mJy for COSMOS, 3 and 6 mJy for GOODS-N and 1 and 2 mJy for GOODS-S. A 3 σ detection in SPIRE using prior positions and the cross-identification method of Roseboom et al. (2010)  the photometric error, whereas for the SPIRE bands, σ includes confusion error (see Nguyen et al. 2010 for the SPIRE confusion limits). The initial selection for our sample includes all 24 µm sources that have detections (at least 3σ) at [100 and 160 µm] OR [160 and 250 µm] (where 'OR' is the operator representing disjunction in Boolean logic). This ensures that (i) the sample consists of IR-luminous galaxies (LIR > 10 10 L⊙), (ii) the sample is as complete as possible over a large redshift range with respect to SED types, given the PACS and SPIRE selection functions (see section 5 for more details) and (iii) there are at least 3 reliable photometric points in the SED (24 µm+ 2 Herschel bands) for subsequent measurements.
Our selection is in essence a colour selection rather than a single band selection, as we require sources to be detected at both 24 and 160 µm (the additional Herschel photometry at 100 or 250 µm has a small effect on the sample selection but enables more accurate SED measurements; see section 5). As a result, the detection rate is more strongly dependent on the SED shape, in our case the mid-to-far-IR continuum slope, than for a typical flux limited survey. Given the flux limits reported earlier, the 24 µm survey is 66, 166 and 200 times deeper than the PACS 160 µm survey in GOODS-S, COSMOS and GOODS-N respectively. We find that the different ratios in flux density limits (f lim 160 /f lim 24 ) between the 3 fields, introduce a bias with respect to the SED shapes that are detected in each survey particularly for objects with flux densities close to the limit. To mitigate this effect, we match the relative PACS-24 µm depths of the GOODS fields to the COSMOS survey, as the latter covers the largest area and hence dominates the statistics of the final sample. This is done as follows: the GOODS-S sample is cut at f160=5 mJy and the GOODS-N sample is cut at f24=36 µJy, so that f lim 160 /f lim 24 ∼ 166 in all cases. Note that matching the samples in this way ensures that the relative biases between the surveys are minimised, i.e. that all three surveys probe the same range of SED types, however it does not deal with absolute biases; these are dealt with in section 5.
Besides the photometric selection criteria, we also restrict the sample to sources which have no other 24 µm companions within 8 ′′ ; i.e. 'isolated' sources. This allows us to work with more reliable photometry, as at the longer wave-lengths, where the Herschel beam is large, flux extraction in the Herschel bands can be problematic when dealing with blended sources. The choice of an 8 ′′ radius is larger than the 24 µm beam (6 ′′ ) and visual inspection shows that it is sufficient to eliminate problematic blends. In addition, 8 ′′ is the scale of the first airy ring of bright 24 µm sources. In cases where there is a companion within the first airy ring, the Herschel flux is assigned to the bright source in the centre, if the companion source is 10 or more times fainter (Roseboom et al. 2010). By eliminating such cases from our sample, we avoid the occurance of a potential bias causing the measured Herschel flux density to positively correlate with the 24 µm flux density. As a result, for the 24 µm sources we subsequently use, there have been no prior assumptions when assigning Herschel fluxes. This is confirmed by performing a K-S test on the flux density distribution of the whole 24 µm population and that of the 'isolated' 24 µm sources in each field. We find the two to be entirely consistent, suggesting that our approach works well in eliminating problematic sources without introducing systematic biases.
At this point, our assembled Herschel sample consists of 2500 sources (2206 from COSMOS, 173 from GOODS-S and 121 from GOODS-N). Some statistics for the initial sample are shown in table 1.

Redshifts
The redshifts we use are a combination of spectroscopic and photometric, assembled from various catalogues: Berta et al.

MEASUREMENTS
We fit the photometry of the Herschel sample, with the Siebenmorgen & Krügel (2007, hereafter SK07) library of 7208 models, built on the formulation of Krügel & Siebenmorgen (1994). This library of templates ranges in 5 free parameters, in physically acceptable combinations and the shape of each template is determined by the combination of parameters which define it: • the radius of the IR emitting starburst region (R), taking discrete values of 0. 35, 1, 3, 9, 15 kpc • the total luminosity of the system (Ltot) ranging from 10 10 to 10 15 L⊙ • the visual extinction from the edge to the centre of the starburst, taking discrete values of 2. 2, 4.5, 7, 9, 18, 35, 70 and 120 mag • the ratio of the luminosity of OB stars with hot spots to the total luminosity (LOB/Ltot) taking discrete values of 40, 60 and 90 per cent for the 3 kpc models and 100 per cent for the 9 and 15kpc models • the dust density in the hot spots in units of hydrogen number density (cm −3 ) ranging from 100 to 10000, in discrete steps As mentioned earlier, 6-band photometry is available -24 µm from Spitzer /MIPS, 100, 160 from Herschel / PACS and 250, 350, 500µm from Herschel /SPIRE -and at least 3 bands, always including the 24 and 160 µm data, are used in the fitting. The normalisation of the templates is varied in order to minimise χ 2 . The 0.68 lower and upper confidence limits for our computed parameters resulting from the fits (e.g. total infrared luminosity etc.) are calculated according to the prescribed χ 2 confidence intervals for one interesting parameter, namely χ 2 min + 1, where χ 2 min is the minimum χ 2 . Note that because our photometry only sparsely samples the full IR SED, the parameters that characterise the best fit SEDs within χ 2 min + 1 are often degenerate and not well constrained. In addition, each object in the sample is fit with the entire SK07 library irrespective of the inherent luminosity of the templates; the total infrared luminosity, LIR (L⊙), of each source is computed by integrating the best matched SED model between 8 and 1000µm (see Fig.  1 for a plot of LIR as a function of redshift). This allows us to stay clear of any assumptions which link the SED shape to the luminosity. However, it also implies that we cannot directly use some of the SK07 parameters to describe our sample, as they require scaling. One such parameter is the starburst size, as its impact on the SED shape depends on the total input luminosity. Finally, the SK07 grid, although more flexible than most stand-alone SED libraries currently in the public domain, is still too coarse to allow complete characterisation of the physical properties of the sample. For these reasons, we opt to use one parameter to describe the overall shape of the SK07 SED templates; we refer to this as the flux (F), calculated as log [Ltot/4πR 2 ] in units of L⊙ kpc −2 , where Ltot is the given luminosity of the template (not our computed LIR) and R is the starburst size that corresponds to that template. In the SK07 formulation, for constant AV, as R becomes larger, the dust mass increases as a function of R 2 and hence becomes cooler, with the temperature being a function of Ltot/R 2 . Hence, the larger F is, the more flux reaches the edge of the starburst region and therefore the dust emission is warmer. One can interpret high and low values of F as representative of systems with warmer/more compact and cooler/more extended dust-emission respectively.
In order to calculate dust temperatures, we use a modified black-body function (a grey-body), of the form B λ (T )(1−e −τ λ ), with a wavelength dependent optical depth τ λ = τ100µm(100µm/λ) β (e.g. see Klaas et al. 2001) and a dust emissivity index β. We assume a low opacity limit, so approximate the term (1 − e −τ λ ) by λ −β . Typical reported values of β range between 1.5-2 (e.g. Dunne et al. 2000, Lisenfeld, Isaak & Hills 2000 and we adopt β=1.5, consistent with studies of the far-IR emissivity of large grains (Desert, Boulanger & Puget 1990). The temperatures are derived by fitting all photometry at λ λi max−1 , where i denotes a Herschel band (100, 160, 250, 350, 500 µm) and i max is the band which corresponds to the maximum flux (νfν); the 24 µm photometry is never included in the fitting. Note that although some studies have shown that a two-temperature greybody model (e.g. Klaas et al. 2001;Dunne & Eales 2001), is a more accurate description of far-IR dust-emission, this would not work with our available photometry; much longer wavelength data would be needed especially for sources at high redshift. However, in any case, our aim is to measure the average dust temperature of the far-IR peak for each source, thus a single temperature greybody model is required; our method gives a temperature which is most representative of the peak dust emission.

DEALING WITH AGN CONTAMINATION
As this work targets the properties of the star-forming galaxy population, objects whose infrared energy budget potentially includes a significant contribution from an AGN need to be removed from the sample. Although the frac- tion of Herschel galaxies found to host AGN is high (∼30 per cent; Symeonidis et al. 2011b), it has been shown that AGN in far-IR-selected galaxies do not often dominate the infrared or total energy budget of the system. According to an energy balance argument, if the AGN is energetic enough to contribute significantly to the infrared emission of a starburst galaxy, then its signature is likely to emerge in the mid-IR part of the SED in the form of a power-law continuum (e.g. Symeonidis et al. 2010). As a result, the most suitable way to identify such objects is by examining their colours in the Spitzer /IRAC (3.6, 4.5, 5.8, 8 µm) bands. Until recently, the most commonly used IRAC AGN selection criteria have been those presented in Lacy et al. (2004) and Stern et al. (2005). However, as shown in Yun et al. (2008) and Donley et al. (2012), there is a non-negligible chance that IR/submm selected galaxies will be erroneously identified as AGN-dominated in the IRAC bands. In addition, Hatziminaoglou et al. (2009)  (2012) who show that AGN rarely contribute more than 20 per cent in the IR emission of far-IR detected systems (see also Hatziminaoglou et al. 2010;Page et al. 2012;Rosario et al. 2012;Nordon et al. 2012).

Herschel selection
To accurately characterise the aggregate properties of the IR-luminous population and their evolution with redshift, there should be no bias with regard to the SED types we can observe, particularly regarding the far-IR where all our measurements are performed. As a result, the accuracy of our work rests on minimising selection biases and assembling a sample within an unbiased part of the L−z parameter space. For this purpose, we use the method described in Symeonidis et al. (2011a) to examine the selection functions of the PACS and SPIRE bands, mapping out an SED-redshiftluminosity parameter space at the flux density limits of the GOODS and COSMOS surveys used in this work. Fig. 3, shows the LIR ∼10 11 L⊙ and 10 12 L⊙ selection functions for MIPS 24 µm, PACS 100, 160 µm and SPIRE 250, 350 and 500 µm at the COSMOS flux density limits. As also explained in detail in Symeonidis et al. (2011a), Fig.  3 is created by using all SED templates from the SK07 library, normalising them to the required total IR luminosity and then scaling and redshifting them to the observed frame. Each SED template is then convolved with the MIPS, PACS and SPIRE filter transmission curves in order to get the weighted integrated flux within each filter. We subsequently perform a colour correction (according to the prescription in the instruments' observer manuals) in order to obtain a monochromatic flux density in each band, derived with the same spectral shape used to calculate the measured flux density of real sources. For each band we then compare our template monochromatic flux density to the flux density limit, in order to determine whether an object with the given redshift, luminosity and SED shape would be part of our sample. This results in the selection functions presented in Fig. 3. Red thick patterns mark the regions where all templates of a given peak wavelength are detected, whereas the blue and green dashed patterns indicate regions where only 90 and 70 per cent of the SK07 templates are recovered respectively. The detection rate relates to variations in SED shape; for example, for the MIPS 24 µm selection function, only luminous SEDs with strong PAH features will be detected at z ∼1.7. In the unshaded areas less than 70 per cent of templates are detectable, a fraction which reduces to zero above a certain redshift.
Note that the selection functions of PACS and SPIRE overlap significantly, suggesting that the SEDs of most sources will be fully sampled by Herschel photometry. However, there are large differences between the MIPS/24 µm and the Herschel selection functions, especially at 500 µm. This is not surprising as they probe different parts of the SED and it is expected that the long wavelength one eventually turns to favour very cold sources and the short wavelength one eventually turns to favour warm sources. The 24µm selection is a steep function of SED shape: SEDs with a significant warm dust component are favoured up to very  Symeonidis et al. (2011a). The plot shows SED peak wavelength (left y-axis) and grey body temperature (right yaxis) as a function of redshift. At any redshift slice and for a given flux density limit, the red region indicates that all SED shapes of the corresponding peak wavelength are detectable, whereas the blue and green dashed patterns indicate regions where only 90 and 70 per cent of the SK07 templates are recovered respectively. In the unshaded areas less than 70 per cent of templates are detectable, reaching zero above a certain redshift. The flux density limits used to construct these diagrams are 0.06, 5, 10, 8, 11 and 13 mJy for the 24, 100, 160, 250, 350 and 500 µm bands respectively, corresponding to the COSMOS 3 σ flux density limits. . The plot shows SED peak wavelength (left y-axis) and grey body temperature (right y-axis) as a function of redshift. At any redshift slice and for a given flux density limit, the red region indicates that all SED shapes of the corresponding peak wavelength are detectable, whereas the blue and green dashed patterns indicate regions where only 90 and 70 per cent of the SK07 templates are recovered respectively. In the unshaded areas less than 70 per cent of templates are detectable, reaching zero above a certain redshift. The flux density limits used to construct these diagrams are 5, 10 and 8 mJy for the 100, 160 and 250 µm bands respectively, corresponding to the 3σ flux density limits in COSMOS. The black box in each panel outlines the redshift range where all SED shapes with peak wavelengths 50-140 µm are detectable -note that the redshift range is larger in the 3rd panel. In the work presented in this paper we use only sources which lie below the red curve, which includes any source with 50 < λ peak (µm) < 140, or, equivalently, temperature within 18 T (K) 52. high redshifts. The small area covered by the red pattern implies that it is only over a small redshift range that all SED shapes are recoverable. However, the large blue and green shaded regions indicate that cold SEDs can be detected up to high redshift, as long as they do not have a high far-IR to mid-IR ratio.
Ignoring the 24 µm selection for the moment, Fig. 3 shows that our far-IR selection criteria of a detection at [100+160 µm] OR [160+250 µm] will result in the largest number of sources detected in an unbiased part of L-z space. This is more clear in Fig. 4  .5, at the COSMOS limits. The black boxes outline the extent in redshift whereby all SEDs with peak between 50-140 µm will be detected. This translates to a temperature range of 18-52 K, using the Wien displacement law for a νfν grey body, T (K) ∼ hc (4+β)kλ peak , where h is the Planck constant, c is the speed of light in a vacuum and k is the Boltzmann constant and we take the dust emissivity (β) to be 1.5. This choice of peak wavelength/temperature range is the best compromise between the range of SED types probed and the number of objects studied. As we shall see in section 6 increasing that range would not have changed the measured average properties of the sample but would have significantly reduced the statistics. The black box in the lower panel of Fig. 4 shows that using the two criteria in disjunction, i.e. [100+160 µm] OR [160+250 µm], allows log (LIR/L⊙)=11.5 sources to be selected up to much higher redshifts, than when these criteria are used separately.
With the aid of the combined selection functions, we now define the complete LIR-z parameter space for each survey (GOODS-N & S, COSMOS), shown in Fig. 5, where the curves separate the complete (below curve) and incomplete (above curve) part of the parameter space. The space below the curves indicates the LIR-z range, where any source with SED peak wavelength within 50 < λ peak (µm) < 140, or, equivalently, temperature within 18 T (K) 52, is detectable. As mentioned earlier, the [100+160 µm] criterion (green curve in Fig. 5), picks up more sources at low redshift but excludes more high redshift sources because its selection function quickly turns over to favour warm SEDs providing a limited unbiased L-z space at high redshift. On the other hand the [160+250 µm] criterion (blue curve in Fig. 5) performs poorly at low redshift, because the SPIRE 250 µm which mainly drives the combined selection function, largely favours cold sources. At z ∼1, the combination of 160 and 250 µm turns over as now these bands are sampling 80 and 125µm respectively, covering the bulk of IR emission. The combined criterion of [100+160 µm] OR [160+250 µm] (red curve in Fig. 5) is what we thus use to select the final sample used in this work. Note that it is the 160 µm band that principally drives the combined selection function, whereas the PACS 100 µm and SPIRE 250 µm in essence provide an additional band in the infrared, vital for our analysis. However, as they complement each other well, such that most SEDs missed at 100 µm are picked up at 250 µm and vice versa, the complete region below the red curve in Fig. 5 includes many sources which are in the incompleteness regions of both the blue and green curves. In addition, this selection criterion ensures that the SED peak is well sampled for most sources up to z ∼ 2.
Although the selection outlined above is quite conservative since it is unlikely that all SK07 models with 50< λ peak (µm) <140 are representative of real objects, it nevertheless allows us to perform our analysis within a biasfree framework with respect to the PACS and SPIRE surveys. Hereafter, our study concerns only the 1159 sources within the complete parameter space below the red curves in Fig. 5. The redshift distribution of the final Herschel sample is shown in Fig. 6.

The 24 µm selection
In section 5.1 we assembled a complete sample with respect to the Herschel bands, which cover the part of the SED primarily used in our study (see section 6). It is now important to examine whether the requirement for a 24 µm detection affects the completeness of this sample. Fig. 3 shows that although the 24 µm and 160 µm selection functions cover approximately the same redshift range, some SEDs are systematically missed at 24 µm. We investigate this further by aiming to answer the following questions: what types of SEDs are missed, how common are sources with such SED types and how does this affect our results. The first question is easier to answer. These SEDs are ones with high far-to-mid-IR ratio, resulting from a combination of parameters in the SK07 formulation, such as high extinction (AV>70) and/or low luminosity and/or low dust density within the hot spot, with a detection rate that is also redshift dependent. Examples of these SEDs (in the observed frame) are presented in Fig. 7, and Fig. 8 shows the f500/f160 -f160/f24 colour space these cover; in both figures we assume the COSMOS flux limits. Fig. 7 shows that such templates are characterised by steep mid-IR continua Figure 7. Example SED templates from the SK07 library, shown in the observed frame. In the top panel they are normalised to a luminosity of L IR =10 11 L ⊙ and then scaled and redshifted to z=0.3. In the lower panel they are normalised to a luminosity of L IR =10 12 L ⊙ and scaled and redshifted to z=0.9. At these luminosities and redshifts, these templates would not be detected down to the 60 µJy COSMOS 24 µm flux density limit. However they are detected in the Herschel bands at the COSMOS 3 σ 100, 160, 250 µm limits, also shown in both panels. For comparison, the SED of Arp220 (black SED; taken from SK07) is also plotted normalised at the appropriate luminosity and redshifted. Note that at z=0.9, it is only just detected. and strong silicate absorption features at 9.7 and 18 µm, a result of high extinction in the SK07 formulation. Note that these SED types are easily detected in the Herschel bands. In both panels of Fig. 8, we also show the SED of Arp220 (taken from SK07), normalised at the appropriate luminosity and redshift. Arp220 is one of the most optically thick ULIRGs known (e.g. Papadopoulos et al. 2010), so it is interesting to examine whether it would be detected in our sample. The top panel shows that at low redshift it would be detected in all bands, however, at z=0.9 it is only just detected at 24 µm, but easily detected with Herschel. As this SED is redshifted further, it will be missed by the 24 µm survey, suggesting that optically thick SEDs, with deep silicate absorption, would not be in our sample at high redshift (z 1). Fig. 8 shows the observed f500/f160 versus f160/f24 colours for the COSMOS sample as well as for sources in the GOODS samples which fall within the COSMOS completeness region shown in Fig. 5. The grey-shaded region is the parameter space occupied by the SED templates which are  Fig. 7). The black dots are the COSMOS sample and red squares are the GOODS-S and GOODS-N sources within the COSMOS L − z completeness region (Fig. 5). The orange asterisk are the colours for the SK07 SED of Arp220 redshifted to z =1.5. detected in the Herschel bands within the complete LIR − z parameter space for COSMOS outlined in Fig. 5, but are missed by the 24 µm selection. Some overlap between the samples and shaded region should be expected, as some sources could have high f160/f24 ratios because of high f160 rather than a low f24. However, we see very little overlap, suggesting that the detected and non-detected SEDs cover a significantly different part of parameter space in terms of their far-to-mid-IR colour. This implies that, on average, SEDs are missed by the 24 µm selection because of a particular feature that reduces the amount of 24 µm observed flux, such as a silicate feature, rather than an inherently steep far-to-mid-IR continuum. This also becomes obvious from the values of f160/f24 ratios that some templates have: such high values of f160/f24, up to 6 orders of magnitude, can only be caused by a strong 9.7 µm silicate feature; c.f. with what is observed for the Arp 220 SED redshifted at z =1.5, where the 9.7 µm feature falls in the 24 µm band. In terms of the f500/f160 ratio, there are only a handful of Herschel sources with f500/f160 >1, whereas a significant fraction of the templates in the grey region have such cold colours. This is not surprising as we would expect some of the templates that are missed at 24 µm to be overall colder and hence have higher f500/f160; see some examples in Fig. 7.
To answer the second question 'how common are these SED types?', we compare the GOODS (N and S) to the COSMOS colours in Fig. 8. As mentioned earlier, the GOODS sources shown in Fig. 8 are within the COSMOS completeness region mapped out in Fig. 5. However, for GOODS, the 24 µm flux density limit is twice as deep as it is in COSMOS, so in principle these GOODS sources could have SEDs with f160/f24 >330. We do not find any such sources, in fact we note that GOODS objects have f160/f24 ratios within the range covered by COSMOS, suggesting that SEDs with high f160/f24 (up to z∼2) are rare.
In Fig. 9 we examine the flux density distribution of the 24 µm population in the 3 fields under study, in comparison to the distribution of Herschel sources before and after the Herschel completeness criteria are applied (section 5.1). Note the significant offset between the distributions of Herschel sources and that of the 24 µm population, suggesting that Herschel flux densities are intrinsically correlated with bright 24 µm flux densities. This is unlikely to be an artifact of our selection, as we are using only 'isolated' 24 µm objects, so there is no reason why, in principle, a Herschel source cannot be associated with a faint 24 µm source. It is immediately obvious from Fig. 9 that the fraction of sources which would not be part of our sample because of the requirement of a 24 µm detection is very small. Assuming normally distributed flux densities and by calculating the mean and standard deviation of each distribution, we can compute nσ where n is the number of standard deviations (σ), at the location of the 24 µm flux density limit. This gives a rough indication of the fraction of sources that are likely missed due to the 24 µm selection. Before the Herschel completeness criteria are applied, we estimate that 0.1, 1.4 and 0.01 per cent of sources are unaccounted for in GOODS-S, COSMOS and GOODS-N respectively. After these criteria are applied, the fraction of missing sources in COSMOS goes down to 0.2 per cent, whereas for the GOODS fields it is less than 0.06 per cent. Although these are rough estimates and rest on the assumption of normally distributed flux densities, they do indicate that the fraction of sources missed by the 24 µm selection is very small once our final sample is assembled in the complete L − z parameter space using the Herschel selection functions (section 5.1).
In light of this analysis, we conclude that (i) the SEDs missed by the 24 µm selection have high far-to-mid-IR ratios, mainly as a result of high optical depth in the mid-IR and deep silicate features, (ii) these SEDs are not a common occurrence amongst IR-luminous galaxies up to z ∼2 and (iii) our final Herschel sample can be assumed to be complete in terms of the SED types we can detect. Our findings are consistent with the study of Magdis et al. (2011) who found the Herschel 24 µm dropouts to constitute a few per cent of the IR-luminous galaxy population at high redshift and concluded that they must be sources with stronger silicate absorption features -see also Roseboom et al. (2010); Lutz et al. (2011) for discussion of 24 µm selection effects. Finally the third question of 'how would the fraction of sources missed affect our results' is discussed at the end of section 6.4.

Local sample selection
In order to compare the Herschel sample to analogous sources in the nearby (z 0.1) Universe, we also assemble an IRAS -selected local sample of LIR > 10 10 L⊙ galaxies by combining sources from Clements et al. (2010), Hwang et al. (2010) and Buat et al. (2010). For the Clements et al. (2010) objects, we retrieve their published single greybody temperatures and total infrared luminosities calculated using IRAS 60 and 100 µm data as well as SCUBA data. For the other two samples (Hwang et al. 2010 andBuat et al. 2010), we use their computed total infrared luminosities and calculate greybody temperatures by fitting a greybody function of emissivity β=1.5 (see section 3) to the IRAS and AKARI fluxes at λ 60 µm. In all cases, the SEDs of the local sources have some coverage at 100 µm, either because of AKARI or SCUBA data.
In order to combine these samples for subsequent analysis, we select all sources which are in the complete L − z region of the IRAS /60 µm selection function down to a flux density of 1 Jy (see method in section 5 and Symeonidis et al. 2011a for the IRAS selection function). Fig. 5 shows the curve dividing complete and incomplete parts of parameter space. We cut the local samples to include only sources in the complete parameter space, where any source with f60 >1 Jy and SED peak wavelength within 50 < λ peak (µm) < 140, or, equivalently, temperature within 18 T (K) 52, is detectable.

SED characteristics
Typical SEDs for the Herschel sample (0.1 < z < 2) are shown in Fig. 11, split into the 3 standard luminosity classes: normal IR galaxies (NIRGs; 10 10 <LIR<10 11 ), luminous IR galaxies (LIRGs; 10 11 <LIR<10 12 ) and ultraluminous IR galaxies (ULIRGs; 10 12 <LIR<10 13 ); see section 3 for details on the SED fitting. The Herschel bands cover the bulk of the far-IR emission for the majority of sources, although in some cases the exact position of the SED peak might be underestimated or the slope of the mid-IR continuum might not be well constrained due to lack of data between 24 µm and the first Herschel band used in the fitting (λ obs =100 or 160 µm). Nevertheless, in all cases the greybody function covers the bulk of the dust emission giv- Figure 11. Typical SEDs for NIRGs, LIRGs and ULIRGs; y-axis rest-frame luminosity in L ⊙ , x-axis rest-frame wavelength (µm). Black points are the available photometry from 24 to 500 µm. The blue curve is the best-fit SK07 model and the green dotted curve is a single temperature greybody fit around the photometric peak in ν fν . ing a good representation of the average temperature of the sources.
To quantitatively describe the global SED shape we use the F parameter defined in section 3. Fig. 12 shows the distribution in F for the Herschel sample, split into the 3 luminosity classes and overlaid on the distribution of all templates in the SK07 library. Interestingly, the F distributions of NIRGs, LIRGs and ULIRGs show large overlap, perhaps surprising as one might expect ULIRGs to exhibit a noticeable offset to larger F values, simply because they are more luminous. However this is not the case, indicating that many of the Herschel ULIRGs are described by cool/extended rather than warm/compact SEDs, in order to reach the same radiation strength per unit area as their lower luminosity counterparts. Indeed, the majority of objects have 8.5< F <10, suggesting that the IR-luminous population up to z ∼2 is best described by extended rather than compact dust emission. Note that the F distribution of the Herschel sample covers only a small range of the available parameter space. We find that templates with F >11 are not representative of any object (within the 1σ uncertainties on F) and in fact, only about a 1/3 of the number of templates in the SK07 library are representative of the sample. This provides useful insight on what SED types are observationally confirmed in the context of a physicallymotivated suite of models. Fig. 13 illustrates that the observed distribution in F translates to broad-peaked SEDs for the Herschel sample, particularly evident when comparing to those of well-studied local galaxies such as NGC1808, M82, NGC6240 and Arp220 (their SEDs all taken from SK07). For the Herschel SEDs, the slope on either side of the peak is shallow in antithesis with the SEDs of the more compact starbursts M82 and Arp220, which have F=10 and 11 respectively.

Far-IR colours
Fig. 14 shows the rest-frame L100/L250 versus L70/L100 colours of the best fit SK07 models to the Herschel sample. These bands were chosen for two reasons: (i) they probe a part of the SED that is well sampled by our data hence substantially constraining the SK07 models in that region and (ii) they trace the shape of the SED both around the peak (L70/L100) and in the Rayleigh-Jeans side of the continuum (L100/L250). For comparison we also include the colours of the SK07 library as well as those of nearby galaxies and modelled SEDs computed using the GRASIL code (Silva et al. 1998;1999) 3 . These are: M100, M82, M51, NGC 6946, Arp220 and NGC 6090, nearby LIRGs and ULIRGs (Vega et al. 2008), modelled colours to represent face-on spirals (Sa, Sb and Sc), as well as high redshift gamma-ray burst (GRB) host galaxies (Micha lowski et al. 2008), in essence young, compact, star-forming systems of low metallicity. Nearby LIRGs and ULIRGs, such as M82 and Arp220, show warm colours, whereas M51, M100 and M6946 are in the cold part of colour-colour space. On the other hand, GRB hosts have similar values of L100/L250 but warmer L70/L100 colours than spiral galaxies. Note that the modelled spirals (Sa, Sb, Sc) are located along the edge of the available parameter space with colours that become colder down the sequence from Sa to Sc. Although the colours of the Sc spiral are slightly offset from the parameter space that the SK07 templates cover, small discrepancies between SED libraries are Figure 14. Rest-frame L 100 /L 250 versus L 70 /L 100 colours for the Herschel sample compared to other star-forming galaxy types indicated in the legend. The light grey shaded region represents the colours for the SK07 library, with dark grey shading used for templates of 9 < F < 10 and black shading for templates with F > 11. The location of face-on spirals (large white circles with black border) shifts from top-right to bottom-left with consecutive morphological class, with Sa being in the top-right. The red writing indicates the position of the well-known galaxies M100, M82, M51, NGC 6946, Arp220 and NGC 6090. expected, particularly since there are many input parameters which can contribute to the final SED shape.
Overall, the SK07 templates extend over a large range in colour-colour space, adequately covering the colours of the comparison sample. The SK07 colours show a large spread below L100/L250 ∼10 and a narrow tail at large values of L100/L250 and L70/L100. This tail, formed by models with F > 11, is not populated by any of the galaxy groups presented here, suggesting that such hot SEDs are not typical of any type of star-forming galaxy. The darker grey shaded region consists of templates with 9 < F < 10 and is the parameter space that most Herschel sources occupy (see Fig. 12). These templates have cool far-IR L100/L250 colours of <10 and a large spread in L70/L100. A value of L70/L100=1.0 ±0.3 ties in with an SED peak between 70 and 100 µm, the range seen in the Herschel sample (see section 6.3). The light grey lower-left part of colour-colour space is scarcely occupied by the Herschel sample, indicating that very cold, cirrus-dominated SEDs with L100/L250 <3 are uncommon in LIR > 10 10 L⊙ galaxies. This is not a consequence of either the selection or the survey flux limits; we remind the reader that our sample is complete with respect to the SED types that can be probed, hence a deeper IR survey is not expected to identify IR-luminous galaxies with different SEDs to the ones observed here. Colder colours (L100/L250 <3) might perhaps be more common amongst Figure 15. Dust emperature as a function of total infrared luminosity for the Herschel sample (black points). The green filled circles are the mean temperatures computed for 11 L IR bins (see table 2). The green error bars are the standard deviation in the measured temperatures of each bin, whereas the black error bars are the error on the mean. The 2 limiting scenarios of the Stefan-Boltzmann law are also shown (solid black lines). The dotted horizontal lines outline the L − T parameter space in which we are complete; see section 5.1. more quiescently star-forming galaxies with lower infrared luminosities (LIR < 10 10 L⊙), but examining such sources is beyond the scope of this paper. It is interesting to compare the locus of Herschel sources with that of the comparison samples. We see significant overlap overall, however many Herschel LIRGs and ULIRGs are clearly offset from the region traditionally occupied by their local counterparts, displaying colder colours consistent with more quiescent star-forming galaxies such as M100 and NGC 6090.

The luminosity -dust temperature relation
The dust temperature (T ) for the Herschel sample, derived as described in section 3, is shown in Fig. 15 as a function of total infrared luminosity. Note that although the derivation of single average dust temperatures represents simplistic assumptions with respect to dust properties, such as optical depth, emissivity, dust geometry and so on, it is currently the only consistent way to characterise and compare statistically large samples of IR-luminous galaxies over a large redshift range.
The mean temperature of the Herschel sample ranges from about 28 to 39 K, increasing with LIR and showing an average 1 σ scatter of 5 K; see table 2. Our results confirm that the choice of temperature range (∼18-52 K) over which our sample was described as unbiased (section 5) was adequate, as we find that the minimum and maximum average temperatures are offset by about 10 K from 18 and 52 K respectively. In fact, IR-luminous galaxies with T <25 K and T >45 K constitute ∼6 and ∼3 per cent of the total population respectively.
Since the emission from large dust grains in equilibrium, hence the bulk of the IR emission, is well approximated by the black body (or grey body) function, we also investigate to what extent we can use the Stefan-Boltzmann law, L = AǫσT 4 , to interpret the L − T relation, where A is the surface area, ǫ is the emissivity and σ is the Stefan constant. A is proportional to R 2 , which is in turn proportional to the dust mass (M dust ), for constant extinction, so one can re-write the Stefan-Boltzmann law as L ∝ M dust T 4 (or L ∝ R 2 T 4 ). This spawns two limiting scenarios. The first is Figure 16. The luminosity-redshift parameter space of the final Herschel sample used in the analysis (black triangles), split into bins of size 0.2 dex in luminosity and 0.2 in redshift. The bins are outlined in blue for NIRGs, green for LIRGs and orange for ULIRGs and further delineated in red if used when computing the average SED shape in that bin (see Fig. 17 for average SEDs). that the emitting area and/or dust mass is constant which would result in an L − T relation of the form: L ∝ T 4 . The second is that the emitting area and/or dust mass is proportional to the luminosity (L ∝ R 2 or L ∝ M dust ) with the temperature remaining constant for all galaxies. The curves representing these scenarios are plotted in Fig. 15. The observed L − T relation for the Herschel sample is quite flat-2 orders of magnitude increase in luminosity results in only a 40 per cent increase in temperature -and hence closer to the L ∝ M dust limiting scenario. This suggests that the L − T relation is mainly shaped by an increase in dust mass and/or IR emitting radius and less so by an increase in dust heating. In other words, the average dust temperature of ULIRGs is much lower than what one would expect if their increased luminosity were the only factor shaping the L − T relation. This indicates that the dust masses and/or sizes of ULIRGs are larger than those of NIRGs, significantly diluting the effect that their increased luminosity has on the temperature. Fig. 16 shows the L − z distribution of the sample, split into bins of 0.2 dex in luminosity and 0.2 in redshift. For the bins additionally outlined in red (24 in total), containing 5 objects, we compute the average SED in that bin shown in Fig. 17. Fig. 17 is analogous to Fig. 16, such that each row represents a change in redshift interval, and each column a change in luminosity interval. The shaded region represents the 1σ scatter around the average SEDs, whereas the red SED at the end of each row is the average for that row. Consistent with what we observe with regard to the L − T relation (Fig. 15), Fig. 17 demonstrates that there is a shift in the SED peak from longer to shorter wavelengths with increasing infrared luminosity. This is more clear in the last column which shows the average SED for each luminosity bin: the SED peak (λ peak ) shifts from 86 µm in the lowest luminosity bin to 65 µm in the highest luminosity bin. This is also seen in Fig. 13 where the average SED of each luminosity class is shown, with the mean and standard deviation in λ peak being 86±18 µm for NIRGs, 75±18 µm for LIRGs and 65±17 µm for ULIRGs. Note that the 1 σ scatter is large, partly because the peak is not always well constrained by our data, and partly because of the large diversity in SED types (see Figs 11 and 14 for examples).
Another feature that appears to change with LIR, is the silicate absorption depth, becoming shallower for higher LIR. Our data does not probe the depth of the silicate feature, except over a small window in redshift where it coincides with the 24 µm passband. Hence this trend is likely an artifact brought about by model degeneracies. In the SK07 formulation, visual extinction is tied in to the silicate absorption depth, the slope of the mid-IR continuum and the SED peak  wavelength. These quantities are significantly degenerate at low redshifts, although for high redshift sources (z 1), the photometry probes further into the mid-IR continuum, placing additional constraints on the extinction. Nevertheless, although the SED shape in the near/mid-IR is more reliably reproduced for the high redshift sources, the observed trend of decreasing silicate depth with increasing LIR is most likely artificial.

Evolution in dust conditions
In Fig. 18, we compare the properties of the Herschel sources to those of the local sample, assembled as described in section 5.3. The local luminosity-temperature and luminositycolour (C; L60/L100) relations, the latter in functional form from Chapin, Hughes & Aretxaga (2009), are shown in Fig.  18, left and right panels respectively. The L60/L100 colour has been used extensively to characterise the dust temperature of local samples and analysis of IRAS -selected galaxies has shown that more luminous sources have higher colour temperatures than their less luminous counterparts (e.g. Dunne et al. 2000;Dale et al. 2001;Dale & Helou 2002;Chapman et al. 2003). For both L − T and L − C relations we note a systematic difference between the Herschel and local samples, with the former displaying lower values of T and L60/L100. This can be interpreted as evidence for evolution: high redshift IR-luminous galaxies have more emission longward of ∼60 µm compared to their low redshift analogues, lowering the average dust temperature. This is consistent with results from section 6.2, Fig. 14, where we see that the Herschel sample extends to colder far-IR colours than local LIRGs and ULIRGs.
Recent results from the Planck collaboration (Part 16) on the dust properties of nearby IRAS -selected sources showed that many local 10 10 < LIR < 10 11 galaxies extended to lower temperatures than previously reported. This does not affect our work as (i) within the scatter and given that emissivity is a free parameter in their fitting, their measured dust temperature in the 10 10 < LIR < 10 11 luminosity range are consistent with the ones we report here for the local sample (Fig. 18) and (ii) as described in section 5.3, we assemble the local sample in an unbiased part of parameter space. Furthermore, lack of additional submm/mm data for our nearby sources, would not change the average dust temperatures measured, as these trace the peak dust emission and are thus insensitive to inclusion of photometry significantly longward of the peak (see also Magnelli et al. 2012).
Although a systematic reduction in the dust temperature and colour of IR-luminous galaxies from the local to the high redshift Universe is observed, it is worth examining how this translates to changes in the SED shape with redshift within the Herschel sample. In Fig. 17, we see some evidence for a shift in the SED peak to longer wavelengths along the rows, i.e. with increasing redshift, for example in the log LIR=11.4-11.6 bin. However, overall, it is not clear how the SED shape evolves with redshift. This is likely a consequence of sparse sampling of the SK07 templates by our photometry, exacerbated by the small redshift range covered along each row, statistical uncertainties in each bin as well as the fact that in many cases, sources do not cover each L − z bin uniformly (see Fig. 16). A different way to investigate evolution in dust properties is to repeat this exercise with our computed dust temperatures, in order to remove model-dependent uncertainties, although the other sources of uncertainty outlined above would remain. Fig. 19 shows a 2-D image of the dust temperature of IR luminous galaxies (local and Herschel samples combined) as a function of redshift and luminosity, in bins of 0.2 and 0.2 dex respectively. Besides an increase in average dust temperature vertically along the luminosity axis, consistent with our analysis in section 6.3 on the L − T relation, we also note an overall reduction in the mean dust temperature, horizontally, along the redshift axis. This is more pronounced when considering Figure 19. 2-D image of the L − T − z space for IR-luminous galaxies from the local Universe to z = 2. This includes both the local and the Herschel samples. Objects are divided into L IR (y-axis) and z bins (x-axis), and the average temperature of each bin is shown as a greyscale intensity map. The colourbar on the right is the temperature key for the map. The temperature bins extend from 22.5 to 45 K. White colour indicates unpopulated or underpopulated (i.e. <5 objects) L − T − z bins. the first and last bins of each row, however in some cases it is also evident along the length of the row.
Considering the above trends, it is interesting to examine whether the evolution in dust temperature we observe, is luminosity dependent, i.e. whether dust conditions of ULIRGs evolve at a different rate to those of NIRGs (Fig. 20). We place the separation between cold and warm at T =35K, corresponding to the mean temperature for the Herschel sample shown in Fig. 18. Fig. 20 shows an increase in the fraction of cold ULIRGs with redshift, from 5 per cent in the local Universe to about 30 per cent at z ∼1-2. Also the fraction of cold LIRGs increases from about 60 per cent locally to about 80 per cent at high redshift. However, as for Figs 17 and 19, in some bins, the L − z parameter space is not sampled uniformly. The higher redshift bins in Fig. 20 include a larger fraction of more luminous and hence warmer sources, implying that the fraction of cold sources is likely underestimated, resulting in the observed downturn of the computed fraction.
Note that given the depth of our data, currently this is the best attainable coverage of the L − z plane in the horizontal (z) direction within the unbiased framework we define in this work. This does not mean that LIRGs at z 1 and ULIRGs at z 2 will not be detected by Herschel or other facilities, rather it implies that it is not currently possible to measure the aggregate properties of the LIRG and ULIRG population at those redshifts. On the other hand, with larger area Herschel surveys we expect to cover the gap between the local and high redshift Universe in the vertical (L) direction, enabling better sampling of the L − z plane, where large survey area is needed, and hence achieve better statistics for log LIR/L⊙ >11.5 galaxies up to z = 1.
With respect to the selection at 24 µm, we saw that we are likely missing a few per cent of SEDs with steep farto-mid-IR continua and deep silicate features, about half of which have f500/f160 >1 extending to higher values than we find in the Herschel sample (see Fig. 8 and section 5.2). This suggests that many of the sources missed by the 24 µm criterion might have lower temperatures than the average temperatures derived for the Herschel sample. Although this fraction of sources is very small and hence unlikely to change our results, it would nevertheless only serve to strengthen the differences we observe between the local and high redshift sample.

The properties of the IR-luminous population
The Herschel sample under study consists mainly of LIRGs (64 per cent), with ULIRGs constituting 20 per cent, consistent with what is expected in the redshift range probed (0.1 < z < 2), whereas the remaining 16 per cent are normal IR galaxies (NIRGs; 10 10 < LIR < 10 11 L⊙). The dust temperatures of the sample show large scatter from 15 to 55 K, however the mean temperature ranges from 28±4 to 39±6 K, with ULIRGs being on average about 10 K warmer than NIRGs. Similarly, we see a shift in the average SED peak wavelength from 86±18 µm for NIRGs to 65±17 µm for ULIRGs. The sample is best described by cool/extended, rather than warm/compact SEDs, and broad peaks, translating to low values of F, a parameter which we defined as a measure of the overall IR SED shape. In addition, there is a large overlap in colour-colour (L100/L250 -L70/L100) space between the Herschel sample and other star-forming galaxy types, the coldest of which (spirals and young compact star-forming galaxies) are at L100/L250 7 and the warmest (ULIRGs and starbursts) at L100/L250 7. For the Herschel sample, we noted a roughly equal number of sources above and below L100/L250 ∼7. It is worth mentioning that only about a 1/3 of the SK07 templates are representative of the Herschel sample. The L100/L250>20, L100/L250<3 and L70/L100>2 regions of the SK07 library are scarsely populated. Moreover, the distribution in F of the SK07 library extends to much higher values (F ∼15), than what is observed in the Herschel sample (F <11). This indicates that compact, hot-dust-dominated SEDs or very cold cirrusdominated SEDs are not typical of the IR-luminous population.
How do our findings compare to other studies of IR galaxies? As mentioned earlier, we have performed a rigorous analysis of selection effects in order to minimise any biases which would interfere with our results. Hence, due to the nature of our study, we are sensitive to most, if not all, IR galaxy types with LIR > 10 10 L⊙ at z=0.1-2. Consequently, results on the properties of IR galaxies from previous studies with Spitzer (e.g. Symeonidis et al. 2009;Kartaltepe et al. 2010a;Patel et al. 2011) andSCUBA (e.g. Kovacs et al. 2006;Coppin et al. 2008;Santini et al. 2010) are all within the parameter space we probe (see also Magnelli et al. 2012). The results reported recently using Herschel data (e.g. Rowan-Robinson et al. 2010;Hwang et al. 2010;Smith et al. 2012) are also within the parameter space we define here for the IR-luminous population. However, we note that although cold, cirrus-dominated SEDs such as those reported in Rowan-Robinson et al. (2010) and Smith et al. (2012) are part of our sample, they represent a small fraction (<6 per cent) of the IR-luminous population and are not the prevalent SED types.

The L − T relation
The increase in dust temperature as a function of infrared luminosity we observe here, is similar to the L − T relation that has emerged from most infrared population studies (e.g. Dunne et al. 2000;Dale et al. 2001;Dale & Helou 2002;Chapman et al. 2003). Since the emission from large dust grains in equilibrium, hence the bulk of the IR emission, is well approximated by a black (or grey) body function, we aimed to understand the L−T relation within the framework of the Stefan-Boltzmann law. As described in section 6.3, the two limiting scenarios of the Stefan-Boltzmann law are: (i) L ∝ T 4 with R (the radius of the emitting region), or M dust (the dust mass) kept constant and (ii) L ∝ R 2 or L ∝ M dust with T kept constant. The former scenario would produce a steep L − T relation, whereas for the latter the L − T relation would be flat, with the increase in luminosity tying in with an increase in surface area of the emitting body or the dust mass. We find that the L − T relation for the Herschel sample lies closer to the latter scenario suggesting that its shape is mainly driven by an increase in dust mass or extent of dust emitting region and less so by energetics. In other words, it seems that the increased dust heating in ULIRGs is diluted by an increase in their physical size and/or dust mass, such that the L − T relation is diverted away from the L ∝ T 4 scenario. However, although more strongly starforming galaxies such as ULIRGs are predominantly more massive (e.g. Davé 2008;Shapley 2011) and dustier systems (e.g. Magdis et al. 2012) with warmer average dust temperatures, we find that their SED shapes are not substantially different to their lower luminosity counterparts. In particular, we do not see a clear segregation in far-IR colour-colour space or in the range covered by F as a function of infrared luminosity. Interestingly, this suggests that properties such as optical depth, dust extinction, extent of IR emitting regions etc., which determine the overall SED shape are not substantially different between high redshift ULIRGs and their lower luminosity systems.

Evolution of the IR-luminous population
In this work we found evidence that the dust temperatures of Herschel sources are systematically colder than equivalently luminous galaxies in the local Universe. The rigorous analysis of survey biases we performed ensures the validity of this result, as our final sample selection was sensitive to all IR galaxies with 18 T (K) 52. We found sources at z >0.5 and log [LIR/L⊙] >11.0 to be on average between 5 and 10 K colder than their z <0.1 counterparts. We also noted a systematic offset to colder rest frame L60/L100, L100/L250 and L70/L100 colours for the Herschel sample in comparison to local equivalent sources. We believe that this is unlikely to be due to an increase in extinction, which by removing flux from the mid-IR and adding it in the far-IR could mimic a lower dust temperature. As discussed earlier, sources with high extinction and deep silicate absorption are missed by our 24 µm selection at z >1. However, they do not constitute more than a few per cent of the population (see also Magdis et al. 2010). Moreover, for the high redshift sources, where the Herschel photometry can more accurately constrain the slope of the mid-IR-to-far-IR continuum, we find that the sample is mainly fit with low extinction (shallow silicate absorption depth) models.
Note that although previous studies have shown that IR galaxies colder than local equivalents exist in abundance at high redshift -e.g. results from ISO (e.g. Rowan-Robinson et al. 2005), SCUBA (Kovács et al. 2006;Pope et al. 2006;Coppin et al. 2008), Spitzer (e.g. Symeonidis et al. 2008;Kartaltepe et al. 2010a;Patel et al. 2011), BLAST (Muzzin et al. 2010), Herschel (e.g. Hwang et al. 2010, Rowan-Robinson et al. 2010Smith et al. 2012) -for the first time we determine that the mean dust temperature of the IR-luminous population as a whole decreases as a function of redshift. Moreover, we find that the decrease in temperature is also a function of infrared luminosity, i.e. the temperatures of more luminous objects show a stronger decline from the local to the early Universe. We note that almost all NIRGs, up to z ∼0.5, have temperatures below 35 K, whereas for LIRGs the local cold (T <35 K) fraction is 60 per cent increasing to about 90 per cent at z ∼0.5. The ULIRGs show the largest increase in the cold galaxy fraction, from about 5 per cent at z < 0.1 to 30 per cent at z = 1 − 2. This is interesting as it implies that LIRGs undergo more modest evolution than ULIRGs, the former showing a 2-fold increase in the fraction of cold galaxies, whereas the latter a 6-fold. Moreover, the cold LIRG fraction in the local Universe is about 12 times higher than the cold ULIRG fraction, however, we see that this difference decreases to about 2.5 at z ∼0.8. Similarly the cold NIRG fraction in the local Universe is about 50 per cent higher than the cold LIRG fraction, whereas at z ∼0.4 these fractions are about the same. This is evidence that cold galaxies become more dominant at high redshift. However, it also indicates that there might be a lower limit in the average dust temperature of the IR-luminous population, at T ∼ 25 K, towards which systems tend. This is not to say that T <25 K IR-luminous galaxies do not exist, but these would be at the tail of the temperature distribution.
As discussed earlier, the L − T relation would be completely flat, e.g. all IR luminous galaxies would have an average temperature of ∼25 K, if their sizes or dust masses increased in proportion to their total IR luminosities (see also Fig. 15). As it stands, this is not the case, and so the increased luminosity succeeds in heating up the dust to a higher temperature. However, the decrease in average dust temperature with redshift suggests that high redshift LIRGs/ULIRGs must have more extended IR emitting regions and/or higher dust masses relative to their lower redshift counterparts, causing the L−T relation at high redshift to become flatter than the local one. Described phenomenologically, we observe that the temperature evolution of IRluminous galaxies is more rapid if their local temperatures are much higher than 25 K, such as for ULIRGs, than if they are closer to 25 K such as for NIRGs. Our results are in agreement with the work presented in Dunne et al. (2011) who find strong evolution in the dust mass density, proposing that IR-luminous galaxies are dustier at z ∼0.5 compared to today, corresponding to a factor 4-5 increase in the dust masses of the most massive galaxies. Moreover, CO measurements support the idea of extended instead of compact star-formation in high redshift ULIRGs, which have >>kpc CO sizes, in contrast to local equivalent sources which are more concentrated (<1kpc) (e.g. Tacconi et al. 2006, Iono et al. 2009Rujopakarn et al. 2011). Moreover, significantly higher gas fractions in z ∼ 1 disc galaxies than in nearby discs have been reported (e.g. Tacconi et al. 2010). Our results also agree with Seymour et al. (2010), who studied the comoving IR luminosity density (IRLD) as a function of temperature, proposing that cold galaxies dominate the IRLD across 0 < z < 1 and are thus likely to be the main driver behind the increase in SFR density up to z ∼1.
Although the IR-luminous population and particularly LIRGs and ULIRGs seem to have more extended dust distribution and/or higher dust masses at high redshift, the analysis presented here cannot constrain whether these systems are characterised by a merger-induced or isolated starformation history or where they are located in the starformation rate-stellar mass parameter space (e.g. Reddy et al. 2006Reddy et al. , 2010Wuyts et al. 2011;Whitaker et al. 2012;Zahid et al. 2013). Morphological studies of IR-luminous galaxies have presented contrasting results -e.g. Bell et al. 2005;Lotz et al. 2006 report that more than half of LIRGs at z> 0.7 are gas-rich isolated spirals, whereas other studies (e.g. Le Fevre et al. 2000, Cassata et al. 2005, Bridge et al. 2007de Ravel et al. 2009) claim an increase in the merger rate out to z ∼1 suggesting that more than half of IR luminosity density out to z ∼1 is a result of some merger event. Moreover, Zamojski et al. (2011) and Kartaltepe et al. (2012) find more than 70 per cent of ULIRGs up to z ∼2 to be mergers. These findings are hard to reconcile and recent evidence from Lotz et al. (2011) shows that these differences might simply be the result of increasing gas fractions at high redshift, as the timescale for observing a galaxy to be asymmetric increases in tandem with the gas fraction, with the resulting dust obscuration also being a key factor. With the work presented here, we are not able to test this argument nor examine the morphological evolution (if any) of IR galaxies. Nevertheless, our description of the IR-luminous population is independent of morphological classification. Our results indicate that the gas-rich environment in the early Universe might have set or enabled different initial conditions in these systems, resulting in the observed differences between IR galaxies at high redshift and their local counterparts. The increase in cold (T <35 K) galaxy fraction reported here suggests a greater diversity in the IR population at high redshift, particularly for (U)LIRGs. In contrast, the dust properties of the local (U)LIRG population are more uniform and as a result they are not archetypal of the (U)LIRG population as a whole.

SUMMARY & CONCLUSIONS
We have examined the dust properties and infrared SEDs of a sample of 1159 infrared-selected galaxies at z=0.1-2, using data from Herschel /PACS and SPIRE and Spitzer /MIPS (24 µm) in the COSMOS and GOODS fields. The unique angle of this work has been the rigorous analysis of survey selection effects enabling us work within a framework almost entirely free of selection biases. The results we thus report should be considered as representative of the aggregate properties of the star-formation-dominated, IR-luminous (LIR>10 10 ) population up to z ∼2.
We conclude that: • IR-luminous galaxies have mean dust temperatures between 25 and 45 K, with T < 25 and T > 45 K sources being rare. They are characterised by broad-peaked and cool/extended, rather than warm/compact SEDs, however very cold cirrus-dominated SEDs are rare occurrences, with most sources having SED types between those of warm starbursts such as M82 and cool spirals such as M51.
• The IR luminous population follows a luminositytemperature (L − T ) relation, where the more luminous sources have up to 10 K higher dust temperatures. However, the effect of increased dust heating is not solely responsible for shape of the L − T relation. We find that the increase in dust mass and/or extent of dust distribution of the more luminous sources dilutes the increased dust heating, flattening the L − T relation and driving it towards the limiting scenario of L ∝ R 2 or L ∝ M dust with T =const.
• High redshift IR-luminous galaxies are on average colder with a temperature difference that increases as a function of total infrared luminosity and reaches a maximum of 10 K. For the more luminous (log [LIR/L⊙] 11.5) sources, the L − T relation is flatter at high redshift than in the local Universe, suggesting an increase in the sizes and/or dust-masses of these systems compared to their local counterparts.
• The fraction of T<35 K galaxies increases with redshift as a function of total infrared luminosity. Although NIRG fractions are consistent in the local and high redshift Universe, LIRGs show a 2-fold increase and ULIRGs a 6-fold increase. This suggests a greater diversity in the IR-luminous population at high redshift, particularly for ULIRGs.

APPENDIX A: REDSHIFTS
Here we examine the reliability of the photometric redshift catalogues used in this work (see section 2.2) for sources which satisfy the initial criterion for the selection of our sample (section 2.1): 24 µm sources that have detections (at least 3σ) at [100 and 160 µm] OR [160 and 250 µm].  Santini et al. (2009) and Cardamone et al. (2010) catalogues. There is excellent agreement between the spectroscopic and photometric redshift in both catalogues, with the catastrophic failures, which we define as a more than 20 per cent offset in (1+z), being <1 per cent of the spectroscopic sample.
For COSMOS the spectroscopic redshifts are from Lilly et al. (2009) and the photometric redshifts from Ilbert et al. (2008), derived using up to 30 photometric bands. The estimated accuracy of a median zspec-z phot /(1 + zspec)=0.007 for the galaxies brighter than i = 22.5. For the sources which host X-ray detected AGN we substitute the Ilbert et al. redshift with a photometric redshift from Salvato et al. (2009);. These are derived with a combination of AGN and galaxy templates and are hence more accurate for galaxies hosting AGN. Fig. A1 (lower panel) shows a comparison of photometric to spectroscopic redshifts for the COSMOS sample sources. The agreement is again excellent, with only about 1 per cent of sources outside the 20 per cent uncertainty region.
We next examine the redshift distribution of our final Herschel sample used in this work assembled in section 5. This is shown in A2 for the 3 fields. Note that the photometric and spectroscopic redshift distributions agree, although the photometric redshift distribution tails off to higher redshifts. We examine whether the use of photometric redshifts would have any effect on our results, by reproducing one of our main figures using only spectroscopic redshifts (Fig.  A3). We see that using only spectroscopic redshifts does not change the overall differences we find between the Herschel and local sample, however it does significantly reduce the statistics.   Figure A3. This figure is reproduced from the main part of the paper, but solely with spectroscopic redshifts. We see that the overall results remain the same, however the statistics are reduced.