Optimal metallicity diagnostics for MUSE observations of low-z galaxies

The relatively red wavelength range (4800-9300{\AA}) of the VLT Multi Unit Spectroscopic Explorer (MUSE) limits which metallicity diagnostics can be used; in particular excluding those requiring the [O ii]{\lambda}{\lambda}3726,29 doublet. We assess various strong line diagnostics by comparing to sulphur Te-based metallicity measurements for a sample of 671 HII regions from 36 nearby galaxies from the MUSE Atlas of Disks (MAD) survey. We find that the O3N2 and N2 diagnostics return a narrower range of metallicities which lie up to ~0.3 dex below Te-based measurements, with a clear dependence on both metallicity and ionisation parameter. The N2S2H{\alpha} diagnostic shows a near-linear relation with the Te-based metallicities, although with a systematic downward offset of ~0.2 dex, but no clear dependence on ionisation parameter. These results imply that the N2S2H{\alpha} diagnostic produces the most reliable results when studying the distribution of metals within galaxies with MUSE. On sub-HII region scales, the O3N2 and N2 diagnostics measure metallicity decreasing towards the centres of HII regions, contrary to expectations. The S-calibration and N2S2H{\alpha} diagnostics show no evidence of this, and show a positive relationship between ionisation parameter and metallicity at 12 + log(O/H)>8.4, implying the relationship between ionisation parameter and metallicity differs on local and global scales. We also present HIIdentify, a python tool developed to identify HII regions within galaxies from H{\alpha} emission maps. All segmentation maps and measured emission line strengths for the 4408 HII regions identified within the MAD sample are available to download.


INTRODUCTION
Gas-phase metallicity is a key indicator of how star formation has progressed through a galaxy's history, as metals are formed and later expelled into the interstellar medium (ISM) by stars over cosmic time.A principal pursuit in galaxy evolution studies is therefore to trace the build up of heavy elements through cosmic time, and within galaxies.
The oxygen abundance, 12 + log(O/H), which is typically used as a proxy for the gas-phase metallicity of the ISM, is measured via diagnostics which rely on the relative strengths of emission lines (see e.g.Maiolino & Mannucci 2019).Metal recombination lines provide the most direct measure of metallicity, and have the advantage that their line strength is weakly dependent on gas properties such as temperature and density (e.g.Peimbert et al. 2017).However, the lines are very weak, being around 10 3 − 10 4 times fainter than the Balmer lines (Maiolino & Mannucci 2019), and can therefore only be detected in the very nearby Universe.Alternatively, the slightly stronger auroral lines also provide a relatively direct ⋆ E-mail: be329@bath.ac.uk tracer of metallicity, whereby the relative strength of auroral to nebular lines originating from the same species provide a measure of the electron temperature (Te) of the gas.Due to the increased cooling effect provided by the metal lines in more metal-rich gas, the metallicity and electron temperature are linked, so from this measure of Te, it is possible to fairly accurately determine the metallicity of the gas (e.g.Izotov et al. 2006;Peimbert et al. 2017;Kewley et al. 2019;Yates et al. 2020).Nevertheless, while stronger than the metal recombination lines, auroral lines are still 10−100 times fainter than Hydrogen Balmer lines, becoming increasingly faint in more metal rich systems.This means they are also only detectable in a limited number of systems.While auroral line diagnostics can be considered a direct measurement of the physical conditions within the gas, it must be noted that they rely on a number of assumptions and simplifications (e.g.Pérez-Montero 2017; Cameron et al. 2020;Yates et al. 2020).For example, the gas temperature is assumed to remain constant within a series of concentric shells, rather than taking into account variations on smaller scales (Osterbrock & Ferland 2006;Bresolin 2006), which may lead to diagnostics under-estimating the metallicity.
Due to the faintness of these lines, strong line diagnostics have been developed, offering an essential tool to explore the gas-phase metallicity in galaxies too metal-rich or too distant for the recombination and auroral lines to be detected.These diagnostics are developed either by finding empirical relations between a combination of strong line ratios and Tebased metallicity in Hii regions or galaxies, or equivalently, between metallicities and strong line ratios predicted with photoionisation models.
These strong line diagnostics, first developed by Alloin et al. (1979) and Pagel et al. (1979), frequently rely on the [O ii] and [O iii] nebular lines, such as the R23 (log(([O ii]λ3727 + [O iii]λλ4959,5007)/Hβ)) diagnostic, which uses the two principal oxygen states to account for the ionisation structure in the Hii region.This diagnostic, however, has a large dependence on the ionisation parameter, as well as being double-branched, requiring a second, less sensitive metallicity diagnostic to determine which branch applies (Kewley & Dopita 2002;Kobulnicky & Kewley 2004;Maiolino & Mannucci 2019).The N2O2 diagnostic (log([N ii]λ6584/[O ii]λ3727)) has very little dependence on the ionisation parameter, but primarily traces N/O, and is therefore sensitive to the assumed relation between N/O and O/H (Maiolino & Mannucci 2019).The O3N2 (log(([O iii]λ5007/Hβ)/([N ii]λ6584/Hα))) diagnostic is popular due to all relevant lines being generally accessible in a single grating setting, and its small dependence on dust reddening due to the proximity of the lines in the ratios.However, O3N2 is primarily a tracer of the ionisation parameter, log(U ), thus requiring an understanding of how metallicity and ionisation parameter are related, which can vary on spatial scales and with redshift.An alternative to the O3N2 diagnostic which is similarly insensitive to dust reddening, but apparently additionally independent of log(U ) and gas pressure, is the Dopita et al. (2016) N2S2Hα diagnostic, which the authors thus claim is a useful diagnostic to use on highredshift galaxies.
Thus, while strong line diagnostics are an important tool, it is important to remain mindful of their associated shortfalls, largely related to their dependency on properties other than metallicity, such as ionisation parameter, ISM pressure, and electron density (Kewley & Dopita 2002;Dopita et al. 2016).These limitations are manifested in the large discrepancies in metallicity that can be observed between different diagnostics (e.g.Kewley & Ellison 2008) and are not necessarily systematic.Such relative discrepancies can be seen, for example, in the shape of observed radial metallicity profiles (e.g.Belfiore et al. 2017Belfiore et al. , 2019;;Schaefer et al. 2019;Boardman et al. 2020;Mingozzi et al. 2020;Poetrodjojo et al. 2021;Yates et al. 2021).These discrepancies are not yet well understood (Kewley & Dopita 2002;Stasinska 2019).
To this end, in Easeman et al. (2022), we investigated the radial metallicity profiles of galaxies, and found large differences in the prevalence of certain features such as central dips in the radial profiles when different diagnostics were used.The prevalence of these dips in metallicity profiles measured using the O3N2 diagnostic also appeared linked to global properties of the galaxy such as stellar mass and star formation rate, whereas links to global properties were much weaker when the Dopita et al. (2016) N2S2Hα diagnostic was used.
Possible discrepancies between different strong line diag-nostics may arise from biases present in the calibration samples used when deriving these diagnostics (e.g.Curti et al. 2017;Kewley et al. 2019;Stasinska 2019).Often these biases are unavoidable, for example in the need for auroral lines detections in empirically derived diagnostics, which are primarily detected in low metallicity, high excitation gas (Hoyos & Díaz 2006).Using diagnostics on observations of different spatial scales to the calibration sample can also be problematic.Yates et al. (2020) found that diagnostics calibrated on Hii region scales agreed well with Te-based measurements for observations on similar scales, but less well for observations on global scales.Similarly, diagnostics calibrated on galaxyintegrated observations appeared less reliable for observations on Hii region scales.With higher resolution observations using integral-field unit (IFU) spectrographs such as the Multi Unit Spectrographic Explorer (MUSE; Bacon et al. 2010) and, more recently, the NIRSpec integral field spectrograph on the James Webb Space Telescope (JWST; Gardner et al. 2006), the variation in metallicity within galaxies can be studied on much smaller scales than was previously possible.The combination of high spatial resolution and large field of view has made MUSE an especially powerful instrument for studying the ISM conditions within star forming regions, and variations across a galaxy (e.g.Emsellem et al. 2022).However, a drawback of MUSE is its relatively short wavelength range (4650-9300 Å; Bacon et al. 2010), which means that certain key emission lines, such as [O ii]λ3727,29, are not visible for low-z galaxies, limiting the metallicity diagnostics which can be applied.
In this paper we therefore investigate the reliability of a number of strong line diagnostics when applied to MUSE data, using an empirical approach.In Section 2 we detail the observations used, and present the steps taken in our analysis in Section 3. The metallicity diagnostics used are described in Section 4, and our results in Section 5, with a discussion in Section 6.Finally, our conclusions are presented in Section 7. The Appendices contain further information about the sample, as well as flux measurements for the relevant emission lines from 4408 Hii regions we identify within our sample of 36 galaxies using HIIdentify, our newly-developed python tool.

DATA
For our analysis we use MUSE data taken as part of the MUSE Atlas of Disks (MAD)1 survey, which covers 38 galaxies on the star forming main sequence, selected as a sample of nearby (z < 0.013), relatively face-on (inclination < 70 • ) galaxies, with a range of masses from 10 8.5 to 10 11.2 M⊙ (Erroz-Ferrer et al. 2019).
The typical spatial resolution is ∼100 pc (Erroz-Ferrer et al. 2019), allowing for the study of individual Hii regions.This sample therefore provides a large number of individual Hii regions which can be identified and used in our analysis.The range of masses allows us to probe a range of metallicities, as both global and local gas-phase metallicity have been shown to correlate with stellar mass (Tremonti et al. 2004;Sánchez et al. 2013).We give details on the MAD galaxy sample in table B1 of the appendix.

ANALYSIS
The IFU data returned by MUSE are 3D cubes, with 2 spatial dimensions, and one spectral dimension, meaning each pixel of the image has an associated spectrum.A number of data products have been made available from the MAD survey2 , including 2D maps of dust-corrected emission line fluxes for all strong lines within the observed wavelength range.However, for our analysis we require flux maps for additional weak, auroral emission lines, as well as associated line flux uncertainties, which are not readily available.We therefore produce our own emission line maps from the reduced MUSE data cubes, which we download from the ESO archive science portal 3 .
In order to measure accurate emission line fluxes, we first need to separate the stellar and gas emission components in order to correct for Balmer absorption from old stellar populations.A failure to correct for such stellar absorption features can result in the Balmer line fluxes being underestimated, increasingly so for bluer Balmer lines, thus affecting the measured Balmer decrement which we need to produce galaxy dust reddening maps.We use the starlight software package to separate the stellar and gas emission components (Cid Fernandes et al. 2005, 2009), following a similar procedure as described in Krühler et al. (2017).In summary, we use starlight to fit a linear superposition of template spectra to each 2 × 2 binned MUSE spectrum, using the stellar population models from Bruzual & Charlot (2003).At the typical redshift of our galaxy sample (z ∼ 0.005), the 2×2 spatial binning corresponds to a physical size of ∼40 pc, reaching up to 100 pc for the most distant galaxy in our sample (NGC3393).We then linearly scale the best-fit stellar template to the intensity of each of the four spaxels in our bin, and subtract this weighted stellar component from the original data to produce a gas-phase only cube.
We removed NGC3521 from the sample, as it has very weak emission lines, and low S/N of the [O iii] and auroral lines.NGC4593 was also removed, due to the large contribution of and active galactic nucleus in the centre, leaving us with a final sample of 36 galaxies.

Hii region identification with HIIdentify
To identify Hii regions within each galaxy, we use maps of the Hα flux, masking out all spaxels with equivalent width of HαEW< 6 Å, which are associated with Diffuse Ionised Gas (DIGs) rather than star forming (SF) regions.DIGs have different physical properties to Hii regions, with lower gas densities, and lower ionisation parameters.The ionising source for DIGs has not been conclusively determined, meaning the metallicity diagnostics which have been calibrated against observations and models of the gas within Hii regions are not expected to remain valid when used on DIGs (e.g.Sanders et al. 2017;Zhang et al. 2017;Lacerda et al. 2018).Our choice of a threshold of 6 Å is a compromise between the recommendations of Sánchez et al. (2015), Belfiore et al. (2016) and Lacerda et al. (2018) (see also Belfiore et al. 2017).However our results are not very sensitive to the choice of the HαEW threshold that we use, and increasing this cut to 14 Å, as suggested by Lacerda et al. (2018) to identify purely star-forming regions, was found to have little effect on the Hii regions identified using the process described below.
We developed a python tool, which we have named HIIdentify, to automatically identify Hii regions within a galaxy based on the Hα emission, following a similar methodology to codes such as HIIphot (Thilker et al. 2000) and pyHIIexplorer (Espinosa-Ponce et al. 2020).To identify Hii regions, HIIdentify iterates through the pixels from the brightest to the dimmest, terminating at a user-defined lower limit on the flux.Each of these pixels are considered as a possible peak of a Hii region, using several criteria to exclude noise within the image, as shown in Fig. 1.For example the peak is rejected if > 50% of the surrounding pixels have nonfinite flux values, or if the median flux of the surrounding pixels is < 30% of that of the peak.Other criteria must also be met for the pixel to be confirmed as the peak of a Hii region -all of the immediately surrounding pixels must have lower fluxes than the currently considered peak, and a minimum required separation between regions can be specified.For this analysis, we used a minimum separation of 50 pc.
Once a pixel is successfully identified as the peak of a Hii region, surrounding pixels are considered in circular annuli, and are added to the region if the flux is greater than the userdefined background flux, and the pixel has not already been added to another region.If a pixel is selected to belong to multiple regions, it is assigned to the region with the closest peak.The radius of the regions is not constrained, and instead the growth of the region stops when > 80% of the pixels in the annulus have been rejected.
Finally, a minimum required number of pixels can be specified, which sets a lower limit on the number of pixels any identified Hii region must have.Once all possible Hii regions have been identified according to the criteria described above, our code creates a number of maps, including a segmentation map, which depicts which region each pixel belongs to, if any.HIIdentify is publicly available for download, with information on how to install and use it provided in the online documentation4 .An example MUSE Hα map from the MAD sample can be seen in Fig 2, with the outlines of HIIdentify identified regions overlaid.The associated segmentation map is also shown, indicating the spaxels belonging to each of the identified Hii regions.As there are no constraints on the shape or size of the identified regions, it can be seen that HIIdentify identifies all regions with peak flux above a userdefined level, and encapsulates all of the surrounding region out to a given background level, rather than making assumptions about the geometry of the regions.
The results from applying HIIdentify to our sample are shown in Table 1, and the returned segmentation maps have been made publicly available5 .Input parameters were set so as to ensure that the entire Hii region was encapsulated, with any low S/N spaxels removed at a later stage.The back-  ground flux was determined using spaxels with S/N > 3, and HαEW< 14 Å, selecting the 75th percentile of the flux to represent the background level at which to set the edge of the Hii regions.Using the above input parameters, a total of 4408 Hii regions were identified in our sample of 36 galaxies, and they generally had a radius of a few hundred parsecs, which is consistent with observed sizes of Hii regions.Following the release of the MUSE-PHANGS (Physics at High Angular resolution in Nearby GalaxieS; Emsellem et al. 2022) catalogue in Groves et al. (2023), which included segmentation maps from the HIIphot code, in Appendix D we compare the results from the two codes, finding very good agreement in the resulting metallicity measurements.

Spectral line fits
To measure the line fluxes within each spaxel, we fitted the emission lines of interest with Gaussian functions using the specutils package in python.
The lines were grouped into chunks with nearby lines (e.g. the Hα and [N ii]λλ6549,84 lines) when fitting, so that the positions could be tied to that of the brightest line to help constrain the fits, and in the case of doublets, the widths could be tied and the ratio of the amplitudes fixed to known theoretical ratios when fitting.The continuum extending ∼ 50Å either side of the lines was included, allowing the continuum level to be fitted when determining line fluxes.The [O i]λλ6302,65 sky lines were masked out before fitting the [S iii]λ6312 lines.The cube slice was dust-corrected using dust reddening maps that were produced from the measured Hα-to-Hβ Balmer decrement, and assuming a Cardelli et al. (1989) attenuation law.We assumed an intrinsic Balmer decrement of 2.87 (Osterbrock & Ferland 2006), suitable for star forming galaxies with electron density ∼100cm −3 and temperature ∼10,000K.
For the [S iii]λ9531 line, required for our Te-based measurements, we use the known theoretical flux ratio between [S iii]λ9070 and [S iii]λ9531 of 2.47 (Luridiana et al. 2015), along with our measured flux for the [S iii]λ9070 line.

METALLICITY AND IONISATION PARAMETER DIAGNOSTICS
Te-based methods rely on the detection of auroral lines, which are very faint compared to the strength of nebular lines -for example, the [O iii]λ4363 line is around ∼ 100 times fainter than the [O iii]λ5007 line (Maiolino & Mannucci 2019).Despite this difficulty, in regions where the auroral lines can be detected, Te-based diagnostics give a more reliable measure of metallicity.We therefore focus our analysis on the subset of Hii regions where we can derive a Te-based metallicity using the [S iii]λ6312 auroral line (see Section 4.1), and use this as our reference metallicity to which we compare the results from a number of strong line diagnostics.Of the full sample of 4408 Hii regions, 671 had S/N > 5 detections of [S iii]λ6312, as well as S/N > 3 in all nebular lines used in the metallicity diagnostics considered in this paper.

T e-based metallicity diagnostics
As the spectral range of MUSE has a lower limit of 4650 .We are therefore unable to use a Te-based diagnostic based on the oxygen lines.Sulphur has been proposed as a useful alternative tracer (e.g.Berg et al. 2015Berg et al. , 2020;;Díaz & Zamora 2022), as both sulphur and oxygen are produced in massive stars, and the yield of the two elements are expected to be linked, though with a slight time difference in their ejection from different types of supernovae (Kobayashi et al. 2020).The required [S iii]λ6312 auroral and [S ii]λ6717,31 and [S iii]λλ9070,9531 nebular lines also experience less dust reddening due to the longer wavelengths of these lines.
In this paper we adopted the recent sulphur-based method presented in Díaz & Zamora (2022).The RS3 = I(9070 + 9531)/I(6312) line ratio is used to determine Te ,[S iii] , and due to the ionisation structure of the Hii region, where the overlap between S ++ and S + appears to cover most of the region (Garnett 1992 To account for the contribution of S 3+ when the required [S iv]λ10540 is not visible, Díaz & Zamora (2022) provide a method relying on the argon lines.Neither the [S iv] nor argon lines are present within the wavelength range of our data, but the abundance of S 3+ is not expected to be significant in Hii regions within star forming galaxies such as those present within our sample (Díaz & Zamora 2022).
Using the measure of Te ,[S iii] , and the [S ii]λ6317,31 and [S iii]λλ9070,9531 nebular lines, respective values of 12 + log(S + /H) and 12 + log(S ++ /H) are determined, and combined to give 12 + log(S/H).We then convert from the 12 + log(S/H) returned from this diagnostic, to 12 + log(O/H), using a fixed log(S/O) of -1.57(Asplund et al. 2009).There have been suggestions of the S/O ratio being dependent on metallicity (e.g.Dors et al. 2016;Díaz & Zamora 2022), though other works have found no clear dependence (e.g.Berg et al. 2020).We therefore decided to use a fixed value of -1.57 in our analysis.A discussion on the implications of this choice is provided in Section 6.1.

Strong line metallicity diagnostics
The relatively high lower-wavelength cut-off in the MUSE wavelength range restricts the number of diagnostics that we can use, so it was not possible to extend this analysis to commonly used diagnostics such as R23.However, this limitation makes it all the more important that an analysis is carried out to understand the robustness of metallicity diagnostics within the wavelength range and spatial scales covered by MUSE.
In this paper we test the most commonly used strong line metallicity diagnostics which include emission lines available with MUSE for nearby galaxies.Since we are interested in leveraging the high spatial resolution of MUSE, we focus on those diagnostics which have been calibrated on samples of observed or modelled Hii regions, rather than on galaxyintegrated spectra.These are the N2S2Hα diagnostic from Dopita et al. (2016), the N2 and O3N2 diagnostics based on the calibrations from Pettini & Pagel (2004) and Marino et al. (2013), and finally the recent S-calibration diagnostic from Pilyugin & Grebel (2016).
The Dopita et al. (2016) N2S2Hα diagnostic, given in Eqn. 1, was derived using the MAPPINGS 5.0 photoionisation model code, using line ratios which are accessible from the ground out to high-redshift within a single configuration, and which have a low dependence on dust attenuation due to the proximity of the lines.
12+log(O/H) < 8.8.The S-calibration is derived for use when the [O ii]λλ3727,29 lines are unavailable, as is the case for our MUSE data, and instead uses the [S ii]λλ6717,31 lines.They find very good agreement (within ∼0.05 dex) between their S-calibration and the R-calibration, which uses the [O ii] lines for single-slit spectra from a test sample of over 3000 Hii regions.Of note is that Pilyugin et al. (2022) find that when comparing spectra of Hii regions from IFU to single-slit observations, the S-calibration underestimates metallicities by ∼0.06 dex on average at 12 + log(O/H) ≳ 8.55, and by ∼ 0.02 dex on average at lower metallicities.We discuss the potential impact of this on our results in Section 6.Both the S-and R-calibration are double-branched, and Pilyugin & Grebel (2016) thus recommend using the line ratio log(N2) (where N2 = [N ii]λλ6548,84 / Hβ) to separate the upper and lower branch.For the upper branch (log(N2) ≥ -0.6), the relationship in Eq. 6 is used; for the lower branch, the relationship in Eq. 7.Here S2 = [S ii]λλ 6717,31 / Hβ, and

Ionisation parameter diagnostic
The O3N2 and N2 diagnostics are strongly dependent on the ionisation parameter (log(U )), a measure of the central ionising source's ability to ionise the surrounding gas.Some works have suggested that these diagnostics may become unreliable under certain log(U ) conditions if, for example, these conditions are not represented in the calibration samples (e.g.Krühler et al. 2017;Mao et al. 2018).Therefore, we explore whether there is any evidence of this in our data.
To measure log(U ), we used the Mingozzi et al. ( 2020) sulphur-based diagnostic, calibrated on spatially resolved IFU data.Sulphur-based ionisation parameter diagnostics have been found to have a lower dependence on the metallicity compared to oxygen-based diagnostics (Kewley & Dopita 2002;Dors et al. 2011), making the sulphur-based diagnostic in principle more reliable, but also suggesting that any observed trends between metallicity and log(U ) should not be a consequence of the log(U ) diagnostic itself.
This diagnostic relies on the ratio of the [S iii]λλ9070,9531 and [S ii]λλ6717,31 line doublets.The Mingozzi et al. (2020) re-calibration accounts for an overestimate of the strength of the [S iii] lines, suggested by Kewley & Dopita (2002) to be the cause of sulphur-based diagnostics under-predicting the ionisation parameter compared to oxygen-based diagnostics.

RESULTS
To investigate the diagnostics on Hii region scales, we identified Hii regions using HIIdentify, and stacked the spectra of all selected spaxels within each region.We then identified the sub-sample of regions with S/N of the measured [S iii]λ6312 auroral line > 5, to allow for a reliable measurement of the Tebased metallicity.We imposed an additional criteria to ensure that the nebular lines needed in the strong line metallicity diagnostics that we used had a S/N > 3 within each Hii stacked spectrum.This left us with a sample of 671 Hii regions within a total of 31 galaxies, with logarithmic stellar masses ranging between log(M⋆/M⊙) = 8.5 and log(M⋆/M⊙) = 11.2, and SFRs in the range 0.1-11.1 M⊙/yr.The galaxies NGC4603, NGC3393, NGC3081, NGC7162, and ESO498-G5 had no Hii regions which met the S/N criteria (see Table 1).

Sulphur-based T e metallicity diagnostics
As all of our empirically-derived strong line metallicity diagnostics were calibrated against Te-based diagnostics using the [O iii]λ4363 line, we first investigate the agreement between Te-based diagnostics that rely on the [O iii]λ4363 and the [S iii]λ6312 lines, before comparing our [S iii]λ6312 Te-based metallicities to the strong line diagnostics.
Since the [O iii]λ4363 line is not visible in our MUSE data, to make these comparisons we used the published data of Hii regions taken with the Multi-Object Double Spectrographs on the Large Binocular Telescope (MODS; Pogge et al. 2010) as part of the CHemical Abundances Of Spirals (CHAOS) project (Berg et al. 2015).These observations provide a broad wavelength coverage, extending from 3200Å to 10,000Å, enabling the detection of multiple auroral lines, including [O iii]λ4363, out to z ∼ 0.36.Thus far the multislit data for four star-forming galaxies observed as part of CHAOS have been published, amounting to published spectra for 190 Hii regions (Berg et al. 2015;Croxall et al. 2015Croxall et al. , 2016;;Berg et al. 2020).
For this comparison we apply the methodology described in Section 4.1 to the CHAOS data to measure the metallicities using the sulphur Te-based diagnostic from Díaz & Zamora (2022), as well as applying the Izotov et al. (2006) [O iii]λ4363-based method.In addition, we compare these sulphur and oxygen-based Te values to the metallicities presented in Berg et al. (2020), who use a multiple auroral line method to measure the temperature in a 3-zone temperature Hii model that incorporates low, intermediate, and high ionisation zones.They probe the temperatures (and thus metallicities) of each of these zones using a combination of the [N ii], [S iii], and [O iii] auroral lines.These metallicities are then combined to obtain the total metallicity of the region.
In order to compare the published Berg et al. (2020) Hii region metallicities to the respective Izotov et al. (2006) and Díaz & Zamora (2022) oxygen and sulphur Te-based metallicities, we selected all Hii regions from the CHAOS sample with [O iii]λ4363 and [O ii]λ7320,30 lines detected with S/N > 3 (as was done by Berg et al. 2020), and with [S iii]λ6312 lines detected with S/N > 5.This more stringent criteria on the S/N of the [S iii]λ6312 line was used to remain compatible with the selection criteria used in our MAD sample, though we find it makes no clear systematic difference to use a cut of [S iii]λ6312 S/N > 3 instead.NGC5194 had no regions with detected [O iii]λ4363, leaving a total of 43 regions within the remaining three CHAOS galaxies which meet our S/N criteria.
Fig. 3 (left panel) shows the comparison between the sulphur-based (Díaz & Zamora 2022) method and the oxygen (Izotov et al. 2006) based method.We find very good agreement between these diagnostics, suggesting that the sulphurbased diagnostic is appropriate as a tracer of the oxygen abundance for verifying the accuracy of strong line diagnostics in our MAD sample.The line of best-fit (black line in Fig. 3 However, when comparing the [S iii]λ6312 based metallicity to the multi-zone metallicities from Berg et al. (2020), the data no longer fall along the one-to-one line (dot-dashed line in right panel, Fig. 3).There is a systematic offset at low metallicities where the Berg et al. ( 2020) method returns higher metallicity values, with the sulphur-based measurements offset by -0.37 dex at 12+log(O/H)= 8.0.Nevertheless, although there is not a one-to-one agreement between the results of the two diagnostics, there is a clear relation, making it possible to convert between the results from the two diagnostics using the parameters of our best fit line, with slope = 1.56 ± 0.11, and y-intercept = -4.85± 0.96.The implications of applying such a re-scaling to our sulphur-based Te-based metallicities are discussed in Section 5.2.

Comparing strong line and T e-based diagnostics
Having found the Díaz & Zamora (2022) sulphur-based Te method to agree well with the oxygen-based Te method, we can now explore the reliability of applying strong-line diagnostics to our MUSE observations, by comparing the results from each strong-line diagnostic to that from the Díaz & Zamora (2022) Te-based method.These comparisons are shown in Fig. 4, with the points coloured by log(U ).The dot-dashed line shows the 1:1 relationship between the diagnostics, and the dotted line represents the best fit, obtained using the seaborn regplot method, with consistent fits obtained using a bootstrap method.
According to the best-fit relationships, a number of the diagnostics show significant disagreement with the Te-based measurements, particularly at high metallicity.The PP04 O3N2, PP04 N2, and Pilyugin & Grebel (2016) S-calibration all show some agreement with the Te-based measurements at low metallicities, but show increasing discrepancies at higher metallicities.For these diagnostics, the best-fit line to the data returns a slope of ∼0.55, leading to the strong line diagnostics returning results ∼ 0.3 dex lower than the Te-based measurements at 12 + log(O/H) = 8.8.One suggestion for the offsets observed with the PP04 diagnostics could be due to the relatively small calibration sample used.The auroral line becomes increasingly faint at higher metallicities, which results in this region being less well sampled.This is evident in the PP04 sample, where the low O3N2, high oxygen abundance region of the parameter space contains just six data points, four of which have metallicities based on strong line diagnostics.The authors themselves note the need to increase the sample size of their calibration sample at the high metallicity end.(2022) Te-based metallicities, but with a slope consistent with unity (see Table 2).
As expected, the highly log(U )-sensitive O3N2 and N2 diagnostics show a clear variation of log(U ) along the y-axis, with the lowest metallicity regions as measured by the strong line diagnostics having higher log(U ).This leads to the diagnostics showing an increased discrepancy at higher log(U ).Unfortunately, the line fluxes necessary to determine log(U ) for the PP04 sample were not published in the original paper, and we therefore cannot verify this potential log(U )-bias in the PP04 calibration sample.
Interestingly, considering the variation of log(U ) along the x-axis, we see no clear trend between Te-based metallicity and log(U ), suggesting the O3N2 and N2 diagnostics may become unreliable under certain log(U ) conditions.The Pilyugin & Grebel ( 2016 2).The N2S2Hα diagnostic, however, shows a much lower level of agreement at low metallicity when converting to the predicted Berg et al. (2020) values, with a best-fit slope of ∼1.5, and larger offsets of ∼0.4 dex at 12 + log(O/H)= 8.2.The best-fit parameters are summarised in Table 2.
Although we cannot conclusively state which of the Temetallicity diagnostics is the more accurate, it is noteworthy that the PP04 and M13 O3N2 and N2 diagnostics, and the Pilyugin & Grebel (2016) diagnostic were all calibrated against two-zone temperature models.We would therefore expect these diagnostics to be in better agreement with the Díaz & Zamora (2022) sulphur-based Te-metallicity, which is in far better agreeement with the Izotov et al. (2006) oxygen two-zone temperature model (see Fig. 3) than the Berg et al. (2020) three-zone temperature model.Furthermore, the offsets at high metallicity apparent in Fig. 4 between Temetallicities and the N2, O3N2 and S-calibration can be explained to some degree by selection effects in the calibration samples.We discuss this further in section 6.2.1.This re-calibration used [O iii]λ4363 line fluxes obtained from whole-galaxy observations, with spectra stacked in bins of log([O ii]/Hβ) and log([O iii]/Hβ) to obtain sufficient S/N of the auroral line, in particular at higher metallicities.The sample of galaxies was selected from SDSS data release 7 (DR7 Abazajian et al. 2009) and had a median redshift of 0.072, leading to each fibre corresponding to observations on the scale of ∼3 kpc.Using the oxygen Te-based metallicities, Curti et al. (2017) then modelled the relation between the strong line ratios and the Te-based metallicities with a high  2017) O3N2 values are generally shifted to higher metallicities.However, the line of best fit still features a sub-unity slope of ∼ 0.50, similar to the PP04 O3N2, N2, and the S-calibration results.This suggests that the increased discrepancy between the strong line and Te-based diagnostics at high metallicity cannot simply be explained by biases in the calibration sample.
The Curti et al. (2017) calibration sample, by using observations of galaxy-integrated spectra, averages over a large number of Hii regions in each observation, thus averaging over a range of different ionisation conditions.Despite this, there remains a clear log(U ) dependence in the Curti et al. (2017) metallicities, again suggesting that the log(U ) must be taken into account in order to reduce the scatter in O3N2 metallicities.
The relatively flat relation between Te-based metallicities and O3N2 and N2 diagnostics implies that the strong line ratios used in these diagnostics are not sufficiently sensitive to metallicity, leading to a narrower range in metallicity estimates than Te-based methods.This can be understood from the unaccounted log(U ) dependency in the O3N2 and N2 diagnostics (Kewley & Dopita 2002), which are effectively flattened out in the calibrations.To illustrate this more clearly, in Fig. 7 we show the O3N2 line ratio against Te-based metallicity for our sample of Hii regions, colour-coded by log(U ), and we overplot the best-fit relation between these two parameters from PP04, M13 and Curti et al. (2017).From this figure it can be clearly seen how Hii regions with comparable O3N2 line ratios can have very different metallicities, depending on their ionisation parameter, as expected from Kewley & Dopita (2002).

Metallicity diagnostics on sub-Hii region scales
The Hii regions identified with our HIIdentify code range in size from 22 pixels up to 1740 pixels, corresponding to 0.08 -0.98 kpc.Therefore, in addition to investigating the properties of individual Hii regions, the MUSE data used in this paper also allow us to explore the range of metallicity and log(U ) values within Hii regions.This may provide further insight into how representative Hii-integrated properties are of the range in values within the Hii region.
Using the same sample of Hii regions with measured Tebased metallicities as in our previous analysis, we selected the spaxels belonging to these Hii regions, as determined using HIIdentify.We repeated our previous analysis using these spaxels, and in Fig. 8 we compare the results from the strong line diagnostics to the Díaz & Zamora (2022) Te-based diagnostic.For the strong line metallicity diagnostics, we applied a S/N > 3 cut on all relevant emission lines within each spaxel.For the Te-based, we used a S/N > 3 cut on the nebular lines, and a S/N > 5 cut on the auroral line.We then further required the resulting metallicity to also have S/N > 3. We note that requiring a measure of the Te-based metallicity means that only 3.5% of the spaxels within these regions are selected, and the impacts of this are discussed further in Section 6.2.1.
Fig. 8 shows that strong-line metallicity estimates are also offset towards lower values compared to Te-based estimates on sub-Hii region scales, covering a similar region of the parameter space as the stacked Hii region values.
Overall, these offsets are slightly larger on sub-Hii region Despite the similarity in the parameter space occupied by single spaxels and stacked Hii regions, of note in Fig. 8 is the almost flat distribution of data points for a given galaxy, especially for the O3N2, N2 and S-calibration diagnostics.A possible explanation for this very flat distribution may therefore be that it reflects large variations within galaxies in log(U ) and metallicity, but with only a correspondingly small variation in the O3N2 and N2 line ratios.For example, for spaxels in IC5273 with measured Te-based metallicities, log(U ) covers a range of 0.85 dex and the Te-based metallicities vary by ∼ 0.7 dex, whereas the line ratios of log(([O iii]/Hβ)/([N ii]/Hα)) cover only 0.4 dex corresponding to just 0.1 dex range in estimated metallicities from both the PP04 and M13 O3N2 diagnostics.When stacking the spaxels within Hii regions, this range in properties is reduced, although there is still evidence of the dependency between O3N2 and N2 line ratios, log(U ) and Te-based metallicity (Fig. 7).This therefore implies that the location of an Hii region in Fig. 4 and Fig. 5 will be sensitive to its average log(U ) value, and thus the distribution in log(U ) within the Hii region.
To investigate this further, for each strong line diagnostic considered in Fig. 4, we plot in Fig. 9 the difference between the stacked and spaxel metallicities (∆Z), against the region's stacked metallicity, colouring each data point by the spaxel log(U ).
Within each region there is a large spread of spaxel metallicities of up to 0.4 dex for the N2S2Hα diagnostic, reduced to mostly within 0.25 dex for the M13 O3N2 and N2 diagnostics.This range of values is in contrast to works such as Peimbert et al. ( 2017), which suggest Hii regions to be chemically homogeneous.There is additionally a strong anti-correlation between ∆Z and log(U ) at fixed stacked metallicity for the O3N2 and N2 diagnostics.
The O3N2 and N2 diagnostics rely on an anti-correlation between the metallicity and log(U ), which is reflected in the log(U ) trends seen for these diagnostics in Fig. 9 (left and centre panels, respectively).High values of log(U ) are seen at the low metallicity end on the x-axis in Fig. 9, and along the y-axis spaxels with lower metallicities have higher log(U ) values than spaxels with higher metallicities, leading to a diagonal colour gradient.
Conversely, the N2S2Hα diagnostic should be independent of log(U ) by using a ratio of [N ii]/[S ii], and any observed log(U )-dependency with metallicity should thus reflect an in-  2020) diagnostic.Vertical red dotted lines denote the validity limits of the diagnostics, and diagonal red lines show the corresponding limits on the possible differences between the spaxel and region metallicity measurements.
trinsic relation.The bottom right panel in Fig. 9 shows that there is no clear trend with log(U ) at low metallicities, but at higher metallicities (12+log(O/H)>8.4)there is a clear positive trend, with spaxels with higher metallicities than that of the overall region also having higher log(U ).This suggestion of a positive relationship between metallicity and log(U ) has been previously observed when using the N2S2Hα (Krühler et al. 2017;Easeman et al. 2022), S-calibration (Kreckel et al. 2019;Groves et al. 2023), and the N2O2 ([N ii]λ6584 / [O ii]λλ3727,29) (Grasha et al. 2022) diagnostics.
The Pilyugin & Grebel (2016) S-calibration shows a much smaller range of metallicities within a single Hii region compared to the N2S2Hα values, mostly within ∼0.2 dex of the measured stack value, with the exception of a few high log(U ) points which lie further away.Although the relationship is less clear, the points seem to show a similar relationship with log(U ) as the N2S2Hα diagnostic, with no clear relationship at low metallicity, but a positive relationship for metallicities > 8.4.
Aside from the distribution of log(U ) and metallicity within Hii regions in absolute terms, it is also interesting to consider where the spread of data points lie relative to the stacked Hii metallicity (dashed horizontal line).For the O3N2 and N2 diagnostics, the data points lie predominantly above the dashed line (∼70% of spaxels lie above).The N2S2Hα and S-calibration instead show the stacked spectra measurements lying closer to the centre of the distribution slightly offset towards higher metallicities than returned from the individual spaxels (∼40% of spaxels lie above, ∼60% below).Considering the spaxels with metallicity values within 0.01 dex of the stack value, we found the average log(U ) to be fairly consistent across the diagnostics, ranging from -2.97 to -3.00.This implies that for each strong line diagnostic, the Hii region metallicity is weighted towards spaxels with comparable environmental properties.
Given this apparent difference between the diagnostics in how the Hii stacked metallicity compares to the metallicity distribution within the Hii regions, we also investigated the spatial location of the spaxels that carried a higher weight in the stacked Hii metallicities.We did this by colour-coding the points in Fig. 9 by the distance of each spaxel from the centre of the region (Fig. 10).In doing so, we noticed that the O3N2 and N2 profiles clearly showed profiles inverted to what would be expected, with lower metallicities measured in the centres compared to the outskirts.This was also observed by Krühler et al. (2017) and Mao et al. (2018), and similarly to these papers, we find the N2S2Hα diagnostic to behave as expected, with metallicities increasing towards the centres of Hii regions.The S-calibration and N2S2Hα diagnostic show negative radial gradients most clearly for regions with metallicity above 8.4.The inverted profiles seen when using the O3N2 and N2 diagnostics is therefore likely a consequence of the positive relationship observed between metallicity and log(U ) in more metal rich Hii regions with the N2S2Hα and S-calibration diagnostics, contrary to the anti-correlation required by the O3N2 and N2 diagnostics.
For spaxels with metallicities within 0.01 dex of the stack value, the average distance from the centre of the region is also fairly consistent between the diagnostics, ranging from 0.11 to 0.13 kpc.
As the metallicity of stacked spectra appear to trace the same part of the region and gas with similar levels of excitation for all strong line diagnostics, there is no obvious property on the sub-Hii region scales that can account for the differences observed in Fig. 4 between the diagnostics considered.Instead, the distribution in Hii properties echoes the dependencies observed in the stacked spectra between O3N2 and N2 metallicity and log(U ).

DISCUSSION
Strong line metallicity diagnostics are only indirect tracers of metallicity, and they are generally dependent on multiple environmental properties, most notably log(U ) (Kewley & Dopita 2002).Diagnostics with either just a weak dependence on log(U ) (e.g.N2O2 at high metallicity; Kewley & Dopita 2002), or ones which include multiple line ratios that can be used to constrain the various environmental properties (e.g.R23; Kobulnicky & Kewley 2004) can therefore be considered more reliable.However, such diagnostics commonly include emission lines at very different wavelength ranges, which are not always accessible with a single spectrograph.In the case of MUSE, the relatively high spectral cut off at the blue end of ∼ 4800Å makes the important [O ii] line doublet inaccessible for nearby galaxies (z < 0.3), for which the spatial resolution is highest.Similarly, the [O iii]λ4363 line cannot be detected with MUSE in galaxies at z < 0.1, meaning that oxygen Te-based diagnostics cannot be applied.In this paper we have therefore used the Díaz & Zamora (2022) sulphur Te-based metallicity diagnostic to investigate the accuracy of several strong line metallicity diagnostics which are applicable to MUSE observations of nearby galaxies (z < 0.03).This includes the Dopita et al. (2016) N2S2Hα diagnostic, which was calibrated against the Mappings photoionisation code (Dopita et al. 2013), and which has not yet been compared empirically to Te-based metallicities on a large sample of Hii regions.
In general we find good agreement between the Díaz & Zamora (2022) sulphur Te-based metallicities and oxygen Tebased metallicities Izotov et al. (2006) (see Fig. 3), but there are significant offsets between the Te-based metallicities and the strong line diagnostics considered in this paper (Fig. 4).In this section we consider possible causes for the offsets that we observe, and discuss the implications of our findings.

The sulphur-to-oxygen ratio
The S/O ratio has been suggested to increase with decreasing metallicity (Dors et al. 2016;Díaz & Zamora 2022), and thus by assuming a constant value, as we do in section 4.1 to convert between 12 + log(S/H) and 12 + log(O/H) we could be overestimating 12 + log(O/H) at low metallicities (relative to high metallicities).For example, using a combination of star forming galaxies and Hii regions Dors et al. (2016) found the log(S/O) ratio to vary from -1.68 at 12 + log(O/H) = 8.0, to -1.90 at 12 + log(O/H) = 8.8.Applying such a metallicitydependent log(S/O) ratio would increase our Te-based metallicities by ∼ 0.3 dex at the high metallicity end, further flattening the relation between Te-based and strong line metallicities, and significantly increasing the offsets we observe between the strong line and Te-based metallicities in Figs. 4  and 5.The log(S/O) values measured for the 'DHR' sample presented in Díaz & Zamora (2022), which best represents our sample of Hii regions, similarly imply an inverse relation with metallicity, varying from around -1.3 at 12 + log(O/H) = 8.0 to around -1.6 at 12 + log(O/H) = 8.7.
In contrast to these results, Berg et al. (2020) found no evidence of a metallicity dependence in log(S/O), instead finding a fairly constant ratio with an average value of log(S/O) = -1.34 for the 190 Hii regions in their sample of four CHAOS galaxies.
With our choice of log(S/O) = -1.57,we find good agreement between the sulphur-and oxygen-based Te diagnostics in Section 3. If we were instead to apply a varying log(S/O) such as found by Díaz & Zamora (2022), discrepancies of ∼0.2 dex would be introduced at low metallicities between the Díaz & Zamora (2022) and Izotov et al. (2006) Te-based diagnostics, and the differences between Díaz & Zamora (2022) and Berg et al. (2020) Te-based metallicities would also increase.If alternatively we used the best-fit constant value of log(S/O) = -1.34from Berg et al. (2020), we would reduce the discrepancies between the strong line and sulphur based Te metallicities shown in Fig. 4, bringing the N2S2Hα diagnostic in particular in very good agreement.However, unexplained discrepancies of 0.23 dex would then be introduced between the sulphur-and oxygen-based Te diagnostics (Fig. 3), suggesting that metallicities based on the Díaz & Zamora (2022) method would no longer provide values in alignment with those returned by oxygen-based methods.We therefore find a fixed value of log(S/O) = -1.57to be the most appropriate choice.

Possible origin of metallicity offsets
Given that the majority of the strong line metallicity diagnostics in Section 3 are calibrated against Te-based metallicities, it seems surprising that they show such poor agreement with our sulphur Te-based metallicities.It is therefore reasonable to consider what differences there may be in the range of environmental properties present in the respective calibration samples and in our sample of Hii regions, and how different Te-based methods compare.

Selection effects in Te-based samples
The [O iii]λ4363 line is stronger in high excitation, low metallicity galaxies (e.g.Hoyos & Díaz 2006), which could bias calibration samples that are based on auroral line detections against low excitation, high metallicity regions.Evidence of such biases can be seen in the M13 Hii sample, where for metallicities above 8.5, there are significantly fewer measurements, and for PP04, only 3 of their 131 observed Hii regions with [O iii]λ4363 line detections have higher than solar metallicities.
To test whether similar selection effects are present in our [S iii]λ6312 line sample, we compared the distribution of log(U ) and strong line metallicities in the full Hii region sample selected in Section 3.1 by our HIIdentify code to the sub-sample of regions with Te-based measurements.Out of 4408 regions within our sample, 186 (4.2%) have measured Te-based metallicities.However we note that our HIIdentify code is capable of separating overlapping Hii regions (see Fig. 2), which would result in a larger number of detected Hii regions, extending down to lower Hα fluxes, compared to other Hii identification codes.For example, from visual inspection of the segmentation maps presented in Grasha et al. (2022) (their fig.3), where the HIIphot code was used to identify Hii regions, the regions identified appear to be fairly isolated, suggesting that overlapping regions are generally merged together.Evidence of this is present in the galaxy NGC2835, which is present in both our sample and in Grasha et al. (2022).The MUSE pointing covers just the central 1 ′ × 1 ′ part of the galaxy, whereas the data in Grasha et al. (2022) cover an area ∼ 8 times larger (their fig.1).Nevertheless, the number of identified Hii regions in Grasha et al. (2022) is only ∼ 30% greater.The larger size of our parent sample therefore has the consequence of reducing the fraction of Hii regions with [S iii]λ6312 detections than if we had used alternative Hii detection algorithms.
Similarly to the biases present in [O iii]λ4363-based samples, our subset of [S iii]λ6312-bright Hii regions are indeed shifted towards higher values of log(U ), covering a range of -3.3 --2.2, compared to -4.1 --2.2 for the full sample.The metallicities as measured by the strong line diagnostics are also shifted towards lower values, with the maximum metallicities in the sub-sample of Te-selected regions typically ∼0.2 dex below that for the full sample, despite the parent population peaking toward high metallicities.
Similar trends are seen in our spaxel-scale analysis, where in our sample of Hii regions with measured Te-based metallicities in the stacked spectra, only 3.5% of spaxels within those regions also have measurements of the Te-based metallicity.Again the range of log(U ) is reduced, with all spaxels covering a range of -4.0 to -1.8, and those with Te-based metallicity measurements covering -3.3 to -1.8.The range of metallicity measurements is similarly limited at the high metallicity end, to roughly 0.1 dex below the maximum of that of the parent sample.For example, the PP04 O3N2 values for the parent sample range from 8.15 to 8.8, peaking around 8.6.The sub-sample of regions with Te-based measurements cover only from 8.15 to 8.65.
In Figs. 4 -6 we saw evidence for strong log(U ) gradients in the O3N2 and N2 comparison plots, whereby Hii regions with lower log(U ) values lay at systematically larger offsets from the line of equality, especially at the higher metallicity end.Given this trend, one could therefore speculate that the large sample of low log(U ) and high metallicity Hii regions omitted by our selection criteria lie closer to the line of equality at the high metallicity end than our Te-based sample, steepening the slope of the best-fit line.Nevertheless, given that the [O iii]λ4363 Te-based methods used to calibrate the diagnostics will suffer from similar selection effects, in particular when stacking is not used, as is the case with the PP04 and M13 diagnostics, this bias cannot clearly explain the disagreement that we find in Figs. 4 -6.Furthermore, the better agreement shown in Fig. 6 between our Te-based metallicities and the Curti et al. (2017) O3N2 calibration, which uses stacking to reduce selection effects, implies that our results are not significantly affected by selection effects present in our [S iii]λ6312-based sample relative to other auroral line-based samples.
To verify that unaccounted selection effects are not inadvertently causing the flat relation that we find between [S iii]λ6312 Te-based metallicities and the O3N2 and N2 metallicities, we investigated the relation between the sulphur-based and PP04 metallicities in the CHAOS sample, which used fixed slit spectroscopy rather than IFU data.We found the O3N2 and N2 relations to be flatter than our MAD galaxy results shown in Fig. 4, with respective best-fit slopes of 0.30 ± 0.05 and 0.31 ± 0.04 (compared to 0.55 ± 0.05 and 0.53 ± 0.04 when using the MAD galaxy sample).The CHAOS sample of Hii regions with [S iii]λ6312 detections has predominantly high log(U ) values (> −3), implying that CHAOS has a larger log(U ) bias than the MAD sample, and possibly also more biased than the PP04 calibration sample in terms of the range of ionisation parameters covered.

Differences in electron temperatures
M13 show their relation between Te-metallicity and the O3N2 and N2 line ratios next to other relations, including PP04, in their fig. 2 & 4, where it is clear that the M13 relation is much shallower.Notably, as illustrated in Fig. 7, at 12 + log(O/H)≳ 8.4 the Curti et al. (2017) O3N2 calibration has a similar slope, but is offset to larger metallicities.The sample of Hii regions used in M13 is larger than used in PP04, and M13 also use a range of auroral line detections to minimise the impact of temperature-based selection effects.The reason for the systematically lower M13 O3N2 and N2 metallicities at high metallicities, leading to their shallower relation, is not clear.However, it is worth noting that there is scatter in the M13 sample, and in the case of O3N2 (see their fig. 2), the best-fit relation at high metallicities is clearly driven by a tight distribution of data points from Pilyugin et al. (2012), whereas data from Pérez-Montero & Contini (2009), for example, extend to higher metallicities for the same given O3N2 line ratio.
M13 used auroral line detections from various elements to measure the electron temperature of the Hii regions in their sample, using the Pilyugin et al. (2010) method to re-measure Te-based metallicities.Which of the auroral lines were detected depended largely on the metallicity of the Hii region, with the [N ii] auroral line generally used for metallicities > 8.4, and metallicities at < 8.4 largely measured using the [O iii] auroral line.When determining metallicities based on a detection of the [N ii] auroral line, their method assumes that Te ,[N ii] = Te ,[O ii] , and while the two elements do have similar ionisation potentials and can be expected to trace similar parts of the Hii region, various studies have suggested the two temperatures can not be considered to be equal, although there is contention in what the relation should be.Yates et al. (2020) found that especially at higher temperatures, Te ,[N ii] > Te , [O ii] in their observations of Hii regions and galaxies, whereas Berg et al. (2020) found the opposite using the CHAOS sample of Hii regions, suggesting that the relationship between the two temperatures is not well defined, and it is therefore unclear whether it is valid to assume that Te ,[N ii] = Te ,[O ii] .The mixture of auroral lines used in M13 to trace the electron temperature could therefore contribute to the discrepancies we find between the M13 diagnostics and our sulphur Te-based metallicities, in particular in the case that Te ,[N ii] > Te ,[O ii] (Yates et al. 2020), which would imply that M13 have overestimated Te ,[O ii] , and thus underestimated 12 + log(O/H) at high metallicities.However, the inhomogeneous sample of Hii regions used in M13 may also contribute to offsets.

Fixed slit versus IFU measurements
The Pilyugin & Grebel (2016) S-calibration has been suggested to return underestimates of the metallicity when used on IFU data, with Pilyugin et al. (2022) finding the results to be underestimated by ∼0.06 dex at 12+log(O/H) ≳ 8.55, and by ∼0.02 below this, on average.They reason that this is due to differences in the intensities of each line in singleslit and IFU observations, due to the limitations of single-slit observations meaning that the entire Hii region is not always captured in a single observation.They suggest this introduces systematic effects when their diagnostic, which was calibrated using single-slit observations, is used on IFU data.This could be expected to therefore also affect the PP04 diagnostics, which were calibrated against single slit data, and the M13 diagnostics, which were calibrated using a combina-tion of IFU data and single slit observations.However, this can only account for some of the offsets observed between the diagnostics in our analysis, as we observed systematic offsets of up to 0.3 dex at high metallicities.Furthermore, our analysis on the spaxel-level data indicate that the Hii stacked spectra are predominantly weighted by the central regions of the Hii region, implying that emission missed in single slit spectra from the outer regions of the Hii region are unlikely to contribute greatly to the measured line fluxes.

Large and small scale metallicity gradients
The strong line O3N2, N2 and S-calibration diagnostics underestimate the metallicity at high metallicities by ∼0.3 dex, which could have implications on measurements of the metallicity gradient, for example imposing a flattened inner profile for a galaxy with a negative metallicity gradient.The N2S2Hα diagnostic shows no evidence of a metallicitydependent offset from Te-based measurements, and no clear log(U ) dependence in the discrepancies between the strong line and Te-based measurements.This could explain why, for a sample of galaxies observed in the MaNGA survey (Mapping Nearby Galaxies at Apache Point Observatory; Bundy et al. 2015), Yates et al. (2021) found evidence of flattened inner profiles in more massive galaxies when O3N2-based diagnostics were used, which were not seen in results from the N2S2Hα diagnostic, as well as finding that the N2 diagnostic also returned much flatter profiles than the N2S2Hα diagnostic.
On sub-Hii region scales, we find in Figs. 9 and 10 that the O3N2 and N2 diagnostics show clear inverted metallicity profiles, with metallicity appearing to decrease towards the centre of the Hii regions.The N2S2Hα diagnostic, however, shows decreasing metallicities from the centres of the regions out to the outskirts, which is in line with expectations.Exploring the relation between metallicity and log(U ), we find that at high metallicities, the N2S2Hα diagnostic shows a clear positive relationship between log(U ) and metallicity, which would explain the inverted profiles shown by the O3N2 and N2 diagnostics, as they rely on an anti-correlation between metallicity and log(U ) observed in Hii region and galaxy samples.

Optimal metallicity diagnostics with MUSE
On both Hii region and sub-Hii region scales, the N2S2Hα diagnostic results are offset from the Te-based values, however this discrepancy has no apparent dependence on either metallicity or ionisation parameter, unlike the other diagnostics tested.We therefore find that this diagnostic produces the most reliable results when used on spatially resolved MUSE observations of nearby galaxies.
We note, however, that the Hii regions used in our analysis typically account for just ∼10 % of the overall Hα luminosity of the galaxy within the MUSE field of view, therefore these results may not be descriptive of trends seen on whole-galaxy scales.
Further study of the diagnostics will be possible with the proposed BlueMUSE instrument (Richard et al. 2019).Blue-MUSE will have a spectral range covering much shorter wavelengths (3500 -6000 Å), and will provide high spatial resolution IFU data with coverage of the [O ii]λλ3727,29 and [O iii]λ4363 lines, allowing for Te-based diagnostics reliant on the oxygen lines to be used, removing the need for any conversion between sulphur and oxygen abundances, and allowing a larger range of strong line diagnostics to be investigated.

CONCLUSIONS
We have used a sample of 671 Hii regions from 36 galaxies observed as part of the MUSE Atlas of Disks survey, to assess the optimal strong line diagnostic for use with MUSE data.We additionally release a catalogue of 4408 Hii regions identified within this sample using our newly-developed python tool HIIdentify6 , with segmentation maps and tables of emission line strengths made available (see Appendix C).By comparing the results from strong line diagnostics to Te-based measurements, we find:  Marino et al. (2013) show the largest differences with the Te-based measurements at high metallicities, with these strong line diagnostics returning values over a significantly reduced range compared to the Te-based values, due to the much steeper relationship between the line ratios and metallicity (see Fig. 7).• This comparison on Hii region scales suggests that when measuring the metallicity of Hii regions for galaxies observed with MUSE, the N2S2Hα diagnostic provides the most accurate measurement of the metallicity.• We used the CHAOS sample (Berg et al. 2020) 2020) method could be used on our data, the resulting comparisons between the strong line and Te-based diagnostics are significantly changed, with reduced metallicity-dependence in the offsets for the O3N2, N2 and S-calibration diagnostics, but increased dependence for the N2S2Hα.• On sub-Hii region scales, we find clear evidence of inverted metallicity profiles when using the O3N2 and N2 diagnostics, which are not seen when the S-calibration or N2S2Hα diagnostic is used.For the N2S2Hα diagnostic, we find a clear positive relationship between log(U ) and metallicity at high metallicities, in contrast to the negative relationship required by the O3N2 and N2 diagnostics.This could therefore explain the inverted profiles observed within Hii regions, as has been suggested by Krühler et al. (2017) and Mao et al. (2018).• Stacked spectra from Hii regions appear to generally be weighted towards the inner part of Hii regions, where the log(U ) is highest.The differing relationships between log(U ) and metallicity between the diagnostics may therefore suggest a reason for the disagreement between the N2S2Hα, and O3N2 and N2 diagnostics.tal of 4408 Hii regions within 36 galaxies from the MAD sample.For each of these regions, we stacked all spectra within the regions, and fitted the emission lines, as described in Section 3.  , an excerpt of which can be seen in Table C1.The segmentation maps denoting which pixels belong to each region can be found at https://doi.org/10.6084/m9.figshare.22041263.These segmentation maps have 4 extensions, with the first providing a region ID for each pixel, the second a measure of the distance of the pixel from the peak of the region in kpc, the third a map of the peak positions, and the fourth shows the Hα flux within the identified regions.within each region, and growing outwards from that point.As the codes iterate outward, different conditions are used to terminate the growth of the region.For the results from using HIIphot presented in Groves et al. (2023), the edges of the regions were set using the gradient of the Hα surface brightness, with a fixed value used for all galaxies.For our analysis using HIIdentify, we terminate the growth of a region once the flux drops to a defined background value, determined individually for each galaxy.
To compare the results of the two codes, we matched the area observed in our MAD data to the wider field of view of the PHANGS observations, achieved by mosaicking 7 MUSE pointings.As shown in Fig. D1 the HIIphot code (orange outlines) appears to return very isolated regions compared to HIIdentify (grey regions), suggesting that neighboring regions may be merged together by HIIphot.The HIIphot code also identifies a greater number of regions, including a large number within areas removed in our analysis due to having HαEW < 6 Å.Some of the regions appear to match up well between the results from the two codes, but for other regions it appears that HIIdentify has split what was considered as one region by HIIphot into multiple regions.Some of the regions identified by HIIdentify are also larger, which may be due to the difference in how the growth of the regions are terminated in the two identification codes, with HIIdentify capturing more of the diffuse emission around the edges of the region.
Due to the differences in the identification of regions be-tween the two codes, we then explored whether this led to any differences in the measured metallicity.Using our MAD data and the HIIphot segmentation map, we stacked and fitted the spectra within regions identified by HIIphot following the same process as described in Section 3.2.Then for each HIIphot region, we selected every region identified by HIIdentify with any overlapping pixels.To compare the metallicities returned, in Fig. D2 we plot the S-calibration metallicity from the HIIphot regions on the x-axis, against the metallicity of each overlapping HIIdentify region on the y-axis.The dot-dashed black line shows the 1:1 relation between the two and the points are coloured by the percentage of the HIIphot region covered by the HIIdentify region, which is also used to weight the best-fit to the data shown as the orange dashed line.Fig. D2 shows very good agreement between the Scalibration metallicities returned for the regions identified by HIIphot and the overlapping HIIdentify regions, especially for the regions which share a large fraction of the region's pixels (purple points).We find a similar amount of consistency for all diagnostics used in our analysis.This implies that the nebular line fluxes that we measure are consistent with the published MUSE-PHANGS measurements, and that our results are robust and show little dependence on the code used to identify the Hii regions.
This paper has been typeset from a T E X/L A T E X file prepared by the author.

Figure 1 .
Figure1.Flowchart illustrating the steps taken by HIIdentify to identify pixels as being peaks of Hii regions, removing any spaxels determined to be noise within the image, and to then assign pixels to the regions, returning segmentation maps as shown in Fig.2.

Figure 2 .
Figure 2. Left: Hα flux map of NGC1483, with outlines of the regions identified with HIIdentify shown in red.Right: Segmentation map, where the pixels within each region are set to the region ID number, shown here by different colours.

Figure 3 .
Figure 3.Comparison of the Díaz & Zamora (2022) sulphur-based Te diagnostic, to other Te-based diagnostics.In both panels, the dot-dashed black line shows the 1:1 relationship, and the solid black line the best-fit to the data, with the shaded region representing the uncertainty on the fit.Comparison to the results from the Izotov et al. (2006) diagnostic shows a good agreement (left), but when comparing to the published Berg et al. (2020) values (right), a systematic offset at lower metallicities is seen.The slope and intercept of the best-fit line are given in the bottom right of each panel.Note that not all Hii regions within the CHAOS sample have the necessary oxygen auroral line detections required to apply the Izotov et al. (2006) diagnostic, hence the range in y-axis values differ between the two panels.
The M13 O3N2 and N2 diagnostics show slightly reduced scatter, but the least agreement with the Te-based measurements, with an offset of ∼ 0.4 dex below the Te-based metallicities at 12 + log(O/H) = 8.8.The N2S2Hα diagnostic returns an average offset of ∼0.2 dex below the Díaz & Zamora

Figure 4 .
Figure 4. Comparison between strong line and sulphur Te-based metallicity diagnostics, with the dot-dashed line showing a 1:1 relationship between the diagnostics, and a dotted line representing the best-fit.The points are coloured by log(U ), measured using the sulphur-based Mingozzi et al. (2020) diagnostic.The scatter about the best-fit line is denoted by the standard deviation of the residuals (σ residuals ).
) diagnostic shows a weaker trend with log(U ), and the N2S2Hα diagnostics shows no evidence of a relationship.If the arguably more sophisticated multi-zone Berg et al. (2020) metallicity diagnostic is considered the more accurate of the Te-based diagnostics investigated in this work, this would imply that our sulphur Te-based metallicities are underestimated at low metallicities.Although this would not improve the discrepancies seen at high metallicity between our Te-metallicities and the strong line diagnostics shown in Fig. 4, it could, in part, explain the sub-linear relation observed in most cases.To explore this further, we use our best-fit relationship between the sulphur Te-based metallicities and the Berg et al. (2020) metallicities shown in Fig. 3 to convert our sulphur-based Te-metallicities to the Berg et al. (2020) metallicity scale, and compare this to the strong line metallicities in Fig. 5.As expected, the PP04 O3N2 and N2, and the S-calibration metallicities show a more linear relation with the Te-based values (best-fit slopes increase from ∼ 0.5 to ∼ 0.8), but with a systematic offset to lower metallicities of ∼0.1 dex at 12 + log(O/H)=8.2, and ∼0.2 dex at 12 + log(O/H)= 8.8.Although the relations with the M13 diagnostics also steepen, they still remain fairly flat, with a slope in the range 0.5 − 0.6 (see Table

Figure 5 .
Figure 5.As for Fig. 4, but using the relationship between the Díaz & Zamora (2022) and Berg et al. (2020) Te-based diagnostics shown in Fig. 3 to convert from the measurements made using the Díaz & Zamora (2022) diagnostic to the predicted value for if the Berg et al. (2020) diagnostic could be used.

Figure 7 .
Figure 7.The O3N2 line ratio against sulphur Te-based metallicity, colour coded by log(U ).The best fit relations from PP04 (dotted), M13 (dashed), and Curti et al. (2017) (solid) are overplotted.The differences between the O3N2 diagnostics can be clearly seen, with the PP04 and M13 diagnostics shifted to lower metallicities than the Curti et al. (2017) re-calibration.The M13 diagnostic also clearly shows a steeper relationship, hence covering a much smaller range of metallicities.

Figure 8 .
Figure 8.As for Fig. 4, now showing the metallicities of individual spaxels (no outlines), together with the metallicities from whole Hii region stacked spectra overlaid with black outlines.The colour and shape of the markers is unique to each galaxy.The best-fit line, with the uncertainty shown by the shaded region, is fitted only to the spaxel measurements.

Figure 9 .
Figure 9. Difference between the metallicity of each spaxel within a given Hii region and the metallicity obtained from the corresponding stacked Hii spectrum for each Hii region in our sample with a Te-based metallicity.The points are coloured by log(U ), as determined using the Mingozzi et al. (2020) diagnostic.Vertical red dotted lines denote the validity limits of the diagnostics, and diagonal red lines show the corresponding limits on the possible differences between the spaxel and region metallicity measurements.

Figure 10 .
Figure 10.As for Fig. 9, colouring the points instead by the radial distance of each spaxel from the centre of the region, in kpc.

Figure A1 .
Figure A1.As for Fig. 9, comparing the log(U ) measured in each spaxel within a region, to the values returned from the region's stacked spectrum, coloured by the distance to the centre of the region.
2. We provide fluxes for the Hβ, [O iii]λλ4959,5007, [O i]λλ6302,65, [S iii]λ6312, [N ii]λλ6548,84, Hα, [S ii]λλ6717,31 and [S iii]λ9070 lines in an online table APPENDIX D: COMPARING THE RESULTS FROM HIIdentify TO HIIphotA number of codes exist to identify Hii regions within galaxies, and following the release of the MUSE-PHANGS (Physics at High Angular resolution in Nearby GalaxieS;Emsellem et al. 2022) catalogue byGroves et al. (2023), we compare the results from our code to those from HIIphot for NGC2835, which has been analysed in both surveys.The methodologies of the two codes differ slightly, for example HIIphot begins with an initial guess of the shape of the region, chosen from a set of 6 possible models, and grows the regions iteratively based on a slowly declining flux threshold.HIIdentify proceeds by identifying just the brightest pixel

Figure D1 .
Figure D1.Field of view of the MAD observation of the centre of NGC2835, showing the regions identified with HIIdentify (grey outlines and shaded) and with HIIphot (orange outlines) as provided by Groves et al. (2023).

Figure D2 .
Figure D2.For each region identified by HIIphot which overlaps with at least one HIIdentify region, we plot the S-calibration metallicity for the HIIphot region on the x-axis, and the metallicity for each overlapping region as identified by HIIdentify on the y-axis.The points are coloured by the percentage of the HIIphot region covered by the HIIdentify region, with the orange line of best-fit to the data also weighted by this.

Table 1 .
The stellar mass (M * ), star formation rate (SFR), and number of Hii regions identified by HIIdentify in the 36 MAD galaxies in our sample.We also list the number of Hii regions in each galaxy that have [S iii]λ6312 S/N > 5 and nebular S/N > 3, and thus were included in our final sample.

Table 2 .
Parameters for the best-fit lines shown in Figs.4 and 5, such that 12 + log(O/H) X = slope × 12 + log(O/H) T e + intercept, where X is the strong line diagnostic.
dependence in the offsets from the Te-based values.The N2S2Hα diagnostic shows the greatest level of agreement with the Te-based values, with the slope of the best-fit line being consistent with a 1:1 relationship.However, this diagnostic has a systematic offset of ∼0.2 dex, which, unlike the O3N2 and N2 diagnostics, has no clear dependence on log(U ) or metallicity.•The O3N2 and N2 diagnostics presented by