Morphological analysis of H I features – III. Metric space technique revisited

This is the third paper on the morphological analysis of H I features. As in the first paper, we use the mathematical formalism of the metric space technique, developed by F. C. Adams and J. Wiseman, to quantify the complexity of 21-cm interstellar maps. This method compares the one-dimensional ‘output functions’ of the maps, which characterize specific morphological and kinematical aspects of the maps. The H I feature catalogue, from our first paper, is increased from 28 to 51 features of known origin, such as star formation regions, the environment of Wolf–Rayet (WR) stars and supernova remnants. The maps come from the Canadian Galactic Plane Survey (CGPS), and have a resolution of 1 cos δ arcmin. Also, significant improvements are applied to the metric space technique. We present a new data reduction technique, new and improved ‘output functions’ and a better characterization of the noise propagation and uncertainties in functions. We again look for correlations between the complexity of the H I features and other intrinsic aspects such as age, excitation parameter u, wind velocity, |z| and distance. Many interesting correlations are measured, for example: (i) more complex H I is associated with intense flux emission from star formation regions; (ii) the higher the wind velocity of the WR star, the more complex the H I topology; (iii) the higher the H I feature above the Galactic plane, the less complex its topology.


INTRODUCTION
The major components of the interstellar medium (ISM) are atomic gas, molecular gas and dust. Neutral hydrogen (H I) makes up approximately 70 per cent of the ISM mass. Its ubiquity and, usually, its low density imply that it is easily shaped by the energetic processes sweeping the ISM. For many years, astrophysicists have tried to identify and analyse H I features that are expected to be associated with massive stars and supernova remnants (SNRs). An individual analysis of these features is usually made to characterize the kinematics of the gas and the topology of structures on a visual basis. Another way to view astrophysical maps is to consider data as real-valued mathematical functions of n variables. Then, mathematical tools become an alternative to visual investigation and can provide quantitative morphological analysis of H I features for different aspects of their shape. These calculations enable an objective comparison of H I features. Descriptive quantitative techniques can generally be divided into two categories: morphological analysis and spectral analysis.
E-mail: jean-francois.robitaille.1@ulaval.ca (JFR); joncas@phy.ulaval.ca (GJ); andre.khalil@maine.edu (AK)
The first group is dedicated to the analysis of the spatial shape of interstellar or extragalactic features. These techniques are applied to various astrophysical maps, such as continuum emissions, spectral-line images or column density maps. At first they were mostly applied to large structures (Gott, Dickinson & Melott 1986), and topological tools were then gradually applied to smaller galactic features such as molecular clouds. Veeraraghavan & Fuller (1991) introduced the genus statistics (see Section 3.2.2) and the contour-crossing statistics applied to molecular cloud maps. Later, algorithms were developed to find clumps in molecular clouds with contour maps, such as CLUMPFIND (Williams, de Geus & Blitz 1994) and GAUSSCLUMP (Stutzki & Guesten 1990). However, these algorithms gave many different results for the number of clumps and their positions. Similarly, Rosolowsky et al. (2008) recently introduced a new method using dendrograms, which evaluate the hierarchical structures in data cubes (position-position-velocity maps) using contour levels. Using a different approach, Adams (1992) and Adams & Wiseman introduced a completely new method, called the metric space technique (MST), which mixed multiple morphological tools with the aim of comparing the complexity of astrophysical maps. In this context, astrophysical maps are not considered as a space but rather as an element of the space of all such maps.
© 2010 The Authors. Journal compilation © 2010 RAS. Downloaded from https://academic.oup.com/mnras/article-abstract/405/1/638/1024977 by guest on 27 July 2018
The second group gives more emphasis to Doppler-broadened spectral lines (i.e. to the velocity coordinates of data cubes). One example is the velocity channel analysis (VCA; Lazarian & Pogosyan 2000; Lazarian et al. 2001; Lazarian, Pogosyan & Esquivel 2002; Chepurnov & Lazarian 2009). This analysis reveals the relation between the power law of velocity channel fluctuations and the width of the channel (i.e. the ΔV resolution of a channel). In the same way, the velocity coordinate spectrum (VCS) analyses the intensity fluctuation along the velocity coordinates (Chepurnov & Lazarian 2009). This analysis is typically applied when the spatial information is not sufficient. According to the Kolmogorov theory, those analyses are usually dedicated to recovering the typical eddy scales of a region. The principal component analysis (PCA; Heyer & Schloerb 1997; Brunt & Heyer 2002a,b) is another tool combining the analysis of the power spectrum and the spatial lines of data cubes. This technique consists of identifying a relation that characterizes the magnitude of velocity differences of line profiles and the spatial scale at which these differences occur within the image.
In this paper, we present a modified version of the MST. First introduced by Adams (1992), the MST provides an excellent mathematical tool to quantitatively compare astrophysical maps using one-dimensional 'output functions'. These functions analyse the topology of structures instead of making a pixel-by-pixel comparison. This technique was applied for the first time by Adams & Wiseman. Using IRAS maps, they analysed the morphology of five molecular clouds and thus produced a classification scheme based on their complexity. A strong correlation was found between the complexity and the mass of the clouds. The comparison of the young stellar objects present in these clouds allowed them to support the idea that 'more massive stars tend to form in a more complex environment'. Later, Khalil, Joncas & Nekka (2004, hereafter Paper I) described an improved version of the MST applied to H I features isolated from the Canadian Galactic Plane Survey (CGPS) data (Taylor et al. 2003). Two new output functions were added, which helped to improve the classification scheme of H I features and the quantification of their complexity. They analysed the complexity of H I features associated with star-forming regions (SFRs), Wolf-Rayet nebulae (WRNs) and SNRs. They found correlations between the complexity and descriptive characteristics of the associated object, such as the age of the SNR, the photon flux of the WR stars, the height |z| and the WR type. More recently, the MST was developed into a multiscale formalism and used on galaxy distribution catalogues from the Sloan Digital Sky Survey (SDSS; Wu, Batuski & Khalil 2009).
H I line maps are very different from molecular maps. The latter show cloud-like forms, while the diffuse nature of the H I line makes it difficult to define the border of a feature. The multiphase nature of H I in the ISM (Field, Goldsmith & Habing 1969; Wolfire et al. 1995) complicates the spectral analysis of features. Many works discuss the coexistence of warm neutral medium (WNM) and cold neutral medium (CNM) in the ISM (Heiles & Troland 2003; Lequeux 2005). This characteristic of the H I line prevents the correlation analysis of spectra on an individual line of sight, where WNM and CNM can both be found but are not clearly separated from each other in one small region. This behaviour explains in part why some topological tools such as the VCS, the PCA or the spectral correlation function (SCF; Rosolowsky et al. 1999) cannot be included in an MST algorithm applied to H I features.
Here, we propose a new version of the metric space analysing technique through the improvement of the old output functions and the addition of new ones. The MST is applied to an increased number of H I features, which were processed with a data reduction technique better suited to this analysis. In Section 2, we describe the data obtained from the CGPS. We discuss the improvements to the mathematical tools and formalism in Section 3. In Section 4 we present our results and analysis, and the summary and discussion are in Section 5.

DATA BASE AND MAP EXTRACTION
The data come from the CGPS. The observations were taken at the Dominion Radio Astrophysical Observatory (DRAO; Landecker et al. 2000). The H I line emission survey at 21 cm and its continuum at 1420 MHz were analysed for Galactic longitudes l = 62.2° to 147.3° and latitudes from b = −3.6° to 5.6° in mosaics of 5.115° × 5.115°. Each mosaic contains 1024 × 1024 pixels of 18 arcsec each, with a resolution of 1 cos δ arcmin (where δ is the declination). The 21-cm line emission data cubes have a velocity resolution of 1.32 km s⁻¹ and are composed of 272 velocity channels separated by 0.824 km s⁻¹. A detailed description of the CGPS data is available from Taylor et al. (2003).
The H I features were principally located using the following catalogues: Blitz, Fich & Stark (1982) and Fich & Blitz (1984) for the SFRs, van der Hucht (2001a) for the WRNs and Green (1984) and Green (2004) for the SNRs. A total of 51 H I features were located, associated with the three different stellar objects: 27 SFR, 12 WRN and 12 SNR (see Table 1). Each feature is isolated in the three dimensions l, b and v. The velocity range of SFRs, located via H II regions, was determined from the associated CO molecular gas. The CO maps come from the Five College Radio Astronomy Observatory (FCRAO) survey (Heyer et al. 1998).
In Paper I, a filter of 3σ rms noise level (which corresponds to a threshold of ∼9 K) was applied to each isolated subcube. This filter ensured the subtraction of the instrumental noise, but did not guarantee the subtraction of the non-associated H I gas (i.e. H I gas that travels at the same velocity as the desired feature but is located elsewhere along the line of sight in the Galactic plane). This gas has no physical connection with the analysed feature. In this paper, we corrected for this contamination by working with cubes having an area 25 times larger than the extent of the feature. This size is large enough that a mean background can be computed without being biased by the feature of interest. The background is subtracted independently for each channel. On a few occasions, the area was reduced to account for intruding features. Most of the contaminating H I is removed this way. This method is evaluated in Section 3.4.
Most output functions are computed using the column density maps of the features. We calculated the mean brightness temperature using only the positive values resulting from the background subtraction over the relevant velocity range.
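The background subtraction and averaging described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the background of a channel is taken as the plain mean over the whole (large) subcube area, and negative residuals are counted as zero while the sum is divided by the total number of channels. The text does not specify these last details.

```python
def subtract_channel_background(cube):
    """Subtract the mean of each velocity channel from that channel.

    cube[c][y][x] holds brightness temperatures in kelvin.
    Assumption: the background is the plain mean over the whole channel.
    """
    cleaned = []
    for channel in cube:
        values = [t for row in channel for t in row]
        background = sum(values) / len(values)
        cleaned.append([[t - background for t in row] for row in channel])
    return cleaned


def mean_positive_map(cleaned):
    """Average over channels, keeping only positive residuals.

    Assumption: negative residuals count as zero and the sum is divided
    by the total number of channels.
    """
    n_chan = len(cleaned)
    ny, nx = len(cleaned[0]), len(cleaned[0][0])
    return [[sum(max(cleaned[c][y][x], 0.0) for c in range(n_chan)) / n_chan
             for x in range(nx)] for y in range(ny)]


# Tiny two-channel example: only one pixel stands above the background.
cube = [[[1.0, 1.0], [1.0, 5.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
feature_map = mean_positive_map(subtract_channel_background(cube))
```

In the toy cube, the first channel has a mean of 2 K, so only the 5 K pixel keeps a positive residual after subtraction.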
An in-depth review of the literature was carried out to find the associated characteristics of each object and to confirm the physical association with the H I feature. The excitation parameter u was calculated using the relations derived by Mezger & Henderson (1967) and Schraml & Mezger (1969). These relations depend on the measured flux density from the exciting star in the H II region and on the distance of the feature. The relation of Panagia (1973) between the spectral type of the ionizing star and its excitation parameter was used when the measured flux density was not available in the literature.
Table 1 reference key (fragment): Heyer, Carpenter & Ladd (1996); (10) Hunter & Massey (1990); (11) Joncas et al. (1985); (12) Joncas, Roger & Dewdney (1989); (13) Joncas, Durand & Roger (1992); (14) Paper I; (15) Kuchar & Clark (1997); (16) Landecker et al. (1992); calculated from the distance and the Galactic latitude (see Section 4.6 and Table 1).

IMPROVEMENTS TO THE METRIC SPACE TECHNIQUE
The mathematical formalism of the MST is based on the comparison of multiple output functions that analyse the topological characteristics of a column density map. In the analogy of a metric space, each H I map represents an element of that space and the output functions its dimensions. The distance functions (called metrics) compare the distance between maps in the metric space to classify them in order of their quantified complexity. In our formalism, each element of the metric space is compared to a uniform map that represents the origin of one dimension. This origin does not necessarily have the same value from one output function to another but must represent the closest uniform state for the evaluated characteristic. This way, we evaluate 'how far' an H I feature is from uniformity or, in other words, its complexity. The coordinate of an object in a metric space can be represented by
$$\eta_g = d_E\left(g(\sigma;\Sigma),\, g(\sigma_0;\Sigma)\right), \qquad (1)$$
where g represents the output function, σ is the independent variable of the map, $\sigma_0$ is the uniform reference map, or the origin, and $d_E$ is the Euclidean metric. The latter can be written in this case as
$$d_E\left(g(\sigma;\Sigma),\, g(\sigma_0;\Sigma)\right) = \frac{1}{\langle\Sigma\rangle}\left[\int \left|g(\sigma;\Sigma) - g(\sigma_0;\Sigma)\right|^{p}\, d\Sigma\right]^{1/p}, \qquad (2)$$
where Σ is the independent variable threshold level value. The variable p is usually chosen to be 2. In this paper, the variable p is chosen to be equal to one, to be consistent with Paper I. This latter value provides the largest coordinate dynamic range. The metric is also normalized by the factor $\langle\Sigma\rangle$ to obtain dimensionless coordinates. Other details of the mathematical formalism can be found in the three papers of Adams & Wiseman (e.g. Adams 1992).
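As a rough illustration of how such a coordinate could be evaluated on sampled data, the sketch below computes a discrete p-norm distance between an output function and its uniform reference, divided by a normalization factor. The function name `coordinate`, its arguments and the sample values are hypothetical, and the placement of the normalization outside the p-th root is an assumption (for p = 1 the placement makes no difference).

```python
def coordinate(g_map, g_ref, d_threshold, norm, p=1):
    """Discrete distance between two output functions sampled on the same
    threshold grid, with spacing d_threshold, divided by a normalization
    factor `norm` so that the result is dimensionless for p = 1."""
    total = sum(abs(a - b) ** p for a, b in zip(g_map, g_ref)) * d_threshold
    return total ** (1.0 / p) / norm


# Example: a map whose output function decays slowly with threshold,
# compared with a uniform reference that drops to zero immediately.
g_feature = [1.0, 0.5, 0.25, 0.0]
g_uniform = [1.0, 0.0, 0.0, 0.0]
eta = coordinate(g_feature, g_uniform, d_threshold=0.75, norm=1.5)
```

With p = 1 the coordinate is simply the area between the two curves in units of the normalization factor, which is why a slowly decaying output function lands farther from the origin.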
For this paper, we keep the six output functions of Paper I, as follows.
(i) The distribution of density characterizes the fraction m of the map at a column density higher than a threshold value Σ:
$$m(\sigma;\Sigma) = \frac{\int d^2x\,\Theta\left[\sigma(\mathbf{x}) - \Sigma\right]}{\int d^2x}. \qquad (3)$$
Here, x represents the position on the map and Θ is the step function (see footnote 1).
(ii) The distribution of volume, related to the distribution of density, characterizes the fraction of volume v of the map at densities higher than a threshold value Σ.
(iii) The distribution of components characterizes sets of connected pixels, called components, for a fixed threshold value. This output function is a histogram of the number of components as a function of the threshold value. We use the notation n(σ; Σ) to denote this distribution. This output function is similar to the CLUMPFIND algorithm introduced by Williams et al. (1994). However, the latter code uses contour maps to identify high-density zones and works on molecular maps, where there is a better discretization of clumps. We think that our technique is more appropriate in the case of a more diffuse gas such as H I features.
1 Note that there was an extraneous x in the equation of Paper I.
(iv) The distribution of filaments is directly related to the output function of the components. For each component, a filament index F is assigned, which characterizes the filamentary structure of the component. The definition of the filament index is
$$F = \frac{P\,D}{4A}, \qquad (4)$$
where P is the perimeter, D is the diameter of the structure and A its area (Wu et al. 2009). For example, the filament index of a circle takes the value of 1. The definition of this output function is
$$f(\sigma;\Sigma) = \frac{1}{n(\sigma;\Sigma)}\sum_{j=1}^{n(\sigma;\Sigma)} F_j, \qquad (5)$$
where $F_j$ is the filament index of the component j.
(v) The distribution of pixel values (hereafter the distribution of column density) is simply a histogram of the column density values of the map. We use the notation j(σ; x) to denote this output function. Because this output function is not a function of the threshold value, the normalization factor is not the threshold-based factor of equation (2) but $\bar{\sigma}$, the mean column density value of the map.
(vi) Finally, the last output function from Paper I used in this paper is the average H I spectrum. This output function represents the mean velocity profile of the H I feature. The notation a(σ; V) is used to characterize this output function.
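The threshold-based output functions (i) to (iv) can be sketched on a small pixel map as follows. All function names are hypothetical. The use of 4-connectivity and the form P·D/(4A) for the filament index are assumptions of this sketch (the latter is chosen because it gives 1 for a circle, as stated above); the minimum component size of three pixels is the rule quoted in the uncertainty section of the paper.

```python
def density_fraction(sigma_map, threshold):
    """m: fraction of pixels whose column density exceeds the threshold."""
    pixels = [s for row in sigma_map for s in row]
    return sum(1 for s in pixels if s > threshold) / len(pixels)


def volume_fraction(sigma_map, threshold):
    """v: fraction of the summed column density found above the threshold."""
    pixels = [s for row in sigma_map for s in row]
    return sum(s for s in pixels if s > threshold) / sum(pixels)


def count_components(sigma_map, threshold, min_pixels=3):
    """n: number of 4-connected pixel sets above the threshold
    (assumption: 4-connectivity; small sets are ignored)."""
    ny, nx = len(sigma_map), len(sigma_map[0])
    seen = [[False] * nx for _ in range(ny)]
    count = 0
    for y0 in range(ny):
        for x0 in range(nx):
            if seen[y0][x0] or sigma_map[y0][x0] <= threshold:
                continue
            # Flood fill one component and measure its size.
            stack, size = [(y0, x0)], 0
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx and not seen[yy][xx]
                            and sigma_map[yy][xx] > threshold):
                        seen[yy][xx] = True
                        stack.append((yy, xx))
            if size >= min_pixels:
                count += 1
    return count


def filament_index(perimeter, diameter, area):
    """F = P D / (4 A); assumed form, equal to 1 for a circle."""
    return perimeter * diameter / (4.0 * area)


demo = [[5, 5, 0, 0, 0],
        [5, 0, 0, 0, 2],
        [0, 0, 0, 0, 0],
        [0, 3, 3, 3, 0]]
n_above_1 = count_components(demo, 1)   # the single-pixel blob is ignored
```

Evaluating these functions on a grid of thresholds gives the one-dimensional curves that the metric then compares with the uniform reference.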
As we see in the following subsections, major modifications were made to the output function of the average H I spectrum and the discretization of the column density values.

Average H I spectrum
Contrary to the other output functions, the average H I spectrum does not characterize the topology of the feature as a function of a column density threshold but analyses the kinematics of the structure. The associated metric to this output function in Paper I was simply the integral of the velocity profile of the feature:
$$\eta_a = \int a(\sigma;V)\,dV. \qquad (6)$$
This is not the best way to evaluate the 'complexity' of a profile when the objective is to compare it to its closest uniform state. As shown in Fig. 1, the profiles of these average spectra are mostly close to the profile of a normal distribution. Two main values can characterize a normal distribution: the mean and the standard deviation. The mean velocity for H I features is often related to the distance of the feature in the Galactic plane. This information cannot be used to compare H I features with one another. However, the standard deviation, as a measure of the velocity dispersion of the gas around the mean value, is a better measure of the physical perturbation possibly affecting the gas. The skewness could also be used to characterize the distribution. However, for some features, the expanding H I shell is not necessarily complete: it can be cut by other line-of-sight features or by the inhomogeneities of the environment in which the shell evolves (Cazzolato & Pineault 2005). These aspects could notably bias the skewness analysis of the feature's velocity profiles. Thus, instead of calculating the integral of the velocity distribution, we have chosen to characterize the average spectra with the standard deviation of the distribution. The uniform state origin was also modified. If we consider the H I features of our data base as clouds of particles perturbed by external sources such as a massive star or the explosion of a supernova, the closest uniform state for the velocity of the particles is the Maxwell-Boltzmann velocity distribution of a gas in thermal equilibrium.
The common expression of the Maxwell-Boltzmann distribution is well known; for one velocity component, $f(v) \propto \exp(-mv^2/2kT)$, and its standard deviation is given by
$$\sigma_{\mathrm{MB}} = \sqrt{\frac{kT}{m}}. \qquad (7)$$
Here, m is the mass of the hydrogen atom, k is the Boltzmann constant, T is the temperature of the medium and v is the velocity of the atom. We set the temperature to 100 K and the mass to that of the hydrogen atom to evaluate the standard deviation of reference. This temperature was chosen arbitrarily, distinguishing between the CNM and the WNM. The H I feature discussed here is WNM gas. Thus, the new metric of the average H I spectrum output function is simply the difference between the standard deviation of the velocity profile of the feature, $\sigma_v$, and the standard deviation of the Maxwell-Boltzmann velocity distribution:
$$\eta_a = \frac{\left|\sigma_v - \sigma_{\mathrm{MB}}\right|}{1\ \mathrm{km\,s^{-1}}}. \qquad (8)$$
The metric is normalized by the factor 1 km s⁻¹ to obtain a dimensionless coordinate. This factor is valid mathematically and does not modify the physical meaning of the standard deviation. A factor analogous to the one in equation (2) would be the mean velocity, but as mentioned above the physical meaning of this factor could bias output function coordinates. The Paper I metric (equation 6) was compared with a flat velocity profile equal to zero. Such an origin, compared to a velocity dispersion such as that of the H I feature associated with Sh 134 in Fig. 1, would overestimate the complexity of the object. The standard deviation of a thermal gas is closer to the reality of interstellar conditions.
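A numerical sketch of this kinematic coordinate is given below. It assumes the one-dimensional thermal dispersion sqrt(kT/m) as the reference and an intensity-weighted second moment as the estimate of the profile's standard deviation; the paper does not state how the dispersion of the spectrum is measured, so both choices, along with the function names, are assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23   # J / K
M_HYDROGEN = 1.6735575e-27   # kg, mass of the hydrogen atom


def thermal_sigma_kms(temperature_k):
    """One-dimensional thermal velocity dispersion in km/s (assumed form)."""
    return math.sqrt(K_BOLTZMANN * temperature_k / M_HYDROGEN) / 1000.0


def spectrum_sigma(velocities, profile):
    """Intensity-weighted standard deviation of an average spectrum."""
    total = sum(profile)
    mean = sum(v * t for v, t in zip(velocities, profile)) / total
    var = sum(t * (v - mean) ** 2 for v, t in zip(velocities, profile)) / total
    return math.sqrt(var)


def spectrum_coordinate(velocities, profile, temperature_k=100.0):
    """|sigma_profile - sigma_thermal| divided by 1 km/s."""
    return abs(spectrum_sigma(velocities, profile)
               - thermal_sigma_kms(temperature_k)) / 1.0


sigma_ref = thermal_sigma_kms(100.0)   # roughly 0.9 km/s at 100 K
```

At 100 K the reference dispersion comes out near 0.9 km s⁻¹, so an H I profile several km s⁻¹ wide sits well away from the thermal origin.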
Again, the reason we only use the average spectrum is because of the possible coexistence of two equilibrium phases in a region. It is difficult to distinguish WNM components from CNM 'clouds' and for this reason we have chosen to analyse the average spectrum of a region rather than apply a more complete analysis of many lines of sight (position-position coordinates), such as the SCF (Rosolowsky et al. 1999).

Discretization of the column density values
In Paper I, the column density values of images were binned and normalized into 125 bins from minimal to maximal threshold values. This standard discretization consequently changed the integration step dΣ (the threshold increment) as a function of the dynamic range of the column density values of the image. As can be seen from equation (2), the value of dΣ can have a non-negligible influence on the coordinate value of an output function. Each dΣ width being less than one, this value could underestimate the complexity of a feature.
As mentioned by Adams & Wiseman, the step size should be fixed according to the map noise level. If the width of the threshold step dΣ is much smaller than the rms noise, the oversampling could cause extraneous fluctuations in the output function itself. However, undersampling causes a loss of information in the metric analysis. The rms brightness temperature in the CGPS maps is $\Delta T_B \sim 3$ K. As we work with the column density maps of the H I features, the noise level becomes
$$\Delta T_{\mathrm{rms}} = \frac{\Delta T_B}{\sqrt{N}}, \qquad (9)$$
where N is the number of channels for an object.
We have chosen to set the width of the threshold step dΣ equal to the noise level for all maps, using the mean number of velocity channels used for all the H I features. This mean is ⟨N⟩ = 16, with a standard deviation of 4. Using equation (9), the new step dΣ is fixed at 0.75 K. Using this common interval size instead of a fixed number of intervals has notably reduced the noise of the output functions, as can be seen in Figs 2(a) and 2(b). Obviously, this improvement has an impact on all the output functions that need a threshold column density Σ.
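The arithmetic behind the 0.75 K step can be checked in a few lines. The sketch assumes that the noise of a map averaged over N channels scales as the per-channel rms divided by the square root of N, which reproduces the quoted value; the helper names are hypothetical.

```python
import math


def column_density_step(rms_per_channel_k, n_channels):
    """Noise level of a map averaged over n_channels channels."""
    return rms_per_channel_k / math.sqrt(n_channels)


def threshold_grid(min_value, max_value, step):
    """Thresholds spaced by a fixed step (not a fixed number of bins)."""
    n_steps = int((max_value - min_value) / step + 1e-9)
    return [min_value + i * step for i in range(n_steps + 1)]


step = column_density_step(3.0, 16)     # 3 K rms, 16 channels on average
grid = threshold_grid(0.0, 3.0, step)
```

With a fixed step, a map with a larger dynamic range simply gets more thresholds, rather than wider bins as in the 125-bin scheme of Paper I.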

Power spectrum
The mathematical formalism of the MST allows us to add as many dimensions as we want. More output functions means an improved classification. One of the output functions added in this research is the spatial power spectrum (SPS) of H I features. Like the average H I spectrum, this function is different from the other five because it does not characterize the topology of the feature but rather its dynamical behaviour. The SPS measures the power of the column density distribution at different scales of the H I map (see, for example, Dickey et al. 2001). Even though the data cubes are reduced to two-dimensional maps, Miville-Deschênes et al. (2003) and Goldman (2000) have shown that the power spectrum of the three-dimensional density field can be determined directly from the power spectrum of the integrated emission map if (i) the observed medium is optically thin and (ii) the spatial scales observed on the plane of the sky are smaller than or equal to the spatial depth of the line of sight.
As for the first condition, we cannot guarantee that the medium is optically thin for every pixel in all isolated H I features, but we have ensured that no visible absorption zones are present on the maps. This first statement also confirms that if the medium is optically thin, the intensity of the map is proportional to the column density of H I in the line of sight (Spitzer 1978; Goldman 2000). These intensity fluctuations arise from density fluctuations in H I features. In this case, the spectral index of column density maps corresponds to the three-dimensional spectral index of underlying density turbulence (Lazarian & Pogosyan 2000). As for the second condition, it is known that distant objects can be smaller on the plane of the sky. If the dimension of the object is comparable to the resolution of 1 arcmin, then the calculation of its SPS is useless. The smallest object in the data bank on which the SPS was calculated is Sh 158a, with a size of 60 × 60 pixel² (∼18 × 18 arcmin²).
As described by Miville-Deschênes et al. (2003), the power spectrum is calculated from the sum of the square of the real and imaginary parts of the two-dimensional fast Fourier transform (FFT), azimuthally averaged over each wavenumber $\kappa = \sqrt{\kappa_x^2 + \kappa_y^2}$. The FFT 'edge effects' (Dickey et al. 2001) induced by the discontinuities at the edge of the maps were reduced by apodizing over the last 3 per cent of the linear size of the images.
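The calculation described above could look like the following NumPy sketch. It is not the authors' code: the cosine taper used for the apodization, the binning of wavenumbers to integer multiples of the fundamental frequency, and the function names are all assumptions made for illustration.

```python
import numpy as np


def apodize(image, fraction=0.03):
    """Taper the outer `fraction` of each axis with a cosine ramp
    (assumed window shape; the text only gives the 3 per cent width)."""
    out = np.asarray(image, dtype=float).copy()
    for axis, n in enumerate(out.shape):
        width = max(1, int(round(fraction * n)))
        ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(width) + 0.5) / width))
        window = np.ones(n)
        window[:width] = ramp
        window[n - width:] = ramp[::-1]
        shape = [1] * out.ndim
        shape[axis] = n
        out *= window.reshape(shape)
    return out


def spatial_power_spectrum(image, pixel_size=1.0):
    """Azimuthally averaged power spectrum of a two-dimensional map."""
    tapered = apodize(image)
    ny, nx = tapered.shape
    ft = np.fft.fft2(tapered)
    power = ft.real ** 2 + ft.imag ** 2          # Re^2 + Im^2
    kx = np.fft.fftfreq(nx, d=pixel_size)
    ky = np.fft.fftfreq(ny, d=pixel_size)
    kappa = np.hypot(kx[np.newaxis, :], ky[:, np.newaxis])
    dk = 1.0 / (max(nx, ny) * pixel_size)        # width of one annulus
    index = np.rint(kappa / dk).astype(int).ravel()
    sums = np.bincount(index, weights=power.ravel())
    counts = np.bincount(index)
    bins = np.arange(len(sums))
    # Drop the zero-frequency bin and everything beyond the Nyquist limit.
    keep = (bins > 0) & (counts > 0) & (bins * dk <= 0.5 / pixel_size)
    return bins[keep] * dk, sums[keep] / counts[keep]


# Example: a pure sinusoid with 4 cycles across a 64-pixel map.
x = np.arange(64)
wave = np.tile(np.cos(2.0 * np.pi * 4.0 * x / 64.0), (64, 1))
k, p = spatial_power_spectrum(wave)
```

For the sinusoidal test map, the averaged power peaks at the annulus corresponding to 4 cycles per 64 pixels, which is a quick sanity check on the wavenumber binning.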
According to Crovisier & Dickey (1983), τ(κ), which is the Fourier transform of the brightness temperature distribution of a map, is in fact the sum of the Fourier transform of the H I line, the continuum distribution, and of a fluctuating term due to system noise. The energy of these three terms can be added so that
$$|\tau(\kappa)|^2 = |\tau_{\mathrm{H\,I}}(\kappa)|^2 + |\tau_{\mathrm{c}}(\kappa)|^2 + |\tau_{\mathrm{n}}(\kappa)|^2. \qquad (10)$$
The continuum simultaneously provided by the telescope during the observations was subtracted during the first step of data reduction. The SPS of the H I feature associated with the H II region Sh 134 is shown in Fig. 3(b). Noise contributes significant power at scales 0.001 < κ < 0.01, and at κ ≈ 0.01 there is an abrupt power drop as a result of the beam of the interferometer. To mitigate these limitations, we must increase the signal-to-noise ratio of the maps. For this, we used a filter based on a wavelet decomposition developed by Miville-Deschênes et al. (2003). Fig. 3(a) shows the original column density map of the H I feature associated with Sh 134 and Fig. 3(c) shows the filtered column density map. The SPS of the filtered map is shown in Fig. 3(d). Noise reduction at small scales is significant and allows for a better evaluation of the power law in the linear range of the spectrum. Vertical lines show the limits for the calculation of the slope. The dotted line is the best fit and the two dashed lines show the uncertainty limits. These limits were chosen from the SPS of an empty column density map of the CGPS where only the noise was present. Fig. 4 shows the SPS of a filtered column density map of 10 averaged empty velocity channels of 1500 × 1500 pixel². The abrupt power drop is also visible, as in Fig. 3.
Finally, the origin (reference map) for this new output function has to be found. What is closest to a uniform state in Fourier space? As the SPS is often used to determine if a medium is characterized by turbulent behaviour, the property of isotropic, homogeneous turbulence is used as reference.
According to the Kolmogorov theory, elaborated for an incompressible medium, the energy in a turbulent medium is first injected at large scales, from where it cascades to smaller scales until it dissipates as a result of the viscosity of the medium. The three-dimensional Kolmogorov spectrum follows the relation
$$E(\kappa) \propto \kappa^{-11/3}, \qquad (11)$$
where κ = 1/l. κ is the wavenumber and l is the scale of the image in arcsec. E(κ) is the energy spectrum. The new metric for the output function of the power spectrum is the difference between the Kolmogorov spectrum power-law exponent and the measured power-law exponent of the object:
$$\eta_{\mathrm{SPS}} = \left|-\tfrac{11}{3} - \beta(\kappa)\right|. \qquad (12)$$
Here, β(κ) is the power-law exponent of the H I feature. Normalization is not necessary for this output function as it is already dimensionless.
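The exponent comparison can be sketched with an ordinary least-squares fit in log-log space. The reference exponent is taken here as −11/3, the usual index of the three-dimensional Kolmogorov power spectrum; this value, the unweighted fit and the function names are assumptions of the sketch rather than details confirmed by the text.

```python
import math

# Assumed reference: three-dimensional Kolmogorov power-spectrum index.
KOLMOGOROV_3D_EXPONENT = -11.0 / 3.0


def fit_power_law(kappa, power):
    """Least-squares slope of log10(power) against log10(kappa)."""
    xs = [math.log10(k) for k in kappa]
    ys = [math.log10(p) for p in power]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def power_spectrum_coordinate(kappa, power, reference=KOLMOGOROV_3D_EXPONENT):
    """Absolute difference between the reference and fitted exponents."""
    return abs(reference - fit_power_law(kappa, power))


# Synthetic spectrum with an exponent of -3 over the fitted range.
kappa = [0.001 * 2 ** i for i in range(6)]
power = [k ** -3.0 for k in kappa]
eta_sps = power_spectrum_coordinate(kappa, power)
```

In practice the fit would be restricted to the linear range of the spectrum, between the limits set by the noise and the beam.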

Genus statistics
The genus statistics was first introduced in a metric space context by Adams (1992). The genus, as a technique in quantitative topology, is another excellent tool to evaluate the complexity of an H I feature (Chepurnov et al. 2008) and is used here for the first time in the MST.
The basic idea of the genus (Gott et al. 1986) is to work with a closed surface S. The surface S separates regions with a density higher than a given density threshold Σ from regions with a density lower than Σ. By comparison, the distribution of components output function evaluates the total number of topological components as a function of Σ. The genus of the surface S can be defined through its area elements dA by using the Gauss-Bonnet theorem integral
$$\int_S \kappa\, dA = 4\pi\,(1 - G), \qquad (13)$$
where κ is the Gaussian curvature of the surface S. Simply, we can consider the genus as the number of holes that the surface contains. The genus G function becomes (Chepurnov et al. 2008):
$$G \equiv (\text{number of isolated high-density regions}) - (\text{number of isolated low-density regions}). \qquad (14)$$
Following Gott et al. (1986), we have chosen to use ν instead of the column density as the independent variable. The variable ν represents the deviation from the mean column density value. A value of ν of 1 means a threshold of one standard deviation from the mean column density value. The choice of this variable makes the reading of the output function easier. Fig. 5 shows, on the 'plus sign' curve, the genus of the H I feature associated with Sh 141. At ν = 0, the number of 'holes' does not equal the number of components. Where G = 0 on the central slope of the function, $\nu_0$ deviates to the left side of the origin with a value of −0.24. This behaviour means that the analysed H I feature has a clumpy topology rather than a Swiss-cheese topology.
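A two-dimensional genus curve of this kind can be sketched by counting connected regions above and below each threshold. The sketch makes two assumptions that the text does not spell out: regions are 4-connected, and a region is treated as 'isolated' only if it does not touch the edge of the map. The function names are hypothetical.

```python
import math


def count_isolated_regions(mask):
    """Count 4-connected regions of True pixels that do not touch the edge."""
    ny, nx = len(mask), len(mask[0])
    seen = [[False] * nx for _ in range(ny)]
    count = 0
    for y0 in range(ny):
        for x0 in range(nx):
            if seen[y0][x0] or not mask[y0][x0]:
                continue
            stack, touches_edge = [(y0, x0)], False
            seen[y0][x0] = True
            while stack:
                y, x = stack.pop()
                if y in (0, ny - 1) or x in (0, nx - 1):
                    touches_edge = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < ny and 0 <= xx < nx
                            and mask[yy][xx] and not seen[yy][xx]):
                        seen[yy][xx] = True
                        stack.append((yy, xx))
            if not touches_edge:
                count += 1
    return count


def genus_curve(sigma_map, nus):
    """G(nu) = isolated high regions - isolated low regions, with the
    threshold set to mean + nu * standard deviation of the map."""
    pixels = [s for row in sigma_map for s in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((s - mean) ** 2 for s in pixels) / len(pixels))
    curve = []
    for nu in nus:
        threshold = mean + nu * std
        high = [[s > threshold for s in row] for row in sigma_map]
        low = [[not h for h in row] for row in high]
        curve.append(count_isolated_regions(high) - count_isolated_regions(low))
    return curve


# Two isolated clumps on an empty background ("clumpy" topology).
two_peaks = [[0] * 5 for _ in range(5)]
two_peaks[1][1] = two_peaks[3][3] = 9
# One hole in a filled background ("Swiss-cheese" topology).
one_hole = [[9] * 5 for _ in range(5)]
one_hole[2][2] = 0
```

The clumpy map gives a positive genus at moderate ν, while the map with a hole gives a negative genus at ν = 0, which is the sign convention used in the definition of G above.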
Previous quantitative interpretations of the genus distribution of a map were based only on the parameter $\nu_0$. However, the metric associated with this output function should provide more information if the shape of the distribution is also analysed. For example, a slower drop of the wings at high ν than for a random Gaussian distribution field means that the components are more discrete and pronounced in the feature (Chepurnov et al. 2008). Accordingly, the reference map associated with the new metric of the genus output function was chosen to be a random Gaussian distribution field of components. This distribution is considered as the neutral topology for a genus distribution and also as the closest uniform state for an unperturbed field of components in a medium. Mathematically, it is written as (Coles 1988; Chepurnov et al. 2008):
$$G_{\mathrm{ref}}(\nu) = A\,\nu\,\exp\left(-\frac{\nu^2}{2\sigma^2}\right). \qquad (15)$$
This function is represented by the dotted line in Fig. 5. The dispersion and the amplitude are fitted on the genus distribution of Sh 141. This fit is applied to each object of the data base in order to compare the object with its appropriate closest uniform state. The metric associated with this new output function is
$$\eta_G = \frac{1}{G_{\max}}\int_{-\infty}^{\infty}\left|G(\nu) - A\,\nu\,\exp\left(-\frac{\nu^2}{2\sigma^2}\right)\right| d\nu, \qquad (16)$$
where A and σ are the free parameters. The metric described in equation (16) provides a better characterization of the distribution than $\nu_0$ alone. The normalization factor $G_{\max}$ is the extremum value of the genus measured for the feature. The integral limits that minimize the function are from −∞ to ∞.
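The comparison with the Gaussian reference can be illustrated numerically. The sketch assumes the standard two-dimensional Gaussian-field shape, ν exp(−ν²/2σ²) scaled by an amplitude A, and assumes that A and σ have already been fitted; the fit itself is not shown, and the function names are hypothetical.

```python
import math


def gaussian_genus(nu, amplitude, sigma):
    """Assumed reference genus curve of a random Gaussian field."""
    return amplitude * nu * math.exp(-nu * nu / (2.0 * sigma * sigma))


def genus_coordinate(nus, genus, amplitude, sigma):
    """Area between the measured genus curve and the fitted reference,
    divided by the extremum of the measured curve. The thresholds `nus`
    are assumed to be evenly spaced."""
    g_max = max(abs(g) for g in genus)
    d_nu = nus[1] - nus[0]
    area = sum(abs(g - gaussian_genus(nu, amplitude, sigma))
               for nu, g in zip(nus, genus)) * d_nu
    return area / g_max


nus = [-3.0 + 0.5 * i for i in range(13)]
measured = [gaussian_genus(nu, 10.0, 1.0) for nu in nus]
eta_same = genus_coordinate(nus, measured, 10.0, 1.0)   # identical curves
```

A measured curve that coincides with its fitted reference has a coordinate of zero; any asymmetry or slow-falling wing increases the area between the two curves and therefore the coordinate.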

New metric space
The genus statistic output function uses a metric function similar to other output functions already used in Paper I. This metric, as a function of a threshold value of the column density, satisfies the properties of a pseudometric space, as described in Paper I. However, the average spectrum and the power spectrum output functions are different because they are based on the difference between two scalar values. This difference defines a new metric space that coexists with the other pseudometric space. The proof that these new metrics satisfy the properties of a pseudometric space is developed in Appendix A.

Calculation of the uncertainties
With the new discretization of the column density step size (see Section 3.1.3), it can be considered that the rms noise has little influence on the evaluation of output functions, as it is below the lowest threshold. As in the work of Adams & Wiseman and in Paper I, for the output functions of components and filaments we consider only components with three pixels or more. This condition prevents the appearance of extra components as a result of noisy pixels pushing up above the threshold level. However, these manipulations address only the effects of instrumental noise. As discussed in Section 2, an H I background was subtracted from each H I feature in order to improve the accuracy of the complexity estimates. Of course, this subtraction will not perfectly remove what is basically a random contribution across the map. Foreground or background emission from optically thin gas along the line of sight contributing to the isolated H I features cannot be predicted.
To evaluate the impact of the H I background on the complexity ranking of an isolated feature, statistical simulations were run using CGPS data. An artificial H I bubble was added to different CGPS subcubes, which were run through the analysis pipeline. The simulated H I bubble does not have to be realistic; it only has to match the general shape and size of real objects. The bubble is circular and has an intensity gradient from centre (low) to edge (high). This structure could represent, for example, the systemic velocity of an elementary expanding H I bubble. The bubble had a size of 180 × 180 pixel². It was added to CGPS subcubes of 12 velocity channels and a spatial size of 900 × 900 pixel², which is representative of the data set analysed in Section 4. The velocity distribution of the simulated bubble had a Gaussian profile through the entire velocity range. This simple model does not adequately simulate an expanding H I shell in the ISM, but is sufficiently representative of the physical process for our purpose. Because the velocity dispersion within the shell often causes the absence of observable caps and the presence of a stationary ring (Cazzolato & Pineault 2005), the proposed model is not too far from the observed reality.
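A scaled-down sketch of such a test bubble is given below. The linear rise of intensity from centre to edge, the peak value, the channel dispersion and the function names are all assumptions made for illustration; the text only specifies a circular shape, a centre-to-edge gradient and a Gaussian velocity profile.

```python
import math


def make_bubble(size, diameter):
    """Circular patch on a size x size grid whose intensity rises linearly
    (assumed) from 0 at the centre to 1 at the edge; 0 outside."""
    centre = (size - 1) / 2.0
    r_max = diameter / 2.0
    bubble = []
    for y in range(size):
        row = []
        for x in range(size):
            r = math.hypot(x - centre, y - centre)
            row.append(r / r_max if r <= r_max else 0.0)
        bubble.append(row)
    return bubble


def add_bubble(cube, bubble, peak_k, sigma_channels):
    """Add the bubble to every channel of cube[c][y][x], weighted by a
    Gaussian profile centred on the middle channel."""
    centre = (len(cube) - 1) / 2.0
    out = []
    for c, channel in enumerate(cube):
        weight = peak_k * math.exp(-0.5 * ((c - centre) / sigma_channels) ** 2)
        out.append([[t + weight * b for t, b in zip(row, brow)]
                    for row, brow in zip(channel, bubble)])
    return out


# Small stand-in for a background subcube: 3 empty channels of 9 x 9 pixels.
empty = [[[0.0] * 9 for _ in range(9)] for _ in range(3)]
model = add_bubble(empty, make_bubble(9, 6), peak_k=10.0, sigma_channels=1.0)
```

In the procedure described above, the empty cube would be replaced by real CGPS subcubes so that the same bubble is seen against many different H I backgrounds.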
The best way to evaluate the impact of the H I background on features is to repeat this procedure for several CGPS subcubes and then to measure the dispersion of the complexity ranking for all of them. The results for the 30 models are given in Table 2. To evaluate the complexity rank, the dimensionless coordinates are considered as vector elements in the multiple dimension metric space. Each element is normalized by the largest coordinate for each output function. Then, we sum the normalized coordinates to obtain the complexity rank given in the last column. The mean complexity rank of the 30 models is 5.94 and the standard deviation is 0.41. The measured dispersion means that the varying H I background has a significant effect on the complexity rank of the feature. This represents an estimation of the uncertainty of the complexity rank. The last row of Table 2 shows the standard deviation of all output functions. The distribution of column density has the highest value among the eight functions. Contrary to the distribution of components, which identifies topological structures (and has the smallest standard deviation), the distribution of column density is simply a histogram of the column density values of the map. As this function is less connected to the topology of the analysed feature, it is more sensitive to the H I background contribution. The genus output function has the second highest standard deviation. Besides the distribution of components of the H I feature, the genus function is also sensitive to the distribution of 'holes' in the map and thus to the sides of the map, where a small contribution of the H I background could change the value of G as a function of the threshold (see Section 3.2.2).
© 2010 The Authors. Journal compilation © 2010 RAS, MNRAS 405, 638-656
The impact of the uncertainty on the complexity rank is discussed in the following section. Table 3 lists the coordinates associated with all output functions and the global complexity for 40 objects. The complexity rank is calculated as described at the end of Section 3.4.
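The normalization and summation that define the complexity rank, and the dispersion measured over the simulated models, can be illustrated with a short Python sketch. The function names and the toy coordinates are hypothetical, coordinates are assumed to be positive, and the sample standard deviation is used here as an assumption because the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def complexity_ranks(coords):
    """Return the complexity rank of every object.

    `coords` maps an object name to its list of output-function
    coordinates (same order for every object). Each coordinate is divided
    by the largest coordinate of its output function, then the normalized
    coordinates are summed.
    """
    names = list(coords)
    n_funcs = len(coords[names[0]])
    maxima = [max(coords[name][j] for name in names) for j in range(n_funcs)]
    return {
        name: sum(c / m for c, m in zip(coords[name], maxima))
        for name in names
    }


def rank_dispersion(ranks):
    """Mean and sample standard deviation of a set of complexity ranks."""
    values = list(ranks.values())
    return mean(values), stdev(values)


# Toy example with two output functions and three models (made-up numbers).
toy = {"model_a": [2.0, 4.0], "model_b": [1.0, 8.0], "model_c": [2.0, 8.0]}
toy_ranks = complexity_ranks(toy)
toy_mean, toy_sd = rank_dispersion(toy_ranks)
```

With eight output functions the rank defined this way cannot exceed 8, which is consistent with the mean value of 5.94 quoted for the 30 models.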

RESULTS
The data base is divided into two sections, one including H I features on which the power spectrum and the genus statistics were evaluated (see Table 3) and one where they were not (see Table 4). Maps fewer than 50 pixels wide or high did not contain enough large-scale information to evaluate the power spectrum or to fit a reliable function to their genus curve.
The uncertainty of 0.41 on the complexity rank of Table 3 divides the list of objects into approximately five groups of distinct complexity. Each group contains a mix of the three types of object considered in the analysis. This behaviour means that the calculation of the complexity rank cannot be used as a classification tool of object origin. However, in Section 4.4 we discuss the possibility of such tools in the case of output functions taken independently.

Wolf-Rayet environment complexity
In Paper I, H I features associated with SNRs were generally more complex than H I features associated with WR stars. The mean complexity of WR H I features was 3.65 (4.58 for the SNR H I features). The number of WRs and SNRs was four for each type in Paper I, which is a relatively poor statistic by comparison to this paper. Five of the WR H I features are now among the 10 most complex features. WR 140 was identified in Paper I as a deviant WR feature because of its great complexity. Now, WR 140 is one of the five most complex WR features, together with four new features. In Paper I, the great complexity of WR 140 was connected to its filament coordinate, which was one of the greatest, and to its unique characteristic of being a binary system (Williams et al. 1997), contrary to the three other WR objects. In this paper, WR 140 preserves its large filament coordinate in spite of the modification of this output function, and four of the new WR stars in the data base are also multiple systems: WR 139 (Marchenko et al. 1997), WR 143 (Varricatt & Ashok 2006), WR 151 (Villar-Sbaffi et al. 2006) and WR 153, which is a quadruple system (Demers et al. 2002). Except for WR 139, all these objects are highly ranked in complexity. The complexity of WR 153 is mainly a result of its great power spectrum coordinate, whereas for WR 143 and WR 151 it is mainly because of their average spectrum coordinate. It seems that these systems act on their environment strongly enough to modify the topology and the dynamics of the associated gas. All these objects also have some of the highest genus statistics coordinates, which signifies a great departure from a random Gaussian distribution of clumps. WR 151 is the most complex WR H I feature. According to Villar-Sbaffi et al. (2006), this binary system probably has asymmetric winds and has the second shortest period of any WR+O system known in our Galaxy (P = 2.12 d). The next most complex feature is WR 144. It is the only member of the five most complex WR H I features that is not part of a multiple star system. As is discussed in Section 4.4, there exists a correlation between the wind velocity of the WR star and the complexity of the associated H I feature. Table 1 shows that WR 144 has a wind velocity of 2400 km s⁻¹, which is one of the highest velocities, along with those of the four other most complex WR H I features.
In spite of its binarity, WR 139 does not show a peculiar complexity. This WR star is part of the well-known eclipsing binary V444 Cygni (WN5 + O6 V−III; Marchenko, Moffat & Koenigsberger 1994; Marchenko et al. 1997). The departure from the other WR systems is mostly reflected by the filament and the genus output functions. V444 Cygni is part of the association Cyg OB1. This entire area is superposed on a diffuse nebulosity and is surrounded by a supershell. For this reason, individual members of Cyg OB1 are not expected to be surrounded by interstellar features (Miller & Chu 1993). V444 Cygni presents only weak evidence of high-velocity surrounding gas (Marchenko et al. 1994). However, other WR stars in the association possess some associated features in their vicinity (see Pineault, Gaumont-Guay & Madore 1996 for WR 134, WR 135 and WR 137). The location of the binary in this active area could explain the difference in complexity for nearby H I gas.

Most complex features
The SFR Sh 137 has the highest complexity rank in Table 3. This object also has the highest volume, column density and genus coordinates. However, we must be careful in this case. For computational reasons, all images analysed with the MST are squares or rectangles. Fig. 6 shows the column density map of the Sh 137 H I feature.
The main filament makes an angle of ∼45° with the edge of the map. This configuration leaves large empty areas in the upper-left and lower-right corners of the map. These areas particularly affect the measure of the fraction of volume as a function of a threshold value of the column density (the distribution of volume), the number of low column density values (the distribution of column density) and the number of 'holes' as a function of the threshold (the genus statistics).
The SNR G116.5+1.1 has a behaviour similar to Sh 137 for these three output functions. In this case, a large H I shell is present around the supernova. The expanding shock wave of the explosion left an inhomogeneous diffuse region inside the shell (Yar-Uyaniker, Uyaniker & Kothes 2004). Here, the feature is correctly limited by the rectangular shape of the subcube and the behaviour of these three output functions illustrates appropriately the morphology of the feature.
Sh 139 has the second highest complexity rank. Its high rank is mainly because of the power spectrum coordinate, which is the highest among the 40 objects. The fragmented structure of the feature is also characterized by the relatively high components, density and volume coordinates. The real association of the H I feature is hard to confirm and there is a lack of information about this object in the literature. The dimensions of the subcube were chosen according to the associated CO spatial extent (29.4 × 35.4 arcmin²) and velocity extent (from −40.21 to −49.28 km s⁻¹). The average H I velocity profile is asymmetric. These results could justify future studies of this peculiar complex object.
CTB1 is the second most complex SNR feature analysed by the MST. The feature was first isolated from the velocity range determined by Landecker, Roger & Dewdney (1982). Fig. 7 shows the average spectrum of the feature. Two distinct emission profiles can be seen in this spectrum. Yar-Uyaniker et al. (2004) consider the first bubble, extending from −16 to −25 km s⁻¹, as an evolved structure created by stellar wind effects and/or a supernova explosion a long time ago. The associated H I shell would be created inside this old bubble and blueshifted in the Local arm. The CTB1 supernova shock wave moving toward us was blocked by the edge of the old bubble, which caused the blueshift of the feature. The associated H I feature should have a velocity range between −25 and −36 km s⁻¹.
An analysis of the two features CTB1 A and B has been made independently with the MST. Feature A has a velocity range between −13.01 and −24.55 km s⁻¹ and feature B between −25.37 and −36.09 km s⁻¹ (see Figs 8a and 8b). Fig. 9(a) shows the average spectrum profile for feature A. The emission centred at −19.60 km s⁻¹ is almost merged with the high background emission level. The surface used to subtract the average background was exceptionally 1.5 times larger than the size of the actual feature seen in Fig. 8(a), instead of the usual 25 times mentioned in Section 2. The smaller size of the average background was chosen to avoid the bias produced by large H I structures located in the CTB1 H I feature neighbourhood. This choice explains the high background emission level seen in Figs 9(a) and 9(b). A background subtraction on a surface 25 times larger makes the profile of CTB1 A no longer visible in the average spectrum of the region.
The previous spatial boundaries were defined following the radio continuum dimension of the structure, and were hence smaller than the actual H I-line feature isolated according to the figures of Yar-Uyaniker et al. (2004). The lack of contrast and the diffuse structure of feature A are an indication that it has started to merge with the surrounding medium (Yar-Uyaniker et al. 2004). By comparison, the average spectrum profile of feature B shows that the mean background subtraction did not affect it (see Fig. 9b).
The MST analysis results for features A and B are shown in Table 5. The MST analysis of feature B reveals a smaller complexity rank than the original CTB1 feature, with a coordinate of 5.67 (Table 3). As expected, the new feature B loses in the average spectrum coordinate, which steps down to 2.00, but grows significantly in its genus and filament coordinates, with 2.25 and 5.81, respectively. Feature A becomes the second most complex feature associated with a SNR, with a coordinate of 5.87. Its average spectrum coordinate also dropped, but it is probably biased by the poor contrast in the intensity of the feature. Its high complexity rank is mostly characterized by the output functions of the distribution of density, volume and filaments. This behaviour seems typically related to H I features associated with WRNs and SNRs. These results could corroborate the assumption of an evolved structure created by a massive star progenitor or a supernova explosion, as mentioned by Yar-Uyaniker et al. (2004). For this reason, CTB1 A will be kept in the data base as a SNR H I feature for the rest of the analysis.

Least complex features
The least complex feature is Sh 173. In Paper I, Sh 173 was more complex than is derived in this paper. One of its lowest coordinates is associated with the genus statistics. Like Sh 134, Sh 135 and Sh 158a, the features of lower complexity seem to reveal the 'model' behaviour of dissociated H I gas. Their velocity profiles are mostly symmetric (e.g. Fig. 1) and their ν₀ parameters are near zero, with a small departure from a Gaussian component distribution. These objects of low complexity could reflect 'perfect' star formation conditions or young SFRs. In the case of Sh 173, the age of the H II region is evaluated to be 0.6-1.0 Myr (Cichowolski et al. 2009).

Wolf-Rayet, supernovae and filaments
In the previous section we have discussed the tendency of H I features associated with WRNs and SNRs to have higher coordinates related to the distribution of density, volume and filament output functions. The most notable partition arises in the filament coordinate. For this coordinate, ∼75 per cent of the features associated with SNRs and WRNs are in the higher half. This interesting behaviour could make the filament output function a potential classification tool to discriminate H I features associated with SFRs from those associated with WRNs and SNRs. In a recent work, Myers (2009) studies the origin of the filamentary structures seen in SFRs. Myers (2009) also cites many works on the filamentary structures of several parsecs near the main dense region associated with SFRs seen on CO, submillimetre and infrared maps. However, the structures seen in H I line emission outline different physical processes and allow us to trace the photodissociation region of a SFR. Also, the filamentary structures detected in WRNs and SNRs in this work probably have different origins from those discussed in Myers (2009).

Impact of the proposed modifications on the metric space
Many changes have been made to the MST since Paper I. These modifications have a significant impact on the analysis of the data base.
The modification of the filament index F calculation has changed the dynamics of the output function coordinates. The old index calculation applied to the 51 objects in the current data base has a mean coordinate of 4.23 and a standard deviation of 0.45. The new index has a mean coordinate in the current data base of 5.17 and a standard deviation of 0.61. These results indicate that equation (4) increases, in general, the mean filament index of components as well as the discretization and the dynamics of coordinates.
The impact of the new metric associated with the average spectrum output function is difficult to quantify as it is totally different. However, the new approach now conforms to the metric definition introduced by Adams (1992): the reference map must represent the closest uniform state of the analysed characteristic. In this sense, the new interpretation of the complexity made with this modified output function is more exact.
The addition of two new output functions also has a significant impact on the complexity of objects. The more characteristics are analysed by output functions, the more precise the quantification of complexity will be. New output functions not only make the global complexity of features more sensitive to topological and kinematical aspects, but also enable new correlation possibilities for the interpretation of H I cloud behaviour in the ISM.

Correlations
As in Paper I, the correlation analysis is applied to the entire data base between the complexity of H I features and physical parameters. The Pearson correlation factor is a dimensionless number that quantifies the linear connection between two random variables. For two distributions x and y,
$$r = \frac{S_{xy}}{S_{xx}\,S_{yy}}, \qquad S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y}),$$
where $S_{xy}$ is the covariance, $\bar{x}$ and $\bar{y}$ are the means, N is the number of elements in the distributions and $S_{xx}$ and $S_{yy}$ are the standard deviations of the distributions x and y, respectively. The factor r has a definite value between −1 and 1. A correlation factor of 1 means a perfect correlation between the two distributions and a null factor implies two linearly independent distributions. A negative value means an inverse correlation. However, a correlation factor different from zero does not necessarily mean that a significant correlation exists between the two distributions.
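As an illustration of the Pearson factor described above, a minimal Python sketch is given below. The function name `pearson_r` is hypothetical. The code works with sums of deviations from the mean, because the normalization factors of the covariance and of the standard deviations cancel in the ratio.

```python
import math


def pearson_r(x, y):
    """Pearson correlation factor between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the 1/(N-1) factors cancel in r.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

A call such as `pearson_r(ages, coordinates)` would then give the factor for one physical parameter against one output function coordinate; the significance of the result still has to be assessed separately, as the text notes.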
The correlation results are shown in Table 6. The percentage gives the confidence level at which the Pearson correlation factor is not null. The confidence level depends on the number N (Sokal & Rohlf 1987). For each sample of N objects, a confidence limit $r_{\rm lim}$ corresponding to typical confidence levels is calculated. As the samples are relatively small (N < 500) and thus do not perfectly satisfy a normal distribution, the factor r must be converted by Fisher's transformation:
$$r' = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right). \qquad (20)$$
We consider that the variance of the transformed factor r′, $\sigma_{r'}^2 = 1/(N-3)$, is a good approximation even if we deal with small samples. The confidence limits are then
$$r'_{\rm lim} = r' \pm z\,\sigma_{r'}, \qquad (21)$$
where the number z is the confidence limit of a normal distribution for a given confidence level. For example, a confidence level of 95 per cent corresponds to the z number 1.96, and 80 per cent to 1.28. For the calculations of Table 6, r′ in equation (21) equals zero because the calculation tests the confidence level at which the correlation factor is not null. Then, the inverse transformation of equation (20) is calculated to recover the confidence limit $r_{\rm lim}$ corresponding to a given confidence level. The smaller N is, the larger the variance $\sigma_{r'}^2$. Thus, according to equation (21), the correlation factor r needs to be higher for small samples to reach a significant confidence level. Confidence levels of 80 per cent and above are shown in Table 6 but only confidence levels of 90 per cent and above are discussed. This threshold was chosen in accordance with common practice in confidence analysis and most statistical tables.
Because of the differences in the metric calculations and in the characteristics evaluated by the output functions, three different complexity ranks are calculated. The first is the overall complexity given in the last column of Tables 3 and 4, which is the sum of the eight normalized output function coordinates. The kinematical complexity is the sum of only the average spectrum and the power spectrum normalized output function coordinates, and the topological complexity is the sum of the six other normalized output function coordinates.
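The confidence limit on the correlation factor can be sketched numerically as follows. This is an illustrative reading of the procedure, assuming a two-sided normal confidence limit (which reproduces the quoted values z = 1.96 at 95 per cent and z = 1.28 at 80 per cent), a null hypothesis of zero correlation and the variance 1/(N − 3); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def r_limit(n, confidence):
    """Smallest |r| that differs from zero at the given confidence level.

    The limit is computed in Fisher-transformed space, where the standard
    deviation is 1/sqrt(N - 3), and then transformed back with tanh, the
    inverse of Fisher's transformation.
    """
    if n <= 3:
        raise ValueError("at least four objects are needed")
    # Two-sided normal limit, e.g. about 1.96 for a confidence of 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    sigma = 1.0 / math.sqrt(n - 3)
    return math.tanh(z * sigma)


def is_significant(r, n, confidence=0.90):
    """True if the correlation factor r exceeds the confidence limit."""
    return abs(r) >= r_limit(n, confidence)
```

For a sample of 12 objects, for instance, this gives a 95 per cent limit close to 0.57, and the limit grows as the sample shrinks, which is the behaviour described in the text for small N.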
The complete data base of Tables 3 and 4 is used for the correlation calculation when it is possible (e.g. filament distribution versus high |z| or distance d; average spectrum versus u parameter of all H II regions).
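The split into overall, kinematical and topological complexity described above amounts to summing different subsets of the normalized coordinates. A minimal sketch follows; the dictionary keys and the function name are hypothetical labels, and the example lists only the seven output functions named in this section, with made-up values.

```python
# Output functions counted as kinematical; all others are topological.
KINEMATICAL = {"average_spectrum", "power_spectrum"}


def split_complexity(normalised):
    """Return the three complexity ranks of one object.

    `normalised` maps an output-function label to its coordinate, already
    divided by the largest coordinate of that output function.
    """
    kinematical = sum(v for k, v in normalised.items() if k in KINEMATICAL)
    topological = sum(v for k, v in normalised.items() if k not in KINEMATICAL)
    return {
        "overall": kinematical + topological,
        "kinematical": kinematical,
        "topological": topological,
    }


# Made-up normalized coordinates for a single object.
example = {
    "components": 0.5,
    "filaments": 0.75,
    "column_density": 0.25,
    "volume": 0.5,
    "genus": 1.0,
    "average_spectrum": 0.5,
    "power_spectrum": 0.25,
}
example_ranks = split_complexity(example)
```

Keeping the kinematical set explicit makes it easy to correlate each rank separately with a physical parameter.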

Complexity versus age
Correlations between the age of the SNRs and their output function coordinates are all inversely proportional. Because of its uncertain origin, the feature CTB1 A is not taken into consideration in the analysis. We note that the current results are different from the previous evaluation from Paper I, where positive correlations were found. This behaviour can be explained by the small sample of Paper I in addition to the improved data reduction used in the present paper. This new result could demonstrate a relaxation effect of the medium and a slow dispersion of the H I shell in the ISM background with time.
A significant inverse correlation is measured between the age of the SNRs and the average spectrum output function coordinates. The confidence level is 90 per cent. With reference to equation (8), an inverse correlation means that the older the SNR is, the more the velocity profile of the H I feature corresponds to the profile of a gas in thermal equilibrium. In spite of the relatively low confidence level, this analysis of multiple SNR H I features could suggest an energy dissipation tendency with time after the supernova explosion, as expected.
The behaviour seems different for SFRs. The ages of the regions are found according to the associated star cluster. Among the seven ages found, two correlations are significant: one positive, with a 90 per cent confidence level, related to the kinematical complexity and another negative, at 90 per cent, related to the genus statistic function. These correlations are calculated only for the five larger H I features associated with SFRs (Sh 142, Sh 158a, Sh 199, Sh 212 and Sh 217). The other correlation coefficients of lower confidence related to topological output functions are also negative. These results show that the topology of the H I photodissociated regions loses complexity as the regions become older and that there is an inverse tendency for the kinematical complexity. The ionizing flux of massive stars smooths the H I interface out of the clumpy topology of the original molecular cloud. However, the stellar wind of massive stars seems to influence the kinematics of the photodissociated H I gas, as Kothes & Kerton (2002) show in their model.

Complexity versus excitation parameter u
A directly proportional correlation with a confidence level of 95 per cent is evaluated between the excitation parameter u and the kinematical complexity of photodissociated H I. This significant correlation is mainly caused by the average spectrum output function coordinates. The excitation parameter u (Mezger & Henderson 1967; Schraml & Mezger 1969) gives a measure of the absolute flux of photons produced by the ionizing stars. This correlation means that the larger the ionizing flux is in the H II region, the more these stars influence the velocity and the dynamics of the H I in the medium. The ionizing flux seems to affect the velocity profile more significantly than the power spectrum of the H I feature. The models show that in a SFR the dissociated gas has a layered structure with a wide unshocked layer formed by the dissociation front. The thinner shocked layer of H I will be located between the expanding ionization front and the unshocked layer (Kothes & Kerton 2002). Except for the case of extremely young systems, the shocked layer of H I could expand with the H II region. The result of the MST could illustrate this behaviour as a function of the ionizing flux of H II regions.
An inversely proportional correlation with a confidence level of 90 per cent is also calculated between the genus output function coordinates and the excitation parameter of H II regions. Except for Sh 134 and Sh 158a, all genus functions associated with H II regions have a ν₀ smaller than zero. This means that the majority of the H I gas associated with H II regions has a clumpy topology rather than a Swiss-cheese topology. This topology is similar to molecular cloud topology. According to Roger & Dewdney (1992), the radii of the dissociation zone (H I) will be more than five times larger than the ionizing radii (H II) for later spectral types and only 1.5 times larger for earlier spectral types. The dissociation zones around early-type stars are more rapidly eroded by the ionizing fronts. Here, the inversely proportional correlation means that the stronger ionizing flux of stars (early-type stars) is associated with weaker deviations of the distribution of clumps from a random Gaussian distribution. Thus, H I features associated with early-type stars seem less diffuse and more randomly distributed, as predicted by the models. Fig. 10 shows two examples of the genus function, for the H II regions Sh 137 and Sh 154. These examples show that large differences exist between the genus functions of SFRs.

Complexity versus wind velocity
Many correlations are found between the stellar wind velocity of WR stars and the topology of their associated H I features. Confidence levels of 99 and 95 per cent were found for the overall complexity and the topological complexity, respectively. A strong correlation is calculated for the genus function coordinate, with a confidence level of 99 per cent. Every ν₀ on the WR genus curves is shifted to the negative side of the graph, which reveals a clumpy topology for the associated H I gas. This analysis also shows that the more intense WR stellar winds are associated with a greater departure of the clump distribution from a random Gaussian distribution. Some WR stars have inhomogeneous stellar winds. However, it cannot be assumed that the energy introduced by the WR star induces turbulence in the medium, an effect that could directly influence the clump distribution in its environment. According to Acker et al. (2002), stellar wind inhomogeneities should be flattened by pressure waves in the hot bubbles. It is not certain that the wind clumps can reach nebular distances. Nevertheless, Acker et al. (2002) allow that other effects such as multipolar flows or the orbital motion of binary systems could affect the global structure and dynamics of the nebulae. The weak dependence between the power spectrum, which could characterize a turbulent behaviour, and the wind velocity does not show a correlation coefficient significant enough to be mentioned here.

Complexity versus |z|
The height |z| above the Galactic plane is correlated with some output functions, especially for SNRs. Confidence levels of 95 and 90 per cent are found for inverse correlations with the distributions of filaments and density, respectively, in the case of the SNRs. The confidence level remains the same for the distribution of filaments without the G76.9+1.0 H I features a and b, which have a larger |z| value that could bias the correlation calculation, but drops below 80 per cent for the distribution of density. The stronger correlation related to the distribution of filaments is interesting. The higher the SNR is above the Galactic plane, the less filamentary the associated H I gas is. The same behaviour is found for all objects. We see also from the genus statistics that the departure from a random Gaussian distribution of the components decreases with the height |z| for all objects, especially for SNRs (with G76.9+1.0 H I features) and H II regions.
The correlation between the column density distribution of WR features and the height |z| is uncertain. The correlation is mostly a result of the high column density coordinate of WR 4. The confidence level of the Pearson correlation factor is not sufficient without this object.
The correlation factor was also calculated with the 13 objects that have |z| above 100 pc, where H I begins to reach its half-thickness and half-intensity in the Galactic plane (Binney & Merrifield 1998). An inverse correlation with a confidence level of 95 per cent is found between the height |z| and the topological complexity for the three object types. This correlation is mostly a result of the distribution of components, filaments, density and volume. These results could reflect a connection between the density of H I gas in the Galactic plane and the complexity of H I features.

Complexity versus distance
Some significant correlations are found between the output function coordinates and the distance of objects. In Paper I, a modification to the normalization of the output functions of the component and column density distributions was introduced to alleviate the distance effects on the coordinates. We can see in Table 6 that the component output function presents no correlation with the distance, but the column density distribution and other output functions do. These correlations reflect most correlations found with height |z| and the excitation parameter u.
There is a significant correlation with a confidence level of more than 99.9 per cent between the distance of objects and the height |z|. The well-known relation that connects these two parameters is
$$|z| = d\,\sin|b|,$$
where d is the distance to the object and b is its Galactic latitude. From this fact, we consider that most of these correlations come from the dependence on height above the Galactic plane rather than a dependence on the distance from us. A similar correlation exists between the average spectrum and the excitation parameter u. The relation used in this paper to evaluate u (Schraml & Mezger 1969) is also directly dependent on the distance d. We assume that the correlation between the distance and the average spectrum of the H II regions comes from the correlation between the excitation parameter u and the average spectrum (see Section 4.6.2) rather than from the distance of objects.
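The geometric link between distance and height can be made explicit with a small sketch based on the standard relation |z| = d sin|b|. The function name is hypothetical, and the offset of the Sun from the Galactic mid-plane is neglected here as a simplifying assumption.

```python
import math


def height_above_plane(distance_pc, latitude_deg):
    """Height |z| in pc for a distance in pc and a Galactic latitude in
    degrees, neglecting the height of the Sun above the plane."""
    return abs(distance_pc * math.sin(math.radians(latitude_deg)))
```

At fixed latitude, doubling the distance doubles |z|, which is why a correlation with distance and a correlation with height are difficult to separate in a sample confined to a narrow range of latitudes.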

Correlations compared to Paper I
Compared to Paper I, the analysis has been improved by considering a larger number of objects. Mostly for this reason, minor discrepancies can be seen between the current analysis and Paper I. The inverse correlations between the SNR complexity and age are in disagreement with the previous results in Paper I. However, the previous correlations were calculated on only four objects. As mentioned in Paper I, the result was difficult to explain. The new correlations are in agreement with most models of the interaction of a shock wave with ISM clouds (Klein, McKee & Colella 1994; Xu & Stone 1995), which predict a destruction of the shocked cloud by fragmentation on a time-scale of the order of their radius times 10⁵ yr. This behaviour could reflect a relaxation of the gas after the shock wave has passed through the medium. The correlation associated with the photodissociated H I emission and the ionizing flux remains the same. H I associated with intense flux emission from massive stars is more complex. Moreover, the new genus statistic output function allows the measurement of a convincing correlation between the distribution of clumps and the spectral type of the ionizing star.
A stronger correlation than in Paper I is found for the complexity of the SNR H I feature and the height |z| above the Galactic plane. The decrease of the complexity of the H I feature with height |z| is also present for the three types of object with the increased data base of this paper.
The correlation between the wind velocity and the complexity of the associated H I feature is confirmed. This correlation now relies on 12 objects compared to only three in Paper I. Moreover, the proportional dependence relies on the two types of WR stars: carbon-type (WC) and nitrogen-type (WN). Except for the star WR 151, WC stars generally seem to be associated with more complex H I features than WN stars. This discretization could reinforce the hypothesis of Paper I that H I shells associated with WN stars may be in a transient phase and thus, in fact, less complex. WR 151 has the highest wind velocity; its binary system particularities (see Section 4.1) are probably more related to the complexity of the H I shell than the evolutionary state of the WR star.

SUMMARY
The MST used in Paper I to quantify the complexity of H I features of known origin is significantly improved. The analysis was carried out on a data base increased from 28 to 51 objects. A better data reduction technique was used to avoid as much as possible the contamination of H I features by diffuse background or non-related structures. The calculation of the filamentary index of components was modified to take into account a concave topology of clumps or filaments. The metric formula of the average spectrum output function now accurately respects the formalism established by Adams (1992). Two additional output functions were added to the metric space: one that allows us to learn about the dynamics of H I features, the power spectrum, and another tool that can quantify the distribution of clumps versus holes in features, the genus statistic. Finally, the reliability of output function coordinates is ascertained with an rms noise-dependent integration variable and the evaluation of uncertainties associated with the non-related structures in interstellar maps.
The results arising from the MST analysis allow us to make the following interesting hypotheses about H I features of known origin.
(i) The H I features associated with WR stars in multiple systems seem to be more complex than H I features associated with a single WR star.
(ii) The H I features of WR stars and SNRs are more filamentary than gas associated with SFRs.
(iii) Older SNRs are associated with gas in thermal equilibrium.
(iv) H I associated with intense flux emission from massive stars is more complex.
(v) The higher the wind velocity of the WR star, the more complex the H I topology.
(vi) The higher the H I feature above the Galactic plane, the less complex it is.
This large-scale study of H I features allows us to observe some general behaviour of H I gas in the Galactic plane. An objective characterization of H I features also allows us to make new connections between their topology or kinematic characteristics and their past history as well as their interaction with the entire Galactic system. For example, the seemingly exclusive connection of H I filaments with the environment of WR stars and SNRs bodes well for a systematic use of the topological MST across the plane of the Galaxy to determine the energy budget of the ISM. This mathematical formalism has been applied here to 21-cm interstellar maps, but it can surely be applied to any astrophysical map. In addition, some of our findings can be used to test the results of models studying interactions between stars and the ISM.

ACKNOWLEDGMENTS
We thank Marc-Antoine Miville-Deschênes for providing us with his filter program based on the wavelet decomposition. The CGPS is a Canadian project with international partners, and was supported by the Natural Sciences and Engineering Research Council (NSERC).
It must be verified that This inequality is known as the Minkowski inequality (Goldberg 1964). We will not demonstrate this here, but this well-known theorem proves that the new metrics satisfy condition (A5). Naturally, as in equation (A3), equation (A12) can be considered for n = 1. This confirms that the new space created by the two new output functions is a pseudometric space, as for the six other output functions.