The dynamical evolution of star-forming regions measured with INDICATE

Observations of star-forming regions provide snapshots in time of the star formation process, and can be compared with simulation data to constrain the initial conditions of star formation. In order to make robust inferences, different metrics must be used to quantify the spatial and kinematic distributions of stars. In this paper, we assess the suitability of the INDICATE (INdex to Define Inherent Clustering And TEndencies) method as a diagnostic to infer the initial conditions of star-forming regions that subsequently undergo dynamical evolution. We use INDICATE to measure the degree of clustering in N-body simulations of the evolution of star-forming regions with different initial conditions. We find that the clustering of individual stars, as measured by INDICATE, becomes significantly higher in simulations with higher initial stellar densities, and is higher in subvirial star-forming regions where significant amounts of dynamical mixing has occurred. We then combine INDICATE with other methods that measure the mass segregation, relative stellar surface density ratio and the morphology (Q-parameter) of star-forming regions, and show that the diagnostic capability of INDICATE increases when combined with these other metrics.


INTRODUCTION
Most stars form in groupings of 10s to 1000s of members, located along dense filaments embedded within giant molecular clouds (GMCs) (Lada & Lada 2003;André et al. 2014).By understanding the origins of star-forming regions 1 and their subsequent evolution and dispersal we can gain further understanding of Galactic evolution and the effects on planetary systems (Laughlin & Adams 1998;Smith & Bonnell 2001;Bonnell et al. 2001;Parker & Quanz 2012;Daffern-Powell et al. 2022;Rickman et al. 2023).
As an example, a planetary system is unlikely to suffer dynamical perturbations in a low-density star-forming region that rapidly disperses into the Galactic field, compared to a high-density region that remains gravitationally bound for many dynamical timescales (Vincke & Pfalzner 2016).In the latter case, the orbits of planets can be altered, either through direct interactions (e.g.Parker & Quanz 2012;Daffern-Powell et al. 2022), or subsequently, via secondary ef-★ Email: gablaylock-squibbs1@sheffield.ac.uk † Email: r.parker@sheffield.ac.uk ‡ Royal Society Dorothy Hodgkin fellow 1 In this paper we use the term 'star-forming region' to refer to a population of young stars that we assume formed from the same Giant Molecular Cloud.This star-forming region may be gravitationally bound, in which case it would be classified as a cluster, or unbound, in which case it would be an association.
As we model the evolution of both bound and unbound stellar populations in this paper, we use 'star-forming region', though some researchers exclusively reserve this term for populations of stars still surrounded by gas left over from the star formation process.
Recent work has suggested that even low-density star-forming regions are detrimental to planet formation if massive stars are present, whose FUV and EUV radiation fields are strong enough to evaporate protoplanetary discs during formation (Scally & Clarke 2001;Adams et al. 2004;Winter et al. 2018;Concha-Ramírez et al. 2019;Haworth et al. 2023).
We therefore need to quantify the evolution of star-forming regions so that we can quantify the effects of the star-forming environment on planetary systems, by determining how long a given star-forming region is likely to remain gravitationally bound before dispersing into the Galactic field.
However, an observation of one star-forming region only provides information about the stellar density and velocity field at that snapshot in time.If all star-forming regions formed with the same initial conditions, we could build up a statistical picture of the evolution of star-forming regions by observing more than one region.
This approach is complicated by the likelihood that star-forming regions form with a range of masses (Portegies Zwart et al. 2010), a range of stellar densities (Parker 2014;Parker & Alves de Oliveira 2017;Parker & Schoettler 2022), different degrees of initial substructure (Girichidis et al. 2012;Dale et al. 2012Dale et al. , 2013Dale et al. , 2014;;Dib & Henning 2019;Daffern-Powell & Parker 2020;Dib et al. 2020;Schneider et al. 2022) and a range of initial virial states, with compact, bound clusters being the tail of a broad distribution (Kruĳssen 2012).This means that two star-forming regions that had very different initial conditions may evolve to be similar in appearance, or vice versa, something termed the 'density degeneracy problem'.
To overcome this problem, and to pinpoint the initial conditions of an observed star-forming region, different measures of the spatial distribution of stars, e.g. the mean surface density of companions (Larson 1995;Bate et al. 1998;Gouliermis et al. 2014), the Q-parameter (Cartwright & Whitworth 2004;Schmeja et al. 2008;Sánchez & Alfaro 2009;Lomax et al. 2011;Dib et al. 2018;Dib & Henning 2019) and the spatial distribution of massive stars relative to the low-mass stars (e.g.Allison et al. 2009;Küpper et al. 2011;Maschberger & Clarke 2011), can be used to provide information.
Previous work has shown that combinations of these measures can be used as a dynamical clock, to infer the amount of dynamical evolution that a star-forming region has undergone (e.g Parker et al. 2014 (Buckner et al. 2019;Nony et al. 2021;Buckner et al. 2023).Until now IN-DICATE has not been applied to purely -body simulations of the long-term (10 Myr) dynamical evolution of star-forming region.
In this paper we measure the clustering evolution of -body simulations of star-forming regions using INDICATE.We also combine INDICATE with other methods (the Q-parameter (Cartwright & Whitworth 2004), Σ LDR (Maschberger & Clarke 2011), Λ MSR (Allison et al. 2009)) for different snapshots in our simulations to assess the diagnostic ability of INDICATE to pinpoint the initial conditions of star-forming regions.
The paper is organised as follows.In Section 2 we describe the set-up of our -body simulations, and briefly define the different metrics for quantifying the spatial distribution of stars within the region, including INDICATE.In Section 3 we present our results, and we conclude in Section 4.

METHODS
In this section we describe the setup of the -body simulations before describing the methods used to quantify the clustering, morphology, surface density and mass segregation of star-forming regions as they dynamically evolve.

𝑁-Body simulation set up
We utilise the simulations previously described in Blaylock-Squibbs & Parker (2023).We have eight sets of simulations, each with different initial conditions (i.e. initial degree of substructure, density and virial state).We run 10 regions for each set of initial conditions, as even though they share the same initial properties, there is some stochasticity in the dynamical evolution and two statistically similar star-forming regions can evolve very differently from one another (Parker et al. 2014).
Our simulated regions contain 1000 stars, with average total masses of ∼ 600 M ⊙ , which places them in the middle of the cluster mass distribution from Lada & Lada (2003), which ranges from 10 M ⊙ to 10 5 M ⊙ .
We set up the simulations with two very different velocity fields, as From left to right the columns show the fractal dimension (lower values correspond to greater degrees of substructure), the initial radii of the simulations, the median initial stellar mass density, the mean total stellar mass in the sets and the initial virial state of the regions where they can either be collapsing or expanding,  vir = 0.1 or  vir = 0.9, respectively.defined by the virial ratio  vir = /|Ω|, where  and |Ω| are the total respective kinetic and potential energies of the stars.Observations of prestellar cores show them to have a subvirial velocity dispersion (main sequence stars will inherit their velocities from the prestellar cores) (Foster et al. 2015;Kuznetsova et al. 2015), and so we run a set of subvirial simulations ( vir = 0.1).We also run sets of supervirial simulations ( vir = 0.9) as the observations of Bravi et al. (2018); Kuhn et al. (2019) and Kounkel et al. (2022) show that some young (around 1-5 Myr) star-forming regions are expanding.
We model this substructure in our simulations using the box-fractal method, generating simulations with a high degree of substructure (corresponding fractal dimension of   = 1.6) and simulations with no substructure (corresponding fractal dimension of   = 3.0) (Goodwin & Whitworth 2004;Daffern-Powell & Parker 2020).
To generate substructure we follow the method of Cartwright & Whitworth (2004); Goodwin & Whitworth (2004) which has been used extensively in the literature (Allison et al. 2009;Parker & Goodwin 2015;Daffern-Powell & Parker 2020;Daffern-Powell et al. 2022;Blaylock-Squibbs & Parker 2023).The box-fractal method works as follows.A single star is placed at the centre of a cube with side length  Div = 2 (in order to create substructure,  Div must be greater than unity, but the choice of  Div = 2 is arbitrary, see Goodwin & Whitworth 2004).This cube is then subdivided into  3 Div (8 in this case) smaller sub-cubes.A star is then placed at the centre of each of the sub-cubes.Each sub-cube then has a probability of    −3 Div of being subdivided itself, where   is the desired fractal dimension of the region.The lower the fractal dimension,   , the more substructured the region will be.For example, if   = 1.6, then the probability of that star's cube being subdivided is  −1.4 Div , whereas for   = 3.0, then the probability of that star's cube being subdivided is  0 Div , i.e. unity.
Stars whose cubes are not subdivided are removed, along with any previous generation of stars that preceded them.A small amount of random noise is added to the position of the stars to remove any regular structure that may appear.The stars' cubes are subdivided repeatedly until the desired number of stars ( ★ = 1000) is reached or exceeded.Once a generation consists of or exceeds the target number of stars, all previous generations of stars are removed.In the event the target number of stars is exceeded, we randomly select stars in the last generation and remove them until the target number of stars is reached.

Stellar Velocities
The initial star in the box-fractal method has a velocity drawn from a Gaussian with mean 0 km s −1 and variance 1 km s −1 .Subsequent stars inherit this velocity plus a value drawn from the same Gaussian but multiplied by a factor of (1/ Div )  , where  is the current generation of stars.Because of this, the velocity inherited decreases for each generation which results in stars spatially close to one another having similar velocities whereas stars that are far apart spatially can have very different velocities.This means that our simulations also have kinematic substructure, which may be able to trace the formation mode of star clusters (Arnold et al. 2022).
We then scale the velocities of the stars to the desired initial virial ratio of the simulations, where the virial ratio  vir = /|Ω|, where  and Ω are the total kinetic and potential energies of the stars, respectively, and  vir = 0.5 is virial equilibrium.We set the simulations to be either subvirial (cool, collapsing regions,  vir = 0.1) or supervirial (warm, expanding regions,  vir = 0.9).

Stellar masses
Stellar masses in the simulations are drawn randomly from the Maschberger IMF (Maschberger 2013).The Maschberger IMF has a functional form, unlike the piece-wise IMFs from Kroupa (2001) and Chabrier (2003).The probability density function of the Maschberger IMF is where  = 2.3 is the high mass exponent and  = 1.4 describes the slope of the IMF for lower masses. = 0.2 M ⊙ is the scale factor and the combination of these parameters produces a mean stellar mass of ∼ 0.4 M ⊙ .This form of the IMF is very similar to other parameterisations (e.g.Kroupa 2008;Chabrier 2003).We note that other versions, where parameters such as the exponent  may vary with the surface density of stars, have been proposed in the literature (Dib 2023).The masses are selected using the quantile function where the input  is a value drawn from a uniform distribution between 0 <  < 1,  low = 0.1 M ⊙ is the lower mass limit and  upp = 50 M ⊙ is the upper mass limit.
The function  () is the auxiliary function where  is either  low or  upp and the other the terms are as above.

Dynamical evolution
We take the masses, positions and velocities of the stars and evolve them using the 4 th -order Hermite scheme kira integrator within the Starlab software environment (Portegies Zwart et al. 2001).The simulations are evolved for 10 Myr and we do not include stellar evolution, nor do we simulate the background gas potential in our simulations nor the galactic tidal field.Both of these likely affect the dispersal of young stellar clusters via gas expulsion or by tidal stripping, reducing the time simulations remain bound (Fellhauer & Kroupa 2005;Mamikonyan et al. 2017).
A summary of the initial conditions for the -body simulations is given in Table 1.

INDICATE
To measure the clustering of stars in our simulations we use INDI-CATE (Buckner et al. 2019).INDICATE measures the degree of relative clustering on a star by star basis and also determines the significance of any clustering.
The INDICATE algorithm proceeds as follows.First, an evenly spaced control grid is generated with the same number density as the data of interest.This control grid has the appearance of a regular grid-like distribution of points that extends beyond the original data set.Because the control grid is constructed over the same spatial scale, the degree of clustering measured by INDICATE is already normalised, allowing the degree of clustering measured by INDICATE to be compared with the clustering in different regions, without needing to normalise the datasets against each other (the Qparameter Cartwright & Whitworth 2004, is normalised in a similar way).The number density of the data is calculated by dividing the number of stars by the rectangular area that encloses all stars.The rectangle is defined using the minimum and maximum (, ) coordinates of stars in the region.Then, for each star we calculate the Euclidean distance to its  th nearest neighbour in the control grid.The mean of these distances is then found, which we call r.For each star, , we calculate the number of other stars within r of .The INDICATE index is where  r is the number of stars within r of star  and  is the nearest neighbour number (we follow Buckner et al. (2019) and use  = 5 in this work).In this work we characterise the overall clustering in the simulations at each snapshot by calculating the mean INDICATE index, Ī5 .

Significant Index
INDICATE can determine the significance of clustering on a starby-star basis by using a significant index, above which stars are non-randomly clustered, and is calculated as follows.Firstly, once the control grid has been generated for the data we then generate a uniform distribution of points with the same number density as the data.We then calculate the INDICATE indices for the uniform distribution of points and calculate the mean of all of these indices.The significant index is defined as where Ī is the mean INDICATE index found for the uniform distribution of points and  is the standard deviation of the indices for the uniform distribution.Buckner et al. (2020) calculate the significant index by repeating the significant index calculation 100 times, each time using a different uniform distribution, then finding the mean significant index.Blaylock-Squibbs et al. (2022) show that for ∼ 1000 stars a single calculation is sufficient, giving a similar significant index (∼ 2.2) to the one calculated using repeats.However, for data sets with < 100 stars a minimum of 20 repeats should be used.

Q-Parameter
The Q-parameter was first introduced in Cartwright & Whitworth (2004) and is used to differentiate between regions with different morphologies.The Q-parameter uses a minimum spanning tree (MST), which is a graph connecting all points in such a way that the total edge length of the graph is minimised, and there are no closed loops.The Q-parameter is a ratio between the normalised mean edge length, m, and the normalised correlation length, s, and is calculated as follows.First, the normalised mean edge length is calculated by generating an MST of all stars in the data and finding the mean edge length of this MST.The mean edge length is normalised by dividing it by where  is the circular area of the region (though see Schmeja & Klessen (2006) and Parker (2018) for a discussion on alternative normalisation methods).
The normalised correlation length is the mean separation between all stars, which is normalised by dividing the mean separation by the radius of the region.The Q-parameter is defined as The value of the Q-parameter indicates the morphology of a group of stars;  < 0.8 implies a substructured region, whereas  > 0.8 implies a a smooth, centrally concentrated region.

Local stellar surface density ratio: Σ LDR
The local stellar surface density ratio was first used in Maschberger & Clarke (2011) to quantify the differences in stellar surface density of chosen subsets of stars compared to the entire region.In this work our chosen subset is the 10 most massive stars.The method proceeds as follows.First, for each star we calculate the two dimensional euclidean distance to the   ℎ nearest neighbour, we use  = 5 in this work.The surface density of a star is then defined as where  = 5 and  is the distance from the star to its the 5  ℎ nearest neighbour (Casertano & Hut 1985).The local surface density ratio is defined as where Σsubset is the median surface density of the 10 most massive stars and Σall is the median surface density of all stars in our simulations.If Σ LDR > 1 then the 10 most massive stars are located in areas of higher than average surface density, and if Σ LDR < 1 they are located in areas of lower than average surface density.We quantify the significance of any deviation from unity via a two-sample Kolmogorov-Smirnov (KS) test, where a −value of less than 0.01 is associated with the difference between the subset of massive stars, and the entire sample, means we reject the hypothesis that they share the same underlying parent distribution.Projection effects do not unduly affect this measurement (Bressert et al. 2010;Parker & Meyer 2012).

Mass segregation ratio: Λ MSR
The mass segregation ratio is a metric to quantify the degree of mass segregation in a star-forming region, and was first developed by Allison et al. (2009), and has been used extensively (Olczak et al. 2011;Parker & Alves de Oliveira 2017;Dib et al. 2018;Dib & Henning 2019;Plunkett et al. 2018;Maurya et al. 2023).Mass segregation is when the separation between the most massive stars is smaller than the separation between the average stars in a region (for example, if the massive stars are all located in the centre of a starforming region).The method proceeds as follows.First we generate an MST for a chosen subset of stars; in this work we select the 10 most massive stars.We then generate MSTs for 10 randomly chosen stars (which can include stars from the chosen subset); we repeat this 200 times and calculate the mean edge length for the random MSTs.
The mass segregation ratio is where  average is the mean edge length for the randomly constructed MSTs and  10 is the edge length of the MST for the 10 most massive stars.We follow Parker ( 2018) and find the upper (+ 5/6 / 10 ) and lower (− 1/6 / 10 ) uncertainties by taking an ordered list of all of the random MST lengths and selecting the upper and lower uncertainties from 5/6 and 1/6 of the way through the ordered list of MST lengths.The uncertainties will therefore corresponds to a 66% deviation from the median MST length and prevents outliers from affecting the uncertainty, which could be an issue if a Gaussian dispersion is used to estimate the uncertainty (Allison et al. 2009).
In addition to the uncertainties, we follow Parker & Goodwin (2015) and require Λ MSR > 2 as a significant detection of mass segregation.Therefore if Λ MSR > 2 then the 10 most massive stars are said to be mass segregated (the 10 most massive stars are closer to each other than the average stars are to each other) and if Λ MSR ∼ 1 then they are not mass segregated (the 10 most massive stars are separated by a similar distance to the average stars).If Λ MSR < 1 the region is said to be inversely mass segregated, with the most massive stars more widely distributed compared to the average stars in the region.

RESULTS
In this section we present our results in which we follow the evolution of the INDICATE metric,  5 , for -body simulations with a high initial degree of substructure (  = 1.6) and simulations with little to no initial substructure (  = 3.0).We show results for both high density (10 2 − 10 4 M ⊙ pc −3 ) and low density (10 0 − 10 2 M ⊙ pc −3 ) realisations of these simulations, corresponding to initial radii of 1 pc and 5 pc, respectively.We quantify the clustering using INDICATE along three different lines of sight (LOS); each LOS being parallel to one of the component axes (i.e.(, , )).
We then present the results where we combine , Σ LDR and Λ MSR with INDICATE, and assess which combinations most reliably constrain the initial conditions of the simulated star-forming regions.

Evolution of INDICATE in substructured regions
Figure 1 shows the mean INDICATE indexes for initially substructured simulations with fractal dimension   = 1.6 with initial radii of 1 pc.The general trend across all three lines of sight ((, ), (, ), (, )) for the subvirial regions (top row, panels a-c) is a rapid We show how different initial conditions and perspectives affect the measured amount of clustering in simulations at 5 Myr along three different lines of sight in Figures 2 and 3.Although there is some variation depending on the line of sight, this is minimal compared to the differences between the sub-and supervirial simulations.The supervirial simulations have a much lower degree of clustering.
The same behaviour is seen for some of the supervirial simulations (Fig 1 , panels d-f), but with a much slower initial increase in the mean indexes.The supervirial simulations also never reach the same degree of clustering according to INDICATE that the subvirial simulations do.The lower final Ī5 in the supervirial simulations is because the stars are all initially moving away from one another.Some stars assemble themselves in small bound groupings, but the overall lower number of stars in these subgroups results in lower values of Ī5 .
The main difference between the different initial virial states is in the final mean indexes, with subvirial simulations attaining mean indexes of ≳ 100 and supervirial simulations not exceeding a mean index of ∼ 90 by 10 Myr.This behaviour is seen across all three different lines of sight.The subvirial simulations attain higher Ī5 due to all the stars falling into the gravitational potential well and forming a cluster (Allison et al. 2010;Parker et al. 2014), which also explains the rapid increase of Ī5 during the first 2 Myr.
Figure 4 shows the mean INDICATE indexes for substructured (  = 1.6) low-density simulations with initial radii of 5 pc.We see the same general trend as in Figure 1 with the subvirial simulations attaining higher final mean indexes (panels a-c), compared to the supervirial simulations (panels d-f).A key difference is the lack of a rapid initial increase in the clustering; due to the low density of the simulations the stars take longer to interact and so the rate of clustering is slower.We see a maximum mean index of Ī5 ∼ 120 for the subvirial simulations at 10 Myr, but this is only for one out of the 10 simulations.The other subvirial simulations finish with 40 ≲ Ī5 ≲ 90, below the minimum mean index found in the much denser ( = 1 pc) subvirial simulations.

Evolution of INDICATE in smooth regions
Figure 5 shows the mean INDICATE indexes against time for regions with no initial substructure (  = 3.0) and initial radii of 1 pc.When comparing between the different virial states we see similar behaviour as for sub-and supervirial regions with   = 1.6.We find that the final mean indexes are similar to those of the regions with   = 1.6.
For subvirial regions (Fig. 5, panels a-c) we see once again an initial rapid increase in the clustering before a gradual increase for the remaining duration of the simulations.The initial increase lasts longer in these simulations, with the increase lasting for around 3 Myr, compared to it finishing within 2 Myr for the substructured (  = 1.6) simulations.We see similar behaviour for the supervirial simulations (Fig. 5, panels d-f) as well, in that they do not attain as high a mean INDICATE index by the end of the simulations.
Figure 6 shows Ī5 against time for low density regions with no substructure (  = 3.0) and initial radii of 5 pc.We see no significant difference in the evolution of the clustering between the different lines of sight.For subvirial simulations (Fig. 6, panels a-c) there is no rapid increase in Ī5 as the stars are further apart initially, and the absence of substructure also prevents the development of clustering.This delay is substantial; there is an increase in the clustering for some of the simulations near the end of the run time, at around 7 Myr, but the majority of simulations' INDICATE indexes do not increase.For the supervirial simulations (Fig. 6, panels d-f) the clustering does not change significantly with Ī5 < 20 for 10 Myr.

Combining Methods
In this section we present the 2D results (line of sight along the  axis with the plane of sky being  and ) of combining the Qparameter, Σ LDR and Λ MSR with INDICATE.We do this for each of the simulations at the following times: 0 Myr, 1 Myr and 5 Myr (see also Parker et al. (2014), Parker & Goodwin (2015) and Blaylock-Squibbs & Parker (2023) for other examples of Q-parameter, Σ LDR and Λ MSR being used in combination).and 5 Myr has also increased, making distinguishing between the different times more difficult.

Q & INDICATE
Panels (c) and (d) of Figure 7 show the results for the low density, initially substructured (  = 1.6) simulations.The INDICATE results for 0 Myr and 1 Myr are similar when comparing the sub-and supervirial realisations, due to a lack of dynamical evolution at early times.We once again see that the 5 Myr Q values are higher in the subvirial regions as the substructure is more effectively erased due to the stars collapsing inwards into the gravitational potential well.On the other hand, the supervirial regions retain substructure, but do not become more clustered due to a lack of significant dynamical evolution.
Figure 8 shows the Q and Ī5 values for initially smooth subvirial (lefthand panels) and supervirial (righthand panels) simulations, respectively.Panels (a) and (b) show simulations that are high density with   = 3.0 (little to no initial degree of substructure).For subvirial simulations (panel a) both Q and Ī5 values are distinct with very little overlap between the different times.For the supervirial simulations (panel b) there is much more overlap, with Q values being similar across 0 Myr, 1 Myr and 5 Myr, and only the INDICATE vales increase slightly over time.As in the highly substructured simulations, we find that the maximum Ī5 values are found in the subvirial simulations.
In panels (c) and (d) of Figure 8 we show the evolution of nonsubstructured (  = 3.0), low-density ( = 5 pc) simulations and we find the Q values overlap across 0 Myr, 1 Myr and 5 Myr.This overlap is seen for both subvirial and supervirial simulations implying that the initial virial state of low density regions without substructure cannot be constrained using these methods.Parker et al. (2014) and Parker (2014) show that the relative surface densities of the most massive stars can be used in tandem with the Qparameter to infer the initial density and virial state of a star-forming region.

Σ LDR & INDICATE
Figures 9 (a) and (b) show the results for Σ LDR and INDICATE for high density and highly substructured regions with   = 1.6.The Σ LDR values are less reliable at inferring the initial conditions than the Q values, with an overlap between Σ LDR between the suband supervirial simulations at 1 Myr and 5 Myr.The Ī5 values are the same here so the analysis in the previous section will apply for the rest of this section.
Panels (c) and (d) show the combination for low density simulations.Comparing across the sub-and supervirial simulations we see some overlap in the Σ LDR values for the 0 Myr and 1 Myr snapshots, although after 5 Myr it is possible to distinguish between the highand low-density simulations.
Figures 10 (a) and (b) shows the results for dense simulations with no initial substructure (  = 3.0).The dense, subvirial simulations (panel a) show a very distinct difference between 1 Myr and 5 Myr (compared the triangles with the crosses), but the distinction is less clear for the supervirial simulations (ostensibly because the degree of clustering measured by INDICATE does not increase as much in the supervirial simulations).
In the low-density simulations without substructure (panels (c) and (d) of Figure 10) there is significant overlap of Σ LDR versus Ī5 at all snapshot times, and as such it is impossible to differentiate between initially subvirial and supervirial simulations.
We find that in the substructured simulations Σ LDR values tend to be higher at 5 Myr compared to simulations with no substructure, and the degree of clustering measured by INDICATE is also slightly higher in the substructured simulations.

Λ MSR & INDICATE
In addition to plotting the evolution of the Q parameter against the relative surface density ratio, Σ LDR , Parker et al. (2014) showed that it is possible to also distinguish between sub-and supervirial initial conditions by using a combination of Q and the mass segregation ratio, Λ MSR .
Figure 11 shows the combination of Λ MSR and INDICATE for initially substructured simulations (  = 1.6).Although subvirial simulations tend to attain higher Λ MSR values (Allison et al. 2010;Parker et al. 2014), not all simulations dynamically segregate.This, combined with the wide spread in INDICATE indexes makes inferring the initial virial state of dense regions using Λ MSR and IN-DICATE alone unreliable (compare panels (a) and (b)).The Λ MSR values are also similar between the subvirial and supervirial simulations in the low density initial conditions (panels c and d).The key difference between the low and high density simulations is the Λ MSR values at 1 Myr tending to be higher in the subvirial high density simulations compared to the supervirial high density simulations.
Figure 12 shows the results for simulations with no initial substructure (  = 3.0).Comparing the high density simulations we see no significant difference in the Λ MSR values, with a single simulation in both sets attaining Λ MSR > 4, which is a significant degree of mass segregation.
However, the combination of Λ MSR and INDICATE does provide some constraints; the smooth, subvirial simulations (panel a) are distinguishable from the supervirial simulations (panel b).
For the low-density simulations (panels c and d) the two different initial virial states are indistinguishable.

Determining the initial degree of substructure
We have shown that combining INDICATE with other measures of the spatial distribution of stars can be used to infer the initial density of a star-forming region, and whether the star-forming region was initially subvirial, or supervirial.
Observations suggest that many star-forming regions form with the stars arranged in a substructured distribution, likely to be inherited from the filamentary nature of the gas from which they form.Substructure is not created during the dynamical evolution of a starforming region, it is always erased (Goodwin & Whitworth 2004;Parker et al. 2014;Daffern-Powell & Parker 2020).
Hydrodynamical simulations of the early phases of star formation suggest that the initial degree of substructure of stars can vary quite considerably (Schmeja & Klessen 2006;Girichidis et al. 2012;Dale et al. 2012Dale et al. , 2013Dale et al. , 2014)).As any observed star-forming region could have experienced some dynamical evolution, it is not clear what the typical degree of substructure is for star-forming regions (or even if there is a typical value).
Furthermore, Dib & Henning (2019) suggest that the value of the Q-parameter increases with increasing star formation rate, and postulate a link between the early stages of star formation in clouds, and the later degree of substructure.This could, however, be affected by any later dynamical evolution of the stars (Parker et al. 2014; Parker & Dale 2017), and if there is not a direct mapping of the structure of the gas to the stars (Parker & Dale 2015).Inspection of our results suggests that combining INDICATE with other measures of the spatial (and kinematic) information could be used to infer the initial degree of substructure.The most important factor in determining the amount of dynamical evolution is the initial local density within the star-forming region (Parker 2014).This means that in order to compare the effects of the substructure on the evolution of a star-forming region, we must compare regions with similar median local densities.
For regions with similar radii, a highly substructured region (  = 1.6) will have a very high median density compared to a region with no substructure (  = 3.0).Our substructured regions with large radii (5 pc) have similar median local densities ( ρ ∼ 100 M ⊙ pc −3 ) to our smooth regions (  = 3.0) with smaller radii (1 pc), and so we determine the effects of substructure on the evolution of INDICATE using these pairs of simulations.
As an example, we compare the evolution of the Q-parameter and INDICATE for substructured simulations with large radii (Fig. 7-c) to smooth simulations with more compact radii (Fig. 8-a).We can see that both Q and INDICATE display a higher level of clustering in the more compact, smoother simulations, and this is also evident in the subsequent plots that show the evolution of INDICATE combined with Λ MSR (compare Fig. 11-c to Fig. 12-a) or Σ LDR (compare Fig. 9-c to Fig. 10-a).This is interesting because for most combinations of metrics, their signal is enhanced from the dynamical evolution of initially substructured regions.Because the degree of clustering measured by INDICATE is highest for smooth rather than substructured regions (of comparable median local densities), INDICATE can be used in tandem with other metrics to distinguish between subtle differences in the initial degree of substructure in star-forming regions.

CONCLUSIONS
We apply the INDICATE clustering measure (Buckner et al. 2019) to eight sets of -body simulations consisting of 1000 stars along three different lines of sight to understand how it behaves due to dynamical evolution in star-forming regions with different initial conditions.We also combine INDICATE with other methods to asses its diagnostic ability in determining the initial conditions of the simulations from later snapshots.Our conclusions are as follows.
(i) INDICATE is not significantly affected by projection affects in our simulations, and evolves similarly when measured along different lines of sight for the same set of simulations.
(ii) INDICATE does evolve differently depending on the initial virial state of the simulations.We find that Ī5 increases rapidly in the subvirial simulations during the first 2 Myr, and then steadily increases until the end of the simulations.This behaviour is seen in both simulations with a high degree of initial substructure (  = 1.6) and in simulations with no initial substructure (  = 3.0), albeit to a lesser extent in the less substructured simulations.
(iii) The values of the INDICATE measure, Ī5 , are higher for subvirial simulations than for supervirial simulations at the end of the simulations (10 Myr), and this is seen in all sets of simulations apart from very low density, initially non-substructured   = 3.0 simulations, where there has been little-to-no dynamical evolution.(iv) INDICATE is most sensitive to the initial density, rather than the initial degree of substructure in a region.The higher the initial density of a star-forming region, the more clustered the stars tend to become.
(v) However, INDICATE -in combination with other metrics such as the Q-parameter -can be used to infer the initial degree of substructure, but only if the initial densities and virial ratios are known.For example, compare Fig. 7 and Fig. 8, where the plot of Q versus Ī5 gives different results for high density simulations with and without substructure.This difference is much less pronounced for the low density simulations.
(vi) When we combine INDICATE with other measures of spatial clustering, we find that INDICATE used in combination with the Qparameter (Cartwright & Whitworth 2004) provides the most reliable way of inferring the initial conditions of the simulated star-forming regions.
Ideally, combinations of more than one spatial diagnostic (e.g.INDICATE and Q, Q and Σ LDR , etc.) are required to really pinpoint the initial conditions of a star-forming region, in addition to kinematic measures such as the distributions of ejected stars (Schoettler et al. 2019(Schoettler et al. , 2020;;Schoettler et al. 2022;Farias et al. 2020;Parker & Schoettler 2022;Arunima et al. 2023).

Figure 1 .
Figure 1.The mean INDICATE index against time for high-density, substructured, sub-and supervirial simulations.Both the sub-and supervirial simulations have a high initial degree of substructure (  = 1.6) and both contain 1000 stars with initial radii of 1 pc, simulated for 10 Myr.From left to right the columns show the INDICATE indexes calculated along three different lines of sight, ( , ), ( , ) and ( , ).The top row shows the indexes calculated for subvirial simulations and the bottom row shows the indexes calculated for supervirial simulations.The solid lines show the mean index for all stars and the dashed lines show the mean index for the significantly clustered stars.We show the 10 simulations we run for the different sets of initial conditions, with the colour shading of the lines being consistent across the different lines of sight.

Figure 2 .Figure 3 .
Figure 2. Three different line-of-sight perspectives of a subvirial simulation of 1000 stars after 5 Myr of dynamical evolution.The panels show the simulation as viewed along the ,  and  axes.The colour of the points corresponds to their INDICATE index.The redder the points the more clustered a star is, according to INDICATE.The 10 most massive stars are highlighted with star symbols.We show the median index for all 1000 stars with the solid black line in the colour bar ( Ĩ5,  = 146.00,148.60, 154.00 calculated along the ,  and  axes, respectively), the black dashed line shows the median index for the 10 most massive stars ( Ĩ5,10 = 152.10,154.20, 156.90 calculated along the same lines of sight) and the thin black dotted line shows the significant index ( sig = 2.2); a star with an index above this is said to be significantly clustered.The scale of the colour bar is scaled based on the maximum INDICATE index found in the ( , ) plane of  max = 158.20.

Figure 7
Figure 7 shows INDICATE plotted against Q at 0 Myr, 1 Myr and 5 Myr in the high density, substructured (  = 1.6) simulations, for subvirial (panel a) and supervirial (panel b) simulations, respectively.Comparing the sub-and supervirial simulations we see a clear distinction between them in both Q and Ī5 .All the subvirial simulations have  > 1 by 1 Myr, and at the same time Ī5 lies between 20 and 100.The Ī5 values found at 5 Myr in the subvirial simulations overlap slightly with the 1 Myr results, lying between 85 and 135.In the high density supervirial simulations (Figure7(b)) we find that some of the Q values at 1 Myr and 5 Myr have remained below the boundary between substructured and smooth morphologies (stars are moving away from each other in grouping of stars and this preserves some of the initial substructure, hence a lower Q value), with some Q values at 5 Myr overlapping with the 1 Myr values found in the subvirial simulations.The overlap between the Ī5 values at 1 Myr

Figure 4 .
Figure 4.The mean INDICATE index against time for low-density, substructured, sub-and supervirial simulations.Both the sub-and supervirial simulations have a high initial degree of substructure (  = 1.6) and both contain 1000 stars with initial radii of 5 pc, simulated for 10 Myr.From left to right the columns show the INDICATE indexes calculated for three different lines of sight, ( , ), ( , ) and ( , ).The top row shows the indexes calculated for subvirial simulations and the bottom row shows the indexes calculated for supervirial simulations.The solid lines show the mean index for all stars and the dashed lines show the mean index for the significantly clustered stars.We show the 10 simulations we run for the different sets of initial conditions, with the colour shading of the lines being consistent across the different lines of sight.

Figure 5 .
Figure 5.The mean INDICATE index against time for smooth, high-density, sub-and supervirial simulations.Both the sub-and supervirial simulations have no initial substructure (  = 3.0) and both contain 1000 stars with initial radii of 1 pc, simulated for 10 Myr.From left to right the columns show the INDICATE indexes calculated for three different lines of sight, ( , ), ( , ) and ( , ).The top row shows the indexes calculated for subvirial simulations and the bottom row shows the indexes calculated for supervirial simulations.The solid lines show the mean index for all stars and the dashed lines show the mean index for the significantly clustered stars.We show the 10 simulations we run for the different sets of initial conditions, with the colour shading of the lines being consistent across the different lines of sight.

Figure 6 .
Figure 6.The mean INDICATE index against time for smooth, low-density, sub-and supervirial simulations.Both the sub-and supervirial simulations have no initial substructure (  = 3.0) and both contain 1000 stars with initial radii of 5 pc, simulated for 10 Myr.From left to right the columns show the INDICATE indexes calculated for different viewing angles, ( , ), ( , ) and ( , ).The top row shows the indexes calculated for subvirial simulations and the bottom row shows the indexes calculated for supervirial simulations.The solid lines show the mean index for all stars and the dashed lines show the mean index for the significantly clustered stars.We show the 10 simulations we run for the different sets of initial conditions, with the colour shading of the lines being consistent across the different lines of sight.

Figure 7 .
Figure7.The Q-parameter plotted against INDICATE for simulations with a high degree of initial substructure (  = 1.6).The top row shows the results for high density simulations with initial radii of 1 pc and the bottom row shows the results for low density simulations with initial radii of 5 pc.The black pluses show the results at 0 Myr, the blue crosses for 1 Myr and the red triangles are for 5 Myr.The horizontal grey dash dotted line is at 0.8, above which Q values signify the regions have a smooth, centrally concentrated morphology.The left hand column shows the results for subvirial simulations and the right hand column shows the results for supervirial simulations.

Figure 8 .
Figure 8.The Q-parameter plotted against INDICATE for simulations with little to no initial substructure (  = 3.0).The top row shows the results for high density simulations with initial radii of 1 pc and the bottom row shows the results for low density simulations with initial radii of 5 pc.The black pluses show the results at 0 Myr, the blue crosses for 1 Myr and the red triangles are for 5 Myr.The horizontal grey dash dotted line is at 0.8, above which Q values signify the regions have a smooth, centrally concentrated morphology.The left hand column shows the results for subvirial simulations and the right hand column shows the results for supervirial simulations.The top row shows the high density simulations and the bottom row shows the low density simulations.

Figure 9 .Figure 10 .
Figure 9.The relative surface density of massive stars, Σ LDR , plotted against INDICATE for simulations with a high degree of initial substructure (  = 1.6).The black pluses show the results for 0 Myr, the blue crosses show the results for 1 Myr and the red triangles for 5 Myr.The horizontal grey dash dotted line is at 1, above which Σ LDR finds the 10 most massive stars are in areas of greater than average surface density.The top row shows the results for high-density simulations with initial radii of 1 pc and the bottom row shows the results for low-density simulations with initial radii of 5 pc.The left hand column shows the results for subvirial simulations and the right hand column shows the results for the supervirial simulations.

;
Parker 2014; Parker & Alves de Oliveira 2017; Blaylock-Squibbs & Parker 2023).Whilst these combinations are powerful diagnostics, they can be degenerate, especially for low-density star-forming regions, and new measures of clustering are continuously being added to the literature that require extensive testing against more established metrics.First introduced by Buckner et al. (2019) INDICATE (INdex to Define Inherent Clustering And TEndencies) is designed to quantify the relative clustering of stars.INDICATE has been used to investigate NGC 3372 and NGC 2264, and also for assessing how accurately Gaia can observe young stellar clusters and associations

Table 1 .
Table summarising the initial conditions for the eight sets of 10 simulations.