Multi-Dimensional Density Estimation and Phase Space Structure Of Dark Matter Halos

We present a method to numerically estimate the density of a discretely sampled data set, based on a binary space partitioning tree. We start with a root node containing all the particles and then recursively divide each node into two nodes, each containing roughly an equal number of particles, until each node contains only one particle. The volume of such a leaf node provides an estimate of the local density. We implement an entropy-based node splitting criterion that results in a significant improvement in the estimation of densities compared to earlier work. The method is completely metric free and can be applied to an arbitrary number of dimensions. We apply this method to determine the phase space densities of dark matter halos obtained from cosmological N-body simulations. We find that, contrary to earlier studies, the volume distribution function $v(f)$ of phase space density $f$ does not have a constant slope but rather a small hump at high phase space densities. We demonstrate that a model in which a halo is made up of a superposition of Hernquist spheres is not capable of explaining the shape of the $v(f)$ vs $f$ relation, whereas a model which takes into account the contribution of the main halo separately roughly reproduces the behavior seen in simulations. The use of the presented method is not limited to the calculation of phase space densities: it can serve as a general-purpose data-mining tool, and owing to its speed and accuracy it is ideally suited to the analysis of large multidimensional data sets.


INTRODUCTION
One of the basic problems in data mining is to estimate probability distributions or density distributions from a discrete set of points (particles) distributed in a multidimensional space. Once the density distribution is known, expectation values of other quantities of interest can be derived. Considering the huge amounts of data that astronomy and other fields are facing, there is a need for methods that are accurate, flexible and fast. However, most existing methods encounter problems when applied in higher dimensions. In the particular application of N-body simulations, the estimation of phase space densities is one such problem, as it requires an efficient and flexible method for 6-dimensional phase space density estimation for a large variety of equilibrium and non-equilibrium solutions of largely different topology (e.g. highly flattened disks, spheroidal but anisotropic halos, spheroidal nearly isotropic ellipticals). The simplest method for density estimation is the k nearest neighbor estimator: if $r$ is the radius enclosing the k nearest neighbors, then the density is given by $k/V_d(r)$, where $V_d(r)$ is the volume enclosed by a d-dimensional sphere of radius $r$ (Loftsgaarden & Quesenberry 1965). A more accurate method is kernel density estimation (KDE), popularly known in astrophysics as SPH (Gingold & Monaghan 1977; Lucy 1977; Silverman 1986). Its results are sensitive to the choice of kernel function and to the bandwidth of the kernel, i.e. the number of smoothing neighbors, the latter being the more important. Variable bandwidth estimators are superior to fixed bandwidth estimators. In the multidimensional case, simple isotropic bandwidths perform poorly when the data has an anisotropic distribution; one then needs to select different bandwidths in different dimensions.
In general a covariance matrix is determined and the bandwidth is selected so as to have constant covariance in all directions; this leads to anisotropic kernels. The Delaunay tessellation (Okabe et al. 1992; van de Weygaert 1996), which tessellates space into disjoint regions, performs much better for anisotropic data. Delaunay tessellation is very accurate but also very time consuming.
Most existing methods, including both KDE and Delaunay tessellation, require an a priori definition of a metric for the n-dimensional space under investigation. A suboptimal choice of metric results in a poor density estimate. Metric-based density estimators provide optimal approximations only if the covariance of the data is identical along all dimensions, locally at each point in space. In general, however, data is inhomogeneous and anisotropic; consequently these conditions cannot be realized by assuming a global scaling relation among the different dimensions. A method is required that adapts to the data under investigation. Recently a new, metric-free method dubbed FiEstAS has been proposed by Ascasibar & Binney (2005). FiEstAS is also very fast and efficient. The method relies on a repeated binary decomposition of space (organized by a tree data structure) until each volume element contains exactly one particle. The accuracy of the method depends upon the criterion used for splitting the nodes. In the simplest implementation the dimension to be divided is chosen either randomly or alternately, guaranteeing an equal number of divisions for each dimension. The more a particular dimension is tessellated, the higher the resolution achieved in that dimension. Ideally we need a scheme which makes more divisions along the dimension with maximum variation and few divisions (or none) along the dimension with minimum (or no) variation. However, the scheme as described above is data blind and thus fails to optimize the number of divisions to be made in a particular dimension.
In this paper we propose and evaluate a splitting criterion that is based upon concepts from information theory (Shannon 1948, 1949; MacKay 2003; Gershenfeld 1999). Space is tessellated along the dimension having the minimum entropy (Shannon entropy), or in other words the maximum information. Consequently, this scheme optimizes the number of divisions to be made in a particular dimension so as to extract maximum information from the data. The method can also be used to determine the metric that locally gives approximately constant covariance; kernel-based methods can then be used to estimate the densities.
As an application we study the phase space density of dark matter halos obtained from cosmological simulations. The code is available upon request, and in the future we plan to make it publicly available at the following url: https://sourceforge.net/projects/enbid/

ALGORITHM
The basic problem is to estimate the density function $\rho(\mathbf{x})$ from a finite number N of data points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ drawn from that density function. Here $\mathbf{x}_i$ is a vector in a space of d dimensions with components $x_i^1, x_i^2, \ldots, x_i^d$. The overall procedure of our algorithm EnBiD (Entropy Based Binary Decomposition) consists of three steps, which we describe in detail below. First, we tessellate the space into mutually disjoint hypercubes, each containing exactly one particle. If $V_i$ is the volume of the hypercube containing the i-th particle of mass $m_i$, then its density is $\rho_i = m_i/V_i$. Second, we apply boundary corrections to take into account the arbitrary shape of the volume containing the data. Third, we apply a smoothing technique in order to reduce the noise in the density estimate.

Tessellation
We start with a root node containing all particles. The node is divided by means of a hyperplane, perpendicular to one of the axes, into two nodes each containing half the particles. If j is the dimension along which the split is performed, the position of the hyperplane is given by the median of $x_j$. The process is repeated recursively until each sub-node contains exactly one particle (a so-called leaf node). Let $V_i$ be the volume of the leaf node containing particle i and $m_i$ the particle mass; the density is then given by $\rho_i = m_i/V_i$. An alternative, as was originally done in FiEstAS, is to calculate the mean $\langle x_j \rangle$ and then identify the two points, one on each side, which are closest to the mean. The split point is then chosen midway between these two points, $x_{\rm cut} = (x_{\rm left} + x_{\rm right})/2$. This speeds up the tessellation.
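As an illustration, the recursive median-split tessellation can be sketched in a few lines. This is a simplified, hypothetical rendering (NumPy, alternating split axis, equal-mass particles), not the actual EnBiD implementation:

```python
import numpy as np

def leaf_densities(points, bounds=None, depth=0, mass=1.0):
    """Return rho_i = m_i / V_i for every point, via recursive median
    splits.  points : (n, d) array; bounds : (d, 2) node box edges.
    The split axis simply alternates with depth (the simplest
    FiEstAS-style rule); the entropy criterion is a drop-in choice."""
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    if bounds is None:                       # root box = data bounding box
        bounds = np.stack([points.min(0), points.max(0)], axis=1)
    if n == 1:                               # leaf node: one particle
        vol = np.prod(bounds[:, 1] - bounds[:, 0])
        return np.array([mass / vol])
    j = depth % d                            # alternate the split axis
    order = np.argsort(points[:, j])
    half = n // 2
    # split plane: midway between the two median particles
    cut = 0.5 * (points[order[half - 1], j] + points[order[half], j])
    lo_b, hi_b = bounds.copy(), bounds.copy()
    lo_b[j, 1] = cut
    hi_b[j, 0] = cut
    rho = np.empty(n)
    rho[order[:half]] = leaf_densities(points[order[:half]], lo_b, depth + 1, mass)
    rho[order[half:]] = leaf_densities(points[order[half:]], hi_b, depth + 1, mass)
    return rho
```

For uniformly distributed points the leaf volumes are close to $1/N$ of the root volume, so the returned densities scatter around the true number density.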
In the implementation of FiEstAS the splitting axis alternates between the considered dimensions, which guarantees a roughly equal number of divisions per dimension. In the calculation of phase space densities the real and velocity subspaces are known to be Euclidean. Therefore the splitting is done alternately in real and velocity space, and within each subspace the axis with the highest elongation ($\langle x_j^2 \rangle - \langle x_j \rangle^2$) is chosen to be split. This generates cells that are cubical rather than elongated rectangles in the given subspace, and also helps alleviate numerical problems that arise when two points have very close values of a particular coordinate. We call this decomposition, which is implemented in FiEstAS, Cubic Cells, and the decomposition free from this constraint General.
For N particles the binary decomposition results in $2N - 1$ nodes, of which N are leaf nodes each containing one particle. The more a particular dimension is tessellated, the higher the resolution in that dimension. However, for data that is uniformly distributed in a particular dimension there is actually no need to perform a split in that dimension. This fact can be exploited to increase the accuracy of the results.
For each node we calculate the Shannon entropy $S_j$ along each dimension (or subspace) and then select the axis (subspace) with minimum entropy. The dimension having minimum entropy guarantees maximum density variation, or clustered structures, in that dimension. In other words, we split the dimension that carries the maximum amount of information. The entropy S along any dimension or subspace is estimated by dividing the dimension or subspace into $N_b$ bins of equal size and calculating the number of points $n_i$ in each bin (we choose $N_b$ to be equal to the number of particles in the node). The probability that a particle is in the i-th bin is given by $p_i = n_i/N$, where N is the total number of particles. The entropy is then given by
$$S = -\sum_{i=1}^{N_b} p_i \log p_i .$$
Rather than treating each dimension independently, it is also possible to select the subspace (real or velocity space) with minimum entropy and then choose the axis with maximum elongation within this subspace (Cubic Cells). This gives a slightly lower dispersion in the estimated densities.
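A minimal sketch of this entropy criterion (our own illustrative code, using natural logarithms and equal-width histograms; empty bins contribute zero):

```python
import numpy as np

def shannon_entropy(x, n_bins):
    """Shannon entropy of one coordinate, estimated from an
    equal-width histogram with n_bins bins."""
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()    # occupied-bin probabilities
    return -np.sum(p * np.log(p))

def split_axis(points):
    """Pick the dimension with minimum entropy, i.e. the one carrying
    the most structure; n_bins = number of particles in the node."""
    n, d = points.shape
    entropies = [shannon_entropy(points[:, j], n) for j in range(d)]
    return int(np.argmin(entropies))
```

A coordinate in which the particles are strongly clustered occupies few bins and hence has low entropy, so it is split first; a uniformly populated coordinate has near-maximal entropy and is split rarely.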

Boundary correction
The data in general might have an irregular shape and may not be distributed throughout the rectangular volume of the root node. Consequently, the densities of particles near the boundary can be underestimated. This is not an issue for systems with periodic boundary conditions, but it is, for example, for spherical systems. In higher dimensions this correction becomes even more important, since the fraction of particles that lie near the boundary increases sharply with the number of dimensions.
In FiEstAS the following correction is implemented. Suppose a leaf node containing a particle at $\mathbf{x}^p$ has one of its faces in dimension i, either $x_i = x_{\max}$ or $x_i = x_{\min}$, on the boundary; then the boundary face is redefined such that its distance from the particle is the same as the distance of the opposite face from the particle. For the former the redefinition is $x_i = x_i^p + (x_i^p - x_{\min})$ and for the latter $x_i = x_i^p - (x_{\max} - x_i^p)$. If both faces lie on the boundary then the scheme fails to apply the correction. Moreover, small sub-halos embedded in a bigger halo have a lower velocity dispersion and occupy a smaller region in velocity space, hence their boundaries need to be corrected even though they are not directly derived from the global boundary. A similar situation also arises near the center of a halo, where the circular velocity $V_c(r) \to 0$ as $r \to 0$. Moreover, in EnBiD we need to calculate the entropy of each node, and boundary effects might spuriously decrease the entropy. Consequently, a boundary correction needs to be applied to each node during the tessellation, and not just to the leaf nodes at the end of the tessellation. In EnBiD, each node having more than a given threshold $n_b$ of particles is checked for a boundary correction before the calculation of its entropy. In a given dimension, let $l_{\max}$ and $l_{\min}$ be the maximum and minimum coordinates of a node and $x_{\max}$ and $x_{\min}$ the corresponding maximum and minimum coordinates of the particles inside it; then a boundary correction is applied if simultaneously
$$l_{\max} - x_{\max} > f_b\, \frac{x_{\max} - x_{\min}}{n_{\rm node} - 1} \quad {\rm and} \quad x_{\min} - l_{\min} > f_b\, \frac{x_{\max} - x_{\min}}{n_{\rm node} - 1},$$
where $f_b$ is a constant factor. This is effective in detecting embedded structures. To check for corrections applicable to only one face, the value of $f_b$ is chosen to be 5 times higher. For cubical cells in real and velocity space $f_b = 0.5 N^{1/d}$ was found to give optimal results, where N is the total number of particles in the system. For the General decomposition the corresponding value of $f_b$ was found to be $2.0 N^{1/d}$.
The node boundaries $l_{\max}$ and $l_{\min}$ are then corrected as
$$l_{\max} = x_{\max} + \frac{1}{2}\, \frac{x_{\max} - x_{\min}}{n_{\rm node} - 1} \quad {\rm and} \quad l_{\min} = x_{\min} - \frac{1}{2}\, \frac{x_{\max} - x_{\min}}{n_{\rm node} - 1},$$
where $(x_{\max} - x_{\min})/(n_{\rm node} - 1)$ is the expected mean inter-particle separation. The choice of $n_b$ is dictated by two factors: a) if d is the number of dimensions of the space, then a minimum of $d + 1$ particles is needed to define a geometry in that space, so we set $n_b \geq d + 1$; b) if the number of particles in a node is too small, this leads to Poisson errors in the calculation of the inter-particle separation, so we impose a lower limit of $n_b = 7$.
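A rough one-dimensional sketch of this boundary check and correction (the symmetric two-sided case only; the half-separation margin and the function name are our own illustrative choices):

```python
import numpy as np

def correct_boundary(l_min, l_max, x, f_b):
    """One-dimensional sketch of the node boundary correction: if the
    gap between each box edge and the outermost particle exceeds f_b
    times the expected mean inter-particle separation, pull the edges
    in to the particles plus a half-separation margin."""
    x_min, x_max = x.min(), x.max()
    sep = (x_max - x_min) / (len(x) - 1)     # expected mean separation
    if (l_max - x_max) > f_b * sep and (x_min - l_min) > f_b * sep:
        l_max = x_max + 0.5 * sep
        l_min = x_min - 0.5 * sep
    return l_min, l_max
```

A node whose particles all cluster in the middle of its box (an embedded sub-halo, say) gets its box shrunk to the occupied region, while a well-filled node is left untouched.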

Smoothing
The un-smoothed density estimates have a large dispersion which cannot be reduced even by increasing the number of particles. Smoothing can reduce this dispersion, provided the density does not vary significantly over the smoothing region. We test two different smoothing techniques: the FiEstAS smoothing proposed by Ascasibar & Binney (2005), and a kernel based scheme (KDE).
In FiEstAS smoothing, first the density of each node is calculated assuming that the mass of each particle is distributed uniformly over its leaf node. Next, the volume $V_s$ centered on the point of interest which encompasses a given smoothing mass $M_s$ is calculated; the density estimate is then $\rho = M_s/V_s$. For the Cubic tessellation the smoothing cells are also chosen to be exactly cubical in the real and velocity subspaces. To calculate $V_s$ an iterative procedure is used. We start with a hyper-box having boundaries in the i-th dimension at $x_i \pm \Delta_i$, $\Delta_i$ being the distance to the closest hyperplane along the i-th axis of the leaf node containing the point $\mathbf{x}$. $\Delta_i$ is then doubled until the mass M enclosed by the smoothing box exceeds $M_s$, and the interval is then halved repeatedly until $|(M - M_s)/M_s| \leq \eta_{\rm tol}$, where $\eta_{\rm tol}$ is a tolerance parameter. Our experiments show that a tolerance of 0.1 gives satisfactory results. Although in FiEstAS the smoothing mass $M_s = 10 m_p$ is chosen, we find that choosing $M_s = 2 m_p$ gives higher resolution while not compromising much on noise reduction.
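The iterative search for $V_s$ can be sketched as follows. Here the mass inside the box is approximated by simply counting particles, whereas the real scheme distributes each particle's mass over its leaf node, so this illustrates only the doubling-and-bisection logic:

```python
import numpy as np

def smoothing_density(points, center, m_s, m_p=1.0, tol=0.1):
    """Density = M_s / V_s at `center`: grow a cubic box by doubling
    its half-width until it holds at least M_s of mass, then bisect
    the half-width until |M - M_s| / M_s <= tol."""
    def mass(h):
        inside = np.all(np.abs(points - center) <= h, axis=1)
        return m_p * inside.sum()
    h = 1e-3
    while mass(h) < m_s:                     # doubling phase
        h *= 2.0
    lo, hi = h / 2.0, h                      # bracket: mass(lo) < m_s <= mass(hi)
    for _ in range(60):                      # bisection phase
        mid = 0.5 * (lo + hi)
        if mass(mid) < m_s:
            lo = mid
        else:
            hi = mid
        if abs(mass(hi) - m_s) / m_s <= tol:
            break
    d = points.shape[1]
    return m_s / (2.0 * hi) ** d             # rho = M_s / V_s
```

For a small $M_s$ (e.g. $2 m_p$) the box stays small and the estimate tracks the local density; a larger $M_s$ trades resolution for lower noise.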
In kernel smoothing a fixed number of nearest neighbors around the point of interest is identified, and the density is computed by summing the contributions of each neighbor using a kernel function. This is known as adaptive kernel smoothing, since the smoothing length is $\propto \rho^{-1/d}$, $\rho$ being the density in the d-dimensional space. The kernel function can be spherical, of the form $W(u)$ with $u = \sqrt{\sum_{i=1}^{d} u_i^2}$ the distance of the neighbor from the center and $u_i$ the corresponding coordinates in d-dimensional space, or of the form $\prod_{i=1}^{d} W(u_i)$, known as the product kernel. The standard kernel scheme provides a much poorer estimate of the phase space density, since a global metric is usually unsuitable for capturing the complex real and velocity structure encountered in many astrophysical systems. However, with a method like EnBiD we can determine the appropriate metric at each point in space and thus force the covariance to be approximately the same along all dimensions. At any given point the correct metric can be calculated by determining the sides of the leaf node which encompasses that point, followed by a coordinate transformation such that the node is transformed into a cube. As we illustrate in the appendix, the kernel density estimator can have a significant bias in the estimated densities; the results we show here are corrected for this bias. We tested and compared the spline and Epanechnikov kernel functions and found the latter to be more efficient; for all our analysis we use the Epanechnikov kernel. Bias correction and other details pertaining to kernel based methods, e.g. the number of smoothing neighbors, are given in the Appendix. The nearest-neighbor search implemented in EnBiD is based on the algorithm of SMOOTH (Stadel 1995).
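A sketch of the adaptive Epanechnikov estimate with a locally rescaled metric might look like the following (illustrative only; bias correction and boundary handling are omitted, and a brute-force neighbor search replaces the tree-based one):

```python
import numpy as np
from math import pi, gamma

def epanechnikov_knn_density(points, query, k, metric=None):
    """Adaptive kernel density at `query`.  Distances are measured in
    coordinates rescaled by `metric` (e.g. the local leaf-node side
    lengths from the tessellation); the bandwidth is the distance to
    the k-th neighbour; W(u) ~ 1 - u^2 inside the unit ball."""
    d = points.shape[1]
    metric = np.ones(d) if metric is None else np.asarray(metric, float)
    y = (points - query) / metric            # transform node -> cube
    r = np.sqrt((y ** 2).sum(axis=1))
    h = np.sort(r)[k - 1]                    # bandwidth from k-th neighbour
    u = r / h
    w = np.where(u < 1.0, 1.0 - u ** 2, 0.0)
    v_ball = pi ** (d / 2) / gamma(d / 2 + 1)    # volume of the unit d-ball
    norm = v_ball * 2.0 / (d + 2)            # integral of (1 - u^2) over the ball
    rho_scaled = w.sum() / (norm * h ** d)   # number density in scaled coords
    return rho_scaled / np.prod(metric)      # undo the coordinate rescaling
```

Passing the leaf-node side lengths as `metric` makes the kernel effectively anisotropic in the original coordinates while remaining spherical in the rescaled ones.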
Although the lengths of the sides of a node provide an accurate estimate of the metric, when smoothing over a region the smoothing region might exceed the boundaries of the actual particle distribution. The smoothing lengths in such cases need to be appropriately redefined. This situation arises where a dimension has very low entropy and has been split many times, or near the boundaries of the system where the metric has not been accurately determined. In a given dimension, let $l_{\max}$ and $l_{\min}$ be the maximum and minimum coordinates of a smoothing box (or of a sphere encompassing a fixed number of neighbors $N_{\rm ngb}$) and $x_{\max}$ and $x_{\min}$ the maximum and minimum coordinates of the particles inside it. A smoothing length correction is applied to the box if simultaneously the distances to the right and left boundaries, $l_{\max} - x_{\max}$ and $x_{\min} - l_{\min}$, exceed the expected mean inter-particle separation $(x_{\max} - x_{\min})/(N_{\rm ngb} - 1)$, where $N_{\rm ngb}$ is the number of smoothing neighbors. The metric is then redefined with $l_{\max}$ and $l_{\min}$ set to $x_{\max}$ and $x_{\min}$. For FiEstAS smoothing we implement a similar smoothing volume correction: for a given smoothing box of volume $V_s$, if $m_i$ is the mass contributed by a leaf node to the smoothing box and $v_i$ the corresponding volume that falls within the box, then instead of calculating the density as $\rho = (\sum_i m_i)/V_s$ we calculate it as $\rho = (\sum_i m_i)/(\sum_i v_i)$. This correction is only applied if $(\sum_i v_i)/V_s < 0.5$.

TESTS
To test the accuracy of the results, we generate test data with a given density distribution in a d-dimensional space and then compare with the density estimates given by the code. We employ systems which have an analytical expression for the 6-dimensional phase space density f, namely an isotropic Hernquist sphere (cf. Ascasibar & Binney 2005) and an isotropic halo with a Maxwellian velocity distribution (cf. Arad et al. 2004). The test cases are generated by discrete random sampling of this density function f using a fixed number of particles N. We show here results of tests done in 6 dimensions only, with boundary correction and smoothing applied. Results pertaining to 3 dimensions and the effects of boundary correction and smoothing are discussed in detail in Ascasibar & Binney (2005).

Hernquist Sphere
For a Hernquist (1990) sphere of total mass M and scale length a, the real space density is given by
$$\rho(r) = \frac{M}{2\pi}\, \frac{a}{r\,(r + a)^3},$$
and the gravitational potential by
$$\Phi(r) = -\frac{GM}{r + a}.$$

The phase space density as a function of energy E is (Hernquist 1990)
$$f(E) = \frac{M}{8\sqrt{2}\,\pi^3 a^3 v_g^3}\, \frac{1}{(1 - q^2)^{5/2}} \left[ 3\sin^{-1} q + q\,(1 - q^2)^{1/2}\,(1 - 2q^2)\,(8q^4 - 8q^2 - 3) \right],$$
where $q = \sqrt{-aE/GM}$ and $v_g = \sqrt{GM/a}$. First we generate a random realization in real space corresponding to the density given by Eq. (8). Then we use the von Neumann rejection technique (Press et al. 1992) to generate velocities that sample this distribution. Further details can be found in Ascasibar & Binney (2005). For calculating the virial quantities of a Hernquist sphere we use $c = R_{\rm vir}/a = 4.0$, which roughly corresponds to an NFW halo with $c = 8.0$.
In the top panel of Fig. 1 we plot the ratio of the numerically estimated phase space density f, evaluated by the respective method, to the analytical phase space density $f_t$, as a function of $f_t$, for a Hernquist sphere sampled with $10^6$ particles. f is calculated by binning the particles in 80 logarithmically spaced bins in $f_t$, with at least 5 particles per bin, and then evaluating the mean value of the estimated density of all the particles in each bin.
Ideally one expects the plot to be a straight line with $f/f_t = 1$. It can be seen from the figure that the density is well reproduced over most of the halo, about 18 decades in density, except near the very center where the density is very high. Both FiEstAS and EnBiD tessellation, followed by FiEstAS smoothing with $M_s = 2 m_p$, underestimate the density in the region of very high density; however, compared to FiEstAS tessellation, the high density cusp is resolved better by EnBiD by about 2 decades in density. In real space there is more variation of density than in velocity space. EnBiD accounts for this by allocating more divisions in real space, thereby achieving higher spatial resolution, whereas FiEstAS gives equal weight to both spaces and thus compromises the spatial resolution. When kernel smoothing is employed along with the metric determined by the EnBiD tessellation (EnBiD+Kernel Smooth), there is a further gain in resolution of about 3 and 4 decades for smoothing neighbors n = 40 and n = 10, respectively. Lowering the number of smoothing neighbors results in higher resolution.
Next we compare the volume distribution function $v(f)$ as reproduced by the code. Numerically, $v(f)$ is evaluated by binning the particles, as before, in logarithmically spaced bins of f. If $m_{\rm bin}$ is the mass of all the particles in the i-th bin, and the density of the bin is $f_{\rm bin} = (f_{i+1} + f_i)/2$, then $v(f_{\rm bin}) = (m_{\rm bin}/f_{\rm bin})/(f_{i+1} - f_i)$. The statistical error in each bin is given by $\Delta f = f_{\rm bin} - \langle f_{\rm bin} \rangle$, where $\langle f_{\rm bin} \rangle$ is the mean density of all the particles in the bin. Analytically, since f is a monotonic function of E, the volume distribution function is given by
$$v(f) = g(E)\, \left| \frac{dE}{df} \right|,$$
where $g(E)$ is the density of states; for a Hernquist sphere both $f(E)$ and $g(E)$ are known analytically (Hernquist 1990). It can be seen from the middle panel of Fig. 1 that $v(f)$ is well reproduced by both FiEstAS and EnBiD. However, in the high density region FiEstAS underestimates $v(f)$, which results in a steepening of the volume distribution function at the high-f end, while EnBiD estimates $v(f)$ accurately up to much higher densities.
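The numerical construction of $v(f)$ described above amounts to a weighted histogram in $\log f$; a minimal sketch (unit-mass particles by default, empty bins dropped):

```python
import numpy as np

def volume_distribution(f, mass=None, n_bins=80):
    """v(f) from particle phase-space densities f: bin particles in
    log f; v(f_bin) = (m_bin / f_bin) / (f_{i+1} - f_i) is the
    phase-space volume occupied per unit density interval."""
    f = np.asarray(f, dtype=float)
    mass = np.ones_like(f) if mass is None else np.asarray(mass, float)
    edges = np.logspace(np.log10(f.min()), np.log10(f.max()), n_bins + 1)
    idx = np.clip(np.digitize(f, edges) - 1, 0, n_bins - 1)
    m_bin = np.bincount(idx, weights=mass, minlength=n_bins)
    f_bin = 0.5 * (edges[1:] + edges[:-1])   # bin-centre density
    df = np.diff(edges)                      # bin widths f_{i+1} - f_i
    keep = m_bin > 0
    return f_bin[keep], (m_bin[keep] / f_bin[keep]) / df[keep]
```

By construction the estimator conserves mass: summing $v(f_{\rm bin})\, f_{\rm bin}\, \Delta f$ over the bins recovers the total particle mass.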
This can be seen more clearly in the lower panel of Fig. 1, where we plot the logarithmic slope α of the volume distribution function as a function of density f.
FiEstAS can reproduce the slope parameter α only up to $f/f_{\rm vir} = 10^2$, whereas EnBiD can reproduce it up to $f/f_{\rm vir} = 10^4$, and EnBiD+Kernel Smooth up to $f/f_{\rm vir} = 10^5$ and $f/f_{\rm vir} = 10^6$ for smoothing neighbors n = 40 and 10, respectively.
In order to estimate the dispersion in the recovered values of f, and to check the effectiveness of smoothing, we plot in Fig. 2 the probability distribution of $f/f_t$. The distribution can be fitted with a log-normal distribution, and the fit parameters are also shown in the figure; the bias is less than 0.03 dex. The EnBiD tessellation in the results analyzed above was done with Cubic Cells in real and velocity space. In the top right panel of Fig. 3 we compare the results obtained with the General decomposition, where each dimension is treated independently; kernel smoothing with n = 10 neighbors was employed for both. The estimates are nearly identical: there is a slight gain in resolution, but the estimates with the General decomposition were also found to have a slightly higher dispersion. In the bottom right panel we compare the result of smoothing with a product kernel and with a spherical kernel. There is very little difference between the estimates. The numbers of neighbors were chosen so as to give identical dispersions in the two estimates; when using the kernel in product form, about double the number of neighbors is needed to obtain the same dispersion.
In the top left panel we compare the un-smoothed densities with FiEstAS smoothed densities; for both, the EnBiD scheme is used for the tessellation. The un-smoothed estimates are the densities determined from the volumes of the leaf nodes generated by the tessellation procedure. The FiEstAS smoothing only reduces the dispersion; the resolution remains nearly unaltered. The resolution and accuracy are essentially determined by the density of the leaf nodes. Next we compare the FiEstAS smoothing with the cloud-in-cell (CIC) scheme of density estimation (Hockney & Eastwood 1981). The CIC method is a special case of smoothing with a product kernel with a linear kernel function $W(u) \propto (1 - u)$. Although FiEstAS smoothing is similar to the cloud-in-cell scheme, it is still unique in its own respect: the main difference is that the clouds, which are the leaf nodes in the case of FiEstAS smoothing, are disjoint, whereas in the cloud-in-cell scheme, and in general for kernel based schemes, they overlap. Kernel schemes can therefore smooth over much smaller regions and hence achieve higher resolution than FiEstAS smoothing. In the bottom left panel we plot the estimates of FiEstAS smoothing alongside the estimates obtained with a product kernel with n = 18 smoothing neighbors; instead of a linear kernel function we use the Epanechnikov kernel. It can be seen from the figure that the resolution achieved with the product kernel is higher than that of FiEstAS smoothing. When the decomposition was done alternately in each dimension, the median criterion gave more accurate results. However, for the EnBiD decomposition, choosing the splitting point at either the mean or the median gave similar results for the density estimation of a Hernquist sphere, but for a system having substructure the mean criterion gave better results.
For all our analysis, unless otherwise mentioned, we evaluate phase space densities using the EnBiD decomposition with Cubic Cells to determine the metric, and then use spherical kernel smoothing to calculate the densities. The mean criterion is used for choosing the splitting point. The number of smoothing neighbors n is chosen to be 40; choosing n = 10 gives higher resolution but also higher dispersion, which means that the volume distribution function will be smoothed out below the scale set by the dispersion (see §3.2 for more explanation).
In Table 1 we compare the CPU time needed to estimate the phase space density of $10^6$ particles in a Hernquist sphere using various methods and techniques. The time reported by Ascasibar & Binney (2005) for FiEstAS is labeled AB FiEstAS, and the time reported by Arad et al. (2004) for the Delaunay tessellation method AD Delaunay. It can be seen that most of the time is needed for the smoothing. For both FiEstAS and kernel smoothing, increasing the smoothing mass or the number of smoothing neighbors increases the time. Our implementation of FiEstAS smoothing is slightly faster than that of Ascasibar & Binney (2005) due to better cache utilization, achieved by ordering the particles just as they are arranged in the binary tree. The kernel smoothing, which gives more accurate results, requires a modest 20% more time than the time reported by Ascasibar & Binney (2005) for FiEstAS. For median splitting it is possible to speed up the neighbor search by about 10%.

Maxwellian Velocity Distribution Models
For these models the phase space density is given by
$$f(r, v) = \frac{\rho(r)}{[2\pi \sigma^2(r)]^{3/2}}\, \exp\!\left( -\frac{v^2}{2\sigma^2(r)} \right),$$
where $\rho(r)$ is the real space density,
$$\rho(r) = \frac{e^{-r/(5 r_s)}}{(r/r_s)^{\alpha}\, (1 + r/r_s)^{3 - \alpha}}.$$
The velocity dispersion is assumed to be either constant, $\sigma_c(r) = 0.1$, or variable, with $\sigma_v^2(r) = M(r)/r$. We generate models with α = 0 and α = 1.
The volume distribution function $v(f)$ for such systems can be computed analytically (Arad et al. 2004). In Fig. 4 we show the volume distribution function as recovered by EnBiD with kernel smoothing for three different models, 1) α = 0, $\sigma_c$, 2) α = 1, $\sigma_c$, and 3) α = 1, $\sigma_v$, at three different particle resolutions, $N = 10^4$, $N = 10^5$ and $N = 10^6$. For the highest resolution the volume distribution can be recovered over about 9 to 13 decades in f. The range of densities over which $v(f)$ is reliably recovered increases with increasing particle number. For systems with a sharp transition in the slope of $v(f)$, for example the α = 0, $\sigma_c$ system, Delaunay tessellation was found to significantly over-estimate $v(f)$ (Fig. A2 of Arad et al. 2004), because the measured $v(f)$ can be thought of as a convolution of the exact $v_t(f)$ with a fixed window function $p(f/f_t)$. The narrower $p(f/f_t)$, the closer $v(f)$ is to $v_t(f)$. If $v_t(f)$ varies significantly over scales smaller than the width of $p(f/f_t)$, the shape of the recovered $v(f)$ will be affected, and $v(f)$ will be over-estimated for a system with a sharp change in slope. Moreover, due to the width of $p(f/f_t)$, the effective cut-off value of f is also higher than the theoretically expected upper bound; a bias in $p(f/f_t)$ will also affect the results. Delaunay tessellation estimates have a width of about one decade in the distribution of $p(f/f_t)$. With EnBiD (using n = 40 smoothing neighbors), for the α = 0, $\sigma_c$ system there is very little width in the recovered values of f at the high-f end; this is why $v(f)$ is recovered better by EnBiD than by Delaunay tessellation (Fig. 4). For the other systems the range of f over which $v(f)$ is recovered is slightly larger for EnBiD (using n = 10 smoothing neighbors) than for Delaunay tessellation (Figs. A2 and A3 of Arad et al. 2004).

PHASE SPACE STRUCTURE OF DARK MATTER HALOS
We now apply our tools to the phase space structure of virialized dark matter halos in a concordance ΛCDM universe (Spergel et al. 2003; Melchiorri, Bode, Bahcall, & Silk 2003). The structure of these halos in real space has been studied in great detail over the past decade, and the radial density profile is known to follow an almost universal form known as the NFW profile (Navarro, Frenk, & White 1996; see however Navarro et al. 2004 for a new α profile),
$$\rho(r) = \frac{\rho_s}{(r/r_s)\,(1 + r/r_s)^2}.$$
The dark matter particles are collisionless and obey the collisionless Boltzmann equation. For a collisionless spherical system in equilibrium with a given density profile ρ(r), the phase space density f(r, v) can be calculated using the Eddington equation (Binney & Tremaine 1987).
Since f is a function of six variables, it is hard to study except in cases where there are isolating integrals of motion that reduce the number of independent variables. To study the structure of the phase space density, the volume distribution function $v(f)$ is introduced: $v(f)\,df$ is the volume of phase space occupied by phase space elements having density between f and $f + df$. Arad et al. (2004) calculated the phase space density using Delaunay tessellation in 6 dimensions and studied the volume distribution function of halos obtained from simulations. They found that $v(f)$ follows an almost universal form, a power law with slope $-2.5 \pm 0.05$, valid over about four decades from $f_{\rm vir}$ to $10^4 f_{\rm vir}$, where $f_{\rm vir}$ is an estimate of the phase space density in the outer parts of the halo.
This behavior was also found to be independent of redshift and of the mass of the halo. Ascasibar & Binney (2005) used the FiEstAS algorithm to calculate the phase space densities, confirmed the above result, and in addition found slight deviations at both the low and the high f end. At the low f end (near $f_{\rm vir}$) the slope was found to be flatter than −2.5, and at the high f end it was found to be significantly steeper. At the high f end there are two relevant numerical phase space densities above which two-body relaxation and discreteness effects in simulations start dominating. The first, $f_{\rm relax}$, is the phase space density above which the two-body relaxation time is shorter than the age of the universe; its value is obtained by assuming a Coulomb logarithm of ln Λ = 6 and using $t_0 = 14.5$ Gyr as the age of the Universe. The second, $f_{\rm discr}$, is the phase space density above which the discreteness effects discussed by Binney (2004) become important. Since the steepening was found to roughly coincide with these densities, Ascasibar & Binney (2005) attributed this effect to numerical artifacts of the simulations.
We analyze 5 halos at z = 0 simulated in a ΛCDM cosmology with $\Omega_\Lambda = 0.7$, $\Omega_m = 0.3$. Halos A, B, and C were isolated from a cosmological simulation of $128^3$ dark matter particles in a $32.5 h^{-1}$ Mpc cube performed with the AP$^3$M code (Couchman 1991) and were then re-simulated at higher resolution from z = 50 to z = 0 using the code GADGET (Springel, Yoshida, & White 2001). Halo A' is a warm dark matter (WDM) realization of halo A, generated by suppressing power on scales smaller than the size of the halo. Halo D is from a simulation done with the ART code (Kravtsov et al. 1997) with a box size of $80 h^{-1}$ Mpc. Further details are given in Table 2. For calculating phase space densities we use the EnBiD tessellation scheme, with smoothing done with a spherical kernel employing n = 40 neighbors.
It can be seen from Fig. 5 that at the high $f$ end there are differences between the phase space properties of halos as reproduced by EnBiD (kernel smoothing using 40 neighbors) and FiEstAS (FiEstAS smoothing using smoothing mass $M_s = 2m_p$). We argue that the steepening of the volume distribution function found by Ascasibar & Binney (2005) is probably an artifact of the FiEstAS algorithm, since such a steepening also appears in tests done with a pure Hernquist sphere ($\S$3.1). With EnBiD we do not see such a steepening; on the contrary, we see a slight hump. This does not, however, preclude the association of discreteness and relaxation effects with the phase space structure of halos: since we do not know the real phase space density of the halo, it is difficult to disentangle any such effect from the effect of the estimator. For a WDM halo, whose profile we expect to be the same as that of a Hernquist sphere, we do see a sudden change in slope at around $f_{relax}$ (Fig. 10). Also, the slope parameter of $\Lambda$CDM halos has a maximum at around $f_{discr}$, beyond which it starts to fall off (Fig. 8). At the low $f$ end the flatness of the $v(f)$ profile is partly due to the truncation of the halo at a finite radius $r = R_{Vir}$. This is demonstrated in Fig. 6, where for a synthetic Hernquist sphere with $R_{cut} = R_{Vir}$ the $v(f)$ profile is found to flatten out beyond $f = f_{Vir}$ and $\alpha(f)$ rises sharply. The cosmological halos exhibit a flattening that is more pronounced than in the synthetic halos, which suggests that their structure differs slightly from that of an equilibrium spherical model with the same density profile. Models with anisotropy in the velocity dispersion also do not produce any extra flattening of the $v(f)$ profile. One possibility suggested earlier (Ascasibar & Binney 2005) is that this could be due to the depletion of low density phase space by high density sub halos co-occupying the same space.
This can be ruled out: the low $f$ behavior of a WDM halo, which does not exhibit significant substructure, is identical to that of a $\Lambda$CDM halo.
Next we analyze the phase space structure of halos simulated in a $\Lambda$CDM cosmology. We see a slight departure from the constant power law behavior at the high $f$ end (Fig. 7). The slope parameter $\alpha$ (Fig. 8) has a minimum at around $f/f_{Vir} = 10$ and then rises, reaching a peak at around $f/f_{Vir} = 10^4$. Beyond this it starts to fall off.
In order to check whether the power law type behavior of the volume distribution function is due to the substructure or whether it is associated with the virialization process, we simulated a WDM halo in which power on small scales has been suppressed, and we find that it has a steeper slope at the high $f$ end (Fig. 9). Its slope parameter $\alpha$ as a function of $f$ is roughly consistent with that of a Hernquist sphere (Fig. 10). This suggests that the shape of the volume distribution function is governed by the amount of substructure and its mass function. Assuming the halo is made up entirely of sub halos with mass function $dn/dm \propto m^{-\gamma}$, the volume distribution function can be written as
$v(f) = \int_0^{\mu M} \frac{dn}{dm}\, v_m(f)\, dm$, (19)
where $\mu M$ is the mass of the largest sub halo and $v_m(f)$ is the volume distribution function of a sub halo of mass $m$. However, for $\gamma = 1.9$, as derived by De Lucia et al. (2004), this model predicts $v(f) \propto f^{-2.1}$, rather than the $v(f) \propto f^{-2.5}$ found in Arad et al. (2004). Ascasibar & Binney (2005) modified this model (model AB) by pointing out that the lower limit of the integral in Eq. (19) cannot be zero, since the resolution of the simulation imposes a limit on the minimum mass that a sub halo can have: for a halo sampled with a finite number of particles, each of mass $m_p$, the minimum sub halo mass is $m_{min} \sim 100 m_p$. The analysis in Ascasibar & Binney (2005) assumes the sub halos to be Hernquist spheres and approximates their distribution function by a double power law with normalization $k = 3.25 \times 10^{18} M_\odot^2\, \mathrm{Mpc}^{-3}\, (\mathrm{km\,s}^{-1})^{-3}$. The volume distribution function can then be written in the form given by Eq. (21). In Fig. 11 we plot the slope parameter $\alpha$ as a function of $f$ as predicted by the AD and AB toy models (Eq. (21)). It can be seen that in the limit $m_{min} \to 0$ and $m_{max} \to \infty$ the AB model approaches the AD model. Either model fails to reproduce the behavior seen in simulations.
In both models it was assumed that the entire halo is made up of a superposition of sub halos with a mass function $dn/dm \propto m^{-\gamma}$. However, in the analysis of De Lucia et al. (2004), where this mass function was determined, the background parent halo, which accounts for about 90% of the total mass, was excluded from the calculation: the parent halo is not part of the substructure population. We take this fact into account and develop a model in which the contribution of the parent halo is treated separately. The halo consists of 1) the parent halo with mass $(1 - f_{sub})M$, modeled as a Hernquist sphere, and 2) the substructure with total mass $f_{sub} M$, modeled as a superposition of Hernquist spheres with a mass function $dn/dm \propto m^{-\gamma}$. To calculate the scale radius $a$ of a sub halo of mass $m$ we use the virial scaling relation $M_{Vir} \propto R_{Vir}^3$, which gives $m \propto a^3$ (assuming the concentration parameter to be the same for all sub halos). In Fig. 12 we plot the volume distribution function as predicted by this model for $f_{sub} = 0.1$, $m_{min} = 10^{-4} M$. To calculate $v(f)$ we employ a semi-analytic technique: we generate a sub halo population corresponding to the given mass function and mass fraction $f_{sub}$, and then for a given value of $f$ we sum the volume contribution of each sub halo along with that of the parent halo to give the total $v(f)$.

Figure 14. Effect of changing the number of smoothing neighbors on the slope parameter $\alpha$ for a WDM halo. Results are shown for kernel smoothing with $n = 40$ and $n = 10$ neighbors. The slope parameter $\alpha$ for a Hernquist sphere and for a model with $m_{min} = 10^{-5}$ and $f_{sub} = 0.002$ is plotted alongside.

Figure 15. The $x$ vs $y$ and $V_x$ vs $V_y$ scatter plots of particles having phase space density above $10^3 f_{Vir}$ for a warm dark matter halo. In the top panels the density is evaluated using kernel smoothing with 40 neighbors; in the lower panels, with 10 neighbors.
The $v(f)$ of each sub halo is determined using Eq. (9) and Eq. (10). The total $v(f)$ as predicted by this model is close to the expected $v(f) \propto f^{-2.5}$ behavior, but with a slight hump in the high $f$ part, similar to what we saw for $\Lambda$CDM halos (Fig. 7). In the high $f$ part $v(f)$ is dominated by the substructure component, the transition occurring at around $f/f_{Vir} = 10^2$. In Fig. 13 we plot the slope parameter $\alpha$ as predicted by the model for various values of $f_{sub}$ and $m_{min}$.
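The population step of this semi-analytic technique can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the unit convention (masses in units of the total halo mass $M$), and the default upper mass cut $m_{max} = 10^{-2} M$ are our own assumptions.

```python
import numpy as np

def sample_subhalo_masses(f_sub, m_min, gamma=1.9, m_max=1e-2, seed=0):
    """Draw sub halo masses from dn/dm ~ m**(-gamma) on [m_min, m_max]
    (all masses in units of the total halo mass M) until the cumulative
    substructure mass fraction reaches f_sub.  Inverse-transform sampling."""
    rng = np.random.default_rng(seed)
    a = 1.0 - gamma                      # exponent of the cumulative mass CDF
    masses = []
    while sum(masses) < f_sub:
        u = rng.random()
        m = (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)
        masses.append(m)
    return np.array(masses)

# Population used for a model like Fig. 12: f_sub = 0.1, m_min = 1e-4 M
sub = sample_subhalo_masses(f_sub=0.1, m_min=1e-4)
print(len(sub), sub.sum())  # many small halos summing to ~0.1 M
```

Given such a population, the total $v(f)$ follows by summing, at each $f$, the per-halo Hernquist contributions (Eqs. (9) and (10), not reproduced here) together with that of the parent halo of mass $(1 - f_{sub})M$.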
In Fig. 8 the $\alpha(f)$ of a model with parameters $f_{sub} = 0.05$, $m_{min} = 10^{-5} M$ is compared against $\alpha(f)$ for the simulated halos. The model (analytical profile) is successful in qualitatively explaining the behavior of the simulated halos (namely the dip and the peak), but there is still some difference at the low $f$ end: near $f_{Vir}$, the parameter $\alpha$ rises much more sharply than in the model, even after taking the truncation effect into account. In Fig. 14 we show the effect of varying the number of smoothing neighbors on the $\alpha$ profile of a WDM halo. Lowering the number of smoothing neighbors to 10 makes the slope parameter rise to a peak at the high $f$ end. Since with $n = 10$ the dispersion in the density estimates is high, this also results in a slight flattening of the $\alpha$ profile around $f = f_{Vir}$, where $\alpha$ otherwise rises steeply. Plotted alongside is the $\alpha$ profile of the best fit parent + sub halo model; the $\alpha$ profile of the WDM halo is consistent with a model having substructure mass fraction $f_{sub} = 0.002$. In Fig. 15 we plot the particles having $f/f_{Vir} > 10^3$ in both real and velocity space. In the top panels the density was estimated using $n = 40$ neighbors, in the lower panels using $n = 10$ neighbors. The figure shows that the warm dark matter halo is not completely free of substructure, and that more substructure is resolved with a smaller number of smoothing neighbors. The fact that even such a small amount of substructure can be detected demonstrates the ability of the estimator to resolve high density regions. It also suggests that the slope parameter $\alpha$ plotted as a function of $f$ can be used as a sensitive tool to estimate the amount of substructure and the mass function of sub halos.
To further check the efficiency of the code in reproducing the phase space density of a system with substructure, we generated a mock system with $f_{sub} = 0.1$ and $m_{min} = 10^{-4}$ and calculated its phase space density using EnBiD. The results are shown in Fig. 16. The sub halos were distributed uniformly inside the virial radius of the parent halo, and their center of mass velocities were chosen to have a uniform random distribution within a sphere of radius $V_{Vir}$ in velocity space. For a system modeled with $10^6$ particles, the phase space structure up to $f = 10^4 f_{Vir}$ is successfully reproduced using kernel smoothing with 10 neighbors. If 40 smoothing neighbors are used, the high density regions are poorly resolved. Lowering the total number of particles in the system also leads to poor resolution at the high $f$ end.

DISCUSSION & CONCLUSIONS
We have presented a method for estimation of densities in a multi-dimensional space based on binary space partitioning trees (Ascasibar & Binney 2005). We implement a node splitting criterion that uses Shannon Entropy as a measure of information available in a particular dimension. The new algorithm makes the scheme metric free and recovers maximum information available from the data with a minimum loss of resolution. In our tests on systems whose density distribution is known analytically, we find significant improvement in estimated densities as compared to earlier algorithms.
We suggest how kernel-based schemes (SPH), or in general any metric based scheme, can be implemented within the framework of the new algorithm: EnBiD is first used to determine the metric at any given point, chosen such that locally the covariance of the data points has a similar value along all dimensions. This metric is then incorporated into the kernel-based scheme used for density estimation. We also show that SPH schemes suffer from a bias in their density estimates, and we suggest a prescription that successfully corrects this bias.

Figure 16. Dependence of the slope parameter $\alpha$ on $f$ as recovered by EnBiD from a parent + substructure model. The fraction of mass in the form of substructure is $f_{sub} = 0.1$, and the minimum substructure mass is $m_{min} = 10^{-4} M$, $M$ being the total mass of the system. The theoretically expected slope parameter for this model and for a Hernquist sphere without any substructure is plotted alongside. For a system sampled with $10^6$ particles, the parameter $\alpha$ can be accurately predicted up to $f/f_{Vir} = 10^4$ using kernel smoothing with 10 neighbors.
As an immediate application, we employ this method to analyze the phase space structure of dark matter halos obtained from N-body simulations with a $\Lambda$CDM cosmology. We find evidence for slight deviations from the near power law behavior of the volume distribution function $v(f)$ of halos in such simulations: at the high $f$ end there is a slight hump, and at the low $f$ end there is significant flattening. We also analyzed a WDM halo and found that its slope parameter profile $\alpha(f)$ at the high $f$ end is consistent with that of an equilibrium Hernquist sphere having a very small amount of mass (0.2%) in the form of substructure.
In $\Lambda$CDM halos the contribution to the volume distribution function at the high $f$ end is dominated by the presence of a significant amount of substructure. We devise a toy model in which the parent halo is modeled as a Hernquist sphere and the substructure as a superposition of Hernquist spheres with a fixed mass fraction $f_{sub}$ and a mass function $dn/dm \propto m^{-1.9}$. We demonstrate that this model reproduces the behavior of $v(f)$ as seen in simulations.
The behavior of $v(f)$ and $\alpha(f)$ depends upon the parameters $f_{sub}$, the mass function $dn/dm$ of the sub halos, and $m_{min}$, the minimum sub halo mass. Since the mass function of sub halos and their mass fraction $f_{sub}$ depend upon the power spectrum of the initial conditions and on the adopted cosmology, the phase space structure of halos may carry an imprint of cosmology and initial conditions that is visible in the profile $\alpha(f)$.
Although the simple toy model that we propose here can explain the basic properties of the volume distribution function, there is still some difference at the low $f$ end: the flattening there is more pronounced in the simulated halos than in the model halos, even after taking the truncation effect into account. Several further improvements to the model are possible. The toy model assumes that all sub halos obey the same virial scaling relation, whereas in simulations there should be a slight dependence on the formation time of the sub halo. Moreover, sub halos may be tidally truncated and stripped, so their density profiles may differ from that of a pure Hernquist sphere (Hayashi et al. 2003; Kazantzidis et al. 2004). Furthermore, the properties of sub halos may depend on radius. A more detailed model that takes these effects into account might explain the phase space properties more accurately.
The issue of universality in the behavior of the volume distribution function deserves further investigation. Of the four $\Lambda$CDM halos that we analyzed, one had a nearly flat $\alpha(f)$ profile while the others showed a characteristic dip at $f \sim 10 f_{Vir}$ and a corresponding rise peaking at around $f \sim 10^3 f_{Vir}$. Larger samples of halos need to be investigated to put these results on a sound statistical basis; the differences seen in the properties of halos might be due to varying degrees of virialization. A second concern is the role of numerical resolution in the behavior of the volume distribution function. In the model, the shape of the $\alpha(f)$ profile depends upon the minimum mass $m_{min}$ used to model the sub halo population. According to the model, $\alpha(f)$ has a minimum at around $f/f_{Vir} \sim 10$ and then rises to a peak at around $f/f_{Vir} \sim 10^3$, whose maximum value is determined by the logarithmic slope of the mass function and is given by $-(4 - \gamma)$. Beyond this point, increasing the resolution should make $\alpha(f)$ reach a plateau and then fall off once it reaches the resolution limit of the simulation, which occurs approximately at $f_{relax}/f_{Vir} \sim 10^{-2} M_{Vir}/m_p$. This suggests that a proper convergence study needs to be done to establish the universality of the phase space behavior of halos. If at higher resolution a behavior different from the toy model suggested here were found, it would imply that physical processes are at work which significantly alter the properties of low mass sub halos and drive the system towards a universal behavior, e.g. one with a constant slope.
Our analysis here shows that the phase space properties of the halos are roughly consistent with equilibrium spherical models with a given real-space density profile. A question of fundamental importance is the origin of the universal behavior of these density profiles as seen in simulations. A clue might be found by studying how the system approaches equilibrium. The evolution of the distribution function of collisionless particles is governed by the collisionless Boltzmann equation. Since the coarse-grained distribution function of collisionless particles can be measured directly with EnBiD, this offers interesting opportunities to study the processes of phase mixing and violent relaxation, which help the system reach equilibrium. It might be interesting in this context to study the time evolution of the volume distribution function of halos.
Another interesting application of this method is to study the distribution functions of equilibrium systems, e.g. a disk that hierarchically grows inside a halo. The measured distribution functions of such systems can in turn be used to construct equilibrium models.
Finally, we point out a potential improvement to the code. If the density distribution in one dimension is linearly independent of the other dimensions, this offers an opportunity to further improve the density estimates by measuring the density distributions in different dimensions separately. The concept of mutual information offers one way to quantify this dependence or independence. An algorithm could be developed that exploits this feature and improves the density estimates in situations where the data offers such an opportunity.
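A minimal sketch (our own, not part of the EnBiD code) of how mutual information could flag such separable dimensions, using a simple histogram estimator:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram estimate of the mutual information I(X;Y) in nats.
    I vanishes (up to estimator bias) when X and Y are independent."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                          # joint probability p(x, y)
    px = pxy.sum(axis=1, keepdims=True)       # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)       # marginal p(y)
    nz = pxy > 0                              # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(3)
a, b = rng.normal(size=20_000), rng.normal(size=20_000)
print(mutual_information(a, b))            # ~ 0: dimensions independent
print(mutual_information(a, a + 0.1 * b))  # large: strongly dependent
```

Dimensions whose pairwise mutual information is consistent with zero could then have their one-dimensional densities estimated separately and multiplied together.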

APPENDIX A: KERNEL DENSITY ESTIMATE
For the so called kernel density estimate (KDE) a kernel $W$ is defined such that
$\int W(\mathbf{x}; h)\, d^d x = 1$.
The density estimate of a discrete set of $N$ particles, each of mass $m_i$, at a point $\mathbf{x}$ is given by
$\hat{\rho}(\mathbf{x}) = \sum_{i=1}^{N} m_i\, W(\mathbf{x} - \mathbf{x}_i; h)$,
while the probability density $\hat{f}(\mathbf{x})$ is given by
$\hat{f}(\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} W(\mathbf{x} - \mathbf{x}_i; h)$.
The smoothing parameter $h$ is chosen such that the kernel encloses a fixed number of neighbors $N_{smooth}$. Assuming spherical symmetry, the kernel can be written in terms of a radial co-ordinate $u$ only. Popular choices are the Gaussian function and the B-splines (Monaghan & Lattanzio 1985), the latter being preferred due to its compact support. A $d$ dimensional spherically symmetric kernel can be written as
$W(\mathbf{x}; h) = \frac{f_d}{h^d}\, K(u), \quad u = |\mathbf{x}|/h$,
and the normalization $f_d$ is given by
$f_d^{-1} = S_d \int_0^1 K(u)\, u^{d-1}\, du$,
$S_d$ being the surface area of a unit hyper-sphere in $d$ dimensions and $V_d$ its volume.
Some popular kernels are given below, with their normalization constants $f_d$ listed in Table A1. Kernels can also be used in product form,
$W(\mathbf{x}) = \prod_{i=1}^{d} \frac{f_1}{h_i} K(u_i), \quad u_i = x_i/h_i$,
where $f_1$ is the corresponding one dimensional normalization factor as given by Eq. (A6).
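As a consistency check on the normalization condition above, $f_d$ for the cubic B-spline (M4) kernel can be computed numerically. This sketch assumes the support convention $u = r/h \in [0, 1)$; since Table A1 is not reproduced in this excerpt, the analytic reference values ($f_1 = 4/3$, $f_3 = 8/\pi$) are our own derivation for that convention.

```python
import numpy as np
from math import pi, gamma

def m4_spline(u):
    """Cubic B-spline (M4) kernel with compact support u = r/h in [0, 1)."""
    u = np.asarray(u, dtype=float)
    inner = 1 - 6 * u**2 + 6 * u**3           # branch for u < 1/2
    outer = 2 * np.clip(1 - u, 0, None) ** 3  # branch for 1/2 <= u < 1
    return np.where(u < 0.5, inner, outer) * (u < 1.0)

def norm_fd(kernel, d, n=200001):
    """f_d from the condition S_d * f_d * integral_0^1 K(u) u^(d-1) du = 1."""
    sd = 2 * pi ** (d / 2) / gamma(d / 2)     # surface area of unit d-sphere
    u = np.linspace(0.0, 1.0, n)
    y = kernel(u) * u ** (d - 1)
    integral = ((y[1:] + y[:-1]) / 2 * np.diff(u)).sum()  # trapezoid rule
    return 1.0 / (sd * integral)

print(norm_fd(m4_spline, 1))  # analytic value: 4/3
print(norm_fd(m4_spline, 3))  # analytic value: 8/pi ~ 2.546
```

The same routine applies to any compact-support radial kernel $K(u)$ in any dimension.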

A1 Optimum Choice of Smoothing Neighbors
If $\hat{f}(\mathbf{x})$ is the estimated probability density of a field $f(\mathbf{x})$, its mean square error (MSE) can be written in terms of its bias $\beta(\mathbf{x})$ and variance $\sigma^2(\mathbf{x})$. The bias of an estimate is given by
$\beta(\mathbf{x}) = E[\hat{f}(\mathbf{x})] - f(\mathbf{x})$,
while its variance is
$\sigma^2(\mathbf{x}) = E\left[(\hat{f}(\mathbf{x}) - E[\hat{f}(\mathbf{x})])^2\right]$.
Hence the mean square error is given by
$\mathrm{MSE}(\mathbf{x}) = E\left[(\hat{f}(\mathbf{x}) - f(\mathbf{x}))^2\right] = \beta^2(\mathbf{x}) + \sigma^2(\mathbf{x})$.
To get accurate estimates, both bias and variance should be small. The bias and variance of the estimator can be calculated using Eq. (A3) and expanding $f(\mathbf{x}')$ as a Taylor series about $\mathbf{x}$. For a $d$ dimensional multivariate kernel density estimate, the leading-order bias and variance are
$\beta(\mathbf{x}) \approx \frac{\mu_2(K)}{2} \sum_i h_i^2 \left[H_f(\mathbf{x})\right]_{ii}, \qquad \sigma^2(\mathbf{x}) \approx \frac{f(\mathbf{x})\, R(W)}{N\, h_1 \cdots h_d}$,
where $H_f(\mathbf{x}) = \partial^2 f / \partial x_i \partial x_j$ is the Hessian matrix of the function $f(\mathbf{x})$, $\mu_2(K)$ is the second moment of the kernel, and $R(W) = \int W^2\, d^d x$.
Lowering $h$, or equivalently lowering $N_{smooth}$, lowers $\beta(\mathbf{x})$ but increases $\sigma(\mathbf{x})$. Ideally the optimum choice of $N_{smooth}$ is obtained by minimizing the MSE. The bias $\beta$, which depends on the second derivatives of the field, is small for slowly varying fields and can then be ignored. Since $\sigma(\mathbf{x}) \propto 1/\sqrt{N_{smooth}}$, the variance increases as $N_{smooth}$ is decreased. The minimum value of $N_{smooth}$ needed to attain a given value of $\sigma(\mathbf{x})$ is then the optimum number of neighbors; we set this threshold at $\sigma = 0.25 f(\mathbf{x})$. In Fig. A1, $\sigma$ is plotted as a function of the number of dimensions $d$ for $N_{smooth} = 100$ (assuming $f(\mathbf{x}) = 1$). The variance obtained by applying kernel smoothing to Poisson sampled data with $N_{smooth} = 100$ is shown alongside, and the two are in agreement. The variance $\sigma$ does not increase exponentially with the number of dimensions; hence the optimum number of neighbors also does not have to grow exponentially with dimension. This means that even in higher dimensions kernel smoothing can be done efficiently with a small number of neighbors; the efficiency of the nearest neighbor search algorithm is then the main factor determining the time required for kernel density estimation. It can also be seen from Fig. A1 that, for a fixed number of neighbors, the spline kernel gives the highest variance while the Epanechnikov kernel gives the lowest. Eq. (A24) can be used to calculate the number of smoothing neighbors $N_{smooth}$ required to achieve a given $\sigma$, for any kernel in any dimension. For density estimation with an Epanechnikov kernel in 6 dimensions, $N_{smooth} = 32$ gives a variance of $\sigma = 0.22$, which is equivalent to a scatter of 0.1 dex.
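The scaling $\sigma \propto 1/\sqrt{N_{smooth}}$ can be checked with a small Monte Carlo experiment. The sketch below is our own (it is not the EnBiD code): it uses a brute-force neighbor search in a unit periodic box and an adaptive Epanechnikov kernel, and compares the scatter of the density estimates for two values of $N_{smooth}$.

```python
import numpy as np
from math import pi, gamma

def epanechnikov_kde(data, k):
    """Adaptive Epanechnikov KDE evaluated at each point of `data`,
    assumed to live in a unit periodic box.  The bandwidth h at each
    point is the distance to its k-th nearest neighbor."""
    n, d = data.shape
    diff = data[:, None, :] - data[None, :, :]
    diff -= np.round(diff)                  # minimum-image (periodic box)
    r = np.sqrt((diff ** 2).sum(axis=-1))   # all pairwise distances
    h = np.sort(r, axis=1)[:, k]            # index 0 is the point itself
    vd = pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the unit d-ball
    fd = (d + 2) / (2 * vd)                 # Epanechnikov normalization
    u = r / h[:, None]
    w = np.clip(1.0 - u ** 2, 0.0, None)    # K(u) = 1 - u^2 for u < 1
    return fd * w.sum(axis=1) / (n * h ** d)

rng = np.random.default_rng(0)
pts = rng.random((1500, 3))   # Poisson sample, true density f = 1
scatter = {k: np.std(np.log10(epanechnikov_kde(pts, k))) for k in (25, 100)}
print(scatter)  # dispersion shrinks roughly as 1/sqrt(N_smooth)
```

The dispersion of $\log_{10} \hat{f}$ for $N_{smooth} = 100$ comes out roughly half that for $N_{smooth} = 25$, as the $1/\sqrt{N_{smooth}}$ scaling predicts.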

A2 Fraction of Boundary Particles
For a system of $N$ particles uniformly distributed in a spherical region of radius $r$ in a $d$ dimensional space, the fraction of particles $f_b$ that lie on the boundary increases sharply with the number of dimensions $d$. If $l$ is the mean inter-particle separation, then $l = (V_d r^d / N)^{1/d}$ and the fraction $f_b$ is given by
$f_b = \frac{S_d\, r^{d-1}/l^{d-1}}{V_d\, r^d/l^d} = \frac{d\, l}{r} = d \left(\frac{V_d}{N}\right)^{1/d}.$
For $N = 10^6$, the fraction $f_b$ is 0.05 for $d = 3$ and 0.79 for $d = 6$.
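The quoted fractions follow directly from the formula; a minimal check:

```python
from math import pi, gamma

def boundary_fraction(n, d):
    """f_b = d * (V_d / N)**(1/d): fraction of N uniform points lying
    within one mean inter-particle separation of the surface of a
    d-dimensional sphere.  V_d is the volume of the unit d-ball."""
    vd = pi ** (d / 2) / gamma(d / 2 + 1)
    return d * (vd / n) ** (1.0 / d)

print(round(boundary_fraction(1e6, 3), 2))  # 0.05
print(round(boundary_fraction(1e6, 6), 2))  # 0.79
```

This is the curse of dimensionality in miniature: in 6 dimensions nearly four fifths of a million uniformly distributed points are boundary points.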

A3 Anisotropic Kernels
For planar structures that are not parallel to one of the co-ordinate axes, one needs to adopt an anisotropic kernel to get accurate results. This is equivalent to a transformation, consisting of a rotation and a shear, which diagonalizes the covariance matrix and then normalizes the eigenvalues (Shapiro et al. 1996). Let $H$ be a diagonal matrix such that $H_{ii} = h_i$ and $\mathbf{x}' = H^{-1} \mathbf{x}$. If $C(\mathbf{x}')$ is the covariance matrix locally at point $\mathbf{x}'$, with $C = E D E^T$, where $E$ is the eigenvector matrix that diagonalizes $C$ and $D$ the corresponding diagonal eigenvalue matrix, then the kernel is evaluated in the transformed co-ordinate $\tilde{\mathbf{x}} = D^{-1/2} E^T \mathbf{x}'$. To keep the number of smoothing neighbors roughly constant we normalize the eigenvalue matrix, $D \to D/|D|^{1/d}$, which preserves the smoothing volume. To identify the neighbors that contribute to the density at $\mathbf{x}$ one now needs to select a spherical region with radius $h' = h\, \max(D^{1/2})$.
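A sketch (our own, not the paper's code) of the whitening transform described above: diagonalize the local covariance, normalize the eigenvalues to unit determinant, and build the rotation-plus-shear matrix.

```python
import numpy as np

def metric_transform(neigh):
    """Rotation + shear built from the local covariance C = E D E^T of
    the neighbour positions.  D is normalized to unit determinant,
    D -> D/|D|^(1/d), so the transform preserves the smoothing volume
    and the transformed covariance is the same along every dimension."""
    c = np.cov(neigh, rowvar=False)
    evals, evecs = np.linalg.eigh(c)              # C = E D E^T
    d = len(evals)
    evals = evals / np.prod(evals) ** (1.0 / d)   # |D| -> 1
    return evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric whitener

rng = np.random.default_rng(1)
pts = rng.normal(size=(4000, 2)) * np.array([3.0, 0.5])  # anisotropic cloud
t = metric_transform(pts)
c_new = np.cov(pts @ t, rowvar=False)
print(np.round(c_new, 3))  # equal diagonal entries, zero off-diagonal
```

Because the determinant of the transform is one, distances measured in the new co-ordinates enclose the same volume, so a fixed-neighbor search behaves as before while the kernel adapts to the local anisotropy.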

A4 Bias in Spline Kernels
Spline kernels have a bias in their estimated densities, i.e. they systematically overestimate the density. This bias is not present for regularly distributed data, such as a lattice or a glass-like configuration, where the inter-particle separation is constant. It occurs only for data with Poisson noise whose density is measured at the locations of the data points themselves: in some sense the bias is due to the evaluation of the density at the locations of Poisson peaks in the density distribution. The smaller the distance from the center, the greater the weight assigned by the kernel, so when the density is estimated at the location of a particle the kernel assigns a very high weight to that particle, since its distance is zero. Below we show a simple calculation that demonstrates the bias of a spline kernel compared to a top hat kernel, which is free from such bias.
Assume that the top hat kernel gives the correct density, $f_t = k/(V_d h^d)$. Taking one particle out of the smoothing region should then roughly give the density of Eq. (A31). It can be seen from Eq. (A31) that the bias decreases when the number of smoothing neighbors $k$ is increased. The bias can be removed by displacing the central particle from $r = 0$ to $r = h d/(1 + d)$, $h$ being the radius of the smoothing sphere and $d$ the dimensionality of the space; this corresponds to the mean radius $r$ of a homogeneous sphere in $d$ dimensions. This correction should only be applied if the distribution of the data is known to be irregular.

Figure A2. Kernel density estimates with and without bias correction for a system of $N = 10^5$ particles distributed uniformly in a 6 dimensional space with periodic boundaries. The probability distribution $P(\log(f/f_t))$ is plotted for a spline kernel with $n = 64$ smoothing neighbors (left panel) and an Epanechnikov kernel with $n = 32$ (right panel). The mean $\langle x \rangle$ and dispersion $\sigma_x$ of the best fit Gaussian distribution to $x = \log(f/f_t)$ are also shown.
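The correction radius can be verified numerically: the mean radius of points distributed uniformly within a $d$-dimensional ball of radius $h$ is indeed $h\,d/(d+1)$. A quick Monte Carlo check (our own sketch):

```python
import numpy as np

# Mean radius of a uniform d-ball: for uniform sampling the radius has
# CDF (r/h)**d, so r = h * U**(1/d) with U uniform in [0, 1), and the
# mean radius is h * d / (d + 1).
rng = np.random.default_rng(2)
d, h, n = 6, 1.0, 200_000
r = h * rng.random(n) ** (1.0 / d)
print(r.mean())  # expected: h*d/(d + 1) = 6/7 ~ 0.857
```
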
In Fig. A2 kernel density estimates with and without bias correction are shown for a system of $N = 10^5$ particles distributed uniformly in a 6 dimensional space with periodic boundaries. In the left panel the probability distribution $P(\log(f/f_t))$ is plotted with and without bias correction for a kernel density estimate obtained using a spline function with $n = 64$ smoothing neighbors. In the right panel the probability distributions are plotted for estimates obtained using an Epanechnikov function with $n = 32$ neighbors. The bias, given by the mean $\langle x \rangle$ of the best fit Gaussian distribution, is also shown. According to Eq. (A31), in a 6 dimensional space the bias for a spline kernel with $k = 64$ neighbors is $\langle \log(f_{sp}/f_t) \rangle = 0.21$, and for an Epanechnikov kernel with $k = 32$ it is $\langle \log(f_{Ep}/f_t) \rangle = 0.04$. These values are close to those shown in Fig. A2 for the uncorrected estimates. The Epanechnikov kernel thus has less bias than the spline kernel, and after correction the bias is considerably reduced for both kernels.