A study on the statistical significance of mutual information between morphology of a galaxy and its large-scale environment

A non-zero mutual information between the morphology of a galaxy and its large-scale environment is known to exist in SDSS up to a few tens of Mpc. It is important to test the statistical significance of this mutual information, if any. We propose three different methods to test the statistical significance of this non-zero mutual information and apply them to SDSS and the Millennium Run simulation. We randomize the morphological information of SDSS galaxies without affecting their spatial distribution and compare the mutual information in the original and randomized datasets. We also divide the galaxy distribution into smaller subcubes and randomly shuffle them many times, keeping the morphological information of galaxies intact. We compare the mutual information in the original SDSS data and its shuffled realizations for different shuffling lengths. Using a t-test, we find that a statistically significant (at 99.9% confidence level) mutual information between morphology and environment exists throughout the entire length scale probed. We also conduct another experiment using mock datasets from a semi-analytic galaxy catalogue where we assign morphology to galaxies in a controlled manner based on the density at their locations. The experiment clearly demonstrates that mutual information can effectively capture the physical correlations between morphology and environment. Our analysis suggests that the physical association between morphology and environment may extend to much larger length scales than currently believed, and that mutual information can be used as a useful statistical measure for the study of the large-scale environmental dependence of galaxy properties.


INTRODUCTION
The role of environment in galaxy formation and evolution is one of the most complex issues in cosmology. The present day Universe is filled with billions of galaxies which are distributed across a vast network known as the 'cosmic web' (Bond et al. 1996) that stretches through the Universe. This spectacular network of galaxies is made up of interconnected filaments, walls and nodes which are surrounded by vast empty regions. The galaxies broadly form and evolve in these four types of environments inside the cosmic web. One can characterize the environment of a galaxy with the local density at its location. The role of local density on galaxy properties is well studied in the literature (Oemler 1974; Dressler 1980; Goto et al. 2003; Davis & Geller 1976; Guzzo et al. 1997; Zehavi et al. 2002; Hogg et al. 2003; Blanton et al. 2003; Park et al. 2005; Einasto, et al. 2003a; Kauffmann et al. 2004; Mouhcine et al. 2007; Koyama et al. 2013; Bamford et al. 2009). It is now well known that galaxy properties exhibit a strong dependence on the local density of their environment. However, the role of the large-scale environment in the formation and evolution of galaxies still remains a debated issue.
Author e-mail addresses: suman2reach@gmail.com, biswap@visva-bharati.ac.in
The growth of primordial density perturbations leads to the collapse of dark matter halos in a hierarchical fashion. It is now widely accepted, following the seminal work by White & Rees (1978), that galaxies form at the centre of the dark matter halos by radiative cooling and condensation. One of the central postulates of the halo model (Neyman & Scott 1952; Mo & White 1996; Ma & Fry 2000; Seljak 2000; Scoccimarro & Sheth 2001; Cooray & Sheth 2002; Berlind & Weinberg 2002; Yang, Mo & van den Bosch 2003) is that the halo mass determines all the properties of a galaxy. The halos are assembled through accretion and mergers in different parts of the cosmic web. Different accretion and merger histories of the halos across different environments lead to assembly bias (Croton, Gao & White 2007; Musso, et al. 2018; Vakili & Hahn 2019), which manifests in the clustering of these halos. It has been shown that the large-scale environments in the cosmic web influence the mass, shape and spin of dark matter halos (Hahn et al. 2007,b). These observations suggest that the role of environment in galaxy formation and evolution may not be limited to local density alone. The morphology and coherence of large-scale patterns in the cosmic web may play a significant role in determining galaxy properties and their evolution.
Observations suggest that the properties of satellite galaxies are strongly correlated with those of the central galaxy (Weinmann et al. 2006; Kauffmann et al. 2010; Wang et al. 2010; Wang & White 2012). Kauffmann et al. (2013) find that the star formation rates in galaxies can be correlated up to 4 Mpc. A recent work by Kerscher (2018) reported the existence of galactic conformity out to 40 Mpc. Some other works (Luparello et al. 2015; Scudder et al. 2012; Pandey & Bharadwaj 2006; Darvish et al. 2014; Filho et al. 2015) report a significant dependence of the luminosity, star formation rate and metallicity of galaxies on the large-scale environment. A recent study by Lee (2018) shows that both the least and most luminous elliptical galaxies in sheetlike structures inhabit the regions with the highest tidal coherence. A number of studies (Trujillo et al. 2006; Lee & Erdogdu 2007; Paz et al. 2008; Jones et al. 2010; Tempel & Libeskind 2013) suggest an alignment of halo shapes and spins with filaments which can extend up to 40 Mpc (Chen, et al. 2019). In a recent study, Pandey & Sarkar (2017) use information theoretic measures to show that the galaxy morphology and environment in the SDSS exhibit a synergic interaction at least up to a length scale of $\sim 30\,h^{-1}$ Mpc. A more recent study (Pandey & Sarkar 2020) finds that the fraction of red galaxies in sheets and filaments increases with the size of these large-scale structures. Any such large-scale correlations beyond the extent of the dark matter halo are unlikely to be explained by direct interactions between them. Such large-scale correlations may be explained if the role of the large-scale environment is properly taken into account. Pandey & Sarkar (2017) use mutual information to quantify the large-scale environmental dependence of galaxy morphology. They find a non-zero mutual information between the morphology of galaxies and their environment which decreases with increasing length scale but remains non-zero throughout the entire range of length scales probed. In the present work, we would like to test the statistical significance of the mutual information between morphology and environment and study its validity and effectiveness as a measure of the large-scale environmental dependence of galaxy properties for future studies.
We propose a method in which we destroy the correlation between morphology and environment by randomizing the morphological classification and then measure the mutual information to test its statistical significance. We also divide the data into cubes and shuffle them around many times to test how the mutual information between morphology and environment is affected by the shuffling procedure. We carry out these tests using data from the Galaxy Zoo database (Lintott, et al. 2008). Further, we carry out a controlled test using a semi-analytic galaxy catalogue (Henriques et al. 2015) based on the Millennium simulation (Springel et al. 2005). The galaxies in these mock datasets are selectively assigned a morphology based on their local density. We measure the mutual information between morphology and environment in each case and try to understand the statistical significance of mutual information in the present context. The goal of the present analysis is to explore the potential of mutual information as a statistical measure to reveal the large-scale correlations between environment and morphology, if any.

SDSS DR16
We use data from the 16th data release (Ahumada, et al. 2019) of the Sloan Digital Sky Survey (SDSS; York, et al. 2000). DR16 is the fourth data release of the fourth phase of SDSS, which covers more than nine thousand square degrees of the sky and provides spectral information for more than two million galaxies. This includes an accumulation of data collected for new targets as well as targets from all prior data releases of SDSS. The data is downloaded through SciServer: CASjobs$^{1}$, which is a SQL based interface for public access. We identify a contiguous region within $0^\circ \le \delta \le 60^\circ$ and $135^\circ \le \alpha \le 225^\circ$ and select all galaxies with the apparent r-band Petrosian magnitude limit $m_r < 17.77$ within that region. Here $\alpha$ and $\delta$ are the right ascension and declination respectively. We combine the three tables SpecObjAll, Photoz and ZooSpec of the SDSS database to get the required information about each of these selected galaxies. We retrieve the spectroscopic and photometric information of galaxies from the SpecObjAll and Photoz tables respectively. The ZooSpec table provides the morphological classifications for the SDSS galaxies from the Galaxy Zoo project$^{2}$. Galaxy Zoo (Lintott, et al. 2008, 2011) is a platform where millions of registered volunteers vote for the visual morphological classification of galaxies. These votes contribute to the identification of galaxy morphologies through a structured algorithm. The galaxies in Galaxy Zoo are flagged as spiral, elliptical or uncertain depending on the vote fractions. We only consider the galaxies which are flagged as spiral or elliptical with a debiased vote fraction > 0.8 (Bamford et al. 2009). These cuts yield a total of 136155 galaxies within redshift $z < 0.3$. We then construct a volume limited sample using an r-band absolute magnitude cut $M_r \le -20.5$. This provides us with 44049 galaxies within $z < 0.096$. The present analysis requires a cubic region. We extract a cubic region of side 145 $h^{-1}$ Mpc from the volume limited sample, which contains 14558 galaxies. The resulting datacube consists of 11171 spiral galaxies and 3387 elliptical galaxies. The mean intergalactic separation of the galaxies in this sample is $\sim 6\,h^{-1}$ Mpc.

Millennium Run Simulation
Galaxy formation and evolution involve many complex physical processes such as gas cooling, star formation, supernovae feedback, metal enrichment, merging and morphological evolution. The semi analytic models (SAM) of galaxy formation (White & Frenk 1991; Kauffmann, White & Guiderdoni 1993; Cole et al. 1994; Baugh et al. 1998; Somerville & Primack 1999; Benson et al. 2002) are powerful tools which parametrise these complex physical processes in terms of simple models, follow the dark matter merger trees over time and finally provide statistical predictions of galaxy properties at any given epoch. In the present work, we use the data from a semi analytic galaxy catalogue (Henriques et al. 2015) derived from the Millennium Run simulation (MRS) (Springel et al. 2005). Henriques et al. (2015) updated the Munich model of galaxy formation using the values of the cosmological parameters from the Planck first-year data. This model provides a better fit to the observed stellar mass functions and reproduces the recent data on the abundance and passive fractions of galaxies over the redshift range $0 \le z \le 3$ better than the other models. We use SQL to extract the required data from the Millennium database$^{3}$. We use the peculiar velocities of the Millennium galaxies to map them into redshift space and extract all the galaxies with $M_r \le -20.5$. Finally, we construct 8 mock SDSS datacubes of side 145 $h^{-1}$ Mpc, each containing a total of 14558 galaxies.

Random distributions
We simulate 10 Poisson distributions, each within a cube of side 145 $h^{-1}$ Mpc. We generate 14558 random data points within each of the 10 datacubes. For each cube, we randomly label 3387 points as ellipticals and label the rest of the points as spirals. The number of galaxies and the ratio of spirals to ellipticals in these random datasets are identical to those in the original SDSS datacube.
Figure 2 (caption, partially recovered). The results for the mock Poisson distributions with randomly assigned morphology are also shown together for comparison. The $1\sigma$ error bars for the original SDSS data are estimated using 10 jack-knife samples drawn from the same dataset. For the SDSS random and Poisson random datasets, we estimate the $1\sigma$ error bars using 10 different realizations each. The right panel of this figure shows the t score as a function of length scale, obtained from a t-test which compares the SDSS galaxy distribution with randomized morphological classification to the SDSS galaxy distribution with actual morphological classification.

Mutual information between environment and morphology
We consider a cubic region of side $L\,h^{-1}$ Mpc extracted from the volume limited sample prepared from SDSS DR16. We subdivide the entire cube into $N_d$ regular cubic voxels of side $d\,h^{-1}$ Mpc. The probability of finding a randomly selected galaxy in the $i$th voxel is $p(X_i) = N_i/N$, where $N_i$ is the number of galaxies in the $i$th voxel and $N$ is the total number of galaxies in the cube. The random variable $X$ thus defines the environment of a galaxy at a specific length scale $d\,h^{-1}$ Mpc.
The information entropy (Shannon 1948) associated with the random variable $X$ at scale $d$ is given by
$$H(X) = -\sum_{i=1}^{N_d} p(X_i)\,\log p(X_i) = \log N - \frac{\sum_{i=1}^{N_d} N_i \log N_i}{N}.$$
We use another variable $Y$ to describe the morphology of the galaxies. We have only considered the galaxies with a classified morphology and hence there are only two possible outcomes: spiral or elliptical. If the cube consists of $N_{sp}$ spiral galaxies and $N_{el}$ elliptical galaxies, then the information entropy associated with $Y$ will be
$$H(Y) = -\left(\frac{N_{sp}}{N}\log\frac{N_{sp}}{N} + \frac{N_{el}}{N}\log\frac{N_{el}}{N}\right) = \log N - \frac{N_{sp}\log N_{sp} + N_{el}\log N_{el}}{N}.$$
Now, having the prior information about the morphology of each of the galaxies, one can determine the mutual information between the morphology of the galaxies and their environment.
The mutual information $I(X;Y)$ between environment and morphology is
$$I(X;Y) = H(X) + H(Y) - H(X,Y),$$
where $H(X)$ and $H(Y)$ are the individual entropies associated with the random variables $X$ and $Y$ respectively and $H(X,Y)$ is their joint entropy. The joint entropy satisfies $H(X,Y) \le H(X) + H(Y)$, where the equality holds only when $X$ and $Y$ are independent. The joint entropy is symmetric, i.e. $H(X,Y) = H(Y,X)$.
If $N_{ij}$ is the number of galaxies in the $i$th voxel that belong to the $j$th morphological class ($j = 1$ for spiral and $j = 2$ for elliptical), then the joint entropy $H(X,Y)$ is given by
$$H(X,Y) = -\sum_{i=1}^{N_d}\sum_{j=1}^{2} p(X_i,Y_j)\,\log p(X_i,Y_j) = \log N - \frac{1}{N}\sum_{i=1}^{N_d}\sum_{j=1}^{2} N_{ij}\log N_{ij},$$
where $p(X_i,Y_j) = p(X_i|Y_j)\,p(Y_j) = N_{ij}/N$ is the joint probability derived from the conditional probability using Bayes' theorem.
The mutual information between two random variables measures the reduction in uncertainty in the knowledge of one random variable given the knowledge of the other. A higher value of mutual information between two random variables conveys a greater degree of association between them. One specific advantage of mutual information over traditional tools like covariance analysis is that it does not require any assumptions regarding the nature of the random variables and their relationship.
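The estimator described in this section can be illustrated with a short Python sketch. It is not the code used for the analysis: the function names, the use of natural logarithms and the clamping of points that lie exactly on the upper face of the cube are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy_from_counts(counts, total):
    """Shannon entropy in the count form log N - (1/N) * sum(n log n), in nats."""
    return math.log(total) - sum(n * math.log(n) for n in counts if n > 0) / total


def mutual_information(positions, labels, box_side, n_grid):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for voxels of side box_side / n_grid."""
    d = box_side / n_grid
    total = len(positions)
    voxel_counts = Counter()  # N_i: galaxies per voxel
    label_counts = Counter()  # N_sp and N_el
    joint_counts = Counter()  # N_ij: galaxies per (voxel, morphology) pair
    for (x, y, z), lab in zip(positions, labels):
        # Clamp so that a point exactly on the upper face stays inside the grid.
        voxel = tuple(min(int(c / d), n_grid - 1) for c in (x, y, z))
        voxel_counts[voxel] += 1
        label_counts[lab] += 1
        joint_counts[(voxel, lab)] += 1
    h_x = entropy_from_counts(voxel_counts.values(), total)
    h_y = entropy_from_counts(label_counts.values(), total)
    h_xy = entropy_from_counts(joint_counts.values(), total)
    return h_x + h_y - h_xy
```

In this sketch, a toy sample in which the morphology is fully determined by the voxel returns $I(X;Y) = H(Y)$, while a sample in which every voxel has the same mix of morphologies returns zero, which are the two limiting cases of the definition above.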

Randomizing the morphological classification of galaxies
We consider each of the SDSS galaxies in the datacube and randomly identify them as spirals or ellipticals, setting aside their actual morphology. We randomly pick 3387 SDSS galaxies and tag them as ellipticals. The rest of the galaxies in the SDSS datacube are labelled as spirals. The number of spirals and ellipticals in the resulting distribution thus remains the same as in the original distribution.
We generate 10 such datacubes with randomly assigned galaxy morphology from the original SDSS datacube and measure the mutual information between environment and morphology in each of them. We would like to compare the mutual information I(X; Y) measured in the original SDSS data with that from the SDSS dataset with randomly assigned morphology to study the statistical significance of I(X; Y) and its scale dependence.
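A minimal sketch of this randomization is given below. Permuting the list of labels is equivalent to drawing 3387 galaxies at random and tagging them as ellipticals, since the class counts are preserved and the positions are never touched. The function name and the fixed seed are hypothetical choices and not part of the original analysis.

```python
import random


def randomize_morphology(labels, rng):
    """Return a shuffled copy of the labels; class counts are preserved."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


# Class counts taken from the SDSS datacube described in the text.
labels = ["elliptical"] * 3387 + ["spiral"] * 11171
rng = random.Random(1)  # arbitrary seed, for reproducibility only
realizations = [randomize_morphology(labels, rng) for _ in range(10)]
```

Each entry of `realizations` would then be paired with the unchanged galaxy positions before measuring I(X; Y).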

Shuffling the spatial distribution of galaxies
We divide the SDSS datacube of side $L\,h^{-1}$ Mpc into $N_c = n_s^3$ smaller subcubes of size $l_s = L/n_s\,h^{-1}$ Mpc. Each of these smaller subcubes, along with all the galaxies within it, is rotated around three different axes by different angles which are random multiples of $90^\circ$. The rotated subcube is then randomly interchanged with another subcube inside the datacube. This process of arbitrary rotation followed by random swapping is repeated $100 \times N_c$ times to generate a shuffled realization (Bhavsar & Ling 1988) from the original SDSS datacube. We carry out the shuffling procedure for three different choices, $n_s = 3$, $n_s = 7$ and $n_s = 15$, which correspond to shuffling lengths $l_s = 48.33\,h^{-1}$ Mpc, $l_s = 20.71\,h^{-1}$ Mpc and $l_s = 9.67\,h^{-1}$ Mpc respectively. We generate 10 shuffled realizations for each value of the shuffling length $l_s$. Our goal is to compare the mutual information I(X; Y) measured in the original SDSS data with that from the shuffled datasets to test the statistical significance of I(X; Y) on different length scales.
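The shuffling procedure can be sketched as follows. This is an illustrative implementation under stated assumptions and not the code used for the analysis: the function names are hypothetical, coordinates are stored relative to the centre of each subcube so that quarter-turn rotations keep the galaxies inside it, and the swap partner is drawn uniformly from all subcubes (so a subcube may occasionally be swapped with itself).

```python
import itertools
import random


def rotate_quarter_turns(point, turns_x, turns_y, turns_z):
    """Rotate a subcube-centred point by multiples of 90 degrees about x, y, z."""
    u, v, w = point
    for _ in range(turns_x % 4):
        v, w = -w, v
    for _ in range(turns_y % 4):
        u, w = w, -u
    for _ in range(turns_z % 4):
        u, v = -v, u
    return (u, v, w)


def shuffle_cube(positions, labels, box_side, n_s, rng, swaps_per_subcube=100):
    """Rotate and swap subcubes; morphology labels travel with their galaxies."""
    l_s = box_side / n_s
    cells = list(itertools.product(range(n_s), repeat=3))
    content = {cell: [] for cell in cells}
    for (x, y, z), lab in zip(positions, labels):
        cell = tuple(min(int(c / l_s), n_s - 1) for c in (x, y, z))
        local = tuple(c - (i + 0.5) * l_s for c, i in zip((x, y, z), cell))
        content[cell].append((local, lab))
    for _ in range(swaps_per_subcube * len(cells)):
        a, b = rng.choice(cells), rng.choice(cells)
        turns = [rng.randrange(4) for _ in range(3)]
        content[a] = [(rotate_quarter_turns(p, *turns), lab) for p, lab in content[a]]
        content[a], content[b] = content[b], content[a]
    new_positions, new_labels = [], []
    for cell, members in content.items():
        for local, lab in members:
            new_positions.append(tuple((i + 0.5) * l_s + c for i, c in zip(cell, local)))
            new_labels.append(lab)
    return new_positions, new_labels
```

With `box_side = 145.0` and `n_s` set to 3, 7 or 15, the subcube size in this sketch corresponds to the three shuffling lengths quoted above.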

Simulating different morphology-density correlations
The morphology-density relation is a well known phenomenon which indicates that environment plays a crucial role in deciding galaxy morphology. We would like to test whether the mutual information I(X; Y) can capture the strength of the morphology-density relation in the galaxy distribution.
We construct a set of SDSS mock datacubes from a semi analytic galaxy catalogue as discussed in Section 2.2. We compute the local number density at the location of each galaxy using the $k$th nearest neighbour method (Casertano & Hut 1985). We find the distance to the $k$th nearest neighbour of each galaxy. The local number density around a galaxy is estimated as
$$\eta_k = \frac{k-1}{V(r_k)}.$$
Here $r_k$ is the distance to the $k$th nearest neighbour and $V(r_k) = \frac{4}{3}\pi r_k^3$. We have used $k = 10$ in this analysis. Our goal is to test if I(X; Y) can capture the degree and nature of the correlation between environment (X) and morphology (Y). The elliptical galaxies are known to reside preferentially in denser environments. Each mock SDSS datacube from the SAM contains a total of 14558 galaxies. We would like to assign a morphology to each of these galaxies. To do so, we first sort the number densities at the locations of the galaxies in descending order. We consider three different schemes, which are as follows:
(i) We randomly label 3387 galaxies as ellipticals from the top 30% high density locations and consider the rest of the 11171 galaxies as spirals.
(ii) We randomly label 3387 galaxies as ellipticals from top 50% high density locations and consider the rest of the 11171 galaxies as spirals.
(iii) We randomly label 3387 galaxies as ellipticals irrespective of their local density and consider the rest of the 11171 galaxies as spirals.
The morphology-density relation in case (i) is stronger than case (ii) and there is no morphology-density relation in case (iii). We would like to test if mutual information I(X; Y) can correctly capture the degree of association between environment and morphology in these distributions.
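The three schemes above can be sketched in Python as follows. This is a minimal illustration rather than the code used for the analysis: the function names are hypothetical, the neighbour search is a brute-force loop that is only practical for small samples (a tree-based search would be the natural choice for the full 14558 galaxies), and coincident points, which would give $r_k = 0$, are assumed not to occur.

```python
import math
import random


def knn_density(positions, k=10):
    """Local number density (k - 1) / V(r_k) from the k-th nearest neighbour distance."""
    densities = []
    for i, p in enumerate(positions):
        dists = sorted(math.dist(p, q) for j, q in enumerate(positions) if j != i)
        r_k = dists[k - 1]
        densities.append((k - 1) / (4.0 / 3.0 * math.pi * r_k ** 3))
    return densities


def assign_morphology(densities, n_elliptical, top_fraction, rng):
    """Tag n_elliptical galaxies, drawn from the densest top_fraction, as ellipticals."""
    order = sorted(range(len(densities)), key=lambda i: densities[i], reverse=True)
    pool = order[: int(round(top_fraction * len(order)))]
    chosen = set(rng.sample(pool, n_elliptical))
    return ["elliptical" if i in chosen else "spiral" for i in range(len(densities))]
```

In this sketch, schemes (i), (ii) and (iii) correspond to `top_fraction` values of 0.3, 0.5 and 1.0 respectively, with `n_elliptical = 3387` for the mock datacubes described in the text.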

Testing statistical significance of the difference in mutual information with t test
We use an equal variance t-test, which can be used when both datasets consist of the same number of samples or have a similar variance. We calculate the t score at each length scale using the following formula,
$$t = \frac{|\bar{X}_1 - \bar{X}_2|}{\sigma_s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \tag{7}$$
where $\sigma_s = \sqrt{\frac{(n_1-1)\sigma_1^2 + (n_2-1)\sigma_2^2}{n_1+n_2-2}}$, $\bar{X}_1$ and $\bar{X}_2$ are the average values, $\sigma_1$ and $\sigma_2$ are the standard deviations, and $n_1$ and $n_2$ are the numbers of datapoints associated with the two datasets at any given length scale.
We would like to test the null hypothesis that the average values of the mutual information in the original and the randomized or shuffled distributions at a given length scale are not significantly different. We find that randomizing or shuffling the data always leads to a reduction in the mutual information between morphology and environment. We use a one-tailed test with significance level $\alpha = 0.0005$, which corresponds to a confidence level of 99.9%. The number of degrees of freedom in this test is $(n_1 + n_2 - 2)$. The same test is also applied to assess the statistical significance of I(X; Y) in the mock datasets where a morphology-density relation is introduced in a controlled manner. We compute the t score at each length scale using Equation 7 and determine the associated p value to test the statistical significance.
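The t score with a pooled standard deviation can be computed with a few lines of Python. The sketch below uses only the standard library and is an illustration, not the code used for the analysis; the function name is hypothetical, and the critical value for the chosen significance level is assumed to be looked up separately in a t table for the returned degrees of freedom.

```python
import math
import statistics


def pooled_t_score(sample_1, sample_2):
    """Equal variance t score and degrees of freedom for two samples."""
    n1, n2 = len(sample_1), len(sample_2)
    var1 = statistics.variance(sample_1)  # sample variance, n - 1 in the denominator
    var2 = statistics.variance(sample_2)
    sigma_s = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    mean_diff = abs(statistics.mean(sample_1) - statistics.mean(sample_2))
    t = mean_diff / (sigma_s * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2
```

For two sets of 10 measurements of I(X; Y) at a given length scale, the function returns the t score together with 18 degrees of freedom; the null hypothesis would be rejected when the t score exceeds the tabulated critical value.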

Effects of randomizing the morphological classification
We show the mutual information I(X; Y) between environment and morphology as a function of length scale in the SDSS datacube in the left panel of Figure 2. The result for the SDSS datasets with randomly assigned morphology is also shown in the same panel for comparison. This shows that there is a significant reduction in I(X; Y) at each length scale due to the randomization of the morphological information of the SDSS galaxies. We find that a finite non-zero mutual information still persists at each length scale even after the randomization of morphology. We also show the mutual information measured in the Poisson datacubes with randomly assigned morphology in the left panel of Figure 2. Interestingly, we find that the non-zero mutual information between X and Y in the Poisson distributions is nearly the same as that in the SDSS datacube with randomly assigned morphology. The information entropy H(X) associated with environment at each length scale $d$ remains unchanged, as the position of each galaxy in the resulting distribution remains the same as in the original SDSS distribution. There would also be no change in the information entropy H(Y) associated with the morphology of the galaxies, as the numbers of spirals and ellipticals remain the same after the randomization. However, this procedure would change the joint entropy H(X, Y). The randomization of the morphological classification would turn the joint probability distribution into a product of the two individual probability distributions, i.e. $p(X_i, Y_j) = p(X_i)\,p(Y_j)$. The adopted procedure is thus expected to destroy any existing correlations between environment and morphology, and consequently any non-zero mutual information between environment and morphology should ideally disappear after the randomization.
Table 2. This table shows the t score and the associated p value at each length scale when we compare the mutual information between the actual SDSS data and its shuffled realizations for different shuffling lengths. The grid size for each $n_s$ is chosen in such a way that the shuffling length is not equal to, or an integral multiple of, the grid size. Columns: grid size; $n_s = 3$; $n_s = 7$; $n_s = 15$ (table entries not recovered).
However, in the left panel of Figure 2, we find that I(X; Y) does not reduce to zero after the randomization of the morphology of the SDSS galaxies. This residual non-zero mutual information can be explained by the results obtained from the Poisson datacubes with randomly assigned morphology. The results show that I(X; Y) in the Poisson datacubes with randomly assigned morphology and in the SDSS datacube with randomly assigned morphology are nearly the same. This suggests that a part of the measured mutual information arises due to the finite and discrete nature of the galaxy sample. The origin of this residual information is thus non-physical in nature and should be properly taken into account in such analyses.
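The non-physical residual described above can be reproduced with a small numerical experiment. The sketch below is an illustration only and not the code used for the analysis: it draws a Poisson sample with the same number of points and ellipticals as the SDSS datacube, assigns the morphology at random, and evaluates I(X; Y) on a few grids. The function name, the seed and the grid sizes are arbitrary assumptions.

```python
import math
import random
from collections import Counter


def plug_in_mi(cells, labels):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from voxel indices and labels, in nats."""
    n = len(cells)

    def entropy(counter):
        return math.log(n) - sum(c * math.log(c) for c in counter.values()) / n

    return entropy(Counter(cells)) + entropy(Counter(labels)) - entropy(Counter(zip(cells, labels)))


rng = random.Random(0)  # fixed seed, chosen arbitrarily
side, n_points, n_elliptical = 145.0, 14558, 3387
points = [tuple(rng.uniform(0.0, side) for _ in range(3)) for _ in range(n_points)]
labels = ["elliptical"] * n_elliptical + ["spiral"] * (n_points - n_elliptical)
rng.shuffle(labels)  # morphology is independent of position by construction

residual = {}
for n_grid in (2, 4, 8):
    d = side / n_grid  # voxel side in the same units as the cube side
    cells = [tuple(min(int(c / d), n_grid - 1) for c in p) for p in points]
    residual[n_grid] = plug_in_mi(cells, labels)
    print(f"d = {d:6.2f}  I(X;Y) = {residual[n_grid]:.5f}")
```

Although environment and morphology are independent by construction in this sketch, the measured values are not exactly zero, and they grow as the voxels shrink and the number of galaxies per voxel falls, in line with the finite-sample interpretation given in the text.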
The reduction in I(X; Y) due to the randomization of morphology suggests that a part of the measured mutual information I(X; Y) must have some physical origin. Interestingly, the left panel of Figure 2 shows that randomization leads to a reduction in the mutual information at each length scale. We test the statistical significance of these differences at each length scale using a t test. We show the t score at each length scale in the right panel of Figure 2. The critical t score at 99.9% confidence level for 18 degrees of freedom is also shown in the same panel. The t score and the associated p value at each length scale are tabulated in Table 1. We find strong evidence against the null hypothesis, which suggests that the differences in the mutual information I(X; Y) in the two distributions are statistically significant at 99.9% confidence level over the entire range of length scales probed. This clearly indicates that the association between environment and morphology is not limited to the local environment but extends to environments on larger length scales.

Effects of shuffling the spatial distribution of galaxies
We divide the SDSS datacube into a number of regular subcubes using different values of $l_s$, as discussed in Section 3.3, and shuffle them many times to generate a set of shuffled realizations for each shuffling length. Figure 3 shows the distributions of ellipticals (brown dots) and spirals (blue dots) in the original unshuffled SDSS datacube along with one realization of the shuffled datacubes for each shuffling length. The size of the shuffling unit used to shuffle the data in each case is shown with a red subcube at the corner of the respective shuffled datacube. A comparison of the shuffled datacubes with the original SDSS datacube clearly shows that the coherent features visible in the actual data on larger length scales progressively disappear as the shuffling length decreases. It may be noted that both the measurement of I(X; Y) and the shuffling require us to divide the datacube into a number of subcubes. In each case, we choose the shuffling lengths and the grid sizes so that the shuffling length is not equal to, or an integral multiple of, the grid size, or vice versa. This must be ensured to avoid any spurious correlations in I(X; Y). We compare the mutual information I(X; Y) in the original and shuffled datasets in the left panel of Figure 4. For each shuffled dataset, we observe a reduction in I(X; Y) at different length scales. A smaller reduction in I(X; Y) is observed at smaller length scales, whereas a relatively larger reduction in I(X; Y) is seen on larger length scales.
It may be noted that the morphological information of galaxies remains intact after shuffling the data. The shuffling procedure keeps the clustering at scales below $l_s$ nearly identical to the original data but eliminates all the coherent spatial features in the galaxy distribution on scales larger than $l_s$. Shuffling is thus expected to diminish any existing correlations between environment and morphology. Measuring the mutual information between environment and morphology in the original SDSS data and its shuffled versions allows us to address the statistical significance of I(X; Y). The mutual information is expected to reduce by a greater amount on scales above the shuffling length $l_s$ because shuffling destroys nearly all the coherent patterns beyond this length scale. On the other hand, we expect a relatively smaller reduction in I(X; Y) below the shuffling length $l_s$. This can be explained by the fact that most of the coherent features in the galaxy distribution below the length scale $l_s$ survive the shuffling procedure. However, some of the coherent features which extend up to $l_s$ but lie across the subcubes would be destroyed by shuffling. Shuffling may also produce a small number of spatial features which are the product of pure chance alignments. These random features are unlikely to introduce any physical correlations between environment and morphology. A comparison of I(X; Y) between the original and shuffled data at different length scales for different shuffling lengths thus reveals the statistical significance of the degree of association between environment and morphology on different length scales.
We find that I(X; Y) decreases monotonically at all length scales with decreasing shuffling length. Figure 4 shows that I(X; Y) for $n_s = 15$, or $l_s \sim 10\,h^{-1}$ Mpc, still lies above the values that are expected for an identical Poisson random distribution. A greater reduction in I(X; Y) on larger length scales for each shuffling length considered suggests that the mutual information between environment and morphology is statistically significant on these length scales. I(X; Y) in the actual data and in the shuffled data for different shuffling lengths do not differ much on the smallest length scales, as the coherent structures on these length scales are nearly intact in all the shuffled datasets. However, when the data are shuffled with smaller values of $l_s$, a greater number of coherent structures on larger length scales are lost. This explains why the reduction in I(X; Y) increases with decreasing shuffling length.
We employ a t test to assess the statistical significance of the observed differences in I(X; Y) between the original and shuffled datasets at different length scales. The t score and the corresponding p value at each length scale are tabulated in Table 2. The t scores for the shuffled datasets for the three different shuffling lengths are shown as a function of length scale in the right panel of Figure 4. The critical value of the t score at 99.9% confidence level for 18 degrees of freedom is shown in the same panel. We find that the differences in I(X; Y) between the shuffled and unshuffled SDSS data are statistically significant at 99.9% confidence level over nearly the entire range of length scales probed.
We find only weak evidence against the null hypothesis for all the shuffling lengths at smaller length scales. This arises due to the fact that the coherence between environment and morphology is retained on smaller scales when the data is shuffled with a comparable or larger shuffling length. However, we note that a considerable reduction in I(X; Y) can occur even below the shuffling length for $n_s = 7$ and $n_s = 3$. A subset of the coherent features extending below the shuffling length may lie across the subcubes used to shuffle the data. These coherent structures will be destroyed by the shuffling procedure even when they are smaller than the shuffling length. The number of coherent structures which belong to this particular group is expected to increase with the size of the subcubes due to their larger boundary.
The results shown in Figure 4 thus indicate that the association between environment and morphology is certainly not limited to the local environment but extends throughout the length scales probed in this analysis.

Effects of different morphology-density correlations
In Figure 5, we show the distributions of spirals and ellipticals in mock SDSS datacubes from SAM. We show one distribution for each of the simulated morphology-density relations.
We show the mutual information I(X; Y) as a function of length scale for the three different morphology-density relations in the left panel of Figure 6. When the ellipticals are randomly selected from the entire distribution irrespective of their density, we do not expect any mutual information between morphology and environment. The non-zero mutual information in this case is just an outcome of the finite and discrete nature of the distributions. We find that the results for this case are identical to those expected for a Poisson distribution with the same ratio of spirals to ellipticals.
However, when ellipticals are preferentially selected from denser regions, the mutual information between morphology and environment rises above the values that are expected for a Poisson random distribution. Figure 6 shows that the mutual information I(X; Y) is significantly higher than that of the Poisson distribution when galaxies are randomly tagged as ellipticals from the top 50% high density positions. We find that the mutual information between morphology and environment increases further to much higher values when galaxies are randomly identified as ellipticals from the top 30% high density regions. We note a change in I(X; Y) at all length scales up to 50 $h^{-1}$ Mpc. A larger change in I(X; Y) is observed on smaller length scales, whereas the change in I(X; Y) becomes gradually smaller on larger length scales. This indicates that the morphology-density relations simulated here become weaker on larger length scales.
We use a t test to assess the statistical significance of the differences in I(X; Y) between the mock datasets with and without a morphology-density relation. We tabulate the t score and the corresponding p value at each length scale for the two mock datasets in Table 3. In the right panel of Figure 6, we show the t score as a function of length scale in the two mock datasets with different morphology-density relations. The critical value of the t score at 99.9% confidence level for 14 degrees of freedom is shown in the same panel. The results suggest that a statistically significant difference (99.9% confidence level) exists between the datasets with and without a morphology-density relation. Interestingly, these differences persist throughout the entire range of length scales probed in the analysis. This indicates that the correlation between environment and morphology is not limited to the local environment but extends to larger length scales.
The morphology-density relations considered here are too simple in nature. In this experiment, we find that the mutual information between morphology and environment decreases monotonically with increasing length scale. Contrary to this, the SDSS observations show that the mutual information initially decreases with increasing length scale and nearly plateaus at larger length scales. The schemes used for the morphology-density relation in this experiment are therefore not realistic. But they clearly show that mutual information can effectively capture the degree of association between morphology and environment, and that such a relation may extend up to larger length scales.

CONCLUSIONS
In the present work, we aim to test the statistical significance of the mutual information between the morphology of a galaxy and its environment. The morphology-density relation is a well known phenomenon which has been observed in the galaxy distribution. The relation suggests that ellipticals are preferentially found in denser regions of the galaxy distribution whereas spirals are sporadically distributed across the fields. It is important to understand the role of environment in galaxy formation and evolution. The local density at the location of a galaxy is very often used to characterize its environment. It is believed that the environmental dependence of galaxy properties can be mostly explained by the local density alone. The mutual information between environment and morphology for SDSS galaxies has been studied by Pandey & Sarkar (2017), who find that a non-zero mutual information between morphology and environment persists throughout the entire length scale probed. They show that the mutual information between environments on different length scales may introduce such correlations between environment and morphology observed on larger length scales. We would like to critically examine the statistical significance of the observed non-zero mutual information between morphology and environment on different length scales. We propose three different methods to assess the statistical significance of mutual information. These methods also help us to understand the relative importance of environment on different length scales in deciding the morphology of galaxies.
Three different tests are carried out in the present analysis. In the first case, we randomize the morphological information of the SDSS galaxies without affecting their spatial distribution. In the second case, we shuffle the spatial distribution of the SDSS galaxies without affecting their morphological classification. Both these tests show that the mutual information between morphology and environment is statistically significant at 99.9% confidence level throughout the entire range of length scales probed in this analysis. We find that a small non-zero mutual information can be observed even in a random distribution without any existing physical correlations between environment and morphology. This non-zero value originates from the finite and discrete nature of the distribution. Interestingly, the mutual information between environment and morphology in the SDSS datacube is significantly larger than that in the randomized datasets throughout the entire range of length scales probed. Shuffling the SDSS datacube also affects the mutual information between environment and morphology in a statistically significant way at nearly all the length scales considered. This suggests that the association between morphology and environment continues up to larger length scales and that these correlations must have a physical origin. In a third test, we construct a set of mock SDSS datacubes from the semi-analytic galaxy catalogue, where we assign morphology to the simulated galaxies based on the density at their locations. We vary the strength of the simulated morphology-density relation and measure the mutual information between environment and morphology in each case. Our results suggest that mutual information effectively captures the degree of association between environment and morphology in these mock datasets.
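The first two tests can be sketched as below. This is a simplified illustration rather than the procedure used in the analysis: the function names, the box size and the toy data are assumptions, and the subcube shuffling here only translates subcubes to new slots (any further operation on the subcubes, such as rotation, is not included).

```python
import random

def randomize_morphology(labels, rng):
    """Permute morphology labels among galaxies; positions are untouched."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

def shuffle_subcubes(positions, box, n_s, rng):
    """Move each subcube (side box / n_s) to a randomly chosen slot, keeping
    every galaxy's offset inside its subcube. The returned list keeps the
    input order, so the morphology labels stay attached to their galaxies."""
    side = box / n_s
    slots = [(i, j, k) for i in range(n_s) for j in range(n_s) for k in range(n_s)]
    targets = list(slots)
    rng.shuffle(targets)
    new_slot = dict(zip(slots, targets))
    moved = []
    for pos in positions:
        cell = tuple(min(int(c / side), n_s - 1) for c in pos)
        dest = new_slot[cell]
        moved.append(tuple(c - s * side + d * side
                           for c, s, d in zip(pos, cell, dest)))
    return moved

rng = random.Random(11)
box = 100.0  # hypothetical datacube side
positions = [(rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
             for _ in range(1000)]
labels = ['E' if rng.random() < 0.3 else 'S' for _ in positions]

random_labels = randomize_morphology(labels, rng)      # first test
shuffled_positions = shuffle_subcubes(positions, box, 4, rng)  # second test
print(random_labels.count('E'), len(shuffled_positions))
```

Repeating either operation many times with different seeds gives an ensemble of realizations whose mutual information can be compared with that of the original data at each length scale.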
Every statistical measure has its pros and cons. One particular drawback of mutual information is that it does not tell us the direction of the relation between two random variables, i.e. the measured mutual information does not by itself tell us that ellipticals and spirals are preferentially distributed in high density and low density regions respectively. But the mutual information reliably captures the degree of association between any two random variables irrespective of the nature of their relationship. So in the present context, mutual information can be an effective and powerful tool to quantify the degree of influence that environment imparts on morphology across different length scales. The amplitude of the mutual information quantifies the strength of the correlation between morphology and environment on different length scales. It also helps us to probe the length scales up to which the morphology of a galaxy is sensitive to its environment.
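The insensitivity to direction can be seen in a toy example. The eight-galaxy sample, the two environment bins and the plug-in estimator below are illustrative assumptions only; the point is that flipping every morphology label reverses the sense of the relation but leaves the mutual information unchanged.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired samples."""
    n = len(xs)
    joint, n_x, n_y = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2(c * n / (n_x[x] * n_y[y]))
               for (x, y), c in joint.items())

# Toy sample: environment bin and morphology for 8 hypothetical galaxies.
env = ['dense'] * 4 + ['sparse'] * 4
ellipticals_prefer_dense = ['E', 'E', 'E', 'S', 'S', 'S', 'S', 'E']
# Same sample with every label flipped: ellipticals now prefer sparse regions.
ellipticals_prefer_sparse = ['S', 'S', 'S', 'E', 'E', 'E', 'E', 'S']

mi_dense = mutual_information(env, ellipticals_prefer_dense)
mi_sparse = mutual_information(env, ellipticals_prefer_sparse)
print(mi_dense, mi_sparse)
```

Both calls return the same value, so the sense of the relation has to be established separately, for example by comparing the elliptical fraction in each environment bin.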
The effects of local density on the morphology of galaxies are understood in terms of various types of galaxy interactions, ram pressure stripping and quenching of star formation. These processes may play a dominant role in shaping the morphology of a galaxy. However, they may not be the only factors which decide the morphology of a galaxy. The presence of large-scale coherent features like filaments, sheets and voids may induce large-scale correlations between the observed galaxy properties and their environment. We need to understand the physical origin of such correlations and incorporate them in the models of galaxy formation.
One can also study the mutual information between environment and any other galaxy property to understand the influence of environment on that property at various length scales. The relative influence of environment on different galaxy properties on any given length scale may provide useful inputs for galaxy formation models. Finally, we note that the mutual information between environment and a galaxy property is a powerful and effective tool which can be used in future studies of the large-scale environmental dependence of galaxy properties.

ACKNOWLEDGEMENT
SS would like to thank UGC, Government of India for providing financial support through a Rajiv Gandhi National Fellowship. BP would like to acknowledge financial support from the SERB, DST, Government of India through the project CRG/2019/001110. BP would also like to acknowledge IUCAA, Pune for providing support through its associateship programme. The Millennium Simulation databases (Lemson & Virgo Consortium 2006) used in this paper and the web application providing online access to them were constructed as part of the activities of the German Astrophysical Virtual Observatory.