Application of the independent component analysis to the iKAGRA data

We apply independent component analysis (ICA) to the real data from a gravitational wave detector for the first time. ICA separates various sources of signals from multiple detection channels making use of non-Gaussian nature of the statistical distributions of the sources. Specifically we use the iKAGRA data taken in April 2016, and calculate the correlations between the gravitational wave strain channel and 35 physical environmental channels. Using a couple of seismic channels which are found to be strongly correlated with the strain, we perform ICA. Injecting a sinusoidal continuous signal in the strain channel, we find that ICA recovers correct parameters with enhanced signal-to-noise ratio, which demonstrates usefulness of this method.


Introduction
Ever since Einstein found the existence of a gravitational wave solution in his theory of general relativity in 1916, it took exactly a century for mankind to succeed in its direct detection.This delay is primarily due to the fact that the gravitational force is an exceedingly weak force compared with other interactions.
The first detection of a gravitational wave by the advanced Laser Interferometer Gravitational wave Observatory (aLIGO) [1] brought a great impact on science and told the beginning of gravitational wave astronomy.Following aLIGO and advanced Virgo, the large-scale cryogenic gravitational wave telescope (LCGT) now known as KAGRA, has been constructed in Kamioka, Japan [2].KAGRA will play very important roles in the international network of gravitational wave detection by measuring the number of polarization property, which is indispensable to prove the general relativity, and by improving the sky localization of each event significantly.As the first underground and cryogenic detector, it will also provide important information to the third-generation detectors.
Because gravity is the weakest force among the four elementary interactions, gravitational waves have high penetrating power.Therefore, they enable us to see deep inside dense matter, such as neutron stars, and bring information that electromagnetic waves cannot.On the other hand, this property makes its detection very difficult.It is very important to develop methods for extraction of these tiny signals.If the detector noises are normally distributed, the appropriate analysis methods, such as matched filtering [3], are known.However, the problem is not so simple, as it is known that there exist non-Gaussian noises in real data.They decrease the performance of the analysis methods assuming Gaussianity of the noises.What is worse, these noises may be mistaken for true signals and increase the false alarm probability.It is necessary to deal with non-Gaussianity properly as stressed in [4].
In this situation, independent component analysis (ICA) [5,6,7] occupies a unique position among methods of signal extraction because it makes use of non-Gaussianity of signals and noises instead of treating it an obstacle.In this paper we report the results of application of ICA to the iKAGRA data and discuss its usefulness in gravitational wave data analysis.This method assumes only statistical independence of the noises and does not impose any other conditions on their distributions.
The rest of the paper is organized as follows.In §2, we introduce ICA in the simplest case only one environmental channel is incorporated to the strain channel and review analytic formulas obtained in our previous paper [8].In §3, we present our application of ICA to the iKAGRA data with injected artificial continuous signal.The final section §4 is devoted to conclusion.

Independent Component Analysis (ICA)
As is seen in our previous paper [4], signal detection under non-Gaussian noises is much more involved than the case with Gaussian since the optimal statistic has much compli-cated forms.Independent component analysis is an attractive method of signal extraction because it makes use of non-Gaussian nature of the signals [5,6,7] (see [9,10] for textbooks).
Here we consider the following simple problem as a first step to test applicability of this approach for detection of GWs.Let us consider the case where we have two detector outputs, x 1 (t) and x 2 (t) (t stands for the time).The former is the output from the laser interferometer, namely, the strain channel, and the latter is an environmental channel such as an output of a seismograph.We wish to separate gravitational wave signal h(t) and non-Gaussian noise k(t) using the data of t x(t) = (x 1 (t), x 2 (t)).
As the simplest case we assume that there is a linear relation between the outputs and the sources: where A is assumed to be a time independent matrix.By definition the gravitational wave signal obeys a probability distribution function (PDF) where h(t, θ) is the actual waveform of gravitational radiation emitted from some source, where θ collectively denotes parameters of the source.On the other hand, we do not specify the PDF of k(t), r 2 (s 2 ), except that it is a super-Gaussian distribution such as a Poisson distribution with a larger tail than Gaussian.
The detector output of a laser interferometer, of course, suffers from Gaussian noises n(t) besides non-Gaussian noise k(t).Hence, Eq. (1) should actually read Here we have not incorporated any Gaussian noise to the second line where the signal k(t) itself consists of (non-Gaussian) noises and any Gaussian noise can be incorporated to a part of it.
Following [8], we introduce a trick to replace the original source s 1 (t) = h(t) by s 1 (t) = h(t) + n(t), that is, we regard the Gaussian noise as a part of the original signal.Since n(t) is a Gaussian with vanishing mean, its statistical property is entirely characterized by the two-point correlation function K(t − t ) = n(t)n(t ) .Then the marginal distribution function of s 1 (t) is given by Thus s 1 (t) now satisfies a simple Gaussian distribution which is much easier to handle with than the delta-function distribution (2), and s(t) and x(t) are related by a simple formula x(t) = As(t).Now our goal is to find the inverse matrix of A whose components are not known precisely.One may set it as since the gravitational wave is so weak that it will not affect any environmental meters such as a seismograph.The aim of ICA is to find a linear transformation such that two components of the transformed variables y are statistically independent of each other.Thanks to the assumption (5), the matrix W also takes a form If we knew all the components of A, the matrix W could simply be given by the inverse matrix W = A −1 , in which case we would find y = s.However, since we do not know it, we attempt to determine W in such a way that the components of y, y 1 (t) and y 2 (t) to be statistically independent as much as possible.
In general, mutual independence of statistical variables may be judged by introducing a cost function L(W ) which represents a "distance" in the space of statistical distribution functionals.In [8] the Kullback-Leibler divergence [11], which is defined between two arbitrary PDFs p(y) and q(y) as D[p(y); q(y)] = p(y) ln p(y) q(y) dy = E py ln p(y) q(y) , was adopted to obtain a matrix which realizes statistical independence between y 1 and y 2 .Here E p [•] denotes an expectation value with respect to a PDF p. Ideally, the matrix W should be found in such a way that the distance between the distribution of y, p y (y), which is constructed from the observed distribution function of x through the linear transformation y = W x as p y (y) and the real distribution function of statistically independent source variables s, r(s Here ||X|| denotes the determinant of a matrix X.However, since we do not know the form of r(s) a priori, we minimize the cost function where q(y) = q 1 (y 1 )q 2 (y 2 ) is an appropriately chosen distribution function, to find the matrix W [8]. In fact, it is known that even for an arbitrary choice of q(y), the real W gives an extremum of L q (W ).In the above expression (10), H[x] denotes the entropy of the distribution of x defined by H[x] ≡ −E px [ln p(x)], which can be obtained from observed distribution and has nothing to do with the matrix W . From we can solve for the components of W , w ij .While the solution of Eq. ( 11) has been obtained in [8], we can show that the same expression can be obtained for our particular problem with a 21 = w 21 = 0 more easily as we show below.

Correlation method
From now on we replace the ensemble average E[•] by temporal average of observed values of x which we denote by brackets. From and we find y 2 (t) = w 22 x 2 (t) = w 22 a 22 s 2 (t) = w 22 a 22 k(t) consists of an environmental channel only, while y 1 (t) depends both on strain and seismic channels.In order to guarantee statistical independence we need to satisfy y 1 (t)y 2 (t) = 0, which is equivalent to requiring We therefore obtain Since ICA does not uniquely determine the overall factor of y by nature, this relation suffices for our purpose to determine y 1 .These are what we calculated in our previous paper [8].
Here we develop three components method for further analysis and we apply this method in §3.2.2.This is achieved by the analogy of the GramSchmidt process which is a method for orthonormalising a set of vectors, and it can be extended to the case where there are more than three components.For three components, y(t) and x(t) become Because of the gauge degree of freedom, we can take w 32 = 0 without loss of generality and choose We first require y 2 (t)y 3 (t) = y 2 (t)x 3 (t) = 0.This gives following relation, Based on this, we can choose If we take 21) is symmetrical with respect to the permutation of x 2 (t) and x 3 (t).

FastICA method
Next we introduce another method to obtain a matrix W called FastICA [12] which can be easily implemented even when x(t) has more than two components.In this method, assuming that each component, s i (t), of source vector s(t) is properly normalized with vanishing mean, we first apply whitening to the detector outputs x(t) and take the dispersion of each source s i (t) to be unity.This is achieved in the following way.First let the normalized eigenvector and corresponding eigenvalue of a matrix x t x be c i and λ i , respectively (i = 1, 2, ...), and define a matrix Γ by Γ = (c 1 , c 2 , c 3 , ...), and Λ −1/2 by Λ −1/2 = diag(λ , ...).Then the whitened variable x(t) is defined by Here we have used the statistical independence of each component of the normalized source term s i .This means that the matrix Ã is an orthogonal matrix and that W may be identified with t Ã for whitened output data x.Thus we may restrict W to be an orthogonal matrix, too, after appropriate whitening 1 .
Then choosing q(y) as a product of marginal distributions, q(y) = py (y) ≡ i pi (y i ), pi (y i ) = p y (y)dy 1 ...dy i−1 dy i+1 ..., the cost function reads where H i [y i ] ≡ − dy i pi (y i ) ln pi (y i ) is the entropy of the marginal distribution of y i .When W is an orthonormal matrix, only the last term matters to determine W . Hence minimization of the cost function for W is achieved by minimizing entropy of the marginal distribution of each variable.This is the spirit of the FastICA method.It has been proposed to maximize the negentropy defined by which is a positive semi-definite quantity, instead of the entropy itself.Here ν is a random Gaussian variable with vanishing mean and unit variance.
In order to achieve easier implementation of the method, however, we minimize a simpler cost function L(w i ) for each row vector w i constituting the matrix W as W ≡ (w 1 , w 2 , ...).Since W is an orthogonal matrix now, we find |w i | 2 = 1, so the cost function may be defined as where G is an appropriate nonquadratic function and β is a Lagrange multiplier.Minimization of Eq. ( 27) corresponds to solving the following equation: where g(y) = G (y). FastICA solves for this equation starting from an arbitrary initial choice of w i in terms of the Newton method.Several other methods of ICA have also been tested using mock data without incorporating environmental channels [13].

Analysis of iKAGRA data
The initial engineering run of KAGRA without the cryogenic system was done in March and April, 2016 [14].From the results of many time series data we analyzed, we report those of two datasets of 224 second long.One starts from 20:15:11 UTC on April 14, 2016.The other starts from 01:01:35 UTC on April 17, 2016.For each dataset, we calculated correlation between the strain channel and each of 35 physical environmental monitor (PEM) channels.We found that almost all of these channels strongly correlated for the latter dataset (strongly correlated data) and weakly correlated for the former dataset (weakly correlated data) 2 .The amplitude spectrum density (ASD) of strain for each data set is depicted in Fig. 1.We chose two channels which showed large correlation for each dataset.Those channels are listed in Table 1.For both dataset, 4724ch had the largest correlation with the strain.This channel is the output of the seismograph that observes vertical vibration installed at the end of the X arm.Both 4774ch and 4823ch are the outputs of seismographs installed at the end of the Y arm, and they observe horizontal vibration orthogonal to each other.We have made mock strain data injecting sinusoidal continuous waves to the strain channel and applied two methods of ICA, which were introduced in the previous section, to this mock data and those environmental channels.We utilized the python implementation of FastICA from scikit-learn. 3We found that results often depend on initial conditions where the Newton method is started.To mitigate this, we parallelly generated thirty realizations and chose one which gives the highest SNR.

Global performance
First, we analyze how the signal-to-noise ratio (SNR) changes before and after noise separation by ICA for mock data with varying frequencies f .We performed matched filter (MF) analysis to both the raw mock strain data and the noise-removed data in terms of the two methods of ICA using 4724ch as an environmental channel.For various f , we calculated SNR by applying MF with the same frequency as the injected signal.We simultaneously plot the results against the data before and after ICA to assess the global performance of ICA.For strongly correlated data, the results are shown in Fig. 2.
Figure 2: SNR for varying f with and without ICA using 4724ch for the strongly correlated dataset.Red line corresponds to the raw mock strain, while green and blue lines are noiseremoved data using the correlation method and FastICA, respectively.
In this dataset, strain had larger amplitude than the other dataset, and we set A = 9 × 10 −10 .As one can see from Fig. 2, SNRs are homogeneously enhanced by ICA for f 0.1Hz.Correlation method enhances the SNR more than FastICA.However, there are anomalous peaks at frequencies 0.01Hz and 0.04Hz.As shown in Fig. 1(a), even in the absence of injection strain has large amplitude at these frequencies, which is predominantly contributed by seismic noises.We also found that their oscillation phases are more or less stable during the time period we analyzed.Such noises are difficult to be differentiated from our sinusoidal signal waveform and hence yield large SNR of mock strain as shown in Fig. 2.This however indicates that by removing contribution of the noises, the SNR can possibly be reduced rather than enhanced provided the injected signal is moderate.This is actually realized in the analysis based on the correlation method as seen in Fig. 2. On the other hand, in the case of FastICA, the reduction of SNR is not seen.This is solely due to our implementation, which tries to increase the SNR as much as possible as mentioned before.In that sense, around the 0.01Hz and 0.04Hz peaks, blue line in Fig. 2 corresponds to the SNR of the separated noise.Apart from these low frequencies contaminated by seismic noises, we find that ICA improves SNR significantly throughout the entire frequency range with f 0.1Hz.However, based on these considerations, it is deduced that ICA works even near the peak due to seismic noise.For weakly correlated data, the results are shown in Fig. 3.The amplitude of strain at this time period is moderate, and we set A = 3 × 10 −11 .As is seen in Fig. 3, the SNR of the data with ICA is higher than the mock data in several frequency ranges.Comparing FastICA with the correlation method, correlation method has fewer frequencies where the SNR falls below that of mock data.As for weakly correlated data, 4774ch had the second highest correlation with strain.If we use 4774ch instead of 4724ch as the environmental data, the result changes as shown in Fig. 4. Compared with the case 4724ch is used (Fig. 3), the frequency region where the SNR rises is different.As a whole the improvement of SNR is less significant, which is a natural result considering that the correlation coefficient of 4774ch is smaller than that of 4724ch.Next, we perform parameter estimation using strongly correlated data to examine whether ICA can recover correct parameters of injected signal.We injected the signal wave form in Eq. ( 29) with f = 0.125Hz and A = 1.3 × 10 −9 .We applied MF analysis to search for the frequency with the highest SNR which corresponds to the maximum likelihood estimation of the parameter.We compare how the result of parameter estimation changes before and after ICA and how much the SNR changes.
Figure 5 depicts the SNR before and after applying ICA.In this case, we can see the effect of seismic noise directly.By ICA with 4724ch, SNR at f ∼ 0.01Hz is reduced and that at the injected frequency f = 0.125Hz is successfully enhanced.From this result, we deduce that 4724ch is highly correlated to the 0.01Hz peak.On the other hand, the peak of 0.04Hz is still higher, which turned out to be correlated to 4823ch which had second largest correlation with the strain, as we will see below.

Correlation method
As is seen §3.1, the correlation method shows more stable performance than FastICA, although it is much simpler.This method can be generalized to multi-channel analysis.As a first step to multi-channel analysis, here we investigate the effectiveness of three components analysis, which we developed in §2.1, including two PEM channels which strongly correlated to the strain.For this purpose we have used the mock data including the same signal wave as in the previous subsection, and applied the three components correlation method to this mock data, 4724ch and 4823ch.The result is depicted in Fig. 6.We simultaneously plotted the results of two components analysis in which we used 4724ch and 4823ch respectively.The green and black lines correspond to the cases where noises are removed using one PEM channel.While the 0.01Hz peak was reduced by using 4724ch, the 0.04Hz peak was reduced by using 4823ch.However, both peaks can not be reduced by only using one channel.The data with ICA using two PEM channels (cyan line) has much higher SNR than the data with ICA using only one PEM channel.In addition, we successfully reduced both 0.01Hz peak and 0.04Hz peak.This result suggests that by combining many environmental channels we can effectively remove noises with various characteristic frequencies.

FastICA
As explained in §2.2, FastICA can be easily implemented even when there are more than two components.We applied FastICA to the mock data, 4724ch and 4823ch simultaneously.Here, mock data included the same sinusoidal signal as in the previous section.The result is shown in Fig. 7.As compared to Fig. 5, SNR at fiducial frequency is much higher than the case where only 4724ch was used.This result suggests that the use of multiple environmental channels can enhance the effect of FastICA noise separation.However, compared to the case of the three components correlation method, 3ch FastICA can remove only a small amount of noise at the 0.04Hz peak and SNR at fiducial frequency is lower than that with the correlation method (10.60 for FastICA, 10.89 for correlation method).This indicates that correlation method is more effective than FastICA for this dataset.

Parameter estimation for weakly correlated data
We also perform parameter estimation for weakly correlated data.Here, we used 4724ch as an environmental channel.From Fig. 3, ICA using 4724ch is most effective for f = 0.227Hz with this dataset.We injected sinusoidal wave signal with f = 0.227Hz and A = 3×10 −11 .Again, we applied MF to search for the frequency with the highest SNR.The result is depicted in Fig. 8.The red line represents SNR calculated with the raw mock strain.The green and blue lines correspond to the noise-removed strain by the correlation method and FastICA, respectively.An enlarged figure of the fiducial (f = 0.227Hz) area is shown in Fig. 8 (b).As one can see, in the case of the raw mock strain, the position of the SNR peak deviates from the fiducial one.On the other hand, after applying ICA, the SNR is increased and the peak is found at the correct frequency.Next, we applied multiple channels ICA for this data using 4724ch and 4774ch.Here, we used correlation method.Figure 9    The green and black lines correspond to the data with ICA using one PEM channel.When using only 4774ch, enhancement of SNR is small and still the SNR peak deviates from the fiducial frequency.However, the data with ICA using two channels has slightly higher SNR at correct frequency than the data with ICA using only 4724ch.This result for weakly correlated data also supports our expectation that the effect of ICA can be enhanced by combining many environmental channels.

Conclusion
In the present paper, we have demonstrated usefulness of ICA in gravitational wave data analysis in application to the iKAGRA strain and environmental channels.Assuming continuous waves as input signal, we have shown that ICA can enhance SNR in particular when the strain channel has large correlation with environmental ones.Moreover, we have shown that ICA can correctly recover input frequencies in parameter estimation.We have also found that combining multiple environmental channels can enhance the effect of ICA.In this paper we analyzed low frequency range mainly because iKAGRA measured mostly the low frequency seismic noises due to the simplified vibration isolation system compared with the full designed specification which will be realized with bKAGRA.Another reason is that only PEM channels measuring low frequency part were available for iKAGRA.These two problems will be solved in bKAGRA with improved vibration isolation system and various PEM channels, and we should be able to apply ICA in the hectoHerz frequency range relevant to GW physics.We will analyze O3 KAGRA data and apply ICA to them.

Figure 1 :
Figure1: ASD of strain for two datasets.For strongly correlated data, ASD below 0.1Hz becomes much larger than that of weakly correlated data.This means that strongly correlated data is contaminated by seismic noise with low frequency.

Figure 3 :
Figure 3: The same figure as in Fig. 2 but for the weakly correlated dataset.

3. 2
Parameter estimation for strongly correlated data 3.2.1 Two channels ICA

Figure 5 :
Figure 5: Parameter estimation with fiducial frequency f=0.125Hz.Correspondence of each line is the same as in Fig. 2.
(a) Overall view of the result.(b) Around the fiducial frequency.

Figure 8 :
Figure 8: Parameter estimation for weakly correlated data with fiducial frequency f=0.227Hz.
depicts the results of analysis.The enlarged figure of the fiducial area is shown in Fig. 9 (b).
(a) Overall view of the result.(b) Around the fiducial frequency.

Table 1 :
Correlation between PEM channels and strain.