Path optimization in 0+1 dimensional QCD at finite density

We investigate the sign problem in 0+1 dimensional QCD at finite chemical potential by using the path optimization method. The SU(3) link variable is complexified to the SL(3,$\mathbb{C}$) link variable, and the integral path is represented by a feedforward neural network. The integral path is then optimized to weaken the sign problem. The average phase factor is enhanced to be greater than 0.99 on the optimized path. Results with and without diagonalized gauge fixing are compared and proven to be consistent. This is the first step of applying the path optimization method to gauge theories.


Introduction
Solving the sign problem for complex actions in the path integral is one of the grand challenges in theoretical physics. When the Boltzmann weight is a complex-valued highly oscillating function, the strong cancellation takes place and it is difficult to control the numerical error of integration. The sign problem is a generic problem in finite density fermion systems such as finite density Quantum Chromodynamics (QCD). There have been various approaches proposed so far to avoid or evade the sign problem in finite density lattice QCD. These approaches include the reweighting method [1][2][3][4], the Taylor expansion method [5][6][7], the analytic continuation [8][9][10][11][12], and the canonical method by using the imaginary chemical potential [13][14][15][16][17][18], the density of states method [19,20], and the strong coupling QCD [21][22][23][24][25]. Unfortunately, these methods work only in the region of µ/T 1, with heavy quark mass, or in the strong coupling regime at present.
In addition, several approaches based on complexified variables have been developed, and recently attract much attention. It may be possible to discuss physics with severe sign problem. These approaches include the complex Langevin method [26,27], the Lefschetz thimble method [28][29][30], and the path optimization method [31][32][33][34]. In this study, we use the path optimization method.
In the path optimization method, the integration path (or manifold) is prepared appropriately, and optimized so as to weaken the seriousness of the sign problem. The present authors have proposed this method [31], and have applied it to the one-dimensional integral of a highly oscillating complex function [31], the two-dimensional scalar field theory at finite density [32], and the Polyakov-loop extended Nambu-Jona-Lasinio model [33,34]. This method has been also applied to the Thirring model [35] and the one-dimensional complex scalar field theory at finite density [36]. Now, it is important to discuss the sign problem in gauge theories by using the path optimization method. As the first step, we consider 0+1 dimensional QCD (0+1D QCD). 0+1D QCD has the sign problem at finite chemical potential, and the temporal hopping term of quarks is the origin of the sign problem, as in 3+1 dimensional QCD. The analytic solution is known for 0+1D QCD [37], then it is a good laboratory to examine the framework for the sign problem in gauge theory. Actually, 0+1D QCD has been studied by using the complex Langevin method [38], the Lefschetz thimble method [39], the subset method [40,41], and so on.
In this article, as the first step towards investigating the sign problem in gauge theories in 3 + 1 dimension, we apply the path optimization method to 0+1D QCD at finite chemical potential. We develop the method to complexify the SU(3) link variable to that in SL(3,C). The integral path is prepared by using a feedforward neural network, and is optimized to evade the sign problem. We have found that the average phase factor on the optimized path exceeds 0.99, which may allow us to study finite density QCD at higher dimensions. We also compare the results with and without diagonalized gauge fixing, and the results are found to be consistent with each other. This paper is organized as follows. In Sec. 2 and 3, we explain 0+1D QCD and the path optimization method, respectively. We also discuss how to complexify the link variable. In Sec. 4, we show the results with and without diagonalized gauge fixing, and compare these results. In Sec. 5, we summarize this paper.

0 + 1 dimensional QCD
In this study, we consider the 0 + 1 dimensional QCD at finite density with and without diagonalized gauge fixing in the path optimization method. Details are shown below.

Partition function and observables
The partition function of the 0+1D QCD in the Euclidean spacetime is given as [37,42], with the lattice action where we consider one species of staggered fermion χ, U τ is the SU(3) link variable in the temporal direction, m denotes the bare fermion mass, µ is the chemical potential, and N τ is the number of lattice cites in the temporal direction. By using the gauge transformation, one can reduce the degree of freedom to one link variable, and the partition function is rewritten as 2/12 where X = 2 cosh(E/T ), E = arcsinh(m), and T = 1/N τ is the temperature. The determinant can be written by using the Polyakov loop and its conjugate for N c = 3, det D(U ) =X 3 + N c X 2 (e µ/T P U + e −µ/T P U ) + N c X(e 2µ/T P U + e −2µ/T P U + N c P U P U − 1) where P U = TrU/N c and P U = TrU −1 /N c are those before taking the average. By using the one-link integral, dU U ab U −1 cd = δ ad δ bc /N c , we find dU P = dU P = 0 and dU P P = 1/N 2 c . Then the integral of det D(U ) is obtained as Z = dU det D(U ) = X 3 − 2X + 2 cosh(3µ/T ).
An analytic expression of the partition function is known as which agrees with the result for N c = 3 given in the previous paragraph. Then the quark condensate (σ = χχ ) and the quark number density (n q ) in 0+1D QCD are obtained as where the spatial volume is unity in 0+1D QCD, V = 1. The action in Eq. (2) is invariant under the following transformation at m = 0; The quark condensate χχ is not invariant under this transformation, and can be regarded as the order parameter of the symmetry breaking. It should be noted that the above transformation is an analogue to the chiral transformation in 3+1 dimensional QCD, by replacing ǫ τ with ǫ x = (−1) x0+x1+x2+x3 . Thus we refer to this as the chiral symmetry in 0+1D QCD, and the quark condensate χχ is referred to as the "chiral" condensate. The other way to consider the chiral condensate in the odd-dimensional system is using the higher dimensional spinor-representation; see Ref. [43] for the 3-dimensional system. The Polyakov loop for N c = 3 is also obtained as which requires the technique developed in Ref. [41,44]. It is also possible to obtain the Polyakov loop for N c = 3 by using Eq. (5) and the one-link integral, dU U ab U cd U ef = ε ace ε bdf /N c !, which gives dU P 3 U = 1/N 3 c . It should be noted that there is no spatial dimensions, and there is no "deconfinement" in 0+1D QCD. Thus we cannot regard the Polyakov loop as the order parameter of the deconfinement. Nevertheless, the Polyakov loop represents the energy of single quark excitation. 3/12

Diagonalized gauge fixing
By using the remaining gauge degrees of freedom, the link variable can be diagonalized as U diag = diag(e iθ1, , e iθ2 , e iθ3 ), then the integral in Eq. (1) is rewritten as where S eff = − log det D(U diag ), and H(θ) is the Haar measure given by the Vandermonde's determinant [45], The fermion determinant in this gauge is simply written as In this diagonalized gauge, we have only two independent variables, θ 1 and θ 2 , and then explicit estimation of the integral on mesh points is available. We show the comparison between results with and without the gauge fixing in the later discussions.

Path optimization for 0+1 dimensional QCD
In the path optimization method, the trial functions, z(x; C), which represents the optimized integral path should be considered in the complex integral-variable space where C mean the parameters in the trial function. Details are explained below.

Cost function and reweighting
The parameters C are optimized to reduce the cost function which represents the seriousness of the sign problem. Specifically, we adopt the following cost function: with where W represents the Boltzmann weight and · · · pq denotes the phase quenched average; Then, one can calculate the expectation value of observable as By optimization of parameters, we can enhance the average phase factor, and calculate the expectation values with small error, in principle.

With diagonalized gauge fixing
We shall now apply the path optimization method to 0+1D QCD. In the case with diagonalized gauge fixing, we perform integration on the 2D mesh points, where n represents the mesh point. We give the imaginary parts y 1,2 by using the feedforward neural network, with i = 1, 2 and are the output variables of the input and hidden layers in the feedforward neural network. The variational parameters w, b, ω are collectively denoted by C and g(·) is called the activation function. In this paper, we use the hyperbolic tangent function as the activation function. We include the Haar measure in the Boltzmann weight, W = He −Seff . It is also possible to regard the imaginary parts y (n) as the variational parameters when we fix the mesh points; see Methods A and B explained in Sec. 4.

Without diagonalized gauge fixing
In the case without diagonalized gauge fixing, we parametrize the link variable as where λ a (a = 1, ..., 8) are the Gell-Mann matrices and the parameters y a are given by the feedforward neural network in the same way as Eq. (20); The Jacobian matrix can be calculated as follows. If the local coordinate x (|x| ≪ 1) around U ∈ SU(3) is given as e ixaλa U , the Haar measure at U is obtained as d 8 x = dx 1 dx 2 · · · dx 8 . Similarly, we consider the local coordinate z around U (U ) ∈ SL(3, C) as e izaλa U (U ), where z is given as a function of x by the implicit equation, e izaλa U (U ) = U (e ixaλa U ). By differentiating this equation with respect to x, we have The Jacobian matrix is then obtained as Therefore, the Haar measure at around U (U ) is given by To calculate the phase-quenched expectation value and to evaluate the cost function, we need the Monte-Carlo sampling, where U (k) denotes the SU (3)  details about the way to evaluate the cost function F and to optimize the parameters C stochastically are explained in Ref. [32].
It should be noted that we complexify the SU(3) matrix U to an SL(3; C) matrix U , rather than complexifying x a . For example, y a in Eq. (21) is the imaginary part of z a , complexified variable of x a , only when x and y are small enough. A more natural parametrization would be U (U ) = exp(iz a λ a ) with z a = x a + iy a , but the derivative of U with respect to y becomes complicated in this parametrization. Function forms other than that in Eq. (21) can be acceptable. The parametrization dependence should be compensated by the Jacobian.

Numerical results
As mentioned in the previous section, we perform the numerical calculation in three different ways as follows.

Method A:
The 2D mesh integral in the diagonalized gauge where the imaginary parts are regarded as the variational parameters.

Method B:
The 2D mesh integral in the diagonalized gauge where the imaginary parts are given by using the feedforward neural network.

Method C:
The Monte-Carlo sampling without diagonalized gauge fixing where the mapped SL(3) link variable is given by using the feedforward neural network. It should be noted that this method is a promising way to be utilized in realistic 3 + 1 dimensional QCD simulation.
We first summarize our numerical setup below, and next we show our numerical results.

Numerical setup
In the Methods A and B, we utilize the standard gradient descent method to optimize the imaginary parts via the equation,Ċ = −∂F cost /∂C, which shows the fictitious time evolution. In the Method A, we prepare 30 × 30 or 60 × 60 mesh points to perform the integration, and we update the parameters with the fictitious time step ∆t = 10 −3 ∼ 10 −2 .
We also invoke smearing of the imaginary part. When the average phase factor is larger after averaging the imaginary part with the adjacent mesh points, we adopt that configurations as a further optimized path. In the Method B, the number of mesh points is 25 × 25, and the learning rate is set to ∆t = 10 −3 ∼ 10 −2 . In the Method C, the optimization is performed by using the stochastic gradient descent method. Actually, we employ Adadelta [46] as the optimizer. Parameters in the 0+1D QCD action are given as m = 0.05, T = 0.5 (N τ = 2), and the chemical potential range of µ/T = 0 ∼ 2 is discussed. We note that the quark number density is almost saturated in the region µ/T > 2.

Results with diagonalized gauge fixing: Methods A and B
Let us now discuss the numerical results in the diagonalized gauge. In Fig. 1 imaginary part is suppressed strongly as shown in the middle panel. This suppression comes mainly from the suppression of the complex phase of the Boltzmann weight, and in part from the cancellation of the complex phase of the Jacobian and the Boltzmann weight. In the right panel of Fig. 1, we show the imaginary part of the Boltzmann weight, Im(W ), which has larger absolute values compared with Im(JW ). This is one of the merits of using the path optimization method: The optimization including the complex Jacobian effects leads to smaller imaginary part of JW .  In Fig. 2, we show y 1 , the imaginary part of z 1 , as a function of real parts, (x 1 , x 2 ). The imaginary part of z 2 is obtained by exchanging x 1 and x 2 . We compare the results in the Method A (left) and those in the Method B (right). The standard gradient descent method and the feedforward neural network give qualitatively the same but somewhat different 7/12 results of y 1 . In the statistically significant region (inside the white curve), these results are more consistent with each other. Since the cost function does not have sensitivity to the region with small weight, results can be different outside of the statistically significant regions as we expected. Results with mesh points of 30 2 (red circles) and 60 2 (blue squares) are shown in comparison with the average phase factor without optimization. We show the difference of the absolute values of the average phase factor from unity, 1 − | e iθ |.
In Fig. 3, we show the average phase factor as a function of µ/T obtained on the 30 2 and 60 2 mesh points in the Method A. We show the difference from unity of the average phase factor. After optimization, the average phase factor is enhanced to be greater than 0.997, i.e. 1 − | e iθ pq | < 3 × 10 −3 . While the average phase factor is large even without optimization, | e iθ pq | > 0.95, the difference with and without optimization appears strongly in finite spatial volume cases, as discussed later. The minimum average phase factor appears at around µ/T = 0.5.
As already mentioned, we clearly see that the statistical weight has six separated regions on the (x 1 , x 2 ) plane in the diagonalized gauge. This separation comes from the structure of the Haar measure and it causes the numerical problem in Monte-Carlo sampling: it is difficult to sample all of relevant regions beyond the energy barriers between them. It is, of course, not a problem in the mesh point integration, but we cannot apply the mesh point integration to more realistic problems in field theories because of its enormous numerical cost. Thus we proceed to discuss the results based on the Monte-Carlo integral without the diagonalized gauge fixing.

Results without diagonalized gauge fixing: Method C
Next, we discuss the results without diagonalized gauge fixing. Since we have eight independent variables, the mesh point integration is not possible to perform. We utilize the feedforward neural network to prepare and optimize the path, and we apply the hybrid Monte-Carlo method to sample the configurations in the Method C. Figure 4 shows the average phase factor with and without the path optimization. The path optimization is performed by using the method C and the results without the path optimization is estimated by 8/12 using the 2D mesh point integration. We can clearly see that the path optimization increases the average phase factor in all values of µ/T shown in the figure.  In Fig. 5, we show the expectation values of the chiral condensate (σ), the quark number density (ρ), and the Polyakov loop (P ), as functions of µ/T in comparison with exact results. Here, we use N conf = 1000 for σ and ρ and N conf = 10000 for P . Since the fluctuation of P is larger than those of σ and ρ, we need larger number of configurations. The expectation values on the optimized integral path agree well with exact results within the error bar. The chiral condensate decreases rapidly in the µ/T = 0.5 ∼ 1.0 region and the Polyakov loop seems to stay at small values above µ/T > 2.
It is interesting to check if we can reproduce the statistical weight distribution obtained in the diagonalized gauge (Methods A and B) shown in the left panel of Fig. 1, by using configurations obtained in the method C. We diagonalize the link variable by the similarity transformation, U = P U diag P −1 , and (z 1 , z 2 ) are specified by the diagonal matrix elements, 9/12 µ/T=1, T=0.5, m=0.05 The scatter plot of (x 1 , x 2 ) in the Method C. The variables (x 1 , x 2 ) are obtained by diagonalizing the link variables, specifying the eigenvalues as (e iz1 , e iz2 , e iz3 ), and taking the real parts of (z 1 , z 2 ). The configurations are generated by using the hybrid Monte-Carlo method.
U diag = diag(e iz1 , e iz2 , e iz3 ). Figure 6 shows the scatter plot of the real part of z 1 and z 2 . From the figure, we can see that the relevant six regions are actually visited in the Markov-chain hybrid Monte-Carlo sampling.

Summary
In this article, we have examined the validity and usefulness of the path optimization method in the 0 + 1 dimensional QCD. The 0 + 1 dimensional QCD is a toy model of the realistic 3 + 1 dimensional QCD, but it contains the same terms, the temporal hopping terms of quarks, which causes the sign problem in 3 + 1 dimension; we can use it as a good laboratory to investigate several issues of the sign problem. We have found that the path optimization method combined with the feedforward neural network is a useful tool to study the sign problem from several reasons. First, it can well improve the average phase factor. Second, the statistical weight distributions with and without diagonalized gauge fixing agree with each other. We find that, in 0+1D QCD with diagonalized gauge fixing, the statistical weight distribution is separated into six regions, which could prevent us from sampling configurations equally well from these regions in Markov-chain Monte-Carlo methods. However, the six regions are found to be well visited in Monte-Carlo configurations generated without diagonalized gauge fixing. This fact indicates that the energy barriers in the variable space with the diagonalized gauge, (x 1 , x 2 ), are apparent ones, and do not cause trouble in gauge unfixed calculations. Thirdly, some of the observables are demonstrated to be obtained correctly. We have calculated the chiral condensate, the quark number density, the Polyakov loop and its conjugate as functions of µ/T , and these results reproduce the exact ones within the errors. This is not surprising, since the integral of holomorphic functions are independent of the path as long as the path does not go through the singular points of the Boltzmann weight, which do not exist in the region of finite imaginary parts of integral variables.
The average phase factor without the gauge fixing on the optimized path is found to be larger than 0.995. This value is corresponding to the situation that the average phase factor 10/12 is about ∼ 0.08 on the 8 3 × N τ lattice, provided that other terms in the 3+1 dimensional QCD do not make the average phase factor worse. The average phase factor of 0.08 seems to be small, but it is not impossible to perform the Monte-Carlo sampling in the current computer power. Compared with the 2D mesh point integration, the average phase factor is still smaller. Then it would be possible to further optimize the integral path by, for example, modifying the cost function to be more sensitive to the average phase factor at around e iθ pq ≃ 1. The path optimization method, thus, may be a promising method to overcome the sign problem in realistic QCD, if we can reduce the numerical cost; for example, we need to reduce the Jacobian computation. The promising way is the diagonal ansatz of the Jacobian matrix [35] and the nearest-neighbor lattice-cite ansatz [36].