Application of neural network to sign problem via path optimization method

We introduce the feedforward neural network to attack the sign problem via the path optimization method. The variables of integration are complexified, and the integration path is optimized in the complexified space by minimizing the cost function, which reflects the seriousness of the sign problem. For the preparation and optimization of the integral path in multi-dimensional systems, we utilize the feedforward neural network. We examine the validity and usefulness of the method in the two-dimensional complex $\lambda \phi^4$ theory at finite chemical potential as an example of a quantum field theory having the sign problem. We show that the average phase factor is significantly enhanced after the optimization, so that we can safely perform the hybrid Monte-Carlo method.


Introduction
The feedforward neural network [1][2][3][4] is a widely used mathematical model in machine learning. Its name originates from similarities with animal brains: there are input, hidden, and output layers, and their connections mimic those in brains. The model contains parameters that are tuned to learn training data and/or to minimize a function, the so-called cost function. Therefore, this model can be used in optimization problems, and thus we can apply it to attack the sign problem.
The sign problem is caused by the integral of a rapidly oscillating function, and thus it appears in many fields of physics such as elementary particle physics, hadron and nuclear physics, and condensed matter physics. In particular, the sign problem has attracted much attention recently in Quantum Chromodynamics (QCD) at finite baryon density because of remarkable progress based on the complex Langevin method [5,6] and the Lefschetz thimble method [7][8][9][10][11]. In the complex Langevin method, we generate configurations by solving the complex Langevin equation, and the sign problem does not appear. However, the complex Langevin method may provide wrong results when the drift term shows singular behavior [12,13]. In comparison, the Lefschetz-thimble method is based on the Picard-Lefschetz theory [7] and stays within the standard path-integral formulation. We construct the integral paths referred to as Lefschetz thimbles by solving the holomorphic flow starting from fixed points. Then the partition function can be expressed as the sum of contributions from relevant Lefschetz thimbles. The relevance of a thimble is determined by the crossing behavior of the dual thimble, the holomorphic flow into the fixed point. Relevant thimbles can also be obtained by deforming the original integral path according to the holomorphic flow [10]. On each Lefschetz thimble, the imaginary part of the action is constant, but the Jacobian generated by the bending structure of the integral path causes a phase (the residual sign problem). In addition, there is cancellation between different relevant thimbles (the global sign problem). Recently, one more serious problem in the Lefschetz-thimble method has been discussed [14]: if the action contains a square root or a logarithm, singular points and cuts appear in the complexified variables of integration, which obstruct drawing continuous Lefschetz thimbles in the numerical calculation of the holomorphic flows. Thus it is valuable to develop a method in which an appropriate integral path is obtained without suffering from the singular points of the action.
Recently, the present authors have proposed a new method to attack the sign problem, the so-called path optimization method [15]. The path optimization method is based on the complexification of the variables of integration. We prepare a trial function representing the modified integral path, given as a simple function with a few parameters. The parameters are tuned to minimize the cost function, which reflects the seriousness of the sign problem. In Ref. [15], we have demonstrated that the optimized path matches the Lefschetz thimbles around the fixed points in a one-dimensional model with a serious sign problem. In systems with many degrees of freedom, it is too tedious to prepare the functional form of the trial function by hand, so it is helpful to apply machine learning techniques to obtain the optimized integral path.
Machine learning has recently been utilized to attack the sign problem [11,16]. In Ref. [16], a mono-layer neural network is adopted, and the optimized path is found to agree with that in Ref. [15]. In Ref. [11], a multi-layer neural network is applied to the Thirring model with supervision by some of the configurations from the generalized Lefschetz thimble method.
In this article, we use the feedforward neural network with a hidden mono-layer to apply the path optimization method to a quantum field theory with a sign problem, the complex λφ^4 theory at finite chemical potential. The average phase factor, the scatter plot of the fields, and the expectation value of the number density are investigated. Some attempts have been made to apply machine learning to QCD [17][18][19], but our application is a bit different; our machine learning is unsupervised learning, and thus the process is similar to that for board games [20,21] rather than that for image recognition [22]: we optimize our integral path in the plane of complexified variables of integration, aiming to overcome the sign problem.
This paper is organized as follows. In Sec. 2, we explain the formalism of the path optimization method. Section 3 shows details of the feedforward neural network. In Sec. 4, we examine the method in the two-dimensional λφ^4 theory. The action and the numerical setup are explained in Sec. 4.1 and Sec. 4.2, respectively, and results are shown in Sec. 4.3. Section 5 is devoted to a summary.

Path optimization method
The path optimization method varies the integral path in the plane of complexified variables of integration to minimize the cost function [15]; the cost function should reflect the seriousness of the sign problem, so the optimization process controls how precise the numerical integration is. The actual procedure is as follows. For a theory with n degrees of freedom, we start from the complexification of the variables of integration with parametric variables t: x_i ∈ R → z_i(t) ∈ C, where i = 1, ..., n. Next, we set a trial function containing some parameters to represent the modified integral path. The optimization of the cost function is then performed by tuning the parameters in the trial function.
One candidate for the cost function is given in Eq. (1), where Z is the partition function, J(t) = det(∂z_i/∂t_j) is the Jacobian, and ⟨e^{iθ}⟩_pq is the average phase factor; the subscript pq denotes the phase-quenched average. The first term in the cost function, Eq. (1), is proportional to the inverse average phase factor, ⟨e^{iθ}⟩_pq^{-1}. This cost function therefore represents the seriousness of the sign problem and depends on the detailed shape of the deformed integral path.
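As a concrete illustration of the quantity being optimized, the sketch below estimates the average phase factor from sampled complex weights J e^{-S}; the function name and the toy inputs are ours, not the paper's, and the configurations are assumed to be drawn with the phase-quenched weight |J e^{-S}|.

```python
import numpy as np

def average_phase_factor(S_values, J_values):
    """Estimate <e^{i theta}>_pq from configurations sampled with
    the phase-quenched weight |J e^{-S}| (illustrative sketch)."""
    w = J_values * np.exp(-S_values)   # complex weight J e^{-S}
    phase = w / np.abs(w)              # e^{i theta} per configuration
    return phase.mean()

# Toy check 1: purely real, positive weights have no sign problem,
# so the average phase factor is exactly 1.
S = np.array([0.5, 1.0, 2.0], dtype=complex)
apf = average_phase_factor(S, np.ones_like(S))

# Toy check 2: weights +1 and -1 cancel, giving a vanishing factor.
S2 = np.array([0.0, 1j * np.pi])
apf2 = average_phase_factor(S2, np.ones_like(S2))
```

When the phases cancel as in the second check, the statistical error of any reweighted observable blows up, which is exactly the seriousness that the cost function penalizes.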
In the calculation of the cost function, Eq. (1), we use the fact that the integral of an analytic function is independent of the path, due to Cauchy's integral theorem: integrals along two different paths are the same as long as the contribution from infinity, |z| = ∞, is zero and the deformed path does not cross singular points. In this paper, we treat the complex φ^4 theory, whose Boltzmann weight falls off more rapidly than a Gaussian at |Re z| → ∞ with |Im z| < ∞. Thus, we restrict the integral path so that the imaginary parts of the variables are finite. Under this condition, the expectation values of observables are also path-independent.
After setting the cost function, the next task is to arrange the functional form of the trial function. In Ref. [15], a simple trial function is adopted, and the optimization is demonstrated to be successful in a one-variable model. It is natural to expect that this task becomes difficult in complicated systems with many degrees of freedom, such as field theories. To circumvent this problem, we use the feedforward neural network as a practical solution instead of arranging the functional form of the trial function by hand.

Neural network
In this section, we give a brief review of the feedforward neural network and explain how to use it in the path optimization method. The feedforward neural network is a basic algorithm containing input, hidden, and output layers, as shown in Fig. 1. Each unit in a layer receives a weighted sum of the signals from the previous layer and emits g(Wt + b), where W and b are parameters for the optimization and characterize the feedforward neural network. The function g(x) is a non-linear function, the so-called activation function; the sigmoid function, g(x) = 1/(1 + e^{−x}), and the hyperbolic tangent are typical examples.
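A minimal numerical sketch of one such layer (weights, bias, and a tanh activation; the shapes and seed are illustrative choices of ours):

```python
import numpy as np

def layer(t, W, b, g=np.tanh):
    """One feedforward layer: unit j outputs g(sum_i W[j,i]*t[i] + b[j])."""
    return g(W @ t + b)

rng = np.random.default_rng(seed=1)
W = rng.normal(size=(8, 4))   # 8 hidden units, 4 inputs
b = np.zeros(8)
t = rng.normal(size=4)        # input signal
h = layer(t, W, b)            # hidden-layer output, bounded by tanh
```

Stacking such layers (here a single hidden one suffices, per the universal approximation theorem discussed below) gives the full feedforward map from input to output.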
It should be noted that we could use a deep neural network with three or more hidden layers, but the hidden mono-layer is found to be sufficient for the optimization in the present computation: the feedforward neural network, even with a hidden mono-layer, can approximate any continuous function on a compact subset as long as a sufficient number of units is prepared in the hidden layer, as stated by the universal approximation theorem [23,24].
In practice, we regard t (f) as the input (output) and give the integral path as z_i(t) = t_i + i [α_i f_i(t) + β_i], Eq. (7), where α_i and β_i are parameters. For simplicity, we assume that the real part is not modified. We use the hyperbolic tangent as the activation function in the feedforward neural network. To update α_i, β_i, and the parameters in f_i, we employ the back-propagation algorithm [25].
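Putting the pieces together, the modified path can be sketched as follows; the two-layer shapes and the names `W`, `b`, `w_out`, `b_out` are our illustrative assumptions for a mono-hidden-layer tanh network, not the paper's notation.

```python
import numpy as np

def modified_path(t, W, b, w_out, b_out, alpha, beta):
    """z_i(t) = t_i + i * (alpha_i * f_i(t) + beta_i), where f is the
    output of a tanh network with a single hidden layer (a sketch)."""
    hidden = np.tanh(W @ t + b)           # hidden layer
    f = np.tanh(w_out @ hidden + b_out)   # network output f_i(t)
    return t + 1j * (alpha * f + beta)    # real part left unmodified

# With alpha = beta = 0 the path reduces to the original real axis.
n, n_unit = 4, 6
rng = np.random.default_rng(2)
W, b = rng.normal(size=(n_unit, n)), np.zeros(n_unit)
w_out, b_out = rng.normal(size=(n, n_unit)), np.zeros(n)
t = rng.normal(size=n)
z = modified_path(t, W, b, w_out, b_out, np.zeros(n), np.zeros(n))
```

Keeping the real part equal to t makes the map trivially invertible along the real direction, so the Jacobian reduces to det(∂z_i/∂t_j) with a simple structure.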
To evaluate the cost function, Eq. (1), with the feedforward neural network and the hybrid Monte-Carlo method, we normalize Eq. (1) by ∫ d^n t P_0(t) with an appropriate probability distribution P_0(t), where t^(k) denotes the k-th configuration sampled according to P_0(t) and N is the number of configurations. Let c = {c_i} be the set of parameters appearing in the feedforward neural network, z(t) = z(t, c); the derivatives of F with respect to c_i are then evaluated configuration by configuration, with F_i denoting the derivative for each configuration. Here we have used the relation arg(Σ_k J(t^(k)) e^{−S(z(t^(k)))}/P_0(t^(k))) ≃ arg(Z) = θ_0, which holds for a large number of Monte-Carlo configurations. Because Eq. (9) is expressed as a summation, stochastic gradient descent (SGD) is available for the optimization. In this study, we employ as the optimizer the ADADELTA algorithm [26], an extension of SGD that prevents the learning rate from decaying, and we compare it with the Adam algorithm [27] to check the stability. In the ADADELTA algorithm, we update the parameters c in fictitious time steps j, where η is a parameter called the learning rate. The direction of the update, v_i, is obtained from the accumulated mean squares, where γ is the decay constant and ε is a small positive number introduced to avoid the divergence. The variables r and s denote the mean square values of F_i and v_i, respectively, and we set r^(0) = 0, s^(0) = 0 as the initial condition. In the (j + 1)-th update, we evaluate the variables in the order starting from r. After many updates, the averaged derivatives ⟨F_i⟩ vanish, where ⟨· · ·⟩ denotes the average over the updates. The optimization algorithm shown here has been found to work well empirically, and should be used with care. Nevertheless, the parameters converge when all the derivatives F_i become zero, as in standard gradient methods.
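A hedged sketch of the standard ADADELTA-style update described above (the paper's exact bookkeeping of r and s may differ slightly; γ, ε, and the extra learning rate η follow the text):

```python
import numpy as np

def adadelta_step(c, grad, r, s, gamma=0.95, eps=1e-6, eta=1.0):
    """One parameter update. r and s accumulate the mean squares of
    the gradient and of the update direction v, respectively."""
    r = gamma * r + (1.0 - gamma) * grad ** 2         # gradient mean square
    v = -np.sqrt(s + eps) / np.sqrt(r + eps) * grad   # update direction
    s = gamma * s + (1.0 - gamma) * v ** 2            # update mean square
    return c + eta * v, r, s

# Usage: minimize a toy cost F(c) = c^2, whose gradient is 2c.
c, r, s = 3.0, 0.0, 0.0
for _ in range(2000):
    c, r, s = adadelta_step(c, 2.0 * c, r, s)
```

Note how the ratio sqrt(s)/sqrt(r) sets the step size adaptively, so no hand-tuned global scale is needed; the initial steps are small (of order sqrt(ε)) and grow as the accumulators fill.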
In the actual optimization process, we use mini-batch training to make the optimization faster and easier: we divide the configurations as N = K N_batch, where N_batch is the batch size, and the learning is performed batch by batch. The parameters in the feedforward neural network are then updated by using the mean value of F_i over each batch. In one optimization procedure we perform K updates with N_batch configurations each.

Application to two-dimensional complex λφ^4 theory

Two-dimensional complex λφ^4 theory
In order to examine the applicability and usefulness of the path optimization method in quantum field theories, we consider the two-dimensional complex λφ^4 theory at finite chemical potential (µ) [28][29][30][31] as an example. The action is defined in lattice units in Eq. (16), where a, b = 1, 2 and φ_1, φ_2 ∈ R are field variables; we assume these variables satisfy periodic boundary conditions. In the path optimization method, we complexify the variables, φ_{a,x} → z_{a,x} ∈ C, with Eq. (7). The number density is then obtained from the µ-derivative of the logarithm of the partition function, where V is the lattice volume. It should be noted that the action, Eq. (16), has a global U(1) symmetry: the action is invariant under rotations in the (φ_1, φ_2) plane, (φ_1, φ_2) → (φ_1 cos q − φ_2 sin q, φ_1 sin q + φ_2 cos q), where q is a rotation angle. Then the distribution of (φ_1, φ_2) should, in principle, be invariant under the U(1) transformation even after complexification. In one set of Monte-Carlo samples of configurations, however, the distribution can break the U(1) symmetry spontaneously. We will discuss this point in Subsec. 4.3.
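The global U(1) rotation can be written down directly; the quick check below (ours, for illustration) confirms that it preserves φ_1² + φ_2² — the invariance of the full action under this rotation is the statement in the text.

```python
import math

def u1_rotate(phi1, phi2, q):
    """Global U(1) rotation in the (phi1, phi2) plane by angle q."""
    return (phi1 * math.cos(q) - phi2 * math.sin(q),
            phi1 * math.sin(q) + phi2 * math.cos(q))

# The rotation leaves the quadratic invariant phi1^2 + phi2^2 unchanged.
p1, p2 = 0.7, -1.3
r1, r2 = u1_rotate(p1, p2, q=0.4)
```

Any deviation of the sampled (φ_1, φ_2) distribution from symmetry under this rotation therefore signals the spontaneous breaking discussed in Subsec. 4.3.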

Numerical setup
We have investigated the range of µ from 0.2 to 2.0 with an interval of ∆µ = 0.2. The parameters of the theory are fixed as m = 1 and λ = 1. With this setup, the mean-field treatment predicts that the number density starts to increase at around µ = 1.
The lattice sizes used in this article are L = L_t = L_x = 4, 6, and 8. The step size of the molecular dynamics in the hybrid Monte-Carlo method is set to ∆τ = τ/n_step = 0.1/10 = 0.01, where n_step is the number of steps. In the optimization procedure, the number of units in the hidden layer, N_unit, is set to 2^5, 2^6, and 2^8 for L = 4, 6, and 8, respectively. For the ADADELTA algorithm, we use η = 0.1/N_ite, where N_ite = 1, 2, ... counts the optimizations, γ = 0.95, and ε = 10^{−6}. In the mini-batch training, we use N_batch = 10.
To calculate the expectation values, we have generated 10^4 configurations for each optimization, and the expectation values are estimated after N_ite = 20 optimizations for L = 4 and 6, discarding the first 2000 configurations for thermalization. In the case of L = 8, we use N_ite = 30. Statistical errors are estimated by the jackknife method.
In the actual numerical calculation, the optimized integral path is obtained by the following procedure:
(1) Take initial parameters in Eq. (7) from those obtained at smaller µ, or randomly with the Xavier initialization [32].
(2) Generate configurations with the phase-quenched probability P_0, where z_0(t) is the original path. A sufficient number of configurations should be generated.
(3) Select configurations randomly and compute the mean of F_i(t, c) over N_batch configurations to update the parameters in Eq. (7) by back-propagation with mini-batch training.
(4) Regenerate configurations with P_0(t) = |J e^{−S}|_{z=z(t)}, where z(t) is the modified path.
(5) Check the average phase factor. If the average phase factor is not large enough and the iteration number N_ite is still small, go back to step 3.

Fig. 2 The top and middle panels show the real part of the average phase factor without and with the optimization as a function of µ. The bottom panel shows the imaginary part of the average phase factor with the optimization. Circles, squares, and crosses are results for L = 4, 6, and 8, respectively.
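The loop above can be illustrated end to end in a one-variable toy model (not the paper's field theory): for the action S(z) = z²/2 + i h z, shifting the path to z = t + iβ with β = −h removes the phase entirely, since S(t − ih) = t²/2 + h²/2 is real. The sketch below optimizes β by naive gradient ascent on a grid estimate of the average phase factor; all names, the grid, and the finite-difference gradient are our assumptions.

```python
import numpy as np

h = 1.0
t = np.linspace(-6.0, 6.0, 1201)  # integration grid for the real variable t

def avg_phase_factor(beta):
    """|integral of e^{-S}| / integral of |e^{-S}| along z = t + i*beta
    (Jacobian of a constant shift is 1); analytically e^{-(beta+h)^2/2}."""
    z = t + 1j * beta
    w = np.exp(-(z ** 2 / 2.0 + 1j * h * z))  # Boltzmann weight e^{-S}
    return abs(w.sum()) / np.abs(w).sum()     # grid spacing cancels

beta = 0.0  # start from the original real path
for _ in range(200):
    # finite-difference gradient ascent on the average phase factor
    grad = (avg_phase_factor(beta + 1e-4) - avg_phase_factor(beta - 1e-4)) / 2e-4
    beta += 0.5 * grad
```

The optimizer drives β toward −h, where the phase factor approaches 1 — a miniature version of steps (1)-(5), with grid integration standing in for the hybrid Monte-Carlo sampling.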

Numerical results
We first discuss the average phase factor, ⟨e^{iθ}⟩_pq. The top (middle) panel of Fig. 2 shows the µ-dependence of the real part of the average phase factor, Re ⟨e^{iθ}⟩_pq, without (with) optimization. The real part of the average phase factor without path optimization rapidly decreases with increasing µ, and it becomes almost zero at µ > 1 on the larger lattice, L = 8. With path optimization, Re ⟨e^{iθ}⟩_pq takes larger values than without optimization. In particular, the results for L = 4 show Re ⟨e^{iθ}⟩_pq ∼ 1 in the range 0 ≤ µ ≤ 2. For L = 8, Re ⟨e^{iθ}⟩_pq takes smaller values than on the L = 4 and L = 6 lattices, with a minimum of ∼ 0.4 at µ ∼ 1.4. The minimum is well above zero, and we can safely obtain the expectation values of observables. The bottom panel of Fig. 2 shows the µ-dependence of Im ⟨e^{iθ}⟩_pq with the optimization. We find that Im ⟨e^{iθ}⟩_pq takes smaller values than the real part. We also note that |Im ⟨e^{iθ}⟩_pq| becomes smaller after the path optimization.

Fig. 3 The expectation value of the number density in the Monte-Carlo calculations (solid lines with symbols) and in the mean-field approximation (dashed line). The shaded area shows the expectation value without the path optimization method at L = 8.
Since the average phase factor is large enough with path optimization, it becomes possible to discuss observables such as the number density. The top (bottom) panel of Fig. 3 shows the expectation value of the real (imaginary) part of the number density as a function of µ. We can see that Re ⟨n⟩ starts to grow rapidly around µ = 1, while Im ⟨n⟩ is much smaller than Re ⟨n⟩. This tendency of Re ⟨n⟩ is consistent with the four-dimensional case; see Refs. [28][29][30][31] and references therein. The present results are also consistent with the mean-field results, where the fields are assumed to be homogeneous and static. In this case, the action simplifies to a function of φ² = φ_1² + φ_2², with φ²_stat. denoting the stationary value. The critical chemical potential is µ_c = arccosh(1 + m²/2) ≃ 0.962. The green dashed line in the left panel of Fig. 3 shows the mean-field results. Except around µ ≃ 1.4, the mean-field results approximately explain the optimized-path results.
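The quoted mean-field critical chemical potential follows directly from the formula above with the parameters of the numerical setup:

```python
import math

m = 1.0  # mass parameter, as in the numerical setup (m = 1)
mu_c = math.acosh(1.0 + m ** 2 / 2.0)  # mean-field critical chemical potential
# mu_c ≈ 0.9624, matching the quoted value 0.962
```

This is consistent with Fig. 3, where Re ⟨n⟩ starts to grow rapidly just below µ = 1.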
Next, we discuss the distribution of the field variables. Figure 4 shows the scatter plot of Re z_1 and Re z_2 on the lattice sites in sampled Monte-Carlo configurations at µ = 0.6, 1.0, 1.4, and 2.0. In the µ ≳ 1.0 region, we find that the distribution departs from the origin and breaks the U(1) symmetry: the deviation from a rotationally symmetric distribution signals spontaneous symmetry breaking. Since the field distribution is determined by the optimized path, we arrive at the observation that the path optimization breaks the symmetry spontaneously.

Summary
In this article, we have introduced the feedforward neural network to attack the sign problem appearing in quantum field theories via the path optimization method. To obtain a better integral path in the complexified space, we have utilized the feedforward neural network and optimized its parameters to minimize the cost function, which reflects the seriousness of the sign problem; this procedure corresponds to increasing the average phase factor. As an example of a quantum field theory, we have considered the two-dimensional complex λφ^4 theory on the lattice at finite chemical potential.
We have performed lattice simulations on L^2 lattices with L = 4, 6, and 8, with model parameters m = 1 and λ = 1. In the actual calculation, we have used the ADADELTA algorithm for the optimization and generated configurations by using the hybrid Monte-Carlo method.
We have found that the average phase factor becomes significantly larger after the optimization compared with that on the original integral path. In the case of L = 4, the average phase factor is close to 1; by comparison, it decreases to about 0.4 at its minimum in the case of L = 8. The average phase factor is still large enough to discuss observables such as the number density. The hybrid Monte-Carlo samples show that spontaneous symmetry breaking can take place at µ > 1.0. This observation suggests that the optimized path may break the symmetry spontaneously.
Let us comment on another work on the sign problem using machine learning, developed in Ref. [11]. In that work, the authors introduced machine learning to construct a new integral path from a feedforward neural network trained on a few field configurations obtained by solving the holomorphic flow equations; the learning there is thus supervised. By comparison, the present path optimization method employs unsupervised learning, so we do not need teacher data and simply try to enhance the average phase factor. It would be interesting to start from the integral path obtained with supervision and then further optimize it without supervision, taking care of the Jacobian phase effects.
As future work, it would be interesting to apply the path optimization method with the artificial neural network to lattice gauge theories, because the present study shows that the path optimization method can work well in a simple quantum field theory.

Fig. 1
Fig. 1 Schematic figure of the feedforward neural network. Arrows indicate the propagation of signals.

Fig. 4
Fig. 4 The scatter plot of Re z_1 and Re z_2 in the case of L = 8 at µ = 0.6, 1.0, 1.4, and 2.0. The solid line shows the mean-field result, i.e., the homogeneous stationary points of the action.