Parallel tempering algorithm for integration over Lefschetz thimbles

The algorithm based on integration over Lefschetz thimbles is a promising method to resolve the sign problem for complex actions. However, this algorithm often meets a difficulty in actual Monte Carlo calculations because the configuration space is not easily explored due to the infinitely high potential barriers between different thimbles. In this paper, we propose to use the flow time of the antiholomorphic gradient flow as an auxiliary variable for the highly multimodal distribution. To illustrate this, we implement the parallel tempering method by taking the flow time as a tempering parameter. In this algorithm, we can take the maximum flow time to be sufficiently large such that the sign problem disappears there, and two separate modes are connected through configurations at small flow times. To exemplify that this algorithm does work, we investigate the (0+1)-dimensional massive Thirring model at finite density and show that our algorithm correctly reproduces the analytic results for large flow times such as T=2.


Introduction
There are many cases where one needs to deal with complex actions.One important example for high energy/nuclear physics is quantum chromodynamics (QCD) at finite density (see Ref. [1] for a review on the recent developments).However, since a complex action does not give a real and positive Boltzmann weight, one cannot directly resort to the traditional Markov chain Monte Carlo methods to estimate correlation functions.The so-called reweighting algorithm (which absorbs the phase of the weight into observables) is highly ineffective when the imaginary part of the action becomes very large (such as in the thermodynamic limit), because one needs to take a sample from a configuration space where the weights of nearby configurations have almost the same amplitudes but very different phases.The difficulty of numerical evaluation in such a situation is termed the sign problem.
There have been many proposals to circumvent the sign problem.One of the approaches that are currently under intense study is the use of integration over Lefschetz thimbles [2] (see also Refs.[3,4,5,6,7,8,9,10]), which we will call the Lefschetz thimble algorithm hereafter.There, the original real-valued variable (say, x = (x i ) ∈ R N ) is complexified according to the antiholomorphic gradient flow (sometimes called the upward flow in the literature) żi = [∂ i S(z)] * .In the original Lefschetz thimble algorithm, as will be reviewed in Sect.2.1, the flow time is taken to infinity to map the original configuration space to a union of Lefschetz thimbles.Since the imaginary part of the action is constant on each thimble, the sign problem disappears if the path integral is made over the Lefschetz thimbles.However, since two different thimbles are separated by an infinitely high potential barrier, one needs to invent some machinery to incorporate contributions from all relevant thimbles.Recently, Alexandru et al. [11] was a very interesting proposal to consider configurations on a manifold that is obtained from the original configuration manifold by a finite amount of flow time.This algorithm was a great success in various models [11,12], but reducing the amount of flow time may also reduce the effectiveness against the sign problem, and one does not know a priori whether the chosen flow time avoids both the sign problem and the multimodal problem simultaneously.
In this paper, as a versatile tool for Monte Carlo calculations of models with complex actions, we propose a Lefschetz thimble algorithm where the flow time is used as an auxiliary variable for the highly multimodal distribution.There can be various methods to realize this idea, and in this paper we implement the parallel tempering method because of its simplicity, by taking the flow time as a tempering parameter.There, we consider a set of manifolds corresponding to various flow times.Two separate modes at large flow times (where the sign problem no longer exists) are then connected by passing through configurations at small flow times (where the original sign problem exists but the multimodality is expected to be mild).Since the sample average is taken only with respect to the largest flow time, we need not worry about the sign problem at small flow times, although we take into account configurations there.This paper is organized as follows.In Sect. 2 we first review the basics of the Lefschetz thimble algorithm based on Refs.[7,11], and then implement the parallel tempering method in the algorithm by taking the flow time as a tempering parameter.In Sect. 3 we investigate the (0+1)-dimensional massive Thirring model at finite density, and show that our algorithm correctly reproduces the analytic results for large flow times such as T = 2. Section 4 is devoted to the conclusion and outlook for future work.

Integration over Lefschetz thimbles (review)
We consider a real N-dimensional dynamical variable, x = (x i ) ∈ R N , with action S(x) that may take complex values for real-valued x.Our main concern is to evaluate the expectation values of functions of x: x) . (2.1) We assume that |e −S(x) | decreases rapidly enough in the limit x → ±∞, and that e −S(z) and O(z) are entire functions when regarded as functions of z = (z i ) ∈ C N .The integration region can then be changed to any other region Σ in C N as long as it is obtained as a continuous deformation of the original region with the boundary fixed at infinity.We consider as such an integration region the submanifold that is obtained from the following antiholomorphic gradient flow z(t; x) with a flow time t: In fact, the flow defines a map from the original integration region Σ 0 ≡ R N to a real N-dimensional submanifold Σ t in C N : We thus see that (2.1) can be rewritten as which can be further rewritten as a reweighted integral over S eff e i arg detJt(x)−S I (zt(x)) Here, J t (x) ≡ ∂z i t (x)/∂x j is the Jacobi matrix, S R z t (x) (S I z t (x) ) is the real (imaginary) part of S z t (x) , and * S eff is the expectation value taken with respect to S eff (x; t) ≡ S R z t (x) − ln detJ t (x) . (2.6) The Jacobi matrix is obtained by solving the combined differential equations with respect to t [7]: where H ij (z) ≡ ∂ 2 S(x)/∂z i ∂z j is the Hesse matrix for the action S(z).
The key point is that the right-hand side of (2.4) or (2.5) does not depend on the flow time t, so that one can set t to an arbitrary value that is convenient for actual calculation.
1 One may also take as an integration region the tangent space to the critical point of the dominant thimble [7] if the integral can be well approximated by the integration around the critical point. 2 The second equation is obtained by differentiating the first equation with respect to x j ∈ R; Note that under the flow the real part S R z t (x) does not decrease while the imaginary part Lefschetz thimble algorithm, one takes the limit t → ∞, in which Σ t approaches a union of connected components (Lefschetz thimbles), and the action has a constant imaginary part on each thimble. 3In a generic situation the phase change coming from J t (x) is sufficiently mild, so that the Monte Carlo calculation for the expression (2.5) is free from sign problems.However, two different thimbles are also disconnected in the sense of Monte Carlo sampling because S R increases indefinitely near the boundary of each thimble.This multimodality of distribution makes the Monte Carlo calculation impractical, especially when contributions from more than one thimble are relevant to estimating expectation values.
A very interesting proposal made in Refs.[11] is to use a finite amount t, which is chosen to be large enough to avoid the sign problem but also not too large in order to enable exploration in the configuration space.However, one does not know a priori whether the adopted value of t is actually free from the two obstacles (the sign and multimodal problems) simultaneously.We will show that one can solve both simultaneously if we implement the parallel tempering method in their algorithm with the flow time as a tempering parameter.

Implementation of parallel tempering
The basic idea of the parallel tempering algorithm [13,14,15] is the following.Suppose that we want to estimate expectation values with action S(x; λ), where x ∈ R N is a dynamical variable and λ is the parameter (such as the temperature) that we want to use for a Monte Carlo calculation.The point is that even when the distribution is multimodal for the original λ (e.g., when λ represents a very low temperature), the multimodality can be made mild if one takes another value λ (e.g., λ corresponding to a very high temperature).So, if the configuration space is enlarged such that the parameter can change gradually between λ and λ, two separate configurations for the original λ will be connected by passing through configurations at parameters near λ.The parallel tempering algorithm enables the move of configurations among different λ by enlarging the configuration space from R N = {x} to the set of A + 1 replicas, (R N ) A+1 = {(x 0 , x 1 , . . ., x A )}.We there assign λ α to replica α (α = 0, 1, . . ., A), such that λ 0 = λ and λ A = λ and that λ α and λ α+1 are sufficiently close to each other. 4We set up an irreducible, aperiodic Markov chain for the enlarged configuration 3 Generically there is a single critical point z σ on each connected component J σ , and J σ is obtained as the set of orbits flowing out of z σ .The complementary submanifold to J σ in C N consists of orbits that flow into z σ , and will be denoted by K σ .The integrations in (2.5) are dominated by points near the intersection of K σ and R N . 4The computational cost required for the parallel tempering method can be roughly estimated to be proportional to (A + 1)/X, where X is the minimum of the acceptance rates for all pairs of adjacent replica (see Step 3 below).In this paper, we will set the parameters A and λ α such that the minimum acceptance space such that the probability distribution for (x 0 , x 1 , . . ., x A ) eventually approaches the equilibrium distribution proportional to α e −S(xα;λα) . (2.9) We finally take sample averages only with respect to a sample taken from α = A. The simplest algorithm to realize this idea 5 is to swap two configurations of two adjacent replicas α and α + 1, (i.e., to update the configuration ( with the probability which obviously satisfies the detailed balance condition w α (x, x ′ ) e −S(x;λα)−S(x ′ ,λ α+1 ) = w α (x ′ , x) e −S(x ′ ;λα)−S(x,λ α+1 ) . (2.11) Our proposal is to take the flow time t as such a tempering parameter.The basic algorithm is then as follows. 6  • Step 0. Fix the maximum flow time T , which should be sufficiently large such that the sign problem disappears there, and pick up flow times {t α } from the interval [0, T ] with The values of A and t α are determined manually or adaptively to optimize the acceptance rate in Step 3 below.
• Step 1. Choose an initial value x α ∈ R N for each replica α, and numerically solve the differential equations (2.7) and (2.8) to obtain the triplet (x α , z tα , J tα ).
• Step 2. For each α, construct a Metropolis process to update the value of x.Explicitly, we take a value x ′ α from x α using a symmetric proposal distribution, and recalculate the triplet (x ′ α , z ′ tα , J ′ tα ) using x ′ α as the initial value.We then update x α to x ′ α with the probability min(1, e −∆S eff,α ), where (recall that S eff (x; t) = S R z t (x) − ln detJ t (x) , Eq. (2.6)).We repeat the process sufficiently many times such that local equilibrium is realized for each α.
rate is well above 50% (see Fig. 5). 5 Of course, there can be many variations on this algorithm. 6To make discussions simple, we only take the flow time as a tempering parameter.The algorithm can be readily extended such that other parameters are included as extra tempering parameters.
• Step 5.After repeating Steps 2 to 4, we obtain a sequence of triplets {(x t A =T )}}, which we use to estimate the expectation value: . (2.14) Note that the action with t 0 = 0 is the original action for which the sign problem exists.However, as can be seen from (2.14), the sample average is taken only with respect to the action with t A = T , so that we need not worry about the original sign problem, although we include configurations near t A = 0. Also, t A = T can be taken to be sufficiently large such that the sign problem disappears if we complement intermediate flow times sufficiently (with larger A).Thus, this simple algorithm solves the two obstacles simultaneously: the original sign problem at t 0 = 0 is resolved at t A = T while the multimodal problem at t A = T is resolved by passing through configurations near t 0 = 0.7

Example
In this section we investigate the (0+1)-dimensional massive Thirring model at finite density [16,17,10] to exemplify that the algorithm given in the previous section does work.
The (0 + 1)-dimensional massive Thirring model is defined from the standard (1 + 1)dimensional massive Thirring model by dimensional reduction.With an auxiliary field φ(τ ), the continuum representation of the grand partition function Z = tr e −β(H−µQ) is given by the path integral where the Euclidean action takes the form and the φ integral (the ψ, ψ integral) obeys the periodic (antiperiodic) boundary condition.
In realizing the model on the lattice, we discretize the Euclidean time as τ = na (n = 1, . . ., N) with β = Na (N: even), and follow the prescription of Ref. [17], where φ(τ ) = φ n is treated as a U(1) gauge potential and is combined with the chemical potential µ to become a link variable of a complexified gauge group, e (iφ(τ )+µ)a = e iφna e µa ≡ U n e µa , as proposed in Ref. [18].Then, by using a staggered fermion formulation, the grand partition function is given by where (dU We henceforth set a = 1 and treat m, µ, and g 2 as dimensionless parameters.
After carrying out the fermion integration, we obtain which can be calculated analytically [17] as where , and I n (α) is the modified Bessel function of the first kind of order n.Using this expression, we obtain the analytic form of the chiral condensate as

.7)
We now numerically evaluate the chiral condensate by using the complex action where It is easy to check that [det D(U; µ)] * = det D(U; −µ), so that the second term of (3.8) is complex-valued for real µ, and the sign problem should arise when N is large.
Step 1 of our algorithm, we make a cold start (φ n = 0) for every replica α, and numerically solve the differential equations (2.7) and (2.8) with the adaptive 4th-order Runge-Kutta method to obtain the triplet (x α , z tα , J tα ).We repeat the Metropolis process twenty times in Step 2, 9 which is followed by a single sequence of swapping of Step 3. We then repeat Steps 2 and 3 ten times (as Step 4).With the first 5 data points discarded as initial sweeps, we estimate correlation functions with 10 4 data points.Figure 2 shows the absolute value of the denominator in (2. Figure 3 shows the chiral condensate χχ as a function of µ.The other parameters are again set to N = 8, m = 1, g 2 = 1/2.The dotted line represents the analytic result (3.7).The blue points are the result for the flow time T = 0 and exhibit large statistical errors, reflecting the sign problem.The green points are the result for the flow time T = 2 without the parallel tempering implemented.They have small statistical errors, but exhibit statistically significant discrepancies from the analytic result.This should be attributed, as discussed in detail in Ref. [11], to the fact that the dominant contributions come only from a single thimble for such large T .The red points are the result for the flow time T = 2 now with the parallel tempering implemented.They show a good agreement with the analytic result, which implies that contributions from various thimbles are correctly taken into account through the parallel tempering.This can be confirmed by Figs. 4 and 5. Figure 4 exhibits the histogram of φ = (1/N) n φ n at T = 2.We see that the configurations are concentrated on a single thimble if the parallel tempering is not implemented (left), while they are spread out on various thimbles if the parallel tempering is implemented (right).Figure 5 shows the average acceptance rates for the swaps between replicas α and α + 1 in Step 3. We see that the swapping is carried out very well because the average acceptance rate is more than 50%   for all pairs (α, α + 1).

Conclusion and outlook
In this paper, as a versatile tool for Monte Carlo calculations of models with complex actions, we have proposed a Lefschetz thimble algorithm where the flow time is used as an auxiliary variable for the highly multimodal distribution.In particular, we implemented the parallel tempering method by taking the flow time t as a tempering parameter.There, we prepare flow times {t α } such that t 0 = 0 and t A = T .The largest flow time T can be taken to be sufficiently large such that the sign problem disappears there.Although the algorithm includes configurations at t 0 = 0, the original sign problem at t 0 = 0 does not enter the calculation because the sample average is taken only with respect to t A = T , where the sign problem disappears.We have investigated the (0 + 1)-dimensional massive Thirring model at finite density to exemplify that the algorithm does work, and showed that contributions from multi thimbles are correctly taken into account even for such a large flow time as T = 2.
We should investigate to what extent this algorithm is actually versatile.One interesting class of models, for which we can readily test our algorithm before applying it to QCD at finite density, is that of various types of large-N random matrix models with complex actions.In fact, if the free energy is calculated by an integration over matrices themselves (not over their eigenvalues), the classical solutions and the corresponding Lefschetz thimbles do not have a useful meaning because the quantum corrections are of the same order as the leading term.It thus provides us with a good test of versatility to check whether correct results are obtained for such models where thimble structure or its usefulness is not clear.As a related model, the numerical study of the triangle-hinge model [19,20,21] should also be interesting.The model is a sort of matrix model that generates 3D random volumes as a collection of triangles and hinges.In order to restrict the resulting configurations to tetrahedral decompositions, one needs to introduce a special form of interaction [19], which makes the action complex-valued (M.Fukuma, S. Sugishita, and N. Umeda, manuscript in preparation).A numerical study was made for a simplified model (with no restriction to tetrahedral decompositions), and the existence of a third-order phase transition is confirmed (manuscript in preparation).It is thus interesting to see whether the phase transition still exists when the restriction is imposed.
Besides the Lefschetz thimble algorithm, the complex Langevin algorithm [22,23] is also under intense study as a promising method to solve the sign problem.Recently, a very interesting proposal was made by Bloch [24] (see also Ref. [25]) to evaluate correlation functions by reweighting complex Langevin trajectories using such parameters that satisfy known validity conditions [26,27] to be free from wrong convergence problems [26,28,27,29,30].It should be interesting to compare the extent of versatility between the reweighted complex Langevin algorithm and our parallel tempering algorithm with the flow time as a tempering parameter.
A study along these lines is now in progress and will be reported elsewhere.

Figure 2 :
Figure 2: The absolute value of the denominator in (2.14) (divided by the sample size) as a function of µ with the other parameters set to N = 8, m = 1, g 2 = 1/2.The blue points are the result for the flow time T = 0 and show that the sign problem actually exists for µ 1.0.The green (red) points are the result for the flow time T = 2 without (with) the parallel tempering (PT) implemented, and show that the sign problem disappears at T = 2.

Figure 3 :Figure 4 :
Figure 3: Chiral condensate χχ as a function of µ with the other parameters set to N = 8, m = 1, g 2 = 1/2.The dotted line represents the analytic result (3.7).The blue (green) points are the result for the flow time T = 0 (T = 2) without the parallel tempering implemented.The red points are the result for the flow time T = 2 with the parallel tempering implemented.