Universality in COVID-19 spread in view of the Gompertz function

Abstract We demonstrate that universal scaling behavior is observed in the current corona virus (SARS-CoV-2) spread, the COVID-19 pandemic, in various countries. We analyze the numbers of infected people who tested positive (cases) in selected eleven countries (Japan, USA, Russia, Brazil, China, Italy, Indonesia, Spain, South Korea, UK, and Sweden). By using the double exponential function called the Gompertz function, fG(x) = exp(–e–x), the number of cases is well described as N(t) = N0fG(γ(t – t0)), where N0, γ and t0 are the final number of cases, the damping rate of the infection probability and the peak time of the daily number of new cases, dN(t)/dt, respectively. The scaled data of cases in most of the analyzed countries are found to collapse onto a common scaling function fG(x) with x = γ(t – t0) being the scaling variable in the range of fG(x) ± 0.05. The recently proposed indicator so-called the K value, the increasing rate of cases in one week, is also found to show universal behavior. The mechanism for the Gompertz function to appear is discussed from the time dependence of the produced pion numbers in nucleus-nucleus collisions, which is also found to be described by the Gompertz function.


Introduction
The COVID-19 pandemic is the worst disease spread in this century. As of May 20, 2020, over 4 million people have tested positive in the world, and the number of cases N (t), the number of infected people who tested positive at the time t, is still increasing rapidly. In order to control the spread of infection, it is desired to understand the diffusion mechanism of COVID- 19. Recently, a double exponential function called the Gompertz function is found to catch the features of N (t) [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. The Gompertz function appears when the infection probability per infected people exponentially decreases as a function of time. With the Gompertz function, the daily number of new cases dN (t)/dt, daily increase of infected people who tested positive, shows asymmetric time-profile rather than the symmetric one found in the prediction of the Susceptible-Infected (SI) model [21], one of the standard models of the spread of infection. The Gompertz function was proposed by B. Gompertz in 1825 to discuss the life contingencies [22]. It is interesting to find that the Gompertz function also appears as the number of tumors [23] and the number of detected bugs in a software [24], as well as particle multiplicities at high energies [25].
The exponential decrease of the infection probability is also important to deduce when the restrictions can be relaxed. For example, Nakano and Ikeda [26] found that the number of cases is well characterized by the newly proposed indicator K, which represents the increasing rate of cases in one week. The indicator K takes a value between zero and unity, is not affected by the weekly schedule of the test, and is found to decrease almost linearly as a function of time in the region 0.25 < K < 0.9 provided that there is only a single outbreak affecting the infection. In order to understand the linearly decreasing behavior of K, Nakano and Ikeda proposed the "constant damping hypothesis" in their paper [26], and Akiyama proved that the hypothesis of exponential decrease in discrete time shows the Gompertz curve in continuous time [27], independently of other works [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20]. Since the indicator K is expected to be useful to predict the date when the restrictions can be relaxed as K(t) 0.05, it would be valuable to analyze its solution, the Gompertz function, in more detail.
In this article, we analyze the number of cases by using the Gompertz function. We examine that N (t) in one outbreak is well described by the Gompertz function in several countries. Then with the time shift and the scale transformation of time and N (t), the data are found to show universal behavior; they are on one-curve described by the basic Gompertz function, exp(−e −x ). The newly proposed indicator K [26] also shows universality. We further discuss that the number of produced pions in nucleus-nucleus collisions is described by the Gompertz function. This similarity may be helpful to understand the mechanism of COVID-19 spread.
This article is organized as follows. In Sec. 2, we give a brief review of the Gompertz function and its relevance to the disease spread. In Sec. 3, we show the comparison of the number of cases and the Gompertz function fitting results. We demonstrate that the numbers of cases in many countries show universal scaling behavior. In Sec. 4, we show that the number of pions in nuclear collisions is well described by the Gompertz function, and we deduce the mechanism to produce the time-dependence described by the Gompertz function. In Sec. 5, we summarize our work.

Gompertz function, indicator K and scaling variables
When the infection probability k(t) is given as a function of time, the evolution equation for the number of infected people N (t) and its solution are given as Adopting an exponentially decreasing function as k(t), the solution is found to be By choosing the reference time t 0 so that k(t 0 ) = γ, the solution is given as where N 0 = N (t 0 ) exp(k(t 0 )/γ) = eN (t 0 ) is the asymptotic value of N (t), N 0 = lim t→∞ N (t). Here, N 0 can be interpreted as "terminal velocity", since the evolution equation (1) has an analogy to the equation of motion of particle feeling a viscous resistance, where its coefficient is not constant but exponentially decreasing.  Fig. 1 Comparison of Gompertz f G (x) and sigmoid f S (x) functions and their derivatives, f G (x) = df G (x)/dx and f S (x) = df S (x)/dx, in the linear (top) and logarithmic (bottom) scales. We also show f S (2x), the sigmoid function with doubled infection probability, by green dot-dashed curves.
The double exponential function appearing in Eq. (3) is called the Gompertz function [22], By using the Gompertz function, N (t) is given as where x is the scaling variable for N (t).
One of the characteristic features of the Gompertz function is the asymmetry of its derivative in the early (x < 0) and late (x > 0) stages. In Fig. 1, we show the Gompertz function f G (x) and its derivative, as functions of x. The derivative takes the maximum at x = 0, and the asymmetry in the negative and positive x region is clearly seen. This asymmetry should be compared with the solution of the SI model [21], in which the susceptible people (S(t)) are infected by the infectious people (I(t)) at the rate proportional to the product S(t)I(t), where k is a constant. The solution is found to be where N 0 = S + I is a constant, γ = N 0 k, and t 0 is the time when half of the people are infected, I(t 0 ) = N 0 /2. The characteristic function f S (x) is called the sigmoid function, 3/21 which is also referred to as the logistic function, Then the daily number of infected people is a symmetric function of t − t 0 , where f S (x) = df S (x)/dx. In Fig. 1, we also show f S (x) and df S (x)/dx by green curves. At large x (x ≥ 1), the Gompertz and sigmoid functions show similar behavior, . The difference found in the derivatives in the figure comes from the factor (e and 4) multiplied to normalize them to be unity at x = 0. By comparison, the sigmoid function with doubled transition probability (f S (2x)) approximately agrees with f G (x) on average in the negative x region, −2 ≤ x ≤ 0. As already noticed [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15][16][17][18][19][20] and is discussed later, the asymmetry is clearly found in dN (t)/dt for COVID-19, so N (t) is better understood by the Gompertz function than by the sigmoid function.
Once the solution is given, one can obtain the K value [26], increasing rate of the infected people in a week, as where x K is the scaling variable for K(t), with w = 7 days. In the later discussions, we regard the number of cases (the number of infected people who tested positive) as the number of infected people, since the latter is difficult to measure. We expect that the former takes a similar value to the latter, as long as the infected people are defined as the infectious people, who already have symptoms or bear enough amount of coronavirus DNAs.

Adopted dataset
Let us now examine the universal behavior given by the Gompertz function in real data. We use the data given in Ref. [28] as of May 21, which contain the N (t) data from Dec. 31, 2019 (t = 0) till May 20 (t = 141). Throughout this article, we measure the time t in the unit of day. In order to avoid the discontinuity coming from the definition change, we have removed the spikes in the daily increase (dN (t)/dt) data in Japan (April 12, t = 103) and China (February 13, t = 44), and instead the daily numbers in previous days are increased by multiplying a common factor, which is determined to keep the total number of cases in the days after the spike. The number of cases (N (t)) is obtained as the integral of thussmoothen dN (t)/dt. In addition, seven-day averages (±3 days) are considered in the analysis 4/21 of dN (t)/dt in order to remove the fluctuations in a week. Thus-smoothen dN (t)/dt data are available in 3 ≤ t ≤ 138 1 .
We have chosen the countries with a large number of cases (the USA, Russia, Brazil, and the UK as of May 20, 2020), the first three Asian countries where the COVID-19 spread explosively (China, South Korea, and Japan), the first two countries of spread in Europe (Italy and Spain), a country with a unique policy (Sweden), and a country with somewhat different dN (t)/dt profile (Indonesia).

Number of cases and its daily increase
In the left panel of Fig. 2, we show the daily number of new cases dN (t)/dt given in Ref. [28] with the smoothing mentioned above. The legends stand for the abbreviation of the country name (internet country domain code, see Table 1). The dN (t)/dt data show there is one big peak in each country, and the shape of the peak is asymmetric; fast rise and slow decay. In many of the countries, there are several other peaks, which are smaller than the dominant one but visible at least in the log-scale plot. The fitting results using the Gompertz function are shown by dotted lines, and are found to explain the dominant peak region of data well. The fitting to dN (t)/dt data is carried out by using the derivative of the Gompertz function, In order to concentrate on the dominant peak, we first limit the time region of the fit to t min ≤ t ≤ t max , which covers it. Next, the fitting time region is extended to the whole range, 0 ≤ t ≤ 140. In the fitting procedure, we have assumed a Poisson distribution for the daily number of new cases, then the uncertainty in dN (t)/dt is assumed to be dN (t)/dt + ε, where ε = 0.1 is introduced to avoid zero uncertainty in the case of zero daily number. We summarize the obtained parameters (N 0 , γ, t 0 ) in Table 1, and parameters in the line with (*) are adopted to draw the figures. When the χ 2 value is smaller in the whole range analysis and the obtained parameters in the two cases are similar, the single outbreak assumption is supported and we show only the results in the whole range analysis. In other cases, we in principle adopt the results giving the smaller reduced χ 2 . The exception is the USA, where the reduced χ 2 is larger but we adopt the whole range analysis results. This is closely related to the multiple outbreaks, and will be discussed in Appendix A. It should be also noted that, unfortunately, the reduced χ 2 s are large in the single outbreak model with the present error estimate of dN (t)/dt. There are non-negligible contributions of other outbreaks as discussed in Appendix A. In addition, while we use the 7-day average data, we cannot completely remove the daily oscillations of dN (t)/dt in a week coming from the test schedule. The Gompertz function does not take care of such oscillators, then χ 2 /dof remains to be large. In the right panel of Fig. 2, we show the normalized daily numbers, (dN (t)/dt)/(N 0 γ), as functions of the scaling variable, x = γ(t − t 0 ). Most of the data points are around the 6/21 derivative of the Gompertz function and inside the gray band, which shows the region with 5% uncertainty in γ and 20% uncertainty in N 0 .
We also show the sigmoid function by the green curves in the right panel of Fig. 2. It is clear that a single sigmoid function cannot describe the behavior of the dN (t)/dt data. If we try to fit dN (t)/dt data by the sigmoid function, we need to adopt larger γ in the negative x region as shown by the green dot-dashed curve in the right panel of Fig. 2 approximately agrees with f G (x) in shape in the positive x region. The left panel of Fig. 3 shows the number of cases N (t), obtained as the integral of dN (t)/dt data after removing the spike. We also show the Gompertz function results with the parameters (N 0 , γ, t 0 ) determined from the dN (t)/dt data, and an integration constant ∆N , where ∆N is obtained by fitting to the N (t) data. In most of the countries, the Gompertz function with ∆N explains the data in the large N (t) region and deviations are found only in the region with small N (t). In Japan, the fitted time range is limited to be t ≥ 85 and earlier time data are not fitted to. Thus deviations at t < 80 are visible in the log scale, while the value is less than 10% of the total number of cases at t = 140. We show the normalized N (t) as functions of the scaling variables in the right panel of Fig. 3. We subtract ∆N from N (t). It is interesting to find that most of the world data are on the Gompertz function f G (x). In Japan, China and South Korea, the fitted time range is limited and deviation from the Gompertz function results are found in the earlier times (Japan) and in later times (China and South Korea). It should be noted that the agreement at x < 0 owes largely to the large denominator compared with the number of cases in the early stage. Compared with the exponentially grown number of cases, the number of cases 7/21 in the early stage is much smaller and the ratio is seen to be very small. Nevertheless, it is impressive to find the agreement of the Gompertz function and the observed number of cases after scaling.
We also show the sigmoid functions, f S (x) and f S (2x), by green solid and dot-dashed curves in the right panel of Fig. 3. As in the dN (t)/dt case, the sigmoid function f S (x) agrees with the scaled data at x ≥ 1, but we need to use f S (2x), the sigmoid function with a larger infection probability, to explain the scaled data in the region of x ≤ 0.

K value
We now proceed to discuss the K value. We choose the offset day when explosive spread started, as shown by open squares in Fig. 3. The offset day and the offset number of cases, (t offset , N offset ), are summarized in Table 2. With these offset parameters, the K value is obtained as Since N offset is generally much smaller than the number of cases after explosive spread, the K value is not sensitive to the choice of the offset parameters as long as we discuss the long-time behavior. In the left panel of Fig. 4, we show the K factor as a function of time. Data of K(t) are explained by the prediction from the scaling function, 1 − f G (x K ), while the fluctuations around the predictions are large compared with the number of cases, N (t). In the right panel of Fig. 4, we show K values as functions of the scaling variable x K = γ(t − t 0 − ∆t). Except for several countries, the scaling behavior in K is observed.
In Ref. [26], K(t) is found to show the linear dependence on time in the large K(t) region, 0.25 < K(t) < 0.9. Actually, the Gompertz function f G (x) shows the linear dependence on time in the small |x K | region. When the scaling variable is small, the first-order Taylor 8/21 expansion would work, The precision of this first order approximation is around 1% e −xK should be also small and the Taylor expansion with respect to e −xK may work. Then K decreases exponentially, With the sigmoid function, while scaling behavior is observed in N (t) and dN (t)/dt, K does not scale as a function of a single scaling variable. In Fig. 5, we compare the functions in the K value derived from the Gompertz and sigmoid functions for N (t) as functions of x = γ(t − t 0 ). With the Gompertz function, the shift of the scaling variable is enough for the K value to be described by the basic Gompertz function. In contrast, the K value from the sigmoid function reads In addition to the shift in the scaling variable, the amplitude also depends on γw = 7γ.

Days of expected relaxing COVID-19 restrictions from the K value
When the K(t) value goes down to be around 0.05 [26], it would be possible to relax COVID-19 restrictions such as lifting the lockdown of the city or relaxing the state of emergency. Let us call those days as t relax . The days of relaxing the restrictions in several countries roughly correspond to the time at K(t) = 0.05. With this condition, the number of cases in the current outbreak will increase by around 5% and the infection in the current outbreak will converge. In terms of the scaling variable, this corresponds to the solution of 1 − f G (x K ) = 0.05 and is found to be x K = x R 2.97. Thus the expected day of relaxing the restrictions can be evaluated to be around, provided that there will be no further outbreaks. In Table 3, we summarize the expected day of relaxing COVID-19 restrictions t relax from the Gompertz function analyses in comparison with the relaxed day t relaxed in several countries.  in the USA. In Japan, the state of emergency was declared on April 7, 2020 (t = 98) and canceled on May 25 (t = 146). The relaxed days in Italy, Spain, Japan, the UK, and the USA are close to those expected from the Nakano-Ikeda model analyses. In China and Korea, the relaxed days were significantly later than the expectations from the model. One can guess that the governments tried to be on the safe side in these first two countries of COVID-19 spread. The expected day of relaxing COVID-19 restrictions strongly depends on the value of the damping rate of the infection probability, γ, which people and governments should try to enhance. In South Korea and China, the damping rate is larger than 0.1. Then the infection probability decreases by a factor of 1/e within 10 days. In these countries, the test-containment processes have been performed strongly. In many of the countries under consideration (Japan, USA, UK, Italy, and Spain), the damping rates take the value between 4 − 10%/day. In these countries, many of the restaurants and shops are closed and people are requested to stay home for one month or more. Sweden may be an interesting example. The Swedish government does not require restaurants and shops to be closed and does not ask people to stay home. The government asks people to be responsible for their behavior and social distancing is encouraged. The damping rate in Sweden, γ = 3.7%/day, may be regarded as a value representing the intrinsic nature of COVID-19.
It would be valuable to comment on the use of the effective reproduction number, R e , which is defined as the ratio of the number of new cases in one week to that in the previous week and has the merit that it is not necessary to define the offset. For a constant infection probability per infected people, k(t) = k 0 = const. in Eq. (1), R e is found to be R e = exp(k 0 w) and we can guess k 0 from R e . In contrast, when the number of cases is given by the Gompertz function, N (t) = N 0 f G (x), the effective reproduction number R e (t) is a function of two variables, x K and γw, while K(t) is a function of a single scaling variable x K . Thus the universality observed in K(t) is lost in R e (t), and it would be less easy to give a prediction of t relax from R e (t) than 11/21 from K(t). In Fig. 7, we show R e (t) from the Gompertz function at γ = 4, 8 and 16%/day. The value of R e (t) at x K = x R depends on γ, the value of x K at R e (t) = 1 is different from x R , x K = 0.49 (1.42) for γ = 16% (4%)/day, and apparent large values of R e (t) appear in the early stage, x K < 0 or t < t 0 + ∆t(γ).

Mechanism of appearance of the Gompertz function
Fast and slow rises in the early and late stages are found in many physical processes such as the particle production in nuclear collisions. In Fig. 8, we show the number of ∆ particles (∆ ++ , ∆ + , ∆ 0 and ∆ − ) and π particles (π + , π 0 and π − ) produced in central Au+Au collisions at the incident energy of 1 GeV/nucleon [29]. Histograms show the calculated results by using the hadronic transport model JAM [30]. The main production mechanism of π particles at this incident energy is the ∆ production and its decay. In the early stage (t < 15 fm/c), nucleons are excited to resonances such as the ∆ particles in the nucleonnucleon collisions, N N → N ∆. Produced ∆s collide with other nucleons and ∆s, some of them produce additional ∆ particles, ∆N → ∆∆, and some of them are deexcited to nucleons in the ∆ absorption processes such as ∆N → N N . The ∆ particles decay and produce π particles, ∆ → N π. Produced π particles may collide with other nucleons, ∆s and πs, and occasionally produce additional ∆, πN → π∆. In the later stage, the system expands, particle density decreases, and interaction rate goes down. When the density becomes low enough, all ∆ particles decay to N π with the lifetime τ ∆ = /Γ ∆ 2 fm/c and π particles go out from the reaction region and are detected.
The number of π and ∆ particles in the above nucleus-nucleus collision is found to be well fitted by the Gompertz function f G (x) and its derivative f G (x). Curves in Fig. 8 show the results of fit by the Gompertz function and its derivative, where the parameters are obtained as (N π , γ π , t π ) = (62.0, 0.113 (fm/c) −1 , 14.6 fm/c) and (N ∆ , γ ∆ , t ∆ ) = (760, 0.160 (fm/c) −1 , 14.3 fm/c). Compared with the lifetime of ∆, the number of ∆ during nucleus-nucleus collisions has a longer tail. This may be because of the relativistic effects, resonance mass dependence of the width, and sequential decay and production of ∆ such as ∆ → πN followed by πN → ∆.  Fig. 8 The number of π and ∆ particles as functions of time t in central Au+Au collisions at the incident energy of 1 GeV per nucleon. Histograms show the π (red) and ∆ (blue) numbers taken from Ref. [29], and solid curves show the fitting results using the Gompertz function. The green dashed curve shows the sum of π and ∆ numbers.
Let us discuss the time dependence of the pion production in a simplified treatment, where the microscopic processes are limited to be the ∆ production from N N collisions and the ∆ decay, and the Markov processes are assumed, where C(t) denotes the production rate of ∆ from N N collisions and Γ eff represents the effective decay rate of ∆. The above ∆ decay represents the net decay rate, the difference of the rate of ∆ → N π and N π → ∆ in nuclear collisions. By further assuming that Γ eff is constant in time, the solution of Eq. (23) is obtained as N ∆ (t) = t −∞ dt e −Γeff (t−t ) C(t ). Then after the time where C(t) vanishes, N ∆ decays exponentially and its integral, N π , approaches to a constant. These features agree with those of the Gompertz function in the late stage. Earlier time behavior depends on the function form of C(t). In terms of the scaling variable, x = Γ eff (t − t 0 ), Eq. (23) reads N ∆π (x) ≡ N ∆ + N π = dN π /dx + N π = t0+x/Γeff −∞ dt C(t). The function N ∆π is a rapidly increasing function of time in the early stage, and the increasing rate becomes smaller in the later stage because of the energy loss and expansion of nuclear matter. If we fit the curve to the N ∆π data in Ref. [29] by using in the ∆ production region (t ≤ 15 fm/c), the variance of residuals (reduced χ 2 with the uncertainty of unity being assumed in the data) is the smallest with f G (x) + f G (x) as shown by the green dashed curve in Fig. 8. This supports that N π is described by a Gompertz function, while the mechanism of this function form to appear is not understood. It should be noted that the number of ∆ is proportional to the increasing rate of π number in the present simplified treatment, but the fitting to the transport model data results in γ ∆ = γ π , implying that this is not the case.
It should be noted that one ∆ particle mostly decays into N π and additionally produced number of π is less than unity in the present pion production in nucleus-nucleus collisions. A rough estimate of the upper bound of the basic reproduction number R 0 from ∆ to π may be obtained as follows. The lower bound of the number of produced ∆ particles is the peak number of ∆, which is N ∆ = 45.3 at t = 14 fm/c in the calculated data and is 13/21 expected to be N ∆ (t ∆ ) = N ∆ γ ∆ /e 44.7 at t = t ∆ from the Gompertz function. Then the upper bound of the additionally produced π number is given as N π (t = ∞) − N ∆ (t ∆ ) 17. Consequently, the upper bound of R 0 is given as R 0 = [N π (t = ∞) − N ∆ (t ∆ )]/N ∆ (t ∆ ) 0.38, which is less than unity. As a result, the number of pions in the final state is already determined in the early stage. The green dashed curve in Fig. 8 shows the "π-like" particle number, N π∆ = N π + N ∆ , which shows a peak at t 16 fm/c, gradually decreases by the ∆ absorption processes, and converges to N π (t = ∞).
Based on the success in describing N π (t) by using the Gompertz function as in the case of N (t) associated with COVID-19, we may make a conjecture of the correspondence of the COVID-19 spread and the π production in nuclear collisions. Let us assume that π particles correspond to the cases. Then ∆ particles are regarded as the coronavirus carriers, defined as the infected people who have not tested positive yet. Carriers will test positive later or will be recovered without testing positive. The susceptible people are infected and become carriers in the initial dense stage (∆ production), occasionally infect other susceptible people (additional ∆ production) or are recovered (∆ absorption), develop symptoms and test positive (∆ → N π). Thus the number of carriers (N ∆ ) increases rapidly by explosive spread of infection in the early stage (∆ production stage), while it decreases more slowly by developing symptoms in the late stage (∆ decay stage). This causes the asymmetric timeprofile in the number of carriers (N ∆ ), which is roughly proportional to the daily number of new cases (dN π /dt). By comparison, the number of infected people including carriers (N π + N ∆ ) grows rapidly in the early dense stage but does not change much in the later stage. Hence, except for the early dense stage, the basic reproduction number would be less than unity.

Summary
We have analyzed the number of cases (the number of infected people who tested positive for COVID-19), as a function of time, N (t), by using the double exponential function referred to as the Gompertz function, f G (x) = exp(−e −x ). The Gompertz function appears when the infection probability is an exponentially decreasing function of time. One of the characteristic features of the Gompertz function is the asymmetry of its derivative, f G (x) = df G (x)/dx, fast rise and slow decay.
This feature is found in the daily new cases, dN (t)/dt. We have assumed that the number of cases from one outbreak is given as N (t) N 0 exp[−e −γ(t−t0) ], where N 0 , γ and t 0 are the final number of cases, damping rate of the infection probability, and the time where the daily number of new cases peaks out. These parameters are obtained by the χ 2 fitting to the dN (t)/dt data. Then we have found that N (t) and dN (t)/dt show universal scaling, The K value, the increasing rate of cases in one week, is also found to show the scaling behavior, K(t) = 1 − f G (x K ), where x K = γ(t − t 0 − ∆t(γ)) is the scaling variable for K(t) with ∆t(γ) being a given function of γ.
We have also found that the time dependence of the produced pion number in nucleusnucleus collisions is described by the Gompertz function. Since both of the COVID-19 spread and the pion production are transport phenomena, the mechanism of the former may be 14/21 similar to the latter. If this is the case, there is a possibility that the basic reproduction number is high only in the initial stage of the outbreak.
Throughout this article, we have imposed the single-outbreak assumption. Since this assumption may be too restrictive, we show the results of multiple-outbreak analyses in Appendix A. The multiple-outbreak analyses also show that the COVID-19 spread in one outbreak is well described by the Gompertz function. We also note that after submitting the original manuscript of this article, the daily numbers of new cases are found to be significantly larger than the predictions given in Fig. 2 (and Fig. A1 in Appendix A). We give brief descriptions of the data observed later in some countries in Appendix B.

A. Multiple-outbreak model analysis
In the analyses in the main text, we have assumed that there is only one dominant outbreak. Let us consider here the multiple-outbreak cases, where we assume the number of cases is described by the sum of several Gompertz functions, As in the single-outbreak model discussed in the main text, we analyze the daily number of new cases, dN (t)/dt, where f G (x) = df G (x)/dx. In Fig. A1, we show dN/dt in Eq. (A2) in comparison with the daily number of new cases. In the multiple-outbreak analysis, we use data in the whole range, 0 ≤ t ≤ 140. Two or three outbreaks are considered, and we try to describe the region with large dN/dt by adding outbreaks. Obtained parameters are summarized in Table A1.
In many countries under consideration, dN/dt is decreasing on May 21, 2020, and the parameters are well determined. Then we adopt the three-outbreak model (n = 3). In Japan, China, and South Korea, multiple-outbreak structure of dN/dt is clearly seen in the logarithmic plot and can be fitted by using Eq. (A2). In Russia, Italy, Spain, the UK and Sweden, additional outbreaks improve the reduced χ 2 by filling the peaks which are not covered by the single outbreak. In Brazil and Indonesia, where the numbers of cases are still rapidly increasing, we need at least one outbreak term with t i > t now . In those cases, parameters generally have large uncertainties, so the third outbreak, if included, has extremely large uncertainties larger than 100 %. Thus we use the two-outbreak model (n = 2) in these countries.
In the USA, there are many centers of outbreaks. In Fig. A2, we show the dN (t)/dt data in New York, Massachusetts and California states [31]. It is possible to fit the data in each state by using the Gompertz function, but the results show significantly different values in (γ, t 0 ). 15   This would be the reason of the slow decrease of dN (t)/dt in the USA. As a result, a singleoutbreak treatment is not appropriate. By comparison, the dN (t)/dt data are reasonably explained by two outbreaks (n = 2). We have used the data in Ref. to explain the data well. This supports the idea that the number of cases in one outbreak would be described by the Gompertz function.

B. COVID-19 spread in July and August
After submitting the original manuscript in June, 2020, we find excess of cases over the predictions in several countries. It is also valuable to verify the consequences of the lockdowns. We give a brief description of the above mentioned issues. This Appendix is given as a "note added", and the details will be reported elsewhere. In Sweden, the USA and Japan, significant excess of cases over the Gompertz function analyses has been observed after mid June as shown in Fig. B1. In addition to dN (t)/dt, the daily number of deaths dD(t)/dt can be also explained by the Gompertz functions [1], as shown by blue (dN (t)/dt) and red (dD(t)/dt) solid curves. Each of the dotted curves shows the contribution from one outbreak. We have used two and three Gompertz function terms for Sweden and the USA, respectively, and three and four terms for dN (t)/dt and dD(t)/dt in Japan. In Sweden, dN (t)/dt started to increase again around t = 140, but dD(t)/dt kept decreasing. In the USA, dN (t)/dt started to increase again around t = 165. Compared with the new cases, the increase of dD(t)/dt after mid June is less prominent. The grey dashed line shows the fitting results of the new cases shifted later by seven days and multiplied by 0.07, A × dN (t − t D )/dt with A = 0.07 and t D = 7. This curve roughly agrees with dD(t)/dt 18/21  Fig. B1 The number of new cases dN (t)/dt (blue histograms), the daily number of deaths dD(t)/dt (red histograms), and the number of new PCR tests with the rescaling factor 0.1 or 0.2 (green histograms) in Sweden (left), the USA (middle) and Japan (right). Histograms show the data taken from Ref. [28]. Solid (dotted) curves show the fitting results in the multiple-outbreak model for dN (t)/dt (blue) and dD(t)/dt (red) (their breakdowns). The grey dashed curves show the shifted and scaled dN (t)/dt results in the model (see the text).
until t 130 but is larger than dD(t)/dt later. A similar trend is observed but more strongly in Japan. We find dN (t)/dt once decreased rapidly but increased again around t = 160. The shifted and scaled function of the new cases (A = 0.05 and t D = 14) roughly explains the daily number of deaths until t 170 but it overestimates dD(t)/dt later.
Let us guess the reason of these behaviors. There are several possible reasons of the dN (t)/dt increase in July, such as (a) increase of the number of PCR tests and improved sensitivity, (b) lifting the lockdown and economic resumption, and (c) genomic mutation [32]. As for the decrease of mortality (death rate), (ab') dominance by young and mild symptom cases from (a) and (b) with improved medical care, (c) genomic mutation [32], and (d) T cell immunity [33] are the candidate reasons. In Sweden, the government expanded coronavirus testing to include people with mild symptoms on May 19 (t = 141), then the increase of cases after t = 140 may have been caused mainly by the candidate reason (a), the increase of the number of PCR tests. The increase of cases would have been dominated by the people with mild symptoms and the actual number of infected people may have not increased. Then it is reasonable that dD(t)/dt did not increase. In the USA and Japan, dD(t)/dt also started to increase in late June (t ≥ 160), and then the reason (a) is not enough. We guess that the candidate reason (b), lifting the lockdown and economic resumption, would have affected the spread, and that (ab'), dominance by young and mild symptom cases with improved medical care, may have suppressed the mortality. We would like to mention other possibilities, (c) and (d). It is already pointed out that the genomes with advanced mutation are found after late June in Japan [32]. The candidate reason (c), a genomic mutation, may have caused the July and August epidemic and reduced mortality. Another candidate is (d), the T cell immunity [33]. The memory T cells exposed by the SARS-CoV-2 may work for long-term immune protection against COVID-19 [33], and these effects may be enhanced by the BCG vaccination [34]. This mechanism can be universal and explains the suppression of mortality in many countries. The actual mechanism of the present coronavirus spread would be the combination of (a)-(d) and others. Analyses of more data and further research of infection diseases are necessary to elucidate the mechanism. These are out of the scope of this article. 19 The numbers of total confirmed cases and the K value should depend on both the lockdown of the social systems and the number of PCR testing carried out in the countries. Let us here take a look at these effects in the case of the UK, as an example. In Fig. B2, we show the K value before t = 140 (left) and the number of new cases, deaths and PCR tests (right). The peak time was t 0 = 110.6 (t 1 = 106.1 and t 2 = 127.6) in the single-outbreak (multipleoutbreak) analysis as given in Table 1 (Table A1). The lockdown in the UK started on March 23 (t = 83), and the K value started to decrease around two weeks after the lockdown (t = 97) as seen in the left panel of Fig. B2, while the decrease is not enough to go well below the Gompertz curve. Nevertheless, we should mention that the lockdown prevented the next outbreak to start. The increase of new cases around t = 120 is considered to be caused by the speedy increase of the PCR testing since the daily number of deaths kept decreasing, and the next outbreak started later than the lift of the lockdown, t relaxed = 132. It should be noted that these are conjectures deduced from the dN (t)/dt and dD(t)/dt data, and confirmation by further analyses would be necessary.