Localized Orbital Scaling Correction for Systematic Elimination of Delocalization Error in Density Functional Approximations

The delocalization error of popular density functional approximations (DFAs) leads to diversified problems in present-day density functional theory calculations. For achieving a universal elimination of delocalization error, we develop a localized orbital scaling correction (LOSC) framework, which unifies our previously proposed global and local scaling approaches. The LOSC framework accurately characterizes the distributions of global and local fractional electrons, and is thus capable of correcting system energy, energy derivative and electron density in a self-consistent and size-consistent manner. The LOSC-DFAs lead to systematically improved results, including the dissociation of cationic species, the band gaps of molecules and polymer chains, the energy and density changes upon electron addition and removal, and photoemission spectra.


INTRODUCTION
Despite the enormous success of density functional theory (DFT) [1,2] in many fields of modern physics, its predictive power is impaired by the intrinsic errors associated with the approximations made for the exchangecorrelation functional. Delocalization error is one of the dominant errors of mainstream density functional approximations (DFAs). It is responsible for many failures of DFT calculations [3][4][5]. Its physical origin is the violation of the Perdew-Parr-Levy-Balduz (PPLB) condition [6][7][8][9], which requires the total energy of a system as a function of electron number, E(N ), to be piecewise straight lines interpolating between integers. DFAs suffering from delocalization error yield E(N ) underneath the piecewise linear segments for finite systems, and thus tend to give too low energy for delocalized electron distributions, and produce excessively delocalized electron distribution and sometimes qualitatively wrong density, as it falsely lowers the system energy [3][4][5]. Moreover, the error in the total energy also transfers to the error in the energy derivatives with respect to the electron number, i.e., the chemical potentials. As a consequence, the frontier orbital energies as predicted by DFAs significantly deviate from the true ionization potentials or electron affinities. This also applies to infinite systems, where delocalization error is indicated by the unphysical narrowing of the band gap.
The manifestation of delocalization error is sizedependent. Figure 1(a) explores the evolution of error in a series of loosely bound He M clusters, where all the He atoms are chemically equivalent but separated by a large distance. For a small M , for example M = 1, the energy of the highest occupied molecular orbital (ǫ HOMO ) resulting from the paradigmatic local density approximation (LDA) significantly overestimates the calculated negative vertical ionization potential (−I ve ) [10], leading to a large positive error bar (shown in red). In comparison, the deviation of −I ve from the experimental value (I exp ) is negligibly small. As the cluster grows from the atomic limit (M = 1) to the bulk limit (M = ∞), however, while ǫ HOMO gradually approaches −I ve , I ve deviates increasingly from the experimental value (I exp ), resulting in a large negative error bar.
The above deviations can be represented and understood from the fractional charge perspective [3,[10][11][12]. Figure 1(b) shows how the calculated E(N ) curve of a helium atom deviates from the PPLB condition. The discrepancy ∆I = I ve − I exp for He M corresponds quantitatively to the deviation of total energy from linearity, ∆E, for a helium atom upon removal of 1 M electron. In particular, ∆E( 1 M ) = 1 M ∆I(He M ). Moreover, ǫ HOMO − I ve (He) = ∂E ∂N | − − I ve (He) = − d∆E dx x=0 + is the tangent slope error at integer N (note the positive x direction in Figure 1(b) is to the left), which agrees with −∆I(He M ) only in the infinite M limit. To see this, we Therefore, the positive error bar in the orbital energy for M = 1 exactly agrees in absolute value with the negative error bar in the total energy for M = ∞, demonstrating that the delocalization error for finite (small) and bulk systems are similar but are manifested in two different ways [3]. The situation is similar for many other DFAs, such as the popular hybrid B3LYP functional [13,14]. As shown in Fig. 1, inclusion of Hartree-Fock (HF) exchange only slightly reduces the energy deviations, because the delocalization error is compensated partly by the localization error (referring to E(N ) above the piecewise linear segment) associated with the HF exchange.
Here we make a comparison between the delocalization error with related concepts of self-interaction error (SIE) and many electron self-interaction error (MSIE). SIE was the first concept to describe systematic error in DFAs and refers to the failure of compensation between the Hatree and exchange-correlation functional, or the error in the electron interaction energy, with DFAs in one-electron systems. [15] The SIE concept also applies to fractional electron systems with less than one electron. [16] It had been presumed to be the key reason for some systematic failure in DFAs and various SIE-free functionals have been constructed. [11,[15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30] However, in 2006 this presumption was discovered untrue [11]: it was found that two well-developed functionals MCY2 [22] and B05 [31], while both are SIE-free by construction and have improvements over hybrid functionals in describing chemical reaction barriers, still exhibit similar behavior on the remaining difficulties such as incorrect dissociation limit for molecular ions and overestimation of molecular polarizability for polymers, which was once associated with SIE. This leads to the identification of a different source of the systematic errors of DFAs, namely, the essentially convex deviation from the exact linear condition of fractional number of electrons, and the introduction of the concept of many-electron self-interaction error (MSIE). [11] However, multiple definitions of MSIE exist: the term many-electron self-interaction error has also been defined in Ref [21,32], yet it was used to describe both convex and concave behaviors of fractional charges, which leads to totally different behaviors of errors in DFAs. [3] The concepts of localization error (LE) and delocalization error (DE) were then introduced to capture the physical essence of the problem for systems with any number of electrons. [3] Delocalization error, which exists in all commonly used DFAs including hybrids, describes the essentially convex deviation from the exact linear E(N ) curve for fractional charges and highlights the associated unphysical delocalization of electrons, or unphysically low energies for delocalized electrons. Localization error, which exists in Hartree-Fock (HF) approximation, describes the concave deviation from the exact linear E(N ) curve for fractional charges and highlights the associated unphysical localization of electrons. Moreover, the concept of DE essentially incorporates the concept of SIE. For one-electron systems, the SIE grows with increasing local fractional electron numbers and this is the reason for the failure in describing H + 2 disscociation, as first pointed out in Ref [16]. This can also be characterized by DE. The only missing component of SIE in the description of DE is the integer energy error for one electron systems that are compact as in molecular equilibrium structures. Nevertheless, for such systems, the SIE of commonly used DFAs is much smaller compared to fractional systems. [16] Neglecting the integer error, DE and SIE become identical for systems with one electron or less. In other word, functionals that are free of DE is essentially SIE-free, while SIE-free functionals can still suffer significantly from DE, just as other commonly used DFAs.
DE thus pinpoints the critical flaws in commonly used DFA and its reduction has been the driving force in many functional developments. As illustrated in Figure 1, most problems of DFAs relating to delocalization error can be quantitatively connected to the fractional E(N) curve of a single He atom. The corresponding localization error for HF is shown in the supplemental material. [33] Enormous efforts have been devoted towards systematic removal of delocalization error. These include the development of global hybrid [13,34], local hybrid [35,36], doubly hybrid [37][38][39] and range-separated functionals [40][41][42][43][44][45][46][47][48][49][50][51][52][53]. Like the B3LYP, the performance of these DFAs relies on the cancellation of errors, and is often systemdependent. There are also attempts focusing on specific properties, such as the Koopmans-compliant functional [54], the generalized transition state method [55] and related approaches [56], and others [57][58][59].
To achieve a universal elimination of delocalization error, it is crucial to treat fractional electron distribution explicitly and properly. Following this idea, in 2011 we have developed a global scaling correction (GSC) approach by imposing the PPLB condition globally on any given system [60][61][62]. The GSC largely restores the energy linearity condition at any fractional electron number, and thus reduces substantially the discrepancy between ǫ HOMO (ǫ LUMO ) and −I ve (−A ve ) for systems of all sizes. Here, ǫ LUMO is the energy of the lowest unoccupied molecular orbital, and A ve is the vertical electron affinity. Retrieving PPLB condition for electron addition within the framework of DFT guarantees the improvement in the prediction of −I ve using ǫ HOMO , as well as −A ve using ǫ LUMO . [10] Despite the success of GSC, it does not offer any cure to the size-dependent discrepancy ∆I, because it gives zero correction to energies at integers. This implies that (1) the parent functionals such as LDA is not size-consistent for calculations of I ve (A ve ); and (2) any functional correction that cannot affect the energies at integers is not size-consistent either [63]. To address this issue, in 2015 we have developed a local scaling correction (LSC) scheme that enforces the PPLB condition on local subsystems [64,65]. The LSC is capable of correcting the total energy and electron density of integer-N systems self-consistently, and thus leads to much improved description of dissociating molecules, transitionstate species and charge-transfer systems. However, the LSC has difficulty in capturing an infinitesimal amount of fractional electron, and so it cannot improve the prediction of ǫ HOMO , ǫ LUMO and thus fundamental gaps from the chemical potential differences [3][4][5].
It would be ideal to incorporate the merits of describing global fractions (as in GSC) and local fractions (as in LSC) in one framework while avoiding their difficulties. To this end, two major advancements are needed: (1) a unified scheme for characterizing global and local fractional electron distributions; and (2) an explicit and size-consistent correction to system energies at both fractional and integer electron numbers. In this paper, we develop a localized orbital scaling correction (LOSC) framework, which realizes these advancements. To the best of our knowledge, this is the first density functional approximation that alleviates delocalization error in all aspects without involving system dependent parameters. In the rest of the paper, we will first revisit the forms of GSC and LSC, and then proceed to formulate the LOSC functional. Finally we will close with some concluding remarks.

MOTIVATION
The basic idea of GSC is to add a correction functional to linearize each of the nonlinear components of the KS functional in the presence of fractional electron in the global system. The nonlinear components include the Hartree functional and the exchange-correlation functional, with the former being a quadratic functional of the electron density ρ and the latter having a more complicated scaling relation with ρ. Nevertheless, by observation it has been found that the energy deviation from the linear E(N ) often displays a nearly parabolic shape, which suggests that one can approximately write the correction formula in terms of a quadratic expression of the fractional electron number as follows, Here n f = N −[N ], and also equals the fractional electron occupation number on the KS frontier orbital, and κ is a functional of the frontier orbital ϕ f (r), which itself is a functional of the KS reduced density matrix ρ s . For LDA functional, in particular, κ is approximated by the following form, [60] where ρ f (r) = |ϕ f (r)| 2 and C x = 3 4 ( 6 π ) 1/3 . One can verify that ∂∆E GSC ∂n f = ± 1 2 κ at n f → 0 or 1 gives a correction to the tangent slope at integers, while ∂ 2 ∆E GSC The functional form of LSC can be viewed as a "local" version of Eq. (2), withñ f (r) andκ(r, r ′ ) being the local fractional electron occupation and local curvature, respectively. These local variables are introduced as functionals of the KS reduced density matrix to capture the local fractional information. Note despite the fact that the original form of LSC as in Ref [64] slightly differs from Eq. (4), with some range separation and extra non-local attraction, Eq. (4) captures the main idea of LSC. Here for the simplicity of comparison, we stick with Eq. (4) as the approximate form of LSC.
Let us consider two typical examples: (1) H + 2 at large internuclear distance R; (2) He M cluster as mentioned in Figure 1. In (1), at large R, the one-electron H + 2 molecule yields a delocalized electron density over the two separated protons, with each subsystem density integrating to half an electron. The LDA severely underestimates the energy of the stretched H + 2 because of delocalization error. By design, GSC cannot capture the locally half electron information because it counts fractional electrons globally. In contrast, LSC yieldsñ f (r) ≃ 0.5 near each proton by imposing a spatial screening on the density matrix, from which the local information is extracted, and then effectively performing the calculation of n f (r)−n f (r) m , with n f (r) being the fractional component of the locally screened density matrix and m a large integer (fixed to be 10). It is by the vanishing nature of high powers of any fractional number that enablesñ f (r) to approximate n f (r), and thus capture the half electron information. Yet, this only allows us to distinguish a fractional number from integer 0 or 1; it cannot distinguish two integers because n f (r)−n f (r) m = 0 identically for all such occupations. Moreover, distinguishing a tiny fractionalñ f (r) from zero is also numerically difficult. This causes trouble in example (2). On the one hand, in the case of He + M , as M → ∞ the local fraction becomes vanishingly small, which poses numerical challenge for LSC to capture the tiny fraction. On the other hand, in the case of neutral He M where local fraction is absent, LSC cannot capture the right "local frontier orbital" to compute the local curvature for the frontier orbital energy correction.
The above analysis thus suggests that the key is to capture the "local frontier orbital" and its local occupation in order to solve all the problem together. Here we highlight that the two key words are "local" and "frontier". The extension from GSC to LSC captures "local" but overlooks "frontier", with the latter requiring energy information to enter in the local variable construction, rather than simply invoking the density matrix. This is a great challenge if our local extension is through the r space, since then we have to devise a "local frontier energy" function to achieve our purpose, which is difficult in both conceptual and practical manner.

FORMULATION OF LOSC
We now pursue an alternative way to realize the local extension through the orbital space, by invoking localized orbitals (LOs). Note that the KS density matrix is a sum over KS canonical orbital (CO) projections, ρ s = m n m |ϕ m ϕ m |, and the occupation numbers {n m } are all integers (0 or 1) for integer systems. The COs are not the only choice for unraveling ρ s . Alternatively, we can exploit a localized representation of the density matrix as (5) where {φ i (r)} is chosen to be a set of orthonormal LOs, and λ ij = φ i |ρ s |φ j serves our purpose of being a local occupation matrix. In particular, one can show that the diagonal elements satisfy 0 λ ii 1 and i λ ii = N , so that each λ ii plays the role of an occupation number associated with φ i . Moreover, our desired local fractions arise naturally through the fractional components of {λ ii }. This is the motivation of LOSC. Now the remaining task is to construct the LOs from ρ s and to build a correction functional out of {φ i } and λ ij .
In the orbital space, the LOs should come from a unitary transformation of the COs through a localization procedure. In the traditional procedures, such as in the Boys [66] and the Ruedenberg [67] prescription, one minimizes a target function of only occupied COs, rendering the virtual COs irrelevant. These approaches, however, cannot serve our purpose. In the H + 2 , for example, only one occupied CO exists, which leads to identical orbital with itself no matter what localization scheme is implemented. To obtain the desired LO, we have to incorporate the virtual COs into the localization scheme to form collective unitary rotations with the occupied orbitals. Moreover, we note that at large R, the HOMO and LUMO are near-degenerate, and energetically separated from other COs; the desired LOs should come from linear combinations (mixtures) of only HOMO and LUMO, and not from any other CO. Therefore, the localization procedure should be able to limit the mixing of these other orbitals with HOMO/LUMO. This suggests that our desired localized orbitals should achieve a compromised localization both in the physical space and in the energy space. In contrast, traditional localized orbitals keep localization only in the physical space by construction, while traditional canonical orbitals maintain localization only in the energy space, being energy eigenstates of an one-particle Hamiltonian. Here to better describe our desired localized orbitals, we introduce a new concept and call them orbitallets, in analogy to wavelets, which achieve a compromised localization both in the physical space and in the momentum space. [68] There are many ways to obtain localization in both the physical and energy spaces. A simple way to achieve this is to modify the localization target function by adding a penalty function that enforces the localization of CO energies. Here to have localization in energy space for our LOs (orbitallets), we have to define an energy spectrum for them. In this paper, we define it to be the same as the CO spectrum. One can invoke other definitions, but our definition seems to be the most natural choice. Each LO with energy ǫ LO i results from a mixture of the COs whose energies are within a certain energy window of a fixed size centered at ǫ LO i , and the mixing coefficients come from a unitary matrix U im = φ i |ϕ m . See Figure 2 for a schematic illustration. It is worth mentioning that localization involving both occupied and virtual orbitals have been used previously within a fixed energy window near the Fermi level for constructing maximally localized Wannier functions for systems with entangled energy bands. [69] Defined in a different way and for a different purpose, our localized orbitals, orbitallets, have dynamic energy windows opened at each ǫ LO i and are designed to capture the local fractions in the system.
In our present implementation of LOSC, {φ i (r)}, the LOs, are generated by a restrained Boys localization procedure [33,66], which aims at minimizing the following spread function Here the first term on the right hand side of Eq. (6) resembles the Boys spread function, except that the sum rums over all the orbitals. In the second term, a penalty function is introduced to suppress mixing of orbitals beyond an energy radius ǫ 0 , in order to achieve energy localization. As a remark, without the penalty term, minimization of F with infinite number of orbitals (which span the entire Hilbert space of functions) will lead to a set of δ functions centered at different positions and F = 0. Therefore, from pure mathematical perspective, the penalty function is also needed if the virtual orbitals participate in the localization. From the physical considerations, the penalty function w(x) should satisfy the following requirements: (1) w(0) = 0, which implies no penalty imposed against mixing between degenerate COs; (2) w(x) is a monotonically increasing function; (3) w(∞) = ∞, which forbids mixing between COs that are far apart in energy. In this paper, we try to use the following simple function to achieve the above properties, which involves four parameters. R 0 can be considered as related to the typical spread radius of small molecules, and is introduced here to factor out the length unit; ǫ 0 is the energy window as mentioned above; and γ and η are the other parameters to adjust the shape of the function, in particular, the asymptotic behavior as x → 0 and x → ∞. Note that γ has to be positive to satisfy condition (3). However, after some numerical experiment, we find that a positive γ imposes insufficient penalty for x < ǫ 0 , leading to excessively artificial local fractions for compact molecules. Therefore, we modify the function by dividing it into a piecewise function, with γ = 0 when x < ǫ 0 and The parameters are optimized for a balanced behavior on thermochemistry, reaction barrier heights and dissociation curves, resulting in R 0 = 2.7Å, ǫ 0 = 2.5eV, γ = 2.0 and η = 3.0, see the supplemental materials for more details. [33] The penalty function in Eq. (8) is continuous but not smooth at x = ǫ 0 . Yet this artificial feature has little impact on our results and can be easily removed by rounding it out or by a smooth interpolation between the two constructing functions. In the present work, since we put more emphasis on the general idea, we will not need to refine the details and make more sophisticated forms.
In the minimal basis, the H + 2 molecule has two KS orbitals. Figure 3(a) depicts the LO densities of a compact H + 2 . Both LOs resemble the original COs, and the diagonal elements of the local occupation matrix remain integers, λ 11 = 1 and λ 22 = 0. This is because the HOMO and LUMO are far apart in energy so that their mixing is suppressed, rendering LOs of the similar character as the COs. This is reasonable since there is hardly any fractional electron distribution at a small internuclear distance R. In contrast, at a large R the two nearly degenerate COs fully mix into two spatially separated LOs. The two LOs locate symmetrically at the two nuclei with λ 11 = λ 22 = 0.5, which reveals the fact that each proton carries half an electron (see Figure 3(b)). In addition, the above behavior is almost independent of the basis set.
Here as a remark, for H + 2 at a small R or a large R, different choice of the localization target functions, for example Boys or Ruedenberg, does not make a difference in the optimized LOs, as long as the penalty function satisfies the three conditions. This, however, will make a difference in the intermediate R or in a more complicated molecule. Yet this is a minor effect and beyond the scope of the present paper. Here we choose to modify Boys localization because it can be easily implemented and applied to systems of all sizes, and more importantly the Boys LOs can be replaced by Wannier functions [70] for periodic solids.
Explicit functional form. With the LOs (orbitallets) and λ-matrix, an explicit form of LOSC is constructed, similar to Eqs. (2) and (4): where Here the diagonal terms in the summation of Eq. (9) is a natural extension of the energy correction formula of Eq. (2) from one global fraction to all the local fractions. Note that for λ ii = 0 or 1, their contributions to ∆E LOSC are zero and thus have no impact on the sum. The off-diagonal terms are introduced as non-local corrections to the unphysical interaction between the local fractions centered at different positions. They play similar role as the long-range attraction term in the LSC functional. The curvature matrix elements {κ ij } for LDA and the generalized gradient approximations (GGAs) are calculated via where ρ i (r) = |φ i (r)| 2 is the ith LO density. This is a straightforward extension of Eq. (3) in a symmetric manner with respect to i and j, but with a parameter τ . In Eq. (3), τ is set to 1, which is good for the orbital corrections. In particular, in the case of HOMO energy correction of a hydrogen atom, with the exchange only LDA functional one can show that Eq. (3) exactly compensates the wrong slope under the frozen orbital assumption. [33] However, τ = 1 does not retrieve the right amount of correction for H 1/2+ . This is because the LDA exchange functional is not quadratic, so that the energy deviation from linearity in the E vs N curve of (exchange only) LDA is not strictly quadratic either, although it can be approximately treated as a parabola. As a consequence, under this parabolic assumption, one cannot simultaneously retrieve the right slope at integer and the energy at half integer. In order to do that, higher order corrections have to be introduced.
[62] In the present paper, we are biased towards the half integer energy within the parabolic correction under the frozen orbital analysis, and assign τ a nonempirical value of τ = 6(1 − 2 −1/3 ) ≈ 1.2378 [33]. In the frozen-orbital approximation, Eq. (9) leads to the following correction to the CO energies [33]: As shown in Fig. 3(c), the dissociation energy curve of H + 2 is greatly improved for R > 2Å by using Eq. (9); while for more compact geometries (R < 2Å) the energy correction is almost zero. The latter is due to the specific form of the penalty function adopted in the restrained Boys localization of Eq. (6) -it is designed to preserve the good accuracy of parent DFAs on thermochemistry for small-and medium-sized molecules. In particular, at a small R, the localization plays trivial role in the sense that the unitary transformation U is almost an identity matrix. This suggests that the minimizer of the restrained Boys target function is achieved almost at its boundary due to the small overall spread of COs and large penalty against their mixing. As R increases, the CO spread grows while the penalty term is alleviated. When R reaches a critical value (which is mainly dependent on the R 0 parameter; smaller R 0 will reduce the critical R), mixing between COs becomes favorable so that U becomes different from identity. This causes a kink at the critical R, an artifact due to the choice of our present localization scheme, and shall be addressed through a better construction scheme in future work. In addition, we note that for large R, the LOSC-LDA energy is slightly above zero, which suggests that H 1/2+ is over-corrected. This is because (1) the LDA correlation energy correction has been neglected-we introduce a non-empirical τ parameter only to account for the LDA exchange curvature; (2) the unrelaxed LDA COs in the post-SCF implementation leads to some overcorrection, which is another effect in the design of the τ parameter. Self-consistent implementation could only modestly relax the energy, without achieving the zero energy limit; in order to reach the zero energy limit, one has to go beyond the frozen orbital assumption in the design of curvature matrix and total energy correction formula.
One can also apply LOSC to other more accurate parent functionals. In this paper, we have achieved this by designing flexible forms of κ ij [33] on the basis of the curvature formula of LDA for many other types of DFAs , including the GGAs, the hybrids such as the B3LYP, the range-separated functionals such as the CAM-B3LYP [48], etc. These DFAs suffer from the delocalization error to different extents, while the LOSC gives similar corrected results; see for instance the H + 2 dissociation curves calculated by LOSC-LDA and LOSC-B3LYP in Fig. 3(c). Moreover, the fact that LOSC-B3LYP energy at large R is almost perfect suggests that LOSC applied to a better parent functional leads to better results.
In a related effort, Anisimov and Kozhevnikov have developed a generalized transition state (GTS) method to improve the LDA calculation for band gaps of solids [55]. Their suggested energy correction amounts to ∆E GTS = i 1 where each κ ii is determined by a separate constrained LDA calculation. A similar scheme was recently constructed by Ma and Wang [56]. In these works the LOs come from mixing of only occupied or virtual COs in the localization and their energy corrections, thus do not change the total energies for physical systems with integer number of electrons; hence these energy functionals are not size consistent [63] and can only correct orbital energies. These methods have only been implemented as post-DFT corrections rather than in a self-consistent manner, thus they rely on the qualitatively correct DFT densities to produce reasonable corrections. In contrast, the LOSC framework uses mixing of the occupied and virtual COs in the localization and offers an explicit form of Eq. (10) for computing the κ-matrix. It changes the DFA energies both at integer and fractional electron numbers. Moreover, Eq. (9) involves off-diagonal κ ij and λ ij that are crucial because they dispel the unwanted interactions between LO pairs. In the case of H + 2 dissociation, it is only with these off-diagonal terms that the correct asymptotic behavior as R → ∞ can be retrieved. Importantly, LOSC is a functional of the non-interacting density matrix and can be implemented self-consistently within the generalized Kohn-Sham approach.
Self-consistent corrections to energy and electron density. For practical calculations we have devised a selfconsistent field (SCF) procedure, with which the LOSC approach improves E(N ) and ρ(r) simultaneously. The SCF procedure consists of a series of steps as follows, In step (I) a projected KS (or generalized KS, GKS) Hamiltonian is constructed from the initial density matrix as h p = ρ s h 0 ρ s + (I − ρ s )h 0 (I − ρ s ), with h 0 = δE DFA δρs being the KS/GKS Hamiltonian of the parent DFA. Here, the projections on h 0 using ρ s and I − ρ s avoid overcorrecting the energies of compact molecules by rendering λ ii close to integer 0 or 1. [33]. In step (II) h p is diagonalized to generate the auxiliary COs. In step (III) the restrained Boys localization is carried out to obtain the LOs. In step (IV) the LOSC contribution to GKS Hamiltonian matrix is computed via ∆h = δ∆E LOSC δρs . Finally in step (V) ρ s is updated by minimizing the total energy with the aid of approximate gradient h 0 + ∆h [33]. Steps (I)-(V) are iterated until the initial and final density matrices are equal. It is worth pointing out that although the LOSC involves orbitals, it remains a functional of the density matrix and its SCF implementation can be carried out within the GKS scheme (or the Hartree-Fock-Kohn-Sham scheme) [71], because the Hamiltonian h p and the COs and LOs are all determined by ρ s . Thus, ǫ HOMO /ǫ LUMO of the LOSC hamiltonian h 0 + ∆h is the chemical potential for electron removal/addition and the HOMO-LUMO gap is the derivative gap, which are theoretical predictions of the fundamental gap in both finite molecules and bulk [3,5,10].
The implementation of LOSC is very efficient, since the computation of pertinent quantities such as λ and κ are straightforward. The restrained Boys localization can be conducted efficiently using the Jacobi sweep approach [67,84]. Consequently, the extra computational cost due to the LOSC procedure usually amounts to a small portion of the overall cost.

RESULTS AND CONCLUSION
In the following, we demonstrate that the LOSC approach generally alleviates the delocalization error associated with the mainstream DFAs, and thus cures many related problems in practical DFT calculations.
To start with, the size-dependent deviations between the calculated ǫ HOMO and −I ve and between I ve and I exp are mostly eliminated by applying the LOSC, suggesting that LOSC achieves size-consistency; see Fig. 1(a). As indicated in Fig. 1(b), the LOSC largely straightens the E(N ) curve between integers. Furthermore, the straightening of E(N ) curve is achieved not only for electron removal (related to HOMO prediction), but also for electron addition (related to LUMO prediction); not only for compact systems, but also for dissociating molecules such as He + 2 . [33] In the latter case, when the parent DFA predicts wrong integer energy, merely straightening the E(N ) curve does not cure the problem. In such case, the LOSC not only straightens the curve, but also shifts the integer energy so as to point to the right slope, see for instance the E(N ) curve connecting He 2 and He + 2 at R = 5Å. [33] Besides the H + 2 shown in Fig. 2(c), the LOSC also systematically improves the dissociation behavior of many other molecular cation species, such as the He + 2 , the water dimer cation, and the benzene dimer cation [33]. This suggests that the LOSC does not deteriorate with system size.
The LOSC systematically improves the prediction of ǫ HOMO , ǫ LUMO and thus the fundamental gaps for sys-tems of all sizes, ranging from atoms and molecules to polymers such as polyacenes and trans-polyacetylenes; see Fig. 4. In contrast to GSC which has no effect (and LSC which has numerical difficulty) on band gaps of bulk, [60] the LOSC gives a promising improvement as it has a nonzero correction on polymers in the extrapolated infinite chain length limit.
In addition to ǫ HOMO and ǫ LUMO , the LOSC also corrects the energies of other KS orbitals via Eq. (11). As depicted in Fig. 4(d), the orbital energies of pentacene predicted by the LOSC-LDA agree accurately with the peak positions in experimental photoemission spectrum. This feature will be further studied in our subsequent papers. Finally, the LOSC also corrects the wrong electron density caused by the delocalization error. A typical example is a solvated Cl anion [4,5]. As shown in Fig. 5, the PBE functional [88] erroneously predicts that the excess electron delocalizes over several water molecules, while the LOSC-PBE yields the correct distribution that the excess electron mostly localizes on the Cl atom.
The examples shown demonstrate that the LOSC approach remedies a wide range of problems caused by the delocalization error of DFAs. The LOSC functional is very different from conventional density functional constructions, which are explicit analytical functionals of the density, the density gradients or the KS reduced density matrix. The LOSC framework utilizes many other information, such as the localized orbitals , local occupation matrix and parent DFA reference spectrum, which themselves are implicit functionals of ρ s . This opens up a lot more possibilities in the exploration of the exact functional within the functional space. As a first effort, the LOSC functional presented in this paper addresses the size-consistency problem and exhibits a systematic elimination of delocalization error in all aspects. With a more sophisticated localization procedure and a more complete form of ∆E LOSC addressing local fractional spins, it is possible to eliminate other intrinsic errors of the mainstream functionals, and systematically improve the density functional approximation to a greater extent. Support from the National Science Foundation (CHE-1362927) ( Given two non-interacting system A and B, size consistency requires the energy of the super system of A · · · B, at the infinite separation limit, to be equal to the sum of energies of isolated system A and B, i.e., This requirement is often applied to neutral systems. Here we extend the size-consistency condition to general subsystems with arbitrary number of electrons. Let E(Z; N Z ) (Z = A or B) be the ground state energy of an isolated system Z with N Z (integer) electrons. Then the size consistency condition for the super system A · · · B with N = N A + N B electrons is that E(A · · · B; N) = min As a special case, for any neutral system Z with N Z electrons, consider a super system of Z · · · Z · · · Z with M subsystems, let us denote as Z M . Then by condition (S1), we have

II. DETAILS ABOUT RESTRAINED BOYS LOCALIZATION
In the present implementation of LOSC, we define the localized orbitals {φ i } through a restrained Boys localization procedure.
Here {ϕ i } and {ǫ CO i } are auxiliary canonical orbitals and orbital energies of h p (see the main text); {ǫ LO i } are introduced as associating to localized orbital energies, which are aligned according to the CO energies, ǫ LO i = ǫ CO i , so that the artificial LO spectrum is identical with the auxiliary CO spectrum. F is the restrained Boys target function, given by where the penalty function is a function of the energy difference between an LO and a CO, and we adopt an empirical form as where R 0 can be considered as related to the typical spread radius of small molecules. Here u becomes a dimensionless function with the form of (see Fig. S1 for the lineshape), where and We optimize the parameters for LOSC-BLYP (which are the same as LOSC-LDA, see the next section) to obtain a balanced behavior for thermochemistry, reaction barrier heights and  Figure S1. In the present version, the LOSC has negligible correction to compact molecules, and therefore has almost no correction to thermochemistry and little correction to transition state species, see Table S3 for the overall performance of LOSC on thermochemistry and reaction barriers.

III. DETAILS ABOUT CURVATURE FORMULA FOR GGAS AND HYBRID FUNCTIONALS
The LOSC curvature for GGAs have been set to be the same as LDA, which reads For global hybrid and range-separated hybrid functionals in general, the Coulomb interaction is separated into attenuated short-range and longe-range, where erf is the error function. The first term on the rhs is the attenuated short-range part, treated by GGA exchange, while the second term is the attenuated long-range part, treated by HF exchange. Note the global hybrids correspond to β = 0, while long-range corrected functionals correspond to α = 0 and β = 1. When both α and β are fractional numbers, it corresponds to the most general form. We design the curvature formula for the general hybrid functional as given by where v(r 12 ) is the attenuated short-range Coulomb interaction.
Note that for the range-separated hybrid functionals, the HOMO-LUMO gaps are usually much larger than LDA and GGAs. To account for this effect, we choose R 0 = 2.0Å for LOSC-CAM-B3LYP, while for LOSC-GGAs and LOSC-B3LYP The parameters are set to be the same as LOSC-LDA.
In terms of τ parameter, we choose a nonempirical one so that the exchange energy satisfies the following linear condition at half electron: Note here ρ 1 is an arbitrary one electron density. It follows that which leads to Note that the above analysis is based on the frozen orbital assumption. Now suppose one wants to retrieve the right slope produced by the LDA exchange at integer n = 1, i.e., then it follows which leads to τ = 1. This is the parameter used in the global scaling correction.
Regarding the numerical implementation, the double integrals in the curvature formula are evaluated using the resolution of identity (or density fitting) technique [5,6].

IV. DETAILS ABOUT THE SELF-CONSISTENT IMPLEMENTATION OF LOSC
The LOSC effective Hamiltonian ∆h is given by where ∆E is the LOSC correction. Here the first term on the rhs (we denote it as ∆h 1 ) is the frozen orbital contribution, whereas the last two terms (we denote it as ∆h 2 ) arise as the orbital relaxation term. ∆h 1 can be written explicitly as On the other hand, ∆h 2 must be evaluated through a multi-step chain rule which leads to a complicated expression.
In the present self-consistent-field (SCF) implementation of LOSC, we adopt the frozenorbital approximation and ignore the contribution of ∆h 2 , and use h 0 +∆h ≈ h 0 +∆h 1 as an approximate effective Hamiltonian to perform the energy minimization through a gradient descent approach with line search.
All the coupled cluster calculations were done using the Gaussian09 program [9]. In the main text, the basis set used is 6-31+G* for Fig. 4; an even larger basis set would be too demanding with our computational resources.
As a supplement to Fig. 1 of the main text, in Fig. S2, we show the error evolution of the HF functional for the loosely bound He M cluster. As can be seen in Fig. S2(a) and (b), the deviation of the HOMO energy from the minus integer IP and the deviation of the integer IP from the experimental IP are all independent of M, and agree with the error for a single He atom. This is due to the localization error intrinsic in the HF functional, as displayed in Fig. S2(c): its E(N) curve is piecewise concave, which suggests that the fractional system energies are overestimated. Therefore, it tends to avoid forming local fractional charges in He + M , instead, the removed electron, or the hole, localizes on one of the He atoms, leaving the rest of the He atoms as spectators rather than participants in sharing the hole. This is in contrast to the consequence of the delocalization error as discussed in the main text, and explains why the errors are constants in Fig. S2(a) and (b). In particular, for one He atom, ǫ HOMO + I ve agrees with the slope error as indicated by the arrow in Fig. S2(c). It is a negative error due to the concave nature of the HF E(N) curve. Moreover, we note that the integer IP by HF is not a good approximation to the experimental IP due to the lack of correlation. In particular, the missing correlation energy for the neutral He atom is greater than the He + . As a result, the integer IP is overestimated, as shown in Fig. S2(b) (overestimation translates to negative error). Moreover, the overestimation is in similar magnitude as ǫ HOMO + I ve , making ǫ HOMO a good approximation (slightly underestimated) to −I exp through error cancellation.
As a further remark, for infinite systems, for example the limit of M → ∞ for the He M cluster, there are two ways to compute the E(N) curve, i.e., with or without the periodic boundary condition (PBC). Without PBC, the E(N) should be identical with that of a single He atom (concave curve) and the ground state is a symmetry-breaking answer. With PBC, one should obtain a symmetric solution whose energy is higher than without PBC and the E(N) is linear. In the former case, the HOMO energy error relative to the experimental IP transfers to that of a single He atom. In the latter case, however, the HOMO energy error is completely given by the integer IP error of the He M cluster, which again can be related to the slope error of the E(N) curve of a single He atom, which is identical with the HOMO energy error of a single He atom. This can be deduced by making analogy to the discussion of delocalization error for infinite systems. Note that all the results of He clusters and discussions presented in this paper on DE or LE for different DFAs in finite systems and bulk systems (periodic or otherwise) are completely consistent with the results of Ref. [1]. In conclusion, the HF HOMO energy error for bulk has close connection with that of a single atom regardless of how the calculation is performed. Here the results are obtained by doing a one-shot calculation based on the SCF density matrix of the parent DFAs (post-SCF calculation). This is because the parent DFAs give qualitatively right density for these systems and the SCF implementation of LOSC will only slightly change the density. By performing SCF calculation using LOSC-LDA for H + 2 and comparing it with the post-SCF results, the total energy only decreases by no more than 3 mHartree. In terms of the post-SCF behavior of LOSC-LDA on these four systems, they are qualitatively similar; the dissociation curves are largely improved upon LDA. Moreover, for each dissociation, LOSC on top of different parent functionals lead to similar results and they all improve tremendously over their parent DFAs.
As a supplement to Fig. 3(a)  Furthermore, to validate the linearity condition for fractional charge systems, we compare the E vs N curve regarding electron removal of He 2 with increasing internuclear distances for BLYP and LOSC-BLYP, benchmarked by CCSD(T) as shown in Figure S5. As can be seen, the BLYP exhibits convex curves for all R, while the correct linear curve is largely restored by LOSC-BLYP. It is worth noticing that the LOSC not only changes the wrong curvature, but also corrects the wrong integer point. For example, when R = 3 and 5Å, the N − 1 system energy is greatly underestimated by BLYP, but reaches good agreement with the CCSD(T) result when applying LOSC. In Figure S6, we show the E vs N curve for F 2 molecule, in the case of electron removal and electron addition. As can be seen, both curves are convex for BLYP but straightened by LOSC-BLYP.
In Table S1, we further present the LOSC results in terms of mean absolute deviation of −I DFA ve (−A DFA ve ) and ǫ HOMO (ǫ LUMO ) based on LDA, GGAs, global hybrid and range separated functionals, relative to the best estimates of the −I ve (−A ve ). Here, as in the main text, the calculations are done in the post-SCF manner, in the sense that the SCF converged density of the parent functional is used to evaluated h 0 + ∆h 1 and ∆E. And the LOSC orbital energy is computed through where ψ i 's are the canonical orbitals of the parent functional.
In fact, an alternative way of computing the orbital energy is through diagonalizing the LOSC effective Hamiltonian h 0 + ∆h 1 in the post-SCF manner. This is in the similar spirit as the SCF implementation, but in a one shot calculation. We have compared the orbital energy results of LOSC-LDA using post-SCF projection and post-SCF diagonlization, and they differ by 0.02eV at most for atoms or molecules in the G2-97 set.
Regarding Fig. 3(b)-(d) of the main text, we have truncated the number of LOs in the localization and procedures following that to save computational cost without affecting the numerical accuracy. Essentially, the orbitals (COs or LOs) with too low or too high energies have little contribution to the total energy or orbital energies near the frontier level, although they do affect the correction to the lower-lying or higher-lying orbital energies. Therefore, for the correct prediction of HOMO/LUMO energies and photoemission spectra near the frontier level, we have restricted the energy range from -30 eV to 10 eV and only consider the LOs within this range to tremendously reduce the computational cost during the localization and curvature computation procedures. By comparing the results with and without truncation for polyacenes with n = 1-3, we find that energy levels of interest change by less than 0.01 eV for LOSC-LDA. Here the calculations have been based on post-SCF projection. In Table S2, we supplement the LOSC results of polyacenes using other mainstream DFAs as parent functionals. The global scaling correction (GSC) results are also listed as a comparison.
From Table S2, it is obvious that the effectiveness of LOSC is insensitive to the parent DFAs; and its behavior is rather size-independent. This is in clear contrast to the GSC, which behaves very well for small-sized systems, but becomes insufficient as the system size increases.
Finally, the LOSC largely preserves the thermochemistry of the parent functionals, as can be seen from subsets, spanning the occupied and virtual space, respectively; and the set {φ i } can be nearly separated into the two subsets. As a result, the diagonal elements of the local occupation matrix λ ii = φ i |ρ|φ i ≈ 0 or 1, and the off-diagonal elements λ ij = φ i |ρ|φ j ≈ 0. Then it is obvious that the correction energy is small, hence the thermochemistry is preserved. When the bond is stretched, corresponding to transition state species, then the restrained Boys localization takes effect and orbitals from occupied and virtual spaces begin to mix into φ i 's, yielding nonzero λ ii and λ ij . This is when we see the correction in reaction barriers and dissociating molecules. In the present paper, however, we adopt a conservative localization scheme, so that slight stretch of molecules does not incur orbital mixing from occupied and virtual space. This leads to little correction to the reaction barrier heights. Here the LOSC results are based on post-SCF calculations using the parent functional density. The basis used for each calculation is (a) 6-311++G(3df,3pd), (b) aug-cc-pVTZ, (c) 6-311++G(3df,3pd) and (d) cc-pVDZ. The fitting basis used for density fitting procedure for the curvature calculation of LOSC is aug-cc-pVTZ for (a)(b)(c), and 6-311++G(3df,3pd) for (d).  Figure 3(a) of the main text. Here the LOSC-LDA −I ve (−A ve ) results almost overlap with LDA. The basis used for each calculation is 6-311++G(3df,3pd) and the fitting basis used for density fitting procedure in the curvature calculation of LOSC is augcc-pVTZ.