Embedding quasi-static time series within a genetic algorithm for stochastic optimization: the case of reactive power compensation on distribution systems

This paper presents a methodology for the optimal placement and sizing of reactive power compensation devices in a distribution system (DS) with distributed generation. Quasi-static time series is embedded in an optimization method based on a genetic algorithm to adequately represent the uncertainty introduced by solar photovoltaic generation and electricity demand and its effect on DS operation. From the analysis of a typical DS, the reactive power compensation rating power results in an increment of 24.9% when compared to the classical genetic algorithm model. However, the incorporation of quasi-static time series analysis entails an increase of 26.8% on the computational time required.

n R P C : Node for the installation of reactive power compensation. t : Index for each hour of the year (t = 1, . . . , 8760). A : Matrix of genetic algorithm population. a (g) : Individual g of genetic algorithm population (A). a (b, g) : B i tb of individual g of genetic algorithm. AC C (n) : Capital cost of compensation device at node n (€). AC C F xd (n) : Capital cost of fixed compensation device at node n (€). (n) : Capital cost of variable compensation device at node n (€). ARC (n) :

AC C Var
Replacement cost of compensation device at node n (€). ARC F xd (n) : Replacement cost of fixed compensation device at node n (€). ARC Var (n) : Replacement cost of variable compensation device at node n (€). AMC (n) : Maintenance cost of compensation device at node n (€). AMC F xd (n) : Maintenance cost of fixed compensation device at node n (€). AMC Var (n) : Maintenance cost of variable compensation device at node n (€). BC BV : Branch-current to bus-voltage matrix. B I BC : Bus-injection to branch-current matrix. C RF : Capital recovery factor. C AP (l, b) : Reactive power injected to the system for block l and node b (kVAr). d (h, n) : Load profile at time h and node n. D (t, n, c) : Load time series at time t, node n, and Monte Carlo trial c. D k++ (t, n) : Load time series selected by k-means++ algorithm. D max (n) : Capacity of distribution transformer at node n (kVA). e P V : Electron charge (C). E max : Maximum value of electricity prices (€/MWh). E (t) : Electricity price at time t (€/MWh). E L S(n) : Cost of energy losses for compensation installed at node n (€). f R (·) : Function to calculate the heat loss by radiation. f C (·) : Function to calculate the heat loss by convection. f S (·) : Function to calculate the heat gained by solar radiation. f MP P T (·) : Implementation of maximum power point tracking algorithm. FZ (·) : Cumulative distribution of time seriesz (t, n, c) . F D (·) : Cumulative distribution of load-demand time series D (t, n, c) . F F o P V(t, n) : Maximum fill factor at time t for generator n. F F P V(n) : Fill factor of generator n. G MD (n) : Geometric mean distance between phase conductors of branch n (m). G MR (n) : Geometric mean radius of phase conductor of branch n (m). G (t) : Solar radiation at time t (W/m 2 ). I SC P V(t, n) : Short-circuit current at time t for generator n (A).
I SC P V, STC (n) : Short-circuit current under standard test conditions for unit n (A). I MAX P V(n) : Current at maximum power production for generator n (A). I P V(t, n) : Photovoltaic cell current at time t for generator n (A). I (t) : Vector of branch currents at time t (A). I (t, n) : Current at time t and branch n (A). I (t,n) : Difference between the maximum current and the actual (A). I max (t, n) : Ampacity at time t of branch n (A). k DS : Coefficient that depends on the system frequency ( /km). k P V : Parameter of solar transmittance and absorptance. k B : Boltzmann constant (J/K). k C S(n) : Auxiliary variable of objective function for node and branch n. m P V : Photovoltaic cell ideality factor. NP C (n) : Net present cost for compensation equipment at node n (€).

NP C G A(g) :
Net present cost of individual g (€). N P S P V(n) : Number of panels connected in serial on generator n. N P P P V(n) : Number of panels connected in parallel on generator n. N C S P V(n) : Number of cells connected in serial on generator n. N C P P V(n) : Number of cells connected in parallel on generator n. NOC T (n) : Nominal operating cell temperature of generator n ( • C).

Ob j G A(g) :
Value of the objective function for individual g (€). p I (t, n) : Normalized power of inverter at time t for generator n. P I (n) : Rated power of converter of generator n (kW). P (t, n) : Active power consumed at time t and node n (kW). P R(t, n) : Heat loss by radiation at time t for branch n (W/m). P C (t, n) : Heat loss by convection at time t for branch n (W/m). P S(t, n) : Heat gained by solar radiation at time t for branch n (W/m). P P V(t, n) : Photovoltaic cell power at time t for generator n (W). P r {·} : Probability of a determined event. Q (t, n) : Reactive power consumed at time t and node n (kVAr). Q l C : Lowest value of confidence interval of reactive power (kVAr). Q r C : Highest value of confidence interval of reactive power (kVAr). Q F xd R P C(n) : Rated capacity of fixed compensation at node n (kVAr). Q Var R P C(n) : Rated capacity of variable compensation at node n (kVAr). R (n) : Conductor resistance of branch n ( /km). R S P V(t, n) : Photovoltaic cell resistance at time t for generator n ( ). S (t, n) : Simulated load series at time t and node n (kVA). T A(t) : Ambient temperature at time t ( • C). T P V(t, n) : Photovoltaic cell temperature at time t for generator n ( • C). u OC P V(t, n) : Relative photovoltaic cell open-circuit voltage at time t for unit n. U (t) : Vector of system voltage at time t (kV).
Voltage at time t and node n (kV). U R : Vector of rated voltage (kV). −→ U (t) : Vector of voltage reduction at time t (kV). U SYS P V(n) : Voltage of photovoltaic generator at node n (V). U mi n : Minimum allowed voltage of distribution system (kV). U max : Maximum allowed voltage of distribution system (kV). U OC P V(t, n) : Photovoltaic cell open-circuit voltage at time t for generator n (V). U OC P V, STC (n) : Open-circuit voltage under standard test conditions of unit n (V). U e P V (t, n) : Thermal voltage at time t for generator n (V). U P V(t, n) : Photovoltaic cell voltage at time t for generator n (V).
U MAX P V(n) : Voltage at maximum power production for generator n (V). V (t) : Wind speed at time t (m/s). z (h, n) : Standardized daily load profile at time h for node n. z (t, n, c) : Standardized and normalized series at time t, node n, and trial c. z (t, n) : Standardized yearly load profile at time t for node n. Z (n) : Conductor impedance of branch n ( /km). α L (t) : Value of white noise at time t. α P V(n) : Temperature coefficient of generator n (%/ • C). α I (n) : Parameter of inverter efficiency model for generator n. β I (n) : Parameter of inverter efficiency model for generator n. δ : Significance level. η I (t, n) : Inverter efficiency at time t for generator n. η P V(n) : Cell efficiency of photovoltaic generator n. θ I (n) : Parameter of inverter efficiency model for generator n. λ (t, n, c) : Random variable at time t, node n, and Monte Carlo trial c. μ d : Mean value of daily load profile. σ d : Standard deviation value of daily load profile. Ø : Autocorrelation coefficient of load time series (%).

Introduction
The cultural and technological development of human society is closely related to the idea of management and allocation of natural, human, and economic resources in an appropriate manner. Management of resources and budgets needed for developing social welfare requires feasible strategies, frequently based on optimization techniques. With the increasing complexity of industrial systems and human interactions, optimized management of real-life activities involves the incorporation of stochastic processes. Decision-makers incorporate various sources of uncertainty through probability theory or fuzzy logic (FL) to enhance the effectiveness and reliability of the decision process. Moreover, the construction of a sustainable society requires an electricity network with smart capabilities to integrate renewable resources and mechanisms to their full exploitation. Due to the random behavior of clean energies, optimal management of energy systems supported by these technologies has become a necessity because of their growing adoption.
The topic under discussion in this paper is essentially a planning problem subject to the uncertain effects of renewable generation. Next, subsection 1.1 briefly presents a general perspective of the state-of-the-art optimization techniques under uncertainty, while subsection 1.2 explores particular issues of the electricity system operating under uncertain conditions. Finally, subsection 1.3 carefully explains the main contributions of the approach developed in this work.

Optimization under uncertainty
In a general sense, decision-makers can solve optimization problems with uncertain variables using robust optimization (RO), stochastic programming (SP), chance-constrained programming (CCP), and FL, among other techniques.
Concerning RO, Bertsimas, Gupta, and Kallus (2018) developed a robust version of the sample average approximation to overcome the weaknesses related to the limited number of data points used to represent the behavior of uncertain processes. The authors combined features of distributionally RO with hypothesis testing. Similarly, Esfahani and Kuhn (2018) developed a technique to face the problems related to the availability of a limited number of data points by using the Wasserstein metric and distributionally RO. Following a similar trend, Ning and You (2018) modeled the uncertainty using the Dirichlet process mixture combined with a maximum likelihood estimator. Ma (2019) produced a model based on RO that incorporates the risk of life cycle assessment, minimizing the environmental impact of product usage.
RO can be combined with FL and SP to improve optimization results. Farrokh, Azar, Jandaghi, and Ahmadi (2018) developed a hybrid robust fuzzy SP model to consider the influence of different types of uncertainty sources expressed as distributions and fuzzy numbers. In the same way, Ghahremani-Nahr, Kian, and Sabet (2019) developed a model formulated as a robust fuzzy programming problem that employs heuristic techniques. Tsao, Vo-Van, Lu, and Yu (2018) combined two-stage SP and FL for multiobjective optimization. Zou, Ahmed, and Sun (2019) evaluated the weaknesses of multistage stochastic integer programming and proposed an enhanced technique based on stochastic dual dynamic integer programming. The authors reformulated the problem and combined decomposition algorithms with Lagrangian cuts. Baptista, Barbosa-Povoa, Escudero, Gomes, and Pizarro (2019) developed a technique for two-stage multiperiod stochastic optimization that combined CCP and second-order stochastic dominance for risk aversion.
Van Ackooij, Berge, Oliveira, and Sagastizábal (2017) developed a technique to determine the area in which the probabilistic constraints are feasible. Jie, Prashanth, Fu, Marcus, and Szepesvári (2018) developed a technique to solve those problems in which human perspectives are involved. The technique is based on the cumulative prospect theory, where the estimation and optimization are sequentially determined. The approach developed by Buchheim and Pruente (2019) aimed to define a set of feasible solutions instead of a single optimal one. Liu, Lau, and Kananian (2019) proposed a constrained stochastic successive convex approximation methodology to the solution of non-convex problems. The technique solves several convex optimization problems using a surrogate model. Liu, Liu, and Tao (2019) designed a technique to face stochastic composition optimization problems with two expected value functions. The approach embedded an inner objective function into an outer one. The stochastic variance reduction gradient and stochastic average gradient methodologies are combined to estimate the value of the inner objective function, while the dualityfree concept is employed to estimate the value of the outer function.
Al-Juboori and Datta (2019) employ non-dominated sorted genetic algorithm-II (NSGA-II) on the optimal design of hydraulic water-retaining structures considering reliability concepts. The formulation considers several stochastic ensemble surrogate models to incorporate the effect of uncertainty.
Regarding CPP, Tan, Gong, Chiclana, and Zhang (2019) proposed a model based on goal programming. The interaction between decision-makers' opinions is represented using uncertain variables, while a combined technique based on the Monte Carlo (MC) simulation and a genetic algorithm (GA) solves the optimization problem. Further on this topic, Kinay, Saldanhada-Gama, and Kara (2019) created a multicriteria model and analyzed the performance of vectorial optimization and goal programming. Tavana, Shiraz, and Di Caprio (2019) proposed a model based on random rough mathematical programming formulated using CCP.
On the one hand, the computational tractability of the problem deserves particular attention. On the other hand, the knowledge about the specific application becomes crucial to mitigate the adverse effects of the lack of information. Concerning this reasoning, the next subsection gives an overview of the problem analyzed in this paper.

Reactive power compensation on distribution networks
Full-scale deployment of renewable generation could produce operating problems on a distribution system (DS), specifically on the voltage profile. A representative condition is when a high amount of renewable power is injected into the DS during periods of low load, resulting in a voltage increase out of the safe limits. Under this panorama, planning engineers analyze the incorporation of reactive power compensation (RPC) to guarantee the effective operation of the smart grid (SG). RPC can be performed using different devices: a shunt capacitor bank (SCB), a distribution static compensator (DSTATCOM), a unified power quality conditioner (UPQC), or a static VAR compensator (SVC).
Consequently, the sizing and management of these devices have been extensively studied in the literature, resulting in several techniques. Das, Das, and Patra (2017) developed a method that aims to improve the voltage of a node remotely located from another under our control by injecting reactive power. Azzam and Mousa (2010) combined a GA and the ε-dominated definition with the technique for order preference by similarity to the ideal solution (TOPSIS), considering multiple optimization objectives. Pereira, Costa, Contreras, and Mantovani (2016) presented a model for the allocation of distributed generation (DG) and RPC, which uses a tabu search (TS) algorithm and Chu-Beasley GA (CBGA). The TS is used to optimize the installation place of the SCB and DG, while the CBGA is used to perform their power dispatch. Jannat and Savić (2016) developed a model for the optimal allocation and size of an SCB. The model considers the variability of renewable resources, the quality of the voltage profile, and the capacity of reactive power. Thus, MC simulations are used to consider the uncertainty, while an NSGA-II solves the multiobjective optimization problem. Fard and Niknam (2014) developed a methodology that jointly analyzes several reliability indexes and the power loss cost. The authors designed the method as the solution to a multiobjective optimization problem using a firefly algorithm. Su, Masoum, and Wolfs (2016) proposed a sequential approach to maximize the net annual savings, taking into account the voltage limits and the maximum size of the SCB.
Using the literature review presented until this point, the next subsection discusses the main contributions of this work.

Contributions
Stochastic optimization techniques presented in the technical literature show different viewpoints on how to describe the uncertain processes as well as how to make the resulting formulation computationally tractable. On the other hand, the performance of RPC devices, such as SCBs, DSTATCOMs, UPQCs, and SVCs, depends on the load profile and behavior of the DG so that the operation is affected by the randomness of renewable generation.
Probabilistic constraints related to the problem of installing RPC in a DS supported by renewable energy are directly related to the interactions between the DG and load demand through time. Both of these variables, solar photovoltaic (PV) generation and load demand, are modeled by a time series with a determined correlation degree and daily profile, along with other features. In this paper, quasi-static time series is combined with a GA to determine the optimal size and placement of RPC devices, such as SCB and DSTATCOM.
On the one hand, heuristic optimization approaches, such as GA, GA-TOPSIS, TS-CBGA, NSGA-II, firefly, and PSO, provide flexibility on their formulation and implementation, which allows us to incorporate the non-linear influence of RPC devices on the DS operation. On the other hand, the technical literature considers a limited number of load and DG operating conditions due to the high complexity of the problem under study.
Quasi-static time series consists of evaluating the performance of the DS in terms of voltage and current at each node and branch of the system along the time. The method performs a probabilistic power flow (PPF), obtaining the probability distribution function (PDF) of each voltage and current of the system. These PDFs allow us to verify whether the voltage and ampacity at each node and branch of the network are within the appropriate operating interval.
As the main contribution of this paper, the proposed methodology embedded this general concept into an optimization model based on a GA, using probabilistic constraints to guarantee the feasibility of the solution (sizing and placing of RPC equipment), and minimizing the net present cost (NPC). This contribution fulfills the gap of many works that only assess and optimize the system behavior under a limited number of operating conditions and assumptions.
The paper organization is as follows: Section 2 describes the energy system; Section 3 explains the simulation method; Section 4 presents the proposed methodology, and Section 5 illustrates the proposed method. Finally, Section 6 discusses the main conclusions and findings.

Energy System Model
This section describes several essential aspects: the model of environmental resources, the characterization of electricity consumption, the model of the energy conversion systems used for DG, and the behavior of energy prices.

Environmental variables and renewable resources
Many research centers analyze the meteorological variables such as solar radiation, wind speed, and ambient temperature. The United States National Aeronautics and Space Administration (NASA) provides a general-purpose database of natural resources across the globe (NASA, 2019), which many optimization tools use for the integration of renewable energies.
This work pays special attention to the influence of solar radiation, wind speed, and ambient temperature on the ampacity of DS feeders. The proposed method's simulation model adopts the retrospective reanalysis approach; Pfenninger and Staffell (2016) and Staffell and Pfenninger (2016) implemented this technique for the estimation of the natural resources. The website Renewables.ninja (2019) provides an online implementation of this technique.

Electricity demand, real-time pricing, and smart distribution system
With the integration of communication equipment and remote sensing  into the DS, the interest in the behavior of electricity consumption has grown continuously. Many approaches have been proposed in the literature to represent and classify the consumption pattern of electricity. This work uses a statistical approach to numerically create a time series able to represent the daily behavior and PDF of energy consumption. However, other methodologies based on the combination of a time series regression with an artificial neural network (Moeeni & Bonakdari, 2017) or those based on a Markov process (Bonakdari, Zaji, Binns, & Gharabaghi, 2019) are also available. The simulation approach used in this paper assumes the availability of a consumption profile, its correlation coefficient, and its PDF. Thus, the process described as follows can be used to synthetically generate hourly values in 1 year (Lambert, Gilman, & Lilienthal 2006): Step 1: Select a determined node (n) classified as a demand node (PQ model for power flow). For those nodes with PV generation, go to subsection 2.3.
Step 2: Set the index of the MC simulation (c) to 1 (c ← 1).
Step 3: Create the time series of correlated random numbers for the corresponding node (n) and MC simulation (c). It is carried out by applying equation (1).
The variable α L (t) is modeled as white noise with mean zero and standard deviation equal to √ 1 − Ø 2 .
Equation (2) defines the normalized daily profile.
Step 5: Calculate the yearly time series of a typical load profile. It is carried out by periodically repeating the daily profile throughout the year (365 times). It results in the yearly profilē z (t, n) built from the seriesz (h, n) . Step 6: Create a yearly time series with a determined correlation degree and with the profile shown in Fig. 1. It is carried out by adding the seriesz (t, n) and λ (t, n, c) according to equation (3).
Step 7: Apply the probability transformation shown in equation (4) to determine the load time series with the required PDF. Figure 2 presents the cumulative density function (CDF) of load time series [F D (·)], which was calculated from the load profile of Fig. 1.
Step 8: If c < C , set c ← c + 1, and go back to Step 3; else go to Step 9.
Step 9: Select the time series to be used in the simulation process employing the k-means++ clustering algorithm. All of the time series D (t, n, c) are input to the k-means++ algorithm to find a single time series [D k++ (t, n) ] to represent the load at node n.
Step 10: Scale the time series obtained in Step 9 [D k++ (t, n) ] using the capacity of distribution transformer [D max (n) ], according to equation (5).
An essential characteristic of SG is its capability to apply the electricity tariff on a real-time basis so that electricity prices can vary dynamically. Regarding this issue, this study assumes that the electricity prices are proportional to the corresponding electricity demand. It means that electricity prices reach their highest value when load demand is maximal, and they reach their minimum value at those hours of low demand. Accordingly, equation (6) estimates the electricity prices.
The average impedance of each branch of the network, considering the conductor temperature in the range between 25 and 75 • C, is a frequently used parameter to perform power flow analysis of three-phase balanced systems. Thus, equation (7) shows its definition (Short, 2004).

Distributed PV generation
PV generation distributed over the system is a promising manner to incorporate renewable generation in urban areas. Equations (8-21) describe a complete model of a PV generator connected to a determined node (n). The PV cell temperature depends on the ambient temperature and solar radiation, as shown in equation (8) (Lambert, Gilman, & Lilienthal, 2006). The PV cell model, including the non-linear relationship between voltage and current, is presented in equations (9-17) (Lorenzo et al., 1994). The PV controller employs the golden section search algorithm (Kheldoun, Bradai, Boukenoui, & Mellit, 2016) applied on equation (17). Equation (18) expresses this idea through the function f MP P T (·). The rated capacity of the power converter is presented in equation (19), while equations (20) and (21) describe its efficiency (Rampinelli, Krenzinger, & Romero 2014).

Performance of the Energy System
A long-term analysis of the optimal allocation of compensation devices requires a reliable and accurate simulation model. This section describes how the SG is studied, taking into account the influence of DG. Figure 3 summarizes the general methodology implemented for the assessment of the energy system. According to Fig. 3, information related to the environmental variables (wind speed, ambient temperature, and solar radiation) is required. Additionally, technical data related to the SG, such as DS topology, branch impedance, load demand, and electricity price time series, are also needed. In this sense, the models previously described in Section 2 can be used to approximate some of the required data.
Another piece of critical data is the information related to the set of nodes (candidate nodes) in which a decision-maker would install compensation devices (SCB and DSTATCOM).
Assume that a decision-maker compensates a determined node (n R P C ). It means that the voltage at this node should be equal to the rated value of the system (one in the per-unit system). An RPC device consumes or injects reactive power from or to this node, respectively, at each hour of the year. An RPC compensation device consists of a constant (fixed) provision of reactive power and another provision of reactive power with dynamic (variable) capabilities. The constant provision could be an SCB, and the variable provision could be a DSTATCOM. Once the initial input data are available, solar radiation time series [G (t) ] is used to estimate the corresponding PV generation. Then, the SG and DS information, as well as load and electricity price time series, are incorporated into the PPF, taking into account that the voltage of the compensation node (n R P C ) keeps at the nominal value.
The PPF consists of sequentially solving 8760 power flow problems to consider the variations of load demand and DG generation along the time. Time series of the current at each branch, the voltage at each node, energy losses on the DS, and the reactive power consumed or injected from or to the compensation node (n R P C ) are results of the PPF analysis. Time series of all of these variables are used to estimate the cost of integrating RPC at the chosen node and the probability of maintaining the voltage profile and branch current within an acceptable interval. The time series of reactive power at the compensation node (n R P C ) is analyzed to estimate the rated value of the equipment (SCB and DSTATCOM) to be installed.
Then, the decision-maker builds the PDF of the RPC at this node. The confidence interval associated with the PDF is calculated considering a determined significance level (δ) and later used to estimate the rated capacity of the RPC device. Thus, the rated capacity consists of the constant provision of reactive power [Q F xd R P C(nRPC ) ] and the dynamic provision [Q Var R P C(nRPC ) ], specifically. Moreover, the time series of energy losses is used to estimate its associated costs [E L S(nRPC ) ]. The rated capacity for compensation equipment and the cost related to energy losses (E L S ) are combined with other techno-economic parameters to estimate the corresponding NPC, which is the crucial factor to be minimized in this study.
Voltage time series at each node are used to build their corresponding PDFs. These PDFs are later used to evaluate the probability of maintaining the voltage within a determined interval (P r {U mi n ≤ U (t, n) ≤ U max } ≥ 1 − δ), considering a determined significance level (δ).
Wind speed, solar radiation, and temperature time series, as well as SG data, are used to perform the ampacity analysis. This study obtains the maximum allowed current per branch [I max (t, n) ]. Then, the model computes the difference between the maximum allowed current and the actual current obtained from the PPF [ I (t, n) = I max (t, n) − I (t, n) ], to estimate the probability of overloading at each branch [P r { I (t, n) < 0} ≤ δ], considering a determined significance level (δ).
From the perspective of an optimization study, adequacy analysis for the installation of a compensation device is evaluated using the NPC as the main objective, and the probabilities P r {U mi n ≤ U (t, n) ≤ U max } ≥ 1 − δ and P r { I (t, n) < 0} ≤ δ as constraints. The next subsections explain each of these steps on the assessment of the SG.

Probabilistic power flow analysis
The solution of PPF with the load demands and DGs is an iterative process. As aforementioned, load nodes are represented as a PQ model, while DGs are voltage-controlled nodes (PV model for power flow nomenclature). PV generation is connected to DS at point of common coupling (PCC) through the power converter, which has capabilities to maintain the voltage at the connection node at the rated value (IEEE Standards Coordinating Committee 21, 2018).
For each hour of the year (t = 1, . . . , 8760) and node of the system (n = 1, . . . , N), the corresponding apparent power [S (t, n) ], active power [P (t, n) ], and reactive power [Q (t, n) ] are available in the format shown in equation (22). In the case of load demand nodes, the model used here calculates the active and reactive power through equation (22) using the time series developed in subsection 2.2 with a determined power factor. In the case of PV generation, the simulation model estimates the active and reactive power combining the output power of PV generation according to subsection 2.3 and the reactive power required to maintain the voltage at PCC at its rated value. The computational procedure to calculate the reactive power of PV nodes is available in Teng, (2003Teng, ( , 2008 and Lujano-Rojas, Dufo-López, Bernal-Agustín, Domínguez-Navarro, and Catalão (2018).
The optimization model solves 8760 load flow problems, obtaining the vectors of voltage and current [ U (t) and I (t) ] that define the time series of the voltage of each node and current of each branch of the system. Equation (23) defines the form of such vectors.

Ampacity analysis
The maximum admissible current through a determined DS feeder [I max (t, n) ] can be approximated using the general heat equation of the conductor of interest under steady-state conditions. This process results in the model shown in equation (24) (Deb, 2000).

I max
(t,n) = P R(t,n) + P C (t,n) − P S(t,n) R (n) ⦡ t = 1, . . . , 8760; n = 1, . . . , N In this sense, heat loss by radiation depends on temperature, as suggested in equation (25). Heat loss by convection depends on the temperature and wind speed, as suggested in equation (26), while heat gain depends on the solar radiation, as suggested in equation (27) (Deb, 2000).
Specific details about the estimation model represented by the functions f R (·), f C (·), and f S (·) can be found in Deb (2000).

Net present cost estimation
Estimating NPC depends on many critical techno-economic factors, such as inflation and discount rates and project lifetime, as well as capital replacement and maintenance costs. Capital and replacement costs are directly related to the capacity of the RPC equipment to be installed, while maintenance costs are related to the costs associated with the energy losses on the DS.
The model calculates the fixed and variable capacities of the RPC device [Q F xd R P C(n) and Q Var R P C(n) ] from the PDF of reactive power provision at the node of interest (n = n R P C ). It means that the reactive power to be provided to this node, to maintain its voltage at the nominal value, has to be calculated for each hour of the year. Otherwise, the model estimates the effect of the RPC (fixed and variable) on the DS by modeling this device as a voltagecontrolled node (PV model for power flow nomenclature) able to provide reactive power only.
The approximation of the required amount of reactive power to be injected or drawn from the system is an iterative process. The main result consists of a vector of reactive power injected or drawn to the installation node [Q (t, nRPC ) ∀ t = 1, . . . , 8760]. Then, the model calculates the rated capacity of the compensation device using the CDF. The probabilistic technique accurately estimates the rated capacity from the most extreme value observed on the confidence interval of the PDF, considering a specific significance level (δ). The technique calculates the confidence interval by evaluating the CDF of Q (t, nRPC ) inversely in the points δ/2 and 1 − δ/2, to obtain the lowest and highest values of the interval, respectively. Figure 4 briefly shows this idea, where the model uses the histogram of frequencies of reactive power Q (t, n) at the installation node (n = n R P C ) to calculate the CDF. The technique estimates the highest value of the confidence interval (Q r C ) by evaluating the corresponding CDF curve at 1 − δ/2 in an inverse manner. Similarly, the approach calculates the lowest value of the confidence interval (Q l C ) by evaluating the CDF curve at the point δ/2 inversely. Finally, it defines the rated capacity of each RPC device [Q F xd R P C(nRPC ) and Q Var R P C(nRPC ) ] according to equations (28) and (29).
This approach allows us to determine the rated capacity of the device for a given node. Thus, exclusively focusing on determining the proper node to install the corresponding device is needed.
The cost of energy losses [E L S(nRPC ) ] related to the compensation of the node n = n R P C can be estimated by using the energy losses obtained from the PPF and the electricity price time series [E (t) ]. Finally, the NPC [NP C (nRPC ) ] of reactive power compensation at node n = n R P C is calculated using equations (30-33).

Probabilistic voltage profile analysis
As previously mentioned, the optimization model evaluates the quality of the voltage profile by using the probabilistic constraint shown in equation (34).
The optimization technique uses the results obtained from the PPF related to the voltage of the DS at each node. Thus, it builds the CDF of voltage at a determined node (n) using U (t, n) time series.
Then, the approach computes the value P r {U mi n ≤ U (t, n) ≤ U max } as the subtraction between P r {U (t, n) ≤ U max } and P r {U (t, n) ≤ U mi n }.

Probabilistic feeder ampacity analysis
The optimization method uses the results from the PPF for the probabilistic assessment of feeders overloading. This evaluation is carried out by verifying equation (35).
The proposed technique employs the branch current time series [I (t, n) ] and the maximum permissible current time series [I max (t, n) ] to calculate the margin time series [ I (t, n) ], as shown in equation (36). Then, it builds the CDF of the margin [ I (t, n) ] and the probability value P r { I (t, n) < 0} using a linear interpolation algorithm.

Proposed Methodology
This section thoroughly explains the proposed methodology. First, in subsection 4.1, the formulation of the optimization problem is carried out in a probabilistic manner so that the random nature of renewable resources is adequately justified. Then, in subsection 4.2, the solution to this problem using a GA is comprehensively described.

Optimization problem
Optimal allocation and sizing of the RPC devices consists of minimizing equation (37), subject to equations (38-42).
Equation (37) represents the minimization of the NPC of the RPC equipment. Equations (38-40) are the constraints of the voltage profile and feeder ampacity. Constraint equations (41) and (42) represent the energy balance of the system obtained by solving power flow at each time step (t = 1, . . . , 8760).
The next subsection describes the proposed GA implementation.

GA implementation
Researchers apply heuristic techniques in many engineering fields such as vehicle performance improvement (Morteza & Yildiz, 2016), manufacturing process optimization (Yildiz, 2013), and the cam-roller follower mechanism design (Hamza, Abderazek, Lakhdar, Ferhat, & Yildiz, 2018), among other applications. The optimization method developed in this paper considers all of the available places in the system to the allocation of the RPC equipment. The proposed approach organizes the information in a table similar to that presented in Fig. 5, considering B places for the allocation of RPC devices. The proposed methodology uses a binary-coded GA, where each individual has B bits. This table establishes the relationship between each bit and the DS node. Using the example shown in Fig. 5, if the second bit (b = 2) is equal to 1, it means that the installation of a compensation device on node 20 (n R P C = 20)   is analyzed. On the contrary, if the second bit is equal to zero, the installation of a compensation device in that node is not considered. Figure 6 presents an illustration of the GA population (A) with G individuals and B possible places for the installation of compensation equipment. Figure 7 illustrates the structure of the individual g, which considers the compensation of two different places on the system simultaneously.
The optimization algorithm calculates the objective function of a determined individual [ a (g) ] by following the algorithm explained as follows: Step 1: Considering those bits of the GA individual [ a (g) ] equal to 1 and the corresponding table of candidate nodes (Fig. 5), perform the PPF (subsection 3.1) and ampacity analysis (subsection 3.2), and estimate the capacity of the RPC equipment and the NPC (subsection 3.3). Equation (43) expresses the NPC of the individual under analysis.
Step 3: The CDF of the system voltage and current at each node and branch (n) are built to evaluate the inequality equations (38) and (39). If the constraint equation (38) is false or the constraint equation (39) is false, the respective element of the Downloaded from https://academic.oup.com/jcde/article-abstract/7/2/177/5815390 by Universidad de Zaragoza user on 22 May 2020  Step 4: If (n < N), set n ← n + 1, and go to Step 3; else go to Step 5.
Step 5: Calculate the value of the objective function [Ob j G A(n) ] of individual g using equation (44).
The objective function equation (44) becomes higher as the operational feasibility of the solution reduces. In other words, when the constraint equations (38) and (39) are not valid in some nodes or branches, the value of the objective function in equation (44) increases, reducing the fitness of the individual.
The rest of the steps related to the optimization with a GA using binary coding (reproduction, crossing, and mutation pro-cedures) appear in the technical literature; more information is available in Goldberg (1989).

Case Study
The technique proposed in this work has been analyzed using an extensive simulation process considering a hypothetical DS. The next subsections thoroughly describe the testing conditions as well as the corresponding results.

Case description
A DS to be located in Zaragoza (Spain) illustrates the proposed methodology in this paper. The simulation model generated the load demand and electricity price time series using the method previously described in subsection 2.2. For a better understanding, Fig. 8 shows the yearly time series of load demand [D k++ (t, n) ] on the left side and the first 72 hours of the year on the right side. It is possible to observe how the generated time series have the hourly trend of the profile previously shown in Fig. 1.
The typical meteorological year (TMY) for solar radiation, ambient temperature, and wind speed were found on the website Renewables.ninja (2019) for the area of interest (latitude equal to 41.84 • and longitude equal to −1.1695 • ).
Figures 9 and 10 present the TMY for ambient temperature and wind speed, respectively. Figure 11 shows the TMY of solar radiation for 0 • and 36 • tilt. The TMY of ambient temperature, wind speed, and solar radiation with 0 • tilt are used to evaluate the ampacity of each distribution feeder (subsection 3.2). The TMY of solar radiation for 36 • tilt is used to evaluate the PV power production; according to Jacobson and Jadhav (2018), this is the optimal tilt angle for the location under study. Figure 12 shows the structure of the DS, while Table 1 presents the corresponding information related to the conductor type, feeder length, and the location of the DG. The rated voltage was assumed to be 12.47 kV operating at 60 Hz; the tolerance of PPF analysis was 0.001, the maximum number of iterations was 16, and the general power factor was 90%.  The substation capacity was 4000 kVA. All of the conductors of the DS were assumed to be an all-aluminum conductor type, while the structure of each feeder was assumed to be built considering G MD (n) = 1.26 m. The types of nodes considered were load, generation, and compensation. The load nodes are power consumption with constant power factor, while the compensation nodes are places where the decision-maker would install RPC equipment (candidate nodes). Compensation and generation nodes are voltage-controlled nodes to maintain the voltage at PCC to 12.47 kV. Table 2 presents the data and characteristics of the PV generators used to estimate the power production of the farms connected to nodes 18 and 49. Table 3 shows the table of candidate nodes (Fig. 5) for the system under analysis. Table 1 was used to build this table, considering the node type. This table presents the relationship between GA individuals and the candidate nodes available for the installation of the RPC devices. According to the number of rows in this table, the number of bits of each individual is set to 20 (B = 20), obtaining 2 20 = 1048 576 possible combinations for the installation of the RPC.
The binary-coded GA with the objective function described in subsection 4.2 was implemented considering 100 generations, 50 individuals in the population (G = 50), a crossover rate of 95%, and a mutation rate of 5%. The maximum value of electricity prices considered is 110€/MWh (E max = 110€/MWh). The calculated NPC assumes a discount rate of 5% and an inflation rate of 3.5%.
The example considered a project lifetime of 25 years. Capital costs for fixed and variable RPC equipment were assumed to be 50€/kVA and 500€/kVA, respectively, and their lifetime was 15 years. The simulation has been performed in MATLAB R using a computer with an i7-3630QM CPU at 2.40 GHz with 8 GB of memory and 64-bit operating system. Table 4 presents the rated values of the fixed [Q F xd R P C(n) ] and variable [Q Var R P C(n) ] compensation devices. These values were estimated considering a significance level of 5% (δ = 0.05).

Optimization results
The computational time spent was 58 hours, approximately. The iterative process needed to estimate the reactive power to be injected or drawn at each compensation node and time step requires a considerable amount of computational resources. Nevertheless, the proposed approach can obtain a reliable solution with a probabilistic basis. Figure 13 describes the behavior of the optimization algorithm, which converges to the values previously shown in Table 4. Figures 14 and 15 present the PDF and CDF of the RPC, respectively. The values of reactive power are negative, which means that reactive power is injected into the system most of the time. As shown in Fig. 14, a continuous provision of reactive power is required. This amount of reactive power (463.39 kVAr) is supposed to be supplied by a device similar to an SCB. On the other hand, the hourly fluctuations of the reactive power, mainly related to the diurnal variations of load demand, are supposed to be provided by a device with dynamic capabilities, similar to a DSTATCOM of 1464.69 kVAr. Table 5 presents the probability constraint equations (38) and (39) considering a range of ±5% of the rated voltage of the system (U mi n = 95% and U max = 105%). As can be observed, all of the probabilities of fulfilling the constraint equation (38) are higher than 95% (with a significance level of 5%) when the RPC devices are installed. Initially, the DS has problems at those nodes located far from the substation. Nodes 62, 64, and 65 have low probabilities of fulfilling the required operating conditions. These nodes have probabilities of 94.908, 90.325, and 87.594%, respectively (i.e. all of them are lower than 95%, with a significance level of 5%).
However, when the decision-maker evaluates the installation of RPC in node 8, the corresponding condition becomes valid for all the nodes of the system. On the other hand, the probability of overloading remains negligible because the wind speed is higher than zero most of the time, which increases the ampacity of each distribution feeder.

Proposed versus traditional GA implementation
This subsection presents a comparison between the proposed and the traditional GA implementation. Complete details related to the implementation of a traditional GA are given in Appendix A. Table 6 presents the rated capacity of the RPC at each node of the system, while Table 7 shows a comparative analysis between the total RPC to be added to the system according to the method implemented.
From Tables 4 and 6, the proposed methodology only suggests the integration of the RPC in specific nodes, while the traditional methodology suggests the integration of compensation Downloaded from https://academic.oup.com/jcde/article-abstract/7/2/177/5815390 by Universidad de Zaragoza user on 22 May 2020    devices spread over the system. On the other hand, the traditional method suggests the incorporation of less compensation capacity than the proposed technique. At this point, the most important differences between both techniques influence the results. In the proposed methodology, adding or subtracting reactive power fixes the voltage of a determined node of the system at the rated value. Thus, this action impacts the behavior of the rest of the system. In the traditional approach, the addition or subtraction of reactive power reduces the energy losses of the system, maintaining the voltages within a determined operating interval, but not necessarily at the rated value all the time.
On the one hand, the proposed approach considers the local effects of reactive power injection of DG at the PCC, and the interaction between the load demand and DG. On the other hand, the traditional method considers the DG's effects in a centralized manner, using the estimation of the net load. In a general sense, the estimation of the net load results in the analysis of a DS with lower demand than in the case of the proposed approach, resulting in a lower estimation of the RPC and a difference of 24.9%.
Nevertheless, the formulation of both methods has necessary implications for the structure of the optimization problem. The proposed technique depends only on the number of candidate nodes, while the traditional formulation depends on several aspects; these aspects are the number of blocks of load duration curve (LDC), the number of candidate nodes, and the integer magnitudes of reactive power.

Conclusions
In this paper, a methodology for the optimal placement and sizing of RPC devices in a DS with PV generation has been presented and tested through an extensive computational analysis.
The proposed methodology reduces the computational complexity when compared to the traditional GA implementation based on the discretization of the LDC, because it considers only binary variables. However, the evaluation of each GA individual of the proposed approach requires the solution of PPF, which means solving 8760 deterministic power flow calculations. Conversely, the evaluation of a GA individual of the traditional implementation depends on the number of discretization intervals used to build the LDC. Consequently, the proposed approach requires 26.8% more computational time than the classic version, according to the analyzed case. This issue could be overcome by implementing parallel computing solutions or by reducing the number of data points of input time series using a clustering algorithm. Another significant problem of the proposed approach is its scalability because the computational time also increases with the number of DS nodes. Although the proposed technique requires considerable computational resources, it can provide a reliable solution with a probabilistic basis. In other words, traditional GA implementation offers a general approximation of a required RPC; this method calculates it from the discretized LDC (Table A1 of the  appendix). Relevant characteristics, such as autocorrelation and daily profile behavior, are not considered. Conversely, the proposed approach considers the hourly interaction between the load demand (Fig. 8) and renewable resources (Figs 9-11), resulting in a 24.9% increase of the RPC's capacity, when compared with the classical approach. Additionally, PPF considers the connection node of the RPC on the DS in each evaluation of GA individuals. These features offer higher accuracy and a solution Downloaded from https://academic.oup.com/jcde/article-abstract/7/2/177/5815390 by Universidad de Zaragoza user on 22 May 2020 with high robustness because the PDF of the RPC is calculated (Figs 14 and 15).
Another disadvantage relies on the quality of the time series simulation method of environmental variables and electricity demand because the proposed approach is highly dependent on this data. In this regard, recently developed approaches related to big data analytics applied to the SG can be useful to estimate the time series of the load and the environmental variables of interest. Besides, a more in-depth study on the relationship between the behavior of load demand and electricity prices (equation 6) should be performed to increase the feasibility of the NPC estimation.

Supplementary Data
Supplementary data are available at JCDENG online.

A.1 Traditional GA implementation
The technique proposed by Das (2002) was considered a benchmark for comparison. Traditional implementation consists of building the system's LDC, as shown in Fig. A1. Then, the traditional method discretizes this curve into l = 1, . . . , L blocks considering the corresponding relationship between the load demand [ P (l) ] and time [ t (l) ].
Calculations have been carried out by considering the net load of the system, while the power injection at each generation node was zero.   objective (equation A.1) is related to the energy losses on the DS and the investment in RPC devices, while the constraint (equation A.2) is related to the voltage profile improvements for each operating condition (l = 1, . . . , L ). The energy cost (K e ) is the mean value of the price signal [E (t) ], while the acquisition cost (K c ) is the value mentioned above 500€/kVA.  Figure A2 shows the structure of a typical individual of this implementation, which uses integers to specify a determined type and size of RPC. The variable C AP (l, b) characterizes each element of the GA individual; this variable depends on the block number and the installation node. C AP (l, b) represents the magnitude of the reactive power injected or drawn from the system during the analysis of the load condition, which corresponds to the block l at the installation node b. Figure A3 presents the LDC and the corresponding discretization.
The traditional GA was implemented considering 800 generations, a population size of 7500 individuals, a crossover rate of 95%, and a mutation rate of 5%. The addition or subtraction of reactive power was evaluated using integer magnitudes. It means that the sizes associated with the variable C AP (l, b) ∀ l = 1, . . . , 12; b = 1, . . . 20 were injections or extractions of reactive power using integers in the step of 1 kVAr. For our case study, C AP (l, b) ∈ [−114, 0]. Figure A4 shows the convergence of the traditional GA for the case under study. Table A1 shows the results of the optimization process. It reports the amount of reactive power to be injected per candidate node (b = 1, . . . ,20) and discretization block (l = 1, . . . ,12). As expected, the highest injection of reactive power occurs at the highest load demand, which defines the capacity of the equipment according to this method.