Uncertainty analysis of numerical inversions of temperature logs from boreholes under injection conditions

Conventional methods to estimate the static formation temperature (SFT) require borehole temperature data measured during thermal recovery periods. This can be both economically and technically prohibitive under real operational conditions, especially for high-temperature boreholes. This study investigates the use of temperature logs obtained under injection conditions to determine SFT through inverse modelling. An adaptive sampling approach based on machine-learning techniques is applied to explore the model space efficiently by iteratively proposing samples based on the results of previous runs. Synthetic case studies are conducted with rigorous evaluation of factors affecting the quality of SFT estimates for deep hot wells. The results show that using temperature data measured at higher flow rates or after longer injection times could lead to less-reliable results. Furthermore, the estimation error exhibits an almost linear dependency on the standard error of the measured borehole temperatures. In addition, potential flow loss zones in the borehole would lead to increased uncertainties in the SFT estimates. Consequently, any prior knowledge about the amount of flow loss could improve the estimation accuracy considerably. For formations with thermal gradients varying with depth, prior information on the depth of the gradient change is necessary to avoid spurious results. The inversion scheme presented is demonstrated as an efficient tool for quantifying uncertainty in the interpretation of borehole data. Although only temperature data are considered in this work, other types of data such as flow and transport measurements can also be included in this method for geophysical and rock physics studies.


Introduction
The undisturbed or static formation temperature (SFT) is a key objective in the analysis of borehole measurements. It is a particularly important parameter in the exploration and exploitation of geothermal and hydrocarbon resources, as it reveals thermal reserves (Prensky 1992), affects the transport properties of hydrocarbons (Kutasov & Eppelbaum 2010) and determines the drilling operation and production parameters in geothermal and oil reservoirs (Bu et al. 2012). Over the past decades, temperature surveys in geothermal and petroleum wells have been widely used to derive the SFT (Roux et al. 1980; Espinosa-Paredes & Garcia-Gutierrez 2003; Bassam et al. 2010). Most of these methods are based on analytical models that extrapolate borehole temperature build-up data recorded under shut-in conditions (i.e., in a static water column) after a thermal perturbation period (the drilling process). More advanced approaches have also been developed, such as applying neural networks to synthetic and field thermal recovery data (Bassam et al. 2010; Wong-Loya et al. 2012). To date, however, the challenges of these numerical approaches have hardly been overcome, owing to unrealistic assumptions about the drilling process, neglected measurement errors, etc. (Aabø & Hermanrud 2019).
On the operational side, acquiring temperature data over a relatively long thermal recovery period (hours up to several days) can be difficult in practice, especially in high-temperature boreholes. Technical challenges arise because conventional tools have an upper operating temperature limit of ∼300 °C. Recent developments include high-temperature measuring instruments for temperatures above 350 °C (Ásmundsson et al. 2014; Friðleifsson et al. 2020; Okamoto et al. 2019). However, the endurance time of these logging tools in such harsh environments is limited to a few hours, which restricts the applicability of the above correction methods, since they rely on shut-in temperature data.
On the other hand, dynamic temperature logs acquired under flow conditions can also provide valuable information about the borehole and its surrounding formation. Barton et al. (1995) and Steingrimsson (2013) analysed thermal logging data to determine the location of feed/loss zones, their relative sizes and the associated flow rates by detecting 'kicks' in the temperature profiles. Patterson et al. (2017) used snapshots of the temperature profile at discrete times to infer the rate of wellbore heat gain/loss as well as the evolution of reservoir temperature under normal borehole operating conditions. Drakeley et al. (2006) and Wang et al. (2010) applied fiber-optic Distributed Temperature Sensing (DTS) to monitor downhole temperatures in real time with high temporal and spatial resolution. Following the pioneering work of Nowak (1953), who diagnosed zonal flow contributions in the borehole from temperature data, a few studies have used temperature profiles to derive flow measurements (Kabir et al. 2012; Reges et al. 2016; Silva et al. 2019). However, to the authors' knowledge, the SFT has not yet been evaluated from temperature measurements obtained under dynamic conditions (i.e., during the drilling process or flow injection).
In this study, we apply an inverse modelling approach to analyse the uncertainty in the interpretation of dynamic temperature logs for SFT determination. Specifically, the inversion scheme involves reduced-order modelling, which has proved to be a promising method for solving non-linear inverse problems in recent years (Mirghani et al. 2012; Chen et al. 2017; Schulte et al. 2020). A reduced-order model (ROM), also known as a surrogate model, can be viewed as a regression over a set of input-output data obtained from a high-fidelity code. It is often used to replace the complex original physical model in order to accelerate computation and to search the model space of an inverse problem more efficiently (Zhang et al. 2020). A variety of techniques have been used to construct ROMs, such as polynomials (Oladyshkin et al. 2011) and kriging (Mo et al. 2019), while other studies consider machine-learning (also referred to as data-driven) methods, including support vector machines (Jhong et al. 2017), artificial neural networks (Sudakov et al. 2019) and quantum-enhanced deep learning (Liu et al. 2021), to name a few.
We adopt a simple, non-parametric, supervised machine-learning model called K-Nearest Neighbor (KNN, see Section 2.2 for more details) to create a ROM. The ROM is then integrated into the inversion process to propose sampling points in each iteration. A detailed description of the workflow can be found in Section 2. In Sections 3 and 4, the inversion procedure is applied to different case studies to examine various factors that affect the accuracy of the determined SFT: the injection conditions, the quality of the data and aspects of the inverse modelling such as the prior information and the type of misfit function. Our study aims to contribute to a better understanding of these influencing factors and to present a method for quantifying the associated uncertainties. In doing so, it also demonstrates the capability of the data-driven surrogate modelling approach to solve inverse problems using borehole temperature data, which has hardly been investigated in this context so far.

Methodology
To determine the parameters of interest, namely the SFT and later the flow loss, as well as their uncertainties (deviations from the true values), a two-step procedure is applied. The first step consists of forward modelling that evaluates the temperature profile by simulating the advective heat transport within the borehole and the heat transfer between the borehole and the formation, using an in-house simulator developed on the MOOSE framework (Wang et al. 2019). The second step consists of parameter inversion using an adaptive sampling approach based on the ROM, which is driven by the RAVEN software (Alfonsi et al. 2016). Specifically, RAVEN provides different machine-learning algorithms to train a ROM via an Application Programming Interface (API) to the scikit-learn Python library (Pedregosa et al. 2011). It also couples natively with MOOSE-based applications, so that both steps can be performed within a single software tool. Such a framework also enables a large number of calculations to be distributed over multicore workstations and high-performance computing systems.

Forward thermal modelling
The forward simulation simplifies the thermal modelling procedure by assuming that the geometries of the borehole and formation are cylindrical, the fluid is incompressible and its flow direction in the borehole is only axial. Furthermore, the rock formation is considered impermeable and the thermal dissipation and expansion effects of the fluid are negligible.
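Given the above, heat transport in the borehole fluid is governed by both conduction and advection, which is typically expressed in cylindrical coordinates as follows (Yang et al. 2013):

$$\rho_f c_{p,f}\left(\frac{\partial T_f}{\partial t} + v_{r,f}\frac{\partial T_f}{\partial r} + v_{z,f}\frac{\partial T_f}{\partial z}\right) = \lambda_f\left[\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial T_f}{\partial r}\right) + \frac{\partial^2 T_f}{\partial z^2}\right] \qquad (1)$$

Assuming incompressible flow, the continuity equation is given by

$$\frac{1}{r}\frac{\partial \left(r\, v_{r,f}\right)}{\partial r} + \frac{\partial v_{z,f}}{\partial z} = 0 \qquad (2)$$

where ρ_f is the fluid density, c_p,f is the fluid specific heat capacity, λ_f is the fluid thermal conductivity, T_f is the fluid temperature, and v_z,f and v_r,f are the axial and radial flow velocities, respectively. v_z,f is calculated as Q/A, where Q is the flow rate and A is the borehole cross-sectional area.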
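Considering only heat conduction in the formation, the energy conservation equation can be written as

$$\rho_s c_{p,s}\frac{\partial T_s}{\partial t} = \lambda_s\left[\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial T_s}{\partial r}\right) + \frac{\partial^2 T_s}{\partial z^2}\right] \qquad (3)$$

where ρ_s, c_p,s and λ_s are the density, specific heat capacity and thermal conductivity of the formation, respectively, and T_s is the formation temperature.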
The thermal exchange between the borehole and the formation is modelled via a heat transfer relation at their interface:

$$q = h\left(T_s - T_f\right) \quad \text{on } \Gamma_{sf} \qquad (4)$$

where q is the heat flux, Γ_sf is the interfacial area between the fluid and the formation, and h is the heat transfer coefficient under forced convection. A detailed description of the calculation of h can be found in Wang et al. (2019).
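To make the roles of the flow rate Q, the heat transfer coefficient h and the borehole radius more tangible, the sketch below uses a much simpler stand-in for the borehole-formation model described above: a Ramey-type steady-state energy balance, ρ_f c_p,f Q dT/dz = 2πr h (T_s(z) − T). It assumes that the formation stays at its SFT (no formation cooling, hence no dependence on injection time), neglects axial conduction and uses a constant, hypothetical h. It is not the MOOSE-based simulator used in this study, and all function names are illustrative.

```python
import math

# Values from Section 3.1.1 except H_COEFF, which is an assumed constant.
RHO_F = 998.0    # fluid density, kg m^-3
CP_F = 4182.0    # fluid specific heat capacity, J kg^-1 K^-1
RADIUS = 0.11    # borehole radius, m
H_COEFF = 100.0  # ASSUMED heat transfer coefficient, W m^-2 K^-1 (hypothetical)


def relaxation_length(q_m3s, h=H_COEFF, r=RADIUS):
    """Depth scale ell = rho_f c_p,f Q / (2 pi r h) over which the injected
    fluid relaxes towards the formation temperature."""
    return RHO_F * CP_F * q_m3s / (2.0 * math.pi * r * h)


def borehole_temperature(z, t_surface, gradient, q_m3s, t_inj):
    """Steady-state fluid temperature (degC) at depth z (m) for a formation held
    at T_s(z) = t_surface + gradient * z, i.e. the solution of
    dT/dz = (T_s(z) - T) / ell with T(0) = t_inj."""
    ell = relaxation_length(q_m3s)
    t_s = t_surface + gradient * z
    return t_s - gradient * ell + (t_inj - t_surface + gradient * ell) * math.exp(-z / ell)


if __name__ == "__main__":
    g = (500.0 - 10.0) / 4500.0  # uniform gradient of Section 3.1.1, degC/m
    for q_ls in (25, 50, 100):
        t_bottom = borehole_temperature(4500.0, 10.0, g, q_ls / 1000.0, 10.0)
        print(f"Q = {q_ls:3d} L/s -> T(4500 m) = {t_bottom:6.1f} degC")
```

Running the script shows the bottom-hole fluid temperature decreasing as Q increases, because a larger Q lengthens the relaxation length; the absolute values depend entirely on the assumed h.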

Adaptive sampling based on the reduced-order model
The SFT and the flow rate are inverted (estimated) using an adaptive sampling approach. The results of previous simulations are used to build a surrogate model, which then suggests the most informative region of the model space for the next sampling step. In this way, the number of iterations required to solve the inverse problem is reduced compared with classical sampling methods such as Monte Carlo or Latin Hypercube Sampling (Mandelli et al. 2015).
In this study, the ROM is built using KNN (Runarsson 2004), perhaps the simplest and most transparent surrogate model. KNN is non-parametric and requires no prior knowledge of the form of the mapping function; it is therefore free to learn any functional form from the training data (Russell & Norvig 2002). It is also easy to implement, since learning consists simply of storing the points evaluated with the high-fidelity model, and the trained model improves each time a point is added. KNN predicts a so-called label (defined, in our case, in equation 7) of a sampling point p from the labels of its k nearest neighbors:

$$C(p) = \frac{\sum_{j=1}^{k} w_j\, C(p_j)}{\sum_{j=1}^{k} w_j} \qquad (5)$$

where C(p_j) is the label associated with the jth nearest neighbor and k is the number of nearest neighbors. The weight of the jth nearest neighbor p_j for the evaluated point p is defined as w_j = 1/dist(p, p_j), where dist(p, p_j) is the Euclidean distance between p and p_j. The ROM is then used as a 'classifier' that predicts where further exploration of the model space should be directed in order to develop a Limit Surface (LS), i.e. the boundary between the positive and negative Boolean labels established according to a user-defined constraint criterion (Alfonsi et al. 2016). In our analysis, this criterion is constructed using the root mean square error (RMSE), which describes the discrepancy between the simulated and measured borehole temperatures:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(T_{\mathrm{sim},i} - T_{\mathrm{measure},i}\right)^2} \qquad (6)$$

where T_sim is the simulated temperature, T_measure is the measured temperature and m is the number of sampled temperatures along the depth. A decision function C(RMSE) is defined to recast the response of the system into binary form:

$$C(\mathrm{RMSE}) = \begin{cases} 1, & \mathrm{RMSE} \le \mathrm{RMSE}_{\mathrm{thres}} \\ -1, & \mathrm{RMSE} > \mathrm{RMSE}_{\mathrm{thres}} \end{cases} \qquad (7)$$

where RMSE_thres is the RMSE threshold value. In reality, the RMSE comes from two sources: the errors of (1) the measured temperatures and (2) the simulated temperatures T_sim from the forward model, both with respect to the true borehole temperatures.
The reason for choosing the RMSE as a criterion is that, owing to its arithmetic similarity, the RMSE can be considered analogous to the standard deviation of the measured data (assuming that the data are unbiased) (Meyer 2012). Moreover, it is the most commonly used metric of model prediction quality, which also makes it suitable for representing the second type of error source. In all case studies of this work, only one error source is included in the RMSE at a time in order to investigate its influence on the interpretation of the temperature log separately. We first focus on the measurement errors and later include aspects of forward modelling by considering incorrect model assumptions. Given that temperature logging instruments typically have an accuracy of ±1 °C (Förster 2001) and that measurement errors can increase further at higher temperatures (Sharma et al. 2021), we select RMSE_thres values between 0.5 °C and 2.0 °C as possibly acceptable fitting qualities between the model predictions and the measurements, thereby allowing for some error in either the data or the model predictions.
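A minimal plain-Python sketch of the three ingredients above (the RMSE misfit, the binary decision function and the inverse-distance-weighted KNN label) follows. It assumes that +1 marks an acceptable fit (RMSE ≤ RMSE_thres) and −1 a rejected one; it is not RAVEN's implementation, and the function names are hypothetical. In scikit-learn, `KNeighborsClassifier(n_neighbors=5, weights="distance")` performs a comparable inverse-distance-weighted vote.

```python
import math


def rmse(t_sim, t_measure):
    """Root mean square misfit over the m logged depths."""
    m = len(t_measure)
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(t_sim, t_measure)) / m)


def decision(rmse_value, rmse_thres):
    """Binary decision function (equation 7): +1 = acceptable fit, -1 = rejected."""
    return 1 if rmse_value <= rmse_thres else -1


def knn_label(p, samples, labels, k=5):
    """Inverse-distance-weighted average of the labels of the k nearest samples;
    its sign gives the predicted class of point p."""
    nearest = sorted((math.dist(p, s), c) for s, c in zip(samples, labels))[:k]
    # A point that coincides with a training sample simply takes its label.
    for d, c in nearest:
        if d == 0.0:
            return float(c)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * c for w, (_, c) in zip(weights, nearest)) / sum(weights)
```

For example, with samples at bottom-hole SFT values of 480, 495, 505, 520 and 540 °C labelled −1, +1, +1, −1, −1, `knn_label((500.0,), samples, labels, k=3)` returns a positive value (7/9), so 500 °C is classified as acceptable.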
In the context of this study, the inversion scheme is intended to find the boundary (LS) that partitions the model space, i.e. the SFT (one-dimensional) or the SFT and the flow loss (two-dimensional), according to whether the RMSE of the model predictions is smaller or larger than RMSE_thres. In summary, the generalised workflow used in this work consists of the following steps:
1. Initial sampling points in the model space are generated using a Monte Carlo forward sampling scheme for the model parameters, namely the SFT and the flow rate.
2. For each sampling point, borehole temperatures at the measured depths are computed using the borehole simulator.
3. The decision function, equation (7), is evaluated using the results from step 2.
4. The data pairs {model parameters, decision function value} are used to train the ROM with the KNN classification model.
5. The ROM is used to predict the values of the decision function at all discretisation nodes of the model space, and the LS is then determined from the change in the values of the decision function (i.e., the transition from −1 to 1).
6. Each point on the LS is assigned a score based on its distance from the sampling points already taken (the greater the distance, the higher the score) and the persistence of its predicted decision function value (the larger the number of times the prediction for that point has changed, the higher the score). The point with the highest score is added to the training samples.
7. The procedure is repeated from step 2 until convergence is achieved, i.e. (a) the LS does not change over a certain number of consecutive iterations (hereafter called the persistence step) and (b) the "volume" fraction of each cell in the entire discretised model space reaches a user-defined tolerance (referred to as the convergence confidence).
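The seven steps can be illustrated with a deliberately small one-dimensional analogue in which only the bottom-hole SFT is unknown. This is a sketch under stated assumptions, not RAVEN's limit-surface search: the forward model is a steady-state stand-in with an assumed relaxation length (formation held at its SFT, injection at 10 °C), step 6 scores LS points by distance only (the persistence-of-prediction part of the score is omitted), and only convergence criterion (a) is used. All names and settings are hypothetical.

```python
import math
import random

# Hypothetical stand-ins, not the study's simulator or RAVEN settings.
DEPTHS = [250.0 * i for i in range(1, 19)]  # 18 logged depths down to 4500 m
ELL = 3000.0                                # ASSUMED relaxation length, m


def forward(t_bottom):
    """Stand-in forward model: steady fluid temperature for injection at 10 degC
    into a formation whose SFT rises linearly from 10 degC to t_bottom at 4500 m."""
    g = (t_bottom - 10.0) / 4500.0
    return [10.0 + g * (z - ELL * (1.0 - math.exp(-z / ELL))) for z in DEPTHS]


def label(t_bottom, t_obs, thres):
    """Decision function: +1 if RMSE <= thres, else -1."""
    sim = forward(t_bottom)
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, t_obs)) / len(t_obs))
    return 1 if rmse <= thres else -1


def knn_sign(x, xs, cs, k=5):
    """Inverse-distance-weighted KNN classification in 1-D."""
    near = sorted((abs(x - xi), ci) for xi, ci in zip(xs, cs))[:k]
    if near[0][0] == 0.0:
        return near[0][1]
    return 1 if sum(c / d for d, c in near) >= 0.0 else -1


def limit_surface_search(t_obs, lo, hi, thres, n_init=5, n_grid=201,
                         persistence=5, max_iter=100, seed=0):
    rng = random.Random(seed)
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    xs = [rng.uniform(lo, hi) for _ in range(n_init)]       # step 1
    cs = [label(x, t_obs, thres) for x in xs]              # steps 2-3
    last_ls, unchanged = None, 0
    for _ in range(max_iter):
        pred = [knn_sign(x, xs, cs) for x in grid]         # steps 4-5
        ls = [0.5 * (grid[i] + grid[i + 1])
              for i in range(n_grid - 1) if pred[i] != pred[i + 1]]
        unchanged = unchanged + 1 if ls and ls == last_ls else 0
        if unchanged >= persistence:                       # step 7, criterion (a)
            break
        last_ls = ls
        # Step 6, simplified: take the LS point farthest from existing samples;
        # while no LS exists yet, explore the grid point farthest from them.
        candidates = ls if ls else grid
        x_new = max(candidates, key=lambda c: min(abs(c - xi) for xi in xs))
        xs.append(x_new)
        cs.append(label(x_new, t_obs, thres))
    accepted = [x for x, c in zip(grid, pred) if c == 1]
    bounds = (min(accepted), max(accepted)) if accepted else None
    return bounds, len(xs)
```

For instance, `limit_surface_search(forward(500.0), 450.0, 550.0, 1.0)` searches the prior interval 450-550 °C of Section 3.1.1 and should return an accepted interval that brackets the synthetic true value of 500 °C, together with the number of forward runs used.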
It is worth mentioning that the former criterion is necessary to prevent the search algorithm from focusing too much on one region of the LS while placing too few points in other zones, thereby leaving parts of the LS undiscovered. The latter criterion determines the accuracy of the predicted LS: the smaller the tolerance value, the finer the discretisation grid of the model domain and the more accurate the computed LS.

Uniform geothermal gradient
This section examines the factors that influence the accuracy of the SFT estimates, namely borehole operation parameters such as injection flow rate and injection duration, the quality of temperature measurements and the presence of a flow loss zone. For simplicity, the formation is assumed to have a constant geothermal gradient.

Estimation of the SFT alone.
Herein, a 2-D domain consisting of a borehole with a radius of 0.11 m and an impermeable formation with a vertical extent of 4500 m and a lateral extent of 50 m is simulated. The rock formation is assumed to have constant thermal properties (ρ_s = 2700 kg m⁻³; c_p,s = 800 J kg⁻¹ K⁻¹; λ_s = 2.5 W m⁻¹ K⁻¹). Figure 1a shows the initial and boundary conditions of the model. The SFT is assumed to increase linearly from 10 °C at the surface to 500 °C at 4500 m, and no flow loss occurs along the borehole. Water is injected from the wellhead into the borehole at a constant temperature (10 °C) and a constant flow rate. Borehole temperatures are then simulated with the forward-modelling approach described above, assuming constant water properties (ρ_f = 998 kg m⁻³; c_p,f = 4182 J kg⁻¹ K⁻¹; λ_f = 0.6 W m⁻¹ K⁻¹). Figure 1b shows synthetic temperature logs for different injection durations (3-12 hours) at a flow rate of 50 L s⁻¹, and figure 1c shows logs after six hours of injection at different flow rates (25-100 L s⁻¹).
The sensitivity of the SFT estimates to the dynamic injection conditions (injection time and flow rate) and to the chosen RMSE_thres value is analysed using the adaptive sampling approach described above. Note that RMSE_thres in this case only accounts for the measurement error. Since the SFT is a linear function of depth and its surface value (10 °C) is fixed, only the SFT at the bottom-hole needs to be solved for. The number of realisations required for adaptive sampling to converge usually depends on the complexity of the inverse problem (e.g., the number of predicted variables) and on the prior uncertainty (e.g., the RMSE_thres value and the model space of the variables). For all inversion scenarios presented in this section, the input bottom-hole SFT value is assumed to be uniformly distributed in the interval 450-550 °C, and the ROMs are trained with KNN using five nearest neighbors (see also table 1 for a summary of the relevant parameter values in this study). The total number of forward simulations evaluated for each model to reach convergence is about 100-200.
Four temperature logs obtained after different injection durations (3, 6, 9 and 12 hours) at 50 L s⁻¹ are inverted separately to estimate the bottom-hole SFT. According to figure 2a, the estimation errors of the SFT are ±11.5 °C (±2.3%), ±14.4 °C (±2.9%), ±16.2 °C (±3.2%) and ±17.5 °C (±3.5%), respectively. Figure 2b shows the inversion results using temperatures measured at different injection rates (25, 50, 75 and 100 L s⁻¹) after the same injection duration of 6 hours. The estimation error is lowest (±7.2 °C/±1.4%) when the flow rate is 25 L s⁻¹ and highest (±28.9 °C/±5.8%) when the flow rate is 100 L s⁻¹. These results are obtained for an RMSE_thres value of 1.0 °C. Figure 2c displays the results for the bottom-hole formation temperature obtained by inverting the temperature log recorded after 6 hours of injection at 50 L s⁻¹, considering RMSE_thres values between 0.5 °C and 2.0 °C (in steps of 0.5 °C). As expected, the estimation error increases (from ±7.2 °C to ±28.9 °C) as RMSE_thres increases from 0.5 °C to 2.0 °C.

Estimation of the SFT and the flow loss.
The loss of circulation fluid is commonly encountered in drilled boreholes owing to faulted or fractured formations (Allahvirdizadeh 2020). To account for such a case, a loss zone at 3500 m is added to the model shown in figure 1a. It is assumed that, owing to the loss, the flow rate drops to 25 L s⁻¹ below 3500 m. As can be seen in figure 3, every temperature profile, regardless of the measurement time, shows a marked increase in the temperature gradient below the loss zone. In the following, the bottom-hole SFT and the remaining flow rate below the loss zone are jointly estimated using the temperature log obtained six hours after the start of injection. The dependence of the results on the accuracy of the temperature measurements is again analysed by using four different RMSE_thres values in the inversion procedure. The prior distributions of the bottom-hole SFT and the remaining flow rate span 400-600 °C and 0-50 L s⁻¹, respectively. The number of steps required for the four models to converge is around 1000-1500. As can be seen from figure 4, the errors of both the SFT and the flow rate estimates increase as RMSE_thres becomes larger. For instance, if RMSE_thres is 0.5 °C, the maximum estimation error is ∼10 °C (2%) for the bottom-hole SFT and ∼1.5 L s⁻¹ (3%) for the flow rate below 3500 m. However, when RMSE_thres rises to 2.0 °C, the maximum estimation error becomes ∼48 °C (9.6%) for the SFT and ∼6.5 L s⁻¹ (26%) for the flow rate. Moreover, the elliptical shape of the contour lines indicates a positive correlation between the bottom-hole SFT and the remaining flow rate. Furthermore, the uncertainty of the SFT estimate increases when a flow loss zone is present: compared with the inversion results for the case in Section 3.1.1, where no loss occurs, the maximum error of the SFT estimate increases by 2.8 °C for RMSE_thres = 0.5 °C and by 19.1 °C for RMSE_thres = 2.0 °C.

Two-layer model with non-uniform geothermal gradient
Herein, we extend the above study by considering a formation (hereafter referred to as formation F1) consisting of two layers with different geothermal gradients. The purpose is to investigate the influence of different prior assumptions about the geothermal gradient (i.e., the second type of error source contributing to the RMSE, as discussed in Section 2.2) on the prediction of the SFT and the flow rate. This study is inspired by the deep well RN-15/IDDP-2 in Reykjanes, Iceland. The well was drilled by deepening an existing 2500 m deep well (RN-15) to 4500 m. During drilling, a major flow loss was encountered at around 3500 m. The high-temperature environment around the well was confirmed by measured temperatures of up to 426 °C (Friðleifsson et al. 2020). The SFT profile from the surface to a depth of 2500 m has been calculated directly from thermal recovery temperature data of the well. However, the determination of the SFT below 2500 m has been an issue of much interest. As cold fluid was continuously injected during drilling to cool the casing and the formation (Peter-Borie et al. 2018), only temperature measurements under injection conditions are available to assess the formation temperature. For F1, the true formation temperature is assumed to increase from the surface with a constant gradient of 0.096 °C m⁻¹ to 298 °C at 3000 m and then with a gradient of 0.135 °C m⁻¹ until it reaches 500 °C at 4500 m (figure 5, SFT_F1). Figure 5 (blue line) also shows the borehole temperature profile after 6 hours of injection at a rate of 50 L s⁻¹. Notably, the local flow loss of 25 L s⁻¹ at 3500 m leads to a dramatic increase in the borehole temperature gradient (figure 5, red dashed line), whereas the increase in the SFT gradient below 3000 m has no obvious effect on the local fluid temperature gradient.
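As a quick arithmetic check of the two-layer profile SFT_F1, the sketch below evaluates it piecewise. It assumes a surface temperature of 10 °C, as in Section 3.1.1 (the value implied by reaching 298 °C at 3000 m with 0.096 °C m⁻¹); the function name is hypothetical. With the stated gradients the profile reaches 500.5 °C at 4500 m, i.e. the 500 °C quoted above after rounding.

```python
def sft_f1(z_m):
    """True SFT (degC) of formation F1 at depth z_m (m), assuming 10 degC at the
    surface, 0.096 degC/m down to 3000 m and 0.135 degC/m below."""
    if z_m <= 3000.0:
        return 10.0 + 0.096 * z_m
    return 10.0 + 0.096 * 3000.0 + 0.135 * (z_m - 3000.0)


if __name__ == "__main__":
    for z in (2500.0, 3000.0, 3500.0, 4500.0):
        print(f"z = {z:6.0f} m -> SFT = {sft_f1(z):6.1f} degC")
```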
In the following scenarios, it is assumed that the SFT in the upper 2500 m is already known. For the geothermal gradient below 2500 m, however, different assumptions are made. One model (F1A1) assumes a constant thermal gradient from the surface to 3000 m (consistent with the truth) and a possibly different gradient below 3000 m. Therefore, the SFT can be linearly extrapolated from 2500 m to 3000 m but remains unknown in the second layer. In another model (F1A2), a linear SFT profile is assumed for the entire depth interval between 2500 m and 4500 m. By comparing F1A2 with F1A1, we address the question: without knowing how the geothermal gradient varies within the depth interval of interest, what impact would the assumption of a uniform geothermal gradient (a commonly adopted simplification in […]) have? For both models, the flow rate below 3500 m is a prediction variable. The additional inversion parameter is the SFT in the depth interval 3000-4500 m for F1A1 and in the depth interval 2500-4500 m for F1A2. Again, assuming a constant geothermal gradient within each layer, only the SFT at the bottom depth (4500 m) needs to be solved for in both cases. The thermal gradient is assumed to lie between 0 and 0.3 °C m⁻¹ (Bahlburg & Breitkreuz 2018). Accordingly, the explored SFT values at 4500 m for F1A1 and F1A2 are 298-748 °C and 250-850 °C, respectively. The flow rate below 3500 m is assumed to be uniformly distributed over the interval [0, 50] L s⁻¹. The total number of forward simulations performed is ∼3800 for model F1A1 and ∼2700 for model F1A2. Figure 6 shows the contours of RMSE = RMSE_thres = 1.0 °C for models F1A1 and F1A2 in the explored space of the bottom-hole SFT and the flow rate. For model F1A1, both the SFT and the flow rate are poorly estimated: the acceptable SFT at 4500 m covers the entire allowed range, 298-748 °C, and the flow rate can vary between 16 L s⁻¹ and 38 L s⁻¹.
Nonetheless, there is still a strong correlation between the flow rate value and the associated SFT value. On the other hand, both variables seem to be better constrained in F1A2 than in F1A1, although the variability of the inverted values is still quite high.
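The prior ranges of the bottom-hole SFT for F1A1 and F1A2 follow directly from the assumed gradient bounds of 0-0.3 °C m⁻¹ applied from the deepest depth at which the SFT is taken as known. A short sketch of that arithmetic (function name hypothetical):

```python
def bottom_sft_prior(t_top, z_top, z_bottom=4500.0, grad_min=0.0, grad_max=0.3):
    """Range of bottom-hole SFT values (degC) implied by a linear SFT between z_top
    and z_bottom whose gradient lies in [grad_min, grad_max] degC/m."""
    span = z_bottom - z_top
    return t_top + grad_min * span, t_top + grad_max * span


f1a1 = bottom_sft_prior(298.0, 3000.0)  # SFT treated as known down to 3000 m
f1a2 = bottom_sft_prior(250.0, 2500.0)  # SFT treated as known down to 2500 m
```

This reproduces the explored intervals of 298-748 °C for F1A1 and 250-850 °C for F1A2; the wider interval for F1A2 simply reflects the longer depth span over which the gradient is unknown.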

Impact of the injection and logging conditions
As can be seen from figure 2, the accuracy of the SFT estimate obtained from dynamic temperature logs is highly dependent on both the flow rate and the duration of injection before the temperature recording. For the same injection duration before logging, the estimation error increases with the injection rate. Similarly, at the same injection rate, longer injection durations lead to less accurate estimates. Therefore, determining the SFT from dynamic temperature logs requires careful selection of these logs, and the inversion scheme can also be used to identify suitable logs. For example, for the case investigated in Section 3.1.1, the time at which the log is acquired needs to be restricted according to the injection rate and the desired accuracy of the SFT estimate. To achieve an accuracy of ±10 °C, temperatures measured within 12 hours of injection can be accepted for an injection rate of 25 L s⁻¹, whereas for an injection rate of 50 L s⁻¹, only logs recorded within the first 2 hours after the start of injection can be used (figure 7). However, it should be noted that our discussion is based only on stable injection conditions (i.e., constant injection rates). In practice, if several temperature profiles are measured, the injection rates and the respective injection durations before these logs are obtained may differ considerably. Since both the flow rate and the injection duration affect the quality of the data interpretation, it may be necessary in such situations to perform independent inversions with several temperature logs and to cross-compare the results.

Figure 7. Maximum (circles) and minimum (squares) values of the estimated bottom-hole SFT using temperature logs obtained at injection rates of 25 L s⁻¹ (blue line) and 50 L s⁻¹ (red line) after different injection durations (1, 2, 3, 4, 6, 9 and 12 hours), considering RMSE_thres = 1.0 °C (for the model described in Section 3.1.1).
Herein, the inversion study was performed only on instantaneous depth-temperature profiles; in other words, we assumed that temperatures were recorded simultaneously at all sampling depths. As mentioned in the Introduction, this type of temperature log can be acquired with DTS. In contrast, conventional logging methods such as wireline logging typically run a temperature sensor in and out of the borehole and record the temperature at each depth in turn. Since the sensor requires some time to reach thermal equilibrium with the surrounding fluid, the logging speed must be limited to obtain accurate temperature data (Sharma et al. 2021). At a typical logging speed of 10-15 m min⁻¹ (Prensky 1992), logging a 4500 m deep borehole would take 5-7.5 hours. Our study suggests that such a time span would cause depth-dependent errors in the SFT estimates, since different depths have been exposed to the thermal perturbation for different lengths of time when their temperatures are recorded. Namely, the later the temperature is recorded at a given depth, the higher the uncertainty in the SFT estimate at that depth (assuming a constant injection rate).
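The logging-time argument can be made explicit with a few lines of arithmetic. The sketch assumes a single downward wireline run at constant speed that starts at a hypothetical time after the start of injection; it returns the injection time elapsed when each depth is logged, which is the exposure time that drives the depth-dependent uncertainty described above. Function names are illustrative.

```python
def logging_time_hours(depth_m, speed_m_per_min):
    """Time (h) for a wireline tool to traverse depth_m at constant speed."""
    return depth_m / speed_m_per_min / 60.0


def injection_time_at_depth(z_m, t_start_h, speed_m_per_min):
    """Injection time (h) elapsed when a tool that starts logging downwards at
    t_start_h (hours after the start of injection) reaches depth z_m."""
    return t_start_h + logging_time_hours(z_m, speed_m_per_min)


if __name__ == "__main__":
    for v in (10.0, 15.0):
        print(v, logging_time_hours(4500.0, v))  # 7.5 h and 5.0 h
```

For example, if logging hypothetically starts 6 hours into injection at 15 m min⁻¹, the bottom of a 4500 m well is reached after 11 hours of injection, so the deepest readings correspond to a considerably longer injection time than the shallow ones.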

Impact of a flow loss zone
In the presence of flow losses along the borehole, the joint estimation of the SFT and the flow rate below the loss zone shows a clear increase in the uncertainty of the SFT prediction, as indicated in both figures 4 and 6. This behavior can be explained by the coupled effects of formation temperature and flow rate on the borehole temperature. Namely, an elevated borehole temperature due to a reduced flow rate (i.e., more time for heat exchange with the surrounding formation) can be compensated by a cooler formation temperature. Conversely, a cooler borehole temperature caused by a higher injection rate can be compensated by a hotter formation temperature. As a result, borehole temperature logs simulated with many different combinations of SFT and flow rate may fit the temperature-depth data similarly well.
For instance, figure 8 shows two temperature logs, referred to as L1 and L2, that both have an RMSE of 1.0 °C in model F1A1 (Section 3.2). Each log corresponds to an acceptable but extreme solution of this synthetic case (see figure 6). Compared with the true values of the bottom-hole SFT (500 °C) and of the flow rate below the loss zone at 3500 m (25 L s⁻¹), the values used to simulate L1 for the bottom-hole SFT (737.4 °C) and the flow rate (34.5 L s⁻¹) are much higher. Conversely, the other valid log, L2, is simulated with a significantly lower bottom-hole SFT (289 °C) and a smaller remaining flow rate (16.9 L s⁻¹). In fact, this compensation between formation temperature and flow rate will always limit the quality of the estimates as long as only temperature data are used to predict the flow rate and the SFT simultaneously.
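The compensation effect can be reproduced qualitatively with a single-zone steady-state stand-in (no loss zone, formation held at its SFT, injection at the surface SFT of 10 °C, hypothetical h = 100 W m⁻² K⁻¹). In this stand-in the fluid temperature is T0 + g f(z), which is linear in the gradient g, so the best-fitting bottom-hole SFT for an assumed flow rate has a closed-form least-squares solution. The sketch is illustrative only and is not the study's inversion.

```python
import math

DEPTHS = [250.0 * i for i in range(1, 19)]  # logged depths, m (hypothetical)
T0, Z_BOTTOM = 10.0, 4500.0                 # surface SFT (degC), bottom depth (m)
RHO_CP = 998.0 * 4182.0                     # water rho_f * c_p,f, J m^-3 K^-1
H_2PI_R = 2.0 * math.pi * 0.11 * 100.0      # 2 pi r h with ASSUMED h = 100 W m^-2 K^-1


def shape(q_m3s):
    """Depth shape f(z) such that the stand-in fluid temperature is T0 + g f(z)."""
    ell = RHO_CP * q_m3s / H_2PI_R
    return [z - ell * (1.0 - math.exp(-z / ell)) for z in DEPTHS]


def profile(t_bottom, q_m3s):
    """Stand-in borehole temperature log for a given bottom-hole SFT and flow rate."""
    g = (t_bottom - T0) / Z_BOTTOM
    return [T0 + g * f for f in shape(q_m3s)]


def best_fit_bottom_sft(t_obs, q_m3s):
    """Least-squares bottom-hole SFT for an assumed flow rate. The stand-in
    profile is linear in the gradient g, so the fit has a closed form."""
    f = shape(q_m3s)
    g = sum(fi * (ti - T0) for fi, ti in zip(f, t_obs)) / sum(fi * fi for fi in f)
    return T0 + g * Z_BOTTOM


if __name__ == "__main__":
    t_obs = profile(500.0, 0.025)  # synthetic "truth": 500 degC, 25 L/s
    for q_ls in (17.0, 25.0, 34.0):
        print(q_ls, round(best_fit_bottom_sft(t_obs, q_ls / 1000.0), 1))
```

Assuming a higher flow rate than the true one always pushes the best-fitting SFT above the true value, and a lower flow rate pushes it below, which is the positive SFT-flow-rate correlation seen in the contour plots.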

Impact of inversion constraints
The misfit function.
So far, the decision function (equation 7) has been defined based on the RMSE to account for the measurement error. However, as can be observed in figure 8, the acceptable temperature logs obtained when simultaneously estimating the SFT and the flow loss (model F1A1) can locally diverge from the true temperature log by up to 3 °C (especially near the loss zone and at the bottom depth), even though RMSE_thres is 1 °C. This may not appear satisfactory, but it is inherent in the initially selected misfit function based on the L2-norm metric (hereafter referred to as M1). A different evaluation metric (M2), namely the maximum absolute difference between the simulated and measured borehole temperatures,

$$\max_{i=1,\dots,m}\left|T_{\mathrm{sim},i} - T_{\mathrm{measure},i}\right|,$$

where m is the number of sampled temperatures along the depth, can therefore be adopted in the decision function instead. This new decision function is tested on model F1A1 to investigate its impact on the solutions for the SFT and the flow rate below 3500 m. The contour lines along which the RMSE (F1A1-M1) and the maximum absolute difference (F1A1-M2) are each equal to 1.0 °C are plotted in the model space in figure 9. The solution space in which the maximum absolute difference is less than 1.0 °C is indeed more confined than that in which the RMSE is less than 1.0 °C. Furthermore, the two extreme solutions of model F1A1 obtained with M1 (see figure 8) are excluded from the solution space when M2 is applied. It should be stressed, however, that applying such a measurement-wise criterion requires a careful assessment of the quality of each individual measurement. For example, if the error of a single measurement exceeds ±1 °C, imposing a criterion such as M2 with a 1 °C threshold can lead to biased estimates.
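The difference between M1 and M2 can be seen with a toy residual pattern: a single 3 °C local deviation among ten otherwise perfect readings passes an RMSE threshold of 1 °C but fails the same threshold under the maximum-absolute-difference metric. The sketch below uses hypothetical function names and synthetic numbers chosen only for illustration.

```python
import math


def misfit_m1(t_sim, t_measure):
    """M1: root mean square error (L2-norm based)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(t_sim, t_measure)) / len(t_measure))


def misfit_m2(t_sim, t_measure):
    """M2: maximum absolute residual (L-infinity norm)."""
    return max(abs(s - o) for s, o in zip(t_sim, t_measure))


measured = [100.0] * 10
simulated = [103.0] + [100.0] * 9  # one local 3 degC deviation
print(misfit_m1(simulated, measured), misfit_m2(simulated, measured))
```

Here M1 is sqrt(9/10) ≈ 0.95 °C, below the 1 °C threshold, while M2 is 3 °C. The same example shows the caveat raised above: if that single 3 °C residual were caused by a faulty measurement rather than a poor model, M2 would wrongly reject the parameter set.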

Prior information for the model space.
Prior information is another key factor that can contribute to the uncertainty of the inversion results, as it determines how well the presumed inversion model represents the unknown true model. In our context, knowledge about the variation of the geothermal gradient with depth must be provided for a meaningful interpretation of the temperature logs. In the present study, it is assumed that any change in the geothermal gradient is directly related to the layout of the geological layers. Therefore, the layer thicknesses and the depths of the layer boundaries serve as constraints in the estimation of the geothermal gradient (i.e., the SFT). The results of model F1A2 suggest that an incorrect assumption regarding the thickness of the geological layers introduces a bias that shifts the acceptable parameter region away from the true region. This is evidenced by comparing the locations of the black and red contours in figure 6 with regard to the position of the blue star. We additionally examined a different layer configuration, F2, in which the first layer extends to 4000 m (figure 10). Again, the inversion modelling is performed assuming a single geothermal gradient below 2500 m. The solutions, shown in figure 11, move even further away from the true values of the SFT and the flow rate. It is also worth mentioning that, in a hydrothermal system, fluid advection, convection or both can cause variations in the geothermal gradient that cannot be predicted by a conductive model (Schilling et al. 2013). In such settings, it is therefore recommended to use a comprehensive coupled thermal-hydraulic forward model, constrained by temperature measurements from boreholes, to predict the temperature distribution in the target area (Athens & Caers 2019).
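A minimal sketch of the piecewise linear SFT assumption, in which the layer boundaries are the only depths at which the gradient may change. The function name, surface temperature, gradients and boundary depths are hypothetical and not those of models F1 or F2; the example only shows how strongly the bottom-hole SFT depends on where the gradient change is placed.

```python
def sft_profile(z, t_surface, boundaries, gradients):
    """Piecewise linear static formation temperature (°C) at depth z (m).

    boundaries: sorted depths (m) of the layer interfaces, e.g. [2500.0]
    gradients:  geothermal gradient (°C/m) of each layer, top to bottom
    """
    if len(gradients) != len(boundaries) + 1:
        raise ValueError("need exactly one gradient per layer")
    tops = [0.0] + list(boundaries)
    bottoms = list(boundaries) + [float("inf")]
    t = t_surface
    for top, bottom, g in zip(tops, bottoms, gradients):
        if z <= top:
            break
        t += g * (min(z, bottom) - top)  # temperature increase within this layer
    return t


# Same (hypothetical) gradients, two different layer layouts
gradients = [0.05, 0.15]
for boundary in (2500.0, 4000.0):
    t_bh = sft_profile(5000.0, 10.0, [boundary], gradients)
    print(f"gradient change at {boundary:.0f} m -> SFT at 5000 m: {t_bh:.1f} °C")
# gradient change at 2500 m -> SFT at 5000 m: 510.0 °C
# gradient change at 4000 m -> SFT at 5000 m: 360.0 °C
```

In an inversion, the boundary depths enter as fixed prior constraints while the gradients are sampled, so a misplaced boundary forces the sampled gradients to absorb the error.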
Finally, including different types of prior information in the inverse modelling might also be necessary to narrow the boundaries of the model space, especially when dealing with large uncertainties in the joint estimation of the flow rate and the SFT. In practice, this can be done by combining the borehole temperature profiles with other types of measurement data, such as flowmeter logs (Molz et al. 1994), which provide information on the flow rate along the depth, or with geophysical data such as magnetotelluric and gravity surveys or resistivity logs (Hokstad & Tanavasuu-Milkeviciene 2017) and geothermometry data (Ystroem et al. 2020), which can impose additional constraints on the in-situ formation temperatures.

Conclusion
In this paper, a data-driven inversion method is applied to analyse the uncertainty in deriving static formation temperatures (SFT) from borehole temperature logs measured under injection conditions. Specifically, the inversion scheme sorts the simulated temperature logs from the forward-modelling step into two categories, 'passing' and 'failing', based on a user-defined misfit tolerance (e.g., the root mean squared error) between the predicted and true temperature values. A k-nearest-neighbour machine-learning model is then trained as a 'classifier' that proposes the most promising sampling points in the model space at each iteration until the boundary between the two categories is optimally predicted. Compared with deterministic optimisation methods, which search for a single optimal set of parameters, the applied method allows for the simultaneous inversion of all relevant model parameters, yielding all parameter combinations whose predictions satisfy the predefined data-fit criterion equally well.
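The adaptive sampling loop can be sketched as follows, under several stated assumptions: a two-parameter model space (bottom-hole SFT and flow rate), a hand-written k-nearest-neighbour vote instead of a library classifier, uncertainty sampling (candidates whose pass fraction is closest to 0.5) as our interpretation of "most promising", and a fixed number of iterations instead of a convergence criterion. `toy_misfit` is a hypothetical stand-in for running the forward model and computing the RMSE.

```python
import math
import random


def toy_misfit(sft, q):
    """Hypothetical stand-in for 'run the forward model and compute the RMSE' (°C).

    Its low-misfit region is an elongated band: a higher bottom-hole SFT can be
    offset by a higher flow rate, mimicking the thermal compensation effect.
    """
    return abs((sft - 500.0) - 20.0 * (q - 25.0)) / 20.0


def knn_pass_fraction(x, samples, labels, scales, k=5):
    """Fraction of the k nearest evaluated samples (scaled distance) that pass."""
    nearest = sorted(
        (math.hypot((x[0] - s[0]) / scales[0], (x[1] - s[1]) / scales[1]), lab)
        for s, lab in zip(samples, labels)
    )[:k]
    return sum(lab for _, lab in nearest) / len(nearest)


def adaptive_sampling(bounds, threshold=1.0, n_init=30, n_iter=15, batch=10,
                      n_candidates=200, k=5, seed=0):
    rng = random.Random(seed)
    scales = [hi - lo for lo, hi in bounds]

    def draw():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    samples = [draw() for _ in range(n_init)]
    labels = [toy_misfit(*s) < threshold for s in samples]
    for _ in range(n_iter):
        # Rank random candidates by how uncertain the kNN prediction is:
        # a pass fraction near 0.5 means the candidate lies near the boundary.
        candidates = [draw() for _ in range(n_candidates)]
        candidates.sort(
            key=lambda c: abs(knn_pass_fraction(c, samples, labels, scales, k) - 0.5))
        for c in candidates[:batch]:  # run the "forward model" on the best batch
            samples.append(c)
            labels.append(toy_misfit(*c) < threshold)
    return samples, labels


bounds = [(200.0, 800.0), (10.0, 40.0)]  # bottom-hole SFT (°C), flow rate (L/s)
samples, labels = adaptive_sampling(bounds)
passing = [s for s, ok in zip(samples, labels) if ok]
print(f"{len(samples)} forward runs, {len(passing)} passing")
if passing:
    sfts = [s[0] for s in passing]
    print(f"passing bottom-hole SFT range: {min(sfts):.0f} to {max(sfts):.0f} °C")
```

The spread of the passing samples is what serves as the uncertainty estimate; scaling each axis by its range keeps the neighbour distances comparable between °C and L s⁻¹.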
Our study showcases the application of the method in evaluating multiple factors that affect the accuracy of the SFT solutions. For example, the inversion result for the bottom-hole SFT deviates by ±2.9% (i.e., ±14.4 °C) from the true value when the interpreted temperature log is measured after 6 hours of injection at 50 L s⁻¹ with a standard error of 1.0 °C. More generally, we find that the quality of the prediction would improve with temperature data acquired at relatively lower injection rates or after shorter injection times and, naturally, with higher measurement accuracy.
Additional case studies indicate that flow losses along the borehole can lead to large uncertainties in the determination of the SFT due to the thermal compensation effect between the formation temperature and the flow rate. Hence, integrating prior information, e.g., from other types of measurements such as flowmeter logs or geothermometers, into the inversion modelling would help to reduce these uncertainties. Another option is to apply tighter constraints on the misfit between predictions and measurements. However, as with any misfit criterion, this choice should be justified, e.g., with regard to the quality of the acquired data.
The present study is based on the assumption that the SFT profile is piecewise linear, following the structure of the geological layers, which is most suitable for conductive geothermal systems. Under these conditions, our study shows that prior information on the thickness and depth of the geological layers is necessary to estimate the SFT. If oversimplified assumptions are made owing to a lack of such information, the search for solutions in the model space may be strongly biased in the wrong direction. On the other hand, in a hydrothermal convection system, where conductive heat flow can be disturbed by fluid movement in the formation, a piecewise linear SFT profile may not be applicable. Nevertheless, a coupled thermal-hydraulic forward model can still be incorporated into the current inversion scheme.
With this work, we demonstrate the promise of machine-learning techniques for the efficient inversion of borehole data, including uncertainty quantification. Besides the numerical setting of the problem, the performance of any inversion method also depends on the availability and quality of the input data. In this respect, the use of more sophisticated logging tools, such as distributed temperature sensing, to obtain spatially and temporally dense measurements is promising. Future work may involve integrating other types of data into the inversion process to reduce the uncertainty of the estimated parameters, or investigating different parameters in the context of other geophysical applications.