ANN-based ground motion model for Turkey using stochastic simulation of earthquakes

Turkey is characterized by a high level of seismic activity attributed to its complex tectonic structure. The country has a dense network to record earthquake ground motions; however, to study previous earthquakes and to account for potential future ones, ground motion simulations are required. Ground motion simulation techniques offer an alternative means of generating region-specific time-series data for locations with limited seismic networks or regions with seismic data gaps, facilitating the study of potential catastrophic earthquakes. In this research

Ground motion models (GMMs) provide estimations of different shaking intensities that characterize strong ground motions from prior knowledge of seismological parameters, such as moment magnitude (Mw), fault mechanism (FM), focal depth (FD), average shear wave velocity in the top 30 m (Vs30) and various source-to-site distance metrics. Because of their broad applications, GMMs are commonly employed in different fields, such as earthquake engineering and seismology. Moreover, GMMs represent a well-known tool for the prediction of ground shaking intensities, and therefore, their implementation is essential in the context of seismic hazard analyses. A large variety of GMMs have been developed in the past for the prediction of various intensity measures (IMs), including peak ground acceleration (PGA), peak ground velocity (PGV), peak ground displacement (PGD), spectral acceleration (SA) or pseudo-spectral acceleration (PSA) at different periods (Boore & Atkinson 2008; Akkar et al. 2014; Campbell & Bozorgnia 2014). Nevertheless, two major drawbacks might be pointed out regarding most GMMs currently available. First, most models adopt parametric formulations, which might induce bias in the prediction of IMs (Campbell & Bozorgnia 2012). Second, GMMs rely on the quality of the data set adopted to develop the model, which can be problematic for regions with moderate to high levels of hazard and a lack of recorded accelerograms characteristic of large-magnitude events (Gianniotis et al. 2014).
Over the last decade, extensive research on GMMs has been conducted involving parametric formulations and robust mathematical forms. Boore et al. (2014) provided prediction equations for computing medians and standard deviations of PGA, PGV and 5 per cent damped PSA for shallow crustal earthquakes using the NGA-West2 database (Ancheta et al. 2014). Similarly, GMMs were presented by Bindi et al. (2014) valid for Europe and the Middle East, with distances (i.e. Joyner-Boore, RJB; and hypocentral, Rhypo) less than 300 km, hypocentral depth up to 35 km and Mw range from 4 to 7.6. Kale et al. (2015) proposed a GMM for Turkey and Iran to investigate the potential regional effects on ground motion amplitudes from shallow active crustal earthquakes using a subset of the recently compiled strong-motion database of the Earthquake Model of the Middle East Region project (Şeşetyan et al. 2018). Bommer et al. (2016) considered the seismicity of the Netherlands to develop GMMs for spectral ordinates of moderate-to-large-magnitude earthquakes. Likewise, Idini et al. (2017) presented a GMM valid for the Chilean subduction zone. The study by Bozorgnia & Campbell (2016a, b) focused on the development of GMMs for vertical components of PGA, PGV and PSA. The model was claimed to be valid for worldwide shallow crustal earthquakes, various types of faulting, Mw from 3.3 to 8.5, and for fault rupture distances ranging from 0 to 300 km. More recently, Bindi et al. (2019) introduced a GMM for the prediction of acceleration and displacement spectral ordinates, including a country-to-country random effect (Italy, Turkey, Romania, Greece and others). Boore et al. (2021) derived a GMM for the horizontal components of PGA, PGV and 5 per cent damped PSA using a wide database of uniformly processed strong-motion data recorded in Greece.
Alternatively, the use of non-parametric GMMs has increased significantly in the last few years. Wiszniowski (2019) implemented Fahlman's Cascade Correlation neural network (Fahlman & Lebiere 1990) to generate an improved GMM for the prediction of peak horizontal acceleration as a function of distance metrics in different Mw ranges. Meenakshi et al. (2023) adopted artificial neural networks (ANNs) coupled with the genetic algorithm to develop GMMs in Peninsular India for maximum rotated (RotD50) components of PGA, PGV and 5 per cent damped PSA for periods between 0.01 and 3 s. Sreenath et al. (2023) adopted diverse machine learning models to develop a hybrid non-parametric GMM for shallow crustal earthquakes in Europe; the model was developed for a large number of seismic intensities (i.e. PGA, PGV, PGD, cumulative absolute velocity, Arias intensity and significant duration). Based on recorded ground motions in Turkey, Yerlikaya-Özkurt et al. (2014) derived a GMM for Turkey to predict PGA and PGV using the multivariate adaptive regression splines method.
In the meantime, other researchers have approached the lack of recorded accelerograms characteristic of large-magnitude events by adopting simulation techniques to reproduce synthetic motions. For instance, Ugurhan & Askan (2010) performed stochastic simulation based on the dynamic corner frequency approach proposed by Motazedian & Atkinson (2005) considering the Düzce (Turkey) earthquake that took place on 1999 November 12 (Mw = 7.1). Later, Ozmen et al. (2020) studied the same event with an updated simulation approach. Askan et al. (2013) investigated the sensitivity to seismic parameters of stochastic simulations using sparse data collected from the 1992 March 13 Erzincan earthquake in eastern Turkey (Mw = 6.6). The work of Karimzadeh & Askan (2018) focused on simulations of the historical 1939 Erzincan earthquake in Turkey through the dynamic corner frequency approach using regional seismological information computed from the 1992 earthquake that took place in the same region. Cheloni & Akinci (2020) performed stochastic finite-fault simulations to generate high-frequency synthetic motions for the Elazığ earthquake in Turkey (Mw = 6.8). A scenario earthquake ground motion data set was also developed for the Gaziantep region in Turkey, which was affected by the recent 2023 Kahramanmaraş events (Arslan Kelam et al. 2022). Similarly, stochastic simulation has been employed for simulating the records of past earthquakes, such as the 1998 July 9 Faial earthquake (Azores, Portugal) (Karimzadeh & Lourenço 2022); the 2002 February 3 Cay (Turkey) earthquake (Can et al. 2021); the 2009 April 6 L'Aquila earthquake (Ugurhan et al. 2012) and the 2016 Kumamoto (Japan) earthquake (Zhang et al. 2016). Subsequently, other investigations have addressed the validation of synthetic records from a seismological and engineering point of view (Zonno et al. 2010; Koboevic et al. 2011; Karimzadeh 2019; Karimzadeh et al. 2019, 2020; Fayaz et al. 2020; Karimzadeh et al. 2021a, b). In other studies, large suites of simulated motions have been employed for the development of GMMs valid for the prediction of PGA and spectral ordinates (Campbell 2003; Megawati et al. 2005; Withers et al. 2020; Raghucharan et al. 2021; Sreenath et al. 2023).
This paper introduces a novel approach to develop a non-parametric GMM using a database of stochastically simulated records through an ANN implementation in Python. The study focuses on Turkey as the chosen area, driven by its high seismic activity and the scarcity of large-magnitude events in different regions. The choice of Turkey was further motivated by the occurrence of the recent catastrophic events of 2023 February 6 in Gaziantep (Mw = 7.7) and Elbistan (Mw = 7.5). While a non-parametric GMM for estimating spectral ordinates in Turkey has been previously proposed using the XGBoost algorithm (Mohammadi et al. 2023), it has limitations in capturing information on large-magnitude events and lacks data set homogeneity. In contrast, the non-parametric GMM presented in this paper relies on a substantial data set of synthetic records generated through the stochastic finite-fault method, encompassing regions such as Afyon, Erzincan, Duzce, Istanbul and Van within Turkey. The developed model aims to predict ground motion IMs such as PGA and PGV, as well as PSA at various periods in the range 0.03-2.0 s. The effectiveness of the proposed GMM is verified by comparing the predicted values of the ground motion IMs with the observed values recorded during previous events in the selected regions. Additionally, the model's trend is assessed by comparing it with the real data set of Turkey, which includes the most recent events of 2023 February 6. Finally, the developed model is compared against the selected parametric GMM proposed by Kale et al. (2015).

GROUND MOTION DATA SET
This study utilizes the stochastic finite-fault ground motion simulation approach proposed by Motazedian & Atkinson (2005) to construct the ground motion data set for the GMM. The simulations are conducted for selected regions in Turkey characterized by high seismicity and diverse tectonic structures, including Afyon, Erzincan, Duzce, Istanbul and Van. Fig. 1 illustrates the tectonic map of Turkey with the convergence rates (Utkucu et al. 2003), highlighting the selected study areas represented by red rectangular boxes. These regions are specifically selected due to the absence of a unified set of recorded ground motions that encompass a wide range of magnitudes, source-to-site distances and site conditions corresponding to past earthquakes. Focusing on these areas addresses the need for a comprehensive data set that covers a broader spectrum of seismic events and associated ground motions. This section provides details on the methodology employed for ground motion simulation and information regarding the scenario earthquakes and the generated ground motion data set.

Ground motion simulation method
The generation of synthetic records based on the stochastic point source was originally introduced by Boore (1983), who combined the source spectrum of Aki (1967) and Brune (1970) with the findings of Hanks & McGuire (1981). Initially, the implementation of Boore (1983) relied on the deterministic far-field S-wave Fourier amplitude spectrum of acceleration, where random phase angles were incorporated to generate single horizontal acceleration time-series. Subsequently, Beresnev & Atkinson (1997) extended the point-source model of Boore (1983) to include finite-fault effects in simulations. This extension, known as the stochastic finite-fault method, involved discretising the fault plane into smaller subfaults. Each subfault is treated as a stochastic point source, and their contributions are summed in the time domain. However, a limitation of this method is the assumption of a constant corner frequency, which led to the dependence of the total radiated energy on the sizes of the subfaults. To address this limitation, a more recent version of the stochastic method introduced a dynamic corner frequency approach (Motazedian & Atkinson 2005) to model the high-frequency content of the shear wave portion of ground motion records. In this approach, the corner frequency at any given time is defined to be inversely proportional to the area of the subfaults that had ruptured up to that time (Motazedian & Atkinson 2005). Fig. 2 shows the schematic distribution of the wave front from a finite-fault source model.
This study employs the stochastic finite-fault method incorporating a dynamic corner frequency concept (Motazedian & Atkinson 2005) to simulate earthquake scenarios in the selected regions in Turkey. In this approach, the fault plane is represented by a collection of smaller subfaults, each of which is considered a stochastic point source (Boore 1983). The acceleration spectrum of each point source (ij) is expressed by eq. (1). In the given equations, various parameters are used to characterize the seismic phenomena: M0 represents the seismic moment measured in dyne·cm; R_ij denotes the distance from the observation point to the subfault indexed as ij; β signifies the crustal shear wave velocity, measured in km s−1; Q is the frequency-dependent quality factor; G(R) represents the geometric spreading as a function of source-to-site distance (R); S(f) represents the soil amplification function; κ (kappa) models the linear decay at higher frequencies of the Fourier amplitude spectrum of the S-wave portion of the acceleration records, represented in semi-logarithmic space; FS denotes the free surface amplification factor, typically assumed to be 2; PRTITN is a factor that reflects the partitioning of shear wave energy into two horizontal components, with an assumed value of generally 1/√2; ρ represents the crustal density measured in g cm−3; H_ij is a frequency-dependent scaling factor, specifically for high frequencies; and, finally, R_θφ denotes the radiation pattern constant, often considered as 0.55 for shear waves (Atkinson & Boore 1995). It is worth noting that recent investigations (Takemura et al. 2016; Kotha et al. 2019; Wang et al. 2021) underscore the relevance of frequency- and distance-dependent radiation pattern models. Nevertheless, this study uses a constant radiation pattern coefficient of 0.55, taken as representative of an average radiation pattern.
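The expressions referred to in this paragraph are not reproduced in the present text. As a reading aid, the form usually quoted for the dynamic corner frequency method of Motazedian & Atkinson (2005), written with the symbols defined above, is sketched below. The subfault moment M_{0ij}, the arrangement of the terms and the omission of unit-conversion constants are assumptions based on that general formulation and may differ in detail from the authors' own equations.

```latex
A_{ij}(f) = C \, M_{0ij} \, H_{ij} \,
  \frac{(2\pi f)^{2}}{1 + \left( f / f_{c,ij} \right)^{2}} \,
  \exp(-\pi f \kappa) \,
  \exp\!\left( -\frac{\pi f R_{ij}}{Q(f)\,\beta} \right)
  G(R_{ij}) \, S(f)

C = \frac{R_{\theta\phi} \cdot FS \cdot PRTITN}{4 \pi \rho \beta^{3}}
```

With FS = 2 and PRTITN = 1/\sqrt{2}, the numerator of C reduces to \sqrt{2}\,R_{\theta\phi}.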
The term fc_ij in eq. (1), which defines the corner frequency of a subfault, is a function of the following quantities: NR(t) represents the total count of subfaults that have ruptured by time t, Δσ denotes the stress drop and M0-ave indicates the average seismic moment associated with the fault.
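To make the role of NR(t) concrete, the following minimal Python sketch evaluates a dynamic corner frequency in the form commonly attributed to Motazedian & Atkinson (2005), fc_ij(t) = NR(t)^(-1/3) × 4.9e6 × β × (Δσ / M0-ave)^(1/3), with β in km s−1, Δσ in bars and moments in dyne·cm. The function names, the moment-magnitude conversion and the example values are illustrative assumptions, not the authors' implementation.

```python
def seismic_moment_dyne_cm(mw):
    """Seismic moment (dyne*cm) from moment magnitude (Hanks & Kanamori relation)."""
    return 10.0 ** (1.5 * mw + 16.05)


def dynamic_corner_frequency(n_ruptured, n_total, mw, stress_drop_bars, beta_km_s):
    """Corner frequency (Hz) of a subfault once n_ruptured subfaults have ruptured.

    Assumes M0-ave = M0 / n_total, i.e. the total moment shared equally
    among the subfaults (hypothetical helper, for illustration only).
    """
    if not 1 <= n_ruptured <= n_total:
        raise ValueError("n_ruptured must lie between 1 and n_total")
    m0_ave = seismic_moment_dyne_cm(mw) / n_total
    brune_term = 4.9e6 * beta_km_s * (stress_drop_bars / m0_ave) ** (1.0 / 3.0)
    return n_ruptured ** (-1.0 / 3.0) * brune_term


if __name__ == "__main__":
    # Hypothetical scenario: Mw 7.0, 100 bar stress drop, beta = 3.5 km/s, 100 subfaults
    for n in (1, 10, 50, 100):
        fc = dynamic_corner_frequency(n, 100, 7.0, 100.0, 3.5)
        print(f"N_R = {n:3d}  fc = {fc:.3f} Hz")
```

Under these assumptions the corner frequency decreases as the rupture grows and, once all subfaults have ruptured, equals the corner frequency of the entire fault, which is why the radiated energy no longer depends on the chosen subfault size.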
The deterministic acceleration spectrum described in eq. (1) is combined with random phases and converted into the time domain for each point source on the fault plane. The individual contributions from each subfault are then accumulated in the time domain to produce the total acceleration, where a_ij represents the acceleration time-series specific to the ijth subfault, while a(t) denotes the acceleration of the entire fault.
The terms nl and nw represent the number of subfaults considered along the length and width of the rectangular fault plane, respectively. T_ij corresponds to the ratio of the subfault radius to the rupture velocity and t_ij indicates the time delay between each subfault and the observation point.
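The time-domain summation described above can be sketched in a few lines of Python. The example assumes that each subfault series has already been simulated, that every subfault has a single non-negative delay in seconds (combining rupture propagation and travel time) and that all series share the same time step; the function name and the toy numbers are hypothetical.

```python
def sum_subfault_series(subfault_series, delays_s, dt):
    """Sum delayed subfault acceleration series into one record.

    subfault_series: list of lists, one acceleration series per subfault
    delays_s: non-negative delay (s) of each subfault relative to the record start
    dt: common time step (s)
    """
    if len(subfault_series) != len(delays_s):
        raise ValueError("one delay is required per subfault")
    if any(d < 0 for d in delays_s):
        raise ValueError("delays must be non-negative")
    shifts = [int(round(d / dt)) for d in delays_s]  # delays in samples
    n_out = max(len(s) + k for s, k in zip(subfault_series, shifts))
    total = [0.0] * n_out
    for series, k in zip(subfault_series, shifts):
        for n, value in enumerate(series):
            total[k + n] += value  # shift by the delay, then accumulate
    return total


if __name__ == "__main__":
    # Two toy subfaults: the second one arrives 0.02 s later
    record = sum_subfault_series([[1.0, 2.0], [10.0]], [0.0, 0.02], dt=0.01)
    print(record)
```

In a full simulation the list would hold nl × nw series, one for each subfault of the rectangular fault plane.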

Input-model parameters
The ground motion data set in this study encompasses a comprehensive set of time-series derived from simulations conducted in diverse regions of Turkey, including Afyon, Erzincan, Duzce, Istanbul and Van. The simulations cover a wide range of magnitudes from 5.0 to 7.5 with an interval of 0.5, in addition to the past events, including the 2002 Afyon (Mw = 6.6), 1992 Erzincan (Mw = 6.6), 1999 Duzce (Mw = 7.1) and 2011 Van (Mw = 7.1) earthquakes. The soil types are defined according to the study of Boore & Joyner (1997). By incorporating these soil types, the simulations account for the variability in ground response associated with different soil conditions prevalent within the study regions. Table 1 provides a summary of the input-model parameters utilized for simulations conducted in the selected regions. These parameters encompass source characteristics, path properties and site conditions, offering a comprehensive overview of the key factors considered in the simulations for each respective region. In addition, the table outlines the boundaries for each region in which evenly distributed nodes are chosen for simulations. The respective numbers of stations considered for Afyon, Erzincan, Duzce, Istanbul and Van are 324, 365, 90, 88 and 430.
It is important to note that the simulations conducted in this study have been rigorously verified and validated in earthquake engineering practice. The authors have previously employed these simulations in various studies encompassing different applications (Askan et al. 2015; Karimzadeh et al. 2017a, b, 2019, 2020, 2021; Karimzadeh & Askan 2018, 2021; Ozmen et al. 2020; Can et al. 2021; Kelam et al. 2022). This extensive practical application and validation serve to enhance the reliability and credibility of the simulation methodology used.

Simulation results
The simulations led to a total of 7358 acceleration time-series in Afyon, Duzce, Erzincan, Istanbul and Van. In summary, the data set contains scenarios with Mw ranging from 5.0 to 7.5 and RJB values up to 272 km. For Duzce simulations, the Mw values of the scenario events are 5.0, 5.5, 6.0, 6.5, 7.0, 7.1 and 7.5. For Erzincan, Mw values include 5.0, 5.5, 6.0, 6.5, 6.6, 7.0 and 7.5. For Istanbul, Mw values are 5.0, 5.5, 6.0, 6.5, 7.0 and 7.4, while for Afyon, the Mw values are 5.0, 5.5, 6.0, 6.5, 6.6 and 7.0. Finally, for the scenario events in Van, the Mw values are 5.0, 5.5, 6.0, 6.5, 7.0 and 7.1. For all regions, three distinct soil types characterized by Vs30 values of 520, 255 and 310 m s−1 are considered. These measures of shear wave velocity correspond to soil types C, D and generic soil, respectively (Boore & Joyner 1997). Furthermore, the histograms portrayed in Fig. 4 reveal the seismological features of the performed simulations. First, the presence of high-magnitude motions (i.e. Mw values from 7.0 up to 7.5) should be noted, which compensates for the lack of real recorded high-magnitude events. An even occurrence of simulations in the range 5.0 ≤ Mw ≤ 7.0 is also observed. In the case of distance metrics, the number of synthetic records reduces as the values of RJB increase. In the case of Vs30, the predominance of simulations for soil type C is clear, with fewer occurrences for generic soil and soil type D. The majority of simulations were performed for depth values less than or equal to 10 km, while the rest of them (approximately 20 per cent) were performed for depth values in the range 10 km < FD < 20 km. Finally, the distribution of PGA and PGV with respect to RJB for each FM is depicted in Fig. 5. Independently of the fault type, PGA and PGV show higher values for lower RJB and higher Mw. This behaviour is coherent with the actual distribution of PGA and PGV with respect to distance metrics and Mw values, which further validates the performance of the simulations.
Finally, Fig. 6 depicts samples of the simulated time-series for the regions under analysis, selected as representative examples of large-magnitude events with higher PGA values. Baseline correction and Butterworth filtering in the range of 0.1-25 Hz are applied for the post-processing of the signals. The earthquake time-series for Afyon represents a scenario of Mw 7.0, Vs30 of 255 m s−1 and RJB of 62.49 km with a resulting PGA of 395.55 cm s−2. The same soil conditions, Mw of 7.1 and RJB of 30.40 km, are assumed for Duzce, with an estimated PGA of 540.01 cm s−2. The Erzincan simulation is for a scenario event of Mw 7.5, with Vs30 of 520 m s−1 and RJB of 4.81 km, resulting in a PGA of 913.10 cm s−2. For Istanbul, Mw of 7.4, Vs30 of 255 m s−1 and RJB of 8.16 km are taken, resulting in a PGA of 746.45 cm s−2. Lastly, for Van, a PGA of 354.89 cm s−2 is computed by taking Mw, Vs30 and RJB values of 7.1, 310 m s−1 and 122.68 km, respectively.

GROUND MOTION MODELLING METHODOLOGY
The prevailing approach for predicting ground motion IMs, like PGA, PGV or PSA, is to employ GMMs. These models are typically developed using empirical methods that entail performing statistical regression analysis on extensive data sets of ground motion intensities (Bindi et al. 2014, 2019; Boore et al. 2014, 2021; Kale et al. 2015; Bommer et al. 2016; Bozorgnia & Campbell 2016a, b; Idini et al. 2017). Given the considerable variability or dispersion observed in the ground motion data for each IM, GMMs typically offer a probability distribution of potential ground motion results rather than a single deterministic value. In this study, the GMM to be developed has the form given in eq. (5), where ln(y_ij) is the natural logarithm of the IM of interest, herein PGA, PGV and PSA. The inter-event residual component is denoted as η_i and the intra-event residual component is denoted as ε_ij, both in the natural logarithm scale. The index i identifies the earthquake event and j the station. The functional form in eq. (5) is modelled using the ANN algorithm.
The two components of residuals in GMMs, namely inter-event and intra-event residuals, are assumed to be independent and follow a normal distribution with a mean of zero. The inter-event residual component has a standard deviation of τ, while the intra-event residual component has a standard deviation of σ. The total standard deviation for a given GMM is the square root of the sum of squares of the two components, that is, the square root of τ^2 + σ^2. The inter-event error for the ith earthquake event is described by an approximate equation: since the number of records in each event is rather large and n_i^2 is much larger than 1, the approximate equation can accurately measure the inter-event residuals (Kubo et al. 2020). Finally, the intra-event residuals can be obtained by subtracting the inter-event residuals and predicted IMs from the observed ones.
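The residual decomposition described above can be illustrated with a short Python sketch. It assumes that the approximate inter-event residual of an event is the mean of its total residuals (observed minus predicted ln IM), which is a common large-sample simplification and may not match the authors' equation exactly; the function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import pstdev


def decompose_residuals(event_ids, ln_observed, ln_predicted):
    """Split total residuals into inter-event and intra-event parts.

    Returns (inter_by_event, intra_residuals, tau, sigma, total_std).
    """
    totals = [o - p for o, p in zip(ln_observed, ln_predicted)]
    grouped = defaultdict(list)
    for event, residual in zip(event_ids, totals):
        grouped[event].append(residual)
    # Assumed approximation: inter-event residual = mean residual of the event
    inter = {event: sum(rs) / len(rs) for event, rs in grouped.items()}
    # Intra-event residual = observed - predicted - inter-event residual
    intra = [r - inter[event] for event, r in zip(event_ids, totals)]
    tau = pstdev(inter.values())    # standard deviation of inter-event residuals
    sigma = pstdev(intra)           # standard deviation of intra-event residuals
    total_std = math.sqrt(tau ** 2 + sigma ** 2)
    return inter, intra, tau, sigma, total_std


if __name__ == "__main__":
    events = ["a", "a", "b", "b"]   # toy data: two events, two stations each
    ln_obs = [1.0, 2.0, 3.0, 5.0]
    ln_pred = [0.5, 1.5, 3.5, 4.5]
    inter, intra, tau, sigma, total = decompose_residuals(events, ln_obs, ln_pred)
    print(inter, intra, round(tau, 3), round(sigma, 3), round(total, 3))
```

Here `pstdev` measures the spread about the sample mean; if the zero-mean assumption is to be enforced strictly, a root-mean-square of the residuals could be used instead.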
ANNs are intricate networks comprised of interconnected neural computing elements. They possess the capability to receive input stimuli and adapt to their environment through learning. The process of utilizing ANN involves two phases: learning and recall. In the learning phase, known data sets are employed to train the network by adjusting the weights between the input and output layers. Subsequently, during the recall phase, the network applies the acquired weights to process new inputs and make predictions. ANNs have emerged as a well-established and widely utilized tool across various domains (Flood & Kartam 1994; Abiodun et al. 2018). Neural network paradigms are characterized by various nomenclatures. In the context of network architecture, a single-layer network comprises individual input and output units, while a multilayer network incorporates one or more hidden units situated between the input and output layers. The backpropagation neural network is a well-known example of a multilayer neural network (Adamowski &

RESULTS AND DISCUSSIONS
This section provides an overview of the results obtained from the developed ANN-based GMM. Following that, the developed GMM is subjected to a validation process in which its pattern is compared with all the recorded ground motion data sets from Turkey, encompassing the latest events up to 2023. Specifically, the success of the GMM in estimating the intensity parameters of past real events in Turkey, with a focus on the Afyon, Erzincan, Duzce, Istanbul and Van regions, is evaluated to further assess its performance.

Performance of ANN-based GMM
In this section, the performance of the ANN-based GMM is assessed through a set of statistical metrics, including root-mean-square error (RMSE), coefficient of determination (R2), Pearson correlation coefficient (r) and mean-absolute-percentage error (MAPE). These metrics provide insight into the model's accuracy, fit, correlation and relative error, allowing for a comprehensive evaluation of its performance. Fig. 8 presents the evaluated metrics for the proposed GMM across all considered IMs, namely, ln(PGA), ln(PGV) and ln(PSA) at periods ranging from 0.03 to 2 s. The values of the RMSE, R2 and r metrics fall within a narrow range, indicating a consistent performance across all IMs without any notable variation for a specific IM. The mean RMSE value is approximately 0.3, slightly increasing toward ln(PSA) at longer periods. Similarly, as can be seen in Fig. 8, there is a slight decline in the R2 and r metrics for ln(PSA) as the periods increase. The mean R2 value is nearly 0.97, suggesting a robust fit between the model and the data. The mean r value is roughly 0.98, indicating a strong linear correlation between the estimated and observed IMs. Furthermore, the similarity between the RMSE, R2 and r values of the train and the test data sets suggests a consistent level of accuracy, fit and correlation across both data sets. This implies that the model performs reliably in terms of these metrics on unseen data. On the other hand, the MAPE values exhibit a somewhat wider range, indicating varying levels of prediction accuracy across the considered IMs. The MAPE values for ln(PSA) within periods of 0.05-0.4 s remain in a tight range with a mean value of roughly 0.1 across the train and test data sets. A similar trend is observed for ln(PGA) with a mean MAPE value of 0.2. However, the MAPE values increase for ln(PGV) and ln(PSA) at periods higher than 0.4 s. Additionally, there is less consistency across the train and test data sets for ln(PGV) and ln(PSA) at long periods. The differences observed between MAPE and RMSE are an anticipated outcome due to the inherent nature of these metrics, which assess distinct aspects of error. MAPE emphasizes the relative magnitude of errors, whereas RMSE takes into account the overall magnitude of errors. As a result, MAPE tends to be more sensitive to outliers and the scale of the data within the model. However, the agreement between the RMSE, R2 and r values is higher than that of MAPE, indicating a strong performance of the model.
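For reference, the four metrics discussed above can be computed as in the following minimal Python sketch, which uses their standard definitions; the function name and the sample values are hypothetical and the MAPE is returned as a fraction rather than a percentage, an assumption made to match the magnitudes quoted in the text.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, R2, Pearson r and MAPE (as a fraction) for paired values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_o) ** 2 for o in observed)
    ss_pred = sum((p - mean_p) ** 2 for p in predicted)
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    r = cov / math.sqrt(ss_tot * ss_pred)
    # Relative error: undefined when an observed value is exactly zero
    mape = sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n
    return rmse, r2, r, mape


if __name__ == "__main__":
    ln_obs = [1.0, 2.0, 3.0, 4.0]    # hypothetical observed ln(IM) values
    ln_pred = [1.1, 1.9, 3.2, 3.8]   # hypothetical model predictions
    rmse, r2, r, mape = regression_metrics(ln_obs, ln_pred)
    print(f"RMSE={rmse:.3f}  R2={r2:.3f}  r={r:.3f}  MAPE={mape:.3f}")
```

Because MAPE divides each error by the observed value, observations of ln(IM) that lie close to zero inflate it even when the absolute error is small, which is one possible reason why it behaves differently from RMSE for some IMs.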
Subsequently, the model's bias with respect to the input variables, namely, Mw, RJB and Vs30, is assessed through the analysis of residuals. To this end, the total uncertainty is divided into the inter-event (τ) and intra-event (σ) uncertainties, which are the standard deviations of residuals attributed to the earthquake source and site characteristics, respectively. Fig. 9 shows the distribution of inter-/intra-event and total uncertainties for ln(PGA), ln(PGV) and ln(PSA) at periods of 0.03 s up to 2 s. Overall, the inter-event uncertainty is consistently smaller than the intra-event uncertainty across all IMs. The intra-event uncertainty of ln(PSA) tends to increase at longer periods, leading to higher total uncertainty.
A closer analysis of residuals is conducted by selecting sample IMs: ln(PGA), ln(PGV) and ln(PSA) at periods of 0.2, 0.5, 1.0 and 2.0 s. These IMs are selected to encompass a frequency bandwidth including low, intermediate and high frequencies. Fig. 10 shows the distribution of inter-event residuals with respect to Mw for the selected IMs. Likewise, Figs 11 and 12 show the distribution of intra-event residuals in relation to RJB and Vs30 for the same IMs, respectively. The inter-event residuals vary between −0.5 and 0.5, while the intra-event residuals show a broader range between −1.0 and 1.0, consistent with other studies (Akkar et al. 2014; Kale et al. 2015; Mohammadi et al. 2023). In these figures, lines show the mean residual against the explanatory variables, while the shaded area around these lines represents the 95 per cent confidence interval for the true mean of the residuals. There is no discernible trend in the inter- and intra-event mean residuals across all IMs, indicating the unbiasedness of model errors. The tight confidence intervals further support this observation. Additionally, we employ p-values at a significance level of 0.05 to examine the null hypothesis regarding the unbiasedness of model errors. p-values close to 1.0 imply less bias toward the input parameters. In general, all p-values exceed 0.05 by far across all IMs, implying the absence of any trend in the mean residual. Thus, the model does not exhibit systematic bias toward Mw, RJB and Vs30. However, as can be seen in Fig. 10, the confidence interval of the mean residual is slightly wider as the magnitude of the event increases.
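One way to obtain such p-values is to regress the residuals on an explanatory variable and test whether the slope differs from zero. The Python sketch below is an illustration only: the text does not state which test the authors used, the function name and data are hypothetical, and the p-value relies on a normal approximation to the t distribution, which is reasonable only for large samples.

```python
import math
from statistics import NormalDist, mean


def slope_trend_test(x, residuals):
    """Least-squares slope of residuals against x and a two-sided p-value.

    Null hypothesis: the slope is zero (no trend, i.e. no bias against x).
    Requires at least three points and residuals that are not perfectly linear in x.
    """
    n = len(x)
    x_bar, r_bar = mean(x), mean(residuals)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (ri - r_bar) for xi, ri in zip(x, residuals)) / sxx
    intercept = r_bar - slope * x_bar
    sse = sum((ri - (intercept + slope * xi)) ** 2 for xi, ri in zip(x, residuals))
    std_err = math.sqrt(sse / (n - 2) / sxx)   # standard error of the slope
    z = slope / std_err
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))  # normal approximation
    return slope, p_value


if __name__ == "__main__":
    magnitudes = [5.0, 5.5, 6.0, 6.5, 7.0, 7.5]
    unbiased = [0.1, -0.1, 0.0, 0.0, -0.1, 0.1]      # toy residuals, no trend
    biased = [-0.26, -0.14, -0.06, 0.06, 0.14, 0.26]  # toy residuals, clear trend
    print(slope_trend_test(magnitudes, unbiased))
    print(slope_trend_test(magnitudes, biased))
```

A p-value well above 0.05 means that the zero-slope hypothesis cannot be rejected, which is the sense in which large p-values are read as an absence of bias.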

Validation of the developed ANN-based GMM
To ensure that the proposed GMM captures the characteristics of recorded strong ground motions, the results are assessed across various magnitudes (Mw) and distances (RJB) by considering soil class C (Vs30 = 520 m s−1), FM of strike-slip and the mean FD. Fig. 13 illustrates the variation of selected IMs, including PGA, PGV and PSA at periods of 0.2, 0.5, 1 and 2 s, with respect to Mw for RJB values of 1, 30 and 70 km. The median of the selected IMs and a range of two standard deviations are considered. Filled and unfilled dots, respectively, represent IMs obtained from real and simulated earthquake events. The results are also compared with a selected parametric GMM developed for Turkey by Kale et al. (2015). It is clear that higher Mw and lower RJB result in elevated levels of all selected IMs. Fig. 14 shows the variation of the same IMs with respect to RJB for Mw values of 5.5, 6.5 and 7.5, again compared with the GMM of Kale et al. (2015). An increase in RJB leads to a decrease in PGA, PGV and PSA levels at all periods. The proposed GMM effectively captures this distance-dependent attenuation. As shown in Fig. 14, higher Mw is associated with higher ground motion amplitudes, consistent with the former observation. This is expected as higher Mw corresponds to higher energy release during an earthquake. There are multiple real records, including those from the recent Turkey 2023 event, which fall within the considered seismological criteria. As shown in Fig. 14, the proposed GMM effectively captures the sample IMs obtained from these real records, particularly for strong events (Mw = 7.5), mostly within two standard deviations. Finally, the comparisons show that for large magnitudes the ANN-based GMM developed by this study performs more effectively than the parametric GMM, especially for the 2023 Kahramanmaraş earthquakes in Turkey. This implies the reliable performance of the proposed GMM regarding unseen data.
The capability of the proposed GMM to capture the geometric and inelastic attenuation is further investigated. To this end, the variation of PSA with RJB for soil class C (Vs30 = 520 m s−1), FM of strike-slip and two Mw values of 5.0 and 7.5 is illustrated in Fig. 15(a). Results demonstrate that as distance increases, the peak value of PSA decreases and shifts toward longer periods, as observed in previous studies (Dhanya & Raghukanth 2018; Mohammadi et al. 2023). The seismic energy dissipates as it propagates away from the source due to geometric and inelastic attenuation (Boore 2003), resulting in lower levels of PSA. Moreover, higher frequency contents tend to attenuate faster with distance, leading to a shift of PSA peaks toward longer periods. This shift is affected by Mw and is smaller for Mw = 7.5 than for Mw = 5.0, as illustrated in Fig. 15(a). Similarly, the performance of the proposed GMM in representing the effects of soil is assessed. For this purpose, soil class C, generic soil and soil class D are considered with a representative mean Vs30 of 520, 310 and 255 m s−1, respectively, according to the NEHRP soil classification ((US) & (US) 2001) and Boore & Joyner (1997). Fig. 15(b) depicts the variation of PSA with soil class for RJB = 10 km and two Mw values of 5.0 and 7.5. The results clearly indicate that when transitioning from stiffer soil (type C) to softer soil (type D), there is a notable increase in PSA level, especially for longer periods. Additionally, the peak of the spectra tends to shift towards longer periods. Results indicate that the magnitude of the earthquake influences the extent of this peak shift, which aligns with the fundamental principles of earthquake physics (i.e. the corner frequency is lower for large Mw and thus, large events have enhanced longer periods).
The performance of the developed model is further assessed by analysing its ability to predict the ground motion IMs for real events that occurred in the regions where simulated motions are available. Specifically, the model's performance is evaluated for the 2002 Afyon (M w = 6.6), 1992 Erzincan (M w = 6.6), 1999 Duzce (M w = 7.1) and 2011 Van (M w = 7.1) earthquakes. Table 2 presents detailed information on these events. Table 3 compares the real ground motion IMs, namely PGA and PGV, with the predicted values obtained from the developed GMM at three selected stations with different seismological information for each event. The comparison demonstrates that, for the majority of stations, the estimated values fall within two standard deviations of the actual values for all events, with both underestimation and overestimation of the recorded data. This observation provides additional validation for the proposed model and simulated data set, affirming their suitability for predicting the characteristics of the recorded strong motions.
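The two-standard-deviation criterion used in this comparison can be sketched as follows, assuming that the GMM predicts the median IM and that the residuals are measured in natural-log units with a total standard deviation sigma_ln. The function name and all numerical values below are hypothetical and are not taken from Table 3.

```python
import math


def within_k_sigma(observed, predicted_median, sigma_ln, k=2.0):
    """Return (z, ok): z is the ln-residual in units of sigma_ln,
    ok is True when the observation lies within k standard deviations."""
    z = (math.log(observed) - math.log(predicted_median)) / sigma_ln
    return z, abs(z) <= k


# Hypothetical values for illustration only (NOT from Table 3).
observed_pga = 0.30    # recorded PGA, in g
predicted_pga = 0.18   # GMM median PGA, in g
sigma_ln_pga = 0.65    # assumed total standard deviation, ln units

z, ok = within_k_sigma(observed_pga, predicted_pga, sigma_ln_pga)
print(f"normalized residual = {z:+.2f}, within two sigma: {ok}")
```

A positive normalized residual corresponds to underestimation of the recorded value by the model, and a negative one to overestimation.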
It is worth noting that the stochastic ground motion simulation method effectively replicates observed ground motions for medium- to high-frequency content, predominantly due to the influence of seismic wave scattering, as highlighted in Sato et al. (2012). However, propagation path effects resulting from deterministic velocity structures become prominent for low-frequency content. Consequently, the current method's reproducibility diminishes for low-frequency (long-period) content (IMs including PGV and PSA at 1 and 2 s). Addressing this limitation by integrating deterministic velocity structures through the finite-difference, spectral element or finite-element methods (Mai et al. 2010; Pitarka et al. 2022) in simulations of the long-period range would enhance the method's accuracy.
In summary, the developed ANN-based GMM, utilizing homogeneous earthquake data, demonstrates the ability to accurately capture real ground motion attenuation patterns, thus eliminating the need for complex nonlinear regressions with numerous coefficients. However, compared with machine-learning-based non-parametric GMMs, the traditional approach remains crucial in situations characterized by limited data availability. This is largely because it relies on established equations rooted in fundamental physical principles. It is noteworthy that the earthquakes that occurred in Turkey in 2023 have notably improved the quality and quantity of near-field data from large-magnitude events. Therefore, it is imperative that future research prioritizes the incorporation of these high-quality data into machine-learning-based GMMs.

CONCLUSIONS
In this study, a local GMM for Turkey is developed, utilizing a homogeneous data set created through the simulation of region-specific records. The stochastic finite-fault approach is employed, using the validated input-model parameters for selected regions in Turkey with large past earthquakes, including Afyon, Erzincan, Duzce, Istanbul and Van. To overcome the limitations of traditional regression-based models, an ANN is utilized herein to establish the predictive equations and coefficients. The predictive input parameters include FM, FD, M w, R JB and V s 30. The simulation results include spectral ordinates (PSA) within a specific period range (0.01-2.0 s), PGA and PGV. The uncertainty of the GMM is quantified through the analysis of residuals, providing insights into inter- and intra-event uncertainties. The developed GMM and simulation results are compared with the real data set of Turkey. The following main conclusions are drawn from the analysis conducted in this study:
(i) A homogeneous ground motion data set covering a large range of magnitudes, source-to-site distances and soil classes is developed for different regions in Turkey. The data set consists of 7359 records, covering different FMs including normal, reverse and strike-slip, a range of M w between 5.0 and 7.5, R JB in the range of 0-272 km and soil classes of C, D and generic soil as proposed in Boore & Joyner (1997). A comparison of the trend of these data sets with the recorded ground motion data set demonstrates the validity of the simulations within the region. Therefore, these data sets are available for engineering use.
(ii) The analysis of residuals yields satisfactory levels of uncertainty across all spectral values. The residuals are further examined to assess inter- and intra-event uncertainties concerning explanatory variables. Specifically, the inter-event residual is examined with respect to magnitude, while the intra-event residual is investigated considering soil and distance information within the data set. The results indicate that the inter-event uncertainty for all spectral values is smaller than the intra-event uncertainty. Additionally, it is observed that neither the inter-event nor the intra-event residuals exhibit significant bias with respect to the input variables. These findings indicate a consistent performance of the GMM with respect to input parameters.
(iii) The proposed GMM effectively captures the physical characteristics observed in real earthquakes regarding the magnitude, source-to-site distance and soil type. In particular, the GMM features distance-dependent attenuation, geometric and anelastic attenuation, and soil amplification effects.
(iv) The capability of the proposed GMM in estimating the ground motion amplitudes even for large-magnitude events, including the most recent ones in Turkey in 2023, is confirmed by comparing results with the unseen data from Turkish real record data sets.
(v) The model's effectiveness is further verified by comparing the predicted ground motion parameters with observed values recorded during previous events in the region.
Overall, the research validates the suitability of the proposed model and simulated data set for simulating seismic phenomena in Turkey. The utilization of an ANN-based GMM offers a notable advantage in comprehensively capturing the intricate and nonlinear attributes inherent in ground motion data sets, in contrast to the parametric GMMs. This advancement holds particular significance, as the ANN-based model alleviates the stringent limitations imposed by conventional GMMs in terms of prescribed functional forms and the determination of unknown coefficients. Furthermore, the existing parametric GMMs exhibit a constraint in adequately representing intensity levels, especially for events with large magnitudes. This deficiency is addressed through the model proposed in this research.
Finally, the validity of the proposed model is restricted to the examined regions and other areas with similar tectonic characteristics. It is essential to emphasize that the model's applicability is limited to the FMs, FDs, magnitudes, distances and soil conditions considered in this study. Additionally, using a stochastic finite-fault simulation technique enhances the model's ability to reproduce medium- to high-frequency ground motion records accurately. Yet, incorporating more accurate frequency- and distance-dependent radiation pattern models can improve the fidelity of the simulated ground motions, particularly the representation of low-frequency content. In addition, deterministic wave propagation studies using numerical methods such as the finite-element, finite-difference and spectral element methods are imperative for a more precise simulation of the low-frequency content. Thus, to improve the accuracy and overcome the limitations of the proposed model, especially in different regions, future research should focus on implementing the suggested ANN-based GMM in other tectonic zones while also enhancing the representation of low-frequency content. These improvements can be achieved by constructing region-specific simulated data sets and employing hybrid ground motion simulation approaches. Moreover, the continuous enhancement in data quality and quantity, as evident from events such as the 2023 February 6 Kahramanmaraş earthquakes in Turkey, highlights the pressing need for future research to integrate these superior data first into synthetic data sets and, in the following stages, into machine-learning-based GMMs. These efforts will facilitate advancements in accuracy and expand the applicability of the proposed model across various geological settings.
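The split of residuals into inter- and intra-event parts referred to in conclusion (ii) can be illustrated with the simplified sketch below. It assumes that the event term is taken as the mean total residual of each event, which is a simplification of the mixed-effects regression usually adopted for this purpose and is not necessarily the procedure followed in this study. The function name and the synthetic numbers are hypothetical.

```python
import numpy as np


def split_residuals(ln_obs, ln_pred, event_ids):
    """Split total ln-residuals into per-event means (inter-event terms)
    and within-event remainders (intra-event residuals)."""
    total = np.asarray(ln_obs, dtype=float) - np.asarray(ln_pred, dtype=float)
    events, inverse = np.unique(np.asarray(event_ids), return_inverse=True)
    # Event term: mean residual shared by all records of the same event.
    event_means = np.array([total[inverse == i].mean() for i in range(events.size)])
    # Intra-event residual: what remains after removing the event term.
    intra = total - event_means[inverse]
    return event_means, intra


# Synthetic example (hypothetical numbers, NOT the data set of this study).
rng = np.random.default_rng(0)
event_ids = np.repeat(np.arange(5), 20)            # 5 events, 20 records each
true_event_terms = rng.normal(0.0, 0.3, size=5)[event_ids]
ln_pred = np.zeros(event_ids.size)
ln_obs = ln_pred + true_event_terms + rng.normal(0.0, 0.5, size=event_ids.size)

event_means, intra = split_residuals(ln_obs, ln_pred, event_ids)
tau = event_means.std(ddof=1)   # inter-event standard deviation
phi = intra.std(ddof=1)         # intra-event standard deviation
print(f"tau ~ {tau:.2f}, phi ~ {phi:.2f}, total ~ {np.hypot(tau, phi):.2f}")
```

Plotting the event terms against M w and the intra-event residuals against R JB and V s 30 is then a direct way to inspect the bias trends discussed in conclusion (ii).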

DATA AVAILABILITY
The data underlying this paper will be shared on reasonable request to the corresponding author.

Figure 1. Tectonic map of Turkey displaying epicentres of previous events (modified from the study of Utkucu et al. 2003). The red boxes indicate the regions considered in this study.

Figure 3. Distribution of simulations regarding (a) FMs and (b) regions.

Figure 4. Histograms of seismological features of the simulated Turkish ground motion records.

Figure 5. (a) PGA and (b) PGV distributions with respect to the distance metric and faulting mechanism.
Figure 6.
Karapataki 2010). It employs a gradient-descent technique to minimize errors during the learning process. ANN attempts to emulate the parallel information processing capability observed in the human brain. The increasing interest in utilizing ANN-based black box models in seismology and earthquake engineering problems can be attributed to their ability to model nonlinear multivariate problems effectively (Möller et al. 2009; Dhanya & Raghukanth 2018; Paolucci et al.

Figure 7. Structure of the ANN model and illustration of artificial neurons of the hidden layer.

Figure 8. Model performance metrics in terms of RMSE, R 2, r and MAPE for IMs including ln(PGA), ln(PGV) and ln(PSA) at periods ranging from 0.03 to 2 s.

Figure 13. Variation of PGA, PGV and PSA at periods of 0.2, 0.5, 1 and 2 s with respect to M w for fault mechanism of strike-slip, V s 30 = 520 m s −1 and R JB = 1, 30 and 70 km.

Figure 14. Variation of PGA, PGV and PSA at periods of 0.2, 0.5, 1.0 and 2.0 s with respect to R JB for FM of strike-slip, V s 30 = 520 m s −1, and M w = 5.5, 6.5 and 7.5.

Figure 15. Variation of PSA for different (a) R JB values of 10, 70 and 130 km for soil class C (V s 30 = 520 m s −1) and (b) soil classes for R JB = 10 km based on FM of strike-slip and two M w of 5.0 and 7.5.
This work was partly financed by FCT/MCTES through National funds (PIDDAC) under the R&D Unit Institute for Sustainability and Innovation in Structural Engineering (ISISE), under reference UIDB/04029/2020, and under the Associate Laboratory Advanced Production and Intelligent Systems ARISE under reference LA/P/0112/2020. This study has been partly funded by the STAND4HERITAGE project that has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement no. 833123), as an advanced grant. This work is financed by national funds through FCT-Foundation for Science and Technology, under grant agreement 2020.08876.BD attributed to the second author. This work is financed by national funds through FCT-Foundation for Science and Technology, under grant agreement UI/BD/153379/2022 attributed to the third author.
Shaghayegh Karimzadeh: Conceptualisation, Data curation, Formal analysis, Investigation, Methodology, Resources, Supervision, Validation, Visualisation, Writing-original draft, Writing-review & editing.
Amirhossein Mohammadi: Formal analysis, Investigation, Methodology, Resources, Visualisation, Writing-original draft, Writing-review & editing.
Sayed Mohammad Sajad Hussaini: Formal analysis, Investigation, Writing-original draft, Writing-review & editing.
Daniel Caicedo: Formal analysis, Investigation, Writing-original draft, Writing-review & editing.
Aysegul Askan: Data curation, Resources, Writing-review & editing.
Paulo B. Lourenço: Funding acquisition, Resources, Supervision, Writing-review & editing.
As reported previously, the number of stations corresponds to 90 for Duzce, 365 for Erzincan, 88 for Istanbul, 324 for Afyon and 430 for Van, leading to a total of 1297 stations. The distribution of FMs within the stations corresponds to 40.7 per cent for the strike-slip mechanism, 35.1 per cent for the thrust and 24.2 per cent for the normal FM (see Fig. 3a). In addition, the distribution of simulations with respect to regions is shown in Fig. 3(b). It is noted that the availability and validity of input-model parameters impact the variability in the number of simulations conducted in different regions (Ismet Kanli et al. 2006; Ugurhan & Askan 2010; Askan et al. 2015; Sahin et al. 2016; Akkaya & Özvan 2019).

Table 2. Detailed information on the past large-magnitude earthquakes within the region.

Table 3. Comparing real PGA and PGV with predicted values from the developed GMM for past earthquakes in Turkey at selected stations.