A comparison of nonergodic ground-motion models based on geographically weighted regression and the integrated nested laplace approximation

Different nonergodic Ground-Motion Models based on spatially varying coefficient models are compared for ground-motion data in Italy. The models are based different methodologies: Multi-source geographically weighted regression (Caramenti et al. 2022), and Bayesian hierarchical models estimated with the integrated nested Laplace approximation (Rue et al. 2009). The different models are compared in terms of their predictive performance, their spatial coefficients, and their predictions. Models that include spatial terms perform slightly better than a simple base model that includes only event and station terms, in terms of out-of sample error based on cross-validation. The Bayesian spatial models have slightly lower generalization error, which can be attributed to the fact that they can include random effects for events and stations. The different methodologies give rise to different dependencies of the spatially varying terms on event and station locations, leading to between-model uncertainty in their predictions, which should be accommodated in a nonergodic seismic hazard assessment.


Introduction
In recent years, there has been a push to relax the ergodic assumption (Anderson and Brune 1999) in the development of empirical ground-motion models (GMMs). Loosely speaking, the ergodic assumption states that one can trade sampling in time for sampling in space, and that the ground-motion distribution at a particular site over time is the same as the ground-motion distribution averaged over many sites. This means that an ergodic GMM ignores repeatable systematic source, path, and site effects. Due to increasing numbers of observed data, it is possible to estimate these effects. Much has been written about nonergodic GMMs and their use in probabilistic seismic hazard analysis (PSHA). For a schematic overview of nonergodic PSHA, see e.g. Villani and Abrahamson (2015) or Walling and Abrahamson (2012). Abrahamson et al. (2019) conducted a nonergodic PSHA for three sites in California, and showed that there can be large differences in PSHA results due to the inclusion of nonergodic effects on ground motion. Landwehr et al. (2016) introduced the concept of spatially varying coefficient models (SVCMs, Gelfand et al. 2003) for the estimation of nonergodic GMMs. In an SVCM, several coefficients are functions of location, thus encoding spatial variation of ground-motion scaling. The model of Landwehr et al. (2016) is an example of a Bayesian hierarchical model, with the covariance of the random effects modeled by a Gaussian process (GP, Rasmussen and Williams 2006). The GP can be understood as a prior distribution over the functional form of the spatial dependence of the coefficients, with the covariance function providing some constraints on the shape of the function. The GMM of (Landwehr et al. 2016) was based on the SVCM formulation of Bussas et al. (2015Bussas et al. ( , 2017, which relied on marginalization of the random effects. Kuehn (2021b) reformulated the model, exploiting that the number of stations and events is much smaller than the number of records. In this formulation, the size of the covariance matrix for the spatial event and site effects depends only on the number of events and stations, while it depends on the number of records when the effects are integrated out. This formulation is also used in recent nonergodic GMMs based on GPs (Lavrentiadis et al. 2021;Sung et al. 2021).
Recently, Caramenti et al. (2020) introduced an alternative to GPs for the estimation of spatially varying GMMs, based on geographically weighted regression (GWR) (Fotheringham et al. 2002), called multi-source geographically weighted regression (MS-GWR). The method of Caramenti et al. (2020) extends GWR to account for spatially varying coefficients depending on both event and station coordinates. They showed an example based on peak ground acceleration (PGA) data recorded in Italy, using the ITA2018 GMM (Lanzano et al. 2019) as a reference model. Lanzano et al. (2021) investigated MS-GWR further, extending the model to different response spectral periods.
GWR and GP based SVCMs are based on different underlying assumptions, which naturally raises the question how they compare. GWR constructs local models for different locations, based on weighting data according to their distance to the location. The weights are determined by a spatial kernel. By contrast, GP based models assume an underlying functional dependence of the spatial coefficients, and place a GP prior on the functions. There have been several studies comparing Bayesian SVCMs and GWR based models (e.g. Finley 2011;Wolf et al. 2018), which generally find that both methods can produce similar results. None of the comparisons, though, deal with the peculiarities of empirical groundmotion data sets, with spatial processes based on different coordinates (event and station), and non-spatially varying random effects.
Here, I compare different GP based nonergodic models with each other, as well as with the MS-GWR model of (Caramenti et al. 2020). The same data set as in Caramenti et al. (2020) is used, which comprises the Italian data of the model of Lanzano et al. (2019). The GP based nonergodic models are estimated via Bayesian inference using the integrated nested Laplace approximation (INLA, Rue et al. 2009). INLA has been widely used to estimate large-scale spatial and spatio-temporal models, for example in ecology (Vilela et al. 2021;Lezama-Ochoa et al. 2020), disease modeling (Schrödle and Held 2011;Moraga 2019), to model spatially varying seismicity (Bayliss et al. 2020;D'Angelo et al. 2020), or in seismic tomography (Zhang et al. 2016). INLA provides an efficient way to estimate spatially varying models (Franco-Villoria et al. 2019), and can be used to estimate nonergodic GMMs (Kuehn 2021b).

3
The paper is organized as follows: I give a short overview of SVCMs and place them in the context of nonergodic GMMs. Then follows a brief introduction to the Italian data and the specific nonergodic GMM formulation (functional form, which nonergodic terms are included). I then provide some details on the implementation of the models with INLA and MS-GWR. Finally, different models are compared in terms of their predictive accuracy, and their estimated spatially varying coefficients. Code to run the analysis in the computer program R (R Core Team 2021) is provided at https:// github. com/ nikue hn/ NonEr gGMM_ Italy.

Spatially varying coefficient models
In this section, I provide a brief introduction to SVCMs, and the differences between Bayesian GP based models and GWR models. For more details, see e.g. Gelfand et al. (2003);Finley (2011) or Fotheringham et al. (2002. These models are then connected to nonergodic GMMs. A general linear model has the following form where y is the target variable, is a vector of coefficients, ⃗ x is the vector of predictor variables, and is the residual. For a spatial model, where the coefficients depend on location ⃗ t , the model becomes In a Bayesian GP based SVCM, the spatial coefficients are typically written as where ⃗ (⃗ t) can be understood as spatial adjustment terms to the base coefficients ⃗ . The adjustment terms are assumed to be distributed according to a GP. For computational reasons, the GPs for the different adjustments are often assumed to be independent (Finley 2011), but a multivariate specification is possible as well (Gelfand et al. 2004). The GPs have mean zero, so the specification for the ith coefficient becomes where k i (⃗ t,⃗ t � ) is the covariance function (or kernel function). Chapter 4 of Rasmussen and Williams (2006) provides an overview of covariance functions. Typically, a stationary and isotropic covariance function is employed (such as an exponential or squared exponential kernel), but nonisotropic and nonstationary models are possible (Paciorek and Schervish 2006;Finley 2011). One can understand the Bayesian SVCM as a model where the coefficients are spatially correlated, and the spatial correlation structure is determined by the covariance function. The parameters of the Bayesian SVCM can be estimated via Markov Chain Monte Carlo (MCMC) sampling (Finley 2011;Wheeler and Calder 2007), or by optimizing the marginal likelihood (Rasmussen and Williams 2006; Dambon et al. 2021).
In GWR, the coefficients for location ⃗ t are determined through a weighted regression, where the weights depend on the distance to the location, and are determined by a spatial kernel. Thus, the regression coefficients are estimated as where X is the design matrix, ⃗ y is the vector of target variables, and W(⃗ t) is a diagonal matrix of weights. The weight matrix is location specific, and its elements are determined by a spatial kernel, W ij = h(⃗ t i ,⃗ t j ) , where the spatial kernel h(⃗ t i ,⃗ t j ) is a function of the coordinates. The elements of W(⃗ t) place greater weight on observations that are close to ⃗ t . Different kernels may be used (Fotheringham et al. 2002;Wheeler 2014), with popular choices being the exponential or the Gaussian kernel. Apart from choosing a kernel, one must also choose a kernel bandwidth, which determines the range of influence of the observations. This parameter is typically determined by cross-validation. In Equation (5), the bandwidth is the same for all spatial coefficients. This assumption can be relaxed (Fotheringham et al. 2017).
In Equation (2), all coefficients ⃗ are spatially varying. A more flexible model is one where only some of the coefficients vary with ⃗ t where subscript c denotes the spatially constant coefficients, and subscript v denotes the spatially varying coefficients. Fotheringham et al. (2002) extended simple GWR to this case. In a Bayesian SVCM, this can be considered a special case of the more general formulation of Eq. (2), where some of the adjustment terms are zero.

Nonergodic GMMs and spatially varying coefficient models
A general GMM has the form where y is the ground-motion parameter of interest, f (⃗ c;⃗ x) is a function describing the scaling of the y with the predictor variables ⃗ x , ⃗ c is a vector of coefficients, B e and S2S s are the event term and site term, and subscripts e and s denote event e and station s, respectively. The event and site term are random effects which are introduced to account for correlation of records from the same event or station. WS es is the remaining within-event/ within-station residual, corresponding to in the linear model of Eq. (1).
It has long been recognized that the formulation of Eq. (7) is overly simplistic, as it does not account for systematic, repeatable, location specific source and path effects. Equation (7) does account for systematic site effects via S2S , but these are specific to each particular station, and technically do not allow to extrapolate to locations close to the station. If one accounts for systematic terms, then the GMM becomes nonergodic, which is typically written as where L2L(⃗ t e ) and P2P(⃗ t e ,⃗ t s ) are systematic source and path effects, and the dependence of the different variables (target, predictors, systematic terms) on event locations ⃗ t e and station locations ⃗ t s is made explicit. In this formulation, a nonergodic model consists of local adjustment terms to an ergodic base model. Often it is assumed that the event term can be decomposed into the systematic source term and the remaining event term, -Atik et al. 2010;Lin et al. 2011). Similarly, the within-event/withinsite residual is decomposed as WS = WS 0 + P2P . Accounting for spatial correlation in (7) y es = f (⃗ c;⃗ x es ) + B e + S2S s + WS es (8) y(⃗ t e ,⃗ t s ) = f (⃗ c;⃗ x(⃗ t e ,⃗ t s )) + B 0,e + S2S s + L2L(⃗ t e ) + P2P(⃗ t e ,⃗ t s ) + WS 0,es site terms S2S can lead to improved predictions for sites that do not have observed records (Chao et al. 2020).
In an SVCM framework, the terms associated with spatially varying coefficients can be thought of as the nonergodic, systematic terms. The interpretation outlined above in Eq. (8) fits neatly with the Bayesian SVCM, as the spatially varying coefficients are modeled as adjustments to a global coefficient. The Bayesian SVCM is a natural extension to the simple random effects model, which accounts for spatial effects with a more complicated covariance structure. A GWR model is different, and does not lend itself to the "nonergodic adjustment to an ergodic base model" formulation. Instead, a GWR model constructs a local model for each location under consideration. One should note that neither framework is a priori better or more suited to the development of nonergodic GMMs.

Nonergodic ground-motion models for italy
The functional form of the ITA18 model, which serves as the reference model, is as follows (Lanzano et al. 2019) where 1 (M w ≤M h ) is an indicator function that evaluates to one if the condition is true, and zero otherwise. M W is the moment magnitude, R JB is the Joyner-Boore distance in km, and V S30 is the time-average shear wave velocity in the upper 30m with units of m/s. F SS and F R are indicators for strike-slip and reverse-faulting events, respectively. B , S2S , and WS are the event term, site term, and within-event/within-site residual, which are distributed according to normal distributions with mean zero and standard deviations , S2S , and SS , respectively.
In their nonergodic GMM for California, Landwehr et al. (2016) made the constant coefficient (intercept), the geometrical spreading coefficient, and the anelastic attenuation coefficient dependent on the event location ⃗ t e . In addition, they had an intercept and the linear site scaling coefficient that was dependent on site location ⃗ t s . The model of Landwehr et al. (2016) did not include an event term B or a site term S2S.
The MS-GWR model of Caramenti et al. (2020) follows the model of Landwehr et al. (2016), and includes a spatially varying geometrical spreading, anelastic attenuation, and linear site scaling coefficient. It also does not include event or site terms, and does not account for spatially varying intercepts.
One can rewrite the reference model of Eq. (9) with spatially varying coefficients as (9) where two spatial intercepts (dependent on event and station location), spatially varying geometrical spreading, anelastic attenuation, and linear site scaling coefficients are included.
If one defines a spatially varying coefficient as an adjustment to the ergodic base coefficient, the full spatial model can be written as where a e (⃗ t e ) is the spatially varying adjustment for the constant coefficient a, dependent on event location. The other adjustment coefficients are denoted similarly. The subscript 0 is added to the event and site term and the within-event/within-site residual to indicate that they are the terms after the systematic, repeatable nonergodic terms are taken out.
The adjustment to anelastic attenuation coefficient, c 3 , is modeled as event location dependent in Landwehr et al. (2016) and Caramenti et al. (2020). This implies that the attenuation for a specific event is isotropic, i.e. is the same in all directions. This is an improvement over an ergodic model, but is unrealistic. Hence, the anelastic attenuation coefficient is made dependent on the midpoint between event and station location (denoted by ⃗ t es ), to account for possible directional differences in attenuation. I compare the midpoint model to an event-location model, which is why the coefficient is denoted as c 3 (⃗ t e∕es ) , to specify that the adjustment depends either on event or midpoint location.
Even if the anelastic attenuation is modeled dependent on the midpoint, it neglects that the anelastic attenuation is an accumulated effect over the whole travel path. This could be modeled by a dependence on both event and station locations, but this runs into problems that there are not many close points in four dimensions. Dawood and Rodriguez-Marek (2013) introduced a cell-specific attenuation model in which the anelasic attenuation is modeled as the sum over distance fractions in small cells. Kuehn et al. (2019) cast the cell-specific model as a Bayesian model to account for uncertainty in cells that are not well sampled. In the cellspecific model, the adjustment to the anelastic attenuation term becomes where ΔR i is the fraction of the path in the ith cell. (10) For the Bayesian SVCMs, the spatially varying adjustment coefficients are modeled as a Gaussian process (or a spatial Gaussian field, GF) with mean zero and Matérn covariance function where denotes the standard deviation of the GP, which is a measure of how much the coefficient can vary across the region under study. In Eq. (13), Γ is the gamma function, K is the modified Bessel function of the second kind, is a scale parameter and is a smoothness parameter. For = 0.5 , the Matérn covariance function becomes the exponential covariance function (Rasmussen and Williams 2006), which is extensively used in spatial correlation models for ground motion (e.g. Jayaram and Baker 2009;Kuehn and Abrahamson 2020;Chao et al. 2020;Iervolino 2011, 2012;Huang and Galasso 2019). The practical spatial range is defined as = √ 8 ∕ , which corresponds to the distance where the correlation becomes 0.14 (Krainski et al. 2019).
Here, the Matérn covariance function with = 1 is used. This makes it possible to estimate the model parameters using Bayesian inference, via the INLA method (Rue et al. 2009). Together with the stochastic partial differential equation ( In Eq. (10), the ergodic coefficients (a, b 1 , b 2 , c 1 , c 2 , c 3 , k, f 1 , f 2 ) are fixed effects. The adjustment terms (both the spatial adjustment terms, as well as the event terms B and site term S2S ), are random effects. The parameters that control the distribution of the random effects are called hyperparameters. These are the standard deviations and S2S for event terms and site terms, and the standard deviation and spatial range for the spatial terms, or the standard deviation ca for the cell-specific adjustment coefficients. All parameters (fixed effects, random effects, and hyperparameters) are estimated at the same time during the fitting process.
In the following section, I consider different nonergodic SVCMs, implemented in INLA. The models differ in the nonergodic adjustment terms that are included. The INLA models are compared with each other, as well as the MS-GWR model of (Caramenti et al. 2020).

Data
The analysis is performed on the same data set as in Caramenti et al. (2020). This data set comprises the Italian subset of the data used to derive the ITA18 GMM of Lanzano et al. (2019). For details on the data set see Lanzano et al. (2019) and Caramenti et al. (2020). In total, there are 4784 records from 137 events, recorded at 923 stations. I consider stations with the same location as the same station. In the data set of (Caramenti et al. 2020), there were three stations (as identified by their location) with slightly different values of V S30 . I used the geometric mean of the V S30 -values of all records from each station as the final V S30 value for each station. The magnitude ranges from M w = 4 to M w = 6.87 , the distance range is 0 ≤ R JB ≤ 199.7km. Figure 1 shows a magnitude-distance scatterplot, as well as a map with event and station locations. For the analysis, the event and station locations are converted to UTM coordinates (UTM zone 33). The target value is horizontal PGA in units of cm∕s 2 . The two horizontal components are combined as RotD50 (Boore 2010).

Implementation
Here, I briefly describe details on the implementation of the INLA models. All models are implemented in the R computing environment (R Core Team 2021). The implementation of the MS-GWR model follows the one provided by Caramenti et al. (2020) at https:// github. com/ lucar amenti/ ms-gwr/. Code to fit the INLA models can be found at https:// github. com/ nikue hn/ NonEr gGMM_ Italy.

INLA
The INLA models are implemented using the R-INLA 1 package Lindgren and Rue 2015;Bivand et al. 2015). R-INLA is a package for approximate Bayesian inference in latent Gaussian models. For details and mathematical background on INLA, see Rue et al. (2009). For introductions to computation with R-INLA, see e.g. Gómez-Rubio (2020) or Krainski et al. (2019). For a brief introduction to INLA in the context of estimation of GMMs, see Kuehn (2021a). Kuehn (2021b) showed how one can use R-INLA to efficiently estimate a nonergodic GMM which is a combination of an SVCM and the cell-specific attenuation approach.
In the SPDE approach to modeling spatial data, the spatial Gaussian field is approximated by basis functions which are evaluated on a triangular mesh (Lindgren et al. 2011;Lindgren and Rue 2015). For a Matérn covariance function with = 1 , the field becomes a Gaussian Markov random field, which means that the precision matrix (the inverse of the covariance matrix) becomes sparse, which can be exploited numerically. Thus, instead of Stations are triangles, events are cicles.

Fig. 1 Data used in the analysis
modeling the spatially varying coefficients at the observation locations, they are modeled at the mesh nodes, and projected onto the observation locations.
As an example, the model for the spatial adjustment term a e (⃗ t e ) is actually where P −1 is the covariance matrix describing the covariance of the adjustment term between the mesh nodes ⃗ t mesh , P is the associated precision matrix, and A e is a projector matrix connecting the mesh nodes to the observation (event) locations. The precision matrix P is sparse, which can be exploited numerically (Simpson et al. 2012). Figure 2 shows the mesh used for the Italian data. For simplicity,the same mesh is used for station and event locations. The mesh has 7505 nodes, with a maximum edge length of 10km, and an outer boundary of 50km. In general, the model complexity increases with a larger number of nodes (and smaller edge lengths). Meshes with too large edge lengths will lead to biased inferences 2 .

Prior distributions
INLA is a Bayesian method, which means one needs to set prior distributions for the parameters that are estimated. These are the fixed effects; the standard deviations S2S , and SS of the event terms, site terms, and within-event/within-site residuals; as well as the hyperparameters (spatial range and standard deviation ) of the spatially varying terms. For the prior distributions of the fixed effects, the following priors are used: The priors for the coefficients associated with different types of faulting ( f 1 , f 2 ) are based on Bommer et al. (2003). The anelastic attenuation coefficient is typically small and negative, which informs the prior distribution for c 3 . For the other coefficients, I choose a weakly informative prior distribution, which means I use a prior that is not too restrictive on the parameter space. The prior distributions for the standard deviations , S2S , and SS have to be set on the log-precisions, which are internally used in INLA. A log-Gamma distribution with shape parameter 0.9 and rate parameter 0.007 is used: The parameters are chosen in such a way that the 5% and 95% quantiles for the prior distribution of the standard deviations (if the log-Gamma distribution is suitably transformed) are 0.05 and 0.45, respectively.
The prior distributions for the spatial field parameters are based on a penalized complexity (PC) prior derived by Thus, the PC prior is specified by setting the prior belief that the standard deviation of the spatial field is larger than a certain value, and its spatial range smaller than a certain value. For the spatial range of all spatially varying adjustment coefficients, the prior is set according to meaning a prior probability of 50% that the range is smaller than 30km is assumed. For the standard deviations, the PC prior is set based on for the constants a e , a s , geometrical spreading adjustment c 2 , and linear site scaling adjustment k . Since the scale of the anelastic attenuation coefficient is very different, its prior distribution for its standard deviation is based on For models that use a cell-specific attenuation model, I also use a PC prior for the standard deviation ca , which is specified in the same way and uses the same values. The prior  Caramenti et al. (2020) extended the simple spatial model of Eq.

MS-GWR
(2) to the case where different coefficients can depend on different sets of coordinates (like event and station coordinates), i.e. a model like where subscript c, e, s denote the sets of coefficients and predictor variables that are spatially constant or depend on event and station coordinates. The different sets of coefficients are estimated sequentially: the spatial coefficients dependent on the station location ⃗ s (⃗ t s ) are estimated first; then, the spatial coefficients dependent on the event location, ⃗ e (⃗ t e ) are estimated based on partial residuals calculated from the first step, and finally the constant coefficients ⃗ c from partial residuals of the second step. Note that the order of the sequence is arbitrary, but this is the one found to be preferred by Caramenti et al. (2020), and used in Lanzano et al. (2021). For the exact derivation of the coefficients, see Caramenti et al. (2020). The event-related spatial coefficients are the geometrical spreading coefficient c 2 and the anelastic attenuation coefficient c 3 . The spatial coefficient dependent on the site location is k, the coefficient controlling the V S30 scaling. The spatial weights are calculated based on a Gaussian kernel: where b denotes the bandwidth of the kernel. The bandwidths used for the MS-GWR model are 25km for the event-related coefficients, and 75km for the station related coefficient, which are determined in Caramenti et al. (2020) through cross-validation.
The MS-GWR model in the specification of Caramenti et al. (2020) does not include systematic event and site terms. Following Lanzano et al. (2021), the residuals are partitioned into between-event, station-to-station, and within-event/within-station residuals, which allows one to calculate SS , , and S2S . This model is called "MS-GWR B". The partitioning of the residuals is done using INLA.

Model comparison based on predictive measures
First, I compare different (spatial and non-spatial) models fit with INLA in terms of the widely applicable information criterion 3 (WAIC, Watanabe 2010Watanabe , 2013, which is an information criterion for Bayesian models. Similar to the Akaike information criterion (AIC, Akaike 1973), lower values indicate a better fit. Like AIC, WAIC penalizes model complexity (Gelman et al. 2014). The different models, their specifications, and the WAIC and runtimes are listed in Table 1. The simplest (Model 1) model is a linear model of Eq. (9), without any random effects or spatially varying coefficients. I then add event and site terms, as well as the spatially varying terms for the intercept (both dependent on event and station coordinates), geometrical spreading, linear site scaling, and anelastic attenuation. I also replace the spatially varying anelastic attenuation coefficient with a cell-specific attenuation term. As expected, the simplest model performs worst in terms of WAIC. Adding event terms, and in particular site terms, leads to a major improvement. Incorporating spatially varying coefficients leads to lower WAIC values, with the inclusion of the spatially varying intercepts leading to the most improvement. One can see that the model with a spatially varying linear distance term dependent on the midpoint between the event and station coordinate has lower WAIC than when the coefficient depends on event coordinate. This is somewhat expected, as the anelastic attenuation should be anisotropic for a given event. The models with cell-specific anelastic attenuation coefficients perform best. This is due to the fact that the cell model treats the anelastic attenuation as a sum over the path.
I also fit a model that only includes spatially varying geometrical spreading, linear site scaling, and anelastic attenuation, but no intercepts or event/site terms. Such a model is closest to the MS-GWR model of Caramenti et al. (2020), which includes only these terms. In terms of WAIC, the INLA model with these terms performs better then the simplest models, but actually worse than a model without spatial terms, but site terms S2S . One should not draw conclusions about the MS-GWR model based on this comparison, though, since the WAIC values are tied to the INLA models and their parameterization.
The SVCM models lead to a significant increase in runtime of the models. However, the runtime is still manageable, with the longest running model finishing in under 40 minutes. In fact, even the longest running INLA model compares favorable with the MS-GWR Table 1 Comparison of different INLA models. † Column Random effects denotes which non-spatially effect is included, column Spatial effects which spatially varying effect is included. A spatially varying intercept is denoted with with c, spatially varying geometrical spreading with gs, spatially varying linear site term with vs, and spatially varying anelastic attenuation with r. Subscript e, s, es denote dependence on event, station, and midpoint coordinate. Column Cells indicates whether cell-specific attenuation is included, Runtimes depend on the computational load of the computer, and can be variable even for different runs on the same computer. All runs were carried out on a Macbook Pro with 2.8Ghz and 16GB RAM model in terms of runtime. One thing to note here is that INLA is run with a simple integration scheme and a simpler approximation to the posterior marginals 4 . Running Model 9 with default options (which uses a composite integration scheme over the hyperparameters, and the simplified Laplace approximation to the posterior marginals (Martino and Riebler 2020)) increases the runtime to 1596.59s for Model 9.
The WAIC values listed in Table 1 are based on calculating the likelihood of the observations (while integrating over the posterior distribution of the parameters), adjusted by the effective number of parameters (Vehtari et al. 2017). The likelihood of each observation is calculated conditional on all random effects, which means the median prediction for an observation includes the event and site term, and the standard deviation is SS or 0 . For ground-motion prediction, one typically wants to predict observations from new events, which means the event term is unknown, and the aleatory standard deviation is SS = √ 2 SS + 2 (or 0 , calculated the same way). To compare the models how they can predict new events, a 5-fold cross-validation (CV) is performed. The events are split into 5 sets. In each iteration, the observations from one set of events are left out and used as the test set, while the other sets are used as the training set. In the prediction of the test set, I include all random effects, i.e. if there are stations with records in both the training and test, the estimated site term is used for the prediction. The root mean squared error (RMSE) is calculated over all combined test data points.
Since the accurate quantification of the predictive uncertainty is important in PSHA, I also calculate the log-likelihood of each test observation. This indicates how well the predictive uncertainty is captured in a model, which is important for PSHA. Note that the predictive uncertainty contains both aleatory variability and epistemic uncertainty: there is aleatory variability because the predictions are made for previously unseen records and evens, and there is epistemic uncertainty because the exact values of the nonergodic model terms are not known, in particular if the new data is far from observations. There is also uncertainty in the values of the fixed effects. The log-likelihood values are calculated for each observation in the test set for all 5 folds, and the total log-likelihood is computed as the sum of all observations.   Table 1 For the CV calculations, the coefficients associated with style of faulting ( f 1 and f 2 in Eqs. (9) and (10)) are dropped. I only perform the CV calculations for a subset of the models of Table 1, mainly to reduce computation time. The two versions of the MS-GWR model are included. For the second model, the site terms estimated from the residuals to the corresponding site terms in the test set are added to the predictions. The log-likelihood values are not calculated for the second model, since it becomes difficult to combine the uncertainty from the fist model with the one from the second model.
The RMSE and log-likelihood values based on CV are shown in Fig. 3. The results in terms of ranking of the models are similar to the ones based on WAIC. However, one can see that the spatial models only provide a small improvement (slightly smaller RMSE, slightly larger log-likelihood) over the base model that includes event and site terms (Model 3). Based on WAIC, Model 8 performs best. This model includes a spatially varying geometrical spreading and a spatially varying V S30 -scaling coefficient, in addition to spatially varying constants and cell-specific attenuation. However, according to the CV results, model 9 performs better, which only includes the spatially varying constants and cell-specific attenuation. An explanation could be that an event-specific geometrical spreading term leads to a better prediction of records within the event, but that it is not well represented by a simple spatial correlation model. Another explanation could be that there are not enough events in the data set to resolve the spatially varying geometrical spreading well enough that it proves beneficial for predicting new events.
Another conclusion one can draw from Fig. 3 is that while RMSE and log-likelihood lead to a similar ranking of models, there is more separation between models in the loglikelihood values. Models 4 and 8 have essentially the same RMSE, but different log-likelihood values, which indicates that both lead to similar median predictions, but Model 4 has a better quantification of the predictive uncertainty.
In terms of RMSE and log-likelihood, the MS-GWR model provides an improvement over the linear model without random effects (Model 1), but performs worse than the simple random-effects model without spatial terms. Adding site terms to the predictions leads to improvements, with model MS-GWR B having similar RMSE values as the other models.

Fig. 4 Estimated valus of the between-even standard deviation
, site-to-site standard deviation S2S , and within-event/withinsite standard deviation SS . For models that do not partition the residuals, the estimated total standard deviation is shown. Error bars indicate the 2.5% and 97.5% quantiles of the posterior distribution  Figure 4 compares the standard deviations of different models, for the event, station and record levels ( , S2S , and SS ). The median of the posterior distributions are plotted, as well as the 2.5% and 97.5% quantiles. For the models that do not include event and site terms, the total standard deviation is plotted. In general, there is a reduction in the values of SS , S2S , and due to the inclusion of spatial terms, which is expected, and has been observed before Lin et al. 2011;Lavrentiadis et al. 2021). For the MS-GWR model, the total variability is about = 0.3 , which is larger than for the other spatial models. This might in part be explained that the model does not include event and  site terms, which are accounted for by the spatial terms. Since the bandwidths of the spatial kernels are quite long (75km and 25km), they cannot accommodate the non-spatially varying random effects, which would have a bandwidth (or spatial correlation length) of zero. The model that partitions the residuals of the MS-GWR model into between-event, station-to-station, and within-event residuals leads to values of that are comparable with the spatial INLA models, while SS and S2S are larger, and are comparable with the values from Model 3, which is the non-spatial model. These results are in line with the findings of Lanzano et al. (2021).

Spatially varying coefficients
Figures 5, 6, and 7 show maps of the spatially varying geometrical spreading, V S30 -scaling, and anelastic attenuation coefficients c 2 , k, and c 3 . These are shown for Model 7 (all spatially varying coefficients), Model 8 (cell-specific attenuation), Model 10 (no random effects or spatially varying intercepts), and the MS-GWR model. Maps for other models, as well as maps of associated uncertainties and the spatially varying intercepts, can be found at https:// github. com/ nikue hn/ NonEr gGMM_ Italy. Main differences can be seen for Model 10; since this model does not include non-spatially varying event and site terms, the model tries to accommodate these terms by very short spatial ranges for the geometrical spreading and V S30 -scaling. Differences between Model 7 and Model 8 are small for the geometrical spreading and V S30 -coefficient, as are differences between the anelastic attenuation coefficient for Model 7 and Model 10. Differences between the INLA models and the MS-GWR model are more pronounced, in particular for the geometrical spreading and anelastic attenuation coefficient. There are some similarities, such as low values for k in the Northwest at the border to France, and higher values in and around Tuscany. All models have positive values of the anelastic attenuation coefficient in the Northern part of Italy. This is an area where the Moho bounce is prominent (Lanzano et al. 2016), which biases the estimates of the anelastic attenuation if not properly modeled. The cell-specfic attenuation coefficients (Model 8 in Fig. 7) follow a similar pattern as the c 3 values of Models 7 and 10, with lower values along the Tyrrhenian coast. They also show some unphysical behavior: 15 out of 603 cells have a positive cell-specific attenuation coefficient.

Prediction
Here, models are compared in terms of their predictions. Predictions are calculated for an event in Middle Italy, for seismic waves traveling to the Northwest and Southeast. The geometry is shown in Fig. 8. Predictions are calculated for 1km to 200km, using Model 3 (only event and site terms), Model 9 (spatially varying intercepts and cellspecific attenuation), and the MS-GWR model. The predictions and associated epistemic uncertainties, denoted , are also shown in Fig 8. At short distances, the predictions of both SVCMs are similar, slightly higher than the ergodic base predictions from Model 3. This is due to the combined effect of a systematic event and site term, which both models capture with their spatially varying coefficients. The SVCM predictions show a different attenuation with distance than the base model. For the MS-GWR model, where the anelastic attenuation coefficient is dependent on event coordinate, the predictions to the North and South are very similar; small differences arise due to the spatially varying V S30 -scaling term at the different stations. By contrast, the INLA model predictions diverge, due to the cell-specific attenuation, with predictions to the South being more similar to the MS-GWR predictions. One can also see that the INLA predictions towards the South sometimes exhibit an increase in amplitudes with distance. This happens because of a positive cell-specific attenuation coefficient, which is unphysical, and should be corrected in a real application. Figure 8 also shows the standard deviation of epistemic uncertainty associated with the predictions, denoted as . Two plots are shown: in the first one, the standard deviation associated with the systematic site terms, S2S , is included in , while in the second plot it is not. If S2S is include in the uncertainty, then the base model (Model 3) exhibits larger values of than the spatial models. The reason is that the value of S2S is reduced do the inclusion of the spatial terms (cf. Fig. 4), while the additional uncertainty due to the spatial terms is reduced, since the predictions are made for locations close to observed data. On the other hand, if S2S is not included, then is determined only by the uncertainty of the fixed effects, and the uncertainty due to the spatial terms. In this case, the spatial models show larger uncertainty than the base model.  to the base model when S2S is included. In the INLA model, the uncertainty increases strongly with increasing distance. This is due to the cell-specific attenuation model. The predictve uncertainties of the INLA models shown in Fig. 8c and d are calculated by integrating over the uncertainties of all underlying parameters (the fixed effects, random effects (both non-spatial and spatial), and hyperparameters). Thus, different values of the hyperparameters are implicitly considered. Kuehn (2021b) showed that the impact of uncertainty in the hyperparameters on the predictions (both median and standard deviation is small.

Discussion and conclusions
The main point of the present paper is to compare SVCMs for the development of nonergodic GMMs. Two methods to model the SVCMs are compared, Bayesian hierarchical models based on GPs, and local models based on GWR, all on the Italian data of (Lanzano et al. 2019). Based on the analyses carried out in this work, both methods lead to broadly similar results, but with some differences in the details. Differences between methods should be embraced, as none of the models are a "true" representation of reality. Epistemic uncertainty, both on model parameters as well as on different models, is typically included in a standard PSHA. For nonergodic GMMs, the importance of accounting for epistemic uncertainty in the nonergodic terms has long been recognized (e.g. Walling and Abrahamson 2012). Nonergodic GMMs typically provide a method to calculate withinmodel uncertainty due to the spatial effects (e.g. Landwehr et al. 2016;Caramenti et al. 2020;Lavrentiadis et al. 2021), which can be incorporated into a nonergodic PSHA (Abrahamson et al. 2019). However, between-model uncertainty is generally not considered for nonergodic models. Both GWR and GP based SVCM methods have their strengths and weaknesses. Caramenti et al. (2020) and Lanzano et al. (2021) argue that GWR as a method is easier to understand. This is true, as in its simplest form it is a simple weighted linear regression, and the extension to MS-GWR, which takes into account multiple sets of coordinates, is relatively straightforward. A Bayesian SVCM, on the other hand, places a GP prior on the functional dependence of the spatial adjustment on the locations. However, one can understand the GP as a framework to extend the simple random effects structure of a typical GMM to a more general structure that takes more complicated correlations between records into account. Random effects model are common and familiar in GMM development, so this perspective should make the model easier to understand. Results based on CV have shown that inclusion of site terms S2S is important, with Model 3 (a non-spatial model with site terms) having similar performance as the spatial models. The GP based models can include random effects such as event and site terms directly, while they need to be estimated from partitioned residuals for MS-GWR. Figure 4 shows that the values of the resulting standard deviations SS , , and S2S are different. This could be attributed to just a case of between-model uncertainty. However, the Bayesian models show a reduction in all components of the standard deviation ( SS , S2S , and ), while for the MS-GWR model the main reduction is in the value of . Given that the spatial models are modeling nonergodic source, path, and site effects, one would expect a decrease in all components (Lin et al. 2011).
Even if one accepts that nonergodic adjustment terms (and SVCMs) are just a simple extension to a traditional random effects model, GP based models are not easy to fit. Often, these models are fit with MCMC sampling (Finley 2011). Since GPs scale poorly to large data sets, approximations need to be made with increasing number of observations, such as a nearest neighbourhood GP (NNGP, Datta et al. 2016;Finley et al. 2018). Here, I have used INLA, which allows to fit the spatial models efficiently. Neither INLA nor MCMC are as easy to apply as a simple (weighted) linear regression, and require some experience. However, this may be offset by the rich inferential framework supplied by the Bayesian methods, in particular the possibility to model complicated random effects structures. This includes the cell-specific attenuation term, which can be formulated as a random effect, is a more physical representation of the path effect than a simple spatially varying coefficient. In addition, one can set prior distributions on all parameters, which can help constrain the model. Informative prior distributions on model coefficients have been proven to be successful to improve model performance (Kowsari et al. 2020). The INLA models are fast enough to fit that one can carry out some sensitivity studies to different choices of the prior distributions.
In terms of predictive performance, the spatial models estimated with INLA outperform the MS-GWR models. Once site terms are accounted for in model MS-GWR B, the difference in terms of RMSE based on CV is small. However, the different models exhibit different spatial patterns of the spatially varying coefficients (cf. Figs. 5, 6, 7), which can lead to quite different predictions (Fig. 8), in particular at large distances. Given the similarity in RMSE, one should probably consider the notion of using different models based on different methodologies in a PSHA.
Both GWR and Bayesian SVCMs employed in this work are relatively simple. The MS-GWR model uses a single kernel, and the same bandwidth for the spatially varying coefficients that depend on the same location (geometrical spreading and anelastic attenuation). Murakami et al. (2019) note "the importance of scale in spatially Varying Coefficient Modeling" and note that the inclusion of different bandwidths for different coefficients is generally beneficial. This can be incorporated into a GWR model based on the methodology of Fotheringham et al. (2017). In the INLA models, different values of (the practical spatial range) are estimated for the different spatial effects, which adds flexibility to the model. Other extensions to increase flexibility are the use of adaptive kernels (where bandwidth is not measured in term of length, but in the number of nearest neighbours (Wolf et al. 2018)) for GWR, or the use of non-stationary and non-isotropic covariance function in Bayesian SVCMs.
Apart from a comparison of different SVCMs, this paper also highlights the usefulness of INLA to estimate nonergodic GMMs. A range of different models were fit with INLA, with different numbers of spatial random effects. INLA is quite fast (cf. Table 1), which is important for scaling the method to larger data sets. The Italian data set used here is relatively small (4784 records) compared with other regions, which means that model fitting can be done relatively easy. For comparison, the Californian data of the NGA-W2 data set Bozorgnia et al. (2014); Ancheta et al. (2014) comprises more than 12000 records. A nonergodic model for California similar to the ones estimated here takes about 1-2 hours to fit with INLA, compared to 20 hours using MCMC sampling (Kuehn 2021b). Data sets are going to increase in the next years, both due to newly recorded data, as well as lowering magnitude thresholds. This is an opportunity to develop better nonergodic models (more data means that the nonergodic terms can be better constrained), but this requires estimation methods that can handle large spatial data sets. INLA can fit a nonergodic GMM for an induced data set in Oklahoma, which contains more than 90000 records from about 3600 earthquakes (Rennolet et al. 2018), in about 3 hours (Walling et al. 2021). Thus, I believe INLA is a promising tool for the development of nonergodic GMMs for large data sets.
All nonergodic models investigated in this paper are empirical, data-driven models, and no physical constraints are imposed. This should be kept in mind when using these models in a PSHA, and the nonergodic adjustment terms should be checked that they do not violate physical behavior. Lavrentiadis et al. (2021) compared the cell-specific attenuation pattern of their SVCM, based on Californian data, with a regional Q-model, which is one way to check for physical consistency. The spatially varying event terms can be compared with spatial stress drop variations (Trugman and Shearer 2018). One can also incorporate simulated ground motions (Meng and Goulet 2022) to either help constrain the model, or to test the results. From a modeling standpoint, additional complexities could be incorporated via more complicated covariance structures (for example using non-stationary/non-isotropic correlation functions (Liu et al. 2022)). In the end, one needs to be aware of the promises (lower aleatory variability, better capture of systematic source, site, and path effects compared to an ergodic model), as well as the limitations (limited physical constraints, still relatively simple behavior of systematic effects) of current nonergodic models based on SVCMs.