A two-stage framework for automated operational modal identification

Abstract Automated operational modal analysis (OMA) is attractive and has been extensively used to replace traditional OMA, which requires much empirical observation and engineers’ judgment. Still, the uncertainties on modal parameters and spurious modes are challenging to estimate under the field conditions. For addressing this challenge, this research proposed an automated modal identification approach. The proposed approach consists of two steps: (1) modal analysis using covariance-driven stochastic subspace algorithm; (2) automated interpretation of the stabilization diagram. An additional uncertainty criterion is employed to initially remove as many spurious modes as possible. A novel threshold calculation for clustering is proposed with incorporating uncertainty of modal parameters and the weighting factor. An improved self-adaptive clustering with new distance calculation is used to group physical modes, followed by the final step of robust outlier detection to select outlying modes. The proposed automated approach requires minimum human intervention. Two field tests of the footbridge and a post-tensioned concrete bridge are used to verify the proposed approach. A modal tracking was used for continuously measured data for demonstrating the applicability of the approach. Results show the proposed approach has fairly good performance and be suitable for automated OMA and long-term health monitoring.


Introduction
Over the last few decades, long-term structural health monitoring (SHM) using vibration responses has been applied to civil infrastructures such as bridges and buildings (Chen & Ni, 2018). It is widely recognized that continuous monitoring is useful in tracking changes in dynamic modal parameters: frequencies, damping ratios, and mode shapes.In particular, operational modal analysis (OMA) is paid great attention because it can be implemented efficiently, economically, and safely, avoiding interruption to the normal operation of observed structure and requiring no artificial loadings (Brownjohn, Magalhaes, Caetano, & Cunha, 2010). Therefore, only ambient excitation (i.e., wind and traffic) is used to identify dynamic modal parameters by measuring vibration responses. Moreover, dynamic modal parameters are beneficial for: (1) finite element model updating (Zeng & Kim, 2020); (2) vibration-based damage detection (Honfi, Bj€ ornsson, Ivanov, & Leander, 2020); (3) real-time alarm systems for bridges and buildings (Chen et al., 2011).
It can be incorporated with maintenance systems to observe variations of modal parameters over time and track abnormality to prevent disastrous failure at an early stage. However, it requires a huge amount of recorded data and data analysis within a short period of time. Therefore, it is labor-intensive, even impractical, to process massive measured data and identify modal parameters by manual intervention and engineers' experience. Much user intervention on a large amount of vibration data can be an obstacle in a real application. For overcoming this, it is essential to have an automated evaluation of structural conditions in almost real-time.
In recent years, there are growing researches focusing on the development of automated OMA techniques. Generally, OMA techniques can be categories into two groups: direct methods and indirect methods. Direct methods measure vibration responses of observed structures to identify modal parameters, requiring networks of sensors installed on the target structures. Many system identification algorithms have been developed to be compatible with direct methods, such as stochastic subspace identification (SSI) (Cabboi, Magalhães, Gentile, & Cunha, 2017), eigensystem realization algorithm (Yang, Yi, Qu, Li, & Liu, 2019), poly-least squares complex frequency domain method (Poly-MAX) (Magalhães, Reynders, Cunha, & De Roeck, 2009), fast Bayesian FFT method (Au, 2011b), etc.
On the other hand, indirect methods are referred to as the primary use of mobile sensors installed inside in the passing-by vehicles to perform modal identification. No sensors are needed to be deployed on structures as the indirect methods. Recently, much progress has been made in terms of system identification using networks of mobile sensors. For example, mass-modeled vehicle scanning method for bridges (Yang, Lin, & Yau, 2004;Yang & Sun, 2020), hybrid sensors based method (Marulanda, Caicedo, & Thomson, 2017), extended structural identification using expectation optimization (STRIDEX) (Matarazzo & Pakzad, 2018), matrix completion method (Sadeghi Eshkevari, Pakzad, Tak a c, , crowdsensing based method . A detailed literature review on direct and indirect methods can be found in Malekjafarian, Mcgetrick, and Obrien (2015).
The present work belongs to the class of direct methods. Vibration response is measured from sensors installed on investigated structures. Among various system identification algorithms in the direct methods, SSI has been widely applied to diverse structures to perform modal analysis. It offers accurate identification results and simple implementation, which are important attributes accounting for its popularity. In addition, due to SSI's explicit mathematical nature, SSI tends to be more suited for automated modal identification. However, the major challenge in SSI is spurious modes appear in outputs.
Commonly, spurious modes consist of pure mathematical (i.e., non-physical) and noise modes (Reynders, Pintelon, & De Roeck, 2008). The most common strategy to deal with this challenge is to construct a stabilization diagram, a plot of model order vs. frequency for an extensive range of model order. In the stabilization diagram, physical modes are referred to as those poles that cross most of the model orders consistently. Therefore, physical modes should be graphically recognized and homogeneously distributed along vertical alignments in the stabilization diagram (Cabboi et al., 2017). On the contrary, spurious modes appear in the stabilization diagram in a scattered way. Spurious modes are eliminated in a manual analysis depending on empirical discovery and engineers' judgment, which is subjective, timeconsuming, and leads to incorrect modal identification.
For addressing this issue, a variety of methods are proposed in the literature to automatically interpret stabilization diagrams and remove spurious modes. In general, the process can be divided into three steps: 1.
Step 1: Apply the modal validation criteria to eliminate as many spurious modes as possible in the stabilization diagram 2.
Step 2: Group modes with similar characteristics, i.e., frequencies, damping ratios, and mode shapes by clustering strategies 3.
Step 3: Detect outliers in each cluster to improve the accuracy of modal parameters and select representative of each cluster Several methods aiming at minimizing human involvement in the interpretation of the stabilization diagram have been developed. For example, in step one, many modal validation criteria are proposed to detect spurious modes in the stabilization diagram. These criteria include hard criteria, which yield a binary answer, and soft criteria, which yield a certain range of values. Reynders, Houbrechts, and De Roeck (2012) thoroughly reviewed and summarized hard and soft criteria. However, conventional modal validation criteria limitedly remove a certain number of spurious modes; many spurious modes, which still remain in the stabilization diagram, affect parameter estimates' accuracy and imposes a computational burden to the following step (clustering process).
In step 2, various clustering strategies are widely employed to group modes with similar characteristics. Hierarchical clustering has been extensively applied by many researchers and is considered as the most natural approach (Reynders et al., 2012). Hierarchical clustering has a significant advantage of allowing a good selection of physical clusters. However, the main drawbacks include a userdefined tree cutoff distance and human intervention with demanding computational cost. Furthermore, the hierarchical clustering is sensitive to outliers. Another popular clustering strategy is partitioning methods, often referred to as K-means clustering (Neu, Janser, Khatibi, & Orifici, 2017). K-means clustering has the benefit of being fast processing. However, the number of clusters has to be predefined, and it is sensitive to cluster seeds (initial centroid). By merging the benefits of hierarchical clustering and K-means clustering to overcome some of their limitations, self-adaptive clustering is recently proposed (Cabboi et al., 2017;Fan, Li, & Hao, 2019). The self-adaptive clustering has outstanding features: (1) simple implementation; (2) fast computation; (3) no need for the number of clusters; (4) clustering threshold is iteratively trained during the clustering process.
While it still starts with a user-defined clustering threshold, which requires some level of human intervention. Some methods are proposed to automatically calculate clustering threshold based on statistical properties, i.e., mean and standard derivation or median, of the distance between two closed poles in the stabilization diagram (Magalhães et al., 2009;Reynders et al., 2012;Yang et al., 2019). However, these methods do not consider uncertainty on modal parameters and inaccuracy of mode shapes. In practice, modal parameters' uncertainty is inevitable due to modeling error and measurement noise; it can be a more reasonable approach to consider uncertainty when calculating the clustering threshold. Also, measurement on mode shapes is less accurate than that on frequencies. Thus, a weighting factor can reduce the inaccuracy of mode shape difference on threshold calculation (Boroschek & Bilbao, 2019).
In step 3, some outliers are undesirably involved in identified physical clusters; this phenomenon is pronounced in a damping ratio with a scattered nature. Most outlier detection techniques need to define limit bounds, such as boxplot outlier detection (Yang et al., 2019). A bound-free outlier detection method is needed to improve the accuracy of parameter estimations. In summary, challenges to the current automated interpretation of the stabilization diagram are listed as follows: 1. Conventional modal validation criteria are inefficient resulting in high computational cost in the clustering process. 2. The clustering threshold and distance calculation in the clustering process does not consider the uncertainty of parameters and the weighting factor. 3. Uncertainties on identified modal parameters and physical clusters are unavailable.
4. Outlier detection requires to define limit bounds.
This article attempts to address the aforementioned challenges. A two-stage framework for automated operational modal identification is proposed. Figure 1 shows the flowchart of the proposed framework: (1) modal analysis using covariance-driven reference-based SSI (SSI-cov/ref); (2) twostage automated interpretation of stabilization diagram. In the first place, SSI-cov/ref is adopted to perform modal analysis and construct a stabilization diagram. Subsequently, a two-stage automated analysis for the stabilization diagram is carried out.
At the pre-processing stage, besides applying conventional modal validation criteria, such as damping ratio check and modal complexity check, to eliminate spurious modes, a new supplementary criterion: uncertainty criterion, which is also applied for further removal of spurious modes. At the clustering stage, a novel threshold calculation, which incorporates the uncertainty of modal parameters and weighting factor, is proposed. An improved self-adaptive clustering with new distance calculation is then employed to group modes with similar characteristics and identify physical clusters. Finally, robust outlier detection is implemented to exclude outliers. The average of each cluster's elements is chosen as representative frequency, damping ratio, and mode shape.
Two field tests were used to evaluate the proposed approach. The first field example is the Dowling Hall Footbridge located at Turfs University, a two-span steel frame bridge with 144 ft (44 m) long and 12 ft (2.7 m) width having a reinforced concrete deck. The second field test example is the Z24 bridge located in Switzerland, a post-tensioned concrete bridge with a main span of 100 ft (30 m) and two sides span of 46 ft (14 m), considered as a benchmark in the research community.
This article is organized as follows: In Section 2, the background of SSI-cov/ref is briefly introduced. In Section 3, a two-stage approach for proposed automated modal identification is presented. In Section 4, the capability of the proposed approach is validated by two field tests along with the modal tracking results. Finally, the conclusion is presented in Section 5.

Background of SSI
SSI has been extensively spread over the field of OMA during the past few decades accounting for its quick implementation and high accuracy. In this article, covariance-driven reference-based SSI (SSI-cov/ref) is employed to reduce the dimensions of the output matrix and computational cost. The detailed theoretical fundamentals of SSI-cov/refl are fully described in the literature (Peeters & De Roeck, 1999). Briefly, SSI-cov/ref is developed based on assuming a linear and stationary N degree of freedoms (DOFs) system with a dynamic motion characterized by the discrete-time statespace equation: where subscript k denotes time step; A 2 R nÂn denotes system state matrix with (n ¼ 2N); C 2 R lÂn is an output matrix, l is defined as the number of measured signals; x k 2 R nÂ1 and y k 2 R lÂ1 are discrete-time state vector and measured response vector, respectively; x k 2 R nÂ1 is a process white noise vector. v k 2 R lÂ1 is the measurement of the white noise vector. Peeters and De Roeck (1999) reported that modal parameters can be identified from system matrices A and C: Two main SSI preparation parameters significantly affect the accuracy of identification results: (1) model order; (2) time lag, i: Unfortunately, the value of model order and i, which yield the best identification results are never known (Fan et al., 2019;Ubertini, Gentile, & Materazzi, 2013). In practice, it is necessary to over-specify model order to cover  weakly-excited modes, but spurious modes increase with model order increases. These spurious modes must be singled out in the subsequent procedure. On the other hand, the value of i determines the size of the response covariance function. The smaller i may fail to identify the fundamental mode, but the larger value of i yields more spurious modes and increases computational time. The value of i may be chosen at least estimated value as follows (Fan et al., 2019): where T i denotes fundamental period (s); t denotes sampling interval (s).

A Two-stage automated modal identification
In this section, a two-stage framework for automated modal identification is proposed. The flowchart of the entire automated process in detail is presented in Figure 2. Section 3.1 presented the pre-processing stage for automated identification, including conventional modal validation criteria (Section 3.1.1) and a new additional uncertainty criterion (Section 3.1.2). Subsequently, the clustering stage is introduced in Section 3.2. A newly proposed threshold calculation for clustering is provided in Section 3.2.1. An improved self-adaptive clustering is then employed to determine physical clusters in Section 3.2.2. Finally, robust outlier detection is performed to improve the accuracy of modal parameter estimates in Section 3.2.3. The pseudocode of the proposed automated framework is also provided in Appendix A.

The pre-processing stage
A full stabilization diagram looks very complex and contains the spurious modes. At the pre-processing stage, different validation criteria are utilized to initially screen spurious modes, clarifying the stabilization diagram and consequently accelerating automated interpretation of the stabilization diagram.

Modal validation criteria
Firstly, for civil engineering structures, a negative or high damping ratio hardly appears in practice. Therefore, a damping ratio with less than 0 and higher than 10% is discarded (Cabboi et al., 2017;Fan et al., 2019). Additionally, two popular modal validation criteria are used to measure the complexity of mode shape vectors, namely, modal phase collinearity (MPC) and mean phase deviation (MPD). These two indicators have been utilized by various researchers to distinguish physical modes from spurious modes (Neu et al., 2017;Reynders et al., 2012). The real (Re) and imaginary (Im) part of mode shapes display a linear correlation, which can be assessed by the MPC indicator. The value of MPC for the t th mode shape, / t , is expressed as (Reynders et al., 2012): The kth component of/ t is given: MPC values are dimensionless; they lie within the range of 0 and 1. MPC value closer to 1 indicates that mode shape, / t , is more collinear and 'monophase,' which is usually regarded as a physical mode.
With regard to MPD, it represents the phase degree of each identified mode shape vector. The value of MPD/90 lies between 0 and 1. A smaller quantity of MPD implies that mode shape vector is more likely to be physical. A detailed discussion can be found in Reynders et al. (2012). For the t th mode shape, / t , the mean phase (MP) is defined as: where h is phase angle in degree, Eq. (6) can be solved by the least square as: where USV T is singular value decomposition, V 12 and V 22 denotes elements (1,2) and (2,2) of V matrix, respectively. Then, MPD can be determined as: where x k is a weighting factor that equals to the kth component of the t th mode shape, / t : The selection of threshold values of MPC and MPD depends on measurement conditions and dynamic vibration properties. For a structure with clear linear behavior and high signal-to-noise ratio, the threshold of MPC and MPD can be conservatively chosen as 0.7 and 0.3, which implies that modes whose MPC are less than 0.7 and MPD exceed 0.3 are regarded as spurious modes. Conversely, the threshold of MPC and MPD are chosen as 0.3 and 0.7 in the case of structures with complex behavior (Cabboi et al., 2017;Fan et al., 2019). Two representative field tests with complex measurement conditions are used to validate the methods. Thus, the values of 0.3 and 0.7 are selected as a threshold for MPC and MPD, respectively, in this study.

Uncertainty criterion
Although conventional modal validation criteria remove certain spurious modes, many spurious modes still remain in the stabilization diagram, slowing down the following process (herein clustering process). More effective validation criteria should be adopted to delete as many spurious modes as possible. This study employed supplementary uncertainty criteria at the pre-processing stage to further eliminate spurious modes. Uncertainty on modal parameters by SSI mainly arise from five sources: (1) finite number of data sample; (2) unmeasured excitation and measurement noise modeled as white noise; (3) assumption of linear and stationary behavior; (4) imperfect filter of data; (5) incorrect choice of model order (Reynders et al., 2008). Reynders et al. (2008) initially developed the uncertainty computation based on the propagation of first-order perturbation from measured data to modal parameters. Also, some validation and application are summarized in Reynders et al. (2016).
Later, D€ ohler, Lam, and Mevel (2013) significantly improved the computational efficiency of uncertainty at multiple model orders, which has been applied in various structures (D€ ohler et al., 2013). Uncertainty quantification can provide information to measure the accuracy of identified modal parameters. It is the fact that the uncertainty of physical modes is smaller than those of spurious modes. Based on this information, coefficient of variation (COV) (standard derivation/mean) with respect to frequency may be used to distinguish physical modes from spurious modes . Some research has introduced uncertainty features in the stabilization diagram, but uncertainty criterion is not used or does not contribute to further automated modal procedure. General procedures of uncertainty computation are summarized as follows: Input parameters: the number of block rows in Hankel matrix, q; the amount of data blocks, n b ; the range of model order, ðn min , n max Þ; Compute Hankel matrix,H, system state matrix, A and output matrix, C in Eq. (1), as well as observability matrix, O, based on SSI-cov/ref, then compute transform matrix, T Compute covariance and sensitivity of subspace matrix from SSI-cov/ref, given byR H cov and J O, H , respectively Compute sensitivity and covariance of system state matrix, A, and output matrix, C from SSI-cov/ref, given by J A, O , J C, O and R A, C , respectively For each mode i at successive modal order, compute sensitivity matrix: Comprehensive derivation for uncertainty estimation can be found in . When the COV in the frequency is chosen as a threshold (herein, 2%), modes with the COV of frequency larger than the threshold will be discarded.

Clustering stage
The clustering stage is sequentially performed to assemble modes based on similarities in modal parameters in this section. A novel method is proposed to calculate the clustering threshold; an improved self-adaptive clustering is then used to identify physical clusters. Finally, robust outlier detection is implemented, and each representative of modal parameters is determined.

Automated computation of clustering threshold
Typically, two kinds of thresholds are usually adopted for clustering: (1) static threshold; (2) automatically computed threshold. A static threshold relies on the engineers' judgment. Also, during long-term health monitoring, a welldefined static threshold may be suitable for some initial datasets; however, there is no guarantee that the static threshold will be keeping appropriate for all datasets. This is more challenging in the case of handling massive datasets. In this study, a novel method is proposed to calculate the clustering threshold based on possible physical modes at the pre-processing stage. First, the mutual distance between the two modes is defined as: f i and f j are ith and jth identified frequency at a pre-processing stage, respectively; r f i and r f j are corresponding standard derivation, respect- jth mode shapes at a pre-processing stage, respectively; r / i and r / j are corresponding standard derivation, respectively.
x is a weighting factor of mode shape differ- : Equation (9) does not consider the damping ratio difference because it is difficult to accurately measure the damping ratio in practice. In addition, there is a high probability of two different modes with a similar damping ratio. A weighting factor, x, in Eq. (9) represents different participation for frequency difference and mode shape difference. Generally, mode shape is measured with limited sensors, yielding missing components of mode shape; frequency is usually measured with an accurate level. Therefore, the use of x can reduce the effect of measurement inaccuracy of mode shapes on distance calculation (Boroschek & Bilbao, 2019). An uncertainty quantification using standard derivation is used to form a weighting matrix for Finite Element Model Updating (Yang & Lam, 2018). Similarly, this work adopted the average of the standard derivation of mode shapes to define x: Furthermore, as uncertainty on modal parameters is inevitable in practice, it is more reasonable to incorporate uncertainty in distance calculation. Here, two standard derivations are considered in Eq. (9). At the next model order, the mutual distance between one mode and all other modes is computed by Eq. (9), then the minimum distance is determined. Assuming n modes have been identified at the pre-processing stage, each mode has its minimum mutual distance with forming a minimum distance vector, V ¼ ðd 1 min , d 2 min , d 3 min Á Á Á d n min Þ, (n denotes the number of modes, d min denotes the minimum distance between one mode and all other modes). Finally, the sum of mean and two standard derivations of V are used to compute the clustering threshold, d (Reynders et al., 2012): Generally, modal features are usually assumed to follow Gaussian normal distribution, such as frequency, damping ratio and mode shape (MAC value) (Au, 2011b). In this study, Eq. (9) defines modal distance which is the sum of frequency difference and mode shape difference between two modes. Therefore, modal distance turns out to be Gaussian normal distribution. Two standard derivations in Eq. (10) guarantee the distance between two modes should be captured within a 95% confidence interval with the assumption of Gaussian distribution.
It is worth mentioning that conventional threshold calculation for clustering does not consider the uncertainty of modal parameters and weighting factor. A newly proposed threshold calculation in Eq. (9) incorporates both uncertainty and weighted factor, which is considered a contribution of this study to the advancement of automated OMA.

Mode clustering
Mode clustering starts with a calculated threshold in Section 3.2.1. to group individual physical modes with similar modal characteristics. This study adopts self-adaptive clustering (Cabboi et al., 2017) to accomplish automated process. But different from original Cabboi et al.'s work, a weighted distance with an uncertainty of modal parameters is proposed. The i th weighed distance at model order, n, is defined as: n, i ¼ / n, i þ 2r / n, i :f z and/ z are mean frequency and mean mode shape at the zth cluster, respectively; rf z and r/ z are corresponding mean standard derivation at the zth cluster. f n, i and / n, i are the ith frequency, and mode shape at model order, n, respectively; r f n, i and r / n, i are corresponding standard derivation, respectively.MAC represents the modal assurance criteria (Pastor, Binda, & Har carik, 2012). c is a weighting factor to reduce the effect of inaccurate mode shape on distance calculation c ¼ r/ z þr / n, i 2 : Equations (9) and (11) are similar; both consider the uncertainty of parameter estimates and the importance of mode shape difference. Figure 3 shows the flowchart of an improved clustering strategy. A more detailed introduction of self-adaptive clustering is referred to Cabboi et al. (2017).

Robust outlier detection
The number of physical poles in the stabilization diagram has trends with the increase of model order, exhibiting variability of modal estimates (Neu et al., 2017). The phenomenon more frequently appears in the damping ratio because the damping ratio has a high scattered nature. Outlier detection is applied to penalize undesirable modes in the final clusters for reducing identification variance from different measurements. In this study, robust outlier detection based on the minimum covariance determinant (MCD) is employed to identify outlying values from physical clusters as described in Hubert, Debruyne, and Rousseeuw (2018). A robust distance (RD) is defined as: where observation sample, x, is either frequency or damping ratio in a physical cluster in our case.l MCD is the MCD estimates of location;R MCD is the MCD covariance estimate. Explicit derivation and introduction can be found in Hubert et al. (2018). A robust MCD estimator based on Eq. (12) is very powerful to flag outliers, as RD in Eq. (12). It is not sensitive to diagnostic tools' masking effect compared to statistical distance and Mahalanobis distance (Cerioli, 2010). Also, MCD has a high resistance to outliers and are more robust and efficient (Hubert et al., 2018). Furthermore, MCD has the advantage of requiring no user-defined threshold, like a box-plot method that needs to define limit bounds (Sarlo & Tarazaga, 2019). Robust outlier detection in this work can be done by the function 'robustcov' in MATLAB.
After outlier removal, the average frequency, damping ratio, and mode shape in each physical cluster are taken as a representative. For evaluating the quality of each identified cluster, uncertainty on the zth physical clusters is quantified by Euclidean norm of uncertainty on modal parameters: where r z is the standard derivation of the zth clusters;r f , z , r /, z , andr n, z are the average values of standard derivations of all frequencies, damping ratios, and mode shapes in the zth clusters.

Application
In this section, the performance of the proposed approach is validated by two field tests on the bridge, namely, the Dowling Hall Footbridge located at Turfs University in the USA and the Z24 bridge benchmark located in Switzerland.
The data are open sources, and many researchers used these data to test the algorithms in the research community.

Application 1: Steel frame footbridge
Dowling Hall Footbridge is located at Tufts University, as shown in Figure 4(a). The bridge is a two-span steel frame bridge, 144 ft (44 m) long and 12 ft (2.7 m) wide with a reinforced concrete deck. Continuous health monitoring was designed and performed on Dowling Hall Footbridge from January 2010 to May 2010. The layout of eight accelerometers is shown in Figure 4(b). More details of Dowling Hall Footbridge's information can be found in Moser and Moaveni (2011). In this study, the first six modal characteristics are used as baseline results obtained from the literature (Moser & Moaveni, 2011) to evaluate the performance of the proposed approach.
The acceleration data used in this study are obtained from vertical measurement under ambient excitation collected in the first week at 1:00 p.m. on January 7, 2010. The frequency range of interest is 0-30 Hz. The sampling frequency is 128 Hz. Preparation parameters for SSI-cov/ref in this application are: i ¼ 60, model order n ¼ 40-150, reference sensor ¼ ð1, 2, 3, 4, 5, 6, 7, 8Þ:

Identification results
The proposed approach described in Section 3 is utilized to analyze measured data. Figure 5(a-c) displays the modal identification results at the pre-processing stage. The singular value spectrum (appeared in the curves in Figure 5) is plotted below the stabilization diagram. The standard derivation (6r) uncertainty bounds of the frequency are shown as horizontal bars. Figure 5(a) displays all possible physical modes remaining in the stabilization diagram after applying conventional validation criteria, e.g., damping ratio check and modal complexity check. Figure 5(b) depicts the stabilization diagram filtered by a supplementary uncertainty criterion. It is observed only using conventional validation criteria, the stabilization diagram still looks busy, including lots of scattered poles, which are spurious modes. However, uncertainty criterion can eliminate as many spurious modes as possible compared to conventional validation criteria, which will speed up later automated processes. The pre-processing stage's identification results demonstrate that the uncertainty criterion is more effective than conventional validation criteria.
The clustering stage then starts with a calculated clustering threshold using Eqs. (9) and (10) based on the remaining modes in Figure 5(b). The proposed method's calculated threshold in this example is 0.022, while without the weighting factor, x, it is 0.0488. It implies the stricter threshold by Eqs. (9) and (10) that allows removing more spurious modes with keeping physical modes. Furthermore, the  updated threshold by improved clustering with Eq. (11) is 0.0086. Still, the original work (distance calculation without weighting factor) gives the updated threshold as 0.0304, indicating that the weighted distance tends to give a smaller value of the updated threshold.
The identified modes are more consistent and stable. It may be attributed to the use of x can improve the accuracy of measured mode shapes by distance calculation. Figure  5(c) shows the modal identification results after performing the improved self-adaptive clustering. It is observed that  clustering procedures remove spurious modes, the stabilization diagram is clarified with only remaining stable modes (vertical alignments). The first six modes in the reported work (Moaveni & Behmanesh, 2012) are used as a baseline for comparison, marked as M 1 to M 6 (a total of six clusters) in Figure 6. It is noted that M 5 and M 6 are closely spaced modes, which are a common challenge in the OMA. The robust outlier detection is to remove outlying frequencies and damping ratios. As seen in Figure 6 (b), damping ratios are tighter and more consistent. Table 1 presents identified frequencies and damping ratios along with the baseline data. Identified frequencies in this work agree well with those in the literature; the maximum relative difference (2.05%) is observed in the third mode. While larger variation is found in terms of damping ratio. It is because two tests were performed at a different time. Moaveni and Behmanesh (2012) reported baseline data, measured on April 4, 2009. In this study, measured data were collected on January 7, 2010. When considered the effect of environmental variables such as temperature, it is not surprising to have these differences. The frequency is less sensitive to environmental effects than the damping ratio.
The uncertainty on modal parameters and physical clusters are also investigated in this example. The frequency and damping ratio in each mode are plotted as an open circle overlapping the standard derivation (r) error bar in Figure  7. And the uncertainty of modal frequencies is much smaller than those of damping ratios. It is often more difficult to accurately measure the damping ratio in practice. The uncertainty of identified physical clusters is also quantified by Eq. (13) and shown in Table 2. And the uncertainty of the sixth cluster is much larger than those of others, suggesting it is more challenging to identify the sixth cluster because this cluster contains weakly-excited and closely spaced modes.
Overall, the proposed approach successfully identifies six modes under ambient vibration, as shown in Figure 8. The first six global mode shapes with corresponding uncertainties are presented; 62r uncertainty bounds are plotted as blue dashed lines. Identified mode shapes have good agreement with those identified in the reported work (Moaveni & Behmanesh, 2012). Modes 3 and 4 are bending-torsional modes with evident rotational motion, while only vertical deformation is found on other modes. In addition, uncertainty bounds for all modes are narrow, which concludes that the identification of mode shapes is accurate.
For continuous SHM, it is crucial to track the change of modal parameters over time. In this example, the proposed approach is applied to modal tracking with measured data collected at every 1:00 p.m. from January 5 to February 28, in 2010 (total 55 datasets). The same procedures as the former data analysis are applied for modal tracking. As shown in Figure 9, solid black lines indicate frequency estimates, and grey areas cover 62 standard derivations. All six modes are identified and tracked for all datasets by the proposed approach. It is not surprising that frequencies varied over time, mainly because of environmental change and ambient excitation's randomness. The frequency at mode 6 has a relatively larger variation for two months, as this mode is not excited well and unstable to environmental change. The results illustrate the proposed approach can analyze massive datasets with minimum human intervention.

Sensitivity analysis
Two preparation parameters in SSI, e.g., maximum model order, n max , and time lag, i, significantly affect identification results as described in Section 2. The influence of n max and i is investigated to demonstrate the proposed approach is     robust to their choice. n max and i are varied from 70 to 160 and 30 to 120 in intervals of 10, respectively. As shown in Figure 10, identified frequencies are almost invariant to a different choice of n max and i, suggesting the proposed approach is robust and not sensitive to these two preparation parameters. It is very difficult to identify the best n max and i in practice (Neu et al., 2017;Ubertini et al., 2013). Thus, insensitivity to them allows to more conveniently perform automated OMA and continuous health monitoring.
On the other hand, a different choice of uncertainty threshold (COV of frequency) is utilized to evaluate its effect on identification results. The uncertainty threshold is varied from 1% to 5% in the interval of 1%. As shown in Figure 11, the proposed approach yields the same frequencies regardless of COV thresholds, indicating a COV threshold can be safely chosen in the range of 1% COV 5%:

Application 2: Concrete box girder bridge
The proposed approach is also applied to the Z24 bridge benchmark to validate its performance. The Z24 bridge was built in 1963 and located in Switzerland, serving to connect Koppigen with Utzenstorf and crossing over the A1 highway (see Figure 12(a)). It is a post-tensioned concrete box-girder bridge with a main span of 100 ft (30 m) and two sides span of 46 ft (14 m). Detailed introduction of the Z24 bridge can be found in Maeck and De Roeck (2003). The Z24 bridge was demolished at the end of 1998. Before the complete demolition, a short-term progressive damage test was implemented on the bridge to investigate the effect of simulated damage on the safety of the bridge.
A total of 17 different damage scenarios were designed under full forced and ambient excitation . In this work, acceleration response data from the scenario of No.8 for the new reference condition under ambient excitation is used to assess the proposed approach. A total of 291 DOFs were measured (see Figure 12(b)). Due to the limited number of sensors, only at most 33 DOFs were measured for each set-up. Therefore, nine measurement set-ups were recorded with most 33 sensors to have full location coverage of the whole bridge, containing five reference sensors that are common to each set-up and 28 moving sensors whose location changes with different setups. For the No. 5 set-up, only 22 moving sensors were used. Samples of 65,536 data were recorded at each set-up at a 100 Hz sampling rate.

Identification results
Nine stabilization diagrams corresponding to each set-up are created; results of the fifth set-up are only presented in Figure 13 due to space limitation in this article. The singular  value spectrum (appeared in curves in Figure 13) is also plotted below the stabilization diagram. 6r (standard derivation) uncertainty bounds of frequency are plotted as horizontal bars. Figure 13(a) displays modal identification results using conventional validation criteria, many scattered poles which are definitely spurious modes, still retain in the stabilization diagram. However, the uncertainty criterion can remove most spurious modes, demonstrating that the uncertainty criterion is more effective than conventional validation criteria (see Figure 13(b)).   Based on the remaining poles in the stabilization diagram after the pre-processing stage, the clustering threshold for each measurement set-up is calculated using Eqs. (9) and (10). As shown in Figure 14(a), all threshold values are significantly reduced compared to those calculated without weighting factor, as weighting factor can offset the effect of mode shape difference. Manual clustering thresholds in commercial OMA software are usually below 0.06 (Neu et al., 2017). The threshold derived from the newly proposed method is closer to the one from manual analysis, indicating proposed method's rationality and feasibility in practice. Mode clustering described in Section 3.2.2 is then implemented to group physical modes. The number of scattered poles is greatly removed by the proposed approach, only remaining obvious vertical alignments in the stabilization diagram in Figure 14(b). In addition, an improved self-adaptive clustering that considers the weighting factor of c in Eq. (11) tends to give a smaller updated threshold, implying identified modal parameters are more stable and consistent with each other (see Figure 15). The use of the weighting factor can improve the performance of clustering. Because only the first six modes are present in baseline work (Reynders et al., 2012), the first six clusters are presented in Figure 14(b), marked as P 1 -P 6 : Robust outlier detection is used to identify outlying modes (See Figure 16). Finally, the average of modal parameters in each cluster is selected as representative. Table 3 shows the sample mean and sample standard derivation of frequency and damping ratio over nine measurement set-ups obtained from the proposed approach. The standard deviation in Table 3 represents the setup-to-setup sample statistics among all set-ups. The calculation of sample standard derivation in Table 3 only considers the environmental change among different set-ups rather than uncertainty sources summarized in Section 3.1.2. It is seen from Table 3 that the damping ratio has more significant variability than frequency, implying it is more challenging to identify damping ratio in practice, as the damping ratio is sensitive to environmental change. Overall, the proposed approach's identified frequencies and damping ratios are almost identical to those from baseline work, demonstrating low demand for human intervention.
Uncertainties of modal parameters airing from assumptions made in SSI, such as linear, stationary structural behavior, white noise, etc., are also studied. Figure 17 depicts the variability of frequency and damping ratio from modes 1 to 6 across nine measurement set-ups, respectively, with open circles representing the parameter estimates and error bars covering 62r standard derivations. Both frequencies and damping ratios change over time, while the damping ratios have larger uncertainties. The negative damping ratio is immaterial in Figure 17(b), such as mode 5 at No. 4 set-up and mode 6 at No. 1 set-up, merely because of the Gaussian distribution approximation and the larger standard derivation. Mode 6 has relatively larger uncertainty since the mode is not excited well. Table 4 presents the average of standard derivation for each cluster over nine measurement set-ups using Eq. (13). As expected, the sixth cluster has the highest uncertainty, implying it is relatively harder to identify this cluster, which is also reflected in Figure 14(b) that the sixth vertical alignments from the left form at a very ambiguous peak. Generally speaking, quantities of identified frequency and damping ratio are consistent from one to another set-up numbers, suggesting robust and fair performance on modal analysis.
The global mode shapes are directly assembled from a local one in a single dataset by multiplying by a scaling factor so that their common DOFs, at the location of reference sensor, agree well with each other through data fitting, namely, the method of the least squares. For the sake of article spaces, detailed procedures for calculating scaling factors are referred to work (Au, 2011a).
As seen in Figure 18, the entire six modes are successfully identified from vibration response in all the nine measurement set-ups, which are in good accordance with those in Reynders et al. (2012). Mode 1 is a typical bending mode with a symmetric shape that has the maximum deflection at midspan. Mode 2 is the first torsional mode with a slight rotational dynamic behavior along y axis (transverse direction). Similar to mode 2, but more significant rotation is observed on modes 3 and 4; they are another two torsional modes. Modes 5 and 6 are vertical modes with asymmetric shapes.
Furthermore, five mode shapes at the only vertical direction (corresponding z axis in Figure 18) and one mode shape at the only transverse direction (corresponding y axis in Figure 18) are also presented in Figures 19 and 20, 62r uncertainty bounds are plotted as blue dashed lines. Figure  19 presents only mode 6 has relatively wider uncertainty intervals since it is weakly-excited, while others have narrow bounds, implying mode shapes are identified with an accurate level.
To further investigate the performance of the proposed approach for continuous health monitoring. As seen in Table 5, the proposed approach is applied to eight different damage scenarios during the short-term progressive damage test. A total of 72 datasets consists of nine individual measurement set-ups for each damage scenario. The tracked frequencies, damping ratios, and associated uncertainty are plotted in Figures 21 and 23, with sample mean (solid black lines) and two averages of standard derivation (grey shaded areas) among all measurement set-ups. It is observed that the maximum frequency happened at scenario No. 1 corresponding to undamaged condition; the minimum frequency happened at modes 1, 3, 4, and 5 in scenario No. 6 and modes 2 and 6 in scenario No. 7 for corresponding to 95 mm settlement of pier and tilt foundation.
As seen in Figure 22, damage scenarios in Table 5 have a significant effect on frequency, especially when the pier is settled and the foundation is tilted. For example, frequency reduction at modes 1, 3, 4, and 5 reaches the maximum magnitude when the pier has the maximum settlement, 95 mm, ranging from 5.93% to 8.08%. On the other hand, modes 2 and 6 have the maximum frequency reduction of 9.06% and 3.66%, respectively, due to the foundation's tilt, respectively. In contrast, 95-mm pier settlement still impairs on frequency, suggesting pier and foundation may be paid more attention during SHM.      Figure 19. Mode shapes at X-Z plane with 62r uncertainty bounds. Figure 23 illustrates the variability of damping ratio is smaller than of frequency, implying damping ratio is not sensitive to global damage scenarios in Table 5, but the damping ratio has much larger uncertainty. The results demonstrate potential benefits to handle a large amount of data with an acceptable level of performance while reducing human involvement. Therefore, the proposed approach is suitable for continuous health monitoring and modal tracking.

Sensitivity analysis
To examine the performance of the proposed approach in case of a different combination of preparation parameters in SSI-cov/ref, e.g., the maximum mode order, n max , and time lag, i, the sensitivity analysis is conducted for No. 5 measurement set-up in this example. n max and i range from 70 to 160 and from 30 to 120, respectively. As seen in Figure 24, the proposed approach has consistent behavior for identifying six modes using different SSI-cov/ref preparation parameters.   Similar to Application 1, Figure 25 shows that any threshold between 1%and5% yields the same outcomes. The sensitivity analysis demonstrates that the proposed approach is insensitive to two crucial parameters in SSI-cov/ref: model order and time lag. Generally, model order is over-estimated to identify weakly excited modes, yielding more spurious modes; a small value of time lag may fail to generate enough stable poles in the stabilization diagram. It is very difficult to determine the best model order and time lag in real test. The proposed approach provides more flexibility for the selection of the two parameters, significantly facilitating automated modal identification in practice.

Discussion and recommendations
The proposed approach accurately identifies the modes of interest and concurrently eliminates spurious modes. The  weakly excited and closely spaced modes are identified on two bridges under ambient vibration. The procedures only require a few initial parameters setting, e.g., model order range, time lag, the threshold of MPC/MPD, and uncertainty criterion. In short, the proposed approach is insensitive to these parameters, especially, two crucial parameters: model order and time lag.
Finally, recommendations are summarized in Table 6. Both n max and i are over-defined, yielding spurious modes, but the proposed approach will remove them. MPC and MPD can be regarded as standard values, with no need for adjustment. Uncertainty threshold, COV, can also be safely chosen in the range 1%-5%. In addition, two-month period data (55 datasets: Application 1) and short-term progressive damage test (72 datasets: Application 2) demonstrates the feasibility of automated OMA and modal tracking of massive data for continuous health monitoring.

Conclusions
A two-stage framework for automated operational modal identification based on SSI-cov/ref is proposed in this study. First, a stabilization diagram is created by SSI-cov/ref. Twostage framework, e.g., pre-processing stage and clustering stage, is then implemented to interpret the stabilization diagram with low demand of user intervention. Two field tests on the bridge are employed to validate the capability of the proposed approach. The main contributions are summarized as follows: This uncertainty criterion is efficient in eliminating many undesired modes at the processing stage, which speeds up the later automation process. A novel distance calculation with the uncertainty of modal parameters and weighting factor yields a reasonable threshold for clustering. An improved self-adaptive clustering is proposed based on weighted distance calculation with the uncertainty of modal parameters. The uncertainty on modal parameters and identified physical clusters are also quantified and providing additional information about quality of identified results. A robust outlier detection requiring no setting of threshold improves modal parameters' accuracy.
The proposed framework has a minimal user's involvement to achieve sufficient accuracy. Therefore, the proposed work can be suitable for long-term health monitoring, e.g., modal tracking. Some limitations should be further investigated. For example, both steel frame pedestrian bridge and highway concrete bridge have a relatively wider frequency range (e.g., 0-15 Hz), including few weakly-excited modes and closed spaced modes. In contrast, long span or suspension bridges exhibit low frequency range (e.g., 0-1 Hz) and multiple extremely closed-spaced modes Zhang, Ni, & Ni, 2016). Therefore, the further verification as the future study is needed. Another is the optimized sensor location should be determined to improve computational efficiency.

Disclosure statement
No potential conflict of interest was reported by the authors.   (2) 0.3 0.7 1%-5% Note: n max is the maximum model order; i is time lag; MPC is modal phase collinearity; MPD is mean phase deviation; COV is coefficient of variation of frequency.