Machine Learning-Based Sensitivity of Steel Frames with Highly Imbalanced and High-Dimensional Data

A machine learning-based feature selection approach is presented to estimate the effect of uncertainties and identify failure modes of structures that combine a low failure probability with high-dimensional uncertainties. As structures are designed to have few failures, a dataset classified by failure status becomes imbalanced, which poses a challenge for the predictive modeling of machine learning classifiers. Moreover, to improve the accuracy and efficiency of the model, it is necessary to distinguish the critical factors from the redundant ones, especially for a large feature set. This study benchmarks the feature importance method for sensitivity analysis using datasets that exacerbate the problems of class imbalance and large input feature sets. Two planar steel frames with spatially uncorrelated properties between structural members are investigated. Geometric and material properties are considered as uncertainties, such as material yield stress, Young's modulus, frame sway, and residual stress. Six feature importance techniques, including ANOVA, mRMR, Spearman's rank, impurity-based, permutation, and SHAP, are employed to measure feature importance and identify the parameters germane to the prediction of structural failures. Logistic regression and decision tree models are trained on the important feature set, and their predictive performance is evaluated. The use of the feature importance approach for structures with a low probability of failure and a large number of uncertain parameters is validated by showing results identical to a reliability-based sensitivity study together with appropriate predictive accuracy.


Introduction
A reliability-based sensitivity analysis estimates how uncertainty in the input parameters affects system performance by analyzing the dependence of the failure probability on the inputs. This requires repeated evaluation of the performance function, resulting in significant computational cost and time. The challenge is exacerbated for large-scale engineering problems, which often carry a large number of uncertain parameters. Researchers have made efforts to improve the computational efficiency of sensitivity analysis. For example, Wu [ ] proposed an adaptive importance sampling approach, which improves computational efficiency by minimizing oversampling in the safe region of the limit-state surface. The score function approach proposed by Rubinstein and Kroese [ ] estimates all sensitivities from the gradient and derivatives of the parameters and does not require additional simulations for reliability sensitivity analysis. Torii et al. [ ] applied polynomial expansions to the performance function and its derivatives for sensitivity analysis of the probability of failure. Proppe [ ] introduced a local reliability sensitivity analysis using the moving particles method, which estimates the failure probability based on the new locations of the moved data points in the design space.
In recent years, interest in artificial intelligence has been growing in the field of structural engineering because it provides efficient solutions relative to traditional computational techniques. The use of artificial intelligence in steel structural design has focused on artificial neural networks for the design of steel members or connections, such as compression members [ ], steel panels [ ], steel connections [ , ], and cold-formed steel channels [ ]. For reinforced concrete (RC) members or systems, machine learning techniques have been implemented to predict structural responses such as the shear capacity of fiber RC beams [ ] and the structural response of RC deep beams [ ] and RC slabs [ ]. Data-driven machine learning approaches have been used for fragility, risk, and vulnerability assessment of a special steel moment-resisting frame building [ ] and RC building frames [ ]. Machine learning approaches have also been used to identify failure modes and rank the significant factors affecting the failure mode of RC members [ ], RC frames [ ], and steel frames [ ].
Regarding machine learning-based assessment of structural members and systems, RC structures have mainly been investigated rather than steel structures. In particular, studies on steel structures have used neural networks, a specific machine learning technique, and have focused more on structural members than on systems. Koh and Blum [ ] introduced a machine learning-based feature selection framework for structural sensitivity analysis. The framework measures the feature importance of all parameters and ranks them to determine the important or redundant parameters for the prediction of system failure. Two planar steel frames were investigated considering uncertainties that affect steel frame behavior, such as yield stress, Young's modulus, frame sway, and residual stress. The frames have different failure modes, and the ultimate frame strength obtained from finite element analysis was used as the response variable. The feature rankings derived by four feature importance techniques showed the same order as the factors resulting in the largest failure probability obtained from conventional sensitivity studies, demonstrating that the proposed feature importance method can be used for sensitivity analysis. The approach is efficient because all feature importances are estimated from a single training. Moreover, the variable space can be reduced by removing irrelevant parameters, improving both computational efficiency and accuracy.
This study benchmarks the feature importance approach on datasets that incorporate a larger set of uncertainties, more data points, and a higher imbalance ratio. The importance of benchmark studies has been emphasized in the machine learning community [ , ]. Machines carry out tasks based on learning from a given dataset; therefore, the best algorithm will not be the same for all datasets [ ]. Benchmarking of the feature importance approach must therefore be accomplished to draw conclusions about its use for a wide range of structural systems.
The steel frames investigated in this study have the same layout as in Koh and Blum [ ] but a different spatial correlation scenario. Unlike the correlated scenario, which considers all columns (or beams) to have the same properties, this study applies an uncorrelated scenario in which all structural members have different properties. Sensitivity studies on structural systems with uncorrelated properties reveal which specific structural member most influences frame failure. However, the increased number of uncertainties poses additional challenges for reliability sensitivity studies, which require repeated evaluations of the performance function. For the feature importance approach, a large set of uncertainties makes it difficult to derive consistent rankings across the various techniques. In addition, the uncorrelated scenario contains fewer failures than the correlated case, so the class imbalance ratio increases. Machine learning classifiers are sensitive to imbalanced data because the prediction is biased towards the majority class [ , ]. As structural failures should always be rare in structural design, it is critical to overcome the class imbalance problem. In summary, this study examines how the feature importance approach works for structural sensitivity analysis when fitting high-dimensional and extremely class-imbalanced data, which presents challenges in model training for structural engineering problems.
This study implemented six existing feature importance methods to measure the importance score. Feature importance methods typically fall into two categories: (1) data analysis techniques, which directly analyze the data without model fitting to measure the feature importance, and (2) model analysis techniques, which identify important features based on the predictions of a trained model. The results are compared with reliability-based sensitivity analysis results to validate the feature importance framework. Finally, the best feature importance technique depending on the failure mode is discussed.

Machine learning techniques for the feature importance approach
The primary task of this study is to estimate the effect of uncertainties on failure modes of steel frames using the machine learning-based feature importance approach. In a high-dimensional dataset, it is important to determine the relevant features and remove the redundant ones to prevent overfitting and reduce training time, thereby improving model performance and computational efficiency. This study measures feature importance and compares the critical features across the feature selection methods. Once the feature rankings are derived from the feature importance techniques, a classification model is employed to predict the failure status of a structure while increasing the size of the feature set used for training. The model accuracy of each feature importance method is measured to determine how many features are informative or irrelevant for identifying structural failures. Both the feature ranking and the model performance are considered when evaluating the feature importance techniques for structural sensitivity analysis.
Feature importance techniques

ANOVA

The ANOVA test compares the relationship between features and the response variable based on the value of the F-statistic. The feature importance score J_ANOVA is equal to the F-statistic, which can be calculated as

J_{ANOVA}(x_i) = F = \frac{\sum_{m=1}^{M} N_m \left( \bar{x}_m^{(i)} - \bar{x}^{(i)} \right)^2 / (M-1)}{\sum_{m=1}^{M} (N_m - 1)\, s_m^2 / (N-M)}

where N_m = the number of instances with y = m, \bar{x}_m^{(i)} = the sample mean of feature x_i for class m, s_m^2 = the sample variance of feature x_i for class m, \bar{x}^{(i)} = the grand mean of feature x_i, M = the number of classes (M = 2 in a binary dataset), and N = the total number of instances. The importance score is the ratio of between-group variance to within-group variance, so this technique assesses the difference in the mean value of feature x_i between the classes. A higher F-statistic indicates a larger difference between the class means, meaning the feature has a significant effect on the classes. Note that the ANOVA score is always positive because it is based on variances, which are always positive.
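As a minimal cross-check (on an invented two-feature toy dataset, not the frame data), the F-statistic can be computed directly from the class means and variances and compared against scikit-learn's `f_classif`:

```python
import numpy as np
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
# Toy binary dataset: feature 0 shifts with the class, feature 1 is pure noise.
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 2))
X[y == 1, 0] += 2.0  # class-dependent mean shift

def anova_score(x, y):
    """One-way ANOVA F-statistic: between-group over within-group variance."""
    classes = np.unique(y)
    N, M = len(y), len(classes)
    grand = x.mean()
    between = sum(np.sum(y == m) * (x[y == m].mean() - grand) ** 2
                  for m in classes) / (M - 1)
    within = sum((np.sum(y == m) - 1) * x[y == m].var(ddof=1)
                 for m in classes) / (N - M)
    return between / within

manual = [anova_score(X[:, i], y) for i in range(2)]
F, _ = f_classif(X, y)            # scikit-learn's per-feature ANOVA F-test
assert np.allclose(manual, F)     # hand computation matches the library
assert manual[0] > manual[1]      # the shifted feature ranks higher
```

The manual scores reproduce `f_classif` exactly, since both evaluate the same one-way ANOVA decomposition feature by feature.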

mRMR
The mRMR technique [ ] ranks features by mutual information, considering both the relevance and the redundancy of features. Feature relevance indicates correlation with the response variable, and feature redundancy represents information duplicated between features. As the dataset is discrete rather than continuous, the mutual information difference (MID) is used as the mRMR criterion, estimated as

J_{mRMR}(x_i) = I(x_i; y) - \frac{1}{|S|} \sum_{x_j \in S} I(x_i; x_j)

where S = the selected feature set, |S| = the feature set size (number of features), x_i = a candidate feature not yet selected into S, x_j = a feature already in the set S, and I = the mutual information. The first term represents the relevance of feature x_i to the response variable y, determined from the outcome variable prediction. The second term estimates the redundancy, measured between the candidate feature x_i and the selected features x_j. Intuitively, a feature with a negative J_mRMR value has small relevance and large redundancy, so including it in model training would decrease the predictive performance of the model.
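A greedy selection loop with the MID criterion can be sketched as follows. This is an illustrative implementation on invented synthetic data, using scikit-learn's nearest-neighbor mutual information estimators rather than the discrete estimator a dedicated mRMR package might use:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 300)
x0 = y + rng.normal(scale=0.3, size=300)     # relevant feature
x1 = x0 + rng.normal(scale=0.05, size=300)   # near-duplicate of x0 (redundant)
x2 = rng.normal(size=300)                    # pure noise
X = np.column_stack([x0, x1, x2])

def mrmr_mid(X, y, k):
    """Greedy mRMR selection with the mutual-information-difference criterion."""
    relevance = mutual_info_classif(X, y, random_state=0)      # I(x_i; y)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Mean redundancy I(x_j; x_s) over the already-selected features.
            redundancy = np.mean([
                mutual_info_regression(X[:, [j]], X[:, s], random_state=0)[0]
                for s in selected
            ])
            score = relevance[j] - redundancy                  # MID criterion
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

order = mrmr_mid(X, y, k=3)
# The redundant twin scores poorly: the noise feature is picked before it.
```

Here one of the two relevant twins is selected first, after which the noise feature outranks the remaining twin because the twin's large redundancy overwhelms its relevance.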

Spearman's rank
The Spearman's rank correlation coefficient [ ] measures the strength of a monotonic, possibly nonlinear, relationship between two variables, here a feature x_i and the response variable y. The coefficient varies between -1 (a perfect negative correlation) and +1 (a perfect positive correlation). The feature with the largest absolute value is considered the most important.
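For example, `scipy.stats.spearmanr` can rank features by the absolute value of the coefficient. The toy data below (an exponential, hence nonlinear but monotonic, transform of the class label) is invented for illustration:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
y = rng.integers(0, 2, 500)
X = np.column_stack([
    np.exp(y + rng.normal(scale=0.5, size=500)),  # monotonic, nonlinear in y
    rng.normal(size=500),                          # unrelated noise
])

# Rank features by |rho|; the sign only indicates the correlation direction.
rho = [spearmanr(X[:, i], y)[0] for i in range(X.shape[1])]
ranking = np.argsort(-np.abs(rho))
assert ranking[0] == 0   # the monotonic feature is ranked first
```

Because Spearman's coefficient operates on ranks, the exponential transform does not weaken the score the way it would for Pearson correlation.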

Impurity-based importance
Impurity-based importance estimates feature importance from the node impurity in a tree. A node containing instances of only one class is pure, while a node containing two or more classes is impure. The impurity-based feature importance can be computed by Eq. [ ], which measures the impurity at a node j before and after splitting and then averages the impurity decrease over N^{(i)}, the number of nodes in the tree that split on x_i:

J_{impurity}(x_i) = \frac{1}{N^{(i)}} \sum_{j \,\text{split on}\, x_i} \Delta I_j

where \Delta I_j is the impurity decrease at node j. A negative feature importance value (J_impurity) indicates that including the feature in model training would decrease the predictive performance of the model.
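In scikit-learn, the impurity-based (mean-decrease-in-impurity) scores are exposed as `feature_importances_` after fitting a tree; note that scikit-learn normalizes them to sum to one. A sketch on invented data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
y = rng.integers(0, 2, 400)
X = np.column_stack([y + rng.normal(scale=0.4, size=400),   # informative
                     rng.normal(size=400)])                  # noise

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
# Mean decrease in impurity per feature, normalized to sum to 1.
importance = tree.feature_importances_
assert importance[0] > importance[1]      # informative feature dominates
assert np.isclose(importance.sum(), 1.0)  # scikit-learn normalization
```

Because the scores come directly from the fitted splits, they cost nothing beyond the single training run, which is part of the efficiency argument made for the feature importance approach.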

Permutation importance
Permutation importance measures importance by breaking the association between a single feature column and the response variable. First, a tree-based algorithm is fitted to obtain the baseline model performance. After training, a single feature column is randomly shuffled to remove the association between that feature and the response variable. The performance of the model on the permuted data is then evaluated and compared with the baseline performance, and the difference in accuracy is taken as the importance score, as shown in Eq. . The feature that results in the largest mean decrease in accuracy (MDA) is the most important. The permutation method yields a negative score when a feature has no effect and the model happens to be more accurate on the shuffled data.
J_{permutation}(x_i) = accuracy on the unpermuted dataset − accuracy on the dataset with x_i permuted
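A hand-rolled version of this MDA score, on invented data, might look like the following; scikit-learn's `sklearn.inspection.permutation_importance` provides a repeated-shuffle variant of the same idea:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
y = rng.integers(0, 2, 600)
X = np.column_stack([y + rng.normal(scale=0.4, size=600),   # informative
                     rng.normal(size=600)])                  # noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
baseline = model.score(X_te, y_te)

def mda(model, X, y, i, baseline, rng):
    """Mean decrease in accuracy after shuffling column i on held-out data."""
    Xp = X.copy()
    rng.shuffle(Xp[:, i])                 # break the feature-target link
    return baseline - model.score(Xp, y)

scores = [mda(model, X_te, y_te, i, baseline, rng) for i in range(2)]
assert scores[0] > scores[1]   # shuffling the informative feature hurts most
```

Shuffling the informative column collapses the test accuracy toward chance, while shuffling the noise column changes it only marginally (and can even make it slightly negative).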

SHAP
The SHAP algorithm [ ] identifies how much each feature contributes to the response variable based on the predictions of models trained on all feature subsets. The difference between the predictions of the model f_{S∪{i}}, trained on a feature subset S including the feature x_i, and the model f_S, trained without x_i, is interpreted as the effect of x_i. The SHAP importance score is a weighted average over all possible such differences:

\phi_i = \sum_{S \subseteq F \setminus \{x_i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left( f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right)

where F = the set of all features and S = a feature subset excluding x_i. Since all possible subsets are used to measure the SHAP score, the computation is expensive because a model is repeatedly trained on every feature subset. SHAP measures the influence of features on the prediction of either the positive (minority class) or the negative (majority class) outcome.
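The weighted average above can be evaluated exactly by brute force when the feature set is tiny. The sketch below (invented data, with a linear model retrained on every subset) also checks the efficiency property that the contributions sum to the full-model prediction minus the base value:

```python
import numpy as np
from itertools import combinations
from math import factorial
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)  # x2 unused
x_star = X[np.argmax(np.abs(X[:, 0]))]       # instance to explain

def v(S):
    """Prediction at x_star from a model retrained on the feature subset S."""
    if not S:
        return y.mean()
    cols = list(S)
    return LinearRegression().fit(X[:, cols], y).predict(
        x_star[cols].reshape(1, -1))[0]

def shapley(i, n_features=3):
    others = [f for f in range(n_features) if f != i]
    total = 0.0
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            w = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                 / factorial(n_features))
            total += w * (v(S + (i,)) - v(S))   # weighted marginal contribution
    return total

phi = [shapley(i) for i in range(3)]
# Efficiency: contributions sum to full-model prediction minus the base value.
assert np.isclose(sum(phi), v((0, 1, 2)) - y.mean(), atol=1e-6)
```

In practice the `shap` library avoids this exponential retraining with model-specific approximations (e.g., a tree explainer for tree ensembles), but the brute-force form above is exactly the weighted average in the equation.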

Classification-based techniques

Logistic regression
Logistic regression [ ] is trained using the top-k features obtained from the feature importance methods, where k = the number of important features. Logistic regression is used for classification problems; it applies the logistic sigmoid function to transform the output into a probability value between 0 and 1:

P(Y = 1) = \frac{1}{1 + \exp\left(-\left(w_0 + \sum_{i=1}^{n} w_i x_i\right)\right)}

where P(Y = m) is the probability of the presence of class m (Class 0 represents no failure and Class 1 represents failure in this study, with P(Y = 0) = 1 - P(Y = 1)), w_i = regression coefficients, x_i = input features, and n = the number of input features.
A linear function is embedded in the logistic regression model, given as the natural logarithm of the ratio of P(Y = 1) to P(Y = 0) (the log-odds):

\ln\frac{P(Y=1)}{P(Y=0)} = w_0 + \sum_{i=1}^{n} w_i x_i

Logistic regression estimates the regression coefficients by maximizing the likelihood of the observed classes (equivalently, minimizing the logistic loss).
Feature scaling was performed on the datasets used for training the logistic regression classifier, which is sensitive to the scale of the data. A decision tree is scale-invariant because it trains the model based on decision rules. Therefore, the datasets for the logistic regression algorithm were transformed to a standardized scale with a mean value of 0 and a standard deviation of 1.
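A minimal sketch of this pipeline, on invented data with one deliberately large-scale feature, standardizes before fitting the scale-sensitive logistic model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(6)
y = rng.integers(0, 2, 500)
# Feature 0 lives on a scale of thousands; feature 1 is unit-scale noise.
X = np.column_stack([1000.0 * y + rng.normal(scale=400.0, size=500),
                     rng.normal(size=500)])

# Standardize (mean 0, std 1) before the scale-sensitive logistic model.
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
proba = clf.predict_proba(X)[:, 1]            # sigmoid output in (0, 1)
assert np.all((proba > 0) & (proba < 1))
assert clf.score(X, y) > 0.8
```

Wrapping the scaler and classifier in one pipeline guarantees that the same standardization fitted on the training split is reused at prediction time.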

Decision tree
A tree-based classifier, the decision tree [ ], is used to measure performance in addition to estimating the feature importance scores of the model analysis techniques, including impurity-based, permutation, and SHAP. A decision tree recursively splits the data according to a criterion such as impurity or entropy. The decision rules for splitting and the leaf nodes are the final outcomes of the tree. The decision tree algorithm used in this study splits the nodes based on entropy until all leaves are pure:

H = -\sum_{m} p_m \log_2 p_m

where p_m = the probability of class m occurring in the data.
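The entropy criterion and the grow-until-pure behavior can be illustrated on invented toy data, using scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"` and no depth limit:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def entropy(y):
    """H = -sum_m p_m log2 p_m over the class proportions in a node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

assert entropy(np.array([0, 0, 1, 1])) == 1.0   # maximally impure binary node
assert entropy(np.array([1, 1, 1, 1])) == 0.0   # pure node

rng = np.random.default_rng(7)
y = rng.integers(0, 2, 200)
X = (y + rng.normal(scale=0.3, size=200)).reshape(-1, 1)
# With no depth limit, scikit-learn grows the tree until all leaves are pure.
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
assert tree.score(X, y) == 1.0   # pure leaves reproduce the training labels
```

Perfect training accuracy here is exactly the "all leaves are pure" stopping rule at work; held-out accuracy would of course be lower.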

Evaluation metric
After training a model on a training set, the model performance is evaluated using a test set that was not involved in training. A confusion matrix provides a visualization of the model performance and is used as a performance measurement for machine learning classification problems. The confusion matrix for binary classification consists of four cases: True Positive (TP) = the number of actual positives that are correctly predicted positive, True Negative (TN) = the number of actual negatives that are correctly predicted negative, False Negative (FN) = the number of actual positives that are incorrectly predicted negative, and False Positive (FP) = the number of actual negatives that are incorrectly predicted positive.
Several statistical rates can be computed from the values in the confusion matrix. For example, accuracy is defined as the ratio of correctly predicted instances to all instances, (TP + TN)/(TP + TN + FP + FN). The F1-score represents the harmonic mean of precision and recall, where precision is TP/(TP + FP) and recall is TP/(TP + FN). Accuracy and F1-score are the most popular metrics, but they lead to overoptimistic, inflated measures, especially on imbalanced datasets, because many classifiers learn towards the majority class [ , ]. For instance, when only a fraction of a percent of a dataset belongs to the minority class and a model predicts every minority example incorrectly, i.e., all data points are classified as the majority class, the model accuracy is still nearly perfect. In structural engineering practice, however, it is critical to identify structural failures (the minority class) rather than safe structures (the majority class), necessitating a metric that rewards correctly predicting both classes in a binary classification.
When a structure is designed to have little load redistribution, the target reliability index β_T [ ] corresponds to a failure probability P_f on the order of 10^{-3}, derived from β = −Φ^{−1}(P_f), where Φ^{−1} = the inverse standard normal cumulative distribution function. Moreover, ASCE [ ] suggests a range of target reliability indices for structural components subjected to dead, live, and other loads except earthquake loads, depending on the risk category from I through IV, where category I represents the lowest level of risk to human life. A classification dataset for structural design problems will therefore be severely imbalanced, as indicated by the values of β_T and P_f. As structural design problems often include an imbalanced dataset, it is challenging to select an adequate statistical metric that provides informative and truthful results.
This study employed three statistical measures to evaluate the performance: specificity, recall, and the Matthews correlation coefficient (MCC) [ ]. For imbalanced class distributions, the majority class is typically referred to as the negative outcome and the minority class as the positive outcome; therefore, structural failure is the positive outcome and no failure is the negative outcome. Specificity is the probability that an actual negative will test negative, calculated by TN/(TN + FP), i.e., the true negative rate; it reflects how well a model identifies the frames that have no failure. Recall, also called the true positive rate, is the ratio of correct positive predictions to the total positive examples, computed by TP/(TP + FN); it indicates how many positive examples are missed by the prediction. The MCC is a reliable measure for imbalanced classification problems because it accounts for the balance between positive and negative outcomes, which neither specificity nor recall considers. The MCC is independent of the class imbalance and can thus reduce misleading results on imbalanced datasets [ ]. The value of MCC varies between -1 and +1, similar to other correlation coefficients, and the score is high only when all four categories in the confusion matrix are predicted well. The MCC is computed by:

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
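The inflation of accuracy and specificity on an imbalanced dataset, and the corrective behavior of the MCC, can be demonstrated with an invented "always predict safe" classifier:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Imbalanced test set: 990 safe structures (Class 0), 10 failures (Class 1).
y_true = np.array([0] * 990 + [1] * 10)
y_naive = np.zeros_like(y_true)        # predicts "safe" for everything

tn, fp, fn, tp = confusion_matrix(y_true, y_naive).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
assert accuracy == 0.99                 # looks nearly perfect...
assert specificity == 1.0
assert recall == 0.0                    # ...but every failure is missed

# The MCC denominator is zero here; scikit-learn returns 0 by convention,
# correctly flagging that the classifier carries no information.
assert matthews_corrcoef(y_true, y_naive) == 0.0
```

Accuracy and specificity reward the majority-class bias, while recall and the MCC expose it, which is why the latter two drive the evaluation in this study.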

Reliability-based sensitivity analysis
Existing steel design specifications [ - ] provide guidance for inelastic analysis, also referred to as advanced analysis or GMNIA, which directly considers geometric and material nonlinearities and includes uncertainty in system, member, and connection strength and stiffness. Estimating structural performance with certainty is challenging because of the inherent uncertainty in a structural system, which affects system performance. Reliability-based sensitivity analysis estimates the effect of an input variable by evaluating the structural performance with the variable under consideration treated as random while all other variables are held at their nominal values. After repeated simulations for each property under consideration, the probability of failure P_f is estimated by n/N [ ], where n = the number of simulations that resulted in failure and N = the total number of simulations. The system reliability index β is computed from P_f. Unlike a reliability analysis, which takes into account multiple random variables simultaneously, a sensitivity analysis considers only one random variable per simulation set to examine how that variable affects the system behavior; the strength distribution might therefore have a smaller COV than that from a reliability analysis. The normal probability plot [ ] is used to estimate P_f and β when no failure cases occur.
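A toy Monte Carlo sketch of this estimate, with an invented normal strength variable against a fixed nominal demand (not the frame model), is:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
N = 200_000

# One random variable at a time: a hypothetical random strength R against a
# fixed demand of 1.0, with failure defined by the limit state R - 1.0 < 0.
R = rng.normal(loc=1.10, scale=0.05, size=N)
n_fail = int(np.sum(R < 1.0))

pf = n_fail / N            # P_f = n / N
beta = -norm.ppf(pf)       # beta = -Phi^{-1}(P_f)

# Analytical check for this toy case: P(R < 1.0) = Phi((1.0 - 1.10)/0.05)
assert abs(pf - norm.cdf(-2.0)) < 2e-3
assert 1.9 < beta < 2.1
```

With a realistic β_T near 3 or above, failures become so rare that N must grow into the millions for a stable n/N estimate, which is the computational burden the feature importance approach is meant to sidestep.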
Researchers have investigated the system reliability of various steel structures by considering uncertainties in the systems.

Data collection

Structural system
Two example steel frames designed according to AISC [ ] are analyzed in this study; they have the same layout but different member sizes and loads, adopted from [ ]. An initial sway imperfection, specified as a fraction of the story height h, was given to all columns in the frame. Inelastic finite element (FE) analyses were conducted with all nominal properties to find the nominal value of the ultimate load ratio λ, the ratio of the ultimate to the factored design loads. The nominal ultimate load ratios obtained from the analyses are the same for both frames, i.e., the frames have a limited capacity for load redistribution.

Uncertainty
The Monte Carlo sampling method is used to generate samples of the uncertainties in material yield strength F_y, modulus of elasticity E, sway imperfection, and residual stress. Measured statistics are utilized to determine the mean values of yield strength and elastic modulus. The distribution of sway imperfection follows Lindner and Gietzelt [ ]. The scale factor of the maximum compressive residual stress, X, is modeled as a normal distribution provided in Shayan et al. [ ]; the random scale factor X is multiplied by σ_RC to account for the uncertainty of residual stress magnitudes. The peak tensile residual stress (σ_RT) within a cross section is determined by the geometry and the peak compressive residual stress (σ_RC), which is a specified fraction of F_yn (Fig. : residual stress pattern and fiber distribution). σ_RT includes X indirectly because it is calculated from equilibrium. Once σ_RC and σ_RT are determined, the rest of the residual stresses in the cross section are set based on the residual stress pattern. The residual stress condition is constant along the length of a member. To account for the maximum number of uncertainties and investigate the effect of each parameter on structural failure, the frames are assumed to be spatially uncorrelated, i.e., all structural members have different random properties.

Dataset
As the structural members are uncorrelated, each individual member has a different realization of the random properties; in other words, no identical random values of the input variables are shared between all beams or all columns. The input feature variables consist of thirty-three parameters: ten values each of yield strength, elastic modulus, and residual stress, plus three sway imperfections assigned to the three column locations (left, center, and right). The response variable is the binary outcome based on the ultimate load ratio λ obtained from the FE analysis with random realizations of the input parameters. As the frames are designed according to the inelastic method provided in AISC [ ], this study applies the probability-based limit state design criterion on λ as the classification criterion for the dataset. If the ultimate load ratio is less than the criterion, the frame experiences a structural failure and the observation is assigned to Class 1; an ultimate load ratio greater than or equal to the criterion indicates that the frame is safe, and the sample is assigned to Class 0, meaning no failure. Simulations were run for both frames, with more simulations for Frame 2 than for Frame 1; a small percentage of simulations had convergence issues and were excluded from the datasets. Frame 1 had a small number of failures (Class 1) among its labeled data points, and Frame 2 had even fewer failures despite its larger total number of simulations. The data points for each frame were randomly assigned into equal training and test sets, i.e., a 50%-50% split between the training and testing datasets.
As discussed previously, a classification dataset for structural design problems will be severely imbalanced, based on the selection of β_T and the corresponding P_f. The no-information rate describes how imbalanced the dataset is; it is calculated by max(n_Class0, n_Class1)/(n_Class0 + n_Class1), i.e., the accuracy obtained by always predicting the majority class.
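On invented data with roughly the same flavor (33 features and a very rare failure class), the no-information rate and an equal, stratified split can be computed as:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 20_000
y = (rng.random(n) < 0.002).astype(int)     # ~0.2% failures (Class 1)
X = rng.normal(size=(n, 33))                # 33 uncorrelated member properties

# No-information rate: accuracy of always predicting the majority class.
counts = np.bincount(y, minlength=2)
nir = counts.max() / counts.sum()
assert nir > 0.99

# 50%-50% split; stratifying preserves the rare failures in both halves.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
assert abs(int(y_tr.sum()) - int(y_te.sum())) <= 1
```

Without `stratify=y`, a random 50-50 split of such a dataset could easily place nearly all failures in one half, distorting both training and evaluation.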

Comparison of sensitivity analysis results
This section compares the sensitivity analysis results obtained by the feature importance approach and the reliability sensitivity analysis. For the reliability-based sensitivity study, repeated simulations were conducted for each uncertainty under consideration. Each feature name consists of the structural member name following the property name; residual stress and sway imperfection are shortened to 'rs' and 'sway', respectively. For example, a label of the form E-B denotes the elastic modulus of a beam B, and rs-C denotes the residual stress of a column C. The sway imperfections at the left, center, and right column locations are labeled sway-C with the corresponding column number.
Table summarizes the reliability-based sensitivity analysis results, including statistics of strength, probability of failure, and reliability index. One frame has larger COVs than the other, with the largest differences for random elastic modulus and sway imperfection, indicating that it is more sensitive to these factors. Although the nominal ultimate strength was equal for both frames, the frame with the larger COVs has smaller values of β because its larger COVs lower the bound of the strength distributions.

Frame 1: instability of a single column
The results of the feature importance method are shown in Fig. . From the results of Table , random sway imperfection resulted in the lowest β for Frame 1, followed by yield strength, elastic modulus, and residual stress. A small elastic modulus and a large sway imperfection increase lateral deflections, thereby increasing second-order bending moments. As Frame 1 fails by the instability of a single column, the frame capacity is most influenced by the factors that increase bending moments. The highly ranked features determined by the feature importance methods are identical to the factors that resulted in a lower β in the reliability-based sensitivity analysis. The feature rankings determined by the feature importance framework show that the sway and the yield strength of the failing column are the most important features among the thirty-three random properties. The third-ranked feature is either the sway or the elastic modulus of a neighboring column, which shows a positive correlation with the frame strength but is less significant than the top two. Overall, the random properties with significant impacts on Frame 1's capacity determined by the reliability-based and machine learning-based sensitivity analyses are in agreement.
The feature importance results thus reflect the failure mode of the system in addition to the influential properties. Due to the complex failure mode of Frame 2, its feature orders are not as straightforward as those of Frame 1; however, the results indicate that the feature importance approach is accurate for steel frames with various failure modes.

Frame 2: progressive yielding
The reliability-based sensitivity study investigated the effects of random properties on the frame strength.
As shown in Fig. , the properties that have a significant effect are identical to the features that are highly ranked by the feature importance methods. The sway of the center columns has a more significant impact on the frame strength than the sway of the other columns. This example illustrates that not all factors influence the system behavior, and it is therefore unnecessary to assess the effect of each factor individually, as is done in a reliability-based sensitivity analysis; the feature importance approach instead analyzes all factors at once to estimate their effects on system behavior.
Overall, Frame 2 has smaller magnitudes of the importance scores than Frame 1. Moreover, Frame 1 shows a large difference between the two top-ranked features and the remainder, whereas the score differences between the features of Frame 2 are smaller; in other words, the importance score of Frame 2 decreases smoothly from the top to the bottom of the rankings. When a structural system fails through a single member (Frame 1), the properties of that member have a critical impact on the entire system. On the other hand, when various members lead to system failure, as in progressive yielding (Frame 2), the properties of multiple members have a significant impact on the entire system. A comparison of the importance values between the two frames indicates that the number of structural members involved in system failure influences both the magnitude of the importance scores and the number of features considered important.

Performance evaluation
The test set, which accounts for 50% of the dataset, is employed to evaluate the models fitted on the training set. The accuracy metrics include specificity and recall to measure the correct prediction of Class 0 (no failure) and Class 1 (failure), respectively. In addition, the Matthews correlation coefficient (MCC) is used, which is a suitable metric for the imbalanced dataset.
The predictive performance of the machine learning models for Frame 1 is shown in Fig. . The specificity score reaches a nearly perfect value even with only a few features. In this study, specificity represents the proportion of safe structures that are correctly predicted. The Frame 1 dataset is severely imbalanced, with a very high no-information rate, so the model performance measured by specificity shows inflated results due to the biased classification towards the majority class. Recall is computed to evaluate how good a model is at detecting a structural failure, the positive class.
As it is critical to identify system failure rather than safe structures in structural design practice, recall is a more crucial measure than specificity in this study. As previously discussed, Frame 2 consists of a larger sample space and fewer failures than Frame 1. Moreover, the feature orders of Frame 2 are inconsistent between the feature importance techniques, as multiple features are significant to the prediction of system failure. The higher imbalance ratio and the complex failure mode of Frame 2 result in low performance as measured by recall and the MCC. The extremely imbalanced classification of Frame 2 leads to lower performance in predicting structural failure because the machine learning classifiers have only a few minority class examples to oversample in the training set and to test the model prediction against.
The model performance for predicting the minority class can be improved by obtaining a larger number of failures in the dataset or by reducing the class imbalance ratio.
The six feature importance techniques showed similar performance for Frame as measured by the MCC.
The performance improved after including the three top-ranked features (Fig. e and f). In Frame , however, the model analysis methods gave more accurate results than the data analysis methods. In particular, the permutation and SHAP methods showed the best performance until the feature set size increased to seven (Fig. f), because they ranked F y -C and F y -C fifth and sixth, respectively, whereas the impurity-based method ranked F y -C seventh, and F y -C and F y -C have a significant influence on the system failure of Frame . In summary, based on the feature rankings and the model performance results of both frames, the SHAP and permutation methods are the best techniques for estimating feature importance.

Conclusion
This study examined the feature importance approach using datasets with a large number of uncertainties and severely imbalanced classification. Two designs of a non-symmetric planar steel frame were investigated, considering uncertainties in material yield strength, Young's modulus, sway imperfection, and residual stress. The dataset consisted of thirty-three uncertain parameters based on the uncorrelated scenario and the ultimate load ratio obtained from finite element analysis. Few failures occurred, as is common in structural engineering design, so the datasets were extremely class-imbalanced, with the two classes being safe and fail. Feature importance techniques including ANOVA, mRMR, Spearman's rank, impurity-based, permutation, and SHAP were applied to the high-dimensional and severely class-imbalanced datasets to identify the important features. After ranking the features by importance score, logistic regression and decision tree algorithms were trained to predict the classes using the feature set containing the top-ranked features. The model performance evaluated by specificity was nearly perfect for both frames because most examples were assigned to the majority class, safe structure. Frame , which failed by the buckling of a single column, showed good performance for the prediction of failure even with the highly imbalanced classification. However, for Frame , which had a complex failure mode of progressive yielding in addition to an extremely low failure probability, it was challenging to obtain an accurate prediction of the minority class. As class-imbalanced data is inevitable in structural engineering, caution is necessary when assessing the predictive accuracy for structures with a complex failure mode and a failure probability approximately equal to zero. The important features identified using the machine learning-based feature importance approach were compared with the results of a conventional reliability-based sensitivity study to identify factors which result in a lower system reliability index. Overall, both methods identified the same factors, which reflected the system failure modes. This study demonstrated that machine learning-based sensitivity analysis can identify the influential features affecting system failure even with high-dimensional uncertain parameters and a highly imbalanced dataset.
models [ ]. ( ) ANOVA (analysis of variance), ( ) mRMR (minimal-redundancy-maximal-relevance) [ ], and ( ) Spearman's rank correlation coefficient [ ] are utilized as data analysis techniques. This study uses a decision tree classifier [ ] to measure feature importance with the model analysis techniques, which include two feature importance methods, ( ) impurity-based importance and ( ) permutation importance, as well as ( ) SHAP (SHapley Additive exPlanations) [ ]. Based on the measured feature importance, logistic regression [ ] and decision tree [ ] models are fitted to predict whether a steel frame fails. The predictive performance is evaluated by specificity, recall, and the Matthews correlation coefficient [ ].
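A minimal sketch of both families of techniques on a synthetic stand-in dataset is shown below. mRMR and SHAP require third-party packages, so the sketch covers only ANOVA, Spearman's rank, impurity-based importance, and permutation importance; all data and names are illustrative, not the paper's frame dataset.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the frame dataset; sizes are illustrative.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)

# Data analysis: ANOVA F-scores and Spearman's rank correlation per feature.
anova_scores, _ = f_classif(X, y)
spearman_scores = np.array([abs(spearmanr(X[:, j], y)[0])
                            for j in range(X.shape[1])])

# Model analysis: impurity-based and permutation importance from a decision tree.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
impurity_scores = tree.feature_importances_
perm_scores = permutation_importance(tree, X, y, n_repeats=5,
                                     random_state=0).importances_mean

# Rank features in descending order of one chosen importance measure.
anova_ranking = np.argsort(anova_scores)[::-1]
```

Note the distinction the paper draws: the first two scores need no fitted model, while the last two are computed from a trained classifier.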
Fig. a and Table summarize the information of the example frames, including geometry and applied loads. The frames are modeled in OpenSees [ ] with displacement-based, fiber-type elements. All cross-sections contain residual stress following the Galambos and Ketter pattern [ ] with the fiber distribution shown in Fig. . The nominal peak compressive residual stress is .F yn , where F yn = nominal material yield strength of MPa ( ksi). An initial sway imperfection of h/ is applied. Fig. b and c illustrate the locations of the most highly yielded zones for Frames and , respectively. Frame fails by the instability (inelastic buckling) of ground floor column C , and at failure members C and B are partially yielded. Frame fails by a gradual sequence of yielding, so multiple members have highly yielded zones in which a location along the member yielded by more than %: B , B , B , B , C , and C .

Figure : (a) Frame layout; location of highly yielded zones for (b) Frame ; (c) Frame

The no-information rate is computed as n Class 0 /(n Class 0 + n Class 1 ), where n Class 0 is the number of Class examples and n Class 1 is the number of Class examples. The no-information rates of Frame and Frame are .% and .%, respectively, both approximately equal to %. As imbalanced classification data leads to biased prediction toward the majority class, this study employed oversampling on the training set to reduce the problem caused by imbalanced classification. The minority class is oversampled to match the number of majority class examples using the Synthetic Minority Over-sampling Technique (SMOTE) proposed by Chawla et al. [ ]. For example, Frame had , times more majority class examples than minority class examples, so new minority class examples were generated , times over in the training set to equal the number of majority class examples. The oversampled minority class examples are not duplicates of existing samples but are derived using k-nearest neighbors and interpolation parameters.
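The neighbor-and-interpolation idea behind SMOTE can be sketched in a few lines. This is a minimal illustrative implementation, not the full SMOTE algorithm of Chawla et al.; in practice a library such as imbalanced-learn would be used, and all names here are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_oversample(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE-style sketch: synthesize minority examples by interpolating
    between a random minority sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nbrs.kneighbors(X_min)           # idx[:, 0] is the point itself
    synthetic = np.empty((n_new, X_min.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_min))          # pick a minority example at random
        j = idx[i, rng.integers(1, k + 1)]    # pick one of its k nearest neighbors
        gap = rng.random()                    # interpolation parameter in [0, 1)
        synthetic[s] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic

# Usage: grow 10 minority examples to 50 by adding 40 synthetic ones.
X_min = np.random.default_rng(1).normal(size=(10, 3))
X_new = smote_oversample(X_min, n_new=40)
```

Because every synthetic point lies on a segment between two existing minority points, the new examples interpolate rather than duplicate, exactly as the text describes.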
The top row (Fig. a-c) shows the top ten feature rankings of Frame derived from the data analysis methods: ANOVA, mRMR, and Spearman's rank. The feature orders obtained from the model analysis techniques (impurity-based, permutation, and SHAP) are shown in the bottom row (Fig. d-f). As previously discussed in Section , the feature importance techniques can produce positive values, negative values, or both. As the feature rankings show only the ten highest-ranked features, negative scores are not included in the figure except for Spearman's rank (Fig. c), which rates features by the magnitude of their importance value. Sway-C and F y -C are top-ranked by all the feature importance methods. Sway-C or E-C is third-ranked but has a negligible importance score in comparison to the top two features. Although the order of the remaining features differs between the techniques, only the two highest-ranked features have significant scores; in other words, only the first two features are significant to the prediction of failure for Frame . Most methods produce scores approximately equal to zero for the least important features. Frame fails by the inelastic instability of C , and this is reflected in the importance scores, as the features related to C are the most highly ranked.

Fig. illustrates the scatter plots of the input random properties versus the frame strength from the reliability-based sensitivity studies. As shown in Fig. a, the yield strength of C and the frame strength have a nearly perfect correlation, which indicates that the strength of Frame is controlled by C . The yield strengths of the other members show no correlation with frame strength. Fig. b and c show that the frame strength has a weak correlation with the elastic modulus of C and the residual stress of C . The sway imperfection at the center column (C ) has the most significant impact on strength among the three column positions (left, C ; center; and right, C ), showing a strong positive correlation.
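This correlation-based reading of the scatter plots can be reproduced on toy data. The sketch below, with entirely illustrative variables, mimics a frame whose strength is dominated by one governing column's yield strength: Spearman's rank correlation is near one for the governing input and near zero for a non-governing one.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n = 2000  # illustrative sample size

# Toy stand-ins: a governing column yield strength, a non-governing one, and sway.
fy_governing = rng.normal(1.0, 0.10, n)
fy_other = rng.normal(1.0, 0.10, n)
sway = rng.normal(0.0, 1.0, n)

# Toy frame strength dominated by the governing column's yield strength.
strength = fy_governing + 0.02 * sway + 0.01 * rng.normal(size=n)

rho_governing = spearmanr(fy_governing, strength)[0]  # near-perfect correlation
rho_other = spearmanr(fy_other, strength)[0]          # near zero
```

The contrast between the two coefficients is the quantitative counterpart of the "nearly perfect correlation" versus "no correlation" observations in the scatter plots.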

Figure : Importance ranking of the top ten features of Frame derived by data analysis methods (top row) and model analysis methods (bottom row)

Fig. shows the top ten feature rankings of Frame derived by the feature importance approach. As shown in Fig. a-c, the top-ranked feature is either F y -C or sway-C , and the remaining order varies for each data analysis technique, which identifies important features without model fitting. However, the model analysis techniques, which require model training to measure feature importance, derive the same top four features, sway-C , F y -C , F y -B , and sway-C , in descending order (Fig. d-f). The features ranked fourth through tenth are similar between the model analysis methods. At least four yield strengths are highly ranked across all the methods, which indicates that yield strength is an influential factor in the failure of Frame and that the failure mode is progressive yielding. As previously shown in Fig. c, Frame has six members with critical impacts on system failure, including four beams and two columns. In particular, B and C each have two highly yielded zones, and the yield strengths of these members are top-ranked among all the yield strengths. The feature importance results thus reflect the members significant to the failure mode. Fig. a and b show dents on the upper left side, which occur when the value of the B yield strength is the maximum or the minimum among all members, respectively. Fig. c and d indicate that Frame 's strength has positive correlations with the elastic moduli of C and B . As X of C increases, the frame strength decreases (Fig. e) because the presence of residual stresses leads to the onset of yielding at a lower applied load [ ]. The random elastic modulus and residual stress of C and B are correlated with the frame strength but show small COVs, which represents a less significant influence. The effects of random sway imperfection are shown in Fig. f-h.

Figure : Importance ranking of the top features of Frame derived by data analysis methods (top row) and model analysis methods (bottom row)

Figure : Scatter plots of Frame strength versus random (a) F y of C (b) F y of B (c) E of C (d) E of B (e) X of C (f) sway imperfection of C (g) C (h) C

Fig. c shows the recall curve obtained from the logistic regression model. When the feature set contains the top three features of the model analysis techniques and ANOVA, which are sway-C , F y -C , and sway-C , the recall score rapidly increases up to . . Spearman and mRMR ranked the sway imperfection of C fourth, and therefore the score rises abruptly when the feature set increases to four. The recall curve of the decision tree (Fig. d) shows the highest value when only two or three features are selected; the recall scores converge to a lower value of . after reaching the peak. Fig. e and f show the MCC scores for the logistic regression and decision tree models, respectively. The MCC curves have a similar shape to the recall curves: the logistic regression model performance improves as the feature set grows, while the decision tree model reaches its peak when the feature set is small. This indicates an overfitting issue that occurs when the model is trained on a large feature set. Feature selection based on the importance score can reduce overfitting by excluding redundant features from training; the least important features, ranked after the fifteenth, could be removed to obtain better performance. The model performance of Frame measured by specificity is shown in Fig. a and b. The specificity generates overoptimistic results due to the high imbalance ratio of .%. When the yield strength of C is ranked as the most important feature by ANOVA or Spearman, the decision tree model shows a nearly perfect score even though the dataset includes only one feature (Fig. b). The lowest specificity score is . , indicating that both logistic regression and decision tree models can identify safe structures with high accuracy. The recall curves of Frame are shown in Fig. c and
d. The logistic regression model can correctly predict failure only when the sway imperfection of C is included in the feature set. For example, the recall curve of Spearman's rank stays at zero until the feature set size reaches seven because that ranking rated the sway imperfection of C seventh. The recall curve of the decision tree model (Fig. d) shows large variation between the data analysis methods because their feature rankings differed completely; however, each curve merges to about . as the number of features increases. The MCC curve of the logistic regression model (Fig. e) has a similar shape to the recall curve because the specificity scores took a single value, close to , regardless of the feature set size. The MCC values of the decision tree model merge to . , which is the score of the entire feature set (Fig. f). When a high-dimensional dataset is used, class imbalance leads to additional challenges in misclassification of the minority class [ ].
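The performance-versus-feature-set-size curves discussed above follow a simple recipe: refit the classifier on the top-k ranked features for increasing k and record the metric each time. A hedged sketch on synthetic imbalanced data (the ranking here is a stand-in correlation score, not one of the paper's six techniques):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

# Synthetic imbalanced dataset standing in for the frame data; sizes illustrative.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=4,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Stand-in importance ranking: absolute correlation of each feature with the label.
scores = [abs(np.corrcoef(X_tr[:, j], y_tr)[0, 1]) for j in range(X.shape[1])]
ranking = np.argsort(scores)[::-1]

# Refit on the top-k ranked features and record the MCC for each feature set size.
mcc_curve = []
for k in range(1, X.shape[1] + 1):
    cols = ranking[:k]
    model = LogisticRegression(max_iter=1000).fit(X_tr[:, cols], y_tr)
    mcc_curve.append(matthews_corrcoef(y_te, model.predict(X_te[:, cols])))
```

Plotting `mcc_curve` against k yields the kind of curve shown in the figures: it typically rises steeply once the influential features enter the set and then flattens or degrades as redundant features are added.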

Figure : Frame specificity, recall, and MCC for logistic regression (left column) and decision tree (right column)

Figure : Frame specificity, recall, and MCC for logistic regression (left column) and decision tree (right column)

Buonopane [ ] conducted a reliability sensitivity study on two steel frames considering uncertainties in yield strength, elastic modulus, residual stress, and sway and bow imperfections. Szyniszewski [ ] investigated the effect of random geometric imperfections on progressive collapse propagation by analyzing -D steel framed buildings with uncorrelated geometric imperfections between structural members. Shayan et al. [ ] presented a probabilistic study of modeling random geometric imperfections in regular and irregular sway and braced planar steel frames. Thai et al. [ ] evaluated the system reliability of steel frames with semi-rigid connections, including uncertainties in gravity loads, material properties, cross-sectional properties, and connection properties in the reliability analysis. Zhang et al. [ ] examined the system reliability of five steel structures, a beam, a portal frame, and three low-rise frames, with randomness in gravity loads, material properties, cross-sectional properties, and sway imperfection. Cardoso et al. [ ] calibrated the system reliability of cold-formed steel portal frames with uncertain parameters in material properties, cross-section thickness, joint properties, and geometric imperfections.

Table : Member sizes and applied loads

Table summarizes the statistical information and references for the uncertainties. Yield strength and elastic modulus are modeled following the distributions published in Bartlett et al. [ ]. Nominal yield strength F yn of MPa and nominal elastic modulus E n of
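Sampling uncorrelated random properties per member can be sketched as below. All nominal values and distribution parameters here are hypothetical placeholders; the actual statistics follow Bartlett et al. and are given in the table, not reproduced in this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_members = 1000, 6  # illustrative sizes, not the paper's values

# Hypothetical nominal values (MPa); the paper's actual nominals are elided here.
F_yn, E_n = 345.0, 200000.0

# Uncorrelated scenario: an independent draw for every member in every sample.
# Lognormal mean/COV values below are illustrative assumptions only.
F_y = F_yn * rng.lognormal(mean=np.log(1.1), sigma=0.06,
                           size=(n_samples, n_members))
E = E_n * rng.lognormal(mean=0.0, sigma=0.04, size=(n_samples, n_members))
```

Each row of `F_y` and `E` is one realization of the frame, with member-to-member variation because no correlation is imposed across columns.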