Voltage dip diagnosis in electrical distribution systems using extreme learning machines: an empirical evaluation

An inefficient operation of the utility grid can be inferred from low power quality indexes. Under such conditions, reductions in the useful life of equipment and loads, grid instabilities, service interruptions and breakdowns can be expected, which would produce significant economic losses for the end-users as well as for the power supply company. In this context, it is useful to have a diagnosis tool


Introduction
The interest in power quality has increased among electric utilities and consumers in recent decades, as a result of technological advancement and consumer awareness. Ten years ago, the duration and frequency of the discontinuity of electricity supply were the main power quality issues. Today, the monitoring and control of variations in the frequency and voltage amplitude are equally important.
The diffusion of non-linear electric equipment contributes to the degradation of power quality, requiring more attention from the network operators (Decanini et al., 2011). Power quality problems are the consequences of the increasing use [...] reductions in the useful life of the equipment, malfunctions, instabilities, and interruptions; the resulting breakdowns and product life reductions can lead to significant economic losses for the end-user and for the power company (Zhang et al., 2012; Sánchez et al., 2013).
A study carried out in the United States of America (USA) showed that industrial and digital business firms lost $45.7 billion per year due to power quality issues (Seymour & Horsley, 2008; Meher & Pradhan, 2010). A power quality survey carried out in the European Union (EU) analyzed the economic consequences of unreliable electrical power for industrial and service sectors representing over 70% of the EU-25 economic output. The main purpose was to estimate costs [...] interruptions are the disturbances with the greatest negative economic impact, accounting for 57% of the total loss (Manson & Targosz, 2008). This situation has raised the need to monitor and improve the power transmission and distribution networks. The ultimate goal of monitoring is to avoid electrical equipment damage and to identify the sources of power quality disturbances. But while the sources and causes of such disturbances must be identified and controlled before taking an appropriate mitigating action, the typical practices to diagnose them rely strongly on visual inspection of waveform disturbances and operator assessment. The analysis of such a volume of data is a difficult task, if not an almost impossible one, even for experienced engineers (Khokhar et al., 2015b, 2017). Furthermore, it is rather difficult to determine the cause of a voltage dip, which usually gives rise to controversies between consumers and the electrical company. From this perspective, the need becomes evident for diagnostic tools able to identify possible causes of voltage disturbances and to provide assistance in decision-making processes so as to define corrective actions (Decanini et al., 2011). This requires developing techniques for automatic disturbance identification, to be applied directly or by means of feature extraction and pattern recognition.
Several surveys reporting the application of signal processing and artificial intelligence techniques for the (semi) automatic detection, classification and diagnosis of power quality disturbances can be found in the literature (Strack et al., 2018; Montoya et al., 2016; Khokhar et al., 2015b; Mahela et al., 2015; Saini & Kapoor, 2012; Granados-Lieberman et al., 2011; Saxena et al., 2010). These proposals can be generally divided into two groups.
The first group involves the approaches aimed at classifying the type of disturbance, e.g., voltage sags/swells, interruption, etc. A Support Vector Machine [...]
The goal of the second group of techniques is to diagnose the disturbances by identifying their underlying causes. One of the main attempts in this direction can be found in Bollen et al. (2007), which analyzes a deterministic and a statistical classification approach. The work distinguishes five classes of disturbances on a synthetic dataset: voltage dips with a drop in one phase, voltage dips with an identical drop in two phases, voltage dips due to three-phase faults, voltage dips with a different drop in two phases, and voltage disturbances due to transformer energizing. Erişti & Demir (2010) present a Wavelet Multi-Resolution Analysis (WMRA) and SVM based approach for the diagnosis of simulated phase-to-ground faults, phase-to-phase faults, three-phase faults, load switching, capacitor switching and transformer energizing events. In Erişti & Demir (2012), WT is used at the feature extraction stage, while the disturbances are classified by means of SVM and diagnosed by using a DT algorithm. Real power system event data is used to evaluate the performance of the strategy, which considers the following disturbances: faults, self-extinguishing faults, line energizing, interruption, and transformer energizing. A feature selection technique for the diagnosis of a dataset of real disturbances is proposed in Erişti et al. (2013). A feature vector is obtained from the dataset by applying multi-resolution analysis (MRA), and feature selection based on the k-Means and Apriori algorithms is used. The resulting optimal feature vector is the input to a Least Squares Support Vector Machine (LS-SVM) classifier. An S-Transform and ELM based diagnosis approach is proposed in Erişti et al. (2014), and its performance is assessed under noisy conditions over real and synthetic datasets.
The previous paragraphs show the large number of existing initiatives regarding the detection, classification and diagnosis of power quality disturbances.
However, most of them classify disturbances according to their types (e.g., dip, sag, interruption, etc.) instead of according to their underlying causes. Although such an approach is useful for the development of methods, the availability of tools for classifying disturbances according to the event that generated them has a greater practical value. At the same time, the identification of the underlying causes of disturbances is a more difficult and challenging task. It requires not only signal analysis knowledge but also power system knowledge and information on power network configurations and settings (Bollen et al., 2007).
The aforementioned issues are addressed in this article by performing a set of experiments aimed at assessing the feasibility of an artificial intelligence technique for the diagnosis of voltage dips. The proposal is rooted in the one-step application of the ELM algorithm for single-hidden layer feed-forward neural networks. The one-step application refers to the lack of any segmentation and feature extraction pre-processing stage: the three-phase voltage waveforms are directly shown to the classifier to identify the underlying cause of the observed disturbance. The classification task is performed over synthetic data generated by the simulation of a real power distribution grid, assessing the performance of [...] as any deviation of voltage or current from the ideal one. A disturbance is a voltage disturbance or a current disturbance, but it is often not possible to distinguish them given that any change in current is accompanied by a change in voltage and vice versa (Bollen & Gu, 2006).

Voltage disturbances are generally categorized according to their duration and magnitude, considering short and long interruptions, short-duration overvoltages and undervoltages, and long-duration overvoltages and undervoltages.
A voltage disturbance is caused by a sudden current increase at a specific point of the electrical grid over a specific period. The main causes of the overcurrents are faults, originating from short circuits brought about by insulation breakdown, switching operations or atmospheric discharges. Switching of large shunt capacitor banks and starting of large loads (e.g., motors) can also produce large changes in current, similar in effect to those of short-circuit states. Supply systems are equipped with protective devices to disconnect the short circuit from the source of energy, which allows the almost immediate recovery of the voltage at every point except the disconnected ones. Some types of faults are self-clearing: the short circuit disappears and the voltage recovers before the line disconnection can take place (Bagdadioglu & Senyucel, 2010; Hanzelka, 2008; Bollen & Gu, 2006). Consequently, the depth and duration of a disturbance perceived by a given user are strongly related to the cause of the overcurrent and the point of the grid where it occurs.
This work focuses on the disturbances known as "voltage dips" or "voltage sags". A voltage dip/sag is a sudden reduction of the voltage at a particular point of an electricity supply system below a specified dip threshold within a short time period, followed by its recovery after a brief interval. The threshold value is the r.m.s. voltage value defined in order to determine the start and end of a voltage dip. IEEE defines a voltage sag as a decrease of 10% to 90% in the r.m.s. voltage at the power frequency for a duration of half a cycle to 1 minute (IEEE, 2009). IEC defines a voltage dip as a sudden reduction in the supply voltage to a value between 90% and 1% of the nominal voltage, followed by a recovery after a period of between 10 milliseconds and 1 minute. In polyphase systems it is assumed that a three-phase dip starts when the voltage in the first disturbed phase falls below the threshold and ends when the voltages in all phases are equal to or above the threshold. The duration of a voltage dip therefore depends on the specified threshold value. The rest of this work will use the term voltage dip to refer to both concepts, although some details in terms of amplitude and duration differ slightly (IEC, 2004).
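As an illustration of the three-phase rule described above, the following Python sketch locates the start and end of a dip in sampled waveforms. It is a minimal example under stated assumptions and is not part of the original experiments: the function names, the one-cycle sliding r.m.s. window, the 3,200 Hz sampling rate, the 50 Hz system and the 0.9 p.u. threshold are illustrative choices.

```python
import numpy as np

def sliding_rms(v, window):
    """r.m.s. value of every full window of `window` consecutive samples."""
    squared = np.asarray(v, dtype=float) ** 2
    csum = np.cumsum(np.insert(squared, 0, 0.0))
    mean_sq = (csum[window:] - csum[:-window]) / window
    return np.sqrt(np.maximum(mean_sq, 0.0))  # guard against rounding below 0

def detect_three_phase_dip(phases, nominal_rms, window, threshold=0.9):
    """Return (start, end) indices in the r.m.s. series, or None if no dip.

    start: first index where ANY phase falls below the threshold.
    end:   first later index where ALL phases are at or above it
           (None if the voltage never recovers within the record).
    """
    rms = np.vstack([sliding_rms(p, window) for p in phases]) / nominal_rms
    below_any = (rms < threshold).any(axis=0)
    if not below_any.any():
        return None
    start = int(np.argmax(below_any))
    recovered = np.flatnonzero(~below_any[start:])
    end = start + int(recovered[0]) if recovered.size else None
    return start, end

# Hypothetical test signal: 10 cycles, 1.0 p.u. nominal r.m.s. voltage,
# phase a drops to 0.5 p.u. during cycles 3 to 5 (samples 192 to 383).
fs, f0, cycles = 3200, 50, 10
n = fs // f0                                  # 64 samples per cycle
t = np.arange(cycles * n) / fs
amp = np.full(t.size, np.sqrt(2.0))
amp_a = amp.copy()
amp_a[3 * n:6 * n] *= 0.5
phases = [
    amp_a * np.sin(2 * np.pi * f0 * t),
    amp * np.sin(2 * np.pi * f0 * t - 2 * np.pi / 3),
    amp * np.sin(2 * np.pi * f0 * t + 2 * np.pi / 3),
]
dip = detect_three_phase_dip(phases, nominal_rms=1.0, window=n)
```

The indices refer to the first sample of each r.m.s. window, so the detected start slightly precedes the true onset; multiplying `end - start` by `1 / fs` gives the dip duration in seconds for the chosen threshold.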

Extreme Learning Machine (ELM)
ELM is a relatively new learning algorithm for Single-hidden Layer Feed-forward Neural networks (SLFNs), which provides good generalization performance at extremely fast learning speed, in most cases thousands of times faster than conventional learning techniques (Huang et al., 2006b,a). Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, robotics, and power systems control and operation (Zhang et al., 2018; Das et al., 2018; Lin et al., 2017; Bequé & Lessmann, 2017; Rubiolo et al., 2018; Huang et al., 2015; Song & Zhang, 2013; Avci, 2013; Avci & Coteli, 2012).
ELM randomly chooses hidden nodes and analytically determines the output weights of SLFNs. Thus, the only free parameters that need to be learned are the weights between the hidden layer and the output layer of the neural model.
Given a set of $L$ training samples $(x_l, y_l)$, where $x_l = [x_{l1}, x_{l2}, ..., x_{lN}]^T \in \mathbb{R}^N$ is the $l$-th input vector and $y_l = [y_{l1}, y_{l2}, ..., y_{lM}]^T \in \mathbb{R}^M$ is the $l$-th target vector, a SLFN with $P$ hidden nodes and activation function $f(\cdot)$ is mathematically modeled as:

$$\sum_{i=1}^{P} B_i f(w_i \cdot x_l + b_i) = o_l, \quad l = 1, 2, ..., L$$

where $B_i$ is the weight vector connecting the $i$-th hidden node and the output nodes, $w_i = [w_{i1}, w_{i2}, ..., w_{iN}]^T$ is the weight vector connecting the $i$-th hidden node and the input nodes, $b_i$ is the threshold of the $i$-th hidden node, and $o_l$ is the network output for the $l$-th input vector.
The SLFN can approximate these $L$ samples with zero error, i.e., $\sum_{l=1}^{L} \| o_l - y_l \| = 0$, for which there must exist $\hat{B}_i$, $\hat{w}_i$, and $\hat{b}_i$ so that:

$$\sum_{i=1}^{P} \hat{B}_i f(\hat{w}_i \cdot x_l + \hat{b}_i) = y_l, \quad l = 1, 2, ..., L$$

The above $L$ equations can be compactly written as:

$$HB = Y$$

where $H$ is the hidden layer matrix of the neural network, with entries $H_{li} = f(w_i \cdot x_l + b_i)$, so that its $i$-th column is the $i$-th hidden node output with respect to inputs $x_1, x_2, ..., x_L$. The output connection weights can be obtained by solving $\min_B \| HB - Y \|$. Then:

$$\hat{B} = H^{\dagger} Y$$

where $H^{\dagger}$ is the Moore-Penrose generalized matrix inverse of the hidden layer matrix $H$.
The ELM algorithm can be summarized as follows:
a) Inputs:
• L-size training set $(x_l, y_l)_{l=1,2,...,L}$, where $x_l \in \mathbb{R}^N$ and $y_l \in \mathbb{R}^M$,
• the number $P$ of hidden nodes,
• and the activation function $f(\cdot)$.
b) Steps:
• Select arbitrary values for the input weights $w_i \in \mathbb{R}^N$ and thresholds $b_i \in \mathbb{R}$ for the $P$ hidden neurons.
• Calculate the hidden layer matrix $H$ of the SLFN.
• Calculate the output weights as $\hat{B} = H^{\dagger} Y$.
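The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the experiments: the class name `ELMClassifier`, the uniform range $[-1, 1]$ for the random parameters, and the toy two-class data are assumptions made for the example; the output weights are obtained with `numpy.linalg.pinv`, which computes the Moore-Penrose generalized inverse.

```python
import numpy as np

class ELMClassifier:
    """Single-hidden-layer feed-forward network trained with ELM (sketch)."""

    def __init__(self, n_hidden, activation=np.tanh, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.seed = seed

    def _hidden(self, X):
        # Hidden layer matrix H: one row per sample, one column per node.
        return self.activation(X @ self.W + self.b)

    def fit(self, X, Y):
        rng = np.random.default_rng(self.seed)
        # Step 1: random input weights and thresholds (never trained).
        self.W = rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Step 2: hidden layer matrix H.
        H = self._hidden(X)
        # Step 3: output weights via the Moore-Penrose generalized inverse.
        self.B = np.linalg.pinv(H) @ Y
        return self

    def predict_scores(self, X):
        return self._hidden(X) @ self.B

    def predict(self, X):
        return self.predict_scores(X).argmax(axis=1)

# Toy data (hypothetical): two well-separated Gaussian clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.8, size=(50, 2)),
               rng.normal(2.0, 0.8, size=(50, 2))])
labels = np.repeat([0, 1], 50)
Y = np.eye(2)[labels]                      # one-hot targets, shape (100, 2)

elm = ELMClassifier(n_hidden=20).fit(X, Y)
accuracy = float(np.mean(elm.predict(X) == labels))
```

No iterative optimization is involved: training reduces to one matrix product and one pseudo-inverse, which is the source of the short training times attributed to ELM.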
As proved in Huang et al. (2006b), if the activation function $f(\cdot)$ of the hidden layer is infinitely differentiable and the number $P$ of hidden nodes is less than or equal to the number $L$ of samples, then the input weights $w_i \in \mathbb{R}^N$ and thresholds $b_i \in \mathbb{R}$ can be randomly selected and the SLFN approximates the training data with $\varepsilon$ error, i.e.: $\| HB - Y \| \le \varepsilon$.

After the input weights and the hidden layer biases are chosen randomly, the SLFN becomes a linear system and the output weights are analytically determined through the generalized inverse of the hidden layer output matrix. In this way, ELM obtains the globally optimal solution when fitting $\hat{B}$, avoiding local optima.

Unlike traditional iterative learning algorithms, ELM tends to reach not only the smallest training error but also the smallest norm of weights. According to the study by Bartlett (1998) on the generalization performance of feedforward neural networks, for networks reaching a small training error, the smaller the norm of weights is, the better generalization performance the networks tend to have. Thus, ELM obtains not only the minimum square training error but also good generalization performance on new samples (Huang et al., 2006a,b).

Experimental dataset
Diagnosis of voltage disturbances is performed by classifying voltage dips according to their underlying causes. The classification task is performed over synthetic [...] It remains to be noted that the simulations consider peak load demands and that the loads were modeled as constant impedances, according to the parameters depicted in Table 2.

C2: Single line-to-ground fault. Single-phase fault-clearing. Serial nonlinear fault resistance.
C3: Single line-to-ground fault. Single-phase fault-clearing. Parallel nonlinear fault resistance.
C4: Double line-to-ground fault. Three-phase fault-clearing. Always the same two phases for each simulation.
C5: Double line-to-ground fault. Three-phase fault-clearing. Two different phases for each simulation.
C6: Three-phase fault. Three-phase fault-clearing.
C7: Simultaneous single line-to-ground fault on all phases. Independent single fault-clearing. Different fault resistances and different clearing times for each phase.
C8: Simultaneous single line-to-ground fault on all phases. Independent single fault-clearing. The same fault resistances but different clearing times for each phase.
C9: Self-extinguishing single line-to-ground fault. Nonlinear fault resistance.
C10: Self-extinguishing line-to-line fault. Linear fault resistance.
C11: Self-extinguishing three-phase fault. Linear fault resistance.
C12: Simultaneous line-to-ground fault on all phases. Electric arc with fault ionization.
C13: Simultaneous line-to-ground fault on all phases. Electric arc with fault deionization.
C14: Simultaneous line-to-ground fault on all phases. Electric arc with random resistance variation.
C15: Balanced three-phase starting current (∼180 A).
Strategy 1 applies a one-for-all approach. A single classifier is trained to classify the 20,000 simulated voltage dips according to the 20 classes, testing 9 different combinations of the hyperparameters: 1,500 (1.5k), 3,000 (3k), and 5,000 (5k) neurons in the hidden layer, applying the sigmoid activation function (sigm), the hyperbolic tangent activation function (tanh), or the RBF activation function with L∞ norm (rb∞). This strategy applies the one-hot encoding approach by defining an output matrix of Boolean values, showing the class (i.e., the underlying cause) of a given voltage dip. Therefore, an output matrix of 400,000
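The one-hot encoding used by Strategy 1 can be illustrated as follows. This is only a sketch: the helper name `one_hot` and the integer class indices 0 to 19 are hypothetical, and the example assumes 1,000 dips per class, which the text above does not state; only the totals (20,000 dips, 20 classes) come from the description of the strategy.

```python
import numpy as np

def one_hot(labels, n_classes):
    """Boolean matrix with one row per sample and a single True per row."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.size, n_classes), dtype=bool)
    Y[np.arange(labels.size), labels] = True
    return Y

# Assumption for illustration only: classes are balanced (1,000 dips each).
labels = np.repeat(np.arange(20), 1000)    # 20,000 class indices
Y = one_hot(labels, n_classes=20)          # shape (20000, 20)
n_values = Y.size                          # 20,000 x 20 = 400,000 Booleans
```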

Results
The performance of the approach is assessed by means of three widely used classification task metrics: precision, recall, and F1 score. Precision (P) is defined as tp/(tp + fp). Recall (R) is defined as tp/(tp + fn). In both expressions, tp is the number of true positives, fp is the number of false positives, and fn is the number of false negatives. Intuitively, P is the ability of the classifier not to label as positive a sample that is negative, while R measures its ability to find the positive samples. F1 score (F1) is defined as 2 * (P * R)/(P + R), and represents the harmonic mean of precision and recall, to which both metrics contribute equally. As can be inferred from the previous mathematical expressions, the scores for all these metrics range from 1 (best) to 0 (worst).
3 More information on https://www.python.org/
4 Python implementation of the experiments can be found in https://gitlab.com/emireys/elm4vdc-experiments.git
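The three metrics defined above can be computed directly from the tp, fp and fn counts of a class. The sketch below is illustrative: the function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not one stated in the text.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from per-class counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # assumed convention
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # assumed convention
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one class: 90 dips correctly assigned to it,
# 10 dips wrongly assigned to it, and 30 of its dips assigned elsewhere.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
# P = 0.90, R = 0.75, F1 = 1.35 / 1.65, approximately 0.818
```

In a multi-class setting such as the one studied here, the counts are taken per class, treating that class as positive and all the others as negative.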
Since F1 score can be used as a joint representation of precision and recall metrics, the experiment results will be mainly assessed by analyzing Table 3. By analyzing the rows, it is possible to notice those classes that are easily identifiable by means of any of the proposed hyperparameter combinations.
2. It gets the best average F1 score along the 20 classes, as shown in the penultimate row of the table.
3. It presents the lowest dispersion of F1 scores, as shown in the last row of the table.

The analysis of training and test times strengthens the choice of this option.

Discussion
Although there is a large number of initiatives regarding the detection, classification and diagnosis of PQDs, most of them classify disturbances according to their types instead of according to their underlying causes. The diagnosis approach is a difficult and challenging task, but it also has greater practical value. Erişti & Demir (2010), Erişti et al. (2013) and Erişti et al. (2014)

Conclusions
This work shows the potential of a SLFN/ELM based approach for the diagnosis [...] The experiments also make it possible to identify the strategy that presents the best predictive performance and that is, at the same time, the most attractive from a pragmatic point of view. That is, the one-against-all strategy is a scalable approach, and the 750-tanh hyperparameter combination yields a simple model with high predictive scores at short training and test times.
5 The dataset and Python implementation of the experiments can be found on https://gitlab.com/emireys/elm4vdc-experiments
Scalability is an extremely important issue, since in a real environment new causes for disturbances could be identified and the diagnostic technique should be able to incorporate them into its internal models. The one-against-all approach makes it possible to extend such models by building SLFN/ELMs specifically trained to diagnose the disturbances according to the newly identified cause. The re-training of the previous models is required in this case but, as previously shown, this is quite fast. In contrast, the gradual incorporation of new knowledge is not possible when the one-for-all strategy is applied. It requires building and training a full new model from scratch to consider new causes for disturbances, and this is quite costly in terms of time and effort.
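The extension mechanism described above can be sketched as a dictionary holding one binary SLFN/ELM per cause. This is an illustration under stated assumptions, not the code of the experiments: the function names, the cause labels "A" to "D", the 2-D toy data and the 40 hidden nodes used in the demonstration are hypothetical; the default of 750 tanh hidden nodes mirrors the 750-tanh combination mentioned in the text.

```python
import numpy as np

def train_binary_elm(X, y, n_hidden=750, seed=0):
    """Fit one tanh SLFN/ELM for a binary target y (1 = this cause)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    B = np.linalg.pinv(np.tanh(X @ W + b)) @ y
    return W, b, B

def train_one_against_all(X, labels, n_hidden=750):
    """One binary model per cause; every model sees all the samples."""
    return {str(c): train_binary_elm(X, (labels == c).astype(float),
                                     n_hidden, seed=i)
            for i, c in enumerate(np.unique(labels))}

def diagnose(models, X):
    """Assign each sample to the cause whose model gives the highest score."""
    causes = np.array(list(models))
    scores = np.column_stack([np.tanh(X @ W + b) @ B
                              for W, b, B in models.values()])
    return causes[scores.argmax(axis=1)]

# Toy data (hypothetical): each cause is a cluster in a 2-D feature space.
rng = np.random.default_rng(0)
centers = {"A": (-3.0, -3.0), "B": (3.0, -3.0), "C": (-3.0, 3.0)}
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers.values()])
labels = np.repeat(list(centers), 50)
models = train_one_against_all(X, labels, n_hidden=40)
acc = float(np.mean(diagnose(models, X) == labels))

# A new cause "D" is identified: one model is added for it and the previous
# ones are re-trained, since its samples are new negatives for them.
X_ext = np.vstack([X, rng.normal((3.0, 3.0), 0.5, size=(50, 2))])
labels_ext = np.concatenate([labels, np.repeat("D", 50)])
models_ext = train_one_against_all(X_ext, labels_ext, n_hidden=40)
acc_ext = float(np.mean(diagnose(models_ext, X_ext) == labels_ext))
```

Because each binary model has a single output and is fitted in closed form, adding a cause amounts to repeating a handful of pseudo-inverse computations rather than redesigning a multi-output network.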
It is important to mention that the hyperparameter settings assessed in this work also respond to the intention to contrast the performance of the one-for-all (Strategy 1) and one-against-all (Strategy 2) approaches. In line with that intention, Strategy 2 only evaluates half of the minimum number of hidden neurons considered by Strategy 1 and applies only one of the available activation functions. However, despite this drastic reduction in the complexity of the model, Strategy 2 emerges as the best option for the task under study in terms of generalization performance, training times and testing times. These results make evident the impact of each approach on the performance metrics, the architectural features (in terms of scalability), and the extensibility (through gradual incorporation of new knowledge) of the overall solution.

It is known that a laboratory experiment suffers from a certain lack of realism, but it must also be recognized that a field study involving the monitoring and diagnosis of voltage events is quite expensive from both a technical and a legal point of view. Even taking these limitations into account, the results encourage researchers in the field to formulate further hypotheses to be tested in subsequent experiments.
A first scenario to be analyzed involves the assessment of the proposal under different noisy conditions. An even more interesting proposal is to assess the classification performance of the models on a power network different from the one on which they were trained. Such experiments would make it possible to test the impact of a