Characterizing Long-term Wear and Tear of Ion-Selective pH Sensors

The development and validation of methods for fault detection and identification in wastewater treatment research today rely on two important assumptions: (i) that sensor faults appear at distinct times in different sensors and (ii) that any given sensor will function near-perfectly for a significant amount of time following installation. In this work, we show that such assumptions are unrealistic, at least for sensors built around an ion-selective measurement principle. Indeed, long-term exposure of sensors to treated wastewater shows that sensors exhibit important fault symptoms that appear simultaneously and with similar intensity. Consequently, our work suggests that the focus of research on methods for fault detection and identification should be reoriented towards methods that do not rely on the assumptions mentioned above. This study also provides the very first empirically validated sensor fault model for wastewater treatment simulation and we recommend its use for effective benchmarking of both fault detection and identification

Methods that are finding their way into practice today mainly consist of sanity checks. In the authors' experience, these work rather well to detect and classify a subset of commonly recognized fault symptoms, including outliers, spikes, stuck values, and out-of-range values. For sensor faults that lead to more subtle symptoms, current practice relies primarily on regular on-site sensor maintenance, e.g. once every one or two weeks. For unstaffed wastewater treatment plants, on-site maintenance may be economically feasible only if it is limited to once per year. This practical constraint on the adoption of quality assessment and control practices forms the primary motivation for this study.
The literature suggests that data-analytical techniques can enable automated and remote detection of sensor faults. Without exception, such techniques rely on redundant relationships and can therefore be categorized by the type of redundancy that is used. A first category consists of techniques relying on reference measurements and computing a deviation between the online sensor signal and the reference signal. A second category relies on hardware redundancy by placing multiple online sensors, possibly built around a distinct measurement principle, in the same location and then computing deviations between them. A third category relies on temporal redundancy, essentially assuming that meaningful changes in the sensor signal can only be smooth when measured with a sufficiently high frequency. Finally, the fourth category relies on spatial redundancy, relating signals produced at distinct locations or for different measured variables. Examples of this last category include both methods based on first principles, e.g. balance equations, as well as methods rooted in statistical practice, e.g. principal component analysis. Importantly, each of these advanced methods requires tuning to maximize the number of true alarms and to ensure suitable quality control efforts while simultaneously minimizing the number of false alarms and futile maintenance actions. Invariably, such tuning is obtained by means of a historical, fault-free data set from which acceptable limits for computed residuals are derived. Consequently, these methods rely on the availability of representative data of an acceptable quality. In addition, the use of most techniques implies that sensor fault symptoms can be assumed to appear independently from each other, i.e. the probability that two faults start at the same time is assumed to equal zero.
The prevalence of faults in actuators, sensors, and processes, as well as the complexity of the fault detection and identification (FDI) task, has led to a plethora of methods that exploit one or more of the types of redundancy discussed above. In fact, the wealth of literature as well as the number of reviews on this or related topics (Venkatasubramanian et al., 2003c,a,b; Haimi et al., 2013; Corominas et al., 2018) suggest that the science and practice of FDI is anything but settled, an observation also supported by no free lunch theorems (Wolpert, 1996).
Despite the tremendous amount of research on FDI methods, little is actually known about the cause-and-effect relationships between sensor ageing, the occurrence of sensor faults and failures, and the production of faulty data. This is explained by the fact that the availability of information describing the exact circumstances under which faults occur or faulty data are produced, i.e. meta-data, is usually severely limited. This is the secondary motivation of this study.
To facilitate performance evaluation of FDI tools, the formulation of simulation benchmarks has been an accepted practice in engineering sciences (Barty et al., 2006; Downs and Vogel, 1993). Similarly, the Benchmark Simulation Model No. 1 was conceived as a way to test and compare innovative FDI and control strategies (Jeppsson et al., 2007). Today, it is primarily used as a starting point for a family of plant-wide models of water resource recovery facilities (Nopens et al., 2009; Volcke et al., 2006). Actual benchmarking of FDI methods has been limited to one study so far (Corominas et al., 2011).
The BSM family includes a set of sensor models which include sensor faults and this allows the user to add realism to the sensor signals. The simulated sensor faults always start at a time that is substantially later than the start of the simulated time. This provides ideal conditions for FDI method tuning as high-quality sensor data are always present in the first sections of the simulated data set. Moreover, a simulated fault always appears independently of any other sensor fault, i.e. no two sensor faults are simulated to start at the same time or with the same direction or magnitude. We expect that the situation in real-world conditions is very different. We thus hypothesize that typical fault symptoms will appear at the same time and with similar directions and magnitudes when sensors are exposed to the same harsh medium, especially when the same measurement principle is applied. Evaluating the merit of this hypothesis is the tertiary motivation of this study.
The following paragraphs are focused on the results and conclusions drawn directly from experimental data obtained during a long-term sensor exposure experiment. Additional insight is however obtained by studying a variety of dynamic models to describe our measurements.

Materials & Methods
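2.1. Theoretical and real-world behavior of the ion-selective electrodes for pH measurement

The ion-selective measurement principle for pH measurement is understood rather well. According to the Nernst equation (Westcott, 2012), one measures an electric potential $E$ (in mV), which is related to the activity of the protons, $[H^+]$, in the measured medium in steady state:

$$E = E_0 + \frac{R\,T}{F}\,\ln\!\left(\frac{[H^+]}{[H^+]_0}\right) \tag{1}$$

where $E_0$ is the reference potential, $F$ is the Faraday constant (96485.33289 C mol$^{-1}$, Taylor et al., 2007), $[H^+]_0$ is the proton activity in the reference cell, $R$ is the molar gas constant (8.3144598 J mol$^{-1}$ K$^{-1}$, Taylor et al., 2007), and $T$ is the temperature measured in Kelvin. The pH is defined as $\mathrm{pH} = -\log_{10}([H^+])$ ([…] et al., 2002) so that

$$E = E_0 - S(T)\,(\mathrm{pH} - \mathrm{pH}_0)$$

where $\mathrm{pH}_0$ is the pH in the reference cell and $S(T)$ is the temperature-specific sensitivity, which can be computed as:

$$S(T) = \frac{R\,T\,\ln(10)}{F} \tag{2}$$

With the constants given above, (2) yields a value in V per pH unit, so that a factor 1000 is needed to express it in mV per pH unit. Most typically, pH sensors are designed to deliver 0 mV at pH 7 so that $E_0$ is theoretically 0 mV. Similarly, the theoretical sensitivity at standard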

Long-term exposure experiment
The sensors are exposed to the contents of a reactor used primarily to study advanced control strategies for nitrite accumulation prevention in a urine nitrification process (Thürlimann et al., Submitted). To this end, the nitrified urine is pumped through a closed PVC tube at a flow rate of 43 L/h. The design of this tube, equipped with sensor-holding locks, is shown in the Supplementary Information (Section B).
The treated urine is of anthropogenic origin during the whole experimental period. The treated urine was collected from male lavatories in the Forum Chriesbach building at Eawag, with exception of the period from day […]

Raw potential measurements recorded during P1, P2, and P3 are used to compute the offset ($\tilde{E}_0$) and two measurements of the sensitivity ($\tilde{S}_D$ and $\tilde{S}_R$). In line with Carr (1993), the following steps are applied for every sensor and every sensor characterization test:

1. Compute the median value among the potential measurements collected in P1, P2, and P3 between 2 and 1 minutes before the start of the next phase (P2, P3, and P4). Refer to these values as $\tilde{E}_{P1}$, $\tilde{E}_{P2}$, and $\tilde{E}_{P3}$.
2. The sensor offset is defined as $\tilde{E}_0 = \tilde{E}_{P2}$.
3. The decay potential sensitivity is defined as $\tilde{S}_D$ = […].
4. The rise potential sensitivity is defined as $\tilde{S}_R$ = […].
These steps are demonstrated below with a practical example.
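The steps above can be sketched in a few lines of Python. This is an illustration only: the function names and the data layout (lists of time and potential pairs per phase) are hypothetical, and the two sensitivity formulas are assumptions of this sketch, since they are not spelled out above. Here the sensitivities are assumed to be the potential change between consecutive phases divided by the pH difference between the two buffers, with pH 4.01 assumed in P1 and P3 and pH 7.00 assumed in P2.

```python
from statistics import median

PH_LOW, PH_HIGH = 4.01, 7.00  # assumed buffer pH values (P1/P3 and P2)


def phase_potential(samples, next_phase_start_s):
    """Median potential (mV) in the window 2 to 1 minutes before the next phase.

    samples: iterable of (time_s, potential_mV) pairs recorded in one phase.
    """
    window = [e for t, e in samples
              if next_phase_start_s - 120.0 <= t <= next_phase_start_s - 60.0]
    if not window:
        raise ValueError("no samples in the evaluation window")
    return median(window)


def characterize(e_p1, e_p2, e_p3):
    """Return (offset, decay sensitivity, rise sensitivity).

    The two sensitivity formulas are assumptions made for this sketch.
    """
    delta_ph = PH_HIGH - PH_LOW
    offset = e_p2                        # step 2: offset equals the P2 potential
    s_decay = (e_p1 - e_p2) / delta_ph   # assumed: P1 -> P2 (potential decays)
    s_rise = (e_p3 - e_p2) / delta_ph    # assumed: P2 -> P3 (potential rises)
    return offset, s_decay, s_rise
```

Using the median rather than the mean keeps the extracted potential insensitive to isolated spikes within the one-minute evaluation window.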

Drift model
The results shown below indicate that the offset significantly varies over time while the sensitivity remains remarkably stable in all studied sensors.
We describe the observed drift of the offset by means of two models.
2.5.1. Model 1 - Constant trend followed by linear trend

For the first model, we apply a modified version of the excessive drift model proposed for the BSM family (Rosén et al., 2008). This model simulates $E_0(t)$, the sensor offset, as:

$$E_0(t) = d_o + r_d\,(t - t_f)\,H(t - t_f) \tag{3}$$

Values for the 4 parameters $d_o$, $t_f$, $r_d$, and $\sigma_\epsilon$ are obtained independently for all sensors through maximum likelihood estimation (MLE). Once calibrated, the model is used to obtain the estimated mean and point-wise standard deviations for the sensor offset, $\mu_1(t) = \mathrm{E}(E_0(t))$ and $\sigma_1(t)$, while using the estimates of $t_f$ and $\sigma_\epsilon$ as fixed hyperparameter values.
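As an illustration of how a model of this kind could be evaluated and fitted, the sketch below implements a constant-then-linear offset trend together with the Gaussian negative log-likelihood that follows from independent, normally distributed measurement errors. The function names are hypothetical and the code is not the authors' implementation, which is provided in their Supplementary Information. Minimizing the returned value over the four parameters with any general-purpose optimizer would give maximum likelihood estimates.

```python
import math


def drift_mean(t, d_o, r_d, t_f):
    """Offset (mV) that stays at d_o until t_f and then changes at rate r_d."""
    return d_o + r_d * (t - t_f) if t >= t_f else d_o


def negative_log_likelihood(times, offsets, d_o, r_d, t_f, sigma_eps):
    """Gaussian negative log-likelihood for i.i.d. errors with std sigma_eps."""
    nll = 0.0
    for t, y in zip(times, offsets):
        residual = y - drift_mean(t, d_o, r_d, t_f)
        nll += 0.5 * math.log(2.0 * math.pi * sigma_eps ** 2)
        nll += 0.5 * (residual / sigma_eps) ** 2
    return nll
```

Because the trend is not differentiable at the drift onset time, a derivative-free or grid-based search over that parameter is a reasonable design choice in such a fit.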
2.5.2. Model 2 - Integrated Brownian motion for a single sensor

In model 2, we assume instead that the recorded offset measurements are generated by an integrated Brownian motion. This is a continuous-time stochastic process, which reflects that the drift rate is subject to unmeasured disturbances (Eqs. 5-7). This model also includes 4 parameters: the initial drift rate ($r_{d,o}$); the initial offset ($d_o$); an input noise standard deviation controlling the rate by which the drift rate changes ($\sigma_\gamma$); and an output noise standard deviation ($\sigma_\epsilon$). As with model 1, parameter values are obtained through MLE. This is achieved by formulating the above process as a Gaussian process (Rasmussen and Williams, 2006). This also makes it possible to compute expected values and associated point-wise standard deviations, $\mu_2(t) = \mathrm{E}(E_0(t))$ and $\sigma_2(t)$, with the estimates of $\sigma_\gamma$ and $\sigma_\epsilon$ now used as fixed hyperparameter values.
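A minimal sketch of an integrated Brownian motion is given below, under assumptions that are not taken from the study: the drift rate performs a Brownian motion with intensity `sigma_gamma`, the offset is the time integral of the drift rate, and the process is simulated with a simple Euler scheme with step `dt`. The second function gives the standard prior covariance of an integrated Brownian motion with fixed initial conditions, which is the kind of kernel one would use when treating the process as a Gaussian process. All names are hypothetical.

```python
import math
import random


def simulate_offsets(n_steps, dt, d_o, r_d0, sigma_gamma, sigma_eps, seed=0):
    """Simulate noisy offset measurements from an integrated Brownian motion.

    Euler scheme: the drift rate takes Gaussian steps, the offset integrates it.
    """
    rng = random.Random(seed)
    rate, offset = r_d0, d_o
    measurements = []
    for _ in range(n_steps):
        offset += rate * dt                        # integrate the drift rate
        rate += sigma_gamma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        measurements.append(offset + sigma_eps * rng.gauss(0.0, 1.0))
    return measurements


def ibm_covariance(s, t, sigma_gamma):
    """Prior covariance of the offset at times s and t (fixed initial conditions)."""
    lo, hi = min(s, t), max(s, t)
    return sigma_gamma ** 2 * lo ** 2 * (3.0 * hi - lo) / 6.0
```

Setting `sigma_gamma` to zero reduces the simulated offset to a straight line, which makes the role of the input noise as the source of slowly wandering drift rates easy to inspect.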

Model 3 - Integrated Brownian motion for multiple sensors
A third model is derived from Eqs. 5-7 by considering that two sensors of the same type may be characterized by distinct initial conditions (r

Model evaluation
The proposed models are evaluated through visual inspection of the measurements, predictions, and residuals between the measurements and predictions. In the present case, such a visual inspection is considered sufficient to select a suitable model.

Implementation
All data collected during the sensor characterization tests and all code necessary to reproduce our results are added in the Supplementary Information (Section A).

3.2. Long-term trends in the offset measurements within the warranty period

[…]). This is however only obvious when comparing these measurements with the simultaneous T1b/T2b/T3b measurements (see the Supplementary Information, Section D). In all cases, except for the T4 and T5a/b pairs, the difference between offsets in sensors of the same type remains rather small within 1 year of installation, with a maximal difference of 16.7 mV recorded with the T2 sensors. Taking the 0.1 pH threshold discussed above as a guideline, one could propose to validate and calibrate the sensors when their potential measurements are 5.9 mV apart. This happens for the first time for the T1, T2, and T3 sensors on day 127, 79, and 309, respectively. By these times, the absolute offsets are already larger than this accepted threshold so that the relative difference between sensors of the same type is unlikely to be a good measure to trigger sensor maintenance.

[…] 3). The commercially available sensors (T1-T4) exhibit drift from the start of installation while the prototypes (T5) exhibit close to no drift when otherwise functioning properly. A significant shock effect is observed for the T4 sensors at the start of the experiment but not for any other sensor.

Sensor characterization tests: Example
suggests that the offset difference between sensors can be predictive of the offset in an individual sensor. The right panel shows that this is less likely to be successful for sensors of the same sensor type, as also described above. This is considered an important opportunity for further research, which we discuss further below.

Long-term trends in the offset measurements beyond the warranty period
The offset measurements obtained after the warranty period expired exhibit two phenomena that are surprising (Fig. 3). The first phenomenon is the rise of the offset of the T1a sensor after 480 days of exposure and a similar rise of the offset of the T1b sensor after 630 days of exposure. Considering that this appears at distinct times in the lifetime of the T1 sensors, this cannot be explained as a direct effect of medium composition changes. Based on information provided by the sensor manufacturer, this type of drift rate sign reversal is unique to the T1 sensors and is unlikely to be observed with any other sensor type covered in this study. It is the opinion of the authors that the time of this reversal is difficult to predict in advance. For this reason, this phenomenon is best handled as an unmeasured process disturbance.
The second phenomenon consists of the rather flat to increasing profile of the offset measurements in the T2 and T3 sensors between day 360 and day 480. Before and after this period, the drift rates in these sensors are visually similar. Given the synchronicity of this effect between 4 pH sensors, it is hypothesized that this change in the drift rate is influenced by the deliberate addition of nitrite in the form of NaNO$_2$ salt to the reactor contents from day 366 to 417. The nitrite addition affected the biomass concentration and the concentrations of all dominant nitrogen species (ammonia, nitrite, nitrate, see Supplementary Information, Section C) and may also have affected the ionic strength and conductivity of the reactor contents. Due to this combination of effects, the available data only offer an incomplete understanding of the complete chain of causes and effects between the nitrite addition and the observed changes in the sensor drift rates. For this reason, the effects of changing media composition on the sensor drift rate are best also considered an unmeasured process disturbance.
3.4. Long-term trends in the sensitivity measurements

Fig. 5 displays the computed sensitivity measurements for the potential rise ($\tilde{S}_R$) during the complete experimental period. These measurements do not exhibit strong trends in any particular direction. The sensitivity measurements fall between 54.9 and 62.1 mV per pH unit. This means that one can expect to measure a pH value between 5.95 and 6.08 when (i) the true pH value is 6 and (ii) any offset is corrected for. The same graph also shows the theoretical value of the sensitivity according to (2) and the recorded temperature. This profile is very similar to the recorded sensitivity profiles and explains most of the variations in the sensitivity measurements, which are small anyway. The same conclusions are drawn from the computed sensitivity measurements for the potential decay ($\tilde{S}_D$, see Supplementary Information, Section E).

Figure 5: Sensitivity measurements for the potential rise as a function of time. Vertical lines indicate a change of installed sensors (see Fig. 1). Grey bands indicate a change of reactor medium (see Section 2.3). A black line shows the theoretically expected sensitivity computed with (2). Variations in the sensitivity are small and follow the theoretical sensitivity closely.

Drift models
For practical intents and purposes, the sensitivity when corrected for temperature variations can be considered constant for the considered process and sensors. We therefore focus on further analysis of the offset measurements.

Left panel: Offset confidence bounds ($\mu \pm 2\sigma$) obtained with models 1 ($\mu_1$, $\sigma_1$), 2 ($\mu_2$, $\sigma_2$), and 3 ($\mu_3$, $\sigma_3$). Right panel: Residuals between expected values ($\mu$) and measured potentials ($\tilde{E}_0$). Model 1 does not describe the data well, leading to larger confidence bounds and auto-correlated residuals. Models 2 and 3 fit the data well and their predictions are hard to distinguish from each other.

Discussion
This study presents the first peer-reviewed results with which the effect of long-term wear-and-tear on water quality sensors deployed in wastewater treatment plants is assessed and evaluated in a systematic manner and at this scale (12 sensors). The experimental results reveal that commonly held assumptions regarding the occurrence of sensor faults and fault symptoms are false. First, it is demonstrated that drift in pH sensors occurs simultaneously in all commercially available sensors. Second, it is demonstrated that drift occurs as soon as a sensor is deployed in the measured medium.
In some cases, the immediate onset of drift is paired with a significant shift in the offset. Importantly, the data needed to compute the offsets and sensitivities as a function of time are also available in modern pH instruments in the form of a calibration logbook that can be accessed through standardized communication protocols (e.g., Modbus).
These observations have important consequences for the development of methods for fault detection and identification (FDI). Indeed, (i) one cannot assume that faults appear independently in distinct sensors and (ii) one cannot assume to have access to a fault-free historical data set. Naturally, this also holds in the context of simulation-based benchmarking of FDI methods.
Consequently, it is our opinion that the development of FDI methods and model-based benchmarking should be focused on methods that do not rely on such assumptions.
Fortunately, our results also reveal a number of opportunities for the use and maintenance of ion-selective measurements. First, the prototype sensors tested in this study exhibit a remarkably stable offset. While these sensors appear prone to failure, as one might expect from a prototype, this suggests that practically drift-free yet economical pH sensors will enter the market soon. Second, the recorded sensitivity measurements in all sensors hover around the ideal values and are remarkably stable throughout the experimental period. Such a stable sensitivity lends support to advanced monitoring and control strategies which are inherently robust to changes in the offset but still assume a rather stable sensitivity (Villez and Habermacher, 2016; Thürlimann et al., 2018a,b). Third, it was shown that the offset difference between two pH sensors in the same medium can be predictive of the offset of the individual pH sensors, but only if two sufficiently distinct sensor types are selected. Combined with a stable sensitivity, this means that the deviation between two online pH sensor signals could be used as a proxy for the deviation in each individual sensor. Such a proxy measurement could be very useful for remote sensor quality assessment and predictive sensor maintenance, especially since one can compute such deviations between online sensor signals while the sensors remain in their normal measurement location in the monitored reactor.
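The proxy idea can be illustrated with an ordinary least squares regression of one sensor's offset on the offset difference between two sensors of different types. The numbers below are synthetic and chosen only for illustration, the linear relationship is an assumption suggested by the left panel of Fig. 4, and the function name is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic offsets (mV) of two sensors of different types, not measured data.
offset_a = [0.0, -5.0, -10.0, -15.0, -20.0]
offset_b = [0.0, -15.0, -30.0, -45.0, -60.0]
difference = [a - b for a, b in zip(offset_a, offset_b)]

slope, intercept = fit_line(difference, offset_b)
# Estimated offset of sensor b when the two signals are 20 mV apart.
predicted_b = slope * 20.0 + intercept
```

For two sensors of the same type the difference stays close to zero, so the regressor carries little information and the fitted slope would be poorly determined, in line with the right panel of Fig. 4.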
The obtained offset measurements were studied in more detail by comparing the fit of 3 models. From this, it is concluded that the excessive drift model included in the BSM family (Rosén et al., 2008; Gernaey et al., 2014) cannot adequately describe the naturally occurring drift in ion-selective electrodes. Instead, the proposed stochastic model, specifically an integrated Brownian process, delivers a good description of the obtained data sets. In the authors' opinion, such a model should be included in the BSM family for realistic simulation of measurements obtained through ion-selective measurement principles. The obtained model also enables prediction of the expected offset measurement and associated confidence intervals beyond the last measurement. This means that such a model can be used for predictive sensor maintenance, e.g., by planning a new sensor validation and/or calibration before the predicted confidence interval exceeds a predetermined tolerance, each time also updating the parameters of the stochastic model. For this, confidence intervals for the reference potential ($E_0$) rather than for the measurements ($\tilde{E}_0$) are expected to be most useful. Exploring the utility of this idea is considered for future research.
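The maintenance planning idea can be sketched as follows, under strong simplifying assumptions that are not taken from the study: the offset and drift rate at the last characterization test are treated as exactly known, the offset uncertainty then grows as for an integrated Brownian motion (variance $\sigma_\gamma^2 \Delta^3 / 3$ after $\Delta$ days), and a validation is planned for the first day on which the 2-sigma band leaves the tolerance. The function name and arguments are hypothetical.

```python
import math


def days_until_validation(offset, drift_rate, sigma_gamma, tolerance, horizon=730):
    """First day ahead on which |mean| + 2*std exceeds the tolerance (mV).

    Returns None if the tolerance is not exceeded within the horizon.
    """
    for day in range(1, horizon + 1):
        mean = offset + drift_rate * day
        std = sigma_gamma * math.sqrt(day ** 3 / 3.0)
        if abs(mean) + 2.0 * std > tolerance:
            return day
    return None
```

A full treatment would also propagate the uncertainty of the estimated offset and drift rate, which shortens the predicted interval further.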

Conclusions
Despite the abundance of literature on fault detection and identification (FDI) methods, little is actually known about the cause-and-effect relationships between the exposure of water quality sensors to harsh conditions, such as wastewater media, and the occurrence of sensor faults and failures. This first long-term study of the ageing of 12 individual pH sensors gives valuable insight into this challenge. First, it is concluded that commonly held assumptions in FDI method development and evaluation, such as the availability of fault-free historical data and independent onsets of sensor faults, are invalid for pH sensors based on the ion-selective measurement principle. In addition, the effects of offset drift in redundant sensors are unlikely to be identified early if these sensors are of the exact same type and exposed to the same medium.
A stochastic model is shown to offer a good description of the observed drifts of the sensor offsets and performs better than a previously established drift model. Finally, our results suggest that newly developed pH sensors which exhibit stable offsets will enter the commercial market soon.
ambient temperature and pressure (SATP) thus is S(298.15) = 59.1593 mV per pH unit. Because the actual values of these parameters tend to deviate from their theoretical values, it is common to identify their values through a 2-point calibration procedure. At the engineering department at Eawag, the most common practice is to use buffered calibration media with pH 4.01 and 7.00 for validation, followed by calibration when the absolute deviations between the produced pH measurements and the known pH values exceed a predetermined threshold. The data end user sets this threshold. Depending on the application, this ranges from 0.1 to 0.4 pH units. The theoretical potential at pH 4.01 and SATP is 177.0 mV.

2.2. Studied sensors

A total of 12 pH sensors, all produced by Endress+Hauser (Reinach, Switzerland), are studied. These sensors consist of 5 sensor types (T1-T5) whose exact type cannot be revealed due to a confidentiality agreement. The first eight sensors consist of pairs of four commercially available sensor types (T1-T4) which are typically sold with a one-year warranty agreement. The first (second) sensor in each pair is designated with an a (b), e.g. T1a, T1b. The last 4 pH sensors are replicates of a recently developed sensor prototype (T5) and are referred to as T5a, T5b, T5c, and T5d. The first three sensor pairs (T1-T3) have been in use throughout a long-term exposure experiment which lasted for 731 days (Oct. 4th, 2016 to Oct. 4th, 2018). An overview of this experiment is given in Fig. 1. The 4th pair (T4) has been in use during the first half year and was replaced with the 5th pair (T5) on April 3rd, 2017 (day 182) as (i) the T4 sensors exhibit a long response time (not shown) and (ii) the opportunity arose to test the T5 prototypes. The T5a sensor stopped producing a meaningful signal on June 30th, 2017 (day 270) while T5b became faulty (details below) on August 31st, 2017 (day 332). These sensors were replaced with another pair of sensors of the same prototype (T5) on Oct. 2nd, 2017 (day 364). In this last pair, one sensor (T5d) failed within 1 day (day 365) while the other (T5c) has been fully functional until the end of the experiment.
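The theoretical sensitivity quoted for Section 2.1 can be checked numerically. The sketch below assumes the standard Nernstian expression R T ln(10) / F for the sensitivity and uses the constant values cited above; the function names are hypothetical.

```python
import math

R = 8.3144598        # molar gas constant, J mol^-1 K^-1
F = 96485.33289      # Faraday constant, C mol^-1


def sensitivity_mv_per_ph(temperature_k):
    """Theoretical (Nernstian) sensitivity in mV per pH unit."""
    return 1000.0 * R * temperature_k * math.log(10.0) / F


def ph_threshold_to_mv(delta_ph, temperature_k=298.15):
    """Potential difference (mV) equivalent to a given pH deviation."""
    return delta_ph * sensitivity_mv_per_ph(temperature_k)
```

At 298.15 K the first function reproduces the 59.1593 mV per pH unit stated above, and a 0.1 pH threshold corresponds to roughly 5.9 mV.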

Figure 1: Overview of the complete experimental campaign. The periods of sensor exposure are indicated by rectangles. The periods during which the sensors produced meaningful data are marked black.

Figure 2: Exemplary sensor characterization test. Raw data obtained in the first sensor characterization test with sensor T1a. The measured potential decays during P0, P2, and P4, while it increases during P1 and P3. Steady state is reached quickly in P1, P2, and P3. The theoretical potential values for P1, P2, and P3 are indicated with dashed horizontal lines. Grey shading indicates the data used to obtain the potential measurements (2 to 1 minute before phase change). The selected median potential values are shown with red crosses.
with $d_o$ the initial offset, $r_d$ the drift rate parameter, $H(\cdot)$ the Heaviside function ($H(a) = 1$ if $a \geq 0$, $H(a) = 0$ otherwise), $t$ the time since sensor installation, and $t_f$ the time of the drift onset. The applied modification consists of adding the parameter $d_o$. To fit this model, the offset measurements, $\tilde{E}_0(t_h)$, collected at discrete time instants $t_h$, are assumed to exhibit independently and identically distributed measurement errors, $\epsilon_h$, drawn from a normal distribution with zero mean and standard deviation, $\sigma_\epsilon$:

$$\tilde{E}_0(t_h) = E_0(t_h) + \epsilon_h, \qquad \epsilon_h \sim \mathcal{N}(0, \sigma_\epsilon^2) \tag{4}$$

Fig. 2 shows the data obtained in the first sensor characterization test with sensor T1a on Oct. 6th, 2016 (day 3). The raw potential measurement

Fig. 3 displays the measured offsets in all sensors throughout the experimental period. The recorded values collected within the warranty period (1 year) range from approximately 0 mV (no offset) to roughly −70 mV. All commercially available sensors (T1-T4) produce a decaying trend in the offsets. The first recorded offsets for the T1-T3 sensors are small in magnitude and concentrate around 0 mV. In contrast, the T4 sensors' offset values indicate a shock effect producing a shift of −20 and −45 mV (T4a, T4b) within days of installation. This is explained by the manufacturer as an effect of the high ammonium concentration in the medium and should only be expected for this specific type of sensor. The accumulated drift in the T1 sensors is at most −25 mV after one year while the T2 and T3 sensors exhibit an offset of −75 mV after one year. Without calibration, this means

Fig. 4 shows offsets for the sensors T1a, T3a, and T3b collected in the first year of the experiment as a function of the difference in the offset between T1a and T3a (left panel) and T3b and T3a (right panel). The left panel

Figure 4: Offset measurements as a function of relative deviations in the offset measurements. Left panel: Offsets of sensors T1a and T3a as a function of the difference of these offsets. These data are suggestive of a close to linear relationship between sensor offsets and the offset difference. Right panel: Offsets of sensors T3a and T3b relative to the difference of these offsets. The difference in offset remains small and there is no obvious relationship in this case.