Cybersecurity Concerns for Total Productive Maintenance in Smart Manufacturing Systems

Maintenance is the core function that keeps a system running and avoids failure. Total Productive Maintenance (TPM) is a broadly utilized maintenance strategy for improving customer satisfaction and hence obtaining a competitive advantage. However, the complexity of smart manufacturing systems due to the recent advancements, specifically the integration of internet and network systems with traditional manufacturing platforms


Introduction
The increasing adoption of smart features in manufacturing systems has helped improve productivity, quality, and profit by revolutionizing the whole manufacturing paradigm (Figure 1). However, interconnectivity at the production level makes these systems vulnerable to cyber threats [1]. Even though most of these threats are not new to the IT world, they raise new cybersecurity concerns in manufacturing systems. Most research has addressed threats to intellectual property (IP), including its theft and modification, but this is not the only concern for smart manufacturing systems.
Due to the increasing utilization of cyber-physical systems, autonomous robots, and the Internet of Things (IoT), a cyber-physical threat not only harms the integrity of IP but can also disrupt the production process and even create a hazardous situation in the system [2].
Total productive maintenance (TPM) is one of the pillars of lean implementation and is intended to reduce disruption and increase productivity. The goals of TPM are zero breakdowns, no slow running, no defects, and a safe production environment in perfect condition [3]. However, cybersecurity threats can directly undermine these goals by provoking system failures, slow running, and quality problems [4,5].
This paper intends to demonstrate the effects of cybersecurity threats on the leanness of a system by discussing the impact of cyber-physical threats on the principles of TPM and on overall equipment effectiveness (OEE), its key performance indicator (KPI). A structured review of previous research provides evidence for the proposed concepts. While other researchers have mostly attempted to demonstrate the effect of cybersecurity threats in terms of monetary consequences [6,7], this paper emphasizes their effects on the leanness of the system and its performance.
The remaining contents are organized as follows. Section 2 discusses the new challenges that the recent integration of the cyber world with the traditional manufacturing world poses to the main principles of total productive maintenance. Section 3 explains the effects of these threats on each component of the OEE of the system, the main KPI of TPM. Section 4 discusses further cybersecurity concerns in lean smart manufacturing and concludes the paper with suggestions for future work.

The traditional TPM model consists of eight principles, also referred to as the pillars of TPM: autonomous maintenance, focused improvement, planned maintenance, quality management, development management, education and training, administrative and office TPM, and safety, health, and environment [8]. These pillars rest on the 5S method, which aims to create a well-organized work environment through five actions: sort, set in order, shine, standardize, and sustain [9]. In this section, the eight pillars are discussed with respect to the cybersecurity threats and concerns facing each of them.

Autonomous maintenance
This pillar is the mother pillar of TPM. It places routine machine maintenance, such as cleaning, lubrication, inspection, and adjustment, in the operator's hands, which gives the operator a sense of ownership of the machine. In this way, maintenance personnel are freed to perform higher-level tasks.
According to the literature [10,11], a system is subject to vulnerabilities whenever a human entity interacts with cyber entities. Introducing new and complicated machines for tasks such as inspection can be very challenging for operators. Since operators are normally less computer savvy, it is hard for them to recognize cyber threats and the alterations in their jobs that could result from a cybersecurity breach. Wells et al. [12] illustrate the weakness of operators' knowledge with a case study in which a virus altered the file transferred to a CNC machine; only a few groups detected the quality issue in the product, and none could diagnose the source of the problem.

Focused improvement
This pillar aims to improve overall equipment effectiveness (OEE) by minimizing waste in the system. Normally it is deployed by a small group of employees that regularly identify and resolve recurring problems in order to incrementally improve the operation of the equipment.
According to the research of Bekar et al. [13] on Industry 4.0, cybersecurity is a pressing issue for this pillar. They reached this conclusion by collecting the opinions of decision makers, including TPM, production, and quality managers, to measure the impact of the key technologies of Industry 4.0.

Planned Maintenance
Planned maintenance consists of prescheduled repairs, adjustments, or part replacements, informed by inspection, that increase the life span of the machine and prevent breakdowns. The schedule is normally created according to historical or predicted failure rates.

Figure 2. The traditional TPM model consists of a 5S foundation (Sort, Set in Order, Shine, Standardize, and Sustain) and eight supporting activities [8].

The cybersecurity concern for this pillar is to have proper prescheduled preventive maintenance not only for traditional machines but also for the network, mobile devices, RFIDs, hardware, and software [14][15][16]. Such planned maintenance is the main way to prevent or mitigate the impact of a cyber-attack.

Quality maintenance
Quality maintenance aims for zero defects in products and processes by analyzing the root causes of failures and defects in order to eliminate their source. In the new age of manufacturing, there are two challenges regarding product quality. The first is to detect a quality issue in a product, and the second is to find the source of the failure and eliminate it. This argument is discussed further in the OEE section.

Education and Training
Education and training is one of the most essential principles of TPM. It ensures that the employees and staff involved with TPM have the knowledge and skills needed for a successful deployment. This principle deserves more serious attention, since manufacturing systems often lack employees with the skills and knowledge needed to incorporate cybersecurity into the production system [17,18]. Training current and future TPM employees, including operators, maintenance staff, and managers, on potential cybersecurity threats plays a key role in a successful TPM implementation for smart manufacturing systems [19].

Safety Health Environment
This principle aims to maintain a safe and healthy environment for all staff and employees by eliminating potential health and safety risks in order to reach an accident-free workplace. It is also exposed to vulnerabilities from cybersecurity threats. According to Wu et al. [20], humans can be targeted as victims in a cyber manufacturing system. Operators and assembly workers working close to autonomous robots and machines are endangered when hackers can send malicious control commands to actuators [21].
Moreover, with the increasing use of safety-critical mechanical systems such as next-generation composite aircraft, sustainable energy, artificial heart valves, automated drug dispensary equipment, high-speed rail systems, and gas turbines, any interruption or alteration in their precise engineering could cause a significant safety problem [22][23][24][25].

Office TPM
Office TPM provides the necessary administrative support in all areas of the system and applies TPM techniques to administrative functions. It extends the benefits of TPM beyond the shop floor. Some losses addressed by office TPM are processing and communication losses, office equipment breakdown, and communication channel breakdown [26,27]. This is the area in which traditional IT cybersecurity is involved. Here, the main threats to the system are the compromise and theft of a company's intellectual property.

Early equipment management
Early equipment management, also called development management, aims to minimize the problems and the run-up time involved in installing new equipment. It also improves the development of new equipment by channeling the practical knowledge and understanding gained from TPM [28]. Clearly, for a smart manufacturing system, it is important to consider cybersecurity issues so that new equipment is developed with minimal problems.

Table 1 summarizes the cybersecurity concerns in manufacturing systems regarding each pillar of TPM and provides related literature for each pillar. As can be seen from this table, while some of the pillars have received more attention from the research community, others, such as the focused improvement and development management pillars, present opportunities for research.

Table 1. Cybersecurity concerns in manufacturing systems for each pillar of TPM

Pillar | Cybersecurity concern | References
Autonomous maintenance | Less computer savvy operators | [10][11][12][18]
Focused improvement | Consideration of losses regarding cyber-physical security | [13]
Planned maintenance | Planning effective PM to prevent and mitigate cyber-physical attacks | [14][15][16]
Quality maintenance | Detecting a quality issue regarding cyber-physical attacks and the ability to analyze its source | [29][30][31]
Education and training | Improving cybersecurity skills and awareness of operators, maintenance staff, and management | [17][18][19]
Safety, health, and environment | Eliminate incidents of injuries and accidents caused by cyber threats | [20][21][22][23][24][25]
Office TPM | Addressing traditional IT cybersecurity, securing intellectual properties, data and network | [13,26,27]
Development management | Minimal cybersecurity problems from new equipment | [28]

Overall Equipment Effectiveness
Overall equipment effectiveness (OEE) is the main KPI for measuring the effectiveness of TPM in a system. It identifies the percentage of planned production time that is truly productive. Equation 1 gives the model for calculating the OEE of a system:

$$\mathrm{OEE} = \mathrm{Availability} \times \mathrm{Performance} \times \mathrm{Quality} \qquad (1)$$

The model consists of three components that align with the TPM goals of no breakdowns (measured by availability), no small stops or slow running (measured by performance), and no defects (measured by quality); each component corresponds to one type of productivity loss. Each of these components can be affected by a cyber-physical attack, as discussed in the following subsections.
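The three-component model can be illustrated with a minimal sketch. The record fields, the function name, and the numbers below are hypothetical and are not taken from the paper; the component formulas follow the common textbook definitions (run time over planned time, ideal cycle time times total count over run time, good count over total count), which is an assumption about how the components are measured.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Hypothetical production record for one shift (illustrative only)."""
    planned_minutes: float      # planned production time
    stop_minutes: float         # breakdowns, changeovers, other long stops
    ideal_cycle_minutes: float  # fastest possible time per part
    total_count: int            # parts produced, good or bad
    good_count: int             # parts that passed quality control


def oee(shift: ShiftRecord) -> dict:
    """Return availability, performance, quality, and their product."""
    run_minutes = shift.planned_minutes - shift.stop_minutes
    if shift.planned_minutes <= 0 or run_minutes <= 0 or shift.total_count <= 0:
        raise ValueError("record must contain positive run time and output")
    availability = run_minutes / shift.planned_minutes
    performance = shift.ideal_cycle_minutes * shift.total_count / run_minutes
    quality = shift.good_count / shift.total_count
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Assumed baseline shift versus a shift in which an attack slowed the cycle.
baseline = ShiftRecord(500, 100, 1.0, 360, 342)
slowed = ShiftRecord(500, 100, 1.0, 300, 285)
for name, shift in (("baseline", baseline), ("slowed", slowed)):
    print(name, {k: round(v, 3) for k, v in oee(shift).items()})
```

In this made-up comparison, only the performance component changes between the two shifts, which is how an attack on process parameters would appear in OEE even when no breakdown is recorded.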

Availability
Availability takes into account losses that stop planned production for a considerable length of time (typically several minutes or longer). It includes breakdowns, repairs, changeovers, adjustments, and startups. Loss of system availability is a common result of cyber-attacks.
Industrial control systems could be a primary target for attackers in a manufacturing system. Due to the complexity of industrial control systems, attackers need a high level of skill and familiarity with the control system to execute an attack. As a result, these attacks are normally state or government funded. Li et al. [32] demonstrate that many physical processes controlled by a SCADA system could be targeted in a cyber-physical system.
The most famous attack to date is the Stuxnet worm, which attacked the Iranian nuclear enrichment plant in Natanz in 2009 and 2010. It does little or no harm to ordinary computers; instead, it checks whether the computer is connected to programmable logic controllers (PLCs). If so, it alters the PLCs' programming so that the centrifuges spin too fast for too long, which destroys the equipment [33,34]. Another incident is the infection of the industrial control system of a German steel mill, which caused the failure of multiple components of the system [35].
However, the threat to availability is not limited to the infection of control systems. Availability loss can also result from other mechanisms. For instance, the infection of computers in a manufacturing plant with the WannaCry ransomware caused Honda, the automobile manufacturer, to shut down production at that plant. This malware takes advantage of legacy systems, takes control of the infected computer, and demands payment via Bitcoin [36].

Performance
Performance considers losses that occur when the production system runs at less than its maximum possible speed. It includes slow cycles and short stoppages. In the case of a cybersecurity attack, many scenarios could lead to a longer cycle time and hence decrease the performance of the system.
One of these scenarios is a change to the process parameters of the system, which leads the system to perform at a lower yield rate. This applies to both additive and subtractive manufacturing. In subtractive manufacturing, simply changing the G-code or M-code to slow the spindle or the feed rate not only lengthens production but could also change the mechanical properties of the product [37,38]. The effect could be worse when metals and alloys are used, and weak or damaged components used in safety-critical systems could potentially endanger human lives [39,40]. Similarly, any alteration in the nozzle speed or the motion of the printer head in additive manufacturing could result in lower performance [41][42][43].
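The feed-rate scenario in subtractive manufacturing can be sketched as a simple pre-run check. This is a minimal illustration and not a method from the cited works: the function names, the sample programs, and the approved feed-rate band are hypothetical, and the sketch assumes a G-code dialect in which the F word gives the feed rate in units per minute and comments use parentheses or a semicolon.

```python
import re

# Matches a feed-rate word such as F600 or f120.5.
FEED_WORD = re.compile(r"F(\d+(?:\.\d+)?)", re.IGNORECASE)


def strip_comments(line: str) -> str:
    """Remove parenthesised comments and anything after a semicolon."""
    line = re.sub(r"\(.*?\)", "", line)
    return line.split(";", 1)[0]


def out_of_band(gcode: str, low: float, high: float) -> list:
    """Return (line number, feed rate) pairs outside the approved band."""
    findings = []
    for number, line in enumerate(gcode.splitlines(), start=1):
        for match in FEED_WORD.finditer(strip_comments(line)):
            rate = float(match.group(1))
            if not low <= rate <= high:
                findings.append((number, rate))
    return findings


# Hypothetical programs: the second has one feed rate silently reduced.
approved = "G21\nG1 X10 Y10 F600\nG1 X20 F600\n"
tampered = "G21\nG1 X10 Y10 F600\nG1 X20 F60 ; slowed\n"

print(out_of_band(approved, 500, 800))
print(out_of_band(tampered, 500, 800))
```

A check of this kind only flags values outside a band that the process engineer has agreed in advance; a small change inside the band would pass, so it complements rather than replaces integrity protection of the program file.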
Information delay has an adverse effect on the performance rate of a flexible manufacturing system and can disrupt production schedules [44]. Also, with the increasing demand for cloud manufacturing, any interruption of resource allocation, service composition, or service operation management could adversely affect system performance [45][46][47]. An attack that alters a manufacturing system can drastically reduce its performance by impairing communication or functionality [48].
One of the most significant vulnerabilities of manufacturing systems regarding performance is probably supply chain security. With a highly outsourced supply chain and dependence on suppliers, any interruption in the supply chain could delay the fulfillment of customer demand. In a recent incident, a virus attack on a supplier of Apple delayed the shipment of the company's products. The Taiwan Semiconductor Manufacturing Company, the world's largest chip manufacturer, was forced to shut down production for a few days. The company said that some of its computers and 80% of its manufacturing tools had been infected by a virus [49].

Quality
Quality takes into account losses that occur when a manufactured part or product does not meet quality standards. It includes product rejects, scrap, and rework. Wells et al. [12] mention quality control as one of the cybersecurity weaknesses of manufacturing systems. The first issue comes from silent cyber-attacks, in which the system cannot catch the quality changes made by a malicious attack. This could happen when the quality control process is under attack as well as the production system [22]. In these circumstances, there are two ways in which an attack could harm the system: first, by making the system accept bad parts, which does not affect OEE directly but causes problems later; and second, by inserting faulty criteria so that good parts are rejected. Zeltmann et al. [29] describe a situation in additive manufacturing in which the quality of a part is compromised by an alteration in the printing direction that would pass quality control completely undetected. Even when the integrity of the quality control process is intact, a variety of attack methods could directly affect the quality of a product [30]. These include altering part quality definitions, reporting falsified data, acquiring QC implementation data, altering the product design, and altering manufacturing processes. Sturm et al. [31] also describe the vulnerability of .STL files in additive manufacturing, which could cause quality issues.

Table 2 presents a few examples of cybersecurity incidents for each of the OEE components. It should be mentioned that one incident can sometimes harm more than one component of OEE. For example, an interruption in the availability of a company that supplies a bigger one could harm the performance of the bigger company.
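One basic safeguard against silent alteration of design files, such as the .STL files discussed above, is to compare a file's cryptographic digest with a digest recorded when the design was approved. The sketch below is illustrative and is not taken from the cited studies: the function names and the sample file content are hypothetical, and it assumes the approved digest is stored somewhere the attacker cannot modify.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the file content as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, approved_digest: str) -> bool:
    """Compare the current digest with the one recorded at approval time."""
    return hmac.compare_digest(sha256_hex(data), approved_digest)


# Hypothetical design file content, recorded once at approval.
approved_file = b"solid bracket\nendsolid bracket\n"
approved_digest = sha256_hex(approved_file)

# Later, before printing, the received file is checked again.
received_file = b"solid bracket\n facet normal 0 0 1\nendsolid bracket\n"
print(is_unaltered(approved_file, approved_digest))
print(is_unaltered(received_file, approved_digest))
```

A digest check detects changes made after approval but says nothing about a file that was already malicious when it was approved, so it addresses only part of the quality concern described in this section.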

Discussion and conclusion
The best way to defend a system and mitigate cybersecurity threats is to have a reliable TPM program that protects the system from these threats. To implement such a program, it is necessary to consider all factors involved in the interconnectivity of the cyber and physical domains in a smart manufacturing system. These factors could jeopardize the availability, performance, quality, and safety of the system, which are the primary goals of TPM.
Some of these threats could harm the system instantly by reducing its availability, performance, or quality. However, other threats target the integrity of the system and have an indirect effect on its productivity, which is sometimes not immediate. Another concern regarding the cybersecurity of a lean system with implemented TPM is the visibility of a cyber-attack. Although the number of attacks has increased over the past decade, their visibility has decreased, which means that it is getting harder to discover a cyber-attack in a system [12]. Designing a proper defense policy and quality assurance system to discover all interference in the system is therefore essential. Moreover, a low recovery time, also considered as repair time, is very important for an agile maintenance system. Mean time between failures (MTBF) and mean time to repair (MTTR) are the contributing factors to the availability of a system, and availability depends directly on a low repair time. Having a proper plan for recovering from an attack and completing all repairs in minimum time therefore plays an important role in the availability of the system.
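The dependence of availability on repair time can be made concrete with the standard steady-state relation, availability = MTBF / (MTBF + MTTR). The sketch below applies that relation; the function name and the numbers are hypothetical and serve only to show how a longer recovery from an attack lowers availability.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), the standard steady-state form."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Assumed figures: same failure frequency, different recovery times.
fast_recovery = steady_state_availability(mtbf_hours=200, mttr_hours=8)
slow_recovery = steady_state_availability(mtbf_hours=200, mttr_hours=50)
print(round(fast_recovery, 3), round(slow_recovery, 3))
```

With the failure frequency held constant, the only lever in this relation is the repair time, which is why a rehearsed recovery plan matters for availability.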
For future work, it is necessary to evaluate and assess cybersecurity in manufacturing systems in light of the unique characteristics that differentiate it from traditional IT security. Measuring the consequences of attacks in terms of overall equipment effectiveness rather than in monetary terms could provide new insight into this domain. Lastly, as shown in this paper, some of the pillars of TPM have not received enough attention regarding cybersecurity issues and could be considered as future research directions. Similarly, regarding OEE, most research focuses on the effect of cybersecurity concerns on availability and quality, which leaves performance as another future research opportunity.