INTRODUCTION
The purpose of this paper is to propose a meta methodology that promotes engineering safety by learning from previous system failures. The predominant worldview in IT engineering is that system failures can be prevented at the design phase. This worldview is evident in the mainstream methodologies currently used for managing system failures, which take a reductionist approach and are based on a static model (Nakamura and Kijima, 2007, 2008). It is often pointed out that most such methodologies have difficulty coping proactively with emergent properties and preventing the side effects introduced by quick (i.e. temporary) fixes, which leads to repeated failures of a similar type. The main reason is that current methodologies tend to treat a system failure as a single, static event, so organizational learning tends to be limited to a single loop rather than a double loop that rectifies the model of the model (i.e. the meta model) of action (i.e. the operating norm). This indicates that we need a meta methodology that can manage the dynamic aspects of system failure and ensure the efficacy of its countermeasures by promoting double loop learning.
In this paper, we propose a meta methodology called System of System Failures (SOSF), along with a system diagnostic failure flow, in order to overcome the current methodologies' shortcomings. We also demonstrate this meta methodology's efficacy through an application in IT engineering.
In the following section, we explain the features of current troubleshooting techniques and their limitations with respect to certain aspects of system failures. The next section describes the three key features required to overcome these limitations, as well as SOSF, which overcomes them. In the subsequent section, we discuss the dynamic aspects of system failures and their side effects, followed by tools (such as the diagnostic system failure flow) that fully utilize SOSF in the actual application phase to promote double loop learning. We then give an application example involving a server problem and conclude with a discussion of the efficacy of SOSF.
LIMITATIONS OF CURRENT TROUBLESHOOTING TECHNIQUES
The predominant technology of current IT troubleshooting is based on a predefined goal-seeking model. Van Gigch (1991) points out the main shortcomings of system improvement in this model as follows: (1) Engineers look for causes of malfunctions within the system boundary. The rationale of system improvement tends to justify systems as ends in themselves, without considering that a system exists only to satisfy the requirements of the larger systems in which it is included. (2) Engineers seek to restore systems to normal. A lasting solution cannot result from an improvement in the operation of a present system. An improvement in operations is not a lasting improvement. (3) Engineers tend to hold incorrect, obsolete assumptions and goals. It is not difficult to find organizations in which the formulation of assumptions and goals has not been made explicit. Fostering system improvement in this context is senseless. (4) Engineers act as 'planner followers' rather than as 'planner leaders'. Another manifestation of the problem of holding incorrect assumptions and pursuing the wrong goals can be traced to different concepts of planning and of the planner's role. In the context of system design, the planner must be a planner leader, planning to influence trends, instead of a planner follower, planning to satisfy trends.
This paper focuses on system failure aspects that current methodologies cannot manage properly in the sense pointed out by van Gigch. To summarize, these aspects are soft, systemic, emergent and dynamic (i.e. they accommodate multiple stakeholders' worldviews).