A Near-Miss Management System to Facilitate Forensic Investigation of Software Failures
Bihina Bella, Eloff, and Olivier
2014
Citation information
M. Bihina Bella, J. H. P. Eloff, and M. S. Olivier. “A Near-Miss Management System to Facilitate Forensic Investigation of Software Failures”. In: Proceedings of the 13th European Conference on Cyber Warfare and Security. Ed. by A. Liaropoulos and G. Tsihrintzis. Academic Conferences, 2014, pp. 233–241.
Abstract
The increasing complexity of software applications can lead to operational failures that have disastrous consequences. In order to prevent the recurrence of such failures, a thorough post-mortem investigation is required to identify the root causes involved. This root cause analysis must be based on reliable digital evidence to ensure its objectivity and accuracy. However, current approaches to failure analysis do not promote the collection of digital evidence for causal analysis. A promising alternative is offered by the field of digital forensics. Digital forensics uses proven scientific methods and principles of law to determine the cause of an event based on forensically sound evidence. However, being a reactive process, digital forensics can only be applied after the occurrence of costly failures. This limits its effectiveness, as volatile data that could serve as potential evidence may be destroyed or corrupted after a system crash. A more proactive approach to digital forensics is therefore required. The analysis of near misses is a promising solution to the above issue. Unlike failures, near misses do not result in loss. Instead, they are high-risk situations with the potential for loss or damage, and as such, are often forerunners to serious failures. The detection of near misses therefore provides an opportunity to safely collect relevant failure-related data before the actual failure occurs. Near-miss analysis has been implemented successfully for decades in many engineering disciplines, but it is not yet readily used in the IT industry. The current paper therefore proposes the architecture of a near-miss management system suitable for the software industry by proposing a definition of a near miss from an IT system perspective. The proposed definition is based on the allowed downtime indicated in the Service Level Agreement (SLA), which specifies the system’s contractually agreed performance level. 
The downtime-based definition of near misses is then used to detect and classify near misses based on their risk level.
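As an illustration only, the downtime-based idea described in the abstract might be sketched as follows. The abstract does not give the actual classification rule, so everything here is an assumption made for the example: the `Sla` class, the `classify_near_miss` function, the 99.9% availability figure, the 30-day period, and the 50% and 100% thresholds are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    """Hypothetical SLA model: an availability target over a measurement period."""
    availability: float    # e.g. 0.999 for "99.9% uptime" (illustrative value)
    period_minutes: float  # length of the SLA measurement period in minutes

    @property
    def allowed_downtime_minutes(self) -> float:
        # Downtime budget implied by the availability target.
        return (1.0 - self.availability) * self.period_minutes


def classify_near_miss(sla: Sla, downtime_so_far: float, projected_outage: float) -> str:
    """Rate an event by how much of the downtime budget it would consume.

    The thresholds (0.5 and 1.0) are assumptions chosen for this sketch,
    not values from the paper.
    """
    budget = sla.allowed_downtime_minutes
    if budget <= 0:
        # No downtime is allowed at all, so any projected outage is high risk.
        return "high"
    used_fraction = (downtime_so_far + projected_outage) / budget
    if used_fraction >= 1.0:
        return "high"    # the event would have exhausted the SLA budget
    if used_fraction >= 0.5:
        return "medium"  # the event would have consumed at least half the budget
    return "low"


# 99.9% availability over a 30-day period gives a budget of about 43.2 minutes.
sla = Sla(availability=0.999, period_minutes=30 * 24 * 60)
print(round(sla.allowed_downtime_minutes, 1))
print(classify_near_miss(sla, downtime_so_far=10, projected_outage=40))
print(classify_near_miss(sla, downtime_so_far=10, projected_outage=15))
print(classify_near_miss(sla, downtime_so_far=0, projected_outage=5))
```

Expressing risk as a fraction of the SLA budget keeps the rule independent of any particular availability target; a real system would take its thresholds from the definition given in the paper rather than from the values assumed here.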
Full text
A pre- or postprint of the publication is available at https://mo.co.za/open/nmms.pdf.
BibTeX reference
@inproceedings(nmms,
author={Bihina Bella, Madeleine and Jan H P Eloff and Martin S Olivier},
title={A Near-Miss Management System to Facilitate Forensic Investigation of Software Failures},
booktitle={Proceedings of the 13th European Conference on Cyber Warfare and Security},
editor={Andrew Liaropoulos and George Tsihrintzis},
pages={233--241},
publisher={Academic Conferences},
year={2014} )