2003
Conference article
Restricted
A Fault-Tolerant Distributed Legacy-based System and Its Evaluation
Bondavalli A, Chiaradonna S, Cotroneo D, Romano L
In this paper, we present a complete architecture for improving the dependability of complex COTS and legacy-based systems. For long-lived applications, such as most of those being constructed nowadays via integration of legacy subsystems, fault treatment is a very important part of the fault tolerance strategy. The paper advocates the need for careful diagnosis and damage assessment, and for precise and effective recovery actions, specifically tailored to the affecting fault and/or to the extent of the damage in the affected component. In our proposal, threshold-based mechanisms are exploited to trigger alternative actions. The design and implementation of the resulting solution are illustrated with respect to a case study. This consists of a distributed architectural framework, handling replicated legacy-based subsystems. Replication and voting are used for error detection and masking. An experimental prototype deployed over a COTS-based LAN is described and has allowed a dependability analysis, via combined use of direct measurements and analytical modeling.
DOI: 10.1007/978-3-540-45214-0_22
See at:
doi.org
| CNR IRIS
| link.springer.com
| www.scopus.com
2004
Journal article
Restricted
Effective fault treatment for improving the dependability of COTS- and legacy-based applications
Bondavalli A, Chiaradonna S, Cotroneo D, Romano L
This paper proposes a novel methodology and an architectural framework for handling multiple classes of faults (namely, hardware-induced software errors in the application, process and/or host crashes or hangs, and errors in the persistent system stable storage) in a COTS and Legacy-based application. The basic idea is to use an evidence-accruing fault tolerance manager to choose and carry out one of multiple fault recovery strategies, depending upon the perceived severity of the fault. The methodology and the framework have been applied to a case study system consisting of a Legacy system, which makes use of a COTS DBMS for persistent storage facilities. A thorough performability analysis has also been conducted via combined use of direct measurements and analytical modeling. Experimental results demonstrate that effective fault treatment, consisting of careful diagnosis and damage assessment, plays a key role in leveraging the dependability of COTS and Legacy-based applications.
Source: IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, vol. 1 (issue 4), pp. 223-237
DOI: 10.1109/tdsc.2004.40
See at:
IEEE Transactions on Dependable and Secure Computing
| CNR IRIS
2008
Journal article
Restricted
Analysis of a redundant architecture for critical infrastructure protection
Daidone A, Chiaradonna S, Bondavalli A, Verissimo P
Critical infrastructures like the power grid are emerging as a collection of existing, separate systems of different nature that are interconnected. Their criticality becomes more and more evident as the damage and the risks deriving from wrong behaviors (both accidental and intentionally caused) are increasing. It is becoming evident that existing (legacy) subsystems must be interconnected in a disciplined and controlled way. This is one of the challenges taken up by the European Project CRUTIAL, where an infrastructure architecture seen as a WAN of LANs is being proposed, where LANs confine existing sub-systems, protected by special interconnection and filtering devices (CIS - CRUTIAL Information Switches). Previous work led to the definition of the CIS internal and interconnection architecture, so that a set of CIS can collectively ensure that the computers controlling the physical process correctly exchange information despite accidents and malicious attacks. CIS resilience is achieved thanks to replication for intrusion tolerance and replica recovery for self-healing. This chapter analyzes the redundant architecture of the CIS, with a set of objectives: identifying the relevant parameters of the architecture; evaluating how effective the trade-off between proactive and reactive recoveries is; and finding the best parameter setup. Two measures of interest were identified, a model of the recovery strategy was constructed and the quantitative behavior of the recovery strategy was analyzed. The impact of the detection coverage, of the intrusions and of the number of CIS replicas was analyzed and discussed. Directions for refining and improving the recovery strategy were proposed.
DOI: 10.1007/978-3-540-85571-2_4
See at:
doi.org
| CNR IRIS
| www.springerlink.com
2006
Conference article
Restricted
Integration of an MPS modeling approach into Mobius
Bondavalli A, Chiaradonna S, Lollini P, Squittieri F
In this paper we present an extension to the Mobius Framework to deal with Multiple Phased Systems (MPS). MPS are a special class of systems whose operational life can be partitioned in a set of disjoint periods, called phases. Due to their deployment in critical applications, the dependability modeling and analysis of MPS is a task of primary relevance. In the philosophy of an extensible multiformalism multi-solution modeling framework such as Mobius, and due to its wide usage, we have developed an extension for the MPS modeling process. MPS models can be defined using our approach and solved using the simulation supports already available in Mobius.
See at:
CNR IRIS
2002
Conference article
Restricted
Implementation of threshold-based diagnostic mechanisms for COTS-based applications
Romano L, Bondavalli A, Chiaradonna S, Cotroneo D
This work investigates feasibility issues that must be addressed when threshold-based mechanisms are to be used for diagnostic purposes in COTS-based distributed systems. Threshold-based mechanisms have typically been used for such purposes in embedded systems. A variety of solutions exist, with different characteristics of completeness, accuracy, and induced overhead. We first discuss the challenges related to applying such mechanisms to COTS-based distributed applications. We then identify alternative strategies for diagnosis, which use run-time data on COTS component service failures to trigger alarms to reconfiguration and fault treatment mechanisms. We implement those strategies in a system prototype, which is based on a substantial application, i.e. a real world
See at:
CNR IRIS
2007
Conference article
Restricted
A simulator for performability analysis of electrical power systems considering interdependencies
Romani F, Chiaradonna S, Di Giandomenico F, Simoncini L
Electric Power Systems (EPS) are becoming more and more critical for our society, but evaluating dependability and performability measures of such systems is a highly challenging task. Existing EPS are composed of two complex and tightly cooperating infrastructures: the Electric Infrastructure (EI) for the electricity generation and transportation to final users, and its Computer-based Control System (CCS), introduced in addition to existing SCADA systems and devoted to controlling the dynamics of EI and to triggering reconfigurations in emergency situations. Significant difficulties in analyzing EPS are posed by the very high complexity of these infrastructures and by the tight coupling between them. Moreover, the complex interactions between such infrastructures make it harder, or practically impossible, both to analyze the overall system and to decompose it to focus on each single infrastructure. There is also a lack of well-established theories, models and tools supporting them, since studies on these topics are at an early stage of development. The European project CRUTIAL, started in January 2006, aims to improve the studies in this field, with explicit focus on interdependencies between EI and the rest of the surrounding environment, in particular CCS. CRUTIAL also addresses new networked CCS systems for the management of the electric power grid, focusing on the issues arising from connection of artefacts controlling the physical process of electricity transportation to corporate networks (intranets) and to the Internet.
See at:
CNR IRIS
2007
Other
Open Access
Simulation Models and Implementation of a Simulator for the Performability Analysis of Electric Power Systems Considering Interdependencies
Romani F, Chiaradonna S, Di Giandomenico F, Simoncini L
Electric Power Systems (EPS) are becoming more and more critical for our society, since they provide vital services for human activities. At the same time, obtaining dependable behaviour of EPS is a highly challenging task, both in terms of defining effective business management and in terms of analysis of dependability and performability attributes. A major concern when dealing with EPS is the understanding and the evaluation of the interdependencies between Electric Infrastructures (EI) and the Computer-based Control System (CCS), which controls the status and the activities of EI. Studies on these interdependencies are only at an early stage of development. Major difficulties are the complexity of the infrastructures under analysis and the lack of well-established models and tools for dealing with them. This paper presents an ad-hoc simulator for the evaluation of dependability and performability measures in EPS. The system model on which the simulator is based focuses on interdependencies between EI and CCS. Most existing modeling approaches in EPS do not provide explicit modeling of interdependencies among the composing subsystems, so that the cascading or escalating phenomena cannot be deeply analysed. Our stochastic model is composed of separate and simple, but representative, submodels representing the dynamics of EI and different policies of reactions to disruptions and reconfigurations triggered by CCS. In this way, the simulator aims to provide explicit modeling of the interdependencies between the main subsystems, so the impact of cascading or escalating failures on dependability and performability can be analyzed. In this paper, we describe the simulator and highlight the design choices.
See at:
CNR IRIS
| ISTI Repository
2000
Conference article
Metadata Only Access
See at:
CNR IRIS
2000
Conference article
Restricted
DEEM: a tool for the dependability modeling and evaluation of multiple phased systems
Bondavalli A, Mura I, Chiaradonna S, Filippini R, Poli S, Sandrini F
Multiple-Phased Systems, whose operational life can be partitioned in a set of disjoint periods, called "phases", include several classes of systems such as Phased Mission Systems and Scheduled Maintenance Systems. Because of their deployment in critical applications, the dependability modeling and analysis of Multiple-Phased Systems is a task of primary relevance. However, the phased behavior makes the analysis of Multiple-Phased Systems extremely complex. This paper is centered on the description and application of DEEM, a dependability modeling and evaluation tool for Multiple Phased Systems. DEEM supports a powerful and efficient methodology for the analytical dependability modeling and evaluation of Multiple Phased Systems, based on Deterministic and Stochastic Petri Nets and on Markov Regenerative Processes.
DOI: 10.1109/icdsn.2000.857541
See at:
doi.org
| CNR IRIS
1994
Conference article
Restricted
On performability modeling and evaluation of software fault tolerance structures
Chiaradonna S, Bondavalli A, Strigini L
An adaptive scheme for software fault-tolerance is evaluated from the point of view of performability, comparing it with previously published analyses of the more popular schemes, recovery blocks and multiple version programming. In the case considered, this adaptive scheme, "Self-Configuring Optimistic Programming" (SCOP), is equivalent to N-version programming in terms of the probability of delivering correct results, but achieves better performance by delaying the execution of some of the variants until it is made necessary by an error. A discussion follows highlighting the limits in the realism of these analyses, due to the assumptions made to obtain mathematically tractable models, to the lack of experimental data and to the need to consider also resource consumption in the definition of the models. We consider ways of improving usability of the results of comparative evaluation for guiding design decisions.
See at:
CNR IRIS
| www.scopus.com
1994
Other
Open Access
Comparative performability evaluation of RB, NVP and SCOP
Chiaradonna S, Bondavalli A, Strigini L
An adaptive scheme for software fault-tolerance is evaluated from the point of view of performability, comparing it with previously published analyses of the more popular schemes, recovery blocks and multiple version programming. In the case considered, this adaptive scheme, "Self-Configuring Optimistic Programming" (SCOP), is equivalent to N-version programming in terms of the probability of delivering correct results, but achieves better performance by delaying the execution of some of the variants until it is made necessary by an error. We discuss, by means of an example, the application of modelling to realistic problems in fault-tolerant design.
See at:
CNR IRIS
1995
Contribution to book
Restricted
Rational design of Multiple-Redundant systems: adjudication and fault treatment
Bondavalli A, Chiaradonna S, Di Giandomenico F, Strigini L
The design of fault-tolerant systems should ideally be based on rigorous predictions of the effects of design decisions on the achieved dependability. However, the complexity of the task is such that these decisions are typically based on ingrained, time-proven practice, without the benefit of thorough mathematical analysis. We analyse two specific problems in fault-tolerant design based on modular replication (with or without design diversity). First, we consider derivation of a single correct result from the multiple results produced by the replicas in a redundant component. Many designs have been proposed in the literature, supposed to improve upon simple majority voting, but without a unified, rigorous analysis to assist design choices. We describe such a general method for evaluating and comparing adjudicators, in probabilistic terms, and specify an optimal adjudicator, which yields the highest possible reliability for a redundant component, given the (probabilistic) failure characteristics of its subcomponents. Our analysis applies to components with and without a fail-safe mode. Second, we consider fault treatment: how the decision can be made to remove a replica of a component, considering it permanently failed, on the basis of its history of agreement/disagreement with other replicas. The problem is compounded by transient faults, which make it undesirable to disconnect a component at the first signs of errors, and by the use of dynamic error processing, in which the number of replicas executed depends on whether disagreements are observed. For this problem, we choose a scheme integrating dynamic error processing with diagnosis and disconnection of components that may be permanently failed, and show how its behaviour can be compared with alternative designs via simulation.
See at:
CNR IRIS
2002
Journal article
Restricted
An adaptive approach to achieving hardware and software fault tolerance in a distributed computing environment
Bondavalli A, Chiaradonna S, Di Giandomenico F, Xu J
This paper focuses on the problem of providing tolerance to both hardware and software faults in independent applications running on a distributed computing environment. Several hybrid-fault-tolerant architectures are identified and proposed. Given the highly varying and dynamic characteristics of the operating environment, solutions are developed mainly exploiting the adaptation property. They are based on the adaptive execution of redundant programs so as to minimise hardware resource consumption and to shorten response time, as much as possible, for a required level of fault tolerance. A method is introduced for evaluating the proposed architectures with respect to reliability, resource utilisation and response time. Examples of quantitative evaluations are also given.
Source: JOURNAL OF SYSTEMS ARCHITECTURE, vol. 47, pp. 763-781
See at:
CNR IRIS
2004
Journal article
Restricted
Dependability modeling & evaluation of multiple-phased systems using DEEM
Bondavalli A, Chiaradonna S, Di Giandomenico F, Mura I
Multiple-Phased Systems (MPS), i.e., systems whose operational life can be partitioned in a set of disjoint periods, called "phases", include several classes of systems such as Phased Mission Systems and Scheduled Maintenance Systems. Because of their deployment in critical applications, the dependability modeling and analysis of Multiple-Phased Systems is a task of primary relevance. The phased behavior makes the analysis of Multiple-Phased Systems extremely complex. This paper describes the modeling methodology and the solution procedure implemented in DEEM, a dependability modeling and evaluation tool specifically tailored for Multiple Phased Systems. It describes its use for the solution of representative MPS problems. DEEM relies upon Deterministic and Stochastic Petri Nets as the modeling formalism and on Markov Regenerative Processes for the model solution. When compared to existing general-purpose tools based on similar formalisms, DEEM offers advantages on both the modeling side (sub-models neatly model the phase-dependent behaviors of MPS), and on the evaluation side (a specialized algorithm allows a considerable reduction of the solution cost and time). Thus, DEEM is able to deal with all the scenarios of MPS that have been analytically treated in the literature, at a cost which is comparable with that of the cheapest ones, completely solving the issues posed by the phased-behavior of MPS.
Source: IEEE TRANSACTIONS ON RELIABILITY, vol. 53 (issue 4), pp. 509-522
DOI: 10.1109/tr.2004.837709
See at:
IEEE Transactions on Reliability
| CNR IRIS
2009
Journal article
Restricted
Assessing the impact of interdependencies in electric power systems
Chiaradonna S, Di Giandomenico F, Lollini P
Electric power systems (EPS) greatly support our daily activities and are therefore among the most prominent critical infrastructures that need to be reliable and resilient in providing their services. They are rather complex and vulnerable systems, being composed of two interdependent infrastructures: the electric infrastructure (EI) and its information-technology-based control system (ITCS), which controls and manages EI. Understanding the reciprocal effect of interdependencies among interacting infrastructures is tackled by many studies in several application sectors. In this paper, we address the quantitative assessment of the impact of interdependencies in EPS, focusing on blackout-related indicators. The obtained results contribute to a better understanding of the EPS vulnerabilities and are expected to provide useful guidelines towards enhanced design choices for EPS protection at architectural level.
Source: INTERNATIONAL JOURNAL OF SYSTEM OF SYSTEMS ENGINEERING, vol. 1 (issue 3), pp. 367-386
DOI: 10.1504/ijsse.2009.02991
See at:
CNR IRIS
| www.inderscience.com
2006
Conference article
Restricted
Hidden Markov models as a support for diagnosis: formalization of the problem and synthesis of the solution
Daidone A, Di Giandomenico F, Bondavalli A, Chiaradonna S
In modern information infrastructures, diagnosis must be able to assess the status or the extent of the damage of individual components. Traditional one-shot diagnosis is not adequate; instead, streams of data on component behavior need to be collected and filtered over time, as done by some existing heuristics. This paper proposes a general framework and a formalism to model such over-time diagnosis scenarios, and to find appropriate solutions. As such, it helps system designers by supporting their design choices. Taking advantage of the characteristics of the hidden Markov models formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process, addressing the complete chain constituted by monitored component, deviation detection and state diagnosis. Hidden Markov models are well suited to represent problems where the internal state of a certain entity is not known and can only be inferred from external observations of what this entity emits. Such over-time diagnosis is a first-class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to concretely use it to perform state diagnosis and allow direct comparison of alternative solutions.
See at:
CNR IRIS
2006
Conference article
Restricted
Model-based dimensioning of CAUTION++
Di Giandomenico F, Chiaradonna S, Galliano E, Mura I
Real-time adaptive management of wireless network radio resources is a challenge that has been tackled by various EU-funded projects, and that has led to the prototypal implementation of several network management systems. Dimensioning of such network management systems, which must be able to quickly react to varying traffic load conditions in the different segments of the controlled network, is of vital importance to ensure smooth transitions from normal operation states to congested ones, while achieving the best utilization of radio resources. To this purpose, a performance modeling approach is followed and is applied to a recently developed control infrastructure system for heterogeneous mobile networks. Specifically, a model that reproduces the internal processing and the communications among system components is built. The model is triggered by an increasing rate of service requests, to identify which system configurations are capable of satisfying the incoming flow of requests while respecting the time constraints of system operation.
See at:
CNR IRIS