In vivo test and rollback of Java applications as they are Bertolino A., De Angelis G., Miranda B., Tonella P. Modern software systems accommodate complex configurations and execution conditions that depend on the environment where the software is run. While in-house testing can exercise only a fraction of such execution contexts, in vivo testing can take advantage of the execution state observed in the field to conduct further testing activities. In this paper, we present the Groucho approach to in vivo testing. Groucho can suspend the execution, run some in vivo tests, roll back the side effects introduced by such tests, and eventually resume normal execution. The approach can be transparently applied to the original application, even if only available as compiled code, and it is fully automated. Our empirical studies of the performance overhead introduced by Groucho under various configurations show that the overhead can be kept to a negligible level by activating in vivo testing with low probability. Our empirical studies of the approach's effectiveness confirm previous findings on the existence of faults that are unlikely to be exposed in house but become easy to expose in the field. Moreover, we include the first study to quantify the coverage increase gained when in vivo testing is added to complement in-house testing. Source: Software testing, verification & reliability (Online) (2023). DOI: 10.1002/stvr.1857
State of practical applicability of regression testing research: a live systematic literature review Greca R., Miranda B., Bertolino A. This repository provides a static version of the dataset used for the study "State of Practical Applicability of Regression Testing Research: A Live Systematic Literature Review". The data is available in CSV format (for easy parsing) and HTML format (for readability). The HTML files require the included sheet.css file (provided by Google Sheets). A non-static version of this data is available via Google Sheets. Furthermore, this data was used to create a live repository of papers on the topic of software regression testing. It is open for contributions. Source: Computing surveys 55 (2023). DOI: 10.1145/3579851
Orchestration strategies for regression test suites Greca R., Miranda B., Bertolino A. Regression testing is widely studied in the literature, although most research on the topic is concerned with improving specific sub-challenges of a wider goal. Test suite orchestration proposes a more comprehensive view of the challenge of regression testing, by merging and combining different techniques with a variety of objectives, including prioritizing, selecting, reducing and amplifying tests, detecting flaky tests and potentially more. This paper presents the key approaches and techniques that form test suite orchestration, along with common evaluation metrics, and discusses how they can be used together to ultimately provide an efficient and effective regression testing strategy. To illustrate the benefits of orchestration, we provide some examples of existing papers that take steps towards this goal, even if the specific terminology is not yet used. Orchestrated strategies utilizing existing regression testing techniques provide a pathway to practicality and real-world usage of the academic literature. Source: AST 2023 - IEEE/ACM International Conference on Automation of Software Test, pp. 163–167, Melbourne, Australia, 15-16/05/2023 DOI: 10.1109/ast58925.2023.00020
Cross-coverage testing of functionally equivalent programs Bertolino A., De Angelis G., Di Giandomenico F., Lonetti F. Cross-coverage of a program P refers to the test coverage measured over a different program Q that is functionally equivalent to P. The novel concept of cross-coverage can find useful applications in the testing of redundant software. We apply here cross-coverage for test suite augmentation and show that additional test cases generated from the coverage of an equivalent program, referred to as cross tests, can increase the coverage of a program in a more effective way than a random baseline. We also observe that, contrary to traditional coverage testing, cross-coverage could help find (artificially created) missing-functionality faults. Source: AST 2023 - IEEE/ACM International Conference on Automation of Software Test, pp. 101–111, Melbourne, Australia, 15-16/05/2023 DOI: 10.1109/ast58925.2023.00014
Fault localization for reinforcement learning Morán J., Bertolino A., De La Riva C., Tuya J. Reinforcement Learning is widely adopted in industry to approach control tasks in an intelligent way. The quality of these programs is important, especially when they are used for critical tasks like autonomous driving. Testing and debugging these programs is complex because they behave autonomously without providing insights into the reasons for the decisions taken. These decisions could even be wrong if the program learned from faults. In this paper, we present the first approach to automatically locate faults in Reinforcement Learning programs. This approach, called SBFL4RL, analyses several executions to extract those internal states that commonly reduce the performance of the program when they are covered. Locating these states can help testers understand a known fault, or even detect an unknown fault. SBFL4RL is validated in two case studies, correctly locating an injected fault. Initial results suggest that faults in reinforcement learning programs can be automatically located, and that there is room for further research. Source: AITest 2023 - The 5th IEEE International Conference on Artificial Intelligence Testing, pp. 49–50, Athens, Greece, 17-20/07/2023 DOI: 10.1109/aitest58265.2023.00016
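The abstract does not detail how SBFL4RL computes suspiciousness; as a generic illustration of spectrum-based fault localization applied to RL internal states, the sketch below ranks states with the classic Ochiai formula over per-episode coverage spectra. All names and data shapes are hypothetical, not the paper's implementation.

```python
import math

def ochiai_suspiciousness(spectra):
    """Rank internal states by Ochiai suspiciousness, computed from
    per-episode coverage spectra: (covered_states, low_performance) pairs.
    States covered mostly in low-performance episodes score highest."""
    failed = sum(1 for _, low in spectra if low)
    states = {s for covered, _ in spectra for s in covered}
    scores = {}
    for s in states:
        ef = sum(1 for covered, low in spectra if low and s in covered)
        ep = sum(1 for covered, low in spectra if not low and s in covered)
        denom = math.sqrt(failed * (ef + ep))
        scores[s] = ef / denom if denom else 0.0
    # Highest suspiciousness first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, a state covered in every low-performance episode and never in a good one gets suspiciousness 1.0 and lands at the top of the ranking.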
Model-based security testing in IoT systems: a rapid review Lonetti F., Bertolino A., Di Giandomenico F. Context: Security testing is a challenging and effort-demanding task in IoT scenarios. The heterogeneous devices expose different vulnerabilities that can influence the methods and cost of security testing. Model-based security testing techniques support the systematic generation of test cases for the assessment of security requirements by leveraging the specifications of the IoT system model and of the attack templates.
Objective: This paper aims to review the adoption of model-based security testing in the context of IoT, and provides the first systematic and up-to-date comprehensive classification and analysis of research studies on this topic.
Method: We conducted a systematic literature review analysing 803 publications and finally selecting 17 primary studies, which satisfied our inclusion criteria and were classified according to a set of relevant analysis dimensions.
Results: We report the state-of-the-art about the used formalisms, the test techniques, the objectives, the target applications and domains; we also identify the targeted security attacks, and discuss the challenges, gaps and future research directions.
Conclusion: Our review represents the first attempt to systematically analyze and classify existing studies on model-based security testing for IoT. According to the results, model-based security testing has been applied in core IoT domains. Model complexity and the need to model evolving scenarios that include heterogeneous open software and hardware components remain the most important shortcomings. Our study shows that model-based security testing of IoT applications is a promising research direction. The principal future research directions deal with: extending the existing modeling formalisms in order to capture all the peculiarities and constraints of complex and large-scale IoT networks; the definition of context-aware and dynamic evolution modelling approaches for IoT entities; and the combination of model-based testing techniques with other security test strategies such as penetration testing or learning techniques for model inference. Source: Information and software technology 164 (2023). DOI: 10.1016/j.infsof.2023.107326
A systematic mapping study on security for systems of systems Olivero M. A., Bertolino A., Dominguez-Mayo F. J., Escalona M. J., Matteucci I. In the late twentieth century, the term "System of Systems" (SoS) became popular to describe a complex system made up of a combination of independent constituent systems. Since then, several studies have been conducted to support and assess SoS management, functionality, and performance. Due to the evolutionary nature of SoS and the non-composability of the security properties of its constituent systems, it is difficult to assess or evaluate SoS security. This paper provides an up-to-date survey on SoS security, aimed at stimulating and guiding further research efforts. This systematic mapping study (SMS) focuses on SoS security, privacy, and trust. Our SMS identified 1828 studies from 6 digital libraries, of which 87 were selected as presenting approaches that analyze, evaluate, or improve security. We classified these studies using nine research questions that focus on the nature of the studies, the studied SoS, or the study validation. After examining the selected studies, we identified six gaps and as many future work directions. More precisely, we observed that few studies examine general SoS problems, with most instead proposing specific solutions, which makes it challenging to develop generalizable approaches. Furthermore, the lack of standardization has hindered the reuse of existing approaches, making it difficult for solutions to be generalized to other SoS. In addition, the lack of descriptions of industrial environments in the literature makes it difficult to design realistic validation environments. As a result, the validation of new SoS research remains a challenge in the field. Source: International journal of information security (Internet) (2023). DOI: 10.1007/s10207-023-00757-0
Designing and testing systems of systems: from variability models to test cases passing through desirability assessment Lonetti F., De Oliveira Neves V., Bertolino A. In the early stages of a system of systems (SoS) conception, several constituent systems could be available that provide similar functionalities. An SoS design methodology should provide adequate means to model variability in order to support the opportunistic selection of the most desirable SoS configuration. We propose the VANTESS approach that (i) supports SoS modeling taking into account the variation points implied by the considered constituent systems; (ii) includes a heuristic to weigh the benefits and costs of potential architectural choices (called SoS variants) for the selection of the constituent systems; and finally (iii) also helps test planning for the selected SoS variant by deriving a simulation model on which test objectives and scenarios can be devised. We illustrate an application example of VANTESS on the "educational" SoS and discuss its pros and cons within a focus group. Source: Journal of software (Malden, Mass. Online) (2022). DOI: 10.1002/smr.2427
A Delphi study to recognize and assess systems of systems vulnerabilities Olivero M. A., Bertolino A., Dominguez-Mayo F. J., Matteucci I., Escalona M. J. Context: System of Systems (SoS) is an emerging paradigm by which independent systems collaborate by sharing resources and processes to achieve objectives that they could not achieve on their own. In this context, a number of emergent behaviors may arise that can undermine the security of the constituent systems. Objective: We apply the Delphi method with the aims to improve our understanding of SoS security and related problems, and to investigate their possible causes and remedies. Method: Experts on SoS expressed their opinions and reached consensus in a series of rounds by following a structured questionnaire. Results: The results show that the experts found more consensus in disagreement than in agreement about some SoS characteristics, and on how SoS vulnerabilities could be identified and prevented. Conclusions: From this study we learn that more work is needed to reach a shared understanding of SoS vulnerabilities, and we leverage expert feedback to outline some future research directions. Source: Information and software technology 146 (2022). DOI: 10.1016/j.infsof.2022.106874
Unobtrusive in vivo test and rollback of Java applications Bertolino A., De Angelis G., Miranda B., Tonella P. Modern software systems accommodate complex configurations and execution conditions that depend on the environment where the software is run. While in-house testing can exercise only a fraction of such execution contexts, in vivo testing can take advantage of the execution state observed in the field to conduct further testing activities. In this paper, we present the Groucho approach to in vivo testing. Groucho can suspend the execution, run some in vivo tests, roll back the side effects introduced by such tests, and eventually resume normal execution. Unlike the state-of-the-art approach Invite, Groucho can be transparently applied to the original application code, even if only available as compiled code, and is fully automated. Our empirical studies of the performance overhead introduced by Groucho under various configurations show that the overhead can be kept to a negligible level by activating in vivo testing with low probability. Our empirical studies of the approach's effectiveness confirm previous findings on the existence of faults that are unlikely to be exposed in house but become easy to expose in the field. Moreover, we include the first study to quantify the coverage increase gained when in vivo testing is added to complement in-house testing. Source: ISTI Technical Report, ISTI-2022-TR/008, 2022 DOI: 10.32079/isti-tr-2022/008
Insights from running flaky tests into the field: extended version Barboni M., Bertolino A., De Angelis G. Test flakiness is a topmost concern in software test automation. While conducting pre-deployment testing, those tests that are flagged as 'flaky' are put aside to be either repaired or discarded. We hypothesize that some flaky tests could provide useful insights if run in the field, and could help identify hard-to-detect failures that escape testing and present themselves in operation. We present the first study to investigate the behaviour of flaky tests when moved to the field. Our experimentation over 52 test methods labelled as flaky provides a first confirmation that, moving from the laboratory to the field, the behaviour of tests changes and, in particular, the failure frequency of intermittently failing tests can increase. More importantly, we could identify a few cases of field failures that would have been hard to detect while testing in house. Source: ISTI Technical Report, ISTI-2022-TR/007, 2022 DOI: 10.32079/isti-tr-2022/007
Comparing and combining file-based selection and similarity-based prioritization towards regression test orchestration Greca R., Miranda B., Gligoric M., Bertolino A. Test case selection (TCS) and test case prioritization (TCP) techniques can reduce the time to detect the first test failure. Although these techniques have been extensively studied in isolation and in combination, they have not been directly compared against each other. In this paper, we perform an empirical study directly comparing TCS and TCP approaches, represented by the tools Ekstazi and FAST, respectively. Furthermore, we develop the first combination, named Fastazi, of file-based TCS and similarity-based TCP and evaluate its benefits and costs against each individual technique. We performed our experiments using 12 Java-based open-source projects. Our results show that, in the median case, the combined approach detects the first failure nearly two times faster than either Ekstazi alone (with random test ordering) or FAST alone (without TCS). Statistical analysis shows that the effectiveness of Fastazi is higher than that of Ekstazi, which in turn is higher than that of FAST. On the other hand, FAST adds the least overhead to testing time, while the difference between the additional time needed by Ekstazi and Fastazi is negligible. Fastazi can also improve failure detection in scenarios where the time available for testing is restricted. Source: AST 2022 - 3rd IEEE/ACM International Conference on Automation of Software Test, pp. 115–125, Pittsburgh, USA, 17-18/05/2022 DOI: 10.1145/3524481.3527223
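The abstract above only names the ingredients of such a combination; as a rough illustration, the sketch below pairs a file-based selection step with a greedy dissimilarity-based ordering step. All function names and data shapes are hypothetical, not the actual Ekstazi, FAST, or Fastazi implementations.

```python
def select_tests(test_deps, changed_files):
    """File-based selection: keep only tests depending on a changed file."""
    return [t for t, deps in test_deps.items() if deps & changed_files]

def jaccard_distance(a, b):
    """1 - |A∩B|/|A∪B| over token sets; 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - (len(a & b) / len(union)) if union else 0.0

def prioritize_by_dissimilarity(tests, tokens):
    """Similarity-based prioritization: greedily order tests so each next
    test is maximally dissimilar from those already scheduled."""
    remaining = list(tests)
    ordered = [remaining.pop(0)]  # seed with the first selected test
    while remaining:
        nxt = max(remaining,
                  key=lambda t: min(jaccard_distance(tokens[t], tokens[p])
                                    for p in ordered))
        remaining.remove(nxt)
        ordered.append(nxt)
    return ordered
```

The combination mirrors the orchestration idea: selection first shrinks the suite using change information, then prioritization spreads out similar tests so diverse behaviours are exercised early.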
Testing non-testable programs using association rules Bertolino A., Cruciani E., Miranda B., Verdecchia R. We propose a novel scalable approach for testing non-testable programs, denoted as ARMED testing. The approach leverages efficient Association Rules Mining algorithms to determine relevant implication relations among features and actions observed while the system is in operation. These relations are used as the specification of positive and negative tests, allowing for the identification of plausible or suspicious behaviors: for those cases where oracles are inherently unknowable, such as in social testing, ARMED testing introduces the novel concept of testing for plausibility. To illustrate the approach we walk through an application example. Source: AST 2022 - 3rd ACM/IEEE International Conference on Automation of Software Test, pp. 87–91, Pittsburgh, USA, 17-18/05/2022 DOI: 10.1145/3524481.3527238
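As a toy illustration of the idea (not the actual ARMED implementation), the sketch below mines single-antecedent implication rules from observed feature/action sets and flags behaviours that violate them as suspicious. All names and thresholds are illustrative assumptions.

```python
from itertools import combinations

def mine_rules(transactions, min_support=0.6, min_confidence=0.9):
    """Mine single-antecedent implication rules (ant -> con) from observed
    feature/action sets; a toy stand-in for an association-rule miner."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    rules = {}
    for a, b in combinations(sorted(items), 2):
        for ant, con in ((a, b), (b, a)):
            with_ant = [t for t in transactions if ant in t]
            both = [t for t in with_ant if con in t]
            if with_ant and len(both) / n >= min_support \
                    and len(both) / len(with_ant) >= min_confidence:
                rules[(ant, con)] = len(both) / len(with_ant)
    return rules

def check_plausibility(observation, rules):
    """Negative-test check: antecedent present but expected consequent
    missing -> the observed behaviour is flagged as suspicious."""
    return [(ant, con) for (ant, con) in rules
            if ant in observation and con not in observation]
```

For instance, if "login" was almost always followed by "dashboard" in operation, an observation containing "login" but no "dashboard" violates the mined rule and is reported as implausible.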
Self-adaptive testing in the field: are we there yet? Silva S., Bertolino A., Pelliccione P. Testing in the field is gaining momentum as a means to detect those failures that escape in-house testing, by continuing testing even while a system is operating in production. Among the several approaches that have been proposed, this paper focuses on the important notion of self-adaptivity of testing in the field, as such techniques need to adapt their strategy in many ways to the context and the emerging behaviors of the system under test. In this work, we investigate the topic by conducting a scoping review of the literature on self-adaptive testing in the field. We rely on a taxonomy organized into categories that include the object to adapt, the adaptation trigger, the temporal characteristics, the realization issues, the interaction concerns, the type of field-based approach, and the impact/cost. Our study sheds light on self-adaptive testing in the field by identifying related key concepts and characteristics, and by extracting knowledge gaps to better guide future research. Source: SEAMS 2022 - 17th Symposium on Software Engineering for Adaptive and Self-Managing Systems, pp. 58–69, Pittsburgh, USA, 18-20/05/2022
A survey of field-based testing techniques Bertolino A., Braione P., De Angelis G., Gazzola L., Kifetew F., Mariani L., Orrù M., Pezzè M., Pietrantuono R., Russo S., Tonella P. Field testing refers to testing techniques that operate in the field to reveal those faults that escape in-house testing. Field testing techniques are becoming increasingly popular with the growing complexity of contemporary software systems. In this paper, we present the first systematic survey of field testing approaches over a body of 80 collected studies, and propose their categorization based on the environment and the system on which field testing is performed. We discuss four research questions addressing how software is tested in the field, what is tested in the field, what the requirements are, and how field tests are managed, and we identify many challenging research directions. Source: ACM computing surveys (Online) 54 (2021). DOI: 10.1145/3447240
What we talk about when we talk about software test flakiness Barboni M., Bertolino A., De Angelis G. Software test flakiness is drawing increasing interest among both academic researchers and practitioners. In this work we report our findings from a scoping review of white and grey literature, highlighting variations across key concepts of flaky tests. Our study clearly indicates the need for a unifying definition as well as a more comprehensive analysis to establish a conceptual map that can better guide future research. Source: QUATIC 2021 - 14th International Conference Quality of Information and Communications Technology, pp. 29–39, Algarve, Portugal and Online, 8-11/9/2021 DOI: 10.1007/978-3-030-85347-1_3
Know your neighbor: fast static prediction of test flakiness Bertolino A., Cruciani E., Miranda B., Verdecchia R. Flaky tests plague regression testing in Continuous Integration environments by slowing down change releases, wasting development effort, and eroding testers' trust in the test process. We present FLAST, the first static approach to flakiness detection using test code similarity. Our extensive evaluation on 24 projects taken from repositories used in three previous studies showed that FLAST can identify flaky tests with up to 0.98 median and 0.92 mean precision. For six of those projects it could already yield 0.98 average precision with a training set containing fewer than 100 tests. Besides, where known flaky tests are classified according to their causes, the same approach can also predict a flaky test's category with similar precision. The cost of the approach is negligible: the average training time over a dataset of 1,700 test methods is less than one second, while the average prediction time for a new test is less than one millisecond. Source: ISTI Technical Report 001/2020, 2020 DOI: 10.32079/isti-tr-2020/001
Digital persona portrayal: identifying pluridentity vulnerabilities in digital life Olivero M. A., Bertolino A., Dominguez-Mayo F. J., Escalona M. J., Matteucci I. The increasing use of the Internet for social purposes enriches the data available online about all of us and promotes the concept of the Digital Persona. In fact, most of us are represented online by more than one identity, which we define here as a Pluridentity. This trend brings increased risks: it is well known that the security of a Digital Persona can be exploited if its data and security are not effectively managed. In this paper, we focus specifically on a new type of digital attack that can be perpetrated by combining pieces of data belonging to one same Pluridentity in order to profile their target. Some victims can be depicted so accurately through their Pluridentity that, by using the gathered information, attackers can execute highly personalized social engineering attacks, or even bypass otherwise safe security mechanisms. We characterize these Pluridentity attacks as a security issue of a virtual System of Systems, whose constituent systems are the individual identities and the humans themselves. We present a strategy to identify vulnerabilities caused by overexposure due to the combination of data from the constituent identities of a Pluridentity. To this end we introduce the Digital Persona Portrayal Metamodel, and the related Digital Pluridentity Persona Portrayal Analysis process that supports the architecting of data from different identities: such model and process can be used to identify the vulnerabilities of a Pluridentity due to its exploitation as a System of Systems. The approach has been validated on the Pluridentities of seventeen candidates selected from a data leak, by retrieving the data of their Digital Personae and matching them against the security mechanisms of their Pluridentities. After analyzing the results, we could detect several vulnerabilities for some of the analyzed subjects. Source: Journal of Information Security and Applications 52 (2020). DOI: 10.1016/j.jisa.2020.102492