Thesis

Formulating test oracles via anomaly detection techniques

Awarding institution
  • University of Strathclyde
Date of award
  • 2017
Thesis identifier
  • T14716
Person Identifier (Local)
  • 2012877256
Abstract
  • Developments in the automation of test data generation have greatly improved the efficiency of the software testing process, but the so-called "oracle problem" (deciding the pass or fail outcome of a test execution) is still primarily an expensive and error-prone manual activity. This thesis presents an approach to building an automated test oracle using anomaly detection techniques (based on semi-supervised and unsupervised learning approaches) on dynamic execution data (test input/output pairs and execution traces).
    Firstly, anomaly detection techniques based on a semi-supervised learning approach were investigated to automatically classify passing and failing executions. A small proportion of the test data is labelled as passing or failing and used in conjunction with the unlabelled data to build a classifier that labels the remaining outputs (classifying them as passing or failing tests). A range of learning algorithms is investigated using several faulty versions of three systems, along with varying types of data (inputs/outputs alone, or in combination with execution traces) and different labelling strategies (both failing and passing tests, and passing tests alone). The results show that in many cases labelling just a small proportion of the test cases (as low as 10%) is sufficient to build a classifier that correctly categorises the large majority of the remaining test cases. This has important practical potential: when checking the test results from a system, a developer need only examine a small proportion of them and use this information to train a learning algorithm to automatically classify the remainder.
    Secondly, anomaly detection techniques based on unsupervised learning (mainly clustering algorithms) were investigated to automatically detect passing and failing executions. The key hypothesis is that failures will group into small clusters whereas passing executions will group into larger ones. In this investigation, the same dynamic execution data and systems used in the previous study were used to evaluate the proposed approach. The results show this hypothesis to be valid and illustrate that the approach has the potential to substantially reduce the number of outputs that would need to be manually examined following a test run.
    Finally, a comparison study was performed between an existing technique from the specification mining domain (the data invariant detector Daikon [30]) and the anomaly detection techniques (based on semi-supervised and unsupervised learning approaches). In most cases, semi-supervised learning techniques (mainly the self-training approach, using Naïve Bayes with the EM clustering algorithm, and the co-training approach, using Naïve Bayes) perform far better than Daikon as an automated test classifier under both scenarios (the two labelling strategies), especially when input/output pairs are used together with execution traces. Furthermore, unsupervised learning techniques performed on a par with Daikon in several cases.
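The clustering hypothesis stated in the abstract (failures group into small clusters, passing executions into larger ones) can be illustrated with a minimal sketch. This is not the thesis's implementation: the single-linkage grouping, the function names, the `radius` and `max_fraction` parameters, and the sample feature vectors are all assumptions made for illustration only.

```python
from math import dist


def cluster_by_distance(points, radius):
    """Single-linkage grouping: points within `radius` of each other share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def suspicious_executions(points, radius, max_fraction):
    """Return indices of executions that fall into small clusters."""
    limit = max_fraction * len(points)
    flagged = []
    for members in cluster_by_distance(points, radius):
        if len(members) <= limit:
            flagged.extend(members)
    return sorted(flagged)


# Hypothetical feature vectors (e.g. numerically encoded input/output pairs).
executions = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8),
    (1.1, 1.0), (0.8, 1.0), (1.0, 1.1),
    (5.0, 5.0), (5.1, 4.9),
]
print(suspicious_executions(executions, radius=0.6, max_fraction=0.25))
```

In this toy data set, only the two executions in the small outlying cluster would be handed to a developer for manual inspection; the eight executions in the large cluster are presumed to pass.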
Date Created
  • 2017
Former identifier
  • 9912568092702996
