Formulating test oracles via anomaly detection techniques

Almaghairbe, Rafig

Thesis

Creator

Almaghairbe, Rafig

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2017

Thesis identifier

T14716

Person Identifier (Local)

2012877256

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Department of Computer and Information Sciences

Abstract

Developments in the automation of test data generation have greatly improved the efficiency of the software testing process, but the so-called "oracle problem" (deciding the pass or fail outcome of a test execution) is still primarily an expensive and error-prone manual activity. This thesis presents an approach to building an automated test oracle using anomaly detection techniques (based on semi-supervised and unsupervised learning approaches) on dynamic execution data (test input/output pairs and execution traces).

Firstly, anomaly detection techniques based on a semi-supervised learning approach were investigated to automatically classify passing and failing executions. A small proportion of the test data is labelled as passing or failing and used in conjunction with the unlabelled data to build a classifier that labels the remaining outputs (classifying them as passing or failing tests).

A range of learning algorithms is investigated using several faulty versions of three systems, along with varying types of data (inputs/outputs alone, or in combination with execution traces) and different labelling strategies (both failing and passing tests, and passing tests alone). The results show that in many cases labelling just a small proportion of the test cases, as low as 10%, is sufficient to build a classifier that is able to correctly categorise the large majority of the remaining test cases.

This has important practical potential: when checking the test results from a system, a developer need only examine a small proportion of these and use this information to train a learning algorithm to automatically classify the remainder.

Secondly, anomaly detection techniques based on unsupervised learning (mainly clustering algorithms) were investigated to automatically detect passing and failing executions. The key hypothesis is that failures will group into small clusters whereas passing executions will group into larger ones. In this investigation, the same dynamic execution data and systems used in the previous study were used to evaluate the proposed approach.

The results show this hypothesis to be valid, and illustrate that the approach has the potential to substantially reduce the number of outputs that would need to be manually examined following a test run.

Finally, a comparison study was performed between an existing technique from the specification mining domain (the data invariant detector Daikon [30]) and anomaly detection techniques (based on semi-supervised and unsupervised learning approaches). In most cases, semi-supervised learning techniques (mainly the self-training approach with Naïve Bayes and the EM clustering algorithm, and the co-training approach with Naïve Bayes) perform far better than Daikon as an automated test classifier under both scenarios (the two labelling strategies), especially when input/output pairs are used together with execution traces. Furthermore, unsupervised learning techniques performed on a par with Daikon in several cases.
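
The clustering hypothesis in the abstract (failures fall into small clusters, passing executions into larger ones) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not name the clustering algorithm, so a simple distance-threshold grouping is assumed here, and the feature vectors, `eps`, and `max_size` values are hypothetical.

```python
from math import dist


def cluster_by_distance(points, eps):
    """Group points into connected components: two points share a cluster
    when a chain of neighbours, each within eps of the next, links them.
    (Assumed stand-in for whichever clustering algorithm is actually used.)"""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None and dist(points[i], points[j]) <= eps:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def flag_small_clusters(labels, max_size):
    """Return indices of executions whose cluster has at most max_size members."""
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    return [i for i, label in enumerate(labels) if sizes[label] <= max_size]


# Hypothetical feature vectors, one per test execution (e.g. encoded outputs).
executions = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.2), (1.2, 1.1),  # large group
    (5.0, 5.0),                                                  # isolated point
    (9.0, 0.5), (9.1, 0.6),                                      # small group
]

labels = cluster_by_distance(executions, eps=0.5)
suspicious = flag_small_clusters(labels, max_size=2)
```

With these made-up vectors, the executions in the two smallest clusters are the ones a tester would inspect first, while the five in the large cluster would be presumed passing. In practice the size threshold would need tuning per system.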

Resource Type

Doctoral thesis

DOI

10.48730/te5r-kw05

Date Created

2017

Former identifier

9912568092702996

Relations

Items

Title	Date Uploaded	Visibility
PDF of thesis T14716	2021-07-02	Public
