Android malware detection using static analysis, machine learning and deep learning

Ahmad, Fawad

Thesis

Creator

Ahmad, Fawad

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2022

Thesis identifier

T16296

Person Identifier (Local)

201557732

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Department of Computer and Information Sciences

Abstract

Android has been a dominant mobile operating system since 2012, as shown in Figure 1. This popularity, coupled with the ubiquitous use of smartphones in all aspects of our lives (e.g. online banking, social networking, and online shopping), has made Android a lucrative target for malware developers.

To combat the threat of malware stealing our private information, researchers have suggested various techniques for detecting Android malware. Broadly speaking, three primary techniques have been used for malware detection. Static Analysis, performed without running the application, has been used to generate signatures of malware that can be used to differentiate between malware and benign applications. Dynamic Analysis has been used to create behaviour profiles of malware and benign applications by executing them in a controlled environment and monitoring their behaviour. Hybrid Analysis combines the signatures generated by static analysis with the behaviour profiles created by dynamic analysis to detect Android malware. In recent years, complementary techniques such as Machine Learning and Deep Learning have been used to take features extracted by the three primary analysis techniques and feed them to several algorithms for classification. Deep Learning is a subfield of Machine Learning that structures algorithms in layers to mimic the human neural network. The resulting artificial neural network is used to solve complex problems using different algorithms.

In this dissertation, firstly, a systematic review is presented that brings together current approaches for detecting Android malware and custom-built malware detection technologies. As a result of the literature evaluation, a taxonomy is suggested for Android malware detection. Furthermore, trends in the usage of the primary analysis techniques and the complementary techniques are shown. Research gaps in the Android malware detection area are identified as directions for future research.

Secondly, Droid Fence, a custom-built web-based framework for managing experiments, is developed. Through its frontend, Droid Fence automates the extraction of the required features from directories of malware and benign applications by conducting static analysis. Droid Fence then completes the automated process by storing the extracted features against each application record in a relational database, feeding them to the required machine learning and deep learning algorithms, storing the results in the database, and finally displaying the outcome of each experiment.

Thirdly, an approach is developed that amalgamates a set of permissions, services, and six other features (usage of https, database, dynamic code, native code, reflection, and cryptography) to generate a matrix that is used for detecting malware effectively. To the best of our knowledge, this is a novel approach that combines these features to detect malware. Droid Fence is evaluated on a dataset of 13191 applications consisting of 5787 malware and 7404 benign applications. Our results show that Droid Fence is very effective when it utilises a Sequential (Deep Learning) algorithm to detect malware, achieving accuracy, F1-measure, precision, and recall scores of 0.971, 0.967, 0.977, and 0.956 respectively. Our experiments, conducted using Droid Fence, demonstrate that the deep learning Sequential algorithm scored consistently highly when compared against eight machine learning algorithms. However, the difference between the accuracy scores achieved by the Sequential algorithm (97.1%) and the Random Forest Classifier (95.8%) is small compared with the difference between the Sequential algorithm and the remaining algorithms used in our experiments. We used a stratified k-fold cross-validation method, and the results were compared on four metrics: accuracy, F1 score, precision, and recall.

Finally, conclusions are drawn and future research directions are suggested, both for the Android malware detection area and for the improvement of Droid Fence.
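The feature matrix and evaluation procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the Droid Fence implementation. The permission and service names, the column order, and the helper functions `encode`, `stratified_folds`, and `scores` are all assumptions made for illustration. The sketch shows one plausible way to encode an application as a binary row, split samples into folds that preserve the malware-to-benign ratio, and compute the four reported metrics with malware as the positive class.

```python
from collections import defaultdict

# Hypothetical feature vocabulary; the abstract does not list the actual names.
PERMISSIONS = ["android.permission.SEND_SMS", "android.permission.INTERNET"]
SERVICES = ["com.example.SyncService"]
# The six additional features named in the abstract, as assumed flag names.
FLAGS = ["https", "database", "dynamic_code", "native_code",
         "reflection", "cryptography"]
COLUMNS = PERMISSIONS + SERVICES + FLAGS


def encode(app_features):
    """Turn the set of features found in one app into a 0/1 matrix row."""
    return [1 if name in app_features else 0 for name in COLUMNS]


def stratified_folds(labels, k):
    """Assign sample indices to k folds, keeping class ratios similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class out round-robin so every fold gets a fair share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def scores(y_true, y_pred):
    """Accuracy, precision, recall, and F1 with label 1 (malware) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    row = encode({"android.permission.SEND_SMS", "reflection"})
    print(row)
    print(stratified_folds([1, 1, 0, 0, 0, 0], k=2))
    print(scores([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full experiment each fold would serve once as the test set while a classifier is trained on the remaining folds, and the four metrics would be averaged across folds.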

Advisor / supervisor