Classifying tor traffic using character analysis

Choorod, Pitpimon

Thesis

Classifying tor traffic using character analysis

Download PDF

Creator

Choorod, Pitpimon

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2024

Thesis identifier

T16831

Person Identifier (Local)

201673488

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Department of Computer and Information Sciences

Abstract

Tor is a privacy-preserving network that enables users to browse the Internet anonymously. Although the prospect of such anonymity is welcomed in many quarters, Tor can also be used for malicious purposes, prompting the need to monitor Tor network connections. Most traffic classification methods depend on flow-based features, due to traffic encryption. However, these features can be less reliable due to issues like asymmetric routing, and processing multiple packets can be time-intensive. In light of Tor’s sophisticated multilayered payload encryption compared with nonTor encryption, our research explored patterns in the encrypted data of both networks, challenging conventional encryption theory which assumes that ciphertexts should not be distinguishable from random strings of equal length.Our novel approach leverages machine learning to differentiate Tor from nonTor traffic using only the encrypted payload. We focused on extracting statistical hex character-based features from their encrypted data. For consistent findings, we drew from two datasets: a public one, which was divided into eight application types for more granular insight and a private one. Both datasets covered Tor and nonTor traffic. We developed a custom Python script called Charcount to extract relevant data and features accurately. To verify our results’ robustness, we utilized both Weka and scikit-learn for classification.In our first line of research, we conducted hex character analysis on the encrypted payloads of both Tor and nonTor traffic using statistical testing. Our investigation revealed a significant differentiation rate between Tor and nonTor traffic of 95.42% for the public dataset and 100% for the private dataset.The second phase of our study aimed to distinguish between Tor and nonTor traffic using machine learning, focusing on encrypted payload features that are independent of length. In our evaluations, the public dataset yielded an average accuracy of 93.56% when classified with the Decision Tree (DT) algorithm in scikit-learn, and 95.65% with the j48 algorithm in Weka. For the private dataset, the accuracies were 95.23% and 97.12%, respectively. Additionally, we found that the combination of WrapperSubsetEval+BestFirst with the J48 classifier both enhanced accuracy and optimized processing efficiency.In conclusion, our study contributes to both demonstrating the distinction between Tor and nonTor traffic and achieving efficient classification of both types of traffic using features derived exclusively from a single encrypted payload packet. This work holds significant implications for cybersecurity and points towards further advancements in the field.

Advisor / supervisor

Weir, George
Fernando, Anil

Resource Type

Doctoral thesis

DOI

10.48730/eg9x-e255

Funder

Royal Thai Government Scholarship Program

Relations

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	PDF of thesis T16831	2024-02-20	Public	Download

Classifying tor traffic using character analysis

Downloadable Content

Relations

Items