Thesis
Classifying Tor traffic using character analysis
- Awarding institution
- University of Strathclyde
- Date of award
- 2024
- Thesis identifier
- T16831
- Person Identifier (Local)
- 201673488
- Abstract
- Tor is a privacy-preserving network that enables users to browse the Internet anonymously. Although the prospect of such anonymity is welcomed in many quarters, Tor can also be used for malicious purposes, prompting the need to monitor Tor network connections. Most traffic classification methods depend on flow-based features, due to traffic encryption. However, these features can be less reliable due to issues like asymmetric routing, and processing multiple packets can be time-intensive. In light of Tor’s sophisticated multilayered payload encryption compared with nonTor encryption, our research explored patterns in the encrypted data of both networks, challenging conventional encryption theory, which assumes that ciphertexts should not be distinguishable from random strings of equal length.
Our novel approach leverages machine learning to differentiate Tor from nonTor traffic using only the encrypted payload. We focused on extracting statistical hex character-based features from the encrypted data. For consistent findings, we drew from two datasets: a public one, which was divided into eight application types for more granular insight, and a private one. Both datasets covered Tor and nonTor traffic. We developed a custom Python script called Charcount to extract relevant data and features accurately. To verify our results’ robustness, we utilized both Weka and scikit-learn for classification.
In our first line of research, we conducted hex character analysis on the encrypted payloads of both Tor and nonTor traffic using statistical testing. Our investigation revealed a significant differentiation rate between Tor and nonTor traffic of 95.42% for the public dataset and 100% for the private dataset.
The second phase of our study aimed to distinguish between Tor and nonTor traffic using machine learning, focusing on encrypted payload features that are independent of length.
In our evaluations, the public dataset yielded an average accuracy of 93.56% when classified with the Decision Tree (DT) algorithm in scikit-learn, and 95.65% with the J48 algorithm in Weka. For the private dataset, the accuracies were 95.23% and 97.12%, respectively. Additionally, we found that the combination of WrapperSubsetEval+BestFirst with the J48 classifier both enhanced accuracy and optimized processing efficiency.
In conclusion, our study contributes both to demonstrating the distinction between Tor and nonTor traffic and to achieving efficient classification of both types of traffic using features derived exclusively from a single encrypted payload packet. This work holds significant implications for cybersecurity and points towards further advancements in the field.
- Advisor / supervisor
- Weir, George
- Fernando, Anil
Items
Thumbnail | Title | Date Uploaded | Visibility | Actions |
---|---|---|---|---|
| | PDF of thesis T16831 | 2024-02-20 | Public | Download |
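
The abstract describes extracting length-independent, hex character-based statistics from a single encrypted payload. As a rough illustration only, the idea might be sketched as below. This is not the thesis's Charcount script: the function name `hex_char_features`, the feature names, and the choice of relative frequencies plus Shannon entropy are all assumptions made for this example.

```python
import math
from collections import Counter

HEX_CHARS = "0123456789abcdef"


def hex_char_features(payload: bytes) -> dict:
    """Hypothetical feature extractor (not the thesis's Charcount script).

    Converts the payload to its hex string and returns the relative
    frequency of each of the 16 hex characters, plus the Shannon entropy
    of that character distribution. Relative frequencies are used so the
    features do not depend on payload length.
    """
    hex_text = payload.hex()
    total = len(hex_text)
    if total == 0:
        raise ValueError("payload is empty")

    counts = Counter(hex_text)
    # Counter returns 0 for characters that never occur.
    features = {f"freq_{c}": counts[c] / total for c in HEX_CHARS}
    # Entropy in bits over the observed hex characters.
    features["entropy_bits"] = -sum(
        (n / total) * math.log2(n / total) for n in counts.values()
    )
    return features


if __name__ == "__main__":
    # Toy payload for illustration; real input would be one packet's
    # encrypted payload bytes.
    example = hex_char_features(bytes.fromhex("00ff"))
    print(example["freq_0"], example["freq_f"], example["entropy_bits"])
```

Feature vectors of this kind could then be passed to a decision tree classifier, for example scikit-learn's `DecisionTreeClassifier`, with Tor and nonTor labels; the feature set actually used in the thesis may differ.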