Thesis
The role of chemometrics and machine learning in cancer diagnostics
- Creator
- Rights statement
- Awarding institution
- University of Strathclyde
- Date of award
- 2024
- Thesis identifier
- T16967
- Person Identifier (Local)
- 202070779
- Qualification Level
- Qualification Name
- Department, School or Faculty
- Abstract
- Cancer is one of the leading causes of death worldwide, and late-stage diagnosis is one of the primary causes of high mortality rates. Quick and cost-effective methods for cancer diagnosis have been widely investigated to increase survival and patient quality of life. This thesis examines how machine learning (ML) can be coupled with chemometric analyses to diagnose cancer and cancer-related diseases. Firstly, Raman spectroscopy is used with ML to classify patients into various pre-cancerous categories. Using a random forest multiclass model, this work achieved an overall accuracy of 73% across the categories and identified unique spectral features for each of the low-risk classes. Distinguishing between pancreatic cancer patients, symptomatic patients, and healthy controls is a challenging task, so the second results chapter investigates the use of mass spectrometry with ML to separate pancreatic cancer patients from those who are symptomatic or healthy. The model achieved 97% sensitivity and 99% specificity, with equal performance across all cancer stages. Multicancer early detection tests are cost-effective for providing a rapid diagnosis for patients. The third results chapter examines the use of infrared spectroscopy coupled with ML to classify patients as either cancer or non-cancer. The model diagnosed patient samples with 90% sensitivity and 61% specificity, again performing equally across all cancer stages, with a stage I detection rate of 93%. Finally, this thesis investigates data augmentation, using a Wasserstein generative adversarial network to generate synthetic infrared spectra that improve the performance of a convolutional neural network for cancer diagnosis. Adding this synthetic data to model training increased the area under the receiver operating characteristic curve from 0.66 to 0.76, demonstrating how data augmentation can be used to improve diagnostic performance.
- Advisor / supervisor
- Palmer, David
- Resource Type
- Note
- This thesis was previously held under moratorium from 12/06/2024 until 03/06/2026.
- DOI
- Embargo Note
Items
| Title | Date Uploaded | Visibility |
|---|---|---|
| File | 2024-06-12 | Embargo |