Thesis

Machine learning for reaction outcome prediction and catalyst classification

Creator
Rights statement
Awarding institution
  • University of Strathclyde
Date of award
  • 2026
Thesis identifier
  • T18044
Person Identifier (Local)
  • 202152381
Qualification Level
Qualification Name
Department, School or Faculty
Abstract
  • The application of machine learning to chemistry is a rapidly growing field that is providing valuable insight into chemical reactivity. The work detailed in this thesis focuses on the application of machine learning to reaction outcome prediction in catalytic reactions. Iridium-catalysed C-H borylation reactions are important transformations in synthetic chemistry. Current understanding of substrate reactivity towards this transformation relies on rules of thumb and trial-and-error experimentation. Reaction conditions were optimised using high-throughput experimentation, and a substrate scope study was then completed to generate a dataset for machine learning. The substrate dataset was used to train a variety of machine learning models, which were optimised to maximise performance. The models were trained to classify substrates as likely to undergo a low, medium or high degree of borylation. Of the ten types of machine learning model tested, partial least squares discriminant analysis was the most accurate, with an accuracy of 0.807 ± 0.105. Four types of molecular descriptor were tested to understand their impact on model performance. Molecular fingerprints gave poor model performance, whereas RDKit, Mordred and cddd descriptors all performed well. After considering feature importances, the RDKit descriptor set was judged the best to use, because its descriptors are interpretable and so allow structure-reactivity relationships to be established. Another reaction outcome that can be predicted is the enantiomeric excess of enantioselective transformations. In Chapter 5, two machine learning models were trained to predict the enantiomeric excess of reactions catalysed by imidodiphosphorimidates. These models were then used to create an augmented library of over 7,500 reactions. 
This augmented dataset was used to assess the generality of imidodiphosphorimidate catalysts and to identify the most general among them: those most likely to succeed across a range of chemical transformations. In summary, the work detailed in this thesis applies machine learning to catalytic chemistry to understand the links between substrate or catalyst structure and reaction outcome. By applying machine learning, we are able to gain insight into the reactivity of catalytic reactions using minimal experimental time and resources.
Advisor / supervisor
  • Dominey, Andrew
  • Reid, Marc
  • Nelson, David J.
  • Palmer, David (David S.)
Resource Type
DOI
Embargo Note
  • This digital copy of this thesis is available to Strathclyde users only until 8th June 2031
