Automatic annotation of subsea pipelines using deep learning

Rights statement
Awarding institution
  • University of Strathclyde
Date of award
  • 2023
Thesis identifier
  • T16666
Person Identifier (Local)
  • 201861573
Qualification Level
Qualification Name
Department, School or Faculty
Abstract
  • Subsea pipeline inspection is a crucial process for the oil and gas industry to ensure asset quality, as damage can interrupt production and pose an environmental threat. To this day, the process relies on human annotators who inspect an immense amount of visual and sensor data and manually annotate the events that occur during subsea pipeline surveys. This is labour-intensive, prone to human error, very costly for the oil and gas industry, and potentially unsafe for the annotators, as it takes place offshore. This thesis proposes methodologies to automate the visual inspection of subsea pipelines using Deep Learning (DL), which in turn can enable more robust, accurate, and faster inspections, freeing personnel for more sophisticated tasks while reducing cost. To this end, the objectives of this thesis are: (i) developing a framework for subsea survey multi-label image classification and threshold search using Precision-Recall (PR) curves; (ii) extending this to subsea survey video classification and comparing three Convolutional Neural Network (CNN)-based models; and (iii) proposing a subsea survey texture adaptation methodology that combines the Swapping Autoencoder (SAE) architecture with a classifier module, along with a characterisation of the domain shift between two subsea surveys recorded at different times and places. For the first objective, a deep CNN, ResNet-50, is used to automatically detect five subsea survey events (Anode, Exposure, Burial, Field Joint and Free Span) using only the centre video feed of a Remotely Operated Vehicle (ROV). To reduce training time, a transfer learning approach is adopted in which the feature extraction layers of the network are initialised with the weights of a network pre-trained on ImageNet.
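The per-class threshold search named in objective (i) can be illustrated with a minimal sketch. The abstract does not state which criterion is used to pick a point on each PR curve, so maximising F1 is an assumption here, and the function names `best_f1_threshold` and `search_thresholds` are hypothetical. Inputs are taken to be per-class probabilities from a multi-label classifier and 0/1 ground-truth labels.

```python
def best_f1_threshold(scores, labels):
    """Return (threshold, f1) for the PR-curve point with the highest F1.

    scores: predicted probabilities for ONE event class, one per image.
    labels: 0/1 ground truth for the same images.
    Assumption: F1 is the selection criterion (not stated in the abstract).
    """
    best_t, best_f1 = 0.5, -1.0
    # Each distinct score is a candidate operating point on the PR curve.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue  # precision and F1 are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


def search_thresholds(score_matrix, label_matrix, class_names):
    """Run the search independently for every class (multi-label setting)."""
    thresholds = {}
    for k, name in enumerate(class_names):
        col_scores = [row[k] for row in score_matrix]
        col_labels = [row[k] for row in label_matrix]
        thresholds[name], _ = best_f1_threshold(col_scores, col_labels)
    return thresholds
```

Searching each class separately reflects the multi-label setting, where concurrent events need not share a single decision threshold.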
The network is then modified for multi-label classification, so that events appearing concurrently in subsea surveys can be identified, and is re-trained on subsea survey images. Different ResNet depths are compared, and ResNet-50 is selected as it provides the best balance between performance and number of parameters. An additional experiment is conducted to demonstrate that validation performance generalises to the test sets when the data is split based on the different events of a survey. To extend this study to automatic video annotation, an evaluation of three models for classifying subsea survey video data is presented. A subsea survey video has been curated, and several regularisation techniques are investigated to address its challenges. The models include a traditional 2D CNN, IBN-ResNet50, which classifies individual frames and averages the predictions, along with a 3D IBN-ResNet50 and a 2D IBN-ResNet50-LSTM, each of which creates a single prediction per video clip. Instance Batch Normalisation (IBN) is used between the convolutional layers of the models to improve performance under varying lighting conditions and changes in colour contrast in the surveys. Experimental results indicate that the 2D model outperforms the spatiotemporal models, particularly for short events. The experiment also suggests that a larger dataset would have benefited the 3D model, although it would also have required additional manual annotation. For the third objective, three methods are tested to measure the adaptation of models from a source to a target survey. The first method compares variations of a ResNet-50 model with different normalisation layers. The second method proposes a two-step process combining an image-to-image translation solution (SAE) with a classifier module. The third method involves creating two synthesised datasets using SAE and using them to increase the variability of the training source data.
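The frame-averaging strategy of the 2D model can be sketched as follows. This is only an illustration: the per-frame probabilities are assumed to come from some frame-level classifier, and the function names, thresholds, and the choice to threshold the averaged probabilities are hypothetical rather than taken from the thesis.

```python
def average_frame_predictions(frame_probs):
    """Average per-frame class probabilities into one vector per clip.

    frame_probs: list of frames, each a list of per-class probabilities
    produced by a (hypothetical) frame-level multi-label classifier.
    """
    if not frame_probs:
        raise ValueError("a clip needs at least one frame")
    n_frames = len(frame_probs)
    n_classes = len(frame_probs[0])
    return [
        sum(frame[k] for frame in frame_probs) / n_frames
        for k in range(n_classes)
    ]


def clip_labels(frame_probs, thresholds, class_names):
    """Assign every class whose averaged probability reaches its threshold."""
    mean_probs = average_frame_predictions(frame_probs)
    return [
        name
        for name, p, t in zip(class_names, mean_probs, thresholds)
        if p >= t
    ]


# Example: two frames, two classes; only "Anode" reaches its threshold.
frames = [[0.75, 0.25], [0.25, 0.25]]
print(clip_labels(frames, [0.5, 0.5], ["Anode", "Exposure"]))
```

Because each frame is scored independently, this approach needs no temporal modelling, in contrast to the 3D and LSTM variants, which produce one prediction for the whole clip.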
All methods perform as well as or better than the baseline ResNet-50 on the target surveys, but they do not reach supervised learning levels. To this end, further experimentation shows that adding 20% of the target events to the source dataset is enough to boost performance on the target test set to the level achieved using all of the events. Finally, a method is proposed that measures the in- and out-of-domain shift between two surveys by examining the Fréchet Inception Distance (FID) scores between class-specific subsets.
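As a rough illustration of the quantity behind FID, the sketch below computes a Fréchet distance between Gaussians fitted to two sets of feature vectors. It is a simplification under stated assumptions: the standard FID uses full covariance matrices, a matrix square root, and features from an Inception network, whereas this version assumes diagonal covariances (so the trace term reduces to a sum of squared differences of standard deviations), uses population variance, and takes already-extracted features as input. The function names are hypothetical.

```python
import math


def gaussian_stats(features):
    """Per-dimension mean and population variance of a list of vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [
        sum((f[d] - means[d]) ** 2 for f in features) / n
        for d in range(dims)
    ]
    return means, variances


def diagonal_frechet_distance(feats_a, feats_b):
    """Squared Fréchet distance between two diagonal-covariance Gaussians.

    Assumption: covariances are diagonal, which is NOT the standard FID;
    it keeps the example free of matrix square roots.
    """
    mu_a, var_a = gaussian_stats(feats_a)
    mu_b, var_b = gaussian_stats(feats_b)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    cov_term = sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(var_a, var_b)
    )
    return mean_term + cov_term
```

Applied to class-specific subsets, one might compare the distance between two subsets of the same class drawn from one survey (in-domain) with the distance between the same class in the source and target surveys (out-of-domain).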
Advisor / supervisor
  • Tachtatzis, Christos
Resource Type