Causal discovery from observational tabular data with generative adversarial learning

Petkov, Hristo

Thesis

Creator

Petkov, Hristo

Rights statement

Strathclyde Thesis Copyright

Awarding institution

University of Strathclyde

Date of award

2025

Thesis identifier

T17565

Person Identifier (Local)

202092048

Qualification Level

Doctoral (Postgraduate)

Qualification Name

Doctor of Philosophy (PhD)

Department, School or Faculty

Department of Computer and Information Sciences

Abstract

Background. Causal knowledge is essential for understanding complex systems and revealing relationships between variables. It enables researchers to move beyond correlations, reason about cause and effect, and derive scientific insights. Although Randomized Controlled Trials (RCTs) remain the gold standard for causal inference, they are often infeasible due to ethical, logistical, or financial constraints and may lack real-world applicability. In contrast, observational data offer abundant, diverse samples, making them well suited to large-scale analysis. Although observational data are susceptible to confounding, advances in structure learning allow researchers to identify causal relationships without relying on randomized experiments.

Research objectives. This thesis challenges conventional maximum likelihood estimation (MLE)-based methods by exploring adversarial approaches to causal discovery. It leverages the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) framework to address four key limitations: (1) model overfitting caused by simplistic loss functions; (2) dependence on a single parametric assumption, which hinders recovery of causal graphs that reflect the true relationships in the data; (3) the high computational cost of Augmented Lagrangian optimization in the NOTEARS framework; and (4) the inability to perform causal discovery and tabular data synthesis simultaneously within a single framework.

Methods. Three models were developed using the WGAN-GP framework. The first, DAG-WGAN, integrates WGAN-GP with variational inference, leveraging hybrid losses for improved causal modeling. The second, DAG-WGAN+, enhances continuous optimization with efficient structure learning techniques. The third, DAGAF, captures variable interdependencies under various causal assumptions to generate synthetic data that preserve causal relations.

Results. All models target multivariate causal discovery and were evaluated using Structural Hamming Distance (SHD). They outperform leading methods in causal discovery in 97.47% of all test cases. In real-world experiments, the proposed models achieve superior accuracy, with an SHD of 8 compared with more than 10 for state-of-the-art models (lower is better). The findings further show that precise causal modeling enhances synthetic data quality by preserving the underlying causal mechanisms.
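The abstract reports results in terms of Structural Hamming Distance. As a minimal sketch of what that metric measures, the snippet below counts the edge insertions, deletions, and reversals needed to turn an estimated graph into the true one. It assumes both graphs are given as square adjacency matrices in which entry [i][j] is nonzero for an edge i -> j, and it counts a reversed edge as a single error; the thesis may use a different convention, and the function name and example graphs are hypothetical.

```python
import numpy as np


def structural_hamming_distance(true_adj, est_adj):
    """Count missing, extra, and reversed edges between two directed graphs.

    Assumption: a reversed edge counts as one error, not two.
    """
    a = (np.asarray(true_adj) != 0).astype(int)
    b = (np.asarray(est_adj) != 0).astype(int)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("expected two square adjacency matrices of equal shape")

    # Entry [i, j] is 1 wherever the two graphs disagree about edge i -> j.
    diff = np.abs(a - b)
    # Fold both directions of each node pair together so that a reversal
    # (two directed mismatches on the same pair) is counted only once.
    per_pair = diff + diff.T
    return int(np.triu(per_pair > 0, k=1).sum())


# Hypothetical three-variable example (not taken from the thesis).
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]   # 0 -> 1, 1 -> 2
est_graph = [[0, 0, 1],
             [1, 0, 1],
             [0, 0, 0]]    # 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra)

shd = structural_hamming_distance(true_graph, est_graph)
print(shd)
```

Here the estimate has one reversed edge and one extra edge, so the distance is 2; an estimate identical to the true graph scores 0.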

Advisor / supervisor

Dong, Feng
Maguire, Roma

Resource Type

Doctoral thesis

DOI

10.48730/wp6y-b583

Relations

Items

Thumbnail	Title	Date uploaded	Visibility	Actions
	PDF of thesis T17565	2026-01-09	Public	Download
