Unifying Population Structure and Relatedness Analysis through a Coalescent Approach
Abstract
Genetic similarity in genome-wide association studies (GWAS) is typically partitioned into recent kinship, modeled by a genetic relationship matrix (GRM), and distant ancestry, corrected for with principal components (PCs). In this dissertation, I argue that this partitioned model is a methodological practice built on a usually implicit causal framework that conflates population structure with confounding. The work deconstructs the standard approach and proposes a unified genetic model as a formal baseline. To this end, I make two contributions. First, I introduce the Coefficient of Genealogical Similarity (GeSi), a measure of relatedness derived from coalescent theory that captures the full continuum of shared genealogy. This measure leads to a classification of GRMs as genealogically “full” or “shallow”. Empirical tests demonstrate that full GRMs are sufficient to model the genetic covariance arising from population structure in the absence of confounding. This result reframes PCs as proxies for unmeasured confounders correlated with ancestry, rather than as a necessary correction for population structure itself. Second, I develop phenocause, an R package for simulating phenotypes under complex genetic and non-genetic causal models. This tool addresses a critical methodological gap by enabling the simulation of specific genetic and non-genetic confounding scenarios, which are needed to test the assumptions of GWAS models. Together, these contributions provide a theoretical basis and a practical tool for moving the field beyond correcting for inflation of test statistics and toward understanding the mechanisms that give rise to such inflation.
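For orientation, the two ingredients of the conventional partitioned model named in the abstract can be sketched in a few lines. This is a minimal illustration in Python, assuming a commonly used standardized-genotype GRM and PCs taken from its eigendecomposition; it is not GeSi and not part of phenocause, and the function names (`standard_grm`, `top_pcs`) and the simulated genotypes are hypothetical.

```python
import numpy as np


def standard_grm(genotypes):
    """Standardized GRM from an (n_individuals, n_snps) matrix of 0/1/2 allele counts.

    Assumption: the common 'standardize each SNP, then average' estimator,
    used here only to illustrate the conventional approach.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # per-SNP sample allele frequency
    keep = (p > 0.0) & (p < 1.0)              # drop monomorphic SNPs (zero variance)
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def top_pcs(grm, k):
    """Return the k eigenvectors of the GRM with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(grm)          # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order]


# Hypothetical toy data: 50 unrelated individuals, 500 independent SNPs.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
genotypes = rng.binomial(2, freqs, size=(50, 500))

grm = standard_grm(genotypes)                 # would enter a mixed model as the random-effect covariance
pcs = top_pcs(grm, 2)                         # would enter the same model as fixed-effect covariates
```

In the partitioned model, `grm` and `pcs` are derived from the same genotype data yet play separate roles, which is the split the abstract calls into question.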
