Notations

Below is a table of notations that we use in this course.

Symbol	Meaning	Example
Genotype Data
$\mathbf{X}_{\text{raw}}$	Raw genotype matrix (without centering or standardization)	$N \times M$ matrix with values 0, 1, 2 for AA, Aa, aa
$\mathbf{X}$	Genotype matrix after normalization	Centered and scaled genotype matrix
$\mathbf{X}_{i,\cdot}$	Genotype for individual $i$ across all variants	Row vector of genotypes for individual $i$
$\mathbf{X}_{i,j}$	Genotype for individual $i$ at variant $j$	Number of risk alleles (0, 1, or 2) for individual $i$ at variant $j$
$\mathbf{X}_{\cdot,j}$	Genotype at variant $j$ for all individuals	Column vector of genotypes for variant $j$ across all individuals
$X_i$	Genotype value for individual $i$ (single variant case)	Number of risk alleles for individual $i$ (0, 1, or 2)
$\mu_j$	Mean value of variant $j$ in $\mathbf{X}_{\text{raw}}$	Mean genotype value across all individuals for variant $j$
Indices and Dimensions
$i$	Index for individual	$i=1,2,...,N$
$j$	Index for genetic variant	$j=1,2,...,M$
$k$	Index for study	$k=1,2,...,K$
$c$	Index for mixture component	$c=1,2,...,C$
$N$	Number of individuals	500,000 individuals in UK Biobank subset
$M$	Number of variants	1,000,000 SNPs after quality control
$M_e$	Effective number of variants	900,000 SNPs after correcting for LD
$R$	Number of random effects	5 random effects, one for each of genes 1 to 5
$K$	Number of studies	3 studies in meta-analysis
$C$	Number of mixture components	4 components in mixture model (null, small, medium, large effects)
Alleles and Frequencies
$A$	Major allele	Reference allele (more common)
$a$	Minor allele	Alternative allele (less common)
$f_A$, $f_a$	Frequencies of alleles $A$ and $a$, with $f_A + f_a = 1$	$f_A=0.7, f_a=0.3$ for a variant
Linkage and Relatedness
$l_j$	LD score of variant $j$	Measure of local LD around variant $j$
$\bar{\boldsymbol{l}}$	Mean LD score across all SNPs	$\frac{1}{M}\sum_{j=1}^{M} l_j$
$\mathbf{G}$	Genetic relationship matrix (GRM)	$N \times N$ matrix of genetic similarities
Traits and Phenotypes
$\mathbf{Y}$	Trait vector (multivariate case)	Height and weight measurements
$Y_i$	Trait value for individual $i$	Height = 175 cm for individual $i$
$Y_{i,a}$, $Y_{i,b}$	Trait values for individual $i$ (multivariate)	Height = 175 cm, weight = 70 kg for individual $i$
Regression Parameters
$b_0$	Intercept in regression model	Mean height = 170 cm when genotype = 0
$\beta$	Effect size of variant on trait (scalar)	0.5 cm increase in height per risk allele
$\boldsymbol{\beta}$	Effect size vector (multivariate)	Effects on height and weight: $\begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix}$
$\hat{\beta}_k$	Estimated effect from study $k$	Effect estimate from European study
$\mu_\beta$	Mean of the distribution of $\beta$	Average effect across all variants
$w_k$	Weight for the $k$-th study in fixed-effect meta-analysis	$w_k = \frac{1}{\text{SE}_k^2}$, where $\text{SE}_k$ is the standard error of $\hat{\beta}_k$
$\tau^2$	Between-study variance (heterogeneity) in random-effects meta-analysis	How much the true effects differ from study to study
$w_k^*$	Weight for the $k$-th study in random-effects meta-analysis	$w_k^* = \frac{1}{\text{SE}_k^2 + \tau^2}$
Variance and Covariance
$\sigma^2$	Variance of trait (residual variance in regression models)	Residual variance in height (e.g., 100 cm²)
$\boldsymbol{\Sigma}$	Known covariance matrix	Residual covariance between height and weight
Prior Parameters
$\beta_0$	Prior mean (scalar)	Expected effect = 0 before seeing data
$\boldsymbol{\beta}_0$	Prior mean vector	Expected effects = $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ before data
$\sigma_0^2$	Prior variance (scalar)	Prior uncertainty = 0.25 for effect size
$\boldsymbol{\Sigma}_0$	Prior covariance matrix	Prior uncertainty about joint effects
Posterior Parameters
$\beta_1$	Posterior mean (scalar)	Updated effect = 0.3 after seeing data
$\boldsymbol{\beta}_1$	Posterior mean vector	Updated effects = $\begin{pmatrix} 0.3 \\ 0.2 \end{pmatrix}$ after data
$\sigma_1^2$	Posterior variance (scalar)	Reduced uncertainty = 0.1 after data
$\boldsymbol{\Sigma}_1$	Posterior covariance matrix	Reduced uncertainty about joint effects
Mixed Models
$\mathbf{g}$	Genetic random effect vector in linear mixed models	Individual-specific genetic effects
$\mathbf{Z}$	Design matrix for random effects	Matrix linking observations to random effects
$\mathbf{u}$	Random effect vector multiplied by $\mathbf{Z}$ (the term $\mathbf{Z}\mathbf{u}$) in linear mixed models	Polygenic effects for each individual
Covariates
$\mathbf{W}$	Covariate matrix (confounder, collider, or mediator)	Age, sex, principal components
Meta-Analysis
$Q$	Cochran’s Q statistic	Heterogeneity test statistic, e.g. $Q = 12.5$
$I^2$	$I^2$ statistic for heterogeneity	$I^2 = 75\%$: 75% of the variation across study estimates is due to heterogeneity rather than chance
Model and Data
$M$	Model (distinct from $M$, the number of variants; the meaning is clear from context)	Linear model vs. quadratic model
$D$	Observed data	Genotype and phenotype measurements
Likelihood and Testing
$\mathcal{L}$	Likelihood function	$\mathcal{L}(\beta\mid Y,X) = P(Y\mid X,\beta)$
$\ell$	Log-likelihood	$\ell = \log(\mathcal{L})$ for numerical stability
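$\Lambda$	Likelihood ratio test statistic	$\Lambda = 2(\ell_1 - \ell_0)$, where $\ell_1$ and $\ell_0$ are the maximized log-likelihoods under $H_a$ and $H_0$
$p$	P-value	$p = 0.001$: probability under $H_0$ of a test statistic at least as extreme as the one observed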
$H_0$	Null hypothesis	No genetic effect ($\beta = 0$)
$H_a$	Alternative hypothesis	Genetic effect exists ($\beta \neq 0$)
Parameters and Spaces
$\theta$	Generic parameter of statistical model	Could be $\beta$, $\sigma^2$, or other parameters
$\Theta$	Parameter space of $\theta$	Set of all possible values for $\theta$
Sufficient Statistics
$T_1$	Sufficient statistic (scalar)	$T_1 = \sum_{i=1}^N X_i^2 = 1000$
$\mathbf{T}_2$	Sufficient statistic vector	$\mathbf{T}_2 = \sum_{i=1}^N X_i \mathbf{Y}_i$ for multivariate traits
Mixture Models
$\pi_c$	Mixture weight for component $c$	90% of variants have no effect ($\pi_1 = 0.9$)
$p_c(\beta)$	Component density function	Normal distribution for component $c$
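The following minimal Python sketch shows how $\mathbf{X}_{\text{raw}}$, $\mu_j$, $\mathbf{X}$, and $\mathbf{G}$ relate, using a made-up $4 \times 3$ genotype matrix. It makes two assumptions. First, normalization means centering each column by $\mu_j$ and dividing by its empirical standard deviation. Second, $\mathbf{G} = \mathbf{X}\mathbf{X}^\top / M$. Both are common conventions, but they are not necessarily the exact ones used later in the course; some analyses scale by $\sqrt{2 f_a (1 - f_a)}$ instead. The function names `standardize` and `grm` are hypothetical.

```python
import math

def standardize(x_raw):
    """Center each variant (column) by its mean mu_j and scale to unit variance.

    x_raw: list of N rows, each a list of M genotype counts (0, 1, 2).
    Assumes every variant is polymorphic (nonzero variance).
    """
    n, m = len(x_raw), len(x_raw[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [x_raw[i][j] for i in range(n)]
        mu_j = sum(col) / n                               # mean genotype of variant j
        sd_j = math.sqrt(sum((v - mu_j) ** 2 for v in col) / n)
        for i in range(n):
            x[i][j] = (col[i] - mu_j) / sd_j
    return x

def grm(x):
    """Genetic relationship matrix G = X X^T / M from a standardized N x M matrix X."""
    n, m = len(x), len(x[0])
    return [[sum(x[i][j] * x[i2][j] for j in range(m)) / m for i2 in range(n)]
            for i in range(n)]

# Hypothetical X_raw: N = 4 individuals, M = 3 variants.
x_raw = [[0, 1, 2],
         [1, 1, 0],
         [2, 0, 1],
         [1, 2, 1]]
x = standardize(x_raw)
g = grm(x)
```

Every column of $\mathbf{X}$ is centered, so each row of $\mathbf{G}$ sums to zero. With this scaling, the diagonal of $\mathbf{G}$ also averages to 1.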
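The meta-analysis quantities $w_k$, $Q$, $I^2$, $\tau^2$, and $w_k^*$ can all be computed from per-study estimates $\hat{\beta}_k$ and standard errors $\text{SE}_k$. The sketch below estimates $\tau^2$ with the DerSimonian-Laird method-of-moments estimator. That choice is an assumption here, and other estimators exist. The input values and the function name `meta_analysis` are hypothetical.

```python
def meta_analysis(beta_hat, se):
    """Inverse-variance fixed- and random-effects meta-analysis.

    beta_hat: per-study effect estimates; se: their standard errors SE_k.
    tau^2 uses the DerSimonian-Laird moment estimator (an assumption).
    Requires at least two studies.
    """
    n_studies = len(beta_hat)                       # K
    w = [1.0 / s ** 2 for s in se]                  # fixed-effect weights w_k
    sum_w = sum(w)
    beta_fe = sum(wk * b for wk, b in zip(w, beta_hat)) / sum_w
    q = sum(wk * (b - beta_fe) ** 2 for wk, b in zip(w, beta_hat))  # Cochran's Q
    df = n_studies - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0   # I^2 as a proportion
    c = sum_w - sum(wk ** 2 for wk in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                   # between-study variance tau^2
    w_star = [1.0 / (s ** 2 + tau2) for s in se]    # random-effects weights w_k^*
    beta_re = sum(wk * b for wk, b in zip(w_star, beta_hat)) / sum(w_star)
    return {"w": w, "beta_fe": beta_fe, "Q": q, "I2": i2,
            "tau2": tau2, "w_star": w_star, "beta_re": beta_re}

# Hypothetical estimates from K = 3 studies.
res = meta_analysis(beta_hat=[0.5, 0.3, 0.8], se=[0.1, 0.2, 0.15])
```

With these inputs, $Q \approx 4.59$ on $K - 1 = 2$ degrees of freedom, which gives $I^2 \approx 56\%$ and $\tau^2 > 0$. As a result, every random-effects weight $w_k^*$ is smaller than the matching fixed-effect weight $w_k$.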
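The prior and posterior parameters, together with the sufficient statistic $T_1$, can be illustrated by the conjugate normal update for a single variant. This sketch assumes the following:

- the model is $Y_i = X_i \beta + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$;
- $\sigma^2$ is known and there is no intercept;
- the prior is $\beta \sim N(\beta_0, \sigma_0^2)$.

Under those assumptions, $1/\sigma_1^2 = 1/\sigma_0^2 + T_1/\sigma^2$ and $\beta_1 = \sigma_1^2(\beta_0/\sigma_0^2 + \sum_i X_i Y_i/\sigma^2)$. The data and the function name `posterior_normal` are hypothetical.

```python
def posterior_normal(x, y, sigma2, beta0, sigma0_2):
    """Conjugate normal update for beta in Y_i = X_i * beta + e_i, e_i ~ N(0, sigma2).

    Assumes sigma2 is known, no intercept, and prior beta ~ N(beta0, sigma0_2).
    Returns (beta1, sigma1_2): posterior mean and posterior variance.
    """
    t1 = sum(xi * xi for xi in x)               # sufficient statistic T_1 = sum X_i^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))  # scalar analogue of T_2 = sum X_i Y_i
    precision = 1.0 / sigma0_2 + t1 / sigma2    # posterior precision = 1 / sigma_1^2
    sigma1_2 = 1.0 / precision
    beta1 = sigma1_2 * (beta0 / sigma0_2 + sxy / sigma2)
    return beta1, sigma1_2

# Hypothetical genotypes (0/1/2) and centered trait values for N = 6 individuals.
x = [0, 1, 2, 1, 2, 0]
y = [-0.2, 0.4, 1.1, 0.2, 0.9, 0.1]
beta1, sigma1_2 = posterior_normal(x, y, sigma2=1.0, beta0=0.0, sigma0_2=0.25)
```

Here $T_1 = 10$ and $\sum_i X_i Y_i = 4.6$. That gives $\sigma_1^2 = 1/14 \approx 0.071$, down from $\sigma_0^2 = 0.25$. It also gives $\beta_1 = 4.6/14 \approx 0.33$, which is shrunk from the least-squares estimate $0.46$ toward $\beta_0 = 0$.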
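This example connects $\mathcal{L}$, $\ell$, $\Lambda$, and $p$ when testing $H_0: \beta = 0$ against $H_a: \beta \neq 0$ for one variant. The sketch assumes Gaussian errors with known $\sigma^2$ and no intercept. It relies on the standard large-sample result that $\Lambda$ is approximately $\chi^2_1$ under $H_0$. The upper tail of $\chi^2_1$ equals $\operatorname{erfc}(\sqrt{\Lambda/2})$, so only Python's standard library is needed. The data and the function names `gaussian_loglik` and `lrt` are hypothetical.

```python
import math

def gaussian_loglik(x, y, beta, sigma2):
    """Log-likelihood ell of Y_i ~ N(X_i * beta, sigma2), sigma2 known, no intercept."""
    n = len(y)
    rss = sum((yi - xi * beta) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def lrt(x, y, sigma2):
    """Likelihood ratio test of H0: beta = 0 versus Ha: beta != 0."""
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)  # MLE under Ha
    ell1 = gaussian_loglik(x, y, beta_hat, sigma2)   # maximized log-likelihood under Ha
    ell0 = gaussian_loglik(x, y, 0.0, sigma2)        # log-likelihood under H0
    lam = 2 * (ell1 - ell0)                          # Lambda
    p = math.erfc(math.sqrt(lam / 2))                # upper tail of chi-square with 1 df
    return lam, p

# Hypothetical genotypes and centered trait values.
x = [0, 1, 2, 1, 2, 0]
y = [-0.2, 0.4, 1.1, 0.2, 0.9, 0.1]
lam, p = lrt(x, y, sigma2=1.0)
```

On this toy data, $\Lambda = (\sum_i X_i Y_i)^2 / (T_1 \sigma^2) = 2.116$. That gives a p-value of about 0.15, so $H_0$ would not be rejected at the 0.05 level.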
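In the mixture notation, the density of an effect size is $p(\beta) = \sum_{c=1}^{C} \pi_c\, p_c(\beta)$. The posterior probability that $\beta$ came from component $c$ is $\pi_c p_c(\beta) / p(\beta)$. The sketch below assumes each $p_c$ is a zero-mean normal density. It approximates the null component by a very small variance instead of a point mass at zero, so that every component has a density. The weights, variances, and function names are hypothetical.

```python
import math

def normal_pdf(beta, var):
    """Density of N(0, var) evaluated at beta."""
    return math.exp(-beta ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_density(beta, weights, variances):
    """p(beta) = sum_c pi_c * p_c(beta), each p_c a zero-mean normal (assumption)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights pi_c must sum to 1")
    return sum(pi_c * normal_pdf(beta, var_c) for pi_c, var_c in zip(weights, variances))

def responsibilities(beta, weights, variances):
    """Posterior probability that beta was generated by each component c."""
    terms = [pi_c * normal_pdf(beta, var_c) for pi_c, var_c in zip(weights, variances)]
    total = sum(terms)
    return [t / total for t in terms]

# Hypothetical C = 4 components: null, small, medium, large effects.
# The null component is approximated by a tiny variance rather than a point mass at 0.
weights = [0.9, 0.05, 0.03, 0.02]     # pi_c, summing to 1
variances = [1e-4, 0.01, 0.1, 1.0]    # variance of each p_c
```

With $\pi_1 = 0.9$, an effect of exactly 0 is attributed almost entirely to the null component. An effect of 1 is attributed mostly to the large-effect component.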
