Notations

Below is a table of notations that we use in this course.

Symbol

Meaning

Example

Genotype Data

\(\mathbf{X}_{\text{raw}}\)

Raw genotype matrix (without centering or standardization)

\(N \times M\) matrix with values 0, 1, 2 for AA, Aa, aa

\(\mathbf{X}\)

Genotype matrix after normalization

Centered and scaled genotype matrix

\(\mathbf{X}_{i,\cdot}\)

Genotype for individual \(i\) across all variants

Row vector of genotypes for individual \(i\)

\(\mathbf{X}_{i,j}\)

Genotype for individual \(i\) at variant \(j\)

Number of risk alleles (0, 1, or 2) for individual \(i\) at variant \(j\)

\(\mathbf{X}_{\cdot,j}\)

Genotype at variant \(j\) for all individuals

Column vector of genotypes for variant \(j\) across all individuals

\(X_i\)

Genotype value for individual \(i\) (single variant case)

Number of risk alleles for individual \(i\) (0, 1, or 2)

\(\mu_j\)

Mean value of variant \(j\) in \(\mathbf{X}_{\text{raw}}\)

Mean genotype value across all individuals for variant \(j\)
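
As a rough illustration of how \(\mathbf{X}_{\text{raw}}\), \(\mu_j\), and \(\mathbf{X}\) relate, the sketch below centers and scales each variant column. It assumes the scaling uses the population standard deviation (dividing by \(N\)), which the table does not specify; the function name `standardize_genotypes` and the toy matrix are hypothetical.

```python
import math


def standardize_genotypes(x_raw):
    """Center and scale each variant (column) of a raw genotype matrix.

    x_raw: list of N rows (individuals), each a list of M allele counts (0, 1, 2).
    Returns (x, mu): the normalized matrix X and the per-variant means mu_j.
    """
    n = len(x_raw)
    m = len(x_raw[0])
    # mu_j: mean of column j of X_raw
    mu = [sum(row[j] for row in x_raw) / n for j in range(m)]
    # Standard deviation of column j, dividing by N (an assumption of this sketch).
    # A monomorphic variant has sd = 0 and would need to be filtered out first.
    sd = [
        math.sqrt(sum((row[j] - mu[j]) ** 2 for row in x_raw) / n)
        for j in range(m)
    ]
    x = [[(row[j] - mu[j]) / sd[j] for j in range(m)] for row in x_raw]
    return x, mu


# Toy example: N = 4 individuals, M = 2 variants
x_raw = [[0, 2], [1, 1], [2, 0], [1, 1]]
x, mu = standardize_genotypes(x_raw)
```

After this step every column of `x` has mean 0 and variance 1, which is what "centered and scaled" means in the table.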

Indices and Dimensions

\(i\)

Index for individual

\(i=1,2,...,N\)

\(j\)

Index for genetic variant

\(j=1,2,...,M\)

\(k\)

Index for study

\(k=1,2,...,K\)

\(c\)

Index for mixture component

\(c=1,2,...,C\)

\(N\)

Number of individuals

500,000 individuals in UK Biobank subset

\(M\)

Number of variants

1,000,000 SNPs after quality control

\(M_e\)

Number of effective variants

900,000 SNPs after correcting for LD

\(R\)

Number of random effects

5 random effects from gene 1 to gene 5

\(K\)

Number of studies

3 studies in meta-analysis

\(C\)

Number of mixture components

4 components in mixture model (null, small, medium, large effects)

Alleles and Frequencies

\(A\)

Major allele

Reference allele (more common)

\(a\)

Minor allele

Alternative allele (less common)

\(f_A\), \(f_a\)

Allele frequencies

\(f_A=0.7, f_a=0.3\) for a variant
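
To connect the allele frequencies to the genotype coding used above (0, 1, 2 copies of \(a\)), the following minimal sketch estimates \(f_a\) for one variant. It assumes diploid individuals; the function name and the toy genotype list are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Estimate f_a from allele counts (0, 1, or 2 copies of a) for one variant."""
    n = len(genotypes)
    # Each individual carries two alleles, so f_a = (sum of a-counts) / (2N) = mu_j / 2.
    return sum(genotypes) / (2 * n)


genotypes = [0, 0, 1, 0, 2, 1, 0, 1, 0, 1]  # toy data for one variant, N = 10
f_a = minor_allele_frequency(genotypes)
f_A = 1 - f_a  # the two allele frequencies sum to 1
```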

Linkage and Relatedness

\(l_j\)

LD score of variant \(j\)

Measure of local LD around variant \(j\)

\(\bar{\boldsymbol{l}}\)

The mean LD score across all SNPs

Average LD scores

\(\mathbf{G}\)

Genetic relationship matrix (GRM)

\(N \times N\) matrix of genetic similarities
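
One common way to build \(\mathbf{G}\) from the normalized genotype matrix is \(\mathbf{G} = \mathbf{X}\mathbf{X}^\top / M\). The sketch below uses that definition as an assumption, since the table does not state which estimator the course adopts; the function name `grm` is hypothetical.

```python
def grm(x):
    """Genetic relationship matrix G = X X^T / M for a normalized N x M matrix x."""
    n = len(x)
    m = len(x[0])
    # G[i][k] averages the product of normalized genotypes of individuals i and k
    # over all M variants.
    return [
        [sum(x[i][j] * x[k][j] for j in range(m)) / m for k in range(n)]
        for i in range(n)
    ]


# Toy example: two individuals with opposite normalized genotypes at two variants
x = [[1.0, -1.0], [-1.0, 1.0]]
g = grm(x)
```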

Traits and Phenotypes

\(\mathbf{Y}\)

Trait vector (multivariate case)

Height and weight measurements

\(Y_i\)

Trait value for individual \(i\)

Height = 175 cm for individual \(i\)

\(Y_{i,a}\), \(Y_{i,b}\)

Trait values for individual \(i\) (multivariate)

Height = 175 cm, weight = 70 kg for individual \(i\)

Regression Parameters

\(b_0\)

Intercept in regression model

Mean height = 170 cm when genotype = 0

\(\beta\)

Effect size of variant on trait (scalar)

0.5 cm increase in height per risk allele

\(\boldsymbol{\beta}\)

Effect size vector (multivariate)

Effects on height and weight: \(\begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix}\)

\(\hat{\beta}_k\)

Estimated effect from study \(k\)

Effect estimate from European study

\(\mu_\beta\)

Mean of \(\beta\) distribution

Average effect across all variants

\(w_k\)

Weight for the \(k\)-th study under fixed-effect meta-analysis

\(w_k = \frac{1}{\text{SE}_k^2}\)

\(\tau^2\)

Between-study variance (heterogeneity) under random-effects meta-analysis

How different the studies are from each other

\(w_k^*\)

Weight for the \(k\)-th study under random-effects meta-analysis

\(w_k^* = \frac{1}{\text{SE}_k^2 + \tau^2}\)
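
The weights \(w_k\) and \(w_k^*\) above are typically used to form an inverse-variance weighted average of the study estimates \(\hat{\beta}_k\). The sketch below shows this under that assumption; setting `tau2 = 0` gives the fixed-effect weights, and a positive `tau2` gives the random-effects weights. The function name and the numbers are hypothetical.

```python
def inverse_variance_pool(beta_hat, se, tau2=0.0):
    """Pool study estimates with weights 1 / (SE_k^2 + tau^2).

    tau2 = 0 reproduces the fixed-effect weights w_k;
    tau2 > 0 gives the random-effects weights w_k^*.
    """
    weights = [1.0 / (s ** 2 + tau2) for s in se]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, beta_hat)) / total
    pooled_se = (1.0 / total) ** 0.5  # standard error of the pooled estimate
    return pooled, pooled_se, weights


# Toy example with K = 3 studies
beta_hat = [0.5, 0.3, 0.4]
se = [0.1, 0.2, 0.1]
fixed, fixed_se, w = inverse_variance_pool(beta_hat, se)
random_, random_se, w_star = inverse_variance_pool(beta_hat, se, tau2=0.01)
```

Adding \(\tau^2\) to every denominator makes the weights more similar across studies, so imprecise studies count relatively more under the random-effects model.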

Variance and Covariance

\(\sigma^2\)

Variance of trait

Residual variance in height (e.g., 100 cm²)

\(\boldsymbol{\Sigma}\)

Known covariance matrix

Residual covariance between height and weight

Prior Parameters

\(\beta_0\)

Prior mean (scalar)

Expected effect = 0 before seeing data

\(\boldsymbol{\beta}_0\)

Prior mean vector

Expected effects = \(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\) before data

\(\sigma_0^2\)

Prior variance (scalar)

Prior uncertainty = 0.25 for effect size

\(\boldsymbol{\Sigma}_0\)

Prior covariance matrix

Prior uncertainty about joint effects

Posterior Parameters

\(\beta_1\)

Posterior mean (scalar)

Updated effect = 0.3 after seeing data

\(\boldsymbol{\beta}_1\)

Posterior mean vector

Updated effects = \(\begin{pmatrix} 0.3 \\ 0.2 \end{pmatrix}\) after data

\(\sigma_1^2\)

Posterior variance (scalar)

Reduced uncertainty = 0.1 after data

\(\boldsymbol{\Sigma}_1\)

Posterior covariance matrix

Reduced uncertainty about joint effects

Mixed Models

\(\mathbf{g}\)

Random effect vector in linear mixed models

Individual-specific genetic effects

\(\mathbf{Z}\)

Design matrix for random effects

Matrix linking observations to random effects

\(\mathbf{u}\)

Random effect vector in linear mixed models

Polygenic effects for each individual

Covariates

\(\mathbf{W}\)

Covariate matrix (confounder, collider, or mediator)

Age, sex, principal components

Meta-Analysis

\(Q\)

Cochran’s Q statistic

Test statistic for heterogeneity = 12.5

\(I^2\)

\(I^2\) statistic for heterogeneity

Percentage of variation due to heterogeneity = 75%
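
For reference, \(Q\) and \(I^2\) are commonly computed from the fixed-effect weights as \(Q = \sum_k w_k (\hat{\beta}_k - \hat{\beta})^2\) and \(I^2 = \max\{0, (Q - (K-1))/Q\}\). The sketch below assumes these standard definitions; the function name and inputs are hypothetical.

```python
def cochran_q_i2(beta_hat, se):
    """Return Cochran's Q and I^2 (as a fraction in [0, 1]) for K study estimates."""
    weights = [1.0 / s ** 2 for s in se]
    pooled = sum(w * b for w, b in zip(weights, beta_hat)) / sum(weights)
    # Q: weighted squared deviations of study estimates from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, beta_hat))
    df = len(beta_hat) - 1  # K - 1 degrees of freedom
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Homogeneous toy studies: Q is below its degrees of freedom, so I^2 is 0
q_low, i2_low = cochran_q_i2([0.5, 0.3, 0.4], [0.1, 0.2, 0.1])

# Heterogeneous toy studies: Q is far above K - 1, so I^2 is close to 1
q_high, i2_high = cochran_q_i2([0.9, 0.1, 0.4], [0.1, 0.1, 0.1])
```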

Model and Data

\(M\)

Model (not to be confused with \(M\), the number of variants)

Linear model vs. quadratic model

\(D\)

Observed data

Genotype and phenotype measurements

Likelihood and Testing

\(\mathcal{L}\)

Likelihood function

\(\mathcal{L}(\beta\mid Y,X) = P(Y\mid X,\beta)\)

\(\ell\)

Log-likelihood

\(\ell = \log(\mathcal{L})\) for numerical stability

\(\Lambda\)

Likelihood ratio test statistic

\(\Lambda = 2(\ell_1 - \ell_0)\)

\(p\)

P-value

Probability, under \(H_0\), of a result at least as extreme as the one observed (e.g., \(p = 0.001\))

\(H_0\)

Null hypothesis

No genetic effect (\(\beta = 0\))

\(H_a\)

Alternative hypothesis

Genetic effect exists (\(\beta \neq 0\))
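
The pieces above fit together in a likelihood ratio test. The sketch below computes \(\Lambda = 2(\ell_1 - \ell_0)\) and converts it to a p-value, assuming the usual large-sample result that \(\Lambda\) follows a chi-square distribution with 1 degree of freedom under \(H_0\) (one tested parameter, \(\beta\)). The log-likelihood values are made up for illustration.

```python
import math


def likelihood_ratio_test(ell_0, ell_1):
    """Return (Lambda, p) for a 1-degree-of-freedom likelihood ratio test.

    ell_0: maximized log-likelihood under H_0 (beta = 0)
    ell_1: maximized log-likelihood under H_a (beta free)
    """
    lam = 2.0 * (ell_1 - ell_0)
    # For a chi-square variable with 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    # max(..., 0) guards against tiny negative values from rounding.
    p = math.erfc(math.sqrt(max(lam, 0.0) / 2.0))
    return lam, p


lam, p = likelihood_ratio_test(ell_0=-1234.5, ell_1=-1229.1)
```

With more than one tested parameter the degrees of freedom change and the closed form above no longer applies; a general chi-square survival function would be needed instead.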

Parameters and Spaces

\(\theta\)

Generic parameter of statistical model

Could be \(\beta\), \(\sigma^2\), or other parameters

\(\Theta\)

Parameter space of \(\theta\)

Set of all possible values for \(\theta\)

Sufficient Statistics

\(T_1\)

Sufficient statistic (scalar)

\(T_1 = \sum_{i=1}^N X_i^2 = 1000\)

\(\mathbf{T}_2\)

Sufficient statistic vector

\(\mathbf{T}_2 = \sum_{i=1}^N X_i \mathbf{Y}_i\) for multivariate traits
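
As a sketch of how the prior, posterior, and sufficient-statistic symbols connect, consider the single-variant, single-trait case. The model below is an assumption made for illustration (no intercept, known residual variance \(\sigma^2\), normal prior on \(\beta\)); here \(T_2\) is the scalar counterpart of \(\mathbf{T}_2\).

```latex
% Assumed model: Y_i = X_i \beta + \varepsilon_i, with \varepsilon_i \sim N(0, \sigma^2)
% Assumed prior: \beta \sim N(\beta_0, \sigma_0^2)
\begin{align*}
T_1 &= \sum_{i=1}^N X_i^2, \qquad T_2 = \sum_{i=1}^N X_i Y_i, \\
\sigma_1^2 &= \left( \frac{1}{\sigma_0^2} + \frac{T_1}{\sigma^2} \right)^{-1}, \\
\beta_1 &= \sigma_1^2 \left( \frac{\beta_0}{\sigma_0^2} + \frac{T_2}{\sigma^2} \right).
\end{align*}
```

Under these assumptions the data enter the posterior \(N(\beta_1, \sigma_1^2)\) only through \(T_1\) and \(T_2\), which is what makes them sufficient statistics.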

Mixture Models

\(\pi_c\)

Mixture weight for component \(c\)

90% of variants have no effect (\(\pi_1 = 0.9\))

\(p_c(\beta)\)

Component density function

Normal distribution for component \(c\)
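
The mixture notation combines into a single density \(p(\beta) = \sum_{c=1}^{C} \pi_c \, p_c(\beta)\) with \(\sum_c \pi_c = 1\). The sketch below evaluates it for zero-mean normal components. The weights other than \(\pi_1 = 0.9\), the variances, and the use of a very narrow normal to stand in for the null component are all assumptions made for illustration.

```python
import math


def normal_pdf(beta, mean, var):
    """Density of N(mean, var) evaluated at beta."""
    return math.exp(-((beta - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_density(beta, weights, variances):
    """p(beta) = sum_c pi_c * p_c(beta), with p_c a zero-mean normal density."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        pi_c * normal_pdf(beta, 0.0, var_c)
        for pi_c, var_c in zip(weights, variances)
    )


# Hypothetical C = 4 components: (near-)null, small, medium, large effects
weights = [0.9, 0.05, 0.03, 0.02]
variances = [1e-6, 0.01, 0.1, 1.0]

density_at_zero = mixture_density(0.0, weights, variances)
density_at_half = mixture_density(0.5, weights, variances)
```

Most of the mass sits near zero because the near-null component carries 90% of the weight, so the density at \(\beta = 0\) is far larger than at \(\beta = 0.5\).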