Notations
Below is a table of notations that we use in this course.
| Symbol | Meaning | Example |
|---|---|---|
| Genotype Data | | |
| \(\mathbf{X}_{\text{raw}}\) | Raw genotype matrix (without centering or standardization) | \(N \times M\) matrix with values 0, 1, 2 for AA, Aa, aa |
| \(\mathbf{X}\) | Genotype matrix after normalization | Centered and scaled genotype matrix |
| \(\mathbf{X}_{i,\cdot}\) | Genotype for individual \(i\) across all variants | Row vector of genotypes for individual \(i\) |
| \(\mathbf{X}_{i,j}\) | Genotype for individual \(i\) at variant \(j\) | Number of risk alleles (0, 1, or 2) for individual \(i\) at variant \(j\) |
| \(\mathbf{X}_{\cdot,j}\) | Genotype at variant \(j\) for all individuals | Column vector of genotypes for variant \(j\) across all individuals |
| \(X_i\) | Genotype value for individual \(i\) (single variant case) | Number of risk alleles for individual \(i\) (0, 1, or 2) |
| \(\mu_j\) | Mean value of variant \(j\) in \(\mathbf{X}_{\text{raw}}\) | Mean genotype value across all individuals for variant \(j\) |
| Indices and Dimensions | | |
| \(i\) | Index for individual | \(i = 1, 2, \ldots, N\) |
| \(j\) | Index for genetic variant | \(j = 1, 2, \ldots, M\) |
| \(k\) | Index for study | \(k = 1, 2, \ldots, K\) |
| \(c\) | Index for mixture component | \(c = 1, 2, \ldots, C\) |
| \(N\) | Number of individuals | 500,000 individuals in UK Biobank subset |
| \(M\) | Number of variants | 1,000,000 SNPs after quality control |
| \(M_e\) | Number of effective variants | 900,000 SNPs after correcting for LD |
| \(R\) | Number of random effects | 5 random effects (gene 1 to gene 5) |
| \(K\) | Number of studies | 3 studies in meta-analysis |
| \(C\) | Number of mixture components | 4 components in mixture model (null, small, medium, large effects) |
| Alleles and Frequencies | | |
| \(A\) | Major allele | Reference allele (more common) |
| \(a\) | Minor allele | Alternative allele (less common) |
| \(f_A\), \(f_a\) | Allele frequencies | \(f_A = 0.7\), \(f_a = 0.3\) for a variant |
| Linkage and Relatedness | | |
| \(l_j\) | LD score of variant \(j\) | Measure of local LD around variant \(j\) |
| \(\bar{\boldsymbol{l}}\) | Mean LD score across all SNPs | Average of \(l_j\) over all \(M\) variants |
| \(\mathbf{G}\) | Genetic relationship matrix (GRM) | \(N \times N\) matrix of genetic similarities |
| Traits and Phenotypes | | |
| \(\mathbf{Y}\) | Trait vector (multivariate case) | Height and weight measurements |
| \(Y_i\) | Trait value for individual \(i\) | Height = 175 cm for individual \(i\) |
| \(Y_{i,a}\), \(Y_{i,b}\) | Trait values for individual \(i\) (multivariate) | Height = 175 cm, weight = 70 kg for individual \(i\) |
| Regression Parameters | | |
| \(b_0\) | Intercept in regression model | Mean height = 170 cm when genotype = 0 |
| \(\beta\) | Effect size of variant on trait (scalar) | 0.5 cm increase in height per risk allele |
| \(\boldsymbol{\beta}\) | Effect size vector (multivariate) | Effects on height and weight: \(\begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix}\) |
| \(\hat{\beta}_k\) | Estimated effect from study \(k\) | Effect estimate from European study |
| \(\mu_\beta\) | Mean of \(\beta\) distribution | Average effect across all variants |
| \(w_k\) | Weight for the \(k\)-th study under fixed-effect meta-analysis | \(w_k = \frac{1}{\text{SE}_k^2}\) |
| \(\tau^2\) | Between-study variance (heterogeneity) under random-effects meta-analysis | How different the studies are from each other |
| \(w_k^*\) | Weight for the \(k\)-th study under random-effects meta-analysis | \(w_k^* = \frac{1}{\text{SE}_k^2 + \tau^2}\) |
| Variance and Covariance | | |
| \(\sigma^2\) | Variance of trait | Residual variance in height (e.g., 100 cm²) |
| \(\boldsymbol{\Sigma}\) | Known covariance matrix | Residual covariance between height and weight |
| Prior Parameters | | |
| \(\beta_0\) | Prior mean (scalar) | Expected effect = 0 before seeing data |
| \(\boldsymbol{\beta}_0\) | Prior mean vector | Expected effects = \(\begin{pmatrix} 0 \\ 0 \end{pmatrix}\) before data |
| \(\sigma_0^2\) | Prior variance (scalar) | Prior uncertainty = 0.25 for effect size |
| \(\boldsymbol{\Sigma}_0\) | Prior covariance matrix | Prior uncertainty about joint effects |
| Posterior Parameters | | |
| \(\beta_1\) | Posterior mean (scalar) | Updated effect = 0.3 after seeing data |
| \(\boldsymbol{\beta}_1\) | Posterior mean vector | Updated effects = \(\begin{pmatrix} 0.3 \\ 0.2 \end{pmatrix}\) after data |
| \(\sigma_1^2\) | Posterior variance (scalar) | Reduced uncertainty = 0.1 after data |
| \(\boldsymbol{\Sigma}_1\) | Posterior covariance matrix | Reduced uncertainty about joint effects |
| Mixed Models | | |
| \(\mathbf{g}\) | Random effect vector in linear mixed models | Individual-specific genetic effects |
| \(\mathbf{Z}\) | Design matrix for random effects | Matrix linking observations to random effects |
| \(\mathbf{u}\) | Random effect vector in linear mixed models | Polygenic effects for each individual |
| Covariates | | |
| \(\mathbf{W}\) | Covariate matrix (confounder, collider, or mediator) | Age, sex, principal components |
| Meta-Analysis | | |
| \(Q\) | Cochran's Q statistic | Test statistic for heterogeneity, e.g., \(Q = 12.5\) |
| \(I^2\) | \(I^2\) statistic for heterogeneity | Percentage of variation due to heterogeneity, e.g., \(I^2 = 75\%\) |
| Model and Data | | |
| \(M\) | Model (not to be confused with the number of variants \(M\)) | Linear model vs. quadratic model |
| \(D\) | Observed data | Genotype and phenotype measurements |
| Likelihood and Testing | | |
| \(\mathcal{L}\) | Likelihood function | \(\mathcal{L}(\beta \mid Y, X) = P(Y \mid X, \beta)\) |
| \(\ell\) | Log-likelihood | \(\ell = \log(\mathcal{L})\), used for numerical stability |
| \(\Lambda\) | Likelihood ratio test statistic | \(\Lambda = 2(\ell_1 - \ell_0)\), where \(\ell_1\) and \(\ell_0\) are the maximized log-likelihoods under \(H_a\) and \(H_0\) |
| \(p\) | P-value | \(p = 0.001\): probability, under \(H_0\), of a result at least as extreme as the one observed |
| \(H_0\) | Null hypothesis | No genetic effect (\(\beta = 0\)) |
| \(H_a\) | Alternative hypothesis | Genetic effect exists (\(\beta \neq 0\)) |
| Parameters and Spaces | | |
| \(\theta\) | Generic parameter of statistical model | Could be \(\beta\), \(\sigma^2\), or other parameters |
| \(\Theta\) | Parameter space of \(\theta\) | Set of all possible values for \(\theta\) |
| Sufficient Statistics | | |
| \(T_1\) | Sufficient statistic (scalar) | \(T_1 = \sum_{i=1}^N X_i^2 = 1000\) |
| \(\mathbf{T}_2\) | Sufficient statistic vector | \(\mathbf{T}_2 = \sum_{i=1}^N X_i \mathbf{Y}_i\) for multivariate traits |
| Mixture Models | | |
| \(\pi_c\) | Mixture weight for component \(c\) | 90% of variants have no effect (\(\pi_1 = 0.9\)) |
| \(p_c(\beta)\) | Component density function | Normal distribution for component \(c\) |
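
To make the meta-analysis notation concrete, the sketch below computes the weights \(w_k\) and \(w_k^*\) exactly as they are written in the table, together with \(Q\) and \(I^2\). The formulas for the pooled estimate, \(Q\), and \(I^2\) are not given in the table; the code assumes their usual textbook definitions (inverse-variance pooling, \(Q = \sum_k w_k(\hat{\beta}_k - \hat{\beta}_{\text{FE}})^2\), and \(I^2 = \max(0, (Q - (K-1))/Q)\)). The function names and the input numbers are hypothetical, and \(\tau^2\) is supplied by the caller rather than estimated.

```python
from math import sqrt


def fixed_effect_meta(beta_hat, se):
    """Fixed-effect pooling with weights w_k = 1 / SE_k^2 (as in the table)."""
    w = [1.0 / s**2 for s in se]
    beta_fe = sum(wk * b for wk, b in zip(w, beta_hat)) / sum(w)
    se_fe = sqrt(1.0 / sum(w))
    return beta_fe, se_fe, w


def heterogeneity(beta_hat, se):
    """Cochran's Q and I^2, assuming their usual definitions (not from the table)."""
    beta_fe, _, w = fixed_effect_meta(beta_hat, se)
    q = sum(wk * (b - beta_fe) ** 2 for wk, b in zip(w, beta_hat))
    k = len(beta_hat)
    # I^2 is truncated at 0 when Q falls below its degrees of freedom K - 1.
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return q, i2


def random_effects_meta(beta_hat, se, tau2):
    """Random-effects pooling with weights w_k* = 1 / (SE_k^2 + tau^2).

    tau2 is an input here; no particular estimator of tau^2 is assumed.
    """
    w_star = [1.0 / (s**2 + tau2) for s in se]
    beta_re = sum(wk * b for wk, b in zip(w_star, beta_hat)) / sum(w_star)
    se_re = sqrt(1.0 / sum(w_star))
    return beta_re, se_re, w_star


if __name__ == "__main__":
    # Hypothetical effect estimates and standard errors from K = 3 studies.
    beta_hat = [0.5, 0.3, 0.4]
    se = [0.1, 0.1, 0.2]

    beta_fe, se_fe, _ = fixed_effect_meta(beta_hat, se)
    q, i2 = heterogeneity(beta_hat, se)
    beta_re, se_re, _ = random_effects_meta(beta_hat, se, tau2=0.03)

    print(f"fixed effect:   {beta_fe:.3f} (SE {se_fe:.3f})")
    print(f"Q = {q:.3f}, I^2 = {100 * i2:.1f}%")
    print(f"random effects: {beta_re:.3f} (SE {se_re:.3f})")
```

Because \(\tau^2 \ge 0\) is added to every study's variance, the random-effects weights are never larger than the fixed-effect weights and are more nearly equal across studies, so the pooled standard error can only grow as \(\tau^2\) increases.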