# Notations

Below is a table of notations that we use in this course.


| Symbol | Meaning | Example Meaning |
|--------|---------|-----------------|
| **Genotype Data** | | |
| $\mathbf{X}_{\text{raw}}$ | Raw genotype matrix (without centering or standardization) | $N \times M$ matrix with values 0, 1, 2 for AA, Aa, aa |
| $\mathbf{X}$ | Genotype matrix after normalization | Centered and scaled genotype matrix |
| $\mathbf{X}_{i,\cdot}$ | Genotype for individual $i$ across all variants | Row vector of genotypes for individual $i$ |
| $\mathbf{X}_{i,j}$ | Genotype for individual $i$ at variant $j$ | In $\mathbf{X}_{\text{raw}}$, the number of risk alleles (0, 1, or 2) for individual $i$ at variant $j$ |
| $\mathbf{X}_{\cdot,j}$ | Genotype at variant $j$ for all individuals | Column vector of genotypes for variant $j$ across all individuals |
| $X_i$ | Genotype value for individual $i$ (single variant case) | Number of risk alleles for individual $i$ (0, 1, or 2) |
| $\mu_j$ | Mean value of variant $j$ in $\mathbf{X}_{\text{raw}}$ | Mean genotype value across all individuals for variant $j$ |
| **Indices and Dimensions** | | |
| $i$ | Index for individual | $i=1,2,...,N$ |
| $j$ | Index for genetic variant | $j=1,2,...,M$ |
| $k$ | Index for study | $k=1,2,...,K$ |
| $c$ | Index for mixture component | $c=1,2,...,C$ |
| $N$ | Number of individuals | 500,000 individuals in UK Biobank subset |
| $M$ | Number of variants | 1,000,000 SNPs after quality control |
| $M_e$ | Effective number of variants | 900,000 SNPs after correcting for LD |
| $R$ | Number of random effects | 5 random effects, one for each of genes 1 to 5 |
| $K$ | Number of studies | 3 studies in meta-analysis |
| $C$ | Number of mixture components | 4 components in mixture model (null, small, medium, large effects) |
| **Alleles and Frequencies** | | |
| $A$ | Major allele | Reference allele (more common) |
| $a$ | Minor allele | Alternative allele (less common) |
| $f_A$, $f_a$ | Allele frequencies | $f_A=0.7, f_a=0.3$ for a variant |
| **Linkage and Relatedness** | | |
| $l_j$ | LD score of variant $j$ | Measure of local LD around variant $j$ |
| $\bar{l}$ | Mean LD score across all SNPs | $\bar{l} = \frac{1}{M}\sum_{j=1}^{M} l_j$ |
| $\mathbf{G}$ | Genetic relationship matrix (GRM) | $N \times N$ matrix of genetic similarities |
| **Traits and Phenotypes** | | |
| $\mathbf{Y}$ | Trait vector (multivariate case) | Height and weight measurements |
| $Y_i$ | Trait value for individual $i$ | Height = 175cm for individual $i$ |
| $Y_{i,a}$, $Y_{i,b}$ | Trait values for individual $i$ (multivariate) | Height = 175cm, weight = 70kg for individual $i$ |
| **Regression Parameters** | | |
| $b_0$ | Intercept in regression model | Mean height = 170cm when genotype = 0 |
| $\beta$ | Effect size of variant on trait (scalar) | 0.5cm increase in height per risk allele |
| $\boldsymbol{\beta}$ | Effect size vector (multivariate) | Effects on height and weight: $\begin{pmatrix} 0.5 \\ 0.3 \end{pmatrix}$ |
| $\hat{\beta}_k$ | Estimated effect from study $k$ | Effect estimate from European study |
| $\mu_\beta$ | Mean of $\beta$ distribution | Average effect across all variants |
| $w_k$ | Weight for the $k$-th study under fixed-effect meta-analysis | $w_k = \frac{1}{\text{SE}_k^2}$ |
| $\tau^2$ | Between-study variance (heterogeneity) under random-effects meta-analysis | Variance of the true effects across studies; $\tau^2 = 0$ when all studies share one effect |
| $w_k^*$ | Weight for the $k$-th study under random-effects meta-analysis | $w_k^* = \frac{1}{\text{SE}_k^2 + \tau^2}$ |
| **Variance and Covariance** | | |
| $\sigma^2$ | Variance of trait | Residual variance in height (e.g., 100 cm²) |
| $\boldsymbol{\Sigma}$ | Known covariance matrix | Residual covariance between height and weight |
| **Prior Parameters** | | |
| $\beta_0$ | Prior mean (scalar) | Expected effect = 0 before seeing data |
| $\boldsymbol{\beta}_0$ | Prior mean vector | Expected effects = $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ before data |
| $\sigma_0^2$ | Prior variance (scalar) | Prior uncertainty = 0.25 for effect size |
| $\boldsymbol{\Sigma}_0$ | Prior covariance matrix | Prior uncertainty about joint effects |
| **Posterior Parameters** | | |
| $\beta_1$ | Posterior mean (scalar) | Updated effect = 0.3 after seeing data |
| $\boldsymbol{\beta}_1$ | Posterior mean vector | Updated effects = $\begin{pmatrix} 0.3 \\ 0.2 \end{pmatrix}$ after data |
| $\sigma_1^2$ | Posterior variance (scalar) | Reduced uncertainty = 0.1 after data |
| $\boldsymbol{\Sigma}_1$ | Posterior covariance matrix | Reduced uncertainty about joint effects |
| **Mixed Models** | | |
| $\mathbf{g}$ | Random effect vector in linear mixed models | Individual-specific genetic effects |
| $\mathbf{Z}$ | Design matrix for random effects | Matrix linking observations to random effects |
| $\mathbf{u}$ | Random effect vector in linear mixed models | Polygenic effects for each individual |
| **Covariates** | | |
| $\mathbf{W}$ | Covariate matrix (confounder, collider, or mediator) | Age, sex, principal components |
| **Meta-Analysis** | | |
| $Q$ | Cochran's Q statistic | Test statistic for heterogeneity = 12.5 |
| $I^2$ | $I^2$ statistic for heterogeneity | Percentage of variation due to heterogeneity = 75% |
| **Model and Data** | | |
| $M$ | Model (distinct from $M$, the number of variants) | Linear model vs. quadratic model |
| $D$ | Observed data | Genotype and phenotype measurements |
| **Likelihood and Testing** | | |
| $\mathcal{L}$ | Likelihood function | $\mathcal{L}(\beta\mid Y,X) = P(Y\mid X,\beta)$ |
| $\ell$ | Log-likelihood | $\ell = \log(\mathcal{L})$ for numerical stability |
| $\Lambda$ | Likelihood ratio test statistic | $\Lambda = 2(\ell_1 - \ell_0)$ |
| $p$ | P-value | Probability under $H_0$ of a test statistic at least as extreme as the one observed, e.g., $p = 0.001$ |
| $H_0$ | Null hypothesis | No genetic effect ($\beta = 0$) |
| $H_a$ | Alternative hypothesis | Genetic effect exists ($\beta \neq 0$) |
| **Parameters and Spaces** | | |
| $\theta$ | Generic parameter of statistical model | Could be $\beta$, $\sigma^2$, or other parameters |
| $\Theta$ | Parameter space of $\theta$ | Set of all possible values for $\theta$ |
| **Sufficient Statistics** | | |
| $T_1$ | Sufficient statistic (scalar) | $T_1 = \sum_{i=1}^N X_i^2 = 1000$ |
| $\mathbf{T}_2$ | Sufficient statistic vector | $\mathbf{T}_2 = \sum_{i=1}^N X_i \mathbf{Y}_i$ for multivariate traits |
| **Mixture Models** | | |
| $\pi_c$ | Mixture weight for component $c$ | 90% of variants have no effect ($\pi_1 = 0.9$) |
| $p_c(\beta)$ | Component density function | Normal distribution for component $c$ |
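To show how $\mathbf{X}_{\text{raw}}$, $\mu_j$, $\mathbf{X}$ and $\mathbf{G}$ fit together, here is a minimal sketch in plain Python. It assumes each column is centered by $\mu_j$ and scaled by $\sqrt{2f_j(1-f_j)}$ with $f_j = \mu_j/2$ (Hardy-Weinberg scaling), and that $\mathbf{G} = \mathbf{X}\mathbf{X}^\top / M$. The function names `standardize` and `grm` are hypothetical, and the course may instead scale by the empirical standard deviation.

```python
import math

def standardize(x_raw):
    """Center and scale an N x M genotype matrix given as a list of rows.

    Column j is centered by mu_j (its mean dosage) and divided by
    sqrt(2 f_j (1 - f_j)) with f_j = mu_j / 2 (assumed Hardy-Weinberg scaling).
    Monomorphic columns (f_j = 0 or 1) are set to zero to avoid dividing by 0.
    """
    n, m = len(x_raw), len(x_raw[0])
    x = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mu_j = sum(row[j] for row in x_raw) / n
        f_j = mu_j / 2
        sd_j = math.sqrt(2 * f_j * (1 - f_j))
        for i in range(n):
            x[i][j] = (x_raw[i][j] - mu_j) / sd_j if sd_j > 0 else 0.0
    return x

def grm(x):
    """Genetic relationship matrix G = X X^T / M for a standardized X."""
    n, m = len(x), len(x[0])
    return [[sum(x[a][j] * x[b][j] for j in range(m)) / m for b in range(n)]
            for a in range(n)]

# Toy example: N = 3 individuals, M = 2 variants, dosages 0/1/2.
x_raw = [[0, 1],
         [1, 1],
         [2, 0]]
g = grm(standardize(x_raw))
```

Every column of the standardized matrix has mean zero, so each entry of $\mathbf{G}$ measures how strongly two individuals co-deviate from the population means across variants.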
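The meta-analysis symbols ($\hat{\beta}_k$, $w_k$, $Q$, $I^2$, $\tau^2$, $w_k^*$) can be tied together in a short sketch. It uses $Q = \sum_k w_k(\hat{\beta}_k - \hat{\beta}_{\text{FE}})^2$ and $I^2 = \max\{0, (Q - (K-1))/Q\}$. The table does not say how $\tau^2$ is estimated, so this sketch assumes the DerSimonian-Laird moment estimator; the function name `meta_analysis` and the toy inputs are hypothetical.

```python
def meta_analysis(betas, ses):
    """Fixed- and random-effects meta-analysis of K study estimates.

    betas: per-study effect estimates (beta_hat_k); ses: their standard errors (SE_k).
    tau^2 uses the DerSimonian-Laird moment estimator (an assumed choice).
    """
    k = len(betas)
    w = [1.0 / se ** 2 for se in ses]                       # fixed-effect weights w_k
    sw = sum(w)
    beta_fe = sum(wk * bk for wk, bk in zip(w, betas)) / sw
    q = sum(wk * (bk - beta_fe) ** 2 for wk, bk in zip(w, betas))  # Cochran's Q
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0      # I^2 as a proportion
    c = sw - sum(wk ** 2 for wk in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0    # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]         # random-effects weights w_k^*
    beta_re = sum(ws * bk for ws, bk in zip(w_star, betas)) / sum(w_star)
    return {"beta_fe": beta_fe, "Q": q, "I2": i2, "tau2": tau2, "beta_re": beta_re}

res = meta_analysis([0.4, 0.6, 0.2], [0.1, 0.2, 0.1])
```

With these toy inputs the fixed-effect estimate is $1/3$, $Q = 4$ on $K - 1 = 2$ degrees of freedom, $I^2 = 0.5$, and $\tau^2 = 0.015$. Adding $\tau^2$ flattens the weights, so the random-effects estimate (about 0.356) moves toward the less precise second study.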
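The prior and posterior symbols ($\beta_0$, $\sigma_0^2$, $\beta_1$, $\sigma_1^2$) and the sufficient statistic $T_1$ connect through the conjugate normal update. As a sketch, assume a single variant with $Y_i = \beta X_i + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$ with $\sigma^2$ known, no intercept (traits centered), and prior $\beta \sim N(\beta_0, \sigma_0^2)$. Then $1/\sigma_1^2 = 1/\sigma_0^2 + T_1/\sigma^2$ and $\beta_1 = \sigma_1^2(\beta_0/\sigma_0^2 + \sum_i X_i Y_i/\sigma^2)$. The function name `posterior_normal` and the toy data are hypothetical.

```python
def posterior_normal(x, y, sigma2, beta0=0.0, sigma0_2=0.25):
    """Conjugate normal posterior for beta in y_i = beta * x_i + e_i, e_i ~ N(0, sigma2).

    Assumes sigma2 is known, there is no intercept (y centered), and the prior is
    beta ~ N(beta0, sigma0_2). Returns (beta1, sigma1_2).
    """
    t1 = sum(xi * xi for xi in x)                # T_1 = sum_i X_i^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # scalar analogue of T_2
    precision1 = 1.0 / sigma0_2 + t1 / sigma2    # posterior precision adds up
    sigma1_2 = 1.0 / precision1
    beta1 = sigma1_2 * (beta0 / sigma0_2 + sxy / sigma2)
    return beta1, sigma1_2

beta1, sigma1_2 = posterior_normal([0, 1, 2, 1], [0.1, 0.4, 1.1, 0.6], sigma2=1.0)
```

Here $T_1 = 6$, so the posterior variance shrinks from the prior 0.25 to 0.1 and the posterior mean is 0.32. The data enter only through $T_1$ and $\sum_i X_i Y_i$, which is what makes them sufficient.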
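The testing symbols ($\ell$, $\Lambda$, $p$, $H_0$, $H_a$) can be illustrated with a likelihood ratio test for one variant. This sketch assumes the same no-intercept Gaussian model with known $\sigma^2$, compares $H_0: \beta = 0$ against $H_a: \beta \neq 0$, and refers $\Lambda = 2(\ell_1 - \ell_0)$ to a $\chi^2_1$ distribution, whose upper tail equals $\operatorname{erfc}(\sqrt{\Lambda/2})$. The helper names are hypothetical.

```python
import math

def gaussian_loglik(x, y, beta, sigma2):
    """Log-likelihood of y_i ~ N(beta * x_i, sigma2) (no intercept, sigma2 known)."""
    n = len(y)
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

def chi2_sf_1df(stat):
    """Upper tail P(chi^2_1 > stat), using chi^2_1 = Z^2 so P = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2))

def lrt(x, y, sigma2=1.0):
    """Likelihood ratio test of H0: beta = 0 against Ha: beta != 0."""
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)  # MLE under Ha
    ell1 = gaussian_loglik(x, y, beta_hat, sigma2)
    ell0 = gaussian_loglik(x, y, 0.0, sigma2)
    lam = 2 * (ell1 - ell0)                      # Lambda = 2 (ell_1 - ell_0)
    return lam, chi2_sf_1df(lam)

lam, p = lrt([0, 1, 2, 1], [0.1, 0.4, 1.1, 0.6])
```

Working on the log scale keeps the difference $\ell_1 - \ell_0$ numerically stable; with known $\sigma^2$ the normalizing constants cancel and $\Lambda$ reduces to the drop in residual sum of squares divided by $\sigma^2$.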