Tuorui "v1ncent19" Peng
En voyage dans l'espace de Hilbert.

Cochran Theorem for Variance Decomposition

Cochran’s Theorem is the basis of variance decomposition in ANOVA. Below is a proof that can be understood intuitively, followed by a linear regression example showing how Cochran’s theorem is used in ANOVA.

Cochran’s Theorem:

Given a random vector \(X=(X_1,X_2,\ldots,X_n)'\) with \(X_i\) i.i.d. \(\sim N(0,1)\), and positive semi-definite matrices \(A_1,A_2,\ldots,A_k\) with \(\mathrm{rank}(A_i):=r_i\), suppose that \(\begin{align} \sum_{i=1}^kA_i=I_n,\quad \sum_{i=1}^kr_i=n. \end{align}\)

Denote the quadratic forms generated by \(X\) and the \(A_i\) by \(\begin{align} &Q_i:=X'A_iX,\quad i=1,2,\ldots,k,\\ &\sum_{i=1}^nX_i^2=X'I_nX=\sum_{i=1}^kX'A_iX=\sum_{i=1}^kQ_i; \end{align}\)

then we have:

  • Independence: \(Q_i\perp\!\!\!\perp Q_j,\quad\text{if }i\neq j\)
  • \(\chi^2\) distribution: \(Q_i\sim \chi^2_{r_i}\)
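Before the proof, here is a quick numerical sanity check (a minimal numpy/scipy sketch, not a proof). It uses the classic two-matrix split \(A_1=\frac{1}{n}\mathbf{1}_n\mathbf{1}_n'\) (rank \(1\)) and \(A_2=I_n-\frac{1}{n}\mathbf{1}_n\mathbf{1}_n'\) (rank \(n-1\)); the dimensions, seed, and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 10, 20_000

# Two-matrix split: A1 projects onto the span of the all-ones vector (rank 1),
# A2 onto its orthogonal complement (rank n-1); A1 + A2 = I_n.
J = np.ones((n, n))
A1 = J / n
A2 = np.eye(n) - J / n

X = rng.standard_normal((reps, n))          # each row is one draw of X ~ N(0, I_n)
Q1 = np.einsum('ri,ij,rj->r', X, A1, X)     # Q1 = X'A1X for every draw
Q2 = np.einsum('ri,ij,rj->r', X, A2, X)     # Q2 = X'A2X

# The quadratic forms add up to the total sum of squares.
assert np.allclose(Q1 + Q2, (X ** 2).sum(axis=1))

# Marginals should look like chi^2_1 and chi^2_{n-1}, and Q1, Q2 should be uncorrelated.
print(stats.kstest(Q1, stats.chi2(df=1).cdf))
print(stats.kstest(Q2, stats.chi2(df=n - 1).cdf))
print(np.corrcoef(Q1, Q2)[0, 1])
```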

Here is an intuitive proof of this simplified version:

Proof

Note that \(A_i\) is positive semi-definite with \(\mathrm{rank}(A_i)=r_i\), so its eigendecomposition can be written as \(\begin{align} A_i=P_i\Lambda _iP_i'= \begin{bmatrix} *_{11}&\ldots&*_{1r_i}&0&\ldots &0\\ *_{21}&\ldots&*_{2r_i}&0&\ldots &0\\ \vdots& &\vdots&\vdots& &\vdots\\ *_{n1}&\ldots&*_{nr_i}&0&\ldots &0 \end{bmatrix} \begin{bmatrix} \lambda_1 & & & & &\\ & \ddots & & & &\\ & & \lambda_{r_i} & & &\\ & & & 0 & &\\ & & & & \ddots &\\ & & & & & 0 \end{bmatrix} \begin{bmatrix} *_{11}&*_{12}&\ldots &*_{1n}\\ \vdots& & &\vdots\\ *_{r_i1}&*_{r_i2}&\ldots &*_{r_in}\\ 0&0&\ldots&0\\ \vdots& & &\vdots\\ 0&0&\ldots&0 \end{bmatrix}, \end{align}\) i.e. only the first \(r_i\) columns of \(P_i\) (and the first \(r_i\) diagonal entries of \(\Lambda_i\)) are non-zero.

Note: the positions of the non-zero columns of \(P_i\) (and of the corresponding non-zero diagonal entries of \(\Lambda _i\)) can be chosen freely; here we place them at positions \((1+\sum_{j<i}r_j):(\sum_{j\leq i}r_j)\), i.e. \(P_1\) uses columns \(1:r_1\), \(P_2\) uses columns \((r_1+1):(r_1+r_2)\), \(\ldots\), \(P_k\) uses columns \((n-r_k+1):n\). With this choice we find that \(\begin{align} P_i'P_j=0,\quad P_iP_j'=0,\quad \text{if }i\neq j. \end{align}\)
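A small numpy illustration of this column-placement trick, again using the hypothetical two-matrix split from the sketch above (the dimension and tolerance are arbitrary):

```python
import numpy as np

n = 6
J = np.ones((n, n))
A1, A2 = J / n, np.eye(n) - J / n           # ranks r1 = 1 and r2 = n - 1, A1 + A2 = I_n

# Eigendecompose each A_i and keep only the eigenvectors of the non-zero eigenvalues.
w1, V1 = np.linalg.eigh(A1)
w2, V2 = np.linalg.eigh(A2)
U1 = V1[:, w1 > 1e-10]                      # n x r1
U2 = V2[:, w2 > 1e-10]                      # n x r2

# Place the non-zero columns of each P_i in its own block of positions:
# P_1 uses column 1, P_2 uses columns 2..n, so the blocks never overlap.
P1 = np.zeros((n, n)); P1[:, :1] = U1
P2 = np.zeros((n, n)); P2[:, 1:] = U2
L1 = np.diag([1.0] + [0.0] * (n - 1))       # Lambda_1
L2 = np.diag([0.0] + [1.0] * (n - 1))       # Lambda_2

assert np.allclose(P1 @ L1 @ P1.T, A1)
assert np.allclose(P2 @ L2 @ P2.T, A2)
assert np.allclose(P1.T @ P2, 0)            # P_i'P_j = 0 for i != j
assert np.allclose(P1 @ P2.T, 0)            # P_iP_j' = 0 for i != j
assert np.allclose((P1 + P2).T @ (P1 + P2), np.eye(n))   # P = P_1 + P_2 is orthogonal
```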

Then we can define \(Y_i:=P_i'X\) and \(Y:=\sum_{i=1}^kP_i'X=:P'X\). In this way \(Y_i\) has only \(r_i\) non-zero elements, at \(\begin{align} Y_{1+\sum_{j<i}r_j},\,Y_{2+\sum_{j<i}r_j},\ldots,Y_{\sum_{j\leq i}r_j}, \end{align}\)

and zero otherwise. Using the block structure of the \(P_i\) and \(\Lambda _i\), \(\begin{align} I_n=&\sum_{i=1}^kA_i=\sum_{i=1}^kP_i\Lambda _iP_i'\\ =&\sum_{i=1}^k\left[\left(\sum_{j=1}^kP_j\right)\Lambda _i\left(\sum_{j=1}^kP_j\right)'\right]\qquad(\text{since }P_j\Lambda _i=0\text{ for }j\neq i)\\ =&P\left(\sum_{i=1}^k\Lambda_i\right) P'\\ \Rightarrow \Lambda _i=&\mathrm{diag}\left\{ 0_{r_1},\ldots,I_{r_i},\ldots,0_{r_k} \right\},\quad\forall i=1,2,\ldots,k \end{align}\)

Thus each quadratic form, expressed in terms of \(Y_i=P_i'X\), becomes \(\begin{align} X'A_iX=X'P_i\Lambda _iP_i'X=Y_i'Y_i=\sum_{l=1+\sum_{j<i}r_j}^{r_i+\sum_{j< i}r_j}Y_l^2 \end{align}\)

Note the covariance structure of \(Y\): since \(\mathbb{E}(XX')=I_n\), \(\begin{align} \mathrm{cov}(Y_i,Y_j)=\mathbb{E}\left( Y_iY_j' \right) =P_i'\,\mathbb{E}(XX')\,P_j=P_i'P_j=\mathrm{diag}\left\{ 0_{r_1},\ldots,\delta _{ij}I_{r_i},\ldots,0_{r_k} \right\} \end{align}\)

Then, since the \(Y_l\) are i.i.d. \(N(0,1)\) and different \(Q_i\) involve disjoint blocks of the \(Y_l\), \(\begin{align} &Q_i=Y_i'Y_i=\sum_{l=1+\sum_{j<i}r_j}^{r_i+\sum_{j< i}r_j}Y_l^2\sim \chi^2_{r_i}\\ &Q_i\perp\!\!\!\perp Q_j,\quad i\neq j \end{align}\)
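Continuing the numerical sketch from above (same hypothetical two-matrix split, rebuilt here so the snippet runs on its own), the change of variables \(Y=P'X\) can also be checked directly: \(Y\) has (approximately) identity covariance, and each \(Q_i\) is the sum of squares over its own block of \(Y\)-coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 6, 50_000

# Rebuild P = P_1 + P_2 for the split A1 = J/n, A2 = I - J/n.
J = np.ones((n, n))
A1, A2 = J / n, np.eye(n) - J / n
w1, V1 = np.linalg.eigh(A1)
w2, V2 = np.linalg.eigh(A2)
P = np.zeros((n, n))
P[:, :1] = V1[:, w1 > 1e-10]
P[:, 1:] = V2[:, w2 > 1e-10]

X = rng.standard_normal((reps, n))
Y = X @ P                                    # row-wise Y' = X'P, i.e. Y = P'X

# cov(Y) is close to I_n, so the Y_l behave like i.i.d. N(0,1).
print(np.abs(np.cov(Y, rowvar=False) - np.eye(n)).max())

# Each Q_i equals the sum of squares over its own block of Y-coordinates.
Q1 = np.einsum('ri,ij,rj->r', X, A1, X)
Q2 = np.einsum('ri,ij,rj->r', X, A2, X)
assert np.allclose(Q1, Y[:, 0] ** 2)
assert np.allclose(Q2, (Y[:, 1:] ** 2).sum(axis=1))
```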

An Example of Linear Regression

In the OLS estimation of the linear regression model \(\begin{align} \mathop{Y}\limits_{n\times 1} =\mathop{X}\limits_{n\times (p+1)} \mathop{\beta }\limits_{(p+1)\times 1} +\mathop{\varepsilon }\limits_{n\times 1},\quad \varepsilon \sim N_n(0,I_n) \end{align}\)

in which \(\beta =(\beta _0,\beta _1,\ldots,\beta _p)'\). The solution is \(\begin{align} \hat{\beta }=(X'X)^{-1}X'Y \end{align}\)
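As a concrete numpy sketch, the normal-equations solution above can be computed directly and compared against numpy's least-squares solver; the design matrix, true coefficients, and seed below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3

# Hypothetical design: an intercept column of ones plus p random predictors.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])             # (p+1) coefficients
Y = X @ beta_true + rng.standard_normal(n)              # eps ~ N(0, I_n)

# Normal-equations solution beta_hat = (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Agrees with numpy's least-squares solver.
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])
print(beta_hat)
```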

Denote the hat matrix \(H:=X(X'X)^{-1}X'\) and the all-ones matrix \(\mathcal{J}_n:=\mathbf{1}_n\mathbf{1}_n'\). Note that \(\mathbf{1}_n=X_{1:n,1}\) is the first column of \(X\), so \(\begin{align} H\mathbf{1}_n=&HX_{1:n,1}=\left[X(X'X)^{-1}X'X\right]_{1:n,1}=X_{1:n,1}=\mathbf{1}_n\\ \Rightarrow &H\mathcal{J}_n=H\mathbf{1}_n\mathbf{1}_n'=\mathcal{J}_n \end{align}\)
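These two hat-matrix identities are easy to verify numerically; the sketch below reuses the hypothetical intercept-plus-random-predictors design from the previous snippet (rebuilt so it runs on its own).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # intercept column first

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix H = X (X'X)^{-1} X'
ones = np.ones(n)
Jn = np.outer(ones, ones)                   # J_n = 1_n 1_n'

assert np.allclose(H @ ones, ones)          # H 1_n = 1_n
assert np.allclose(H @ Jn, Jn)              # hence H J_n = J_n
```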

Variance decomposition: \(\begin{align} \mathrm{SSTO}=&\sum_{i=1}^n(Y_i-\bar{Y})^2= \left(Y-\dfrac{1}{n}\mathcal{J}_nY\right)' \left(Y-\dfrac{1}{n}\mathcal{J}_nY\right)=Y'\left( I-\dfrac{1}{n}\mathcal{J}_n \right)Y\\ \mathrm{SSR}=&\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2=\left( HY-\dfrac{1}{n}\mathcal{J}_nY \right)'\left( HY-\dfrac{1}{n}\mathcal{J}_nY \right)=Y'\left( H-\dfrac{1}{n}\mathcal{J}_n \right)Y\\ \mathrm{SSE}=&\sum_{i=1}^n(Y_i-\hat{Y}_i)^2= \left( Y-HY \right)' \left( Y-HY \right)=Y'\left( I- H\right)Y \end{align}\)
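Numerically, the three elementwise sums of squares match the three quadratic forms and satisfy \(\mathrm{SSTO}=\mathrm{SSR}+\mathrm{SSE}\); the design, coefficients, and seed below are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
Jn = np.ones((n, n)) / n                    # (1/n) J_n
Y_hat = H @ Y
Y_bar = Y.mean()

SSTO = ((Y - Y_bar) ** 2).sum()
SSR = ((Y_hat - Y_bar) ** 2).sum()
SSE = ((Y - Y_hat) ** 2).sum()

# Elementwise sums of squares equal the quadratic forms, and they add up.
assert np.allclose(SSTO, Y @ (np.eye(n) - Jn) @ Y)
assert np.allclose(SSR, Y @ (H - Jn) @ Y)
assert np.allclose(SSE, Y @ (np.eye(n) - H) @ Y)
assert np.allclose(SSTO, SSR + SSE)
print(SSTO, SSR, SSE)
```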

Note that \(I-\dfrac{1}{n}\mathcal{J}_n\), \(H-\dfrac{1}{n}\mathcal{J}_n\) and \(I-H\) are all idempotent matrices, so their ranks equal their traces: \(\begin{align} \mathrm{rank}&\left(I-\dfrac{1}{n}\mathcal{J}_n\right)=\mathrm{tr}\left(I-\dfrac{1}{n}\mathcal{J}_n\right)=n-1 \\ \mathrm{rank}&\left(H-\dfrac{1}{n}\mathcal{J}_n\right)=\mathrm{tr}\left(H-\dfrac{1}{n}\mathcal{J}_n\right)=(p+1)-1=p\\ \mathrm{rank}&(I-H)=\mathrm{tr}(I-H)=n-(p+1)=n-p-1 \end{align}\)

These ranks are exactly the degrees of freedom in the ANOVA table.
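The idempotency, ranks, and traces can be checked on the same hypothetical design, confirming that rank equals trace equals the ANOVA degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
H = X @ np.linalg.solve(X.T @ X, X.T)
Jn = np.ones((n, n)) / n
I_n = np.eye(n)

for name, A, df in [("I - J/n", I_n - Jn, n - 1),
                    ("H - J/n", H - Jn, p),
                    ("I - H",   I_n - H, n - p - 1)]:
    assert np.allclose(A @ A, A)            # idempotent, so rank(A) = tr(A)
    print(name, np.linalg.matrix_rank(A), np.trace(A).round(), df)
```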

An extra comment: the derivation above requires the design matrix \(X\) to contain \(\mathbf{1}_{n}\) as its first column, i.e. it requires an intercept term. This suggests that the ANOVA decomposition above applies only to regression with an intercept term.


Author: Vincent Peng

First Created: March 8, 2022

Category: Knowledge