Tuorui "v1ncent19" Peng
En voyage dans l'espace de Hilbert.

Cochran Theorem for Variance Decomposition

Cochran’s Theorem is the basis of variance decomposition in ANOVA. Below is a proof that can be understood intuitively, followed by a linear regression example showing how Cochran’s theorem is used in ANOVA.

Cochran’s Theorem:

Given a random vector \(X=(X_1,X_2,\ldots,X_n)'\) with \(X_i\) i.i.d. \(\sim N(0,1)\), and positive semi-definite matrices \(A_1,A_2,\ldots,A_k\) with \(\mathrm{rank}(A_i):=r_i\), suppose that \(\begin{align} \sum_{i=1}^kA_i=I_n,\quad \sum_{i=1}^kr_i=n. \end{align}\)

Denote the quadratic forms generated by \(X\) and the \(A_i\) by \(\begin{align} &Q_i:=X'A_iX,\quad i=1,2,\ldots,k,\\ &\sum_{i=1}^nX_i^2=X'I_nX=\sum_{i=1}^kX'A_iX=\sum_{i=1}^kQ_i; \end{align}\)

then we have:

  • Independence: \(Q_i\perp\!\!\!\perp Q_j,\quad\text{if }i\neq j\)
  • \(\chi^2\) distribution: \(Q_i\sim \chi^2_{r_i}\)
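Before the proof, here is a quick numerical sanity check (a minimal numpy/scipy sketch, not a proof). It uses the classic two-matrix split \(A_1=\frac{1}{n}\mathbf{1}_n\mathbf{1}_n'\) (rank \(1\)) and \(A_2=I_n-\frac{1}{n}\mathbf{1}_n\mathbf{1}_n'\) (rank \(n-1\)); the dimensions, seed, and variable names are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 10, 20_000

# Two-matrix split: A1 projects onto the span of the all-ones vector (rank 1),
# A2 onto its orthogonal complement (rank n-1); A1 + A2 = I_n.
J = np.ones((n, n))
A1 = J / n
A2 = np.eye(n) - J / n

X = rng.standard_normal((reps, n))          # each row is one draw of X ~ N(0, I_n)
Q1 = np.einsum('ri,ij,rj->r', X, A1, X)     # Q1 = X'A1X for every draw
Q2 = np.einsum('ri,ij,rj->r', X, A2, X)     # Q2 = X'A2X

# The quadratic forms add up to the total sum of squares.
assert np.allclose(Q1 + Q2, (X ** 2).sum(axis=1))

# Marginals should look like chi^2_1 and chi^2_{n-1}, and Q1, Q2 should be uncorrelated.
print(stats.kstest(Q1, stats.chi2(df=1).cdf))
print(stats.kstest(Q2, stats.chi2(df=n - 1).cdf))
print(np.corrcoef(Q1, Q2)[0, 1])
```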

Here is an intuitive proof of this simplified version:

Proof

Note that \(A_i\) is positive semi-definite with \(\mathrm{rank}(A_i)=r_i\), so its eigendecomposition can be written as \(\begin{align} A_i=P_i\Lambda _iP_i'= \begin{bmatrix} *_{11}&\ldots&*_{1r_i}&0&\ldots &0\\ *_{21}&\ldots&*_{2r_i}&0&\ldots &0\\ \vdots& &\vdots&\vdots& &\vdots\\ *_{n1}&\ldots&*_{nr_i}&0&\ldots &0 \end{bmatrix} \begin{bmatrix} \lambda_1 & & & & &\\ & \ddots & & & &\\ & & \lambda_{r_i} & & &\\ & & & 0 & &\\ & & & & \ddots &\\ & & & & & 0 \end{bmatrix} \begin{bmatrix} *_{11}&*_{12}&\ldots &*_{1n}\\ \vdots& & &\vdots\\ *_{r_i1}&*_{r_i2}&\ldots &*_{r_in}\\ 0&0&\ldots&0\\ \vdots& & &\vdots\\ 0&0&\ldots&0 \end{bmatrix}, \end{align}\) i.e. only the first \(r_i\) columns of \(P_i\) (and the first \(r_i\) diagonal entries of \(\Lambda_i\)) are non-zero.

Note: the positions of the non-zero columns of \(P_i\) (and of the corresponding non-zero diagonal entries of \(\Lambda _i\)) can be chosen freely; here we place them at positions \((1+\sum_{j<i}r_j):(\sum_{j\leq i}r_j)\), i.e. \(P_1\) uses columns \(1:r_1\), \(P_2\) uses columns \((r_1+1):(r_1+r_2)\), \(\ldots\), \(P_k\) uses columns \((n-r_k+1):n\). With this choice we find that \(\begin{align} P_i'P_j=0,\quad P_iP_j'=0,\quad \text{if }i\neq j. \end{align}\)
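A small numpy illustration of this column-placement trick, again using the hypothetical two-matrix split from the sketch above (the dimension and tolerance are arbitrary):

```python
import numpy as np

n = 6
J = np.ones((n, n))
A1, A2 = J / n, np.eye(n) - J / n           # ranks r1 = 1 and r2 = n - 1, A1 + A2 = I_n

# Eigendecompose each A_i and keep only the eigenvectors of the non-zero eigenvalues.
w1, V1 = np.linalg.eigh(A1)
w2, V2 = np.linalg.eigh(A2)
U1 = V1[:, w1 > 1e-10]                      # n x r1
U2 = V2[:, w2 > 1e-10]                      # n x r2

# Place the non-zero columns of each P_i in its own block of positions:
# P_1 uses column 1, P_2 uses columns 2..n, so the blocks never overlap.
P1 = np.zeros((n, n)); P1[:, :1] = U1
P2 = np.zeros((n, n)); P2[:, 1:] = U2
L1 = np.diag([1.0] + [0.0] * (n - 1))       # Lambda_1
L2 = np.diag([0.0] + [1.0] * (n - 1))       # Lambda_2

assert np.allclose(P1 @ L1 @ P1.T, A1)
assert np.allclose(P2 @ L2 @ P2.T, A2)
assert np.allclose(P1.T @ P2, 0)            # P_i'P_j = 0 for i != j
assert np.allclose(P1 @ P2.T, 0)            # P_iP_j' = 0 for i != j
assert np.allclose((P1 + P2).T @ (P1 + P2), np.eye(n))   # P = P_1 + P_2 is orthogonal
```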

Then we can define \(Y_i:=P_i'X\) and \(Y:=\sum_{i=1}^kP_i'X=:P'X\). In this way \(Y_i\) has only \(r_i\) non-zero elements, at \(\begin{align} Y_{1+\sum_{j<i}r_j},\,Y_{2+\sum_{j<i}r_j},\ldots,Y_{\sum_{j\leq i}r_j}, \end{align}\)

and zero otherwise. Using the block structure of the \(P_i\) and \(\Lambda _i\), \(\begin{align} I_n=&\sum_{i=1}^kA_i=\sum_{i=1}^kP_i\Lambda _iP_i'\\ =&\sum_{i=1}^k\left[\left(\sum_{j=1}^kP_j\right)\Lambda _i\left(\sum_{j=1}^kP_j\right)'\right]\qquad(\text{since }P_j\Lambda _i=0\text{ for }j\neq i)\\ =&P\left(\sum_{i=1}^k\Lambda_i\right) P'\\ \Rightarrow \Lambda _i=&\mathrm{diag}\left\{ 0_{r_1},\ldots,I_{r_i},\ldots,0_{r_k} \right\},\quad\forall i=1,2,\ldots,k \end{align}\)

Thus each quadratic form, expressed in terms of \(Y_i=P_i'X\), becomes \(\begin{align} X'A_iX=X'P_i\Lambda _iP_i'X=Y_i'Y_i=\sum_{l=1+\sum_{j<i}r_j}^{r_i+\sum_{j< i}r_j}Y_l^2 \end{align}\)

Note the covariance structure of \(Y\): since \(\mathbb{E}(XX')=I_n\), \(\begin{align} \mathrm{cov}(Y_i,Y_j)=\mathbb{E}\left( Y_iY_j' \right) =P_i'\,\mathbb{E}(XX')\,P_j=P_i'P_j=\mathrm{diag}\left\{ 0_{r_1},\ldots,\delta _{ij}I_{r_i},\ldots,0_{r_k} \right\} \end{align}\)

Then, since the \(Y_l\) are i.i.d. \(N(0,1)\) and different \(Q_i\) involve disjoint blocks of the \(Y_l\), \(\begin{align} &Q_i=Y_i'Y_i=\sum_{l=1+\sum_{j<i}r_j}^{r_i+\sum_{j< i}r_j}Y_l^2\sim \chi^2_{r_i}\\ &Q_i\perp\!\!\!\perp Q_j,\quad i\neq j \end{align}\)
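Continuing the numerical sketch from above (same hypothetical two-matrix split, rebuilt here so the snippet runs on its own), the change of variables \(Y=P'X\) can also be checked directly: \(Y\) has (approximately) identity covariance, and each \(Q_i\) is the sum of squares over its own block of \(Y\)-coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 6, 50_000

# Rebuild P = P_1 + P_2 for the split A1 = J/n, A2 = I - J/n.
J = np.ones((n, n))
A1, A2 = J / n, np.eye(n) - J / n
w1, V1 = np.linalg.eigh(A1)
w2, V2 = np.linalg.eigh(A2)
P = np.zeros((n, n))
P[:, :1] = V1[:, w1 > 1e-10]
P[:, 1:] = V2[:, w2 > 1e-10]

X = rng.standard_normal((reps, n))
Y = X @ P                                    # row-wise Y' = X'P, i.e. Y = P'X

# cov(Y) is close to I_n, so the Y_l behave like i.i.d. N(0,1).
print(np.abs(np.cov(Y, rowvar=False) - np.eye(n)).max())

# Each Q_i equals the sum of squares over its own block of Y-coordinates.
Q1 = np.einsum('ri,ij,rj->r', X, A1, X)
Q2 = np.einsum('ri,ij,rj->r', X, A2, X)
assert np.allclose(Q1, Y[:, 0] ** 2)
assert np.allclose(Q2, (Y[:, 1:] ** 2).sum(axis=1))
```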

An Example of Linear Regression

In the OLS estimation of the linear regression model \(\begin{align} \mathop{Y}\limits_{n\times 1} =\mathop{X}\limits_{n\times (p+1)} \mathop{\beta }\limits_{(p+1)\times 1} +\mathop{\varepsilon }\limits_{n\times 1},\quad \varepsilon \sim N_n(0,I_n) \end{align}\)

in which \(\beta =(\beta _0,\beta _1,\ldots,\beta _p)'\). The solution is \(\begin{align} \hat{\beta }=(X'X)^{-1}X'Y \end{align}\)
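As a concrete numpy sketch, the normal-equations solution above can be computed directly and compared against numpy's least-squares solver; the design matrix, true coefficients, and seed below are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3

# Hypothetical design: an intercept column of ones plus p random predictors.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])             # (p+1) coefficients
Y = X @ beta_true + rng.standard_normal(n)              # eps ~ N(0, I_n)

# Normal-equations solution beta_hat = (X'X)^{-1} X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Agrees with numpy's least-squares solver.
assert np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0])
print(beta_hat)
```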

Denote the hat matrix \(H:=X(X'X)^{-1}X'\) and the all-ones matrix \(\mathcal{J}_n:=\mathbf{1}_n\mathbf{1}_n'\). Note that \(\mathbf{1}_n=X_{1:n,1}\) is the first column of \(X\), so \(\begin{align} H\mathbf{1}_n=&HX_{1:n,1}=\left[X(X'X)^{-1}X'X\right]_{1:n,1}=X_{1:n,1}=\mathbf{1}_n\\ \Rightarrow &H\mathcal{J}_n=H\mathbf{1}_n\mathbf{1}_n'=\mathcal{J}_n \end{align}\)
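These two hat-matrix identities are easy to verify numerically; the sketch below reuses the hypothetical intercept-plus-random-predictors design from the previous snippet (rebuilt so it runs on its own).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # intercept column first

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix H = X (X'X)^{-1} X'
ones = np.ones(n)
Jn = np.outer(ones, ones)                   # J_n = 1_n 1_n'

assert np.allclose(H @ ones, ones)          # H 1_n = 1_n
assert np.allclose(H @ Jn, Jn)              # hence H J_n = J_n
```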

Variance decomposition: \(\begin{align} \mathrm{SSTO}=&\sum_{i=1}^n(Y_i-\bar{Y})^2= \left(Y-\dfrac{1}{n}\mathcal{J}_nY\right)' \left(Y-\dfrac{1}{n}\mathcal{J}_nY\right)=Y'\left( I-\dfrac{1}{n}\mathcal{J}_n \right)Y\\ \mathrm{SSR}=&\sum_{i=1}^n(\hat{Y}_i-\bar{Y})^2=\left( HY-\dfrac{1}{n}\mathcal{J}_nY \right)'\left( HY-\dfrac{1}{n}\mathcal{J}_nY \right)=Y'\left( H-\dfrac{1}{n}\mathcal{J}_n \right)Y\\ \mathrm{SSE}=&\sum_{i=1}^n(Y_i-\hat{Y}_i)^2= \left( Y-HY \right)' \left( Y-HY \right)=Y'\left( I- H\right)Y \end{align}\)
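Numerically, the three elementwise sums of squares match the three quadratic forms and satisfy \(\mathrm{SSTO}=\mathrm{SSR}+\mathrm{SSE}\); the design, coefficients, and seed below are again hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
Y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.standard_normal(n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
Jn = np.ones((n, n)) / n                    # (1/n) J_n
Y_hat = H @ Y
Y_bar = Y.mean()

SSTO = ((Y - Y_bar) ** 2).sum()
SSR = ((Y_hat - Y_bar) ** 2).sum()
SSE = ((Y - Y_hat) ** 2).sum()

# Elementwise sums of squares equal the quadratic forms, and they add up.
assert np.allclose(SSTO, Y @ (np.eye(n) - Jn) @ Y)
assert np.allclose(SSR, Y @ (H - Jn) @ Y)
assert np.allclose(SSE, Y @ (np.eye(n) - H) @ Y)
assert np.allclose(SSTO, SSR + SSE)
print(SSTO, SSR, SSE)
```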

Note that \(I-\dfrac{1}{n}\mathcal{J}_n\), \(H-\dfrac{1}{n}\mathcal{J}_n\) and \(I-H\) are all idempotent matrices, so their ranks equal their traces: \(\begin{align} \mathrm{rank}&\left(I-\dfrac{1}{n}\mathcal{J}_n\right)=\mathrm{tr}\left(I-\dfrac{1}{n}\mathcal{J}_n\right)=n-1 \\ \mathrm{rank}&\left(H-\dfrac{1}{n}\mathcal{J}_n\right)=\mathrm{tr}\left(H-\dfrac{1}{n}\mathcal{J}_n\right)=(p+1)-1=p\\ \mathrm{rank}&(I-H)=\mathrm{tr}(I-H)=n-(p+1)=n-p-1 \end{align}\)

These ranks are exactly the degrees of freedom in the ANOVA table.
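The idempotency, ranks, and traces can be checked on the same hypothetical design, confirming that rank equals trace equals the ANOVA degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
H = X @ np.linalg.solve(X.T @ X, X.T)
Jn = np.ones((n, n)) / n
I_n = np.eye(n)

for name, A, df in [("I - J/n", I_n - Jn, n - 1),
                    ("H - J/n", H - Jn, p),
                    ("I - H",   I_n - H, n - p - 1)]:
    assert np.allclose(A @ A, A)            # idempotent, so rank(A) = tr(A)
    print(name, np.linalg.matrix_rank(A), np.trace(A).round(), df)
```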

An extra comment: the derivation above requires the design matrix \(X\) to contain \(\mathbf{1}_{n}\) as its first column, i.e. it requires an intercept term. This suggests that the ANOVA decomposition above applies only to regression with an intercept term.


Author: Vincent Peng

First Created: March 8, 2022

Category: Knowledge