Frisch–Waugh–Lovell theorem

Theorem in statistics and econometrics

In econometrics, the Frisch–Waugh–Lovell[a] (FWL) theorem is named after the econometricians Ragnar Frisch, Frederick V. Waugh, and Michael C. Lovell.[1][2][3]

Background

The Frisch–Waugh–Lovell theorem is an algebraic result for regressions estimated by least squares, the most commonly used estimator in applied econometrics.[4] Least squares is a method of estimating coefficients in models that are linear in the parameters: the outcome variable is modeled as a linear combination of the input variables plus an error term. The least squares solution is the set of coefficients that minimizes the sum of squared errors. Under the assumptions of the Gauss–Markov theorem, the least squares estimator is the best linear unbiased estimator.

Let $y$ be any outcome variable and $x_1, \dots, x_k$ a set of predictor variables, such that $X = (x_1, \dots, x_k)$, and suppose $n$ observations of $(y, X)$ are sampled. If $y$ is modeled as a linear function of $X$, it can be written as $y_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i$. The least squares estimator sets the coefficients $\beta_1, \dots, \beta_k$ to minimize the sum of squared errors $\sum_{i=1}^{n} \varepsilon_i^2$. With $n$ observations this involves minimizing across $n$ equations, and is typically written in matrix form as $y = X\beta + \varepsilon$, where $y$ and $\varepsilon$ are $n$-dimensional column vectors and $X$ is an $n$-by-$k$-dimensional matrix. Then, the least squares solution is $\hat{\beta} = (X^\top X)^{-1} X^\top y$.[5]
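As a minimal numerical sketch of this closed-form solution (the simulated data, coefficient values, and variable names below are illustrative assumptions, not taken from the article), the formula can be verified with NumPy:

```python
import numpy as np

# Simulated example data (hypothetical values, for illustration only).
rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

# Least squares solution: beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # approximately [1.0, -2.0, 0.5]
```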

In regressions estimated by least squares, it is common to refer to a coefficient as the effect of that variable "holding constant" the other input variables.[6] For example, if wage is modeled as a function of education and work experience, the coefficient for education is interpreted as the difference in the expectation of wage for a unit difference in education, "holding constant" work experience. Econometrician Arthur Goldberger frames the Frisch-Waugh-Lovell theorem as "giving content to th[is] language".[7]

Definition and interpretation

The Frisch–Waugh–Lovell theorem states that in a least squares-estimated regression of the form

$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon,$

any coefficient $\beta_j$ can be estimated by the two-step process of:

  1. Regress $x_j$ on the set of other right-hand-side variables, obtaining residuals $\tilde{x}_j$
  2. Regress $y$ on $\tilde{x}_j$, obtaining $\hat{\beta}_j$

This two-step process is referred to as the residual regression or equivalently the regression anatomy theorem.[8][9][10]
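The following sketch illustrates the two-step residual regression on simulated data (the variables x1, x2 and their coefficients are hypothetical assumptions of the example), checking that the two-step estimate matches the coefficient from the full multiple regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x2 correlated with x1
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])   # full design with intercept
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x2 on the other right-hand-side variables (intercept and x1),
# keeping the residuals x2_tilde.
Z = np.column_stack([np.ones(n), x1])
x2_tilde = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]

# Step 2: regress y on x2_tilde; the slope equals the coefficient on x2
# from the full regression.
beta_2 = (x2_tilde @ y) / (x2_tilde @ x2_tilde)

print(beta_full[2], beta_2)  # identical up to floating-point error
```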

The theorem shows that coefficients in a multiple regression reflect the relationship between the associated variable and the outcome variable after removing the part linearly explained by the other predictor variables.[11][9] This is the basis for understanding the contribution of each single variable to a multivariate regression (see, for instance, Ch. 13 in [12]).

Double residual regression

The double residual regression is the three-step process:

  1. Regress $x_j$ on the set of other right-hand-side variables, obtaining residuals $\tilde{x}_j$
  2. Regress $y$ on the set of right-hand-side variables excluding $x_j$, obtaining residuals $\tilde{y}$
  3. Regress $\tilde{y}$ on $\tilde{x}_j$, estimating $\hat{\beta}_j$ and residuals $\tilde{\varepsilon}$

This yields a coefficient $\hat{\beta}_j$ identical to that of the two-step process.[7][13] It has the additional feature that the residuals $\tilde{\varepsilon}$ from the regression in step 3 equal the residuals from the full regression.[11]
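A sketch of the three-step double residual regression, again on assumed simulated data, checking both the coefficient equality and the equality of residuals noted above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_full = y - X @ beta_full

Z = np.column_stack([np.ones(n), x1])          # right-hand side excluding x2
proj = lambda v: Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
x2_tilde = x2 - proj(x2)                       # step 1: residuals of x2
y_tilde = y - proj(y)                          # step 2: residuals of y

beta_2 = (x2_tilde @ y_tilde) / (x2_tilde @ x2_tilde)   # step 3
resid_partial = y_tilde - beta_2 * x2_tilde

print(np.isclose(beta_2, beta_full[2]))        # same coefficient
print(np.allclose(resid_partial, resid_full))  # same residuals as the full regression
```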

Multivariate definition

Consider the regression $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$, where $y$ and $\varepsilon$ are $n$-dimensional column vectors, $X_1$ is an $n$-by-$k_1$ matrix, and $X_2$ is an $n$-by-$k_2$ matrix. Then, the Frisch–Waugh–Lovell theorem states that

$\hat{\beta}_2 = (\tilde{X}_2^\top \tilde{X}_2)^{-1} \tilde{X}_2^\top y = (\tilde{X}_2^\top \tilde{X}_2)^{-1} \tilde{X}_2^\top \tilde{y},$

where $\tilde{X}_2$ denotes the residuals from the regression of $X_2$ on $X_1$, and $\tilde{y}$ the residuals from the regression of $y$ on $X_1$. The first expression of $\hat{\beta}_2$ is the residual regression, and the second the double residual regression.[7]
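The matrix form of the theorem can be checked numerically by computing the residuals with the annihilator (residual-maker) matrix of $X_1$. The sketch below uses assumed simulated data and verifies both expressions for $\hat{\beta}_2$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 150, 2, 3
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k1 - 1))])
X2 = rng.normal(size=(n, k2)) + 0.3 * X1[:, [1]]      # correlated with X1
y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([0.5, 2.0, -0.7]) + rng.normal(size=n)

# Annihilator (residual-maker) matrix of X1: M = I - X1 (X1'X1)^{-1} X1'.
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
X2_tilde = M1 @ X2
y_tilde = M1 @ y

# Coefficients on X2 from the full regression...
beta_full = np.linalg.lstsq(np.column_stack([X1, X2]), y, rcond=None)[0]
# ...match both expressions in the theorem.
beta_resid = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y)         # residual regression
beta_double = np.linalg.solve(X2_tilde.T @ X2_tilde, X2_tilde.T @ y_tilde)  # double residual regression
print(np.allclose(beta_full[k1:], beta_resid), np.allclose(beta_resid, beta_double))
```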

History

The origin of the theorem is uncertain, but it was well-established in the realm of linear regression before the Frisch and Waugh paper. George Udny Yule's comprehensive analysis of partial regressions, published in 1907, included the theorem in section 9 on page 184.[14]

Yule emphasized the theorem's importance for understanding multiple and partial regression and correlation coefficients, as mentioned in section 10 of the same paper.[14]

Yule's 1907 paper also introduced the partial regression notation that is still in use today.

In 1962, Richard Stone generalized the theorem to apply to an arbitrary number of variables which may be chosen for special analysis in the same way that time was distinguished in Frisch's and Waugh's original formulation.[15]

In 1963, Lovell published a proof considered more straightforward and intuitive.[2] In recognition, his name is generally added to the name of the theorem.

Proof

Consider the linear regression $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ and the annihilator matrix $M_{X_1} = I - X_1(X_1^\top X_1)^{-1}X_1^\top$. Premultiplying both sides of the regression equation by the annihilator matrix removes from $y$ and $X_2$ the component linearly explained by $X_1$, since $M_{X_1}X_1 = 0$:

$M_{X_1} y = M_{X_1} X_2 \beta_2 + M_{X_1} \varepsilon, \quad \text{that is,} \quad \tilde{y} = \tilde{X}_2 \beta_2 + M_{X_1} \varepsilon.$

Then, by the least squares result, $\hat{\beta}_2 = (\tilde{X}_2^\top \tilde{X}_2)^{-1} \tilde{X}_2^\top \tilde{y}$, and the residuals of this regression equal the residuals of the full regression. This concludes the proof.[16][7]
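The properties of the annihilator matrix used in this proof (it annihilates $X_1$, and is symmetric and idempotent) can be checked numerically; the design matrix below is an arbitrary assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

print(np.allclose(M1 @ X1, 0))   # M annihilates X1, so the X1*beta1 term drops out
print(np.allclose(M1 @ M1, M1))  # idempotent
print(np.allclose(M1, M1.T))     # symmetric
```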

Extensions

Standard errors

The Frisch–Waugh–Lovell theorem extends to the standard errors of the partial and full regressions: in the homoskedastic case, they differ only by a degrees-of-freedom adjustment.[17]
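A sketch of this degrees-of-freedom relationship, assuming a homoskedastic simulated model and treating the partial regression as a single-regressor regression (both are assumptions of the example, not statements from the source):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 120
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # full design, k = 3 columns
k = X.shape[1]
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ beta_full) ** 2)
se_full = np.sqrt(rss / (n - k) * np.linalg.inv(X.T @ X)[2, 2])

# Partial (double residual) regression of y_tilde on x2_tilde: one regressor.
Z = np.column_stack([np.ones(n), x1])
proj = lambda v: Z @ np.linalg.lstsq(Z, v, rcond=None)[0]
x2_tilde, y_tilde = x2 - proj(x2), y - proj(y)
beta_2 = (x2_tilde @ y_tilde) / (x2_tilde @ x2_tilde)
rss_partial = np.sum((y_tilde - beta_2 * x2_tilde) ** 2)   # equals rss
se_partial = np.sqrt(rss_partial / (n - 1) / (x2_tilde @ x2_tilde))

# The two standard errors differ only by the degrees-of-freedom adjustment.
print(np.isclose(se_full, se_partial * np.sqrt((n - 1) / (n - k))))
```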

Notes

References
