Frisch–Waugh–Lovell theorem

Theorem in statistics and econometrics

In econometrics, the Frisch–Waugh–Lovell[a] (FWL) theorem proves a property of ordinary least squares estimators. The theorem states that, in a least squares-estimated regression, each independent variable's coefficient reflects the relationship between the dependent variable and the part of that independent variable which is not linearly explained by the other covariates. By relating multiple regression coefficients to simple regression coefficients, the theorem forms the basis for interpreting coefficients in multiple regressions. The theorem is named for econometricians Ragnar Frisch, Frederick V. Waugh, and Michael C. Lovell.

Background

The Frisch-Waugh-Lovell theorem is a result for regressions estimated by ordinary least squares, the most commonly used estimator in applied econometrics.[1] Ordinary least squares is a method of estimating coefficients in regressions which are linear in parameters: a single dependent variable is modeled as a linear combination of one or more independent variables plus some error term. For example, wages may be modeled as a function of a constant term, education, and parental income, with an error term that encompasses deviations from the model's prediction. Least squares is one way of estimating the coefficients of such a model, setting the coefficients to minimize the sum of squared errors.[2] Under a certain set of assumptions, the hypotheses of the Gauss–Markov theorem, least squares estimation is the best linear unbiased estimator.[3]

Let y be any dependent variable and x_1, …, x_k a set of independent variables, and suppose n observations of each are obtained. If y is modeled as a linear function of the independent variables and a constant, it can be written as y_i = β_0 + β_1x_{i1} + … + β_kx_{ik} + ε_i. The least squares estimator sets the coefficients β_0, …, β_k to minimize the sum of squared errors Σ_i ε_i². With n observations this involves minimizing across n equations, and is typically written in matrix form as y = Xβ + ε, where y and ε are n-vectors of dependent variable observations and errors, respectively, and X is an n-by-(k+1) matrix of independent variables' observations (including the constant). Then, the least squares solution is β̂ = (XᵀX)⁻¹Xᵀy.[4]
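
The least squares solution can be illustrated numerically. The following sketch (not part of the original sources; it uses simulated data, with variable names chosen here for illustration) solves the normal equations directly and checks the result against a library routine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: wage as a function of a constant, education, and parental income
n = 500
education = rng.normal(12, 2, n)
parental_income = rng.normal(50, 10, n)
wage = 5 + 2.0 * education + 0.1 * parental_income + rng.normal(0, 1, n)

# Design matrix X with a constant column; y is the dependent variable
X = np.column_stack([np.ones(n), education, parental_income])
y = wage

# Least squares solution from the normal equations: beta_hat = (X'X)^(-1) X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same answer from a library least squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))  # True
```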

In regressions estimated by least squares, it is common to refer to an independent variable's coefficient as the effect of that variable "holding constant" the other independent variables.[5] For example, if wage is modeled as a function of education and work experience, the coefficient on education is interpreted as the difference in the expectation of wage for a unit difference in education, "holding constant" work experience. Econometrician Arthur Goldberger frames the Frisch-Waugh-Lovell theorem as "giving content to th[is] language".[6]

Definition and interpretation

The Frisch-Waugh-Lovell theorem states that in a least squares-estimated regression of the form

y = β_0 + β_1x_1 + … + β_kx_k + ε

any coefficient β̂_j can be obtained by the two-step process of:

  1. Regress x_j on the set of other independent variables, obtaining residuals x̃_j
  2. Regress y on x̃_j, obtaining β̂_j
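
Because the theorem is a numerical identity, it can be verified on any data set. A minimal sketch in NumPy (simulated data, not from the original sources) checks that the two-step residual regression reproduces the multiple regression coefficient exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)
const = np.ones(n)

# Full regression: y on a constant, x1, and x2
X_full = np.column_stack([const, x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# Step 1: regress x2 on the other regressors (constant and x1); keep residuals
X_other = np.column_stack([const, x1])
g, *_ = np.linalg.lstsq(X_other, x2, rcond=None)
x2_resid = x2 - X_other @ g

# Step 2: simple regression of y on the residuals (no constant needed,
# since the residuals are orthogonal to the constant column)
b2 = (x2_resid @ y) / (x2_resid @ x2_resid)

print(np.allclose(beta_full[2], b2))  # True
```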

This two-step process is referred to as the residual regression or equivalently the regression anatomy theorem.[7][8] This result is a numerical property of least squares estimation and does not depend on statistical properties of the data.[9][10]

From this theorem, each independent variable can be decomposed into two parts: the part which is linearly related to the set of other independent variables, and the residual which remains. Then, that independent variable's coefficient can be found from the simple regression of the dependent variable on those residuals.[11][12] This result is the basis for interpreting the impact of including additional variables in a regression: it is equivalent to removing from the existing variables the part which the new variables linearly explain.[13][14]

The Frisch-Waugh-Lovell theorem can, for example, be applied to interpret multicollinearity. When most of the variation in an independent variable is explained by the other independent variables, very little variation remains after the first step of the residual regression. As a result, the estimate of the independent variable's coefficient may be less precise than if fewer variables were controlled for. This is because least squares, being equivalent to a regression of the dependent variable on the part of each independent variable not linearly related to the other independent variables, implicitly removes the variation in each independent variable that the other independent variables explain.[15][16]
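
This loss of precision under multicollinearity can be seen numerically. The following sketch (simulated data; the helper function is an illustration, not from the cited sources) shows that the stronger the collinearity, the larger the homoskedastic standard error of the coefficient:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)

def se_of_b2(collinearity_noise):
    # x2 is x1 plus a little noise: almost all of its variation is explained by x1
    x2 = x1 + rng.normal(scale=collinearity_noise, size=n)
    y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # homoskedastic covariance matrix
    return np.sqrt(cov[2, 2])                   # standard error of x2's coefficient

# Stronger collinearity (smaller noise) leaves less residual variation in x2,
# so the coefficient estimate is less precise (larger standard error)
print(se_of_b2(0.01) > se_of_b2(1.0))  # True
```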

Example

Consider the regression of wage on education and parental income:

wage = β_0 + β_1·education + β_2·parental income + ε

While the least squares estimates β̂_1 and β̂_2 can be obtained by minimizing the sum of squared errors directly, each can be equivalently obtained by the two-step Frisch-Waugh-Lovell process. In the case of education:

  1. Regress education on parental income, saving the residuals from this regression: the part of education not linearly related to parental income
  2. Regress wages on these residuals, obtaining the least squares estimate β̂_1

This illustrates how β̂_1 reflects the effect of education on wages controlling for parental income: it is the relationship between wages and the part of education not linearly related to parental income.[8]

Double residual regression

The double residual regression is the three-step process:

  1. Regress x_j on the set of other independent variables, obtaining residuals x̃_j
  2. Regress y on the set of independent variables excluding x_j, obtaining residuals ỹ
  3. Regress ỹ on x̃_j, estimating β̂_j and the residuals ε̂

Like the two-step process, this yields a coefficient identical to that of the full regression.[6][17] It has the additional feature that the residuals from the regression in step 3 equal the residuals in the full regression.[11]
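
Both properties of the double residual regression can be checked numerically. A minimal sketch (simulated data, not from the cited sources) verifies that the step-3 coefficient and residuals match those of the full regression:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
y = 2.0 - 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)
const = np.ones(n)

def residuals(A, b):
    """Residuals from the least squares regression of b on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return b - A @ coef

X_other = np.column_stack([const, x1])

# Steps 1 and 2: residualize both x2 and y on the other regressors
x2_tilde = residuals(X_other, x2)
y_tilde = residuals(X_other, y)

# Step 3: simple regression of y_tilde on x2_tilde
b2 = (x2_tilde @ y_tilde) / (x2_tilde @ x2_tilde)
step3_resid = y_tilde - b2 * x2_tilde

# Full regression for comparison
X_full = np.column_stack([const, x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
full_resid = y - X_full @ beta_full

print(np.allclose(beta_full[2], b2))         # True: same coefficient
print(np.allclose(step3_resid, full_resid))  # True: same residuals
```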

Multivariate definition

Consider the regression y = X_1β_1 + X_2β_2 + ε, where y and ε are n-vectors of dependent variable observations and errors, respectively, X_1 is an n × k_1 matrix of independent variables' observations, X_2 is an n × k_2 matrix of independent variables' observations, and β_1 and β_2 are coefficient vectors for X_1 and X_2 respectively. Then, the Frisch-Waugh-Lovell theorem states that

β̂_2 = (X̃_2ᵀX̃_2)⁻¹X̃_2ᵀy = (X̃_2ᵀX̃_2)⁻¹X̃_2ᵀỹ

where X̃_2 = M_1X_2, the residuals from the regression of X_2 on X_1, and ỹ = M_1y, the residuals from the regression of y on X_1, with M_1 = I − X_1(X_1ᵀX_1)⁻¹X_1ᵀ. The first expression for β̂_2 is the residual regression, and the second the double residual regression.[6]

Geometric interpretation

With a linear regression of the form y = Xβ + ε, the fitted values ŷ = Xβ̂ can be interpreted as the orthogonal projection of y onto the column space of X, ŷ = P_X y.[18] The Frisch-Waugh-Lovell theorem is then (in the double residual regression case) the three-step process:

  1. Project X_2 onto the orthogonal complement of the column space of X_1, obtaining the residual matrix X̃_2
  2. Project y onto the orthogonal complement of the column space of X_1, obtaining the residual vector ỹ
  3. Project ỹ onto the column space of X̃_2, obtaining the projection X̃_2β̂_2 and residuals ε̂

The resulting β̂_2 and residuals ε̂ are identical to those in the full regression of y on X_1 and X_2.[19][20]
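
The projection view can also be checked directly. The following sketch (simulated data, not from the cited sources) builds the projection matrix onto the orthogonal complement of the column space of X_1, confirms its projection properties, and verifies the theorem:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # constant and one regressor
X2 = rng.normal(size=(n, 1))
y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([3.0]) + rng.normal(size=n)

# Projection onto col(X1), and onto its orthogonal complement (the annihilator M1)
P1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
M1 = np.eye(n) - P1

# M1 is symmetric, idempotent, and annihilates X1
print(np.allclose(M1, M1.T), np.allclose(M1 @ M1, M1), np.allclose(M1 @ X1, 0))

# FWL: regressing the projected y on the projected X2 reproduces X2's coefficient
X2_tilde, y_tilde = M1 @ X2, M1 @ y
b2 = np.linalg.lstsq(X2_tilde, y_tilde, rcond=None)[0]

X_full = np.column_stack([X1, X2])
beta_full = np.linalg.lstsq(X_full, y, rcond=None)[0]
print(np.allclose(beta_full[-1], b2))  # True
```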

Proof

Consider the linear regression y = X_1β_1 + X_2β_2 + ε and the annihilator matrix M_1 = I − X_1(X_1ᵀX_1)⁻¹X_1ᵀ. Premultiplying both sides of the regression equation by the annihilator matrix removes from y and X_2 the part linearly explained by X_1:

M_1y = M_1X_2β_2 + M_1ε, since M_1X_1 = 0.

Then, by the least squares result, β̂_2 = (X̃_2ᵀX̃_2)⁻¹X̃_2ᵀỹ and ε̂ = ỹ − X̃_2β̂_2, where X̃_2 = M_1X_2 and ỹ = M_1y. This concludes the proof.[6][21]

History

Yule (1907)

In 1907, statistician Udny Yule introduced a new system of notation for, and derived a number of algebraic results about, least squares-estimated regression coefficients. Among his results was an early form of the Frisch-Waugh-Lovell theorem.[22][23] In Yule's notation, where x_{1.34…n} represents the residuals from the regression of x_1 on x_3 through x_n, and x_{2.34…n} the residuals from the regression of x_2 on x_3 through x_n, he finds that the regression of x_{1.34…n} on x_{2.34…n} yields the coefficient for x_2 in the full regression of x_1 on x_2 through x_n. He notes that this relationship holds "quite generally and without reference to the form of the [variables'] frequency distribution."[24] With this result, Yule defines the multiple regression coefficient b_{12.34…n} – the coefficient on x_2 in the regression of x_1 on x_2 through x_n – as the simple regression of x_{1.34…n} on x_{2.34…n}:

b_{12.34…n} = Σ x_{1.34…n} x_{2.34…n} / Σ (x_{2.34…n})²

Having related simple regression coefficients to multiple regression coefficients, Yule describes his result as filling a gap in the interpretation of least squares coefficients and partial correlations by showing that they reflect "an actual correlation between determinate variables."[24][25]

Using Yule's notation, in a 1968 text econometrician Arthur Goldberger states the residual regression form of the Frisch-Waugh-Lovell theorem as the regression of x_1 on x_{2.3}, and the double residual regression form as the regression of x_{1.3} on x_{2.3}.[26]

Frisch and Waugh (1933)

In the early 20th century, there was debate among economists over the correct approach to adjusting time series data used in regressions for the influence of trends. The two primary methods in question were the direct de-trending of each time series and the inclusion of a time trend in the regression. In 1933 and using the notation introduced by Yule, a paper in the first volume of Econometrica by econometricians Ragnar Frisch and Frederick V. Waugh proved the equivalence between the two methods.[27][28][29]

Prior to Frisch and Waugh's result, much of the debate around the optimal time trend adjustment concerned estimates of static demand equations whose observations had been taken over time, a factor which economists sought to adjust for in order to bring the statistical model closer in line with theory. Advocates of including time trends in regressions argued that doing so improved the model's fit, while opponents argued that time trends might violate ceteris paribus assumptions of the underlying theoretical model.[30] In proving the equivalence of the two methods, addressing the difference in model fit, and formalizing the distinction between estimated coefficients and theoretical models, Frisch and Waugh's paper resolved the debate around trend adjustments.[31]

Economist and historian Mary S. Morgan contextualizes Frisch and Waugh's result, as it pertains to a greater understanding of regression coefficients, as having "paved the way for a more generous use of the other factors in the demand equation."[32] Frisch and Waugh's results were, in 1952, extended by Gerhard Tintner to polynomial trend adjustment.[33][34] In a 1953 textbook on demand analysis, econometrician Herman Wold references Frisch and Waugh's paper as a special case applied to time adjustments.[35]

Generalization and later development

In 1963, econometrician Michael C. Lovell extended Frisch and Waugh's results and provided a general proof of the theorem in matrix notation.[22][19][36] Rather than focusing on certain types of variables, as Frisch and Waugh did with time trends, Lovell proved the result for arbitrary sets of independent variables.[37] Lovell presents seven regression specifications and proves how their coefficients relate, among them both the residual and double residual forms of the theorem.[38] Lovell published an additional proof in 2008 using only simple algebra.[37]

In 1964, economist Richard Stone published a generalized proof of the theorem.[39]

The Frisch-Waugh-Lovell theorem is included in most intermediate to advanced econometrics textbooks.[19]

Naming

The theorem has been referred to under a number of names, including the Frisch-Waugh-Lovell theorem, Frisch-Waugh theorem, partitioned regression theorem, residual regression, and the regression anatomy theorem.[19][15]

While Frisch and Waugh's paper was not the first introduction of the result, it was the first proof in econometrics.[25][19] Recognizing their proof and the generalization by Lovell, the theorem was presented as the Frisch-Waugh-Lovell theorem in a 1993 econometrics textbook by Russell Davidson and James G. MacKinnon.[19]

Extensions

Where the Frisch-Waugh-Lovell theorem states that the full and residual regressions have the same coefficients, relationships between the coefficients' standard errors can also be shown. Lovell's 1963 paper finds that the homoskedastic standard errors of coefficients in the double residual regression differ from those of the full regression by a degrees of freedom adjustment.[40] In 2021, statistician Peng Ding presented a proof of Lovell's results and found comparable results for other estimates of standard errors, including heteroskedasticity-consistent and clustered standard errors.[41]

Analogues to the Frisch-Waugh-Lovell theorem have been shown for a number of other estimators, including generalized least squares,[42] ridge regression and the LASSO,[43] and k-class estimators, including limited information maximum likelihood.[44]
