Linear predictor function
From Wikipedia, the free encyclopedia
In statistics and in machine learning, a linear predictor function is a linear function (linear combination) of a set of coefficients and explanatory variables (independent variables), whose value is used to predict the outcome of a dependent variable.[1] This sort of function usually arises in linear regression, where the coefficients are called regression coefficients. However, such functions also occur in various types of linear classifiers (e.g. logistic regression,[2] perceptrons,[3] support vector machines,[4] and linear discriminant analysis[5]), as well as in various other models, such as principal component analysis[6] and factor analysis. In many of these models, the coefficients are referred to as "weights".
Notations
The basic form of a linear predictor function $f(i)$ for data point i (consisting of p explanatory variables), for i = 1, ..., n, is
$$f(i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},$$
where $x_{ik}$, for k = 1, ..., p, is the value of the k-th explanatory variable for data point i, and $\beta_0, \ldots, \beta_p$ are the coefficients (regression coefficients, weights, etc.) indicating the relative effect of a particular explanatory variable on the outcome.
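As a minimal sketch of the basic form, the sum can be evaluated directly for one data point. The function name `linear_predictor` and the numeric values are illustrative assumptions, not part of any particular library.

```python
def linear_predictor(beta, x):
    """Return beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1].

    beta: sequence of p + 1 coefficients, intercept first.
    x: sequence of the p explanatory values for one data point.
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta must have one more entry than x (the intercept)")
    intercept, weights = beta[0], beta[1:]
    return intercept + sum(w * v for w, v in zip(weights, x))


# Illustrative values only: intercept 1.0 and two explanatory variables.
beta = [1.0, 2.0, -0.5]
x_i = [3.0, 4.0]
value = linear_predictor(beta, x_i)  # 1.0 + 2.0*3.0 + (-0.5)*4.0 = 5.0
```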
It is common to write the predictor function in a more compact form as follows:
- The coefficients $\beta_0, \beta_1, \ldots, \beta_p$ are grouped into a single vector $\boldsymbol{\beta}$ of size p + 1.
- For each data point i, an additional explanatory pseudo-variable $x_{i0}$ is added, with a fixed value of 1, corresponding to the intercept coefficient $\beta_0$.
- The resulting explanatory variables $x_{i0} (= 1), x_{i1}, \ldots, x_{ip}$ are then grouped into a single vector $\mathbf{x}_i$ of size p + 1.
Vector notation
This makes it possible to write the linear predictor function as follows:
$$f(i) = \boldsymbol{\beta} \cdot \mathbf{x}_i,$$
using the notation for a dot product between two vectors.
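The compact form can be sketched in a few lines: prepend the constant 1 to the data point, then take a dot product with the coefficient vector. The helper names `augment` and `dot` and the numbers are hypothetical, chosen only for illustration.

```python
def augment(x):
    """Prepend the constant pseudo-variable x_i0 = 1 to a data point."""
    return [1.0] + list(x)


def dot(u, v):
    """Dot product of two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


beta = [1.0, 2.0, -0.5]      # (beta_0, beta_1, beta_2), illustrative values
x_i = augment([3.0, 4.0])    # (1, x_i1, x_i2)
value = dot(beta, x_i)       # 1.0*1.0 + 2.0*3.0 + (-0.5)*4.0 = 5.0
```

Folding the intercept into the vector means the same dot product handles every coefficient, so no separate intercept term is needed.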
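Matrix notation
An equivalent form using matrix notation is as follows:
$$f(i) = \boldsymbol{\beta}^{\mathsf T} \mathbf{x}_i = \mathbf{x}_i^{\mathsf T} \boldsymbol{\beta},$$
where $\boldsymbol{\beta}$ and $\mathbf{x}_i$ are assumed to be (p+1)-by-1 column vectors, $\boldsymbol{\beta}^{\mathsf T}$ is the matrix transpose of $\boldsymbol{\beta}$ (so $\boldsymbol{\beta}^{\mathsf T}$ is a 1-by-(p+1) row vector), and $\boldsymbol{\beta}^{\mathsf T} \mathbf{x}_i$ indicates matrix multiplication between the 1-by-(p+1) row vector and the (p+1)-by-1 column vector, producing a 1-by-1 matrix that is taken to be a scalar.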
Linear regression
An example of the usage of a linear predictor function is in linear regression, where each data point is associated with a continuous outcome $y_i$, and the relationship is written
$$y_i = f(i) + \varepsilon_i = \boldsymbol{\beta}^{\mathsf T} \mathbf{x}_i + \varepsilon_i,$$
where $\varepsilon_i$ is a disturbance term or error variable: an unobserved random variable that adds noise to the linear relationship between the dependent variable and the predictor function.
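To make the role of the disturbance term concrete, the following sketch simulates one outcome as the linear predictor plus random noise. Drawing the noise from a zero-mean normal distribution is an assumption made here for illustration, and `simulate_outcome`, the seed, and the coefficient values are hypothetical.

```python
import random


def simulate_outcome(beta, x, noise_sd, rng):
    """Draw y_i = beta_0 + sum_k beta_k * x_ik + eps_i, with eps_i ~ N(0, noise_sd^2)."""
    predictor = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    eps = rng.gauss(0.0, noise_sd)  # stands in for the unobserved disturbance term
    return predictor + eps


rng = random.Random(0)       # fixed seed so the sketch is repeatable
beta = [1.0, 2.0, -0.5]      # illustrative coefficients
x_i = [3.0, 4.0]
y_noisy = simulate_outcome(beta, x_i, noise_sd=1.0, rng=rng)
y_exact = simulate_outcome(beta, x_i, noise_sd=0.0, rng=rng)  # no noise: the predictor itself
```

With `noise_sd=0.0` the outcome coincides with the linear predictor; any positive value scatters the outcome around it.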
Stacking
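In some models (standard linear regression, in particular), the equations for each of the data points i = 1, ..., n are stacked together and written in vector form as
$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where
$$\mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} \mathbf{x}_1^{\mathsf T} \\ \mathbf{x}_2^{\mathsf T} \\ \vdots \\ \mathbf{x}_n^{\mathsf T} \end{pmatrix} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{pmatrix}, \quad \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \quad \boldsymbol{\varepsilon} = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.$$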
The matrix X is known as the design matrix and encodes all known information about the independent variables. The variables $\varepsilon_i$ are random variables, which in standard linear regression are assumed to be independent and normally distributed with mean zero and a common variance; they express the influence of any unknown factors on the outcome.
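A small sketch of the stacking step: each data point becomes one row of the design matrix, prefixed with the constant 1, and a single matrix-vector product yields the predictor value for every point. The names `design_matrix` and `mat_vec` and the data are assumptions made for illustration.

```python
def design_matrix(points):
    """Stack data points as rows, each prefixed with the constant 1."""
    return [[1.0] + list(x) for x in points]


def mat_vec(X, beta):
    """Multiply an n-by-(p+1) matrix (a list of rows) by a length-(p+1) vector."""
    if any(len(row) != len(beta) for row in X):
        raise ValueError("every row of X must have the same length as beta")
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


points = [[3.0, 4.0], [0.0, 2.0], [1.0, 1.0]]  # n = 3 points, p = 2 variables
beta = [1.0, 2.0, -0.5]                        # illustrative coefficients
X = design_matrix(points)                      # 3-by-3 design matrix
predictions = mat_vec(X, beta)                 # one predictor value per row
```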
This makes it possible to find optimal coefficients through the method of least squares using simple matrix operations. In particular, the optimal coefficients as estimated by least squares can be written as follows:
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\mathsf T} \mathbf{X})^{-1} \mathbf{X}^{\mathsf T} \mathbf{y}.$$
The matrix $(\mathbf{X}^{\mathsf T} \mathbf{X})^{-1} \mathbf{X}^{\mathsf T}$ is known as the Moore–Penrose pseudoinverse of X. The use of the matrix inverse in this formula requires that X is of full column rank, i.e. there is no perfect multicollinearity among the explanatory variables (no explanatory variable can be perfectly predicted from the others). When X is not of full rank, this inverse does not exist, but the pseudoinverse can still be computed using the singular value decomposition.
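The least-squares estimate can be sketched without any numerical library. Rather than forming the inverse explicitly, this example solves the normal equations (XᵀX)β = Xᵀy by Gaussian elimination, which gives the same estimate when X has full column rank. All function names, the singularity tolerance, and the data are illustrative assumptions. The rank-deficient case, which would need the singular value decomposition, is only detected here, not handled.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Multiply two matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:  # crude absolute tolerance (assumption)
            raise ValueError("matrix is singular: X lacks full column rank")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def least_squares(X, y):
    """Return the coefficients solving the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = mat_mul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Noise-free illustrative data generated from y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta_hat = least_squares(X, y)  # approximately [1.0, 2.0]
```

Because the data contain no noise, the estimate recovers the generating coefficients up to floating-point rounding. If a column of X duplicates or rescales another, `solve` raises an error instead of returning an estimate.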