Laplace's approximation

Analytical expression in statistics From Wikipedia, the free encyclopedia

Laplace's approximation or the quadratic approximation (QUAP)[1] provides an analytical expression for a posterior probability distribution by fitting a Gaussian distribution with a mean equal to the MAP solution and precision equal to the observed Fisher information.[2][3] The approximation is justified by the Bernstein–von Mises theorem, which states that, under regularity conditions, the error of the approximation tends to 0 as the number of data points tends to infinity.[4][5]

For example, consider a regression or classification model with data set comprising inputs and outputs with (unknown) parameter vector of length . The likelihood is denoted and the parameter prior . Suppose one wants to approximate the joint density of outputs and parameters . Bayes' formula reads:

The joint is equal to the product of the likelihood and the prior and by Bayes' rule, equal to the product of the marginal likelihood and posterior . Seen as a function of the joint is an un-normalised density.

In Laplace's approximation, we approximate the joint by an un-normalised Gaussian , where we use to denote approximate density, for un-normalised density and the normalisation constant of (independent of ). Since the marginal likelihood doesn't depend on the parameter and the posterior normalises over we can immediately identify them with and of our approximation, respectively.

Laplace's approximation is

where we have defined

where is the location of a mode of the joint target density, also known as the maximum a posteriori or MAP point and is the positive definite matrix of second derivatives of the negative log joint target density at the mode . Thus, the Gaussian approximation matches the value and the log-curvature of the un-normalised target density at the mode. The value of is usually found using a gradient based method.

In summary, we have

for the approximate posterior over and the approximate log marginal likelihood respectively.

The main weaknesses of Laplace's approximation are that it is symmetric around the mode and that it is very local: the entire approximation is derived from properties at a single point of the target density. Laplace's method is widely used and was pioneered in the context of neural networks by David MacKay,[6] and for Gaussian processes by Williams and Barber.[7]

Integrated nested Laplace approximation

Integrated nested Laplace approximation (INLA) is a method for approximate Bayesian inference based on Laplace's approximation.[8] It is designed for a class of models called latent Gaussian models (LGMs), for which it can be a fast and accurate alternative for Markov chain Monte Carlo methods to compute posterior marginal distributions.[9][10][11] Due to its relative speed even with large data sets for certain problems and models, INLA has been a popular inference method in applied statistics, in particular spatial statistics, ecology, seismology, and epidemiology.[12][13][14] It is also possible to combine INLA with a finite element method solution of a stochastic partial differential equation to study e.g. spatial point processes and species distribution models.[15][16] The INLA method is implemented in the R-INLA R package.[17]

Latent Gaussian models

Let denote the response variable (that is, the observations) which belongs to an exponential family, with the mean (of ) being linked to a linear predictor via an appropriate link function. The linear predictor can take the form of a (Bayesian) additive model. All latent effects (the linear predictor, the intercept, coefficients of possible covariates, and so on) are collectively denoted by the vector . The hyperparameters of the model are denoted by . As per Bayesian statistics, and are random variables with prior distributions.

The observations are assumed to be conditionally independent given and : where is the set of indices for observed elements of (some elements may be unobserved, and for these INLA computes a posterior predictive distribution). Note that the linear predictor is part of .

For the model to be a latent Gaussian model, it is assumed that is a Gaussian Markov Random Field (GMRF)[8] (that is, a multivariate Gaussian with additional conditional independence properties) with probability density where is a -dependent sparse precision matrix and is its determinant. The precision matrix is sparse due to the GMRF assumption. The prior distribution for the hyperparameters need not be Gaussian. However, the number of hyperparameters, , is assumed to be small (say, less than 15).

Approximate Bayesian inference with INLA

In Bayesian inference, one wants to solve for the posterior distribution of the latent variables and . Applying Bayes' theorem the joint posterior distribution of and is given by Obtaining the exact posterior is generally a very difficult problem. In INLA, the main aim is to approximate the posterior marginals where .

A key idea of INLA is to construct nested approximations given by where is an approximated posterior density. The approximation to the marginal density is obtained in a nested fashion by first approximating and , and then numerically integrating out as where the summation is over the values of , with integration weights given by . The approximation of is computed by numerically integrating out from .

To get the approximate distribution , one can use the relation as the starting point. Then is obtained at a specific value of the hyperparameters with Laplace's approximation[8] where is the Gaussian approximation to whose mode at a given is . The mode can be found numerically for example with the Newton-Raphson method.

The trick in the Laplace approximation above is the fact that the Gaussian approximation is applied on the full conditional of in the denominator since it is usually close to a Gaussian due to the GMRF property of . Applying the approximation here improves the accuracy of the method, since the posterior itself need not be close to a Gaussian, and so the Gaussian approximation is not directly applied on . The second important property of a GMRF, the sparsity of the precision matrix , is required for efficient computation of for each value .[8]

Obtaining the approximate distribution is more involved, and the INLA method provides three options for this: Gaussian approximation, Laplace approximation, or the simplified Laplace approximation.[8] For the numerical integration to obtain , also three options are available: grid search, central composite design, or empirical Bayes.[8]

References

Further reading

Related Articles

Wikiwand AI