Neyman Type A distribution
From Wikipedia, the free encyclopedia
[Figure: probability mass function. The horizontal axis is the index x, the number of occurrences; the vertical axis is the probability that the random variable takes the value x.]

[Figure: cumulative distribution function. The horizontal axis is the index x, the number of occurrences; the vertical axis is the cumulative sum of the probabilities from 0 to x.]

| Notation | $Y \sim \mathrm{NA}(\lambda, \phi)$ |
|---|---|
| Parameters | $\lambda > 0$, $\phi > 0$ |
| Support | $x \in \{0, 1, 2, \ldots\}$ |
| PMF | $\dfrac{e^{-\lambda + \lambda e^{-\phi}} \phi^x}{x!} \displaystyle\sum_{j=0}^{x} S(x, j)\, \lambda^j e^{-j\phi}$ |
| CDF | $\displaystyle\sum_{k=0}^{x} \Pr(Y = k)$ |
| Mean | $\lambda\phi$ |
| Variance | $\lambda\phi(1 + \phi)$ |
| Skewness | $\dfrac{1 + 3\phi + \phi^2}{\sqrt{\lambda\phi}\,(1 + \phi)^{3/2}}$ |
| Excess kurtosis | $\dfrac{1 + 7\phi + 6\phi^2 + \phi^3}{\lambda\phi(1 + \phi)^2}$ |
| MGF | $\exp\left\{\lambda\left(e^{\phi(e^t - 1)} - 1\right)\right\}$ |
| CF | $\exp\left\{\lambda\left(e^{\phi(e^{it} - 1)} - 1\right)\right\}$ |
| PGF | $\exp\left\{\lambda\left(e^{\phi(z - 1)} - 1\right)\right\}$ |
In statistics and probability, the Neyman Type A distribution is a discrete probability distribution from the family of compound Poisson distributions. To understand it easily, consider the following example, explained in Univariate Discrete Distributions:[1] a statistical model of the distribution of larvae in a unit area of field (in a unit of habitat) assumes that the variation in the number of clusters of eggs per unit area (per unit of habitat) can be represented by a Poisson distribution with parameter $\lambda$, while the numbers of larvae developing per cluster of eggs are assumed to be independent Poisson variables, all with the same parameter $\phi$. If we want to know how many larvae there are, we define a random variable $Y$ as the sum of the number of larvae hatched in each cluster (given $j$ clusters): $Y = X_1 + X_2 + \cdots + X_j$, where $X_1, \ldots, X_j$ are independent Poisson variables with parameter $\phi$ and $j$ itself follows a Poisson distribution with parameter $\lambda$.
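The larvae model above can be simulated directly: draw the number of clusters from a Poisson with mean $\lambda$, then add an independent Poisson($\phi$) count for each cluster. A minimal sketch in pure Python (the parameter values are illustrative, not from the article):

```python
import math
import random

def poisson_sample(mean, rng):
    # Knuth's method: multiply uniforms until the product drops below e^(-mean)
    limit = math.exp(-mean)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def neyman_type_a_sample(lam, phi, rng):
    # number of egg clusters ~ Poisson(lam); larvae per cluster ~ Poisson(phi)
    clusters = poisson_sample(lam, rng)
    return sum(poisson_sample(phi, rng) for _ in range(clusters))

rng = random.Random(42)
samples = [neyman_type_a_sample(2.0, 1.5, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)                          # close to lam*phi = 3.0
var = sum((s - mean) ** 2 for s in samples) / len(samples)  # close to lam*phi*(1+phi) = 7.5
```

The sample mean and variance approach $\lambda\phi$ and $\lambda\phi(1+\phi)$, the moments given later in the article.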
Jerzy Neyman (born in the Russian Empire on April 16, 1894) was a Polish statistician who spent the first part of his career in Europe. In 1939 he developed the Neyman Type A distribution[1] to describe the distribution of larvae in experimental field plots. It is used above all to describe populations subject to contagion, e.g., in entomology (Beall [1940],[2] Evans [1953][3]), accidents (Cresswell and Froggatt [1963]),[4] and bacteriology.
The original derivation of this distribution was based on a biological model and, presumably, it was expected that a good fit to the data would justify the hypothesized model. However, it is now known that this distribution can be derived from several different models (William Feller [1943]),[5] and in view of this, the Neyman Type A arises as a compound Poisson distribution. This interpretation makes it suitable for modelling heterogeneous populations and renders it an example of apparent contagion.
Despite this, working with the Neyman Type A is difficult because its expressions for probabilities are highly complex. Even parameter estimation by efficient methods, such as maximum likelihood, leads to tedious equations that are not easy to handle.
Definition
Probability generating function
Consider a branching process in which a first stage produces $N$ individuals according to a distribution with probability generating function (pgf) $G_1(z)$, and each of them independently produces a random number of offspring $X_1, X_2, \ldots$, all distributed like a variable $X$ with pgf $G_2(z)$. The total number of individuals is then the random variable[1]

$S_N = X_1 + X_2 + \cdots + X_N.$

The pgf of the distribution of $S_N$ is:

$G_{S_N}(z) = \operatorname{E}\left[z^{S_N}\right] = G_1(G_2(z)).$

One particularly helpful notation uses a symbolic representation to refer to an $F_1$ distribution that has been generalized by an $F_2$ distribution:

$F_1 \bigvee F_2.$

In this instance, it is written as

$\mathrm{Poisson}(\lambda) \bigvee \mathrm{Poisson}(\phi).$

Finally, since $G_1(z) = e^{\lambda(z-1)}$ and $G_2(z) = e^{\phi(z-1)}$, the probability generating function is

$G(z) = G_1(G_2(z)) = \exp\left\{\lambda\left(e^{\phi(z-1)} - 1\right)\right\}.$

From the probability generating function we can calculate the probability mass function, as explained below.
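The composition of the two Poisson pgfs can be checked numerically. The sketch below (with illustrative parameter values) builds the Neyman Type A pgf as $G_1(G_2(z))$; evaluating it at $z = 0$ yields the probability of observing zero individuals:

```python
import math

def poisson_pgf(z, mean):
    # pgf of a Poisson distribution: E[z^X] = exp(mean*(z - 1))
    return math.exp(mean * (z - 1.0))

def neyman_pgf(z, lam, phi):
    # Neyman Type A pgf as the composition G1(G2(z))
    return poisson_pgf(poisson_pgf(z, phi), lam)

# G(0) = P(Y = 0) = exp(-lam*(1 - exp(-phi)))
lam, phi = 2.0, 1.5
p0 = neyman_pgf(0.0, lam, phi)
```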
Probability mass function
Let $X_1, X_2, \ldots, X_j$ be independent Poisson variables with parameter $\phi$, where the number of terms $j$ is itself Poisson with parameter $\lambda$. The probability distribution of the random variable $Y = X_1 + X_2 + \cdots + X_j$ is the Neyman Type A distribution with parameters $\lambda$ and $\phi$:

$p_x = \Pr(Y = x) = \frac{e^{-\lambda + \lambda e^{-\phi}}\, \phi^x}{x!} \sum_{j=0}^{x} S(x, j)\, \lambda^j e^{-j\phi}.$

Alternatively,

$p_x = \frac{e^{-\lambda}\, \phi^x}{x!} \sum_{j=0}^{\infty} \frac{\left(\lambda e^{-\phi}\right)^j j^x}{j!}.$

To see how the first expression is obtained from the second, bear in mind that the probability mass function is calculated from the probability generating function, and the infinite series is then collapsed using a property of the Stirling numbers of the second kind, stated below.
Another way to compute the probabilities is with the recurrence relation[6]

$p_{x+1} = \frac{\lambda \phi e^{-\phi}}{x + 1} \sum_{j=0}^{x} \frac{\phi^j}{j!}\, p_{x-j}, \qquad p_0 = e^{-\lambda\left(1 - e^{-\phi}\right)}.$

Although the work per step grows with $x$, this recurrence relation is only employed for numerical computation and is particularly useful for computer applications.

where

- $x = 0, 1, 2, \ldots$, except for the recurrence relation, where $x = 1, 2, 3, \ldots$
- $\lambda > 0$, $\phi > 0$.
- $x!$ and $j!$ are the factorials of $x$ and $j$, respectively.
- $S(x, j)$ denotes the Stirling numbers of the second kind; the property used above is:[7]

  $\sum_{j=0}^{\infty} \frac{a^j j^x}{j!} = e^{a} \sum_{k=0}^{x} S(x, k)\, a^k.$
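The recurrence lends itself to a short implementation. The sketch below (parameter values are illustrative) also cross-checks the recurrence against a truncation of the infinite-series form of the pmf:

```python
import math

def neyman_pmf(lam, phi, x_max):
    # recurrence: p_0 = exp(-lam*(1 - exp(-phi)))
    # (x+1)*p_{x+1} = lam*phi*exp(-phi) * sum_{j=0}^{x} (phi^j / j!) * p_{x-j}
    p = [math.exp(-lam * (1.0 - math.exp(-phi)))]
    for x in range(x_max):
        s = sum(phi**j / math.factorial(j) * p[x - j] for j in range(x + 1))
        p.append(lam * phi * math.exp(-phi) / (x + 1) * s)
    return p

def neyman_pmf_series(lam, phi, x, terms=100):
    # direct evaluation of the infinite-series form, truncated at `terms`
    a = lam * math.exp(-phi)
    tail = sum(a**j * j**x / math.factorial(j) for j in range(terms))
    return math.exp(-lam) * phi**x / math.factorial(x) * tail

lam, phi = 2.0, 1.5
p = neyman_pmf(lam, phi, 10)
# the two forms agree term by term
check = all(abs(p[x] - neyman_pmf_series(lam, phi, x)) < 1e-12 for x in range(11))
```

Over a long enough range the probabilities from the recurrence sum to (nearly) 1, confirming that it reproduces a proper distribution.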
Notation
When a random variable $Y$ follows a Neyman Type A distribution with parameters $\lambda$ and $\phi$, we write

$Y \sim \mathrm{NA}(\lambda, \phi).$
Properties
Moment and cumulant generating functions
The moment generating function of a random variable $X$ is defined as the expected value of $e^{tX}$, as a function of the real parameter $t$. For an $X \sim \mathrm{NA}(\lambda, \phi)$, the moment generating function exists and is equal to

$M_X(t) = \operatorname{E}\left[e^{tX}\right] = \exp\left\{\lambda\left(e^{\phi(e^t - 1)} - 1\right)\right\}.$

The cumulant generating function is the logarithm of the moment generating function and is equal to[1]

$K_X(t) = \log M_X(t) = \lambda\left(e^{\phi(e^t - 1)} - 1\right).$
In the following table we can see the moments and cumulants of orders 1 to 4 (for order 1 the moment shown is the mean; for orders 2 to 4 the central moments are shown):

| Order | Moment | Cumulant |
|---|---|---|
| 1 | $\lambda\phi$ | $\lambda\phi$ |
| 2 | $\lambda\phi(1 + \phi)$ | $\lambda\phi(1 + \phi)$ |
| 3 | $\lambda\phi(1 + 3\phi + \phi^2)$ | $\lambda\phi(1 + 3\phi + \phi^2)$ |
| 4 | $\lambda\phi(1 + 7\phi + 6\phi^2 + \phi^3) + 3\lambda^2\phi^2(1 + \phi)^2$ | $\lambda\phi(1 + 7\phi + 6\phi^2 + \phi^3)$ |
Skewness
The skewness is the third central moment divided by the variance raised to the power 3/2 (equivalently, the cube of the standard deviation), and for this distribution it is

$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{1 + 3\phi + \phi^2}{\sqrt{\lambda\phi}\,(1 + \phi)^{3/2}}.$
Kurtosis
The kurtosis is the fourth central moment divided by the square of the variance, and for this distribution it is

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{1 + 7\phi + 6\phi^2 + \phi^3}{\lambda\phi(1 + \phi)^2} + 3.$

The excess kurtosis is just a correction that makes the kurtosis of the normal distribution equal to zero; it is

$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 = \frac{1 + 7\phi + 6\phi^2 + \phi^3}{\lambda\phi(1 + \phi)^2}.$

- Always $\gamma_2 > 0$: the distribution has a high acute peak around the mean and fatter tails than the normal distribution.
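As a sanity check, these closed forms can be compared with moments computed directly from the pmf via its recurrence relation (the parameter values below are illustrative):

```python
import math

def neyman_pmf(lam, phi, x_max):
    # pmf of the Neyman Type A distribution via the recurrence relation
    p = [math.exp(-lam * (1.0 - math.exp(-phi)))]
    for x in range(x_max):
        s = sum(phi**j / math.factorial(j) * p[x - j] for j in range(x + 1))
        p.append(lam * phi * math.exp(-phi) / (x + 1) * s)
    return p

lam, phi = 2.0, 1.5
p = neyman_pmf(lam, phi, 80)   # x = 80 is far enough into the tail here

mean = sum(x * px for x, px in enumerate(p))
central = lambda k: sum((x - mean) ** k * px for x, px in enumerate(p))
var = central(2)
skew = central(3) / var**1.5
excess = central(4) / var**2 - 3.0

# closed forms from the text
skew_formula = (1 + 3 * phi + phi**2) / (math.sqrt(lam * phi) * (1 + phi) ** 1.5)
excess_formula = (1 + 7 * phi + 6 * phi**2 + phi**3) / (lam * phi * (1 + phi) ** 2)
```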
Characteristic function
For a discrete distribution, the characteristic function of any real-valued random variable is defined as the expected value of $e^{itX}$, where $i$ is the imaginary unit and $t \in \mathbb{R}$:

$\varphi_X(t) = \operatorname{E}\left[e^{itX}\right].$

This function is related to the moment generating function via $\varphi_X(t) = M_X(it)$. Hence for this distribution the characteristic function is

$\varphi_X(t) = \exp\left\{\lambda\left(e^{\phi(e^{it} - 1)} - 1\right)\right\}.$

- Note that the symbol $\varphi_X$ is used to represent the characteristic function and should not be confused with the parameter $\phi$.
Cumulative distribution function
The cumulative distribution function is

$F(x) = \Pr(Y \le x) = e^{-\lambda + \lambda e^{-\phi}} \sum_{k=0}^{\lfloor x \rfloor} \frac{\phi^k}{k!} \sum_{j=0}^{k} S(k, j)\, \lambda^j e^{-j\phi},$

i.e., the partial sum of the probability mass function; no simpler closed form is available.
Other properties
- The index of dispersion is a normalized measure of the dispersion of a probability distribution. It is defined as the ratio of the variance $\sigma^2$ to the mean $\mu$:[8]

  $d = \frac{\sigma^2}{\mu} = \frac{\lambda\phi(1 + \phi)}{\lambda\phi} = 1 + \phi.$

- Consider a sample of size $N$ in which each random variable $Y_i$ comes from a $\mathrm{NA}(\lambda, \phi)$, with $Y_1, Y_2, \ldots, Y_N$ independent. The maximum likelihood estimator of the population mean $\mu = \lambda\phi$ is the sample mean:[9]

  $\hat{\mu} = \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i.$

- Combining the two earlier expressions, we are able to parametrize the distribution using $\mu$ and $d$:

  $\lambda = \frac{\mu}{d - 1}, \qquad \phi = d - 1.$
Parameter estimation
Method of moments
The mean and the variance of the $\mathrm{NA}(\lambda, \phi)$ are $\mu = \lambda\phi$ and $\sigma^2 = \lambda\phi(1 + \phi)$, respectively. Equating them to their sample counterparts gives these two equations:[10]

$\bar{x} = \hat{\lambda}\hat{\phi}, \qquad s^2 = \hat{\lambda}\hat{\phi}\left(1 + \hat{\phi}\right),$

where $\bar{x}$ and $s^2$ are the sample mean and the sample variance, respectively. Solving these two equations we get the moment estimators $\hat{\lambda}$ and $\hat{\phi}$ of $\lambda$ and $\phi$:

$\hat{\phi} = \frac{s^2}{\bar{x}} - 1, \qquad \hat{\lambda} = \frac{\bar{x}^2}{s^2 - \bar{x}}.$

(These estimators require $s^2 > \bar{x}$, i.e., an overdispersed sample.)
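Solving the two equations in code is immediate. A minimal sketch, where the count data are hypothetical and chosen only to be overdispersed:

```python
def neyman_moment_estimates(data):
    # method of moments: mean = lam*phi, variance = lam*phi*(1 + phi)
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    phi_hat = var / mean - 1.0     # from s^2 / xbar = 1 + phi
    lam_hat = mean / phi_hat       # from xbar = lam*phi
    return lam_hat, phi_hat

# hypothetical overdispersed counts (illustrative, not from the article)
counts = [0, 0, 1, 2, 0, 3, 0, 0, 5, 1, 0, 4, 2, 0, 0, 6, 1, 0, 2, 3]
lam_hat, phi_hat = neyman_moment_estimates(counts)   # requires var > mean
```

By construction the estimators reproduce the sample mean exactly: $\hat{\lambda}\hat{\phi} = \bar{x}$.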
Maximum likelihood
Calculating the maximum likelihood estimators of $\lambda$ and $\phi$ involves multiplying all the probabilities in the probability mass function to obtain the likelihood

$L(\lambda, \phi) = \prod_{i=1}^{N} \Pr(Y = y_i).$

When we apply the parametrization adjustment defined in "Other properties", $\lambda = \mu/(d - 1)$ and $\phi = d - 1$, and estimate $\mu$ by the sample mean $\bar{y}$, the maximum likelihood estimation reduces to a single parameter: only $d$ (equivalently, $\phi$) has to be found numerically.

- To estimate the probabilities, we will use the recurrence relation for the p.m.f., so that the calculation is less complex.
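Combining the recurrence for the pmf with the reduction to one parameter, a crude profile-likelihood fit can be sketched as follows. The data and the grid are illustrative; a real fit would use a proper one-dimensional optimizer instead of a grid search:

```python
import math

def neyman_pmf(lam, phi, x_max):
    # pmf of the Neyman Type A distribution via the recurrence relation
    p = [math.exp(-lam * (1.0 - math.exp(-phi)))]
    for x in range(x_max):
        s = sum(phi**j / math.factorial(j) * p[x - j] for j in range(x + 1))
        p.append(lam * phi * math.exp(-phi) / (x + 1) * s)
    return p

def profile_loglik(phi, data, mean):
    # fix lam*phi at the sample mean, leaving phi as the only free parameter
    lam = mean / phi
    p = neyman_pmf(lam, phi, max(data))
    return sum(math.log(p[x]) for x in data)

# hypothetical overdispersed counts (illustrative, not from the article)
counts = [0, 0, 1, 2, 0, 3, 0, 0, 5, 1, 0, 4, 2, 0, 0, 6, 1, 0, 2, 3]
mean = sum(counts) / len(counts)

# coarse grid search over phi in (0, 10)
grid = [0.05 * k for k in range(1, 200)]
phi_hat = max(grid, key=lambda f: profile_loglik(f, counts, mean))
lam_hat = mean / phi_hat
```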
Testing Poisson assumption
When the $\mathrm{NA}(\lambda, \phi)$ is used to model a data sample, it is important to check whether the simpler Poisson distribution would already fit the data well. For this, the following hypothesis test is used:

$H_0: d = 1 \qquad \text{versus} \qquad H_1: d > 1,$

where $d = 1 + \phi$ is the index of dispersion and $d = 1$ corresponds to the Poisson distribution.
Likelihood-ratio test
The likelihood-ratio test statistic for the hypotheses above is

$W = 2\left(\ell\left(\hat{\lambda}, \hat{\phi}\right) - \ell\left(\tilde{\lambda}\right)\right),$

where $\ell$ is the log-likelihood function, maximized under the Neyman Type A model and under the Poisson model, respectively. $W$ does not have the usual asymptotic $\chi^2_1$ distribution under the null hypothesis, because $d = 1$ lies on the edge of the parameter domain. It can be demonstrated that the asymptotic distribution of $W$ is a 50:50 mixture of the constant 0 and a $\chi^2_1$. For this mixture, the upper-tail percentage points at level $\alpha$ are the same as the upper-tail percentage points of a $\chi^2_1$ at level $2\alpha$.
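In practice this means the $\alpha$-level critical value for $W$ is the $(1 - 2\alpha)$ quantile of a $\chi^2_1$, which equals the squared standard-normal quantile. A small helper using only the standard library:

```python
from statistics import NormalDist

def lrt_critical_value(alpha):
    # For the 50:50 mixture of a point mass at 0 and a chi-square(1):
    #   P(W > c) = 0.5 * P(chi2_1 > c) = alpha  =>  P(chi2_1 > c) = 2*alpha.
    # Since chi2_1 is the square of a standard normal, c = Phi^{-1}(1 - alpha)^2.
    return NormalDist().inv_cdf(1.0 - alpha) ** 2

c = lrt_critical_value(0.05)   # about 2.71, the chi2_1 upper 10% point
```

Reject the Poisson hypothesis at level $\alpha$ when the observed $W$ exceeds this critical value.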