Correlation ratio

From Wikipedia, the free encyclopedia

In statistics, the correlation ratio is a measure of the curvilinear association between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ratio of two standard deviations representing these types of variation. The context here is the same as that of the intraclass correlation coefficient, whose value is the square of the correlation ratio.

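Suppose each observation is $y_{xi}$, where $x$ indicates the category that the observation is in and $i$ is the label of the particular observation. Let $n_x$ be the number of observations in category $x$, and let

$$\overline{y}_x = \frac{\sum_i y_{xi}}{n_x}$$

and

$$\overline{y} = \frac{\sum_x n_x \overline{y}_x}{\sum_x n_x},$$

where $\overline{y}_x$ is the mean of category $x$ and $\overline{y}$ is the mean of the whole population. The correlation ratio $\eta$ (eta) is defined to satisfy

$$\eta^2 = \frac{\sum_x n_x (\overline{y}_x - \overline{y})^2}{\sum_{x,i} (y_{xi} - \overline{y})^2},$$

which can be written as

$$\eta^2 = \frac{\sigma_{\overline{y}}^2}{\sigma_y^2}, \quad \text{where } \sigma_{\overline{y}}^2 = \frac{\sum_x n_x (\overline{y}_x - \overline{y})^2}{\sum_x n_x} \text{ and } \sigma_y^2 = \frac{\sum_{x,i} (y_{xi} - \overline{y})^2}{n}, \text{ with } n = \sum_x n_x,$$

i.e. the weighted variance of the category means divided by the variance of all samples.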
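The definition can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation: the function name `correlation_ratio` and the test-score data are assumptions made for this sketch.

```python
from collections import defaultdict
import math


def correlation_ratio(categories, values):
    """Return eta for paired (category, value) observations."""
    if len(categories) != len(values):
        raise ValueError("categories and values must have the same length")
    if not values:
        raise ValueError("at least one observation is required")

    # Group the observations y_xi by their category x.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    # Mean of the whole sample.
    overall_mean = sum(values) / len(values)

    # Numerator: sum over categories of n_x * (category mean - overall mean)^2.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2
        for ys in groups.values()
    )
    # Denominator: sum over all observations of (y_xi - overall mean)^2.
    total = sum((y - overall_mean) ** 2 for y in values)

    if total == 0:
        raise ValueError("eta is undefined when all values are identical")
    return math.sqrt(between / total)


# Illustrative data (assumed for this sketch): test scores in three subjects.
scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}
categories = [subject for subject, ys in scores.items() for _ in ys]
values = [y for ys in scores.values() for y in ys]

eta = correlation_ratio(categories, values)
print(f"eta^2 = {eta ** 2:.4f}, eta = {eta:.4f}")  # eta^2 = 0.7033, eta = 0.8386
```

For these scores the category means are 36, 33 and 78 and the overall mean is 52, so the numerator is 6780 and the denominator is 9640, giving a ratio of about 0.7033.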

If the relationship between values of $x$ and values of the category mean $\overline{y}_x$ is linear (which is certainly true when there are only two possibilities for $x$), this will give the same result as the square of Pearson's correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.
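The comparison with Pearson's coefficient can be checked numerically. The sketch below assumes numeric category labels so that a Pearson correlation is defined. The helper names `eta_squared` and `pearson_r_squared` and both data sets are made up for illustration.

```python
import math
from collections import defaultdict


def eta_squared(xs, ys):
    """Squared correlation ratio of ys grouped by the category labels xs."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    mean = sum(ys) / len(ys)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    total = sum((y - mean) ** 2 for y in ys)
    return between / total


def pearson_r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two numeric sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Two categories: the two category means always lie on a straight line.
x_two = [0, 0, 0, 1, 1, 1]
y_two = [1.0, 2.0, 3.0, 4.0, 6.0, 8.0]

# Three categories whose means rise and then fall (a non-linear pattern).
x_curved = [1, 1, 2, 2, 3, 3]
y_curved = [1.0, 3.0, 7.0, 9.0, 2.0, 4.0]

# With two categories the two measures agree.
print(math.isclose(eta_squared(x_two, y_two), pearson_r_squared(x_two, y_two)))

# With non-linear category means, eta^2 exceeds r^2.
print(eta_squared(x_curved, y_curved) > pearson_r_squared(x_curved, y_curved))
```

In the second data set the category means are 2, 8 and 3. A straight line fits them poorly, so the squared Pearson coefficient is close to zero while the squared correlation ratio stays large.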

Range

The correlation ratio $\eta$ takes values between 0 and 1. The limit $\eta = 0$ represents the special case of no dispersion among the means of the different categories, while $\eta = 1$ refers to no dispersion within the respective categories. $\eta$ is undefined when all data points of the complete population take the same value, since the total dispersion in the denominator is then zero.
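The three boundary cases can be reproduced with small hand-made samples. In this sketch the function name `correlation_ratio` and the choice to return `None` for the undefined case are assumptions for illustration. Raising an exception would be an equally reasonable design.

```python
import math
from collections import defaultdict


def correlation_ratio(xs, ys):
    """Return eta, or None when the total dispersion is zero (eta undefined)."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    mean = sum(ys) / len(ys)
    between = sum(len(g) * (sum(g) / len(g) - mean) ** 2 for g in groups.values())
    total = sum((y - mean) ** 2 for y in ys)
    if total == 0:
        return None  # 0/0: every observation has the same value
    return math.sqrt(between / total)


labels = ["a", "a", "b", "b"]

# Equal category means (both 2.0): no dispersion among the means.
eta_zero = correlation_ratio(labels, [1.0, 3.0, 1.0, 3.0])

# Constant values within each category: no dispersion within categories.
eta_one = correlation_ratio(labels, [1.0, 1.0, 5.0, 5.0])

# All observations identical: the ratio is undefined.
eta_undefined = correlation_ratio(labels, [2.0, 2.0, 2.0, 2.0])

print(eta_zero, eta_one, eta_undefined)  # 0.0 1.0 None
```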

Example

Pearson vs. Fisher
