Correlation ratio
From Wikipedia, the free encyclopedia
In statistics, the correlation ratio is a measure of the curvilinear association between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ratio of two standard deviations representing these types of variation. The context here is the same as that of the intraclass correlation coefficient, whose value is the square of the correlation ratio.
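Suppose each observation is $y_{xi}$, where $x$ indicates the category that the observation is in and $i$ is the label of the particular observation. Let $n_x$ be the number of observations in category $x$ and

$$\bar{y}_x = \frac{\sum_i y_{xi}}{n_x} \quad\text{and}\quad \bar{y} = \frac{\sum_x n_x \bar{y}_x}{\sum_x n_x},$$

where $\bar{y}_x$ is the mean of category $x$ and $\bar{y}$ is the mean of the whole population. The correlation ratio $\eta$ (eta) is defined to satisfy

$$\eta^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{xi} - \bar{y})^2},$$

which can be written as

$$\eta^2 = \frac{\sigma_{\bar{y}}^2}{\sigma_y^2}, \quad\text{where}\quad \sigma_{\bar{y}}^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_x n_x} \quad\text{and}\quad \sigma_y^2 = \frac{\sum_{x,i} (y_{xi} - \bar{y})^2}{n},$$

with $n = \sum_x n_x$ the total number of observations, i.e. the weighted variance of the category means divided by the variance of all samples.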
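The definition can be checked numerically. The following is a minimal Python sketch, not a reference implementation: it computes the ratio of the weighted squared deviations of the category means to the squared deviations of all observations, then takes the square root. The function name `correlation_ratio` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(categories, values):
    """Return eta for paired category labels and numeric observations."""
    # Group the observations by category label.
    groups = defaultdict(list)
    for x, y in zip(categories, values):
        groups[x].append(y)

    grand_mean = sum(values) / len(values)

    # Numerator: sum over categories of n_x * (category mean - grand mean)^2.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Denominator: sum over all observations of (y - grand mean)^2.
    total = sum((y - grand_mean) ** 2 for y in values)

    if total == 0:
        raise ValueError("eta is undefined when all observations are equal")
    return sqrt(between / total)


# Illustrative data (made up for this sketch): two categories, four observations.
labels = ["a", "a", "b", "b"]
scores = [1.0, 3.0, 5.0, 7.0]
eta = correlation_ratio(labels, scores)
print(round(eta ** 2, 3))  # 0.8
```

Here the category means are 2 and 6 and the overall mean is 4, so the numerator is 2·4 + 2·4 = 16 and the denominator is 9 + 1 + 1 + 9 = 20, giving a squared correlation ratio of 0.8.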
If the relationship between values of $x$ and values of $\bar{y}_x$ is linear (which is certainly true when there are only two possibilities for $x$), $\eta^2$ will give the same result as the square of Pearson's correlation coefficient; otherwise the correlation ratio will be larger in magnitude than Pearson's coefficient. It can therefore be used for judging non-linear relationships.
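The comparison with Pearson's coefficient can be illustrated with a small Python sketch. It assumes numeric category values so that Pearson's coefficient is defined; the helper names `eta_squared` and `pearson_r` and both data sets are made up for illustration. In the first data set the category means lie on a straight line in the category value; in the second they do not.

```python
from math import sqrt


def eta_squared(xs, ys):
    """Squared correlation ratio of ys grouped by the category value in xs."""
    grand_mean = sum(ys) / len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    # Weighted squared deviations of the category means from the grand mean.
    between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Squared deviations of all observations from the grand mean.
    total = sum((y - grand_mean) ** 2 for y in ys)
    return between / total


def pearson_r(xs, ys):
    """Pearson's correlation coefficient, computed directly from its definition."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [0, 0, 1, 1, 2, 2]
linear_ys = [1, 3, 3, 5, 5, 7]  # category means 2, 4, 6: linear in x
curved_ys = [1, 3, 5, 7, 1, 3]  # category means 2, 6, 2: not linear in x

for name, ys in [("linear", linear_ys), ("curved", curved_ys)]:
    print(name, eta_squared(xs, ys), pearson_r(xs, ys) ** 2)
```

For the linear data the two printed quantities agree up to rounding, while for the curved data the squared Pearson coefficient is essentially zero and the squared correlation ratio remains well above it.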
Range
The correlation ratio $\eta$ takes values between 0 and 1. The limit $\eta = 0$ represents the special case of no dispersion among the means of the different categories, while $\eta = 1$ refers to no dispersion within the respective categories. $\eta$ is undefined when all data points of the complete population take the same value.
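The two limits and the undefined case can be reproduced with a short Python sketch. The function name `correlation_ratio`, its dict-based input, the choice to return `None` for the undefined case, and the three data sets are all assumptions made for illustration; every category is assumed to be non-empty.

```python
def correlation_ratio(groups):
    """Return eta for a dict mapping category -> observations, or None if undefined."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    # Dispersion of the category means around the grand mean, weighted by size.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Dispersion of all observations around the grand mean.
    total = sum((y - grand_mean) ** 2 for y in values)
    if total == 0:
        return None  # all observations are equal, so the ratio is 0/0
    return (between / total) ** 0.5


equal_means = {"a": [1, 3], "b": [0, 4]}  # both category means are 2
no_within = {"a": [2, 2], "b": [5, 5]}    # no spread inside either category
constant = {"a": [4, 4], "b": [4, 4]}     # every observation is the same

print(correlation_ratio(equal_means))  # 0.0
print(correlation_ratio(no_within))    # 1.0
print(correlation_ratio(constant))     # None
```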