Reference class problem

From Wikipedia, the free encyclopedia

In statistics, the reference class problem is the problem of deciding what class to use when calculating the probability applicable to a particular case.

For example, to estimate the probability of an aircraft crashing, we could refer to the frequency of crashes among several different sets of aircraft: all aircraft, this make of aircraft, aircraft flown by this company in the last ten years, and so on. The aircraft for which we wish to calculate the probability of a crash belongs to many of these classes, and the frequency of crashes differs from one class to another, so it is not obvious which class we should refer to for this aircraft. In general, any case is a member of very many classes among which the frequency of the attribute of interest differs. The reference class problem concerns which of these classes is the most appropriate to use.

More formally, many arguments in statistics take the form of a statistical syllogism:

  1. $X$ proportion of $F$ are $G$
  2. $I$ is an $F$
  3. Therefore, the chance that $I$ is a $G$ is $X$

$F$ is called the "reference class", $G$ is the "attribute class", and $I$ is the individual object. How is one to choose an appropriate class $F$?
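The syllogism can be sketched in Python to show how the answer in premise 1 depends on the choice of $F$. This is a minimal illustration, assuming a hypothetical toy fleet with invented records (not real crash statistics) and hypothetical names `Aircraft` and `syllogism_chance`. The same individual aircraft $I$ belongs to three candidate reference classes, and each yields a different $X$.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Iterable


@dataclass(frozen=True)
class Aircraft:
    """Hypothetical record type for one aircraft in a toy fleet."""
    make: str
    operator: str
    crashed: bool


def syllogism_chance(population: Iterable[Aircraft],
                     in_reference_class: Callable[[Aircraft], bool],
                     in_attribute_class: Callable[[Aircraft], bool]) -> Fraction:
    """Return X: the proportion of members of F (reference class) that are G (attribute class)."""
    members = [a for a in population if in_reference_class(a)]
    if not members:
        raise ValueError("reference class is empty")
    # Summing booleans counts the members of F that are also in G.
    return Fraction(sum(in_attribute_class(a) for a in members), len(members))


# Invented toy data for illustration only (not real crash statistics).
fleet = (
    [Aircraft("MakeA", "AirOne", crashed=False)] * 8
    + [Aircraft("MakeA", "AirOne", crashed=True)] * 2
    + [Aircraft("MakeA", "AirTwo", crashed=False)] * 10
    + [Aircraft("MakeB", "AirTwo", crashed=False)] * 19
    + [Aircraft("MakeB", "AirTwo", crashed=True)] * 1
)


def crashed(a: Aircraft) -> bool:
    """Attribute class G: aircraft that crashed."""
    return a.crashed


# The individual I is a MakeA aircraft operated by AirOne, so it belongs to all three classes.
reference_classes = {
    "all aircraft": lambda a: True,
    "this make": lambda a: a.make == "MakeA",
    "this make, this operator": lambda a: a.make == "MakeA" and a.operator == "AirOne",
}

for name, in_f in reference_classes.items():
    print(f"{name}: X = {syllogism_chance(fleet, in_f, crashed)}")
# all aircraft: X = 3/40
# this make: X = 1/10
# this make, this operator: X = 1/5
```

Nothing in the code says which of the three values is the "right" chance for $I$; that choice is exactly the reference class problem.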

In Bayesian statistics, the problem arises as that of deciding on a prior probability for the outcome in question (or, when considering multiple outcomes, a prior probability distribution).
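A small sketch can show how the choice of reference class carries through into a Bayesian posterior. It assumes, purely for illustration, a Beta prior whose mean is set to a reference-class frequency with an arbitrary weight of 20 pseudo-observations, two hypothetical class frequencies (3/40 and 1/5), and the same case-specific evidence of 1 event in 10 trials. The helper names are hypothetical; the update itself is the standard Beta-binomial conjugate rule.

```python
from fractions import Fraction


def beta_prior_from_class(frequency: Fraction, strength: int) -> tuple[Fraction, Fraction]:
    """Hypothetical helper: Beta(a, b) prior whose mean a / (a + b) equals the
    reference-class frequency, carrying `strength` pseudo-observations in total."""
    return frequency * strength, (1 - frequency) * strength


def posterior_mean(a: Fraction, b: Fraction, events: int, trials: int) -> Fraction:
    """Mean of the conjugate posterior Beta(a + events, b + trials - events)."""
    return (a + events) / (a + b + trials)


# Same case-specific evidence, two candidate reference classes for the prior.
events, trials = 1, 10
for label, freq in [("broad class", Fraction(3, 40)), ("narrow class", Fraction(1, 5))]:
    a, b = beta_prior_from_class(freq, strength=20)
    print(label, posterior_mean(a, b, events, trials))
# broad class 1/12
# narrow class 1/6
```

Identical evidence leads to posterior means that differ by a factor of two, solely because of the reference class used to set the prior. The Beta family and the pseudo-count of 20 are modelling assumptions of this sketch, not part of the article.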

John Venn stated in 1876 that "every single thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things", leading to problems with how to assign probabilities to a single case. He used as an example the probability that John Smith, a consumptive Englishman aged fifty, will live to sixty-one.[1]

The name "problem of the reference class" was given by Hans Reichenbach, who wrote, "If we are asked to find the probability holding for an individual future event, we must first incorporate the event into a suitable reference class. An individual thing or event may be incorporated in many reference classes, from which different probabilities will result."[2]

There has also been discussion of the reference class problem in philosophy[3] and in the life sciences, e.g., clinical trial prediction.[4]

In the book Anthropic Bias, philosopher Nick Bostrom described ways in which reference classes can be applied to reasoning about one's position in reality. Bostrom investigates how to reason when one suspects that evidence is biased by "observation selection effects", in other words, when the evidence presented has been pre-filtered by the condition that there was some appropriately positioned observer to "receive" the evidence.[5][6] This conundrum is sometimes called the "anthropic principle", "self-locating belief", or "indexical information". The book first discusses the fine-tuned universe hypothesis and its possible explanations, notably considering the possibility of a multiverse.

Bostrom argues against the self-indication assumption (SIA), a term he uses to characterize some existing views, and introduces the self-sampling assumption (SSA): that you should think of yourself as if you were a random observer from a suitable reference class. He later refines SSA by using observer-moments instead of observers, to address certain paradoxes in anthropic reasoning. This refinement is formalized as the strong self-sampling assumption (SSSA): each observer-moment should reason as if it were randomly selected from the class of all observer-moments in its reference class.[7] The conclusions these assumptions yield depend, in different ways, on the choice of reference class.

An application of the principle underlying SSSA (though Bostrom nowhere expressly articulates this application) runs as follows. If the minute in which you read this article is randomly selected from every minute in every human's lifespan, then, with 95% confidence, this minute falls after the first 5% of human observer-moments. Let n be the number of humans born so far and N the total number of humans who will ever live. If the mean lifespan in the future is twice the historic mean lifespan, so that the average future human accounts for twice the observer-moments of the average historic human, this implies with 95% confidence that N is less than roughly 10n. Therefore, the 95th-percentile extinction-time estimate in this version is 4560 years.
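The bound on N can be reproduced with a short calculation. This is a sketch under stated assumptions: every one of the n past humans contributed L observer-moments, each of the N - n future humans contributes k·L (k being the ratio of future to historic mean lifespan), and with the stated confidence the current observer-moment lies after the first 5% of all observer-moments. The function name `max_population_ratio` is hypothetical.

```python
from fractions import Fraction


def max_population_ratio(confidence: Fraction, lifespan_ratio: Fraction) -> Fraction:
    """Upper bound on N / n from an SSSA-style argument (illustrative sketch).

    Assumes n past humans with L observer-moments each, N - n future humans
    with lifespan_ratio * L each, and that with probability `confidence`
    the current observer-moment lies after the first (1 - confidence)
    fraction of all observer-moments.
    """
    # Past moments n*L are at least (1 - confidence) of the total,
    # so total / (n*L) <= 1 / (1 - confidence).
    max_total_over_past = 1 / (1 - confidence)
    # total / (n*L) = 1 + lifespan_ratio * (N/n - 1); solve for N/n.
    return 1 + (max_total_over_past - 1) / lifespan_ratio


print(max_population_ratio(Fraction(95, 100), Fraction(2)))  # 21/2
print(max_population_ratio(Fraction(95, 100), Fraction(1)))  # 20
```

With doubled future lifespans the sketch gives N ≤ 10.5n, close to the figure of 10n quoted above; with unchanged lifespans it reduces to N ≤ 20n. Turning either bound into a number of years requires further assumptions about future birth rates that this article does not state.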

See also

References
