Talk:Likelihood function/Archive 1

From Wikipedia, the free encyclopedia


Unclassified comments

Article does not communicate well. A relatively simple matter turns out to be difficult to understand. —Preceding unsigned comment added by 80.212.104.206 (talk) 13:34, 18 September 2010 (UTC)


I adjusted the wiktionary entry so it doesn't say that the mathematical definition is 'likelihood = probability'. Someone more mathematical than I may want to check to see if the mathematical definition I gave is correct. I defined "likelihood" in the parameterized-model sense, because that is the only way in which I have ever seen it used (i.e., not in the more abstract Pr(A | B=b) sense currently given in the Wikipedia article). 128.231.132.2 03:06, 21 March 2007 (UTC)


This article needs integrating / refactoring with the other two on the likelihood principle and maximum likelihood method, and a good going-over by someone expert in the field. -- The Anome


I emphatically agree. I've rewritten some related articles and I may get to this one if I ever have time. -- Mike Hardy

All was going well until I hit

In statistics, a likelihood function is a conditional probability function considered as a function of its second argument with its first argument held fixed, thus:

Would it be possible for someone to elaborate on that sentence or to give an example? FarrelIThink 06:12, 21 February 2007 (UTC)



I found the very first sentence under the "Definition" section very confusing:

The likelihood of a set of parameter values, θ, given outcomes x, is equal to the probability of those observed outcomes given those parameter values.

This is not true in the continuous case, as described by the article itself a few sentences later. I think the whole thing would be much clearer if the first sentence were omitted and it simply said "The likelihood function is defined differently for discrete and continuous probability distributions". I'm currently a student of this topic and I had quickly read the first sentence under Definition (and only that sentence), ended up greatly confused, and only later came back to read the rest of the section to clarify things. --nonagonal  Preceding unsigned comment added by Nonagonal (talkcontribs) 19:59, 8 October 2015 (UTC)

The arrow

Can someone tell me what the arrow notation is supposed to mean? --Huggie (talk) 11:30, 3 April 2010 (UTC)

I can't figure this out either. Did you ever find an answer? Jackmjackm (talk) 20:00, 23 September 2023 (UTC)
The arrows indicate mapping, so the notation is saying that for each parameter value θ, we can define a function giving the probability (or probability density) of the data given θ. I.e., we map θ to a function x ↦ P(x | θ). I hope this is somewhat helpful and not too circular! So many suspicious toenails (talk) 16:16, 26 September 2023 (UTC)
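A minimal Python sketch of the two readings of the arrow (an editor's illustration, not part of the original thread; the Bernoulli model is an assumed example): fixing the parameter gives a probability function of the data, while fixing the data gives the likelihood, a function of the parameter.

```python
def pmf(x, theta):
    """Probability of outcome x (0 or 1) under a Bernoulli parameter theta."""
    return theta if x == 1 else 1.0 - theta

def model(theta):
    """theta -> (function of the data): the probability reading of the arrow."""
    return lambda x: pmf(x, theta)

def likelihood(x):
    """x -> (function of the parameter): the likelihood reading of the arrow."""
    return lambda theta: pmf(x, theta)

prob_fn = model(0.5)     # probability function, parameter held fixed
like_fn = likelihood(1)  # likelihood function, data held fixed
```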

Context tag

I added the context tag because the article starts throwing mathematical functions and jargon around from the very beginning with no explanation of what the letters and symbols mean. Rompe 04:40, 15 July 2006 (UTC)

The tag proposes making it more accessible to a general audience. A vernacular usage makes likelihood synonymous with probability, but that is not what is meant here. I doubt this topic can be made readily comprehensible to those not familiar at the very least with probability theory. So I question the appropriateness of the "context" tag. The article starts with the words "In statistics,...". That's enough to tell the general reader that it's not about criminology, church decoration, sports tactics, chemistry, fiction writing, etc. If no such preceding words were there, I'd agree with the "context" tag. Michael Hardy 23:55, 16 July 2006 (UTC)

Which came first

Which came first? the common use as in "in all likelihood this will not occur" or the mathematical function?

See History of probability. "Probable and probability and their cognates in other modern languages derive from medieval learned Latin probabilis ... . The mathematical sense of the term is from 1718. ... The English adjective likely is of Germanic origin, most likely from Old Norse likligr (Old English had geliclic with the same sense), originally meaning "having the appearance of being strong or able", "having the similar appearance or qualities", with a meaning of "probably" recorded from the late 14th century. Similarly, the derived noun likelihood had a meaning of "similarity, resemblance" but took on a meaning of "probability" from the mid 15th century." Mathematical formalizations of probability came later, starting primarily around roughly 1600. Ronald Fisher is credited with popularizing "likelihood" in its modern sense beginning around 1912, according to the Wikipedia article on him. DavidMCEddy (talk) 15:47, 21 March 2016 (UTC)

Backwards

An earlier version of this page said "In a sense, likelihood works backwards from probability: given B, we use the conditional probability Pr(A|B) to reason about A, and, given A, we use the likelihood function L(A|B) to reason about B. ". This makes sense; i.e. it says it's backwards, and it is.

The current version uses L(B|A) instead, i.e. it says: "In a sense, likelihood works backwards from probability: given B, we use the conditional probability Pr(A|B) to reason about A, and, given A, we use the likelihood function L(B|A) to reason about B. " This does not make sense. It says it's backwards, but it talks as if Pr and L are interchangeable.

How about switching back to the earlier version, and providing a concrete example to help clarify it? Possible example: Given that a die is fair, we use the probability of getting 10 sixes in a row given that the die is fair to reason about getting 10 sixes in a row; or given that we got 10 sixes in a row, we use the likelihood of getting 10 sixes in a row given that the die is fair to reason about whether the die is fair. (Or should it say "the likelihood that the die is fair given that 10 sixes occur in a row"? What exactly is the definition of "likelihood" used in this sort of verbal context, anyway?) --Coppertwig 20:28, 24 August 2007 (UTC)
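Coppertwig's die example can be made concrete with a short Python sketch (an editor's illustration; the die loaded to show six half the time is an assumed alternative, not from the thread). The same quantity P(10 sixes | p) is a probability when the parameter p is fixed, and a likelihood in p once the 10 sixes are observed:

```python
fair = 1 / 6

# Probability: the parameter (a fair die) is fixed; reason about the outcome.
p_ten_sixes_given_fair = fair ** 10

# Likelihood: the outcome (10 sixes) is fixed; reason about the parameter.
def L(p_six, n_sixes=10):
    return p_six ** n_sixes

# Compare the fair die with a hypothetical die showing six half the time:
ratio = L(0.5) / L(fair)   # likelihood ratio, loaded vs fair: (0.5 / (1/6))**10
```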

I agree. And similarly, in the "abstract", currently the last sentence ends in "...and indicates how likely a parameter value is in light of the observed outcome." I do not know if it is ok to use the word "likely" in this way. Clearly, replacing it with "probable" in this sentence would make it terribly wrong by committing the common reversal-of-conditional-probabilities mistake. Therefore: is "likely" clearly distinct (and understood) from "probable"? Anyway, I would suggest rewriting to say "... and indicates how likely the observed outcome is to occur for different parameter values." Or am I missing something here? Enlightenmentreloaded (talk) 10:01, 28 October 2011 (UTC)

And the preamble has the variable and the parameter confused in "Equivalently, the likelihood may be written to emphasize that it is the probability of observing sample x given θ,..." 109.250.93.11 (talk) 16:32, 17 November 2022 (UTC)

Likelihood of continuous distributions is a problem

The contribution looks attractive; however, it ignores several basic mathematical facts:

1. Usually likelihood is assessed using not one realization, but a series of observed random variables (independently identically distributed). Then the likelihood expands to a large product. Usually this is transformed by a logarithm to a sum. This transformation is not linear (like that mentioned in the entry), but it attains its maximum at the same point.

2. Likelihood can easily be defined for discrete distributions, where its values are values of some probabilities. A problem arises with an analogue for continuous distributions. Then the probability density function (pdf) is used instead of probability (probability function, pf). This is incorrect unless we use additional assumptions, e.g., continuity of the pdf. Without it, the notion of likelihood does not make sense, although this error occurs in most textbooks. (Do you know any which makes this correct? I did not find any; I did it in my textbook.) In any case, there are two totally different and incomparable notions of likelihood, one for discrete, the other for continuous distributions. As a consequence, there is no notion of likelihood applicable to mixed distributions. (Nevertheless, the maximum likelihood method can be applied separately to the discrete and continuous parts.)

Mirko Navara, http://cmp.felk.cvut.cz/~navara —Preceding unsigned comment added by 88.146.54.129 (talk) 08:16, 22 February 2008 (UTC)

Just to clarify, by "the contribution" are you referring to the whole article or a particular section or edit? I assume the former.
On (1), well, the log-likelihood isn't mentioned in this article but clearly it isn't itself a likelihood. The invariance of maximum likelihood estimates to transformation is surely a matter not for this article but for the one on maximum likelihood. (I haven't checked that article to see what it says on the topic, if anything).
On (2), I think you've got a point that this article lacks a rigorous definition. I think the more accessible definition is needed too and should be given first. If you want to add a more rigorous definition, go ahead. I'm sure i've seen a measure-theoretic definition somewhere but I'm afraid i've never got to grips with measure theory myself.
When you say "I did it in my textbook", is that Teorie Pravděpodobnosti Na Kvantových a Fuzzy Logikách? I'm afraid i can't locate a copy to consult. Qwfp (talk) 09:34, 22 February 2008 (UTC)
The "problem" between definitions of likelihood for discrete and continuous distributions is resolved by using Measure-theoretic probability theory. This generality comes with the substantial cost of learning measure theory. Fortunately, it is unnecessary for many applications. It is, nevertheless, useful for many purposes -- one of which is understanding the commonality of the treatment between discrete, absolutely continuous and other distributions. I just added an "In general" section to explain this: A discrete probability mass function is the probability density function for that distribution with respect to the counting measure on the set of all possible discrete outcomes. For absolutely continuous distributions, the standard density function is the density (Radon-Nikodym derivative) with respect to the Lebesgue measure. I hope this adds more clarity than confusion. DavidMCEddy (talk) 16:19, 21 March 2016 (UTC)
Regarding different definitions for discrete and continuous distributions: this is a mathematical point, not a conceptual point, and should be discussed further down in the article, but not in its introduction, I think. Can we use a small volume element dx in measurement space, and consider 'p(x|theta)dx' instead of 'p(x|theta)' for the continuous case, at least in the introduction? Benjamin.friedrich (talk) —Preceding undated comment added 20:35, 14 May 2020 (UTC)
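A small Python sketch of the p(x|theta)·dx idea above (an editor's illustration; the normal model and the numbers are assumed examples): a density value can exceed 1, so it is not itself a probability, but density times a small volume element approximates the probability of landing in a small interval.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma**2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# A density value can exceed 1 -- it is not a probability ...
dens = normal_pdf(0.0, 0.0, 0.1)   # about 3.99 for this narrow distribution

# ... but density * dx approximates a probability, and stays small.
dx = 1e-4
prob_small_interval = dens * dx    # roughly P(X in [-dx/2, dx/2])
```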

Area under the curve

I'm confused about this statement:

"...the integral of a likelihood function is not in general 1. In this example, the integral of the likelihood density over the interval [0, 1] in pH is 1/3, demonstrating again that the likelihood density function cannot be interpreted as a probability density function for pH."

Because the likelihood function is defined up to a scalar, the fact that the integral is 1/3 isn't that meaningful. However, I think we could say that one possibility is twice as likely as another or similarly that the likelihood of being in the range [a,b] is six times as likely as being in the disjoint range [c,d]. Given that pH can't be less than 0 or more than 1, it seems sensible to normalize the likelihood so that the integral over that range is 1. I think that we could then say that if the integral of the normalized likelihood over [a,b] equals 0.5, then there's a 50/50 chance of pH being in the range [a,b], which would correspond to a normalized likelihood of 0.5. Am I mistaken? Why can't we just normalize to 1.0 and then interpret the normalized likelihood function as a probability density function? —Ben FrantzDale (talk) 17:17, 14 August 2008 (UTC)

"Why can't we just normalize to 1.0"?. There are several reasons. One is that the integral in general doesn't exist (isn't finite). If an appropriate weighting function can be found, then the scaled function becomes something else, with its own interpretation, which would move us away from "likelihood function". However, certain theoretical work has been done which makes use of a different scaling ... scaling by a factor to make the maximum of the scaled likelihood equal to one. Melcombe (talk) 08:45, 15 August 2008 (UTC)
Interesting. Can you give an example of when that integral wouldn't be finite? (This question may be getting at the heart of the difference between "likelihood" and "probability" -- a difference which I don't yet fully understand.) —Ben FrantzDale (talk) 12:38, 15 August 2008 (UTC)
An example might be the case where an observation X is from a uniform distribution on (0,a) with a>0. The likelihood function is 1/a for a > (observed X) : so not integrable. A simple change of parameterisation to b=1/a gives a likelihood which is integrable. Melcombe (talk) 13:25, 15 August 2008 (UTC)
Don't forget the simplest case of all: uniform support! Not possible to normalize in this case. Robinh (talk) 14:47, 15 August 2008 (UTC)
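Melcombe's uniform example can be checked in a few lines of Python (an editor's illustration; the observed value 2.0 is an assumed choice). The likelihood L(a) = 1/a for a > x has a logarithmically divergent integral, while the reparameterisation b = 1/a gives L(b) = b on (0, 1/x), which integrates to a finite value:

```python
import math

x_obs = 2.0   # a single observation from Uniform(0, a)

# Integral of L(a) = 1/a over (x_obs, upper): log(upper / x_obs),
# which grows without bound as upper grows -- not integrable.
def integral_L_a(upper):
    return math.log(upper / x_obs)

# With b = 1/a, the likelihood is L(b) = b on (0, 1/x_obs);
# its integral is (1/x_obs)**2 / 2, which is finite.
area_b = (1.0 / x_obs) ** 2 / 2.0
```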

It doesn't make sense to speak of a "likelihood density function". Likelihoods are not densities. Density functions are not defined pointwise. One can convolve them, but not multiply them. Likelihoods are defined pointwise. One can multiply them but not convolve them. One can multiply a likelihood by a density and get another density (although not in general a probability density, until one normalizes). Michael Hardy (talk) 16:00, 15 August 2008 (UTC)

I'm deleting that entire paragraph beginning with, "The likelihood function is not a probability ... ." I agree it's confusing, and I don't see that it adds anything.
The issues raised by a discussion of "the integral of a likelihood function" could be answered clearly with a sensible discussion of likelihood in Bayesian inference. I don't know if I'll find the time to write such a section myself, but it would make a useful addition to this article. DavidMCEddy (talk) 16:37, 21 March 2016 (UTC)

Needs a simpler introduction?

I believe it is a good habit for mathematical articles on Wikipedia, to start with a simple heuristical explanation of the concept, before diving into details and formalism. In this case I think it should be made clearer that the likelihood is simply the pdf regarded as a function of the parameter rather than of the data.

Perhaps the fact that while the pdf is a deterministic function, the likelihood is considered a random function, should also be addressed. Thomas Tvileren (talk) 07:30, 17 April 2009 (UTC)

What is the scaling factor alpha in the introduction good for? If that's for the purpose of simplification of the maximum likelihood method then (a) it is totally misplaced comment and (b) you could put there any strictly increasing function, not just scaling by a constant. --David Pal (talk) 01:35, 1 March 2011 (UTC)
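David Pal's point (b) can be illustrated with a short Python sketch (an editor's illustration; the 7-heads-in-10-flips data are an assumed example): scaling by a constant alpha and, more generally, any strictly increasing transform such as the logarithm leave the maximiser of the likelihood unchanged.

```python
import math

# Bernoulli likelihood for 7 heads in 10 flips.
def L(p):
    return p ** 7 * (1 - p) ** 3

grid = [i / 1000 for i in range(1, 1000)]   # open interval: avoids log(0)
argmax_L = max(grid, key=L)
argmax_scaled = max(grid, key=lambda p: 5.0 * L(p))        # alpha = 5
argmax_logL = max(grid, key=lambda p: math.log(L(p)))      # increasing transform
```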

Median

For a bernoulli trial, is there a significant meaning for the median of the likelihood function? —Preceding unsigned comment added by Fulldecent (talkcontribs) 16:30, 13 August 2009 (UTC)

The Bernoulli trial has a probability distribution function fP defined by fP(0) = 1 − P and fP(1) = P. This means that the likelihood function is Lx defined by L0(P) = 1 − P and L1(P) = P for 0 ≤ P ≤ 1. For x=0 the maximum likelihood estimate of P is 0; the median is 1 − 1/√2 ≈ 0.29; and the mean value is 1/3 ≈ 0.33. For x=1 the maximum likelihood estimate of P is 1; the median is 1/√2 ≈ 0.71; and the mean value is 2/3 ≈ 0.67. These are point estimates for P. Some likelihood functions have a well defined maximum likelihood value but no median. Other likelihood functions have a median but no mean value. See for example the German tank problem#Likelihood function. Bo Jacoby (talk) 22:27, 3 September 2009 (UTC).
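The closed forms behind these numbers can be checked in Python (an editor's illustration, under the normalize-and-treat-as-density reading that is disputed later in this section):

```python
import math

# x = 0: likelihood 1 - P; normalized density 2(1 - P) on [0, 1].
median_x0 = 1 - 1 / math.sqrt(2)   # solves 2m - m**2 = 1/2; about 0.29
mean_x0 = 1 / 3                    # integral of P * 2(1 - P) over [0, 1]

# x = 1: likelihood P; normalized density 2P on [0, 1].
median_x1 = 1 / math.sqrt(2)       # solves m**2 = 1/2; about 0.71
mean_x1 = 2 / 3                    # integral of P * 2P over [0, 1]
```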

The above is wrong.

  • First a minor point. The term "probability distribution function" usually means cumulative distribution function.
  • What sense can it make to call the number proposed above the "median" of the likelihood function? That would be the answer if one treated the function as a probability density function, but that makes sense only if we assume a uniform measure on the line, in effect a prior, so the proposed median is actually the median of the posterior probability distribution, assuming a uniform prior. It's not a median of the likelihood function. If we assumed a different prior, we'd get a different median with the SAME likelihood function. Similar comments apply to the mean. There's no such thing as the mean or the median of a likelihood function. Michael Hardy (talk) 00:02, 4 September 2009 (UTC)

Comment to Michael:

  • The article on probability distribution function allows for the interpretation as probability density function.
  • The uniform prior likelihood function, f(P)=1 for 0 ≤ P ≤ 1, expresses prior ignorance of the actual value of P. A different prior likelihood function expresses some knowledge of the actual value of P, and no such knowledge is provided. It is correct that assuming a uniform prior distribution makes the likelihood function define a posterior distribution, in which the mode, median, mean value, standard deviation etc, are defined.

Your main objection seems to be that tacitly assuming a uniform prior distribution is unjustified. Consider the (bernoulli) process of sampling from an infinite population as a limiting case of the (hypergeometric) process of sampling from a finite population. The J expression

  udaf=.!/&(i.@>:) * !/&(- i.@>:)

computes odds of the hypergeometric distribution.

The program call

  1 udaf 10
10 9 8 7 6 5 4 3 2 1  0
 0 1 2 3 4 5 6 7 8 9 10

computes the odds when you pick 1 pebble from a population of 10 red and white pebbles. The 11 columns are odds for getting 0 or 1 red pebble, when the number of red pebbles in the population is 0 through 10. The 2 rows are likelihoods for the population containing 0 through 10 red pebbles given that the sample contained 0 or 1 red pebble. The top row shows that 0 red pebbles in the population has the maximum likelihood (= 10). A median is about 2.5 red pebbles = 25% of the population. (10+9+8 = 27 < 27.5 < 28 = 7+6+5+4+3+2+1+0). The mean value is 30% and the standard deviation is 24%.

The prior likelihood function is (of course)

  0 udaf 10
1 1 1 1 1 1 1 1 1 1 1

expressing prior ignorance regarding the number of red pebbles in the population. The maximum likelihood value is undefined; the median and the mean are both equal to 50% of the population, and the standard deviation is 32% of the population.

In the limiting case where the number of pebbles in the population is large, you get (unnormalized) binomial distributions in the columns and (unnormalized) beta distributions in the rows.

  5 udaf 16
4368 3003 2002 1287  792  462  252  126   56   21    6    1    0    0    0    0    0
   0 1365 2002 2145 1980 1650 1260  882  560  315  150   55   12    0    0    0    0
   0    0  364  858 1320 1650 1800 1764 1568 1260  900  550  264   78    0    0    0
   0    0    0   78  264  550  900 1260 1568 1764 1800 1650 1320  858  364    0    0
   0    0    0    0   12   55  150  315  560  882 1260 1650 1980 2145 2002 1365    0
   0    0    0    0    0    1    6   21   56  126  252  462  792 1287 2002 3003 4368

Study the finite case first, and the infinite case as a limit of the finite case, rather than to begin with the infinite case where a prior distribution is problematic. It is dangerous to assume that lim(f(x))=f(lim(x)). Bo Jacoby (talk) 10:00, 4 September 2009 (UTC).
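For readers who don't follow J, the `udaf` tables above can be reproduced with a short Python rendering (an editor's illustration; the function name mirrors Bo Jacoby's J verb):

```python
from math import comb

def udaf(n, N):
    """odds[k][R] = C(R, k) * C(N - R, n - k): hypergeometric odds of drawing
    k red pebbles in a sample of n from a population of N with R red ones.
    Rows index k = 0..n; columns index R = 0..N."""
    return [[comb(R, k) * comb(N - R, n - k) for R in range(N + 1)]
            for k in range(n + 1)]

table = udaf(1, 10)   # reproduces the J output of "1 udaf 10" above
prior = udaf(0, 10)   # reproduces "0 udaf 10": a row of ones
```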

graph

The likelihood function for estimating the probability of a coin landing heads-up without prior knowledge after observing HHT

How was this graph generated? Is there a closed form for this calculation? Is there a closed form for a given # of H and # of T? —Preceding unsigned comment added by Fulldecent (talkcontribs) 17:46, 13 August 2009 (UTC)

The expression C(n,i)·p^i·(1−p)^(n−i) is for fixed n,p a binomial distribution function of i, (i=0,..,n), and for fixed n,i a continuous (unnormalized) beta distribution of p, (0≤p≤1). So the graph is simply p²(1−p). Bo Jacoby (talk) 12:33, 20 August 2009 (UTC).
Isn't the correct formula 3p²(1−p), given that the binomial coefficient 3 choose 2 evaluates to 3? Implementing this correctly scales the probabilities on the y-axis.
Littlejohn.farmer (talk) 17:01, 13 February 2023 (UTC)
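A quick Python check of the two formulas under discussion (an editor's illustration): for HHT (2 heads, 1 tail), p²(1−p) and 3p²(1−p) differ only by the constant binomial coefficient, so the factor 3 rescales the y-axis without moving the maximum.

```python
def L_plain(p):
    return p ** 2 * (1 - p)        # likelihood for HHT, constant dropped

def L_binom(p):
    return 3 * p ** 2 * (1 - p)    # with the binomial coefficient C(3, 2) = 3

grid = [i / 1000 for i in range(1001)]
argmax_plain = max(grid, key=L_plain)
argmax_binom = max(grid, key=L_binom)
# Both peak at the grid point nearest p = 2/3.
```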

Probability of causes and not probability of effects?

The definition given here is the opposite that given by D'Agostini, Bayesian Reasoning in Data Analysis (2003). From pp. 34-35: "The possible values which may be observed are classified in belief by f(x|θ). This function is traditionally called `likelihood' and summarizes all previous knowledge on that kind of measurement..." In other words, it is the probability of an effect x given a parameter (cause) θ. The definition given in this entry, proportional to the probability of a cause θ given the effect x, seems more useful, as the concept is more important, but is it possible that there is more than one definition in use in the literature? LiamH (talk) 02:10, 4 October 2009 (UTC)

putting x and theta in bold

since P(x|theta) is describing sets of data points (as if a vector), shouldn't it be put in bold?

theta represents a vector (or set) of parameters, and x represents a vector of data points from a sample.

I might be wrong about this, thought it would be worth mentioning

SuperChocolate (talk) 14:49, 18 September 2014 (UTC)

Discussion

It is confusing to have several different definitions that are approximately the same. We first use P(x|θ), then p_θ(x), then f_θ(x). Then we have two separate discussions on the page about continuous vs. discrete. Can we just define the likelihood for the discrete case and then refer to Likelihood_function#Likelihood_function_of_a_parameterized_model for the continuous case?

It's noted in several places that the likelihood is defined up to a multiplicative constant, is there a reason we don't define it that way?

Finally, there doesn't seem to be uniform notation on the page; can we remedy that?

User:SolidPhase what do you think? Prax54 (talk) 20:40, 28 January 2015 (UTC)

On the points you raise, I think that the article needs substantial revisions. Regarding the definition of likelihood for a continuous distribution, the article previously included more on this, but it looked to me to be in error; so I deleted some. See my edit and especially the explanation's link, which cites Burnham & Anderson.
Confusion seems to have come about for historical reasons. Originally, likelihood was used to compare different parameters of the same model: there, the constant is irrelevant. Now, likelihood is used to compare different models (see Likelihood function#Relative likelihood of models): here, the constant is relevant.
SolidPhase (talk) 13:29, 29 January 2015 (UTC)
Thanks for the response. I am not sure where to start in improving this article. Any suggestions are welcome. Prax54 (talk) 11:31, 21 May 2015 (UTC)

"In general": likelihoods with respect to a dominating measure

I wish to thank Podgorec for attempting to clarify this section by inserting, "with all distributions being absolutely continuous with respect to a common measure" before "whether discrete, absolutely continuous, a mixture or something else." I've reworded this addition and placed it in a parenthetical comment at the end of the sentence. I've done this to make that section more accessible to people unfamiliar with measure-theoretic probability -- without eliminating the mathematical rigor.

If this is not adequate, I fear we will need to cite a modern text on measure-theoretic probability theory. My knowledge of this subject dates from the late 1970s and early 1980s. I think my memory of that material is still adequate for this, but the standard treatment of the subject may have changed -- and I no longer have instant access to a text on the subject to cite now. (It would also be good to mention likelihood in the Wikipedia article on Radon-Nikodym theorem, to help explain one important use, but I won't attempt that right now.) DavidMCEddy (talk)

Definition

User:Gitchygoomy changed the definition of likelihood to read, 'The likelihood of a set of parameter values, θ, given outcomes x, is assumed to be equal to the probability of those observed outcomes given those parameter values', from 'The likelihood ... given outcomes x, is equal to ...'. This is incorrect. I will edit this to read as follows:

'The likelihood of a set of parameter values, θ, given outcomes x, is equal to the probability assumed for those observed outcomes given those parameter values'.

Clearly, something is assumed: The assumption is about the probability itself, not the formal identity of the likelihood to it.

I hope this will address User:Gitchygoomy's concerns with the previous definition. DavidMCEddy (talk) 20:55, 6 February 2017 (UTC)

Thanks for the change but I don't think it's quite fair. If the term is supposed to help draw an inference about an actual probability, then a reference to an assumed probability changes its significance entirely. How can you say that the likelihood is equal to an assumed probability, which is unconstrained?  Preceding unsigned comment added by Gitchygoomy (talkcontribs) 21:55, 6 February 2017 (UTC)
I think it should further be reworded as follows:
'The likelihood of a parameter value (or vector of parameter values), θ, given outcomes x, is equal to the probability (density) assumed for those observed outcomes given those parameter values'. (I will change the definition to this.)
This may not address User:Gitchygoomy's concern, which I do not yet understand.
For discrete probabilities, the probability of any specific outcome is precisely the probability density for that possible outcome. More precisely, this probability density is the Radon-Nikodym derivative of the probability with respect to the counting measure, which is the standard dominating measure for discrete probabilities. (See the discussion of likelihood with measure-theoretic probabilities in the main article.)
More generally, probabilities are always between 0 and 1. This means that probability densities (with respect to non-negative dominating measures) are non-negative but also possibly unbounded.
To User:Gitchygoomy: If this does not address your concern, might you be able to provide a more concrete example? Thanks, DavidMCEddy (talk) 00:03, 7 February 2017 (UTC)

"Historical remarks" section

This section seems to trace the etymology and usage of the word "likelihood", which generally seems irrelevant to the specific concept of likelihood functions. Recommend heavily truncating or removing this section.  Preceding unsigned comment added by Denziloe (talkcontribs) 09:55, 7 September 2017 (UTC)

Yes, I much agree. The Wikipedia article is about the likelihood function in mathematical statistics. That function is not what Peirce was referring to in his papers, when he discussed likelihood. Nor is a detailed etymology of the word relevant to the function. I have removed most of the section, and added an additional citation for Fisher.  BetterMath (talk) 08:20, 9 November 2017 (UTC)

Hello fellow Wikipedians,

I have just modified one external link on Likelihood function. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 06:24, 23 December 2017 (UTC)

Wording of lead

Why delete "A more detailed discussion of history of likelihood ..."?

Inverse logic

New section on integrability

When is a conditional probability not a conditional probability?

parameter(s) singular or plural?

A good lead

Properties of the likelihood function for MLE
