Draft:Random Group Formation Distribution
Probability distribution
In probability theory and statistics, the Random Group Formation (RGF) distribution is a heavy-tailed, fat-tailed probability distribution. It describes the number of individuals in each group when N individuals are placed into M groups.[1]
Submission declined on 18 November 2025 by Somepinkdude (talk).
Definition
Many real-world samples that appear to follow a power-law distribution arise from N individuals being placed into M groups. One example is the number of people living in each county. Another is the number of people working at each company. A third, from a different field, is the set of words in a document grouped by word (e.g., the word "the" occurs 10 times, "of" occurs 5 times, and so on).
While many other fat-tailed distributions have been used to fit such samples, the RGF distribution is derived by defining the information in the grouping and choosing the distribution with the minimum information cost.
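The word-count example can be turned into the quantities used in the rest of this section. The following is a minimal sketch, not taken from the cited paper; the function name `group_size_counts` and the sample sentence are illustrative assumptions.

```python
from collections import Counter

def group_size_counts(items):
    """Return (N, M, size_counts) for a list of group labels.

    N is the number of individuals, M the number of groups, and
    size_counts[k] the number of groups that have exactly k members.
    """
    groups = Counter(items)                  # group label -> group size k
    size_counts = Counter(groups.values())   # group size k -> number of groups
    n_individuals = sum(groups.values())
    n_groups = len(groups)
    return n_individuals, n_groups, dict(size_counts)

# Hypothetical sample: every word occurrence is an "individual",
# and every distinct word is a "group".
text = "the cat and the dog and the bird"
N, M, N_k = group_size_counts(text.split())
print(N, M, N_k)
```

Here the eight word occurrences fall into five groups: three words occur once, one occurs twice, and one occurs three times.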
Baek, Bernhardsson, and Minnhagen define the information in the grouping as:[1]
$$I[P] = \sum_k P(k) \ln\bigl(k\,N(k)\bigr)$$
where $k$ ranges over the number of members in a group, $P(k)$ is the probability that a group has $k$ members, $\ln$ is the natural logarithm, and $N(k)$ is the number of groups with $k$ members.
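As an illustration, the information in a grouping can be evaluated directly from a table of group sizes. This sketch assumes the information has the form I = Σ_k P(k) ln(k N(k)) and that P(k) = N(k) / M, where M is the total number of groups; both are assumptions made for the example, and `information_cost` is a hypothetical helper name.

```python
import math

def information_cost(size_counts):
    """Evaluate I = sum_k P(k) * ln(k * N(k)).

    size_counts maps a group size k to N(k), the number of groups of
    that size. P(k) is taken to be N(k) / M (an assumption).
    """
    M = sum(size_counts.values())
    return sum(
        (n_k / M) * math.log(k * n_k)
        for k, n_k in size_counts.items()
        if n_k > 0
    )

# Hypothetical sample: three groups of size 1, one of size 2, one of size 3.
N_k = {1: 3, 2: 1, 3: 1}
cost = information_cost(N_k)
print(cost)
```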
The resulting distribution is:
$$P(k) = A\,\frac{e^{-bk}}{k}$$
where $A$ and $b$ are constants obtained by solving the Lagrangian equations for the particular values of $N$ and $M$.
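The two constants can be found numerically. The sketch below assumes the form P(k) = A exp(-bk) / k on k = 1, ..., N, with the two constraints that the probabilities sum to 1 and that the mean group size equals N / M. It uses plain bisection on b, which is a choice made for this example rather than the procedure of the cited paper; `fit_rgf` and its parameters are hypothetical names.

```python
import math

def fit_rgf(N, M, b_max=50.0, iterations=200):
    """Return (A, b) for P(k) = A * exp(-b*k) / k on k = 1..N.

    Assumed constraints: sum_k P(k) = 1 and sum_k k * P(k) = N / M.
    Only solutions with 0 <= b <= b_max are searched.
    """
    target = N / M
    ks = range(1, N + 1)

    def mean_size(b):
        # Mean of k under weights exp(-b*k)/k; k * weight is exp(-b*k).
        numerator = sum(math.exp(-b * k) for k in ks)
        denominator = sum(math.exp(-b * k) / k for k in ks)
        return numerator / denominator

    lo, hi = 0.0, b_max
    if not (mean_size(hi) < target <= mean_size(lo)):
        raise ValueError("N / M is not reachable with 0 <= b <= b_max")

    # mean_size decreases as b grows, so bisection converges.
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean_size(mid) > target:
            lo = mid   # mean too large: a stronger exponential cutoff is needed
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    A = 1.0 / sum(math.exp(-b * k) / k for k in ks)
    return A, b

# Hypothetical sample: 1000 individuals in 200 groups (mean size 5).
A, b = fit_rgf(N=1000, M=200)
print(A, b)
```

Bisection is used because the mean group size is a monotonically decreasing function of b, so no derivative information is needed.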
This distribution does not fit all real-world samples. Baek, Bernhardsson, and Minnhagen generalize the definition by allowing some ordering within the grouping. This is realized with a function that computes the discounted entropy as a function of the distribution. In practice, this function does not need to be calculated, because the size of the largest group is sufficient to fit the discounted entropy. The resulting distribution is:
$$P(k) = A\,\frac{e^{-bk}}{k^{\gamma}}$$
with the exponent $\gamma$ determined from the size of the largest group.
Related distributions
The RGF distribution is a maximum entropy distribution. Others include the normal distribution (when the mean and variance are known), the exponential distribution, and the Laplace distribution.
Matt Visser proposed a similar distribution.[2] It is a maximum entropy distribution that yields a power law from a simpler constraint, a fixed mean of the logarithm of the observed quantity: $\langle \ln k \rangle = \chi$.
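To see why a constraint on the mean logarithm produces a power law, the maximization can be worked through with Lagrange multipliers. The derivation below is a sketch under stated assumptions: the support is taken to be $k = 1, 2, \dots$, and the symbols $p_k$, $\lambda$, $z$, and $\chi$ are notation chosen for this example.

```latex
\begin{align}
\mathcal{L} &= -\sum_k p_k \ln p_k
  - \lambda\Bigl(\sum_k p_k - 1\Bigr)
  - z\Bigl(\sum_k p_k \ln k - \chi\Bigr), \\
\frac{\partial \mathcal{L}}{\partial p_k} &= -\ln p_k - 1 - \lambda - z \ln k = 0
  \quad\Longrightarrow\quad p_k = e^{-1-\lambda}\, k^{-z}, \\
\sum_k p_k = 1 &\quad\Longrightarrow\quad p_k = \frac{k^{-z}}{\zeta(z)}, \qquad z > 1 .
\end{align}
```

The multiplier $z$ becomes the power-law exponent, and its value is fixed by requiring $\sum_k p_k \ln k = \chi$.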
Applications
Unlike a power-law distribution, the RGF distribution does not appear as a straight line on a log-log plot. Data presented by Baek, Bernhardsson, and Minnhagen show that the curved RGF distribution matches certain real-world samples better than a straight-line power law.[1]
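The curvature can be checked numerically by comparing slopes on a log-log scale. This sketch assumes the generalized form P(k) proportional to exp(-bk) / k^γ; the parameter values b = 0.01 and γ = 1.5 are arbitrary illustrations, not fitted values, and the functions are left unnormalized because a constant factor does not change the slope.

```python
import math

def loglog_slope(f, k1, k2):
    """Slope of ln f(k) against ln k between k1 and k2."""
    return (math.log(f(k2)) - math.log(f(k1))) / (math.log(k2) - math.log(k1))

def rgf(k, b=0.01, gamma=1.5):
    """Unnormalized RGF-type form exp(-b*k) / k**gamma (hypothetical parameters)."""
    return math.exp(-b * k) / k ** gamma

def power_law(k, gamma=1.5):
    """Unnormalized pure power law k**(-gamma)."""
    return k ** -gamma

intervals = [(1, 2), (10, 20), (100, 200)]
rgf_slopes = [loglog_slope(rgf, k1, k2) for k1, k2 in intervals]
power_slopes = [loglog_slope(power_law, k1, k2) for k1, k2 in intervals]

for (k1, k2), s_rgf, s_pl in zip(intervals, rgf_slopes, power_slopes):
    print(f"k in [{k1}, {k2}]: RGF slope {s_rgf:.3f}, power-law slope {s_pl:.3f}")
```

The power-law slope stays at -γ on every interval, while the slope of the RGF-type form becomes steeper as k grows because of the exponential factor.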
