User:Mill 1/Project Chaining back the Years/Statistics

From Wikipedia, the free encyclopedia

The statistics cover the period 1990-2005 and are organized in separate categories.

General statistics and facts

Information on the dpm's as of 6 November 2024:

  • Total number of entries: 42,765
  • Total number of references: 27,268
  • Overall reference density[1]: 63.76% (27,268/42,765)
  • Total size of combined text (approx.): 10.7 megabytes
  • Which translates to approx. 1,850 pages (A4)[2]
  • Average number of entries per death day: 7.32 (42,765/5,843)
  • Average number of references per death day: 4.666 (27,268/5,843)
  • Average number of entries per dpm: 222.73 (42,765/(12 * 16))
  • Average number of references per dpm: 142.02 (27,268/(12 * 16))
  • Death day with the most entries (74): Deaths in September 2001#11 (details)
  • Dpm with the most entries (310): Deaths in December 2005
  • Dpm with the most references (229): Deaths in December 1995
  • Dpm with highest reference density[1] (82.33%): Deaths in January 1999
  • Minimum reference density regarding all processed death days: 30% (example)
  • Month with the most deaths: December (3,980) (2nd: January (3,901)) (details)
  • Month with the least deaths: June (3,326) (details)
  • Total number of views for all dpm's per year (2023): 846,402 (details)
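The derived figures in the list above can be recomputed from the three totals. The following is a minimal Python sketch, assuming only the numbers stated in the list; the constant and variable names are illustrative and not part of the project's tooling.

```python
# Totals taken from the list above.
ENTRIES = 42_765
REFERENCES = 27_268
DEATH_DAYS = 5_843     # number of death days as stated in the list
DPM_COUNT = 12 * 16    # 12 monthly pages for each of the 16 years

reference_density = REFERENCES / ENTRIES
entries_per_day = ENTRIES / DEATH_DAYS
references_per_day = REFERENCES / DEATH_DAYS
entries_per_dpm = ENTRIES / DPM_COUNT
references_per_dpm = REFERENCES / DPM_COUNT

print(f"Overall reference density: {reference_density:.2%}")
print(f"Entries per death day:     {entries_per_day:.2f}")
print(f"References per death day:  {references_per_day:.2f}")
print(f"Entries per dpm:           {entries_per_dpm:.2f}")
print(f"References per dpm:        {references_per_dpm:.2f}")
```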

Data on entries, references and page sizes

Overview of Excel tables 1990-2005, displaying counts per day of entries and references. Rounds: Baseline and Round 2

This section shows the development of the number of entries, references and page sizes over time.[3] To do that I first had to establish a baseline: the status of the dpm's before I started work on them. As explained, work was done in specific rounds, but not all dpm's were handled in each round (which is shown here). As a consequence, data on Round 1 is skipped. Data regarding Round 2 does not exist for the years 1990-1995; in those cases the baseline counts are used. For the years 1993 and 1994 no dpm's or dpy's existed to use as a baseline, so the corresponding Year-page acted as the baseline.[4]

Charts & tables

The charts in this section visualize the results over time. Beneath the chart a table states its data. To consult the underlying data of a particular year, click on the corresponding title in the table.

Number of entries

As you can see in the chart below, the number of (notable) entries increased considerably over time. In total I doubled the overall number of entries, adding 21,174 to reach 42,765. This takes into account the entries I removed during the course of the project. Strangely, the initial dpm's of the year 1990 contained a lot of entries (and zero references). This was also true for the first five months of 1991. It turned out that a particular Wikipedian, in an unprecedented editing frenzy, had been adding entries to dpm's between January 1990 and May 1993. He really went to town on the first 17 dpm's, adding entries indiscriminately and without references. When reprocessing them I filtered out more than a third of the entries as unworthy, which improved the lists considerably.

 
Chart of the entries per year per round regarding 1990-2005
Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Total
Baseline 4,121 2,513 1,597 247 239 1,697 1,033 907 420 536 885 1,394 958 1,066 1,549 2,429 21,591
After Round 2 4,121 2,513 1,597 247 239 1,697 2,191 2,233 2,143 2,166 2,227 2,789 2,487 2,350 2,139 2,701 33,840
Current 2,556 2,368 2,405 2,470 2,401 2,937 2,508 2,795 2,751 2,796 2,733 2,938 2,764 2,689 2,726 2,928 42,765

Number of references

Citing the date (and cause) of death of the listed dead was not a high priority for the Wikipedians who concerned themselves with the death lists before me. Looking at the baseline for the nineties, excluding 1995 (which was a special case), only 450 references existed for a total of 11,613 entries. This meant an abysmal reference density of 3.9%. The other months didn't fare much better, leading to a total of 4,561 references (for 21,591 entries) over the whole period. So it's no big surprise that Round 2 yielded big rewards when generating NYTimes refs. As explained, not all dpm's were handled in Round 2: only the years 1997-2005 and 8 months of 1996, 116 dpm's in total.[5] Round 3 produced even more (semi)generated references[6]; not only were NYTimes obituary citations created automatically, but a whole bunch stemming from Wikidata as well. In addition I added about 1,500 references to the dpm's individually, semi-automatically or manually.

 
Chart of the references per year per round regarding 1990-2005
Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Total
Baseline 22 3 49 4 13 1,503 5 15 4 335 592 873 29 40 457 617 4,561
After Round 2 22 3 49 4 13 1,503 1,075 898 845 909 860 1,109 1,369 1,187 1,034 917 11,797
Current 1,577 1,408 1,406 1,515 1,429 2,164 1,464 1,737 1,739 1,772 1,729 1,860 1,884 1,798 1,817 1,969 27,268

Generated and semi-generated references

As already explained, the WikipediaReferences tool enabled me to automatically attach NYTimes refs to entries. The 'Generate reference' submenu, however, also enabled me to generate citations of other sources semi-automatically:

 

Screenshot of the references submenu of the WikipediaReferences tool

As a consequence, for some sources it is now difficult to determine how a reference came about: was it generated, or semi-generated via the submenu? To make matters worse, I (and others) had manually added citations for years (using the RefTool) citing the same sources. These too were often indistinguishable from the semi-automated ones. This was particularly true for manually added references citing worldfootball.net, pro-football-reference.com and baseball-reference.com. So it is not possible to accurately count the number of (semi)automated references per source. Even so, I dare to state that between 75% and 80% of all references in the processed dpm's are generated automatically, the vast majority using Wikidata as the source.

Generated references per source per year

In the following table you will find data supporting this claim. The number of references is stated per source. Inevitably some manually added citations will incorrectly be counted as generated. On the other hand, since it was too much work to determine which refs citing The Guardian and The Independent were generated (and there were a lot), I placed them all under 'Manual references'. The same is true for all the semi-automated ESPNcricinfo citations. This should more than compensate for the fact that some manual references are misrepresented in the next table.
Especially during the first 5 years the ratio of generated references was quite high. For 1990, nineteen out of twenty references seem to be generated. Dpm July 1990 even has a ratio of 100% (as of 25 Dec. 2024). The reason is that I created the references from scratch, since my predecessor did not care at all about citations. The percentage seems to drop off in the following years, although it remains significant.

Generated references per source per year
Source 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Total
The New York Times 500 313 245 281 349 489 407 521 460 427 574 603 391 421 345 407 6,733
Bibliothèque nationale de France 178 164 162 227 258 158 235 266 181 163 181 155 145 179 204 223 3,079
snaccooperative.org 225 211 238 180 140 76 116 115 162 175 165 168 152 197 212 227 2,759
olympedia.org 187 180 178 191 138 151 138 143 138 144 150 165 136 131 145 149 2,464
filmportal.de 49 60 54 57 57 31 52 36 45 49 32 0 14 40 50 88 714
worldfootball.net 49 51 34 45 44 27 47 47 44 47 44 56 50 39 37 38 699
pro-football-reference.com 42 34 60 34 44 33 43 39 61 39 44 35 19 8 14 29 578
baseball-reference.com 62 58 48 53 50 28 63 58 41 37 7 3 3 2 1 37 551
Encyclopædia Britannica 33 94 87 79 67 14 59 12 14 8 8 13 19 12 16 15 550
Internet Broadway Database 22 30 47 40 36 8 29 32 24 18 17 9 31 29 33 40 445
Fichier des décès 32 33 32 12 0 1 6 0 15 37 29 38 27 42 33 40 377
Library of Congress 28 23 10 16 14 15 16 19 17 15 11 12 12 13 11 11 243
FemBio 15 7 12 12 15 8 21 6 18 9 9 14 9 3 12 12 182
DB~e 16 19 14 12 9 9 13 7 12 7 10 7 10 14 11 8 178
procyclingstats.com 13 6 7 11 13 12 8 19 13 9 7 14 15 8 5 12 172
basketball-reference.com 11 10 8 13 11 7 12 13 14 7 10 5 12 11 9 15 168
Biografisch Portaal 16 7 12 3 14 12 11 10 5 9 4 9 6 8 5 13 144
hockey-reference.com 18 9 17 7 3 5 11 3 9 14 4 5 4 2 3 11 125
where2golf.com[7] 2 0 2 1 0 1 0 1 0 0 0 0 0 0 2 1 10
Total generated[8] 1,498 1,309 1,267 1,274 1,262 1,085 1,287 1,347 1,273 1,214 1,306 1,311 1,055 1,159 1,148 1,376 20,171
Total manually added[9] 79 99 139 241 167 1,079 177 390 466 558 423 549 829 639 669 593 7,097
Total number of references 1,577 1,408 1,406 1,515 1,429 2,164 1,464 1,737 1,739 1,772 1,729 1,860 1,884 1,798 1,817 1,969 27,268
% of generated references 95.0% 93.0% 90.1% 84.1% 88.3% 50.1% 87.9% 77.5% 73.2% 68.5% 75.5% 70.5% 56.0% 64.5% 63.2% 69.9% 74.0%

NYTimes obituaries
The use of the references tool starting in Round 2 led to a big increase in NYTimes obituary citations. In a single edit many references in a dpm were added and updated. For instance, take a look at this edit regarding Deaths in December 2001: afterwards the tool had added 29 NYTimes citations and had replaced 14 existing references with the more reliable and future-proof NYTimes obituaries. Also, the edit caused a net size increase of 10,934 bytes for this particular dpm alone. All dpm's benefited this way. You can find a list of the dpm's whose size increased most because of it here. Regarding replaced citations, the year 1995 won. Check out the specifics here.

Reference density

The reference density is the number of references divided by the number of entries.[1] So it is no surprise that this ratio was very small for the nineties (excluding 1995), since hardly any citing was done by my predecessors. What does stand out is that after Round 3 the reference densities per year are quite similar. This is largely the result of the occurrence of reference data in Wikidata that was automatically attached to entries when processing a dpm (see previous paragraph). Apparently the existence of this data is distributed evenly over the period 1990-2005. I also added references manually when a processed death date fell short of my minimum standards regarding reference density: the ref. density should be at least 30%, with a minimum of two references per day. You will not find a day section with ten entries (or more) and only two references anywhere.
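The minimum standard described above can be expressed as a small check. This is a sketch under assumptions: the function name and its parameters are hypothetical, and the text does not say how day sections with fewer than two entries were treated, so the rule is applied literally here.

```python
def meets_minimum_standard(entries: int, references: int,
                           min_density_pct: int = 30,
                           min_references: int = 2) -> bool:
    """Check a day section against the stated minimum standard.

    Both conditions must hold: at least `min_references` references and a
    reference density (references / entries) of at least `min_density_pct` %.
    """
    if entries <= 0:
        raise ValueError("a day section needs at least one entry")
    # Integer arithmetic avoids floating-point rounding at exactly 30%.
    dense_enough = references * 100 >= entries * min_density_pct
    return references >= min_references and dense_enough


# Ten entries with two references is only 20%, so it fails the check.
print(meets_minimum_standard(10, 2))  # False
print(meets_minimum_standard(10, 3))  # True
```

Two references satisfy the density requirement for up to six entries (2/6 is about 33%); from seven entries onward more references are needed.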

 
Chart of reference density per year per round regarding 1990-2005
Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Total
Baseline 0.5% 0.1% 3.1% 1.6% 5.4% 88.6% 0.5% 1.7% 1.0% 62.5% 66.9% 62.6% 3.0% 3.8% 29.5% 25.4% 22.3%
After Round 2 0.5% 0.1% 3.1% 1.6% 5.4% 88.6% 49.1% 40.2% 39.4% 42.0% 38.6% 39.8% 55.0% 50.5% 48.3% 34.0% 33.5%
Current 61.7% 59.5% 58.5% 61.3% 59.5% 73.7% 58.4% 62.1% 63.2% 63.4% 63.3% 63.3% 68.2% 66.9% 66.7% 67.2% 63.5%
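The 'Total' column above (63.5% for the current state) differs slightly from the overall reference density of 63.76% given in the general statistics. One possible explanation, which is an assumption based on the numbers and is not stated anywhere on this page, is that the column holds the unweighted mean of the yearly densities, while the overall figure pools all references and entries. A short Python sketch comparing the two, using the 'Current' rows of the entries and references tables:

```python
# "Current" rows of the entries and references tables, 1990-2005.
entries = [2556, 2368, 2405, 2470, 2401, 2937, 2508, 2795,
           2751, 2796, 2733, 2938, 2764, 2689, 2726, 2928]
references = [1577, 1408, 1406, 1515, 1429, 2164, 1464, 1737,
              1739, 1772, 1729, 1860, 1884, 1798, 1817, 1969]

# Density per year, as a percentage.
yearly_density = [100 * r / e for r, e in zip(references, entries)]

# Pooled density: all references divided by all entries.
pooled = 100 * sum(references) / sum(entries)

# Unweighted mean: every year counts equally, regardless of its size.
mean_of_years = sum(yearly_density) / len(yearly_density)

print(f"Pooled density:           {pooled:.2f}%")
print(f"Mean of yearly densities: {mean_of_years:.1f}%")
```

The two measures diverge because years with more entries weigh more heavily in the pooled ratio than in the plain average.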

Article sizes

The increase in the size of the dpm articles perhaps is the area where most progress was realised. About half of the content is currently represented by (generated) citations.

 
Chart of article sizes (in bytes) per year per round regarding 1990-2005
Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Total
Baseline 278,197 183,945 139,458 22,176 21,744 510,480 77,603 68,818 33,106 112,690 216,886 334,912 73,896 90,287 276,163 385,407 2,825,768
After Round 2 278,197 183,945 139,458 22,176 21,744 510,480 463,766 438,279 411,236 428,967 428,277 548,770 616,674 560,766 494,002 513,721 6,060,458
Current 605,208 546,099 553,140 588,934 569,257 787,423 585,610 683,271 679,157 689,906 679,035 761,645 749,906 721,308 723,522 778,297 10,701,718

Pageviews

The total number of views for all dpm's in 2023 was 846,402. This translates to 4,408 views per page per year, 367 views per dpm per month and about 12 views per dpm daily.
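The averages above follow from dividing the yearly total over the 192 dpm's. A minimal Python sketch of that arithmetic, assuming a 365-day year for the daily figure (the text does not state which divisor was used); the names are illustrative:

```python
TOTAL_VIEWS_2023 = 846_402
DPM_COUNT = 12 * 16  # 12 monthly pages for each of the 16 years

views_per_dpm_per_year = TOTAL_VIEWS_2023 / DPM_COUNT
views_per_dpm_per_month = views_per_dpm_per_year / 12
views_per_dpm_per_day = views_per_dpm_per_year / 365  # assumption: 365 days

print(f"Per dpm per year:  {views_per_dpm_per_year:,.0f}")
print(f"Per dpm per month: {views_per_dpm_per_month:,.0f}")
print(f"Per dpm per day:   {views_per_dpm_per_day:,.0f}")
```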

Number of pageviews per dpm

The table below shows the number of pageviews per dpm. The nineties seem to be less popular than the 2000s. Also, the month of January is viewed considerably more than the other months for some reason (a new year?). The first months of 1990 and 2000[10] especially generate more interest. This is also true for the last month of the millennium.

Spikes

For different reasons other dpm's also have significantly more pageviews:

Number of pageviews per month in 2023. Total 846,402:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1990 8,889 3,594 3,240 3,019 3,371 3,193 2,657 2,569 2,459 2,484 2,272 2,814
1991 4,134 2,603 2,544 2,603 2,511 2,672 2,473 2,408 2,598 2,342 3,303 3,843
1992 4,175 2,585 2,538 2,594 2,441 2,362 2,319 2,355 2,389 2,451 2,393 2,780
1993 5,241 3,122 3,311 3,401 2,862 3,139 3,270 3,157 2,905 3,115 2,781 3,389
1994 5,328 3,040 3,262 4,602 3,745 3,064 2,990 2,519 2,771 2,700 3,147 3,046
1995 4,844 3,129 4,113 3,206 3,032 3,249 3,139 3,258 2,981 3,114 2,942 3,705
1996 4,740 3,337 2,941 2,717 2,941 3,014 3,061 2,930 4,464 3,124 2,873 3,137
1997 5,166 3,191 3,946 3,487 3,536 3,343 3,671 5,764 4,304 3,620 3,338 4,099
1998 5,406 3,573 3,491 3,718 4,149 4,442 3,727 3,969 3,578 3,463 3,382 4,068
1999 6,265 3,896 3,988 4,791 4,482 4,183 4,456 3,912 4,053 4,368 4,371 9,567
2000 11,090 5,232 4,070 4,032 4,235 3,997 4,092 3,771 3,988 4,032 3,781 5,075
2001 6,296 4,835 4,708 3,960 3,955 4,660 4,032 5,036 24,739 4,172 4,575 4,441
2002 5,973 4,828 4,540 4,848 4,342 4,022 4,073 4,293 4,178 4,208 4,152 4,689
2003 8,173 5,663 5,233 5,233 4,890 5,176 5,152 4,898 5,555 4,975 4,898 4,757
2004 6,652 4,522 4,605 4,460 4,625 5,037 4,635 5,004 4,671 5,377 4,900 57,865
2005 7,512 5,033 5,219 5,014 5,183 5,056 4,830 5,208 5,282 5,547 4,998 4,976
Total 99,884 62,183 61,749 61,685 60,300 60,609 58,577 61,051 80,915 59,092 58,106 122,251

Wikipedia edit count

Although I have recovered from editcountitis, I still wanted to estimate the number of edits I spent on this project. The number of edits is calculated by analyzing the edits of the involved articles. Edits on project documentation are excluded, as are sandbox and Talk Page changes.

Edits on dpm's, dpy's and Year-pages

Work on the actual listing pages was divided between the dpm's, dpy's and the Year-pages like 2000. I used the Xtools edit counter to retrieve the data regarding the dpy's and Year-pages. The number of edits on Deaths in 1999 especially stood out.

Regarding the dpm's I decided to use extrapolation to calculate an approximation. I would select the two most 'average'-edited months and, based on those, calculate the total for all months. A causal relationship exists between the number of entries and the number of edits. So, looking at the total number of entries per month of the year, I chose October and November. They are closest to the average of 3,563.75 entries per month of the year. The next table shows per year the edit count results for the three types of lists:

Year Year-pages[26] dpy's dpm's (Oct) dpm's (Nov) dpm's (year)[27]
1990 0 2 68[28] 58[29] 756
1991 0 2 72 70 852
1992 0 2 73 87 960
1993 9 2 5 5 60
1994 12[30] 2 18 17 210
1995 1 2 75 77 912
1996 3 57 57 52 654
1997 17 93 75 67 852
1998 5 30 79 80 954
1999 3 631[31] 93 76 1,014
2000 6 26 102 146 1,488
2001 4 48 128 119 1,482
2002 15 255 108 99 1,242
2003 2 202 93 73 996
2004 5 2 82 84 996
2005 5 2 112 106 1,308
Total 87 1,358 - - 14,736
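The yearly dpm figures in the table are consistent with adding the October and November counts and multiplying by six, i.e. scaling a two-month sample to twelve months. A minimal Python sketch of that extrapolation, with a hypothetical function name, using the sampled counts from the table:

```python
def extrapolate_year(october_edits: int, november_edits: int) -> int:
    """Scale the edits of two sampled months to a full year of twelve."""
    return (october_edits + november_edits) * 6


# (October, November) edit counts per year, 1990-2005, from the table.
sampled_edits = {
    1990: (68, 58), 1991: (72, 70), 1992: (73, 87), 1993: (5, 5),
    1994: (18, 17), 1995: (75, 77), 1996: (57, 52), 1997: (75, 67),
    1998: (79, 80), 1999: (93, 76), 2000: (102, 146), 2001: (128, 119),
    2002: (108, 99), 2003: (93, 73), 2004: (82, 84), 2005: (112, 106),
}

estimates = {year: extrapolate_year(october, november)
             for year, (october, november) in sampled_edits.items()}
total_dpm_edits = sum(estimates.values())

print(estimates[1990])   # 756
print(total_dpm_edits)   # 14736
```

The estimate is only as good as the assumption that October and November are representative of the other ten months.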

Processing pages

As Mill 1 I used two processing pages to compile dpm's whose content was copied to the actual article when a month was done.

User:Mill 1/Months/December
I set up this page exclusively for creating dpm content. Therefore all edits in this page can be attributed to the project. Total edits: 5,078[32]

User:Mill 1/tmp
Although I used this page to test my software and other stuff, I mainly used it for the project. Since 5 November 2022 I have used it for other purposes, so those edits are left out. Total count: 494

Category edits

Edits to new categories added up to a whopping total of 12:

Other page types

During the project I also needed to edit other page types like templates whose counts are stated in the next section.

Edit count summary

All edit counts add up to a total of 21,900:

Page type Count
Dpm edits 14,736
Dpy edits 1,358
Year-page edits 87
User:Mill 1/Months/December 5,078
User:Mill 1/tmp 494
Lists of deaths by year[33] 17
Template edits[34] 24
Category edits 12
Sonictonic[35] 94
Total 21,900

Top 50 most edited pages

Edits Page title Assessment
631 Deaths in 1999 Redirect
255 Deaths in 2002 Redirect
202 Deaths in 2003 Redirect
153 Deaths in July 2000 List
146 Deaths in November 2000 List
146 Deaths in August 2001 List
136 Deaths in April 2001 List
136 Deaths in July 2001 List
136 Deaths in March 2001 List
135 Deaths in January 2002 List
132 Deaths in April 2000 List
131 Deaths in June 2001 List
129 Deaths in January 1995 List
128 Deaths in October 2001 List
126 Deaths in December 2001 List
125 Deaths in May 2001 List
124 Deaths in September 2001 List
120 Deaths in January 2000 List
120 Deaths in February 2000 List
119 Deaths in September 2000 List
119 Deaths in November 2001 List
118 Deaths in December 2004 List
117 Deaths in June 2000 List
117 Deaths in September 2003 List
115 Deaths in December 2005 List
114 Deaths in August 2000 List
113 Deaths in May 2005 List
112 Deaths in October 2005 List
112 Deaths in September 2005 List
112 Deaths in January 2001 List
112 Deaths in February 2001 List
112 Deaths in June 2004 List
111 Deaths in March 2000 List
111 Deaths in December 2000 List
111 Deaths in August 2005 List
109 Deaths in May 2003 List
108 Deaths in October 2002 List
106 Deaths in April 2003 List
106 Deaths in November 2005 List
105 Deaths in May 2000 List
105 Deaths in April 2005 List
105 Deaths in May 2002 List
104 Deaths in September 2002 List
103 Deaths in August 2003 List
103 Deaths in August 2004 List
103 Deaths in April 2004 List
102 Deaths in June 2002 List
102 Deaths in October 2000 List
102 Deaths in March 2005 List
100 Deaths in July 2003 List

Miscellaneous data

Death dates with the highest number of notable entries

Total number of entries per month of dpm

The next table shows the combined number of notable entries for all dpm's per month of the year. For example, a total of 3,901 deaths are listed for January across the dpm's of 1990-2005.

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Total
3,901 3,512 3,665 3,462 3,500 3,326 3,387 3,453 3,464 3,540 3,575 3,980 42,765

Average number of entries per month: 3,563.75.
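The average, and the choice of October and November as the most 'average' months for the edit count extrapolation, can be reproduced from the monthly totals. A minimal Python sketch, assuming 'closest to the average' means the smallest absolute difference; the variable names are illustrative:

```python
# Combined number of entries per month of the year, 1990-2005.
entries_per_month = {
    "Jan": 3901, "Feb": 3512, "Mar": 3665, "Apr": 3462,
    "May": 3500, "Jun": 3326, "Jul": 3387, "Aug": 3453,
    "Sep": 3464, "Oct": 3540, "Nov": 3575, "Dec": 3980,
}

average = sum(entries_per_month.values()) / len(entries_per_month)

# Rank the months by their absolute distance from the average.
by_distance = sorted(entries_per_month,
                     key=lambda month: abs(entries_per_month[month] - average))
two_closest = by_distance[:2]

print(average)       # 3563.75
print(two_closest)   # ['Nov', 'Oct']
```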

Top 10 largest revisions added/updated NYTimes references (in bytes)

Thanks to my strict edit summaries I was able to compile this list. The full list can be found here.

Number of added/updated NYTimes references regarding 1995 dpm's

Many generated New York Times obituaries were automatically attached to entries regarding 1995. The reason was that the dpm's contained (too) many entries, and many of those entries had references. This is reflected in the relatively limited increase in size (in bytes): the vast majority of the edits were reference replacements, which add less content than new citations; only a few NYTimes refs were added regarding 1995. Even so, the improvements were substantial. Check out January 4th for instance, where five citations were replaced[37]!

References
