User:Mill 1/Project Finding The Forgotten Few

From Wikipedia, the free encyclopedia

Screenshot of console application WikipediaBiographyCreator after processing month July 1992

TL;DR

I created a software application that cross-references obituary archives from The New York Times and The Guardian, identifies individuals whose deaths are covered by both, and checks whether they already have a biography article on Wikipedia. Reasoning: obituaries are great sources for bios, and if someone's obituary is published in both newspapers, then by definition that person warrants an article here.
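The cross-referencing step can be illustrated with a minimal Python sketch. It is not the code of the actual application: the function names `normalize_name` and `find_overlap` are hypothetical, and it assumes the name of the deceased has already been extracted from each obituary headline.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Reduce a name to a comparison key: no accents, no punctuation, lowercase."""
    # Decompose accented characters and drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace everything that is not a letter or whitespace with a space.
    letters = re.sub(r"[^a-z\s]", " ", unaccented.lower())
    # Collapse runs of whitespace into single spaces.
    return " ".join(letters.split())


def find_overlap(nyt_names, guardian_names):
    """Return the names from the first list that also occur in the second list."""
    guardian_keys = {normalize_name(name) for name in guardian_names}
    return [name for name in nyt_names if normalize_name(name) in guardian_keys]
```

Exact matching on a normalized key is a deliberately simple choice. It misses spelling variants and middle initials that appear in only one newspaper, so a real implementation would probably need fuzzier matching plus a manual check.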

The application, called 'Wikipedia Biography Creator', works great (source code). Given a specific month (e.g. January 2001), it identifies potential candidates who meet the criteria (an obituary in both newspapers but no biography yet). After manually checking that a candidate indeed has no article here, I proceed to create the actual draft for the new biography article.
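One way to automate the "no biography yet" check is the MediaWiki Action API. The sketch below is an assumption about how such a check could look, not a description of the application itself; `build_exists_query` and `article_exists` are hypothetical helper names. It builds a query URL and interprets a response that has already been fetched and decoded from JSON.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_exists_query(title: str) -> str:
    """Build an Action API URL that looks up a single page title."""
    params = {
        "action": "query",
        "titles": title,
        "redirects": 1,        # follow redirects to their target page
        "format": "json",
        "formatversion": 2,    # pages come back as a list; missing pages carry "missing": true
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def article_exists(response: dict) -> bool:
    """Return True if any page in a decoded formatversion=2 response exists."""
    pages = response.get("query", {}).get("pages", [])
    return any(not page.get("missing", False) for page in pages)
```

Because redirects are followed, a title that merely redirects elsewhere is reported as existing. That is one reason the manual check remains necessary.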

The application spotted 41 candidates; I created 34 biographies based on that. A downside was that the news archive of The Guardian only starts in January 1999. I also noticed that candidates were more abundant in the earlier years. During my work here it had already struck me that the online newspaper The Independent had a lot of (early) obituaries. However, it does not have a news archive (yet). Using some ICT techniques I managed to create an 'obituary archive' for The Independent myself, going back to July 1992. I used that source in my application to look for earlier candidates, which resulted in eight more articles.

Using ChatGPT

After creating dozens of biographies manually I decided to investigate whether I could get an LLM to do the writing for me. Obviously I checked what Wikipedia had to say about it (later). I learned that although an AI chatbot should be used with great caution and its output rigorously scrutinized, using one was not forbidden (as of September 2025).

After much experimenting I thought I had defined an AI prompt that would serve my purpose. It would set up the draft in wiki markup (maximum of 300 words) using the URLs of the two newspaper obituaries as sources. Initial checks of the output generated by ChatGPT looked good. So after applying some corrections and additions I decided to publish my first bio: Jacques Fauvet. Subsequently I created eight more articles this way. During that time I realized that AI hallucinations were present in every draft and were hard to spot because they read so credibly. Finding them cost me a lot of time, and I started thinking that it would be faster to return to manual writing. It was about that time that I was alerted by Wikipedian NicheSports that the extent of the hallucinations was even larger than I had realised.
While investigating what went wrong, I did some further research on how to prevent these issues. New trials indicated that the following approach eliminates hallucinations when writing draft articles using ChatGPT:

  • In the ChatGPT personalisation settings turn off 'Memory' and 'Web Search'
  • Delete the previous chat
  • Create a new chat. Use the following prompt template:

Per article the prompt differs in two ways:
1. The two provided texts containing the source citations
2. The maximum number of words (based on the combined length of the obituaries, implying importance)

You are a Wikipedia editor. Write a neutral biography in encyclopedic tone in valid English Wikitext. 
Important: only use the information provided in following two texts to answer the question.

Name of biography article: Mary Bodne

TEXT 1 START
url: https://www.theguardian.com/news/2000/mar/20/guardianobituaries
Source: The Guardian
Author: Mark Krupnick
Date: 20 Mar 2000 02.45 CET
Obituary:
Title=Mary Bodne
Body=
[Text body of the obituary]
TEXT 1 END

TEXT 2 START
url: https://www.nytimes.com/2000/03/02/nyregion/mary-bodne-ex-owner-of-algonquin-hotel-dies-at-93.html
Source: The New York Times
Author: Douglas Martin
Date: March 2, 2000
Obituary:
Title=Mary Bodne, Ex-Owner of Algonquin Hotel, Dies at 93
Body=
[Text body of the obituary]
TEXT 2 END

Use and merge content from these texts paraphrased (no copying wording!).
Every (set of) statements should end with a reference to the source.
Two sources exist: The Guardian (<ref name="guardian">) and the NYTimes (<ref name="nyt">), both {{cite news}}, access-date=[today, d mmmm yyyy].

Other requirements:
- About 250 words maximum.
- Use an Infobox when appropriate.
- If Infobox: if possible use {{death date and age|...}}, date format: follow opening sentence. d mmmm yyyy means {{death date and age|...|df=yes}}.
- Add appropriate subsections, e.g.: == Early life == and == Later life and death ==.
- Add appropriate internal wiki links (e.g., [[Nigeria]], [[Nancy, France|Nancy]], ''[[Le Monde]]'').
- References section with == References == and {{reflist}}.
- Add property url-access=subscription regarding the NYTimes reference.
- Add {{Short description|[short description]}} at the top.
- Add appropriate Wikipedia categories at the end.
- Between the references and categories add the following two:
{{Authority control}}

{{DEFAULTSORT:LASTNAME(S), FIRSTNAME(S)}}

- Output only valid Wiki markup for a new article draft.

And again: you should ONLY respond to my question, given information provided in the two texts.

The actual prompt that created the draft for Mary Bodne can be found here. Copy the output generated by the LLM into a draft page to start the verification process.
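The parts of the prompt that change per article can be filled in by a small script. The following Python sketch is illustrative only: `Obituary`, `format_text_block`, `suggest_max_words` and `build_variable_part` are hypothetical names, and the rule that turns the combined obituary length into a word limit is an assumption, since the exact rule is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Obituary:
    url: str
    source: str
    author: str
    date: str
    title: str
    body: str


def format_text_block(index: int, obit: Obituary) -> str:
    """Render one obituary in the TEXT n START ... TEXT n END layout of the template."""
    return (
        f"TEXT {index} START\n"
        f"url: {obit.url}\n"
        f"Source: {obit.source}\n"
        f"Author: {obit.author}\n"
        f"Date: {obit.date}\n"
        "Obituary:\n"
        f"Title={obit.title}\n"
        "Body=\n"
        f"{obit.body}\n"
        f"TEXT {index} END"
    )


def suggest_max_words(first: Obituary, second: Obituary) -> int:
    """ASSUMED rule: a fifth of the combined length, in steps of 50, between 100 and 300."""
    combined = len(first.body.split()) + len(second.body.split())
    rounded = (combined // 5) // 50 * 50
    return max(100, min(300, rounded))


def build_variable_part(name: str, first: Obituary, second: Obituary) -> str:
    """Return the article name, both text blocks and the word-limit requirement."""
    limit = suggest_max_words(first, second)
    return "\n\n".join([
        f"Name of biography article: {name}",
        format_text_block(1, first),
        format_text_block(2, second),
        f"- About {limit} words maximum.",
    ])
```

The fixed instructions of the template (role, sourcing rules, formatting requirements) would be placed around this output unchanged.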

Before publication

  • Start by reading the two sources carefully.
  • Check every statement in the output for accuracy, puffery and existence in the source text(s).
  • Check the first and last lines of the output particularly for LLM editorialization.
  • Combine duplicate citations (e.g. multiple consecutive statements citing the same source).
  • Add missing internal wiki links.
  • Correct non-existent categories, if any.
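Two of these checks lend themselves to a little tooling. The sketch below, in Python, is a hypothetical helper rather than part of the described workflow: it lists the categories used in a draft so each can be looked up, and it reports paragraphs in which the same named reference is cited twice in a row. It assumes references use the `<ref name="...">` form requested in the prompt.

```python
import re

# Matches [[Category:Name]] and [[Category:Name|sort key]]; captures the name.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE)
# Matches <ref name="x">, <ref name="x"/> and <ref name="x" />; captures the name.
REF_RE = re.compile(r'<ref\s+name="([^"]+)"\s*/?>')


def list_categories(wikitext: str) -> list:
    """Return the category names used in the draft, in order of appearance."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]


def consecutive_duplicate_refs(wikitext: str) -> list:
    """Return (paragraph number, ref name) for each ref repeated back to back."""
    hits = []
    for number, paragraph in enumerate(wikitext.split("\n\n"), start=1):
        names = REF_RE.findall(paragraph)
        for previous, current in zip(names, names[1:]):
            if previous == current:
                hits.append((number, current))
    return hits
```

Neither function judges accuracy. Comparing every statement with the source texts remains a manual task.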

After publication

  • Create the corresponding Talk page.
  • Add wiki links pointing to our newly created article (surname!).
  • Update the Results section in this page.

Inception

Some online news media have created APIs that let computer applications request information from them. Two of these give free, public access to a news archive: the NYTimes API and The Guardian Open Platform. I used the NYTimes API extensively in a previous private project, applying it to generate close to 7,000 citations in numerous list pages.

The NYTimes News Archive dates back to 1851. The news archive of the Guardian only starts in January 1999. That's why I used that month as a starting point to find candidates.
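As an illustration of how one month could be requested from each archive, the Python sketch below builds the request URLs. It only constructs URLs and sends nothing. The endpoint shapes follow the public documentation of the NYTimes Archive API and the Guardian content search as far as I know them; filtering the Guardian results with the tag `tone/obituaries` is an assumption and may differ from what the application does. The function names are hypothetical, and both services require a personal API key.

```python
import calendar
from urllib.parse import urlencode


def nyt_archive_url(year: int, month: int, api_key: str) -> str:
    """URL for all NYTimes archive items of one month (month without leading zero)."""
    query = urlencode({"api-key": api_key})
    return f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json?{query}"


def guardian_search_url(year: int, month: int, api_key: str, page: int = 1) -> str:
    """URL for one result page of Guardian items published in the given month."""
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "tag": "tone/obituaries",  # ASSUMPTION: tag used to select obituaries
        "from-date": f"{year:04d}-{month:02d}-01",
        "to-date": f"{year:04d}-{month:02d}-{last_day:02d}",
        "page-size": 50,
        "page": page,
        "api-key": api_key,
    }
    return "https://content.guardianapis.com/search?" + urlencode(params)
```

The Guardian search is paginated, so a full month may take several requests with increasing `page` values.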

TODO

Results: NYTimes-The Guardian

I processed the months January 1999 to May 2025 to discover deceased who meet the criteria of this project. The red links and new redirects I leave for someone else to create.

ID | Month of Death | Deceased | Ref. The Guardian | Ref. NYTimes | Remarks
1 | March 1999 | Vera Tolstoy | reference | reference | United States Corrected
2 | April 1999 | Geoffrey Wigoder | reference | reference | United Kingdom Corrected
3 | June 1999 | Angus MacDonald (piper) | reference | reference | United Kingdom Corrected
4 | July 1999 | Joe Hyman | reference | reference | United Kingdom Corrected
5 | August 1999 | Carlos Cachaça | reference | reference | Brazil Corrected
6 | August 1999 | David Maurice Graham | reference | reference | United Kingdom Corrected
7 | October 1999 | Ranjabati Sircar | reference | reference | India Corrected
8 | February 2000 | Mary Bodne | reference | reference | United States Corrected
9 | September 2000 | Donald Gallup | reference | reference | United States
10 | November 2000 | Frederick S. Clarke | reference | reference | United States
11 | January 2001 | Charles Mérieux[1] | reference | reference | France Was a redirect; no article count for me :(
12 | March 2001 | Charly Baumann | reference | reference | Germany
13 | April 2001 | Jérôme Lindon[2] | reference | reference | France
14 | June 2002 | Jacques Fauvet | reference | reference | France Corrected. First article to be published.
15 | July 2002 | Constantine Leventis | reference | reference | Greece Uncle
16 | November 2002 | Lynda Van Devanter | reference | reference | United States Keep as redirect links
17 | December 2002 | Orlando Villas-Bôas | reference | reference | Brazil wlh Brothers
18 | February 2003 | Ted Perry | reference | reference | United Kingdom wlh AfD 2006 logs
19 | May 2003 | Sue Sally Hale | reference | reference | United States
20 | August 2003 | Charles S. Rhyne | reference | reference | United States not mentioned! search pic
21 | August 2003 | David Webster (broadcasting executive) | reference | reference | United Kingdom google
22 | August 2003 | Nina Fonaroff | reference | reference | United States mother also misspelled
23 | November 2003 | Cyla Wiesenthal | reference | reference | Poland husband
24 | April 2004 | John B. Evans | reference | reference | United Kingdom ha!
25 | November 2004 | Harry Fleischman | reference | reference | United States links
26 | November 2004 | Martin M. Kaplan | reference | reference | United States
27 | March 2005 | George Scott (singer) | reference | reference | United States Was a redirect; no article count for me :(
28 | May 2005 | Chung Se-yung[3] | reference | reference | South Korea Brother wikidata issue
29 | April 2007 | Paul Leventhal | reference | reference | United States search log
30 | June 2008 | Trevor Wilkinson (engineer) | reference | reference | United Kingdom :) search wikidata! ref3
31 | August 2010 | Reginald Levy | reference | reference | United Kingdom Keep as redirect
32 | March 2012 | Jimmy Ellis (singer) | reference | reference | United States Keep as redirect
33 | November 2012 | Derek Hutchinson | reference | reference | United Kingdom Sea_kayak
34 | August 2014 | Jeremiah Healy | reference | reference | United States books
35 | September 2016 | Philip Kingsley | reference | reference | United Kingdom Trichology search
36 | July 2017 | Jon Underwood | reference | reference | United Kingdom Death Cafe Keep as new redirect
37 | January 2019 | Patricia Lousada | reference | reference | United States search
38 | May 2019 | Jake Black (musician) | reference | reference | United Kingdom Keep as redirect
39 | July 2021 | Paul Huntley | reference | reference | United Kingdom
40 | October 2022 | Theo Richmond | reference | reference | United Kingdom Konin prize book review
41 | June 2023 | Paul Ickovic | reference | reference | United Kingdom WP:O nytimes link

Results: NYTimes-The Independent

I processed the months July 1992 to December 2009 to discover deceased who meet the criteria of this project. The red links and new redirects I leave for someone else to create.


About this project

Every year about 50 million people die. Of those, roughly 10,000 end up with a biography on the English wiki. For the 2010s I found eight individuals who met the criteria of this project. I think this says something about the maturity of Wikipedia. It seems I am truly mopping up the last forgotten notable few.

References
