User:Mill 1/Project Finding The Forgotten Few

From Wikipedia, the free encyclopedia

TL;DR

I created a software application that cross-references the obituary archives of The New York Times and The Guardian, identifies people whose deaths are covered in both, and checks whether they already have a biography article on Wikipedia. Reasoning: obituaries are great sources for bios, and if someone's obituary is published in both newspapers then by definition that person warrants an article here.

The application, called 'Wikipedia Biography Creator', works great (source code). Given a specific month (e.g. January 2001), it identifies potential candidates who meet the criteria (an obituary in both newspapers but no bio yet). After manually checking that an article on the candidate indeed does not exist here, I proceed to create the draft for the new bio article.
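The matching step described above can be illustrated with a minimal sketch. It is not the actual source code of the application: the function names, the accent-stripping normalisation and the placeholder names other than Mary Bodne are assumptions made for illustration only.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase a name and strip accents and punctuation for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(re.findall(r"[a-z]+", without_accents.lower()))


def find_candidates(nyt_names, guardian_names, existing_titles):
    """Return names found in both obituary lists that have no article yet."""
    guardian_keys = {normalize_name(n) for n in guardian_names}
    existing_keys = {normalize_name(t) for t in existing_titles}
    candidates = []
    for name in nyt_names:
        key = normalize_name(name)
        if key in guardian_keys and key not in existing_keys:
            candidates.append(name)
    return candidates


# Hypothetical input: only "Mary Bodne" comes from this page,
# the other names are placeholders.
nyt = ["Mary Bodne", "José Example", "Sam Placeholder"]
guardian = ["Mary Bodne", "Jose Example"]
existing = ["José Example"]
print(find_candidates(nyt, guardian, existing))
```

Exact matching on normalised names will miss spelling variants and middle initials, which is one reason the manual check remains necessary.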

The application spotted 41 candidates; I created 34 biographies based on that. A downside was that the news archive of The Guardian only starts in January 1999. I also noticed that candidates were more abundant in the earlier years. During my work here it had already struck me that the online newspaper The Independent had a lot of (early) obituaries. However, it does not have a news archive (yet). Using some ICT techniques I managed to create an 'obituary archive' for The Independent myself, going back to July 1992. I used that source in my application to look for earlier candidates, which resulted in eight more articles.

Using ChatGPT

After creating dozens of biographies manually I decided to investigate whether I could get an LLM to do the writing for me. Obviously I checked what Wikipedia had to say about it (later). I learned that although an AI chatbot should be used with great caution and its output should be rigorously scrutinized, using one was not forbidden (September 2025).

After much experimenting I thought I had defined an AI prompt that would serve my purpose. It would set up the draft in wiki markup (maximum of 300 words) using the URLs of the two newspaper obituaries as sources. Initial checks of the output generated by ChatGPT looked good. So after applying some corrections and additions I decided to publish my first bio: Jacques Fauvet. Subsequently I created eight more articles this way. During that time I realized that AI hallucinations were present in every draft and were hard to spot because they looked credible. Finding them cost me a lot of time and I started thinking that it would be faster to return to manual writing. It was about that time that I was alerted by Wikipedian NicheSports that the extent of the hallucinations was even larger than I had realized.
While investigating what went wrong I did some further research on how to prevent these issues. New trials indicated that the following approach eliminates hallucinations when writing draft articles using ChatGPT:

In the ChatGPT personalisation settings turn off 'Memory' and 'Web Search'
Delete the previous chat
Create a new chat. Use the following prompt template:

Per article, the prompt differs in two ways:
1. The two provided texts containing the source obituaries
2. The maximum number of words (based on the combined length of the obituaries, which I take as an indication of importance)

You are a Wikipedia editor. Write a neutral biography in encyclopedic tone in valid English Wikitext. 
Important: only use the information provided in following two texts to answer the question.

Name of biography article: Mary Bodne

TEXT 1 START
url: https://www.theguardian.com/news/2000/mar/20/guardianobituaries
Source: The Guardian
Author: Mark Krupnick
Date: 20 Mar 2000 02.45 CET
Obituary:
Title=Mary Bodne
Body=
[Text body of the obituary]
TEXT 1 END

TEXT 2 START
url: https://www.nytimes.com/2000/03/02/nyregion/mary-bodne-ex-owner-of-algonquin-hotel-dies-at-93.html
Source: The New York Times
Author: Douglas Martin
Date: March 2, 2000
Obituary:
Title=Mary Bodne, Ex-Owner of Algonquin Hotel, Dies at 93
Body=
[Text body of the obituary]
TEXT 2 END

Use and merge content from these texts paraphrased (no copying wording!).
Every (set of) statements should end with a reference to the source.
Two sources exist: The Guardian (<ref name="guardian">) and the NYTimes (<ref name="nyt">), both {{cite news}}, access-date=[today, d mmmm yyyy].

Other requirements:
- About 250 words maximum.
- Use an Infobox when appropriate.
- If Infobox: if possible use {{death date and age|...}}, date format: follow opening sentence. d mmmm yyyy means {{death date and age|...|df=yes}}.
- Add appropriate subsections, e.g.: == Early life == and == Later life and death ==.
- Add appropriate internal wiki links (e.g., [[Nigeria]], [[Nancy, France|Nancy]], ''[[Le Monde]]'').
- References section with == References == and {{reflist}}.
- Add property url-access=subscription regarding the NYTimes reference.
- Add {{Short description|[short description]}} at the top.
- Add appropriate Wikipedia categories at the end.
- Between the references and categories add the following two:
{{Authority control}}

{{DEFAULTSORT:LASTNAME(S), FIRSTNAME(S)}}

- Output only valid Wiki markup for a new article draft.

And again: you should ONLY respond to my question, given information provided in the two texts.

The actual prompt that created the draft for Mary Bodne can be found here. Copy the output generated by the LLM into a draft page to start the verification process.
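Because only the two source texts, the article name and the word limit change per article, those parts of the prompt can be generated from structured data. The following is a minimal sketch, not the code I actually use: the `Obituary` class and both function names are hypothetical, and the word limit is passed in as a parameter because the rule for deriving it is not spelled out on this page.

```python
from dataclasses import dataclass


@dataclass
class Obituary:
    """One source obituary (hypothetical container for the template fields)."""
    url: str
    source: str
    author: str
    date: str
    title: str
    body: str


def text_block(index: int, obit: Obituary) -> str:
    """Render one 'TEXT n START ... TEXT n END' block of the template."""
    return "\n".join([
        f"TEXT {index} START",
        f"url: {obit.url}",
        f"Source: {obit.source}",
        f"Author: {obit.author}",
        f"Date: {obit.date}",
        "Obituary:",
        f"Title={obit.title}",
        "Body=",
        obit.body,
        f"TEXT {index} END",
    ])


def variable_parts(article_name, first, second, max_words):
    """Return the per-article parts of the prompt as a dictionary."""
    return {
        "name_line": f"Name of biography article: {article_name}",
        "texts": text_block(1, first) + "\n\n" + text_block(2, second),
        "word_limit_line": f"- About {max_words} words maximum.",
    }


guardian = Obituary(
    url="https://www.theguardian.com/news/2000/mar/20/guardianobituaries",
    source="The Guardian",
    author="Mark Krupnick",
    date="20 Mar 2000 02.45 CET",
    title="Mary Bodne",
    body="[Text body of the obituary]",
)
nyt = Obituary(
    url="https://www.nytimes.com/2000/03/02/nyregion/mary-bodne-ex-owner-of-algonquin-hotel-dies-at-93.html",
    source="The New York Times",
    author="Douglas Martin",
    date="March 2, 2000",
    title="Mary Bodne, Ex-Owner of Algonquin Hotel, Dies at 93",
    body="[Text body of the obituary]",
)
parts = variable_parts("Mary Bodne", guardian, nyt, 250)
print(parts["texts"])
```

The fixed instructions of the template stay untouched; only these three pieces are substituted for each new article.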

Before publication

Start by reading the two sources carefully.
Check every statement in the output for accuracy, puffery and existence in the source text(s).
Check the first and last lines of the output particularly for LLM editorialization.
Combine duplicate citations (e.g. multiple consecutive statements citing the same source).
Add missing internal wiki links.
Correct non-existent categories, if any.
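Part of this checklist can be supported by a small script, although it cannot replace reading the sources. The sketch below only flags paragraphs of a draft that contain no `<ref` tag at all; the function name and the sample draft are made up for illustration, and a paragraph that does carry a reference can of course still contain a hallucination.

```python
import re

REF_TAG = re.compile(r"<ref\b")


def unreferenced_paragraphs(wikitext: str) -> list[str]:
    """Return prose paragraphs that do not contain any <ref> tag."""
    flagged = []
    for block in re.split(r"\n\s*\n", wikitext):
        # Drop heading lines such as "== Early life ==" from the block.
        lines = [
            line for line in block.strip().splitlines()
            if not line.startswith("==")
        ]
        text = "\n".join(lines).strip()
        # Skip empty blocks, templates (infobox, reflist) and categories.
        if not text or text.startswith(("{{", "[[Category:")):
            continue
        if not REF_TAG.search(text):
            flagged.append(text)
    return flagged


# Made-up draft text, used only to show the behaviour.
draft = (
    "'''Example Person''' was a hotelier.<ref name=\"nyt\" />\n\n"
    "== Early life ==\n"
    "She was born in a small town.\n\n"
    "== References ==\n"
    "{{reflist}}"
)
print(unreferenced_paragraphs(draft))
```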

After publication

Create the corresponding Talk page.
Add wiki links pointing to our newly created article (surname!).
Update the Results section in this page.

Inception

Some online news media have created APIs that enable computer applications to request information from them. Two of them offer free, publicly available news archives: the NYTimes API and The Guardian Open Platform. I used the NYTimes API extensively in a previous private project, applying it to generate close to 7,000 citations in numerous list pages.

The NYTimes News Archive dates back to 1851. The news archive of the Guardian only starts in January 1999. That's why I used that month as a starting point to find candidates.
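To give an idea of what requesting one month from both archives looks like, here is a minimal sketch that only builds the request URLs. The endpoint paths and parameter names reflect my understanding of the public documentation of the NYTimes Archive API and the Guardian content search endpoint and should be verified there. The `tone/obituaries` tag filter, the page size and the function names are assumptions, and an API key is required for both services.

```python
import calendar
from urllib.parse import urlencode


def nyt_archive_url(year: int, month: int, api_key: str) -> str:
    """Build the NYTimes Archive API URL for one month (month without leading zero)."""
    base = f"https://api.nytimes.com/svc/archive/v1/{year}/{month}.json"
    return base + "?" + urlencode({"api-key": api_key})


def guardian_search_url(year: int, month: int, api_key: str, page: int = 1) -> str:
    """Build a Guardian content search URL covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    params = {
        "tag": "tone/obituaries",  # assumed tag for obituaries
        "from-date": f"{year}-{month:02d}-01",
        "to-date": f"{year}-{month:02d}-{last_day:02d}",
        "page-size": 50,  # assumed page size
        "page": page,
        "api-key": api_key,
    }
    return "https://content.guardianapis.com/search?" + urlencode(params)


# "KEY" is a placeholder, not a working API key.
print(nyt_archive_url(1999, 1, "KEY"))
print(guardian_search_url(1999, 1, "KEY"))
```

The NYTimes archive returns all articles of a month in one response, so obituaries still have to be filtered out afterwards, whereas the Guardian results are paged.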

TODO

Results: NYTimes-The Guardian

I processed the months January 1999 – May 2025 to discover deceased people who meet the criteria of this project. The red links and new redirects I leave for someone else to create.


ID	Month of Death	Deceased	Ref. The Guardian	Ref. NYTimes	Remarks
1	March 1999	Vera Tolstoy	reference	reference	Corrected
2	April 1999	Geoffrey Wigoder	reference	reference	Corrected
3	June 1999	Angus MacDonald (piper)	reference	reference	Corrected
4	July 1999	Joe Hyman	reference	reference	Corrected
5	August 1999	Carlos Cachaça	reference	reference	Corrected
6	August 1999	David Maurice Graham	reference	reference	Corrected
7	October 1999	Ranjabati Sircar	reference	reference	Corrected
8	February 2000	Mary Bodne	reference	reference	Corrected
9	September 2000	Donald Gallup	reference	reference
10	November 2000	Frederick S. Clarke	reference	reference
11	January 2001	Charles Mérieux^[1]	reference	reference	Was a redirect; no article count for me :(
12	March 2001	Charly Baumann	reference	reference
13	April 2001	Jérôme Lindon^[2]	reference	reference
14	June 2002	Jacques Fauvet	reference	reference	Corrected. First article to be published.
15	July 2002	Constantine Leventis	reference	reference	Uncle
16	November 2002	Lynda Van Devanter	reference	reference	Keep as redirect links
17	December 2002	Orlando Villas-Bôas	reference	reference	wlh Brothers
18	February 2003	Ted Perry	reference	reference	wlh AfD 2006 logs
19	May 2003	Sue Sally Hale	reference	reference
20	August 2003	Charles S. Rhyne	reference	reference	not mentioned! search pic
21	August 2003	David Webster (broadcasting executive)	reference	reference	google
22	August 2003	Nina Fonaroff	reference	reference	mother also misspelled
23	November 2003	Cyla Wiesenthal	reference	reference	husband
24	April 2004	John B. Evans	reference	reference	ha!
25	November 2004	Harry Fleischman	reference	reference	links
26	November 2004	Martin M. Kaplan	reference	reference
27	March 2005	George Scott (singer)	reference	reference	Was a redirect; no article count for me :(
28	May 2005	Chung Se-yung^[3]	reference	reference	Brother wikidata issue
29	April 2007	Paul Leventhal	reference	reference	search log
30	June 2008	Trevor Wilkinson (engineer)	reference	reference	:) search wikidata! ref3
31	August 2010	Reginald Levy	reference	reference	Keep as redirect
32	March 2012	Jimmy Ellis (singer)	reference	reference	Keep as redirect
33	November 2012	Derek Hutchinson	reference	reference	Sea_kayak
34	August 2014	Jeremiah Healy	reference	reference	books
35	September 2016	Philip Kingsley	reference	reference	Trichology search
36	July 2017	Jon Underwood	reference	reference	Death Cafe Keep as new redirect
37	January 2019	Patricia Lousada	reference	reference	search
38	May 2019	Jake Black (musician)	reference	reference	Keep as redirect
39	July 2021	Paul Huntley	reference	reference
40	October 2022	Theo Richmond	reference	reference	Konin prize book review
41	June 2023	Paul Ickovic	reference	reference	WP:O nytimes link

Results: NYTimes-The Independent

I processed the months July 1992 – December 2009 to discover deceased people who meet the criteria of this project. The red links and new redirects I leave for someone else to create.


ID	Month of Death	Deceased	Ref. The Independent	Ref. NYTimes	Remarks
1	July 1992	Pierre Uri^[4]	reference	reference	search Has pic
2	February 1993	Michel Renault	reference	reference	wd search
3	February 1993	Adina Blady-Szwajger^[5]	reference	reference	WP:O search link wd
4	March 1993	Warren Ellsworth	reference	reference	WP:O encyclopedia.com wd
5	March 1994	Kenneth Neill Cameron	reference	reference	search wd
6	January 1995	Elaine Greene	reference	reference	search
7	March 1995	Constance Morrow Morgan	reference	reference	WP:O Smith College
8	December 1995	Nina Verchinina	reference	reference	image search wd LATimes
9	September 1996	Rose Isabel Williams	reference	reference	WP:N Keep as new redirect wlh
10	January 1998	Jack Grimm	reference	reference	WP:O
11	August 1998	Francesco Crucitti^[6]	reference	reference	WP:O Keep as new redirect wlh

About this project

Every year about 50 million people die. Of those, roughly 10,000 end up with a biography on the English Wikipedia. For the 2010s I found only eight individuals who met the criteria of this project. I think this says something about the maturity of Wikipedia. It seems I am truly mopping up the last forgotten notable few.
