User:Dreamyshade/Reliability resources

I'm interested in ways to improve the effectiveness of Wikipedia community efforts to counter unreliable content in Wikipedia articles. These are related notes and rough drafts, unpolished and incomplete.

Some inspiration

Problems

I believe we need to improve our guidelines and processes for dealing with these problems:

Disinformation, such as intentional deceptive efforts toward:
1. Discrediting people and businesses for commercial, political, or personal reasons, including people involved in opposition politics (sometimes called "black PR")
2. Improving the reputation of people and businesses, especially those that have received negative coverage in reliable sources, including by trying to discredit the source of the coverage (journalists, news publications, whistleblowers, etc.) or adding puffery cited to pay-to-play websites
3. Other kinds of political propaganda
Citations to deceptive websites that appear to be reliable sources. (I believe that intentionally deceptive websites, including paid-placement websites, are the most important to mitigate, but we also have issues with good-faith editors repeatedly citing benign self-published websites that don't meet reliable source standards.)

These problems reinforce each other because disinformation sticks well when cited to a website that appears to verify the content.

These problems fall into a gap in our strategies against unreliable content. We have relatively mature guidelines and processes to counter spam, vandalism, and fringe theories. Wikipedia:WikiProject AI Cleanup is developing guidelines and processes to mitigate the emerging threat that LLM-generated text poses to reliability. Wikipedia:Biographies of living persons helps by setting a relatively high standard for sourcing for BLPs. But we don't have mature guidelines or processes for addressing these specific problems.

Potential mitigations

1) We need community-endorsed guidelines, information pages, and explanatory essays for editors about disinformation and citations to deceptive websites that appear to be reliable sources. To help get there, we should improve the existing guidelines and essays on related topics.

2) If we mitigate the citation issue better (which is well-suited to automated mechanisms to surface signals to editors), that will help us mitigate disinformation (which is tough to identify automatically).

Hypotheses about collecting and distributing better data about unreliable websites:

Identify more WikiProjects with topic-specific reliable/unreliable sources lists and encourage them to get their lists into Meta:Cite Unseen/sources (see Meta:Cite Unseen/sources/Import a new WikiProject source list).
The reliable sources noticeboard (WP:RSN) has multiple purposes, and if we split it into two, I believe we could more effectively address patterns across articles:
- Settling disagreements over uses of specific citations in specific articles
- Identifying patterns in unreliable websites used across many articles and getting consensus on them (RFCs)
It's too hard to get a WP:QUESTIONABLE website into the deprecated sources list. (WP:RSN says "RfCs for deprecation, blacklisting, or other classification should not be opened unless the source is widely used and has been repeatedly discussed.") We need a clear, easy-to-follow, templated nomination and discussion process that accommodates bulk nominations of questionable sources. This could even be a third tier below blacklisting and deprecation.
There's no list of generally reliable sources. The perennial sources list is not meant to be a list of reliable sources. ("If a source is not listed here, it only means that it has not been the subject of repeated community discussion. That may be because the source is a stellar source, and we simply never needed to talk about it because it is so obviously reliable")

The spam blacklist is overloaded with multiple purposes - spam, malicious websites, and low-quality and unreliable websites. But I think that we mainly need to address the deprecated sources list.

Hypotheses about application of better data about unreliable websites:

Suggestions mode
AfC tools and templates
AfD tools and templates
Tools for New Pages Patrol
Counter-Disinformation Unit to train people on using opt-in tools
Use "usurped" status in citation templates as a signal for potential automated mechanisms
Promote a scoped-down version of Cite Unseen to a beta feature in English Wikipedia
A bot could post notices on talk pages for articles that cite non-RS domains that are on consensus-curated lists (Meta:Cite Unseen/sources), encouraging editors to check those citations. It could be most feasible for a WikiProject to run this as a pilot program for articles within its own scope first, before expanding the concept to articles in general.
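As a rough illustration of the kind of signal such mechanisms could surface, here is a minimal sketch, not a real bot. It assumes the article wikitext is already available as a string and that the flagged domains have been loaded into a plain set; the domain names, function names, and regular expressions are all hypothetical, and it does not read the actual Meta:Cite Unseen/sources format or post anything to a talk page. It also assumes that the "usurped" status appears in citation templates as `url-status=usurped`.

```python
import re
from urllib.parse import urlparse

# Hypothetical sample data: a real tool would load consensus-curated lists
# instead of hard-coding domains.
FLAGGED_DOMAINS = {"example-paid-pr.test", "fake-news.example"}

# Rough pattern for URLs inside wikitext; stops at characters that usually
# end a URL in templates and bracketed external links.
URL_RE = re.compile(r"https?://[^\s|\[\]{}<>\"]+")

# Assumption: usurped citations are marked with |url-status=usurped.
USURPED_RE = re.compile(r"\|\s*url-status\s*=\s*usurped\b", re.IGNORECASE)


def cited_domains(wikitext):
    """Return the set of lowercased hostnames of URLs found in the wikitext."""
    domains = set()
    for url in URL_RE.findall(wikitext):
        host = urlparse(url).hostname
        if host:
            host = host.lower()
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


def flagged_citations(wikitext, flagged=FLAGGED_DOMAINS):
    """Return cited hostnames that equal a flagged domain or are a subdomain of one."""
    hits = set()
    for host in cited_domains(wikitext):
        for bad in flagged:
            if host == bad or host.endswith("." + bad):
                hits.add(host)
    return sorted(hits)


def count_usurped(wikitext):
    """Count citation parameters that mark a URL as usurped."""
    return len(USURPED_RE.findall(wikitext))


if __name__ == "__main__":
    sample = (
        "{{cite web |url=https://www.example-paid-pr.test/story |title=A}} "
        "<ref>[https://news.fake-news.example/a B]</ref> "
        "{{cite news |url=https://reliable.example/x |url-status = usurped}}"
    )
    print(flagged_citations(sample))
    print(count_usurped(sample))
```

A pilot along these lines would only produce a list of citations for humans to check; deciding whether a given citation is actually a problem would still be up to editors.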

3) Encourage:

Improving articles about notable non-top-tier news sources, covering both positive and negative coverage of them, as well as "Mass media in [country]" articles
Systematically linking to website and publisher names in citations to non-top-tier news sources

4) Empower WikiProjects to use better tools, based on heuristics and LLMs, to automatically identify and prioritize problematic content for humans to deal with, especially in BLPs and articles about companies that are currently in business.

Types of unreliable content and how we deal with them

Summary
Issue	Intent	Guideline or policy	Discussions	Structured compilations of signals
Conflict of interest editing (in general)	Many (commercial, political, vanity)	WP:COI, WP:PROMO	WP:COIN
Undisclosed paid editing	Many (commercial, political, vanity)	WP:COI, WP:PAID	WP:COIN
Unreliable content in BLPs (in general)	Many (commercial, political, vanity, good faith)	WP:BLP	WP:BLPN
Spam	Commercial	WP:SPAM	WT:WPSPAM	WT:BLIST
Hoaxes	Mischief	WP:HOAX
Vandalism	Mischief	WP:VAND	WP:AIV
Advocacy	Political	WP:NOTADVOCACY
Fringe theories	Many	WP:FRINGE	WP:FTN
LLM-generated text	Many (may be good faith or commercial)	WP:LLMARTICLE	WP:AINB
Citations to unreliable sources (in general)	Many (may be good faith, commercial, political, vanity)	WP:RS	WP:RSN
Citations to deceptive websites that appear to be reliable sources	Many (may be good faith, commercial, political, vanity)	Nothing specific?	WP:RSN
Disinformation (intentional)	Commercial, political	Nothing specific?	Nothing specific?

Policies and guidelines

Wikipedia:Conflict of interest (WP:COI)
Wikipedia:Paid-contribution disclosure (WP:PAID)
Wikipedia:Biographies of living persons (WP:BLP)
Wikipedia:Spam (WP:SPAM)
Wikipedia:Do not create hoaxes (WP:HOAX)
Wikipedia:Vandalism (WP:VAND)
Wikipedia:What Wikipedia is not (WP:NOT)
- "Not a soapbox or means of promotion" (WP:PROMO)
- "not for: Advocacy, propaganda, or recruitment" (WP:NOTADVOCACY)
Wikipedia:Fringe theories (WP:FRINGE)
Wikipedia:Writing articles with large language models (WP:LLMARTICLE)

Noticeboards and discussions

Wikipedia:Conflict of interest/Noticeboard (WP:COIN)
Wikipedia:Reliable sources/Noticeboard (WP:RSN)
Wikipedia:Biographies of living persons/Noticeboard (WP:BLPN)
Wikipedia:Fringe theories/Noticeboard (WP:FTN)
Wikipedia:Administrator intervention against vandalism (WP:AIV)
Meta:Talk:Cite Unseen/Suggestions

Automated mechanisms

Wikipedia:Edit filter (Special:AbuseFilter) - "a tool that allows editors in the edit filter manager group to set controls, mainly[1] to address common patterns of harmful editing"
User:XLinkBot - anti-spam bot

WikiProjects

Wikipedia:WikiProject Spam
- Wikipedia talk:WikiProject Spam (WT:WPSPAM)

Wikipedia:WikiProject Reliability
- Wikipedia talk:WikiProject Reliability (WT:REFCHECK)
Wikipedia:WikiProject AI Cleanup (WP:AICLEAN)
- Wikipedia:WikiProject AI Cleanup/Noticeboard (WP:AINB)
Wikipedia:Counter-Vandalism Unit (WP:CVU)

Compilations

Wikipedia:Reliable sources/Perennial sources (WP:RSP)
Wikipedia:Deprecated sources (WP:DEPRECATED)
Wikipedia:New pages patrol source guide (WP:SOURCEGUIDE)
Wikipedia:Spam blacklist (WP:BLACKLIST): MediaWiki:Spam-blacklist
MediaWiki talk:Spam-blacklist (WT:BLIST)
List of fake news websites
Meta:Cite Unseen/sources
User:Kuru/fakesources - "This list represents sites that exist solely to host paid PR content. Many disguise themselves as legitimate "news" sources, often with very obvious falsified staff listings and stock photos. Some deceptively chose names similar to existing publications or take over domains of prior publications."

Essays

Places to discuss policies and guidelines

Wikipedia:Village pump (policy)

Opt-in tools

Helpful:

Spamcheck (Toolforge)
Meta:Cite Unseen - "user script that helps readers quickly perform an initial evaluation of the sources used in a given article" (context: Meta:Wikimedia CH/Grant apply/Extending Cite Unseen)
Wikipedia:Citation Watchlist
Deprecated sources recent changes filter

Haven't tried yet:

User:Headbomb/unreliable - "This is the Unreliable/Predatory Source Detector (UPSD), a user script that identifies various unreliable and potentially unreliable sources."

Haven't found very useful: