Wikipedia:Request a query

Page for requesting database queries From Wikipedia, the free encyclopedia

This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.

Shortcuts

You may also be interested in the following:

  • If you are interested in writing SQL queries or helping out here, visit our tips page.
  • If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
  • If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
  • For long-term review and checking, database reports are available.

The database replicas do not have access to page content, so requests which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.

Number of non page-move mainspace redirect pages created by a specific user

I have a request, but it's a doozy. At WP:AN#Iljhgtn I have been trying (and mostly failing) to create a query to accurately count how many redirects the sockpuppet created. So if someone with better knowledge of Quarry would be able to produce a query that does the following, it would be much appreciated:

Given a specific user, return the number of redirects they have created, subject to the following conditions:

  1. Only mainspace (namespace 0).
  2. Include both live and deleted counts and a total.
  3. The page must have been a redirect at the time of the page creation. quarry:query/104250, my query, suffers from false positives by including pages created by the user that were only later made into redirects (e.g. the page was WP:BLARed or merged into another). I discovered this by running quarry:query/104288 (check the first entry). Simply put, the specific user must be the one who redirected the page.
  4. The redirect must not have been created from a page move. (This is something I have no idea how to check for.)

SuperPianoMan9167 (talk) 22:38, 11 April 2026 (UTC)

@Cryptic What do you think of https://quarry.wmcloud.org/query/104292 ? Polygnotus (talk) 22:50, 11 April 2026 (UTC)
@Polygnotus Thank you so much! It appears my 24,000 estimate was accurate. SuperPianoMan9167 (talk) 23:26, 11 April 2026 (UTC)
@SuperPianoMan9167 Yeah 24035 is very close. Maybe Cryptic knows a more elegant way to do this. Polygnotus (talk) 23:29, 11 April 2026 (UTC)
First, you'd want to use revision_userindex and archive_userindex in that query; rev_actor and ar_actor will be most selective, and the plain revision and archive views don't index them.
But if you don't mind it missing creations before 2018-06-27, then the best way here is to use the creation log: logging_userindex...WHERE log_type='create' AND log_action='create', joining change_tag on ct_log_id=log_id. —Cryptic 01:07, 12 April 2026 (UTC)
@Cryptic Thanks. That is a much easier approach but it does miss a few: https://quarry.wmcloud.org/query/104295 Polygnotus (talk) 01:13, 12 April 2026 (UTC)
It probably doesn't matter for redirects, and maybe everyone but me already knows this, but for interest, I noticed something about 'rev_parent_id = 0' that confused me for a while the other day. I have a bit of code that implicitly assumed that a page would only have one revision where rev_parent_id = 0, but during testing I came across a page where there were 2, List of schools in Isfahan for blocked page creator Baratiiman, I guess because the user didn't follow the normal draft process and there was a history merge. Sean.hoyland (talk) 05:02, 12 April 2026 (UTC)
Yes, both history merges and splits, as well as other forms of selective undeletion, break the assumption that rev_parent_id=0 === first revision. It happens rarely enough that it can usually be ignored; I only worry about it for queries that are expected to have few or no results. —Cryptic 05:55, 12 April 2026 (UTC)
It's likely the discrepancy between the two queries is because your first one looks for pages that are currently in mainspace (or were there when they were deleted), and the second looks for where they were when they were created. —Cryptic 16:02, 12 April 2026 (UTC)

Usernames containing only non-spacing marks and combining diacritical marks

For context, I need to know how many usernames will be affected if this global title blacklisting request is accepted. The list of non-spacing marks is here and the list of combining diacritical marks is here. I tried querying for ^[\p{Mn}\u{0300}-\u{036F}]*$ on gUser search, but it said MySQL supports neither \u nor \x with more than 2 hex digits. I also tried CONCAT('^[\\p{Mn}', UNHEX('0300'), '-', UNHEX('036f'), ']*$') on Quarry, but UNHEX() also can't handle more than 2 digits at once. NguoiDungKhongDinhDanh 21:28, 17 April 2026 (UTC)

You're looking for \x{...}. quarry:query/104514. (The explicit conversion to utf8mb4 is important.) I haven't verified your regex. Reference. —Cryptic 23:24, 19 April 2026 (UTC)
Thanks. So Quarry queries can use \x{}, unlike gUser. NguoiDungKhongDinhDanh 23:26, 19 April 2026 (UTC)
I just tried gUser - it shows the failed query when it dies. Its problem is it lacks the conversion to utf8, so it's doing a bytewise regex - and multibyte characters are indeed invalid in that context. Might be worth contacting its maintainer; CONVERT() won't give any more of a performance penalty than the regex match already will. —Cryptic 23:33, 19 April 2026 (UTC)
BTW, the reason your try with UNHEX() failed is because it gives a literal byte sequence; you've got to encode that into utf8. Unicode codepoints U+0300 and U+036f in utf8 aren't 03 00 and 03 6f, they're cc 30 and cd af. —Cryptic 23:48, 19 April 2026 (UTC)
Erm, cc 80, not cc 30. A case in point of why you shouldn't try to do this manually, even as a workaround. and for reference. —Cryptic 00:22, 20 April 2026 (UTC)
Thanks again. Those are some very useful information. NguoiDungKhongDinhDanh 23:51, 19 April 2026 (UTC)

Yearly total number of citations

Hey, wouldn't it be nice to see the yearly total number of references of all english Wikipedia articles? For WP:Statistics. In order to get an impression that Wikipedia has become more reliable.

Alternatively yearly total number of good / very good articles.

Or yearly share of paragraphs with at least one citation. WikiPate (talk) 22:36, 19 April 2026 (UTC)

Without a repository of citations (that I know of), I'm not sure how we can query this. Good idea, though. the Stefen 𝕋ower 22:48, 19 April 2026 (UTC)
(edit conflict)References aren't queryable. See the first section here. Closest we can come is looking at the external links, and only in the current versions of pages. (Besides, past a certain point, number of references isn't directly correlated with reliability. See Wikipedia:Citation overkill.)
Historical number of Good Articles isn't queryable either, but what you can do is look at old versions of Wikipedia:Good articles, or Wayback Machine archives of Category:Good articles. —Cryptic 22:54, 19 April 2026 (UTC)
Meta:Shared Citation is the project to address this. ~ 🦝 Shushugah (he/him  talk) 01:11, 20 April 2026 (UTC)
Thank you for the replies.
Yes I know reliablility is not 100% correlated to the number of total references, still citation overkill happens rarely on Wikipedia due to my experience.
The user WhatamIdoing just wrote that there are graphs regarding good and featured articles in Wikipedia:Good article statistics WikiPate (talk) 08:01, 20 April 2026 (UTC)

I've been looking to try and find disambiguation pages that are no longer needed - they only link out to one or zero other articles. Updating an old quarry query from @Dragons flight I was able to produce this, which returns the number of internal links from each disambiguation page and takes about ten minutes to run. (It turns out there are about 250 pages that are no longer needed, and also that the record-holder is over a thousand outbound links...). I also tried this version to filter it to just ns0 (ie article) links, but that seems to drag on indefinitely.

Is there a more efficient way to do these? Andrew Gray (talk) 12:36, 26 April 2026 (UTC)

There are no indices on page titles anywhere in the schema except when combined with namespace. Your query doesn't just look for pages categorized into Category:All disambiguation pages; it tries to look for pages categorized into 'All disambiguation pages' in every namespace. The optimizer doesn't - and can't - know that A) there's only a limited number of valid namespaces for any page, and B) that no page will ever be categorized into a page not in the Category: namespace. So what it does is look at all categorizations of all pages, and then filters it down afterwards. Add AND catlink.lt_namespace = 14 to your WHERE clause and the first query completes in about a minute. (It'll take longer to show up on the Quarry interface because it has to store and present those 376k results, but it'll still be well under ten minutes.) —Cryptic 13:03, 26 April 2026 (UTC)
Aha, thought these results looked familiar. I also did this query at quarry:query/97491, in a way that's not affected by the cl_to schema change - it looks for the __DISAMBIG__ magic word, typically as part of a template. —Cryptic 13:09, 26 April 2026 (UTC)
Amazing, thank you so much! Andrew Gray (talk) 17:16, 26 April 2026 (UTC)
I've put a report together at Wikipedia:WikiProject Disambiguation/One-link pages and will chip away at it. Andrew Gray (talk) 18:29, 26 April 2026 (UTC)
There'd be a bunch more if not for navboxes, which I see you've also been chipping away at. (I happened to notice New wave reggae today, for instance, which disambiguates two mainspace pages and has links to 259 more in its navboxes.) quarry:query/104722 shows the obvious ones. I don't work with templates much these days, so I don't know how common it is to make navbox templates without using either Module:Navbox or Module:Sidebar. —Cryptic 18:50, 26 April 2026 (UTC)
Yes, the navboxes are a bit confusing - the regiments and string quartets pages all seem to have an intentional navbox of disambiguation pages, which I'm a little confused by. Meanwhile the sidebar in Turing machine (disambiguation) is a load of entries all of which could reasonably be in text on the page. But at least we can deal with the simplest ones! Andrew Gray (talk) 20:39, 26 April 2026 (UTC)
It's not normal for dabs to have navboxes. I've looked at dabs beginning A–G in the report. I fixed a few by adding missing entries or linking unlinked entries. However, most should either redirect to their sole entry or have that entry usurp the dab's title. Thank you for finding these problematic pages. Certes (talk) 22:17, 26 April 2026 (UTC)
Thanks for looking into these! Fifty down now. I agree most of what are left are probably usurpation cases, and those should be straightforward to go through when I have some time to concentrate on it. Andrew Gray (talk) 12:19, 28 April 2026 (UTC)
Thank you. Most need usurpation but there's a significant number like Adams Crossroads and Amei where the linked title is already good and we should just edit the dab to become a redirect (bypassing double redirects for cases like Aetna, Arkansas). I can fix these later if you didn't get there first. Certes (talk) 13:13, 28 April 2026 (UTC)
@Andrew Gray: Thanks for working back from Z. I've done A–L but skipped a couple where I'd like a second opinion please. KNCC could be treated like KORM but we might add Cinema of Kuwait#Kuwait National Cinema Company if the name is current and the text not too promotional to keep. I'm finding no relevant search results for the abbreviation MTGF so the dab might be best deleted. What do you (or anyone else) think? Certes (talk) 11:29, 29 April 2026 (UTC)
KNCC - the article on the cinema company was at Cinescape and was deleted for being spam, but arguably a notable topic despite that so I'd say go ahead there. They still seem to trade under the KNCC name and the other one is just the branding.
MTGF - agree, feels like this would only be worth redirecting if he was known by the abbreviation rather than the name Andrew Gray (talk) 17:36, 29 April 2026 (UTC)
I think that's the non-usurpation cases done. The two newly found zero-link cases seem useless and can be deleted per G14. Certes (talk) 16:35, 30 April 2026 (UTC)
Excellent, that'll give me something to work on over the weekend :) Andrew Gray (talk) 21:38, 30 April 2026 (UTC)
Thanks. There are still a few judgement calls, mostly to do with WP:INITS. For example, I've no idea whether Edwin Peacock or Edwin D. Peacock is his usual form of address, but the article title is probably not critical as long as one redirects to the other. Certes (talk) 22:12, 30 April 2026 (UTC)

Related Articles

Wikiwand AI