Talk:List of Unicode characters
| This is the talk page for discussing improvements to the List of Unicode characters article. This is not a forum for general discussion of the subject of the article. |
This article was nominated for deletion. Please review the prior discussions if you are considering re-nomination:
| This article is rated List-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: |
| Text and/or other creative content from this version of Unified Canadian Aboriginal Syllabics character table was copied or moved into List of Unicode characters with this edit on 20:26, 21 December 2007. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Dingbat with this edit on 22:55, 3 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Miscellaneous Mathematical Symbols-A with this edit on 19:31, 3 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Unicode and HTML for the Hebrew alphabet with this edit on 20:59, 4 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Arabic script in Unicode with this edit on 16:22, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Syriac (Unicode block) with this edit on 18:06, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Block Elements with this edit on 18:33, 6 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
| Text and/or other creative content from this version of List of Unicode characters was copied or moved into Spacing Modifier Letters with this edit on 21:25, 8 February 2015. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists. |
Why is U+00A0 not in the control character section?
Its function is a control character no? — Preceding unsigned comment added by 76.81.249.42 (talk) 01:52, 9 October 2019 (UTC)
- U+00A0 has a general category of Zs (Separator, space), not Cc (Other, control) per UnicodeData.txt. BTW: I've removed U+0020 from the control character section's table because it too has a Unicode general category of Zs and the text before the table correctly states there are "65 characters, including DEL but not SP". DRMcCreedy (talk) 04:13, 9 October 2019 (UTC)
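The general categories cited above can be checked without reading UnicodeData.txt by hand. What follows is only a minimal sketch using Python's standard `unicodedata` module, which reflects whichever Unicode version the interpreter bundles; the helper names `general_category` and `control_characters` are made up for illustration.

```python
import sys
import unicodedata


def general_category(code_point: int) -> str:
    """Return the two-letter Unicode general category of a code point."""
    return unicodedata.category(chr(code_point))


def control_characters() -> list[int]:
    """Collect every code point whose general category is Cc (Other, control)."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Cc"
    ]


if __name__ == "__main__":
    print(f"U+00A0: {general_category(0x00A0)}")  # no-break space
    print(f"U+0020: {general_category(0x0020)}")  # space (SP)
    print(f"U+007F: {general_category(0x007F)}")  # DEL
    print(f"Cc count: {len(control_characters())}")
```

Under these assumptions, U+00A0 and U+0020 both report Zs, while DEL reports Cc, which agrees with the "65 characters, including DEL but not SP" wording.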
Octal Entity Reference Code
Octal code is very useful and still needs to be used in some programs, for example in bash/shell programming, escape sequences, JS (JavaScript), Perl, PostScript, etc. Various OS core (low-level) libraries and programs still use octal, and it especially needs to be viewable for the Control Characters, Basic Latin, and similar Unicode character ranges.
For more octal charts and codes, see: https://utf8-chartable.de/unicode-utf8-table.pl?utf8=oct
More info: https://en.wikipedia.org/wiki/UTF-8#Examples
The Wikipedia page on Octal needs to be updated with more detail on how octal numbers are actually used in different types of computer programs. A literal conversion from hex/dec to octal is not enough for all cases. One sentence that contains "\3nn" does mention UTF-8-based octal usage, but it needs elaboration. In a shell terminal, 3-digit octal codes can be used. For example, to show the ÷ (U+00F7) and € (U+20AC) signs, use this code: ‟printf "Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.\n";”
Or this code: ‟echo $'Not-Bold. \303\267 . \342\202\254 (1) \xE2\x82\xAC (2) \x20AC (3) \u20AC (4) \U000020AC (5). \033[1mBold\033[0m.';”
Both will be displayed as: ‟Not-Bold. ÷ . € (1) € (2) \x20AC (3) \u20AC (4) \U000020AC (5). Bold.” (on macOS Catalina (10.15.x), the old bash v3.2.57 shell did not support formats (3), (4), and (5)). € = U+20AC = decimal code point 8364 = octal code point 20254 = UTF-8 octal \342\202\254 = UTF-8 hex \xE2\x82\xAC.
To convert a symbol/character into octal, you may do this:
printf 👍 | od -t o1
0000000 360 237 221 215 <-- Octal Unicode code-point 372115 (U+1F44D)
          ^  ^^  ^^  ^^
--atErik1 (talk) 13:43, 5 September 2020 (UTC)
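The relationship between a code point, its octal value, and its UTF-8 octal escapes can also be reproduced outside the shell. This is a rough sketch in Python rather than the shell commands used above; the function names `utf8_octal_escapes` and `describe` are hypothetical, chosen only for this example.

```python
def utf8_octal_escapes(ch: str) -> str:
    """Return the UTF-8 bytes of ch as 3-digit octal escapes (\\nnn)."""
    return "".join(f"\\{byte:03o}" for byte in ch.encode("utf-8"))


def describe(ch: str) -> dict:
    """Summarize one character in the notations discussed in this thread."""
    code_point = ord(ch)
    utf8_bytes = ch.encode("utf-8")
    return {
        "code_point": f"U+{code_point:04X}",
        "decimal": code_point,
        "octal_code_point": f"{code_point:o}",
        "utf8_octal": utf8_octal_escapes(ch),
        "utf8_hex": "".join(f"\\x{byte:02X}" for byte in utf8_bytes),
    }


if __name__ == "__main__":
    for ch in "÷€👍":
        print(describe(ch))
```

For €, this should give decimal 8364, octal code point 20254, and UTF-8 octal \342\202\254, the same values quoted in the comment above. The octal digits of the code point are not the same as the UTF-8 octal escapes because each UTF-8 byte also carries marker bits.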
The mysterious # column
Hi, most of the tables from Basic_Latin through Cyrillic have a rightmost column headed #. What is the significance? Without an explanation the naive reader is left to guess. =8~/ Thx, ... PeterEasthope (talk) 02:59, 18 November 2022 (UTC)
- It's the decimal value for the hexadecimal Unicode code point. I agree it should definitely be labeled better. DRMcCreedy (talk) 03:26, 18 November 2022 (UTC)
- No, it isn't. The numbers start with "001" at the space, and increment through Latin Extended-A. Then select characters in Latin Extended-B and Additional, IPA Extensions, Spacing Modifier Letters, then take up again in Greek and Coptic and Cyrillic. I have shepherded a script through the Unicode / ISO 10646 process, and I am confident I've never seen those values before. VanIsaac, GHTV contWpWS 04:47, 18 November 2022 (UTC)
- Sorry, I was looking at the wrong column. My best guess is it's some enumeration of the characters in WGL-4, MES-1 and MES-2. Maybe just MES-2, since the article says MES-2 contains all the characters in WGL-4 and MES-1. The WGL-4, MES-1 and MES-2 table splits the Unicode code point up by "row" and "cells", but you can see it going from U+0020–7E, 00A0–FF, 0100–017F, 018F, 0192, 01B7, etc., which matches the # column. No idea why this was added to the List of Unicode characters article. Although the lede says "This article includes the 1062 characters in the Multilingual European Character Set 2 (MES-2) subset, and some additional related characters." DRMcCreedy (talk) 08:24, 18 November 2022 (UTC)
- I noticed that the change is made by @Wbm1058:. Perhaps it would be best to ask him about the rationale behind it? Smbat.petrosyan (talk) 14:01, 11 March 2025 (UTC)
- Been a long time since I spent any significant time working on this page. Note that I expanded the lead section on 15 August 2016 to explain this, and apparently since then, someone decided that this was too much information, and shortened the lead to remove my more detailed explanation. Perhaps this longer explanation can be put back. The column was just my way of counting the MES-2 characters to make sure that they were all accounted for in this list. I guess I got up to 0926 before I ran out of steam and moved on to work on other things. 0927–1062 would still be in the bottom tables which haven't been converted to lists which include a Description column yet. Note the column heading MES-2 Rationale starting at List of Unicode characters#Latin Extended-B where MES-2 starts being selective, and doesn't include everything. – wbm1058 (talk) 14:58, 11 March 2025 (UTC)
- This 29 December 2022 edit was a misguided move of my text as a "self-reference in the opening to a proper hatnote." – wbm1058 (talk) 15:10, 11 March 2025 (UTC)
- And then this 10 September 2023 edit removed the misguided hatnote. – wbm1058 (talk) 15:18, 11 March 2025 (UTC)
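For readers trying to picture the # column, here is a small illustrative sketch of a running count over code point ranges. It uses only the ranges quoted earlier in this thread (U+0020–7E, 00A0–FF, 0100–017F, 018F, 0192, 01B7), not the full MES-2 repertoire, and the names `QUOTED_RANGES` and `running_numbers` are hypothetical. Whether the article's column matches these exact numbers has not been verified here.

```python
# Only the ranges quoted in the discussion; NOT the complete MES-2 subset.
QUOTED_RANGES = [
    (0x0020, 0x007E),
    (0x00A0, 0x00FF),
    (0x0100, 0x017F),
    (0x018F, 0x018F),
    (0x0192, 0x0192),
    (0x01B7, 0x01B7),
]


def running_numbers(ranges):
    """Map each code point in the given inclusive ranges to a 1-based count."""
    numbering = {}
    count = 0
    for first, last in ranges:
        for code_point in range(first, last + 1):
            count += 1
            numbering[code_point] = count
    return numbering


if __name__ == "__main__":
    numbering = running_numbers(QUOTED_RANGES)
    for cp in (0x0020, 0x007E, 0x00A0, 0x017F, 0x018F, 0x01B7):
        # Zero-padded to mirror the "001" style mentioned above.
        print(f"U+{cp:04X} -> {numbering[cp]:04d}")
```

A count like this starts at 1 for the space and skips any code point outside the listed ranges, which is consistent with the behaviour described in the replies above.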
The real problem is the rejected/boxed ones.
They are just boxes! No significance. 2804:663C:2D07:97C0:B103:6474:A7EA:4A7F (talk) 20:40, 7 April 2025 (UTC)
- Many Unicode characters will no doubt show as boxes unless you have supporting fonts installed on your device. See Help:Multilingual support for more information. DRMcCreedy (talk) 00:00, 8 April 2025 (UTC)
Unicode 17.0 and Emoji 17.0
The pages Emoji, List of Emojis, and List of Unicode Characters all need to be updated to have the correct counts of characters and emoji following the release of Unicode 17.0 and Emoji 17.0. I have already edited the Template for List of Emojis emoji table, but I'm not very technically-orientated and cannot decipher the proper count of characters on Unicode's informative text files, so if any other editor could undertake this that would be super. ✨ΩmegaMantis✨blather 21:38, 9 September 2025 (UTC)
- I plan on doing these updates this week. DRMcCreedy (talk) 22:23, 9 September 2025 (UTC)
- Thank you! ✨ΩmegaMantis✨blather 02:00, 10 September 2025 (UTC)
Miscellaneous Symbols has more Game symbols
There are some dice, shogi, checkers, and card suit symbols in the Miscellaneous Symbols block that might benefit from also being included in the Game Symbols section. Arguably, the Go symbols could be added too, but they're more for mathematical analysis than for gaming. Also, arguably, there are the ball emojis for sports. 2600:8800:97:E200:1031:EF09:EE66:A6E0 (talk) 18:06, 19 September 2025 (UTC)
Adding new external link
I was about to add a link, but I read the following: "Please do not add links to commercial or copycat sites, they will be removed". The link was https://www.compart.com/en/unicode/. Does anyone think the site, while commercial, is still useful? Thanks in advance, Simone Aiello 95.233.153.179 (talk) 02:45, 7 October 2025 (UTC)
- I lean towards not adding it. It's a very visually pleasing layout of Unicode information but it's just a different presentation of the data. That said, I won't delete it if it is added. DRMcCreedy (talk) 04:54, 7 October 2025 (UTC)
ToC?
Why does this very long page with many sub-headings not have a TOC on top? Even FORCETOC does not work. Tom Peters (talk) 14:05, 17 February 2026 (UTC)
- I see the table of contents but have no explanation of why you don't. Sorry. DRMcCreedy (talk) 16:09, 17 February 2026 (UTC)
- I see it too. 49 headings and goodness knows how many sub-headings. 𝕁𝕄𝔽 (talk) 19:25, 17 February 2026 (UTC)