Talk:Unicode


This is the talk page for discussing improvements to the Unicode article.
This is not a forum for general discussion of the subject of the article.



Archives (index): 1, 2, 3, 4, 5, 6, 7 (auto-archiving period: 2 years)

Typography Top‑importance

	This article is within the scope of WikiProject Typography, a collaborative effort to improve the coverage of articles related to Typography on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Top	This article has been rated as Top-importance on the importance scale.

Languages Low‑importance

	This article is within the scope of WikiProject Languages, a collaborative effort to improve the coverage of languages on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Low	This article has been rated as Low-importance on the project's importance scale.

Computing High‑importance

	This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
High	This article has been rated as High-importance on the project's importance scale.

Globalization

	This article is within the scope of WikiProject Globalization, a collaborative effort to improve the coverage of Globalization on Wikipedia. If you would like to participate, you can edit the article attached to this page, or visit the project page, where you can join the project and see a list of open tasks.
???	This article has not yet received a rating on the project's importance scale.

Text and/or other creative content from this version of Unicode was copied or moved into incubator:Wp/nod/ᩀᩪᨶᩥᨣᩰ᩠ᨯ with this edit. The former page's history now serves to provide attribution for that content in the latter page, and it must not be deleted as long as the latter page exists.

Codespace and code points

In the Codespace and code points section, the article refers to "the interval $[0,17\times 2^{16})$". I had to read it several times to figure out what was meant. I originally parsed "0,17" as a European-format decimal number, which made no sense. Eventually I figured out what was meant, but it wasn't at all obvious. Nothing in the referenced Unicode 15 standard uses that terminology, either. Pairing a square bracket with a parenthesis is a math construct that makes sense for real intervals, but it is less commonly used in integer contexts. It will simply look wrong to readers without a real analysis background.

May I suggest this might be more understandable if replaced with "the range 0 : 1114111"? The origin of the latter number is given later in the sentence (as the hexadecimal number 0x10FFFF). Alternatively, a less obscure notation might be $[0,(2^{20}+2^{16}-1)]$. Tarl N. (discuss) 22:55, 11 May 2024 (UTC)

I think I was the one who originally added this. Please do replace it with something more straightforward. Remsense诉 23:10, 11 May 2024 (UTC)

Done, using prose: in the range from 0 to 1114111,... Tarl N. (discuss) 08:01, 12 May 2024 (UTC)
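As a quick arithmetic check that the notations discussed in this thread describe the same set of code points, the sketch below (Python, standard library only) evaluates each bound. Python's `chr()` is used only as an illustration, because its string type happens to accept exactly this code point range.

```python
# Exclusive upper bound of the half-open interval [0, 17 x 2^16): 17 planes of 65,536.
half_open_upper = 17 * 2**16

# Inclusive upper bound of the closed interval [0, 2^20 + 2^16 - 1].
closed_upper = 2**20 + 2**16 - 1

# All notations agree: the last code point is 1114111, i.e. 0x10FFFF.
assert half_open_upper - 1 == closed_upper == 1114111 == 0x10FFFF
print(half_open_upper, closed_upper, hex(closed_upper))

# chr() accepts the last code point; one past the end raises ValueError.
print(repr(chr(0x10FFFF)))
try:
    chr(0x110000)
except ValueError as err:
    print("out of range:", err)
```

The half-open form counts code points (1,114,112 of them), while the closed form and the prose "from 0 to 1114111" name the last valid value.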

308 characters not mentioned

The only detail given for Unicode 1.0.1 is that 20902 CJK Unified Ideographs were added, but in total 21204 characters were added and 6 were removed. In total, 308 characters were not mentioned at all. Did I miss something while reading the page? What happened to those characters? Can somebody at least explain? Apologies in advance if I wasted your time. Mucksrunt (talk) 13:31, 26 August 2024 (UTC)

The Unicode 1.0.1 changes were messy. They brought Unicode into alignment with ISO 10646 and happened before the stability policies in place today. I don't come up with 308 characters, but looking through the infoboxes for the various Unicode blocks (which I believe are accurate), I find these changes in Unicode version 1.0.1:

Alphabetic Presentation Forms (+1)
CJK Compatibility Ideographs (+302)
CJK Symbols and Punctuation (+0)
CJK Unified Ideographs (+20,902)
Combining Diacritical Marks (+2)
Cyrillic (-4)
Enclosed CJK Letters and Months (-1)
Greek and Coptic (-9)
Hebrew (-1)
Lao (-5)
Miscellaneous Technical (-2)
Thai (-5)
Additionally, the range for Private Use Areas was expanded by 768 code points. DRMcCreedy (talk)
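The per-block figures above can be tallied with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the list, and the last step assumes the 308 in the original question means the 302 additions beyond the CJK Unified Ideographs plus the 6 removals.

```python
# Net per-block changes in Unicode 1.0.1, copied from the list above.
changes = {
    "Alphabetic Presentation Forms": 1,
    "CJK Compatibility Ideographs": 302,
    "CJK Symbols and Punctuation": 0,
    "CJK Unified Ideographs": 20_902,
    "Combining Diacritical Marks": 2,
    "Cyrillic": -4,
    "Enclosed CJK Letters and Months": -1,
    "Greek and Coptic": -9,
    "Hebrew": -1,
    "Lao": -5,
    "Miscellaneous Technical": -2,
    "Thai": -5,
}

# Sum the blocks that grew and the blocks that shrank separately.
gained = sum(n for n in changes.values() if n > 0)
lost = -sum(n for n in changes.values() if n < 0)
print(f"net gains {gained}, net losses {lost}, overall {gained - lost}")

# Assumed reading of the question: 21204 added, 6 removed, and only the
# 20902 CJK Unified Ideographs described, leaving 302 + 6 unmentioned.
unmentioned = (21_204 - 20_902) + 6
print("unmentioned:", unmentioned)
```

Because each entry is a net figure for its block, additions and removals inside the same block cancel out (the +0 entry may be such a case), so these sums are not expected to reproduce the gross 21204/6 figures. The 302 unmentioned additions do coincide numerically with the +302 listed for CJK Compatibility Ideographs.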

Input requested on Unicode block template redesign

Hey! On a lark, I decided to try a minor redesign of the Unicode block templates while fixing the pressing issue of dark mode support—see Wikipedia:Village pump (technical)#Unicode block template and tell me any thoughts you have, as I think it's probably worthwhile to at least refresh these templates. Remsense ‥ 论 15:44, 11 September 2024 (UTC)

I have two concerns about your proposed redesign. First, the link to the Unicode PDF chart is no longer obvious to the reader, as it's now a reference rather than being clear in the chart heading. Easy access to the PDF is especially important for code ranges that are not widely supported. Second, consolidating the notes onto a single line is OK for most cases but will be harder to understand for charts with longer notes, like Template:Unicode chart Hangul Jamo. DRMcCreedy (talk) 17:41, 11 September 2024 (UTC)

Moving to a reference wasn't my idea directly, as I can see it either way. Per your second point, I would actually handle this by adding additional lines for those extra notes. Remsense ‥ 论 17:53, 11 September 2024 (UTC)

@Drmccreedy I think I've finished iterating on the design for now in response to feedback here and at the Village Pump—I'm still not totally sure how/whether to display the default footnote and the PDF code chart reference, but other than that I think it's just about ready to consider deploying. Any further thoughts? Remsense ‥ 论 05:01, 13 September 2024 (UTC)

Unicode BMP Status

According to the Unicode Roadmap, the status of the BMP is not categorised. I've tried to categorise the ranges; here's the result:

0000-058F Most basic LTR scripts
0590-08FF RTL scripts
0900-109F Most Asian and Indian scripts and languages
10A0-10FF Georgian (unique part)
1100-167F Larger scripts, including UCAS, Ethiopic and Hangul
1680-16FF Historical scripts
1700-1CFF Most Asian scripts, somewhat European
1D00-1FFF Latin and other basic LTR scripts
2000-2BFF Set of symbols, including punctuation and math and currency
2C00-2CFF Latin, Glagolitic (I don't know how to categorize them)
2C80-2E7F African scripts and most LTR scripts
2E80-9FFF CJK scripts, including Japanese, Hangul Jamo and ideographs
A000-A4FF Asian scripts
A500-A7FF Most LTR Scripts including the Medieval, African and Asian scripts
A800-ABFF Most Asian scripts
AC00-D7FF Hangul / Korean
D800-F8FF Surrogates & Private Use
F900-FFFF Mixed scripts, especially alternative or presentation forms

MarcoToa1 (talk) 01:57, 26 May 2025 (UTC)
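Whatever the status of the categories themselves, a proposed list like this one can be checked mechanically for gaps and overlaps. The sketch below is a hypothetical helper (`check_partition` is not from any library); it copies the ranges from the list above and treats each as inclusive.

```python
# Ranges copied from the list above, as inclusive (start, end) pairs.
ranges = [
    (0x0000, 0x058F), (0x0590, 0x08FF), (0x0900, 0x109F), (0x10A0, 0x10FF),
    (0x1100, 0x167F), (0x1680, 0x16FF), (0x1700, 0x1CFF), (0x1D00, 0x1FFF),
    (0x2000, 0x2BFF), (0x2C00, 0x2CFF), (0x2C80, 0x2E7F), (0x2E80, 0x9FFF),
    (0xA000, 0xA4FF), (0xA500, 0xA7FF), (0xA800, 0xABFF), (0xAC00, 0xD7FF),
    (0xD800, 0xF8FF), (0xF900, 0xFFFF),
]


def check_partition(spans, lo=0x0000, hi=0xFFFF):
    """Hypothetical helper: report gaps and overlaps in inclusive ranges over [lo, hi]."""
    problems = []
    expected = lo  # first code point not yet covered by an earlier range
    for start, end in sorted(spans):
        if start > expected:
            problems.append(f"gap {expected:04X}-{start - 1:04X}")
        elif start < expected:
            problems.append(f"overlap {start:04X}-{min(end, expected - 1):04X}")
        expected = max(expected, end + 1)
    if expected <= hi:
        problems.append(f"gap {expected:04X}-{hi:04X}")
    return problems


print(check_partition(ranges))
```

Run against the list as posted, this reports a single overlap, 2C80-2CFF, where the "Latin, Glagolitic" row and the "African scripts" row both claim the same code points.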

I'm not sure where you are going with this. It looks like original research, which isn't allowed in Wikipedia articles. DRMcCreedy (talk) 14:34, 27 May 2025 (UTC)

There seems to be a table like this at BMP; that may be where you want to go. Spitzak (talk) 15:18, 27 May 2025 (UTC) — Preceding unsigned comment added by Banovercheckcross (talk • contribs)

"Mapping to Legacy Character Sets" on Hangul

Near the beginning of this section, it says "This is most pronounced in the three different encoding forms for Korean Hangul".

This needs clarification. I'm only aware of two redundant encodings for Korean Hangul: the precomposed blocks and the positional jamo. Is the third such encoding the unpositioned jamo? Those characters aren't rendered in blocks by font engines, so I don't think they would count. Awelotta (talk) 04:01, 1 October 2025 (UTC)
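The encodings in question can be compared with Python's standard `unicodedata` module. This sketch assumes the "unpositioned jamo" are the Hangul Compatibility Jamo block (U+3130 to U+318F). It shows that a precomposed syllable and its positional (conjoining) jamo sequence are canonically equivalent, while a compatibility jamo maps to a conjoining jamo only under the compatibility forms (NFKC/NFKD). It does not settle which three forms the article intends.

```python
import unicodedata


def describe(text):
    """List each code point in text with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Form 1: a precomposed syllable, HAN (U+D55C).
precomposed = "\uD55C"

# Form 2: the same syllable as positional (conjoining) jamo:
# leading consonant + vowel + trailing consonant. NFD produces this sequence.
conjoining = unicodedata.normalize("NFD", precomposed)

# The two forms are canonically equivalent, so NFC recomposes the syllable.
assert unicodedata.normalize("NFC", conjoining) == precomposed

# Compatibility (unpositioned) jamo, e.g. HANGUL LETTER HIEUH (U+314E),
# have no canonical decomposition; only NFKC/NFKD map them to conjoining jamo.
compat_hieuh = "\u314E"
assert unicodedata.normalize("NFD", compat_hieuh) == compat_hieuh

print(describe(precomposed))
print(describe(conjoining))
print(describe(unicodedata.normalize("NFKC", compat_hieuh)))
```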
