Cork encoding

Latin script character encoding used by LaTeX

The Cork (also known as T1 or EC) encoding is a character encoding used to encode glyphs in fonts.[1] It is named after the city of Cork in Ireland, where a new encoding for LaTeX was introduced during a TeX Users Group (TUG) conference in 1990.[1] It contains 256 characters and supports most Western and Eastern European languages written in the Latin alphabet.[2]

Details

In 8-bit TeX engines, the font encoding has to match the encoding of the hyphenation patterns, which is where this encoding is most commonly used.[3] In LaTeX, one can switch to this encoding with \usepackage[T1]{fontenc}, while in ConTeXt MkII it is already the default encoding. In modern engines such as XeTeX and LuaTeX, Unicode is fully supported and the 8-bit font encodings are obsolete.
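The LaTeX switch mentioned above can be sketched as a minimal document. This assumes an 8-bit engine such as pdfTeX and an installation that provides T1-encoded fonts; the sample words are only illustrative.

```latex
% Minimal sketch: select the Cork (T1) font encoding under an 8-bit engine.
\documentclass{article}
\usepackage[T1]{fontenc} % text fonts are now addressed through the 256-slot T1 layout
\begin{document}
% In T1 these accented letters are single glyphs in the font,
% so hyphenation patterns in the same encoding can apply to them.
\v{c}e\v{s}tina, Stra\ss e, \k{a}
\end{document}
```

Under XeTeX or LuaTeX this line is normally left out, since those engines work with Unicode fonts directly.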

Character set

Cork encoding
    0       1       2       3       4       5       6       7       8       9       A       B       C       D       E       F
0x  `       ´       ˆ       ˜       ¨       ˝       ˚       ˇ       ˘       ¯       ˙       ¸       ˛       ‚       ‹       ›
    0060    00B4    02C6    02DC    00A8    02DD    02DA    02C7    02D8    00AF    02D9    00B8    02DB    201A    2039    203A
1x  “       ”       „       «       »       –       —       ZWSP[a] ₀[b]    ı[c]    ȷ[c]    ﬀ       ﬁ       ﬂ       ﬃ       ﬄ
    201C    201D    201E    00AB    00BB    2013    2014    200B    2080    0131    0237    FB00    FB01    FB02    FB03    FB04
2x  ␣       !       "       #       $       %       &       ’       (       )       *       +       ,       -       .       /
    2423                                                    2019
3x  0       1       2       3       4       5       6       7       8       9       :       ;       <       =       >       ?
4x  @       A       B       C       D       E       F       G       H       I       J       K       L       M       N       O
5x  P       Q       R       S       T       U       V       W       X       Y       Z       [       \       ]       ^       _
6x  ‘       a       b       c       d       e       f       g       h       i       j       k       l       m       n       o
    2018
7x  p       q       r       s       t       u       v       w       x       y       z       {       |       }       ~       SHY[d]
8x  Ă       Ą       Ć       Č       Ď       Ě       Ę       Ğ       Ĺ       Ľ       Ł       Ń       Ň       Ŋ       Ő       Ŕ
    0102    0104    0106    010C    010E    011A    0118    011E    0139    013D    0141    0143    0147    014A    0150    0154
9x  Ř       Ś       Š       Ş       Ť       Ţ       Ű       Ů       Ÿ       Ź       Ž       Ż       Ĳ       İ       đ       §
    0158    015A    0160    015E    0164    0162    0170    016E    0178    0179    017D    017B    0132    0130    0111    00A7
Ax  ă       ą       ć       č       ď       ě       ę       ğ       ĺ       ľ       ł       ń       ň       ŋ       ő       ŕ
    0103    0105    0107    010D    010F    011B    0119    011F    013A    013E    0142    0144    0148    014B    0151    0155
Bx  ř       ś       š       ş       ť       ţ       ű       ů       ÿ       ź       ž       ż       ĳ       ¡       ¿       £
    0159    015B    0161    015F    0165    0163    0171    016F    00FF    017A    017E    017C    0133    00A1    00BF    00A3
Cx  À       Á       Â       Ã       Ä       Å       Æ       Ç       È       É       Ê       Ë       Ì       Í       Î       Ï
Dx  Ð[e]    Ñ       Ò       Ó       Ô       Õ       Ö       Œ       Ø       Ù       Ú       Û       Ü       Ý       Þ       SS[f]
                                                            0152                                                            1E9E
Ex  à       á       â       ã       ä       å       æ       ç       è       é       ê       ë       ì       í       î       ï
Fx  ð       ñ       ò       ó       ô       õ       ö       œ       ø       ù       ú       û       ü       ý       þ       ß
                                                            0153                                                            00DF

Notes

  • Hexadecimal values under the characters in the table are the Unicode character codes.
  • The first 13 characters (0x00 to 0x0C) are accents and are often used as combining characters.
  a. 0x17 is dubbed a “compound word mark” (CWM) in the Cork encoding, and is an innovation of this standard. It is an invisible character that separates the parts of a compound word, for instance in German, in order to prevent aesthetic ligatures at compound boundaries.[2] It is mapped to the Unicode “zero-width space” (ZWSP, U+200B), defined at about the same time, whose purpose is similar, if not identical.
  b. 0x18 is a “small o”, used to compose ‰ or ‱ (or arbitrary smaller quantities) out of the percent sign (%).[2]
  c. Dotless i and dotless j may be used to compose accented variants like i with macron (ī).
  d. 0x7F is the hyphenation character, not really a soft hyphen (SHY) as defined by Unicode.
  e. 0xD0 is used both as Eth (Ð, U+00D0) and as D with stroke (Đ, U+0110), which can cause problems in some situations (such as copying text from a PDF, or hyphenation).
  f. 0xDF contains an uppercase variant of ß. Traditionally, it was SS (two letters S), but some newer fonts may use ẞ.[4] It allows TeX to automatically convert the German lowercase ß into the uppercase form.
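A few of the special slots described in these notes can be exercised from LaTeX. The following is a minimal sketch, assuming an 8-bit engine with T1 fonts and the standard LaTeX kernel definitions of \textcompwordmark and \textperthousand for T1; the German word is only an illustration.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Slot 0x17 (compound word mark): breaks the "fl" ligature
% at the boundary of the compound Auf + lage.
Auflage versus Auf\textcompwordmark lage

% Slot 0x18 (small o): in T1 the per mille sign is assembled
% from the percent sign followed by this glyph.
5\textperthousand

% Slot 0x19 (dotless i): base glyph for an accented i, here with a macron.
\={\i}
\end{document}
```

Comparing the two spellings of the first word in the output should show the ligature in one and separate letters in the other, provided the font has an fl ligature at all.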

Supported languages

The encoding supports most European languages written in the Latin alphabet. Notable exceptions are:

Languages with slightly suboptimal support include:

  • Galician, Portuguese and Spanish – due to the lack of the characters ª and º, which are not simply superscript versions of lowercase "a" and "o" (superscripts are thinner) and are often underlined
  • Croatian, Bosnian and Serbian – because Đ shares its slot (0xD0) with Ð
  • Turkish – because dotted and dotless i pair up differently in upper and lower case (İ/i and I/ı) than in other languages
  • Romanian – because the characters "Ş ş Ţ ţ" (with a cedilla) are considered typographically incorrect by modern standards,[5][6] the expected forms being "Ș ș Ț ț" (with a comma below).[7] The cedilla forms were arguably considered acceptable when the encoding was developed, but support retroactively became suboptimal or insufficient when the Unicode code points were disunified.

References
