Coding table for the Latin 2 character set

This table is primarily intended for Internet users of the Morphological Analyzer software. It also applies directly to the ISO Latin 2-based tables of the 8-bit coding Perl library HB_CharCode8.pm (and its associated code conversion program HBCode.pl) [8]. It can also be used simply as a reference (the first five columns).

Char [1]  ISO 10646-1 [2]  ISO 8859-2 [3]  Windows CP 1250 [3]  SGML entity [4]  Pseudo [5]  (La)TeX [6]      Without accents [7]
NBSP      &#160;           (a0)            (a0)                 &nbsp;           (space)     ~                (space)
Ą         &#260;           (a1)            (a5)                 Ą                A,          {\c{A}}          A
˘         &#728;           (a2)            (a2)                 ˘                _           {\u{}}           _
Ł         &#321;           (a3)            (a3)                 Ł                L/          {\L}             L
¤         &#164;           (a4)            (a4)                 ¤                $           {\${}}           $
Ľ         &#317;           (a5)            (bc)                 Ľ                L"          {\v{L}}          L
Ś         &#346;           (a6)            (8c)                 Ś                S'          {\'{S}}          S
§         &#167;           (a7)            (a7)                 §                P$          {\S}             ~
¨         &#168;           (a8)            (a8)                 ¨                :           {\"{}}           :
Š         &#352;           (a9)            (8a)                 Š                S"          {\v{S}}          S
Ş         &#350;           (aa)            (aa)                 Ş                S,          {\c{S}}          S
Ť         &#356;           (ab)            (8d)                 Ť                T"          {\v{T}}          T
Ź         &#377;           (ac)            (8f)                 Ź                Z'          {\'{Z}}          Z
SHY       &#173;           (ad)            (ad)                 &shy;            -           \-               -
Ž         &#381;           (ae)            (8e)                 Ž                Z"          {\v{Z}}          Z
Ż         &#379;           (af)            (af)                 Ż                Z.          {\.{Z}}          Z
°         &#176;           (b0)            (b0)                 °                *           {\accent'27{}}   *
ą         &#261;           (b1)            (b9)                 ą                a,          {\c{a}}          a
˛         &#731;           (b2)            (b2)                 ˛                ,           {\c{}}           ,
ł         &#322;           (b3)            (b3)                 ł                l/          {\l}             l
´         &#180;           (b4)            (b4)                 ´                '           {\'{}}           '
ľ         &#318;           (b5)            (be)                 ľ                l"          {\v{l}}          l
ś         &#347;           (b6)            (9c)                 ś                s'          {\'{s}}          s
ˇ         &#711;           (b7)            (a1)                 ˇ                "           {\v{}}           "
¸         &#184;           (b8)            (b8)                 ¸                ,           {\c{}}           ,
š         &#353;           (b9)            (9a)                 š                s"          {\v{s}}          s
ş         &#351;           (ba)            (ba)                 ş                s,          {\c{s}}          s
ť         &#357;           (bb)            (9d)                 ť                t"          {\v{t}}          t
ź         &#378;           (bc)            (9f)                 ź                z'          {\'{z}}          z
˝         &#733;           (bd)            (bd)                 ˝                ;           {\H{}}           ;
ž         &#382;           (be)            (9e)                 ž                z"          {\v{z}}          z
ż         &#380;           (bf)            (bf)                 ż                z.          {\.{z}}          z
Ŕ         &#340;           (c0)            (c0)                 Ŕ                R'          {\'{R}}          R
Á         &#193;           (c1)            (c1)                 Á                A'          {\'{A}}          A
Â         &#194;           (c2)            (c2)                 Â                A^          {\^{A}}          A
Ă         &#258;           (c3)            (c3)                 Ă                A_          {\u{A}}          A
Ä         &#196;           (c4)            (c4)                 Ä                A:          {\"{A}}          A
Ĺ         &#313;           (c5)            (c5)                 Ĺ                L'          {\'{L}}          L
Ć         &#262;           (c6)            (c6)                 Ć                C'          {\'{C}}          C
Ç         &#199;           (c7)            (c7)                 Ç                C,          {\c{C}}          C
Č         &#268;           (c8)            (c8)                 Č                C"          {\v{C}}          C
É         &#201;           (c9)            (c9)                 É                E'          {\'{E}}          E
Ę         &#280;           (ca)            (ca)                 Ę                E,          {\c{E}}          E
Ë         &#203;           (cb)            (cb)                 Ë                E:          {\"{E}}          E
Ě         &#282;           (cc)            (cc)                 Ě                E"          {\v{E}}          E
Í         &#205;           (cd)            (cd)                 Í                I'          {\'{I}}          I
Î         &#206;           (ce)            (ce)                 Î                I^          {\^{I}}          I
Ď         &#270;           (cf)            (cf)                 Ď                D"          {\v{D}}          D
Đ         &#272;           (d0)            (d0)                 Đ                D/          {\D}             D
Ń         &#323;           (d1)            (d1)                 Ń                N'          {\'{N}}          N
Ň         &#327;           (d2)            (d2)                 Ň                N"          {\v{N}}          N
Ó         &#211;           (d3)            (d3)                 Ó                O'          {\'{O}}          O
Ô         &#212;           (d4)            (d4)                 Ô                O^          {\^{O}}          O
Ő         &#336;           (d5)            (d5)                 Ő                O;          {\H{O}}          O
Ö         &#214;           (d6)            (d6)                 Ö                O:          {\"{O}}          O
×         &#215;           (d7)            (d7)                 ×                x           {\times}         x
Ř         &#344;           (d8)            (d8)                 Ř                R"          {\v{R}}          R
Ů         &#366;           (d9)            (d9)                 Ů                U"          {\accent'27U}    U
Ú         &#218;           (da)            (da)                 Ú                U'          {\'{U}}          U
Ű         &#368;           (db)            (db)                 Ű                U;          {\H{U}}          U
Ü         &#220;           (dc)            (dc)                 Ü                U:          {\"{U}}          U
Ý         &#221;           (dd)            (dd)                 Ý                Y'          {\'{Y}}          Y
Ţ         &#354;           (de)            (de)                 Ţ                T,          {\c{T}}          T
ß         &#223;           (df)            (df)                 ß                s$          {\ss}            ss
ŕ         &#341;           (e0)            (e0)                 ŕ                r'          {\'{r}}          r
á         &#225;           (e1)            (e1)                 á                a'          {\'{a}}          a
â         &#226;           (e2)            (e2)                 â                a^          {\^{a}}          a
ă         &#259;           (e3)            (e3)                 ă                a_          {\u{a}}          a
ä         &#228;           (e4)            (e4)                 ä                a:          {\"{a}}          a
ĺ         &#314;           (e5)            (e5)                 ĺ                l'          {\'{l}}          l
ć         &#263;           (e6)            (e6)                 ć                c'          {\'{c}}          c
ç         &#231;           (e7)            (e7)                 ç                c,          {\c{c}}          c
č         &#269;           (e8)            (e8)                 č                c"          {\v{c}}          c
é         &#233;           (e9)            (e9)                 é                e'          {\'{e}}          e
ę         &#281;           (ea)            (ea)                 ę                e,          {\c{e}}          e
ë         &#235;           (eb)            (eb)                 ë                e:          {\"{e}}          e
ě         &#283;           (ec)            (ec)                 ě                e"          {\v{e}}          e
í         &#237;           (ed)            (ed)                 í                i'          {\'{\i}}         i
î         &#238;           (ee)            (ee)                 î                i^          {\^{\i}}         i
ď         &#271;           (ef)            (ef)                 ď                d"          {\v{d}}          d
đ         &#273;           (f0)            (f0)                 đ                d/          {\d}             d
ń         &#324;           (f1)            (f1)                 ń                n'          {\'{n}}          n
ň         &#328;           (f2)            (f2)                 ň                n"          {\v{n}}          n
ó         &#243;           (f3)            (f3)                 ó                o'          {\'{o}}          o
ô         &#244;           (f4)            (f4)                 ô                o^          {\^{o}}          o
ő         &#337;           (f5)            (f5)                 ő                o;          {\H{o}}          o
ö         &#246;           (f6)            (f6)                 ö                o:          {\"{o}}          o
÷         &#247;           (f7)            (f7)                 ÷                /           {\div}           /
ř         &#345;           (f8)            (f8)                 ř                r"          {\v{r}}          r
ů         &#367;           (f9)            (f9)                 ů                u"          {\accent'27u}    u
ú         &#250;           (fa)            (fa)                 ú                u'          {\'{u}}          u
ű         &#369;           (fb)            (fb)                 ű                u;          {\H{u}}          u
ü         &#252;           (fc)            (fc)                 ü                u:          {\"{u}}          u
ý         &#253;           (fd)            (fd)                 ý                y'          {\'{y}}          y
ţ         &#355;           (fe)            (fe)                 ţ                t,          {\c{t}}          t
˙         &#729;           (ff)            (ff)                 ˙                .           {\.{}}           .

[1] This is what you get when you select "Graphics Output" as the output code on the Morphological Analyzer data entry page (more precisely, when you select "Graphics Output (bold)").

[2] This column is currently for reference only. ISO 10646-1 is the emerging standard and is kept compatible with Unicode. In theory, every character in the world is assigned a numeric code in ISO 10646-1, so coding problems would disappear in the ideal situation where everyone sticks to this code and all software understands it. It is a multi-byte code, and on the network the high byte goes first (unlike the native little-endian byte order of x86 integers). As of today (2000), it appears that both the newest Internet Explorer (5.0) and Navigator/Communicator (4.7, or even any 4.x) understand these codes when they are embedded in an ordinary 8-bit data stream as so-called "numeric character references". In this column, each character is given as a decimal numeric character reference (for example, &#260; for Ą). Decimal is used because some browsers, such as Netscape 4.03, are unable to decode hexadecimal numeric character references. For more details on numeric character references in HTML 4, see the "HTML Document Representation" section of the HTML 4 specification. The important point is this: if your browser renders these references as exactly the characters shown in the first column, it understands ISO 10646-1 numeric character references.
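The decimal form described in the note above can be illustrated with a minimal Python sketch, assuming the input is a raw ISO 8859-2 byte stream. The function names are hypothetical and are not part of HB_CharCode8.pm; the sketch decodes the bytes with Python's built-in iso8859_2 codec and writes every non-ASCII character as &#NNN;.

```python
import html


def to_decimal_ncr(text: str) -> str:
    """Write every non-ASCII character as a decimal reference such as &#260;."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


def latin2_bytes_to_ncr(data: bytes) -> str:
    # Decode the 8-bit ISO 8859-2 stream to Unicode first, then emit references.
    return to_decimal_ncr(data.decode("iso8859_2"))


if __name__ == "__main__":
    ref = latin2_bytes_to_ncr(b"ko\xe8ka")  # 0xe8 is č in ISO 8859-2
    print(ref)                 # ko&#269;ka
    print(html.unescape(ref))  # kočka
```

Decoding in the other direction needs no table of its own here: html.unescape from the Python standard library already understands decimal references.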

[3] ISO 8859-2 is the original 8-bit code for Latin 2 and is nowadays the most widely used one on the web and elsewhere (except possibly in old documents created under MS/PC DOS or on Macintoshes). The code in parentheses is given in hexadecimal notation. If you see the characters displayed correctly in the ISO 8859-2 column, either your browser understands the charset argument in the document header (<META HTTP-EQUIV=.... CHARSET=iso-8859-2'>), or your browser misbehaves but you are on a Unix system with ISO 8859-2 fonts active; in either case, keep the iso-8859-2 coding checked (at least for display) in the word/text entry form. If you see the correct characters in the Windows CP 1250 column instead, you are working on a Windows machine and your browser is getting old; either upgrade it or manually select Windows CP 1250 coding in the word/text entry window. In other words, you should always see the correct characters in the ISO 8859-2 column and not in the Windows CP 1250 column, even on an MS Windows machine running Internet Explorer. The rows whose ISO 8859-2 and Windows CP 1250 codes differ (for example, Ą is a1 in ISO 8859-2 but a5 in CP 1250) let you check this quickly.
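To show how the two 8-bit codes relate, here is a hedged Python sketch (not the HBCode.pl implementation) that re-encodes ISO 8859-2 bytes as CP 1250 using Python's standard codecs and lists the byte values in the range a0 to ff where the two codes disagree. The helper names iso_to_cp1250 and differing_codes are hypothetical.

```python
def iso_to_cp1250(data: bytes) -> bytes:
    """Re-encode ISO 8859-2 bytes as Windows CP 1250 bytes."""
    # All printable ISO 8859-2 characters exist in CP 1250; the C1 control
    # bytes 0x80-0x9f would raise UnicodeEncodeError here.
    return data.decode("iso8859_2").encode("cp1250")


def differing_codes() -> list[tuple[int, int]]:
    """(ISO 8859-2 byte, CP 1250 byte) pairs that differ in 0xa0-0xff."""
    pairs = []
    for code in range(0xA0, 0x100):
        cp = iso_to_cp1250(bytes([code]))[0]
        if cp != code:
            pairs.append((code, cp))
    return pairs


if __name__ == "__main__":
    for iso, cp in differing_codes():
        print(f"{iso:02x} -> {cp:02x}")
```

Run as a script, it prints 15 pairs, starting with a1 -> a5 (Ą); these are exactly the rows of the table where the ISO 8859-2 and CP 1250 columns differ. Everything from c0 to ff is identical in both codes.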

[4] Additional encoding or decoding may take place (or be required, depending on the direction of conversion) for special characters such as &, <, and >.
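One ordering detail matters when this escaping is combined with numeric character references: & must be escaped before the references are generated, otherwise &#269; would itself become &amp;#269;. A hedged Python sketch using the standard html module and the built-in xmlcharrefreplace error handler (the function names are hypothetical):

```python
import html


def to_html_text(text: str) -> str:
    """Escape &, < and >, then write non-ASCII characters as &#NNN;."""
    # Order matters: escaping & after generating references would turn
    # "&#269;" into "&amp;#269;".
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


def from_html_text(markup: str) -> str:
    # Decodes named entities (&amp;) and numeric references (&#269;) alike.
    return html.unescape(markup)
```

For example, to_html_text("Tom & Jerry < kočka") yields "Tom &amp; Jerry &lt; ko&#269;ka", and from_html_text restores the original string.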

[5] Pseudocode is decoded only for characters whose second character is an apostrophe (') or a double quote ("). These correspond to the accented characters used in Czech. This might change in the future, but other conventions would then have to be introduced to keep decoding back to ISO Latin 2 unambiguous.
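The decoding rule in this note can be sketched in Python as follows. The two lookup tables are transcribed from the Pseudo column of the table for the rows whose second character is ' or "; the name decode_pseudo is hypothetical, and this is an illustration, not the library's actual code.

```python
import re

# Letter + ' -> Latin 2 character (Pseudo column, rows ending in ').
_APOSTROPHE = dict(zip("ACEILNORSUYZaceilnorsuyz",
                       "ÁĆÉÍĹŃÓŔŚÚÝŹáćéíĺńóŕśúýź"))
# Letter + " -> Latin 2 character (Pseudo column, rows ending in ").
_DOUBLE_QUOTE = dict(zip("CDELNRSTUZcdelnrstuz",
                         "ČĎĚĽŇŘŠŤŮŽčďěľňřšťůž"))
_PAIR = re.compile(r"([A-Za-z])(['\"])")


def decode_pseudo(text: str) -> str:
    """Decode letter+' and letter+" pairs; leave everything else as is."""
    def replace(m: re.Match) -> str:
        table = _APOSTROPHE if m.group(2) == "'" else _DOUBLE_QUOTE
        return table.get(m.group(1), m.group(0))
    return _PAIR.sub(replace, text)
```

For example, decode_pseudo('koc"ka') gives "kočka". Because every letter followed by ' or " is a candidate, ordinary apostrophes are decoded too (in this sketch "don't" becomes "dońt"), which shows why extending the scheme would need extra disambiguation conventions.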

[6] TeX and LaTeX offer additional ways of representing accented characters; every valid (La)TeX form should be decoded correctly back into ISO Latin 2 (including the odd space-separated accents), such as ko\v cka, ko\v {c}ka, ko{\v c}ka, and ko\v{c}ka for "kočka". The representation shown in the table is the preferred one and is used when coding from ISO Latin 2 to (La)TeX.
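A hedged Python sketch of such a decoder follows. It covers the accent commands \v, \', \", \^, \u, \H, \c and \. with a single-letter (or \i) argument, in all four forms listed above plus the preferred {\v{c}} form. It composes the letter with the matching Unicode combining mark and keeps the result only if it exists in ISO 8859-2; following the table, \c on A or E is read as an ogonek. The names are hypothetical, and the plain TeX \accent'27 form used for ů is not handled.

```python
import re
import unicodedata

# Accent command -> Unicode combining mark (\c is handled separately below).
_MARKS = {
    "v": "\u030c",  # caron (hacek)
    "'": "\u0301",  # acute
    '"': "\u0308",  # diaeresis
    "^": "\u0302",  # circumflex
    "u": "\u0306",  # breve
    "H": "\u030b",  # double acute
    ".": "\u0307",  # dot above
}

_ACCENT = re.compile(
    r"(?P<open>\{)?"                     # optional group brace: {\v c}
    r"\\(?P<cmd>[v'\"^uHc.])"            # the accent command
    r"(?:\s*\{(?P<a1>\\i|[A-Za-z])\}"    # \v{c} or \v {c}
    r"|\s+(?P<a2>\\i|[A-Za-z])"          # \v c (space-separated)
    r"|(?P<a3>\\i|[A-Za-z]))"            # \'e (valid after symbol commands only)
    r"(?(open)\})"                       # require the closing brace if opened
)


def _replace(m: re.Match) -> str:
    cmd = m.group("cmd")
    if m.group("a3") and cmd.isalpha():
        return m.group(0)  # "\vc" is a different control word, not \v + c
    arg = m.group("a1") or m.group("a2") or m.group("a3")
    base = "i" if arg == "\\i" else arg  # \i is the dotless i
    if cmd == "c":
        # The table uses \c for both ogonek (A, E) and cedilla (C, S, T).
        mark = "\u0328" if base in "AEae" else "\u0327"
    else:
        mark = _MARKS[cmd]
    char = unicodedata.normalize("NFC", base + mark)
    try:
        char.encode("iso8859_2")  # keep only results that exist in Latin 2
    except UnicodeEncodeError:
        return m.group(0)
    return char


def decode_tex(text: str) -> str:
    """Replace (La)TeX accent commands with ISO Latin 2 characters."""
    return _ACCENT.sub(_replace, text)
```

The conditional group (?(open)\}) keeps braces balanced: a leading { is consumed only when a matching } follows the argument, so {\v c} and \v{c} are both recognized without swallowing unrelated braces.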

[7] Decoding back to ISO Latin 2 is not possible with this encoding type, since the accents are simply dropped. On the Morphological Analyzer data entry page, however, it can still be specified, meaning that an "expanded" analysis of unaccented word forms is requested. The mechanism behind this is non-trivial and has nothing to do with character encoding schemes (in fact, a separate, special version of the dictionary is used for this purpose).
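To make the lossiness concrete, here is a hedged Python sketch of the "Without accents" column: a small hand-written map, transcribed from the table, for characters that Unicode does not decompose, and canonical decomposition (NFD) with the combining marks dropped for everything else. It is an illustration only, not the analyzer's mechanism; the name strip_accents is hypothetical.

```python
import unicodedata

# Characters with no canonical decomposition, mapped as in the
# "Without accents" column of the table.
_SPECIAL = {
    "Ł": "L", "ł": "l", "Đ": "D", "đ": "d", "ß": "ss",
    "\u00a0": " ", "\u00ad": "-", "§": "~", "×": "x", "÷": "/", "¤": "$",
    "˘": "_", "¨": ":", "°": "*", "˛": ",", "´": "'", "ˇ": '"',
    "¸": ",", "˝": ";", "˙": ".",
}


def strip_accents(text: str) -> str:
    """Lossy ISO Latin 2 -> unaccented conversion."""
    out = []
    for ch in text:
        if ch in _SPECIAL:
            out.append(_SPECIAL[ch])
        else:
            # NFD splits e.g. "č" into "c" + U+030C; drop the combining marks.
            out.extend(c for c in unicodedata.normalize("NFD", ch)
                       if not unicodedata.combining(c))
    return "".join(out)
```

Both strip_accents("kočka") and strip_accents("kocka") return "kocka", so the original word cannot be recovered from the output alone.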

[8] This library will shortly be made available from UFAL.