Friday, February 5, 2010

ISO 639. Code for the representation of the names of languages

Code for the Representation of the Names of Languages. From ISO 639, revised 1989.


[2001-08-29] Note: See now the updated document "Language Identifiers in the Markup Context."


[December 20, 1997] Updated to reflect six changes from the "1989" revision by the ISO 639 Registration Authority; see the note at the end of the document itemizing the edits.

The partial listing of ISO 639 two-character codes supplied here supplements the shorter lists given in Martin Bryan (SGML: An Author's Guide to the Standard Generalized Markup Language, pp. 92-93) and Eric van Herwijnen (Practical SGML, pp. 67-68). The two-character language codes of ISO 639 are relevant to SGML encoding in two respects. First, the SGML standard (ISO 8879) itself specifies that the public text language should be declared using the language code(s) from ISO 639; see ISO 8879-1986(E), page 36, section 10.2.2.3. Second, the WSD (Writing System Declaration) implemented in the Text Encoding Initiative uses the [two-character] language code of ISO 639 (as amended) as the language.code attribute of the nat.language declaration, which specifies the language in which the WSD is written.

The information on 2-character language codes summarized below has been taken from ISO 639 Code for the representation of the names of languages. First edition, 1988-04-01. Reference number: ISO 639: 1988 (E/F). iii + 17 pages. ISO 639:1988 is a technical revision of ISO 639: 1967, prepared by Technical Committee ISO/TC 37. The language codes are listed in ISO 639 with lowercase letters, but are given here in uppercase, as recommended for use as SGML tag names for "public text language." See ISO 8879 section 10.2.2.3: "the 'public text language' must be a two-character name, entered with upper-case letters."
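In a formal public identifier the public text language is the final field, as in the familiar `-//W3C//DTD HTML 4.01//EN`. The sketch below is a simplified illustration, not a full SGML parser: it pulls out that last field and checks it against the two-character, upper-case rule quoted above. It assumes the identifier carries no optional public text display version, and the function name and the small code subset are hypothetical.

```python
import re

# ISO 8879 10.2.2.3 as quoted above: a two-character name in upper case.
PUBLIC_TEXT_LANGUAGE = re.compile(r"[A-Z]{2}")

# Hypothetical, illustrative subset of the ISO 639 codes listed below.
ISO639_SUBSET = {"DE", "EN", "ES", "FR", "HE", "IT", "YI"}


def public_text_language(fpi: str) -> str:
    """Return the public text language of a simplified formal public identifier.

    Assumes no public text display version follows the language, so the
    language is whatever comes after the last "//" separator.
    """
    language = fpi.rsplit("//", 1)[-1]
    if not PUBLIC_TEXT_LANGUAGE.fullmatch(language):
        raise ValueError(f"not a two-character upper-case name: {language!r}")
    if language not in ISO639_SUBSET:
        raise ValueError(f"not in the ISO 639 subset: {language!r}")
    return language


if __name__ == "__main__":
    print(public_text_language("-//W3C//DTD HTML 4.01//EN"))
```

A lower-case `en` is rejected here on purpose, since the rule quoted from ISO 8879 asks for upper-case letters even though ISO 639 prints its codes in lower case.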

ISO 639 contains much other information about the use of language symbols, registration of new symbols, etc. The language codes of ISO 639 are said to be "devised primarily for use in terminology, lexicography and linguistics, but they may be used for any application requiring the expression of languages in coded form." The registration authority for ISO 639 is given as Infoterm, Österreichisches Normungsinstitut (ON), Postfach 130, A-1021 Vienna, AUSTRIA.

The two-character language codes of ISO 639 are recognized as inadequate for use as SGML language attributes when tagging text, viz., for use as global lang attributes attached to any element to identify the language of the text element or a language shift. On lang as a global attribute, see the TEI Guidelines, page 45, section 3.2.1. In principle, there should be nothing wrong with tagging language using SGML elements rather than attributes, if the encoder has principled reasons for not using attributes (e.g., indexing engines which read simple tags but not SGML attributes).

But the two-character codes of ISO 639 are neither sufficiently mnemonic nor complete for the world's languages: whereas ISO 639 supplies codes for only about 136 languages, the Ethnologue published by the Summer Institute of Linguistics identifies over 6100 languages (see Ethnologue: Languages of the World, ed. Barbara Grimes. 11th edition. Dallas, TX: Summer Institute of Linguistics, 1988).

A revision of ISO 639 completed in late 1990 is described as supplying 3-character language codes (following the MARC 3-character language codes in part), based upon the code sequence of the American National Standard ANSI Z39.53. This draft will be circulated for worldwide review in 1991. It remains to be seen whether these new ISO 639 3-character codes are mnemonic enough for use in SGML tagging and whether the set is complete. Provisionally, and as a convenience, the set of 3-character MARC language codes is supplied in this appendix. Where they are mnemonic and unique and adequately distinguish dialectal variants, it would seem permissible to use them as lang attribute values or as language tags.
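To see why completeness matters in practice, the following sketch scans markup for global lang attributes and reports any value that is not among the ISO 639 two-character codes. It is a hypothetical helper, not part of the TEI Guidelines: it assumes XML-compatible markup (Python's xml.etree parses XML, not full SGML), uses only a small illustrative subset of the codes, and `xx` stands in for a language that has no two-character code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, illustrative subset of the ISO 639 two-character codes,
# in lower case as ISO 639 itself prints them.
ISO639_SUBSET = {"de", "en", "fr", "la", "sa"}

# Sample markup; "xx" is a placeholder for a language with no ISO 639 code.
SAMPLE = """<text lang="en">
  <p>The motto <q lang="la">festina lente</q> appears twice.</p>
  <p lang="xx">A language with no two-character code.</p>
</text>"""


def unknown_lang_values(markup: str, known: set) -> list:
    """Return (element tag, lang value) pairs whose value is not a known code.

    Case is ignored, so "EN" and "en" are treated alike.
    """
    root = ET.fromstring(markup)
    problems = []
    for element in root.iter():  # iter() also visits the root element
        value = element.get("lang")
        if value is not None and value.lower() not in known:
            problems.append((element.tag, value))
    return problems


if __name__ == "__main__":
    print(unknown_lang_values(SAMPLE, ISO639_SUBSET))
```

Run on the sample, only the second `p` element is reported, because `xx` has no counterpart in the code list; any language outside the roughly 136 codes would be flagged the same way.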

ISO 639 CODES ALPHABETIC BY LANGUAGE NAME (ENGLISH SPELLING)

LANGUAGE NAME     CODE  LANGUAGE FAMILY

ABKHAZIAN         AB    IBERO-CAUCASIAN
AFAN (OROMO)      OM    HAMITIC
AFAR              AA    HAMITIC
AFRIKAANS         AF    GERMANIC
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
ASSAMESE          AS    INDIAN
AYMARA            AY    AMERINDIAN
AZERBAIJANI       AZ    TURKIC/ALTAIC

BASHKIR           BA    TURKIC/ALTAIC
BASQUE            EU    BASQUE
BENGALI;BANGLA    BN    INDIAN
BHUTANI           DZ    ASIAN
BIHARI            BH    INDIAN
BISLAMA           BI    [not given]
BRETON            BR    CELTIC
BULGARIAN         BG    SLAVIC
BURMESE           MY    ASIAN
BYELORUSSIAN      BE    SLAVIC

CAMBODIAN         KM    ASIAN
CATALAN           CA    ROMANCE
CHINESE           ZH    ASIAN
CORSICAN          CO    ROMANCE
CROATIAN          HR    SLAVIC
CZECH             CS    SLAVIC

DANISH            DA    GERMANIC
DUTCH             NL    GERMANIC

ENGLISH           EN    GERMANIC
ESPERANTO         EO    INTERNATIONAL AUX.
ESTONIAN          ET    FINNO-UGRIC

FAROESE           FO    GERMANIC
FIJI              FJ    OCEANIC/INDONESIAN
FINNISH           FI    FINNO-UGRIC
FRENCH            FR    ROMANCE
FRISIAN           FY    GERMANIC

GALICIAN          GL    ROMANCE
GEORGIAN          KA    IBERO-CAUCASIAN
GERMAN            DE    GERMANIC
GREEK             EL    LATIN/GREEK
GREENLANDIC       KL    ESKIMO
GUARANI           GN    AMERINDIAN
GUJARATI          GU    INDIAN

HAUSA             HA    NEGRO-AFRICAN
HEBREW            HE    SEMITIC [*Changed 1989 from original ISO 639:1988, IW]
HINDI             HI    INDIAN
HUNGARIAN         HU    FINNO-UGRIC

ICELANDIC         IS    GERMANIC
INDONESIAN        ID    OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN]
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
INUKTITUT         IU    [ ]
INUPIAK           IK    ESKIMO
IRISH             GA    CELTIC
ITALIAN           IT    ROMANCE

JAPANESE          JA    ASIAN
JAVANESE          JV    OCEANIC/INDONESIAN

KANNADA           KN    DRAVIDIAN
KASHMIRI          KS    INDIAN
KAZAKH            KK    TURKIC/ALTAIC
KINYARWANDA       RW    NEGRO-AFRICAN
KIRGHIZ           KY    TURKIC/ALTAIC
KOREAN            KO    ASIAN
KURDISH           KU    IRANIAN
KURUNDI           RN    NEGRO-AFRICAN

LAOTHIAN          LO    ASIAN
LATIN             LA    LATIN/GREEK
LATVIAN;LETTISH   LV    BALTIC
LINGALA           LN    NEGRO-AFRICAN
LITHUANIAN        LT    BALTIC

MACEDONIAN        MK    SLAVIC
MALAGASY          MG    OCEANIC/INDONESIAN
MALAY             MS    OCEANIC/INDONESIAN
MALAYALAM         ML    DRAVIDIAN
MALTESE           MT    SEMITIC
MAORI             MI    OCEANIC/INDONESIAN
MARATHI           MR    INDIAN
MOLDAVIAN         MO    ROMANCE
MONGOLIAN         MN    [not given]

NAURU             NA    [not given]
NEPALI            NE    INDIAN
NORWEGIAN         NO    GERMANIC

OCCITAN           OC    ROMANCE
ORIYA             OR    INDIAN

PASHTO;PUSHTO     PS    IRANIAN
PERSIAN (farsi)   FA    IRANIAN
POLISH            PL    SLAVIC
PORTUGUESE        PT    ROMANCE
PUNJABI           PA    INDIAN

QUECHUA           QU    AMERINDIAN

RHAETO-ROMANCE    RM    ROMANCE
ROMANIAN          RO    ROMANCE
RUSSIAN           RU    SLAVIC

SAMOAN            SM    OCEANIC/INDONESIAN
SANGHO            SG    NEGRO-AFRICAN
SANSKRIT          SA    INDIAN
SCOTS GAELIC      GD    CELTIC
SERBIAN           SR    SLAVIC
SERBO-CROATIAN    SH    SLAVIC
SESOTHO           ST    NEGRO-AFRICAN
SETSWANA          TN    NEGRO-AFRICAN
SHONA             SN    NEGRO-AFRICAN
SINDHI            SD    INDIAN
SINGHALESE        SI    INDIAN
SISWATI           SS    NEGRO-AFRICAN
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
SOMALI            SO    HAMITIC
SPANISH           ES    ROMANCE
SUNDANESE         SU    OCEANIC/INDONESIAN
SWAHILI           SW    NEGRO-AFRICAN
SWEDISH           SV    GERMANIC

TAGALOG           TL    OCEANIC/INDONESIAN
TAJIK             TG    IRANIAN
TAMIL             TA    DRAVIDIAN
TATAR             TT    TURKIC/ALTAIC
TELUGU            TE    DRAVIDIAN
THAI              TH    ASIAN
TIBETAN           BO    ASIAN
TIGRINYA          TI    SEMITIC
TONGA             TO    OCEANIC/INDONESIAN
TSONGA            TS    NEGRO-AFRICAN
TURKISH           TR    TURKIC/ALTAIC
TURKMEN           TK    TURKIC/ALTAIC
TWI               TW    NEGRO-AFRICAN

UIGUR             UG    [ ]
UKRAINIAN         UK    SLAVIC
URDU              UR    INDIAN
UZBEK             UZ    TURKIC/ALTAIC

VIETNAMESE        VI    ASIAN
VOLAPUK           VO    INTERNATIONAL AUX.

WELSH             CY    CELTIC
WOLOF             WO    NEGRO-AFRICAN

XHOSA             XH    NEGRO-AFRICAN

YIDDISH           YI    GERMANIC [*Changed 1989 from original ISO 639:1988, JI]
YORUBA            YO    NEGRO-AFRICAN

ZHUANG            ZA    [ ]
ZULU              ZU    NEGRO-AFRICAN


ISO 639 CODES SORTED BY LANGUAGE CODE

LANGUAGE NAME     CODE  LANGUAGE FAMILY

AFAR              AA    HAMITIC
ABKHAZIAN         AB    IBERO-CAUCASIAN
AFRIKAANS         AF    GERMANIC
AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
ASSAMESE          AS    INDIAN
AYMARA            AY    AMERINDIAN
AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
BYELORUSSIAN      BE    SLAVIC
BULGARIAN         BG    SLAVIC
BIHARI            BH    INDIAN
BISLAMA           BI    [not given]
BENGALI;BANGLA    BN    INDIAN
TIBETAN           BO    ASIAN
BRETON            BR    CELTIC
CATALAN           CA    ROMANCE
CORSICAN          CO    ROMANCE
CZECH             CS    SLAVIC
WELSH             CY    CELTIC
DANISH            DA    GERMANIC
GERMAN            DE    GERMANIC
BHUTANI           DZ    ASIAN
GREEK             EL    LATIN/GREEK
ENGLISH           EN    GERMANIC
ESPERANTO         EO    INTERNATIONAL AUX.
SPANISH           ES    ROMANCE
ESTONIAN          ET    FINNO-UGRIC
BASQUE            EU    BASQUE
PERSIAN (farsi)   FA    IRANIAN
FINNISH           FI    FINNO-UGRIC
FIJI              FJ    OCEANIC/INDONESIAN
FAROESE           FO    GERMANIC
FRENCH            FR    ROMANCE
FRISIAN           FY    GERMANIC
IRISH             GA    CELTIC
SCOTS GAELIC      GD    CELTIC
GALICIAN          GL    ROMANCE
GUARANI           GN    AMERINDIAN
GUJARATI          GU    INDIAN
HAUSA             HA    NEGRO-AFRICAN
HEBREW            HE    SEMITIC [*Changed 1989 from original ISO 639:1988, IW]
HINDI             HI    INDIAN
CROATIAN          HR    SLAVIC
HUNGARIAN         HU    FINNO-UGRIC
ARMENIAN          HY    INDO-EUROPEAN (OTHER)
INTERLINGUA       IA    INTERNATIONAL AUX.
INDONESIAN        ID    OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN]
INTERLINGUE       IE    INTERNATIONAL AUX.
INUPIAK           IK    ESKIMO
ICELANDIC         IS    GERMANIC
ITALIAN           IT    ROMANCE
INUKTITUT         IU    [ ]
JAPANESE          JA    ASIAN
JAVANESE          JV    OCEANIC/INDONESIAN
GEORGIAN          KA    IBERO-CAUCASIAN
KAZAKH            KK    TURKIC/ALTAIC
GREENLANDIC       KL    ESKIMO
CAMBODIAN         KM    ASIAN
KANNADA           KN    DRAVIDIAN
KOREAN            KO    ASIAN
KASHMIRI          KS    INDIAN
KURDISH           KU    IRANIAN
KIRGHIZ           KY    TURKIC/ALTAIC
LATIN             LA    LATIN/GREEK
LINGALA           LN    NEGRO-AFRICAN
LAOTHIAN          LO    ASIAN
LITHUANIAN        LT    BALTIC
LATVIAN;LETTISH   LV    BALTIC
MALAGASY          MG    OCEANIC/INDONESIAN
MAORI             MI    OCEANIC/INDONESIAN
MACEDONIAN        MK    SLAVIC
MALAYALAM         ML    DRAVIDIAN
MONGOLIAN         MN    [not given]
MOLDAVIAN         MO    ROMANCE
MARATHI           MR    INDIAN
MALAY             MS    OCEANIC/INDONESIAN
MALTESE           MT    SEMITIC
BURMESE           MY    ASIAN
NAURU             NA    [not given]
NEPALI            NE    INDIAN
DUTCH             NL    GERMANIC
NORWEGIAN         NO    GERMANIC
OCCITAN           OC    ROMANCE
AFAN (OROMO)      OM    HAMITIC
ORIYA             OR    INDIAN
PUNJABI           PA    INDIAN
POLISH            PL    SLAVIC
PASHTO;PUSHTO     PS    IRANIAN
PORTUGUESE        PT    ROMANCE
QUECHUA           QU    AMERINDIAN
RHAETO-ROMANCE    RM    ROMANCE
KURUNDI           RN    NEGRO-AFRICAN
ROMANIAN          RO    ROMANCE
RUSSIAN           RU    SLAVIC
KINYARWANDA       RW    NEGRO-AFRICAN
SANSKRIT          SA    INDIAN
SINDHI            SD    INDIAN
SANGHO            SG    NEGRO-AFRICAN
SERBO-CROATIAN    SH    SLAVIC
SINGHALESE        SI    INDIAN
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
SAMOAN            SM    OCEANIC/INDONESIAN
SHONA             SN    NEGRO-AFRICAN
SOMALI            SO    HAMITIC
ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
SERBIAN           SR    SLAVIC
SISWATI           SS    NEGRO-AFRICAN
SESOTHO           ST    NEGRO-AFRICAN
SUNDANESE         SU    OCEANIC/INDONESIAN
SWEDISH           SV    GERMANIC
SWAHILI           SW    NEGRO-AFRICAN
TAMIL             TA    DRAVIDIAN
TELUGU            TE    DRAVIDIAN
TAJIK             TG    IRANIAN
THAI              TH    ASIAN
TIGRINYA          TI    SEMITIC
TURKMEN           TK    TURKIC/ALTAIC
TAGALOG           TL    OCEANIC/INDONESIAN
SETSWANA          TN    NEGRO-AFRICAN
TONGA             TO    OCEANIC/INDONESIAN
TURKISH           TR    TURKIC/ALTAIC
TSONGA            TS    NEGRO-AFRICAN
TATAR             TT    TURKIC/ALTAIC
TWI               TW    NEGRO-AFRICAN
UIGUR             UG    [ ]
UKRAINIAN         UK    SLAVIC
URDU              UR    INDIAN
UZBEK             UZ    TURKIC/ALTAIC
VIETNAMESE        VI    ASIAN
VOLAPUK           VO    INTERNATIONAL AUX.
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YIDDISH           YI    GERMANIC [*Changed 1989 from original ISO 639:1988, JI]
YORUBA            YO    NEGRO-AFRICAN
ZHUANG            ZA    [ ]
CHINESE           ZH    ASIAN
ZULU              ZU    NEGRO-AFRICAN


ISO 639 LANGUAGE CODES SORTED BY LANGUAGE GROUP

LANGUAGE NAME     CODE  LANGUAGE FAMILY

AYMARA            AY    AMERINDIAN
GUARANI           GN    AMERINDIAN
QUECHUA           QU    AMERINDIAN

BHUTANI           DZ    ASIAN
BURMESE           MY    ASIAN
CAMBODIAN         KM    ASIAN
CHINESE           ZH    ASIAN
JAPANESE          JA    ASIAN
KOREAN            KO    ASIAN
LAOTHIAN          LO    ASIAN
THAI              TH    ASIAN
TIBETAN           BO    ASIAN
VIETNAMESE        VI    ASIAN

LATVIAN;LETTISH   LV    BALTIC
LITHUANIAN        LT    BALTIC

BASQUE            EU    BASQUE

BRETON            BR    CELTIC
IRISH             GA    CELTIC
SCOTS GAELIC      GD    CELTIC
WELSH             CY    CELTIC

KANNADA           KN    DRAVIDIAN
MALAYALAM         ML    DRAVIDIAN
TAMIL             TA    DRAVIDIAN
TELUGU            TE    DRAVIDIAN

GREENLANDIC       KL    ESKIMO
INUPIAK           IK    ESKIMO

ESTONIAN          ET    FINNO-UGRIC
FINNISH           FI    FINNO-UGRIC
HUNGARIAN         HU    FINNO-UGRIC

AFRIKAANS         AF    GERMANIC
DANISH            DA    GERMANIC
DUTCH             NL    GERMANIC
ENGLISH           EN    GERMANIC
FAROESE           FO    GERMANIC
FRISIAN           FY    GERMANIC
GERMAN            DE    GERMANIC
ICELANDIC         IS    GERMANIC
NORWEGIAN         NO    GERMANIC
SWEDISH           SV    GERMANIC
YIDDISH           YI    GERMANIC [*Changed 1989 from original ISO 639:1988, JI]

AFAN (OROMO)      OM    HAMITIC
AFAR              AA    HAMITIC
SOMALI            SO    HAMITIC

ABKHAZIAN         AB    IBERO-CAUCASIAN
GEORGIAN          KA    IBERO-CAUCASIAN

ASSAMESE          AS    INDIAN
BENGALI;BANGLA    BN    INDIAN
BIHARI            BH    INDIAN
GUJARATI          GU    INDIAN
HINDI             HI    INDIAN
KASHMIRI          KS    INDIAN
MARATHI           MR    INDIAN
NEPALI            NE    INDIAN
ORIYA             OR    INDIAN
PUNJABI           PA    INDIAN
SANSKRIT          SA    INDIAN
SINDHI            SD    INDIAN
SINGHALESE        SI    INDIAN
URDU              UR    INDIAN

ALBANIAN          SQ    INDO-EUROPEAN (OTHER)
ARMENIAN          HY    INDO-EUROPEAN (OTHER)

ESPERANTO         EO    INTERNATIONAL AUX.
INTERLINGUA       IA    INTERNATIONAL AUX.
INTERLINGUE       IE    INTERNATIONAL AUX.
VOLAPUK           VO    INTERNATIONAL AUX.

KURDISH           KU    IRANIAN
PASHTO;PUSHTO     PS    IRANIAN
PERSIAN (farsi)   FA    IRANIAN
TAJIK             TG    IRANIAN

GREEK             EL    LATIN/GREEK
LATIN             LA    LATIN/GREEK

HAUSA             HA    NEGRO-AFRICAN
KINYARWANDA       RW    NEGRO-AFRICAN
KURUNDI           RN    NEGRO-AFRICAN
LINGALA           LN    NEGRO-AFRICAN
SANGHO            SG    NEGRO-AFRICAN
SESOTHO           ST    NEGRO-AFRICAN
SETSWANA          TN    NEGRO-AFRICAN
SHONA             SN    NEGRO-AFRICAN
SISWATI           SS    NEGRO-AFRICAN
SWAHILI           SW    NEGRO-AFRICAN
TSONGA            TS    NEGRO-AFRICAN
TWI               TW    NEGRO-AFRICAN
WOLOF             WO    NEGRO-AFRICAN
XHOSA             XH    NEGRO-AFRICAN
YORUBA            YO    NEGRO-AFRICAN
ZULU              ZU    NEGRO-AFRICAN

FIJI              FJ    OCEANIC/INDONESIAN
INDONESIAN        ID    OCEANIC/INDONESIAN [*Changed 1989 from original ISO 639:1988, IN]
JAVANESE          JV    OCEANIC/INDONESIAN
MALAGASY          MG    OCEANIC/INDONESIAN
MALAY             MS    OCEANIC/INDONESIAN
MAORI             MI    OCEANIC/INDONESIAN
SAMOAN            SM    OCEANIC/INDONESIAN
SUNDANESE         SU    OCEANIC/INDONESIAN
TAGALOG           TL    OCEANIC/INDONESIAN
TONGA             TO    OCEANIC/INDONESIAN

CATALAN           CA    ROMANCE
CORSICAN          CO    ROMANCE
FRENCH            FR    ROMANCE
GALICIAN          GL    ROMANCE
ITALIAN           IT    ROMANCE
MOLDAVIAN         MO    ROMANCE
OCCITAN           OC    ROMANCE
PORTUGUESE        PT    ROMANCE
RHAETO-ROMANCE    RM    ROMANCE
ROMANIAN          RO    ROMANCE
SPANISH           ES    ROMANCE

AMHARIC           AM    SEMITIC
ARABIC            AR    SEMITIC
HEBREW            HE    SEMITIC [*Changed 1989 from original ISO 639:1988, IW]
MALTESE           MT    SEMITIC
TIGRINYA          TI    SEMITIC

BULGARIAN         BG    SLAVIC
BYELORUSSIAN      BE    SLAVIC
CROATIAN          HR    SLAVIC
CZECH             CS    SLAVIC
MACEDONIAN        MK    SLAVIC
POLISH            PL    SLAVIC
RUSSIAN           RU    SLAVIC
SERBIAN           SR    SLAVIC
SERBO-CROATIAN    SH    SLAVIC
SLOVAK            SK    SLAVIC
SLOVENIAN         SL    SLAVIC
UKRAINIAN         UK    SLAVIC

AZERBAIJANI       AZ    TURKIC/ALTAIC
BASHKIR           BA    TURKIC/ALTAIC
KAZAKH            KK    TURKIC/ALTAIC
KIRGHIZ           KY    TURKIC/ALTAIC
TATAR             TT    TURKIC/ALTAIC
TURKISH           TR    TURKIC/ALTAIC
TURKMEN           TK    TURKIC/ALTAIC
UZBEK             UZ    TURKIC/ALTAIC

BISLAMA           BI    [not given]
MONGOLIAN         MN    [not given]
NAURU             NA    [not given]



Changes made December 20, 1997, based upon information in the following note from a member of the W3C HTML group:

"In 1989, the ISO 639 Registration Authority changed a number of codes
as follows (the quote is taken from RFC 1766):

: The following codes have been added in 1989 (nothing later): ug
: (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew,
: replacing iw), yi (Yiddish, replacing ji), and id (Indonesian,
: replacing in)."

Hence these changes in the listings above (assignment of UIGUR,
INUKTITUT and ZHUANG to a 'LANGUAGE FAMILY' to be determined):

HEBREW HE SEMITIC (3 occurrences, replacing IW with HE)
YIDDISH YI GERMANIC (3 occurrences, replacing JI with YI)
INDONESIAN ID OCEANIC/INDONESIAN (3 occurrences, replacing IN with ID)
UIGUR UG [ ] (2 occurrences added)
INUKTITUT IU [ ] (2 occurrences added)
ZHUANG ZA [ ] (2 occurrences added)
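A converter handling data keyed to the printed 1988 codes might apply these 1989 replacements mechanically. The sketch below is an illustration, not part of ISO 639 or RFC 1766; the function and constant names are hypothetical, and codes are upper-cased to match the SGML usage described at the start of this document.

```python
# The three 1989 replacements quoted above from RFC 1766 (old code -> new code).
REPLACED_1989 = {"IW": "HE", "JI": "YI", "IN": "ID"}

# All six codes the RFC 1766 quote lists as added in 1989.
ADDED_1989 = {"UG", "IU", "ZA", "HE", "YI", "ID"}


def update_1988_code(code: str) -> str:
    """Return the code in upper case, replacing a withdrawn 1988 code with its 1989 successor."""
    upper = code.upper()
    return REPLACED_1989.get(upper, upper)


def added_in_1989(code: str) -> bool:
    """Return True if the code is one of those introduced by the 1989 changes."""
    return code.upper() in ADDED_1989


if __name__ == "__main__":
    for old in ("iw", "ji", "in", "en"):
        print(old, "->", update_1988_code(old))
```

Run as a script, this prints `iw -> HE`, `ji -> YI`, `in -> ID` and `en -> EN`; codes that were not changed simply pass through in upper case.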

Additional Note 2001-08-29

The provisional/draft (informative) "Annex B" in ISO 639-1:2001
(FDIS) offers these clarifications:

From: http://www.rtt.org/ISO/TC37/SC2/WG1/639/639-1-FDIS-x-2001-02-09.htm

Changes from ISO 639:1988 to ISO 639-1:2001
This annex lists all languages that have been added since the
publication of ISO 639:1988. Modifications to the names of the
languages are not included.

Three language identifiers were changed in 1989. The changes were
publicised, but they have not been included in printed versions of ISO
639. These changes are:

The identifier for Hebrew was changed from "iw" to "he".
The identifier for Indonesian was changed from "in" to "id".
The identifier for Yiddish was changed from "ji" to "yi".
In addition, ISO 639:1988 contains one error. The identifier for
Javanese is rendered as "jw" in table 1, while it is correctly
given as "jv" in the other tables.

Document URI: http://xml.coverpages.org/iso639a.html
Robin Cover, Editor: robin@oasis-open.org

