Given a term t in a language L, the URI is constructed as follows. First, t is encoded using Unicode, and the NFC normalization procedure is applied to ensure a unique representation: conventional unnormalized Unicode allows a character such as "à" to be encoded either in composed form (the single code point U+00E0) or in decomposed form (U+0061 followed by the combining grave accent U+0300). […] %4D, with the respective octet value stored as two upper-case hexadecimal digits. Finally, the base address http://lexvo.org/id/term/ and the ISO 639-3 code for the language L, followed by the "/" character, are prepended to this path segment to obtain a complete URI. A term URI abiding by this specification refers to the term t in language L.
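The following Java sketch illustrates the term URI construction. It is not Lexvo.org's own Java API, and the class and method names are hypothetical. Beyond what the text states, it assumes that the NFC-normalized term is encoded as UTF-8 octets and that only the RFC 3986 unreserved characters (A-Z, a-z, 0-9, "-", ".", "_", "~") are left unescaped. Every other octet is written in the %XX notation with upper-case hexadecimal digits.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper illustrating term URI construction.
// ASSUMPTIONS: NFC-normalized text is encoded as UTF-8, and only RFC 3986
// unreserved characters (A-Z a-z 0-9 - . _ ~) are left unescaped.
final class LexvoTermUri {
    private static final String BASE = "http://lexvo.org/id/term/";

    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Percent-encodes every octet outside the unreserved set as %XX (upper-case hex).
    static String percentEncode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // Java bytes are signed; use the unsigned octet value
            if (isUnreserved(b)) {
                sb.append((char) b);
            } else {
                sb.append(String.format("%%%02X", b));
            }
        }
        return sb.toString();
    }

    // term: the term t; iso639_3: the ISO 639-3 code of its language L
    static String termUri(String term, String iso639_3) {
        String nfc = Normalizer.normalize(term, Normalizer.Form.NFC);
        return BASE + iso639_3 + "/" + percentEncode(nfc);
    }

    public static void main(String[] args) {
        // "a" followed by U+0300 COMBINING GRAVE ACCENT is composed to U+00E0 by NFC
        System.out.println(termUri("a\u0300", "fra"));
    }
}
```

Under these assumptions, the composed and the decomposed spellings of French "à" both map to http://lexvo.org/id/term/fra/%C3%A0. This is exactly what the NFC step is meant to guarantee.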
Language URIs consist of the base address http://lexvo.org/id/iso639-3/ followed by a valid three-letter ISO 639-3 language code that is not defined as a special code. A language URI abiding by this specification refers to the language denoted by the language code according to the ISO 639-3 standard. Additionally, because many systems use two-letter ISO 639-1 codes instead of three-letter ISO 639-3 codes, we also provide equivalent URIs consisting of the base address http://lexvo.org/id/iso639-1/ followed by a two-letter ISO 639-1 code.
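A minimal Java sketch of building both kinds of language URI follows. The names are hypothetical. It assumes lower-case codes and takes the ISO 639-3 special codes to be mis, mul, und, and zxx. It checks only the code's shape and does not validate the code against the ISO 639 registry.

```java
import java.util.Set;

// Hypothetical helper for language URIs.
// ASSUMPTIONS: codes are lower case; the excluded ISO 639-3 special codes
// are mis, mul, und, zxx. Only the shape of the code is checked here.
final class LexvoLanguageUri {
    private static final Set<String> SPECIAL = Set.of("mis", "mul", "und", "zxx");

    static String iso639_3Uri(String code) {
        if (!code.matches("[a-z]{3}") || SPECIAL.contains(code)) {
            throw new IllegalArgumentException("not a usable ISO 639-3 code: " + code);
        }
        return "http://lexvo.org/id/iso639-3/" + code;
    }

    static String iso639_1Uri(String code) {
        if (!code.matches("[a-z]{2}")) {
            throw new IllegalArgumentException("not an ISO 639-1 code: " + code);
        }
        return "http://lexvo.org/id/iso639-1/" + code;
    }

    public static void main(String[] args) {
        System.out.println(iso639_3Uri("eng")); // http://lexvo.org/id/iso639-3/eng
        System.out.println(iso639_1Uri("en"));  // http://lexvo.org/id/iso639-1/en
    }
}
```

A full implementation would also map each two-letter code to its three-letter equivalent, since the ISO 639-1 URIs are described as equivalent. That mapping table is not reproduced here.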
Script URIs consist of the base address http://lexvo.org/id/script/ followed by an ISO 15924 script code other than Zxxx, Zyyy, and Zzzz. A script URI abiding by this specification refers to the script denoted by the code according to the ISO 15924 standard.
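The exclusion rule can be sketched in Java as follows. The helper name is hypothetical. The sketch assumes the usual four-letter ISO 15924 spelling with an initial capital (e.g. "Latn", "Cyrl") and does not check the code against the ISO 15924 registry.

```java
import java.util.Set;

// Hypothetical helper for script URIs.
// ASSUMPTION: codes use the four-letter ISO 15924 form with an initial capital.
final class LexvoScriptUri {
    private static final Set<String> EXCLUDED = Set.of("Zxxx", "Zyyy", "Zzzz");

    static String scriptUri(String code) {
        if (!code.matches("[A-Z][a-z]{3}") || EXCLUDED.contains(code)) {
            throw new IllegalArgumentException("not a usable ISO 15924 code: " + code);
        }
        return "http://lexvo.org/id/script/" + code;
    }

    public static void main(String[] args) {
        System.out.println(scriptUri("Latn")); // http://lexvo.org/id/script/Latn
    }
}
```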
Character URIs consist of the base address http://lexvo.org/id/char/, followed by a Unicode code point in upper-case hexadecimal notation, zero-padded to 4 digits if shorter than 4 digits and without additional zero-padding if longer. A character URI abiding by this specification refers to the character denoted by the code point according to the Unicode 5.0 standard.
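The padding rule maps directly onto Java's %04X format specifier. That specifier produces upper-case hexadecimal with a minimum width of four and leaves longer values unpadded. The helper name below is hypothetical.

```java
// Hypothetical helper for character URIs.
final class LexvoCharUri {
    private static final String BASE = "http://lexvo.org/id/char/";

    static String charUri(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a Unicode code point: " + codePoint);
        }
        // %04X: upper-case hex, zero-padded to at least 4 digits; longer values stay unpadded
        return BASE + String.format("%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(charUri('A'));     // http://lexvo.org/id/char/0041
        System.out.println(charUri(0x1F600)); // http://lexvo.org/id/char/1F600
    }
}
```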
Geographical URIs consist either of the base address http://lexvo.org/id/iso3166-1/ followed by an ISO 3166-1 alpha-2 code for countries, or of the base address http://lexvo.org/id/un_m49/ followed by a UN M.49 code for regions that are not countries (i.e., only for continents and other groupings).
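A rough Java sketch of the two geographical URI forms follows. The names are hypothetical. It assumes upper-case alpha-2 country codes and UN M.49 codes written as three digits with leading zeros, neither of which the text specifies. It does not check the rule that un_m49 URIs are used only for non-country regions, which would require the M.49 code list.

```java
// Hypothetical helper for geographical URIs.
// ASSUMPTIONS: alpha-2 codes are upper case; M.49 codes are three-digit strings
// (with leading zeros). The "regions that are not countries" rule is not checked.
final class LexvoGeoUri {
    static String countryUri(String alpha2) {
        if (!alpha2.matches("[A-Z]{2}")) {
            throw new IllegalArgumentException("not an ISO 3166-1 alpha-2 code: " + alpha2);
        }
        return "http://lexvo.org/id/iso3166-1/" + alpha2;
    }

    static String regionUri(String m49) {
        if (!m49.matches("[0-9]{3}")) {
            throw new IllegalArgumentException("not a three-digit UN M.49 code: " + m49);
        }
        return "http://lexvo.org/id/un_m49/" + m49;
    }

    public static void main(String[] args) {
        System.out.println(countryUri("DE"));  // http://lexvo.org/id/iso3166-1/DE
        System.out.println(regionUri("150"));  // Europe: http://lexvo.org/id/un_m49/150
    }
}
```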
WordNet URIs consist of the base address http://lexvo.org/id/wordnet/30/, followed by a part-of-speech indicator ("noun/", "verb/", "adj/", or "adv/") and a sense key. The sense keys are similar to WordNet's original sense keys but use the following format: lemma + "_" + lex_filenum + "_" + lex_id [+ "_" + head_word + "_" + head_id], where lemma and head_word are encoded using percent-encoding as per RFC 3986. These URIs identify not the synsets themselves but the denotational meanings associated with the synsets, just as Lexvo.org's language URIs identify not the corresponding language codes themselves but the actual languages.
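The sense-key format can be assembled as sketched below, with the optional bracketed part handled by an overload. The names are hypothetical. The sketch assumes that lex_filenum, lex_id, and head_id are passed as strings already in the form they should take in the key, because the text does not say whether they are zero-padded. It also assumes that percent-encoding leaves only RFC 3986 unreserved characters unescaped after UTF-8 encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for WordNet 3.0 sense URIs.
// ASSUMPTIONS: numeric fields are passed as preformatted strings; lemma and
// head_word are UTF-8 encoded, keeping only RFC 3986 unreserved characters.
final class LexvoWordNetUri {
    private static final String BASE = "http://lexvo.org/id/wordnet/30/";

    private static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            boolean unreserved = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                    || (b >= '0' && b <= '9') || b == '-' || b == '.' || b == '_' || b == '~';
            sb.append(unreserved ? String.valueOf((char) b) : String.format("%%%02X", b));
        }
        return sb.toString();
    }

    private static String checkPos(String pos) {
        if (!(pos.equals("noun") || pos.equals("verb") || pos.equals("adj") || pos.equals("adv"))) {
            throw new IllegalArgumentException("unknown part of speech: " + pos);
        }
        return pos;
    }

    // lemma + "_" + lex_filenum + "_" + lex_id
    static String senseUri(String pos, String lemma, String lexFilenum, String lexId) {
        return BASE + checkPos(pos) + "/" + encode(lemma) + "_" + lexFilenum + "_" + lexId;
    }

    // ... + "_" + head_word + "_" + head_id (the optional bracketed part)
    static String senseUri(String pos, String lemma, String lexFilenum, String lexId,
                           String headWord, String headId) {
        return senseUri(pos, lemma, lexFilenum, lexId) + "_" + encode(headWord) + "_" + headId;
    }

    public static void main(String[] args) {
        System.out.println(senseUri("noun", "dog", "05", "00"));
    }
}
```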
Kangxi radicals are abstract entities associated with specific semantic components of Chinese characters. Lexvo.org's Kangxi radical URIs consist of the base address http://lexvo.org/id/kangxi-radical/, followed by a number from 1 to 214 giving the radical's number as used in the 1716 Kangxi dictionary.
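For completeness, a minimal Java sketch of the Kangxi radical URIs follows. The helper name is hypothetical, and it assumes the radical number is written in plain decimal without zero-padding, which the text does not state explicitly.

```java
// Hypothetical helper for Kangxi radical URIs.
// ASSUMPTION: the radical number is written in plain decimal, without padding.
final class LexvoKangxiUri {
    static String radicalUri(int radical) {
        if (radical < 1 || radical > 214) {
            throw new IllegalArgumentException("Kangxi radical numbers run from 1 to 214: " + radical);
        }
        return "http://lexvo.org/id/kangxi-radical/" + radical;
    }

    public static void main(String[] args) {
        System.out.println(radicalUri(1)); // http://lexvo.org/id/kangxi-radical/1
    }
}
```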
The classes and properties used are specified in the Lexvo.org Ontology in OWL/RDF.
A machine-readable dataset description in VoiD format is available.
We offer a very simple Java API that creates URIs for languages and terms.
Lexvo.org 2008-2016 Gerard de Melo.