While much of the Web consists of text intended for human readers, the Semantic Web is an effort to provide information on the Web that machines can easily process in order to accomplish useful things. A URI can be regarded as an ID, an identifier for things on the Web like websites, real-world objects, or even somewhat more abstract entities like words, concepts, and languages. Please refer to Wikipedia for more information.
The name Lexvo is derived from the Ancient Greek λεξικόν (lexicon) and the Latin vocabularium (or vocabulary). It is the name of a project that aims to provide lexicon-related services on the Web. The Lexvo.org Linked Data URIs are the first of these services.
Yes, using a normal web browser you can access the URIs and receive a human-readable web page describing an entity (e.g. a word or a language). Alternatively, clients can use HTTP content negotiation to request a machine-readable RDF representation of pertinent information about an entity.
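As a rough illustration of content negotiation, the sketch below builds an HTTP request that asks for RDF instead of HTML. It assumes the server honours the standard RDF/XML media type `application/rdf+xml`; the helper names `build_rdf_request` and `fetch_rdf` are hypothetical and not part of any Lexvo.org API.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that prefers RDF/XML.

    Assumption: the server answers the standard RDF/XML media type.
    """
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body.

    urllib follows HTTP redirects automatically, so a redirect from an
    identifier URI to a document URI is handled transparently.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()
```

Calling `fetch_rdf("http://lexvo.org/id/iso639-3/fra")` would then attempt to retrieve the machine-readable description of French, provided the network and the server are available.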
Should I use http://lexvo.org/id/... or http://www.lexvo.org/page/...?
You should use the URIs starting with http://lexvo.org/id/ as identifiers to refer to the language-related entities. The URIs starting with http://www.lexvo.org/page/ instead refer only to web pages that happen to be about the respective entities. If a web browser accesses a URI starting with http://www.lexvo.org/id/, it will automatically be redirected to the corresponding web page.
String literals cannot serve as subjects of an RDF triple, so there is no convenient way to express knowledge about words using string literals. To express lexical knowledge, several ontologies have instead defined OWL classes that represent words or other terms in a language. However, the URIs for individual terms, i.e. the instances of such classes, were often created on an ad hoc basis when needed. For instance, the W3C draft RDF/OWL Representation of WordNet defined URIs for the words covered by the WordNet lexical database, but not for other words. Lexvo.org addresses this by providing predictable URIs for words in any ISO 639-3 language.
Linking to term URIs is especially useful for establishing the meaning of a non-information resource URI more clearly. For example, we might have a URI such as <http://www.some.org/#Frankfurt> that is supposed to refer to the city of Frankfurt in Germany. However, we should rely on factual data rather than mere appearances to derive this meaning, because it should not matter to us whether the URI is named <http://www.some.org/#Frankfurt> or <http://www.some.org/#City348914>.
One way of doing so is to clarify the meaning using a lexicalization relation:
<http://www.some.org/#Frankfurt> <lexvo:lexicalization> <lexvo:term/deu/Frankfurt%20am%20Main>
or
<http://www.some.org/#City348914> <lexvo:lexicalization> <lexvo:term/deu/Frankfurt%20am%20Main>
Now, it is clear in both cases that the URI can only denote entities that are called "Frankfurt am Main" in German.
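The triples above could be generated programmatically along the following lines. This is only a sketch: the abbreviated `lexvo:` forms are expanded using two assumed base URIs (`TERM_BASE` and `LEXICALIZATION`), which should be checked against the Technical Details page, and the function names are hypothetical.

```python
from urllib.parse import quote

# Assumptions: expansions of the abbreviated lexvo: forms shown above.
TERM_BASE = "http://lexvo.org/id/term/"
LEXICALIZATION = "http://lexvo.org/ontology#lexicalization"


def term_uri(lang_code: str, term: str) -> str:
    """Return a term URI from an ISO 639-3 code and a percent-encoded term."""
    # safe="" makes quote() encode every reserved character, e.g. space -> %20.
    return f"{TERM_BASE}{lang_code}/{quote(term, safe='')}"


def lexicalization_triple(subject_uri: str, lang_code: str, term: str) -> str:
    """Return one N-Triples line linking a resource to a term."""
    return f"<{subject_uri}> <{LEXICALIZATION}> <{term_uri(lang_code, term)}> ."


for subject in ("http://www.some.org/#Frankfurt", "http://www.some.org/#City348914"):
    print(lexicalization_triple(subject, "deu", "Frankfurt am Main"))
```

Whatever the subject URI happens to be called, both statements point at the same term URI, which is what makes the intended meaning explicit.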
Different levels of abstraction can be chosen. We made a pragmatic choice to consider two term entities distinct if the strings are different after Unicode NFC normalization, or if the ISO 639-3 codes differ, which is similar to the RDF semantics for literals. Thus we do not distinguish the meanings of polysemous words in a language, e.g. the verb and noun meanings of the English term "call". In contrast, we do consider the Italian term "burro", which means butter, distinct from the Spanish term "burro", which means donkey.
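The identity rule can be sketched as a comparison key made of the language code and the NFC-normalized string. The function name `term_key` is hypothetical and is used here only to illustrate the rule.

```python
import unicodedata


def term_key(lang_code: str, text: str) -> tuple[str, str]:
    """Return the pair that decides whether two terms are the same entity."""
    return (lang_code, unicodedata.normalize("NFC", text))


# Precomposed and decomposed spellings collapse to one term under NFC.
same = term_key("fra", "caf\u00e9") == term_key("fra", "cafe\u0301")

# The same string in two languages yields two distinct terms.
different = term_key("ita", "burro") != term_key("spa", "burro")
```

Because the key contains nothing but the code and the string, the verb and noun readings of English "call" necessarily share one term entity.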
More information can be found on the Technical Details page. We also offer a simple Java API that allows you to create URIs for languages and terms, which is described in further detail on our Getting Started page.
Why not use <rdfs:label> or <skos:prefLabel> instead of <lexvo:label>?
<lexvo:label> represents the semantic relation that holds between an entity and terms (words, names, etc.) commonly used to refer to it, e.g. between Albert Einstein and the string "Albert Einstein", or between the concept of books and the French term "livre" (NB: it is deliberately underspecified so that it applies to real-world entities as well as conceptual entities). RDF triples involving <lexvo:label> describe actual language use.
In contrast, <rdfs:label> is merely an annotation property that is used to assign human-readable resource labels to resources, which can also be identifier strings such as minCardinality rather than genuine words or names used by a language community.
The SKOS label properties force us to make normative judgments about which label is preferred for a given entity. This makes sense within a single authoritative thesaurus, but is not appropriate for an open environment where we merely wish to describe which terms are commonly used to refer to something. This is why <lexvo:label> is defined to be a more generic super-property of <skos:prefLabel> and <skos:altLabel>.
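One practical consequence of the super-property relationship is that any SKOS label statement also entails a <lexvo:label> statement. The sketch below applies that single inference step to triples held as plain Python tuples. The `LEXVO` namespace is an assumption, `infer_labels` is a hypothetical helper, and the example data is invented for illustration.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
LEXVO = "http://lexvo.org/ontology#"  # assumption: expansion of the lexvo: prefix

# Sub-property -> super-property, as described in the paragraph above.
SUPER_PROPERTY = {
    SKOS + "prefLabel": LEXVO + "label",
    SKOS + "altLabel": LEXVO + "label",
}


def infer_labels(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Return the input triples plus every entailed lexvo:label triple."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SUPER_PROPERTY:
            inferred.add((subject, SUPER_PROPERTY[predicate], obj))
    return inferred


# Hypothetical example data.
data = {
    ("http://www.some.org/#Book", SKOS + "prefLabel", '"livre"@fr'),
    ("http://www.some.org/#Book", SKOS + "altLabel", '"bouquin"@fr'),
}
closure = infer_labels(data)
```

The inference only runs in one direction: a <lexvo:label> statement says nothing about which label a thesaurus would prefer.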
One advantage of using those URIs is that they are maintained by the Library of Congress. However, since there is a natural and simple well-defined scheme for transforming authoritative ISO 639-3 standard codes to Lexvo.org URIs and vice versa, Lexvo.org's language URIs are just as stable and will not become meaningless at any point in the future.
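The transformation in both directions can be sketched as follows, using the URI pattern that this FAQ shows for French (http://lexvo.org/id/iso639-3/fra). The function names are hypothetical, and the check only verifies the three-lowercase-letter shape of a code, not whether the code is actually assigned in ISO 639-3.

```python
import re

LANGUAGE_BASE = "http://lexvo.org/id/iso639-3/"
_CODE_SHAPE = re.compile(r"[a-z]{3}")  # ISO 639-3 codes are three lowercase letters


def language_uri(code: str) -> str:
    """Map an ISO 639-3 code to the corresponding language URI."""
    if not _CODE_SHAPE.fullmatch(code):
        raise ValueError(f"not shaped like an ISO 639-3 code: {code!r}")
    return LANGUAGE_BASE + code


def language_code(uri: str) -> str:
    """Recover the ISO 639-3 code from a language URI."""
    if not uri.startswith(LANGUAGE_BASE):
        raise ValueError(f"not a language URI of the expected form: {uri!r}")
    code = uri[len(LANGUAGE_BASE):]
    if not _CODE_SHAPE.fullmatch(code):
        raise ValueError(f"unexpected code in URI: {uri!r}")
    return code
```

Since the mapping is plain string concatenation, a URI stays interpretable for as long as the underlying code remains part of the standard.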
Additionally, there are several other issues to consider. First of all, the code set that the LOC URIs are based on is orders of magnitude smaller than ISO 639-3 and, for example, lacks an adequate code for Cantonese, which has over 60 million speakers.
More importantly, the LOC's URIs do not describe languages per se but rather describe code-mediated conceptualizations of languages. This implies, for instance, that the French language (<http://lexvo.org/id/iso639-3/fra>) has two different counterparts at the LOC, <http://id.loc.gov/vocabulary/iso639-2/fra> and <http://id.loc.gov/vocabulary/iso639-2/fre>, which each have slightly different properties.
Finally, connecting your data to Lexvo.org's information is likely to be more useful in practical applications. It offers information about the languages themselves, e.g. where they are spoken, while the LOC mostly provides information about the codes, e.g. when the codes were created and updated and what kind of code they are.
In practice, you can also use both codes simultaneously in your data. However, you need to be very careful to make sure that you are asserting that a publication is written in French rather than in some concept of French created on January 1, 1970 in the United States.
If you need a URI for a language variety not covered by Lexvo.org, e.g. Old English, one option is to use the corresponding DBpedia URI, if the variety has its own Wikipedia article.
Lexvo.org 2008-2016 Gerard de Melo.