RDF is the result of metadata communities bringing together their needs to provide a robust and flexible architecture for supporting metadata on the web. RDF is a collaborative design effort: several W3C Member companies are contributing intellectual resources, and other metadata efforts, such as the Dublin Core and the Warwick Framework, have also influenced its design (Eric Miller).
As its name suggests, RDF is a resource description framework: an XML-based framework for describing and interchanging metadata about web resources, for example web page metadata such as author, title, modification date, and copyright and licensing information.
RDF has a very simple model. It is based on the idea of identifying things using Web identifiers (URIs) and of describing RESOURCES in terms of simple PROPERTIES and property VALUES.
For example, say I want to make a statement about a property of WormBase: WormBase (resource) has a sequence (property) called AC3.8 (value).
The three component parts of the model are:
A statement consists of the combination of a resource, a property, and a value. These parts are known as the 'subject', 'predicate' and 'object' of a statement. Example statements are:
"The Author of http://www.textuality.com/RDF/Why.html is Tim Bray."
"The Home-Page of http://www.textuality.com/RDF/Why.html is http://www.textuality.com."
Using a triadic model of resources, property-types and corresponding values, RDF attempts to provide an unambiguous method of expressing semantics in a machine-readable encoding.
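As a minimal sketch of the resource/property/value model (not the API of any RDF library), a statement can be held as a three-field record. The class name `Statement`, the helper `describe` and the predicate URI are assumptions made for illustration; only the resource URI and the value "Tim Bray" come from the text.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF statement: a resource, one of its properties, and the value."""
    subject: str    # the resource being described (a URI)
    predicate: str  # the property, itself identified by a URI
    object: str     # the value: a literal or the URI of another resource


# Hypothetical property URI; the text does not say which vocabulary is used.
AUTHOR = "http://www.example.org/terms/author"

stmt = Statement(
    subject="http://www.textuality.com/RDF/Why.html",
    predicate=AUTHOR,
    object="Tim Bray",
)


def describe(s: Statement) -> str:
    """Render a statement as a plain-English sentence."""
    return f"{s.subject} has property {s.predicate} with value {s.object!r}"


print(describe(stmt))
```

Because the record has exactly three positional fields, it can also be unpacked as `subject, predicate, obj = stmt`.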
Consider the RDF properties author, writer and creator. Will everyone on the global scale of the Internet define creator in the same way? Both the writer and the reader must understand the meaning of an RDF statement, so it is crucial that there is no ambiguity in meaning.
RDF doesn't provide any property definitions of its own (e.g. for author, title, director etc). Property definitions come in packages or vocabularies and must be defined somewhere in a schema.
The schema:
If the schema is machine-processable, an application should understand the semantics of a property named in the schema. RDF schemas are structured based on the RDF data model. Therefore, an application that has no understanding of a particular schema will still be able to parse the description into the property-type and corresponding values and will be able to transport the description intact (e.g., to a cache or to another application).
Rules: each predicate must be identified with exactly one schema.
The ability to standardize the declaration of vocabularies is anticipated to encourage the reuse and extension of semantics among disparate information communities. One example is the Dublin Core Metadata Initiative.
Originally developed at the March '95 Metadata Workshop in Dublin, Ohio, the Dublin Core is a set of "descriptive elements" (properties) for describing documents and facilitating automated indexing (and hence, for recording metadata).
It is intended to be used by resource discovery tools on the Internet, e.g. "webcrawlers" for popular World Wide Web search engines. It is sufficiently simple to be understood and used by the range of authors and casual publishers on the Internet.
Dublin Core elements have become widely used in documenting Internet resources. For example, it defines the following properties:
Information using the Dublin Core elements may be represented in any suitable language (e.g., in HTML meta elements), but RDF is ideal.
At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions. For example, RDF statements can be modeled as nodes and arcs in a graph.
A statement is represented by:
Groups of statements are represented by corresponding groups of nodes and arcs. Figure 1 represents the resource http://www.example.org/index.html which has a creator (http://purl.org/dc/elements/1.1/creator) whose value is the URI http://example.org/staffid/85740.
If descriptive information about the creator's address is desired, there needs to be a uniquely identifiable resource representing the address. Given the directed labeled graph notation in the previous example, the data model corresponding to this description is graphically represented as (Figure 2):
Figure 2
In this case, the creator's address is a uniquely identified resource denoted by http://www.example.org/addressid/85740 with the associated property-types of city, street, state and zip.
The use of unique identifiers for resources allows for the unambiguous association of properties.
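The node-and-arc picture can be sketched as a plain list of triples, where each triple is one arc from a subject node to an object node. This is only an illustration: the `EX` namespace, the `address` property name, the helper functions and all four address values are placeholders, since the text names the property-types but gives neither their URIs nor their values.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
EX = "http://www.example.org/terms/"  # hypothetical namespace for address terms

page = "http://www.example.org/index.html"
staff = "http://example.org/staffid/85740"
address = "http://www.example.org/addressid/85740"

# Each tuple is one arc: (start node, arc label, end node).
graph = [
    (page, DC_CREATOR, staff),
    (staff, EX + "address", address),
    (address, EX + "street", "1 Placeholder Street"),  # placeholder value
    (address, EX + "city", "Anytown"),                 # placeholder value
    (address, EX + "state", "Anystate"),               # placeholder value
    (address, EX + "zip", "00000"),                    # placeholder value
]


def values(graph, subject, predicate):
    """Return the end nodes of every arc that leaves subject with this label."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def follow(graph, start, *path):
    """Follow a chain of arc labels from start and return the nodes reached."""
    nodes = [start]
    for predicate in path:
        nodes = [o for n in nodes for o in values(graph, n, predicate)]
    return nodes


# Page -> creator -> address -> city
print(follow(graph, page, DC_CREATOR, EX + "address", EX + "city"))
```

Because the address is itself a resource with a URI, its four properties attach to it directly and cannot be confused with properties of the creator or of the page.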
RDF is about making machine-processable statements. Therefore a system of machine-processable identifiers must be used that allows a subject, predicate, or object in a statement to be identified without confusion with a similar-looking identifier elsewhere on the Web. URIs are used as these identifiers. They link pieces of information across the Web, and different people and organizations can create them independently.
Secondly, a machine-processable language is required for representing these statements and exchanging them between machines. RDF uses the Extensible Markup Language (XML). RDF defines a specific XML markup language, referred to as RDF/XML, for use in representing RDF information and for exchanging it between machines.
Advantages of XML:
So XML documents should be a natural vehicle for exchanging general purpose metadata.
RDF imposes formal structure on XML to support the consistent representation of semantics.
RDF provides the ability for resource description communities to define semantics. In order to avoid confusion between independent, and possibly conflicting, semantics, RDF uses the XML namespace facility. Namespaces tie a specific use of a property word, in context, to the dictionary (schema) where the intended definition is to be found.
For example, reusing the property type "Creator", defined by the Dublin Core Initiative, for the author of your document avoids re-inventing the wheel. An XML namespace is used to identify the schema for the Dublin Core vocabulary by pointing to the Dublin Core resource for the corresponding semantics. In order to indicate which schema is being used for a particular property, a prefix is used before the property name. For the Dublin Core RDF Schema a prefix of "DC" is commonly used. In this way, other schemas may also define their own property named Creator, but they would use a different schema identifier and prefix. (Note: a schema usually defines several properties.)
All RDF elements (like RDF and Description) are placed in the http://www.w3.org/1999/02/22-rdf-syntax-ns# namespace. (Property values generally come from other namespaces.) The namespace declaration is typically included as an XML attribute on the rdf:RDF element but may also be included with a particular Description element or even an individual property tag.
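The way a prefix ties a property name to its schema can be sketched as a lookup followed by string concatenation. The Dublin Core namespace URI is the one used in this document; the `film` vocabulary, the `NAMESPACES` table and the `expand` function are hypothetical names introduced only for the example.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "film": "http://www.example.org/film-terms/",  # hypothetical second vocabulary
}


def expand(qname: str, namespaces: dict = NAMESPACES) -> str:
    """Expand a prefixed name such as 'dc:creator' into a full URI."""
    prefix, sep, local = qname.partition(":")
    if not sep or not local:
        raise ValueError(f"not a prefixed name: {qname!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + local


# Two vocabularies may both define "creator" without colliding,
# because each expands to a different URI.
print(expand("dc:creator"))
print(expand("film:creator"))
```

The prefix itself carries no meaning; only the namespace URI it is bound to identifies the schema.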
So this is how the following graph would be represented in RDF/XML.
The resource (www.example.org/index.html) has 3 properties:
1:<?xml version="1.0"?>
2:<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3: xmlns:exterms="http://www.example.org/terms/"
4: xmlns:dc="http://purl.org/dc/elements/1.1/" >
5: <rdf:Description rdf:about="http://www.example.org/index.html">
6: <exterms:creation-date> August 16, 1999 </exterms:creation-date>
7: <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
8: <exterms:language> English </exterms:language>
9: </rdf:Description>
10: </rdf:RDF>
Line 1: XML declaration.
Line 2: The start of an rdf:RDF element (with its corresponding closing tag on line 10).
- The root element of an RDF document is an RDF element. The following XML content (starting here and ending with the </rdf:RDF> on line 10) is intended to represent RDF. The RDF Schema is declared as a boot-strapping mechanism for the declaration of the necessary vocabulary needed for expressing the data model.
- On this same line is an XML namespace declaration, represented as an xmlns attribute of the rdf:RDF start-tag. This means all tags in this content prefixed with rdf: are part of the namespace identified by the URIref http://www.w3.org/1999/02/22-rdf-syntax-ns#.
Lines 3 and 4: Each specifies another XML namespace declaration as an attribute of the rdf:RDF element. Line 3 specifies that the namespace URIref 'http://www.example.org/terms/' is to be associated with the exterms: prefix. Line 4 specifies that any property prefixed with dc: is associated with 'http://purl.org/dc/elements/1.1/'.
Lines 5-9: An RDF statement is represented in XML as a description. It is about the subject of the statement (in this case, about http://www.example.org/index.html).
Line 5: The rdf:Description start-tag indicates the start of a description of a resource. All the lines between this start-tag and its corresponding end-tag (line 9) describe the resource. The rdf:about attribute specifies the URIref of the resource that the statement is about.
Line 6: This element has the QName exterms:creation-date as its tag. The URIref of the creation-date property corresponding to the QName exterms:creation-date is obtained by appending the name creation-date to the URIref of the exterms: prefix (http://www.example.org/terms/), giving http://www.example.org/terms/creation-date. The <exterms:creation-date> element holds the plain literal August 16, 1999, which is the value of the creation-date property in the statement.
Lines 7 and 8 are based on the same principles.
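The line-by-line reading above can be sketched with Python's standard `xml.etree.ElementTree` module, which reports each tag as `{namespace}local-name`. This is a simplified illustration that handles only the flat rdf:Description form used in this example, not the full RDF/XML syntax; the function names `uriref` and `triples` are assumptions, and literals are trimmed of surrounding whitespace for readability.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date> August 16, 1999 </exterms:creation-date>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
    <exterms:language> English </exterms:language>
  </rdf:Description>
</rdf:RDF>"""


def uriref(tag):
    """Turn ElementTree's '{namespace}local' tag into namespace + local."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def triples(xml_text):
    """Extract (subject, predicate, object) from flat rdf:Description elements."""
    root = ET.fromstring(xml_text)
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource means the value is another resource;
            # otherwise the element text is taken as a plain literal.
            value = resource if resource is not None else (prop.text or "").strip()
            result.append((subject, uriref(prop.tag), value))
    return result


for triple in triples(DOC):
    print(triple)
```

Note that the code never needs to know what creation-date or language mean: the description can be split into property-types and values without any understanding of the schemas involved.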
If additional descriptive information about an author is required, similar syntactic constructs are used. In this case, the Dublin Core CREATOR property-type could still be used to represent the author, but additional property-types "name", "email" and "affiliation" are required. For this case, since the semantics for these elements are not defined in Dublin Core, an additional resource description standard may be used.
Figure 5
This, in turn, could be syntactically represented as
<?xml:namespace ns = "http://www.w3.org/RDF/RDF/" prefix = "RDF" ?>
<?xml:namespace ns = "http://purl.oclc.org/DC/" prefix = "DC" ?>
<?xml:namespace ns = "http://person.org/BusinessCard/" prefix = "CARD" ?>
<RDF:RDF>
  <RDF:Description RDF:HREF="http://uri-of-Document-1">
    <DC:Creator RDF:HREF="#Creator_001"/>
  </RDF:Description>
  <RDF:Description ID="Creator_001">
    <CARD:Name>John Smith</CARD:Name>
    <CARD:Email>smith@home.net</CARD:Email>
    <CARD:Affiliation>Home, Inc.</CARD:Affiliation>
  </RDF:Description>
</RDF:RDF>
In this case, the value associated with the property-type DC:Creator is now a resource. While the reference to the resource is an internal identifier, an external URI, for example, to a controlled authority of names, could have been used as well.
In addition to using simple elements, RDF also allows you to assemble collections of resources into 'bags' (a resource having type rdf:Bag), where the order doesn't matter; 'sequences' (a resource having type rdf:Seq), where the order is important; and 'alternatives' (a resource having type rdf:Alt), which list alternative values for a property.
For example:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:books="http://www.booksRus.com/schemas/books/">
  <rdf:Description rdf:about="http://www.xs4all.nl/~sintac">
    <books:Creator>
      <rdf:Seq>
        <rdf:li>Elliotte Rusty Harold</rdf:li>
        <rdf:li>W. Scott Means</rdf:li>
      </rdf:Seq>
    </books:Creator>
  </rdf:Description>
</rdf:RDF>
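In the RDF model, the order of a sequence is carried by numbered membership properties (rdf:_1, rdf:_2, and so on), for which rdf:li elements are a shorthand. The following sketch shows that idea with plain triples; the function names and the `_:authors` label for the container node are assumptions made for illustration.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def seq_triples(container_id, members):
    """Describe an ordered container (rdf:Seq) as a list of triples."""
    out = [(container_id, RDF_NS + "type", RDF_NS + "Seq")]
    for position, member in enumerate(members, start=1):
        # rdf:_1, rdf:_2, ... record each member's position.
        out.append((container_id, f"{RDF_NS}_{position}", member))
    return out


def members_in_order(triples):
    """Recover the members of a container, sorted by their rdf:_n position."""
    numbered = []
    for _, predicate, value in triples:
        suffix = predicate[len(RDF_NS):] if predicate.startswith(RDF_NS) else ""
        if suffix.startswith("_") and suffix[1:].isdigit():
            numbered.append((int(suffix[1:]), value))
    return [value for _, value in sorted(numbered)]


# "_:authors" is a hypothetical label for the unnamed container node.
authors = seq_triples("_:authors", ["Elliotte Rusty Harold", "W. Scott Means"])

# The order survives even if the triples themselves are shuffled.
print(members_in_order(list(reversed(authors))))
```

For an rdf:Bag the same membership properties are used, but applications are expected to ignore the numbering.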
RDF is carefully designed to have the following characteristics.
Independence: any person or organization can define its own properties and resources, independently of anyone else.
Interchange: RDF statements can be converted into XML, so they are easy to interchange. It is important to have portability without loss of information.
Scalability: As described above, RDF statements are simple, three-part records (resource, property, value), so they are easy to handle and look things up by, even in large numbers.
Since it is a common framework, application designers can take advantage of the availability of common RDF parsers and processing tools.
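To illustrate why three-part records are easy to look things up by, here is a small sketch of an in-memory index keyed by resource and by property. `TripleIndex` and its method names are hypothetical, as are the two predicate URIs; real RDF stores use more elaborate indexing.

```python
from collections import defaultdict


class TripleIndex:
    """Keeps each statement under both its subject and its predicate."""

    def __init__(self):
        self._by_subject = defaultdict(list)
        self._by_predicate = defaultdict(list)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        self._by_subject[subject].append(triple)
        self._by_predicate[predicate].append(triple)

    def about(self, subject):
        """All statements describing one resource (a dictionary lookup)."""
        return list(self._by_subject.get(subject, []))

    def with_property(self, predicate):
        """All statements that use one property (a dictionary lookup)."""
        return list(self._by_predicate.get(predicate, []))


index = TripleIndex()
# Hypothetical predicate URIs, used only for this example.
index.add("http://www.example.org/index.html",
          "http://www.example.org/terms/language", "English")
index.add("http://www.example.org/index.html",
          "http://www.example.org/terms/creation-date", "August 16, 1999")
index.add("http://www.example.org/other.html",
          "http://www.example.org/terms/language", "French")

print(len(index.about("http://www.example.org/index.html")))
print([t[2] for t in index.with_property("http://www.example.org/terms/language")])
```

Each lookup costs roughly the same however many statements are stored, because every record has the same fixed shape.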
The aim of the Gene Ontology (GO) Consortium is to provide controlled vocabularies to describe specific aspects of gene products. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
Result: The use of common GO terms by these databases facilitates uniform queries across them.
Vocabularies: The GO vocabularies are dynamic, since knowledge of gene and protein roles in cells is accumulating and changing.
The three organizing principles of the GO are:
Definitions of the terms within all three of these ontologies are contained in a single (text) definition file.
From http://www.godatabase.org/dev/: "Each monthly release of GO is available as either XML-RDF or as a MySQL compatible file. Both files are available as either a light version containing just the ontologies, or a full version, containing ontologies and gene product associations."
RDF was chosen for use in the XML versions of the ontologies because of its flexibility in representing these graph structures, and because it has widespread tool support.
Extension of RDF: The GO has added its own extensions to the RDF vocabulary.
A child term may have more than one parent term and may have a different class of relationship with its different parents. Synonyms and cross-references to external databases are also represented in the ontologies. The go:dbxref element represents the term in an external database. The go:term element is the basic element. Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis. The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE go:go>
<go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <go:version timestamp="Wed May 9 23:55:02 2001" />
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
      <go:definition></go:definition>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:definition>The action characteristic of a gene product.</go:definition>
      <go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673" />
      <go:dbxref>
        <go:database_symbol>go</go:database_symbol>
        <go:reference>curators</go:reference>
      </go:dbxref>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:definition></go:definition>
      <go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674" />
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>CG7217</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0038570</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
      <go:association>
        <go:evidence evidence_code="ISS">
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>fbrf0105495</go:reference>
          </go:dbxref>
        </go:evidence>
        <go:gene_product>
          <go:name>Jafrac1</go:name>
          <go:dbxref>
            <go:database_symbol>fb</go:database_symbol>
            <go:reference>FBgn0040309</go:reference>
          </go:dbxref>
        </go:gene_product>
      </go:association>
    </go:term>
  </rdf:RDF>
</go:go>
Extension example: GO:0003674 has the element <go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673"/>. This says that "Molecular function is part of the Gene Ontology".
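As a rough sketch of how such a file might be read, the following uses Python's standard `xml.etree.ElementTree` to list each term's isa and part-of links by name. The embedded XML is a trimmed copy of the GO example (definitions, cross-references and associations omitted), and the function name `parent_links` is an assumption; this is not the GO Consortium's own tooling.

```python
import xml.etree.ElementTree as ET

GO_NS = "http://www.geneontology.org/xml-dtd/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Trimmed copy of the GO example: only names and parent links are kept.
GO_XML = """<go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673"/>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674"/>
    </go:term>
  </rdf:RDF>
</go:go>"""


def parent_links(xml_text):
    """Return (child name, relationship, parent name) for every isa/part-of link."""
    root = ET.fromstring(xml_text)
    names = {}
    links = []
    for term in root.iter(f"{{{GO_NS}}}term"):
        uri = term.get(f"{{{RDF_NS}}}about")
        names[uri] = term.findtext(f"{{{GO_NS}}}name")
        for relation in ("isa", "part-of"):
            for element in term.findall(f"{{{GO_NS}}}{relation}"):
                links.append((uri, relation, element.get(f"{{{RDF_NS}}}resource")))
    # Fall back to the raw URI when a parent term is not in this file.
    return [(names[child], rel, names.get(parent, parent))
            for child, rel, parent in links]


for link in parent_links(GO_XML):
    print(link)
```

Because a term may carry several isa or part-of elements, the same approach handles a child with more than one parent without any change.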
The GO illustrates a number of interesting points:
The GO is also another example in which the RDF will not necessarily appear for direct use on the Web (although the files are Web-accessible).
The GO illustrates the role RDF can play as a basis for representing ontologies. This role will be further enhanced by richer RDF-based languages for specifying ontologies, such as the DAML+OIL or OWL languages.
RDF supports the use of conventions that will facilitate modular interoperability among separate metadata element sets. These conventions include standard mechanisms for representing semantics, built on a simple yet powerful data model.
Miller, Eric. An Introduction to the Resource Description Framework.
SAMS Teach Yourself XML in 21 days (North and Hermans)
W3: http://www.w3.org/People/EM/talks/www7/tutorial/part2
W3 RDF: http://www.w3.org/RDF
W3 RDF Primer: http://www.w3.org/TR/rdf-primer/
Resource Description Framework (RDF) Model and Syntax Specification http://www.w3.org/TR/PR-rdf-syntax/
Wilkinson, MD, Links, M. (2002). BioMOBY: an open-source biological web services proposal. Briefings In Bioinformatics 3:4. 331-341.
Bray, Tim. (2001). What is RDF? (XML.com: http://www.xml.com/pub/a/2001/01/24/rdf.html?page=1)
XML in a Nutshell (Harold and Means, O'Reilly)