Fiona Cunningham 2003-03-13T03:20:00Z

Resource Description Framework (RDF)

Date:
March 12, 2003
Author:
Fiona Cunningham
Version:
1.0

Background

RDF is the result of metadata communities bringing together their needs to provide a robust and flexible architecture for supporting metadata on the web. RDF is a collaborative design effort. Several W3C Member companies are contributing intellectual resources. Other metadata efforts, such as the Dublin Core and the Warwick Framework, have also influenced the design of RDF (Eric Miller).

RDF

As its name suggests, RDF is a resource description framework: an XML-based framework for describing and interchanging metadata about web resources. For example, it can describe web page metadata such as author, title, modification date, copyright and licensing information.

Aims: (amongst others)

RDF data model

RDF has a very simple model. It is based on the idea of identifying things using Web identifiers (URIs) and of describing RESOURCES in terms of simple PROPERTIES and property VALUES.

For example, say I want to make a statement about a property of WormBase: WormBase (resource) has a sequence (property) called AC3.8 (value).

The three component parts of the model are:

1) A resource:
The thing we want to describe (e.g. WormBase). In RDF a resource is anything that has a URI (e.g. all webpages but also things that are not retrievable on the web, for example people).
A resource has zero or more properties.
2) A property:
The property of a resource describes the type of value associated with the resource. For example, a resource can have a property of author or title, and its value would be "Stein" or "hello world".
Because a property has a name and a value, it is also a resource, so it can have its own properties and its own URI. The URI associated with a resource or property precisely identifies the kind of relationship that exists between the linked items.
3) A value:
The value of the property assigned (e.g. AC3.8) can either be a string, e.g. "AC3.8" or "11/3/03", or it can be another resource. For example:
"The protein page for http://www.wormbase.org/db/gene/gene?name=AC3.8;class=Sequence is http://www.wormbase.org/db/seq/protein?name=WP:CE05137;class=Protein."

A statement consists of the combination of a resource, a property, and a value. These parts are known as the 'subject', 'predicate' and 'object' of a statement. An example statement is:

"The Author of http://www.textuality.com/RDF/Why.html is Tim Bray."

The value can just be a string, for example "Tim Bray" in the previous example, or it can be another resource, for example:
"The Home-Page of http://www.textuality.com/RDF/Why.html is http://www.textuality.com."

Using a triadic model of resources, property-types and corresponding values, RDF attempts to provide an unambiguous method of expressing semantics in a machine-readable encoding.
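The triadic model described above can be sketched in a few lines of Python. This is only an illustration of the data model, not an RDF library: the Statement type and the values_of helper are hypothetical names, and the predicates "Author" and "Home-Page" are the informal labels from the example sentences rather than real property URIs.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property (an informal label here; normally a URI)
    object: str     # the value: a literal string or the URI of another resource


DOC = "http://www.textuality.com/RDF/Why.html"

statements = [
    Statement(DOC, "Author", "Tim Bray"),                      # value is a string
    Statement(DOC, "Home-Page", "http://www.textuality.com"),  # value is a resource
]


def values_of(subject: str, predicate: str) -> List[str]:
    """Return every value recorded for the given resource and property."""
    return [s.object for s in statements
            if s.subject == subject and s.predicate == predicate]


print(values_of(DOC, "Author"))
```

Because every statement has the same three-part shape, a lookup is just a filter over subject and predicate.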


RDF Schema

Consider the RDF properties author, writer and creator. Will everyone across the global scale of the internet define creator in the same manner? Both the writer and the reader must understand the meaning of the RDF statement. It is crucial that there is no ambiguity in meaning.

RDF doesn't provide any property definitions of its own (e.g. for author, title, director etc). Property definitions come in packages or vocabularies and must be defined somewhere in a schema.

The schema:

If the schema is machine-processable, an application should understand the semantics of a property named in the schema. RDF schemas are structured based on the RDF data model. Therefore, an application that has no understanding of a particular schema will still be able to parse the description into the property-type and corresponding values and will be able to transport the description intact (e.g., to a cache or to another application).

Rules: each predicate must be identified with exactly one schema.

The ability to standardize the declaration of vocabularies is anticipated to encourage the reuse and extension of semantics among disparate information communities. The Dublin Core Metadata Initiative is one example.


Dublin Core Metadata Initiative

Originally developed at the March '95 Metadata Workshop in Dublin, Ohio, the Dublin Core is a set of "descriptive elements" (properties) for describing documents and facilitating automated indexing (and hence, for recording metadata).

It is intended to be used by resource discovery tools on the Internet, e.g. "webcrawlers" for popular World Wide Web search engines. It is sufficiently simple to be understood and used by the range of authors and casual publishers on the Internet.

Dublin Core elements have become widely used in documenting Internet resources. For example it defines the following properties:

Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: The topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of the resource.
Date: A date associated with an event in the life cycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.

Information using the Dublin Core elements may be represented in any suitable language (e.g., in HTML meta elements), but RDF is ideal.

RDF Syntax

At the core of RDF is a syntax-independent model for representing resources and their corresponding descriptions. For example, RDF statements can be modeled as nodes and arcs in a graph.

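A statement is represented by a node for the subject, a node for the object, and an arc for the predicate, directed from the subject node to the object node.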

Groups of statements are represented by corresponding groups of nodes and arcs. Figure 1 represents the resource http://www.example.org/index.html which has a creator (http://purl.org/dc/elements/1.1/creator) whose value is the URI http://www.example.org/staffid/85740.

A Simple RDF Statement

If descriptive information about the creator's address is desired, there needs to be a uniquely identifiable resource representing the address. Given the directed labeled graph notation in the previous example, the data model corresponding to this description is graphically represented as (Figure 2):

Breaking up John's address

Figure 2

In this case, the creator's address is a uniquely identified resource denoted by http://www.example.org/addressid/85740 with the associated property-types of city, street, state and zip. The use of unique identifiers for resources allows for the unambiguous association of properties.
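The node-and-arc view can be sketched as a small adjacency structure in Python. This is a minimal illustration under stated assumptions: the exterms namespace, the "address" property name, and all of the literal address values are placeholders invented for the example, since the figure's actual values are not reproduced in this text.

```python
from collections import defaultdict

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
EXTERMS = "http://www.example.org/terms/"  # assumed vocabulary namespace

PAGE = "http://www.example.org/index.html"
STAFF = "http://www.example.org/staffid/85740"
ADDRESS = "http://www.example.org/addressid/85740"

# Each triple is one arc: (subject node, arc label, object node).
triples = [
    (PAGE, DC_CREATOR, STAFF),
    (STAFF, EXTERMS + "address", ADDRESS),           # assumed property name
    (ADDRESS, EXTERMS + "street", "1 Example Road"),  # placeholder values
    (ADDRESS, EXTERMS + "city", "Example City"),
    (ADDRESS, EXTERMS + "state", "Example State"),
    (ADDRESS, EXTERMS + "zip", "00000"),
]

# Adjacency list: node -> list of (arc label, target node).
graph = defaultdict(list)
for subject, predicate, obj in triples:
    graph[subject].append((predicate, obj))


def follow(node, *predicates):
    """Walk one arc per predicate from node; return the node reached, or None."""
    for predicate in predicates:
        targets = [o for p, o in graph.get(node, []) if p == predicate]
        if not targets:
            return None
        node = targets[0]
    return node


print(follow(PAGE, DC_CREATOR, EXTERMS + "address", EXTERMS + "city"))
```

Because the address is its own identified node, its properties attach to it unambiguously instead of being flattened onto the creator.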


RDF XML syntax

RDF is about making machine-processable statements. Therefore a system of machine-processable identifiers must be used, one that allows identification of a subject, predicate, or object in a statement without confusion with a similar-looking identifier on the Web. URIs are used as the identifiers to make the information machine readable. They link pieces of information across the Web, and different people and organizations can create URIs independently.

Secondly, a machine-processable language is required for representing these statements and exchanging them between machines. RDF uses the Extensible Markup Language (XML). RDF defines a specific XML markup language, referred to as RDF/XML, for use in representing RDF information and for exchanging it between machines.

Advantages of XML:

So XML documents should be a natural vehicle for exchanging general purpose metadata.

XML Namespaces

RDF imposes formal structure on XML to support the consistent representation of semantics.

RDF provides the ability for resource description communities to define semantics. In order to avoid confusion between independent, and possibly conflicting, semantics, RDF uses the XML namespace facility. Namespaces tie a specific use of a property word in context, to the dictionary (schema) where the intended definition is to be found.

For example, reusing the property type "CREATOR", defined by the Dublin Core Initiative, for the author of your document avoids re-inventing the wheel. An XML namespace is used to identify the schema for the Dublin Core vocabulary by pointing to the Dublin Core resource for the corresponding semantics. In order to indicate which schema is being used for a particular property, a prefix is used before the property name. For the Dublin Core RDF Schema a prefix of "DC" is commonly used. In this way, other schemas may also define their own property named Creator, but they would then use a different schema identifier and prefix. (Note: a schema usually defines several properties.)
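The role of the prefix can be sketched as a simple lookup: the prefix selects a namespace URI, and the property name is appended to it. The sketch below is illustrative only; expand_qname is a hypothetical helper, and the "other" prefix with its http://example.org/other-schema/ namespace is an invented second schema.

```python
from typing import Dict

NAMESPACES: Dict[str, str] = {
    "dc": "http://purl.org/dc/elements/1.1/",     # Dublin Core
    "other": "http://example.org/other-schema/",  # hypothetical second schema
}


def expand_qname(qname: str, namespaces: Dict[str, str] = NAMESPACES) -> str:
    """Turn 'prefix:name' into a full URI by appending name to the namespace URI."""
    prefix, local_name = qname.split(":", 1)
    if prefix not in namespaces:
        raise KeyError(f"no namespace declared for prefix {prefix!r}")
    return namespaces[prefix] + local_name


# Two schemas may both define "creator"; the full URIs keep them apart.
print(expand_qname("dc:creator"))
print(expand_qname("other:creator"))
```

The prefix itself carries no meaning. Only the namespace URI it is bound to identifies the schema, which is why two properties with the same local name never collide.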

All RDF elements (like RDF and Description) are placed in the http://www.w3.org/1999/02/22-rdf-syntax-ns# namespace. (Property values generally come from other namespaces.) The namespace declaration is typically included as an XML attribute on the rdf:RDF element but may also be included with a particular Description element or even an individual property tag.

So this is how the following graph would be represented in RDF/XML.

Several Statements About the Same Resource

The resource (www.example.org/index.html) has 3 properties:

1:<?xml version="1.0"?>
2:<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
3: xmlns:exterms="http://www.example.org/terms/"
4: xmlns:dc="http://purl.org/dc/elements/1.1/" >

5: <rdf:Description rdf:about="http://www.example.org/index.html">
6: <exterms:creation-date> August 16, 1999 </exterms:creation-date>
7: <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
8: <exterms:language> English </exterms:language>
9: </rdf:Description>
10: </rdf:RDF>

Line 1: XML declaration.

Line 2: The start of an rdf:RDF element (with its corresponding closing tag on line 10).

- The root element of an RDF document is an rdf:RDF element. The XML content that follows (starting here and ending with the </rdf:RDF> on line 10) is intended to represent RDF. The RDF Schema is declared as a boot-strapping mechanism for the declaration of the necessary vocabulary needed for expressing the data model.

- On this same line is an XML namespace declaration, represented as an xmlns attribute of the rdf:RDF start-tag. This means all tags in this content prefixed with rdf: are part of the namespace identified by the URIref http://www.w3.org/1999/02/22-rdf-syntax-ns#.

Lines 3 and 4: each specifies another XML namespace declaration as an attribute of the rdf:RDF element. Line 3 specifies that the namespace URIref 'http://www.example.org/terms/' is to be associated with the exterms: prefix.

Line 4 specifies that any property prefixed with "dc" is associated with "http://purl.org/dc/elements/1.1/".

Lines 5-9:
An RDF statement is represented in XML as a description. It is about the subject of the statement (in this case, about http://www.example.org/index.html).

Line 5: The rdf:Description start-tag indicates the start of a description of a resource. All the lines between this start-tag and its corresponding end-tag (line 9) describe the resource. The rdf:about attribute specifies the URIref of the resource that the statement is about.

Line 6: This element has the QName <exterms:creation-date> as its tag. The URIref of the creation-date property corresponding to the QName <exterms:creation-date> is obtained by appending the name creation-date to the URIref of the exterms: prefix (http://www.example.org/terms/), giving http://www.example.org/terms/creation-date. The <exterms:creation-date> element holds the plain literal August 16, 1999, which is the value of the creation-date property in the statement.
Lines 7 and 8 are based on the same principles.
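The line-by-line reading above can be mirrored with Python's standard xml.etree.ElementTree module, which already reports each tag as "{namespace}name". The sketch below is a hedged illustration, not a conforming RDF/XML parser: the triples_from function is a hypothetical helper that handles only rdf:Description elements with rdf:about, and only property elements whose value is either a literal or an rdf:resource attribute.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:exterms="http://www.example.org/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://www.example.org/index.html">
    <exterms:creation-date>August 16, 1999</exterms:creation-date>
    <dc:creator rdf:resource="http://www.example.org/staffid/85740"/>
    <exterms:language>English</exterms:language>
  </rdf:Description>
</rdf:RDF>
"""


def triples_from(xml_text):
    """Extract (subject, predicate, object) triples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # "{namespace}name" -> "namespacename", i.e. the property URIref.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # The value is another resource if rdf:resource is present,
            # otherwise the element's text is taken as a plain literal.
            obj = resource if resource is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in triples_from(RDF_XML):
    print(triple)
```

Note that the function never needs to understand the exterms or dc vocabularies. It recovers the property URIs and values purely from the structure, which is what lets a schema-unaware application pass a description along intact.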

Properties as resources

If additional descriptive information about an author is required, similar syntactic constructs are used. In this case, the Dublin Core CREATOR property-type could still be used to represent the author, but additional property-types "name", "email" and "affiliation" are required. For this case, since the semantics for these elements are not defined in Dublin Core, an additional resource description standard may be used.

Using a second schema for new properties

Figure 5

This, in turn, could be syntactically represented as:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:card="http://person.org/BusinessCard/">

<!-- The document: the value of its creator property refers to the description below -->
<rdf:Description rdf:about="http://uri-of-Document-1">
<dc:creator rdf:resource="#Creator_001"/>
</rdf:Description>

<!-- The creator, described with properties from a second schema -->
<rdf:Description rdf:ID="Creator_001">
<card:Name>John Smith</card:Name>
<card:Email>smith@home.net</card:Email>
<card:Affiliation>Home, Inc.</card:Affiliation>
</rdf:Description>
</rdf:RDF>

In this case, the value associated with the property-type dc:creator is now a resource. While the reference to the resource is an internal identifier, an external URI (for example, one pointing to a controlled authority of names) could have been used as well.


RDF Containers

In addition to using simple elements, RDF also allows you to assemble collections of resources into containers: 'bags' (a resource having type rdf:Bag), where the order doesn't matter; 'sequences' (a resource having type rdf:Seq), where the order is important; and 'alternatives' (a resource having type rdf:Alt), where the members are alternatives for a single value.

e.g.

<?xml version="1.0"?>
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:books="http://www.booksRus.com/schemas/books/">

<rdf:Description rdf:about="http://www.xs4all.nl/~sintac">
<books:Creator>
     <rdf:Seq>
         <rdf:li>Elliotte Rusty Harold</rdf:li>
         <rdf:li>W. Scott Means</rdf:li>
     </rdf:Seq>
</books:Creator>
</rdf:Description>
</rdf:RDF>
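In RDF/XML, the members of a container are written as rdf:li elements, which stand for the numbered membership properties rdf:_1, rdf:_2 and so on. The sketch below shows that expansion for a sequence, using Python's standard xml.etree.ElementTree module. It is an illustration under assumptions: the seq_members helper is hypothetical, and the embedded XML is a well-formed rendering of the container example (quoted attribute values, rdf:li members) so that a standard XML parser accepts it.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

CONTAINER_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:books="http://www.booksRus.com/schemas/books/">
  <rdf:Description rdf:about="http://www.xs4all.nl/~sintac">
    <books:Creator>
      <rdf:Seq>
        <rdf:li>Elliotte Rusty Harold</rdf:li>
        <rdf:li>W. Scott Means</rdf:li>
      </rdf:Seq>
    </books:Creator>
  </rdf:Description>
</rdf:RDF>
"""


def seq_members(xml_text):
    """Return (membership property URI, value) pairs for each rdf:Seq member."""
    root = ET.fromstring(xml_text)
    members = []
    for seq in root.iter(f"{{{RDF_NS}}}Seq"):
        items = seq.findall(f"{{{RDF_NS}}}li")
        # The n-th rdf:li corresponds to the property rdf:_n, counting from 1.
        for position, item in enumerate(items, start=1):
            members.append((f"{RDF_NS}_{position}", (item.text or "").strip()))
    return members


for prop, value in seq_members(CONTAINER_XML):
    print(prop, "->", value)
```

For an rdf:Seq the numbering records the intended order of the members. For an rdf:Bag the same numbering exists but carries no significance.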

Characteristics

RDF is carefully designed to have the following characteristics.

Independence: because a property is itself a resource, any independent organization or person can define new properties.
Interchange: RDF statements can be converted into XML, so they are easy to interchange. It is important to have portability without loss of information.
Scalability: As described above, RDF statements are simple, three-part records (resource, property, value), so they are easy to handle and look things up by, even in large numbers.

Since it is a common framework, application designers can take advantage of the availability of common RDF parsers and processing tools.


Implementation

The aim of the Gene Ontology (GO) Consortium is to provide controlled vocabularies to describe specific aspects of gene products. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.

Result: The use of common GO terms by these databases facilitates uniform queries across them.

Vocabularies: The GO vocabularies are dynamic, since knowledge of gene and protein roles in cells is accumulating and changing.

The three organizing principles of the GO are:

  1. molecular function,
  2. biological process and
  3. cellular component.

XML

Definitions of the terms within all three of these ontologies are contained in a single (text) definition file.

From http://www.godatabase.org/dev/: "Each monthly release of GO is available as either XML-RDF or as a MySQL compatible file. Both files are available as either a light version containing just the ontologies, or a full version, containing ontologies and gene product associations."

RDF was chosen for use in the XML versions of the ontologies because of its flexibility in representing these graph structures, and because it has widespread tool support.

Extension of RDF: The GO has added its own extensions to the RDF vocabulary.

  1. A child term may be an "instance" of its parent term (isa relationship) or
  2. a component of its parent term (part-of relationship).

A child term may have more than one parent term and may have a different class of relationship with each of its parents. Synonyms and cross-references to external databases are also represented in the ontologies. The go:dbxref element represents the term in an external database. The go:term element is the basic element. Every annotation must be attributed to a source, which may be a literature reference, another database or a computational analysis. The annotation must indicate what kind of evidence is found in the cited source to support the association between the gene product and the GO term.

<?xml version="1.0" encoding="UTF-8"?>  
     <!DOCTYPE go:go>  
     <go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#" 
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">  
       <go:version timestamp="Wed May 9 23:55:02 2001" />  


     <rdf:RDF>  
          <go:term rdf:about="http://www.geneontology.org/go#GO:0003673">  
             <go:accession>GO:0003673</go:accession>  
             <go:name>Gene_Ontology</go:name>  
             <go:definition></go:definition>  
          </go:term>  


          <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">  
             <go:accession>GO:0003674</go:accession>  
             <go:name>molecular_function</go:name>  
             <go:definition>The action characteristic of a gene product.</go:definition>  
             <go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673" />  
             <go:dbxref>  
                <go:database_symbol>go</go:database_symbol>  
                <go:reference>curators</go:reference>  
             </go:dbxref>  
          </go:term>  


          <go:term rdf:about="http://www.geneontology.org/go#GO:0016209">  
             <go:accession>GO:0016209</go:accession>  
             <go:name>antioxidant</go:name>  
             <go:definition></go:definition>  
             <go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674" />  
             <go:association>  
                <go:evidence evidence_code="ISS">  
                   <go:dbxref>  
                      <go:database_symbol>fb</go:database_symbol>  
                      <go:reference>fbrf0105495</go:reference>  
                   </go:dbxref>  
                </go:evidence>
                <go:gene_product>
                   <go:name>CG7217</go:name>
                   <go:dbxref>
                      <go:database_symbol>fb</go:database_symbol>
                      <go:reference>FBgn0038570</go:reference>
                   </go:dbxref>
                </go:gene_product>
             </go:association>
             <go:association>
                <go:evidence evidence_code="ISS">
                   <go:dbxref>
                      <go:database_symbol>fb</go:database_symbol>
                      <go:reference>fbrf0105495</go:reference>
                   </go:dbxref>
                </go:evidence>
                <go:gene_product>
                   <go:name>Jafrac1</go:name>
                   <go:dbxref>
                      <go:database_symbol>fb</go:database_symbol>
                      <go:reference>FBgn0040309</go:reference>
                   </go:dbxref>
                </go:gene_product>
             </go:association>
          </go:term>
     </rdf:RDF>
     </go:go>

Extension example: the term GO:0003674 has the element:
<go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673"/>.
This says that "Molecular function is part of the Gene Ontology".
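The isa and part-of extensions can be read back out of the XML with Python's standard xml.etree.ElementTree module. The sketch below is a hedged illustration only: term_relationships is a hypothetical helper, and the embedded XML is a trimmed copy of the listing above that keeps just the accession, name and relationship elements.

```python
import xml.etree.ElementTree as ET

GO_NS = "http://www.geneontology.org/xml-dtd/go.dtd#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

GO_XML = """<go:go xmlns:go="http://www.geneontology.org/xml-dtd/go.dtd#"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:RDF>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003673">
      <go:accession>GO:0003673</go:accession>
      <go:name>Gene_Ontology</go:name>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0003674">
      <go:accession>GO:0003674</go:accession>
      <go:name>molecular_function</go:name>
      <go:part-of rdf:resource="http://www.geneontology.org/go#GO:0003673"/>
    </go:term>
    <go:term rdf:about="http://www.geneontology.org/go#GO:0016209">
      <go:accession>GO:0016209</go:accession>
      <go:name>antioxidant</go:name>
      <go:isa rdf:resource="http://www.geneontology.org/go#GO:0003674"/>
    </go:term>
  </rdf:RDF>
</go:go>
"""


def term_relationships(xml_text):
    """Return (child name, relation, parent name) for each isa / part-of link."""
    root = ET.fromstring(xml_text)
    names = {}   # term URI -> human-readable name
    links = []   # (child URI, relation, parent URI)
    for term in root.iter(f"{{{GO_NS}}}term"):
        uri = term.get(f"{{{RDF_NS}}}about")
        names[uri] = term.findtext(f"{{{GO_NS}}}name")
        for relation in ("isa", "part-of"):
            for parent in term.findall(f"{{{GO_NS}}}{relation}"):
                links.append((uri, relation, parent.get(f"{{{RDF_NS}}}resource")))
    # Fall back to the raw URI if a parent term is not defined in this file.
    return [(names[child], rel, names.get(parent, parent))
            for child, rel, parent in links]


for child, relation, parent in term_relationships(GO_XML):
    print(child, relation, parent)
```

Because each relationship is just an rdf:resource reference to another term's URI, a term with several parents simply carries several such elements, one per parent.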

The GO illustrates a number of interesting points:

The GO is also another example in which the RDF will not necessarily appear for direct use on the Web (although the files are Web-accessible).

The GO illustrates the role RDF can play as a basis for representing ontologies. This role will be further enhanced by richer RDF-based languages for specifying ontologies, such as the DAML+OIL or OWL languages.

RDF supports the use of conventions that will facilitate modular interoperability among separate metadata element sets. These conventions include standard mechanisms for representing semantics that are grounded in a simple yet powerful data model.


References

Miller, Eric.  An Introduction to the Resource Description Framework.

SAMS Teach Yourself XML in 21 days (North and Hermans)

W3: http://www.w3.org/People/EM/talks/www7/tutorial/part2

W3 RDF: http://www.w3.org/RDF

W3 RDF Primer: http://www.w3.org/TR/rdf-primer/

Resource Description Framework (RDF) Model and Syntax Specification http://www.w3.org/TR/PR-rdf-syntax/

Wilkinson, MD, Links, M. (2002). BioMOBY: an open-source biological web services proposal.  Briefings In Bioinformatics 3:4. 331-341.

Bray, Tim. (2001). What is RDF? (XML.com: http://www.xml.com/pub/a/2001/01/24/rdf.html?page=1)

XML in a Nutshell (Harold and Means, O'Reilly)