Semantic Representation Languages

Author:
Andrew Farmer
Date:
March 12, 2003
Version:
1.0

Overview

This document is intended to give a high-level overview of the work that others are doing in the area of "semantic" representation standards for the web. To introduce the set of topics I'll be addressing here, I will begin by quoting a concise but excellent summary:

"...the growing stack of W3C recommendations related to the Semantic Web.
from Web Ontology Language (OWL): Overview

Another oft-cited graphical representation of this "semantic stack" and its relationship to the complete vision of the semantic web, including some even higher levels relating to logic, proof and trust, can be found at http://www.w3.org/2000/Talks/1206-xml2k-tbl/slide10-0.html

I'll elaborate a little on these subjects and give some thoughts on the significance of each of these layers to MOBY in what follows. It is worth pointing out, however, that the characterization of these topics as a stack is perhaps a little misleading, especially with respect to RDF, which does not depend on XML but uses it as one possible serialization format. Nevertheless, the notion of a stack is definitely useful in understanding how the more advanced semantic standard represented by OWL builds on top of the foundations laid out by RDF and RDF Schema. (OWL represents the transitioning of the earlier DAML/OIL work to an RDF foundation under the auspices of the W3C; OWL itself is broken into a stack of OWL Lite, OWL DL and OWL Full.) It's a rather elegant layering that seems to facilitate a very flexible approach to adopting the level of semantic precision needed in a variety of contexts (which can become very abstruse), as long as one accepts the basic foundation of the RDF model.

XML Schema

At its core, XML Schema provides a way of specifying constraints on XML-encoded data in terms of simple element "datatypes" and complex element structures. In other words, an XML document representing a set of XML Schema constraints allows a schema processor to determine whether or not another XML document is a valid instance of that schema; this is essentially the same sort of thing for which people use DTDs, with a number of enhancements to the constraint language.

As far as "datatypes" are concerned, this is something that was almost totally lacking in the facilities provided by DTDs (with the possible exception of such "datatypes" as NMTOKEN). The specification provides a number of "built-in" datatypes (various flavors of numeric, datetime, and string), the sort of things one would map to the basic types in a database system or programming language. Each datatype is characterized in terms of a "value space" (e.g. signed 32 bit integers) and a "lexical space" (how the values are represented as characters). A standardized representation of a "null" value is also given ("xsi:nil").

Some of these types are "primitive", while others are derived in terms of restrictions ("facets") placed on these primitive types; for example, "integer" is derived from "decimal" by constraining the fractionDigits facet to be 0. Facets may also be used by schema authors to derive new simple types; for example, a "pattern" facet allows one to constrain types by means of a regular expression, or an "enumeration" facet allows one to enumerate the allowed values for a type.
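
To make this concrete, here is a minimal sketch of facet-based derivation in XML Schema syntax (the type names are invented for illustration, and the xs: prefix is assumed to be bound to the XML Schema namespace):

    <!-- a decimal constrained to the range [0,1] by bounding facets -->
    <xs:simpleType name="probability">
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="1"/>
      </xs:restriction>
    </xs:simpleType>

    <!-- a string constrained to an enumerated set of values -->
    <xs:simpleType name="strand">
      <xs:restriction base="xs:string">
        <xs:enumeration value="+"/>
        <xs:enumeration value="-"/>
      </xs:restriction>
    </xs:simpleType>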

In addition to the restrictions that may be applied to derive more restrictive simple types, new simple types may be formed using language constructs for defining lists and unions of simple types.
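
Again as an invented sketch, a list type and a union type:

    <!-- a whitespace-separated list of integers, e.g. "0 31 64" -->
    <xs:simpleType name="exonOffsets">
      <xs:list itemType="xs:integer"/>
    </xs:simpleType>

    <!-- a value that may be drawn from either of two simple types -->
    <xs:simpleType name="dateOrText">
      <xs:union memberTypes="xs:date xs:string"/>
    </xs:simpleType>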

The other main advance that XML Schema makes over DTDs is its support for XML namespaces, which allows a much greater degree of modularity in building up complex schema specifications from simpler schema elements that may have been designed independently (and hence may have naming collisions when brought together). With respect to characterizing complex element structures, this namespace support seems to constitute the principal improvement over Document Type Definitions, although there are also a number of new features of the Schema language aimed at increasing the modularity of defining the elements of complex structures.

As with other XML schema specification languages, the main point of using XML Schema is to allow a designer to specify constraints on documents, so that any given document instance can be validated against the constraints specified in a schema. As such, these languages are primarily concerned with element content (datatypes) and element markup structure, without technically supplying any formal "semantics". This is a subtle distinction: one can adopt a set of conventions for encoding logical concepts such as class/property constructs or class/subclass relationships in one's XML structures, but the XML Schema specification itself does not supply a "semantic interpretation" (i.e. one that dictates a set of logical inference rules) for its constructs. Thus, someone coming across an XML Schema specification without knowing the particular conventions wouldn't be able to "reverse engineer" a logical interpretation simply from the structure. Of course, a well-designed XML structure with human-readable tags would probably allow an intelligent user (one conversant with the domain being described) to infer the semantics correctly. Nevertheless, it is this perceived "semantic opaqueness" of the relationships between the pieces of XML structures that forms the core of the argument for using RDF as the foundation of the "semantic web" and its various ontology languages, rather than arbitrary XML grammars.

Here are several reasonably good discussions on the subject of why XML structures are viewed as inadequate as a foundation for representing semantics for the web (some also address possible translations of ontological relationships into schema descriptions):

Despite this, XML Schema does figure into most of the proposed semantic standards; however, its role there seems to be limited to its datatyping facilities rather than its mechanisms for specifying complex element structures.

Another interesting point that we should perhaps consider with respect to XML Schema and its use in MOBY is made in Comparing XML Schema Languages:

"One of the key strengths of XML, sometimes called "late binding," is the decoupling of the writer and the reader of an XML document: this gives the reader the ability to have its own interpretation and understanding of the document. By being more prescriptive about the way to interpret a document, XML schema languages reduce the possibility of erroneous interpretation but also create the possibility of unexpectedly adding "value" to the document by creating interpretations not apparent from an examination of the document itself."

So, for example, we should consider whether it would be better for MOBY to have a "strongly-typed" notion of chromosome position in a central data ontology that forced it to be numerical data (and whose responsibility it would be to do this validation), or for MOBY to simply mark the concept of chromosome position and leave it to consumers to ignore, or raise errors on, chromosome position data that does not meet their expectations. The answer is probably "both": some software would benefit from being able to recognize that cytogenetic positional information, base pair coordinates and centimorgan coordinates are all conceptually related as "kinds of" genomic positions; other software will want to ensure that it's not trying to average "4q22", "100.32 cM" and "2356 bp". I have the sense that we will need to be careful not to tangle up orthogonal concerns in the design of the system, i.e. not to impose "type-safety" unless it is needed. This separation of "description" from "constraint" seems to be a recurring motif in a lot of the work being done in this area; the basic separation between the notions of well-formedness and validity in XML documents is the most familiar example, but we will see the same idea expressed in somewhat different terms as we explore the higher levels of the ontology description stack.
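
To make the alternatives concrete, here is a rough sketch of what the "both" approach might look like in XML Schema terms; the type names and the cytogenetic pattern are invented for illustration, not a proposal for MOBY's actual ontology:

    <xs:simpleType name="basePairPosition">
      <xs:restriction base="xs:nonNegativeInteger"/>
    </xs:simpleType>

    <xs:simpleType name="centimorganPosition">
      <xs:restriction base="xs:decimal"/>
    </xs:simpleType>

    <!-- e.g. "4q22"; a real pattern would need more care -->
    <xs:simpleType name="cytogeneticBand">
      <xs:restriction base="xs:string">
        <xs:pattern value="([0-9]{1,2}|[XY])[pq][0-9]+(\.[0-9]+)?"/>
      </xs:restriction>
    </xs:simpleType>

    <!-- "some kind of genomic position": related, but not interchangeable -->
    <xs:simpleType name="genomicPosition">
      <xs:union memberTypes="basePairPosition centimorganPosition cytogeneticBand"/>
    </xs:simpleType>

Software that only cares that something is a genomic position could validate against the union type, while software that wants to do arithmetic could insist on basePairPosition.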

RDF (Resource Description Framework)

At a basic level, RDF seems so simple that it can be initially rather hard to understand why it should be taken so seriously by semantic web researchers as the foundation of the next generation web. There are a lot of subtleties to RDF that I wouldn't claim to understand, but I think I'm beginning to grasp the core of the idea, and think that it's well worth considering its significance to MOBY.

The basics of the idea will be familiar to anyone coming from a data-oriented background; in fact, RDF is described in various places as the key to building a "data-oriented" web, as opposed to the "document-oriented" first generation web.

"The foundation of RDF is a model for representing named properties and property values. The RDF model draws on well-established principles from various data representation communities. RDF properties may be thought of as attributes of resources and in this sense correspond to traditional attribute-value pairs. RDF properties also represent relationships between resources and an RDF model can therefore resemble an entity-relationship diagram. (More precisely, RDF Schemas - which are themselves instances of RDF data models - are ER diagrams.) In object-oriented design terminology, resources correspond to objects and properties correspond to instance variables."
from: Resource Description Framework (RDF) Model and Syntax Specification

Before we get into a discussion of the key differences of RDF from the relational and object-oriented models, let's make sure we get a handle on the standard terminology. The unit of meaning in RDF is the statement, which is conceptualized as a triple consisting of a subject, an object and a predicate (or property) that relates the two. The subject of an RDF statement is always a resource, that is, something with a URI. The object may be either another resource or a "literal" of some sort (e.g. a string, an integer, a chunk of XML). The predicate that relates the two is called a property; properties are themselves a special subset of resources (this is important, and we'll expand on it in what follows). So, in relational database terms, an RDF statement can be thought of as giving the value (the object) of a particular column (the property) for a given row (the subject). If the object is another resource, the column would be a foreign key, otherwise for a literal the column would be a regular datatype; similarly, in object-oriented data modeling, statements with resources for their objects would be instance variables storing references to other objects, while those with literals would correspond to primitive datatypes.
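
For example (the ex: vocabulary and the gene and chromosome URIs are invented), here are two statements about a single subject, one with a literal object and one with a resource object, in the RDF/XML serialization:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:ex="http://example.org/terms#">
      <!-- subject: the resource named by the URI in rdf:about -->
      <rdf:Description rdf:about="http://example.org/genes/BRCA1">
        <!-- predicate ex:symbol with a literal object -->
        <ex:symbol>BRCA1</ex:symbol>
        <!-- predicate ex:locatedOn with another resource as object -->
        <ex:locatedOn rdf:resource="http://example.org/chromosomes/17"/>
      </rdf:Description>
    </rdf:RDF>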

There are two subtle, but critical differences between RDF and these familiar approaches to data modeling, which make it suitable to be the data modeling technique for the web in its sense of a universal information space.

The first is the use RDF makes of the web's URIs. RDF basically provides a framework for making "meaningful" assertions or "statements" about "resources", that is to say, things that have been given URIs. Having a URI is the sine qua non of being "on the web"; it means having a unique identity in the universal information space of the web (which is independent of whether or not the resource is network-retrievable, like a web page). Having a URI is like having a primary key in a database or a reference to an object, but instead of the scope being this database or this computer's memory, the URI is universal. (Note that the LSID of the I3C is a species of URI.) So, by insisting that its identifiers are universal, RDF provides for decentralization of data without fear of "id-space collisions".

The second key difference is that the properties themselves have URIs. This is important for several reasons. First, it allows properties to be "first-class citizens" of the data model, independent of constructs such as tables or classes; the same property (as identified by URI) need not be constrained to apply only to instances of a given class of object (although higher levels in the "semantic stack" allow the domain and range of properties to be constrained); alternatively, one can imagine any property as defining a class of objects (those objects which have been given a value for the property). Second, the property's significance is universal, i.e. anyone who uses the same property to make an assertion must (or should) "mean" the same thing as anyone else. Third, since properties are themselves resources, they may be used as the subjects of statements. This forms the basis of RDF's ability to define arbitrary levels of metadata using the same basic model, and leads to the vocabularies of properties defined in the higher levels of the "semantic stack" for defining schemas and ontologies in the RDF model.
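
Continuing the invented ex: vocabulary from the previous example (namespace declarations elided), note that a property can itself be the subject of statements:

    <!-- the property is itself a resource, so it can be described in turn -->
    <rdf:Description rdf:about="http://example.org/terms#locatedOn">
      <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
      <ex:definition>Relates a genomic feature to the chromosome that carries it</ex:definition>
    </rdf:Description>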

Looking back at the analogy to relational databases, some of these differences are well expressed by the following description:

"Is the RDF model an entity-relationship model? Yes and no. It is great as a basis for ER-modelling, but because RDF is used for other things as well, RDF is more general. RDF is a model of entities (nodes) and relationships. If you are used to the "ER" modelling system for data, then the RDF model is basically an opening of the ER model to work on the Web. In typical ER model involved entity types, and for each entity type there are a set of relationships (slots in the typical ER diagram). The RDF model is the same, except that relationships are first class objects: they are identified by a URI, and so anyone can make one. Furthermore, the set of slots of an object is not defined when the class of an object is defined. The Web works though anyone being (technically) allowed to say anything about anything. This means that a relationship between two objects may be stored apart from any other information about the two objects. This is different from object-oriented systems often used to implement ER models, which generally assume that information about an object is stored in an object: the definition of the class of an object defines the storage implied for its properties.

For example, one person may define a vehicle as having a number of wheels and a weight and a length, but not foresee a color. This will not stop another person making the assertion that a given car is red, using the color vocabulary from elsewhere. Apart from this simple but significant change, many concepts involved in the ER modelling take across directly onto the Semantic Web model."

from: What the Semantic Web can represent

I'd like to call special attention to the idea in this paragraph that "The Web works though anyone being (technically) allowed to say anything about anything." This is stated as one of the principal design goals for RDF (see Resource Description Framework (RDF): Concepts and Abstract Syntax), and seems to be really fundamental to understanding the view of the semantic web research world (whether or not you accept it as a desirable goal).

A set of RDF statements is easily represented as a directed, labeled graph in which the nodes represent the subjects and objects of statements and the edges represent the predicates. (Technically, it's a multigraph, since two nodes representing resources may be connected by many edges.) Each resource used as a subject or object in a statement labels a single node in the graph, and each edge is labeled by its property. Thus, unlike XML, whose ancestry as a document-oriented language led to its "natural representation" as a tree (reflected in the DOM API, for example), RDF is really built upon a general graph-oriented (read: many-to-many relationships) foundation.

As I have pointed out elsewhere, the distinction between the XML approach and RDF is confused by the fact that the RDF spec suggests a standard serialization of RDF into an XML syntax. This canonical XML format for RDF has been the source of some controversy and, some say, is one of the major reasons for RDF's failure to be as widely adopted as XML by the regular web development community. However, there are other, simpler serialization formats for RDF, the most prominent being the N3 format, which basically represents each statement directly as a triple. At any rate, it is important to bear in mind that RDF is a theoretical model and is not dependent on any particular syntactical implementation, in the same way that the relational model is not dependent on a physical implementation.
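
To illustrate this independence from syntax, here is the same invented statement in both serializations (namespace declarations elided from the RDF/XML fragment); the N3 line below is in its simplest, one-triple-per-line form:

    RDF/XML:

      <rdf:Description rdf:about="http://example.org/genes/BRCA1">
        <ex:symbol>BRCA1</ex:symbol>
      </rdf:Description>

    N3:

      <http://example.org/genes/BRCA1> <http://example.org/terms#symbol> "BRCA1" .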

The theoretical model behind RDF is not merely a proposal for an open-ended scheme for information encoding, but is concerned with the problem of inference; i.e., given a set of explicit RDF assertions, what implicit facts may be inferred from them. Thus RDF should be understood as explicitly providing a substrate for AI-like inference engines. (For example, see An Introduction to Prolog and RDF for a discussion of the possible translation of RDF into Prolog knowledge bases.) One of the major features of the logical model behind RDF is the notion of "monotonicity", which basically means that no "conclusion" is drawn from a set of RDF statements that additional "premises" could invalidate; there are a lot of subtleties to this, but the basic import seems to be that inference engines are restricted from making any "closed world assumptions" about the set of assertions from which they derive conclusions.

RDF has some more complicated constructs which I'll only mention in passing, such as containers (rdf:Bag, rdf:Seq and rdf:Alt), collections, and reification (the ability to treat a statement itself as a resource about which further statements can be made).

There seems to be good tool support for basic RDF manipulation in a variety of languages, although it's certainly not as mainstream yet as XML. See, for example, the lists at Dave Beckett's Resource Description Framework (RDF) Resource Guide.

Finally, some further reading (none of it is too heavy) that may be helpful to get a perspective on the significance of RDF:

Business Model for the Semantic Web (This is short and non-technical, but worth reading to get a feel for the intention behind RDF, especially the last two sections.)

I found these to be helpful introductions to the vision of the semantic web and the importance of RDF in that vision:

Finally, this little discussion of the "utility" of the RDF approach for simple data representation resonated with me, especially the bit about the situations in which it is advantageous to not constrain oneself to a definitive schema ...

"Responding to both St.Laurent's claim about straitjackets and to Champion's plea for a demonstration of RDF's utility, Eric van der Vlist said that lots of things -- like RDBMS and XML -- are straitjackets, that every storage or representation technology has advantages and disadvantages, including RDF. "RDF and its triples," van der Vlist claimed, are "really lightweight when you have the right tools to manipulate them. I like to think of them as a RDBMS with a variable geometry: each 'row'...can have a variable number of columns..."

Van der Vlist makes nicely the point I made earlier about Python's rdflib. Being able to use RDF as a loose storage system, without having to worry about outgrowing (or even fully specifying, in advance) an RDBMS schema can be very helpful, in at least two situations: because, first, you don't know what the data schema really is yet, owing either to problem domain constraints or to an extended prototype stage; and, second, because in some applications the storage schema needs to stay very flexible and extensible for the lifetime of the project. Or, as van der Vlist said, RDF is "like a RDBMS which you could populate before having written any schema, that's really very flexible..."

from: RDF, What's It Good For?

RDF Schema

Once you have grasped the concepts behind RDF, RDF Schema is best understood as introducing a basic set of resources (i.e. terms identified by URIs in a common namespace for RDFS) which can be used by people who want to build their own domain-specific schemas by referring to a standard vocabulary of terms. For example, one can make an RDF statement such as "moby:Gene rdf:type rdfs:Class", meaning that the conceptual resource identified as "moby:Gene" is an element of (the meaning of rdf:type) the set of resources described by the concept represented by rdfs:Class; this, in turn, means that it can be used as the "object" in statements that use rdf:type to describe a resource. In other words, whereas RDF basically describes the way in which meaningful statements can be made about things, RDF Schema begins to define a vocabulary for making statements (in RDF) that describe those things known as "schemas", and establishes the semantics of the terms in these vocabularies (e.g. rdf:type is an instance of rdf:Property whose rdfs:range is rdfs:Class). To get a better sense of the sorts of semantics that may be expressed using the terms in the RDF Schema vocabulary, you may wish to take a quick glance at RDF Vocabulary Description Language 1.0: RDF Schema. The most important terms are rdfs:subClassOf and rdfs:subPropertyOf, which allow for the development of inheritance hierarchies among the terms used in a schema described using the rdfs vocabulary.
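
Here is a sketch of what a fragment of such a schema might look like in the RDF/XML serialization; the moby: terms and URIs are invented placeholders, not actual MOBY vocabulary:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <rdfs:Class rdf:about="http://example.org/moby#SequenceFeature"/>
      <rdfs:Class rdf:about="http://example.org/moby#Gene">
        <!-- every Gene is a SequenceFeature -->
        <rdfs:subClassOf rdf:resource="http://example.org/moby#SequenceFeature"/>
      </rdfs:Class>
      <rdfs:Class rdf:about="http://example.org/moby#Chromosome"/>
      <rdf:Property rdf:about="http://example.org/moby#locatedOn">
        <rdfs:domain rdf:resource="http://example.org/moby#SequenceFeature"/>
        <rdfs:range rdf:resource="http://example.org/moby#Chromosome"/>
      </rdf:Property>
    </rdf:RDF>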

Now, this may seem like a somewhat cumbersome system for essentially expressing the same basic information that object-oriented languages or relational modeling tools allow one to express. However, it is important to realize that the intention is to translate these sorts of concepts outside of the context of any particular programming language and into the universal information space of the web. To use a familiar example, GO is distributed as RDF that uses a property defined in the GO namespace "go:isA" to relate the resources it is describing; assuming that it's more likely that a semantic search engine would be written to the language of the RDF Schema than to GO, it would perhaps be better for GO to use the same terms (i.e. rdfs:subClassOf); on the other hand, one of the nice properties of RDF is that the discrepancy is easily amended (at least conceptually) by adding an assertion that the two properties are logically equivalent.
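
For illustration (using an invented placeholder URI for the GO property), one way to express such an amendment at the RDFS level is to declare go:isA a specialization of the standard term, so that an RDFS-aware processor can draw subclass inferences from GO's existing triples; full two-way equivalence would require a second rdfs:subPropertyOf assertion in the opposite direction, or owl:equivalentProperty one level up the stack:

    <!-- go:isA is declared to behave as a special case of rdfs:subClassOf -->
    <rdf:Description rdf:about="http://example.org/go#isA">
      <rdfs:subPropertyOf
          rdf:resource="http://www.w3.org/2000/01/rdf-schema#subClassOf"/>
    </rdf:Description>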

Furthermore, the grounding in RDF has some interesting implications for the behavior of "schemas" defined in this way. I highly recommend reading the excellent discussion in RDF Primer: Interpreting RDF Schema Declarations, but I'll paraphrase it briefly.

First, the fact that the focus of RDF is on relating things via properties (rather than defining classes of things in terms of properties) has the natural corollary that properties may be described schematically without reference to their context in describing things belonging to classes defined by means of these properties. So, for example, one could define a property "Name" and it would have the same significance with respect to any object that had a string predicated of it via this property; contrast this with the case where many different classes have a property with this name, but it is not prima facie evident whether a gene.name property is in any sense similar to a person.name property. It could be argued that the ambiguity in the latter case could be addressed by an inheritance scheme that placed the name property in a superclass from which both gene and person were derived; however, this strategy will require multiple inheritance when considering multiple properties that may be independently combined, and in the limit essentially reduces to defining each property independently as a class of things that have that property.
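
A small sketch of this property-centric point (invented names, namespace declarations elided): the property is declared once, with no owning class, and then predicated of things as different as a gene and a person:

    <!-- declared once, independent of any class -->
    <rdf:Property rdf:about="http://example.org/terms#name"/>

    <rdf:Description rdf:about="http://example.org/genes/BRCA1">
      <ex:name>breast cancer 1, early onset</ex:name>
    </rdf:Description>

    <rdf:Description rdf:about="http://example.org/people/jdoe">
      <ex:name>Jane Doe</ex:name>
    </rdf:Description>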

Second, type systems as used in closed-world environments like programming languages or databases are fairly tightly bound to the application of the constraints implied by the type descriptions; an RDF Schema description, however, merely describes how a processor might test the conformance of certain instances of data to the properties described by the schema; it obviously can't do anything to enforce those constraints. So, for example, if I create a certain Class definition in Java, it will constrain the set of properties that may be associated with an instance of that Class, and will not allow me to dynamically associate new properties with an object instance that would potentially alter its typing at run-time. This makes good sense in non-distributed or centrally coordinated environments, and is certainly key to implementing applications like compilers and database systems that need to organize data efficiently according to design-time constraints on runtime behavior. On the web, on the other hand, any schematic description is essentially only another set of assertions that someone has made about something, and may not even be known to someone else making another set of assertions about the same thing.

OWL: Web Ontology Language

OWL can basically be understood as a further extension of the work begun in RDF Schema to define a set of resources for use in RDF-based descriptions of schemas/ontologies, with more expressive (and more complicated) semantics. Its history begins with the independent development of DAML and OIL by US and European researchers, continues with their fusion into DAML+OIL, and concludes with the reformulation of the terms and semantics developed in those efforts to be consistent as an extension of the RDF and RDF Schema framework.

For example, one can describe cardinality constraints on properties with respect to their use with a given class; one can characterize domain-specific properties in terms of classes of logical properties such as transitivity, or relate two properties as being inverses of each other (hasParent/hasChild); one can make assertions about the logical equivalence of two classes or properties or assert the identity of two individual instances; one can characterize classes as class expressions that are logically composed of other classes (via unionOf, intersectionOf, complementOf), or define a class as an enumeration of a set of individuals.
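
A few of these constructs in sketch form (invented moby: vocabulary, namespace declarations elided):

    <!-- two properties declared to be inverses of each other -->
    <owl:ObjectProperty rdf:about="http://example.org/moby#hasParent">
      <owl:inverseOf rdf:resource="http://example.org/moby#hasChild"/>
    </owl:ObjectProperty>

    <!-- partOf chains: if A partOf B and B partOf C, then A partOf C -->
    <owl:TransitiveProperty rdf:about="http://example.org/moby#partOf"/>

    <!-- a cardinality constraint on a property, local to one class -->
    <owl:Class rdf:about="http://example.org/moby#Gene">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="http://example.org/moby#locatedOn"/>
          <owl:cardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:cardinality>
        </owl:Restriction>
      </rdfs:subClassOf>
    </owl:Class>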

It is important to note that the logical expressions introduced in the language have their roots in a tradition of research on "description logics", for which efficient algorithms have been developed to perform inferences over sets of assertions about classes and properties. These allow a reasoning engine to answer questions such as those regarding the logical subsumption of concepts (i.e. hierarchical relationships between concepts that have not been explicitly encoded, but which follow from their definitions in terms of properties), or inconsistencies between the constraints asserted for a given concept and an instance asserted to belong to that class of things but violating the declared constraints on membership of that class.

In connection with this notion of inference power, it is perhaps worth noting that the various flavors of OWL (Lite, DL and Full) are aimed at different levels of inferential power and computational difficulty. OWL Lite is intended to support classification hierarchies and simple constraint features (e.g. cardinality values of 0 or 1). OWL DL includes all OWL language constructs, but restricts their use in ways that guarantee that all reasoning based on these constructs will be decidable and complete; this seems mostly to relate to the notion of "type separation", which requires that classes, properties and instances be treated as disjoint sets (e.g. a class cannot be considered an "instance" of some other class, only a "subclass"). OWL Full removes these restrictions, but with the result that "it is unlikely that any reasoning software will be able to support every feature of OWL Full."
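
As a toy illustration of the type-separation point (invented terms again), the following pair of assertions uses moby:Gene both as a class and as an individual, which is permitted in OWL Full but ruled out of OWL DL and OWL Lite:

    <owl:Class rdf:about="http://example.org/moby#Gene"/>
    <!-- the class itself appears as an instance of another class -->
    <rdf:Description rdf:about="http://example.org/moby#Gene">
      <rdf:type rdf:resource="http://example.org/moby#AnnotationCategory"/>
    </rdf:Description>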

It may be instructive to look at the use cases and requirements that were developed by the group responsible for the OWL proposal (Web Ontology Language (OWL) Use Cases and Requirements) and to consider whether we could imagine similar usages in the context of a system like MOBY.

A decent (and fast) practical overview of the application space of ontologies at the level of DAML+OIL (i.e. OWL DL) is given in DAML+OIL for Application Developers. This covers the most important constructs found from RDF up to DAML+OIL and provides a good set of pointers (a little bit dated, but fairly representative) to applications that have been developed on top of the semantic web framework (from RDF to DAML+OIL).

A good introduction to the history behind this level of the stack and the sorts of problems being addressed here is given in OIL in a Nutshell.

DAML-S: An extension of DAML+OIL for characterization of "services"

DAML-S is an extension of the DAML+OIL ontology that provides a specialized set of ontologically defined terms for use in describing service capabilities. It is intended to be complementary to the capabilities provided by standards such as WSDL; whereas the latter provides a specific syntax for describing how to interact with a service (message formats, protocol bindings, etc.), it does not provide any formal semantics for describing what the service does. I believe that the myGrid work has built on top of the DAML-S foundation as a domain-neutral ontology for services, which they have augmented with bioinformatics-specific service concepts (BLAST et al.).

The objectives of DAML-S are to develop a language for service description that specifically provides for:

- automatic service discovery
- automatic service invocation
- automatic service interoperation and composition
- automatic service execution monitoring

It is worthwhile to look at some of the motivating examples of these features in section 2 of DAML-S: Semantic Markup for Web Services. These include reasonably complicated examples of services, with a particular emphasis on the notion of dynamic facilitation of interoperation via "computer-interpretable API", as opposed to possible senses of the latter phrase that might be restricted to compiler-like type-checking.

The DAML-S "upper" ontology is broken into three parts: ServiceProfiles, ServiceModels and ServiceGroundings.

ServiceProfiles are intended to describe services at a level that will support discovery. The profile is concerned with high-level characteristics of services such as inputs, outputs, preconditions and effects, information about the service provider, and functional characteristics such as quality of service or geographic radius. It should be noted that the description of these characteristics is semantic, as opposed to the syntactic characterization given by the XML Schema types used in WSDL documents to describe the contents of messages. The emphasis here is on a declarative representation of service capabilities that is not bound to any one form of registry or style of lookup. For example, the case where demand for a service outweighs supply is discussed in terms of a registry of requests that would presumably be characterized using similar semantics and queried by the providers of services. The characterization of services at this level is intentionally somewhat less precise than what would be necessary for a consumer of the service to interact with it; the focus of the profile is to enable discovery.

The next aspect of service description in the DAML-S ontology is the ServiceModel. This is primarily concerned with describing the process model (control flow/data flow) involved in using the service and is aimed at enabling composition and execution of services. There is a degree of overlap between what may be specified in the Profile and what is specified in the Model; for example, both support description of inputs, outputs, preconditions and effects; however, there is no constraint that the information specified in these two places be the same. The basic idea is that while the description offered by the Profile is aimed at rough matching of the needs of a consumer with the capabilities of the provider, the Model is used to support an actual interaction with the discovered service, and thus may specify this information more precisely or provide fuller details. For example, services may wish to expose some details about their internal process model, such as whether or not they are "atomic" or "composite"; in the case of a composite process, the Model may describe how it is composed of other services and how the information/control flow takes place between the components. This latter area (information/control flow) is also the subject of several other web-services-oriented standards having to do with choreography or orchestration of conversational state, such as WSFL (Web Services Flow Language) and BPEL4WS (Business Process Execution Language for Web Services). In some ways, this seems like a strange capitulation of the basic notion of encapsulation, but as I understand it, the idea is to support such use cases as:

- representation of "workflows" that support high-level goals, but whose components (service instances) may be composed given a particular client's preferences or runtime circumstances
- process monitoring (The REST community has some interesting ideas about how the REST approach can be used for the purposes of coordinating distributed processes based on REST's resource-centric approach: see, for example, A Web-Centric Approach to State Transition)

Finally, DAML-S provides the notion of a ServiceGrounding, which describes how to take the abstract specification of the Model and translate it into the concrete messages to be passed between the service consumer and service provider. It is very similar to the concept of a binding presented by WSDL, and the authors of the DAML-S ontology show how a DAML-S grounding can be specified in terms of a mapping onto WSDL. I'm not clear on all the subtle details here, but the most significant point seems to be that the WSDL specification of types for its messages is done in terms of an XML Schema specification, while the DAML-S Grounding specifies message parts in terms of DAML+OIL classes. Thus, the latter is "semantically accessible" to inference engines, whereas the former is "syntactically concrete" enough to be used by toolkits that can automatically generate the messages.

As far as significance to MOBY is concerned, I think that DAML-S is at least worth a certain amount of consideration in terms of its separation of description for the purpose of service discovery from description for the purposes of invocation or other interactions with a service. It seems reasonably clear that the lack of semantics in WSDL is problematic with respect to service discovery, and the UDDI solution to this (as far as I understand it) seems to rely on predetermined taxonomies of service types, which are far from being well-defined in our domain. On the other hand, it's not clear to me how useful many of the specific classes/properties for service description given by DAML-S might be for MOBY. I believe some of the myGrid folks have expressed a certain amount of dissatisfaction with the level of complexity introduced by some of these upper-level ontologies.

Significance to MOBY

As far as the significance to MOBY of RDF and the associated "semantic stack" is concerned, it seems to me that there are several major issues to be considered. (These are pretty rough at this point, but may help to ground all of the abstract discussion into our "problem space" a little bit...)

The first is simply the question of the extent to which MOBY might benefit from the sorts of applications that are already being developed around this framework, from simple APIs for manipulating sets of RDF statements to inference engines and semantic search tools built on top of the RDF foundation and higher levels in the stack. It's pretty clear that the myGrid project has gone far along this path (and is up at the OWL level of the stack as far as semantic expressiveness is concerned); see, for example, the recent announcement of myGrid's choice of the Cerebra "inference engine" to drive the project: http://lists.w3.org/Archives/Public/www-rdf-logic/2003Jan/0000.html. Whether or not we feel we need to embrace this level of semantic complexity for MOBY, it seems clear that we are going to at least be dealing with class/subclass and class/instance relationships, and will need to support traversing these links; possibly also further "graph navigation" of schemas via property relations between classes.

Next, to what extent do we see the goals of the "semantic web", in terms of its extreme embrace of the "open world" principle, as being consistent with the MOBY vision; or, conversely, are we perfectly happy to accept that in certain respects we can assume a certain level of internal convention? For example, I could easily imagine wanting to construct an XML element structure representing position on a genome that was "semantically opaque" in the sense that it had subelements (e.g. start and end) that were not intended to be understood or referenced independently of their context in the genomic position structure. On the other hand, it seems to me that there are fundamental semantic constructs we will need that are supplied by RDF, such as the basic construct of asserting "semantically-typed" data about a uniquely identified thing, and at least the very basic sorts of concept hierarchy and perhaps concept/property relationship semantics; RDF certainly seems worth considering as a foundation for data/metadata representation for our system.

Finally, I should note that in some ways the core of the vision for the semantic web seems to have some interesting parallels (at least superficially) to some of the problems that were explored independently in ISYS and DAS. For example, I see the work we did in terms of loose data modeling with the IsysAttribute and IsysObject constructs as being quite similar in some respects to the property-centric view of RDF and its support for dynamically aggregating data about an object, thereby changing its interpretation in the system. On the DAS front, there seems to be a loose parallel between the notion of the reference server establishing a common coordinate system so that "any annotation server can provide annotations for any reference sequence" and the notion of URI space providing the common identity space so that "anyone can make any RDF assertion about anything (represented in URI space)".