A protocol is a set of conventions or rules that allow the orderly and sensible exchange of information. When two parties or software agents agree on a protocol, they agree on an encoding and decoding scheme for the representation and interpretation of data. There are hundreds of communication protocols (see, for example, Javvin's map of communication protocols). A common organization of communication protocols is the OSI-RM (Open Systems Interconnect Reference Model) which was developed by the ISO in 1978. This multi-layer model creates a protocol stack:
Protocol Layer | Description | Example |
7. Application | Interface to user processes | HTTP, FTP, SMTP |
6. Presentation | Architecture-independent data encoding | PP |
5. Session | Establishes connection between processes on different hosts; handles security and creation of the sessions. | NetBIOS/IP |
4. Transport | Point to point connection between hosts | TCP[1], UDP |
3. Network | Packet routing and multicast | IP |
2. Data link | Low-level framing and error correction. MAC addresses are at this level. | IEEE protocols |
1. Physical | Electrical and mechanical connections to the network | Ethernet, Token ring |
Protocol Layers (Reconstructed from queries at: http://foldoc.doc.ic.ac.uk/foldoc/index.html)
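The layering in the table can be pictured as successive encapsulation: each layer wraps what it receives from the layer above with its own header, and the receiving side unwraps in the reverse order. The following is only a toy sketch of that idea. The bracketed text headers, the reduced four-layer list, and the function names `encapsulate` and `decapsulate` are illustrative assumptions, not real wire formats.

```python
# Toy model of a protocol stack: each layer prepends its own header on the
# way down and strips it on the way up. The headers are made-up text tags,
# not real wire formats.
LAYERS = ["Application", "Transport", "Network", "Data link"]  # top to bottom


def encapsulate(payload: bytes) -> bytes:
    """Wrap the payload going down the stack (Application header first)."""
    frame = payload
    for layer in LAYERS:
        frame = f"[{layer}]".encode("ascii") + frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Unwrap going up the stack (the lowest layer's header is outermost)."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]".encode("ascii")
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame


if __name__ == "__main__":
    wire = encapsulate(b"Hello, world!")
    print(wire)
    print(decapsulate(wire))
```

A protocol that lives purely at the Application layer, as proposed for MOBY, only ever sees the innermost payload; everything outside it is the business of the lower layers.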
For MOBY, we will be concerned exclusively with the Application layer; that is, a new protocol that uses HTTP, FTP, etc. to handle the exchange of data under a shared syntax and semantics. This may be accomplished either by extending an existing protocol (for example, using extension headers in HTTP) or by layering on top of an existing Application layer protocol (for example, using SOAP's binding to HTTP, SMTP, etc.). A benefit of the former approach is that HTTP is functionally rich yet simple; it is well documented and broadly adopted, which makes it relatively easy to implement a set of extension standards amongst compliant web servers. HTTP's clean use of headers and payloads, its MIME types, and its set of mature helper applications make it an excellent platform on which to rapidly develop a prototypical model or even a new protocol[2].
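The two approaches can be contrasted in a minimal sketch. The first function extends HTTP itself with an extension header; the second leaves the carrier protocol untouched and wraps the request in a SOAP 1.1 style envelope that could travel over HTTP, SMTP, and so on. The header name `X-Moby-Service`, the `lookup` element, and both function names are hypothetical and are not taken from any MOBY specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def request_with_extension_header(path: str, host: str, service: str) -> str:
    """Approach 1: extend HTTP with a (hypothetical) extension header."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"X-Moby-Service: {service}",  # hypothetical header, for illustration
        "",  # blank line terminates the header section
        "",
    ]
    return "\r\n".join(lines)


def envelope_payload(service: str) -> bytes:
    """Approach 2: wrap the request in an envelope, independent of the carrier."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, "lookup")  # hypothetical operation name
    call.set("service", service)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    print(request_with_extension_header("/services", "www.server.org", "blast"))
    print(envelope_payload("blast").decode("utf-8"))
```

In the first case the MOBY-specific information is visible to any HTTP-aware intermediary but is tied to HTTP; in the second it is opaque to the carrier but portable across carriers.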
Yet a tight dependency on HTTP has its costs: HTTP is a stateless, synchronous protocol (see below), and both properties can be constraining in a web service environment. For example, see IBM's proposal for a "reliable" HTTPR protocol to be layered on top of HTTP[3]. Just as SOAP[4] can be layered on top of HTTP, it can be layered on top of HTTPR, so the use of a separate messaging protocol like SOAP offers an additional layer of encapsulation. Messaging layers like SOAP that can sit on top of both synchronous and asynchronous protocols may therefore be attractive, since asynchronous protocols are increasingly considered essential for web services.
The decision between extending a protocol and layering an additional one may be aided by "grading" protocols, including a new MOBY protocol, on how well they satisfy desirable properties. Ideally, protocols are scalable, efficient, simple, and extensible. Additionally, protocols may operate without the need for a maintained open connection (asynchronous), or they may maintain an interaction between sender and receiver (synchronous); they may send a series of self-contained requests (stateless), or they may set and use information across requests (stateful). Desirable properties are sometimes antagonistic. For example, a compact binary encoding may increase a protocol's efficiency but reduce its simplicity; similarly, maintaining state information may increase per-request efficiency at the cost of connection overhead. The protocol designer should consult the use cases and technical requirements of the protocol to balance these conflicting properties.
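The tension between efficiency and simplicity can be made concrete with a small sketch that encodes the same record twice, once as self-describing text and once as a fixed binary layout. The record, its field names, and the layout string are invented for illustration only.

```python
import json
import struct

# Hypothetical record: a numeric id, a score, and a flag.
record = {"id": 1234, "score": 0.75, "valid": True}

# Text encoding: self-describing and easy to inspect, but larger.
as_text = json.dumps(record).encode("utf-8")

# Binary encoding: network byte order, unsigned 32-bit int, 64-bit float,
# and a bool. Compact, but sender and receiver must agree on the layout
# in advance, and the bytes are not human-readable.
LAYOUT = "!Id?"
as_binary = struct.pack(LAYOUT, record["id"], record["score"], record["valid"])

if __name__ == "__main__":
    print(len(as_text), "bytes as JSON text")
    print(len(as_binary), "bytes as packed binary")
    print(struct.unpack(LAYOUT, as_binary))
```

The binary form is several times smaller, but adding a field means changing `LAYOUT` on both sides, whereas the text form tolerates an extra key without any prior agreement. That is also a small instance of the trade-off with extensibility.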
Property | Description |
Scalability | Properties are preserved upon massive scaling. |
Efficiency | Bandwidth is conserved by compactly sending relevant, and only relevant, information. |
Simplicity | Simple tasks are done simply, complicated tasks can be decomposed into simple tasks. |
Extensibility | The protocol can adapt to address unforeseen changes. |
Synchronicity | Asynchronous: does not wait for reply from receiver (e.g., ANTP[5], BEEP[6])[7]. Synchronous: waits (blocks) for a reply (e.g., HTTP, FTP). |
State | Stateless: requests are self-contained (e.g., HTTP, IP, NFS, UDP). Stateful: connection or session information is used over the packet stream or across multiple requests within a session (e.g., FTP, SMTP, TCP). |
Desirable Protocol Properties (Based in part on: http://mappa.mundi.net/features/mtr/properties.html)
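The State row of the table can be illustrated with two toy request handlers. In the stateless version every request carries all the information needed to answer it; in the stateful version the server keeps a per-session total between requests. The class names, the request fields, and the running-total task are assumptions chosen only to keep the sketch short.

```python
class StatelessCounter:
    """Each request is self-contained: the client resends the running total."""

    def handle(self, request: dict) -> dict:
        return {"total": request["total"] + request["amount"]}


class StatefulCounter:
    """The server remembers a per-session total across requests."""

    def __init__(self) -> None:
        self._totals = {}  # session id -> running total

    def handle(self, request: dict) -> dict:
        session = request["session"]
        self._totals[session] = self._totals.get(session, 0) + request["amount"]
        return {"total": self._totals[session]}


if __name__ == "__main__":
    stateless = StatelessCounter()
    first = stateless.handle({"total": 0, "amount": 5})
    print(stateless.handle({"total": first["total"], "amount": 2}))

    stateful = StatefulCounter()
    stateful.handle({"session": "abc", "amount": 5})
    print(stateful.handle({"session": "abc", "amount": 2}))
```

The stateless handler can be replicated freely, since any copy can answer any request, which helps scalability. The stateful handler accepts smaller requests but must keep and protect its session table.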
Because of the ubiquity of HTTP as an Application layer protocol for the web, the remainder of this document discusses the above six properties as they relate to HTTP and contrasting protocols. HTTP is a connection-based, client-server protocol; that is, it is based on the model that one computer (a client) sends a request to another computer (the server), and the server then sends a response back to the client. The client-server model places distinctly different software demands on the client and the server, as is easily seen by noting that merely accessing the web (as a client) is not the same as publishing on the web (as a server). This basic design can be contrasted with peer-to-peer architectures, where all parties access and publish data. HTTP is synchronous: there is no way within the protocol itself to get a response without first sending a request. Both the request and the response may contain arbitrary data in the message body. This differs from single-direction protocols like SMTP (Simple Mail Transfer Protocol) for email, where the protocol does not support anything other than acknowledgements and status messages as a response.
There are only two types of messages in HTTP: requests and responses. Both are specified in a simple, human-readable format, with requests consisting of a start line specifying a method (such as GET or POST), a series of headers (specifying details about the request), and in some cases a body or payload for data. Responses are equally simple, with a status code in the start line, a series of headers, and also a body. A particularly useful header is Content-type, which binds the data to specific applications and thus gives clients a simple lookup mechanism for custom data handling.
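The difference between a synchronous and an asynchronous exchange, as described in the properties table, can be sketched in-process without any network. Here `slow_service` is a hypothetical stand-in for a remote receiver, and a thread pool plays the role of an asynchronous transport; neither is meant to model HTTP or any particular protocol faithfully.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_service(message: str) -> str:
    """Stand-in for a remote receiver that takes a while to reply."""
    time.sleep(0.05)
    return message.upper()


def synchronous_exchange(message: str) -> str:
    # The caller blocks on this line until the reply has arrived.
    return slow_service(message)


def asynchronous_exchange(executor: ThreadPoolExecutor, message: str) -> Future:
    # The caller gets a handle back immediately and may do other work;
    # the reply is collected later with future.result().
    return executor.submit(slow_service, message)


if __name__ == "__main__":
    print(synchronous_exchange("ping"))
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = asynchronous_exchange(pool, "ping")
        print("request sent, doing other work")
        print(pending.result())
```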
| Request | Response |
Start line | GET /resource/file.txt HTTP/1.1 | HTTP/1.1 200 OK |
Headers | Accept: text/* | Content-length: 14 |
| | Content-type: text/plain |
<blank line> | | |
Body | | Hello, world! |
Based on Figure 1-7 of Gourley, D. and B. Totty 2002 HTTP: The Definitive Guide. O'Reilly & Associates.
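The response in the table is simple enough to parse by hand, and the Content-type header then serves as the lookup key for a handler. The sketch below makes several assumptions: the `HANDLERS` table and the function names are hypothetical, the body is given a trailing newline so that it matches the stated Content-length of 14, and a real client would rely on an HTTP library rather than this parser.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = start_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(status), reason, headers, body


# Hypothetical handler table keyed by media type.
HANDLERS = {
    "text/plain": lambda body: body.decode("ascii"),
}

RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-length: 14\r\n"
    b"Content-type: text/plain\r\n"
    b"\r\n"
    b"Hello, world!\n"  # 13 characters plus a newline: 14 bytes
)


def handle(raw: bytes):
    """Parse the response and dispatch the body on its media type."""
    status, _reason, headers, body = parse_response(raw)
    content_type = headers.get("content-type", "application/octet-stream")
    media_type = content_type.split(";")[0].strip()  # drop parameters such as charset
    handler = HANDLERS.get(media_type, lambda b: b)  # unknown types pass through as bytes
    return status, handler(body)


if __name__ == "__main__":
    print(handle(RAW))
```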
The web and HTTP rely heavily on the concept of data as resources, and these resources are referenced by Uniform Resource Identifiers, or URIs. URIs may be of two types, either URLs (Uniform Resource Locators) or URNs (Uniform Resource Names). URLs specify a location via a scheme (or protocol, such as http:// or ftp://), a server, and a path within the server (e.g., http://www.server.org/path/file.htm). In dereferencing the URL, the server's name is mapped to a unique electronic address (IP address) by a network of DNS (Domain Name Service) lookup servers. URNs name resources independently of their location and thus are not tied to IP addresses (e.g., urn:nameSpaceID:nameSpaceString). This makes dereferencing the URN problematic (where is it?), and thus, in the absence of URN-URL mapping servers, URLs are used to the virtual exclusion of URNs. URIs are highly relevant to MOBY, since early thinking on MOBY has not exploited the application of MOBY objects as URI-identified resources.
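The structural difference between the two example URIs above shows up directly when they are split into components. The sketch uses Python's standard `urllib.parse.urlsplit`; the helper `is_locatable` is a hypothetical name introduced here, and the DNS step (for example via `socket.getaddrinfo`) is deliberately left out so that the sketch runs without a network.

```python
from urllib.parse import urlsplit

url = urlsplit("http://www.server.org/path/file.htm")
urn = urlsplit("urn:nameSpaceID:nameSpaceString")


def is_locatable(uri: str) -> bool:
    """True if the URI names a server (network location) that DNS could resolve."""
    return bool(urlsplit(uri).netloc)


if __name__ == "__main__":
    # The URL yields a scheme, a server, and a path within the server.
    print(url.scheme, url.netloc, url.path)
    # The URN has a scheme but no server component to look up.
    print(urn.scheme, repr(urn.netloc), urn.path)
    print(is_locatable("http://www.server.org/path/file.htm"))
    print(is_locatable("urn:nameSpaceID:nameSpaceString"))
```

Because the URN carries no server component, resolving it requires some external mapping service, which is the gap the paragraph above describes.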
MOBY will essentially encompass three protocols: one for syntactic messaging, one for semantic interpretation, and one for data/service discovery and mapping. Even if these functions are embedded in a single API instead of being delineated as separate "protocols," MOBY will have to specify the conventions or rules for the parsing of data, its meaning, and its mechanisms for service discovery. The design and construction of solutions to these tasks may be aided by assessing technologies against the properties of scalability, efficiency, simplicity, extensibility, synchronicity, and state.
[1] TCP (Transmission Control Protocol) is a reliable, byte-stream protocol: data is bundled into packets with a checksum, and each packet is sequentially numbered. The byte stream is guaranteed to be reassembled in order, and the protocol allows the receiver to ask the sender to resend corrupt packets. In this sense, TCP is reliable, since higher-layer protocols can send and forget over TCP with respect to data transmission integrity. UDP (User Datagram Protocol) is a datagram (vs. a byte-stream) protocol: it is unreliable and connectionless, meaning that it does not employ receiver acknowledgements (though datagrams do have checksums), nor does it guarantee a sequential flow of datagrams across larger data streams. UDP is "datagram centric," and is thus appropriate where small amounts of data are being sent in low-overhead conditions. Because UDP does not handle data loss or corruption as part of the protocol per se, integrity is the responsibility of higher layers. See http://www.tcm.hut.fi/Studies/Tik-110.350/1997/Essays/udp.html and www.novell.com/documentation/lg/nw6p/index.html?page=/documentation/lg/nw6p/tcpipenu/data/h1a308vx.html.
[2] Roy Fielding, a noted web guru, proposes the deployment of a new web protocol via HTTP's Upgrade header field in "waka: A replacement for HTTP" available at www.apache.org/~fielding/waka/200211_fielding_apachecon.ppt.
[3] See www-106.ibm.com/developerworks/library/ws-phtt. "Reliability" is used with respect to failure recovery; e.g., multiple POSTs under HTTP may have side-effects such as duplicate orders in a shopping cart or replicate updates in a database, so failed connections while POSTing should not be naively resent. HTTPR addresses these types of issues in both synchronous and asynchronous settings.
[4] SOAP, Simple Object Access Protocol, is an XML-based messaging protocol for data exchange (www.w3.org/TR/SOAP). SOAP is an Application layer messaging protocol: the emphasis in SOAP is how messages are constructed (vs., for example, the Transport layer protocol TCP where the emphasis is in how data is delivered).
[5] Asynchronous Notification Transport Protocol; see http://simp.mitre.org/drafts/antp.html.
[6] Blocks Extensible Exchange Protocol; see http://www.beepcore.org/beepcore/home.jsp, http://www.clipcode.com/peer/beep_technical_whitepaper.htm, and www.ietf.org/rfc/rfc3080.txt.
[7] For interesting reading on how to use HTTP asynchronously, see Technical Whitepaper by Clipcode.com at www.clipcode.com/peer/http_async_notif.htm.
[8] Wu, J. and F. Dai 2003 A Generic Broadcast Protocol in Ad Hoc Networks Based on Self-Pruning, Accepted by the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), Apr. 2003, Nice,
[9] See studies cited in Touch, J., J. Heidemann, and K. Obraczka 1996 Analysis of HTTP Performance. USC/Information Sciences Institute. Available at www.isi.edu/lsam/publications/http-perf.
[10] See also Litjens, R. M. Siler, and M. D. Spiller 1995 FTP versus HTTP: A Comparison of Two Mainstream Transfer Protocols EE228A Fall 1995. Available at www-cad.eecs.berkeley.edu/~mds/research/1995/http.html.
[11] Though see some of the earlier work on PEP - an Extension Mechanism for HTTP http://www.w3.org/TR/WD-http-pep.html.
[12] URLs created dynamically with user-identifying information in the local resource component.