WebDeveloper.com: Where Web Developers and Designers Learn How to Build Web Sites, Program in Java and JavaScript, and More!

A Look at XML
Part 3

Browsers and other consumers of documents often need information about those documents in order to interpret their meaning, parse their contents, and use them effectively. Metadata addresses this need, and I believe XML is a fine format for expressing metadata.

Jim Whitehead's paper "A Proposal for Web Metadata Operations" defines metadata to be "information about information... Information on the Web, known as Web resources, have many pieces of associated descriptive information which is often not explicitly represented in the resource itself. Examples of metadata include the creator of a resource, its subject, length, publisher, creation date, etc. Such descriptive metadata can be used to make information easier to locate by improving Web searches, rate information to protect children from indecent content (e.g. the Platform for Internet Content Selection (PICS)), capture copyright information, contain a digital signature, or store cataloging data. Many other uses are also possible."

Jim Whitehead goes on to declare that "Another type of metadata is the relationship. A relationship captures an association between two or more resources, and can be one to one, one to many, or many to many. Relationships can be used to capture navigational relationships, such as "go to this resource next," or a table of content, and can also express hierarchies (parent/child, successor/predecessor) Relationships have many domain-specific uses, such as a piece of software which has many "implements" relationships with a requirements document. Annotations are another use of relationships in which the relationship points to commentary material on the resource. The use of relationships to capture associations between data items is an old idea, stemming from semantic data modeling, and early hypertext work on the NLS and Xanadu systems."
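To make the two kinds of metadata quoted above more concrete, here is a minimal sketch of how descriptive properties and a relationship might be written down in XML, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`metadata`, `property`, `relationship`, `about`, `type`, `target`), the function name, and the example URIs are all hypothetical, chosen only for illustration; they do not come from Whitehead's paper or from any of the proposals discussed in this article.

```python
import xml.etree.ElementTree as ET


def build_metadata_record(resource_uri, properties, relationships):
    """Build an XML element describing one Web resource.

    properties:    dict mapping a descriptive name (creator, subject, ...) to a value
    relationships: list of (relationship type, target URI) tuples
    All element and attribute names here are illustrative assumptions.
    """
    record = ET.Element("metadata", {"about": resource_uri})

    # Descriptive metadata: one <property> element per name/value pair.
    for name, value in properties.items():
        prop = ET.SubElement(record, "property", {"name": name})
        prop.text = value

    # Relationship metadata: an association between this resource and another.
    for rel_type, target in relationships:
        ET.SubElement(record, "relationship", {"type": rel_type, "target": target})

    return record


# Hypothetical resource: a PostScript file that cannot carry its own metadata.
record = build_metadata_record(
    "http://example.org/spec.ps",
    {"creator": "J. Doe", "subject": "Requirements", "created": "1997-06-01"},
    [("next", "http://example.org/design.ps")],
)
print(ET.tostring(record, encoding="unicode"))
```

Because the record names the resource it describes by URI rather than living inside it, the same approach could in principle describe resources of any media type.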

Metadata on the Web

Now, let's take this a level further. Metadata on the Web should be applicable to resources of any media type. Note that on today's Web, many resources are not HTML (to name a few examples, Java applets, Adobe Portable Document Format, and Adobe PostScript). Metadata on the Web should provide descriptive information about Web resources of any media type, including those that have no built-in provision for storing general-purpose metadata (and never will).

Rohit Khare identified some overlap between some current metadata proposals:

  1. PICS. The PICS designers realized that URLs are not enough for secure pointers, so they had to figure out how to distinguish the different versions, media types, languages, etc. that could be behind one location. This led to...
  2. PICS-NG. This collided with a separate intra-PICS movement to have more structured rating values (strings, structs, pathnames, set inclusion/exclusion, etc.). Ora Lassila from Nokia is working on that draft at W3C.
  3. WebDAV. Jim Whitehead's team effort with respect to metadata is described in his "Proposal for Web Metadata Operations".
  4. Dublin Core, et al. Actual, concrete metadata schemas were banging on our door for acceptance too. Who defines "author", "publisher", etc? The usual digital library community suspects.
  5. SiteMap. Microsoft originally proposed a stylized use of HTML to outline a site, for use in collapsible 'remote controls' and printing. Used nested ULs to indicate hierarchy, etc -- too much tacit knowledge.
  6. Digital signature manifests. It becomes evident almost instantly that one needs to sign packages, not atomic blobs, so we needed a DSIG Common Manifest Format for enumerating bills of materials.
  7. Email to HTML. Qualcomm wanted to use HTML as the native UI format for mail, but needed a way to structurally mark up quoted regions, etc. This drove an ABOUT tag proposal Dave Raggett made earlier, in order to associate metadata about one quotee in several quotes.
  8. XML. Of course, at the same time as wars are being fought between ()s (PICS) and {}s (PEP), <> has emerged as the industry standard for "open dust" (e.g. Open Financial Exchange, most amusingly; HDML, most corrosively). So SiteMaps morphed into XML-syntax-based proposals. Hence the CDF submission, metadata about push channels rendered in XML.

While these metadata proposals are in some sense roughly related (they all use the word metadata), most of them are complementary technologies. For example, Dublin Core, MARC, and RFC 1807 (the Dienst bibliography format) are all bibliographic record formats, created (as you would hope) by researchers from the digital libraries community. These formats were NOT intended to solve the general-purpose Web metadata problem; for example, none of these bibliographic record formats can effectively convey PICS-like rating information. On the other hand, PICS is not a good bibliographic record format. Thus, PICS and Dublin Core/MARC/RFC 1807 can peacefully coexist.

The WebDAV Proposal

Jim Whitehead's "Proposal for Web Metadata Operations" gives a framework that explains the relationship between "large chunk" metadata proposals like PICS, PICS-NG, Dublin Core, MARC, Web Data, etc. and the "small chunk" HTTP extensions proposal. His proposal contains an extensive hyperlinked reference section, which makes it easy to track down the source material being described. The remainder of this section is quoted from Jim Whitehead's own posts to the FoRK mailing list.

Basically, the proposal extends the HTTP object model to create a new area for state storage within a resource, to be used for the storage of name/value metadata pairs. While there is no effective upper bound on the length of a metadata item (and hence you could make a name/value pair like "PICS-label", "{an instance of a PICS label}"), typically you'd want to create a link on the resource which points to the PICS label, which is itself stored as a separate resource. Methods are introduced to create name/value pairs (ADDMETA), delete name/value pairs (DELMETA), and to access name/value pairs (GETMETA). The GETMETA method is bundled with a simple s-expression-like search syntax, so if you want to get a listing of all the attributes on a resource you'd pass a search specification of (OR (AND (name "*")(value "*"))). Hypertext links are defined as a special type of metadata with some constraints on the format and semantics of the value of the link name/value pair (e.g., name="DAV:/Link", value="Type = {token} Source = {URI} Dest = {URI}").
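The object model described above can be sketched as a small in-memory store. This is only an illustration under stated assumptions, not the proposal's wire protocol: the class and method names are hypothetical stand-ins for the ADDMETA, DELMETA, and GETMETA methods, no HTTP is involved, and the s-expression search syntax is replaced by two simple glob patterns (one for names, one for values), so that passing "*" for both corresponds loosely to the "list all attributes" search quoted above. The example URIs and the link type token are invented.

```python
from fnmatch import fnmatchcase


class MetadataStore:
    """In-memory stand-in for the per-resource metadata area (hypothetical API)."""

    def __init__(self):
        # Maps a resource URI to that resource's own name/value metadata pairs.
        self._resources = {}

    def addmeta(self, uri, name, value):
        """Create (or overwrite) one name/value pair on a resource, like ADDMETA."""
        self._resources.setdefault(uri, {})[name] = value

    def delmeta(self, uri, name):
        """Delete one name/value pair, like DELMETA. Returns True if it existed."""
        pairs = self._resources.get(uri, {})
        if name in pairs:
            del pairs[name]
            return True
        return False

    def getmeta(self, uri, name_pattern="*", value_pattern="*"):
        """Return pairs whose name AND value match the glob patterns, like GETMETA."""
        pairs = self._resources.get(uri, {})
        return {
            name: value
            for name, value in pairs.items()
            if fnmatchcase(name, name_pattern) and fnmatchcase(value, value_pattern)
        }


store = MetadataStore()
uri = "http://example.org/report.pdf"  # any content type, not just HTML

store.addmeta(uri, "Author", "J. Doe")
# A link is ordinary metadata whose value follows the Type/Source/Dest layout.
store.addmeta(
    uri,
    "DAV:/Link",
    "Type = PICS-label Source = http://example.org/report.pdf "
    "Dest = http://example.org/report.pics",
)

print(store.getmeta(uri))                        # every pair on the resource
print(store.getmeta(uri, name_pattern="DAV:*"))  # only the link pairs
store.delmeta(uri, "Author")
```

Storing the PICS label as a separate resource and pointing to it with a link pair, as in the second `addmeta` call, mirrors the preference stated in the quoted text for linking to large metadata items rather than embedding them.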

The WebDAV proposal supports small chunk metadata and large chunk metadata. It doesn't address packaging issues, because there are already many proposals for how to package metadata. Far fewer proposals actually address how this metadata is stored and associated with the resources it describes. Because the proposal is implemented via HTTP, it also provides the ability to store metadata on resources of *any* content type, not just HTML. The WebDAV proposal describes "how" metadata is stored and associated, while efforts like Dublin Core, PICS-NG, Web Data, etc., describe "what" metadata is stored, and its packaging.

Thus the WebDAV proposal is complementary to packaging efforts such as Dublin Core, PICS-NG, Web Data, XML, Digital Signature manifests, and so on.

Regarding the Protocol Extension Protocol, PEP only describes extensions to HTTP that involve adding new headers to modify the semantics of existing methods. WebDAV is proposing to add several new methods to HTTP, and hence is outside the scope of PEP. This applies equally to methods like COPY and MOVE as well as to methods like GETMETA. As for HTTP purity, Roy Fielding was present at the meeting where they crafted the GETMETA method (he helped write the BNF for the search syntax), and there are few others who can claim the mantle of "HTTP purist" more effectively than he.

And Yet, I Still Have XML on my Mind

That aside, I still think XML has an excellent role to play in the evolution of metadata for the Web. Then again, I think trust on the Web is really important, too. It's definitely something to think about. Meanwhile, enjoy some of the links I've collected pointing to some of the best XML information I could find on the Web.

