The XML Files
by Nate Zelnick
XML seems to make life more complex, but it will really simplify Web page coding.
Web developers have had to deal with an astounding amount of technical change over the last three years. As the Web has expanded to include more and more of the technical landscape, it has accreted more and more complexity. Add in a vicious war to set browser standards, and the job of building Web sites has strayed far from its initial simplicity.
XML (eXtensible Markup Language) has emerged not so much as another new technology added on to the pile that is straining the Web, but rather as a reassertion of one of the original principles that made the Web possible: Simplicity.
Before the Web became supreme, there were lots of architectures vying to become the universal means of information access. But they all had drawbacks, principally their proprietary nature and their lack of the market share necessary to reach critical mass. The genius of the Web was a straightforward way to mark up documents and the assurance that they could be retrieved anywhere, by anyone on any platform.
Tim Berners-Lee used HTML as the basis for his World Wide Web with the full knowledge that it was inadequate for anything more than structuring text. Amidst a wider discussion of how to add media and binary elements to HTML, Marc Andreessen added image support to Mosaic, released it on the Internet, and the Web as we know it today was born.
But that discussion about binary elements was about a lot more than just pictures. It was focused on how to extend HTML in a more general fashion. The approach taken with Mosaic's IMG tag became an unfortunate precedent in how HTML was later extended. Netscape, the successor to Mosaic, followed a "new tag" approach to adding new presentation elements to HTML.
This strategy was then mirrored by Microsoft when it engaged Netscape in the browser war. Pretty soon the explosion of tags began to splinter the Web into ghettoes of pages viewable by either Navigator or Explorer, but not both.
The fight over how to add image support to HTML was a fight about abstraction. Where the Mosaic crowd wanted a straightforward tag for images, opponents had argued for a more generic approach: A single "object" tag that could be used to encapsulate any kind of binary element.
Abstraction is a key concept for understanding just what XML is designed to do. Where HTML grew presentation-specific tags like barnacles, XML is designed as a framework in which any tag can be created, but no tag carries any notion of presentation. Instead, tags denote structure alone, and presentation is left to another application.
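To see what "structure alone" looks like in practice, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree parser. The tag names (recipe, title, ingredient) and the outline helper are invented for illustration and belong to no standard vocabulary. The point is only that a generic parser can walk tags it has never seen before and report how they nest, without knowing anything about how they should look.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these tag and attribute names are made up for this
# example. Nothing here says how a recipe should be displayed.
DOC = """
<recipe>
  <title>Pancakes</title>
  <ingredient quantity="2" unit="cups">flour</ingredient>
  <ingredient quantity="1" unit="cup">milk</ingredient>
</recipe>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs describing how the elements nest."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
structure = outline(root)

# Print an indented view of the structure the parser recovered.
for depth, tag in structure:
    print("  " * depth + tag)
```

Whether an ingredient ends up as a bullet, a table row, or a line read aloud is a decision for whatever program consumes this tree.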
Let's take another step back to see how this is important. One of the things that makes the Web so cool is that data created for it is accessible from any kind of computer and, with work, any number of networked devices. This device independence is crucial to the Web's success, because it means that any page marked up in HTML can be viewed on any platform and by any application that knows how to parse the language.
This idea of device independence itself derives from HTML's parent, the Standard Generalized Markup Language (SGML). SGML is a framework for describing languages that themselves describe the structure of data. If you've been paying attention, you'll recognize that SGML sounds a lot like XML. That's because XML is an effort to combine the straightforward simplicity of the Web with the flexibility of SGML.
We won't dwell on the differences between SGML and XML (they're not huge). Instead, let's take a look at one application of the conceptual similarities of the two.
The key concept in understanding what XML is for is structure. Where HTML combines structure and presentation in a single rigid tag set, XML clearly delineates structured data from its presentation.
For example, in HTML we denote structure of text by defining generically a set of headline tags--from top level H1 heads down to H6 heads. Subsequent versions of HTML added author-controllable presentation attributes such as text alignment, but initially headlines were defined solely as relative levels in a hierarchy; presentation specifics were up to the browser or other parser. In essence, headlines were defined only as abstracted structural elements.
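The headline idea can be made concrete with a small Python sketch built on the standard library's html.parser module. The HeadingOutline class and the sample PAGE are assumptions made up for this example. The sketch reads only the level and text of each H1 through H6 tag and ignores presentation attributes such as align, which is roughly the view of headlines that early HTML took.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collect (level, text) pairs for h1..h6 elements, ignoring everything else."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # Attributes (for example align="center") are deliberately ignored:
        # only the heading's level in the hierarchy matters here.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None


# Hypothetical sample page, invented for this example.
PAGE = """
<h1>The XML Files</h1>
<p>Intro text.</p>
<h2 align="center">Structure</h2>
<h2>Presentation</h2>
<h3>Style sheets</h3>
"""

parser = HeadingOutline()
parser.feed(PAGE)
parser.close()
outline = parser.outline

for level, text in outline:
    print("  " * (level - 1) + text)
```

The outline that comes back is the same whether or not a headline carries an alignment attribute, because the hierarchy lives in the tag names and not in how they are drawn.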
The advantages of this approach are many. First of all, authors don't have to be designers -- they are free to create documents that have structural integrity and leave presentation to others. Presentation can be adjusted more easily -- either for groups of documents or for a single user with special requirements -- without having to risk changing the underlying information since the document doesn't need to be touched once the structure is created. A corollary to this last point is that the document remains readable even if the consuming application changes radically: Documents can live forever.
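As a rough illustration of adjusting presentation without touching the document, the following Python sketch renders one structure-only document two different ways. Everything named here is an assumption for the example: the article, head, and para tags, the level attribute, and the render_plain and render_html functions. It is not how any particular XML tool does it, only a sketch of the division of labor.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structure-only document; the tag names are invented.
DOC = """
<article>
  <head level="1">The XML Files</head>
  <para>XML separates structure from presentation.</para>
  <head level="2">Why it matters</head>
  <para>Documents can outlive the programs that display them.</para>
</article>
"""


def render_plain(root):
    """One presentation: plain text with indented, upper-case headlines."""
    lines = []
    for node in root:
        if node.tag == "head":
            indent = "  " * (int(node.get("level")) - 1)
            lines.append(indent + node.text.upper())
        else:
            lines.append(node.text)
    return "\n".join(lines)


def render_html(root):
    """Another presentation: map the same structure onto HTML tags."""
    parts = []
    for node in root:
        if node.tag == "head":
            level = node.get("level")
            parts.append(f"<h{level}>{escape(node.text)}</h{level}>")
        else:
            parts.append(f"<p>{escape(node.text)}</p>")
    return "\n".join(parts)


root = ET.fromstring(DOC)
plain = render_plain(root)
html_page = render_html(root)

print(plain)
print(html_page)
```

Both renderers read the same tree, and neither modifies it. A third presentation, say large print for a single reader, would be one more function and no change to the document.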
This is one of the reasons why people are jazzed about XML. Meanwhile, check out my resources page.