Environmental Awareness

Without even realizing it, users connecting to a Web site can transmit a wealth of information about themselves and their computing environment. When a server executes a script, values are set for a number of environmental variables--such as what browser (and version number) is in use, or what MIME file formats the client can accept.

There are many interesting ways to use this kind of client data--from processing forms, to customizing Web pages for different presentations or content depending on a user's browser or IP address. For example, if a site makes use of proprietary browser extensions, users with the targeted browser will load an optimized page, while visitors with other browsers can be redirected to a standardized page. Webmasters can conduct a survey of browsers used to access their sites without building a complex user registration system.

Accessing Client Variable Data

Client variables are sent by the browser to an HTTP server. They are placed in an environment array whose name depends on the scripting or programming language used (for example, the %ENV array in Perl). By parsing these variables, you can use them to determine what pages or files to make available to users on an individual basis.

Here's a simple Perl script that returns all the variables our browser is sending.

#!/usr/bin/perl
# Print every environment variable the server sets for this request.
print "Content-type: text/html\n\n<HEAD>\n";
print "<PRE>\n";
printf ("%-24.24s %-80.80s\n", "Variable", "Value");
foreach (sort keys %ENV) {
   printf ("%-24.24s %-80.80s\n", $_, $ENV{$_});
}
print "</PRE>\n";

Only a couple of these variables are local to the shell invoked on execution of the script. Most are specific client variables that can be used in scripts to create custom results.

Getting Set Up

If you've written Perl or shell scripts before, you know that any data output that you want a remote browser to read as HTML must be preceded by a Content-Type header. This takes the form of a MIME statement showing

Content-type: text/html

and two hard returns, then <HEAD> followed by a hard return. You must follow this exactly, or the remote browser will not read the output from your scripts correctly. The equivalent Perl statement to generate this accurately is:

print "Content-type: text/html\n\n<HEAD>\n";

or in shell ('sh' or 'csh') speak:

cat << EOF
Content-type: text/html

<HEAD>
EOF

This sets the stage for the rest of your output, which gets fed via standard output. If you're expecting any user-fed variables from a form, you need a simple CGI parser that determines GET or POST variable input, parses it, and drops the variables into an associative array. Yahoo lists a variety of these. The specific routine used in examples in this column was available at http://www.bio.cam.ac.uk/web/cgi-lib.pl.txt as of this writing.

Install the script in your cgi-bin directory and note the path to the directory, usually /usr/www/cgi-bin or /usr/local/www/cgi-bin. You then need to include a couple of lines at the top of your Perl routines:

require '/usr/www/cgi-bin/cgi-lib.pl';
&ReadParse;

The associative array %in will then contain the variables from form submissions. This automates a very tedious process, so we suggest using it exclusively.
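
For instance (a hypothetical form field, not one from the article's scripts), a page with an <INPUT NAME="email"> field could be handled like this once the parser has run:

require '/usr/www/cgi-bin/cgi-lib.pl';
&ReadParse;
# %in now holds the decoded form fields, GET or POST alike.
$email = $in{'email'};
print "Content-type: text/html\n\n<HEAD>\n";
print "Thanks--we'll write to $email.\n";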

Talking About Clients

The client variable that stores data as to what browser and version number is used is called HTTP_USER_AGENT. There is no standard format for how this information is returned, but you'll usually see something like:

Browser name/version number (platform; processor; other info) proxy info or misc.

Since there are so many browsers available on the market, it's difficult to know which standards and extensions are supported by each one. Some generalizations are possible, though. For example, current versions of Netscape can display tables and inline JPEG images; they also support a plug-in architecture that allows in-line video, Acrobat PDF display, Corel CMX rendering, Macromedia Director Shockwave application support, and other features.

Of course, there's no way to know which plug-ins are loaded using client variables, so the most you can assume is the standard feature set by version number. For example, tables have been supported by Netscape since version 1.1, while frames--a feature that allows separately scrollable sub-areas in the main Netscape window--were added only in version 2.0.
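
As an illustration (our sketch, not code from the article's scripts), you can pull the version number out of HTTP_USER_AGENT and branch on the feature set it implies:

# Netscape reports itself as "Mozilla/major.minor (...)".
$agent = $ENV{'HTTP_USER_AGENT'};
if ($agent =~ m#^Mozilla/(\d+)\.(\d+)#) {
   ($major, $minor) = ($1, $2);
   $tables = ($major > 1 || ($major == 1 && $minor >= 1));  # tables since 1.1
   $frames = ($major >= 2);                                 # frames since 2.0
}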

Netscape and Microsoft's Internet Explorer have a fairly reliable set of shared features, and most pages designed for one of the two will display similarly in the other; unsupported proprietary tags are simply ignored. Other browsers, such as NCSA Mosaic (the freeware original, which is still being maintained and updated), have very limited table support; and even simpler browsers, like Lynx, display text only.

We've concluded that to support all browsers reasonably well, there should be three versions of a page: a Netscape/Internet Explorer version, one for the AOL browsers and Lynx, and one for the great unwashed masses of other browsers. A recent examination of one of our customers' usage statistics showed that among 150,000 unique visits in November 1995, there were 150 different browsers running more than 2,100 different versions.

Unique Identification

The only reliable way for a Web site to identify unique incoming users is through a combination of "identd" and registration. Sites that have employed registration often find substantially reduced participation, since it's a barrier to entry for first-time users who may not want to exert the time and effort to register. But for certain kinds of applications, where you need to know your visitors in order to let them post to discussion groups or access restricted or licensed data, it makes sense to take the hit.

Two types of identity can be logged and tracked via environmental variables. In the common log format for Web server logfiles, both the "log name" (the HTTP server-based registered name) and the "remote identity" are logged. Both are also dropped live into the environmental variables LOGNAME and REMOTE_IDENT.

Setting up local user and password protection varies widely by server. To generalize, most servers support user identification, and you can simultaneously use this identity to restrict access to the site and use scripts to control further access based on values in LOGNAME.
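
As a minimal sketch (the login names are hypothetical, and some servers expose the authenticated name as REMOTE_USER rather than LOGNAME), a script can gate a page on that identity:

%allowed = ('gfleish', 1, 'editor', 1);   # hypothetical logins
$who = $ENV{'LOGNAME'} || $ENV{'REMOTE_USER'};
print "Content-type: text/html\n\n<HEAD>\n";
if ($allowed{$who}) {
   print "Welcome back, $who.\n";
} else {
   print "This area is restricted.\n";
}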

Remote identity checking requires an ident daemon on the remote end. The server checks a port on the incoming remote host to see if identd is running there and, if so, queries it for the user's identity. Unfortunately, if the remote host isn't running identd, turning on this kind of querying can generate a delay of up to 30 seconds before timeout, depending on the server. With the CERN server, for example, the directive is "IdentityCheck," and the delay is substantial.

Most sites aren't running identd any more, because it can open up a significant security risk: knowing a valid login name on a specific machine makes it much easier to crack into the system.

An alternative way to track users, though not as practical, is to use REMOTE_HOST and REMOTE_ADDR. The REMOTE_HOST variable, when available, is sent as the fully qualified domain name for remote sites (like boombox.bathouse.com) and REMOTE_ADDR is a dotted quad IP address (such as 36.44.0.6). Machines with proper reverse address registration should provide both variables' values. Unfortunately, many system administrators operate in an ad hoc manner, and oftentimes you can only count on getting the REMOTE_ADDR.

For some of our customers, we've built ordering systems that use REMOTE_HOST or REMOTE_ADDR to track a user through a system over short periods of time. We do so under a few assumptions:

  • Unique simultaneous users. Most consumers making purchases on the Net come in from unique addresses. Big sites (like Sun Microsystems or Oracle) may funnel everyone through a single proxy address; but commercial providers and services--such as Prodigy, America Online, CompuServe, Netcom, and PSI--use unique IP numbers for each simultaneous user. You can count on a high probability that any two simultaneous users from the same service will have distinct addresses.
  • Unique users in time. Since we allow only a short persistence window (10 minutes to an hour, depending on the system), even sites such as Sun are unlikely to have two simultaneous users in an ordering system; even if a site has multiple users over the course of a day, the separation in time provides uniqueness.

This approach becomes untenable for systems that handle a large number of orders, but in practice that threshold is on the order of 1,000 orders per day or more. The majority of ordering on the systems that we manage comes from users who are outside of business firewalls.

Practical Uses

With a little time and effort, you can significantly tweak your pages and sites to provide the maximum automatic customization for incoming users. Below we've listed some real-world uses we've had for customizing access to Web information based on values in client variables. The first two examples are designed to help users minimize wait time. The first demonstrates how to redirect AOL and Lynx users to a text-only page instead of a highly graphics-intensive page. The second benefits users whose browsers can support inline JPEGs by sending out JPEGs (which compress well and make use of 24-bit color) instead of GIFs (which don't compress as tightly and are limited to 8-bit color). Finally, we build the order-tracking system described just above, identifying users by host name or IP address.

Unique Pages by Browser

One of our customers gets about 6 percent of its visits from users on America Online, which has a notoriously slow browser for loading images, especially its Macintosh version. We proposed maintaining a non-graphics-intensive welcome page for AOL users and users with Lynx browsers--who represent the next largest chunk, with about 2 percent of all accesses to the site. A simple script accomplished the rest:

#!/usr/bin/perl
$root = "/usr/www/local/client/";
require '/usr/www/local/cgi-bin/cgi-lib.pl';
&ReadParse;
print "Content-type: text/html\n\n<HEAD>\n";
$agent = $ENV{'HTTP_USER_AGENT'};
# "?test=1" in the URL forces the AOL branch for testing.
if ($in{'test'}) { $agent = "iweng"; }
# AOL's browser identifies itself as "iweng."
if ($agent =~ /(iweng|lynx|aol)/i) {
   $file = "aol";
} else {
   $file = "ns";
}
open (HTML, "< $root/welcome.${file}.html");
foreach (<HTML>) { print; }
close HTML;

We're using the CERN server, so we also needed to add some Exec statements in the server configuration file so that instead of serving welcome.html as a document, the server would execute it as a script. This varies by server, but CERN requires:

Exec  /               /usr/local/www/client/welcome.html
Exec  /welcome.html   /usr/local/www/client/welcome.html

You can't simply omit the first case: the default path has to be mapped to the executable, or the server will simply display the script's source instead of running it--which is definitely not what you want!

The script checks for the kinds of browsers it wants to send the alternate page to and, if it scores a match, sets $file to the right page variant. There's also a test case built in, so you can append "?test=1" to the URL to exercise the alternate page and the script generation.
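
For example (the hostname here is hypothetical):

http://www.example.com/welcome.html?test=1

forces the script down the AOL branch, so you can proof that page without firing up an AOL browser.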

Serving Up JPEGs and GIFs by Browser

We don't have a definitive list of which browsers support which graphics file formats, but for this example we're going to serve up JPEGs just to Internet Explorer (uniquely identified by MSIE) and Netscape (Mozilla) users.

This approach is rather like a two-pronged assault. The client variable PATH_INFO passes any information that comes in the form of a directory-like addition after the name of a script. So if you have a URL like

http://www.bimbeaux.com/frogs/pithing/images

where "images" is really a script, not a directory, then you can use the URL

http://www.bimbeaux.com/frogs/pithing/images/froggie.goes.a.courtin

and it will assign to PATH_INFO

/froggie.goes.a.courtin

(Note that the URL omits the extension--the script below appends the right one--and that the leading slash is harmless when the value is tacked onto $root.)

This is awfully handy, because it hides from the user what you're doing to send the images.

#!/usr/bin/perl
$root = "/usr/www/rambeaux/images/";
$path = $ENV{'PATH_INFO'};
$browser = $ENV{'HTTP_USER_AGENT'};
if ($browser =~ /(mozilla|msie)/i) {
   $path .= ".jpg";
   $type = "jpeg";
} else {
   $path .= ".gif";
   $type = "gif";
}
if (-e "$root/$path") {
   print "Content-type: image/$type\n\n";
   open (IMAGE, "< $root/$path");
   while (<IMAGE>) { print; }
   close IMAGE;
}

You can place this script in your cgi-bin directory, since it can execute from anywhere, and then it just dumps out the image. The $root variable sets the root location for where the images are located. The $path variable is then set, by browser, to the appropriate extension. At the same time, you need to send out the appropriate MIME line with the appropriate image type in order for the remote browser to correctly parse the incoming data.
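
In the page markup, a reference to the script then looks like an ordinary image (our illustration, reusing the hypothetical path from above):

<IMG SRC="/frogs/pithing/images/froggie.goes.a.courtin">

The browser never knows whether a GIF or a JPEG comes back.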

One troubleshooting default is built into this script: if the file doesn't exist, nothing is output. You should still make sure that wherever you reference this script, both file types are installed.
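
If you'd rather not fail silently, the closing if block could return an error instead (a hedged variant: most servers honor a Status header from a CGI, but behavior varies):

if (-e "$root/$path") {
   print "Content-type: image/$type\n\n";
   open (IMAGE, "< $root/$path");
   while (<IMAGE>) { print; }
   close IMAGE;
} else {
   print "Status: 404 Not Found\n";
   print "Content-type: text/plain\n\n";
   print "No such image: $path\n";
}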

The PATH_INFO variable will pass the entire path specified following a URL, so you can send

/rambeaux/images/froggie.goes.a.courtin

and set the $root variable to the head of the Web directory, rather than hard-wiring the image directory into the script.

Tracking Orders by Host

Here's a real chunk of code--you're going to have to learn to love it to use it. This short routine does some complex activity.

if ($ENV{'REMOTE_HOST'}) { $user = $ENV{'REMOTE_HOST'}; }
else { $user = $ENV{'REMOTE_ADDR'}; }
open (ORDERINDEX, "< $root/orderindex");
@orderindex = grep(/^$user\t/i, <ORDERINDEX>);
close ORDERINDEX;
@ordersplit = split('\t', $orderindex[0]);
$orderid = $ordersplit[1];
if (!$orderid) {
   # No existing order: pick an unused six-digit ID and record it.
   srand (time|$$);
   $orderid = int(rand(899999)) + 100000;
   while (-e "$root/$orderid") {
      $orderid = int(rand(899999)) + 100000;
   }
   $write = "$user\t$orderid\t" . time . "\n";
   open (MAKEORDER, ">> $root/orderindex");
   &dofile(MAKEORDER, 0);
   print MAKEORDER $write;
   &dofile(MAKEORDER, 1);
   close MAKEORDER;
} else {
   # Existing order: expire it if stale, otherwise freshen its datestamp.
   if ((-M "$root/$orderid") > .03) {
      unlink("$root/$orderid");
   } else {
      system("touch $root/$orderid");
   }
}

Here's a natural language version of the operations this code carries out:

  • Do I know the REMOTE_HOST? If not, I'll take the REMOTE_ADDR.
  • Do I have an entry that links this host or IP number to an existing order?
  • If I don't, create a random six-digit number and add it to the index.
  • If I do, check how long it has been since the order file was last touched (the code uses -M, the file's age in days; .03 days is about 43 minutes).
  • If the order has gone stale, delete it, assuming the user has wandered off.
  • If it's still fresh, touch the file to update the datestamp.

There's a crontab entry, too, for a program that periodically checks whether the orderindex file has gone unwritten for more than an hour or so, and deletes it if it has.
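
A minimal sketch of such a cleanup program (our guess at its shape; the $root path and cron schedule are hypothetical):

#!/usr/bin/perl
# Remove the order index if it hasn't been written in over an hour.
# -M returns the file's age in days; 1/24 of a day is one hour.
$root = "/usr/www/rambeaux/orders";
if (-e "$root/orderindex" && (-M "$root/orderindex") > 1/24) {
   unlink("$root/orderindex");
}

Run it hourly from cron with an entry along the lines of 0 * * * * /usr/local/www/cgi-bin/clean-orders.pl.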

The &dofile subroutine is a simple flock automator that flocks the filehandle when you send it a 0, and removes the flock when you send it a 1. The routine can be found in Randal Schwartz and Larry Wall's indispensable Programming Perl (O'Reilly and Associates, 1991), now being revised for Perl 5.
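
For reference, a rough reconstruction of what such a wrapper might look like (our sketch, not the printed routine; it assumes the filehandle is passed by name from the main package):

sub dofile {
   # Second argument: 0 to take an exclusive lock, 1 to release it.
   # flock operation 2 = exclusive lock, 8 = unlock.
   local ($handle, $unlock) = @_;
   if ($unlock) { flock($handle, 8); }
   else         { flock($handle, 2); }
}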


Glenn Fleishman started doing Web development in the Jurassic Age of the Net--May 1994. He now runs a Web development and hosting company, moderates the Internet Marketing Discussion List, and serves as a contributing editor for Adobe Magazine.


Reprinted from Web Developer® magazine, Vol. 2, No. 1, Spring 1996. © 1996 internet.com Corporation. All rights reserved.

