WebDeveloper.com ®: Where Web Developers and Designers Learn How to Build Web Sites, Program in Java and JavaScript, and More!   

Web Log Analysis: Who's Doing What, When?
Part 2

by Glenn Fleishman

Other Analysis

There are plenty of other kinds of analysis you can do with a Web log. For example, you might be interested in looking at the number of bytes transferred per directory--especially if you're running multiple servers that write to the same log file--and at the frequency of retrievals by browser and referer. I have included two Perl scripts here, called Bytecount and Quickdirty, that can perform these tasks.

The Bytecount script performs a few very simple actions. It takes a Web log file name, gunzips the file if it's gzipped, then analyzes the data stream by request path. It grabs the first bounded directory--i.e., whatever it finds in /blah/--and uses that as the key to an associative array in which the transferred bytes are accumulated. When the file's done, it creates a short summary of the usage by top-level directory. The default cutoff point is 100 kilobytes.
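
The approach described for Bytecount can be sketched in a few lines. This is not the author's Perl script; it is a minimal Python illustration that assumes the log is in Common Log Format (host, ident, user, date, quoted request, status, bytes) and that "100 kilobytes" means 100 * 1024 bytes. The function names (open_log, top_directory, bytes_by_directory, summarize) are hypothetical.

```python
import gzip
import re
from collections import defaultdict

# Assumed Common Log Format:
#   host ident user [date] "METHOD /path PROTO" status bytes
LOG_LINE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d{3} (\d+|-)')


def open_log(path):
    """Open a log file as text, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")


def top_directory(url_path):
    """Return the first bounded directory ('/blah/'), or '/' if there is none."""
    m = re.match(r"(/[^/]+/)", url_path)
    return m.group(1) if m else "/"


def bytes_by_directory(lines):
    """Accumulate transferred bytes per top-level directory."""
    totals = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        path, size = m.groups()
        if size != "-":  # "-" means no body was sent
            totals[top_directory(path)] += int(size)
    return dict(totals)


def summarize(totals, cutoff=100 * 1024):
    """Return (directory, bytes) pairs at or above the cutoff, largest first."""
    rows = [(d, n) for d, n in totals.items() if n >= cutoff]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A typical use would be summarize(bytes_by_directory(open_log(name))), where name is a log file path that may or may not be gzipped. Treating the cutoff as a minimum for inclusion in the summary is an assumption; the article does not say how the original script applies it.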

Quickdirty does a bit more. It's been useful for a few of my company's clients who receive daily summaries of where people come from and what they're using. This script helps them make decisions about tailoring content to users and browsers.

Quickdirty lives up to its name by avoiding any deep analysis. The script just summarizes, by request, the number of times a given browser or referer URL shows up, and spits this data out. We use a crontab to generate these reports in the middle of the night, which is why the script calls for dumping to STDOUT if a command-line argument is supplied to the program.

The two variables at the top of the file, named clientthres and refthres, provide a minimum cutoff point for the browser and referer summaries. On a given day or week, you might have thousands of unique referers and several hundred browsers, but in general, you and your clients will probably care only about the most frequently accessed referring links and the most commonly used browsers. These variables let you set the number of top responses you're interested in--say, the "top 10" browsers or the "top 100" referring links.
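
As a rough illustration of what Quickdirty does, the sketch below counts referers and browsers and applies a cutoff to each. It is not the author's Perl script. It assumes an extended log line is a Common Log Format line followed by a quoted referer and a quoted user agent, and it treats the thresholds as minimum hit counts; the names count_fields and report are hypothetical, while the two threshold names mirror clientthres and refthres from the text.

```python
import re
from collections import Counter

# Assumed extended log format: a Common Log Format line followed by
# "referer" "user-agent" at the end of the line.
EXTENDED = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "([^"]*)"\s*$')

# Minimum number of hits a browser or referer needs to appear in the report
# (illustrative values, named after the variables mentioned in the article).
CLIENT_THRES = 2
REF_THRES = 1


def count_fields(lines):
    """Count how often each referer URL and each browser string appears."""
    referers, clients = Counter(), Counter()
    for line in lines:
        m = EXTENDED.search(line)
        if not m:
            continue  # not an extended-format entry
        referer, client = m.groups()
        if referer and referer != "-":
            referers[referer] += 1
        if client and client != "-":
            clients[client] += 1
    return referers, clients


def report(counter, threshold):
    """Return (value, hits) pairs with at least `threshold` hits, busiest first."""
    return [(value, hits) for value, hits in counter.most_common() if hits >= threshold]
```

If you prefer a fixed "top 10" style list over a minimum count, Counter.most_common(10) returns the ten most frequent entries directly.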

Commercial Analysis

To give you an idea of what the commercial programs are capable of, I've included a couple of charts generated by Intersé Market Focus. The system contains information about different kinds of browsers, as well as a lookup database that maps domains to their registering organizations. Table 1 shows the breakdown of visits by browser. Table 2 is a list of top referer organizations--that is, the organizations that sent the most users our way. You can also graph daily visits to the site; this could be presented in table form, too, but such information is better represented to the reader through a graph.

Table 1:

Browser product          No. of visits   % of visits
1. Netscape Navigator          146,876         62.12
2. Unknown browser              65,092         27.53
3. CompuServe Mosaic             6,737          2.85
4. America Online                5,953          2.52
5. Lynx                          5,165          2.18
6. Internet Explorer             3,234          1.37
7. NCSA Mosaic                   1,207          0.51
8. IBM WebExplorer                 876          0.37
9. Prodigy                         722          0.31
10. Netcom Netcruiser              559          0.24
Totals:                        236,421        100.00

Table 2:

Referer organization                 No. of visits   % of visits
1. Yahoo                                    45,895         19.41
2. Infoseek                                 43,513         18.40
3. Carnegie-Mellon University                6,273          2.65
4. Pittsburgh Supercomputer Center           4,018          1.70
5. Mississippi State University              3,621          1.53
6. Wake Forest University                    2,665          1.13
7. Webcrawler Search Engine (AOL)            2,104          0.89
8. OpenText Corp.                            1,298          0.55
9. PGH.PA.US                                 1,247          0.53
10. CF.AC.UK                                 1,087          0.46
Totals:                                    116,121         49.12

CERN Rewiring

You may draw some inspiration from Home Improvement's enterprising Tim "Toolman" Taylor: If your Web server doesn't log referers and clients, your first impulse may be to rewire it.

I didn't do the actual rewiring of my company's CERN server myself; for that I have our talented contract programmer Raj Vaswani to thank. He rewired the CERN http daemon to provide a simple solution for logging all Web information in one file.

With this server, you can use a logging component to record any client variable in a separate file, using the following directives. Each variable name must end with an equal sign.


EnvLog	/usr/local/cern_httpd/env.cstoll
EnvLogVar	SCRIPT_NAME=
EnvLogVar	HTTP_USER_AGENT=
EnvLogVar	REFERER_URL=

You can use the directive


LogFormat	Extended


to turn the regular log file into an "extended log format" file per the above discussion.

Beating the Underbrush

These bits and pieces are certainly not the definitive way to suck every bit of information out of the mass that is a Web log. Be sure to also check out, for instance:

Internet Profiles
Intersé
NetCount

This information should give you the impetus to get started on Web log analysis and a better appreciation of how to customize the site for your user population, provide tracking information to customers, and--for some of us--better target and sell advertising.

[ < Web Log Analysis: Who's Doing What, When?: Part 1 ]
[ Web Log Analysis: Who's Doing What, When?: Part 3 > ]



