Hi all,
I am new to this group. I need help with a Perl script that parses the web server log file, access_log.
The format of the access_log is:
127.0.0.1 - - [15/Jun/2003:13:54:02 -0100] "GET /xxxx HTTP/1.1" 200 34906
The goal is to:
1. Perform a count of the pages for each timestamp. Several pages can share the same timestamp (such as the one shown above).
2. For each time interval, say 15 minutes, starting from the timestamp of the first line in the log file, compute the average, minimum, and maximum number of pages in that interval.
3. Produce output like the following. The values are just an example.
Time                   Average Pages   Min Pages   Max Pages
--------------------   -------------   ---------   ---------
15/Jun/2003:14:09:02   6.5             3           10
15/Jun/2003:14:24:02   5.5             4           7
I shall appreciate an early response.
Thanks in advance
Regards
Andy
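For goal 1, here is a minimal sketch, assuming every line follows the Common Log Format shown above. The helper names parse_line and count_per_timestamp are made up for illustration, and treating .js, .gif and .css requests as "not pages" is an assumption about what should be counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull the timestamp (without the UTC offset) and the
# requested path out of one log line. Returns an empty list on no match.
sub parse_line {
    my ($line) = @_;
    return $line =~ m{\[(\d\d/\w{3}/\d{4}:\d\d:\d\d:\d\d) [^\]]*\] "\S+ (\S+)};
}

# Count page requests per timestamp. Takes a list of log lines and returns
# a hash ref of timestamp => number of pages seen at that timestamp.
sub count_per_timestamp {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        my ($stamp, $path) = parse_line($line) or next;  # skip malformed lines
        next if $path =~ /\.(?:js|gif|css)$/;            # assumed non-page assets
        $count{$stamp}++;
    }
    return \%count;
}
```

It could be called as count_per_timestamp(<INFILE>) on an open filehandle. The resulting hash then holds one entry per distinct timestamp, which is the input the interval statistics in goal 2 would need.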
Hi all,
I developed a Perl script to parse the web log, access_log. However, I am having difficulty getting the output that I want.
The output I am looking for is:
Time                   Average Pages   Min Pages   Max Pages
--------------------   -------------   ---------   ---------
15/Jun/2003:14:09:02   6.5             3           10
15/Jun/2003:14:24:02   5.5             4           7
-----------------
Please look at the Perl script and suggest any changes.
Thanks in advance
Andy
:-)
-----------
Here's the Perl script:
#!/usr/bin/perl
use Getopt::Long;
use Time::Local;
my $file="access_log_modified";
my $line;
my $count;
my $begin_time = "";
my $end_time;
my %seen = ();
my @visual_pages = ();
my ($datetime, $get_post, $Day, $Month, $Year, $Hour, $Minute, $Second);
my $interval = 60; #An interval of 1 minute
my @pages_processed;
count_recs();
sub count_recs {
    open (INFILE, "<$file") || die "Cannot read from $file";
    WHILELOOP: while (<INFILE>) {
        $line = $_;
        chomp;
        ($datetime,$get_post) = (split / /) [3,6];
        $datetime =~ s/\[//;
        ($Day,$Month,$Year,$Hour,$Minute,$Second)= $datetime =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
        next WHILELOOP if ($get_post =~ /\.js$/ || $get_post =~ /\.gif$/ || $get_post =~ /\.css$/);
        unless ($begin_time) {
            $begin_time = $datetime;
        }
        $end_time = $datetime;
        &calculate_time($begin_time, $end_time);
    } #while
    foreach $visual_page (sort by_seen keys %seen) {
        push (@{$pages_processed{$visual_page}}, $seen{$visual_page});
    }
    foreach $page_processed (sort keys %pages_processed) {
        print "$page_processed: @{$pages_processed{$page_processed}}\n";
    }
    close(INFILE);
}
sub calculate_time {
    my @visual_pages = ();
    my @processed_visual_pages = ();
    ###Break up the date time into Day, Month, Year, Hour, Minute and Second.
    ($begin_Day,$begin_Month,$begin_Year,$begin_Hour,$begin_Minute,$begin_Second)= $begin_time =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
    ($end_Day,$end_Month,$end_Year,$end_Hour,$end_Minute,$end_Second)= $end_time =~m#^(\d\d)/(\w\w\w)/(\d\d\d\d):(\d\d):(\d\d):(\d\d)#;
    ###Since the Day above is in the Alpha format, Jan, Feb,... and not numeric
    ###format, 01, 02, 03,..., we need to convert it to a numeric format.Otherwise,
    ###we cannot pass Day to timelocal or localtime modules. That's why the
    ###subroutine is called. It converts Jan into 01 and so on.
    &Initialize;
    my $begin_seconds = timelocal($begin_Second, $begin_Minute, $begin_Hour, $begin_Day, $MonthToNumber{$begin_Month}, $begin_Year-1900);
    my $end_seconds = timelocal($end_Second, $end_Minute, $end_Hour, $end_Day, $MonthToNumber{$end_Month}, $end_Year-1900);
    ###elapsed time is the difference between two timestamps of two consecutive
    ###records in the log file.
    my $elapsed = $end_seconds - $begin_seconds;
    ###We check whether the elapsed time is greater than the interval that we
    ###choose, 1 minute or 15 minutes. If yes, then we need to start counting the
    ###records into a new 15 minute interval. If no, count the number of records
    ###in the same interval. Also, reset the begin_time and end_time, for the new
    ###count. Store all the interval periods into an array, processed_visual_pages.
    if ( $elapsed > $interval ){
        $count = 0;
        $begin_time = $end_time;
        $end_time = $datetime;
        push (@processed_visual_pages, $end_time);
    } else {
        push (@visual_pages, $end_time);
        foreach $visual_page (@visual_pages) {
            $seen{$visual_page}++;
        }
    }
}
sub Initialize {
    my %MonthToNumber=(
        'Jan', '01',
        'Feb', '02',
        'Mar', '03',
        'Apr', '04',
        'May', '05',
        'Jun', '06',
        'Jul', '07',
        'Aug', '08',
        'Sep', '09',
        'Oct', '10',
        'Nov', '11',
        'Dec', '12',
    );
    my %NumberToMonth=(
        '01', 'Jan',
        '02', 'Feb',
        '03', 'Mar',
        '04', 'Apr',
        '05', 'May',
        '06', 'Jun',
        '07', 'Jul',
        '08', 'Aug',
        '09', 'Sep',
        '10', 'Oct',
        '11', 'Nov',
        '12', 'Dec',
    );
}
sub by_seen () {
    ( $seen{$b} cmp $seen{$a} );
}
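
A note on the month conversion in the script above: %MonthToNumber is declared with my inside Initialize, so it is not visible in calculate_time, and Time::Local expects a 0-based month (0 for Jan through 11 for Dec) rather than 01 to 12. The following is a minimal sketch of the conversion under those rules. The names %MONTH_INDEX and stamp_to_epoch are made up for illustration, and timegm is swapped in for timelocal on the assumption that only differences between timestamps matter, which sidesteps daylight-saving jumps.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Month abbreviation => 0-based month index, as Time::Local expects.
# Declared at file level so that every subroutine below can see it.
my %MONTH_INDEX;
@MONTH_INDEX{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

# Hypothetical helper: convert "15/Jun/2003:13:54:02" to epoch seconds.
# Returns nothing if the string does not look like a log timestamp.
sub stamp_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $h, $m, $s) =
        $stamp =~ m{^(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d)}
        or return;
    return unless exists $MONTH_INDEX{$mon};
    # A four-digit year is taken as the actual year by Time::Local.
    return timegm($s, $m, $h, $day, $MONTH_INDEX{$mon}, $year);
}
```

With this, the elapsed time between two log lines is just stamp_to_epoch($end_time) - stamp_to_epoch($begin_time).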
-----------------
The output I get is:
25/Apr/2003:13:54:02: 3
25/Apr/2003:13:54:19: 2
25/Apr/2003:13:54:22: 4
25/Apr/2003:13:54:34: 3
25/Apr/2003:13:54:38: 5
25/Apr/2003:13:54:41: 3
25/Apr/2003:13:54:43: 6
25/Apr/2003:13:54:44: 3
25/Apr/2003:13:54:46: 5
25/Apr/2003:13:54:47: 2
25/Apr/2003:13:54:48: 3
25/Apr/2003:13:54:50: 7
25/Apr/2003:13:54:51: 4
25/Apr/2003:13:54:53: 2
25/Apr/2003:13:54:58: 3
25/Apr/2003:13:55:01: 2
25/Apr/2003:13:55:02: 4
25/Apr/2003:13:55:05: 4
25/Apr/2003:13:55:08: 1
25/Apr/2003:13:55:14: 3
25/Apr/2003:13:55:15: 1
25/Apr/2003:13:56:13: 5
25/Apr/2003:13:56:27: 5
25/Apr/2003:13:56:35: 4
25/Apr/2003:13:56:40: 4
25/Apr/2003:13:56:45: 1
25/Apr/2003:13:56:51: 5
-----------------------------
I would like to group the output by interval, say a 1-minute interval. That is, I want all the entries from 25/Apr/2003:13:54:02 through 25/Apr/2003:13:55:02 grouped as:
Time                   Average Pages   Min Pages   Max Pages
25/Apr/2003:13:55:02   5.5             4           7
----------------------
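The grouping described above could be sketched roughly as follows. This is only one possible reading of the requirement: it assumes the per-timestamp counts are already in a hash keyed by epoch seconds, that intervals start at the first timestamp and are half-open (so an entry exactly one interval after the start falls into the next group), that each group is labelled by its end time, and that the average is the mean of the per-timestamp counts. The name interval_stats is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min max sum);

# Group per-timestamp counts into fixed-width intervals.
# $counts:   hash ref of epoch seconds => number of pages at that second
# $interval: interval width in seconds (60 for 1 minute, 900 for 15)
# Returns a list of [interval_end_epoch, average, min, max] rows.
# Intervals that contain no entries are omitted.
sub interval_stats {
    my ($counts, $interval) = @_;
    my @times = sort { $a <=> $b } keys %$counts;
    return unless @times;

    my $start = $times[0];   # first timestamp anchors the first interval
    my %bucket;
    for my $t (@times) {
        my $idx = int(($t - $start) / $interval);
        push @{ $bucket{$idx} }, $counts->{$t};
    }

    my @rows;
    for my $idx (sort { $a <=> $b } keys %bucket) {
        my @n = @{ $bucket{$idx} };
        push @rows, [ $start + ($idx + 1) * $interval,
                      sum(@n) / scalar(@n), min(@n), max(@n) ];
    }
    return @rows;
}
```

Each row can then be printed with printf, after turning the epoch value back into a timestamp string with gmtime or POSIX::strftime.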