www.webdeveloper.com

Thread: Sorting Dates in Perl

  1. #1
    Join Date
    Mar 2008
    Posts
    49

    Sorting Dates in Perl

    Hi,

    I have a file with different fields and many records
    like
    Date F1 F2 F3

    I want to sort all records by their date-time field. The date format is as follows:

    2005-11_19T11:39:00+08:00
    2005-09-14T01:22:00+0000


    How can I sort such time values?

  2. #2
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    First, 2005-11_19 is a typo, right? What you really have is 2005-11-19, isn't it?

    Second, do you really have one timezone given as +08:00 (with colon) and other as +0000 (without colon)? Or was it a typo?

    Third, how do you want these two dates to compare?
    Date1: 2005-11-19T12:00:00+0800
    Date2: 2005-11-19T13:00:00+0000

    That is, do you take timezones into account?

  3. #3
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    This will most likely do the job:
    Code:
    use Date::Parse;
    my @records_sorted = sort {
        
        # extracting date fields from the lines
        my ($date_str1) = split /\s+/, $a, 2;
        my ($date_str2) = split /\s+/, $b, 2;
        
        # parsing the date strings into numeric timestamps
        my $date1 = str2time($date_str1);
        my $date2 = str2time($date_str2);
        my $cmp;
        
        # check for parse errors
        if (not defined $date1) {
            warn("Failed to parse date: $date_str1");
            $cmp = -1;
        }
        elsif (not defined $date2) {
            warn("Failed to parse date: $date_str2");
            $cmp = 1;
        }
        else { # success, compare dates numerically
            $cmp = $date1 <=> $date2
        }
        
        $cmp
    } @records;
    Of course you need to have the Date::Parse module installed.
    Also note that this code assumes that your date fields are the very first thing on each line and have a whitespace character after them.
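
    To illustrate that assumption, here is a small sketch of what the split in the sort block returns. The sample line is only an example in the format described in this thread, and the variable names are illustrative. It also shows one way the assumption can break: with leading whitespace, `/\s+/` produces an empty first field, while the special pattern `' '` skips leading whitespace.

```perl
use strict;
use warnings;

# Example line in the "Date F1 F2 F3" layout (illustrative data).
my $line = "2006-02-05T06:56:00-05:00 foo bar baz";

# Limit 2: the first field, then the untouched rest of the line.
my ($date_str, $rest) = split /\s+/, $line, 2;
print "date='$date_str' rest='$rest'\n";

# With leading whitespace, /\s+/ yields an empty first field ...
my ($empty) = split /\s+/, "  $line", 2;

# ... whereas the special pattern ' ' skips leading whitespace first.
my ($skipped) = split ' ', "  $line", 2;
print "with /\\s+/: '$empty', with ' ': '$skipped'\n";
```

    If your lines might start with spaces or tabs, using `' '` as the pattern is probably the safer choice.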

  4. #4
    Join Date
    Mar 2008
    Posts
    49
    Hi, thanks for your reply.

    Installing packages is always a problem. Yes, the underscore was a typo.

    Second, do you really have one timezone given as +08:00 (with colon) and other as +0000 (without colon)? Or was it a typo?

    I don't even know what these parts after the dates (after the T) are. I just need to sort the documents by the time they were created, and the time is given in this format:

    2006-02-05T06:56:00-05:00
    2006-02-01T22:33:05+0000
    2006-02-06T09:13:00+0000
    2006-02-04T21:06:00-08:00
    2006-02-01T01:42:20+0000
    2006-01-31T22:53:18+0000

    What if I just take the time like this:
    2006-02-05T06:56:00


    Is there any way to sort these without installing the Date package?

  5. #5
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    Without the module, you have a problem. You'd have to parse the dates yourself, which would be pretty easy without the timezones. With them, you'd have to be able to add and subtract dates, which is crazily complicated. It's definitely easier to install a module, even if you only have FTP access. Trust me, I've done both.

    But if you can live with ignoring the timezones, that is, if you're OK with listing 2010-01-03 14:25 CET later than 2010-01-03 13:40 UTC, then you can do something like this:

    Code:
    # from 2006-02-05T06:56:00-05:00
    # to   2006_02_05_06_56_00
    sub numify_date {
        my ($date_str) = @_;
        my $date_num =~ s/\D+/_/g;
        return substr($date_num, 19)
    }
    
    my @records_sorted = sort {
        
        # extracting date fields from the lines
        my ($date_str1) = split /\s+/, $a, 2;
        my ($date_str2) = split /\s+/, $b, 2;
        
        # parsing the date strings into numeric expressions
        my $date1 = numify_date($date_str1);
        my $date2 = numify_date($date_str2);
        
        $date1 <=> $date2
        
    } @records
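
    If the offsets do need to count and installing from CPAN is not an option, a possible middle path is the core Time::Local module, which ships with Perl. For timestamps that always carry a fixed numeric offset, like the ones shown in this thread, the arithmetic may be more manageable than it sounds. The following is only a sketch under that assumption: `to_epoch` is a hypothetical helper name, the two records are illustrative, and `//` needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);    # core module, no CPAN install needed

# Hypothetical helper: "2006-02-05T06:56:00-05:00" -> epoch seconds in UTC.
# Assumes the offset is always present, as +HH:MM, -HH:MM, +HHMM or -HHMM.
sub to_epoch {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $sign, $oh, $om) =
        $stamp =~ /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)([+-])(\d\d):?(\d\d)$/
        or return;    # unexpected format: give up on this string

    # Read the wall-clock fields as if they were UTC ...
    my $as_utc = timegm($s, $mi, $h, $d, $mo - 1, $y);

    # ... then take the offset back out to get the real UTC instant.
    my $offset = ($oh * 60 + $om) * 60;
    $offset = -$offset if $sign eq '-';
    return $as_utc - $offset;
}

# Illustrative records: the two times from the CET/UTC example above.
my @records = (
    '2010-01-03T13:40:00+0000 bar',     # 13:40 UTC
    '2010-01-03T14:25:00+01:00 foo',    # 13:25 UTC
);

# Parse each date once, sort on the number, then drop the number again.
# Lines whose date cannot be parsed get key 0 and therefore sort first.
my @records_sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ to_epoch((split ' ', $_, 2)[0]) // 0, $_ ] }
    @records;

print "$_\n" for @records_sorted;
```

    Here the `+01:00` record comes out first, because 14:25 at +01:00 is 13:25 UTC. Named zones and daylight saving rules are not handled; for those a proper date module is still the better tool.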

  6. #6
    Join Date
    Mar 2008
    Posts
    49
    Quote Originally Posted by Sixtease View Post
        # extracting date fields from the lines
    Thanks, but I could not understand the comment "extracting date fields from the lines". Which lines do you mean here?

    The scenario is that I have an array of dates and I have to sort them.
    Thanks already for your kind help.

  7. #7
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    Quote Originally Posted by kimskams80 View Post
    I have a file with different fields and many records
    like
    Date F1 F2 F3
    I understood from this that your @records were lines of text like
    Code:
    2006-02-05T06:56:00-05:00 foo bar baz
    2006-02-01T22:33:05+0000 goo car caz
    2006-02-06T09:13:00+0000 hoo dar daz
    So I wrote code that sorts these whole lines by their dates.
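
    If the array really holds only the timestamps, no field extraction is needed at all. A minimal sketch, assuming every element starts with a zero-padded YYYY-MM-DDTHH:MM:SS (19 characters) and that the offset may be ignored: because the fields are fixed-width and ordered from most to least significant, a plain string comparison puts them in chronological order. The array names are illustrative; the sample values are the ones posted earlier in the thread.

```perl
use strict;
use warnings;

# Sample timestamps taken from earlier in the thread.
my @dates = (
    '2006-02-05T06:56:00-05:00',
    '2006-02-01T22:33:05+0000',
    '2006-02-06T09:13:00+0000',
    '2006-02-04T21:06:00-08:00',
    '2006-02-01T01:42:20+0000',
    '2006-01-31T22:53:18+0000',
);

# Compare only YYYY-MM-DDTHH:MM:SS (19 characters); the offset is ignored.
my @dates_sorted = sort { substr($a, 0, 19) cmp substr($b, 0, 19) } @dates;

print "$_\n" for @dates_sorted;
```

    Note the `cmp` (string comparison) rather than `<=>`, since the keys still contain dashes and colons.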

  8. #8
    Join Date
    Mar 2008
    Posts
    49
    Thanks for the reply.

    It's not working. It's printing whatever.
    Code:
    2005-11-28T11:39:00+0000    F1  1            4    0    0    0    3 
       0    0    8    0    0    0    0    20    0    19    0    0    0    0    0
    
    2005-09-14T02:13:45+0000    F2   1            2    0    0    0    1 
       0    0    6    0    1    0    0    2    0    0    0    0    0    0    0
    
    2005-04-18T04:41:41+0000    F31            2    0    0    0    8 
       0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
    
    2005-04-14T15:07:11+0000    F1            1    0    0    0    8 
       0    0    3    0    0    0    0    0    0    0    0    0    0    0    0

    Above is how it prints the sorted records, which is not correct.

  9. #9
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    Yeah, there were a few bugs; I didn't really test it. This version works for me.
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    my @records = (
        '2005-11-28T11:39:00+0000    F1  1            4    0    0    0    3',
        '2005-09-14T02:13:45+0000    F2   1            2    0    0    0    1',
        '2005-04-18T04:41:41+0000    F31            2    0    0    0    8',
        '2005-04-14T15:07:11+0000    F1            1    0    0    0    8',
    );
    
    sub numify_date {
        my ($date_str) = @_;
        $date_str =~ s/\D+//g;
        return substr($date_str, 0, 19)
    }
    
    my @records_sorted = sort {
    
        # extracting date fields from the lines
        my ($date_str1) = split /\s+/, $a, 2;
        my ($date_str2) = split /\s+/, $b, 2;
    
        # parsing the date strings into numeric expressions
        my $date1 = numify_date($date_str1);
        my $date2 = numify_date($date_str2);
    
        $date1 <=> $date2
    
    } @records;
    
    print "$_\n" for @records_sorted;
    I think you could have debugged it yourself -- show some initiative, man.

  10. #10
    Join Date
    Mar 2008
    Posts
    49
    Sir, I did debug it and removed the syntax errors, but even with your corrected version it's still not printing the records sorted. I want the records printed in date order, with the oldest date first.
    But it's giving mixed results, i.e. still unsorted.

    I don't know why.

  11. #11
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    OK, OK, no offense meant.

    If you copy the above script verbatim and run it, doesn't it print this?
    Code:
    2005-04-14T15:07:11+0000    F1            1    0    0    0    8
    2005-04-18T04:41:41+0000    F31            2    0    0    0    8
    2005-09-14T02:13:45+0000    F2   1            2    0    0    0    1
    2005-11-28T11:39:00+0000    F1  1            4    0    0    0    3

  12. #12
    Join Date
    Mar 2008
    Posts
    49
    It works for this small array.

    But the array I have is much bigger, and when I run it on that array the result is not sorted.

    Code:
    2005-12-25T22:09:00-08:00      1            3    0    0    0    1    1    1    1    0   
     0    0    0    0    0    1    0    0    0    0    0
    
    
    2005-12-28T18:11:57+0000      1            5    0    0    0    2    0    0    2    0    
    2    0    0    2    0    0    11    0    0    0    0
    
    
    2005-12-27T07:40:14+0000     1            1    0    0    0    2    0    0    2    0    
    2    0    0    2    0    0    11    0    0    0    0
    
    
    2005-12-29T03:07:10+0000       1            4    0    0    0    32    0    0    31    0  
      0    0    0    8    0    0    5    0    1    0    0
    
    
    2005-12-30T19:38:41+0000      1            1    0    0    0    2    0    0    2    0    
    0    0    0    1    0    0    0    0    6    0    0
    
    
    2006-01-03T22:53:36+0000       1            1    0    0    0    1    0    0    16    0   
     0    0    0    0    0    0    0    0    0    0    0
    
    
    2006-01-03T23:18:24+0000       1            1    0    0    0    4    0    0    8    0    
    1    0    0    4    0    1    0    0    0    0    0
    
    
    2006-01-01T21:03:00-05:00      1            1    0    0    0    0    0    0    2    0   
     0    0    0    0    0    0    0    0    0    0    0
    
    
    2006-01-03T16:24:31+0000       1            1    0    0    0    22    0    0    21    0  
      0    0    0    3    0    0    0    0    0    0    0
    Above is a portion of the array after sorting. Below is the code:

    Code:
    ################ sorting records with date ##############################"
    sub numify_date {
        my ($date_str) = @_;
        my $date_num =~ s/\D+//g;
        return substr($date_num, 0, 19);
    }
    
    my @records_sorted = sort {
        
        # extracting date fields from the lines
        my ($date_str1) = split /\s+/, $a, 2;
        my ($date_str2) = split /\s+/, $b, 2;
        
        # parsing the date strings into numeric expressions
        my $date1 = numify_date($date_str1);
        my $date2 = numify_date($date_str2);
        
        $date1 <=> $date2
        
    } @BIG;

  13. #13
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    It's likely the data. Could you show me what exactly you have in the @BIG array? (preferably the smallest set of rows that get sorted wrong)
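
    To help narrow it down, something like the following sketch could pick out neighbouring rows that are in the wrong order. It assumes each record starts with the timestamp and ignores the offsets; `out_of_order_pairs` is a hypothetical helper name, and the three sample rows are abbreviated from your output above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [previous, current] pairs of adjacent lines
# whose leading timestamps are out of order (offsets ignored).
sub out_of_order_pairs {
    my @lines = grep { /\S/ } @_;    # skip blank entries
    my @pairs;
    for my $i (1 .. $#lines) {
        my ($prev) = split ' ', $lines[$i - 1], 2;
        my ($cur)  = split ' ', $lines[$i],     2;
        push @pairs, [ $lines[$i - 1], $lines[$i] ]
            if substr($prev, 0, 19) gt substr($cur, 0, 19);
    }
    return @pairs;
}

# Abbreviated rows in the order they were printed.
my @sorted_output = (
    '2005-12-25T22:09:00-08:00      1            3',
    '2005-12-28T18:11:57+0000      1            5',
    '2005-12-27T07:40:14+0000     1            1',
);

for my $pair (out_of_order_pairs(@sorted_output)) {
    print "out of order:\n  $pair->[0]\n  $pair->[1]\n";
}
```

    Run it on your real sorted array in place of `@sorted_output`; the pairs it reports are good candidates for the small test case.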

  14. #14
    Join Date
    Mar 2008
    Posts
    49
    Code:
    2005-04-18T04:41:41+0000    BLOG06-20051208-003-0020952408   1            2    0    0    0    8    0    0    4    0    0    0    0    0    0    0    0    0    0    0    0
    
    2005-04-14T15:07:11+0000    BLOG06-20051208-003-0020975072   1            1    0    0    0    8    0    0    3    0    0    0    0    0    0    0    0    0    0    0    0
    
    2005-12-05T20:50:00+00:00    BLOG06-20051209-025-0019107314   1            1    0    0    0    3    0    0    1    0    0    0    0    0    0    1    0    0    0    0    0
    
    2005-12-08T20:26:06+0000    BLOG06-20051209-043-0030624711   1            15    0    0    0    15    15    15    182    0    2    0    0    0    0    0    77    0    0    0    0
    
    2005-12-08T01:25:32+0000    BLOG06-20051209-043-0030877786   1            14    0    0    0    14    14    14    170    0    1    0    0    0    0    0    72    0    0    0    0
    
    2005-12-04T23:19:00+0000    BLOG06-20051209-043-0031124156   1            13    0    0    0    13    13    13    157    0    0    0    0    0    0    0    67    0    0    0    0
    
    2005-12-03T00:50:20+0000    BLOG06-20051209-043-0031382711   1            12    0    0    0    12    12    12    145    0    0    0    0    0    0    0    62    0    0    0    0
    
    2005-11-30T07:53:36+0000    BLOG06-20051209-043-0031626543   1            11    0    0    0    11    11    11    133    0    0    0    0    0    0    0    57    0    0    0    0
    
    2005-11-26T02:56:21+0000    BLOG06-20051209-043-0031885727   1            10    0    0    0    10    10    10    121    0    0    0    0    0    0    0    51    0    0    0    0
    
    2005-11-22T22:31:41+0000    BLOG06-20051209-043-0032144895   1            9    0    0    0    9    9    9    109    0    0    0    0    0    0    0    45    0    0    0    0
    
    2005-11-22T00:28:52+0000    BLOG06-20051209-043-0032393882   1            8

    This is the initial part of the BIG array.

    thanks

  15. #15
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    389
    Works for me. You'll have to show the exact data structure. Please dump the first few records like this:
    Code:
    use Data::Dumper;
    print( Dumper( [ @BIG[0..4] ] ) );
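
    Alongside the dump, it may be worth checking what the sort key itself looks like. The sketch below assumes your `numify_date` is exactly what you pasted in post #12, where the substitution is applied to a freshly declared `$date_num` instead of to `$date_str`, and compares it with the version from post #9. The sub names and the warning capture are only illustrative.

```perl
use strict;
use warnings;

# numify_date exactly as pasted in post #12: the substitution is applied
# to a brand-new, empty $date_num, and $date_str is never used.
sub numify_date_as_posted {
    my ($date_str) = @_;
    my $date_num =~ s/\D+//g;
    return substr($date_num, 0, 19);
}

# The version from post #9, which edits $date_str itself.
sub numify_date_post9 {
    my ($date_str) = @_;
    $date_str =~ s/\D+//g;
    return substr($date_str, 0, 19);
}

# Collect warnings instead of printing them, so they can be counted.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

for my $stamp ('2005-12-25T22:09:00-08:00', '2005-12-28T18:11:57+0000') {
    my $posted = numify_date_as_posted($stamp);
    my $post9  = numify_date_post9($stamp);
    printf "%-26s posted='%s' post9='%s'\n", $stamp, $posted // 'undef', $post9;
}
printf "%d warning(s) captured\n", scalar @warnings;
```

    If the `posted` column comes out empty, every comparison in the sort block is a tie and the records come back in their original order, which would match what you are seeing. Running the real script with `use warnings` should point at the same line.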
