www.webdeveloper.com

Thread: grouping or counting identical strings

  1. #1
    Join Date
    Jun 2009
    Posts
    9

    grouping or counting identical strings

    All,

    I am looking to take a file that holds latitudes and longitudes

    i.e:

    60.000 270.000
    50.000 180.000
    60.000 270.000
    etc....

    and if the lat and lon are identical either group them on the same line OR, create a new file that would list (from the example above)

    60.000 270.000 (2)
    50.000 180.000 (1)

    Is there a way to do this? I've tried looking at the hash function but I just cannot get it to work. I'm a Perl beginner.

    Thanks!!!

  2. #2
    Join Date
    May 2009
    Posts
    64
    Post whatever code you have tried so far even if it does not work. That way people can at least get a sense about how much perl understanding you do have. Only posting your programming requirements and showing no effort makes it look like you just want someone to write your code for you, and this has the ring of school work to it.

  3. #3
    Join Date
    Jun 2009
    Posts
    9
    The code i have been trying to play with is as follows:

    Code:
    #!/usr/bin/perl
    
    my %hash;
    open (MYFILE, "All_origins.txt");
    while (<MYFILE>) {
            my ($lat, $lon) = split(/\\s*/);
            if (exists $hash{$lat$lon}) {
                    $hash{$lat$lon} .= ",$lat$lon";
                    next;
            }
            $hash{$latlon} = $latlon;
    }
    for (keys %hash) { print "$_ $hash{$_}\n"; }
    exit;
    
    close (MYFILE);
    the ideal outcome would be the example from the previous post
    i.e:
    lat lon (#)

    but right now i cant even get the lat lon groups to one line.

  4. #4
    Join Date
    May 2009
    Posts
    64
    You should use "strict" and "warnings" when writing perl scripts, especially for new perl programmers. But anyway..... the code you posted should be returning an error (or errors). What is the error you get?
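
    As a rough illustration of why strict helps, the sketch below compiles a tiny snippet at run time and captures the compile error instead of letting it abort the script. The variable names ($total, $totl) are made up for the demo and are not from the poster's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compile a small snippet at run time so the compile error can be captured
# in $@ instead of aborting this demo. The variable names are hypothetical.
my $snippet = <<'END_SNIPPET';
use strict;
my $total = 0;
$totl = $total + 1;   # typo: $totl was never declared
END_SNIPPET

my $ok    = eval "$snippet; 1";
my $error = $ok ? '' : $@;

print $ok ? "compiled fine\n" : "strict caught it: $error";
```

    Without strict, the misspelled variable would quietly become a new global and the mistake would only show up later as wrong output.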

  5. #5
    Join Date
    Jun 2009
    Posts
    9
    no errors, just a list of random lats and lons with commas between

    i.e.:

    50.000 270.000
    ,,,,,,,,,,
    60.000 070.000
    ,,,,,
    40.000 330.000
    ,,,
    60.000 170.000
    ,,,,,,,,,,,,,,
    30.000 090.000

    60.000 090.000
    ,,,,,,,,,,,
    60.000 300.000
    ,,,,,,,,,,,,,,,,,,
    40.000 350.000
    ,
    60.000 240.000
    ,,,,,,,,,

  6. #6
    Join Date
    Jun 2009
    Posts
    9
    im sorry, it was putting out errors.

    i changed it to this and got the output i listed above.

    Code:
    #!/usr/bin/perl
    
    my %hash;
    open (MYFILE, "All_origins.txt");
    while (<MYFILE>) {
            my ($lat, $lon) = split(/\\s*/);
            if (exists $hash{$lat}) {
                    $hash{$lat} .= ",$lon";
                    next;
            }
            $hash{$lat} = $lon;
    }
    for (keys %hash) { print "$_ $hash{$_}\n"; }
    exit;
    close (MYFILE);

  7. #7
    Join Date
    May 2009
    Posts
    64
    Quote Originally Posted by donal0516
    im sorry, it was putting out errors.

    ....
    Quote Originally Posted by perl_diver
    What is the error you get?

  8. #8
    Join Date
    Jun 2009
    Posts
    9
    from the code that you said should be putting out an error, these are the errors produced


    Scalar found where operator expected at ./counter.pl line 7, near "$lat$lon"
    (Missing operator before $lon?)
    Scalar found where operator expected at ./counter.pl line 8, near "$lat$lon"
    (Missing operator before $lon?)
    syntax error at ./counter.pl line 7, near "$lat$lon"
    Execution of ./counter.pl aborted due to compilation errors.

  9. #9
    Join Date
    May 2009
    Posts
    64
    Perl is trying to help you but it can't always tell exactly what the error is or even exactly where it is. But in this case it's sort of staring you in the face:

    near "$lat$lon"

    In fact that is exactly the error: you have no double-quotes around $lat$lon, so perl cannot combine them into a string and use them as a hash key. So let's fix that and move on. I am going to add three very useful pragmas to your script: strict, warnings, diagnostics

    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use diagnostics;
    
    my %hash;
    open (MYFILE, "All_origins.txt");
    while (<MYFILE>) {
            my ($lat, $lon) = split(/\s*/);
            if (exists $hash{"$lat$lon"}) {
                    $hash{"$lat$lon"} .= ",$lat$lon";
                    next;
            }
            $hash{"$latlon"} = "$latlon";
    }
    for (keys %hash) { print "$_ $hash{$_}\n"; }
    exit;
    
    close (MYFILE);
    Run that and see what happens. Note that I fixed another error in your code, see if you can find it.
    Last edited by perl_diver; 06-04-2009 at 06:19 PM. Reason: correct spelling

  10. #10
    Join Date
    Jun 2009
    Posts
    9
    it appears that this code is neglecting the longitude

    i.e:
    the file looks like

    50.000 240.000
    60.000 220.000
    30.000 180.000
    60.000 310.000
    50.000 100.000

    and when i run this file it outputs

    60, 60
    50, 50
    30

  11. #11
    Join Date
    Jun 2009
    Posts
    9
    and i saw that you changed split(/\\s*/) to split(/\s*/)

  12. #12
    Join Date
    May 2009
    Posts
    64
    I can only assume you did not run the code I posted because you should have gotten another error. I was hoping you would pick up the error (the one I did not correct). You did find the one I corrected: \\s*. So if you did not run the code I posted, what did you run?

  13. #13
    Join Date
    May 2009
    Posts
    64
    Anyway... here is the problem with the last code I posted. This line taken from your original code:

    Code:
          $hash{"$latlon"} = "$latlon";
    There is no $ symbol before 'lon', should be:

    Code:
            $hash{"$lat$lon"} = "$lat$lon";
    Otherwise, under strict, you will get an error saying that the global symbol $latlon requires an explicit package name. Also, while I did correct \\s* in split, your real split pattern should be \s+, which means one or more whitespace characters instead of zero or more. But all this is really academic, which was the point since you are learning perl. If all you wanted to do was count duplicate lines in the file, your code is too verbose; this will suffice:

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use diagnostics;
    my %hash;
    open (MYFILE, "All_origins.txt") or die "$!";
    while (<MYFILE>) {
       chomp;
       $hash{$_}++;
    }
    close (MYFILE);
    for (sort {$hash{$b} <=> $hash{$a} } keys %hash) {
       print "$_ ($hash{$_})\n";
    }
    exit;
    The splitting of the line into two tokens is not necessary unless there is a variable number of spaces between the lat and the lon. And of course your code wasn't even counting anything but I assume you were going to get to that after getting your initial code to run.
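
    To make the \s* versus \s+ point concrete, here is a small sketch run against one sample line (the line itself is invented, with extra spaces added on purpose). It also shows the special ' ' pattern, which behaves like \s+ but additionally ignores leading whitespace.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "60.000   270.000\n";   # three spaces between the fields

# /\s*/ can match the empty string, so split breaks between every character.
my @by_star = split /\s*/, $line;

# /\s+/ needs at least one whitespace character, so it yields two fields.
my @by_plus = split /\s+/, $line;

# The special pattern ' ' behaves like /\s+/ but also skips leading whitespace.
my @by_space = split ' ', $line;

print "\\s* gives ", scalar(@by_star),  " pieces, first two: '$by_star[0]' '$by_star[1]'\n";
print "\\s+ gives ", scalar(@by_plus),  " pieces: '$by_plus[0]' '$by_plus[1]'\n";
print "' ' gives ", scalar(@by_space), " pieces: '$by_space[0]' '$by_space[1]'\n";
```

    With \s* the first two pieces are the single characters '6' and '0', so a script that assigns them to $lat and $lon never sees the full numbers.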

  14. #14
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    Quote Originally Posted by donal0516
    the ideal outcome would be the example from the previous post. i.e: lat lon (#)
    Seems that you are making this much more difficult than it needs to be.
    Code:
    my %hash;
    #open (MYFILE, "All_origins.txt");
    while (<DATA>) {
    	chomp;
    	if (exists $hash{$_}) { $hash{$_}++ } else { $hash{$_} = 1 }
    }
    for (keys %hash) { print "$_ \($hash{$_}\)\n"; }
    #close (MYFILE);
    exit;
    
    __DATA__
    50.000 270.000
    60.000 070.000
    40.000 330.000
    60.000 240.000
    60.000 090.000
    60.000 170.000
    30.000 090.000
    40.000 330.000
    60.000 090.000
    40.000 330.000
    60.000 300.000
    40.000 350.000
    40.000 330.000
    60.000 090.000
    60.000 240.000
    perl diver,
    Great minds think alike. You posted while I was writing.

  15. #15
    Join Date
    Jun 2009
    Posts
    9
    this line

    $hash{"$latlon"} = "$latlon";

    was changed to

    $hash{"$lat$lon"} = "$lat$lon";

    i dont think that $lat$lon is going to get what i want, i want to match the lat lon string to any other lat lon string that is identical but the lat lon strings are

    50.000 240.000

    so there is a space between the two. $lat$lon doesn't do that. When I run the code now, I receive no errors, but the output puts out something like

    60 60,60,60,60,60,60,60,60,
    50 50,50,50,50
    40 40,40,40,40,40
    30 30,30

    so it appears that it is correctly identifying the matching lats but not lons and not outputting the correct thing.

    By the way this is not a hw assignment. This is for my graduate research, i normally work in IDL but when i need to do some regular expressions or text manipulation i try to use perl or shell scripts. But i havent learned much perl.

    Here is the code as i am running it now
    Code:
    #!/usr/bin/perl
    
    use strict;
    use warnings;
    use diagnostics;
    
    my %hash;
    open (MYFILE, "All_origins.txt");
    while (<MYFILE>) {
            my ($lat, $lon) = split(/\s*/);
            if (exists $hash{"$lat$lon"}) {
                    $hash{"$lat$lon"} .= ",$lat$lon";
                    next;
            }
            $hash{"$lat$lon"} = "$lat$lon";
    }
    for (keys %hash) { print "$_ $hash{$_}\n"; }
    exit;
    
    close (MYFILE);
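
    The "60 60,60,..." output is consistent with split(/\s*/) breaking each line into single characters: $lat becomes "6" and $lon becomes "0", so the key "$lat$lon" is just "60". One possible way to keep the split-based approach and still get the "lat lon (#)" output is sketched below. It assumes two whitespace-separated fields per line, reads sample data from a string rather than All_origins.txt so it runs on its own, and builds the key with a space between the fields so it looks like the original line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input held in a string so the sketch runs on its own; in practice
# the lines would come from a file such as All_origins.txt.
my $text = <<'END_DATA';
60.000 270.000
50.000   180.000
60.000  270.000

END_DATA

my %count;
open(my $in, '<', \$text) or die "Cannot read sample data: $!";
while (my $line = <$in>) {
    my ($lat, $lon) = split ' ', $line;   # tolerate any run of whitespace
    next unless defined $lon;             # skip blank or malformed lines
    $count{"$lat $lon"}++;                # key keeps the separating space
}
close($in);

# Highest count first; ties broken by the key text for a stable order.
my @report = map  { "$_ ($count{$_})" }
             sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;

print "$_\n" for @report;
```

    Keeping a separator in the key also avoids accidental collisions that plain concatenation could cause. To write the report to a new file instead of the screen, open a second handle for writing and print to that.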
