www.webdeveloper.com

View Poll Results: Would you consider yourself to be a REGEX expert?

Voters: 3
  • I know everything there is to know about REGEX: 0 (0%)
  • I know more than most: 2 (66.67%)
  • I know enough to get the results I want: 1 (33.33%)
  • I know more than you: 0 (0%)

Thread: REGEX general questions

  1. #1
    Join Date
    Dec 2004
    Location
    Midwest USA
    Posts
    20

    Post REGEX general questions

    I'm using REGEX to check values in a form. Currently I'm using the POSIX version (ereg function). In the PHP documentation it describes POSIX REGEX as greedy vs PERL as not. Unfortunately, they don't explain what they mean by 'greedy'. Could someone shed some light on that?

    When I'm checking for patterns I find the ereg functions return true if any character matches the pattern unless I specify the exact size of the string so the whole pattern is matched. For example:

    ereg("[a-zA-Z0-9]", $somevariable);

    returns true if any character in the string matches the set. Whereas

    ereg("[a-zA-Z0-9]{n}", $somevariable);

    will match any alphanumeric string n characters long. As I don't know the size of the string someone might enter, using {1,n} does not work either.

    I've found a workaround:

    $i = strlen($somevariable);
    ereg("[a-zA-Z0-9]{". $i . "}", $somevariable);

    And even though this works, I'm certain there are much better ways to accomplish this with the Perl REGEX functions. My primary concern is SQL injection and limiting fields for input.
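
    The strlen() workaround above can usually be replaced by anchoring the pattern to both ends of the string, so the whole input has to fit the character class. A minimal sketch with preg_match (the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0); the helper name is_alnum_field and the default limit of 32 are assumptions for illustration, not something from this thread:

```php
<?php
// Hypothetical helper (not from the thread): true only if the WHOLE
// string is 1 to $maxLen ASCII letters or digits.
function is_alnum_field(string $value, int $maxLen = 32): bool
{
    // \A and \z anchor the pattern to the very start and end of the
    // string, so one stray character anywhere makes the match fail.
    $pattern = '/\A[a-zA-Z0-9]{1,' . $maxLen . '}\z/';
    return preg_match($pattern, $value) === 1;
}

var_dump(is_alnum_field('abc123'));       // bool(true)
var_dump(is_alnum_field("abc'; DROP--")); // bool(false)
var_dump(is_alnum_field(''));             // bool(false)
```

    Without the anchors, {1,n} succeeds as soon as any run of 1 to n matching characters is found somewhere in the input, which is most likely why it appeared not to work.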

  2. #2
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    18,921
    "Greedy" means it finds the largest chunk of text that matches the pattern. For instance, with the string, "she sells sea shells by the sea shore" and the regexp /^sh.*sh/; a greedy parser would find the match as "she sells sea shells by the sea sh", while a non-greedy(?) parser would match it as "she sells sea sh".

    Hope that helps?
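
    For anyone who wants to try this, here is a small sketch using PHP's preg_match. In the Perl-compatible functions a quantifier is greedy by default, and adding ? after it makes it non-greedy (lazy). The variable names are only illustrative:

```php
<?php
$text = 'she sells sea shells by the sea shore';

// Greedy: .* grabs as much as it can, then backs off to the last "sh".
preg_match('/^sh.*sh/', $text, $greedy);

// Lazy: .*? grabs as little as it can, stopping at the next "sh".
preg_match('/^sh.*?sh/', $text, $lazy);

echo $greedy[0], "\n"; // she sells sea shells by the sea sh
echo $lazy[0], "\n";   // she sells sea sh
```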
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  3. #3
    Join Date
    Dec 2004
    Location
    Midwest USA
    Posts
    20

    did that help ...yes

    The pattern /^sh.*sh/ (if I'm reading it correctly) means: accept the first 'sh' (/^sh), then match the string up to and including the characters before the closing 'sh' (.*sh/). The first string to match the pattern is "she sells sea sh", and the greedy REGEX will go on to the last 'sh' that matches the pattern. I don't understand why you put the period in /^sh.*sh/, however. Wouldn't /^sh*sh/ have the same results?

  4. #4
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    18,921

    Re: did that help ...yes

    Originally posted by epsilonv
    The pattern /^sh.*sh/ (if I'm reading it correctly) means: accept the first 'sh' (/^sh), then match the string up to and including the characters before the closing 'sh' (.*sh/). The first string to match the pattern is "she sells sea sh", and the greedy REGEX will go on to the last 'sh' that matches the pattern. I don't understand why you put the period in /^sh.*sh/, however. Wouldn't /^sh*sh/ have the same results?
    '^' = beginning of line (not to be confused with '[^abc]' which would mean NOT the letter a, b, or c.)
    '.' = any character
    '*' = 0 to any number of the preceding character. (This tends to confuse just about everyone at first, since most of us learn about using it as a filename wildcard first, where it means "any characters" all by itself.)

    So, /^sh.*sh/ = 'sh' at the start of the line, followed by any number (including none) of any characters, followed by another 'sh'.
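
    To make the difference concrete: without the period, the * applies to the 'h' in front of it, so /^sh*sh/ means 's', then zero or more 'h', then 'sh'. A short sketch with preg_match; the matches() helper is hypothetical and exists only to keep the example short:

```php
<?php
// Hypothetical helper for illustration: does $pattern match $subject?
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$text = 'she sells sea shells by the sea shore';

var_dump(matches('/^sh.*sh/', $text));   // bool(true)
var_dump(matches('/^sh*sh/', $text));    // bool(false): after "sh" comes 'e'
var_dump(matches('/^sh*sh/', 'shhhsh')); // bool(true): s, then "hhh", then "sh"
var_dump(matches('/^sh*sh/', 'ssh'));    // bool(true): s, zero h's, then "sh"
```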

  5. #5
    Join Date
    Dec 2002
    Location
    Pleasanton, CA
    Posts
    2,132
    Let me expand on that a little further with regard to greedy matching.
    Code:
    
    ^sh - start of line, which must be 'sh'
    .*  - 0 or more characters (ANY, including whitespace) to the END of the line (greedy)
    sh  - then search BACK to the first occurrence of 'sh'
    
    ^sh - start of line, which must be 'sh'
    .*? - 0 or more characters, searching FORWARD (not greedy)
    sh  - to the NEXT occurrence of 'sh'
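
    One caveat on "ANY including whitespace": in the Perl-compatible functions the dot matches spaces and tabs, but by default it does not match a newline, which is why .* stops at the end of the line. The s modifier lifts that restriction. A small sketch, with made-up test strings:

```php
<?php
$oneLine  = "she \t shells";
$twoLines = "she\nshe";

// Spaces and tabs are ordinary characters to ".", so this matches.
$spaces = preg_match('/^sh.*sh/', $oneLine);

// Default: "." stops at the newline, so the second "sh" is out of reach.
$default = preg_match('/^sh.*sh/', $twoLines);

// With the s modifier, "." also matches "\n".
$dotAll = preg_match('/^sh.*sh/s', $twoLines, $m);

var_dump($spaces);  // int(1)
var_dump($default); // int(0)
var_dump($dotAll);  // int(1)
```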
    

  6. #6
    Join Date
    Dec 2004
    Location
    Midwest USA
    Posts
    20

    Ok...got it.

    Thank you for your help. I wish I could find a good book (that I could understand) on the subject. I've read a few documentation pages, and usually that is all I need if they are written well. Pattern matching is taking a little more effort than I was expecting.

  7. #7
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    18,921
    The regexp mechanism is almost a programming language of its own. O'Reilly has a whole book dedicated to Perl Regexps.

  8. #8
    Join Date
    Dec 2004
    Location
    Midwest USA
    Posts
    20

    I'll check it out

    Thanks, I'll check with O'Reilly.

  9. #9
    Join Date
    Dec 2004
    Location
    Midwest USA
    Posts
    20

    Wow I can read it online...sweet

    Check out safari.oreilly.com
