www.webdeveloper.com

Thread: strip_tags aborts when encountering certain text syntax - Why?

  1. #1
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    62

    strip_tags aborts when encountering certain text syntax - Why?

    I'm sure this is a feature and not a bug, and I'm overlooking something obvious here. But I just can't see it.

    I have the following code:
    Code:
    $wtf = <<<WTF
    <h1>First header</h1>
    <p class="intro">First line of text</p>
    <h2>Second header</h2>
    <p>Second line of text</p>
    <?= showImg ('image.jpg'); ?>
    <p>Third line of text</p>
    <?= showImg ('image.jpg', ''); ?>
    <p>Fourth line of text</p>
    <?= showImg ('image.jpg', '', ''); ?>
    <p>Fifth line of text</p>
    <?= showImg ('image.jpg', '', 'class="content"'); ?>
    <p>Sixth line of text</p>
    WTF;
    echo strip_tags ($wtf);
    This outputs the following:
    Code:
    First header
    First line of text
    Second header
    Second line of text
    
    Third line of text
    
    Fourth line of text
    
    Fifth line of text
    As you can see the sixth line of text is not included in the output. The culprit is the preceding line,
    Code:
    <?= showImg ('image.jpg', '', 'class="content"'); ?>
    or rather, the third parameter in the showImg() call. As soon as strip_tags() encounters this part, it simply quits without displaying an error message and returns the text processed so far, which leads me to believe that it somehow thinks it has reached the end of the data it's supposed to process.

    Why?

    Incidentally, the
    Code:
    <?= showImg ('image.jpg', '', 'class="content"'); ?>
    bit itself works fine when I run it, and even with full error reporting generates no syntax-related warnings or errors, so I believe it's valid and allowable syntax.

    Can anyone enlighten me as to what's going on here? Thanks - it would be greatly appreciated!

    // Frank

  2. #2
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,154
    I cannot tell you for sure why, but certainly it could fall within the realm of the warning in the manual page for strip_tags():
    Because strip_tags() does not actually validate the HTML, partial, or broken tags can result in the removal of more text/data than expected.
    And certainly that is not a valid HTML tag syntax. Maybe you could first apply preg_replace() to remove all PHP tags?
    PHP Code:
    $wtf = preg_replace('/<\?(=|php).*?\?>/i', '', $wtf);
    echo strip_tags($wtf);
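A minimal sketch of that two-step approach, wrapped in a function. The name stripPhpThenTags() is hypothetical and not from the thread, and the pattern is a slightly extended assumption: the "s" modifier is added so that a PHP block spanning several lines is removed as well.

```php
<?php
/* Hypothetical helper (not from the thread): remove PHP blocks, then tags. */
function stripPhpThenTags(string $text): string
{
    // The "s" modifier lets "." match newlines, so multi-line blocks go too.
    $withoutPhp = preg_replace('/<\?(?:php\b|=).*?\?>/is', '', $text);
    if ($withoutPhp === null) {
        // preg_replace() returns null when PCRE itself fails.
        throw new RuntimeException('preg_replace() failed');
    }
    return strip_tags($withoutPhp);
}

// Nowdoc, so the sample is taken literally, as in the original post.
$sample = <<<'HTML'
<p>Fifth line of text</p>
<?= showImg ('image.jpg', '', 'class="content"'); ?>
<p>Sixth line of text</p>
HTML;

echo stripPhpThenTags($sample), PHP_EOL;
```

One limitation of this sketch: the lazy match stops at the first "?>" it finds, so a PHP block that contains "?>" inside a string literal would only be partly removed.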
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  3. #3
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    62

    Hi, NogDog,
    Originally posted by NogDog:
    I cannot tell you for sure why, but certainly it could fall within the realm of the warning in the manual page for strip_tags():
    That's not what's been happening - I'm not getting mangled HTML code, I'm running perfectly valid input through strip_tags() and seeing major chunks left out or truncated.

    After much Googling, many headaches and some experimenting, it turns out that strip_tags() chokes on double quotes (""), which causes it to skip lines, and terminates completely without reporting an error when it encounters a single/double quote combination ('"), even though the syntax of the input data is perfectly valid HTML. It's not clear whether this is a bug or a feature: it has been reported as a PHP bug, rejected by the PHP team, declared a feature, challenged, then "fixed", and apparently the behavior was itself introduced as a "fix" for another bug. So I'm not too sure - in my opinion it's broken if it does this, but who am I to judge.
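Since the behavior described here appears to depend on the PHP version, a small probe can show what a given installation does with each quote pattern. This is only a sketch: probeStripTags() and the sample labels are hypothetical, and no particular result is claimed for any PHP version.

```php
<?php
/* Hypothetical probe: for each sample, report whether the text that
   follows the PHP block is still present after strip_tags(). */
function probeStripTags(array $samples): array
{
    $results = [];
    foreach ($samples as $label => $phpBlock) {
        $input  = "<p>before</p>\n" . $phpBlock . "\n<p>after</p>";
        $output = strip_tags($input);
        $results[$label] = strpos($output, 'after') !== false;
    }
    return $results;
}

// The three quote patterns from the first post, as literal strings.
$samples = [
    'one argument'        => "<?= showImg ('image.jpg'); ?>",
    'empty single quotes' => "<?= showImg ('image.jpg', ''); ?>",
    'double in single'    => "<?= showImg ('image.jpg', '', 'class=\"content\"'); ?>",
];

foreach (probeStripTags($samples) as $label => $survived) {
    echo $label, ': ', $survived ? 'kept' : 'lost', PHP_EOL;
}
```

A "lost" line means the trailing text was swallowed on that installation, which is the symptom reported in the first post.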

    Anyway, using a regexp to remove PHP and then running the remainder through strip_tags() is exactly what I ended up doing.

    Thanks for responding!

    // FvW

  4. #4
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,154
    This is not valid HTML:
    Code:
    <?= showImg ('image.jpg', '', 'class="content"'); ?>
    It is valid PHP -- if short_open_tag is enabled -- but not XML/HTML, and strip_tags() will see the opening "<" and treat it as a tag. Thus you cannot depend upon strip_tags() to handle it correctly.

  5. #5
    Join Date
    Jan 2006
    Location
    South Africa
    Posts
    62
    Originally posted by NogDog:
    This is not valid HTML:
    Code:
    <?= showImg ('image.jpg', '', 'class="content"'); ?>
    It is valid PHP -- if short_open_tag is enabled -- but not XML/HTML, and strip_tags() will see the opening "<" and treat it as a tag. Thus you cannot depend upon strip_tags() to handle it correctly.
    While that is true, the PHP documentation states very clearly that strip_tags() will always remove both embedded PHP code and HTML comments - a feature that cannot even be turned off because it is hardcoded in.

    So, if the PHP documentation states that strip_tags() will remove PHP, I would expect it to do so, regardless of whether the PHP code itself constitutes valid XML/HTML, since the documentation does not mention any exceptions. But it doesn't. And something that does not work as advertised is, at least by my standards, broken. :-)

    // FvW

  6. #6
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,154
    Hmm...I'll bow to the fact that you've probably put a lot more research into it than I have.

    If you're going to have to do the preg_replace() anyway, maybe you could just use it to remove all tags and save yourself a step?
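A rough sketch of what such a one-pass replacement might look like. The function name stripAllMarkup() is hypothetical, and the pattern is deliberately naive: it treats anything between "<" and the next ">" as a tag, so a ">" inside an attribute value, or a bare "<" in ordinary text, will produce wrong results.

```php
<?php
/* Hypothetical one-pass variant; the pattern is deliberately naive. */
function stripAllMarkup(string $text): string
{
    // First alternative: PHP blocks. Second: anything shaped like a tag.
    $result = preg_replace('/<\?(?:php\b|=).*?\?>|<[^>]*>/is', '', $text);
    if ($result === null) {
        // preg_replace() returns null when PCRE itself fails.
        throw new RuntimeException('preg_replace() failed');
    }
    return $result;
}

echo stripAllMarkup("<h2>Second header</h2>\n<p>Second line of text</p>"), PHP_EOL;
```

Whether this is preferable to the two-step version is a judgment call: it avoids the quote handling in strip_tags() altogether, but it also gives up whatever tag parsing strip_tags() does.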

  7. #7
    Join Date
    Jan 2009
    Posts
    3,346
    Is it possible it is choking on the short tag (<?=) rather than the full "<?php echo"? The short tags are not well supported anymore anyway and certainly not a best practice.
