www.webdeveloper.com

Thread: Extract every instance of a string between tags.

  1. #1
    Join Date
    Sep 2011
    Location
    Bristol, England, United Kingdom
    Posts
    192

    Extract every instance of a string between tags.

    Hi all,

    I really need some help with a little issue I'm having. I've come to the conclusion that I need regular expressions to solve the problem, but I'm not comfortable with them yet. I understand how they work, but the syntax is horribly ugly, so I've stayed away from them so far.

    Anyway this is the problem:

    I am building a wiki-style web site for game programming tutorials. I have a <textarea> where all input is entered, and source code goes inside code tags (the same code tags used on this board). I am using the PHP library GeSHi to highlight the code, and so far I have found a solution that extracts everything between {code}{/code}, parses it with GeSHi, and re-inserts it into the original string as fully coloured and formatted source code.

    The problem with this method became apparent when I tried entering a 2nd set of {code}{/code} tags in the editor--it just messed everything up.

    So, I need to extract the source code (which is just a string) between every instance of the {code}{/code} tags, and have that stored in an array. Then, I need to be able to re-insert the syntax-highlighted code where it was found, with the {code}{/code} tags removed.

    The previous method I used of course only worked with a single instance of {code}{/code}, and my method was thus:

    PHP Code:
    // function to extract source code between [code][/code]
    function extract_code($string, $start, $end)
    {
        $pos_start = strpos($string, $start);
        $pos_end   = strpos($string, $end, ($pos_start + strlen($start)));

        if (($pos_start !== false) && ($pos_end !== false))
        {
            $pos1 = $pos_start + strlen($start);
            $pos2 = $pos_end - $pos1;
            return substr($string, $pos1, $pos2);
        }
    }

    My code is printed with the following:
    PHP Code:
    // this is a nested loop.
    for ($j = 0; $j < count($topics); $j++)
    {
        // extract contents between [code][/code] tags.
        $source_code = extract_code($contents_parent_id[$j]['contents'], "[code]", "[/code]");

        // remove HTML <br /> tags from the source.
        $source_code = str_replace("<br />", "\n", $source_code);

        //--------------------------------+
        // highlight the code with GeSHi. |
        //--------------------------------+
        /**/ $language = "java";
        /**/ $geshi = new GeSHi($source_code, $language);
        /**/ $final_code = $geshi->parse_code();
        /**/ //echo $final_code;
        //--------------------------------+

        // the source code delimiters.
        $tagOne = "[code]";
        $tagTwo = "[/code]";

        // the starting position of the first tag: [code].
        $startTagPos = strrpos($contents_parent_id[$j]['contents'], $tagOne);

        // the ending position of the second tag: [/code].
        $endTagPos = strrpos($contents_parent_id[$j]['contents'], $tagTwo);

        // length of the tag.
        $tagLength = $endTagPos - $startTagPos;

        // replace all code between [code][/code] tags with the new highlighted code.
        $contents_parent_id[$j]['contents'] = substr_replace($contents_parent_id[$j]['contents'], $final_code, $startTagPos, $tagLength);

        // remove the [code][/code] tags from the source.
        $contents_parent_id[$j]['contents'] = str_replace("[code]", "", $contents_parent_id[$j]['contents']);
        $contents_parent_id[$j]['contents'] = str_replace("[/code]", "", $contents_parent_id[$j]['contents']);

        // clean up any broken (unclosed) HTML tags.
        // WARNING: USING HTML CLEANER WILL RESULT IN SOURCE CODE DISPLAY BEING SEVERELY AFFECTED.
        //$contents_parent_id[$j]['contents'] = $purifier->purify($contents_parent_id[$j]['contents']);

        // code is now fully highlighted ready for output.
        echo $contents_parent_id[$j]['contents'];
    }

    Can anybody help me out here? I'm getting so confused. Thanks.

  2. #2
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    792
    I've also recently built a BB-style knowledge base for myself. Using regular expressions, this is how I handled a couple of tags:

    PHP Code:
    <?php

        function formatType($matches)
        {
            switch (strtolower($matches[1])) {
                case 'code': return '<span style="border: 1px solid #888; padding: 2px; background-color: #eee;">' . htmlentities(trim($matches[2])) . '</span>'; break;
                case 'note': return '<span style="border: 1px solid #880; padding: 2px; background-color: #ff0;">' . htmlentities(trim($matches[2])) . '</span>'; break;
            }
        }

        $text = 'Some text [code]First set of code[/code], more text [code]Second set of code[/code]. And lastly some random sentence. [note]Cool![/note]';

        echo preg_replace_callback('/\[(code|note)\]([\w\W]*?)\[\/\1\]/', 'formatType', $text);

    ?>
    I've given you an example with two different BB tags that are formatted slightly differently in the callback function. You can call GeSHi to format the output inside the switch statement.

    The method I'm using also allows the = parameter, so if you require that, let me know.
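
    A minimal sketch of the GeSHi suggestion above, reduced to the [code] tag only. The function names highlight_source, format_code_block and render_code_tags are hypothetical, and highlight_source is a stand-in so the snippet runs without GeSHi installed. With GeSHi loaded, its body could be replaced by the constructor and parse_code() call from the first post. The <br /> handling assumes the stored text had its newlines converted, as in the original loop.

```php
<?php
// Hypothetical stand-in for a real highlighter. With GeSHi loaded, the body
// could instead build new GeSHi($source, $language) and return parse_code().
function highlight_source($source, $language)
{
    return '<pre class="' . $language . '">' . htmlspecialchars($source) . '</pre>';
}

// Called once per [code]...[/code] match; $matches[1] is the text between the tags.
function format_code_block($matches)
{
    // Assumption: the stored text uses <br /> for line breaks, so turn them back into newlines.
    $source = str_replace("<br />", "\n", $matches[1]);
    return highlight_source(trim($source), 'java');
}

function render_code_tags($text)
{
    // The lazy quantifier (*?) stops at the nearest closing tag, so every
    // block is matched and replaced on its own. The s flag lets . match newlines.
    return preg_replace_callback('/\[code\](.*?)\[\/code\]/s', 'format_code_block', $text);
}

$text = 'Intro [code]int a = 1;[/code] middle [code]if (a < 2) {}[/code] end';
echo render_code_tags($text);
```

    Because the callback returns the replacement for each match, there is no need to track tag positions by hand, which is where the strpos/strrpos approach broke down with a second pair of tags.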

  3. #3
    Join Date
    Sep 2011
    Location
    Bristol, England, United Kingdom
    Posts
    192
    Hi bionoid,

    Thank you for the code, it works perfectly with GeSHi and is an easier solution than I thought I would require. Even though I don't understand the expression '/\[(code|note)\]([\w\W]*?)\[\/\1\]/', that's something I can learn in my own time.
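
    For anyone else puzzled by that expression, here is a sketch that spells it out. It is the same pattern written with the x flag, which ignores whitespace inside the pattern and allows # comments, so each piece can be labelled. The preg_match_all call and the sample text are only there to show what ends up in each capture group. It also happens to give the "every instance stored in an array" result asked for in the first post.

```php
<?php
// The pattern from post #2, spread over several lines with the x (extended) flag.
$pattern = '/
    \[ (code|note) \]   # an opening tag; group 1 remembers which name matched
    ( [\w\W]*? )        # group 2: any characters, newlines included, as few as possible
    \[ \/ \1 \]         # the closing tag; \1 must repeat the name from group 1
/x';

$text = 'Some text [code]First[/code], more [code]Second[/code]. [note]Cool![/note]';

// PREG_SET_ORDER yields one array per match: [whole match, group 1, group 2].
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1] . ' => ' . $m[2] . "\n";
}
```

    The two details that matter most are the lazy *?, which keeps the first block from swallowing everything up to the last closing tag, and the \1 backreference, which stops a [code] block from being closed by [/note].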

    You mentioned that your code also allows for the = parameter to be used. Would you be able to give me an example of this using [image src="" /], where the = parameter may only link to images on my domain? i.e. [image src="../../images/someimage.png" /] and NOT [image src="http://www.somedomain.com/image.png" /]. The reason is that I want to keep everything local, so images can be uploaded into the appropriate directory and linked to from there to keep performance up.

    Thank you again.
    Last edited by George88; 02-17-2012 at 04:11 AM.

  4. #4
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    792
    That particular expression wasn't intended for img tags, as they are self-closing.
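
    A self-closing tag needs its own pattern. The following is only a rough sketch of the local-images-only idea, assuming the [image src="..." /] syntax from the question. The helper names is_local_image_path and format_image_tag, the character whitelist and the list of extensions are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: accept only simple relative or site-local paths.
function is_local_image_path($src)
{
    // Letters, digits, "_", "-", "." and "/" only, so no scheme (":") or
    // query string can appear, and the path must end in a known extension.
    if (!preg_match('#^[A-Za-z0-9_\-./]+\.(png|jpe?g|gif)$#i', $src)) {
        return false;
    }
    // Reject protocol-relative URLs such as //host/x.png.
    return substr($src, 0, 2) !== '//';
}

// Called once per [image src="..." /] match; $matches[1] is the src value.
function format_image_tag($matches)
{
    if (!is_local_image_path($matches[1])) {
        // Leave an escaped copy of the tag in place rather than emitting an <img>.
        return htmlspecialchars($matches[0]);
    }
    return '<img src="' . htmlspecialchars($matches[1]) . '" alt="" />';
}

$pattern = '/\[image\s+src="([^"\]]*)"\s*\/\]/i';
$text = 'A [image src="../../images/someimage.png" /] and B [image src="http://www.somedomain.com/image.png" /]';
echo preg_replace_callback($pattern, 'format_image_tag', $text);
```

    The first tag becomes an <img> element, while the second is left as escaped text because its src contains a scheme.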
    Here is an example with a parameter-based tag (url):

    PHP Code:
    <?php

        function formatType($matches)
        {
            $type = strtolower($matches[1]);
            $data = trim($matches[3]);
            // escape the parameter so a quote in it cannot break out of the href attribute.
            $args = htmlentities($matches[2]);

            switch ($type) {
                case 'code': return '<span style="border: 1px solid #888; padding: 2px; background-color: #eee;">' . htmlentities($data) . '</span>'; break;
                case 'note': return '<span style="border: 1px solid #880; padding: 2px; background-color: #ff0;">' . htmlentities($data) . '</span>'; break;
                case 'url': return '<a href="' . $args . '">' . htmlentities($data) . '</a>'; break;
            }
        }

        $text = 'Some text [code]First set of code[/code], more text [code]Second set of code[/code]. And lastly some [url=http://www.google.com]random[/url] sentence. [note]Cool![/note]';

        echo preg_replace_callback('/\[(code|note|url)=?([^\]]*)\]([\w\W]*?)\[\/\1\]/', 'formatType', $text);

    ?>
    It could also be used to extend the code tag and specify which language GeSHi should format etc.
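
    As a sketch of that last idea, the = parameter can carry the language name, e.g. [code=php]. The names highlight_source and format_code_with_language, the whitelist of languages and the 'java' fallback are assumptions for illustration. highlight_source again stands in for GeSHi so the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for the highlighter; with GeSHi loaded this could
// build new GeSHi($source, $language) and return parse_code() instead.
function highlight_source($source, $language)
{
    return '<pre class="' . $language . '">' . htmlspecialchars($source) . '</pre>';
}

// $matches[1] is the optional parameter after "=", $matches[2] the code itself.
function format_code_with_language($matches)
{
    // Hypothetical whitelist; anything else falls back to the default language.
    $allowed  = array('java', 'php', 'cpp');
    $language = strtolower(trim($matches[1]));
    if (!in_array($language, $allowed, true)) {
        $language = 'java';
    }
    return highlight_source(trim($matches[2]), $language);
}

// Same shape as the pattern in the post above, restricted to the code tag.
$pattern = '/\[code=?([^\]]*)\]([\w\W]*?)\[\/code\]/';
$text = '[code=php]echo 1;[/code] and [code]int a;[/code]';
echo preg_replace_callback($pattern, 'format_code_with_language', $text);
```

    Checking the parameter against a fixed list keeps user input out of the generated markup and out of whatever the highlighter does with the language name.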