www.webdeveloper.com

Thread: [RESOLVED] Substring Algorithm

  1. #1
    Join Date
    Oct 2009
    Posts
    658

    [RESOLVED] Substring Algorithm

    Just wondering what the best substring algorithm is for splitting WYSIWYG data without breaking the HTML tags. Example:

    <p>
    Lorem ipsum <a href="blah_blah_blah"> vidi </a>
    </p>
    I want to take only the first part, say the first 30 characters, and render it. A plain substring would break the page, since the result is no longer properly formed HTML/XHTML.

    TIA

  2. #2
    Join Date
    Aug 2007
    Posts
    3,767
    /[^<]{0,30}/
    Is crude, but works in most cases.
    Great wit and madness are near allied, and fine a line their bounds divide.
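
To see where that pattern stops, here is a small sketch using the sample markup from the first post (the variable names are only illustrative). It shows why the approach is crude: the match ends just before the first tag, and it is empty when the string begins with a tag such as <p>.

```javascript
// The pattern from the post above: up to 30 consecutive characters that are not "<".
var pattern = /[^<]{0,30}/;

// Text that starts with plain characters: the match stops right before the first tag.
var plain = 'Lorem ipsum <a href="blah_blah_blah"> vidi </a>';
var plainMatch = plain.match(pattern)[0];      // "Lorem ipsum "

// Text that starts with a tag: the pattern matches the empty string at index 0.
var wrapped = '<p>Lorem ipsum <a href="blah_blah_blah"> vidi </a></p>';
var wrappedMatch = wrapped.match(pattern)[0];  // ""

console.log(JSON.stringify(plainMatch), JSON.stringify(wrappedMatch));
```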

  3. #3
    Join Date
    Jul 2008
    Location
    urbana, il
    Posts
    2,787
    Not sure what you want to do, but here are two options that sound close:

    Code:
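    var t = 'Lorem ipsum <a href="blah_blah_blah"> vidi </a>';

    // to "fix" partial tags, let the browser re-serialize the cut string:
    var elm = document.createElement("div");
    elm.innerHTML = t.slice(0, 30); // the cut string is 'Lorem ipsum <a href="blah_blah'
    alert(elm.innerHTML); // originally reported as 'Lorem ipsum <a href="blah_blah"></a>';
                          // current HTML5 parsers drop a tag that is cut off inside
                          // an attribute value, which leaves just 'Lorem ipsum '

    // to ignore tag markup completely and count content chars only:
    var elm = document.createElement("div");
    elm.innerHTML = t;
    var res = elm.innerText || elm.textContent;
    alert(res.slice(0, 30)); // == "Lorem ipsum  vidi " (two spaces where the tag was)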
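
Where no DOM is available (server-side code, for example), the second option can be roughly approximated with a regular expression. This is only a sketch, assuming simple, well-formed markup with no ">" inside attribute values; it is not a replacement for a real HTML parser.

```javascript
// Assumption: simple, well-formed markup with no ">" inside attribute values.
var t = 'Lorem ipsum <a href="blah_blah_blah"> vidi </a>';

// Drop everything that looks like a tag, then count content characters only.
var textOnly = t.replace(/<[^>]*>/g, "");
var first30 = textOnly.slice(0, 30);

console.log(JSON.stringify(first30)); // "Lorem ipsum  vidi "
```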

  4. #4
    Join Date
    Oct 2009
    Posts
    658
    Thanks for the reply. What I'm trying to achieve is a combination of rnd_me's two options.

    Code:
    <script type="text/javascript">
        $(document).ready(
            function() {
                var str = $("#articleContainer").html();
                $("#articleContainer").html(str.slice(0, 31));
            }
        );
    </script>
    
    
    <div id="articleContainer">
    <p>Lorem Ipsum is simply <a href="dummy">dummy</a> text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>
    </div>
    Consider the code above. The output should include the hyperlink "dummy". One option I have already tried was iterating over the entire string and closing unclosed tags, but that gives the wrong character count. Another was to do it in the code-behind, storing the tags in a multi-level associative array and closing them accordingly; the drawback was far too much overhead. Are there any other algorithms out there?

  5. #5
    Join Date
    Jul 2008
    Location
    urbana, il
    Posts
    2,787
    Quote Originally Posted by ssystems

    Consider the code above. The output should include the hyperlink "dummy". One option I have already tried was iterating over the entire string and closing unclosed tags, but that gives the wrong character count. Another was to do it in the code-behind, storing the tags in a multi-level associative array and closing them accordingly; the drawback was far too much overhead. Are there any other algorithms out there?
    The browser is probably the most tested, optimized, and downright creative HTML parser around.
    This function completes any opening or closing tag that the cut point falls inside, and lets the browser automatically close the remaining open tags to deliver valid HTML.
    Code:
    function getWhole(str, slot){
      // cut at slot, then finish the tag if the cut landed inside one
      var base=str.slice(0,slot), elm=document.createElement("div");
      if((base.match(/</g)||[]).length > (base.match(/>/g)||[]).length){
        base+= str.slice(slot,str.indexOf(">", slot)+1);
      }
    
      // the browser closes any elements that are still open
      elm.innerHTML=base;
      return elm.innerHTML;
    }
    
    
    var ht='<p>Lorem Ipsum is simply <a href="dummy">dummy</a> text of the\
     printing and typesetting industry. Lorem .';
    
    
    alert(getWhole( ht, 31));
    //==<p>Lorem Ipsum is simply <a href="dummy"></a></p>
    Last edited by rnd me; 10-22-2009 at 02:01 AM.
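
For the requirement stated earlier in the thread (count only the visible characters, keep the "dummy" link, and close whatever is left open), one possible DOM-free sketch is below. The helper truncateHtml and the VOID_TAGS table are hypothetical names, not something posted in the thread. It assumes well-formed input with no ">" inside attribute values, no comments, and no script or style content, and it counts an entity such as &amp; as several characters.

```javascript
// Hypothetical helper, not from the thread: keep at most `limit` text characters,
// copy tags through uncounted, and close whatever is still open at the cut.
var VOID_TAGS = { br: 1, hr: 1, img: 1, input: 1, meta: 1, link: 1 };

function truncateHtml(html, limit) {
  // Split into tag tokens ("<...>") and text runs.
  var tokens = html.match(/<[^>]*>|[^<]+/g) || [];
  var out = "", open = [], remaining = limit;

  for (var i = 0; i < tokens.length && remaining > 0; i++) {
    var tok = tokens[i];
    if (tok.charAt(0) !== "<") {
      // Text run: take only as many characters as the budget allows.
      var piece = tok.slice(0, remaining);
      out += piece;
      remaining -= piece.length;
      continue;
    }
    var m = /^<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9]*)/.exec(tok);
    if (m) {
      var name = m[2].toLowerCase();
      if (m[1]) {
        // Closing tag: pop the matching opener if it is on top of the stack.
        if (open[open.length - 1] === name) open.pop();
      } else if (!VOID_TAGS[name] && !/\/\s*>$/.test(tok)) {
        // Opening tag that is neither void nor self-closed: remember it.
        open.push(name);
      }
    }
    out += tok;
  }

  // Close the still-open elements, innermost first.
  while (open.length) out += "</" + open.pop() + ">";
  return out;
}

var sample = '<p>Lorem Ipsum is simply <a href="dummy">dummy</a> text of the printing</p>';
console.log(truncateHtml(sample, 31));
// <p>Lorem Ipsum is simply <a href="dummy">dummy</a> tex</p>
```

Unlike getWhole, the limit here applies to text characters only, so the link text survives the cut. The trade-off is that the tokenizer is a simplification and will mis-handle markup outside the stated assumptions, whereas the innerHTML approach delegates that work to the browser's parser.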

  6. #6
    Join Date
    Oct 2009
    Posts
    658
    Quote Originally Posted by rnd me
    The browser is probably the most tested, optimized, and downright creative HTML parser around.
    Agreed. Thanks.
