www.webdeveloper.com

Thread: Streamlining parsing code

  1. #1
    Join Date
    Sep 2008
    Location
    Jackson MS
    Posts
    373

    Streamlining parsing code

    After some minor reformatting of the data from a textarea, I test it inside a loop against a variety of regular expressions. When one matches at index zero, I create an appropriate entry in a work table, remove that construct from the beginning of the string, and loop again. A variable "maxOffset" is initialized to the length of the data; if a construct is found at an index greater than zero, that index replaces maxOffset when it is less than the current value. If I fall through to the end of the loop, maxOffset tells me how long the garbage is, so I create an error message and remove the garbage from the string.
    Code:
      while (lines.length > 0) {
        var strLength, maxOffset = lines.length;
    ...
        if ((result = lines.match(/\|\|/))) {  // Found double bar line
          if (result.index == 0) {
            lastfound = "double bar";
            strLength = result[0].length;
            genWorkEntry("Bar|Style:Double");
            lines = lines.substr(strLength);
            continue;
          } else {
            maxOffset = (result.index < maxOffset) ? result.index : maxOffset;
          }
        }

        if ((result = lines.match(/:\|(\d)/))) {  // Found Repeat Close and Special ending
          if (result.index == 0) {
            lastfound = "repclose special";
            strLength = result[0].length;
            genWorkEntry("Bar|Style:MasterRepeatClose");
            genWorkEntry("Ending|Endings:" + result[1]);
            lines = lines.substr(strLength);
            continue;
          } else {
            maxOffset = (result.index < maxOffset) ? result.index : maxOffset;
          }
        }
    ...
        errText += "Unable to process: " + lines.substr(0, maxOffset) + " length: " + maxOffset + " last: " + lastfound + "\n";
        genWorkEntry("Text|Text:\"" + lines.substr(0, maxOffset)
            + "\"|Font:StaffBold|Pos:-8|Wide:Y|Justify:Left|Placement:BestFit|Color:3|Visibility:Default");
        lines = lines.substr(maxOffset);
      }  // End main parsing loop
    Each of these constructs ends with the same lines (colored in the original post). Is there a way to eliminate that repetition? A switch statement seems close to the structure, but case labels can't be regular expressions.

    The website is abcnwc.htm and this code is on lines 541-819.

    TIA

  2. #2
    Join Date
    Dec 2003
    Location
    Bucharest, ROMANIA
    Posts
    15,428
    It's not very clear, but I have the impression that you are looking for a way to build a regular expression dynamically. That is done with the native RegExp constructor, new RegExp().
    Code:
    // In a string, each backslash meant for the regex must be doubled.
    var expressions = ["\\|\\|", ":\\|(\\d)"], i = 0, e, reg;
    while ((e = expressions[i++])) {
      reg = new RegExp(e);
      if ((result = lines.match(reg))) {
        //... blah blah
      }
    }
    You may also use a two-dimensional array (or an object) to keep each regular expression paired with a string, so that you can do something with that string later.
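
A minimal sketch of that pairing, assuming an array of objects rather than a two-dimensional array. The names `rules`, `name` and `firstMatches` are hypothetical and used only for illustration; the two patterns are the ones from the first post.

```javascript
// Hypothetical table: each compiled pattern is kept next to a label.
var rules = [
  { re: new RegExp("\\|\\|"),    name: "double bar" },
  { re: new RegExp(":\\|(\\d)"), name: "repclose special" }
];

// For every rule that matches somewhere in the input, record the label,
// the index where the match starts, and the matched text.
function firstMatches(lines) {
  var found = [];
  for (var i = 0; i < rules.length; i++) {
    var result = lines.match(rules[i].re);
    if (result) {
      found.push({ name: rules[i].name, index: result.index, text: result[0] });
    }
  }
  return found;
}
```

Calling `firstMatches` on a string that contains both constructs returns one record per rule, and the `index` field is what the parsing loop would compare against zero.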

    Anyway, the key object is new RegExp()
    http://www.javascriptkit.com/jsref/regexp.shtml
    http://www.regular-expressions.info/javascript.html

  3. #3
    Join Date
    Sep 2008
    Location
    Jackson MS
    Posts
    373
    Thanks for the links, I've bookmarked both of them.

    Three things can happen on any match statement:
    1. Finds nothing. Try the next match.
    2. Finds something but result.index is > 0. If this is less than maxOffset replace it with result.index then try next match.
    3. Finds something and result.index == 0. Perform unique processing for the found match, remove the length of the found construct from the beginning of lines, and ignore the rest of the loop.


    If case 3 never occurs, maxOffset contains the length of the garbage to be removed from the front of lines after creating an error message.
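
The three cases above can be sketched as one inner loop over a table, so the shared lines appear only once. This is an illustration under assumptions, not the code from abcnwc.htm: `rules`, `handle` and `parse` are hypothetical names, work entries are collected in an array instead of being passed to genWorkEntry, and every pattern is assumed to match at least one character (a zero-length match at index 0 would loop forever).

```javascript
// Hypothetical rule table: each pattern carries its own unique processing.
var rules = [
  { name: "double bar", re: /\|\|/,
    handle: function (m, out) { out.push("Bar|Style:Double"); } },
  { name: "repclose special", re: /:\|(\d)/,
    handle: function (m, out) {
      out.push("Bar|Style:MasterRepeatClose");
      out.push("Ending|Endings:" + m[1]);
    } }
];

function parse(lines) {
  var out = [], errors = [];
  while (lines.length > 0) {
    var maxOffset = lines.length, matched = false;
    for (var i = 0; i < rules.length; i++) {
      var m = lines.match(rules[i].re);
      if (!m) continue;                      // case 1: nothing found
      if (m.index > 0) {                     // case 2: found further along
        maxOffset = Math.min(maxOffset, m.index);
        continue;
      }
      rules[i].handle(m, out);               // case 3: found at the front
      lines = lines.substr(m[0].length);     // shared trimming, written once
      matched = true;
      break;
    }
    if (!matched) {                          // no case 3: drop the garbage
      errors.push("Unable to process: " + lines.substr(0, maxOffset));
      lines = lines.substr(maxOffset);
    }
  }
  return { entries: out, errors: errors };
}
```

Adding a new construct then means adding one object to `rules`; no switch on the value of i is needed because the handler travels with its pattern.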

    In your suggested code, I think I can pass the value of i to a switch statement to do the unique processing. That should get me started.

    Many thanks.

  4. #4
    Join Date
    Sep 2008
    Location
    Jackson MS
    Posts
    373
    I ran into a few problems trying out this code:
    1. All carriage returns and/or line feeds are replaced with a literal "\r\n" before this loop is entered, but the alert displayed the expression on two lines. Can I convert it to a RegExp without the "\r\n" being reinterpreted?
    2. It quits with "invalid range in character class" on the header expression (expressions[1]).
    Code:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
       "http://www.w3.org/TR/html4/strict.dtd">
    <html>
    <head>
    <title></title>
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
    <script type="text/javascript"> 
    function genNwc() {
    var expressions=[
    "/\\\r\\\n ?w:(.*$)", //lyric   /\\r\\n ?w:(.*$)/
    "/\\\r\\\n([A-Za-z]): ?([^\\\]*)", //header /\\r\\n([A-Za-z]): ?([^\\]*)/
    "\"(_?)([^\"]+)\"", //found text   /"(_?)([^"]+)"/
    "\|\|",   //double bar     /\|\|/
    ":\\|(\\d)",  //Repeat Close and Special ending    /:\|(\d)/
    ], i=0, e, reg;
    
    for (i=0; i < expressions.length; i++) {
      e=expressions[i];
      alert("i= " + i + " e   ->" + e + "<-");
      reg=new RegExp(e);
      alert("i= " + i + " expr->" + reg.source + "<-");
      }
    }
    </script>
    </head>
    <body>
    <form name="nwcform">
      <table>
        <tr>
          <td><input type="button" id="bClick1" value="Submit" onclick="genNwc()"></td></tr>
      </table>
    </form>
    </body>
    </html>
    TIA
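
Both symptoms are consistent with string escaping being applied before the RegExp constructor sees the text: in a string literal `"\\\r"` is a backslash followed by a real carriage return, and `"[^\\\]*"` reaches the constructor as `[^\]*`, a character class that is never closed. A hedged sketch of one way to write the same five patterns follows; it assumes the regex literals in the comments are what was intended, doubles every backslash of those literals, and drops the leading "/" (which would otherwise be matched as a literal slash). The names `regs` and `sample` are hypothetical.

```javascript
// Each string is the body of the intended regex literal with every
// backslash doubled; the "/" delimiters are not part of the string.
var expressions = [
  "\\\\r\\\\n ?w:(.*$)",               // lyric   /\\r\\n ?w:(.*$)/
  "\\\\r\\\\n([A-Za-z]): ?([^\\\\]*)", // header  /\\r\\n([A-Za-z]): ?([^\\]*)/
  "\"(_?)([^\"]+)\"",                  // text    /"(_?)([^"]+)"/
  "\\|\\|",                            // double bar  /\|\|/
  ":\\|(\\d)"                          // repeat close and ending  /:\|(\d)/
];

var regs = [];
for (var i = 0; i < expressions.length; i++) {
  regs.push(new RegExp(expressions[i]));
}

// Hypothetical sample in which each line break has already been replaced
// by the four characters backslash, r, backslash, n.
var sample = "\\r\\nT: Title\\r\\nw: la la";
var header = regs[1].exec(sample);  // header[1] is the letter, header[2] the value
var lyric = regs[0].exec(sample);   // lyric[1] is the text after "w:"
```

Comparing `reg.source` with the `source` of the equivalent regex literal is a quick way to check that a string was escaped as intended.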
