[RESOLVED] A RegExp for cutting all JavaScript out of the code
Hello guys! First of all, two things:
1. Happy New Year to everyone! I wish you tons of good things; may this year bring you all only positive emotions and expectations!
2. typeof(Santa) is myth
So, here is my problem. I need a regular expression that cuts all JavaScript out of a page's HTML code. I mean only the scripts between <script> and </script> tags, not inline JavaScript.
I wrote a function that disables scripts:
Code:
function noscripts(content){
// "<script" becomes "<!--script" and "</script>" becomes "<script-->", so each script turns into an HTML comment
var s_path_1=/<script/ig,s_path_2=/\/script>/ig;
return content.replace(s_path_1,'<!--script').replace(s_path_2,'script-->');
}
But I would like to remove this stuff from the HTML completely. I'm using jQuery's $.get(), and I need to remove the scripts from the returned data before doing anything else with it: I put a portion of the returned data into a temporary div so that I can search through its elements, and I don't want scripts in there.
I tried to use this one:
Code:
/<\s*script[^>]*>[\w\W\s]*<\s*\/script>/ig;
Sometimes it matches, but sometimes it does not. Can anybody help? Thanks in advance.
use [code]YOUR CODE GOES HERE[/code] or burn in Hell
Maybe I don't quite understand what you need, but if you put the whole block of HTML into a separate DIV, can't you do something like this:
Code:
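var f = document.getElementById('myDivWithJavascriptTagsInIt');
// getElementsByTagName returns a live collection that shrinks as scripts are removed,
// so walk it backwards; a forward loop would skip elements and hit undefined
var scr = f.getElementsByTagName('script');
var i;
for(i = scr.length - 1; i >= 0; i--){
scr[i].parentNode.removeChild(scr[i]);
}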
Thanks Tcobb, but I need to remove the scripts from the returned data before I put it into the temp div. At that moment there are no elements yet; the data is just a large string. That's why I need a regular expression that removes the matching parts from the string, something like this:
Code:
var noscripts=/bla-bla-bla/ig;
$.get("somepage.php",function(data){
data=data.replace(noscripts,'');/*this removes scripts from the data*/
$('#temp_div').html(data);
/*the data is clean now and I can manipulate it*/
});
I understand that the JavaScript may be self-executing, in which case my solution will not help you. I had to do a similar thing in PHP, and here is the approximate process:
(1) Loop through the string, replacing every '<(space)' with '<'.
(2) Repeat the loop as long as the length of the string keeps changing.
(3) Do the same thing for '(space)>'.
(4) Use regexps to replace any case variant of '<script', such as '<SCript', with '<script'.
(5) Use regexps to replace any case variant of '</script>', such as '</SCript>', with '</script>'.
(6) Now use regexps to replace everything that begins with <script and ends with the nearest </script> with the empty string.
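For anyone who wants to try the process above without PHP, here is a rough JavaScript sketch of steps (1)-(6). The name stripScriptBlocks is made up for this example, and step (6) uses a lazy [\s\S]*? so that each <script>...</script> pair is removed on its own and multi-line script bodies still match. It's a starting point, not a tested library function.

```javascript
// Rough JavaScript port of steps (1)-(6); stripScriptBlocks is a made-up name.
function stripScriptBlocks(html) {
  var previousLength;
  // Steps (1)-(3): remove "<(space)" and "(space)>", repeating
  // until the length stops changing, so "<  script" becomes "<script".
  do {
    previousLength = html.length;
    html = html.replace(/< /g, '<').replace(/ >/g, '>');
  } while (html.length !== previousLength);

  // Steps (4)-(5): normalize the case of the opening and closing tags.
  html = html.replace(/<script/gi, '<script');
  html = html.replace(/<\/script>/gi, '</script>');

  // Step (6): drop every <script ...>...</script> block. [\s\S]*? is lazy
  // and also matches newlines, so each block ends at its own </script>.
  return html.replace(/<script[\s\S]*?<\/script>/g, '');
}

console.log(stripScriptBlocks('<p>a</p>< SCRIPT src="x.js"></Script ><p>b</p>'));
// prints <p>a</p><p>b</p>
```

Like the PHP original, steps (1)-(3) also touch spaces in ordinary text (for example 'a > b' becomes 'a> b'), so this is only safe if that doesn't matter for your markup.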
What I am trying to find is just a regular expression that removes all the scripts, tags included, from the response markup before it is added to the main page... I could not compose such a regexp myself, even with The Regex Coach, which is why I asked for help ))
I changed your RegExp a bit so it would match more variations of script tags:
Code:
/<\s*script.*?>.*?(<\s*\/script.*?>|$)/ig
Though I would choose rnd's solution over regular expressions; it seems to be the most secure way.
Thanks man, but this regexp matches the tags only and ignores everything between them. I don't just need to remove or disable the script tags; I am trying to find a way to cut off the script tags and all the code between them by replacing them with nothing:
data=data.replace(noscripts,''); // noscripts = the regexp I'm looking for; '' means "replace with nothing"
For example, let's try to remove all scripts from this code:
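Since JavaScript's . never matches a newline, a multi-line script body defeats the .*? in the expression above. Below is a sketch of one possible fix, assuming the goal is simply to drop every script block from the $.get() response: the middle part becomes [\s\S]*?, which lazily matches any character. This is my own variant, not necessarily the edited version mentioned later in the thread; it also uses [^>]* for the opening tag's attributes and allows spaces inside the closing tag, and removeScripts is a hypothetical helper name.

```javascript
// One possible newline-tolerant variant (hypothetical, not from the thread):
// - [^>]* instead of .*? for the attributes inside the opening tag
// - [\s\S]*? instead of .*? for the script body, so newlines are matched
// - \s* allowed around "/" and "script" in the closing tag
// - "|$" kept: an unclosed <script> is removed up to the end of the string
var noscripts = /<\s*script\b[^>]*>[\s\S]*?(<\s*\/\s*script\s*>|$)/ig;

// Hypothetical helper: remove every script block from an HTML string.
function removeScripts(data) {
  return data.replace(noscripts, '');
}

console.log(removeScripts('<b>1</b><script type="text/javascript">\nvar x = 1;\n</script><b>2</b>'));
// prints <b>1</b><b>2</b>
```

Plugged into the $.get() callback from the question, the line becomes data=data.replace(noscripts,''); and the rest of the code stays the same.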
Code:
<meta http-equiv="content-type" content="text/html; charset=windows-1251">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="-1">
<script type="text/javascript" src="/scripts/jquery-1.4.2.min_223.js"></script>
<script type="text/javascript" src="/scripts/overlib.js"></script>
<script type="text/javascript" src="/scripts/ressources.js"></script>
<script type="text/javascript" src="/scripts/form.js"></script>
</head>
<body>
<script type="text/javascript" language="javascript">
var ress = new Array(2647618, 174070, 442978);
var max = new Array(5525950,3069400,1709800);
var production = new Array(32.011111111111, 10.041666666667, 5.6625);
window.setInterval("res_online()",1000);
</script>
<form name="ress" id="ress" style="display:inline">
<INPUT TYPE="hidden" ID="metall" value="0">
<INPUT TYPE="hidden" ID="crystall" value="0">
<INPUT TYPE="hidden" ID="deuterium" value="0">
<INPUT TYPE="hidden" ID="bmetall" value="0">
<INPUT TYPE="hidden" ID="bcrystall" value="0">
<INPUT TYPE="hidden" ID="bdeuterium" value="0">
</form>
This is a fragment of the response data. Before I put it in a temp div, I need this code to become "script-free": the same markup, just without any of the <script> elements.
I haven't tried it, but this might do what you're looking for with plain string-to-string operations.
Code:
function killScripts(str){ //takes string as argument containing the HTML
//get rid of all spaces just inside the tag brackets ("< " and " >")
var arr, inArr, i, oldLen, output, len = str.length;
do{
oldLen = len;
str = str.replace(/< /ig, '<');
str = str.replace(/ >/ig, '>');
len = str.length;
} while(len != oldLen);
//dispose of case sensitivity
str = str.replace(/<script/ig,'<script');
str = str.replace(/<\/script/ig,'</script');
//now take them out
arr = str.split('<script');
len = arr.length;
output = arr[0];
for(i = 1; i < len; i++){
inArr = arr[i].split('</script>');
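output += inArr.length > 1 ? inArr[1] : ''; //an unclosed <script> drops everything after it instead of appending "undefined"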
}
return output;
}
And FYI, the last expression I posted did not ignore anything between the script tags; it only failed if the script contained newlines, because . does not match a newline.
Edit: on a closer look, the biggest problem with your original expression is that it does a greedy match between the script's start and end tags, so it matches the very first <script>, the very last </script>, and everything in between (including other script end and start tags).
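A quick way to see the greedy problem: run the original expression from the question and the same expression with a single extra ? (making the * lazy) over a string with two scripts and some markup between them. Note that [\w\W\s] already matches any character, newlines included, so the ? is the only change. The sample string is made up for the demo.

```javascript
// Made-up sample: two script blocks with markup that must survive in between.
var html = '<script>a()</script><p>keep me</p><script>b()</script>';

// The expression from the question: [\w\W\s]* is greedy, so it runs
// from the FIRST <script> to the LAST </script>.
var greedy = /<\s*script[^>]*>[\w\W\s]*<\s*\/script>/ig;

// Same expression with *? instead of *: each match stops at the
// nearest </script>, so every block is removed separately.
var lazy = /<\s*script[^>]*>[\w\W\s]*?<\s*\/script>/ig;

console.log(JSON.stringify(html.replace(greedy, ''))); // prints ""
console.log(JSON.stringify(html.replace(lazy, '')));   // prints "<p>keep me</p>"
```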
I think that we might not need to be so defensive against malformed scripts.
If I understand correctly, the scripts are coming from a trusted source, so you probably won't see something like:
Many thanks to all of you who tried to help me in this thread! I've always known that I could find help here ))
ReFreezed, the second edition of your regexp works perfectly (here is the evidence: matches_now.png), thanks! Tcobb, I haven't tried your code yet, but I'm going to try it and put it in my "must have" JS folder if it works (it looks like it does), thanks! rnd me, thank you for trying to help me, I really appreciate it! I know that removing elements through the DOM would be the simplest way, but it causes JS errors if I let these scripts stay in the response data, which is why I need string operations to sweep the scripts out of the data before I put it in the page. But anyway, thanks!