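preg_replace has limits on how much work it can do. As far as I know it isn't a fixed data size: PHP caps regex backtracking and recursion through the pcre.backtrack_limit and pcre.recursion_limit settings, and when a pattern runs past them preg_replace returns NULL instead of the replaced string. I wouldn't count on a complicated pattern getting through a 400-page novel in one pass.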
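A minimal sketch of how you could at least notice when that limit is hit, instead of silently losing the text. The function name safe_preg_replace is made up for this example; preg_replace and preg_last_error are the standard PHP calls.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of PHP):
// wraps preg_replace so a PCRE failure is not mistaken for an empty result.
function safe_preg_replace(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);

    // preg_replace returns null on error, e.g. when the backtrack limit is exceeded.
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, PCRE error code ' . preg_last_error());
    }

    return $result;
}

// Collapse runs of whitespace into a single space.
echo safe_preg_replace('/\s+/', ' ', "a  \n b"), PHP_EOL;
```

If the input really is huge, feeding it through in chunks (per paragraph, say) is one way to stay well under the limits.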
My second suggestion is to map the special characters through a character map and return them encoded as HTML entities. I'm assuming this is for a form submission. So when someone submits "<script src="haxx.com/sesshijack.js">", it is displayed exactly as it came in, except the < is stored as &lt;, the > as &gt;, and so on, so the browser shows it as text instead of running it.
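Here is a rough sketch of the entity approach, assuming the input is UTF-8 and that PHP's built-in htmlspecialchars is an acceptable stand-in for a hand-built character map. The function name escape_for_html is just something I picked for the example.

```php
<?php
// Hypothetical helper: encode HTML-special characters as entities.
// ENT_QUOTES also converts single and double quotes;
// ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.
function escape_for_html(string $input): string
{
    return htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$submitted = '<script src="haxx.com/sesshijack.js"></script>';

// Prints: &lt;script src=&quot;haxx.com/sesshijack.js&quot;&gt;&lt;/script&gt;
echo escape_for_html($submitted), PHP_EOL;
```

Escaping at output time like this keeps the original submission intact, so nothing is lost if you later need the raw text.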
That's how these forums (may) work. You can also build an array of the characters you want taken out (it's not as flexible as regex) and run it through str_replace, or remove the HTML completely.
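A quick sketch of both options, assuming a small blacklist of characters is enough for your case. The function names and the contents of $unwanted are only examples. Note that str_replace accepts an array of search strings directly, so the explicit loop isn't needed.

```php
<?php
// Example blacklist (an assumption; adjust to whatever you need removed).
$unwanted = ['<', '>', '"', "'"];

// Hypothetical helper: delete every listed character from the input.
function remove_characters(string $input, array $unwanted): string
{
    // str_replace takes an array of needles and replaces each with ''.
    return str_replace($unwanted, '', $input);
}

// Hypothetical helper: drop the HTML tags entirely, keeping the text between them.
function remove_html(string $input): string
{
    return strip_tags($input);
}

echo remove_characters('<b>hi</b>', $unwanted), PHP_EOL; // prints: bhi/b
echo remove_html('<b>hi</b> there'), PHP_EOL;            // prints: hi there
```

As the first line of output shows, stripping single characters leaves the tag names behind, which is why strip_tags or entity encoding is usually the cleaner choice.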