No HTML


edatz
07-24-2009, 07:44 AM
I use this so that HTML cannot be posted. So far this has worked very well.

$body =~ s/<([^>]|\n)*>//g;

If somebody types in say:
Visit the <a href="http://somesite.com">somesite.com</a> stuff

It will just return:
Visit the somesite.com stuff.

What do I do to have it return nothing at all?
Like this: Visit the stuff

Or, go to a new page completely where they will be soundly rebuked for trying to do that.

The only exception for this will be within the BBCode's Code tag - but I'll get to that later.

Hope someone can help.

Many thanks.

Nedals
07-24-2009, 01:09 PM
my $body = 'Visit the <a href="http://somesite.com">somesite.com</a> stuff';
$body =~ s/<([^>]|\n)*>//g; #seems that this could be simplified to:- $body =~ s/<.+?>//g;

meaning:-
'<'
'.+?' one or more chars (any except a newline); the '?' makes it non-greedy, so it searches from the beginning and stops at the first '>'
'>'
So a basic solution might be this:-
$body =~ s/<.+?>.+?<.+?>//g;

If the '?' were left out it would use a greedy search from the end of the string back to '>'.
Check this out:
my $body = 'Visit the <a href="http://somesite.com">somesite.com</a> stuff';
$body =~ s/<.+?>//g;
print "$body\n";

$body = 'Visit the <a href="http://somesite.com">somesite.com</a> stuff';
$body =~ s/<.+>//g;
print "$body\n";
You will notice the second solution also does what you want, but only for this example.
It would not work if you had additional HTML tags on the line.
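
To make the greedy versus non-greedy difference concrete, here is a minimal sketch with two pairs of tags on one line. The sample string and the variable names are made up for illustration; only the two patterns come from the post above.

```perl
use strict;
use warnings;

# Hypothetical sample line with more than one pair of tags
my $text = 'See <b>one</b> and <i>two</i> here';

# Non-greedy: each tag is matched and removed on its own
(my $lazy = $text) =~ s/<.+?>//g;

# Greedy: one match runs from the first '<' to the last '>' on the line,
# so the ordinary text between the two pairs of tags is lost as well
(my $greedy = $text) =~ s/<.+>//g;

print "$lazy\n";    # See one and two here
print "$greedy\n";  # See  here
```

The greedy form throws away " and ", which was never inside any tag.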

edatz
07-24-2009, 05:54 PM
Hi Nedals, I tried those out and came up with mixed results. When multiple links (I tried several of them like the spambots do) were present I got all but one.

So I messed around with the basic solution and came up with this:

$body = '
If you live in Pleasanton
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
You are about this far ...... from Oakland.
';
$body =~ s/<.+?>|.+?<.+?>//g;
print "$body\n";

Interesting :D

On a post in my test forum, I had a blank line for each link. Also tried it on other HTML, with the same result.

No errors seemed to come up in my local testbed log, so I guess the code has not kept any processes running. It wipes the tags and everything in between them. I would assume (cuz I didn't try it) that a single tag, like an image, would just blank out all text after it in the post.

Nedals
07-24-2009, 07:18 PM
The code I gave you works only if you have starting AND ending tags.
With a little modification it also handles single tags:

my $body = '
If you live in Pleasanton
<img src="image.gif">
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<a href="http://somesite.com">somesite.com</a>
<img src="image.gif">
<a href="http://somesite.com">somesite.com</a>
You are about 30 miles S.E. of Oakland.
';
$body =~ s/<.+?>(.+?<.+?>)*//g;
print "$body\n";

-or-
$body =~ s/<.+?>(.+?<.+?>)*\n*//g; # to also get rid of the resulting blank lines
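
One caveat worth checking: '.' does not match a newline, so a tag that is split across two lines is not matched by '<.+?>'. The sketch below compares the pattern above with a variation that uses '[^>]' for the inside of the tag. The variation and the sample post are assumptions for illustration, not something tested in this thread.

```perl
use strict;
use warnings;

# Hypothetical post where the opening tag is broken across two lines
my $post = "Hello\n<a\nhref=\"http://somesite.com\">somesite.com</a>\nBye\n";

# Pattern from the post above: '.' stops at the newline inside the tag,
# so only the closing </a> (and the newline after it) is removed
(my $dot = $post) =~ s/<.+?>(.+?<.+?>)*\n*//g;

# Variation (an assumption): [^>] matches newlines too, so the whole
# opening tag is matched even when it spans lines
(my $class = $post) =~ s/<[^>]+>(.+?<[^>]+>)*\n*//g;

print $dot;    # the opening <a ...> tag and the link text are still there
print $class;  # only "Hello" and "Bye" remain
```

With the first pattern the opening tag survives, which is exactly what a spambot would want.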

perl_diver
07-24-2009, 07:27 PM
I know you are doing your own forum, but what happens when you post HTML code on this forum?

<a href=test>test</a>

You see the html code. For your own sanity I suggest you stick with that same philosophy. Convert < and > to their HTML escape sequences.
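
A minimal sketch of that escaping approach, assuming a helper called escape_html (a made-up name, not part of anyone's script here). The ampersand has to be converted first, otherwise the entities produced by the later substitutions would be escaped a second time.

```perl
use strict;
use warnings;

# Hypothetical helper: turn HTML-special characters into entities
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $body = 'Visit the <a href="http://somesite.com">somesite.com</a> stuff';
print escape_html($body), "\n";
# Visit the &lt;a href=&quot;http://somesite.com&quot;&gt;somesite.com&lt;/a&gt; stuff
```

The browser then displays the tag as text instead of rendering a link.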

edatz
07-24-2009, 08:19 PM
Thanks Nedals that works a treat. I tried posting 100 links and instead of a big blank entry I had a one line blank entry. It's only this one line of code in 2 places in the script and all the HTML is dealt with.

perl_diver, I think that when that happens, you get this:
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
<a href=test>test</a>
ad nauseam,

Which is what I'm trying to prevent. The BBCode works except the "Code" bit and I'm working on that. The way things are now it's either BBCode or plain text. HTML itself will be banned totally (you did talk me into BBCode :))

If'n it weren't for spambots life would be a bunch easier.

perl_diver
07-26-2009, 06:30 PM
What happens when people want to show another person how to do something with HTML code or ask questions about HTML code? If they post HTML code your script is going to remove it.

perl_diver
07-26-2009, 06:32 PM
edatz wrote:
    perl_diver, I think that when that happens, you get this:
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    <a href=test>test</a>
    ad nauseam,

    Which is what I'm trying to prevent.

That's what forum moderators are for.

edatz
07-26-2009, 07:12 PM
perl_diver wrote:
    What happens when people want to show another person how to do something with HTML code or ask questions about HTML code? If they post HTML code your script is going to remove it.

That will come under "CODE". I've been looking at how some other forums handle it and they each tend to do it differently, but I noticed that in each case the code tag handling translates each special character to a specific entity. My first try gave only partial results (I'm taking a break doing other stuff to clear my head on that score).
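
One way the "CODE" exception could be sketched: split the post on [code]...[/code] blocks, escape the characters inside those blocks, and strip tags everywhere else. The sub names, the lowercase [code] tag syntax and the simple tag-only strip pattern are all assumptions for illustration; any of the stricter patterns from earlier in the thread could be swapped into strip_html.

```perl
use strict;
use warnings;

# Hypothetical helper: escape HTML-special characters as entities
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: remove tags only (assumed pattern)
sub strip_html {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Escape inside [code] blocks, strip HTML everywhere else
sub clean_post {
    my ($body) = @_;
    # The capturing group makes split return the [code] blocks as well
    my @parts = split /(\[code\].*?\[\/code\])/is, $body;
    for my $part (@parts) {
        if ($part =~ /\A\[code\](.*)\[\/code\]\z/is) {
            $part = '[code]' . escape_html($1) . '[/code]';
        } else {
            $part = strip_html($part);
        }
    }
    return join '', @parts;
}

print clean_post('Hi <b>x</b> [code]<a href=test>test</a>[/code] bye'), "\n";
# Hi x [code]&lt;a href=test&gt;test&lt;/a&gt;[/code] bye
```

The BBCode converter would then only have to wrap the already-escaped block in whatever markup it uses for code.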

When BBCode is used, it writes HTML to the end file, but during the process to get there all HTML is banned until the BBCode converter does its stuff in the final write. Works fine.

As for Moderators, I figure the less work they have to do the better.

perl_diver
07-27-2009, 05:54 PM
Well, it's your script, and your sanity. ;)