[RESOLVED] Specific HTML Only
edatz
06-21-2009, 01:16 PM
Hi, the Perl script I'm working on will take input from users.
I can strip or allow all HTML with this (that part works fine):
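# html
if ($html eq "no") {
    # remove every tag, including tags that span several lines
    $body =~ s/<([^>]|\n)*>//g;
}
if ($html eq "yes") {
    # turn escaped entities back into real markup so the HTML renders
    $body =~ s/&lt;/</g;
    $body =~ s/&quot;/"/g;
}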
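For comparison, here is a minimal sketch of the two ways to keep user markup from rendering: deleting tags, as the "no" branch does, or escaping them so visitors see the text exactly as typed. The helper names strip_html and escape_html are hypothetical. Since a negated class like [^>] already matches newlines, the |\n alternative in the stripping pattern can be dropped without changing its behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every tag. [^>] also matches newlines,
# so tags split across lines are removed too.
sub strip_html {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Hypothetical helper: escape markup so it is shown as text, not rendered.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or the & in &lt; would be escaped again
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my $body = qq{<b>hi</b> & <a\nhref="x">link</a>};
print strip_html($body), "\n";     # hi & link
print escape_html($body), "\n";
```

Stripping loses whatever the user typed inside angle brackets; escaping keeps it visible but inert.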
How do I allow only, say, the bold tag
<b>BoldText</b>
and still strip all other HTML tags?
I figure that if I start with this, I can then add other allowed tags.
I may have more questions later, especially about link tags.
Hope someone can help. Thanks.
perl_diver
06-21-2009, 09:21 PM
Most forums avoid the issue by using "BB" code, just like this forum. So instead of <b>bold</b> you use [ b ]bold[ /b ] (without the spaces).
Sixtease
06-22-2009, 03:15 AM
Another tool I use is the HTML::TagFilter module (http://search.cpan.org/perldoc?HTML::TagFilter). It's very flexible.
edatz
06-22-2009, 03:22 AM
Hi, I had thought of using BBCode until I saw how much code it takes just to translate markup back and forth. Since Perl can exclude HTML in a line or two, I figured it should also be able to allow certain tags only and block the rest.
I'm trying to "think outside the box" on this one, perl_diver. So far I'm at 131k for the whole thing, and I aim to keep it around 300k total (base installation). Also, this will be offered as a download from my site, so I need to take into consideration that not every ISP/host is geared to developers and designers like mine is (I learned that the hard way). So I avoid modules wherever I can.
Code insertion is handled by JavaScript, and I have my own interpreter for that (about 6k). Everything else can be done in Perl when writing to the files. The form input will be a very tiny WYSIWYG editor (currently 2k), so any HTML it produces simply needs to be allowed or stopped.
Sixtease
06-22-2009, 03:38 AM
Well, if you only want to allow <b> and want to avoid using modules, then perhaps
$body =~ s{(</?(\w+)[^>]*>)}{$2 eq 'b' ? $1 : ''}sge;
could do. It will need some polishing:
- not sure if the 's' switch ensures newlines are matched by [^>]
- this will allow <b onclick="alert('I am evil javascript code!')">
- it's not been tested at all
But the idea is to use the 'e' switch to allow Perl code in the substitution, so you can check whether the tag is allowed.
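Building on the same /e idea, a possible sketch keeps the allowed names in a hash and rebuilds each allowed tag from its name alone, which discards attributes such as the onclick example. The %allowed hash and filter_tags sub are hypothetical names. Like the one-liner, it does not check that tags are balanced.

```perl
use strict;
use warnings;

# Hypothetical whitelist: add tag names here to allow more tags.
my %allowed = map { $_ => 1 } qw(b i);

sub filter_tags {
    my ($text) = @_;
    # $1 = optional slash, $2 = tag name. An allowed tag is rebuilt from
    # its name alone, so attributes like onclick="..." are discarded;
    # any other tag is deleted.
    $text =~ s{<(/?)(\w+)[^>]*>}{$allowed{lc $2} ? "<$1" . lc($2) . ">" : ''}ge;
    return $text;
}

my $body = q{<b onclick="alert('x')">bold</b> <i>it</i> <script>evil()</script>};
print filter_tags($body), "\n";    # <b>bold</b> <i>it</i> evil()
```

Adding a tag then means adding one word to the qw() list rather than another ternary branch.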
edatz
06-22-2009, 03:50 AM
I'll give that code a try Sixtease. Thanks.
edatz
06-22-2009, 04:26 AM
Sixtease, that works fine. I added italics and tried other HTML. Only the bold and italics showed.
$body =~ s{(</?(\w+)[^>]*>)}{$2 eq 'b' ? $1 : $2 eq 'i' ? $1 : ''}sge;
I did try it with two lines, one for bold and another for italic, but it didn't like that.
I'm going to play around with that and see if I can make another line for the a tag (for links). I want to control the number of links allowed (if I allow links, I want them very limited).
Hmm, /eg also works (/eeg doesn't):
$body =~ s{(</?(\w+)[^>]*>)}{$2 eq 'b' ? $1 : $2 eq 'i' ? $1 : ''}eg;
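A short script (variable names are illustrative) can show why the two-line attempt fails and why /eg behaves like /sge. Each pass deletes every tag that is not on its own list, so the second line removes the tags the first one kept. The /s switch only changes what '.' matches, and this pattern has no '.', while [^>] matches newlines anyway. With /ee, Perl evaluates the replacement a second time as code, which is why that variant breaks.

```perl
use strict;
use warnings;

my $body = "<b>bold</b> and <i\n>italic</i>";

# Two passes: each one deletes every tag that is not on ITS list,
# so pass 1 removes the <i> tags and pass 2 removes the <b> tags.
(my $two = $body) =~ s{(</?(\w+)[^>]*>)}{$2 eq 'b' ? $1 : ''}ge;
$two =~ s{(</?(\w+)[^>]*>)}{$2 eq 'i' ? $1 : ''}ge;

# One pass with both names in the test keeps both.
(my $one = $body) =~ s{(</?(\w+)[^>]*>)}{($2 eq 'b' || $2 eq 'i') ? $1 : ''}ge;

# Adding /s changes nothing: the pattern has no '.', and [^>]
# already matches the newline inside "<i\n>".
(my $one_s = $body) =~ s{(</?(\w+)[^>]*>)}{($2 eq 'b' || $2 eq 'i') ? $1 : ''}sge;

print "$two\n";                                 # bold and italic
print $one eq $body ? "kept\n" : "changed\n";   # kept
print $one eq $one_s ? "same\n" : "differ\n";   # same
```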
perl_diver
06-23-2009, 12:27 AM
Converting BB code to HTML tags is pretty simple:
$str = 'some text [b]bold here[/b] some more test';
$str =~ s/\[(b|i|u)\]/<$1>/gi;
$str =~ s/\[\/(b|i|u)\]/<\/$1>/gi;
print $str;
Of course, this no more validates that the BB tags come in open/close pairs than your HTML filtering does, but it is safer than allowing HTML tags. Take this really obvious situation:
$body = 'some text <b>bold here</b> some more <style>test</style> and now < javascript > some nasty stuff here < /javascript >';
$body =~ s{(</?(\w+)[^>]*>)}{$2 eq 'b' ? $1 : $2 eq 'i' ? $1 : ''}sge;
print $body;
oops..... ;)
A good reason to use a real HTML parser like the one Sixtease mentioned.
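One way to close that hole without a module is to escape all markup first and then turn back on only exact, attribute-free whitelisted tags, so anything the filter does not recognise ends up as harmless text. This is a minimal sketch assuming a bold/italic whitelist; safe_html and %allowed are hypothetical names. It is not a full HTML parser and does not check that tags are balanced.

```perl
use strict;
use warnings;

# Hypothetical whitelist of tag names that may pass through.
my %allowed = map { $_ => 1 } qw(b i);

sub safe_html {
    my ($text) = @_;
    # 1. Neutralise ALL markup ('&' first, so the &lt; etc. added
    #    below are not escaped a second time).
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    # 2. Re-enable only exact, attribute-free whitelisted tags like <b> or </i>.
    $text =~ s{&lt;(/?)(\w+)&gt;}{$allowed{lc $2} ? "<$1" . lc($2) . ">" : "&lt;$1$2&gt;"}ge;
    return $text;
}

my $body = 'some text <b>bold here</b> some more <style>test</style> and now < javascript > some nasty stuff here < /javascript >';
print safe_html($body), "\n";
```

Here <style> and "< javascript >" come out as &lt;style&gt; and &lt; javascript &gt;, while <b>bold here</b> still renders.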
Shorts
06-23-2009, 12:58 AM
Another good reason to use the HTML parser Sixtease mentioned is sound development practice: instead of reinventing the wheel, go with an open source option that many people work on. In the end it will save you time and hassle, since you won't have to worry about the small stuff :D
perl_diver
06-23-2009, 01:02 PM
And if you really don't want to use a module, get a copy of NMS Guestbook and look at how the authors filter the HTML. But I warn you, it's probably not going to jibe with your keep-it-simple philosophy, Ted. Using BB code is really the better option, especially if all you want is to allow a few HTML tags like most forums do.
http://nms-cgi.sourceforge.net/scripts.shtml
edatz
06-23-2009, 01:37 PM
Hi perl_diver, you posted while I was reading and logging in. I will give that a miss (I looked at it yesterday and didn't like it). I've had my mind changed on this (thanks). I can see that a WYSIWYG editor is okay for my blog script, but not for the forum.
I've played around with those bits of code and have a result. I found a small(ish) BBCode editor that produces the text with [] tags correctly; I then added the two lines you posted, and it writes to the file as HTML. Great :D
I can run with this :). Thanks a lot. Now I shall start experimenting with other tags.
So, to review:
HTML is banned totally (the text still shows, but tags are not processed, which is expected). The BBCode editor feeds the Perl script, which has two lines of code that handle bold and italic (I removed the u). No module is needed (so users don't have to worry about that), and my code has grown by 5.5k (including the editor JS). Good. Much appreciated.
I'm not even going to attempt to understand why some code I saw was over 300 lines to handle BBCode :eek:.
Shorts - When the wheel is too big for the job, it needs reinventing. I do it too. I get all involved with Photoshop and Freehand, coming up with some amazing logo and the client ends up with a really simple single color thing (learned that the expensive way - ouch!). Sometimes we do need the big boy for a big job, but in this case it's overkill.
The forum I'm working on is not meant to be an all-singing, all-dancing one. It's aimed at small sites that only want a couple of forums and will probably have a couple hundred users at most.
perl_diver
06-23-2009, 01:59 PM
The reason some code might be longer is that it validates that every opening tag has a closing tag. If someone enters [b] in your forum with no [/b], all the text that follows will be in bold, even text that might be in other posts, although I think you are using tables, so that won't occur. An even better solution is to use CSS in conjunction with your BB code tags: instead of converting [b] into <b>, you convert it into <b class='b'>. Then you can associate the class of the tag with a style sheet, and use style sheets to let users of the forum switch between different looks quite easily. Maybe you already thought of this, so forgive me for going on about it if you have.
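Both ideas, pair checking and a CSS class per tag, fit in a few lines. This sketch assumes plain lowercase [b], [i] and [u] tags; bb_to_html is a hypothetical name. Only matched pairs are converted, raw HTML is escaped first, and a lone [b] is left as text, so it cannot turn the rest of the page bold. It does not repair crossed tags such as [b][i]x[/b][/i].

```perl
use strict;
use warnings;

# Hypothetical converter: only MATCHED lowercase [b]/[i]/[u] pairs become
# HTML, each with a CSS class named after the tag. A lone [b] stays text.
sub bb_to_html {
    my ($text) = @_;
    # Escape raw HTML first so only the tags generated below can render.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    # Repeat while any pair still matches, so nested pairs such as
    # [b][i]x[/i][/b] are converted layer by layer.
    1 while $text =~ s{\[(b|i|u)\](.*?)\[/\1\]}{<$1 class='$1'>$2</$1>}gs;
    return $text;
}

print bb_to_html('[b]bold[/b] and [b]open'), "\n";
# <b class='b'>bold</b> and [b]open
```

A style sheet rule such as b.b { font-weight: bold; } could then be swapped per theme without touching the Perl.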
edatz
06-23-2009, 02:42 PM
<b class='b'>, now that I didn't know. I do style things like input or textarea, but never thought of using CSS that way. Nice one, p_d.
Now it's back to watching Wimbledon (if my back weren't troubling me, I'd go there); it's only a few miles away, at the other end of the tram line.
If you visit the live forum test site, you'll have to register now, since registration is in place. I'll try to get the editor up later this evening or tomorrow.