Alt tag text
edatz
12-21-2009, 07:44 AM
Has anyone ever grabbed the text from inside HTML alt attributes, without using a module?
For instance:
<img src="any.gif" alt="The text inside">
So it would print to the screen: The text inside
Hope someone can help. Thanks!
Sixtease
12-22-2009, 03:33 AM
Looks like a job for HTML::Parser (http://search.cpan.org/perldoc?HTML::Parser).
Assuming you have the HTML code you want to search in a $html variable:
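use HTML::Parser ();
# Call start() for every start tag, passing the tag's original text
# and a hash of its (entity-decoded) attributes.
my $p = HTML::Parser->new(start_h => [\&start, 'text, attr']);
sub start {
    my ($text, $attr) = @_;
    if (exists $attr->{alt}) {
        print $attr->{alt}, " (in $text)\n";
    }
}
$p->parse($html);
$p->eof;    # signal end of document so any buffered text is processed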
Or, if you think that is overkill, you can simply do:
my @alts;
# $1 captures the opening quote; \1 requires the same quote to close the value in $2
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
    push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "\n";
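A self-contained sketch of the regex approach, wrapped in a hypothetical `extract_alts` helper (the name is illustrative, not from the thread) and run on an inline sample instead of a real page. It uses the same pattern plus the s modifier so values may span lines. Note that entities such as `&amp;` come back undecoded, whereas HTML::Parser decodes entities in attribute values.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every quoted alt value found in $html.
# /i also matches ALT=, /s lets the value span lines, /g finds every match.
sub extract_alts {
    my ($html) = @_;
    my @alts;
    while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/isg) {
        push @alts, $2;    # $1 is the quote character, $2 the text between
    }
    return @alts;
}

# Illustrative sample input: double quotes, single quotes, no alt, an entity.
my $sample = <<'HTML';
<img src="any.gif" alt="The text inside">
<IMG SRC='b.gif' ALT='Single quotes work too'>
<img src="c.gif">
<img src="d.gif" alt="Fish &amp; chips">
HTML

print "$_\n" for extract_alts($sample);
```

This prints `The text inside`, `Single quotes work too` and `Fish &amp; chips`, one per line. The pattern still misses unquoted values (`alt=foo`) and spaces around the `=`, and a `>` inside an earlier quoted attribute stops the match; those are the cases where HTML::Parser earns its keep.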
edatz
12-22-2009, 05:57 AM
Hi Sixtease, I tried the one without the module like this:
File:
<img src="/altest/images/01.gif" border="0" width="160" height="150" alt="One is one and all alone">
<img src="/altest/images/02.gif" border="0" width="160" height="150" alt="Two for the road">
open(TFL, "file.txt") || die("could not open");
@MFL = <TFL>;
close(TFL);
foreach $line (@MFL) {
my @alts = $line;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "<br>\n";
}
The result was:
values of alt tags: The whole image, as an image
values of alt tags: The whole image, as an image
Have I done it wrong?
------------------------------------------------
It's all right, I've seen what I did wrong:
foreach $line (@MFL) {
my @alts;
while ($line =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print "", join("\n",@alts), "<br>\n";
}
It now works fine.
Thank you very much.
Sixtease
12-22-2009, 06:27 AM
You don't need the outer loop, either:
open(TFL, "file.txt") || die("could not open");
my $html = join('', <TFL>);
close(TFL);
my @alts;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/isg) {
push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "<br>\n";
Update: I also added the s modifier to the regexp so that .*? can match newlines, which handles alt attributes whose value spans several lines.
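To see what the s modifier changes, here is a small sketch that slurps a stand-in for file.txt (an in-memory string, assumed here in place of the real file) and runs the pattern with and without /s. The `alts` helper and the sample text are illustrative only.

```perl
use strict;
use warnings;

# Stand-in for file.txt: an in-memory file whose alt value spans two lines.
my $file_contents = qq{<img src="/altest/images/01.gif"\n     alt="One is one\nand all alone">\n};
open(my $fh, '<', \$file_contents) or die "could not open: $!";
my $html = do { local $/; <$fh> };    # undef $/ reads the whole file at once
close($fh);

my $plain     = qr/<[^>]*\balt=(["'])(.*?)\1/i;     # '.' does not match "\n"
my $multiline = qr/<[^>]*\balt=(["'])(.*?)\1/is;    # '.' also matches "\n"

# Collect the alt values matched by the given pattern.
sub alts {
    my ($text, $re) = @_;
    my @found;
    while ($text =~ /$re/g) { push @found, $2 }
    return @found;
}

my @without = alts($html, $plain);
my @with    = alts($html, $multiline);
print "without /s: ", scalar(@without), " match(es)\n";    # 0
print "with /s:    ", scalar(@with),    " match(es)\n";    # 1
```

Without /s the lazy `(.*?)` cannot cross the newline inside the value, so the closing quote is never reached and nothing is captured; with /s the full value, including its embedded newline, comes back in `$2`.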
edatz
12-22-2009, 03:45 PM
Thanks for that, Sixtease. At first the output didn't quite work for me, but I tweaked it a little and it now prints the results with proper line breaks.
open(TFL, "file.txt") || die("could not open");
$MFL = join('', <TFL>);
close(TFL);
my @alts;
while ($MFL =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
    push @alts, $2;
}
print join("<br>\n",@alts);
I then applied it to a small flat-file database (FFDB) with four fields. I picked up the file name, ran the extraction on its image fields, and now have the result I wanted. Works a treat. Thanks again.
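The thread doesn't show the FFDB's layout, so the following is only a sketch under an assumed format: pipe-separated records with four fields, the third holding the `<img>` markup. The `$IMAGE_FIELD` index, the `alts_from_record` helper and the sample records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical FFDB layout (not from the thread): id|name|image_html|caption,
# one record per line, fields separated by "|".
my $IMAGE_FIELD = 2;    # zero-based index of the field holding the <img> markup

# Return the alt values found in the image field of one record.
sub alts_from_record {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
    my $img = defined $fields[$IMAGE_FIELD] ? $fields[$IMAGE_FIELD] : '';
    my @alts;
    while ($img =~ /<[^>]*\balt=(["'])(.*?)\1/isg) {
        push @alts, $2;
    }
    return @alts;
}

# Stand-in for the database file, read through an in-memory filehandle.
my $db = <<'FFDB';
1|one|<img src="/altest/images/01.gif" alt="One is one and all alone">|first
2|two|<img src="/altest/images/02.gif" alt="Two for the road">|second
FFDB

open(my $fh, '<', \$db) or die "could not open: $!";
while (my $record = <$fh>) {
    print join("<br>\n", alts_from_record($record)), "<br>\n";
}
close($fh);
```

If the real database uses another delimiter or field order, only the `split` pattern and `$IMAGE_FIELD` need changing; a literal `|` inside any field would break this simple split.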