Alt tag text


edatz
12-21-2009, 07:44 AM
Has anyone ever grabbed the text from inside an HTML alt attribute?
Without a module?

For instance

<img src="any.gif" alt="The text inside">

So it would print to screen: The text inside

Hope someone can help - thanks

Sixtease
12-22-2009, 03:33 AM
Looks like a job for HTML::Parser (http://search.cpan.org/perldoc?HTML::Parser).

Assuming you have the HTML code you want to search in a $html variable:
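use HTML::Parser ();
# Call start() for every opening tag, passing the raw tag text and its attributes.
my $p = HTML::Parser->new(start_h => [\&start, 'text, attr']);
sub start {
    my ($text, $attr) = @_;
    if (exists $attr->{alt}) {
        print $attr->{alt}, " (in $text)\n";
    }
}
$p->parse($html);
$p->eof;  # signal end of document so any buffered input is flushed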

Or, if you think that's overkill, you can simply do:
my @alts;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "\n";
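
The regex approach above can be wrapped in a small sub so it is easy to reuse. This is only a sketch: the name extract_alts is made up for illustration, and the pattern assumes that every alt value is quoted and that no attribute value earlier in the tag contains a ">". The sample input is the two-line file posted later in this thread.

```perl
use strict;
use warnings;

# extract_alts is a hypothetical helper name, not something from a module.
# It returns every quoted alt value found in the given HTML string.
sub extract_alts {
    my ($html) = @_;
    my @alts;
    # (["']) remembers the opening quote; \1 requires the same quote to close.
    while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/isg) {
        push @alts, $2;
    }
    return @alts;
}

my $html = <<'HTML';
<img src="/altest/images/01.gif" border="0" width="160" height="150" alt="One is one and all alone">
<img src="/altest/images/02.gif" border="0" width="160" height="150" alt="Two for the road">
HTML

# Print one alt value per line.
print "$_\n" for extract_alts($html);
```

Because the closing quote has to match the opening one, this handles both alt="..." and alt='...', but it will skip unquoted values such as alt=text. For that kind of input, HTML::Parser is probably the safer route.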

edatz
12-22-2009, 05:57 AM
Hi Sixtease, I tried the one without the module like this:

File:

<img src="/altest/images/01.gif" border="0" width="160" height="150" alt="One is one and all alone">
<img src="/altest/images/02.gif" border="0" width="160" height="150" alt="Two for the road">




open(TFL, "file.txt") || die("could not open");
@MFL = <TFL>;
close(TFL);
foreach $line (@MFL) {
my @alts = $line;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "<br>\n";
}


The result was:
values of alt tags: The whole image, as an image
values of alt tags: The whole image, as an image

Have I done it wrong?
------------------------------------------------
It's alright, I've seen what I did wrong:

foreach $line (@MFL) {
my @alts;
while ($line =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print "", join("\n",@alts), "<br>\n";
}

It now works fine.

Thank you very much.
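
For anyone who hits the same thing: the first attempt matched against $html, which was never set in that script, and @alts started out holding the raw line. A rough sketch of how use strict would flag that kind of slip at compile time, assuming nothing beyond core Perl. The loop is compiled with a string eval here only so that the error can be printed instead of aborting the script.

```perl
use strict;
use warnings;

# The loop from the first attempt, kept as a string so it can be compiled
# separately. Note that it matches against $html, which is never declared.
my $buggy = <<'CODE';
use strict;
my $line = '<img src="any.gif" alt="The text inside">';
my @alts;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
    push @alts, $2;
}
CODE

# String eval compiles the snippet; under strict the undeclared variable is fatal.
my $ok    = eval "$buggy; 1";
my $error = $ok ? '' : $@;
print "strict caught it: $error" if $error;
```

In a normal script you would simply put use strict; and use warnings; at the top, and Perl would refuse to run until $html is declared or corrected to $line.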

Sixtease
12-22-2009, 06:27 AM
You need not do the outer loop, either:
open(TFL, "file.txt") || die("could not open");
my $html = join('', <TFL>);
close(TFL);

my @alts;
while ($html =~ /<[^>]*\balt=(["'])(.*?)\1/isg) {
push @alts, $2;
}
print "values of alt tags:\n", join("\n",@alts), "<br>\n";

Update: I also added an s modifier to the regexp so that . matches newlines too, to deal with alt attributes that span more than one line.
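
To illustrate what the s modifier changes, here is a small sketch with a made-up alt value that wraps onto a second line. Without s, the dot cannot cross the newline, so the closing quote is never reached and nothing matches.

```perl
use strict;
use warnings;

# An alt value that wraps onto a second line (made-up sample input).
my $html = qq{<img src="any.gif" alt="The text\ninside">};

# In list context with /g, each match contributes its two captures:
# the quote character and the alt value.
# Without /s, . stops at the newline, so no closing quote is reachable.
my @without = $html =~ /<[^>]*\balt=(["'])(.*?)\1/ig;
# With /s, . matches the newline too, so the whole value is captured.
my @with    = $html =~ /<[^>]*\balt=(["'])(.*?)\1/isg;

print "without /s: ", scalar(@without) / 2, " match(es)\n";
print "with /s:    ", scalar(@with) / 2, " match(es)\n";
```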

edatz
12-22-2009, 03:45 PM
Thanks for that, Sixtease. At first the output didn't come out quite right for me, but I tweaked it a little and it now prints the results with proper line breaks.


open(TFL, "file.txt") || die("could not open");
$MFL = join('', <TFL>);
close(TFL);

my @alts;
while ($MFL =~ /<[^>]*\balt=(["'])(.*?)\1/ig) {
push @alts, $2;
}
print join("<br>\n",@alts);


I then applied it to a file that's a small flat-file database (FFDB) of 4 fields. I picked up the file name and ran the extraction on its image fields, and I now have the result I wanted. Works a treat. Thanks again.