I have a test form up here that analyzes a given site's meta tags. I would simply like to get the meta tags and then list them or put them into a table.
You can enter a URL into the box and the script will print out the meta tags in a list or table.
The code thus far:
PHP Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Meta Tag Analyzer</title>
</head>
<body>
<form action="test.php" method="post">
<input type="text" name="url" value="Enter your URL here" onfocus="if (this.value == 'Enter your URL here') this.value = '';" onblur="if (this.value == '') this.value = 'Enter your URL here';" />
<input type="submit" value="Analyze Meta Tags" />
</form><br />
<?php
$url = isset($_POST['url']) ? trim($_POST['url']) : ''; // avoids a notice before the form has been submitted
$url = str_replace("Enter your URL here", '', $url); //makes sure that there is a URL
if ($url == '' ) {
echo "Enter a URL to analyze above";
echo "\n";
} else {
if (!preg_match('#^https?://#i', $url)) {
$url = "http://$url"; // adds http:// only if the user did not type a scheme (works for https:// too)
}
function getUrlData($url)
{
$result = false;
$contents = getUrlContents($url);
if (isset($contents) && is_string($contents))
{
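$title = null;
$metaTags = null;
// grab the text between <title> and </title>
if (preg_match('/<title>([^>]*)<\/title>/si', $contents, $match)) {
$title = trim($match[1]);
}
// grab every <meta name="..." content="..."> tag (name must come before content), keeping the original markup
if (preg_match_all('/<meta\s+name\s*=\s*"([^"]*)"\s+content\s*=\s*"([^"]*)"[^>]*>/si', $contents, $matches, PREG_SET_ORDER)) {
$metaTags = array();
foreach ($matches as $m) {
$metaTags[strtolower($m[1])] = array(
'html' => htmlentities($m[0]), // the tag exactly as written, escaped for display
'value' => $m[2]
);
}
}
$result = array (
'title' => $title,
'metaTags' => $metaTags
);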
}
return $result;
} function getUrlContents($url, $maximumRedirections = null, $currentRedirection = 0)
{
$result = false;
$contents = @file_get_contents($url);
// Check if we need to go somewhere else (<meta http-equiv="refresh"> redirect)
if (isset($contents) && is_string($contents))
{
preg_match_all('/<[\s]*meta[\s]*http-equiv="?REFRESH"?' . '[\s]*content="?[0-9]*;[\s]*URL[\s]*=[\s]*([^>"]*)"?' . '[\s]*[\/]?[\s]*>/si', $contents, $match);
if (isset($match) && is_array($match) && count($match) == 2 && count($match[1]) == 1)
{
if (!isset($maximumRedirections) || $currentRedirection < $maximumRedirections)
{
return getUrlContents($match[1][0], $maximumRedirections, ++$currentRedirection);
}
$result = false;
} else
{
$result = $contents;
}
}
return $result;
}
$result = getUrlData("$url");
echo '<hr />';
echo '<h1>Results</h1>';
echo "<b>Results for:</b> ";
echo '<font color="red" face="courier">';
echo htmlspecialchars($url);
echo '</font>';
echo '<br />';
echo '&lt;title&gt;';
echo htmlspecialchars((string) $result['title']);
echo '&lt;/title&gt;';
echo '<br /><font size="-1">';
foreach (array('description', 'keywords', 'charset') as $key) {
if (isset($result['metaTags'][$key])) {
echo $result['metaTags'][$key]['html'];
echo '<br />';
}
}
echo '</font>';
echo '<font color="red" face="arial">↑ <i>This is what I would like the results page to look like, except with the full listing of all metatags on the page</i></font>';
echo '<hr />';
echo 'Here is a listing of the types of tags that appear on the page, but I cannot figure out how to extract the HTML array and display it...';
// start table and print heading
echo "<table>\n";
// each() was removed in PHP 8, so loop with foreach instead
if (is_array($result['metaTags'])) {
foreach ($result['metaTags'] as $c1 => $c2) {
// $c2 is an array; its 'html' entry holds the escaped tag
echo "<tr><td>$c1</td><td>{$c2['html']}</td></tr>\n";
}
}
// end the table
echo "</table>";
//end test
}
?>
</body>
</html>
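Since each() was deprecated in PHP 7.2 and removed in PHP 8, the listing at the end can be written with foreach, printing the 'html' entry of each nested array instead of the array itself. A minimal sketch, assuming $result['metaTags'] has the shape name => ['html' => ..., 'value' => ...] that the display code above expects; renderMetaTagTable() is a hypothetical helper:

```php
<?php
/**
 * Build an HTML table from an array shaped like $result['metaTags']:
 * name => ['html' => tag already escaped with htmlentities(), 'value' => content].
 * renderMetaTagTable() is a hypothetical helper, not part of the script above.
 */
function renderMetaTagTable(array $metaTags): string
{
    $out = "<table>\n";
    foreach ($metaTags as $name => $tag) {
        // 'html' is expected to be escaped already, so it is printed as-is
        $html = isset($tag['html']) ? $tag['html'] : '';
        $out .= '<tr><td>' . htmlspecialchars((string) $name) . '</td><td>' . $html . "</td></tr>\n";
    }
    return $out . "</table>\n";
}

// Sample data in the assumed shape
$sample = array(
    'description' => array(
        'html'  => htmlentities('<meta name="description" content="A test page">'),
        'value' => 'A test page',
    ),
);
echo renderMetaTagTable($sample);
```

Returning a string rather than echoing inside the loop keeps the helper easy to test and reuse.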
I'd recommend using PHP's DOM extension (DOMDocument) to get the data, e.g.:
PHP Code:
<?php
/**
* Get title and meta info from URL
* @return mixed Array on success, else false
* @param string $url URL to parse
*/
function getMeta($url)
{
$content = file_get_contents($url);
if (!empty($content)) {
$data = array();
$dom = new DOMDocument();
// loadHTML() returns false on failure; @ hides warnings about sloppy real-world markup
if (!@$dom->loadHTML($content)) {
user_error("Unable to parse text");
return false;
}
$titles = $dom->getElementsByTagName('title');
if(!empty($titles)) {
foreach($titles as $title) {
$data['title'] = $title->textContent;
break; // should only be one
}
}
$metas = $dom->getElementsByTagName('meta');
foreach($metas as $meta) {
$tagData = array();
if ($meta->hasAttributes()) {
$attributes = $meta->attributes;
if (!is_null($attributes)) {
foreach($attributes as $index => $attr) {
$tagData[$index] = $attr->value;
}
$data['meta'][] = $tagData;
}
}
}
return $data;
}
else {
user_error("Unable to get page");
return false;
}
}
// TEST
$metaData = getMeta("http://www.ebookworm.us/");
echo "<p><b>Title:</b> ";
echo !empty($metaData['title']) ? htmlspecialchars($metaData['title']) : "[none]";
echo "</p>\n";
echo "<p><b>Meta Tags:</b></p>\n<ul>\n";
foreach ((isset($metaData['meta']) ? $metaData['meta'] : array()) as $meta) {
echo "<li>";
foreach($meta as $name => $content) {
echo "<b>" . htmlspecialchars($name) . ":</b> " . htmlspecialchars($content) . "<br />\n";
}
echo "</li>\n";
}
echo "</ul>\n";
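getMeta() fetches the page itself, which makes the parsing step hard to try in isolation. A minimal sketch of the same DOMDocument parsing applied to an HTML string instead of a URL; parseMetaFromHtml() is a hypothetical name, and libxml_use_internal_errors() is used here in place of the @ operator to silence parser warnings:

```php
<?php
/**
 * Collect the title and the attributes of every <meta> tag from an HTML string.
 * Same idea as getMeta(), but takes markup instead of a URL.
 * parseMetaFromHtml() is a hypothetical name.
 */
function parseMetaFromHtml(string $html): array
{
    $data = array('title' => null, 'meta' => array());
    if (trim($html) === '') {
        return $data; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return $data;
    }
    $titles = $dom->getElementsByTagName('title');
    if ($titles->length > 0) {
        $data['title'] = $titles->item(0)->textContent;
    }
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        $tagData = array();
        foreach ($meta->attributes as $attr) {
            // each $attr is a DOMAttr with name and value properties
            $tagData[$attr->name] = $attr->value;
        }
        if ($tagData) {
            $data['meta'][] = $tagData;
        }
    }
    return $data;
}

$html = '<html><head><title>Demo</title>'
      . '<meta name="robots" content="index, follow">'
      . '<meta property="og:title" content="Demo" /></head><body></body></html>';
print_r(parseMetaFromHtml($html));
```

Note that DOMDocument keeps only the parsed attributes; it does not record whether a tag was written as <meta ...> or <meta ... />.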
Here is what I was able to do. It's almost there, but since I want to show the meta tags exactly as they appear in the source and detect whether the page is HTML or XHTML, I need to grab the closing bracket with the slash ("/>") when it appears in XHTML, or without it (">") in plain HTML.
I've gotten the output to look exactly how I'd like, except that I manually re-add the trailing " />", so the output does not show exactly what is in the source code.
For example, if a meta tag appears like this:
Code:
<meta name="robots" content="index, follow">
This script will show it like this:
Code:
<meta name="robots" content="index, follow" />
...with the trailing slash for XHTML, even though the source has none.
Any ideas?
Here is the code on the page now...
PHP Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Meta Tag Analyzer</title>
</head>
<body>
<form action="TEST.php" method="post">
<input type="text" name="url" />
<input type="submit" value="Analyze Meta Tags" />
</form><br />
<?php
$url = isset($_POST['url']) ? trim($_POST['url']) : ''; // avoids a notice before the form has been submitted
$url = str_replace("Enter your URL here", '', $url); //makes sure that there is a URL
if ($url == '' ) {
echo '↑ Enter a URL to analyze above.<br /><br />';
echo 'If you are testing it for me from WebDeveloper.com, you can use this URL: ';
include('TEST2.html');
echo "\n";
} else {
if (!preg_match('#^https?://#i', $url)) {
$url = "http://$url"; // adds http:// only if the user did not type a scheme (works for https:// too)
}
function getMeta($url)
{
$content = file_get_contents($url);
if (!empty($content)) {
$data = array();
$dom = new DOMDocument();
// loadHTML() returns false on failure; @ hides warnings about sloppy real-world markup
if (!@$dom->loadHTML($content)) {
user_error("Unable to parse text");
return false;
}
$titles = $dom->getElementsByTagName('title');
if(!empty($titles)) {
foreach($titles as $title) {
$data['title'] = $title->textContent;
break; // should only be one
}
}
$metas = $dom->getElementsByTagName('meta');
foreach($metas as $meta) {
$tagData = array();
if ($meta->hasAttributes()) {
$attributes = $meta->attributes;
if (!is_null($attributes)) {
foreach($attributes as $index => $attr) {
$tagData[$index] = $attr->value;
}
$data['meta'][] = $tagData;
}
}
}
return $data;
}
else {
user_error("Unable to get page");
return false;
}
}
// TEST
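$metaData = getMeta($url);
echo '<p>&lt;title&gt;';
echo !empty($metaData['title']) ? htmlspecialchars($metaData['title']) : "[none]";
echo '&lt;/title&gt;';
echo "</p>\n";
echo "<p><b>Meta Tags:</b></p>\n<table>\n";
foreach ((isset($metaData['meta']) ? $metaData['meta'] : array()) as $meta) {
// escape brackets and quotes so the browser shows the tag as text instead of parsing it
echo "<tr><td>&lt;meta ";
foreach($meta as $name => $content) {
echo htmlspecialchars($name) . '=&quot;' . htmlspecialchars($content) . '&quot; ';
}
echo '/&gt;</td></tr>';
}
echo "</table>\n";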
}
?>
</body>
</html>
I will try to figure out some way of checking whether the source page is HTML or XHTML and whether the meta tags contain the closing slash.
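Because DOMDocument discards the original syntax, one option is to match the meta tags in the raw source with a regular expression, which keeps each tag exactly as written, including any "/>". A rough sketch, assuming attribute values never contain ">"; rawMetaTags() and looksLikeXhtml() are hypothetical helpers, and the DOCTYPE check is only a heuristic (a page can declare XHTML and still be served as HTML):

```php
<?php
/**
 * Return every <meta> tag exactly as written in the raw source, plus a flag
 * telling whether it was closed XHTML-style with "/>".
 * Regex sketch: assumes no ">" inside attribute values. Hypothetical helper.
 */
function rawMetaTags(string $source): array
{
    $tags = array();
    // "<meta" as a whole word, then everything up to the first ">"
    if (preg_match_all('/<meta\b[^>]*>/i', $source, $matches)) {
        foreach ($matches[0] as $tag) {
            $tags[] = array(
                'original'    => $tag,
                // true when the tag ends with "/>" (whitespace before ">" allowed)
                'selfClosing' => (bool) preg_match('#/\s*>$#', $tag),
            );
        }
    }
    return $tags;
}

/**
 * Heuristic: true if the DOCTYPE mentions XHTML. Hypothetical helper;
 * it only inspects the DOCTYPE text, not how the page is served.
 */
function looksLikeXhtml(string $source): bool
{
    return (bool) preg_match('/<!DOCTYPE[^>]*XHTML/i', $source);
}

$source = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
        . '<meta name="robots" content="index, follow">'
        . '<meta name="description" content="Demo" />';

foreach (rawMetaTags($source) as $tag) {
    // escape for display so the browser shows the tag instead of parsing it
    echo htmlspecialchars($tag['original']), $tag['selfClosing'] ? ' (XHTML-style)' : ' (HTML-style)', "<br />\n";
}
var_dump(looksLikeXhtml($source));
```

In the analyzer, the string returned by file_get_contents($url) would be passed in as $source, and the DOM results could still be used for the attribute values.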
The script now displays the page's meta tags and the Facebook Open Graph tags, checks whether the four mandatory Open Graph tags are present, and reports whether the site has a sitemap.xml file and a robots.txt file.
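For the Open Graph check, the Open Graph protocol lists four required properties: og:title, og:type, og:image and og:url. A minimal sketch, assuming the tag list has the attribute-array shape that getMeta() returns in $metaData['meta']; missingOpenGraph() is a hypothetical helper:

```php
<?php
/**
 * Report which of the four required Open Graph properties are missing.
 * Assumes $metas is a list of attribute arrays like getMeta()'s 'meta' entries.
 * missingOpenGraph() is a hypothetical helper.
 */
function missingOpenGraph(array $metas): array
{
    // required basic properties according to the Open Graph protocol
    $required = array('og:title', 'og:type', 'og:image', 'og:url');
    $found = array();
    foreach ($metas as $attrs) {
        // Open Graph tags use the "property" attribute rather than "name"
        if (isset($attrs['property'])) {
            $found[] = strtolower($attrs['property']);
        }
    }
    // keep the required entries that were not found, re-indexed from 0
    return array_values(array_diff($required, $found));
}

$metas = array(
    array('property' => 'og:title', 'content' => 'Demo'),
    array('property' => 'og:url', 'content' => 'http://example.com/'),
    array('name' => 'robots', 'content' => 'index, follow'),
);
print_r(missingOpenGraph($metas)); // og:type and og:image are missing
```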