Link checker
Hey, guys, I've got yet another question. I was thinking today, and wondering... How would I use PHP to go to a page and follow all of the links on that page and print their URIs? Any ideas? I haven't come up with anything, but I am definitely ready and willing to learn this new field.
Thanks, ;)
Jona
Nevermore
05-16-2003, 01:10 PM
Is it possible to open pages outside your own server with PHP? I only know about fopen, and that is your-server only.
So do I, but it must be possible. I mean, unless you have to use CGI, but PHP should be able to do it, too.
http://validator.w3.org/ searches through the source of a specific URI (or an uploaded temporary file, but I know how to do that).
There is another site that checks spelling, links, etc., etc. on a Web page you specify. How do I make PHP follow links from page to page and write out the results (example: visit somesite.com, find all of the links on the page, print them all out, go to each one of those pages individually and print them out, etc.).
That's what I meant.... lol :p
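The "visit a page, list its links, then visit each of those" idea can be sketched as a breadth-first crawl: a queue of URLs still to visit plus a record of URLs already seen, so pages that link to each other do not cause an endless loop. This is only a sketch under assumptions. crawl() is a made-up name, and $fetch and $extractLinks are hypothetical callables the caller has to supply (one that downloads a page and returns its source or null, one that returns the links found in that source).

```php
<?php
// Breadth-first crawl sketch. crawl(), $fetch and $extractLinks are
// hypothetical names used for illustration; none of them is built into PHP.
function crawl(string $startUrl, callable $fetch, callable $extractLinks, int $maxPages = 20): array
{
    $queue   = [$startUrl]; // URLs waiting to be visited
    $visited = [];          // URL => true, so no page is fetched twice

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url); // expected to return a string, or null on failure
        if ($html === null) {
            continue;
        }
        foreach ($extractLinks($html) as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    return array_keys($visited); // every URL visited, in visiting order
}
```

The $maxPages limit is there because following links from page to page can otherwise wander across the whole Web.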
Nevermore
05-16-2003, 01:16 PM
I know how you can get it to find links, but if I can't make it open pages, it isn't much use, really. Where's Pyro...
He's, "not at my desk" right now... Man, he would know... (lol)
While we wait, though, how's about you show me some PHP code about getting all the links on a page? (I could just use Javascript and frames.... lol, but I don't wanna do that..)
Nevermore
05-16-2003, 01:19 PM
validator.w3.org is using PHP. Now if they'd just written how they did it...
They use CGI. Look: http://dev.w3.org/cvsweb/~checkout~/validator/htdocs/source/index.html?rev=1.28&content-type=text/html
Here is the source code for the script: http://dev.w3.org/cvsweb/~checkout~/validator/httpd/cgi-bin/check?rev=1.305.2.43&content-type=text/plain
Nevermore
05-16-2003, 01:34 PM
How strange, if you try to go to it by using index.php in their check directory, it finds it.
This PHP regular expression should find any and all hyperlinks and return them. You would just need to loop. (Soz, I haven't tested it. I'm not at home, I'm on a laptop with a dial up.)
$look = preg_match('/<a\s[^>]*href="([^"]*)"[^>]*>/i', $line, $match); // ereg() is gone in PHP 7; URL lands in $match[1]
Hmmm.... I see. That looks interesting. It looks like one of the most used things in PHP is RegExps... I'll have to study more on those. lol
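For anyone studying the RegExp route, here is a minimal sketch of pulling every link out of a chunk of HTML in one go with preg_match_all(). extract_links() is a hypothetical helper name, and the pattern only covers quoted href values inside <a> tags, so treat it as a starting point rather than a full HTML parser.

```php
<?php
// Collect the href value of every <a> tag in a chunk of HTML.
// extract_links() is a hypothetical helper name, not a PHP built-in.
function extract_links(string $html): array
{
    // "<a", whitespace, any other attributes, then href="..." or href='...'.
    // \1 makes the closing quote match whichever quote opened the value.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);

    // $matches[2] holds the second capture group: the URL itself.
    // Duplicates are dropped and the result is re-indexed from 0.
    return array_values(array_unique($matches[2]));
}
```

Called on the source of a page, it returns a plain array of URLs that can be looped over and printed.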
Nevermore
05-16-2003, 01:40 PM
I've been using PHP for a while, and they are quite hard to use. Most often they can be replaced by simpler code, so I don't use them enough to become good.
I see. Well, I'll get good at 'em nonetheless! lol :D
AdamGundry
05-16-2003, 02:23 PM
You should be able to open the links with CURL (http://www.php.net/manual/en/ref.curl.php).
Adam
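A rough sketch of what opening a page with cURL could look like, assuming the cURL extension is available on the server. fetch_page() is a hypothetical helper name and the 10-second timeout is an arbitrary choice; the curl_* functions and CURLOPT_* constants are the ones documented in the manual section linked above.

```php
<?php
// fetch_page() is a hypothetical helper; the curl_* calls come from
// PHP's cURL extension.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    if (!function_exists('curl_init')) {
        return null; // cURL extension is not installed on this server
    }
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and 4xx/5xx responses as "could not open"
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}
```

A link checker could call fetch_page() on each URL and count a null result as a broken link.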
Adam, I'm assuming I'll need to install the package on my server then, right? I don't think I can do that on a free server... Is there any other way possible? (I will check to see if it's already installed on my server, which hopefully it is.)
Is this what you are looking for, Jona?
<?php
$code = file ('http://www.yahoo.com/'); // file to open
foreach ($code as $line_num => $line) { // loop through lines
echo "<span style=\"font-weight:bold;\">Line #$line_num :</span> " . htmlspecialchars($line) . "<br>\n"; // echo lines to screen. Note htmlspecialchars() converts special characters to their HTML entities
}
?>
Cijori, two things. One: Your code doesn't work. Two, I changed it up to make it work, but when I do it prints nothing. How can I fix this?
Thanks.
Nevermore
05-17-2003, 04:03 AM
You might want to talk to Pyro about that - as I said, I'm not brilliant with RegExps. Are you looping through each line looking for things, then printing the contents of the variable? That's what I would try.
Well, all I know is this ereg/preg_match/preg_match_all stuff gets quite confusing--more so when it doesn't work as expected. lol
Nevermore
05-17-2003, 10:33 AM
Can you use require() on a file that is on a different server?
I've never tried (or used) it, how's the syntax? require("http://myotherothersite.com/myfile.php"); right?
Nevermore
05-17-2003, 10:36 AM
That's it; if that works then you could grab other files.
Hmmm... It's an idea. Hold on let me test it.
AdamGundry
05-17-2003, 10:38 AM
Yes, and it should work as long as allow_url_fopen (http://www.php.net/manual/en/ref.filesystem.php#ini.allow-url-fopen) is set on the server. (I just checked the docs).
Adam
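Whether a particular host has that setting switched on can be checked from a script with ini_get(). A small sketch follows; remote_open_allowed() is a hypothetical helper name. As far as I know, PHP 5.2 and later also require the separate allow_url_include setting before require() or include() will accept a URL, so this check speaks for file() and fopen() only.

```php
<?php
// remote_open_allowed() is a hypothetical helper name.
function remote_open_allowed(): bool
{
    // ini_get() returns the setting as a string such as "1", "On" or ""
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

if (remote_open_allowed()) {
    echo "file() and fopen() can read http:// URLs on this server\n";
} else {
    echo "allow_url_fopen is off; remote pages need cURL or sockets instead\n";
}
```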
Yes, it does work. :D That's neat.. Now about getting all of the URLs in it... :rolleyes:
Nevermore
05-17-2003, 11:23 AM
I'm starting to think you may have to go via CGI...
That's a possibility. Man, I sure wanted to use PHP, though. It's so much easier to find a free server that way. *Sigh*
CGI is more powerful than PHP, though, isn't it? It's also quite a bit harder... Well, not too hard, but the syntax is a bit different.... And you have to learn how to "move around" in it.
Nevermore
05-17-2003, 11:33 AM
If you want to find a free web host, try http://www.clickherefree.com. It's a free database of free web hosts. freewebspace.net is another (I think).
Don't worry about me, I can find one for free. lol I've been practicing that for over a year now. lol
Nevermore
05-17-2003, 11:39 AM
Yeah, I've been through geocities, IT3, brinkster, Tripod and only recently have moved on to paying for hosting - at the moment I'm hosting my own, but I'm probably going to move to colocation soon.
Nevermore
05-17-2003, 11:58 AM
If you don't need anything that Bravenet isn't already giving you, Brinkster (http://www.brinkster.com) might be better for you. They offer MySQL and ASP, put no ads on your pages, and are free.
Well, I don't know ASP and I don't want to learn it yet.. It's too hard! lol I don't know why Microsoft makes everything in caps and stuff... Also, Brinkster doesn't have much bandwidth at all--something that I need. ;)
There is also http://freewebs.com/ which offers no ads, PHP support, and is free...
Nevermore
05-17-2003, 12:06 PM
Freewebs also have CGI, so you could use them for your link checker. What do you want to check the links of, by the way?
I basically just want to learn all of the practical (or impractical, lol) uses of PHP. I want to learn all that "extra" stuff that no one bothers with. I want to just learn all I can! ;) I don't have a book or anything to learn from... All I have is http://php.net/, a useful resource but not a tutorial area.
Programming and HTML each have their advantages. HTML is easy to learn, yet you have to have valid (http://validator.w3.org/) HTML; in programming there is no "valid" or "invalid" unless it's a syntax error or something, which is its advantage over HTML. Programming is also more powerful (duh, how do you think they came up with HTML? lol).
BTW, don't say to go to Webmonkey.lycos.com or whatever it is for PHP tutorials because none of the ones there are any good... At least, not to me. :rolleyes:
AdamGundry
05-17-2003, 12:41 PM
"HTML is easy to learn": I suppose it depends on how you code - I found programming fairly easy to pick up, but learning to hard-code HTML took longer. HTML does have editors though, which makes it a lot easier.
Of course, you then get on to whether a RAD tool like Delphi or another IDE is an editor, and to "hard-code" you should be doing everything manually.
Good luck with learning PHP - it's a great language. I'm gradually learning, and I agree with you - the best resource is the website.
Adam
P.S. A good way I found to learn was (i) to make something I enjoy (internet games), and (ii) set up a webserver on my computer so I can test much more easily.
Adam, that's exactly what I do. I just don't enjoy making games, I enjoy making more complex things... I satisfy myself more often when I accomplish something and don't get frustrated. lol
Also, I have the http://aprelium.com/ Web server installed on my system so I can run PHP (and I can also download CGI) scripts on my local machine. The one thing is I can't CHMOD folders...
Nevermore
05-17-2003, 01:40 PM
Thanks for the link to the server - it's the only one I've seen that I can run. Now I can test PHP more easily. I was running a link through my server - they aren't exactly miles from one another...
I think this is what you are looking for, Jona:
<html>
<head>
<title>Link Validator</title>
<style type="text/css">
a {
color:darkblue;
}
</style>
</head>
<body>
Input a full URL (e.g. http://www.infinitypages.com/index.php).
<form action="checklinks.php" method="post">
<input type="text" name="url" size="50">
<input type="submit" name="submit" value="Check links">
</form>
<?php
#######################################################
# This script is Copyright 2003, Infinity Web Design #
# Written by Ryan Brill - ryan@infinitypages.com #
# All Rights Reserved - Do not remove this notice #
#######################################################
if (!empty($_POST["url"])) { // empty() avoids an "undefined index" warning before the form is submitted
$file = $_POST["url"];
echo "Links in file <a href=\"$file\">$file</a>:<br/><br/>\n";
$x = 1;
$valid = 0;
$invalid = 0;
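$filename = explode("/", $file); // split() was removed in PHP 7; explode() works for a literal delimiter
$filename = $filename[count($filename)-1];
$path = substr($file, 0, strlen($file) - strlen($filename)); // base URL: everything before the file name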
$contents = @file($file) or die ("Failed to open <a href=\"$file\">$file</a> to check links. Please be sure it is an absolute URL.");
foreach ($contents as $line_num => $line) {
if (preg_match_all('/<a\s[^>]*?href=[\'"][^\'"]*[\'"]/i', $line, $a)) { // every quoted href on this line, not only the first
for ($i = 0; $i < count($a[0]); $i++) {
$url = preg_split("/href=['\"]/i", $a[0][$i]);
$url2 = preg_split("/['\"]/", $url[1]);
$spliturl = parse_url($url2[0]);
if (empty($spliturl['scheme'])) { // array keys must be quoted; relative links have no scheme
$finalurl = $path.$url2[0];
}
else {
$finalurl = $url2[0];
}
if (strtolower($spliturl['scheme'] ?? '') != "mailto") {
$code = @file ($finalurl);// file to open
if (!$code) {
echo "<span style=\"color:darkred;\">Invalid:</span> <a href=\"$finalurl\">$finalurl</a><br/>\n";
$invalid++;
}
else {
echo "<span style=\"color:green;\">Valid:</span> <a href=\"$finalurl\">$finalurl</a><br/>\n";
$valid++;
}
}
}
}
}
}
if (isset($x) && $x == 1) { // $x only exists once a URL has been submitted
echo "<br/>\n";
echo ($valid + $invalid) . " links checked.<br/>\n";
if ($valid > 0) {
echo "<span style=\"color:green;\">$valid valid links found.</span><br/>\n";
}
if ($invalid > 0) {
echo "<span style=\"color:darkred;\">$invalid invalid links found.</span><br/>\n";
}
}
?>
</body>
</html>
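One thing the script does in a simple way is turn relative links into full URLs: it glues the link onto the directory of the page being checked, which does not cover links such as /about/ or ../img/logo.png. Below is a hedged sketch of a fuller resolver. resolve_link() is a hypothetical helper, not part of the script above, and it assumes the base is an absolute URL with a scheme and host, like the one typed into the form.

```php
<?php
// resolve_link() is a hypothetical helper, not part of the script above.
// Assumes $base is an absolute URL with a scheme and a host.
function resolve_link(string $base, string $href): string
{
    if ($href === '' || $href[0] === '#') {
        return strtok($base, '#') . $href;        // same document
    }
    if ($href[0] === '?') {
        return strtok($base, '?#') . $href;       // same path, new query string
    }
    if (parse_url($href, PHP_URL_SCHEME) !== null) {
        return $href;                             // already absolute (http:, mailto:, ...)
    }
    $b      = parse_url($base);
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');

    if (substr($href, 0, 2) === '//') {
        return $b['scheme'] . ':' . $href;        // protocol-relative
    }
    if ($href[0] === '/') {
        return $origin . $href;                   // root-relative
    }

    // Path-relative: set the query/fragment aside, start from the directory
    // of the base document, then fold away "." and ".." segments.
    $cut      = strcspn($href, '?#');
    $tail     = substr($href, $cut);
    $basePath = $b['path'] ?? '/';
    $dir      = substr($basePath, 0, strrpos($basePath, '/') + 1);

    $segments = explode('/', $dir . substr($href, 0, $cut));
    $last     = end($segments);
    $out      = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }
    $path = '/' . implode('/', $out);
    // Keep a trailing slash when the link pointed at a directory
    if ($path !== '/' && in_array($last, ['', '.', '..'], true)) {
        $path .= '/';
    }
    return $origin . $path . $tail;
}
```

In the script, the result of resolve_link($file, $url2[0]) could stand in for $finalurl; that substitution is a suggestion, not something tested against the original.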
Nevermore
05-19-2003, 03:56 PM
Woah...
Yup, Cijori, Pyro knows his stuff and he knows it well!