Foolproof robot blocking of a folder


bennystylee
11-08-2009, 11:07 AM
Hi all,

I need a foolproof way of blocking a dynamically created folder on my site. It is crucial that the items in the folder, and the folder name itself, do not get cached by search engines or, ideally, by any robots. The folders contain MP3s that I don't want any direct links to anywhere. I have implemented a way of playing the files (thanks, Dev Shed) which keeps the URLs/locations hidden.

So I am thinking robots.txt is out unless I write the folder name to the robots.txt file, which would mean the folder name is exposed in that file. And considering there could be a lot of these new folders created, the robots.txt file will get big, and some bots ignore it anyway.

I have no index page in the folder, so placing the following meta tags won't work:

<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE">
<META NAME="ROBOTS" CONTENT="NONE">
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Also note that I have added Options -Indexes to my .htaccess.

I am wondering, can I do this with permissions?

Could I chmod my folder to 711, with www-data being the owner? Would this work? Would this stop the bloomin' bots from peeking in, but still allow Apache to play the files?

Any advice is much appreciated - thanks in advance
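
To make the permissions question concrete, here is a minimal Python sketch that decodes what mode 711 on a directory grants. The `describe` helper and its field names are made up for illustration. Note the assumption stated in the post: www-data owns the folder. If Apache also runs as www-data, it reads the files as the owner, so these bits restrict other local users rather than remote HTTP clients, and a bot requesting a known URL is served the same way a browser is.

```python
import stat

# Mode from the post: a directory with permissions 711.
DIR_MODE = stat.S_IFDIR | 0o711


def describe(mode: int) -> dict:
    """Summarise who may list (r) and traverse (x) a directory with this mode."""
    return {
        "string": stat.filemode(mode),
        "owner_can_list": bool(mode & stat.S_IRUSR),
        "others_can_list": bool(mode & stat.S_IROTH),
        "others_can_traverse": bool(mode & stat.S_IXOTH),
    }


info = describe(DIR_MODE)
print(info["string"])  # drwx--x--x
```

In other words, 711 hides the directory listing from other local accounts while still letting them open a file whose name they already know. It does not decide which web visitors get the file.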

developerguru
11-10-2009, 06:45 AM
Suppose the folder containing the MP3s is named /music/.

Then you can create a robots.txt file and specify:
User-agent: *
Disallow: /music/

to prevent crawling of that folder by all search engines that honor robots.txt.
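
One way to check what this rule covers is to feed it to Python's standard `urllib.robotparser`, which evaluates a robots.txt the way a well-behaved crawler would. This is only a sketch: the bot name `ExampleBot` and the sample paths are made up, and it says nothing about bots that ignore robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# The rule from the reply above, using the example folder name /music/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /music/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "ExampleBot") -> bool:
    """Return True if a compliant crawler may fetch the given path."""
    return parser.can_fetch(agent, path)


print(allowed("/music/track.mp3"))  # False
print(allowed("/index.html"))       # True
```

The rule only works because the folder name is written into a publicly readable file, which is the trade-off raised in the original question.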

bennystylee
11-10-2009, 06:51 AM
I found this solution, which can be put in my main .htaccess and covers all MP3s under my root. It basically allows only internal requests for MP3s and forbids external ones.


<FilesMatch "^(.*)\.mp3$">
Order Deny,Allow
Deny from All
Allow from env=REDIRECT_STATUS
</FilesMatch>


http://www.codingforums.com/showthread.php?p=886294#post886294

Thanks for the reply
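
To see which files the FilesMatch pattern in the .htaccess rule above would select, the same regular expression can be tried in Python. This is a rough sketch, assuming the pattern is applied to the file name alone and using Python's `re` module as a stand-in for Apache's regex engine; the sample file names are invented.

```python
import re

# Same pattern as in the <FilesMatch> directive.
MP3_PATTERN = re.compile(r"^(.*)\.mp3$")


def is_protected(filename: str) -> bool:
    """Return True if the file name matches the FilesMatch pattern."""
    return MP3_PATTERN.search(filename) is not None


for name in ["track.mp3", "track.mp3.bak", "TRACK.MP3", "notes.txt"]:
    print(name, is_protected(name))
```

The pattern is case-sensitive as written, so a file saved as TRACK.MP3 would not be covered by the rule under these assumptions.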