
Thread: Making Sphider ignore disallowed pages?


    I meant: ignore the disallow instruction and go ahead and retrieve the page anyway.

    I tried Sphider, Sphider-plus, and some mods that are supposed to make it "ignore robots", but that doesn't seem to be enough.
    I'm trying to index a third-party website to help users find other people's posts, since the owner seems too busy with the "sales, sales, sales" side of things.
    The problem is that they apparently don't want us to find the help content, because they also added some "disallow" rules.
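
    To make it concrete, this is the check I want to bypass. A rough Python sketch of how a crawler normally honours robots.txt (Sphider itself is PHP, so this is only an illustration; the site and paths here are made up):

    Code:
        from urllib import robotparser

        # A polite crawler reads robots.txt before fetching anything.
        rp = robotparser.RobotFileParser()
        rp.set_url("https://example.com/robots.txt")   # hypothetical target site
        rp.read()

        url = "https://example.com/forum/help-thread"  # hypothetical disallowed page
        if rp.can_fetch("MyCrawler", url):
            print("allowed, fetch it")
        else:
            # An "ignore robots" mod would simply skip this check and fetch anyway;
            # robots.txt is advisory, the web server does not enforce it by itself.
            print("disallowed by robots.txt")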

    I can browse the pages fine in a browser, and I even changed Sphider's user agent to Firefox's, with no success.

    Is it even possible to crawl a website as if it were a browser, other than by faking the user agent? In other words: how many ways does a server have to figure out whether it's a robot or a person reading its pages?
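
    For what it's worth, here is roughly what I mean by faking the browser request, again as a Python sketch rather than Sphider's actual code (the header values and URL are just examples). As far as I know, besides the User-Agent a server can also look at the other request headers, the request rate, whether cookies and JavaScript are handled, and the client IP, so headers alone may not be enough:

    Code:
        import requests

        # Hypothetical page that robots.txt disallows but a normal browser can open.
        url = "https://example.com/forum/help-thread"

        # Headers copied from a typical Firefox request.
        headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20100101 Firefox/10.0",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Language": "en-US,en;q=0.5",
            "Referer": "https://example.com/",
        }

        resp = requests.get(url, headers=headers, timeout=10)
        print(resp.status_code, len(resp.text))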

    What I'm stating could be wrong, and there could be other instructions/rules in robots.txt or somewhere else, but bear with me.

    Thanks.
    Last edited by sergiozambrano; 03-22-2012 at 06:22 AM.
