
Thread: Making Sphider ignore disallowed pages?


  1. #1
    Join Date
    Jun 2009

    Making Sphider ignore disallowed pages?

    I meant: ignore the disallow instruction and go ahead and retrieve the page anyway.

    I tried Sphider, Sphider-plus, and some mods to make it "ignore robots", but that doesn't seem to be enough.
    I'm trying to index a third-party website to help users find other people's posts, since the owner seems too busy with the "sales, sales, sales" side of things.
    The problem is that they seem to deliberately want us not to find help, because they also added a "disallow" rule.

    I can browse the pages in a browser, and I even changed Sphider's user agent to Firefox's, with no success.

    Is it even possible to crawl a website as if it were a browser, other than by faking the user agent? In other words: how many ways does a server have to figure out whether it's a robot or a human reading the pages?

    What I'm stating could be wrong, and there could be other instructions/rules in robots.txt or somewhere else, but bear with me.
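    To clarify what "ignoring robots" means mechanically: robots.txt is purely advisory. A polite crawler downloads it, checks each URL against the rules before fetching, and skips disallowed paths; the server cannot enforce this. Below is a minimal sketch in Python (not Sphider's own PHP code) using the standard library's robots.txt parser. The robots.txt content and URLs are hypothetical stand-ins for the third-party site.

    ```python
    from urllib.robotparser import RobotFileParser

    # Hypothetical robots.txt content of the site being indexed (assumption):
    robots_txt = """User-agent: *
    Disallow: /forum/
    """

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # A polite crawler checks each URL first; this is the step where
    # Sphider decides to skip the page.
    print(parser.can_fetch("Sphider", "http://example.com/forum/thread-1"))  # False
    print(parser.can_fetch("Sphider", "http://example.com/index.html"))      # True

    # "Ignoring robots" simply means never consulting the parser: the HTTP
    # request for a disallowed page is identical to any other request, so
    # the server cannot tell from the request alone that the rule was ignored.
    ```

    Separately from robots.txt, a server can still try to spot crawlers by other signals: the full set of request headers a real browser sends (Accept, Accept-Language, Referer, cookies), request timing and ordering, and whether JavaScript runs. Matching only the User-Agent string, as described above, covers just one of those.
    
    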

    Last edited by sergiozambrano; 03-22-2012 at 07:22 AM.
