Hi, I have developed some code that crawls web pages looking for links. I need to filter out irrelevant links, such as those that refer to CSS, JavaScript functions, and favicons; this is simple enough to achieve with regex. What I need to know is: what other irrelevant links am I likely to find on web pages?
Also, is there a name for links of the following form:
Bookmarks
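
For reference, a minimal sketch of the kind of regex filter described in the first paragraph might look like this in Python. The names `is_relevant` and `filter_links` and the exact patterns are hypothetical, chosen only for illustration; they cover just the three cases mentioned above (CSS, JavaScript, favicons) and are not the crawler's actual code.

```python
import re
from urllib.parse import urlparse

# Assumed patterns for illustration only; they cover the cases named in the post.
JAVASCRIPT_HREF = re.compile(r"^\s*javascript:", re.IGNORECASE)
ASSET_PATH = re.compile(r"\.(css|js|ico)$", re.IGNORECASE)


def is_relevant(href: str) -> bool:
    """Return False for links to CSS, JavaScript, or favicon resources."""
    if JAVASCRIPT_HREF.match(href):
        return False
    # Match against the path only, so query strings like ?v=3 are ignored.
    path = urlparse(href).path
    return not ASSET_PATH.search(path)


def filter_links(hrefs):
    """Keep only the links that pass is_relevant."""
    return [h for h in hrefs if is_relevant(h)]
```

Matching on the parsed path rather than the raw string is a design choice in this sketch: it keeps a versioned stylesheet URL such as `/static/site.css?v=3` from slipping through an extension check.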