www.webdeveloper.com

Thread: Web-page link validation

  1. #1
    Join Date: Apr 2011
    Posts: 1

    Web-page link validation

    Hello there,

    I have a question about web-page link validation in Java.

    I have an assignment to write a Java program that extracts all the links from a web page.
    I have written code for that (see below), but I also need to check whether each link is valid before extracting it, so that the program doesn't output broken links.
    Do you have any suggestions? I could really use some advice. Thanks!

    Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class Main {
      public static void main(String[] arguments) throws IOException {
        // Read the whole HTML file into memory; try-with-resources closes it
        StringBuilder output = new StringBuilder();
        try (BufferedReader buff = new BufferedReader(new FileReader("a.htm"))) {
          String line;
          while ((line = buff.readLine()) != null) {
            output.append(line).append('\n');
          }
        }
    
        String page = output.toString();
        // [^>]* keeps the match inside a single tag, so every link on a line is
        // found (the original greedy ".+" only caught the last link per line)
        Pattern pattern = Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(page);
        while (matcher.find()) {
          System.out.println(matcher.group(1));
        }
      }
    }
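The regex returns href values exactly as they are written in the page, so many of them will be relative (`b.htm`, `../img/logo.png`) or not web links at all (`mailto:`). One possible first step toward the "valid link" check, sketched here under the assumption that the URL of the page itself is known (the `pageUrl` value and the `LinkResolver` name are hypothetical), is to resolve each href with `java.net.URI` and drop anything malformed or non-HTTP before doing any network check:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Optional;

public class LinkResolver {
    /**
     * Resolves a raw href against the URL of the page it came from and keeps
     * it only if the result is an absolute http or https URL. Malformed hrefs
     * and other schemes such as mailto: or javascript: give Optional.empty().
     */
    public static Optional<URI> resolve(String pageUrl, String href) {
        try {
            // URI.resolve(String) throws IllegalArgumentException for a malformed href
            URI resolved = new URI(pageUrl).resolve(href.trim());
            String scheme = resolved.getScheme();
            if (scheme == null) {
                return Optional.empty();
            }
            scheme = scheme.toLowerCase(Locale.ROOT);
            if (scheme.equals("http") || scheme.equals("https")) {
                return Optional.of(resolved);
            }
            return Optional.empty();
        } catch (URISyntaxException | IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String pageUrl = "http://www.example.com/docs/a.htm"; // hypothetical page URL
        String[] hrefs = {"b.htm", "../img/logo.png", "mailto:me@example.com", "http://bad host/x"};
        for (String href : hrefs) {
            System.out.println(href + " -> "
                + resolve(pageUrl, href).map(URI::toString).orElse("(skipped)"));
        }
    }
}
```

`URI.resolve` also normalizes `..` segments, so the absolute URLs it produces are ready to be requested.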

  2. #2
    Join Date: Nov 2008
    Posts: 19
    You can parse all the links in a web page with the javax.swing.text.html.HTML, javax.swing.text.html.HTMLEditorKit and javax.swing.text.html.parser.ParserDelegator classes.
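A minimal sketch of how those three classes fit together (the class and method names `SwingLinkExtractor` and `extractLinks` are illustrative, not from the original reply): `ParserDelegator.parse` walks the document and calls back into an `HTMLEditorKit.ParserCallback`, which can pick out the `href` attribute of each `<a>` tag. Unlike the regex, this relies on a real HTML tokenizer, so attribute order and spacing do not matter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class SwingLinkExtractor {
    /** Collects the href attribute of every <a> start tag the Swing HTML parser reports. */
    public static List<String> extractLinks(Reader html) throws IOException {
        List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) { // anchors like <a name="x"> have no href
                        links.add(href.toString());
                    }
                }
            }
        };
        // true = ignore a charset declared in a <meta> tag instead of
        // throwing ChangedCharSetException
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<p><a href=\"one.htm\">One</a> <a name=\"top\">anchor</a>"
            + " <a href=\"two.htm\">Two</a></p>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

To read the original `a.htm`, pass a `FileReader` (or a `BufferedReader` around it) instead of the `StringReader`.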
    The following article should cover your requirement:
    Broken Link Checker in Java
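The article's code is not reproduced here. As a rough sketch of the same idea, assuming "valid" means "the server answers with a 2xx or 3xx status" (that policy, the 5-second timeout, and the names `LinkChecker`, `statusOf` and `isWorking` are all assumptions for illustration), each absolute link can be checked with a `HEAD` request through `java.net.HttpURLConnection`:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

public class LinkChecker {
    /** Assumed policy: 2xx and 3xx responses count as working links. */
    public static boolean isWorkingStatus(int code) {
        return code >= 200 && code < 400;
    }

    /**
     * Sends a HEAD request (headers only, no body) and returns the HTTP
     * status code, or -1 if the link is malformed, not http/https, or the
     * request fails (unknown host, refused connection, timeout).
     */
    public static int statusOf(String link, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            URLConnection raw = URI.create(link).toURL().openConnection();
            if (!(raw instanceof HttpURLConnection)) {
                return -1; // e.g. file: or ftp: links
            }
            conn = (HttpURLConnection) raw;
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            return conn.getResponseCode();
        } catch (IOException | IllegalArgumentException e) {
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    /** Convenience wrapper using a hypothetical 5-second timeout. */
    public static boolean isWorking(String link) {
        return isWorkingStatus(statusOf(link, 5000));
    }

    public static void main(String[] args) {
        for (String link : args) {
            System.out.println(link + " -> " + (isWorking(link) ? "OK" : "BROKEN"));
        }
    }
}
```

Some servers reject `HEAD` with 405 Method Not Allowed, so retrying those links with `GET` is a reasonable refinement. `HttpURLConnection` also does not follow redirects between http and https, which is one reason the sketch counts 3xx responses as working rather than broken.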
