SearchXML: it stops in first tag it finds


athanach
12-07-2007, 07:44 AM
I have a problem with the code below. I want to read the contents of every element with a given tag, but the code only returns the contents of the first matching tag instead of all the matching tags on the page. For example, an RSS feed may contain many news items, but when I search for the <title> tag to get the news titles, I get only the first one.

Any suggestions on how to make it work for all matching tags, not just the first?


public void search(String filename) throws Exception {
    File file = new File(filename);
    DOMParser parser = new DOMParser();
    parser.parse(file.toURL().toString());
    Document doc = parser.getDocument();

    // Get node to start iterating with
    Element root = doc.getDocumentElement();

    NodeList descriptionElements = root.getElementsByTagName("title");
    Element description = (Element) descriptionElements.item(0);

    // Get a NodeIterator
    NodeIterator i = ((DocumentTraversal) doc)
            .createNodeIterator(description, NodeFilter.SHOW_ALL,
                    new FormattingNodeFilter(), true);

    Node n;
    while ((n = i.nextNode()) != null) {
        String buf = n.getNodeValue();
        System.out.println("Search phrase found: '" + buf + "'");
    }
}

chazzy
12-07-2007, 07:47 PM
It's because of this block of code:

    NodeList descriptionElements = root.getElementsByTagName("title");
    Element description = (Element) descriptionElements.item(0);

You're explicitly getting only the first item, item(0). Wrap the item(...) call and everything after it, up to the end of the method, in a for loop that runs the index from 0 to descriptionElements.getLength() - 1.
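
A minimal sketch of that loop, under some assumptions: it uses the JDK's built-in DocumentBuilder instead of the Xerces DOMParser from the original post, it reads each element's text with getTextContent() instead of a NodeIterator, and the class name TitleSearch, the helper textOfAll, and the sample feed are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class TitleSearch {

    // Hypothetical helper: returns the text of EVERY element named tagName,
    // not only the first one.
    static List<String> textOfAll(String xml, String tagName) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList matches = doc.getDocumentElement().getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();

        // Visit item(0) .. item(getLength() - 1) instead of item(0) only.
        for (int index = 0; index < matches.getLength(); index++) {
            result.add(matches.item(index).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed with two news items.
        String feed = "<rss><channel>"
                + "<item><title>First story</title></item>"
                + "<item><title>Second story</title></item>"
                + "</channel></rss>";

        for (String title : textOfAll(feed, "title")) {
            System.out.println("Search phrase found: '" + title + "'");
        }
    }
}
```

The same loop shape should work with the original NodeIterator code: create the iterator inside the loop body for each item(index) rather than once for item(0).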

athanach
12-08-2007, 07:40 AM
I have two problems with the code I posted.

1) I want to search for the link tag, but it appears in the page as

<link type=application href=www.example.com />

and the program doesn't find it, because it searches for

<link> type=... href=.... </link>

Do you know how I can solve this problem?

2) Do you know how to search an HTML document without stopping on errors in the page? Right now, if the page is not well formed, the parser gives me an error and never searches for the tag I asked for. I want to parse the page without errors and run the search even if the page isn't well formed.

chazzy
12-08-2007, 08:20 AM
Have you thought about just using a library that already exists? Why reinvent the wheel if it's already been done?

http://htmlparser.sourceforge.net/

athanach
12-08-2007, 08:34 AM
I already use other libraries for parsing HTML pages. Do you think this one will help more?
The problem I have isn't with how I search documents, but with how to keep the parser from throwing errors and stopping the search.

As for the link problem, I think it will be the same with the htmlparser you gave me, because it looks for <tag>....</tag>, not for <tag...../>.
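
On the <link ... /> question, a hedged sketch using the JDK's standard DOM API (not htmlparser, whose behavior is not checked here): in the DOM an empty element such as <link ... /> is still an element, so getElementsByTagName("link") returns it, and its data is read from attributes with getAttribute rather than from child text. The sketch assumes well-formed XML, so the attribute values are quoted, unlike the snippet in the post above; the class name LinkSearch, the helper hrefs, and the sample URLs are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LinkSearch {

    // Hypothetical helper: collects the href attribute of every <link> element,
    // whether it is written as <link ... /> or <link ...></link>.
    static List<String> hrefs(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList links = doc.getElementsByTagName("link");
        List<String> result = new ArrayList<>();
        for (int index = 0; index < links.getLength(); index++) {
            Element link = (Element) links.item(index);
            // The URL lives in an attribute, so there is no text node to read.
            if (link.hasAttribute("href")) {
                result.add(link.getAttribute("href"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample: one empty-element tag and one open/close pair.
        String feed = "<feed>"
                + "<link type=\"application\" href=\"http://www.example.com/a\" />"
                + "<link href=\"http://www.example.com/b\"></link>"
                + "</feed>";

        for (String href : hrefs(feed)) {
            System.out.println("Link found: '" + href + "'");
        }
    }
}
```

This does not address the second problem: a strict XML parser will still reject a page that is not well formed, so the sketch only applies once the input parses.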