I have a bunch of HTML documents with a P element and 'id' attribute set to 'title'. Like so:
HTML Code:
<p id="title"> Title of the document
In some cases, I have a title that has a forced line break:
HTML Code:
<p id="title"> This Is A Title Of A Document<br>With A BR Element In It
I have created an UpdateAndSynchronize.php document that scans a tree where all my web documents are, loads the document (using DOMDocument::loadHTML()), sets up the XPath object, and extracts the info I want to put in the MySQL database.
My XPath expression to get the document title is:
PHP Code:
$docTitle = $htmlXPath->query('.//p[@id="title"]')->item(0)->textContent;
$docTitle = trim(str_replace(array("\n", "\r\n", "\r", "\t"), " ", $docTitle));
$htmlXPath is an XPath object.
I had to add the second line to get rid of leading and trailing whitespace.
My problem is that the str_replace() call is not working, probably because the <br> element matched by the XPath query is being converted (translated?) to some other character.
The question is:
How should I be setting up my XPath->query() to convert <br> elements into a single space character?
Also, is there a good reference (book? web pages?) that shows how to set up XPath queries (evaluations?) with lots of examples?
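For what it's worth, here is a minimal sketch of one possible way to get a space where each <br> sits. It assumes the cause is that textContent only concatenates text nodes, so a <br> element contributes no character at all and str_replace() has nothing to match. The helper name textWithBrAsSpace() and the inline sample HTML are hypothetical, made up for illustration. Note that it modifies the loaded DOM in memory, which should be harmless if the script only extracts data.

```php
<?php
// Hypothetical helper (not part of the original script): returns the text of
// $node with every <br> descendant rendered as a single space.
function textWithBrAsSpace(DOMXPath $xpath, DOMNode $node): string
{
    // Copy the matches into an array first so the loop does not depend on
    // how the node list behaves while the tree is being changed.
    $breaks = iterator_to_array($xpath->query('.//br', $node));

    foreach ($breaks as $br) {
        // Swap each <br> for a text node holding one space.
        $space = $node->ownerDocument->createTextNode(' ');
        $br->parentNode->replaceChild($space, $br);
    }

    // Collapse newlines, tabs and repeated spaces, then trim both ends.
    return trim(preg_replace('/\s+/', ' ', $node->textContent));
}

// Sample input, assumed for illustration only.
$html = '<html><body><p id="title"> This Is A Title Of A Document<br>'
      . 'With A BR Element In It</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$htmlXPath = new DOMXPath($doc);

// Guard against documents that have no <p id="title"> at all.
$titleNode = $htmlXPath->query('.//p[@id="title"]')->item(0);
$docTitle  = $titleNode !== null ? textWithBrAsSpace($htmlXPath, $titleNode) : '';

echo $docTitle, "\n";
```

With this approach the preg_replace() call takes over the job of the str_replace() line, since \s matches newlines, carriage returns and tabs as well as spaces.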

