I have written a Perl script which scrapes a website and generates an HTML file. I pass this file on to a Java servlet. When parsing it as XML in the servlet, I sometimes get an org.xml.sax.SAXParseException. I noticed that the exception occurs because the generated file sometimes contains entity references such as &nbsp; and &Iuml;, which the XML parser does not recognize. Is there some way I can get around this problem?
After a bit of searching online, I found that declaring entities like
<!ENTITY nbsp CDATA "*"> is one way to get well-formed XML. But how do I declare the entities in the file that the Perl script generates?
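To make the idea concrete, here is a minimal Java sketch of what I think the servlet side would see, assuming the Perl script writes an internal DOCTYPE subset at the top of the generated file. The class name EntityDemo, the root element page and the sample text are made up for illustration and are not from my actual code. As far as I can tell, the XML form of the declaration drops the CDATA keyword and uses a numeric character reference as the value.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of the real servlet.
class EntityDemo {
    // Internal DTD subset declaring the two entities that break the parse.
    // Assumption: the generated file would start with a block like this,
    // and "page" stands in for whatever the real root element is.
    static final String DOCTYPE =
        "<!DOCTYPE page [\n"
        + "  <!ENTITY nbsp \"&#160;\">\n"
        + "  <!ENTITY Iuml \"&#207;\">\n"
        + "]>\n";

    // Parses the given XML string and returns the text of the root element.
    // Throws IllegalArgumentException if the input is not well-formed.
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<page>na&Iuml;ve&nbsp;text</page>";

        // With the declarations in place, the entities are expanded.
        System.out.println(parseText(DOCTYPE + body));

        // Without them, the parser rejects the undeclared entity reference.
        try {
            parseText(body);
        } catch (IllegalArgumentException e) {
            System.out.println("Without the DOCTYPE: " + e.getMessage());
        }
    }
}
```

If this is the right approach, I assume the Perl script would only need to print the same DOCTYPE block before the root element, but I am not sure that is the recommended way.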
Any help would be appreciated.