www.webdeveloper.com

Thread: Specifying DOCTYPE for xml in Perl

  1. #1
    Join Date
    Apr 2010
    Posts
    2

    Specifying DOCTYPE for xml in Perl

    Hi,

    I have written a Perl script that scrapes a website and generates an HTML file, which I pass to a Java servlet. When the servlet parses the file as XML, I sometimes get an org.xml.sax.SAXParseException. I noticed that the exception occurs because the generated markup sometimes contains named entities such as &nbsp; and &Iuml;, which the XML parser does not recognize. Is there some way to work around this problem?

    After a bit of online searching, I found that declaring the entities, for example
    <!ENTITY nbsp "&#160;">, is one way to get well-formed XML. But how do I declare the entities from the Perl script?

    Any help would be appreciated.

    Thanks.

  2. #2
    Join Date
    Oct 2007
    Location
    Vienna, Austria
    Posts
    391
    I suggest converting the named entities to numeric ones, using one of these modules:
    HTML::Entities::Numbered
    or XML::Entities (whose author, incidentally, is me).
    Both are capable of converting named entities into numeric ones, which are inherently supported by XML.
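
    To show the idea behind the conversion, here is a minimal core-Perl sketch. It is not how either module is implemented: the lookup table %CODE_POINT and the function numify_entities are hypothetical names made up for this example, and the table holds only a handful of entities, whereas the modules above ship complete ones. The five entities that XML predefines (amp, lt, gt, quot, apos) are left untouched, as are names the table does not know.

```perl
use strict;
use warnings;

# Hypothetical, deliberately tiny table: entity name => Unicode code point.
# A real script would rely on a complete table, such as those shipped
# with the modules mentioned above.
my %CODE_POINT = (
    nbsp   => 160,
    Iuml   => 207,
    eacute => 233,
    copy   => 169,
);

# Entities every XML parser already understands; never rewrite these.
my %XML_PREDEFINED = map { $_ => 1 } qw(amp lt gt quot apos);

# Replace known named entities with decimal character references.
# Unknown names are passed through unchanged so nothing is silently lost.
sub numify_entities {
    my ($text) = @_;
    $text =~ s{&([A-Za-z][A-Za-z0-9]*);}{
        my $name = $1;
        (!$XML_PREDEFINED{$name} && exists $CODE_POINT{$name})
            ? sprintf('&#%d;', $CODE_POINT{$name})
            : "&$name;";
    }ge;
    return $text;
}

# Prints: na&#207;ve&#160;&amp;&#160;caf&#233;
print numify_entities('na&Iuml;ve&nbsp;&amp;&nbsp;caf&eacute;'), "\n";
```

    Running the generated markup through a function like this before writing the file means the servlet's parser only ever sees numeric references, so no DOCTYPE or entity declarations are needed.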

  3. #3
    Join Date
    Apr 2010
    Posts
    2
    Thanks Sixtease,

    I was able to get around the problem using the same method!
