XML parsing dilemma


kurent
05-26-2009, 02:20 AM
For example, we have an application with a Java servlet that sends data in response to a request. It sends something like this:

<info>Record found!</info>
<data>
<name>Peter</name>
<age>24</age>
</data>

But what if Peter is a naughty boy and, instead of entering his name, he writes "<data>"? Now the parser will get confused.

What is the best way of preventing this? The only option I see is to intentionally garble the tags, maybe even generate them randomly. So the improved version is <data_ogr8w31hf9q27sd> ... </data_ogr8w31hf9q27sd>
Now it is almost impossible to confuse the application.

Am I on the right track here?

Charles
05-26-2009, 05:47 AM
That's one pretty dumb parser that can't tell the difference between a tag name and the content of a text node. Stick with plain old "data" and use a competent parser. And you'll want to process the data at some point before it goes into the XML, encoding the characters "<", ">" and "&", but chances are your app already does this.
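
To illustrate what a competent parser does with encoded input, here is a minimal sketch in Java using the standard JAXP DOM parser. The class name NameReader and the method readName are made up for this example, and it assumes the response is a single well-formed element in which the user's text has already been encoded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the original application.
final class NameReader {

    // Parses the XML string and returns the text of the first <name> element,
    // or null if there is no such element.
    static String readName(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Node name = doc.getElementsByTagName("name").item(0);
            // getTextContent() returns the decoded text of the node,
            // so "&lt;data&gt;" in the markup comes back as "<data>".
            return name == null ? null : name.getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Calling NameReader.readName("<data><name>&lt;data&gt;</name><age>24</age></data>") returns the string "<data>". The parser sees one data element and one name element; the encoded text is only ever treated as the content of a text node, never as a tag.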

kurent
05-26-2009, 06:26 AM
The thing is, I parse in JavaScript (AJAX). I search the Java servlet response for "<name>" and "</name>". Everything in between is the data.

How could a parser know what actually is the data if the user is trying to confuse the application and inputs something he knows the parser will look for?

Charles
05-26-2009, 09:16 AM
How can the parser tell the difference? The same way that you do.

jkmyoung
05-28-2009, 02:14 PM
The application should automatically escape <data> as &lt;data&gt; if entered into that field. It should sanitize the inputs.
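
A rough sketch of that escaping step on the servlet side, in Java, might look like the following. The class name XmlText and the method escape are invented for this example, and it only covers the three characters mentioned in this thread ("<", ">" and "&"), which is enough for element text but not for attribute values, where quotes would also need encoding.

```java
// Hypothetical helper, not part of the original application.
final class XmlText {

    // Replaces the characters that have special meaning in XML element text
    // with their predefined entity references.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

The servlet would then write "<name>" + XmlText.escape(userInput) + "</name>", so a name entered as <data> goes out as &lt;data&gt; and an ordinary name such as Peter passes through unchanged.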