XML with files


auxone
09-09-2008, 09:57 PM
I've just started using XML for my new project, and as part of it I have to send files along with other elements. Currently I am just stuffing all the binary data between a <file></file> tag pair and sending it on its merry way. Is there a better way to do this? The files are not HTTP-accessible, so I can't just provide a URI.

Any suggestions would be greatly appreciated.
Thanks!

NogDog
09-09-2008, 10:22 PM
Probably not something XML was intended for, but if you really need to, I might consider something like uuencoding or base64-encoding the data into a text string, which would then be included as CDATA within the XML element.
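To see why the text-encoding step matters, here is a minimal PHP sketch that feeds the same binary data to SimpleXML twice: once raw inside a CDATA section, and once base64-encoded. The sample bytes and the `<doc>`/`<file>` element names are made up for illustration.

```php
<?php
// Sketch: raw bytes cannot simply sit between <file></file> tags, even
// inside CDATA, while the same bytes base64-encoded can.
// The sample bytes below are made up for illustration.

libxml_use_internal_errors(true); // collect parse errors quietly instead of emitting warnings

// Control bytes (including NUL), a stray "]]>" and bytes that are not valid UTF-8.
$binary = "PK\x03\x04\x00]]>\xFF\xFE";

// Raw bytes in CDATA: control characters such as 0x00 are not legal XML
// characters anywhere, and the embedded "]]>" would end the CDATA section early.
$raw       = '<doc><file><![CDATA[' . $binary . ']]></file></doc>';
$rawParsed = simplexml_load_string($raw);

// Base64 text uses only A-Z, a-z, 0-9, '+', '/' and '='.
$encoded       = '<doc><file><![CDATA[' . base64_encode($binary) . ']]></file></doc>';
$encodedParsed = simplexml_load_string($encoded, 'SimpleXMLElement', LIBXML_NOCDATA);

var_dump($rawParsed === false);     // bool(true): rejected as not well-formed
var_dump($encodedParsed !== false); // bool(true)

// The element's text decodes back to the original bytes.
var_dump(base64_decode((string) $encodedParsed->file, true) === $binary); // bool(true)
```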

auxone
09-09-2008, 10:46 PM
If I may ask, what is the reason for encoding it? Also, if I am also writing the program that receives the XML file, would I still have to use CDATA, since that is really just a syntactical thing?

What keeps the binary data from containing the same 'string' as my closing tag, thus creating an error?

Thanks for the response!

NogDog
09-09-2008, 11:56 PM
XML is a text mark-up specification: a document may only contain legal characters in its declared encoding, and bytes such as NUL (0x00) and most other control characters are not allowed anywhere in it, not even inside CDATA. If you uuencode or base64-encode the binary data, you convert it into a plain text string which can then be included in your XML text. You could put that text between <![CDATA[ . . . ]]> tags so that the XML parser does not try to interpret anything within that data string and your XML stays well-formed, although in practice that probably would not be necessary since, at least with base64, no problematic characters would be used.

Base64 encoding generates a string using a "64-character alphabet consisting of upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols. The "=" symbol is also used as a special suffix code."* Thus there is no chance of coincidentally having a sequence of "]]>" characters in the encoded data that would break your CDATA section, or any angle brackets or &'s to break your XML if you do not use CDATA tags.
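The point about base64's alphabet, and the reason for the "at least with base64" qualifier, can be checked directly. This sketch encodes every possible byte value once with PHP's `convert_uuencode()` and `base64_encode()` and reports which XML-significant characters each output contains. uuencode's output range includes `<`, `>`, `&` and `]`, so it would need CDATA or escaping; base64's does not.

```php
<?php
// Sketch: which XML-significant characters can appear in uuencoded versus
// base64-encoded output? Every possible byte value is encoded once.

$allBytes = implode('', array_map('chr', range(0, 255)));

// PHP's uuencode emits lines of characters between '!' and '`' (plus "\n"),
// a range that includes '<', '>', '&' and ']'.
$uu  = convert_uuencode($allBytes);
$b64 = base64_encode($allBytes);

foreach (['<', '>', '&', ']'] as $ch) {
    printf(
        "%s  uuencode: %-3s  base64: %s\n",
        $ch,
        strpos($uu, $ch) !== false ? 'yes' : 'no',
        strpos($b64, $ch) !== false ? 'yes' : 'no'
    );
}

// base64 output stays within A-Z, a-z, 0-9, '+', '/' plus trailing '=' padding.
var_dump(preg_match('~\A[A-Za-z0-9+/]*={0,2}\z~', $b64) === 1); // bool(true)
```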

So in PHP, for example, you could create the field as:

echo "<field>" . base64_encode(file_get_contents($fileName)) . "</field>\n";

Then the receiving script would simply grab the text from that field and reverse the encoding (base64_decode() in PHP) to regenerate the binary data.
_____________________
* From http://en.wikipedia.org/wiki/Base64
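Putting both halves together, a minimal sketch of the round trip might look like the following. It builds the XML with DOMDocument rather than string concatenation so that attribute text is escaped, and it decodes with `base64_decode()` in strict mode so corrupted data is caught. The `<response>`/`<file>` element names and the helper functions are hypothetical. The receiving side is written in PHP only to keep one language; on a .NET receiver, `System.Convert.FromBase64String` performs the same decoding.

```php
<?php
// Sketch of the round trip; element names and helpers are hypothetical.

// --- Sending side ---------------------------------------------------------
function buildResponse(string $path): string
{
    $doc  = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->appendChild($doc->createElement('response'));

    $file = $doc->createElement('file');
    $file->setAttribute('name', basename($path)); // DOM escapes attribute text
    $file->appendChild($doc->createTextNode(base64_encode(file_get_contents($path))));
    $root->appendChild($file);

    return $doc->saveXML();
}

// --- Receiving side -------------------------------------------------------
function extractFile(string $xml): string
{
    $parsed = simplexml_load_string($xml);
    if ($parsed === false) {
        throw new RuntimeException('Response is not well-formed XML');
    }
    // Strict mode returns false for characters outside the base64 alphabet.
    $bytes = base64_decode((string) $parsed->file, true);
    if ($bytes === false) {
        throw new RuntimeException('<file> does not contain valid base64');
    }
    return $bytes;
}

// Usage: round-trip random bytes through a temporary file.
$original = random_bytes(1000);
$path     = tempnam(sys_get_temp_dir(), 'xmlfile');
file_put_contents($path, $original);

$xml = buildResponse($path);
unlink($path);

$restored = extractFile($xml);
var_dump($restored === $original); // bool(true)
```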

auxone
09-10-2008, 10:21 AM
Thank you very much for the well-thought-out response. It's exactly what I am looking for. The program that receives the XML is written in .NET, so I hope it has a function similar to base64_decode! I'll go look now. Thanks again!

Do you think this solution is impractical when sending files that are several MB in size? As you say, XML is a text mark-up. I wonder how other people handle file transfer.

Later

auxone
09-10-2008, 10:57 AM
The Wikipedia article seems to imply that encoding makes the file larger in some situations. Maybe I am just confused. It seems like that only applies to MIME.

NogDog
09-10-2008, 04:24 PM
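Yes, it will be larger. Base64 turns every 3 bytes of input into 4 characters of output, so the encoded data is roughly a third bigger regardless of how it is transported; MIME adds a little more because it also breaks the encoded text into lines of no more than 76 characters.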
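The size can be predicted before encoding anything. A small PHP sketch with two hypothetical helpers, `base64Length()` and `mimeBase64Length()`, whose results match what `base64_encode()` and `chunk_split()` actually produce:

```php
<?php
// Sketch: predicting the size of base64-encoded data.
// Every 3 input bytes become 4 output characters, with the last group padded
// by '=', so plain base64 is 4 * ceil(n / 3) characters: about a third bigger.
// MIME (RFC 2045) also limits encoded lines to 76 characters; PHP's
// chunk_split($s, 76, "\r\n") adds a CRLF after each line, the last one included.
// Both functions are hypothetical helpers, valid for n > 0.

function base64Length(int $n): int
{
    return 4 * intdiv($n + 2, 3);
}

function mimeBase64Length(int $n): int
{
    $plain = base64Length($n);
    return $plain + 2 * intdiv($plain + 75, 76); // 2 bytes of CRLF per line
}

$n = 3 * 1024 * 1024; // a 3 MiB file, as an example
printf(
    "raw: %d bytes, base64: %d, MIME-wrapped: %d (%.1f%% larger)\n",
    $n,
    base64Length($n),
    mimeBase64Length($n),
    100 * (mimeBase64Length($n) / $n - 1)
);
```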

Personally, if I needed to communicate with another web application and send it a file, I'd probably try to do it with an HTTP POST file upload if possible, perhaps using cURL in the sending script to set up and send the POST data. To me, XML just doesn't seem like a very good fit for this. Alternatively, I might use FTP, either to put the file on the receiving host, or to supply the file name in the XML data I send so that the receiver can do an FTP get of the file. In either case (HTTP POST data or FTP put/get), you are using an interface designed for transferring files, whereas with XML you are trying to force it to do something it was not designed for.
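For comparison, here is roughly what an HTTP POST file upload puts on the wire. In PHP, cURL builds this body for you when a field value is a CURLFile object; the sketch below assembles a multipart/form-data body (RFC 7578) by hand only to show that one request can carry an XML part and the raw, unencoded file bytes side by side. The field names, filename, and boundary format are assumptions for illustration.

```php
<?php
// Sketch: a hand-built multipart/form-data body with an XML part and a
// binary file part. Field names, filename and boundary are hypothetical.

function buildMultipart(array $parts, string $boundary): string
{
    $body = '';
    foreach ($parts as $part) {
        $body .= "--{$boundary}\r\n";
        $disposition = 'form-data; name="' . $part['name'] . '"';
        if (isset($part['filename'])) {
            $disposition .= '; filename="' . $part['filename'] . '"';
        }
        $body .= "Content-Disposition: {$disposition}\r\n";
        $body .= "Content-Type: {$part['type']}\r\n\r\n";
        $body .= $part['data'] . "\r\n"; // raw bytes, no base64 needed
    }
    return $body . "--{$boundary}--\r\n"; // closing delimiter
}

$binary = random_bytes(64);
// Random, so it is vanishingly unlikely to occur inside the file bytes.
$boundary = 'boundary-' . bin2hex(random_bytes(8));

$body = buildMultipart([
    ['name' => 'meta', 'type' => 'application/xml',
     'data' => '<meta><title>Quarterly report</title></meta>'],
    ['name' => 'upload', 'filename' => 'report.pdf',
     'type' => 'application/octet-stream', 'data' => $binary],
], $boundary);

// The matching request header would be:
// Content-Type: multipart/form-data; boundary=<the same $boundary>
echo strlen($body), " bytes in body\n";
```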

auxone
09-10-2008, 05:00 PM
Thanks again. I suppose I should POST the file, but the XML in question is actually sent in response to a GET from the client application. The XML data associated with it (the stuff that isn't in the <file></file> tag) is important too, so I'll have to find a way to group them logically in the transmission.

For security, the files are stored outside the web root (deeper than public_html), so just listing a file's location in the response won't be sufficient.

Hmm... maybe I have to rethink this whole system.

NogDog
09-10-2008, 06:21 PM
Rethinking is often good. :)

In theory, XML with an encoded file could work, but it seems sub-optimal, especially if the files are large. (How big is "large"? I'm not sure.)
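If the encoded-file route is kept for larger files, one way to avoid holding the whole file in memory is to encode it in pieces. This is a sketch under the assumption that the response is written to a stream; `streamBase64()` is a hypothetical helper, not a PHP built-in.

```php
<?php
// Sketch: base64-encode a large file piece by piece instead of reading it
// whole with file_get_contents(). Base64 works on 3-byte groups, so encoding
// only whole groups per step and carrying the leftover 0-2 bytes forward
// produces exactly the same text as encoding the file in one go.

function streamBase64(string $inPath, $out, int $chunkBytes = 3 * 64 * 1024): void
{
    if ($chunkBytes <= 0 || $chunkBytes % 3 !== 0) {
        throw new InvalidArgumentException('Chunk size must be a positive multiple of 3');
    }
    $in = fopen($inPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inPath");
    }
    $buffer = '';
    while (!feof($in)) {
        $data = fread($in, $chunkBytes);
        if ($data === false) {
            fclose($in);
            throw new RuntimeException("Read error on $inPath");
        }
        $buffer .= $data;
        $whole = strlen($buffer) - strlen($buffer) % 3;  // bytes in complete 3-byte groups
        fwrite($out, base64_encode(substr($buffer, 0, $whole)));
        $buffer = substr($buffer, $whole);               // keep 0-2 leftover bytes
    }
    fwrite($out, base64_encode($buffer));                // final group, '='-padded if needed
    fclose($in);
}

// Usage: php://memory stands in for php://output, which a real script
// would write to directly.
$original = random_bytes(500000);
$src = tempnam(sys_get_temp_dir(), 'big');
file_put_contents($src, $original);

$out = fopen('php://memory', 'w+b');
fwrite($out, '<response><file>');
streamBase64($src, $out, 3 * 1000);
fwrite($out, "</file></response>\n");
rewind($out);
$xml = stream_get_contents($out);
unlink($src);
```

PHP also ships a stream filter that does the same job (`stream_filter_append($fp, 'convert.base64-encode')`); the manual loop just makes the 3-byte alignment visible. Either way, the encoded output is still about a third larger than the file.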

You might want to look into DIME (Direct Internet Message Encapsulation, http://msdn.microsoft.com/en-us/library/aa480488.aspx), which is meant for sending binary attachments alongside SOAP messages, if it's possible to set up the interface as a SOAP implementation.

auxone
09-10-2008, 07:02 PM
Thanks for the suggestions, but I am really set on REST, which is what the system is most like now. I'll just keep brainstorming on a better way. Maybe I will put the files in a web-accessible directory and just link to them in the <file></file> tag. I'll just have to read up on keeping them secure, as the files are very sensitive.
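If the <file> element ends up carrying a link rather than the file itself, one detail worth handling is escaping: a URL with a query string contains `&`, which must appear as `&amp;` in XML text or the response is not well-formed. A minimal sketch, with a hypothetical URL and element names, that lets DOMDocument do the escaping:

```php
<?php
// Sketch: put a link in <file> and let the DOM extension escape it.
// The URL and element names are hypothetical.

$url = 'https://example.com/download.php?id=42&format=pdf';

$doc  = new DOMDocument('1.0', 'UTF-8');
$root = $doc->appendChild($doc->createElement('response'));
$file = $root->appendChild($doc->createElement('file'));
// createTextNode() escapes '&' on output; createElement()'s optional
// value argument is not escaped, so it is avoided here.
$file->appendChild($doc->createTextNode($url));

$xml = $doc->saveXML();
echo $xml;

// Reading it back yields the original, unescaped URL.
var_dump((string) simplexml_load_string($xml)->file === $url); // bool(true)
```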

Thanks again for your consult.