www.webdeveloper.com

Thread: Dumping XML feeds to MySQL??

  1. #1
    Join Date
    Nov 2005
    Posts
    9

    Question Dumping XML feeds to MySQL??

    Hi, I was wondering if anyone has experience with parsing large XML files, up to 5 GB, into MySQL.
    So far I haven't been able to come up with efficient code that will do that.
    The last solution I came up with was to cut the file into multiple files of 100 MB each and process them in batches, but that takes too much time, so I have to come up with something else. The major problem is that I'm on a shared host, so if a script takes too many server resources the process gets killed automatically.
    I just don't think XML feeds are efficient for storing large amounts of information.
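
    For reference, a stripped-down sketch of one streaming approach (XMLReader with batched inserts); the <product> element, its fields, and the table name are made-up placeholders, not the real feed layout:

    PHP Code:
    <?php
    // Sketch: XMLReader pulls one record at a time, so memory stays flat
    // no matter how big the feed is. All names below are placeholders.
    $db   = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    $stmt = $db->prepare('INSERT INTO products (sku, title, price) VALUES (?, ?, ?)');

    $reader = new XMLReader();
    $reader->open('feed.xml');

    $db->beginTransaction();
    $count = 0;
    while ($reader->read()) {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'product') {
            // Expand only this one record into a small DOM/SimpleXML node.
            $doc  = new DOMDocument();
            $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
            $stmt->execute(array((string)$node->sku, (string)$node->title, (string)$node->price));

            // Commit in batches so the shared host never sees one huge transaction.
            if (++$count % 1000 == 0) {
                $db->commit();
                $db->beginTransaction();
            }
        }
    }
    $db->commit();
    $reader->close();
    ?>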


    Best, DS

  2. #2
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,321
    I'm not sure there's any way to handle text files that large in PHP that will qualify as "efficient". Reading the whole file into memory via file() or file_get_contents() is likely going to cause memory usage problems, and reading a line at a time is probably going to be too slow.

    For that matter, I'm not sure that any language is going to efficiently read and parse a 5 GB XML file (at least not without a high-quality, dedicated server to do the work). It might be time to take a careful look and determine whether the whole concept makes sense, or if there's a simpler way to get the data you need. (For instance, how much duplicate and/or unwanted data is included in each XML file?)
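
    If you do end up chewing through a file like that in PHP, the old expat (SAX) interface at least keeps memory usage flat, since you feed it the file in small chunks. A rough sketch (handler bodies and element names are just stubs):

    PHP Code:
    <?php
    // Rough sketch: feed the expat (SAX) parser small chunks via fread(),
    // so memory use stays constant regardless of the file size.
    function startElement($parser, $name, $attrs) { /* e.g. a record element opened */ }
    function endElement($parser, $name)           { /* record element closed */ }
    function charData($parser, $data)             { /* text content between tags */ }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'startElement', 'endElement');
    xml_set_character_data_handler($parser, 'charData');

    $fp = fopen('feed.xml', 'r');
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);                    // only 8 KB in memory at a time
        if (!xml_parse($parser, $chunk, feof($fp))) {
            die(xml_error_string(xml_get_error_code($parser)));
        }
    }
    fclose($fp);
    xml_parser_free($parser);
    ?>

    Whether that finishes in any reasonable time on a shared host is another matter.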
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  3. #3
    Join Date
    Aug 2005
    Location
    The Garden State
    Posts
    5,634
    Isn't there a limit in PHP of 2 MB per file anyway? Or is that 2 GB?

    I just don't think XML feeds are efficient for storing large amounts of information.
    You're right, they're not. Who designed your XML to be 5 GB?

    Edit: What I mean to say is that you shouldn't be storing your data in XML. Look at MySQL's and Oracle's XML schema implementations for tips.
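
    For example, later MySQL versions can pull a feed straight into a table with LOAD XML. A rough sketch, assuming a products table whose columns match the feed's child elements and that LOCAL INFILE is allowed on the host:

    PHP Code:
    <?php
    // Sketch only: LOAD XML (MySQL 5.5+) maps each <product> child element to the
    // table column of the same name. Table, columns, and file path are assumptions.
    $mysqli = new mysqli('localhost', 'user', 'pass', 'shop');
    $mysqli->query("
        LOAD XML LOCAL INFILE '/path/to/feed.xml'
        INTO TABLE products
        ROWS IDENTIFIED BY '<product>'
    ");
    if ($mysqli->error) {
        echo $mysqli->error;
    }
    ?>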
    Last edited by chazzy; 06-15-2006 at 05:27 PM.
    Acceptable Use | SQL Forum FAQ | celery is tasteless | twitter

    celery is tasteless - currently needing some UI time

  4. #4
    Join Date
    Nov 2005
    Posts
    9

    Cool

    No, we don't store our data in XML format.
    But we display data from other websites, and most of them provide their feeds in XML format. I have no problem parsing regular comma- or tab-delimited text feeds, even when they're big. I'm dumping some feeds from HALF.com; they provide them as a regular text file, about 500 MB, with no problem there.
    Maybe I should convert the XML data to a regular text file first?
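
    A sketch of that idea: stream the XML once with XMLReader, write a tab-delimited file, then bulk-load it like the other feeds (<product> and its fields are placeholders for whatever the feed really uses):

    PHP Code:
    <?php
    // Sketch of the "convert to plain text first" idea: stream the XML once,
    // write a tab-delimited file, then let MySQL bulk-load the flat file.
    $reader = new XMLReader();
    $reader->open('feed.xml');
    $out = fopen('feed.tsv', 'w');

    while ($reader->read()) {
        if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'product') {
            $doc  = new DOMDocument();
            $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
            fwrite($out, $node->sku . "\t" . $node->title . "\t" . $node->price . "\n");
        }
    }
    fclose($out);
    $reader->close();

    // Then bulk-load the flat file, e.g.:
    // LOAD DATA LOCAL INFILE 'feed.tsv' INTO TABLE products
    //   FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' (sku, title, price);
    ?>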

    Best, DS

  5. #5
    Join Date
    Aug 2005
    Location
    The Garden State
    Posts
    5,634
    Well, how exactly are you parsing it? I hope not by hand, as that is very inefficient from a coding and processing standpoint. I don't see how any site on the web could possibly send 5 GB of data via XML and expect 1) anyone to parse it, and 2) not to have exorbitant amounts of wasted bandwidth.
    Acceptable Use | SQL Forum FAQ | celery is tasteless | twitter

    celery is tasteless - currently needing some UI time

  6. #6
    Join Date
    Nov 2005
    Posts
    9
    Guess what: Amazon does.
    We tried using the Amazon API and getting results directly from Amazon, which is still XML, but it's slow, especially when it returns multiple results.

    Cheers, DS
