Click to See Complete Forum and Search --> : XML to database


jvanamali
01-25-2009, 05:51 PM
Here is my problem,

my client has several customers who send him xml files. I need to dump all those xml files data into an existing table (insert and update depending on key)

The problem the all xml do not have the same structure, i.e there are a few common that every xml has , but they are others that the customer may or may not fill depending on his choice.

For Example

Customer 1 XML might have the following
<root>
<product>
<PrimaryField1>111</PrimaryField1>
<MandatoryField2>aaa<MandatoryField2>
<MandatoryField3>bbb</MandatoryField3>
<ExtraField1>extrainfo</ExtraField1>
<ExtraField3>extrainfo</ExtraField3>
</product>
<product>
<PrimaryField1>222</PrimaryField1>
<MandatoryField2>aaa<MandatoryField2>
<MandatoryField3>bbb</MandatoryField3>
<ExtraField1>extrainfo</ExtraField1>
<ExtraField3>extrainfo</ExtraField3>
</product>
</root>

Customer 2 XML might have the following
<root>
<product>
<PrimaryField1>555</PrimaryField1>
<MandatoryField2>aaa<MandatoryField2>
<MandatoryField3>bbb</MandatoryField3>
<ExtraField1>extrainfo</ExtraField1>
<ExtraField3>extrainfo</ExtraField3>
</product>
<product>
<PrimaryField1>666</PrimaryField1>
<MandatoryField2>aaa<MandatoryField2>
<MandatoryField3>bbb</MandatoryField3>
<ExtraField2>sf</ExtraField2>
<ExtraField4>extrainfo</ExtraField4>
</product>
</root>

The table has all the fields the primary fields,mandatory fields and all the other fields for which customer(s) may or may not give data.

What is the best possible to do this.

I tried generating an insert statement depending on the xml structure but that made the system slow, it was almost impossible to work with large xml files. some of the xml files have more than million records.

chazzy
01-25-2009, 06:26 PM
no matter how you shake it, inserting a million records it going to take some time. i would say though, create a single insert statement and just repeatedly bind parameters to it. also, send parameters that are missing as nulls, rather than not sending them, since in the file they are in fact null.

without seeing your code though, it's unclear how to tell where the bottleneck may be, IE maybe its in the XML parsing or the execution or something else.

jvanamali
01-25-2009, 08:07 PM
currently what i am doing is the changing the insert statement and looping through the file and inserting/ updating to the database.


But what i tried earlier was , get the xml into dataset,create an insert statement first and just loop through the records. I used the code below to generate the insert statement

private string GenerateInsertCommand(DataSet ds)
{
string str = "";

str = String.Format("INSERT INTO products_xml ('PROGRAMNAME','PROGRAMURL','LASTUPDATED','NAME','KEYWORDS','DESCRIPTION','SKU','MANUFACTURER','MANU FACTURERID','UPC','ISBN','CURRENCY','SALEPRICE','PRICE','RETAILPRICE','FROMPRICE','BUYURL','IMPRESSI ONURL','IMAGEURL','ADVERTISERCATEGORY','THIRDPARTYID','THIRDPARTYCATEGORY','AUTHOR','ARTIST','TITLE' ,'PUBLISHER','LABEL','FORMAT','SPECIAL','GIFT','PROMOTIONALTEXT','STARTDATE','ENDDATE','OFFLINE','ON LINE','INSTOCK','CONDITION','WARRANTY','STANDARDSHIPPINGCOST')" +
"VALUES ({0},{1},{2},{3},{4},{5},{6},{7},{8},{9},{10},{11},{12},{13},{14},{15},{16},{17},{18},{19},{20},{21} ,{22},{23},{25},{26},{27},{28},{29},{30},{31},{32},{33},{34},{35},{36},{37},{38},{39})",
getColumnIndex("PROGRAMNAME", ds), getColumnIndex("PROGRAMURL", ds), getColumnIndex("LASTUPDATED", ds), getColumnIndex("NAME", ds), getColumnIndex("KEYWORDS", ds), getColumnIndex("DESCRIPTION", ds), getColumnIndex("SKU", ds), getColumnIndex("MANUFACTURER", ds), getColumnIndex("MANUFACTURERID", ds), getColumnIndex("UPC", ds),
getColumnIndex("ISBN", ds), getColumnIndex("CURRENCY", ds), getColumnIndex("SALEPRICE", ds), getColumnIndex("PRICE", ds), getColumnIndex("RETAILPRICE", ds), getColumnIndex("FROMPRICE", ds), getColumnIndex("BUYURL", ds), getColumnIndex("IMPRESSIONURL", ds), getColumnIndex("IMAGEURL", ds), getColumnIndex("ADVERTISERCATEGORY", ds),
getColumnIndex("THIRDPARTYID", ds), getColumnIndex("THIRDPARTYCATEGORY", ds), getColumnIndex("AUTHOR", ds), getColumnIndex("ARTIST", ds), getColumnIndex("TITLE", ds), getColumnIndex("PUBLISHER", ds), getColumnIndex("LABEL", ds), getColumnIndex("FORMAT", ds), getColumnIndex("SPECIAL", ds), getColumnIndex("GIFT", ds),
getColumnIndex("PROMOTIONALTEXT", ds), getColumnIndex("STARTDATE", ds), getColumnIndex("AUTHOR", ds), getColumnIndex("ENDDATE", ds), getColumnIndex("OFFLINE", ds), getColumnIndex("ONLINE", ds), getColumnIndex("INSTOCK", ds), getColumnIndex("CONDITION", ds), getColumnIndex("WARRANTY", ds), getColumnIndex("STANDARDSHIPPINGCOST", ds));

return str;

}

private string getColumnIndex(string columnName, DataSet ds)
{
int i = 0;
string strColumnIndex = "NULL";
for (i = 0; i < ds.Tables[0].Columns.Count; i++)
{
if (columnName.ToLower().Trim() == ds.Tables[0].Columns[i].ColumnName.ToLower().Trim())
{
strColumnIndex = "ds.Tables[0].Rows[i][" + i + "]";
//strColumnIndex = "strColumn" + i;
}
}

return strColumnIndex;
}

this generates insert statement , but while looping through the records i need to replace ds.Tables[0].Rows[i][ColNum] with actual data.

so i had to write another loop to replace, so the number of total iterations became noofRecords * noofcolumns

I want a method in which i can just loop through the records in an efficient way.

Even to insert null i should know whether the col is existing in the xml or not, for which i need to write loop again to run through all the cols, which is increasing no of iterations .