www.webdeveloper.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 23

Thread: Multiple Values in One MySQL Column?

Hybrid View

  1. #1
    Join Date
    Oct 2009
    Posts
    13

    Exclamation Multiple Values in One MySQL Column?

    I have a site with a product table in a MySQL Database. The basic structure, simplified for example, is as follows:

    Item Name | Item Description | Color

    My problem is, some items have multiple color choices. For example, I might have something similar to the following in my Database:

    Item One | This Is Item One | Blue
    Item Two | This Is Item Two | Green
    Item Three | This Is Item Three | Blue, Purple, Green
    Item Four | This Is Item Four | Green, Red

    There are several colors that need to be included for items, but I am unsure how to store them. I need the user to be able to select the color from a drop down next to the item, which will then fill in a PHP generated PayPal button.

    Any help or advice would be appreciated. Thank you for your time

  2. #2
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,176
    Multiple data elements in one column is a no-no (or "non-normal" if you want to get more technical ).

    You probably want to create a separate table which would be a many-to-one relation to the item table. It would, at a minimum, have two columns: item_id and color. You could even have a color table that would have an auto-increment key and the color name, in which case the color relation table's columns would be item_id and color_id. You then would just use a JOIN or two in your queries to get the item and its color.

    If this is new to you, you might want to take a look at http://dev.mysql.com/tech-resources/...alization.html.
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  3. #3
    Join Date
    Oct 2009
    Posts
    13
    Sounds good. Thanks for the information, I appreciate it. I'll definitely check out that link as well, and do some more research.

  4. #4
    Join Date
    Oct 2009
    Posts
    13
    Well, relational tables definitely looks like the way to go.

    I'm still wondering, what would the structure of the Color table be? I think I've been guilty of the "Spreadsheet Syndrome" in the past, as the article you posted describes, and now "Normalizing" is throwing some learning curves my way, haha.

    Also, how would I go about joining the tables? If I have multiple colors for some items..

    I'll continue researching, but any additional information you could provide would be greatly appreciated.

  5. #5
    Join Date
    Jan 2007
    Location
    Wisconsin
    Posts
    2,120
    Before you spend hours normalizing your database and writing complex queries to deal with normalized data, you need to determine whether the intended normalization is beneficial. (sometimes its not) People can get carried away with normalization. And, before you can normalize your data in a beneficial way, you need to determine which of your fields or attributes are really also objects.

    Some questions questions to consider:

    • Does each color (like "blue") have it's own properties? (that you need to store)
    • Will you ever need to query based on the a single item from the list of attribute X? (do you ever need to select all products where color contains blue?)
    • Do colors (like blue) need to regularly changed globally (like to green or light-blue)?


    Now, this ...
    Multiple data elements in one column is a no-no (or "non-normal" if you want to get more technical ).
    ... points to an important question: Is it a list of colors or string of colors? To illustrate the point, consider the Item Name field. Is it a list of characters or a string of characters? Or, is it a list of words or a string of words? To assist in making this decision, refer to previous list of questions ...

    Now, in your case, NogDog is probably right. Color may best be reference to another entity. But, before you make the design decision--be sure. And, if you're just getting into database design and normalization, you need to take care to think in terms of how you intend to query the database and think about the data. You can't correctly normalize without first philosophizing.

    And then, after crafting the ideal image of your data, you need to reconcile it with efficient operation. There's no sense thinking of colors as a separate entity if it only serves to complicate your queries and/or force you to pull more rows (which you'll ultimately recompile into a single record).

    ... sorry. I don't mean to sew confusion--even less do I mean to contradict NogDog. He's absolutely right. But, I also mean to spread a word of caution when normalizing data: it can become overzealous and detrimental very quickly ...
    Jon Wire

    thepointless.com | rounded corner generator

    I agree with Apple. Flash is just terrible.

    Use CODE tags!

  6. #6
    Join Date
    Nov 2008
    Posts
    2,477
    Quote Originally Posted by svidgen View Post
    you need to determine whether the intended normalization is beneficial. (sometimes its not)...
    Care to venture an example? I can't think of any situation in which you'd want to be using text fields with csv data, or loads of order1, order2, order3 etc columns in a table, or directly writing things like colour names into a field which could just hold a simple indexed integer foreign key.

    I would be very wary about following advice which disagrees with the importance of normalizing database tables. To quote the article posted above, "with this goal come benefits in the form of reduced redundancies, fewer anomalies, and improved efficiencies" - not sure why anyone would advise against that?
    Last edited by Mindzai; 03-11-2010 at 01:03 PM.
    The first rule of Tautology Club is the first rule of Tautology Club.

  7. #7
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,176
    Quote Originally Posted by Swoods View Post
    Well, relational tables definitely looks like the way to go.

    I'm still wondering, what would the structure of the Color table be? I think I've been guilty of the "Spreadsheet Syndrome" in the past, as the article you posted describes, and now "Normalizing" is throwing some learning curves my way, haha.

    Also, how would I go about joining the tables? If I have multiple colors for some items..

    I'll continue researching, but any additional information you could provide would be greatly appreciated.
    One possible implementation:
    Code:
    item
    =============
    id  (int, auto-increment PK)
    name
    description
    et_cetera
    
    color
    ==============
    id (int, auto-increment PK)
    color
    
    item_color
    ===============
    item_id (int)
    color_id (int)
    (PK would be set on the combination of both columns)
    Sample query:
    PHP Code:
    $id = (int) $_POST['item_id'];
    $sql "
    SELECT
       item.id,
       item.name,
       GROUP_CONCAT(color.color SEPARATOR ',') AS colors
    FROM item
       INNER JOIN item_color AS ic ON ic.item_id = item.id
       INNER JOIN color ON ic.color_id = color.id
    WHERE item.id = 
    $id
    GROUP BY item.id
    "

    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  8. #8
    Join Date
    Dec 2005
    Posts
    2,984
    I'd like to semi-firmly disagree (if any experts would like to shut me up, please do so) with the idea that multiple values in a single field is a bad idea. I know, most everybody here said it's 'usually' a bad idea or something along those lines, which I completely agree with.

    But what about the 'set' type of a column in mysql? It was created for a reason, wasn't it? I've recently worked on a project in which the client gathers large amounts of data about lots of different people in which they can dynamically create fields which are, among other things, drop down, or multiple select fields. I found set and enum fields very useful for this type of project...
    Last edited by aj_nsc; 03-11-2010 at 02:06 PM.
    I've switched careers...
    I'm NO LONGER a scientist,
    but now a web developer...
    awesome.

  9. #9
    Join Date
    Jan 2007
    Location
    Wisconsin
    Posts
    2,120
    Before I head off to my meeting, I'd like to quickly point out that I'm not advocating widespread denormalization here. I wasn't originally intending to advocate denormalization at all. My point more along the lines of taking care when distinguishing between what is and is not an entity: entities are independent of other entities in some significant capacity.

    A list of colors ... well, each of those colors may be a normalizable entity. On the other hand, maybe it's just a string ... Again--not saying it IS just a string in this case, just that it could be.

    And, even if it IS a set of entities, you may want to denormalize for the sake of efficiency. I mean, are you really worried about having to perform a mass-update to change all occurrences of blue to ... not-blue?

    ... just ... don't get zealous. Application performance suffers under two conditions: under-zeal and over-zeal. Don't zeal too much in either direction. that's all ... let's not blow **** out of proportion here.
    Jon Wire

    thepointless.com | rounded corner generator

    I agree with Apple. Flash is just terrible.

    Use CODE tags!

  10. #10
    Join Date
    Nov 2008
    Posts
    2,477
    Quote Originally Posted by aj_nsc View Post
    I'd like to semi-firmly disagree (if any experts would like to shut me up, please do so) with the idea that multiple values in a single field is a bad idea. I know, most everybody here said it's 'usually' a bad idea or something along those lines, which I completely agree with.

    But what about the 'set' type of a column in mysql? It was created for a reason, wasn't it? I've recently worked on a project in which the client gathers large amounts of data about lots of different people in which they can dynamically create fields which are, among other things, drop down, or multiple select fields. I found set and enum fields very useful for this type of project...
    Using ENUM is not the same thing - ENUM fields are stored internally as separate, integer values - data is not actually stored as text in the same way as stuffing csv data into a field. The performance issues alone with the CSV approach is enough to make it a bad idea, and that's before you even get in to the other implications such as future expansion and changes. It's just a bad idea, good, normalized database tables are always best.
    The first rule of Tautology Club is the first rule of Tautology Club.

  11. #11
    Join Date
    Jan 2007
    Location
    Wisconsin
    Posts
    2,120
    The performance issues alone with the CSV approach is enough to make it a bad idea, and that's before you even get in to the other implications such as future expansion and changes. It's just a bad idea, good, normalized database tables are always best.
    There's a lot more debate out this in scholarly settings than you're letting on (or are possibly even aware of). Denormalization is a very common, well-accepted practice. If you're going to insist that it's not acceptable, you need to justify the position.

    Please explain why it is always inefficient to store comma-delimited data in an table column. Take various access patterns into account.

    For instance, I would argue that storing colors as a comma-delimited string lowers the complexity of accessing a produce from O((M/N)*Log(M)*Log(N)) to (Log(M)) where N is the total number of color assignments used (relationships between a color and a product) and M is the number of products in the products table.

    Log(N) is the cost of the product lookup, assuming a BTREE index.
    Log(M) is the cost of the color lookup, assuming a BTREE index.
    M/N is the average cost of joining the product to its colors.

    So, by denormalizing the data, you're looking at saving a M/N Log(N) lookups, each of which will be a higher cost than the single Log(M) lookup needed for the denormalized form. The cost is the inability to efficiently search, update, or remove single colors globally across all product rows. Of course, making those processes efficient requires yet another table, which introduces another lookup ...

    So, if you don't need to do global color updates on a regular basis or search for products by single colors, it's probably more efficient and sensible to store them in a single field. For this application, I'm assuming the ability to do these things needs to be present, making this approach incorrect--but you cannot generalize and say that this a universally incorrect approach. That is not only incorrect, but irresponsibly mis-informative!!!

    ... forgive the hasty complexity estimates here. In a rush ...
    Jon Wire

    thepointless.com | rounded corner generator

    I agree with Apple. Flash is just terrible.

    Use CODE tags!

  12. #12
    Join Date
    Nov 2008
    Posts
    2,477
    Quote Originally Posted by svidgen View Post
    For instance, I would argue that storing colors as a comma-delimited string lowers the complexity of accessing a produce from O((M/N)*Log(M)*Log(N)) to (Log(M)) where N is the total number of color assignments used (relationships between a color and a product) and M is the number of products in the products table.

    Log(N) is the cost of the product lookup, assuming a BTREE index.
    Log(M) is the cost of the color lookup, assuming a BTREE index.
    M/N is the average cost of joining the product to its colors.

    So, by denormalizing the data, you're looking at saving a M/N Log(N) lookups, each of which will be a higher cost than the single Log(M) lookup needed for the denormalized form.
    However this is not the end of the story because you now have to post process all your data to get it in a useable form. You also have potential issues with large indexes from TEXT fields used to store the CSV (Postgre in particular has issues), your tables are locked whilst maintaining these larger indexes, and you also have to use more storage for the TEXT field when you could have used an INT. There are more factors involved here.

    The cost is the inability to efficiently search, update, or remove single colors globally across all product rows.
    That's certainly one of the other costs, though there are more.
    The first rule of Tautology Club is the first rule of Tautology Club.

  13. #13
    Join Date
    Jan 2007
    Location
    Wisconsin
    Posts
    2,120
    However this is not the end of the story because you now have to post process all your data to get it in a useable form.
    Well ... yes. And, storing a comma delimited list of colors is no more difficult--if not easier--to process than iterating through a resultset and throwing away redundant data.

    PHP Code:
    // it's a one liner ... 
    $colors preg_split(",\s+"$row['colors']); 
    Table locking? ... only if that field is in your where clause. And my assumption is that you wouldn't do that in this case. In fact, that's the key pressupposition for storing the data that way--it's not indexed!
    Jon Wire

    thepointless.com | rounded corner generator

    I agree with Apple. Flash is just terrible.

    Use CODE tags!

  14. #14
    Join Date
    Aug 2004
    Location
    Ankh-Morpork
    Posts
    19,176
    I think in large part it comes down to whether you may be (now or in the likely foreseeable future) wanting to search the database based on one of those individual values. If so, it should be more efficient to use a separate relational table with individual values, allowing an index on that column to help optimize searches and sorting. If you clump them into one field, say "blue,red,green" and then want to search for all items which include "green", the DBMS will likely have to do a full table read each time in order to parse each field based on some sort of regexp, LIKE, or other parsing mechanism.

    If, on the other hand, all you expect to do is extract the colors and display them in some manner (say, a select form object to choose which color to purchase) and you have no reasonable expectation to use it in any other manner where you would need to search/sort/group by color, then I would have no real problem with keeping it "clumped" (though probably I'd still separate it into another table, as it's really not all that difficult one you get the hang of doing simple JOINs, and then I'm ready for any future enhancements where I might decide I need it in a separate table).
    "Please give us a simple answer, so that we don't have to think, because if we think, we might find answers that don't fit the way we want the world to be."
    ~ Terry Pratchett in Nation

    eBookworm.us

  15. #15
    Join Date
    Jan 2007
    Location
    Wisconsin
    Posts
    2,120
    Quote Originally Posted by NogDog View Post
    I think in large part it comes down to whether you may be (now or in the likely foreseeable future) wanting to search the database based on one of those individual values ...
    agreed.
    Jon Wire

    thepointless.com | rounded corner generator

    I agree with Apple. Flash is just terrible.

    Use CODE tags!

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
HTML5 Development Center



Recent Articles