Click to See Complete Forum and Search --> : Column type/size for url shortening site?
petewilliams
01-22-2008, 09:34 AM
I'm building a URL shortening site along the lines of TinyURL, just for practice.
I'm currently designing my database tables and am not sure what data type and size to use for the unshortened URL?
I was thinking probably VARCHAR is best, but what maximum length should I set? Some URLs can run into several hundred characters but if I made the column say 5,000 characters long, would that significantly slow down my site?
Thanks,
Pete
TJ111
01-22-2008, 09:40 AM
http://www.boutell.com/newfaq/misc/urllength.html
I'd say 2,000 is a pretty safe bet, I doubt you'll run into any URL's longer then that. Use CHAR instead of VARCHAR, as it will have a fixed length making it easier to index.
petewilliams
01-22-2008, 10:18 AM
Thanks TJ, interesting to see that different browsers support drastically different URL lengths!
I guess the problem I'm facing is that I want to accept as wide a range of URLs as possible while at the same time maximising performance.
Let's say I've got a URL that's 200 characters long, is there any performance difference if this was stored in a VARCHAR(300) or a VARCHAR(5000)? Either way the data will be the same length, so I wonder if they'd perform the same, but the second would just give a larger potential allowance?
Pete
TJ111
01-22-2008, 10:35 AM
Here's a quote from the site I linked
Extremely long URLs are usually a mistake. URLs over 2,000 characters will not work in the most popular web browser. Don't use them if you intend your site to work for the majority of Internet users.
So 2000 maxlength is a pretty safe bet.
Ok, I'm not 100% positive about what I'm saying here, so don't take it as a FACT, it's based on my understanding of it.
If you use type VARCHAR(2000), it means it is a variable length of characters, so on every index lookup the database has to check the actual length of the content stored in that field.
If you use type CHAR(2000), it will assume every field contains 2000 characters, even if it does not. That means that the database will be slightly larger (each new row will allocate 2000 bytes for that field), but it will increase indexing performance as the database already knows the size of the content in that field.
Like I said that's probably not 100% syntactically correct, but it's the general idea behind the different field types/sizes and how it affects performance. I know someone on these forums can clarify further.
petewilliams
01-22-2008, 10:58 AM
I think in my case, using VARCHAR might not actually affect the query speed, since I will only be returning that column, rather than searching for a match within it. If that's true then using VARCHAR should help to minimise wasted space in my DB without taking a performance hit. I may be wrong though...
The other thing is that I think CHAR will only go up to 255 characters.
Pete
TJ111
01-22-2008, 11:14 AM
Your right that CHAR will only go to 255 in a MySQL db. Here's something to look at: http://www.raditha.com/wiki/SQL_character_types. Remember, each byte = 1 character. I think the type you use ultimately depends upon what type of database you are using.
In pgsql the choice of data type does not effect performance, the same cannot be said about the use of varchar, text and blob in mysql, where they lead to to slower inserts/selects especially with MyIsam tables.
Finally the only difference between blobs and text in mysql is that the former is case sensitive while the latter is not.