Is an URL a good primary key? (or what else?)

From: Lorenzo Lazzeri <spider75_at_gmail.com>
Date: Wed, 6 Oct 2010 10:43:40 -0700 (PDT)
Message-ID: <ac427bda-fd94-4c9c-b563-dd6c798447f7_at_26g2000yqv.googlegroups.com>



I'm designing a database for an application which has to do with web URLs.
I'll have tables to store sites attributes as well as specific pages and RSS/Atom feeds. I'll have millions of them. Everything has an URL which is by definition unique. I'm in the "natural key" bandwagon, but I have doubts to use the URLs themselves as primary key, for the reason that they can be very (very) long. The HTTP standard doesn't set a limit, and the smaller max size limit imposed by a browser is 2048 characters. So I need to have URLs field of at least CHAR(2048).
If I use URL as primary key, will that size create problems of performance in joins?

So I tried with a surrogate key, the sha1() hash of the URL. It is very useful, still long 40 chars, but it has all the problems of surrogate key: it says nothing about the data it refers to if taken by itself, and it needs some triggers and on update cascade costraints to keep the whole schema coherent if something (the url...) changes. Moreover, I need a further key on the url to speed up searches.

I want to avoid auto incremental id for the well known reasons: moreover it has to do with the data even less than the hash (i.e. nothing).

So I'm looking for and advice on it. Should I use URLs as primary key or what else? Is that performance problem on join really an issue or can I ignore it? May be because the index is already treated as an hash internally by the db?

In the meantime I'm exploring another solution. Since I'll have at least seven tables with URL as discriminant (and couting as I add features), maybe I should remove the URL from that tables and store them in another table as schema, domain, port, path, query_string, auth_user, auth_password. Then I could use the url hash on the other tables as foreign key as well as primary key. Am I totally wrong on this way?

For the sake of completeness I'm designing for MySQL 5.1, even if I'd like to avoid non-standard constructs.

Thank you in advance for any answers.

Lorenzo Lazzeri

P.S. Please forgive my english syntax and don't hesitate to ask me clarifications where needed. Received on Wed Oct 06 2010 - 19:43:40 CEST

Original text of this message