-CELKO- wrote:
> Observation:
>
> Nobody here proposed looking at Industry Standards. Why do we like to
> re-invent the wheel?
Well, yeah, except for maybe me. If you go back and take a closer look
at my original post (which I'll admit is prolix, as my technical
writing generally is), you'll see that I specified the industry-related
codes which I have considered, and rejected, for use in this model.
Just a quick, enumerated recap:
- The primary identity used by the international intellectual property
establishment (which transcends the RIAA and the various music
licensors) is the ISRC. I found two main weaknesses for using it for my
purposes:
- It is attached at the track, rather than the album level.
- While the ISRC is supposed to be unique per track, ISRCs are
recycled between albums if the label believes the track to be
identical. This is a problem because labels frequently hide uncredited
edits of the tracks, so you can have tracks with identical ISRCs but
which differ by minutes in length. To use ISRCs as part of a candidate
key, you need Tracks(isrc, duration), and not Tracks(isrc) alone.
- The primary identity used by music retailers and distributors is the
ISBN / UPC / EAN associated with the barcode(s) on the release. Anyone
familiar with buying and selling independent media knows that the
glaring problem with using that as an identifier for releases is the
plethora of independent labels (and distributors) who are free-wheelin'
and fun-lovin' and can't be bothered with obtaining either. As DIY
distribution has grown more sophisticated in the Internet age, this has
become less prevalent, but there's still a HUGE volume of music
released without these codes.
- Musicbrainz has taken a whack at providing unique identifiers for
both albums (independent of release status, which is exactly what I'm
looking for) and tracks. It's a noble effort to bring some stability to
the chaotic sea of musical metadata on the Internet (Musicbrainz was
invented to help identify random audio files you have sitting on your
hard drive), but it's not an authoritative source. Also, it uses GUIDs,
and you yourself said GUIDs were a bad choice for keys, Mr. Celko.
- Various libraries (including the Library of Congress) have defined
categorization schemas for audio recordings. They're useful for
classification but collapse under the weight of popular music, do not
cover much music actually actually released and consumed by average
listeners, and are designed for classification and not identification.
- Muze, Gracenote and the FreeDB project all provide lookup services
for CDs (which is to say, releases rather than albums). They use the
embedded table of contents on the CD to define identity (although
anyone who's used one of the covered services knows that there are disc
collisions all the time), but the identifiers they use are, again,
basically surrogate keys into their own schemas. Also, none of them
even pretend to have uniqueness or sound relational principles as a
design goal. And some of them cost money to use.
> But why did Roy
> miss the entire music industry? This is the guy who knows about
> additive congruential methods of generating values in pseudo-random
> order!
Roy's answers have been extremely helpful to me. Without everyone's
input, I never would have arrived at my present schema, which I think
is going to work. When I get it a little fleshed out, I'll post the DDL
for the relevant tables for everyone's dissection / approval /
castigation.
> I offer that we are still thinking like "procedural coders" and NOT
> like "database people" instead.
I submit that this has nothing to do with procedural vs relational
thinking. I included the results of my research in my original post,
which while not exhaustive are certainly pretty extensive, to show that
I had hit a wall because nobody's really tried to solve this problem
exactly the same way as me. After my research, I realized I was still
screwed. Also, even though I don't think it's particularly relevant, my
day job is working for a major music download service, so I'm working
closely with a real-world schema dealing with similar data on a daily
basis. My day job uses auto-generated sequence IDs like crazy, because
it turns out defining this model is hard.
Received on Fri Jan 20 2006 - 14:14:42 CST