I have come across a difficult problem while designing my
database; basically I guess it involves mapping my data to a
relational format.
Each of the records of my database has a variable number
of attributes, further, the attributes can have different
names. Each record corresponds to a disk file, which is a
binary image file containing lots of tags. Each tag is
identified by a name, length and value triple. We want to
put these tags into a database to facilitate finding required
files using searches of the tag space.
The problem is that a file can have any tags, ie the file
author can define and use his own tags. In a particular
subset of files the tags might be consistent, but across the
whole lot they will not be.
The tag names, I guess, correspond to the attributes in a
relational schema; but these cannot be fixed for the entire
data set. However one attribute at least is fixed, that is the
file name (fully qualified path names are unique, other than
that, it's a mush).
I was thinking of a lookup table for the rest of the tags, ie
have a table, call it tagstbl. It has an external key based on
the primary key of the filename table; attributes tag name
and tag value.
The problem is that there are approx 200 tags per file; and
the total number of files is large (and growing), I can't
imagine decent performance if I put all my tag lookups into
one table. Perhaps if I could segment the tags over 2 or
more tables (instead of 1 tagtbl) that would help, but I don't
know what criterion I could use.
Is there a generalized way to deal with mapping this data
organization to the relational domain? Is what I am
planning even appropriate? Any pointers are welcome.
Thanks in advance,
Nandan
BTW is this group moderated? my previous post has
disappeared...
Received on Mon Oct 28 2002 - 21:02:46 CET