Re: Optimizing long string foreign keys

From: Dave Hau <davehau_nospam_123_at_nospam_netscape.net>
Date: Fri, 08 Aug 2003 04:55:33 GMT
Message-ID: <3F332D45.3060605@nospam_netscape.net>

Boris Pelakh wrote:
> I have a schema design problem with which I would appreciate some
> advice. In my setup I have two tables, each having a very long string
> primary key (can easily be 1KB). More so, each row in table B refers
> to a row in Table A, so it has
> to store a copy of the TableA.key as well. I am concerned about
> performance,
> with the duplicate storage, the very long records, and the very long
> key.
>
> How can I resolve this problem ? I have considered auto-sequencing
> table A,
> and storing just the sequence number as a reference in table B, but I
> have
> a frequent need to do the following
>
> select <fields> from TableB where TableA_key = '<some value>';
>
> which would now force a JOIN. Is there a better solution ?
>
> I appreciate any help,
> Boris

I would still recommend auto-sequencing table A because the resulting join will still be faster than the original query with the long TableA_key included in TableB.

Here's how the original query (without joining) would be processed:

Look up b-tree index on TableA_key. (TableA_key is a foreign key in this table so each value can appear multiple times. So this index can potentially be huge because each long string can potentially be duplicated many times in this index. Also, each node traversal involves matching a long string which can be slow.) Obtain rowid's for matched rows.
Retrieve matched rows in the heap-organized table based on rowid(s). Again each row in the table is bigger than it needs to be because TableA_key is a long string.

Here's how the joined query (using a sequence for the primary key of table A) would be processed:

Look up b-tree index on TableA.longstring. Because this string is unique for each row in TableA (it used to be the primary key of TableA so I assume it must be unique), this index is smaller than the index on TableA_key in TableB that we saw above.) Get the rowid for the matched row.
Retrieve the matched row in TableA. Get the primary key (which is now an integer.)
Use this integer to look up a b-tree index on TableB.key (foreign key, which is also an integer.) This index is much smaller (because it consists of integers instead of long strings) and also each node traversal now involves matching an integer instead of a long string.
Retrieve the matched row in TableB based on the rowid obtained in Step 3.

If it's a data warehouse instead of OLTP, and TableB.foreignkey has a *low cardinality*, I would also use a bitmap join index if you're using 9i, so that the tables can be pre-joined. If the cardinality of TableB.foreignkey is low, then the performance improvement is well worth the extra storage required for the index.

CREATE BITMAP INDEX bji ON TableB (TableA.longstring) FROM TableB, TableA where TableB.foreignkey = TableA.primarykey

Cheers,
Dave Received on Thu Aug 07 2003 - 23:55:33 CDT