Mazes, trees, and forests
Date: Fri, 23 Apr 2004 09:34:31 +0200
Mazes, trees, and forests.
Definition of "biggest": The biggest relation is
the relation with the largest number
# of tuples x # of non-key attributes max Pop(R) = Tup(R) x NKA(R)
Algorithm: Forestify database.
Find minimum loss trees:
A1. Take the "biggest" relation, R1.
A2. Walk up from R1 along the foreign-keys of R1 to R2.
Take the first candidate R2 to be the biggest relation
to which an FK in R1 refers.
Etcetera until we have a relation with no foreign keys. This gives us candidate tree Tree(R1,1). Repeat A2 until we walked all trees.
Delete all population in the biggest tree from the database.
How much data (sum of (#T x #NKA) over all relations)) is left? Repeat from A1 n times on the remaining data.
How much data is deleted after 1 time, after n?
In other words how much of the database
can be covered by just n trees?
This tells us something about wether and how much of the data in the database can be expressed as trees and the loss we incur by doing so.
I suspect that in many actual databases the size of the database very rapidly shrinks in the first few iterations. In other words I suspect that that a small forest can be rather expressive. Received on Fri Apr 23 2004 - 09:34:31 CEST