Mazes, trees, and forests

From: mAsterdam <>
Date: Fri, 23 Apr 2004 09:34:31 +0200
Message-ID: <4088c70c$0$574$>

Mazes, trees, and forests.

Say we have a populated database. For the ease of discussion let's assume it is a relational database.

Definition of "biggest": The biggest relation is the relation with the largest number
of data-items:

	# of tuples x # of non-key attributes
         max Pop(R) = Tup(R) x NKA(R)

Algorithm: Forestify database.

Find minimum loss trees:

A1. Take the "biggest" relation, R1.

A2. Walk up from R1 along the foreign-keys of R1 to R2. Take the first candidate R2 to be the biggest relation to which an FK in R1 refers.
Etcetera until we have a relation with no foreign keys. This gives us candidate tree Tree(R1,1). Repeat A2 until we walked all trees.
Delete all population in the biggest tree from the database.

How much data (sum of (#T x #NKA) over all relations)) is left? Repeat from A1 n times on the remaining data.

How much data is deleted after 1 time, after n? In other words how much of the database
can be covered by just n trees?

This tells us something about wether and how much of the data in the database can be expressed as trees and the loss we incur by doing so.

I suspect that in many actual databases the size of the database very rapidly shrinks in the first few iterations. In other words I suspect that that a small forest can be rather expressive. Received on Fri Apr 23 2004 - 09:34:31 CEST

Original text of this message