List partitioning

From: Lee <Lee_at_JamToday.com>
Date: Sat, 01 Dec 2007 21:05:14 -0500
Message-ID: <fit40g$6tl$1@reader1.panix.com>

<BackgroundInfo>:

Oversimplifying for the sake of brevity:

An "RDF Triple" is simply a tuple consisting of a subject, predicate and object, thus (S,P,O).
A "triple store" is a table of triples.

They are of interest when they get to the Mega Row size and beyond.

Here is a snippet of a discussion about "MPT", a particular implementation of a triple store:

The traditional way of storing RDF in a relational database is to use a "big table of triples". Variations on this basic approach are employed by popular RDF storage engines including Jena, Sesame, and 3store.

Our approach, MPT ("Mapped Predicate Tables"), is different. Recognizing that the number of relationship types in real RDF data is much lower than the number of nodes, MPT distributes triples across several tables, each holding all the relationships of a certain type. This design offers efficient query plans for complex queries as well as an opportunity to scale across storage devices.

</BackgroundInfo>

I am not a DBA; however, it seems to me that what our heroes have done is to invent (reinvent?) the idea of table partitioning.

Their insight into their data is that the number of "predicates" is typically far smaller (they say 50, but let's say 50 to 1000 different predicates) than the number of triples (10s or even 100s of MegaRows).

But couldn't we get the same effect in Oracle by doing list partitioning
(one partition for every predicate) or range partitioning (one partition
for each group of predicates, say one partition for predicates starting with a-h, another for predicates starting with i-m, etc.)?
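
To make the two schemes concrete, here is a rough Python sketch of how triples would be routed under each one. It is only a toy model of the idea, not how Oracle (or MPT) does it internally; the sample triples, the partition names (p_a_h, p_i_m, p_rest) and the helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("s1", "author", "o1"),
    ("s2", "author", "o2"),
    ("s3", "creator", "o3"),
    ("s4", "label", "o4"),
    ("s5", "label", "o5"),
    ("s6", "label", "o6"),
    ("s7", "type", "o7"),
]


def list_partition(predicate):
    """List partitioning: one partition per distinct predicate."""
    return "p_" + predicate


# Each bound is an exclusive upper limit on the predicate's first letter,
# mirroring the a-h / i-m / everything-else split described above.
RANGE_BOUNDS = [("p_a_h", "i"), ("p_i_m", "n")]


def range_partition(predicate):
    """Range partitioning: one partition per group of predicates."""
    first = predicate[0].lower()
    for name, upper in RANGE_BOUNDS:
        if first < upper:
            return name
    return "p_rest"  # catch-all for predicates past the last bound


def partition_sizes(triples, scheme):
    """Count how many triples each partition would receive."""
    return Counter(scheme(p) for _, p, _ in triples)


list_sizes = partition_sizes(TRIPLES, list_partition)
range_sizes = partition_sizes(TRIPLES, range_partition)
print("list: ", dict(list_sizes))
print("range:", dict(range_sizes))
```

In Oracle terms the first scheme would correspond to PARTITION BY LIST on the predicate column and the second to PARTITION BY RANGE on it. The counts show how many rows land in each partition, which is the number to watch when asking whether the partitions are uniformly populated.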

Oh yes, I know that Oracle provides its own triple store as part of the RDF support, which in turn is part of Oracle Spatial. Leave that aside for the moment, unless someone here has knowledge of the internals of how the Oracle triple store works.

This communication is about inviting informed comments about how one might use partitioning to advantage in the context above.

So what do you say? If I discovered that I had a table of three columns and Mega rows, but I could say that one of the columns had no more than
(50? 100? 1000?) distinct values, could I use partitioning to good
effect, and if so, what sort? That is, are more partitions always better, or is a smaller but uniformly populated set better, or what?

Received on Sat Dec 01 2007 - 20:05:14 CST