Re: A question for Mr. Celko
Date: Mon, 19 Jul 2004 04:08:48 GMT
Message-ID: <kPHKc.128410$XM6.22757_at_attbi_s53>
"Jan Hidders" <jan.hidders_at_REMOVETHIS.pandora.be> wrote in message news:pan.2004.07.19.00.19.32.644229_at_REMOVETHIS.pandora.be...
> On Sun, 18 Jul 2004 16:13:24 +0000, Marshall Spight wrote:
> > "Jan Hidders" <jan.hidders_at_REMOVETHIS.pandora.be> wrote in message
> > news:pan.2004.07.18.10.12.22.935425_at_REMOVETHIS.pandora.be...
> >> On Sun, 18 Jul 2004 00:12:27 +0000, Marshall Spight wrote:
> >>
> >> So what happens for example if the list is really big and I do a join
> >> between the list and the relation in the relation-valued attribute.
> >> Will your query optimizer recoginize that situation and choose the
> >> right join algorithm?
> >
> > I'm still unclear on implemenation techniques involving mixing lists and
> > relations. (Also, I'm not sure what list you mean; did you mean "*a*
> > list" instead of "*the* list"? Is the list an attribute?) Can I rewrite
> > your question to use two relation-valued attributes?
>
> Er, no, because then it becomes "Can the optimizer decide when a join
> between two relations is a join?" which is not really such a difficult
> question. :-)
Fair enough.
> Let me simplify it for you to the question if in you list
> algebra the optimzer can recognize exactly when a certain expression
> actually computes a join.
I see. Yes, this is a good point; if there's no way to build a query optimizer for the proposed data model that's performant, then the model's not so attractive.
> > The list-as-relation question is a hard one. For most lists, the best
> > implementation is an array, but this doesn't make the situation very
> > nice when you want to treat the list a relation on (position, element)
> > and do queries on it.
>
> That's a physical layer problem and the ideal database should let you
> choose the right data structure for the mix of updates and queries that
> you expect.
> So the question is not "what is the best data structure" but
> "what data structures are there" and in which case should we use which
> data structure.
Okay. So I'll ask: what data structures are there? And what kind of join executions are there? I guess we have nested loops and merge-joins. Nested loops make no demands on the data structure that I can think of. Merge joins require sorted inputs, yes? Nested loop is O(N*M) but merge join is O(N+M), which is pretty cool. I guess you could sort the data on the fly, as well. And there are hash joins, yes? Don't you need an index for that? Or I guess you could build the hashtable on the fly there too.
Is there a definitive text on this subject? I've only read a few papers so far, and they haven't been all that enlightening.
> And after that you have to tackle the question how you let
> the query optimizer make use of the data structure that is used
> when it is confronted with a certain ad hoc query.
I'd propose there's an additional dimension to all of this, which is data compression. If our data structure and query executor can handle compression, then we can improve overall throughput by reducing total I/O. That might also make the difference between a cache hit and a cache miss, or even allow some datasets to be entirely in-memory where they wouldn't have been before.
Marshall Received on Mon Jul 19 2004 - 06:08:48 CEST