Re: Aggregates: Largest Groups

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Thu, 8 Apr 2010 07:51:48 -0700 (PDT)
Message-ID: <62f02217-2685-4233-a47f-1dba41b7b786_at_i37g2000yqn.googlegroups.com>


On Mar 22, 11:42 pm, da..._at_fetter.org (David Fetter) wrote:

> I'd like to know what all the groups of vehicles larger than X whose
> with average speed over Y is.

That's a nasty algorithmic design problem. Doing a greedy, streaming heuristic is easy enough by updating a flexible LIFO queue, its length and a running sum over it, extending and shrinking it as needed. But formally getting a list of all of the contiguous subranges becomes difficult, especially if the sequence values are not uniformly bounded. My guess is that the simplest algorithm in this case would be some sort of backwards-forwards one, revolving around adding/removing samples from consideration based on their position in the overall magnitude order.

This isn't really a database question. But why isn't it? I think mainly because data sublanguages like SQL aren't fit to handle this sort of thing. On the one hand that is by design: they've laways catered to typical categorical business processing needs at the expense of arbitrary (e.g. numerical) computations. That's where your typical relational optimizer gets the organization and logical structure which makes it so easy to use and efficient. But on the other hand, there are lots of problems with order, time series, aggregation, topology based computation, open ended recursion and processing of gridded ("scientific") data which can be important, have in fact been addressed e.g. in Fortran compilers, and should perhaps be incorporated into data sublanguages as well. That extends to business needs as well -- anybody who's ever had to deal with financial time series data can attest to that.

From that viewpoint, it really is a good database theory question why data manipulation languages are so poor, unexpressive and inefficient at this sort of thing.

--
Sampo
Received on Thu Apr 08 2010 - 16:51:48 CEST

Original text of this message