Re: Aggregates: Largest Groups
Date: 30 Mar 2010 20:46:30 GMT
Message-ID: <4bb26326$0$14122$703f8584_at_textnews.kpn.nl>
David Fetter wrote:
>Let's say we're looking for groups of 20 or more vehicles whose
>average speed is >= 175 kph.
>
>The first 20 vehicles pass by at 200 kph, then another 20 pass at 100,
>each at distinct times.
>
>When vehicle 6 of the slower bunch passes, the average speed of cars
>up until now is (20*200+6*100)/26 or ~176.92 kph. Vehicle 7 passes,
>and the average is now ~174.07, which is under 175, so the first group
>has 26 vehicles in it, and we're now looking to start the next group.
What do you count as a vehicle? Is it just one measurement in the record, or do you have any way of cross-identiying different measurements as belonging to the same vehicle? Your use of the term 'average' suggests that you do.
Do you want to take averages over all measurements on a vehicle ever done? Or per month, week, day, or some other interval?
Once this is sorted out you can write a query that will list all vehicles and their average speeds, sorted by average speed if you like.
Now comes the part that is very awkward in SQL: grouping based on the ordering. (If that is what you really want, that is.) SQL doesn't support the use of total orderings on domains very well.
The only way I see to do it is a follows. First you join the query result against itself (so I'd put it into a table first): the result has three attributes, namely, a vehicle V1, a vehicle V2, and the average speed A of all vehicles with an average speed between that of V1 and V2, inclusive. Next you select from that table all tuples V1, V2, A such that for all V1, V2, A', either A' < A or A' > 25 (which requires another selfjoin). Finally you need to select all V1, V2, A from this selection such that either A is the heighest average speed or for some vehicle V'1 with maximum speed, there is a natural number k >=1 such that there are V'k and Ak such that V'k = V1 and V'1, V'2, A1, ... V'k-1, V'k, Ak are in the selection. This requires transitive closure, which is apparently supported in modern standard SQL by the means of recursive subqueries, which I've never used.
ALl this is awkward and extremely expensive for an operation that, if we could just perform aggregations along a totel ordering of the domain, would be linear. So if you have a "wrapper" language such as TSQL or PL/SQL, write a procedure to do this.
In summary:
- if you described your intention more clearly, we wouldn't have to guess so much
- if I understand your intention correctly, I doubt your query can be expressed in relational algebra, which may explain why you have trouble expressing it.
-- ReinierReceived on Tue Mar 30 2010 - 22:46:30 CEST