Re: Self Joins and optimization

From: Vadim Tropashko <vadimtro_invalid_at_yahoo.com>
Date: 11 May 2007 15:50:19 -0700
Message-ID: <1178923819.226853.9500_at_p77g2000hsh.googlegroups.com>


On May 5, 10:30 am, Kevin Kirkpatrick <kvnkrkpt..._at_gmail.com> wrote:
> A simple case study: extend a relation by adding an attribute that
> gives the rank of each tuple based on the ordering of a subset of
> existing attributes of the relation. With primitive relational
> operators, the only way I'm aware of to express this is:
>
> SELECT E1.NAME, E1.SAL, COUNT(*) RNK
> FROM EMP E1 EMP E2
> WHERE E2.SAL <= E1.SAL
> GROUP BY E1.NAME, E1.SAL;
>
> I'm not aware of any rationale that would allow an optimizer to
> compute the above with anything faster than a O(n^2) algorithm.
>
> However, the SQL "windowing function" version:
>
> SELECT NAME, SAL, RANK() OVER (ORDER BY SAL) RNK
> FROM EMP;
>
> computes the relation in O(n log n) time (it does a sort, then a scan,
> of the SAL field, to compute the RANK() function for each tuple).

Well, this is a favorite tune of the "analytics rocks" folks. I disagree. What if you want to ask salary rank of all the people in the department 10, or a single person in a big corporation? The self-join query is easily adjusted to the new requirements:

SELECT E1.ENAME, E1.SAL, COUNT(*) RNK
FROM EMP E1, EMP E2

WHERE E2.SAL <= E1.SAL
AND E1.ename = 'BLAKE'
GROUP BY E1.ENAME, E1.SAL Wouldn't you have to apply selection on top of analytics query? Now, both queries have complexity O(n*log n), but I would argue that logarithm for index unique scan smaller than logarithm for sorting:-) Next, what if EMP is not actually a table but view, can you be sure that optimizer would be able to find all the optimizations that it can do in simple select-project-join-groupby operation space? Received on Sat May 12 2007 - 00:50:19 CEST

Original text of this message