Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> Re: SQL frustration - cannot make "obvious" query work well
giuliano.procida_at_googlemail.com wrote:
> I have a fairly large data set, consisting of slices of points
> (essentially in 2-d). Around 100million points in 1million slices.
> Points/slice is very skewed (very many with < 10 points, very few >
> 10000 points).
>
> create table points (
> slice number(10) not null,
> x number(10) not null,
> y number(10) not null,
> value number(10) not null,
> constraint pky primary key (slice, x, y)
> ) -- organization index -- not currently an IOT, but should be
>
> It so happens that each slice is a meet semi-lattice (i.e, max(x),
> max(y) over any non-empty subset of a slice is also in the slice).
>
> Typical access is to look at subset of slices (values from another
> query) and for each slice, look for the the top-right point in a
> bottom-left quarter-plane and return the slice and value.
>
> The old query selected max(x) for a slice, and max(y) for a slice[,
> max(x)] and finally the row itself. Doing this in bulk for many slices
> involves plans with expensives scans and sorts.
>
> There is a very efficient query to fetch a value for a single slice:
>
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = :slice
> AND x <= :x
> AND y <= :y
> ORDER BY x DESC, y DESC
> )
> WHERE ROWNUM = 1
>
> The query plan uses the index, and a stopkey.
>
> The query cannot be rewritten to work with subsets of slices:
>
> SELECT slice, precious_data
> FROM
> all_slices,
> (
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = all_slices.slice
> AND x <= :x
> AND y <= :y
> ORDER BY y DESC, x DESC
> )
> WHERE ROWNUM = 1
> )
>
> -- ORA-00904: "ALL_SLICES"."SLICE": invalid identifier
>
> The nested, nested query cannot see the top-level table. This is very
> annoying.
>
> I have tried:
>
> a) Getting around the 00904 limitation somehow. No luck. This exact
> issue has popped up to asktom etc. a few times, no solution seems
> available.
>
> b) Writing a pipelined function that PIPEs out the single rows returned
> by the single-slice query, looping over slices returning by a driving
> query.
>
> This works a treat, over double the performance of the equivalent query
> using max(x) subselects, but the driving query varies a lot in the
> application (50 different queries => 50 functions to put in a package).
>
> c) Writing a version of the pipelined function that takes a cursor
> returning slices.
>
> I thought this would be flexible and as fast as b) but performance is
> worse than the original max(x), max(y) query.
>
> d) Using Oracle analytics.
>
> Selecting first_value, partitioning by slice and ordering the
> partitions by the index desc. Oracle ignores the index, fetches a load
> a data and throws away much. Slower.
>
> e) Simulating spatial/2-d space-filling curves indexes.
>
> Given that most slices are small and that selecting by subsets of the
> slices is most important, this seemed to be not worth the effort. The
> data would need to be reindexed by x, y first then slice. Genuine DBMS
> r-tree etc. indexes are hard to come by and space-filling curves
> (fitting a 2-d index into 1-d by mixing together alternating bits of x
> and y) give bad behaviour, require an extra column to index and
> application changes.
>
> I would really like a solution to a), or else get c) to be as fast as
> b) or as a last resort get the right query plan with d) so that the
> Oracle just skips through the index as it is supposed to.
>
> Any help appreciated.
>
> Giuliano.
I understand a bit what you are trying to accomplish. Maybe an index
on:
SLICE, X DESC, Y DESC would help?
Thanks for posting the SQL to generate the table. I used it to create
a test case with 4,000,000 rows:
INSERT INTO
POINTS
SELECT
TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE, TRUNC(DBMS_RANDOM.VALUE(0,10000)) X, TRUNC(DBMS_RANDOM.VALUE(0,10000)) Y, ROUND(DBMS_RANDOM.VALUE(0,10000),4) VALUEFROM
If I then try a query like this:
SELECT
SLICE,
X,
Y,
VALUE,
RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW
FROM
POINTS
WHERE
SLICE<=100 AND X<=10000 AND Y<=1000;
The following is returned:
SLICE X Y VALUE RESULT_ROW ========== ========== ========== ========== ==========
2 7559 542 4900 1 6 7967 791 1908 1 7 8147 531 3496 1 9 7343 502 3437 1 9 4869 187 2916 2 12 2983 428 9978 1 14 8796 691 1600 1 17 2157 642 5857 1 19 9754 234 1133 1 20 3135 214 6066 1 21 411 611 1180 1 25 9621 902 6387 1 27 9740 459 9443 1 28 4389 266 1940 1 37 6149 266 642 1 38 7715 985 7287 1 38 3348 16 5731 2 40 6682 978 6596 1 40 119 298 2134 2 45 9940 113 5355 1 53 6735 574 4786 1 53 6623 778 3075 2 55 4746 394 248 1 58 455 13 3542 1 61 4209 29 6171 1 61 2986 831 9756 2 65 1130 392 591 1 66 5145 759 9949 1 70 7041 944 2744 1 70 2181 727 5496 2 77 6972 271 5758 1 77 5201 709 94 2 78 6423 872 1761 1 78 5229 544 4732 2 86 1458 219 539 1 89 3973 500 1758 1 91 1800 495 9315 1 91 142 695 9479 2 94 3860 418 3860 1 96 1928 39 7027 1
40 ROWS SELECTED
If we can then eliminate all result rows, except those with
RESULT_ROW=1, we will get close to the answer that you are looking for:
SELECT
SLICE,
VALUE
FROM
(SELECT
SLICE,
X,
Y,
VALUE,
RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW
FROM
POINTS
WHERE
SLICE<=100 AND X<=10000 AND Y<=1000)
2 4900 6 1908 7 3496 9 3437 12 9978 14 1600 17 5857 19 1133 20 6066 21 1180 25 6387 27 9443 28 1940 37 642 38 7287 40 6596 45 5355 53 4786 55 248 58 3542 61 6171 65 591 66 9949 70 2744 77 5758 78 1761 86 539 89 1758 91 9315 94 3860 96 7027
31 ROWS SELECTED
Now, how can we use another table to determine which slices are
returned?
CREATE TABLE ALL_SLICES (
SLICE NUMBER(10));
INSERT INTO
ALL_SLICES
SELECT
TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE
FROM
some_table
WHERE
ROWNUM<=100000;
Something like this might work:
SELECT
SLICE,
VALUE
FROM
(SELECT
P.SLICE,
VALUE,
RANK() OVER (PARTITION BY P.SLICE ORDER BY X DESC, Y DESC)
RESULT_ROW
FROM
POINTS P,
ALL_SLICES A
WHERE
P.SLICE=A.SLICE
AND X<=10000
AND Y<=1000)
WHERE
RESULT_ROW=1;
Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc.
Received on Thu Nov 30 2006 - 10:09:18 CST