Re: SQL frustration - cannot make "obvious" query work well

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 30 Nov 2006 08:09:18 -0800
Message-ID: <1164902958.358467.144320@j44g2000cwa.googlegroups.com>

giuliano.procida_at_googlemail.com wrote:
> I have a fairly large data set, consisting of slices of points
> (essentially in 2-d). Around 100million points in 1million slices.
> Points/slice is very skewed (very many with < 10 points, very few >
> 10000 points).
>
> create table points (
> slice number(10) not null,
> x number(10) not null,
> y number(10) not null,
> value number(10) not null,
> constraint pky primary key (slice, x, y)
> ) -- organization index -- not currently an IOT, but should be
>
> It so happens that each slice is a meet semi-lattice (i.e, max(x),
> max(y) over any non-empty subset of a slice is also in the slice).
>
> Typical access is to look at subset of slices (values from another
> query) and for each slice, look for the the top-right point in a
> bottom-left quarter-plane and return the slice and value.
>
> The old query selected max(x) for a slice, and max(y) for a slice[,
> max(x)] and finally the row itself. Doing this in bulk for many slices
> involves plans with expensives scans and sorts.
>
> There is a very efficient query to fetch a value for a single slice:
>
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = :slice
> AND x <= :x
> AND y <= :y
> ORDER BY x DESC, y DESC
> )
> WHERE ROWNUM = 1
>
> The query plan uses the index, and a stopkey.
>
> The query cannot be rewritten to work with subsets of slices:
>
> SELECT slice, precious_data
> FROM
> all_slices,
> (
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = all_slices.slice
> AND x <= :x
> AND y <= :y
> ORDER BY y DESC, x DESC
> )
> WHERE ROWNUM = 1
> )
>
> -- ORA-00904: "ALL_SLICES"."SLICE": invalid identifier
>
> The nested, nested query cannot see the top-level table. This is very
> annoying.
>
> I have tried:
>
> a) Getting around the 00904 limitation somehow. No luck. This exact
> issue has popped up to asktom etc. a few times, no solution seems
> available.
>
> b) Writing a pipelined function that PIPEs out the single rows returned
> by the single-slice query, looping over slices returning by a driving
> query.
>
> This works a treat, over double the performance of the equivalent query
> using max(x) subselects, but the driving query varies a lot in the
> application (50 different queries => 50 functions to put in a package).
>
> c) Writing a version of the pipelined function that takes a cursor
> returning slices.
>
> I thought this would be flexible and as fast as b) but performance is
> worse than the original max(x), max(y) query.
>
> d) Using Oracle analytics.
>
> Selecting first_value, partitioning by slice and ordering the
> partitions by the index desc. Oracle ignores the index, fetches a load
> a data and throws away much. Slower.
>
> e) Simulating spatial/2-d space-filling curves indexes.
>
> Given that most slices are small and that selecting by subsets of the
> slices is most important, this seemed to be not worth the effort. The
> data would need to be reindexed by x, y first then slice. Genuine DBMS
> r-tree etc. indexes are hard to come by and space-filling curves
> (fitting a 2-d index into 1-d by mixing together alternating bits of x
> and y) give bad behaviour, require an extra column to index and
> application changes.
>
> I would really like a solution to a), or else get c) to be as fast as
> b) or as a last resort get the right query plan with d) so that the
> Oracle just skips through the index as it is supposed to.
>
> Any help appreciated.
>
> Giuliano.

I understand a bit what you are trying to accomplish. Maybe an index on:
SLICE, X DESC, Y DESC would help?

Thanks for posting the SQL to generate the table. I used it to create a test case with 4,000,000 rows:
INSERT INTO
POINTS
SELECT

  TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE,
  TRUNC(DBMS_RANDOM.VALUE(0,10000)) X,
  TRUNC(DBMS_RANDOM.VALUE(0,10000)) Y,
  ROUND(DBMS_RANDOM.VALUE(0,10000),4) VALUE

FROM
some_table
WHERE
ROWNUM<=4000000;

If I then try a query like this:
SELECT
  SLICE,
  X,
  Y,
  VALUE,
  RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW FROM
  POINTS
WHERE

  SLICE<=100
  AND X<=10000
  AND Y<=1000;

The following is returned:

SLICE X Y VALUE RESULT_ROW ========== ========== ========== ========== ==========

	 2	 7559	     542       4900	     1
	 6	 7967	     791       1908	     1
	 7	 8147	     531       3496	     1
	 9	 7343	     502       3437	     1
	 9	 4869	     187       2916	     2
	12	 2983	     428       9978	     1
	14	 8796	     691       1600	     1
	17	 2157	     642       5857	     1
	19	 9754	     234       1133	     1
	20	 3135	     214       6066	     1
	21	  411	     611       1180	     1
	25	 9621	     902       6387	     1
	27	 9740	     459       9443	     1
	28	 4389	     266       1940	     1
	37	 6149	     266	642	     1
	38	 7715	     985       7287	     1
	38	 3348	      16       5731	     2
	40	 6682	     978       6596	     1
	40	  119	     298       2134	     2
	45	 9940	     113       5355	     1
	53	 6735	     574       4786	     1
	53	 6623	     778       3075	     2
	55	 4746	     394	248	     1
	58	  455	      13       3542	     1
	61	 4209	      29       6171	     1
	61	 2986	     831       9756	     2
	65	 1130	     392	591	     1
	66	 5145	     759       9949	     1
	70	 7041	     944       2744	     1
	70	 2181	     727       5496	     2
	77	 6972	     271       5758	     1
	77	 5201	     709	 94	     2
	78	 6423	     872       1761	     1
	78	 5229	     544       4732	     2
	86	 1458	     219	539	     1
	89	 3973	     500       1758	     1
	91	 1800	     495       9315	     1
	91	  142	     695       9479	     2
	94	 3860	     418       3860	     1
	96	 1928	      39       7027	     1

40 ROWS SELECTED If we can then eliminate all result rows, except those with RESULT_ROW=1, we will get close to the answer that you are looking for: SELECT
  SLICE,
  VALUE
FROM
  (SELECT
    SLICE,
    X,
    Y,
    VALUE,
    RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW   FROM
    POINTS
  WHERE

    SLICE<=100
    AND X<=10000
    AND Y<=1000)

WHERE
RESULT_ROW=1; SLICE VALUE
========== ==========

31 ROWS SELECTED Now, how can we use another table to determine which slices are returned?
CREATE TABLE ALL_SLICES (
  SLICE NUMBER(10)); INSERT INTO
  ALL_SLICES
SELECT
  TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE FROM
  some_table
WHERE
  ROWNUM<=100000;

Something like this might work:
SELECT
  SLICE,
  VALUE
FROM
  (SELECT
    P.SLICE,
    VALUE,
    RANK() OVER (PARTITION BY P.SLICE ORDER BY X DESC, Y DESC) RESULT_ROW
  FROM
    POINTS P,
    ALL_SLICES A
  WHERE
    P.SLICE=A.SLICE
    AND X<=10000
    AND Y<=1000)
WHERE
  RESULT_ROW=1; Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Thu Nov 30 2006 - 10:09:18 CST