Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: SQL frustration - cannot make "obvious" query work well

Re: SQL frustration - cannot make "obvious" query work well

From: Charles Hooper <hooperc2000_at_yahoo.com>
Date: 30 Nov 2006 08:09:18 -0800
Message-ID: <1164902958.358467.144320@j44g2000cwa.googlegroups.com>


giuliano.procida_at_googlemail.com wrote:
> I have a fairly large data set, consisting of slices of points
> (essentially in 2-d). Around 100million points in 1million slices.
> Points/slice is very skewed (very many with < 10 points, very few >
> 10000 points).
>
> create table points (
> slice number(10) not null,
> x number(10) not null,
> y number(10) not null,
> value number(10) not null,
> constraint pky primary key (slice, x, y)
> ) -- organization index -- not currently an IOT, but should be
>
> It so happens that each slice is a meet semi-lattice (i.e, max(x),
> max(y) over any non-empty subset of a slice is also in the slice).
>
> Typical access is to look at subset of slices (values from another
> query) and for each slice, look for the the top-right point in a
> bottom-left quarter-plane and return the slice and value.
>
> The old query selected max(x) for a slice, and max(y) for a slice[,
> max(x)] and finally the row itself. Doing this in bulk for many slices
> involves plans with expensives scans and sorts.
>
> There is a very efficient query to fetch a value for a single slice:
>
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = :slice
> AND x <= :x
> AND y <= :y
> ORDER BY x DESC, y DESC
> )
> WHERE ROWNUM = 1
>
> The query plan uses the index, and a stopkey.
>
> The query cannot be rewritten to work with subsets of slices:
>
> SELECT slice, precious_data
> FROM
> all_slices,
> (
> SELECT slice, value
> FROM (
> SELECT slice, value
> FROM points
> WHERE slice = all_slices.slice
> AND x <= :x
> AND y <= :y
> ORDER BY y DESC, x DESC
> )
> WHERE ROWNUM = 1
> )
>
> -- ORA-00904: "ALL_SLICES"."SLICE": invalid identifier
>
> The nested, nested query cannot see the top-level table. This is very
> annoying.
>
> I have tried:
>
> a) Getting around the 00904 limitation somehow. No luck. This exact
> issue has popped up to asktom etc. a few times, no solution seems
> available.
>
> b) Writing a pipelined function that PIPEs out the single rows returned
> by the single-slice query, looping over slices returning by a driving
> query.
>
> This works a treat, over double the performance of the equivalent query
> using max(x) subselects, but the driving query varies a lot in the
> application (50 different queries => 50 functions to put in a package).
>
> c) Writing a version of the pipelined function that takes a cursor
> returning slices.
>
> I thought this would be flexible and as fast as b) but performance is
> worse than the original max(x), max(y) query.
>
> d) Using Oracle analytics.
>
> Selecting first_value, partitioning by slice and ordering the
> partitions by the index desc. Oracle ignores the index, fetches a load
> a data and throws away much. Slower.
>
> e) Simulating spatial/2-d space-filling curves indexes.
>
> Given that most slices are small and that selecting by subsets of the
> slices is most important, this seemed to be not worth the effort. The
> data would need to be reindexed by x, y first then slice. Genuine DBMS
> r-tree etc. indexes are hard to come by and space-filling curves
> (fitting a 2-d index into 1-d by mixing together alternating bits of x
> and y) give bad behaviour, require an extra column to index and
> application changes.
>
> I would really like a solution to a), or else get c) to be as fast as
> b) or as a last resort get the right query plan with d) so that the
> Oracle just skips through the index as it is supposed to.
>
> Any help appreciated.
>
> Giuliano.

I understand a bit what you are trying to accomplish. Maybe an index on:
SLICE, X DESC, Y DESC would help?

Thanks for posting the SQL to generate the table. I used it to create a test case with 4,000,000 rows:
INSERT INTO
  POINTS
SELECT

  TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE,
  TRUNC(DBMS_RANDOM.VALUE(0,10000)) X,
  TRUNC(DBMS_RANDOM.VALUE(0,10000)) Y,
  ROUND(DBMS_RANDOM.VALUE(0,10000),4) VALUE
FROM
  some_table
WHERE
  ROWNUM<=4000000;

If I then try a query like this:
SELECT
  SLICE,
  X,
  Y,
  VALUE,
  RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW FROM
  POINTS
WHERE

  SLICE<=100
  AND X<=10000
  AND Y<=1000;

The following is returned:

     SLICE X Y VALUE RESULT_ROW ========== ========== ========== ========== ==========

	 2	 7559	     542       4900	     1
	 6	 7967	     791       1908	     1
	 7	 8147	     531       3496	     1
	 9	 7343	     502       3437	     1
	 9	 4869	     187       2916	     2
	12	 2983	     428       9978	     1
	14	 8796	     691       1600	     1
	17	 2157	     642       5857	     1
	19	 9754	     234       1133	     1
	20	 3135	     214       6066	     1
	21	  411	     611       1180	     1
	25	 9621	     902       6387	     1
	27	 9740	     459       9443	     1
	28	 4389	     266       1940	     1
	37	 6149	     266	642	     1
	38	 7715	     985       7287	     1
	38	 3348	      16       5731	     2
	40	 6682	     978       6596	     1
	40	  119	     298       2134	     2
	45	 9940	     113       5355	     1
	53	 6735	     574       4786	     1
	53	 6623	     778       3075	     2
	55	 4746	     394	248	     1
	58	  455	      13       3542	     1
	61	 4209	      29       6171	     1
	61	 2986	     831       9756	     2
	65	 1130	     392	591	     1
	66	 5145	     759       9949	     1
	70	 7041	     944       2744	     1
	70	 2181	     727       5496	     2
	77	 6972	     271       5758	     1
	77	 5201	     709	 94	     2
	78	 6423	     872       1761	     1
	78	 5229	     544       4732	     2
	86	 1458	     219	539	     1
	89	 3973	     500       1758	     1
	91	 1800	     495       9315	     1
	91	  142	     695       9479	     2
	94	 3860	     418       3860	     1
	96	 1928	      39       7027	     1

40 ROWS SELECTED If we can then eliminate all result rows, except those with RESULT_ROW=1, we will get close to the answer that you are looking for: SELECT
  SLICE,
  VALUE
FROM
  (SELECT
    SLICE,
    X,
    Y,
    VALUE,
    RANK() OVER (PARTITION BY SLICE ORDER BY X DESC, Y DESC) RESULT_ROW   FROM
    POINTS
  WHERE

    SLICE<=100
    AND X<=10000
    AND Y<=1000)

WHERE
  RESULT_ROW=1;      SLICE VALUE
========== ==========
	 2	 4900
	 6	 1908
	 7	 3496
	 9	 3437
	12	 9978
	14	 1600
	17	 5857
	19	 1133
	20	 6066
	21	 1180
	25	 6387
	27	 9443
	28	 1940
	37	  642
	38	 7287
	40	 6596
	45	 5355
	53	 4786
	55	  248
	58	 3542
	61	 6171
	65	  591
	66	 9949
	70	 2744
	77	 5758
	78	 1761
	86	  539
	89	 1758
	91	 9315
	94	 3860
	96	 7027

31 ROWS SELECTED Now, how can we use another table to determine which slices are returned?
CREATE TABLE ALL_SLICES (
  SLICE NUMBER(10)); INSERT INTO
  ALL_SLICES
SELECT
  TRUNC(DBMS_RANDOM.VALUE(1,1000001)) SLICE FROM
  some_table
WHERE
  ROWNUM<=100000;

Something like this might work:
SELECT
  SLICE,
  VALUE
FROM
  (SELECT
    P.SLICE,
    VALUE,
    RANK() OVER (PARTITION BY P.SLICE ORDER BY X DESC, Y DESC) RESULT_ROW
  FROM
    POINTS P,
    ALL_SLICES A
  WHERE
    P.SLICE=A.SLICE
    AND X<=10000
    AND Y<=1000)
WHERE
  RESULT_ROW=1; Charles Hooper
PC Support Specialist
K&M Machine-Fabricating, Inc. Received on Thu Nov 30 2006 - 10:09:18 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US