SQL frustration - cannot make "obvious" query work well

From: <giuliano.procida_at_googlemail.com>
Date: 30 Nov 2006 05:37:57 -0800
Message-ID: <1164890406.726998.236320@j44g2000cwa.googlegroups.com>

I have a fairly large data set, consisting of slices of points
(essentially in 2-d). Around 100million points in 1million slices.
Points/slice is very skewed (very many with < 10 points, very few > 10000 points).

create table points (

   slice number(10) not null,
   x number(10) not null,
   y number(10) not null,
   value number(10) not null,
   constraint pky primary key (slice, x, y) ) -- organization index -- not currently an IOT, but should be

It so happens that each slice is a meet semi-lattice (i.e, max(x), max(y) over any non-empty subset of a slice is also in the slice).

Typical access is to look at subset of slices (values from another query) and for each slice, look for the the top-right point in a bottom-left quarter-plane and return the slice and value.

The old query selected max(x) for a slice, and max(y) for a slice[, max(x)] and finally the row itself. Doing this in bulk for many slices involves plans with expensives scans and sorts.

There is a very efficient query to fetch a value for a single slice:

SELECT slice, value
FROM (
    SELECT slice, value
    FROM points
    WHERE slice = :slice
    AND x <= :x
    AND y <= :y
    ORDER BY x DESC, y DESC
)
WHERE ROWNUM = 1 The query plan uses the index, and a stopkey.

The query cannot be rewritten to work with subsets of slices:

SELECT slice, precious_data
FROM
  all_slices,
  (
    SELECT slice, value
    FROM (

        SELECT slice, value
        FROM points
        WHERE slice = all_slices.slice
        AND x <= :x
        AND y <= :y
        ORDER BY y DESC, x DESC

    )
    WHERE ROWNUM = 1
  )

ORA-00904: "ALL_SLICES"."SLICE": invalid identifier

The nested, nested query cannot see the top-level table. This is very annoying.

I have tried:

Getting around the 00904 limitation somehow. No luck. This exact issue has popped up to asktom etc. a few times, no solution seems available.
Writing a pipelined function that PIPEs out the single rows returned by the single-slice query, looping over slices returning by a driving query.

This works a treat, over double the performance of the equivalent query using max(x) subselects, but the driving query varies a lot in the application (50 different queries => 50 functions to put in a package).

c) Writing a version of the pipelined function that takes a cursor returning slices.

I thought this would be flexible and as fast as b) but performance is worse than the original max(x), max(y) query.

d) Using Oracle analytics.

Selecting first_value, partitioning by slice and ordering the partitions by the index desc. Oracle ignores the index, fetches a load a data and throws away much. Slower.

e) Simulating spatial/2-d space-filling curves indexes.

Given that most slices are small and that selecting by subsets of the slices is most important, this seemed to be not worth the effort. The data would need to be reindexed by x, y first then slice. Genuine DBMS r-tree etc. indexes are hard to come by and space-filling curves
(fitting a 2-d index into 1-d by mixing together alternating bits of x
and y) give bad behaviour, require an extra column to index and application changes.

I would really like a solution to a), or else get c) to be as fast as b) or as a last resort get the right query plan with d) so that the Oracle just skips through the index as it is supposed to.

Any help appreciated.

Giuliano. Received on Thu Nov 30 2006 - 07:37:57 CST