Home » SQL & PL/SQL » SQL & PL/SQL » Assigning an ID (ORACLE 11.2.0.3.0)
Assigning an ID [message #576420] Mon, 04 February 2013 15:30 Go to next message
tmcallister
Messages: 98
Registered: December 2007
Member
I have a table with 200k entries, which has ~12k unique combination of columns that I wish to assign an ID to. My real world problem is with addresses and street components (of which nulls are allowed); but this should be a reasonable example.

DROP TABLE tt;

CREATE TABLE tt AS
  SELECT       ROWNUM id,
               CASE
                 WHEN MOD(ROWNUM, 107) = 0 THEN NULL
                 ELSE MOD(ROWNUM, 101)
               END
                 c1,
               CASE
                 WHEN MOD(ROWNUM, 67) = 0 THEN NULL
                 ELSE MOD(ROWNUM, 109)
               END
                 c2,
               CAST(NULL AS NUMBER) sid
  FROM         DUAL
  CONNECT BY   LEVEL < 200000;

CREATE SEQUENCE sid_seq
  MINVALUE 1
  START WITH 317;


Now I have 3 ways to assigned an ID (fill the SID column).

The first is congruent with the method most commonly used, but seems especially slow in this case. I think this is due to the COALESCE/TO_CHAR usage. Can this be optimized?

DECLARE
  CURSOR c1 IS
    SELECT   a.*, sid_seq.NEXTVAL sid
    FROM     (SELECT DISTINCT c1, c2 FROM tt) a;
BEGIN
  FOR s IN c1 LOOP
    UPDATE   tt d
    SET      sid   = s.sid
    WHERE    COALESCE(TO_CHAR(d.c1), '~') = COALESCE(TO_CHAR(d.c1), '~')
    AND      COALESCE(TO_CHAR(d.c2), '~') = COALESCE(TO_CHAR(d.c2), '~');
  END LOOP;
END;


The second I've made relies on the usage of a temp table, which I really don't like, but it is faster; so that is something at least.

CREATE TABLE ttt AS
  SELECT   a.*, sid_seq.NEXTVAL sid
  FROM     (SELECT DISTINCT c1, c2 FROM tt) a;
         
MERGE INTO   tt d
USING        (SELECT c1, c2, sid FROM ttt) s
ON           (COALESCE(TO_CHAR(d.c1), '~') = COALESCE(TO_CHAR(s.c1), '~')
AND           COALESCE(TO_CHAR(d.c2), '~') = COALESCE(TO_CHAR(s.c2), '~'))
WHEN MATCHED THEN
  UPDATE SET d.sid   = s.sid;


The third discards the sequence, and uses dense_rank. This allows us to use the ID (which might have an index, whereas c1 and c2 will not).

MERGE INTO   tt d
USING        (SELECT id, (SELECT COALESCE(MAX(sid), 317) FROM tt) + DENSE_RANK() OVER (ORDER BY c1, c2) sid FROM tt) s
ON           (d.id = s.id)
WHEN MATCHED THEN
  UPDATE SET d.sid   = s.sid


The ideal solution would use the SEQUENCE, not use a temporary table and match/merge/update based on ID, and be 'fast'; but I cannot conceive of such a solution; does one exist? What would you do?




[Updated on: Mon, 04 February 2013 16:41]

Report message to a moderator

Re: Assigning an ID [message #576426 is a reply to message #576420] Mon, 04 February 2013 17:34 Go to previous messageGo to next message
tmcallister
Messages: 98
Registered: December 2007
Member
I found a solution; but I'm still looking for better solutions.

Using the cursor (option #1), and adding the following INDEX

CREATE INDEX tt_idx
  ON tt(COALESCE(TO_CHAR(c1), '~'), COALESCE(TO_CHAR(c2), '~'));


Completes in a reasonable amount of time, it still runs 200-300% longer than the other options; but that only equates to 2-4s. I still have to add and remove the index for this single statement, but it's feasible.

EDIT: Unfortunately in my real world case (which compares 5 values rather than 2) this still results in unsatisfactory performance. (3 minutes as opposed to 2 seconds for other options)

[Updated on: Mon, 04 February 2013 17:45]

Report message to a moderator

Re: Assigning an ID [message #576485 is a reply to message #576426] Tue, 05 February 2013 09:23 Go to previous message
tmcallister
Messages: 98
Registered: December 2007
Member
Going with option #2 (Create temp table with sequence, merge in); but I was really hoping there was a trick I was missing!
Previous Topic: exception with merge
Next Topic: Report with Date Range
Goto Forum:
  


Current Time: Fri Dec 19 20:38:26 CST 2014

Total time taken to generate the page: 0.12513 seconds