Re: Processing new rows in staging table quickly...

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Thu, 18 Apr 2019 10:38:22 +0000
Message-ID: <LO2P265MB0415B4144723B93BAD0EA2A2A5260_at_LO2P265MB0415.GBRP265.PROD.OUTLOOK.COM>




where b.access_key=v_access_key
       and b.bucket=v_bucket
       and b.stats_processed_time is null

Assuming v_access_key and v_bucket are bind variables then it would be perfectly reasonable to query like this, using an index on (bucket, access_key, stats_processed_time). The order of the first two columns could be switched depending on other access requirements, or (all other things being equal) to put the most repetitive column first to maximise the scope for index compression. The name "bucket" suggests many rows per value while the name "access_key" suggests uniqueness - which is why I've chosen the order shown, and why I would also use "compress 1" as a starting guess.
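For the simple approach, the DDL would be along these lines (a sketch only - the table name staging_table and the index name are my assumptions):

```sql
-- Sketch: the (assumed) repetitive column first so that "compress 1"
-- gets the most benefit; stats_processed_time last so the "is null"
-- predicate can still be resolved within the index.
create index stg_i1 on staging_table(bucket, access_key, stats_processed_time)
    compress 1;
```

The drawback, of course, is that this index carries entries for all the processed rows as well - hence the sizing comment below.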

If you don't want to support an index that is 100 times the size it needs to be, but do want the precision of selecting on bucket and access_key for non-processed rows, then a two-column function-based index (or an index on two virtual columns, if Postgres supports that) would be best, e.g.

create index stg_fbi_1 on staging_table(
    case when stats_processed_time is null then bucket end,
    case when stats_processed_time is null then access_key end
);

and query with the where clause:

where
    case when stats_processed_time is null then bucket end = v_bucket
and case when stats_processed_time is null then access_key end = v_access_key

You could also order by bucket and access_key and avoid a sort by using

order by
    case when stats_processed_time is null then bucket end,
    case when stats_processed_time is null then access_key end

The SQL looks messy - which is why creating virtual columns is the more desirable solution, e.g.

alter table staging_table add (
    unprocessed_bucket generated always as (case when stats_processed_time is null then bucket end) virtual,
    unprocessed_access_key generated always as (case when stats_processed_time is null then access_key end) virtual
);

create index stg_fbi2 on staging_table(unprocessed_bucket, unprocessed_access_key); -- compress 1 ?
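As a side note on the portability question: I believe Postgres allows the expression directly in the index definition, so the virtual-column step could be skipped there (a sketch, assuming the same table and column names - note that Postgres requires each expression to be wrapped in its own parentheses):

```sql
-- Postgres sketch of an expression ("functional") index: entries are
-- created only with non-null key values for unprocessed rows, so the
-- index stays small as rows are marked processed.
create index stg_fbi2 on staging_table(
    (case when stats_processed_time is null then bucket end),
    (case when stats_processed_time is null then access_key end)
);
```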

Regards
Jonathan Lewis



From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of post.ethan_at_gmail.com <post.ethan_at_gmail.com>
Sent: 18 April 2019 00:31:37
To: 'ORACLE-L'
Subject: Processing new rows in staging table quickly...

I have a staging table in which 99% of the rows are going to have a date for stats_processed_time.

I will need to only select for rows that have not been processed.

What is the best strategy here? Should I allow nulls and select like below? Or should I put a date way out in the future, say 2099, select for that instead, and then update to the correct date once things are processed? Perhaps allow nulls and a bitmap or FBI solution? (I would like to avoid solutions which can't be ported to Postgres.)


where b.access_key=v_access_key

       and b.bucket=v_bucket
       and b.stats_processed_time is null
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 18 2019 - 12:38:22 CEST
