RE: Processing new rows in staging table quickly...

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 18 Apr 2019 12:21:22 -0400
Message-ID: <0e0f01d4f602$c3868ed0$4a93ac70$_at_rsiz.com>



Information life cycle and backtrack auditability requirements might mean a monthly or longer cycle.  

IF you have partitioning, range interval date partitioning can be very useful for this with eventual "unhooking" and exchanging an "old enough" partition somewhere else.  

In the interim, some transaction streams are well suited to having multiple "eligible for" and "completed" date columns with respect to stages of processing, all of which might also be cycled through a FIFO ordering initially in the "status" column that is searched for "work to do." Whether this is merely a staging table or a transaction history, handling them by keeping an index sparse by punching the final value in the "available work" status column to null is at least as old an idea as Oracle 5.  

An the entire notion that NULL doesn't mean anything is an issue for RDBMS compliance with standards. The RDBMS cannot implicitly assign a meaning to a NULL value.  

YOU CAN. And if that is the best way to take advantage of the physical implementation detail that Oracle does in fact make simple columns all null sparse by NOT keeping them in the index is to denote that NULL for a column in your schema MEANS "no work to be done here" then forty whacks with a wet noodle to anyone who tries to preach purity of the relational model to you.  

Purity of the relational model would be not allowing null at all in some schools of thought, and it is a fine weekend for a religious war I won't be a part of.  

Good luck,

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Reen, Elizabeth (Redacted sender "elizabeth.reen" for DMARC) Sent: Thursday, April 18, 2019 10:35 AM
To: 'neil_chandler_at_hotmail.com'; post.ethan_at_gmail.com; niall.litchfield_at_gmail.com
Cc: ORACLE-L
Subject: RE: Processing new rows in staging table quickly...  

                Staging tables should always be cleaned up.  Otherwise they
grow and get unwieldy.  

Liz  

From: [External] oracle-l-bounce_at_freelists.org
<oracle-l-bounce_at_freelists.org> On Behalf Of [External] Neil Chandler
Sent: Thursday, April 18, 2019 6:22 AM
To: post.ethan_at_gmail.com; niall.litchfield_at_gmail.com Cc: ORACLE-L <oracle-l_at_freelists.org> Subject: Re: Processing new rows in staging table quickly...  

I agree about not overloading the date column and also was wondering why there's data being left in the staging table.  

If the PROCESSED column has a default of 'N', and you update that to NULL to mean processed it keeps the index very small and efficient. Remember to document/comment that NULL means processed! Failing that, make sure you have a frequency histogram on the PROCESSED column.  

Neil.  


From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on behalf of niall.litchfield_at_gmail.com <niall.litchfield_at_gmail.com> Sent: 18 April 2019 09:44
To: post.ethan_at_gmail.com
Cc: ORACLE-L
Subject: Re: Processing new rows in staging table quickly...  

The FBI approach will work just fine. Personally, I'd probably do one of the following:  

  • add a "PROCESSED" column accepting 'Y' or 'N' and index/query on that, Mostly for human readability.
  • ask, if this is a "staging" table why are there ever any "processed" rows left in it.

On Thu, Apr 18, 2019 at 12:33 AM <post.ethan_at_gmail.com> wrote:

I have a staging table in which 99% of the rows are going to have a date for stats_processed_time.  

I will need to only select for rows that have not been processed.  

What is the best strategy here? Should I allow nulls and select like below? Or should I put a date, way out in the future, say 2099, and select for that instead and then update to correct data once things are processed? Perhaps allow nulls and a bitmap or FBI solution (I would like to avoid solutions which can't be ported to Postgres.)  

.

where b.access_key=v_access_key

       and b.bucket=v_bucket

       and b.stats_processed_time is null  

-- 

Niall Litchfield
Oracle DBA
http://www.orawin.info

<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.orawin.info&d=DwMF-
g&c=j-EkbjBYwkAB4f8ZbVn1Fw&r=yWMFosURAngbt8VLeJtKLVJGefQxustAZ9UxecV7xpc&m=m vNR112CPT81d7wzQi8ZlOve3BX8yJbEPGJ75mVrpu0&s=5kPvdtx7NNWd2rHRZy66IXxXdzA_St5 reaaNxbPk7OA&e=> -- http://www.freelists.org/webpage/oracle-l
Received on Thu Apr 18 2019 - 18:21:22 CEST

Original text of this message