Re: Best way to clear stale data from very active audit tables

From: <bobdurie_at_gmail.com>
Date: Tue, 24 Jul 2007 05:09:34 -0700
Message-ID: <1185278974.918439.253010@o61g2000hsh.googlegroups.com>

On Jul 23, 4:03 pm, "Chris L." <diver..._at_uol.com.ar> wrote:
> On Jul 23, 4:36 pm, "bobdu..._at_gmail.com" <bobdu..._at_gmail.com> wrote:
>
>
>
> > On Jul 23, 2:20 pm, "Chris L." <diver..._at_uol.com.ar> wrote:
>
> > > On Jul 23, 12:37 pm, "bobdu..._at_gmail.com" <bobdu..._at_gmail.com> wrote:
>
> > > > Hi there, i have a java application using oracle's jdbc driver that
> > > > has a bunch of audit style tables. We're experiencing significant
> > > > issues with our application when the data within these tables is being
> > > > purged, and i need to know a better method for designing the schema
> > > > (or application) to better handle this situation.
>
> > > > Typically:
> > > > - our "audit" table is created with a bunch of fields, one being
> > > > "auditdate" which is a timestamp with a typical index on it.
> > > > - this table is continually insert'ed into
> > > > - select queries are run against this table on a regular basis
> > > > - once an hour our "purge" job runs that does:
> > > > DELETE FROM audits WHERE auditdate < (CURRENT_TIMESTAMP + INTERVAL
> > > > '-7' DAY(5))
>
> > > > When this table gets upwards of 20million+ records and many inserts
> > > > are happenning concurrently, the purge job hammers the system, and
> > > > causes the subsequent inserts to slow to a crawl, not to mention the
> > > > DELETE command above to start taking upwards of 10+ minutes to run.
>
> > > > I've debated doing a SELECT auditID with the where above and deleting
> > > > the records one by one, but that seems horribly inefficient. Stopping
> > > > the inserts altogether during the purge is not really an option. Can
> > > > someone suggest an alternate approach or design for this scenario?
>
> > > > A few other points:
> > > > - our application has several different "audit" style tables, each of
> > > > them has similar requirements, some get purged every week, some every
> > > > 2 hours. I'm hoping this approach is universal for all because all
> > > > are experiencing similar problems.
> > > > - our application can also run against sql server, which while fast,
> > > > occasionally has deadlock issues doing the above purging, inserting,
> > > > and selecting concurrently... but it is fast. This also means we aim
> > > > to have db agnostic code but i realize this is not always possible.
>
> > > > Thank you for any advice you can offer!!
>
> > > > Bob
>
> > > Is the index on "auditdate" table being used by Oracle?
> > > I tested your statement and came up with "table access full" on the
> > > explain plan. Using "DELETE FROM audits WHERE auditdate < sysdate-7"
> > > instead, came up with "index range scan" on the plan.
>
> > > I'm assuming that the table contains 168 hours (plus the current
> > > fraction) worth of logs and with the DELETE statement you posted
> > > (which runs hourly), you intend to delete 1 of those hours (the
> > > stalest)
>
> > > Kind regards
> > > C.
>
> > Oracle has told me because of the inequality (<) that it cannot use
> > the index, or to create a "function based index". This table doesn't
> > always contain 168 (or whatever) hours of data, this is controlled by
> > the application/administror's purge settings. It is as small as a
> > week, as long as they want, if they want to retain the audits for 4
> > months, they can, we'll still run this delete command every hour
> > (thats right - every hour we run this removing the stale elements).
> > As well, the "retain logs for X weeks" setting is configurable on the
> > fly, so ideally changing it won't result in redesigning the table (but
> > if thats the case then so be it).
>
> > If a partitioned table is the answer, do you think i could set this up
> > from an application, ie via sql? Would the partitioning scheme need
> > to be adjusted all the time, ie by creating new partitions as the days/
> > weeks go by? If anyone has any sample sql i'd love to see it.- Hide quoted text -
>
> > - Show quoted text -
>
> An index won't be used in inequality ( != ) but will most certainly be
> used in your case ( < ). As I said, using "less than sysdate-7"
> caused the index to be used. For some reason, the "> CURRENT_TIMESTAMP
> + INTERVAL '-7' DAY(5)" part is what isn't allowing the index to be
> used.
>
> About partitions I can't offer any tips sorry... other than setting up
> a night job that will both drop the oldest partition (from 7 days ago)
> and create a future partition (for 7 days from now). This would mean
> the purging would happen daily not hourly. (This shouldn't be a
> problem really). I don't know if a partition-by-hour is possible /
> recommended.
>
> Regards
> C.

On our production system, this is what one of the recommendations for our query is (table name is different here, and retention period is one week):

SQL The predicate
SYS_EXTRACT_UTC("PACKETLOGS"."LOGDATE")<SYS_EXTRACT_UTC(CURRENT_TIMESTAMP(6)+INTERVAL'-00007 00:00:00' DAY(5) TO SECOND(0)) used at line ID 2 of the execution plan contains an expression on indexed column "LOGDATE". This expression prevents the optimizer from selecting indices on table "ORACLEUSER"."PACKETLOGS". I don't want to use the SYSDATE as its a DATE and that is NOT going to translate well when the retention period is smaller, ie a few hours (not for this table but for others). What about if i used LOCALTIMESTAMP - i've done a quick scan and it might be because the CURRENT_TIMESTAMP is timezone specific, wheres the logdate is not. I can assume our app's clients all run in the same timezone (hopefully anyways!!!)... i'll give this a shot and i'll let you know what i find. Received on Tue Jul 24 2007 - 07:09:34 CDT