That flakey little animal called SNAPSHOT -- pl. comment

From: Rajiv Tandon <rajiv.tandon_at_bridge.bellsouth.com>
Date: 1996/08/01
Message-ID: <32016A31.4F49_at_bridge.bellsouth.com>


Hi all:

Who agrees ?

That the best way to describe that flakey little animal called a snapshot is: It's a quick & dirty market response by Oracle to provide an interim

    half-solution while they worked on the Replication Server.

Will it fade into oblivion, soon ?

After all, it's just put-together of a bunch of tables, triggers, and a refresh process, sitting on top of already developed features.

Humour aside, I'd greatly appreciate it if anyone would care to shed some light on the following issues I am trying to get a handle on:

Background



Multiple large Production databases running to 50 Gigs and above (each) on ORACLE 7.1.3/HP T500/HP-UX 9.04 enviroments supporting PeopleSoft Rel 3 applications. Each database is usually on it's own machine.

A master site houses 40 core central tables to be looked-up by other databases. They are currently "snapshotted" over to 2 such databases (later to be 6). Snapshots was the decided mechanism because the Replication Server was still Beta in 7.1.6. Also, because of various other dependencies such as PeopleSoft, and testing needs, we are not in a position to upgrade to higher Oracle versions at the drop of a hat.

When we added the 3rd snapshot site, we created the snapshots, then for an unrelated reason had to trash the site and re-build the database. Snapshots are NOT created yet in this 3rd site.

Problems



Ever since the snapshots were turned on, I haven't slept much. The one problem we could get a fix for was and ORA-600: [17271] Instantiation space leak, for which Oracle has a patch (namely, #275335 generic, #286670 for HP-UX)

When the 3rd remote snapshot was added, an entry was made in DBA_SNAPSHOT_LOGS. But all the entries look darned identical for a given master table, except for the field CURRENT_SNAPSHOTS which is a timestamp of when the remote site was last updated.   

(1) i.e. If you have more than one snapshot of the same master, what distinguishes

    the records in DBA_SNAPSHOT_LOGS from each other, associating them with the remote     database ? Did we botch our data dictionary ? Or is it that there are some fields     in SYS.MLOG$ and SYS.SLOG$ (tables the DBA_SNAPSHOT_LOGS view is based on) which     are taking care of it ?

(2) Along the same line, since I trashed my 3rd site, how can I remove the vagrant

    entries from DBA_SNAPSHOT_LOGS ? I know which ones they are because the     CURRENT_SNAPSHOTS field has the oldest date, but obviously I cannot remove     the records from a data dict. table manually. Would dropping and re-creating     the logs (and subsequently doing complete refreshes on dependent 2 sites) be the     only way or is there a command or package I am unaware of ?

(3) Meanwhile, the actual log tables (MLOG$_tablename) are burgeoning in size, as

    this defunct entry for the 3rd site is still there. So, we'll do a     DBMS_REFRESH.PURGE. Does this do a controlled commit (as are provided     in Import and SQL*Loader through parameters) or will it crap out on me because     of a rollback problem ? i.e. Should we anticipate huge rollback and plan for this ?

(4) Now, every now and again, the snapshot sites get the standard:

    ORA-12012, ORA-12004 and 3 ORA-6512s in the alert logs.     

    We figured out a few, but are there some commonly known reasons ? Documented in a     paper perhaps ?
    (I appologize to have to ask this, rather than look for it myself, as I have     temporarily lost my registration to Oracle's Supportlink since tha Beta one went     away).

(5) At the same time, we find that the master site is kicking ass in spitting out

    ORA-600: [12333] errors in udump during our nightly shutdown for backups.

    The Oracle Cust. Supp. consultant so far believes this is due to some     hanging/defunct processes and advised we used the SQL*Net EXPIRE_TIME feature for     dead-connection detection.

    Firstly, we can't use it as it is a SQL*Net 2.1 feature and PeopleSoft Rel 3 does     not handshake with SQL*Net 2.1. Secondly, the traces clearly reference the     ORACLE_SID of one of the 2 remote sites and some snapshot log stuff.

    I am guessing it is due to the snapshot polling processes colliding with the     shutdown. BTW, the ORACLE_SID of the other site is never in any trace (it     does have a refresh interval of 1 hour while the one mentioned has 5 minutes !!).

    Has anyone experienced ORA-600: [12333] and can share some experience ?     Is there a fix ???

(6) On the same subject, when the master is taken down for backups, the remotes dump

    errors. Understandable, as they must log a down master somewhere. What is a     good way to get around this ? Write a remote procedure call from the     master database to stop "snapshotting" while I am down ? Any thoughts ?

All opinions are mine, etc. etc.

Rajiv Tandon
DBA Consultant
Phoenix Project
BellSouth Received on Thu Aug 01 1996 - 00:00:00 CEST

Original text of this message