Oracle FAQ Your Portal to the Oracle Knowledge Grid

Home -> Community -> Mailing Lists -> Oracle-L -> How I save Cingular Wireless USD 30M

How I save Cingular Wireless USD 30M

From: Tom Pall <>
Date: Fri, 24 Aug 2007 14:13:56 -0500
Message-ID: <>

I was asked to look into a database which was deadlocking. This was one of dozens of databases which handled switch data. Nothing in the telecoms is small. These were up to 17 TB databases. And they were on 8i and 9i and dictionary managed.

I looked and looked. I could find a reason why the database was deadlocking except that it was running more slowly than before (vendor application which had not been changed). I opened up an iTAR with Oracle and of course got back the usual "How to tune a database" and "What causes deadlocks" citations. I was infuriated. I've been with Oracle since Oracle 2, so I didn't need this.

I thought and thought. I ruminated. I searched metalink. In a dream one night I had the thought "data dictionary corruption". No kidding. In a dream. So I searched Metalink and found script to report data dictionary corruption. I ran it and found that the database that was suffering from deadlocks had an amazing number of orphaned index segments. I called up Oracle, got eventually able to pull rank with Cingular's rep and got to talk to the head of Oracle Support and some internal gurus. Yes, they said, we've checked with developers and most likely internal errors (not one of the ora-xxxx ones) was probably the culprit. They told me to export the database. I told them to shove it, as I timed what an export/import would take on this 24 X 7 realtime database: 12 months. Oracle support said that they could NOT work with me to fix these orphaned segments and what later became 7 other types of corruption I discovered. So now we're at the VP - VP level. Finally Oracle gives in and allows me to fix the data dictionary /and keep the database and Oracle's support of it/. I fixed it by getting everyone off, putting it into restricted mode, running DML against that data dictionary, committing than ABORTING the database. Abort is necessary because the data dictionary doesn't follow the SCNs the way normal DML does. So I restarted the database and SMON pegged at 90%. After a couple of hours, it was over. I unrestricted the database, let the application in and the database was running 4X faster than it had previously run. We had one database that was so highly corrupted that Oracle Support and I gave it 4 months to live. So we cloned the database and kept feeds from the previous database. I spent Christmas and New Years two weeks a couple of years ago, with the flu and a fever of 102 F working 14 hours home, getting it ready to un-corrupt. When I ran the DML, committed and bounced the

database, SMON spun at 90% for 6 weeks straight.  Eventually it stopped
spinning, everything was fixed, I got the old feed into the database and got
it ready, two months before the 4 month death sentence the then new
department of Oracle Support Internals I caused to come into being gave the parent database to live. I announced that the new clone was ready and open for business, but of course no one paid attention until the day the parent database closed down and refused to open. There was a mad rush to certify the new database.

All in all, I saved Cingular approximately USD 30M in not losing these very vital databases and not having to continue to buy yet more hardware as they had been doing for years before this DBA cane on the scene. I achieved increase in speed of the Oracle Engine from 2-10X what it was previously.

What exactly did I do to fix the corruption. Sorry, but that is Oracle Confidential. If you have this problem I can give you names within Oracle but go no further. Yes, we eventually were able to convert to DMT (after even more corruption fixing) and now the problem is gone.

Tom Pall
Oracle DBA

Received on Fri Aug 24 2007 - 14:13:56 CDT

Original text of this message