Delete 5 million records (oracle 9.2.0.3)
Delete 5 million records [message #404743] Sat, 23 May 2009 02:28
prachij593
Messages: 266
Registered: May 2009
Senior Member
This is the scenario: my client wants me to delete 5 million records from the member table. There are integrity constraints imposed across tables. The database is 24/7. The client will provide the member ids that need to be removed.

What approach should I follow? The table contains 100 million rows.


Many thanks in advance
Re: Delete 5 million records [message #404745 is a reply to message #404743] Sat, 23 May 2009 03:43
Michel Cadot
Messages: 64153
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Just DELETE.

Regards
Michel
Re: Delete 5 million records [message #404788 is a reply to message #404745] Sat, 23 May 2009 16:15
Kevin Meade
Messages: 2101
Registered: December 1999
Location: Connecticut USA
Senior Member
Since you have a 24/7 requirement, you pretty much have to delete rows one at a time.

Write yourself a package to do it. Batch the deletes 100 or 1000 at a time and commit after each batch. Then just start the job and monitor it. The job may run for hours or even days depending on your CPU power and the amount of data you must actually delete.
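A minimal sketch of that batching loop, assuming a hypothetical staging table ids_to_delete that holds the member ids supplied by the client (adjust names to your schema):

begin
  loop
    delete from member
     where member_id in (select member_id from ids_to_delete)
       and rownum <= 1000;        -- cap each batch at 1000 rows
    exit when sql%rowcount = 0;   -- stop when nothing is left to delete
    commit;                       -- commit per batch to keep undo small
  end loop;
  commit;
end;
/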

If integrity constraints are getting in the way then you may have to delete child rows too. This implies you need to understand something about their data model, so start learning it.
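If you do not know the model, the data dictionary can tell you which tables have foreign keys pointing at MEMBER (a same-schema sketch; MEMBER is the table name from the original post):

select c.table_name, c.constraint_name
  from user_constraints c, user_constraints p
 where c.r_constraint_name = p.constraint_name
   and c.constraint_type = 'R'
   and p.table_name = 'MEMBER';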

You should keep a control table with the ids and mark keys as deleted as you delete them. That way you can easily restart your job when it fails (and it will).
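One possible shape for that control table, using the names and status values from the example further down (the column sizes are illustrative):

create table keycontroltable (
  id     number primary key,                  -- member id supplied by the client
  status varchar2(13) default 'not processed' -- 'not processed', 'deleted', or 'failed'
);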

I would also suggest that, as part of your delete, you first write the rows to an "OLD" table (or tables). That way, when you figure out you made a mistake, or your client figures out they gave you bad instructions, you can get the data back without having to go to a restore and disrupt that ever-precious 24/7.
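Creating that save table once, empty, with the same shape as the source is enough (sourcetable and saverowtable are the names used in the example below):

create table saverowtable as
select * from sourcetable where 1 = 0;   -- copies the structure, no rows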

One way to speed up the process is to create an id WORK table. Move a group of keys to the work table. Mark rows as deleted in your control table by reading the work table. Then delete rows from the source by reading the work table. You would use an IN clause with plain-Jane DML rather than a row-by-row FOR loop. Consider this:

-- grab the next batch of 1000 unprocessed keys
insert into worktable select id from keycontroltable where rownum < 1001 and status = 'not processed';
-- mark the batch as deleted in the control table
update keycontroltable set status = 'deleted' where id in (select id from worktable);
-- save the rows about to be removed, in case they must be restored
insert into saverowtable select * from sourcetable where id in (select id from worktable);
-- remove the batch from the source table
delete from sourcetable where id in (select id from worktable);
-- empty the work table, ready for the next batch
delete from worktable;
commit;

The above is just an idea but it does the following:

1) it batches work 1000 keys at a time, so if your job fails you only lose the time spent on the last 1000 rows.
2) it saves the rows being deleted so you can fix problems later.
3) it uses bulk insert/update/delete, so there are minimal round trips to the database.
4) it is easily restartable because of the control table.

Any logic you can write in a FOR loop you should be able to write like this, so if you need to consider child tables you can just keep adding to the SQL stream, as in the sketch below.
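For example, with a hypothetical child table sourcechild (and a matching savechildtable) referencing sourcetable, the children are saved and deleted before the parent so the foreign key does not get in the way:

insert into savechildtable select * from sourcechild where parent_id in (select id from worktable);
delete from sourcechild where parent_id in (select id from worktable);
delete from sourcetable where id in (select id from worktable);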

You will have to figure out what you want to do about errors. I suggest that on error you mark all rows in the batch as failed rather than deleted, roll back, and then do the next batch.
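One way to wire that up, reusing the tables from the example above; the work-table insert is committed on its own first so the failed batch can still be identified after the rollback:

begin
  insert into worktable
    select id from keycontroltable
     where rownum < 1001 and status = 'not processed';
  commit;  -- commit the work list separately so it survives a rollback
  begin
    update keycontroltable set status = 'deleted'
     where id in (select id from worktable);
    insert into saverowtable
      select * from sourcetable where id in (select id from worktable);
    delete from sourcetable where id in (select id from worktable);
  exception
    when others then
      rollback;  -- undo the partial batch; the work list is already committed
      update keycontroltable set status = 'failed'
       where id in (select id from worktable);
  end;
  delete from worktable;
  commit;
end;
/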

There are other methods, and others may have different opinions; let's wait and see if anyone else pipes up. Do not forget to put an index on the status column.
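Using the control table from above, that index is a one-liner (the index name is illustrative):

create index keycontrol_status_idx on keycontroltable (status);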

Good luck, Kevin