Processing lots of data [message #267636] Fri, 14 September 2007 01:12
scottwmackey
Messages: 505
Registered: March 2005
Senior Member
I have a fairly large-scope question, and I was wondering whether anyone out there knows for certain (no guesses and theorizing please, we have plenty of that already in our little group) whether it can be done. We have an application with several processes that may have to insert and/or update, with a little processing here or there, up to a billion rows a day. We are told we can get whatever hardware we need, and we supposedly have the time to design the DB in any way we need. But we were wondering if there is anything we can really do.

For instance, we have one staging table into which records will be batch inserted four at a time, and immediately committed, from 30 or so Java threads, for a total of one billion records a day. Can we get around the wait problems that creates by putting 30 partitions on the table? (We can hash partition on a key to guarantee that the partitions are similar in size.) Does each partition have its own high-water mark? Would putting the partitions in separate tablespaces on different disks get around read/write waits?

We then need to move the data from the staging tables to final tables. Would it maximize use of the CPUs if we created several jobs, say through dbms_job, to process pre-assigned partitions of the staging table? If the staging table partitioning helps with I/O, we would obviously follow a similar strategy for the final tables (they will hold probably 95% of the rows in the staging table).

In short, is it possible to design an Oracle application that spreads the load across x CPUs and y disks so that you can process whatever number of rows you need just by adding enough CPUs and disks? Can Oracle and Linux (or some other UNIX variety) be configured that way? Once again, I am not asking how to do it, but if anybody wants to volunteer some specifics I certainly wouldn't complain. I am just wondering if anyone knows for certain whether it can or cannot be done.

If it can, then I will do more research to find out exactly how. If it can't, then I'm not going to waste my time.

Thanks,
Scott
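[Editor's note: the staging table asked about above might be sketched as below. All table, column, and tablespace names are illustrative assumptions, not from the thread. Oracle recommends a power of two for the hash partition count, so 32 rather than 30; each hash partition is its own segment with its own high-water mark, which is the property the question relies on.]

```sql
-- Hypothetical staging table, hash-partitioned so that concurrent
-- inserters working on different key ranges land in different segments.
CREATE TABLE staging_txn (
    account_no   NUMBER          NOT NULL,
    txn_payload  VARCHAR2(4000),
    created_ts   TIMESTAMP       DEFAULT SYSTIMESTAMP
)
PARTITION BY HASH (account_no)
PARTITIONS 32   -- a power of two spreads the hash buckets evenly
STORE IN (stg_ts01, stg_ts02, stg_ts03, stg_ts04);
```

STORE IN assigns the partitions round-robin over the listed tablespaces, which can sit on separate disks.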
Re: Processing lots of data [message #267646 is a reply to message #267636] Fri, 14 September 2007 01:35
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Scott,
Maybe next time you can also remove all the spaces.

Regards
Michel

Re: Processing lots of data [message #267760 is a reply to message #267646] Fri, 14 September 2007 14:08
scottwmackey
Messages: 505
Registered: March 2005
Senior Member
I considered putting in some random paragraph breaks for the folks on the forum who don't read so well, but then I figured that anybody who was smart enough to give me an answer would be able to read.
Re: Processing lots of data [message #267764 is a reply to message #267760] Fri, 14 September 2007 14:19
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
English is not my native language.
Line-breaking conventions differ between languages, and reading is hard enough as it is when paragraph breaks would help show where one thought ends and the next begins.
I could just as well make the effort to read and understand your question and then answer you in French, saying that if you could not read it in French, you are not smart enough to understand it.

For the moment, you have no answer. I wonder if your method is a good one.

Regards
Michel

Re: Processing lots of data [message #267954 is a reply to message #267764] Sun, 16 September 2007 23:05
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Can I rephrase your question?

Ross reckons Scott said
Is Oracle infinitely scalable given infinite CPUs and disks? If so, how do you go about it? And if not, what will break first, and what can be done to prolong the break-point?


A multi-threaded application such as this will suffer from (some of) the same problems as a heavily used OLTP application. You have already outlined how you might isolate each thread to avoid block contention. What about the shared resources, though, such as the database writer and log writer? I'm a bit out of my depth here (not a DBA), but I understand there are some necessarily serialized actions that could limit multi-threading.

Here's what I'd do:

Of these billion rows, you may or may not have inter-row dependencies (say, an aggregation or a previous-row compare). Say, for example, you need all rows with the same ACCOUNT_NO to be processed in the same logic stream. Work out how many threads you want/need (N), and assign transactions to threads based on a hash of ACCOUNT_NO mod N. That way all transactions for account 20 hash to the same value and are processed by the same thread.
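[Editor's note: the hash-and-mod assignment described here can be done directly in Oracle SQL with the built-in ORA_HASH function. A sketch; the table and column names are assumptions.]

```sql
-- ORA_HASH(expr, max_bucket) returns a bucket number in 0 .. max_bucket,
-- so max_bucket = N - 1 yields N stable thread assignments. With N = 64,
-- every row for a given ACCOUNT_NO always maps to the same thread_id.
SELECT account_no,
       ORA_HASH(account_no, 63) AS thread_id
FROM   staging_txn;
```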

Program your listeners to receive a transaction, hash the ACCOUNT_NO, and push the transaction onto the correct queue. Note that the number of listeners and the number of threads are not necessarily the same. Also, if you have no inter-transaction dependency then you can bypass this step - just let each listener be its own thread.

Now spread the N threads over X databases on Y boxes. Each of the X databases must carry a replicated copy of reference data needed by the transformation - this is easily done with MVs or replication.
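[Editor's note: replicating the reference data to each of the X databases could look like the sketch below. FAST refresh additionally requires a materialized view log on the master table; all names and the database link are assumptions.]

```sql
-- Local read-only copy of reference data, pulled over a database link.
CREATE MATERIALIZED VIEW ref_account
  REFRESH FAST ON DEMAND
  AS SELECT account_no, account_name, region
     FROM   ref_account@src_db;

-- Refresh before each transformation run (SQL*Plus syntax).
EXEC DBMS_MVIEW.REFRESH('REF_ACCOUNT');
```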

Within each database, stage each thread's data into a separate partition hashed on ACCOUNT_NO. When you are ready to transform from the staging tables, use a single partition-wise parallel DML statement to transform into identically partitioned tables, but with all partitions in the same tablespace.
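[Editor's note: the partition-wise parallel transform might be sketched as below. Names and hint degrees are assumptions; because the staging and final tables are identically hash-partitioned on ACCOUNT_NO, each parallel slave can process one partition pair without redistribution.]

```sql
-- Parallel DML is disabled per session by default.
ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path, parallel insert from staging into the final table.
INSERT /*+ APPEND PARALLEL(f, 8) */ INTO final_txn f
SELECT /*+ PARALLEL(s, 8) */
       account_no, txn_payload, created_ts
FROM   staging_txn s;

COMMIT;
```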

Now you have X tables (one for each database), hash-partitioned by ACCOUNT_NO. Use transportable tablespaces to ship them all to a common database, then exchange-partition them into a common sub-partitioned table. Use a technology that allows you to swing the disks between machines so you avoid copying the data.
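[Editor's note: once a transported tablespace is plugged in, the swap is a dictionary-only operation, which is why only its serialization matters at the tail end. A sketch; all names are assumptions.]

```sql
-- Swap the per-database table into its slot in the consolidated table.
-- No data is copied; only the data dictionary is updated.
ALTER TABLE all_txn
  EXCHANGE PARTITION p_db1
  WITH TABLE db1_txn
  INCLUDING INDEXES
  WITHOUT VALIDATION;
```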

For example, say you have 30 listeners receiving a total of 1B transactions per day. Benchmarking tells you that, based on average processing speeds, you need 64 threads to stage the data. The Unix boxes you choose (based on price, of course) turn out to handle 4 databases each (based on available memory), and each database handles 4 threads (based on contention for shared resources). With each machine handling 16 threads, you therefore need 4 such machines.

Theoretically, scalability is now limited by only TWO things:
- The number of listeners you can create at the front-end.
- The amount of time it takes to EXCHANGE PARTITION at the tail end of the job - this would be serialized.

Ross Leishman
Re: Processing lots of data [message #268134 is a reply to message #267954] Mon, 17 September 2007 12:56
scottwmackey
Messages: 505
Registered: March 2005
Senior Member
Ross,

You reckon correctly. Thank you ever so much for such a detailed response. I am indebted.

Scott