Newsgroups: c.d.o.server

Re: A table with 4000 partitions

From: Vsevolod Afanassiev <vafanassiev_at_aapt.com.au>
Date: 30 Jan 2002 00:09:39 -0800
Message-ID: <4f7d504c.0201300009.211f8996@posting.google.com>


Daniel,

  1. About a book of world records: if Oracle says a table may have up to 64,000 partitions, isn't it worth trying? The idea is to use an extreme example to expose effects that may not be so obvious for a table with, say, 1000 partitions.
  2. Real-life scenario: a system continuously receives incoming files from 10 different sources (all in the same format), at 10 - 20 files per minute. Files vary in size from one record to more than 20,000 records, with a record size of around 200 bytes. The task is to load them into the database and display the results to users *as soon as possible*. The shorter the interval between a file's arrival and the data's availability in the database, the better (currently it is in the 2 - 3 minute range). The disk space limitation (200 GB) allows us to keep about 20 weeks' worth of data; after 20 weeks the information can be deleted (truncated). Users mostly look at records received during the last 2 hours. Solution: a table partitioned by source (10) and day of year (366), giving 3660 partitions. All partitions are pre-created, and incoming files have a "day of the year" field. Each partition has a small INITIAL and a much bigger NEXT extent, so empty partitions do not take much space. 10 cron jobs (one per source) run every minute and load incoming files using SQL*Loader direct path mode. Each incoming file contains data for a short time interval, so in most cases it can be loaded into a single partition, and loads for different sources can run in parallel. The table has no indexes, because SQL*Loader puts indexes (or their partitions) into an UNUSABLE state for a period of time after a direct load, and during that time queries that use the indexes return an error. Since all table access is full-scan based, reducing partition size is one way of speeding up queries, and most user queries access only 1 - 2 partitions. I think this solution is optimal.
  3. Management nightmare: since we re-use partitions, we don't need to run a partition creation script, and the same applies to datafiles. The only management I do is run the partition truncation script. Occasionally I need to add datafiles (or reduce the retention period) to account for small increases in the volume of incoming traffic.
  4. Incredibly long DDL to store for version control: apart from the CREATE TABLE, I don't have any other script that lists all partitions. (Not exactly true - I had to add new sources, and to do this I had to run a SPLIT PARTITION script. This took a while, but it wasn't a big problem, because only one partition (for each source) is being loaded at any point in time, so I simply excluded the current partitions from the split script and ran the split for them on the next day.)
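
The scheme above might look roughly like the following sketch. All table, column, and partition names here (incoming_records, p_s1_d001, and so on) are made up for illustration, as are the storage sizes and split boundary - the post does not give the actual DDL:

```sql
-- One pre-created partition per (source, day); small INITIAL so empty
-- partitions stay cheap, larger NEXT for partitions that fill up.
CREATE TABLE incoming_records (
  source_id    NUMBER(2)     NOT NULL,
  day_of_year  NUMBER(3)     NOT NULL,
  received_at  DATE          NOT NULL,
  payload      VARCHAR2(200)
)
PARTITION BY RANGE (source_id, day_of_year) (
  PARTITION p_s1_d001 VALUES LESS THAN (1, 2)
    STORAGE (INITIAL 64K NEXT 10M),
  PARTITION p_s1_d002 VALUES LESS THAN (1, 3)
    STORAGE (INITIAL 64K NEXT 10M),
  -- ... 3660 partitions in all ...
  PARTITION p_s10_d366 VALUES LESS THAN (10, 367)
    STORAGE (INITIAL 64K NEXT 10M)
);

-- Re-use a partition at the end of the retention window:
ALTER TABLE incoming_records TRUNCATE PARTITION p_s1_d001;

-- Splitting an existing partition (boundary values are arbitrary here):
ALTER TABLE incoming_records SPLIT PARTITION p_s10_d366
  AT (10, 200)
  INTO (PARTITION p_s10_d366a, PARTITION p_s10_d366b);
```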

I was mostly looking for information on the internal workings of Oracle when processing a large number of partitions - for example, does using bind variables on partition key columns help reduce parse times? Or does Oracle have to re-parse such statements because the partition elimination may differ? I noted that partition elimination (pruning) may happen at parse time or at execution time: see the Oracle8i Release 2 (8.1.6) Designing and Tuning for Performance manual, page 5-6 (the explanation of the possible values in the PARTITION_START and PARTITION_STOP columns of the PLAN_TABLE used by EXPLAIN PLAN).
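
One way to check which kind of pruning you are getting is to compare PARTITION_START/PARTITION_STOP for literal and bind-variable predicates. A hypothetical example (table and column names are invented for illustration):

```sql
-- With literal predicates on the partition key, PARTITION_START and
-- PARTITION_STOP typically show concrete partition numbers, indicating
-- parse-time pruning.
EXPLAIN PLAN SET STATEMENT_ID = 'prune_test' FOR
  SELECT COUNT(*)
  FROM   incoming_records
  WHERE  source_id = 3 AND day_of_year = 45;

SELECT operation, options, partition_start, partition_stop
FROM   plan_table
WHERE  statement_id = 'prune_test';

-- Repeating the EXPLAIN PLAN with bind variables (:src, :day) in place
-- of the literals may instead show KEY in those columns, meaning the
-- partitions to scan are determined only at execution time.
```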

Generally, I feel that if the slow parse problem can be eliminated, a table with 20,000 - 40,000 partitions and no indexes may be the best solution in a warehouse-type environment with simultaneous loads and queries - as long as users are not particularly unhappy that each query takes at least 10 seconds (we run this application on a Sun 6500 with 14 CPUs and 8 GB RAM). A big advantage is that without indexes any column can be used in the WHERE clause, with or without leading characters (e.g. you can search on 'Sev%' or '%Sev%' with the same performance).
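
To illustrate the last point (hypothetical names again): because every query is a full scan of the pruned partitions anyway, a leading-wildcard search costs the same as a trailing one, which is not true for an index-driven plan.

```sql
-- Both statements do the same full scan of the single partition
-- selected by the partition-key predicates; the wildcard position
-- makes no difference to the access path.
SELECT * FROM incoming_records
WHERE  source_id = 3 AND day_of_year = 45
AND    payload LIKE 'Sev%';

SELECT * FROM incoming_records
WHERE  source_id = 3 AND day_of_year = 45
AND    payload LIKE '%Sev%';
```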

Regards,
Sev

"Daniel A. Morgan" <damorgan_at_exesolutions.com> wrote in message news:<3C558194.EBF174D6_at_exesolutions.com>...
> Possible additional disadvantage:
>
> 4. Management nightmare.
> 5. Incredibly long DDL to store for version control
>
> Is the point getting into a book of world records or is there some reason
> to believe it will actually accomplish something related to a real
> problem? That's the question I'd ask.
>
> Daniel Morgan
>
>
>
>
>
> Vsevolod Afanassiev wrote:
>
> > What are possible advantages/disadvantages of using very large number
> > of partitions, let's say, 4000 or more, in DSS environment?
> > I see following
> > Advantages:
> > 1. Reduces the need for indexes, and potentially no indexes at all
> > 2. More flexibility in removing information by using "alter table
> > truncate partition"
> > 3. It is possible to run several direct loads in parallel in
> > different partitions
> > 4. More flexibility with tablespaces/datafiles
> >
> > Disadvantages:
> > 1. Bigger shared pool
> > 2. Higher ENQUEUE_RESOURCES
> > 3. Some operations are slower, for example, snapshot refresh
> >
> > Anything else?
> >
> > Again, I am talking about DSS environment, with big "historic facts"
> > table
> > being gradually populated from one end (let's say by direct loads)
> > and truncated at the other end, the information is never updated,
> > users only access this table by running reports, so there is never a
> > need
> > to return just a few rows.
> >
> > I saw a note on the Metalink that says that the maximum number of
> > partitions
> > per table is 64,000.
> >
> > Thanks,
> > Sev
Received on Wed Jan 30 2002 - 02:09:39 CST
