Re: partiioning option not worth it?

From: Mark Townsend <markbtownsend_at_attbi.com>
Date: Fri, 01 Mar 2002 03:44:13 GMT
Message-ID: <B8A4390D.143A4%markbtownsend@attbi.com>

Well not to belabour the point, but some additional comments inline

in article a4kf8.61658$Ah1.7832759_at_news2-win.server.ntlworld.com, Keith Boulton at kboulton_at_ntlworld.com wrote on 2/27/02 10:24 PM:

> As I said, one day I'll sit down with the partitioning manual however...
>
>
> Mark Townsend <markbtownsend_at_attbi.com> wrote in message
> news:B8A2EE6E.13F17%markbtownsend_at_attbi.com...

>> For small amounts of data building your own partitioning scheme IS

> trivial.

>> But how much data are you adressing ?

> The quantity of data is irrelevant. The number of partitions to be supported
> is an issue.

In YOUR situation the quantity of data is irrelevant. For many PO users, it is not. There are many, many sites that will have single tables exceeding 1 Tb in size. Partitioning provides true benefits to these sites in terms of simply dividing and conquering the management costs large databases bring to bear. For instance, with 1 TB of data, how are you going to manage IO balancing across the 1000's of 9 Gb drives you are going to use ? Are you going to back this all up at once ?

>> Doing multiple full table scans on what is basically a union all view is

> easy when you only have 10 Mb of data
> As I said, the point actually is to perform searches on only the relevant
> tables. This is easy to achieve but tedious unless you design your
> application with an architecture in which the runtime query is generated
> from a single component. In any case, this is a good idea.
>

>> address +500 Gb of data in a single table. And your solution doesn't

> support
>

>> secondary indices, prune partitions

> If by pruning partitions you mean excluding tables not required for a query,
> that is exactly what my solution does.

Yes, but in your situation you have a very simple case - either the entire table, or only 1 partition. Say you are partitioning on date - how will you do a month-to-month comparison and only access the months you want. Your solution also doesn't prune partitions on a join, or do a parallel range scan across multiple partitions. With a little data, these are not issues. With screeds of data, these are very big issues. Let me give you a business case

A telco I worked with sunddenly got new competition from an overseas telco company. The new entrant into the market did not own any local networks (they actually rented these from the incumbent telco), but did have a very good international network in place. So part of their marketing campaign was a 5 cent weekend. Every weekend, they would chose a new overseas destination, and let people call these locations for 5 cents a minute.

The incumbent telco had to respond - but they needed to work out what matching this offer would do to their existing business (i.e cannabilization) by looking at the call volumnes over the last 3, 6 and 12 months and determining the probable level of cannabilization. Their existing Billing DW was not on Oracle, and was designed to do 2 things - find a customers monthly spend, or a customers spend over the entire 12 months (i.e the 1 or all partition model you have built). The new query took 48 hours to run. The telco could not match the offer becaue by the time they had worked out the cost, the weekend had already passed.

We took the data and put it up on Oracle8, with range partitioning. Kicked the query of at lunchtime one Friday, and came back in 40 minutes. We had a result (we then spent the rest of the weekend verifying that the query was indeed correct, but thats another story).

So my point - if your business model changes, your design will need to change as well. With partitioning, you will need to change your design less often.

You also conviently left out secondary indexes.

>> determine the best optimizer plan from both table and partition level

> statistics
> Under what circumstances would this be an issue? Almost certainly, if the
> partition key was given as part of the query, a partition level search is
> most efficient.

Not correct - it depends on how many partitions are involved, and the relevant data skew between these partitions. In some circumstances, a parallel range scan could be much more efficient than a partition index probe and lookup, especially in a DW, where most operations are peformed against a set of data, instead of a single row. Stats at the table level will allow the optimizer to determine this.

Alternatively, partition level stats allows the optimizer to determine the best access plan on a partition by partition basis, once again when multiple partitions are involved.

>> perform partition-wise joins or

> I agree this is a more interesting problem, but, as I said, it can be
> addressed by using templated sql statements. In any case, for a reporting
> database when partitioning is most likely to be used, you are, I believe,
> generally better off not joining the partitioned table to anything else, but
> to perform joins only after aggregation for reporting purposes has taken
> place.

Hmm - I'd have to disagree on that one - in fact a typical star schema will preclude this type of operation altogether.

>> prevent you from updating rows so that they no longer blong to the correct

> Check constraint
>

>> (or move the rows if you do).

> Agreed. But then, I believe in most cases you have a bulk dataload with more
> appends than updates. After all, what are you going to partition on?
> Normally it will be a date or business unit.

My point is what if somebody makes a mistake, and a row ends up in the wrong 'table'. Queries from your application would never see this errant row, yet you would quite happily get results (albeit wrong). And unless you went looking for misplaced rows, you would never know you had a problem. Constraints don't address the problem either, becuase in many large VLDBS the constraints are not actually enforced on ayhtning other than the load cycle as they are to expensive. When we used to do manual partitioning before the 7.3 days, this was a huge problem - eventually it would happen and the business would lose rows.

But by all means take this with a grain of salt - YMMV.

>> I would agree that from your description, the partitioning option is way
>> overkill for your problem.

>> But for big sites with vast amounts of data, they
>> simply could not survive without partitioning.

> Again, the quantity of data is irrelevant.
>

No it's not !

>> BTW - I believe the row movement problem will be an issue in your app

> given

>> your description of how you are using codes to determine which partition

> to

>> ship queries to.

> It will not be an issue because it is a record of transactions for business
> units. A transaction does not move from one to another.
>
> I will re-iterate. I'm sure there are circumstances when buying the
> partitioning option (and more expensively, enterprise edition) is cost
> effective and possibly even required. I just don't believe there are many
> such circumstances ie less than 5% of the cases where partitioning is being
> used..
>
>
>
>
>
Received on Thu Feb 28 2002 - 21:44:13 CST