Partitioning to help the COB get good stats

From: Will <wrwatson_at_gmail.com>
Date: 6 May 2006 04:32:12 -0700
Message-ID: <1146915132.119743.86860@y43g2000cwc.googlegroups.com>

Hello,

The database I'm looking after is used for a hosted solution that supports about 150 different customers. Each of these customers uses some different modules, and is called a "community" within our system. Some of our customers are very small and simple, and some are larger and complex (e.g. franchises).

I've been having a few problems with the CBO coming up with bad plans. I've traced and the core of the problem seems to be that it is underestimating the number of rows it will retrieve when it visits the payment_batch table. I've checked this with a /*+ CARDINALITY(payment_batch 1) */ hint. With that hint, it switches to the good plan (looks good, runs fast). We are on 10g Release 1, soon to go to Release 2.

We have different customers producing different types of data. The CBO takes the average across all this data when coming up with plans.

As an example:-

SQL> r
  1 select number_of_suppliers, count(*), sum(r) number_rows   2 from
  3 (
  4 select count(distinct(id_businesS_supplier)) number_of_suppliers, id_community,count(*) r

  5     from payment_batch
  6     group by id_community

7 )
8* group by number_of_suppliers

NUMBER_OF_SUPPLIERS COUNT(*) NUMBER_ROWS

------------------- ---------- -----------
                  0          3         753
                  1        136       33252
                  2         19        5540
                  3          5        2232
                  4          7        2580
                  5          4        1338
                  6          2        2173
                  7          4        3868
                 11          1         650
                 12          2        1196
                 15          1        2995
                 17          1        2256
                 23          1         987
                303          1       12582
               1038          1       25863

15 rows selected.

After you've read the above SQL statement three times, you can see that there is one community that has 1038 different values for ID_business_supplier. There's another that has 303 different values from ID_business_supplier. However, most of the rest have only a few, and indeed most communities (136 of them) have but a single supplier.

If someone issues a WHERE ID_business_supplier = :B1 AND ID_community = :B2, Oracle underestimates the number of rows returned. I've tried dynamic sampling, even at value 10, and this slows down the plan creation, but does not make it any better.

My current plan is to partition the table based on ID_community. I'll put the two outliers (and any future ones) into their own partitions. Then, we'll change the application to use literals when selecting on ID_community. My explain plan tests show the CBO then switches to the partition-level statistics. The selectivity calculation is then within the partition only.

My questions are:-

Has anyone tried what I'm considering? Any gotchas?
Anyone with a better idea of fixing this problem, besides putting a bunch of hints everywhere?
Is there any tricky way of getting inside the CBO to help it out with selectivity calculations? For example, I know that ID_business_supplier and ID_community are correlated, but can't tell the CBO this and it doesn't figure it out.

Cheers,
Will Received on Sat May 06 2006 - 06:32:12 CDT