Oracle FAQ Your Portal to the Oracle Knowledge Grid
Newsgroup: comp.databases.oracle.server

Re: cost of having MANY tables

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: Thu, 2 Dec 1999 12:07:28 +0200
Message-ID: <825gd4$9ck$1@ctb-nnrp1.saix.net>


Steve Parker wrote in message <384458AE.75EAEC11_at_averstar.com>...
>Can someone please give me a run down of the costs of having a HUGE
>amount of tables (each being pretty small) as opposed to having fewer
>tables that are bigger?

This is not a simple comparison.

You are in fact "breaking" the ERD and Codd's rules if you start to do this type of thing - splitting a single logical table into multiple smaller physical tables. Not that this is necessarily wrong, but it is something to keep in mind, as it makes the "translation" of your third normal form design (or star schema, or whatever design you're using) into the physical database structure more complex.

The application layer's complexity also increases: the developer no longer deals with a single table X, but with tables X1 and X2 through Xn.

You can remove this from the application layer using views and stored procs in the database. However, you're not removing the complexity - you're simply shifting it to the database layer.
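As a sketch of that shift (table and column names are invented for illustration), a view can stitch the split tables back together so the application still sees a single table:

```sql
-- Hypothetical example: CUSTOMER was split into CUSTOMER_1998 and
-- CUSTOMER_1999. A view presents the pieces as one table again.
CREATE OR REPLACE VIEW customer AS
  SELECT cust_id, cust_name, created_date FROM customer_1998
  UNION ALL
  SELECT cust_id, cust_name, created_date FROM customer_1999;
```

The application queries CUSTOMER as before - but the complexity hasn't gone away, it now lives in the view definition (and in whatever stored procs maintain the underlying tables).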

There are various ways to partition data physically. From a conceptual point of view you can split data either vertically or horizontally. Vertical partitioning means that instead of having all the columns in a single table, you create multiple tables, all sharing the same key, and split the columns between them. Horizontal partitioning is what you are referring to - breaking the rows of a single table up into multiple tables.
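A minimal sketch of the two splits (all names are made up):

```sql
-- Vertical split: same key in both tables, columns divided between them.
CREATE TABLE emp_core   (emp_id NUMBER PRIMARY KEY, emp_name VARCHAR2(40));
CREATE TABLE emp_detail (emp_id NUMBER PRIMARY KEY, salary NUMBER,
                         notes VARCHAR2(200));

-- Horizontal split: same columns in both tables, rows divided by some
-- criterion (here, the year of the order).
CREATE TABLE orders_1998 (order_id NUMBER, order_date DATE, amount NUMBER);
CREATE TABLE orders_1999 (order_id NUMBER, order_date DATE, amount NUMBER);
```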

From a technical and implementation point of view, you also have various options. You can partition data using clusters, partitioned tables, partitioned views and so on.
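For example, with Oracle8's partitioning option a single logical table can be split across physical range partitions behind the scenes (a sketch with invented names, not a recommendation):

```sql
-- One logical table, physically split by date range. Queries still
-- address SALES; Oracle prunes to the relevant partition(s).
CREATE TABLE sales (
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION sales_1998
    VALUES LESS THAN (TO_DATE('01-JAN-1999', 'DD-MON-YYYY')),
  PARTITION sales_1999
    VALUES LESS THAN (TO_DATE('01-JAN-2000', 'DD-MON-YYYY'))
);
```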

So what are the Oracle costs of a couple of VLTs (very large tables) versus a lot of smaller tables? Negligible. The additional overhead in Oracle is basically that the data dictionary requires more space to store the database structure data. But the space requirements of the data dictionary are minuscule compared to the overall space requirements of the entire database.

Joins and the like? That depends entirely on what you are trying to do: the nature of the joins (hash, nested loops), the indexes, parallelising of full table scans, and so on. You can run into performance problems by joining 10 small tables together. You can run into performance problems when processing a single large table. You cannot say "because I have lots of small tables I will not have performance problems". Neither can you say "because I have everything in a few large tables, I will not have performance problems". Processing data is far more complex than the simple issue of the number of tables involved, or even the data volumes involved.

For example, on a data warehouse I helped to build, I could profile a single family's medical history for the last 3 years in seconds. This required processing a VLT of over 170 million rows, joining it with several smaller tables and aggregating the resulting data. What played a critical role here was not the number of tables involved, but how well the database was designed to cope with this type of requirement.

So what to do? IMO you first need to identify WHAT needs to be done: user requirements, specifications, logical designs and the like. Once you have that, you have enough info to decide HOW to do it - the technical stuff of translating the logical design and user requirements into a physical design and processes.

My crazed thoughts anyway... ;-)

regards,
Billy

