Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Table with 7000 columns?

Re: Table with 7000 columns?

From: Peter Nolan <peter_at_peternolan.com>
Date: 30 Nov 2004 09:44:19 -0800
Message-ID: <f4f21ef3.0411300944.90ff020@posting.google.com>


hopehope_123_at_yahoo.com (utkanbir) wrote in message news:<f6c90ebe.0411292239.42efe1f2_at_posting.google.com>...
> Hi ,
>
> This is related to datawarehouse , data mining . We are told that data
> mining tools such as spss or sas , need a large table which has lots
> of columns inside . Based on our project , out final table which will
> be used in data mining , has lots of computed values . The final
> number of columns of the table is 7800!(lots of calculated values)
> First of all , i dont see the reason of this requirement of the data
> mining tools. I will be appreciated if someone can clarify this: Why
> does a data mining tool need such a large table?
>
> The main issue is if i create table with 7800 columns inside ( in fact
> oracle only allows to create table with 1000 columns) , i believe it
> cant be queried.
> My basic calculation shows that the average row size of this table
> will be 160kb. , considering my db_block_size of 16kb. this means 10
> blocks for 1 row. (The table will have more than 10.000.000 rows) No
> matter how fast my disk subsystem is , i think the queries against
> this table will fail. So what can i do? May be I need a different
> type of storage technique for instance column based storage ( i heard
> that sysbase has this feature , dont know the details / purpose..)
> How can i solve such a problem? The database server really does not
> matter , it can be oracle,sqlserver , sybase,informix , etc... I will
> be appreciated if someone can help me abut the issue.
>
> Kind Regards,
> hope

Hi Hope,
the situation is more like this...

A number of the DM tools, (Clementine, IBM Intelligent Miner etc) are very good at looking for associations between items that have not been recognised as being associated before. They are good at looking for strength of relationships between data thought to be 'non-related'.....ONE way to make this easy for these tools to do is to deliver a great big flat file with lots and lots of columns because that's easier for the tool to digest. And a number of these kinds of vendors will present this as a 'requirement'.

That's all ok. And the tools are great etc etc.

But this does not mean you need to design your database as one thumping great flat table. You only need to deliver some of the data like that. So you can run queries and present the data to the tool as the tool would most like it.

Also, the vendors might tell you that you need to deliver ALL your companies data to the data mining tool. (They often license on CPUs) But truth be known, if you can sample your data well, the tools find the relationships in the sample just as well as the whole lot. After all, if the attributes are 'related' they will be just as related in a 10% sample as they are in a 100% sample. Sure, the 10% sample increases 'error rates' but good sampling will help cut the error rates down a bit.

If you are going to build a DW to house this data, best start reading up on the typical design mechanisms.

So, no, you don't need a table with 7000 columns. Or even 500. I'm doing one at the moment with the largest number of columns I've ever done at 400 and we are experiencing some difficulties even with this on Oracle....we are considering more separate to reduce the number of columns....100 columns seems to be a much more practical limit for a table that will be indexed and queried.....but that's only a guess on my part....the good folks here will know more about it...

On IQ. I have used IQ quite a bit and I'm definitely a fan of it....but it seems a bit crass to talk about it on the Oracle newsgroup. Feel free to email me at peter_at_peternolan.com if you want to discuss it more. You will be asked to authorise yourself by my spam software.

Best Regards

Peter Nolan. Received on Tue Nov 30 2004 - 11:44:19 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US