Home » SQL & PL/SQL » SQL & PL/SQL » Storing varying number of columns in table (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)
Storing varying number of columns in table [message #633777] Tue, 24 February 2015 11:18 Go to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Hi,

My problem is similar to below thread.

Split CLOB column to multiple columns within a row

I don't want to hijack the above thread , that's why I started a new one.
I am struggling with the similar problem (storing dynamic columns) - we will be receiving datasets with different(varying) number of columns.
What I am left with are below 3 options to store those datasets:

1. Storing them in clob will give hard time while parsing and fetching data (in terms of resource used), and moreover, I can't put any integrity constraints and you have shown all the problems already that we can face while having data in clob.
2. Storing them in inverted structure (columns into rows). I am already receiving the huge number of data, and if each column data I am going to store at row, even partitioning will not help much (I am not sure what kind of partitioning to use, primary key will be varchar to I left with only hash partitioning)
3. Creating separate table for each structure I receive, that way I will end up create hell lot of tables, whose management will be difficult.

I am going to choose either the 2nd or 3rd option, is there any other way which I should consider.
Any help will be much appreciated.

P.S Please let me know in case I need to provide more clarification on any point, or any questions.

Thanks,
Manu
Re: Storing varying number of columns in table [message #633779 is a reply to message #633777] Tue, 24 February 2015 12:17 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
> we will be receiving datasets with different(varying) number of columns.

what exactly a "dataset"?
"Receive" them how?

Perhaps I am just dense, but I am not clear what exactly are the data processing requirements?
what exactly needs to be done to or with the data contained in the dataset?
Re: Storing varying number of columns in table [message #633780 is a reply to message #633779] Tue, 24 February 2015 13:27 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

We will receive datasets in form of CSV files, having header as column names, and following rows as data. We have to treat every column as varchar2.

So dataset1 looks like:
col1,col2,col3,col4
a,b,c,d
e,f,g,h

dataset2 looks like:
col1,col2,col3,col4,col5,col6
m,n,o,p,q,r
s,t,u,v,w,x

Now, there are three ways in my mind, how I can store it in Oracle.

1st way [store in table(varchar2,CLOB)]

dataset1 [col1:a;col2:b;col3:c;col4:d]
dataset1 [col1:e;col2:f;col3:g;col4:h]
dataset2 [col1:m;col2:n;col3:o;col4:p;col5:q;col6:r]
... so on


2nd way - inverted table (column data into rows)
dataset1 rec1 col1 a
dataset1 rec1 col2 b
dataset1 rec1 col3 c
dataset1 rec1 col4 d
dataset1 rec2 col1 e
..
..
dataset2 rec1 col1 m
dataset2 rec1 col2 n
... so on


3rd way, creating separate table for each dataset.

First table dataset1(col1,col2,col3,col4)
a,b,c,d
e,f,g,h

Second table dataset2(col1,col2,col3,col4,col5,col6)
m,n,o,p,q,r
s,t,u,v,w,x



Now, as per me, 3rd most option is best suitable option, but in this case, I have to write a lot more code to manage the things.
Is there any other options you can suggest. Whenever we are retrieving data, we will retrieve on the basis of dataset names only.

Thanks,
Manu
Re: Storing varying number of columns in table [message #633781 is a reply to message #633780] Tue, 24 February 2015 13:38 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>We will receive datasets in form of CSV files, having header as column names,
>we will retrieve on the basis of dataset names only.

Will there be any problem with "mapping" the dataset names to conforming Oracle object names?
Will there be any problem with "mapping" the column names to conforming Oracle column names?
Are dataset names & column names Case Sensitive?
Re: Storing varying number of columns in table [message #633782 is a reply to message #633781] Tue, 24 February 2015 13:41 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Yes, dataset names may be larger than 30 characters. Column names should not be problem. Names are not case sensitive.

Thanks,
Manu
Re: Storing varying number of columns in table [message #633784 is a reply to message #633782] Tue, 24 February 2015 15:32 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Sounds like maybe some basic data modeling is in order.

Are you dealing with truly unstructured data?  Or just lots of different structures possible?

Do you know all the possible different structures?  Or is this just one structure with potentially lots of nullable attributes?

Or maybe it is some of both, several possible structures (defined by keys) each of which has a large number of nullable attributes?

If there are lots of different structures, what are all the keys for each of them?

Asking and answering these kinds of questions is part of the analysis you will need to do, to take yourself to a better appreciation of your situation. From there you can make a better decision.

Kevin
Re: Storing varying number of columns in table [message #633786 is a reply to message #633784] Tue, 24 February 2015 16:03 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Yes, you are right Kevin. Thanks.

I already faced many performance problem in previous project, as table was designed in inverted (transposed) way to store unstructured data, and it was causing a lot of problem to maintain and check integrity and transposing them back to rows.

So this time I want to be more careful, how I am going to design that.

Below are the answers of questions, please help me decide the approach.

Are you dealing with truly unstructured data?  Or just lots of different structures possible?
Answer : Truly unstructured

Do you know all the possible different structures?  Or is this just one structure with potentially lots of nullable attributes?
Answer : All the structure are not known, it's a data science project, at the completion of analysis only, we come to know what columns we are going to store. So we don't know in advance the structures.

Or maybe it is some of both, several possible structures (defined by keys) each of which has a large number of nullable attributes?
Answer : No, each attribute will have some value, no null values.

If there are lots of different structures, what are all the keys for each of them?
Answer : Key columns (used to define/identify a dataset) for all the structure will remain the same, but other columns that are actually going to store the data of analysis, will vary.


Thanks for helping.

Manu
Re: Storing varying number of columns in table [message #633791 is a reply to message #633786] Tue, 24 February 2015 18:01 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>Answer : Key columns (used to define/identify a dataset)
Does above or below more accurately reflect this reality or will the answer change again after a subsequent post in this thread by you?
>>we will retrieve on the basis of dataset names only.

It is a challenge to formulate a solution when the requirements keep morphing.
Re: Storing varying number of columns in table [message #633792 is a reply to message #633791] Tue, 24 February 2015 18:09 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

I said the same thing in 2 different ways.

My dataset name will be like : A_B_C_D_E_F

So one approach will be either I can create a table with this dataset name (A_B_C_D_E_F) and required number of columns inside that table, but as the dataset name may increase 30 characters.

The second approach would be storing this dataset name in table, along with values, or assigning dataset a number or something that works.

So retrieval will depend how I am actually going to design my model.

Thanks for asking,
Manu
Re: Storing varying number of columns in table [message #633793 is a reply to message #633792] Tue, 24 February 2015 19:14 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Relatively speaking, it is easy to deposit the data into oracle tables.

It may be difficult or IMPOSSIBLE to design a solution that will produce desired result set in an acceptable amount of time.

Will the data reporting or analysis require JOINS that include 1 or more of these datasets?
Re: Storing varying number of columns in table [message #633794 is a reply to message #633793] Tue, 24 February 2015 20:11 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

The data I am modelling, is the result of analytic process. I have to check whether a join will be required among datasets or not.

Thanks a lot,
Manu
Re: Storing varying number of columns in table [message #633829 is a reply to message #633794] Wed, 25 February 2015 13:22 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

No, I don't have to perform any kind of operation on these datasets. These datasets will be forwarded to the calling external application as it is.

Thanks,
Manu
Re: Storing varying number of columns in table [message #633830 is a reply to message #633829] Wed, 25 February 2015 13:24 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
manubatham20 wrote on Wed, 25 February 2015 11:22
No, I don't have to perform any kind of operation on these datasets. These datasets will be forwarded to the calling external application as it is.

Thanks,
Manu



So why is Oracle involved at all?
Re: Storing varying number of columns in table [message #633831 is a reply to message #633830] Wed, 25 February 2015 13:28 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Nice question.
Oracle is involved, because the application that is processing data has to be stored somewhere for later use.

Manu
Re: Storing varying number of columns in table [message #633832 is a reply to message #633831] Wed, 25 February 2015 13:31 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
manubatham20 wrote on Wed, 25 February 2015 11:28
Nice question.
Oracle is involved, because the application that is processing data has to be stored somewhere for later use.

Manu


The data is already stored in the "input" CSV files.
Re: Storing varying number of columns in table [message #633834 is a reply to message #633832] Wed, 25 February 2015 14:18 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Some other application processed data, and generated csv file, now we have temporary csv file, but the data inside csv file has to store somewhere permanently for later retrieval.

Manu
Re: Storing varying number of columns in table [message #633837 is a reply to message #633834] Wed, 25 February 2015 14:25 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
manubatham20 wrote on Wed, 25 February 2015 12:18
Some other application processed data, and generated csv file, now we have temporary csv file, but the data inside csv file has to store somewhere permanently for later retrieval.

Manu


store the CSV file content in source code repository
Re: Storing varying number of columns in table [message #633846 is a reply to message #633837] Thu, 26 February 2015 02:42 Go to previous messageGo to next message
ThomasG
Messages: 3211
Registered: April 2005
Location: Heilbronn, Germany
Senior Member
I have a few CSV import procedures. I basically use a staging table in which I load the data.

There I basically crated a table with one "dataset id" column, one "line number" column, one "processed flag" column, and a hundred varchar2(4000) columns COL_001 to COL_100. I load the column headers into line number 0, and load the data into their respective line numbers.

(The datasets I receive so far have at most ~25 columns, with the longest strings so far being about 300 characters)
Re: Storing varying number of columns in table [message #634000 is a reply to message #633846] Mon, 02 March 2015 16:40 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

2 reasons I decided for a transposed/inverted format (storing each column value in single row)

1. First and foremost, no. of cubes (columns) may go beyond 1000 (I came to know this later)
2. Can't go for csv, as we don't have to join data, but we have to query data to display in specific reports.

Thanks,
Manu
Re: Storing varying number of columns in table [message #634001 is a reply to message #634000] Mon, 02 March 2015 16:43 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
> but we have to query data to display in specific reports.

EXTERNAL TABLE allows you to query directly from any properly mapped CSV file.
Re: Storing varying number of columns in table [message #634003 is a reply to message #634001] Mon, 02 March 2015 17:14 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Yes, but table structure is not know till run-time, so I will end up writing code creating external tables at run time for each structure I receive.

Thanks,
Manu
Re: Storing varying number of columns in table [message #634004 is a reply to message #634003] Mon, 02 March 2015 17:25 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
The EXTERNAL TABLE definitions could/should be done once when the CSV files arrive on the DB Server.
Then the data can be accessed transparently anytime thereafter.

Of course you are free to implement a more complicated solution if that is deemed the more desired approach.

BTW, how do you plan on accommodating the dataset names that are longer than 30 characters?
Re: Storing varying number of columns in table [message #634006 is a reply to message #634004] Mon, 02 March 2015 19:20 Go to previous messageGo to next message
rleishman
Messages: 3728
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I've never done it and it may not be appropriate, but it might be possible to convert the CSV row to XML and store it in an XMLTYPE column. This will give you some pretty powerful XML tools to read, index and manipulate the data.

Ross Leishman
Re: Storing varying number of columns in table [message #634007 is a reply to message #634006] Mon, 02 March 2015 19:30 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Hmmm, never thought of this rleishman... Let me give it a try in this direction.

Blackswan,
I am storing in table format (Dataset Name, rno, column_name, value, datetime), so dataset name can be upto 4000 chars (the second way I described in my second post) with interval partitioning, as table will be most queried on the date range.

Thanks,
Manu
Re: Storing varying number of columns in table [message #634009 is a reply to message #634007] Mon, 02 March 2015 19:37 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
query of table format (Dataset Name, rno, column_name, value, datetime) wil be SLOW, SLOW, Slow, slow
Re: Storing varying number of columns in table [message #634010 is a reply to message #634009] Mon, 02 March 2015 20:03 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

Every thing is fine, but how we will accommodate, when the process will generate more than 1000 columns? Creating table and external table will not work.

Manu
Re: Storing varying number of columns in table [message #634012 is a reply to message #634010] Mon, 02 March 2015 20:40 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Ignoring the specifics of this situation, what would you do to accommodate any data collection that had more than 1000 "columns"?
I suspect that what you have is de-normalized data.

[Updated on: Mon, 02 March 2015 20:41]

Report message to a moderator

Re: Storing varying number of columns in table [message #634016 is a reply to message #634012] Mon, 02 March 2015 22:26 Go to previous messageGo to next message
manubatham20
Messages: 566
Registered: September 2010
Location: Seattle, WA, USA
Senior Member

What about XMLTYPE? How efficient that will be?
Re: Storing varying number of columns in table [message #634017 is a reply to message #634016] Mon, 02 March 2015 22:46 Go to previous message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
manubatham20 wrote on Mon, 02 March 2015 20:26
What about XMLTYPE? How efficient that will be?



some, many, most competent professionals would do a quick benchmark test to learn feasibility & then performance.
If you can't make it work for 5 columns, you'll never get to 1000+ columns
Previous Topic: How to get the connect string from the data base
Next Topic: Creating a salary report. Having trouble calculating previous salary.
Goto Forum:
  


Current Time: Thu Apr 25 19:01:24 CDT 2024