Home » SQL & PL/SQL » SQL & PL/SQL » Storing varying number of columns in table (Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production)
Storing varying number of columns in table [message #633777] |
Tue, 24 February 2015 11:18 |
manubatham20
Messages: 566 Registered: September 2010 Location: Seattle, WA, USA
|
Senior Member |
|
|
Hi,
My problem is similar to below thread.
Split CLOB column to multiple columns within a row
I don't want to hijack the above thread , that's why I started a new one.
I am struggling with the similar problem (storing dynamic columns) - we will be receiving datasets with different(varying) number of columns.
What I am left with are below 3 options to store those datasets:
1. Storing them in clob will give hard time while parsing and fetching data (in terms of resource used), and moreover, I can't put any integrity constraints and you have shown all the problems already that we can face while having data in clob.
2. Storing them in inverted structure (columns into rows). I am already receiving the huge number of data, and if each column data I am going to store at row, even partitioning will not help much (I am not sure what kind of partitioning to use, primary key will be varchar to I left with only hash partitioning)
3. Creating separate table for each structure I receive, that way I will end up create hell lot of tables, whose management will be difficult.
I am going to choose either the 2nd or 3rd option, is there any other way which I should consider.
Any help will be much appreciated.
P.S Please let me know in case I need to provide more clarification on any point, or any questions.
Thanks,
Manu
|
|
|
|
Re: Storing varying number of columns in table [message #633780 is a reply to message #633779] |
Tue, 24 February 2015 13:27 |
manubatham20
Messages: 566 Registered: September 2010 Location: Seattle, WA, USA
|
Senior Member |
|
|
We will receive datasets in form of CSV files, having header as column names, and following rows as data. We have to treat every column as varchar2.
So dataset1 looks like:
col1,col2,col3,col4
a,b,c,d
e,f,g,h
dataset2 looks like:
col1,col2,col3,col4,col5,col6
m,n,o,p,q,r
s,t,u,v,w,x
Now, there are three ways in my mind, how I can store it in Oracle.
1st way [store in table(varchar2,CLOB)]
dataset1 [col1:a;col2:b;col3:c;col4:d]
dataset1 [col1:e;col2:f;col3:g;col4:h]
dataset2 [col1:m;col2:n;col3:o;col4:p;col5:q;col6:r]
... so on
2nd way - inverted table (column data into rows)
dataset1 rec1 col1 a
dataset1 rec1 col2 b
dataset1 rec1 col3 c
dataset1 rec1 col4 d
dataset1 rec2 col1 e
..
..
dataset2 rec1 col1 m
dataset2 rec1 col2 n
... so on
3rd way, creating separate table for each dataset.
First table dataset1(col1,col2,col3,col4)
a,b,c,d
e,f,g,h
Second table dataset2(col1,col2,col3,col4,col5,col6)
m,n,o,p,q,r
s,t,u,v,w,x
Now, as per me, 3rd most option is best suitable option, but in this case, I have to write a lot more code to manage the things.
Is there any other options you can suggest. Whenever we are retrieving data, we will retrieve on the basis of dataset names only.
Thanks,
Manu
|
|
|
|
|
|
Re: Storing varying number of columns in table [message #633786 is a reply to message #633784] |
Tue, 24 February 2015 16:03 |
manubatham20
Messages: 566 Registered: September 2010 Location: Seattle, WA, USA
|
Senior Member |
|
|
Yes, you are right Kevin. Thanks.
I already faced many performance problem in previous project, as table was designed in inverted (transposed) way to store unstructured data, and it was causing a lot of problem to maintain and check integrity and transposing them back to rows.
So this time I want to be more careful, how I am going to design that.
Below are the answers of questions, please help me decide the approach.
Are you dealing with truly unstructured data? Or just lots of different structures possible?
Answer : Truly unstructured
Do you know all the possible different structures? Or is this just one structure with potentially lots of nullable attributes?
Answer : All the structure are not known, it's a data science project, at the completion of analysis only, we come to know what columns we are going to store. So we don't know in advance the structures.
Or maybe it is some of both, several possible structures (defined by keys) each of which has a large number of nullable attributes?
Answer : No, each attribute will have some value, no null values.
If there are lots of different structures, what are all the keys for each of them?
Answer : Key columns (used to define/identify a dataset) for all the structure will remain the same, but other columns that are actually going to store the data of analysis, will vary.
Thanks for helping.
Manu
|
|
|
|
|
|
|
|
|
|
|
|
|
Re: Storing varying number of columns in table [message #633846 is a reply to message #633837] |
Thu, 26 February 2015 02:42 |
ThomasG
Messages: 3211 Registered: April 2005 Location: Heilbronn, Germany
|
Senior Member |
|
|
I have a few CSV import procedures. I basically use a staging table in which I load the data.
There I basically crated a table with one "dataset id" column, one "line number" column, one "processed flag" column, and a hundred varchar2(4000) columns COL_001 to COL_100. I load the column headers into line number 0, and load the data into their respective line numbers.
(The datasets I receive so far have at most ~25 columns, with the longest strings so far being about 300 characters)
|
|
|
|
|
|
|
Re: Storing varying number of columns in table [message #634006 is a reply to message #634004] |
Mon, 02 March 2015 19:20 |
rleishman
Messages: 3728 Registered: October 2005 Location: Melbourne, Australia
|
Senior Member |
|
|
I've never done it and it may not be appropriate, but it might be possible to convert the CSV row to XML and store it in an XMLTYPE column. This will give you some pretty powerful XML tools to read, index and manipulate the data.
Ross Leishman
|
|
|
Re: Storing varying number of columns in table [message #634007 is a reply to message #634006] |
Mon, 02 March 2015 19:30 |
manubatham20
Messages: 566 Registered: September 2010 Location: Seattle, WA, USA
|
Senior Member |
|
|
Hmmm, never thought of this rleishman... Let me give it a try in this direction.
Blackswan,
I am storing in table format (Dataset Name, rno, column_name, value, datetime), so dataset name can be upto 4000 chars (the second way I described in my second post) with interval partitioning, as table will be most queried on the date range.
Thanks,
Manu
|
|
|
|
|
|
|
Re: Storing varying number of columns in table [message #634017 is a reply to message #634016] |
Mon, 02 March 2015 22:46 |
|
BlackSwan
Messages: 26766 Registered: January 2009 Location: SoCal
|
Senior Member |
|
|
manubatham20 wrote on Mon, 02 March 2015 20:26What about XMLTYPE? How efficient that will be?
some, many, most competent professionals would do a quick benchmark test to learn feasibility & then performance.
If you can't make it work for 5 columns, you'll never get to 1000+ columns
|
|
|
Goto Forum:
Current Time: Thu Apr 25 19:01:24 CDT 2024
|