Home » SQL & PL/SQL » SQL & PL/SQL » iterating over columns
iterating over columns [message #247925] Wed, 27 June 2007 09:50 Go to next message
moschen
Messages: 25
Registered: April 2007
Junior Member
Hi,

I have problem with a process which takes quite long and I would like to hear about suggestions on how to speed it up.

Problem:
There is a table with columns

data_table
| a1 | a2 | a3| a5 | b5 | n4 | ...


Now I want to know how many null values some of the columns have.
The columns that I am interested in, are stored in another table as values:

reference_table
| column_name | is_interesting |
+-------------+----------------+
|  a1         |  y             |
|  a5         |  y             |


the statistics is stored in another table:

statistics_table
| column_name | number_of_nulls | number_of_not_nulls |
+-------------+-----------------+---------------------+
|    a1       |      500,000    |        250,000      |


Current solution:
Works somehow like this:

select all column_names from reference_table

  loop all_column_names do

    insert into statistics_table
      as select 'column_name', count(*)-count(column_name), count(column_name)

  end loop


But this is very slow. Any ideas on how to speed up the process? My current ideas would be:

  • A big union of "select 'column_name', count(*)-count(column_name), count(column_name)"
  • build a big "select count(*), count(a1), count(a5)", but it might be complex to split the data up afterwards.


Database is oracle 10.
Number of columns is about 300.
Number of rows in the data_table is about 1,000,000 (but can go up to about 6,000,000)

Looking forward to receiving your comments.

Best regards,
m.
Re: iterating over columns [message #247931 is a reply to message #247925] Wed, 27 June 2007 10:04 Go to previous messageGo to next message
BlackSwan
Messages: 25036
Registered: January 2009
Location: SoCal
Senior Member
>But this is very slow.
I am not surprised.
I'd like to see SQL_TRACE & EXPLAIN_PLAN for this run.
I suspect a FTS of big table for every column; which is needlessly inefficient.

A simple PL/SQL procedure could obtain the answer(s) in a single FTS of big table.

[Updated on: Wed, 27 June 2007 10:05] by Moderator

Report message to a moderator

Re: iterating over columns [message #247940 is a reply to message #247925] Wed, 27 June 2007 10:40 Go to previous messageGo to next message
Michel Cadot
Messages: 64103
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Gather statistics for all columns then you'll have the number of values and number of nulls.

Regards
Michel
Re: iterating over columns [message #248057 is a reply to message #247940] Wed, 27 June 2007 22:44 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Rather than UNION ALL, get all of the counts in a single SELECT (ie. one row with lots of columns) and then un-pivot the single row result into rows using a nested table.

This post contains some of the techniques that would be used

Ross Leishman
Re: iterating over columns [message #248124 is a reply to message #247925] Thu, 28 June 2007 02:13 Go to previous message
moschen
Messages: 25
Registered: April 2007
Junior Member
Hi,

I am sorry, but I forgot to mention that the data_table has an additional column 'request_id' and the statistics should only be gathered for the rows of a particular request_is.

data_table
request_id | a1 | a2 | a3| a5 | b5 | n4 | ...


expected result:

statistics_table
| request_id | column_name | #_nulls | #_not_nulls |
+------------+-------------+---------+-------------+
|   1        |    a1       |500,000  | 250,000     |


Michel With the request_id in mind, that statistics gathering unfortunately would not work.
anacedent Do you have an example (maybe just pseudo code) for that simple procedure?
rleishman Thanks, I will give it a try.

Best Regards,
m.
Previous Topic: find max val in rows?
Next Topic: Execute Immediate
Goto Forum:
  


Current Time: Sun Dec 04 22:41:53 CST 2016

Total time taken to generate the page: 0.21703 seconds