Re: Can anyone suggest a better way to write this query?
From: C M <cmcmzzzz2_at_gmail.com>
Date: Thu, 19 Nov 2009 14:05:41 -0800 (PST)
Message-ID: <17466275-be99-4dac-b68d-936ccb304728_at_p35g2000yqh.googlegroups.com>
On Nov 19, 4:33 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Nov 19, 3:10 pm, C M <cmcmzz..._at_gmail.com> wrote:
>
>
>
>
>
> > Table dir_size stores the mbytes of storage used in a given directory.
> > Table directories stores various directory names which may or may not
> > exist in table dir_size.
> > For every directory in table directories, report the cumulative
> > storage in that directory and all its subdirectories.
> > This solution uses a cartesian join. I imagine it will not scale
> > well.
>
> > Thanks in advance!
>
> > create table dir_size (
> > dir_name varchar2(40),
> > mbytes number
> > );
>
> > create table directories (
> > dir_name varchar2(40)
> > );
>
> > insert into dir_size values ('c:\aaa\bbb\ccc\ddd', 100);
> > insert into dir_size values ('c:\aaa\bbb\ccc', 100);
> > insert into dir_size values ('c:\aaa\bbb', 100);
> > insert into dir_size values ('c:\aaa', 100);
> > insert into dir_size values ('c:\', 100);
>
> > insert into directories values ('c:\aaa\bbb\ccc\ddd');
> > insert into directories values ('c:\aaa\bbb\ccc');
> > insert into directories values ('c:\aaa\bbb');
> > insert into directories values ('c:\aaa');
> > insert into directories values ('c:\');
> > insert into directories values ('c:\xxx\yyy\zzz');
>
> > commit;
>
> > select dir_name, sum(mbytes) from (
> > select directories.dir_name,
> > instr(dir_size.dir_name, directories.dir_name) INSTR,
> > mbytes
> > from directories, dir_size
> > )
> > where INSTR = 1
> > group by dir_name
> > order by 1;
>
> > DIR_NAME SUM(MBYTES)
> > ---------------------------------------- -----------
> > c:\ 500
> > c:\aaa 400
> > c:\aaa\bbb 300
> > c:\aaa\bbb\ccc 200
> > c:\aaa\bbb\ccc\ddd 100
>
> This appears to be a hard problem. To avoid headaches, make certain
> that each of the DIR_NAMES ends with \
>
> Let's start here:
> SELECT
> 'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
> 100 MBYTES
> FROM
> DUAL;
>
> DIR_NAME MBYTES
> -------------------- ----------
> c:\aaa\bbb\ccc\ddd\ 100
>
> In your example, you would like to put 100MB into the following
> directories based on the above:
> c:\
> c:\aaa\
> c:\aaa\bbb\
> c:\aaa\bbb\ccc\
> c:\aaa\bbb\ccc\ddd\
>
> You somehow need to be able to break that one row into 5 rows. The
> following might help
> SELECT
> LEVEL L
> FROM
> DUAL
> CONNECT BY
> LEVEL<=20;
>
> L
> ---
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> 11
> 12
> 13
> 14
> 15
> 16
> 17
> 18
> 19
> 20
>
> If we join those two row sources together we might be able to create 5
> rows from the one row:
> SELECT
> SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
> MBYTES
> FROM
> (SELECT
> 'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
> 100 MBYTES
> FROM
> DUAL) DIR_SIZE,
> (SELECT
> LEVEL L
> FROM
> DUAL
> CONNECT BY
> LEVEL<=20) C
> WHERE
> SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL;
>
> DIR_NAME2 MBYTES
> -------------------- ----------
> c:\ 100
> c:\aaa\ 100
> c:\aaa\bbb\ 100
> c:\aaa\bbb\ccc\ 100
> c:\aaa\bbb\ccc\ddd\ 100
>
> Now, if we performed the same process for all of the rows in the
> DIR_SIZE table, grouping on DIR_NAME2, we might be able to find the
> SUM of the MBYTES column.
>
> I will leave the rest for your experimentation. You probably will not
> have 20 \ characters in the DIR_NAME column, so you could optimize the
> SQL statement a bit.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> - Show quoted text -
Date: Thu, 19 Nov 2009 14:05:41 -0800 (PST)
Message-ID: <17466275-be99-4dac-b68d-936ccb304728_at_p35g2000yqh.googlegroups.com>
On Nov 19, 4:33 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Nov 19, 3:10 pm, C M <cmcmzz..._at_gmail.com> wrote:
>
>
>
>
>
> > Table dir_size stores the mbytes of storage used in a given directory.
> > Table directories stores various directory names which may or may not
> > exist in table dir_size.
> > For every directory in table directories, report the cumulative
> > storage in that directory and all its subdirectories.
> > This solution uses a cartesian join. I imagine it will not scale
> > well.
>
> > Thanks in advance!
>
> > create table dir_size (
> > dir_name varchar2(40),
> > mbytes number
> > );
>
> > create table directories (
> > dir_name varchar2(40)
> > );
>
> > insert into dir_size values ('c:\aaa\bbb\ccc\ddd', 100);
> > insert into dir_size values ('c:\aaa\bbb\ccc', 100);
> > insert into dir_size values ('c:\aaa\bbb', 100);
> > insert into dir_size values ('c:\aaa', 100);
> > insert into dir_size values ('c:\', 100);
>
> > insert into directories values ('c:\aaa\bbb\ccc\ddd');
> > insert into directories values ('c:\aaa\bbb\ccc');
> > insert into directories values ('c:\aaa\bbb');
> > insert into directories values ('c:\aaa');
> > insert into directories values ('c:\');
> > insert into directories values ('c:\xxx\yyy\zzz');
>
> > commit;
>
> > select dir_name, sum(mbytes) from (
> > select directories.dir_name,
> > instr(dir_size.dir_name, directories.dir_name) INSTR,
> > mbytes
> > from directories, dir_size
> > )
> > where INSTR = 1
> > group by dir_name
> > order by 1;
>
> > DIR_NAME SUM(MBYTES)
> > ---------------------------------------- -----------
> > c:\ 500
> > c:\aaa 400
> > c:\aaa\bbb 300
> > c:\aaa\bbb\ccc 200
> > c:\aaa\bbb\ccc\ddd 100
>
> This appears to be a hard problem. To avoid headaches, make certain
> that each of the DIR_NAMES ends with \
>
> Let's start here:
> SELECT
> 'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
> 100 MBYTES
> FROM
> DUAL;
>
> DIR_NAME MBYTES
> -------------------- ----------
> c:\aaa\bbb\ccc\ddd\ 100
>
> In your example, you would like to put 100MB into the following
> directories based on the above:
> c:\
> c:\aaa\
> c:\aaa\bbb\
> c:\aaa\bbb\ccc\
> c:\aaa\bbb\ccc\ddd\
>
> You somehow need to be able to break that one row into 5 rows. The
> following might help
> SELECT
> LEVEL L
> FROM
> DUAL
> CONNECT BY
> LEVEL<=20;
>
> L
> ---
> 1
> 2
> 3
> 4
> 5
> 6
> 7
> 8
> 9
> 10
> 11
> 12
> 13
> 14
> 15
> 16
> 17
> 18
> 19
> 20
>
> If we join those two row sources together we might be able to create 5
> rows from the one row:
> SELECT
> SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
> MBYTES
> FROM
> (SELECT
> 'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
> 100 MBYTES
> FROM
> DUAL) DIR_SIZE,
> (SELECT
> LEVEL L
> FROM
> DUAL
> CONNECT BY
> LEVEL<=20) C
> WHERE
> SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL;
>
> DIR_NAME2 MBYTES
> -------------------- ----------
> c:\ 100
> c:\aaa\ 100
> c:\aaa\bbb\ 100
> c:\aaa\bbb\ccc\ 100
> c:\aaa\bbb\ccc\ddd\ 100
>
> Now, if we performed the same process for all of the rows in the
> DIR_SIZE table, grouping on DIR_NAME2, we might be able to find the
> SUM of the MBYTES column.
>
> I will leave the rest for your experimentation. You probably will not
> have 20 \ characters in the DIR_NAME column, so you could optimize the
> SQL statement a bit.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> - Show quoted text -
Thanks for the suggestion. I suspect the best way will involve some kind of recursive processing. The tricky bit is the matching of the rows in the directories table to the rows in the dir_size table. We need to do a "like" (which we can't, of course) which is why I thought of the instr.
CM Received on Thu Nov 19 2009 - 16:05:41 CST