Re: Can anyone suggest a better way to write this query?

From: C M <cmcmzzzz2_at_gmail.com>
Date: Thu, 19 Nov 2009 14:05:41 -0800 (PST)
Message-ID: <17466275-be99-4dac-b68d-936ccb304728_at_p35g2000yqh.googlegroups.com>



On Nov 19, 4:33 pm, Charles Hooper <hooperc2..._at_yahoo.com> wrote:
> On Nov 19, 3:10 pm, C M <cmcmzz..._at_gmail.com> wrote:
>
>
>
>
>
> > Table dir_size stores the mbytes of storage used in a given directory.
> > Table directories stores various directory names which may or may not
> > exist in table dir_size.
> > For every directory in table directories, report the cumulative
> > storage in that directory and all its subdirectories.
> > This solution uses a cartesian join.  I imagine it will not scale
> > well.
>
> > Thanks in advance!
>
> > create table dir_size (
> >   dir_name     varchar2(40),
> >   mbytes       number
> >   );
>
> > create table directories (
> >   dir_name    varchar2(40)
> >   );
>
> > insert into dir_size values ('c:\aaa\bbb\ccc\ddd', 100);
> > insert into dir_size values ('c:\aaa\bbb\ccc', 100);
> > insert into dir_size values ('c:\aaa\bbb', 100);
> > insert into dir_size values ('c:\aaa', 100);
> > insert into dir_size values ('c:\', 100);
>
> > insert into directories values ('c:\aaa\bbb\ccc\ddd');
> > insert into directories values ('c:\aaa\bbb\ccc');
> > insert into directories values ('c:\aaa\bbb');
> > insert into directories values ('c:\aaa');
> > insert into directories values ('c:\');
> > insert into directories values ('c:\xxx\yyy\zzz');
>
> > commit;
>
> > select dir_name, sum(mbytes) from (
> >   select directories.dir_name,
> >     instr(dir_size.dir_name, directories.dir_name) INSTR,
> >     mbytes
> >   from directories, dir_size
> > )
> > where INSTR = 1
> > group by dir_name
> > order by 1;
>
> > DIR_NAME                                 SUM(MBYTES)
> > ---------------------------------------- -----------
> > c:\                                              500
> > c:\aaa                                           400
> > c:\aaa\bbb                                       300
> > c:\aaa\bbb\ccc                                   200
> > c:\aaa\bbb\ccc\ddd                               100
>
> This appears to be a hard problem.  To avoid headaches, make certain
> that each of the DIR_NAMES ends with \
>
> Let's start here:
> SELECT
>   'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
>   100 MBYTES
> FROM
>   DUAL;
>
> DIR_NAME                 MBYTES
> -------------------- ----------
> c:\aaa\bbb\ccc\ddd\         100
>
> In your example, you would like to put 100MB into the following
> directories based on the above:
> c:\
> c:\aaa\
> c:\aaa\bbb\
> c:\aaa\bbb\ccc\
> c:\aaa\bbb\ccc\ddd\
>
> You somehow need to be able to break that one row into 5 rows.  The
> following might help
> SELECT
>   LEVEL L
> FROM
>   DUAL
> CONNECT BY
>   LEVEL<=20;
>
>   L
> ---
>   1
>   2
>   3
>   4
>   5
>   6
>   7
>   8
>   9
>  10
>  11
>  12
>  13
>  14
>  15
>  16
>  17
>  18
>  19
>  20
>
> If we join those two row sources together we might be able to create 5
> rows from the one row:
> SELECT
>   SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) DIR_NAME2,
>   MBYTES
> FROM
>   (SELECT
>     'c:\aaa\bbb\ccc\ddd\' DIR_NAME,
>     100 MBYTES
>   FROM
>     DUAL) DIR_SIZE,
>   (SELECT
>     LEVEL L
>   FROM
>     DUAL
>   CONNECT BY
>     LEVEL<=20) C
> WHERE
>   SUBSTR(DIR_NAME,1,INSTR(DIR_NAME,'\',1,L)) IS NOT NULL;
>
> DIR_NAME2                MBYTES
> -------------------- ----------
> c:\                         100
> c:\aaa\                     100
> c:\aaa\bbb\                 100
> c:\aaa\bbb\ccc\             100
> c:\aaa\bbb\ccc\ddd\         100
>
> Now, if we performed the same process for all of the rows in the
> DIR_SIZE table, grouping on DIR_NAME2, we might be able to find the
> SUM of the MBYTES column.
>
> I will leave the rest for your experimentation.  You probably will not
> have 20 \ characters in the DIR_NAME column, so you could optimize the
> SQL statement a bit.
>
> Charles Hooper
> IT Manager/Oracle DBA
> K&M Machine-Fabricating, Inc.- Hide quoted text -
>
> - Show quoted text -

Thanks for the suggestion. I suspect the best way will involve some kind of recursive processing. The tricky bit is the matching of the rows in the directories table to the rows in the dir_size table. We need to do a "like" (which we can't, of course) which is why I thought of the instr.

CM Received on Thu Nov 19 2009 - 16:05:41 CST

Original text of this message