Home » SQL & PL/SQL » SQL & PL/SQL » percentages of data population (Oracle 11.2.0.3)
percentages of data population [message #652027] Tue, 31 May 2016 07:50 Go to next message
pointers
Messages: 451
Registered: May 2008
Senior Member
Hi,

I have two tables 1. master 2. lookup

CREATE TABLE SCOTT.MASTER
(
  ID      NUMBER,
  NAME    VARCHAR2(20 BYTE),
  LOC_ID  VARCHAR2(20 BYTE),
  REC_ID  NUMBER
)
/


ID and NAME are key columns in the master table i.e. id and name forms a unique record and to get rid of duplicates.
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(1, 's', '1000', 100);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(2, NULL, '2000', NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(NULL, NULL, NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(NULL, 'i', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(NULL, 'k', NULL, 200);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(6, NULL, NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(7, NULL, '4000', 400);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(8, NULL, NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(1, 's', '1000', 100);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

COMMIT;

  
CREATE TABLE SCOTT.LOOKUP
(
  RECORDID     NUMBER,
  COLUMN_NAME  VARCHAR2(20 BYTE)
)
/

Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
 Values(100, 'id');

Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
 Values(100, 'name');

Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
 Values(200, 'name');

Insert into SCOTT.LOOKUP(RECORDID, COLUMN_NAME)
 Values(400, 'id');
 


Below query gives the unique records of master table -
SELECT m.*
  FROM (SELECT   ID, NAME, MAX (ROWID) rid
            FROM MASTER
        GROUP BY ID, NAME) i,
       MASTER m
 WHERE i.rid = m.ROWID

Now I would like to write a query to pull pecentage for each column of master table where the percentage is calculated
(number of not null values for a column) / (total number of unique rows)

ID column got total 9 records (after above query using rowid) and it got 6 rows populated and 3 rows are null
so the percentage for id is 6/9*100

I could achieve this by using -
SELECT count(m.id)/count(*) "%id", count(m.name)/count(*) "%name",count(m.loc_id)/count(*) "%loc_id"
  FROM (SELECT   ID, NAME, MAX (ROWID) rid
            FROM MASTER
        GROUP BY ID, NAME) i,
       MASTER m
 WHERE i.rid = m.ROWID
 

but now, I would like calculate the percentages after the below change -
I would like to consider the "master" table column value as NULL for a record if the rec_id of master table is mapped to any value in the "lookup" table.
and lookup table contains the column for which it need to be considered as NULL.
for e.g. rec_id=100 of master table is mapped to 2 columns in the lookup table i.e. id and name so id and name for this record need to be considered as null.
where as rec_id=200 of master table is mapped to only one column i.e. name. so name need to be considered as null.

Thank you in advance.

Regards,
Pointers



Re: percentages of data population [message #652052 is a reply to message #652027] Tue, 31 May 2016 09:39 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
ID and NAME are key columns in the master table i.e. id and name forms a unique record and to get rid of duplicates.


So how this is possible:
Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);

Insert into SCOTT.MASTER(ID, NAME, LOC_ID, REC_ID)
 Values(9, 'z', NULL, NULL);


1/ Do NOT post your statement with a schema, tablespace or other specific stuff like that, why should be put garbage in my SCOTT schema?
2/ Create your tables with all the constraints, this will prevent your from posting wring stuff
3/ What does MAX(ROWID) (and ROWID itself) has to do with the question?
4/ What is the purpose of LOC_ID and REC_ID in the problem? They are mentioned nowhere.

[Updated on: Tue, 31 May 2016 09:40]

Report message to a moderator

Re: percentages of data population [message #652053 is a reply to message #652052] Tue, 31 May 2016 10:29 Go to previous messageGo to next message
pointers
Messages: 451
Registered: May 2008
Senior Member
- Sorry for the schema names.
- Its a warehouse environment. There are no constraints but as per the business ID and NAME forms a unique record.
-loc_id is just another column for which we calculate percentage.
- ROWID is used to pick the latest record incase of duplicate records w.r.t ID and NAME.
- REC_ID is a column which is used to join lookup table (recordid column)

Please let me know your suggestion.

Thank you.

Regards,
Pointers
Re: percentages of data population [message #652054 is a reply to message #652053] Tue, 31 May 2016 10:42 Go to previous messageGo to next message
cookiemonster
Messages: 13920
Registered: September 2008
Location: Rainy Manchester
Senior Member
You can't use rowid to pick the latest record. Indeed nothing can be read into its order.
Re: percentages of data population [message #652055 is a reply to message #652053] Tue, 31 May 2016 10:54 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
There are no constraints but as per the business ID and NAME forms a unique record


This is wrong in your test case as I showed it. So post a valid test case.

Quote:
Please let me know your suggestion.


Just build the row source using both tables then count in an outer query (it is not mandatory but it is clearer code and can't degrade the performances).

Re: percentages of data population [message #652057 is a reply to message #652053] Tue, 31 May 2016 11:41 Go to previous messageGo to next message
John Watson
Messages: 8929
Registered: January 2010
Location: Global Village
Senior Member
This is off topic for your question, but this statement
Quote:
- Its a warehouse environment. There are no constraints but as per the business ID and NAME forms a unique record

is awful. If you do not define constraints, you are crippling the optimizer. Constraints are not just about data integrity, you know: they give the optimizer the information it needs to develop good execution plans. In a warehouse, this is even more important than in TP systems.
Re: percentages of data population [message #652059 is a reply to message #652057] Tue, 31 May 2016 12:01 Go to previous messageGo to next message
pointers
Messages: 451
Registered: May 2008
Senior Member
Yes I agree. But that is how the system is defined.
@micheal: I have no clue, can you pls help me.
Thank you in advance.

Regards,
Pointers
Re: percentages of data population [message #652060 is a reply to message #652059] Tue, 31 May 2016 12:04 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
So post a valid test case.


Re: percentages of data population [message #652061 is a reply to message #652059] Tue, 31 May 2016 12:23 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Ignoring your "key", assuming there is none, and ignoring for a first step the lookup table, the percentages query is:
SQL> select round(100*count(distinct id)/count(*),2) "%id",
  2         round(100*count(distinct name)/count(*),2) "%name",
  3         round(100*count(distinct loc_id)/count(*),2) "%loc_id"
  4  from master
  5  /
       %id      %name    %loc_id
---------- ---------- ----------
     46.15      30.77      23.08

Now your lookup table is just a way to dynamically modify the master table, so replace in the query the master table by an inline view reflecting these changes).
First you have to pivot the lookup table:
SQL> select recordid, id, name, loc_id
  2  from lookup
  3  pivot (max(column_name)
  4         for column_name in ('id' "ID", 'name' "NAME", 'loc_id' "LOC_ID"))
  5  /
  RECORDID ID                   NAME                 LOC_ID
---------- -------------------- -------------------- --------------------
       100 id                   name
       400 id
       200                      name

Then join this to master table get the new master table:
SQL> select nvl2(l.id,     null, m.id) id,
  2         nvl2(l.name,   null, m.name) name,
  3         nvl2(l.loc_id, null, m.loc_id) loc_id,
  4         rec_id
  5  from master m,
  6       ( select recordid, id, name, loc_id
  7         from lookup
  8         pivot (max(column_name)
  9                for column_name in ('id' "ID", 'name' "NAME", 'loc_id' "LOC_ID")) ) l
 10  where l.recordid (+) = m.rec_id
 11  /
        ID NAME                 LOC_ID                   REC_ID
---------- -------------------- -------------------- ----------
                                1000                        100
                                1000                        100
                                4000                        400
                                                            200
         9 z
         9 z
         9 z
         9 z
         8
         6
           i

         2                      2000

compare to:
SQL> select * from master;
        ID NAME                 LOC_ID                   REC_ID
---------- -------------------- -------------------- ----------
         1 s                    1000                        100
         2                      2000

           i
           k                                                200
         6
         7                      4000                        400
         8
         1 s                    1000                        100
         9 z
         9 z
         9 z
         9 z

Is this correct?
Re: percentages of data population [message #652062 is a reply to message #652061] Tue, 31 May 2016 12:57 Go to previous messageGo to next message
pointers
Messages: 451
Registered: May 2008
Senior Member
I cant stop myself thanking you Micheal.

This should really solve the issue.

I would test it tomorrow and let you know.
But as I believe this should help me.

Thank you very much again.

Just curious is there any other way to achieve the same (Just to learn other way. I am learning new techniques from you )

Regards,
Pointers
Re: percentages of data population [message #652148 is a reply to message #652062] Thu, 02 June 2016 14:32 Go to previous message
Bill B
Messages: 1971
Registered: December 2004
Senior Member
Just curious, why is the ID number in master allowed to be null? Shouldn't it be the primary key, not null, and filled with a sequence.
Previous Topic: Data not inserted after MView creation
Next Topic: transpose and insert the data
Goto Forum:
  


Current Time: Thu Apr 18 21:51:42 CDT 2024