OraFAQ Forum: SQL & PL/SQL » how to get the rows of different values of single column

Home » SQL & PL/SQL » SQL & PL/SQL » how to get the rows of different values of single column

Show: Today's Messages :: Polls :: Message Navigator
E-mail to friend

how to get the rows of different values of single column [message #684197]

Wed, 21 April 2021 07:14

Sekhar6617
Messages: 22
Registered: March 2021

Junior Member

Hi Team,

Requesting you please help me with the query for selecting rows with different values of single column.
Below is the scenario explained with example. In this below example PO_ID = 2675 should be returned as a output since it has two distinct date fields.

CREATE TABLE PO_TBL (
    PO_ID int,
    DATE DATE,
    AMOUNT  INT, 
    Address varchar(255),
    City varchar(255)
);

INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 1023.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 125.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/25/2021', 78.00, 'TEXAS', 'TEXAS');

INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 231.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (5360, '02/01/2021', 349.00, 'TEXAS', 'TEXAS');

OUTPUT:

PO_ID     DATE         
2675      02/01/2021
2675      02/25/2021

I have tried with below sql query but results are not as expected.

SELECT A.PO_ID, A.DATE, COUNT(A.DATE) FROM PO_TBL A
WHERE A.DATE NOT IN
(SELECT DISTINCT B.DATE FROM PO_TBL B 
WHERE A.PO_ID = B.PO_ID
AND A.DATE <> B.DATE)
GROUP BY A.PO_ID, A.DATE

Thank you.

Regards
Sekhar

Report message to a moderator

Re: how to get the rows of different values of single column [message #684198 is a reply to message #684197]

Wed, 21 April 2021 07:54

Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

Always post your Oracle version, with 4 decimals (query v$version), as often solution depends on it.

DATE is not a valid column name:

SQL> CREATE TABLE PO_TBL (
  2      PO_ID int,
  3      DATE DATE,
  4      AMOUNT  INT,
  5      Address varchar(255),
  6      City varchar(255)
  7  );
    DATE DATE,
    *
ERROR at line 3:
ORA-00904: : invalid identifier

SQL> CREATE TABLE PO_TBL (
  2      PO_ID int,
  3      my_DATE DATE,
  4      AMOUNT  INT,
  5      Address varchar(255),
  6      City varchar(255)
  7  );

Table created.

SQL> INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS');
INSERT INTO PO_TBL VALUES (2675, '02/01/2021', 665.00, 'TEXAS', 'TEXAS')
                                 *
ERROR at line 1:
ORA-01843: not a valid month

ALWAYS use TO_DATE with a format mask to insert a date from a string.
Also note: how Oracle would know if 02/01 is January, 2nd or February, 1st?
Correct syntax is:

SQL> INSERT INTO PO_TBL VALUES (2675, TO_DATE('02/01/2021','MM/DD/YYYY'), 665.00, 'TEXAS', 'TEXAS');

1 row created.

Now for the question:

SQL> select * from PO_TBL order by 1, 2;
     PO_ID MY_DATE                 AMOUNT ADDRESS    CITY
---------- ------------------- ---------- ---------- ----------
      2675 01/02/2021 00:00:00        665 TEXAS      TEXAS
      2675 01/02/2021 00:00:00       1023 TEXAS      TEXAS
      2675 01/02/2021 00:00:00        125 TEXAS      TEXAS
      2675 25/02/2021 00:00:00         78 TEXAS      TEXAS
      5360 01/02/2021 00:00:00        231 TEXAS      TEXAS
      5360 01/02/2021 00:00:00        349 TEXAS      TEXAS

6 rows selected.

SQL> select distinct po_id, my_date
  2  from po_tbl
  3  where po_id in ( select po_id from po_tbl
  4                   group by po_id
  5                   having count(distinct my_date) > 1 )
  6  order by po_id, my_date
  7  /
     PO_ID MY_DATE
---------- -------------------
      2675 01/02/2021 00:00:00
      2675 25/02/2021 00:00:00

2 rows selected.

Report message to a moderator

Re: how to get the rows of different values of single column [message #684199 is a reply to message #684198]

Wed, 21 April 2021 12:21

Sekhar6617
Messages: 22
Registered: March 2021

Junior Member

Thank you Michel.
Sure, I will take care of Data Base version and will follow all your instructions.

The given solution working perfectly. Thank you.

Regards
Sekhar

Report message to a moderator

Re: how to get the rows of different values of single column [message #684200 is a reply to message #684198]

Wed, 21 April 2021 12:27

John Watson
Messages: 8931
Registered: January 2010
Location: Global Village

Senior Member

Here's another solution:

orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);

          PO_ID MY_DATE
--------------- -------------------
           2675 2021-02-25:00:00:00
           2675 2021-02-01:00:00:00

I love finding a reason to use the ANY or ALL operators. On my system, the cost is lower than Michel's query:

orclz> set autot trace exp
orclz> select distinct p.po_id,p.my_Date from po_tbl p where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id);

Execution Plan
----------------------------------------------------------
Plan hash value: 2711566224

------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     6 |   264 |     7  (15)| 00:00:01 |
|   1 |  HASH UNIQUE        |        |     6 |   264 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN SEMI    |        |     6 |   264 |     6   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("P"."PO_ID"="Q"."PO_ID")
       filter("P"."MY_DATE"<>"Q"."MY_DATE")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

orclz>
orclz> select distinct po_id, my_date
  2      from po_tbl
  3    where po_id in ( select po_id from po_tbl
  4                       group by po_id
  5                       having count(distinct my_date) > 1 )
  6      order by po_id, my_date;

Execution Plan
----------------------------------------------------------
Plan hash value: 4065634225

-------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |           |    18 |  1008 |     8  (25)| 00:00:01 |
|   1 |  SORT UNIQUE NOSORT     |           |    18 |  1008 |     8  (25)| 00:00:01 |
|*  2 |   FILTER                |           |       |       |            |          |
|   3 |    SORT GROUP BY        |           |    18 |  1008 |     8  (25)| 00:00:01 |
|*  4 |     HASH JOIN           |           |    18 |  1008 |     7  (15)| 00:00:01 |
|   5 |      VIEW               | VM_NWVW_2 |     6 |   132 |     4  (25)| 00:00:01 |
|   6 |       HASH GROUP BY     |           |     6 |   132 |     4  (25)| 00:00:01 |
|   7 |        TABLE ACCESS FULL| PO_TBL    |     6 |   132 |     3   (0)| 00:00:01 |
|   8 |      TABLE ACCESS FULL  | PO_TBL    |     6 |   204 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(COUNT("$vm_col_1")>1)
   4 - access("PO_ID"="$vm_col_2")

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold

orclz>

Report message to a moderator

Re: how to get the rows of different values of single column [message #684204 is a reply to message #684200]

Thu, 22 April 2021 08:31

Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA

Senior Member

Single path through PO_TBL:

WITH T AS (
           SELECT  PO_ID,
                   MY_DATE,
                   COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
                   ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
             FROM  PO_TBL
          )
SELECT  PO_ID,
        MY_DATE
  FROM  T
  WHERE CNT > 1
    AND RN = 1
/

     PO_ID MY_DATE
---------- ----------
      2675 02/01/2021
      2675 02/25/2021

Execution Plan
----------------------------------------------------------
Plan hash value: 118720869

------------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     6 |   288 |     4  (25)| 00:00:01 |
|*  1 |  VIEW               |        |     6 |   288 |     4  (25)| 00:00:01 |
|   2 |   WINDOW SORT       |        |     6 |   132 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| PO_TBL |     6 |   132 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL>

SY.

Report message to a moderator

Re: how to get the rows of different values of single column [message #684206 is a reply to message #684204]

Thu, 22 April 2021 12:24

Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

I hesitated to post a solution with analytic functions because this is one of the cases where they are most often less efficient.
If you create an index on the table like:

SQL> create index PO_TBL_idx on PO_TBL (po_id,my_date) compress 1;

Index created.

Then the previous queries use the index when the latest one does not (I removed the costs as they are meaningless for such a small table):

SQL> set autot trace exp
SQL> select distinct p.po_id,p.my_Date
  2  from po_tbl p
  3  where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id)
  4  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1247696620

----------------------------------------
| Id  | Operation         | Name       |
----------------------------------------
|   0 | SELECT STATEMENT  |            |
|   1 |  HASH UNIQUE      |            |
|*  2 |   HASH JOIN       |            |
|   3 |    INDEX FULL SCAN| PO_TBL_IDX |
|   4 |    INDEX FULL SCAN| PO_TBL_IDX |
----------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("P"."PO_ID"="Q"."PO_ID")
       filter("P"."MY_DATE"<>"Q"."MY_DATE")

SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7            )
  8  SELECT  PO_ID,
  9          MY_DATE
 10    FROM  T
 11    WHERE CNT > 1
 12      AND RN = 1
 13  /

Execution Plan
----------------------------------------------------------
Plan hash value: 118720869

--------------------------------------
| Id  | Operation           | Name   |
--------------------------------------
|   0 | SELECT STATEMENT    |        |
|*  1 |  VIEW               |        |
|   2 |   WINDOW SORT       |        |
|   3 |    TABLE ACCESS FULL| PO_TBL |
--------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)

Then index should be far smaller than the table (about 10 bytes per row in the index over several hundred for the table).
Of course, best query depends on the table cardinality and data.
(Note: I used a 11.2.0.4 db for this, maybe a later version knows to use the index with analytic functions.)

Report message to a moderator

Re: how to get the rows of different values of single column [message #684207 is a reply to message #684204]

Thu, 22 April 2021 12:25

John Watson
Messages: 8931
Registered: January 2010
Location: Global Village

Senior Member

Now that is pretty good! I wish I were more familiar with the analytic functions. However, I wonder about scale. I did a couple of experiments, and if I index PO_ID the JW and MC solutions use the index but the SY solution does not. I tried hacking the object stat's to make the CBO think the table was huge in a very crude way:
EXEC DBMS_STATS.SET_TABLE_STATS(user,'PO_TBL',numrows=>100000000,numblks=>10000000)
and it did look as though my solution was best Smile

But I wouldn't think that bears much relation to reality.

Report message to a moderator

Re: how to get the rows of different values of single column [message #684208 is a reply to message #684207]

Thu, 22 April 2021 12:27

Michel Cadot
Messages: 68645
Registered: March 2007
Location: Nanterre, France, http://...

Senior Member
Account Moderator

Ha-ha, same thought at same time. Smile

Report message to a moderator

Re: how to get the rows of different values of single column [message #684209 is a reply to message #684208]

Thu, 22 April 2021 16:43

Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA

Senior Member

Michel, reason why John's solution is using index is simple - optimizer understands where p.my_date <> any (select q.my_date from po_tbl q where p.po_id=q.po_id) will not produce TRUE for rows where PO_ID is null or MY_DATE is null. And, BTW, it makes this solution a non-working solution for NULL PO_ID, althought most likely it is PK. Anyway, if we want analytics to take advantage of indexing PO_ID, MY_DATE and both are nullable then all we need is either add where po_id is not null and my_date is not null (but that's again could be an issue if PO_ID can be NULL) or add a constant to index to make sure all rows are indexed:

SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE);

Index created.

SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7               WHERE PO_ID IS NOT NULL
  8                 AND MY_DATE IS NOT NULL
  9            )
 10  SELECT  PO_ID,
 11          MY_DATE
 12    FROM  T
 13    WHERE CNT > 1
 14      AND RN = 1
 15  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     6 |   288 |     1   (0)| 00:00:01 |
|*  1 |  VIEW             |            |     6 |   288 |     1   (0)| 00:00:01 |
|   2 |   WINDOW BUFFER   |            |     6 |   132 |     1   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN| PO_TBL_IDX |     6 |   132 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)
   3 - filter("PO_ID" IS NOT NULL AND "MY_DATE" IS NOT NULL)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL> SET AUTOTRACE OFF

SQL> DROP INDEX PO_TBL_IDX;

Index dropped.

SQL> CREATE INDEX PO_TBL_IDX ON PO_TBL(PO_ID,MY_DATE,1);

Index created.

SQL> SET AUTOTRACE TRACE EXP
SQL> WITH T AS (
  2             SELECT  PO_ID,
  3                     MY_DATE,
  4                     COUNT(DISTINCT MY_DATE) OVER(PARTITION BY PO_ID) CNT,
  5                     ROW_NUMBER() OVER(PARTITION BY PO_ID,MY_DATE ORDER BY 1) RN
  6               FROM  PO_TBL
  7            )
  8  SELECT  PO_ID,
  9          MY_DATE
 10    FROM  T
 11    WHERE CNT > 1
 12      AND RN = 1
 13  /

Execution Plan
----------------------------------------------------------
Plan hash value: 1314444089

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     6 |   288 |     1   (0)| 00:00:01 |
|*  1 |  VIEW             |            |     6 |   288 |     1   (0)| 00:00:01 |
|   2 |   WINDOW BUFFER   |            |     6 |   132 |     1   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN| PO_TBL_IDX |     6 |   132 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CNT">1 AND "RN"=1)

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

SQL>

SY.

Report message to a moderator

Re: how to get the rows of different values of single column [message #684292 is a reply to message #684209]

Mon, 03 May 2021 07:23

Sekhar6617
Messages: 22
Registered: March 2021

Junior Member

Thank you all for your suggestions.

Regards
Sekhar

Report message to a moderator

Previous Topic:	sql regex_instr
Next Topic:	multiple queries and 1 output window/result

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Thu Apr 25 08:57:17 CDT 2024