Home » SQL & PL/SQL » SQL & PL/SQL » Duplicate Data (Oracle 11g)
Duplicate Data [message #637738] Sat, 23 May 2015 02:03 Go to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
Hi Guys,

Recently I was asked one question in an Interview and I was not able to figure it out. May be you guys can help me on that. Just curious to know the answer.

There is one table where the distance is stored between two cities. But there are two entries for each distance. One from CITY A to CITY B and then again from CITY B to CITY A which is a replica of what we store First. e.g one entry is DELHI to MUMBAI - DISTANCE 2000 and the other entry is MUMBAI to DELHI - DISTANCE 2000. I just need one unique entry not the other one. Can you please help me with the query which will only list out unique records.

CREATE TABLE distance
  (
     city1 VARCHAR2(20),
     city2 VARCHAR2(20),
     dist  NUMBER
  );

INSERT INTO distance
VALUES      ('DELHI',
             'MUMBAI',
             4000);

INSERT INTO distance
VALUES      ('MUMBAI',
             'DELHI',
             4000);

INSERT INTO distance
VALUES      ('DELHI',
             'PUNE',
             2000);

INSERT INTO distance
VALUES      ('PUNE',
             'DELHI',
             2000);

INSERT INTO distance
VALUES      ('MUMBAI',
             'PUNE',
             6000);

INSERT INTO distance
VALUES      ('PUNE',
             'MUMBAI',
             6000);

COMMIT; 



I am looking for the below output :

CITY1 CITY2 DISTANCE

DELHI MUMBAI 4000
DELHI PUNE 2000
MUMBAI PUNE 6000

Thanks!!!


--moderator update: correcte the closing [code] tag

[Updated on: Sat, 23 May 2015 02:26] by Moderator

Report message to a moderator

Re: Duplicate Data [message #637740 is a reply to message #637738] Sat, 23 May 2015 02:26 Go to previous messageGo to next message
John Watson
Messages: 8929
Registered: January 2010
Location: Global Village
Senior Member
Welcome to the forum. This looks more like a college homework question than an interview question. You need to remove duplicate rows, defining a duplicate of city1||city2 as being city2||city1. I would write a query that SELECTs from DISTANCE with a sub-query in the predicate that excludes rows that match CITY1||CITY2 with CITY2||CITY1.
Re: Duplicate Data [message #637741 is a reply to message #637740] Sat, 23 May 2015 02:55 Go to previous messageGo to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
Hi John,

I tried the following query but still getting no results :

select * from distance d1
where not exists(Select 1 from distance d2
                  where d1.CITY1||d1.CITY2 = d2.CITY2||d2.CITY1 );

Re: Duplicate Data [message #637742 is a reply to message #637741] Sat, 23 May 2015 03:17 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

You get the expected result with your query: all rows that have no duplicates that is no rows with the test case you provided as all rows have a duplicate.

Add a new row like the following one and your query will return it as it has no duplicates:
INSERT INTO distance VALUES ('PARIS','MARSEILLE',800);

Re: Duplicate Data [message #637743 is a reply to message #637742] Sat, 23 May 2015 03:20 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

If you want all the rows but no duplicates you have to add a condition in your subquery to not exclude both duplicates that is ensure that one is kept.

Re: Duplicate Data [message #637744 is a reply to message #637743] Sat, 23 May 2015 03:47 Go to previous messageGo to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
Hi Michel,

I am asking for that one condition which will kept only single combination and exclude the other entry. I am not able to come up with that. Can you share one if you have thought it through.
Re: Duplicate Data [message #637745 is a reply to message #637744] Sat, 23 May 2015 03:57 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

I already did:
Quote:
you have to add a condition in your subquery to not exclude both duplicates that is ensure that one is kept.


For each duplicate couple your condition "where d1.CITY1||d1.CITY2 = d2.CITY2||d2.CITY1" is true so both duplicates are select and so removed.
You have to add a condition "where d1.CITY1||d1.CITY2 = d2.CITY2||d2.CITY1 and <something>" so that only one out of the two is selected.

Read How does one eliminate duplicates rows from a table? for other hints.

Re: Duplicate Data [message #637749 is a reply to message #637743] Sat, 23 May 2015 06:57 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Michel Cadot wrote on Sat, 23 May 2015 04:20

If you want all the rows but no duplicates you have to add a condition in your subquery to not exclude both duplicates that is ensure that one is kept.



Or better to get rid of subquery altogether:

select  least(city1,city2) city1,
        greatest(city1,city2) city2,
        dist
  from  distance
  group by least(city1,city2),
           greatest(city1,city2),
           dist
/

CITY1                CITY2                      DIST
-------------------- -------------------- ----------
MUMBAI               PUNE                       6000
DELHI                MUMBAI                     4000
DELHI                PUNE                       2000

SQL> 


SY.
Re: Duplicate Data [message #637750 is a reply to message #637749] Sat, 23 May 2015 07:31 Go to previous messageGo to next message
John Watson
Messages: 8929
Registered: January 2010
Location: Global Village
Senior Member
A much better solution. If this really was a homework question, that should get top marks Smile
Re: Duplicate Data [message #637751 is a reply to message #637750] Sat, 23 May 2015 07:43 Go to previous messageGo to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
Thanks Solomon Yakobson !!!! I would have never come up with that solution. Really out of the Box ,I must say!!! Smile Smile
Re: Duplicate Data [message #637752 is a reply to message #637751] Sat, 23 May 2015 07:48 Go to previous messageGo to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
John ,

It was really not a homework Question but an Interview question. You can see it is not easy to come up with that solution in quick time under pressure. Smile Smile
Re: Duplicate Data [message #637754 is a reply to message #637752] Sat, 23 May 2015 08:48 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

This one no, but the other one with the subquery, yes. Try to find it.
And in addition, what should be added to the table definition to avoid such duplicates?

Re: Duplicate Data [message #637755 is a reply to message #637754] Sat, 23 May 2015 09:00 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Unique FBI:

SQL> create unique index distance_uidx1
  2    on distance(
  3                least(city1,city2),
  4                greatest(city1,city2)
  5               )
  6  /

Index created.

SQL> INSERT INTO distance
  2  VALUES      ('DELHI',
  3               'MUMBAI',
  4               4000);

1 row created.

SQL> 
SQL> INSERT INTO distance
  2  VALUES      ('MUMBAI',
  3               'DELHI',
  4               4000);
INSERT INTO distance
*
ERROR at line 1:
ORA-00001: unique constraint (SCOTT.DISTANCE_UIDX1) violated


SQL> 


SY.

Re: Duplicate Data [message #637756 is a reply to message #637755] Sat, 23 May 2015 09:48 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

There is a better solution without index... and the exercise is not for you but for OP. Smile
Hint: it is also a solution for the query with EXISTS.

Re: Duplicate Data [message #637757 is a reply to message #637756] Sat, 23 May 2015 16:08 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
or do it without using EXISTS

WITH match 
     AS (SELECT d1.city1, 
                Max(d1.rowid) mrid 
         FROM   distance d1 
         GROUP  BY d1.city1) 
SELECT d1.city1, 
       d1.city2, 
       d1.dist 
FROM   match, 
       distance d1 
WHERE  d1.rowid = match.mrid 
Re: Duplicate Data [message #637758 is a reply to message #637756] Sat, 23 May 2015 17:37 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Michel Cadot wrote on Sat, 23 May 2015 10:48

There is a better solution without index...


Right, check constraint would do.

SY.

[Updated on: Sat, 23 May 2015 17:40]

Report message to a moderator

Re: Duplicate Data [message #637772 is a reply to message #637757] Sun, 24 May 2015 05:09 Go to previous messageGo to next message
plsql_guy
Messages: 6
Registered: May 2015
Junior Member
Thanks BlackSwan!!!! That is also an interesting solution. Now I truly understand the role of Rowid's.

[Updated on: Sun, 24 May 2015 05:10]

Report message to a moderator

Re: Duplicate Data [message #637774 is a reply to message #637772] Sun, 24 May 2015 05:41 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Note that this is in the link I gave you but you obviously did not read.
And you still did not try to find what John and I suggested you. Solomon found the answer, what about you?
As far as I can see you just want to be fed and are not interested in learning.

Re: Duplicate Data [message #637777 is a reply to message #637774] Sun, 24 May 2015 09:27 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
>Now I truly understand the role of Rowid's.
forgive me, but I don't believe you since ROWID have no role other than to uniquely identify a single row.
Re: Duplicate Data [message #637865 is a reply to message #637774] Thu, 28 May 2015 01:22 Go to previous messageGo to next message
anniepeteroracle
Messages: 47
Registered: March 2012
Location: bangalore
Member
"Note that this is in the link I gave you but you obviously did not read.
And you still did not try to find what John and I suggested you. Solomon found the answer, what about you?
As far as I can see you just want to be fed and are not interested in learning."

Thanks Michel this like is very useful .
Re: Duplicate Data [message #637868 is a reply to message #637865] Thu, 28 May 2015 01:27 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

More than your post as it encourages to read the link to find other solutions and problems.

[Updated on: Thu, 28 May 2015 01:27]

Report message to a moderator

Re: Duplicate Data [message #637900 is a reply to message #637868] Thu, 28 May 2015 09:45 Go to previous messageGo to next message
msol25
Messages: 396
Registered: June 2011
Senior Member
Dear Friend,

Please find solution:


select *
from
(
        select  city1,
                city2,
                dist,
                row_number() over (partition by dist order by dist) as rnk
        from    distance
 )
 where rnk = 1;

Re: Duplicate Data [message #637901 is a reply to message #637900] Thu, 28 May 2015 10:08 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

This is wrong.

In addition, note that if you "partition by dist" then all rows of a partition have the same dist and so "order by dist" is meaningless.

Re: Duplicate Data [message #637904 is a reply to message #637738] Thu, 28 May 2015 14:36 Go to previous messageGo to next message
saadatahmad
Messages: 452
Registered: March 2005
Location: Germany/Paderborn
Senior Member

why not keep it more simple Smile

SQL> select * from distance
  2  /

CITY1                CITY2                      DIST
-------------------- -------------------- ----------
DELHI                MUMBAI                     4000
MUMBAI               DELHI                      4000
DELHI                PUNE                       2000
PUNE                 DELHI                      2000
MUMBAI               PUNE                       6000
PUNE                 MUMBAI                     6000

6 rows selected.

SQL> delete from distance
  2  where rowid not in
  3  (
  4   select min(rowid)
  5   from distance
  6   group by dist
  7  )
  8  /

3 rows deleted.

SQL> select * from distance
  2  /

CITY1                CITY2                      DIST
-------------------- -------------------- ----------
DELHI                MUMBAI                     4000
DELHI                PUNE                       2000
MUMBAI               PUNE                       6000

Re: Duplicate Data [message #637905 is a reply to message #637904] Thu, 28 May 2015 14:39 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

This is also wrong.

Re: Duplicate Data [message #637907 is a reply to message #637905] Thu, 28 May 2015 14:47 Go to previous messageGo to next message
saadatahmad
Messages: 452
Registered: March 2005
Location: Germany/Paderborn
Senior Member

I posted only what was required in the original question...I don't see anything wrong in it. It gives the required output.

regards

[Updated on: Thu, 28 May 2015 14:48]

Report message to a moderator

Re: Duplicate Data [message #637910 is a reply to message #637907] Thu, 28 May 2015 14:53 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

The fact it give the required output for the given test case does not prove it is correct to answer the question.
So I repeat it is wrong.
I let you find in what.

Re: Duplicate Data [message #637911 is a reply to message #637910] Thu, 28 May 2015 14:56 Go to previous messageGo to next message
saadatahmad
Messages: 452
Registered: March 2005
Location: Germany/Paderborn
Senior Member

I beleive in you...Smile
Re: Duplicate Data [message #637912 is a reply to message #637738] Fri, 29 May 2015 00:32 Go to previous messageGo to next message
jgjeetu
Messages: 373
Registered: July 2013
Location: www.Orafaq.com/Forum
Senior Member

and what about this:-

CITY1  CITY2  CITY3  DISTANCE

DELHI  MUMBAI  PUNE     4000
PUNE   DELHI   MUMBAI   4000
MUMBAI PUNE    DELHI    4000
PUNJAB HARYANA UP       6000
UP     HARYANA PUNJAB   6000
HARYANA PUNJAB UP       6000


REQUIRED OUTPUT:-
CITY1  CITY2  CITY3  DISTANCE

DELHI  MUMBAI  PUNE     4000
HARYANA PUNJAB UP       6000




[Edit MC: Change color to avoid confusion with original question.]

[Updated on: Fri, 29 May 2015 00:48] by Moderator

Report message to a moderator

Re: Duplicate Data [message #637913 is a reply to message #637912] Fri, 29 May 2015 00:47 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

This required output has nothing to do with OP's question.
So create your own topic, don't mix up question.
And explain the requirements WITH WORDS.

Re: Duplicate Data [message #637914 is a reply to message #637738] Fri, 29 May 2015 01:34 Go to previous messageGo to next message
msol25
Messages: 396
Registered: June 2011
Senior Member
Dear Michel,

As per question given by OP's,I hope we can give like this :


select *
from
(
        select  city1,
                city2,
                dist,
                row_number() over (partition by dist order by city1,city2,dist) as rnk
        from    distance
 )
 where rnk = 1;



I hope in this case partition based on dist and ordering based on city1,city2,dist.In this ordering will be done based on city1,city2,dist.

Here,Should have restriction that ordering will be done based on city1,city2 column ASCII Value for displaying output.Like ASCII for "D" is 68 and "M" is having 77.So Accoringly output will get prioritize for assigning the rank and accordingly we have to display where rank = 1.
Re: Duplicate Data [message #637917 is a reply to message #637914] Fri, 29 May 2015 01:45 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

No, this is not this. You have been misled by the wrong answers.
Just to put an end to this error add the following row (which must appear in the result):
INSERT INTO distance VALUES ('PARIS','ATHENS',2000);


And my remark about partitioning and ordering on the same column also applies to your query.

Re: Duplicate Data [message #637919 is a reply to message #637917] Fri, 29 May 2015 01:55 Go to previous messageGo to next message
msol25
Messages: 396
Registered: June 2011
Senior Member
Dear Michel,

In This case getting o/p as :


CITY1	CITY2	DIST	RNK

PARIS	ATHENS	2000	1
DELHI	MUMBAI	4000	1
MUMBAI	PUNE	6000	1



Becasue,ASCII('A') is 65 and ASCII('D') is 68.Both type of records belongs to same partitioning group i.e. DIST=2000.

So,I hope this is correct.

Re: Duplicate Data [message #637920 is a reply to message #637919] Fri, 29 May 2015 01:59 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Result which is wrong.
Where is PUNE <-> DELHI?

Re: Duplicate Data [message #638458 is a reply to message #637738] Fri, 12 June 2015 02:54 Go to previous messageGo to next message
nciteamo
Messages: 27
Registered: October 2014
Junior Member
select city2, city1 ,dist from(
select rownum id, a.* from distance a) 
where id in (select id+(id+1)-1 from (select rownum id, a.* from distance a))


i just wanna try to help and to be more active in this forum, sorry if it goes wrong

[Updated on: Fri, 12 June 2015 02:56]

Report message to a moderator

Re: Duplicate Data [message #638459 is a reply to message #638458] Fri, 12 June 2015 03:06 Go to previous messageGo to next message
cookiemonster
Messages: 13920
Registered: September 2008
Location: Rainy Manchester
Senior Member
1) this:
id+(id+1)-1

is a pointlessly complicated way of writing this:
id*2

2) The whole query could be rewritten as this:
select city2, city1 ,dist from(
select rownum id, a.* from distance a) 
where mod(id,2) = 0;

3) You're assuming the duplicates will come together in the inline view query, it's highly doubtful that they will
Re: Duplicate Data [message #638461 is a reply to message #638458] Fri, 12 June 2015 03:11 Go to previous message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Did you try your query with the rows I added?
It does not work.
The correct result is the one returned by Solomon's query:
SQL> select * from distance;
CITY1                CITY2                      DIST
-------------------- -------------------- ----------
DELHI                MUMBAI                     4000
MUMBAI               DELHI                      4000
DELHI                PUNE                       2000
PUNE                 DELHI                      2000
MUMBAI               PUNE                       6000
PUNE                 MUMBAI                     6000
PARIS                MARSEILLE                   800
PARIS                ATHENS                     2000

8 rows selected.

SQL> select  least(city1,city2) city1,
  2          greatest(city1,city2) city2,
  3          dist
  4    from  distance
  5    group by least(city1,city2),
  6             greatest(city1,city2),
  7             dist
  8  /
CITY1                CITY2                      DIST
-------------------- -------------------- ----------
MUMBAI               PUNE                       6000
MARSEILLE            PARIS                       800
ATHENS               PARIS                      2000
DELHI                MUMBAI                     4000
DELHI                PUNE                       2000

5 rows selected.
Previous Topic: Procedure or Function
Next Topic: merge columns from 2 different tables
Goto Forum:
  


Current Time: Wed Apr 17 23:49:02 CDT 2024