Home » SQL & PL/SQL » SQL & PL/SQL » remove duplicates (Oracle 9.2 - Windows.)
remove duplicates [message #340411] Tue, 12 August 2008 12:02 Go to next message
cygnusx04
Messages: 16
Registered: February 2008
Junior Member
I have thousands of records containing values seperate by the ^ delimiter. Many of the fields have duplicate values which i'd like to remove. I'm thinking of creating a procedure per field in the table.

an example of data in the field DATA1 is as follows.

IAN^ANDREW^KEVIN^IAN^ANDREW

following the procedure I should be left with

IAN^ANDREW^KEVIN

what is the best way of doing this?

I thought about using instr and substr to get the first value (IAN) using the delimiter. Then replacing all occurrences of IAN with a blank value, then appending 1 instance of IAN to the end of the string. I could repeat this but how do i know when to stop.

Another approach would be to use an array - but this seems more complicated and I'm still not sure how to detect completion.

sounded easy but when I came to pseudo write it, I couldn't Sad
Re: remove duplicates [message #340412 is a reply to message #340411] Tue, 12 August 2008 12:07 Go to previous messageGo to next message
Michel Cadot
Messages: 64111
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
There are many topics on the subject.
Search "value list", "list of values on string" or the like.

Regards
Michel
Re: remove duplicates [message #340429 is a reply to message #340411] Tue, 12 August 2008 16:56 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8631
Registered: November 2002
Location: California, USA
Senior Member
Instead of using one string and concatenating the processed values to the end, you could use two strings, one incoming and one output. As you process each value, add the single value to the output string and remove all occurrences from the input string. I have provided a demonstration below. This would probably be better than using an array as you would still have to use substr and instr to get them into the array and concatenate them afterwards, so you might as well skip the array. There may be a simpler way to accomplish the whole thing using regular expressions.


SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION no_dups
  2    (p_string IN VARCHAR2)
  3    RETURN VARCHAR2
  4  AS
  5    v_string_in   VARCHAR2 (32767) := p_string || '^';
  6    v_value	     VARCHAR2 (32767);
  7    v_string_out  VARCHAR2 (32767);
  8  BEGIN
  9    WHILE LENGTH (v_string_in) > 1 LOOP
 10  	 -- identify first value in incoming string:
 11  	 v_value := SUBSTR (v_string_in, 1, INSTR (v_string_in, '^'));
 12  	 -- concatenate first value to output string:
 13  	 v_string_out := v_string_out || v_value;
 14  	 -- replace all such values in incoming string:
 15  	 v_string_in := REPLACE (v_string_in, v_value, '');
 16  	 -- repeat loop with next value until no values left
 17    END LOOP;
 18    -- return output string:
 19    RETURN RTRIM (v_string_out, '^');
 20  END no_dups;
 21  /

Function created.

SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> SELECT no_dups ('IAN^ANDREW^KEVIN^IAN^ANDREW') FROM DUAL
  2  /

NO_DUPS('IAN^ANDREW^KEVIN^IAN^ANDREW')
--------------------------------------------------------------------------------
IAN^ANDREW^KEVIN

SCOTT@orcl_11g> 


Re: remove duplicates [message #340464 is a reply to message #340411] Wed, 13 August 2008 00:51 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
I'd say the best way to do it is how it should be done in the first place: create a detail table.
Add a unique key, use the aforementioned way(s) to split the string and Bob's your uncle. No more duplicates and never again any hassle. If you really need the concatenated string anywhere, create a view over the master-detail tables.
Re: remove duplicates [message #340575 is a reply to message #340429] Wed, 13 August 2008 05:59 Go to previous messageGo to next message
cygnusx04
Messages: 16
Registered: February 2008
Junior Member
thanks barbara - this looks good. I can incoperate this into my procedure and use a cursor to process all records.

I see I can also do this with expressions in Oracle 10, but I struggle to understand the code.

thanks again.
Re: remove duplicates [message #340637 is a reply to message #340575] Wed, 13 August 2008 09:40 Go to previous messageGo to next message
cygnusx04
Messages: 16
Registered: February 2008
Junior Member
I found an issue with this code.

dbms_output.put_line(no_dups('BOB^BILLYBOB^KEVIN^BOBBIE^BOB^'));

BOB^BILLYKEVIN^BOBBIE


I've made some changes so all strings are placed inside delimeters. Works correctly now (code a little messy Laughing ).

CREATE OR REPLACE FUNCTION no_dups
(p_string IN VARCHAR2)
RETURN VARCHAR2
AS
v_string_in VARCHAR2 (32767) := p_string || '^';
v_value VARCHAR2 (32767);
v_string_out VARCHAR2 (32767);
BEGIN
v_string_in := REPLACE (v_string_in, '^','^^');
WHILE LENGTH (v_string_in) > 1 LOOP
-- identify first value in incoming string:
v_string_in := LTRIM (v_string_in, '^');
v_value := SUBSTR (v_string_in, 1, INSTR (v_string_in, '^'));
If v_value = '^' then
EXIT;
end if;
v_value := '^' || v_value;
-- concatenate first value to output string:
v_string_out := v_string_out || v_value;
-- replace all such values in incoming string:
v_string_in := REPLACE ('^' || v_string_in, v_value, '^');
-- repeat loop with next value until no values left
END LOOP;
-- return output string:
v_string_out := REPLACE (v_string_out, '^^','^');
v_string_out := LTRIM (v_string_out, '^');
RETURN RTRIM (v_string_out, '^');
END no_dups;
Re: remove duplicates [message #340639 is a reply to message #340637] Wed, 13 August 2008 10:20 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
Have fun with your great db-design.
We'll be hearing from you again in a short while I bet.
Re: remove duplicates [message #340640 is a reply to message #340411] Wed, 13 August 2008 10:24 Go to previous messageGo to next message
BlackSwan
Messages: 25038
Registered: January 2009
Location: SoCal
Senior Member
I suspect this was a homework assignment and not part of any actual production system. (At least I hope so.)
Re: remove duplicates [message #340643 is a reply to message #340411] Wed, 13 August 2008 10:35 Go to previous messageGo to next message
cygnusx04
Messages: 16
Registered: February 2008
Junior Member
thanks for the support! This is a vendors DB and the data populated by a 3rd party app - duplicates due to a bug.

I'm just tasked with cleaning it up.

Seems like a lot of attitude here.
Re: remove duplicates [message #340645 is a reply to message #340411] Wed, 13 August 2008 10:39 Go to previous messageGo to next message
BlackSwan
Messages: 25038
Registered: January 2009
Location: SoCal
Senior Member
>This is a vendors DB and the data populated by a 3rd party app - duplicates due to a bug.
Then application has design flaws & running on obsoleted Oracle version; other than that life is wonderful.
Re: remove duplicates [message #340647 is a reply to message #340643] Wed, 13 August 2008 10:43 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
cygnusx04 wrote on Wed, 13 August 2008 17:35

Seems like a lot of attitude here.

Not really a matter of attitude, but where I come from, when people ask for help and they get advise they explain why they can't take it instead of just ignoring it.
Re: remove duplicates [message #340695 is a reply to message #340637] Wed, 13 August 2008 16:07 Go to previous message
Barbara Boehmer
Messages: 8631
Registered: November 2002
Location: California, USA
Senior Member
cygnusx04 wrote on Wed, 13 August 2008 07:40

I found an issue with this code ... I've made some changes so all strings are placed inside delimeters. Works correctly now ...



Good catch. Nice fix. Thanks for posting the corrected code.
Previous Topic: Query to Translate and Replace: Pls Help
Next Topic: Comparing date and string with SQL where clause
Goto Forum:
  


Current Time: Mon Dec 05 12:56:39 CST 2016

Total time taken to generate the page: 0.23363 seconds