remove duplicates [message #340411] |
Tue, 12 August 2008 12:02  |
cygnusx04
Messages: 16 Registered: February 2008
|
Junior Member |
|
|
I have thousands of records containing values seperate by the ^ delimiter. Many of the fields have duplicate values which i'd like to remove. I'm thinking of creating a procedure per field in the table.
an example of data in the field DATA1 is as follows.
IAN^ANDREW^KEVIN^IAN^ANDREW
following the procedure I should be left with
IAN^ANDREW^KEVIN
what is the best way of doing this?
I thought about using instr and substr to get the first value (IAN) using the delimiter. Then replacing all occurrences of IAN with a blank value, then appending 1 instance of IAN to the end of the string. I could repeat this but how do i know when to stop.
Another approach would be to use an array - but this seems more complicated and I'm still not sure how to detect completion.
sounded easy but when I came to pseudo write it, I couldn't
|
|
|
|
Re: remove duplicates [message #340429 is a reply to message #340411] |
Tue, 12 August 2008 16:56   |
 |
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
Instead of using one string and concatenating the processed values to the end, you could use two strings, one incoming and one output. As you process each value, add the single value to the output string and remove all occurrences from the input string. I have provided a demonstration below. This would probably be better than using an array as you would still have to use substr and instr to get them into the array and concatenate them afterwards, so you might as well skip the array. There may be a simpler way to accomplish the whole thing using regular expressions.
SCOTT@orcl_11g> CREATE OR REPLACE FUNCTION no_dups
2 (p_string IN VARCHAR2)
3 RETURN VARCHAR2
4 AS
5 v_string_in VARCHAR2 (32767) := p_string || '^';
6 v_value VARCHAR2 (32767);
7 v_string_out VARCHAR2 (32767);
8 BEGIN
9 WHILE LENGTH (v_string_in) > 1 LOOP
10 -- identify first value in incoming string:
11 v_value := SUBSTR (v_string_in, 1, INSTR (v_string_in, '^'));
12 -- concatenate first value to output string:
13 v_string_out := v_string_out || v_value;
14 -- replace all such values in incoming string:
15 v_string_in := REPLACE (v_string_in, v_value, '');
16 -- repeat loop with next value until no values left
17 END LOOP;
18 -- return output string:
19 RETURN RTRIM (v_string_out, '^');
20 END no_dups;
21 /
Function created.
SCOTT@orcl_11g> SHOW ERRORS
No errors.
SCOTT@orcl_11g> SELECT no_dups ('IAN^ANDREW^KEVIN^IAN^ANDREW') FROM DUAL
2 /
NO_DUPS('IAN^ANDREW^KEVIN^IAN^ANDREW')
--------------------------------------------------------------------------------
IAN^ANDREW^KEVIN
SCOTT@orcl_11g>
|
|
|
Re: remove duplicates [message #340464 is a reply to message #340411] |
Wed, 13 August 2008 00:51   |
Frank
Messages: 7901 Registered: March 2000
|
Senior Member |
|
|
I'd say the best way to do it is how it should be done in the first place: create a detail table.
Add a unique key, use the aforementioned way(s) to split the string and Bob's your uncle. No more duplicates and never again any hassle. If you really need the concatenated string anywhere, create a view over the master-detail tables.
|
|
|
Re: remove duplicates [message #340575 is a reply to message #340429] |
Wed, 13 August 2008 05:59   |
cygnusx04
Messages: 16 Registered: February 2008
|
Junior Member |
|
|
thanks barbara - this looks good. I can incoperate this into my procedure and use a cursor to process all records.
I see I can also do this with expressions in Oracle 10, but I struggle to understand the code.
thanks again.
|
|
|
Re: remove duplicates [message #340637 is a reply to message #340575] |
Wed, 13 August 2008 09:40   |
cygnusx04
Messages: 16 Registered: February 2008
|
Junior Member |
|
|
I found an issue with this code.
dbms_output.put_line(no_dups('BOB^BILLYBOB^KEVIN^BOBBIE^BOB^'));
BOB^BILLYKEVIN^BOBBIE
I've made some changes so all strings are placed inside delimeters. Works correctly now (code a little messy ).
CREATE OR REPLACE FUNCTION no_dups
(p_string IN VARCHAR2)
RETURN VARCHAR2
AS
v_string_in VARCHAR2 (32767) := p_string || '^';
v_value VARCHAR2 (32767);
v_string_out VARCHAR2 (32767);
BEGIN
v_string_in := REPLACE (v_string_in, '^','^^');
WHILE LENGTH (v_string_in) > 1 LOOP
-- identify first value in incoming string:
v_string_in := LTRIM (v_string_in, '^');
v_value := SUBSTR (v_string_in, 1, INSTR (v_string_in, '^'));
If v_value = '^' then
EXIT;
end if;
v_value := '^' || v_value;
-- concatenate first value to output string:
v_string_out := v_string_out || v_value;
-- replace all such values in incoming string:
v_string_in := REPLACE ('^' || v_string_in, v_value, '^');
-- repeat loop with next value until no values left
END LOOP;
-- return output string:
v_string_out := REPLACE (v_string_out, '^^','^');
v_string_out := LTRIM (v_string_out, '^');
RETURN RTRIM (v_string_out, '^');
END no_dups;
|
|
|
|
|
Re: remove duplicates [message #340643 is a reply to message #340411] |
Wed, 13 August 2008 10:35   |
cygnusx04
Messages: 16 Registered: February 2008
|
Junior Member |
|
|
thanks for the support! This is a vendors DB and the data populated by a 3rd party app - duplicates due to a bug.
I'm just tasked with cleaning it up.
Seems like a lot of attitude here.
|
|
|
|
Re: remove duplicates [message #340647 is a reply to message #340643] |
Wed, 13 August 2008 10:43   |
Frank
Messages: 7901 Registered: March 2000
|
Senior Member |
|
|
cygnusx04 wrote on Wed, 13 August 2008 17:35 |
Seems like a lot of attitude here.
|
Not really a matter of attitude, but where I come from, when people ask for help and they get advise they explain why they can't take it instead of just ignoring it.
|
|
|
Re: remove duplicates [message #340695 is a reply to message #340637] |
Wed, 13 August 2008 16:07  |
 |
Barbara Boehmer
Messages: 9104 Registered: November 2002 Location: California, USA
|
Senior Member |
|
|
cygnusx04 wrote on Wed, 13 August 2008 07:40 |
I found an issue with this code ... I've made some changes so all strings are placed inside delimeters. Works correctly now ...
|
Good catch. Nice fix. Thanks for posting the corrected code.
|
|
|