Re: Updating data in table depending on data in same table

From: DA Morgan <damorgan_at_psoug.org>
Date: Thu, 08 Feb 2007 12:42:08 -0800
Message-ID: <1170967329.186918@bubbleator.drizzle.com>

The J Man wrote:
> Hey everyone,
>
> First of all apologies for posting this in the oracle.server section I
> meant to post it in here first.
>
> I'm still a relative newb so go easy please!
>
> I've inherited a system from someone that is in a bit of a mess and I
> need to tidy it up. It's an Oracle 10g database.
>
> The part i have to clean up basically has two tables.
> Table A attributes are: SOURCEID, NAME, DESCRIPTION, DEPARTMENT,
> LANGUAGES.
> Table B attributes are: SOURCEID, LANGID, ROLE, STATUS.
>
> Now the way this system is supposed to work is that a row is entered
> into table A and X amount of languages are added to the LANGUAGES
> field. This field is then split up into a row for each language into
> table B with the corresponding SOURCEID.
> e.g.
> Table A:
> 123 -- Homer -- lazy -- HR -- French;Spanish;German
>
> Table B:
> 123 -- French -- manager -- approved
> 123 -- Spanish -- admin -- pending
> 123 -- German -- teaboy -- proposed
>
> This system had been working fine until a new person started and
> instead of just adding a new language to the original row in table A
> she added 1000+ rows that are identical to those already in the table
> except for the fact the language is Japanese. So now the table looks
> like
> Table A:
> 123 -- Homer -- lazy -- HR -- French;Spanish;German
> 456 -- Homer -- lazy -- HR -- Japanese;
>
> Table B:
> 123 -- French -- manager -- approved
> 123 -- Spanish -- admin -- pending
> 123 -- German -- teaboy -- proposed
> 456 -- Japanese -- janitor -- approved
>
> I have been given the unenviable task of fixing this by, identifying
> all terms that have identical attributes (except for the LANGUAGES
> attribute), changing the SOURCEID of the duplicate in table B to the
> master term SOURCEID (so in the example above, in table B change the
> row 456 -- Japanese -- janitor -- approved to 123 -- Japanese --
> janitor -- approved), tagging on Japanese onto the master term (in the
> example above, in table A change the row 123 -- Homer -- lazy -- HR --
> French;Spanish;German to 123 -- Homer -- lazy -- HR --
> French;Spanish;German;Japanese;) and then deleting the duplicate entry
> in table A (in the example above, delete the row 456 -- Homer -- lazy
> -- HR -- Japanese;)
>
> Thanks for sticking with me this far (if you have!). Luckily enough I
> know for a fact that all the master terms are in the SOURCEID range 1
> to 2000 and the duplicates are in the SOURCEID range 2500 to 6000. so
> identifying the terms is not a problem. I just ran an intersect query
> and found my ~1000 culprit terms. As you probably know though, from
> this I am not getting the SOURCEIDS of either row so basically i have
> been getting the SOURCEIDS of the identified terms and running the
> following command manually.
>
> update target set target.SOURCEID=MASTER_NUMBER where
> target.SOURCEID=DUPLICATE_NUMBER;
> update source set
> source.LANGSREQUIRED='French;Spanish;German;Italian;Brazilian
> Portuguese;Simplified Chinese;Traditional Chinese;Swedish;French
> Canadian;Dutch;Korean;Japanese;' where source.SOURCEID=MASTER_NUMBER;
> DELETE FROM source WHERE sourceid = DUPLICATE_NUMBER;
>
> As you can tell my wrists are about to fall off with the RSI of
> continually trying to find out the IDs and then run these commands.
> SQL is not really my game so i am not sure whether i would be able to
> use some procedure or cursor or something along those lines... I have
> already wasted an awful lot of time trying to read of some way to help
> me by running different commands but i am very quickly running out of
> time...
>
> Any help at all would be appreciated or even just a big fat no to say
> you're gonna have to put up with it and just do it manually.
>
> Thank you reading all this and I hope that I have explained it clearly
> enough,
> Jonathan.

What you have described above, if I understand it correctly, is a relational horror story. What happened should be technically impossible. While you certainly need to clean up the mess the LANGUAGES column serves no useful purpose as the information is available, properly stored in table B.

Also be aware that some of your column names are Oracle reserved words.

NAME and ROLE jump immediately to mind.

But this should start you off:

SELECT col1, col2, col3, col4, COUNT(*)
FROM table
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1;

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

Received on Thu Feb 08 2007 - 14:42:08 CST