Home » SQL & PL/SQL » SQL & PL/SQL » Delete duplicate rows (Oracle 11g)
Delete duplicate rows [message #400923] Thu, 30 April 2009 07:13 Go to next message
Vackar
Messages: 81
Registered: October 2007
Member
Hi there,

I have a simple problem, but I cannot think of how to solve it.

I have a lookup table of genes with two columns: ID, TERM

I then have another table called results, with an FK to the gene lookup table, and some other columns

What has happened is that I have duplictes in the lookup table, and there are results which map to different rows, but the same gene. So I have this situation
----------------------
GENE TABLE
----------------------
ID        TERM
----      ----
1        Gene1
2        Gene2
3        Gene3
4        Gene1
5        Gene1


----------------------
RESULTS TABLE 
----------------------
GENEID
--------   ... other columns    
1       
1        
1       
1        
2
2
2
2
3
4
5
4
4        




And what I'd like to do is set the gene_id of each result to the max value from the gene lookup table, and then remove the duplicates from the gene table.

I know how to do the second part, but I can't think of how to do the first part. Any help would be much appreciated.


Thanks,
Vackar
Re: Delete duplicate rows [message #400940 is a reply to message #400923] Thu, 30 April 2009 07:56 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
What has happened is that I have duplictes in the lookup table, and there are results which map to different rows, but the same gene.

What does this mean? How a gene is defined in your model?
Post a working Test case: create table and insert statements along with the result you want with these data.

Quote:
but I can't think of how to do the first part.

So do I as I don't understand what it is. With a test case, I surely get your point.
Also always post your Oracle version (4 decimals).

Regards
Michel
Re: Delete duplicate rows [message #400942 is a reply to message #400923] Thu, 30 April 2009 08:45 Go to previous messageGo to next message
Vackar
Messages: 81
Registered: October 2007
Member
Hi,

Here's how the problem arose from the java layer that was doing the crud:


String geneName = getTheGeneName();

GeneEntity gene = geneDao.getOrCreate(geneName);
GeneExpressionResult result = new GeneExpressionResult(gene, ...other params)
resultDao.save(result);



The getOrCeate() method had a bug which never found the record in the database, so it would create a new value in the lookup table each time it was called.

So instead of have a situation like this



--------------------
GENE
--------------------
id         term
1          gene1
2          gene2
3          gene3
--------------------


----------------------------------------------
RESULTS
----------------------------------------------
ID        gene_id        other_columns
1           1                ...
2           1                ...
3           1                ...
4           1                ...
5           2                ...
6           3                ...
7           1                ...





I have this:


--------------------
GENE
--------------------
id         term
1          gene1
2          gene1
3          gene1
4          gene1
5          gene2
6          gene3
7          gene1
--------------------


----------------------------------------------
RESULTS
----------------------------------------------
ID        gene_id        other_columns
1           1                ...
2           2                ...
3           3                ...
4           4                ...
5           5                ...
6           6                ...
7           7                ...




So what id like to do is

update result set gene_id = maxValOfMatchingIds(gene_id);

delete from gene where id not in 
(
  select id from
  (
     select max(id), term from gene group by gene
  )
);


So I need help with the 'maxValOfMatchingIds' part.

Thanks in advance,
Vackar
Re: Delete duplicate rows [message #400943 is a reply to message #400942] Thu, 30 April 2009 08:53 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Quote:
select max(id), term from gene group by gene

I bet this returns an error.

http://www.lmgtfy.com/?q=oracle+delete+duplicates

Regards
Michel
Re: Delete duplicate rows [message #400944 is a reply to message #400923] Thu, 30 April 2009 09:15 Go to previous messageGo to next message
Vackar
Messages: 81
Registered: October 2007
Member
Sorry, typed that in a hurry, should have been

group by term
Re: Delete duplicate rows [message #400945 is a reply to message #400923] Thu, 30 April 2009 09:20 Go to previous messageGo to next message
BlackSwan
Messages: 25050
Registered: January 2009
Location: SoCal
Senior Member
>Sorry, typed that in a hurry, should have been
Is CUT & PASTE broken for you?
Re: Delete duplicate rows [message #400953 is a reply to message #400945] Thu, 30 April 2009 10:00 Go to previous messageGo to next message
Vackar
Messages: 81
Registered: October 2007
Member
Cut and paste what? I haven't written the sql for that part yet, I just wanted to give an example of what i'm trying to do.

Learn some manners man.

Anyway, here's the function that I used, may be useful for someone in the future:


CREATE OR REPLACE FUNCTION getMaxGeneIdsMatching(origGeneId IN NUMBER) RETURN NUMBER 

AS
  ret NUMBER;
  geneId Number := origGeneId;
BEGIN
 select max(matchedIds) into ret from
    (
        SELECT 
            T2.ID matchedIds
        FROM 
            GENE  T1, GENE  T2
        WHERE 
            T1.ID != T2.ID
            AND T1.TERM = T2.TERM
            AND T1.ID = geneId
        order by 
            T2.ID DESC
    );
return ret;
END;
Re: Delete duplicate rows [message #400954 is a reply to message #400953] Thu, 30 April 2009 10:12 Go to previous messageGo to next message
Michel Cadot
Messages: 64152
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Subquery is useless, you can directly select the max value.
Order by is useless as you want the max value and this does not depend on the order of the rows.

Regards
Michel
Re: Delete duplicate rows [message #401026 is a reply to message #400942] Fri, 01 May 2009 03:03 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
Sorry about the tone of one of the contributors - the internet attracts all sorts.

For future reference, you get a much better hit rate with questions if you post a test case - ie a set of DML & DDL statements that we can run into a test schema to reproduce the problem. That way, we can get straight down to SQ withot having to translate your requirements.

Something like this:
create table GENE (id number, term varchar2(10));

create table results (id  number, gene_id  number);

insert into gene values (1,          'gene1');
insert into gene values (2,          'gene1');
insert into gene values (3,          'gene1');
insert into gene values (4,          'gene1');
insert into gene values (5,          'gene2');
insert into gene values (6,          'gene3');
insert into gene values (7,          'gene1');

insert into results values (1,           1);
insert into results values (2,           2);
insert into results values (3,           3);
insert into results values (4,           4);
insert into results values (5,           5);
insert into results values (6,           6);
insert into results values (7,           7);


As far as your problem goes, I'd fix it like this:
update results r
set    gene_id = (select max(g1.id)
                  from   gene g1
                        ,gene g2
                  where  g1.term = g2.term
                  and    g2.id = r.gene_id);
                  
delete gene
where  (id,term) not in (select max(id),term
                         from   gene
                         group by term);

The trick it to have two copies of the Gene table to let you get the maximum value if the Id for a given Term.
Re: Delete duplicate rows [message #401029 is a reply to message #400923] Fri, 01 May 2009 03:13 Go to previous message
Vackar
Messages: 81
Registered: October 2007
Member
Thank you for the tip, it is much appreciated.
Previous Topic: how to write a procedure for this condition
Next Topic: Deleteing records
Goto Forum:
  


Current Time: Sat Dec 10 13:01:28 CST 2016

Total time taken to generate the page: 0.09453 seconds