Re: Duplicate Rows
From: Serge Rielau <srielau_at_ca.ibm.com>
Date: Sat, 26 Jul 2008 09:46:05 -0400
Message-ID: <6f0o51F99lkdU1@mid.individual.net>
>> Hi,
>>
>> I figured I'd ask this too. I know how to remove duplicate rows from
>> a table, but this example used an analytical function. I'm new to
>> these and do not get a huge chance to use them, so I'm always
>> forgetting how they work.
>>
>> Please see this:
>>
>> select * from customer
>> where rowid in
>> (select rowid from
>> (select
>> rowid,
>> row_number()
>> over
>> (partition by customer_id order by customer_id ) dup
>> from customer)
>> where dup > 1);
>>
>> Ok, that did show me the duplicates, but what if a duplicate was made
>> up of more than 1 column? Can I still use an analytical function?
>>
>> This is where I might learn about these things......
>>
>> Thank you.
The ORDER BY does nothing here.
You use the ORDER BY if there is a specific row that you want to keep. (Rows are the same except that some are better... ) Let's say the "newest": ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY updatestamp DESC) to give you the last version of a row.
Date: Sat, 26 Jul 2008 09:46:05 -0400
Message-ID: <6f0o51F99lkdU1@mid.individual.net>
David Portas wrote:
> <artmerar_at_yahoo.com> wrote in message > news:67fb7670-5142-440f-9cc3-33c9ee914ebc_at_c65g2000hsa.googlegroups.com...
>> Hi,
>>
>> I figured I'd ask this too. I know how to remove duplicate rows from
>> a table, but this example used an analytical function. I'm new to
>> these and do not get a huge chance to use them, so I'm always
>> forgetting how they work.
>>
>> Please see this:
>>
>> select * from customer
>> where rowid in
>> (select rowid from
>> (select
>> rowid,
>> row_number()
>> over
>> (partition by customer_id order by customer_id ) dup
>> from customer)
>> where dup > 1);
>>
>> Ok, that did show me the duplicates, but what if a duplicate was made
>> up of more than 1 column? Can I still use an analytical function?
>>
>> This is where I might learn about these things......
>>
>> Thank you.
> > The OVER clause accepts a list of columns: > > SELECT * FROM > (SELECT customer_id, col2, col3, > ROW_NUMBER() > OVER > (PARTITION BY customer_id, col2, col3 > ORDER BY customer_id, col2, col3 ) dup > FROM customer) t > WHERE dup > 1; >
The ORDER BY does nothing here.
You use the ORDER BY if there is a specific row that you want to keep. (Rows are the same except that some are better... ) Let's say the "newest": ROW_NUMBER() OVER (PARTITION BY itemid ORDER BY updatestamp DESC) to give you the last version of a row.
Cheers
Serge
-- Serge Rielau DB2 Solutions Development IBM Toronto LabReceived on Sat Jul 26 2008 - 08:46:05 CDT