Re: sql question

From: Jonathan Leffler <johnl_at_informix.com>
Date: 1997/12/08
Message-ID: <66hkm2$42m_at_cssun.mathcs.emory.edu>#1/1


On Mon, 8 Dec 1997 david.ashby_at_workcover.nsw.gov.au wrote:
}> Subject: sql question
}> Author: ministry of education <moe003_at_netvision.net.il> at WCA-INET
}> Date: 07/12/97 6:10 PM
}>
}> I have a table with multiple exact duplicate rows.No indexes. How do I
}> delete all duplicate rows to get a single occurrence for each row?
 

}      Try this:
}      
}      - select distinct col1, col2 .... , count(*)
}         from table
}         group by col1, col2 ....
}         having count(*) > 1
}         into temp temp_dulicates;
}      
}      - delete from table
}         where col1||col2 .... in
}         (select col1||col2 .... 
}          from temp_duplicates);
}      
}      - insert into table
}         select col1, col2 .... from temp_duplicates;
}      
}      Warning TEST this before running!

That's a complicated way of writing:

BEGIN WORK;
SELECT DISTINCT * FROM Table INTO TEMP Temp1; DELETE FROM Table WHERE 1 = 1;
INSERT INTO Table SELECT * FROM Temp1;
COMMIT WORK;
This is a reasonable solution for small to medium tables when you have enough disk space available for the whole temporary table.

If you don't, then you have to get fancy with ROWID, assuming that you are not dealing with a fragmented table, of course! I attach an old answer to this which works with ROWIDs.

Is this in the FAQ? If not, I think it should be added. Unless someone else has a tidier solution...

Yours,
Jonathan Leffler (johnl_at_informix.com) #include <witticism.h>


Date: Mon, 2 May 94 00:00:00 PST
From: johnl (Jonathan Leffler)
Subject: Re: Unique table entries

I wrote:

}Date: Fri Apr 29 08:38:56 1994
}From: johnl (Jonathan Leffler)
}To: brianr_at_magnus1.com, informix-list_at_rmy.emory.edu
}
}>From: brianr_at_magnus1.com (Bulletin board login)
}>Date: Wed, 27 Apr 1994 14:01:44 GMT
}>X-Informix-List-Id: <news.6534>
}>
}>I was wondering if anyone could help with an ACE and SQL problem. I need to
}>query a equipment table for duplicate fields and fields with partial duplicate
}>data and can't seem to find the right commands.
 
}>I would like to query the equipment table for duplicate serial numbers
}>and then check to see if the model number is > 6 characters and delete
}>the duplicates.
}
}Let's assume the table looks somewhat like:
}
}CREATE TABLE Equipment
}(
}    SerialNumber    CHAR(15) NOT NULL,
}    ModelNumber     CHAR(15) NOT NULL
});
}
}I assume that you want to delete just one of the duplicates, not all the
}duplicated entries -- you need to immensely precise when specifying what
}you want to happen.
}
}SELECT SerialNumber
}    FROM Equipment
}    WHERE LENGTH(ModelNumber) > 6
}    GROUP BY SerialNumber
}    HAVING COUNT(*) > 1;
}
}This lists the SerialNumbers with duplicate entries.  I'm not convinced you
}want the WHERE filter -- and LENGTH may not be available in 2.10.03.
}
}Then to delete all but one of the duplicate entries, you have to be very
}careful, and you test the DELETE by replacing DELETE with 'SELECT *' so
}that you see what you are going to delete first. The code which follows is }untested.

The SQL that followed had not been tested, and there were a various errors in it. I have now got properly tested working code. About the only good thing about Friday's version was that I did say the code was untested...

INSERT INTO Equipment VALUES("AAA11122333", "Wasp Mk II R83");
INSERT INTO Equipment VALUES("AAA11122333", "Wasp Mk II R85");
INSERT INTO Equipment VALUES("AAA11122333", "Wasp Mk II R89");
INSERT INTO Equipment VALUES("AAA11122999", "Wasp Mk II R99");
INSERT INTO Equipment VALUES("AAA11122999", "Wasp Mk II R97");
INSERT INTO Equipment VALUES("AAA11122999", "Wasp Mk II R98");
INSERT INTO Equipment VALUES("AAA11122999", "Wasp Mk II R79");

LOCK TABLE Equipment IN EXCLUSIVE MODE;

SELECT SerialNumber

    FROM Equipment
    GROUP BY SerialNumber
    HAVING COUNT(*) > 1
    INTO TEMP x;

SELECT SerialNumber, ROWID xrowid

     FROM Equipment
     WHERE SerialNumber IN (SELECT * FROM x)
     INTO TEMP y;

DELETE FROM Equipment

    WHERE SerialNumber IN (SELECT SerialNumber FROM y)

      AND ROWID != (SELECT MAX(xrowid)
                        FROM y
                        WHERE y.SerialNumber = Equipment.SerialNumber
                   );

SELECT * FROM Equipment;

UNLOCK TABLE Equipment;

}If you have a transaction log, then you'll do the LOCK inside a
}transaction, and you'll use COMMIT WORK to release the lock instead of
}UNLOCK.
}
}>Then, with the fields that are > 6 chars, I need to clip the field using an
}>UPDATE command if possible.
}
}I take it that "field" is model number?  Please be precise when asking
}questions.
}
}UPDATE Equipment
}    SET ModelNumber = ModelNumber[1,6];
}
}For models where the number is 6 or fewer characters long, this does
}nothing, but those longer than 6 are truncated.

Yours,
Jonathan Leffler (johnl_at_informix.com) #include <disclaimer.h> Received on Mon Dec 08 1997 - 00:00:00 CET

Original text of this message