Re: Query to find first missing integer in a field

From: Shakespeare <>
Date: Wed, 08 Apr 2009 23:45:40 +0200
Message-ID: <49dd1b0a$0$192$>

DeanB schreef:
> On Apr 8, 8:43 am, Mark D Powell <> wrote:

>> On Apr 7, 4:38 pm, DeanB <> wrote:
>>> On Apr 7, 4:08 pm, ddf <> wrote:
>>>> On Apr 7, 2:51 pm, dean <> wrote:
>>>>> Does anyone know a good way to find the first free integer number that
>>>>> does not exist in a table in a specified column? I can think of a
>>>>> couple of tedious ways of doing this but they are hardly elegant.
>>>>> Assume there could be any integer present up to 2 billion.
>>>> This has 'DON'T DO THIS' written all over it in large, bold, red
>>>> letters as it's a disaster in the making. I suppose you're looking to
>>>> 'backfill' missing sequence numbers and that is a recipe for failure.
>>>> Seriously rethink this before you're in too deep and end up with an
>>>> application that won't run because it won't scale.
>>>> David Fitzjarrell
>>> Correct - the problem is that our consultants tend to poke around in
>>> the database and analyse data, and its far easier to do this if the ID
>>> numbers are 1234 rather than 158943758.
>>> Also, sequences are not reproducable, so debugging a series of SQL
>>> statements is sometimes a difficult process.
>> Why is 1234 easier than 158943758 ? Once you get into having millions
>> of rows what difference does it make? Most of the analysis should be
>> done via code which finds and uses the keys in question so no human
>> interventions should be necessary. Where the work is done manually
>> copy and paste comes to mind and well as a little pl/sql to perform
>> the necessary queries based off a short list of input values.
>> Here is one way to find the missing keys
>> --
>> -- select complete_range_of_interest (query to generate 1..N - many
>> examples on web)
>> -- from source_with_enough_rows
>> -- where range >= start_of_interest and range <=
>> max_value_of_interest (filter for min and max values in table)
>> -- minus
>> -- select values from table_of_interest
>> --
>> Be advised if you plan to update existing key to lower skipped values
>> that you have two issues that you may have to deal with: 1- if there
>> are FK relationships to the value you have to update the child
>> tables. If the FK relationship is not defined in the database but
>> only in the application you may easily miss a necessary update, and 2
>> - if you have any purge and archive processes then by re-using a key
>> you make restoration of archive data either impossible or create false
>> relationship between rows of data by restoring the reused key.
>> HTH -- Mark D Powell --- Hide quoted text -
>> - Show quoted text -
> Thanks. We would only re-use an empty 'slot' when inserting new data,
> rather than trying to correct any existing records. It may be that we
> only have 10K records, but with IDs up in the millions, like it or
> not, some consultants complain about such things. Doing all child-
> table inserts in a transaction will ensure referential integrity.
> No purge or archive process exists for these tables.

If you have 10k records with id's up to millions, you can genarate any random number and check if it exists until you find one that does not. In less than 1% of the cases you'll have to try more than once.... and in less than 0.01 % of the cases more than twice...... Store the last tried number in a separate table and start with the next value (+1) next time....

Or even better: create a new sequence, and insert using this sequence. If (or while) it fails, take the next value and try again. Still, this would fail in only 1% of the cases, having to draw a sequence twice. This will fill up your gaps....

When numbers get too high again, reset yor sequence to 0 and start again.....

(What's in a guess?) Received on Wed Apr 08 2009 - 16:45:40 CDT

Original text of this message