Re: Query to find first missing integer in a field

From: Shakespeare <whatsin_at_xs4all.nl>
Date: Wed, 08 Apr 2009 23:45:40 +0200
Message-ID: <49dd1b0a$0$192$e4fe514c_at_news.xs4all.nl>



DeanB schreef:
> On Apr 8, 8:43 am, Mark D Powell <Mark.Pow..._at_eds.com> wrote:

>> On Apr 7, 4:38 pm, DeanB <deanbrow..._at_yahoo.com> wrote:
>>
>>
>>
>>
>>
>>> On Apr 7, 4:08 pm, ddf <orat..._at_msn.com> wrote:
>>>> On Apr 7, 2:51 pm, dean <deanbrow..._at_yahoo.com> wrote:
>>>>> Does anyone know a good way to find the first free integer number that
>>>>> does not exist in a table in a specified column? I can think of a
>>>>> couple of tedious ways of doing this but they are hardly elegant.
>>>>> Assume there could be any integer present up to 2 billion.
>>>> This has 'DON'T DO THIS' written all over it in large, bold, red
>>>> letters as it's a disaster in the making. I suppose you're looking to
>>>> 'backfill' missing sequence numbers and that is a recipe for failure.
>>>> Seriously rethink this before you're in too deep and end up with an
>>>> application that won't run because it won't scale.
>>>> David Fitzjarrell
>>> Correct - the problem is that our consultants tend to poke around in
>>> the database and analyse data, and its far easier to do this if the ID
>>> numbers are 1234 rather than 158943758.
>>> Also, sequences are not reproducable, so debugging a series of SQL
>>> statements is sometimes a difficult process.
>> Why is 1234 easier than 158943758 ? Once you get into having millions
>> of rows what difference does it make? Most of the analysis should be
>> done via code which finds and uses the keys in question so no human
>> interventions should be necessary. Where the work is done manually
>> copy and paste comes to mind and well as a little pl/sql to perform
>> the necessary queries based off a short list of input values.
>>
>> Here is one way to find the missing keys
>> --
>> -- select complete_range_of_interest (query to generate 1..N - many
>> examples on web)
>> -- from source_with_enough_rows
>> -- where range >= start_of_interest and range <=
>> max_value_of_interest (filter for min and max values in table)
>> -- minus
>> -- select values from table_of_interest
>> --
>>
>> Be advised if you plan to update existing key to lower skipped values
>> that you have two issues that you may have to deal with: 1- if there
>> are FK relationships to the value you have to update the child
>> tables. If the FK relationship is not defined in the database but
>> only in the application you may easily miss a necessary update, and 2
>> - if you have any purge and archive processes then by re-using a key
>> you make restoration of archive data either impossible or create false
>> relationship between rows of data by restoring the reused key.
>>
>> HTH -- Mark D Powell --- Hide quoted text -
>>
>> - Show quoted text -
> 
> Thanks. We would only re-use an empty 'slot' when inserting new data,
> rather than trying to correct any existing records. It may be that we
> only have 10K records, but with IDs up in the millions, like it or
> not, some consultants complain about such things. Doing all child-
> table inserts in a transaction will ensure referential integrity.
> 
> No purge or archive process exists for these tables.
> 

If you have 10k records with id's up to millions, you can genarate any random number and check if it exists until you find one that does not. In less than 1% of the cases you'll have to try more than once.... and in less than 0.01 % of the cases more than twice...... Store the last tried number in a separate table and start with the next value (+1) next time....

Or even better: create a new sequence, and insert using this sequence. If (or while) it fails, take the next value and try again. Still, this would fail in only 1% of the cases, having to draw a sequence twice. This will fill up your gaps....

When numbers get too high again, reset yor sequence to 0 and start again.....

Shakespeare
(What's in a guess?) Received on Wed Apr 08 2009 - 16:45:40 CDT

Original text of this message