Re: data structure for strings

From: Brian Inglis <Brian.dot.Inglis_at_SystematicSw.ab.ca>
Date: 2000/08/12
Message-ID: <qo5cps4mlmbcp07s2gj20m0hdeeiqbjipd_at_4ax.com>


On Sat, 12 Aug 2000 06:41:21 +0200, Simon Richter <geier_at_phobos.fs.tum.de> wrote:

>On Sun, 23 Jul 2000, Marc Tardif wrote:
>
>> If I have 130 words of up to 6 characters, in what kind of data structure
>> can I store these words for quick string comparison? In other words, if I
>> have an unknown word, I need to determine if it is like any of the 130
>> words. This data structure does not need to be expandable, the list will
>> always be the same 130 words.
>
>What about a binary search? You need 7.022 steps per search, which means
>comparing a maximum of 14 (8 steps + 6 chars/word) bytes.
>
> Simon

I make it a minimum of 6 byte comparisons and a maximum of 41 (6 + 5 x 7), depending on the input word, its length, and the lengths and distribution of the 130 words.
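For reference, the binary search being discussed might be sketched like this (Python; the short word list is a stand-in for the real 130 words, which aren't given in the thread):

```python
# Minimal sketch: binary search over a fixed, sorted word list.
# WORDS is a placeholder for the actual 130-word table.
WORDS = sorted(["alpha", "bravo", "delta", "echo", "foxtrot", "golf"])

def contains(word, table=WORDS):
    """Return True if word is an exact member of the sorted table."""
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if table[mid] == word:
            return True
        elif table[mid] < word:
            lo = mid + 1
        else:
            hi = mid - 1
    return False
```

Each iteration does one full string comparison, which is where the per-byte comparison counts above come from.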

I would suggest you use a hash table with a hash function chosen depending on what you mean by the comparison operator *like*.

The hash function and table size can be set up to give you an average of one function calculation and one word comparison per lookup.
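As a sketch of that setup (Python, with a toy word list and a simple multiplicative hash standing in for whatever function you'd actually choose), a chained table sized a couple of times larger than the word count keeps almost every bucket at zero or one entry:

```python
# Sketch: static chained hash table for a small, fixed word set.
# With TABLE_SIZE well above the word count, most buckets hold at
# most one word, so an average lookup is roughly one hash
# computation plus one string comparison.
WORDS = ["alpha", "bravo", "delta", "echo", "foxtrot", "golf"]  # stand-in for the 130 words
TABLE_SIZE = 257  # a prime, roughly twice the real word count

def h(word):
    """Simple multiplicative string hash (illustrative choice)."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % TABLE_SIZE
    return value

buckets = [[] for _ in range(TABLE_SIZE)]
for w in WORDS:
    buckets[h(w)].append(w)

def lookup(word):
    """Exact-match lookup against the fixed table."""
    return word in buckets[h(word)]
```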

If you mean *like* rather than /identical/, your hash function should map similar words to similar outputs. You could apply Soundex or a similar phonetic encoding before hashing, or generate a bit string whose bits indicate which letters are present and hash that. Alternatively, forget hashing and compute something like a bulls and cows score for each word in the table relative to the input word; the highest bulls and cows scores give the nearest matches to the input word.
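The bulls and cows idea could be scored along these lines (a Python sketch; the exact scoring and the bulls-before-cows tie-break are illustrative assumptions, not part of the original suggestion):

```python
from collections import Counter

def bulls_and_cows(guess, target):
    """Bulls: same letter in the same position.
    Cows: letter present in both words but in a different position."""
    bulls = sum(g == t for g, t in zip(guess, target))
    # Letters common to both words regardless of position,
    # counted with multiplicity; subtract bulls to leave the cows.
    common = sum((Counter(guess) & Counter(target)).values())
    return bulls, common - bulls

def nearest(word, words):
    """Rank the fixed word list by similarity to word; return the best."""
    return max(words, key=lambda w: bulls_and_cows(word, w))
```

Scoring every table entry makes each lookup a full scan of the 130 words, but for a list this small that is still cheap, and unlike hashing it degrades gracefully for misspelled input.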

Thanks. Take care, Brian Inglis Calgary, Alberta, Canada

-- 
Brian_Inglis_at_CSi.com 	(Brian dot Inglis at SystematicSw dot ab dot ca)
				use address above to reply
Received on Sat Aug 12 2000 - 00:00:00 CEST
