Re: Good way to implement string interning?
Date: Tue, 28 Feb 2006 18:58:14 GMT
Message-ID: <m2slq3ux8a.fsf_at_gigamonkeys.com>
Jan Hidders <jan.hidders_at_REMOVETHIS.pandora.be> writes:
> Peter Seibel wrote:
>> Suppose I have a file containing several million strings (say URIs)
>> and I need to assign each unique string a numeric ID and store the
>> string-to-ID and ID-to-string mapping as (time) efficiently as
>> possible. Is there some clever technique databases use for this kind
>> of task. What about if we know that many of the strings are going to
>> have common prefixes (e.g. "http://www.gigamonkeys.com/book/" and
>> "http://www.gigamonkeys.com/lispbox/")
>
> An often used solution for the string -> id direction is a trie. For
> a short introduction see Wikipedia:
>
> http://en.wikipedia.org/wiki/Trie
> I've also heard of a related hashing technique called "trie hashing"
> introduced by ... drumroll ... Witold Lidwin. See:
>
> http://portal.acm.org/citation.cfm?id=582322
>
> But I'm not really familiar with that myself. If you don't have access
> to the ACM portal, let me know and I'll see what I can do.
Yeah, I bought that paper too. I skimmed it but it didn't seem to be going anywhere useful for my problem. Maybe I'll give it another look.
-Peter
-- Peter Seibel * peter_at_gigamonkeys.com Gigamonkeys Consulting * http://www.gigamonkeys.com/ Practical Common Lisp * http://www.gigamonkeys.com/book/Received on Tue Feb 28 2006 - 19:58:14 CET