Re: Good way to implement string interning?

From: Jan Hidders <jan.hidders_at_REMOVETHIS.pandora.be>
Date: Tue, 28 Feb 2006 18:52:21 GMT
Message-ID: <FZ0Nf.285433$sQ.8099984_at_phobos.telenet-ops.be>


Peter Seibel wrote:
> Suppose I have a file containing several million strings (say URIs)
> and I need to assign each unique string a numeric ID and store the
> string-to-ID and ID-to-string mapping as (time) efficiently as
> possible. Is there some clever technique databases use for this kind
> of task. What about if we know that many of the strings are going to
> have common prefixes (e.g. "http://www.gigamonkeys.com/book/" and
> "http://www.gigamonkeys.com/lispbox/")

An often used solution for the string -> id direction is a trie. For a short introduction see Wikipedia:

http://en.wikipedia.org/wiki/Trie

I've also heard of a related hashing technique called "trie hashing" introduced by ... drumroll ... Witold Lidwin. See:

http://portal.acm.org/citation.cfm?id=582322

But I'm not really familiar with that myself. If you don't have access to the ACM portal, let me know and I'll see what I can do.

  • Jan Hidders
Received on Tue Feb 28 2006 - 19:52:21 CET

Original text of this message