Re: Good way to implement string interning?
Date: Tue, 28 Feb 2006 18:52:21 GMT
Message-ID: <FZ0Nf.285433$sQ.8099984_at_phobos.telenet-ops.be>
Peter Seibel wrote:
> Suppose I have a file containing several million strings (say URIs)
> and I need to assign each unique string a numeric ID and store the
> string-to-ID and ID-to-string mapping as (time) efficiently as
> possible. Is there some clever technique databases use for this kind
> of task. What about if we know that many of the strings are going to
> have common prefixes (e.g. "http://www.gigamonkeys.com/book/" and
> "http://www.gigamonkeys.com/lispbox/")
An often used solution for the string -> id direction is a trie. For a short introduction see Wikipedia:
http://en.wikipedia.org/wiki/Trie
I've also heard of a related hashing technique called "trie hashing"
introduced by ... drumroll ... Witold Lidwin. See:
http://portal.acm.org/citation.cfm?id=582322
But I'm not really familiar with that myself. If you don't have access
to the ACM portal, let me know and I'll see what I can do.
- Jan Hidders