Re: general design issue

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Thu, 23 Apr 2009 00:19:59 -0300
Message-ID: <49efde61$0$5490$9a566e8b_at_news.aliant.net>


skink wrote:

> hi,
>
> i don't know much about SQL - just basics: tables, indexes, simple
> selects and
> i'm designing simple database storing key/value data (in fact its
> read
> only dictionary)
>
> the schema is very straightforward:
>
> CREATE TABLE words (key TEXT, entry TEXT)
> CREATE INDEX wordsIdx ON words (key)
>
> the database engine is quite slow (sqlite on embedded device) so
> speed
> is by far the **most**
> important since i need my data in short time (40-60 ms).
>
> the problem i have is as follows: for any key prefix (even not
> existing in the database)
> i'd like to know if there are keys starting with given prefix + 'a',
> 'b', 'c' etc
>
> in other words: for prefix lets say 'inte' i'd like to know if there
> are keys starting with
> 'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
> letter not number of keys matching)
>
> i cannot just run 26 queries for:
>
> SELECT key FROM words WHERE key >= 'inteX' LIMIT 1
>
> where X is letter [a..z] since each query takes ~15 ms
>
> of course i don't need to iterate over all 26 letters: for example
> when looking for 'intea'
> my query might return 'integral' (means that there is no "intea",
> "inteb", "intec", "inted", "intee"
> and "intef" prefixes since the first key found has prefix "integ") so
> i could start with
> 'inteh' in next iteratiion. but still in the worst scenario i need 26
> queries...
>
> my solution is to create second table storing prefix/32-bit-integer-
> map where
> each bit indicates one letter (for latin alphabet only 26 bits are
> used)
>
> CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER)
> CREATE INDEX prefixesIdx ON prefixes (prefix)
>
> and use it as follows:
>
> SELECT bitmap FROM prefixes WHERE prefix = 'inte'
>
> this is very fast since i need only one database access to find out
> the answer for my question.
>
> however for longer prefixes it's quite expensive, e.g. for prefix
> 'internationalizatio' i have to store 19
> prefixes just to find out there is one or two words starting with
> that
> prefix.
>
> so my question is how to make it better keeping in mind that speed is
> **key** factor but
> database size is also important.
>
> any ideas?
>
> thanks
> pskink

I know better than to offer design advice online, but the first question that pops into my mind is:

Does sqlite have a substring operator? substr(key,1,length(prefix)+1)

Quickly followed by:

Does sqlite support group by or distinct? Received on Thu Apr 23 2009 - 05:19:59 CEST

Original text of this message