general design issue

From: skink <pskink_at_gmail.com>
Date: Wed, 22 Apr 2009 13:59:38 -0700 (PDT)
Message-ID: <be83370d-1df5-4697-b18c-8ae5b67eac2f_at_k38g2000yqh.googlegroups.com>



hi,

i don't know much about SQL - just basics: tables, indexes, simple selects and
i'm designing simple database storing key/value data (in fact its read
only dictionary)

the schema is very straightforward:

CREATE TABLE words (key TEXT, entry TEXT) CREATE INDEX wordsIdx ON words (key)

the database engine is quite slow (sqlite on embedded device) so speed
is by far the **most**
important since i need my data in short time (40-60 ms).

the problem i have is as follows: for any key prefix (even not existing in the database)
i'd like to know if there are keys starting with given prefix + 'a',
'b', 'c' etc

in other words: for prefix lets say 'inte' i'd like to know if there are keys starting with
'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
letter not number of keys matching)

i cannot just run 26 queries for:

SELECT key FROM words WHERE key >= 'inteX' LIMIT 1

where X is letter [a..z] since each query takes ~15 ms

of course i don't need to iterate over all 26 letters: for example when looking for 'intea'
my query might return 'integral' (means that there is no "intea", "inteb", "intec", "inted", "intee"
and "intef" prefixes since the first key found has prefix "integ") so i could start with
'inteh' in next iteratiion. but still in the worst scenario i need 26
queries...

my solution is to create second table storing prefix/32-bit-integer- map where
each bit indicates one letter (for latin alphabet only 26 bits are used)

CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER) CREATE INDEX prefixesIdx ON prefixes (prefix)

and use it as follows:

SELECT bitmap FROM prefixes WHERE prefix = 'inte'

this is very fast since i need only one database access to find out the answer for my question.

however for longer prefixes it's quite expensive, e.g. for prefix
'internationalizatio' i have to store 19
prefixes just to find out there is one or two words starting with that
prefix.

so my question is how to make it better keeping in mind that speed is **key** factor but
database size is also important.

any ideas?

thanks
pskink Received on Wed Apr 22 2009 - 22:59:38 CEST

Original text of this message