Re: general design issue

From: skink <pskink_at_gmail.com>
Date: Thu, 23 Apr 2009 01:20:42 -0700 (PDT)
Message-ID: <0bfc054e-1dba-44b3-a945-39759e28a9e9_at_c9g2000yqm.googlegroups.com>


On 23 Kwi, 01:19, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
> skink wrote:
> > hi,
>
> > i don't know much about SQL - just basics: tables, indexes, simple
> > selects and
> > i'm designing simple database storing key/value data (in fact its
> > read
> > only dictionary)
>
> > the schema is very straightforward:
>
> > CREATE TABLE words (key TEXT, entry TEXT)
> > CREATE INDEX wordsIdx ON words (key)
>
> > the database engine is quite slow (sqlite on embedded device) so
> > speed
> > is by far the **most**
> > important since i need my data in short time (40-60 ms).
>
> > the problem i have is as follows: for any key prefix (even not
> > existing in the database)
> > i'd like to know if there are keys starting with given prefix + 'a',
> > 'b', 'c' etc
>
> > in other words: for prefix lets say 'inte' i'd like to know if there
> > are keys starting with
> > 'intea', 'inteb', 'intec', 'inted' etc (just need true/false for each
> > letter not number of keys matching)
>
> > i cannot just run 26 queries for:
>
> > SELECT key FROM words WHERE key >= 'inteX' LIMIT 1
>
> > where X is letter [a..z] since each query takes ~15 ms
>
> > of course i don't need to iterate over all 26 letters: for example
> > when looking for 'intea'
> > my query might return 'integral' (means that there is no "intea",
> > "inteb", "intec", "inted", "intee"
> > and "intef" prefixes since the first key found has prefix "integ") so
> > i could start with
> > 'inteh' in next iteratiion. but still in the worst scenario i need 26
> > queries...
>
> > my solution is to create second table storing prefix/32-bit-integer-
> > map where
> > each bit indicates one letter (for latin alphabet only 26 bits are
> > used)
>
> > CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER)
> > CREATE INDEX prefixesIdx ON prefixes (prefix)
>
> > and use it as follows:
>
> > SELECT bitmap FROM prefixes WHERE prefix = 'inte'
>
> > this is very fast since i need only one database access to find out
> > the answer for my question.
>
> > however for longer prefixes it's quite expensive, e.g. for prefix
> > 'internationalizatio' i have to store 19
> > prefixes just to find out there is one or two words starting with
> > that
> > prefix.
>
> > so my question is how to make it better keeping in mind that speed is
> > **key** factor but
> > database size is also important.
>
> > any ideas?
>
> > thanks
> > pskink
>
> I know better than to offer design advice online, but the first question
> that pops into my mind is:
>
> Does sqlite have a substring operator? substr(key,1,length(prefix)+1)
>
> Quickly followed by:
>
> Does sqlite support group by or distinct?

both answers are yes.

your questions gave me some idea (probably the same as yours):

select distinct substr(key, 5, 1) from words where key like 'inte%'

but it takes ~1000 ms which is not good news... Received on Thu Apr 23 2009 - 10:20:42 CEST

Original text of this message