Re: general design issue

From: skink <pskink_at_gmail.com>
Date: Fri, 24 Apr 2009 10:42:24 -0700 (PDT)
Message-ID: <6719178b-ac20-40aa-a962-2f00f31e9f19_at_c36g2000yqn.googlegroups.com>


On 24 Kwi, 18:48, paul c <toledobythe..._at_oohay.ac> wrote:
> skink wrote:
>
> ...
>
> > my only suspicion is that OS i'm using (google android) puts
> > additional layer (i use java iface) in order to access sqlite.
> > maybe its implementation is somehow done not right way...
>
> > thanks again
>
> Then the only suggestion I can think of is to use 'explain' and 'explain
> query plan' on some other table.  Keep trying different queries until
> you get one that indicates something like "TABLE words WITH INDEX
> sqlite_autoindex_words_1" to show that an index was used.  Then
> carefully compare that table's definition with the one that is giving
> you trouble, try to eliminate any differences even if they seem
> cosmetic, as you eliminate each difference, run explain query plan
> again.  My guess is that you will locate one or more differences that
> are crucial.
>
> In other words, confirm that your platform will use an index sometimes.
>   If it never does, it's a packaging problem.  If it sometimes does,
> then you know whatever version of sqlite you've got should be capable of
> optimizing the query.

paul,

to be honest "explain" and "explain query plan" is another thing i was not aware of.;)

i used "explain query plan" and in fact GLOB SHOULD use my index, but it doesn't

sqlite> explain query plan select distinct substr(key, 5, 1) from words where key LIKE "inte%";
0|0|TABLE words
sqlite> explain query plan select distinct substr(key, 5, 1) from words where key GLOB "inte*";
0|0|TABLE words WITH INDEX wordsIdx

since i would need learn much about SQL to get some decent expertise i tried some experiments with different aproach and it seems that I found reasonable solution in terms of speed and storage.

when populating database i create in memory hash map with all prefixes - its ~105000 unique prefixes.

each value in that hash map contains number of words that match given prefix.

then i insert every prefix that has more that ten (10) words (its only 1615 such prefixes - i thought it will be much much more) to prefixes table.

my database is read only so i need to do it only once when creating database.

CREATE TABLE prefixes (prefix TEXT, bitmap INTEGER); CREATE TABLE words (key TEXT, value TEXT);

CREATE INDEX prefixesIdx ON prefixes (prefix); CREATE INDEX wordsIdx ON words (key);

this is how prefixes are distributed:

sqlite> select length(prefix), count(prefix) from prefixes group by length(prefix);

0|1
1|25
2|185
3|665
4|488
5|180
6|49
7|17
8|5

sqlite> select count(*) from prefixes;
1615

in my application for given prefix first i try to find prefix in prefixes table:

"select * from prefixes where prefix = 'inte';"

it takes ~15 ms

the bitmap is answer to my question (bit 0 indicates if there are words starting with "intea", bit 1 "inteb" etc)

if NOT found in prevoius select it means that i can do the following:

"select * from words where key >= 'inte' limit 10;"

it takes ~25ms

i could also use like/glob:

"select * from words where key like 'inte%';"

and i'm sure the length of result set will be < 10

iterating over that result set gives the anser to my question

what do you think of it?

thank you guys for your help, as i said in my original post i'm SQL novice and reading your replies i learnt something about LIKE/GLOB/susbtr and explain/explain query plan which i had no idea of

btw: is it possible in one query to get both prefixes(*) and words (*) ?

something like hypothetical:

"select * from prefixes where prefix = 'inte' + * from words where key
>= 'inte' limit 10"
Received on Fri Apr 24 2009 - 19:42:24 CEST

Original text of this message