Re: general design issue

From: paul c <toledobythesea_at_oohay.ac>
Date: Thu, 23 Apr 2009 23:48:06 GMT
Message-ID: <W67Il.23820$Db2.4970_at_edtnps83>


Bob Badour wrote:
> paul c wrote:
>

>> Bob Badour wrote:
>>
>>> skink wrote:
>>>
>>>>
>>>> Bob Badour write:
>>>>
>>>>> Other than being non-standard, what is your objection to using GLOB
>>>>> 'prefix*' ?
>>>>
>>>>
>>>> i'm sorry i overlooked GLOB in the previous post. ok gave it a try on
>>>> my sqlute and results are the same as LIKE: ~1200 ms...
>>>
>>>
>>> How many distinct items with the same prefix do you have? Presumably 
>>> you have no more than about a couple dozen with the same prefix plus 
>>> one character.
>>>
>>> Are the prefixes you use always the same length?
>>
>> Note I tried it with a unique index because I guessed that whatever 
>> words he's storing, he stores each one only once but I would guess if 
>> the OP built his index allowing duplicates it would still use the 
>> index but maybe I'm wrong and the index needs to be unique.

>
> They are prefix queries. Even if he does not allow duplicates, he could
> still have gigabytes of index for a given prefix because all the
> remaining characters might be different.
> ...

There could be a lot of wasted motion If the index is not 'primary', is also 'dense' and the rows are unclustered, but maybe he is using flash storage on the embedded device, so his elapsed time would be more in proportion with result volume even if he is doing extra cpu time. But he did imply that he was searching against the equivalent of a primary key. I found a reference that says sqlite doesn't support 'clustered indexes', which tells me that this index must be dense, and (as you say) could actually be bigger than his data! I guess that means that all sqlite indexes are dense, whereas I would have hoped that sqlite could use a sparse index (much smaller, ie. the data 'pages' serve as the bottom leaves of the index tree) with clustered rows, in which case the cost would still be proportional to result size but with less wasted motion. On the other hand, I'm thinking one must be careful about what reads at sqlite.org, having been fooled already today by the impression of LIKE being able to use an index and not finding that it does. Also there is that built-in sequence field sqlite uses, perhaps that precludes data clustering. I think I will look more closely at its physical organization as I've been thinking of using its virtual machine for a little project of mine. Maybe it will turn out that sqlite is aimed at apps which return results that are small compared to table size. Received on Fri Apr 24 2009 - 01:48:06 CEST

Original text of this message