Re: Search efficiency - RDMS search versus free text search

From: <patrick61z_at_yahoo.com>
Date: Fri, 23 Jan 2009 13:06:15 -0800 (PST)
Message-ID: <6d5498ed-149a-4aba-8e13-3d6a6b9feea7_at_y1g2000pra.googlegroups.com>


On Jan 13, 6:10 am, Taras_96 <taras..._at_gmail.com> wrote:
> Hi everyone,
>
> I just had an interesting discussion with a colleague regarding how we
> are implementing search. We allow users to browse resources under a
> list of topics, so for example a list of all resources relating to
> 'cars'. One way of doing this is with a SQL query against the database
> backend:
>
> SELECT * FROM resources WHERE topic_area = 'cars'
>
> The actual query is quite a bit more complicated than this (involves
> table joins and the like).
>
> However, he said that due to the table joins (coming from the fact
> that the database is normalised), that this technique is quite
> resource expensive. Instead, we are using the Google Search Appliance
> to index our resources (delivered via the web), and creating the topic
> pages by using Google to search on meta-data, which apparently results
> in better performance.
>
> This came to me somewhat as a suprise, as I was under the impression
> that returning such result sets were one of the key uses of a RDMS -
> the efficient storage and retrieval of records. I can understand the
> performance issues if the search was a free text search, but the above
> search is really searching against an indexed column value.
>
> Is this often the case with enterprise level database searches - that
> the database itself isn't used for searching because of performance
> issues?
>
> Cheers
>
> Taras

Search like this is probably one of the areas that RDBMS's struggle with, simply because the problem of keyword or text search is not what relational database tech is all about.

Yes it can do it, but full text search is best designed outside of relational algebra and like repeating groups in some of todays database engines, the engine itself will do full text indexing for you. And you can bet that underneath the hood, theres real programmers implementing indexes that maximize performance.

An example I found is http://www.developer.com/db/article.php/3446891 that discusses how SQL Server 2008 bails on this task to an instance of "Microsoft Search Service" and to me this makes perfect sense.

It wouldn't hurt to forget relational algebra long enough to read a few articles about document indexing and the design decisions that come into play. Along with that, be familiar with the types of indexes in use on todays databases (that while are parts of relational databases themselves, are definitely built using non relational algorithms). I've found a basic understanding of indexing techniques to be pretty helpful. Received on Fri Jan 23 2009 - 22:06:15 CET

Original text of this message