Re: Search efficiency - RDMS search versus free text search

From: paul c <toledobythesea_at_oohay.ac>
Date: Wed, 14 Jan 2009 14:56:26 GMT
Message-ID: <u2nbl.4952$PH1.750_at_edtnps82>


Taras_96 wrote:
> Hi everyone,
>
> I just had an interesting discussion with a colleague regarding how we
> are implementing search. We allow users to browse resources under a
> list of topics, so for example a list of all resources relating to
> 'cars'. One way of doing this is with a SQL query against the database
> backend:
>
> SELECT * FROM resources WHERE topic_area = 'cars'
>
> The actual query is quite a bit more complicated than this (involves
> table joins and the like).
>
> However, he said that due to the table joins (coming from the fact
> that the database is normalised), that this technique is quite
> resource expensive. Instead, we are using the Google Search Appliance
> to index our resources (delivered via the web), and creating the topic
> pages by using Google to search on meta-data, which apparently results
> in better performance.
>
> This came to me somewhat as a suprise, as I was under the impression
> that returning such result sets were one of the key uses of a RDMS -
> the efficient storage and retrieval of records. I can understand the
> performance issues if the search was a free text search, but the above
> search is really searching against an indexed column value.
>
> Is this often the case with enterprise level database searches - that
> the database itself isn't used for searching because of performance
> issues?
>
> Cheers
>
> Taras

I think this is a significant question even if it is perhaps naive in the c.d.t. context. The question I would ask is: does the 'appliance' give the same answers as the dbms? Possibly the answer to that question would indicate that the requirements for this particular database are just as fuzzy as the answers.

Presumably, this is a read-only db at least as far as some users are concerned. If there is a requirement for accurate updating by other users, let's hope they use some version or special copy of the db itself. On the other hand, I remember read-only advertising research apps from the early 1980's that did fuzzy searching. Their main requirement was to give possible answers, not all answers.

(Some time ago I was interested in finding the children of a woman I thought to be dead for twenty or more years. No criticism of Google but its algorithms got me nowhere whereas Yahoo gave me some clues as to countries she might have lived in. Knowing her husband's surname and the given name of her male child, I looked him up in a foreign telephone directory and by accident reached another son I didn't know of, and then to my surprise, the lady herself, still alive. At one time, Yahoo's machine-generated indexes were edited to some extent manually. No idea if this is still the case, but this example shows that the ranking involved in these 'search engines' may omit useful answers and has a big effect on subsequent searches.) Received on Wed Jan 14 2009 - 15:56:26 CET

Original text of this message