Re: Free text search in MySQL

From: Jerry Stuckle <jstucklex_at_attglobal.net>
Date: Thu, 31 Mar 2016 14:51:51 -0400
Message-ID: <ndjre5$tdf$1_at_jstuckle.eternal-september.org>


On 3/31/2016 2:25 PM, the well-known troll Thomas 'Pointed Head' Lahn wrote:
> Gordon Burditt wrote:
>

>>>> If I have the text "A rat in a hat" inside a MySQL DB, and if on my
>>>> webpage someone searches for the text "rat", I would like them to get
>>>> that line - how do I do that? I realise that this sort of *precludes* my
>>>> previous question about fuzzy matching, but I would still like to
>>>> know....
>>>
>>>   SELECT column FROM table WHERE text_key_column LIKE '%rat%';
>>
>> Beware that this also matches words or phrases like 'Democrat',
>> 'movie rating', and 'proration'.  If you put spaces around the word
>> in the query, you will fail to find mentions of that word that begin
>> or end the entry, or where the word has punctuation instead of a
>> trailing space.

>
> Reminds me of “The Boscombe Valley Mystery” :)
>
>> Some forms of regular expressions have a better facility for matching
>> "word boundaries" for some reasonable definition of the term, useful
>> if your database provides that.

>
> MySQL (which is the Database _Management System_ which is what matters) that
> is on-topic here, does.
>
> However, regular expressions, which are expensive compared to string search,
> are not required to match the space-delimited character sequence (which may
> be defined as a “word”) “rat”:
>
> SELECT `column` FROM `table`
> WHERE (`text_key_column` LIKE "rat"
> OR `text_key_column` LIKE "rat %"
> OR `text_key_column` LIKE "% rat %"
> OR `text_key_column` LIKE "% rat");
>

Which will not catch
The
cat
and
the
rat
are
fat.

Or similar (hint: `text_key_column` LIKE 'rat' will ONLY match a text column containing *exactly* 'rat').

And the correct SQL syntax is single quotes around the value, not double. Plus you have not escaped your input values.

FAIL.
> I do not know if this is in total more expensive than proper regular
> expression matching, which would be
>
> SELECT `column` FROM `table`
> WHERE `text_key_column` REGEXP "[[:<:]]rat[[:>:]]";
>
> in MySQL. Conciseness can one deceive over the underlying complexity.
>
> <http://dev.mysql.com/doc/refman/5.7/en/regexp.html#operator_regexp>
>

>> Take the contents of the Subject and Text Body, split it into
>> words, […]

>
> Splitting text into words only seems to be a trivial task; BTDT. It
> certainly is not trivial in MySQL. The least you have to do is to remove
> punctuation *if it is not within a word*, and to collapse whitespace.
>
> Please keep one attribution line per quotation level.
>

Please don't be your usual pedantic self.

-- 
==================
Remove the "x" from my email address
Jerry Stuckle
jstucklex_at_attglobal.net
==================
Received on Thu Mar 31 2016 - 20:51:51 CEST

Original text of this message