Re: Free text search in MySQL

From: Thomas 'PointedEars' Lahn <PointedEars_at_web.de>
Date: Thu, 31 Mar 2016 20:25:40 +0200
Message-ID: <3114073.WFD38WVuFh_at_PointedEars.de>


Gordon Burditt wrote:

>>> If I have the text "A rat in a hat" inside a MySQL DB, and if on my
>>> webpage someone searches for the text "rat", I would like them to get
>>> that line - how do I do that? I realise that this sort of *precludes* my
>>> previous question about fuzzy matching, but I would still like to
>>> know....
>> 
>>   SELECT column FROM table WHERE text_key_column LIKE '%rat%';

>
> Beware that this also matches words or phrases like 'Democrat',
> 'movie rating', and 'proration'. If you put spaces around the word
> in the query, you will fail to find mentions of that word that begin
> or end the entry, or where the word has punctuation instead of a
> trailing space.

Reminds me of “The Boscombe Valley Mystery” :)

> Some forms of regular expressions have a better facility for matching
> "word boundaries" for some reasonable definition of the term, useful
> if your database provides that.

MySQL (which is the Database _Management System_ which is what matters) that is on-topic here, does.

However, regular expressions, which are expensive compared to string search, are not required to match the space-delimited character sequence (which may be defined as a “word”) “rat”:

  SELECT `column` FROM `table`
  WHERE (`text_key_column` LIKE "rat"

         OR `text_key_column` LIKE "rat %"
         OR `text_key_column` LIKE "% rat %"
         OR `text_key_column` LIKE "% rat");

I do not know if this is in total more expensive than proper regular expression matching, which would be

  SELECT `column` FROM `table`
  WHERE `text_key_column` REGEXP "[[:<:]]rat[[:>:]]";

in MySQL. Conciseness can one deceive over the underlying complexity.

<http://dev.mysql.com/doc/refman/5.7/en/regexp.html#operator_regexp>

> Take the contents of the Subject and Text Body, split it into
> words, […]

Splitting text into words only seems to be a trivial task; BTDT. It certainly is not trivial in MySQL. The least you have to do is to remove punctuation *if it is not within a word*, and to collapse whitespace.

Please keep one attribution line per quotation level.

-- 
PointedEars
Zend Certified PHP Engineer 
<http://www.zend.com/en/yellow-pages/ZEND024953> | Twitter: _at_PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.
Received on Thu Mar 31 2016 - 20:25:40 CEST

Original text of this message