Re: Does MySQL have fuzzy matching or no?

From: Axel Schwenke <axel.schwenke_at_gmx.de>
Date: Fri, 15 Dec 2017 23:45:03 +0100
Message-ID: <p11j9g$t67$1_at_dont-email.me>


[Quoted] On 15.12.2017 21:20, Luuk wrote:
> On 15-12-17 18:46, Lew Pitcher wrote:

>>> If you are looking at strings that might sound similar if pronounced, then >>> you could use the SOUNDEX() function to match to a pre-determined value

> What am i missing?
>
> [root_at_test]> select * from test15;
> +------+
> | test |
> +------+
> | doop |
> | pood |
> | poop |
> +------+
> 3 rows in set (0.00 sec)
>
> [root_at_test]> select * from test15 where test sounds like 'doop';
> +------+
> | test |
> +------+
> | doop |
> +------+

Nothing.

[Quoted] SOUNDEX() is pretty poor when it comes to generic fuzzy matching. It works only for phonetically similar words (like "Meyer" vs. "Mayer"), only for English and even then it yields ... interesting ... results from time to time.

N-gram matching on the other hand is a very nice tool to find similar texts - especially when those texts are rather long. I had big success with it when I consolidated my joke collection (collected from various sources around the - then new - internet). Back those days I wrote a tool in C (after having read an article about trigram matching in a computer mag) and it helped excellently in weeding out (near-)duplicates.

Oracle added an n-gram parser to the FULLTEXT indexer for completely different reasons - to support languages that don't have the concept of separation characters in written text. But it will work nicely for fuzzy matching in western languages, too. Received on Fri Dec 15 2017 - 23:45:03 CET

Original text of this message