Re: Db design for a flexible search

From: dawn <dawnwolthuis_at_gmail.com>
Date: 19 Dec 2005 18:15:50 -0800
Message-ID: <1135044950.857643.82510_at_z14g2000cwz.googlegroups.com>


ironcito wrote:
> Hello,
>
> I don't know if this is the appropiate newsgroup for this post, but
> it's the most related one I could find. Please advise if it isn't.
> I'm building a database containing music data, basically artist and
> tracks. For example, a record might be "14369, The Beatles, Help",
> where the first number is an ID key. I might later add things like
> albums, but it doesn't make a difference in this issue. The problem is
> that these names are often unusual, and I want to be able to build a
> search engine that is capable of overcoming this and displaying
> relevant results. For example, there's a song named "A.D.I.D.A.S.", and
> many people are likely to search for just "adidas". Or names like
> Beyoncé, P!nk, and such. I can see ways to overcome accented
> characters rather easily, but other cases are tougher. What would be
> the correct approach to this? I was thinking of adding an extra
> field(s) that would contain keywords that people are likely to search.
> Or is there a more automated way to solve this things?
> Any comment or idea is greatly appreciated. Thanks!

How you might go about this is database-implementation dependent, but here is one approach that solves the issue of punctuation and letter cases. There are also old soundex algorithms (and likely newer better ones) that can be further applied to this approach.

Create a virtual field which would be a UDF (user-defined function) in SQL Server, for example that contains either an all upper case or all lower case version of the name of the song, stripped of all punctuation and perhaps even all spaces. Create an index on this derived attribute. When the user enters data, perform the same algorithm on their data before searching your derived data attribute for a matching value.

A soundex algorithm would do further manipulations such as stripping out vowels, equating d's and t's, etc. If you go the soundex route, then first showing exact matches (exact of the smashed-samecased version) and then soundex matches can be better than mixing in the "sounds like" with the "matches" in the list you return for selection by the user.

I hope that made sense. Cheers! --dawn Received on Tue Dec 20 2005 - 03:15:50 CET

Original text of this message