Re: Db design for a flexible search

From: Bob Stearns <rstearns1241_at_charter.net>
Date: Mon, 19 Dec 2005 18:49:05 -0500
Message-ID: <ZDHpf.13512$Eu3.8899_at_fe07.lga>


ironcito wrote:
> Hello,
>
> I don't know if this is the appropiate newsgroup for this post, but
> it's the most related one I could find. Please advise if it isn't.
> I'm building a database containing music data, basically artist and
> tracks. For example, a record might be "14369, The Beatles, Help",
> where the first number is an ID key. I might later add things like
> albums, but it doesn't make a difference in this issue. The problem is
> that these names are often unusual, and I want to be able to build a
> search engine that is capable of overcoming this and displaying
> relevant results. For example, there's a song named "A.D.I.D.A.S.", and
> many people are likely to search for just "adidas". Or names like
> Beyoncé, P!nk, and such. I can see ways to overcome accented
> characters rather easily, but other cases are tougher. What would be
> the correct approach to this? I was thinking of adding an extra
> field(s) that would contain keywords that people are likely to search.
> Or is there a more automated way to solve this things?
> Any comment or idea is greatly appreciated. Thanks!
>
> Diego
>
To stay with SQL, which is in itself a valid question, you need to add "searchable" columns and keywords. The "searchable" columns are derived from the original artist and title columns by mechanically converting them to a canonical form: converting "accented" characters to their base form, monocasing the entire column, reducing punctuation (I usually leave only internal apostrophes though even they may be elided), etc. The same process must be applied to the search terms to insure matches.

The addition of keywords is usually more human intensive: they are adding data to a row that is not contained in the original data: synonyms, homonyms, etc. (How do you, for instance, derive pink from the visual [almost] homonym P!nk?) Some of this may be table driven. Received on Tue Dec 20 2005 - 00:49:05 CET

Original text of this message