Query Challenge

From: Aamir Baig <abaig_at_etilize.com>
Date: 26 May 2003 21:03:30 -0700
Message-ID: <18456fd3.0305262003.d50122f_at_posting.google.com>



Hello All,
I am trying to build a parametric search engine on a DB that does not allow sub-selects (MySQL, I wouldn't want to use sub-selects either for performance reasons). The problem I believe can be essentially generalized as follows:

Have a table named Entity (PK EntityID) and another table called EntityProperties (PK EPID, Key varchar, Value varchar). EP stores key/value proeprties of E.

What I would like to be able to do is to return a list of Entities that match a given set of key/value pairs. The query in itself is simple:

Lets say I have 4 key value pairs (key1, value1) , (key2, value2),
(key3, value3), (key4, value4) entered for search. The resulting
search query would look like:

Select E.*
From
Entity E, EntityProperties EP
Where
E.EID = EP.EID AND
(
(EP.KEY = key1 AND EP.VALUE = value1) OR
(EP.KEY = key2 AND EP.VALUE = value2) OR
(EP.KEY = key3 AND EP.VALUE = value3) OR
(EP.KEY = key4 AND EP.VALUE = value4) OR
)
GROUP BY E.EID
Having count(E.EID) >= 4;

The count is 4, as there are 4 matching conditions required.

The problem comes in when I have like 200,000 rows in E and 2 million rows in EP, the query above just crawls to like 20 secs average.

So I create a 2 column index on EP (Key, value), and the query does become faster, comes down to around 7 secs avg.

But a) This is still not fast enough, I would like it to be under a second, and b) I expect my EP to grow upto around 20-30 millions rows, so it certainly won't scale too well.

I did think about creating key-value pair keywords, creating a column in E to store them and then create a FULL-TEXT index and use a BOOLEAN MODE search, however, In the key/value match conditions I use above, I might also need comparison operators..i-e something like: EP.KEY = key1 and EP.VALUE > value1
and that the current mysql full-text implementation does not allow.

I was also wondering if hash indexes can be used somehow, But I have no idea a) how they work in and b) whether they would be useful.

Another issue that I have not yet mentioned is that the key/value pairs can be duplicated . I-e an Entity can have multiple values for a key. This makes the count() > = no of conditions check invalid. Now, I'm really breaking my brain trying to figure out what's the accurate/fastest way to do this.

I would appreciate any advice/guidance in optimizing this query or a totally different solution that accomplishes the same result and showing me how to do this right.

Thank you,
Aamir Baig Received on Tue May 27 2003 - 06:03:30 CEST

Original text of this message