Re: Better search algorithm?

From: Heinz Huber <hhuber_at_racon-linz.at>
Date: Wed, 25 Jul 2001 08:11:32 +0200
Message-ID: <3B5E6314.C35F918C_at_racon-linz.at>


Jason Chancellor wrote:
>
> Hello, I am working on a project which is beginning to show some
> performance issues, and I have been trying some different techniques
> with no better results....so I thought I would ask around for some
> advice.
>
> The project is a matchmaker which basically pairs up people based on a
> questionnaire and their distance from each other. IE, I would like to
> find all people who are girls and live 30 miles from me. The database
> is MS Sql Server 7
>
> The tables in question are as follows :
>
> USER_INFO
> -----------------
> user_id numeric(10,0)
> zipcode int
> sex int
> age int
> etc......
>
> ZIP_CODE
> ---------------
> zipcode int
> longitude float
> latitude float
>
> SEARCH
> ------
> search_user_id numeric(10,0)
> found_user_id numeric(10,0)
> distance float
>
> OK, heres what I was trying. I was selecting a group of users from
> the USER_INFO table based on their answers to questions and inserting
> them into the SEARCH table. The search_user_id is who requested the
> search, and the found_user_id is who the search produced (depending on
> how broad the search was, this insert could be 1000's of rows. After
> the insert, I had a stored proc which went through the SEARCH table
> and calculated the distance the two people were from each other based
> on the longitude and latitude of their two zipcodes, and then this
> distance was used to determine if these two users were close enough
> together. This distance calculation is what is killing my
> performance, 1st because its a complicated algorithm, and two
> because I have to basically go through 1000s of rows and do 1 update
> at a time.
>
> I would like to be able to calculate the distance at the same time I
> am selecting the users based on their questions. My initial thought
> was to create a table which stored the precalculated distances between
> all zipcodes in America and then join in this table when doing my
> search. However, after a few calculations I realized that this would
> require this distance table to have over 640 million rows (thats a
> couple too many)

You should be able to update all the row in Search with one SQL statement: UPDATE Search
 SET distance =

    (SELECT SQRT(POWER(sz.longitude - fz.longitude, 2) + POWER(sz.latitude - fz.latitude, 2))

     FROM User_Info s JOIN Zip_Code sz ON s.zipcode = sz.zipcode, User_Info f JOIN Zip_Code sf ON f.zipcode = fz.zipcode

     WHERE s.user_id = Search.search_user_id AND f.user_id = Search.found_user_id)
 WHERE search_user_id = :id;

Perhaps the function names have to be adapted to SQL Server, because I'm not sure about the standard names.

Another possibility would be to go by zip code of the found ids: - Get a list of all zip codes of the found users. - Calculate the distance for each of these zip codes and do an update for every user that has the respective zip code.

This will cut down the number of updates drasticly, I think.

hth,
Heinz Received on Wed Jul 25 2001 - 08:11:32 CEST

Original text of this message